Recently I realized that the size of my executables is quite large. I am developing software for Cortex-M microcontrollers, using Eclipse and GCC.
To check this out, I used an example project I found on the internet, that simply blinks an LED by manipulating the registers directly and that is makefile based.
I created a very similar project using my libraries, startup code, linker scripts etc that uses Eclipse's managed makefiles.
The first project compiled successfully, and produced a binary file of app. 6kB. The second project produced a binary file of app.48kB! That is obviously quite a large difference for essentially the same result, and the later is definitely a huge file, for just blinking an LED. In both cases optimizations were off.
In my own libraries, there are some volatile buffers, that may be the excuse for the large BSS, or data sections, so I decided to begin by concentrating on the text section (which is still 5 times larger 5kB to 27kB).
I took a look at the map file to see what is really linked to the binary file. Same or similar functions had also similar size.
There is one thing that seems very-very odd to me. There are functions wich are defined only once in the whole project, but appear to have been linked multiple times, each time from a different object file, and each time occupying space in the text section. Take a look for example to function .text.port_lock.
Is this normal? How can I reduce the final file size, and how can I tell the toolchain to only link once each function?
The map file
Edit:
As stated in the comments the two programs are not different, it is the same thing, with minor modifications (e.g. startup code, and function to access the GPIO register). I am not testing GCC's ability to optimize code, thus I used -O0. I try to understand the map file, and why I see the some functions multiple times.
You are misreading the map file. None of the occurrences of .text.port_lock,
for example, represents a definition of the ChibiOS function void port_lock(void).
All the occurrences of .text.port_lock refer to input linker sections.
The first 4 occurrences, lying within the section of the map file titled
Discarded input sections, refer to input linker sections that the linker
discarded. For example:
.text.port_lock
0x00000000 0x1c /home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chmempools.o)
means that the linker found a section .text.port_lock of size 28 bytes
in input file /home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chmempools.o)
and threw it away.
The next 6 occurrences, lying within the the section of the map file titled
Linker script and memory map all refer to input linker sections that were
mapped into the output .text section. For example the first one:
.text.port_lock
0x000012a8 0x1c /tmp/ccaossic.ltrans0.ltrans.o
means that the linker found a section .text.port_lock of size 28 bytes
in input file /tmp/ccaossic.ltrans0.ltrans.o
and mapped it at address 0x000012a8 in the output .text section. Likewise the
second occurrence:
.text.port_lock
0x00001f70 0x1c /home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chsys.o)
means that an input section of the same name and size also was found in input
file /home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chsys.o)
and was mapped at address 0x00001f70 in the output .text section.
Altogether there are .text.port_lock input sections, all of them 28 bytes,
mapped in your output .text section from these input files:
/tmp/ccaossic.ltrans0.ltrans.o
/home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chsys.o)
/home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chthreads.o)
/home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chcore_v7m.o)
/home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chmemcore.o)
/home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chschd.o)
In all 6 of these cases, the input section contains no symbols, and in particular
no functions. For contrast, here is an example of an input section that does contain symbols:
.text 0x000002f0 0x28 /home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chcoreasm_v7m.o)
0x000002f0 _port_switch
0x00000300 _port_thread_start
0x00000310 _port_switch_from_isr
0x00000314 _port_exit_from_isr
This is the input .text section from /home/fotis/Documents/Resources/Chibios/Chibios/Debug/libChibios.a(chcoreasm_v7m.o).
The map file contains no indication that the port_lock function is linked multiple times. It contains no
indication that this function is linked at all. If it were linked multiple times then there
would have been a multiple-definition linkage error (except in the event that it had
been annotated as a weak symbol).
Why these six 28-byte input sections containing no symbols are all linked, or if they
need to be, is a matter about which I have no adequate evidence or ChibiOS expertise. I notice
that all but one of the object files from which these input sections come are
archive members of libChibios. In that light, it is worth remembering that if your linkage
requires an archive member for any reason then by default you will link the whole
archive member, even it contains more stuff than you need. On the other hand, the fact
that some port_lock input sections are discarded and some are kept suggests that there
is a need to keep the ones that are kept. If for my own cunning reasons I write a source file
essentially like:
static int __attribute__((section(".text.foo"))) __attribute__((used))
boo(int i)
{
return i * 2;
}
int bar(int i)
{
return boo(i);
}
then in my map file you will see an empty input section called .text.foo. This
doesn't tell you anything about the symbols I'm linking.
How can I tell the toolchain to only link once each function?
The linker will not link any symbol definition more than once, except in the special
case of weak symbols. Your map file contains no evidence of any function being linked more than once.
How can I reduce the final file size?
Compile with -Os for your release, of course. And to minimize linkage redundancy,
see this question.
Reading a linker map file is usually a clunky way of investigating the symbols and sections
in your binaries. Prefer objdump, readelf
and nm
Related
I noticed something peculiar while inspecting a map file produced with the -Map=mapfile option of the GNU linker. It listed a couple of symbols as belonging to the .text section, whereas the binary's symbol table lists them as part of the .rodata section. I suspect that this is some sort of cheeky optimization, as the compiler probably determined that those symbols are only ever read, but what surprises me is that the map file doesn't reflect that. My understanding is the linking is pretty much the final stage of the compilation process and all optimization happens before it. Is that correct? Why were these symbols optimized afterwards?
As you've probably inferred, the toolchain is GCC. The source code is written in C.
I am using gcc to compile some code for an ARM Cortex-M4F microcontroller. My application uses a large collection of data that I have conveniently written into a C file: large arrays of ints and floats, circular linked lists pointing to those various arrays, various structs, that sort of thing.
When I compile this it adds up to about 500K of data, plus a few hundred K for the actual application. gcc is putting all this data conveniently into the .data section; ld then tries to build the elf putting the .data section into RAM and .text (code) section into FLASH. The mcu I am using doesn't have 500K of RAM so it cannot build the ELF, but it does have 1M of FLASH. I tried changing my linker script so that both .data and .text are put into FLASH which "worked" but there are some other bit of code that expect its data to go into RAM, so ultimately execution failed; I can't make that sweeping change.
What I need is to tell gcc to put every single object in this C file into the .text section so it can go into FLASH with the rest of the non-mutable stuff, or at least some other section which I can then instruct my linker script what to do with so that it doesn't interfere with existing blobs that have no problem fitting in RAM. I don't know how to do that. Here is a very slimmed down example of what I have
/* data.c */
static char* option_array_0_0[] =
{
"O=0.40",
"K=FOO",
"JAR=JAR",
};
static int width_array_0_0[] =
{
0,
1,
1,
};
Window window_array_0[] =
{
{
option_array,
width_array,
},
};
/* main.c */
extern Window window_array_0[];
int main()
{
/* Do stuff */
}
The stuff in data.c, window_array_0 and everything (or most everything, maybe the string arrays are going to .text?) it links to, is being put in .data which my linker script puts into RAM. I want it all in a different section which I can then put into FLASH. There are thousands of these types of arrays in there, plus hundreds of structs and dozens of other bits of information. Is this possible to change? As a test I replaced my "window_array_0" with a char[500000] of random data and that compiled without complaint so I assume it put it all into .text (as would be expected), I just don't know how to make it do so for arbitrary objects.
Thanks for your help.
As other commenters pointed, 'static const' items usually end up in .rodata section which is likely to be placed right next to .text in potentially read-only memory. The caveat is that it may or may not be true in your case as that is specific to particular target and may be changed by particular build process (i.e. linker options, specific section specified via __attribute__((section("name"))) in C code, linker script, binaries tweaked after build with various binutils, etc).
If you need to have precise control over in-memory layout and you know what you're doing, you can use LD script to do it. Among other things it will let you specify that .rodata from file data.o should be placed just before/after .text from all other .o files linked into the executable.
You can use arm-<your toolchain variant>-ld -verbose to dump default linker script and use it as a starting point for tweaking.
Most compilers/linkers, if you declare a variable as static const, it will place it in the text section instead of data. Obviously, these must be preinitialized and not modified at run-time, but that's the only way it makes sense to go in flash.
Somewhere in the code (.text section) put like a sample:
__attribute__((section(".text")))
const int test[0x0A] = {0,0,0,0,0,0,0,0,0,0};
or without const if you want a variable to change:
__attribute__((section(".text")))
int test[0x0A] = {0,0,0,0,0,0,0,0,0,0};
Then try changing it:
test[0] = 0xffffffff;
This works on my Linux 32 intel machine.
IIRC, code in flash is usually required to be ROPI (read only position independent). So option_array_0_0 and width_array_0_0 would need const qualifiers (read only). However:
Window window_array_0[] =
{
{
option_array,
width_array,
},
};
needs to be changed somehow (I'm assuming option_array and width_array are indeed arrays). That makes window_array_0 not position independent.
All non-trivial embedded projects shall have a linker script. It defines where different symbols (function, global variables, debug symbols, ...) shall be placed in memory. Here is some tutorial about gcc linker script.
forcing all data to be placed in one section might not be the best solution!
NOTE: Since these arrays are NOT constants, it doesn't make sense to store them in data section! because this means you will end up over-writing code section by you code (which is PRETTY dangerous). Either make them const or create a no-init section which is not initialized by the loader, and you initialize it on reset.
I'm using custom elf headers in an autotools C project similar to this thread: How do you get the start and end addresses of a custom ELF section in C (gcc)?. The problem is that the c files that declare the custom sections are linked into a static library which is then linked to the final application.
In this configuration the symbols __start_custom_section and __stop_custom_section do not get generated. I define the elf section like this:
struct mystruct __attribute((__section__("custom_section"))) __attribute((__used__) = {
...
};
If I link to the object file instead of the library the symbols get created and everything works as expected. This isn't a scalable solution though because I'd like new modules to just work by compiling them into the modules library. Any idea why the linker doesn't create these special symbols when the section exists in a library vs a single object file?
I have done something similar to this recently, and my solution does not rely on any compiler specific implementations, internal undocumented symbols, etc. However, it does require a bit more work :)
Background
The ELF binary on disk can be loaded and parsed quite easily by knowing its format and using a couple structures provided to us: http://linux.die.net/man/5/elf. You can iterate through each of its segments and sections (segments are containers for sections). If you do this, you can calculate the the relative start/end virtual addresses of your section. By this logic, you would think that you can do the same thing at runtime by iterating through the segments and sections of the loaded, in-memory version of the ELF binary. But alas, you can only iterate through the segments themselves (via http://linux.die.net/man/3/dl_iterate_phdr), and all section metadata has been lost.
So, how can we retain the section metadata? Store it ourselves.
Solution
If you have a custom section named '.mycustom', then define a metadata struct that should at minimum store two numbers that will indicate the relative start address and the size of your '.mycustom' section. Create a global instance of this metadata struct that will live by itself in another custom section named '.mycustom_meta'.
Example:
typedef struct
{
unsigned long ulStart;
unsinged long ulSize;
} CustomSectionMeta;
__attribute((__section__(".mycustom_meta"))) CustomSectionMeta g_customSectionMeta = { 0, 0 };
You can see that our struct instance is initialized with zero for both start and size values. When you compile this code, your object file will contain a section named '.mycustom_meta' which will be 8 bytes in size for a 32-bit compilation (or 16 bytes for 64-bit), and the values will be all zeroes. Run objdump on it and you will see as much. Go ahead and put that into a static lib (.a) if you want, run readelf on it, and you will see exactly the same thing. Build it into a shared object (.so) if you want, run readelf on it, and again you will see the same thing. Build it into an executable program, run readelf on it, and voila its still there.
Now the trick comes in. You need to write a little executable (lets call it MetaWriter) that will update your ELF file on disk to fill in the start and size values. Here are the basic steps:
Open your ELF file (.o, .so, or executable) in binary mode and read it into a contiguous array. Or, you can mmap it into memory to achieve the same.
Read through the binary using header structures and instructions found in the ELF link I listed above.
Find your '.mycustom' section and read section.sh_addr and section.sh_size.
Find your '.mycustom_meta' section. Create an instance of CustomSectionMeta using the start and size values from step 3. memcpy() your struct over the top of the existing '.mycustom_meta' section data, which up to now was all zeroes.
Save you ELF data back to the original file. It should now be completely unmodified except for the few bytes you wrote into your '.mycustom_meta' section.
What I did was executed this MetaWriter program as part of the build process in my Makefile. So, you would build your .so or executable, then run MetaWriter on it to fill in the meta section. After that, its ready to go.
Now, when the code in your .so or executable runs, it can just read from g_customSectionMeta, which will be populated with the starting address offset of your '.mycustom' section, as well as the size of it, which can be used to easily calculate the end, of course. This start offset must be added to the base address of your loaded ELF binary. There are a couple ways to get this, but the easiest way I found was to run dladdr on a symbol that I know to exist in the binary (such as g_customSectionMeta!) and use the resulting value of dli_fbase to know the base address of the module.
Example:
#include <dlfcn.h>
Dl_info dlInfo;
if (dladdr(&g_customSectionMeta, &dlInfo) != 0)
{
void * vpBase = dlInfo.dli_fbase;
void * vpMyCustomStart = vpBase + g_customSectionMeta.ulStart;
void * vpMyCustomEnd = vpMyCustomStart + g_customSectionMeta.ulSize;
}
It would be a bit overboard to post the full amount of code required to do all this work, especially the parsing of the ELF binary in MetaWriter. However, if you need some help, feel free to reach out to me.
In my case, the variable was not referenced in the code and the section was optimised out in release mode (-O2). Adding used attribute solved the issue. Example:
static const unsigned char unused_var[] __attribute__((used, section("foo"))) = {
0xCA, 0xFE, 0xBA, 0xBE
};
I'm trying to debug a linker problem that I have, when writing a kernel.
The issue is that I have a variable SCAN_CODE_MAPPING that I'm not able to use -- it appears to be empty or something. I can fix this by changing the way I link my program, but I don't know why.
When I look inside the generated binary file using objdump, the data for the variable is definitely there, so there's just something broken with the reference to it.
Here's a gist with both of the linker scripts and the part of the symbol table that's different between the two files.
What confuses me is that both of the symbol tables have all the same symbols, they're all the same length, and they appear to contain the right data. The only difference that I can see is that they're not in the same order.
So far I've tried
inspecting the SCAN_CODE_MAPPING memory location to make sure it has the data I expect and hasn't been zeroed out
checking that all the symbols are the same
checking that all the symbol contents are the same length
looking at .data.rel.ro.local to make sure it has the address of the data
One possible clue is this warning:
warning: uninitialized space declared in non-BSS section `.text': zeroing
which I get in both the broken and the correct case.
What should I try next?
The problem here turned out to be that I was writing an OS, and only 12k of it was being loaded instead of the whole thing. So the linker script was actually working fine.
The main tools I used to understand binaries were:
nm
objdump
readelf
You can get a ton more information using "readelf".
In particular, take a look at the program headers:
readelf -l program
Your BSS section is quite different than the standard one, which probably causing the warning. Here's what the default looks like on my system:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* Align here to ensure that the .bss section occupies space up to
_end. Align after .bss to ensure correct alignment even if the
.bss section disappears because there are no input sections.
FIXME: Why do we need it? When there is no .bss section, we don't
pad the .data section. */
. = ALIGN(. != 0 ? 64 / 8 : 1);
}
If an input section doesn't match anything in your linker script, the linker still has to place it somewhere. Make sure you're covering all the input sections.
Note that there is a difference between sections and segments. Sections are used by the linker, but the only thing the program loader looks at are the segments. The text segment includes the text section, but it also includes other sections. Sections that go into the same segment must be adjacent. So order does matter.
The rodata section usually goes after the text section. These are both read-only during execution and will show up once in your program headers as a LOAD entry with read & execute permissions. That LOAD entry is the text segment.
The bss section usually goes after the data section. These are both writable during execution and will show up once in your program headers as a LOAD entry with read & write permissions. That LOAD entry is the data segment.
If you change the order, it affects how the linker generates the program headers. The program headers, rather than the section headers, are used when loading your program prior to executing it. Make sure to check the program headers when using a custom linker script.
If you can give more specifics about what your actual symptoms are, then it'll be easier to help.
The final images produced by compliers contain both bin file and extended loader format ELf file ,what is the difference between the two , especially the utility of ELF file.
A Bin file is a pure binary file with no memory fix-ups or relocations, more than likely it has explicit instructions to be loaded at a specific memory address. Whereas....
ELF files are Executable Linkable Format which consists of a symbol look-ups and relocatable table, that is, it can be loaded at any memory address by the kernel and automatically, all symbols used, are adjusted to the offset from that memory address where it was loaded into. Usually ELF files have a number of sections, such as 'data', 'text', 'bss', to name but a few...it is within those sections where the run-time can calculate where to adjust the symbol's memory references dynamically at run-time.
A bin file is just the bits and bytes that go into the rom or a particular address from which you will run the program. You can take this data and load it directly as is, you need to know what the base address is though as that is normally not in there.
An elf file contains the bin information but it is surrounded by lots of other information, possible debug info, symbols, can distinguish code from data within the binary. Allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file with fill data to pad it to the next block). Tells you how much binary you have and how much bss data is there that wants to be initialised to zeros (gnu tools have problems creating bin files correctly).
The elf file format is a standard, arm publishes its enhancements/variations on the standard. I recommend everyone writes an elf parsing program to understand what is in there, dont bother with a library, it is quite simple to just use the information and structures in the spec. Helps to overcome gnu problems in general creating .bin files as well as debugging linker scripts and other things that can help to mess up your bin or elf output.
some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
ELF format is generally the default output of compiling.
if you use GNU tool chains, you can translate it to binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or using fromELF utility(built in most IDEs such as ADS though):
fromelf -bin -o [binary-output-file] [elf-input-file]
bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building", hence the GNU Compiler Collection has separate executables:
One for the compiler (which technically outputs assembly), another one for the assembler (which outputs object code in the ELF format),
then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker,
which effectively turns an elf into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed,
such as in this case, and Ahmed was clarifying.
It's a common problem due to the inexact nature of human language itself.
To avoid confusion, GCC outputs object code (after internally using the assembler) using the ELF format.
The linker simply takes several of them (with an .o extension), and produces a single combined result, probably even compressing them (into "a.out").
But all of them, even ".so" are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book",
where all files technically use the same standard/format and hence could have had ".docx" as the extension.
The bin is then kind of like converting the book into a ".txt" file while adding as many whitespace as necessary to be equivalent to the size of the final book (printed on a single spool),
with places for all the pictures to be overlaid.
I just want to correct a point here. ELF file is produced by the Linker, not the compiler.
The Compiler mission ends after producing the object files (*.o) out of the source code files. Linker links all .o files together and produces the ELF.