Understanding certain ELF file structure - c

From ARM's infocenter, regarding section static linking and relocations:
** Section #1 'ER_RO' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 28 bytes (alignment 4)
Address: 0x00008000
$a
.text
bar
0x00008000: E59f000C .... LDR r0,[pc,#12] ; [0x8014] = 0x801C
0x00008004: E5901000 .... LDR r1,[r0,#0]
0x00008008: E2411001 ..A. SUB r1,r1,#1
0x0000800C: E5801000 .... STR r1,[r0,#0]
0x00008010: E12FFF1E ../. BX lr
$d
0x00008014: 0000801C .... DCD 32796
$a
.text
foo
0x00008018: EAFFFFF8 .... B bar ; 0x8000
and from ELF for the ARM architecture:
Table 4-7, Mapping symbols
Name Meaning
$a - Start of a sequence of ARM instructions
$d - Start of a sequence of data items (for example, a literal pool)
As you can see, the ELF file contains a section that holds code (bar), then read-only data (32796), then more code (foo), all at consecutive addresses.
Now, a basic principle of any software image layout is that it is composed of distinct, separate sections: text (code), data, and bss (and rodata, to be pedantic), as we can see if we examine the MAP file.
This ELF structure is not consistent with that principle, so my question is: what is going on here? Am I mistaken about this basic principle? If not, will this ELF structure be changed at run time so that the sections are separated?
And why does the ELF section contain mixed types in one sequential address range?
NOTE: I assume the scatter file used in the example is the default one, since the document containing the example does not provide a scatter file along with it.

At run time, the sections do not matter, only the PT_LOAD segments in the program header. The ELF specification is quite flexible there as well, but some loaders have restrictions on the PT_LOAD segments they can process.
The reason for splitting code and data this way could be that this architecture supports only a limited range of PC-relative addressing and needs a constant pool for loading most constants (because constructing them via immediates is too expensive). Having as few (and therefore as large) constant pools as possible is attractive because it leads to better data and instruction cache utilization (instead of caching memory which is not of the right type and thus can never be used), but you may still need more than one if the code size exceeds what can be addressed directly.
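To make the PC-relative load in the listing concrete, here is a minimal C sketch that decodes the first instruction of bar (0xE59F000C at 0x00008000) and computes which address it reads. It relies on two facts about ARM state: the PC reads as the address of the current instruction plus 8, and the LDR immediate offset is a 12-bit field, so a literal has to sit within about 4 KB of the instruction that loads it. The helper name is hypothetical and only the immediate-offset LDR form with pc as the base register is handled.

```c
#include <assert.h>
#include <stdint.h>

/* In ARM state, reading the PC yields the current instruction address + 8. */
#define ARM_PC_READ_OFFSET 8u

/* Hypothetical helper: return the address read by an ARM-state
 * "LDR Rt, [pc, #imm12]" instruction located at insn_addr. */
uint32_t arm_ldr_literal_address(uint32_t insn_addr, uint32_t insn)
{
    /* Accept only LDR (word, immediate offset, no writeback) with Rn == pc. */
    assert((insn & 0x0F7F0000u) == 0x051F0000u);

    uint32_t imm12 = insn & 0xFFFu;          /* 12-bit offset, 0..4095 */
    uint32_t add   = (insn >> 23) & 1u;      /* U bit: 1 = add, 0 = subtract */
    uint32_t base  = (insn_addr + ARM_PC_READ_OFFSET) & ~3u;

    return add ? base + imm12 : base - imm12;
}
```

For the instruction at 0x00008000 this gives 0x00008000 + 8 + 12 = 0x00008014, which is the DCD word that follows bar in the listing.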

Related

How can i extract constants' addresses,added by compiler optimization, from ELF file?

I'm writing a code size analysis tool for my C program, using the output ELF file.
I'm using readelf --debug-dump=info to generate a DWARF dump.
I've noticed that, as part of optimization, my compiler adds new constants to the .rodata section that are not in the DWARF output.
So the .rodata section size includes their sizes, but I don't have their sizes in DWARF.
Here is an example from the map file:
*(.rodata)
.rodata 0x10010000 0xc0 /<.o file0 path>
0x10010000 const1
0x10010040 const2
.rodata 0x100100c0 0xa /<.o file1 path>
fill 0x100100ca 0x6
.rodata 0x100100d0 0x6c /<.o file2 path>
0x100100d0 const3
0x100100e0 const4
0x10010100 const5
0x10010120 const6
fill 0x1001013c 0x4
In file1 above, although I didn't declare any const variable, the compiler did: this constant takes space in .rodata, yet there is no symbol/name for it.
Here is the code, inside a function, that generates it:
uint8 arr[3][2] = {{146,179},
{133, 166},
{108, 141}} ;
So the compiler adds some constant values to optimize loading the array.
How can I extract these hidden additions from the data sections?
I want to be able to fully characterize my code: how much space is used in each file, etc.
I am guessing here - it will be linker dependent, but when you have code such as:
uint8 arr[3][2] = {{146,179},
{133, 166},
{108, 141}} ;
arr at run-time exists in r/w memory, but its initialiser will be located in R/O memory to be copied to the R/W memory when the array is initialised. The linker need only provide the address, because the size will be known locally as a compile-time constant embedded as a literal in the initializing code. Consequently the size information does not appear in the map, because the linker discards that information.
Length is, however, implicit in the addresses of adjacent objects and fill entries. So for example:
The size of const1 for example is equal to const2 - const1 and for const6 it is 0x1001013c - const6.
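As a rough illustration of that subtraction, here is a small C sketch that derives sizes from a sorted list of symbol addresses plus the end address of the region. The function name is hypothetical, and the addresses in the usage note are the ones from the map excerpt in the question. Note that any alignment padding between two symbols ends up attributed to the earlier symbol, so these are upper bounds rather than exact object sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: addrs[] must be sorted in ascending order.
 * sizes[i] receives the distance from addrs[i] to the next address;
 * the last entry is measured against region_end. */
void sizes_from_addresses(const uint32_t *addrs, size_t count,
                          uint32_t region_end, uint32_t *sizes)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t next = (i + 1 < count) ? addrs[i + 1] : region_end;
        assert(next >= addrs[i]);   /* input must be sorted */
        sizes[i] = next - addrs[i];
    }
}
```

Feeding it const3..const6 (0x100100d0, 0x100100e0, 0x10010100, 0x10010120) with the fill address 0x1001013c as the end yields 0x10, 0x20, 0x20 and 0x1c, which add up to the 0x6c reported for that .rodata contribution. The file1 contribution (0xa bytes, no symbols) has nothing to subtract against, which is exactly the unnamed data the question is about.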
It is all rather academic however - you have precise control over this in terms of the size of your constant initialisers. They are not magically created data unrelated to your code, and I am not convinced that they are a product of optimization as you suggest. The non-zero initialisers must exist regardless of optimisation options, and in any case optimisation primarily affects the size and/or speed of code (.text) rather than data. The impact on data sizes is likely to relate only to padding and alignment and, in debug builds, possibly "guard-space" for overrun detection.
However there is no need at all for you to guess. You can determine how this data is used by inspecting the disassembly or observing its execution (at the instruction level) in a debugger, to see exactly where initialised variables are copying the data from. You could even place a read-access break-point at these addresses, and you will determine directly what code is using them.
To get a detailed size breakdown of an ELF file, use the following:
"You can use nm and size to get the size of functions and ELF sections.
To get the size of the functions (and objects with static storage duration):
$ nm --print-size --size-sort --radix=d tst.o
The second column shows the size in decimal of functions and objects.
To get the size of the sections:
$ size -A -d tst.o
The second column shows the size in decimal of the sections."
Tool to analyze size of ELF sections and symbol

Moving memcpy into another code section

I am building a piece of software meant to run on an ARM Cortex-M0+ microcontroller. It includes a USB bootloader of sorts that runs as a secondary program upon a call to a function. I'm having an issue with the insertion of the memcpy function during compilation.
Background
The linker script is where it all starts. Most of it is pretty straightforward and standard. The program is stored in .text and is executed from there as well. Everything in .text is stored in the flash section of the chip.
The strangeness is the part where the bootloader runs. In order to be able to write all of the flash without overwriting the bootloader code, my bootloader entry point initiates a copy of the bootloader program into the SRAM portion of the microcontroller and then executes it from there. This way, the bootloader can safely erase all of the flash on the device without inadvertently deleting itself.
This is implemented by doing a faked "overlay" in the linker script (the real OVERLAY didn't quite match my use case):
/**
* The bootloader and general ram live in the same area of memory
* NOTE: The bootloader gets its own special RAM space and it lives on top
* of both .data and .bss.
*/
_shared_start = .;
.bootloader _shared_start : AT(_end_flash)
{
/* We keep the bootloader and its data together */
_start_bootloader_flash = LOADADDR(.bootloader);
_start_bootloader = .;
*(.bootloader.data)
*(.bootloader.data.*)
. = ALIGN(1024); /* Interrupt vector tables must be aligned to a 1024-byte boundary */
*(.bootloader.interrupt_vector_table)
*(.bootloader)
_end_bootloader = .;
}
.data _shared_start : AT(_end_flash + SIZEOF(.bootloader))
{
_start_data_flash = LOADADDR(.data);
_start_data = .;
*(.data)
*(.data.*)
*(.shdata)
_end_data = .;
}
. = _shared_start + SIZEOF (.data);
_bootloader_size = _end_bootloader - _start_bootloader;
_data_size = _end_data - _start_data;
_end_flash is a reference to the end of the previous section which stored all of its data in flash (.text, .rodata, .init...basically anything read-only gets stuck there).
What this accomplishes is that the .data and .bss sections normally live in RAM. However, the .bootloader sections also live in the same place in RAM. Both sections are stored to the flash sequentially when compiled. In my crt0 routines, the .data section is copied from the flash into its appropriate address in RAM (specified by _start_data) and the .bss section is zeroed. I have an additional section stored in the .text section which initiates the bootloader by copying its data from the flash into RAM, overwriting whatever was in .data and .bss. The only exit from the bootloader is a system reset, so it is ok that it destroys the data for the running program. After copying the bootloader into RAM, it executes it.
The Question
Obviously, there are some possible issues with compiling an overlaid program and making sure all the references line up. In order to mitigate issues that would crop up accessing bootloader code from the normal program or accessing the normal .data or .bss from the bootloader, I have the following three lines in my linker script:
NOCROSSREFS(.bootloader .text);
NOCROSSREFS(.bootloader .data);
NOCROSSREFS(.bootloader .bss);
Now, whenever I have a cross-reference between the .text (which might be erased by the bootloader), .data (which the bootloader lives on top of), or .bss (again, the bootloader lives on top of it) and the .bootloader section, a linker error will be issued.
This worked great until I actually started writing code. Part of my code includes some struct copying and other such things. Apparently, the compiler decided to do this (bootloader_ functions live in the .bootloader section):
20000340 <bootloader_usb_endp0_handler>:
...
20000398: 1c11 adds r1, r2, #0
2000039a: 1c1a adds r2, r3, #0
2000039c: f000 f8e0 bl 20000560 <__memcpy_veneer>
...
20000560 <__memcpy_veneer>:
20000560: b401 push {r0}
20000562: 4802 ldr r0, [pc, #8] ; (2000056c <__memcpy_veneer+0xc>)
20000564: 4684 mov ip, r0
20000566: bc01 pop {r0}
20000568: 4760 bx ip
2000056a: bf00 nop
2000056c: 00000869 andeq r0, r0, r9, ror #16
In my chip's architecture, addresses 0x20000000 until 0xE000000 or so are located in SRAM (I only have 4Kb of that actually on the device). Any address below 0x1fffffc00 is located in the flash section.
The problem is this: In my function located in my .bootloader section (bootloader_usb_endp0_handler), a reference to memcpy (2000039c, 20000562, and 2000056c) was inserted because I'm doing a struct copy among other things. The reference it put to memcpy is at address 0x00000869, which lives in the flash...which could be erased.
The particular code is:
static setup_t last_setup;
last_setup = *((setup_t*)(bdt->addr));
Where setup_t is a two-word struct and bdt->addr is a void* which I know points to data that looks like a setup_t. This line generates the call to memcpy.
My question is: I'd really like to keep my struct copying. It is convenient. Is there any way to tell the compiler to place the memcpy into a specific section other than the default? I want that to happen just for the bootloader module. All the other code can have its memcpy...I just want a special copy for my bootloader module that lives inside .bootloader.
If this simply isn't possible, I'm going to either write the entire bootloader in assembly (not as fun) or go the route of compiling the bootloader separately, including it as a fairly long hexadecimal string in the end program, and executing the string after copying it to RAM. The string route doesn't appeal to me very well because it is breakable and difficult to implement...so any other suggestions would also be appreciated.
The compilation line for this module is:
arm-none-eabi-gcc -Wall -fno-common -mthumb -mcpu=cortex-m0plus -ffreestanding -fno-builtin -nodefaultlibs -nostdlib -O0 -c src/bootloader.c -o obj/bootloader.o
Normally the optimization would be -Os, but I was trying to get rid of the memcpy...it didn't work.
Also, I've looked at this question and it didn't fix the problem.
I have never tried this, but you might get away with using the EXTERN() linker script directive to force-load your newlib memcpy() twice: first in the bootloader link stage into your desired section, then undefining it and linking it a second time into your "normal" code.
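A different route, sketched below under several assumptions, is to avoid the library memcpy in the bootloader altogether by giving the bootloader its own copy routine that lives in .bootloader and using it instead of struct assignment. GCC documents that it may emit calls to memcpy, memmove, memset and memcmp even with -ffreestanding, which is why the struct assignment turns into a call. The names BOOTLOADER_CODE, bootloader_copy and bootloader_save_setup are hypothetical, and the setup_t layout is a stand-in for the two-word struct from the question. The volatile accesses are there so that the compiler has to perform each byte access itself rather than substitute a library call for the loop; this should be confirmed in the disassembly for the actual toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Place code in .bootloader only when building for the ARM target with GCC;
 * on a host build the attribute is dropped so the sketch still compiles. */
#if defined(__GNUC__) && defined(__arm__)
#define BOOTLOADER_CODE __attribute__((section(".bootloader"), noinline))
#else
#define BOOTLOADER_CODE
#endif

/* Stand-in for the question's two-word setup_t; field names are assumptions. */
typedef struct {
    uint32_t word0;
    uint32_t word1;
} setup_t;

/* Byte-wise copy that lives in the bootloader's own section. */
BOOTLOADER_CODE void bootloader_copy(void *dst, const void *src, size_t len)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;

    while (len--) {
        *d++ = *s++;
    }
}

/* Replacement for "last_setup = *((setup_t*)(bdt->addr));" */
BOOTLOADER_CODE void bootloader_save_setup(setup_t *dst, const void *src)
{
    bootloader_copy(dst, src, sizeof *dst);
}
```

With this in place the NOCROSSREFS checks in the linker script remain useful: if the compiler still sneaks in a reference to the flash-resident memcpy, the link fails instead of producing a bootloader that can erase its own dependency.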

How to read the relocation records of an object file

I'm trying to understand the linking stage of C toolchain. I wrote a sample program and dissected the resulting object file. While this helped me to get a better understanding of the processes involved, there are some things which remain unclear to me.
Here are:
My (blazingly simple) sample program
Relevant parts of the object disassembly
The objects symbol table
The objects relocation table
Part 1: Handling of initialized variables.
Is it correct that these relocation table entries...
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000002b dir32 .data
00000035 dir32 .data
0000003f dir32 .data
... are basically telling the linker that the addresses stored at offsets 2b, 35 and 3f from .text are not absolute addresses, but relative addresses (= offsets) in relation to .data? It is my understanding that this enables the linker to
either convert these relative addresses to absolute addresses for creation of a non-relocatable object file,
or just adjust them accordingly in case the object file gets linked with some other object file.
Part 2: Handling of uninitialized variables.
I don't understand why uninitialized variables are handled so differently from initialized variables. Why are the register addresses stored in the opcode
equal for all the uninitialized variables (0x0, 0x0 and 0x0), while being
different for all the initialized variables (0x0, 0x4 and 0x8)?
Also, the value field of their relocation table entries is entirely unclear to me. I would have expected the .bss section to be referenced there.
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000d dir32 _var1_zeroed-0x00000004
00000017 dir32 _var2_zeroed-0x00000004
00000021 dir32 _var3_zeroed-0x00000004
... are basically telling the linker, that the addresses stored at offset ...
No, the linker is no longer involved with this. The relocation tables tell the loader, the part of the operating system that's responsible for loading the executable image into memory, about the addresses.
The linker builds the executable image based on the assumption that everything is ideal and the image can be loaded at the intended address. If that's the case then everything is hunky-dory, nothing needs to be done. If there's a conflict however, the virtual address space is already in use by something else, then the image needs to be relocated at a different address.
That requires addresses to be patched, the offset between the ideal and the actual load address needs to be added. So if the .data section ends up at another address then addresses .text+0x2b, .text+0x35, etcetera, must be changed. No different for the uninitialized variables, the linker already picked an address for them but when _var1_zeroed-0x00000004 ends up at another address then .text+0x0d, .text+0x17, etcetera, need to be changed.
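The patching itself is simple arithmetic, and it is the same whether a linker resolves the relocation records of an object file or a loader rebases an already linked image: the 32-bit field at the recorded offset already holds an addend, and the address of the target is added to it. Here is a minimal C sketch of one such fixup for a little-endian target. The function names and the example .data address are assumptions for illustration, not something taken from a particular toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a possibly unaligned position. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value to a possibly unaligned position. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: apply one 32-bit absolute ("dir32"-style) fixup.
 * The field at 'offset' holds an addend; the target address is added to it. */
void apply_abs32_fixup(uint8_t *section, size_t section_size,
                       size_t offset, uint32_t target_address)
{
    assert(offset + 4 <= section_size);
    write_le32(section + offset, read_le32(section + offset) + target_address);
}
```

Taking the second initialized variable from the question as an example: the field at .text+0x35 holds 0x4, so if .data were placed at 0x00403000 (an assumed address), the fixup would turn the field into 0x00403004.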

where should the .bss section of ELF file take in memory?

It is known that the .bss section is not stored on disk, but the .bss section in memory should be initialized to zero. Where is it placed in memory? Is there any information about this in the ELF header? Is the .bss section likely to appear next to the data section, or somewhere else?
The BSS is between the data and the heap, as detailed in this marvelous article.
You can find out the size of each section using size:
cnicutar@lemon:~$ size try
   text    data     bss     dec     hex filename
   1108     496      16    1620     654 try
To know where the bss segment will be in memory, it is sufficient to run readelf -S program, and check the Addr column on the .bss row.
In most cases, you will also see that the initialized data section (.data) comes immediately before. That is, you will see that Addr+Size of the .data section matches the starting address of the .bss section.
However, that is not always necessarily the case. These are historical conventions, and the ELF specification (to be read alongside the platform specific supplement, for instance Chapter 5 in the one covering 32-bit x86 machines) allows for much more sophisticated configurations, and not all of them are supported by Linux.
For instance, the section may not be called .bss at all. The only 2 properties that make a BSS section such are:
The section is marked with SHT_NOBITS (that is, it takes space in memory but none on the storage) which shows up as NOBITS in readelf's output.
It maps to a loadable (PT_LOAD), readable (PF_R), and writeable (PF_W) segment. Such a segment is also shorter on storage than it is in memory (p_filesz < p_memsz).
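These two properties can be written down as predicates. The C sketch below redefines the handful of ELF constants it needs (SHT_NOBITS = 8, PT_LOAD = 1, PF_W = 2, PF_R = 4) under an SK_ prefix so that it does not depend on <elf.h>. The struct and function names are hypothetical, and the struct carries only the program header fields the check needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the ELF specification, redefined to avoid needing <elf.h>. */
#define SK_SHT_NOBITS 8u
#define SK_PT_LOAD    1u
#define SK_PF_W       2u
#define SK_PF_R       4u

/* Hypothetical, trimmed-down view of a program header entry. */
struct sk_segment {
    uint32_t p_type;
    uint32_t p_flags;
    uint32_t p_filesz;
    uint32_t p_memsz;
};

/* Property 1: the section occupies memory but no bytes in the file. */
bool sk_section_is_nobits(uint32_t sh_type)
{
    return sh_type == SK_SHT_NOBITS;
}

/* Property 2: loadable, readable, writable, and larger in memory than on disk. */
bool sk_segment_can_hold_bss(const struct sk_segment *seg)
{
    uint32_t rw = SK_PF_R | SK_PF_W;
    return seg->p_type == SK_PT_LOAD &&
           (seg->p_flags & rw) == rw &&
           seg->p_filesz < seg->p_memsz;
}

/* Number of bytes the loader has to zero-fill at the end of the segment. */
uint32_t sk_segment_zero_fill(const struct sk_segment *seg)
{
    assert(seg->p_memsz >= seg->p_filesz);
    return seg->p_memsz - seg->p_filesz;
}
```

For a read/write PT_LOAD segment with p_filesz = 0x108 and p_memsz = 0x118 (illustrative values), the check passes and the zero-filled tail is 0x10 bytes.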
You can have multiple BSS sections: PowerPC executables may have .sbss and .sbss2 for uninitialized data variables.
Finally, the BSS section is not necessarily adjacent to the data section or the heap. If you check the Linux kernel (more specifically, the load_elf_binary function) you can see that the BSS sections (or more precisely, the segments they map to) may even be interleaved with code and initialized data. The Linux kernel manages to sort that out.

which part of ELF file must be loaded into the memory?

An ELF file for executables has a program (segment) header and a section header, which can be seen through readelf -a, here is an example:
The two pictures above are the section header and the program (segment) header, respectively. It can be seen that a segment is composed of several sections and is used for loading the program into memory.
Is it only necessary for the .text, .rodata, .data, and .bss sections to be loaded into memory?
Are all of the other sections in the segment (e.g. .ctors, .dtors, .jcr in the 3rd segment) used for aligning?
Sections and segments are two completely different concepts. Sections pertain to the semantics of the data stored there (i.e. what it will be used for) and are actually irrelevant once a program or shared library is linked, except for debugging purposes. You could even remove the section headers entirely (or overwrite them with random garbage) and a program would still work.
Segments (i.e. program header load directives) are what the kernel and/or dynamic linker actually look at when loading a program. For example, in your case you have two load directives. The first one causes the first 4k (1 page) of the file to be mapped at address 0x08048000, and indicates that only the first 0x4b8 bytes of this mapping are actually to be used (the rest is alignment). The second causes the first 8k (2 pages) of the file to be mapped at address 0x08049000. The vast majority of that is alignment. The first 0xf14 bytes are not part of the load directive (just alignment) and will be wasted. Beginning at 0x08049f14, 0x108 bytes mapped from the file are actually used, and another 0x10 bytes (to reach the MemSize of 0x118) are zero-filled by the loader (kernel or dynamic linker). This spans up to 0x0804a02c (in the second mapped page). The rest of the second mapped page is unused/wasted (but malloc might be able to recover it for use as part of the heap).
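The numbers for the second load directive can be reproduced with a few lines of arithmetic. The C sketch below assumes 4 KiB pages, as the answer does, and takes the segment's virtual address, file size and memory size as inputs. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x1000u   /* 4 KiB pages, as assumed in the answer */

/* Hypothetical summary of how one PT_LOAD entry lands in memory. */
struct load_layout {
    uint32_t map_start;  /* page-aligned start of the mapping */
    uint32_t file_end;   /* first address past the bytes taken from the file */
    uint32_t mem_end;    /* first address past the zero-filled tail */
    uint32_t map_end;    /* page-aligned end of the mapping */
    uint32_t pages;      /* number of pages mapped */
};

struct load_layout layout_for_segment(uint32_t vaddr, uint32_t filesz,
                                      uint32_t memsz)
{
    struct load_layout l;

    assert(memsz >= filesz);
    l.map_start = vaddr & ~(PAGE_SIZE_BYTES - 1u);
    l.file_end  = vaddr + filesz;
    l.mem_end   = vaddr + memsz;
    l.map_end   = (l.mem_end + PAGE_SIZE_BYTES - 1u) & ~(PAGE_SIZE_BYTES - 1u);
    l.pages     = (l.map_end - l.map_start) / PAGE_SIZE_BYTES;
    return l;
}
```

For vaddr 0x08049f14, file size 0x108 and memory size 0x118, the mapping starts at 0x08049000, the file-backed bytes end at 0x0804a01c, the zero-filled tail ends at 0x0804a02c, and the mapping covers 2 pages up to 0x0804b000, matching the walk-through above.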
Finally, while the section headers will not be used at all, the contents of many different sections may be used by your program while it's running. Note that the address ranges of .ctors and .dtors lie in the beginning of the second load mapping, so they are mapped and accessible by the program at runtime (the runtime startup/exit code will use them to run global constructors and destructors, if C++ or "GNU C" code with ctor/dtor attribute was used). Also note that .data starts at address 0x0804a00c, in the second mapped page. This allows the first page to be protected read-only after relocations are applied (the RELRO directive in the program header).
