Aligning a .data or .text section - linker

I am building a bootloader for an ARM Cortex-A9 target. The output of the Microsoft linker is passed to a locator application (romimage) that locates the linker output section and builds the linear memory layout that runs in the target.
One part of the bootloader is the MMU table that must be located at a 64k aligned boundary. The table is defined in its own section:
AREA |.mmu|, DATA
global MmuTable
MmuTable
% 0x10000
end
There are not other modules that create an output to section .mmu. The linker command line includes these options:
-DRIVER -SECTION:.mmu,R,ALIGN=65536
But the symbol MmuTable is not aligned at a 64k boundary.
How can the Microsoft linker (version 11.00.50728.6) be instructed to place a section aligned to a 64k boundary?

Related

ELF binary analysis static vs dynamic. How does assembly code| instruction memory mapping changes?

./hello is a simple echo program in c.
according to objdump file-headers,
$ objdump -f ./hello
./hello: file format elf32-i386
architecture: i386, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x00000430
./hello has start address 0x430
Now loading this binary in gdb.
(gdb) file ./hello
Reading symbols from ./hello...(no debugging symbols found)...done.
(gdb) x/x _start
0x430 <_start>: 0x895eed31
(gdb) break _start
Breakpoint 1 at 0x430
(gdb) run
Starting program: /1/vooks/cdac/ditiss/proj/binaries/temp/hello
Breakpoint 1, 0x00400430 in _start ()
(gdb) x/x _start
0x400430 <_start>: 0x895eed31
(gdb)
in above output before setting breakpoint or running the binary, _start has address 0x430, but after running it, this address changes to 0x400430.
$ readelf -l ./hello | grep LOAD
LOAD 0x000000 0x00000000 0x00000000 0x007b4 0x007b4 R E 0x1000
LOAD 0x000eec 0x00001eec 0x00001eec 0x00130 0x00134 RW 0x1000
How this mapping happens?
Kindly Help.
Basically, after linking, ELF file format provide all necessary information for the loaders to load the program into memory and run it.
Each piece of code and data is placed within an offset inside a section, like data section, text section, etc. and access of specific function or global variable is done by adding the proper offset to the section start address.
Now, ELF file format also include program header table:
An executable or shared object file's program header table is an array
of structures, each describing a segment or other information that the
system needs to prepare the program for execution. An object file
segment contains one or more sections, as described in "Segment
Contents".
Those structures are then used by the OS loader to load the image to memory. The structure:
typedef struct {
Elf32_Word p_type;
Elf32_Off p_offset;
Elf32_Addr p_vaddr;
Elf32_Addr p_paddr;
Elf32_Word p_filesz;
Elf32_Word p_memsz;
Elf32_Word p_flags;
Elf32_Word p_align;
} Elf32_Phdr;
Note the following fields:
p_vaddr
The virtual address at which the first byte of the segment resides in memory
p_offset
The offset from the beginning of the file at which the first byte of
the segment resides.
And p_type
The kind of segment this array element describes or how to interpret the array element's information. Type values and their meanings are specified in Table 7-35.
From Table 7-35, note PT_LOAD:
Specifies a loadable segment, described by p_filesz and p_memsz. The
bytes from the file are mapped to the beginning of the memory segment.
If the segment's memory size (p_memsz) is larger than the file size
(p_filesz), the extra bytes are defined to hold the value 0 and to
follow the segment's initialized area. The file size can not be larger
than the memory size. Loadable segment entries in the program header
table appear in ascending order, sorted on the p_vaddr member.
So, by looking at those fields (and more) the loader can locate the segments (which can contain multiple sections) within the ELF file, and load them (PT_LOAD) into memory at a given virtual address.
Now, can a virtual address of an ELF file segment be changed at runtime (load time)? yes:
The virtual addresses in the program headers might not represent the
actual virtual addresses of the program's memory image. See "Program
Loading (Processor-Specific)".
So, program header contains the segments the OS loader will load into memory (loadable segments, which contains loadable sections), but the virtual addresses the loader puts them can differ from the addresses in the ELF file.
How?
To understand it, lets first read about Base Address
Executable and shared object files have a base address, which is the
lowest virtual address associated with the memory image of the
program's object file. One use of the base address is to relocate the
memory image of the program during dynamic linking.
An executable or shared object file's base address is calculated
during execution from three values: the memory load address, the
maximum page size, and the lowest virtual address of a program's
loadable segment. The virtual addresses in the program headers might
not represent the actual virtual addresses of the program's memory
image. See "Program Loading (Processor-Specific)".
So the practice is the following:
position-independent code. This code enables a segment's virtual
address change from one process to another, without invalidating
execution behavior.
Though the system chooses virtual addresses for individual processes,
it maintains the relative positions of the segments. Because
position-independent code uses relative addressing between segments,
the difference between virtual addresses in memory must match the
difference between virtual addresses in the file.
So by using relative addressing, (PIE- position independent executable) the actual placement can differ from the address in the ELF file.
From PeterCordes's answer:
0x400000 is the Linux default base address for loading PIE
executables with ASLR disabled (like GDB does by default).
So for your specific case (PIE executable in Linux) loader picks this base address.
Of course position independent is just an option. Program can be compiled without it, and than absolute addressing mode takes place, in which there must not be difference between segment address in ELF to the real memory address segment is loaded to:
Executable file segments typically contain absolute
code. For the process to execute correctly, the segments must reside
at the virtual addresses used to create the executable file. The
system uses the p_vaddr values unchanged as virtual addresses.
I would recommend you to take a look at the linux implementation of elf image loading here, and those two SO threads here and here.
Paragraphs takes from Oracle ELF documents (here and here)
You have a PIE executable (Position Independent Executable), so the file only contains an offset relative to the load address, which the OS chooses (and can randomize).
0x400000 is the Linux default base address for loading PIE executables with ASLR disabled (like GDB does by default).
If you compile with -m32 -fno-pie -no-pie hello.c to make a normal position dependent dynamically linked executable that can load from static locations with mov eax, [symname] instead of having to get EIP in a register and use that to do PC-relative addressing without x86-64 RIP-relative addressing modes, objdump -f will say:
./hello-32-nopie: file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048380 # hard-coded load address, can't be ASLRed
instead of
architecture: i386, flags 0x00000150: # some different flags set
HAS_SYMS, DYNAMIC, D_PAGED # different ELF type
start address 0x000003e0
In a "regular" position-dependent executable, the linker chooses that base address by default, and does embed it in the executable. The OS's program loader does not get to choose for ELF executables, only for ELF shared objects. A non-PIE executable can't be loaded at any other address, so only their libraries can be ASLRed, not the executable itself. This is why PIE executables were invented.
A non-PIE is allowed to embed absolute addresses without any metadata that would let an OS try to relocate it. Or it's allowed to contain hand-written asm that takes advantage of whatever it wants to about the numeric values of addresses.
A PIE is an ELF shared object with an entry-point. Until PIEs were invented, ELF shared objects were usually only used for shared libraries. See 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIEs.
They're quite inefficient for 32-bit code, I'd recommend not making 32-bit PIEs.
A static executable can't be a PIE, so gcc -static will create a non-PIE elf executable; it implies -no-pie. (So will linking with ld directly, because only gcc changed to making PIEs by default, gcc needs to pass -pie to ld to do that.)
So it's easy to understand why you wrote "static vs. dynamic" in your title, if the only dynamic executables you ever looked at were PIEs. But a dynamically linked non-PIE ELF executable is totally normal, and what you should be doing if you care about performance but for some reason want / need to make 32-bit executables.
Until the last couple years or so, normal binaries like /bin/ls in normal Linux distros were non-PIE dynamic executables. For x86-64 code, being PIE only slows them down by maybe 1% I think I've read. Slightly larger code for putting a static address in a register, or for indexing a static array. Nowhere near the amount of overhead that 32-bit code has for PIC/PIE.

Compiling and linking position independent code (PIC) for ARM M4

I'm working on making a bootloader and giving it the ability to update itself. This process involves copying the binary to a new location, jumping to it, and using it to flash the new bootloader in the original location. This is all being developed for an M4 processor in Eclipse, using the ARM GCC toolchain.
To do this, I've gathered that I need to compile as Position Independent Code (PIC).
I've searched around and found this excellent article, so when I added "-fPIC" to the ARM GCC compiler call I expected to see linker errors about GOT and PLT being missing
https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
In my linker script, I added these location to the .data section as follows:
.data : AT(__DATA_ROM)
{
. = ALIGN(4);
__DATA_RAM = .;
__data_start__ = .; /* Create a global symbol at data start. */
*(.got*) /* .got and .plt for position independent code */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
KEEP(*(.jcr*))
. = ALIGN(4);
__data_end__ = .; /* Define a global symbol at data end. */
} > m_data
However, this code fails to copy-up from ROM to RAM.
My next thought was that perhaps my linker needed to be aware it was linking a PIC executable. To find out, I added "--pic-executable" to the LD linker script call. However, the linker now generated sections for "interp", "dyn", "rel.dyn" and "hash". I tried throwing these into the data section as well, but got the following errors:
gcc-arm-none-eabi-4_9/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe:
could not find output section .hash
gcc-arm-none-eabi-4_9/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe:
final link failed: Nonrepresentable section on output
I assume this means the compiler didn't actually fill the ".hash" section with anything, so the link failed.
Am I going about this correctly? Is there anything else I need to add to get the compiler to do? Any help would be greatly appreciated.
Recently I researched extensively Cortex-M4 bootloading a PIC firmware image.
There are couple of things needed:
A very simple bootloader. It only needs to read 2 4-byte words from the flash location of the firmware image. First is stack pointer address and second is Reset_Handler address. Bootloader needs to jump to Reset_Handler. But as the firmware is relocated further in the flash, the Reset_Handler is actually a bit further. See this picture for clarification:
So, bootloader adds the correct offset before jumping to Reset_Handler of the firmware image. No other patching is done, but bootloader (in my solution at least) stores location and offset of the firmware image, calculates checksum and passes this information in registers for the firmware image to use.
Then we need modifications in firmware linker file to force the ISR vector table in the beginning of RAM. For Cortex-M4 and VTOR (Vector Offset Table Register) the ISR vector table needs to be aligned to 512 boundary. In the beginning of RAM it is there naturally. In linker script we also dedicate a position in RAM for Global Offset Table (GOT) for easier manipulation. Address ranges should be exported via linker script symbols also.
We need also C compiler options. Basically these:
-fpic
-mpic-register=r9
-msingle-pic-base
-mno-pic-data-is-text-relative
They go to C compiler only! This creates the Global Offset Table (GOT) accounting.
Finally we need dedicated and tedious assembly bootstrap routines in the firmware image project. These routines perform the normal startup task of setting up the C runtime environment, but they additionally also go and read ISR and GOT from flash and copy them to RAM. In RAM, the addresses of those tables pointing to flash are offsetted by the amount the firmware image is running apart from bootloader. Example pictures:
I spent last 6 months researching this subject and I have written an in-depth article about it here:
https://techblog.paalijarvi.fi/2022/01/16/portable-position-independent-code-pic-bootloader-and-firmware-for-arm-cortex-m0-and-cortex-m4/
By providing the link I hope I can contribute to StackOverflow which I used as a starting point for my research. I hope this helps some people in learning the intrinsics.
Booting a code and relocating involves many careful steps and initialization of configuration of RAM, SPI and other necessary peripherals.
I know U-Boot does the sequence you are trying to achieve. So a better starting is to walk through u-boot documentation and sources in, machine specific folders for the processor or board of interest.
For what it's worth, neither I nor NXP's technical support team were able to get their S32DS IDE to compile truly position independent code.
To this day, we have two bootloaders - one compiled for location A, the other is an intermediate for location B. Both of which are required for an update.
TI does this with the Tivaware bootloader. You can do it with linker gnu ld trickery:
.text 0x20000000 : AT (0x00000000)
{
_text = .;
KEEP(*(.isr_vector))
*(.text*)
*(.rodata*)
_etext = .;
}
Startup code that copies this code from Flash at 0x0 to SRAM at 0x2000_0000 is left as an exercise to the reader.

RAM & ROM memory segments

There are different memory segments such as .bss, .text, .data,.rodata,....
I've failed to know which of them locates in RAM and which of them locates in FLASH memory, many sources have mentioned them in both sections of (RAM & ROM) memories.
Please provide a fair explanation of the memory segments of RAM and flash.
ATMEL studio compiler
ATMEGA 32 platform
Hopefully you understand the typical uses of those section names. .text being code, .rodata read only data, .data being non-zero read/write data (global variables for example that have been initialized at compile time), .bss read/write data assumed to be zero, uninitialized. (global variables that were not initialized).
so .text and .rodata are read only so they can be in flash or ram and be used there. .data and .bss are read/write so they need to be USED in ram, but in order to put that information in ram it has to be in a non-volatile place when the power is off, then copied over to ram. So in a microcontroller the .data information will live in flash and the bootstrap code needs to copy that data to its home in ram where the code expects to find it. For .bss you dont need all those zeros you just need the starting address and number of bytes and the bootstrap can zero that memory.
so all of them can/do live in both. but the typical use case is the read only ones are USED in flash, and the read/write USED in ram.
They are located wherever your project's linker script defines them to be located.
Some targets locate and execute code in ROM, while others may copy code from ROM to RAM on start-up and execute from RAM - usually for performance reasons on faster processors. As such .text and .rodata may be located in R/W or R/O memory. However .bss and .data cannot by definition be located in R/O memory.
ROM cannot be written to, but RAM can be written to.
ROM holds the (BIOS) Basic Input / Output System, but RAM holds the programs running and the data used.
ROM is much smaller than RAM.
ROM is non-volatile (permanent), but RAM is volatile.

What does > region1 AT > region2 mean in an LD linker script?

I'm trying to understand a third party linker script.
At the beginning of the script it defines two memory (using MEMORY {...}) called iram and dram.
Then there are a few sections defined that have the following syntax:
.data{
...
} > dram AT > iram
I know that > dram at the end means to position that section (.data in this case) in the dram region. However I don't understand what the "AT > iram" means.
The dram part of the .data definition in your example specifies the virtual memory address (VMA) of the .data section, whereas the the iram part specifies the load memory address (LMA).
The VMA is the address the section will have when the program is run. The LMA is the address of the section when program is being loaded. As an example this can be used to provide initial values for global variables in non-volatile memory which are copied to RAM during program load.
More information can also be found in the manual for the GNU linker ld: https://sourceware.org/binutils/docs/ld/Output-Section-Attributes.html#Output-Section-Attributes

Unexpected linker section output location

I'm trying to use the ld command in linux on an assembly file for a kernel. For it to boot with grub, it needs to be after the 1Mb address. So my link script has the text going to the address 0x00100000.
Here's the linker script I'm using:
SECTIONS {
.text 0x00100000 :{
*(.text)
}
textEnd = .;
.data :{
*(.data)
*(.rodata)
}
dataEnd = .;
.bss :{
*(.common)
*(.bss)
}
bssEnd = .;
}
My question is about the output file. When I look at the binary of the file, text section starts at 0x1000. When I change the text location in the script and use addresses lower than 0x1000, such as 0x500, the text will start there. But whenever I go above 0x1000, it rounds it (0x2500 will put the text at 0x500).
When I specify that the text should be at 0x100000, shouldn't it be there in the output file? Or is there another part of the binary that specifies that there's more moving to do. I'm asking because there's a problem booting my kernel, but for now I'm just simply trying to understand the linker output.
You are referring to two different address spaces. The addresses you refer to within the linked file (such as 0x1000 and 0x500) are just the file offsets. The addresses specified in the linker script, such as 0x00100000, are with respect to computer memory (i.e. RAM).
In the case of the linker script, the linker is being told that the .text section of the binary/executable file should be loaded at the 1MiB point in RAM (i.e. 0x00100000). This has less to do with the layout of the file output by the linker and more to do with how the file is to be loaded when executed.
The section locations in the actual file have to do with alignment. That is, your linker appears to be aligning the first section at a 4096-byte boundary. If, for example, each section is less than 4096 bytes in size and each placed at 4096-byte boundary, their respective offsets in the file would be 0x1000, 0x2000, 0x3000, etc. By default, this alignment would also hold once the file is loaded into RAM such that the previous example would yield sections located at 0x00100000, 0x00101000, 0x00102000, etc.
And it appears that when you change the load location to a small enough number, the linker automatically changes the alignment. However, the 'ALIGN' function can be used if you wanted to manually specify the alignment.
For a short & sweet explanation of the linker (describing all of the above in more detail) I recommend:
http://www.math.utah.edu/docs/info/ld_3.html
or
http://sourceware.org/binutils/docs-2.15/ld/Scripts.html

Resources