What does > region1 AT > region2 mean in an LD linker script? - linker

I'm trying to understand a third party linker script.
At the beginning of the script it defines two memory regions (using MEMORY {...}) called iram and dram.
Then there are a few sections defined that have the following syntax:
.data{
...
} > dram AT > iram
I know that > dram at the end means to position that section (.data in this case) in the dram region. However I don't understand what the "AT > iram" means.

The dram part of the .data definition in your example specifies the virtual memory address (VMA) of the .data section, whereas the iram part specifies the load memory address (LMA).
The VMA is the address the section has when the program runs. The LMA is the address at which the section is placed when the program is loaded. For example, this can be used to keep the initial values of global variables in non-volatile memory, from which they are copied to RAM during program load.
More information can also be found in the manual for the GNU linker ld: https://sourceware.org/binutils/docs/ld/Output-Section-Attributes.html#Output-Section-Attributes
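As a rough illustration of that copy step, here is a minimal C sketch of what start-up code might do for a section declared with > dram AT > iram. The function name and its parameters are hypothetical; in a real project the three addresses would usually come from symbols defined in the linker script, and the start-up code shipped with a given toolchain may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the initial image of a section from its load address (LMA) to its
 * run-time address (VMA). The parameter names are hypothetical; a real
 * start-up file would take these addresses from linker-script symbols.
 */
void copy_data_section(const uint8_t *lma_start,
                       uint8_t *vma_start,
                       const uint8_t *vma_end)
{
    assert(vma_end >= vma_start);

    /* The section has the same size at both addresses. */
    size_t size = (size_t)(vma_end - vma_start);

    memcpy(vma_start, lma_start, size);
}
```

With the script from the question, lma_start would point into iram and vma_start/vma_end into dram.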

Related

Aligning a .data or .text section

I am building a bootloader for an ARM Cortex-A9 target. The output of the Microsoft linker is passed to a locator application (romimage) that locates the linker output section and builds the linear memory layout that runs in the target.
One part of the bootloader is the MMU table that must be located at a 64k aligned boundary. The table is defined in its own section:
AREA |.mmu|, DATA
global MmuTable
MmuTable
% 0x10000
end
There are no other modules that emit output to section .mmu. The linker command line includes these options:
-DRIVER -SECTION:.mmu,R,ALIGN=65536
But the symbol MmuTable is not aligned at a 64k boundary.
How can the Microsoft linker (version 11.00.50728.6) be instructed to place a section aligned to a 64k boundary?

RAM & ROM memory segments

There are different memory segments such as .bss, .text, .data, .rodata, ...
I have not been able to work out which of them are located in RAM and which in flash memory; many sources mention them under both RAM and ROM.
Please provide a clear explanation of which memory segments belong in RAM and which in flash.
ATMEL studio compiler
ATMEGA 32 platform
Hopefully you understand the typical uses of those section names: .text is code, .rodata is read-only data, .data is non-zero read/write data (for example, global variables that were initialized at compile time), and .bss is read/write data assumed to be zero (global variables that were not initialized).
.text and .rodata are read-only, so they can sit in flash or RAM and be used there. .data and .bss are read/write, so they need to be USED in RAM. But for that information to get into RAM, it has to live somewhere non-volatile while the power is off and then be copied over to RAM. So in a microcontroller the .data information lives in flash, and the bootstrap code needs to copy it to its home in RAM, where the code expects to find it. For .bss you don't need to store all those zeros; you just need the starting address and the number of bytes, and the bootstrap can zero that memory.
So all of them can and do live in both, but the typical use case is that the read-only ones are USED in flash and the read/write ones are USED in RAM.
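To make the section names concrete, here is a small C sketch showing where a GCC-style toolchain would typically place a few objects. The variable and function names are made up for illustration, and the actual placement depends on the compiler, the target, and the linker script, so treat the comments as the usual case rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Read-only data: typically placed in .rodata. */
const uint32_t lookup_table[4] = {1, 2, 4, 8};

/* Initialized, non-zero read/write data: typically placed in .data.
 * The value 5 is stored in non-volatile memory and copied to RAM at start-up. */
uint32_t boot_count = 5;

/* No initializer: typically placed in .bss and zeroed at start-up. */
uint32_t rx_buffer[16];

/* Machine code: typically placed in .text. */
uint32_t next_boot_count(void)
{
    return ++boot_count;
}

/* What the program may assume once the bootstrap has done its job. */
void check_startup_state(void)
{
    assert(boot_count == 5);    /* .data carries its initial value */
    assert(rx_buffer[0] == 0);  /* .bss reads as zero */
}
```

If check_startup_state fails on a bare-metal target, the usual suspect is start-up code that skipped the .data copy or the .bss clear.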
They are located wherever your project's linker script defines them to be located.
Some targets locate and execute code in ROM, while others may copy code from ROM to RAM on start-up and execute from RAM - usually for performance reasons on faster processors. As such .text and .rodata may be located in R/W or R/O memory. However .bss and .data cannot by definition be located in R/O memory.
ROM cannot be written to, but RAM can be written to.
ROM holds the BIOS (Basic Input/Output System), but RAM holds the running programs and the data they use.
ROM is much smaller than RAM.
ROM is non-volatile (permanent), but RAM is volatile.

Why is the entry point address in my executable 0x8048330? (0x330 being the offset of .text section)

I wrote a small program to add two integers and on using readelf -a executable_name it showed the entry point address in elf header as:
Entry point address: 0x8048330
How does my executable know this address beforehand even before loader loads it in memory?
elf_format.pdf says this member gives the virtual address to which the system first transfers control, thus starting the process. Can anyone please explain what is the meaning of this statement and what is the meaning of virtual address here?
Also let me know, from where the executable file gets the value of 0x8048330 as entry point address. Just for cross check I compiled another program and for that also, the entry point address remains the same value 0x8048330 (offset of .text section being 0x330 in both the cases).
For the first question:
The entry point you saw, 0x8048330, is a virtual memory address (as opposed to a physical memory address).
This means your executable doesn't have to know which physical address it will be mapped to once the loader has loaded it.
It doesn't even have access to physical memory.
To your program's process, the .text section always starts at 0x8048330; the system (OS and hardware) then maps that virtual address to physical memory at run-time.
Mapping and managing physical memory is a big topic; you can search on Google for more information.
For the second question:
I'm not sure which part confused you, so I'll try to cover them all:
Could more than one program have the same entry point?
Yes, there could be another program with the same entry point, 0x8048330. Because this address is virtual, the programs are mapped to different physical memory at run-time when you run them at the same time.
Is the entry point always 0x8048330?
No. Linux executables (traditionally, on 32-bit x86) start at 0x8048000, but the offset of the .text section depends on the length of the other sections. So it could be 0x8048034 or anything else.
Why does it always start at 0x8048000?
I think it's a historical thing: the designers picked this address for some unknown or even arbitrary reason. You can refer to this thread to see what lies below that address.
The entry address is set by the link editor, at the time when it creates the
executable. The loader maps the program file at the address(es) specified
by the ELF headers before transferring control to the entry address.
To use a concrete example, consider the following:
% file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, \
for GNU/Linux 2.6.15, not stripped
% readelf -e a.out
... snip ...
Elf file type is EXEC (Executable file)
Entry point 0x8048170
There are 6 program headers, starting at offset 52
Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD  0x000000 0x08048000 0x08048000 0x7cca6 0x7cca6 R E 0x1000
  LOAD  0x07cf98 0x080c5f98 0x080c5f98 0x00788 0x022fc RW  0x1000
... snip ...
The first program header specifies that the contents of the file at
file offset 0 should be mapped to virtual address 0x08048000. The
file and memory sizes for this segment are 0x7cca6 bytes. This
segment is to be mapped in readable and executable but not writable
(it contains the program's code).
The entry point address specified in the ELF header is 0x8048170, which
falls inside the region containing program code.
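To show that the entry address is simply a field the link editor wrote into the file, here is a minimal C sketch that pulls it out of the first 52 bytes of a 32-bit little-endian ELF file. The function names are my own; the offsets (e_entry at byte 24, header size 52) follow the ELF32 header layout. It assumes the caller has already read the header bytes into a buffer and does no validation beyond the magic number, class, and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF32_HEADER_SIZE 52u  /* size of the ELF32 file header */
#define E_ENTRY_OFFSET    24u  /* offset of e_entry within that header */

/* Decode a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Extract the entry point from the header of a 32-bit little-endian ELF
 * file. Returns 0 on success, -1 if the buffer does not look like one.
 */
int elf32_entry_point(const uint8_t *hdr, size_t len, uint32_t *entry)
{
    assert(hdr != NULL && entry != NULL);

    if (len < ELF32_HEADER_SIZE)
        return -1;
    if (memcmp(hdr, "\x7f" "ELF", 4) != 0)  /* magic number */
        return -1;
    if (hdr[4] != 1 || hdr[5] != 1)          /* ELFCLASS32, ELFDATA2LSB */
        return -1;

    *entry = read_le32(hdr + E_ENTRY_OFFSET);
    return 0;
}
```

For the a.out above, this would yield the same 0x8048170 that readelf prints, with no loader involved.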
The book "Linkers and Loaders" by John Levine is a good resource to consult on matters related to link editors and loaders.
About the virtual address question:
Normal userland applications work with virtual addresses, which means they don't access the physical memory space directly. The OS (with the help of special hardware support in the processor) maps these virtual addresses to physical addresses.
This way, the OS prevents applications from reading or writing other applications' memory or memory reserved for the OS. It also allows memory to be paged (using the hard disk as memory) in a way that is transparent to the application.

Unexpected linker section output location

I'm trying to use the ld command on Linux on an assembly file for a kernel. For it to boot with GRUB, it needs to be loaded above the 1 MiB address. So my linker script places the text at address 0x00100000.
Here's the linker script I'm using:
SECTIONS {
    .text 0x00100000 :{
        *(.text)
    }
    textEnd = .;
    .data :{
        *(.data)
        *(.rodata)
    }
    dataEnd = .;
    .bss :{
        *(.common)
        *(.bss)
    }
    bssEnd = .;
}
My question is about the output file. When I look at the binary of the file, text section starts at 0x1000. When I change the text location in the script and use addresses lower than 0x1000, such as 0x500, the text will start there. But whenever I go above 0x1000, it rounds it (0x2500 will put the text at 0x500).
When I specify that the text should be at 0x100000, shouldn't it be there in the output file? Or is there another part of the binary that specifies that there's more moving to do. I'm asking because there's a problem booting my kernel, but for now I'm just simply trying to understand the linker output.
You are referring to two different address spaces. The addresses you refer to within the linked file (such as 0x1000 and 0x500) are just the file offsets. The addresses specified in the linker script, such as 0x00100000, are with respect to computer memory (i.e. RAM).
In the case of the linker script, the linker is being told that the .text section of the binary/executable file should be loaded at the 1MiB point in RAM (i.e. 0x00100000). This has less to do with the layout of the file output by the linker and more to do with how the file is to be loaded when executed.
The section locations in the actual file have to do with alignment. That is, your linker appears to be aligning the first section at a 4096-byte boundary. If, for example, each section is less than 4096 bytes in size and each placed at 4096-byte boundary, their respective offsets in the file would be 0x1000, 0x2000, 0x3000, etc. By default, this alignment would also hold once the file is loaded into RAM such that the previous example would yield sections located at 0x00100000, 0x00101000, 0x00102000, etc.
And it appears that when you change the load location to a small enough number, the linker automatically changes the alignment. However, the 'ALIGN' function can be used if you wanted to manually specify the alignment.
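One way to read the numbers in the question: for a loadable ELF segment, the file offset and the virtual address are expected to be equal modulo the segment alignment. The C sketch below works through that arithmetic. The function name is hypothetical, and the 0x1000 alignment and the 0x74 bytes reserved for headers are assumptions chosen to match the question, not values taken from the asker's actual output file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest file offset >= min_offset that is congruent to vaddr modulo
 * align. align must be a power of two.
 */
uint32_t segment_file_offset(uint32_t vaddr, uint32_t min_offset, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uint32_t want = vaddr & (align - 1);        /* required offset within a page */
    uint32_t base = min_offset & ~(align - 1);  /* page that contains min_offset */
    uint32_t offset = base + want;

    if (offset < min_offset)
        offset += align;                        /* does not fit: use the next page */

    return offset;
}
```

Under those assumptions, an address of 0x100000 gives a file offset of 0x1000, 0x2500 gives 0x500, and 0x500 gives 0x500, which lines up with what the question describes.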
For a short & sweet explanation of the linker (describing all of the above in more detail) I recommend:
http://www.math.utah.edu/docs/info/ld_3.html
or
http://sourceware.org/binutils/docs-2.15/ld/Scripts.html

Where should the .bss section of an ELF file be placed in memory?

It is known that the .bss section is not stored on disk, but that it must be initialized to zero in memory. Where is it placed in memory, though? Is there any information about this in the ELF header? Is the .bss section likely to appear next to the data section, or somewhere else?
The BSS is between the data and the heap, as detailed in this marvelous article.
You can find out the size of each section using size:
cnicutar@lemon:~$ size try
   text    data     bss     dec     hex filename
   1108     496      16    1620     654 try
To know where the bss segment will be in memory, it is sufficient to run readelf -S program, and check the Addr column on the .bss row.
In most cases, you will also see that the initialized data section (.data) comes immediately before. That is, you will see that Addr+Size of the .data section matches the starting address of the .bss section.
However, that is not always necessarily the case. These are historical conventions, and the ELF specification (to be read alongside the platform specific supplement, for instance Chapter 5 in the one covering 32-bit x86 machines) allows for much more sophisticated configurations, and not all of them are supported by Linux.
For instance, the section may not be called .bss at all. The only two properties that make a section a BSS section are:
The section is marked with SHT_NOBITS (that is, it takes space in memory but none on the storage) which shows up as NOBITS in readelf's output.
It maps to a loadable (PT_LOAD), readable (PF_R), and writeable (PF_W) segment. Such a segment is also shorter on storage than it is in memory (p_filesz < p_memsz).
You can have multiple BSS sections: PowerPC executables may have .sbss and .sbss2 for uninitialized data variables.
Finally, the BSS section is not necessarily adjacent to the data section or the heap. If you check the Linux kernel (in particular the load_elf_binary function), you can see that the BSS sections (or, more precisely, the segments they map to) may even be interleaved with code and initialized data. The Linux kernel manages to sort that out.
