Virtual and physical addresses of sections in ELF files

How does objdump compute the physical address (LMA) of elf sections? As far as I can tell, elf section headers only contain the virtual address (VMA) of sections [1].
Usually, VMA and LMA are the same. But for initialized data sections (.data), the VMA is the RAM location of the variables and LMA is the ROM location where the initial values are located. Crt0 is responsible for copying the initial values into RAM before main() is called. For example:
$ objdump -h my.elf
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0003c3d0  00080000  00080000  00010000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  5 .data         000008d0  40000000  000d08d4  00060000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
-Tom
[1] http://www.ouah.org/RevEng/x430.htm

I found this about LMA:
http://www-zeuthen.desy.de/dv/documentation/unixguide/infohtml/binutils/docs/ld/Basic-Script-Concepts.html#Basic-Script-Concepts
The important part is the following:
Every loadable or allocatable output section has two addresses. The first is the VMA, or virtual memory address. This is the address the section will have when the output file is run. The second is the LMA, or load memory address. This is the address at which the section will be loaded. In most cases the two addresses will be the same. An example of when they might be different is when a data section is loaded into ROM, and then copied into RAM when the program starts up (this technique is often used to initialize global variables in a ROM based system). In this case the ROM address would be the LMA, and the RAM address would be the VMA

The section header contains a single address. It looks to me like the address in the section header is the VMA. The program headers contain the mapping of VMA to LMA.
For example, here's a snippet of what "objdump -x" shows for my elf file:
Program Header:
<a few lines removed>
    LOAD off    0x00000240 vaddr 0x00000048 paddr 0x0000018c align 2**0
         filesz 0x00000000 memsz 0x00000004 flags rw-
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
<a few lines removed>
  3 .bss          00000004  00000048  0000018c  00000240  2**1
                  ALLOC
So, .bss has a VMA of 0x48. If you look through the program headers, one entry has a "vaddr" of 0x48 and a paddr of 0x18c, which is the LMA.
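The lookup described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a transcription of what objdump does internally; the function name section_lma and the tuple layout are made up for the example, and the values are the ones from the .bss listing.

```python
def section_lma(section_vma, segments):
    """Return the LMA of a section from the PT_LOAD segment that contains it.

    segments is a list of (vaddr, paddr, memsz) tuples taken from the
    program headers. If no segment covers the section, assume LMA == VMA.
    """
    for vaddr, paddr, memsz in segments:
        if vaddr <= section_vma < vaddr + memsz:
            # Keep the section's offset within the segment, but measure it
            # from the segment's physical address instead of its virtual one.
            return paddr + (section_vma - vaddr)
    return section_vma


# The LOAD entry shown above: vaddr 0x48, paddr 0x18c, memsz 0x4.
segments = [(0x48, 0x18C, 0x4)]
print(hex(section_lma(0x48, segments)))  # 0x18c
```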

The physical address is an attribute of an ELF segment. An ELF section has no such attribute. It is possible, though, to map a section to the memory of the segment that contains it.
The meaning of the physical address is architecture dependent and may vary between OSes and hardware platforms.
From this link:
p_paddr - On systems for which physical addressing is relevant, this member is reserved for the segment’s physical address. Because System V ignores physical addressing for application programs, this member has unspecified contents for executable files and shared objects.
It looks like your Crt0 makes some assumption about the meaning of the physical address stored in the ELF file. This assumption may hold for your particular system, but it is not guaranteed on another.

Related

Linker Inserts Unnecessary Opcode Padding

I've recently come across a minor issue when linking multiple object files for a Motorola 68000 based system (SEGA Mega Drive). When the input section of one object file ends and the next one begins, the linker fills memory with zeros so that the next object file begins on a four-byte boundary. The text below is a memory map output by the linker. As you can see, the .text output section contains three object files. The first two (main.o, swap.o) were written in C and compiled and assembled using m68k-elf-gcc. The third one (swap_asm.o) was hand-written in 68000 assembly and assembled using vasm.
The function at the beginning of swap.o would normally start at address 0x0000001E. But the linker is *fill*ing the space before swap.o with two bytes, specifically 0x0000, so swap.o starts at 0x00000020. In contrast, swap_asm.o is not being aligned and begins at an address that is not four-byte aligned, 0x00000036. Is there a way to make the linker not add any padding and just start swap.o right away? I understand there are a few workarounds, like filling the space with a NOP, but I was wondering if there is a way to simply not do a *fill*.
.text           0x00000000       0x4c
 main.o(.text)
 .text          0x00000000       0x1e main.o
                0x00000000                main
 swap.o(.text)
 *fill*         0x0000001e        0x2
 .text          0x00000020       0x16 swap.o
                0x00000020                swap
 swap_asm.o(.text)
 .text          0x00000036       0x16 swap_asm.o
                0x00000036                swap_asm
The 68000 processor requires instructions to be aligned, and the same requirement holds for data. Apart from this CPU requirement, which cannot be bypassed, the linker also follows a script that may require sections to have a certain alignment (normally to satisfy the CPU requirement).
The linker script can be tweaked, but changing the alignment may make the linker produce incorrect code, for the reason given above. In any case, it is something you can try and test.
The Motorola 68000 (including the one in the Mega Drive, with its 16-bit data bus) raises an address error exception when a 16-bit transfer is requested at an odd address. The same happens for a 32-bit transfer at an odd address. Later members of the family relax this for data: the 68020 and its successors handle misaligned data accesses by making several bus accesses, like the Intel processors, although instructions must still be word-aligned.
So I found my answer. When the assembler detects that long (32-bit) data is being dealt with in an assembly file, it automatically aligns the input section on a 4-byte boundary. You can override this using SUBALIGN in a linker script. Here's my linker script, which aligns input sections on a 2-byte boundary.
MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x00400000
}
SECTIONS
{
    .text : SUBALIGN(0x2) {
        *(.header)
        *(.boot)
        obj/main.o(.text)
        *(.text)
        *(.isr)
        *(.vdp)
    } > rom
    .data : { *(.data) } > rom
    .bss : { *(.bss) } > rom
}
New linker map:
.text           0x00000000       0x4a
 main.o(.text)
 .text          0x00000000       0x1e main.o
                0x00000000                main
 swap.o(.text)
 .text          0x0000001e       0x14 swap.o
                0x0000001e                swap
 swap_asm.o(.text)
 .text          0x00000034       0x16 swap_asm.o
                0x00000034                swap_asm

.text section address range of position independent executable

I want the address of the .text section of a position independent executable. Using readelf -S:
Name              Type             Address           Offset
Size              EntSize          Flags  Link  Info  Align
.text             PROGBITS         0000000000002700  00002700
0000000000001672  0000000000000000  AX       0     0     16
I learn that it will begin 0x2700 bytes past where the executable was loaded into memory. But how can I get the load address of the executable?
Is there any other way to get the .text section address range during runtime (from the running program)?
Yes: you need to use dl_iterate_phdr and use info->dlpi_addr to locate the PIE binary in memory at runtime. The very first call to your callback will be for the main executable.
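Once the load base is known, the address range follows by simple addition. The Python sketch below only shows that arithmetic; it does not call dl_iterate_phdr itself. The function name runtime_range is made up, and the base value is a hypothetical stand-in for whatever info->dlpi_addr reports in the running process. The section address and size are the ones from the readelf output in the question.

```python
def runtime_range(load_base, section_addr, section_size):
    """Return (start, end) of a section in memory, end exclusive.

    load_base is the value reported as dlpi_addr for the main executable;
    section_addr and section_size come from `readelf -S`.
    """
    start = load_base + section_addr
    return start, start + section_size


# Hypothetical load base; a real PIE gets a different one on each run
# when address space layout randomization is enabled.
base = 0x555555554000
start, end = runtime_range(base, 0x2700, 0x1672)
print(f".text occupies [{start:#x}, {end:#x})")
```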

What does > region1 AT > region2 mean in an LD linker script?

I'm trying to understand a third party linker script.
At the beginning of the script it defines two memory regions (using MEMORY {...}) called iram and dram.
Then there are a few sections defined that have the following syntax:
.data{
...
} > dram AT > iram
I know that > dram at the end means to position that section (.data in this case) in the dram region. However I don't understand what the "AT > iram" means.
The dram part of the .data definition in your example specifies the virtual memory address (VMA) of the .data section, whereas the iram part specifies the load memory address (LMA).
The VMA is the address the section will have when the program is run. The LMA is the address at which the section is loaded. For example, this can be used to keep the initial values of global variables in non-volatile memory and copy them to RAM during program startup.
More information can also be found in the manual for the GNU linker ld: https://sourceware.org/binutils/docs/ld/Output-Section-Attributes.html#Output-Section-Attributes
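The copy from LMA to VMA is normally done by a few lines of startup code. The Python sketch below simulates it on a flat bytearray standing in for the address space, purely to illustrate the idea. The function name, the addresses, and the sizes are all hypothetical and are not taken from the script in the question.

```python
def copy_initial_data(memory, lma, vma, size):
    """Copy a section's initial contents from its load address to its run address."""
    memory[vma:vma + size] = memory[lma:lma + size]


# Hypothetical layout: the initial values were loaded at 0x08 (the LMA, in
# iram), while the program expects its variables at 0x20 (the VMA, in dram).
memory = bytearray(0x40)
DATA_LMA, DATA_VMA, DATA_SIZE = 0x08, 0x20, 4
memory[DATA_LMA:DATA_LMA + DATA_SIZE] = b"\x01\x02\x03\x04"

copy_initial_data(memory, DATA_LMA, DATA_VMA, DATA_SIZE)
print(memory[DATA_VMA:DATA_VMA + DATA_SIZE].hex())  # 01020304
```

In a real system the three values would come from symbols that the linker script defines around the .data section.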

Unexpected linker section output location

I'm trying to use the ld command on Linux on an assembly file for a kernel. For it to boot with GRUB, it needs to be loaded above the 1 MiB address. So my linker script places the text at address 0x00100000.
Here's the linker script I'm using:
SECTIONS {
    .text 0x00100000 :{
        *(.text)
    }
    textEnd = .;
    .data :{
        *(.data)
        *(.rodata)
    }
    dataEnd = .;
    .bss :{
        *(.common)
        *(.bss)
    }
    bssEnd = .;
}
My question is about the output file. When I look at the binary of the file, text section starts at 0x1000. When I change the text location in the script and use addresses lower than 0x1000, such as 0x500, the text will start there. But whenever I go above 0x1000, it rounds it (0x2500 will put the text at 0x500).
When I specify that the text should be at 0x100000, shouldn't it be there in the output file? Or is there another part of the binary that specifies that there's more moving to do? I'm asking because there's a problem booting my kernel, but for now I'm simply trying to understand the linker output.
You are referring to two different address spaces. The addresses you refer to within the linked file (such as 0x1000 and 0x500) are just the file offsets. The addresses specified in the linker script, such as 0x00100000, are with respect to computer memory (i.e. RAM).
In the case of the linker script, the linker is being told that the .text section of the binary/executable file should be loaded at the 1MiB point in RAM (i.e. 0x00100000). This has less to do with the layout of the file output by the linker and more to do with how the file is to be loaded when executed.
The section locations in the actual file have to do with alignment. That is, your linker appears to be aligning the first section at a 4096-byte boundary. If, for example, each section is less than 4096 bytes in size and each placed at 4096-byte boundary, their respective offsets in the file would be 0x1000, 0x2000, 0x3000, etc. By default, this alignment would also hold once the file is loaded into RAM such that the previous example would yield sections located at 0x00100000, 0x00101000, 0x00102000, etc.
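The offsets you observed fit this pattern: the linker keeps the file offset of a loadable segment congruent to its memory address modulo the page size (0x1000 here). An address of 0x500 therefore stays at offset 0x500, 0x2500 also lands at offset 0x500, and 0x100000 lands at 0x1000 because offset 0 is already taken by the ELF headers. The 'ALIGN' function can be used if you want to specify the alignment manually.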
For a short & sweet explanation of the linker (describing all of the above in more detail) I recommend:
http://www.math.utah.edu/docs/info/ld_3.html
or
http://sourceware.org/binutils/docs-2.15/ld/Scripts.html
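One way to read the offsets reported in the question is the ELF rule that a loadable segment's file offset and memory address must agree modulo the page size. The Python sketch below applies that rule; it is a simplified model, not the linker's actual algorithm. The function name is invented, the page size of 0x1000 is an assumption, and HEADERS_END is a hypothetical size for the headers at the start of the file.

```python
PAGE_SIZE = 0x1000  # assumed page size


def file_offset_for(vaddr, headers_end, page_size=PAGE_SIZE):
    """Smallest file offset >= headers_end that is congruent to vaddr
    modulo page_size."""
    offset = headers_end - headers_end % page_size + vaddr % page_size
    if offset < headers_end:
        offset += page_size
    return offset


HEADERS_END = 0x100  # hypothetical: space taken by the ELF and program headers

for addr in (0x500, 0x2500, 0x100000):
    print(f"address {addr:#x} -> file offset {file_offset_for(addr, HEADERS_END):#x}")
```

Under these assumptions the three addresses map to file offsets 0x500, 0x500 and 0x1000, which is what the question describes.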

Where is the .bss section of an ELF file placed in memory?

It is known that the .bss section is not stored on disk, but that it must be initialized to zero in memory. Where is it placed in memory? Is there any information about this in the ELF header? Is the .bss section likely to appear next to the data section, or somewhere else?
The BSS is between the data and the heap, as detailed in this marvelous article.
You can find out the size of each section using size:
cnicutar@lemon:~$ size try
   text    data     bss     dec     hex filename
   1108     496      16    1620     654 try
To know where the bss segment will be in memory, it is sufficient to run readelf -S program, and check the Addr column on the .bss row.
In most cases, you will also see that the initialized data section (.data) comes immediately before. That is, you will see that Addr+Size of the .data section matches the starting address of the .bss section.
However, that is not always necessarily the case. These are historical conventions, and the ELF specification (to be read alongside the platform specific supplement, for instance Chapter 5 in the one covering 32-bit x86 machines) allows for much more sophisticated configurations, and not all of them are supported by Linux.
For instance, the section may not be called .bss at all. Only two properties make a section a BSS section:
The section is marked with SHT_NOBITS (that is, it takes space in memory but none on the storage) which shows up as NOBITS in readelf's output.
It maps to a loadable (PT_LOAD), readable (PF_R), and writeable (PF_W) segment. Such a segment is also shorter on storage than it is in memory (p_filesz < p_memsz).
You can have multiple BSS sections: PowerPC executables may have .sbss and .sbss2 for uninitialized data variables.
Finally, the BSS section is not necessarily adjacent to the data section or the heap. If you check the Linux kernel (in particular the load_elf_binary function), you can see that the BSS sections (or, more precisely, the segments they map to) may even be interleaved with code and initialized data. The Linux kernel manages to sort that out.
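The p_filesz < p_memsz property also tells you where the zero-initialized memory of a segment lies. A minimal Python sketch of that calculation follows; the function name is made up and the example values are hypothetical rather than taken from a real binary.

```python
def zero_fill_range(p_vaddr, p_filesz, p_memsz):
    """Return the (start, end) addresses, end exclusive, of the part of a
    PT_LOAD segment that has no bytes in the file and must be zero-filled
    by the loader. Returns None if the segment is fully backed by the file.
    """
    if p_memsz <= p_filesz:
        return None
    return p_vaddr + p_filesz, p_vaddr + p_memsz


# Hypothetical writable segment: 0x230 bytes of initialized data in the file,
# 0x250 bytes in memory, so the last 0x20 bytes are BSS.
start, end = zero_fill_range(0x601000, 0x230, 0x250)
print(f"zero-filled: [{start:#x}, {end:#x})")
```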
