How does the linker generate final virtual memory addresses? - c

Assume this simple code:
int main(){return 0;}
using objdump we can see the memory addresses:
0000000100003fa0 _main:
100003fa0: 55 pushq %rbp
100003fa1: 48 89 e5 movq %rsp, %rbp
100003fa4: 31 c0 xorl %eax, %eax
100003fa6: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
100003fad: 5d popq %rbp
100003fae: c3 retq
I know that 0x100003fa0 (as an example) is a virtual memory address.
The OS will map it to the physical memory when my program is loaded.
2 Questions:
1. Can the initial address of main be random? As it is virtual, I'm guessing it can be any value since virtual memory will take care of the rest, i.e. I can start literally from 0x1 (not 0x0, as that's reserved for null)?
2. How does the linker come up with the initial address? (Again, is the starting address random?)

Can the initial address of main be random? As it is virtual, I'm guessing it can be any value since virtual memory will take care of the rest, i.e. I can start literally from 0x1 (not 0x0, as that's reserved for null)?
The memory being virtual doesn’t mean that all of the virtual address space is yours to do with as you please. On most OSes, the executable modules (programs and libraries) need to use a subset of the address space or the loader will refuse to load them. That is highly platform-dependent of course.
So the address can be whatever you want as long as it is within the platform-specific range. I doubt that any platform would allow 0x1, though, if only because most platforms require code to be aligned to something larger than a byte, and the lowest part of the address space is usually kept unmapped so that null-pointer dereferences fault.
Furthermore, on many platforms the addresses are merely hints: if they can be used as-is, the loader doesn't have to relocate a given section in the binary. Otherwise, it'll move it to a block of the address space that is available. This is fairly common, e.g. on Windows, 32-bit binaries (e.g. DLLs) have preferred base addresses: if the preferred address is available, the loader can map the binary there without relocating it, which is faster. So, in the hypothetical case of the "initial address" being 0x1, assuming that alignment wasn't a problem, the code would just end up being moved elsewhere in the address space.
It's also worth noting that the "initial address" is a bit of an unspecific term. The binary modules that are loaded when an executable starts consist of something akin to sections. Each of the sections has its own base address, and possibly also internal (relative) addresses or address references that are tabulated. In addition, one or more of the executable sections will also have an "entry" address. Those addresses are used by the loader to execute initialization code (e.g. the DllMain concept on Windows) - such code is expected to return quickly. Eventually, one of the sections, that nothing else depends on, will have a suitably named entry point and will be the "actual" program you wrote - that one will keep running and return only when the program exits. At that point control may return to the loader, which will note that nothing else is to be executed, and the process will be torn down. The details of all this are highly platform-dependent - I'm only giving a high-level overview; it's not literally done that way on any particular platform.
How does the linker come up with the initial address? (again is the starting address random?)
The linker has no idea what to do by itself. When you link your program, the linker gets fed several more files that come with the platform itself. Those files are linker scripts and various static libraries needed to make the code able to start up. The linker scripts give the linker the constraints within which it can assign addresses. So it's all highly platform-specific again. The linker can either assign the addresses in a completely deterministic fashion, i.e. the same inputs always produce identical output, or it can produce output whose addresses the OS loader is free to shift to a randomly chosen base at load time (in a non-overlapping fashion, of course). The latter is what makes ASLR (address space layout randomization) possible.
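As a quick illustration (assuming a toolchain that produces PIE executables by default, which is the norm on current GNU/Linux distributions), you can watch ASLR from your own program by printing an address and running the binary several times; the value changes between runs, and stays fixed if you link with -no-pie where that option is supported:

#include <stdio.h>

int main(void) {
    /* With a PIE executable and ASLR enabled, this address differs from
       run to run; with a non-PIE executable it is the same every time. */
    printf("main is at %p\n", (void *)main);
    return 0;
}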

Not sure about Visual C but gcc (or rather ld) uses a linker script to determine final addresses. This can be specified using the -T option. Full details of ld linker scripts can be found at: https://sourceware.org/binutils/docs/ld/Scripts.html#Scripts.
Normally you don't need to play with this since your toolchain will be built either for the host machine or when cross compiling with the correct settings for a target.
For ASLR and .so files you will need to compile with -fPIC or -fPIE (position-independent code or position-independent executable). Your compiled code will then only contain offsets against some base address in memory. The base address gets set by the operating system loader before running your application.

Those addresses are base addresses and offsets. The ELF file contains special information on how to calculate the actual addresses when the program is loaded. It is a rather advanced topic, but you can read about how an ELF file is loaded and executed here: How do I load and execute an ELF binary executable manually? or https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/
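If you want to poke at that information yourself, here is a rough sketch (assuming a 64-bit ELF file and glibc's elf.h; minimal error handling, and not how a real loader works) that prints the entry point and the virtual addresses recorded for each loadable segment:

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;                               /* ELF header sits at offset 0 */
    if (fread(&eh, sizeof eh, 1, f) != 1) { perror("fread"); return 1; }
    printf("entry point: 0x%lx\n", (unsigned long)eh.e_entry);

    for (int i = 0; i < eh.e_phnum; i++) {       /* walk the program headers */
        Elf64_Phdr ph;
        fseek(f, (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize), SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1) break;
        if (ph.p_type == PT_LOAD)                /* segments the loader maps into memory */
            printf("PT_LOAD: vaddr 0x%lx, memsz 0x%lx\n",
                   (unsigned long)ph.p_vaddr, (unsigned long)ph.p_memsz);
    }
    fclose(f);
    return 0;
}

Running it on a non-PIE executable shows absolute addresses like the 0x400000-range ones above; on a PIE it shows small offsets that the loader later adds to a randomized base.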

Related

How can compilation occur without symbol resolution?

Here is my question. Suppose you want to compile the C code:
void some_function() {
write_string("Hello, World!\n");
}
For this example, I want to focus specifically on the string: "Hello, World!\n". My understanding is that the compiler will put the string into the .rodata section of an ELF file. A symbol referring to its location in the .rodata section is added to the symbol table, and that symbol is kept in the .text section as a placeholder for the location of the string.
Here is the problem. How can you leave a value like that unresolved in machine code? In x86, it should be easy enough for the linker to do a find-and-replace on the symbol when the location is known. However, there are many CPU architectures where an address cannot be encoded in its entirety in a single machine instruction. Therefore the value would have to be loaded in two stages, using separate machine instructions, and the linker would have to figure that out. It would have to be smart enough to manipulate the machine code with half the address in one place and half the address in another. Furthermore, somehow the ELF file has to represent this complex encoding scheme for the linker later on. How does this all work?
In most programs, this will be in a user-space application, so the kernel may load the .rodata section wherever it wants in memory. So it would seem that when the program is loaded, somehow, at runtime, the kernel loader would have to resolve all these symbols in the program prior to beginning execution. It would have to inject into the machine code where it put each section so they may be referenced appropriately. How does this work?
I have a feeling that my understanding and the above descriptions are wrong, or that I am missing something very important, because this does not seem right to me. Either that, or there is in fact logic to perform these complex functions within modern kernels and linkers. I am looking for some further explanation and understanding.
Compilation takes place, emitting something like this:
lea rdi, [rip+some_function.hello_world]
mov rax, [rip+some_function.write_string]
call rax
after the asm pass, we end up with something that disassembles to
lea rdi, [rip+00000000]
mov rax, [rip+00000000]
call rax
where the two 00000000 slots are filled as load-time fixups. The loader performs symbol resolution and fills in the 00000000 values with the correct values.
This is a simplification. In reality there's an extra layer of indirection called the global offset table, which is used (among other things) to put all the fixups right next to each other.
The innards of how this works are CPU- and OS-specific, but in general you don't really have to care exactly how it works, and it could change in the next release of the compiler (and has changed at least twice already). The loader understands fixups at a very generic level using a fixup table, and can deal with new ideas so long as they resolve to putting the (absolute or relative) address of a symbol at a given offset and size.
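To give a feel for what "filling in a fixup" amounts to, here is a heavily simplified sketch (the struct fixup record below is made up for illustration and is not the real ELF relocation format): the loader computes the resolved address and patches the bytes of the slot inside the mapped image, much like an R_*_PC32-style relocation.

#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified fixup record -- not the real ELF relocation layout. */
struct fixup {
    uint64_t offset;        /* where the 32-bit slot lives within the image */
    uint64_t symbol_addr;   /* resolved address of the referenced symbol    */
};

/* Patch one PC-relative 32-bit slot: value = target - (end of the slot). */
void apply_pc_relative_fixup(uint8_t *image_base, const struct fixup *fx) {
    uint64_t slot_addr = (uint64_t)(image_base + fx->offset);
    int32_t value = (int32_t)(fx->symbol_addr - (slot_addr + 4));
    memcpy(image_base + fx->offset, &value, sizeof value);   /* fill the 00000000 slot */
}

Real loaders drive this from the relocation tables stored in the file and support many relocation types, but the core operation is exactly this kind of byte patch.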
The Alpha processor had it kind of bad back in the day. Fixups had to be in between functions, and relative addressing could only be done with signed 16-bit offsets, so the fixups for functions were located immediately before or after each function, and presumably you got an error in the asm pass if the pointer didn't fit because the function was too big. I did come up with a clever sequence that would have fixed the problem on Alpha, but that was long after the platform was retired, and nobody cares anymore, so it never got implemented.
I remember the bad old days from before the loader could do good patchups. There once was a global (and I really do mean global) table of shared library load addresses, and the compiler emitted absolute addresses, so you had to rebuild your application if you changed a library, even though you used shared libraries. That just wasn't the brightest of ideas, and no wonder people kept statically linked emergency binaries lying around. Breaking libc wasn't fun.

Running address of an application, followed by heap and stack expansions

I have an m.c:
extern void a(char*);
int main(int ac, char **av){
static char string [] = "Hello , world!\n";
a(string);
}
and an a.c:
#include <unistd.h>
#include <string.h>
void a(char* s){
write(1, s, strlen(s));
}
I compile and build these as:
g++ -c -g -std=c++14 -MMD -MP -MF "m.o.d" -o m.o m.c
g++ -c -g -std=c++14 -MMD -MP -MF "a.o.d" -o a.o a.c
g++ -o linux m.o a.o -lm -lpthread -ldl
Then, I examine the executable, linux thus:
objdump -drwxCS -Mintel linux
The output of this on my Ubuntu 16.04.6 starts off with:
start address 0x0000000000400540
then, later, is the init section:
00000000004004c8 <_init>:
4004c8: 48 83 ec 08 sub rsp,0x8
Finally, is the fini section:
0000000000400704 <_fini>:
400704: 48 83 ec 08 sub rsp,0x8
400708: 48 83 c4 08 add rsp,0x8
40070c: c3 ret
The program references the string Hello , world!\n which is in the .data section, obtained by the command:
objdump -sj .data linux
Contents of section .data:
601030 00000000 00000000 00000000 00000000 ................
601040 48656c6c 6f202c20 776f726c 64210a00 Hello , world!..
All of this tells me that the executable has been created so as to be loaded at memory addresses starting from around 0x0000000000400540 (the start address) and that the program accesses data at memory addresses extending until at least 601040 (inside the .data section).
I base this on Chapter 7 of "Linkers & Loaders" by John R Levine, where he states:
A linker combines a set of input files into a single output file that
is ready to be loaded at a specific address.
My question is about the next line.
If, when the program is loaded, storage at that address isn't
available, the loader has to relocate the loaded program to reflect
the actual load address.
(1) Suppose I have another executable that is currently running on my machine already using the memory space between 400540 and 601040, how is it decided where to start my new executable linux?
(2) Related to this, in Chapter 4, it is stated:
..ELF objects...are loaded in about the middle of the address space so the stack can grow down below the text segment and the heap can grow up from the end of the data, keeping the total address space in use relatively compact.
Suppose a previous running application started at, say, 200000 and now linux starts around 400540. There is no clash or overlap of memory address. But as the programs continue, suppose the heap of the previous application creeps up to 300000, while the stack of the newly launched linux has grown downward to 310000. Soon, there will be a clash/overlap of the memory addresses. What happens when the clash eventually occurs?
If, when the program is loaded, storage at that address isn't available, the loader has to relocate the loaded program to reflect the actual load address.
Not all file formats support this:
GCC for 32-bit Windows will add the information required for the loader in the case of dynamic libraries (.dll). However, the information is not added to executable files (.exe), so such an executable file must be loaded to a fixed address.
Under Linux it is a bit more complicated; however, many (typically older 32-bit) executable files likewise cannot be loaded to different addresses, while dynamic libraries (.so) can be loaded to different addresses.
Suppose I have another executable that is currently running on my machine already using the memory space between 400540 and 601040 ...
Modern computers (all x86 32-bit computers) have a paging MMU which is used by most modern operating systems. This is some circuit (typically in the CPU) which translates addresses seen by the software to addresses seen by the RAM. In your example, 400540 could be translated to 1234000, so accessing the address 400540 will actually access the address 1234000 in RAM.
The point is: modern OSs use different MMU configurations for different tasks. So if you start your program again, a different MMU configuration is used that translates the address 400540 seen by the software to the address 2345000 in RAM. Both programs using address 400540 can run at the same time because one program will actually access address 1234000 and the other one will access address 2345000 in RAM when they access the address 400540.
This means that some address (e.g. 400540) will never be "already in use" when the executable file is loaded.
The address may already be in use when a dynamic library (.so/.dll) is loaded because these libraries share the memory with the executable file.
... how is it decided where to start my new executable linux?
Under Linux the executable file will be loaded at its fixed address if it was linked in a way that it cannot be moved to another address. (As already said: this was typical for older 32-bit files.) In your example the "Hello world" string would be located at address 0x601040 if your compiler and linker created the executable that way.
However, most 64-bit executables can be loaded at a different address. Linux will load them at some random address for security reasons, making it more difficult for viruses or other malware to attack the program.
... so the stack can grown down below the text segment ...
I've never seen this memory layout in any operating system:
Both under Linux and under Solaris the stack was located at the end of the address space (somewhere around 0xBFFFFF00), while the text segment was loaded quite close to the start of the memory (maybe address 0x401000).
... and the heap can grow up from the end of the data, ...
suppose the heap of the previous application creeps up ...
Many implementations since the late 1990s do not use the (brk-based) heap any more. Instead, they use mmap() to reserve new memory.
According to the manual page of brk(), the heap was declared a "legacy feature" in 2001, so it should not be used by new programs any longer.
(However, according to Peter Cordes, malloc() still seems to use the heap in some cases.)
Unlike "simple" operating systems like MS-DOS, Linux does not allow you "simply" to use the heap, but you have to call the function brk() to tell Linux how much heap you want to use.
If a program uses heap and it uses more heap than available, the brk() function returns some error code and the malloc() function simply returns NULL.
However, this situation typically happens because no more RAM is available and not because the heap overlaps with some other memory area.
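For illustration, this is roughly what such an implementation does when it needs more memory: it asks the kernel for an anonymous mapping, and the address it gets back comes from wherever free virtual address space happens to be, not from growing one contiguous heap.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t size = 1 << 20;   /* request 1 MiB of anonymous, private memory */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(p, "allocated via mmap, not brk");
    printf("mapping at %p: %s\n", p, (char *)p);

    munmap(p, size);
    return 0;
}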
... while the stack of the newly launched linux has grown downward to ...
Soon, there will be a clash/overlap of the memory addresses. What happens when the clash eventually occurs?
Indeed, the size of the stack is limited.
If you use too much stack, you have a "stack overflow".
This program will intentionally use too much stack - just to see what happens:
.globl _start
_start:
    sub $0x100000, %rsp    # move the stack pointer down by 1 MiB
    push %rax              # touch the newly "claimed" stack memory
    push %rax
    jmp _start             # repeat until the stack limit is exceeded
In the case of an operating system with an MMU (such as Linux), your program will crash with an error message:
~$ ./example_program
Segmentation fault (core dumped)
~$
EDIT/ADDENDUM
Is stack for all running programs located at the "end"?
In older Linux versions, the stack was located near (but not exactly at) the end of the virtual memory accessible by the program: Programs could access the address range from 0 to 0xBFFFFFFF in those Linux versions. The initial stack pointer was located around 0xBFFFFE00. (The command line arguments and environment variables came after the stack.)
And is this the end of actual physical memory? Will not the stack of different running programs then get mixed up? I was under the impression that all of the stack and memory of a program remains contiguous in actual physical memory, ...
On a computer using an MMU, the program never sees physical memory:
When the program is loaded, the OS will search some free area of RAM - maybe it finds some at the physical address 0xABC000. Then it configures the MMU in a way that the virtual addresses 0xBFFFF000-0xBFFFFFFF are translated to the physical addresses 0xABC000-0xABCFFF.
This means: Whenever the program accesses address 0xBFFFFE20 (for example using a push operation), the physical address 0xABCE20 in the RAM is actually accessed.
There is no possibility for a program at all to access a certain physical address.
If you have another program running, the MMU is configured in a way that the addresses 0xBFFFF000-0xBFFFFFFF are translated to the addresses 0x345000-0x345FFF when the other program is running.
So if one of the two programs will perform a push operation and the stack pointer is 0xBFFFFE20, the address 0xABCE20 in RAM will be accessed; if the other program performs a push operation (with the same stack pointer value), the address 0x345E20 will be accessed.
Therefore, the stacks will not mix up.
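Conceptually (and grossly simplified: real MMUs use multi-level page tables in hardware, with permission bits and a TLB), the translation applied on every access looks like this toy sketch, with one table per process; the numbers mirror the example above:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 16u                     /* toy-sized table for illustration */

/* One "page table" per process: virtual page number -> physical page number. */
struct toy_page_table { uint64_t phys_page[NUM_PAGES]; };

static uint64_t translate(const struct toy_page_table *pt, uint64_t vaddr) {
    return pt->phys_page[(vaddr / PAGE_SIZE) % NUM_PAGES] * PAGE_SIZE
           + vaddr % PAGE_SIZE;
}

int main(void) {
    /* Two processes whose virtual page 3 maps to different physical frames. */
    struct toy_page_table proc_a = { .phys_page = { [3] = 0xABC } };
    struct toy_page_table proc_b = { .phys_page = { [3] = 0x345 } };
    uint64_t vaddr = 3 * PAGE_SIZE + 0xE20;   /* same virtual address in both */

    printf("process A: 0x%llx -> 0x%llx\n", (unsigned long long)vaddr,
           (unsigned long long)translate(&proc_a, vaddr));   /* 0xABCE20 */
    printf("process B: 0x%llx -> 0x%llx\n", (unsigned long long)vaddr,
           (unsigned long long)translate(&proc_b, vaddr));   /* 0x345E20 */
    return 0;
}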
OSs not using an MMU but supporting multi-tasking (examples are the Amiga 500 or early Apple Macintoshes) will of course not work this way. Such OSs use special file formats (and not ELF) which are optimized for running multiple programs without MMU. Compiling programs for such OSs is much more complex than compiling programs for Linux or Windows. And there are even restrictions for the software developer (example: functions and arrays should not be too long).
Also, does each program have its own stack pointer, base pointer, registers, etc.? Or does the OS just have one set of these registers to be shared by all programs?
(Assuming a single-core CPU), the CPU has one set of registers, and only one program can run at any given time.
When you start multiple programs, the OS will switch between them. This means program A runs for (for example) 1/50 second, then program B runs for 1/50 second, then program A runs for 1/50 second, and so on. It appears to you as if the programs run at the same time.
When the OS switches from program A to program B, it must first save the values of the registers (of program A). Then it must change the MMU configuration. Finally it must restore program B's register values.
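Very roughly, the state saved per program looks like the sketch below (simplified for x86-64; a real kernel keeps considerably more, e.g. floating-point/SIMD state, segment state, and per-thread bookkeeping):

#include <stdint.h>

/* Simplified snapshot of one program's CPU state, saved on a context switch
   and restored the next time that program is scheduled. */
struct saved_context {
    uint64_t rip;                    /* program counter: where to resume      */
    uint64_t rsp, rbp;               /* stack pointer and frame/base pointer  */
    uint64_t rax, rbx, rcx, rdx;     /* general-purpose registers             */
    uint64_t rsi, rdi, r8_to_r15[8];
    uint64_t rflags;                 /* condition flags                       */
    uint64_t page_table_root;        /* which MMU configuration (CR3) to load */
};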
Yes, objdump on this executable shows addresses where its segments will be mapped. (Linking collects sections into segments: What's the difference of section and segment in ELF file format.) .data and .text get linked into different segments with different permissions (read+write vs. read+exec).
If, when the program is loaded, storage at that address isn't available
That could only happen when loading a dynamic library, not the executable itself. Virtual memory means that each process has its own private virtual address space, even if they were started from the same executable. (This is also why ld can always pick the same default base address for the text and data segments, not trying to slot every executable and library on the system into a different spot in a single address space.)
An executable is the first thing that gets to lay claim to parts of that address space, when it's loaded/mapped by the OS's ELF program loader. That's why traditional (non-PIE) ELF executables can be non-relocatable, unlike ELF shared objects like /lib/libc.so.6
If you single-step a program with a debugger, or include a sleep, you'll have time to look at less /proc/<PID>/maps. Or cat /proc/self/maps to have cat show you its own map. (There's also /proc/self/smaps for more detailed info on each mapping, like how much of it is dirty, whether it uses hugepages, etc.)
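The same thing works from inside the program itself; for example (Linux-specific, just a sketch), dump /proc/self/maps before exiting and compare the output across runs:

#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/maps", "r");   /* this process's own memory map */
    if (!f) { perror("fopen"); return 1; }

    char line[512];
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);   /* each line: address range, permissions, offset, backing file */

    fclose(f);
    return 0;
}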
(Newer GNU/Linux distros configure GCC to make PIE executables by default: 32-bit absolute addresses no longer allowed in x86-64 Linux?. In that case objdump would only see addresses relative to a base of 0 or 1000 or something. And compiler-generated asm would have used PC-relative addressing, not absolute.)

what does PC have to do with load or link address?

Link address is the address where execution of a program takes place, while load address is the address in memory where the program is actually placed.
Now I'm confused: what is the value in the program counter? Is it the load address or the link address?
Link address is the address where execution of a program takes place
No, it's not.
while load address is the address in memory where the program is actually placed.
Kind of. The program usually consists of more than one instruction, so it can't be placed at a single "load address".
When people talk about load address, they usually talk about relocatable code that can be relocated (at runtime) to an arbitrary load address.
For example, let's take a program that is linked at address 0x20020 and consists of 100 4-byte instructions, which all execute sequentially (e.g. it's a sequence of ADDs followed by a single SYSCALL to exit the program).
If such a program is loaded at address 0x20020, then at runtime the program counter will have value 0x20020, then it will advance to the next instruction at 0x20024, then to 0x20028, etc. until it reaches the last instruction of the program at 0x201ac.
But if that program is loaded at address 0x80020020 (i.e. if the program is relocated by 0x80000000 from its linked-at address), then the program counter will start at 0x80020020, and the last instruction will be at 0x800201ac.
Note that on many OSes executables are not relocatable and thus have to always be loaded at the same address they were linked at (i.e. with relocation 0; in this case "link address" really is the address where execution starts), while shared libraries are almost always relocatable and are often linked at address 0 and have non-zero relocation.
Both are different concepts, used in different contexts. The linker/loader is mainly responsible for code relocation and modification; the PC is a digital counter that indicates where the processor currently is in the program sequence (it is not a fixed address assigned by the linker or loader).
Linking & Loading :-
The heart of a linker or loader's actions is relocation and code
modification. When a compiler or assembler generates an object file,
it generates the code using the unrelocated addresses of code and data
defined within the file, and usually zeros for code and data defined
elsewhere. As part of the linking process, the linker modifies the
object code to reflect the actual addresses assigned. For example,
consider this snippet of x86 code that moves the contents of variable
a to variable b using the eax register.
mov a,%eax
mov %eax,b
If a is defined in the same file at location 1234 hex and b is
imported from somewhere else, the generated object code will be:
A1 34 12 00 00 mov a,%eax
A3 00 00 00 00 mov %eax,b
Each instruction contains a one-byte operation code followed by a
four-byte address. The first instruction has a reference to 1234 (byte
reversed, since the x86 uses a right to left byte order) and the
second a reference to zero since the location of b is unknown.
Now assume that the linker links this code so that the section in
which a is located is relocated by hex 10000 bytes, and b turns out to
be at hex 9A12. The linker modifies the code to be:
A1 34 12 01 00 mov a,%eax
A3 12 9A 00 00 mov %eax,b
That is, it adds 10000 to the address in the first instruction so now
it refers to a's relocated address which is 11234, and it patches in
the address for b. These adjustments affect instructions, but any
pointers in the data part of an object file have to be adjusted as
well.
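In code, the patching step from that example is nothing more than byte manipulation on the instruction stream. A small sketch (the byte layout mirrors the book's example, not a real object file format):

#include <stdint.h>
#include <stdio.h>

/* Read/write the 4-byte little-endian address field that follows the opcode byte. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static void write_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

int main(void) {
    /* A1 34 12 00 00   mov a,%eax   (a at 0x1234 within its section)
       A3 00 00 00 00   mov %eax,b   (b external, placeholder zero)   */
    uint8_t code[] = { 0xA1, 0x34, 0x12, 0x00, 0x00,
                       0xA3, 0x00, 0x00, 0x00, 0x00 };

    write_le32(&code[1], read_le32(&code[1]) + 0x10000);  /* relocate a's section by 0x10000 */
    write_le32(&code[6], 0x9A12);                         /* patch in b's resolved address   */

    printf("mov a,%%eax now references 0x%x\n", read_le32(&code[1]));   /* 0x11234 */
    printf("mov %%eax,b now references 0x%x\n", read_le32(&code[6]));   /* 0x9a12  */
    return 0;
}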
Program Counter :-
The program counter (PC) is a processor register that indicates where
a computer is in its program sequence.
In a typical central processing unit (CPU), the PC is a digital
counter (which is the origin of the term "program counter") that may
be one of many registers in the CPU hardware. The instruction cycle
begins with a fetch, in which the CPU places the value of the PC on
the address bus to send it to the memory.
The memory responds by
sending the contents of that memory location on the data bus. (This is
the stored-program computer model, in which executable instructions
are stored alongside ordinary data in memory, and handled identically
by it).
Following the fetch, the CPU proceeds to execution, taking
some action based on the memory contents that it obtained. At some
point in this cycle, the PC will be modified so that the next
instruction executed is a different one (typically, incremented so
that the next instruction is the one starting at the memory address
immediately following the last memory location of the current
instruction).
I would put the term "load address" out of your thinking. It does not really exist in a modern operating system. In ye old days of multiple programs loaded into the same address space (and each program loaded into a contiguous region of memory), load address had significance. Now it does not. Here's why.
An executable file is typically going to define a number of different program segments. These may not be loaded contiguously in memory. For example, the linker often directs the creation of stack areas remote from other areas of the program.
The executable will indicate the location that should be the initial value of the PC. This might not be at the start of a program segment, let alone be in the first program segment.

In ELF or DWARF, how can I get .PLT section values? -- Trying to get the address of the function an instrumentation tool is currently in

I am working in obtaining all the data of a program using its ELF and DWARF info and by hooking a pin tool to a process that is currently running -- It is kind of a debugger using a Pin tool.
For getting the local variables from the stack I am working with the registers EIP, EBP and ESP which I have access to from Pin.
What struck me as weird is that I was expecting EIP to be pointing to the current function that was running when the pin tool was attached to the process, but instead EIP is pointing to the section .PLT. In other words, if the pin tool was hooked into the process when Foo() was running, then I was expecting EIP to be pointing to some address inside the Foo function. However, it is pointing to the beginning of the .PLT section.
What I need to know is which function the process is currently in -- Is there any way to get the address of the function using the .PLT section? Is there any other ways to get the address of the function from the stack or using Pin? I hope I was clear enough, let me know if there are any questions though.
I might not be understanding exactly what is going on here... is the instruction pointer really in the .plt section, or are you just getting a garbage value from Pin?
You name the instruction pointer you are reading EIP, which might be a problem if you are running on a 64-bit system; is that the case?
You see, the instruction pointer register is a 32-bit value on a 32-bit system, and a 64-bit value on a 64-bit system. So Pin actually provides 3 REG_* names for the instruction pointer: EIP, RIP and GBP. EIP is always the lower 32-bit half of the register, RIP the 64-bit value, and GBP one of the two depending on your architecture. Asking for EIP on a 64-bit system gives you garbage, same for asking for RIP on a 32-bit one.
Otherwise, a quick look on Google gives me this. Quoting a bit:
By default the .plt entries are all initialized by the linker not to point to the correct target functions, but instead to point to the dynamic loader itself. Thus, the first time you call any given function, the dynamic loader looks up the function and fixes the target of the .plt so that the next time this .plt slot is used we call the correct function.
And more importantly:
It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application—this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example.
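The lazy-binding mechanism described in those quotes can be sketched in plain C as a table of function pointers that initially point at a resolver; the first call goes through the resolver, which patches the slot so every later call jumps straight to the real function. This is only an analogy for what the PLT, the GOT and the dynamic loader do together, not their actual implementation:

#include <stdio.h>

typedef void (*slot_fn)(void);

/* The "real" library function the program ultimately wants to call. */
static void real_function(void) { puts("hello from the real function"); }

static void lazy_resolver(void);

/* Analogous to one GOT entry: initially points at the resolver stub. */
static slot_fn got_slot = lazy_resolver;

/* First call lands here (roughly the job of the PLT stub plus the dynamic
   loader): resolve the symbol, patch the slot, then call the real target. */
static void lazy_resolver(void) {
    puts("resolver: binding the symbol on first use");
    got_slot = real_function;
    got_slot();
}

int main(void) {
    got_slot();   /* first call: detours through the resolver     */
    got_slot();   /* second call: goes straight to real_function  */
    return 0;
}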
Hope that helps.

How to use relative position in c/assembly?

It's said that Position Independent Code only uses relative positions instead of absolute positions; how is this implemented in C and in assembly, respectively?
Let's take char test[] = "string"; as an example: how do we reference it by a relative address?
In C, position-independent code is a detail of the compiler's implementation. See your compiler manual to determine whether it is supported and how.
In assembly, position-independent code is a detail of the instruction set architecture. See your CPU manual to find out how to read the PC (program counter) register, how efficient that is, and what the recommended best practices are in translating a code address to a data address.
Position-relative data is less popular now that code and data are separated into different pages on most modern operating systems. It is a good way to implement self-contained executable modules, but the most common such things nowadays are viruses.
On x86, position-independent code in principle looks like this:
call 1f
1: popl %ebx
followed by use of ebx as a base pointer with a displacement equal to the distance between the data to be accessed and the address of the popl instruction.
In reality it's often more complicated, and typically a tiny thunk function might be used to load the PIC register like this:
load_ebx:
    movl (%esp),%ebx         # the return address, i.e. the address just after the call
    addl $some_offset,%ebx   # adjust it to point at the designated spot (e.g. the GOT)
    ret
where the offset is chosen such that, when the thunk returns, ebx contains a pointer to a designated special point in the program/library (usually the start of the global offset table), and then all subsequent ebx-relative accesses can simply use the distance between the desired data and the designated special point as the offset.
On other archs everything is similar in principle, but there may be easier ways to load the program counter. Many simply let you use the pc or ip register as an ordinary register in relative addressing modes.
In pseudo code it could look like:
lea str1(pc), r0 ; load address of string relative to the pc (assuming constant strings, maybe)
st r0, test ; save the address in test (test could also be PIC, in which case it could be relative
; to some register)
A lot depends on your compiler and CPU architecture, as the previous answer stated. One way to find out would be to compile with the appropriate flags (-fPIC -S for gcc) and look at the assembly language you get.
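For the original char test[] = "string"; example, something like the snippet below is enough to experiment with. The exact instructions vary by compiler, version and target, but on x86-64 gcc -fPIC -S will typically use RIP-relative addressing for it (the instruction shown in the comment is illustrative, not guaranteed output):

#include <stdio.h>

static char test[] = "string";

const char *get_test(void) {
    /* With -fPIC/-fPIE on x86-64, gcc typically emits something like
           leaq test(%rip), %rax
       i.e. "address of the next instruction plus a fixed offset", so the
       reference keeps working wherever the loader places the code. */
    return test;
}

int main(void) {
    printf("%s is at %p\n", test, (void *)test);
    return 0;
}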
