load-time ELF relocation - c

I am writing a simple user-space ELF loader under Linux (why? for 'fun'). My loader at the moment is quite simple and is designed to load only statically-linked ELF files containing position-independent code.
Normally, when a program is loaded by the kernel's ELF loader, it is loaded into its own address space. As such, the data segment and code segment can be loaded at the correct virtual address as specified in the ELF segments.
In my case, however, I am requesting addresses from the kernel via mmap, and may or may not get the addresses requested in the ELF segments. This is not a problem for the code segment since it is position independent. However, if the data segment is not loaded at the expected address, code will not be able to properly reference anything stored in the data segment.
Indeed, my loader appears to work fine with a simple assembly executable that does not contain any data. But as soon as I add a data segment and reference it, the executable fails to run correctly or SEGFAULTs.
How, if possible, can I fixup any references to the data segment to point to the correct place? Is there a relocation section stored in the (static) ELF file for this purpose?

If you modify the absolute addresses available in the .got section, (global offset table) your program should work. Make sure to modify the absolute address calculation to cater for the new distance between .text and .data, I'm afraid you need to figure out where this information comes from, for your architecture.
See this: Global Offset Table (Processor-Specific)
Good luck.

I don't see any way you can do that, unless you emulate the kernel-provided virtual address space completely, and run the code inside that virtual space. When you mmap the data section from the file, you are intrinsically relocating it to an unknown address of the virtual address space of your ELF interpreter, and your code will not be able to reference to it in any way.
Glad to be proven wrong. There's something very cool to learn here.

Related

How does the linker know where a particular section of an executable will be mapped into the address space of a process?

I was reading about how programs get loaded into memory. I wanted to understand how different sections (like text,data,rodata etc.) of a PE/ELF file are mapped at different places in the virtual address space of a process. I particularly wanted to understand how does the linker know where will a particular section (say for example rodata) will be mapped onto the address space, so that it can resolve correct addresses for all the symbols before creating the executable.
Do operating systems (eg. Windows) have a fixed range of virtual addresses for a process where they load/map a particular section? If not, then how will the linker resolve the correct addresses for the symbols in different sections?
It doesn't. Linker can only propose the executable image be loaded at certain VA, usually 0x00400000 for PE or 0x10000000 for DLL. Virtual adresses of sections (.text, .rodata, .data etc) are aligned by the section alignment (usually 0x00001000) and their proposed VA are therefore 0x00401000, 0x00402000 etc. Linker then fixes adresses of symbols to those assumed VAs.
The default ImageBase address (taken from linker script or linker arguments) is not required by OS loader, but I don't see a reason to change it, its a good habit to see nice rounded addresses during debugging.
In rare cases the Windows loader may find out that part of the proposed address space is occupied, so it will load the image at a different VA and fix position-dependent (absolute) references to their new VA.
Special PE section relocs contains addresses of references to program symbols which need relocation at load-time.

Symbol Resolution and Dynamic Linking

I have been reading about the relocation and symbol resolution process and I have a few questions on the same.
So the whole process(of loading the exec) starts with exec(BA_OS) command. During exec(BA_OS), the system retrieves a path name from the PT_INTERP segment and creates the initial process image from the interpreter file’s segments. That is, instead of using the original executable file’s segment images, the system composes a memory image for the interpreter. It then is the interpreter’s responsibility to receive control from the system and provide an environment for the application program.
After that dynamic linker does the following(as long as LD_BIND_NOW has a non-null value):
Adding the executable file's memory segments to the process image;
Adding shared object memory segments to the process image;
Performing relocations for the executable file and its shared
objects;
Closing the file descriptor that was used to read the executable
file, if one was given to the dynamic linker;
Transferring control to the program, making it look as if the program
had received control directly from exec(BA_OS).
So, my question now is
1. When are these shared objects loaded in memory?
The second step above states that linker adds shared object memory segments to the process image. Do all(and by all I mean shared object and their dependencies and dependencies of them and so on) the libraries are loaded at this point? Or the linker only creates a process image using dependencies and loads the library in physical memory later when required?
2. How dynamic linker get the address to patch the GOT entry of the symbol?
Below is what happens(or what I know what happens) when a function is called for the first time.
Jump to the PLT entry of our symbol.
Jump to the GOT entry of our symbol.
Jump back to the PLT entry and push an offset on the stack. That
the offset is actually an Elf_Rel structure describing how to patch the symbol.
Jump to the PLT stub entry.
Push a pointer to a link_map structure(on another article this was a pointer to the Relocation table) in order for the linker to
find in which library the symbol belongs to.
Call the dynamic linker.
Patch the GOT entry.
What does the dynamic linker do when it is called? How does it use the items on the stack (the offset and the pointer) to patch the GOT entry?
Suppose there are a large number of .so files, then is there any order in which dynamic linker searches through those files?
I have just started learning about this stuff. Please correct me if my understanding is incorrect. If you know of any other good resources please let me know.

Where memory segments are defined?

I just learned about different memory segments like: Text, Data, Stack and Heap. My question is:
1- Where the boundaries between these sections are defined? Is it in Compiler or OS?
2- How the compiler or OS know which addresses belong to each section? Should we define it anywhere?
This answer is from the point of view of a more special-purpose embedded system rather than a more general-purpose computing platform running an OS such as Linux.
Where the boundaries between these sections are defined? Is it in Compiler or OS?
Neither the compiler nor the OS do this. It's the linker that determines where the memory sections are located. The compiler generates object files from the source code. The linker uses the linker script file to locate the object files in memory. The linker script (or linker directive) file is a file that is a part of the project and identifies the type, size and address of the various memory types such as ROM and RAM. The linker program uses the information from the linker script file to know where each memory starts. Then the linker locates each type of memory from an object file into an appropriate memory section. For example, code goes in the .text section which is usually located in ROM. Variables go in the .data or .bss section which are located in RAM. The stack and heap also go in RAM. As the linker fills one section it learns the size of that section and can then know where to start the next section. For example, the .bss section may start where the .data section ended.
The size of the stack and heap may be specified in the linker script file or as project options in the IDE.
IDEs for embedded systems typically provide a generic linker script file automatically when you create a project. The generic linker file is suitable for many projects so you may never have to customize it. But as you customize your target hardware and application further you may find that you also need to customize the linker script file. For example, if you add an external ROM or RAM to the board then you'll need to add information about that memory to the linker script so that the linker knows how to locate stuff there.
The linker can generate a map file which describes how each section was located in memory. The map file may not be generated by default and you may need to turn on a build option if you want to review it.
How the compiler or OS know which addresses belong to each section?
Well I don't believe the compiler or OS actually know this information, at least not in the sense that you could query them for the information. The compiler has finished its job before the memory sections are located by the linker so the compiler doesn't know the information. The OS, well how do I explain this? An embedded application may not even use an OS. The OS is just some code that provides services for an application. The OS doesn't know and doesn't care where the boundaries of memory sections are. All that information is already baked into the executable code by the time the OS is running.
Should we define it anywhere?
Look at the linker script (or linker directive) file and read the linker manual. The linker script is input to the linker and provides the rough outlines of memory. The linker locates everything in memory and determines the extent of each section.
For your Query :-
Where the boundaries between these sections are defined? Is it in Compiler or OS?
Answer is OS.
There is no universally common addressing scheme for the layout of the .text segment (executable code), .data segment (variables) and other program segments. However, the layout of the program itself is well-formed according to the system (OS) that will execute the program.
How the compiler or OS know which addresses belong to each section? Should we define it anywhere?
I divided your this question into 3 questions :-
About the text (code) and data sections and their limitation?
Text and Data are prepared by the compiler. The requirement for the compiler is to make sure that they are accessible and pack them in the lower portion of address space. The accessible address space will be limited by the hardware, e.g. if the instruction pointer register is 32-bit, then text address space would be 4 GiB.
About Heap Section and limit? Is it the total available RAM memory?
After text and data, the area above that is the heap. With virtual memory, the heap can practically grow up close to the max address space.
Do the stack and the heap have a static size limit?
The final segment in the process address space is the stack. The stack takes the end segment of the address space and it starts from the end and grows down.
Because the heap grows up and the stack grows down, they basically limit each other. Also, because both type of segments are writeable, it wasn't always a violation for one of them to cross the boundary, so you could have buffer or stack overflow. Now there are mechanism to stop them from happening.
There is a set limit for heap (stack) for each process to start with. This limit can be changed at runtime (using brk()/sbrk()). Basically what happens is when the process needs more heap space and it has run out of allocated space, the standard library will issue the call to the OS. The OS will allocate a page, which usually will be manage by user library for the program to use. I.e. if the program wants 1 KiB, the OS will give additional 4 KiB and the library will give 1 KiB to the program and have 3 KiB left for use when the program ask for more next time.
Most of the time the layout will be Text, Data, Heap (grows up), unallocated space and finally Stack (grows down). They all share the same address space.
The sections are defined by a format which is loosely tied to the OS. For example on Linux you have ELF and on Mac OS you have Mach-O.
You do not define the sections explicitly as a programmer, in 99.9% of cases. The compiler knows what to put where.

Loading a non-relocatable, static ELF binary in userspace

I'm trying to write a basic userspace ELF loader that should be able to load statically linked (not dynamically linked) non-relocatable binaries (i.e. not built with -pie, -fPIE and so on). It should work on x86 CPU's for now.
I've followed the code on loading ELF file in C in user space and it works well when the executable is relocatable, but as expected completely fails if it isn't since the program is loaded in the wrong virtual memory range and instantly crashes.
But I tried modifying it to load the program at the virtual offset it expects (using phdr.p_vaddr) but I ran into a complication: my loader is already using that virtual memory range! I can't mmap it, much less write anything into it. How do I proceed so that I can load my non-relocatable binary into my loader's address space without overwriting the loader's own code before it's finished? Do I need to get my loader to run from a completely different virtual memory range, perhaps by getting the linker to link it way above the usual virtual memory range for a non-relocatable binary (which happens to start at 0x400000 in my case) or is there some trick to it?
I've read the ELF documentation (I am working with ELF64 here by the way, but I think ELF32 and ELF64 are very similar) and a lot of documents on the web and I still don't get it.
Can someone explain how an ELF loader deals with this particular complication? Thanks!
Archimedes called "heureka" when he found that at a location can only be one object. If your ELF binary must be at one location because you can't rebuild it for another location you have to relocate the loader itself.
The non-relocatable ELF doesn't include enough Information to move it to a different address. You could probably write a decompiler that detects all address references in the code but it's not worth. You will have problems when you try to analyze data references like pointers stored in pre-initialized variables.
Rewrite the loader if you can't get the source code of you ELF binary or a relocatable version.
BTW: Archimedes heureka was deadly for the goldsmith who cheated. I hope it's not so expensive in your case.

load time relocation and virtual memory

I am wondering what load-time relocation actually means on a system with virtual memory support.I was thinking that in a system with virtual memory every executable will have addresses starting from zero and at run-time the addresses will be translated into physical addresses using page tables.Therefore the executable can be loaded anywhere in memory without the need of any relocation. However this article on shared libraries mentions that linker specifies an address in the executable where the executable is to be loaded (Entry-point address).
http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/
Also there are many articles on dynamic linking which talk about absolute addresses.
Is my understanding wrong ?
Load-time relocation and virtual memory support are two different concepts. Almost all CPUs and OSes these days have virtual memory support. The only really important point to understand about virtual memory is this: forget physical addresses. That is now a hardware and OS responsibility and, unless you are writing a paging system, you can forget about physical addresses. All addresses that a program uses are virtual addresses. This is a huge advantage and immensely simplifies the programming model. On 32-bit systems, this simply means that each process gets its own 4 GiB memory space, ranging from 0x00000000 to 0xffffffff.
An .exe represents a process. A linker produces .exe from .obj files. While both are binary files, .obj files are not executable because they do not contain the addresses of all the variables and functions. It is the job of the linker to provide these addresses, which it determines by placing these .obj files end-to-end and then computing the exact addresses of all the symbols (functions and variables). Thus, the .exe that is created has every address of functions and variables "hard-coded" into it. But there is still one critical information needed before the .exe can be created. The linker has to have insider knowledge about where in memory the .exe will be loaded. Will it be at address 0x00000000, or at 0xffff0000, or somewhere else? For example, in Windows all .exes are always loaded at an absolute starting address of 0x00400000. This is called the base address. When the linker generates the final addresses of symbols (functions and variables), it computes those from this address onward.
Now, .exes rarely need to be loaded at any other address. But the same is not true for .dlls. .ddls are the same as .exes (both are formatted in the portable executable (PE) file format, which describes the memory layout, for example, where text goes, where data goes, and how to find which one). .dlls have a preferred address, too. This simply means that the linker uses this value when it computes the addresses for symbols inside the .dll. If the .dll is loaded at this address, then we are all set.
But if the .dll cannot be loaded at this address (say it was 0x10000000) because some other .dll had already been loaded at this address, then the loader will find some other space in memory and load the .dll there. However, the global addresses of functions and symbols in the .dll are now incorrect. Thus, the loader has to do a relocation (also called "fixup"), in which it adjusts the addresses of all global symbols and functions to reflect their actual addresses.
In order to do this adjustment, the loader needs to be able to find all such symbols in the .dll. The PE file has a .reloc section that contains the internal offset of all such symbols.
Of course, there are other details, for example, regarding how indirection can be used when the compiler generated the code so that, instead of making direct calls, the calls are indirect and variables are accessed via known memory locations in the header of the .exe.
Finally, the gist is this: You need relocation (of some sort) to adjust addresses in the call and jump as well as variable access instructions when the code does not load at the position (within the 4 GiB address space) it was expected to load. When the OS loads a .exe, it has to pick a suitable place in this 4 GiB address space where it will copy the code and data chunks from this .exe on disk.

Resources