Symbol Resolution and Dynamic Linking - c

I have been reading about the relocation and symbol resolution process and I have a few questions on the same.
So the whole process(of loading the exec) starts with exec(BA_OS) command. During exec(BA_OS), the system retrieves a path name from the PT_INTERP segment and creates the initial process image from the interpreter file’s segments. That is, instead of using the original executable file’s segment images, the system composes a memory image for the interpreter. It then is the interpreter’s responsibility to receive control from the system and provide an environment for the application program.
After that dynamic linker does the following(as long as LD_BIND_NOW has a non-null value):
Adding the executable file's memory segments to the process image;
Adding shared object memory segments to the process image;
Performing relocations for the executable file and its shared
objects;
Closing the file descriptor that was used to read the executable
file, if one was given to the dynamic linker;
Transferring control to the program, making it look as if the program
had received control directly from exec(BA_OS).
So, my question now is
1. When are these shared objects loaded in memory?
The second step above states that linker adds shared object memory segments to the process image. Do all(and by all I mean shared object and their dependencies and dependencies of them and so on) the libraries are loaded at this point? Or the linker only creates a process image using dependencies and loads the library in physical memory later when required?
2. How dynamic linker get the address to patch the GOT entry of the symbol?
Below is what happens(or what I know what happens) when a function is called for the first time.
Jump to the PLT entry of our symbol.
Jump to the GOT entry of our symbol.
Jump back to the PLT entry and push an offset on the stack. That
the offset is actually an Elf_Rel structure describing how to patch the symbol.
Jump to the PLT stub entry.
Push a pointer to a link_map structure(on another article this was a pointer to the Relocation table) in order for the linker to
find in which library the symbol belongs to.
Call the dynamic linker.
Patch the GOT entry.
What does the dynamic linker do when it is called? How does it use the items on the stack (the offset and the pointer) to patch the GOT entry?
Suppose there are a large number of .so files, then is there any order in which dynamic linker searches through those files?
I have just started learning about this stuff. Please correct me if my understanding is incorrect. If you know of any other good resources please let me know.

Related

How does the linker know where a particular section of an executable will be mapped into the address space of a process?

I was reading about how programs get loaded into memory. I wanted to understand how different sections (like text,data,rodata etc.) of a PE/ELF file are mapped at different places in the virtual address space of a process. I particularly wanted to understand how does the linker know where will a particular section (say for example rodata) will be mapped onto the address space, so that it can resolve correct addresses for all the symbols before creating the executable.
Do operating systems (eg. Windows) have a fixed range of virtual addresses for a process where they load/map a particular section? If not, then how will the linker resolve the correct addresses for the symbols in different sections?
It doesn't. Linker can only propose the executable image be loaded at certain VA, usually 0x00400000 for PE or 0x10000000 for DLL. Virtual adresses of sections (.text, .rodata, .data etc) are aligned by the section alignment (usually 0x00001000) and their proposed VA are therefore 0x00401000, 0x00402000 etc. Linker then fixes adresses of symbols to those assumed VAs.
The default ImageBase address (taken from linker script or linker arguments) is not required by OS loader, but I don't see a reason to change it, its a good habit to see nice rounded addresses during debugging.
In rare cases the Windows loader may find out that part of the proposed address space is occupied, so it will load the image at a different VA and fix position-dependent (absolute) references to their new VA.
Special PE section relocs contains addresses of references to program symbols which need relocation at load-time.

Loading a non-relocatable, static ELF binary in userspace

I'm trying to write a basic userspace ELF loader that should be able to load statically linked (not dynamically linked) non-relocatable binaries (i.e. not built with -pie, -fPIE and so on). It should work on x86 CPU's for now.
I've followed the code on loading ELF file in C in user space and it works well when the executable is relocatable, but as expected completely fails if it isn't since the program is loaded in the wrong virtual memory range and instantly crashes.
But I tried modifying it to load the program at the virtual offset it expects (using phdr.p_vaddr) but I ran into a complication: my loader is already using that virtual memory range! I can't mmap it, much less write anything into it. How do I proceed so that I can load my non-relocatable binary into my loader's address space without overwriting the loader's own code before it's finished? Do I need to get my loader to run from a completely different virtual memory range, perhaps by getting the linker to link it way above the usual virtual memory range for a non-relocatable binary (which happens to start at 0x400000 in my case) or is there some trick to it?
I've read the ELF documentation (I am working with ELF64 here by the way, but I think ELF32 and ELF64 are very similar) and a lot of documents on the web and I still don't get it.
Can someone explain how an ELF loader deals with this particular complication? Thanks!
Archimedes called "heureka" when he found that at a location can only be one object. If your ELF binary must be at one location because you can't rebuild it for another location you have to relocate the loader itself.
The non-relocatable ELF doesn't include enough Information to move it to a different address. You could probably write a decompiler that detects all address references in the code but it's not worth. You will have problems when you try to analyze data references like pointers stored in pre-initialized variables.
Rewrite the loader if you can't get the source code of you ELF binary or a relocatable version.
BTW: Archimedes heureka was deadly for the goldsmith who cheated. I hope it's not so expensive in your case.

How can I manually (programmatically) place objects in my multicore project?

I am developing a mutlicore project for our embedded architecture using the gnu toolchain. In this architecture, all independent cores share the same global flat memory space. Each core has its own internal memory, which is addressable from any other core through its global 32-bit address.
There is no OS implemented and we do low-level programming, but in C instead of assembly. Each core has its own executable, generated with a separate compilation. The current method we use for inter-core communication is through calculation of absolute addresses of objects in the destination core's data space. If we build the same code for all cores, then the objects are located by the linker in the same place, so accessing an object in a remote core is merely changing the high-order bits of the address of the object in the current core and making the transaction. Similar concept allows us to share objects that are located in the external DRAM.
Things start getting complicated when:
The code is not the same in the two cores, so objects may not be allocated in similar addresses,
We sometimes use a "host", which is another processor running some control code that requires access to objects in the cores, as well as shared objects in the external memory.
In order to overcome this problem, I am looking for an elegant way of placing variables in build time. I would like to avoid changing the linker script file as possible. However, it seems like in the C level, I could only control placement up to using a combination of the section attribute (which is too coarse) and the align attribute (which doesn't guarantee the exact place).
A possible hack is to use inline assembly to define the objects and explicitly place them (using the .org and .global keywords), but it seems somewhat ugly (and we did not yet actually test this idea...)
So, here's the questions:
Is there a semistandard way, or an elegant solution for manually placing objects in a C program?
Can I declare an "uber"-extarnel objects in my code and make the linker resolve their addresses using another project's executable?
This question describes a similar situation, but there the user references a pre-allocated resource (like a peripheral) whose address is known prior to build time.
Maybe you should try to use 'placement' tag from new operator. More exactly if you have already an allocated/shared memory you may create new objects on that. Please see: create objects in pre-allocated memory
You don't say exactly what sort of data you'll be sharing, but assuming it's mostly fixed-size statically allocated variables, I would place all the data in a single struct and share only that.
The key point here is that this struct must be shared code, even if the rest of the programs are not. It would be possible to append extra fields (perhaps with a version field so that the reader can interpret it correctly), but existing fields must not be removed or modifed. structs are already used as the interface between libraries everywhere, so their layout can be relied upon (although a little more care will be need in a heterogeneous environment, as long as the type sizes and alignments are the same you should be ok).
You can then share structs by either:
a) putting them in a special section and using the linker script to put that in a known location;
b) allocating the struct in static data, and placing a pointer to that at a known location, say in your assembly start-up files; or
c) as (b), but allocate the struct on the heap, and copy the pointer to the known pointer location at run-time. The has the advantage that the pointer can be pre-adjusted for external consumers, thus avoiding a certain amount of messing about.
Hope that helps
Response to question 1: no, there isn't.
As for the rest, it depends very much of the operating system you use. On our system at the time I was in embedded, we had only one processor's memory to handle (80186 and 68030 based), but had multi-tasking but from the same binary. Our tool chain was extended to handle the memory in a certain way.
The toolchain looked like that (on 80186):
Microsoft C 16bit or Borland-C
Linker linking to our specific crt.o which defined some special symbols and segments.
Microsoft linker, generating an exe and a map file with a MS-DOS address schema
A locator that adjusted the addresses in the executable and generated a flat binary
Address patcher.
An EPROM burner (later a Flash loader).
In our assembly we defined a symbol that was always at the beginning of data segment and we patched the binary with a hard coded value coming from the located map file. This allowed the library to use all the remaining memory as a heap.
In fact, if you haven't the controle on the locator (the elf loader on linux or the exe/dll loader in windows) you're screwed.
You're well off the beaten path here - don't expect anything 'standard' for any of this :)
This answer suggests a method of passing a list of raw addresses to the linker. When linking the external executable, generate a linker map file, then process it to produce this raw symbol table.
You could also try linking the entire program (all cores' programs) into a single executable. Use section definitions and a linker script to put each core's program into its internal memory address space; you can build each core's program separately, incrementally link it to a single .o file, then use objcopy to rename its sections to contain the core ID for the linker script, and rename (hide) private symbols if you're duplicating the same code across multiple cores. Finally, manually supply the start address for each core to your bootstrap code instead of using the normal start symbol.

load time relocation and virtual memory

I am wondering what load-time relocation actually means on a system with virtual memory support.I was thinking that in a system with virtual memory every executable will have addresses starting from zero and at run-time the addresses will be translated into physical addresses using page tables.Therefore the executable can be loaded anywhere in memory without the need of any relocation. However this article on shared libraries mentions that linker specifies an address in the executable where the executable is to be loaded (Entry-point address).
http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/
Also there are many articles on dynamic linking which talk about absolute addresses.
Is my understanding wrong ?
Load-time relocation and virtual memory support are two different concepts. Almost all CPUs and OSes these days have virtual memory support. The only really important point to understand about virtual memory is this: forget physical addresses. That is now a hardware and OS responsibility and, unless you are writing a paging system, you can forget about physical addresses. All addresses that a program uses are virtual addresses. This is a huge advantage and immensely simplifies the programming model. On 32-bit systems, this simply means that each process gets its own 4 GiB memory space, ranging from 0x00000000 to 0xffffffff.
An .exe represents a process. A linker produces .exe from .obj files. While both are binary files, .obj files are not executable because they do not contain the addresses of all the variables and functions. It is the job of the linker to provide these addresses, which it determines by placing these .obj files end-to-end and then computing the exact addresses of all the symbols (functions and variables). Thus, the .exe that is created has every address of functions and variables "hard-coded" into it. But there is still one critical information needed before the .exe can be created. The linker has to have insider knowledge about where in memory the .exe will be loaded. Will it be at address 0x00000000, or at 0xffff0000, or somewhere else? For example, in Windows all .exes are always loaded at an absolute starting address of 0x00400000. This is called the base address. When the linker generates the final addresses of symbols (functions and variables), it computes those from this address onward.
Now, .exes rarely need to be loaded at any other address. But the same is not true for .dlls. .ddls are the same as .exes (both are formatted in the portable executable (PE) file format, which describes the memory layout, for example, where text goes, where data goes, and how to find which one). .dlls have a preferred address, too. This simply means that the linker uses this value when it computes the addresses for symbols inside the .dll. If the .dll is loaded at this address, then we are all set.
But if the .dll cannot be loaded at this address (say it was 0x10000000) because some other .dll had already been loaded at this address, then the loader will find some other space in memory and load the .dll there. However, the global addresses of functions and symbols in the .dll are now incorrect. Thus, the loader has to do a relocation (also called "fixup"), in which it adjusts the addresses of all global symbols and functions to reflect their actual addresses.
In order to do this adjustment, the loader needs to be able to find all such symbols in the .dll. The PE file has a .reloc section that contains the internal offset of all such symbols.
Of course, there are other details, for example, regarding how indirection can be used when the compiler generated the code so that, instead of making direct calls, the calls are indirect and variables are accessed via known memory locations in the header of the .exe.
Finally, the gist is this: You need relocation (of some sort) to adjust addresses in the call and jump as well as variable access instructions when the code does not load at the position (within the 4 GiB address space) it was expected to load. When the OS loads a .exe, it has to pick a suitable place in this 4 GiB address space where it will copy the code and data chunks from this .exe on disk.

load-time ELF relocation

I am writing a simple user-space ELF loader under Linux (why? for 'fun'). My loader at the moment is quite simple and is designed to load only statically-linked ELF files containing position-independent code.
Normally, when a program is loaded by the kernel's ELF loader, it is loaded into its own address space. As such, the data segment and code segment can be loaded at the correct virtual address as specified in the ELF segments.
In my case, however, I am requesting addresses from the kernel via mmap, and may or may not get the addresses requested in the ELF segments. This is not a problem for the code segment since it is position independent. However, if the data segment is not loaded at the expected address, code will not be able to properly reference anything stored in the data segment.
Indeed, my loader appears to work fine with a simple assembly executable that does not contain any data. But as soon as I add a data segment and reference it, the executable fails to run correctly or SEGFAULTs.
How, if possible, can I fixup any references to the data segment to point to the correct place? Is there a relocation section stored in the (static) ELF file for this purpose?
If you modify the absolute addresses available in the .got section, (global offset table) your program should work. Make sure to modify the absolute address calculation to cater for the new distance between .text and .data, I'm afraid you need to figure out where this information comes from, for your architecture.
See this: Global Offset Table (Processor-Specific)
Good luck.
I don't see any way you can do that, unless you emulate the kernel-provided virtual address space completely, and run the code inside that virtual space. When you mmap the data section from the file, you are intrinsically relocating it to an unknown address of the virtual address space of your ELF interpreter, and your code will not be able to reference to it in any way.
Glad to be proven wrong. There's something very cool to learn here.

Resources