My final ELF file contains >500 dynamic relocations of type R_ARM_RELATIVE and 5 static relocations of type R_ARM_ABS32.
As far as I know the static relocations are only needed for static linking. My file is going to be only dynamically linked when loaded.
I've found this information here:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044e/IHI0044E_aaelf.pdf
Static relocations are processed by a static linker; they are normally either
fully resolved or used to produce dynamic relocations for processing by a
post-linking step or a dynamic loader. A well formed image will have no static
relocations after static linking is complete, so a post-linker or dynamic loader
will normally only have to deal with dynamic relocations.
Why can my file contain static relocations while it is fully linked?
My other question is about R_ARM_RELATIVE relocations.
They do not use the .dynsym table in the relocation process (R_ARM_ABS32 does).
So why are there so many (>70) symbols in the .dynsym table, instead of 5?
Are these symbols used for something else besides the relocation process in dynamic linking?
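To make the difference between the two relocation types concrete, here is a minimal sketch of decoding Elf32_Rel entries. It assumes a little-endian 32-bit ELF file; the sample entries and the decode_rel32 helper are made up for illustration and are not taken from the file in question. The symbol index lives in the upper 24 bits of r_info, and for R_ARM_RELATIVE it is simply 0 because no symbol lookup is involved.

```python
import struct

# Relocation type numbers as defined by the ARM ELF ABI.
R_ARM_ABS32 = 2
R_ARM_RELATIVE = 23

def decode_rel32(raw: bytes):
    """Decode little-endian Elf32_Rel entries (8 bytes each).

    Returns a list of (r_offset, symbol_index, relocation_type) tuples.
    """
    entries = []
    for r_offset, r_info in struct.iter_unpack("<II", raw):
        sym_index = r_info >> 8    # ELF32_R_SYM: index into .dynsym
        rel_type = r_info & 0xFF   # ELF32_R_TYPE
        entries.append((r_offset, sym_index, rel_type))
    return entries

# Two hypothetical entries: a R_ARM_RELATIVE with no symbol,
# and a R_ARM_ABS32 that refers to .dynsym entry number 5.
sample = (struct.pack("<II", 0x1000, R_ARM_RELATIVE)
          + struct.pack("<II", 0x1004, (5 << 8) | R_ARM_ABS32))
decoded = decode_rel32(sample)
```

In this sketch only the second entry carries a non-zero symbol index, which is the sense in which R_ARM_ABS32 uses .dynsym and R_ARM_RELATIVE does not.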
I want to create my own linker and loader. I know that in the linking stage the linker takes into account the relocation data in the ELF header of each object file.
The linker will then create an executable file with all the addresses resolved and store it on the hard drive.
When the time comes, the loader will have to load that executable into main memory, but memory already contains running programs, so there will be conflicts.
Question1: Must the loader relocate the addresses all over again?
Question2: If yes, does that mean that the loader must scan all the text sections of the executable and change the addresses of all CPU instructions?*
*That would mean the loader has a copy of the ISA in memory and must scan instruction by instruction. It's like an execution before the execution.
There are no relocation data in the ELF header. Linkable ELF object files store relocation data in subservient sections named .rela.text, .rela.data etc.
The static linker on Linux chooses the starting address where the executable image will be loaded (traditionally 0x08048000 for 32-bit x86 non-PIE executables) and then uses the relocations to update instructions and data in the code and data sections. Once .rela.text and .rela.data have been processed, those subservient .rela sections are no longer needed and may be stripped from the final ELF executable file.
When the time comes to load the linked executable file into memory, the loader creates a new process in protected mode. The whole virtual address space is assigned to that process and is initially unoccupied. Other programs may be loaded on the same computer, but each runs happily in its own private address space.
The scenario you're afraid of sometimes happens on Windows, when different dynamic libraries were linked to start at conflicting virtual addresses. Therefore the Portable Executable format (PE/DLL) keeps relocation records in the subservient section .reloc and yes, the loader must then relocate all addresses mentioned in this section.
The situation is similar on DOS in real mode, where there is only one 1 MiB address space shared by all processes. MZ executables are linked to virtual address 0, all addresses which require relocation are kept in the relocation pointer table following the MZ EXE header, and the loader is responsible for updating the segment addresses mentioned in this pointer table.
Answer1: Relocation is necessary only if the executable image is loaded at a different address than it was linked for, and if it was not linked as a Position-Independent Executable.
Answer2: Relocation does not concern the addresses of all CPU instructions, only those fields in the instruction body (displacement or immediate address) which refer to an address. Such places must be explicitly specified in relocation records. If the relocation information was stripped from the file and the image cannot be loaded at the address it was linked for, your loader should refuse execution.
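As a rough illustration of Answer2, the following sketch patches only the address fields named by a list of relocation offsets and leaves every other byte of the image alone. The image contents, the relocate_image helper and the two base addresses are invented for this example; a real loader would read the offsets from the file's relocation records and would handle more than one field width.

```python
import struct

def relocate_image(image: bytearray, reloc_offsets, linked_base: int, load_base: int) -> None:
    """Add the load delta to each 32-bit little-endian address field
    whose offset is listed in reloc_offsets. Nothing else is touched,
    so no instruction decoding is needed."""
    delta = load_base - linked_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)

# Hypothetical 12-byte image linked for base 0x1000.
# Only bytes 4..7 hold an absolute address (0x1008).
image = bytearray(struct.pack("<III", 0xDEADBEEF, 0x1008, 0x12345678))
relocate_image(image, [4], linked_base=0x1000, load_base=0x5000)
patched = struct.unpack("<III", image)
```

The loader never needs to know the instruction set: it follows the list of offsets, and the first and third words stay exactly as the linker wrote them.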
Good source of information: Linkers and Blog by Ian Lance Taylor.
I use static linking to produce the executable object file. When I check the file with readelf, I find a section called .rela.plt.
The 'rela' in the name indicates that it is related to relocation. But since I use static linking and no shared libraries, the output should be a fully linked executable file. So why does this file still contain relocation information?
There are two ways run-time relocations can end up in statically-linked programs.
The GNU toolchain supports selecting different function implementations at run time using the IFUNC mechanism. On x86-64, these show up as R_X86_64_IRELATIVE relocations.
Some targets support statically linked position independent executables (via -static-pie in the GNU toolchain). Since the load address differs from one program run to the next due to address-space layout randomization, any global data object that contains a pointer needs to be relocated at run time. On x86-64, these relocations show up as R_X86_64_RELATIVE.
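A small sketch of how startup code might treat these two relocation types, assuming little-endian Elf64_Rela entries (24 bytes each). The process_rela helper and the sample entries are hypothetical. For R_X86_64_RELATIVE the stored value is load base plus addend. For R_X86_64_IRELATIVE, load base plus addend is the address of a resolver function, and the value it returns is what gets stored; this sketch only collects the resolver addresses and does not call anything.

```python
import struct

# Relocation type numbers from the x86-64 psABI.
R_X86_64_RELATIVE = 8
R_X86_64_IRELATIVE = 37

def process_rela(raw_rela: bytes, load_base: int):
    """Walk Elf64_Rela entries and return (writes, resolvers).

    writes:    {target address: value to store} for RELATIVE entries
    resolvers: [(target address, resolver address)] for IRELATIVE entries
    """
    writes = {}
    resolvers = []
    for r_offset, r_info, r_addend in struct.iter_unpack("<QQq", raw_rela):
        rel_type = r_info & 0xFFFFFFFF  # ELF64_R_TYPE
        target = load_base + r_offset
        if rel_type == R_X86_64_RELATIVE:
            writes[target] = load_base + r_addend
        elif rel_type == R_X86_64_IRELATIVE:
            resolvers.append((target, load_base + r_addend))
    return writes, resolvers

# Hypothetical table with one entry of each kind, image loaded at 0x10000.
table = (struct.pack("<QQq", 0x3000, R_X86_64_RELATIVE, 0x1234)
         + struct.pack("<QQq", 0x3008, R_X86_64_IRELATIVE, 0x2000))
writes, resolvers = process_rela(table, load_base=0x10000)
```

Neither type refers to a symbol, so both can be processed without a dynamic symbol table.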
(There might be other things that need relocations in statically linked programs on more obscure targets.)
I have two dynamically loadable libraries, lib_smtp.so and libpop.so, among others. Both have a global variable named protocol, which is initialized to "SMTP" and "POP" respectively. I have another static library, libhttp.a, where protocol is initialized to "HTTP".
Now for some reason I need to compile all the dynamically linkable and loadable libraries statically and include them in the executable. When I do so, I get the error "multiple definition of symbol" while linking the static libraries.
I am curious to know how the linker resolves duplicate symbols during dynamic linking, where all three of the mentioned libraries are linked.
Is there some way to do the same statically as the linker does in dynamic linking, i.e. add all the static libraries with the same symbols to the executable without any conflict? If not, why is the process different for statically linked libraries?
Dynamic linking in modern Linux and several other operating systems is based on the ELF binary format. The (ELF) dynamic libraries on which an executable or other shared library relies are prioritized. To resolve a given symbol, the dynamic linker checks each library in priority order until it finds one that defines the symbol.
That can be dicey when multiple dynamic objects define the same symbol and also multiple dynamic objects use that symbol. It can then be the case that the symbol is resolved differently in different dynamic objects.
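The priority-order lookup described above can be modelled with a toy sketch. The search order and the resolve helper are assumptions made for illustration; the real dynamic linker also deals with lookup scopes, symbol versioning and visibility, all of which this model ignores.

```python
def resolve(symbol, search_order):
    """Return (object name, value) from the first object in search_order
    that defines symbol, or None if no object defines it."""
    for name, symbols in search_order:
        if symbol in symbols:
            return name, symbols[symbol]
    return None

# Assumed priority order, using the library names from the question.
search_order = [
    ("lib_smtp.so", {"protocol": "SMTP"}),
    ("libpop.so", {"protocol": "POP"}),
]
winner = resolve("protocol", search_order)
```

Because the lookup stops at the first match, the second definition of protocol is never reached in this model, and no conflict is reported.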
Full details are out of scope for SO, but I don't know a better technical explanation than the one in Ulrich Drepper's paper "How to Write Shared Libraries".
In dynamic linking a facility called "symbol visibility" kicks in. Essentially, it lets you expose only certain symbols across the object's boundaries (object in the sense of shared object). It is good style to compile and link shared objects with symbols hidden by default and to explicitly expose only those that callers require.
Symbol visibility is applied during linking and is so far only implemented in dynamic linkers. It's certainly possible to also have it in static linkage: Apple's GCC variant implements so-called Mach-O relocatable object files, which can be statically linked with visibility applied. But I don't know whether vanilla GCC, binutils ld or the gold linker can do this for plain old ELF.
When linking an application against a dynamic shared library such as in
gcc -o myprog myprog.o -lmylib
I know the linker (ld on my Linux) uses the -l option to store, in the produced myprog ELF executable file, the name of the library (mylib in this case) that will be used at load and link time (both happen when the program is started, if we ignore lazy dynamic linking). I am wondering what other jobs ld performs regarding the dynamic shared library (I am only speaking of the static linking step done at compilation time):
ld must check that the undefined symbols exist in the provided dynamic shared libraries
anything else?
Moreover, I would be interested in pointers to the resources you use (books, online documentation) regarding the ELF format and the dynamic linking and loading processes.
While you hit the most obvious things ld needs to do when linking to ELF shared libraries, there are a few more you missed. I'll re-state the ones you mentioned and add some more:
Ensuring that all undefined symbols are resolved (unless the output is a shared library itself, in which case undefined symbols are valid).
Storing a reference to the library in a DT_NEEDED record of the _DYNAMIC object of the output file.
If the output is not position-independent and references objects (in the sense of data, as opposed to functions) in the shared library, generating a copy relocation to copy the original image of the object into the main program's data segment at load time, and the proper symbol table entry so that references to the object in the shared library itself get resolved to the new copy in the main program, rather than the original copy in the library.
Generating PLT thunks for the destination of each function call in the output that's not resolved at ld-time to a definition in the output.
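The first two tasks in the list above can be pictured with a toy model. Everything here is hypothetical: the link_against_shared helper, the symbol names and the library name are invented, and a real linker does this on ELF symbol tables rather than on Python sets.

```python
def link_against_shared(undefined, libraries, output_is_shared=False):
    """Toy model of two ld tasks.

    undefined: set of symbol names the object files leave undefined
    libraries: list of (name to record, set of exported symbols)
    Returns the list of names that would go into DT_NEEDED records.
    Raises ValueError if a symbol stays undefined and the output is
    not itself a shared library.
    """
    needed = [name for name, _ in libraries]
    exported = set().union(*(symbols for _, symbols in libraries))
    unresolved = sorted(set(undefined) - exported)
    if unresolved and not output_is_shared:
        raise ValueError("undefined reference to: " + ", ".join(unresolved))
    return needed

# Hypothetical program that calls mylib_init from a shared library.
needed = link_against_shared(
    {"mylib_init"},
    [("libmylib.so.0", {"mylib_init", "mylib_run"})],
)
```

With output_is_shared=True the same call would succeed even with leftover undefined symbols, which mirrors the exception noted in the first item.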
These are the tasks I can think of that are specific to use of shared libraries, and of course don't include all the work that the linker already does which would be the same as for static linking. One way to think of what ld does with dynamic linking is that it takes object files with a huge repertoire of relocation types (representing anything the compiler or assembler can produce) and resolves all but a small number of them (for static linking, that number would be zero), where all of the remaining relocations fit into a much more limited set of types resolvable by the dynamic linker at load time.
One important step is the creation of a dynamic symbol table, which the runtime linker ld.so can use to link the executable against the library at runtime. It will also write the dynamic relocation table to note which machine code locations need to be changed to point to dynamically linked symbols. To see details:
objdump -T myprog
objdump -R myprog
Also note that the string written to the executable will actually be the SONAME of the library, which might be something like libmylib.so.0. This ensures that even when you install a newer and incompatible libmylib.so.1.42 at some later point, the executable will use the compatible ABI version 0 instead. For details:
ldd myprog
Of course, the linker will also link your object files against one another, but since it does that even in the absence of a dynamic shared library, I take it that you are not interested in this part of its operation.