Order of symbol lookup in .so dependency graph - linker

Suppose I have a load-time dependency graph of shared objects, and a symbol foo that is referenced in one of the .so files. Suppose also that foo is defined in several other shared objects. My question is: which definition will be found, what is the lookup order, and where is it specified (which standard or man page)?
Example:
Consider a dependency graph
https://i.imgur.com/jdhD3V0.png
where the libraries are listed in ldd order, e.g.
ldd a.so
b.so
d.so
Let's assume foo is defined in c.so and d.so and is first referenced in f.so. My experiments show that the linker picks the implementation from d.so. It looks like the libraries are searched in BFS order. Is this right? Does this coincide with the library loading order? I could not find this in any documentation, and I need to be sure it is not implementation-defined.
Thank you!

Dynamic linking is specified in the ELF specification. (Note that there are some really old PDF and PostScript files floating around, but those are generally very outdated.) Symbol lookup is described in the section Shared Object Dependencies:
When resolving symbolic references, the dynamic linker examines the symbol tables with a breadth-first search. That is, it first looks at the symbol table of the executable program itself, then at the symbol tables of the DT_NEEDED entries (in order), and then at the second level DT_NEEDED entries, and so on.
(There are various extensions which alter this behavior. The ELF specification itself defines the DF_SYMBOLIC flag.)
This means that your question cannot be answered because your graph does not show the main executable, and it is unclear in which order multiple dependencies are searched (top-to-bottom or bottom-to-top).
Whether the lookup order matches the object loading order is implementation-defined because merely loading an object (without executing its initialization functions) is not something that has an observable effect according to the ELF specification.
Initialization order (the order in which initialization functions are executed) is less constrained than symbol lookup order because the order of DT_NEEDED entries does not matter for it. So in theory, it is possible that an implementation loads and initializes d.so before b.so, but the symbols from b.so interpose those of d.so because b.so comes first in the symbol search order (due to the way the DT_NEEDED entries are ordered).
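To make the quoted breadth-first rule concrete, here is a minimal C sketch of such a search over a toy object table. Everything in it is an assumption made for illustration: the struct layout, the function name lookup_bfs, and above all the edges of the graph, which are invented (the image from the question is not reproduced here). It is not how any particular dynamic linker is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4
#define MAX_DEPS 4
#define MAX_OBJS 16

/* Toy model of a loaded object (hypothetical layout, not a real ELF structure). */
struct object {
    const char *name;
    const char *defined[MAX_SYMS]; /* symbols defined here, NULL-terminated */
    int needed[MAX_DEPS];          /* DT_NEEDED entries as table indices, -1-terminated */
};

/* Made-up graph: main -> a.so; a.so -> b.so, d.so; b.so -> c.so; c.so -> f.so. */
const struct object example_objects[] = {
    /* 0 */ { "main", { NULL },  { 1, -1 } },
    /* 1 */ { "a.so", { NULL },  { 2, 3, -1 } },
    /* 2 */ { "b.so", { NULL },  { 4, -1 } },
    /* 3 */ { "d.so", { "foo" }, { -1 } },
    /* 4 */ { "c.so", { "foo" }, { 5, -1 } },
    /* 5 */ { "f.so", { NULL },  { -1 } },
};

/* Return the index of the first object that defines sym, visiting objects
   breadth-first from root and following DT_NEEDED entries in order.
   Returns -1 if no visited object defines sym. */
int lookup_bfs(const struct object *objs, int nobjs, int root, const char *sym)
{
    int queue[MAX_OBJS];
    int seen[MAX_OBJS] = { 0 };
    int head = 0, tail = 0;

    assert(nobjs <= MAX_OBJS && root >= 0 && root < nobjs);
    queue[tail++] = root;
    seen[root] = 1;

    while (head < tail) {
        int cur = queue[head++];
        const struct object *o = &objs[cur];

        /* Does this object define the symbol? */
        for (int i = 0; i < MAX_SYMS && o->defined[i] != NULL; i++)
            if (strcmp(o->defined[i], sym) == 0)
                return cur;

        /* Queue its direct dependencies, in DT_NEEDED order. */
        for (int i = 0; i < MAX_DEPS && o->needed[i] >= 0; i++) {
            int dep = o->needed[i];
            if (!seen[dep]) {
                seen[dep] = 1;
                queue[tail++] = dep;
            }
        }
    }
    return -1;
}
```

With this invented graph, lookup_bfs(example_objects, 6, 0, "foo") visits main, a.so, b.so, d.so in that order and stops at d.so, because d.so sits one level above c.so. Note that the search starts at the executable, not at the object containing the reference.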

Related

How do I find undefined functions and add/link them as external functions?

I'm new to IAR workbenches in general (and EWARM to be precise), so I have a couple of potentially silly questions.
For starters, here's what I actually want to do and the questions that arose:
I need to check a .o (.obj) file for undefined symbols and potentially collect them. When I was working with GCC, I used nm with the --undefined-only option to list such symbols. So, is there a similar tool in IAR (EWARM)?
Having collected these undefined symbols, I need to manually link them (they are functions) to specific addresses. While working with GCC, I used an ld script and placed function = address entries in the ENTRY part of the script. So, what's the right way to do the same thing in EWARM?
Any help is appreciated.
For the first step, there is no direct way of doing this using only the tools from the EWARM distribution, but since iccarm produces ELF files you can continue using nm --undefined-only.
For the second step, there are at least two different ways of doing this. First, there is a command-line option to ilink that allows you to define symbol-to-address mappings. For instance, adding --define_symbol print=0x1234 will add the symbol print with the value 0x1234. Second, symbols can be defined in the linker configuration file (.icf file) using the define exported symbol directive. The example above is expressed as define exported symbol print = 0x1234.

Symbol confusion in dynamic linked library when app code has same symbol [duplicate]

I have two dynamically loadable libraries, lib_smtp.so and libpop.so. Both have a global variable named protocol, which is initialized to "SMTP" and "POP" respectively. I have another static library, libhttp.a, where protocol is initialized to "HTTP".
Now, for some reason, I need to compile all the dynamically linkable and loadable libraries statically and include them in the executable. When I do so, I get the error "multiple definition of symbol" while linking the static libraries.
I am curious to know how the linker resolves duplicate symbols during dynamic linking, where all three libraries mentioned are linked.
Is there some way I can do the same statically, i.e. add all the static libraries that have the same symbols to the executable without any conflict, as the linker does in dynamic linking? If not, why is the process different for statically linked libraries?
Dynamic linking in modern Linux and several other operating systems is based on the ELF binary format. The (ELF) dynamic libraries on which an executable or other shared library relies are prioritized. To resolve a given symbol, the dynamic linker checks each library in priority order until it finds one that defines the symbol.
That can be dicey when multiple dynamic objects define the same symbol and also multiple dynamic objects use that symbol. It can then be the case that the symbol is resolved differently in different dynamic objects.
Full details are out of scope for SO, but I don't know a better technical explanation than the one in Ulrich Drepper's paper "How to Write Shared Libraries".
In dynamic linking, a facility called "symbol visibility" kicks in. Essentially, this allows exposing only certain symbols across the object's boundaries (object in the sense of shared object). It is good style to compile and link shared objects with symbols hidden by default and to expose explicitly only those that are required by callers.
Symbol visibility is applied during linking and so far is only implemented in dynamic linkers. It is certainly possible to have it in static linkage as well: Apple's GCC variant implements so-called Mach-O relocatable object files, which can be statically linked with visibility applied. But I don't know if vanilla GCC, binutils ld or the gold linker can do this for plain old ELF.
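As a rough illustration of "hidden by default, export explicitly", here is a small sketch using the GCC/Clang visibility attribute. The macro names API_EXPORT and API_LOCAL and both function names are hypothetical, and the attribute syntax is compiler-specific, so treat this as one possible way to annotate a shared object rather than the only one.

```c
#include <assert.h>

/* Assumed to be compiled into a shared object with GCC or Clang,
   e.g. with -fPIC -shared, optionally with -fvisibility=hidden so
   that unannotated symbols are hidden as well. */
#define API_EXPORT __attribute__((visibility("default")))
#define API_LOCAL  __attribute__((visibility("hidden")))

/* Internal helper: usable from any file inside this shared object,
   but not exported through its dynamic symbol table. */
API_LOCAL int smtp_port_internal(void)
{
    return 25;
}

/* Public entry point: the only symbol other objects are meant to bind to. */
API_EXPORT int smtp_default_port(void)
{
    int port = smtp_port_internal();
    assert(port > 0);
    return port;
}
```

After building the shared object, nm -D on it should list smtp_default_port but not smtp_port_internal, so a same-named helper in another library would no longer clash at dynamic link time.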

Duplicate symbols in Microsoft C library

I'm writing a linker for Windows PE format object files, and I've got to the stage where it can link together object files produced by the Microsoft compiler, but when I try to link with libcmt.lib I get a lot of duplicate symbols.
For example, cosl is defined by three different objects in the library. All three refer to definitions in different places, and all three look genuine, e.g. they point to text segments named .text$mn and have storage class IMAGE_SYM_CLASS_EXTERNAL.
Is it the case that these are alternate versions and the linker is supposed to pick one based on some criterion, or am I misunderstanding something about the semantics of the PE library format?
As referenced in the comments, the OP is not processing the COMDAT section properly.
http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/pecoff.doc

How to get GCC to export ALL symbols to the output file

I am developing an operating system, and I need to load some modules BEFORE paging is set up. Since paging is not set up at this point, I need to relocate all of the symbols in the program to their physical addresses. My problem is that not all symbols can be found in the symbol table and not all relocation info can be found in .rel.text. How can I get GCC to export all symbol data?
Surely, ANYTHING needing relocation will be in the relocation table. How else could it be loaded? Whether paging is enabled or not, relocation works exactly the same - entries that are absolute locations in the binary are listed with an offset, and then processed by the loading software. Everything else should be fine without relocation.
Note that a symbol table is not meaningful for resolving relocations in and of itself, as that only gives the location of a symbol.
Are you perhaps thinking of the symbols in your OS itself? If so, it's really a case of exporting the symbols from your OS in an appropriate way. Linux has EXPORT_SYMBOL(name), which builds a symbol table within the kernel itself. [Note that these are NOT the symbols generated by gcc or ld, but symbols built by macros and processed in the kernel.]
Edit to clarify, as I ran out of space in "comment":
There are two types of "relocations".
Internal ones are where you have absolute references to things in your own module, e.g. pointers to strings, pointers to functions, jump tables for switch statements, and so on. These should simply be a question of adding the offset at which the binary is actually located (virtual address, of course) to the current value.
The other type is "external references", such as when your module calls, say, spinlock(). This is not implemented inside the module, so it will have an "external reference". In this case, there will be a relocation entry with "spinlock" as the name and an offset of where the call to spinlock goes in the module. Now you obviously need a symbol table to look up where in your kernel "spinlock" is located [and if you want to be really complicated, allow for modules to reference other modules, but I'd leave that until you have one module loading OK first!].
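The two cases can be sketched in C roughly as follows. This is a simplified model under stated assumptions: the record types struct reloc and struct ksym, and the functions apply_internal_relocs and kernel_lookup, are invented for illustration and do not match the real ELF relocation formats (Elf32_Rel and friends carry a type and a symbol index as well).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical internal relocation: byte offset, within the module image,
   of a word holding an absolute address computed for load address 0. */
struct reloc {
    size_t offset;
};

/* Hypothetical exported kernel symbol: name plus address. */
struct ksym {
    const char *name;
    uintptr_t addr;
};

/* Internal relocations: add the actual load address to every listed word. */
void apply_internal_relocs(uint8_t *image, size_t image_size,
                           const struct reloc *relocs, size_t nrelocs,
                           uintptr_t load_base)
{
    for (size_t i = 0; i < nrelocs; i++) {
        uintptr_t value;

        assert(relocs[i].offset + sizeof value <= image_size);
        /* memcpy avoids unaligned access and aliasing problems. */
        memcpy(&value, image + relocs[i].offset, sizeof value);
        value += load_base;
        memcpy(image + relocs[i].offset, &value, sizeof value);
    }
}

/* External references: look the name up in the kernel's own symbol table.
   Returns 0 when the symbol is not exported. */
uintptr_t kernel_lookup(const struct ksym *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].addr;
    return 0;
}
```

A loader along these lines would first run apply_internal_relocs over the module, then patch each external reference with the address returned by kernel_lookup, refusing to load the module if any lookup returns 0.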
Really, your question is about the linker, and the answer depends on which linker you are using.
If it is the standard linker ld, invoked through gcc, try the "-Wl,-r" option.

How to get memory locations of library functions?

I am compiling a C program with the SPARC RTEMS C compiler.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
I have also tried using the RCC nm utility, which returns a slightly more readable symbol table. I assume that the location given by this utility for, say, printf, is the location where printf is in memory and that every program that calls printf will reach that location during execution. Is this a valid assumption?
Is there any way to get a list of locations for all the library/system functions? Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library? It seems to me to be the latter, given the number of things I found in the symbol table and memory map. Can I make it link only the required functions?
Thanks for your help.
Most often, when using a dynamic library, the nm utility will not be able to give you the exact answer. Binaries these days use what is known as relocatable addresses. These addresses change when they are mapped to the process' address space.
"Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize."
The linker map will usually have all symbols -- yours, the standard libraries, runtime hooks etc.
"Is there any way to get a list of locations for all the library/system functions?"
The headers are a good place to look.
"Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library?"
Linking does not necessarily mean that all symbols will be resolved (i.e. given an address). It depends on the type of binary you are creating.
Some compilers, such as gcc, do however let you choose whether or not to create a relocatable binary. (For gcc you may check out exp files, dlltool, etc.) Check the appropriate documentation.
With dynamic linking,
1. your executable has a special place for all external calls (the PLT).
2. your executable has a list of libraries it depends on.
These two things are independent. From the executable alone, it is impossible to say which external function lives in which library.
When a program makes an external function call, what actually happens is that it calls an entry in the PLT, which jumps into the dynamic loader. The dynamic loader determines which function was called (via the PLT), looks up its name (via the symbol table in the executable) and searches for that name in ALL libraries that are mapped (all those the executable depends on). Once the name is found, the address of the corresponding function is written back to the PLT, so the next time the call is made directly, bypassing the dynamic linker.
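The "resolve on first call, then call directly" behaviour described above can be imitated in plain C with a function pointer that initially points at a resolver stub. This is only a simulation: all names (got_twice, plt_stub_twice, resolve, lib_twice) are hypothetical, and on common ELF implementations the patched slot is a GOT entry read by the PLT code rather than the PLT code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_t)(int);

/* Stand-in for a function that lives in some shared library. */
int lib_twice(int x)
{
    return 2 * x;
}

/* Toy version of "all symbols of all mapped libraries". */
struct sym {
    const char *name;
    func_t addr;
};
const struct sym loaded_symbols[] = { { "twice", lib_twice } };

int resolver_calls = 0; /* counts how often the slow path runs */

int plt_stub_twice(int x);

/* The call slot: starts out pointing at the stub, patched on first use. */
func_t got_twice = plt_stub_twice;

/* Slow path: look the name up in every loaded symbol table. */
func_t resolve(const char *name)
{
    resolver_calls++;
    for (size_t i = 0; i < sizeof loaded_symbols / sizeof loaded_symbols[0]; i++)
        if (strcmp(loaded_symbols[i].name, name) == 0)
            return loaded_symbols[i].addr;
    return NULL;
}

/* First call lands here: resolve, patch the slot, then forward the call. */
int plt_stub_twice(int x)
{
    func_t target = resolve("twice");

    assert(target != NULL);
    got_twice = target;
    return target(x);
}

/* What the program's call site effectively does: jump through the slot. */
int call_twice(int x)
{
    return got_twice(x);
}
```

The first call_twice(3) goes through the stub and the name lookup; every later call jumps straight to lib_twice, which is why resolver_calls stays at 1.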
To answer your question, you should do the same job as the dynamic linker does: get a list of the dependent libs, and look up all names in them. This could be done using the 'nm' or 'readelf' utility.
As for static linkage, I think all symbols in a given object file within libXXX.a get linked in. For example, suppose the static library libXXX.a consists of object files a.o, b.o and c.o. If you need a function foo(), and it resides in a.o, then a.o will be linked into your app, together with foo() and all other functions and data defined in it. This is the reason why, for example, C library functions are split one per file.
If you want to dynamically link you use dlopen/dlsym to resolve UNIX .so shared library entry points.
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Assuming you know the names of the functions you want to call, and which .so they are in, it is fairly simple.
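#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    int *iptr, (*fptr)(int);
    /* open the needed object; dlopen returns NULL on failure */
    void *handle = dlopen("/usr/home/me/libfoo.so", RTLD_LOCAL | RTLD_LAZY);
    if (handle == NULL) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    /* find the address of function and data objects */
    *(void **)(&fptr) = dlsym(handle, "my_function");
    iptr = (int *)dlsym(handle, "my_object");
    if (fptr == NULL || iptr == NULL) { fprintf(stderr, "symbol lookup failed\n"); dlclose(handle); return 1; }
    /* invoke function, passing value of integer as a parameter */
    (*fptr)(*iptr);
    dlclose(handle);
    return 0;
}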
If you want to get a list of all dynamic symbols, objdump -T file.so is your best bet (objdump -t file.a if you're looking for statically bound functions). Objdump is cross-platform and part of binutils, so in a pinch you can copy your binary files to another system and interrogate them with objdump on a different platform.
If you want dynamic linking to be optimal, you should take a look at your ld.so.conf, which specifies the search order for the ld.so.cache (so.cache right ;).

Resources