How does the g++ linker resolve symbols among .so files?

I understand object ordering is very important during linking. I've had a lot of headaches before trying to get ld to resolve all symbols. This time ld didn't generate any error, but the output is wrong!
The project is big (50K+ lines of C++) and I can't produce a simplified version, so I'll try to describe what I encountered. Hopefully some expert can help me figure it out.
g++ -o bad.out a.o b.o ... x.so y.so
g++ -o good.out a.o b.o ... y.so x.so
While good.out runs correctly, bad.out does not. Both x.so and y.so are provided by independent vendors, so their ordering should not matter. Here are more clues:
a.o uses x.so
b.o uses y.so
a.o and b.o are independent, but they both use some common classes
The incorrect behavior manifests as a function OnRspLogin() never being called back. This is a pure virtual function defined in y.so and implemented in b.o. "grep OnRspLogin *.o *.so" only found matches in y.so and b.o.
Apparently ld didn't resolve OnRspLogin() to the one in b.o, but which one did it resolve to? This worries me because the linker didn't generate any error or warning.
I'm using gcc 4.4.7-4 on CentOS 6.5.
EDIT:
I found that x.so and y.so both contain some common symbols (e.g. T TcpClient), so I guess the linker picked x.so:TcpClient (instead of y.so:TcpClient) when resolving b.o:TcpClient. While changing the .so order may solve this problem, I'm afraid that the linker may then incorrectly resolve some other symbols in a.o. So is there any way to tell the linker to resolve b.o using only y.so? Note that these .so files are provided by 3rd parties and I cannot change them.

I found x.so and y.so both contain some common symbols (e.g. T TcpClient)
That is a problem. If these symbols are supposed to be distinct, then you can't link these two libraries together -- they are not link compatible.
The usual way that vendors resolve this kind of problem is to use a distinct, vendor-specific prefix on all of their exported symbols (e.g. vendorA_TcpClient) and hide all other symbols.
Note that these .so files are provided by 3rd parties and I cannot change them.
You can tell the vendors that you can't use their library unless they avoid defining symbols that aren't prefixed with their unique identifiers, and that you are not going to pay them unless they resolve this problem. Vendors often become quite responsive when you do that.
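To see why swapping x.so and y.so changes the behavior, here is a minimal Python sketch of "first definition wins" symbol lookup. It is a simplified model, not the linker's or loader's actual code: it assumes the libraries are searched in the order they appear on the link line. The export lists are hypothetical; only TcpClient comes from the question, and x_api and y_api are made-up names.

```python
def bind_symbols(needed, libraries):
    """Bind each needed symbol to the first library, in search order, that defines it.

    `libraries` is an ordered list of (name, exported_symbols) pairs.
    Returns {symbol: library_name or None}.
    """
    bindings = {}
    for symbol in needed:
        bindings[symbol] = next(
            (name for name, exported in libraries if symbol in exported),
            None,  # no library defines the symbol
        )
    return bindings


# Hypothetical export lists; only TcpClient is taken from the question.
x_so = ("x.so", {"x_api", "TcpClient"})
y_so = ("y.so", {"y_api", "TcpClient"})

# b.o is written against y.so, so it expects y.so's TcpClient.
needed_by_b = ["y_api", "TcpClient"]

bad = bind_symbols(needed_by_b, [x_so, y_so])   # order used for bad.out
good = bind_symbols(needed_by_b, [y_so, x_so])  # order used for good.out

print(bad)   # TcpClient is bound to x.so
print(good)  # TcpClient is bound to y.so
```

In this model the second order fixes b.o, but a reference to TcpClient from a.o would then bind to y.so's copy as well, which is why reordering only moves the problem rather than removing it.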

Related

GCC/ld not using shared object with -l

I've read this question: ld cannot find shared library even with -L specified, but I'm asking a follow-up: why does GCC do this?
This is something I ran into while building a binary with links to two in-house libraries.
gcc cannot find the symbols from one of the libraries with -l, but uses the other one just fine!
Originally, the command from my Makefile was gcc baz.o qux.o -lfoo -lbar, with the linker unable to find the symbols from libfoo, while finding the symbols from -lbar. The libraries are the exact same type of file in the same locations: headers in /usr/local/include and libraries in /usr/local/lib. In fact, libfoo depends on libbar.
Corrected, the command is now gcc baz.o qux.o -lbar /usr/local/lib/libfoo.so. I have determined this is not an ordering issue.
Why does gcc need the shared object instead of an -l? Is there a better way to do this, other than using the absolute path? This solution seems kludgy to me.
The exact code and output I'm using are as follows:
Output from using -lsandbox
The content of libsandbox.h
The Makefile I'm referencing, annotated.
Thanks!

The problem of link order when dynamic link libraries depend on each other

Sorry for my bad english.
During compilation with GCC, suppose main.o depends on liba.so, and liba.so depends on libb.so.
Then you should link liba.so first and then libb.so; otherwise, an error will occur.
The reasons I learned are:
The compiler traverses all .o and .so modules in sequence and puts any undefined symbols it encounters into a list U.
While traversing the modules in sequence, the symbols defined in each .o or .so are used to resolve the symbols in list U.
If there are still undefined symbols in U at the end of the traversal, an undefined symbol error is reported.
So if liba.so and libb.so depend on each other, in theory I need to link them like this:
-la -lb -la
However, the actual operation shows that liba.so does not need to be linked twice
Why?
Is the link principle I learned wrong, or did the compiler optimize it
if main.o depends on liba.so, and liba.so depends on libb.so
Then you should link liba.so first and then libb.so. Conversely, an error will occur
You got that backwards: if liba.so depends on libb.so, then the correct link order is -la -lb.
However, the actual operation shows that liba.so does not need to be linked twice
Why?
In general, for UNIX linkers, the order matters only for archive libraries.
Unlike with archive libraries, when linking shared libraries, you get the entire library, so if it appears on the link line once, there is never a need to repeat it again.
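The difference can be illustrated with a small Python model of a left-to-right link. It is a sketch under simplifying assumptions, not how ld is implemented: an archive contributes only those members that define a currently undefined symbol, while a shared library contributes all of its definitions as soon as it is seen. The member and symbol names (a_func, a_helper, b_func) are hypothetical.

```python
def link(main_needs, libraries):
    """Return the symbols still undefined after one left-to-right pass.

    `libraries` is an ordered list of (kind, members); `kind` is "archive"
    or "shared", and each member is a (defines, needs) pair of sets.
    """
    defined, undefined = set(), set(main_needs)
    for kind, members in libraries:
        pending = list(members)
        progress = True
        while progress:  # keep scanning this library until nothing new is pulled in
            progress = False
            for member in list(pending):
                defines, needs = member
                # A shared library is taken whole; an archive member is
                # taken only if it defines something currently undefined.
                if kind == "shared" or defines & undefined:
                    pending.remove(member)
                    defined |= defines
                    undefined |= needs
                    undefined -= defined
                    progress = True
    return undefined


# Hypothetical mutually dependent libraries.
lib_a = [({"a_func"}, {"b_func"}),   # member needing libb
         ({"a_helper"}, set())]      # member that libb needs
lib_b = [({"b_func"}, {"a_helper"})]

print(link({"a_func"}, [("archive", lib_a), ("archive", lib_b)]))
print(link({"a_func"}, [("archive", lib_a), ("archive", lib_b), ("archive", lib_a)]))
print(link({"a_func"}, [("shared", lib_a), ("shared", lib_b)]))
```

With archives, the first call leaves a_helper undefined and listing the first archive again resolves it. With shared libraries, a single mention of each is enough in this model.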
To understand why you might need to repeat an archive library on the link line, read this.
Is the link principle I learned wrong, or did the compiler optimize it
The "principle" you stated is wrong (backwards) and the compiler is not involved in the link stage at all.

Linking error in static lib unless used in main project

I'm creating a little static library for having thread pools, and it depends on 2 other homemade static libraries (a homemade printf and a homemade mini libc).
But sub-functions like ft_bzero are not linked into the project unless I use them in the root project, the one that needs to use the thread pool library. So I get linking errors coming from my thpool lib.
Sample :
cc -Wall -Werror -Wextra -MD -I ./ -I ./jqueue -I ../libft/incs -I ../printf/incs -o .objs/thpool_create.o -c ./thpool_create.c
ar rc libthpool.a ./.objs/thpool_create.o etcetc
In the libraries, I compile every .o and use an ar rc libthpool.a *.o. Then I compile .o from main project (a single test.c actually), and then
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lft -lftprintf -lthpool -lpthread
How can I solve my errors?
Since the code in the thpool library uses code from ft and ftprintf, you (almost certainly) need to list the libraries in the reverse order:
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lthpool -lftprintf -lft -lpthread
When scanning a static library, the linker looks for definitions of symbols that are currently undefined. If your test code only calls functions from thpool, then none of the symbols in ft are referenced when the ft library is scanned, so nothing is included from the library; if none of the symbols from ftprintf are referenced when the ftprintf library is scanned, nothing is included from ftprintf either. When it comes across the symbols in thpool that reference things from ft or ftprintf, it's too late; the linker doesn't rescan the libraries. Hence you need to list the libraries in an order such that all references from one library (A) to another (B) are found by linking (A) before (B). If the test code references some of the functions in ft or ftprintf, you may get lucky, or a bit lucky; some symbols may be linked in. But if there are functions in thpool that make the first reference to a function in ft, with the order in the question, you've lost the chance to link everything. Hence the suggested reordering.
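The single-pass behavior described above can be sketched in a few lines of Python. This is a deliberately coarse model, not ld's implementation: each archive is treated as one unit that is pulled in only if it defines a symbol that is undefined at the moment it is scanned. Apart from ft_bzero, the symbol names are assumptions made for illustration, as is the dependency of ftprintf on ft.

```python
def unresolved(object_needs, archives):
    """Scan archives once, left to right; return the symbols left undefined.

    Each archive is a (defines, needs) pair of sets.
    """
    defined, undefined = set(), set(object_needs)
    for defines, needs in archives:
        if not (defines & undefined):
            continue  # nothing wanted from this archive yet, and it is not rescanned
        defined |= defines
        undefined = (undefined | needs) - defined
    return undefined


# Hypothetical symbol tables for the three libraries in the question.
ft = ({"ft_bzero"}, set())
ftprintf = ({"ft_printf"}, {"ft_bzero"})
thpool = ({"thpool_create"}, {"ft_bzero", "ft_printf"})
test_needs = {"thpool_create"}  # test.o only calls into thpool

print(sorted(unresolved(test_needs, [ft, ftprintf, thpool])))  # order in the question
print(sorted(unresolved(test_needs, [thpool, ftprintf, ft])))  # suggested order
```

In this model the original order leaves ft_bzero and ft_printf undefined, while the reordered command line resolves everything.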
Another (very grubby, but nonetheless effective) technique is to rescan the static libraries by listing them several times on the command line.
With shared libraries, the rules of linking are different. If a shared library satisfies any symbol, the whole library will be available, so the linker remembers all the defined symbols, and you might well get away with the original link order.
You might need to look up 'topological sort'. You should certainly aim to design your static libraries so that there are no loops in the dependencies. Loops create cyclic dependencies, for which the only reliable solutions are either to rescan the libraries or to combine them.
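As a rough illustration of the topological sort idea, the following Python sketch derives a link order from a dependency map using the standard library's graphlib module (Python 3.9 or later). The dependency map is an assumption based on the question: thpool uses ft and ftprintf, and ftprintf is assumed to use ft.

```python
from graphlib import CycleError, TopologicalSorter


def link_order(dependencies):
    """Return libraries ordered so each one precedes the libraries it depends on.

    `dependencies` maps a library name to the set of libraries it uses.
    """
    # static_order() yields dependencies before their dependents; a link
    # line needs the opposite, so the result is reversed.
    return list(reversed(list(TopologicalSorter(dependencies).static_order())))


# Assumed dependency map for the libraries in the question.
deps = {"thpool": {"ft", "ftprintf"}, "ftprintf": {"ft"}, "ft": set()}
print(" ".join("-l" + lib for lib in link_order(deps)))

# A dependency loop has no valid order and is reported as an error.
try:
    link_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle: rescan or combine the libraries")
```

The first call prints the same library order as the suggested command line. The second shows that a cyclic dependency cannot be sorted at all, which is the case where relisting or merging the archives becomes necessary.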

Undefined reference to function of another lib

Yeah, I know many people asked that question before, but I still can't understand the problem in my case
I have 2 libs, let's say liba & libb. libb uses liba but is compiled in .a so it should link at compile time.
I have the following GCC command:
gcc -o my_program obj/mymain.o obj/myutils.o liba/liba.a libb/libb.a -Iinclude -Iliba -Ilibb
But GCC is returning me a lot of "Undefined reference to ..." from libb functions to liba functions.
What is happening? What should I do?
Thank you
The order of the arguments on a link command line is very important.
When the linker sees .o files, they are added to the target binary unconditionally, so all .o files are present. That leaves a list of undefined symbols which need to be found.
The next stage is to look through the libraries. Each library is searched in turn, and every .o member that satisfies an undefined reference is added to the target binary. That always resolves some references, but the added member may have undefined references of its own, so pulling in part of a library can add to the list of symbols still to be satisfied.
When a library requires another library, it needs to be specified after whatever requires it, and before the libraries which satisfy its requirements.
If the .o files happen to require the same parts of a library, the link may succeed by luck, and the problem then crops up later when that code is deleted from the .o (removing the reference that pulled in the library member).

Statically linking against LAPACK

I'm attempting to do a release of some software and am currently working through a script for the build process. I'm stuck on something I never thought I would be: statically linking LAPACK on x86_64 Linux. During configuration AC_SEARCH_LIB([main],[lapack]) works, but compilation of the units that use LAPACK does not, for example: undefined reference to 'dsyev_'. Every LAPACK/BLAS routine is reported this way.
I've confirmed I have the libraries installed and even compiled them myself with the appropriate options to make them static with the same results.
Here is an example I had used in my first experience with LAPACK a few years ago that works dynamically, but not statically: http://pastebin.com/cMm3wcwF
The two methods I'm using to compile are the following,
gcc -llapack -o eigen eigen.c
gcc -static -llapack -o eigen eigen.c
Your linking order is wrong. Link libraries after the code that requires them, not before. Like this:
gcc -o eigen eigen.c -llapack
gcc -static -o eigen eigen.c -llapack
That should resolve the linkage problems.
To answer the subsequent question of why this works, the GNU ld documentation says this:
It makes a difference where in the command you write this option; the
linker searches and processes libraries and object files in the order
they are specified. Thus, `foo.o -lz bar.o' searches library `z' after
file foo.o but before bar.o. If bar.o refers to functions in `z',
those functions may not be loaded.
........
Normally the files found this way are library files—archive files
whose members are object files. The linker handles an archive file by
scanning through it for members which define symbols that have so far
been referenced but not defined. But if the file that is found is an
ordinary object file, it is linked in the usual fashion.
In other words, the linker makes one pass through each file looking for unresolved symbols, and it processes files in the order you provide them (i.e. "left to right"). If nothing has yet referenced a symbol when the library defining it is read, the linker will not go back to satisfy a later reference to it. Every object in the link list is parsed only once.
Note also that GNU ld can do reordering in cases where circular dependencies are detected when linking shared libraries or object files. But static libraries are only parsed for unknown symbols once.
