Add dynamic dependency to shared object in C++ - linker

I want to create shared object B that dynamically links to shared object A. I'm using the following command to compile shared object B:
g++ -fPIC -shared -L/path/to/directory -lA -o libB.so B.cpp
It is my understanding that -lA is what tells the linker that libB.so should dynamically link to /path/to/directory/libA.so. However, when I do ldd on the final product, the dependency is not listed (and loading libB.so fails because of these missing dependencies).
ldd libB.so
linux-vdso.so.1 => (0x00007ffd4233e000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f35072fe000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f35070e7000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3506de1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3506a1c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3507807000)
Am I wrong about what -l is supposed to do? I assume that the above is a minimal set of dynamic dependencies from C++.
Are there any gotchas to look for? For instance, does the linker simply ignore -l requests when it can't find the file or something (and I have to debug my paths more than I already have)?
Do I have to put something in my C++ code to indicate dependency (like "extern" functions or something)?
Update:
I have determined that the set of dynamic dependencies that ldd reports does depend on my C++ code and does not appear to depend on any -L or -l flags I supply. The linker is automatically guessing which shared objects my libB.so should depend on, and it's not assuming enough.
For instance, I know that I'll need B to load A because I call some code that eventually calls code in libA.so. How do I provide this information to the linker?
Clarifications:
What I'm calling "shared object A" is a complex thing that may load some code dynamically. I want to include enough dependencies so that it will not fail with "missing symbols" when it tries to load this code dynamically. That's why I want to force dependencies in B, because the linker might not statically find them in the dependency tree.
Also, I'm using g++ 4.8.4 (Ubuntu 14.04). This is relevant because Ubuntu's build of g++ started implicitly applying -Wl,--as-needed as of version 4.6.

The dependency is probably not listed in ldd's output because no symbol used in B.cpp is found in libA.so. This can happen because of C++ name mangling: if libA.so was compiled by a C compiler, it stores the symbol for the function void foo() under the plain name foo, whereas a C++ compiler will mangle a reference to it into something like _Z3foov. You can check this with the following commands:
$ # Replace "SomeSharedObject" with the actual name of symbol exported by libA.so.
$ strings libA.so | grep SomeSharedObject
$ strings libB.so | grep SomeSharedObject
To avoid this, put the declaration of the symbol foo inside an extern "C" {} block. The compiler will then not mangle the name, and the linker will probably find it in libA.so.
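As a rough illustration of the mangling difference, the sketch below compiles the same call twice, once with a plain C++ declaration and once with an extern "C" declaration, and lists the undefined symbols with nm (which is more precise than strings for this check). The directory and file names are made up for the demo, and it assumes g++ and binutils on Linux.

```shell
# Hypothetical demo directory; assumes g++ and nm (binutils) are installed.
d=mangle_demo
mkdir -p "$d"

# Same call to foo(), declared with C++ linkage...
printf 'void foo();\nvoid bar() { foo(); }\n' > "$d/mangled.cpp"
# ...and declared with C linkage.
printf 'extern "C" void foo();\nvoid bar() { foo(); }\n' > "$d/plain.cpp"

g++ -c "$d/mangled.cpp" -o "$d/mangled.o"
g++ -c "$d/plain.cpp" -o "$d/plain.o"

# The first object should reference the mangled name (_Z3foov),
# the second the plain name (foo).
nm --undefined-only "$d/mangled.o"
nm --undefined-only "$d/plain.o"
```

Running nm -D --defined-only on libA.so in the same way shows which of the two spellings the library actually exports.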
The -l option works as you expect in a laboratory environment:
$ cat bar.cpp
extern void foo();
void bar()
{
foo();
}
$ cat baz.cpp
extern void bar();
void baz()
{
bar();
}
$ # Link against libssl.so (OpenSSL).
$ # Obviously libssl.so is unnecessary in libbar.so.
$ g++ -fPIC -shared bar.cpp -o libbar.so -lssl
$ ldd libbar.so
statically linked
$ # Link against libbar.so in the current directory
$ g++ -fPIC -shared baz.cpp -o libbaz.so -L`pwd` -lbar
$ ldd libbaz.so
linux-vdso.so.1 => (0x00007fff7cfe2000)
libbar.so => not found
Here libbar.so depends on the function foo, but foo wasn't found in any library, including libssl.so, so no dependency was recorded and ldd reports libbar.so as "statically linked". Any symbols left unresolved when libbar.so was produced will be searched for again when creating the final executable that depends on libbar.so.
In turn, libbaz.so depends on libbar.so, because the function void bar() was found in that shared object, which we specified via the -l option. If we omit the -L option, the linker reports an error like -lbar: not found. If we omit both -L and -l, libbaz.so does not depend on any shared object, just like libbar.so.

You are compiling to a weird object name. Linking will happen when you create an executable; then you have to mention your various objects and libraries.

Related

Can a dynamic library depend on a static library in C and vice-versa?

I am trying to understand static libraries and shared objects in C. I am trying to understand whether one type of library can depend on other type.
Consider a scenario:
libA.so has a function foo_A_dyn():
libA.so ---> foo_A_dyn()
foo_A_dyn() uses a function foo_B_static() which is defined in libB.a which is a static library.
libB.a ---> foo_B_static()
I have built my libraries in the following way:
gcc -c foo_B.c -o foo_B.o
ar -cvq libB.a foo_B.o
gcc -fPIC -c foo_A.c -o foo_A.o
gcc -shared -o libA.so foo_A.o -I.
gcc main.c -lA -lB -L. -I. -o EXE
Note: main.c makes call to foo_A_dyn() and does NOT call foo_B_static() directly.
And now when I am trying to build my executable EXE, I am getting the error "undefined reference to foo_B_static".
I think the error seems genuine but I am not able to decode the rationale behind this and put it to words.
Can someone please help?
From gcc link options:
-llibrary
-l library
...
It makes a difference where in the command you write this option; the linker searches and processes libraries and object files in the order they are specified. Thus, ‘foo.o -lz bar.o’ searches library ‘z’ after file foo.o but before bar.o. If bar.o refers to functions in ‘z’, those functions may not be loaded.
Try:
gcc main.c -lB -lA -L. -I. -o EXE
Here's what the linker is doing. When we link our executable ('EXE' above) it has some symbols (functions and other things) that are unresolved. It will look down the list of libraries that follow in sequential order, trying to resolve unresolved symbols. Along the way, it finds that some of the symbols are provided by libB.so, so it notes that they are now resolved by this library. While going through libB.so it finds some symbols which are unresolved and it tries to resolve them by looking up the library that follows.
When we are ordering the libraries like:
gcc main.c -lA -lB -L. -I. -o EXE
the linker is not able to look up in libA the definitions of symbols used in libB. The reason could be that backward references are not available.
I have also figured out that:
shared object can depend on a static archive,
a static archive can depend on a shared object, and
one static archive can depend on another static archive
Please let me know if I have erred somewhere.

What's the difference between `-rpath-link` and `-L`?

The man page for gold states:
-L DIR, --library-path DIR
Add directory to search path
--rpath-link DIR
Add DIR to link time shared library search path
The man page for bfd ld makes it sort of sound like -rpath-link is used for recursively included shared objects.
ld.lld doesn't even list it as an argument.
Could somebody clarify this situation for me?
Here is a demo, for GNU ld, of the difference between -L and -rpath-link -
and for good measure, the difference between -rpath-link and -rpath.
foo.c
#include <stdio.h>
void foo(void)
{
puts(__func__);
}
bar.c
#include <stdio.h>
void bar(void)
{
puts(__func__);
}
foobar.c
extern void foo(void);
extern void bar(void);
void foobar(void)
{
foo();
bar();
}
main.c
extern void foobar(void);
int main(void)
{
foobar();
return 0;
}
Make two shared libraries, libfoo.so and libbar.so:
$ gcc -c -Wall -fPIC foo.c bar.c
$ gcc -shared -o libfoo.so foo.o
$ gcc -shared -o libbar.so bar.o
Make a third shared library, libfoobar.so, that depends on the first two:
$ gcc -c -Wall -fPIC foobar.c
$ gcc -shared -o libfoobar.so foobar.o -lfoo -lbar
/usr/bin/ld: cannot find -lfoo
/usr/bin/ld: cannot find -lbar
collect2: error: ld returned 1 exit status
Oops. The linker doesn't know where to look to resolve -lfoo or -lbar.
The -L option fixes that.
$ gcc -shared -o libfoobar.so foobar.o -L. -lfoo -lbar
The -Ldir option tells the linker that dir is one of the directories to
search for libraries that resolve the -lname options it is given. It searches
the -L directories first, in their commandline order; then it searches its
configured default directories, in their configured order.
Now make a program that depends on libfoobar.so:
$ gcc -c -Wall main.c
$ gcc -o prog main.o -L. -lfoobar
/usr/bin/ld: warning: libfoo.so, needed by ./libfoobar.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libbar.so, needed by ./libfoobar.so, not found (try using -rpath or -rpath-link)
./libfoobar.so: undefined reference to `bar'
./libfoobar.so: undefined reference to `foo'
collect2: error: ld returned 1 exit status
Oops again. The linker detects the dynamic dependencies requested by libfoobar.so
but can't satisfy them. Let's resist its advice - try using -rpath or -rpath-link -
for a bit and see what we can do with -L and -l:
$ gcc -o prog main.o -L. -lfoobar -lfoo -lbar
So far so good. But:
$ ./prog
./prog: error while loading shared libraries: libfoobar.so: cannot open shared object file: No such file or directory
at runtime, the loader can't find libfoobar.so.
What about the linker's advice then? With -rpath-link, we can do:
$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath-link=$(pwd)
and that linkage also succeeds. ($(pwd) is shell command substitution: it runs pwd, "print working directory", and substitutes the absolute path of the current directory.)
The -rpath-link=dir option tells the linker that when it encounters an input file that
requests dynamic dependencies - like libfoobar.so - it should search directory dir to
resolve them. So we don't need to specify those dependencies with -lfoo -lbar and don't
even need to know what they are. What they are is information already written in the
dynamic section of libfoobar.so:
$ readelf -d libfoobar.so
Dynamic section at offset 0xdf8 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libfoo.so]
0x0000000000000001 (NEEDED) Shared library: [libbar.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
...
...
We just need to know a directory where they can be found, whatever they are.
But does that give us a runnable prog?
$ ./prog
./prog: error while loading shared libraries: libfoobar.so: cannot open shared object file: No such file or directory
No. Same story as before. That's because -rpath-link=dir gives the linker the information
that the loader would need to resolve some of the dynamic dependencies of prog
at runtime - assuming it remained true at runtime - but it doesn't write that information into the dynamic section of prog.
It just lets the linkage succeed, without our needing to spell out all the recursive dynamic
dependencies of the linkage with -l options.
At runtime, libfoo.so, libbar.so - and indeed libfoobar.so -
might well not be where they are now - $(pwd) - but the loader might be able to locate them
by other means: through the ldconfig cache or a setting
of the LD_LIBRARY_PATH environment variable, e.g:
$ export LD_LIBRARY_PATH=.; ./prog
foo
bar
-rpath=dir provides the linker with the same information as -rpath-link=dir
and instructs the linker to bake that information into the dynamic section of
the output file. Let's try that:
$ export LD_LIBRARY_PATH=
$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
$ ./prog
foo
bar
All good. Because now, prog contains the information that $(pwd) is a runtime search
path for shared libraries that it depends on, as we can see:
$ readelf -d prog
Dynamic section at offset 0xe08 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libfoobar.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000f (RPATH) Library rpath: [/home/imk/develop/so/scrap]
... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
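Because the entry shown is DT_RPATH, that search path is tried before the directories listed in LD_LIBRARY_PATH. If the linker had emitted DT_RUNPATH instead (as it does with --enable-new-dtags, which some distributions' toolchains enable by default), the path would be tried after LD_LIBRARY_PATH. In both cases it is tried before the system defaults - the ldconfig-ed directories, plus /lib and /usr/lib.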
The --rpath-link option is used by bfd ld to add to the search path used for finding DT_NEEDED shared libraries when doing link-time symbol resolution. It's basically telling the linker what to use as the runtime search path when attempting to mimic what the dynamic linker would do when resolving symbols (as set by --rpath options or the LD_LIBRARY_PATH environment variable).
Gold does not follow DT_NEEDED entries when resolving symbols in shared libraries, so the --rpath-link option is ignored. This was a deliberate design decision; indirect dependencies do not need to be present or in their runtime locations during the link process.

Isn't ld checking for unresolved symbols in shared libraries redundant?

When linking a program against a shared object, ld will ensure that symbols can be resolved. This basically ensures that the interfaces between the program and its shared objects are compatible. After reading Linking with dynamic library with dependencies, I learnt that ld will descend into linked shared objects and attempt to resolve their symbols too.
Aren't my shared object's references already checked when the shared objects are themselves linked?
I can understand the appeal of finding out at link time whether a program has all the pieces it requires to start, but it seems irrelevant in the context of package building, where shared objects may be distributed separately (Debian's lib* packages, for instance). It introduces recursive build dependencies on systems uninterested in executing built programs.
Can I trust the dependencies resolved when the shared object was built? If so, how safe is it to use -unresolved-symbols=ignore-in-shared-libs when building my program?
You're wondering why a program's linkage should bother to resolve symbols originating in
the shared libraries that it's linked with because:
Aren't my shared object's references already checked when the shared objects are themselves linked?
No, they're not, unless you expressly insist on it when you link the shared library.
Here I'm going to build a shared library libfoo.so:
foo.c
extern void bar();
void foo(void)
{
bar();
}
Routinely compile and link:
$ gcc -fPIC -c foo.c
$ gcc -shared -o libfoo.so foo.o
No problem, and bar is undefined:
$ nm --undefined-only libfoo.so | grep bar
U bar
I need to insist to get the linker to object to that:
$ gcc --shared -o libfoo.so foo.o -Wl,--no-undefined
foo.o: In function `foo':
foo.c:(.text+0xa): undefined reference to `bar'
Of course:
main.c
extern void foo(void);
int main(void)
{
foo();
return 0;
}
it won't let me link libfoo with a program:
$ gcc -c main.c
$ gcc -o prog main.o -L. -lfoo
./libfoo.so: undefined reference to `bar'
unless I also resolve bar in the same linkage:
bar.c
#include <stdio.h>
void bar(void)
{
puts("Hello world!");
}
maybe by getting it from another shared library:
$ gcc -fPIC -c bar.c
$ gcc -shared -o libbar.so bar.o
$ gcc -o prog main.o -L. -lfoo -lbar
And then everything's fine.
$ export LD_LIBRARY_PATH=.; ./prog
Hello world!
It's of the essence of a shared library that it doesn't by default have
to have all of its symbols resolved at linktime. That way a program - which
typically does need all its symbols resolved at linktime - can get all its symbols
resolved by being linked with more than one library.
Aren't my shared object's references already checked
when the shared objects are themselves linked?
Well, shared libs might have been linked with -Wl,--allow-shlib-undefined or with dummy dependencies so it still makes sense to check them.
Can I trust the dependencies resolved when the shared object was built?
Probably not; the current linking environment and the environment used to link the original shlibs may be different.
If so, how safe is it to use -unresolved-symbols=ignore-in-shared-libs
when building my program?
You may be missing potential errors in this case (or rather delaying them to runtime, which is still bad). Imagine a situation where some of the symbols needed by shared objects are to come from the executable itself or from one of the libs which is linked by the executable (but not by the shlib which is missing the symbols).
EDIT
Although the above is correct, Mike Kinghan's answer gives a stronger argument in favor of symbol resolution in libraries during the executable link.

Are runtime libraries inherently dynamic libraries?

I am cross compiling for a system with an OpenMP parallelized program, but when I run on the target, I get the error:
can't load library 'libgomp.so.1'
After looking around, I see that it's an OpenMP runtime library. Is there any way to statically link the library on the compiler host machine, or does it need to be present on the target machine? If it can be statically linked, then what makes a runtime library different from a dynamic library? Could one statically or dynamically link any library, provided the environment was correct?
You can selectively statically link certain libraries by providing certain linker options. For libgomp it would be something like:
gcc -o executable foo.o bar.o -Wl,-static -lgomp -Wl,-Bdynamic -lpthread -lother -llibs
Any library listed between -Wl,-static and -Wl,-Bdynamic will be linked in statically. -fopenmp should not be present in the linking command as it expands to linker flags that get appended after the user supplied options, and therefore libpthread should be listed explicitly. It also means that even simple OpenMP programs have to be compiled and linked in two separate steps for static linking to work.
Example:
// foo.c
#include <stdio.h>
#include <omp.h>
int main(void)
{
#pragma omp parallel
printf("Hello world from thread %d\n", omp_get_thread_num());
return 0;
}
Traditional compilation:
$ gcc -fopenmp -o foo foo.c
$ ldd foo
linux-vdso.so.1 => (0x00007ffff5661000)
libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x0000003bcfa00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003bc2600000)
libc.so.6 => /lib64/libc.so.6 (0x0000003bc1e00000)
librt.so.1 => /lib64/librt.so.1 (0x0000003bc3200000)
/lib64/ld-linux-x86-64.so.2 (0x0000003bc1a00000)
The program is linked against the DSO version of libgomp.
Fully static linking:
$ gcc -fopenmp -static -o foo foo.c
$ ldd foo
not a dynamic executable
With -static all libraries are linked in statically into the executable.
Linking only libgomp statically:
$ gcc -fopenmp -c foo.c
$ gcc -o foo foo.o -Wl,-static -lgomp -Wl,-Bdynamic -lpthread
$ ldd foo
linux-vdso.so.1 => (0x00007ffdaaf61000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003bc2600000)
libc.so.6 => /lib64/libc.so.6 (0x0000003bc1e00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003bc1a00000)
It is important to maintain the correct order of statically linked objects in that case. If foo.o is placed after -lgomp, a link error results:
$ gcc -o foo -Wl,-static -lgomp -Wl,-Bdynamic foo.o -lpthread
foo.o: In function `main':
foo.c:(.text+0x14): undefined reference to `GOMP_parallel_start'
foo.c:(.text+0x23): undefined reference to `GOMP_parallel_end'
foo.o: In function `main.omp_fn.0':
foo.c:(.text+0x3b): undefined reference to `omp_get_thread_num'
collect2: ld returned 1 exit status
Any object file resulting from source code that contains OpenMP constructs should be placed before -lgomp.
The term "runtime library" is usually used for the standard library and environment needed to run your program. In the case of a C program, it's the C standard library, maybe some other libraries specific for your compiler, and some object files linked to your program to set up the standard C environment.
A "runtime library" can be a dynamic library, or even a collection of multiple dynamic libraries. But it can also be one or more static libraries as well.
Dynamic libraries are a convenient way of providing runtime libraries, as many programs will potentially want to link against such a library. This is why dynamic linking and dynamic libraries exist: the first process that requires a particular dynamic library causes it to be loaded into memory, and later this instance of the library can be reused by many processes.
Imagine you had tens of processes running and each of them statically linked against, say, the C runtime library. I assume memory consumption would grow rather significantly compared to the case where each of these processes links against a single DLL.
In my opinion, if a library will be used by many different processes (DirectX might be such a library, for example), it might be more efficient to provide it as a dynamic library. Otherwise, static linking is preferable.

Will my linker link objects from unneeded source files?

When compiling a project, if my source files include a file with no used functions, will the unneeded object file be included in the compiler's output?
e.g.
foo.c
int main() {return 0;}
bar.c
void unusedFunction() {;}
compiler execution:
gcc foo.c bar.c -o output
Would the output file be any smaller if I had omitted bar.c from the compiler command?
Most linkers will link unneeded files unless you tell them not to. There are flags for this.
Suppose you link with two unneeded files: an object file and a library:
gcc main.o unneeded.o -lunneeded
With GNU Binutils or Gold, the flags are --gc-sections for unneeded symbols / object files, and --as-needed for libraries. These are linker flags, however, so they must be prefixed with -Wl,. Note that the order of these flags is important—flags only apply to libraries and object files which appear after the flags on the command line, so the flags must be specified first.
gcc -Wl,--as-needed -Wl,--gc-sections main.o unneeded.o -lunneeded
On OS X, there is a different linker so the flags are different. The -dead_strip flag removes unneeded symbols / object files, and the -dead_strip_dylibs flag removes unneeded libraries.
gcc -Wl,-dead_strip -Wl,-dead_strip_dylibs main.o unneeded.o -lunneeded
Example
$ cat main.c
int main() { }
$ cat unneeded.c
void unneeded() { }
$ gcc -c main.c
$ gcc -c unneeded.c
If we link normally, we get everything...
$ gcc main.o unneeded.o -lz
$ nm a.out | grep unneeded
0000000000400574 T unneeded
$ readelf -d a.out | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
With the right flags, we get just what we need...
$ gcc -Wl,--as-needed -Wl,--gc-sections main.o unneeded.o -lz
$ nm a.out | grep unneeded
$ readelf -d a.out | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
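--gc-sections works on whole input sections, so in the example above unneeded() can be dropped because nothing in its object file is referenced. To let the linker drop an unused function that shares a source file with used ones, the usual companion is to compile with -ffunction-sections (and -fdata-sections for data), which gives each function its own section. A sketch, assuming gcc and GNU ld on Linux; the directory and function names are invented:

```shell
# Hypothetical demo directory and function names; assumes gcc and GNU ld.
d=gc_sections_demo
mkdir -p "$d"
cat > "$d/main.c" <<'EOF'
void used(void) {}
void unused(void) {}
int main(void) { used(); return 0; }
EOF

# Put every function and data object into its own section...
gcc -ffunction-sections -fdata-sections -c "$d/main.c" -o "$d/main.o"
# ...so the linker can discard the sections nothing refers to.
gcc -Wl,--gc-sections -o "$d/prog" "$d/main.o"

# Only 'used' is expected to remain in the symbol table.
nm "$d/prog" | grep -w -e used -e unused
```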
Every source or object file referenced in the compilation command will be included in the final executable.
To include only what's needed, build a static library from your objects, and refer to it when linking, rather than to individual objects. Static libraries were invented just for this purpose.
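The difference between naming an object file directly and going through a static library can be seen with a sketch like the following. It reuses foo.c and bar.c from the question; the demo directory and output names are invented, and gcc plus binutils are assumed.

```shell
# Hypothetical demo directory and output names; assumes gcc, ar and nm.
d=archive_demo
mkdir -p "$d"
printf 'int main(void) { return 0; }\n' > "$d/foo.c"
printf 'void unusedFunction(void) {}\n' > "$d/bar.c"
gcc -c "$d/foo.c" -o "$d/foo.o"
gcc -c "$d/bar.c" -o "$d/bar.o"
ar rcs "$d/libbar.a" "$d/bar.o"

# Object file named directly: it is linked in unconditionally.
gcc "$d/foo.o" "$d/bar.o" -o "$d/with_object"
# Same object inside an archive: the linker extracts a member only if
# it defines a symbol that is still undefined at that point.
gcc "$d/foo.o" -L"$d" -lbar -o "$d/with_archive"

nm "$d/with_object" | grep unusedFunction
nm "$d/with_archive" | grep unusedFunction || echo "unusedFunction not linked"
```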
Use something like Scons (http://www.scons.org/) or Gradle (http://www.gradle.org/), which will figure out the dependencies and then link just the appropriate bits (assuming you are using static linking).
Dynamic linking is another matter — you get the lot as other programs may need the additional stuff.
But as to the command line given it will link it in. Why add it in the first place if it is not required?

Resources