Dynamic library "forwarding" - c

[Edit: In short, the question is: when I link against a dynamic library that is linked against another dynamic library, do I have to explicitly link against that as well?]
I saw something like this in a piece of software. It doesn't work and now I am wondering whether it is supposed to work. Can I link a library "bar" dynamically against another library "foo" and then link against that library to access the symbols from "foo" (because "bar" should want to be linked to "foo")? (I am using linux and gcc 4.8.2 in case that matters.)
Concretely suppose I have the three files below. Now I do
gcc -c -Wall -Werror -fpic foo.c
gcc -shared -olibfoo.so foo.o
at which point I would usually do
gcc -o program main.c -L. -lfoo
to get a working program. Now instead I do
gcc -shared -olibbar.so -L. -lfoo
gcc -o program main.c -L. -lbar
This doesn't work:
/tmp/cciNSTyI.o: In function `main':
main.c:(.text+0xf): undefined reference to `foo'
collect2: error: ld returned 1 exit status
Should it?
foo.h
#ifndef foo_h__
#define foo_h__
extern void foo(void);
#endif
foo.c
#include <stdio.h>
void foo(void)
{
puts("foo");
}
main.c
#include <stdio.h>
#include "foo.h"
int main(void)
{
puts("Library test...");
foo();
return 0;
}
Edit: I wrote an answer about my understanding of what's going on below.
One thing I'm still not quite clear about is the order of arguments: If (with a file bar.c as in that answer) I build bar with the lines (note the position of "bar.o")
gcc -c -Wall -Werror -fpic bar.c
gcc -shared -olibbar.so -L. -lfoo bar.o
then "bar" does not depend on "foo":
> readelf -d libbar.so
Dynamic section at offset 0xe18 contains 24 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x5a8
[...]

No, it is not possible to "forward" dynamically linked libraries.
When you link against any library, statically or dynamically, you actually enable your main executable to call on the functions/symbols defined in the linked library.
In your case, the library bar does not define the function foo(). When program is linked, only the symbols that libbar.so itself exports are available to resolve the references in main.c. Since those symbols do not include foo(), the reference to foo() in main.c stays unresolved and the linker reports the "undefined reference" error shown in the question. The compile step itself does not fail, because foo.h supplies a declaration of foo().
You need to explicitly link every library whose symbols (functions, variables, constants, etc.) you want to reference in the code being compiled.

when I link against a dynamic library that is linked against another dynamic library, do I have to explicitly link against that as well?
No.
That other library must be linked against shared libraries it requires, otherwise it would be one massive fuster cluck (the one you get with .a files).
Imagine someone else adding a shared library dependency to a shared library you use. That would cause your application to fail at link-time (at best) or run-time. This is why shared libraries carry their own dependencies.
You can use the readelf utility to examine shared library dependencies, e.g.:
$ readelf -d /usr/lib64/libboost_wave-mt.so
Dynamic section at offset 0x12fd58 contains 30 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libboost_filesystem-mt.so.5]
0x0000000000000001 (NEEDED) Shared library: [libboost_thread-mt.so.5]
0x0000000000000001 (NEEDED) Shared library: [libboost_date_time-mt.so.5]
0x0000000000000001 (NEEDED) Shared library: [libboost_system-mt.so.5]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libboost_wave-mt.so.5]
0x000000000000000c (INIT) 0xb49f0
0x000000000000000d (FINI) 0x10c018
0x000000006ffffef5 (GNU_HASH) 0x1b8
0x0000000000000005 (STRTAB) 0xbe08
0x0000000000000006 (SYMTAB) 0x2cb8
0x000000000000000a (STRSZ) 637705 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000003 (PLTGOT) 0x3308a8
0x0000000000000002 (PLTRELSZ) 7584 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0xb2c50
0x0000000000000007 (RELA) 0xa8600
0x0000000000000008 (RELASZ) 42576 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0xa8530
0x000000006fffffff (VERNEEDNUM) 4
0x000000006ffffff0 (VERSYM) 0xa7912
0x000000006ffffff9 (RELACOUNT) 405
0x0000000000000000 (NULL) 0x0
Note the NEEDED entries: these are the shared libraries that get loaded automatically when you load this shared library.
In
gcc -shared -olibbar.so -L. -lfoo
You produce a shared library from a shared library. In this case you need to do partial linking with --relocatable linker option:
gcc -shared -Wl,--relocatable -olibbar.so -L. -lfoo

This is following up on the answer by Maxim Egorushkin (in particular using readelf was instructive).
First, it seems that libraries are only marked as "NEEDED" if they are actually needed, i.e. if some symbol defined by them is used at all. In the example in the question this is of course not the case.
Second, even if via these dependencies "foo" ends up being linked in, its symbols are still not available to main.c unless it is linked in explicitly.
To verify this behaviour, consider the files
bar.h
#ifndef bar_h__
#define bar_h__
extern void bar(void);
#endif
and
bar.c
#include <stdio.h>
#include "foo.h"
void bar(void)
{
foo();
puts("bar");
}
that are compiled and linked via
gcc -c -Wall -Werror -fpic bar.c
gcc -shared -olibbar.so bar.o -L. -lfoo
Then "bar" depends on "foo" (unlike in the answer):
> readelf -d libbar.so
Dynamic section at offset 0xe08 contains 25 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libfoo.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x5b0
If one replaces every "foo" by "bar" in the file main.c above, one can compile and link everything to a working program
gcc -o program main.c -L. -lbar -Wl,-rpath-link .
If one keeps the "foo"-lines, linking fails because the foo-symbol cannot be resolved (the program isn't even linked against "bar").
The interesting case is the one where calls to both "foo" and "bar" occur. Now main.c is linked against "bar" which is linked against "foo", but the symbols in "foo" are still not available to main.c.
This behaviour actually makes sense because main.c might depend on a different library defining a function foo() and then it doesn't want to know about the one that "bar" uses. And if that behaviour is desired, also the optimization of not "NEEDing" libraries that aren't needed is valid.

Related

Why does the order of the parameters passed to `gcc` influence the output of `readelf -d` for the built shared library?

Given these:
bar.c:
int foo();
int bar()
{
return foo();
}
foo.c:
int foo()
{
return 42;
}
libfoo.so is built from foo.c with gcc, i.e. gcc -shared -o libfoo.so -fPIC foo.c
As is well known, readelf -d can be used to show the dependencies of a shared library.
$ gcc -shared -o libbar2.so -fPIC bar.c -lfoo -L.
$ gcc -shared -o libbar.so -lfoo -L. -fPIC bar.c
$ readelf -d libbar2.so | grep -i needed
0x0000000000000001 (NEEDED) Shared library: [libfoo.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf -d libbar.so | grep -i needed
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
Why does the order of the parameters passed to gcc influence the output of readelf -d for the built shared library?
All these tests are on Ubuntu16.04 with gcc 5.4.0.
Update:
$ ls -l libbar*
-rwxrwxr-x 1 joy joy 8000 Oct 4 23:16 libbar2.so
-rwxrwxr-x 1 joy joy 8000 Oct 4 23:16 libbar.so
$ sum -r libbar*
00265 8 libbar2.so
56181 8 libbar.so
The linking process is sequential, and the order in which you specify the files matters. The files are processed in the order they are given. See this extract from the ld manual:
Some of the command-line options to ld may be specified at any point
in the command line. However, options which
refer to files, such as -l or -T, cause the file to be read at the point at which the option appears in the command
line, relative to the object files and other file options.
When you link a shared library into another one, the linker checks whether any undefined reference in the files considered up to that point requires something from the library (in your second example, no files precede libfoo). If there is none, the library is set aside and linking continues with the remaining files.
Here you also see a behaviour that may be surprising: by default it is possible to create shared libraries that still have undefined references (that is, they are not self-contained). That is what happens in your second example (libbar.so). To rule this out, you can add the -Wl,-no-undefined option (see https://stackoverflow.com/a/2356393/4871988).
With this option, the second case raises an error at link time.
EDIT: I found this other extract in the ld manual that explains this behaviour:
The linker will search an archive only once, at the location where it
is specified on the command line. If the
archive defines a symbol which was undefined in some object which appeared before the archive on the command
line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an
object appearing later on the command line will not cause the linker to search the archive again.
See the -( option for a way to force the linker to search archives multiple times.
You may list the same archive multiple times on the command line.
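This also applies to shared libraries when --as-needed is in effect, as it is by default with Ubuntu's gcc; without it, ld records a NEEDED entry for every shared library named on the command line.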

What's the difference between `-rpath-link` and `-L`?

The man for gold states:
-L DIR, --library-path DIR
Add directory to search path
--rpath-link DIR
Add DIR to link time shared library search path
The man for bfd ld makes it sort of sound like -rpath-link is used for recursively included sos.
ld.lld doesn't even list it as an argument.
Could somebody clarify this situation for me?
Here is a demo, for GNU ld, of the difference between -L and -rpath-link -
and for good measure, the difference between -rpath-link and -rpath.
foo.c
#include <stdio.h>
void foo(void)
{
puts(__func__);
}
bar.c
#include <stdio.h>
void bar(void)
{
puts(__func__);
}
foobar.c
extern void foo(void);
extern void bar(void);
void foobar(void)
{
foo();
bar();
}
main.c
extern void foobar(void);
int main(void)
{
foobar();
return 0;
}
Make two shared libraries, libfoo.so and libbar.so:
$ gcc -c -Wall -fPIC foo.c bar.c
$ gcc -shared -o libfoo.so foo.o
$ gcc -shared -o libbar.so bar.o
Make a third shared library, libfoobar.so that depends on the first two;
$ gcc -c -Wall -fPIC foobar.c
$ gcc -shared -o libfoobar.so foobar.o -lfoo -lbar
/usr/bin/ld: cannot find -lfoo
/usr/bin/ld: cannot find -lbar
collect2: error: ld returned 1 exit status
Oops. The linker doesn't know where to look to resolve -lfoo or -lbar.
The -L option fixes that.
$ gcc -shared -o libfoobar.so foobar.o -L. -lfoo -lbar
The -Ldir option tells the linker that dir is one of the directories to
search for libraries that resolve the -lname options it is given. It searches
the -L directories first, in their commandline order; then it searches its
configured default directories, in their configured order.
Now make a program that depends on libfoobar.so:
$ gcc -c -Wall main.c
$ gcc -o prog main.o -L. -lfoobar
/usr/bin/ld: warning: libfoo.so, needed by ./libfoobar.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libbar.so, needed by ./libfoobar.so, not found (try using -rpath or -rpath-link)
./libfoobar.so: undefined reference to `bar'
./libfoobar.so: undefined reference to `foo'
collect2: error: ld returned 1 exit status
Oops again. The linker detects the dynamic dependencies requested by libfoobar.so
but can't satisfy them. Let's resist its advice - try using -rpath or -rpath-link -
for a bit and see what we can do with -L and -l:
$ gcc -o prog main.o -L. -lfoobar -lfoo -lbar
So far so good. But:
$ ./prog
./prog: error while loading shared libraries: libfoobar.so: cannot open shared object file: No such file or directory
at runtime, the loader can't find libfoobar.so.
What about the linker's advice then? With -rpath-link, we can do:
$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath-link=$(pwd)
and that linkage also succeeds. ($(pwd) is shell command substitution: it expands to the path of the current working directory.)
The -rpath-link=dir option tells the linker that when it encounters an input file that
requests dynamic dependencies - like libfoobar.so - it should search directory dir to
resolve them. So we don't need to specify those dependencies with -lfoo -lbar and don't
even need to know what they are. What they are is information already written in the
dynamic section of libfoobar.so:-
$ readelf -d libfoobar.so
Dynamic section at offset 0xdf8 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libfoo.so]
0x0000000000000001 (NEEDED) Shared library: [libbar.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
...
...
We just need to know a directory where they can be found, whatever they are.
But does that give us a runnable prog?
$ ./prog
./prog: error while loading shared libraries: libfoobar.so: cannot open shared object file: No such file or directory
No. Same story as before. That's because -rpath-link=dir gives the linker the information
that the loader would need to resolve some of the dynamic dependencies of prog
at runtime - assuming it remained true at runtime - but it doesn't write that information into the dynamic section of prog.
It just lets the linkage succeed, without our needing to spell out all the recursive dynamic
dependencies of the linkage with -l options.
At runtime, libfoo.so, libbar.so - and indeed libfoobar.so -
might well not be where they are now - $(pwd) - but the loader might be able to locate them
by other means: through the ldconfig cache or a setting
of the LD_LIBRARY_PATH environment variable, e.g:
$ export LD_LIBRARY_PATH=.; ./prog
foo
bar
-rpath=dir provides the linker with the same information as -rpath-link=dir
and instructs the linker to bake that information into the dynamic section of
the output file. Let's try that:
$ export LD_LIBRARY_PATH=
$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
$ ./prog
foo
bar
All good. Because now, prog contains the information that $(pwd) is a runtime search
path for shared libraries that it depends on, as we can see:
$ readelf -d prog
Dynamic section at offset 0xe08 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libfoobar.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000f (RPATH) Library rpath: [/home/imk/develop/so/scrap]
... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
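Because the entry shown is a DT_RPATH, the loader tries that search path before the directories listed in LD_LIBRARY_PATH; had the linker emitted DT_RUNPATH instead (as it does with --enable-new-dtags), the path would be tried after LD_LIBRARY_PATH. Either way it comes before the system defaults - the ldconfig-ed directories, plus /lib and /usr/lib.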
The --rpath-link option is used by bfd ld to add to the search path used for finding DT_NEEDED shared libraries when doing link-time symbol resolution. It's basically telling the linker what to use as the runtime search path when attempting to mimic what the dynamic linker would do when resolving symbols (as set by --rpath options or the LD_LIBRARY_PATH environment variable).
Gold does not follow DT_NEEDED entries when resolving symbols in shared libraries, so the --rpath-link option is ignored. This was a deliberate design decision; indirect dependencies do not need to be present or in their runtime locations during the link process.

Isn't ld checking for unresolved symbols in shared libraries redundant?

When linking a program against a shared object, ld will ensure that symbols can be resolved. This basically ensures that the interfaces between the program and its shared objects are compatible. After reading Linking with dynamic library with dependencies, I learnt that ld will descend into linked shared objects and attempt to resolve their symbols too.
Aren't my shared object's references already checked when the shared objects are themselves linked?
I can understand the appeal of finding out at link time whether a program has all the pieces it requires to start, but it seems irrelevant in the context of package building, where shared objects may be distributed separately (Debian's lib* packages, for instance). It introduces recursive build dependencies on systems uninterested in executing built programs.
Can I trust the dependencies resolved when the shared object was built? If so, how safe is it to use -unresolved-symbols=ignore-in-shared-libs when building my program?
You're wondering why a program's linkage should bother to resolve symbols originating in
the shared libraries that it's linked with because:
Aren't my shared object's references already checked when the shared objects are themselves linked?
No, they're not, unless you expressly insist on it when you link the shared library.
Here I'm going to build a shared library libfoo.so:
foo.c
extern void bar();
void foo(void)
{
bar();
}
Routinely compile and link:
$ gcc -fPIC -c foo.c
$ gcc -shared -o libfoo.so foo.o
No problem, and bar is undefined:
$ nm --undefined-only libfoo.so | grep bar
U bar
I need to insist to get the linker to object to that:
$ gcc --shared -o libfoo.so foo.o -Wl,--no-undefined
foo.o: In function `foo':
foo.c:(.text+0xa): undefined reference to `bar'
Of course:
main.c
extern void foo(void);
int main(void)
{
foo();
return 0;
}
it won't let me link libfoo with a program:
$ gcc -c main.c
$ gcc -o prog main.o -L. -lfoo
./libfoo.so: undefined reference to `bar'
unless I also resolve bar in the same linkage:
bar.c
#include <stdio.h>
void bar(void)
{
puts("Hello world!");
}
maybe by getting it from another shared library:
$ gcc -fPIC -c bar.c
$ gcc -shared -o libbar.so bar.o
$ gcc -o prog main.o -L. -lfoo -lbar
And then everything's fine.
$ export LD_LIBRARY_PATH=.; ./prog
Hello world!
It's of the essence of a shared library that it doesn't by default have
to have all of its symbols resolved at linktime. That way a program - which
typically does need all its symbols resolved at linktime - can get all its symbols
resolved by being linked with more than one library.
Aren't my shared object's references already checked
when the shared objects are themselves linked?
Well, shared libs might have been linked with -Wl,--allow-shlib-undefined or with dummy dependencies so it still makes sense to check them.
Can I trust the dependencies resolved when the shared object was built?
Probably not: the current linking environment and the environment used to link the original shlibs may be different.
If so, how safe is it to use -unresolved-symbols=ignore-in-shared-libs
when building my program?
You may be missing potential errors in this case (or rather delaying them to runtime, which is still bad). Imagine a situation where some of the symbols needed by shared objects are to come from the executable itself or from one of the libs linked by the executable (but not by the shlib which is missing the symbols).
EDIT
Although the above is correct, Mike Kinghan's answer gives a stronger argument in favor of symbol resolution in libraries during executable link.

Link problems with libc++abi when linking against libc++ via cmake

I'm trying to build a simple ("hello world") C++ program with LLVM/Clang 3.7.0 built from sources against the toolchain's libc++, with the command line:
clang++ -std=c++14 -stdlib=libc++ -fno-exceptions hello.cpp
However, I get the following errors:
/usr/bin/ld: warning: libc++abi.so.1, needed by /bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so, not found (try using -rpath or -rpath-link)
/bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so: undefined reference to `__cxa_rethrow_primary_exception'
/bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so: undefined reference to `__cxa_decrement_exception_refcount'
/bulk/workbench/llvm/3.7.0/toolchain4/bin/../lib/libc++.so: undefined reference to `std::out_of_range::~out_of_range()'
[...]
The LD_LIBRARY_PATH is not set and the toolchain's install directory is added to my working PATH by:
export PATH=$PATH:/bulk/workbench/llvm/3.7.0/toolchain4/bin/
I'm on Ubuntu GNU/Linux 14.04 and I have not installed any LLVM or Clang related packages from any repository.
According to the libc++ documentation:
On Linux libc++ can typically be used with only ‘-stdlib=libc++’. However some libc++ installations require the user manually link libc++abi themselves. If you are running into linker errors when using libc++ try adding ‘-lc++abi’ to the link line.
Doing as suggested gives a successful build.
So, my question is this:
Why do I have to specify the -lc++abi dependency explicitly on the line of the build command?
Doing
readelf -d $(llvm-config --libdir)/libc++.so
gives
Dynamic section at offset 0xb68c8 contains 31 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x000000000000000e (SONAME) Library soname: [libc++.so.1]
0x000000000000000f (RPATH) Library rpath: [$ORIGIN/../lib]
0x000000000000000c (INIT) 0x350a8
[...]
Shouldn't the embedded RPATH in the dynamic section of the ELF be considered by ld as described in its man page under the section -rpath-link=dir?
Moreover, when I set the LD_LIBRARY_PATH with
LD_LIBRARY_PATH=$(llvm-config --libdir)
the initial build command (without specifying -lc++abi) works, as also described in the 5th clause of the aforementioned man entry.

Will my linker link objects from unneeded source files?

When compiling a project, if my source files include a file with no used functions, will the unneeded object file be included in the compiler's output?
e.g.
foo.c
int main() {return 0;}
bar.c
void unusedFunction(void) {;}
compiler execution:
gcc foo.c bar.c -o output
Would the output file be any smaller if I had omitted bar.c from the compiler command?
Most linkers will link unneeded files unless you tell them not to. There are flags for this.
Suppose you link with two unneeded files: an object file and a library:
gcc main.o unneeded.o -lunneeded
With GNU Binutils or Gold, the flags are --gc-sections for unneeded symbols / object files, and --as-needed for libraries. These are linker flags, however, so they must be prefixed with -Wl,. Note that the order of these flags is important—flags only apply to libraries and object files which appear after the flags on the command line, so the flags must be specified first.
gcc -Wl,--as-needed -Wl,--gc-sections main.o unneeded.o -lunneeded
On OS X, there is a different linker so the flags are different. The -dead_strip flag removes unneeded symbols / object files, and the -dead_strip_dylibs flag removes unneeded libraries.
gcc -Wl,-dead_strip -Wl,-dead_strip_dylibs main.o unneeded.o -lunneeded
Example
$ cat main.c
int main() { }
$ cat unneeded.c
void unneeded() { }
$ gcc -c main.c
$ gcc -c unneeded.c
If we link normally, we get everything...
$ gcc main.o unneeded.o -lz
$ nm a.out | grep unneeded
0000000000400574 T unneeded
$ readelf -d a.out | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
With the right flags, we get just what we need...
$ gcc -Wl,--as-needed -Wl,--gc-sections main.o unneeded.o -lz
$ nm a.out | grep unneeded
$ readelf -d a.out | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
Every source or object file referenced in the compilation command will be included in the final executable.
To include only what's needed, build a static library from your objects, and refer to it when linking, rather than to individual objects. Static libraries were invented just for this purpose.
Use something like Scons (http://www.scons.org/) or Gradle (http://www.gradle.org/) to figure out the dependencies; it will then link just the appropriate bits (assuming you are using static linking).
Dynamic linking is another matter — you get the lot as other programs may need the additional stuff.
But as to the command line given it will link it in. Why add it in the first place if it is not required?

Resources