Static vs Dynamic Linking - c

I'm trying to understand how the ELF looks like for a statically vs. a dynamically linked program.
I understand that this is how static linking works:
In my case, I have two files, foo.c and bar.c.
I also have their object files; foo.o and bar.o.
With the objdump command, I can see the relocations in each file.
How do I statically link the foo.o and bar.o?
How do I dynamically link the foo.o and bar.o?
How can I see the difference in the output files?

Linking dynamically is the default mode of most linkers these days. If you want to link statically you have to use the -static flag when linking. To clarify, when I say "linking dynamically" versus "linking statically" I mean the linking with external libraries, and not generating a library that in turn can be linked (dynamically or statically).
The difference can't be seen in the object files you pass to the linker, as it has nothing to do with the compiler and object-file generation, the result can only be seen in the resulting executable program after linking, and the biggest difference is that the executable will most likely be larger.
The resulting and fully linked executable will be larger because then all the libraries (for which there are static libraries) will actually be linked into the executable program quite literally. It's basically including the libraries object files together with your own object files. Actually, on POSIX platforms static libraries are just archives of object files.

Related

On linking of shared libraries, are they really final, and if so, why?

I am trying to understand more about linking and shared library.
Ultimately, I wonder if it's possible to add a method to a shared library. For instance, suppose one has a source file a.c, and a library lib.so (without the source file). Let's furthermore assume, for simplicity, that a.c declares a single method, whose name is not present in lib.so. I thought maybe it might be possible to, at linking time, link a.o to lib.so while instructing to create newLib.so, and forcing the linker to export all methods/variable in lib.so to that the newLib.so is now basically lib.so with the added method from a.so.
More generally, if one has some source file depending on a shared library, can one create a single output file (library or executable) that is not dependent on the shared library anymore ? (That is, all the relevant methods/variable from the library would have been exported/linked/inlined to the new executable, hence making the dependency void). If that's not possible, what is technically preventing it ?
A somehow similar question has been asked here: Merge multiple .so shared libraries.
One of the reply includes the following text: "If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them.: without explaining the technical details. Was it a mistake or does it hold ? If so, how to do it ?
Once you have a shared library libfoo.so the only ways you can use it
in the linkage of anything else are:-
Link a program that dynamically depends on it, e.g.
$ gcc -o prog bar.o ... -lfoo
Or, link another shared library that dynamically depends on it, e.g.
$ gcc -shared -o libbar.so bar.o ... -lfoo
In either case the product of the linkage, prog or libbar.so
acquires a dynamic dependency on libfoo.so. This means that prog|libfoo.so
has information inscribed in it by the linker that instructs the
OS loader, at runtime, to find libfoo.so, load it into the
address space of the current process and bind the program's references to libfoo's exported symbols to
the addresses of their definitions.
So libfoo.so must continue to exist as well as prog|libbar.so.
It is not possible to link libfoo.so with prog|libbar.so in
such a way that libfoo.so is physically merged into prog|libbar.so
and is no longer a runtime dependency.
It doesn't matter whether or not you have the source code of the
other linkage input files - bar.o ... - that depend on libfoo.so. The
only kind of linkage you can do with a shared library is dynamic linkage.
This is in complete contrast with the linkage of a static library
You wonder about the statement in this this answer where it says:
If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them.
The author is just observing that if I have source files
foo_a.c foo_b.c... bar_a.c bar_b.c
which I compile to the corresponding object files:
foo_a.o foo_b.o... bar_a.o bar_b.o...
or if I simply have those object files. Then as well as - or instead of - linking them into two shared libraries:
$ gcc -shared -o libfoo.so foo_a.o foo_b.o...
$ gcc -shared -o libbar.so bar_a.o bar_b.o...
I could link them into one:
$ gcc -shared -o libfoobar.so foo_a.o foo_b.o... bar_a.o bar_b.o...
which would have no dependency on libfoo.so or libbar.so even if they exist.
And although that could be straightforward it could also be false. If there is
any symbol name that is globally defined in any of foo_a.o foo_b.o... and
also globally defined in any of bar_a.o bar_b.o... then it will not matter
to the linkage of either libfoo.so or libbar.so (and it need not be dynamically
exported by either of them). But the linkage of libfoobar.so will fail for
multiple definition of name.
If we build a shared library libbar.so that depends on libfoo.so and has
itself been linked with libfoo.so:
$ gcc -shared -o libbar.so bar.o ... -lfoo
and we then want to link a program with libbar.so, we can do that in such a way
that we don't need to mention its dependency libfoo.so:
$ gcc -o prog main.o ... -lbar -Wl,-rpath=<path/to/libfoo.so>
See this answer to follow that up. But
this doesn't change the fact that libbar.so has a runtime dependency on libfoo.so.
If that's not possible, what is technically preventing it?
What technically prevents linking a shared library with some program
or shared library targ in a way that physically merges it into targ is that a
shared library (like a program) is not the sort of thing that a linker knows
how to physically merge into its output file.
Input files that the linker can physically merge into targ need to
have structural properties that guide the linker in doing that merging. That is the structure of object files.
They consist of named input sections of object code or data that are tagged with various attributes.
Roughly speaking, the linker cuts up the object files into their sections and distributes them into
output sections of the output file according to their attributes, and makes
binary modifications to the merged result to resolve static symbol references
or enable the OS loader to resolve dynamic ones at runtime.
This is not a reversible process. The linker can't consume a program or
shared library and reconstruct the object files from which it was made to
merge them again into something else.
But that's really beside the point. When input files are physically
merged into targ, that is called static linkage.
When input files are just externally referenced in targ to
make the OS loader map them into a process it has launched for targ,
that is called dynamic linkage. Technical development has given us
a file-format solution to each of these needs: object files for static linkage, shared libraries
for dynamic linkage. Neither can be used for the purpose of the other.

Linking an archive to an archive

With GCC on Linux, is it possible to link a .a into another .a and then only link the resultant .a to my application? Or must my application know of the dependence between one archive and another and link them both?
My understanding is that I must know of the dependencies and link all archives at the end, not in an intermediary step, which seems a little ugly.
This is slightly different than How to merge two "ar" static libraries into one as I'm after a clear description that this is only possible by working around the problem and that linking the two archives together in the naive way is incorrect and will not work, along with the reason as to why.
Yes, your application has to know the dependencies between your different static libraries.
Let's say you have two static libraries a and b.
a has a function void print_a(), and b has a function void print_b() that is calling to print_a(). So, b depends on a.
Their binaries will look like liba.a and libb.a.
Let's say that library b has a reference to a function defined in library a - void print_b(void).
When compiling library b only its symbols are defined in the binary's code section while the others are still undefined:
host$ nm libb.a | grep print
U _print_a <--- Undefined
0000000000000000 T _print_b <--- Defined, in code section
0000000000000068 S _print_b.eh
U _printf
Therefore, when compiling the application that wants to use both of the libraries, linking only to libb.a won't be enough. You'll have to link your application to both libraries. Each library will provide its own symbols addresses in the code section and then your application will be able to link to both.
Something like:
gcc -o main main.c libb.a liba.a
BTW: When compiling library b that uses a, you can but it's not necessary to link to a. The result will be just the same.
Why is this the behavior
When compiling + linking the application that uses static libraries, the symbols in the application source files have to be defined somewhere (with the exception of dynamic linking, but this is done only with dynamic libraries/shared objects. Here we deal with static ones).
Now, remember that a static library is just an archive of objects. When it's created there's no linking phase. Just:
Compiling source code (*.c) to objects (*.o)
Archiving them together in a libXXXX.a file.
It means that if this library (library b in my example) uses some function (void print_a(void)) that is defined in another library (a), this symbol won't be resolved (not as a compilation error, but as the normal behavior). It will be set as Undefined symbol (as we see in the output of nm command) after the library creation, and it will wait to be linked later to its definition. And it's OK because a static library is not executable.
Now returning to application - the linking phase of the application needs to find all the definitions of all the symbols. If you just gave it libb.a as an argument, it wouldn't be able to find the definition to print_a(), because it's not there, it's still undefined. It exists only in liba.a.
Therefore, you must provide both of the libraries.
Let libx.a and liby.a be the modules you want to combine. You can try:-
mkdir tmp # create temporary directory for extracting
cd tmp
ar x ../libx.a # extract libx.a
cp ../liby.a ../libxy.a
ar -q ../libxy.a * # add extracted files to libxy.a
cd ..
rm -rf tmp
libxy.a thus created contains .o files from both .a files

Link static library with static library

I have a Makefile.am with two noinst_LIBRARIES, and one of them needs to link with the other.
Adding it to the CFLAGS throws a compiler warning, but as far as I know, automake likes to freak out about using LDADD with libraries, since they are not complete programs.
How can I do this, assuming libb.a needs to pull in liba.a?
You can't do it. Actually, what you are trying to do doesn't really make sense. Static libraries are just archives containing object files and a table of contents. Put simply, you can think of a static library as a .zip containing .o files.
The linking phase only takes place when compiling a shared object or executable. When your program is linked against liba.a, you also need to specify -static -lb or similar and that's it.

Statically linking against LAPACK

I'm attempting to do a release of some software and am currently working through a script for the build process. I'm stuck on something I never thought I would be, statically linking LAPACK on x86_64 linux. During configuration AC_SEARCH_LIB([main],[lapack]) works, but compilation of the lapack units do not work, for example undefiend reference to 'dsyev_' --no lapack/blas routine goes unnoticed.
I've confirmed I have the libraries installed and even compiled them myself with the appropriate options to make them static with the same results.
Here is an example I had used in my first experience with LAPACK a few years ago that works dynamically, but not statically: http://pastebin.com/cMm3wcwF
The two methods I'm using to compile are the following,
gcc -llapack -o eigen eigen.c
gcc -static -llapack -o eigen eigen.c
Your linking order is wrong. Link libraries after the code that requires them, not before. Like this:
gcc -o eigen eigen.c -llapack
gcc -static -o eigen eigen.c -llapack
That should resolve the linkage problems.
To answer the subsequent question why this works, the GNU ld documentation say this:
It makes a difference where in the command you write this option; the
linker searches and processes libraries and object files in the order
they are specified. Thus, foo.o -lz bar.o' searches libraryz' after
file foo.o but before bar.o. If bar.o refers to functions in `z',
those functions may not be loaded.
........
Normally the files found this way are library files—archive files
whose members are object files. The linker handles an archive file by
scanning through it for members which define symbols that have so far
been referenced but not defined. But if the file that is found is an
ordinary object file, it is linked in the usual fashion.
ie. the linker is going to make one pass through a file looking for unresolved symbols, and it follows files in the order you provide them (ie. "left to right"). If you have not yet specified a dependency when a file is read, the linker will not be able to satisfy the dependency. Every object in the link list is parsed only once.
Note also that GNU ld can do reordering in cases where circular dependencies are detected when linking shared libraries or object files. But static libraries are only parsed for unknown symbols once.

Restricting symbols in a Linux static library

I'm looking for ways to restrict the number of C symbols exported to a Linux static library (archive). I'd like to limit these to only those symbols that are part of the official API for the library. I already use 'static' to declare most functions as static, but this restricts them to file scope. I'm looking for a way to restrict to scope to the library.
I can do this for shared libraries using the techniques in Ulrich Drepper's How to Write Shared Libraries, but I can't apply these techniques to static archives. In his earlier Good Practices in Library Design paper, he writes:
The only possibility is to combine all object files which need
certain internal resources into one using 'ld -r' and then restrict the symbols
which are exported by this combined object file. The GNU linker has options to
do just this.
Could anyone help me discover what these options might be? I've had some success with 'strip -w -K prefix_*', but this feels brutish. Ideally, I'd like a solution that will work with both GCC 3 and 4.
Thanks!
I don't believe GNU ld has any such options; Ulrich must have meant objcopy, which has many such options: --localize-hidden, --localize-symbol=symbolname, --localize-symbols=filename.
The --localize-hidden in particular allows one to have a very fine control over which symbols are exposed. Consider:
int foo() { return 42; }
int __attribute__((visibility("hidden"))) bar() { return 24; }
gcc -c foo.c
nm foo.o
000000000000000b T bar
0000000000000000 T foo
objcopy --localize-hidden foo.o bar.o
nm bar.o
000000000000000b t bar
0000000000000000 T foo
So bar() is no longer exported from the object (even though it is still present and usable for debugging). You could also remove bar() all together with objcopy --strip-unneeded.
Static libraries can not do what you want for code compiled with either GCC 3.x or 4.x.
If you can use shared objects (libraries), the GNU linker does what you need with a feature called a version script. This is usually used to provide version-specific entry points, but the degenerate case just distinguishes between public and private symbols without any versioning. A version script is specified with the --version-script= command line option to ld.
The contents of a version script that makes the entry points foo and bar public and hides all other interfaces:
{ global: foo; bar; local: *; };
See the ld doc at: http://sourceware.org/binutils/docs/ld/VERSION.html#VERSION
I'm a big advocate of shared libraries, and this ability to limit the visibility of globals is one their great virtues.
A document that provides more of the advantages of shared objects, but written for Solaris (by Greg Nakhimovsky of happy memory), is at http://developers.sun.com/solaris/articles/linker_mapfiles.html
I hope this helps.
The merits of this answer will depend on why you're using static libraries. If it's to allow the linker to drop unused objects later then I have little to add. If it's for the purpose of organisation - minimising the number of objects that have to be passed around to link applications - this extension of Employed Russian's answer may be of use.
At compile time, the visibility of all symbols within a compilation unit can be set using:
-fvisibility=hidden
-fvisibility=default
This implies one can compile a single file "interface.c" with default visibility and a larger number of implementation files with hidden visibility, without annotating the source. A relocatable link will then produce a single object file where the non-api functions are "hidden":
ld -r interface.o implementation0.o implementation1.o -o relocatable.o
The combined object file can now be subjected to objcopy:
objcopy --localize-hidden relocatable.o mylibrary.o
Thus we have a single object file "library" or "module" which exposes only the intended API.
The above strategy interacts moderately well with link time optimisation. Compile with -flto and perform the relocatable link by passing -r to the linker via the compiler:
gcc -fuse-linker-plugin -flto -nostdlib -Wl,-r {objects} -o relocatable.o
Use objcopy to localise the hidden symbols as before, then call the linker a final time to strip the local symbols and whatever other dead code it can find in the post-lto object. Sadly, relocatable.o is unlikely to have retained any lto related information:
gcc -nostdlib -Wl,-r,--discard-all relocatable.o mylibrary.o
Current implementations of lto appear to be active during the relocatable link stage. With lto on, the hidden=>local symbols were stripped by the final relocatable link. Without lto, the hidden=>local symbols survived the final relocatable link.
Future implementations of lto seem likely to preserve the required metadata through the relocatable link stage, but at present the outcome of the relocatable link appears to be a plain old object file.
This is a refinement of the answers from EmployedRussian and JonChesterfield, which may be helpful if you're generating both dynamic and static libraries.
Start with the standard mechanism for hiding symbols in DSOs (the dynamic version of your lib). Compile all files with -fvisibility=hidden. In the header file which defines your API, change the declarations of the classes and functions you want to make public:
#define DLL_PUBLIC __attribute__ ((visibility ("default")))
extern DLL_PUBLIC int my_api_func(int);
See here for details. This works for both C and C++. This is sufficient for DSOs, but you'll need to add these build steps for static libraries:
ld -r obj1.o obj2.o ... objn.o -o static1.o
objcopy --localize-hidden static1.o static2.o
ar -rcs mylib.a static2.o
The ar step is optional - you can just link against static2.o.
My way of doing it is to mark everything that is not to be exported with INTERNAL,
include guard all .h files, compile dev builds with -DINTERNAL= and compile release builds with a single .c file that includes all other library .c files with -DINTERNAL=static.

Resources