Linking multiple incompatible versions of a static library into one executable - c

I am presently developing for a system which discourages (i.e. essentially forbids) dynamic libraries. Therefore, everything has to be linked statically.
The application framework I am using (which cannot be changed) is using an old, statically-linked version of a library libfoo.a (version r7). A library I am using, libbar, needs libfoo.a version r8 (specifically, some of the new features are crucial for the library to function). I can edit and recompile libbar as well as libfoo r8, but I want to avoid changing them as much as possible because I am not very familiar with the code (and would have to pass code changes upstream).
Unfortunately, the two libfoo libraries have a substantial number of symbols in common. So, the linker spits out a ton of "multiple symbol definition" errors.
I've heard it's possible to use objcopy and friends to "inline" a static library into another. However, I'm not really sure how to achieve this in practice, nor if it's even the best option.
So, how can I successfully link an executable which uses two incompatible versions of the same library? I've already considered avoiding this situation altogether, but that would be much harder to work with.

It turns out that this is actually possible with some ld and objcopy magic.
Basically, the procedure looks like this:
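# Unpack both libraries into an empty working directory
# (libfoo.a here is the r8 build that libbar needs; identically named .o files would overwrite each other)
ar x libbar.a
ar x libfoo.a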
# Grab symbol table (symbols to export)
nm -Ag libbar.a | grep -v ' U ' | cut -d' ' -f 3 > libbar.sym
# Build a single object file with libfoo relocated in
ld -Er *.o -o libbar-merged.lo
# Localize all symbols except for libbar's symbols
objcopy --keep-global-symbols libbar.sym libbar-merged.lo libbar-merged.o
# Create an archive to hold the merged library
ar crs libbar-merged.a libbar-merged.o
This effectively creates a single super-library which exports only the symbols from the original libbar, and which has the other library relocated in.
There's probably another, cleaner way to achieve this result, but this method works for me and allows me to statically link two incompatible libraries into the same executable, with no apparent ill effects.

Related

Linking error in static lib unless used in main project

I'm creating a little static library for having thread pools, and it depends on 2 other homemade static libraries (a homemade printf and a homemade mini libc).
But sub-functions like ft_bzero are not linked into the project unless I also use them in the root project, the one that needs the thread pool library. So I get link errors coming from my thpool lib.
Sample :
cc -Wall -Werror -Wextra -MD -I ./ -I ./jqueue -I ../libft/incs -I ../printf/incs -o .objs/thpool_create.o -c ./thpool_create.c
ar rc libthpool.a ./.objs/thpool_create.o etcetc
In the libraries, I compile every .o and use an ar rc libthpool.a *.o. Then I compile .o from main project (a single test.c actually), and then
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lft -lftprintf -lthpool -lpthread
How can I solve my errors?
Since the code in the thpool library uses code from ft and ftprintf, you (almost certainly) need to list the libraries in the reverse order:
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lthpool -lftprintf -lft -lpthread
When scanning a static library, the linker looks for definitions of symbols that are currently undefined. If your test code only calls functions from thpool, then none of the symbols in ft are referenced when the ft library is scanned, so nothing is included from the library; if none of the symbols from ftprintf are referenced when the ftprintf library is scanned, nothing is included from ftprintf either. When it comes across the symbols in thpool that reference things from ft or ftprintf, it's too late; the linker doesn't rescan the libraries. Hence you need to list the libraries in an order such that all references from one library (A) to another (B) are found by linking (A) before (B). If the test code references some of the functions in ft or ftprintf, you may get lucky, or a bit lucky; some symbols may be linked in. But if there are functions in thpool that make the first reference to a function in ft, with the order in the question, you've lost the chance to link everything. Hence the suggested reordering.
Another (very grubby, but nonetheless effective) technique is to rescan the static libraries by listing them several times on the command line.
With shared libraries, the rules of linking are different. If a shared library satisfies any symbol, the whole library will be available, so the linker remembers all the defined symbols, and you might well get away with the original link order.
You might need to look up 'topological sort'. You should certainly aim to design your static libraries so that there are no loops in the dependencies; when cycles do exist, the only reliable solutions are either to rescan the libraries or to combine them.

How do I strip symbols only from dependent libraries?

I'd like to ship libfoo.a, which is composed of foo.o--which in turn depends on libVendorBar.a and libVendorZoo.a.
When I link and generate my libfoo.a I notice that symbols in libVendor*.a are still public and visible for potential client applications to link against.
Due to many reasons outside of my control, I absolutely do not want 3rd party clients to be able to directly link against the vendor libraries.
How do I force gcc to resolve all libVendor symbols for libfoo and discard them, so that only symbols from libfoo are visible?
I'm not using any LD_FLAGS currently and everything is statically linked.
Unfortunately, static libraries have no equivalent of the -fvisibility=hidden option used for shared libraries. You can achieve what you need with more work, though.
First, link all the necessary code into foo.o:
ld -r foo.o -Lpath/to/vendor/libs -lBar -lZoo -o foo_linked.o
This allows you to ship libfoo.a without the vendor libs (the vendor symbols are still present in it).
Unfortunately, you can't simply remove the vendor symbols from the library's symtab (e.g. via objcopy -L or strip --strip-symbol), because the linker will need them for relocation processing during the final executable link. But you can at least rename them to something unreadable:
# hide.sym (hypothetical file name): one symbol to hide per line
while read -r sym; do
  id=$(echo "$sym" | md5sum | awk '{print $1}')
  objcopy --redefine-sym "$sym=f_$id" foo_linked.o
done < hide.sym
Note, however, that this won't stop a motivated user from reverse engineering the vendor's code.

Linking an archive to an archive

With GCC on Linux, is it possible to link a .a into another .a and then only link the resultant .a to my application? Or must my application know of the dependence between one archive and another and link them both?
My understanding is that I must know of the dependencies and link all archives at the end, not in an intermediary step, which seems a little ugly.
This is slightly different than How to merge two "ar" static libraries into one as I'm after a clear description that this is only possible by working around the problem and that linking the two archives together in the naive way is incorrect and will not work, along with the reason as to why.
Yes, your application has to know the dependencies between your different static libraries.
Let's say you have two static libraries a and b.
a has a function void print_a(void), and b has a function void print_b(void) that calls print_a(). So, b depends on a.
Their binaries will be named liba.a and libb.a.
In other words, library b contains a reference to print_a(), a function defined in library a.
When library b is compiled, only its own symbols are defined in the binary's code section, while the others remain undefined:
host$ nm libb.a | grep print
U _print_a <--- Undefined
0000000000000000 T _print_b <--- Defined, in code section
0000000000000068 S _print_b.eh
U _printf
Therefore, when building the application that wants to use both libraries, linking only against libb.a won't be enough. You'll have to link your application against both libraries. Each library provides the addresses of its own symbols in the code section, and your application can then resolve everything it needs from the two of them.
Something like:
gcc -o main main.c libb.a liba.a
BTW: When building library b, which uses a, you can link to a, but it isn't necessary. The result will be just the same.
Why is this the behavior
When compiling + linking the application that uses static libraries, the symbols in the application source files have to be defined somewhere (with the exception of dynamic linking, but this is done only with dynamic libraries/shared objects. Here we deal with static ones).
Now, remember that a static library is just an archive of objects. When it's created there's no linking phase. Just:
Compiling source code (*.c) to objects (*.o)
Archiving them together in a libXXXX.a file.
It means that if this library (library b in my example) uses some function (void print_a(void)) that is defined in another library (a), this symbol won't be resolved (not as a compilation error, but as the normal behavior). It will be set as Undefined symbol (as we see in the output of nm command) after the library creation, and it will wait to be linked later to its definition. And it's OK because a static library is not executable.
Now returning to application - the linking phase of the application needs to find all the definitions of all the symbols. If you just gave it libb.a as an argument, it wouldn't be able to find the definition to print_a(), because it's not there, it's still undefined. It exists only in liba.a.
Therefore, you must provide both of the libraries.
Let libx.a and liby.a be the archives you want to combine. You can try:
mkdir tmp # create temporary directory for extracting
cd tmp
ar x ../libx.a # extract libx.a
cp ../liby.a ../libxy.a
ar -q ../libxy.a * # add extracted files to libxy.a
cd ..
rm -rf tmp
The libxy.a created this way contains the .o files from both .a files.

Force linking a static library into a shared one with Libtool

I have a library (libfoo) that is compiled using libtool into two objects: libfoo.a and libfoo.so.
I have to create, using libtool also, another library (libbar) that will be a single shared library (libbar.so) containing all libfoo's code.
In order to do this, I have to force libbar to link against libfoo.a, and not libfoo.so.
I am in an autotools environment, so I have to solve this using standard configure.in or Makefile.am rules.
I tried several things, like in configure.in :
LDFLAGS="$LDFLAGS -Wl,-Bstatic -lfoo -Wl,-Bdynamic"
That always results in the -Wl flags on the linking line; but -lfoo has disappeared and has been placed in an absolute-path form (/opt/foo/lib/libfoo.so) at the beginning of it.
I also tried:
LDFLAGS="$LDFLAGS -L/opt/foo/lib libfoo.a"
or in Makefile.am:
libbar_la_LDADD = -Wl,-Bstatic -lfoo -Wl,-Bdynamic
and
libbar_la_LTLIBRARIES = libfoo.a
etc etc (with many, many variants !)
But I clearly do not know Autotools/Libtool well enough to solve this alone. I have not been able to find information about it on the Net, only slightly different issues.
You could probably use a convenience library. Convenience libraries are intermediate static libraries which are not installed. You could use the prefix noinst to build one.
noinst_LTLIBRARIES = libfoo_impl.la
lib_LTLIBRARIES = libfoo.la libbar.la
libfoo_la_LIBADD = libfoo_impl.la
libbar_la_LIBADD = libfoo_impl.la
The standard way would be to build libfoo with --disable-shared. Whether to link statically or dynamically is a decision for the user to make, so there's really no way to force it as a package maintainer, but you could set the configury of libbar to fail if libfoo.so is present (I'm not sure of a clean way to do that, and believe it would be a bad idea since it really is a choice for the user.) I think the best bet is to have the user build libfoo with --disable-shared, but you can force that choice by specifying static libraries only in libfoo/configure.ac:
LT_INIT([disable-shared])
Note that if you do that, it will not be possible to build libfoo as a shared library. Perhaps that is what you want.

Limiting visibility of symbols when linking shared libraries

Some platforms mandate that you provide a list of a shared library's external symbols to the linker. However, on most unixish systems that's not necessary: all non-static symbols will be available by default.
My understanding is that the GNU toolchain can optionally restrict visibility just to symbols explicitly declared. How can that be achieved using GNU ld?
GNU ld can do that on ELF platforms.
Here is how to do it with a linker version script:
/* foo.c */
int foo() { return 42; }
int bar() { return foo() + 1; }
int baz() { return bar() - 1; }
gcc -fPIC -shared -o libfoo.so foo.c && nm -D libfoo.so | grep ' T '
By default, all symbols are exported:
0000000000000718 T _fini
00000000000005b8 T _init
00000000000006b7 T bar
00000000000006c9 T baz
00000000000006ac T foo
Let's say you want to export only bar() and baz(). Create a "version script" libfoo.version:
FOO {
global: bar; baz; # explicitly list symbols to be exported
local: *; # hide everything else
};
Pass it to the linker:
gcc -fPIC -shared -o libfoo.so foo.c -Wl,--version-script=libfoo.version
Observe exported symbols:
nm -D libfoo.so | grep ' T '
00000000000005f7 T bar
0000000000000609 T baz
I think the easiest way of doing that is adding the -fvisibility=hidden to gcc options and explicitly make visibility of some symbols public in the code (by __attribute__((visibility("default")))). See the documentation here.
There may be a way to accomplish that by ld linker scripts, but I don't know much about it.
The code generated to call any exported functions or use any exported globals is less efficient than those that aren't exported. There is an extra level of indirection involved. This applies to any function that might be exported at compile time. gcc will still produce extra indirection for a function that is later un-exported by a linker script. So using the visibility attribute will produce better code than the linker script.
Seems there's several ways to manage exported symbols on GNU/Linux. From my reading these are the 3 methods:
Source code annotation/decoration:
Method 1: -fvisibility=hidden along with __attribute__((visibility("default")))
Method 2 (since GCC 4): #pragma GCC visibility
Version Script:
Method 3: Version script (aka "symbol maps") passed to the linker (eg. -Wl,--version-script=<version script file>)
I won't get into examples here since they're mostly covered by other answers, but here's some notes, pros & cons to the different approaches off the top of my head:
Using the annotated approach allows the compiler to optimize the code a bit (one less indirection).
If using the annotated approach, then consider also using strip --strip-all --discard-all.
The annotated approach can add more work for internal function-level unit tests since the unit tests may not have access to the symbols. This might require building separate files: one for internal development & testing, and another for production. (This approach is generally non-optimal from a unit test purist perspective.)
Using a version script loses the optimization but allows symbol versioning which seems to not be available with the annotated approach.
Using a version script allows for unit testing assuming code is first built into an archive (.a) file and then linked into a DSO (.so). The unit tests would link with the .a.
Version scripts are not supported on Mac (at least not if using the linker provided by Mac, even if using GCC for the compiler), so if Mac is needed use the annotated approach.
I'm sure there are others.
Here's some references (with examples) that I've found helpful:
http://blog.fesnel.com/blog/2009/08/19/hiding-whats-exposed-in-a-shared-library/
https://accu.org/index.php/journals/1372
https://akkadia.org/drepper/dsohowto.pdf
If you are using libtool, there is another option much like Employed Russian's answer.
Using his example, it would be something like:
cat export.sym
bar
baz
Then run libtool with the following option:
libtool -export-symbols export.sym ...
Note that when using -export-symbols all symbols are NOT exported by default, and only those in export.sym are exported (so the "local: *" line in libfoo.version is actually implicit in this approach).

Resources