Restricting symbols in a Linux static library - c

I'm looking for ways to restrict the number of C symbols exported by a Linux static library (archive). I'd like to limit these to only those symbols that are part of the official API for the library. I already use 'static' to declare most functions as static, but this restricts them to file scope. I'm looking for a way to restrict the scope to the library.
I can do this for shared libraries using the techniques in Ulrich Drepper's How to Write Shared Libraries, but I can't apply these techniques to static archives. In his earlier Good Practices in Library Design paper, he writes:
The only possibility is to combine all object files which need
certain internal resources into one using 'ld -r' and then restrict the symbols
which are exported by this combined object file. The GNU linker has options to
do just this.
Could anyone help me discover what these options might be? I've had some success with 'strip -w -K prefix_*', but this feels brutish. Ideally, I'd like a solution that will work with both GCC 3 and 4.
Thanks!

I don't believe GNU ld has any such options; Ulrich must have meant objcopy, which has many such options: --localize-hidden, --localize-symbol=symbolname, --localize-symbols=filename.
The --localize-hidden option in particular allows very fine control over which symbols are exposed. Consider:
int foo() { return 42; }
int __attribute__((visibility("hidden"))) bar() { return 24; }
gcc -c foo.c
nm foo.o
000000000000000b T bar
0000000000000000 T foo
objcopy --localize-hidden foo.o bar.o
nm bar.o
000000000000000b t bar
0000000000000000 T foo
So bar() is no longer exported from the object (even though it is still present and usable for debugging). You could also remove bar() altogether with objcopy --strip-unneeded.
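For example, to drop the localized symbol entirely (a sketch, assuming nothing else in the object still references bar() through a relocation):
objcopy --localize-hidden foo.o bar.o
objcopy --strip-unneeded bar.o
After this, nm bar.o should list only foo.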

Static libraries cannot do what you want for code compiled with either GCC 3.x or 4.x.
If you can use shared objects (libraries), the GNU linker does what you need with a feature called a version script. This is usually used to provide version-specific entry points, but the degenerate case just distinguishes between public and private symbols without any versioning. A version script is specified with the --version-script= command line option to ld.
The contents of a version script that makes the entry points foo and bar public and hides all other interfaces:
{ global: foo; bar; local: *; };
See the ld doc at: http://sourceware.org/binutils/docs/ld/VERSION.html#VERSION
I'm a big advocate of shared libraries, and this ability to limit the visibility of globals is one of their great virtues.
A document that provides more of the advantages of shared objects, but written for Solaris (by Greg Nakhimovsky of happy memory), is at http://developers.sun.com/solaris/articles/linker_mapfiles.html
I hope this helps.

The merits of this answer will depend on why you're using static libraries. If it's to allow the linker to drop unused objects later then I have little to add. If it's for the purpose of organisation - minimising the number of objects that have to be passed around to link applications - this extension of Employed Russian's answer may be of use.
At compile time, the visibility of all symbols within a compilation unit can be set using:
-fvisibility=hidden
-fvisibility=default
This implies one can compile a single file "interface.c" with default visibility and a larger number of implementation files with hidden visibility, without annotating the source. A relocatable link will then produce a single object file where the non-api functions are "hidden":
ld -r interface.o implementation0.o implementation1.o -o relocatable.o
The combined object file can now be subjected to objcopy:
objcopy --localize-hidden relocatable.o mylibrary.o
Thus we have a single object file "library" or "module" which exposes only the intended API.
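Putting the pieces together, a minimal sketch of the whole sequence (file names are illustrative; only interface.c carries the public API):
gcc -fvisibility=default -c interface.c
gcc -fvisibility=hidden -c implementation0.c
gcc -fvisibility=hidden -c implementation1.c
ld -r interface.o implementation0.o implementation1.o -o relocatable.o
objcopy --localize-hidden relocatable.o mylibrary.o
Running nm mylibrary.o should then show the implementation symbols with lowercase (local) type letters and only the interface symbols as uppercase (global) ones.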
The above strategy interacts moderately well with link time optimisation. Compile with -flto and perform the relocatable link by passing -r to the linker via the compiler:
gcc -fuse-linker-plugin -flto -nostdlib -Wl,-r {objects} -o relocatable.o
Use objcopy to localise the hidden symbols as before, then call the linker a final time to strip the local symbols and whatever other dead code it can find in the post-lto object. Sadly, relocatable.o is unlikely to have retained any lto related information:
gcc -nostdlib -Wl,-r,--discard-all relocatable.o -o mylibrary.o
Current implementations of lto appear to be active during the relocatable link stage. With lto on, the hidden=>local symbols were stripped by the final relocatable link. Without lto, the hidden=>local symbols survived the final relocatable link.
Future implementations of lto seem likely to preserve the required metadata through the relocatable link stage, but at present the outcome of the relocatable link appears to be a plain old object file.

This is a refinement of the answers from EmployedRussian and JonChesterfield, which may be helpful if you're generating both dynamic and static libraries.
Start with the standard mechanism for hiding symbols in DSOs (the dynamic version of your lib). Compile all files with -fvisibility=hidden. In the header file which defines your API, change the declarations of the classes and functions you want to make public:
#define DLL_PUBLIC __attribute__ ((visibility ("default")))
extern DLL_PUBLIC int my_api_func(int);
See here for details. This works for both C and C++. This is sufficient for DSOs, but you'll need to add these build steps for static libraries:
ld -r obj1.o obj2.o ... objn.o -o static1.o
objcopy --localize-hidden static1.o static2.o
ar -rcs mylib.a static2.o
The ar step is optional - you can just link against static2.o.
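A minimal end-to-end sketch of this recipe (file and library names are illustrative):
gcc -fPIC -fvisibility=hidden -c obj1.c obj2.c
gcc -shared -o libmylib.so obj1.o obj2.o
ld -r obj1.o obj2.o -o static1.o
objcopy --localize-hidden static1.o static2.o
ar -rcs mylib.a static2.o
The DSO exports only the DLL_PUBLIC symbols because of -fvisibility=hidden; the static archive exposes only them because of the ld -r / objcopy pass.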

My way of doing it is to mark everything that is not to be exported with INTERNAL, include-guard all .h files, compile dev builds with -DINTERNAL= and compile release builds from a single .c file that includes all the other library .c files, using -DINTERNAL=static.
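A rough sketch of that approach (file and function names are illustrative):
/* helpers.c */
INTERNAL int helper(int x) { return x * 2; } /* not part of the API */
int api_func(int x) { return helper(x) + 1; } /* exported */
/* release.c - single translation unit for release builds */
#include "helpers.c"
#include "other.c"
Dev builds compile each .c file with -DINTERNAL= (so helper stays visible, e.g. for unit tests); release builds compile only release.c with -DINTERNAL=static, so helper becomes file-local and never appears as a global symbol in the archive.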

Related

On linking of shared libraries, are they really final, and if so, why?

I am trying to understand more about linking and shared libraries.
Ultimately, I wonder if it's possible to add a method to a shared library. For instance, suppose one has a source file a.c, and a library lib.so (without the source file). Let's furthermore assume, for simplicity, that a.c declares a single method whose name is not present in lib.so. I thought it might be possible, at link time, to link a.o to lib.so while instructing the linker to create newLib.so and to export all methods/variables from lib.so, so that newLib.so is now basically lib.so with the added method from a.o.
More generally, if one has some source file depending on a shared library, can one create a single output file (library or executable) that is not dependent on the shared library anymore? (That is, all the relevant methods/variables from the library would have been exported/linked/inlined into the new output, making the dependency void.) If that's not possible, what is technically preventing it?
A somehow similar question has been asked here: Merge multiple .so shared libraries.
One of the replies includes the following text: "If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them", without explaining the technical details. Was it a mistake or does it hold? If so, how can it be done?
Once you have a shared library libfoo.so the only ways you can use it
in the linkage of anything else are:-
Link a program that dynamically depends on it, e.g.
$ gcc -o prog bar.o ... -lfoo
Or, link another shared library that dynamically depends on it, e.g.
$ gcc -shared -o libbar.so bar.o ... -lfoo
In either case the product of the linkage, prog or libbar.so
acquires a dynamic dependency on libfoo.so. This means that prog|libbar.so
has information inscribed in it by the linker that instructs the
OS loader, at runtime, to find libfoo.so, load it into the
address space of the current process and bind the program's references to libfoo's exported symbols to
the addresses of their definitions.
So libfoo.so must continue to exist as well as prog|libbar.so.
It is not possible to link libfoo.so with prog|libbar.so in
such a way that libfoo.so is physically merged into prog|libbar.so
and is no longer a runtime dependency.
It doesn't matter whether or not you have the source code of the
other linkage input files - bar.o ... - that depend on libfoo.so. The
only kind of linkage you can do with a shared library is dynamic linkage.
This is in complete contrast with the linkage of a static library.
You wonder about the statement in this answer where it says:
If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them.
The author is just observing that if I have source files
foo_a.c foo_b.c... bar_a.c bar_b.c...
which I compile to the corresponding object files:
foo_a.o foo_b.o... bar_a.o bar_b.o...
or if I simply have those object files, then as well as - or instead of - linking them into two shared libraries:
$ gcc -shared -o libfoo.so foo_a.o foo_b.o...
$ gcc -shared -o libbar.so bar_a.o bar_b.o...
I could link them into one:
$ gcc -shared -o libfoobar.so foo_a.o foo_b.o... bar_a.o bar_b.o...
which would have no dependency on libfoo.so or libbar.so even if they exist.
And although that could be straightforward, it could also fail. If there is
any symbol name that is globally defined in any of foo_a.o foo_b.o... and
also globally defined in any of bar_a.o bar_b.o... then it will not matter
to the linkage of either libfoo.so or libbar.so (and it need not be dynamically
exported by either of them). But the linkage of libfoobar.so will fail for
multiple definition of name.
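A tiny illustration of that failure mode (hypothetical files and names):
/* foo_a.c */
int helper(void) { return 1; } /* global, but only meant for internal use */
int foo(void) { return helper(); }
/* bar_a.c */
int helper(void) { return 2; } /* same global name, different definition */
int bar(void) { return helper(); }
$ gcc -fPIC -shared -o libfoo.so foo_a.c
$ gcc -fPIC -shared -o libbar.so bar_a.c
$ gcc -fPIC -shared -o libfoobar.so foo_a.c bar_a.c
The first two links succeed; the last one fails with a "multiple definition of `helper'" error from the linker.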
If we build a shared library libbar.so that depends on libfoo.so and has
itself been linked with libfoo.so:
$ gcc -shared -o libbar.so bar.o ... -lfoo
and we then want to link a program with libbar.so, we can do that in such a way
that we don't need to mention its dependency libfoo.so:
$ gcc -o prog main.o ... -lbar -Wl,-rpath=<path/to/libfoo.so>
See this answer to follow that up. But
this doesn't change the fact that libbar.so has a runtime dependency on libfoo.so.
If that's not possible, what is technically preventing it?
What technically prevents linking a shared library with some program
or shared library targ in a way that physically merges it into targ is that a
shared library (like a program) is not the sort of thing that a linker knows
how to physically merge into its output file.
Input files that the linker can physically merge into targ need to
have structural properties that guide the linker in doing that merging. That is the structure of object files.
They consist of named input sections of object code or data that are tagged with various attributes.
Roughly speaking, the linker cuts up the object files into their sections and distributes them into
output sections of the output file according to their attributes, and makes
binary modifications to the merged result to resolve static symbol references
or enable the OS loader to resolve dynamic ones at runtime.
This is not a reversible process. The linker can't consume a program or
shared library and reconstruct the object files from which it was made to
merge them again into something else.
But that's really beside the point. When input files are physically
merged into targ, that is called static linkage.
When input files are just externally referenced in targ to
make the OS loader map them into a process it has launched for targ,
that is called dynamic linkage. Technical development has given us
a file-format solution to each of these needs: object files for static linkage, shared libraries
for dynamic linkage. Neither can be used for the purpose of the other.

Create non-PIC shared libraries with ld

I have a bunch of object files that have been compiled without the -fPIC option, so calls to the functions do not go through the PLT. (The source code is C and is compiled with clang.)
I want to link these object files into a shared library that I can load at runtime using dlopen. I need to do this because I have to do a lot of setup before the actual .so is loaded.
But every time I try to link with the -shared option, I get the error -
relocation R_X86_64_PC32 against symbol splay_tree_lookup can not be used when making a shared object; recompile with -fPIC
I have no issues recompiling from source. But I don't want to use -fPIC. This is part of a research project where we are working on a custom compiler. PIC wouldn't work for the type of guarantees we are trying to provide in the compiler.
Is there some flag I can use with ld so that it generates load-time-relocated libraries? In fact I am okay with no relocations: I can provide a base address for the library and dlopen can fail if the virtual address is not available.
The command I am using to compile my C files is equivalent to:
clang -m64 -c foo.c
and for linking I am using
clang -m64 -shared *.o -o foo.so
I say equivalent because it is a custom compiler (forked off clang) and has some extra steps. But it is equivalent.
It is not possible to dynamically load your existing non-PIC objects and expect them to work without problems.
If you cannot recompile the original code to create a proper shared library that supports PIC, then I suggest you create a service executable that links to a static library composed of those objects. The service executable can then provide IPC/RPC/REST API/shared memory/whatever to allow your object code to be used by your program.
Then, you can author a shared library which is compiled with PIC that provides wrapper APIs that launches and communicates with the service executable to perform the actual work.
On further thought, this wrapper API library may as well be static. The dynamic aspect of it is performed by launching the service executable.
Recompiling the library's object files with the -fpic -shared options would be the best option, if this is possible!
man ld says:
-i Perform an incremental link (same as option -r).
-r
--relocatable
Generate relocatable output---i.e., generate an output file that can in turn serve as input to ld. This is often called partial linking. As a side effect, in environments that support standard Unix magic numbers, this option also sets the output file’s magic number to "OMAGIC". If this option is not specified, an absolute file is produced. When linking C++ programs, this option will not resolve references to constructors; to do that, use -Ur.
When an input file does not have the same format as the output file, partial linking is only supported if that input file does not contain any relocations. Different output formats can have further restrictions; for example some "a.out"-based formats do not support partial linking with input files in other formats at all.
I believe you can partially link your library object files into a relocatable (PIC) library, then link that library with your source code object file to make a shared library.
ld -r -o libfoo.so *.o
cp libfoo.so /foodir/libfoo.so
cd foodir
clang -m32 -fpic -c foo.c
clang -m32 -fpic -shared *.o -o foo.so
Regarding library base address:
(Again from man ld)
--section-start=sectionname=org
Locate a section in the output file at the absolute address given by org. You may use this option as many times as necessary to locate multiple sections in the command line. org must be a single hexadecimal integer; for compatibility with other linkers, you may omit the leading 0x usually associated with hexadecimal values. Note: there should be no white space between sectionname, the equals sign ("="), and org.
You could perhaps move your library's .text section?
--image-base value
Use value as the base address of your program or dll. This is the lowest memory location that will be used when your program or dll is loaded. To reduce the need to relocate and improve performance of your dlls, each should have a unique base address and not overlap any other dlls. The default is 0x400000 for executables, and 0x10000000 for dlls. [This option is specific to the i386 PE targeted port of the linker]
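For an ELF target, a sketch of what passing --section-start through the compiler driver might look like (the address is arbitrary, and this does not by itself resolve the non-PIC relocation errors):
clang -m64 -shared *.o -o foo.so -Wl,--section-start=.text=0x70000000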

Why is the order of object files important for static libraries?

I create some files:
file1.c
file2.c
file3.c
I compile them using gcc -c file1.c, do the same for the other files, and get object files. Later I use the ar tool to create a static library.
Everything works correctly, but ar has the option
ar -m -a file.o lib.a filetomove.o
to move object files within the library. Why is the order of object files important? Please show me an example where the object files must be in the correct order.
This is less and less of a problem as time goes on, but for a long time linkers were single pass. That means if a symbol was defined in a.o and referenced in b.o, the linker had to "see" b.o before a.o or it would never find a definition for the reference.
In other circumstances, sometimes a "default" function is provided in a library that is linked last. This is a popular technique in embedded systems development. You can provide an override function by linking it in a static library or object module, but if you don't, the last library will provide a symbol that satisfies the linker.
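A minimal illustration of the single-pass behaviour (hypothetical files):
/* a.c */ int f(void) { return 1; }
/* b.c */ int f(void); int main(void) { return f(); }
gcc -c a.c b.c
ar -rcs liba.a a.o
gcc -o prog liba.a b.o
gcc -o prog b.o liba.a
The first link typically fails with GNU ld ("undefined reference to `f'"): the archive is scanned before b.o introduces the reference, so a.o is never pulled in. The second link works because the reference is already known by the time the archive is searched.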

Statically linking against LAPACK

I'm attempting to do a release of some software and am currently working through a script for the build process. I'm stuck on something I never thought I would be: statically linking LAPACK on x86_64 Linux. During configuration AC_SEARCH_LIBS([main],[lapack]) works, but compilation of the LAPACK units does not work; for example, undefined reference to 'dsyev_', and no LAPACK/BLAS routine goes unnoticed.
I've confirmed I have the libraries installed and even compiled them myself with the appropriate options to make them static with the same results.
Here is an example I had used in my first experience with LAPACK a few years ago that works dynamically, but not statically: http://pastebin.com/cMm3wcwF
The two methods I'm using to compile are the following,
gcc -llapack -o eigen eigen.c
gcc -static -llapack -o eigen eigen.c
Your linking order is wrong. Link libraries after the code that requires them, not before. Like this:
gcc -o eigen eigen.c -llapack
gcc -static -o eigen eigen.c -llapack
That should resolve the linkage problems.
To answer the subsequent question of why this works, the GNU ld documentation says this:
It makes a difference where in the command you write this option; the
linker searches and processes libraries and object files in the order
they are specified. Thus, `foo.o -lz bar.o' searches library `z' after
file foo.o but before bar.o. If bar.o refers to functions in `z',
those functions may not be loaded.
[...]
Normally the files found this way are library files—archive files
whose members are object files. The linker handles an archive file by
scanning through it for members which define symbols that have so far
been referenced but not defined. But if the file that is found is an
ordinary object file, it is linked in the usual fashion.
i.e. the linker is going to make one pass through a file looking for unresolved symbols, and it processes files in the order you provide them (i.e. "left to right"). If the file that needs a symbol has not yet been seen when the library providing it is read, the linker will not be able to satisfy the dependency. Every object in the link list is parsed only once.
Note also that GNU ld can do reordering in cases where circular dependencies are detected when linking shared libraries or object files. But static libraries are only parsed for unknown symbols once.

Limiting visibility of symbols when linking shared libraries

Some platforms mandate that you provide a list of a shared library's external symbols to the linker. However, on most unixish systems that's not necessary: all non-static symbols will be available by default.
My understanding is that the GNU toolchain can optionally restrict visibility just to symbols explicitly declared. How can that be achieved using GNU ld?
GNU ld can do that on ELF platforms.
Here is how to do it with a linker version script:
/* foo.c */
int foo() { return 42; }
int bar() { return foo() + 1; }
int baz() { return bar() - 1; }
gcc -fPIC -shared -o libfoo.so foo.c && nm -D libfoo.so | grep ' T '
By default, all symbols are exported:
0000000000000718 T _fini
00000000000005b8 T _init
00000000000006b7 T bar
00000000000006c9 T baz
00000000000006ac T foo
Let's say you want to export only bar() and baz(). Create a "version script" libfoo.version:
FOO {
global: bar; baz; # explicitly list symbols to be exported
local: *; # hide everything else
};
Pass it to the linker:
gcc -fPIC -shared -o libfoo.so foo.c -Wl,--version-script=libfoo.version
Observe exported symbols:
nm -D libfoo.so | grep ' T '
00000000000005f7 T bar
0000000000000609 T baz
I think the easiest way of doing that is adding -fvisibility=hidden to the gcc options and explicitly making some symbols public in the code (with __attribute__((visibility("default")))). See the documentation here.
There may be a way to accomplish that by ld linker scripts, but I don't know much about it.
The code generated to call any exported functions or use any exported globals is less efficient than those that aren't exported. There is an extra level of indirection involved. This applies to any function that might be exported at compile time. gcc will still produce extra indirection for a function that is later un-exported by a linker script. So using the visibility attribute will produce better code than the linker script.
It seems there are several ways to manage exported symbols on GNU/Linux. From my reading, these are the 3 methods:
Source code annotation/decoration:
Method 1: -fvisibility=hidden along with __attribute__((visibility("default")))
Method 2 (since GCC 4): #pragma GCC visibility
Version Script:
Method 3: Version script (aka "symbol maps") passed to the linker (eg. -Wl,--version-script=<version script file>)
I won't get into examples here since they're mostly covered by other answers, but here are some notes, pros, and cons of the different approaches, off the top of my head:
Using the annotated approach allows the compiler to optimize the code a bit (one less indirection).
If using the annotated approach, then consider also using strip --strip-all --discard-all.
The annotated approach can add more work for internal function-level unit tests since the unit tests may not have access to the symbols. This might require building separate files: one for internal development & testing, and another for production. (This approach is generally non-optimal from a unit test purist perspective.)
Using a version script loses the optimization but allows symbol versioning which seems to not be available with the annotated approach.
Using a version script allows for unit testing assuming code is first built into an archive (.a) file and then linked into a DSO (.so). The unit tests would link with the .a.
Version scripts are not supported on Mac (at least not if using the linker provided by Mac, even if using GCC for the compiler), so if Mac is needed use the annotated approach.
I'm sure there are others.
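One exception: Method 2 isn't shown in the other answers; it looks roughly like this (a sketch, with illustrative function names):
/* mylib.h */
#pragma GCC visibility push(default)
int public_api(int x); /* exported */
#pragma GCC visibility pop
#pragma GCC visibility push(hidden)
int internal_helper(int x); /* hidden, even without -fvisibility=hidden */
#pragma GCC visibility pop
The pragma applies to every declaration between push and pop, so it is handy for wrapping whole headers without annotating each function.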
Here are some references (with examples) that I've found helpful:
http://blog.fesnel.com/blog/2009/08/19/hiding-whats-exposed-in-a-shared-library/
https://accu.org/index.php/journals/1372
https://akkadia.org/drepper/dsohowto.pdf
If you are using libtool, there is another option much like Employed Russian's answer.
Using his example, it would be something like:
cat export.sym
bar
baz
Then run libtool with the following option:
libtool -export-symbols export.sym ...
Note that when using -export-symbols, symbols are NOT exported by default; only those listed in export.sym are exported (so the "local: *" line in libfoo.version is effectively implicit in this approach).
