Limiting visibility of symbols when linking shared libraries - linker

Some platforms mandate that you provide a list of a shared library's external symbols to the linker. However, on most unixish systems that's not necessary: all non-static symbols will be available by default.
My understanding is that the GNU toolchain can optionally restrict visibility just to symbols explicitly declared. How can that be achieved using GNU ld?

GNU ld can do that on ELF platforms.
Here is how to do it with a linker version script:
/* foo.c */
int foo() { return 42; }
int bar() { return foo() + 1; }
int baz() { return bar() - 1; }
gcc -fPIC -shared -o libfoo.so foo.c && nm -D libfoo.so | grep ' T '
By default, all symbols are exported:
0000000000000718 T _fini
00000000000005b8 T _init
00000000000006b7 T bar
00000000000006c9 T baz
00000000000006ac T foo
Let's say you want to export only bar() and baz(). Create a "version script" libfoo.version:
FOO {
global: bar; baz; # explicitly list symbols to be exported
local: *; # hide everything else
};
Pass it to the linker:
gcc -fPIC -shared -o libfoo.so foo.c -Wl,--version-script=libfoo.version
Observe exported symbols:
nm -D libfoo.so | grep ' T '
00000000000005f7 T bar
0000000000000609 T baz

I think the easiest way of doing that is adding the -fvisibility=hidden to gcc options and explicitly make visibility of some symbols public in the code (by __attribute__((visibility("default")))). See the documentation here.
There may be a way to accomplish that by ld linker scripts, but I don't know much about it.

The code generated to call any exported functions or use any exported globals is less efficient than those that aren't exported. There is an extra level of indirection involved. This applies to any function that might be exported at compile time. gcc will still produce extra indirection for a function that is later un-exported by a linker script. So using the visibility attribute will produce better code than the linker script.

Seems there's several ways to manage exported symbols on GNU/Linux. From my reading these are the 3 methods:
Source code annotation/decoration:
Method 1: -fvisibility=hidden along with __attribute__((visibility("default")))
Method 2 (since GCC 4): #pragma GCC visibility
Version Script:
Method 3: Version script (aka "symbol maps") passed to the linker (eg. -Wl,--version-script=<version script file>)
I won't get into examples here since they're mostly covered by other answers, but here's some notes, pros & cons to the different approaches off the top of my head:
Using the annotated approach allows the compiler to optimize the code a bit (one less indirection).
If using the annotated approach, then consider also using strip --strip-all --discard-all.
The annotated approach can add more work for internal function-level unit tests since the unit tests may not have access to the symbols. This might require building separate files: one for internal development & testing, and another for production. (This approach is generally non-optimal from a unit test purist perspective.)
Using a version script loses the optimization but allows symbol versioning which seems to not be available with the annotated approach.
Using a version script allows for unit testing assuming code is first built into an archive (.a) file and then linked into a DSO (.so). The unit tests would link with the .a.
Version scripts are not supported on Mac (at least not if using the linker provided by Mac, even if using GCC for the compiler), so if Mac is needed use the annotated approach.
I'm sure there are others.
Here's some references (with examples) that I've found helpful:
http://blog.fesnel.com/blog/2009/08/19/hiding-whats-exposed-in-a-shared-library/
https://accu.org/index.php/journals/1372
https://akkadia.org/drepper/dsohowto.pdf

If you are using libtool, there is another option much like Employed Russian's answer.
Using his example, it would be something like:
cat export.sym
bar
baz
Then run libtool with the following option:
libtool -export-symbols export.sym ...
Note that when using -export-symbols all symbols are NOT exported by default, and only those in export.sym are exported (so the "local: *" line in libfoo.version is actually implicit in this approach).

Related

How to write common functions for reusing in C

I was trying to write a common function for other files could reuse it, the example as following, I have three files:
The first file: cat test1.h
void say();
The second file: cat test1.c
void say(){
printf("This is c example!");
}
The third file: cat test2.c
include "test1.h"
void main(){
say();
}
but when I ran: gcc -g -o test2 test2.c
it threw error as:
undefined reference to `say'
Additionally: I knew this would work:gcc -g -o test2 test1.c test2.c
but I don't wanna do this, because the other team would use the server, and I hope them directly use my binary code not source code. I hope that just like we use printf() function, we just need include .
You can build yourself a library from the object files containing your useful functions, and store the header(s) that describe them in a convenient location. You and your colleagues then compile with the headers and link that library with any executables that use any of those functions. That's very much the same general mechanism that the C compiler uses to include the standard headers and automatically link with the standard C library.
The mechanics vary a bit depending on platform (Windows vs Unix being the primary distinction, though there are differences between Unix platforms too), and also on the type of library (static archive vs dynamic linked / loaded libraries — also known as shared objects or shared libraries).
In broad outline, for a Unix system with a static library, you'd:
Compile library object files libfile1.o, libfile2.o, … using (for example) gcc -c libfile1.c libfile2.c.
Create an archive from the object files — using for example ar r libname.a libfile1.o libfile2.o.
Copy the headers to a standard location such as /usr/local/include.
Copy the library to a standard location such as /usr/local/lib.
You'd compile any code that uses the library functions with -I/usr/local/include (if that is not already a standard compilation option).
You'd link the programs with -L/usr/local/lib -lname (you might not need to specify -L… but you would need to specify -lname).
Including a header file does not make a function available. It simply informs the compiler that the function will be provided at a later time.
You should compile the file with the function into a shareable object file (or a library if there is more than one function that you want to share). Mind the switch -c which tells gcc not to build an executable file:
gcc -o test1.o test1.c -c
Similarly, compile the main function into its own object file. Now you or anyone else can link the object file with their main program:
gcc -o test2 test2.o test1.o
The process can be automated using make.
Other programmers can use compiled object files (`*.o') in their programs. They need only to have a header file with function prototypes, extern data declarations and type definitions.
You can also wrap many object files into the library.
On many systems you can also create the dynamic linked libraries which do not have to be linked into the executable.
you also need to compile test1:
gcc -g -o test2 test1.c test2.c.

Linking error in static lib unless used in main project

I'm creating a little static library for having thread pools, and it depends on 2 other homemade static libraries (a homemade printf and a homemade mini libc).
But sub-functions like ft_bzero are not linked in the project unless I use them on the root project, the one that needs to use thread pools library. So I have the linking error coming from my thpool lib.
Sample :
cc -Wall -Werror -Wextra -MD -I ./ -I ./jqueue -I ../libft/incs -I
../printf/incs -o .objs/thpool_create.o -c ./thpool_create.c
ar rc libthpool.a ./.objs/thpool_create.o etcetc
In the libraries, I compile every .o and use an ar rc libthpool.a *.o. Then I compile .o from main project (a single test.c actually), and then
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lft -lftprintf -lthpool -lpthread
How can I solve my errors?
Since the code in the ftpool library uses code from ft and ftprintf, you (almost certainly) need to list the libraries in the reverse order:
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lthpool -lftprintf -lft -lpthread
When scanning a static library, the linker looks for definitions of symbols that are currently undefined. If your test code only calls functions from thpool, then none of the symbols in ft are referenced when the ft library is scanned, so nothing is included from the library; if none of the symbols from ftprintf are referenced when the ftprintf library is scanned, nothing is included from ftprintf either. When it comes across the symbols in thpool that reference things from ft or ftprintf, it's too late; the linker doesn't rescan the libraries. Hence you need to list the libraries in an order such that all references from one library (A) to another (B) are found by linking (A) before (B). If the test code references some of the functions in ft or ftprintf, you may get lucky, or a bit lucky; some symbols may be linked in. But if there are functions in thpool that make the first reference to a function in ft, with the order in the question, you've lost the chance to link everything. Hence the suggested reordering.
Another (very grubby, but nonetheless effective) technique is to rescan the static libraries by listing them several times on the command line.
With shared libraries, the rules of linking are different. If a shared library satisfies any symbol, the whole library will be available, so the linker remembers all the defined symbols, and you might well get away with the original link order.
You might need to look up 'topological sort'. You should certainly aim to design your static libraries so that there are no loops in the dependencies; that leads to cycles of dependencies, and the only reliable solutions are either to rescan the libraries or combine the libraries.

Linking multiple incompatible versions of a static library into one executable

I am presently developing for a system which discourages (i.e. essentially forbids) dynamic libraries. Therefore, everything has to be linked statically.
The application framework I am using (which cannot be changed) is using an old, statically-linked version of a library libfoo.a (version r7). A library I am using, libbar, needs libfoo.a version r8 (specifically, some of the new features are crucial for the library to function). I can edit and recompile libbar as well as libfoo r8, but I want to avoid changing them as much as possible because I am not very familiar with the code (and would have to pass code changes upstream).
Unfortunately, the two libfoo libraries have a substantial number of symbols in common. So, the linker spits out a ton of "multiple symbol definition" errors.
I've heard it's possible to use objcopy and friends to "inline" a static library into another. However, I'm not really sure how to achieve this in practice, nor if it's even the best option.
So, how can I successfully compile an executable which uses two, incompatible versions of the same library? I've already considered avoiding this situation but it will be much harder to work with.
It turns out that this is actually possible with some ld and objcopy magic.
Basically, the procedure looks like this:
# Unpack libraries
ar x libbar.a
ar x libfoo.a
# Grab symbol table (symbols to export)
nm -Ag libbar.a | grep -v ' U ' | cut -d' ' -f 3 > libbar.sym
# Build a single object file with libfoo relocated in
ld -Er *.o -o libbar-merged.lo
# Localize all symbols except for libbar's symbols
objcopy --keep-global-symbols libbar.sym libbar-merged.lo libbar-merged.o
# Create an archive to hold the merged library
ar crs libbar-merged.a libbar-merged.o
This effectively creates a single super-library which exports only the symbols from the original libbar, and which has the other library relocated in.
There's probably another, cleaner way to achieve this result, but this method works for me and allows me to statically link two incompatible libraries into the same executable, with no apparent ill effects.

gcc detect duplicate symbols/functions in static libraries

Is there any way we can get gcc to detect a duplicate symbol in static libraries vs the main code (Or another static library ?)
Here's the situation:
main.c erroneously contained a function definition, e.g. with the signature uint foohash(const char*)
foo.c also contains a function definition with the signature uint foohash(const char*)
foo.c and other source files are compiled to a static util library, which the main program links in, i.e. something like:
gcc -o main main.o util.o -L ./libs -lfooutils
So, now main.o and libs/libfooutils.a both contain a foohash function. Presumably the linker found that symbol in main.o and doesn't bother looking for it elsewhere.
Is there any way we can get gcc to detect such a situation ?
Indeed as Simon Richter stated, --whole-archive option can be useful. Try to change your command-line to:
gcc -o main main.o util.o -L ./libs -Wl,--whole-archive -lfooutils -Wl,--no-whole-archive
and you'll see a multiple definition error.
gcc calls the ld program for linking. The relevant ld options are:
--no-define-common
--traditional-format
--warn-common
See the man page for ld. These should be what you need to experiment with to get the warnings sought.
Short answer: no.
GCC does not actually do anything with libraries. It is the task of ld, the linker (called automatically by GCC) to pull in symbols from libraries, and that's really a fairly dumb tool.
The linker has lots of complex jiggery pokery for combining different types of data from different sources, and supporting different file formats, and all the evil little details of binary executables, but in the end, all it really does is look for undefined symbols and find the definitions.
What you can do is a link trace (pass -t to gcc) to see what comes from where. Or else run nm on all the object files and libraries in your system, and write a script to detect duplicates.

Restricting symbols in a Linux static library

I'm looking for ways to restrict the number of C symbols exported to a Linux static library (archive). I'd like to limit these to only those symbols that are part of the official API for the library. I already use 'static' to declare most functions as static, but this restricts them to file scope. I'm looking for a way to restrict to scope to the library.
I can do this for shared libraries using the techniques in Ulrich Drepper's How to Write Shared Libraries, but I can't apply these techniques to static archives. In his earlier Good Practices in Library Design paper, he writes:
The only possibility is to combine all object files which need
certain internal resources into one using 'ld -r' and then restrict the symbols
which are exported by this combined object file. The GNU linker has options to
do just this.
Could anyone help me discover what these options might be? I've had some success with 'strip -w -K prefix_*', but this feels brutish. Ideally, I'd like a solution that will work with both GCC 3 and 4.
Thanks!
I don't believe GNU ld has any such options; Ulrich must have meant objcopy, which has many such options: --localize-hidden, --localize-symbol=symbolname, --localize-symbols=filename.
The --localize-hidden in particular allows one to have a very fine control over which symbols are exposed. Consider:
int foo() { return 42; }
int __attribute__((visibility("hidden"))) bar() { return 24; }
gcc -c foo.c
nm foo.o
000000000000000b T bar
0000000000000000 T foo
objcopy --localize-hidden foo.o bar.o
nm bar.o
000000000000000b t bar
0000000000000000 T foo
So bar() is no longer exported from the object (even though it is still present and usable for debugging). You could also remove bar() all together with objcopy --strip-unneeded.
Static libraries can not do what you want for code compiled with either GCC 3.x or 4.x.
If you can use shared objects (libraries), the GNU linker does what you need with a feature called a version script. This is usually used to provide version-specific entry points, but the degenerate case just distinguishes between public and private symbols without any versioning. A version script is specified with the --version-script= command line option to ld.
The contents of a version script that makes the entry points foo and bar public and hides all other interfaces:
{ global: foo; bar; local: *; };
See the ld doc at: http://sourceware.org/binutils/docs/ld/VERSION.html#VERSION
I'm a big advocate of shared libraries, and this ability to limit the visibility of globals is one their great virtues.
A document that provides more of the advantages of shared objects, but written for Solaris (by Greg Nakhimovsky of happy memory), is at http://developers.sun.com/solaris/articles/linker_mapfiles.html
I hope this helps.
The merits of this answer will depend on why you're using static libraries. If it's to allow the linker to drop unused objects later then I have little to add. If it's for the purpose of organisation - minimising the number of objects that have to be passed around to link applications - this extension of Employed Russian's answer may be of use.
At compile time, the visibility of all symbols within a compilation unit can be set using:
-fvisibility=hidden
-fvisibility=default
This implies one can compile a single file "interface.c" with default visibility and a larger number of implementation files with hidden visibility, without annotating the source. A relocatable link will then produce a single object file where the non-api functions are "hidden":
ld -r interface.o implementation0.o implementation1.o -o relocatable.o
The combined object file can now be subjected to objcopy:
objcopy --localize-hidden relocatable.o mylibrary.o
Thus we have a single object file "library" or "module" which exposes only the intended API.
The above strategy interacts moderately well with link time optimisation. Compile with -flto and perform the relocatable link by passing -r to the linker via the compiler:
gcc -fuse-linker-plugin -flto -nostdlib -Wl,-r {objects} -o relocatable.o
Use objcopy to localise the hidden symbols as before, then call the linker a final time to strip the local symbols and whatever other dead code it can find in the post-lto object. Sadly, relocatable.o is unlikely to have retained any lto related information:
gcc -nostdlib -Wl,-r,--discard-all relocatable.o mylibrary.o
Current implementations of lto appear to be active during the relocatable link stage. With lto on, the hidden=>local symbols were stripped by the final relocatable link. Without lto, the hidden=>local symbols survived the final relocatable link.
Future implementations of lto seem likely to preserve the required metadata through the relocatable link stage, but at present the outcome of the relocatable link appears to be a plain old object file.
This is a refinement of the answers from EmployedRussian and JonChesterfield, which may be helpful if you're generating both dynamic and static libraries.
Start with the standard mechanism for hiding symbols in DSOs (the dynamic version of your lib). Compile all files with -fvisibility=hidden. In the header file which defines your API, change the declarations of the classes and functions you want to make public:
#define DLL_PUBLIC __attribute__ ((visibility ("default")))
extern DLL_PUBLIC int my_api_func(int);
See here for details. This works for both C and C++. This is sufficient for DSOs, but you'll need to add these build steps for static libraries:
ld -r obj1.o obj2.o ... objn.o -o static1.o
objcopy --localize-hidden static1.o static2.o
ar -rcs mylib.a static2.o
The ar step is optional - you can just link against static2.o.
My way of doing it is to mark everything that is not to be exported with INTERNAL,
include guard all .h files, compile dev builds with -DINTERNAL= and compile release builds with a single .c file that includes all other library .c files with -DINTERNAL=static.

Resources