How do I strip symbols only from dependent libraries?

How do I strip symbols only from dependent libraries? - c

I'd like to ship libfoo.a, which is composed of foo.o--which in turn depends on libVendorBar.a and libVendorZoo.a.
When I link and generate my libfoo.a I notice that symbols in libVendor*.a are still public and visible for potential client applications to link against.
Due to many reasons outside of my control, I absolutely do not want 3rd party clients to be able to directly link against the vendor libraries.
How do I force gcc to resolve all libVendor symbols for libfoo and discard them, so that only symbols from libfoo are visible?
I'm not using any LD_FLAGS currently and everything is statically linked.

Unfortunately static libraries do not have equivalent of -fvisibility=hidden used for shared libraries. You can achieve what you need with more work though:
first link all necessary code into foo.o:
ld -r foo.o -Lpath/to/vendor/libs -lBar -lZoo -o foo_linked.o
This would allow you can to ship libfoo.a without vendor libs (vendor symbols are still present in it).
Unfortunately you can't simply remove vendor symbols from library symtab (e.g. via objcopy -L and strip --strip-symbol) because linker will need them for relocation processing during final executable link. But you can at least rename them to something unreadable:
for sym in all symbols you want to hide; do
id=$(echo $sym | md5sum | awk '{print $1}')
objcopy --redefine-sym $sym=f_$id foo_linked.o
done
Note however that this wouldn't stop motivated user from reverse engineering vendor's code.

Related

Create non-PIC shared libraries with ld

I have a bunch of object files that have been compiled without the -fPIC option. So the calls to the functions do not use #PLT. (source code is C and is compiled with clang).
I want to link these object files into a shared library that I can load at runtime using dlopen. I need to do this because I have to do a lot of setup before the actual .so is loaded.
But every time I try to link with the -shared option, I get the error -
relocation R_X86_64_PC32 against symbol splay_tree_lookup can not be used when making a shared object; recompile with -fPIC
I have no issues recompiling from source. But I don't want to use -fPIC. This is part of a research project where we are working on a custom compiler. PIC wouldn't work for the type of guarantees we are trying to provide in the compiler.
Is there some flag I can use with ld so that it generate load time relocating libraries. In fact I am okay with no relocations. I can provide a base address for the library and dlopen can fail if the virtual address is not available.
The command I am using for compiling my c files are equivalent to -
clang -m64 -c foo.c
and for linking I am using
clang -m64 -shared *.o -o foo.so
I say equivalent because it is a custom compiler (forked off clang) and has some extra steps. But it is equivalent.

It is not possible to dynamically load your existing non PIC objects with the expectation of it working without problems.
If you cannot recompile the original code to create a proper shared library that supports PIC, then I suggest you create a service executable that links to a static library composed of those objects. The service executable can then provide IPC/RPC/REST API/shared memory/whatever to allow your object code to be used by your program.
Then, you can author a shared library which is compiled with PIC that provides wrapper APIs that launches and communicates with the service executable to perform the actual work.
On further thought, this wrapper API library may as well be static. The dynamic aspect of it is performed by launching the service executable.

Recompiling the library's object files with the -fpic -shared options would be the best option, if this is possible!
man ld says:
-i Perform an incremental link (same as option -r).
-r
--relocatable
Generate relocatable output---i.e., generate an output file that can in turn serve as input to ld. This is often called partial linking. As a side effect, in environments that support standard Unix magic numbers, this option also sets the output file’s magic number to "OMAGIC". If this option is not specified, an absolute file is produced. When linking C++ programs, this option will not resolve references to constructors; to do that, use -Ur.
When an input file does not have the same format as the output file, partial linking is only supported if that input file does not contain any relocations. Different output formats can have further restrictions; for example some "a.out"-based formats do not support partial linking with input files in other formats at all.
I believe you can partially link your library object files into a relocatable (PIC) library, then link that library with your source code object file to make a shared library.
ld -r -o libfoo.so *.o
cp libfoo.so /foodir/libfoo.so
cd foodir
clang -m32 -fpic -c foo.c
clang -m32 -fpic -shared *.o -o foo.so
Regarding library base address:
(Again from man ld)
--section-start=sectionname=org
Locate a section in the output file at the absolute address given by org. You may use this option as many times as necessary to locate multiple sections in the command line. org must be a single hexadecimal integer; for compatibility with other linkers, you may omit the leading 0x usually associated with hexadecimal values. Note: there should be no white space between sectionname, the equals sign ("="), and org.
You could perhaps move your library's .text section?
--image-base value
Use value as the base address of your program or dll. This is the lowest memory location that will be used when your program or dll is loaded. To reduce the need to relocate and improve performance of your dlls, each should have a unique base address and not overlap any other dlls. The default is 0x400000 for executables, and 0x10000000 for dlls. [This option is specific to the i386 PE targeted port of the linker]

Linking multiple incompatible versions of a static library into one executable

I am presently developing for a system which discourages (i.e. essentially forbids) dynamic libraries. Therefore, everything has to be linked statically.
The application framework I am using (which cannot be changed) is using an old, statically-linked version of a library libfoo.a (version r7). A library I am using, libbar, needs libfoo.a version r8 (specifically, some of the new features are crucial for the library to function). I can edit and recompile libbar as well as libfoo r8, but I want to avoid changing them as much as possible because I am not very familiar with the code (and would have to pass code changes upstream).
Unfortunately, the two libfoo libraries have a substantial number of symbols in common. So, the linker spits out a ton of "multiple symbol definition" errors.
I've heard it's possible to use objcopy and friends to "inline" a static library into another. However, I'm not really sure how to achieve this in practice, nor if it's even the best option.
So, how can I successfully compile an executable which uses two, incompatible versions of the same library? I've already considered avoiding this situation but it will be much harder to work with.

It turns out that this is actually possible with some ld and objcopy magic.
Basically, the procedure looks like this:
# Unpack libraries
ar x libbar.a
ar x libfoo.a
# Grab symbol table (symbols to export)
nm -Ag libbar.a | grep -v ' U ' | cut -d' ' -f 3 > libbar.sym
# Build a single object file with libfoo relocated in
ld -Er *.o -o libbar-merged.lo
# Localize all symbols except for libbar's symbols
objcopy --keep-global-symbols libbar.sym libbar-merged.lo libbar-merged.o
# Create an archive to hold the merged library
ar crs libbar-merged.a libbar-merged.o
This effectively creates a single super-library which exports only the symbols from the original libbar, and which has the other library relocated in.
There's probably another, cleaner way to achieve this result, but this method works for me and allows me to statically link two incompatible libraries into the same executable, with no apparent ill effects.

Is there a way to unhide hidden-visibility symbols with GNU binutils?

I'm working on a script to make uClibc usable on an existing glibc-targetted gcc/binutils toolchain, and the one problem I'm left with is that pthread_cancel needs to dlopen libgcc_s.so.1. The version supplied with the host gcc is linked to depend on glibc, so I'm instead using ld's -u option to pull in the needed symbols (and their dependencies) from libgcc_eh.a to make a replacement libgcc_s.so.1:
gcc -specs uclibc.specs -Wl,-u,_Unwind_Resume -Wl,-u,__gcc_personality_v0 \
-Wl,-u,_Unwind_ForcedUnwind -Wl,-u,_Unwind_GetCFA -shared -o libgcc_s.so.1
In principle I would be done, but all the symbols in libgcc_eh.a have their visibility set to hidden, so in the output .so file, they all become local and don't get added to the .dynsym symbol table.
I'm looking for a way to use binutils (perhaps objcopy? or a linker script?) on either the .so file or the original .o files in libgcc_eh.a to un-hide these symbols. Is this possible?

objcopy doesn't seem to have this feature, but you can do it with the ELFkickers rebind tool:
rebind --visibility default file.o SYMBOLS...
This must be done on the original .o files. If you try to do it on the .so it'll be too late, because the hidden symbols will have been omitted from the .dynsym section.

I think you should be able to use --globalize-symbol in objcopy.
e.g.
$ nm /usr/lib/gcc/i686-redhat-linux/4.6.3/libgcc_eh.a | grep emutls_alloc
00000000 t emutls_alloc
$ objcopy --globalize-symbol=emutls_alloc /usr/lib/gcc/i686-redhat-linux/4.6.3/libgcc_eh.a /tmp/libgcc_eh.a
$ nm /tmp/libgcc_eh.a |grep emutls_alloc
00000000 T emutls_alloc
You can provide --globalize-symbol several times to objcopy, but you'll need to explicitly mention the full symbol name of all the symbols you want to globalize.
Though I'm not sure what kind of breakage could occur turning libgcc_eh.a into a shared object, as libgcc_eh.a is presumably compiled without -fpic/-fPIC. Turns out libgcc_eh.a is compiled as position independent code.

object file from .a not included in .so

I have created a .c file which is being converted to a .o file along with around 300 other .c files and included in a .a static library. This library, along with many others is being used to create a .so dynamic library. On analyzing both the .a and the .so file with nm, I found that for some reason the symbols defined in the .c file are present in the .a file but not in the .so file. I can think of no reason this should happen. Can somebody please help me out here? The steps used to create the two binaries are:
gcc -fvisibility=hidden -c foo.c -o foo.c.o
ar cr libbar.a foo.c.o ...
gcc -fvisibility=hidden -fPIC -o libfinal.so libbar.a x.o y.a ...
The reason I have specified visibility hidden here is that I want to expose only a few selected symbols. To expose the symbols from foo.c I have specified the visibility attribute so that the functions signatures in the header foo.h look like:
extern int _____attribute_____ ((visibility ("default"))) func();
EDIT: The command nm libbar.a | grep Ctx gives:
000023c5 T CtxAcquireBitmap
000026e9 T CtxAcquireArray
00001e77 T CtxCallMethod
However, nm libfinal.so | grep Ctx does not show anything.
UPDATE: Found another post which discusses the uses of the --whole-archive option. Also, stumbled across the --export-dynamicoption which apparently tells the linker to retain unreferenced symbols. Investigating further.

Try using --whole-archive linker option to include all objects into your shared library when linking
gcc -o libfinal.so -Wl,--whole-archive libbar.a x.o y.a -Wl,--no-whole-archive
From man ld:
--whole-archive
For each archive mentioned on the command line after the --whole-archive option, include every object file in the archive in the
link, rather than searching the archive for the required object files. This is normally used to turn an archive file into a shared
library, forcing every object to be included in the resulting shared library. This option may be used more than once.
Two notes when using this option from gcc: First, gcc doesn't know about this option, so you have to use -Wl,-whole-archive.
Second, don't forget to use -Wl,-no-whole-archive after your list of archives, because gcc will add its own list of archives to your
link and you may not want this flag to affect those as well.

As far as I know, when compiling against a .a, gcc will only pull out the objects that are referenced by the other modules. If your intent is to include the whole content of the .a in the .so, a plain "compile/link x.c into libfinal.so using content in libbar.a" is not what you want.

Creating a dummy reference for the required symbols in my main file did not solve the problem. The referenced symbols appeared in the binary dump (obtained using nm) with a U (= undefined) marker. I managed to solve the problem by linking the object file directly when creating the .so file instead of including it in the .a library first. As these functions were marked extern they were included in the .so even though they were not being referenced within the library. Had they not been marked extern, they would not have been included just like sylvainulg said.
Thanks to Dmitry for pointing out the --whole-archive option. I did not know that such an option exists.

Restricting symbols in a Linux static library

I'm looking for ways to restrict the number of C symbols exported to a Linux static library (archive). I'd like to limit these to only those symbols that are part of the official API for the library. I already use 'static' to declare most functions as static, but this restricts them to file scope. I'm looking for a way to restrict to scope to the library.
I can do this for shared libraries using the techniques in Ulrich Drepper's How to Write Shared Libraries, but I can't apply these techniques to static archives. In his earlier Good Practices in Library Design paper, he writes:
The only possibility is to combine all object files which need
certain internal resources into one using 'ld -r' and then restrict the symbols
which are exported by this combined object file. The GNU linker has options to
do just this.
Could anyone help me discover what these options might be? I've had some success with 'strip -w -K prefix_*', but this feels brutish. Ideally, I'd like a solution that will work with both GCC 3 and 4.
Thanks!

I don't believe GNU ld has any such options; Ulrich must have meant objcopy, which has many such options: --localize-hidden, --localize-symbol=symbolname, --localize-symbols=filename.
The --localize-hidden in particular allows one to have a very fine control over which symbols are exposed. Consider:
int foo() { return 42; }
int __attribute__((visibility("hidden"))) bar() { return 24; }
gcc -c foo.c
nm foo.o
000000000000000b T bar
0000000000000000 T foo
objcopy --localize-hidden foo.o bar.o
nm bar.o
000000000000000b t bar
0000000000000000 T foo
So bar() is no longer exported from the object (even though it is still present and usable for debugging). You could also remove bar() all together with objcopy --strip-unneeded.

Static libraries can not do what you want for code compiled with either GCC 3.x or 4.x.
If you can use shared objects (libraries), the GNU linker does what you need with a feature called a version script. This is usually used to provide version-specific entry points, but the degenerate case just distinguishes between public and private symbols without any versioning. A version script is specified with the --version-script= command line option to ld.
The contents of a version script that makes the entry points foo and bar public and hides all other interfaces:
{ global: foo; bar; local: *; };
See the ld doc at: http://sourceware.org/binutils/docs/ld/VERSION.html#VERSION
I'm a big advocate of shared libraries, and this ability to limit the visibility of globals is one their great virtues.
A document that provides more of the advantages of shared objects, but written for Solaris (by Greg Nakhimovsky of happy memory), is at http://developers.sun.com/solaris/articles/linker_mapfiles.html
I hope this helps.

The merits of this answer will depend on why you're using static libraries. If it's to allow the linker to drop unused objects later then I have little to add. If it's for the purpose of organisation - minimising the number of objects that have to be passed around to link applications - this extension of Employed Russian's answer may be of use.
At compile time, the visibility of all symbols within a compilation unit can be set using:
-fvisibility=hidden
-fvisibility=default
This implies one can compile a single file "interface.c" with default visibility and a larger number of implementation files with hidden visibility, without annotating the source. A relocatable link will then produce a single object file where the non-api functions are "hidden":
ld -r interface.o implementation0.o implementation1.o -o relocatable.o
The combined object file can now be subjected to objcopy:
objcopy --localize-hidden relocatable.o mylibrary.o
Thus we have a single object file "library" or "module" which exposes only the intended API.
The above strategy interacts moderately well with link time optimisation. Compile with -flto and perform the relocatable link by passing -r to the linker via the compiler:
gcc -fuse-linker-plugin -flto -nostdlib -Wl,-r {objects} -o relocatable.o
Use objcopy to localise the hidden symbols as before, then call the linker a final time to strip the local symbols and whatever other dead code it can find in the post-lto object. Sadly, relocatable.o is unlikely to have retained any lto related information:
gcc -nostdlib -Wl,-r,--discard-all relocatable.o mylibrary.o
Current implementations of lto appear to be active during the relocatable link stage. With lto on, the hidden=>local symbols were stripped by the final relocatable link. Without lto, the hidden=>local symbols survived the final relocatable link.
Future implementations of lto seem likely to preserve the required metadata through the relocatable link stage, but at present the outcome of the relocatable link appears to be a plain old object file.

This is a refinement of the answers from EmployedRussian and JonChesterfield, which may be helpful if you're generating both dynamic and static libraries.
Start with the standard mechanism for hiding symbols in DSOs (the dynamic version of your lib). Compile all files with -fvisibility=hidden. In the header file which defines your API, change the declarations of the classes and functions you want to make public:
#define DLL_PUBLIC __attribute__ ((visibility ("default")))
extern DLL_PUBLIC int my_api_func(int);
See here for details. This works for both C and C++. This is sufficient for DSOs, but you'll need to add these build steps for static libraries:
ld -r obj1.o obj2.o ... objn.o -o static1.o
objcopy --localize-hidden static1.o static2.o
ar -rcs mylib.a static2.o
The ar step is optional - you can just link against static2.o.

My way of doing it is to mark everything that is not to be exported with INTERNAL,
include guard all .h files, compile dev builds with -DINTERNAL= and compile release builds with a single .c file that includes all other library .c files with -DINTERNAL=static.