Removing symbols from `.a`s - linker

I'm compiling a C++ static library using g++ via Cmake. I want to remove symbols relating to the internal implementation so they don't show up in nm. (See here and here for the same with shared libraries.)
This answer tells you how to do it on iOS, and I'm trying to understand what happens under the hood so I can replicate on Linux. They invoke ld with:
-r/--relocatable to Generate relocatable output---i.e., generate an output file that can in turn serve as input to ld.
-x/--discard-all: Delete all local symbols.
AFAICS the -r glues all the modules into one module, and then the -x removes symbols only used inside that module. Is that right?
It's not clear how the linker 'knows' which symbols will be exported externally? Does it rely on __attribute__((visibility("hidden/default"))) as in the .so case?
Edit: clearly I'm confused... I thought cmake invoked ld to link the .os into .a. Googled + clarified above.
Question still stands: how do I modify the build process to exclude most symbols?

Related

ldd says "not found" for one file, and found for another

I'm trying to gather all the dependencies needed by some .so file. I use the Recursive ldd script, but that doesn't really matter for the manner of sake.
I want to put all the .so files in one directory, say it's in
/home/user/project/lib
I'm having a weird experience: Say there's a file libmat.so which I want to gather all its dependencies. So I ran
/home/user/project/lib$ ldd libmat.so
...
libmwboost_system.so.1.65.1 => /home/user/project/lib/libmwboost_system.so.1.65.1
libmwboost_filesystem.so.1.65.1 => /home/user/project/lib/libmwboost_filesystem.so.1.65.1
...
So we see that ldd recognized the libmwboost_system.so.1.65.1 file in the current directory.
Turns out that libmwboost_filesystem.so.1.65.1.so also depends on libmwboost_system.so.1.65.1,
But when I run:
/home/user/project/lib$ ldd libmwboost_filesystem.so.1.65.1
...
libmwboost_system.so.1.65.1 => not found
...
How come ldd can find it when I run in on libmat.so and can't when I run it on libmwboost_filesystem.so.1.65.1 ?
I would be glad if someone could provide an explanation in the context of the linking process. As far as I know, when you link a file against a library, you use the following flags:
~$ gcc my_program.c -Lpath/to/solib/for/static/linker -lnameoflib -wl,-rpath=path/to/solib/for/dynamic/linker
This -Wl,-rpath flag embeds in the executable the path of the library that the dynamic linker will search for at run time. In the case of a shared library that depends on other libraries - does it work the same?
Ok so that's how it works:
When linking a binary - whether it's an executable or another shared library - against a shared library, the static linker embeds in the binary the names of the libraries we linked against. Those libraries will be loaded by the dynmaic linker - ld.so - once the program is run by the user.
Now the question is where the dynamic linker will search for these libraries at runtime. Briefly, according to the man page of ld.so, when the dynamic linker first inspects the binary to resolve its dependencies, it goes through the strings of the dependencies. If a string contains a slash "/" then the string is interpreted as a path. This can happen if the library name was specified with a slash at link time, like this:
~$ gcc prog.c ../path/to/library.so
In this case, the string embeded in the executable will be: ../path/to/library.so and the dynamic linker will search it relatively to the location of the binary.
If not, it will search for the library in a list of locations (the whole list is specified in the man page). The first location is the directories specified in the DT_RUNPATH section attribute of the binary. This can be set at link time using the -Wl,-rpath flag:
~$ gcc prog.c -Lpath/of/lib/ -lmylib -Wl,-rpath=path/of/lib
In this case, the path/of/lib will be searched for the library by the dynamic library.
You can inspect this attribute using readelf:
~$ readelf -d binary | grep RUNPATH
In my case, the libmat.so library contained a RUNPATH attribute set to $ORIGIN, meaning that libraries will be searched in the same location of the binary library. Whereas, the libmwboost_filesystem.so.1.65.1 didn't have this attribute set, that is why ldd didn't find the library.
ldd is just using ld.so to try and load the libraries, and shows where it found them, according to the search path specified in the ld.so man page.

What does the GNU ld --undefined option do?

Can somebody explain what the GNU ld option --undefined does?
Working on a LiteOS project. The app is linked with many -u options. For example -utask_shellcmd.
The GNU linker manual for --undefined=symbol simply says:
Force symbol to be entered in the output file as an undefined symbol. Doing this may, for example, trigger linking of additional modules from standard libraries.
So the symbol will be included in the output file as an undefined. What if the symbol is already defined in one of the linked obj files? If it is really undefined, when the linking of additional modules will happen and how does that happen?
The -u option is only relevant when archive (.a) libraries are involved (maybe also .so libraries with --as-needed in effect).
Unlike individual object files (.o) on the linking command line, which are all linked in the order in which they appear, object files from an archive library are only linked when they satisfy one or more undefined symbol references at the point they appear in the link command line order. Once once .o file from the archive is pulled into the link, the process is repeated recursively, so that if it introduces more undefined symbol references, other object files from the same (or later) archives will be pulled in to satisfy them.
Using -u allows you to cause a particular symbol (and, indirectly, all dependencies of the object file it was defined in) to be pulled into the link. Of course you could just put all .o files on the command line directly, without using any archive libraries, but by using libraries you can avoid linking unused object files (this is especially useful if large parts of the code may be unused depending on build-time-configurable settings in other files!) while getting the ones you need.

Re-export Shared Library Symbols from Other Library (OS X / POSIX)

My question is fairly OS X on x86-64 specific but a universal solution that works on other POSIX OSes is even more appreciated.
Given a list of symbol names of some shared library (called original library in the following) and I want my shared library to re-export these symbols. Re-export as in if someone tries to resolve the symbol against my library I either provide my version of this symbol or (if my library doesn't have this symbol) forward to the original library's symbol.
I don't know the types of the symbols, I only know whether they are functions (type T in nm output) or other symbols (type S in nm output).
For functions, I already have a solution: For every function I want to re-export I generate an assembly stub that does dynamically resolve the symbol (using dlsym()) and then jumps into the resolved function with the very same environment (registers rdi, rsi, rdx, rcx, r8, r9, stack pointer, ...). I'm basically generating universal proxy functions. Using some macro trickery that can be generated fairly easy without writing code for each and every symbol.
For non-function symbols the problem seems to be harder because I cannot generate this universal proxy function, because the resolving party does never call a function.
Using a constructor function static void init(void) __attribute__((constructor)); I can execute code whenever someone loads my library, that would be a good point to resolve and re-export all non-function symbols if that's possible.
In other words, I'd like to write the symbol table of my library to point to the respective symbols of another shared library. Doing the rewriting at compile or run time is okay (run time preferred). Or put yet another way, the behaviour of DYLD_INSERT_LIBRARIES (LD_PRELOAD) is exactly what I need but I don't want to insert a new library, I want to replace one (in the file system). EDIT: The reason I don't want/can't use DYLD_INSERT_LIBRARIES or any other environment variable of the DYLD_* family is that they are ignored for code signed, restricted, ... binaries.
I'm aware of the -reexport-l, -reexport_library and -reexported_symbols_list linker flags but I could not get them to work, especially when my library is a "replacement" for frameworks that are part of umbrella frameworks (example: /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/SearchKit) because ld forbids to link directly against parts of umbrella frameworks.
EDIT: Because I explained it somewhat ambiguously: I can't change the way the actual program is linked. The goal is to produce a shared library that is a replacement for the original library. (Apparently called filter library.)
Found it out now (OS X specific): clang -o replacement-lib.dylib ... -Xlinker -reexport_library PATH_TO_ORIGINAL_LIB does the trick. PATH_TO_ORIGINAL_LIB could for example be /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit.
If PATH_TO_ORIGINAL_LIB is a library that is part of an umbrella framework (as in the example above), then replace PATH_TO_ORIGINAL_LIB by the path of some other lib (I created a lib empty.dylib for that) and as a second step do
install_name_tool -change /usr/local/lib/empty.dylib PATH_TO_ORIGINAL_LIB replacement-lib.dylib
To see if the actual reexporting worked use:
otool -l replacement-lib.dylib | grep -A2 LC_REEXPORT_DYLIB
The output should look like
cmd LC_REEXPORT_DYLIB
cmdsize XX
name empty.dylib (offset YY)
After launching the install_name_tool it could be
cmd LC_REEXPORT_DYLIB
cmdsize XX
name /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit (offset YY)
You could link against both libraries and use the link order to make sure to link against the right symbols. This works on both OS X and Linux:
cc -o executable -lmylib -loriglib
Where origlib is the original library and mylib contains symbols that are supposed to overwrite symbols in origlib. Then the executable will be linked against your symbols from mylib first and all unresolved symbols will be linked against origlib.
This works in the same way when linking against OS X frameworks. Just link against your library that replaces symbols first and against the framework after.
cc -o executable -lmylib -framework SomeFramework
Edit: If you just want to replace symbols at runtime then you can use LD_PRELOAD in the same way:
cc -o executable -framework SomeFramework
LD_PRELOAD=libmylib.dylib ./executable

Linking against a shared library on AIX

I'm trying to link against a shared library (apr) on AIX 5.3 using gcc/libtool.
The output from the compiler is as follows (with some irrelevant flags removed for the sake of simplicity):
libtool: link: gcc -o test test.o -L/opt/freeware/lib -lapr-1 -lpthread -Wl,-blibpath:/opt/freeware lib:/usr/lib:/lib
Then I checked what shared libs the resulting binary uses:
$ ldd test
test needs:
/usr/lib/libc.a(shr.o)
/usr/lib/libpthread.a(shr_xpg5.o)
/unix
/usr/lib/libcrypt.a(shr.o)
/usr/lib/libpthreads.a(shr_comm.o)
Notice that 'libapr-1' is missing here, though the symbols are there in the binary (verified with nm), which suggests that it is linked in statically.
This wouldn't be such a big problem for simple programs. Unfortunately my code in question uses dynamically loadable modules. The main program calls apr_initialize which sets a static variable 'apr_pools_initialized' inside the library. The loadable modules then try to use apr_pool_create which first check whether the initialization has been performed. Since they have their own statically linked apr, the static variable 'apr_pools_initialized' is not at the same memory location what the main program initialized. This makes the statically linked binary non-functional.
The apr library is installed using a precompiled binary rpm (apr and apr-devel). The relevant library files are there:
# rpm -ql apr|grep \\.so$
/opt/freeware/lib/libapr-1.so
/opt/freeware/lib64/libapr-1.so
/usr/lib/libapr-1.so
# rpm -ql apr-devel|grep \\.a$
/opt/freeware/lib/libapr-1.a
/opt/freeware/lib64/libapr-1.a
/usr/lib/libapr-1.a
/usr/lib64/libapr-1.a
/usr/lib64/libapr-1.so
I tried to remove the '.a' files hoping that the linker would have no choice but to use the '.so' and link it dynamically, unfortunately AIX is different and this does not work.
Regarding this topic I have found this answer and another libtool question which give some insight.
The question is: How can I link this to my binary dynamically?
Actually the links referenced contained the solution to this problem, which is:
-Wl,-brtl
Adding these LDFLAGS solved the linking problem.

Statically linking libclang in C code

I'm trying to write a simple syntax checker for C code using the frontend available in libclang. Due to deployment concerns, I need to be able to statically link all the libraries in libclang, and not pass around the .so file that has all the libraries.
I'm building clang/llvm from source, and in llvm/Release+Asserts/lib I have a bunch of .a files that I think I should be able to use, but it never seems to work (the linker spews out thousands of errors about missing symbols). However, when I compile it using the libclang.so also present in that directory as follows:
clang main.c -o bin/dlc -I../llvm/tools/clang/include -L../llvm/Release+Asserts/lib/ -lclang
Everything seems to work well.
What is the minimum set of .a files I need to include to make this work? I've tried including absolutely all of the .a files in the build output directory, with them provided to clang/gcc in different orders, without any success. I only need the functions mentioned in libclang's Index.h, but there don't seem to be any resources or documentation on what the various libclang*.a files are for. It would be very helpful to know which files libclang.so pulls in.
The following is supposed to work, as long the whole project has all static libraries (I counted 116 in my Release/lib directory).
clang main.c -o bin/dlc -I../llvm/tools/clang/include ../llvm/Release/lib/*.a
[edit: clang main.c -o bin/dlc -I../llvm/tools/clang/include ../llvm/Release/lib/libclang.a ../llvm/Release/lib/*.a]
Note that the output binary is not static, so you don't need any -static flag for gcc or ld, if you're using this syntax.
If that doesn't work you might need to list the libraries in order: if some library requires a function available in another library, then it may be necessary to list it first in the command line. See comments about link order at:
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Link-Options.html#Link-Options

Resources