GNU Linker map: trace origin of symbol from dynamic library - c

I am trying to list all real file dependencies of an ELF executable in order to improve granularity of incremental building/testing.
When I link an executable against a set of libraries, the symbols from the static ones appear at the top of the linker map, which is good. I would also like to know when the linker includes a symbol from a shared library, along with the path to the defining file.
For example, if I have an executable looking like:
#include "ext_src.h"
#include "ext_lib.h"
#include "int_src.h"
int main(){
ext_src();
ext_lib();
int_src();
}
Where each of the 3 functions comes from a different library, the compiler command being:
/usr/bin/cc -O3 -DNDEBUG -Wl,-Map=exec2.map exec2.c.o -o exec2 -Wl,-rpath,backend_libs: backend_libs/libsub.so backend_libs/libEXT_dependence1.so extern/lib/an_extern_lib/libext_lib_normal.a
I can only have the information I seek (which is that ext_lib() comes from ext_lib.c.o) for the static library on top of the linker map:
Archive member included to satisfy reference by file (symbol)
extern/lib/an_extern_lib/libext_lib_normal.a(ext_lib.c.o)
CMakeFiles/exec2.dir/entry/exec2.c.o (ext_lib)
/usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
/usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/Scrt1.o (__libc_csu_init)
The information does not seem to be anywhere in the linker map. Indeed, I can't find in it the module where I know ext_src() is defined.
Does someone have an idea how to get the file in which ext_src is defined? It needs to work in a way that lists only the symbols that my executable actually uses.
Edit: I also forgot to mention that I control the compilation of the libraries I link to. Thus I am open to a solution involving compiling these libraries with weird flags, debug sections...

Related

NASM: How to resolve these unresolved externals?

I'm trying to get started learning basic assembly with Paul A. Carter's book "PC Assembly Language." However I'm unable to run the first example Carter provides, so I'm kind of stuck until I figure this out.
I assembled the example "first.asm" without any problem, but I can't figure out how to link these files: first.obj, driver.c, asm_io.obj into an executable. In the comment section of first.asm Carter gives these instructions for creating an executable (I'm using Windows 10, VS community 2015 developer command prompt):
; Using MS C/C++
; nasm -f win32 first.asm
; cl first.obj driver.c asm_io.obj
I'm doing exactly that, but I'm getting a fatal error: 2 unresolved externals, _printf and _scanf. I have every necessary file that I can think of in the same directory, and I'm compiling in that directory.
Driver.c calls the function defined in and it uses a header file called "CDECL.h"; I have this file in my directory, but I don't understand much about this header file. I wonder if the problem is here. I haven't altered it or anything. I assembled asm_io.asm according to Dr. Carter's instructions.
Not too far into asm_io.asm I see this:
extern _scanf, _printf, _getchar, _putchar, _fputs
So here are the unresolved externals. Shouldn't they be defined in stdio.h? Driver.c includes stdio.h, so shouldn't the linker be able to resolve these symbols by looking at stdio.h? What might I be missing?
ps. I'm new to programming in general, and this is my first stack overflow question. I'm open to any and all criticism/feedback. I'll provide more information if you need it, I just didn't want to post a massive wall of text and code if not necessary.
Welcome to SO. You need to understand:-
The difference between a header file, e.g.
foo.h // C or maybe C++ header file
and a library, e.g.
foo.lib foo.dll // Windows
libfoo.a, libfoo.so // Unix/Linux
that implements the calling interface that is (merely) described in a header file.
The difference between compiling or assembling a source file, e.g.
bar.c // C source file
bar.asm // Assembly source, Windows
bar.s // Assembly source, Unix/Linux
to make an object file. e.g.
bar.obj // Windows
bar.o // Unix/Linux
and linking object files and libraries together to make a complete executable.
Linking can succeed only if the linker is supplied with (or knows by default)
the names and locations of object files and/or libraries that provide
implementations of all the functions that are called in the program - including
functions whose calling interfaces are described in header files. Otherwise
unresolved symbol errors ensue.
Research these points and you'll quickly get yourself unstuck. See this
pretty good introductory tutorial, which although it is about getting
started with the GNU Compiler Collection rather
than with assembly language programming, will clarify the principles
and distinctions you need to grasp.
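To make the header/library distinction concrete, here is a minimal single-file sketch. The names foo_add and add_three are hypothetical; in a real project the declaration would live in foo.h and the definition in a separately compiled file that ends up in foo.lib or libfoo.a.

```c
/* What a header such as foo.h contributes: a declaration only.
 * It tells the compiler how to call foo_add but provides no code. */
int foo_add(int a, int b);

/* A caller compiles fine with just the declaration in scope ... */
int add_three(int a, int b, int c)
{
    return foo_add(foo_add(a, b), c);
}

/* ... but linking succeeds only if some object file or library
 * supplies the definition. It is placed in this file purely to keep
 * the sketch self-contained; remove it and the linker reports an
 * unresolved reference to foo_add. */
int foo_add(int a, int b)
{
    return a + b;
}
```

The same reasoning applies to _printf and _scanf in the question: stdio.h only declares them, and the definitions have to come from a C runtime library that the linker is given.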

Why including an h file with external vars and funcs results in undefined references

What if I want these externals to be resolved in runtime with dlopen?
I'm trying to understand why including an h file that declares a shared library's external vars and funcs in a C executable program results in undefined/unresolved references when linking.
Why do I have to add the "-lsomelib" flag to the gcc link step if I only want these symbols to be resolved at runtime?
What does the link-time linker need these definitions resolved for? Why can't it wait for the resolution at runtime when using dlopen?
Can anyone help me understand this?
Here something that may help understanding:
there are 3 types of linking:
static linking (.a): the linker copies the needed code from the library into your executable at link time, so that you can move the executable to other computers with the same architecture and run it.
dynamic linking (.so): the linker resolves the symbols at link time (as part of the build), but does not include the library's code in your executable. When the program is started, the library is loaded, and if the library is not found the program stops. You need the library on the computer that runs the program.
dynamic loading: you are in charge of loading the library and looking up its functions at runtime, using dlopen, dlsym, etc. This is used especially for plugins.
see also: http://www.ibm.com/developerworks/library/l-dynamic-libraries/ and
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?
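As an illustration of the third case (dynamic loading), here is a minimal sketch that calls cos() without linking against the math library. The function name cos_via_dlopen is hypothetical, and the library file name "libm.so.6" is an assumption that holds for glibc on Linux; other systems name the library differently.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Calls cos() from the math library without linking against it.
 * Assumption: the library is installed as "libm.so.6" (glibc on Linux).
 * Returns 0 on success, -1 if the library or symbol cannot be found. */
int cos_via_dlopen(double x, double *result)
{
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (handle == NULL)
        return -1;

    /* The name "cos" is only a string here, so the link-time linker
     * never sees an undefined reference to it. */
    double (*cosine)(double);
    *(void **)(&cosine) = dlsym(handle, "cos");
    if (cosine == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = cosine(x);
    dlclose(handle);
    return 0;
}
```

Because the code never names cos as a C identifier, no -lm is needed. Writing cos(x) directly after including math.h is what creates an undefined reference that the link-time linker must resolve. With glibc versions before 2.34 the program itself still has to be linked with -ldl to get dlopen and dlsym.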
A header file (e.g. an *.h file referenced by some #include directive) is relevant to the C or C++ compiler. The linker does not know about source files (which are input to the compiler), but only about object files produced by the assembler (in executable and linkable format, i.e. ELF)
A library file (given by -lfoo) is relevant only at link time. The compiler does not know about libraries.
The dynamic linker needs to know which libraries should be linked. At runtime it does symbol resolution (against a fixed & known set of shared libraries). The dynamic linker won't try linking all the possible shared libraries present on your system (because it has too many shared objects, or because it may have several conflicting versions of a given library), it will link only a fixed set of libraries provided inside the executable. Use objdump(1) & readelf(1) & nm(1) to explore ELF object files and executables, and ldd(1) to understand shared libraries dependencies.
Notice that the g++ program is used both for compilation and for linking. (actually it is a driver program: it starts some cc1plus -the C++ compiler proper- to compile a C++ code to an assembly file, some as -the assembler- to assemble an assembly file into an object file, and some ld -the linker- to link object files and libraries).
Run g++ as g++ -v to understand what it is doing, i.e. what program[s] is it running.
If you don't link the required libraries, at link time, some references remain unresolved (because some object files contain an external reference and relocation).
(things are slightly more complex with link-time optimization, which we could ignore)
Read also Program Library HowTo, Levine's book linkers and loaders, and Drepper's paper: how to write shared libraries
If you use dynamic loading at runtime (by using dlopen(3) on some plugin), you need to know the type and signature of relevant functions (returned by dlsym(3)). A program loading plugins always has its specific plugin conventions. For examples, look at the conventions used for geany plugins & GCC plugins (see also these slides about GCC plugins).
In practice, if you are developing your application accepting some plugins, you will define a set of names, their expected type, signature, and role. e.g.
typedef void plugin_start_function_t (const char*);
typedef int plugin_more_function_t (int, double);
then declare e.g. some variables (or fields in a data structure) to point to them with a naming convention
plugin_start_function_t* plustart; // app_plugin_start in plugins
#define NAME_plustart "app_plugin_start"
plugin_more_function_t* plumore; // app_plugin_more in plugins
#define NAME_plumore "app_plugin_more"
Then load the plugin and set these pointers, e.g.
void* plugdlh = dlopen(plugin_path, RTLD_NOW);
if (!plugdlh) {
  fprintf(stderr, "failed to load %s: %s\n", plugin_path, dlerror());
  exit(EXIT_FAILURE);
}
then retrieve the symbols:
plustart = dlsym(plugdlh, NAME_plustart);
if (!plustart) {
  fprintf(stderr, "failed to find %s in %s: %s\n",
          NAME_plustart, plugin_path, dlerror());
  exit(EXIT_FAILURE);
}
plumore = dlsym(plugdlh, NAME_plumore);
if (!plumore) {
  fprintf(stderr, "failed to find %s in %s: %s\n",
          NAME_plumore, plugin_path, dlerror());
  exit(EXIT_FAILURE);
}
Then use appropriately the plustart and plumore function pointers.
In your plugin, you need to code
extern "C" void app_plugin_start(const char*);
extern "C" int app_plugin_more (int, double);
and give a definition to both of them. The plugin should be compiled as position independent code, e.g. with
g++ -Wall -fPIC -O -g -c pluginsrc1.c -o pluginsrc1.pic.o
g++ -Wall -fPIC -O -g -c pluginsrc2.c -o pluginsrc2.pic.o
and linked with
g++ -shared pluginsrc1.pic.o pluginsrc2.pic.o -o yourplugin.so
You may want to link extra shared libraries to your plugin.
You generally should link your main program (the one loading plugins) with the -rdynamic link flag (because you want some symbols of your main program to be visible to your plugins).
Read also the C++ dlopen mini howto
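For a plugin written in plain C rather than C++, the plugin side of the convention above could look like the sketch below. The two function names and signatures are the ones from the answer; the bodies are hypothetical placeholders. In C the extern "C" wrapper is not needed, since C names are not mangled.

```c
#include <stdio.h>

/* Found by the application through dlsym(plugdlh, "app_plugin_start").
 * The body is illustrative: it only reports who started the plugin. */
void app_plugin_start(const char *name)
{
    fprintf(stderr, "plugin started by %s\n", name);
}

/* Found by the application through dlsym(plugdlh, "app_plugin_more").
 * The body is illustrative: it scales n by x and truncates to int. */
int app_plugin_more(int n, double x)
{
    return (int)(n * x);
}
```

Both functions must have external linkage (no static), otherwise dlsym cannot find them in the resulting shared object.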

Re-export Shared Library Symbols from Other Library (OS X / POSIX)

My question is fairly OS X on x86-64 specific but a universal solution that works on other POSIX OSes is even more appreciated.
Given a list of symbol names of some shared library (called the original library in the following), I want my shared library to re-export these symbols. Re-export as in: if someone tries to resolve the symbol against my library, I either provide my version of this symbol or (if my library doesn't have this symbol) forward to the original library's symbol.
I don't know the types of the symbols, I only know whether they are functions (type T in nm output) or other symbols (type S in nm output).
For functions, I already have a solution: for every function I want to re-export I generate an assembly stub that dynamically resolves the symbol (using dlsym()) and then jumps into the resolved function with the very same environment (registers rdi, rsi, rdx, rcx, r8, r9, stack pointer, ...). I'm basically generating universal proxy functions. Using some macro trickery, that can be generated fairly easily without writing code for each and every symbol.
For non-function symbols the problem seems to be harder because I cannot generate this universal proxy function: the resolving party never calls a function.
Using a constructor function static void init(void) __attribute__((constructor)); I can execute code whenever someone loads my library, that would be a good point to resolve and re-export all non-function symbols if that's possible.
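A minimal sketch of the "resolve at load time" half of this idea is shown below. It only looks up the address of a data symbol from a constructor; it does not make the symbol appear in the library's own symbol table, which is the hard part of the question. The symbol optind is an arbitrary example of a data symbol, get_original_optind is a hypothetical accessor, and dlopen(NULL, ...) stands in for opening the original library by its path.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Address of the original data symbol, filled in at load time. */
static void *original_optind;

static void init(void) __attribute__((constructor));

/* Runs when the library (or program) containing it is loaded. */
static void init(void)
{
    /* Assumption: a handle for the already loaded global scope is
     * enough here; a replacement library would instead dlopen the
     * original library by its path. */
    void *handle = dlopen(NULL, RTLD_NOW);
    if (handle != NULL)
        original_optind = dlsym(handle, "optind");
}

/* Accessor so callers can check whether the lookup succeeded. */
void *get_original_optind(void)
{
    return original_optind;
}
```

Clients that reference the data symbol directly never go through such an accessor, which is why this approach alone cannot forward non-function symbols.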
In other words, I'd like to write the symbol table of my library to point to the respective symbols of another shared library. Doing the rewriting at compile or run time is okay (run time preferred). Or put yet another way, the behaviour of DYLD_INSERT_LIBRARIES (LD_PRELOAD) is exactly what I need but I don't want to insert a new library, I want to replace one (in the file system). EDIT: The reason I don't want/can't use DYLD_INSERT_LIBRARIES or any other environment variable of the DYLD_* family is that they are ignored for code signed, restricted, ... binaries.
I'm aware of the -reexport-l, -reexport_library and -reexported_symbols_list linker flags but I could not get them to work, especially when my library is a "replacement" for frameworks that are part of umbrella frameworks (example: /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/SearchKit) because ld forbids to link directly against parts of umbrella frameworks.
EDIT: Because I explained it somewhat ambiguously: I can't change the way the actual program is linked. The goal is to produce a shared library that is a replacement for the original library. (Apparently called filter library.)
Found it out now (OS X specific): clang -o replacement-lib.dylib ... -Xlinker -reexport_library PATH_TO_ORIGINAL_LIB does the trick. PATH_TO_ORIGINAL_LIB could for example be /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit.
If PATH_TO_ORIGINAL_LIB is a library that is part of an umbrella framework (as in the example above), then replace PATH_TO_ORIGINAL_LIB by the path of some other lib (I created a lib empty.dylib for that) and as a second step do
install_name_tool -change /usr/local/lib/empty.dylib PATH_TO_ORIGINAL_LIB replacement-lib.dylib
To see if the actual reexporting worked use:
otool -l replacement-lib.dylib | grep -A2 LC_REEXPORT_DYLIB
The output should look like
cmd LC_REEXPORT_DYLIB
cmdsize XX
name empty.dylib (offset YY)
After launching the install_name_tool it could be
cmd LC_REEXPORT_DYLIB
cmdsize XX
name /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit (offset YY)
You could link against both libraries and use the link order to make sure to link against the right symbols. This works on both OS X and Linux:
cc -o executable -lmylib -loriglib
Where origlib is the original library and mylib contains symbols that are supposed to overwrite symbols in origlib. Then the executable will be linked against your symbols from mylib first and all unresolved symbols will be linked against origlib.
This works in the same way when linking against OS X frameworks. Just link against your library that replaces symbols first and against the framework after.
cc -o executable -lmylib -framework SomeFramework
Edit: If you just want to replace symbols at runtime then you can use LD_PRELOAD on Linux, or DYLD_INSERT_LIBRARIES on OS X, in the same way:
cc -o executable -framework SomeFramework
DYLD_INSERT_LIBRARIES=libmylib.dylib ./executable

Statically linking libclang in C code

I'm trying to write a simple syntax checker for C code using the frontend available in libclang. Due to deployment concerns, I need to be able to statically link all the libraries in libclang, and not pass around the .so file that has all the libraries.
I'm building clang/llvm from source, and in llvm/Release+Asserts/lib I have a bunch of .a files that I think I should be able to use, but it never seems to work (the linker spews out thousands of errors about missing symbols). However, when I compile it using the libclang.so also present in that directory as follows:
clang main.c -o bin/dlc -I../llvm/tools/clang/include -L../llvm/Release+Asserts/lib/ -lclang
Everything seems to work well.
What is the minimum set of .a files I need to include to make this work? I've tried including absolutely all of the .a files in the build output directory, with them provided to clang/gcc in different orders, without any success. I only need the functions mentioned in libclang's Index.h, but there don't seem to be any resources or documentation on what the various libclang*.a files are for. It would be very helpful to know which files libclang.so pulls in.
The following is supposed to work, as long as the whole project has all static libraries (I counted 116 in my Release/lib directory).
clang main.c -o bin/dlc -I../llvm/tools/clang/include ../llvm/Release/lib/*.a
[edit: clang main.c -o bin/dlc -I../llvm/tools/clang/include ../llvm/Release/lib/libclang.a ../llvm/Release/lib/*.a]
Note that the output binary is not static, so you don't need any -static flag for gcc or ld, if you're using this syntax.
If that doesn't work you might need to list the libraries in order: if some library requires a function available in another library, then it may be necessary to list it first in the command line. See comments about link order at:
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Link-Options.html#Link-Options

how do I always include symbols from a static library?

Suppose I have a static library libx.a. How do I make some symbols (not all) from this library always be present in any binary I link with my library? The reason is that I need these symbols to be available via dlopen+dlsym. I'm aware of the --whole-archive linker switch, but it forces all object files from the library archive to be linked into the resulting binary, and that is not what I want...
Observations so far (CentOS 5.4, 32bit) (upd: this paragraph is wrong; I could not reproduce this behaviour)
ld main.o libx.a
will happily strip all non-referenced symbols, while
ld main.o -L. -lx
will link whole library in. I guess this depends on version of binutils used, however, and newer linkers will be able to cherry-pick individual objects from a static library.
Another question is how can I achieve the same effect under Windows?
Thanks in advance. Any hints will be greatly appreciated.
Imagine you have a project which consists of the following three C files in the same folder;
// ---- jam.h
int jam_badger(int);
// ---- jam.c
#include "jam.h"
int jam_badger(int a)
{
return a + 1;
}
// ---- main.c
#include "jam.h"
int main()
{
return jam_badger(2);
}
And you build it with a boost-build bjam file like this;
lib jam : jam.c <link>static ;
lib jam_badger : jam ;
exe demo : jam_badger main.c ;
You will get an error like this.
undefined reference to `jam_badger'
(I have used bjam here because the file is easier to read, but you could use anything you want)
Removing the 'static' produces a working binary, as does adding static to the other library, or just using the one library (rather than the silly wrapping of one inside the other).
The reason this happens is because ld is clever enough to only select the parts of the archive which are actually used, which in this case is none of them.
The solution is to surround the static archives with -Wl,--whole-archive and -Wl,--no-whole-archive, like so;
g++ -shared -o "libjam_candle_badger.so" -Wl,--whole-archive libjam_badger.a -Wl,--no-whole-archive
Not quite sure how to get boost-build to do this for you, but you get the idea.
First things first: ld main.o libx.a does not build a valid executable. In general, you should never use ld to link anything directly; always use the proper compiler driver (gcc in this case) instead.
Also, "ld main.o libx.a" and "ld main.o -L. -lx" should be exactly equivalent. I am very doubtful you actually got different results from these two commands.
Now to answer your question: if you want foo, bar and baz to be exported from your a.out, do this:
gcc -Wl,-u,foo,-u,bar,-u,baz main.o -L. -lx -rdynamic
Update:
your statement: "symbols I want to include are used by library internally only" doesn't make much sense: if the symbols are internal to the library, why do you want to export them? And if something else uses them (via dlsym), then they are not internal to the library -- they are part of the library public API.
You should clarify your question and explain what you really are trying to achieve. Providing sample code will not hurt either.
I would start by splitting off the symbols you always need into a separate library, retaining only the optional ones in libx.a.
Take the address of the symbol you need to include.
If gcc's optimiser eliminates it anyway, do something with this address; that should be enough.
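The address-taking suggestion can be sketched as follows. The functions foo and bar are stand-ins defined in the same file so that the sketch is self-contained; in the real case they would live in libx.a and only be declared here. The table name keep_alive is hypothetical.

```c
/* Stand-ins for functions that would normally be defined in libx.a. */
int foo(void) { return 1; }
int bar(void) { return 2; }

/* Storing the addresses in a table creates real references to foo
 * and bar, so the linker has to pull in the archive members that
 * define them. The table has external linkage and volatile elements
 * to discourage the optimiser from discarding it. */
int (*volatile keep_alive[])(void) = { foo, bar };
```

This only forces the symbols into the binary. Whether dlsym can then find them still depends on exporting them from the executable, for example with -rdynamic as in the answer above.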

Resources