Instrumenting a C library - c

I have a binary library and a binary executable using that library, both written in C. I know the C API provided by the library, but neither the source of the library or the executable. I would like to understand how the executable uses the library (compare my previous question How to know which functions of a library get called by a program).
The proposed solutions did not give satisfactory results. A possibility not mentioned seems to be to implement a wrapper library that imitates the known interface of the binary library I am interested in. My idea is to forward all of the calls to the wrapper to the binary library. This should allow me to log all the calls and passed parameters, in other words to instrument the library.
I succeeded in implementing the wrapper library on Linux as a dynamic link library (*.so), together with my own sample application connecting to the wrapper. The wrapper, in turn, uses the original binary library. Both libraries are used with dlopen and dlsym to access the API. However, I am facing the following practical problem: I do not manage to link the original binary executable to my wrapper library. That is related to the fact that the executable expects the library under a certain name. However, if I name my rapper library that way, it conflicts with the original library. Surprisingly (to me) simply renaming the .so-file of that one and linking the wrapper library against it does not work (The result stops without error message when the wrapper library calls dlopen and I do not get more information from the debugger than that it seems to happen in an malloc).
I tried a number of things like using symbolic links to move one of the libraries out of the search path of the run-time linker, to add paths to the LD_LIBRARY_PATH environment variable, different relative locations of the .so-files (and corresponding paths for dlopen), as well as different compiler options, so far without success.
To summarize, I would like
(executable)_orig->(lib.so)->(lib.so)_orig
where (executable)_orig and (lib.so)_orig (both binaries that I cannot influence) are such that
(executable)_orig->(lib.so)_orig
works. I have the sources of (lib.so) and can modify it as I wish. Also, I can modify the Linux host system as I like. The task of (lib.so) is to tell me how (executable)_orig and (lib.so)_orig interact.
I also have
(executable)->(wrapper.so)->(lib.so)_orig
working, which seems to indicate that the issue is related to the naming and loading conventions for the libraries.
This is a separate new question because it deals with the specific practical issue sketched above. Beyond that, some background info on why renaming the file corresponding to (lib.so)_orig to circumvent the issue may fail could also prove useful.

Related

How do dynamic linkers/loaders resolve symbols?

I'm a CS student, and I'm doing a project on shared libraries and dynamic linking/loading. One of the questions I have to answer is how symbols are resolved with dynamic linking/loading. I've scoured the internet and haven't been able to find anything conclusive. I understand different linkers may resolve symbols differently across different operating systems. I'm just looking for a general, windows-based answer; how are symbols resolved in dynamic linking?
Thank You!
Well, let's stick on Windows. I'll answer in a few words and then vote for moving this question to CS site instead of main SO.
First, dynamic linking can be at program start (prelinked variant) and when program code explicitly requests some library load. While the same DLLs are used there, details differ.
Prelinked variant will work with so called import library that is static but contains special thunks (AKA trampolines) that are replaced with jumps to real code when dynamic loader attaches a real library (DLL file). For linkers that aren't aware of dynamic loading, import library is enough to still provide dynamic linking - that how it appeared in DOS/Windows. The calling code could be ignorant of details if the code is provided by static or dynamic library.
On-the-fly loading is using methods like LoadLibrary to load the library (and activate it) and GetProcAddress to get pointer of real function implementation. In that case, application (or another library) is aware of details of this mechanism.
I hope this brief is enough to provide you with enough words for further delving.

Getting known library paths from ldconfig for use with dlopen

I have a program written in C that uses dlopen for loading plug-in modules. When the library is dynamically loaded, it runs constructor code which register pointer to structure with function implementations with the main application by use of exported function. I want to use absolute path for specifying the file to dlopen.
Then I have other part of the program with takes file, determine if it is ELF, then looks into the ELF header for specific ELF section, read this section and extract from it pertinent information. This way it filters only shared libraries which I have previously tagged as a plug-in module.
However, I am solving a problem how to discover them on the fly (in portable Linux way, i.e. it will run on Debian and on Fedora too and so on) from the main program. I have been thinking about using ldconfig for this. (As the modules will be installed by way of distro packaging system, APT for example.) Is there any way how to programmatically get the string list of known libraries from C program other than directly reading the /etc/ld.co.cache file? I was thinking that maybe there is some header library which will give char** when I ask.
Or, maybe is there any better solution to my problem?
(I am proponent of using standard system components that programming one-off solutions which will need support in the future.)

Hiding a library within a library

Here's the situation. I have an old legacy library that is broken in many places, but has a lot of important code built in (we do not have the source, just the lib + headers). The functions exposed by this library have to be handled in a "special" way, some post and pre-processing or things go bad. What I'm thinking is to create another library that uses this old library, and exposes a new set of functions that are "safe".
I quickly tried creating this new library, and linked that into the main program. However, it still links to the symbols in the old library that are exposed through the new library.
One thing would obviously be to ask people not to use these functions, but if I could hide them through some way, only exposing the safe functions, that would be even better.
Is it possible? Alternatives?
(it's running on an ARM microcontroller. the file format is ELF, and the OS is an RTOS from Keil, using their compiler)
[update]
Here's what i ended up doing: I created dummy functions within the new library that use the same prototypes as the ones in the old. Linked the new library into the main program, and if the other developers try to use the "bad" functions from the old library it will break the build with a "Symbol abcd multiply defined (by old_lib.o and new_lib.o)." Good enough for government work...
[update2]
I actually found out that i can manually hide components of a library when linking them in through the IDE =P, much better solution. sorry for taking up space here.
If you're using the GNU binutils, objcopy can prefix all symbols with a string of your choice. Just use objcopy --prefix-symbols=brokenlib_ old.so new.so (be careful: omitting new.so will cause old.so to be overwritten!)
Now you use brokenlib_foo() to call the original version of foo().
If you use libtool to compile and link the library instead of ld, you can provide -export-symbols to control the output symbols, but this will only work if your old library can be statically linked. If it is dynamically linked (.so, .dylib, or .dll), this will not be possible.

Proxy shared library (sharedlib, shlib, so) for ELF?

On Windows, it's more or less common to create "proxy DLLs" which take place of the original DLL and forward calls to it (after any additional actions as needed). You can read about it here and here for example.
However, shlib munging culture under Linux is quite different. It starts with the fact that LD_PRELOAD is the builtin feature with ld.so under Linux, which simply injects separate shlib into process and uses any symbols it defines as override. And that "injection" technique seems to define whole direction of thought - here's a typical ELF hacking tool or this question, where gentleman seems to have the same usecase as me, but starts with asking how he can patch existing binaries.
No, thanks. I don't want to inject into or modify something which is nor mine. All I want to do is to make a standalone proxy shlib which will call out to the original. Ideally, there would be a tool which can be fed with the original .so and create a C source code which would just redirect to original's functions, while letting me easily override anything I want. So, where's such tool? ;-) Thanks.
Using LD_PRELOAD doesn't really involve modifying something which isn't yours, and the injection isn't all that different from normal dynamic library loading. The “typical ELF hacking tool” from the ERESI project is unrelated to LD_PRELOAD. You should not be afraid of it. A good introduction to writing LD_PRELOAD-able “proxies” is here.
That being said, if you want to create a system-wide proxy for some library, you might argue that globally setting LD_PRELOAD (and thus loading your proxy into every binary that ever runs on your system) is undesirable. It is commonly used to override functions from glibc by tools such as libeatmydata or socksify, but if you're overriding a function in a library that is bigger and/or less widespread than glibc, it makes sense to try to find another approach, to really create a proxy for just that one library.
One such approach is to use patchelf --replace-needed or --add-needed to hardcode the full pathname of the original library and then make sure the proxy library is found first by setting LD_LIBRARY_PATH¹. So, the complete procedure is:
create an LD_PRELOAD-able library that overrides some functions of the original one (test that it works using only LD_PRELOAD before proceeding further!)
compile and link this library with the original library so that ldd libwrapper-foo.so includes something like:
libfoo.so.0 => /usr/lib/x86_64-linux-gnu/libfoo.so.0 (0x0000deadbeef0000)
hardcode the full path using patchelf:
patchelf --replace-needed libfoo.so.0 /usr/lib/x86_64-linux-gnu/libfoo.so.0 libwrapper-foo.so
symlink libwrapper-foo.so to libfoo.so.0
now LD_LIBRARY_PATH=. ldd $(which program-that-uses-libfoo) should include these lines:
libfoo.so.0 => ./libfoo.so.0 (0x0000dead56780000)
/usr/lib/x86_64-linux-gnu/libfoo.so.0 (0x0000dead1234000000)
set LD_LIBRARY_PATH to full path to the wrapper library in your .bashrc or somewhere
A real-life example of such proxy libary is my wrapper for libpango that enables subpixel positioning for all applications.
¹) It might also be possible to put this proxy library into /usr/local/lib, but ldconfig (the tool that updates shared libraries cache) refuses to use libraries with hardcoded absolute paths.
apitrace is a tool which covers detailed tracing of graphic libs (OpenGL, DirectX) calls for a number of platform. It's probably too detailed and complex for generic solution, but at least provides some reference and affinity.

Dynamic modules with DLLs on Windows

I'm writing an application in C that can be extended at runtime by means of modules / shared objects / DLLs. Those modules may use the API of the existing program but may also provide new functions for use in later loaded modules, so there is the possibility for modules to have dependencies on each other.
My current approach under Linux is to have every module define a depends() function that returns a list of other module names it depends on. That way, I can compile and link every module for itself, load a module with dlopen() and RTLD_LAZY, resolve its dependencies first and then fully load it with RTLD_GLOBAL. This works just fine and does exactly what I want. It also allows me to replace a module with a different version without recompiling all other modules depending on it.
The actual problem arises when porting this to Windows. First, I haven't found any way to link a DLL without already providing it with the export symbol tables of all its dependencies. Is there one I've overlooked?
Second, LoadLibraryEx from the Windows API doesn't seem to be able to perform any lazy loading because instead of letting me handle dependencies, it goes ahead and loads all referenced DLLs itself before it even returns. Since I'd like to perform version checking as well before actually loading modules in the future, this it not at all what I want. Is there any way to circumvent this behaviour?
The third odd thing is that I cannot replace a DLL without recompiling all other modules depending on it. It actually does work sometimes, but usually wild things start to happen or the program segfaults.
Is it even possible to write a modular application like that on Windows? Any suggestions or different approaches are highly appreciated!
Update: Just to provide some clarification on how my modules use each others functions on Linux (which I would like to have on Windows as well): Every module just returns the name of another module it would like to call functions from in the described depends() function and includes its header, then calls the used functions directly in the code without any wrapping. This works because Linux does not require you to have all symbols resolved at link time for shared objects.
You can export all functions manually (using __declspec(dllexport)) and load them using GetProcAddress. In this case you need to know the signature of each function, and you're limited to C functions only, but this is going to work. If you compile both modules, your C functions can also return C++ classes, more on this later. Using GetProcAddress & LoadLibrary makes modules totally independent. Basically, you do the linking manually, but as I understand, this is what you do on Linux, right?
LoadLibary only loads the libraries a library is dependent on, so make sure they don't load depend on each other. Either they are really independent, or they aren't, if done properly, changing one library will not force a recompile of the other (as you don't link them together).
A good idea is to use something like COM, so make each of your libraries return an interface, instead of individual functions. That way, you can simply load a whole DLL and also link them easily together (passing a DLL -> passing an object). Look up XPCOM and COM, it's actually very easy to do.

Resources