Can LD_PRELOAD be used to load different versions of glibc?

Can LD_PRELOAD be used to load different versions of glibc? - c

Cast of characters
big-old-app is linked to an old version of glibc, say glibc-2.12. I cannot do anything to change this.
cute-new-addon.o is linked to a newer version, glibc-2.23. This glibc-2.23 is in a nonstandard path (because I don't have sudo powers).
The story
I want to use cute-new-addon.o inside big-old-app. I would normally write a script for big-old-app to execute, which then calls cute-new-addon.o to perform its tricks. From the command line, it would look like:
$ big-old-app script.txt
However, when I do that, big-old-app would complain that cute-new-addon.o cannot find glibc-2.23. That's understandable, because I have not specified any standard paths. What if I do:
$ LD_LIBRARY_PATH=/path/to/mylibs:$LD_LIBRARY_PATH big-old-app script.txt
It segfaults! :(
I think this is because big-old-app references a newer mylibc.so.6. When doing so, the implementations are no longer what big-old-app is used to, so it segfaults.
The question
Regarding script.txt, I don't think I have the ability to specify the newer mylibc.so.6 before invoking cute-new-addon.o. big-old-app and cute-new-addon.o are tightly intertwined that I have no way of knowing when either of them need their corresponding glibc.
And yes, cute-new-addon.o rpath is pointed to /path/to/mylibs and I can confirm via ldd that all the libraries it needs, it looks for in /path/to/mylibs.
Can I use LD_PRELOAD to load two different versions of glibc? And let big-old-app and cute-new-addon.o look for what they need as they please?

LD_PRELOAD cannot be used because the glibc dynamic linker (sometimes called ld.so or the program interpreter; the location on disk is platform-specific) is only compatible with libc.so.6 (and the rest of the libraries) from the same glibc build.
You can use an explicit loader invocation of the other glibc, along with library path settings that cause the loader to load the glibc objects from a separate directory, and not the system directories. The glibc wiki has an example how to do this.

Related

Port glibc 2.25 and test memory functions

I was investigating whether a few memory functions(memcpy, memset, memmove) in glibc-2.25 with various versions(sse4, ssse3, avx2, avx512) could have performance gain for our server programs in Linux(glibc 2.12).
My first attempt was to download a tar ball of glibc-2.25 and build/test following the instructions here https://sourceware.org/glibc/wiki/Testing/Builds. I manually commented out kernel version check and everything went well. Then a test program was linked with newly built glibc with the procedure listed in section "Compile against glibc build tree" of glibc wiki and 'ldd test' shows that it indeed depended on the expected libraries:
# $GLIBC is /data8/home/wentingli/temp/glibc/build
libm.so.6 => /data8/home/wentingli/temp/glibc/build/math/libm.so.6 (0x00007fe42f364000)
libc.so.6 => /data8/home/wentingli/temp/glibc/build/libc.so.6 (0x00007fe42efc4000)
/data8/home/wentingli/temp/glibc/build/elf/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fe42f787000)
libdl.so.2 => /data8/home/wentingli/temp/glibc/build/dlfcn/libdl.so.2 (0x00007fe42edc0000)
libpthread.so.0 => /data8/home/wentingli/temp/glibc/build/nptl/libpthread.so.0 (0x00007fe42eba2000)
I use gdb to verify which memset/memcpy was actually called but it always shows that __memset_sse2_unaligned_erms is used while I was expecting that some more advanced version of the function(avx2,avx512) could be in use.
My questions are:
Did glibc-2.25 select the most suitable version of memory functions automatically according to cpu/os/memory address? If not, am I missing any configuration during glibc build or something wrong with my setup?
Is there any other alternatives for porting memory functions from newer glibc?
Any help or suggestion would be appreciated.

On x86, glibc will automatically select an implementation which is most suitable for the CPU of the system, usually based on guidance from Intel. (Whether this is the best choice for your scenario might not be clear because the performance trade-offs for many of the vector instructions are extremely complex.) Only if you explicitly disable IFUNCs in the toolchain, this will not happen, but __memset_sse2_unaligned_erms isn't the default implementation, so this does not apply here. The ERMS feature is pretty recent, so this is not completely unreasonable.
Building a new glibc is probably the right approach to test these string functions. Theoretically, you could also use LD_PRELOAD to override the glibc-provided functions, but it is a bit cumbersome to build the string functions outside the glibc build system.
If you want to run a program against a patched glibc without installing the latter, you need to use the testrun.sh script in the glibc build directory (or a similar approach).

Dynamically change the running code by writing into the FILE?

I got to know of a way to print the source code of a running code in C using the __FILE__ macro. As such I can seek the location and use putchar() to alter the contents of the file.
Is it possible to dynamically change the running code using this method?

Is it possible to dynamically change the running code using this method ?
No, because once a program is compiled it no longer depends on the source file.
If you want learn how to alter the behavior of an process that is already running from within the process itself, you need to learn about assembly for the architecture you're using, the executable file format on your system, and the process API on your system, at the very least.

As most other answers are explaining, in practical terms, most C implementations are compilers. So the executable that is running has only an indirect (and delayed) relation with the source code, because the source code had to be processed by the compiler to produce that executable.
Remember that a programming language is (not a software but...) a specification, written in some report. Read n1570, draft specification of C11. Most implementations of C are command-line compilers (e.g. GCC & Clang/LLVM in the free software realm), even if you might find interpreters.
However, with some operating systems (notably POSIX ones, such as MacOSX and Linux), you could dynamically load some plugin. Or you could create, in some other way (such as JIT compilation libraries like libgccjit or LLVM or libjit or GNU lightning), a fresh function and dynamically get a pointer to it (and that is not stricto sensu conforming to the C standard, where a function pointer should point to some existing function of your program).
On Linux, you might generate (at runtime of your own program, linked with -rdynamic to have its names usable from plugins, and with -ldl library to get the dynamic loader) some C code in some temporary source file e.g. /tmp/gencode.c, run a compilation (using e.g. system(3) or popen(3)) of that emitted code as a /tmp/gencode.so plugin thru a command like e.g. gcc -O1 -g -Wall -fPIC -shared /tmp/gencode.c -o /tmp/gencode.so, then dynamically load that plugin using dlopen(3), find function pointers (from some conventional name) in that loaded plugin with dlsym(3), and call indirectly that function pointer. My manydl.c program shows that is possible for many hundred thousands of generated C files and loaded plugins. I'm using similar tricks in my GCC MELT. See also this and that. Notice that you don't really "self-modify" C code, you more broadly generate additional C code, compile it (as some plugin, etc...), and then load it -as an extension or plugin- then use it.
(for pragmatical reasons including ease of debugging, I don't recommend overwriting some existing C file, but just emitting new C code in some fresh temporary .c file -from some internal AST-like representation- that you would later feed to the compiler)
Is it possible to dynamically change the running code?
In general (at least on Linux and most POSIX systems), the machine code sits in a read-only code segment of the virtual address space so you cannot change or overwrite it; but you can use indirection thru function pointers (in your C code) to call newly loaded code (e.g. from dlopen-ed plugins).
However, you might also read about homoiconic languages, metaprogramming, multi-staged programming, and try to use Common Lisp (e.g. using its SBCL implementation, which compile to machine code at every REPL interaction and at every eval). I also recommend reading SICP (an excellent and freely available introduction to programming, with some chapters related to metaprogramming approaches)
PS. Dynamic loading of plugins is also possible in Windows -which I don't know- with LoadLibrary, but with a very different (and incompatible) model. Read Levine's linkers and loaders.

A computer doesn't understand the code as we do. It compiles or interprets it and loads into memory. Our modification of code is just changing the file. One needs to compile it and link it with other libraries and load it into memory.
ptrace() is a syscall used to inject code into a running program. You can probably look into that and achieve whatever you are trying to do.
Inject hello world in a running program. I have tried and tested this sometime before.

Is there a reliable way to know what libraries could be dlopen()ed in an elf binary?

Basically, I want to get a list of libraries a binary might load.
The unreliable way I came up with that seems to work (with possible false-positives):
comm -13 <(ldd elf_file | sed 's|\s*\([^ ]*\)\s.*|\1|'| sort -u) <(strings -a elf_file | egrep '^(|.*/)lib[^:/]*\.so(|\.[0-9]+)$' | sort -u)
This is not reliable. But it gives useful information, even if the binary was stripped.
Is there a reliable way to get this information without possible false-positives?
EDIT: More context.
Firefox is transitioning from using gstreamer to using ffmpeg.
I was wondering what versions of libavcodec.so will work.
libxul.so uses dlopen() for many optional features.
And the library names are hard-coded. So, the above command helps
in this case.
I also have a general interest in package management and binary dependencies.
I know you can get direct dependencies with readelf -d, dependencies of
dependencies with ldd. And I was wondering about optional dependencies, hence the question.

ldd tells you the libraries your binary has been linked against. These are not those that the program could open with dlopen.
The signature for dlopen is
void *dlopen(const char *filename, int flag);
So you could, still unreliably, run strings on the binary, but this could still fail if the library name is not a static string, but built or read from somewhere during program execution -- and this last situation means that the answer to your question is "no"... Not reliably. (The name of the library file could be read from the network, from a Unix socket, or even uncompressed on the fly, for example. Anything is possible! -- although I wouldn't recommend any of these ideas myself...)
edit: also, as John Bollinger mentioned, the library names could be read from a config file.
edit: you could also try substituting the dlopen system call with one of yours (this is done by the Boehm garbage collector with malloc, for example), so it would open the library, but also log its name somewhere. But if the program didn't open a specific library during execution, you still won't know about it.

(I am focusing on Linux; I guess that most of my answer fits for every POSIX systems; but on MacOSX dlopen wants .dylib dynamic library files, not .so shared objects)
A program could even emit some C code in some temporary file /tmp/foo1234.c, fork a compilation of that /tmp/foo1234.c into a shared library /tmp/foo1234.so by some gcc -O -shared -fPIC /tmp/foo1234.c -o /tmp/foo1234.so command -generated and executed at runtime of your program-, perhaps remove the /tmp/foo1234.c file -since it is not needed any more-, and dlopen that /tmp/foo1234.so (and perhaps even remove /tmp/foo1234.so after dlopen), all that in the same process. My GCC MELT plugin for gcc does exactly this, and so does Bigloo, and the GCCJIT library is doing something close.
So in general, your quest is impossible and even has no sense.
Is there a reliable way to get this information without possible false-positives?
No, there is no reliable way to get such information without false positives (you could prove that equivalent to the halting problem, or to some other undecidable problem). See also Rice's theorem.
In practice, most dlopen happens on plugins provided by some configuration. There might not be exactly named as such in a configuration file (e.g. some Foo programs might have a convention like a plugin named bar in some foo.conf configuration file is provided by foo-bar.so plugin).
However, you might find some heuristic approximation. Most programs doing some dlopen have some plugin convention requesting some particular symbol names in the plugin. You could search for shared objects defining these names. Of course you'll get false positives.
For example, the zsh shell accepts plugins called zsh modules. the example module shows that enables_,
boot_, features_ etc... functions are expected in zsh modules. You could use nm -D to find *.so files providing these (hence finding the plugins likely to be perhaps loadable by zsh)
(I am not convinced that such an approach is worthwhile; in fact you should usually know which plugins are useful on your system by which applications)
BTW, you could use strace(1) on the execution of some command to understand the syscalls it is doing, hence the plugins it is loading. You might also use ltrace(1), or pmap(1) (on some given process), or simply -for a process 1234- use cat /proc/1234/maps to understand its virtual address space, hence the plugins it has already loaded. See proc(5).
Notice that strace, ltrace, pmap exist on Linux, but many POSIX systems have similar programs.
Also, a program could generate some machine code at runtime and execute it (SBCL does that at every REPL interaction!). Your program could also use some JIT techniques (e.g. with libjit, llvm, asmjit, GCCJIT or with hand-written code...) to do likewise. So plugin-like behavior can happen without dlopen (and you might mimic dlopen with mmap calls and some ELF relocation processing).
Addenda:
If you are installing firefox from its packaged version (e.g. the iceweasel package on Debian), its package is likely to handle the dependencies

forcing linking with a different SONAME than this of library

How to link a binary in a manner to be compatible with two existing version of a library that have conflicting SONAME ?
Those two versions don't share same SONAME prefix. One is libcapi10.so.3 and the other is libcapi10.so.4.
I can't recompile them since i get them as binary, and since those are certified crypto libraries i can't request new one with right SONAME. Of course i would not have faced any problem if one was libcap10.so.3 and the other libcap10.so.3.1 since i would just need to link over the first to be compatible with the second.
Those two libraries are told to be binary compatible ( i should trust this info ).
I searched but didn't find any good way to do , either with linker options or using objcopy. I would like to avoid patching binary by hand to use it at compilation linking time.
So back to my initial question : How to specify SONAME to be ( in this case libcap10.so ) used for link ?
( I already searched, and my current findings are just that it is a no go, but unfortunately this is a requirement ... ).
Update:
i patched .so library using a binary sed-like tool replacing libcapi10.so.6\0 with libcapi10.so\0, what works since new name is shorter than previous and that elf structure for SONAME is a C-string ended with a 0 and that elf checksum is not used during gcc linking. i used that patched library only at compilation time, then i can use either one or the other original library on my target system with the same binary.

patchelf is your friend. You can do something like: patchelf --replace-needed libcapi10.so.3 libcapi10.so.4 <your_thing>.
Patchelf is useful for a variety of other things such as changing RPATH, too. Check out the manpage. Very nifty toy.

My current best answer allowing to fullfill my need.
I patched .so library using a binary sed-like tool replacing libcapi10.so.6\0 with libcapi10.so\0, what works since new name is shorter than previous and that elf structure for SONAME is a C-string ended with a 0 and that elf checksum is not used during gcc linking. i used that patched library only at compilation time, then i can use either one or the other original library on my target system with the same binary.
best hints in stackverflow found there :
How can I change the filename of a shared library after building a program that depends on it?

How to relink existing shared library with extra object file

I have existing Linux shared object file (shared library) which has been stripped. I want to produce a new version of the library with some additional functions included. I had hoped that something like the following would work, but does not:
ld -o newlib.so newfuncs.o --whole-archive existinglib.so
I do not have the source to the existing library. I could get it but getting a full build environment with the necessary dependencies in place would be a lot of effort for what seems like a simple problem.

You might like to try coming at this from a slightly different angle by loading your object using preloading.
Set LD_PRELOAD to point to your new object
export LD_PRELOAD=/my/newfuncs/dir/newfuncs.o
and specify the existing library in the same way via your LD_LIBRARY_PATH.
This will then instruct the run time linker to search for needed symbols in your object before looking in objects located in your LD_LIBRARY_PATH.
BTW You can put calls in your object to then call the function that would've been called if you hadn't specified an LD_PRELOAD object or objects. This is why this is sometimes called interposing.
This is how many memory allocation analysis tools work. They interpose versions of malloc() and free() that records the calls to alloc() and free() before then calling the actual system alloc and free functions to perform the memory management.
There's plenty of tutorials on the interwebs on using LD_PRELOAD. One of the original and best is still "Building library interposers for fun and profit". Though written nine years ago and written for Solaris it is still an excellent resource.
HTH and good luck.

Completely untested idea:
# mv existinglib.so existinglib-real.so
# ld -o exlistinglib.so -shared newfuncs.o -lexistinglib-real
The dynamic linker, when loading a program that expects to load existinglib.so, will find your version, and also load existinglib-real.so on which it depends. It doesn't quite achieve the stated goal of your question, but should look as if it does to the program loading the library.

Shared libraries are not archives, they are really more like executables. You cannot therefore just stuff extra content into them in the same way you could for a .a static library.

Short answer: exactly what you asked for can't be done.
Longer answer: depending on why you want to do this, and exactly how existinglib.so has been linked, you may get close to desired behavior. In addition to LD_PRELOAD and renaming existinglib.so already mentioned, you could also use a linker script (cat /lib/libc.so to see what I mean).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight