According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it. Any initialization routines (see
below) are called just once. However, a subsequent dlopen() call
that loads the same shared object with RTLD_NOW may force symbol
resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not get very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode?
the contents of the library?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
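For concreteness, the prototype currently does roughly this per instance (paths, names, and the cp shell-out are illustrative only):

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Copy the library so dlopen() sees a distinct file, then load it
   privately so each copy keeps its own globals. */
void *load_instance(const char *so_path, int n) {
    char copy[256], cmd[1024];
    snprintf(copy, sizeof copy, "/tmp/instance-%d.so", n);
    snprintf(cmd, sizeof cmd, "cp '%s' '%s'", so_path, copy);
    if (system(cmd) != 0)
        return NULL;
    return dlopen(copy, RTLD_NOW | RTLD_LOCAL);
}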
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer: no, refactoring the code is definitely not an option. It would involve months or even years of work and potentially sacrifice all the benefits of using the code in the first place. There exists an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem not to believe the use case is actually valid: I am working on a pure functional language that is to be embedded into a larger C/C++ application. Because I need a prototype with a garbage collector, a proven typechecker, and reasonable performance ASAP, I used OCaml as intermediate code. Right now, I compile a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library together with the OCaml runtime, and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variables (e.g. the pointer to the young generation), and that is, or rather should be, totally fine. The library exposes exactly two functions: an initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. When I load the library, its internal symbols are relocated as expected; the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local storage: that is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit loading instructions for TLS symbols (yet?).
POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. Except that the copies will still expose the same global symbols, and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).
Related
For example, I have a function func():
int func(int a, int b) { return a + b; }
Now I want to write it to a file, so that I can use the system call mmap to load it with PROT_EXEC and call it from another program. What should I do for that?
If you know the signature you need, and you have a static library or know the location of a shared library at compile time, you probably just want to include the header and link against that library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) to load the library dynamically and retrieve the address of the function by name.
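A minimal sketch of the dynamic case (library and symbol names are assumed for illustration; on Linux, link with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* "./libfunc.so" and "func" are hypothetical names. */
    void *lib = dlopen("./libfunc.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Look the function up by name and cast to its known signature. */
    int (*func)(int, int) = (int (*)(int, int))dlsym(lib, "func");
    if (!func) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(lib);
        return 1;
    }

    printf("%d\n", func(2, 3)); /* prints 5 */
    dlclose(lib);
    return 0;
}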
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to statically link (assuming you distribute updates when there are security fixes for your dependencies), since this will avoid you getting broken by incompatible version updates and also prevent your users from experiencing dependency hell (you require version A but some other application requires version B). Modular architectures are often better (and more securely) achieved through inter-process communication (IPC): dynamically loaded libraries live in the process of the program that loads them, giving them access to the entire process's virtual memory space, whereas with IPC each component is a separate process that only has access to information given to it explicitly by the calling process. That makes it more difficult for a malicious component to steal data from the caller or other components, or to produce instability.
The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, what we'd have to do is similar to what the linker does - load the code at a particular address that it is made to work on and run it. There is position independent code that can run at any address, too.
Let's first get our function compiled and linked, then output into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options - gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently with the linker command being /BINARY for instance)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32 bit mode) to confirm it looks right - I see an add instruction there, so looks good to me.
If you loaded that, mmapped it, and then called it... it should work.
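A Linux sketch of that load-and-call step (error handling trimmed; it assumes the image is position-independent and laid out with func at offset 0, which -e func should arrange for this trivial case):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("func.binary", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Map the raw image with execute permission. */
    void *code = mmap(NULL, st.st_size, PROT_READ | PROT_EXEC,
                      MAP_PRIVATE, fd, 0);
    close(fd);
    if (code == MAP_FAILED) { perror("mmap"); return 1; }

    /* Cast the start of the image to the known signature and call. */
    int (*func)(int, int) = (int (*)(int, int))code;
    printf("%d\n", func(2, 3)); /* should print 5 */

    munmap(code, st.st_size);
    return 0;
}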
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!
If I use dlopen on the same lib/file two times in the same application run, will it yield the same handle in both cases? Is there any guarantee for this (a short experiment showed that it at least does on my box)?
I'm currently playing around with a little plugin-system (out of curiosity) and if there would be some kind of guarantee for this observed behaviour I could use this address as a key for a plugin to prevent duplicated loads.
Yes. The dlopen(3) linux man page says:
If the same library is loaded again with dlopen(), the same file
handle is returned. The dl library maintains reference counts for
library handles, so a dynamic library is not deallocated until
dlclose() has been called on it as many times as dlopen()
has succeeded on it.
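A quick check of that guarantee might look like this (the plugin path is hypothetical):

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* "./plugin.so" is a hypothetical plugin path. */
    void *h1 = dlopen("./plugin.so", RTLD_NOW);
    void *h2 = dlopen("./plugin.so", RTLD_NOW);

    printf("same handle: %s\n", (h1 && h1 == h2) ? "yes" : "no");

    /* The reference count is now 2, so a full unload takes two dlclose() calls. */
    if (h2) dlclose(h2);
    if (h1) dlclose(h1);
    return 0;
}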
BTW, on Linux systems, you can dlopen a lot (many tens of thousands) of shared libraries, as my example manydl.c demonstrates. The main limitation is address space.
So practically, not bothering about dlclose-ing stuff is possible.
(unless your dlopen-ed shared libraries have weird or resource-consuming constructor or destructor functions)
Added in December 2017:
Notice that what is relevant is the exact path string passed to dlopen. So if you use "./foo.so" and "././foo.so" (or "../foosymlink.so" where foosymlink.so is a symlink to foo.so) the dlopen-ed handles are different, and in some cases weird behavior of the two instances of that shared library might happen.
Added in June 2019:
Read also Drepper's How to write shared libraries paper (it explains also well how to use them!).
I have performance-critical code written for multiple CPUs. I detect the CPU at run time and, based on that, use the appropriate function for the detected CPU. So now I have to use function pointers and call functions through them:
void do_something_neon(void);
void do_something_armv6(void);
void (*do_something)(void);

if (cpu == NEON) {
    do_something = do_something_neon;
} else {
    do_something = do_something_armv6;
}

/* Use the function pointer: */
do_something();
...
Not that it matters, but I'll mention that I have optimized functions for different CPUs: ARMv6, and ARMv7 with NEON support. The problem is that by using function pointers in many places the code becomes slower, and I'd like to avoid that.
Basically, at load time the linker resolves relocations and patches the code with function addresses. Is there a way to control that behavior better?
Personally, I'd propose two different ways to avoid function pointers: create two separate .so (or .dll) files for the CPU-dependent functions, place them in different folders, and based on the detected CPU add one of those folders to the search path (or LD_LIBRARY_PATH). Then load the main code, and the dynamic linker will pick up the required library from the search path. The other way is to compile two separate copies of the library :)
The drawback of the first method is that it forces me to have at least 3 shared objects (dll's): two for the cpu dependent functions and one for the main code that uses them. I need 3 because I have to be able to do CPU detection before loading code that uses these cpu dependent functions. The good part about the first method is that the app won't need to load multiple copies of the same code for multiple CPUs, it will load only the copy that will be used. The drawback of the second method is quite obvious, no need to talk about it.
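To make the first method concrete, the CPU-specific library could also simply be dlopen'ed by full path, which sidesteps the fact that (at least on glibc) changing LD_LIBRARY_PATH after startup does not affect later dlopen() calls. A sketch, with hypothetical paths and probe:

#include <dlfcn.h>

/* Hypothetical probe; a real one might parse /proc/cpuinfo or use
   getauxval(AT_HWCAP) on Linux/ARM. */
static int cpu_has_neon(void) { return 0; /* stub for the sketch */ }

/* Pick the directory by detected CPU and load that build directly.
   RTLD_GLOBAL makes its symbols visible to the main-code library
   loaded afterwards. */
void *load_cpu_lib(void) {
    const char *path = cpu_has_neon()
        ? "/opt/app/neon/libfast.so"
        : "/opt/app/armv6/libfast.so";
    return dlopen(path, RTLD_NOW | RTLD_GLOBAL);
}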
I'd like to know if there is a way to do that without using shared objects and manually loading them at runtime. One way would be some hackery that involves patching code at run time, but that's probably too complicated to get done properly. Is there a better way to control relocations at load time? Maybe place the CPU-dependent functions in different sections and then somehow specify which section has priority? I think the Mac's Mach-O format has something like that.
An ELF-only (for an ARM target) solution is enough for me; I don't really care about PE (DLLs).
thanks
You may want to look up the GNU dynamic linker extension STT_GNU_IFUNC. From Drepper's blog when it was added:
Therefore I’ve designed an ELF extension which allows to make the decision about which implementation to use once per process run. It is implemented using a new ELF symbol type (STT_GNU_IFUNC). Whenever a symbol lookup resolves to a symbol with this type the dynamic linker does not immediately return the found value. Instead it is interpreting the value as a function pointer to a function that takes no argument and returns the real function pointer to use. The code called can be under control of the implementer and can choose, based on whatever information the implementer wants to use, which of the two or more implementations to use.
Source: http://udrepper.livejournal.com/20948.html
Nonetheless, as others have said, I think you're mistaken about the performance impact of indirect calls. All code in shared libraries will be called via a (hidden) function pointer in the GOT and a PLT entry that loads/calls that function pointer.
For the best performance you need to minimize the number of indirect calls (through pointers) per second and allow the compiler to optimize your code better (DLLs hamper this because there must be a clear boundary between a DLL and the main executable and there's no optimization across this boundary).
I'd suggest doing these:
moving as much as possible of the main executable's code that frequently calls DLL functions into the DLL. That'll minimize the number of indirect calls per second and allow for better optimization at compile time too.
moving almost all your code into separate CPU-specific DLLs and leaving to main() only the job of loading the proper DLL OR making CPU-specific executables w/o DLLs.
Here's the exact answer that I was looking for.
GCC's __attribute__((ifunc("resolver")))
It requires fairly recent binutils.
There's a good article that describes this extension: Gnu support for CPU dispatching - sort of...
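For illustration, usage might look like the sketch below; cpu_has_neon() is an invented stub standing in for real CPU detection (e.g. via getauxval(AT_HWCAP) on Linux/ARM):

/* Hypothetical probe; kept local because ifunc resolvers run very
   early, before all relocations are guaranteed to be in place. */
static int cpu_has_neon(void) { return 0; /* stub for the sketch */ }

static void do_something_neon(void)  { /* NEON implementation */ }
static void do_something_armv6(void) { /* ARMv6 fallback */ }

/* The resolver runs once, when the dynamic linker first resolves
   do_something; its return value becomes the call target. */
static void (*resolve_do_something(void))(void) {
    return cpu_has_neon() ? do_something_neon : do_something_armv6;
}

/* Every later call to do_something() goes straight to the chosen
   implementation, with no function-pointer test in user code. */
void do_something(void) __attribute__((ifunc("resolve_do_something")));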
Lazy loading ELF symbols from shared libraries is described in section 1.5.5 of Ulrich Drepper's DSO How To (updated 2011-12-10). For ARM it is described in section 3.1.3 of ELF for ARM.
EDIT: Regarding the STT_GNU_IFUNC extension mentioned by R.: I forgot that it was an extension. GNU Binutils has supported it for ARM since March 2011, according to the changelog.
If you want to call functions without the indirection of the PLT, I suggest function pointers or per-arch shared libraries inside which function calls don't go through PLTs (beware: calling an exported function is through the PLT).
I wouldn't patch the code at runtime. I mean, you can. You can add a build step: after compilation disassemble your binaries, find all offsets of calls to functions that have multi-arch alternatives, build table of patch locations, link that into your code. In main, remap the text segment writeable, patch the offsets according to the table you prepared, map it back to read-only, flush the instruction cache, and proceed. I'm sure it will work. How much performance do you expect to gain by this approach? I think loading different shared libraries at runtime is easier. And function pointers are easier still.
First of all my apologies to those of you who have followed my questions posted in the last few days. This might sound a little repetitive, as I had been asking questions related to -ffunction-sections & -fdata-sections and this one is along the same lines. Those questions and their answers didn't solve my problem, so I realized it is best for me to state the full problem here and let SO experts ponder it. Sorry for not doing so earlier.
So, here goes my problem:
I build a set of static libraries which provide a lot of functionality. These static libraries will be provided to many products. Not all products will use all of the functionality provided by my libs. The problem is that the library sizes are quite big and the products want them reduced. The main goal is to reduce the final executable size, not the library size itself.
Now, I did some research and found out that if there are 4 functions in a source file and only one of them is used by the application, the linker will still include the other 3 functions in the final executable, as they all belong to the same object file. I analyzed further and found that -ffunction-sections, -fdata-sections, and --gc-sections (the last one a linker option) will ensure that only that one function gets linked.
But, these options for some reasons beyond my control cannot be used now.
Is there any other way in which I can ensure that the linker will link only the function which is strictly required and exclude all other functions even if they are in the same object file?
Are there any other ways of dealing with the problem?
Note: Reorganizing my code is almost ruled out as it is a legacy code and big.
I am dealing mainly with VxWorks & GCC here.
Thanks for any help!
Ultimately, the only way to ensure that only the functions you want are linked is to ensure that each source (object) file in the library only exports one function symbol - one (visible) function per file. Typically, there are some files which export several functions which are always all used together - the initialization and finalization functions for a package, for example. Also, there are often functions used by the exported function that do not need to be visible outside the source (object) file - make sure they are static.
If you look at Plauger's "The Standard C Library", you'll find that every function is implemented in a separate file, even if the file ends up 4 lines long (one header, one function line, an open brace, one line of code, and a close brace).
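For illustration, a sketch of that layout (names invented):

/* sum.c -- deliberately exports a single visible symbol, sum(). */

/* Internal helper: static, so it is invisible outside this
   translation unit and never adds to the exported surface. */
static int add(int a, int b) {
    return a + b;
}

int sum(const int *xs, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total = add(total, xs[i]);
    return total;
}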
Jay asked:
In the case of a big project, doesn't it become difficult to manage with so many files? Also, I don't find many open source projects following this model. OpenSSL is one example.
I didn't say it was widely used - it isn't. But it is the way to make sure that binaries are minimized. The compiler (linker) won't do the minimization for you - at least, I'm not aware of any that do. On a large project, you design the source files so that closely related functions that will normally all be used together are grouped in single source files. Functions that are only occasionally used should be placed in separate files. Ideally, the rarely used functions should each be in their own file; failing that, group small numbers of them into small (but non-minimal) files. That way, if one of the rarely used functions is used, you only get a limited amount of extra unused code linked.
As to number of files - yes, the technique espoused does mean a lot of files. You have to weigh the workload of managing (naming) lots of files against the benefit of minimal code size. Automatic build systems remove most of the pain; VCS systems handle lots of files.
Another alternative is to put the library code into a shared object - or dynamic link library (DLL). The programs then link with the shared object, which is loaded into memory just once and shared between programs using it. The (non-constant) data is replicated for each process. This reduces the size of the programs on disk, at the cost of fixups during the load process. However, you then don't need to worry about executable size; the executables do not include the shared objects. And you can update the library (if you're careful) without recompiling the main programs that use it. The reduced size of the executables is one reason shared libraries are popular.
How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void *mylib = dlopen("/path/to/library.so", RTLD_NOW);
printf = dlsym(mylib, "printf");
AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.
There is no clean solution but it is possible. I see two options:
Overwrite the printf function prologue with a jump to your replacement function.
This is a quite popular solution for function hooking on MS Windows. You can find examples of function hooking by code rewriting on Google.
Rewrite ELF relocation/linkage tables.
See this article on CodeProject that does almost exactly what you are asking, but only in the scope of dlopen()'ed modules. In your case you want to also edit your main (typically non-PIC) module. I didn't try it, but maybe it's as simple as calling the provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails, you'll have to read the source of your dynamic linker to figure out what needs to be adapted.
It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
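A minimal sketch of that approach (my_printf is an invented name):

#include <stdio.h>
#include <stdarg.h>

/* Defined before the macro below, so it can still reach the real
   printf machinery through vprintf. */
static int my_printf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    fputs("[mine] ", stdout);
    int n = vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/* After all system headers: redirect the name, not the symbol. */
#undef printf
#define printf my_printf

int main(void) {
    printf("hello %d\n", 42); /* expands to my_printf(...) */
    return 0;
}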
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.
You can't change that. In the general *NIX linking model (or rather lack of one), a symbol is picked from the first object in which it is found. (Except for oddball AIX, which by default works more like OS/2.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT). See man dlsym for more. It gets out of hand quite quickly, though, which is why it is rarely used.
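For illustration, a glibc-flavored sketch of wrapping printf with dlsym(RTLD_NEXT); as the previous answer notes, interposing libc functions is formally undefined behavior, so treat this as a demonstration, not a guarantee:

#define _GNU_SOURCE /* for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>

/* Our definition shadows libc's printf; RTLD_NEXT then finds the
   next definition in search order (normally libc's vprintf). */
int printf(const char *fmt, ...) {
    static int (*real_vprintf)(const char *, va_list);
    if (!real_vprintf)
        real_vprintf = (int (*)(const char *, va_list))
            dlsym(RTLD_NEXT, "vprintf");

    va_list ap;
    va_start(ap, fmt);
    fputs("[wrapped] ", stdout);
    int n = real_vprintf(fmt, ap);
    va_end(ap);
    return n;
}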
There is an environment variable, LD_LIBRARY_PATH, in which the dynamic linker searches for shared libraries. Prepend your path to LD_LIBRARY_PATH; I hope that would work.
Store the dlsym() result in a lookup table (array, hash table, etc.). Then #undef printf and #define printf to use your lookup-table version.