dlopen from memory? - c

I'm looking for a way to load generated object code directly from memory.
I understand that if I write it to a file, I can call dlopen to dynamically load its symbols and link them. However, this seems a bit of a roundabout way, considering that it starts off in memory, is written to disk, and then is reloaded in memory by dlopen. I'm wondering if there is some way to dynamically link object code that exists in memory. From what I can tell there might be a few different ways to do this:
Trick dlopen into thinking that your memory location is a file, even though it never leaves memory.
Find some other system call which does what I'm looking for (I don't think this exists).
Find some dynamic linking library which can link code directly in memory. Obviously, this one is a bit hard to google for, as "dynamic linking library" turns up information on how to dynamically link libraries, not on libraries which perform the task of dynamically linking.
Abstract some API from a linker and create a new library out its codebase. (obviously this is the least desirable option for me).
So which ones of these are possible? feasible? Could you point me to any of the things I hypothesized existed? Is there another way I haven't even thought of?

I needed a solution to this because I have a scriptable system that has no filesystem (using blobs from a database) and needs to load binary plugins to support some scripts. This is the solution I came up with which works on FreeBSD but may not be portable.
void *dlblob(const void *blob, size_t len) {
/* Create shared-memory file descriptor */
int fd = shm_open(SHM_ANON, O_RDWR, 0);
ftruncate(fd, len);
/* MemMap file descriptor, and load data */
void *mem = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);
memcpy(mem, blob, len);
munmap(mem, len);
/* Open Dynamic Library from SHM file descriptor */
void *so = fdlopen(fd,RTLD_LAZY);
close(fd);
return so;
}
Obviously the code lacks any kind of error checking etc, but this is the core functionality.
ETA: My initial assumption that fdlopen is POSIX was wrong, this appears to be a FreeBSD-ism.

I don't see why you'd be considering dlopen, since that will require a lot more nonportable code to generate the right object format on disk (e.g. ELF) for loading. If you already know how to generate machine code for your architecture, just mmap memory with PROT_READ|PROT_WRITE|PROT_EXEC and put your code there, then assign the address to a function pointer and call it. Very simple.

There is no standard way to do it other than writing out the file and then loading it again with dlopen().
You may find some alternative method on your current specific platform. It will up to you to decide whether that is better than using the 'standard and (relatively) portable' approach.
Since generating the object code in the first place is rather platform specific, additional platform-specific techniques may not matter to you. But it is a judgement call - and in any case depends on there being a non-standard technique, which is relatively improbable.

We implemented a way to do this at Google. Unfortunately upstream glibc has failed to comprehend the need so it was never accepted. The feature request with patches has stalled. It's known as dlopen_from_offset.
The dlopen_with_offset glibc code is available in the glibc google/grte* branches. But nobody should enjoy modifying their own glibc.

You don't need to load the code generated in memory, since it is already in memory!
However, you can -in a non portable way- generate machine code in memory (provided it is in a memory segment mmap-ed with PROT_EXEC flag).
(in that case, no "linking" or relocation step is required, since you generate machine code with definitive absolute or relative addresses, in particular to call external functions)
Some libraries exist which do that: On GNU/Linux under x86 or x86-64, I know of GNU Lightning (which generates quickly machine code which runs slowly), DotGNU LibJIT (which generates medium quality code), and LLVM & GCCJIT (which is able to generate quite optimized code in memory, but takes time to emit it). And LuaJit has some similar facility too. Since 2015 GCC 5 has a gccjit library.
And of course, you can still generate C code in a file, fork a compiler to compile it into a shared object, and dlopen that shared object file. I'm doing that in GCC MELT , a domain specific language to extend GCC. It does work quite well in practice.
addenda
If performance of writing the generated C file is a concern (it should not be, since compiling a C file is much slower than writing it) consider using some tmpfs file system for that (perhaps in /tmp/ which is often a tmpfs filesystem on Linux)

Related

How to circumvent dlopen() caching?

According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it. Any initialization returns (see
below) are called just once. However, a subsequent dlopen() call
that loads the same shared object with RTLD_NOW may force symbol
resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not come very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode ?
the contents of the libray?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer, no refactoring the code is definitely not an option. It would involve months or even years of work and potentially sacrifice all benefits of using the code in the first place. There exists an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem to not believe the usecase is actually valid. I am working on a pure functional language, that shall be embedded into a larger C/C++ application. Because I need a prototype with a garbage collector, a proven typechecker, and reasonable performance ASAP, I used OCaml as intermediate code. Right now, I am compiling a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library with the OCaml runtime and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variabels (e.g. the pointer to the young generation) and that is, or rather should be, totally fine. The library exposes exactly two functions: An initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. when I load the library, its internal symbols are relocated as expected, the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local-storage: That is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit loading instructions for tls symbols (yet?).
POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. Except. Since they will expose the same global symbols and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).

How can I "dump" a Function to a file?

For example, I have a function func():
int func (int a, int b) {return a + b;}
Now I want write it to a file, so that I can use the system-call mmap to load it with PROT_EXEC and I can call it from another program.What should I do for it?
If you know what signature you need and a static library or the location of a shared library at compile time, you probably just want to include the header and link against the output library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) for loading the libary dynamically and retrieving the address of the function by name.
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to statically link (assuming you distribute updates when there are security fixes for your dependencies), since this will avoid you getting broken by incompatible version updates and also prevent your users from experiencing dependency hell (you require version A but some other application requires version B); modular architectures are often better (and more securely) achieved through inter-process communication (IPC), since dynamically loaded libraries live in the process of the program that loads them (thereby giving them access to the entire process's virtual memory space), whereas with interprocess-communication, each component would be a separate process, and individual components would only have access to information that was given to it explicitly by the calling process, which would make it more difficult for a malicious component to steal data from the caller or other components or to produce instability.
The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, what we'd have to do is similar to what the linker does - load the code at a particular address that it is made to work on and run it. There is position independent code that can run at any address, too.
Let's first get our function compiled and linked, then output into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options - gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently with the linker command being /BINARY for instance)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32 bit mode) to confirm it looks right - I see an add instruction there, so looks good to me.
If you loaded that and mmaped then called it... it should work.
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!

How can a C shared library function learn the executable's path

I am writing a C shared library in Linux in which a function would like to discover the path to the currently running executable. It does NOT have access to argv[0] in main(), and I don't want to require the program accessing the library to pass that in.
How can a function like this, outside main() and in the wild, get to the path of the running executable? So far I've thought of 2 rather unportable, unreliable ways: 1) try to read /proc/getpid()/exe and 2) try to climb the stack to __libc_start_main() and read the stack params. I worry about all machines having /proc mounted.
Can you think of another way? Is there something buried anywhere in dlopen(NULL, 0) ? Can I get a reliable proc image of self from the kernel??
Thanks for any thoughts.
/proc is your best chance, as "path of the executable" is not that well defined concept in Linux (you can even delete it while the program is running).
To get the breakdown of loaded modules (with the main executable usually being the first entry) you should look at /proc/<pid>/maps. It's a text formatted file which will allow you to associate executable and library paths with load addresses (if the former are known and still valid).
Unless you are writing software that may be used very early in system startup, you can safely assume that /proc will always be mounted on a Linux system. It contains quite a bit of data that is not accessible any other way, and thus must be mounted for a system to function properly. As such, you can pretty easily obtain a path to your executable using:
readlink("/proc/self/exe", buf, sizeof(buf));
If for some reason you want to avoid this, it's also possible to read it from the process's auxiliary vector:
#include <sys/auxv.h>
#include <elf.h>
const char *execpath = (const char *) getauxval(AT_EXECFN);
Note that this will require a recent version of glibc (2.16 or later). It'll also return the path that was used to execute your application (e.g, possibly something like ./binary), rather than its absolute path.

Testing my software with a not mmap-able filesystem

I have a piece of code that tries to mmap some file. If it can mmap the file, it does something, if it can not mmap it does something else.
The code is working in both cases, but I wanted to to do some tests in some filesystem that does not support mmap. The problem is that I did not find any filesystem that were not able to mmap.
Can someone point me out some filesystem that is not mmap-able?
You might simulate a non-mmap-capable system using library interposition. Simply take this C file
#include <errno.h>
#include <sys/mman.h>
void* mmap(void*, size_t, int, int, int, off_t) {
errno = ENODEV;
return NULL;
}
and compile it to a shared library. Name the path of that library in the LD_PRELOAD environment variable, and it should take precedence over the real mmap, thus simulating a system where mmap will always fail. That way, you can test your code without superuser privileges, without creating certain file systems, and without having to have the corresponding kernel modules, userland tools and the likes available.
You might theoretically encounter situations where some library outside your control relies on specific kinds of mmap to always work. A map with MAP_ANONYMOUS is a prime example, since it is not backed by a file system and therefore does not depend on FS types. If you encounter any problems in libraries failing due to violated mmap assumptions, you might have to modify the interposer to have a closer look at its arguments and forward some calls to the libc implementation while rejecting others itself. But I'd only do this if the need actually arises.

How to dynamically load often re-generated c code quickly?

I want to be able to generate C code dynamically and re-load it quickly into my running C program.
I am on Linux, how could this be done?
Can a library .so file on Linux be re-compiled and reloaded at runtime?
Could it be compiled without producing a .so file, could the compiled output somehow go to memory and then be reloaded ? I want to reload the compiled code quickly.
What you want to do is reasonable, and I am doing exactly that in MELT (a high level domain specific language to extend GCC; MELT is compiled to C, thru a translator itself written in MELT).
First, when generating C code (or many other source languages), a good advice is to keep some sort of abstract syntax tree (AST) in memory. So build first the entire AST of the generated C code, then emit it as C syntax. Don't think of your code generation framework without an explicit AST (in other words, generation of C code with a bunch of printf is a maintenance nightmare, you want to have some intermediate representation).
Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about performance of the generated code (and TCC compiles very quickly C into a very naive and slow machine code) you could use some other approaches, e.g. using some JIT libraries like Gnu lightning (very quick generation of slow machine code), Gnu Libjit or ASMJIT (generated machine code is a bit better), LLVM or GCCJIT (good machine code generated, but generation time comparable to a compiler).
So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you probably would fork a gcc -O -fPIC -shared command to make some shared object foo.so out of your generated foo.c). By experience, generating C code takes much less time than compiling it (with gcc -O). In MELT, the generation of C code is more than 10x faster than its compilation by GCC (and usually 30x faster). But the optimizations done by a C compiler are worth it.
Once you emitted your C code, forked its compilation into a .so shared object, you can dlopen it. Don't be shy, my manydl.c example demonstrates that on Linux you can dlopen a big lot of shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to dlclose on Linux (unless you are coding a server program needing to run for months); an unused shared module can stay practically dlopen-ed and you mostly are leaking process address space (which is a cheap resource), since most of that unused .so would be swapped-out. dlopen is done quickly, what takes time is the compilation of a C source, because you really want the optimization to be done by the C compiler.
You coul use many other different approaches, e.g. have a bytecode interpreter and generate for that bytecode, use Common Lisp (e.g. SBCL on Linux which compiles dynamically to machine code), LuaJit, Java, MetaOcaml etc.
As others suggested, you don't care much about the time to write a C file, and it will stay in filesystem cache in practice (see also this). And writing it is much faster than compiling it, so staying in memory is not worth the trouble. Use some tmpfs if you are concerned by I/O times.
addenda
You asked
Can a library .so file on Linux be re-compiled and re- loaded at runtime?
Of course yes: you should fork a command to build the library from the generated C code (e.g. a gcc -O -fPIC -shared generated.c -o generated.so, but you could do it indirectly e.g. by running a make -j, especially if the generated.so is big enough to make it relevant to split the generated.c in several C generated files!) and then you dynamically load your library with dlopen (giving a full path like /some/file/path/to/generated.so, and probably the RTLD_NOW flag, to it) and you have to use dlsym to find relevant symbols inside. Don't think of re-loading (a second time) the same generated.so, better to emit a unique generated1.c (then generated2.c etc...) C file, then to compile it to a unique generated1.so (the second time to generated2.so, etc...) then to dlopen it (and this can be done many hundred thousands of times). You may want to have, in the emitted generated*.c files, some constructor functions which would be executed at dlopen time of the generated*.so
Your base application program should have defined a convention about the set of dlsym-ed names (usually functions) and how they are called. It should only directly call functions in your generated*.so thru dlsym-ed function pointers. In practice you would decide for example that each generated*.c defines a function void dynfoo(int) and int dynbar(int,int) and use dlsym with "dynfoo" and "dynbar" and call these thru function pointers (returned by dlsym). You should also define conventions of how and when these dynfoo and dynbar would be called. You'll better link your base application with -rdynamic so that your generated*.c files could call your application functions.
You don't want your generated*.so to re-define existing names. For instance, you don't want to redefine malloc in your generated*.c and expect all heap allocation functions to magically use your new variant (that probably won't work, and if even if it did, it would be dangerous).
You probably won't bother to dlclose a dynamically loaded shared object, except at application clean-up and exit time (but I don't bother at all to dlclose). If you do dlclose some dynamically loaded generated*.so file, be sure that nothing is used in it: no pointers, not even return addresses in call frames, are existing to it.
P.S. the MELT translator is currently 57KLOC of MELT code translated to nearly 1770KLOC of C code.
Your best bet's probably the TCC compiler, which allows you to do exactly this --- compile source code, add it to your program, run it, all without touching files.
For a more robust but non-C-based solution, you should probably check out the LLVM project, which does much the same thing but from the perspective of producing JITs. You don't get to go via C, instead using a kind of abstract portable machine code, but the generated code is loads faster and it's under more active development.
OTOH if you want to do it all manually by shelling out to gcc, compiling a .so and then loading it yourself, dlopen() and dlclose() will do what you want.
Are you sure C is the right answer here? There are various interpreted languages such as Lua, Bigloo Scheme, or perhaps even Python that embed very well into an existing C application. You can write the dynamic parts using the extension language, which will support reloading code at runtime.
The obvious disadvantage is performance - if you absolutely need the raw speed of compiled C then these may be a no-go.
If you want to reload a library dynamically, you can use dlopen function (see mans). It opens a library .so file and returns a void* pointer to it, then you can get a pointer to any function/variable of your library with dlsym.
To compile your libraries in-memory, well, the best thing I think you can do is creating memory filesystem as described here.

Resources