Loading .so Files From Memory [duplicate] - c

Possible Duplicate:
dlopen from memory?
I've seen this done for Windows DLL files, which can be loaded from a memory buffer, but I can't find it anywhere for Linux, and the "ld" source code is the most complex code I've ever seen. So:
Is there any example of loading .so files from memory? Even a simple one that I can finish? I just don't know where to start; even though I've read most of the ELF specifications, it's still mysterious to me.

You're looking at the source code of the wrong thing: ld doesn't do program and library loading. Instead, you should look at the source code of the dlopen and dlsym functions found in libc. You should also look at the source of the dynamic linker, ld-linux.so (the true name varies with the platform; execute ldd /bin/ls to find out where the dynamic linker resides).
ELF parsing isn't difficult, but it requires attention to detail and an understanding of assembly code for the particular CPU. You also need the ABI specification for your platform (it differs between 32- and 64-bit Linux, and also between CPUs).
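As a rough illustration of the first step of ELF parsing, here is a minimal sketch that checks whether a memory buffer starts with a plausible ELF header for a shared object. The function name looks_like_shared_object is made up for this example, and it assumes the buffer uses the same byte order as the host. Note that position-independent executables are also marked ET_DYN, so this check alone cannot tell them apart from libraries.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf begins with an ELF header whose
 * type is ET_DYN (shared object or PIE), 0 otherwise.
 * Assumption: the image has the same endianness as the host, so the header
 * fields can be read directly after a memcpy. */
int looks_like_shared_object(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < EI_NIDENT)
        return 0;

    /* Every ELF file starts with the 4 magic bytes 0x7f 'E' 'L' 'F'. */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;

    /* EI_CLASS says whether the rest of the header is 32- or 64-bit. */
    if (buf[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        if (len < sizeof eh)
            return 0;
        memcpy(&eh, buf, sizeof eh);   /* avoids unaligned access */
        return eh.e_type == ET_DYN;
    }
    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        if (len < sizeof eh)
            return 0;
        memcpy(&eh, buf, sizeof eh);
        return eh.e_type == ET_DYN;
    }
    return 0;
}
```

A real loader would go on to walk the program headers (e_phoff, e_phnum) and apply relocations, which is where the platform ABI document becomes necessary.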
If you just need to load object files from memory at run-time (i.e., it doesn't have to be an SO), you can look at the X11 project: they have implemented a module system which, basically, loads object code at some address and relocates it.

You need the dlopen() family of functions (on GNU/Linux, they are declared in /usr/include/dlfcn.h).
For an example, take a look at how PHP does modules.
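For readers who have not used these functions, here is a minimal sketch of the dlopen/dlsym/dlclose sequence. The helper name call_unary_from_library is invented for this example, and the library name "libm.so.6" used in the usage note is an assumption that holds on typical glibc systems. On older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: opens `library`, looks up `symbol` as a
 * double(double) function, calls it with `arg` and stores the result.
 * Returns 0 on success, -1 on any failure (dlerror() has the details). */
int call_unary_from_library(const char *library, const char *symbol,
                            double arg, double *result)
{
    void *handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    dlerror();                                  /* clear stale error state */
    double (*fn)(double);
    *(void **)(&fn) = dlsym(handle, symbol);    /* idiom from the POSIX docs */
    if (dlerror() != NULL || fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = fn(arg);
    dlclose(handle);
    return 0;
}
```

For instance, call_unary_from_library("libm.so.6", "cos", 0.0, &r) would look up cos in the math library and call it through the returned pointer.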

What does "loading .so files from memory" mean to you?
If you have any *.so file, then it is in some file system and has a path. Then just use dlopen on it.
If it is not a file, what is it? How did it get into memory? What exactly do you have in memory? (Do you have an ELF header and ELF layout in memory?)
If you have enough information to make an ELF *.so file, dump (i.e. write) that file into some file system (use a temporary filesystem like tmpfs if you are concerned with disk performance). Then dlopen that.
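The "write it out, then dlopen it" approach could look roughly like the sketch below. The function name dlopen_from_buffer and the "plugin-XXXXXX" file name template are made up for this example. The caller chooses the directory; it must not be mounted noexec, otherwise dlopen will fail to map the code.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: writes an in-memory ELF shared object to a temporary
 * file in `dir` (ideally on tmpfs) and loads it with dlopen.
 * Returns the dlopen handle, or NULL on failure. */
void *dlopen_from_buffer(const char *dir, const void *image, size_t len)
{
    char path[PATH_MAX];
    int n = snprintf(path, sizeof path, "%s/plugin-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;

    int fd = mkstemp(path);            /* creates a unique file, mode 0600 */
    if (fd < 0)
        return NULL;

    /* write() may write less than requested, so loop until done. */
    const unsigned char *p = image;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            close(fd);
            unlink(path);
            return NULL;
        }
        p += w;
        left -= (size_t)w;
    }
    close(fd);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);

    /* The mapping survives removal of the directory entry, so the file
     * can be unlinked right away whether or not dlopen succeeded. */
    unlink(path);
    return handle;
}
```

If the buffer does not hold a valid shared object, dlopen returns NULL and dlerror() explains why.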
If you don't have enough information to make an ELF .so file, then you are probably building code dynamically in memory. Look at what existing machine-code-generating infrastructure (like LLVM, GCCJIT, libjit, GNU lightning, LuaJIT, ...) is doing.
If you have fully functional code in memory, ensure that the memory is executable with mmap & mprotect and jump into it (e.g. using function pointer tricks).
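The mmap & mprotect route can be sketched as follows. The names run_code and demo_return_42 are invented for this example, and the six machine-code bytes are an assumption that only applies to x86-64 (they encode "mov eax, 42; ret"); on other architectures the demo simply reports failure. Systems with strict security policies may refuse to make anonymous memory executable, in which case the helper returns -1.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copies `len` bytes of position-independent machine
 * code into fresh memory, makes it executable and calls it as int(void).
 * Returns 0 on success and stores the call's return value in *result. */
int run_code(const unsigned char *code, size_t len, int *result)
{
    /* Start writable but not executable, so W and X are never both set. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }
    /* GCC/Clang builtin; needed on CPUs with separate instruction caches. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    int (*fn)(void);
    *(void **)(&fn) = mem;             /* the "function pointer trick" */
    *result = fn();

    munmap(mem, len);
    return 0;
}

/* Demo: returns 42 when the code ran, -1 if unsupported or refused. */
int demo_return_42(void)
{
#if defined(__x86_64__)
    static const unsigned char code[] = {
        0xb8, 0x2a, 0x00, 0x00, 0x00,  /* mov eax, 42 */
        0xc3                           /* ret         */
    };
    int value = 0;
    return run_code(code, sizeof code, &value) == 0 ? value : -1;
#else
    return -1;                         /* no code bytes for this CPU here */
#endif
}
```

This only works for code that needs no relocation and references nothing outside itself; anything more involved is exactly what the JIT libraries listed above handle.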

Related

Getting known library paths from ldconfig for use with dlopen

I have a program written in C that uses dlopen to load plug-in modules. When a library is dynamically loaded, it runs constructor code which registers a pointer to a structure of function implementations with the main application, by means of an exported function. I want to use an absolute path to specify the file to dlopen.
Then I have another part of the program which takes a file, determines whether it is ELF, then looks in the ELF header for a specific ELF section, reads this section and extracts the pertinent information from it. This way it filters out everything except the shared libraries which I have previously tagged as plug-in modules.
However, I am trying to solve the problem of how to discover them on the fly from the main program (in a portable Linux way, i.e. it should run on Debian, on Fedora and so on). I have been thinking about using ldconfig for this, as the modules will be installed through the distro packaging system (APT, for example). Is there any way to programmatically get the list of known libraries as strings from a C program, other than directly reading the /etc/ld.so.cache file? I was thinking that maybe there is some library header which will give me a char** when I ask.
Or is there maybe a better solution to my problem?
(I am a proponent of using standard system components rather than programming one-off solutions which will need support in the future.)

How does the OS find the shared library path in the two kinds of linking: run-time linking (loading) and compile-time linking of a shared library in Linux

I am a little confused about how shared libraries and the OS work together.
1st question: how does the OS manage shared libraries? How are they identified uniquely: by file name, by something else (say, an ID), or by full path?
2nd question: I know that when we compile and link code, the linker needs to access the shared library (.so) to perform linking; later, when we execute the compiled program, the OS loads the shared library, and these libraries may be in a different location (am I wrong?). BUT I do not understand how the OS knows where to look for the shared library. Is library information (name? path? or what?) encoded in the executable?
When compiling a program, libraries (other than the language runtime) must be explicitly specified in the build, otherwise they will not be included. There are some standard library directories, so for example you can specify -lfoo, and it will automatically look for libfoo.a or libfoo.so in the various usual directories like /usr/lib, /usr/local/lib etc.
Note, however, that a name like libfoo.so is usually a symlink to the actual library file name, which might be something like libfoo.so.1. This way, if there needs to be a backward-incompatible change to the ABI (the layout of some structure might change, say), then the new version of the library becomes libfoo.so.2, and binaries linked against the old version remain unaffected.
So the linker follows the symlink, and inserts a reference to the versioned name libfoo.so.1 into the executable, instead of the unversioned name libfoo.so. It can also insert the full path, but this is usually not done. Instead, when the executable is run, there is a system search path, as configured in your systemwide /etc/ld.so.conf, that is used to find the library.
(Actually, ld.so.conf is just the human-readable source for your library search paths; this is compiled into binary form in /etc/ld.so.cache for speed, using the ldconfig command. This is why you need to run ldconfig every time you make changes to the shareable libraries on your system.)
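To see which files the run-time search actually resolved to for a running process, one option on glibc is dl_iterate_phdr, which calls back once per loaded object. The sketch below is only an illustration; the function names list_loaded_objects and print_object are made up here, and the call is a GNU/Linux extension rather than part of POSIX.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Callback invoked by dl_iterate_phdr once for each loaded object. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    const char *name = info->dlpi_name;

    /* The main program is reported with an empty name. */
    printf("%s\n", (name != NULL && name[0] != '\0')
                       ? name
                       : "(main program or unnamed object)");
    ++*count;
    return 0;                          /* 0 means: keep iterating */
}

/* Hypothetical helper: prints every object mapped into this process and
 * returns how many there were. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

In a dynamically linked program the output typically shows full paths such as the resolved location of libc, even though the executable itself only records the versioned name.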
That’s a very simplified explanation of what is going on. There is a whole lot more that is not covered here. Here and here are some reference docs that might be useful on the build process. And here is a description of the system executable loader.

Creating ELF binaries without using libelf or other libraries

Recently I tried to write a simple compiler on the linux platform by myself.
When it comes to the backend of the compiler, I decided to generate ELF-formatted binaries without using a third-party library, such as libelf.
Instead I want to try to write machine code directly into the file, laid out according to the ELF ABI, just by using the write() function and controlling all details of the ELF file.
The advantage of this approach is that I can control everything for my compiler.
But I am hesitating. Is that way feasible, considering how detailed the ELF ABI is?
I would welcome any suggestions and pointers to good available resources.
How easy/feasible this is depends on what features you want to support. If you want to use dynamic linking, you have to deal with the symbol table, relocations, etc. And of course if you want to be able to link with existing libraries, even static ones, you'll have to support whatever they need. But if your goal is just to make standalone static ELF binaries, it's really very easy. All you need is a main ELF header (100% boilerplate) and 2 PT_LOAD program headers: one to load your program's code segment, the other to load its data segment. In theory they could be combined, but security-hardened kernels do not allow a given page to be both writable and executable, so it would be smart to separate them.
Some suggested reading:
http://www.linuxjournal.com/article/1059
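To make the "ELF header plus PT_LOAD program header" layout from the answer above concrete, here is a minimal sketch that builds such an image in a buffer. It makes several assumptions that are not in the answer: the target is x86-64 Linux, the host is little-endian, the load address is 0x400000, and the program is just "exit(42)". Because this toy program has no data, it uses a single read+execute PT_LOAD segment instead of the two the answer recommends; a real compiler would add a second, writable segment for data. The name build_minimal_elf is invented for this example.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define LOAD_ADDR 0x400000u   /* assumed virtual address of the segment */

/* Hypothetical helper: fills `out` with a tiny static x86-64 ELF executable
 * and returns its size in bytes, or 0 if `cap` is too small. */
size_t build_minimal_elf(unsigned char *out, size_t cap)
{
    static const unsigned char code[] = {
        0xbf, 0x2a, 0x00, 0x00, 0x00,  /* mov edi, 42   (exit status)   */
        0xb8, 0x3c, 0x00, 0x00, 0x00,  /* mov eax, 60   (exit syscall)  */
        0x0f, 0x05                     /* syscall                       */
    };
    const size_t code_off = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr);
    const size_t total = code_off + sizeof code;
    if (cap < total)
        return 0;

    Elf64_Ehdr eh;                     /* main ELF header: boilerplate */
    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);
    eh.e_ident[EI_CLASS] = ELFCLASS64;
    eh.e_ident[EI_DATA] = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_ident[EI_OSABI] = ELFOSABI_SYSV;
    eh.e_type = ET_EXEC;
    eh.e_machine = EM_X86_64;
    eh.e_version = EV_CURRENT;
    eh.e_entry = LOAD_ADDR + code_off; /* first instruction */
    eh.e_phoff = sizeof(Elf64_Ehdr);   /* program header follows directly */
    eh.e_ehsize = sizeof(Elf64_Ehdr);
    eh.e_phentsize = sizeof(Elf64_Phdr);
    eh.e_phnum = 1;

    Elf64_Phdr ph;                     /* one loadable code segment */
    memset(&ph, 0, sizeof ph);
    ph.p_type = PT_LOAD;
    ph.p_flags = PF_R | PF_X;
    ph.p_offset = 0;                   /* map the file from its start */
    ph.p_vaddr = LOAD_ADDR;
    ph.p_paddr = LOAD_ADDR;
    ph.p_filesz = total;
    ph.p_memsz = total;
    ph.p_align = 0x1000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, &ph, sizeof ph);
    memcpy(out + code_off, code, sizeof code);
    return total;
}
```

Writing the returned bytes to a file with write() and marking it executable should give a program that exits with status 42 on x86-64 Linux. No section headers are needed for the kernel to load it.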

Linux: Is it possible to make some plugin oriented programming using statically linked binaries?

Assume we have a very small embedded system consisting only of the Linux kernel and a single statically linked binary run as init. We want the binary to be able to dynamically load external plugins at runtime.
Is it possible on Linux? Dlopen only works with shared libraries and dynamic linking cause static binaries don't export any symbols to the outside world, so is there any other way to do it?
You could run the "plugins" as child processes, and communicate over IPC (shared memory, pipes, or so forth).
They would exist in their own process space, so you couldn't directly call functions in them (besides, if they're also statically linked, you won't have any function entry points other than main that you could reach), but you could (e.g.) send a command over a named pipe, or pass data in a shared memory structure.
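A minimal sketch of the child-process approach, using a pair of anonymous pipes, might look like this. Everything here is illustrative: ask_plugin and plugin_main are invented names, the "plugin" is a function run in a forked child rather than a separate executable started with exec, and its made-up job is to upper-case the request. It is only suitable for small requests, since the parent writes the whole request before reading anything back.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the plugin binary: reads bytes, replies in upper case. */
static void plugin_main(int in_fd, int out_fd)
{
    char c;
    while (read(in_fd, &c, 1) == 1) {
        c = (char)toupper((unsigned char)c);
        if (write(out_fd, &c, 1) != 1)
            break;
    }
}

/* Hypothetical helper: sends `request` to a freshly forked plugin process
 * and stores its NUL-terminated reply in `reply` (at most cap-1 bytes).
 * Returns the reply length, or -1 on failure. */
int ask_plugin(const char *request, char *reply, size_t cap)
{
    int to_child[2], from_child[2];
    if (cap == 0 || pipe(to_child) != 0)
        return -1;
    if (pipe(from_child) != 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                    /* child: the plugin's own process */
        close(to_child[1]);
        close(from_child[0]);
        plugin_main(to_child[0], from_child[1]);
        _exit(0);
    }

    close(to_child[0]);                /* parent keeps the other two ends */
    close(from_child[1]);

    size_t len = strlen(request);
    int ok = write(to_child[1], request, len) == (ssize_t)len;
    close(to_child[1]);                /* EOF marks the end of the request */

    size_t got = 0;
    ssize_t n;
    while (got < cap - 1 &&
           (n = read(from_child[0], reply + got, cap - 1 - got)) > 0)
        got += (size_t)n;
    reply[got] = '\0';

    close(from_child[0]);
    waitpid(pid, NULL, 0);             /* reap the child */
    return ok ? (int)got : -1;
}
```

In a real system the child would call one of the exec functions to start the plugin binary, and a named pipe or shared memory region could replace the anonymous pipes.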
Note that, the moment you load the second binary, you have lost one of the main benefits of static linking (because now you have two copies of your libc loaded), so you might want to consider just biting the bullet and using dynamic linking. You'll burn a few 100K's in adding the dynamic linking support, but the GNU libc is about 2M, so if you're loading one plug-in, you've gained maybe 1.8M in savings already; and for each additional plug-in you load, you're saving some 2M.
Dlopen only works with shared libraries and dynamic linking cause static binaries don't export any symbols to the outside world
You can dlopen a shared library from a statically linked binary when using glibc. If you need your plugin to reference symbols from the main executable, you would have to pass in pointers to them into the plugin, similar to this.
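The idea of passing pointers into the plugin can be sketched as a table of function pointers that the host hands to the plugin's entry point. All names below (host_api, plugin_init_fn, example_plugin_init, load_plugin) are hypothetical; in a real setup the init function would live in the shared library and its address would come from dlsym, whereas here it is an ordinary function so that the sketch stays self-contained.

```c
#include <stddef.h>

/* Table of services the main executable offers to plugins. */
struct host_api {
    int api_version;
    int (*add_counter)(int delta);     /* implemented in the host */
};

/* Signature every plugin's entry point is assumed to have. */
typedef int (*plugin_init_fn)(const struct host_api *api);

/* Host-side implementation that the plugin cannot link against directly. */
static int counter;
static int host_add_counter(int delta)
{
    counter += delta;
    return counter;
}

/* Stand-in for a plugin's init function (normally found via dlsym). */
int example_plugin_init(const struct host_api *api)
{
    if (api == NULL || api->api_version != 1)
        return -1;                     /* refuse an unknown host version */
    return api->add_counter(5);        /* call back into the host */
}

/* Host side: build the table and hand it to the plugin. */
int load_plugin(plugin_init_fn init)
{
    const struct host_api api = { 1, host_add_counter };
    return init(&api);
}
```

Because the plugin only ever calls through the table, it needs no symbols exported from the static executable.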
is there any other way to do it?
You could also write your own module loader. The Linux kernel does this, and so does Xorg.

How do runtime loadable kernel modules know the addresses of core kernel functions?

I would be interested in answers for both Linux and NT (or any other for that matter)
Edit:
Thanks Laurion for the answer.
More information here:
http://www.symantec.com/connect/articles/dynamic-linking-linux-and-windows-part-one
http://www.symantec.com/connect/articles/dynamic-linking-linux-and-windows-part-two
The runtime loader normally fixes up references to imported functions when the module is loaded. It looks at the table of imported functions and puts in the proper address. The module uses the imported functions through an indirection table.
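The fix-up step can be illustrated with a toy model: an import table of names with empty address slots, and an export table to resolve them against. This is not the real PE or ELF on-disk format, and all names (import_slot, export_entry, resolve_imports, core_exports) are invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(int);

struct export_entry { const char *name; func_ptr addr; };  /* provider side */
struct import_slot  { const char *name; func_ptr addr; };  /* module side   */

/* Toy loader step: fill each import slot with the address of the export
 * that has the same name. Returns the number of unresolved imports. */
size_t resolve_imports(struct import_slot *imports, size_t n_imports,
                       const struct export_entry *exports, size_t n_exports)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < n_imports; i++) {
        imports[i].addr = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                imports[i].addr = exports[j].addr;
                break;
            }
        }
        if (imports[i].addr == NULL)
            unresolved++;
    }
    return unresolved;
}

/* Example "core" functions and their export table. */
static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

const struct export_entry core_exports[] = {
    { "twice",  twice  },
    { "negate", negate },
};
```

After resolution the module calls imports[i].addr(...) through the table, which is the indirection described above; a real loader would refuse to run the module if any import stayed unresolved.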
Having written loaders for both the Windows kernel and Windows userspace before: it works the same way. Essentially all binaries have something called the IAT (Import Address Table; see e.g. http://msdn.microsoft.com/en-us/magazine/cc301808.aspx, the eternal classic paper). When the loader has allocated memory for the DLL, it copies the DLL there, reads the DLL's IAT to find all the symbols it needs (by name), looks those names up in the export section of the Windows core DLLs (e.g. kernel32.dll), and fills the IAT in with the addresses it finds. All the needed files have to be read and all the addresses filled in before the DLL can continue execution.
Linux works the same way too, be it userspace or kernel. ELF calls the corresponding structure the relocation table.
http://www.bravegnu.org/gnu-eprog/linker.html
Hope that helps :-) (the details are similar for the x86 arch).
