For example, I have a function func():
int func (int a, int b) {return a + b;}
Now I want to write it to a file, so that I can use the mmap system call to load it with PROT_EXEC and call it from another program. What should I do?
If you know what signature you need and have either a static library or the location of a shared library at compile time, you probably just want to include the header and link against the library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) for loading the library dynamically and retrieving the address of the function by name.
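For the dlopen / dlsym route on a UNIX-like system, a minimal sketch might look like this (the library name libfoo.so and the function name func are placeholders for whatever you actually need):

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Load the library at runtime. */
    void *lib = dlopen("libfoo.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up the symbol by name and cast it to the expected signature. */
    int (*func)(int, int) = (int (*)(int, int))dlsym(lib, "func");
    if (!func) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(lib);
        return 1;
    }

    printf("func(2, 3) = %d\n", func(2, 3));
    dlclose(lib);
    return 0;
}

Compile with something like gcc main.c -ldl (older glibc needs the explicit -ldl; newer versions fold libdl into libc).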
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to link statically (assuming you distribute updates when there are security fixes for your dependencies), since this avoids breakage from incompatible version updates and also spares your users dependency hell (you require version A but some other application requires version B).
Modular architectures are often better (and more securely) achieved through inter-process communication (IPC). Dynamically loaded libraries live in the process of the program that loads them, which gives them access to the entire process's virtual memory space. With inter-process communication, each component is a separate process and has access only to the information given to it explicitly by the calling process, which makes it more difficult for a malicious component to steal data from the caller or other components, or to cause instability.
The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, we'd have to do something similar to what the linker does: load the code at the particular address it was built for and run it. (There is also position-independent code, which can run at any address.)
Let's first get our function compiled and linked, then output into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options - gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently with the linker command being /BINARY for instance)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32 bit mode) to confirm it looks right - I see an add instruction there, so looks good to me.
If you loaded that, mmap'd it, and then called it... it should work.
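Here is a rough sketch of that step, assuming func.binary contains just the single self-contained function with the int func(int, int) signature (error handling kept minimal):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    /* Read the raw machine code produced by the ld command above. */
    int fd = open("func.binary", O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    /* Map it somewhere readable and executable. */
    void *code = mmap(NULL, st.st_size, PROT_READ | PROT_EXEC,
                      MAP_PRIVATE, fd, 0);
    close(fd);
    if (code == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Treat the start of the mapping as our function and call it. */
    int (*func)(int, int) = (int (*)(int, int))code;
    printf("func(2, 3) = %d\n", func(2, 3));

    munmap(code, st.st_size);
    return 0;
}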
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!
According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same object handle is returned. The dynamic linker maintains reference counts for object handles, so a dynamically loaded shared object is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it. Any initialization returns (see below) are called just once. However, a subsequent dlopen() call that loads the same shared object with RTLD_NOW may force symbol resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not get very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode?
the contents of the library?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
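For illustration, a small sketch of that experiment (libfoo_a.so is an arbitrary library and libfoo_b.so a filesystem copy of it; both names are hypothetical):

#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Opening the same path twice yields the same (reference-counted) handle. */
    void *h1 = dlopen("./libfoo_a.so", RTLD_NOW);
    void *h2 = dlopen("./libfoo_a.so", RTLD_NOW);
    assert(h1 && h2 && h1 == h2);

    /* A filesystem copy of the same library yields a distinct handle,
       with its own relocated copy of the library's globals. */
    void *h3 = dlopen("./libfoo_b.so", RTLD_NOW);
    assert(h3 && h3 != h1);

    printf("h1=%p h2=%p h3=%p\n", h1, h2, h3);
    return 0;
}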
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer: no, refactoring the code is definitely not an option. It would involve months or even years of work and would potentially sacrifice all the benefits of using the code in the first place. There is an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem not to believe that the use case is valid: I am working on a purely functional language that is to be embedded into a larger C/C++ application. Because I needed a prototype with a garbage collector, a proven typechecker, and reasonable performance as soon as possible, I used OCaml as intermediate code. Right now, I compile a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library together with the OCaml runtime, and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variables (e.g. the pointer to the young generation), and that is, or rather should be, totally fine. The library exposes exactly two functions: an initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. When I load the library, its internal symbols are relocated as expected; the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local storage: that is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit load instructions for TLS symbols (yet?).
POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. Except that the copies will still expose the same global symbols, and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).
I was searching for exactly this question. I found this link https://hev.cc/2512.html, which does exactly what I want, but there is no explanation of what is going on. I am also confused about whether a shared library without a main() can be made executable, and if so, how. I can guess that I have to provide a global main(), but I don't know the details. Any further easy reference and guidance is much appreciated.
I am working on x86-64 64 bit Ubuntu with kernel 3.13
This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as its equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and on systems that operate that way, to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable you must also define (and generate code for) a task which can be started from a single entry point.
The code you linked to is starting with the source code to the library and explicitly codes a main() which it invokes via the entry point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library ( in the absence of security features to prevent this in any given OS ), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.
I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possibly be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ;, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.
To create a shared (dynamic) library, an example:
Suppose there are three object files: sum.o, mul.o and print.o, and the shared library is to be called libmno.so. Build it with
cc -shared -o libmno.so sum.o mul.o print.o
and compile the program that uses it with
cc main.c ./libmno.so
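For completeness, a minimal sketch of a main.c that could be compiled that way; the sum/mul/print signatures are assumptions, since only the object file names are given above:

/* main.c -- assumed signatures for the functions in libmno.so */
int sum(int a, int b);
int mul(int a, int b);
void print(int value);

int main(void) {
    print(sum(2, 3));   /* from sum.o and print.o */
    print(mul(2, 3));   /* from mul.o */
    return 0;
}

Note that when the library is linked by path like this and has no soname, the executable records that path, so at run time it must still be found at ./libmno.so (or via LD_LIBRARY_PATH).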
I want to be able to generate C code dynamically and re-load it quickly into my running C program.
I am on Linux, how could this be done?
Can a library .so file on Linux be re-compiled and reloaded at runtime?
Could it be compiled without producing a .so file, could the compiled output somehow go to memory and then be reloaded ? I want to reload the compiled code quickly.
What you want to do is reasonable, and I am doing exactly that in MELT (a high-level domain-specific language to extend GCC; MELT is compiled to C, through a translator itself written in MELT).
First, when generating C code (or many other source languages), a good advice is to keep some sort of abstract syntax tree (AST) in memory. So build first the entire AST of the generated C code, then emit it as C syntax. Don't think of your code generation framework without an explicit AST (in other words, generation of C code with a bunch of printf is a maintenance nightmare, you want to have some intermediate representation).
Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about performance of the generated code (and TCC compiles very quickly C into a very naive and slow machine code) you could use some other approaches, e.g. using some JIT libraries like Gnu lightning (very quick generation of slow machine code), Gnu Libjit or ASMJIT (generated machine code is a bit better), LLVM or GCCJIT (good machine code generated, but generation time comparable to a compiler).
So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you probably would fork a gcc -O -fPIC -shared command to make some shared object foo.so out of your generated foo.c). By experience, generating C code takes much less time than compiling it (with gcc -O). In MELT, the generation of C code is more than 10x faster than its compilation by GCC (and usually 30x faster). But the optimizations done by a C compiler are worth it.
Once you have emitted your C code and forked its compilation into a .so shared object, you can dlopen it. Don't be shy: my manydl.c example demonstrates that on Linux you can dlopen a very large number of shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to dlclose on Linux (unless you are coding a server program that needs to run for months); an unused shared module can stay dlopen-ed practically forever, and you are mostly leaking process address space (which is a cheap resource), since most of that unused .so would be swapped out. dlopen is done quickly; what takes time is the compilation of a C source, because you really want the optimization to be done by the C compiler.
You could use many other different approaches, e.g. have a bytecode interpreter and generate bytecode for it, use Common Lisp (e.g. SBCL on Linux, which compiles dynamically to machine code), LuaJit, Java, MetaOcaml, etc.
As others suggested, you don't care much about the time to write a C file, and it will stay in filesystem cache in practice (see also this). And writing it is much faster than compiling it, so staying in memory is not worth the trouble. Use some tmpfs if you are concerned by I/O times.
addenda
You asked
Can a library .so file on Linux be re-compiled and re-loaded at runtime?
Of course yes: you should fork a command to build the library from the generated C code (e.g. a gcc -O -fPIC -shared generated.c -o generated.so, but you could do it indirectly, e.g. by running a make -j, especially if the generated.so is big enough to make it relevant to split generated.c into several generated C files!) and then dynamically load your library with dlopen (giving it a full path like /some/file/path/to/generated.so, and probably the RTLD_NOW flag), and you have to use dlsym to find the relevant symbols inside. Don't think of re-loading (a second time) the same generated.so; it is better to emit a unique generated1.c (then generated2.c, etc.) C file, then to compile it to a unique generated1.so (the second time generated2.so, etc.), then to dlopen it (and this can be done many hundreds of thousands of times). You may want to have, in the emitted generated*.c files, some constructor functions which would be executed at dlopen time of the generated*.so.
Your base application program should have defined a convention about the set of dlsym-ed names (usually functions) and how they are called. It should only directly call functions in your generated*.so through dlsym-ed function pointers. In practice you would decide, for example, that each generated*.c defines a function void dynfoo(int) and int dynbar(int,int), use dlsym with "dynfoo" and "dynbar", and call these through function pointers (returned by dlsym). You should also define conventions of how and when these dynfoo and dynbar would be called. You should link your base application with -rdynamic so that your generated*.c files can call your application's functions.
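A condensed sketch of that whole cycle (using the dynfoo convention above; the file names, the generation step, and the gcc command line are placeholders):

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed convention: every generated module defines void dynfoo(int). */
typedef void (*dynfoo_t)(int);

dynfoo_t load_generated(int counter) {
    char c_file[64], so_file[64], cmd[256];
    snprintf(c_file, sizeof c_file, "generated%d.c", counter);
    snprintf(so_file, sizeof so_file, "./generated%d.so", counter);

    /* ... emit the C code (from your AST) into c_file here ... */

    /* Fork a compilation command, as described above. */
    snprintf(cmd, sizeof cmd, "gcc -O -fPIC -shared %s -o %s", c_file, so_file);
    if (system(cmd) != 0)
        return NULL;

    void *handle = dlopen(so_file, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    /* Call the module only through the dlsym-ed pointer. */
    return (dynfoo_t)dlsym(handle, "dynfoo");
}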
You don't want your generated*.so to re-define existing names. For instance, you don't want to redefine malloc in your generated*.c and expect all heap allocation functions to magically use your new variant (that probably won't work, and if even if it did, it would be dangerous).
You probably won't bother to dlclose a dynamically loaded shared object, except at application clean-up and exit time (but I don't bother at all to dlclose). If you do dlclose some dynamically loaded generated*.so file, be sure that nothing in it is still in use: no pointers into it, not even return addresses in call frames, may still exist.
P.S. the MELT translator is currently 57KLOC of MELT code translated to nearly 1770KLOC of C code.
Your best bet's probably the TCC compiler, which allows you to do exactly this --- compile source code, add it to your program, run it, all without touching files.
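A rough sketch of that in-memory approach with libtcc (the exact libtcc API differs a bit between TCC versions, so treat this as an outline rather than a reference):

#include <libtcc.h>
#include <stdio.h>

int main(void) {
    const char *code = "int add(int a, int b) { return a + b; }";

    TCCState *s = tcc_new();
    tcc_set_output_type(s, TCC_OUTPUT_MEMORY);   /* compile straight into memory */

    if (tcc_compile_string(s, code) == -1)
        return 1;
    if (tcc_relocate(s, TCC_RELOCATE_AUTO) < 0)  /* resolve symbols in place */
        return 1;

    int (*add)(int, int) = (int (*)(int, int))tcc_get_symbol(s, "add");
    if (add)
        printf("add(2, 3) = %d\n", add(2, 3));

    tcc_delete(s);
    return 0;
}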
For a more robust but non-C-based solution, you should probably check out the LLVM project, which does much the same thing but from the perspective of producing JITs. You don't get to go via C, instead using a kind of abstract portable machine code, but the generated code is loads faster and it's under more active development.
OTOH if you want to do it all manually by shelling out to gcc, compiling a .so and then loading it yourself, dlopen() and dlclose() will do what you want.
Are you sure C is the right answer here? There are various interpreted languages such as Lua, Bigloo Scheme, or perhaps even Python that embed very well into an existing C application. You can write the dynamic parts using the extension language, which will support reloading code at runtime.
The obvious disadvantage is performance - if you absolutely need the raw speed of compiled C then these may be a no-go.
If you want to reload a library dynamically, you can use the dlopen function (see the man pages). It opens a library .so file and returns a void* handle to it; then you can get a pointer to any function/variable of your library with dlsym.
To compile your libraries in memory, well, the best thing I think you can do is to create a memory filesystem as described here.
I have performance critical code written for multiple CPUs. I detect CPU at run-time and based on that I use appropriate function for the detected CPU. So, now I have to use function pointers and call functions using these function pointers:
void do_something_neon(void);
void do_something_armv6(void);
void (*do_something)(void);
if (cpu == NEON) {
    do_something = do_something_neon;
} else {
    do_something = do_something_armv6;
}
//Use function pointer:
do_something();
...
Not that it matters, but I'll mention that I have optimized functions for different CPUs: ARMv6, and ARMv7 with NEON support. The problem is that by using function pointers in many places the code becomes slower, and I'd like to avoid that problem.
Basically, at load time the linker resolves relocations and patches the code with function addresses. Is there a way to control that behavior better?
Personally, I'd propose two different ways to avoid function pointers: create two separate .so files (or DLLs) for the CPU-dependent functions, place them in different folders, and based on the detected CPU add one of these folders to the search path (or LD_LIBRARY_PATH). Then load the main code, and the dynamic linker will pick up the required library from the search path. The other way is to compile two separate copies of the library :)
The drawback of the first method is that it forces me to have at least 3 shared objects (dll's): two for the cpu dependent functions and one for the main code that uses them. I need 3 because I have to be able to do CPU detection before loading code that uses these cpu dependent functions. The good part about the first method is that the app won't need to load multiple copies of the same code for multiple CPUs, it will load only the copy that will be used. The drawback of the second method is quite obvious, no need to talk about it.
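For what it's worth, the first method could be sketched as a tiny launcher process, roughly like this (all paths and the cpu_has_neon() helper are hypothetical):

/* launcher.c -- hypothetical sketch of the search-path approach */
#include <stdlib.h>
#include <unistd.h>

extern int cpu_has_neon(void);   /* assumption: your existing CPU detection */

int main(int argc, char **argv) {
    (void)argc;
    /* Point the dynamic linker at the directory holding the matching build
       of the CPU-dependent library, then exec the real application, which
       is linked against that library by name. */
    setenv("LD_LIBRARY_PATH",
           cpu_has_neon() ? "/opt/app/lib/neon" : "/opt/app/lib/armv6", 1);
    execv("/opt/app/bin/app-main", argv);
    return 1;   /* reached only if execv() failed */
}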
I'd like to know if there is a way to do that without using shared objects and manually loading them at runtime. One of the ways would be some hackery that involves patching code at run time (it's probably too complicated to get done properly). Is there a better way to control relocations at load time? Maybe place the CPU-dependent functions in different sections and then somehow specify which section has priority? I think the Mac's Mach-O format has something like that.
An ELF-only solution (for an ARM target) is enough for me; I don't really care about PE (DLLs).
thanks
You may want to lookup the GNU dynamic linker extension STT_GNU_IFUNC. From Drepper's blog when it was added:
Therefore I’ve designed an ELF extension which allows to make the decision about which implementation to use once per process run. It is implemented using a new ELF symbol type (STT_GNU_IFUNC). Whenever the a symbol lookup resolves to a symbol with this type the dynamic linker does not immediately return the found value. Instead it is interpreting the value as a function pointer to a function that takes no argument and returns the real function pointer to use. The code called can be under control of the implementer and can choose, based on whatever information the implementer wants to use, which of the two or more implementations to use.
Source: http://udrepper.livejournal.com/20948.html
Nonetheless, as others have said, I think you're mistaken about the performance impact of indirect calls. All code in shared libraries will be called via a (hidden) function pointer in the GOT and a PLT entry that loads/calls that function pointer.
For the best performance you need to minimize the number of indirect calls (through pointers) per second and allow the compiler to optimize your code better (DLLs hamper this because there must be a clear boundary between a DLL and the main executable and there's no optimization across this boundary).
I'd suggest doing these:
moving as much of the main executable's code that frequently calls DLL functions into the DLL. That'll minimize the number of indirect calls per second and allow for better optimization at compile time too.
moving almost all your code into separate CPU-specific DLLs and leaving to main() only the job of loading the proper DLL OR making CPU-specific executables w/o DLLs.
Here's the exact answer that I was looking for.
GCC's __attribute__((ifunc("resolver")))
It requires fairly recent binutils.
There's a good article that describes this extension: Gnu support for CPU dispatching - sort of...
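A minimal sketch of how that attribute is typically used (cpu_has_neon() is a stand-in for real CPU detection):

void do_something_neon(void);
void do_something_armv6(void);

extern int cpu_has_neon(void);   /* stand-in for real CPU detection */

/* The resolver runs once, at symbol-resolution time, and returns the
   address that all later calls to do_something() will use directly. */
static void (*resolve_do_something(void))(void)
{
    return cpu_has_neon() ? do_something_neon : do_something_armv6;
}

void do_something(void) __attribute__((ifunc("resolve_do_something")));

After that, callers just call do_something() as a normal function; no user-visible function pointer is left in the code.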
Lazy loading ELF symbols from shared libraries is described in section 1.5.5 of Ulrich Drepper's DSO How To (updated 2011-12-10). For ARM it is described in section 3.1.3 of ELF for ARM.
EDIT: Regarding the STT_GNU_IFUNC extension mentioned by R.: I forgot that it was an extension. GNU Binutils has supported it for ARM apparently since March 2011, according to the changelog.
If you want to call functions without the indirection of the PLT, I suggest function pointers or per-arch shared libraries inside which function calls don't go through PLTs (beware: calling an exported function goes through the PLT).
I wouldn't patch the code at runtime. I mean, you can. You can add a build step: after compilation disassemble your binaries, find all offsets of calls to functions that have multi-arch alternatives, build table of patch locations, link that into your code. In main, remap the text segment writeable, patch the offsets according to the table you prepared, map it back to read-only, flush the instruction cache, and proceed. I'm sure it will work. How much performance do you expect to gain by this approach? I think loading different shared libraries at runtime is easier. And function pointers are easier still.
How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void* mylib = dlopen("/path/to/library.so",RTLD_NOW);
printf = dlsym(mylib,"printf");
AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.
There is no clean solution but it is possible. I see two options:
Overwrite printf function prolog with jump to your replacement function.
It is quite a popular solution for function hooking in MS Windows. You can find examples of function hooking by code rewriting on Google.
Rewrite ELF relocation/linkage tables.
See this article on CodeProject that does almost exactly what you are asking, but only in the scope of dlopen()'ed modules. In your case you also want to edit your main (typically non-PIC) module. I didn't try it, but maybe it's as simple as calling the provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails, you'll have to read the source of your dynamic linker to figure out what needs to be adapted.
It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
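Concretely, that approach looks something like this (my_printf is of course whatever your own replacement does):

#include <stdio.h>    /* include the system headers first */
#include <stdarg.h>

static int my_printf(const char *fmt, ...);

/* Only after the system headers: make the name "printf" mean our version
   within this translation unit. */
#undef printf
#define printf my_printf

static int my_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    fputs("[mine] ", stderr);
    int ret = vfprintf(stderr, fmt, ap);   /* may still call the real printf family */
    va_end(ap);
    return ret;
}

int main(void) {
    printf("hello %d\n", 42);   /* expands to my_printf(...) */
    return 0;
}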
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.
You can't change that. In the general *NIX linking model (or rather lack of one), a symbol is picked from the first object in which it is found. (Except for oddball AIX, which by default works more like OS/2.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT). See man dlsym for more. Though it gets out of hand quite quickly, which is why it is rarely used.
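For reference, a tiny sketch of the dlsym route (RTLD_DEFAULT / RTLD_NEXT need _GNU_SOURCE with glibc):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Look the symbol up the way the dynamic linker would resolve it. */
    int (*p)(const char *, ...) =
        (int (*)(const char *, ...))dlsym(RTLD_DEFAULT, "printf");
    if (p)
        p("resolved printf at %p\n", (void *)p);
    return 0;
}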
There is an environment variable LD_LIBRARY_PATH in which the dynamic linker searches for shared libraries; prepend your path to LD_LIBRARY_PATH. I hope that would work.
Store the dlsym() result in a lookup table (array, hash table, etc). Then #undef printf and #define printf to use your lookup-table version.