I have written a custom library which implements malloc/calloc/realloc/free using the standard C prototypes, and I figured out how to compile it to an so. I want to test the library by linking a standard application against it? What would be a good way to do this? Once I have a working library I assume I can just load it with LD_PRELOAD, but how do I get my functions to co-exist with but take precedence over the system library ones? My functions need to make a call to malloc in order to get memory to run, so I can't just completely ditch stdlib... Help?
Functions that you are trying to replace are standard C functions, not macros, not system calls. So you have to simply give your functions the same names and compile them into a shared library.
Then, use LD_PRELOAD to pre-load your library before binary starts. Since all addresses are resolved once, linker will figure out addresses of your functions and remember their names and will not look for them in standard library later.
This approach might not work if your program is linked with the standard runtime statically. Also, it will not work on Mac OS X as there is another API for interpolation.
In Linux, for example, in order for your functions to co-exist (i.e. if you want to use system malloc in your own implementation of malloc), you have to open the standard library manually using dlopen, look up functions you need there using dlsym and call them later by address.
Don't write your malloc() in terms of malloc() -- write it using sbrk, which gets memory directly from the OS.
If you have control of the source code that is to use this library, here is one possibility. Use different function names: Rather than malloc, for example, call it newCoolMalloc. This method is sometimes simpler and doesn't depend on special linker options.
Then in your code, use #define to cause the code to call the desired set of functions. You can #define malloc to be something different. For example:
#define malloc newCoolMalloc
#define free newCoolFree
If you do that, though, you have to be very very careful to include that consistently. Otherwise you run the risk of using stdlib malloc in one place and then your own free in another leading to messy bugs. One way to help mitigate that situation is to (if possible) in your own code use custom names for the allocation and free functions. Then it is easier to ensure that the correct one is being called. You can define the various custom names to your own malloc functions or even the original stdlib malloc functions.
For example, you might use mallocPlaceHolder as the actual name in the code:
someThing = mallocPlaceHolder( nbytes );
Then your defines would look more like this:
#define mallocPlaceHolder myCoolMalloc
If no function of the form mallocPlaceHolder (and associated free) actually exist, it avoids mixing different libraries.
Related
I'd like to provide an implementation of malloc for newlib-nano when using it with gcc. In my situation, I have some source file, say main.c, that calls strftime. The newlib-nano implementation of strftime uses malloc. In a header file, my_memory.h, I've declared a function void *malloc(size_t size) and provided an implementation in a corresponding my_memory.c file.
When linking the project using gcc, the linker fails at .../libc_nano.a(liba-malloc.o) because of multiple definitions of malloc. The behavior I'd like is for the linker to take my implementation of malloc rather than pulling newlib-nano's, but to retain using newlib-nano's implementation of other standard library functions, e.g. memset.
I've searched around for an "exclude object file from static library" option in gcc to try to exclude libc_nano.a(liba-malloc.o) but with no luck. Note that the compiler is pulling in this object file and I don't have access to the compiler's libc_nano.a to patch liba-malloc.o with my own object file.
Anyway, am I missing something, or is it not possible to achieve what I'm trying to achieve?
Likely liba-malloc.o contains other allocator function definitions like calloc, free, realloc, etc. and thus is getting pulled in for linking because of references to one of them. You can see this with the -t option to ld (pass -Wl,-t on gcc command line when linking to use it). If this is the case, you can avoid linking it just by ensuring you've provided definitions of all these functions yourself.
A better idea might be getting rid of the malloc dependency by using a different strftime. It's rather ridiculous for strftime, especially an embedded-oriented implementation, to be calling malloc; it has no fundamental need to and I'm somewhat baffled how they found a way to make malloc useful to it. Aside from some tie-in with locale which could be extricated fairly easily, musl libc's strftime.c (disclosure: author=me) is very self-contained and could probably serve as a drop-in replacement.
So libgcc will use the application's malloc and free, if it has defined one, in order to satisfy the application's need to call free() after certain library calls, eg realpath.
Within my dynamic library, I really don't want to use that application malloc/free, because I don't trust it generally - I'm happy to use libgcc's implementation of malloc, which is what most applications use, so I declared my own malloc() function that calls libgcc's implementation via dlsym().
All was going well until... I want to call realpath (and perhaps others)! As a simple fix, I need a way to do the equivalent of dlsym() but on the main executable (which I don't own) to get the application's implementation of free, if any. Does such a thing exist?
I know it must, because the dynamic linker "does the right thing", but is it accessible to mere mortal programmers, and how?
In the particular case of realpath, I know I can provide a buffer, but that comes with its own unknown dangers about buffer size. For some other calls I can't do that.
I can also go down the winding path of symbol renaming with objcopy, but I'd prefer not to, if possible.
[laters]
I do take your point on malloc being possibly defined by another dynamic library, and I would want to use that version, while the application still uses its compiled in version (I have seen that it does continue to use it, even if tcmalloc is preloaded, for example.
I guess that extends the question to ask if any library has defined malloc, and if the application has defined malloc, I want to cherry-pick which version of malloc/free I use in each place in my code to match the behaviour of libgcc when necessary, and not when not, so I want to be able to get a reference to them both.
In the short term I have resolved my current issue by replacing realpath() in my code with a version that pre-defines the buffer using my malloc, before calling the libgcc implementation, but I feel this is very much a band-aide.
The interposing malloc can come from anywhere, not just the executable. It could be a shared object referenced by a DT_NEEDED entry in the dynamic section of another object, or a library injected into the process image using LD_PRELOAD.
In general, many libraries have functions which allocate something which then has to be deallocated using free. Software ported to Windows will not do this (because DLLs have separate heaps there), but otherwise, it is not uncommon. There is not just realpath, there is also strdup, asprintf, and probably I bunch of other functions I do not remember.
In your case, you should just call free on such pointers, and use a different name for your own memory deallocation functions. Once malloc has been interposed, it is not possible to safely use the original libc allocator because it has not been properly initialized. For example, if you call the glibc malloc function in a process which uses a different, interposed malloc, then malloc will not be thread-safe: the initialization is not thread-safe because the implementation knows that pthread_create will call malloc before creating the first new thread, thereby initializing malloc while the process is single-threaded. Which is why there is no synchronization in the initialization code.
(libgcc does not provide malloc, by the way. It comes from libc/glibc.)
I want to write a shared library in such a way that it is possible to isolate it’s memory usage from the application it is linked against. That is, if the shared library, let’s call it libmemory.so, calls malloc, I want to maintain that memory in a separate heap from the heap that is used to service calls to malloc made in the application. This question isn't about writing memory allocators, it more about linking and loading the library and application together.
So far I’ve been experimenting with combinations of function interposition, symbol visibility, and linking tricks. So far, I can’t get this right because of one thing: the standard library. I cannot find a way to distinguish between calls to the standard library that internally use malloc that happen in libmemory.so versus the application. This causes an issue since then any standard library usage within libmemory.so pollutes the application heap.
My current strategy is to interpose definitions of malloc in the shared library as a hidden symbol. This works nicely and all of the library code works as expected except, of course, the standard library which is loaded dynamically at runtime. Naturally, I’ve been trying to find a way to statically embed the standard library usage so that it would use the interposed malloc in libmemory.so at compile time. I’ve tried -static-libgcc and -static-libstdc++ without success (and anyway, it seems this is discouraged). Is this the right answer?
What do?
P.s., further reading is always appreciated, and help on the question tagging front would be nice.
I’ve tried -static-libgcc and -static-libstdc++ without success
Of course this wouldn't succeed: malloc doesn't live in libgcc or libstdc++; it lives in libc.
What you want to do is statically link libmemory.so with some alternative malloc implementation, such as tcmalloc or jemalloc, and hide all malloc symbols. Then your library and your application will have absolutely separate heaps.
It goes without saying that you must never allocate something in your library and free it in the application, or vice versa.
In theory you could also link the malloc part of system libc.a into your library, but in practice GLIBC (and most other UNIX C libraries) does not support partially-static link (if you link libc.a, you must not link libc.so).
Update:
If libmemory.so makes use of a standard library function, e.g., gmtime_r, which is linked in dynamically, thereby resolving malloc at runtime, then libmemory.so mistakenly uses malloc provided at runtime (the one apparently from glibc
There is nothing mistaken about that. Since you've hidden your malloc inside your library, there is no other malloc that gmtime_r could use.
Also, gmtime_r doesn't allocate memory, except for internal use by GLIBC itself, and such memory could be cleaned up by __libc_freeres, so it would be wrong to allocate this memory anywhere other than using GLIBC's malloc.
Now, fopen is another example you used, and fopen does malloc memory. Apparently you would like fopen to call your malloc (even though it's not visible to fopen) when called by your library, but call system malloc when called by the application. But how can fopen know who called it? Surely you are not suggesting that fopen walk the stack to figure out whether it was called by your library or by something else?
So, if you really want to make your library never call into system malloc, then you would have to statically link all other libc functions that you use and that may call malloc (and hide them in your library as well).
You could use something like uclibc or dietlibc to achieve that.
I want to modifiy an existing shared library so that it uses different memory management routines depending on the application using the shared library.
(For now) there will be two families of memory management routines:
The standard malloc, calloc etc functions
specialized versions of malloc, calloc etc
I have come up with a potential way of solving this problem (with the help of some people here on SO). There are still a few grey areas and I would like some feedback on my proposal so far.
This is how I intend to implement the modification:
Replace existing calls to malloc/calloc etc with my_malloc/my_calloc etc. These new functions will invoke correctly assigned function pointers instead of calling hard coded function names.
Provide a mechanism for the shared library to initialize the function pointers used by my_malloc etc to point to the standard C memory mgmt routines - this allows me to provide backward compatability to applications which depend on this shared library - so they don't have to be modified as well. In C++, I could have done this by using static variable initialization (for example) - I'm not sure if the same 'pattern' can be used in C.
Introduce a new idempotent function initAPI(type) function which is called (at startup) by the application that need to use different mem mgmt routines in the shared libray. The initAPI() function assigns the memory mgmt func ptrs to the appropriate functions.
Clearly, it would be preferable if I could restrict who could call initAPI() or when it was called - for example, the function should NOT be called after API calls have been made to the library - as this will change the memory mgmt routines. So I would like to restrict where it is called and by whom. This is an access problem which can be solved by making the method private in C++, I am not sure how to do this in C.
The problems in 2 and 3 above can be trivially resolved in C++, however I am constrained to using C, so I would like to solve these issues in C.
Finally, assuming that the function pointers can be correctly set during initialisation as described above - I have a second question, regarding the visibility of global variables in a shared library, accross different processes using the shared library. The function pointers will be implemented as global variables (I'm not too concerned about thread safety FOR NOW - although I envisage wrapping access with mutex locking at some point)* and each application using the shared library should not interfere with the memory management routines used for another application using the shared library.
I suspect that it is code (not data) that is shared between processes using a shlib - however, I would like that confirmed - preferably, with a link that backs up that assertion.
*Note: if I am naively downplaying threading issues that may occur in the future as a result of the 'architecture' I described above, someone please alert me!..
BTW, I am building the library on Linux (Ubuntu)
Since I'm not entirely sure what the question being asked is, I will try to provide information that may be of use.
You've indicated c and linux, it is probably safe to assume you are also using the GNU toolchain.
GCC provides a constructor function attribute that causes a function to be called automatically before execution enters main(). You could use this to better control when your library initialization routine, initAPI() is called.
void __attribute__ ((constructor)) initAPI(void);
In the case of library initialization, constructor routines are executed before dlopen() returns if the library is loaded at runtime or before main() is started if the library is loaded at load time.
The GNU linker has a --wrap <symbol> option which allows you to provide wrappers for system functions.
If you link with --wrap malloc, references to malloc() will redirect to __wrap_malloc() (which you implement), and references to __real_malloc() will redirect to the original malloc() (so you can call it from within your wrapper implementation).
Instead of using the --wrap malloc option to provide a reference to the original malloc() you could also dynamically load a pointer to the original malloc() using dlsym(). You cannot directly call the original malloc() from the wrapper because it will be interpreted as a recursive call to the wrapper itself.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <dlfcn.h>
void * malloc(size_t size) {
static void * (*func)(size_t) = NULL;
void * ret;
if (!func) {
/* get reference to original (libc provided) malloc */
func = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
}
/* code to execute before calling malloc */
...
/* call original malloc */
ret = func(size);
/* code to execute after calling malloc */
...
return ret;
}
I suggest reading Jay Conrod's blog post entitled Tutorial: Function Interposition in Linux for additional information on replacing calls to functions in dynamic libraries with calls to your own wrapper functions.
-1 for the lack of concrete questions. The text is long, could have been written more succintly, and it does not contain a single question-mark.
Now to address your problems:
Static data (what you call "global variables") of a shared library is per-process. Your global variables in one process will not interfere with global variables in another process. No need for mutexes.
In C, you cannot restrict[1] who can call a function. It can be called by anybody who knows its name or has a pointer to it. You can code initAPI() such that it visibly aborts the program (crashes it) if it is not the first library function called. You are library writer, you set the rules of the game, and you have NO obligation towards coders who do not respect the rules.
[1] You can declare the function with static, meaning it can be called by name only by the code within the same translation unit; it can still be called through a pointer by anybody who manages to obtain a pointer to it. Such functions are not "exported" from libraries, so this is not applicable to your scenario.
Achieving this:
(For now) there will be two families of memory management routines:
The standard malloc, calloc etc functions
specialized versions of malloc, calloc etc
with dynamic libraries on Linux is trivial, and does not require the complicated scheme you have concocted (nor the LD_PRELOAD or dlopen suggested by #ugoren).
When you want to provide specialized versions of malloc and friends, simply link these routines into your main executable. Voila: your existing shared library will pick them up from there, no modifications required.
You could also build specialized malloc into e.g. libmymalloc.so, and put that library on the link line before libc, to achieve the same result.
The dynamic loader will use the first malloc it can see, and searches the list starting from the a.out, and proceeding to search other libraries in the same order they were listed on link command line.
UPDATE:
On further reflection, I don't think what you propose will work.
Yes, it will work (I use that functionality every day, by linking tcmalloc into my main executable).
When your shared library (the one providing an API) calls malloc "behind the scenes", which (of possibly several) malloc implementations does it get? The first one that is visible to the dynamic linker. If you link a malloc implementation into a.out, that will be the one.
It's easy enough for you to require that your initialization function is:
called from the main thread
that the client may call it exactly once
and that the client may provide the optional function pointers by parameter
If different applications run in separate processes, it's quite simple to do using dynamic libraries.
The library can simply call malloc() and free(), and applications that want to override it could load another library, with alternative implementations for these libraries.
This can be done with the LD_PRELOAD environment variable.
Or, if your library is loaded with dlopen(), just load the malloc library first.
This is basically what tools such as valgrind, which replace malloc, do.
How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void* mylib = dlopen("/path/to/library.so",RTLD_NOW);
printf = dlsym(mylib,"printf");
AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.
There is no clean solution but it is possible. I see two options:
Overwrite printf function prolog with jump to your replacement function.
It is quite popular solution for function hooking in MS Windows. You can find examples of function hooking by code rewriting in Google.
Rewrite ELF relocation/linkage tables.
See this article on codeproject that does almost exactly what you are asking but only in a scope of dlopen()'ed modules. In your case you want to also edit your main (typically non-PIC) module. I didn't try it, but maybe its as simple as calling provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails you'll have to read source of your dynamic linker to figure out what needs to be adapted.
It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.
You can't change that. In general *NIX linking concept (or rather lack of concept) symbol is picked from first object where it is found. (Except for oddball AIX which works more like OS/2 by default.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT). man dlsym for more. Though it gets out of hand quite quickly. Why is rarely used.
there is an environment variable LD_LIBRARY_PATH where the linker searches for shred libraries, prepend your path to LD_LIBRARY_PATH, i hope that would work
Store the dlsym() result in a lookup table (array, hash table, etc). Then #undef print and #define print to use your lookup table version.