C Shared library: static variable initialization + global variable visibility among processes - c

I want to modify an existing shared library so that it uses different memory management routines depending on the application using the shared library.
(For now) there will be two families of memory management routines:
The standard malloc, calloc etc functions
specialized versions of malloc, calloc etc
I have come up with a potential way of solving this problem (with the help of some people here on SO). There are still a few grey areas and I would like some feedback on my proposal so far.
This is how I intend to implement the modification:
Replace existing calls to malloc/calloc etc with my_malloc/my_calloc etc. These new functions will invoke correctly assigned function pointers instead of calling hard coded function names.
Provide a mechanism for the shared library to initialize the function pointers used by my_malloc etc to point to the standard C memory mgmt routines - this allows me to provide backward compatibility to applications which depend on this shared library - so they don't have to be modified as well. In C++, I could have done this by using static variable initialization (for example) - I'm not sure if the same 'pattern' can be used in C.
Introduce a new idempotent function, initAPI(type), which is called (at startup) by applications that need to use different mem mgmt routines in the shared library. The initAPI() function assigns the memory mgmt func ptrs to the appropriate functions.
Clearly, it would be preferable if I could restrict who could call initAPI() or when it was called - for example, the function should NOT be called after API calls have been made to the library - as this will change the memory mgmt routines. So I would like to restrict where it is called and by whom. This is an access problem which can be solved by making the method private in C++, I am not sure how to do this in C.
The problems in 2 and 3 above can be trivially resolved in C++, however I am constrained to using C, so I would like to solve these issues in C.
Finally, assuming that the function pointers can be correctly set during initialisation as described above - I have a second question, regarding the visibility of global variables in a shared library, across different processes using the shared library. The function pointers will be implemented as global variables (I'm not too concerned about thread safety FOR NOW - although I envisage wrapping access with mutex locking at some point)* and each application using the shared library should not interfere with the memory management routines used for another application using the shared library.
I suspect that it is code (not data) that is shared between processes using a shlib - however, I would like that confirmed - preferably, with a link that backs up that assertion.
*Note: if I am naively downplaying threading issues that may occur in the future as a result of the 'architecture' I described above, someone please alert me!..
BTW, I am building the library on Linux (Ubuntu)

Since I'm not entirely sure what the question being asked is, I will try to provide information that may be of use.
Since you've indicated C and Linux, it is probably safe to assume you are also using the GNU toolchain.
GCC provides a constructor function attribute that causes a function to be called automatically before execution enters main(). You could use this to better control when your library initialization routine, initAPI() is called.
void __attribute__ ((constructor)) initAPI(void);
In the case of library initialization, constructor routines are executed before dlopen() returns if the library is loaded at runtime or before main() is started if the library is loaded at load time.
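As a minimal sketch of the constructor attribute (the names lib_ready, lib_init and lib_is_ready are made up for illustration, and this assumes GCC or Clang), the following sets a flag before main() runs, or before dlopen() returns when built into a shared library:

```c
#include <stdio.h>

/* Hypothetical flag that the rest of the library can check. */
static int lib_ready = 0;

/* Runs automatically at load time: before main() for load-time linking,
   or before dlopen() returns for run-time loading. */
__attribute__((constructor)) static void lib_init(void)
{
    lib_ready = 1;
}

/* Reports whether the constructor has already run. */
int lib_is_ready(void)
{
    return lib_ready;
}
```

By the time any code in main() calls lib_is_ready(), the constructor has already run, so no explicit init call is needed from the application.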
The GNU linker has a --wrap <symbol> option which allows you to provide wrappers for system functions.
If you link with --wrap malloc, references to malloc() will redirect to __wrap_malloc() (which you implement), and references to __real_malloc() will redirect to the original malloc() (so you can call it from within your wrapper implementation).
Instead of using the --wrap malloc option to provide a reference to the original malloc() you could also dynamically load a pointer to the original malloc() using dlsym(). You cannot directly call the original malloc() from the wrapper because it will be interpreted as a recursive call to the wrapper itself.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <dlfcn.h>

void *malloc(size_t size)
{
    static void *(*func)(size_t) = NULL;
    void *ret;

    if (!func) {
        /* get reference to original (libc provided) malloc */
        func = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
    }
    /* code to execute before calling malloc */
    ...
    /* call original malloc */
    ret = func(size);
    /* code to execute after calling malloc */
    ...
    return ret;
}
I suggest reading Jay Conrod's blog post entitled Tutorial: Function Interposition in Linux for additional information on replacing calls to functions in dynamic libraries with calls to your own wrapper functions.

-1 for the lack of concrete questions. The text is long, could have been written more succinctly, and it does not contain a single question mark.
Now to address your problems:
Static data (what you call "global variables") of a shared library is per-process. Your global variables in one process will not interfere with global variables in another process, so no mutexes are needed to protect them from other processes (threads within a single process do still share them).
In C, you cannot restrict[1] who can call a function. It can be called by anybody who knows its name or has a pointer to it. You can code initAPI() such that it visibly aborts the program (crashes it) if it is not the first library function called. You are the library writer, you set the rules of the game, and you have NO obligation towards coders who do not respect the rules.
[1] You can declare the function with static, meaning it can be called by name only by the code within the same translation unit; it can still be called through a pointer by anybody who manages to obtain a pointer to it. Such functions are not "exported" from libraries, so this is not applicable to your scenario.
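The per-process point can be checked with a small experiment. This is only a sketch: it uses fork() rather than two separate applications loading the same library, and g_mode is a made-up stand-in for the library's global function pointers. The child writes to its copy of the global, and the parent's copy is unchanged:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the library's global function pointers. */
static int g_mode = 0;

/* Forks a child that modifies g_mode, waits for it, and returns the value
   the parent still sees. Returns -1 if fork() fails. */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        g_mode = 42;   /* changes only the child's private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return g_mode;     /* parent's copy is untouched */
}
```

Writable data pages of a shared library are mapped privately in the same way, so each process sees only its own values.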

Achieving this:
(For now) there will be two families of memory management routines:
The standard malloc, calloc etc functions
specialized versions of malloc, calloc etc
with dynamic libraries on Linux is trivial, and does not require the complicated scheme you have concocted (nor the LD_PRELOAD or dlopen suggested by @ugoren).
When you want to provide specialized versions of malloc and friends, simply link these routines into your main executable. Voila: your existing shared library will pick them up from there, no modifications required.
You could also build specialized malloc into e.g. libmymalloc.so, and put that library on the link line before libc, to achieve the same result.
The dynamic loader will use the first malloc it can see, and searches the list starting from the a.out, and proceeding to search other libraries in the same order they were listed on link command line.
UPDATE:
"On further reflection, I don't think what you propose will work."
Yes, it will work (I use that functionality every day, by linking tcmalloc into my main executable).
When your shared library (the one providing an API) calls malloc "behind the scenes", which (of possibly several) malloc implementations does it get? The first one that is visible to the dynamic linker. If you link a malloc implementation into a.out, that will be the one.

It's easy enough for you to require that your initialization function is:
called from the main thread
that the client may call it exactly once
and that the client may provide the optional function pointers by parameter
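Those three requirements could look roughly like this. It is a sketch under assumptions: the mem_api struct and the my_* names are invented for illustration, initAPI() here takes a pointer to a table instead of the type argument from the question, and it returns an error where a stricter library might abort():

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical table of allocator entry points (names are illustrative). */
typedef struct {
    void *(*malloc_fn)(size_t);
    void *(*calloc_fn)(size_t, size_t);
    void  (*free_fn)(void *);
} mem_api;

/* Static initialization: function addresses are constant expressions in C,
   so the standard routines are in place before any library call runs. */
static mem_api g_mem = { malloc, calloc, free };
static int g_initialized = 0;  /* initAPI() has already been called */
static int g_in_use = 0;       /* an allocation already went through the table */

void *my_malloc(size_t n)            { g_in_use = 1; return g_mem.malloc_fn(n); }
void *my_calloc(size_t n, size_t sz) { g_in_use = 1; return g_mem.calloc_fn(n, sz); }
void  my_free(void *p)               { g_mem.free_fn(p); }

/* Returns 0 on success, -1 if called twice or after the library was used.
   Passing NULL keeps the standard routines. Not thread-safe: call it from
   the main thread before starting any others. */
int initAPI(const mem_api *custom)
{
    if (g_initialized || g_in_use) {
        fprintf(stderr, "initAPI: must be called once, before any other library call\n");
        return -1;  /* a stricter library could call abort() here */
    }
    if (custom != NULL)
        g_mem = *custom;
    g_initialized = 1;
    return 0;
}
```

Existing applications that never call initAPI() keep getting malloc/calloc/free, which covers the backward-compatibility requirement without any constructor at all.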

If different applications run in separate processes, it's quite simple to do using dynamic libraries.
The library can simply call malloc() and free(), and applications that want to override it could load another library, with alternative implementations for these libraries.
This can be done with the LD_PRELOAD environment variable.
Or, if your library is loaded with dlopen(), just load the malloc library first.
This is basically what tools such as valgrind, which replace malloc, do.

Related

ignore "default" functions in c [duplicate]

For example, if I want to override malloc(), what's the best way to do it?
Currently the simplest way I know of is:
malloc.h
#include <stdlib.h>
#define malloc my_malloc
void* my_malloc (size_t size);
foobar.c
#include "malloc.h"
void foobar(void)
{
void* leak = malloc(1024);
}
The problem with this approach is that we now have to use "malloc.h" and can never use "stdlib.h". Is there a way around this? I'm particularly interested in importing 3rd party libraries without modifying them at all, but forcing them into calling my custom libc functions (like malloc).
The short answer is you probably want to use the LD_PRELOAD trick: What is the LD_PRELOAD trick?
That approach basically inserts your own custom shared library at runtime before any other shared library is loaded, exporting the functions you want to override, such as malloc(). By the time the other shared libraries are loaded your symbol is already there and gets preference when resolving calls to that symbol name from other libraries. From within your malloc() wrapper/replacement you can even choose to call the next malloc symbol, which typically would be the actual libc symbol.
This blog post has a lot of comprehensive information about this method:
http://samanbarghi.com/blog/2014/09/05/how-to-wrap-a-system-call-libc-function-in-linux/
Note that example is overriding libc's write() and puts() functions, but the same logic applies for malloc():
LD_PRELOAD allows a shared library to be loaded before any other libraries. So all I need to do is to write a shared library that overrides write and puts functions. If we wrap these functions, we need a way to call the real functions to perform the system call. dlsym just do that for us [man 3 dlsym]:
"The function dlsym() takes a “handle” of a dynamic library returned by dlopen() and the null-terminated symbol name, returning the address where that symbol is loaded into memory. If the symbol is not found, in the specified library or any of the libraries that were automatically loaded by dlopen() when that library was loaded, dlsym() returns NULL…"
So inside the wrapper function we can use dlsym to get the address of the related symbol in memory and call the glibc function. Another approach can be calling the syscall directly, both approaches will work.
That blog post also describes a compile-time method I did not know about that involves passing a linker flag to ld, "--wrap":
Another way of wrapping functions is by using linker at the link time. GNU linker provides an option to wrap a function for a symbol [man 1 ld]:
"Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to “__wrap_symbol”. Any undefined reference to “__real_symbol” will be resolved to symbol."
The handy thing about LD_PRELOAD is that it might allow you to change the malloc() implementation on production applications for quick testing, or even allow the user to select (I do this in some server applications) which implementation to use. The 'tcmalloc' library for example can be easily inserted into an application to evaluate performance gains in heavily threaded applications (where tcmalloc tends to perform a lot better than libc's malloc implementation).
Finally if you're on Windows, perhaps try this: LD_PRELOAD equivalent for Windows to preload shared libraries

How can I detect the main executable's function definitions from a dynamic library - particularly malloc

So libgcc will use the application's malloc and free, if it has defined one, in order to satisfy the application's need to call free() after certain library calls, eg realpath.
Within my dynamic library, I really don't want to use that application malloc/free, because I don't trust it generally - I'm happy to use libgcc's implementation of malloc, which is what most applications use, so I declared my own malloc() function that calls libgcc's implementation via dlsym().
All was going well until... I want to call realpath (and perhaps others)! As a simple fix, I need a way to do the equivalent of dlsym() but on the main executable (which I don't own) to get the application's implementation of free, if any. Does such a thing exist?
I know it must, because the dynamic linker "does the right thing", but is it accessible to mere mortal programmers, and how?
In the particular case of realpath, I know I can provide a buffer, but that comes with its own unknown dangers about buffer size. For some other calls I can't do that.
I can also go down the winding path of symbol renaming with objcopy, but I'd prefer not to, if possible.
[laters]
I do take your point on malloc being possibly defined by another dynamic library, and I would want to use that version, while the application still uses its compiled in version (I have seen that it does continue to use it, even if tcmalloc is preloaded, for example).
I guess that extends the question to ask if any library has defined malloc, and if the application has defined malloc, I want to cherry-pick which version of malloc/free I use in each place in my code to match the behaviour of libgcc when necessary, and not when not, so I want to be able to get a reference to them both.
In the short term I have resolved my current issue by replacing realpath() in my code with a version that pre-defines the buffer using my malloc, before calling the libgcc implementation, but I feel this is very much a band-aid.
The interposing malloc can come from anywhere, not just the executable. It could be a shared object referenced by a DT_NEEDED entry in the dynamic section of another object, or a library injected into the process image using LD_PRELOAD.
In general, many libraries have functions which allocate something which then has to be deallocated using free. Software ported to Windows will not do this (because DLLs have separate heaps there), but otherwise, it is not uncommon. There is not just realpath, there is also strdup, asprintf, and probably a bunch of other functions I do not remember.
In your case, you should just call free on such pointers, and use a different name for your own memory deallocation functions. Once malloc has been interposed, it is not possible to safely use the original libc allocator because it has not been properly initialized. For example, if you call the glibc malloc function in a process which uses a different, interposed malloc, then malloc will not be thread-safe: the initialization is not thread-safe because the implementation knows that pthread_create will call malloc before creating the first new thread, thereby initializing malloc while the process is single-threaded. Which is why there is no synchronization in the initialization code.
(libgcc does not provide malloc, by the way. It comes from libc/glibc.)
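To illustrate the advice about calling plain free on such pointers, here is a small sketch. The function name resolved_path_length is made up, and it assumes a POSIX.1-2008 realpath() that allocates the result when passed NULL:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for realpath(); assumes a POSIX.1-2008 libc */
#endif
#include <stdlib.h>
#include <string.h>

/* Resolves a path and returns the length of the canonical form, or -1 on
   failure. The buffer comes from whichever malloc the process resolved,
   so it is released with plain free(), never with a library-private
   deallocator. */
long resolved_path_length(const char *path)
{
    char *resolved = realpath(path, NULL);
    if (resolved == NULL)
        return -1;
    long len = (long)strlen(resolved);
    free(resolved);   /* matches the allocator realpath() itself used */
    return len;
}
```

The library's own allocations would then go through differently named functions, so the two allocators are never mixed on the same pointer.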

Thread-safe init of read-only global data

Let's imagine that I'm writing a library that has a reasonably large amount of read-only global data that needs to be initialized before the library can be used. For example, perhaps the global data be lookup tables for various parts of the application logic that won't change during the lifetime of the program.
Now I have a few ways to initialize this data:
I may require that the user call some kind of init() function before the library is used.
I may lazily construct the data the first time a function is called on my library.
I may include the data in an initializer statement in the source, such that variables are statically initialized to their final value.
Now if my data is read-only and should be the same for every environment the library runs in, then (3) is fairly appealing. Even in that case it has some downsides: if the data is very large (but easy to generate procedurally) the size of the binary can bloat up a lot (e.g., a library with 50K of code but 8MB of lookup tables would end up around 8050K). Similarly, the source itself may be very large, or the build system needs to handle the generation of the source at compile time.
The main reason you might not be able to use (3) is that the tables might be fixed (read-only), but require generation at runtime because they embed some information about the environment (e.g., the value of an environment variable, a configuration setting read from a file, information about the machine architecture, whatever). This data can't be embedded in the source since it depends on the runtime environment.
So we have methods (1) and (2) at least - but I can't see how to make these thread-safe in a simple way. The rest of the library can be thread-safe simply by not mutating any global state - just like the vast majority of C functions can be written in a thread-safe way w/o any explicit use of threading primitives.
I can't figure out a similar alternative for this global init, however:
(1) Is undesirable because we prefer not to require the user to call this method, and in any case it simply moves the problem up to the calling code: the calling code then needs to organize to call this init() method exactly once across all threads using the library, and before any thread uses the library.
(2) Fails since concurrent calls to the library might do a double init.
In C++ you can just initialize globals with a method call, like int data[] = loadData(). Is there any equivalent in C? Or am I stuck using threading primitives (which vary by platform, e.g., pthread_once, call_once and whatever Windows has) just to get my thread-safe init?
I don't know of any platform-independent way of initializing a library in a thread-safe manner. That's not surprising since there's no platform-independent threading model in C.
So your solution is going to be platform-specific.
@ThingyWotsit mentions in the comments using C++ to initialize your library, and that will be thread-safe. But it may very well lock you into a specific C++ run-time, so it may not be a useful solution for your C shared object/library. You may not be willing or able to add a dependency on C++ and you may especially not be willing or able to be locked into a specific C++ run-time.
For GCC, you can use __attribute__((constructor)) to have your initialization function called when the shared object is loaded:
constructor
destructor
constructor (priority)
destructor (priority)
The constructor attribute causes the function to be called automatically before execution enters main ().
Similarly, the destructor attribute causes the function to be called
automatically after main () has completed or exit () has been called.
Functions with these attributes are useful for initializing data that
will be used implicitly during the execution of the program.
You may provide an optional integer priority to control the order in
which constructor and destructor functions are run. A constructor with
a smaller priority number runs before a constructor with a larger
priority number; the opposite relationship holds for destructors. So,
if you have a constructor that allocates a resource and a destructor
that deallocates the same resource, both functions typically have the
same priority. The priorities for constructor and destructor functions
are the same as those specified for namespace-scope C++ objects (see
C++ Attributes).
For example:
static __attribute__((constructor)) void my_lib_init_func( void )
{
...
}
Your code will run before main() is called.
If your library is dynamically loaded (explicit call to dlopen(), for example), your init function will be called when your library is loaded, and your library won't be considered loaded until it returns.
Other compilers provide the functionally-identical #pragma init():
#pragma init(my_lib_init_func)
static void my_lib_init_func( void )
{
...
}
See #pragma init and #pragma fini using gcc compiler on linux
For Windows? The Windows C++ run-time is pretty stable and ubiquitous. I'd just use a C++ solution on Windows, especially if you're compiling with MSVC. (But see the comments...)
Option 3 is always preferable when possible. Your reasoning about the cons is wrong. If you have an 8MB constant table in the executable file, it's directly mapped and shared by all instances of the program or users of the shared library on any remotely modern operating system. If you generate it at runtime, each process will have its own copy of the table.
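For reference, option 3 in its simplest form is just a const table with an initializer. The table below is a tiny illustrative example, not anything from the question; a const array like this is normally placed in the read-only data of the object file:

```c
/* Bit counts for every 4-bit value, fixed at compile time. No init call
   and no locking are needed because nothing ever writes to it. */
static const unsigned char popcount4[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Counts the set bits in a byte using two table lookups. */
int popcount8(unsigned char x)
{
    return popcount4[x & 0x0F] + popcount4[x >> 4];
}
```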
When option 3 is not available you must use pthread_once or equivalent or implement your own version of the same (much less efficiently) using a lock. There is little reason to use weird OS-specific replacements for it; all major platforms either support POSIX threads API natively or have existing libraries which provide it on top of the platform's low-level primitives.
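A minimal pthread_once sketch for the lazy-init case (option 2) might look like this. The table contents and the names build_table, lookup and table_init_count are invented for illustration; on older glibc versions it may need to be linked with -pthread:

```c
#include <pthread.h>

static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int table[256];       /* hypothetical runtime-generated lookup table */
static int init_count = 0;   /* only here to show the init ran exactly once */

/* Builds the table; pthread_once guarantees a single execution even when
   several threads reach lookup() at the same time. */
static void build_table(void)
{
    for (int i = 0; i < 256; i++)
        table[i] = i * i;
    init_count++;
}

/* Public entry point: triggers the one-time init, then reads the table. */
int lookup(unsigned char idx)
{
    pthread_once(&table_once, build_table);
    return table[idx];
}

int table_init_count(void)
{
    return init_count;
}
```

Callers never see an init function; every public entry point that touches the table starts with the same pthread_once call.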

Compiling a custom malloc

I have written a custom library which implements malloc/calloc/realloc/free using the standard C prototypes, and I figured out how to compile it to an so. I want to test the library by linking a standard application against it? What would be a good way to do this? Once I have a working library I assume I can just load it with LD_PRELOAD, but how do I get my functions to co-exist with but take precedence over the system library ones? My functions need to make a call to malloc in order to get memory to run, so I can't just completely ditch stdlib... Help?
Functions that you are trying to replace are standard C functions, not macros, not system calls. So you have to simply give your functions the same names and compile them into a shared library.
Then, use LD_PRELOAD to pre-load your library before binary starts. Since all addresses are resolved once, linker will figure out addresses of your functions and remember their names and will not look for them in standard library later.
This approach might not work if your program is linked with the standard runtime statically. Also, it will not work on Mac OS X as there is another API for interposition.
In Linux, for example, in order for your functions to co-exist (i.e. if you want to use system malloc in your own implementation of malloc), you have to open the standard library manually using dlopen, look up functions you need there using dlsym and call them later by address.
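A rough sketch of that dlopen/dlsym lookup follows. The soname "libc.so.6" is an assumption that holds for glibc on Linux but not for other C libraries, load_system_malloc is a made-up name, and older glibc versions need -ldl at link time:

```c
#include <dlfcn.h>
#include <stddef.h>

typedef void *(*malloc_fn)(size_t);

/* Looks up malloc in the C library by name and returns its address, or
   NULL if the library or the symbol cannot be found. The handle is kept
   open on purpose so the returned pointer stays valid. */
malloc_fn load_system_malloc(void)
{
    void *libc = dlopen("libc.so.6", RTLD_LAZY);  /* assumed glibc soname */
    if (libc == NULL)
        return NULL;
    return (malloc_fn)dlsym(libc, "malloc");
}
```

A replacement malloc would call through the returned pointer instead of calling malloc by name, which would otherwise resolve back to the replacement itself.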
Don't write your malloc() in terms of malloc() -- write it using sbrk, which gets memory directly from the OS.
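To give an idea of what building on sbrk looks like, here is a deliberately tiny bump allocator. It is a sketch only: bump_alloc is a made-up name, nothing is ever freed, it is not thread-safe, and a real malloc replacement needs free lists, realloc and much more:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for sbrk() on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Returns n bytes of 16-byte-aligned memory taken from the program break,
   or NULL on failure. Memory is never returned to the system. */
void *bump_alloc(size_t n)
{
    const uintptr_t align = 16;
    void *brk_now = sbrk(0);               /* current program break */
    if (brk_now == (void *)-1)
        return NULL;
    uintptr_t cur = (uintptr_t)brk_now;
    uintptr_t pad = (align - cur % align) % align;
    if (n > (size_t)INTPTR_MAX - pad)      /* request too large for sbrk */
        return NULL;
    void *base = sbrk((intptr_t)(pad + n));
    if (base == (void *)-1)
        return NULL;
    return (char *)base + pad;             /* skip the alignment padding */
}
```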
If you have control of the source code that is to use this library, here is one possibility. Use different function names: Rather than malloc, for example, call it newCoolMalloc. This method is sometimes simpler and doesn't depend on special linker options.
Then in your code, use #define to cause the code to call the desired set of functions. You can #define malloc to be something different. For example:
#define malloc newCoolMalloc
#define free newCoolFree
If you do that, though, you have to be very very careful to include that consistently. Otherwise you run the risk of using stdlib malloc in one place and then your own free in another leading to messy bugs. One way to help mitigate that situation is to (if possible) in your own code use custom names for the allocation and free functions. Then it is easier to ensure that the correct one is being called. You can define the various custom names to your own malloc functions or even the original stdlib malloc functions.
For example, you might use mallocPlaceHolder as the actual name in the code:
someThing = mallocPlaceHolder( nbytes );
Then your defines would look more like this:
#define mallocPlaceHolder myCoolMalloc
If no function of the form mallocPlaceHolder (and associated free) actually exist, it avoids mixing different libraries.
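Put together, the placeholder scheme might look like the sketch below. freePlaceHolder, myCoolFree and the live-allocation counter are additions for illustration; only mallocPlaceHolder and myCoolMalloc come from the text above:

```c
#include <stdlib.h>

/* Hypothetical custom allocator: wraps the standard one and counts live
   allocations so a mismatched free is easier to spot. */
static size_t g_live = 0;

void *myCoolMalloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        g_live++;
    return p;
}

void myCoolFree(void *p)
{
    if (p != NULL)
        g_live--;
    free(p);
}

size_t live_allocations(void)
{
    return g_live;
}

/* Application code only ever uses the placeholder names; switching to a
   different allocator family means changing these two lines. */
#define mallocPlaceHolder myCoolMalloc
#define freePlaceHolder   myCoolFree
```

Because no function is actually named mallocPlaceHolder, a source file that forgets to include the defines fails to link instead of silently mixing allocators.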
