GNU LD: Override functions in static library - linker

I have the following problem:
In some commercial project (ARM Cortex-M), we have pre-compiled libraries from a supplier.
The library contains code from different suppliers; some library functions use the standard malloc() function call, while others use the memory allocator of the operating system (in the example below: os_malloc()).
The malloc() function is taken from the libc.a standard library.
The two allocators seem to interfere with each other. For this reason I'd like to replace the default malloc() (and free()) implementation with something like this:
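#include <stddef.h>   /* size_t, the parameter type of the standard malloc() */

/* Forward the C library allocator to the OS allocator. */
void *malloc(size_t size)
{
    return os_malloc(size);
}

void free(void *ptr)
{
    os_free(ptr);
}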
However, in this case I would have to remove the object file containing malloc() and free() from the libc.a library file, which I also don't want to do.
Is there a way to tell the GNU linker not to take a symbol from a library if it is already defined outside the library?
... or a way to tell the linker not to take a certain object file from a library?
... or a way to do something like PROVIDE(malloc = os_malloc); in the linker script even though the symbol malloc is defined in some library?
Thank you very much.

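"However, in this case I would have to remove the object file containing malloc() and free() from the libc.a library file, which I also don't want to do."
That statement is false, and demonstrates that you do not understand how linkers work with archive libraries.
You do not in fact need to remove anything from libc.a (though the replacement may not work for other reasons). The linker extracts a member from an archive only when that member defines a symbol that is still undefined at that point; if your own object file already defines malloc() and free(), nothing forces the libc.a member in on their account.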
Here is a good explainer.

Related

ignore "default" functions in c [duplicate]

For example, if I want to override malloc(), what's the best way to do it?
Currently the simplest way I know of is:
malloc.h
#include <stdlib.h>
#define malloc my_malloc
void* my_malloc (size_t size);
foobar.c
#include "malloc.h"
void foobar(void)
{
    void* leak = malloc(1024);
}
The problem with this approach is that we now have to use "malloc.h" and can never use "stdlib.h". Is there a way around this? I'm particularly interested in importing 3rd party libraries without modifying them at all, but forcing them into calling my custom libc functions (like malloc).
The short answer is you probably want to use the LD_PRELOAD trick: What is the LD_PRELOAD trick?
That approach basically inserts your own custom shared library at runtime before any other shared library is loaded, exporting the functions you want to override, such as malloc(). By the time the other shared libraries are loaded, your symbol is already there and gets preference when resolving calls to that symbol name from other libraries. From within your malloc() wrapper/replacement you can even choose to call the next malloc symbol, which typically would be the actual libc symbol.
This blog post has a lot of comprehensive information about this method:
http://samanbarghi.com/blog/2014/09/05/how-to-wrap-a-system-call-libc-function-in-linux/
Note that example is overriding libc's write() and puts() functions, but the same logic applies for malloc():
LD_PRELOAD allows a shared library to be loaded before any other libraries. So all I need to do is to write a shared library that overrides write and puts functions. If we wrap these functions, we need a way to call the real functions to perform the system call. dlsym just do that for us [man 3 dlsym]:
> The function dlsym() takes a “handle” of a dynamic library returned by dlopen() and the null-terminated symbol name, returning the address where that symbol is loaded into memory. If the symbol is not found, in the specified library or any of the libraries that were automatically loaded by dlopen() when that library was loaded, dlsym() returns NULL…
So inside the wrapper function we can use dlsym to get the address of the related symbol in memory and call the glibc function. Another approach can be calling the syscall directly, both approaches will work.
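The dlsym approach described above can be sketched roughly as follows. This is a minimal illustration, assuming glibc on Linux with dynamic linking; it wraps puts() as in the blog post, and the counter name wrapped_puts_calls is made up here only to make the interception observable. It uses the RTLD_NEXT pseudo-handle instead of a handle obtained from dlopen().

```c
#define _GNU_SOURCE          /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical counter, present only so the wrapping can be observed. */
static unsigned long wrapped_puts_calls;

/* Same signature as libc's puts(); this definition is found before libc's. */
int puts(const char *s)
{
    /* Cache the address of the next puts() in the search order (normally libc). */
    static int (*real_puts)(const char *);

    if (real_puts == NULL) {
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (real_puts == NULL) {
            return EOF;      /* no underlying implementation was found */
        }
    }

    wrapped_puts_calls++;
    return real_puts(s);     /* forward to the real implementation */
}
```

Built as a shared object (for example gcc -shared -fPIC wrap_puts.c -o libwrap_puts.so, with -ldl added on older glibc versions) it could then be loaded with LD_PRELOAD=./libwrap_puts.so ./app; the file names are placeholders.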
That blog post also describes a link-time method I did not know about that involves passing a flag to ld, "--wrap":
Another way of wrapping functions is by using linker at the link time. GNU linker provides an option to wrap a function for a symbol [man 1 ld]:
> Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to “__wrap_symbol”. Any undefined reference to “__real_symbol” will be resolved to symbol.
The handy thing about LD_PRELOAD is that it lets you change the malloc() implementation of production applications for quick testing, or even lets the user select which implementation to use (I do this in some server applications). The 'tcmalloc' library, for example, can easily be inserted into an application to evaluate performance gains in heavily threaded applications (where tcmalloc tends to perform a lot better than libc's malloc implementation).
Finally if you're on Windows, perhaps try this: LD_PRELOAD equivalent for Windows to preload shared libraries

Providing a `malloc` implementation for `newlib-nano`

I'd like to provide an implementation of malloc for newlib-nano when using it with gcc. In my situation, I have some source file, say main.c, that calls strftime. The newlib-nano implementation of strftime uses malloc. In a header file, my_memory.h, I've declared a function void *malloc(size_t size) and provided an implementation in a corresponding my_memory.c file.
When linking the project using gcc, the linker fails at .../libc_nano.a(liba-malloc.o) because of multiple definitions of malloc. The behavior I'd like is for the linker to take my implementation of malloc rather than pulling newlib-nano's, but to retain using newlib-nano's implementation of other standard library functions, e.g. memset.
I've searched around for an "exclude object file from static library" option in gcc to try to exclude libc_nano.a(liba-malloc.o) but with no luck. Note that the compiler is pulling in this object file and I don't have access to the compiler's libc_nano.a to patch liba-malloc.o with my own object file.
Anyway, am I missing something, or is it not possible to achieve what I'm trying to achieve?
Likely liba-malloc.o contains other allocator function definitions like calloc, free, realloc, etc. and thus is getting pulled in for linking because of references to one of them. You can see this with the -t option to ld (pass -Wl,-t on gcc command line when linking to use it). If this is the case, you can avoid linking it just by ensuring you've provided definitions of all these functions yourself.
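A minimal sketch of providing the whole allocator family yourself, so that no reference remains that would pull in the library's allocator object. Everything named os_malloc, os_free, pool, block_header and alloc_block is a hypothetical stand-in for illustration: in a real project os_malloc()/os_free() would come from the OS, and here they are faked with a fixed pool that never reuses memory. A small size header is kept in front of each block because realloc() needs to know the old size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- Hypothetical stand-in for the OS allocator (illustration only) --- */
#define POOL_SIZE (64u * 1024u)
static _Alignas(16) unsigned char pool[POOL_SIZE];
static size_t pool_used;
static unsigned long os_malloc_calls;

static void *os_malloc(size_t size)
{
    os_malloc_calls++;
    if (size > POOL_SIZE - pool_used) {
        return NULL;                                 /* pool exhausted */
    }
    size_t rounded = (size + 15u) & ~(size_t)15u;    /* keep 16-byte alignment */
    void *p = &pool[pool_used];
    pool_used += rounded;
    return p;
}

static void os_free(void *ptr)
{
    (void)ptr;                                       /* this fake pool never reuses memory */
}

/* --- Replacement allocator family built on top of os_malloc/os_free --- */
typedef struct {
    _Alignas(16) size_t size;                        /* user-visible size of the block */
} block_header;

static void *alloc_block(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header)) {
        return NULL;                                 /* header + size would overflow */
    }
    block_header *h = os_malloc(sizeof(block_header) + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    return h + 1;                                    /* user memory starts after the header */
}

void *malloc(size_t size)
{
    return alloc_block(size);
}

void free(void *ptr)
{
    if (ptr != NULL) {
        os_free((block_header *)ptr - 1);
    }
}

void *calloc(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL;                                 /* count * size would overflow */
    }
    void *p = alloc_block(count * size);
    if (p != NULL) {
        memset(p, 0, count * size);
    }
    return p;
}

void *realloc(void *ptr, size_t size)
{
    if (ptr == NULL) {
        return alloc_block(size);
    }
    if (size == 0) {
        free(ptr);
        return NULL;
    }
    size_t old_size = ((block_header *)ptr - 1)->size;
    void *p = alloc_block(size);
    if (p == NULL) {
        return NULL;                                 /* the old block stays valid */
    }
    memcpy(p, ptr, old_size < size ? old_size : size);
    free(ptr);
    return p;
}
```

Whether this set is complete for a given C library is an assumption to verify; the -Wl,-t trace mentioned above shows whether the library's allocator object is still being linked.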
A better idea might be getting rid of the malloc dependency by using a different strftime. It's rather ridiculous for strftime, especially an embedded-oriented implementation, to be calling malloc; it has no fundamental need to and I'm somewhat baffled how they found a way to make malloc useful to it. Aside from some tie-in with locale which could be extricated fairly easily, musl libc's strftime.c (disclosure: author=me) is very self-contained and could probably serve as a drop-in replacement.

Is a user-defined function able to act instead of libc's function?

I want to write some libc functions myself (but not all of libc!) to improve performance in my programs. Does GCC use them instead of the libc functions in the compiled program, or does it ignore them?
Many public symbols in glibc are defined as weak symbols, and a definition in your own program is found before the library's copy when symbols are resolved, which means you can provide your own implementation and it will take precedence during linking.
So, yes. You can just define your own functions with the same name/arguments and they will get used instead. Be sure to look in the header files to see the real signature of a function, some functions may be a macro expanding to another function.
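As a minimal sketch of that, assuming a hosted GCC toolchain: the program below defines its own rand() with the same signature as the libc one, so calls to rand() in the program bind to this definition. The xorshift generator and the name fast_rand_state are chosen only for illustration, not as a recommendation.

```c
#include <stdint.h>
#include <stdlib.h>   /* RAND_MAX and the official declaration of rand() */

/* Hypothetical generator state; any non-zero seed works for xorshift32. */
static uint32_t fast_rand_state = 2463534242u;

/* Same name and signature as libc's rand(), so this definition is used instead. */
int rand(void)
{
    uint32_t x = fast_rand_state;

    /* One xorshift32 step. */
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    fast_rand_state = x;

    /* Reduce to the range [0, RAND_MAX] promised by the standard interface. */
    return (int)(x % ((uint32_t)RAND_MAX + 1u));
}
```

Including the standard header in the same file lets the compiler check that the replacement matches the real declaration.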
You can also create a shared library that contains the functions you want to override, and have the dynamic linker pre-load it to override functions in shared libraries. See this question for more information.

Question about overriding C standard library functions and how to link everything together

I made my own implementations of _init, malloc, free (and others).
Inside these functions I use the dlfcn.h library (dlopen, dlsym, etc.) to call the actual standard versions. I put them in a single file and compile them as a shared library (memory.so). When I wish to run an executable and make it call my versions of these functions, I simply set LD_PRELOAD=memory.so.
The problem is that I have a number of other modules which memory.c depends on. These include a file containing functions to scan ELF files (symbols.c) and my own implementation of a hash table (hashtable.c), which I use to keep track of memory leaks, among others.
My question is whether there is a way to compile hashtable.c and symbols.c separately so that any malloc references in them are resolved with the standard library and not with the ones defined in memory.c. I could of course use the dlfcn.h functions in everything that memory.c depends on, but I would prefer to avoid that.
I still haven't completely figured out how linking works so any help would be appreciated.
Thank you
If you are working with glibc you can use alternative, non-overridden function names:
[max#truth ~]$ nm --defined-only --dynamic /lib64/libc.so.6 | egrep "malloc\b"
0000003e56079540 T __libc_malloc
0000003e56079540 T malloc
Note the same function address in the above. In other words, malloc() function is given two names, so that the original malloc() version is available under __libc_malloc() name in case malloc() has been interposed.
A quick grep on glibc sources reveals the only caller of __libc_malloc() is mcheck. These function aliases are a glibc implementation detail and there is no header for them. malloc/mcheck.c declares the internal functions as below:
extern __typeof (malloc) __libc_malloc;
extern __typeof (free) __libc_free;
extern __typeof (realloc) __libc_realloc;
Other C libraries may have differently named aliases, so using dlsym() to get the original malloc() address is more portable.
First it is important to note there is no need to do what you want to do for memory debuggers in Linux as glibc provides specific hook functions for memory functions (see: http://www.gnu.org/s/libc/manual/html_node/Allocation-Debugging.html)
But disregarding this, the general solution is to NOT use dlopen() to get a reference to glibc for dlsym() but rather use the magical handle RTLD_NEXT. Here is the relevant part from the dlopen() man page:
"There are two special pseudo-handles, RTLD_DEFAULT and RTLD_NEXT. The former will find the first occurrence of the desired symbol using the default library search order. The latter will find the next occurrence of a function in the search order after the current library. This allows one to provide a wrapper around a function in another shared library."
See for example: http://developers.sun.com/solaris/articles/lib_interposers_code.html
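A minimal sketch of such an interposer using RTLD_NEXT, assuming glibc on Linux with dynamic linking. The counter live_allocations and the helper resolve_real_functions() are hypothetical names; the code is not thread-safe, and a real leak tracker would also have to wrap calloc() and realloc(), since blocks obtained from them are passed to free() as well.

```c
#define _GNU_SOURCE          /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stddef.h>

static void *(*real_malloc)(size_t);
static void (*real_free)(void *);
static int resolving;               /* guards against re-entry during lookup */
static long live_allocations;       /* hypothetical counter for illustration */

/* Look up the next malloc/free in the search order (normally libc's). */
static void resolve_real_functions(void)
{
    resolving = 1;
    real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    resolving = 0;
}

void *malloc(size_t size)
{
    if (real_malloc == NULL) {
        if (resolving) {
            return NULL;            /* called back while still resolving */
        }
        resolve_real_functions();
        if (real_malloc == NULL) {
            return NULL;            /* no underlying malloc found */
        }
    }

    void *p = real_malloc(size);
    if (p != NULL) {
        live_allocations++;
    }
    return p;
}

void free(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    if (real_free == NULL) {
        if (resolving) {
            return;
        }
        resolve_real_functions();
        if (real_free == NULL) {
            return;                 /* no underlying free found */
        }
    }

    live_allocations--;
    real_free(ptr);
}
```

Because the lookup starts after the object that contains the wrapper, no dlopen() handle for libc is needed.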
You could take a look at electric fence. It overrides a lot of standard functions to do various memory debugging.
