Providing a `malloc` implementation for `newlib-nano` - c

I'd like to provide an implementation of malloc for newlib-nano when using it with gcc. In my situation, I have some source file, say main.c, that calls strftime. The newlib-nano implementation of strftime uses malloc. In a header file, my_memory.h, I've declared a function void *malloc(size_t size) and provided an implementation in a corresponding my_memory.c file.
When linking the project using gcc, the linker fails at .../libc_nano.a(liba-malloc.o) because of multiple definitions of malloc. The behavior I'd like is for the linker to take my implementation of malloc rather than pulling newlib-nano's, but to retain using newlib-nano's implementation of other standard library functions, e.g. memset.
I've searched around for an "exclude object file from static library" option in gcc to try to exclude libc_nano.a(liba-malloc.o) but with no luck. Note that the compiler is pulling in this object file and I don't have access to the compiler's libc_nano.a to patch liba-malloc.o with my own object file.
Anyway, am I missing something, or is it not possible to achieve what I'm trying to achieve?

Likely liba-malloc.o contains other allocator function definitions like calloc, free, realloc, etc. and thus is getting pulled in for linking because of references to one of them. You can see this with the -t option to ld (pass -Wl,-t on gcc command line when linking to use it). If this is the case, you can avoid linking it just by ensuring you've provided definitions of all these functions yourself.
A better idea might be getting rid of the malloc dependency by using a different strftime. It's rather ridiculous for strftime, especially an embedded-oriented implementation, to be calling malloc; it has no fundamental need to and I'm somewhat baffled how they found a way to make malloc useful to it. Aside from some tie-in with locale which could be extricated fairly easily, musl libc's strftime.c (disclosure: author=me) is very self-contained and could probably serve as a drop-in replacement.

Related

Disable default malloc with arm-none-eabi on cortex m3 (bare metal)

I want to provide my own or better no malloc function. So I want to make sure it's not linked at all.
I already pass -nostdlib and --specs=nano.specs to the linker.
When providing my own malloc function I get:
../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/lib/thumb/v7-m\libc_nano.a(lib_a-malloc.o): In function `malloc':
malloc.c:(.text.malloc+0x0): multiple definition of `malloc'
I'm looking for a way to skip linking of the lib_a-malloc.o
As a clarification: It's more about having no malloc at all than about providing my own implementation. Providing my own implementation was just to check if there already is one or for debugging purpose.
Using the same name as a standard function's name is almost always a bad idea.
Even you, after some time not working on that project, will not remember that this malloc() you are reading in your code is not the malloc() that we all know and loved. Let aside anyone else.
So, for maintainability and readability, I suggest you name your function differently, plain example: my_malloc().
PS: You might want to read GCC - How to stop malloc being linked?

If a standard library function is reimplemented, which of the two is called?

If a library function like e.g. malloc is reimplemented, then there are two symbols with that name, one in a local object file and one in the system library. Which of the two is used when a function from e.g. stdio is used, which calls malloc (and why)?
The link behaviour is, in a general way:
Include all symbols defined in the object files.
Then resolve the undefined ones using the objects in the libs.
So, if malloc is reimplemented and linked as object file, the version in the object file will override the version in the standard libraries. However, if the new malloc is linked as a library, it depends on the libraries linking order.
Another way, considering the gnu binutils as scope, to override library functions is to wrap the function using the --wrap parameter: https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html
By using the --wrap ld option we can get both functions linked and the wrapper function would be able to call the wrapped one.
The linking order also depends on the command line parameters order. So I'm considering here that libs are listed after objects because, in general, doesn't make sense to put libraries before objects as their objective is to supply the missing symbols demanded by those.
One answer is that you are getting yourself in an awful lot of trouble by trying to replace malloc. Don't go there. Don't even think about it. Especially if you need to ask questions on stackoverflow, don't even think about it.
Another answer is that you are invoking undefined behaviour, and the likely result is that the malloc function will be called that will hurt you most. If you are lucky, while developing. If you are less lucky, once your code is in the hand of customers. Don't do it.
Writing wrappers around malloc with new names is bad enough. Trying to replace malloc is madness.

malloc function interposition in the standard C and C++ libraries

I want to write a shared library in such a way that it is possible to isolate it’s memory usage from the application it is linked against. That is, if the shared library, let’s call it libmemory.so, calls malloc, I want to maintain that memory in a separate heap from the heap that is used to service calls to malloc made in the application. This question isn't about writing memory allocators, it more about linking and loading the library and application together.
So far I’ve been experimenting with combinations of function interposition, symbol visibility, and linking tricks. So far, I can’t get this right because of one thing: the standard library. I cannot find a way to distinguish between calls to the standard library that internally use malloc that happen in libmemory.so versus the application. This causes an issue since then any standard library usage within libmemory.so pollutes the application heap.
My current strategy is to interpose definitions of malloc in the shared library as a hidden symbol. This works nicely and all of the library code works as expected except, of course, the standard library which is loaded dynamically at runtime. Naturally, I’ve been trying to find a way to statically embed the standard library usage so that it would use the interposed malloc in libmemory.so at compile time. I’ve tried -static-libgcc and -static-libstdc++ without success (and anyway, it seems this is discouraged). Is this the right answer?
What do?
P.s., further reading is always appreciated, and help on the question tagging front would be nice.
I’ve tried -static-libgcc and -static-libstdc++ without success
Of course this wouldn't succeed: malloc doesn't live in libgcc or libstdc++; it lives in libc.
What you want to do is statically link libmemory.so with some alternative malloc implementation, such as tcmalloc or jemalloc, and hide all malloc symbols. Then your library and your application will have absolutely separate heaps.
It goes without saying that you must never allocate something in your library and free it in the application, or vice versa.
In theory you could also link the malloc part of system libc.a into your library, but in practice GLIBC (and most other UNIX C libraries) does not support partially-static link (if you link libc.a, you must not link libc.so).
Update:
If libmemory.so makes use of a standard library function, e.g., gmtime_r, which is linked in dynamically, thereby resolving malloc at runtime, then libmemory.so mistakenly uses malloc provided at runtime (the one apparently from glibc
There is nothing mistaken about that. Since you've hidden your malloc inside your library, there is no other malloc that gmtime_r could use.
Also, gmtime_r doesn't allocate memory, except for internal use by GLIBC itself, and such memory could be cleaned up by __libc_freeres, so it would be wrong to allocate this memory anywhere other than using GLIBC's malloc.
Now, fopen is another example you used, and fopen does malloc memory. Apparently you would like fopen to call your malloc (even though it's not visible to fopen) when called by your library, but call system malloc when called by the application. But how can fopen know who called it? Surely you are not suggesting that fopen walk the stack to figure out whether it was called by your library or by something else?
So, if you really want to make your library never call into system malloc, then you would have to statically link all other libc functions that you use and that may call malloc (and hide them in your library as well).
You could use something like uclibc or dietlibc to achieve that.

Compiling a custom malloc

I have written a custom library which implements malloc/calloc/realloc/free using the standard C prototypes, and I figured out how to compile it to an so. I want to test the library by linking a standard application against it? What would be a good way to do this? Once I have a working library I assume I can just load it with LD_PRELOAD, but how do I get my functions to co-exist with but take precedence over the system library ones? My functions need to make a call to malloc in order to get memory to run, so I can't just completely ditch stdlib... Help?
Functions that you are trying to replace are standard C functions, not macros, not system calls. So you have to simply give your functions the same names and compile them into a shared library.
Then, use LD_PRELOAD to pre-load your library before binary starts. Since all addresses are resolved once, linker will figure out addresses of your functions and remember their names and will not look for them in standard library later.
This approach might not work if your program is linked with the standard runtime statically. Also, it will not work on Mac OS X as there is another API for interpolation.
In Linux, for example, in order for your functions to co-exist (i.e. if you want to use system malloc in your own implementation of malloc), you have to open the standard library manually using dlopen, look up functions you need there using dlsym and call them later by address.
Don't write your malloc() in terms of malloc() -- write it using sbrk, which gets memory directly from the OS.
If you have control of the source code that is to use this library, here is one possibility. Use different function names: Rather than malloc, for example, call it newCoolMalloc. This method is sometimes simpler and doesn't depend on special linker options.
Then in your code, use #define to cause the code to call the desired set of functions. You can #define malloc to be something different. For example:
#define malloc newCoolMalloc
#define free newCoolFree
If you do that, though, you have to be very very careful to include that consistently. Otherwise you run the risk of using stdlib malloc in one place and then your own free in another leading to messy bugs. One way to help mitigate that situation is to (if possible) in your own code use custom names for the allocation and free functions. Then it is easier to ensure that the correct one is being called. You can define the various custom names to your own malloc functions or even the original stdlib malloc functions.
For example, you might use mallocPlaceHolder as the actual name in the code:
someThing = mallocPlaceHolder( nbytes );
Then your defines would look more like this:
#define mallocPlaceHolder myCoolMalloc
If no function of the form mallocPlaceHolder (and associated free) actually exist, it avoids mixing different libraries.

Question about overriding C standard library functions and how to link everything together

I made my own implementation of _init , malloc , free ( and others ).
Inside these functions I use the dlfcn.h (dlopen , dlsym etc) library to call the actual standard versions. I put then in a single file and compile them as a shared library ( memory.so ). When I wish to run an executable and make it call my versions of these functions I simply set LD_PRELOAD=memory.so .
The problem is that I have a number of other modules which memory.c depends on. These include a file containing functions to scan elf files ( symbols.c ) and my own implementation of a hash table ( hashtable.c ) which I use to keep track of memory leaks among others.
My question is if there is a way to separately compile hashtable.c & symbols.c so any malloc references are resolved with the standard library and not with the ones included on memory.c. I could of course use the dlfcn.h libraries on everything that memory.c depends on but I would prefer it if there was a way to avoid that.
I still haven't completely figured out how linking works so any help would be appreciated.
Thank you
If you are working with glibc you can use alternative non-overriden function names:
[max#truth ~]$ nm --defined-only --dynamic /lib64/libc.so.6 | egrep "malloc\b"
0000003e56079540 T __libc_malloc
0000003e56079540 T malloc
Note the same function address in the above. In other words, malloc() function is given two names, so that the original malloc() version is available under __libc_malloc() name in case malloc() has been interposed.
A quick grep on glibc sources reveals the only caller of __libc_malloc() is mcheck. These function aliases are a glibc implementation detail and there is no header for them. malloc/mcheck.c declares the internal functions as below:
extern __typeof (malloc) __libc_malloc;
extern __typeof (free) __libc_free;
extern __typeof (realloc) __libc_realloc;
Other C libraries may have differently named aliases, so using dlsym() to get the original malloc() address is more portable.
First it is important to note there is no need to do what you want to do for memory debuggers in Linux as glibc provides specific hook functions for memory functions (see: http://www.gnu.org/s/libc/manual/html_node/Allocation-Debugging.html)
But disregarding this, the general solution is to NOT use dlopen() to get a reference to glibc for dlsym() but rather use the magical handle RTLD_NEXT. Here is the relevant part from the dlopen() man page:
"There are two special pseudo-handles, RTLD_DEFAULT and RTLD_NEXT. The former will find the first occurrence of the desired symbol using the default library search order. The latter will find the next occurrence of a function in the search order after the current library. This allows one to provide a wrapper around a function in another shared library."
See for example: http://developers.sun.com/solaris/articles/lib_interposers_code.html
You could take a look at electric fence. It overrides a lot of standard functions to do various memory debugging.

Resources