I would like to monitor the use of mallocs and frees in an application by using the malloc and free hooks.
Here's the documentation http://www.gnu.org/s/libc/manual/html_node/Hooks-for-Malloc.html
From the example page you can see that my_malloc_hook transiently switches the malloc hook off (or to the previous hook in the chain) before re-invoking malloc.
This is a problem when monitoring multi-threaded applications (see end of question for explanation).
Other examples of the use of malloc hook that I have found on the internet have the same problem.
Is there a way to re-write this function to work correctly in a multi-threaded application?
For instance, is there an internal libc function that the malloc hook can invoke that completes the allocation, without the need to deactivate my hook.
I can't look at the libc source code due to corporate legal policy, so the answer may be obvious.
My design spec says I cannot replace malloc with a different malloc design.
I can assume that no other hooks are in play.
UPDATE
Since the malloc hook is temporarily removed while servicing the malloc, another thread may call malloc and NOT get the hook.
It has been suggested that malloc has a big lock around it that prevents this from happening, but it's not documented, and the fact that I effectively recursively call malloc suggests any lock must either exist after the hook, or be jolly clever:
caller ->
malloc ->
malloc-hook (disables hook) ->
malloc -> # possible hazard starts here
malloc_internals
malloc <-
malloc-hook (enables hook) <-
malloc
caller
UPDATED
You are right to not trust __malloc_hooks; I have glanced at the code, and they are - staggeringly crazily - not thread safe.
Invoking the inherited hooks directly, rather than restoring and re-entering malloc, seems to be deviating from the the document you cite a little bit too much to feel comfortable suggesting.
From http://manpages.sgvulcan.com/malloc_hook.3.php:
Hook variables are not thread-safe so they are deprecated now. Programmers should instead preempt calls to the relevant functions by defining and exporting functions like "malloc" and "free".
The appropriate way to inject debug malloc/realloc/free functions is to provide your own library that exports your 'debug' versions of these functions, and then defers itself to the real ones. C linking is done in explicit order, so if two libraries offer the same function, the first specified is used. You can also inject your malloc at load-time on unix using the LD_PRELOAD mechanisms.
http://linux.die.net/man/3/efence describes Electric Fence, which details both these approaches.
You can use your own locking if in these debug functions if that is necessary.
I have the same problem. I have solved it with that example. If we do not define THREAD_SAFE, we have the example given by the man, and we have a segmentation error.
If we define THREAD_SAFE, we have no segmentation error.
#include <malloc.h>
#include <pthread.h>
#define THREAD_SAFE
#undef THREAD_SAFE
/** rqmalloc_hook_ */
static void* (*malloc_call)(size_t,const void*);
static void* rqmalloc_hook_(size_t taille,const void* appel)
{
void* memoire;
__malloc_hook=malloc_call;
memoire=malloc(taille);
#ifndef THREAD_SAFE
malloc_call=__malloc_hook;
#endif
__malloc_hook=rqmalloc_hook_;
return memoire;
}
/** rqfree_hook_ */
static void (*free_call)(void*,const void*);
static void rqfree_hook_(void* memoire,const void* appel)
{
__free_hook=free_call;
free(memoire);
#ifndef THREAD_SAFE
free_call=__free_hook;
#endif
__free_hook=rqfree_hook_;
}
/** rqrealloc_hook_ */
static void* (*realloc_call)(void*,size_t,const void*);
static void* rqrealloc_hook_(void* memoire,size_t taille,const void* appel)
{
__realloc_hook=realloc_call;
memoire=realloc(memoire,taille);
#ifndef THREAD_SAFE
realloc_call=__realloc_hook;
#endif
__realloc_hook=rqrealloc_hook_;
return memoire;
}
/** memory_init */
void memory_init(void)
{
malloc_call = __malloc_hook;
__malloc_hook = rqmalloc_hook_;
free_call = __free_hook;
__free_hook = rqfree_hook_;
realloc_call = __realloc_hook;
__realloc_hook = rqrealloc_hook_;
}
/** f1/f2 */
void* f1(void* param)
{
void* m;
while (1) {m=malloc(100); free(m);}
}
void* f2(void* param)
{
void* m;
while (1) {m=malloc(100); free(m);}
}
/** main */
int main(int argc, char *argv[])
{
memory_init();
pthread_t t1,t2;
pthread_create(&t1,NULL,f1,NULL);
pthread_create(&t1,NULL,f2,NULL);
sleep(60);
return(0);
}
Since all calls to malloc() will go through your hook, you can synchronize on a semaphore (wait until it is free, lock it, juggle the hooks and free the semaphore).
[EDIT] IANAL but ... If you can use glibc in your code, then you can look at the code (since it's LGPL, anyone using it must be allowed to have a copy of the source). So I'm not sure you understood the legal situation correctly or maybe you're not legally allowed to use glibc by your company.
[EDIT2] After some thinking, I guess that this part of the call path must be protected by a lock of some kind which glibc creates for you. Otherwise, using hooks in multi-threaded code would never work reliably and I'm sure the docs would mention this. Since malloc() must be thread safe, the hooks must be as well.
If you're still worried, I suggest to write a small test program with two threads which allocate and free memory in a loop. Increment a counter in the hook. After a million rounds, the counter should be exactly two million. If this holds, then the hook is protected by the malloc() lock as well.
[EDIT3] If the test fails, then, because of your legal situation, it's not possible to implement the monitor. Tell your boss and let him make a decision about it.
[EDIT4] Googling turned up this comment from a bug report:
The hooks are not thread-safe. Period. What are you trying to fix?
This is part of a discussion from March 2009 about a bug in libc/malloc/malloc.c which contains a fix. So maybe a version of glibc after this date works but there doesn't seem to be a guarantee. It also seems to depend on your version of GCC.
There is no way to use the malloc hooks in a thread-safe way while recursing into malloc. The interface is badly designed, probably beyond repair.
Even if you put a mutex in your hook code, the problem is that calls into malloc do not see those locks until after they have passed through the hook mechanism, and to pass through the hook mechanism, they look at global variables (the hook pointers) without acquiring your mutex. As you're saving, changing and restoring these pointers in one thread, allocator calls in another thread are affected by them.
The main design problem is that the hooks are null pointers by default. If the interface simply provided non-null default hooks which are the allocator proper (the bottom-level allocator which doesn't call any more hooks), then it would be simple and safe to add hooks: you could just save the previous hooks, and in the new hooks, recurse into malloc by calling the hold hooks, without fiddling with any global pointers (other than at hook installation time, which can be done prior to any threads start up).
Alternatively, glibc could provide an internal malloc interface which doesn't invoke the hooks.
Another sane design would be the use of thread-local storage for the hooks. Overriding and restoring a hook would be done in one thread, without disturbing the hooks seen by another thread.
As it stands, what you can do to use the glibc malloc hook safely is to avoid recursing into malloc. Do not change the hook pointers inside the hook callbacks, and simply call your own allocator.
Related
I've tried to patch the libc i'm using to never call malloc hook. But this hook is actually initialized with a pointer to this function:
malloc_hook_ini (size_t sz, const void *caller)
{
__malloc_hook = NULL;
ptmalloc_init ();
return __libc_malloc (sz);
}
I think this function is responsible for critical initializations in malloc, so it needs to be called at least once. For instance since the free hook is not initialized with a critical function, i can just nop the call instruction
DJ Delorie posted a patch which removes the hooks. It requires some porting to the current tree, though.
Alternatively, you could interpose a different malloc which does not have such hooks. If you do this, glibc will use the interposed malloc, so the hooks are never called, either.
I am writing an XS module. I allocate some resource (e.g. malloc() or SvREFCNT_inc()) then do some operations involving the Perl API, then free the resource. This is fine in normal C because C has no exceptions, but code using the Perl API may croak(), thus preventing normal cleanup and leaking the resources. It therefore seems impossible to write correct XS code except for fairly simple cases.
When I croak() myself I can clean up any resources allocated so far, but I may be calling functions that croak() directly which would sidestep any cleanup code I write.
Pseudo-code to illustrate my concern:
static void some_other_function(pTHX_ Data* d) {
...
if (perhaps) croak("Could not frobnicate the data");
}
MODULE = Example PACKAGE = Example
void
xs(UV n)
CODE:
{
/* Allocate resources needed for this function */
Data* object_graph;
Newx(object_graph, 1, Data);
Data_init(object_graph, n);
/* Call functions which use the Perl API */
some_other_function(aTHX_ object_graph);
/* Clean up before returning.
* Not run if above code croak()s!
* Can this be put into the XS equivalent of a "try...finally" block?
*/
Data_destroy(object_graph);
Safefree(object_graph);
}
So how do I safely clean up resources in XS code? How can I register some destructor that is run when exceptions are thrown, or when I return from XS code back to Perl code?
My ideas and findings so far:
I can create a class that runs necessary cleanup in the destructor, then create a mortal SV containing an instance of this class. At some point in the future Perl will free that SV and run my destructor. However, this seems rather backwards, and there has to be a better way.
XSAWYERX's XS Fun booklet seems to discuss DESTROY methods at great length, but not the handling of exceptions that originate within XS code.
LEONT's Scope::OnExit module features XS code using SAVEDESTRUCTOR() and SAVEDESTRUCTOR_X() macros. These do not seem to be documented.
The Perl API lists save_destructor() and save_destructor_x() functions as public but undocumented.
Perl's scope.h header (included by perl.h) declares SAVEDESTRUCTOR(f,p) and SAVEDESTRUCTOR_X(f,p) macros, without any further explanation. Judging from context and the Scope::OnExit code, f is a function pointer and p a void pointer that will be passed to f. The _X version is for functions that are declared with the pTHX_ macro parameter.
Am I on the right track with this? Should I use these macros as appropriate? In which Perl version were they introduced? Is there any further guidance available on their use? When precisely are the destructors triggered? Presumably at a point related to the FREETMPS or LEAVE macros?
Upon further research, it turns out that SAVEDESTRUCTOR is in fact documented – in perlguts rather than perlapi. The exact semantics are documented there.
I therefore assume that SAVEDESTRUCTOR is supposed to be used as a "finally" block for cleanup, and is sufficiently safe and stable.
Excerpt from Localizing changes in perlguts, which discusses the equivalent to { local $foo; ... } blocks:
There is a way to achieve a similar task from C via Perl API: create a pseudo-block, and arrange for some changes to be automatically undone at the end of it, either explicit, or via a non-local exit (via die()). A block-like construct is created by a pair of ENTER/LEAVE macros (see Returning a Scalar in perlcall). Such a construct may be created specially for some important localized task, or an existing one (like boundaries of enclosing Perl subroutine/block, or an existing pair for freeing TMPs) may be used. (In the second case the overhead of additional localization must be almost negligible.) Note that any XSUB is automatically enclosed in an ENTER/LEAVE pair.
Inside such a pseudo-block the following service is available:
[…]
SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)
At the end of pseudo-block the function f is called with the only argument p.
SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)
At the end of pseudo-block the function f is called with the implicit context argument (if any), and p.
The section also lists a couple of specialized destructors, like SAVEFREESV(SV *sv) and SAVEMORTALIZESV(SV *sv) that may be more correct than a premature sv_2mortal() in some cases.
These macros have basically been available since effectively forever, at least Perl 5.6 or older.
I don't understand why there's a need for another level of indirection when releasing or acquiring the GVL in Ruby C API.
Both rb_thread_call_without_gvl() and rb_thread_call_with_gvl() require a function that accepts only one argument which isn't always the case.
I don't want to wrap my arguments in a struct just for the purpose of releasing the GVL. It complicates the code's readability and requires casting from and to void pointers.
After looking into Ruby's threading code I found the GVL_UNLOCK_BEGIN/GVL_UNLOCK_END macros that matches Python's Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS but I can't find documentation about them and when they are safe to use.
There's also the BLOCKING_REGION macro is used within rb_thread_call_without_gvl() but I'm not sure if it's safe to use it as a standalone without calling rb_thread_call_without_gvl() itself.
What is the correct way to safely release the GVL in the middle of the execution flow without having to call another function?
In Ruby 2.x, there is only the rb_thread_call_without_gvl API. GVL_UNLOCK_BEGIN and GVL_UNLOCK_END are implementation details that are only defined in thread.c, and are therefore unavailable to Ruby extensions. Thus, the direct answer to your question is "there is no way to correctly and safely release the GVL without calling another function".
There was previously a "region-based" API, rb_thread_blocking_region_begin/rb_thread_blocking_region_end, but this API was deprecated in Ruby 1.9.3 and removed in Ruby 2.2 (see https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/CAPI_obsolete_definitions for the CAPI deprecation schedule).
Therefore, unfortunately, you are stuck with rb_thread_call_without_gvl.
That said, there's a few things you could do to ease the pain. In standard C, converting between most pointers and void * is implicit, so you don't have to add a cast. Furthermore, using designated initializer syntax can simplify the creation of the argument structure.
Thus, you can write
struct my_func_args {
int arg1;
char *arg2;
};
void *func_no_gvl(void *data) {
struct my_func_args *args = data;
/* do stuff with args->arg... */
return NULL;
}
VALUE my_ruby_function(...) {
...
struct my_func_args args = {
// designated initializer syntax (C99) for cleaner code
.arg1 = ...,
.arg2 = ...,
};
// call without an unblock function
void *res = rb_thread_call_without_gvl(func_no_gvl, &args, NULL, NULL);
...
}
Although this doesn't solve your original problem, it does at least make it more tolerable (I hope).
What is the correct way to safely release the GVL in the middle of the
execution flow without having to call another function?
You must use the supplied API or whatever method you use will eventually break. The API to the GVL is defined in thread.h
void *rb_thread_call_with_gvl(void *(*func)(void *), void *data1);
void *rb_thread_call_without_gvl(void *(*func)(void *), void *data1,
rb_unblock_function_t *ubf, void *data2);
void *rb_thread_call_without_gvl2(void *(*func)(void *), void *data1,
rb_unblock_function_t *ubf, void *data2);
What you find in the header is an agreement between you the consumer of their API's and the author of the API's. Think of it as a contract. Anything you find in a .c in particular static methods and MACROS are not for consumption outside the file unless it's found in the header. The static keyword prevents this from happening, it's one of the reason it exists and it's most important use in C. The other items you mentioned are in thread.c. You can poke around in thread.c but using anything from it is a violation of the API's contract ie it's not safe and never will be.
I'm not suggesting you do this but the only way for you to do what you want is to copy portions of their implementation into your own code and this would not pass a code review. The amount of code you would need to copy out would likely dwarf anything you would need to do to use their API's safely.
I am writing a memory profiler for C and for that am intercepting calls to the malloc, realloc and free functions via malloc_hooks. Unfortunately, these are deprecated because of their poor behavior in multi threaded environments. I could not find a document describing the alternative best practice solution to achieve the same thing, can someone enlighten me?
I've read that a simple #define malloc(s) malloc_hook(s) would do the trick, but that does not work with the system setup I have in mind, because it is too intrusive to the original code base to be suitable for use in a profiling / tracing tool. Having to manually change the original application code is a killer for any decent profiler. Optimally, the solution I am looking for should be enabled or disabled just by linking to an optional shared library. For example, my current setup uses a function declared with __attribute__ ((constructor)) to install the intercepting malloc hooks.
Thanks
After trying some things, I finally managed to figure out how to do this.
First of all, in glibc, malloc is defined as a weak symbol, which means that it can be overwritten by the application or a shared library. Hence, LD_PRELOAD is not necessarily needed. Instead, I implemented the following function in a shared library:
void*
malloc (size_t size)
{
[ ... ]
}
Which gets called by the application instead of glibcs malloc.
Now, to be equivalent to the __malloc_hooks functionality, a couple of things are still missing.
1.) the caller address
In addition to the original parameters to malloc, glibcs __malloc_hooks also provide the address of the calling function, which is actually the return address of where malloc would return to. To achieve the same thing, we can use the __builtin_return_address function that is available in gcc. I have not looked into other compilers, because I am limited to gcc anyway, but if you happen to know how to do such a thing portably, please drop me a comment :)
Our malloc function now looks like this:
void*
malloc (size_t size)
{
void *caller = __builtin_return_address(0);
[ ... ]
}
2.) accessing glibcs malloc from within your hook
As I am limited to glibc in my application, I chose to use __libc_malloc to access the original malloc implementation. Alternatively, dlsym(RTLD_NEXT, "malloc") can be used, but at the possible pitfall that this function uses calloc on its first call, possibly resulting in an infinite loop leading to a segfault.
complete malloc hook
My complete hooking function now looks like this:
extern void *__libc_malloc(size_t size);
int malloc_hook_active = 0;
void*
malloc (size_t size)
{
void *caller = __builtin_return_address(0);
if (malloc_hook_active)
return my_malloc_hook(size, caller);
return __libc_malloc(size);
}
where my_malloc_hook looks like this:
void*
my_malloc_hook (size_t size, void *caller)
{
void *result;
// deactivate hooks for logging
malloc_hook_active = 0;
result = malloc(size);
// do logging
[ ... ]
// reactivate hooks
malloc_hook_active = 1;
return result;
}
Of course, the hooks for calloc, realloc and free work similarly.
dynamic and static linking
With these functions, dynamic linking works out of the box. Linking the .so file containing the malloc hook implementation will result of all calls to malloc from the application and also all library calls to be routed through my hook. Static linking is problematic though. I have not yet wrapped my head around it completely, but in static linking malloc is not a weak symbol, resulting in a multiple definition error at link time.
If you need static linking for whatever reason, for example translating function addresses in 3rd party libraries to code lines via debug symbols, then you can link these 3rd party libs statically while still linking the malloc hooks dynamically, avoiding the multiple definition problem. I have not yet found a better workaround for this, if you know one,feel free to leave me a comment.
Here is a short example:
gcc -o test test.c -lmalloc_hook_library -Wl,-Bstatic -l3rdparty -Wl,-Bdynamic
3rdparty will be linked statically, while malloc_hook_library will be linked dynamically, resulting in the expected behaviour, and addresses of functions in 3rdparty to be translatable via debug symbols in test. Pretty neat, huh?
Conlusion
the techniques above describe a non-deprecated, pretty much equivalent approach to __malloc_hooks, but with a couple of mean limitations:
__builtin_caller_address only works with gcc
__libc_malloc only works with glibc
dlsym(RTLD_NEXT, [...]) is a GNU extension in glibc
the linker flags -Wl,-Bstatic and -Wl,-Bdynamic are specific to the GNU binutils.
In other words, this solution is utterly non-portable and alternative solutions would have to be added if the hooks library were to be ported to a non-GNU operating system.
You can use LD_PRELOAD & dlsym
See "Tips for malloc and free" at http://www.slideshare.net/tetsu.koba/presentations
Just managed to NDK build code containing __malloc_hook.
Looks like it's been re-instated in Android API v28, according to https://android.googlesource.com/platform/bionic/+/master/libc/include/malloc.h, esp:
extern void* (*volatile __malloc_hook)(size_t __byte_count, const void* __caller) __INTRODUCED_IN(28);
I am studying on "reading code" by reading pieces of NetBSD source code.
(for whoever is interested, it's < Code Reading: The Open Source Perspective > I'm reading)
And I found this function:
/* convert IP address to a string, but not into a single buffer
*/
char *
naddr_ntoa(naddr a)
{
#define NUM_BUFS 4
static int bufno;
static struct {
char str[16]; /* xxx.xxx.xxx.xxx\0 */
} bufs[NUM_BUFS];
char *s;
struct in_addr addr;
addr.s_addr = a;
strlcpy(bufs[bufno].str, inet_ntoa(addr), sizeof(bufs[bufno].str));
s = bufs[bufno].str;
bufno = (bufno+1) % NUM_BUFS;
return s;
#undef NUM_BUFS
}
It introduces 4 different temporary buffers to wrap inet_ntoa function since inet_ntoa is not re-entrant.
But seems to me this naddr_ntoa function is also not re-entrant:
the static bufno variable can be manipulated by other so the temporary buffers do not seem work as expected here.
So is it a potential bug?
Yes, this is a potential bug. If you want a similar function that most likely reentrant you could use e.g. inet_ntop (which incidentally handles IPv6 as well).
That code comes from src/sbin/routed/trace.c and it is not a general library routine, but just a custom hack used only in the routed program. The addrname() function in the same file makes use of the same trick, for the same reason. It's not even NetBSD code per se, but rather it comes from SGI originally, and is maintained by Vernon Schryver (see The Routed Page).
It's just a quick hack to allow use of multiple calls within the same expression, such as where the results are being used in one printf() call: E.g.:
printf("addr1->%s, addr2->%s, addr3->%s, addr4->%s\n",
naddr_ntoa(addr1), naddr_ntoa(addr2), naddr_ntoa(addr3), naddr_ntoa(addr4));
There are several examples of similar uses in the routed source files (if.c, input.c, rdisc.c).
There is no bug in this code. The routed program is not multi-threaded. Reentrancy is not being addressed at all in this hack. This trick has been done by design for a very specific purpose that has nothing to do with reentrancy. The Code Reading author(s) is wrong to associate this trick with reentrancy.
It's simply a way to hide the saving of multiple results in an array of static variables instead of having to individually copy those results from one static variable into separate storage in the calling function when multiple results are required for a single expression.
Remember that static variables have all the properties of global variables except for the limited scope of their identifier. It is of course true that unprotected use of global (or static) variables inside a function make that function non-reentrant, but that's not the only problem global variables cause. Use of a fully-reentrant function would not be appropriate in routed because it would actually make the code more complex than necessary, whereas this hack keeps the calling code clean and simple. It would though have been better for the hack to be properly documented such that future maintainers would more easily spot when NUM_BUFS has to be adjusted.