The descriptions seem virtually identical. Are there any nuances between the two that should be noted? Why would someone use one over the other? The same question applies to Tcl_Alloc() and malloc().
They're used because Tcl supports being built on Windows with one toolchain and then loading a DLL built with a different toolchain. In that scenario it is fairly common for each toolchain to have its own implementation of the C library, and therefore its own implementation of malloc(). You must match malloc() and free() to the same library, or you get some truly weird failures (crashes, memory leaks, etc.). By providing Tcl_Alloc and Tcl_Free (which are usually very thin wrappers), Tcl makes it possible for user code to match up allocations and releases correctly.
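A minimal sketch of the general pattern, using hypothetical names (mylib_alloc, mylib_free and mylib_strdup are illustrations, not Tcl's API): a library that hands out memory also exports the matching release function, so a caller built with a different C runtime never passes the library's memory to its own free(). Tcl_Alloc and Tcl_Free play these roles for Tcl.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library-side allocation pair. Both are compiled into the
   library, so they always use the same C runtime's malloc()/free(). */
void *mylib_alloc(size_t size)
{
    return malloc(size);
}

void mylib_free(void *ptr)
{
    free(ptr);
}

/* A library function returning memory that the caller must release with
   mylib_free(), never with the caller's own free(). */
char *mylib_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = mylib_alloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

A caller in a separately built DLL would write `char *s = mylib_strdup("hello"); /* ... */ mylib_free(s);` and never call its own free() on s.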
Normally, the best-understood reason to use your own version of the memory allocation functions is to have a single definition point, which lets you swap the memory allocator for a different one (a debugging allocator, an extended one, one with security features, etc.).
Suppose you have the following implementation, defined in allocator_malloc.c:
#include <stdlib.h>

void *my_malloc(size_t siz)
{
    return malloc(siz);
}

void my_free(void *ptr)
{
    free(ptr);
}
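For a special customer X, you have licensed the new ACME allocator. For this customer you link your executable with the file allocator_ACME.c, which contains:
#include <stddef.h>

/* Declarations provided by the ACME allocator library */
void *ACME_malloc(size_t siz);
void ACME_free(void *ptr);

void *my_malloc(size_t siz)
{
    return ACME_malloc(siz);
}

void my_free(void *ptr)   /* must be my_free, not free, to match the interface */
{
    ACME_free(ptr);
}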
By linking your executable with one file or the other, you create a dependency either on the standard library's malloc() or on the ACME library, which must then provide ACME_malloc() and ACME_free(). Simply changing which of several object files is linked in switches the whole set of dependencies to a different implementation (provided each file defines both my_malloc() and my_free()).
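As an illustration of the "debugging allocator" case, here is a sketch of a hypothetical third backend, allocator_debug.c, for the same two-function interface. It forwards to malloc()/free() but counts live allocations so leaks can be reported; my_live_allocations() is an invented helper, and the counter is not thread safe as written.

```c
#include <stdlib.h>

/* allocator_debug.c (hypothetical): same interface, extra bookkeeping. */
static long live_allocations = 0;

void *my_malloc(size_t siz)
{
    void *p = malloc(siz);
    if (p != NULL)
        live_allocations++;   /* count only successful allocations */
    return p;
}

void my_free(void *ptr)
{
    if (ptr != NULL)
        live_allocations--;   /* free(NULL) is a no-op, so don't count it */
    free(ptr);
}

/* Hypothetical helper: e.g. report a non-zero value at exit as a leak. */
long my_live_allocations(void)
{
    return live_allocations;
}
```

The application code calling my_malloc()/my_free() is unchanged; only the object file linked in differs.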
The drawback is that you pay for one extra level of function call, so in some cases a more sophisticated solution is needed.
Now assume you buy an automatic garbage collector, so you no longer need to release the memory you allocate: as if by magic, the library detects that a block is no longer used and reclaims it automatically:
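#include <gc.h>   /* assuming the Boehm-Demers-Weiser collector, which provides GC_malloc() */

void *my_malloc(size_t siz)
{
    return GC_malloc(siz);
}

void my_free(void *ptr)
{
    (void)ptr;   /* nothing to do: the collector reclaims unreachable blocks */
}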
I have two object files, "A.o" and "B.o", mapped in memory, with the same CPU instruction set (not necessarily Intel: it can be x86, x86_64, MIPS(32/64), ARM(32/64), PowerPC(32/64), ..., but always the same in both object files).
Also, both object files are compiled with the same endianness (both little endian, or both big endian).
However (you knew there was a however, otherwise there wouldn't be any question), "A.o" and "B.o" can have a different function calling convention and, to make things worse, unknown to each other ("A.o" has not even the slightest idea about the calling convention for functions in "B.o", and vice versa).
"A.o" and "B.o" are obviously designed to call functions within their same object file, but there must be a (very) limited interface for communicating between them (otherwise, if execution starts at some function in "A.o", no function from "B.o" would ever be executed if there was no such interface).
The file where execution started (let's suppose it's "A.o") knows the addresses of all static symbols from "B.o" (the addresses of all functions and all global variables). But the opposite is not true (well, the limited interface I'm trying to write would overcome that, but "B.o" doesn't know any address from "A.o" before such interface is established).
Finally the question: How can execution jump from a function in "A.o" to a function in "B.o", and back, while also communicating some data?
I need it to:
Be done in standard C (no assembly).
Be portable C (not compiler-dependent, nor CPU-dependent).
Be thread safe.
Make no assumptions about the calling conventions involved.
Be able to communicate data between the two object files.
My best idea so far seems to meet all of these requirements except thread safety. For example, suppose I define a struct like this:
struct data_interface {
    int value_in;
    int value_out;
};
I could write a pointer to such a struct from "A.o" into a global variable of "B.o" (knowing in advance that this global variable in "B.o" is large enough to store a pointer).
Then, the interface function would be a void interface(void) (I'm assuming that calling void(void) functions is safe across different calling conventions... if this is not true, then my idea wouldn't work). Calling such a function from "A.o" to "B.o" would communicate the data to the code in "B.o". And, fingers crossed, when the called function in "B.o" returns, it would travel back nicely (supposing the different calling convention doesn't change the behaviour when returning from void(void) functions).
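A single-file simulation of this idea, as a sketch: in reality the "B" part and the "A" part would be compiled separately with different conventions, which a single file cannot reproduce. The names b_slot and a_call_b, and the doubling "work", are hypothetical.

```c
#include <stddef.h>

struct data_interface {
    int value_in;
    int value_out;
};

/* --- B side --- */
/* Global slot in B, large enough for a pointer, that A fills before calling. */
struct data_interface *b_slot = NULL;

void interface(void)
{
    /* B reads its input from, and writes its output to, the shared struct. */
    b_slot->value_out = b_slot->value_in * 2;   /* placeholder work */
}

/* --- A side --- */
int a_call_b(int in)
{
    struct data_interface d = { in, 0 };
    b_slot = &d;     /* not thread safe: every caller shares b_slot */
    interface();
    b_slot = NULL;
    return d.value_out;
}
```

The shared b_slot is exactly what breaks thread safety: two threads calling a_call_b() at once can overwrite each other's pointer.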
However, this is not thread safe, of course.
For it to be thread safe, I guess my only option is to access the stack.
But... can the stack be accessed in a portable way in standard C?
Here are two suggestions.
Data interface
This elaborates on the struct you defined yourself. From what I've seen in the past, compilers typically use a single register (e.g. eax) for their return value, provided the return type fits in a register. My guess is that the following function prototype is therefore likely to be unaffected by differing calling conventions.
struct data_interface *get_empty_data_interface(void);
If so, then you could use that in a way that is similar to the idea you already had about using arrays. Define the following struct and functions in B:
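#include <stdlib.h>

struct data_interface {
    int ready;
    int the_real_data;
};

/* Placeholders for B's (thread-safe) list of data blocks. */
void add_to_list_of_data_block_pointers(struct data_interface *ptr);
void execute_functionality_for_every_data_block_in_my_list_that_is_flagged_ready_and_remove_from_list(void);

struct data_interface *get_empty_data_interface(void)
{
    struct data_interface *ptr = malloc(sizeof(struct data_interface));
    if (ptr == NULL)
        return NULL;
    ptr->ready = 0;   /* initialise before the block becomes visible in the list */
    add_to_list_of_data_block_pointers(ptr);
    return ptr;
}

void the_function(void)
{
    execute_functionality_for_every_data_block_in_my_list_that_is_flagged_ready_and_remove_from_list();
}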
To call the function, do this in A:
struct data_interface *ptr = get_empty_data_interface();
ptr->the_real_data = 12345;
ptr->ready = 1;
the_function();
For thread-safety, make sure the list of data blocks maintained by B is thread-safe.
Simultaneous calls to get_empty_data_interface should not overwrite each other's slot in the list.
Simultaneous calls to the_function should not both pick up the same list element.
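One way to meet both conditions, sketched with C11 atomics only (no OS threading API), so it stays within standard C. Assumptions beyond the answer: `ready` becomes an atomic_int so A can publish a block without taking B's lock, a hypothetical `next` field links B's list, and the per-block "work" is a placeholder that adds the_real_data to a running total read through the hypothetical b_get_total().

```c
#include <stdatomic.h>
#include <stdlib.h>

struct data_interface {
    atomic_int ready;
    int the_real_data;
    struct data_interface *next;   /* hypothetical link for B's list */
};

static struct data_interface *list_head = NULL;
static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static atomic_long b_total = 0;

static void lock_list(void)
{
    while (atomic_flag_test_and_set(&list_lock))
        ;   /* spin: the critical sections below are short */
}

static void unlock_list(void)
{
    atomic_flag_clear(&list_lock);
}

struct data_interface *get_empty_data_interface(void)
{
    struct data_interface *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    atomic_init(&ptr->ready, 0);   /* initialise before it becomes visible */
    ptr->the_real_data = 0;
    lock_list();
    ptr->next = list_head;         /* every caller gets its own node */
    list_head = ptr;
    unlock_list();
    return ptr;
}

void the_function(void)
{
    struct data_interface *mine = NULL;
    struct data_interface **pp;

    /* Unlink all ready blocks under the lock, so concurrent callers can
       never pick up the same block. */
    lock_list();
    pp = &list_head;
    while (*pp != NULL) {
        struct data_interface *cur = *pp;
        if (atomic_load(&cur->ready)) {
            *pp = cur->next;
            cur->next = mine;
            mine = cur;
        } else {
            pp = &cur->next;
        }
    }
    unlock_list();

    /* Process and release the detached blocks outside the lock. */
    while (mine != NULL) {
        struct data_interface *next = mine->next;
        atomic_fetch_add(&b_total, mine->the_real_data);
        free(mine);
        mine = next;
    }
}

long b_get_total(void)
{
    return atomic_load(&b_total);
}
```

On A's side the only change is publishing with `atomic_store(&ptr->ready, 1);` after writing the_real_data, which guarantees B sees the data once it sees the flag. This still assumes both object files agree on the representation of atomic_int, which is itself an implementation detail.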
Wrapper functions
You could try to expose wrapper functions with a well-known calling convention (e.g. cdecl); if necessary defined in a separate object file that is aware of the calling convention of the functions it wraps.
Unfortunately you will probably need non-portable function attributes for this.
You may be able to cheat your way out of it by declaring variadic wrapper functions (with an ellipsis parameter, like printf has); compilers are likely to fall back on cdecl for those. This eliminates non-portable function attributes, but it may be unreliable; you would have to verify my assumption for every compiler you'd like to support. When testing this, keep in mind that compiler options (in particular optimizations) may well play a role. All in all, quite a dirty approach.
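A sketch of what such a variadic entry point might look like on the C side, with a made-up operation selector (b_entry, B_OP_ADD and B_OP_MUL are hypothetical). It only shows how arguments are unpacked with <stdarg.h>; it proves nothing about which calling convention a given compiler actually uses, which is the part you would still have to verify.

```c
#include <stdarg.h>

/* Hypothetical operation codes understood by B's single entry point. */
enum { B_OP_ADD = 1, B_OP_MUL = 2 };

int b_entry(int op, ...)
{
    va_list ap;
    int result = 0;

    va_start(ap, op);
    switch (op) {
    case B_OP_ADD: {
        int x = va_arg(ap, int);
        int y = va_arg(ap, int);
        result = x + y;
        break;
    }
    case B_OP_MUL: {
        int x = va_arg(ap, int);
        int y = va_arg(ap, int);
        result = x * y;
        break;
    }
    default:
        break;   /* unknown operation: return 0 */
    }
    va_end(ap);
    return result;
}
```

A would then call, for example, `b_entry(B_OP_ADD, 2, 3)`, with the argument types agreed on by convention between the two sides, since a variadic function cannot check them.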
The question implies that both object files are compiled differently except for endianness, and that they are linked together into one executable.
It says that A.o knows all static symbols from B.o, but the opposite is not true.
Don't make any assumption about the calling conventions involved.
So we'll use only functions of type void f(void).
You declare int X, Y; in B.o and extern int X, Y; in A.o. Before calling a function in B.o, A checks the Y flag; if it is raised, A waits until it falls. When a B function is called, it raises the Y flag, reads its input from X, does some calculations, writes the result back to X, and returns.
The calling function in A.o then copies the value from X into its own compilation unit and clears the Y flag.
...if calling a void f(void) function just makes a wild jump from one point in the code to another.
Another way to do it would be to declare static int Y = 0; in B.o and omit it entirely from A.o.
Then, when a B.o function is called, it checks whether Y == 0; if so, it increments Y, reads X, does its calculations, writes X, decrements Y, and returns. If not, it waits for Y to become 0, blocking the calling function.
You could even have a static flag in every B.o function, but I don't see the point of that waste, since the communication data is global in B.o anyway.
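With plain int flags, "check Y, then raise Y" is two separate steps, so two threads can both see Y lowered. A sketch of the first variant with that gap closed, assuming C11 atomics are acceptable: Y becomes an atomic_int, and A claims it with a compare-and-swap and holds it from before writing X until after reading the result. The function names and the squaring "work" are hypothetical.

```c
#include <stdatomic.h>

/* --- B.o --- */
int X;
atomic_int Y = 0;

void b_square(void)
{
    X = X * X;   /* read input from X, write the result back to X */
}

/* --- A.o (would declare: extern int X; extern atomic_int Y;) --- */
int a_call_square(int input)
{
    int expected = 0;
    int result;

    /* Wait until Y is 0, then set it to 1 in one indivisible step. */
    while (!atomic_compare_exchange_weak(&Y, &expected, 1))
        expected = 0;   /* a failed CAS stores Y's current value here */

    X = input;
    b_square();
    result = X;

    atomic_store(&Y, 0);   /* lower the flag for the next caller */
    return result;
}
```

Holding the flag across the write of X, the call and the read-back is what prevents another thread from overwriting X in between.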
Remember that there are both caller-saved and callee-saved register conventions out there, together with variations in the use of registers to pass values, the use (or not) of a frame pointer, and even (on some architectures, at some optimisation levels) the use of a branch delay slot to hold the first instruction of the subroutine. You are not going to be able to do this without some knowledge of the calling conventions in play, but fortunately the linker needs that knowledge anyway. Presumably there is some higher-level entity that is responsible for loading those DLLs and that knows the calling conventions of both?
Anything you do here is going to be, at best, deep in implementation-defined territory, if not technically undefined behaviour, and you will want to make a deep study of the linker and loader. In particular, the linker must know how to resolve dynamic linkage in your unknown calling convention, or you will not be able to load that shared object in a meaningful way; you may be able to leverage it using libbfd or similar, but that is outside the scope of C.
The place where this sort of thing can go very wrong is when shared resources are allocated in A and freed in B. Memory springs to mind: memory management is usually a library-based wrapper over the operating system's sbrk or similar, and different implementations of it are not inherently compatible in memory layout. Other places where this may bite you include I/O (see the shenanigans you sometimes get when mixing printf and cout in C++ for a benign example) and locking.
Let's assume I have a C structure, DynApiArg_t.
typedef struct DynApiArg_s {
uint32_t m1;
...
uint32_t mx;
} DynApiArg_t;
A pointer to this struct is passed as an argument to a function, say
void DynLibApi(DynApiArg_t *arg)
{
arg->m1 = 0;
another_fn_in_the_lib(arg->mold); /* May crash here. (1) */
}
which lives in a dynamic library, libdyn.so. This API is invoked from an executable via dlopen()/dlsym().
Now suppose this dynamic library is updated to version 2, where DynApiArg_t has new members, such as m2, as below:
typedef struct DynApiArg_s {
uint32_t m1;
OldMbr_t *mold;
...
uint32_t mx;
uint32_t m2;
NewMbr *mnew;
} DynApiArg_t;
Without a complete rebuild of the executable, or of the other libraries that call this API via dlopen/dlsym, the process crashes every time this API is invoked, due to a dereference of some member of the struct. I understand that accessing m2 may be a problem, but even accessing the existing member mold, as below, causes crashes.
#include <dlfcn.h>
typedef void (*fnPtr_t)(DynApiArg_t *);
void DynApiCaller(DynApiArg_t *arg)
{
    void *libhdl = dlopen("libdyn.so", RTLD_LAZY | RTLD_GLOBAL);
    if (libhdl == NULL)
        return; /* dlerror() describes the failure */
    fnPtr_t fptr = (fnPtr_t)dlsym(libhdl, "DynLibApi");
    if (fptr != NULL)
        fptr(arg); /* actual call to the dynamically loaded API (2) */
}
In the call through fptr at the line marked (2), when an old/existing member (one present in v1 of the library, against which DynApiCaller was originally compiled) is accessed at (1), it holds a garbage value, or even NULL at times.
What is the right way to handle such updates without recompiling the executable every time the libraries it depends on are updated?
I've seen libraries named through symlinks with version numbers, like libsolid.so.4. Is there something in this versioning system that can help me? If so, can you point me to the right documentation?
There are a number of approaches to solve this problem:
Include the API version in the dynamic library name.
Instead of dlopen("libfoo.so"), you use dlopen("libfoo.so.4"). Different major versions of the library are essentially separate, and can coexist on the same system; so, the package name for that library would be e.g. libfoo-4. You can have libfoo.so.4 and libfoo.so.5 installed at the same time. Minor versions, say libfoo-4.2, install libfoo.so.4.2, and symlink libfoo.so.4 to libfoo.so.4.2.
Initially define the structures with zero padding (required to be zero in earlier versions of the library), and have the later versions reuse the padding fields, but keeping the structures the same size.
Use versioned symbol names. This is a Linux extension, using dlvsym(). A single shared library binary can implement several versions of the same dynamic symbol.
Use resolver functions to determine the symbols at load time. This allows e.g. hardware architecture-optimized variants of functions to be selected at run time, but is less useful with a dlopen()-based approach.
Use a structure to describe the library API, and a versioned function to obtain/initialize that API.
For example, version 4 of your library could implement
struct libfoo_api {
int (*func1)(int arg1, int arg2);
double *data;
void (*func2)(void);
/* ... */
};
and only export one symbol,
int libfoo_init(struct libfoo_api *const api, const int version);
Calling that function would initialize the api structure with the symbols supported, with the assumption that the structure corresponds to the specified version. A single shared library can support multiple versions. If a version is not supported, it can return a failure.
This is especially useful for plugin-type interfaces (although there the _init function is more likely to call application-provided registration functions than to fill in a structure), as a single file can contain optimized functionality for a number of versions, optimized for a number of compatible hardware architectures (for example, AMD/Intel architectures with different SSE/AVX/AVX2/AVX512 support).
Note that the above implementation details can be "hidden" in a header file, making the actual C code that uses the shared library much simpler. It also helps make the same API work across a number of OSes, simply by changing the header file to use the approach that works best on each OS, while keeping the actual C interface the same.
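A minimal library-side sketch of libfoo_init, assuming this build supports only version 4; func1_v4, func2_v4 and libfoo_data are hypothetical placeholders. An application would dlsym() only "libfoo_init" and reach everything else through the filled-in structure.

```c
#include <stddef.h>

struct libfoo_api {
    int (*func1)(int arg1, int arg2);
    double *data;
    void (*func2)(void);
};

/* Hypothetical version 4 implementations. */
static double libfoo_data[4] = { 1.0, 2.0, 3.0, 4.0 };
static int func1_v4(int arg1, int arg2) { return arg1 + arg2; }
static void func2_v4(void) { /* nothing to do in this sketch */ }

int libfoo_init(struct libfoo_api *const api, const int version)
{
    if (api == NULL)
        return -1;
    switch (version) {
    case 4:
        api->func1 = func1_v4;
        api->data = libfoo_data;
        api->func2 = func2_v4;
        return 0;
    default:
        return -1;   /* this build does not support the requested version */
    }
}
```

Supporting version 5 later means adding a case that fills in the version 5 structure, while old callers asking for version 4 keep getting exactly the layout they were compiled against.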
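For the zero-padding approach, a sketch of how a layout can evolve without changing size or offsets. The names foo_args_v1, foo_args_v2 and effective_timeout are hypothetical; in a real library both versions would share one struct name, and the two definitions only show the layout before and after. Callers are assumed to zero-initialise the struct, for example with `= {0}`.

```c
#include <stddef.h>
#include <stdint.h>

/* Version 1: two reserved fields that callers must leave zero. */
struct foo_args_v1 {
    uint32_t flags;
    uint32_t count;
    uint32_t reserved[2];    /* must be zero in version 1 */
};

/* Version 2: reserved[0] becomes `timeout`; size and old offsets unchanged. */
struct foo_args_v2 {
    uint32_t flags;
    uint32_t count;
    uint32_t timeout;        /* was reserved[0]; 0 means "no timeout" */
    uint32_t reserved[1];
};

_Static_assert(sizeof(struct foo_args_v1) == sizeof(struct foo_args_v2),
               "the structure size must not change between versions");
_Static_assert(offsetof(struct foo_args_v1, count) ==
               offsetof(struct foo_args_v2, count),
               "existing members must keep their offsets");

/* The version 2 library treats the zero an old caller left in the former
   reserved field as the version 1 behaviour (no timeout). */
uint32_t effective_timeout(const struct foo_args_v2 *a)
{
    return a->timeout == 0 ? UINT32_MAX : a->timeout;
}
```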
I don't understand why there's a need for another level of indirection when releasing or acquiring the GVL in the Ruby C API.
Both rb_thread_call_without_gvl() and rb_thread_call_with_gvl() require a function that accepts exactly one argument, which doesn't always fit my code.
I don't want to wrap my arguments in a struct just for the purpose of releasing the GVL. It hurts the code's readability and requires casting to and from void pointers.
After looking into Ruby's threading code I found the GVL_UNLOCK_BEGIN/GVL_UNLOCK_END macros, which match Python's Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS, but I can't find documentation about them or about when they are safe to use.
There's also the BLOCKING_REGION macro, which is used within rb_thread_call_without_gvl(), but I'm not sure whether it's safe to use on its own without calling rb_thread_call_without_gvl() itself.
What is the correct way to safely release the GVL in the middle of the execution flow without having to call another function?
In Ruby 2.x, there is only the rb_thread_call_without_gvl API. GVL_UNLOCK_BEGIN and GVL_UNLOCK_END are implementation details that are only defined in thread.c, and are therefore unavailable to Ruby extensions. Thus, the direct answer to your question is "there is no way to correctly and safely release the GVL without calling another function".
There was previously a "region-based" API, rb_thread_blocking_region_begin/rb_thread_blocking_region_end, but this API was deprecated in Ruby 1.9.3 and removed in Ruby 2.2 (see https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/CAPI_obsolete_definitions for the CAPI deprecation schedule).
Therefore, unfortunately, you are stuck with rb_thread_call_without_gvl.
That said, there are a few things you can do to ease the pain. In standard C, conversion between void * and any object pointer type is implicit, so you don't need a cast. Furthermore, designated initializer syntax can simplify creating the argument structure.
Thus, you can write
#include <ruby.h>
#include <ruby/thread.h>  /* declares rb_thread_call_without_gvl() */

struct my_func_args {
int arg1;
char *arg2;
};
void *func_no_gvl(void *data) {
struct my_func_args *args = data;
/* do stuff with args->arg... */
return NULL;
}
VALUE my_ruby_function(...) {
...
struct my_func_args args = {
// designated initializer syntax (C99) for cleaner code
.arg1 = ...,
.arg2 = ...,
};
// call without an unblock function
void *res = rb_thread_call_without_gvl(func_no_gvl, &args, NULL, NULL);
...
}
Although this doesn't solve your original problem, it does at least make it more tolerable (I hope).
What is the correct way to safely release the GVL in the middle of the execution flow without having to call another function?
You must use the supplied API, or whatever method you use will eventually break. The GVL API is declared in ruby/thread.h:
void *rb_thread_call_with_gvl(void *(*func)(void *), void *data1);
void *rb_thread_call_without_gvl(void *(*func)(void *), void *data1,
rb_unblock_function_t *ubf, void *data2);
void *rb_thread_call_without_gvl2(void *(*func)(void *), void *data1,
rb_unblock_function_t *ubf, void *data2);
What you find in the header is an agreement between you, the consumer of the APIs, and the author of the APIs. Think of it as a contract. Anything you find in a .c file, in particular static functions and macros, is not meant for use outside that file unless it also appears in the header. The static keyword enforces this for functions; that is one of the reasons it exists, and its most important use in C. The other items you mentioned are in thread.c. You can poke around in thread.c, but using anything from it violates the API's contract, i.e. it's not safe and never will be.
I'm not suggesting you do this, but the only way for you to do what you want is to copy portions of their implementation into your own code, and that would not pass a code review. The amount of code you would need to copy would likely dwarf anything you would need to write to use their APIs safely.
I am writing a memory profiler for C, and for that I intercept calls to malloc, realloc and free via the glibc malloc hooks (__malloc_hook and friends). Unfortunately, these are deprecated because of their poor behavior in multithreaded environments. I could not find a document describing the recommended alternative for achieving the same thing; can someone enlighten me?
I've read that a simple #define malloc(s) malloc_hook(s) would do the trick, but that does not work with the system setup I have in mind, because it is too intrusive to the original code base to be suitable for use in a profiling / tracing tool. Having to manually change the original application code is a killer for any decent profiler. Optimally, the solution I am looking for should be enabled or disabled just by linking to an optional shared library. For example, my current setup uses a function declared with __attribute__ ((constructor)) to install the intercepting malloc hooks.
After trying some things, I finally managed to figure out how to do this.
First of all, in glibc, malloc is defined as a weak symbol, which means that it can be overridden by the application or a shared library. Hence, LD_PRELOAD is not necessarily needed. Instead, I implemented the following function in a shared library:
void*
malloc (size_t size)
{
[ ... ]
}
which gets called by the application instead of glibc's malloc.
Now, to be equivalent to the __malloc_hooks functionality, a couple of things are still missing.
1.) the caller address
In addition to the original parameters to malloc, glibc's __malloc_hook also provides the address of the caller, which is actually the return address that malloc would return to. To achieve the same thing, we can use the __builtin_return_address function available in gcc. I have not looked into other compilers, because I am limited to gcc anyway, but if you happen to know how to do this portably, please drop me a comment :)
Our malloc function now looks like this:
void*
malloc (size_t size)
{
void *caller = __builtin_return_address(0);
[ ... ]
}
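To see __builtin_return_address(0) in isolation, here is a sketch of a tracing wrapper that does not replace malloc itself; traced_malloc, trace_log and TRACE_LOG_SIZE are hypothetical names, and the fixed-size log stands in for real profiler bookkeeping. It requires GCC or a compiler that supports the same builtins and attributes (Clang does).

```c
#include <stdlib.h>

#define TRACE_LOG_SIZE 64

struct trace_entry {
    void *caller;
    size_t size;
};

static struct trace_entry trace_log[TRACE_LOG_SIZE];
static size_t trace_count = 0;   /* not thread safe, like the original flag */

/* noinline keeps a real call frame, so the return address is meaningful. */
__attribute__((noinline))
void *traced_malloc(size_t size)
{
    /* Level 0 = the address this function will return to in its caller. */
    void *caller = __builtin_return_address(0);
    void *p = malloc(size);

    if (trace_count < TRACE_LOG_SIZE) {
        trace_log[trace_count].caller = caller;
        trace_log[trace_count].size = size;
        trace_count++;
    }
    return p;
}
```

Resolving the recorded addresses to source lines is then a job for debug symbols, e.g. with addr2line.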
2.) accessing glibc's malloc from within your hook
As I am limited to glibc in my application, I chose to use __libc_malloc to access the original malloc implementation. Alternatively, dlsym(RTLD_NEXT, "malloc") can be used, with the pitfall that dlsym may call calloc on its first call, possibly resulting in infinite recursion and a segfault.
complete malloc hook
My complete hooking function now looks like this:
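#include <stddef.h>

extern void *__libc_malloc(size_t size);
void *my_malloc_hook(size_t size, void *caller);   /* defined below */

int malloc_hook_active = 0;   /* presumably set to 1 by the profiler's setup, e.g. a constructor */

void*
malloc (size_t size)
{
    void *caller = __builtin_return_address(0);
    if (malloc_hook_active)
        return my_malloc_hook(size, caller);
    return __libc_malloc(size);
}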
where my_malloc_hook looks like this:
void*
my_malloc_hook (size_t size, void *caller)
{
void *result;
// deactivate hooks for logging
malloc_hook_active = 0;
result = malloc(size);
// do logging
[ ... ]
// reactivate hooks
malloc_hook_active = 1;
return result;
}
Of course, the hooks for calloc, realloc and free work similarly.
dynamic and static linking
With these functions, dynamic linking works out of the box. Linking the .so file containing the malloc hook implementation causes all calls to malloc from the application, and also from libraries, to be routed through my hook. Static linking is problematic, though. I have not yet wrapped my head around it completely, but with static linking malloc is not a weak symbol, which results in a multiple definition error at link time.
If you need static linking for whatever reason, for example to translate function addresses in 3rd-party libraries to code lines via debug symbols, you can link these 3rd-party libraries statically while still linking the malloc hooks dynamically, which avoids the multiple definition problem. I have not yet found a better workaround for this; if you know one, feel free to leave me a comment.
Here is a short example:
gcc -o test test.c -lmalloc_hook_library -Wl,-Bstatic -l3rdparty -Wl,-Bdynamic
3rdparty will be linked statically, while malloc_hook_library will be linked dynamically, giving the expected behaviour, with the addresses of functions in 3rdparty translatable via the debug symbols in test. Pretty neat, huh?
Conclusion
The techniques above describe a non-deprecated, pretty much equivalent approach to __malloc_hook, but with a couple of mean limitations:
__builtin_return_address only works with gcc (and compilers that support GCC builtins, such as clang)
__libc_malloc only works with glibc
dlsym(RTLD_NEXT, [...]) is a GNU extension in glibc
the linker flags -Wl,-Bstatic and -Wl,-Bdynamic are specific to the GNU binutils.
In other words, this solution is utterly non-portable and alternative solutions would have to be added if the hooks library were to be ported to a non-GNU operating system.
You can use LD_PRELOAD & dlsym
See "Tips for malloc and free" at http://www.slideshare.net/tetsu.koba/presentations
I just managed to build code containing __malloc_hook with the NDK.
It looks like it has been reinstated in Android API level 28, according to https://android.googlesource.com/platform/bionic/+/master/libc/include/malloc.h, especially:
extern void* (*volatile __malloc_hook)(size_t __byte_count, const void* __caller) __INTRODUCED_IN(28);
I'm creating a cross platform library using C. I have a piece of code like the following, in which I'm using the libc memory management functions directly:
myObject* myObjectCreate(void)
{
...
myObject *pObject = (myObject*)malloc(sizeof(*pObject));
...
}
void myObjectDestroy(myObject *pObject)
{
...
free(pObject);
...
}
I understand these memory management functions are not always available, especially on embedded systems based on low-end microcontrollers. Unfortunately my library needs to be compilable on these systems.
To work around this problem, I suppose I'd have to make these functions customisable by my library client.
So, what are the recommended ways to achieve this?
There are many approaches.
I use #if, combined with compiler-provided defines, to get per-platform behaviour.
If a given piece of functionality (such as malloc) is available, MYLIB_MALLOC can be #defined.
Then, later, you can check for #ifdef MYLIB_MALLOC and if not present, provide a dummy malloc function, which will allow your code to compile.
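A sketch of that idea, assuming a hypothetical MYLIB_NO_MALLOC macro that the build (or a platform check on compiler-provided macros) defines when no malloc() exists. Instead of a do-nothing dummy, the fallback here is a tiny bump allocator over a static pool, which never reuses memory; the pool size and the 8-byte alignment are assumptions to adjust per target. mylib_alloc and mylib_release are hypothetical names.

```c
#include <stddef.h>

#ifndef MYLIB_NO_MALLOC
#include <stdlib.h>
#define MYLIB_MALLOC 1
#endif

#ifdef MYLIB_MALLOC
/* Hosted build: forward to the C library. */
static void *mylib_alloc(size_t n) { return malloc(n); }
static void mylib_release(void *p) { free(p); }
#else
/* Fallback for targets without malloc(): a bump allocator over a static
   pool. Memory is never reused, which can be acceptable for objects
   created once at startup. */
#define MYLIB_POOL_SIZE 1024
static _Alignas(8) unsigned char mylib_pool[MYLIB_POOL_SIZE];
static size_t mylib_pool_used = 0;

static void *mylib_alloc(size_t n)
{
    size_t rounded = (n + 7u) & ~(size_t)7u;   /* keep 8-byte alignment */
    if (rounded > MYLIB_POOL_SIZE - mylib_pool_used)
        return NULL;
    void *p = &mylib_pool[mylib_pool_used];
    mylib_pool_used += rounded;
    return p;
}

static void mylib_release(void *p) { (void)p; /* not supported */ }
#endif
```

Library code then calls mylib_alloc()/mylib_release() everywhere, and only this block changes between platforms.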
Use function pointers.
Define the following pointers in the library:
void* (*CustomMalloc)(size_t) = NULL;
void (*CustomFree)(void*) = NULL;
Before using any of the library functions, initialize these pointers to point to custom implementations of malloc() and free(), or to the real malloc() and free().
Inside of the library replace malloc(size) with CustomMalloc(size) and free(pointer) with CustomFree(pointer).
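A sketch of the function-pointer approach with a setter, using hypothetical names (mylib_set_allocator and the contents of myObject are illustrations). One deliberate difference from the NULL defaults above: here the pointers default to the C library's malloc() and free(), which suits hosted builds; on a target without them you would default to NULL and require the client to call the setter first.

```c
#include <stdlib.h>

static void *(*CustomMalloc)(size_t) = malloc;
static void (*CustomFree)(void *) = free;

/* Called by the library client before creating any objects. */
void mylib_set_allocator(void *(*alloc_fn)(size_t), void (*free_fn)(void *))
{
    CustomMalloc = alloc_fn;
    CustomFree = free_fn;
}

typedef struct myObject {
    int value;   /* hypothetical contents */
} myObject;

myObject *myObjectCreate(void)
{
    myObject *pObject = CustomMalloc(sizeof(*pObject));
    if (pObject != NULL)
        pObject->value = 0;
    return pObject;
}

void myObjectDestroy(myObject *pObject)
{
    CustomFree(pObject);
}
```

Keeping the pointers static behind a setter means the client cannot install only one half of the pair by accident.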
Use conditional compilation, i.e. define some macros like LIBC_AVAIL or LIBC_NOT_AVAIL and include different code depending on which one is defined.