Which Unix don't have a thread-safe malloc? - c

I want my C program to be portable even to very old Unix systems, but the problem is that I'm using pthreads and dynamic allocation (malloc). All the Unixes I know of have a thread-safe malloc (Linux, *BSD, Irix, Solaris); however, this is not guaranteed by the C standard, and I'm sure there are very old versions where this is not true.
So, is there some list of platforms on which I'd need to wrap malloc() calls with a mutex lock? I plan to write a ./configure test that checks whether the current platform is in that list.
The other alternative would be to test malloc() for thread-safety, but I know of no deterministic way to do this. Any ideas on this one too?

The only C standard that has threads (and thus is relevant to your question) is C11, which states:
For purposes of determining the existence of a data race, memory
allocation functions behave as though they accessed only memory
locations accessible through their arguments and not other static
duration storage.
Or in other words, as long as two threads don't pass the same address to realloc or free, all calls to the memory functions are thread-safe.
For POSIX, which covers all the Unixes you can find nowadays, you have:
Each function defined in the System Interfaces volume of IEEE Std 1003.1-2001 is thread-safe unless explicitly stated otherwise.
I don't know where you get the assertion that malloc wouldn't be thread-safe on older Unixes; a system with threads whose malloc is not thread-safe is pretty much useless. What might be a problem on such an older system is performance, but it should always be functional.
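If a platform ever did turn out to need the mutex wrapper the question describes, it could look roughly like the sketch below. The names locked_malloc and locked_free are hypothetical, and the sketch assumes every allocation in the program goes through them; it protects nothing if some library calls malloc() directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One process-wide lock serializing all calls into the allocator. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: malloc() under the global lock. */
void *locked_malloc(size_t size)
{
    void *p;
    int rc = pthread_mutex_lock(&alloc_lock);
    assert(rc == 0);
    (void)rc;
    p = malloc(size);
    pthread_mutex_unlock(&alloc_lock);
    return p;
}

/* Hypothetical wrapper: free() under the same lock. */
void locked_free(void *p)
{
    int rc = pthread_mutex_lock(&alloc_lock);
    assert(rc == 0);
    (void)rc;
    free(p);
    pthread_mutex_unlock(&alloc_lock);
}
```

A ./configure result could then select between these wrappers and plain malloc()/free() with a macro, so that platforms with a thread-safe allocator do not pay for the extra lock.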

Related

How do I call opendir without using malloc'd memory?

Just for educational purposes, I'm writing a C program without any malloc, and I'm checking that there's no heap usage by using mallinfo().uordblks. I've noticed that the function opendir triggers a huge spike in malloc'd memory according to mallinfo, and I'm not sure why. I'm wondering if there's a way to give opendir a stack-allocated buffer in order to do what it needs so that I can avoid this (similar to setvbuf, which I used to avoid buffering on the heap for stdout/stderr). Basically, how do I read the contents of a directory without using heap-allocated memory? If it makes a difference, I'm on a Linux machine.
You can't, any more than you could use stdio without the possibility that it calls malloc, or likewise many other components in libc. Fundamentally there's no reason that any of the standard library functions can't use malloc internally, although for many it would have to be conditional with fallback paths (because they're not allowed to fail, or because they need to be async-signal-safe, etc.) and for lots it would make no sense whatsoever for them to do so in a reasonable implementation.
In any case, since unlike with stdio (where you can do low-level fd operations instead) there is no portable directory-access API that's not normally implemented with a userspace buffer object (DIR), you either have to accept that it uses malloc or go with a non-portable lower-level interface (on Linux, the SYS_getdents64 syscall).
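As a rough illustration of the non-portable Linux route, the sketch below reads a directory with the raw SYS_getdents64 syscall into a stack buffer. The function name count_dir_entries is hypothetical, and the record layout is declared by hand following the getdents(2) manual page, since the struct is not assumed to be exported by the headers used here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout the kernel writes for getdents64 (see getdents(2)). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Count the entries of a directory (including "." and "..") using
   only a stack buffer; returns -1 on error. Linux only. */
long count_dir_entries(const char *path)
{
    _Alignas(struct linux_dirent64) char buf[4096];
    long count = 0;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -1;
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)            /* end of directory */
            break;
        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            assert(d->d_reclen > 0);
            count++;           /* d->d_name holds the entry name */
            off += d->d_reclen;
        }
    }
    close(fd);
    return count;
}
```

For example, count_dir_entries(".") walks the current directory without any heap allocation in user space; a real program would use d->d_name inside the inner loop instead of just counting.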
One option on systems that let you define your own malloc would be doing that, and having it allocate from a fixed pool or direct mmap or similar, if there's a reason you need to avoid whatever malloc normally does on your system.
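A fixed-pool allocator of the kind mentioned above can be sketched as a simple bump allocator. The names pool_alloc and POOL_SIZE are hypothetical, the pool size is arbitrary, and the sketch is neither thread-safe nor able to free memory; hooking it in as the process-wide malloc is system-specific and not shown.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4096   /* arbitrary size chosen for the sketch */

/* Static storage: lives in the data segment, not on the heap. */
static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Hand out the next aligned chunk of the pool, or NULL when the
   request is empty or does not fit. */
void *pool_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t rounded;
    void *p;

    assert(pool_used % align == 0);
    if (size == 0 || size > POOL_SIZE)
        return NULL;
    rounded = (size + align - 1) / align * align;
    if (rounded > POOL_SIZE - pool_used)
        return NULL;
    p = pool + pool_used;
    pool_used += rounded;
    return p;
}
```

Because the pool is a static array, anything allocated from it never shows up in mallinfo(); the trade-off is a hard upper limit and no reuse of released memory.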

xmlCleanupParser() memory loss?

As xmlCleanupParser() from the very good libxml2 is not thread-safe, my question is (and I have no possibility to check it myself): how much memory (a rough number) is lost to xmlParseFile(), and, more importantly, does this memory loss accumulate over many calls to xPF()?
Despite the fact that malloc() and free(), or whatever memory handling implementations are in use, are not necessarily thread-safe in C < 11, there's always the problem of shared/global memory. File handles to the same file in different threads aren't that bad as long as they're read-only.
However, starting with libxml2 2.4.7, you might be able to enable thread safety at the API level, for single threads per document: http://www.xmlsoft.org/threads.html
When I look at the sources of libxml2 2.9.1, I'm positive that thread safety is fully implemented; besides the global mutexes, there's also an atomic allocation function.
Downloads:
ftp://xmlsoft.org/libxml2/
Following the advice given by meaning-matters, and using the only tool I found under OS/2 (that ancient IBM operating system) to check memory, there seems to be no difference in memory loss between calling xCP() and not calling it (for me).

Does malloc itself provide some kind of synchronization?

I heard "malloc is thread-safe because it provides a synchronization primitive so that simultaneous calls to malloc will not corrupt the heap".
But when I look at the source code of the malloc function in the Visual Studio CRT, it turns out that malloc just passes the request on to HeapAlloc. So I think it is the operating system itself, rather than malloc, that provides some kind of synchronization to protect the application from a corrupted heap.
Then what about linux? Does malloc itself provide some kind of synchronization?
The only standard that speaks about this is C11 (since there was no notion of multithreading before), which says (7.22.3/2):
For purposes of determining the existence of a data race, memory allocation functions
behave as though they accessed only memory locations accessible through their
arguments and not other static duration storage. These functions may, however, visibly
modify the storage that they allocate or deallocate. A call to free or realloc that
deallocates a region p of memory synchronizes with any allocation call that allocates all
or part of the region p. This synchronization occurs after any access of p by the
deallocating function, and before any such access by the allocating function.
In short, "it's all fine".
However, specific implementations like Linux will surely have been providing their own, strong guarantees for a long time (since ptmalloc2, I think), and it's basically always been fine. [Update, thanks to @ArjunShankar: POSIX does indeed require that malloc be thread-safe.]
(Note, though, that other implementations such as Google's tcmalloc may have better performance in multithreaded applications.)
(For C++, see C++11: 18.6.1.4.)
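To see the guarantee in action, a small smoke test along the following lines calls malloc() and free() from several threads with no locking of its own. The names stress_malloc, NTHREADS and ROUNDS are hypothetical, and passing such a test cannot prove thread safety; it only fails to find a problem.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 4
#define ROUNDS   10000

/* Each thread allocates, fills, checks and frees its own blocks. */
static void *worker(void *arg)
{
    unsigned char tag = (unsigned char)(uintptr_t)arg;

    for (int i = 0; i < ROUNDS; i++) {
        size_t n = 16 + (size_t)(i % 64);
        unsigned char *p = malloc(n);
        if (p == NULL)
            return (void *)1;
        memset(p, tag, n);
        if (p[0] != tag || p[n - 1] != tag) {
            free(p);
            return (void *)1;   /* another thread scribbled on our block */
        }
        free(p);
    }
    return NULL;
}

/* Returns the number of threads that failed to start or reported a problem. */
int stress_malloc(void)
{
    pthread_t t[NTHREADS];
    int started = 0, failures = 0;

    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&t[started], NULL, worker,
                           (void *)(uintptr_t)(i + 1)) != 0) {
            failures++;
            continue;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        void *res = NULL;
        int rc = pthread_join(t[i], &res);
        assert(rc == 0);
        (void)rc;
        if (res != NULL)
            failures++;
    }
    return failures;
}
```

On a POSIX system stress_malloc() is expected to return 0; depending on the toolchain, the program may need to be built with -pthread.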

Call C function with different stack pointer (gcc)

I'm looking for a way to call a C function in a different stack, i.e. save the current stack pointer, set the stack pointer to a different location, call the function and restore the old stack pointer when it returns.
The purpose of this is a lightweight threading system for a programming language. Threads will operate on very small stacks, check when more stack is needed and dynamically resize it. This is so that thousands of threads can be allocated without wasting a lot of memory. When calling in to C code it is not safe to use a tiny stack, since the C code does not know about checking and resizing, so I want to use a big pthread stack which is used only for calling C (shared between lightweight threads on the same pthread).
Now I could write assembly code stubs which will work fine, but I wondered if there is a better way to do this, such as a gcc extension or a library which already implements it. If not, then I guess I'll have my head buried in ABI and assembly language manuals ;-) I only ask this out of laziness and not wanting to reinvent the wheel.
Assuming you're using POSIX threads and are on a POSIX system, you can achieve this with signals. Set up an alternate signal handling stack (sigaltstack) and designate one special real-time signal to have its handler run on the alternate signal stack. Then raise the signal to switch to the stack, and have the signal handler read the data for what function to call, and what argument to pass it, from thread-local data.
Note that this approach is fairly expensive (multiple system calls to change stacks), but should be 100% portable to POSIX systems. Since it's slow, you might want to make arch-specific call-on-alt-stack functions written in assembly, and only use my general solution as a fallback for archs where you haven't written an assembly version.
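The signal-based approach might be sketched as follows. The names call_on_stack, trampoline and record_stack_address are hypothetical, SIGRTMIN is picked arbitrarily as the real-time signal, and the sketch assumes that signal is not blocked in the calling thread. It uses the gcc __thread extension for the thread-local data and installs the handler on every call, which a real implementation would do only once.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Thread-local description of the call the handler should make. */
static __thread void (*pending_fn)(void *);
static __thread void *pending_arg;

/* Signal handler: because of SA_ONSTACK it runs on the alternate stack. */
static void trampoline(int signo)
{
    (void)signo;
    pending_fn(pending_arg);
}

/* Call fn(arg) on the given stack; returns 0 on success, -1 on error. */
int call_on_stack(void *stack, size_t size, void (*fn)(void *), void *arg)
{
    stack_t ss, old_ss;
    struct sigaction sa = {0}, old_sa;
    int signo = SIGRTMIN;

    assert(fn != NULL);
    ss.ss_sp = stack;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0)
        return -1;

    sa.sa_handler = trampoline;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) != 0) {
        sigaltstack(&old_ss, NULL);
        return -1;
    }

    pending_fn = fn;
    pending_arg = arg;
    raise(signo);               /* handler runs before raise() returns */

    sigaction(signo, &old_sa, NULL);
    sigaltstack(&old_ss, NULL);
    return 0;
}

/* Example callee: stores the address of one of its locals in *arg,
   which lets the caller check which stack it ran on. */
void record_stack_address(void *arg)
{
    int marker = 0;
    *(void **)arg = (void *)&marker;
}
```

Calling call_on_stack(buf, sizeof buf, record_stack_address, &where) with a 64 KiB buffer should leave where pointing inside buf. Note that signal dispositions are process-wide, so the install-and-restore done here on each call would race if several threads used it at once.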

Which C standard library functions use malloc under the hood

I want to know which C standard library functions use malloc and free under the hood. It looked to me as if printf would be using malloc, but when I tested a program with valgrind, I noticed that printf calls didn't allocate any memory using malloc. How come? How does it manage the memory then?
Usually, the only routines in the C99 standard that might use malloc() are the standard I/O functions (in <stdio.h>), where the file structure and the buffer used by it are often allocated as if by malloc(). Some of the locale handling may use dynamic memory. All the other routines have no need for dynamic memory allocation in general.
Now, is any of that formally documented? No, I don't think it is. There is no blanket restriction 'the functions in the library shall not use malloc()'. (There are, however, restrictions on other functions - such as strtok() and srand() and rand(); they may not be used by the implementation, and the implementation may not use any of the other functions that may return a pointer to a static memory location.) However, one of the reasons why the extremely useful strdup() function is not in the standard C library is (reportedly) because it does memory allocation. It also isn't completely clear whether this was a factor in the routines such as asprintf() and vasprintf() in TR 24731-2 not making it into C1x, but it could have been a factor.
The standard doesn't place any requirements on the implementation, AFAIK.
I don't know exactly how printf is implemented, but off the top of my head, I can't think of a reason why it would need to dynamically allocate memory. You could always look at the source for your platform.
It depends on which libc you are using. The C spec places no restriction here; it is up to the implementation.
For instance, newlib's printf usually works with memory in its own stack frame, but when it really needs to, it calls an internal function _malloc_r() directly.
I have not used valgrind, I'm not sure if it can detect use of _malloc_r().
Neither the C nor the POSIX standard force implementors to make use of malloc(), so there's no general answer to your question.
However, every sane standard library implementation that uses malloc() in one of its functions will set errno to ENOMEM if malloc() fails. Hence, you can derive from the documentation whether a library function may use malloc() or not. Case in point: on my system, mmap() may use malloc(), since mmap() may set errno to ENOMEM.
That said, using valgrind is a poor way to find out whether a particular function calls malloc() or not. Consider the following piece of code:
#include <stdlib.h>

void foo(int x)
{
    /* malloc() is reached only when x == 0 */
    if (!x) malloc(1);
}
If you call this function with an argument other than 0, valgrind won't notice that it may actually call malloc(). Think of valgrind as a virtual machine (since that's what it is): it doesn't look at your code, it only sees what the machine would actually execute.
printf doesn't need to form the entire output string in one shot, it can send it to output piece by piece, and when it encounters a format specifier, it can output that piece of data as it is formed, and continue on with the rest of the string.
At most it would need a locally defined array of characters (on the stack) large enough to hold the largest integer or floating point number it can handle, which isn't very large.
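The integer case can be sketched as below: the digits are produced into a small array on the stack and then copied out, with no heap involved. The name format_long is hypothetical, and this is only an illustration of the idea, not how any particular libc implements printf.

```c
#include <assert.h>
#include <stddef.h>

/* Write the decimal form of value into out (at most outsz bytes,
   always NUL-terminated when outsz > 0); returns the length written. */
size_t format_long(long value, char *out, size_t outsz)
{
    char tmp[3 * sizeof(long) + 2];   /* generous bound: digits plus sign */
    size_t n = 0, len = 0;
    unsigned long u = value < 0 ? 0UL - (unsigned long)value
                                : (unsigned long)value;

    /* Produce the digits least significant first. */
    do {
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (value < 0)
        tmp[n++] = '-';
    assert(n <= sizeof tmp);

    /* Copy them out in reverse, leaving room for the terminator. */
    while (n > 0 && len + 1 < outsz)
        out[len++] = tmp[--n];
    if (outsz > 0)
        out[len] = '\0';
    return len;
}
```

A printf-like function could call such a helper each time it meets a %ld specifier and pass the result straight to the output stream before moving on to the rest of the format string.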
