Which C standard library functions use malloc under the hood

Which C standard library functions use malloc under the hood - c

I want to know which C standard library functions use malloc and free under the hood. It looked to me as if printf would be using malloc, but when I tested a program with valgrind, I noticed that printf calls didn't allocate any memory using malloc. How come? How does it manage the memory then?

Usually, the only routines in the C99 standard that might use malloc() are the standard I/O functions (in <stdio.h> where the file structure and the buffer used by it is often allocated as if by malloc(). Some of the locale handling may use dynamic memory. All the other routines have no need for dynamic memory allocation in general.
Now, is any of that formally documented? No, I don't think it is. There is no blanket restriction 'the functions in the library shall not use malloc()'. (There are, however, restrictions on other functions - such as strtok() and srand() and rand(); they may not be used by the implementation, and the implementation may not use any of the other functions that may return a pointer to a static memory location.) However, one of the reasons why the extremely useful strdup() function is not in the standard C library is (reportedly) because it does memory allocation. It also isn't completely clear whether this was a factor in the routines such as asprintf() and vasprintf() in TR 24731-2 not making it into C1x, but it could have been a factor.

The standard doesn't place any requirements on the implementation, AFAIK.
I don't know exactly how printf is implemented, but of the top of my head, I can't think of a reason why it would need to dynamically allocate memory. You could always look at the source for your platform.

It depends on which libc you are using. There should be no restriction on the C spec and up to the implementation.
For instance, newlib's printf usually done with using memory on stack frame, but when it really needs to, it calls an internal function _malloc_r() directly.
I have not used valgrind, I'm not sure if it can detect use of _malloc_r().

Neither the C nor the POSIX standard force implementors to make use of malloc(), so there's no general answer to your question.
However, every sane standard library implementation that uses malloc() in one of its functions will set errno to ENOMEM if malloc() fails. Hence, you can derive from the documentation whether a library function uses malloc() or not. Point in case: on my system, mmap() may use malloc(), since mmap() may set errno to ENOMEM.
That having said, using valgrind is a poor way to find out whether a particular function calls malloc() or not. Consider the following piece of code:
void foo(int x)
{
if (!x) malloc(1);
}
If you call this function with an argument other than 0, valgrind won't notice that it may actually call malloc(). Think of valgrind as a virtual machine (since that's what it is): it doesn't look at your code, it only sees what the machine would actually execute.

printf doesn't need to form the entire output string in one shot, it can send it to output piece by piece, and when it encounters a format specifier, it can output that piece of data as it is formed, and continue on with the rest of the string.
At most it would need a locally defined array of characters (on the stack) large enough to hold the largest integer or floating point number it can handle, which isn't very large.

Related

Does glibc (GNU C Library) provide a way to obtain the size of an allocated memory block?

I have a pointer. I know its address (I got as an argument to a function), and I know that it points to a memory address previously allocated by the malloc() call.
Is there any way to know the size of this allocated memory block?
I would prefer a cross-platform, standard solution, but I think these do not exist. Thus, anything is okay, even hardcore low-level malloc data structure manipulation, if there is no better. I use glibc with x86_64 architecture and there is no plan to run the result elsewhere. I am not looking for a generic answer, it can be specific to glibc/x86_64.
I think, this information should be available, otherways realloc() could not work.
This question asks for a generic, standard-compliant solution, which is impossible. I am looking for a glibc/x86_64 solution which is possible, because the glibc is open source and the glibc realloc() needs this to work, and this question allows answers by digging in non-standard ways in the low-levels malloc internals.

malloc_usable_size returns the number of usable bytes in the block of allocated memory pointed to by the pointer it is passed. This is not necessarily the original requested size; it is the provided size, which may be larger, at the convenience of the allocation software.
The GNU C Library apparently does not document this directly:
This part of the GNU C Library documentation says it provides malloc_usable_size but does not document its behavior, and it appears to be the only mention in the full documentation there.
This GNU C Library page says its API is documented by the Linux man-pages project, among others, and those pages point to this for malloc_usable_size.
So I suppose you may take that last page as having the imprimatur of the GNU C Library. It says size_t malloc_usable_size(void *ptr) “returns the number of usable bytes in the block pointed to by ptr, a pointer to a block of memory allocated by malloc(3) or a related function,” and indicates the function is declared in <malloc.h>. Also, if ptr is null, zero is returned.

How do I call opendir without using malloc'd memory?

Just for educational purposes, I'm writing a C program without any malloc, and I'm checking that there's no heap usage by using mallinfo().uordblks. I've noticed that the function opendir triggers a huge spike in malloc'd memory according to mallinfo, and I'm not sure why. I'm wondering if there's a way to give opendir a stack-allocated buffer in order to do what it needs so that I can avoid this (similar to setvbuf, which I used to avoid buffering on the heap for stdout/stderr). Bascially, how do I read the contents of a directory without using heap-allocated memory?. If it makes a difference, I'm on a Linux machine.

You can't, any more than you could use stdio without the possibility that it calls malloc, or likewise many other components in libc. Fundamentally there's no reason that any of the standard library functions can't use malloc internally, although for many it would have to be conditional with fallback paths (because they're not allowed to fail, or because they need to be async-signal-safe, etc.) and for lots it would make no sense whatsoever for them to do so in a reasonable implementation.
In any case, since unlike with stdio (where you can do low-level fd operations instead) there is no portable directory-access API that's not normally implemented with a userspace buffer object (DIR), you either have to accept that it uses malloc or go with a non-portable lower-level interface (on Linux, the SYS_getdents64 syscall).
One option on systems that let you define your own malloc would be doing that, and having it allocate from a fixed pool or direct mmap or similar, if there's a reason you need to avoid whatever malloc normally does on your system.

Does fprintf use malloc() under the hood?

I want a minimal o-damn-malloc-just-failed handler, which writes some info to a file (probably just standard error). I would prefer to use fprintf() rather than write(), but this will fail badly if fprintf() itself tries to malloc().
Is there some guarantee, either in the C standard, or even just in glibc that fprintf won't do this?

No, there's no guarantee that it won't. However, most implementations I've seen tend to use a fixed size buffer for creating the formatted output string (a).
In terms of glibc (source here), there are calls to malloc within stdio-common/vfprintf.c, which a lot of the printf family use at the lower end, so I wouldn't rely on it if I were you. Even the string-buffer output calls like sprintf, which you may think wouldn't need it, seem to resolve down to that call, after setting up some tricky FILE-like string handles - see libio/iovsprintf.c.
My advice is to then write your own code for doing the output so as to ensure no memory allocations are done under the hood (and hope, of course, that write itself doesn't do this (unlikelier than *printf doing it)). Since you're probably not going to be outputting much converted stuff anyway (probably just "Dang, I done run outta memory!"), the need for formatted output should be questionable anyway.
(a) The C99 environmental considerations gives an indication that (at least) some early implementations had a buffering limit. From my memory of the Turbo C stuff, I thought 4K was about the limit and indeed, C99 states (in 7.19.6.1 fprintf):
The number of characters that can be produced by any single conversion shall be at least
4095.
(the mandate for C89 was to codify existing practice, not create a new language, and that's one reason why some of these mimimum maxima were put in the standard - they were carried forward to later iterations of the standard).

The C standard doesn't guarantee that fprintf won't call malloc under the hood. Indeed, it doesn't guarantee anything about what happens when you override malloc. You should refer to the documentation for your specific C library, or simply write your own fprintf-like function which makes direct syscalls, avoiding any possibility of heap allocation.

The only functions you can be reasonably sure will not call malloc are those marked async-signal-safe by POSIX. Since malloc is not required to be async-signal-safe (and since it's essentially impossible to make it async-signal-safe without making it unusably inefficient), async-signal-safe functions normally cannot call it.
With that said, I'm nearly sure glibc's printf functions (including fprintf and even snprintf) can and will use malloc for some (all?) format strings.

What non-NULL memory addresses are free to be used as error values in C?

I need to write my own memory allocation functions for the GMP library, since the default functions call abort() and leave no way I can think of to restore program flow after that occurs (I have calls to mpz_init all over the place, and how to handle the failure changes based upon what happened around that call). However, the documentation requires that the value the function returns to not be NULL.
Is there at least one range of addresses that can always be guaranteed to be invalid? It would be useful to know them all, so I could use different addresses for different error codes, or possibly even different ranges for different families of errors.

If the default memory allocation functions abort(), and GMP's code can't deal with a NULL, then GMP is likely not prepared to deal with the possibility of memory allocation failures at all. If you return a deliberately invalid address, GMP's probably going to try to dereference it, and promptly crash, which is just as bad as calling abort(). Worse, even, because the stacktrace won't point at what's really causing the problem.
As such, if you're going to return at all, you must return a valid pointer, one which isn't being used by anything else.
Now, one slightly evil option would be to use setjmp() and longjmp() to exit the GMP routines. However, this will leave GMP in an unpredictable state - you should assume that you can never call a GMP routine again after this point. It will also likely result in memory leaks... but that's probably the least of your concerns at this point.
Another option is to have a reserved pool in the system malloc - that is, at application startup:
emergencyMemory = malloc(bignumber);
Now if malloc() fails, you do free(emergencyMemory), and, hopefully, you have enough room to recover. Keep in mind that this only gives you a finite amount of headroom - you have to hope GMP will return to your code (and that code will check and see that the emergency pool has been used) before you truly run out of memory.
You can, of course, also use these two methods in combination - first use the reserved pool and try to recover, and if that fails, longjmp() out, display an error message (if you can), and terminate gracefully.

No, there isn't a portable range of invalid pointer values.
You could use platform-specific definitions, or you could use the addresses of some global objects:
const void *const error_out_of_bounds = &error_out_of_bounds;
const void *const error_no_sprockets = &error_no_sprockets;
[Edit: sorry, missed that you were hoping to return these values to a library. As bdonlan says, you can't do that. Even if you find some "invalid" values, the library won't be expecting them. It is a requirement that your function must return a valid value, or abort.]
You could do something like this in globals:
void (*error_handler)(void*);
void *error_data;
Then in your code:
error_handler = some_handler;
error_data = &some_data;
mpz_init(something);
In your allocator:
if (allocated_memory_ok) return the_memory;
error_handler(error_data);
abort();
Setting up the error handler and data before calling mzp_init might be somewhat tedious, but depending how different the behaviour is in different cases, you might be able to write some function or macro to deal with it.
What you can't do, though, is recover and carry on running if the GMP library isn't designed to cope after an allocation fails. You're at the mercy of your tools in that respect - if the library call doesn't return on error, then who knows what broken state its internals will be left in.
But that's a fully general view, whereas GMP is open source. You can find out what actually happens in mpz_init, at least for a particular release of GMP. There might be some way to ensure in advance that your allocator has enough memory to satisfy the request(s), or there might be some way to wriggle out without doing too much damage (like bdonlon says, a longjmp).

Since nobody has provided the correct answer, the set of non-NULL memory addresses you can safely use as error values is the same as the set of addresses you create for this purpose. Simply declare a static const char (or global const char if you need it to be globally visible) array whose size N is the number of error codes you need, and use pointers to the N elements of this array as the N error values.
If your pointer type is not char * but something else, you may need to use an object of that type instead of a char array, since converting these char pointers into another pointer type is not guaranteed to work.

Only garanteed on current main stream operating systems (with enabled virtual memory) and CPU architectures:
-1L (means all bits on in a value large enough for a pointer)
This is used by a lot of libraries to mark pointers which are freed. With this you can find out easily if the error cames from using a NULL pointer or a hanging reference.
Works on HP-UX, Windows, Solaris, AIX, Linux, Free-Net-OpenBSD and with i386, amd64, ia64, parisc, sparc and powerpc.
Think this works enough. Don't see any reason for more then this two values (0,-1)

If you only return e.g. 16-bit or 32-bit aligned pointers, an uneven pointer-address (LSB equal to 1) will be at least "mysterious", and would create an opportunity for using my all-time favorite bogus-value 0xDEADBEEF (for 32-bit pointers) or 0xDEADBEEFBADF00D (for 64-bit pointers).

There are several ranges you can use, they are operating system and architecture specific.
Typically most platforms will reserve the first page (usually 4K bytes in length), to catch dereferencing of null pointers (plus room for a slight offset).
You can also point to the reserved operating system pages, on Linux these occupy the region from 0xc0000000 to 0xffffffff (on a 32 bit system). From userspace you won't have necessary privileges to access this region.
Another option (if you want to allocate several such values, is to allocate a page without read or write permissions using mmap or equivalent, and use offsets into this page for each distinct error value.
The simplest solution, is just to use either values immediately negative to 0, (-1, -2, etc.), or immediately positive (1, 2, ...). You can be very certain these addresses are on inaccessible pages.

A possibility is to take C library addresses that are guaranteed to exist and that thus will never be returned by malloc or similar. To be most portable this should be object pointers and not function pointers, but casting ((void*)main) would probably be ok on most architectures. One data pointer that comes to my mind is environ, but which is POSIX, or stdin etc which are not guaranteed to be "real" variables.
To use this you could just use the following:
extern char** environ; /* guaranteed to exist in POSIX */
#define DEADBEAF ((void*)&environ)

When a third-party C function returns a pointer, should you free it yourself?

There are many functions (especially in the POSIX library) that return pointers to almost-necessarily freshly allocated data. Their manpages don't say if you should free them, or if there's some obscure mechanism at play (like returning a pointer to a static buffer, or something along these lines).
For instance, the inet_ntoa function returns a char* most likely out from nowhere, but the manpage doesn't say how it was allocated. I ended up using inet_ntop instead because at least I knew where the destination allocation came from.
What's the standard rule for C functions returning pointers? Who's responsible for freeing their memory?

You have to read the documentation, there is no other way. My man page for inet_ntoa reads:
The string is returned in a statically allocated buffer, which subsequent calls will overwrite.
So in this case you must not attempt to free the returned pointer.

There really isn't a standard rule. Some functions require your to pass a pointer in, and they fill data into that space (e.g., sprintf). Others return the address of a static data area (e.g., many of the functions in <time.h>). Others still allocate memory when needed (e.g., setvbuf).
About the best you can do is hope that the documentation tells you what pointers need to be freed. You shouldn't normally attempt to free pointers it returns unless the documentation tells you to. Unless you're passing in the address of a buffer for it to use, or it specifies that you need to free the memory, you should generally assume that it's using a static data area. This means (among other things) that you should assume the value will be changed by any subsequent calls to the same routine. If you're writing multithreaded code, you should generally assume that the function is not really thread-safe -- that you have a shared data area that requires synchronization, so you should acquire a lock, call the function, copy the data out of its data area, and only then release the lock.

There's no standard rule. Ideally, a standard library function such as inet_ntoa comes with a man
page which describes the "rules of engagement" i.e. the interface of the function - arguments expected, return values in case of success and errors as well as the semantics of dealing with allocated memory.
From the man page of inet_ntoa:
The inet_ntoa() function converts the
Internet host address in, given in
network byte order, to a string in
IPv4 dotted-decimal notation. The
string is returned in a statically
allocated buffer, which subsequent
calls will overwrite.

At least on my machine (Mac OS X 10.6) the final sentence of the manpage, under BUGS is:
The string returned by inet_ntoa() resides in a static memory area.

I think your idea that "many" functions in POSIX return pointers this way is mistaken. Your example, inet_ntoa is not in POSIX and was deliberately excluded because it's deprecated and broken.
The number of standard functions which return pointers to allocated memory is actually rather small, and most of the ones that do so provide a special complementary function you're required to use for freeing the memory (for instance, fopen and fclose, getaddrinfo and freeaddrinfo, or regcomp and regfree). Simply calling free on the pointer returned would be very bad; at best you'd end up with serious memory leaks, and at worst it could lead to unexpected crashes (for instance if the library was keeping the objects it allocated in a linked list).
Whether a function is part of the system library or a third-party library, it should document the expected usage of any pointers it returns (and whether/how it's necessary to free them). For standard functions, the best reference on this matter is POSIX itself. You could also check the man pages for your particular system. If the code is part of a third-party library, it should come with documentation (perhaps in man pages, in the header files, or in a comprehensive document on library usage). A well-written library will provide special functions to free objects it allocates, so as to avoid introducing dependencies on the way it's (currently) implemented to code that uses the library.
As far as the nonstandard inet_ntoa and similar legacy functions go, they return pointers to internal static buffers. This makes them unsuitable for use with threads or in library code (which must take care not to destroy the caller's state unless it'd documented as doing so). Often the documentation for such functions will say that they are not required to be thread-safe, that they are not reentrant, or that they may return a pointer to an internal static buffer which may be overwritten by subsequent calls to the function. Many people, myself included, believe that such functions should not be used at all in modern code.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight