Does fprintf use malloc() under the hood? - c

I want a minimal oh-damn-malloc-just-failed handler, which writes some info to a file (probably just standard error). I would prefer to use fprintf() rather than write(), but this will fail badly if fprintf() itself tries to malloc().
Is there some guarantee, either in the C standard, or even just in glibc that fprintf won't do this?

No, there's no guarantee that it won't. However, most implementations I've seen tend to use a fixed size buffer for creating the formatted output string (a).
In terms of glibc (source here), there are calls to malloc within stdio-common/vfprintf.c, which a lot of the printf family use at the lower end, so I wouldn't rely on it if I were you. Even the string-buffer output calls like sprintf, which you may think wouldn't need it, seem to resolve down to that call, after setting up some tricky FILE-like string handles - see libio/iovsprintf.c.
My advice, then, is to write your own code for doing the output so as to ensure no memory allocations are done under the hood (and hope, of course, that write itself doesn't do this, which is far less likely than *printf doing it). Since you're probably not going to be outputting much converted stuff anyway (probably just "Dang, I done run outta memory!"), the need for formatted output is questionable in any case.
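For illustration, here's a minimal sketch of what such a handler might look like, assuming a POSIX system where write(2) is available (oom_handler is my own name, not anything standard):

#include <unistd.h>

static void oom_handler(void)
{
    /* A fixed message needs no formatting, hence no chance of malloc. */
    static const char msg[] = "out of memory\n";

    /* sizeof msg - 1 drops the terminating NUL */
    (void) write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}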
(a) The C99 environmental considerations give an indication that (at least) some early implementations had a buffering limit. From my memory of the Turbo C stuff, I thought 4K was about the limit and indeed, C99 states (in 7.19.6.1 fprintf):
The number of characters that can be produced by any single conversion shall be at least
4095.
(the mandate for C89 was to codify existing practice, not create a new language, and that's one reason why some of these minimum maxima were put in the standard - they were carried forward to later iterations of the standard).

The C standard doesn't guarantee that fprintf won't call malloc under the hood. Indeed, it doesn't guarantee anything about what happens when you override malloc. You should refer to the documentation for your specific C library, or simply write your own fprintf-like function which makes direct syscalls, avoiding any possibility of heap allocation.

The only functions you can be reasonably sure will not call malloc are those marked async-signal-safe by POSIX. Since malloc is not required to be async-signal-safe (and since it's essentially impossible to make it async-signal-safe without making it unusably inefficient), async-signal-safe functions normally cannot call it.
With that said, I'm nearly sure glibc's printf functions (including fprintf and even snprintf) can and will use malloc for some (all?) format strings.

Related

Which Unix don't have a thread-safe malloc?

I want my C program to be portable even to very old Unix OSes, but the problem is that I'm using pthreads and dynamic allocation (malloc). All Unixes I know of have a thread-safe malloc (Linux, *BSD, Irix, Solaris); however, this is not guaranteed by the C standard, and I'm sure there are very old versions where it's not true.
So, is there some list of platforms on which I'd need to wrap malloc() calls with a mutex lock? I plan to write a ./configure test that checks whether the current platform is in that list.
The other alternative would be to test malloc() for thread-safety, but I know of no deterministic way to do this. Any ideas on this one too?
The only C standard that has threads (and is thus relevant to your question) is C11, which states:
For purposes of determining the existence of a data race, memory
allocation functions behave as though they accessed only memory
locations accessible through their arguments and not other static
duration storage.
Or, in other words: as long as two threads don't pass the same address to realloc or free, all calls to the memory functions are thread-safe.
For POSIX, that is, all Unixes that you can find nowadays, you have:
Each function defined in the System Interfaces volume of IEEE Std 1003.1-2001 is thread-safe unless explicitly stated otherwise.
I don't know where you got the assertion that malloc wouldn't be thread-safe on older Unixes; a system with threads that doesn't provide a thread-safe malloc is pretty much useless. What might be a problem on such an older system is performance, but it should always be functional.

Which C standard library functions use malloc under the hood

I want to know which C standard library functions use malloc and free under the hood. It looked to me as if printf would be using malloc, but when I tested a program with valgrind, I noticed that printf calls didn't allocate any memory using malloc. How come? How does it manage the memory then?
Usually, the only routines in the C99 standard that might use malloc() are the standard I/O functions (in <stdio.h>), where the FILE structure and the buffer it uses are often allocated as if by malloc(). Some of the locale handling may use dynamic memory. All the other routines generally have no need for dynamic memory allocation.
Now, is any of that formally documented? No, I don't think it is. There is no blanket restriction 'the functions in the library shall not use malloc()'. (There are, however, restrictions on other functions - such as strtok() and srand() and rand(); they may not be used by the implementation, and the implementation may not use any of the other functions that may return a pointer to a static memory location.) However, one of the reasons why the extremely useful strdup() function is not in the standard C library is (reportedly) because it does memory allocation. It also isn't completely clear whether this was a factor in the routines such as asprintf() and vasprintf() in TR 24731-2 not making it into C1x, but it could have been a factor.
The standard doesn't place any requirements on the implementation, AFAIK.
I don't know exactly how printf is implemented, but off the top of my head, I can't think of a reason why it would need to dynamically allocate memory. You could always look at the source for your platform.
It depends on which libc you are using. The C spec places no restriction on this; it's up to the implementation.
For instance, newlib's printf usually works with memory on the stack frame, but when it really needs more, it calls an internal function _malloc_r() directly.
I have not used valgrind, I'm not sure if it can detect use of _malloc_r().
Neither the C nor the POSIX standard force implementors to make use of malloc(), so there's no general answer to your question.
However, every sane standard library implementation that uses malloc() in one of its functions will set errno to ENOMEM if malloc() fails. Hence, you can derive from the documentation whether a library function uses malloc() or not. Case in point: on my system, mmap() may use malloc(), since mmap() may set errno to ENOMEM.
That said, using valgrind is a poor way to find out whether a particular function calls malloc() or not. Consider the following piece of code:
#include <stdlib.h>

void foo(int x)
{
    if (!x)
        malloc(1);    /* allocates only when x == 0 */
}
If you call this function with an argument other than 0, valgrind won't notice that it may actually call malloc(). Think of valgrind as a virtual machine (since that's what it is): it doesn't look at your code, it only sees what the machine would actually execute.
printf doesn't need to form the entire output string in one shot; it can send it to the output piece by piece, and when it encounters a format specifier, it can output that piece of data as it is formed and continue on with the rest of the string.
At most it would need a locally defined array of characters (on the stack) large enough to hold the largest integer or floating point number it can handle, which isn't very large.
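As a rough sketch of that idea (purely illustrative -- no real libc necessarily does it this way), an unsigned integer can be converted in a small stack buffer and written out directly, with no heap allocation at all:

#include <unistd.h>

static void put_unsigned(unsigned v)
{
    char buf[3 * sizeof v + 1];       /* generous upper bound on the digit count */
    char *p = buf + sizeof buf;

    do {
        *--p = (char)('0' + v % 10);  /* peel off digits, least significant first */
        v /= 10;
    } while (v != 0);

    (void) write(STDOUT_FILENO, p, (size_t)(buf + sizeof buf - p));
}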

strlen not checking for NULL

Why is strlen() not checking for NULL?
If I do strlen(NULL), the program segfaults.
Trying to understand the rationale behind it (if any).
The rationale behind it is simple -- how can you check the length of something that does not exist?
Also, unlike "managed languages" there is no expectations the run time system will handle invalid data or data structures correctly. (This type of issue is exactly why more "modern" languages are more popular for non-computation or less performant requiring applications).
A standard template in C would look like this:

size_t someStrLen;    /* strlen returns size_t, not int */

if (someStr != NULL)  /* or: if (someStr) */
    someStrLen = strlen(someStr);
else
{
    /* handle error. */
}
The portion of the language standard that defines the string handling library states that, unless specified otherwise for the specific function, any pointer arguments must have valid values.
The philosophy behind the design of the C standard library is that the programmer is ultimately in the best position to know whether a run-time check really needs to be performed. Back in the days when your total system memory was measured in kilobytes, the overhead of performing an unnecessary runtime check could be pretty painful. So the C standard library doesn't bother doing any of those checks; it assumes that the programmer has already done them if they're really necessary. If you know you will never pass a bad pointer value to strlen (say, because you're passing in a string literal or a locally allocated array), then there's no need to clutter up the resulting binary with an unnecessary check against NULL.
The standard does not require it, so implementations just avoid a test and potentially an expensive jump.
A little macro to help your grief:
#define strlens(s) ((s) == NULL ? 0 : strlen(s))
Three significant reasons:
The standard library and the C language are designed assuming that the programmer knows what he is doing, so a null pointer isn't treated as an edge case, but rather as a programmer's mistake that results in undefined behaviour;
It incurs runtime overhead - checking str != NULL on every one of thousands of strlen calls is wasted work when the pointer is already known to be valid;
It adds to the code size - it could only be a few instructions, but if you adopt this principle and do it everywhere, it can inflate your code significantly.
size_t strlen ( const char * str );
http://www.cplusplus.com/reference/clibrary/cstring/strlen/
strlen takes a pointer to a character array as its parameter; NULL is not a valid argument to this function.

Which functions in the C standard library commonly encourage bad practice? [closed]

This is inspired by this question and the comments on one particular answer in it, through which I learnt that strncpy is not a very safe string-handling function in C and that it pads the destination with zeros until it reaches n, something I was unaware of.
Specifically, to quote R..
strncpy does not null-terminate, and does null-pad the whole remainder of the destination buffer, which is a huge waste of time. You can work around the former by adding your own null padding, but not the latter. It was never intended for use as a "safe string handling" function, but for working with fixed-size fields in Unix directory tables and database files. snprintf(dest, n, "%s", src) is the only correct "safe strcpy" in standard C, but it's likely to be a lot slower. By the way, truncation in itself can be a major bug and in some cases might lead to privilege elevation or DoS, so throwing "safe" string functions that truncate their output at a problem is not a way to make it "safe" or "secure". Instead, you should ensure that the destination buffer is the right size and simply use strcpy (or better yet, memcpy if you already know the source string length).
And from Jonathan Leffler
Note that strncat() is even more confusing in its interface than strncpy() - what exactly is that length argument, again? It isn't what you'd expect based on what you supply strncpy() etc - so it is more error prone even than strncpy(). For copying strings around, I'm increasingly of the opinion that there is a strong argument that you only need memmove() because you always know all the sizes ahead of time and make sure there's enough space ahead of time. Use memmove() in preference to any of strcpy(), strcat(), strncpy(), strncat(), memcpy().
So, I'm clearly a little rusty on the C standard library. Therefore, I'd like to pose the question:
What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?
In the interests of objectivity, I have a number of criteria for an answer:
Please, if you can, cite design reasons behind the function in question i.e. its intended purpose.
Please highlight the misuse to which the code is currently put.
Please state why that misuse may lead towards a problem. I know that should be obvious but it prevents soft answers.
Please avoid:
Debates over naming conventions of functions (except where this unequivocally causes confusion).
"I prefer x over y" - preference is ok, we all have them but I'm interested in actual unexpected side effects and how to guard against them.
As this is likely to be considered subjective and has no definite answer I'm flagging for community wiki straight away.
I am also working as per C99.
What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?
I'm gonna go with the obvious:
char *gets(char *s);
With its remarkable particularity that it's simply impossible to use it appropriately.
A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.
Also, strtok() is used by making subsequent calls to it, until the entire string is tokenized. Some library implementations store strtok()'s internal state in a global variable, which may cause some nasty surprises if strtok() is called from multiple threads at the same time.
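A small example of the first pitfall (illustrative only):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "a,b,c";   /* must be writable - a string literal would be undefined behaviour */
    char *tok;

    for (tok = strtok(s, ","); tok != NULL; tok = strtok(NULL, ","))
        printf("token: %s\n", tok);

    /* s is now "a\0b\0c" - the separators have been overwritten with '\0' */
    printf("s after strtok: %s\n", s);    /* prints just "a" */
    return 0;
}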
The CERT C Secure Coding Standard lists many of these pitfalls you asked about.
In almost all cases, atoi() should not be used (this also applies to atof(), atol() and atoll()).
This is because these functions do not detect out-of-range errors at all - the standard simply says "If the value of the result cannot be represented, the behavior is undefined.". So the only time they can be safely used is if you can prove that the input will certainly be within range (for example, if you pass a string of length 4 or less to atoi(), it cannot be out of range).
Instead, use one of the strtol() family of functions.
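A sketch of how that might look (parse_int is my own illustrative name, not a standard function):

#include <errno.h>
#include <limits.h>
#include <stdlib.h>

int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);

    if (end == s || *end != '\0')                        /* no digits, or trailing junk */
        return -1;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)   /* out of range for int */
        return -1;

    *out = (int)v;
    return 0;
}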
Let us extend the question to interfaces in a broader sense.
errno:
technically it is not even clear what it is - a variable, a macro, an implicit function call? In practice on modern systems it is mostly a macro that expands to a function call, so that the error state can be thread-specific. It is evil:
because it may cause overhead for the caller to access the value, to check the "error" (which might just be an exceptional event)
because in some places it even requires that the caller clear this "variable" before making a library call
because it implements a simple error return by setting global state of the library.
The forthcoming standard gets the definition of errno a bit straighter, but these uglinesses remain.
There is often a strtok_r, the reentrant POSIX variant of strtok.
For realloc, if you need to use the old pointer, it's not that hard to use another variable. If your program fails with an allocation error, then cleaning up the old pointer is often not really necessary.
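The usual idiom, sketched here with an illustrative grow() helper, is to keep the old pointer in a temporary so it is still available if realloc fails:

#include <stdlib.h>

void *grow(void *buf, size_t new_size)
{
    void *tmp = realloc(buf, new_size);

    if (tmp == NULL) {
        free(buf);    /* the old block is untouched by a failed realloc */
        return NULL;
    }
    return tmp;       /* the old pointer may now be stale; use this one */
}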
I would put printf and scanf pretty high up on this list. The fact that you have to get the formatting specifiers exactly correct makes these functions tricky to use and extremely easy to get wrong. It's also very hard to avoid buffer overruns when reading data out. Moreover, the "printf format string vulnerability" has probably caused countless security holes when well-intentioned programmers specify client-specified strings as the first argument to printf, only to find the stack smashed and security compromised many years down the line.
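The classic format-string mistake, for illustration (log_message is a made-up name):

#include <stdio.h>

void log_message(const char *user_input)
{
    /* WRONG: any %-sequences in user_input are interpreted by printf,
       which can read (or, with %n, even write) memory it shouldn't. */
    printf(user_input);

    /* Right: the input is treated purely as data. */
    printf("%s", user_input);
}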
Any of the functions that manipulate global state, like gmtime() or localtime(). These functions simply can't be used safely in multiple threads.
EDIT: rand() is in the same category it would seem. At least there are no guarantees of thread-safety, and on my Linux system the man page warns that it is non-reentrant and non-threadsafe.
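On POSIX systems the usual fix is the _r variants, which write into a caller-supplied buffer instead of shared static storage -- a sketch:

#include <stdio.h>
#include <time.h>

void print_local_time(time_t t)
{
    struct tm tm_buf;    /* caller-owned, so no state shared between threads */
    char s[64];

    if (localtime_r(&t, &tm_buf) != NULL &&
        strftime(s, sizeof s, "%Y-%m-%d %H:%M:%S", &tm_buf) > 0)
        printf("%s\n", s);
}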
One of my bêtes noire is strtok(), because it is non-reentrant and because it hacks the string it is processing into pieces, inserting NUL at the end of each token it isolates. The problems with this are legion; it is distressingly often touted as a solution to a problem, but is as often a problem itself. Not always - it can be used safely. But only if you are careful. The same is true of most functions, with the notable exception of gets() which cannot be used safely.
There's already one answer about realloc, but I have a different take on it. A lot of time, I've seen people write realloc when they mean free; malloc - in other words, when they have a buffer full of trash that needs to change size before storing new data. This of course leads to potentially-large, cache-thrashing memcpy of trash that's about to be overwritten.
If used correctly with growing data (in a way that avoids worst-case O(n^2) performance for growing an object to size n, i.e. growing the buffer geometrically instead of linearly when you run out of space), realloc has doubtful benefit over simply doing your own new malloc, memcpy, and free cycle. The only way realloc can ever avoid doing this internally is when you're working with a single object at the top of the heap.
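For example, a growable buffer with doubling (all names here are illustrative, not from any particular codebase):

#include <stdlib.h>

struct vec {
    char   *data;
    size_t  len, cap;
};

/* Doubling the capacity when full keeps total copying linear in the
   final size, instead of the O(n^2) you get from growing linearly.
   Start with a zero-initialized struct vec. */
int vec_push(struct vec *v, char c)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 16;
        char *tmp = realloc(v->data, new_cap);
        if (tmp == NULL)
            return -1;        /* v->data is still valid and unchanged */
        v->data = tmp;
        v->cap = new_cap;
    }
    v->data[v->len++] = c;
    return 0;
}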
If you like to zero-fill new objects with calloc, it's easy to forget that realloc won't zero-fill the new part.
And finally, one more common use of realloc is to allocate more than you need, then resize the allocated object down to just the required size. But this can actually be harmful (additional allocation and memcpy) on implementations that strictly segregate chunks by size, and in other cases might increase fragmentation (by splitting off part of a large free chunk to store a new small object, instead of using an existing small free chunk).
I'm not sure if I'd say realloc encourages bad practice, but it's a function I'd watch out for.
How about the malloc family in general? The vast majority of large, long-lived programs I've seen use dynamic memory allocation all over the place as if it were free. Of course real-time developers know this is a myth, and careless use of dynamic allocation can lead to catastrophic blow-up of memory usage and/or fragmentation of address space to the point of memory exhaustion.
In some higher-level languages without machine-level pointers, dynamic allocation is not so bad because the implementation can move objects and defragment memory during the program's lifetime, as long as it can keep references to these objects up-to-date. A non-conventional C implementation could do this too, but working out the details is non-trivial and it would incur a very significant cost in all pointer dereferences and make pointers rather large, so for practical purposes, it's not possible in C.
My suspicion is that the correct solution is usually for long-lived programs to perform their small routine allocations as usual with malloc, but to keep large, long-lived data structures in a form where they can be reconstructed and replaced periodically to fight fragmentation, or as large malloc blocks containing a number of structures that make up a single large unit of data in the application (like a whole web page presentation in a browser), or on-disk with a fixed-size in-memory cache or memory-mapped files.
On a wholly different tack, I've never really understood the benefits of atan() when there is atan2(). The difference is that atan2() takes two arguments, and returns an angle anywhere in the range -π..+π. Further, it avoids divide by zero errors and loss of precision errors (dividing a very small number by a very large number, or vice versa). By contrast, the atan() function only returns a value in the range -π/2..+π/2, and you have to do the division beforehand (I don't recall a scenario where atan() could be used without there being a division, short of simply generating a table of arctangents). Providing 1.0 as the divisor for atan2() when given a simple value is not pushing the limits.
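A quick illustration of the difference:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double y = 1.0, x = -1.0;

    printf("atan2(y, x) = %f\n", atan2(y, x));     /* 3*pi/4: second quadrant */
    printf("atan(y / x) = %f\n", atan(y / x));     /* -pi/4: the quadrant is lost */
    printf("atan2(1, 0) = %f\n", atan2(1.0, 0.0)); /* pi/2: no division by zero */
    return 0;
}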
Another answer, since these are not really related, rand:
it is of unspecified random quality
it is not re-entrant
Some of these functions modify global state. (On Windows) this state is shared per thread - you can get unexpected results. For example, the first call of rand in every thread will give the same result, and it requires some care to make it pseudorandom but deterministic (for debug purposes).
basename() and dirname() aren't threadsafe.

When a third-party C function returns a pointer, should you free it yourself?

There are many functions (especially in the POSIX library) that return pointers to almost-necessarily freshly allocated data. Their manpages don't say if you should free them, or if there's some obscure mechanism at play (like returning a pointer to a static buffer, or something along these lines).
For instance, the inet_ntoa function returns a char* most likely out from nowhere, but the manpage doesn't say how it was allocated. I ended up using inet_ntop instead because at least I knew where the destination allocation came from.
What's the standard rule for C functions returning pointers? Who's responsible for freeing their memory?
You have to read the documentation; there is no other way. My man page for inet_ntoa reads:
The string is returned in a statically allocated buffer, which subsequent calls will overwrite.
So in this case you must not attempt to free the returned pointer.
There really isn't a standard rule. Some functions require you to pass a pointer in, and they fill data into that space (e.g., sprintf). Others return the address of a static data area (e.g., many of the functions in <time.h>). Others still allocate memory when needed (e.g., setvbuf).
About the best you can do is hope that the documentation tells you what pointers need to be freed. You shouldn't normally attempt to free pointers it returns unless the documentation tells you to. Unless you're passing in the address of a buffer for it to use, or it specifies that you need to free the memory, you should generally assume that it's using a static data area. This means (among other things) that you should assume the value will be changed by any subsequent calls to the same routine. If you're writing multithreaded code, you should generally assume that the function is not really thread-safe -- that you have a shared data area that requires synchronization, so you should acquire a lock, call the function, copy the data out of its data area, and only then release the lock.
There's no standard rule. Ideally, a standard library function such as inet_ntoa comes with a man page which describes the "rules of engagement", i.e. the interface of the function - arguments expected, return values in case of success and errors, as well as the semantics of dealing with allocated memory.
From the man page of inet_ntoa:
The inet_ntoa() function converts the Internet host address in, given in network byte order, to a string in IPv4 dotted-decimal notation. The string is returned in a statically allocated buffer, which subsequent calls will overwrite.
At least on my machine (Mac OS X 10.6) the final sentence of the manpage, under BUGS is:
The string returned by inet_ntoa() resides in a static memory area.
I think your idea that "many" functions in POSIX return pointers this way is mistaken. Your example, inet_ntoa is not in POSIX and was deliberately excluded because it's deprecated and broken.
The number of standard functions which return pointers to allocated memory is actually rather small, and most of the ones that do so provide a special complementary function you're required to use for freeing the memory (for instance, fopen and fclose, getaddrinfo and freeaddrinfo, or regcomp and regfree). Simply calling free on the pointer returned would be very bad; at best you'd end up with serious memory leaks, and at worst it could lead to unexpected crashes (for instance if the library was keeping the objects it allocated in a linked list).
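For example, with getaddrinfo the list it allocates must go back through freeaddrinfo, never through free() (a sketch, assuming a POSIX system):

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

int lookup(const char *host)
{
    struct addrinfo hints, *res;
    int err;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    err = getaddrinfo(host, "80", &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }

    /* ... walk and use the res list ... */

    freeaddrinfo(res);   /* the library knows how its own list was allocated */
    return 0;
}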
Whether a function is part of the system library or a third-party library, it should document the expected usage of any pointers it returns (and whether/how it's necessary to free them). For standard functions, the best reference on this matter is POSIX itself. You could also check the man pages for your particular system. If the code is part of a third-party library, it should come with documentation (perhaps in man pages, in the header files, or in a comprehensive document on library usage). A well-written library will provide special functions to free objects it allocates, so as to avoid introducing dependencies on the way it's (currently) implemented to code that uses the library.
As far as the nonstandard inet_ntoa and similar legacy functions go, they return pointers to internal static buffers. This makes them unsuitable for use with threads or in library code (which must take care not to destroy the caller's state unless it's documented as doing so). Often the documentation for such functions will say that they are not required to be thread-safe, that they are not reentrant, or that they may return a pointer to an internal static buffer which may be overwritten by subsequent calls to the function. Many people, myself included, believe that such functions should not be used at all in modern code.
