Thread-safety of C standard library on OS X - c

Is there a definitive list of functions that are thread-safe in Mac OS X's implementation of the C standard library?
There is a good answer here with regards to glibc and f*() functions specifically, but I have failed to find any such resource with respect to OS X. Is there such a thing?
For example, are strptime() and strftime() thread-safe? printf()? These are some that may have internal buffers that I would not want to mess up. :)

The Single Unix Specification gives a fairly short list of functions that are allowed to be non–thread-safe (except that functions in the "Legacy Feature Group" are allowed to be non–thread-safe despite not being listed there). The list includes strtok(), which Dave mentions in his answer, but does not include strptime(), nor strftime(), nor printf().
This StackOverflow answer asserts, in response to a question that is fairly similar to this one, that OS X does support the above aspect of the spec, so I think that's probably the best list to use. (You'll probably also be interested in the rest of that question, and in the other answer to it, by the way.)

Any function which seems to have some magical remembering power, is likely not to be thread-safe. Any function which returns a pointer you aren't expected to free() is very frequently not thread-safe.
Many of the functions you really have to worry about return char*, or struct foo*. Although this isn't a perfect rule, this is often indicative of a function which has some sort of static storage, and isn't thread-safe.
strtok() is a simple example of this, and has been succeeded by strtok_r(), which is thread-safe. For many non-thread-safe functions, there exists a function_r() (r for reentrant) which is.
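To make the strtok() versus strtok_r() difference concrete, here is a minimal sketch. The helper name count_words is made up for illustration, and the feature-test macro is an assumption about how your platform exposes the POSIX declaration.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: needed on some systems to declare strtok_r() */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts whitespace-separated words in `text`.
 * `text` is modified in place, exactly as strtok() would modify it. */
size_t count_words(char *text)
{
    char *save = NULL;   /* parser state lives here, not in a hidden static */
    size_t n = 0;

    for (char *tok = strtok_r(text, " \t\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save)) {
        n++;
    }
    return n;
}
```

Because the state is held in the caller's save pointer, two threads (or two nested loops) can tokenize different strings at the same time without stepping on each other.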

Related

Is there a standard-compliant way to detect whether a function in the C standard library is implemented via intrinsic/builtin?

Is there a standard-compliant way to detect whether a function in the C standard library is implemented via intrinsic/builtin?
I'm pretty confident I can implement code which performs better than the function provided by the standard library for a specific call site if only because of function call overhead. But if the function in question is implemented via intrinsic/builtin, there's no function call overhead to beat, so it would be foolish to try.
If there's a way, I have a feeling it won't be simple because it may vary by call site. For example, passing a constant length to memcpy may provide the compiler a great opportunity to generate inline code, but a variable length may provide a lesser opportunity. I guess the best hint available might be one of three values, "always", "never", or "sometimes". That would be good enough for me.
The details of how this might be accomplished are negotiable as long as they're standard-compliant. The version of the standard is even negotiable because that's testable and I'd be happy making the safest assumption if the question weren't answerable for an earlier version of the standard. But of course a way to do this at compile-time would be preferred.
(edited to include concrete details to make it easier to think about even though these details don't matter)
Let's assume memcpy is indeed the function in question and that we know the length is always variable because it was passed in to the function which calls memcpy, but we also know that length is frequently 1.
The overhead of calling into a library will surely dominate both if (1==length) and *dst = *src;. So the questions are how frequently 1 is actually the value, which is a question only I can answer, and whether any possibility the implementation will call into a library can be eliminated.
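The special case described here might be sketched as follows. copy_bytes is a hypothetical name, and whether this actually beats a plain memcpy call depends on the compiler and on how often length really is 1, so it would need measuring.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies `length` bytes from src to dst.
 * The single-byte case is handled inline; everything else goes to memcpy(). */
static inline void copy_bytes(unsigned char *dst, const unsigned char *src,
                              size_t length)
{
    if (length == 1) {
        *dst = *src;               /* no library call on the common path */
    } else {
        memcpy(dst, src, length);  /* general case, including length 0 */
    }
}
```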
This question isn't about whether one can write a function which goes faster than memcpy or any other standard library function. There are plenty of questions on that and this isn't one of them.
It seems the closest we'll get to a simple YES or NO answer is this comment from Nate Eldredge: "The C standard doesn't even have the concept of 'intrinsic / builtin'".

Is there a particular reason for memmem being a GNU extension?

In C, the memmem function is used to locate a particular sequence of bytes in a memory area. It can be likened to strstr, which is dedicated to null-terminated strings.
Is there any particular reason for this function to be available as a GNU extension, and not directly in the standard libraries? The manual states:
This function was broken in Linux libraries up to and including libc 5.0.9; there the needle and haystack arguments were interchanged, and a pointer to the end of the first occurrence of needle was returned.
Both old and new libc's have the bug that if needle is empty, haystack-1 (instead of haystack) is returned. And glibc 2.0 makes it worse, returning a pointer to the last byte of haystack. This is fixed in glibc 2.1.
I can see it went through several fixes, yet I'd like to know why it was not made as directly available (if not more) as strstr on some distributions. Does it still bring up implementation issues?
Edit (motivations): I wouldn't ask this question if the standard had decided it the other way around: including memmem but not strstr. Indeed, strstr could be something like:
memmem(str, strlen(str), "search", 6);
Slightly trickier, but still a pretty logical one-liner considering that it is very usual in C functions to require both the data chunk and its length.
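Spelled out a little further, the idea looks like the sketch below. Since memmem itself is not in the standard, find_bytes is a hypothetical, naive stand-in with the same signature, and strstr_via_bytes is a made-up name for the resulting wrapper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with memmem()'s signature (naive O(n*m) search). */
static void *find_bytes(const void *haystack, size_t hlen,
                        const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;

    if (nlen == 0)
        return (void *)haystack;      /* empty needle matches at the start */
    if (nlen > hlen)
        return NULL;

    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(h + i, needle, nlen) == 0)
            return (void *)(h + i);
    }
    return NULL;
}

/* strstr() look-alike expressed through the length-based search. */
static char *strstr_via_bytes(const char *str, const char *sub)
{
    return find_bytes(str, strlen(str), sub, strlen(sub));
}
```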
Edit (2): another motivation from comments and answers. Quoting Theolodis:
Not every function is necessary to every single, or at least most of the C developers, so it would actually make the standard libraries unnecessarily huge.
Well, I couldn't agree more, I'm always in when it comes to making the libraries lighter and faster. But then... why both strncpy and memcpy (from keltar's comment)...? I could almost ask: why has poor memmem been "black-sheeped"?
Historically, that is, before the first revision of the Standard, C was shaped by compiler writers.
In the case of strstr, it is a little bit different because it has been introduced by the C Committee, the C89 Rationale document tells us that:
"The strstr function is an invention of the Committee. It is included as a hook for efficient algorithms, or for built-in substring instruction."
The C Committee does not explain why it has not made a more general function not limited to strings so any reasoning may only be speculation. My only guess is the use case has been considered not important enough to have a generic memmem instead of strstr. Remember that in the goals of C there is this requirement (in the C99 Rationale) "Keep the language small and simple". Also even POSIX didn't consider it for inclusion.
In any case to my knowledge nobody has proposed any Defect Report or proposal to have memmem included.

Why are there so many functions in the string.h library that are "not recommended for use"?

There is something I am trying to understand about C's origins: why are there functions that most SO answers recommend against using? Functions like strtok or strncpy are simply not safe to work with. Everywhere I see recommendations to write my own implementation. Why wouldn't the standard replace strncpy, for example, with BSD's strlcpy, instead of being left with these "monsters"?
C is a product of the early 1970s, and it shows. Many of the iffier library functions were written when the C user community was very small and limited to academia, most of whom were experienced programmers.
By the time the first standard was released in 1989, those original library functions were already entrenched in 10 to 15 years' worth of legacy code (not the least of which was the Unix operating system and most of its tools). The committee in charge of standardization was loath to break the existing codebase, so those functions were incorporated into the standard pretty much as-is; all that really changed was adding prototype syntax to the declarations and changing char * to void * where necessary (malloc, memcpy, memset, etc.).
AFAIK, only one library function has actually been removed from the language since standardization - gets. The mayhem caused by that one library call is scarier than the prospect of breaking what is by now almost 40 years' worth of legacy code.
There is a LOT of legacy "C" and "C++" code out there. If they removed all the "unsafe" functions from the "C" runtime libraries, it would be prohibitive for many developers to upgrade their compilers because all the old code wouldn't build any more.
Sometimes they will give "deprecated" compiler messages (MSFT is fond of this) so you will find and change to using the new, safer functions.
New code should use the "safe" functions, of course, but many of us are stuck with old compilers and legacy code to maintain :)
They still exist because of their historical ties to the old systems and code that still use them, i.e. to preserve backward compatibility.
Writing your own implementation is suggested so that programmers use their own logic at their own risk, since no one knows their environment better than the programmers themselves; strtok, for example, is not thread-safe.
It's all just dogma. Use the functions; just be aware that they're indifferent to your goals, in that they might not work in all circumstances (e.g. strtok and multi-threading) or they expect conditions to be checked before or after use (e.g. strncpy and missing terminating characters).
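As an illustration of checking conditions around strncpy, a thin wrapper can supply the terminator that strncpy may leave out. This is only a sketch; copy_string is a hypothetical name, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: strncpy() that always terminates dst.
 * Returns 1 if src fit completely, 0 if it was truncated. */
static int copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;                     /* nowhere to put even the terminator */

    strncpy(dst, src, dst_size - 1);  /* may leave dst unterminated ...      */
    dst[dst_size - 1] = '\0';         /* ... so terminate it unconditionally */

    return strlen(src) < dst_size;
}
```

The return value lets the caller decide whether truncation is acceptable, which strncpy on its own does not report.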

Is there something to replace the <ucontext.h> functions?

The user thread functions in <ucontext.h> are deprecated because they use a deprecated C feature (they use a function declaration with empty parentheses for an argument).
Is there a standard replacement for them? I don't feel full-fledged threads are good at implementing cooperative threading.
If you really want to do something like what the ucontext.h functions allow, I would keep using them. Anything else will be less portable. Marking them obsolescent in POSIX seems to have been a horrible mistake of pedantry by someone on the committee. POSIX itself requires function pointers and data pointers to be the same size and function pointers to be representable when cast to void *, and C itself requires a cast between function pointer types and back to be round-trip safe, so there are many ways this issue could have been solved.
There is one real problem, that converting the int argc, ... passed into makecontext into a form to pass to the function cannot be done without major assistance from the compiler unless the calling convention for variadic and non-variadic functions happens to be the same (and even then it's rather questionable whether it can be done robustly). This problem however could have been solved simply by deprecating the use of makecontext in any form other than makecontext(ucp, func, 1, (void *)arg);.
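For reference, here is a minimal sketch of cooperative switching with these functions, assuming a platform that still ships <ucontext.h> (for example Linux with glibc). It sidesteps the argument-passing question by giving makecontext an argc of 0 and sharing data through a variable instead. The names task, run_task and steps, and the stack size, are made up for the example.

```c
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* assumed stack size for the sketch */
static int steps;                    /* data shared instead of makecontext args */

static void task(void)
{
    steps++;                             /* first slice of work */
    swapcontext(&task_ctx, &main_ctx);   /* yield back to the main context */
    steps++;                             /* second slice of work */
    /* returning from here resumes the context named in uc_link */
}

/* Hypothetical driver: runs task() in two cooperative slices, returns steps. */
static int run_task(void)
{
    steps = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task, 0);     /* argc 0: no variadic arguments */

    swapcontext(&main_ctx, &task_ctx);   /* run until the task yields */
    swapcontext(&main_ctx, &task_ctx);   /* resume until the task returns */
    return steps;
}
```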
Perhaps a better question though is why you think ucontext.h functions are the best way to handle threading. If you do want to go with them, I might suggest writing a wrapper interface that you can implement either with ucontext.h or with pthreads, then comparing the performance and bloat. This will also have the advantage that, should future systems drop support for ucontext.h, you can simply switch to compiling with the pthread-based implementation and everything will simply work. (By then, the bloat might be less important, the benefit of multi-core/SMP will probably be huge, and hopefully pthread implementations will be less bloated.)
Edit (based on OP's request): To implement "cooperative threading" with pthreads, you need condition variables. Here's a decent pthreads tutorial with information on using them:
https://computing.llnl.gov/tutorials/pthreads/#ConditionVariables
Your cooperative multitasking primitive of "hand off execution to thread X" would go something like:
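self->flag = 0;
/* Set the other thread's flag under its mutex so the waiter cannot miss the update. */
pthread_mutex_lock(other_thread->mutex);
other_thread->flag = 1;
pthread_cond_signal(other_thread->cond);
pthread_mutex_unlock(other_thread->mutex);
/* Sleep until some other thread hands execution back to us. */
pthread_mutex_lock(self->mutex);
while (!self->flag)
    pthread_cond_wait(self->cond, self->mutex);
pthread_mutex_unlock(self->mutex);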
Hope I got that all right; at least the general idea is correct. If anyone sees mistakes please comment so I can fix it. Half of the locking (other_thread's mutex) is probably entirely unnecessary with this sort of usage, so you could perhaps make the mutex a local variable in the task_switch function. All you'd really be doing is using pthread_cond_wait and pthread_cond_signal as "go to sleep" and "wake up other thread" primitives.
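A fuller, self-contained version of the same idea might look like the sketch below. It is not the answer's exact code: struct task, task_wake, task_wait, task_switch and the ping-pong demo are hypothetical names, and this variant clears the flag when a task wakes up rather than before the hand-off. It assumes the program is built with pthread support (e.g. -pthread).

```c
#include <pthread.h>

/* Hypothetical per-task state; the field names mirror the snippet above. */
struct task {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             flag;    /* 1 = this task has been told to run */
};

/* "Wake up other thread": set its flag under its own mutex and signal it. */
static void task_wake(struct task *to)
{
    pthread_mutex_lock(&to->mutex);
    to->flag = 1;
    pthread_cond_signal(&to->cond);
    pthread_mutex_unlock(&to->mutex);
}

/* "Go to sleep": block until flagged, then consume the flag. */
static void task_wait(struct task *self)
{
    pthread_mutex_lock(&self->mutex);
    while (!self->flag)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&self->cond, &self->mutex);
    self->flag = 0;
    pthread_mutex_unlock(&self->mutex);
}

/* Hand execution to `to` and sleep until someone hands it back. */
static void task_switch(struct task *self, struct task *to)
{
    task_wake(to);
    task_wait(self);
}

static struct task main_task   = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
static struct task worker_task = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
static int counter;

static void *worker(void *arg)
{
    (void)arg;
    task_wait(&worker_task);                 /* do not run until handed control */
    for (int i = 0; i < 3; i++) {
        counter += 1;                        /* only one task runs at a time */
        task_switch(&worker_task, &main_task);
    }
    return NULL;
}

/* Hypothetical demo: main and worker alternate three times each. */
static int run_ping_pong(void)
{
    pthread_t tid;

    counter = 0;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    for (int i = 0; i < 3; i++) {
        task_switch(&main_task, &worker_task);
        counter += 10;
    }
    task_wake(&worker_task);                 /* let the worker leave its last switch */
    pthread_join(tid, NULL);
    return counter;
}
```

Each task keeps its own mutex and condition variable here, which matches the snippet above; as noted, a simpler design could probably get by with less locking.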
For what it's worth, there's a Boost.Context library that was recently accepted and needs only to be merged into an official Boost release. Boost.Context addresses the same use cases as the POSIX ucontext family: low-overhead cooperative context switching. The author has taken pains with performance issues.
No, there is no standard replacement for them.
Your options are:
continue to use <ucontext.h> even though its functions rely on obsolete C.
switch to pthreads.
write your own co-thread library.
use an existing (and possibly not-so-portable) co-thread library such as http://swtch.com/libtask/ , though many such libraries are implemented on top of ucontext.h.
The Open Group Base Specifications Issue 6 (IEEE Std 1003.1, 2004 Edition) still lists makecontext() and swapcontext() with the same deprecated syntax. I have not seen anything more recent.

Safer Alternatives to the C Standard Library

The C standard library is notoriously poor when it comes to I/O safety. Many functions have buffer overflows (gets, scanf), or can clobber memory if not given proper arguments (scanf), and so on. Every once in a while, I come across an enterprising hacker who has written his own library that lacks these flaws.
What are the best of these libraries you have seen? Have you used them in production code, and if so, which held up as more than hobby projects?
I use the GLib library; it has many good standard and non-standard functions.
See https://developer.gnome.org/glib/stable/
and maybe you fall in love... :)
For example:
https://developer.gnome.org/glib/stable/glib-String-Utility-Functions.html#g-strdup-printf
explains that g_strdup_printf is:
Similar to the standard C sprintf() function but safer, since it calculates the maximum space required and allocates memory to hold the result.
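The "measure first, then allocate" approach can be sketched in standard C99 as well. This is not GLib's implementation, only an illustration of the technique; dup_printf is a hypothetical name.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper in the spirit of g_strdup_printf(), standard C99 only.
 * Returns a malloc()ed string the caller must free(), or NULL on failure. */
static char *dup_printf(const char *fmt, ...)
{
    va_list ap, ap_copy;
    va_start(ap, fmt);
    va_copy(ap_copy, ap);            /* the arguments are walked twice */

    int needed = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);

    char *out = NULL;
    if (needed >= 0) {
        out = malloc((size_t)needed + 1);
        if (out != NULL)
            vsnprintf(out, (size_t)needed + 1, fmt, ap_copy);  /* second pass: write */
    }
    va_end(ap_copy);
    return out;
}
```

A call such as dup_printf("%s-%d", "id", 42) yields a freshly allocated string that is exactly large enough, so there is no fixed-size buffer to overflow.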
This isn't really answering your question about the safest libraries to use, but most functions that are vulnerable to buffer overflows that you mentioned have safer versions which take the buffer length as an argument to prevent the security holes that are opened up when the standard methods are used.
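For instance, fgets() and snprintf() are the length-taking counterparts of gets() and sprintf(). The two small helpers below are hypothetical and only show how the buffer size is passed along.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (never more than
 * buf_size - 1 characters) and strips the trailing newline.
 * Returns 1 on success, 0 on end of file or error. */
static int read_line(FILE *in, char *buf, size_t buf_size)
{
    if (fgets(buf, (int)buf_size, in) == NULL)   /* bounded, unlike gets() */
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Hypothetical helper: formats a greeting into buf without overflowing it.
 * Returns 1 if the whole text fit, 0 if it was truncated. */
static int format_greeting(char *buf, size_t buf_size, const char *name)
{
    int n = snprintf(buf, buf_size, "Hello, %s!", name);  /* bounded, unlike sprintf() */
    return n >= 0 && (size_t)n < buf_size;
}
```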
Unless you have relaxed the level of warnings, you will usually get compiler warnings when you use the deprecated methods, suggesting you use the safer methods instead.
I believe the Apache Portable Runtime (apr) library is safer than the standard C library. I use it, well, as part of an apache module, but also for independent processes.
For Windows there is a 'safe' C/C++ library.
You're always at liberty to implement any library you like and to use it - the hard part is making sure it is available on the platforms you need your software to work on. You can also use wrappers around the standard functions where appropriate.
Whether it is really a good idea is somewhat debatable, but there is TR24731 published by the C standard committee - for a safer set of C functions. There's definitely some good stuff in there. See this question: Do you use the TR 24731 Safe Functions in your C code?, which includes links to the technical report.
Maybe the first question to ask is whether you really need plain C. (Maybe a .NET language or Java is an option; then buffer overflows, for example, are not really a problem anymore.)
Another option is maybe to write parts of your project in C++ if other higher level languages are not an option. You can then have a C interface which encapsulates the C++ code if you really need C.
Because if you add all the advanced functions the C++ standard library has built in, your C code would only be marginally faster most of the time (and would contain a lot more bugs than an existing and tested framework).

Resources