Are posix regcomp and regexec threadsafe? In specific, on GNU libc? - c

Two separate questions here really: Can I use regexes in a multithreaded program without locking and, if so, can I use the same regex_t at the same time in multiple threads? I can't find an answer on Google or the manpages.

http://www.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
2.9.1 Thread-Safety
All functions defined by this volume of POSIX.1-2008 shall be thread-safe, except that the following functions1 need not be thread-safe.
...
regexec and regcomp are not in that list, so they are required to be thread-safe.
See also: http://www.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
Part of the rationale text reads:
The interface is defined so that the matched substrings rm_sp and rm_ep are in a separate regmatch_t structure instead of in regex_t. This allows a single compiled RE to be used simultaneously in several contexts; in main() and a signal handler, perhaps, or in multiple threads of lightweight processes.

Can I use regexes in a multithreaded program without locking
Different ones, yes.
can I use the same regex_t at the same time in multiple threads?
In general: If you plan on doing so, you will have to do the locking around the functions, since few data structures do the locking for you.
regexec: Since regexec however takes a const regex_t, executing regexec seems safe for concurrent execution without locking. (After all, this is POSIX.1-2001, where stupid stuff like static buffers as used in the early BSD APIs usually don't occur anymore.)

Related

Why does system() exist?

Many papers and such mention that calls to 'system()' are unsafe and unportable. I do not dispute their arguments.
I have noticed, though, that many Unix utilities have a C library equivalent. If not, the source is available for a wide variety of these tools.
While many papers and such recommend against goto, there are those who can make an argument for its use, and there are simple reasons why it's in C at all.
So, why do we need system()? How much existing code relies on it that can't easily be changed?
sarcastic answer Because if it didn't exist people would ask why that functionality didn't exist...
better answer
Many of the system functionality is not part of the 'C' standard but are part of say the Linux spec and Windows most likely has some equivalent. So if you're writing an app that will only be used on Linux environments then using these functions is not an issue, and as such is actually useful. If you're writing an application that can run on both Linux and Windows (or others) these calls become problematic because they may not be portable between system. The key (imo) is that you are simply aware of the issues/concerns and program accordingly (e.g. use appropriate #ifdef's to protect the code etc...)
The closest thing to an official "why" answer you're likely to find is the C89 Rationale. 4.10.4.5 The system function reads:
The system function allows a program to suspend its execution temporarily in order to run another program to completion.
Information may be passed to the called program in three ways: through command-line argument strings, through the environment, and (most portably) through data files. Before calling the system function, the calling program should close all such data files.
Information may be returned from the called program in two ways: through the implementation-defined return value (in many implementations, the termination status code which is the argument to the exit function is returned by the implementation to the caller as the value returned by the system function), and (most portably) through data files.
If the environment is interactive, information may also be exchanged with users of interactive devices.
Some implementations offer built-in programs called "commands" (for example, date) which may provide useful information to an application program via the system function. The Standard does not attempt to characterize such commands, and their use is not portable.
On the other hand, the use of the system function is portable, provided the implementation supports the capability. The Standard permits the application to ascertain this by calling the system function with a null pointer argument. Whether more levels of nesting are supported can also be ascertained this way; assuming more than one such level is obviously dangerous.
Aside from that, I would say mainly for historical reasons. In the early days of Unix and C, system was a convenient library function that fulfilled a need that several interactive programs needed: as mentioned above, "suspend[ing] its execution temporarily in order to run another program". It's not well-designed or suitable for any serious tasks (the POSIX requirements for it make it fundamentally non-thread-safe, it doesn't admit asynchronous events to be handled by the calling program while the other program is running, etc.) and its use is error-prone (safe construction of command string is difficult) and non-portable (because the particular form of command strings is implementation-defined, though POSIX defines this for POSIX-conforming implementations).
If C were being designed today, it almost certainly would not include system, and would either leave this type of functionality entirely to the implementation and its library extensions, or would specify something more akin to posix_spawn and related interfaces.
Many interactive applications offer a way for users to execute shell commands. For instance, in vi you can do:
:!ls
and it will execute the ls command. system() is a function they can use to do this, rather than having to write their own fork() and exec() code.
Also, fork() and exec() aren't portable between operating systems; using system() makes code that executes shell commands more portable.

Are functions in the C standard library thread safe?

Where can I get a definitive answer, whether my memcpy (using the eglibc implementation that comes with Ubuntu) is thread safe? - Honestly, I really did not find a clear YES or NO in the docs.
By the way, with "thread safe" I mean it is safe to use memcpy concurrently whenever it would be safe to copy the date byte for byte concurrently. This should be possible at least if read-only data are copied to regions that do not overlap.
Ideally I would like to see something like the lists at the bottom of this page in the ARM compiler docs.
You can find that list here, at chapter 2.9.1 Thread-Safety : http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01
That is, this is a list over functions that posix does not require to be thread safe. All other functions are required to be thread safe. Posix includes the standard C library and the typical "unix" interfaces. (Full list here, http://pubs.opengroup.org/onlinepubs/9699919799/functions/contents.html)
memcpy() is specified by posix, but not part of the list in 2.9.1, and can therefore be considered thread safe.
The various environments on linux at least tries to implement posix to the best of its abilities - The functions on linux/glibc might be thread-safe even if posix doesn't require it to be - though this is rarely documented. For other functions/libraries than what posix covers, you are left with what their authors have documented...
From what I can tell, posix equates thread safety with reentrancy, and guarantees there is no internal data races. You, however, are responsible for the possible external data races - such as protecting yourself from calling e.g. memcpy() with memory that might be updated concurrently.
It depends on the function, and how you use it.
Take for example memcpy, it is generally thread safe, if you copy data where both source and destination is private to a single thread. If you write to data that can be read from/written to by another thread, it's no longer thread safe and you have to protect the access.
If a glibc function is not thread-safe then the man page will say so, and there will (most likely) be a thread safe variant also documented.
See, for example, man strtok:
SYNOPSIS
#include
char *strtok(char *str, const char *delim);
char *strtok_r(char *str, const char *delim, char **saveptr);
The _r (for "reentrant") is the thread-safe variant.
Unfortunately, the man pages do not make a habit of stating that a function is thread safe, but only mention thread-safety when it is an issue.
As with all functions, if you give it a pointer to a shared resource then it will become thread-unsafe. It is up to you to handle locking.

Where can I find the list of non-reentrant functions provided in gnu libc?

I am now porting an single-threaded library to support multi-threads, and I need the whole list of functions that use local static or global variables.
Any information is appreciated.
Check the manual page for each function you use ... the non-thread-safe ones will be identified as such, and the manual page will mention a thread safe version when there is one (e.g., readdir_r). You could extract the list by running a script over the man pages.
Edit: Although my answer has been accepted, I fear that it is inaccurate and possibly dangerous. For example, while strerror_r mentions that it is a thread safe version of strerror, strerror itself says nothing about thread safety ... what it says instead is "the string might be overwritten", which merely implies that it isn't thread-safe. So you need to search for at least "might be overwritten" as well as "thread", but there's no guarantee that even that will be complete.
Its always a good idea to know if a particular function is reentrant or not, but you must also consider the situation when you may call several reentrant functions from a shared piece of code from multiple threads, which could also lead to problems when using shared data.
So, if you have any data shared between threads, the data must be "protected" irregardless of the fact that the functions being called are reentrant.
Consider the following function:
void yourFunc(CommonObject *o)
{
/* This function is NOT thread safe */
reentrant_func1(o->propertyA);
reentrant_func2(o->propertyA);
}
If this function is not mutex protected, you will get undesired behavior in a multithreaded application, irregardless of the fact that func1 and func2 are reentrant.

C setjmp.h and ucontext.h, which is better?

Hi I'm need to jump from a place to another...
But I would like to know which is better to use, setjmp or ucontext, things like:
Are setjmp and ucontext portable?
My code is thread safe using these library?
Why use one instead another?
Which is fast and secure?
...(Someone please, can answer future question that I forgot to put here?)
Please give a little more information that I'm asking for, like examples or some docs...
I had searching on the web, but I only got exception handling in C like example of setjmp, and I got nothing about ucontex.h, I got that it was used for multitask, what's the difference of it and pthread?
Thanks a lot.
setjmp is portable (ISO C89 and C99) and ucontext (obsolescent in SUSv3 and removed from SUSv4/POSIX 2008) is not. However ucontext was considerably more powerful in specification. In practice, if you used nasty hacks with setjmp/longjmp and signal handlers and alternate signal handling stacks, you could make these just about as powerful as ucontext, but they were not "portable".
Neither should be used for multithreading. For that purpose POSIX threads (pthread functions). I have several reasons for saying this:
If you're writing threaded code, you might as well make it actually run concurrently. We're hitting the speed limits of non-parallel computing and future machines will be more and more parallel, so take advantage of that.
ucontext was removed from the standards and might not be supported in future OS's (or even some present ones?)
Rolling your own threads cannot be made transparent to library code you might want to use. It might break library code that makes reasonable assumptions about concurrency, locking, etc. As long as your multithreading is cooperative rather than async-signal-based there are probably not too many issues like this, but once you've gotten this deep into nonportable hacks things can get very fragile.
...and probably some more reasons I can't think of right now. :-)
On the portability matter, setjmp() is portable to all hosted C implementations; the <ucontext.h> functions are part of the XSI extensions to POSIX - this makes setjmp() significantly more portable.
It is possible to use setjmp() in a thread-safe manner. It doesn't make much sense to use the ucontext functions in a threaded program - you would use multiple threads rather than multiple contexts.
Use setjmp() if you want to quickly return from a deeply-nested function call (this is why you find that most examples show its use for exception handling). Use the ucontext functions for implementing user-space threads or coroutines (or don't use them at all).
The "fast and secure" question makes no sense. The implementations are typically as fast as it is practical to make them, but they perform different functions so cannot be directly compared (the ucontext functions do more work, so will typically be slightly slower).
Note that the ucontext functions are listed as obsolescent in the two most recent editions of POSIX. The pthreads threading functions should generally be used instead.
setjmp/longjmp are only intended to restore a "calling" context, so you can use it only to do a "fast exit" from a chain of subroutines. Different uses may or may not work depending on the system, but in general these functions are not intended to do this kind of things. So "ucontext" is better. Also have a look to "fibers" (native on Windows). Here a link to an article that may be helpful:
How to implement a practical fiber scheduler?
Bye!

Is there something to replace the <ucontext.h> functions?

The user thread functions in <ucontext.h> are deprecated because they use a deprecated C feature (they use a function declaration with empty parentheses for an argument).
Is there a standard replacement for them? I don't feel full-fledged threads are good at implementing cooperative threading.
If you really want to do something like what the ucontext.h functions allow, I would keep using them. Anything else will be less portable. Marking them obsolescent in POSIX seems to have been a horrible mistake of pedantry by someone on the committee. POSIX itself requires function pointers and data pointers to be the same size and for function pointers to be representable cast to void *, and C itself requires a cast between function pointer types and back to be round-trip safe, so there are many ways this issue could have been solved.
There is one real problem, that converting the int argc, ... passed into makecontext into a form to pass to the function cannot be done without major assistance from the compiler unless the calling convention for variadic and non-variadic functions happens to be the same (and even then it's rather questionable whether it can be done robustly). This problem however could have been solved simply by deprecating the use of makecontext in any form other than makecontext(ucp, func, 1, (void *)arg);.
Perhaps a better question though is why you think ucontext.h functions are the best way to handle threading. If you do want to go with them, I might suggest writing a wrapper interface that you can implement either with ucontext.h or with pthreads, then comparing the performance and bloat. This will also have the advantage that, should future systems drop support for ucontext.h, you can simply switch to compiling with the pthread-based implementation and everything will simply work. (By then, the bloat might be less important, the benefit of multi-core/SMP will probably be huge, and hopefully pthread implementations will be less bloated.)
Edit (based on OP's request): To implement "cooperative threading" with pthreads, you need condition variables. Here's a decent pthreads tutorial with information on using them:
https://computing.llnl.gov/tutorials/pthreads/#ConditionVariables
Your cooperative multitasking primitive of "hand off execution to thread X" would go something like:
self->flag = 0;
other_thread->flag = 1;
pthread_mutex_lock(other_thread->mutex);
pthread_cond_signal(other_thread->cond);
pthread_mutex_unlock(other_thread->mutex);
pthread_mutex_lock(self->mutex);
while (!self->flag)
pthread_cond_wait(self->cond, self->mutex);
pthread_mutex_unlock(self->mutex);
Hope I got that all right; at least the general idea is correct. If anyone sees mistakes please comment so I can fix it. Half of the locking (other_thread's mutex) is probably entirely unnecessary with this sort of usage, so you could perhaps make the mutex a local variable in the task_switch function. All you'd really be doing is using pthread_cond_wait and pthread_cond_signal as "go to sleep" and "wake up other thread" primitives.
For what it's worth, there's a Boost.Context library that was recently accepted and needs only to be merged into an official Boost release. Boost.Context addresses the same use cases as the POSIX ucontext family: low-overhead cooperative context switching. The author has taken pains with performance issues.
No, there is no standard replacement for them.
You options are
continue to use <ucontext.h>even though they contain obsolete C.
switch to pthreads
write your own co-thread library
use an existing (and possibly not-so-portable) co-thread library such as http://swtch.com/libtask/ , though many of such libraries are implemented on top of ucontext.h
The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Still lists makecontext() and swapcontext() with the same deprecated syntax. I have not seen anything more recent.

Resources