According to the exit(3) man page, the exit function is not thread-safe (MT-Unsafe race:exit). This is because it tries to clean up resources (flush data to the disk, close file descriptors, etc.) by calling callbacks registered using on_exit and atexit. And I want my program to do that! (One of my threads keeps an fd open during the whole program's lifespan, so _exit is not an option for me: I want all the data to be written to the output file.)
My question is the following: if I'm careful and don't share any sensitive data (like an fd) between my threads, is it "acceptable" to call exit in a multi-threaded program? Note that I only call exit if an unrecoverable error occurs. Yet I can't afford a segfault while the program tries to exit. The thing is, an unrecoverable error can happen in any thread...
I was thinking about using setjmp/longjmp to kill my threads "nicely" but this would be quite complex to do and would require many changes everywhere in my code.
Any suggestions would be greatly appreciated. Thanks ! :)
EDIT: Thanks to @Ctx's insight, I came up with the following idea:
#define EXIT(status) do { pthread_mutex_lock(&exit_mutex); exit(status); } while(0)
Of course the exit_mutex must be global (extern).
The manpage states that
The exit() function uses a global variable that is not protected, so it is not thread-safe.
so no matter how careful you are in other ways, it won't help.
But the problem documented is a race condition: MT-Unsafe race:exit
So if you make sure that exit() can never be called concurrently from two threads, you should be on the safe side! You can ensure this with a mutex, for example.
A modern cross-platform C++ solution could be:
#include <cstdlib>
#include <mutex>

std::mutex exit_mutex;

[[noreturn]] void exit_thread_safe(const int status)
{
    // Never unlocked: any later caller blocks here until the process ends.
    exit_mutex.lock();
    std::exit(status);
}
The mutex ensures that exit is never called by 2 (or more) different threads.
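For plain C, the same "only one thread ever gets to call exit()" idea can be sketched without a mutex. This version is my own substitution of a C11 atomic_flag for the mutex, and claim_exit() and exit_once() are hypothetical names. As with the mutex, the other threads keep running while the atexit handlers execute, so those handlers still must not touch data the other threads are using.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

static atomic_flag exit_claimed = ATOMIC_FLAG_INIT;

/* Returns true for exactly one caller over the life of the process. */
bool claim_exit(void) {
    return !atomic_flag_test_and_set(&exit_claimed);
}

/* Only the first thread to get here calls exit(). Any later caller parks
   forever (like a thread blocked on a never-unlocked exit_mutex) and is
   torn down when the process terminates. */
_Noreturn void exit_once(int status) {
    if (claim_exit())
        exit(status);
    for (;;)
        pause();   /* returns only after a signal handler runs; loop again */
}
```

Each error path would call exit_once(EXIT_FAILURE) in place of exit(EXIT_FAILURE).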
However, I still question the reason behind even caring about this. How likely is a multi-threaded call to exit() and which bad things can even realistically happen?
EDIT:
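Using std::quick_exit avoids the clang diagnostic warning. Note, however, that quick_exit does not run functions registered with atexit and is not guaranteed to flush open streams; only handlers registered with at_quick_exit are called.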
It can't be done: even if no data is shared between threads at first, data must be shared between a thread and its cleanup function. The function should run only after the thread has stopped or reached a safe point.
I'm reading code for a linux daemon and its main() function is structured like this:
int main(int argc, char **argv){
// code which spawns some worker threads, sets up some
// signal handlers for clean termination, etc.
// ...
for(;;){ sleep(1); }
p_clean_exit();
return 0;
}
As it stands this makes no sense to me.
The for loop will keep the process alive, waking every second and then going back to sleep.
p_clean_exit() will never be called, nor will 0 be returned by the last statement.
Of course there's code elsewhere which sends signals, and installed handlers which in turn call p_clean_exit() on their own for program termination. But this particular call will never be reached. Right?
Does this make actual sense under some circumstance?
None of the code in your example really makes much sense: there's no point in writing a daemon that wakes up once per second only to sleep again without doing anything. But it's okay, because this is just a sample... it's the skeleton of a daemon, and it's understood that you'll add code inside that loop that does something interesting. The code that you add might call break or otherwise cause the loop to exit, and that's when it's good form to call p_clean_exit().
So yes, it's true that if you were to compile and run the code as is, the call to p_clean_exit() won't mean much. But no, it's not pointless to have it there, because the whole point of the code isn't to use it as is, it's to show you how to structure a real daemon, and a real daemon should absolutely clean up after itself.
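Here is a minimal sketch of how that loop might get a way out. It assumes the usual async-signal-safe pattern: the handler only sets a volatile sig_atomic_t flag, and the loop tests it. install_stop_handlers() and daemon_should_run() are hypothetical helpers, and SIGINT/SIGTERM are just example signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set from the signal handler, read by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int sig) {
    (void)sig;
    stop_requested = 1;   /* only async-signal-safe work in the handler */
}

/* Hypothetical helper: route SIGINT and SIGTERM to on_stop_signal(). */
int install_stop_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1) return -1;
    if (sigaction(SIGTERM, &sa, NULL) == -1) return -1;
    return 0;
}

/* Hypothetical helper: the loop condition for main(). */
bool daemon_should_run(void) {
    return !stop_requested;
}
```

main() would call install_stop_handlers() once and then loop with while (daemon_should_run()) sleep(1); so that p_clean_exit() and return 0 are actually reached. If the signal lands just after the flag test, the loop notices it after at most one more second of sleep.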
Does this make actual sense under some circumstance?
Yes: C has function-like macros, Duff's device-style switch tricks and setjmp. There may be a call to setjmp above the loop, hidden in a switch, that "jumps over" the endless loop to execute the cleanup code when a longjmp is executed, for example from a signal handler that interrupts the main thread. (A longjmp to a jmp_buf saved by a different thread is undefined behavior, so the jump has to happen in the thread that called setjmp.)
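#include <setjmp.h>
#include <signal.h>
#include <unistd.h>

#define p_clean_init(context) switch (sigsetjmp(context, 1)) { case 0:
#define p_clean_exit() case 1: ; }

static sigjmp_buf context;

static void signal_handler(int sig) {
    (void)sig;
    siglongjmp(context, 1);   /* resume at case 1 in main() */
}

int main(void) {
    p_clean_init(context);
    signal(SIGINT, signal_handler);   /* SIGINT used as an example signal */
    for(;;){ sleep(1); }
    p_clean_exit();
    /* cleanup code runs here after the jump */
    return 0;
}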
There are pthread_cleanup_push and pthread_cleanup_pop functions that are used in a similar fashion, and from the documentation we know that POSIX.1 permits pthread_cleanup_push() and pthread_cleanup_pop() to be implemented as macros that expand to text containing '{' and '}'. You may explore its glibc implementation.
Maybe worth noting: expanding macros to a switch(...) (or to goto with the GCC labels-as-values extension) is the technique explored by the protothreads project and used most notably by Contiki OS. But it would not work here unless the endless loop yielded periodically.
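To illustrate why pthread_cleanup_push() and pthread_cleanup_pop() must appear as a matched pair in the same lexical scope, here is a small sketch. The handler registered by the push runs when the thread calls pthread_exit() before reaching the pop. The function names and the 64-byte buffer are illustrative only.

```c
#include <pthread.h>
#include <stdlib.h>

static int cleanup_ran = 0;

/* Cleanup handler: runs if the thread exits between push and pop. */
static void release_buffer(void *buf) {
    free(buf);
    cleanup_ran = 1;
}

static void *cleanup_worker(void *unused) {
    (void)unused;
    char *buf = malloc(64);
    pthread_cleanup_push(release_buffer, buf);  /* may expand to text with '{' */
    pthread_exit(NULL);                         /* runs release_buffer(buf) */
    pthread_cleanup_pop(0);                     /* must pair with the push; may expand to '}' */
    return NULL;
}

/* Returns 1 if the cleanup handler ran, 0 if not, -1 on error. */
int run_cleanup_worker(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, cleanup_worker, NULL) != 0) return -1;
    if (pthread_join(t, NULL) != 0) return -1;
    return cleanup_ran;
}
```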
I have read a few books on parallel programming over the past few months, and I decided to round it off by learning about POSIX threads.
I am reading "PThreads Programming - A POSIX Standard for Better Multiprocessing" (a Nutshell Handbook). In chapter 5 (Pthreads and UNIX) the author talks about handling signals in multi-threaded programs. In the "Threadsafe Library Functions and System Calls" section, the author makes a statement that I have not seen in most of the books I have read on parallel programming. The statement was:
Race conditions can also occur in traditional, single-threaded programs that use signal handlers or that call routines recursively. A single-threaded program of this kind may have the same routine in progress in various call frames on its process stack.
I find this statement a little hard to decipher. Does the race condition in the recursive function occur when the recursive function keeps an internal structure using static storage?
I would also love to know how signal handlers can cause race conditions in single-threaded programs.
Note: I am not a computer science student, so I would really appreciate simplified terms.
I don't think one can call it a race condition in the classical sense. Race conditions have somewhat stochastic behavior, depending on the scheduler policy and timing.
The author is probably talking about bugs that can arise when the same object/resource is accessed from multiple recursive calls. But this behavior is completely deterministic and manageable.
Signals, on the other hand, are a different story: they occur asynchronously, so they can interrupt some data processing in the middle and trigger some other processing on that data, corrupting it by the time control returns to the interrupted task.
A signal handler can be called at any time without warning, and it potentially can access any global state in the program.
So, suppose your program has some global flag that the signal handler sets in response to... I don't know... SIGINT. And your program checks the flag before each call to f(x):
if (! flag) {
f(x);
}
That's a data race. There is no guarantee that f(x) will not be called after the signal happens because the signal could sneak in at any time, including right after the "main" program tests the flag.
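One common way to close that window in a single-threaded program is to block the signal around the test-and-call. A SIGINT that arrives in between then stays pending and is handled only after the mask is restored, so the handler can never run between the flag test and the call. This is a sketch: f() is a hypothetical stand-in that just counts calls. sigprocmask() is only specified for single-threaded processes; threaded code would use pthread_sigmask() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t flag = 0;
static int f_calls = 0;              /* hypothetical: records calls to f() */

static void on_sigint(int sig) { (void)sig; flag = 1; }

static void f(int x) { (void)x; ++f_calls; }   /* hypothetical f(x) */

int install_flag_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Tests the flag and calls f(x) with SIGINT blocked, so the handler
   cannot run between the test and the call. Returns 1 if f was called. */
int call_f_unless_flagged(int x) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, &old);
    int called = 0;
    if (!flag) {
        f(x);
        called = 1;
    }
    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending SIGINT delivered here */
    return called;
}
```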
First it is important to understand what a race condition is. The definition given by Wikipedia is:
Race conditions arise in software when an application depends on the sequence or timing of processes or threads for it to operate properly.
The important thing to note is that a program can behave both properly and improperly based on timing or ordering of execution.
We can fairly easily create "dummy" race conditions in single threaded programs under this definition.
bool isnow(time_t then) {
time_t now = time(0);
return now == then;
}
The above function is a very dumb example and while mostly it will not work, sometimes it will give the correct answer. The correct vs. incorrect behavior depends entirely on timing and so represents a race condition on a single thread.
Taking it a step further we can write another dummy program.
void printHello(void) {
    sleep(10);
    printf("Hello\n");
}
The expected behavior of the above function is to print "Hello" after waiting 10 seconds.
If we send a SIGINT signal 11 seconds after calling our function, everything behaves as expected. If we send a SIGINT signal 3 seconds after calling our function, the program behaves improperly and does not print "Hello".
The only difference between the correct and incorrect behavior was the timing of the SIGINT signal. Thus, a race condition was introduced by signal handling.
I'm going to give a more general answer than you asked for. And this is my own, personal, pragmatic answer, not necessarily one that hews to any official, formal definition of the term "race condition".
Me, I hate race conditions. They lead to huge classes of nasty bugs that are hard to think about, hard to find, and sometimes hard to fix. So I don't like doing programming that's susceptible to race conditions. So I don't do much classically multithreaded programming.
But even though I don't do much multithreaded programming, I'm still confronted by certain classes of what feel to me like race conditions from time to time. Here are the three I try to keep in mind:
The one you mentioned: signal handlers. Receipt of a signal, and calling of a signal handler, is a truly asynchronous event. If you have a data structure of some kind, and you're in the middle of modifying it when a signal occurs, and if your signal handler also tries to modify that same data structure, you've got a race condition. If the code that was interrupted was in the middle of doing something that left the data structure in an inconsistent state, the code in the signal handler might be confused. Note, too, that it's not necessarily code right in the signal handler, but any function called by the signal handler, or called by a function that's called by the signal handler, etc.
Shared OS resources, typically in the filesystem: If your program accesses (or modifies) a file or directory in the filesystem that's also being accessed or modified by another process, you've got a big potential for race conditions. (This is not surprising, because in a computer science sense, multiple processes are multiple threads. They may have separate address spaces meaning they can't interfere with each other that way, but obviously the filesystem is a shared resource where they still can interfere with each other.)
Non-reentrant functions like strtok. If a function maintains internal, static state, you can't have a second call to that function if another instance is active. This is not a "race condition" in the formal sense at all, but it has many of the same symptoms, and also some of the same fixes: don't use static data; do try to write your functions so that they're reentrant.
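To make the strtok point concrete: nesting strtok() calls breaks because the inner call overwrites the hidden static position that the outer loop depends on. POSIX strtok_r() moves that state into a caller-supplied pointer, so each loop keeps its own. sum_pairs() and its "key=value,..." input format are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Sums the values in a string like "a=1,b=2,c=3" (modified in place).
   The outer and inner loops each keep their own save pointer, so the
   inner tokenization cannot clobber the outer one. */
int sum_pairs(char *s) {
    int sum = 0;
    char *outer_save, *inner_save;
    for (char *pair = strtok_r(s, ",", &outer_save); pair != NULL;
         pair = strtok_r(NULL, ",", &outer_save)) {
        char *key = strtok_r(pair, "=", &inner_save);
        char *val = strtok_r(NULL, "=", &inner_save);
        if (key != NULL && val != NULL)
            sum += atoi(val);
    }
    return sum;
}
```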
The author of the book in which you found this seems to be defining the term "race condition" in an unusual manner, or maybe he just used the wrong term.
By the usual definition, no, recursion does not create race conditions in single-threaded programs because the term is defined with respect to the respective actions of multiple threads of execution. It is possible, however, for a recursion to produce exposure to non-reentrancy of some of the functions involved. It's also possible for a single thread to deadlock against itself. These do not reflect race conditions, but perhaps one or both of them is what the author meant.
Alternatively, maybe what you read is the result of a bad editing job. The text you quoted groups functions that employ signal handling together with recursive functions, and signal handlers indeed can produce data races, just as multiple threads can, because execution of a signal handler has the relevant characteristics of execution of a separate thread.
Race conditions absolutely happen in single-threaded programs once you have signal handlers. Look at the Unix manual page for pselect().
One way it happens is like this: You have a signal handler that sets some global flag. You check your global flag and because it is clear you make a system call that suspends, confident that when the signal arrives the system call will exit early. But the signal arrives just after you check the global flag and just before the system call takes place. So now you're hung in a system call waiting for a signal that has already arrived. In this case, the race is between your single-threaded code and an external signal.
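The pselect() approach that answer points to can be sketched like this. It assumes SIGUSR1 as the signal and a single-threaded program, since sigprocmask() is only specified for single-threaded processes. The signal stays blocked while the flag is checked, and pselect() atomically installs the original mask for the duration of the wait. A signal arriving "just after the check" therefore stays pending and interrupts the wait instead of being lost. setup_usr1() and wait_for_usr1() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int sig) { (void)sig; got_signal = 1; }

/* Installs the handler and blocks SIGUSR1 in normal code; the previous
   mask is saved in *orig_mask for use inside pselect(). */
int setup_usr1(sigset_t *orig_mask) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1) return -1;
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &block, orig_mask);
}

/* Sleeps until SIGUSR1 has been handled, without the check-then-sleep race. */
int wait_for_usr1(const sigset_t *orig_mask) {
    while (!got_signal) {
        int r = pselect(0, NULL, NULL, NULL, NULL, orig_mask);
        if (r == -1 && errno != EINTR)
            return -1;
    }
    return 0;
}
```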
Well, consider the following code:
#include <pthread.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int num = 2;
void lock_and_call_again() {
pthread_mutex_lock(&mutex);
if(num > 0) {
--num;
lock_and_call_again();
}
}
int main(int argc, char** argv) {
lock_and_call_again();
}
(Compile with gcc -pthread thread-test.c if you save the code as thread-test.c.)
This is clearly single-threaded, isn't it?
Nevertheless, it will deadlock, because it tries to lock an already locked mutex.
That's basically what is meant within the paragraph you cited, IMHO:
It does not matter whether it is done in several threads or in a single thread: if you try to lock an already locked mutex, your program will deadlock.
If a function calls itself, like lock_and_call_again above, that is called a recursive call.
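If re-locking from the same thread is actually intended, one option is a recursive mutex. This is my substitution; the quoted text does not suggest it. The owning thread may lock a recursive mutex again, and it is released only after a matching number of unlocks. Here is a sketch of the lock_and_call_again() example above rewritten that way. rmutex and init_recursive_mutex() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static pthread_mutex_t rmutex;
static int num = 2;

/* Creates rmutex with the PTHREAD_MUTEX_RECURSIVE type. */
int init_recursive_mutex(void) {
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) return -1;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    int rc = pthread_mutex_init(&rmutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

void lock_and_call_again(void) {
    pthread_mutex_lock(&rmutex);      /* same thread may lock it again */
    if (num > 0) {
        --num;
        lock_and_call_again();
    }
    pthread_mutex_unlock(&rmutex);    /* one unlock per lock */
}
```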
Just as james large explains, a signal can occur at any time. If a signal handler is registered for this signal, it will be called at unpredictable times, and, if no measures are taken, even while the same handler is already being executed, yielding a kind of implicit recursive execution of the signal handler.
If this handler acquires some kind of lock, you end up in a deadlock, even without a function calling itself explicitly.
Consider the following function:
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void my_handler(int s) {
    pthread_mutex_lock(&mutex);
    sleep(10);
    pthread_mutex_unlock(&mutex);
}
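Now if you register this function for a particular signal, it will be called whenever the signal is caught by your program. If the handler has been called and is sleeping, it might get interrupted, the handler called again, and the handler would then try to lock the mutex that is already locked. (With sigaction(), and with signal() on many systems, the signal being handled is blocked while its handler runs unless SA_NODEFER is set, so the re-entry typically comes from another signal sharing the handler. Note also that pthread_mutex_lock() is not async-signal-safe, so calling it from a handler is unsafe in any case.)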
Regarding the wording of the citation:
"A single-threaded program of this kind may have the same routine in progress in various call frames on its process stack."
When a function gets called, some information is stored on the process's stack - e.g. the return address. This information is called a call frame. If you call a function recursively, like in the example above, this information gets stored on the stack several times - several call frames are stored.
It's stated a little clumsily, I admit...
I am working on a project in which a "student" thread will use a semaphore to wake up a "TA" thread.
I have a semaphore called studentNeedsHelp_Sem
I initialize it with sem_init(&studentNeedsHelp_Sem, 0 ,0);
Before any of my student or TA threads are even created, I include these 3 lines:
printf("DEBUG WAITING\n");
sem_wait(&studentNeedsHelp_Sem);
printf("DEBUG DONE WAITING\n");
What should happen: DEBUG WAITING is printed, then we have to wait until a student actually needs help (and calls sem_post(&studentNeedsHelp_Sem) ) to see DEBUG DONE WAITING
What is happening: both are printed before my student threads even start.
(I am working in C on OSX, using POSIX pthreads)
Thank you for your help!
Check the return value of sem_wait (as you should always do whenever calling a library function or system call). It's probably negative, indicating an error; look at errno or use perror to display the error. I wouldn't be surprised if it's EDEADLK.
Indeed, if no threads have been started, then surely there is nobody to post the semaphore. So sem_wait would never return at all. This is a deadlock; your program is waiting for something that provably can never happen. It may be that OSX's thread library detects this and has sem_wait return with an error, on the assumption that this isn't what you intended, and at least if sem_wait returns your program has a chance to recover.
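Following that advice, here is a sketch of a wrapper that reports the error instead of letting a failed sem_wait() pass silently. checked_sem_wait() is a hypothetical helper. It also retries on EINTR, since a signal handler can legitimately interrupt the wait. On macOS, where unnamed semaphores are reportedly unsupported, the earlier sem_init() call is worth checking the same way.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>

/* Waits on sem, retrying if a signal interrupts the wait.
   Returns 0 on success, or -1 (after printing the error) on failure. */
int checked_sem_wait(sem_t *sem) {
    while (sem_wait(sem) == -1) {
        if (errno == EINTR)
            continue;          /* interrupted by a signal handler: retry */
        perror("sem_wait");    /* e.g. an invalid or unsupported semaphore */
        return -1;
    }
    return 0;
}
```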
For anyone stumbling across this question: as it turns out, macOS does not support unnamed POSIX semaphores. A good alternative is named POSIX semaphores. They don't appear to differ much in their function besides initialization and destruction.
Instead of using sem_init, you can declare them with
sem_t *semaphore = sem_open("/semaphoreName", O_CREAT, 0644, 1);
You can read more about those parameters here.
You will use sem_wait and sem_post just as you would unnamed semaphores.
Instead of sem_destroy, you will use
sem_close(semaphore);
and then subsequently
sem_unlink("/semaphoreName");
These are all declared in <semaphore.h>, just as the unnamed semaphore functions are (O_CREAT comes from <fcntl.h>).
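Putting those calls together, here is a minimal sketch of the whole named-semaphore lifecycle. The function name and the semaphore name passed to it are placeholders. The initial value here is 0, mirroring the question's sem_init(&studentNeedsHelp_Sem, 0, 0), rather than the 1 used in the sem_open() call above. With an initial value of 1, the first sem_wait() would return immediately, which is the very symptom described in the question.

```c
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <stdio.h>

/* Opens (creating if needed) a named semaphore with initial value 0,
   posts it once and waits on it once, then closes and unlinks it.
   Returns 0 on success, -1 on failure. */
int named_semaphore_demo(const char *name) {
    sem_t *sem = sem_open(name, O_CREAT, 0644, 0);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return -1;
    }
    int rc = 0;
    if (sem_post(sem) == -1 || sem_wait(sem) == -1) {
        perror("sem_post/sem_wait");
        rc = -1;
    }
    sem_close(sem);     /* used instead of sem_destroy() for named semaphores */
    sem_unlink(name);   /* remove the name from the system */
    return rc;
}
```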
Hope this helps someone that is just as confused as I was.
There is a question about using exit in C++. The answer explains that it is not a good idea, mainly because of RAII: if exit is called somewhere in the code, destructors of objects with automatic storage will not be called. So if, for example, a destructor was meant to write data to a file, this will not happen.
I was wondering how this works in C. Do similar issues apply in C as well? I thought that since in C we don't use constructors/destructors, the situation might be different. So is it OK to use exit in C? For example, I have seen the following function sometimes used in C:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

void die(const char *message)
{
    if(errno) {
        perror(message);
    } else {
        printf("ERROR: %s\n", message);
    }
    exit(1);
}
Unlike abort(), the exit() function in C is considered a "graceful" exit.
From C11 (N1570) 7.22.4.4/p2 The exit function:
The exit function causes normal program termination to occur.
The Standard also says in 7.22.4.4/p4 that:
Next, all open streams with unwritten buffered data are flushed, all
open streams are closed, and all files created by the tmpfile function
are removed.
It is also worth looking at 7.21.3/p5 Files:
If the main function returns to its original caller, or if the exit
function is called, all open files are closed (hence all output
streams are flushed) before program termination. Other paths to
program termination, such as calling the abort function, need not
close all files properly.
However, as mentioned in comments below you can't assume that it will cover every other resource, so you may need to resort to atexit() and define callbacks for their release individually. In fact it is exactly what atexit() is intended to do, as it says in 7.22.4.2/p2 The atexit function:
The atexit function registers the function pointed to by func, to be
called without arguments at normal program termination.
Notably, the C standard does not say precisely what should happen to objects of allocated storage duration (i.e. malloc()), so you need to be aware of how it is done on your particular implementation. On a modern hosted operating system the system will most likely reclaim that memory, but you might still want to free it yourself in order to silence memory debuggers such as Valgrind.
Yes, it is ok to use exit in C.
To ensure all buffers are flushed and the shutdown is graceful and orderly, it is recommended to register cleanup functions with atexit().
An example code would be like this:
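#include <stdio.h>
#include <stdlib.h>

static FILE *fp;    /* resources released by cleanup() */
static char *ptr;

void cleanup(void){
    /* example of closing file pointer and free up memory */
    if (fp) fclose(fp);
    free(ptr);      /* free(NULL) is a no-op */
}

int main(int argc, char **argv){
    /* ... */
    atexit(cleanup);
    /* ... */
    return 0;
}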
Now, whenever exit is called (or main returns), the function cleanup will be executed; it can handle graceful shutdown and cleanup of buffers, memory, etc.
You don't have constructors and destructors, but you could have resources (e.g. files, streams, sockets), and it is important to close them correctly. A buffer might not have been written out yet, so exiting from the program without correctly closing the resource first could lead to corruption.
Using exit() is OK
Two major aspects of code design that have not yet been mentioned are 'threading' and 'libraries'.
In a single-threaded program, in the code you're writing to implement that program, using exit() is fine. My programs use it routinely when something has gone wrong and the code isn't going to recover.
But…
However, calling exit() is a unilateral action that can't be undone. That's why both 'threading' and 'libraries' require careful thought.
Threaded programs
If a program is multi-threaded, then using exit() is a dramatic action which terminates all the threads. It will probably be inappropriate to exit the entire program. It may be appropriate to exit the thread, reporting an error. If you're cognizant of the design of the program, then maybe that unilateral exit is permissible, but in general, it will not be acceptable.
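Here is a sketch of the "exit the thread, reporting an error" alternative. The worker reports the problem and ends only itself, handing a status back through pthread_join(). The code that joins it then decides whether the whole program should stop. The worker logic and the function names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical worker: on an unrecoverable error it reports it and ends
   only this thread, returning a nonzero status instead of calling exit(). */
static void *worker(void *arg) {
    int input = *(const int *)arg;
    if (input < 0) {
        fprintf(stderr, "worker: bad input %d\n", input);
        return (void *)(intptr_t)1;   /* equivalent to pthread_exit() */
    }
    return (void *)(intptr_t)0;
}

/* Runs one worker and returns its status (or -1 if it could not run);
   the caller, not the worker, decides whether the program exits. */
int run_worker_and_get_status(int input) {
    pthread_t t;
    void *status;
    if (pthread_create(&t, NULL, worker, &input) != 0) return -1;
    if (pthread_join(t, &status) != 0) return -1;
    return (int)(intptr_t)status;
}
```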
Library code
And that 'cognizant of the design of the program' clause applies to code in libraries, too. It is very seldom correct for a general purpose library function to call exit(). You'd be justifiably upset if one of the standard C library functions failed to return just because of an error. (Obviously, functions like exit(), _Exit(), quick_exit(), abort() are intended not to return; that's different.) The functions in the C library therefore either "can't fail" or return an error indication somehow. If you're writing code to go into a general purpose library, you need to consider the error handling strategy for your code carefully. It should fit in with the error handling strategies of the programs with which it is intended to be used, or the error handling may be made configurable.
I have a series of library functions (in a package with header "stderr.h", a name which treads on thin ice) that are intended to exit as they're used for error reporting. Those functions exit by design. There are a related series of functions in the same package that report errors and do not exit. The exiting functions are implemented in terms of the non-exiting functions, of course, but that's an internal implementation detail.
I have many other library functions, and a good many of them rely on the "stderr.h" code for error reporting. That's a design decision I made and is one that I'm OK with. But when the errors are reported with the functions that exit, it limits the general usefulness the library code. If the code calls the error reporting functions that do not exit, then the main code paths in the function have to deal with error returns sanely — detect them and relay an error indication to the calling code.
The code for my error reporting package is available in my SOQ (Stack Overflow Questions) repository on GitHub as files stderr.c and stderr.h in the src/libsoq sub-directory.
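The split between reporting functions that exit and ones that don't might look roughly like this. These are hypothetical functions written for illustration, not the actual API of the stderr.h package mentioned above. The exiting variant is built on the same non-exiting core, and the library-style function relays errors instead of exiting.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Non-exiting core: prints "error: <message>" to stderr. */
static void vreport(const char *fmt, va_list ap) {
    fputs("error: ", stderr);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
}

/* Reports the error and returns -1 so the caller can relay it. */
int report_error(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vreport(fmt, ap);
    va_end(ap);
    return -1;
}

/* Exiting variant, implemented in terms of the non-exiting core. */
_Noreturn void fatal_error(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vreport(fmt, ap);
    va_end(ap);
    exit(EXIT_FAILURE);
}

/* Library-style function: reports and returns an error, never exits. */
int parse_percentage(const char *s, int *out) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v > 100)
        return report_error("invalid percentage \"%s\"", s);
    *out = (int)v;
    return 0;
}
```

An application that cannot continue without the value could then write if (parse_percentage(arg, &pct) != 0) fatal_error("giving up"); which keeps the decision to exit in the program rather than in the library.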
One reason to avoid exit in functions other than main() is the possibility that your code might be taken out of context. Remember, exit is a type of non local control flow. Like uncatchable exceptions.
For example, you might write some storage management functions that exit on a critical disk error. Then someone decides to move them into a library. Exiting from a library will cause the calling program to exit in an inconsistent state which it may not be prepared for.
Or you might run it on an embedded system. There is nowhere to exit to; the whole thing runs in a while(1) loop in main(). exit() might not even be defined in the standard library.
Depending on what you are doing, exit may be the most logical way out of a program in C. I know it's very useful for checking to make sure chains of callbacks work correctly. Take this example callback I used recently:
unsigned char cbShowDataThenExit( unsigned char *data, unsigned short dataSz,unsigned char status)
{
printf("cbShowDataThenExit with status %X (dataSz %d)\n", status, dataSz);
printf("status:%d\n",status);
printArray(data,dataSz);
cleanUp();
exit(0);
}
In the main loop, I set everything up for this system and then wait in a while(1) loop. It is possible to make a global flag to exit the while loop instead, but this is simple and does what it needs to do. If you are dealing with any open buffers like files and devices, you should clean them up before exiting for consistency.
In a big project it is terrible when any code can exit (other than by dumping core), because a trace is very important for maintaining an online server.
So I cannot seem to find solid info on whether assert is usable in a multithreaded context.
Logically, it seems to me that if an assertion fails, the thread gets shut down but not the other threads?
Or does the entire process get killed?
So basically, my question is: is it safe to use assert in a multithreaded environment without leaking resources?
If you look at the man page of assert(), it clearly states:
The purpose of this macro is to help the programmer find bugs in his
program. The message "assertion failed in file foo.c, function
do_bar(), line 1287" is of no help at all to a user.
This means it's only useful (and should only be used) in a development environment, not in production software. IMO, in the development stage, you need not worry about leaks caused by assert(). YMMV.
Once you have finished debugging your code, you can simply switch off the assert() functionality by defining NDEBUG (#define NDEBUG before including <assert.h>, or -DNDEBUG on the compiler command line).
I'd say more than yes. If I saw multithreaded code without asserts, I wouldn't trust it. If you simplify its implementation a bit to something like:
#define assert(x) if( !(x) ) abort()
You'll see that it does nothing special for thread safety and nothing thread-specific. It's your responsibility to prevent race conditions, and if the assertion fails, the whole process is aborted.
The entire process gets killed. Assert will send the expression, source filename and line number to stderr and then call abort(). Abort() terminates the entire process.
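A small experiment shows this, assuming a POSIX system. A child process is forked, a second thread in the child fails an assert, and the parent checks how the child ended. If the assertion killed only that thread, the child's main thread would go on to _exit(0). Instead, the whole child is expected to be terminated by SIGABRT. The function name is hypothetical, and the result is only meaningful when NDEBUG is not defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void *failing_thread(void *arg) {
    (void)arg;
    assert(1 + 1 == 3);   /* always fails unless NDEBUG is defined */
    return NULL;
}

/* Returns 1 if the whole child process was killed by SIGABRT,
   0 if it exited normally, -1 on error. */
int assert_kills_whole_process(void) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {                               /* child */
        pthread_t t;
        if (pthread_create(&t, NULL, failing_thread, NULL) != 0) _exit(2);
        pthread_join(t, NULL);   /* never returns if abort() ran */
        _exit(0);                /* reached only if the assert was compiled out */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```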