I have code that looks like this:
//global variables
void signal_handler() {
//deallocation of global variables
free(foo);
close(foo_2);
exit(0);
}
int main () {
signal(SIGINT, signal_handler);
//irrelevant code
}
As you can see, I changed the CTRL+C interruption to execute the signal_handler function once instead of killing the process right away. I read somewhere that some functions, such as free, might not be async-signal-safe and would NOT execute in the signal_handler, but I'm not sure about that.
Can I execute functions like free, close, exit or even pthread_join in a signal handler?
No. Only functions listed in man 7 signal-safety are safe to call inside a signal handler.
close is listed and should be safe. free is not. To see why, you would have to look at its source code (it takes locks). exit is not safe because it can call arbitrary cleanup handlers. Use _exit instead, which exits immediately without running the cleanup.
You technically can compile a program that calls such functions in a signal handler; nothing stops you from doing that. However, it will result in undefined behavior if the function you are trying to execute is not async-signal-safe. It's not that unsafe functions would just "NOT execute" as you say; they very well could, but that would still be undefined behavior.
A list of async-signal-safe functions is documented in man 7 signal-safety. The close() function is safe, while free() and pthread_join() are not. The exit() function is also not safe to call from a signal handler; if you wish to exit from such a context, you will have to do so using _exit() instead.
The only way to safely call a function that is not async-signal-safe when receiving a signal is to "remember" that you have to call it (for example setting a global variable) and then do so after returning from the signal handler.
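The "remember it and act later" approach can be sketched as follows. This is a minimal illustration, not a drop-in implementation: run_until_interrupted, on_sigint and stop_requested are hypothetical names, and the raise(SIGINT) call only stands in for the user pressing Ctrl+C.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Written by the handler, read by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

/* The handler only records that the signal arrived. */
static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Hypothetical worker: loops until the flag is set, then cleans up in
   normal context, where free() and close() are safe to call.
   Returns the number of loop iterations, or -1 if setup failed. */
int run_until_interrupted(char *buf, int fd)
{
    struct sigaction sa;
    int iterations = 0;

    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;

    while (!stop_requested) {
        iterations++;
        if (iterations == 3)
            raise(SIGINT);   /* stand-in for the user pressing Ctrl+C */
    }

    /* Deferred cleanup, outside the signal handler. */
    free(buf);
    if (fd >= 0)
        close(fd);
    return iterations;
}
```

Since the cleanup runs after the loop rather than inside the handler, the program can then finish with a plain return from main or with exit().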
Short answer is no:
7.1.4 Use of library functions
...
4 The functions in the standard library are not guaranteed to be reentrant and may modify
objects with static or thread storage duration.188)
188) Thus, a signal handler cannot, in general, call standard library functions
C 2011 Online Draft
Real-world example of the consequences - I worked on a system that communicated with an Access database. There was a signal handler that tried to write an error message to the console with fprintf, but somehow during the signal handling process stderr got mapped to the .mdb file that stored the database, overwriting the header and ruining the database beyond repair.
There's honestly not a whole lot you can do in a signal handler other than set a flag to be checked elsewhere.
Can I execute free() or close() in a signal handler?
You definitely should not. See signal(7) and signal-safety(7)
In practice, it might work the way you want perhaps more than half of the time. IIRC, the GCC compiler does what you want to do, and it usually works.
A better approach is to use some write(2) to a pipe(7) (from inside your signal handler) and from time to time check that pipe (in your main program) with poll(2) or related things.
Or you could set some volatile sig_atomic_t flag; (perhaps it should be also _Atomic) in your signal handler, and check that flag elsewhere (in the main program, outside of signal handlers).
Qt is explaining that better than I could do in a few minutes.
On Linux, see also signalfd(2) and eventfd(2).
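The pipe-based approach mentioned above might look roughly like this. It is only a sketch under assumptions: install_self_pipe, wait_for_signal and wake_pipe are made-up names, and a real program would usually poll the pipe together with its other file descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };

/* Handler: write(2) is async-signal-safe. errno is saved and restored
   so that the interrupted code does not see it change. */
static void on_signal(int sig)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)sig;
    ssize_t n = write(wake_pipe[1], &byte, 1);
    (void)n;   /* nothing useful can be done about a failure here */
    errno = saved_errno;
}

/* Creates the pipe and installs the handler for signo. Returns 0 or -1. */
int install_self_pipe(int signo)
{
    struct sigaction sa;
    int flags;

    if (pipe(wake_pipe) == -1)
        return -1;
    /* Non-blocking write end, so the handler cannot hang on a full pipe. */
    flags = fcntl(wake_pipe[1], F_GETFL);
    if (flags == -1 || fcntl(wake_pipe[1], F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Waits up to timeout_ms for a notification from the handler.
   Returns the signal number, 0 on timeout, or -1 on error. */
int wait_for_signal(int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN };
    unsigned char byte;
    int ready;

    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);   /* poll is not auto-restarted */
    if (ready <= 0)
        return ready;
    if (read(wake_pipe[0], &byte, 1) != 1)
        return -1;
    return byte;
}
```

The main program calls install_self_pipe once and then checks wait_for_signal from its event loop; all real work happens there, outside the handler.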
I have read few books on parallel programming over the past few months and I decided to close it off with learning about the posix thread.
I am reading "PThreads programming - A Posix standard for better multiprocessing nutshell-handbook". In chapter 5 ( Pthreads and Unix ) the author talks about handling signals in multi-threaded programs. In the "Threadsafe Library Functions and System Calls" section, the author made a statement that I have not seen in most books that I have read on parallel programming. The statement was:
Race conditions can also occur in traditional, single-threaded programs that use signal handlers or that call routines recursively. A single-threaded program of this kind may have the same routine in progress in various call frames on its process stack.
I find it a little bit tedious to decipher this statement. Does the race condition in the recursive function occur when the recursive function keeps an internal structure by using the static storage type?
I would also love to know how signal handlers can cause RACE CONDITION IN SINGLE THREADED PROGRAMS
Note: I am not a computer science student, so I would really appreciate simplified terms.
I don't think one can call it a race condition in the classical meaning. Race conditions have a somewhat stochastic behavior, depending on the scheduler policy and timings.
The author is probably talking about bugs that can arise when the same object/resource is accessed from multiple recursive calls. But this behavior is completely deterministic and manageable.
Signals, on the other hand, are a different story: they occur asynchronously and can interrupt some data processing in the middle and trigger other processing on the same data, leaving it corrupted when control returns to the interrupted task.
A signal handler can be called at any time without warning, and it potentially can access any global state in the program.
So, suppose your program has some global flag, that the signal handler sets in response to,... I don't know,... SIGINT. And your program checks the flag before each call to f(x).
if (! flag) {
f(x);
}
That's a data race. There is no guarantee that f(x) will not be called after the signal happens because the signal could sneak in at any time, including right after the "main" program tests the flag.
First it is important to understand what a race condition is. The definition given by Wikipedia is:
Race conditions arise in software when an application depends on the sequence or timing of processes or threads for it to operate properly.
The important thing to note is that a program can behave both properly and improperly based on timing or ordering of execution.
We can fairly easily create "dummy" race conditions in single threaded programs under this definition.
bool isnow(time_t then) {
time_t now = time(0);
return now == then;
}
The above function is a very dumb example and while mostly it will not work, sometimes it will give the correct answer. The correct vs. incorrect behavior depends entirely on timing and so represents a race condition on a single thread.
Taking it a step further we can write another dummy program.
void printHello(void) {
sleep(10);
printf("Hello\n");
}
The expected behavior of the above program is to print "Hello" after waiting 10 seconds.
If we send a SIGINT signal 11 seconds after calling our function, everything behaves as expected. If we send a SIGINT signal 3 seconds after calling our function, the program behaves improperly and does not print "Hello".
The only difference between the correct and incorrect behavior was the timing of the SIGINT signal. Thus, a race condition was introduced by signal handling.
I'm going to give a more general answer than you asked for. And this is my own, personal, pragmatic answer, not necessarily one that hews to any official, formal definition of the term "race condition".
Me, I hate race conditions. They lead to huge classes of nasty bugs that are hard to think about, hard to find, and sometimes hard to fix. So I don't like doing programming that's susceptible to race conditions. So I don't do much classically multithreaded programming.
But even though I don't do much multithreaded programming, I'm still confronted by certain classes of what feel to me like race conditions from time to time. Here are the three I try to keep in mind:
The one you mentioned: signal handlers. Receipt of a signal, and calling of a signal handler, is a truly asynchronous event. If you have a data structure of some kind, and you're in the middle of modifying it when a signal occurs, and if your signal handler also tries to modify that same data structure, you've got a race condition. If the code that was interrupted was in the middle of doing something that left the data structure in an inconsistent state, the code in the signal handler might be confused. Note, too, that it's not necessarily code right in the signal handler, but any function called by the signal handler, or called by a function that's called by the signal handler, etc.
Shared OS resources, typically in the filesystem: If your program accesses (or modifies) a file or directory in the filesystem that's also being accessed or modified by another process, you've got a big potential for race conditions. (This is not surprising, because in a computer science sense, multiple processes are multiple threads. They may have separate address spaces meaning they can't interfere with each other that way, but obviously the filesystem is a shared resource where they still can interfere with each other.)
Non-reentrant functions like strtok. If a function maintains internal, static state, you can't have a second call to that function if another instance is active. This is not a "race condition" in the formal sense at all, but it has many of the same symptoms, and also some of the same fixes: don't use static data; do try to write your functions so that they're reentrant.
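As a small illustration of the third point, here is a sketch that nests two tokenizing loops using strtok_r, the reentrant POSIX variant of strtok. The function name count_words and the "records separated by ';', words separated by ' '" format are assumptions made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Counts the words in a string such as "a b;c d;e". Each strtok_r
   chain keeps its position in its own save pointer, so the inner loop
   does not clobber the outer one. The input string is modified. */
int count_words(char *text)
{
    int words = 0;
    char *outer_state;
    char *inner_state;

    for (char *rec = strtok_r(text, ";", &outer_state);
         rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        for (char *w = strtok_r(rec, " ", &inner_state);
             w != NULL;
             w = strtok_r(NULL, " ", &inner_state)) {
            words++;
        }
    }
    return words;
}
```

Written with plain strtok, the inner loop would overwrite the single hidden position shared by both loops, so the outer loop would typically stop after the first record.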
The author of the book in which you found this seems to be defining the term "race condition" in an unusual manner, or maybe he's just used the wrong term.
By the usual definition, no, recursion does not create race conditions in single-threaded programs because the term is defined with respect to the respective actions of multiple threads of execution. It is possible, however, for a recursion to produce exposure to non-reentrancy of some of the functions involved. It's also possible for a single thread to deadlock against itself. These do not reflect race conditions, but perhaps one or both of them is what the author meant.
Alternatively, maybe what you read is the result of a bad editing job. The text you quoted groups functions that employ signal handling together with recursive functions, and signal handlers indeed can produce data races, just as multiple threads can, because execution of a signal handler has the relevant characteristics of execution of a separate thread.
Race conditions absolutely happen in single-threaded programs once you have signal handlers. Look at the Unix manual page for pselect().
One way it happens is like this: You have a signal handler that sets some global flag. You check your global flag and because it is clear you make a system call that suspends, confident that when the signal arrives the system call will exit early. But the signal arrives just after you check the global flag and just before the system call takes place. So now you're hung in a system call waiting for a signal that has already arrived. In this case, the race is between your single-threaded code and an external signal.
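One common way to close this window is to keep the signal blocked while checking the flag and to unblock it only as part of the wait itself. The sketch below uses sigsuspend() for that; pselect() applies the same idea to waiting on file descriptors. The names wait_for_usr1, on_usr1 and got_signal are hypothetical, and the raise() call simulates the worst-case timing described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Waits for SIGUSR1 without the check-then-sleep race.
   Returns 0 once the signal has been handled, -1 on setup failure. */
int wait_for_usr1(void)
{
    struct sigaction sa;
    sigset_t block, previous, wait_mask;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    /* Block SIGUSR1 so it can only be delivered inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &previous) == -1)
        return -1;

    wait_mask = previous;
    sigdelset(&wait_mask, SIGUSR1);

    while (!got_signal) {
        /* Simulated bad timing: the signal is sent after the flag check
           and before the wait. It is blocked, so it stays pending. */
        raise(SIGUSR1);
        /* Atomically unblocks SIGUSR1 and sleeps; the handler runs here. */
        sigsuspend(&wait_mask);
    }

    sigprocmask(SIG_SETMASK, &previous, NULL);
    return 0;
}
```

Without the blocking step, a signal landing between the check and the wait would already have been handled, and the wait would sleep for a signal that never comes again.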
Well, consider the following code:
#include <pthread.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int num = 2;
void lock_and_call_again() {
pthread_mutex_lock(&mutex);
if(num > 0) {
--num;
lock_and_call_again();
}
}
int main(int argc, char** argv) {
lock_and_call_again();
}
(Compile with gcc -pthread thread-test.c if you save the code as thread-test.c)
This is clearly single-threaded, isn't it?
Nevertheless, it will deadlock, because you try to lock an already locked mutex.
That's basically what is meant within the paragraph you cited, IMHO:
It does not matter whether it is done in several threads or one single thread: if you try to lock an already locked mutex, your program will end up in a deadlock.
If a function calls itself, like lock_and_call_again above, that is called a recursive call.
Just as james large explains, a signal can occur at any time, and if a signal handler is registered for this signal, it will be called at unpredictable times. If no measures are taken, this can happen even while the same handler is already being executed, yielding a kind of implicit recursive execution of the signal handler.
If this handler acquires some kind of lock, you end up in a deadlock, even without a function calling itself explicitly.
Consider the following function:
pthread_mutex_t mutex;
void my_handler(int s) {
pthread_mutex_lock(&mutex);
sleep(10);
pthread_mutex_unlock(&mutex);
}
Now if you register this function for a particular signal, it will be called whenever the signal is caught by your program. If the handler has been called and sleeps, it might get interrupted and the handler called again, and the second invocation will try to lock the mutex that is already locked.
Regarding the wording of the citation:
"A single-threaded program of this kind may have the same routine in progress in various call frames on its process stack."
When a function gets called, some information is stored on the process's stack - e.g. the return address. This information is called a call frame. If you call a function recursively, like in the example above, this information gets stored on the stack several times - several call frames are stored.
It's stated a little clumsily, I admit...
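To observe the self-deadlock without actually hanging the program, one option is an error-checking mutex, which reports the second lock attempt instead of blocking. This is only a sketch; relock_errorcheck_mutex is a made-up name, and on older systems it needs to be compiled with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Locks an error-checking mutex twice from the same thread and
   returns the result of the second attempt (EDEADLK is expected). */
int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int second;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);
    /* With a default mutex this second call may never return. */
    second = pthread_mutex_lock(&mutex);

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

This makes the problem visible, but it does not make locking inside a signal handler safe: pthread_mutex_lock is not on the async-signal-safe list.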
I have a long-running C program which opens a file in the beginning, writes out "interesting" stuff during execution, and closes the file just before it finishes. The code, compiled with gcc -o test test.c (gcc version 5.3.1), looks as follows:
//contents of test.c
#include<stdio.h>
FILE * filept;
int main() {
filept = fopen("test.txt","w");
unsigned long i;
for (i = 0; i < 1152921504606846976; ++i) {
if (i == 0) {//This case is interesting!
fprintf(filept, "Hello world\n");
}
}
fclose(filept);
return 0;
}
The problem is that since this is a scientific computation (think of searching for primes, or whatever is your favourite hard-to-crack stuff) it could really run for a very long time. Since I determined that I am not patient enough, I would like to abort the current computation, but I would like to do this in an intelligent way by somehow forcing the program by external means to flush out all the data that is currently in the OS buffer/disk cache, wherever.
Here is what I have tried (for this bogus program above, and of course not for the real deal which is currently still running):
pressing ctrl+C; or
sending kill -6 <PID> (and also kill -3 <PID>) -- as suggested by @BartekBanachewicz,
but after either of these approaches the file test.txt created in the very beginning of the program remains empty. This means that the contents of fprintf() were left in some intermediate buffer during the computation, waiting for some OS/hardware/software flush signal, but since no such signal was received, the contents disappeared. This also means that the comment made by @EJP
Your question is based on a fallacy. 'Stuff that is in the OS
buffer/disk cache' won't be lost.
does not seem to apply here. Experience shows that stuff does indeed get lost.
I am using Ubuntu 16.04 and I am willing to attach a debugger to this process if it is possible, and if it is safe to retrieve the data this way. Since I have never done such a thing before, I would appreciate it if someone would provide a detailed answer on how to get the contents flushed to disk safely and surely. I am open to other methods as well. There is no room for error here, as I am not going to rerun the program.
Note: Sure I could have opened and closed a file inside the if branch, but that is extremely inefficient once you have many things to be written. Recompiling the program is not possible, as it is still in the middle of some computation.
Note2: the original question asked the same thing in a slightly more abstract way related to C++, and was tagged as such (that is why people in the comments are suggesting std::flush(), which wouldn't help even if this was a C++ question). Well, I guess I made a major edit then.
Somewhat related: Will data written via write() be flushed to disk if a process is killed?
Can I just add some clarity? Obviously months have passed, and I imagine your program isn't running any more ... but there's some confusion here about buffering which still isn't clear.
As soon as you use the stdio library and FILE *, you will by default have a fairly small (implementation dependent, but typically some KB) buffer inside your program which is accumulating what you write, and flushing it to the OS when it's full, (or on file close). When you kill your process, it is this buffer that gets lost.
If the data has been flushed to the OS, then it is kept in a Unix file buffer until the OS decides to persist it to disk (usually fairly soon), or someone runs the sync command. If you kill the power on your computer, then this buffer gets lost as well. You probably don't care about this scenario, because you probably aren't planning to yank the power! But this is what @EJP was talking about (re 'Stuff that is in the OS buffer/disk cache' won't be lost): your problem is the stdio cache, not the OS.
In an ideal world, you'd write your app so it called fflush (or std::flush()) at key points. In your example, you'd say:
if (i == 0) {//This case is interesting!
fprintf(filept, "Hello world\n");
fflush(filept);
}
which would cause the stdio buffer to flush to the OS. I imagine your real writer is more complex, and in that situation I would try to make the fflush happen "often but not too often". Too rare, and you lose data when you kill the process, too often and you lose the performance benefits of buffering if you are writing a lot.
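One way to get "often but not too often" is to flush after every N records. The following is a rough sketch; struct result_log and log_result are hypothetical names, and the right interval depends on how much output you can afford to lose.

```c
#include <stdio.h>

/* Hypothetical wrapper around a stream that flushes periodically. */
struct result_log {
    FILE *out;             /* destination stream */
    unsigned flush_every;  /* flush after this many lines */
    unsigned pending;      /* lines written since the last flush */
};

/* Writes one line and flushes the stdio buffer to the OS once
   flush_every lines have accumulated.
   Returns 0 on success, -1 on a write or flush error. */
int log_result(struct result_log *log, const char *line)
{
    if (fprintf(log->out, "%s\n", line) < 0)
        return -1;
    if (++log->pending >= log->flush_every) {
        log->pending = 0;
        if (fflush(log->out) == EOF)
            return -1;
    }
    return 0;
}
```

With flush_every set to 1 this behaves like the fflush-after-every-write version above; larger values trade a bounded amount of possible data loss for fewer system calls.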
In your described situation, where the program is already running and can't be stopped and rewritten, then your only hope, as you say, is to stop it in a debugger. The details of what you need to do depend on the implementation of the std lib, but you can usually look inside the FILE *filept object and start following pointers, messy though that is. @ivan_pozdeev's comment about executing std::flush() or fflush() within the debugger is helpful.
By default, the response to the signal SIGTERM is to shut down the application immediately. However, you can add your own custom signal handler to override this behaviour, like this:
#include <unistd.h>
#include <signal.h>
#include <atomic>
...
std::atomic_bool shouldStop;
...
void signalHandler(int sig)
{
//code for clean shutdown goes here: MUST be async-signal safe, such as:
shouldStop = true;
}
...
int main()
{
...
signal(SIGTERM, signalHandler); //this tells the OS to use your signal handler instead of default
signal(SIGINT, signalHandler); //can do it for other signals too
...
//main work logic, which could be of form:
while(!shouldStop) {
...
if(someTerminatingCondition) break;
...
}
//cleanup including flushing
...
}
Be aware that if you take this approach, you must make sure that your program does actually terminate after your custom handler is run (it is under no obligation to do so immediately, and can run clean-up logic as it sees fit). If it doesn't shut down, Linux will not shut it down either, so the SIGTERM will be 'ignored' from an outside perspective.
Note that by default the linux kill command sends a SIGTERM, invoking the behaviour above. If your program is running in the foreground and Ctrl-C is pressed, a SIGINT is sent instead, which is why you might want to handle that as well as per above.
Note also, the implementation suggested above takes care to be safe, in that no async logic is performed in the signal handler other than setting an atomic flag. This is important, as pointed out in the comments below. See the Async-signal safe section of this page for details of what is and isn't allowed.
There is a question about using exit in C++. The answer argues that it is not a good idea, mainly because of RAII: if exit is called somewhere in the code, destructors of objects will not be called, so if, for example, a destructor was meant to write data to a file, this will not happen.
I was interested in how this situation looks in C. Do similar issues apply there too? I thought that since C has no constructors/destructors, the situation might be different. So is it OK to use exit in C? For example, I have seen functions like the following sometimes used in C:
void die(const char *message)
{
if(errno) {
perror(message);
} else {
printf("ERROR: %s\n", message);
}
exit(1);
}
Unlike abort(), the exit() function in C is considered to be a "graceful" exit.
From C11 (N1570) 7.22.4.4/p2 The exit function (emphasis mine):
The exit function causes normal program termination to occur.
The Standard also says in 7.22.4.4/p4 that:
Next, all open streams with unwritten buffered data are flushed, all
open streams are closed, and all files created by the tmpfile function
are removed.
It is also worth looking at 7.21.3/p5 Files:
If the main function returns to its original caller, or if the exit
function is called, all open files are closed (hence all output
streams are flushed) before program termination. Other paths to
program termination, such as calling the abort function, need not
close all files properly.
However, as mentioned in comments below you can't assume that it will cover every other resource, so you may need to resort to atexit() and define callbacks for their release individually. In fact it is exactly what atexit() is intended to do, as it says in 7.22.4.2/p2 The atexit function:
The atexit function registers the function pointed to by func, to be
called without arguments at normal program termination.
Notably, the C standard does not say precisely what should happen to objects of allocated storage duration (i.e. those obtained from malloc()), so you need to be aware of how this is handled on your particular implementation. On a modern, hosted OS it is likely that the system will take care of it, but you might still want to handle this yourself in order to silence memory debuggers such as Valgrind.
Yes, it is ok to use exit in C.
To ensure that all buffers are flushed and the shutdown is graceful and orderly, it is recommended to register a cleanup function with atexit.
An example code would be like this:
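#include <stdio.h>
#include <stdlib.h>
/* resources released by cleanup(); assumed to be set elsewhere in the program */
static FILE *fp;
static void *ptr;
void cleanup(void){
/* example of closing file pointer and free up memory */
if (fp) fclose(fp);
if (ptr) free(ptr);
}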
int main(int argc, char **argv){
/* ... */
atexit(cleanup);
/* ... */
return 0;
}
Now, whenever exit is called, the function cleanup will get executed, which can house graceful shutdown, clean up of buffers, memory etc.
You don't have constructors and destructors, but you could have resources (e.g. files, streams, sockets), and it is important to close them correctly. A buffer might not be written synchronously, so exiting from the program without correctly closing the resource first could lead to corruption.
Using exit() is OK
Two major aspects of code design that have not yet been mentioned are 'threading' and 'libraries'.
In a single-threaded program, in the code you're writing to implement that program, using exit() is fine. My programs use it routinely when something has gone wrong and the code isn't going to recover.
But…
However, calling exit() is a unilateral action that can't be undone. That's why both 'threading' and 'libraries' require careful thought.
Threaded programs
If a program is multi-threaded, then using exit() is a dramatic action which terminates all the threads. It will probably be inappropriate to exit the entire program. It may be appropriate to exit the thread, reporting an error. If you're cognizant of the design of the program, then maybe that unilateral exit is permissible, but in general, it will not be acceptable.
Library code
And that 'cognizant of the design of the program' clause applies to code in libraries, too. It is very seldom correct for a general purpose library function to call exit(). You'd be justifiably upset if one of the standard C library functions failed to return just because of an error. (Obviously, functions like exit(), _Exit(), quick_exit(), abort() are intended not to return; that's different.) The functions in the C library therefore either "can't fail" or return an error indication somehow. If you're writing code to go into a general purpose library, you need to consider the error handling strategy for your code carefully. It should fit in with the error handling strategies of the programs with which it is intended to be used, or the error handling may be made configurable.
I have a series of library functions (in a package with header "stderr.h", a name which treads on thin ice) that are intended to exit as they're used for error reporting. Those functions exit by design. There are a related series of functions in the same package that report errors and do not exit. The exiting functions are implemented in terms of the non-exiting functions, of course, but that's an internal implementation detail.
I have many other library functions, and a good many of them rely on the "stderr.h" code for error reporting. That's a design decision I made and is one that I'm OK with. But when the errors are reported with the functions that exit, it limits the general usefulness of the library code. If the code calls the error reporting functions that do not exit, then the main code paths in the function have to deal with error returns sanely — detect them and relay an error indication to the calling code.
The code for my error reporting package is available in my SOQ (Stack Overflow Questions) repository on GitHub as files stderr.c and stderr.h in the src/libsoq sub-directory.
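The split between reporting functions that return and reporting functions that exit could be sketched like this. The names report_error and report_error_and_exit are invented for illustration and are not the API of the package mentioned above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Shared formatting step: prints the message and a newline.
   Returns 0 on success, -1 if writing failed. */
static int vreport(FILE *stream, const char *fmt, va_list args)
{
    if (vfprintf(stream, fmt, args) < 0)
        return -1;
    return fputc('\n', stream) == EOF ? -1 : 0;
}

/* Non-exiting variant: reports and returns, so the caller decides
   how to recover. */
int report_error(FILE *stream, const char *fmt, ...)
{
    va_list args;
    int rc;

    va_start(args, fmt);
    rc = vreport(stream, fmt, args);
    va_end(args);
    return rc;
}

/* Exiting variant, implemented on top of the same formatting step.
   Suitable for a program's own code rather than a general library. */
void report_error_and_exit(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vreport(stderr, fmt, args);
    va_end(args);
    exit(EXIT_FAILURE);
}
```

Library code would call only report_error and pass an error indication back to its caller, leaving the decision to terminate to the program that owns main().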
One reason to avoid exit in functions other than main() is the possibility that your code might be taken out of context. Remember, exit is a type of non local control flow. Like uncatchable exceptions.
For example, you might write some storage management functions that exit on a critical disk error. Then someone decides to move them into a library. Exiting from a library will cause the calling program to exit in an inconsistent state which it may not be prepared for.
Or you might run it on an embedded system. There is nowhere to exit to, the whole thing runs in a while(1) loop in main(). It might not even be defined in the standard library.
Depending on what you are doing, exit may be the most logical way out of a program in C. I know it's very useful for checking to make sure chains of callbacks work correctly. Take this example callback I used recently:
unsigned char cbShowDataThenExit( unsigned char *data, unsigned short dataSz,unsigned char status)
{
printf("cbShowDataThenExit with status %X (dataSz %d)\n", status, dataSz);
printf("status:%d\n",status);
printArray(data,dataSz);
cleanUp();
exit(0);
}
In the main loop, I set everything up for this system and then wait in a while(1) loop. It is possible to make a global flag to exit the while loop instead, but this is simple and does what it needs to do. If you are dealing with any open buffers like files and devices you should clean them up before close for consistency.
In a big project it is terrible when arbitrary code can exit without leaving a core dump. A trace is very important for maintaining an online server.
Obviously, it's good practice. That goes without saying. I see it every time in example code (like socket(), fork(), or malloc(), to name a few). I know to do it, I just don't understand the why of it so much. Are they prone to failing often? Is it because system calls are made in kernel mode? What's the reasoning behind it?
I presume you are asking why code that calls these routines checks the results to determine whether an error occurred.
Each of the routines you cite, socket, fork, and malloc, requires resources. Those resources may be unavailable either because the calling process has exceeded limits set by the system administrator or the user or because the system has exhausted the resources it has and cannot provide any more to processes. Therefore, it is possible, even if not frequent, that a call to one of these routines will return failure. So a calling process should check for failure.
Additionally, in some implementations, some system routines (such as read and write) can be interrupted if a signal is delivered to the process before the operation completed. (When a signal arrives, it is considered important, and it is desirable to deliver it to the process immediately rather than wait for a potentially long operation to complete. So the operation is interrupted, the signal is delivered, the process may handle the signal and return from the signal handler. Then control is returned to the code that called the original routine, and that code must be informed that the operation was interrupted.) This interruption results in returning failure with an error status indicating the operation was interrupted.
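For the interruption case, checking the result usually means retrying when the error is EINTR and failing on anything else. A minimal sketch, where write_all is a hypothetical helper name:

```c
#include <errno.h>
#include <unistd.h>

/* Writes all len bytes to fd. Retries when write() is interrupted by a
   signal (EINTR) and continues after a short write.
   Returns 0 on success, -1 on any other error (errno is set). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;      /* a real failure: report it to the caller */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller that ignored the return value of write() would silently lose data in both the interrupted case and the short-write case.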
Always, if only..
Way back when a C function could only return an integer, and exceptions were science fiction, they came up with the idea of returning either success or a code that provided a clue as to what had gone wrong. It became a convention.
Depends on what you call a failure.
Something like opening a file (given the developer can be bothered) is relatively easy to deal with: file not found, for instance. With malloc, it is a bit more difficult to take some remedial action.
The key point though is to check as near to the error as possible. If you don't, you find that the file you wanted to open and append to didn't exist 10,000 lines of code later, when you try and write the results of your extensive computation to it and get say an access violation.
Basically this stuff is the reason exceptions were invented. Checking the return value is "optional", swallowing an exception is explicit.
example:
FILE *fp;
fp = fopen("c:\\removedDirectory\\nonexistingFile.txt", "r"); //returns NULL
if(fp != NULL)
{
//safe to use fp here; without this check, stream functions would fail when fp == NULL
}
If you do not check the return value of fopen (or of any function that can report an error) and fp is NULL, the subsequent functions that depend on a valid file stream will not work.
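Checking right at the call site also lets you report why the call failed. A small sketch, assuming a POSIX-like system where fopen sets errno on failure; open_for_reading is a made-up helper name:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Opens path for reading and reports the reason on failure.
   Returns the stream, or NULL if the file could not be opened. */
FILE *open_for_reading(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
    return fp;
}
```

The caller still has to test the result for NULL, but the error is reported next to its cause instead of surfacing thousands of lines later.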