How to join a thread that is hanging on blocking IO? - c

I have a thread running in the background that is reading events from an input device in a blocking fashion, now when I exit the application I want to clean up the thread properly, but I can't just run a pthread_join() because the thread would never exit due to the blocking IO.
How do I properly solve that situation? Should I send a pthread_kill(thread, SIGIO) or a pthread_kill(thread, SIGALRM) to break the block? Is either of those even the right signal? Or is there another way to solve this situation and let that child thread exit the blocking read?
Currently a bit puzzled since none of my googling turned up a solution.
This is on Linux and using pthreads.
Edit: I played around a bit with SIGIO and SIGALRM, when I don't install a signal handler they break the blocking IO up, but give a message on the console ("I/O possible") but when I install a signal handler, to avoid that message, they no longer break the blocking IO, so the thread doesn't terminate. So I am kind of back to step one.

The canonical way to do this is with pthread_cancel, where the thread has done pthread_cleanup_push/pop to provide cleanup for any resources it is using.
Unfortunately this can NOT be used in C++ code, ever. Any C++ std lib code, or ANY try {} catch() on the calling stack at the time of pthread_cancel will potentially segfault, killing your whole process.
The only workaround is to install a handler for SIGUSR1 that sets a stop flag, send the signal with pthread_kill(thread, SIGUSR1), and then, anywhere the thread is blocked on I/O, check the stop flag whenever a call fails with EINTR before retrying the I/O. In practice, this does not always succeed on Linux; I don't know why.
But in any case it's useless to talk about if you have to call any 3rd party lib, because they will most likely have a tight loop that simply restarts I/O on EINTR. Reverse engineering their file descriptor to close it won't cut it either—they could be waiting on a semaphore or other resource. In this case, it is simply impossible to write working code, period. Yes, this is utterly brain-damaged. Talk to the guys who designed C++ exceptions and pthread_cancel. Supposedly this may be fixed in some future version of C++. Good luck with that.
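A minimal sketch of the stop-flag workaround for code you control. All names here (stop_requested, reader_done, reader_thread, stop_reader) are hypothetical. Two assumptions are worth stating: the handler is installed with sigaction() and without SA_RESTART, because glibc's signal() enables restarting and the read() would then silently resume (which may be what the question's edit ran into); and the signal is re-sent until the thread confirms it has left its loop, because a single signal that lands between the flag check and the read() call is lost, which is one plausible reason the approach "does not always succeed".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_int stop_requested;   /* set by the main thread */
static atomic_int reader_done;      /* set by the reader just before it exits */

/* Empty handler: it exists only so the signal interrupts the system call. */
static void on_wakeup(int sig) { (void)sig; }

/* Install the handler without SA_RESTART so read() fails with EINTR. */
static int install_wakeup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wakeup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical reader thread; arg points to the descriptor to read from. */
static void *reader_thread(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];
    while (!atomic_load(&stop_requested)) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            continue;                 /* real code would consume buf here */
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted: the loop re-checks the flag */
        break;                        /* EOF or a real error */
    }
    atomic_store(&reader_done, 1);
    return NULL;
}

/* Set the flag, then keep signalling until the reader has left its loop. */
static int stop_reader(pthread_t tid)
{
    const struct timespec pause_10ms = { 0, 10 * 1000 * 1000 };
    atomic_store(&stop_requested, 1);
    while (!atomic_load(&reader_done)) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&pause_10ms, NULL);
    }
    return pthread_join(tid, NULL);
}
```

Call install_wakeup_handler() once before creating the thread, and stop_reader(tid) at shutdown. This does nothing for third-party code that retries on EINTR without looking at your flag.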

I too would recommend using a select or some other non-signal-based means of terminating your thread. One of the reasons we have threads is to try and get away from signal madness. That said...
Generally one uses pthread_kill() with SIGUSR1 or SIGUSR2 to send a signal to the thread. The other suggested signals--SIGTERM, SIGINT, SIGKILL--have process-wide semantics that you may not be interested in.
As for the behavior when you sent the signal, my guess is that it has to do with how you handled the signal. If you have no handler installed, the default action of that signal is applied, but in the context of the thread that received the signal. So SIGALRM, for instance, would be "handled" by your thread, but the handling would consist of terminating the process--probably not the desired behavior.
Receipt of a signal by the thread will generally break it out of a read with EINTR, unless it is truly in that uninterruptible state as mentioned in an earlier answer. But I think it's not, or your experiments with SIGALRM and SIGIO would not have terminated the process.
Is your read perhaps in some sort of a loop? If the read terminates with -1 return, then break out of that loop and exit the thread.
You can play with this very sloppy code I put together to test out my assumptions--I am a couple of timezones away from my POSIX books at the moment...
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t global_gotsig = 0;

/* The handler only counts deliveries; a signal handler returns void. */
static void gotsig(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig; (void)info; (void)ucontext;
    global_gotsig++;
}

static void *reader(void *arg)
{
    char buf[32];
    ssize_t n;
    int hdlsig = (int)(intptr_t)arg;
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = gotsig;
    sa.sa_flags = SA_SIGINFO;   /* no SA_RESTART, so read() fails with EINTR */
    sigemptyset(&sa.sa_mask);
    if (sigaction(hdlsig, &sa, NULL) < 0) {
        perror("sigaction");
        return (void *)(intptr_t)-1;
    }
    n = read(STDIN_FILENO, buf, sizeof buf);
    if (n < 0) {
        perror("read");
    } else {
        printf("Read %zd bytes\n", n);
    }
    return (void *)(intptr_t)n;
}

int main(int argc, char **argv)
{
    pthread_t tid1;
    void *ret;
    int err;
    int sig = SIGUSR1;

    if (argc == 2) sig = atoi(argv[1]);
    printf("Using sig %d\n", sig);
    /* pthread functions return an error number instead of setting errno */
    err = pthread_create(&tid1, NULL, reader, (void *)(intptr_t)sig);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        exit(1);
    }
    sleep(5);
    printf("killing thread\n");
    pthread_kill(tid1, sig);
    err = pthread_join(tid1, &ret);
    if (err != 0)
        fprintf(stderr, "pthread_join: %s\n", strerror(err));
    else
        printf("thread returned %ld\n", (long)(intptr_t)ret);
    printf("Got sig? %d\n", (int)global_gotsig);
    return 0;
}

Your select() could have a timeout, even if it is infrequent, in order to exit the thread gracefully on a certain condition. I know, polling sucks...
Another alternative is to have a pipe for each child and add that to the list of file descriptors being watched by the thread. Send a byte to the pipe from the parent when you want that child to exit. No polling at the cost of a pipe per thread.
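A rough sketch of the pipe-per-thread idea, using poll() rather than select() (either works). The struct worker type and the worker_* functions are made-up names for illustration, and the data handling is reduced to a read-and-discard placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

struct worker {
    int data_fd;   /* descriptor the thread really wants to read */
    int wake_rd;   /* read end of the wake-up pipe, watched by the thread */
    int wake_wr;   /* write end, used by the parent to request an exit */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    struct pollfd pfd[2] = {
        { .fd = w->data_fd, .events = POLLIN },
        { .fd = w->wake_rd, .events = POLLIN },
    };
    char buf[256];

    for (;;) {
        if (poll(pfd, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (pfd[1].revents)            /* parent wrote a byte or closed the pipe */
            break;
        if (pfd[0].revents & POLLIN) {
            if (read(w->data_fd, buf, sizeof buf) <= 0)
                break;                 /* EOF or error on the data descriptor */
            /* real code would process buf here */
        } else if (pfd[0].revents) {
            break;                     /* POLLHUP or POLLERR */
        }
    }
    return NULL;
}

/* Create the wake-up pipe and start the thread. Returns 0 on success. */
static int worker_start(struct worker *w, pthread_t *tid, int data_fd)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    w->data_fd = data_fd;
    w->wake_rd = p[0];
    w->wake_wr = p[1];
    return pthread_create(tid, NULL, worker_main, w);
}

/* Send one byte down the pipe, then join; no signals and no polling interval. */
static int worker_stop(struct worker *w, pthread_t tid)
{
    char byte = 'x';
    if (write(w->wake_wr, &byte, 1) != 1)
        return -1;
    int rc = pthread_join(tid, NULL);
    close(w->wake_rd);
    close(w->wake_wr);
    return rc;
}
```

The thread never blocks in read() itself; it blocks in poll(), which the parent can always wake through the pipe.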

Old question which could very well get a new answer as things have evolved and a new technology is now available to better handle signals in threads.
Since Linux kernel 2.6.22, the system offers a new function called signalfd() which can be used to open a file descriptor for a given set of Unix signals (outside of those that outright kill a process.)
// defined a set of signals
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGUSR1);
// ... you can add more than one ...
// prevent the default signal behavior (very important)
sigprocmask(SIG_BLOCK, &set, nullptr);
// open a file descriptor using that set of Unix signals
f_socket = signalfd(-1, &set, SFD_NONBLOCK | SFD_CLOEXEC);
Now you can use the poll() or select() functions to listen to the signal along the more usual file descriptor (socket, file on disk, etc.) you were listening on.
The NONBLOCK is important if you want a loop that can check signals and other file descriptors over and over again (i.e. it is also important on your other file descriptor).
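As a plain C sketch of the same idea, the fragment below blocks SIGUSR1, opens a signalfd for it, and waits on that descriptor together with an ordinary one using poll(). The function names open_stop_fd and wait_io_or_stop are hypothetical, and in a multi-threaded program you would normally use pthread_sigmask() and block the signal before creating any threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <stdint.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block SIGUSR1 and return a signalfd for it, or -1 on error. */
static int open_stop_fd(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    return signalfd(-1, &set, SFD_NONBLOCK | SFD_CLOEXEC);
}

/* Wait up to timeout_ms for input on io_fd or for SIGUSR1 on sig_fd.
 * Returns 1 if the signal arrived, 0 if io_fd is readable,
 * and -1 on timeout or error. */
static int wait_io_or_stop(int io_fd, int sig_fd, int timeout_ms)
{
    struct pollfd pfd[2] = {
        { .fd = io_fd,  .events = POLLIN },
        { .fd = sig_fd, .events = POLLIN },
    };
    if (poll(pfd, 2, timeout_ms) <= 0)
        return -1;
    if (pfd[1].revents & POLLIN) {
        struct signalfd_siginfo si;
        /* Reading consumes the pending signal; one record per signal. */
        if (read(sig_fd, &si, sizeof si) == (ssize_t)sizeof si &&
            si.ssi_signo == (uint32_t)SIGUSR1)
            return 1;
    }
    return (pfd[0].revents & POLLIN) ? 0 : -1;
}
```

A thread that uses wait_io_or_stop() in its loop can be asked to exit with a plain pthread_kill(tid, SIGUSR1), and no signal handler is involved at all.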
I have such an implementation that works with (1) timers, (2) sockets, (3) pipes, (4) Unix signals, (5) regular files. Actually, really any file descriptor plus timers.
https://github.com/m2osw/snapcpp/blob/master/snapwebsites/libsnapwebsites/src/snapwebsites/snap_communicator.cpp
https://github.com/m2osw/snapcpp/blob/master/snapwebsites/libsnapwebsites/src/snapwebsites/snap_communicator.h
You may also be interested in libraries such as libevent.

Depends how it's waiting for IO.
If the thread is in the "Uninterruptible IO" state (shown as "D" in top), then there really is absolutely nothing you can do about it. Threads normally only enter this state briefly, doing something such as waiting for a page to be swapped in (or demand-loaded, e.g. from mmap'd file or shared library etc), however a failure (particularly of a NFS server) could cause it to stay in that state for longer.
There is genuinely no way of escaping from this "D" state. The thread will not respond to signals (you can send them, but they will be queued).
If it's a normal IO function such as read(), write() or a waiting function like select() or poll(), signals would be delivered normally.

One solution that occurred to me the last time I had an issue like this was to create a file (eg. a pipe) that existed only for the purpose of waking up blocking threads.
The idea would be to create a file from the main loop (or 1 per thread, as timeout suggests - this would give you finer control over which threads are woken). All of the threads that are blocking on file I/O would do a select(), using the file(s) that they are trying to operate on, as well as the file created by the main loop (as a member of the read file descriptor set). This should make all of the select() calls return.
Code to handle this "event" from the main loop would need to be added to each of the threads.
If the main loop needed to wake up all of the threads it could either write to the file or close it.
I can't say for sure if this works, as a restructure meant that the need to try it vanished.

I think, as you said, the only way would be to send a signal then catch and deal with it appropriately. Alternatives might be SIGTERM, SIGUSR1, SIGQUIT, SIGHUP, SIGINT, etc.
You could also use select() on your input descriptor so that you only read when it is ready. You could use select() with a timeout of, say, one second and then check if that thread should finish.

I always add a "kill" function related to the thread function which I run before join that ensures the thread will be joinable within reasonable time. When a thread uses blocking IO I try to utilize the system to break the lock. For example, when using a socket I would have kill call shutdown(2) or close(2) on it which would cause the network stack to terminate it cleanly.
Linux' socket implementation is thread safe.
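Such a "kill" function might look like the sketch below. struct sock_reader and both function names are invented for the example. It uses shutdown() rather than close() to wake the thread, on the assumption (which matches Linux behaviour as far as I know) that shutdown() reliably makes a blocked recv() return 0, while close() from another thread may not wake it and also invites descriptor-reuse races.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct sock_reader {
    int fd;               /* connected socket the thread reads from */
    pthread_t tid;
    ssize_t last_result;  /* last return value of recv(), for inspection */
};

/* Thread function: blocks in recv() until EOF, shutdown, or an error. */
static void *sock_reader_main(void *arg)
{
    struct sock_reader *r = arg;
    char buf[128];
    do {
        r->last_result = recv(r->fd, buf, sizeof buf, 0);
        /* real code would process buf here */
    } while (r->last_result > 0);
    return NULL;
}

/* The "kill" function: make the thread joinable, join it, then clean up. */
static int sock_reader_kill(struct sock_reader *r)
{
    shutdown(r->fd, SHUT_RDWR);   /* wakes the blocked recv() */
    int rc = pthread_join(r->tid, NULL);
    close(r->fd);                 /* only close once no thread uses the fd */
    return rc;
}
```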

I'm surprised that nobody has suggested pthread_cancel. I recently wrote a multi-threaded I/O program and calling cancel() and the join() afterwards worked just great.
I had originally tried the pthread_kill() but ended up just terminating the entire program with the signals I tested with.

If you're blocking in a third-party library that loops on EINTR, you might want to consider a combination of using pthread_kill with a signal (USR1 etc) calling an empty function (not SIG_IGN) with actually closing/replacing the file descriptor in question. By using dup2 to replace the fd with /dev/null or similar, you'll cause the third-party library to get an end-of-file result when it retries the read.
Note that by dup()ing the original socket first, you can avoid needing to actually close the socket.
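The sequence could be sketched as follows. library_read_loop stands in for third-party code that retries on EINTR, and evict_reader is a hypothetical helper; neither comes from a real library. The order matters: the descriptor is redirected first and the signal is sent second, so whichever way the timing falls, the next read() the library issues sees /dev/null and returns end-of-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler (not SIG_IGN), so that the signal interrupts read(). */
static void noop_handler(int sig) { (void)sig; }

/* Stand-in for third-party code: retries on EINTR, returns only at EOF
 * or on a real error. */
static void *library_read_loop(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;
        if (n < 0 && errno != EINTR)
            break;
    }
    return NULL;
}

/* Redirect fd to /dev/null, then interrupt the thread so that it retries.
 * Returns a dup of the original descriptor (owned by the caller), or -1. */
static int evict_reader(pthread_t tid, int fd)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    int saved = dup(fd);                    /* keeps the original open */
    int devnull = open("/dev/null", O_RDONLY);
    if (saved < 0 || devnull < 0 || dup2(devnull, fd) < 0) {
        if (saved >= 0) close(saved);
        if (devnull >= 0) close(devnull);
        return -1;
    }
    close(devnull);
    if (pthread_kill(tid, SIGUSR1) != 0) {
        close(saved);
        return -1;
    }
    return saved;
}
```

After evict_reader() returns, pthread_join() on the library's thread should complete, and the original socket or pipe is still reachable through the returned descriptor.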

Signals and threads are a subtle combination on Linux, according to the various man pages.
Do you use LinuxThreads, or NPTL (if you are on Linux)?
I am not sure of this, but I think the signal handler affects the whole process, so either you terminate your whole process or everything continues.
You should use timed select or poll, and set a global flag to terminate your thread.

I think the cleanest approach would have the thread using a condition variable in a loop for continuing.
When an I/O event is fired, the condition should be signaled.
The main thread could just signal the condition while changing the loop predicate to false.
Something like:
pthread_mutex_lock(&mutex);
while (!_finished)
{
    /* pthread_cond_wait needs the mutex that protects the predicate */
    pthread_cond_wait(&cond, &mutex);
    handleio();
}
pthread_mutex_unlock(&mutex);
cleanup();
Remember to handle wakeups properly with condition variables. They can have things such as 'spurious wakeups', so I would wrap your own function around the cond_wait function that re-checks the predicate.

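struct pollfd pfd;
struct timespec deadline;
pfd.fd = socket;
pfd.events = POLLIN | POLLHUP | POLLERR;
pthread_mutex_lock(&lock);
while (thread_alive)
{
    int ret = poll(&pfd, 1, 100);   /* wait at most 100 ms for I/O */
    if (ret == 1)
    {
        /* handle IO */
    }
    else
    {
        /* pthread_cond_timedwait takes (cond, mutex, absolute deadline) */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100 * 1000 * 1000;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }
        pthread_cond_timedwait(&cond, &lock, &deadline);
    }
}
pthread_mutex_unlock(&lock);
thread_alive is a thread-specific variable that can be used in combination with the signal to kill the thread.
As for the "handle IO" section, you need to make sure that you opened the descriptor with the O_NONBLOCK option, or, if it is a socket, that you pass the similar MSG_DONTWAIT flag to recv(). For other fds I'm not sure.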

Related

SigHandler causing program to not terminate

Currently I am trying to create a signal handler that, when it receives a SIGTERM signal, it closes open network sockets and file descriptors.
Here is my SigHandler function
static void SigHandler(int signo){
if(signo == SIGTERM){
log_trace("SIGTERM received - handling signal");
CloseSockets();
log_trace("SIGTERM received - All sockets closed");
if (closeFile() == -1)
log_trace("SIGTERM received - No File associated with XXX open - continuing with shutdown");
else
log_trace("SIGTERM received - Closed File Descriptor for XXX - continuing with shutdown");
log_trace("Gracefully shutting down XXX Service");
} else {
log_trace("%d received - incompatible signal", signo);
return;
}
exit(0);
}
This code below sits in main
if (sigemptyset(&set) == SIGEMPTYSET_ERROR){
log_error("Signal handling initialization failed");
}
else {
if(sigaddset(&set, SIGTERM) == SIGADDSET_ERROR) {
log_error("Signal SIGTERM not valid");
}
action.sa_flags = 0;
action.sa_mask = set;
action.sa_handler = &SigHandler;
if (sigaction(SIGTERM, &action, NULL) == SIGACTION_ERROR) {
log_error("SIGTERM handler initialization error");
}
}
When I send kill -15 PID, nothing happens. The process doesn't terminate, nor does it become a zombie process (not that it should anyway). I do see the traces printing within the SigHandler function however, so I know it is reaching that point in the code. It just seems that when it comes to exit(0), that doesn't work.
When I send SIGKILL (kill -9 PID) it kills the process just fine.
Apologies if this is vague, I'm still quite new to C and UNIX etc so I'm quite unfamiliar with most of how this works at a low level.
Your signal handler routine is conceptually wrong: it calls functions that are not async-signal-safe. Read signal(7) and signal-safety(7) carefully to understand why. Your handler may appear to work most of the time and still be undefined behavior.
The usual trick is to set (in your signal handler) some volatile sig_atomic_t variable and test that variable outside of the signal handler.
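A small sketch of that flag pattern for SIGTERM. The names term_requested, close_everything and service_loop are illustrative; close_everything stands in for the question's CloseSockets()/closeFile() calls, and the nanosleep() is only a placeholder for one bounded unit of real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t term_requested = 0;
static int cleanups_run = 0;

/* The handler does nothing except record that the signal arrived. */
static void on_sigterm(int signo)
{
    (void)signo;
    term_requested = 1;
}

static int install_sigterm_flag(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Placeholder for closing sockets and files, logging, and so on. */
static void close_everything(void)
{
    cleanups_run++;
}

static void service_loop(void)
{
    const struct timespec tick = { 0, 100 * 1000 * 1000 };
    while (!term_requested)
        nanosleep(&tick, NULL);   /* stand-in for one bounded unit of work */
    /* We are back in normal context here, so logging, close() and exit()
     * are all allowed. */
    close_everything();
}
```

Because each unit of work is bounded, the loop notices the flag within one iteration even if the signal arrives at an awkward moment.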
Another possible trick is the pipe(7) to self trick (the Qt documentation explains it well), with your signal handler just doing a write(2) (which is async-signal-safe) to some global file descriptor obtained by e.g. pipe(2) (or perhaps the Linux specific eventfd(2)...) at program initialization before installing that signal handler.
A Linux specific way is to use signalfd(2) for SIGTERM and handle that in your own event loop (based upon poll(2)). That trick is conceptually a variant of the pipe to self one. But signalfd has some shortcomings, that a web search will find you easily.
Signals are conceptually hard to use (some view them as a design mistake in Unix), especially in multi-threaded programs.
You might want to read the old ALP book. It has some good explanations related to your issue.
PS. If your system is QNX you should read its documentation.
You should be using _exit from the signal handler instead; this also closes all the files.
Also read (very carefully) Basile's answer and take a long hard look at the list of async safe functions which you are allowed to use in signal handlers.
His advice about just changing a flag and testing it in your code is the best way if you need to do something you aren't allowed to do in the signal handler. Note that all blocking POSIX calls can be interrupted by signals, so testing your atomic variable when a blocking call (say, read) fails is a sure way to know if you have received a signal.

Using a sig_atomic_t flag together with blocking calls

Say I have a flag to indicate an exit condition that I wish to enable with a signal. Then I can attach the following handler to SIGUSR1 for instance.
volatile sig_atomic_t finished = 0;
void catch_signal(int sig)
{
finished = 1;
}
I then use the flag to determine when a particular loop should end. In this particular case I have a thread running (but I believe my problem applies without threads also, so don't focus on that part).
void *thread_routine(void *arg)
{
while (!finished) {
/* What if the signal happens here? */
if ((clientfd = accept(sockfd, &remote_addr, &addr_size)) == -1) {
if (errno == EINTR)
continue;
/* Error handling */
}
handle_client(clientfd);
}
}
This loop is supposed to continue to run until I raise my SIGUSR1 signal. When it receives the signal I want it to stop gracefully as soon as possible. Since I have a blocking accept call I don't have the loop spinning around wasting CPU cycles, which is good, and the signal can at any moment interrupt the blocking accept and cause the loop to terminate.
The problem is, as shown in the comment in the code, that the signal could be delivered right after the while condition but before the accept call. Then the signal handler will set finished to true, but after the execution resumes, accept will be called and block indefinitely. How can I avoid this condition and make sure that I always will be able to terminate the loop with my signal?
Assuming I still want to use a signal to control this, I can think of two possible solutions. The first one is to turn on some alarm that re-raises a signal after a while if the signal was missed the first time. The second one is to put a timeout on the socket so that accept returns after some amount of time so that the flag can be examined again. But these solutions are more like workarounds (especially since I change the blocking behaviour of accept in my second solution) and if there is some cleaner and more straightforward solution I'd like to use that instead.
The Self-Pipe Trick can be used in such cases.
You open a pipe and use select to wait both on the pipefd and sockfd. The handler writes a char to the pipe. After the select, checking fd set helps you determine if you can go for accept or not.
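A sketch of the self-pipe trick, under the assumption that one signal is used purely as a "stop" request. self_pipe, self_pipe_init and wait_for_client_or_signal are hypothetical names. Both ends of the pipe are made non-blocking so the handler can never block in write().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };

/* write() is async-signal-safe; errno is saved because the handler may
 * interrupt code that is about to inspect it. */
static void on_signal(int sig)
{
    int saved_errno = errno;
    char byte = 1;
    (void)sig;
    if (write(self_pipe[1], &byte, 1) < 0) {
        /* pipe full: a wake-up is already pending, nothing to do */
    }
    errno = saved_errno;
}

/* Create the pipe and install the handler for signo. Returns 0 on success. */
static int self_pipe_init(int signo)
{
    struct sigaction sa;
    if (pipe(self_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(self_pipe[i], F_GETFL);
        if (fl < 0 || fcntl(self_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 when the signal fired, 0 when listen_fd is ready for accept(),
 * and -1 on error. */
static int wait_for_client_or_signal(int listen_fd)
{
    for (;;) {
        fd_set rfds;
        int maxfd = listen_fd > self_pipe[0] ? listen_fd : self_pipe[0];
        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        FD_SET(self_pipe[0], &rfds);
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;   /* the byte is already in the pipe; loop again */
            return -1;
        }
        if (FD_ISSET(self_pipe[0], &rfds))
            return 1;
        if (FD_ISSET(listen_fd, &rfds))
            return 0;
    }
}
```

It does not matter whether the signal arrives before or during select(): the byte stays in the pipe until someone reads it, so the race from the question cannot lose the wake-up.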
I realize this question is over a year old, now, but pselect() was designed exactly for this type of situation. You can provide pselect() (and select() generally) with file descriptors of listening sockets, and those functions will return when there is an accept()able connection available.
The general approach is to block all relevant signals, and then call pselect() with a signal mask that unblocks them. pselect() will atomically:
1. Unblock the signal(s)
2. Wait until a descriptor is ready (for a listening socket, until accept() would not block) or a signal is caught
3. Block the signal(s) again before it returns
so you can essentially guarantee that the only time that signal will actually be delivered and handled is when pselect() is running, and you don't have to worry about it being caught after you check finished but before you start waiting. In other words, you make sure that whenever that signal is delivered, it'll always interrupt pselect(), which then fails with errno set to EINTR, so that's the only place you have to check for it.
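The pselect() approach might be sketched like this, reusing the question's finished flag and catch_signal handler. setup_sigusr1 and wait_for_connection are hypothetical helpers, and the sketch assumes SIGUSR1 was not already blocked when setup_sigusr1() is called (in a threaded program, pthread_sigmask() would be the usual choice instead of sigprocmask()).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

static volatile sig_atomic_t finished = 0;

static void catch_signal(int sig)
{
    (void)sig;
    finished = 1;
}

/* Install the handler and block SIGUSR1 during normal execution.
 * *orig_mask receives the previous mask, in which SIGUSR1 is unblocked. */
static int setup_sigusr1(sigset_t *orig_mask)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = catch_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &block, orig_mask);
}

/* Returns 1 when sockfd is readable (accept() will not block),
 * 0 when the signal asked us to finish, and -1 on error. */
static int wait_for_connection(int sockfd, const sigset_t *orig_mask)
{
    while (!finished) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sockfd, &rfds);
        /* The signal can only be delivered inside this call. */
        int n = pselect(sockfd + 1, &rfds, NULL, NULL, NULL, orig_mask);
        if (n > 0)
            return 1;
        if (n < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

A signal sent while the thread is outside pselect() simply stays pending and interrupts the next call, so the window between testing finished and waiting is closed.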

Is there a version of the wait() system call that sets a timeout?

Is there any way to use the wait() system call with a timeout, besides using a busy-waiting or busy-sleeping loop?
I've got a parent process that forks itself and execs a child executable. It then waits for the child to finish, grabs its output by whatever means appropriate, and performs further processing. If the process does not finish within a certain period of time, it assumes that its execution timed out, and does something else. Unfortunately, this timeout detection is necessary given the nature of the problem.
There's not a wait call that takes a timeout.
What you can do instead is install a signal handler that sets a flag for SIGCHLD, and use select() to implement a timeout. select() will be interrupted by a signal.
static volatile sig_atomic_t punt;
static void sig_handler(int sig)
{
    (void)sig;
    punt = 1;
}
...
struct timeval timeout = {10, 0};
int rc;
signal(SIGCHLD, sig_handler);
/* fork/exec stuff goes here */
/* select() fails with EINTR when the SIGCHLD handler runs */
rc = select(0, NULL, NULL, NULL, &timeout);
if (punt) {
    /* child terminated (check this first: the child may have exited
       before select() was even called) */
} else if (rc == 0) {
    /* timed out */
}
More logic is needed if you have other signals to handle as well, though.
You can use waitpid together with the WNOHANG option and a sleep.
while(waitpid(pid, &status, WNOHANG) == 0) {
sleep(1);
}
But this is still a polling loop that sleeps between checks. However, I see no other way using the wait family of functions.
On Linux, you can also solve this problem using signalfd. signalfd essentially takes a set of signals and creates an fd which you can read; each block you read corresponds to a signal which has fired. (You should block these signals with sigprocmask so that they are not delivered through the normal handler mechanism.)
The advantage of signalfd is that you can use the fd with select, poll, or epoll, all of which allow for timeouts, and all of which allow you to wait for other things as well.
One note: If the same signal fires twice before the corresponding struct signalfd_siginfo is read, you'll only receive a single indication. So when you get a SIGCHLD indication, you need to call waitpid(-1, &status, WNOHANG) repeatedly until it returns 0 (no further child has exited yet) or -1 (no children are left).
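A sketch of the signalfd variant for a single child. open_sigchld_fd and wait_child_timeout are made-up names, and the sketch waits for one specific pid rather than looping over waitpid(-1, ...). One simplification to be aware of: the timeout restarts whenever an unrelated SIGCHLD arrives, so a stricter version would track an absolute deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block SIGCHLD and return a signalfd for it, or -1 on error.
 * Call this before fork() so no notification is missed. */
static int open_sigchld_fd(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    return signalfd(-1, &set, SFD_CLOEXEC);
}

/* Wait up to timeout_ms for child pid to terminate.
 * Returns 1 and fills *status if it did, 0 on timeout, -1 on error. */
static int wait_child_timeout(int sfd, pid_t pid, int timeout_ms, int *status)
{
    struct pollfd pfd = { .fd = sfd, .events = POLLIN };
    for (;;) {
        /* Check first: the child may already be gone. */
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 1;
        if (r < 0)
            return -1;

        int n = poll(&pfd, 1, timeout_ms);
        if (n == 0)
            return 0;
        if (n < 0)
            return -1;

        /* Consume the notification, then loop back to waitpid(). */
        struct signalfd_siginfo si;
        if (read(sfd, &si, sizeof si) < 0)
            return -1;
    }
}
```

On a timeout the caller still owns the child, so it would typically kill() it and then call wait_child_timeout() or waitpid() again to reap it.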
On FreeBSD, you can achieve the same effect rather more directly using kqueue and a kevent of type EVFILT_PROC. (You can also kqueue a SIGCHLD event, but EVFILT_PROC lets you specify the events by child pid instead of globally for all children.) This should also work on Mac OS X, but I've never tried it.

IPC using Signals on linux

It is possible to do IPC (inter process communication) using signal catch and signal raise?
I made two programs. In the first program I did handling of signals, and in the other program I just raised the signal which I want to handle in the first program. It's working fine for me, but I want to do communication between these two programs using signals and also want to send some bytes of data with the raised signal. How can I do this?
I want to pass messages with this signal also. Can i do it? It is possible?
And also, what are the disadvantages and advantages of IPC mechanisms using signals?
The following is working code of my two programs. Using this, I am able to just raise signals and catch signals, but I want to pass data from one program to another.
In the second program, I used the first program's process ID. How can I make it dynamic?
first program :
/* Example of using sigaction() to setup a signal handler with 3 arguments
* including siginfo_t.
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
static void hdl (int sig, siginfo_t *siginfo, void *context)
{
printf("sig no = %d \n", sig);
if(sig == SIGINT)
exit(0);
printf ("Sending PID: %ld, UID: %ld\n",
(long)siginfo->si_pid, (long)siginfo->si_uid);
}
int main (int argc, char *argv[])
{
struct sigaction act;
sigemptyset(&act.sa_mask);
act.sa_sigaction = &hdl;
act.sa_flags = SA_SIGINFO;
if (sigaction(SIGUSR1, &act, NULL) < 0) {
perror ("sigaction SIGUSR1");
return 1;
}
if (sigaction(SIGINT, &act, NULL) < 0) {
perror ("sigaction SIGINT");
return 1;
}
while (1)
{
sleep(1);
}
return 0;
}
second program
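#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>   /* sleep() */
int main(void)
{
    while (1)
    {
        sleep(1);
        kill(11558, SIGUSR1);   /* hard-coded PID of the first program */
    }
    return 0;
}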
Signals are intended to provide a rudimentary form of control over a process, not as an IPC mechanism. Signals have several issues when used as anything else:
A lot of system calls will be interrupted by a signal and need special handling.
Accordingly, a lot of code in the wild is not signal-safe.
Signals do not have any kind of data content, except for themselves. This makes them mostly useless as a message passing method.
There is only so much you can do in a signal handler.
Most importantly, subsequent signals of the same type are not queued - they are merged into one instance.
Even more important, there is no guarantee that signals are delivered in the same order as they were generated. From the manual page:
By contrast, if multiple standard signals are pending for a process, the order in which
they are delivered is unspecified.
You might theoretically be able set up some kind of channel using several signals going back and forth, with some acting like some sort of acknowledgement, but no sane person would want to attempt something like that. You might as well use smoke signals instead...
No, don't try and use signals for this. You cannot attach extra data with signals other than the siginfo struct. The main problem with using signals though is that so little is signal safe. You have to avoid just about all the C runtime routines, and make sure the receiving program does EINTR checks on all its kernel calls. The only thing you can say about when a signal occurs is that it won't be when you expect it (a bit like the Spanish Inquisition).
I suggest you look into the other IPC mechanisms, such as shared memory, message queues, fifos (named pipes), and sockets.
It is possible to do IPC (inter process communication) using signal catch and signal raise?
Yes and no. Considering signals only, you can send a signal to another process, but you can't send anything other than just a signal.
I want to pass messages with this signal also. Can i do it? It is possible?
No, not the way you're trying to. You can use sockets, files, pipes, or named pipes to do this. If you want to learn more about UNIX IPC, read Advanced Programming in the UNIX Environment.
Except in one specific case that I've encountered, signals aren't generally useful as an IPC mechanism.
The only time I've used signals as part of an IPC mechanism was when I needed to interrupt the normal flow of operation of the signalled process to handle something, for example a timer interrupt. I have used signals together with Boost shared memory to implement interprocess event management. The shared memory contains a list of events that need processing and the signal is used to get the process to process these events. These events are out-of-band and unpredictable, so using a signal was ideal. I performed considerable testing to verify the implementation (and it was hard to get it all stable).
This used sigqueue together with the signal SIGRTMIN+1 in a Linux environment using glibc. Using SA_RESTART on the sigaction avoids the need to handle EINTR directly; see "Primitives Interrupted by Signals" in the glibc manual. BSD has a similar scheme, so EINTR handling wasn't required in my system. All of the points made by the other answers were considered and handled (and tested).
However if you just want to pass values back and forwards within the normal operation of the process then another IPC such as sockets, files, pipes or named pipes are better. If you can use ZeroMQ then even better as that does a lot of the hard work for you in a very elegant way.
I'm currently reading man 7 signal:
Real-time signals are distinguished by the following:
If the signal is sent using sigqueue(3), an accompanying value (either an integer or a pointer) can be sent with the signal. ...
Note: Real-time signals start from SIGRTMIN to SIGRTMAX.
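To illustrate the sigqueue(3) remark, here is a small sketch that attaches an integer to a real-time signal. send_value, receiver_init and receive_value are hypothetical names. For brevity the receiver collects the signal synchronously with sigtimedwait() instead of a handler, which is a substitution on my part; a handler installed with SA_SIGINFO would find the same payload in info->si_value. The si_pid field also tells the receiver who sent the signal, which is one way for it to learn the peer's PID instead of hard-coding it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sender side: queue SIGRTMIN for process pid with an int payload. */
static int send_value(pid_t pid, int value)
{
    union sigval sv;
    sv.sival_int = value;
    return sigqueue(pid, SIGRTMIN, sv);
}

/* Receiver side: block SIGRTMIN so queued instances wait for us. */
static int receiver_init(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Wait up to timeout_sec for one queued SIGRTMIN.
 * On success returns 0 and stores the payload and the sender's PID. */
static int receive_value(int timeout_sec, int *value, pid_t *sender)
{
    sigset_t set;
    siginfo_t info;
    struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigtimedwait(&set, &info, &ts) != SIGRTMIN)
        return -1;
    *value = info.si_value.sival_int;
    *sender = info.si_pid;
    return 0;
}
```

Unlike standard signals, real-time signals are queued, so two send_value() calls produce two separate receipts. The payload is still only an int or a pointer, which is why the other answers steer towards pipes or sockets for real data.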

Linux select() vs ppoll() vs pselect()

In my application, there is an io-thread that is dedicated to
Wrapping data received from the application in a custom protocol
Sending the data+custom protocol packet over tcp/ip
Receiving data+custom protocol packet over tcp/ip
Unwrapping the custom protocol and handing the data to the application.
Application processes the data over a different thread. Additionally, the requirements dictate that the unacknowledged window size should be 1, i.e. there should be only one pending unacknowledged message at anytime. This implies that if io-thread has dispatched a message over the socket, it will not send any more messages, till it hears an ack from the receiver.
Application's processing thread communicates to io-thread via pipe. Application needs to shut down gracefully if someone types ctrl+C on the Linux CLI.
Thus, given these requirements, I have the following options
Use ppoll() on socket and pipe descriptors
Use select()
Use pselect()
I have the following questions
The decision between select() and poll(). My application only deals with fewer than 50 file descriptors. Is it okay to assume there would be no difference whether I choose select or poll?
Decision between select() and pselect(). I read the Linux documentation and it describes a race condition between signals and select(). I don't have experience with signals, so can someone explain more clearly about the race condition and select()? Does it have something to do with someone pressing ctrl+C on CLI and application not stopping?
Decision between pselect() and ppoll()? Any thoughts on one vs the other
I'd suggest starting the comparison with select() vs poll(). Linux also provides both pselect() and ppoll(); and the extra const sigset_t * argument to pselect() and ppoll() (vs select() and poll()) has the same effect on each "p-variant", as it were. If you are not using signals, you have no race to protect against, so the base question is really about efficiency and ease of programming.
Meanwhile there's already a stackoverflow.com answer here: what are the differences between poll and select.
As for the race: once you start using signals (for whatever reason), you will learn that in general, a signal handler should just set a variable of type volatile sig_atomic_t to indicate that the signal has been detected. The fundamental reason for this is that many library calls are not re-entrant, and a signal can be delivered while you're "in the middle of" such a routine. For instance, simply printing a message to a stream-style data structure such as stdout (C) or cout (C++) can lead to re-entrancy issues.
Suppose you have code that uses a volatile sig_atomic_t flag variable, perhaps to catch SIGINT, something like this (see also http://pubs.opengroup.org/onlinepubs/007904975/functions/sigaction.html):
volatile sig_atomic_t got_interrupted = 0;
void caught_signal(int unused) {
got_interrupted = 1;
}
...
struct sigaction sa;
sa.sa_handler = caught_signal;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
if (sigaction(SIGINT, &sa, NULL) == -1) ... handle error ...
...
Now, in the main body of your code, you might want to "run until interrupted":
while (!got_interrupted) {
... do some work ...
}
This is fine up until you start needing to make calls that wait for some input/output, such as select or poll. The "wait" action needs to wait for that I/O—but it also needs to wait for a SIGINT interrupt. If you just write:
while (!got_interrupted) {
... do some work ...
result = select(...); /* or result = poll(...) */
}
then it's possible that the interrupt will happen just before you call select() or poll(), rather than afterward. In this case, you did get interrupted—and the variable got_interrupted gets set—but after that, you start waiting. You should have checked the got_interrupted variable before you started waiting, not after.
You can try writing:
while (!got_interrupted) {
... do some work ...
if (!got_interrupted)
result = select(...); /* or result = poll(...) */
}
This shrinks the "race window", because now you'll detect the interrupt if it happens while you're in the "do some work" code; but there is still a race, because the interrupt can happen right after you test the variable, but right before the select-or-poll.
The solution is to make the "test, then wait" sequence "atomic", using the signal-blocking properties of sigprocmask (or, in POSIX threaded code, pthread_sigmask):
sigset_t mask, omask;
...
while (!got_interrupted) {
... do some work ...
/* begin critical section, test got_interrupted atomically */
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
if (sigprocmask(SIG_BLOCK, &mask, &omask))
... handle error ...
if (got_interrupted) {
sigprocmask(SIG_SETMASK, &omask, NULL); /* restore old signal mask */
break;
}
result = pselect(..., &omask); /* or ppoll() etc */
sigprocmask(SIG_SETMASK, &omask, NULL);
/* end critical section */
}
(the above code is actually not that great, it's structured for illustration rather than efficiency -- it's more efficient to do the signal mask manipulation slightly differently, and place the "got interrupted" tests differently).
Until you actually start needing to catch SIGINT, though, you need only compare select() and poll() (and if you start needing large numbers of descriptors, some of the event-based stuff like epoll() is more efficient than either one).
Between (p)select and (p)poll is a rather subtle difference:
For select, you have to initialize and populate the ugly fd_set bitmaps every time before you call select because select modifies them in-place in a "destructive" fashion. (poll distinguishes between the .events and .revents members in struct pollfd).
After selecting, the entire bitmap is often scanned (by people/code) for events even if most of the fds are not even watched.
Third, the bitmap can only deal with fds whose number is less than a certain limit (contemporary implementations: somewhere between 1024..4096), which rules it out in programs where high fds can be easily attained (notwithstanding that such programs are likely to already use epoll instead).
The accepted answer is not correct vis-a-vis the difference between select and pselect. It does describe well how a race condition between the signal handler and select can arise, but it is incorrect in how it uses pselect to solve the problem. It misses the main point about pselect, which is that it waits for EITHER the file descriptor or the signal to become ready. pselect returns when either of these is ready. select ONLY waits on the file descriptor. select ignores signals. See this blog post for a good working example:
https://www.linuxprogrammingblog.com/code-examples/using-pselect-to-avoid-a-signal-race
To make the picture presented by the accepted answer complete, the following basic fact should be mentioned: both select() and pselect() may fail with EINTR, as stated in their man pages:
EINTR A signal was caught; see signal(7).
This "caught" means that the signal should be recognized as "occurred during the system call execution":
1. If a non-masked signal occurs during select/pselect execution, then select/pselect will exit.
2. If a non-masked signal occurs before select/pselect has been called, this will not have any effect and select/pselect will continue waiting, potentially forever.
So if a signal occurs during select/pselect execution we are ok - the execution of select/pselect will be interrupted and then we can test the reason for the exit, discover that it was EINTR, and exit the loop.
The real threat that we face is a possibility of signal occurrence outside of select/pselect execution, then we may hang in the system call forever. Any attempt to discover this "outsider" signal by naive means:
if (was_a_signal) {
...
}
will fail since no matter how close this test will be to the call of select/pselect there is always a possibility that the signal will occur just after the test and before the call to select/pselect.
Then, if the only place to catch the signal is during select/pselect execution we should invent some kind of "wine funnel" so all "wine splashes" (signals), even outside of "bottle neck" (select/pselect execution period) will eventually come to the "bottle neck".
But how can you deceive the system call and make it "think" that the signal has occurred during its execution when in reality it has occurred before?
Easy. Here is our "wine funnel": you just block the signal of interest and by that cause it (if it has occurred at all) to wait outside of the process "for the door to be opened", and you "open the door" (unmask the signal) only when you're prepared "to welcome the guest" (select/pselect is running). Then the "arrived" signal will be recognized as "just occurred" and will interrupt the execution of the system call.
Of course, "opening the door" is the most critical part of the plan - it cannot be done by the usual means (first unmask, then call select/pselect); the only possibility is to do both actions (unmask and system call) at once (atomically) - this is what pselect() is capable of but select() is not.
