I am using select() with a pipe, and now I need to catch SIGTERM as well. How can I do that? Do I have to do it when select() returns an error (< 0)?
First, SIGTERM will kill your process if not caught, and select() will not return. Thus, you must install a signal handler for SIGTERM. Do that using sigaction().
However, the SIGTERM signal can arrive at a moment where your thread is not blocked at select(). It would be a rare condition, if your process is mostly sleeping on the file descriptors, but it can otherwise happen. This means that either your signal handler must do something to inform the main routine of the interruption, namely, setting some flag variable (of type sig_atomic_t), or you must guarantee that SIGTERM is only delivered when the process is sleeping on select().
I'll go with the latter approach, since it's simpler, albeit less flexible (see end of the post).
So, you unblock SIGTERM just before calling select(), and re-block it right away after the function returns, so that your process only receives the signal while sleeping inside select(). But note that this actually creates a race condition. If the signal arrives just after the unblock, but just before select() is called, the handler runs before the system call has started, so select() will not return -1 and may block indefinitely. If the signal arrives just after select() returns successfully, but just before the re-block, you have also lost the signal.
Thus, you must use pselect() for that. It does the blocking/unblocking around select() atomically.
First, block SIGTERM using sigprocmask() before entering the pselect() loop. After that, just call pselect() with the original mask returned by sigprocmask(). This way you guarantee your process will only be interrupted while sleeping inside pselect().
In summary:
Install a handler for SIGTERM (that does nothing);
Before entering the pselect() loop, block SIGTERM using sigprocmask();
Call pselect() with the old signal mask returned by sigprocmask();
Inside the pselect() loop, now you can check safely whether pselect() returned -1 and errno is EINTR.
Please note that if, after pselect() returns successfully, you do a lot of work, you may experience bigger latency when responding to SIGTERM (since the process must do all processing and return to pselect() before actually processing the signal). If this is a problem, you must use a flag variable inside the signal handler, so that you can check for this variable in a number of specific points in your code. Using a flag variable does not eliminate the race condition and does not eliminate the need for pselect(), though.
Remember: whenever you need to wait on some file descriptors or for the delivery of a signal, you must use pselect() (or ppoll(), for the systems that support it).
Edit: nothing better than a code example to illustrate the usage.
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

// Signal handler to catch SIGTERM.
void sigterm(int signo) {
    (void)signo;
}

int main(void) {
    // Install the signal handler for SIGTERM.
    struct sigaction s;
    s.sa_handler = sigterm;
    sigemptyset(&s.sa_mask);
    s.sa_flags = 0;
    sigaction(SIGTERM, &s, NULL);

    // Block SIGTERM.
    sigset_t sigset, oldset;
    sigemptyset(&sigset);
    sigaddset(&sigset, SIGTERM);
    sigprocmask(SIG_BLOCK, &sigset, &oldset);

    // Enter the pselect() loop, using the original mask as argument.
    fd_set set;
    FD_ZERO(&set);
    FD_SET(0, &set);
    while (pselect(1, &set, NULL, NULL, NULL, &oldset) >= 0) {
        // Do some processing. Note that the process will not be
        // interrupted while inside this loop.
        sleep(5);
    }

    // See why pselect() has failed.
    if (errno == EINTR)
        puts("Interrupted by SIGTERM.");
    else
        perror("pselect()");
    return EXIT_SUCCESS;
}
The answer is partly in one of the comments in the Q&A you point to:
> Interrupt will cause select() to return a -1 with errno set to EINTR
That is; for any interrupt(signal) caught the select will return, and the errno will be set to EINTR.
Now if you specifically want to catch SIGTERM, then you need to set that up with a call to signal, like this;
signal(SIGTERM,yourcatchfunction);
where your catch function should be defined something like
void yourcatchfunction(int signalNumber) { .... }
So in summary, you have set up a signal handler yourcatchfunction and your program is currently in a select() call waiting for I/O. When a signal arrives, yourcatchfunction will be called, and when you return from it the select() call will return -1 with errno set to EINTR.
However, be aware that SIGTERM can occur at any time, so you may not be in the select() call when it occurs, in which case you will never see the EINTR but only a regular call of yourcatchfunction.
Hence the select() returning with err and errno EINTR is just so you can take non-blocking action -- it is not what will catch the signal.
You can call select() in a loop. This is known as restarting the system call. Here is some pseudo-C.
int retval = -1;
int select_errno = 0;
do {
retval = select(...);
if (retval < 0)
{
/* Cache the value of errno in case a system call is later
* added prior to the loop guard (i.e., the while expression). */
select_errno = errno;
}
/* Other system calls might be added here. These could change the
* value of errno, losing track of the error during the select(),
* again this is the reason we cached the value. (E.g, you might call
* a log method which calls gettimeofday().) */
/* Automatically restart the system call if it was interrupted by
* a signal -- with a while loop. */
} while ((retval < 0) && (select_errno == EINTR));
if (retval < 0) {
/* Handle other errors here. See select man page. */
} else {
/* Successful invocation of select(). */
}
I'm trying to add a signal handler for proper cleanup to my event-driven application.
My signal handler for SIGINT only changes the value of a global flag variable, which is then checked in the main loop. To avoid races, the signal is blocked at all times, except during the pselect() call. This should cause pending signals to be delivered only during the pselect() call, which should be interrupted and fail with EINTR.
This usually works fine, except if there are already events pending on the monitored file descriptors (e.g. under heavy load, when there's always activity on the file descriptors).
This sample program reproduces the problem:
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
volatile sig_atomic_t stop_requested = 0;
void handle_signal(int sig)
{
// Use write() and strlen() instead of printf(), which is not async-signal-safe
const char * out = "Caught stop signal. Exiting.\n";
size_t len = strlen (out);
ssize_t writelen = write(STDOUT_FILENO, out, len);
assert(writelen == (ssize_t) len);
stop_requested = 1;
}
int main(void)
{
int ret;
// Install signal handler
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handle_signal;
ret = sigaction(SIGINT, &sa, NULL);
assert(ret == 0);
}
// Block SIGINT
sigset_t old_sigmask;
{
sigset_t blocked;
sigemptyset(&blocked);
sigaddset(&blocked, SIGINT);
ret = sigprocmask(SIG_BLOCK, &blocked, &old_sigmask);
assert(ret == 0);
}
ret = raise(SIGINT);
assert(ret == 0);
// Create pipe and write data to it
int pipefd[2];
ret = pipe(pipefd);
assert(ret == 0);
ssize_t writelen = write(pipefd[1], "foo", 3);
assert(writelen == 3);
while (stop_requested == 0)
{
printf("Calling pselect().\n");
fd_set fds;
FD_ZERO(&fds);
FD_SET(pipefd[0], &fds);
struct timespec * timeout = NULL;
int ret = pselect(pipefd[0] + 1, &fds, NULL, NULL, timeout, &old_sigmask);
assert(ret >= 0 || errno == EINTR);
printf("pselect() returned %d.\n", ret);
if (FD_ISSET(pipefd[0], &fds))
printf("pipe is readable.\n");
sleep(1);
}
printf("Event loop terminated.\n");
}
This program installs a handler for SIGINT, then blocks SIGINT, sends SIGINT to itself (which will not be delivered yet because SIGINT is blocked), creates a pipe and writes some data into the pipe, and then monitors the read end of the pipe for readability.
This readability monitoring is done using pselect(), which is supposed to unblock SIGINT, which should then interrupt the pselect() and call the signal handler.
However, on Linux (I tested on 5.6 and 4.19), the pselect() call returns 1 instead and indicates readability of the pipe, without calling the signal handler. Since this test program does not read the data that was written to the pipe, the file descriptor will never cease to be readable, and the signal handler is never called. In real programs, a similar situation might arise under heavy load, where a lot of data might be available for reading on different file descriptors (e.g. sockets).
On the other hand, on FreeBSD (I tested on 12.1), the signal handler is called, and then pselect() returns -1 and sets errno to EINTR. This is what I expected to happen on Linux as well.
Am I misunderstanding something, or am I using these interfaces incorrectly? Or should I just fall back to the old self-pipe trick, which (I believe) would handle this case better?
This is a type of resource starvation caused by always checking for active resources in the same order. When resources are always checked in the same order, if the resources checked first are busy enough the resources checked later may never get any attention.
See "What is starvation?".
The Linux implementation of pselect() apparently checks file descriptors before checking for signals. The BSD implementation does the opposite.
For what it's worth, the POSIX documentation for pselect() states:
If none of the selected descriptors are ready for the requested operation, the pselect() or select() function shall block until at least one of the requested operations becomes ready, until the timeout occurs, or until interrupted by a signal.
A strict reading of that description requires checking the descriptors first. If any descriptor is active, pselect() will return that instead of failing with errno set to EINTR.
In that case, if the descriptors are so busy that one is always active, the signal processing gets starved.
The BSD implementation likely starves active descriptors if signals come in too fast.
One common solution is to always process all active resources every time a select() call or similar returns. But you can't do that with your current design that mixes signals with descriptors, because pselect() doesn't even get to checking for a pending signal if there are active descriptors. As @Shawn mentioned in the comments, on Linux you can map signals to file descriptors using signalfd(). Then add the descriptor from signalfd() to the file descriptor set passed to pselect().
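A rough sketch of the signalfd() idea, mirroring the test program from the question: SIGINT is blocked and pending, the pipe already has data, and both show up as readable in a single select() call. This is Linux-specific, and the function name signalfd_select_demo and the bitmask return value are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Returns a bitmask: bit 0 = pipe was readable,
 * bit 1 = SIGINT was read from the signalfd descriptor. */
int signalfd_select_demo(void) {
    /* The signal must stay blocked so that it is queued for the
     * descriptor instead of being handled asynchronously. */
    sigset_t mask, orig;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    int ret = sigprocmask(SIG_BLOCK, &mask, &orig);
    assert(ret == 0);

    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    assert(sfd >= 0);

    /* Same setup as the question: busy pipe plus a pending SIGINT. */
    int pipefd[2];
    ret = pipe(pipefd);
    assert(ret == 0);
    ssize_t n = write(pipefd[1], "foo", 3);
    assert(n == 3);
    raise(SIGINT);

    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(pipefd[0], &fds);
    FD_SET(sfd, &fds);
    int maxfd = pipefd[0] > sfd ? pipefd[0] : sfd;
    int ready = select(maxfd + 1, &fds, NULL, NULL, NULL);

    /* Process every ready descriptor, not just the first one. */
    int result = 0;
    if (ready > 0 && FD_ISSET(pipefd[0], &fds))
        result |= 1;
    if (ready > 0 && FD_ISSET(sfd, &fds)) {
        struct signalfd_siginfo si;
        /* Reading consumes the pending signal. */
        if (read(sfd, &si, sizeof si) == (ssize_t)sizeof si &&
            si.ssi_signo == (unsigned)SIGINT)
            result |= 2;
    }

    close(sfd);
    close(pipefd[0]);
    close(pipefd[1]);
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return result;
}
```

Because the signal is now just another readable descriptor, a busy pipe can no longer hide it, as long as the loop checks all descriptors that select() reports.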
Context is this Redis issue. We have a wait3() call that waits for the AOF rewriting child to create the new AOF version on disk. When the child is done, the parent is notified via wait3() in order to substitute the old AOF with the new one.
However in the context of the above issue the user notified us about a bug. I modified a bit the implementation of Redis 3.0 in order to clearly log when wait3() returned -1 instead of crashing because of this unexpected condition. So this is what happens apparently:
wait3() is called when we have pending children to wait for.
the SIGCHLD should be set to SIG_DFL, there is no code setting this signal at all in Redis, so it's the default behavior.
When the first AOF rewrite happens, wait3() successfully works as expected.
Starting from the second AOF rewrite (the second child created), wait3() starts to return -1.
AFAIK it is not possible in the current code that we call wait3() while there are no pending children, since when the AOF child is created, we set server.aof_child_pid to the value of the pid, and we reset it only after a successful wait3() call.
So wait3() should have no reason to fail with -1 and ECHILD, but it does, so probably the zombie child is not created for some unexpected reason.
Hypothesis 1: It is possible that Linux during certain odd conditions will discard the zombie child, for example because of memory pressure? Does not look reasonable since the zombie has just metadata attached to it but who knows.
Note that we call wait3() with WNOHANG. And given that SIGCHLD is set to SIG_DFL by default, the only condition that should lead to failing and returning -1 and ECHILD should be no zombie available to report the information.
Hypothesis 2: Another thing that could happen, although I have no explanation for why it would, is that after the first child dies the SIGCHLD handler is set to SIG_IGN, causing wait3() to return -1 and ECHILD.
Hypothesis 3: Is there some way to remove the zombie children externally? Maybe this user has some kind of script that removes zombie processes in the background so that then the information is no longer available for wait3()? To my knowledge it should never be possible to remove the zombie if the parent does not wait for it (with waitpid or handling the signal) and if the SIGCHLD is not ignored, but maybe there is some Linux specific way.
Hypothesis 4: There is actually some bug in the Redis code so that we successfully wait3() the child the first time without correctly resetting the state, and later we call wait3() again and again but there are no longer zombies, so it returns -1. Analyzing the code it looks impossible, but maybe I'm wrong.
Another important thing: we never observed this in the past. Only happens in this specific Linux system apparently.
UPDATE: Yossi Gottlieb proposed that the SIGCHLD is received by another thread in the Redis process for some reason (does not happen normally, only on this system). We already mask SIGALRM in bio.c threads, perhaps we could try masking SIGCHLD from I/O threads as well.
Appendix: selected parts of Redis code
Where wait3() is called:
/* Check if a background saving or AOF rewrite in progress terminated. */
if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
int statloc;
pid_t pid;
if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
int exitcode = WEXITSTATUS(statloc);
int bysignal = 0;
if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);
if (pid == -1) {
redisLog(LOG_WARNING,"wait3() returned an error: %s. "
"rdb_child_pid = %d, aof_child_pid = %d",
strerror(errno),
(int) server.rdb_child_pid,
(int) server.aof_child_pid);
} else if (pid == server.rdb_child_pid) {
backgroundSaveDoneHandler(exitcode,bysignal);
} else if (pid == server.aof_child_pid) {
backgroundRewriteDoneHandler(exitcode,bysignal);
} else {
redisLog(REDIS_WARNING,
"Warning, detected child with unmatched pid: %ld",
(long)pid);
}
updateDictResizePolicy();
}
} else {
Selected parts of backgroundRewriteDoneHandler:
void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
if (!bysignal && exitcode == 0) {
int newfd, oldfd;
char tmpfile[256];
long long now = ustime();
mstime_t latency;
redisLog(REDIS_NOTICE,
"Background AOF rewrite terminated with success");
... more code to handle the rewrite, never calls return ...
} else if (!bysignal && exitcode != 0) {
server.aof_lastbgrewrite_status = REDIS_ERR;
redisLog(REDIS_WARNING,
"Background AOF rewrite terminated with error");
} else {
server.aof_lastbgrewrite_status = REDIS_ERR;
redisLog(REDIS_WARNING,
"Background AOF rewrite terminated by signal %d", bysignal);
}
cleanup:
aofClosePipes();
aofRewriteBufferReset();
aofRemoveTempFile(server.aof_child_pid);
server.aof_child_pid = -1;
server.aof_rewrite_time_last = time(NULL)-server.aof_rewrite_time_start;
server.aof_rewrite_time_start = -1;
/* Schedule a new rewrite if we are waiting for it to switch the AOF ON. */
if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
server.aof_rewrite_scheduled = 1;
}
As you can see all the code paths must execute the cleanup code that reset server.aof_child_pid to -1.
Errors logged by Redis during the issue
21353:C 29 Nov 04:00:29.957 * AOF rewrite: 8 MB of memory used by copy-on-write
27848:M 29 Nov 04:00:30.133 ^# wait3() returned an error: No child processes. rdb_child_pid = -1, aof_child_pid = 21353
As you can see aof_child_pid is not -1.
TLDR: you are currently relying on unspecified behaviour of signal(2); use sigaction (carefully) instead.
Firstly, SIGCHLD is strange. From the manual page for sigaction;
POSIX.1-1990 disallowed setting the action for SIGCHLD to SIG_IGN. POSIX.1-2001 allows this possibility, so that ignoring SIGCHLD can be used to prevent the creation of zombies (see wait(2)). Nevertheless, the historical BSD and System V behaviors for ignoring SIGCHLD differ, so that the only completely portable method of ensuring that terminated children do not become zombies is to catch the SIGCHLD signal and perform a wait(2) or similar.
And here's the bit from wait(2)'s manual page:
POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see sigaction(2)), then children that terminate do not become zombies and a call to wait() or waitpid() will block until all children have terminated, and then fail with errno set to ECHILD. (The original POSIX standard left the behavior of setting SIGCHLD to SIG_IGN unspecified. Note that even though the default disposition of SIGCHLD is "ignore", explicitly setting the disposition to SIG_IGN results in different treatment of zombie process children.) Linux 2.6 conforms to this specification. However, Linux 2.4 (and earlier) does not: if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored, that is, the call blocks until the next child terminates and then returns the process ID and status of that child.
Note that the effect is that if SIGCHLD is handled as though SIG_IGN were set, then (under Linux 2.6+) you will see the behaviour you are seeing, i.e. wait() will return -1 with errno set to ECHILD, because the child will have been automatically reaped.
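To make the quoted behaviour concrete, here is a small sketch that forks a child and waits for it under both dispositions. It assumes a system that follows POSIX.1-2001 here (such as Linux 2.6 or later); the helper name wait_result_with_sigchld is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits at once, then waits for it with SIGCHLD set
 * to `disposition` (SIG_DFL or SIG_IGN).
 * Returns 0 if waitpid() reaped the child, else the errno it failed with. */
int wait_result_with_sigchld(void (*disposition)(int)) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = disposition;
    sigemptyset(&sa.sa_mask);
    int ret = sigaction(SIGCHLD, &sa, &old);
    assert(ret == 0);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)
        _exit(0);                 /* child: terminate immediately */

    int status;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited == -1 && errno == EINTR);

    /* With SIG_IGN the child is reaped automatically, so there is
     * nothing left to wait for and waitpid() fails with ECHILD. */
    int result = (waited == pid) ? 0 : errno;

    sigaction(SIGCHLD, &old, NULL);   /* restore previous disposition */
    return result;
}
```

Calling it with SIG_DFL should give 0, and with SIG_IGN it should give ECHILD, which is the same error Redis logged.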
Secondly, signal handling with pthreads (which I think you are using here) is notoriously hard. The way it's meant to work (as I'm sure you know) is that process directed signals get sent to an arbitrary thread within the process that has the signal unmasked. But whilst threads have their own signal mask, there is a process wide action handler.
Putting these two things together, I think you are running across a problem I've run across before. I have had problems getting SIGCHLD handling to work with signal() (which is fair enough as that was deprecated prior to pthreads), which were fixed by moving to sigaction and carefully setting per thread signal masks. My conclusion at the time was that the C library was emulating (with sigaction) what I was telling it to do with signal(), but was getting tripped up by pthreads.
Note that you are currently relying on unspecified behaviour. From the manual page of signal(2):
The effects of signal() in a multithreaded process are unspecified.
Here's what I recommend you do:
Move to sigaction() and pthread_sigmask(). Explicitly set the handling of all the signals you care about (even if you think that's the current default), even when setting them to SIG_IGN or SIG_DFL. I block signals whilst I do this (possibly overabundance of caution but I copied the example from somewhere).
Here's what I am doing (roughly):
sigset_t set;
struct sigaction sa;
/* block all signals */
sigfillset (&set);
pthread_sigmask (SIG_BLOCK, &set, NULL);
/* Set up the structure to specify the new action. */
memset (&sa, 0, sizeof (struct sigaction));
sa.sa_handler = handlesignal; /* signal handler for INT, TERM, HUP, USR1, USR2 */
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGINT, &sa, NULL);
sigaction (SIGTERM, &sa, NULL);
sigaction (SIGHUP, &sa, NULL);
sigaction (SIGUSR1, &sa, NULL);
sigaction (SIGUSR2, &sa, NULL);
sa.sa_handler = SIG_IGN;
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGPIPE, &sa, NULL); /* I don't care about SIGPIPE */
sa.sa_handler = SIG_DFL;
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGCHLD, &sa, NULL); /* I want SIGCHLD to be handled by SIG_DFL */
pthread_sigmask (SIG_UNBLOCK, &set, NULL);
Where possible set all your signal handlers and masks etc. prior to any pthread operations. Where possible do not change signal handlers and masks (you might need to do this prior to and subsequent to fork() calls).
If you need a signal handler for SIGCHLD (rather than relying on SIG_DFL), if possible let it be received by any thread, and use the self-pipe method or similar to alert the main program.
If you must have threads that do/don't handle certain signals, try to restrict yourself to pthread_sigmask in the relevant thread rather than sig* calls.
Just in case you run headlong into the next issue I ran into, ensure that after you have fork()'d, you set up the signal handling again from scratch (in the child) rather than relying on whatever you might inherit from the parent process. If there's one thing worse than signals mixed with pthread, it's signals mixed with pthread with fork().
Note I cannot explain exactly why the first change (moving to sigaction() and pthread_sigmask()) works, but it fixed what looks like a very similar issue for me, and the previous code was, after all, relying on something 'unspecified'. It's closest to your 'hypothesis 2' but I think it is really incomplete emulation of legacy signal functions (specifically emulating the previously racy behaviour of signal() which is what caused it to be replaced by sigaction() in the first place - but this is just a guess).
Incidentally, I suggest you use wait4() or (as you aren't using rusage) waitpid() rather than wait3(), so you can specify a specific PID to wait for. If you have something else that generates children (I've had a library do it), you may end up waiting for the wrong thing. That said, I don't think that's what's happening here.
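As an illustration of that last suggestion, here is a minimal sketch of polling one specific child with waitpid() and WNOHANG. The helper names poll_child_exit and waitpid_poll_demo are made up, and the 10 ms pause only stands in for the rest of an event loop; it is not how Redis does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Polls for one specific child without blocking in waitpid().
 * Returns the exit code (0..255), -1 if the child was killed by a
 * signal, or -2 if waitpid() itself failed (for example with ECHILD). */
int poll_child_exit(pid_t child) {
    for (;;) {
        int status;
        pid_t pid = waitpid(child, &status, WNOHANG);
        if (pid == child) {
            /* Only decode the status once we know it is valid. */
            if (WIFEXITED(status))
                return WEXITSTATUS(status);
            return -1;
        }
        if (pid == -1 && errno != EINTR)
            return -2;
        /* pid == 0: child still running. A real event loop would go
         * back to its other work here instead of sleeping. */
        struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
        nanosleep(&pause, NULL);
    }
}

/* Forks a child that exits with `exit_code` and polls for it. */
int waitpid_poll_demo(int exit_code) {
    pid_t child = fork();
    assert(child >= 0);
    if (child == 0)
        _exit(exit_code);
    return poll_child_exit(child);
}
```

Because the PID is explicit, a child created by some other part of the program (or by a library) cannot be reaped by mistake.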
i have the following case
void foo() {
printf("hi\n");
while(1);
}
int main(void)
{
struct sigaction temp;
temp.sa_handler = &foo;
sigfillset(&temp.sa_mask);
sigdelset(&temp.sa_mask, SIGVTALRM);
sigdelset(&temp.sa_mask, SIGINT );
sigaction(SIGVTALRM, &temp, NULL);
struct itimerval tv;
tv.it_value.tv_sec = 2; /* first time interval, seconds part */
tv.it_value.tv_usec = 0; /* first time interval, microseconds part */
tv.it_interval.tv_sec = 2; /* following time intervals, seconds part */
tv.it_interval.tv_usec = 0; /* following time intervals, microseconds part */
if (setitimer(ITIMER_VIRTUAL, &tv, NULL)){
perror(NULL);
}
while(1);
return 0;
}
All I want is that foo is called every 2 seconds (foo actually does other things than while(1); just assume a run of foo takes more than 2 seconds). After 2 seconds foo is indeed called, but then no other call is made until foo returns. I tried playing with the signal masks (hence the sigfillset), but simply calling signal(SIGVTALRM, foo) gives the same result. I also tried declaring the itimerval and sigaction variables outside main, and that did not change anything either.
is the thing I'm trying to do even possible?
thanks!
reference: <http://www.gnu.org/software/libc/manual/html_node/Signals-in-Handler.html>
24.4.4 Signals Arriving While a Handler Runs
What happens if another signal arrives while your signal handler function is running?
When the handler for a particular signal is invoked, that signal is automatically blocked until the handler returns. That means that if two signals of the same kind arrive close together, the second one will be held until the first has been handled. (The handler can explicitly unblock the signal using sigprocmask, if you want to allow more signals of this type to arrive; see Process Signal Mask.)
However, your handler can still be interrupted by delivery of another kind of signal. To avoid this, you can use the sa_mask member of the action structure passed to sigaction to explicitly specify which signals should be blocked while the signal handler runs. These signals are in addition to the signal for which the handler was invoked, and any other signals that are normally blocked by the process. See Blocking for Handler.
When the handler returns, the set of blocked signals is restored to the value it had before the handler ran. So using sigprocmask inside the handler only affects what signals can arrive during the execution of the handler itself, not what signals can arrive once the handler returns.
Portability Note: Always use sigaction to establish a handler for a signal that you expect to receive asynchronously, if you want your program to work properly on System V Unix. On this system, the handling of a signal whose handler was established with signal automatically sets the signal’s action back to SIG_DFL, and the handler must re-establish itself each time it runs. This practice, while inconvenient, does work when signals cannot arrive in succession. However, if another signal can arrive right away, it may arrive before the handler can re-establish itself. Then the second signal would receive the default handling, which could terminate the process.
reference:<http://www.gnu.org/software/libc/manual/html_node/Process-Signal-Mask.html#Process-Signal-Mask>
24.7.3 Process Signal Mask
The collection of signals that are currently blocked is called the signal mask. Each process has its own signal mask. When you create a new process (see Creating a Process), it inherits its parent’s mask. You can block or unblock signals with total flexibility by modifying the signal mask.
The prototype for the sigprocmask function is in signal.h.
Note that you must not use sigprocmask in multi-threaded processes, because each thread has its own signal mask and there is no single process signal mask. According to POSIX, the behavior of sigprocmask in a multi-threaded process is “unspecified”. Instead, use pthread_sigmask.
Function: int sigprocmask (int how, const sigset_t *restrict set, sigset_t *restrict oldset)
Preliminary: | MT-Unsafe race:sigprocmask/bsd(SIG_UNBLOCK) | AS-Unsafe lock/hurd | AC-Unsafe lock/hurd | See POSIX Safety Concepts.
The sigprocmask function is used to examine or change the calling process’s signal mask. The how argument determines how the signal mask is changed, and must be one of the following values:
SIG_BLOCK
Block the signals in set—add them to the existing mask. In other words, the new mask is the union of the existing mask and set.
SIG_UNBLOCK
Unblock the signals in set—remove them from the existing mask.
SIG_SETMASK
Use set for the mask; ignore the previous value of the mask.
The last argument, oldset, is used to return information about the old process signal mask. If you just want to change the mask without looking at it, pass a null pointer as the oldset argument. Similarly, if you want to know what’s in the mask without changing it, pass a null pointer for set (in this case the how argument is not significant). The oldset argument is often used to remember the previous signal mask in order to restore it later. (Since the signal mask is inherited over fork and exec calls, you can’t predict what its contents are when your program starts running.)
If invoking sigprocmask causes any pending signals to be unblocked, at least one of those signals is delivered to the process before sigprocmask returns. The order in which pending signals are delivered is not specified, but you can control the order explicitly by making multiple sigprocmask calls to unblock various signals one at a time.
The sigprocmask function returns 0 if successful, and -1 to indicate an error. The following errno error conditions are defined for this function:
EINVAL
The how argument is invalid.
You can’t block the SIGKILL and SIGSTOP signals, but if the signal set includes these, sigprocmask just ignores them instead of returning an error status.
Remember, too, that blocking program error signals such as SIGFPE leads to undesirable results for signals generated by an actual program error (as opposed to signals sent with raise or kill). This is because your program may be too broken to be able to continue executing to a point where the signal is unblocked again. See Program Error Signals.
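Applied to the question, the manual text suggests a handler can be re-entered only if it unblocks its own signal. The sketch below counts how deep the handler nests with and without that unblocking. The names (on_usr1, max_handler_nesting) are made up, SIGUSR1 and raise() stand in for the timer signal, and it assumes a single-threaded process with SIGUSR1 not blocked by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t depth, max_depth, reraised, unblock_self;

static void on_usr1(int signo) {
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    if (unblock_self) {
        /* Explicitly unblock our own signal, as the manual describes. */
        sigset_t self;
        sigemptyset(&self);
        sigaddset(&self, signo);
        sigprocmask(SIG_UNBLOCK, &self, NULL);
    }
    if (!reraised) {
        reraised = 1;
        raise(signo);   /* a second signal while the handler is running */
    }
    depth--;
}

/* Returns the deepest nesting level the handler reached. */
int max_handler_nesting(int unblock_in_handler) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    int ret = sigaction(SIGUSR1, &sa, &old);
    assert(ret == 0);

    depth = max_depth = reraised = 0;
    unblock_self = unblock_in_handler;
    raise(SIGUSR1);

    sigaction(SIGUSR1, &old, NULL);
    return max_depth;
}
```

Without the unblock, the second signal is held until the first handler returns, so the nesting depth stays at 1; with it, the handler is entered again while still running and the depth reaches 2. A re-entered handler must of course be written to tolerate that.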
I know that this has been answered and accepted already but I made tiny changes to the OP's question as follows in accordance with my comments and had a successful result (foo being called every 2 seconds, ad infinitum)
Note the addition of the memset of the temp variable and the change from SIGVTALRM to SIGALRM (and, with it, from ITIMER_VIRTUAL to ITIMER_REAL).
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* printf() is not async-signal-safe; it is kept only to mirror the question. */
void foo(int signo) {
    (void)signo;
    printf("hi\n");
}

int main(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    struct sigaction temp;
    memset(&temp, 0, sizeof(temp));
    temp.sa_handler = &foo;
    sigfillset(&temp.sa_mask);
    sigdelset(&temp.sa_mask, SIGALRM);
    sigdelset(&temp.sa_mask, SIGINT );
    sigaction(SIGALRM, &temp, NULL);
    struct itimerval tv;
    tv.it_value.tv_sec = 2; /* first time interval, seconds part */
    tv.it_value.tv_usec = 0; /* first time interval, microseconds part */
    tv.it_interval.tv_sec = 2; /* following time intervals, seconds part */
    tv.it_interval.tv_usec = 0; /* following time intervals, microseconds part */
    if (setitimer(ITIMER_REAL, &tv, NULL)){
        fprintf (stderr, "cannot start timer\n");
        perror(NULL);
    }
    while(1) {
        fprintf (stdout, "sleep 1\n");
        sleep (1);
    }
    return 0;
}
I want to use the select() function to wait for 1 second, as my program uses signals to control stuff, so sleep() would return prematurely. The weird thing is that when using select() it also returns prematurely.
I am calling select like this
struct timeval timeout;
timeout.tv_sec = 10;
timeout.tv_usec = 1000000;
select (0 ,NULL, NULL, NULL, &timeout);
but whenever a signal arrives, it returns (I am using a nano second timer for the signal)
Anyone knows why?
Try something like this:
struct timespec timeout;
timeout.tv_sec = 10;
timeout.tv_nsec = 0;
while (nanosleep(&timeout, &timeout) && errno == EINTR);
The "remaining time" pointer to nanosleep will take care of letting you restart the sleep with the necessary amount of remaining time if it gets interrupted.
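Here is a small self-contained sketch of that loop under a stream of signals. The 20 ms interval timer is only there to simulate the asker's signal source, and the names on_alarm and interrupted_sleep_ms are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t ticks = 0;

static void on_alarm(int signo) {
    (void)signo;
    ticks++;
}

/* Sleeps for `ms` milliseconds while SIGALRM keeps interrupting the
 * sleep every 20 ms. Returns the elapsed time in milliseconds. */
long interrupted_sleep_ms(long ms) {
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    int ret = sigaction(SIGALRM, &sa, &old_sa);
    assert(ret == 0);

    struct itimerval tick = { { 0, 20000 }, { 0, 20000 } }, old_timer;
    ret = setitimer(ITIMER_REAL, &tick, &old_timer);
    assert(ret == 0);

    struct timespec start, end;
    struct timespec left = { ms / 1000, (ms % 1000) * 1000000L };
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (nanosleep(&left, &left) == -1 && errno == EINTR)
        ;   /* `left` now holds the remaining time; sleep again */
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Put the previous timer and handler back. */
    setitimer(ITIMER_REAL, &old_timer, NULL);
    sigaction(SIGALRM, &old_sa, NULL);

    return (end.tv_sec - start.tv_sec) * 1000L +
           (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```

Even though every tick interrupts nanosleep(), the total time slept should not be shorter than what was requested.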
man 7 signal says:
Interruption of System Calls and Library Functions by Signal Handlers
If a signal handler is invoked while a system call or library function call is blocked, then either:
* the call is automatically restarted after the signal handler returns; or
* the call fails with the error EINTR.
Which of these two behaviors occurs depends on the interface and whether or not the signal handler was established using the SA_RESTART flag (see sigaction(2)). The details vary across UNIX systems; below, the details for Linux.
If a blocked call to one of the following interfaces is interrupted by a signal handler, then the call will be automatically restarted after the signal handler returns if the SA_RESTART flag was used; otherwise the call will fail with the error EINTR
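Generally, checking if the return value is -1 and errno == EINTR and then re-calling the function is the right way to correct for this. Note that on Linux select() is never restarted automatically, even when the handler was installed with SA_RESTART, so for select() the retry loop is always needed.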
Sockets on Linux question
I have a worker thread that is blocked on an accept() call. It simply waits for an incoming network connection, handles it, and then returns to listening for the next connection.
When it is time for the program to exit, how do I signal this network worker thread (from the main thread) to return from the accept() call while still being able to gracefully exit its loop and handle its cleanup code.
Some things I tried:
pthread_kill to send a signal. Feels kludgy to do this, plus it doesn't reliably allow the thread to do its shutdown logic. It also makes the program terminate. I'd like to avoid signals if at all possible.
pthread_cancel. Same as above. It's a harsh kill on the thread. That, and the thread may be doing something else.
Closing the listen socket from the main thread in order to make accept() abort. This doesn't reliably work.
Some constraints:
If the solution involves making the listen socket non-blocking, that is fine. But I don't want to accept a solution that involves the thread waking up via a select call every few seconds to check the exit condition.
The thread condition to exit may not be tied to the process exiting.
Essentially, the logic I am going for looks like this.
void* WorkerThread(void* args)
{
DoSomeImportantInitialization(); // initialize listen socket and some thread specific stuff
while (HasExitConditionBeenSet()==false)
{
listensize = sizeof(listenaddr);
int sock = accept(listensocket, &listenaddr, &listensize);
// check if exit condition has been set using thread safe semantics
if (HasExitConditionBeenSet())
{
break;
}
if (sock < 0)
{
printf("accept returned %d (errno==%d)\n", sock, errno);
}
else
{
HandleNewNetworkCondition(sock, &listenaddr);
}
}
DoSomeImportantCleanup(); // close listen socket, close connections, cleanup etc..
return NULL;
}
void SignalHandler(int sig)
{
printf("Caught CTRL-C\n");
}
void NotifyWorkerThreadToExit(pthread_t thread_handle)
{
// signal thread to exit
}
int main()
{
void* ptr_ret= NULL;
pthread_t workerthread_handle = 0;
pthread_create(&workerthread, NULL, WorkerThread, NULL);
signal(SIGINT, SignalHandler);
sleep((unsigned int)-1); // sleep until the user hits ctrl-c
printf("Returned from sleep call...\n");
SetThreadExitCondition(); // sets global variable with barrier that worker thread checks on
// this is the function I'm stalled on writing
NotifyWorkerThreadToExit(workerthread_handle);
// wait for thread to exit cleanly
pthread_join(workerthread_handle, &ptr_ret);
DoProcessCleanupStuff();
}
Close the socket using the shutdown() call. This will wake up any threads blocked on it, while keeping the file descriptor valid.
close() on a descriptor another thread B is using is inherently hazardous: another thread C may open a new file descriptor which thread B will then use instead of the closed one. dup2() a /dev/null onto it avoids that problem, but does not wake up blocked threads reliably.
Note that shutdown() only works on sockets -- for other kinds of descriptors you likely need the select+pipe-to-self or cancellation approaches.
You can use a pipe to notify the thread that you want it to exit. Then you can have a select() call which selects on both the pipe and the listening socket.
For example (compiles but not fully tested):
// NotifyPipe.h
#ifndef NOTIFYPIPE_H_INCLUDED
#define NOTIFYPIPE_H_INCLUDED
class NotifyPipe
{
int m_receiveFd;
int m_sendFd;
public:
NotifyPipe();
virtual ~NotifyPipe();
int receiverFd();
void notify();
};
#endif // NOTIFYPIPE_H_INCLUDED
// NotifyPipe.cpp
#include "NotifyPipe.h"
#include <unistd.h>
#include <assert.h>
#include <fcntl.h>
NotifyPipe::NotifyPipe()
{
int pipefd[2];
int ret = pipe(pipefd);
assert(ret == 0); // For real usage put proper check here
m_receiveFd = pipefd[0];
m_sendFd = pipefd[1];
fcntl(m_sendFd,F_SETFL,O_NONBLOCK);
}
NotifyPipe::~NotifyPipe()
{
close(m_sendFd);
close(m_receiveFd);
}
int NotifyPipe::receiverFd()
{
return m_receiveFd;
}
void NotifyPipe::notify()
{
write(m_sendFd,"1",1);
}
Then select with receiverFd(), and notify for termination using notify().
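The waiting side could look roughly like the C sketch below. The names wait_for_work, wake_reason and wait_for_work_demo are made up for illustration; the demo uses plain pipes as stand-ins for both the listening socket and the notify pipe, since select() treats them alike.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

enum wake_reason { WAKE_ERROR = -1, WAKE_STOP = 0, WAKE_CONNECTION = 1 };

/* Blocks until listen_fd is readable (a connection can be accepted)
 * or notify_fd is readable (another thread asked us to stop). */
enum wake_reason wait_for_work(int listen_fd, int notify_fd) {
    for (;;) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(listen_fd, &fds);
        FD_SET(notify_fd, &fds);
        int maxfd = listen_fd > notify_fd ? listen_fd : notify_fd;

        int ready = select(maxfd + 1, &fds, NULL, NULL, NULL);
        if (ready == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return WAKE_ERROR;
        }
        /* Check the stop request first so shutdown wins over new work. */
        if (FD_ISSET(notify_fd, &fds))
            return WAKE_STOP;
        if (FD_ISSET(listen_fd, &fds))
            return WAKE_CONNECTION;
    }
}

/* Self-check with two pipes standing in for the real descriptors. */
int wait_for_work_demo(void) {
    int fake_listen[2], notify[2];
    int ret = pipe(fake_listen);
    assert(ret == 0);
    ret = pipe(notify);
    assert(ret == 0);

    ssize_t n = write(notify[1], "1", 1);   /* what notify() does */
    assert(n == 1);
    enum wake_reason why = wait_for_work(fake_listen[0], notify[0]);

    close(fake_listen[0]);
    close(fake_listen[1]);
    close(notify[0]);
    close(notify[1]);
    return why;
}
```

In the worker thread from the question, the loop would call accept() only when wait_for_work() returns WAKE_CONNECTION and fall through to its cleanup code on WAKE_STOP.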
Close the listening socket and accept will return an error.
What doesn't reliably work with this? Describe the problems you're facing.
Using pthread_cancel to cancel a thread blocked in accept() is risky if the pthread implementation does not implement cancellation properly: if the thread has just created a socket and pthread_cancel() is called for it just before accept() returns to your code, the thread is canceled and the newly created socket is leaked. FreeBSD 9.0 and later do not have this race condition, but you should check your OS first.