I tried to find an answer to my question in this post: "Signal handler and waitpid coexisting", but it isn't very clear to me yet.
Let me try to explain my problem:
I'm trying to write a C program that concerns IPC between a parent process and its children.
The parent process creates N child processes, then waits for them to terminate in a loop like this:
while((pid_term = waitpid(-1, &status, 0)) != -1)
After X seconds the parent receives SIGALRM, which it catches with a handler installed through the sigaction system call:
struct sigaction act;
act.sa_handler = alarmHandler;
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
sigaction(SIGALRM, &act, NULL);
But when the handler function returns, waitpid also returns -1, and the parent process exits the while loop above.
At the moment, the handler function has an empty body.
I wonder what happened. Why did waitpid() return -1 after the handler ran, even though most of the children are still alive? And why doesn't this happen with the signal() function?
The default behavior of signal handlers established by sigaction is to interrupt blocking system calls; if you check errno after waitpid() returns -1, you should find it set to EINTR. This behavior is almost never what you want; it is the default only for backward compatibility. You can turn it off by setting the SA_RESTART bit in sa_flags:
struct sigaction act;
act.sa_flags = SA_RESTART;
act.sa_handler = alarmHandler;
sigemptyset(&act.sa_mask);
sigaction(SIGALRM, &act, 0);
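A minimal sketch of both fixes, assuming a POSIX system. install_alarm_handler() and reap_all_children() are hypothetical names, and alarmHandler here is just an empty stand-in for the question's handler. Passing restart = 1 sets SA_RESTART as above; independently of that, the reaping loop treats EINTR as "a handler ran, try again" and stops only on ECHILD, so it is correct with or without SA_RESTART.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical empty handler standing in for the question's alarmHandler. */
static void alarmHandler(int signo)
{
    (void)signo;
}

/* Install alarmHandler for SIGALRM; restart != 0 adds SA_RESTART.
   Returns 0 on success, -1 on error (errno set by sigaction). */
static int install_alarm_handler(int restart)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = alarmHandler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(SIGALRM, &act, NULL);
}

/* Reap every child and return how many were reaped. A -1 from waitpid()
   with errno == EINTR only means a signal handler ran, so we retry;
   any other -1 (normally ECHILD: no children left) ends the loop. */
static int reap_all_children(void)
{
    int reaped = 0;
    int status;
    for (;;) {
        pid_t pid = waitpid(-1, &status, 0);
        if (pid > 0)
            reaped++;
        else if (pid == -1 && errno == EINTR)
            continue;           /* interrupted by a handler: wait again */
        else
            break;              /* ECHILD: nothing left to wait for */
    }
    return reaped;
}
```

For example, after install_alarm_handler(0) and alarm(1), two children that sleep for 2 seconds are both still reaped, even though the alarm interrupts waitpid() once.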
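One of the most important reasons to use sigaction instead of signal is that, with signal, it is unpredictable whether the signal handler will interrupt blocking system calls. (The System V lineage picked one semantic and the BSD lineage picked the other.) With glibc on Linux, signal() provides the BSD semantics by default (the handler is installed with SA_RESTART), which is why the loop keeps going when signal() is used.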
Related
I'm writing my own echo server using sockets and syscalls. I use epoll to serve many clients at the same time, and all operations on clients are nonblocking. When the server is on and idle, it sits in epoll_wait. Now I want to add the possibility of shutting the server down with signals. For example, I start the server in a bash terminal, press Ctrl-C, and the server somehow handles SIGINT. My plan is to use signalfd. I create a new signalfd and add it to the epoll instance with the following code:
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGINT);
signal_fd = signalfd(-1, &mask, 0);
epoll_event event;
event.data.fd = signal_fd;
event.events = EPOLLIN;
epoll_ctl(fd, EPOLL_CTL_ADD, signal_fd, &event);
I expect that when epoll is waiting and I press Ctrl-C, an event occurs on epoll, epoll_wait wakes up, and then I handle the signal with the following code:
if (events[i].data.fd == signal_fd)
{
//do something
exit(0);
}
In reality, though, the server just stops without handling the signal. What am I doing wrong, and what is the correct way to solve this? And if I'm misunderstanding signals, where should one use signalfd?
epoll_wait returns -1 and errno == EINTR when it is interrupted by a signal. In this case you need to read from signal_fd.
Set the signal handler for your signals to SIG_IGN, otherwise signals may terminate your application.
See signal(7):
The following interfaces are never restarted after being interrupted by
a signal handler, regardless of the use of SA_RESTART; they always fail
with the error EINTR when interrupted by a signal handler:
File descriptor multiplexing interfaces: epoll_wait(2),
epoll_pwait(2), poll(2), ppoll(2), select(2), and pselect(2).
Signal handlers are per process. You left the signal disposition at its default, which is to terminate the process.
So you need to add something like this:
struct sigaction action;
std::memset(&action, 0, sizeof(struct sigaction));
action.sa_handler = your_handler;
sigaction(signum, &action, NULL);
for each signum that your application should receive interrupts for. Also check the return value of sigaction. In my experience, if you use SIG_IGN as the handler, a signal sent from "outside" still interrupts a system call like epoll_pwait, but this does not work when you try to wake up a thread from within the program by sending the signal directly to that thread with pthread_kill.
Next, block these signals in every thread, so that by default no thread receives them (otherwise an arbitrary thread is picked to handle the signal). The easiest way to do that is in main before creating any threads, because a new thread inherits the signal mask of the thread that creates it.
For example,
sigset_t all_signals;
sigemptyset(&all_signals);
sigaddset(&all_signals, signum); // Repeat for each signum that you use.
sigprocmask(SIG_BLOCK, &all_signals, NULL);
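A small sketch that checks the inheritance claim, assuming POSIX threads. block_then_spawn() and report_mask() are hypothetical names. It uses pthread_sigmask() rather than sigprocmask(): the two behave the same while the process is still single-threaded, but POSIX leaves sigprocmask() unspecified once there are several threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Thread body: stores 1 in *arg if SIGINT is blocked in this thread's
   signal mask (inherited from its creator), 0 otherwise. */
static void *report_mask(void *arg)
{
    int *blocked = arg;
    sigset_t current;
    pthread_sigmask(SIG_SETMASK, NULL, &current);   /* NULL set: query only */
    *blocked = sigismember(&current, SIGINT);
    return NULL;
}

/* Block SIGINT and SIGTERM in the calling thread (do this in main before
   any other thread exists), then start a thread and report whether it
   inherited the block: 1 = yes, 0 = no, -1 = error. */
static int block_then_spawn(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    if (pthread_sigmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;

    int blocked = -1;
    pthread_t tid;
    if (pthread_create(&tid, NULL, report_mask, &blocked) != 0)
        return -1;
    pthread_join(tid, NULL);
    return blocked;
}
```

A thread that should handle the signal can later call pthread_sigmask(SIG_UNBLOCK, ...) on itself.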
And then unblock the signals per thread when you want that thread to receive the signal.
If you use signalfd, you do not want to unblock them at all: signalfd does not unblock anything, and the signals must stay blocked so that they remain pending and can be read from the descriptor. Just pass the same mask to signalfd (see also the signalfd(2) man page).
epoll_pwait works differently; as with pselect, the signal you are interested in is unblocked only while waiting. You set a handler for that signal (see above) that sets a flag. Then, just before calling epoll_pwait, you block the signal, test the flag and handle it, and then call epoll_pwait with a mask in which the signal is unblocked, without unblocking it yourself first; epoll_pwait installs that mask atomically for the duration of the wait only. After epoll_pwait returns (the signal is blocked again at that point), you can unblock the signal so that your handler can be called again.
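A minimal sketch of that epoll_pwait pattern, assuming Linux. setup_pwait(), wait_once(), on_signal() and got_signal are hypothetical names, and this variant simply keeps the signal blocked all the time outside epoll_pwait. Because the signal is blocked while the flag is tested, a signal that arrives in between stays pending, and epoll_pwait delivers it immediately (returning -1 with EINTR) instead of sleeping, so no wakeup is lost.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/epoll.h>

/* Set by the handler; volatile sig_atomic_t is safe to write from one. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the flag-setting handler for signo and block signo.
   *wait_mask receives the previous mask, in which signo is unblocked
   (assuming it was not blocked before); pass it to epoll_pwait.
   Returns 0 on success, -1 on error. */
static int setup_pwait(int signo, sigset_t *wait_mask)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, signo);
    return sigprocmask(SIG_BLOCK, &block, wait_mask);
}

/* One event-loop step. The caller checks got_signal afterwards.
   Returns 0 without waiting if the flag is already set, else the
   epoll_pwait() result (ready events, or -1 with errno == EINTR when
   the handler ran during the wait). */
static int wait_once(int epfd, struct epoll_event *events, int max,
                     int timeout_ms, const sigset_t *wait_mask)
{
    if (got_signal)             /* signo is blocked here: no race */
        return 0;
    /* signo is unblocked only inside this call; the blocked mask is
       restored when it returns. */
    return epoll_pwait(epfd, events, max, timeout_ms, wait_mask);
}
```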
You have to block all the signals you want to handle with your signal-FD before you create that signal-FD. Otherwise those signals are still delivered normally while you are blocked in epoll_wait(), and their default action (for SIGINT and SIGTERM, terminating the process) takes effect, as you observed.
See also the signalfd(2) man page:
Normally, the set of signals to be received via the file descriptor
should be blocked using sigprocmask(2), to prevent the signals being
handled according to their default dispositions.
Thus, you have to change your example like this:
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGINT);
int r = sigprocmask(SIG_BLOCK, &mask, 0);
if (r == -1) {
// XXX handle errors
}
signal_fd = signalfd(-1, &mask, 0);
if (signal_fd == -1) {
// XXX handle errors
}
epoll_event event;
event.data.fd = signal_fd;
event.events = EPOLLIN;
r = epoll_ctl(fd, EPOLL_CTL_ADD, signal_fd, &event);
if (r == -1) {
// XXX handle errors
}
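To round this out, here is a sketch of the whole signalfd path in C, assuming Linux: block first, create the descriptor, register it with epoll, and consume a struct signalfd_siginfo when epoll reports it readable. add_signal_fd() and read_signal() are hypothetical names, and SFD_NONBLOCK | SFD_CLOEXEC is a choice made here, not something the question requires. In a multithreaded server, the block should happen in main before any threads are created (or use pthread_sigmask).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block SIGINT and SIGTERM, create a signalfd for them and add it to the
   epoll instance epfd. Returns the signalfd, or -1 on error. */
static int add_signal_fd(int epfd)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGTERM);
    sigaddset(&mask, SIGINT);
    /* Block first, so the signals stay pending for the fd instead of
       running their default action (terminating the process). */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;

    int sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    if (sfd == -1)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sfd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev) == -1) {
        close(sfd);
        return -1;
    }
    return sfd;
}

/* Call when epoll reports sfd readable: dequeue one pending signal and
   return its number (e.g. SIGINT), or -1 if nothing could be read. */
static int read_signal(int sfd)
{
    struct signalfd_siginfo info;
    ssize_t n = read(sfd, &info, sizeof info);
    if (n != (ssize_t)sizeof info)
        return -1;
    return (int)info.ssi_signo;
}
```

In the event loop, the `events[i].data.fd == signal_fd` branch from the question would call read_signal() and then shut down.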
I'm supposed to install a signal handler that calls the function stopContinue() when I receive SIGINT. Here is my code snippet in C, but I'm not sure if this is correct. Please let me know where I'm going wrong.
struct sigaction act;
memset(&act, '\0', sizeof(act));
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
act.sa_handler = stopContinue;
sigaction(SIGINT, &act, NULL);
Does this look roughly correct?
This is not a duplicate: the discussion here about using sigaction vs. signal, and about the use of flags, is different from the other question.
Why use sigaction if you're not even going to use the struct's sa_flags member? To call stopContinue when Ctrl+C is pressed, you can use:
signal(SIGINT, stopContinue);
If you simply want to ignore Ctrl+C, you can make it even easier:
signal(SIGINT, SIG_IGN);
SIG_IGN is a macro to ignore signals.
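If you do stay with sigaction, a sketch of its counterpart to signal(SIGINT, stopContinue) might look like the following, assuming a POSIX system. install_sigint_handler() and the interrupted flag are hypothetical, and stopContinue() here only records that SIGINT arrived, since very little is safe to do inside a handler. Spelling out SA_RESTART makes explicit the restart behaviour that signal() leaves to the C library.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: set by the handler, checked by the main program. */
static volatile sig_atomic_t interrupted = 0;

/* Hypothetical stand-in for the question's stopContinue(). */
static void stopContinue(int signo)
{
    (void)signo;
    interrupted = 1;
}

/* sigaction() counterpart of signal(SIGINT, stopContinue).
   Returns 0 on success, -1 on error (errno set by sigaction). */
static int install_sigint_handler(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = stopContinue;
    sigemptyset(&act.sa_mask);    /* block nothing extra while it runs */
    act.sa_flags = SA_RESTART;    /* restart interrupted system calls */
    return sigaction(SIGINT, &act, NULL);
}
```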
Context is this Redis issue. We have a wait3() call that waits for the AOF rewriting child to create the new AOF version on disk. When the child is done, the parent is notified via wait3() in order to substitute the old AOF with the new one.
However, in the context of the above issue, the user reported a bug to us. I slightly modified the Redis 3.0 implementation so that it clearly logs when wait3() returns -1, instead of crashing because of this unexpected condition. This is apparently what happens:
wait3() is called when we have pending children to wait for.
SIGCHLD should be set to SIG_DFL: there is no code in Redis that sets this signal at all, so it has the default behavior.
When the first AOF rewrite happens, wait3() successfully works as expected.
Starting from the second AOF rewrite (the second child created), wait3() starts to return -1.
AFAIK, with the current code we cannot call wait3() while there are no pending children, since when the AOF child is created we set server.aof_child_pid to its pid, and we reset it only after a successful wait3() call.
So wait3() should have no reason to fail with -1 and ECHILD, but it does, so probably the zombie child is not created for some unexpected reason.
Hypothesis 1: Is it possible that Linux, under certain odd conditions, discards the zombie child, for example because of memory pressure? That does not look reasonable, since the zombie only has metadata attached to it, but who knows.
Note that we call wait3() with WNOHANG. And given that SIGCHLD is set to SIG_DFL by default, the only condition that should make it fail with -1 and ECHILD is that no zombie is available to report.
Hypothesis 2: Another possibility, although there is no explanation for why it would happen, is that after the first child dies the SIGCHLD handler is set to SIG_IGN, causing wait3() to return -1 and ECHILD.
Hypothesis 3: Is there some way to remove zombie children externally? Maybe this user has some kind of script that removes zombie processes in the background, so that the information is no longer available to wait3()? To my knowledge, it should never be possible to remove a zombie if the parent does not wait for it (with waitpid or by handling the signal) and SIGCHLD is not ignored, but maybe there is some Linux-specific way.
Hypothesis 4: There is actually some bug in the Redis code so that we successfully wait3() the child the first time without correctly resetting the state, and later we call wait3() again and again but there are no longer zombies, so it returns -1. Analyzing the code it looks impossible, but maybe I'm wrong.
Another important thing: we never observed this in the past. It apparently only happens on this specific Linux system.
UPDATE: Yossi Gottlieb proposed that the SIGCHLD is received by another thread in the Redis process for some reason (does not happen normally, only on this system). We already mask SIGALRM in bio.c threads, perhaps we could try masking SIGCHLD from I/O threads as well.
Appendix: selected parts of Redis code
Where wait3() is called:
/* Check if a background saving or AOF rewrite in progress terminated. */
if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
    int statloc;
    pid_t pid;
    if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
        int exitcode = WEXITSTATUS(statloc);
        int bysignal = 0;
        if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);
        if (pid == -1) {
            redisLog(LOG_WARNING,"wait3() returned an error: %s. "
                "rdb_child_pid = %d, aof_child_pid = %d",
                strerror(errno),
                (int) server.rdb_child_pid,
                (int) server.aof_child_pid);
        } else if (pid == server.rdb_child_pid) {
            backgroundSaveDoneHandler(exitcode,bysignal);
        } else if (pid == server.aof_child_pid) {
            backgroundRewriteDoneHandler(exitcode,bysignal);
        } else {
            redisLog(REDIS_WARNING,
                "Warning, detected child with unmatched pid: %ld",
                (long)pid);
        }
        updateDictResizePolicy();
    }
} else {
Selected parts of backgroundRewriteDoneHandler:
void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
    if (!bysignal && exitcode == 0) {
        int newfd, oldfd;
        char tmpfile[256];
        long long now = ustime();
        mstime_t latency;
        redisLog(REDIS_NOTICE,
            "Background AOF rewrite terminated with success");
        /* ... more code to handle the rewrite, never calls return ... */
    } else if (!bysignal && exitcode != 0) {
        server.aof_lastbgrewrite_status = REDIS_ERR;
        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated with error");
    } else {
        server.aof_lastbgrewrite_status = REDIS_ERR;
        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated by signal %d", bysignal);
    }
cleanup:
    aofClosePipes();
    aofRewriteBufferReset();
    aofRemoveTempFile(server.aof_child_pid);
    server.aof_child_pid = -1;
    server.aof_rewrite_time_last = time(NULL)-server.aof_rewrite_time_start;
    server.aof_rewrite_time_start = -1;
    /* Schedule a new rewrite if we are waiting for it to switch the AOF ON. */
    if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
        server.aof_rewrite_scheduled = 1;
}
As you can see, all code paths must execute the cleanup code that resets server.aof_child_pid to -1.
Errors logged by Redis during the issue
21353:C 29 Nov 04:00:29.957 * AOF rewrite: 8 MB of memory used by copy-on-write
27848:M 29 Nov 04:00:30.133 ^# wait3() returned an error: No child processes. rdb_child_pid = -1, aof_child_pid = 21353
As you can see aof_child_pid is not -1.
TLDR: you are currently relying on unspecified behaviour of signal(2); use sigaction (carefully) instead.
Firstly, SIGCHLD is strange. From the manual page for sigaction:
POSIX.1-1990 disallowed setting the action for SIGCHLD to SIG_IGN. POSIX.1-2001 allows this possibility, so that ignoring SIGCHLD can be used to prevent the creation of zombies (see wait(2)). Nevertheless, the historical BSD and System V behaviors for ignoring SIGCHLD differ, so that the only completely portable method of ensuring that terminated children do not become zombies is to catch the SIGCHLD signal and perform a wait(2) or similar.
And here's the bit from wait(2)'s manual page:
POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see sigaction(2)), then children that terminate do not become zombies and a call to wait() or waitpid() will block until all children have terminated, and then fail with errno set to ECHILD. (The original POSIX standard left the behavior of setting SIGCHLD to SIG_IGN unspecified. Note that even though the default disposition of SIGCHLD is "ignore", explicitly setting the disposition to SIG_IGN results in different treatment of zombie process children.) Linux 2.6 conforms to this specification. However, Linux 2.4 (and earlier) does not: if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored, that is, the call blocks until the next child terminates and then returns the process ID and status of that child.
Note that the effect of this is that if SIGCHLD is handled as though SIG_IGN were set, then (under Linux 2.6+) you will see exactly the behaviour you are seeing, i.e. wait() returns -1 with ECHILD because the child has already been reaped automatically.
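A small sketch that reproduces this on Linux 2.6+, assuming a POSIX system; fork_and_wait() is a hypothetical helper. With SIGCHLD explicitly set to SIG_IGN, the child is reaped by the kernel and waitpid() fails with ECHILD; with SIG_DFL, the same child becomes a zombie and waitpid() returns its pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set SIGCHLD's disposition to handler (SIG_IGN or SIG_DFL), fork a child
   that exits immediately, and wait for it. Returns the waitpid() result;
   *err receives errno on failure, 0 on success. */
static pid_t fork_and_wait(void (*handler)(int), int *err)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    *err = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1) {
        *err = errno;
        return -1;
    }

    pid_t child = fork();
    if (child == -1) {
        *err = errno;
        return -1;
    }
    if (child == 0)
        _exit(0);                 /* child: terminate right away */

    int status;
    pid_t r = waitpid(child, &status, 0);
    if (r == -1)
        *err = errno;             /* ECHILD when SIGCHLD is SIG_IGN */
    return r;
}
```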
Secondly, signal handling with pthreads (which I think you are using here) is notoriously hard. The way it's meant to work (as I'm sure you know) is that process-directed signals are sent to an arbitrary thread within the process that has the signal unmasked. But whilst each thread has its own signal mask, the signal action (handler) is process-wide.
Putting these two things together, I think you are running across a problem I've run across before. I have had problems getting SIGCHLD handling to work with signal() (which is fair enough as that was deprecated prior to pthreads), which were fixed by moving to sigaction and carefully setting per thread signal masks. My conclusion at the time was that the C library was emulating (with sigaction) what I was telling it to do with signal(), but was getting tripped up by pthreads.
Note that you are currently relying on unspecified behaviour. From the manual page of signal(2):
The effects of signal() in a multithreaded process are unspecified.
Here's what I recommend you do:
Move to sigaction() and pthread_sigmask(). Explicitly set the handling of all the signals you care about (even if you think that's the current default), even when setting them to SIG_IGN or SIG_DFL. I block signals whilst I do this (possibly an overabundance of caution, but I copied the example from somewhere).
Here's what I am doing (roughly):
sigset_t set;
struct sigaction sa;
/* block all signals */
sigfillset (&set);
pthread_sigmask (SIG_BLOCK, &set, NULL);
/* Set up the structure to specify the new action. */
memset (&sa, 0, sizeof (struct sigaction));
sa.sa_handler = handlesignal; /* signal handler for INT, TERM, HUP, USR1, USR2 */
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGINT, &sa, NULL);
sigaction (SIGTERM, &sa, NULL);
sigaction (SIGHUP, &sa, NULL);
sigaction (SIGUSR1, &sa, NULL);
sigaction (SIGUSR2, &sa, NULL);
sa.sa_handler = SIG_IGN;
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGPIPE, &sa, NULL); /* I don't care about SIGPIPE */
sa.sa_handler = SIG_DFL;
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGCHLD, &sa, NULL); /* I want SIGCHLD to be handled by SIG_DFL */
pthread_sigmask (SIG_UNBLOCK, &set, NULL);
Where possible set all your signal handlers and masks etc. prior to any pthread operations. Where possible do not change signal handlers and masks (you might need to do this prior to and subsequent to fork() calls).
If you need a signal handler for SIGCHLD (rather than relying on SIG_DFL), if possible let it be received by any thread, and use the self-pipe method or similar to alert the main program.
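A minimal sketch of the self-pipe method for SIGCHLD, assuming a POSIX system; self_pipe, on_sigchld(), setup_self_pipe() and wait_for_children() are hypothetical names. The handler only writes one byte to a non-blocking pipe (write() is async-signal-safe); the main loop reads from the pipe (or could add its read end to poll/epoll) and then reaps every exited child with WNOHANG, since several SIGCHLDs may collapse into one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };

/* SIGCHLD handler: write one byte and nothing else. errno is saved
   because write() may change it under the interrupted code. */
static void on_sigchld(int signo)
{
    (void)signo;
    int saved = errno;
    char b = 'c';
    ssize_t r = write(self_pipe[1], &b, 1);  /* pipe full: wakeup already pending */
    (void)r;
    errno = saved;
}

/* Create the pipe (write end non-blocking so the handler never blocks)
   and install the handler. Returns 0 on success, -1 on error. */
static int setup_self_pipe(void)
{
    if (pipe(self_pipe) == -1)
        return -1;
    int flags = fcntl(self_pipe[1], F_GETFL);
    if (flags == -1 || fcntl(self_pipe[1], F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;  /* only exits, not stops */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Main-program side: block until the handler has written, then reap all
   exited children without blocking. Returns the number reaped. */
static int wait_for_children(void)
{
    char buf[64];
    ssize_t n;
    do {
        n = read(self_pipe[0], buf, sizeof buf);
    } while (n == -1 && errno == EINTR);

    int reaped = 0, status;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```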
If you must have threads that do/don't handle certain signals, try to restrict yourself to pthread_sigmask in the relevant thread rather than sig* calls.
Just in case you run headlong into the next issue I ran into: after you have fork()'d, set up the signal handling again from scratch in the child, rather than relying on whatever it inherits from the parent process. If there's one thing worse than signals mixed with pthreads, it's signals mixed with pthreads and fork().
Note that I cannot explain exactly why change (1) works, but it has fixed what looks like a very similar issue for me, and the old code was, after all, relying on something that was 'unspecified'. It's closest to your 'hypothesis 2', but I think it is really an incomplete emulation of the legacy signal functions (specifically, emulating the previously racy behaviour of signal(), which is what caused it to be replaced by sigaction() in the first place; but this is just a guess).
Incidentally, I suggest you use wait4() or (as you aren't using rusage) waitpid() rather than wait3(), so you can specify a specific PID to wait for. If you have something else that generates children (I've had a library do it), you may end up waiting for the wrong thing. That said, I don't think that's what's happening here.
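A sketch of what that could look like, assuming a POSIX system; poll_child() is a hypothetical helper mirroring the exitcode/bysignal logic of the Redis snippet above, but restricted to one pid so children created elsewhere (for example by a library) are left alone. It also only reads WEXITSTATUS when WIFEXITED is true.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork(), _exit() for the caller that creates pid */

/* Non-blocking check on one specific child. Returns 1 and fills
   *exitcode / *bysignal if pid has terminated, 0 if it is still
   running, -1 on error (e.g. ECHILD: pid is not our child). */
static int poll_child(pid_t pid, int *exitcode, int *bysignal)
{
    int statloc;
    pid_t r = waitpid(pid, &statloc, WNOHANG);
    if (r == 0)
        return 0;                 /* still running */
    if (r == -1)
        return -1;
    *exitcode = WIFEXITED(statloc) ? WEXITSTATUS(statloc) : 0;
    *bysignal = WIFSIGNALED(statloc) ? WTERMSIG(statloc) : 0;
    return 1;
}
```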
I have the following program, in which I set the parent's process group and the child's process group, and give terminal control to the parent. Then I run "cat" in the "background" child, which is supposed to generate SIGTTIN. However, the printf line in sighandler is not printed. Any ideas on how to properly detect SIGTTIN in this case?
void sighandler(int signo){
printf("SIGTTIN detected\n");
}
int main() {
int status;
pid_t pid;
pid = fork ();
setpgid(0,0);
tcsetpgrp (STDIN_FILENO, 0);
signal(SIGTTIN, sighandler);
if (pid == 0)
{
setpgid(0,0);
execl ("cat", NULL);
_exit (EXIT_FAILURE);
}
else{
int status;
setpgid(pid,pid);
waitpid(-1, &status, 0);
}
return status;
}
Mariska,
For Parent Processes
As explained in the Stack Overflow post titled, "Catch Ctrl-C in C,":
The behavior of signal() varies across UNIX versions, and has also
varied historically across different versions of Linux. Avoid its use:
use sigaction(2) instead.
As described in the Linux Programmer's Manual, you should use sigaction():
The sigaction() system call is used to change the action taken by a
process on receipt of a specific signal.
Try this:
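#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
static void handler(int signum)
{
    (void)signum;
    /* Take appropriate actions for signal delivery.
       printf() is not async-signal-safe; write() is. */
    static const char msg[] = "SIGTTIN detected\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);
}
int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* Restart functions if
                                 interrupted by handler */
    if (sigaction(SIGTTIN, &sa, NULL) == -1) {
        perror("sigaction");
        return 1;
    }
    /* Further code */
    return 0;
}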
For Child Processes
There are a couple of points you should know when dealing with signal handlers for the child processes:
A forked child inherits the signal handlers from the parent
Because of the above, you need to implement some sort of signal handler for the parent and then change the signal handler before and after executing a child.
As explained in the Linux Programmer's Manual:
All process attributes are preserved during an execve(), except the following:
a. The set of pending signals is cleared (sigpending(2)).
b. The dispositions of any signals that are being caught are
reset to the default (signal(7)).
c. Any alternate signal stack is not preserved (sigaltstack(2)).
Thus, the exec() functions do not preserve signal handlers.
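Both points can be checked with a short sketch, assuming a POSIX system with /bin/sh; child_status() and on_usr1() are hypothetical names, and SIGUSR1 stands in for SIGTTIN because it can be tested without a terminal. Without exec, the forked child runs the inherited handler (which exits with the marker status 42); after exec, the caught signal is back to SIG_DFL, so the shell that signals itself is killed by SIGUSR1.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_usr1(int signo)
{
    (void)signo;
    _exit(42);   /* _exit() is async-signal-safe; 42 means "handler ran" */
}

/* Install on_usr1, fork, and have the child receive SIGUSR1, either
   directly (use_exec == 0: inherited handler runs) or after exec'ing
   /bin/sh (use_exec == 1: disposition was reset to the default).
   Returns the raw waitpid() status, or -1 on error. */
static int child_status(int use_exec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (use_exec)
            execl("/bin/sh", "sh", "-c", "kill -s USR1 $$", (char *)NULL);
        else
            raise(SIGUSR1);
        _exit(1);   /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```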
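From the above: a signal generated by the terminal, such as SIGINT from Ctrl-C, is sent to every process in the terminal's foreground process group; it is not delivered to the parent first and then propagated to the children. Likewise, SIGTTIN is sent to the background process group that tries to read from the terminal, here the child running cat, and after exec() that child no longer has the parent's handler. So a handler installed in the parent is not what runs when cat reads from the terminal, which is why we need to think about the signal handler on the child's side.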
Please let me know if you have any questions!
In Android, the bionic loader sets a default signal handler for every process on startup:
void debugger_init()
{
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_sigaction = debugger_signal_handler;
act.sa_flags = SA_RESTART | SA_SIGINFO;
sigemptyset(&act.sa_mask);
sigaction(SIGILL, &act, NULL);
sigaction(SIGABRT, &act, NULL);
sigaction(SIGBUS, &act, NULL);
sigaction(SIGFPE, &act, NULL);
sigaction(SIGSEGV, &act, NULL);
sigaction(SIGSTKFLT, &act, NULL);
sigaction(SIGPIPE, &act, NULL);
}
I would like to set them back to their defaults, meaning I want these handlers ignored so that the default action takes place (a core dump).
How do I revert what this function did? I want everything to be as if the above function had never been called.
Read signal(7), sigaction(2) and perhaps signal(2).
You could call
signal(SIGILL, SIG_DFL);
signal(SIGABRT, SIG_DFL);
and so on early in your main (which is entered after dynamic loading)
You could also use sigaction with sa_handler set to SIG_DFL
Of course, things are more tricky if you want to default handle these signals before your main, e.g. in some static constructor!
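A sketch of the sigaction route, assuming Linux; restore_default_actions() and caught_signals are hypothetical names, with the list copied from debugger_init() above. Building the struct from scratch with memset (rather than copying another signal's action) also clears the SA_SIGINFO and SA_RESTART flags that debugger_init() set.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* The signals debugger_init() catches; SIGSTKFLT is Linux-specific. */
static const int caught_signals[] = {
    SIGILL, SIGABRT, SIGBUS, SIGFPE, SIGSEGV,
#ifdef SIGSTKFLT
    SIGSTKFLT,
#endif
    SIGPIPE,
};

/* Reset each listed signal to SIG_DFL, undoing the startup handlers.
   Call early in main(). Returns 0 on success, otherwise the number of
   the first signal that could not be reset. */
static int restore_default_actions(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);        /* no handler flags, no extra mask */
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof caught_signals / sizeof caught_signals[0]; i++) {
        if (sigaction(caught_signals[i], &sa, NULL) == -1)
            return caught_signals[i];
    }
    return 0;
}
```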
I found that mixing sigaction and signal to set handlers in one process can lead to unexpected behavior.
From signal(2), linked above (it wouldn't surprise me if this warning wasn't there 8 years ago):
WARNING: the behavior of signal() varies across UNIX versions,
and has also varied historically across different versions of
Linux. Avoid its use: use sigaction(2) instead.
Looking at https://docs.oracle.com/cd/E19455-01/806-5257/tlib-49639/index.html
int pthread_sigmask(int how, const sigset_t *new, sigset_t *old);
When the value of new is NULL, the value of how is not significant and the signal mask of the thread is unchanged. So, to inquire about currently blocked signals, assign a NULL value to the new argument.
So I guess you could use that to get the current sigmask and just wipe each one
sigset_t tempSet;
pthread_sigmask(SIG_SETMASK, NULL, &tempSet);
sigdelset(&tempSet, /*Signal you don't want to handle*/);
sigdelset(&tempSet, /*repeat for each signal*/);
pthread_sigmask(SIG_SETMASK, &tempSet, NULL);
It's much the same with sigaction for querying the current action for a signal; from sigaction(2):
sigaction() can be called with a NULL second argument to query
the current signal handler.
The ramifications of using SIGKILL in the first call to sigaction, as I do here, are not clear to me:
struct sigaction sigAct;
sigaction(SIGKILL, NULL, &sigAct);
sigAct.sa_handler = SIG_DFL; // Ensure default handling of Kill signal
sigaction(/*Signal you don't want to handle*/, &sigAct, NULL);
sigaction(/*repeat for each signal*/, &sigAct, NULL);
siggetmask is made obsolete by sigprocmask, and the behaviour of sigprocmask in a multithreaded process is unspecified, so it is only suitable for single-threaded programs; use pthread_sigmask there.