I have 2 programs: 1) Father 2) Child.
When Father receives a SIGINT (Ctrl-C), its handler sends a SIGTERM to the child. The problem is that often (not always, and I don't know why) the following error is printed in a loop after the SIGINT:
Invalid Argument
The goal of the father is to create a child and then just stay alive, ready to handle SIGINT.
Father
#include "library.h"
static void handler();
int main(int argc, char* argv[]){
int value, que_id;
char str_que_id[10], **child_arg;
pid_t child_pid;
struct sigaction int_sa;
//Create message queue
do{
que_id = msgget(IPC_PRIVATE, ALL_PERM | IPC_CREAT);
}while(que_id == -1);
snprintf(str_que_id, sizeof(str_que_id), "%d", que_id);
//Set arguments for child
child_arg = malloc(sizeof(char*) * 3);
child_arg[0] = "child";
child_arg[1] = str_que_id;
child_arg[2] = NULL;
//Set handler for SIGINT
int_sa.sa_handler = &handler;
int_sa.sa_flags = SA_RESTART;
sigemptyset(&int_sa.sa_mask);
sigaddset(&int_sa.sa_mask, SIGALRM);
sigaction(SIGINT, &int_sa, NULL);
//Fork new child
if(value = fork() == 0){
child_pid = getpid();
do{
errno = 0;
execve("./child", child_arg, NULL);
}while(errno);
}
//Keep alive father
while(1);
return 0;
}
static void handler(){
if(kill(child_pid, SIGTERM) != -1)
waitpid(child_pid, NULL, WNOHANG);
while(msgctl(que_id, IPC_RMID, NULL) == -1);
free(child_arg);
exit(getpid());
}
The goal of the child (for now, in my project) is just to wait for a new message arriving on the message queue. Since there won't be any message, it will always be blocked.
Child
#include "library.h"
typedef struct _Msgbuf {
long mtype;
char message[10];
} Msgbuf;
int main(int argc, char * argv[]){
int que_id;
//Recovery of message queue id
que_id = atoi(argv[1]);
//Set handler for SIGTERM
signal(SIGTERM, handler);
//Dynamic allocation of message
received = calloc(1, sizeof(Msgbuf));
while(1){
do{
errno = 0;
//This will block child because there won't be any message incoming
msgrcv(que_id, received, sizeof(Msgbuf) - sizeof(long), getpid(), 0);
if(errno)
perror(NULL);
}while(errno && errno != EINTR);
}
}
static void handler(){
free(received);
exit(getpid());
}
I know from the man pages on msgrcv():
The calling process catches a signal. In this case the system call fails with errno set to EINTR. (msgrcv() is never automatically restarted after being interrupted by a signal handler, regardless of the setting of the SA_RESTART flag when establishing a signal handler.)
So why does it loop printing that error? It should exit in the handler. Instead, it seems that control returns from the handler and, because of the free(received), msgrcv() no longer finds the message buffer and sets errno to EINVAL.
errno (almost) always carries a meaningful value only if a function call has failed.
This is the case for msgrcv().
From msgrcv()'s documentation:
RETURN VALUE
Upon successful completion, msgrcv() shall return a value equal to the number of bytes actually placed into the buffer mtext. Otherwise, no message shall be received, msgrcv() shall return -1, and errno shall be set to indicate the error.
So only use errno if msgrcv() returned -1, else errno's value is undefined and it might very well contain garbage or not ...
The code below does not make sense ...
msgrcv(que_id, received, sizeof(Msgbuf) - sizeof(long), getpid(), 0);
if(errno)
perror(NULL);
} while(errno && errno != EINTR);
... and should look like:
if (-1 == msgrcv(que_id, received, sizeof(Msgbuf) - sizeof(long), getpid(), 0))
{
/* Only here errno had a well defined value. */
perror("msgrcv() failed"); /* perror() translates errno into a human readable text prefixed by its argument and logs it to the stderr. */
}
else
{
errno = 0;
}
} while (errno && errno != EINTR);
This BTW
do{
errno = 0;
execve("./child", child_arg, NULL);
}while(errno);
only works because the members of the exec*() family of functions return only on error. So when the while condition is tested, execve() has failed and errno has been set. The initial errno = 0; is therefore useless here, too.
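Because exec*() returns only on failure, retrying it in a loop rarely helps. A minimal sketch of the more usual pattern follows: report the error and terminate the child. The helper name run_and_wait and the exit status 127 are my own choices for illustration, not something from the question.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: fork, exec the program at path in the child, and
 * wait for it. Returns the child's exit status, or -1 on error. */
int run_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        execve(path, argv, environ);
        /* Reached only if execve() failed, so errno is valid here. */
        perror("execve");
        _exit(127);   /* assumption: 127 marks "could not be executed" */
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a path that does not exist, the child prints the execve() error and the parent sees exit status 127 instead of a child spinning in a retry loop.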
There are a number of problems with your program. It invokes undefined behaviour by calling exit, free, and msgctl from within the signal handlers. The table in the Signal Actions section of The Open Group Base Specifications lists the functions that are safe to call from within a signal handler. In most cases, you simply want to toggle a "running" flag from within the handler and have your main loop run until it is told to exit. Something like the following simple example:
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/* this will be set when the signal is received */
static volatile sig_atomic_t running = 1;
void
sig_handler(int signo, siginfo_t *si, void *context)
{
running = 0;
}
int
main(int argc, char *argv[])
{
int rc;
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = &sig_handler;
rc = sigaction(SIGINT, &sa, NULL);
if (rc < 0) {
perror("sigaction");
exit(EXIT_FAILURE);
}
printf("Waiting for SIGINT\n");
while (running) {
printf("... sleeping for 10 seconds\n");
sleep(10);
}
printf("Signal received\n");
return 0;
}
I put together a more complex session on repl.it as well.
The other problem is that you assume that errno retains a zero value across function calls. This is likely the case but the only thing that you should assume about errno is that it will be assigned a value when a library function returns a failure code -- e.g., read returns -1 and sets errno to something that indicates the error. The conventional way to call a C runtime library function is to check the return value and consult errno when appropriate:
int bytes_read;
unsigned char buf[128];
bytes_read = read(some_fd, &buf[0], sizeof(buf));
if (bytes_read < 0) {
printf("read failed: %s (%d)\n", strerror(errno), errno);
}
Your application is probably looping because the parent is misbehaving and not waiting on the child or something similar (see above about undefined behavior). If the message queue is removed before the child exits, then the msgrcv call is going to fail and set errno to EINVAL. You should check if msgrcv is failing before you check errno. The child should also be terminating the loop when it encounters a msgrcv failure with errno equal to EINVAL since that is a terminal condition -- the anonymous message queue can never be recreated after it ceases to exist.
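To make that concrete, here is a rough sketch of a receive helper for the child. It checks the return value first, retries only on EINTR (and only while no stop was requested), and treats everything else as fatal. The names receive_message, on_sigterm and stop_requested are hypothetical, and I am assuming the handler is installed with sigaction() elsewhere.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Set from the SIGTERM handler; only this flag is touched in the handler. */
volatile sig_atomic_t stop_requested = 0;

void on_sigterm(int signum)
{
    (void)signum;
    stop_requested = 1;
}

/* Hypothetical helper: block until a message of the given type arrives.
 * Returns the payload size on success, or -1 with errno set.
 * EINTR is retried unless a stop was requested; EINVAL or EIDRM (queue
 * removed) and all other errors are returned to the caller as fatal. */
ssize_t receive_message(int que_id, void *msg, size_t payload_size, long type)
{
    for (;;) {
        ssize_t n = msgrcv(que_id, msg, payload_size, type, 0);
        if (n != -1)
            return n;               /* success: errno is not consulted */
        if (errno == EINTR && !stop_requested)
            continue;               /* interrupted by an unrelated signal */
        return -1;                  /* errno tells the caller why */
    }
}
```

The child's main loop would then leave the loop, free its buffer and return from main() as soon as receive_message() returns -1, rather than calling free() and exit() inside the handler.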
Related
I'm a newbie in C development. Recently, while learning multi-threaded development, I noticed a problem: when I set a signal action in the main thread and then try to block that signal in the child thread, I find that it does not work.
Here is a brief description of the code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
void *thread_start(void *_arg) {
sleep(2);
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGUSR2);
pthread_sigmask(SIG_BLOCK, &mask, NULL);
printf("child-thread executed\n");
while (true) {
sleep(1);
}
return NULL;
}
void sig_handler(int _sig) {
printf("executed\n");
}
int main(int argc, char *argv[]) {
pthread_t t_id;
int s = pthread_create(&t_id, NULL, thread_start, NULL);
if (s != 0) {
char *msg = strerror(s);
printf("%s\n", msg);
}
printf("main-thread executed, create [%lu]\n", t_id);
signal(SIGUSR2, sig_handler);
while (true) {
sleep(1);
}
return EXIT_SUCCESS;
}
The signal mask is a per-thread property. A thread inherits whatever mask its creating thread has at the time of thread creation but, after that, it controls its own copy.
In other words, blocking a signal in a thread only affects the delivery of signals for that thread, not for any other.
In any case, even if it were shared (it's not), you would have a potential race condition since you start the child thread before setting up the signal in the main thread. Hence it would be indeterminate as to whether the order was "parent sets up signal, then child blocks" or vice versa. But, as stated, that's irrelevant due to the thread-specific nature of the signal mask.
If you want a thread to control the signal mask of another thread, you will need to use some form of inter-thread communication to let the other thread do it itself.
As I wrote in a comment, any USR2 signal sent to the process will be delivered using the main thread. The program's output will not tell you exactly what happened, so it is not really a good way to test threads and signal masks. Additionally, it uses printf() in a signal handler, which may or may not work: printf() is not an async-signal-safe function, so it must not be used in a signal handler.
Here is a better example:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <limits.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
/* This function writes a message directly to standard error,
without using the stderr stream. This is async-signal safe.
Returns 0 if success, errno error code if an error occurs.
errno is kept unchanged. */
static int write_stderr(const char *msg)
{
const char *end = msg;
const int saved_errno = errno;
int retval = 0;
ssize_t n;
/* If msg is non-NULL, find the string-terminating '\0'. */
if (msg)
while (*end)
end++;
/* Write the message to standard error. */
while (msg < end) {
n = write(STDERR_FILENO, msg, (size_t)(end - msg));
if (n > 0) {
msg += n;
} else
if (n != -1) {
/* Bug, should not occur */
retval = EIO;
break;
} else
if (errno != EINTR) {
retval = errno;
break;
}
}
/* Paranoid check that exactly the message was written */
if (!retval)
if (msg != end)
retval = EIO;
errno = saved_errno;
return retval;
}
static volatile sig_atomic_t done = 0;
pthread_t main_thread;
pthread_t other_thread;
static void signal_handler(int signum)
{
const pthread_t id = pthread_self();
const char *thread = (id == main_thread) ? "Main thread" :
(id == other_thread) ? "Other thread" : "Unknown thread";
const char *event = (signum == SIGHUP) ? "HUP" :
(signum == SIGUSR1) ? "USR1" :
(signum == SIGINT) ? "INT" :
(signum == SIGTERM) ? "TERM" : "Unknown signal";
if (signum == SIGTERM || signum == SIGINT)
done = 1;
write_stderr(thread);
write_stderr(": ");
write_stderr(event);
write_stderr(".\n");
}
static int install_handler(int signum)
{
struct sigaction act;
memset(&act, 0, sizeof act);
sigemptyset(&act.sa_mask);
act.sa_handler = signal_handler;
act.sa_flags = 0;
if (sigaction(signum, &act, NULL) == -1)
return -1;
return 0;
}
void *other(void *unused __attribute__((unused)))
{
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGHUP);
pthread_sigmask(SIG_BLOCK, &mask, NULL);
while (!done)
sleep(1);
return NULL;
}
int main(void)
{
pthread_attr_t attrs;
sigset_t mask;
int result;
main_thread = pthread_self();
other_thread = pthread_self(); /* Just to initialize it to a sane value */
/* Install HUP, USR1, INT, and TERM signal handlers. */
if (install_handler(SIGHUP) ||
install_handler(SIGUSR1) ||
install_handler(SIGINT) ||
install_handler(SIGTERM)) {
fprintf(stderr, "Cannot install signal handlers: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
/* Create the other thread. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 2*PTHREAD_STACK_MIN);
result = pthread_create(&other_thread, &attrs, other, NULL);
pthread_attr_destroy(&attrs);
if (result) {
fprintf(stderr, "Cannot create a thread: %s.\n", strerror(result));
return EXIT_FAILURE;
}
/* This thread blocks SIGUSR1. */
sigemptyset(&mask);
sigaddset(&mask, SIGUSR1);
pthread_sigmask(SIG_BLOCK, &mask, NULL);
/* Ready to handle signals. */
printf("Send a HUP, USR1, or TERM signal to process %d.\n", (int)getpid());
fflush(stdout);
while (!done)
sleep(1);
pthread_join(other_thread, NULL);
return EXIT_SUCCESS;
}
Save it as e.g. example.c, and compile and run using
gcc -Wall -O2 example.c -pthread -o exprog
./exprog
It will block the USR1 signal in the main thread, and HUP and TERM in the other thread. It will also catch the INT signal (Ctrl+C), which is not blocked in either thread. When you send it the INT or TERM signal, the program will exit.
If you send the program the USR1 signal, you'll see that it will always be delivered using the other thread.
If you send the program a HUP signal, you'll see that it will always be delivered using the main thread.
If you send the program a TERM signal, it too will be delivered using the main thread, but it will also cause the program to exit (nicely).
If you send the program an INT signal, it will be delivered using one of the threads. It depends on several factors whether you'll always see it being delivered using the same thread or not, but at least in theory, it can be delivered using either thread. This signal too will cause the program to exit (nicely).
I have a parent process that creates 16 child processes using fork in a loop. Eventually, every child sends a SIGUSR1 signal, which is handled by a handler function in the parent process.
My problem is this: some children send the signal while a signal from another child is being handled. I read that the handler function then stops and handles the new signal, ignoring the signal it is currently working on.
I tried to fix that by calling kill(0, SIGSTOP) at the start of the handler function, but it looks like that stops the parent process as well. Is there a way to send this signal only to the children?
If it's not possible, is my goal achievable using wait, waitpid and kill?
Added the code below, I left out stuff like checking the return value for read, open etc.
Handler function:
void my_signal_handler( int signum, siginfo_t* info, void* ptr)
{
kill(0, SIGSTOP);
int sonPid = info->si_pid;
char* pipeName = malloc(14 + sizeof(int));//TODO ok?
sprintf(pipeName, "//tmp//counter_%d" , (int) sonPid); //TODO double //?
size_t fdPipe = open(pipeName, O_RDONLY);
int cRead;
int countRead = read(fdPipe,&cRead,sizeof(int));
COUNT+= cRead;
kill(0, SIGCONT);
return;
}
Creating the child processes:
struct sigaction new_action;
memset(&new_action, 0, sizeof(new_action));
new_action.sa_sigaction = my_signal_handler; /* SA_SIGINFO handlers are installed via sa_sigaction */
new_action.sa_flags = SA_SIGINFO;
if( 0 != sigaction(SIGUSR1, &new_action, NULL) )
{
printf("Signal handle registration failed. %s\n", strerror(errno));
return -1;
}
for(int i=0; i<16; i++){
pid_t cpid = fork();
if(cpid == 0) // child
{
execv("./counter",argvv); // some arguments to the function
printf("execv failed: %s\n", strerror(errno));
return -1;
}
else{
continue;
}
The counter program run by the children. In short, it counts the occurrences of the char counc in some part of the file, then writes the count to a pipe:
int main(int argc, char** argv){
int counter = 0;
char counc = argv[1][0];
char* filename = argv[2];
off_t offset = atoll(argv[3]);
ssize_t length = atoll(argv[4]);
int fd = open(filename, O_RDWR | O_CREAT);
char* arr = (char*)mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
for(int i=0; i<length; i++){
if(arr[i] == counc){
counter++;
}
}
pid_t proid = getpid();
pid_t ppid = getppid();
char* pipeName = malloc(14 + sizeof(pid_t));
sprintf(pipeName, "//tmp//counter_%d" , (int) proid);
size_t fdPipe = mkfifo(pipeName, 0777);
int didopen = open(pipeName,O_WRONLY);
size_t wrote = write(fdPipe,&counter , 1 );
if(wrote < 0){
printf(OP_ERR, strerror( errno ));
return errno;
}
kill(ppid, SIGUSR1);
//close the pipe and unmap the array
return 1;
}
If I understand your question correctly, you are having trouble running the desired signal handling separately in the father and the children. If this is not your case, please correct me.
If your problem is this, you can simply create some if statements, to test the pid that fork() call returns and then only execute if you are the father. So...
if (pid == 0)
//ChildProcess();
// do nothing!
else
//ParentProcess();
// do something!
Note that you have to define pid (of type pid_t) as a global variable, so that it is visible in both main and the signal handler function you are implementing!
Besides what others have already pointed out regarding the use of non-async-safe functions from within a signal handler, I'm gonna go out on a limb and guess that the problem is either of these two things:
You're incorrectly assuming that signals sent via kill are queued up for the process they're sent to. They're not queued up (unless you use sigqueue on an operating system that supports it); the set of pending (yet un-handled) signals for the destination process is instead updated. See signal queuing in C. Or,
You're running this code on an operating system that doesn't support "reliable signal semantics". See man 3 sysv_signal for insights (horrors) like:
However sysv_signal() provides the System V unreliable signal semantics, that is: a) the disposition of the signal is reset to the default when the handler is invoked; b) delivery of further instances of the signal is not blocked while the signal handler is executing;
As I understand, the best way to achieve terminating a child process when its parent dies is via prctl(PR_SET_PDEATHSIG) (at least on Linux): How to make child process die after parent exits?
There is one caveat to this mentioned in man prctl:
This value is cleared for the child of a fork(2) and (since Linux 2.4.36 / 2.6.23) when executing a set-user-ID or set-group-ID binary, or a binary that has associated capabilities (see capabilities(7)). This value is preserved across execve(2).
So, the following code has a race condition:
parent.c:
#include <unistd.h>
int main(int argc, char **argv) {
int f = fork();
if (f == 0) {
execl("./child", "child", NULL, NULL);
}
return 0;
}
child.c:
#include <sys/prctl.h>
#include <signal.h>
int main(int argc, char **argv) {
prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
// ...
return 0;
}
Namely, the parent could die before prctl() is executed in the child (and thus the child will not receive the SIGKILL). The proper way to address this is to prctl() in the parent before the exec():
parent.c:
#include <unistd.h>
#include <sys/prctl.h>
#include <signal.h>
int main(int argc, char **argv) {
int f = fork();
if (f == 0) {
prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
execl("./child", "child", NULL, NULL);
}
return 0;
}
child.c:
int main(int argc, char **argv) {
// ...
return 0;
}
However, if ./child is a setuid/setgid binary, then this trick to avoid the race condition doesn't work (exec()ing the setuid/setgid binary causes the PDEATHSIG to be lost as per the man page quoted above), and it seems like you are forced to employ the first (racy) solution.
Is there any way, if child is a setuid/setgid binary, to call prctl(PR_SET_PDEATHSIG) in a non-racy way?
It is much more common to have the parent process set up a pipe. The parent process keeps the write end open (pipefd[1]) and closes the read end (pipefd[0]). The child process closes the write end (pipefd[1]) and sets the read end (pipefd[0]) nonblocking.
This way, the child process can use read(pipefd[0], buffer, 1) to check if the parent process is still alive. If the parent is still running, it will return -1 with errno == EAGAIN (or errno == EINTR).
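A minimal sketch of that check, assuming the child has already closed its own copy of the write end. The helper names set_nonblocking and writer_alive are mine; unlike the description above, the sketch retries on EINTR instead of reporting it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into nonblocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: returns 1 if some process still holds the write end
 * of the pipe open, 0 if every write end is closed (the parent is gone),
 * and -1 on an unexpected error. */
int writer_alive(int read_fd)
{
    char byte;
    ssize_t n;

    do {
        n = read(read_fd, &byte, 1);
    } while (n == -1 && errno == EINTR);

    if (n == 0)
        return 0;   /* end of file: no writer left */
    if (n == 1)
        return 1;   /* the parent wrote a byte (consumed here), so it exists */
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 1 : -1;
}
```

The check can be exercised in a single process: with both ends of a pipe open, writer_alive() reports 1; after closing the write end it reports 0.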
Now, in Linux, the child process can also set the read end async, in which case it will be sent a signal (SIGIO by default) when the parent process exits:
fcntl(pipefd[0], F_SETSIG, desired_signal);
fcntl(pipefd[0], F_SETOWN, getpid());
fcntl(pipefd[0], F_SETFL, O_NONBLOCK | O_ASYNC);
Use a siginfo handler for desired_signal. If info->si_code == POLL_IN && info->si_fd == pipefd[0], the parent process either exited or wrote something to the pipe. Because read() is async-signal safe, and the pipe is nonblocking, you can use read(pipefd[0], &buffer, sizeof buffer) in the signal handler whether the parent wrote something, or if parent exited (closed the pipe). In the latter case, the read() will return 0.
As far as I can see, this approach has no race conditions (if you use a realtime signal, so that the signal is not lost because a user-sent one is already pending), although it is very Linux-specific. After setting the signal handler, and at any point during the lifetime of the child process, the child can always explicitly check if the parent is still alive, without affecting the signal generation.
So, to recap, in pseudocode:
Construct pipe
Fork child process
Child process:
Close write end of pipe
Install pipe signal handler (say, SIGRTMIN+0)
Set read end of pipe to generate pipe signal (F_SETSIG)
Set own PID as read end owner (F_SETOWN)
Set read end of pipe nonblocking and async (F_SETFL, O_NONBLOCK | O_ASYNC)
If read(pipefd[0], buffer, sizeof buffer) == 0,
the parent process has already exited.
Continue with normal work.
Child process pipe signal handler:
If siginfo->si_code == POLL_IN and siginfo->si_fd == pipefd[0],
parent process has exited.
To immediately die, use e.g. raise(SIGKILL).
Parent process:
Close read end of pipe
Continue with normal work.
I do not expect you to believe my word.
Below is a crude example program you can use to check this behaviour yourself. It is long, but only because I wanted it to be easy to see what is happening at runtime. To implement this in a normal program, you only need a couple of dozen lines of code. example.c:
#define _GNU_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
static volatile sig_atomic_t done = 0;
static void handle_done(int signum)
{
if (!done)
done = signum;
}
static int install_done(const int signum)
{
struct sigaction act;
memset(&act, 0, sizeof act);
sigemptyset(&act.sa_mask);
act.sa_handler = handle_done;
act.sa_flags = 0;
if (sigaction(signum, &act, NULL) == -1)
return errno;
return 0;
}
static int deathfd = -1;
static void death(int signum, siginfo_t *info, void *context)
{
if (info->si_code == POLL_IN && info->si_fd == deathfd)
raise(SIGTERM);
}
static int install_death(const int signum)
{
struct sigaction act;
memset(&act, 0, sizeof act);
sigemptyset(&act.sa_mask);
act.sa_sigaction = death;
act.sa_flags = SA_SIGINFO;
if (sigaction(signum, &act, NULL) == -1)
return errno;
return 0;
}
int main(void)
{
pid_t child, p;
int pipefd[2], status;
char buffer[8];
if (install_done(SIGINT)) {
fprintf(stderr, "Cannot set SIGINT handler: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
if (pipe(pipefd) == -1) {
fprintf(stderr, "Cannot create control pipe: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
child = fork();
if (child == (pid_t)-1) {
fprintf(stderr, "Cannot fork child process: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
if (!child) {
/*
* Child process.
*/
/* Close write end of pipe. */
deathfd = pipefd[0];
close(pipefd[1]);
/* Set a SIGHUP signal handler. */
if (install_death(SIGHUP)) {
fprintf(stderr, "Child process: cannot set SIGHUP handler: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
/* Set SIGTERM signal handler. */
if (install_done(SIGTERM)) {
fprintf(stderr, "Child process: cannot set SIGTERM handler: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
/* We want a SIGHUP instead of SIGIO. */
fcntl(deathfd, F_SETSIG, SIGHUP);
/* We want the SIGHUP delivered when deathfd closes. */
fcntl(deathfd, F_SETOWN, getpid());
/* Make the deathfd (read end of pipe) nonblocking and async. */
fcntl(deathfd, F_SETFL, O_NONBLOCK | O_ASYNC);
/* Check if the parent process is dead. */
if (read(deathfd, buffer, sizeof buffer) == 0) {
printf("Child process (%ld): Parent process is already dead.\n", (long)getpid());
return EXIT_FAILURE;
}
while (1) {
status = __atomic_fetch_and(&done, 0, __ATOMIC_SEQ_CST);
if (status == SIGINT)
printf("Child process (%ld): SIGINT caught and ignored.\n", (long)getpid());
else
if (status)
break;
printf("Child process (%ld): Tick.\n", (long)getpid());
fflush(stdout);
sleep(1);
status = __atomic_fetch_and(&done, 0, __ATOMIC_SEQ_CST);
if (status == SIGINT)
printf("Child process (%ld): SIGINT caught and ignored.\n", (long)getpid());
else
if (status)
break;
printf("Child process (%ld): Tock.\n", (long)getpid());
fflush(stdout);
sleep(1);
}
printf("Child process (%ld): Exited due to %s.\n", (long)getpid(),
(status == SIGINT) ? "SIGINT" :
(status == SIGHUP) ? "SIGHUP" :
(status == SIGTERM) ? "SIGTERM" : "Unknown signal.\n");
fflush(stdout);
return EXIT_SUCCESS;
}
/*
* Parent process.
*/
/* Close read end of pipe. */
close(pipefd[0]);
while (!done) {
fprintf(stderr, "Parent process (%ld): Tick.\n", (long)getpid());
fflush(stderr);
sleep(1);
fprintf(stderr, "Parent process (%ld): Tock.\n", (long)getpid());
fflush(stderr);
sleep(1);
/* Try reaping the child process. */
p = waitpid(child, &status, WNOHANG);
if (p == child || (p == (pid_t)-1 && errno == ECHILD)) {
if (p == child && WIFSIGNALED(status))
fprintf(stderr, "Child process died from %s. Parent will now exit, too.\n",
(WTERMSIG(status) == SIGINT) ? "SIGINT" :
(WTERMSIG(status) == SIGHUP) ? "SIGHUP" :
(WTERMSIG(status) == SIGTERM) ? "SIGTERM" : "an unknown signal");
else
fprintf(stderr, "Child process has exited, so the parent will too.\n");
fflush(stderr);
break;
}
}
if (done) {
fprintf(stderr, "Parent process (%ld): Exited due to %s.\n", (long)getpid(),
(done == SIGINT) ? "SIGINT" :
(done == SIGHUP) ? "SIGHUP" : "Unknown signal.\n");
fflush(stderr);
}
/* Reached when the loop ends: SIGINT was caught or the child has exited. */
return EXIT_SUCCESS;
}
Compile and run the above using e.g.
gcc -Wall -O2 example.c -o example
./example
The parent process will print to standard error, and the child process to standard output. The parent process will exit if you press Ctrl+C; the child process will ignore that signal. The child process uses SIGHUP instead of SIGIO (although a realtime signal, say SIGRTMIN+0, would be safer); if generated by the parent process exiting, the SIGHUP signal handler will raise SIGTERM in the child.
To make the termination causes easy to see, the child catches SIGTERM, and exits the next iteration (a second later). If so desired, the handler can use e.g. raise(SIGKILL) to terminate itself immediately.
Both parent and child processes show their process IDs, so you can easily send a SIGINT/SIGHUP/SIGTERM signal from another terminal window. (The child process ignores SIGINT and SIGHUP sent from outside the process.)
Your last code snippet still contains a race condition:
int main(int argc, char **argv) {
int f = fork();
if (f == 0) {
// <- !!!race time!!!
prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
execl("./child", "child", NULL, NULL);
}
return 0;
}
Meaning that in the child, after the fork, until the prctl() has visible effects (think: returns), there is a time-window where the parent may exit.
To fix this race you have to save the PID of the parent before the fork and check it after the prctl() call, e.g.:
pid_t ppid_before_fork = getpid();
pid_t pid = fork();
if (pid == -1) { perror(0); exit(1); }
if (pid) {
; // continue parent execution
} else {
int r = prctl(PR_SET_PDEATHSIG, SIGTERM);
if (r == -1) { perror(0); exit(1); }
// test in case the original parent exited just
// before the prctl() call
if (getppid() != ppid_before_fork)
exit(1);
// continue child execution ...
}
(see also)
Regarding executing a setuid/setgid program: You can then pass the ppid_before_fork by other means (e.g. in the argument or environment vector) and execute the prctl() (including the comparison) after the exec, i.e. inside the execed binary.
I don't know this for sure, but clearing the parent death signal on execve when invoking a set-id binary looks like an intentional restriction for security reasons. I'm not sure why, considering that you can use kill to send signals to setuid programs that share your real user ID, but they wouldn't have bothered making that change in 2.6.23 if there wasn't a reason to disallow it.
Since you control the code of the set-id child, here is a kludge: make the call to prctl, then immediately afterward, call getppid and see if it returns 1. If it does, then either the process was started directly by init (which is not as rare as it used to be) or the process was reparented to init before it had a chance to call prctl, which means its original parent is dead and it should exit.
(This is a kludge because I know of no way to rule out the possibility that the process was started directly by init. init never exits, so you have one case where it should exit and one case where it shouldn't and no way to tell which. But if you know from the larger design that the process will not be started directly by init, it should be reliable.)
(You must call getppid after prctl, or you have only narrowed the race window, not eliminated it.)
I have a multithreaded application that installs a handler for SIGCHLD that logs and reaps the child processes.
The problem I see starts when I call system(). system() needs to wait for the child process to end and reap it itself, since it needs the exit code. This is why it calls sigprocmask() to block SIGCHLD. But in my multithreaded application, the SIGCHLD handler still runs in a different thread, and the child is reaped before system() has a chance to do so.
Is this a known problem in POSIX?
One way around this I thought of is to block SIGCHLD in all other threads but this is not really realistic in my case since not all threads are directly created by my code.
What other options do I have?
Yes, it's a known (or at least strongly intimated) problem.
Blocking SIGCHLD while waiting for the child to terminate prevents the application from catching the signal and obtaining status from system()'s child process before system() can get the status itself.
....
Note that if the application is catching SIGCHLD signals, it will receive such a signal before a successful system() call returns.
(From the documentation for system(), emphasis added.)
So, POSIXly you are out of luck, unless your implementation happens to queue SIGCHLD. If it does, you can of course keep a record of pids you forked, and then only reap the ones you were expecting.
Linuxly, too, you are out of luck, as signalfd appears also to collapse multiple SIGCHLDs.
UNIXly, however, you have lots of clever and too-clever techniques available to manage your own children and ignore those of third-party routines. I/O multiplexing of inherited pipes is one alternative to SIGCHLD catching, as is using a small, dedicated "spawn-helper" to do your forking and reaping in a separate process.
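As one illustration of the inherited-pipe idea, here is a rough sketch (the helper name wait_via_pipe is hypothetical). The child inherits the write end of a pipe, the parent learns about its exit from end-of-file on the read end, and then reaps exactly that PID instead of calling waitpid(-1, ...). It assumes nothing else in the process reaps arbitrary children, and that no other process inherits the write end.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that exits with `code`, detect its exit
 * through EOF on a pipe, and reap only that child.
 * Returns the child's exit status, or -1 on error. */
int wait_via_pipe(int code)
{
    int fds[2], status;
    char byte;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);   /* the child keeps only the write end */
        _exit(code);     /* exiting closes the write end */
    }

    close(fds[1]);       /* the parent must not keep a write end open */
    do {
        n = read(fds[0], &byte, 1);   /* returns 0 once the child is gone */
    } while (n > 0 || (n == -1 && errno == EINTR));
    close(fds[0]);

    /* Reap exactly this child, never "any child". */
    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a real program the read end would normally sit in the poll() or select() set of the main loop rather than being read in a blocking loop, so that children started by system() or by third-party code are simply never touched.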
Since you have threads you cannot control, I recommend you write a preloaded library to interpose the system() call (and perhaps also popen() etc.) with your own implementation. I'd also include your SIGCHLD handler in the library, too.
If you don't want to run your program via env LD_PRELOAD=libwhatever.so yourprogram, you can add something like
const char *libs;
libs = getenv("LD_PRELOAD");
if (!libs || !*libs) {
setenv("LD_PRELOAD", "libwhatever.so", 1);
execv(argv[0], argv);
_exit(127);
}
at the start of your program, to have it re-execute itself with LD_PRELOAD appropriately set. (Note that there are quirks to consider if your program is setuid or setgid; see man ld.so for details. In particular, if libwhatever.so is not installed in a system library directory, you must specify a full path.)
One possible approach would be to use a lockless array (using atomic built-ins provided by the C compiler) of pending children. Instead of waitpid(), your system() implementation allocates one of the entries, sticks the child PID in there, and waits on a semaphore for the child to exit instead of calling waitpid().
Here is an example implementation:
#define _GNU_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <semaphore.h>
#include <dlfcn.h>
#include <errno.h>
/* Maximum number of concurrent children waited for.
*/
#define MAX_CHILDS 256
/* Lockless array of child processes waited for.
*/
static pid_t child_pid[MAX_CHILDS] = { 0 }; /* 0 is not a valid PID */
static sem_t child_sem[MAX_CHILDS];
static int child_status[MAX_CHILDS];
/* Helper function: allocate a child process.
* Returns the index, or -1 if all in use.
*/
static inline int child_get(const pid_t pid)
{
int i = MAX_CHILDS;
while (i-->0)
if (__sync_bool_compare_and_swap(&child_pid[i], (pid_t)0, pid)) {
sem_init(&child_sem[i], 0, 0);
return i;
}
return -1;
}
/* Helper function: release a child descriptor.
*/
static inline void child_put(const int i)
{
sem_destroy(&child_sem[i]);
__sync_fetch_and_and(&child_pid[i], (pid_t)0);
}
/* SIGCHLD signal handler.
* Note: Both waitpid() and sem_post() are async-signal safe.
*/
static void sigchld_handler(int signum __attribute__((unused)),
siginfo_t *info __attribute__((unused)),
void *context __attribute__((unused)))
{
pid_t p;
int status, i;
/* waitpid() may modify errno; preserve it for the interrupted code. */
const int saved_errno = errno;
while (1) {
p = waitpid((pid_t)-1, &status, WNOHANG);
if (p == (pid_t)0 || p == (pid_t)-1)
break;
i = MAX_CHILDS;
while (i-->0)
if (p == __sync_fetch_and_or(&child_pid[i], (pid_t)0)) {
child_status[i] = status;
sem_post(&child_sem[i]);
break;
}
/* Log p and status? */
}
errno = saved_errno;
}
/* Helper function: close descriptor, without affecting errno.
*/
static inline int closefd(const int fd)
{
int result, saved_errno;
if (fd == -1)
return EINVAL;
saved_errno = errno;
do {
result = close(fd);
} while (result == -1 && errno == EINTR);
if (result == -1)
result = errno;
else
result = 0;
errno = saved_errno;
return result;
}
/* Helper function: Create a close-on-exec socket pair.
*/
static int commsocket(int fd[2])
{
int result;
if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
fd[0] = -1;
fd[1] = -1;
return errno;
}
do {
result = fcntl(fd[0], F_SETFD, FD_CLOEXEC);
} while (result == -1 && errno == EINTR);
if (result == -1) {
closefd(fd[0]);
closefd(fd[1]);
return errno;
}
do {
result = fcntl(fd[1], F_SETFD, FD_CLOEXEC);
} while (result == -1 && errno == EINTR);
if (result == -1) {
closefd(fd[0]);
closefd(fd[1]);
return errno;
}
return 0;
}
/* New system() implementation.
*/
int system(const char *command)
{
pid_t child;
int i, status, commfd[2];
ssize_t n;
/* Allocate the child process. */
i = child_get((pid_t)-1);
if (i < 0) {
/* "fork failed" */
errno = EAGAIN;
return -1;
}
/* Create a close-on-exec socket pair. */
if (commsocket(commfd)) {
child_put(i);
/* "fork failed" */
errno = EAGAIN;
return -1;
}
/* Create the child process. */
child = fork();
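if (child == (pid_t)-1) {
/* fork() failed: release the array entry and the socket pair. */
const int saved_errno = errno;
child_put(i);
closefd(commfd[0]);
closefd(commfd[1]);
errno = saved_errno;
return -1;
}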
/* Child process? */
if (!child) {
char *args[4] = { "sh", "-c", (char *)command, NULL };
/* If command is NULL, return 7 if sh is available. */
if (!command)
args[2] = "exit 7";
/* Close parent end of comms socket. */
closefd(commfd[0]);
/* Receive one char before continuing. */
do {
n = read(commfd[1], &status, 1);
} while (n == (ssize_t)-1 && errno == EINTR);
if (n != 1) {
closefd(commfd[1]);
_exit(127);
}
/* We won't receive anything else. */
shutdown(commfd[1], SHUT_RD);
/* Execute the command. If successful, this closes the comms socket. */
execv("/bin/sh", args);
/* Failed. Return the errno to the parent. */
status = errno;
{
const char *p = (const char *)&status;
const char *const q = (const char *)&status + sizeof status;
while (p < q) {
n = write(commfd[1], p, (size_t)(q - p));
if (n > (ssize_t)0)
p += n;
else
if (n != (ssize_t)-1)
break;
else
if (errno != EINTR)
break;
}
}
/* Explicitly close the socket pair. */
shutdown(commfd[1], SHUT_RDWR);
closefd(commfd[1]);
_exit(127);
}
/* Parent process. Close the child end of the comms socket. */
closefd(commfd[1]);
/* Update the child PID in the array. */
__sync_bool_compare_and_swap(&child_pid[i], (pid_t)-1, child);
/* Let the child proceed, by sending a char via the socket. */
status = 0;
do {
n = write(commfd[0], &status, 1);
} while (n == (ssize_t)-1 && errno == EINTR);
if (n != 1) {
/* Release the child entry. */
child_put(i);
closefd(commfd[0]);
/* Kill the child. */
kill(child, SIGKILL);
/* "fork failed". */
errno = EAGAIN;
return -1;
}
/* Won't send anything else over the comms socket. */
shutdown(commfd[0], SHUT_WR);
/* Try reading an int from the comms socket. */
{
char *p = (char *)&status;
char *const q = (char *)&status + sizeof status;
while (p < q) {
n = read(commfd[0], p, (size_t)(q - p));
if (n > (ssize_t)0)
p += n;
else
if (n != (ssize_t)-1)
break;
else
if (errno != EINTR)
break;
}
/* Socket closed with nothing read? */
if (n == (ssize_t)0 && p == (char *)&status)
status = 0;
else
if (p != q)
status = EAGAIN; /* Incomplete error code, use EAGAIN. */
/* Close the comms socket. */
shutdown(commfd[0], SHUT_RDWR);
closefd(commfd[0]);
}
/* Wait for the command to complete. */
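/* Retry if interrupted by a signal handler installed without SA_RESTART. */
while (sem_wait(&child_sem[i]) == -1 && errno == EINTR)
;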
/* Did the command execution fail? */
if (status) {
child_put(i);
errno = status;
return -1;
}
/* Command was executed. Return the exit status. */
status = child_status[i];
child_put(i);
/* If command is NULL, then the return value is nonzero
* iff the exit status was 7. */
if (!command) {
if (WIFEXITED(status) && WEXITSTATUS(status) == 7)
status = 1;
else
status = 0;
}
return status;
}
/* Library initialization.
* Sets the sigchld handler,
* makes sure pthread library is loaded, and
* unsets the LD_PRELOAD environment variable.
*/
static void init(void) __attribute__((constructor));
static void init(void)
{
struct sigaction act;
int saved_errno;
saved_errno = errno;
sigemptyset(&act.sa_mask);
act.sa_sigaction = sigchld_handler;
act.sa_flags = SA_NOCLDSTOP | SA_RESTART | SA_SIGINFO;
sigaction(SIGCHLD, &act, NULL);
(void)dlopen("libpthread.so.0", RTLD_NOW | RTLD_GLOBAL);
unsetenv("LD_PRELOAD");
errno = saved_errno;
}
If you save the above as say child.c, you can compile it into libchild.so using
gcc -W -Wall -O3 -fPIC -c child.c
gcc -W -Wall -O3 -shared -Wl,-soname,libchild.so child.o -ldl -lpthread -o libchild.so
If you have a test program that does system() calls in various threads, you can run it with system() interposed (and children automatically reaped) using
env LD_PRELOAD=/path/to/libchild.so test-program
Note that, depending on exactly what the threads outside your control do, you may need to interpose further functions, including signal(), sigaction(), sigprocmask(), pthread_sigmask(), and so on, to make sure those threads do not change the disposition of your SIGCHLD handler (after it has been installed by the libchild.so library).
If those out-of-control threads use popen(), you can interpose that (and pclose()) with very similar code to system() above, just split into two parts.
(If you are wondering why my system() code bothers to report the exec() failure to the parent process, it's because I normally use a variant of this code that takes the command as an array of strings; this way it correctly reports if the command was not found, or could not be executed due to insufficient privileges, etc. In this particular case the command is always /bin/sh. However, since the communications socket is needed anyway to avoid a race between the child exiting and the child_pid[] array holding an up-to-date PID, I decided to leave the "extra" code in.)
For those who are still looking for the answer, there is an easier way to solve this problem:
Rewrite the SIGCHLD handler to use a waitid() call with the flags WNOHANG|WNOWAIT, so that it can check each child's PID before reaping it. You can optionally check /proc/PID/stat (or a similar OS interface) for the command name.
Replace system() with proc_system().
I am trying to implement a basic event loop with pselect, so I have blocked some signals, saved the signal mask and used it with pselect so that the signals will only be delivered during that call.
If a signal is sent outside of the pselect call, it is blocked until pselect, as it should be; however, it does not interrupt the pselect call. If a signal is sent while pselect is blocking, it will be handled AND pselect will be interrupted. This behaviour is only present on OSX; on Linux it seems to function correctly.
Here is a code example:
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <errno.h>
#include <unistd.h>
#include <signal.h>
int shouldQuit = 0;
void signalHandler(int signal)
{
printf("Handled signal %d\n", signal);
shouldQuit = 1;
}
int main(int argc, char** argv)
{
sigset_t originalSignals;
sigset_t blockedSignals;
sigemptyset(&blockedSignals);
sigaddset(&blockedSignals, SIGINT);
if(sigprocmask(SIG_BLOCK, &blockedSignals, &originalSignals) != 0)
{
perror("Failed to block signals");
return -1;
}
struct sigaction signalAction;
memset(&signalAction, 0, sizeof(struct sigaction));
signalAction.sa_mask = blockedSignals;
signalAction.sa_handler = signalHandler;
if(sigaction(SIGINT, &signalAction, NULL) == -1)
{
perror("Could not set signal handler");
return -1;
}
while(!shouldQuit)
{
fd_set set;
FD_ZERO(&set);
FD_SET(STDIN_FILENO, &set);
printf("Starting pselect\n");
int result = pselect(STDIN_FILENO + 1, &set, NULL, NULL, NULL, &originalSignals);
printf("Done pselect\n");
if(result == -1)
{
if(errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
{
perror("pselect failed");
}
}
else
{
printf("Start Sleeping\n");
sleep(5);
printf("Done Sleeping\n");
}
}
return 0;
}
The program waits until you input something on stdin, then sleeps for 5 seconds. To reproduce the problem, "a" is typed to create data on stdin. Then, while the program is sleeping, an INT signal is sent with Ctrl-C.
On Linux:
Starting pselect
a
Done pselect
Start Sleeping
^CDone Sleeping
Starting pselect
Handled signal 2
Done pselect
On OSX:
Starting pselect
a
Done pselect
Start Sleeping
^CDone Sleeping
Starting pselect
Handled signal 2
^CHandled signal 2
Done pselect
Confirmed that it acts that way on OSX, and if you look at the source for pselect (http://www.opensource.apple.com/source/Libc/Libc-320.1.3/gen/FreeBSD/pselect.c), you'll see why.
After sigprocmask() restores the signal mask, the kernel delivers the pending signal to the process, and your handler gets invoked. The problem here is that the signal can be delivered before select() gets invoked, so select() won't return with an error.
There's some more discussion about the issue at http://lwn.net/Articles/176911/ - Linux used to use a similar userspace implementation that had the same problem.
If you want to make that pattern safe on all platforms, you'll have to either use something like libev or libevent and let them handle the messiness, or use sigprocmask() and select() yourself.
e.g.
sigset_t omask;
if (sigprocmask(SIG_SETMASK, &originalSignals, &omask) < 0) {
perror("sigprocmask");
break;
}
/* Must re-check the flag here with signals re-enabled */
if (shouldQuit)
break;
printf("Starting select\n");
int result = select(STDIN_FILENO + 1, &set, NULL, NULL, NULL);
int save_errno = errno;
if (sigprocmask(SIG_SETMASK, &omask, NULL) < 0) {
perror("sigprocmask");
break;
}
/* Recheck again after the signal is blocked */
if (shouldQuit)
break;
printf("Done select\n");
if(result == -1)
{
errno = save_errno;
if(errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
{
perror("select failed");
}
}
There are a couple of other things you should do with your code:
declare your 'shouldQuit' variable as volatile sig_atomic_t
volatile sig_atomic_t shouldQuit = 0;
always save errno before calling any other function (such as printf()), since that function may cause errno to be overwritten with another value. That's why the code above saves errno immediately after the select() call.
Really, I strongly recommend using an existing event loop handling library like libev or libevent - I do, even though I can write my own, because it is so easy to get wrong.