I have a parent process that creates 16 child processes by calling fork in a loop. Eventually, every child sends a SIGUSR1 signal, which is handled by a handler function in the parent process.
My problem is this: some children send the signal while a signal from another child is still being handled. I read that the handler function then stops and handles the new signal, abandoning the signal it was working on.
I tried to fix that by calling kill(0, SIGSTOP) at the start of the handler function, but it looks like that stops the parent process as well. Is there a way to send this signal only to the children?
If it's not possible, is my goal achievable using wait, waitpid and kill?
Added the code below, I left out stuff like checking the return value for read, open etc.
Handler function:
void my_signal_handler( int signum, siginfo_t* info, void* ptr)
{
kill(0, SIGSTOP);
int sonPid = info->si_pid;
char* pipeName = malloc(14 + sizeof(int));//TODO ok?
sprintf(pipeName, "//tmp//counter_%d" , (int) sonPid); //TODO double //?
size_t fdPipe = open(pipeName, O_RDONLY);
int cRead;
int countRead = read(fdPipe,&cRead,sizeof(int));
COUNT+= cRead;
kill(0, SIGCONT);
return;
}
Creating the child processes:
struct sigaction new_action;
memset(&new_action, 0, sizeof(new_action));
new_action.sa_handler = my_signal_handler;
new_action.sa_flags = SA_SIGINFO;
if( 0 != sigaction(SIGUSR1, &new_action, NULL) )
{
printf("Signal handle registration failed. %s\n", strerror(errno));
return -1;
}
for(int i=0; i<16; i++){
pid_t cpid = fork();
if(cpid == 0) // child
{
execv("./counter",argvv); // some arguments to the function
printf("execv failed: %s\n", strerror(errno));
return -1;
}
else{
continue;
}
The counter program run by the children. In short, it counts the occurrences of the char counc in some part of the file, then writes the count to a pipe:
int main(int argc, char** argv){
int counter = 0;
char counc = argv[1][0];
char* filename = argv[2];
off_t offset = atoll(argv[3]);
ssize_t length = atoll(argv[4]);
int fd = open(filename, O_RDWR | O_CREAT);
char* arr = (char*)mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
for(int i=0; i<length; i++){
if(arr[i] == counc){
counter++;
}
}
pid_t proid = getpid();
pid_t ppid = getppid();
char* pipeName = malloc(14 + sizeof(pid_t));
sprintf(pipeName, "//tmp//counter_%d" , (int) proid);
size_t fdPipe = mkfifo(pipeName, 0777);
int didopen = open(pipeName,O_WRONLY);
size_t wrote = write(fdPipe,&counter , 1 );
if(wrote < 0){
printf(OP_ERR, strerror( errno ));
return errno;
}
kill(ppid, SIGUSR1);
//close the pipe and unmap the array
return 1;
}
If I understand your question correctly, you are having trouble running the desired signal-handling code in the father and the children separately. If this is not the case, please correct me.
If that is your problem, you can simply test the pid that the fork() call returns with an if statement, and only execute the code if you are the father. So...
if (pid == 0)
//ChildProcess();
// do nothing!
else
//ParentProcess();
// do something!
Note that you have to define pid (of type pid_t, which is what fork() returns) as a global variable so that it is visible in both main and the signal handler function you are implementing!
Besides what others have already pointed out regarding the use of non-async-signal-safe functions from within a signal handler, I'm going to go out on a limb and guess that the problem is one of these two things:
You're incorrectly assuming that signals sent via kill are queued up for the process they're sent to. They're not queued up (unless you use sigqueue on an operating system that supports it); the set of pending (yet un-handled) signals for the destination process is instead updated. See signal queuing in C. Or,
You're running this code on an operating system that doesn't support "reliable signal semantics". See man 3 sysv_signal for insights (horrors) like:
However sysv_signal() provides the System V unreliable signal semantics, that is: a) the disposition of the signal is reset to the default when the handler is invoked; b) delivery of further instances of the signal is not blocked while the signal handler is executing;
I have 2 programs: 1) Father 2) Child.
When Father receives a SIGINT (CTRL-C) signal, its handler sends a SIGTERM to its child. The problem is that often (not always, I don't know why) the following error is printed in a loop after SIGINT:
Invalid Argument
The goal of the father is to create a child and then just stay alive so that it is ready to handle SIGINT.
Father
#include "library.h"
static void handler();
int main(int argc, char* argv[]){
int value, que_id;
char str_que_id[10], **child_arg;
pid_t child_pid;
sigaction int_sa;
//Create message queue
do{
que_id = msgget(IPC_PRIVATE, ALL_PERM | IPC_CREAT);
}while(que_id == -1);
snprintf(str_que_id, sizeof(str_que_id), "%d", que_id);
//Set arguments for child
child_arg = malloc(sizeof(char*) * 3);
child[0] = "child";
child[1] = str_que_id;
child[2] = NULL;
//Set handler for SIGINT
int_sa.sa_handler = &handler;
int_sa.sa_flags = SA_RESTART;
sigemptyset(&int_sa.sa_mask);
sigaddset(&int_sa.sa_mask, SIGALRM);
sigaction(SIGINT, &int_sa, NULL);
//Fork new child
if(value = fork() == 0){
child_pid = getpid();
do{
errno = 0;
execve("./child", child_arg, NULL);
}while(errno);
}
//Keep alive father
while(1);
return 0;
}
static void handler(){
if(kill(child_pid, SIGTERM) != -1)
waitpid(child_pid, NULL, WNOHANG);
while(msgctl(que_id, IPC_RMID, NULL) == -1);
free(child_arg);
exit(getpid());
}
The goal of the child (only for now in my project) is just to wait for a new incoming message from the message queue. Since there won't be any message, it will always be blocked.
Child
#include "library.h"
typedef struct _Msgbuf {
long mtype;
char[10] message;
} Msgbuf;
int main(int argc, char * argv[]){
int que_id;
//Recovery of message queue id
que_id = atoi(argv[1]);
//Set handler for SIGTERM
signal(SIGTERM, handler);
//Dynamic allocation of message
received = calloc(1, sizeof(Msgbuf));
while(1){
do{
errno = 0;
//This will block child because there won't be any message incoming
msgrcv(que_id, received, sizeof(Msgbuf) - sizeof(long), getpid(), 0);
if(errno)
perror(NULL);
}while(errno && errno != EINTR);
}
}
static void handler(){
free(received);
exit(getpid());
}
I know from the man pages on msgrcv():
The calling process catches a signal. In this case the system call fails with errno set to EINTR. (msgrcv() is never automatically restarted after being interrupted by a signal handler, regardless of the setting of the SA_RESTART flag when establishing a signal handler.)
So why does it loop, printing that error? It should exit in the handler; instead it seems that the handler returns and, because of the free(received), msgrcv() no longer finds the message buffer and sets errno to EINVAL.
errno (almost) always carries a meaningful value only if a function call has failed.
This is the case for msgrcv().
From msgrcv()'s documentation:
RETURN VALUE
Upon successful completion, msgrcv() shall return a value equal to the number of bytes actually placed into the buffer mtext. Otherwise, no message shall be received, msgrcv() shall return -1, and errno shall be set to indicate the error.
So only use errno if msgrcv() returned -1, else errno's value is undefined and it might very well contain garbage or not ...
The code below does not make sense ...
msgrcv(que_id, received, sizeof(Msgbuf) - sizeof(long), getpid(), 0);
if(errno)
perror(NULL);
} while(errno && errno != EINTR);
... and should look like:
if (-1 == msgrcv(que_id, received, sizeof(Msgbuf) - sizeof(long), getpid(), 0))
{
/* Only here does errno have a well-defined value. */
perror("msgrcv() failed"); /* perror() translates errno into a human readable text prefixed by its argument and logs it to the stderr. */
}
else
{
errno = 0;
}
} while (errno && errno != EINTR);
This BTW
do{
errno = 0;
execve("./child", child_arg, NULL);
}while(errno);
only works because the members of the exec*() family of functions only return on error. So whenever the while condition is tested, execve() has failed and errno has been set. The initial errno = 0; is therefore useless here as well.
There are a number of problems with your program. It invokes undefined behaviour by calling exit, free, and msgctl from within the signal handlers. The table in the Signal Actions section of The Open Group Base Specifications lists the functions that are safe to call from within a signal handler. In most cases, you simply want to toggle a "running" flag from within the handler and have your main loop run until it is told to exit. Something like the following simple example:
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/* this will be set when the signal is received */
static volatile sig_atomic_t running = 1;
void
sig_handler(int signo, siginfo_t *si, void *context)
{
running = 0;
}
int
main(int argc, char *argv[])
{
int rc;
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = &sig_handler;
rc = sigaction(SIGINT, &sa, NULL);
if (rc < 0) {
perror("sigaction");
exit(EXIT_FAILURE);
}
printf("Waiting for SIGINT\n");
while (running) {
printf("... sleeping for 10 seconds\n");
sleep(10);
}
printf("Signal received\n");
return 0;
}
I put together a more complex session on repl.it as well.
The other problem is that you assume that errno retains a zero value across function calls. This is likely the case but the only thing that you should assume about errno is that it will be assigned a value when a library function returns a failure code -- e.g., read returns -1 and sets errno to something that indicates the error. The conventional way to call a C runtime library function is to check the return value and consult errno when appropriate:
int bytes_read;
unsigned char buf[128];
bytes_read = read(some_fd, &buf[0], sizeof(buf));
if (bytes_read < 0) {
printf("read failed: %s (%d)\n", strerror(errno), errno);
}
Your application is probably looping because the parent is misbehaving and not waiting on the child or something similar (see above about undefined behavior). If the message queue is removed before the child exits, then the msgrcv call is going to fail and set errno to EINVAL. You should check if msgrcv is failing before you check errno. The child should also be terminating the loop when it encounters a msgrcv failure with errno equal to EINVAL since that is a terminal condition -- the anonymous message queue can never be recreated after it ceases to exist.
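As a rough sketch of that last point, the child's receive loop could be wrapped like this. The struct layout is assumed to match the Msgbuf from the question, and receive_or_give_up is a made-up name. The idea is to retry only on EINTR and treat every other failure (EINVAL or EIDRM after the queue is removed, for instance) as a reason to stop.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Message layout assumed to match the question's Msgbuf. */
struct msg_payload {
    long mtype;
    char message[10];
};

/* Hypothetical helper: receive one message of type `type`, retrying only when
 * the call was interrupted by a signal. Returns the byte count on success, or
 * -1 with errno left set for any error that makes retrying pointless. */
ssize_t receive_or_give_up(int que_id, struct msg_payload *out, long type)
{
    for (;;) {
        ssize_t n = msgrcv(que_id, out, sizeof out->message, type, 0);
        if (n >= 0)
            return n;    /* success: errno is not meaningful here */
        if (errno != EINTR)
            return -1;   /* terminal error: the caller should stop looping */
        /* EINTR: a handler ran; a real program would check its
         * "keep running" flag here before retrying. */
    }
}
```

The caller only looks at errno after seeing -1, which is the pattern both answers recommend.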
I am writing a program that takes a list of UNIX commands from a file and executes them sequentially. To keep everything in order, I must have each command initialized and kept waiting for SIGUSR1 via sigwait(). When every command is initialized, then every command can execute.
Usage: > program.c input.txt
However, it appears that SIGUSR1 is repeatedly raised, completely bypassing sigwait(). What is going on here? I've tried many different things; the current version is modeled after this answer. To rephrase: I want the signal to be raised for each command immediately after initialization, and I want the signal to be unblocked once all commands are completely initialized.
#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
void on_sigusr1(int sig)
{
// Note: Normally, it's not safe to call almost all library functions in a
// signal handler, since the signal may have been received in a middle of a
// call to that function.
printf("SIGUSR1 received!\n");
}
int main(int arc, char* argv[])
{
FILE *file;
file = fopen(argv[1] ,"r");
int BUF_SIZE = 100;
char *token;
char buffer[BUF_SIZE];
char programs[BUF_SIZE];
char *commands[BUF_SIZE];
int i = 0;
int counter = 1;
while (fgets(buffer, sizeof buffer, file ) != NULL)
{
strcpy(programs, buffer);
int length = strlen(buffer)-1;
if (buffer[length] == '\n')
{
buffer[length] = '\0';
}
i = 0;
token = strtok(buffer," ");
while(token != NULL)
{
commands[i++] = token;
token = strtok(NULL, " ");
}
commands[i] = 0;
pid_t pids[counter];
// Set a signal handler for SIGUSR1
signal(SIGUSR1, &on_sigusr1);
// At program startup, SIGUSR1 is neither blocked nor pending, so raising it
// will call the signal handler
raise(SIGUSR1);
// Now let's block SIGUSR1
sigset_t sigset;
sigemptyset(&sigset);
sigaddset(&sigset, SIGUSR1);
sigprocmask(SIG_BLOCK, &sigset, NULL);
// SIGUSR1 is now blocked, raising it will not call the signal handler
printf("About to raise SIGUSR1\n");
raise(SIGUSR1);
printf("After raising SIGUSR1\n");
for(i = 0; i < counter; ++i)
{
pids[i] = fork();
if(pids[i] > 0)
{
printf("Child process %d ready to execute command %s", getpid(), programs);
// SIGUSR1 is now blocked and pending -- this call to sigwait will return
// immediately
int sig;
int result = sigwait(&sigset, &sig);
if(result == 0) {
printf("Child process %d executing command %s", getpid(), programs);
execvp(commands[0], commands);
}
}
}
// All programs have been launched
for(i = 0; i < counter; ++i)
{
wait(&pids[i]);
}
// All programs are waiting to execute
for (i = 0; i < counter; ++i)
{
// SIGUSR1 is now no longer pending (but still blocked). Raise it again and
// unblock it
raise(SIGUSR1);
printf("About to unblock SIGUSR1\n");
sigprocmask(SIG_UNBLOCK, &sigset, NULL);
printf("Unblocked SIGUSR1\n");
}
}
exit(0);
fclose(file);
return 0;
}
UPDATE: Tried changing signal() to sigaction(). No change.
You should consider calling sigwait after checking to see if that pid is a child process.
So maybe put
int sig;
and
int result = sigwait(&sigset, &sig);
within an if statement that checks whether the pid is == 0, which would indicate that it is the child. Otherwise you would be calling sigwait in the parent process.
If the pid is greater than 0, you are in the parent and the value is the child's process ID; if it's less than zero, it's an error.
Then for each process in your array of pids you would call kill(pid_array[i], SIGUSR1) to release it from sigwait.
First let me say that there are a lot of questions in here.
One of the tasks for my thesis requires me to write a program that executes a sub-program and kills it if its running time (not wall time but user+sys) exceeds a specific value or its RAM consumption exceeds another specified value.
I have not figured out the RAM part yet. The time-based killing I do with setitimer and the ITIMER_PROF timer. (ITIMER_PROF counts actual CPU usage rather than setting a starting point in time and then counting for x amount of time.)
The reason I use setitimer is that I need sub-second precision (e.g. kill the process after 1.75 seconds, i.e. 1750000 microseconds). The setrlimit method only has one-second granularity.
Question 1 Why doesn't setitimer with ITIMER_PROF work when it's set in the parent process? Is the CPU time / system call time of the child not collected by it?
childPID = fork();
if (childPID == -1){
printf( "Puff paff ... fork() did not work !\n" );
exit(1);
}
// Child
if(childPID == 0) {
execvp(args[0], args);
exit(1);
}
// Parent
else{
// Using a ITIMER_PROF inside the parent program will not work!
// The child may take 1 hour to execute and the parent will wait it out!
// To fix this we need to use a ITIMER_REAL ( wall-time ) but that's not an accurate measurement
struct itimerval timer;
timer.it_value.tv_sec = 0;
timer.it_value.tv_usec = 500000;
timer.it_interval.tv_sec = 0;
timer.it_interval.tv_usec = 500000;
setitimer ( ITIMER_PROF, &timer, NULL);
int status;
waitpid(childPID,&status,0);
if (WIFEXITED(status)) {
fprintf(stderr, "Nice nice, the child exited ... with cPID = %d with status = %d \n", cPID, WEXITSTATUS(status) );
}
}
Question 2 Why does this WORK!? Doesn't the execvp overwrite all the functions ( timeout_sigprof, main and any other)? And couldn't someone potentially catch the signal in the child program and supersede the original function ?
void timeout_sigprof( int signum ){
fprintf(stderr, "The alarm SIGPROF is here !\nThe actual pid: %d\n", getpid());
//TODO: Write output and say the child terminated with
// ram or time limit exceeded
exit(105); // Note the 105 !
}
childPID = fork();
if (childPID == -1){
printf( "Puff paff ... fork() did not work !\n" );
exit(1);
}
// Child
if(childPID == 0) {
//
struct sigaction sa;
memset (&sa, 0, sizeof (sa));
sa.sa_handler = &timeout_sigprof;
sigaction (SIGPROF, &sa, NULL);
struct itimerval timer;
timer.it_value.tv_sec = 0;
timer.it_value.tv_usec = 250000;
timer.it_interval.tv_sec = 0;
timer.it_interval.tv_usec = 250000;
setitimer ( ITIMER_PROF, &timer, NULL);
execvp(args[0], args);
exit(1);
}
// Parent process
else {
// Waiting for the child
int status;
waitpid(childPID,&status,0);
if (WIFEXITED(status)) {
fprintf(stderr, "Nice nice, the child exited ... with cPID = %d with status = %d \n", cPID, WEXITSTATUS(status) );
}
exit(0);
}
Question 3 Why does the dup2 placed here actually work and let the child's input / output be redirected?
childPID = fork();
if (childPID == -1){
printf( "Puff paff ... fork() did not work !\n" );
exit(1);
}
// Child
if(childPID == 0) {
// Redirect all I/O to/from a file
int outFileId = open("output", O_WRONLY | O_TRUNC | O_CREAT, S_IRUSR | S_IRGRP | S_IWGRP | S_IWUSR);
// Redirect the output for the CHILD program. Still don't know why it works.
dup2(outFileId, 1)
// No idea why these dup2's work ! As i close the file descriptors here ?!
close(outFileId);
execvp(args[0], args);
exit(1);
}
// Parent process
else {
// Waiting for the child
int status;
waitpid(childPID,&status,0);
if (WIFEXITED(status)) {
fprintf(stderr, "Nice nice, the child exited ... with cPID = %d with status = %d \n", cPID, WEXITSTATUS(status) );
}
exit(0);
}
Here is the code I wrote that kills a program only after it has been running for X amount of time (x = 500ms).
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <sys/time.h>
volatile pid_t childPID;
// This function should exist only in the parent! The child should not have it after an exec*, according to:
// The exec() family of functions replaces the current process image with a new process image.
void timeout_sigprof( int signum ){
fprintf(stderr, "The alarm SIGPROF is here !\nThe actual pid: %d\n", getpid());
//TODO: Write output and say the child terminated with a ram or time limit exceeded
exit(105); // Note the 105 !
}
int main(int argc, char *argv[]) {
int cstatus;
pid_t cPID;
char *args[2];
args[0] = "/home/ddanailov/Projects/thesis/programs/prime/prime";
args[1] = NULL; // Indicates the end of arguments.
// Handle the SIGPROF signal in the function time_handler in both the child and
struct sigaction sa;
memset (&sa, 0, sizeof (sa));
sa.sa_handler = &timeout_sigprof;
sigaction (SIGPROF, &sa, NULL);
childPID = fork();
if (childPID == -1){
printf( "Puff paff ... fork() did not work !\n" );
exit(1);
}
// Child
if(childPID == 0) {
struct itimerval timer;
timer.it_value.tv_sec = 0;
timer.it_value.tv_usec = 250000;
timer.it_interval.tv_sec = 0;
timer.it_interval.tv_usec = 250000;
setitimer ( ITIMER_PROF, &timer, NULL);
// Redirect all I/O to/from a file
int outFileId = open("output", O_WRONLY | O_TRUNC | O_CREAT, S_IRUSR | S_IRGRP | S_IWGRP | S_IWUSR);
// int inFileId = open("input");
// Redirect the output for the CHILD program. Still don't know why it works.
//dup2(inFileId, 0);
dup2(outFileId, 1);
//dup2(outFileId, 2);
// No idea why these dup2's work ! As i close the file descriptors here ?!
close(outFileId);
close(inFileId);
execvp(args[0], args);
exit(1);
}
// Parent process
else {
// Waiting for the child
int status;
waitpid(childPID,&status,0);
if (WIFEXITED(status)) {
fprintf(stderr, "Nice nice, the child exited ... with cPID = %d with status = %d \n", cPID, WEXITSTATUS(status) );
}
exit(0);
}
return 0;
}
Any help / explanation will be much appreciated !
Thank you all in advance,
Ex
Question 1
Why doesn't setitimer with ITIMER_PROF work when it's
set in the parent process? Is the CPU time / system call time of the
child not collected by it?
No, it is not. The timer associated with ITIMER_PROF only decrements while the process that set the timer is executing, or while system calls are executing on its behalf, not while child processes are executing.
These signals are normally used by profiling instrumentation which is contained in libraries that you link to the program you are trying to profile.
However: you probably do not need to have the signal sent to the parent process anyway. If your goal is to terminate the program once it has exceeded the usage allowed, then let it receive the SIGPROF and exit (as seen in my answer to Q2 below). Then, after waitpid returns and you detect that the program has exited due to SIGPROF, you can find out the actual amount of time that the child used by calling times or getrusage.
The only drawback to this is that the child program could subvert this process by setting its own signal handler on SIGPROF.
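For the "find out how much time the child actually used" part, a minimal sketch with getrusage might look like this. The function name and the busy loop standing in for the real child program are made up for the example. Note that RUSAGE_CHILDREN is cumulative over every child that has been waited for, so with a single child it is simply that child's usage.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs a short CPU-bound child, waits for it, and
 * returns the user+system CPU time in microseconds charged to waited-for
 * children, or -1 on error. */
long child_cpu_time_usec(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Stand-in for execvp(): burn a little CPU, then exit. */
        volatile unsigned long spin = 0;
        while (spin < 20000000UL)
            spin++;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* Usage is only accounted to the parent after the child was reaped. */
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) == -1)
        return -1;

    return (long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000L
         + (long)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec);
}
```

In the watcher program, this call would go right after waitpid returns, next to the WIFSIGNALED check.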
Question 2
Why does this WORK!? Doesn't the execvp overwrite all the
functions ( timeout_sigprof, main and any other)? And couldn't someone
potentially catch the signal in the child program and supersede the
original function ?
It doesn't, or at least not the way you may be thinking. As you say, the signal handler you have installed in the parent process is replaced, along with everything else, by the new image that is loaded by execvp.
The reason it appears to work is that exec resets every caught signal to its default action. If the new program does not set a signal handler for SIGPROF, then the default action applies when that signal is sent to the process, and the default action for SIGPROF is to terminate the process.
If the program which is being executed by execvp does set a signal handler for SIGPROF, then it will not be terminated.
Update
After seeing your comment, I thought I had better try your program. I added another branch to the if statement after waitpid, like this:
waitpid(childPID,&status,0);
if (WIFEXITED(status)) {
fprintf(stderr, "Nice nice, the child exited ... with cPID = %d with status = %d \n", childPID, WEXITSTATUS(status) );
} else if (WIFSIGNALED(status)) {
fprintf(stderr, "Process pid=%d received signal %d\n",childPID,WTERMSIG(status));
}
When I run this, I see the following:
$ ./watcher
Process pid=1045 received signal 27
This validates what I am saying above. I do not see the string "The alarm SIGPROF is here !" printed, and I do see an indication in the parent that the child was killed by signal 27, which is SIGPROF.
I can only think of one scenario under which you would see the signal handler execute, and that would be if the timer is set so low that it fires before execv actually manages to load the new image. This does not look entirely likely though.
Another possibility is that you have inadvertently installed the same signal handler in your target program (copy paste error?).
Question 3
Why does the dup2 placed here actually work and let the
child's input / output be redirected?
I assume from the comments in the code that you mean "why does it work even though I have closed the original file descriptors immediately after dup2?"
dup2 duplicates the old FD into the new FD, so after executing:
dup2(outFileId, 1);
you have two file descriptors referencing the same file description: the one contained in variable outFileId, and FD 1 (which is stdout). Also note that the original stdout will be closed by this action.
File Descriptors are like references to an underlying file description data structure, which represents the open file. After calling dup2, there are two file descriptors pointing to the same file description.
The man page for close says:
If fd is the last file descriptor referring to the underlying open
file description (see open(2)), the resources associated with the open
file description are freed
So it is working as it should: you still have one FD open, FD 1 (stdout), and the new child process can write to it.
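A small sketch that demonstrates this without touching stdout: it duplicates the write end of a pipe onto another descriptor number, closes the original, and checks that writing through the duplicate still works. The helper name and the target descriptor number 100 are arbitrary choices for this example (the number is assumed to be unused); the question uses 1 instead.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if data written through a dup2() duplicate
 * can still be read back after the original descriptor was closed,
 * -1 otherwise. */
int dup2_survives_close(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    const int target = 100;   /* arbitrary, assumed to be unused */
    if (dup2(p[1], target) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Closing the original leaves `target` as the last descriptor that
     * refers to the write end, so the open file description stays alive. */
    close(p[1]);

    const char msg[] = "ok";
    char buf[sizeof msg] = {0};
    int rc = -1;
    if (write(target, msg, sizeof msg) == (ssize_t)sizeof msg &&
        read(p[0], buf, sizeof buf) == (ssize_t)sizeof msg &&
        memcmp(buf, msg, sizeof msg) == 0)
        rc = 0;

    close(target);
    close(p[0]);
    return rc;
}
```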
What I'm trying to do is have a child process run in the background while the parent goes and does something else. When the child returns, I'd like for the parent to get that status and do something with it. However, I don't want to explicitly wait for the child at any point.
I looked into the WNOHANG option of waitpid, but this seems to be a little different from what I'm looking for. WNOHANG only gets the status if the child is already done; otherwise it moves on. Ideally, I'd like some option that will not wait for the process, but will jump back and grab the return status when it's done.
Is there any way to do this?
Code example:
pid_t p = fork();
if (p == 0){
value = do_child_stuff();
return(value);
}
if (p > 0){
captureStatus(p, &status); //NOT A REAL FUNCTION
// captureStatus will put p's exit status in status
// whenever p returns, without waiting or pausing for p
//do other stuff.....
}
Is there any way to simulate the behavior of captureStatus?
You could establish a signal handler for SIGCHLD and wait for the process once that triggers (it will trigger when the child terminates or is killed).
However, be aware that very few useful things can be done in a signal handler. Everything must be async-signal-safe. The standard specifically mentions wait and waitpid as safe.
Here's the proper way (or at least one proper way) to create an asynchronous wait out of a synchronous one:
struct waitpid_async_args {
pid_t pid;
int *status;
int flags;
sem_t sem, *done;
int *err;
};
static void *waitpid_async_start(void *arg)
{
struct waitpid_async_args *a = arg;
pid_t pid = a->pid;
int *status = a->status, flags = a->flags, *err = a->err;
sem_post(&a->sem);
if (waitpid(pid, status, flags) < 0) *err = errno;
else *err = 0;
sem_post(a->done);
pthread_detach(pthread_self());
return 0;
}
int waitpid_async(pid_t pid, int *status, int flags, sem_t *done, int *err)
{
struct waitpid_async_args a = { .pid = pid, .status = status,
.flags = flags, .done = done, .err = err };
sigset_t set;
pthread_t th;
int ret;
sem_init(&a.sem, 0, 0);
sigfillset(&set);
pthread_sigmask(SIG_BLOCK, &set, &set);
ret = pthread_create(&th, 0, waitpid_async_start, &a);
if (!ret) sem_wait(&a.sem);
pthread_sigmask(SIG_SETMASK, &set, 0);
return ret;
}
Note that the asynchronous function takes as an extra argument a semaphore it will post to flag that it's done. You could just examine status, but without a synchronization object there's no formal guarantee of memory ordering, so it's better to use an approach like this and call sem_trywait or sem_timedwait with a timeout to check whether the status is available yet before accessing it.
You can use shared memory IPC functions like shmget and shmat to communicate between the two processes. This can be non-blocking and may work well for producer/consumer models like the one you are describing. You will still have to poll, though.
shmget should be called before the fork to create the shared memory block.
Then you can use shmat after the fork to get a pointer to the shared memory and just use it as a buffer from there on.
When finished call shmdt on both child and parent to detach, and shmctl to remove the shared memory.
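The steps above can be sketched roughly as follows. The helper name is made up, and the parent uses waitpid here only to keep the example deterministic; in the scenario from the question the parent would poll the shared value instead. It assumes System V shared memory is available on the system and that the value passed is non-negative, since -1 is used to report errors.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes `value` into a shared int and the
 * parent reads it back. Returns the value read, or -1 on error. */
int pass_int_through_shm(int value)
{
    /* Create the segment before fork() so both processes know its id. */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Child: attach, write the result, detach. */
        int *shared = shmat(shmid, NULL, 0);
        if (shared == (void *)-1)
            _exit(1);
        *shared = value;
        shmdt(shared);
        _exit(0);
    }

    /* Parent: attach, wait for the child to finish, then read. */
    int result = -1;
    int status;
    int *shared = shmat(shmid, NULL, 0);
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
        shared != (void *)-1)
        result = *shared;

    if (shared != (void *)-1)
        shmdt(shared);
    shmctl(shmid, IPC_RMID, NULL);   /* remove the segment */
    return result;
}
```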
There is an example on the web here.
I like Vyktor's solution - in the parent process, start a thread to block on the child process and set the status variable when it's done. Here is an implementation:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>
struct return_monitor {
pid_t pid;
int *return_status;
};
/* pthread_create() expects a start routine of type void *(*)(void *) */
static void *
captureStatus(void *arg)
{
struct return_monitor *monitor = arg;
printf("Monitor thread started...\n");
int return_status;
waitpid(monitor->pid, &return_status, 0);
*(monitor->return_status) = WEXITSTATUS(return_status);
return NULL;
}
int
main()
{
printf("Parent process started...\n");
pid_t p = fork();
if (p == 0){
/* child */
printf("Child process started...\n");
int i;
for (i = 0; i < 10; ++i) {
sleep(1);
printf("Child iteration %d...\n", i);
}
/* arbitrary return value that will be recognizable in parent */
return(3);
}
if (p > 0){
int child_return_status = -1;
struct return_monitor monitor = {p, &child_return_status};
pthread_t monitor_thread;
pthread_create(&monitor_thread, NULL, captureStatus, &monitor);
int i;
for (i = 0; i < 10; ++i) {
sleep(2);
printf("Parent process iteration %d (Child return status %d)...\n",
i, child_return_status);
}
/* captureStatus(p, &status); */
}
}
How could I track down the death of a child process without making the parent process wait until the child process got killed?
I am trying a client-server scenario where the server accepts the connection from a client and forks a new process for each and every connection it accepts.
I am ignoring SIGCHLD signals to prevent zombie creation.
signal(SIGCHLD, SIG_IGN);
while(1)
{
accept();
clients++;
if(fork() ==0)
{
childfunction();
clients--;
}
else
{
}
}
The problem in the above scenario is that if the child process gets killed in the childfunction() function, the global variable clients is not getting decremented.
NOTE: I am looking for a solution without using SIGCHLD signal ... If possible
Typically you write a handler for SIGCHLD which calls waitpid() on pid -1. You can use the return value from that to determine what pid died. For example:
void my_sigchld_handler(int sig)
{
pid_t p;
int status;
/* > 0: with WNOHANG, waitpid() returns 0 while children are still running */
while ((p=waitpid(-1, &status, WNOHANG)) > 0)
{
/* Handle the death of pid p */
}
}
/* It's better to use sigaction() over signal(). You won't run into the
* issue where BSD signal() acts one way and Linux or SysV acts another. */
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = my_sigchld_handler;
sigaction(SIGCHLD, &sa, NULL);
Alternatively you can call waitpid(pid, &status, 0) with the child's process ID specified, and synchronously wait for it to die. Or use WNOHANG to check its status without blocking.
None of the solutions so far offer an approach without using SIGCHLD as the question requests. Here is an implementation of an alternative approach using poll as outlined in this answer (which also explains why you should avoid using SIGCHLD in situations like this):
Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
(Note: I omitted some best practices like error-checking and cleaning up file descriptors for brevity)
/**
* Specifies the maximum number of clients to keep track of.
*/
#define MAX_CLIENT_COUNT 1000
/**
* Tracks clients by storing their process IDs and pipe file descriptors.
*/
struct process_table {
pid_t clientpids[MAX_CLIENT_COUNT];
struct pollfd clientfds[MAX_CLIENT_COUNT];
} PT;
/**
* Initializes the process table. -1 means the entry in the table is available.
*/
void initialize_table() {
for (int i = 0; i < MAX_CLIENT_COUNT; i++) {
PT.clientfds[i].fd = -1;
}
}
/**
* Returns the index of the next available entry in the process table.
*/
int get_next_available_entry() {
for (int i = 0; i < MAX_CLIENT_COUNT; i++) {
if (PT.clientfds[i].fd == -1) {
return i;
}
}
return -1;
}
/**
* Adds information about a new client to the process table.
*/
void add_process_to_table(int i, pid_t pid, int fd) {
PT.clientpids[i] = pid;
PT.clientfds[i].fd = fd;
}
/**
* Removes information about a client from the process table.
*/
void remove_process_from_table(int i) {
PT.clientfds[i].fd = -1;
}
/**
* Cleans up any dead child processes from the process table.
*/
void reap_zombie_processes() {
int p = poll(PT.clientfds, MAX_CLIENT_COUNT, 0);
if (p > 0) {
for (int i = 0; i < MAX_CLIENT_COUNT; i++) {
/* Has the pipe closed? */
if ((PT.clientfds[i].revents & POLLHUP) != 0) {
// printf("[%d] done\n", PT.clientpids[i]);
waitpid(PT.clientpids[i], NULL, 0);
remove_process_from_table(i);
}
}
}
}
/**
* Simulates waiting for a new client to connect.
*/
void accept() {
sleep((rand() % 4) + 1);
}
/**
* Simulates useful work being done by the child process, then exiting.
*/
void childfunction() {
sleep((rand() % 10) + 1);
exit(0);
}
/**
* Main program
*/
int main() {
/* Initialize the process table */
initialize_table();
while (1) {
accept();
/* Create the pipe */
int p[2];
pipe(p);
/* Fork off a child process. */
pid_t cpid = fork();
if (cpid == 0) {
/* Child process */
close(p[0]);
childfunction();
}
else {
/* Parent process */
close(p[1]);
int i = get_next_available_entry();
add_process_to_table(i, cpid, p[0]);
// printf("[%d] started\n", cpid);
reap_zombie_processes();
}
}
return 0;
}
And here is some sample output from running the program with the printf statements uncommented:
[31066] started
[31067] started
[31068] started
[31069] started
[31066] done
[31070] started
[31067] done
[31068] done
[31071] started
[31069] done
[31072] started
[31070] done
[31073] started
[31074] started
[31072] done
[31075] started
[31071] done
[31074] done
[31081] started
[31075] done
You don't want a zombie. If a child process dies and the parent is still RUNNING but never issues a wait()/waitpid() call to harvest the status, the system does not release the resources associated with the child and a zombie/defunct process is left in the proc table.
Try changing your SIGCHLD handler to something closer to the following:
void chld_handler(int sig) {
pid_t p;
int status;
/* loop as long as there are children to process */
while (1) {
/* retrieve child process ID (if any) */
p = waitpid(-1, &status, WNOHANG);
/* check for conditions causing the loop to terminate */
if (p == -1) {
/* continue on interruption (EINTR) */
if (errno == EINTR) {
continue;
}
/* break on anything else (EINVAL or ECHILD according to manpage) */
break;
}
else if (p == 0) {
/* no more children to process, so break */
break;
}
/* valid child process ID retrieved, process accordingly */
...
}
}
You could optionally mask/block additional SIGCHLD signals during execution of the signal handler using sigprocmask(). The blocked mask must be returned to its original value when the signal handling routine has finished.
If you really don't want to use a SIGCHLD handler, you could try adding the child processing loop somewhere where it would be called regularly and poll for terminated children.
The variable 'clients' lives in different process address spaces after fork(), so when you decrement the variable in the child, this will not affect the value in the parent. I think you need to handle SIGCHLD to maintain the count correctly.
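A tiny sketch that shows the effect, using a made-up helper name. The child decrements its own copy of the counter, and the parent still sees the old value after the child has exited.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int clients = 0;

/* Hypothetical helper: returns the parent's view of `clients` after a child
 * has decremented its own copy, or -1 on error. */
int clients_after_child_decrement(void)
{
    clients = 1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        clients--;   /* changes only the child's private copy */
        _exit(0);
    }

    if (waitpid(pid, NULL, 0) != pid)
        return -1;

    /* Still 1 here: the parent has to do the decrement itself,
     * for example at the point where it reaps the child. */
    return clients;
}
```

So whichever way the parent learns about the child's death (SIGCHLD, polling waitpid with WNOHANG, or a closed pipe), the clients-- has to happen in the parent.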