How could I track down the death of a child process without making the parent process wait until the child process got killed?
I am trying a client-server scenario where the server accepts the connection from a client and forks a new process for each and every connection it accepts.
I am ignoring SIGCHLD signals to prevent zombie creation.
signal(SIGCHLD, SIG_IGN);
while(1)
{
accept();
clients++;
if(fork() ==0)
{
childfunction();
clients--;
}
else
{
}
}
The problem in the above scenario is that if the child process gets killed in the childfunction() function, the global variable clients is not getting decremented.
NOTE: I am looking for a solution without using SIGCHLD signal ... If possible
Typically you write a handler for SIGCHLD which calls waitpid() on pid -1. You can use the return value from that to determine what pid died. For example:
void my_sigchld_handler(int sig)
{
pid_t p;
int status;
while ((p=waitpid(-1, &status, WNOHANG)) > 0)
{
/* Handle the death of pid p */
}
}
/* It's better to use sigaction() over signal(). You won't run into the
* issue where BSD signal() acts one way and Linux or SysV acts another. */
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = my_sigchld_handler;
sigaction(SIGCHLD, &sa, NULL);
Alternatively you can call waitpid(pid, &status, 0) with the child's process ID specified, and synchronously wait for it to die. Or use WNOHANG to check its status without blocking.
None of the solutions so far offer an approach without using SIGCHLD as the question requests. Here is an implementation of an alternative approach using poll as outlined in this answer (which also explains why you should avoid using SIGCHLD in situations like this):
Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
(Note: I omitted some best practices like error-checking and cleaning up file descriptors for brevity)
/**
* Specifies the maximum number of clients to keep track of.
*/
#define MAX_CLIENT_COUNT 1000
/**
* Tracks clients by storing their process IDs and pipe file descriptors.
*/
struct process_table {
pid_t clientpids[MAX_CLIENT_COUNT];
struct pollfd clientfds[MAX_CLIENT_COUNT];
} PT;
/**
* Initializes the process table. -1 means the entry in the table is available.
*/
void initialize_table() {
for (int i = 0; i < MAX_CLIENT_COUNT; i++) {
PT.clientfds[i].fd = -1;
}
}
/**
* Returns the index of the next available entry in the process table.
*/
int get_next_available_entry() {
for (int i = 0; i < MAX_CLIENT_COUNT; i++) {
if (PT.clientfds[i].fd == -1) {
return i;
}
}
return -1;
}
/**
* Adds information about a new client to the process table.
*/
void add_process_to_table(int i, pid_t pid, int fd) {
PT.clientpids[i] = pid;
PT.clientfds[i].fd = fd;
}
/**
* Removes information about a client from the process table.
*/
void remove_process_from_table(int i) {
PT.clientfds[i].fd = -1;
}
/**
* Cleans up any dead child processes from the process table.
*/
void reap_zombie_processes() {
int p = poll(PT.clientfds, MAX_CLIENT_COUNT, 0);
if (p > 0) {
for (int i = 0; i < MAX_CLIENT_COUNT; i++) {
/* Has the pipe closed? */
if ((PT.clientfds[i].revents & POLLHUP) != 0) {
// printf("[%d] done\n", PT.clientpids[i]);
waitpid(PT.clientpids[i], NULL, 0);
remove_process_from_table(i);
}
}
}
}
/**
* Simulates waiting for a new client to connect.
*/
void accept() {
sleep((rand() % 4) + 1);
}
/**
* Simulates useful work being done by the child process, then exiting.
*/
void childfunction() {
sleep((rand() % 10) + 1);
exit(0);
}
/**
* Main program
*/
int main() {
/* Initialize the process table */
initialize_table();
while (1) {
accept();
/* Create the pipe */
int p[2];
pipe(p);
/* Fork off a child process. */
pid_t cpid = fork();
if (cpid == 0) {
/* Child process */
close(p[0]);
childfunction();
}
else {
/* Parent process */
close(p[1]);
int i = get_next_available_entry();
add_process_to_table(i, cpid, p[0]);
// printf("[%d] started\n", cpid);
reap_zombie_processes();
}
}
return 0;
}
And here is some sample output from running the program with the printf statements uncommented:
[31066] started
[31067] started
[31068] started
[31069] started
[31066] done
[31070] started
[31067] done
[31068] done
[31071] started
[31069] done
[31072] started
[31070] done
[31073] started
[31074] started
[31072] done
[31075] started
[31071] done
[31074] done
[31081] started
[31075] done
You don't want a zombie. If a child process dies and the parent is still RUNNING but never issues a wait()/waitpid() call to harvest the status, the system does not release the resources associated with the child and a zombie/defunct process is left in the proc table.
Try changing your SIGCHLD handler to something closer to the following:
void chld_handler(int sig) {
pid_t p;
int status;
/* loop as long as there are children to process */
while (1) {
/* retrieve child process ID (if any) */
p = waitpid(-1, &status, WNOHANG);
/* check for conditions causing the loop to terminate */
if (p == -1) {
/* continue on interruption (EINTR) */
if (errno == EINTR) {
continue;
}
/* break on anything else (EINVAL or ECHILD according to manpage) */
break;
}
else if (p == 0) {
/* no more children to process, so break */
break;
}
/* valid child process ID retrieved, process accordingly */
...
}
}
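You could optionally block additional signals during execution of the signal handler. When the handler is installed with sigaction(), the signal being handled is already blocked while the handler runs (unless SA_NODEFER is set), and further signals can be listed in sa_mask. If you change the mask manually with sigprocmask() instead, it must be returned to its original value when the signal handling routine has finished.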
If you really don't want to use a SIGCHLD handler, you could try adding the child processing loop somewhere where it would be called regularly and poll for terminated children.
After fork(), the variable 'clients' lives in a separate address space in each process, so decrementing it in the child does not affect the value in the parent. I think you need to handle SIGCHLD to keep the count correct.
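If a SIGCHLD handler is not wanted, the count can still be kept entirely in the parent by reaping finished children with a non-blocking waitpid() at regular points, for example once per pass through the accept loop. The following is only a minimal sketch of that idea: start_client() and reap_finished_clients() are hypothetical names, and the child body is a stand-in for childfunction().

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent-side count of children that have been started but not yet reaped. */
static int clients = 0;

/* Forks one child and counts it in the parent. Returns the child's pid in the
 * parent, or -1 if fork() failed. The child body stands in for childfunction(). */
pid_t start_client(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: do the per-connection work here, then leave. */
        _exit(0);
    }
    if (pid > 0) {
        clients++;          /* only the parent's copy is ever updated */
    }
    return pid;
}

/* Reaps every child that has already terminated, without blocking.
 * Returns the number of children reaped during this call (possibly 0). */
int reap_finished_clients(void)
{
    int reaped = 0;
    for (;;) {
        pid_t p = waitpid(-1, NULL, WNOHANG);
        if (p > 0) {        /* one terminated child collected */
            clients--;
            reaped++;
            continue;
        }
        if (p == -1 && errno == EINTR) {
            continue;       /* interrupted by a signal, try again */
        }
        break;              /* 0: children still running; -1: no children left */
    }
    return reaped;
}
```

Calling reap_finished_clients() right after each accept() keeps the counter approximately current; a child that dies between two calls is simply picked up on the next pass.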
I'm trying to implement a producer-consumer application using 1 parent process and 1 child process. The program should work like this:
1 - The parent process is the producer and the child process is the consumer.
2 - The producer creates a file, the consumer removes the file.
3 - After the file has been created, the parent process sends a SIGUSR1 signal to the child process which then removes the file and sends a SIGUSR2 signal to the parent, signaling it that the file can be created again.
I've tried implementing this problem but I keep getting this error:
User defined signal 1: 30.
I don't really understand what could be the problem. I've just started learning about process and signals and maybe I'm missing something. Any help would be appreciated. Here's my implementation:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
pid_t child, parent;
void producer()
{
system("touch file");
printf("File was created.\n");
}
void consumer()
{
system("rm file");
printf("File was deleted.\n");
kill(parent, SIGUSR2); // signal -> file can be created by parent
}
int main(void)
{
system("touch file");
pid_t pid = fork();
for(int i = 0; i < 10; ++i)
{
if(pid < 0) // error fork()
{
perror("fork()");
return -1;
}
else if(pid == 0) // child proces - consumer
{
child = getpid();
signal(SIGUSR1, consumer);
pause();
}
else // parent process - producer
{
parent = getpid();
signal(SIGUSR2, producer);
// signal -> file can be deleted by child
kill(child, SIGUSR1);
}
}
return 0;
}
Edit: I forgot to mention that there can only be one file at a time.
...Any help would be appreciated.
Regarding the error User defined signal 1: 30, it is possible that the speed of execution is precipitating a race condition, causing termination before your handler functions are registered. Keep in mind that each signal has a default disposition (or action). For SIGUSR1 and SIGUSR2 the disposition is Term (from the table in the signal(7) page linked below):
SIGUSR1 30,10,16 Term User-defined signal 1
SIGUSR2 31,12,17 Term User-defined signal 2
(Note the value 30 listed by SIGUSR1 matches the exit condition you cite.)
The implication here would be that your handler functions had not registered before the first encounter with SIGUSR1, causing the default action of terminating your application and throwing the signal related error.
The relationship between synchronization and timing comes to mind as something to look at. I found several things written on synchronization, and linked one below.
Timing may be implicitly addressed with an adequate approach to synchronization, negating the need for any explicit execution flow control functions. However, if help is needed, experiment with the sleep family of functions.
Here are a couple of other general suggestions:
1) printf (and family) should really not be used in a signal handler.
2) But, if used, a newline ( \n ) is a good idea (which you have), or use fflush to force a write.
3) Run the program under strace to check whether any system call traffic is occurring.
Another code example of Synchronizing using signal().
Take a look at the signal(7) page (which is a lot of information, but hints at why using printf or fprintf inside a signal handler may not be a good idea in the first place).
Another collection of detailed information on Signal Handling.
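Apart from what @ryyker mentioned, another problem is that by the time your parent process tries to signal the child using the global variable child, the child has not had a chance to run and store its pid. In fact the parent's copy of child is never updated at all, because the child assigns it only in its own address space. The parent's copy is therefore still 0, and kill(0, SIGUSR1) signals every process in the caller's process group, including the parent itself. A better approach is to use the pid variable in the parent and getppid() in the child. Here is code that seems to give the desired output: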
void producer()
{
system("touch file");
printf("File was created.\n");
}
void consumer()
{
system("rm file");
printf("File was deleted.\n");
kill(getppid(), SIGUSR2); // signal -> file can be created by parent
}
int main(void)
{
system("touch file");
pid_t pid = fork();
if(pid < 0) // error fork()
{
perror("fork()");
return -1;
}
if(pid > 0) { //parent
signal(SIGUSR2, producer);
}
else { //child
signal(SIGUSR1, consumer);
}
for(int i = 0; i < 10; ++i)
{
if(pid == 0) {// child proces - consumer
pause();
}
else // parent process - producer
{
printf("Iter %d\n",i);
kill(pid, SIGUSR1);
pause();
}
}
return 0;
}
Try using semaphores instead of signals.
Signals serve special purposes in the OS, whereas semaphores are meant for process synchronization.
POSIX named semaphores can be used across processes, from C as well as C++.
The following pseudocode will help.
Semaphore Full,Empty;
------
Producer() //producer
{
WaitFor(Empty); //wait for an empty slot
system("touch file");
printf("File was created.\n");
Signal(Full); //Signal one slot is full
}
Consumer() //Consumer
{
WaitFor(Full); //wait for producer to produce
system("rm file");
printf("File was deleted.\n");
Signal(Empty);//Signal that it has consumed, so one empty slot created
}
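The pseudocode above could be turned into C with POSIX named semaphores roughly as follows. This is a sketch rather than a definitive implementation: the semaphore names, the function name run_producer_consumer() and its parameters are assumptions made for the example, the file is created with open() instead of system("touch file"), and on older glibc versions the program may need to be linked with -pthread.

```c
#include <fcntl.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical semaphore names, chosen for this sketch only. */
#define SEM_EMPTY_NAME "/pc_demo_empty"
#define SEM_FULL_NAME  "/pc_demo_full"

/* Runs `rounds` create/remove cycles on `path` with a child as consumer.
 * Returns 0 if every cycle succeeded, -1 otherwise. */
int run_producer_consumer(const char *path, int rounds)
{
    int ok = 1, status = 0;

    /* Start from a clean state in case an earlier run left things behind. */
    sem_unlink(SEM_EMPTY_NAME);
    sem_unlink(SEM_FULL_NAME);
    unlink(path);

    /* Empty starts at 1 (the slot is free), Full at 0 (nothing to consume). */
    sem_t *empty = sem_open(SEM_EMPTY_NAME, O_CREAT | O_EXCL, 0600, 1);
    sem_t *full  = sem_open(SEM_FULL_NAME,  O_CREAT | O_EXCL, 0600, 0);
    if (empty == SEM_FAILED || full == SEM_FAILED) {
        sem_unlink(SEM_EMPTY_NAME);
        sem_unlink(SEM_FULL_NAME);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        ok = 0;
    } else if (pid == 0) {                  /* child: consumer */
        for (int i = 0; i < rounds; i++) {
            sem_wait(full);                 /* wait for the producer */
            if (unlink(path) == -1)
                _exit(1);
            sem_post(empty);                /* the slot is free again */
        }
        _exit(0);
    } else {                                /* parent: producer */
        for (int i = 0; i < rounds && ok; i++) {
            sem_wait(empty);                /* wait for a free slot */
            /* O_EXCL makes open() fail if the file already exists. */
            int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
            if (fd == -1) {
                ok = 0;
                kill(pid, SIGKILL);         /* do not leave the consumer blocked */
            } else {
                close(fd);
                sem_post(full);             /* tell the consumer */
            }
        }
        if (waitpid(pid, &status, 0) == -1)
            ok = 0;
    }

    sem_close(empty);
    sem_close(full);
    sem_unlink(SEM_EMPTY_NAME);
    sem_unlink(SEM_FULL_NAME);
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because Empty starts at 1 and each side waits on the other's semaphore, at most one file exists at any time, and the O_EXCL flag turns a violation of that rule into a visible error instead of silent overwriting.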
After a lot of research and reading all of the suggestions I finally managed to make the program work. Here is my implementation. If you see any mistakes or perhaps something could have been done better, then feel free to correct my code. I'm open to suggestions.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
void signal_handler(int signal_number)
{
sigset_t mask;
if(sigemptyset(&mask) == -1 || sigfillset(&mask) == -1)
{// initialize signal set || block all signals
perror("Failed to initialize the signal mask.");
return;
}
switch(signal_number)
{
case SIGUSR1:
{
if(sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
{ // entering critical zone
perror("sigprocmask(1)");
return;
} //---------------------
sleep(1);
system("rm file"); /* critical zone */
puts("File was removed.");
//--------------------
if(sigprocmask(SIG_UNBLOCK, &mask, NULL) == -1)
{// exiting critical zone
perror("1 : sigprocmask()");
return;
}
break;
}
case SIGUSR2:
{
if(sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
{// entering critical zone
perror("2 : sigprocmask()");
return;
} //---------------------
sleep(1);
system("touch file");
puts("File was created."); /* critical zone */
// --------------------
if(sigprocmask(SIG_UNBLOCK, &mask, NULL) == -1)
{// exiting critical zone
perror("sigprocmask(2)");
return;
}
break;
}
}
}
int main(void)
{
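struct sigaction sa;
sa.sa_handler = &signal_handler; // handler function
sigemptyset(&sa.sa_mask); // sa_mask must be initialized before calling sigaction()
sa.sa_flags = SA_RESTART;
sigaction(SIGUSR1, &sa, NULL);
sigaction(SIGUSR2, &sa, NULL);
pid_t pid = fork(); // fork after installing the handlers, so both processes inherit them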
if(pid < 0)
{
perror("fork()");
return -1;
}
for(int i = 0; i < 10; ++i)
{
if(pid > 0) // parent - producer
{
sleep(2);
// signal -> file was created
kill(pid, SIGUSR1);
pause();
}
else // child - consumer
{
pause();
// signal -> file was removed
kill(getppid(), SIGUSR2);
}
}
return 0;
}
I'm trying to write a shell and I came across this problem: after I run the fork() and execute the commands, in the main process I wait for all child processes like this:
while (wait(NULL) > 0);
But when I try to suspend a child process, the main process won't go past this loop.
So how do I wait only for non suspended processes?
I could try to save the pid_t of all started sub processes then check if they are suspended but I thought maybe there is a better way.
To wait for any child, either exited (aka ended, terminated) or stopped (aka suspended), use waitpid() instead.
int wstatus;
{
    pid_t result;
    /* Use WUNTRACED|WCONTINUED to return on continued children as well. */
    while ((result = waitpid(-1, &wstatus, WUNTRACED)) == (pid_t) -1)
    {
        if (EINTR == errno)
        {
            continue; /* interrupted by a signal, retry */
        }
        if (ECHILD == errno)
        {
            exit(EXIT_SUCCESS); /* no children */
        }
        perror("waitpid() failed");
        exit(EXIT_FAILURE);
    }
}
if (WIFEXITED(wstatus))
{
    /* child exited normally with exit code rc = ... */
    int rc = WEXITSTATUS(wstatus);
    ...
}
else if (WIFSIGNALED(wstatus))
{
    /* child exited by signal sig = ... */
    int sig = WTERMSIG(wstatus);
    ...
}
else if (WIFSTOPPED(wstatus))
{
    /* child stopped by signal sig = ... */
    int sig = WSTOPSIG(wstatus);
    ...
}
else if (WIFCONTINUED(wstatus))
{
    /* child continued (occurs only if WCONTINUED was passed to waitpid()) */
}
I don't really understand how signal handlers work, especially together with fork(). I need to do this exercise, but I couldn't get it to work properly.
My main program forks 5 children, and each child simply prints 10 messages with its pid. The purpose of the program: when I send a SIGINT via the keyboard (Ctrl-C), it should print "a single SIGINT arrived"; if two SIGINTs arrive within one second, it should print "double SIGINT arrived" and terminate the whole program. When I launch my program, it handles the first two SIGINTs (I send the second more than 1 second after the first), but after that it handles neither single nor double SIGINTs.
So I'm very confused about signals. The children continue to print messages. I install the same handler in both the main process and the children, but what should I do to terminate all the children when a double SIGINT arrives? Should I call kill or some other function in the handler to terminate them?
the main function
/* libraries... */
volatile sig_atomic_t double_sigint = 0;
int64_t time_diff = 0;
int main()
{
int i;
int pid;
sigset_t set;
struct sigaction sa;
/* mask all signals */
/*H*/ if(sigfillset(&set) == -1 )
/*A*/ {perror("sigfillset"); exit(errno);}
/*N*/
/*D*/ if(sigprocmask(SIG_SETMASK,&set,NULL) == -1)
/*L*/ {perror("sigprocmask"); exit(errno);}
/*E*/
/*R*/ memset(&sa,0,sizeof(sa));
/*B*/
/*L*/ sa.sa_handler = handler;
/*O*/
/*C*/ if(sigaction(SIGINT, &sa, NULL) == -1)
/*K*/ {perror("sigaction"); exit(errno);}
/**/
/**/ /* unmask all signals */
/**/ if( sigemptyset(&set) == -1 )
/**/ {perror("sigemptyset"); exit(errno);}
/**/
/**/ if(sigprocmask(SIG_SETMASK,&set,NULL) == -1 )
/**/ {perror("sigprocmask"); exit(errno);}
for(i=0;i<5;++i)
{
if((pid = fork()) == -1)
{ perror("rec:fork"); exit(errno); }
if(pid == 0)/* figlio */
{
/* SAME HANDLER BLOCK IS HERE */
foo(i);
return 0;
}
sleep(1);
}
return 0;
}
foo function
void foo(int i)
{
int k;
for(k=0; k<10; ++k)
{
printf("%d. fork %d. print\n", i, k);
sleep(1);
}
}
signal handler
void handler (int signum) {
struct timespec sig1;
struct timespec sig2;
if(double_sigint == 0)
{
if(clock_gettime(CLOCK_REALTIME, &sig1))
{ perror("failed to get sig1 time"); exit(errno); }
write(1,"Received single SIGINT\n",23);
double_sigint = 1;
}
else if(double_sigint == 1)
{
if(clock_gettime(CLOCK_REALTIME, &sig2))
{ perror("failed to get sig2 time"); exit(errno); }
time_diff = (sig2.tv_sec - sig1.tv_sec) + (sig2.tv_nsec - sig1.tv_nsec)/1000000000;
if(time_diff < 1)
{
double_sigint = 2;
write(1,"Received double SIGINT\n",23);
_exit(EXIT_FAILURE);
}
else
{
sig1.tv_sec = sig2.tv_sec;
sig1.tv_nsec = sig2.tv_nsec;
write(1,"Received single SIGINT\n",23);
}
}
}
When you receive a double-SIGINT, you only kill the parent process, with the line _exit(EXIT_FAILURE);. The forks you have created before are not killed and keep running, their parent now being the init process.
If you want all the children to terminate, you have to kill them manually. Maybe this post would be helpful : How to make child process die after parent exits
Edit: That was not the problem since Ctrl+C sends a SIGINT to all the children (see comments).
What worked for me was :
As said in William Pursell's comment, make sig1 and sig2 global variables.
Make the parent process always run (just added a while (1); before the return statement), because some signals were not taken into account once the parent process was terminated.
In the handler, in the else clause (double_sigint == 1) you are comparing sig2 and sig1, but sig1 is uninitialized. The value that you gave it the first time the handler was called went away when that handler returned. You could simply give those variables file scope.
By using the uninitialized value of the local variable, you are getting undefined behavior. If the signal handler is called and the signal handling stack happens to be in the same state it was on the previous call, then things may work fine. This can happen if you send the signal twice with no intervening signals, for example. Since sleep is likely implemented with a signal, it is quite likely that the stack has been modified since the previous call and sig1 is not what you expect. However, speculation about undefined behavior is somewhat pointless.
I'm currently implementing the && function in a shell using C. For example, if we input cmd1 && cmd2, then cmd2 executes only when cmd1 exits successfully. I'm thinking about:
int main() {
int i;
char **args;
while(1) {
printf("yongfeng's shell:~$ ");
args = get_line();
if (strcmp(args[0], "exit") == 0) exit(0); /* if it's built-in command exit, exit the shell */
if('&&') parse_out_two_commands: cmd1, cmd2;
if (execute(cmd1) != -1) /* if cmd1 successfully executed */
execute(cmd2); /* then execute the second cmd */
}
}
int execute(char **args){
int pid;
int status; /* location to store the termination status of the terminated process */
char **cmd; /* pure command without special charactors */
if(pid=fork() < 0){ //fork a child process, if pid<0, fork fails
perror("Error: forking failed");
return -1;
}
/* child */
else if(pid==0){ /* child process, in which command is going to be executed */
cmd = parse_out(args);
/* codes handleing I/O redirection */
if(execvp(*cmd, cmd) < 0){ /* execute command */
perror("execution error");
return -1;
}
return 0;
}
/* parent */
else{ /* parent process is going to wait for child or not, depends on whether there's '&' at the end of the command */
if(strcmp(args[sizeof(args)],'&') == 0){
/* handle signals */
}
else if (pid = waitpid(pid, &status, 0) == -1) perror("wait error");
}
}
So I'm using another function, int execute(char **args), to do the actual work. Its return type is int because I want to know whether the command exits successfully. But I'm not sure whether the parent process can get the return value from the child, since they're two different processes.
Or should I decide whether to execute the second command in the child process, by forking another process to run it? Thanks a lot.
Change:
if(pid=fork() < 0){ //fork a child process, if pid<0, fork fails
to:
if((pid=fork()) < 0){ //fork a child process, if pid<0, fork fails
You're setting pid to the result of fork() < 0, not setting it to the PID of the child. So unless there's an error in fork(), this sets pid to 0 in both the parent and child, so they both think they're the child.
Regarding the return value of the execute() function: it returns in both the parent and the child. In each process, it returns whatever was specified in the return statement in the corresponding branch of the if in execute(). Note that if execvp() is successful, the child never returns, because it's no longer running this program; it's running the program that was exec'ed.
If the child wants to send success or failure information to the parent, it does this using its exit status, by calling exit(0) to indicate success, and exit(some-nonzero-value) to indicate failure. The parent can get the exit status using waitpid, and then return a success or failure indication from execute().
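A minimal sketch of that idea, assuming the command has already been parsed into a NULL-terminated argument vector: run_command() and run_and() are hypothetical names, and the value 127 for a failed exec merely follows common shell convention.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the command in argv and returns its exit status (0 means success).
 * Returns -1 if fork() or waitpid() failed or the child was killed by a signal. */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: on success execvp() never returns. */
        execvp(argv[0], argv);
        perror("execvp");
        _exit(127);         /* report the failure through the exit status */
    }
    /* Parent: collect the child's exit status. */
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -1;
}

/* cmd1 && cmd2: run cmd2 only if cmd1 exited with status 0. */
int run_and(char *const cmd1[], char *const cmd2[])
{
    int rc = run_command(cmd1);
    if (rc != 0) {
        return rc;
    }
    return run_command(cmd2);
}
```

The child calls _exit() rather than returning, so a failed exec cannot fall back into the shell's main loop as a second copy of the shell.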
I've recently finished Section 10 (Signals) of "Advanced Programming in the Unix Environment" (3rd edition) and I've come across a piece of code I don't entirely understand:
#include "apue.h"
static volatile sig_atomic_t sigflag; /* set nonzero by sig handler */
static sigset_t newmask, oldmask, zeromask;
static void
sig_usr(int signo) /* one signal handler for SIGUSR1 and SIGUSR2 */
{
sigflag = 1;
}
void
TELL_WAIT(void)
{
if (signal(SIGUSR1, sig_usr) == SIG_ERR)
err_sys("signal(SIGUSR1) error");
if (signal(SIGUSR2, sig_usr) == SIG_ERR)
err_sys("signal(SIGUSR2) error");
sigemptyset(&zeromask);
sigemptyset(&newmask);
sigaddset(&newmask, SIGUSR1);
sigaddset(&newmask, SIGUSR2);
/* Block SIGUSR1 and SIGUSR2, and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
err_sys("SIG_BLOCK error");
}
void
TELL_PARENT(pid_t pid)
{
kill(pid, SIGUSR2); /* tell parent we're done */
}
void
WAIT_PARENT(void)
{
while (sigflag == 0)
sigsuspend(&zeromask); /* and wait for parent */
sigflag = 0;
/* Reset signal mask to original value */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
err_sys("SIG_SETMASK error");
}
void
TELL_CHILD(pid_t pid)
{
kill(pid, SIGUSR1); /* tell child we're done */
}
void
WAIT_CHILD(void)
{
while (sigflag == 0)
sigsuspend(&zeromask); /* and wait for child */
sigflag = 0;
/* Reset signal mask to original value */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
err_sys("SIG_SETMASK error");
}
The routines above are used (as you certainly know) to synchronize processes using signals. Although I understand every single line on its own, I can't see the big picture. The code is used in the following scenario: to avoid a race condition in our program, after we fork(), we make the child process TELL_PARENT and WAIT_PARENT, and then we do the same in the parent with TELL_CHILD and WAIT_CHILD. My questions are:
1.) How can a child communicate with its parent through a variable while both of them work with their own set (copy) of variables? Is it because the child doesn't modify sigflag directly but through a signal handler (the same goes for the parent)?
2.) Why do we need to block SIGUSR1 and SIGUSR2 and then unblock it with sigprocmask?
A program that uses three of those routines could be (taken from the book):
#include "apue.h"
static void charatatime(char *);
int
main(void)
{
pid_t pid;
TELL_WAIT();
if ((pid = fork()) < 0) {
err_sys("fork error");
} else if (pid == 0) {
WAIT_PARENT(); /* parent goes first */
charatatime("output from child\n");
} else {
charatatime("output from parent\n");
TELL_CHILD(pid);
}
exit(0);
}
static void
charatatime(char *str)
{
char *ptr;
int c;
setbuf(stdout, NULL); /* set unbuffered */
for (ptr = str; (c = *ptr++) != 0; )
putc(c, stdout);
}
Cheers,
1) They are not communicating through a variable; the sole communication facility used here is the kill function. We "tell" things by invoking kill, and we "wait" to be told with sigsuspend. sigflag is not shared: it is local state in each process, and it records whether this particular process has been "told" by the other.
2) Were the signals not blocked prior to fork, the parent process could send the signal to the child before the child has started waiting for it. That is, the timeline could be like that:
fork
parent gets the time slice, sends signal to the child with kill
child gets the time slice, and waits for the signal
But the signal has already been delivered, so the child waits indefinitely. Therefore, we must ensure the signal is not delivered to the child process before it starts the waiting loop. To this end, we block it before fork, and then atomically unblock it and start waiting for it with sigsuspend. Atomicity is the key: the required invariant cannot be achieved if this operation is performed as two independent steps, because the signal could be delivered in between.