Properly reaping all child processes and collecting exit status - c

I want to catch all child processes forked by a parent process, then collect the last child's exit status. To that end, I called sigsuspend() to wait for a SIGCHLD signal. When I receive the SIGCHLD signal, then the handler will call waitpid in a loop until it indicates there are no children left to reap. The exit status will be set, and the main will break out of the loop and terminate.
However, I noticed that this is not correct, as all the children aren't always reaped. How can I fix this?
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

volatile sig_atomic_t exit_stat;

// Signal Handler
void sigchld_handler(int sig) {
    pid_t pid;
    int status;
    while(1) {
        pid = waitpid(-1, &status, WNOHANG);
        if(pid <= 0) {break;}
        if(WIFEXITED(status)) {
            printf("%s", "Exited correctly.");
        }
        else {
            printf("%s", "Bad exit.");
        }
    }
    exit_stat = status;
}

// Executing code.
int main() {
    signal(SIGCHLD, sigchld_handler);
    sigset_t mask_child;
    sigset_t old_mask;
    sigemptyset(&mask_child);
    sigaddset(&mask_child, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask_child, &old_mask);
    for(int i = 0; i < 5; i++) {
        int child_pid = fork();
        if(child_pid != 0) {
            //Perform execvp call.
            char* argv[] = {"echo", "hi", NULL};
            execvp(argv[0], argv);
        }
    }
    while(!exit_stat) {
        sigsuspend(&old_mask);
    }
    return 0;
}

Transferring lightly modified comments into an answer.
The WNOHANG option to waitpid() means "return immediately if there are no children left, OR if there are children left but they're still running". If you really want to wait for all children to exit, either omit the WNOHANG option to waitpid() or simply use wait() instead. Note that if there were tasks launched in the background, they may not terminate for a very long time, if ever. It also depends on the context whether 'the last child to die' is the correct one to report on. It is possible to imagine scenarios where that is not appropriate.
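To make the three possible outcomes of waitpid() concrete, here is a minimal sketch. It assumes a POSIX system and a calling process with no other children; the names probe_waitpid and struct probe_result are made up for illustration and are not part of any API.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record, used only for this illustration. */
struct probe_result {
    pid_t while_running;      /* WNOHANG result while the child is alive */
    pid_t after_kill;         /* blocking waitpid() result after the kill */
    pid_t no_children;        /* WNOHANG result once no child is left */
    int   no_children_errno;  /* errno from that last call */
};

/* Returns 0 if the child was reaped as expected, -1 otherwise. */
static int probe_waitpid(struct probe_result *r)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();          /* child: stay alive until it is killed */
    }

    /* A child exists but has not exited: WNOHANG returns 0 at once. */
    r->while_running = waitpid(-1, &status, WNOHANG);

    kill(pid, SIGKILL);
    /* Without WNOHANG the call blocks until the child has really died. */
    r->after_kill = waitpid(pid, &status, 0);

    /* No children left at all: -1 with errno set to ECHILD. */
    errno = 0;
    r->no_children = waitpid(-1, &status, WNOHANG);
    r->no_children_errno = errno;

    return (r->after_kill == pid) ? 0 : -1;
}
```

The point of the sketch is that a return value of 0 from a WNOHANG call only means "nothing to reap right now", not "no children left"; only -1 with ECHILD says the latter.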
You're right, in this instance, I meant that "the last child to die" is the last child that was forked. Can I fix this by adding a simple condition to check if the returned pid of wait == the pid of the last forked child?
If you're interested in the last child in the most recent pipeline (e.g. ls | grep … | sort … | wc and you want to wait for wc), then you know the PID for wc, and you can use waitpid(wc_pid, &status, 0) to wait for that process specifically to die. Or you can use your loop to collect bodies until you either find the body of wc or get 'no dead processes left'. At that point, you can decide to wait specifically for the wc PID, or (better) use waitpid() without WNOHANG (or use wait()) until some process dies — and again you can decide whether it was wc or not, and if not, repeat the WNOHANG corpse collection process to collect any zombies. Repeat until you do find the corpse of wc.
And also, you said that background tasks may not terminate for a long time. By this, do you mean that waitpid(-1, &status, 0) will completely suspend all processes until a child is ready to be reaped?
waitpid(-1, &status, 0); will make the parent process wait indefinitely until some child process dies, or it will return because there are no children left to wait for (which indicates there was a housekeeping error; children should not die without the parent knowing).
Note that using a 'wait for any child' loop avoids leaving zombies around (children that have died but not been waited for). This is generally a good idea. But capturing when the child you're currently interested in dies ensures that your shell doesn't hang around waiting when it wasn't necessary. So, you need to capture both the PID and the exit status of the dead child processes.
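As a rough sketch of "reap everything, but remember the status of the one child you care about": the helper below blocks in waitpid() until no children are left and records the status of a chosen PID. The function names reap_all_tracking and spawn_exiting_child are hypothetical, and the sketch assumes every child eventually exits (otherwise the blocking wait never returns).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with `code`. */
static pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);          /* child */
    return pid;               /* parent: child's PID, or -1 on failure */
}

/* Reap every child of this process. The status of `target` is stored in
 * *target_status. Returns the number of children reaped, or -1 if
 * `target` was never seen among them. */
static int reap_all_tracking(pid_t target, int *target_status)
{
    int reaped = 0, seen = 0, status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, 0);   /* block until a child dies */
        if (pid > 0) {
            reaped++;
            if (pid == target) {               /* the child we care about */
                *target_status = status;
                seen = 1;
            }
        } else if (errno == EINTR) {
            continue;                          /* interrupted: try again */
        } else {
            break;                             /* ECHILD: nothing left */
        }
    }
    return seen ? reaped : -1;
}
```

A caller would keep the PID returned by the last fork() and pass it as target; WIFEXITED() and WEXITSTATUS() then decode the stored status.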

Related

How can waitpid() reap more than one child?

In this example from the CSAPP book chap.8:
#include "csapp.h"

/* WARNING: This code is buggy! */
void handler1(int sig)
{
    int olderrno = errno;

    if ((waitpid(-1, NULL, 0)) < 0)
        sio_error("waitpid error");
    Sio_puts("Handler reaped child\n");
    Sleep(1);
    errno = olderrno;
}

int main()
{
    int i, n;
    char buf[MAXBUF];

    if (signal(SIGCHLD, handler1) == SIG_ERR)
        unix_error("signal error");

    /* Parent creates children */
    for (i = 0; i < 3; i++) {
        if (Fork() == 0) {
            printf("Hello from child %d\n", (int)getpid());
            exit(0);
        }
    }

    /* Parent waits for terminal input and then processes it */
    if ((n = read(STDIN_FILENO, buf, sizeof(buf))) < 0)
        unix_error("read");

    printf("Parent processing input\n");
    while (1)
        ;

    exit(0);
}
It generates the following output:
......
Hello from child 14073
Hello from child 14074
Hello from child 14075
Handler reaped child
Handler reaped child //more than one child reaped
......
The single if statement around waitpid() is there to demonstrate a bug: written this way, waitpid() is not able to reap all children. While I understand that waitpid() should be put in a while loop to ensure that all children are reaped, what I don't understand is why only one waitpid() call is made, yet more than one child is reaped (note that in the output the handler reaps more than one child). According to this answer: Why does waitpid in a signal handler need to loop?
waitpid() is only able to reap one child.
Thanks!
update:
this is irrelevant, but the handler is corrected in the following way(also taken from the CSAPP book):
void handler2(int sig)
{
    int olderrno = errno;

    while (waitpid(-1, NULL, 0) > 0) {
        Sio_puts("Handler reaped child\n");
    }
    if (errno != ECHILD)
        Sio_error("waitpid error");
    Sleep(1);
    errno = olderrno;
}
Running this code on my linux computer.
The signal handler you installed runs every time the signal you assigned to it (SIGCHLD in this case) is delivered. While it is true that waitpid is executed only once per handler invocation, it is still executed multiple times overall, because the handler is called each time a child terminates.
Child n terminates (SIGCHLD), the handler springs into action and uses waitpid to "reap" the just exited child.
Child n+1 terminates and its behaviour follows the same as Child n. This goes on for every child there is.
There is no need to loop it as it gets called only when needed in the first place.
Edit: As pointed out below, the reason the book later corrects the handler with a loop is that if several children terminate at about the same time, their SIGCHLD signals may be merged into one, so the handler may run only once and reap only one of them.
signal(7):
Standard signals do not queue. If multiple instances of a
standard signal are generated while that signal is blocked, then
only one instance of the signal is marked as pending (and the
signal will be delivered just once when it is unblocked).
Looping waitpid assures the reaping of all exited children and not just one of them as is the case right now.
Why is looping solving the issue of multiple signals?
Picture this: you are inside the handler, handling a SIGCHLD you have received, and while you are doing that, more children terminate and further SIGCHLD signals are sent. These signals cannot queue up. By calling waitpid in a loop, you make sure that every terminated child is picked up even if the handler is not invoked once per child, instead of relying on one invocation per child, which may or may not happen depending on whether signals have been merged.
The loop still ends correctly once there are no more children to reap. It is important to understand that the loop is only there to catch children whose signals arrive while you are already in the signal handler; a child that terminates during normal code execution triggers the handler as usual.
If you are still in doubt, try reading these two answers to your question.
How to make sure that `waitpid(-1, &stat, WNOHANG)` collect all children processes
Why does waitpid in a signal handler need to loop? (first two paragraphs)
The first one uses flags such as WNOHANG, but this only makes waitpid return immediately instead of waiting, if there is no child process ready to be reaped.

Why does waitpid in a signal handler need to loop?

I read in an ebook that waitpid(-1, &status, WNOHANG) should be put in a while loop so that if multiple child processes exit simultaneously, they all get reaped.
I tried this by creating and terminating 2 child processes at the same time and reaping them with waitpid WITHOUT using a loop, and they were all reaped.
So the question is: is it really necessary to put waitpid in a loop?
#include<stdio.h>
#include<sys/wait.h>
#include<signal.h>

int func(int pid)
{
    if(pid < 0)
        return 0;
    func(pid - 1);
}

void sighand(int sig)
{
    int i=45;
    int stat, pid;
    printf("Signal caught\n");
    //while( (
    pid = waitpid(-1, &stat, WNOHANG);
    //) > 0){
    printf("Reaped process %d----%d\n", pid, stat);
    func(pid);
}

int main()
{
    int i;
    signal(SIGCHLD, sighand);
    pid_t child_id;
    if( (child_id=fork()) == 0 ) //child process
    {
        printf("Child ID %d\n",getpid());
        printf("child exiting ...\n");
    }
    else
    {
        if( (child_id=fork()) == 0 ) //child process
        {
            printf("Child ID %d\n",getpid());
            printf("child exiting ...\n");
        }
        else
        {
            printf("------------Parent with ID %d \n",getpid());
            printf("parent exiting ....\n");
            sleep(10);
            sleep(10);
        }
    }
}
Yes.
Okay, I'll elaborate.
Each call to waitpid reaps one, and only one, child. Since you put the call inside the signal handler, it may well happen that the second child exits before you finish executing the first signal handler. For two processes that is okay (the pending signal will be handled when you finish), but for more, it might be that two children will finish while you're still handling another one. Since signals are not queued, you will miss a notification.
If that happens, you will not reap all children. To avoid that problem, the loop recommendation was introduced. If you want to see it happen, try running your test with more children. The more you run, the more likely you'll see the problem.
With that out of the way, let's talk about some other issues.
First, your signal handler calls printf. That is a major no-no. Very few functions are async-signal-safe, and printf definitely isn't one of them. You can try to make your signal handler safer, but a much saner approach is to install a signal handler that merely sets a flag, and then do the actual wait call in your main program's flow.
Since your main flow is, typically, to call select/epoll, make sure to look up pselect and epoll_pwait, and to understand what they do and why they are needed.
Even better (but Linux specific), look up signalfd. You might not need the signal handler at all.
Edited to add:
The loop does not change the fact that two signal deliveries are merged into one handler call. What it does do is that this one call handles all pending events.
Of course, once that's the case, you must use WNOHANG. The same artifacts that cause signals to be merged might also cause you to handle an event for which a signal is yet to be delivered.
If that happens, then once your first signal handler exits, it will get called again. This time, however, there will be no pending events (as the events were already extracted by the loop). If you do not specify WNOHANG, your wait will block, and the program will be stuck indefinitely.

fork and signal: how to send signals from parent process to specific child process

I need to fork two child processes. One can receive the signal 3, print hello and send the signal 4 to the other child process; the other can receive the signal 4, print world and send the signal 3 to the first child process.
To start, the father process will send the signal 3 to the first child process after sleeping for 3 seconds.
Then 3 seconds later, the father process will send SIGKILL to kill both of them.
I don't know how to send signals to a specific child process (I knew that we had a function kill to send signals but I don't know to use it here).
Here is my code:
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>

void func(int n)
{
    printf("ping\n");
    // how to send signal 4 to the second process?
}

void func2(int n)
{
    printf("pong\n");
    // how to send signal 3 to the first process?
}

int main()
{
    pid_t pid;
    int i;
    for(i = 0; i < 2; i++)
    {
        pid = fork();
        if(pid == 0)
        {
            if(i == 0)
            {
                signal(3, func);
            }
            else
            {
                signal(4, func2);
            }
            while(1);
        }
        else
        {
            if(i == 1)
            {
                sleep(3);
                // how to send signal 3 to the first child process?
                sleep(3);
                // how to kill the two children?
            }
        }
    }
    return 0;
}
You could use the popen() function to open a process by forking and opening a pipe to that process (instead of using fork() directly).
The parent knows the PID of each process, so it can easily pass the PID of the second child to the first child.
The first child can use that PID and the kill() function to send a signal to the second child.
So, use popen() to start the first child and fork() to start the second child, then pass the PID of the second child to the first via the stream created with popen().
The pid value returned from the call to fork() is not being handled correctly.
The posted code assumes that the call to fork() was successful. This is not a safe assumption.
The code also needs to check for the pid being -1 and handle that error appropriately.
When a child process completes, it should NOT sit in a while() loop but rather exit, using the exit() function.
The parent should not just exit, as that leaves the two child processes running as orphans (a child spinning in while(1) never terminates on its own and has to be killed explicitly).
Rather, the parent should call wait() or, even better, waitpid() (and remember that the child processes need to actually exit, NOT sit in a while() loop).
1) func() and func2() should check the parameter to ensure that the correct signal is being processed.
2) The man page for signal() indicates that it should not be used; it suggests using sigaction() instead.
When you fork, you get the new pid. Per the kill man page, you call kill(pid_t pid, int sig) using that pid.

Creating A Zombie Process Using the kill Function

I'm trying to create a zombie process with the kill function but it simply kills the child and returns 0.
int main ()
{
    pid_t child_pid;
    child_pid = fork ();
    if (child_pid > 0) {
        kill(getpid(),SIGKILL);
    }
    else {
        exit (0);
    }
    return 0;
}
When I check the status of the process there is no z in the status column.
Here is a simple recipe which should create a zombie:
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

int main()
{
    int pid = fork();
    if(pid == 0) {
        /* child */
        while(1) pause();
    } else {
        /* parent */
        sleep(1);
        kill(pid, SIGKILL);
        printf("pid %d should be a zombie\n", pid);
        while(1) pause();
    }
}
The key is that the parent -- i.e. this program -- keeps running but does not do a wait() on the dying child.
Zombies are dead children that have not been waited for. If this program waited for its dead child, it would go away and not be a zombie. If this program exited, the zombie child would be inherited by somebody else (probably init), which would probably do the wait, and the child would go away and not be a zombie.
As far as I know, the whole reason for zombies is that the dead child exited with an exit status, which somebody might want. But where Unix stores the exit status is in the empty husk of the dead process, and how you fetch a dead child's exit status is by waiting for it. So Unix is keeping the zombie around just to keep its exit status around just in case the parent wants it but hasn't gotten around to calling wait yet.
So it's actually kind of poetic: Unix's philosophy here is basically that no child's death should go unnoticed.

How do you kill zombie process using wait()

I have this code that requires a parent to fork 3 children.
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
What is the command to view zombie processes if you have Linux
virtual box?
main(){
    pid_t child;
    printf("-----------------------------------\n");
    about("Parent");
    printf("Now .. Forking !!\n");
    child = fork();
    int i=0;
    for (i=0; i<3; i++){
        if (child < 0) {
            perror ("Unable to fork");
            break;
        }
        else if (child == 0){
            printf ("creating child #%d\n", (i+1));
            about ("Child");
            break;
        }
        else{
            child = fork();
        }
    }
}

void about(char * msg){
    pid_t me;
    pid_t oldone;
    me = getpid();
    oldone = getppid();
    printf("***[%s] PID = %d PPID = %d.\n", msg, me, oldone);
}
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
If your parent spawns only a small, fixed number of children; does not care when or whether they stop, resume, or finish; and itself exits quickly, then you do not need to use wait() or waitpid() to clean up the child processes. The init process (pid 1) takes responsibility for orphaned child processes, and will clean them up when they finish.
Under any other circumstances, however, you must wait() for child processes. Doing so frees up resources, ensures that the child has finished, and allows you to obtain the child's exit status. Via waitpid() you can also be notified when a child is stopped or resumed by a signal, if you so wish.
As for where to perform the wait,
You must ensure that only the parent wait()s.
You should wait at or before the earliest point where you need the child to have finished (but not before forking), OR
if you don't care when or whether the child finishes, but you need to clean up resources, then you can periodically call waitpid(-1, NULL, WNOHANG) to collect a zombie child if there is one, without blocking if there isn't any.
In particular, you must not wait() (unconditionally) immediately after fork()ing because parent and child run the same code. You must use the return value of fork() to determine whether you are in the child (return value == 0), or in the parent (any other return value). Furthermore, the parent must wait() only if forking was successful, in which case fork() returns the child's pid, which is always greater than zero. A return value less than zero indicates failure to fork.
Your program doesn't really need to wait() because it spawns exactly four (not three) children, then exits. However, if you wanted the parent to have at most one live child at any time, then you could write it like this:
int main() {
    pid_t child;
    int i;

    printf("-----------------------------------\n");
    about("Parent");

    for (i = 0; i < 3; i++) {
        printf("Now .. Forking !!\n");
        child = fork();
        if (child < 0) {
            perror ("Unable to fork");
            break;
        } else if (child == 0) {
            printf ("In child #%d\n", (i+1));
            about ("Child");
            break;
        } else {
            /* in parent */
            if (waitpid(child, NULL, 0) < 0) {
                perror("Failed to collect child process");
                break;
            }
        }
    }

    return 0;
}
If the parent exits before one or more of its children, which can happen if it does not wait, then the child will thereafter see its parent process being pid 1.
Others have already answered how to get a zombie process list via the ps command. You may also be able to see zombies via top. With your original code you are unlikely to catch a glimpse of zombies, however, because the parent process exits very quickly, and init will then clean up the zombies it leaves behind.
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
You can use wait() anywhere in the parent process, and when the child process terminates it'll be removed from the system. Where to put it is up to you; in your specific case you probably want to put it immediately after the child = fork(); line (in the parent branch only) so that the parent process won't resume its execution until its child has exited.
What is the command to view zombie processes if you have Linux virtual box?
You can use the ps aux command to view all processes in the system (including zombie processes), and the STAT column will be equal to Z if the process is a zombie. An example output would be:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
daniel 1000 0.0 0.0 0 0 ?? Z 17:15 0:00 command
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
You can register a signal handler for SIGCHLD that sets a global volatile sig_atomic_t flag = 0 variable to 1. Then, at some convenient place in your program, test whether flag is set to 1, and, if so, set it back to 0 and afterwards (for otherwise you might miss a signal) call waitpid(-1, NULL, WNOHANG) in a loop until it tells you that no more processes are to be waited for. Note that the signal will interrupt system calls with EINTR, which is a good condition to check for the value of flag. If you use an indefinitely blocking system call like select(), you might want to specify a timeout after which you check for flag, since otherwise you might miss a signal that was raised after your last waitpid() call but before entering the indefinitely blocking system call. An alternative to this kludge is to use pselect().
Use:
ps -e -opid,ppid,pgid,stat,etime,cmd | grep defunct
to see your zombies, along with the ppid and pgid columns for the parent ID and process group ID. The etime column shows the elapsed (wall-clock) time since the zombie's process was started. The parent ID is useful for sending signals to the parent process.
If the parent process is correctly coded to catch and handle the SIGCHLD signal and to do what is expected (i.e., wait for and reap the zombies), then you can run:
kill -CHLD <parent_pid>
to tell the parent to reap all its zombies.

Resources