In my program I am forking child processes (in parallel) in a finite while loop and calling exec in each of them. I want the parent process to resume execution (at the point after this while loop) only after all children have terminated. How should I do that?
I have tried several approaches. In one, I made the parent pause after the while loop and set a flag from the SIGCHLD handler only when waitpid returned the error ECHILD (no child remaining). The problem with this approach is that retStat becomes -1 even before the parent has finished forking all the processes.
void sigchld_handler(int signo) {
pid_t pid;
while((pid= waitpid(-1,NULL,WNOHANG)) > 0);
if(errno == ECHILD) {
retStat = -1;
}
}
// parent process code
retStat = 1;
while(some condition) {
do fork(and exec);
}
while(retStat > 0)
pause();
//This is the point where I want execution to resume, only once all children have finished
Instead of calling waitpid in the signal handler, why not create a loop after you have forked all the processes as follows:
while ((pid = waitpid(-1, NULL, 0)) != 0) {
/* errno is only meaningful when waitpid has just failed */
if (pid == -1 && errno == ECHILD) {
break;
}
}
The program should hang in the loop until there are no more children. Then it will fall out and the program will continue. As an additional bonus, the loop will block on waitpid while children are running, so you don't need a busy loop while you wait.
You could also use wait(NULL), which is equivalent to waitpid(-1, NULL, 0). If there's nothing else you need to do in SIGCHLD, you can set it to SIG_DFL.
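A minimal sketch of the approach described above, assuming the children are started in a simple counted loop. The helper name spawn_and_wait_all and the use of the "true" program as a stand-in for the real exec target are illustrative assumptions, not part of the original question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork up to n children, then block until every one
 * of them has terminated. Returns the number of children reaped. */
int spawn_and_wait_all(int n)
{
    int reaped = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* fork failed: wait for those we have */
        if (pid == 0) {
            /* Child: "true" is only a stand-in for the real program. */
            execlp("true", "true", (char *)NULL);
            _exit(127);                 /* reached only if exec failed */
        }
    }

    /* Parent: waitpid() sleeps while children are still running. */
    for (;;) {
        pid_t pid = waitpid(-1, NULL, 0);
        if (pid > 0)
            reaped++;                   /* one more child collected */
        else if (errno != EINTR)
            break;                      /* ECHILD: no children left (or a real error) */
    }
    return reaped;
}
```

Called as spawn_and_wait_all(3), this should return only after all three children are gone. errno is inspected only after waitpid has failed, which avoids acting on a stale value.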
I think you should use the waitpid() call. It allows you to wait for "any child process", so if you do that the proper number of times, you should be golden.
If that fails (not sure about the guarantees), you could take the brute-force approach: sit in a loop, call waitpid() with the WNOHANG option on each of your child PIDs, and then delay for a while before doing it again.
In this example from the CSAPP book chap.8:
#include "csapp.h"
/* WARNING: This code is buggy! */
void handler1(int sig)
{
int olderrno = errno;
if ((waitpid(-1, NULL, 0)) < 0)
sio_error("waitpid error");
Sio_puts("Handler reaped child\n");
Sleep(1);
errno = olderrno;
}
int main()
{
int i, n;
char buf[MAXBUF];
if (signal(SIGCHLD, handler1) == SIG_ERR)
unix_error("signal error");
/* Parent creates children */
for (i = 0; i < 3; i++) {
if (Fork() == 0) {
printf("Hello from child %d\n", (int)getpid());
exit(0);
}
}
/* Parent waits for terminal input and then processes it */
if ((n = read(STDIN_FILENO, buf, sizeof(buf))) < 0)
unix_error("read");
printf("Parent processing input\n");
while (1)
;
exit(0);
}
It generates the following output:
......
Hello from child 14073
Hello from child 14074
Hello from child 14075
Handler reaped child
Handler reaped child //more than one child reaped
......
The handler calls waitpid() only once, inside an if, which is the deliberate bug: a single call cannot be relied on to reap all children. I understand that waitpid() should be put in a while loop to make sure all children are reaped. What I don't understand is why a single waitpid() call was nevertheless able to reap more than one child (note that in the output the handler reaps more than one child). According to this answer: Why does waitpid in a signal handler need to loop?
waitpid() is only able to reap one child.
Thanks!
Update:
This is beside the point, but the handler is corrected in the following way (also taken from the CSAPP book):
void handler2(int sig)
{
int olderrno = errno;
while (waitpid(-1, NULL, 0) > 0) {
Sio_puts("Handler reaped child\n");
}
if (errno != ECHILD)
Sio_error("waitpid error");
Sleep(1);
errno = olderrno;
}
I am running this code on my Linux computer.
The signal handler you installed runs every time the signal you assigned to it (SIGCHLD in this case) is delivered. While it is true that waitpid is executed only once per handler invocation, it still ends up being executed multiple times because the handler is called whenever a child terminates.
Child n terminates (SIGCHLD), the handler springs into action and uses waitpid to "reap" the just exited child.
Child n+1 terminates and its behaviour follows the same as Child n. This goes on for every child there is.
There is no need to loop it as it gets called only when needed in the first place.
Edit: As pointed out below, the reason as to why the book later corrects it with the intended loop is because if multiple children send their termination signal at the same time, the handler may only end up getting one of them.
signal(7):
Standard signals do not queue. If multiple instances of a
standard signal are generated while that signal is blocked, then
only one instance of the signal is marked as pending (and the
signal will be delivered just once when it is unblocked).
Looping waitpid assures the reaping of all exited children and not just one of them as is the case right now.
Why is looping solving the issue of multiple signals?
Picture this: you are inside the handler, handling a SIGCHLD you have received, and while you are doing that, more children terminate. Their signals cannot queue up: at most one further SIGCHLD is left pending, however many children have exited. By looping on waitpid, you make sure that even if the handler runs fewer times than there are terminated children, every child that has already terminated is still reaped. A single waitpid per handler call, by contrast, may or may not work as intended, depending on whether signals have been merged.
The loop still ends correctly once there are no more children to reap. It is important to understand that the loop is only there to pick up children whose signals were merged because they arrived while SIGCHLD was blocked (for example, while you were already in the signal handler); a child that terminates during normal code execution triggers the handler as usual.
If you are still in doubt, try reading these two answers to your question.
How to make sure that `waitpid(-1, &stat, WNOHANG)` collect all children processes
Why does waitpid in a signal handler need to loop? (first two paragraphs)
The first one uses flags such as WNOHANG, but this only makes waitpid return immediately instead of waiting, if there is no child process ready to be reaped.
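To make the merging effect concrete, here is a hedged sketch rather than the book's code. It uses a WNOHANG loop in the handler (instead of the blocking call in handler2) so that the handler never sleeps, and it deliberately keeps SIGCHLD blocked while forking so that several children can exit behind a single pending signal. The names reap_all_handler and count_reaped_children are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_count = 0;

/* Reap every child that has already terminated, without ever blocking. */
static void reap_all_handler(int sig)
{
    int saved_errno = errno;
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Hypothetical demo: fork n children that exit at once and return how many
 * the handler reaped, or -1 if the handler could not be installed. */
int count_reaped_children(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    int forked = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_all_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Keep SIGCHLD blocked while forking, so several children can exit
     * while only one signal instance is pending. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    reaped_count = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);
        forked++;
    }

    /* sigsuspend() atomically unblocks SIGCHLD and sleeps until a handler
     * has run; repeat until every forked child is accounted for. */
    while (reaped_count < forked)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return (int)reaped_count;
}
```

With a single waitpid call in the handler instead of the loop, the count could stall below the number of children, because the handler may run fewer times than there are terminated children.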
I'm executing the code below and the call to waitpid() always returns -1, thus the code below ends up in an infinite loop. The call works if I replace WNOHANG with 0.
void execute(cmdLine* pCmdLine) {
int status = 0;
pid_t pid = fork();
if(pid == 0) {
if(execvp(pCmdLine->arguments[0], pCmdLine->arguments) == -1) {
if(strcmp(pCmdLine->arguments[0], "cd") != 0) {
perror("execute failed\n");
}
_exit(1);
}
} else {
if(pCmdLine->blocking == 1) {
waitpid(pid, &status, 0);
}
while(waitpid(pid, &status, WNOHANG) == -1) {
printf("still -1\n");
}
}
}
Well, you have misunderstood the workings of the wait system call.
As with malloc/free, you can successfully waitpid() only once per fork()ed process, so the while loop is never necessary: if you are going to wait for the exit code of the child, you have to call it only once. waitpid() will only return -1 in your case for this reason:
fork() didn't succeed, so you are waiting for an invalid pid: you would be calling waitpid() with pid == -1, which does not name your child. The same error is returned when there is no process left to wait for: if the pid variable holds a positive number, but that child has already been wait()ed for, any of the wait() family of system calls fails with -1 (and errno set to ECHILD). This is what happens in your code when blocking is 1: the first waitpid() reaps the child, so every later call returns -1. The mission of zombie processes in UN*X systems is just this: to ensure that a wait() for an already finished child is still valid and the calling process gets the exit code the child passed to exit().
You also expressly say you are not going to wait for the process to finish; this is what the WNOHANG parameter does. If you do not wait for the process to terminate, the child process may still be running and may not yet have called exit(). In that case waitpid() returns 0, not -1, and status is not filled in. You only want the exit code once the child process has finished. If this is the case, then you had better write:
while(waitpid(pid, &status, WNOHANG) == 0)
do_whatever_you_want_because_you_decided_not_to_wait();
With WNOHANG, a return value of 0 is the only way waitpid() can tell you that the status variable has not been filled in with the exit status of the child process; a return value of -1 always signals a real error, with errno set accordingly (ECHILD when there is no such child).
But, from my point of view, if you have nothing to do in the meantime, then you had better not use WNOHANG. That will save CPU cycles and a lot of heat energy thrown into the environment.
Here
while(waitpid(pid,&status,WNOHANG)==-1) { }
once no child process exists any more, waitpid returns -1, so the loop condition is always true, which causes the infinite loop.
From the manual page of waitpid().
waitpid(): on success, returns the process ID of the child whose
state has changed; if WNOHANG was specified and one or more
child(ren) specified by pid exist, but have not yet changed state,
then 0 is returned. On error, -1 is returned.
That means that when there is no child left to wait for, it returns -1. So either write it like this:
if (pid == 0) { /* child process. can be multiple */
}
else { /* parent process */
while(waitpid(pid,&status,WNOHANG) != -1) { /* the loop ends once no child is left to wait for */
}
}
or
if (pid == 0) { /* child process. can be multiple */
}
else { /* parent process */
while(waitpid(pid,&status,WNOHANG) == 0); /* dummy while: loops only while the child is still running, and ends once it has been reaped or an error occurs */
}
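The three possible results of waitpid() with WNOHANG can be seen in a small experiment. This is only a sketch under assumptions: the function name run_child_and_poll, the 50 ms child lifetime and the 5 ms polling gap are all invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that exits with `code` after about 50 ms,
 * poll for it with WNOHANG, then call waitpid() once more to show what
 * happens after the child has been reaped. Returns the exit code seen by
 * the parent (-1 on failure); *errno_after_reap gets errno from the extra call. */
int run_child_and_poll(int code, int *errno_after_reap)
{
    struct timespec child_nap = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };
    struct timespec poll_gap  = { .tv_sec = 0, .tv_nsec = 5 * 1000 * 1000 };
    int status = 0, result = -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        nanosleep(&child_nap, NULL);
        _exit(code);
    }

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == 0) {
            /* Child exists but has not exited yet; status is untouched. */
            nanosleep(&poll_gap, NULL);
        } else if (r == pid) {
            /* Child was reaped by this call; status is valid now. */
            if (WIFEXITED(status))
                result = WEXITSTATUS(status);
            break;
        } else {
            return -1;                  /* r == -1: a real error */
        }
    }

    /* The child is gone, so this call fails with -1 and errno == ECHILD. */
    errno = 0;
    if (waitpid(pid, &status, WNOHANG) == -1)
        *errno_after_reap = errno;
    return result;
}
```

The final call mirrors what the code in the question does after its blocking waitpid(): the child has already been collected, so every further call fails, and a loop that spins while the result is -1 never ends.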
I have this code that requires a parent to fork 3 children.
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
What is the command to view zombie processes if you have Linux
virtual box?
main(){
pid_t child;
printf("-----------------------------------\n");
about("Parent");
printf("Now .. Forking !!\n");
child = fork();
int i=0;
for (i=0; i<3; i++){
if (child < 0) {
perror ("Unable to fork");
break;
}
else if (child == 0){
printf ("creating child #%d\n", (i+1));
about ("Child");
break;
}
else{
child = fork();
}
}
}
void about(char * msg){
pid_t me;
pid_t oldone;
me = getpid();
oldone = getppid();
printf("***[%s] PID = %d PPID = %d.\n", msg, me, oldone);
}
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
If your parent spawns only a small, fixed number of children; does not care when or whether they stop, resume, or finish; and itself exits quickly, then you do not need to use wait() or waitpid() to clean up the child processes. The init process (pid 1) takes responsibility for orphaned child processes, and will clean them up when they finish.
Under any other circumstances, however, you must wait() for child processes. Doing so frees up resources, ensures that the child has finished, and allows you to obtain the child's exit status. Via waitpid() you can also be notified when a child is stopped or resumed by a signal, if you so wish.
As for where to perform the wait,
You must ensure that only the parent wait()s.
You should wait at or before the earliest point where you need the child to have finished (but not before forking), OR
if you don't care when or whether the child finishes, but you need to clean up resources, then you can periodically call waitpid(-1, NULL, WNOHANG) to collect a zombie child if there is one, without blocking if there isn't any.
In particular, you must not wait() (unconditionally) immediately after fork()ing because parent and child run the same code. You must use the return value of fork() to determine whether you are in the child (return value == 0), or in the parent (any other return value). Furthermore, the parent must wait() only if forking was successful, in which case fork() returns the child's pid, which is always greater than zero. A return value less than zero indicates failure to fork.
Your program doesn't really need to wait() because it spawns exactly four (not three) children, then exits. However, if you wanted the parent to have at most one live child at any time, then you could write it like this:
int main() {
pid_t child;
int i;
printf("-----------------------------------\n");
about("Parent");
for (i = 0; i < 3; i++) {
printf("Now .. Forking !!\n");
child = fork();
if (child < 0) {
perror ("Unable to fork");
break;
} else if (child == 0) {
printf ("In child #%d\n", (i+1));
about ("Child");
break;
} else {
/* in parent */
if (waitpid(child, NULL, 0) < 0) {
perror("Failed to collect child process");
break;
}
}
}
return 0;
}
If the parent exits before one or more of its children, which can happen if it does not wait, then the child will thereafter see its parent process being pid 1.
Others have already answered how to get a zombie process list via the ps command. You may also be able to see zombies via top. With your original code you are unlikely to catch a glimpse of zombies, however, because the parent process exits very quickly, and init will then clean up the zombies it leaves behind.
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
You can use wait() anywhere in the parent process, and when the child process terminates it'll be removed from the system. Where to put it is up to you; in your specific case you probably want to put it immediately after the child = fork(); line so that the parent process won't resume its execution until its child has exited.
What is the command to view zombie processes if you have Linux virtual box?
You can use the ps aux command to view all processes in the system (including zombie processes), and the STAT column will be equal to Z if the process is a zombie. An example output would be:
USER     PID  %CPU %MEM  VSZ  RSS TTY  STAT START  TIME COMMAND
daniel  1000   0.0  0.0    0    0 ??   Z    17:15  0:00 command
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
You can register a signal handler for SIGCHLD that sets a global volatile sig_atomic_t flag = 0 variable to 1. Then, at some convenient place in your program, test whether flag is set to 1, and, if so, set it back to 0 and afterwards (for otherwise you might miss a signal) call waitpid(-1, NULL, WNOHANG) in a loop until it tells you that no more processes are to be waited for. Note that the signal will interrupt system calls with EINTR, which is a good condition to check for the value of flag. If you use an indefinitely blocking system call like select(), you might want to specify a timeout after which you check for flag, since otherwise you might miss a signal that was raised after your last waitpid() call but before entering the indefinitely blocking system call. An alternative to this kludge is to use pselect().
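The flag-based pattern described above might look roughly like the following sketch. The names on_sigchld, reap_if_flagged and run_with_flag are hypothetical, and the 10 ms nanosleep() merely stands in for the program's real work; because that sleep is bounded, it plays the role of the timeout mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited = 0;

/* The handler only records that at least one child changed state. */
static void on_sigchld(int sig)
{
    (void)sig;
    child_exited = 1;
}

/* If the flag is set, clear it first and then reap without blocking.
 * Returns the number of children collected. */
static int reap_if_flagged(void)
{
    int n = 0;
    if (child_exited) {
        child_exited = 0;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            n++;
    }
    return n;
}

/* Hypothetical demo: fork n children that exit immediately, then run a
 * main loop that reaps them at a convenient point. Returns the number of
 * children reaped, or -1 if the handler could not be installed. */
int run_with_flag(int n)
{
    struct sigaction sa;
    struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    int live = 0, reaped = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);
        live++;
    }

    while (live > 0) {
        /* Stand-in for real work. The sleep is bounded, so a signal that
         * arrives just before it delays reaping by at most one tick. */
        nanosleep(&tick, NULL);
        int r = reap_if_flagged();
        live -= r;
        reaped += r;
    }
    return reaped;
}
```

Clearing the flag before the waitpid loop matters: a child that exits during the loop sets the flag again, so the next pass still picks it up.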
Use:
ps -e -opid,ppid,pgid,stat,etime,cmd | grep defunct
to see your zombies; the ppid and pgid columns show the parent ID and the process group ID, and etime shows the elapsed (wall-clock) time since the process was started. The parent ID is useful for sending custom signals to the parent process.
If the parent process is correctly coded to catch and handle the SIGCHLD signal and does what is expected (i.e., waits for and reaps the zombies), then you can run:
kill -CHLD <parent_pid>
to tell the parent to reap all its zombies.
So basically what I need is:
pid = fork();
if (pid == -1)
exit(1);
if (pid == 0)
{
// do stuff in child
}
else
{
// ONLY do stuff while child is running
}
Would I need to create a tmp file right before the child exits, saying that it is no longer running, so that the parent knows the child has exited when that file exists? Or is there a simpler way to do this?
You can use waitpid to know if a child process is still running:
int status;
if (waitpid(pid, &status, WNOHANG) == 0) {
// still running
}
With WNOHANG, waitpid returns immediately so that the program can do something else.
When you have nothing to do other than waiting for the child process to terminate, call waitpid without WNOHANG.
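Put together with the fork() skeleton from the question, the idea could be sketched as follows. The helper names, the fixed exit code 3 and the sleep durations are assumptions made purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for one unit of the parent's real work. */
static void do_unit_of_work(void)
{
    struct timespec t = { .tv_sec = 0, .tv_nsec = 5 * 1000 * 1000 };  /* 5 ms */
    nanosleep(&t, NULL);
}

/* Fork a child that runs for about 50 ms and exits with code 3; the parent
 * works only while the child is running. Returns the number of work units
 * done, or -1 on error; *exit_code receives the child's exit code. */
int work_while_child_runs(int *exit_code)
{
    struct timespec child_run = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };
    int status = 0, units = 0;
    pid_t r;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        nanosleep(&child_run, NULL);    /* "do stuff in child" */
        _exit(3);
    }

    /* 0 from waitpid() means the child is still running. */
    while ((r = waitpid(pid, &status, WNOHANG)) == 0) {
        do_unit_of_work();
        units++;
    }
    if (r != pid || !WIFEXITED(status))
        return -1;
    *exit_code = WEXITSTATUS(status);
    return units;
}
```

No temporary file is needed: the kernel already keeps the child's termination status until the parent collects it.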
The standard way to know that the child has terminated (and get its exit code) is to use the waitpid() system call.
Check wait() and waitpid(): http://linux.die.net/man/2/wait
Here is another resource: http://users.actcom.co.il/~choo/lupg/tutorials/multi-process/multi-process.html#child_death_wait_syscall
There's a bunch of ways to do it. If you don't need to do anything with the child output, you can set a SIGCHLD handler to reap the child when it exits, and then forget about it in your main thread of execution. You can use the SIGCHLD handler to flag the exit of the child process via an IPC mechanism.
Or you can add a while loop that checks waitpid in your else clause. You would be doing discrete units of work between polls of the child state and you wouldn't get interrupted immediately on child exit.
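Use the wait() system call if you just need to wait until the child has stopped running. It blocks until a child terminates and returns that child's pid, or -1 on error; it never returns 0.
pid_t pid;
int status;
while (true)
{
pid = wait(&status);
if (pid < 0 && errno == EINTR)
continue; //interrupted by a signal: keep waiting
break; //pid > 0: child is done; pid < 0: no child left to wait for
}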