I have understood that:
1) waitpid is used to wait for a child's death and then collect the SIGCHLD and the exit status of the child etc.
2) When we have a signal handler for SIGCHLD, we do some more things related to cleanup of the child or other stuff (up to the programmer) and then call waitpid so that the child will not become a zombie, and then return.
Now, do we need to have both 1 and 2 in our programs when we do a fork/exec and the child returns?
If we have both, the SIGCHLD is obtained first, so the signal handler is called first and thus its waitpid is called successfully and not the waitpid in the parent process code as follows:
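#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
static void my_signal_handler_for_sigchld(int signo)
{
    /* do something */
    pid_t tmp = waitpid(-1, NULL, 0);
    printf("%d\n", (int)tmp); /* tmp is the correct value of the child pid */
}
int main(void)
{
    struct sigaction sa = { .sa_handler = my_signal_handler_for_sigchld };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);
    pid_t child_pid = fork();
    if (child_pid == 0)
        return 0; /* child: do something, return */
    /* parent: print the value returned from this waitpid - it is -1 */
    printf("%d\n", (int)waitpid(child_pid, NULL, 0));
    return 0;
}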
Appreciate if someone helps me understand this.
You really don't need to handle SIGCHLD if your intent is to run a child process, do some stuff, then wait for it to finish. In that case, you just call waitpid when you're ready to synchronize. The only thing SIGCHLD is useful for is asynchronous notification of child termination, for example if you've got an interactive (or long-running daemon) application that's spawning various children and needs to know when they finish. However, SIGCHLD is really bad/ugly for this purpose too, since if you're using library code that creates child processes, you might catch the events for the library's children terminating and interfere with its handling of them. Signal handlers are inherently process-global and deal with global state, which is usually A Bad Thing(tm).
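A minimal sketch of that synchronous pattern, assuming a POSIX system. The helper name run_and_wait is made up for illustration. No SIGCHLD handler is installed; the parent just calls waitpid when it is ready to synchronize.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program argv[0] with arguments argv, block
 * until it finishes, and return its exit status (or -1 on failure or if
 * the child was killed by a signal). */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }

    /* ... the parent can do other work here ... */

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)      /* retry only if interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with { "sh", "-c", "exit 7", NULL } should return 7, since the shell's exit status is passed straight back through waitpid.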
Here are two better approaches for when you have child processes that will be terminating asynchronously:
Approach 1 (select/poll event-based): Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
Approach 2 (thread based): For each child process you create, also create a thread that will immediately call waitpid on the child process's pid. When waitpid returns successfully, use your favorite thread synchronization primitives to let the rest of the program know that the child terminated, or simply take care of everything you need to do in this waiter thread before it terminates.
Both of these approaches are modular and library-friendly (they avoid interfering with any other parts of your code or library code which might be making use of child processes).
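Approach 1 could look roughly like the following sketch. It assumes a dummy pipe whose write end is held only by the child; wait_via_pipe and its exit_code parameter are hypothetical names, and a real program would keep the read end in its main poll set instead of polling a single descriptor.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with exit_code, notice its
 * death through end-of-file on a pipe, then reap it. Returns the child's
 * exit status, or -1 on error. */
int wait_via_pipe(int exit_code)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);           /* child keeps only the write end */
        /* ... child work; the write end closes when the child exits ... */
        _exit(exit_code);
    }

    close(fds[1]);               /* parent must not hold the write end open */

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    for (;;) {
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            close(fds[0]);
            return -1;
        }
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n == 0)
            break;               /* EOF: every write end is closed */
        if (n < 0 && errno != EINTR) {
            close(fds[0]);
            return -1;
        }
    }
    close(fds[0]);

    /* The pipe closed, so the child is dead (or dying): reap the zombie. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

One caveat with this design: if the child spawns its own children and they inherit the write end, EOF only arrives once all of them have exited or closed it.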
You need to call one of the waiting syscalls, such as waitpid or its relatives (e.g. wait4); otherwise you could end up with zombie processes.
You could handle SIGCHLD to be notified that some child ended (or stopped, etc...) but you'll need to wait for it later.
Signal handlers are restricted to calling a small set of async-signal-safe functions (see signal(7) for more). Good advice is to just set a volatile sig_atomic_t flag inside the handler and test it later, at a safer place.
I have a daemon application that starts several 3rd party executables (all closed-sources and non modifiable).
I would like to have all the child processes to automatically terminate when the parent exits for any reason (including crashes).
Currently, I am using prctl to achieve this (see also this question):
pid_t ret = fork();
if (ret == 0) {
    // Set up other stuff
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    char *const argv[] = { "childexecutable", NULL };
    char *const envp[] = { NULL };
    if (execve("childexecutable", argv, envp) < 0) { /* signal error */ }
}
However, if "childexecutable" also forks and spawns "grandchildren", then "grandchildren" is not killed when my process exits.
Maybe I could create an intermediate process that serves as a subreaper, which would kill "childexecutable" when my process dies and then wait for SIGCHLD and keep killing child processes until none is left, but that seems very brittle.
Are there better solutions?
Creating a subreaper is not useful in this case; your grandchildren would be reparented to and reaped by init anyway.
What you could do however is:
Start a parent process and fork a child immediately.
The parent will simply wait for the child.
The child will carry out all the work of your actual program, including spawning any other children via fork + execve.
Upon exit of the child for any reason (including deadly signals, e.g. a crash), the parent can issue kill(0, SIGKILL) or killpg(getpgid(0), SIGKILL) to kill all the processes in its process group. Issuing a SIGINT/SIGTERM before SIGKILL would probably be a better idea depending on what child processes you want to run, as they could handle such signals and do a graceful cleanup of used resources (including children) before exiting.
Assuming that none of the children or grandchildren changes their process group while running, this will kill the entire tree of processes upon exit of your program. You could also keep the PR_SET_PDEATHSIG before any execve to make this more robust. Again depending on the processes you want to run a PR_SET_PDEATHSIG with SIGINT/SIGTERM could make more sense than SIGKILL.
You can issue setpgid(getpid(), 0) before doing any of the above to create a new process group for your program and avoid killing any parents when issuing kill(0, SIGKILL).
The logic of the "parent" process should be really simple, just a fork + wait in a loop + kill upon the right condition returned by wait. Of course, if this process crashes too then all bets are off, so take care in writing simple and reliable code.
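A rough sketch of such a supervising parent, with one deliberate deviation that is an assumption of this example and not part of the steps above: the worker is put into its own process group and the kill targets that group, so the supervisor itself survives (kill(0, SIGKILL) would take the caller down too). The name supervise is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv as a worker in its own process group, wait
 * for it, then SIGKILL whatever is left in that group. Returns the worker's
 * exit status, or -1 on error or if the worker died from a signal. */
int supervise(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);           /* worker becomes leader of a new group */
        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }
    /* Set the group from the parent side as well to close the race; this
     * fails harmlessly if the child has already exec'd. */
    setpgid(pid, pid);

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    /* The worker is gone; kill any descendants still in its group.
     * A negative pid addresses the whole group. ESRCH just means that
     * nobody was left. */
    if (kill(-pid, SIGKILL) < 0 && errno != ESRCH)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

As noted above, this only catches descendants that stay in the worker's process group, and sending SIGTERM first would give them a chance to clean up.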
I am reading https://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551, and there the author handles SIGCHLD in a handler that calls waitpid rather than wait.
In Figure 5.7, we cannot call wait in a loop, because there is no way to prevent wait from blocking if there are
running children that have not yet terminated.
The handler is as follows:
void sig_chld(int signo)
{
    pid_t pid;
    int stat;
    // while ((pid = wait(&stat)) > 0)
    while ((pid = waitpid(-1, &stat, WNOHANG)) > 0)
    {
        printf("child %d terminated\n", pid);
    }
}
The question is: even if I use the blocking version of wait (as in the commented-out line), the children are still reaped (which is what I want, in order to not have zombies), so why bother about whether the call is blocking or non-blocking?
I assume that in the non-blocking case (i.e. with waitpid and WNOHANG) the handler can be called multiple times (when some children have terminated and others are still running). But I could still just block in that handler and wait for all children to terminate, so there is no difference between calling the handler multiple times or just once. Or is there another reason for not blocking and calling the handler multiple times?
The while loop condition will run one more time than there are zombie child processes that need to be waited for. So if you use wait() instead of waitpid() with the WNOHANG flag, it'll potentially block forever if you have another still-running child, as wait() only returns early with an ECHILD error if there are no child processes at all. A robust generic handler will use waitpid() to avoid that.
Picture a case where the parent process starts multiple children to do various things, and periodically sends them instructions about what to do. When the first one exits, using wait() in a loop in the SIGCHLD handler will cause it to block forever while the other child processes are hanging around waiting for more instructions that they'll never receive.
Or, say, an inetd server that listens for network connections and forks off a new process to handle each one. Some services finish quickly, some can run for hours or days. If it uses a signal handler to catch exiting children, it won't be able to do anything else until the long-lived one exits if you use wait() in a loop once that handler is triggered by a short-lived service process exiting.
It took me a second to understand what the problem was, so let me spell it out. As everyone else has pointed out, wait(2) may block. waitpid(2) will not block if the WNOHANG option is specified.
You are correct to say that the first call to wait(2) should not block since the signal handler will only be called if a child exited. However, when the signal handler is called there may be several children that have exited. Signals are delivered asynchronously, and can be consolidated. So, if two children exit at close to the same time, it's possible that the operating system only sends one signal to the parent process. Thus, a loop must be used to iteratively check if more than one child has exited.
Now that we've established that the loop is necessary for checking if multiple children have exited, it is clear that we can't use wait(2) since it will block on the second iteration if there is a child that has not exited.
TL;DR The loop is necessary, and hence using waitpid(2) is necessary.
I have just had a lecture that sums reaping as:
Reaping
Performed by parent on terminated child (using wait or waitpid)
Parent is given exit status information
Kernel then deletes zombie child process
So I understand that reaping is done by calling wait or waitpid from the parent process, after which the kernel deletes the zombie process. If this actually is the case, that reaping is done only when calling wait or waitpid, why do the child processes actually go away after returning from their entry function? I mean, it does seem as if the child processes have been reaped and thus no resources are wasted, even though the parent process may not be waiting.
So is "reaping" only possible when calling wait or waitpid? If processes are "reaped" as long as they return and exit from their entry function (which I assume all processes do), what is the point of talking about "reaping" as if it was something special?
The child process does not fully "go away" when it exits. It ceases to exist as a running process, and most/all of its resources (memory, open files, etc.) are released, but it still remains in the process table. It remains in the process table because that's where its exit status is stored, so that the parent can retrieve it by calling one of the wait variants. If the parent fails to call wait, the process table entry sticks around — and that's what makes it a "zombie".
I said that most/all of its resources are released, but the one resource that's definitely still consumed is that process table slot.
As long as the (dead) child's parent exists, the kernel doesn't know that the parent isn't going to call wait eventually, so the process table slot has to stay there, so that the eventual call to wait (if there is one) can return the proper exit status.
If the parent eventually exits (without ever calling wait), the child will be inherited by the grandparent, which is usually a "master" process like the shell, or init, that does routinely call wait and that will finally "reap" the poor young zombie.
So, yes, it really is true that the only way for the parent to properly "reap" the child is, just as was said in your lecture, to call one of the wait functions. (Or to exit, but that's not an option if the parent is long-running.)
Footnote: I said "the child will be inherited by the grandparent", but I think I was wrong, there. Under Unix and Linux, orphaned processes are generally always inherited by pid 1, aka init.
The purpose of the wait*() call is to allow the child process to report a status back to the parent process. When the child process exits, the operating system holds that status data in a little data structure until the parent reads it. Reaping in that sense is cleaning out that little data structure.
If the parent does not care about waiting for status from the child, the code could be written in a way to allow the parent to ignore the status, and so the reaping occurs semi-automatically. One way is to ignore the SIGCHLD signal.
Another way is to perform a double-fork to create a grandchild process instead. When doing this, the "parent" does a blocking wait() after a call to fork(). Then, the child performs another fork() to create the grandchild and then immediately exits, causing the parent to unblock. The grandchild now does the real work, and is automatically reaped by the init process.
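The double-fork could be sketched like this. spawn_detached is a hypothetical name, and the grandchild simply execs the given command. After it returns, the caller has no children left to wait for.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv as a grandchild that the caller never has
 * to wait for. Returns 0 if the intermediate child forked successfully,
 * -1 otherwise. */
int spawn_detached(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            execvp(argv[0], argv);   /* grandchild does the real work */
            _exit(127);              /* only reached if exec failed */
        }
        _exit(0);                    /* intermediate child exits at once */
    }

    /* Blocking wait, but only for the short-lived intermediate child. */
    int status;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The price of this approach is that the original parent can no longer learn the grandchild's exit status through wait.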
In C programming for Linux, I know the wait() function is used to wait for the child process to terminate, but are there some ways (or functions) for child processes to wait for parent process to terminate?
Linux has an extension (as in, non-POSIX functions) for this. Look up prctl ("process-related control").
With prctl, you can arrange for the child to get a signal when the parent dies. Look for the PR_SET_PDEATHSIG operation code used with prctl.
For instance, if you set it to the SIGKILL signal, it effectively gives us a way to have the children die when a parent dies. But of course, the signal can be something that the child can catch.
prctl can do all kinds of other things. It's like an ioctl whose target is the process itself: a "process ioctl".
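A Linux-only sketch of how a child might use this. die_with_parent and its expected_parent parameter are hypothetical names. The getppid check is an addition that guards against the parent having died before the prctl call took effect.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, Linux only: ask the kernel to send `sig` to the
 * calling process when its parent dies. `expected_parent` is the parent's
 * pid as recorded before fork(). Returns 0 on success, -1 if prctl failed
 * or the parent had already gone before the request took effect. */
int die_with_parent(pid_t expected_parent, int sig)
{
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig) < 0)
        return -1;
    /* Close the race: if the parent exited before the prctl call, no
     * signal will ever be delivered, so detect that case here. */
    if (getppid() != expected_parent)
        return -1;
    return 0;
}
```

Typical use would be to save getpid() in the parent before fork(), then call die_with_parent(saved_pid, SIGTERM) in the child before it starts its real work or calls exec.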
Short answer: no.
A parent process can control the terminal or process group of its children, which is why we have the wait() and waitpid() functions. A child doesn't have that kind of control over its parent, so there's nothing built in for that.
If you really need a child to know when its parent exits, you can have the parent send a signal to the child in an atexit() handler, and have the child catch that signal.
In Linux, you can use prctl with the value PR_SET_PDEATHSIG to establish a signal that will be sent to your process when the thread that created it dies. Maybe you find it useful.
When a parent process ends, the child process is adopted by init, so it is enough to check in the child process whether getppid() == 1 or getppid() differs from the original PPID.
That means the parent process has finished.
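As a sketch, the check can be wrapped in a tiny helper (parent_has_exited is a hypothetical name) that the child polls from its main loop. Comparing against the saved PPID is the safer of the two tests, since on some systems orphans are adopted by a process other than pid 1.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 once the process has been reparented,
 * i.e. its original parent has exited, and 0 otherwise. `original_ppid`
 * must be the value of getppid() saved at startup (or the parent's own
 * getpid() recorded before fork). */
int parent_has_exited(pid_t original_ppid)
{
    return getppid() != original_ppid;
}
```

This is polling, so the child only notices the parent's death the next time it calls the helper.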
I want to create a lot of child processes using the fork > exec procedure. Many processes end very quickly (in less than two minutes, some even earlier).
My first problem is, I put the spawn process into the background with
./spawnbot > logging.txt
[CTRL+Z]
bg 1
disown
So far so good. Now I don't see any of the spawnbot's messages anymore and they go straight into logging.txt. However, whenever a new child is created I see all the info about that child in my console again. I now wanted to start each child with its own pipe. Is there a better way to not have children post their output messages all over the console? Should I just redirect it to /dev/null, or is this done with some flag in C?
Secondly, all the children don't really get killed. I have a lot of processes in my ps -ef. What can I do about that? How do I d
First your second question!
Your children stay in 'zombie' mode because the kernel thinks you might still want to retrieve a return value from them.
If you have no intention to get return values from your child processes, you should set the SIGCHLD signal handler in the parent process to SIG_IGN to have the kernel automatically reap your children.
signal(SIGCHLD, SIG_IGN);
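A small sketch that makes the effect observable, assuming Linux or another system that follows POSIX.1-2001 for ignored SIGCHLD. child_is_auto_reaped is a hypothetical name.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately while SIGCHLD
 * is ignored. Returns 1 if the kernel reaped the child on its own (waitpid
 * fails with ECHILD), 0 if there was still a status to collect, -1 on
 * error. Note that SIGCHLD stays ignored after the call. */
int child_is_auto_reaped(void)
{
    if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);

    /* With SIGCHLD ignored, waitpid blocks until the child is gone and
     * then reports that there is nothing to wait for. */
    if (waitpid(pid, NULL, 0) == -1 && errno == ECHILD)
        return 1;
    return 0;
}
```

The flip side is that the exit status is discarded, so this only fits when you really do not care how the children ended.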
The first question depends a bit on your implementation.
But generally speaking, just after you fork() you should use close() to close the old file descriptors for 0 and 1 and then use dup2() to set them to your wanted values. No time for an example right now, but I hope this pushes you in the right direction.
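A minimal sketch of that redirection, assuming the goal is to send the child's stdout and stderr to a file or to /dev/null. redirect_output is a hypothetical name. dup2() closes the target descriptor itself, so no separate close() of 1 and 2 is needed.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: point stdout and stderr of the calling process at
 * `path` (for example "/dev/null" or a log file). Meant to be called in
 * the child between fork() and exec(). Returns 0 on success, -1 on error. */
int redirect_output(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0) {
        close(fd);
        return -1;
    }
    if (fd > STDERR_FILENO)
        close(fd);               /* the duplicates keep the file open */
    return 0;
}
```

Because the redirection survives exec, the spawned program writes to the file without knowing anything about it.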
Your child processes are getting killed. Defunct processes are also called zombie processes; zombies are dead! A zombie process is nothing but an entry in the process table, it doesn't have any code or memory.
When a process dies (by calling _exit, or killed by a signal), it must be reaped by its parent. Every resource used by the process other than the entry in the process table disappears. The parent must call wait or waitpid. Once the parent has been notified of the child process's death, and has had the opportunity to read the child's exit status, the child's entry in the process table disappears as well: the zombie is reaped.
If you never want to be notified of your children's death, ignore the SIGCHLD signal; this tells the kernel that you're not interested in knowing the fate of your children and the zombie will be reaped automatically.
signal(SIGCHLD, SIG_IGN);
If you only want to be notified of your children's deaths in specific circumstances, call sigaction with the SA_NOCLDWAIT flag. When a child dies, if the parent is executing one of the wait family of functions, it'll be notified of the child's death and be told the exit status; otherwise the child's exit status will be discarded.
struct sigaction sa;
sa.sa_handler = &my_sigchld_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_NOCLDWAIT;
sigaction(SIGCHLD, &sa, NULL);
Regarding the output, your children write to the same places as the parent unless you've explicitly redirected them (with close and open, or dup, or a number of other possibilities). Your children are probably printing diagnostic messages to standard error (that's what it's for, after all).
./spawnbot >logging.txt 2>&1
In addition, since you seem to want to detach the children from the terminal, you probably want to make sure they don't receive a SIGHUP if you kill the terminal. So use nohup:
nohup ./spawnbot >logging.txt 2>&1 &
disown