I am reading https://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551, where the author handles SIGCHLD in a handler that calls waitpid rather than wait.
In Figure 5.7, we cannot call wait in a loop, because there is no way to prevent wait from blocking if there are running children that have not yet terminated.
The handler is as follows:
void sig_chld(int signo)
{
    pid_t pid;
    int stat;
    // while ((pid = wait(&stat)) > 0)
    while ((pid = waitpid(-1, &stat, WNOHANG)) > 0)
    {
        printf("child %d terminated\n", pid);
    }
}
The question is: even if I use the blocking version, wait (commented out above), the terminated children are reaped anyway (which is what I want, so that there are no zombies), so why bother about whether the call is blocking or non-blocking?
I assume that in the non-blocking case (i.e. with waitpid) the handler can be called multiple times (when some children have terminated and others are still running). But I could just as well block in the handler and wait for all the children to terminate, so there seems to be no difference between calling the handler multiple times and calling it once. Or is there another reason for using the non-blocking call and invoking the handler multiple times?
The while loop condition will run one more time than there are zombie child processes that need to be waited for. So if you use wait() instead of waitpid() with the WNOHANG flag, it will potentially block forever if another child is still running, because wait() only returns early with an ECHILD error if there are no child processes at all. A robust generic handler will use waitpid() to avoid that.
Picture a case where the parent process starts multiple children to do various things, and periodically sends them instructions about what to do. When the first one exits, using wait() in a loop in the SIGCHLD handler will cause it to block forever while the other child processes are hanging around waiting for more instructions that they'll never receive.
Or, say, an inetd server that listens for network connections and forks off a new process to handle each one. Some services finish quickly; some can run for hours or days. If the server catches exiting children with a signal handler that calls wait() in a loop, then once that handler is triggered by a short-lived service process exiting, the server won't be able to do anything else until the long-lived one exits.
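A minimal sketch of the difference, assuming a POSIX system: with one child still running and nothing to reap, waitpid() with WNOHANG returns 0 immediately, where wait() would block until that child exits. The function name poll_running_child is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: start one long-lived child, then ask waitpid() for
 * exited children without blocking. Returns waitpid()'s result
 * (or -1 if fork() fails). */
int poll_running_child(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pause();                /* long-lived child: sleeps until signalled */
        _exit(0);
    }

    int stat;
    /* The child exists but has not exited, so this returns 0 at once.
     * wait(&stat) at this point would block until the child terminates. */
    pid_t r = waitpid(-1, &stat, WNOHANG);

    kill(child, SIGKILL);       /* clean up the demo child */
    waitpid(child, &stat, 0);
    return (int)r;
}
```

The 0 return value is what ends the while loop in the handler while other children are still running.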
It took me a second to understand what the problem was, so let me spell it out. As everyone else has pointed out, wait(2) may block. waitpid(2) will not block if the WNOHANG option is specified.
You are correct to say that the first call to wait(2) should not block, since the signal handler will only be called if a child exited. However, when the signal handler is called there may be several children that have exited. Signals are delivered asynchronously, and can be consolidated. So, if two children exit at close to the same time, it's possible that the operating system only sends one signal to the parent process. Thus, a loop must be used to iteratively check if more than one child has exited.
Now that we've established that the loop is necessary for checking if multiple children have exited, it is clear that we can't use wait(2) since it will block on the second iteration if there is a child that has not exited.
TL;DR The loop is necessary, and hence using waitpid(2) is necessary.
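To make this concrete, here is a sketch (not the book's code) of a handler that loops with WNOHANG, plus a small driver that forks several children which exit almost simultaneously. The names reaped and reap_burst are hypothetical; SIGCHLD is deliberately kept blocked while the children exit, so that exits happening in that window are consolidated into one pending signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;

/* Reap every child that has already exited; never block. */
static void sig_chld(int signo)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */
    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Hypothetical demo: fork n children that exit at once and return how
 * many of them the handler reaped. */
int reap_burst(int n)
{
    struct sigaction sa = {0}, old_sa;
    sigset_t chld, old_mask, wait_mask;
    int started = 0;

    sa.sa_handler = sig_chld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* While SIGCHLD is blocked, any number of child exits leave at most
     * one SIGCHLD pending. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old_mask);

    reaped = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child: exit immediately */
        if (pid > 0)
            started++;
    }

    /* Let SIGCHLD in only while suspended, until everything is reaped. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);
    while (reaped < started)
        sigsuspend(&wait_mask);

    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return reaped;
}
```

Even if the handler runs fewer than n times, the loop inside it collects all n children. With a single waitpid() call per handler invocation, some of them could stay zombies.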
Related
I have a daemon application that starts several 3rd-party executables (all closed-source and non-modifiable).
I would like all the child processes to terminate automatically when the parent exits for any reason (including crashes).
Currently, I am using prctl to achieve this (see also this question):
pid_t ret = fork();
if (ret == 0) {
    // Set up other stuff
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    // argv and envp stand for the child's argument and environment arrays
    if (execve("childexecutable", argv, envp) < 0) { /* signal error */ }
}
However, if "childexecutable" also forks and spawns "grandchildren", then "grandchildren" is not killed when my process exits.
Maybe I could create an intermediate process that serves as a subreaper, which would kill "childexecutable" when my process dies and then wait for SIGCHLD and keep killing child processes until none are left, but that seems very brittle.
Are there better solutions?
Creating a subreaper is not useful in this case, your grandchildren would be reparented to and reaped by init anyway.
What you could do however is:
Start a parent process and fork a child immediately.
The parent will simply wait for the child.
The child will carry out all the work of your actual program, including spawning any other children via fork + execve.
Upon exit of the child for any reason (including fatal signals, e.g. a crash) the parent can issue kill(0, SIGKILL) or killpg(getpgid(0), SIGKILL) to kill all the processes in its process group. Issuing a SIGINT/SIGTERM before SIGKILL would probably be a better idea, depending on what child processes you want to run, as they could handle such signals and do a graceful cleanup of used resources (including children) before exiting.
Assuming that none of the children or grandchildren changes their process group while running, this will kill the entire tree of processes upon exit of your program. You could also keep the PR_SET_PDEATHSIG before any execve to make this more robust. Again depending on the processes you want to run a PR_SET_PDEATHSIG with SIGINT/SIGTERM could make more sense than SIGKILL.
You can issue setpgid(getpid(), 0) before doing any of the above to create a new process group for your program and avoid killing any parents when issuing kill(0, SIGKILL).
The logic of the "parent" process should be really simple, just a fork + wait in a loop + kill upon the right condition returned by wait. Of course, if this process crashes too then all bets are off, so take care in writing simple and reliable code.
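A rough sketch of such a supervising parent follows. It is a variation on the steps above, not a literal transcription: instead of the parent creating a group for itself and using kill(0, SIGKILL), the worker is made leader of its own process group and the parent signals that group with kill(-pgid, SIGKILL), so the parent never signals itself. The names run_supervised and leave_orphan are hypothetical, and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run work() in a child that leads its own process group. When that child
 * ends for any reason, kill whatever is left in its group.
 * Returns the child's wait status, or -1 on error. */
int run_supervised(void (*work)(void))
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        setpgid(0, 0);          /* become leader of a new process group */
        work();                 /* the actual program; may fork + execve */
        _exit(0);
    }
    setpgid(child, child);      /* same call in the parent closes the race */

    siginfo_t info;
    int status;
    /* Wait for the worker to end but leave it as a zombie for now, so its
     * pid (and therefore the group id) cannot be recycled before the kill. */
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) < 0)
        return -1;
    kill(-child, SIGKILL);      /* signal every process still in the group */
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return status;
}

/* Hypothetical worker: leaves a grandchild behind, then exits with 7. */
void leave_orphan(void)
{
    if (fork() == 0) {
        pause();                /* grandchild that would otherwise linger */
        _exit(0);
    }
    _exit(7);
}
```

Calling run_supervised(leave_orphan) returns a status for which WEXITSTATUS is 7, and the lingering grandchild receives SIGKILL. As noted above, sending SIGTERM first and SIGKILL after a grace period may be kinder to children that clean up after themselves, and none of this helps if a descendant changes its process group.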
It is possible to detect when the status of a pid/tid changes with waitpid, but that is a blocking function.
I want to monitor all the threads of a specific pid, get a signal when one of them changes state, and print its tid.
For now I start as many threads as there are threads in that process; each one calls waitpid on one tid, and when that blocking call returns I print the tid that changed.
How can I get a signal when a tid changes, so that I can monitor all the tids from a single thread?
I do not want to monitor every pid in the system, only specific pids/tids.
Those tids/pids are not children of my process.
You can call
    int status;
    pid_t pid = waitpid(-1, &status, 0);
to wait for a status change in any child process.
So you do not have to specify in advance which pid to monitor, and can react to any status change. This way you do not need to start one thread for each pid.
As to the signal part of your question: A SIGCHLD is sent to your process when a child process exits. This signal is ignored by default, but you can install a custom signal handler for it, of course.
If you only want to reap specific pids, Linux provides the option WNOWAIT (accepted by waitid()), which only reports the state but does not actually reap the child process. You can then check whether the pid is one of those you want to monitor and, if so, call waitpid() on it without that option.
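A minimal sketch of this peek-then-reap idea, assuming waitid(), which is the call that accepts WNOWAIT on Linux. The function names reap_if_wanted and peek_then_reap_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block until some child has exited, but only reap it if it is `wanted`.
 * Returns 1 if `wanted` was reaped, 0 if a different child exited (it is
 * left waitable), -1 on error. */
int reap_if_wanted(pid_t wanted)
{
    siginfo_t info;

    /* WNOWAIT: report the state change but leave the child waitable. */
    if (waitid(P_ALL, 0, &info, WEXITED | WNOWAIT) < 0)
        return -1;
    if (info.si_pid != wanted)
        return 0;
    return waitpid(wanted, NULL, 0) == wanted ? 1 : -1;
}

/* Hypothetical demo: returns 1 if the first call left the child alone and
 * the second call reaped it. */
int peek_then_reap_demo(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(3);

    int skipped = reap_if_wanted(getpid());  /* our own pid is never a child */
    int reaped = reap_if_wanted(child);
    return skipped == 0 && reaped == 1;
}
```

One caveat of this approach: as long as an unwanted child stays unreaped, repeated waitid(P_ALL, ...) calls may keep reporting that same child, so something else has to reap it eventually.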
If the processes are not children, waitpid() cannot be used in general. One option is to attach to these 40 processes with ptrace() so that you are notified when one of them exits. This might have unwanted side effects, however.
If you're using POSIX threads, then you could use pthread_cleanup_push and pthread_cleanup_pop to call a "cleanup" function when your thread is exiting.
This "cleanup" function could then send one of the user signals (SIGUSR1 or SIGUSR2) to the process which then catches it and treats it as a signal about thread termination.
If you use sigqueue you can add the thread-id for the signal handler so it knows which thread just exited.
You can use pthread_sigmask to block the user signal in all threads, to make sure it's only delivered to the main process thread (or use pthread_sigqueue to send to the main process thread specifically).
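A sketch of how these pieces could fit together, assuming you control the code of the threads being monitored. The names announce_exit, worker and wait_for_thread_exit are hypothetical, and an application-chosen integer id stands in for the thread id. SIGUSR1 is blocked in every thread and picked up synchronously with sigwaitinfo() rather than with an asynchronous handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Cleanup function: queue SIGUSR1 to the process, carrying the thread's id. */
static void announce_exit(void *arg)
{
    union sigval v;
    v.sival_int = *(int *)arg;
    sigqueue(getpid(), SIGUSR1, v);
}

static void *worker(void *arg)
{
    pthread_cleanup_push(announce_exit, arg);
    /* ... the thread's real work would go here ... */
    pthread_cleanup_pop(1);     /* 1 = run the cleanup function now */
    return NULL;
}

/* Hypothetical demo: start one worker tagged with `id` and return the id
 * that arrives with the signal (or -1 on error). */
int wait_for_thread_exit(int id)
{
    sigset_t set, old;
    siginfo_t info;
    pthread_t t;
    int rc;

    /* Block SIGUSR1 before creating threads; new threads inherit the mask,
     * so the signal stays pending until someone waits for it. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, &old);

    if (pthread_create(&t, NULL, worker, &id) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    do {
        rc = sigwaitinfo(&set, &info);   /* synchronous wait, no handler */
    } while (rc < 0 && errno == EINTR);

    pthread_join(t, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return rc == SIGUSR1 ? info.si_value.sival_int : -1;
}
```

Here wait_for_thread_exit(42) should return 42. The cleanup function also runs if the thread is cancelled or calls pthread_exit() between the push and the pop, which is the reason to use it instead of signalling at the end of the thread function.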
Is there any way in the C programming language to stop a child process and then call it again so that it starts from the beginning? I have realised that if I use SIGKILL and then call the child process again, nothing happens.
void handler(int sig) {
    printf("entered handler");
    kill(getpid(), SIGKILL);
}
int main() {
    pid_t child;
    child = fork();
    if (child < 0) printf("error");
    else if (child == 0) {
        signal(SIGINT, handler);
        pause();
    }
    else {
        kill(child, SIGINT);
        kill(child, SIGINT);
    }
}
This should print "entered handler" two times, but it does not. Probably because it cannot call the child again. Could I correct this in some way?
This should print "entered handler" two times, but it does not.
Probably because it cannot call the child again.
There are several problems here, but a general inability to deliver SIGINT twice to the same process is not one of them. The problems include:
The signal handler delivers a SIGKILL to the process in which it is running, effecting that process's immediate termination. Once terminated, the process will not respond to further signals, so there is no reason to expect that the child would ever print "entered handler" twice.
There is a race condition between the child installing a handler for SIGINT and the parent sending it that signal. If the child receives the signal before installing a handler for it, then the child will terminate without producing any output.
There is a race condition between the first signal being accepted by the child and the second being delivered to it. Normal signals do not queue, so the second will be lost if delivered while the first is still pending.
There is a race condition between the child blocking in pause() and the parent signaling. If the signal handler were not killing the child, then it would be possible for the child to receive both signals before reaching the pause() call, and therefore fail to terminate at all.
In the event that the child made it to blocking in pause() before the parent first signaled it, and if it did not commit suicide by delivering itself a SIGKILL, then the signal should cause it to unblock and return from pause(), on a path to terminating normally. Thus, there would then also be a race condition between delivery of the second signal and normal termination of the child.
The printf() function is not async-signal safe. Calling it from a signal handler produces undefined behavior.
You should always use sigaction() to install signal handlers, not signal(), because the behavior of signal() is underspecified and varies in practice. The only safe use for signal() is to reset the disposition of a signal to its default.
Could I correct this in some way?
Remove the kill() call from the signal handler.
Replace the printf() call in the signal handler with a corresponding write() call.
Use sigaction() instead of signal() to install the handler. The default flags should be appropriate for your use.
Solve the various race conditions by:
Having the parent block SIGINT (via sigprocmask()) before forking, so that it will initially be blocked in the child.
Having the child use sigsuspend(), with an appropriate signal mask, instead of pause().
Having the child send some kind of response to the parent after returning from sigsuspend() (a signal of its own, perhaps, or a write to a pipe that the parent can read), and having the parent await that response before sending the second signal.
Having the child call sigsuspend() a second time to receive the second signal.
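Put together, the corrections listed above might look like the following sketch. The pipe-based acknowledgement is one of the response mechanisms suggested, chosen here for simplicity; the names on_sigint, handled and signal_child_twice are hypothetical, and the child reports how often its handler ran through its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handled;

static void on_sigint(int signo)
{
    static const char msg[] = "entered handler\n";
    /* write() is async-signal-safe, unlike printf() */
    ssize_t ignored = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    (void)signo;
    handled++;
}

/* Returns how many times the child's handler ran, or -1 on error. */
int signal_child_twice(void)
{
    sigset_t block, old_mask, wait_mask;
    int ack[2], status;
    char byte;

    if (pipe(ack) < 0)
        return -1;

    /* Block SIGINT before forking so that it starts out blocked in the child. */
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    pid_t child = fork();
    if (child < 0) {
        close(ack[0]);
        close(ack[1]);
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        return -1;
    }
    if (child == 0) {
        struct sigaction sa = {0};
        sa.sa_handler = on_sigint;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGINT, &sa, NULL);
        close(ack[0]);

        wait_mask = old_mask;
        sigdelset(&wait_mask, SIGINT);
        for (int i = 0; i < 2; i++) {
            sigsuspend(&wait_mask);          /* atomically unblock and wait */
            if (write(ack[1], "k", 1) != 1)  /* tell the parent we handled it */
                _exit(255);
        }
        _exit(handled);
    }

    close(ack[1]);
    for (int i = 0; i < 2; i++) {
        kill(child, SIGINT);
        if (read(ack[0], &byte, 1) != 1)     /* wait for the acknowledgement */
            break;
    }
    close(ack[0]);
    waitpid(child, &status, 0);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child only accepts SIGINT inside sigsuspend() and the parent waits for each acknowledgement, neither signal can arrive too early or be merged with the other.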
I'm a little overwhelmed by how many ways there are to control processes: wait(), pause(), signal handling, etc. All I want is to resume a paused process and have it execute the line after the pause() statement, like so:
/* Child code */
pause();
execvp(args[index], args);
The topology of my processes is linear children. One parent, n children, no grandchildren. So after the parent finishes forking, I have it running this loop to try to wake them up in order:
// Parent iterates through n child processes
for (i = 0; i < n; i++) {
// Need to unpause here, do i need signals?
signal(SIGCONT, sighandler);
// I don't know what im doing
}
wait(&status);
I can get their process IDs if that helps, but I don't know what to do with them.
From the pause(2) man page (emphasis mine):
pause() causes the calling process (or thread) to sleep until a signal is delivered that either terminates the process or causes the invocation of a signal-catching function.
And more specifically:
pause() only returns when a signal was caught and the signal-catching function returned.
This means that for your child to unpause, you need to send it a signal (and probably a custom signal handler).
This is a simple signal handling function. Usually these are put at the top of your source file (under the includes) or declared in a header file.
static volatile sig_atomic_t myGlobalStaticContinueVariable = 0;
void handleContinueSignal(int sig) {
    myGlobalStaticContinueVariable = 1; // Or some other handling code
}
And this is how you announce that your signal handling function should be associated with the SIGCONT signal, should it ever be received. You'll probably only want your child process to run this line. Make sure you put it in before the pause though - getting signal handlers running is one of the first things that a new process should do.
signal(SIGCONT, handleContinueSignal); // Name of the signal handling function
Finally, you can make your parent send a SIGCONT signal to the child by giving its PID like this:
kill(yourChildPID, SIGCONT);
For your code, you'll have to make the parent loop through the children and call this once for each child's PID, which will wake each of them up in turn.
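Here is a sketch of that parent loop. It deviates from the snippets above in two ways that should be read as assumptions: it installs the handler with sigaction(), and it keeps SIGCONT blocked and waits with sigsuspend() instead of pause(), so that a signal sent before the child starts waiting is not lost. The children call _exit(i) where the real code would call execvp(); the names on_cont and wake_children_in_order are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16

/* The handler only needs to exist so that sigsuspend() returns. */
static void on_cont(int signo) { (void)signo; }

/* Fork n waiting children, wake them one by one in order, and return how
 * many woke up and exited as expected (or -1 for a bad n). */
int wake_children_in_order(int n)
{
    pid_t pids[MAX_CHILDREN];
    struct sigaction sa = {0}, old_sa;
    sigset_t cont, old_mask, wait_mask;
    int woken = 0;

    if (n < 1 || n > MAX_CHILDREN)
        return -1;

    sa.sa_handler = on_cont;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCONT, &sa, &old_sa);   /* inherited by the children */

    /* Keep SIGCONT blocked except while a child is waiting for it. */
    sigemptyset(&cont);
    sigaddset(&cont, SIGCONT);
    sigprocmask(SIG_BLOCK, &cont, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCONT);

    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            sigsuspend(&wait_mask);     /* sleep until the parent signals */
            _exit(i);                   /* real code: execvp(args[index], args) */
        }
    }

    for (int i = 0; i < n; i++) {
        int status;
        if (pids[i] < 0)
            continue;                   /* this fork failed */
        kill(pids[i], SIGCONT);         /* wake child i */
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == i)
            woken++;
    }

    sigaction(SIGCONT, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return woken;
}
```

Waiting for each child before waking the next one makes them run strictly one after another; dropping the waitpid() from the loop and reaping afterwards would let them run concurrently.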
I have understood that:
1) waitpid is used to wait for a child's death and then collect the SIGCHLD, the exit status of the child, etc.
2) When we have a signal handler for SIGCHLD, we do some more things related to cleanup of the child or other stuff (up to the programmer) and then do a waitpid so that the child does not become a zombie, and then return.
Now, do we need to have both 1 and 2 in our programs when we do a fork/exec and the child returns?
If we have both, SIGCHLD arrives first, so the signal handler runs first and its waitpid succeeds, while the waitpid in the parent's main code does not, as follows:
void my_signal_handler_for_sigchld(int signo)
{
    /* do something */
    pid_t tmp = waitpid(-1, NULL, 0);
    /* print tmp (which is the correct value of the child pid) */
}
int main(void)
{
    struct sigaction sa = { .sa_handler = my_signal_handler_for_sigchld };
    sigaction(SIGCHLD, &sa, NULL);
    pid_t child_pid = fork();
    if (child_pid == 0) {
        /* child: do something, return */
        return 0;
    }
    /* parent: print the value returned from this waitpid - it is -1 */
    pid_t ret = waitpid(child_pid, NULL, 0);
}
I would appreciate it if someone could help me understand this.
You really don't need to handle SIGCHLD if your intent is to run a child process, do some stuff, then wait for it to finish. In that case, you just call waitpid when you're ready to synchronize. The only thing SIGCHLD is useful for is asynchronous notification of child termination, for example if you've got an interactive (or long-running daemon) application that's spawning various children and needs to know when they finish. However, SIGCHLD is really bad/ugly for this purpose too, since if you're using library code that creates child processes, you might catch the events for the library's children terminating and interfere with its handling of them. Signal handlers are inherently process-global and deal with global state, which is usually A Bad Thing(tm).
Here are two better approaches for when you have child processes that will be terminating asynchronously:
Approach 1 (select/poll event-based): Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
Approach 2 (thread based): For each child process you create, also create a thread that will immediately call waitpid on the child process's pid. When waitpid returns successfully, use your favorite thread synchronization primitives to let the rest of the program know that the child terminated, or simply take care of everything you need to do in this waiter thread before it terminates.
Both of these approaches are modular and library-friendly (they avoid interfering with any other parts of your code or library code which might be making use of child processes).
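A minimal sketch of Approach 1 for a single child, using poll() on a dummy pipe. The name wait_via_pipe is hypothetical; a real event loop would poll this descriptor alongside its other descriptors rather than block on it alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with exit_code, detect its death through the
 * pipe, then reap it. Returns the child's exit code, or -1 on error. */
int wait_via_pipe(int exit_code)
{
    int fds[2], status;

    if (pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);
        /* The write end stays open for the child's lifetime and is closed
         * by the kernel when the child terminates, however that happens. */
        _exit(exit_code);
    }

    /* The parent must drop its copy of the write end, otherwise the read
     * end never sees end-of-file. */
    close(fds[1]);

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    while (poll(&pfd, 1, -1) < 0) {
        if (errno != EINTR) {
            close(fds[0]);
            return -1;
        }
    }
    close(fds[0]);

    /* Activity on the pipe means the child is gone (or about to be). */
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

One thing to watch for: if the child passes the write end on to its own children, the pipe only closes once all of them have exited as well.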
You need to call one of the waiting syscalls, such as waitpid or its friends (e.g. wait4), otherwise you could end up with zombie processes.
You could handle SIGCHLD to be notified that some child ended (or stopped, etc...) but you'll need to wait for it later.
Signal handlers are restricted to calling a small set of async-signal-safe functions (see signal(7) for more). Good advice is to just set a volatile sig_atomic_t flag inside the handler, and test it later, in a safer place.
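A sketch of that flag pattern, with hypothetical names got_sigchld and reap_outside_handler: the handler does nothing but set the flag, and all waitpid() calls happen in ordinary code. SIGCHLD is kept blocked except inside sigsuspend(), so the flag cannot change between testing it and going to sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld;

/* The handler only records that something happened. */
static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;
}

/* Hypothetical demo: fork n short-lived children and reap them from the
 * main flow of control. Returns the number reaped, or -1 on error. */
int reap_outside_handler(int n)
{
    struct sigaction sa = {0}, old_sa;
    sigset_t chld, old_mask, wait_mask;
    int started = 0, reaped = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    got_sigchld = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);
        if (pid > 0)
            started++;
    }

    while (reaped < started) {
        while (!got_sigchld)
            sigsuspend(&wait_mask);     /* SIGCHLD is only delivered here */
        got_sigchld = 0;
        /* Safe place: ordinary code, so any function may be called. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            reaped++;
    }

    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return reaped;
}
```

In a program with a real event loop, the flag test would sit at the top of the loop instead of in a dedicated sigsuspend() loop.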