What happens when two processes wait for the same child? - c

From what I've read the default behavior for wait/waitpid is to wait for a state change in a process. What I can't find is the expected behavior of two processes waitpid using the same pid_t argument.
Do both return and continue execution, or is it a race condition where only one notices the state change?

Only the parent can wait() for a process, and a process can of course have only one parent.
The parent process might, however, have multiple threads. In the case of multiple threads waiting for the same child, POSIX specifies that only one of them will see the state change. To allow multiple threads to see the state change, you must use waitid() with the WNOWAIT flag.
POSIX: status information

Related

How to asynchronously wait for process termination and find out the exit code?

The waiting works fine with pidfd_open and poll.
The problem I’m facing, after the process quits, apparently the poll() API removes the information about the now dead process, so the waitid with P_PIDFD argument fails at once saying code 22 “Invalid argument”
I don’t think I can afford launching a thread for every child process to sleep on the blocking waitpid, I have multiple processes, and another handles which aren’t processes I need to poll efficiently.
Any workarounds?
If it matters, I only need to support Linux 5.13.12 and newer running on ARM64 and ARMv7 CPUs.
The approximate sequence of kernel calls is following:
fork
In the child: setresuid, setresgid, execvpe
In the new child: printf, sleep, _exit
Meanwhile in the parent: pidfd_open, poll, once completed waitid with P_PIDFD first argument.
Expected result: waitid should give me the exit code of the child.
Actual result: it does nothing and sets errno to EINVAL
There is one crucial bit. From man waitid:
Applications shall specify at least one of the flags WEXITED, WSTOPPED, or WCONTINUED to be OR'ed in with the options argument.
I was passing was WNOHANG
And you want to pass WNOHAND | WEXITED ;)
You can use a single reaper thread, looping on waitpid(-1, &status, 0). Whenever it reaps a child process, it looks it up in the set of current child processes, handles possible notifications (semaphore or callback), and stores the exit status.
There is one notable situation that needs special consideration: the child process may exit before fork() returns in the parent process. This means it is possible for the reaper to see a child process exiting before the code that did the fork() manages to register the child process ID in any data structure. Thus, both the reaper and the fork() registering functions must be ready to look up or create the record in the data store keeping track of child processes; including calling the callback or posting the semaphore. It is not complicated at all, but unless you are used to thinking in asynchronous terms, it is easy to miss these corner cases.
Because wait(...)/waitpid(-1,...) returns immediately when there are no child processes to wait for (with -1 and errno set to ECHILD), the reaper thread should probably wait on a condition variable when there are no child processes to wait for, with the code that registers the child process ID signaling on that condition variable to minimize resource use in the no-child-processes case. (Also, do remember to minimize the reaper thread stack size, as it is unreasonably large (order of 8 MiB) by default, and wastes resources. I often use 2*PTHREAD_STACK_MIN, myself.)

Why to use non-blocking waitpid instead of blocking wait?

I am reading https://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551, and there the author handle the sigchld in handler that calls waitpid rather then wait.
In Figure 5.7, we cannot call wait in a loop, because there is no way to prevent wait from blocking if there are
running children that have not yet terminated.
The handler is as follows:
void sig_chld(int signo)
{
pid_t pid;
int stat;
// while ((pid = wait(&stat)) > 0)
while ((pid = waitpid(-1, &stat, WNOHANG)) > 0)
{
printf("child %d terminated\n", pid);
}
}
The question is, even if I use the blocking version of wait (as is commented out), the child are terminated anyway (which is what I want in order to not have zombies), so why to even bother whether it is in blocking way or non-blocking?
I assume when it is non-blocking way (i.e. with waitpid), then I can call the handler multiple times? (when some childs are terminated, and other are still running). But still I can just block and wait in that handler for all child to terminate. So no difference between calling the handler multiple times or just once. Or is there any other reason for non-blocking and calling the handler multiple times?
The while loop condition will run one more time than there are zombie child processes that need to be waited for. So if you use wait() instead of waitpid() with the WNOHANG flag, it'll potentially block forever if you have another still running child - as wait() only returns early with an ECHLD error if there are no child processes at all. A robust generic handler will use waitpid() to avoid that.
Picture a case where the parent process starts multiple children to do various things, and periodically sends them instructions about what to do. When the first one exits, using wait() in a loop in the SIGCHLD handler will cause it to block forever while the other child processes are hanging around waiting for more instructions that they'll never receive.
Or, say, an inetd server that listens for network connections and forks off a new process to handle each one. Some services finish quickly, some can run for hours or days. If it uses a signal handler to catch exiting children, it won't be able to do anything else until the long-lived one exits if you use wait() in a loop once that handler is triggered by a short-lived service process exiting.
It took me a second to understand what the problem was, so let me spell it out. As everyone else has pointed out, wait(2) may block. waitpid(2) will not block if the WNOHANG option is specified.
You are correct to say that the first call to wait(2) should not block since the the signal handler will only be called if a child exited. However, when the signal handler is called there may be several children that may have exited. Signals are delivered asynchronously, and can be consolidated. So, if two children exit at close to the same time, it's possible that the operating system only sends one signal to the parent process. Thus, a loop must be used to iteratively check if more than one child has exited.
Now that we've established that the loop is necessary for checking if multiple children have exited, it is clear that we can't use wait(2) since it will block on the second iteration if there is a child that has not exited.
TL;DR The loop is necessary, and hence using waitpid(2) is necessary.

What if the child exits before the parent calls wait()?

I am learning the wait() method in C. And I know that it blocks the parent process until one of its child processes terminates. But what if the kernel decides to schedule the child first and the child process terminates before parent can call the wait()? Is that the parent will wait there forever(without other interrupts) since it can not observe the return of a child?
In the photo, if the execution sequence is: fork --> HC --> exit -->HP-->wait, then the situation I describe will happen.
No, the parent will not wait forever.
The documentation on wait states:
All of these system calls are used to wait for state changes in a
child of the calling process, and obtain information about the child
whose state has changed. A state change is considered to be: the
child terminated; the child was stopped by a signal; or the child was
resumed by a signal. In the case of a terminated child, performing a
wait allows the system to release the resources associated with the
child; if a wait is not performed, then the terminated child remains
in a "zombie" state .
If a child has already changed state, then these calls return immediately.
But what if the kernel decides to schedule the child first and the
child process terminates before parent can call the wait()?
It is a pretty possible case. If one of the wait family functions is used by the parent or signal(SIGCHLD, SIG_IGN); is called explicitly before forking, it does not turn the child into a zombie even if the parent process is preempted(=not permitted to use CPU at that time).
Moreover, the need of wait or signal-ignorance mentioned is to clean process's unused datas. While using one of the methods, the kernel is told that the child(ren) process is not used anymore. So, you can cleanup unused system resources.

How to restrict child thread or a child process to restrict from forking in C

In C language,I have a child thread(using pthreads),
Is there any way to restrict this child, so that we can't call fork inside this thread?
If we write fork inside, program should not compile.
I can also have a child process instead of child thread, as long as it cannot fork further.
Basically how can I have a child process or child thread, which cannot fork a process any further.
You can always try to play games with pthread_atfork: http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html
Basically, you can use pthread_atfork() to install a "child" callback which always calls exit(). This way, your threads may still fork, but the forked process will exit immediately, so no harm will be done (and only a minimal overhead incurred).
With processes it may be somewhat more complicated. Linux allows you to limit a number of processes per user (so called RLIMIT_NPROC when set with setrlimit()). When this limit is reached, no further forks are possible for a given user id. Thus, you can create a parent process with a CAP_SETUID capability and a dummy user, having the RLIMIT_NPROC set to 1. This way, you can fork from parent, change the child uid to that of the "limited" user you've created in advance and drop the CAP_SETUID capability. At this point, child will have no possible way to fork itself.

question about Fork()

When a parent process creates a child process with fork(), according to me,
the child process is in a Running state whereas the parent process is in a Ready state, i.e. waiting for the child to end.
Am I right?
No, the fork creates a copy of the parent.
Then you generally tests for the return value of fork which says 0 = I am the child, other: I'm the parent and the child has the return value as PID
If the parent has to wait for the child to end, you need to use the wait function.
Edit:
see http://linux.die.net/man/2/fork and http://linux.die.net/man/2/wait for the fork() in C.
Here is something from
After a fork(), it is indeterminate
which process—the parent or the
child—next has access to the CPU.
Applications that implicitly or
explicitly rely on a particular
sequence of execution in order to
achieve correct results are open to
failure due to race conditions.
It goes on to point different behaviors in different kernels. The bottom line is that it's implementation-defined and not to be relied upon.
Also if you do want to rely on it, on Linux since 2.6.32 "there's a sysctl for that"
kernel.sched_child_runs_first
Cheers

Resources