How atomic is the fork() syscall, actually?

Assuming check_if_pid_exists(pid) returns true when a process with such a pid exists (but possibly hasn't been running yet) or false when there is no process with such pid, is there any chance in parent code for a race condition when the fork() returned the child pid, however the kernel hasn't had a chance to initialize the data structures so that check_if_pid_exists(child) returns false? Or perhaps after returning from fork() we have a guarantee that check_if_pid_exists(pid) returns true?
pid_t child = fork();
if (child == 0) {
/* here the child just busy waits */
for (;;)
;
}
if (child > 0) {
/* here the parent checks whether child PID already exists */
check_if_pid_exists(child);
}

No.
The fork() returns after the new task is created and visible as expected by the parent. Most everything wouldn't work otherwise.
Whether the process has had a chance to run at all or has started quite a bit, is not known at that point. Whether the child has completed is known as you receive the SIGCHLD signal once that happens.
Where you can have a race is with the SIGCHLD if not handled properly. That is, you are expected to ignore the SIGCHLD signal, call fork(), save the results appropriately so you have the PID of the child (i.e. allocate a struct to save the child pid_t value), then use one of the wait() functions to know whether the child died.
Assuming your fork()ed process is expected to run for a while, then the
check_if_pid_exists(child) == true
will likely be true 99.9999% of the time (actually, assuming the parent is in control of when the child is expected to exit, make it 100% of the time).
As mentioned by others, many things can prevent the new process from running:
Not enough memory
You already started too many children
The child tries to do something and encounters a fatal error and exits
Some third party thing prevents the fork()
...
Also, fork() may return -1 in case it fails to create the child.
However, if the question was about: how do I track the lifetime of a child? Then the correct answer is for the parent to check whether it died. You should not rely on a function such as check_if_pid_exists() searching for the process under /proc/... or similar implementation (see waitid), because such a function may determine that the process is still running, then the process dies, and yet the function still returns true...
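// Keep the default SIGCHLD disposition (the signal itself is ignored by default).
// Note: explicitly setting SIG_IGN would make the kernel reap children
// automatically, and wait() could then no longer report the child's exit.
struct sigaction sa;
sa.sa_handler = SIG_DFL;
sa.sa_flags = 0;
sigemptyset(&sa.sa_mask);
sigaction(SIGCHLD, &sa, NULL);
[...]
int child_is_running = 0;
[...]
pid_t child = fork();
if(child == 0) ...do stuff in the child...
if(child < 0) ...handle error...
child_is_running = 1;
[...]
wait(...);
if(...child exited...) child_is_running = 0;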
Now you can write a safe check_if_pid_exists():
int check_if_pid_exists()
{
return child_is_running;
}
In my example here, I only allow one child. If you need multiple, that's where you need a better scheme (probably a struct to save the child info and a table of some sort or linked list of all your children).

Related

How can waitpid() reap more than one child?

In this example from the CSAPP book chap.8:
#include "csapp.h"
/* WARNING: This code is buggy! */
void handler1(int sig)
{
int olderrno = errno;
if ((waitpid(-1, NULL, 0)) < 0)
sio_error("waitpid error");
Sio_puts("Handler reaped child\n");
Sleep(1);
errno = olderrno;
}
int main()
{
int i, n;
char buf[MAXBUF];
if (signal(SIGCHLD, handler1) == SIG_ERR)
unix_error("signal error");
/* Parent creates children */
for (i = 0; i < 3; i++) {
if (Fork() == 0) {
printf("Hello from child %d\n", (int)getpid());
exit(0);
}
}
/* Parent waits for terminal input and then processes it */
if ((n = read(STDIN_FILENO, buf, sizeof(buf))) < 0)
unix_error("read");
printf("Parent processing input\n");
while (1)
;
exit(0);
}
It generates the following output:
......
Hello from child 14073
Hello from child 14074
Hello from child 14075
Handler reaped child
Handler reaped child //more than one child reaped
......
The single, unlooped waitpid() call in the handler is the deliberate bug: it cannot be relied on to reap all children. I understand that waitpid() should be put in a while() loop to ensure all children are reaped. What I don't understand is why, with only one waitpid() call, the handler was still able to reap more than one child (note in the output that more than one child is reaped by the handler). According to this answer: Why does waitpid in a signal handler need to loop?
waitpid() is only able to reap one child.
Thanks!
update:
this is irrelevant, but the handler is corrected in the following way(also taken from the CSAPP book):
void handler2(int sig)
{
int olderrno = errno;
while (waitpid(-1, NULL, 0) > 0) {
Sio_puts("Handler reaped child\n");
}
if (errno != ECHILD)
Sio_error("waitpid error");
Sleep(1);
errno = olderrno;
}
Running this code on my linux computer.
The signal handler you designated runs every time the signal you assigned to it (SIGCHLD in this case) is delivered. While it is true that waitpid is only executed once per delivered signal, the handler still executes it multiple times because it gets called every time a child terminates.
Child n terminates (SIGCHLD), the handler springs into action and uses waitpid to "reap" the just exited child.
Child n+1 terminates and its behaviour follows the same as Child n. This goes on for every child there is.
There is no need to loop it as it gets called only when needed in the first place.
Edit: As pointed out below, the reason as to why the book later corrects it with the intended loop is because if multiple children send their termination signal at the same time, the handler may only end up getting one of them.
signal(7):
Standard signals do not queue. If multiple instances of a
standard signal are generated while that signal is blocked, then
only one instance of the signal is marked as pending (and the
signal will be delivered just once when it is unblocked).
Looping waitpid assures the reaping of all exited children and not just one of them as is the case right now.
Why is looping solving the issue of multiple signals?
Picture this: you are inside the handler, handling a SIGCHLD signal, and meanwhile more children terminate. Their signals cannot queue up; at most one further SIGCHLD is left pending. By calling waitpid in a loop, you reap every child that has already terminated, no matter how many signals were actually delivered, rather than reaping one child per handler invocation, which works only if no signals were merged.
waitpid still exits correctly once there are no more children to reap. It is important to understand that the loop is only there to catch signals that are sent when you are already in the signal handler and not during normal code execution as in that case the signal handler will take care of it as normal.
If you are still in doubt, try reading these two answers to your question.
How to make sure that `waitpid(-1, &stat, WNOHANG)` collect all children processes
Why does waitpid in a signal handler need to loop? (first two paragraphs)
The first one uses flags such as WNOHANG, but this only makes waitpid return immediately instead of waiting, if there is no child process ready to be reaped.

How do you kill zombie process using wait()

I have this code that requires a parent to fork 3 children.
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
What is the command to view zombie processes if you have Linux
virtual box?
main(){
pid_t child;
printf("-----------------------------------\n");
about("Parent");
printf("Now .. Forking !!\n");
child = fork();
int i=0;
for (i=0; i<3; i++){
if (child < 0) {
perror ("Unable to fork");
break;
}
else if (child == 0){
printf ("creating child #%d\n", (i+1));
about ("Child");
break;
}
else{
child = fork();
}
}
}
void about(char * msg){
pid_t me;
pid_t oldone;
me = getpid();
oldone = getppid();
printf("***[%s] PID = %d PPID = %d.\n", msg, me, oldone);
}
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
If your parent spawns only a small, fixed number of children; does not care when or whether they stop, resume, or finish; and itself exits quickly, then you do not need to use wait() or waitpid() to clean up the child processes. The init process (pid 1) takes responsibility for orphaned child processes, and will clean them up when they finish.
Under any other circumstances, however, you must wait() for child processes. Doing so frees up resources, ensures that the child has finished, and allows you to obtain the child's exit status. Via waitpid() you can also be notified when a child is stopped or resumed by a signal, if you so wish.
As for where to perform the wait,
You must ensure that only the parent wait()s.
You should wait at or before the earliest point where you need the child to have finished (but not before forking), OR
if you don't care when or whether the child finishes, but you need to clean up resources, then you can periodically call waitpid(-1, NULL, WNOHANG) to collect a zombie child if there is one, without blocking if there isn't any.
In particular, you must not wait() (unconditionally) immediately after fork()ing because parent and child run the same code. You must use the return value of fork() to determine whether you are in the child (return value == 0), or in the parent (any other return value). Furthermore, the parent must wait() only if forking was successful, in which case fork() returns the child's pid, which is always greater than zero. A return value less than zero indicates failure to fork.
Your program doesn't really need to wait() because it spawns exactly four (not three) children, then exits. However, if you wanted the parent to have at most one live child at any time, then you could write it like this:
int main() {
pid_t child;
int i;
printf("-----------------------------------\n");
about("Parent");
for (i = 0; i < 3; i++) {
printf("Now .. Forking !!\n");
child = fork();
if (child < 0) {
perror ("Unable to fork");
break;
} else if (child == 0) {
printf ("In child #%d\n", (i+1));
about ("Child");
break;
} else {
/* in parent */
if (waitpid(child, NULL, 0) < 0) {
perror("Failed to collect child process");
break;
}
}
}
return 0;
}
If the parent exits before one or more of its children, which can happen if it does not wait, then the child will thereafter see its parent process being pid 1.
Others have already answered how to get a zombie process list via the ps command. You may also be able to see zombies via top. With your original code you are unlikely to catch a glimpse of zombies, however, because the parent process exits very quickly, and init will then clean up the zombies it leaves behind.
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
You can use wait() anywhere in the parent process, and when the child process terminates it'll be removed from the system. Where to put it is up to you; in your specific case you probably want to put it immediately after the child = fork(); line so that the parent process won't resume its execution until its child has exited.
What is the command to view zombie processes if you have Linux virtual box?
You can use the ps aux command to view all processes in the system (including zombie processes), and the STAT column will be equal to Z if the process is a zombie. An example output would be:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
daniel 1000 0.0 0.0 0 0 ?? Z 17:15 0:00 command
How do you know (and) where to put the "wait()" statement to kill
zombie processes?
You can register a signal handler for SIGCHLD that sets a global volatile sig_atomic_t flag = 0 variable to 1. Then, at some convenient place in your program, test whether flag is set to 1, and, if so, set it back to 0 and afterwards (for otherwise you might miss a signal) call waitpid(-1, NULL, WNOHANG) in a loop until it tells you that no more processes are to be waited for. Note that the signal will interrupt system calls with EINTR, which is a good condition to check for the value of flag. If you use an indefinitely blocking system call like select(), you might want to specify a timeout after which you check for flag, since otherwise you might miss a signal that was raised after your last waitpid() call but before entering the indefinitely blocking system call. An alternative to this kludge is to use pselect().
Use:
ps -e -opid,ppid,pgid,stat,etime,cmd | grep defunct
to see your zombies, along with the ppid and pgid columns for the parent ID and process group ID. The etime column shows the elapsed (wall-clock) time since the zombie's process was started. The parent ID is useful for sending signals to the parent process.
If the parent process is correctly coded to catch and handle the SIGCHLD signal and does what is expected (i.e., waits for and reaps its zombies), then you can run:
kill -CHLD <parent_pid>
to tell the parent to reap all its zombies.

At what point does a fork() child process actually begin?

Does the process begin when fork() is declared? Is anything being killed here?
pid_t child;
child = fork();
kill (child, SIGKILL);
Or do you need to declare actions for the fork process in order for it to actually "begin"?
pid_t child;
child = fork();
if (child == 0) {
// do something
}
kill (child, SIGKILL);
I ask because what I am trying to do is create two children, wait for the first to complete, and then kill the second before exiting:
pid_t child1;
pid_t child2;
child1 = fork();
child2 = fork();
int status;
if (child1 == 0) { //is this line necessary?
}
waitpid(child1, &status, 0);
kill(child2, SIGKILL);
The C function fork is defined in the standard C library (glibc on linux). When you call it, it performs an equivalent system call (on linux its name is clone) by the means of a special CPU instruction (on x86 sysenter). This causes the CPU to switch to a privileged mode and start executing instructions of the kernel. The kernel then creates a new process (a record in a list and accompanying structures), which inherits a copy of memory mappings of the original process (text, heap, stack, and others), file descriptors and more.
The memory areas are marked as non-writable, so that when the new or the original process tries to overwrite them, the kernel gets to handle a CPU exception and perform a copy-on-write (therefore delaying the need to copy a memory page until absolutely necessary). That's because the mappings initially point to the same pages (pieces of physical memory) in both processes.
The kernel then gives execution to the scheduler, which decides which process to run next. It could be the original process, the child process, or any other process running in the system.
Note: Older Linux kernels put the child process in front of the parent process in the run queue, so that it ran earlier than the parent. This was deemed to give better performance when the child calls exec right after forking. Newer kernels let the parent continue first by default, so do not rely on either order.
When execution is given to the original process, the CPU is switched back to nonprivileged mode and starts executing the next instruction. In this case it continues with the fork function of the standard library, which returns the PID of the child process (as returned by the clone system call).
Similarly, the child process continues execution in the fork function, but here it returns 0 to the calling function.
After that, the program continues in both cases normally. The child process has the original process as the parent (this is noted in a structure in the kernel). When it exits, the parent process is supposed to do the cleanup (receiving the exit status of the child) by calling wait.
Note: The clone system call is rather complicated, because it unifies fork with the creation of threads, as well as Linux namespaces. Other operating systems implement fork differently, e.g. FreeBSD has a fork system call of its own.
Disclaimer: I am not a kernel developer. If you know better, please correct the answer.
See Also
clone (2)
The Design and Implementation of the FreeBSD Operating System (Google Books)
Understanding the Linux Kernel (Google Books)
Is it true that fork() calls clone() internally?
"Declare" is the wrong word to use in this context; C uses that word to talk about constructs that merely assert the existence of something, e.g.
extern pid_t fork(void);
is a declaration of the function fork. Writing that in your code (or having it written for you as a consequence of #include <unistd.h>) does not cause fork to be called.
Now, the statement in your sample code, child = fork(); when written inside a function body, does (generate code to) make a call to the function fork. That function, assuming it is in fact the system primitive fork(2) on your operating system, and assuming it succeeds, has the special behavior of returning twice, once in the original process and once in a new process, with different return values in each so you can tell which is which.
So the answer to your question is that in both of the code fragments you showed, assuming the things I mentioned in the previous paragraph, all of the code after the child = fork(); line is at least potentially executed twice, once by the child and once by the parent. The if (child == 0) { ... } construct (again, this is not a "declaration") is the standard idiom for making parent and child do different things.
EDIT: In your third code sample, yes, the child1 == 0 block is necessary, but not to ensure that the child is created. Rather, it is there to ensure that whatever you want child1 to do is done only in child1. Moreover, as written (and, again, assuming all calls succeed) you are creating three child processes, because the second fork call will be executed by both parent and child! You probably want something like this instead:
pid_t child1, child2;
int status;
child1 = fork();
if (child1 == -1) {
perror("fork");
exit(1);
}
else if (child1 == 0) {
execlp("program_to_run_in_child_1", "program_to_run_in_child_1", (char *)0);
/* if we get here, exec failed */
_exit(127);
}
child2 = fork();
if (child2 == -1) {
perror("fork");
kill(child1, SIGTERM);
exit(1);
}
else if (child2 == 0) {
execlp("program_to_run_in_child_2", "program_to_run_in_child_2", (char *)0);
/* if we get here, exec failed */
_exit(127);
}
/* control reaches this point only in the parent and only when
both fork calls succeeded */
if (waitpid(child1, &status, 0) != child1) {
perror("waitpid");
kill(child1, SIGTERM);
}
/* only use SIGKILL as a last resort */
kill(child2, SIGTERM);
FYI, this is only a skeleton. If I were writing code to do this for real (which I have: see for instance https://github.com/zackw/tbbscraper/blob/master/scripts/isolate.c ) there would be a whole bunch more code just to comprehensively detect and report errors, plus the additional logic required to deal with file descriptor management in the children and a few other wrinkles.
fork() spawns a new process identical to the old one and returns in both processes.
This happens automatically so you don't have to take any actions.
But nevertheless, it is cleaner to check if the call indeed succeeded:
A value below 0 indicates failure. In this case, it is not good to call kill().
A value == 0 indicates that we are the child process. In this case, it is not very clean to call kill().
A value > 0 indicates that we are the parent process. In this case, the return value is our child. Here it is safe to call kill().
In your case, you even end up with 4 processes:
Your parent calls fork(), being left with 2 processes.
Both of them call fork() again, resulting in a new child process for each of them.
You should move the 2nd fork() call into the branch where the parent code runs.
The child process begins some time after fork() has been called (there is some setup which happens in the context of the child).
You can be sure that the child exists when fork() returns in the parent, although it may not have been scheduled yet.
So the code
pid_t child = fork();
kill (child, SIGKILL);
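will kill the child. If the child gets to run before the signal arrives, it executes kill(0, SIGKILL) itself, which sends SIGKILL to every process in its process group, typically including the parent. Worse, if fork() fails and returns -1, the parent calls kill(-1, SIGKILL), which signals every process it has permission to signal.
There is no way to tell whether the child might ever live long enough to execute its kill. Most likely, it won't since the Linux kernel will set up the process structure for the child and let the parent continue. The child will just be waiting in the ready list of the processes. The kill will then remove it again.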
EDIT If fork() returns a value <= 0, then you shouldn't wait or kill.

Waiting for processes in C

I've been reading the documentation on wait() and waitpid() and I'm still somewhat confused about how they work (I have gathered that wait(&status) is equivalent to waitpid(-1, &status, 0);). Below are some small snippets of code I'm working on. Please help me understand whether these snippets are written properly and if not then why not.
Goal 1: Reap all zombie children.
int reapedPid;
do {
reapedPid = waitpid(-1,NULL,WNOHANG);
} while (reapedPid > 0);
What I'm trying to do here is iterate through all the children, reap the child if it's finished, let it keep going if it's not, and when I run out of children then reapedPid == -1 and the loop exits. The reason I'm confused here is that I don't see how waitpid() is supposed to know which children have already been checked and which have not. Does it do any such check? Or will this approach not work?
Goal 2: Wait for all children to finish.
int pid;
do {
pid = wait(NULL);
} while (pid != -1);
Here I don't care what the resulting status is of the children - this should just keep waiting for every child process to finish, whether successfully or unsuccessfully, and then exit. I think this code is correct but I'm not sure.
Goal 3: Fork a child and wait for it to finish.
int pid = fork();
if (pid < 0) {
// handle error.
}
else if (pid == 0) {
// execute child command
}
else {
int status;
int waitedForPid = waitpid(pid,&status,0);
assert(waitedForPid == pid);
}
Here I'm just trying to fork the process and have the parent wait for the child to finish. I am not entirely sure if I should be passing in the 0 option here but it seemed like WNOHANG, WUNTRACED, and WCONTINUED were not really relevant to my goal.
It is the kernel's job to keep track of processes. Keeping track of dead processes is trivial. The kernel can tell which child processes have died but not yet been waited for, and will return one of those dead children on each call, until there are none left to report on. (Because of the WNOHANG option, there might still be children left to wait for, but none of the remaining children are dead, yet.)
This second loop is also fine and almost equivalent to the first. The difference is that it will hang waiting for all the children to die before returning the -1.
This third fragment is fine; the assertion will be true except in extraordinary circumstances (such as another thread in the program also waited for the child and collected the corpse). However, if you somewhere launched another process and let it run in the background, you might be accumulating zombies, whereas with a modification of the other loops, you can collect the zombies and still wait for the correct child:
int pid = fork();
if (pid < 0)
{
// handle error.
}
else if (pid == 0)
{
// execute child command
}
else
{
int status;
int corpse;
while ((corpse = waitpid(-1, &status, 0)) > 0)
if (corpse == pid)
break;
}
For most of these, you should be able to easily code up some example programs and verify your understanding.
Goal 1: Reap all zombie children.
The reason I'm confused here is that I don't see how waitpid() is supposed to know which children have already been checked and which have not. Does it do any such check?
Once a child has exited, it can only be waited on once. So your loop will only get the exit status for child processes that have not yet been waited on (zombies).
For Goals 2 and 3, again, I would consider it a required exercise to code up an example to see how it works. For #2, I would instead suggest that your code should always keep track of all forked children, so that it can know exactly who to wait on. Your code for #3 looks good; no options are required. Remember to use WEXITSTATUS and the related macros to get information from the status.
See also:
Waiting for all child processes before parent resumes execution UNIX

What is the difference between fork()!=0 and !fork() in process creation

Currently, I am doing some exercises on operating system based on UNIX. I have used the fork() system call to create a child process and the code snippet is as follows :
if(!fork())
{
printf("I am parent process.\n");
}
else
printf("I am child process.\n");
And this program first executes the child process and then parent process.
But when I replace if(!fork()) with if(fork()!=0), the parent block executes first and then the child block. My question is: should the result be the same in both cases, or is there some reason behind the difference? Thanks in advance!!
There is no guaranteed order of execution.
However, if(!fork()) and if(fork()!=0) do give opposite results logically: if fork() returns zero, then !fork() is true whilst fork()!=0 is false.
Also, from the man page for fork():
On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately.
So the correct check is
pid_t pid = fork();
if(pid == -1) {
// ERROR in PARENT
} else if(pid == 0) {
// CHILD process
} else {
// PARENT process, and the child has ID pid
}
EDIT: As Wyzard says, you should definitely make sure you make use of pid later as well. (Also, fixed the type to be pid_t instead of int.)
You shouldn't really use either of those, because when the child finishes, it'll remain as a zombie until the parent finishes too. You should either capture the child's pid in a variable and use it to retrieve the child's exit status:
pid_t child_pid = fork();
if (child_pid == -1)
{
// Fork failed, check errno
}
else if (child_pid)
{
// Do parent stuff...
int status;
waitpid(child_pid, &status, 0);
}
else
{
// Child stuff
}
or you should use the "double-fork trick" to dissociate the child from the parent, so that the child won't remain as a zombie waiting for the parent to retrieve its exit status.
Also, you can't rely on the child executing before the parent after a fork. You have two processes, running concurrently, with no guarantee about relative order of execution. They may take turns, or they may run simultaneously on different CPU cores.
The order in which the parent and child get to their respective printf() statements is undefined. It is likely that if you were to repeat your tests a large number of times, the results would be similar for both, in that for either version there would be times that the parent prints first and times the parent prints last.
!fork() and fork() == 0 both behave in the same way.
The condition itself cannot be the reason the execution sequence is any different.
The process is replicated, which means that child is now competing with parent for resources, including CPU. It is the OS scheduler that decides which process will get the CPU.
The sequence in which the child and parent processes are executed is determined by the scheduler. It determines when and for how long each process is executed by the processor. So the sequence of the output may vary for one and the same program code. It is purely coincidental that the change in the source code led to the change of the output sequence.
By the way, your printf's should be just the other way round: if fork() returns 0, it's the child, not the parent process.
See code example at http://en.wikipedia.org/wiki/Fork_%28operating_system%29. The German version of this article (http://de.wikipedia.org/wiki/Fork_%28Unix%29) contains a sample output and a short discussion about operation sequence.

Resources