Good morning,
I'm trying to learn how to us Peterson's solution for critical section protection. Each process is trying to increment total to 100,000 and I have to make sure each child calls process#(). I also need to use the "wait" function so the parent knows when the child finishes. Once a child finishes I need to print the process ID and the amount of times process 1 interrupts process 2, and vise versa. I really have no idea what I'm doing even though I've been reading around a lot. What is this "Waiting" function I'm supposed to use? How do I use it? Why is my code incrementing to 200,000 instead of 100,000?
Code removed, unnecessary for question.
Apparently somewhere in the main function, I need to loop for the parent to wait for the child and then print the process ID of the children, but I have no idea how to do this.
The wait() command you are referring to (and waitpid()) is a command you use in the parent process to "wait" for a child to terminate (blocking, meaning the parent won't continue execution until the child changes state). If your child terminates and you do not wait() in the parent, the child process will become a "zombie". wait()ing effectively "reaps" the child process.
This is the signature for waitpid():
pid_t waitpid(pid_t pid, int *status, int options);
status is a variable you can use to return some info from the child to the parent (e.g., the number of times this process was interrupted (assuming you keep track of this in the child?)); it is the exit code of the child. Let's assume you have a child with PID of 1234, you would call waitpid(1234, &status 0) (if you wanted to wait non-blocking, you'd have to use WNOHANG for options (it immediately returns if no child has exited). Check out http://linux.die.net/man/2/waitpid because there's cool values you can use for waitpid() such as -1 to wait for any child to exit (this is the same as regular wait()). Please post comments if you have any more questions, but hopefully this is enough to point you in the right direction :).
Since you want to "loop" in the parent to wait for your children, you can either skip the loop altogether and use the normal, "blocking" wait, or you can use an infinite loop with a non-blocking wait (or a series of them, one for each child if you don't use regular wait()).
Just in case you didn't know, you can determine whether you are in the parent or child based on the return value of fork(), but I think you already knew this. So in the body of an if (checking for parent) is where you would do the wait().
Also, is process2() supposed to have while (k < 200000)? Could this by why you say it's incrementing to 200,000?
Related
I am reading https://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551, and there the author handle the sigchld in handler that calls waitpid rather then wait.
In Figure 5.7, we cannot call wait in a loop, because there is no way to prevent wait from blocking if there are
running children that have not yet terminated.
The handler is as follows:
void sig_chld(int signo)
{
pid_t pid;
int stat;
// while ((pid = wait(&stat)) > 0)
while ((pid = waitpid(-1, &stat, WNOHANG)) > 0)
{
printf("child %d terminated\n", pid);
}
}
The question is, even if I use the blocking version of wait (as is commented out), the child are terminated anyway (which is what I want in order to not have zombies), so why to even bother whether it is in blocking way or non-blocking?
I assume when it is non-blocking way (i.e. with waitpid), then I can call the handler multiple times? (when some childs are terminated, and other are still running). But still I can just block and wait in that handler for all child to terminate. So no difference between calling the handler multiple times or just once. Or is there any other reason for non-blocking and calling the handler multiple times?
The while loop condition will run one more time than there are zombie child processes that need to be waited for. So if you use wait() instead of waitpid() with the WNOHANG flag, it'll potentially block forever if you have another still running child - as wait() only returns early with an ECHLD error if there are no child processes at all. A robust generic handler will use waitpid() to avoid that.
Picture a case where the parent process starts multiple children to do various things, and periodically sends them instructions about what to do. When the first one exits, using wait() in a loop in the SIGCHLD handler will cause it to block forever while the other child processes are hanging around waiting for more instructions that they'll never receive.
Or, say, an inetd server that listens for network connections and forks off a new process to handle each one. Some services finish quickly, some can run for hours or days. If it uses a signal handler to catch exiting children, it won't be able to do anything else until the long-lived one exits if you use wait() in a loop once that handler is triggered by a short-lived service process exiting.
It took me a second to understand what the problem was, so let me spell it out. As everyone else has pointed out, wait(2) may block. waitpid(2) will not block if the WNOHANG option is specified.
You are correct to say that the first call to wait(2) should not block since the the signal handler will only be called if a child exited. However, when the signal handler is called there may be several children that may have exited. Signals are delivered asynchronously, and can be consolidated. So, if two children exit at close to the same time, it's possible that the operating system only sends one signal to the parent process. Thus, a loop must be used to iteratively check if more than one child has exited.
Now that we've established that the loop is necessary for checking if multiple children have exited, it is clear that we can't use wait(2) since it will block on the second iteration if there is a child that has not exited.
TL;DR The loop is necessary, and hence using waitpid(2) is necessary.
I am just going to post pseudo code,
but my question is I have a loop like such
for(i<n){
createfork();
if(child)
/*
Exit so I can control exact amount of forks
without children creating more children
*/
exit
}
void createfork(){
fork
//execute other methods
}
Does my fork create a process do what it is suppose to do and exit then create another process and repeat? And if so what are some ways around this, to get the processes running concurrently?
Your pseudocode is correct as written and does not need to be modified.
The processes are already executing in parallel, all six of them or however many you spawn. As written, the parent process does not wait for the children to finish before spawning more children. It calls fork(), checks if (child) (which is skipped), then immediately proceeds to the next for loop iteration and forks again.
Notably, there's no wait() call. If the parent were to call wait() or waitpid() to wait for each child to finish then that would introduce the serialism you're trying to avoid. But there is no such call, so you're good.
When a process successfully performs a POSIX fork(), that process and the new child process are initially both eligible to run. In that sense, they will run concurrently until one or the other blocks. Whether there will be any periods of time when both are executing machine instructions (on different processing units) depends at least on details of hardware capabilities, OS scheduling, the work each process is performing, and what other processes there are in the system and what they are doing.
The parent certainly does not, in general, automatically wait for the child to terminate before it proceeds with its own work (there is a family of functions to make it wait when you want that), nor does the child process automatically wait for any kind of signal from the parent. If the next thing the parent does is fork another child, then that will under many circumstances result in the parent running concurrently with both (all) children, in the sense described above.
I cannot speak to specifics of the behavior of your pseudocode, because it's pseudocode.
I have just had a lecture that sums reaping as:
Reaping
Performed by parent on terminated child (using wait or waitpid)
Parent is given exit status informaton
Kernel then deletes zombie child process
So I understand that reaping is done by calling wait or waitpid from the parent process after which the kernel deletes the zombie process. If this actually is the case, that reaping is done only when calling wait or waitpid, why do the child processes actually go away after returning in theor entry function - I mean that indeed does seem as if the child processes have been reaped and thus no resources are wasted even though the parent process may not be waiting.
So is "reaping" only possible when calling wait or waitpid? Is processes are "reaped" as long as they return and exit from their entry function (which I assume all processes do) - what is the point of talking about "reaping" as if it was something special?
The child process does not fully "go away" when it exits. It ceases to exist as a running process, and most/all of its resources (memory, open files, etc.) are released, but it still remains in the process table. It remains in the process table because that's where its exit status is stored, so that the parent can retrieve it by calling one of the wait variants. If the parent fails to call wait, the process table entry sticks around — and that's what makes it a "zombie".
I said that most/all of its resources are released, but the one resource that's definitely still consumed is that process table slot.
As long as the (dead) child's parent exists, the kernel doesn't know that the parent isn't going to call wait eventually, so the process table slot has to stay there, so that the eventual call to wait (if there is one) can return the proper exit status.
If the parent eventually exits (without ever calling wait), the child will be inherited by the grandparent, which is usually a "master" process like the shell, or init, that does routinely call wait and that will finally "reap" the poor young zombie.
So, yes, it really is true that the only way for the parent to properly "reap" the child is, just as was said in your lecture, to call one of the wait functions. (Or to exit, but that's not an option if the parent is long-running.)
Footnote: I said "the child will be inherited by the grandparent", but I think I was wrong, there. Under Unix and Linux, orphaned processes are generally always inherited by pid 1, aka init.
The purpose of the wait*() call is to allow the child process to report a status back to the parent process. When the child process exits, the operating system holds that status data in a little data structure until the parent reads it. Reaping in that sense is cleaning out that little data structure.
If the parent does not care about waiting for status from the child, the code could be written in a way to allow the parent to ignore the status, and so the reaping occurs semi-automatically. One way is to ignore the SIGCHLD signal.
Another way is to perform a double-fork to create a grandchild process instead. When doing this, the "parent" does a blocking wait() after a call to fork(). Then, the child performs another fork() to create the grandchild and then immediately exits, causing the parent to unblock. The grandchild now does the real work, and is automatically reaped by the init process.
I have just had a lecture that sums reaping as:
Reaping
Performed by parent on terminated child (using wait or waitpid)
Parent is given exit status informaton
Kernel then deletes zombie child process
So I understand that reaping is done by calling wait or waitpid from the parent process after which the kernel deletes the zombie process. If this actually is the case, that reaping is done only when calling wait or waitpid, why do the child processes actually go away after returning in theor entry function - I mean that indeed does seem as if the child processes have been reaped and thus no resources are wasted even though the parent process may not be waiting.
So is "reaping" only possible when calling wait or waitpid? Is processes are "reaped" as long as they return and exit from their entry function (which I assume all processes do) - what is the point of talking about "reaping" as if it was something special?
The child process does not fully "go away" when it exits. It ceases to exist as a running process, and most/all of its resources (memory, open files, etc.) are released, but it still remains in the process table. It remains in the process table because that's where its exit status is stored, so that the parent can retrieve it by calling one of the wait variants. If the parent fails to call wait, the process table entry sticks around — and that's what makes it a "zombie".
I said that most/all of its resources are released, but the one resource that's definitely still consumed is that process table slot.
As long as the (dead) child's parent exists, the kernel doesn't know that the parent isn't going to call wait eventually, so the process table slot has to stay there, so that the eventual call to wait (if there is one) can return the proper exit status.
If the parent eventually exits (without ever calling wait), the child will be inherited by the grandparent, which is usually a "master" process like the shell, or init, that does routinely call wait and that will finally "reap" the poor young zombie.
So, yes, it really is true that the only way for the parent to properly "reap" the child is, just as was said in your lecture, to call one of the wait functions. (Or to exit, but that's not an option if the parent is long-running.)
Footnote: I said "the child will be inherited by the grandparent", but I think I was wrong, there. Under Unix and Linux, orphaned processes are generally always inherited by pid 1, aka init.
The purpose of the wait*() call is to allow the child process to report a status back to the parent process. When the child process exits, the operating system holds that status data in a little data structure until the parent reads it. Reaping in that sense is cleaning out that little data structure.
If the parent does not care about waiting for status from the child, the code could be written in a way to allow the parent to ignore the status, and so the reaping occurs semi-automatically. One way is to ignore the SIGCHLD signal.
Another way is to perform a double-fork to create a grandchild process instead. When doing this, the "parent" does a blocking wait() after a call to fork(). Then, the child performs another fork() to create the grandchild and then immediately exits, causing the parent to unblock. The grandchild now does the real work, and is automatically reaped by the init process.
I have understood that:
1) waitpid is used to wait for a child's death and then collect the SIGCHLD and the exit status of the child etc.
2) When we have a signal handler for SIGCHLD, we do some more things related to cleanup of child or other stuff (upto the programmer) and then do a waitpid so that the child will not go zombie and then return.
Now, do we need to have both 1 and 2 in our programs when we do a fork/exec and the child returns ?
If we have both, the SIGCHLD is obtained first, so the signal handler is called first and thus its waitpid is called successfully and not the waitpid in the parent process code as follows:
my_signal_handler_for_sigchld
{
do something
tmp = waitpid(-1,NULL,0);
print tmp (which is the correct value of the child pid)
}
int main ()
{
sigaction(SIGCHLD, my_signal_handler_for_sigchld)
fork()
if (child) //do something, return
if parent // waitpid(child_pid, NULL,0); print value returned from this waitpid - it is -1
}
Appreciate if someone helps me understand this.
You really don't need to handle SIGCHLD if your intent is to run a child process, do some stuff, then wait for it to finish. In that case, you just call waitpid when you're ready to synchronize. The only thing SIGCHLD is useful for is asynchronous notification of child termination, for example if you've got an interactive (or long-running daemon) application that's spawning various children and needs to know when they finish. However, SIGCHLD is really bad/ugly for this purpose too, since if you're using library code that creates child processes, you might catch the events for the library's children terminating and interfere with its handling of them. Signal handlers are inherently process-global and deal with global state, which is usually A Bad Thing(tm).
Here are two better approaches for when you have child processes that will be terminating asynchronously:
Approach 1 (select/poll event-based): Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
Approach 2 (thread based): For each child process you create, also create a thread that will immediately call waitpid on the child process's pid. When waitpid returns successfully, use your favorite thread synchronization primitives to let the rest of the program know that the child terminated, or simply take care of everything you need to do in this waiter thread before it terminates.
Both of these approaches are modular and library-friendly (they avoid interfering with any other parts of your code or library code which might be making use of child processes).
You need to call the waiting syscalls like waitpid or friends -eg wait4 etc- othewise you could have zombie processes.
You could handle SIGCHLD to be notified that some child ended (or stopped, etc...) but you'll need to wait for it later.
Signal handlers are restricted to call a small set of async-signal-safe-functions (see signal(7) for more). Good advice is to just set a volatile sig_atomic_t flag inside, and test it at later and safer places.