I was trying to write a basic multiprocessing TCP server, which forks a process for every new accept().
I don't need the parent process to wait on the child processes. I have come across two solutions: forking twice and daemonising.
What's the difference between the two?
Which is more suitable in this scenario?
What are the factors that are to be kept in mind for choosing one amongst these?
There is a subtle difference.
Forking twice: The intermediate child process can't linger as a zombie, provided it has exited and the parent has waited for it. The grandchild can't linger as a zombie either: its parent (the intermediate child process) has exited, so the grandchild is an orphan. The orphaned grandchild is inherited by init, and when it exits, it is the system's responsibility to clean it up. In this way, the parent process is relieved of the responsibility of waiting to collect the child's exit status, and it can stay busy doing other work. This also lets the child run for a long time without a short-lived parent having to wait that long.
Daemon: This is for programs wishing to detach themselves from the controlling terminal and run in the background as system daemons. Such a process has no controlling terminal.
The decision of approach depends on the requirement/scenario in hand.
You do need the parent process to (eventually) wait() for each of its child processes, else the children will hang around until the parent exits. This is a form of resource leak.
Forking twice, with the intermediate process exiting immediately after forking, allows the original process to collect the child immediately (via wait()), and makes the grandchild process an orphan, which the system has responsibility for cleaning up. This is one way to avoid accumulating zombie processes. The grandchild remains in the same process group (and thus the same session) as the original process.
Daemonizing serves a somewhat different purpose. It puts the resulting (child) process in a new session (and new process group) with no controlling terminal. The same effect can be achieved by forking once, with the parent immediately calling _exit() and the child calling setsid().
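The fork-once-plus-setsid() variant mentioned above could be sketched in C roughly as follows. The helper name detach_from_session is made up for illustration, and a real daemon would normally do more (change directory, redirect its standard streams, and so on).

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. On success the calling process exits and the call
 * returns 0 in a new child that leads its own session. Returns -1 on error. */
int detach_from_session(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;      /* fork failed; we are still the original process */
    if (pid > 0)
        _exit(0);       /* original process: exit at once, as described above */

    /* Child: a freshly forked process is never a process-group leader,
     * so setsid() is allowed to create a new session for it. */
    if (setsid() < 0)
        return -1;
    return 0;           /* now a session leader with no controlling terminal */
}
```

After a successful call, getsid(0) equals getpid() in the surviving process, which is one way to check that it really is in a session of its own.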
A system service daemonizes to escape the session in which it was launched, so as not to be shut down when that session ends. This has little to do with multiprocessing, but a lot to do with process management. A process double-forks to avoid process management duties for the (grand)child processes; this has both multiprocessing and process management aspects.
Note, too, that double-forking doesn't just pass off process-management responsibility, it also gives up process-management ability. Whether that's a good trade-off is situation-dependent.
I am just going to post pseudo code,
but my question is I have a loop like such
for(i<n){
createfork();
if(child)
/*
Exit so I can control exact amount of forks
without children creating more children
*/
exit
}
void createfork(){
fork
//execute other methods
}
Does my fork create a process, let it do what it is supposed to do and exit, and only then create another process and repeat? If so, what are some ways around this, so that the processes run concurrently?
Your pseudocode is correct as written and does not need to be modified.
The processes are already executing in parallel, all six of them or however many you spawn. As written, the parent process does not wait for the children to finish before spawning more children. It calls fork(), checks if (child) (which is skipped), then immediately proceeds to the next for loop iteration and forks again.
Notably, there's no wait() call. If the parent were to call wait() or waitpid() to wait for each child to finish, that would introduce the serialization you're trying to avoid. But there is no such call, so you're good.
When a process successfully performs a POSIX fork(), that process and the new child process are initially both eligible to run. In that sense, they will run concurrently until one or the other blocks. Whether there will be any periods of time when both are executing machine instructions (on different processing units) depends at least on details of hardware capabilities, OS scheduling, the work each process is performing, and what other processes there are in the system and what they are doing.
The parent certainly does not, in general, automatically wait for the child to terminate before it proceeds with its own work (there is a family of functions to make it wait when you want that), nor does the child process automatically wait for any kind of signal from the parent. If the next thing the parent does is fork another child, then that will under many circumstances result in the parent running concurrently with both (all) children, in the sense described above.
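To make the point above concrete, here is one way the loop could look in real C. This is only a sketch: spawn_children, reap_children and announce are hypothetical names, and announce stands in for whatever work the children are meant to do.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real per-child work. */
void announce(int index)
{
    dprintf(STDOUT_FILENO, "child %d running as pid %ld\n", index, (long)getpid());
}

/* Start up to n children without waiting in between, so they are all
 * eligible to run at the same time. Returns how many were started. */
int spawn_children(int n, void (*work)(int))
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;          /* fork failed: stop spawning */
        if (pid == 0) {
            work(i);        /* child: do the work ... */
            _exit(0);       /* ... then exit so it never re-enters the loop */
        }
        started++;          /* parent: go straight on to the next fork */
    }
    return started;
}

/* Collect every child. Calling this only after the loop keeps the
 * children concurrent while still avoiding leftover zombies. */
int reap_children(void)
{
    int reaped = 0;
    for (;;) {
        pid_t pid = wait(NULL);
        if (pid > 0)
            reaped++;
        else if (errno != EINTR)
            break;          /* ECHILD: nothing left to wait for */
    }
    return reaped;
}
```

The order in which the children print is not fixed, since it depends on how the scheduler interleaves them.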
I cannot speak to specifics of the behavior of your pseudocode, because it's pseudocode.
I have just had a lecture that sums reaping as:
Reaping
Performed by parent on terminated child (using wait or waitpid)
Parent is given exit status information
Kernel then deletes zombie child process
So I understand that reaping is done by calling wait or waitpid from the parent process, after which the kernel deletes the zombie process. If this actually is the case, and reaping is done only when calling wait or waitpid, why do the child processes actually go away after returning from their entry function? It seems as if the child processes have been reaped, and thus no resources are wasted, even though the parent process may not be waiting.
So is "reaping" only possible when calling wait or waitpid? If processes are "reaped" as long as they return and exit from their entry function (which I assume all processes do), what is the point of talking about "reaping" as if it was something special?
The child process does not fully "go away" when it exits. It ceases to exist as a running process, and most/all of its resources (memory, open files, etc.) are released, but it still remains in the process table. It remains in the process table because that's where its exit status is stored, so that the parent can retrieve it by calling one of the wait variants. If the parent fails to call wait, the process table entry sticks around — and that's what makes it a "zombie".
I said that most/all of its resources are released, but the one resource that's definitely still consumed is that process table slot.
As long as the (dead) child's parent exists, the kernel doesn't know that the parent isn't going to call wait eventually, so the process table slot has to stay there, so that the eventual call to wait (if there is one) can return the proper exit status.
If the parent eventually exits (without ever calling wait), the child will be inherited by the grandparent, which is usually a "master" process like the shell, or init, that does routinely call wait and that will finally "reap" the poor young zombie.
So, yes, it really is true that the only way for the parent to properly "reap" the child is, just as was said in your lecture, to call one of the wait functions. (Or to exit, but that's not an option if the parent is long-running.)
Footnote: I said "the child will be inherited by the grandparent", but I think I was wrong, there. Under Unix and Linux, orphaned processes are generally always inherited by pid 1, aka init.
The purpose of the wait*() call is to allow the child process to report a status back to the parent process. When the child process exits, the operating system holds that status data in a little data structure until the parent reads it. Reaping in that sense is cleaning out that little data structure.
If the parent does not care about waiting for status from the child, the code could be written in a way to allow the parent to ignore the status, and so the reaping occurs semi-automatically. One way is to ignore the SIGCHLD signal.
Another way is to perform a double-fork to create a grandchild process instead. When doing this, the "parent" does a blocking wait() after a call to fork(). Then, the child performs another fork() to create the grandchild and then immediately exits, causing the parent to unblock. The grandchild now does the real work, and is automatically reaped by the init process.
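The double-fork sequence described above might look like this in C. It is a sketch rather than a drop-in implementation: spawn_detached and short_task are hypothetical names, and short_task just stands in for the real work.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the long-running work. */
void short_task(void)
{
    sleep(1);
}

/* Run work() in a grandchild that the caller never has to reap.
 * Returns 0 once the intermediate child has been collected, -1 on error. */
int spawn_detached(void (*work)(void))
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Intermediate child: fork the real worker, then exit at once. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            work();         /* grandchild: orphaned once its parent exits */
            _exit(0);
        }
        _exit(0);
    }

    /* Original process: this wait returns quickly, because the
     * intermediate child exits right after its own fork(). */
    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Once spawn_detached() returns, the caller has no children left to wait for, even though the grandchild may still be running.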
I'm trying to write a mock-shell in c on linux, and got stuck on this problem:
I need to run some processes in the background, and some processes in the foreground.
To prevent the foreground processes from becoming zombies, I can use wait(), but how do I prevent the background processes from becoming zombies?
You cannot prevent any process from becoming a zombie, but you can limit the time that it remains one. A process is a zombie from the time it terminates to the time its parent collects it via a call to wait() or waitpid() or another function serving that purpose. That time can be made very short indeed, for instance if the parent process is already waiting when the child terminates, but termination and subsequent collection are not synchronous.
The distinction between background and foreground processes is primarily about control of a terminal; it has little to do with a parent shell managing child processes. You collect child processes belonging to background jobs via wait(), etc., exactly the same way you collect child processes belonging to foreground jobs. You can collect already-terminated children without waiting for unterminated ones by using waitpid() with the WNOHANG flag, as @Someprogrammerdude already described. It remains to insert such waits at an appropriate time, and it seems common for interactive shells to schedule that around reading commands from the user.
You can poll for them, using waitpid with the WNOHANG flag. Or you could add a SIGCHLD handler, which will be invoked each time a child process ends (or has another status change).
" Thus, the common method for launching a daemon involves forking once or twice, and making the parent processes die while the child process begins performing its normal function."
I was going through OS concepts and I didn't understand the above said lines.
Why is the parent process made to exit (or die) in the process of creating a daemon?
Can someone please explain this to me?
Traditionally, a daemon process is defined as a process whose parent is the system's init process and which runs in the background. For instance, if you were to execute some program in your terminal, your shell would create a process (either in the foreground or background) and the program would run with your shell as its parent. This is an example of a non-daemon process because its parent is your shell process.
So how do you produce a process whose parent is the init process? Well, a process whose parent process dies before it (the child) has exited becomes an orphan process. An orphan process will in turn be re-parented to the init process. Voila, the process now meets the definition of a daemon.
Tying this back to your quote, if you were to fork once and then kill the parent, you achieve the desired effect. Likewise, if you fork once and then have that child fork another process, followed by killing the first child, you also achieve the desired effect while keeping the (now grandparent) process alive.
This is not a requirement, as any background process could be a daemon. Technically, a daemon is a process that runs to carry out some general, non-interactive task. In a Unix environment, a daemon is generally understood to be a process with certain characteristics: no controlling terminal, a reset umask, a particular working directory, etc. Forking twice is a common way to have the grandchild inherited by the init process with those properties, and so to get a process fully detached from any user control (except root's, of course).
This applies only if a standard user wants to create a daemon. Some other standard daemons are created almost normally (see init, launchd, etc.).
If the parent exits while the daemon continues running, the daemon is orphaned, and the init process typically adopts it (i.e. becomes the parent).
There are some exceptions, but it is normally expected that a daemon process will be descended from the init process (e.g. the init process will launch daemons during system startup). So, if another process launches a daemon and terminates, it achieves the desired effect.
Note that some other actions are also needed, such as disassociating the daemon from any tty window.
Other answers already explained what happens when the parent dies, i.e. the child is adopted by the init process.
But why is the above required to make a process a daemon? A daemon is by definition a non-interactive program, i.e. it should not be associated with a terminal. That ensures the daemon keeps working in the background even when the user sends signals with Control-C, hangs up, etc. Now, how do you prevent a process from ever attaching to a terminal? Make init its parent by killing the original parent.
init is a special process because:
It's not attached to any terminal.
It's the first process (pid 1) started after the OS boots, and that makes it the leader of its session. Note that every UNIX process belongs to a process group, which in turn belongs to a session. The first process in a session becomes the session leader.
In UNIX, only a session leader can acquire a controlling terminal. Strictly speaking, though, being adopted by init does not move your process into init's session: a re-parented process keeps the session and process group it already had. To actually leave the original session and drop its controlling terminal, the process has to call setsid(), and forking once more afterwards leaves a process that is not a session leader and so can never acquire a terminal. That's what we wanted, right?