In this code (run on Linux):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void child_process()
{
    int count = 0;
    for (; count < 1000; count++)
    {
        printf("Child Process: %04d\n", count);
    }
    printf("Child's process id: %d\n", getpid());
}

void parent_process()
{
    int count = 0;
    for (; count < 1000; count++)
    {
        printf("Parent Process: %04d\n", count);
    }
}

int main()
{
    pid_t pid;
    int status;

    if ((pid = fork()) < 0)
    {
        printf("unable to create child process\n");
        exit(1);
    }
    if (pid == 0)
        child_process();
    if (pid > 0)
    {
        printf("Return value of wait: %d\n", wait(&status));
        parent_process();
    }
    return 0;
}
If the wait() were not present in the code, one of the processes (child or parent) would finish its execution, control would then return to the Linux shell prompt, and finally the remaining process (child or parent) would run. The output in such a case is:
Parent Process: 0998
Parent Process: 0999
guest#debian:~/c$ Child Process: 0645 //Control given to terminal & then child process is again picked for processing
Child Process: 0646
Child Process: 0647
If wait() is present in the code, what should the flow of execution be?
When fork() is called, a process tree is created containing the parent and child processes. In the above code, when the child process finishes, the parent is informed about the death of the zombie child via the wait() system call. But the parent and child being two separate processes, is it mandatory that control is passed directly to the parent after the child process is over (with no control given to another process, like the terminal, at all)? If yes, then it is as if the child process were a part of the parent process (like a function called from another function).
This comment is, at least, misleading:
//Control given to terminal & then child process is again picked for processing
The "terminal" process doesn't really enter into the equation. It's always running, assuming that you are using a terminal emulator to interact with your program. (If you're using the console, then there is no terminal process. But that's unlikely these days.)
The process in control of the user interface is whatever shell you're using. You type some command-line like
$ ./a.out
and the shell arranges for your program to run. (The shell is an ordinary user program without special privileges, by the way. You could write your own.)
Specifically, the shell:
Uses fork to create a child process.
Uses waitpid to wait for that child process to finish.
The child process sets up any necessary redirects and then uses some exec system call, typically execve, to replace itself with the ./a.out program, passing execve (or whatever) the command line arguments you specified.
That's it.
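Here is a minimal sketch of that fork/exec/wait cycle, with ./a.out standing in for whatever command line you typed and error handling pared down to the essentials:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        // Child: replace ourselves with the requested program.
        char *argv[] = { "./a.out", NULL };
        execv("./a.out", argv);
        perror("execv");  // only reached if exec fails
        _exit(127);
    }
    // Parent (the "shell"): wait for the command to finish,
    // then print a new prompt.
    int status;
    waitpid(pid, &status, 0);
    printf("$ ");
    return 0;
}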
Your program, in ./a.out, uses fork to create a child and then possibly waits for the child to finish before terminating. As soon as your parent process terminates, the shell's waitpid() can return, and as soon as it returns, the shell prints a new command prompt.
So there are at least three relevant processes: the shell, your parent process, and your child process. In the absence of synchronisation functions like waitpid(), there are no guarantees about ordering. So when your parent process calls fork(), the created child could start executing immediately. Or not. If it does start executing immediately, it does not necessarily preempt your parent process, assuming your computer is reasonably modern and has more than one core. They could both be executing at the same time. But that's not going to last very long because your parent process will either immediately call exit or immediately call wait.
When a process calls wait (or waitpid), it is suspended and becomes runnable again when the process it is waiting for terminates. But again there are no guarantees. The mere fact that a process is runnable doesn't mean that it will immediately start running. But generally, in the absence of high load, the operating system will start running it pretty soon. Again, it might be running at the same time as another process, such as your child process (if your parent didn't wait for it to finish).
In short, if you performed your experiment a million times, and your parent waits for your child, then you will see the same result a million times; the child must finish before the parent is unsuspended, and your parent must finish before the shell is unsuspended. (If your parent process printed something before waiting, you would see different results; the parent and child outputs could be in any order, or even overlapped.)
If, on the other hand, your parent does not wait for the child, then you could see any of a number of results, and in a million repetitions you're likely to see more than one of them (but not with the same probability). Since there is no synchronisation between parent and child, the outputs could appear in either order (or be interleaved). And since the child is not synchronised with the shell, its output could appear before or after the shell's prompt, or be interleaved with the shell's prompt. No guarantees, other than that the shell will not resume until your parent is done.
Note that the terminal emulator, which is a completely independent process, is runnable the entire time. It owns a pseudo-terminal ("pty") which is how it emulates a terminal. The pseudo-terminal is a kind of pipe; at one end of the pipe is the process which thinks it's communicating with a console, and at the other end is the terminal emulator which interprets whatever is being written to the pty in order to render it in the GUI, and which sends any keystrokes it receives, suitably modified as a character stream back through the pipe. Since the terminal emulator is never suspended and its execution is therefore interleaved with whatever other processes are active on your computer, it will (more or less) immediately show you any output which is sent by your shell or the processes it starts up. (Again, assuming the machine is not overloaded.)
I am new to processes in Linux and C.
I am using this straightforward example:
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, const char * argv[]) {
    pid_t child_pid_or_zero = fork(); // fork returns twice

    if (child_pid_or_zero < 0)
    {
        // if fork returns a number smaller than zero, something wrong happened
        perror("Something wrong happened\n");
        exit(-1);
    }

    if (child_pid_or_zero > 0)
    {
        // if fork returns a number greater than zero, this is the parent process
        printf("I'm the parent, my pid is: %d\t My child pid is %d\n", getpid(), child_pid_or_zero);
        wait(NULL);
    }
    else
    {
        // this means that fork now returned 0, the child process is running
        printf("I am the child with pid: %d\t My parent pid is: %d\n", getpid(), getppid());
    }
    return 0;
}
If I were to omit the wait() call in the if (child_pid_or_zero > 0) branch, what would happen? I tried this myself and, apparently, there was no immediate difference. Do we always need to use wait(), or does this only apply when the child is supposed to perform heavy calculations, etc.?
Thanks in advance.
Wait is for listening to state changes and obtaining information about the child. A state change is child termination, stopping or resuming by a signal. Wait allows the system to release the resources associated with the child. If a wait is not performed, then the terminated child remains in a "zombie" state.
The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(1), which automatically performs a wait to remove the zombies.
The system call wait(2) is typically used to find if the child process's state has changed (i.e. whether it's still running, exited, etc).
Another purpose is to avoid "zombie" processes. If the parent process doesn't wait on the child process and the child process exits before the parent process, then it becomes a "zombie" process. So, a wait() call is used to "reap" the process and release the system resources associated with it.
Imagine if the parent process is a long-running one and creates several child processes at regular intervals; then all the zombie processes will have entries in the process table, which is an unnecessary use of system resources.
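A common fix for such a long-running parent (a sketch of mine, not from the answer above) is to reap children asynchronously in a SIGCHLD handler, so no zombie outlives the moment its parent is notified:

#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static void reap_children(int sig)
{
    (void)sig;
    // Reap every child that has already exited; WNOHANG keeps the
    // handler from blocking, and the loop handles the case where
    // several SIGCHLDs were coalesced into one.
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sa.sa_flags = SA_RESTART;
    sigaction(SIGCHLD, &sa, NULL);

    for (int i = 0; i < 3; i++) {  // parent periodically forks workers
        if (fork() == 0)
            _exit(0);              // child does its work and exits
        sleep(1);                  // children are reaped as they die
    }
    return 0;
}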
After the fork, you'll have two independent processes.
With the wait() call, you tell the parent process to wait for the child process to terminate.
In this example nothing changes, since the two processes do not interact with each other, so the parent can just exit after creating the child and printing the string. But in a scenario where the parent has to wait for the child to do some operations and then perhaps return a value to the parent, it becomes useful!
A zombie is created when a parent process does not use the wait system call after a child dies to read its exit status, and an orphan is a child process that is reclaimed by init when the original parent process terminates before the child.
In terms of memory management and the process table how are these processes handled differently, specifically in UNIX?
What is an example or extreme case when the creation of zombies or orphans can be detrimental to the greater application or system?
When a child exits, some process must wait on it to get its exit code. That exit code is stored in the process table until this happens. The act of reading that exit code is called "reaping" the child. Between the time a child exits and is reaped, it is called a zombie. (The whole nomenclature is a bit gruesome when you think about it; I recommend not thinking about it too much.)
Zombies only occupy space in the process table. They take no memory or CPU. However, the process table is a finite resource, and excessive zombies can fill it, meaning that no other processes can launch. Beyond that, they are bothersome clutter, and should be strongly avoided.
If a process exits with children still running (and doesn't kill its children; the metaphor continues to be bizarre), those children are orphans. Orphaned children are immediately "adopted" by init (actually, I think most people call this "reparenting," but "adoption" seems to carry the metaphor better). An orphan is just a process. It will use whatever resources it uses. It is reasonable to say that it is not an "orphan" at all since it has a parent, but I've heard them called that often.
init automatically reaps its children (adopted or otherwise). So if you exit without cleaning up your children, then they will not become zombies (at least not for more than a moment).
But long-lived zombies exist. What are they? They're the former children of an existing process that hasn't reaped them. The process may be hung. Or it may be poorly written and forgets to reap its children. Or maybe it's overloaded and hasn't gotten around to it. Or whatever. But for some reason, the parent process continues to exist (so they aren't orphans), and they haven't been waited on, so they live on as zombies in the process table.
So if you see zombies for longer than a moment, then it means that there is something wrong with the parent process, and something should be done to improve that program.
When a process terminates, its resources are deallocated by the operating system. However, its entry in the process table must remain there until the parent calls wait(), because the process table contains the process's exit status. A process that has terminated, but whose parent has not yet called wait(), is known as a zombie process. All processes transition to this state when they terminate, but generally they exist as zombies only briefly. Once the parent calls wait(), the process identifier of the zombie process and its entry in the process table are released.

Now consider what would happen if a parent did not invoke wait() and instead terminated, thereby leaving its child processes as orphans. Linux and UNIX address this scenario by assigning the init process as the new parent to orphan processes. The init process periodically invokes wait(), thereby allowing the exit status of any orphaned process to be collected and releasing the orphan's process identifier and process-table entry.
Source: Operating System Concepts by Silberschatz, Galvin, and Gagne
An orphan process is a computer process whose parent process has finished or terminated, though it (the child process) remains running itself.
A zombie process or defunct process is a process that has completed execution but still has an entry in the process table because its parent process didn't invoke a wait() system call.
Orphan -
The parent exits, and the init process becomes the parent of the child process. When the child eventually terminates, the OS deletes its process-table entry.
Zombie -
When the child terminates, it gives its exit status to the parent. Suppose that meanwhile the parent is in a sleep state and unable to receive any status from the child. Though the child has exited, the process still occupies space in the process table.
Check out this command on Linux/Ubuntu: ps -eo pid,ppid,stat,cmd
If you find something like <defunct> at the end, that process is a zombie and is occupying a slot.
Zombie Process:
A process that has finished execution but still has an entry in the process table, kept so it can report to its parent process, is known as a zombie process. A child process always first becomes a zombie before being removed from the process table. When the parent process reads the exit status of the child process, the child's entry is reaped from the process table.
Orphan Process:
A process whose parent process no longer exists, i.e. either finished or terminated without waiting for its child process to terminate, is called an orphan process.
There are no orphans except the process with PID 1.
From the running process's point of view, it makes no difference whether it was started directly, and therefore has PID 1 as parent, or was inherited by PID 1 because its original parent (being different from PID 1) ended.
It is handled like any other process.
Each process goes through some sort of zombie state when ending, namely the phase between announcing its end by raising SIGCHLD and having that announcement acknowledged (by delivery or by being ignored).
Once the zombie state has been entered, the process is just an entry in the system's process list.
The only significant resource a zombie exclusively holds is a valid PID.
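As a sketch of the "ignorance" case mentioned above: a parent that sets SIGCHLD to SIG_IGN tells the kernel it will never ask for the status, so POSIX systems reap the children automatically and no zombie entry is kept:

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    signal(SIGCHLD, SIG_IGN);  // POSIX: children are reaped automatically
    if (fork() == 0)
        exit(0);               // child exits immediately
    sleep(1);                  // no zombie shows up during this window
    return 0;
}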
I would like to add 2 code snippets featuring an orphan and a zombie process. But first, I will post the definition of these processes as stated in the book "Operating System Concepts" by Silberschatz, Galvin and Gagne:
If no parent waiting (did not invoke wait()) process is a zombie
If parent terminated without invoking wait , process is an orphan
Orphan
// A C program to demonstrate an orphan process.
// The parent process finishes execution while the
// child process is still running, so the child
// becomes an orphan.
#include <stdio.h>     // printf
#include <stdlib.h>    // exit
#include <sys/types.h> // pid_t
#include <unistd.h>    // fork and sleep

int main()
{
    // fork returns the child's process id
    // in the parent process
    pid_t child_pid = fork();

    // The parent didn't use wait and finishes before the child,
    // so the child becomes an orphan process
    if (child_pid > 0) { // parent process
        printf("I finished my execution before my child\n");
    }
    else if (child_pid == 0) { // child process
        sleep(1); // sleep for 1 second; the parent has exited by now
        printf("I am an orphan; init has adopted me\n");
    }
    else {
        // error occurred
    }
    return 0;
}
Output
I finished my execution before my child
(about a second later, possibly after the shell prompt has already returned:)
I am an orphan; init has adopted me
Zombie
// A C program to demonstrate a zombie process.
// The child becomes a zombie because the parent
// is not waiting when the child process exits.
#include <stdio.h>     // printf
#include <stdlib.h>    // exit
#include <sys/types.h> // pid_t
#include <unistd.h>    // fork and sleep

int main()
{
    // fork returns the child's process id
    // in the parent process
    pid_t child_pid = fork();

    // The parent doesn't use wait, so the exited child
    // remains a zombie until the parent terminates
    if (child_pid > 0) { // parent process
        sleep(1); // sleep for 1 second; the child is a zombie meanwhile
        printf("I don't wait for my child\n");
    }
    else if (child_pid == 0) { // child process
        printf("My parent doesn't wait for me\n");
        exit(0);
    }
    else {
        // error occurred
    }
    return 0;
}
Output
My parent doesn't wait for me
I don't wait for my child
Edit: Source and inspiration taken from here
A process which has finished execution but still has an entry in the process table, kept so it can report to its parent process, is known as a zombie process.
A process whose parent process no longer exists, i.e. either finished or terminated without waiting for its child process to terminate, is called an orphan process.
Here is one summary:

| Zombie Process | Orphan Process |
| -------------- | -------------- |
| A zombie is a process that has completed its task but still shows an entry in the process table. | An orphan is a child process that remains running even after its parent process has terminated or completed without waiting for the child's execution. |
| A zombie's state is always indicated by Z in process listings. | An orphan is typically created unintentionally, e.g. when its parent crashes or exits early. |
| A zombie process is treated as dead; it does no further system processing. | An orphan process keeps running; after its parent terminates, init becomes its parent and the remaining task continues. |
| The wait() system call is used to deal with zombie processes. | The kernel assigns a new parent process to an orphan process, usually the init process (pid = 1). |
| A zombie is removed once its parent reaps it; if the parent is killed, init inherits and reaps the zombie. | An orphan process can be terminated by sending it a signal such as SIGHUP. |
Source:
Zombie vs. Orphan Processes
Difference between zombie orphan and daemon processes
Zombie and Orphan Process in OS
How does the program below work and create a zombie process under Linux?
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main ()
{
    pid_t child_pid;

    child_pid = fork ();
    if (child_pid > 0) {
        sleep (60);  // parent sleeps without waiting; the child is a zombie meanwhile
    }
    else {
        exit (0);    // child exits immediately
    }
    return 0;
}
It creates a child and doesn't wait (with one of the wait* system calls) for it. And zombies are just that: children that the parent hasn't waited on yet; the kernel has to maintain some information for them -- mainly the exit status -- in order to be able to return it to the parent.
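If you want to watch the zombie appear, here is a variant (my sketch; the ps invocation assumes a typical procps ps) that lists the processes on the current terminal while the unreaped child exists:

#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t child_pid = fork();

    if (child_pid > 0) {
        sleep(1);  // give the child time to exit and become a zombie
        // The exited-but-unreaped child shows up with state Z / <defunct>.
        system("ps -o pid,ppid,stat,cmd");
    }
    else {
        exit(0);
    }
    return 0;
}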
Every *nix process produces an exit status that must be reaped. It is supposed to be reaped by the parent process using a wait() call, if the child terminates first.
If the parent terminates before the child, the child is reparented to init, which reaps it automatically. (Note that setsid() plays no part in this; it creates a new session, it does not change who reaps a child.)
Contrary to a common belief, not even root can remove a zombie from the process list with kill -9: the process is already dead, so there is nothing left for the signal to act on. A zombie disappears only when its parent reaps it, or when the parent itself terminates so that init inherits and reaps the zombie. A lingering zombie therefore points at a bug in the parent program, namely a missing wait().
In days of old, the system administrator would use zombies to identify inexperienced programmers that need additional training to produce good code.
Exit statuses are also useful for debugging: when a program is terminated prematurely, the status collected by the parent (or logged by the init system for the services it supervises) helps identify the nature of the bug that caused the early termination (error conditions not handled by the programmer). Such reports commonly end up in the syslog or klog files, which are commonly used to debug code.
In many programs and man pages of Linux, I have seen code using fork(). Why do we need to use fork() and what is its purpose?
fork() is how you create new processes in Unix. When you call fork, you're creating a copy of your own process that has its own address space. This allows multiple tasks to run independently of one another as though they each had the full memory of the machine to themselves.
Here are some example usages of fork:
Your shell uses fork to run the programs you invoke from the command line.
Web servers like apache use fork to create multiple server processes, each of which handles requests in its own address space. If one dies or leaks memory, others are unaffected, so it functions as a mechanism for fault tolerance.
Google Chrome uses fork to handle each page within a separate process. This will prevent client-side code on one page from bringing your whole browser down.
fork is used to spawn processes in some parallel programs (like those written using MPI). Note this is different from using threads, which don't have their own address space and exist within a process.
Scripting languages use fork indirectly to start child processes. For example, every time you use a command like subprocess.Popen in Python, you fork a child process and read its output. This enables programs to work together.
Typical usage of fork in a shell might look something like this:
int child_process_id = fork();

if (child_process_id) {
    // Fork returns a valid pid in the parent process. Parent executes this.

    // wait for the child process to complete
    waitpid(child_process_id, ...); // omitted extra args for brevity

    // child process finished!
} else {
    // Fork returns 0 in the child process. Child executes this.

    // new argv array for the child process; argv[0] is conventionally the program name
    const char *argv[] = {"program", "arg1", "arg2", NULL};

    // now start executing some other program
    execv("/path/to/a/program", (char * const *)argv);
}
The shell spawns a child process using fork, runs your program in it using exec, and waits for it to complete, then continues with its own execution. Note that you don't have to use fork this way. You can always spawn off lots of child processes, as a parallel program might do, and each might run a program concurrently. Basically, any time you're creating new processes in a Unix system, you're using fork(). For the Windows equivalent, take a look at CreateProcess.
If you want more examples and a longer explanation, Wikipedia has a decent summary. And there are some slides on how processes, threads, and concurrency work in modern operating systems.
fork() is how Unix creates new processes. At the point you call fork(), your process is cloned, and two different processes continue the execution from there. One of them, the child, will have fork() return 0. The other, the parent, will have fork() return the PID (process ID) of the child.
For example, if you type the following in a shell, the shell program will call fork(), and then execute the command you passed (telnetd, in this case) in the child, while the parent will display the prompt again, as well as a message indicating the PID of the background process.
$ telnetd &
As for the reason you create new processes, that's how your operating system can do many things at the same time. It's why you can run a program and, while it is running, switch to another window and do something else.
fork() is used to create a child process. When the fork() function is called, a new process is spawned, and the fork() call returns a different value for the child and the parent.
If the return value is 0, you know you're the child process, and if the return value is a number (which happens to be the child process id), you know you're the parent. (And if it's a negative number, the fork failed and no child process was created.)
http://www.yolinux.com/TUTORIALS/ForkExecProcesses.html
fork() is basically used to create a child process of the process in which you call the function. Whenever you call fork(), it returns zero in the child process:

pid_t pid = fork();

if (pid == 0) {
    // this is the child process
}
else if (pid > 0) {
    // this is the parent process
}

By this you can provide different actions for the parent and the child and make use of multiprocessing.
fork() will create a new child process identical to the parent. So everything you run in the code after that will be run by both processes — very useful if you have for instance a server, and you want to handle multiple requests.
System call fork() is used to create processes. It takes no arguments and returns a process ID. The purpose of fork() is to create a new process, which becomes the child process of the caller. After a new child process is created, both processes will execute the next instruction following the fork() system call. Therefore, we have to distinguish the parent from the child. This can be done by testing the returned value of fork():
If fork() returns a negative value, the creation of a child process was unsuccessful.
fork() returns a zero to the newly created child process.
fork() returns a positive value, the process ID of the child process, to the parent. The returned process ID is of type pid_t defined in sys/types.h. Normally, the process ID is an integer. Moreover, a process can use function getpid() to retrieve the process ID assigned to this process.
Therefore, after the system call to fork(), a simple test can tell which process is the child. Please note that Unix will make an exact copy of the parent's address space and give it to the child. Therefore, the parent and child processes have separate address spaces.
Let us understand it with an example to make the above points clear. This example does not distinguish between the parent and the child processes.
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_COUNT 200
#define BUF_SIZE  100

int main(void)
{
    pid_t pid;
    int i;
    char buf[BUF_SIZE];

    fork();
    pid = getpid();
    for (i = 1; i <= MAX_COUNT; i++) {
        sprintf(buf, "This line is from pid %d, value = %d\n", pid, i);
        write(1, buf, strlen(buf));
    }
    return 0;
}
Suppose the above program executes up to the point of the call to fork().
If the call to fork() is executed successfully, Unix will make two identical copies of address spaces, one for the parent and the other for the child.
Both processes will start their execution at the next statement following the fork() call. In this case, both processes will start their execution at the assignment
pid = .....;
Both processes start their execution right after the system call fork(). Since both processes have identical but separate address spaces, those variables initialized before the fork() call have the same values in both address spaces. Since every process has its own address space, any modifications will be independent of the others. In other words, if the parent changes the value of its variable, the modification will only affect the variable in the parent process's address space. Other address spaces created by fork() calls will not be affected even though they have identical variable names.
What is the reason for using write rather than printf? It is because printf() is "buffered", meaning printf() will group the output of a process together. While buffering the output for the parent process, the child may also use printf to print out some information, which will also be buffered. As a result, since the output is not sent to the screen immediately, you may not get the expected order of results. Worse, the output from the two processes may be mixed in strange ways. To overcome this problem, you may consider using the "unbuffered" write.
If you run this program, you might see the following on the screen:
................
This line is from pid 3456, value 13
This line is from pid 3456, value 14
................
This line is from pid 3456, value 20
This line is from pid 4617, value 100
This line is from pid 4617, value 101
................
This line is from pid 3456, value 21
This line is from pid 3456, value 22
................
Process ID 3456 may be the one assigned to the parent or the child. Because these processes run concurrently, their output lines are intermixed in a rather unpredictable way. Moreover, the order of these lines is determined by the CPU scheduler. Hence, if you run this program again, you may get a totally different result.
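An alternative to switching to write(), sketched here as my own aside rather than part of the answer above, is to flush stdout before forking, so the child does not inherit a copy of the parent's stdio buffer:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("buffered before fork");
    fflush(stdout);  // without this, both processes may end up printing this line
    if (fork() == 0) {
        printf(" -- child\n");
    }
    else {
        printf(" -- parent\n");
    }
    return 0;
}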
You probably don't need to use fork in day-to-day programming if you are writing applications.
Even if you do want your program to start another program to do some task, there are other simpler interfaces which use fork behind the scenes, such as "system" in C and perl.
For example, if you wanted your application to launch another program such as bc to do some calculation for you, you might use 'system' to run it. System does a 'fork' to create a new process, then an 'exec' to turn that process into bc. Once bc completes, system returns control to your program.
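A sketch of that pattern (the bc expression is just an illustration, and it assumes bc is installed):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // system() forks, execs a shell that runs the command,
    // and waits for it to finish before returning.
    int status = system("echo '2 + 2' | bc");
    printf("system() returned %d\n", status);
    return 0;
}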
You can also run other programs asynchronously, but I can't remember how.
If you are writing servers, shells, viruses or operating systems, you are more likely to want to use fork.
Multiprocessing is central to computing. For example, your IE or Firefox can create a process to download a file for you while you are still browsing the internet. Or, while you are printing out a document in a word processor, you can still look at different pages and still do some editing with it.
Fork creates new processes. Without fork you would have a unix system that could only run init.
fork() is used to create new processes, as everybody has written.
Here is my code that creates processes in the form of a binary tree. It will ask you to enter the number of levels up to which you want to create processes in the binary tree.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    int t1 = 0, t2 = 0, p, i, n, ab;

    p = getpid();
    printf("enter the number of levels\n"); fflush(stdout);
    scanf("%d", &n);
    printf("root %d\n", p); fflush(stdout);
    for (i = 1; i < n; i++)
    {
        t1 = fork();
        if (t1 != 0)       // parent forks a second child
            t2 = fork();
        if (t1 != 0 && t2 != 0)  // inner node: both children created, stop forking
            break;
        printf("child pid %d parent pid %d\n", getpid(), getppid()); fflush(stdout);
    }
    if (t1 > 0)
        waitpid(t1, &ab, 0);
    if (t2 > 0)
        waitpid(t2, &ab, 0);
    return 0;
}
OUTPUT
enter the number of levels
3
root 20665
child pid 20670 parent pid 20665
child pid 20669 parent pid 20665
child pid 20672 parent pid 20670
child pid 20671 parent pid 20670
child pid 20674 parent pid 20669
child pid 20673 parent pid 20669
First, one needs to understand what the fork() system call is. Let me explain.
The fork() system call creates an exact duplicate of the parent process. It duplicates the parent's stack, heap, initialized data, and uninitialized data, and shares the code in read-only mode with the parent process.
Fork copies memory on a copy-on-write basis, meaning a virtual-memory page is actually duplicated only when one of the processes writes to it.
Now, the purpose of fork():
fork() can be used wherever there is a division of work. For example, a server has to handle multiple clients while the parent keeps accepting connections on a regular basis, so the server forks a child for each client to perform the read/write, as sketched below.
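Here is a sketch of that fork-per-client pattern; the port number and the echo behaviour are illustrative assumptions, and error handling is minimal:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);  // example port, an assumption

    if (listener < 0 || bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("socket/bind");
        exit(1);
    }
    listen(listener, 16);
    signal(SIGCHLD, SIG_IGN);  // let the kernel reap finished children

    for (;;) {
        int client = accept(listener, NULL, NULL);
        if (client < 0)
            continue;
        if (fork() == 0) {  // child: serve this one client
            close(listener);
            char buf[512];
            ssize_t n;
            while ((n = read(client, buf, sizeof buf)) > 0)
                write(client, buf, (size_t)n);  // trivial echo service
            close(client);
            _exit(0);
        }
        close(client);  // parent: loop straight back to accept()
    }
}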
fork() is used to spawn a child process. Typically it's used in similar sorts of situations as threading, but there are differences. Unlike threads, fork() creates whole separate processes, which means that although the child and the parent are direct copies of each other at the point that fork() is called, they are completely separate; neither can access the other's memory space (without going to the lengths normally required to access another process's memory).
fork() is still used by some server applications, mostly ones that run as root on a *NIX machine and drop permissions before processing user requests (a sketch of that pattern follows). There are some other use cases still, but mostly people have moved to multithreading now.
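For illustration, here is a sketch of that privilege-dropping pattern; the "nobody" account is an assumption, and real servers normally use a dedicated service user:

#include <pwd.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (fork() == 0) {
        // Child: drop from root to an unprivileged user before
        // touching any user-supplied data.
        struct passwd *pw = getpwnam("nobody");
        if (pw == NULL || setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0) {
            perror("drop privileges");
            _exit(1);
        }
        // ... handle the user request here without root privileges ...
        _exit(0);
    }
    return 0;  // parent continues running as root
}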
The rationale behind fork() versus just having an exec() function to initiate a new process is explained in an answer to a similar question on the unix stack exchange.
Essentially, since fork copies the current process, all of the various possible options for a process are established by default, so the programmer does not have to supply them.
In the Windows operating system, by contrast, programmers have to use the CreateProcess function which is MUCH more complicated and requires populating a multifarious structure to define the parameters of the new process.
So, to sum up, the reason for forking (versus exec'ing) is simplicity in creating new processes.
The fork() system call is used to create a child process. The child is an exact duplicate of the parent process: fork copies the stack section, heap section, data section, environment variables, and command-line arguments from the parent.
refer: http://man7.org/linux/man-pages/man2/fork.2.html
fork() was created as a way to create another process sharing a copy of the parent's memory state. It works the way it does because it was the most minimal change possible to get good multitasking capabilities into time-slicing mainframe systems that previously lacked them. Additionally, programs needed remarkably little modification to become multi-process: fork() could simply be added in the appropriate locations, which is rather elegant. Basically, fork() was the path of least resistance.
Originally it actually had to copy the entire parent process's memory space. With the advent of virtual memory, it has been reworked to be more efficient, with copy-on-write mechanisms avoiding the need to actually copy any memory.
However, modern systems now allow the creation of actual threads, which simply share the parent process's actual heap. With modern multi-threading programming paradigms and more advanced languages, it's questionable whether fork() provides any real benefit, since fork() prevents processes from communicating through memory directly and forces them to use slower message-passing mechanisms.
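For illustration (a sketch of mine, not from the answer above), the kind of message passing that fork()ed processes typically fall back on is a pipe:

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    pipe(fds);  // fds[0] = read end, fds[1] = write end
    if (fork() == 0) {
        close(fds[0]);
        const char *msg = "hello from the child";
        write(fds[1], msg, strlen(msg) + 1);  // include the terminating NUL
        _exit(0);
    }
    close(fds[1]);
    char buf[64];
    read(fds[0], buf, sizeof buf);
    printf("parent received: %s\n", buf);
    wait(NULL);  // reap the child
    return 0;
}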