Understanding the fork system call in UNIX - c

I'm trying to understand both the execution order of a line of code given to me earlier and process creation using the fork() system call. It's in C language for running on UNIX.
I understand the main concept behind fork(), nevertheless, I want to understand the process tree creation derived from the following line of code:
x = fork() || !fork();
Any help will be greatly appreciated!

The initial parent process that runs the code (let's say its PID is 1000) evaluates the first fork() of x = fork() || !fork(), and spawns a new child process (let's say PID = 1001).
For process PID = 1000, the value of the logical expression so far is non-zero, because fork(2) returns its child's PID. Thus, due to short-circuit evaluation, the rest of the logical expression isn't evaluated, as it's not needed.
For process PID = 1001, the value of the logical expression so far is 0, because in the newly created process fork(2) returns 0; so it has to evaluate the rest of the logical expression too. It executes the !fork() part, spawning a new process (let's say PID = 1002).
The process tree derived is the following:
1000
|
|
1001
|
|
1002

chrk's answer is great. In addition, note that fork() returns twice: once in the parent process, where it returns the PID of the child, and once in the child process, where it returns 0. Both processes continue executing after fork() returns.

Related

How does fork with && and || Operator work?

I am trying to understand the output of this program, but I don't get it. I read about fork and how it works and tried to understand it, but when I mix it with the && or || operators I don't understand why the output is different.
Sometimes I get one word, sometimes 2, 3, 4...
#include<unistd.h>
#include<stdio.h>
int main (int argc, char *argv[]) {
(fork()&&fork()) || fork();
printf("Test\n");
return 0;
}
Any Idea how it works?
Some things that will help in understanding this code:
1. Fork return values - When you call fork anywhere in your code, you create a new process, and the new process runs the same code as the parent. The parent and child are told apart by the value that fork returns: in the child process fork returns 0, and in the parent process it returns a non-zero value (the child's PID, or -1 if the fork failed).
2. Short-circuit logic evaluation - Boolean logic in C is evaluated in a short-circuit way.
For a statement A && B, where A and B are two expressions that evaluate to either true or false, the second expression (B) is evaluated only if the first expression (A) is true. This is because, if A is false, the whole expression is false regardless of the value of B, which makes evaluating B useless.
For a statement like A || B, B is evaluated only if A is false, for a similar reason.
In your code snippet, after the first fork() is executed, the child process skips the second fork() (due to the short circuit of &&) and goes on to the third fork(), so "Test" is printed 2 times (once by that child and once by the process it creates).
The original parent then executes the second fork(); the new child created there proceeds to the third fork(), so "Test" is printed 2 times again. The parent skips the third fork(), because for it both the first and second forks returned non-zero (short circuit of ||), and prints "Test" once.
Thus "Test" gets printed a total of 5 times.
P1 (fork-1)
/ \
P2(fork-2) P1 (fork-2)
/ \ / \
P3 P2 P4(fork-3) P1
/ \
P4 P5
The output depends on whether the forks succeed. If every fork succeeds, "Test" should appear 5 times in the output.
1- One output comes from the original process. (1)
2- The original process creates two children: in (fork() && fork()) || fork(), both forks of the first part return non-zero in the original process, so the first part evaluates as true and the second part is not executed there. Each of those two children prints once. (1+2)
3- In each of the two children the first part evaluates as false, so each child executes the fork() of the second part and creates one more process. That gives two more outputs, for 5 outputs overall. (1+2+2)
First: sorry for the question. I had not done enough research, and I didn't consider each fork from both the child's and the parent's side.
What I learned is:
The first fork runs and creates a child. Now we have 2 processes: in the parent fork returns a non-zero number, and in the child it returns 0.
Because the parent got a non-zero number, the fork after && also runs there and creates another child process. The first child got 0, so it does not run the fork after &&. We now have 3 processes.
The fork after || won't run for the parent, because (fork() && fork()) was non-zero there, but it will run in both children, because for them it was 0, so we end up with 5 processes. Hope I am right.

what should be returned when calling "getpid() == fork()"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char **argv) {
printf ("%d", getpid() == fork());
return 0;
}
The output of this program is 00. I don't quite understand why 0 is printed twice. My understanding is that after fork() is called, a child process is created, and both processes then continue running from the next line of the program. Doesn't that mean the child process will just run return 0? I can see that I would get 00 if it were fork() == getpid(), though. Thanks!
fork does not cause the child process to jump to the "next line". Assuming the operation succeeds, the fork function call returns twice, once in each process. So both processes execute the comparison to getpid.
Also, the C standard doesn't specify whether the call to getpid happens before the call to fork (in which case it happens only once) or after (in which case it happens twice and returns two different values), but this turns out not to matter, because all of the possible situations lead to the comparison being false:
fork fails: no new process is created and it returns −1 to the parent, which is not a valid process ID and therefore cannot compare equal to the value returned to the parent by getpid. It doesn't matter whether the getpid happens before or after the fork, because it's the same process in either case.
fork succeeds: it returns the child's process ID to the parent, and it returns zero to the child.
The child's process ID cannot be equal to the parent's process ID (because both of them are running at the same time), so, in the parent, the comparison will always be false, and it doesn't matter whether the getpid happens before or after the fork because it's the same process in either case.
If the getpid call happened before the fork, the child will compare zero to the parent's process ID; if it happened after the fork, the child will compare zero to the child's process ID. Zero is not a valid process ID either, so the comparison will be false either way.
getpid is one of a very few system calls that POSIX says cannot ever fail, so we don't have to worry about that possibility.
Therefore, the only things this program can print are 0 (if fork fails) or 00 (if fork succeeds).
I would strongly recommend not writing anything like this in a real program. Operations with "abnormal" control-flow behavior, like fork, should always be done as stand-alone statements, because this makes the program easier for humans to read. You might not have realized yet just how important that is, so let me leave you with an exercise: reread a program that you wrote more than three months ago, and try to remember what it does and why. (If you haven't been programming for long enough to do that, make a note to do this exercise when you can.)
fork documentation
When fork() returns, in both parent and child processes - it returns immediately after the ending of the invocation of the fork() system call. It could well be in the middle of the line as is the case here - you return with a value for the expression fork() and use that value to continue evaluating the expression that contained it - in this case - a condition within a printf.
fork() returns the PID of the child process in the parent process or 0 in the child process (or -1 on error).
In this case, the condition getpid() == fork() evaluates to 0 (false) in both processes. In the parent, getpid() returns the parent process's own PID, which differs from the child's PID returned by fork(). In the child, fork() returns 0, which is not a valid PID and so can never be returned by getpid().
Thus the 00 output.

Fork() function in C programming

I just need to understand this statement:
if (fork() && !fork())
shouldn't it always be false? I mean, if I write:
if (a && !a)
It's always false so the first should always be false too, am I wrong? Of course I am, but I'm hoping someone can explain this strange thing to me.
I'm studying C for an exam and I had to resolve this code:
#include <stdio.h>
#include <unistd.h>
int main(void){
if(fork() && !fork()){
printf("a\n");
}
else printf("b\n");
return 0;
}
Every call to the Unix process creation system call fork() returns twice. First, it returns the PID of the child to the parent (the process which called fork()). Second, it returns 0 to the newly created child.
from man pages:
Return Value
On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately.
in your case
if (fork() && !fork())
The condition inside the if calls fork() twice. So what happens is the following:
A
|----------------B
|                |
|---C            |
|   |            |
Now the first call to fork() returns in both A and B. In A it is nonzero and in B it is zero.
The second call to fork() is invoked only from A. Because the first fork() returned 0 to B, B does not invoke a second fork(): && short-circuits the evaluation if its first operand is found to be zero. Thanks to Daniel for pointing this out.
So we can make a table out of this:
PID   fork()1   fork()2
------------------------------
A     >0        >0
B     =0        (not called)
C     >0        =0
So from the table, only process C's if condition evaluates to TRUE.
It's important to remember that fork()1 never returned in C; C got a copy of the already evaluated result from its parent.
I hope this explains your question.
First off, fork() is a function. It may not always return the same value.
In this case specifically, fork is a function which creates another process. The original process gets a positive return value (of the child's pid) and the child process gets a return value of 0.
In your code, there end up being a total of three processes. The if statement will evaluate to true for 1 of them (process C below).
A
|__________B
|          |
|__C       |
|  |       |
|  |       |
shouldn't be always false?
No.
Because fork() is not a variable: each call to fork() creates a new child process.
Each call to fork() returns two values, one to each process. So for each decision, there's one process that takes each path.
The fork() function call returns 0 to the child process and the child's process ID to the parent process. Basically, this code forks once. The child gets 0, so && short-circuits and the child goes to the else branch; the parent then forks again. The parent of this second fork gets a non-zero value, so !fork() is 0 and it also goes to the else branch, while the child of the second fork gets 0 and executes the code in the if statement.

Confused with output of fork system call [duplicate]

Why does this program print “forked!” 4 times?
#include <stdio.h>
#include <unistd.h>
int main(void) {
fork() && (fork() || fork());
printf("forked!\n");
return 0;
}
One comes from main() and the other three from the three fork() calls.
Notice that all three forks() are going to be executed. You might want to take a look at the ref:
RETURN VALUE
Upon successful completion, fork() shall return 0 to the child process and shall return the process ID of the child process to the parent process. Both processes shall continue to execute from the fork() function. Otherwise, -1 shall be returned to the parent process, no child process shall be created, and errno shall be set to indicate the error.
Note that the process id cannot be zero, as stated here.
So what really happens?
We have:
fork() && (fork() || fork());
So the first fork() will return the child's non-zero process ID to the parent, while it will return 0 to the child process. That means that the first fork() of the logic expression evaluates to true in the parent process, while in the child process it evaluates to false and, due to short-circuit evaluation, the child will not call the remaining two fork()s.
So now we know that we are going to get at least two prints (one from main and one from the 1st fork()).
Next, the 2nd fork() is executed in the parent process; it returns a non-zero value to the parent process and zero to the new child process.
So now, the parent will not continue execution to the last fork() (due to short circuiting), while the child process will execute the last fork, since the first operand of || is 0.
So that means that we will get two more prints.
As a result, we get four prints in total.
Short circuiting
Here, short circuiting basically means that if the first operand of && is zero, then the other operand(s) is/are not evaluated. By the same logic, if an operand of || is non-zero, then the rest of the operands do not need evaluation. This happens because the rest of the operands cannot change the result of the logic expression, so they do not need to be executed, thus we save time.
See example below.
Process
Remember that a parent process creates offspring processes which in turn create other processes and so on. This leads to a hierarchy of processes (or a tree one could say).
Having this in mind, it's worth taking a look at this similar problem, as well as this answer.
Descriptive image
I made also this figure which can help, I guess. I assumed that the pid's fork() returned are 3, 4 and 5 for every call.
Notice that some fork()s have a red X above them, which means that they are not executed because of the short-circuiting evaluation of the logic expression.
The fork()s at the top are not going to be executed, because the first operand of the operator && is 0, thus the whole expression will result in 0, so no essence in executing the rest of the operand(s) of &&.
The fork() at the bottom will not be executed, since it's the second operand of a ||, where its first operand is a non-zero number, thus the result of the expression is already evaluated to true, no matter what the second operand is.
And in the next picture you can see the hierarchy of the processes, based on the previous figure.
Example of Short Circuiting
#include <stdio.h>
int main(void) {
if(printf("A printf() results in logic true\n"))
;//empty body
if(0 && printf("Short circuiting will not let me execute\n"))
;
else if(0 || printf("I have to be executed\n"))
;
else if(1 || printf("No need for me to get executed\n"))
;
else
printf("The answer wasn't nonsense after all!\n");
return 0;
}
Output:
A printf() results in logic true
I have to be executed
The first fork() returns a non-zero value in the calling process (call it p0) and 0 in the child (call it p1).
In p1 the shortcircuit for && is taken and the process calls printf and terminates. In p0 the process must evaluate the remainder of the expression. Then it calls fork() again, thus creating a new child process (p2).
In p0 fork() returns a non-zero value, and the shortcircuit for || is taken, so the process calls printf and terminates.
In p2, fork() returns 0 so the remainder of the || must be evaluated, which is the last fork(); that leads to the creation of a child for p2 (call it p3).
p2 then executes printf and terminates.
p3 then executes printf and terminates.
In total, 4 printfs are executed.
For all the downvoters, this is from a merged but different question. Blame SO. Thank you.
You can decompose the problem to three lines, the first and last lines both simply double the number of processes.
fork() && fork() || fork();
The operators are short-circuiting, so this is what you get:
                fork()
               /      \
             0/        \>0
     || fork()          && fork()
       /\                /   \
      /  \             0/     \>0
     *    *     || fork()      *
                  /   \
                 *     *
So this is altogether 4 * 5 = 20 processes each printing one line.
Note: If for some reason fork() fails (for example, you have some limit on the number of processes), it returns -1 and then you can get different results.
Executing fork() && (fork() || fork()), what happens:
Each fork gives 2 processes, with return values pid (parent) and 0 (child) respectively.
First fork:
  parent: return value is pid, not null => executes the && (fork() || fork())
    second fork, parent: value is pid, not null, stops executing the || part => prints forked
    second fork, child: value = 0 => executes the || fork()
      third fork, parent: prints forked
      third fork, child: prints forked
  child: return value is 0, stops executing the && part => prints forked
Total: 4 forked
I like all the answers that have already been submitted. Perhaps if you added a few more variables to your printf statement, it would be easier for you to see what is happening.
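#include <stdio.h>
#include <unistd.h>
int main(void){
/* child holds the 0-or-1 result of the logical expression, not a PID */
long child = fork() && (fork() || fork());
/* getpid() returns pid_t, so cast it to match the %ld conversion */
printf("forked! PID=%ld Child=%ld\n", (long)getpid(), child);
return 0;
}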
On my machine it produced this output:
forked! PID=3694 Child=0
forked! PID=3696 Child=0
forked! PID=3693 Child=1
forked! PID=3695 Child=1
This code:
fork();
fork() && fork() || fork();
fork();
ends up with 20 processes, so printf runs 20 times.
And for
fork() && fork() || fork();
printf runs a total of 5 times.

Differences between fork and exec

What are the differences between fork and exec?
The use of fork and exec exemplifies the spirit of UNIX in that it provides a very simple way to start new processes.
The fork call basically makes a duplicate of the current process, identical in almost every way. Not everything is copied over (for example, resource limits in some implementations) but the idea is to create as close a copy as possible.
The new process (child) gets a different process ID (PID) and has the PID of the old process (parent) as its parent PID (PPID). Because the two processes are now running exactly the same code, they can tell which is which by the return code of fork - the child gets 0, the parent gets the PID of the child. This is all, of course, assuming the fork call works - if not, no child is created and the parent gets an error code.
The exec call is a way to basically replace the entire current process with a new program. It loads the program into the current process space and runs it from the entry point.
So, fork and exec are often used in sequence to get a new program running as a child of a current process. Shells typically do this whenever you try to run a program like find - the shell forks, then the child loads the find program into memory, setting up all command line arguments, standard I/O and so forth.
But they're not required to be used together. It's perfectly acceptable for a program to fork itself without execing if, for example, the program contains both parent and child code (you need to be careful what you do, each implementation may have restrictions). This was used quite a lot (and still is) for daemons which simply listen on a TCP port and fork a copy of themselves to process a specific request while the parent goes back to listening.
Similarly, programs that know they're finished and just want to run another program don't need to fork, exec and then wait for the child. They can just load the child directly into their process space.
Some UNIX implementations have an optimized fork which uses what they call copy-on-write. This is a trick to delay the copying of the process space in fork until one of the processes attempts to change something in that space. A program that only forks then copies just the pages that actually get modified rather than the entire process space.
If exec is called right after fork (and this is what happens mostly), the child's process space is replaced by the new program, so most of the parent's space never needs to be copied at all.
Note that there is a whole family of exec calls (execl, execle, execve and so on) but exec in context here means any of them.
The following diagram illustrates the typical fork/exec operation where the bash shell is used to list a directory with the ls command:
+--------+
| pid=7  |
| ppid=4 |
| bash   |
+--------+
    |
    | calls fork
    V
+--------+             +--------+
| pid=7  |    forks    | pid=22 |
| ppid=4 | ----------> | ppid=7 |
| bash   |             | bash   |
+--------+             +--------+
    |                      |
    | waits for pid 22     | calls exec to run ls
    |                      V
    |                  +--------+
    |                  | pid=22 |
    |                  | ppid=7 |
    |                  | ls     |
    V                  +--------+
+--------+                 |
| pid=7  |                 | exits
| ppid=4 | <---------------+
| bash   |
+--------+
    |
    | continues
    V
fork() splits the current process into two processes. Or in other words, your nice linear easy to think of program suddenly becomes two separate programs running one piece of code:
pid_t pid = fork();
if (pid == 0)
{
printf("I'm the child\n");
}
else if (pid > 0)
{
printf("I'm the parent, my child is %ld\n", (long)pid);
// here we can kill the child, but that's not very parently of us
}
else
{
perror("fork"); // fork failed: no child was created
}
This can kind of blow your mind. Now you have one piece of code with pretty much identical state being executed by two processes. The child process inherits all the code and memory of the process that just created it, including starting from where the fork() call just left off. The only difference is the fork() return code to tell you if you are the parent or the child. If you are the parent, the return value is the id of the child.
exec is a bit easier to grasp: you just tell exec to run the target executable, which replaces the program in the current process, so you don't end up with two processes running the same code or inheriting the same state. As Steve Hawkins says, exec can be used after you fork to execute the target executable in the current process.
I think some concepts from "Advanced Unix Programming" by Marc Rochkind were helpful in understanding the different roles of fork()/exec(), especially for someone used to the Windows CreateProcess() model:
A program is a collection of instructions and data that is kept in a regular file on disk. (from 1.1.2 Programs, Processes, and Threads)
.
In order to run a program, the kernel is first asked to create a new process, which is an environment in which a program executes. (also from 1.1.2 Programs, Processes, and Threads)
.
It’s impossible to understand the exec or fork system calls without fully understanding the distinction between a process and a program. If these terms are new to you, you may want to go back and review Section 1.1.2. If you’re ready to proceed now, we’ll summarize the distinction in one sentence: A process is an execution environment that consists of instruction, user-data, and system-data segments, as well as lots of other resources acquired at runtime, whereas a program is a file containing instructions and data that are used to initialize the instruction and user-data segments of a process. (from 5.3 exec System Calls)
Once you understand the distinction between a program and a process, the behavior of fork() and exec() function can be summarized as:
fork() creates a duplicate of the current process
exec() replaces the program in the current process with another program
(this is essentially a simplified 'for dummies' version of paxdiablo's much more detailed answer)
fork creates a copy of the calling process. Its use generally follows this structure:
pid_t cpid = fork();
if (cpid == 0)
{
// child code
exit(0);
}
// parent code
waitpid(cpid, NULL, 0); // wait for the child to finish
// end
(For the child process, the text (code), data and stack are the same as the calling process's.)
The child process executes the code in the if block.
exec replaces the current process's code, data and stack with those of a new program. Its use generally follows this structure:
pid_t cpid = fork();
if (cpid == 0)
{
// child code
execlp("foo", "foo", (char *)NULL); // "foo" stands for the program to run
_exit(127); // reached only if the exec failed
}
// parent code
waitpid(cpid, NULL, 0); // wait for the child to finish
// end
(After the exec call, the UNIX kernel clears the child process's text, data and stack and fills them with foo's text and data.)
Thus the child process runs different code (foo's code, not the same as the parent's).
They are used together to create a new child process. First, calling fork creates a copy of the current process (the child process). Then, exec is called from within the child process to "replace" the copy of the parent's program with the new program.
The process goes something like this:
child = fork(); // fork returns the child's PID to the parent process, 0 to the child, or -1 on failure
if (child < 0) {
std::cout << "Failed to fork GUI process...Exiting" << std::endl;
exit (-1);
} else if (child == 0) { // This is the Child Process
// Call one of the "exec" functions to load the new program into the child process
execvp (argv[0], const_cast<char**>(argv));
_exit (127); // only reached if execvp failed
} else { // This is the Parent Process
//Continue executing parent process
}
The main difference between fork() and exec() is this:
The fork() system call creates a clone of the currently running process. The original process continues execution from the point where the fork() call returns, and the clone starts execution at that same point, with fork() returning a different value in each.
Look at the following code that i got from http://timmurphy.org/2014/04/26/using-fork-in-cc-a-minimum-working-example/
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
printf("--beginning of program\n");
int counter = 0;
pid_t pid = fork();
if (pid == 0)
{
// child process
int i = 0;
for (; i < 5; ++i)
{
printf("child process: counter=%d\n", ++counter);
}
}
else if (pid > 0)
{
// parent process
int j = 0;
for (; j < 5; ++j)
{
printf("parent process: counter=%d\n", ++counter);
}
}
else
{
// fork failed
printf("fork() failed!\n");
return 1;
}
printf("--end of program--\n");
return 0;
}
This program declares a counter variable, set to zero, before fork()ing. After the fork call, we have two processes running in parallel, both incrementing their own version of counter. Each process will run to completion and exit. Because the processes run in parallel, we have no way of knowing which will finish first. Running this program will print something similar to what is shown below, though results may vary from one run to the next.
--beginning of program
parent process: counter=1
parent process: counter=2
parent process: counter=3
child process: counter=1
parent process: counter=4
child process: counter=2
parent process: counter=5
child process: counter=3
--end of program--
child process: counter=4
child process: counter=5
--end of program--
The exec() family of system calls replaces the currently executing code of a process with another piece of code. The process retains its PID but it becomes a new program. For example, consider the following code:
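#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
char program[80], *args[3];
int i;
printf("Ready to exec()...\n");
fflush(stdout); /* buffered output would be lost when the program image is replaced */
strcpy(program, "date");
args[0] = "date";
args[1] = "-u";
args[2] = NULL;
i = execvp(program, args); /* returns only if the exec failed */
printf("i=%d ... did it work?\n", i);
return 1;
}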
This program calls the execvp() function to replace its code with the date program. If the code is stored in a file named exec1.c, then executing it produces the following output:
Ready to exec()...
Tue Jul 15 20:17:53 UTC 2008
The program outputs the line "Ready to exec()..." and, after calling the execvp() function, replaces its code with the date program. Note that the line "... did it work?" is not displayed, because at that point the code has been replaced. Instead, we see the output of executing "date -u".
fork() creates a copy of the current process, with execution in the new child starting from just after the fork() call. After the fork(), they're identical, except for the return value of the fork() function. (RTFM for more details.) The two processes can then diverge still further, with one unable to interfere with the other, except possibly through any shared file handles.
exec() replaces the current process with a new one. It has nothing to do with fork(), except that an exec() often follows fork() when what's wanted is to launch a different child process, rather than replace the current one.
fork():
It creates a copy of running process. The running process is called parent process & newly created process is called child process. The way to differentiate the two is by looking at the returned value:
fork() returns the process identifier (pid) of the child process in the parent
fork() returns 0 in the child.
exec():
It does not create a new process. It loads a new program into the current process, replacing the existing one.
fork() + exec():
The usual way to launch a new program is to first fork(), creating a new process, and then exec() (i.e. load into memory and execute) the program binary it is supposed to run.
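#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int main( int argc, char *argv[] )
{
(void)argc; // unused
pid_t pid = fork();
if ( pid == 0 )
{
execvp( "find", argv );
_exit( 127 ); // reached only if execvp failed
}
// Block the parent until the child has finished executing
wait( NULL );
return 0;
}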
The prime example for understanding the fork() and exec() concept is the shell, the command interpreter program that users typically run after logging into the system. The shell interprets the first word of the command line as a command name.
For many commands, the shell forks and the child process execs the command associated with the name, treating the remaining words on the command line as parameters to the command.
The shell allows three types of commands. First, a command can be an executable file that contains object code produced by compilation of source code (a C program, for example). Second, a command can be an executable file that contains a sequence of shell command lines. Finally, a command can be an internal shell command, built into the shell instead of being an executable file (e.g. cd; ls, by contrast, is an ordinary executable).

Resources