Understanding POSIX - fork() - c

I was reading about the fork function and how it creates new processes. The following program runs fine and prints here sixteen times, but I am having trouble understanding the flow of execution:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>
int main()
{
int i;
for (i = 0; i < 4; i++) { // line no. 12
fork(); // line no. 13
}
printf("%s\n", "here");
return 0;
}
It seems to me that there are two ways this program can be viewed as:
1st approach: fork() is called a total of four times. If I replace the loop with four calls to the fork() function, things seem to fall in place and I understand why here is printed 2 ^ 4 times.
2nd approach: fork() spawns a new process exactly from where it is invoked and each of these child processes have their own local variables. So, after line no. 13, each of these child processes see the end of the loop (}) and they go to line no. 12. Since, all these child processes have their own local variable i set to 0 (maybe i is set to some garbage value?), they all fork again. Again for these child processes their local variable i is set to 0. This should result in a fork bomb.
I am certainly missing something in my 2nd approach. Could someone please help?
Thanks.

Your 2nd approach is not right, because after fork() the child process inherits the current value of i. It is neither reset to 0 every time fork() is called nor left holding a garbage value. So your code can't be a fork bomb. The fact that i is a local variable is irrelevant: fork() clones pretty much everything, and the child process is identical to its parent except for certain things noted in the POSIX manual.
I'll reduce the loop count to 2 for ease of explaining and assume all fork() calls succeed:
for (i = 0; i < 2; i++) {
fork();
}
printf("%s\n", "here");
1) When i=0, fork() is executed and there are two processes now. Call them P1 and P2.
2) Now P1 and P2 each continue the loop from i=0 and increment i to 1. The for loop condition is still true, so each of them forks once more, giving 4 processes in total. Call them P1a & P1b and P2a & P2b. All 4 processes now have i=1 and increment it to 2 (as they continue the loop).
3) Now all 4 processes have i equal to 2, the for loop condition is false in all of them, and "here" is printed 4 times (once by each process).
If it helps, you can convert the for loop to a while loop; how i gets incremented by both processes returning from each fork() might then become a bit clearer:
i = 0;
while(i < 2) {
fork();
i++;
}
printf("%s\n", "here");
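To check the doubling rather than count printed lines by eye, here is a minimal sketch, assuming every fork() succeeds. The helper name count_processes_after_loop and the pipe-based counting are my own illustration, not part of the original program: every process that gets past the loop writes one byte into a pipe, and the original process counts the bytes once all its descendants have exited.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run n loop iterations that each call fork(), then
 * return how many processes reached the code after the loop. */
int count_processes_after_loop(int n)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t original = getpid();

    for (int i = 0; i < n; i++)
        fork();                 /* the child inherits the current value of i */

    /* Every process, parent or child, gets here exactly once. */
    char mark = 'x';
    ssize_t written = write(fds[1], &mark, 1);
    (void)written;
    close(fds[1]);

    /* Wait for our own children so the whole tree finishes before we do. */
    while (wait(NULL) > 0)
        ;

    if (getpid() != original)
        _exit(0);               /* descendants stop here */

    /* Only the original process counts the marks. */
    int count = 0;
    while (read(fds[0], &mark, 1) == 1)
        count++;
    close(fds[0]);
    return count;
}
```

Calling count_processes_after_loop(2) from main() should give 4 and count_processes_after_loop(4) should give 16, matching the walk-through. A raw write() is used instead of printf() so that stdio buffering cannot duplicate output across the forks.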

Your first approach was right.
That's a rather boring answer, so I'll give you all the technical details.
When fork is called, several things happen:
A new process is created ('the child').
The parent's memory (stack, heap and global data) is duplicated and assigned to the child.
The stack pointer of the child is set to that of the parent.
The PID (process ID) of the child is returned to the parent.
Zero is returned to the child.
Variables declared inside a function are stored on the stack, and therefore start at the same value in both processes, but are not shared.
Variables declared outside a function (at the top level) are not on the stack, but they are copied in the same way: they also start at the same value and are not shared between child and parent.
(Some other things are or are not duplicated; see man fork for more information.)
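One way to check what is and is not shared after fork() is to let the child overwrite a file-scope variable and a local one, and then look at the parent's copies. This is a minimal sketch, assuming fork() succeeds; global_counter and parent_sum_after_child_writes are names made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int global_counter = 1;     /* file-scope variable, not on the stack */

/* Hypothetical helper: the child changes both variables, then the parent
 * reports what it still sees in its own copies. */
int parent_sum_after_child_writes(void)
{
    int local_counter = 1;  /* automatic variable, on the stack */

    pid_t pid = fork();
    if (pid == -1)
        return -1;          /* fork failed */

    if (pid == 0) {
        /* Child: these writes only touch the child's copies. */
        global_counter = 100;
        local_counter = 100;
        _exit(0);
    }

    /* Parent: wait until the child has finished writing. */
    waitpid(pid, NULL, 0);
    return global_counter + local_counter;
}
```

The parent gets 2 back: both its global and its local variable still hold 1, because the child was modifying its own copies of the data.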
So, when you run your code:
   what happens                      # of processes
1. the parent forks.                 2
2. the parent and its child fork.    4
3. everyone forks.                   8
4. everyone forks.                   16
5. everyone prints "here".           16
You end up with sixteen processes, and the word 'here' sixteen times.
basically,
if(fork() != 0) {
parent_stuff();
} else {
child_stuff();
}
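A slightly fuller version of that pattern also handles the -1 that fork() returns on failure and collects the child's exit status. This is only a sketch; run_child is a hypothetical helper name, and the child does nothing except exit with the code it is given.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and return the
 * exit status the parent collects, or -1 if something went wrong. */
int run_child(int code)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;              /* fork failed: no child was created */

    if (pid == 0)
        _exit(code);            /* child: fork() returned 0 */

    /* Parent: fork() returned the child's PID. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_child(7) returns 7 in the parent once the child has exited.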

Using the following code, you can easily see how fork() carries over the value of the variable i:
for (i = 0; i < 4; i++) {
printf("%d %s\n", i, "here");
fork();
}
As you would expect, the child process copies the value from the parent process, so we get 1 line with i = 0, 2 lines with i = 1, 4 lines with i = 2 and 8 lines with i = 3, which I think answers your 2nd question.
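The per-value tally can also be checked mechanically. In this sketch (assuming all fork() calls succeed) a one-byte write() into a pipe stands in for the printf, so stdio buffering plays no part. The helper name count_per_iteration is invented for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: counts[i] receives the number of processes that
 * executed loop iteration i, with the "print" placed before the fork(). */
int count_per_iteration(int n, int counts[])
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t original = getpid();

    for (int i = 0; i < n; i++) {
        unsigned char tag = (unsigned char)i;
        ssize_t written = write(fds[1], &tag, 1);   /* stands in for printf */
        (void)written;
        fork();
    }
    close(fds[1]);

    /* Each process waits for its own children, then descendants exit. */
    while (wait(NULL) > 0)
        ;
    if (getpid() != original)
        _exit(0);

    /* The original process tallies one byte per executed iteration. */
    for (int i = 0; i < n; i++)
        counts[i] = 0;
    unsigned char tag;
    while (read(fds[0], &tag, 1) == 1)
        counts[tag]++;
    close(fds[0]);
    return 0;
}
```

With n = 4 the array ends up as 1, 2, 4, 8: the number of processes executing each iteration doubles every time round.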

Related

Visually what happens to fork() in a For Loop

I have been trying to understand fork() behavior. This time in a for-loop. Observe the following code:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    int i;
    for (i=0;i<3;i++)
    {
        fork();
        // This printf statement is for debugging purposes
        // getppid(): gets the parent process-id
        // getpid(): gets the process-id of the calling process
        printf("[%d] [%d] i=%d\n", getppid(), getpid(), i);
    }
    printf("[%d] [%d] hi\n", getppid(), getpid());
    return 0;
}
Here is the output:
[6909][6936] i=0
[6909][6936] i=1
[6936][6938] i=1
[6909][6936] i=2
[6909][6936] hi
[6936][6938] i=2
[6936][6938] hi
[6938][6940] i=2
[6938][6940] hi
[1][6937] i=0
[1][6939] i=2
[1][6939] hi
[1][6937] i=1
[6937][6941] i=1
[1][6937] i=2
[1][6937] hi
[6937][6941] i=2
[6937][6941] hi
[6937][6942] i=2
[6937][6942] hi
[1][6943] i=2
[1][6943] hi
I am a very visual person, and so the only way for me to truly understand things is by diagramming. My instructor said there would be 8 hi statements. I wrote and ran the code, and indeed there were 8 hi statements. But I really didn’t understand it. So I drew the following diagram:
Diagram updated to reflect comments :)
Observations:
Parent process (main) must iterate the loop 3 times. Then printf is called
On each iteration of parent for-loop a fork() is called
After each fork() call, i is incremented, and so every child starts a for-loop from i before it is incremented
At the end of each for-loop, "hi" is printed
Here are my questions:
Is my diagram correct?
Why are there two instances of i=0 in the output?
What value of i is carried over to each child after the fork()? If the same value of i is carried over, then when does the "forking" stop?
Is it always the case that 2^n - 1 would be a way to count the number of children that are forked? So, here n=3, which means 2^3 - 1 = 8 - 1 = 7 children, which is correct?
Here's how to understand it, starting at the for loop.
Loop starts in parent, i == 0
Parent fork()s, creating child 1.
You now have two processes. Both print i=0.
Loop restarts in both processes, now i == 1.
Parent and child 1 fork(), creating children 2 and 3.
You now have four processes. All four print i=1.
Loop restarts in all four processes, now i == 2.
Parent and children 1 through 3 all fork(), creating children 4 through 7.
You now have eight processes. All eight print i=2.
Loop restarts in all eight processes, now i == 3.
Loop terminates in all eight processes, as i < 3 is no longer true.
All eight processes print hi.
All eight processes terminate.
So you get 0 printed two times, 1 printed four times, 2 printed 8 times, and hi printed 8 times.
1. Yes, it's correct (see below).
2. No, i++ is executed after the call to fork(), because that's the way the for loop works.
3. If all goes well, yes. However, remember that fork() may fail.
A little explanation on the second one:
for (i = 0;i < 3; i++)
{
fork();
}
is similar to:
i = 0;
while (i < 3)
{
fork();
i++;
}
So i in the forked processes (both parent and child) has the value from before the increment. However, the increment is executed immediately after fork(), so in my opinion the diagram can be treated as correct.
To answer your questions one by one:
Is my diagram correct?
Yes, essentially. It's a very nice diagram, too.
That is to say, it's correct if you interpret the i=0 etc. labels as referring to full loop iterations. What the diagram doesn't show, however, is that, after each fork(), the part of the current loop iteration after the fork() call is also executed by the forked child process.
Why are there two instances of i=0 in the output?
Because you have the printf() after the fork(), so it's executed by both the parent process and the just forked child process. If you move the printf() before the fork(), it will only be executed by the parent (since the child process doesn't exist yet).
What value of i is carried over to each child after the fork()? If the same value of i is carried over, then when does the "forking" stop?
The value of i is not changed by fork(), so the child process sees the same value as its parent.
The thing to remember about fork() is that it's called once, but it returns twice — once in the parent process, and once in the newly cloned child process.
For a simpler example, consider the following code:
printf("This will be printed once.\n");
fork();
printf("This will be printed twice.\n");
fork();
printf("This will be printed four times.\n");
fork();
printf("This will be printed eight times.\n");
The child process created by fork() is an (almost) exact clone of its parent, and so, from its own viewpoint, it "remembers" being its parent, inheriting all of the parent process's state (including all variable values, the call stack and the instruction being executed). The only immediate difference (other than system metadata such as the process ID returned by getpid()) is the return value of fork(), which will be zero in the child process but non-zero (actually, the ID of the child process) in the parent.
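Those differences are easy to observe directly. The sketch below is a hypothetical helper (check_fork_identity is not from the original post): it has the child verify that it has a new PID and that its parent is the process that called fork(), while the parent verifies that the value fork() returned is the PID it later reaps.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the parent/child relationships are as
 * described, 0 if not, and -1 if fork() itself failed. */
int check_fork_identity(void)
{
    pid_t parent = getpid();
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: fork() returned 0, we have a new PID, and the process
         * that called fork() is our parent. */
        int ok = (getpid() != parent) && (getppid() == parent);
        _exit(ok ? 0 : 1);
    }

    /* Parent: fork() returned the child's PID, which waitpid() confirms. */
    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    return reaped == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The getppid() check only holds while the parent is still alive, which it is here because it is blocked in waitpid().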
Is it always the case that 2^n - 1 would be a way to count the number of children that are forked? So, here n=3, which means 2^3 - 1 = 8 - 1 = 7 children, which is correct?
Every process that executes a fork() turns into two processes (except under unusual error conditions, where fork() might fail). If the parent and child keep executing the same code (i.e. they don't check the return value of fork(), or their own process ID, and branch to different code paths based on it), then each subsequent fork will double the number of processes. So, yes, after three forks, you will end up with 2³ = 8 processes in total.
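For contrast, here is a sketch of the other case, where the code does branch on the return value. Each child exits straight away instead of continuing the loop, so n iterations create exactly n children rather than 2^n - 1. spawn_children is a made-up name, and the children do no real work.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create up to n children from a single parent and
 * return how many of them were reaped. */
int spawn_children(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;          /* fork failed: stop creating children */
        if (pid == 0)
            _exit(0);       /* child: never reaches the next fork() */
    }

    /* wait() returns -1 once there are no children left to collect. */
    int reaped = 0;
    while (wait(NULL) > 0)
        reaped++;
    return reaped;
}
```

spawn_children(3) reaps 3 children, compared with the 7 created when parent and children all keep looping.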

I don't understand this diagram of fork()

How do we get this process tree with this condition? What is the schema of the processes?
#include <unistd.h>
#include <sys/wait.h>
int main (int argc, char **argv) {
    int i;
    int pid;
    for (i = 0; i < 3; i++) {
        pid = fork();
        if (pid < 0) break; // I don't understand this condition
    }
    while (wait(NULL) != -1);
    return 0;
}
fork() splits a process in two, and returns 0 (if this process is the child), or the PID of the child (if this process is the parent), or -1 if the fork failed. So, this line:
if (pid < 0) break;
Says "exit the loop if we failed to create a child process".
The diagram is a little confusing because of the way the processes (circles) correspond to the fork() calls in the loop. The three child processes of the main process are created when i is 0, 1, and 2 respectively (see the diagram at the bottom of this post).
Since the loop continues in both the parent and the child process from the point fork was called, this is how the forks happen:
i == 0: fork is called in the original parent. There are now two processes (the top one, and the left one).
i == 1: fork is called in the two existing processes. New children are the leftmost child on the second layer from the bottom, and the middle child on the third layer from the bottom. There are now four processes.
i == 2: fork is called in all existing processes. New children are all remaining nodes (the bottom node, the two rightmost nodes in the second layer from the bottom, and the rightmost node in the third layer from the bottom).
i == 3: All 8 processes exit the loop
Here is the diagram again, with numbers indicating what the value of i was in the loop when the process was created:
         -1        <--- this is the parent that starts the loop
       /  |  \
      0   1   2
     / \  |
    1   2 2
    |
    2
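The shape of the tree follows from one rule: a process created during iteration k forks once in each later iteration, k+1 up to n-1. The sketch below simulates that rule without forking anything. The function names are mine, and -1 is used for the original process to match the numbering in the diagram.

```c
/* Direct children of a process created during iteration `created_at` of a
 * loop with n iterations: one for each remaining iteration. */
int count_children(int created_at, int n)
{
    return n - 1 - created_at;
}

/* All descendants of that process: each child created at iteration i,
 * plus everything that child goes on to create. */
int count_descendants(int created_at, int n)
{
    int total = 0;
    for (int i = created_at + 1; i < n; i++)
        total += 1 + count_descendants(i, n);
    return total;
}
```

For n = 3, count_children gives 3, 2, 1 and 0 for processes created at -1, 0, 1 and 2, which matches the fan-out in the diagram, and count_descendants(-1, 3) gives 7, i.e. 8 processes including the original.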
To understand your diagram you must rely on the behavior of fork: it splits the process in two, creating another process identical to the first (except for the PID) in a new memory location.
If you call it in a loop, this is what happens:
When i=0 the first process is split, creating another process that starts running from exactly this point (so it skips the first iteration). Focusing on the first process: it continues the loop, generating another process when i=1. That second process thus starts from i=1, so it skips the first two iterations. The first process is split one last time when i=2. The last copy created, however, starts running from i=2, so it exits the loop without generating anything.
The first copy created continues the loop with i=1 and i=2, generating two processes, while the second copy continues with i=2 only, generating one.
You can continue this reasoning and understand the rest of the diagram.
As others pointed out, if (pid < 0) is just a check to see if there are errors and does not modify the logic of the code.
fork() returns -1 if the call failed; otherwise it returns the child's PID in the parent and 0 in the child. The condition you're looking at doesn't really matter to the functioning of the code; it just says that if there's an error with fork(), the loop is exited. If there's no error in the fork() call, the process tree in your diagram will be built.
The reason why is that the same loop will continue running in the child processes. So the children will also continue to fork based on the value of i at the time fork was called.
fork() returns -1 on error, and 0 or a positive value otherwise, so the line if (pid < 0) break; says "if there was an error, exit the loop".
Assuming that there is no error, it goes something like this:
At the beginning, i=0, and you have one process. Let's call it p0.
At the line fork();, p0 creates another process. Let's call it p1.
In each of them, i++ is executed (so now i is 1), and the loop iterates again.
p0 and p1 each execute fork(); separately, so each of them creates another process. Let's call the new processes p2 and p3.
Now, in every process, i++ sets i to 2, and the loop runs again.
Each of the 4 processes we have runs the line fork(); and creates a new process, so now we also have p4, p5, p6 and p7.
Every process increases its i to 3, and then, since the loop condition is now false, the loop finally ends.
Now the 8 processes arrive (separately) at the next line.
(In fact, every iteration doubles the number of processes, so if you change the 3 to, for example, 15, you will have 2^15 processes at the end.)

fork() execution in for loop

int main(int argc, char** argv) {
int i = 0;
while (i < 2) {
fork();
system("ps -o pid,ppid,comm,stat");
i++;
}
return (EXIT_SUCCESS);
}
Can anyone tell me how many times ps command is executed with an explanation?
I believe the answer is 6.
In the first iteration, fork() is called, splitting the process in 2, so ps is called twice.
In the second iteration, fork() is called again in each process, so you now have 4 processes running ps.
Total calls to ps: 2+4=6.
6 times.
It creates a process tree like this:
A-+
  |-B-+
  |   |-C-+
  |-D
A does it twice (i=0)
B does it twice (i=0)
C does it once (i=1)
D does it once (i=1)
Note that my usage of letters is to distinguish them. There's no predictable output ordering since process switching is non-deterministic to the eyes of a programmer.
Initial Process
  i == 0
    -> Fork 1
         system call
         i == 1
           -> Fork 1.1
                system call
           system call
    system call
    i == 1
      -> Fork 2
           system call
      system call
I count 6, 2 each from the initial process and the first fork (4), and one from each process forked when i == 1 from those 2 processes.
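The same count can be reproduced with a few lines of arithmetic. This is a sketch that only models the loop and forks nothing, assuming every fork() succeeds; count_calls is a hypothetical name. The number of processes doubles at each fork(), and every process then makes one system() call in that iteration.

```c
/* Hypothetical model of the loop: returns the total number of system()
 * calls made across all processes after `iterations` loop iterations. */
int count_calls(int iterations)
{
    int processes = 1;
    int calls = 0;

    for (int i = 0; i < iterations; i++) {
        processes *= 2;         /* fork() runs in every live process */
        calls += processes;     /* each process then calls system() once */
    }
    return calls;
}
```

count_calls(2) is 2 + 4 = 6, and in general the total is 2^(n+1) - 2 for n iterations.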
Of course that's assuming you fix the missing end brace (and define EXIT_SUCCESS), otherwise none, since it won't compile. :-)

C: printf (with fork())

In the next code:
int i = 1;
fork();
i=i*2;
fork();
i=i*2;
fork();
i=i*2;
printf("%d\n", i);
Why is 8,8,8,8,8,8,8,8 printed, and not 1,2,2,4,4,8,8,8? fork() duplicates the process, and I expected each copy to print the value i had before its fork. What am I missing?
Given the code shown, you should be seeing eight lots of 6 (you wrote i = i + 2; instead of i = i * 2; for the last computation).
Since each process follows the same code path, each process will produce the same result.
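To get different results in different processes, you'd have to track whether each fork() returned in the parent or in the child. In the version below only the parent goes on to the next fork(), so four processes print 1, 2, 4 and 8 (in some order):
int i = 1;
if (fork())
{
    i=i*2;
    if (fork())
    {
        i=i*2;
        if (fork())
            i=i*2; // + --> *
    }
}
printf("%d\n", i);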
I'm assuming there are no problems with the fork() operation. It is also interesting to note that you could invert any or all of the conditions and end up with the same result.
Because after fork() each process continues executing the code downwards. So each of the processes runs through every i = i * 2 as it spawns more children, which gives the output you see and not the one you expected (i.e. a process doesn't jump to the end of the block once forked).
Info on fork: http://www.csl.mtu.edu/cs4411/www/NOTES/process/fork/create.html
Each new process gets a copy of the stack of the parent, so immediately after calling fork(), both parent and child have the same value for i -- but they don't have the same stack, just a copy... so changing i's value in one process has no effect on the other.
If you want two parallel pieces of code to share the same memory, either use threads (and memory that's in the heap, not on the stack), or use an explicit shared memory region.

Understanding forks in C

I am having some trouble understanding the following simple C code:
int main(int argc, char *argv[]) {
int n=0;
fork();
n++;
printf("hello: %d\n", n);
}
My current understanding of a fork is that from that line of code on, it will split the rest of the code in 2, that will run in parallel until there is "no more code" to execute.
From that prism, the code after the fork would be:
a)
n++; //sets n = 1
printf("hello: %d\n", n); //prints "hello: 1"
b)
n++; //sets n = 2
printf("hello: %d\n", n); //prints "hello: 2"
What happens, though, is that both print
hello: 1
Why is that?
EDIT: Only now did it occur to me that, unlike threads, processes don't share the same memory. Is that right? If yes, then that'd be the reason.
After fork() you have two processes, each with its own "n" variable.
fork() starts a new process, sharing no variables/memory locations.
It is very similar to what happens if you execute ./yourprogram twice in a shell, assuming the first thing the program does is forking.
When fork() returns, both processes might still be referring to the same physical copy of n (copy-on-write). But at n++, each gets its own copy, starting from n=0. After n++, n is 1 in both processes, and the printf statement outputs this value.
What you actually do is spawn a new process running the same program. It is not a closure kind of thing. You could use pipes to exchange data between parent and child.
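A minimal sketch of the pipe approach, assuming fork() and pipe() succeed: the child increments its own copy of n and sends the result back, since the parent cannot see the child's variable directly. The helper name value_from_child is invented for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child increments its copy of n and writes it to
 * a pipe; the parent returns the value it reads, or -1 on failure. */
int value_from_child(int n)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);                      /* child only writes */
        n++;                                /* changes the child's copy only */
        ssize_t w = write(fds[1], &n, sizeof n);
        _exit(w == (ssize_t)sizeof n ? 0 : 1);
    }

    close(fds[1]);                          /* parent only reads */
    int received = -1;
    ssize_t r = read(fds[0], &received, sizeof received);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return r == (ssize_t)sizeof received ? received : -1;
}
```

value_from_child(0) returns 1: the parent's own n is still 0, but it learns the child's value through the pipe.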
You did indeed answer your own question in your edit.
Examine this code and everything should be clearer (see the man pages if you don't know what a certain function does):
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
int count = 1;
int main(int argc, char *argv[]) {
    // seed the random number generator
    srand(time(NULL));
    // -1 until fork() has been called, so nothing is printed before that
    int pid = -1;
    // as long as count is <= 50
    for (; count<=50; count++) {
        // create a new process when count == 9
        if (count==9) {
            pid = fork();
            // reseed so that parent and child draw different random numbers
            srand(time(NULL)+pid);
        }
        if (count<=25) {
            // sleep for 300 ms
            usleep(3*100000);
        } else {
            // create a random number between 1 and 5
            int r = ( rand() % 5 ) + 1;
            // sleep for r * 100 ms
            usleep(r*100000);
        }
        if (pid==0) {
            printf("Child: count:%d pid:%d\n", count, pid);
        } else if (pid>0) {
            printf("Father: count:%d pid:%d\n", count, pid);
        }
    }
    return 0;
}
happy coding ;-)
The system call forks more than the execution thread: also forked is the data space. You have two n variables at that point.
There are a few interesting things that follow from all this:
A program that fork()s must consider unwritten output buffers. They can be flushed before the fork, or cleared after the fork, or the program can _exit() instead of exit() to at least avoid automatic buffer flushing on exit.
Fork is often implemented with copy-on-write in order to avoid unnecessarily duplicating a large data memory that won't be used in the child.
Finally, an alternate call, vfork(), has been revived in most current Unix versions, after vanishing for a period of time following its introduction in 3BSD. vfork() does not pretend to duplicate the data space, and so the implementation can be even faster than a copy-on-write fork(). (Its implementation in Linux may be due less to speed reasons than because a few programs actually depend on the vfork() semantics.)
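The point about unwritten output buffers can be demonstrated with a temporary file. This is a sketch under the assumption that the stream is fully buffered, as file streams normally are; copies_written is a made-up name. The parent buffers one character, forks, and then both processes flush their copy of the buffer.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns how many copies of a single buffered 'x'
 * end up in the file, depending on whether we flush before fork(). */
int copies_written(int flush_before_fork)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    fputs("x", f);              /* sits in the stdio buffer for now */
    if (flush_before_fork)
        fflush(f);              /* buffer is empty when fork() copies it */

    pid_t pid = fork();
    if (pid == -1) {
        fclose(f);
        return -1;
    }
    if (pid == 0) {
        fclose(f);              /* flushes the child's copy of the buffer */
        _exit(0);               /* _exit(): no further stdio flushing */
    }
    waitpid(pid, NULL, 0);

    fflush(f);                  /* flushes the parent's copy of the buffer */
    rewind(f);
    int count = 0;
    while (fgetc(f) == 'x')
        count++;
    fclose(f);
    return count;
}
```

On a typical system I would expect copies_written(0) to return 2, because the unflushed character is written once by each process, and copies_written(1) to return 1.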

Resources