Understanding fork() order in C

Understanding fork() order in C - c

So I have this program I'm trying to understand, its from an old exam but I just cant get a grip of it. How do I know the order of the forks and how the variables are changed?
static int g = -1;
int main(int argc, char *argv[])
{
int v = 0;
pid_t p;
while (v++ < 6)
if ((p = fork()) < 0) {
perror("fork error");
return 1;
} else if (p == 0) {
v++;
g--;
} else {
g++;
v+=3;
if (waitpid(p, NULL, 0) != p) {
perror("waitpid error");
return 1;
}
}
printf("mypid = %d parentpid = %d p = %d v = %d g = %d\n",
getpid(), getppid(), p, v, g);
return 0;
}

Each call to fork generates its own process with its own variables, which are copied at the time of the call (logically; optimization may change when the actual copy happens, but not in a way that'll change the outcome).
So when you enter the loop, v gets incremented to 1, then you fork. At this point the parent process has g=-1, v=1, p= and the new child has g=-1, v=1, p=0. The parent then drops into the else case, incrementing g to 0 and v to 4 and then waiting for the child to complete, whereas the child drops into the "else if (p == 0)", increments v to 2, decrements g to -2, and goes around the loop again.
From there, you've hopefully now got enough information to follow the logic as the next two child processes get forked off, finish off the loop, and print their respective results. When they do, the first child will also come to the end of its waitpid with v=6, drop out of the loop, and print its results.
At this point, the parent will unblock, go around the loop one more time (forking off one more child along the way), and (once that child has completed) drop out of the loop.

The call to fork() both starts a new process and continues the old one. If there is some kind of error, it returns an error value. All errors and only errors are negative numbers. This is what the first if block checks.
In the new process, fork() returns 0. The branch that increments v and decrements g is therefore called only in the child process, not the parent.
In the original process, the fork() function returns the process identifier (PID) of the daughter process, which is a positive integer. (This will later be passed to waitpid(). Therefore, the branch that decrements v and increments g is only called in the parent process, not the child.
Each process has its own copy of v and g. (That’s the main difference between a process and a thread: threads share memory.) On a modern SMP operating system, what will happen is that the child process gets a copy of the parent’s memory map. but these refer to the same pages of physical memory until one process or the other writes to them. When that happens, a copy is made of that page of memory and both processes now get their own, different copies.
The way modern Linux kernels implement fork(), the child process will continue before the parent does. This made a significant difference to performance. Most programs that call fork() immediately have the child process call exec() to start a new program. That means it isn’t going to need its copy of the parent’s memory at all. (There is a newer, simpler way to start a different program in a new process now, posix_spawn().) The parent process, on the other hand, almost always keeps running and modifying its memory. Therefore, giving the child the chance to declare that it’s going to discard the memory it inherited means that the parent doesn’t need to worry about leaving an unmodified copy of any memory pages for its children, and the kernel does not have to go through the rigmarole of copy-on-write.
In practice, though, any decent compiler will keep both local variables in registers, so this issue will not arise.
On the next iteration of the loop, which only happens after the child process terminates, a new child process is spawned using the updated values of the parent’s variables. Each child process also continues to run the loop with the same values of v and g that it inherited from its parent.

Related

Is there any formula to know how fork() makes near-perfect copy of the current process?

#include <stdio.h>
main()
{
int i, n=1;
for(i=0;i<n;i++) {
fork();
printf("Hello!");
}
}
I am confused if I put n=1, it prints Hello 2 times.
If n=2, it prints Hello 8 times
If n=3, it prints Hello 24 times..
and so on..

There's is no single "formula" how it's done as different operating systems do it in different ways. But what fork() does is it makes a copy of the process. Rough steps that are usually involved in that:
Stop current process.
Make a new process, and initialize or copy related internal structures.
Copy process memory.
Copy resources, like open file descriptors, etc.
Copy CPU/FPU registers.
Resume both processes.

you dont only fork the 'main' prozess, you also fork the children!
first itteration:
m -> c1
//then
m -> c2 c1-> c1.1
m -> c3 c1-> c1.1 c2->c2.1 c1.1 -> c1.1.1
for i = ....
write it in down this way:
main
fork
child(1)
fork child(2)
fork child(1.1)
fork child(3)
fork child(1.2)
fork child(2.1)
and so on ...

fork() always makes a perfect copy of the current process -- the only difference is the process ID).
Indeed in most modern operating systems, immediately after fork() the two processes share exactly the same chunk of memory. The OS makes a new copy only when the new process writes to that memory (in a mode of operation called "copy-on-write") -- if neither process ever changes the memory, then they can carry on sharing that chunk of memory, saving overall system memory.
But all of that is hidden from you as a programmer. Conceptually you can think of it as a perfect copy.
That copy includes all the values of variables at the moment the fork() happened. So if the fork() happens inside a for() loop, with i==2, then the new process will also be mid-way through a for() loop, with i==2.
However, the processes do not share i. When one process changes the value of i after the fork(), the other process's i is not affected.
To understand your program's results, modify the program so that you know which process is printing each line, and which iteration it is on.
#include <stdio.h>
# include <sys/types.h>
main() {
int i , n=4;
for(i=0;i<n;i++); {
int pid = getpid();
if(fork() == 0) {
printf("New process forked from %d with i=%d and pid=%d", pid, i, getpid());
}
printf("Hello number %d from pid %d\n", i, getpid());
}
}
Since the timing will vary, you'll get output in different orders, but you should be able to make sense of where all the "Hellos" are coming from, and why there are as many as there are.

When you fork() you create a new child process that has different virtual memory but initially shows in the same physical memory with the father. At that point both processes can only read data from the memory. If they want to write or change anything in that common memory, then they use different physical memory and have write properties to that memory. Therefore when you fork() your child has initially the same i value as its father, but that value changes separately for every child.

Yes, there is a formula:
fork() makes exactly one near-identical copy of the process. It returns twice, however. Once in the original process (the parent, returning the child pid) and once in the child (returning 0).

C: fork() child processes

According to the textbook I am reading, the code below creates N child processes that will exit with unique statuses.
/* Parent creates N children */
for (i = 0; i < N; i++)
if ((pid[i] = Fork()) == 0) /* Child */
exit(100+i);
Earlier in the textbook, it states that the following code will have 8 lines of output:
int main(){
Fork();
Fork();
Fork();
printf("hello\n");
exit(0);
}
This leads me to believe that there are 2^n child processes, where n is the number of times fork() is called. Is the reason the first code only produces N child processes (as opposed to 2^N) because the child exits every time, so by the time the subsequent fork() is called, it only operates on the parent process?

Every successful call to fork(), creates a new process.
In the first example, the children processes (the return value of fork() being 0) call exit();, which means they won't call the next fork().
In the second example, every children process continues forking.

When fork() is called it copies the parent data and starts executing from that point individually. So the execution of the parent or child is depend on the scheduling of process. Whichever process get cpu time will be executed whether it is child or parent. We have to take care that which code should run by which process(child or process).

C: printf (with fork())

In the next code:
int i = 1;
fork();
i=i*2;
fork();
i=i*2;
fork();
i=i*2;
printf("%d\n", i);
Why 8,8,8,8,8,8,8,8 is printed, and not 1,2,2,4,4,8,8,8? fork() duplicate the process, and print the i before each fork. What I miss?

Given the code shown, you should be seeing eight lots of 6 (you wrote i = i + 2; instead of i = i * 2; for the last computation.
Since each process follows the same code path, each process will produce the same result.
To get the result you expected, you'd have to track whether each fork() yielded the parent or child process:
int i = 1;
if (fork())
{
i=i*2;
if (fork())
{
i=i*2;
if (fork())
i=i*2; // + --> *
}
}
printf(|%d\n", i);
I'm assuming there are no problems with the fork() operation. It is also interesting to note that you could invert any or all of the conditions and end up with the same result.

Because fork continues to execute the code as it goes downwards. So each of the processes will run through the i = i * 2 each time as they spawn off more children. Making it what you get and not what you expected (i.e. it doesn't jump to the end of the block once forked).
Info on fork: http://www.csl.mtu.edu/cs4411/www/NOTES/process/fork/create.html

Each new process gets a copy of the stack of the parent, so immediately after calling fork(), both parent and child have the same value for i -- but they don't have the same stack, just a copy... so changing i's value in one process has no effect on the other.
If you want two parallel pieces of code to share the same memory, either use threads (and memory that's in the heap, not on the stack), or use an explicit shared memory region.

How to make multiple `fork()`-ed processes comunicate using shared memory?

I have a parent with 5 child processes. I'm wanting to send a random variable to each child process. Each child will square the variable and send it back to the parent and the parent will sum them all together.
Is this even possible? I can't figure it out...
edit: this process would use shared memory.

There are a great number of ways to do this, all involving some form of inter-process communication. Which one you choose will depend on many factors, but some are:
shared memory.
pipes (popen and such).
sockets.
In general, I would probably popen a number of communications sessions in the parent before spawning the children; the parent will know about all five but each child can be configured to use only one.
Shared memory is also a possibility, although you'd probably have to have a couple of values in it per child to ensure communications went smoothly:
a value to store the variable and return value.
a value to store the state (0 = start, 1 = variable ready for child, 2 = variable ready for parent again).
In all cases, you need a way for the children to only pick up their values, not those destined for other children. That may be as simple as adding a value to the shared memory block to store the PID of the child. All children would scan every element in the block but would only process those where the state is 1 and the PID is their PID.
For example:
Main creates shared memory for five children. Each element has state, PID and value.
Main sets all states to "start".
Main starts five children who all attach to the shared memory.
Main stores all their PIDs.
All children start scanning the shared memory for state = "ready for child" and their PID.
Main puts in first element (state = "ready for child", PID = pid1, value = 7).
Main puts in second element (state = "ready for child", PID = pid5, value = 9).
Child pid1 picks up first element, changes value to 49, sets state to "ready for parent"), goes back to monitoring.
Child pid5 picks up second element, changes value to 81, sets state to "ready for parent"), goes back to monitoring.
Main picks up pid5's response, sets that state back to "start.
Main picks up pid1's response, sets that state back to "start.
This gives a measure of parallelism with each child continuously monitoring the shared memory for work it's meant to do, Main places the work there and periodically receives the results.

The nastiest method uses vfork() and lets the different children trample on different parts of memory before exiting; the parent then just adds up the modified bits of memory.
Highly unrecommended - but about the only case I've come across where vfork() might actually have a use.
Just for amusement (mine) I coded this up:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/wait.h>
int main(void)
{
int i;
int array[5];
int square[5];
long sum = 0;
srand(time(0));
for (i = 0; i < 5; i++)
{
array[i] = rand();
if (vfork() == 0)
{
square[i] = array[i] * array[i];
execl("/bin/true", "/bin/true", (char *)0);
}
else
wait(0);
}
for (i = 0; i < 5; i++)
{
printf("in: %d; square: %d\n", array[i], square[i]);
sum += square[i];
}
printf("Sum: %d\n", sum);
return(0);
}
This works. The previous trial version using 'exit(0)' in place of 'execl()' did not work; the square array was all zeroes. Example output (32-bit compilation on Solaris 10, SPARC):
in: 22209; square: 493239681
in: 27082; square: 733434724
in: 2558; square: 6543364
in: 17465; square: 305026225
in: 6610; square: 43692100
Sum: 1581936094
Sometimes, the sum overflows - there is much room for improvement in the handling of that.
The Solaris manual page for 'vfork()' says:
Unlike with the fork() function, the child process borrows
the parent's memory and thread of control until a call to
execve() or an exit (either abnormally or by a call to
_exit() (see exit(2)). Any modification made during this
time to any part of memory in the child process is reflected
in the parent process on return from vfork(). The parent
process is suspended while the child is using its resources.
That probably means the 'wait()' is unnecessary in my code. (However, trying to simplify the code seemed to make it behave indeterminately. It is rather crucial that i does not change prematurely; the wait() does ensure that synchronicity. Using _exit() instead of execl() also seemed to break things. Don't use vfork() if you value your sanity - or if you want any marks for your homework.)

Things like the anti thread might make this a little easier for you, see the examples (in particular the ns lookup program).

fork vs vfork functionality in a C program

I am doing some C exercise for self-learning, and came across the following problem:
Part a:
int main(int argc, char **argv) {
int a = 5, b = 8;
int v;
v = fork();
if(v == 0) {
// 10
a = a + 5;
// 10
b = b + 2;
exit(0);
}
// Parent code
wait(NULL);
printf("Value of v is %d.\n", v); // line a
printf("Sum is %d.\n", a + b); // line b
exit(0);
}
Part b:
int main(int argc, char **argv) {
int a = 5, b = 8;
int v;
v = vfork();
if(v == 0) {
// a = 10
a = a + 5;
// b = 6
b = b - 2;
exit(0);
}
// Parent code
wait(NULL);
printf("Value of v is %d.\n", v); // line a
printf("Sum is %d.\n", a + b); // line b
exit(0);
}
We have to compare the outputs of line a and line b.
The outputs of part a is:
Value of v is 79525.
Sum is 13.
The outputs of part b is:
Value of v is 79517.
Sum is 16.
It appears in part a, the sum is the sum of the initial declaration of a and b, whereas in part b, the sum include the summation within the child process.
My question is - why is this happening?
According to this post:
The basic difference between the two is that when a new process is
created with vfork(), the parent process is temporarily suspended, and
the child process might borrow the parent's address space. This
strange state of affairs continues until the child process either
exits, or calls execve(), at which point the parent process continues.
The definition of parent process is temporarily suspended doesn't make much sense to me. Does this mean that for 1b, the program waits until the child process to finish running (hence why the child process variables get summed) before the parent runs?
The problem statement also assumes that "the process ID of the parent process maintained by the kernel is 2500, and that the new processes are created by the operating system before the child process is created."
By this definition, what would the value of v for both programs be?

the parent process is temporarily suspended
Basically, the parent process will not run until the child calls either _exit or one of the exec functions. In your example this means the child will run and therefore perform the summation before the parent runs and does the prints.
As for:
My question is - why is this happening?
First, your part b has undefined behavior because you are violating the vfork semantics. Undefined behavior for a program means the program will not behave in a predictable manner. See this SO post on undefined behavior for more details (it includes some C++ but most of the ideas are the same). From the POSIX specs on vfork:
The vfork() function has the same effect as fork(2), except that the
behavior is undefined if the process created by vfork() either
modifies any data other than a variable of type pid_t used to store
the return value from vfork(), or returns from the function in which
vfork() was called, or calls any other function before successfully
calling _exit(2) or one of the exec(3) family of functions.
So your part b could really do anything. However, you will probably see a somewhat consistent output from part b. This is because when you use vfork you are not creating a new address space. Instead, the child process basically "borrows" the address space of the parent, usually with the intent that it will call one of the exec functions and create a new program image. Instead in your part b you are using the parent address space. Basically, after the child has called exit (which is also invalid as it should call _exit) a most likely will equal 10 and b will most likely equal 6 in the parent. Therefore, the summation is 16 as shown in part b. I say most likely because as stated before this program has undefined behavior.
For part a where fork is used the child gets its own address space and its modifications are not seen in the parent, therefore the value printed is 13 (5 + 8).
Finally with regards to the value of v, this is seems just to be something the question is stating to make the output it is showing make sense. The value of v could be any valid value returned by vfork or fork and does not have to be limited to 2500.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight