Working of fork() - c

I recently learned about the function fork() in C. Since this function creates two concurrent processes and these two processes share the memory. So I have the following code:
#include<stdio.h>
int main()
{
int pid,i;
i=0;
pid=fork();
if(pid==0)
{
i++;
printf("child process:: address of i:%u value of i:%d\n",(int)&i,i);
}
else if(pid>0)
{
wait(NULL);
i--;
printf("parent process:: address of i:%u value of i:%d\n",(int)&i,i);
}
return 0;
}
The output I am getting is:
child process:: address of i:3215563096 value of i:1
parent process:: address of i:3215563096 value of i:-1
but since every time child is executing first so the value at memory location 3215563096 should become +1 which is on contrast 0 for the parent process.
My expected output is:
child process:: address of i:3215563096 value of i:1
parent process:: address of i:3215563096 value of i:0
Can someone please tell me where I am wrong?

The second process does use the same memory as the original when using fork; however, the memory is marked as copy-on-write which means that as soon as the child tries to modify it the memory management in the OS will make a copy of the page so the original process will not see the modified memory. See more at fork wiki.

The child and the parent have different address spaces. The virtual memory addresses (which you are printing) are the same. But these addresses refer to different physical memory areas.
It's like two people living on "1st Street" in different cities.
The advantage of this scheme is that each process has a different view of the memory - memory protection and separation are inherent.
Side note: use %p instead of %u when printing addresses.

When you fork(), the child process gets an exact copy of the parent process' memory. They do not share memory.
The address of your variable i is the same because they refer to different address spaces - one for the parent, one for the child.
You may be thinking of threads. Threads share memory.

fork() creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process.
The entire virtual address space of the parent is replicated in the child.
This is the reason You get same address printed by printf() but physically they are present in different location.
NOTE
Remember when ever we use address of operator(&) we get the virtual address only(which is a relative with respect to your process only). The physical address are only managed by memory manager of your operating system(a primary service of your os).
for more information use command in your linux shell
man 2 fork

Related

Shared variable with fork()

I have the code:
int a = 0;
if (fork() == 0) {
a = a + 1;
printf("%d \n", a);
} else {
a = a - 1;
printf("%d \n", a);
}
And I want to know whether or not this code will always print different values for the 2 prints?
Or if we will have context switch after the son process will add 1 and then the father process will do minus 1. In this case we will get same value. The thing is I'm not sure if the variable 'a' is duplicated and each process will have different copy of it (and then we will always get different values), or is shared (and then the situation I mentions can cause the same value to be printed twice)?
Thanks a lot!
A forked child doesn't share its address space with its parent. It gets a copy, although as an optimization, that copy may be a lazy, copy-on-write copy, which is what Linux does.
On Linux a forked child process will initially share its address space with its parent, but if it attempts to write anywhere, then (provided that write attempt is not a segmentation fault) Linux will redirect that write to a copy of the original page.
The fork() creates a child process by duplicating the calling process.
The process that invoked fork() is the parent process and the newly created process is the child process. The child process and the parent process run in separate memory spaces. But at the time of fork() both memory spaces have the same content.
There is concept called Copy-on-Write, which is an optimization where the page tables are set up so that the parent and child process start off sharing all of the same memory, and only the pages that are written to by either process are copied when needed. Which means both parent and child process share copy of the same data and as soon as either of them do a write, a copy is made and any change in the memory of one process is not visible in another's.
In the context of your program -
Yes, both parent and child process share a copy of variable a until its written by either of the process and then a copy is made. Any change in value of variable a done by either of the process (parent/child) will be in its own copy of a and that change is not visible to the other process.
Check this - Copy-on-write.

Fork() executes the same program and has copy same variables - how do the OS keep both in memory, safeguarding each process only access his variables?

Fork() executes the same program and has copy same variables of the father at the moment of the fork, how do the OS keep both process in memory, safeguarding each process only access his variables?
When the kernel creates a new process, it also creates a new memory mapping. Initially all pages in the new mapping are shared with the parent process, but once pages in the map are modified by the child process those are copied into their own pages.
Useful terms to search for: Virtual memory, on demand paging, memory mapping, shared memory, copy on write.
The OS copies virtual memory space of the forking process (with possible optimizations like copy-on-write).
Fork is a technique that in general makes a separate address space for the child. The child has the same memory of the parent, but they have different PID. So you can distinguish them: specifically fork() returns 0 in the child process and a non zero value (child's PID) in the parent process.

About pointers after fork()

This is sort of a technical question, maybe you can help me if you know about C and UNIX (or maybe it is a really newbie question!)
A question came up today while analizing some code in our Operative Systems course. We are learning what it means to "fork" a process in UNIX, we already know it creates a copy of the current process parallel to it and they have separate data sections.
But then I thought that maybe, if one creates a variable and a pointer pointing at it before doing fork(), because the pointer stores the memory address of the variable, one could try to modify the value of that variable from the child process by using that pointer.
We tried a code similar to this in class:
#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
int main (){
int value = 0;
int * pointer = &value;
int status;
pid_t pid;
printf("Parent: Initial value is %d\n",value);
pid = fork();
switch(pid){
case -1: //Error (maybe?)
printf("Fork error, WTF?\n");
exit(-1);
case 0: //Child process
printf("\tChild: I'll try to change the value\n\tChild: The pointer value is %p\n",pointer);
(*pointer) = 1;
printf("\tChild: I've set the value to %d\n",(*pointer));
exit(EXIT_SUCCESS);
break;
}
while(pid != wait(&status)); //Wait for the child process
printf("Parent: the pointer value is %p\nParent: The value is %d\n",pointer,value);
return 0;
}
If you run it, you'll get something like this:
Parent: Initial value is 0
Child: I'll try to change the value
Child: The pointer value is 0x7fff733b0c6c
Child: I've set the value to 1
Parent: the pointer value is 0x7fff733b0c6c
Parent: The value is 0
It's obvious that the child process didn't affect at all the parent process. Frankly, I was expecting some "segmentation fault" error, because of accessing a not permitted memory address. But what really happened?
Remember, I'm not looking for a way to communicate processes, that's not the point. What I want to know is what did the code do. Inside the child process, the change is visible, so it DID something.
My main hypothesis is that pointers are not absolute to memory, they are relative to the process' stack. But I haven't been able to find an answer (no one in class knew, and googling I just found some questions about process communication) so I'd like to know from you, hopefully someone will know.
Thanks for taking your time reading!
The key here is the concept of a virtual address space.
Modern processors (Say anything newer then a 80386) have a memory management unit which maps from a per process virtual address space to physical memory pages under control of the kernel.
When the kernel sets up a process it creates a set of page table entries for that process that define the physical memory pages to virtual address space mapping, and it is in this virtual address space that the program executes.
Conceptually when you fork, the kernel copies the existing process pages to a new set of physical pages and sets up the new processes page tables so that as far as the new process is concerned it appears to be running in the same virtual memory layout as the original one had, while actually addressing entirely different physical memory.
The detail is more subtle as nobody wants to waste time copying hundreds of MB of data unless such is necessary.
When the process calls fork() the kernel sets up a second set of page table entries (for the new process), but points them at the same physical pages as the original process, it then sets the flag in both sets of pages to make the mmu consider them read only.....
As soon as either process writes to a page, the memory management unit generates a page fault (due to the PTE entry having the read only flag set), and the page fault handler then allocates a new page from physical memory, copies the data over, updates the page table entry and sets the pages back to read/write.
In this way, pages are only actually copied the first time either process tries to make a change to a copy on write page, and the slight of hand goes completely unnoticed by either process.
Regards, Dan.
Logically, the fork()ed process gets its own, independent copy of more or less the whole state of the parent process. That couldn't work if pointers in the child referred to memory belonging to the parent.
The details of how a particular UNIX-like kernel makes that work can vary. Linux implements the child process's memory via copy-on-write pages, which makes fork()ing comparatively cheap relative to other possible implementations. In that case, the child's pointers really do point to the parent process's memory, up until such time that either child or parent tries to modify that memory, at which time a copy is made for the child to use. That all relies on the underlying virtual memory system. Other UNIX and UNIX-like systems can and have done it differently.
The child modified a pointer that is perfectly legal in its address space because it is a copy of its parent. There was no effect on the parent because the memory is not logically shared. Each process gets to go its separate way after the fork.
UNIX has a number of ways of creating shared memory (where one process can modify memory and have that modification seen by another process), but fork is not one of them. And it's a good thing because otherwise, synchronization between the parent and child would be almost impossible.

freeing "copy-on-write" memory that wasn't changed

I get the idea behind copy-on-write. When I fork, the heap is marked as CoW, and when any process tries to change it, a copy is made. The question is: do I have to free it in a child's process nonetheless? Suppose a parent has a dynamic char *array, then it forks. A child process prints some const char, and exits. The child process did not alter the heap at all. Will there be a memory leak?
edit: My child process prints the array on heap, but doesn't modify it. Valgrind says there is a leak if I do not free that array. No leak/memory errors when I free it.
CoW is just a lazy optimization. You may freely think that fork() always makes the full copy of the process (in terms of memory at least) without any delay. But…
If you did prepare dynamic data chunk to "pass" to fork's child, then after fork you have two processes with two dynamic data chunks: parent and child (both are copies). When child exits, it's copy of memory is reclaimed, but parent should free that chunk right after fork by itself.
To be more clear, here is an example:
char *buf = malloc(123456);
// … fill buf for child …
int res = fork();
if (res == -1) {
fprintf(stderr, "fork failed\n");
exit(EXIT_FAILURE);
}
if (res == 0) {
// this is child process
// … do work with buf …
_Exit(EXIT_SUCCESS); // child reclaims buf by means of exit
}
// this is parent process
free(buf); // we don't need it in parent
// … other parent tasks here …
CoW is also very useful optimization in fork-exec technique, where child does nothing but exec with prepared arguments. exec replaces current process with specified executable image, retaining open descriptors and other things (more in man 2 execve). The only page that is copied after such fork is only current stack frame, making fork-exec very effective.
Some systems also provide vfork, that is very restrictive unfair version of fork, but on systems without CoW that is the only way to vfork-exec efficiently.
First the logical (process centered) view:
When you fork a process, the entire address space is copied into a new process as is. Your heap is essentially duplicated in both processes, and both processes can continue to use it just like one process could if fork() had never been called. Both processes can free an allocation that was done before the fork(), and they must do so if they want to reuse the address range connected to the allocation. CoW mappings are only an optimization that does not change these semantics.
Now the physical (system centered) view:
Your system kernel does not know about data ranges you have allocated using malloc(), it only knows about the memory pages it has allocated to the process at the request of malloc(). When you call fork() it marks all these pages as CoW, and references them from both processes. If any of the two processes writes to any of the CoW pages while the other process still exists, it will trap into the system which copies the entire page. And if one of the processes exits, it will at the very least lower the reference count of these pages, so that they do not have to be copied anymore.
So, what happens when you call free() in the child before exiting?
Well, the free() function will most likely write to the page containing the memory allocation to tell malloc() that the block is available again. This will trap into the system and copy the page, expect this operation to take a microsecond or two. If your parent process calls free() while the child is still alive, the same will happen. However, if your child does not free the page and exits, the kernel will know that it does not have to perform CoW anymore. If the parent frees and reuses the memory region afterwards, no copy needs to be done.
I assume, that what your child does is simply to check for some error condition, and exit immediately if it is met. In that case, the most prudent approach is to forget about calling free() in the child, and let the system do its work.

Does fork() create a duplicate instance of all the variables and object created by the parent process for the child process?

Suppose I have a main process running and in its execution it has initialized some pointers and created some instances of a predefined structure.
Now if I fork this main process, is seperate memory allocated for the pointers?And are duplicate instances of the previously existing variables, data structures created for this new process?
As an example of my requirement consider -
struct CKT
{
...
}
main()
{
...Some computations with the structure and other pointers.....
pid_t pid = fork();
if(pid == 0) //child
{
..some more computations with the structure...but I need a
..separate instance of it with all the pointers in it as well..
}
else if(pid > 0) // parent
{
..working with the original instance of the structure..
}
// merging the child process with the parent...
// after reading the data of the child processes structure's data...
// and considering a few cases...
}
Can anyone explain how do I achieve this??
Yes, theorically, the fork system call will duplicate, among other, the stack of the parent. In pratical, otherwise, there is a common method, named copy-on-write, used in that case.
It consists on copy a given parent's memory page only when the child's process is trying to modify this memory space. It allows to reduce the cost of the fork system call.
The one thing which is not copy is the return value of fork: 0 in the child, and the PID of the child in the father.
Yes. It might not copy the memory space of the old process immediately, though. The OS will use copy-on-write where possible, copying each memory page the first time it is modified in either process.
COW is what makes one common use of fork (shortly followed by an exec in the child) efficient. The child process never actually uses most of the memory space inherited from the parent.
The copies in the new process will have exactly the same numeric addresses as they did in the old process, so all the pointers from the old process remain valid in the new process and point to the new process's objects. That's part of the point of virtual memory, it allows different process to refer to different physical memory using the same pointer value.
pointer and memory content both will be duplicated for the fork child.
all kind of data pointers, memory, variable will be duplicate in a separate memory for the child process created with fork. and you could not change pointers neither memory content from process child directly.
but you can change variable of parent process from child process using memory share
Refer to this link to see how to it: How to share memory between process fork()?
Yes, your forked process receives copies of all privately mapped memory (default memory mappings via malloc, calloc, stack frames, global variables)
Your child receives shared copies of all open file descriptors. Means those file descriptors will remain valid and open until both parent and child close them. Seeks on those file descriptors are also shared. If you wish to make a file descriptor child-private then you will have to fdreopen it. Otherwise it is very recommended to close all file descriptors you don't need in children immediately after forking.
Your child will receive the same shared MAP_SHARED mappings of memory. Those will continue to access the same physical memory shared between parent and child. This applies to all shared memory aquired through the shm* family of calls and mmapwith MAP_SHARED.
Your child will not receive any mappings marked with MADV_DONTFORK flag via madvise. Those will become invalid in the child. This is not default behavior and you do not have to worry about it unless explicitly used.
You might get the result you are looking for by using a shared memory segment. Use the mmap system call to create a shared memory segment, and put all your shared structures in that segment. Since you cannot use malloc on this segment (it's returned by the syscall as a pointer to the whole segment), you must copy manually the structures, and do the shared memory usage tracking by yourself.
Perhaps you can allocate your data first locally, then evaluate how much memory is used by them, and do the shared memory allocation with the correct size. It is also possible to reallocate the shared segment to a bigger size, in which case you will have to signal the realloc somehow from one end to the other (maybe by using the first integer pointed by the shared map to store that value?).
man pages:
mmap
munmap

Resources