Weird behavior of parent-child-child threaded program - c

I have a question about the program below. My understanding is as follows: the parent creates a child and waits for it to complete. The child then creates a thread and waits for it to complete. So, at the end, the variable 'value' will be 5 in the child process and 0 in the parent process, since there are in fact two copies of 'value', one in the parent and the other in the child (fork essentially copies the parent's address space to the child). However, the address of 'value' turns out to be the same in both parent and child. I don't understand how. I would be grateful if someone could explain this behaviour.
#include <stdio.h>
#include <pthread.h>
#include <sys/wait.h> /* wait() */
#include <unistd.h>   /* fork() */
int value = 0;
void *runner( void *param );
int main ()
{
int pid ;
pthread_t tid;
pthread_attr_t attr;
pid = fork();
if( pid == 0 ) /* child */
{
pthread_attr_init( &attr );
pthread_create( &tid, &attr, runner, NULL );
pthread_join(tid, NULL);
printf( "CHILD: value = %d, address = %p\n", value, &value );
}
else if( pid > 0 ) /* Parent */
{
wait(NULL);
printf( "PARENT: value = %d, address = %p\n", value, &value );
}
}
void *runner( void *param )
{
value = 5;
pthread_exit(0);
}

Modern operating systems provide a virtual address space for each process, so a coinciding address doesn't mean that the two variables are stored at the same location in physical memory.
Moreover, most operating systems use the copy-on-write technique when forking. This means that parts of the address space of the parent process are not copied to the child's address space until the child attempts to change them.

Desktop CPUs, and many embedded CPUs have something called a Memory Management Unit (MMU). An MMU translates from virtual addresses to physical addresses, so each process runs on its own virtual address space, separated from other processes.
An MMU allows the operating system to use some important techniques, like on-demand paging, in addition to the separation between processes mentioned above.
Efficient implementation of fork() requires the use of an MMU: as you have just discovered, the parent and child process use the same virtual addresses, but in different virtual address spaces, thus usually (ignoring memory-mapped files and shared memory) mapping to different physical addresses.
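To make the point above concrete, here is a minimal sketch (not the asker's program) in which the child reports the address of `value` back through a pipe, so that one process can compare both addresses directly. The helper name `fork_and_compare` is hypothetical, and the sketch assumes a POSIX system with an MMU.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 0;

/* Hypothetical helper: forks, lets the child modify `value`, and stores in
 * *same_address whether the child saw `value` at the same virtual address
 * as the parent. Returns the parent's copy of `value`, or -1 on error. */
int fork_and_compare(int *same_address)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                       /* child */
        close(fds[0]);
        value = 5;                        /* changes only the child's copy */
        uintptr_t addr = (uintptr_t)&value;
        ssize_t n = write(fds[1], &addr, sizeof addr);
        _exit(n == (ssize_t)sizeof addr ? 0 : 1);
    }

    close(fds[1]);                        /* parent */
    uintptr_t child_addr = 0;
    ssize_t n = read(fds[0], &child_addr, sizeof child_addr);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof child_addr)
        return -1;

    /* Same virtual address in both processes ... */
    *same_address = (child_addr == (uintptr_t)&value);
    /* ... yet the parent's copy was never touched by the child. */
    return value;
}
```

Called from `main`, the helper should return 0 with `*same_address` set to 1: the addresses match, but the child's write to 5 stays in the child's address space.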

Related

How 2 exact (volatile marked) pointers can point to different values? Forked process

The goal
Understanding what happens in the code, and where my conclusions and predictions went wrong.
Context
While experimenting with fork function (and reading articles I probably misunderstood) I concluded that data used by child was a copy of parent's data.
In order to make a child work on the same data as its parent, I created pointer p, assuming that its value will be the same in all (both) processes after forking.
I made sure that both the child's and the parent's pointer pointed to the same memory by printing the values they stored. What struck me is that the pointers themselves also seem to be identical: they have the same addresses.
Can two identical pointers point to different values? In my code they did.
Next, to be sure that a compiler would not perform any optimization, I marked the p pointer as volatile.
That did not change anything. Here is the code.
The general code
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h> /* wait() */
#include <unistd.h>
int main(){
volatile int *volatile p= malloc(sizeof(int));
*p= 6;
if (fork()){ //Parent
*p= 200; //Parent changes the value in "shared" memory
printf("Parent: %p : %p : %i\n", &p, p, *p);
wait(NULL);
}else{ //Child
sleep(3); //Child sleeps when parent is setting the "shared" value.
printf("Child: %p : %p : %i\n", &p, p, *p); //Child should read the same value.
}
}
Output:
Parent: 0x7ffc8bb6cbe0 : 0xea9260 : 200
Child: 0x7ffc8bb6cbe0 : 0xea9260 : 6
Expected outputs:
Parent: 0x7ffc8bb6cbe0 : 0xea9260 : 200
Child: 0x7ffc8bb6cbe0 : 0xea9260 : 200
Or
Parent: 0x7ffc8bb6cbe0 : 0xea9260 : 200
Child: differentAddress: 0xea9260 : 200
2nd code variation (snippet)
Still suspecting compiler optimization, I added steps (sleep, fetch, sleep, copy).
if (fork()){ //Parent
*p= 200;
printf("Parent: %p : %p : %i\n", &p, p, *p);
wait(NULL);
}else{ //Child
sleep(2);
uintptr_t fetch= (uintptr_t)p;
sleep(2);
p= (int*)fetch;
printf("Child: %p : %p : %i\n", &p, p, *p);
}
This also changed nothing in the actual output.
Environment
Language: C (probably C11)
Compiler: clang 7.0.0 (I haven't passed any of the O1, O2, O3, ... flags.)
IDE & platform: https://replit.com/
Platform: Linux-5.11.0-1029-gcp-x86_64-with-glibc2.27
Machine: x86_64
My other attempts at understanding this
I also tried other similar experiments, but they led to ambiguous conclusions.
The operating system creates a virtual address space for each process. When a process accesses memory, it does not directly access memory by physical address. Whatever address the process uses is translated by the computer processor from a virtual address to a physical address. That translation is controlled by the operating system: Each process has its own map from virtual addresses to various physical addresses. Each process has largely its own different physical memory.
When a process calls fork, the operating system creates a copy of the process. For any memory that is modified by one process or the other, the operating system creates a copy in separate physical memory and adjusts the address translation. Even though the two processes use the same virtual address to access memory, it may refer to different physical memory.
(For efficiency, any memory that is only read by processes is generally shared: Their virtual addresses are mapped to the same physical addresses, and there is only one physical copy of the data in memory. When a process first calls fork, both processes may use the same memory even though they have permission to write to memory. When they actually write to memory, then the operating system copies that portion of memory and adjusts the address translation.)
It is possible for processes to share physical memory that they write to, but this must be explicitly requested by calling routines such as shmat to manage shared memory.

Is there a simple way to wait in a child process until a flag / event changes?

This is my sample program, but it does not work: changing the flag has no effect in the child process, and sem_wait and sem_trywait do not work either. Apart from signals and fcntl, is there any other way to wait until a particular event is triggered?
#include "header.h"
void main()
{
int flag = 1;
pid_t cpid;
sem_t semobj;
int value;
cpid = fork() ;
sem_init ( &semobj, 0, 2);
if ( cpid == 0)
{
printf ("Chile Process id = %d, PPID= %d\n",getpid(),getppid());
while ( FLAG ); // here i need to wait until parent process disables the flag
//sem_trywait ( &semobj);
//sem_getvalue ( &semobj, &value );
printf ( "After Wait =%d\n", FLAG );
sleep(10);
}
else
{
printf( "Parent Process id =%d\n",getpid() );
sleep(3);
printf( "Setting Flag value\n" );
FLAG = 0; // here i need to set a flag
sleep(7);
}
}
I guess there's no "simplest way".
The thing is that you're dealing with two separate processes here. If you want to communicate anything between the two, you have to use some kind of IPC (Inter-Process Communication) mechanism. I'd suggest reading something like this on the subject and picking one.
Your code doesn't work as you'd like it to because once you've fork()ed you have two address spaces, each with its own set of variables (including semaphore variables). So any action the parent performs on those variables in its address space does not affect any of the variables in the child's address space, and thus no communication happens.
I recommend reading what's written in the section "Named Semaphores" on this page.
Your problem is that you don't name your semaphores and therefore your semaphore gets copied for the child process.
That means, as Igor already explained in his answer, that the semaphore variable is not the same for child and parent process.
A named semaphore can be shared between two processes so your parent process would be able to signal your child process.
Take a look at http://www.yendor.com/programming/unix/apue/ch8.html, the Synchronization Library section.
Basically you want the
TELL_WAIT
TELL_PARENT
TELL_CHILD
WAIT_PARENT
WAIT_CHILD
functions.
The linked examples use signals to implement them, but you should also be able to implement them with pipes, shared memory + posix semaphores, or SysV semaphores.
If you want to use Posix semaphores, the semaphore needs to be in shared memory (shm_open) and initialized to 0 before forking.
Also, that is not how you use fork—take a look at my answer here (and subtract the exec part):
How to start process on Linux OS in C, C++
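Of the options listed above, a pipe is probably the shortest to sketch. The following is only an illustration under that assumption, not the linked TELL_WAIT implementation: the child blocks in read() until the parent writes one byte. The helper name `wait_for_parent_signal` and the token value `'g'` are made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child blocks until the parent sends one byte.
 * Returns the child's exit status (0 if it was woken by the expected
 * byte), or -1 on error. */
int wait_for_parent_signal(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                        /* child */
        close(fds[1]);
        char token = 0;
        /* read() blocks until the parent writes a byte or closes its end */
        ssize_t n = read(fds[0], &token, 1);
        _exit(n == 1 && token == 'g' ? 0 : 1);
    }

    close(fds[0]);                         /* parent */
    /* ... the parent would do its own work here ... */
    char token = 'g';
    ssize_t n = write(fds[1], &token, 1);  /* this is the "event" */
    close(fds[1]);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || n != 1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child consumes no CPU while it waits, in contrast to the `while (FLAG);` loop in the question, and nothing needs to live in shared memory.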

fork() system call and memory space of the process

I quote "when a process creates a new process using fork() call, Only the shared memory segments are shared between the parent process and the newly forked child process. Copies of the stack and the heap are made for the newly created process" from "operating system concepts" solutions by Silberschatz.
But when I tried this program out
#include <stdio.h>
#include <stdlib.h>    /* malloc() */
#include <sys/types.h>
#include <unistd.h>    /* fork() */
#define MAX_COUNT 200
void ChildProcess(void); /* child process prototype */
void ParentProcess(void); /* parent process prototype */
void main(void)
{
pid_t pid;
char * x=(char *)malloc(10);
pid = fork();
if (pid == 0)
ChildProcess();
else
ParentProcess();
printf("the address is %p\n",x);
}
void ChildProcess(void)
{
printf(" *** Child process ***\n");
}
void ParentProcess(void)
{
printf("*** Parent*****\n");
}
the result is like:
*** Parent*****
the address is 0x1370010
*** Child process ***
the address is 0x1370010
Both parent and child print the same address, which is on the heap.
Can someone explain the contradiction here? Please state clearly what the parent and child share in memory.
Quoting myself from another thread.
When a fork() system call is issued, a copy of all the pages
corresponding to the parent process is created, loaded into a separate
memory location by the OS for the child process. But this is not
needed in certain cases. Consider the case when a child executes an
"exec" system call or exits very soon after the fork(). When the
child is needed just to execute a command for the parent process,
there is no need for copying the parent process' pages, since exec
replaces the address space of the process which invoked it with the
command to be executed.
In such cases, a technique called copy-on-write (COW) is used. With
this technique, when a fork occurs, the parent process's pages are not
copied for the child process. Instead, the pages are shared between
the child and the parent process. Whenever a process (parent or child)
modifies a page, a separate copy of that particular page alone is made
for that process (parent or child) which performed the modification.
This process will then use the newly copied page rather than the
shared one in all future references. The other process (the one which
did not modify the shared page) continues to use the original copy of
the page (which is now no longer shared). This technique is called
copy-on-write since the page is copied when some process writes to it.
Also, to understand why these programs appear to be using the same space of memory (which is not the case), I would like to quote a part of the book "Operating Systems: Principles and Practice".
Most modern processors introduce a level of indirection, called
virtual addresses. With virtual addresses, every process's memory
starts at the "same" place, e.g., zero.
Each process thinks that it has the entire machine to itself, although
obviously that is not the case in reality.
So these virtual addresses are translated to physical addresses, and the same virtual address in two processes doesn't refer to the same physical memory. As a more practical example, we can run a test: compile and run, multiple times, a program that displays the address of a static variable, such as this one.
#include <stdio.h>
int main() {
static int a = 0;
printf("%p\n", &a);
getchar();
return 0;
}
It would be impossible to obtain the same memory address in two
different programs if we deal with the physical memory directly.
And the results obtained from running the program several times are...
Yes, both processes are using the same address for this variable, but these addresses are used by different processes, and therefore aren't in the same virtual address space.
This means that the addresses are the same, but they aren't pointing to the same physical memory. You should read more about virtual memory to understand this.
The address is the same, but the address space is not. Each process has its own address space, so parent's 0x1370010 is not the same as child's 0x1370010.
You're probably running your program on an operating system with virtual memory. After the fork() call, the parent and child have separate address spaces, so the address 0x1370010 is not pointing to the same place. If one process wrote to *x, the other process would not see the change. (In fact those may be the same page of memory, or even the same block in a swap-file, until it's changed, but the OS makes sure that the page is copied as soon as either the parent or the child writes to it, so as far as the program can tell it's dealing with its own copy.)
When the kernel fork()s the process, the copied memory keeps the same addresses, since the heap is effectively copied as-is. If the addresses were different, how would you update pointers inside custom structs? The kernel knows nothing about them, so those pointers would be invalidated. Therefore, the physical address may change (and in fact often will change during the lifetime of your executable, even without fork()ing), but the logical address remains the same.
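The point about pointers inside structs can be illustrated with a small sketch: a linked list is built on the heap before the fork, and the child walks it through the inherited `next` pointers. The names `struct node` and `child_list_sum` are hypothetical, and the sum is passed back through the exit status purely to keep the example short.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct node {
    int value;
    struct node *next;
};

/* Hypothetical helper: builds the list 1 -> 2 -> 3 before forking, lets
 * the child sum it, and returns that sum (taken from the child's exit
 * status), or -1 on error. */
int child_list_sum(void)
{
    struct node *head = NULL;
    for (int i = 3; i >= 1; i--) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return -1;           /* sketch: earlier nodes are leaked here */
        n->value = i;
        n->next = head;
        head = n;
    }

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the stored `next` pointers are still valid because the
         * heap sits at the same virtual addresses as in the parent. */
        int sum = 0;
        for (struct node *n = head; n != NULL; n = n->next)
            sum += n->value;
        _exit(sum);
    }
    if (pid > 0) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }

    while (head != NULL) {       /* parent frees its own copy of the list */
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return result;
}
```

If fork() relocated the heap in the child, the traversal would follow stale pointers; because the virtual addresses are preserved, the child should compute 6.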
Yes, the address is the same in both cases. But if you assign a different value to *x in the child process and the parent process, and then print that value along with the address held in x, you will get your answer.
#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_COUNT 200
void ChildProcess(void); /* child process prototype */
void ParentProcess(void); /* parent process prototype */
void main(void)
{
pid_t pid;
int * x = (int *)malloc(10);
pid = fork();
if (pid == 0) {
*x = 100;
ChildProcess();
}
else {
*x = 200;
ParentProcess();
}
printf("the address is %p and value is %d\n", x, *x);
}
void ChildProcess(void)
{
printf(" *** Child process ***\n");
}
void ParentProcess(void)
{
printf("*** Parent*****\n");
}
Output of this will be:
*** Parent*****
the address is 0xf70260 and value is 200
*** Child process ***
the address is 0xf70260 and value is 100
Now you can see that the value is different but the address is the same, so the address spaces of the two processes are different. These addresses are not physical addresses but logical addresses, so they can be the same in different processes.

how memory area is shared between processes [closed]

How is memory shared in the following scenarios?
Between parent and child processes
Between two unrelated processes
In which part of physical memory does shared memory (or any other IPC used for communicating between processes) exist?
Here is a program with an explanation of memory management between parent and child processes.
/*
SHARING MEMORY BETWEEN PROCESSES
In this example, we show how two processes can share a common
portion of the memory. Recall that when a process forks, the
new child process has an identical copy of the variables of
the parent process. After fork the parent and child can update
their own copies of the variables in their own way, since they
dont actually share the variable. Here we show how they can
share memory, so that when one updates it, the other can see
the change.
*/
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h> /* This file is necessary for using shared
memory constructs
*/
#include <sys/wait.h> /* wait() */
#include <unistd.h>   /* fork(), sleep() */
int main(void)
{
int shmid, status;
int *a, *b;
int i;
/*
The operating system keeps track of the set of shared memory
segments. In order to acquire shared memory, we must first
request the shared memory from the OS using the shmget()
system call. The second parameter specifies the number of
bytes of memory requested. shmget() returns a shared memory
identifier (SHMID) which is an integer. Refer to the online
man pages for details on the other two parameters of shmget()
*/
shmid = shmget(IPC_PRIVATE, 2*sizeof(int), 0777|IPC_CREAT);
/* We request an array of two integers */
/*
After forking, the parent and child must "attach" the shared
memory to its local data segment. This is done by the shmat()
system call. shmat() takes the SHMID of the shared memory
segment as input parameter and returns the address at which
the segment has been attached. Thus shmat() returns a pointer
(of type void *).
*/
if (fork() == 0) {
/* Child Process */
/* shmat() returns a void pointer which is typecast here
to int * and the address is stored in the int pointer b. */
b = (int *) shmat(shmid, 0, 0);
for( i=0; i< 10; i++) {
sleep(1);
printf("\t\t\t Child reads: %d,%d\n",b[0],b[1]);
}
/* each process should "detach" itself from the
shared memory after it is used */
shmdt(b);
}
else {
/* Parent Process */
/* shmat() returns a void pointer which is typecast here
to int * and the address is stored in the int pointer a.
Thus the memory locations a[0] and a[1] of the parent
are the same as the memory locations b[0] and b[1] of
the child, since the memory is shared.
*/
a = (int *) shmat(shmid, 0, 0);
a[0] = 0; a[1] = 1;
for( i=0; i< 10; i++) {
sleep(1);
a[0] = a[0] + a[1];
a[1] = a[0] + a[1];
printf("Parent writes: %d,%d\n",a[0],a[1]);
}
wait(&status);
/* each process should "detach" itself from the
shared memory after it is used */
shmdt(a);
/* Child has exited, so parent process should delete
the created shared memory. Unlike attach and detach,
which is to be done for each process separately,
deleting the shared memory has to be done by only
one process after making sure that no one else
will be using it
*/
shmctl(shmid, IPC_RMID, 0);
}
}
/*
POINTS TO NOTE:
In this case we find that the child reads all the values written
by the parent. Also the child does not print the same values
again.
1. Modify the sleep in the child process to sleep(2). What
happens now?
2. Restore the sleep in the child process to sleep(1) and modify
the sleep in the parent process to sleep(2). What happens now?
Thus we see that when the writer is faster than the reader, then
the reader may miss some of the values written into the shared
memory. Similarly, when the reader is faster than the writer, then
the reader may read the same values more than once. Perfect
inter-process communication requires synchronization between the
reader and the writer. You can use semaphores to do this.
Further note that "sleep" is not a synchronization construct.
We use "sleep" to model some amount of computation which may
exist in the process in a real world application.
Also, we have called the different shared memory related
functions such as shmget, shmat, shmdt, and shmctl, assuming
that they always succeed and never fail. This is done to
keep this program simple. In practice, you should always check
the return values from these functions and exit if there is
an error.
*/
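Following the closing note about return values, here is a reduced sketch of the same System V calls with each result checked. It is not the program above: it uses a single int, lets the child write once, and relies on the attachment made before fork() being inherited by the child. The helper name `checked_shm_demo`, the value 42, and the 0600 permission bits are choices made for this example.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the value the parent reads from the shared
 * segment after the child has written 42, or -1 on any failure. */
int checked_shm_demo(void)
{
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid == -1) {
        perror("shmget");
        return -1;
    }

    int *shared = shmat(shmid, NULL, 0);
    if (shared == (void *)-1) {          /* shmat() reports errors this way */
        perror("shmat");
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        shmdt(shared);
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {                      /* child: attachment is inherited */
        *shared = 42;
        shmdt(shared);
        _exit(0);
    }

    int result = -1;
    if (waitpid(pid, NULL, 0) == pid)
        result = *shared;                /* the child's write is visible */
    else
        perror("waitpid");

    if (shmdt(shared) == -1)
        perror("shmdt");
    if (shmctl(shmid, IPC_RMID, NULL) == -1)
        perror("shmctl");
    return result;
}
```

Waiting for the child before reading stands in for real synchronization here; with concurrent readers and writers a semaphore would still be needed, as the comment above says.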

Difference between the address space of parent process and its child process in Linux?

I am confused about something. I have read that when a child is created by a parent process, the child gets a copy of its parent's address space. What does it mean by copy?
If I use the code below, it prints the same value for the pointer 'a' (which points to the heap) in both the child and the parent. So what is happening here?
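#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int main ()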
{
pid_t pid;
int *a = (int *)malloc(4);
printf ("heap pointer %p\n", a);
pid = fork();
if (pid < 0) {
fprintf (stderr, "Fork Failed");
exit(-1);
}
else if (pid == 0) {
printf ("Child\n");
printf ("in child heap pointer %p\n", a);
}
else {
wait (NULL);
printf ("Child Complete\n");
printf ("in parent heap pointer %p\n", a);
exit(0);
}
}
The child gets an exact copy of the parent's address space, which in many cases is likely to be laid out the same way as the parent's. I have to point out that each process has its own virtual address space, so each can have the same data at the same address, yet in a different address space. Also, Linux uses copy-on-write when creating child processes. This means that the parent and child share the parent's physical pages until one of them writes, at which point the affected page is physically copied for the writing process. This eliminates unneeded copies when exec'ing a new process: since you're just going to overwrite the memory with a new executable, why bother copying it?
Yes, you will get the same virtual address, but remember that each process has its own virtual address space.
Until a copy-on-write operation occurs, everything is shared.
So when you call strcpy or perform any other write, copy-on-write takes place: the page behind the child's virtual address is remapped to a private copy for the child, while the virtual address itself and the parent's mapping stay unchanged.
A copy means exactly that, a bit-identical copy of the virtual address space. For all intents and purposes, the two copies are indistinguishable, until you start writing to one (the changes are not visible in the other copy).
With fork() the child process receives a new address space where all the contents of the parent address space are copied (actually, modern kernels use copy-on-write).
This means that if you modify a or the value pointed to by it in one process, the other process still sees the old value.
You get two heaps. Both have the same virtual memory addresses, but those addresses are translated to different parts of physical memory.

Resources