I want to know how copy-on-write happens in fork().
Assuming we have a process A that has a dynamical int array:
int *array = malloc(1000000*sizeof(int));
Elements in array are initialized to some meaningful values.
Then, we use fork() to create a child process, namely B.
B will iterate the array and do some calculations:
for(a in array){
a = a+1;
}
I know B will not copy the entire array immediately, but when does the child B allocate memory for array? during fork()?
Does it allocate the entire array all at once, or only a single integer for a = a+1?
a = a+1; how does this happen? Does B read data from A and write new data to its own array?
I wrote some code to explore how COW works. My environment: ubuntu 14.04, gcc4.8.2
#include <stdlib.h>
#include <stdio.h>
#include <sys/sysinfo.h>
void printMemStat(){
struct sysinfo si;
sysinfo(&si);
printf("===\n");
printf("Total: %llu\n", si.totalram);
printf("Free: %llu\n", si.freeram);
}
int main(){
long len = 200000000;
long *array = malloc(len*sizeof(long));
long i = 0;
for(; i<len; i++){
array[i] = i;
}
printMemStat();
if(fork()==0){
/*child*/
printMemStat();
i = 0;
for(; i<len/2; i++){
array[i] = i+1;
}
printMemStat();
i = 0;
for(; i<len; i++){
array[i] = i+1;
}
printMemStat();
}else{
/*parent*/
int times=10;
while(times-- > 0){
sleep(1);
}
}
return 0;
}
After fork(), the child process modifies a half of numbers in array, and then modifies the entire array. The outputs are:
===
Total: 16694571008
Free: 2129162240
===
Total: 16694571008
Free: 2126106624
===
Total: 16694571008
Free: 1325101056
===
Total: 16694571008
Free: 533794816
It seems that the array is not allocated as a whole. If I slightly change the first modification phase to:
i = 0;
for(; i<len/2; i++){
array[i*2] = i+1;
}
The outputs will be:
===
Total: 16694571008
Free: 2129924096
===
Total: 16694571008
Free: 2126868480
===
Total: 16694571008
Free: 526987264
===
Total: 16694571008
Free: 526987264
Depends on the Operating System, hardware architecture and libc. But yes in case of recent Linux with MMU the fork(2) will work with copy-on-write. It will only (allocate and) copy a few system structures and the page table, but the heap pages actually point to the ones of the parent until written.
More control over this can be exercised with the clone(2) call. And vfork(2) beeing a special variant which does not expect the pages to be used. This is typically used before exec().
As for the allocation: the malloc() has meta information over requested memory blocks (address and size) and the C variable is a pointer (both in process memory heap and stacks). Those two look the same for the child (same values because same underlying memory page seen in the address space of both processes). So from a C program point of view the array is already allocated and the variable initialized when the process comes into existence. The underlying memory pages are however pointing to the original physical ones of the parent process, so no extra memory pages are needed until they are modified.
If the child allocates a new array it depends if it fits into the already existing heap pages or if the brk of the process needs to be increased. In both cases only the modified pages get copied and the new pages get allocated only for the child.
This also means that the physical memory might run out after malloc(). (Which is bad as the program cannot check the error return code of "a operation in a random code line"). Some operating systems will not allow this form of overcommit: So if you fork a process it will not allocate the pages, but it requires them to be available at that moment (kind of reserves them) just in case. In Linux this is configurable and called overcommit-accounting.
Some systems have a system call vfork(), which was originally
designed as a lower-overhead version of fork(). Since
fork() involved copying the entire address space of the process,
and was therefore quite expensive, the vfork() function was
introduced (in 3.0BSD).
However, since vfork() was introduced, the
implementation of fork() has improved drastically, most notably
with the introduction of 'copy-on-write', where the copying of the
process address space is transparently faked by allowing both processes
to refer to the same physical memory until either of them modify
it. This largely removes the justification for vfork(); indeed, a
large proportion of systems now lack the original functionality of
vfork() completely. For compatibility, though, there may still be
a vfork() call present, that simply calls fork() without
attempting to emulate all of the vfork() semantics.
As a result, it is very unwise to actually make use of any of the
differences between fork() and vfork(). Indeed, it is
probably unwise to use vfork() at all, unless you know exactly
why you want to.
The basic difference between the two is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve(), at which point the parent process continues.
This means that the child process of a vfork() must be careful to
avoid unexpectedly modifying variables of the parent process. In
particular, the child process must not return from the function
containing the vfork() call, and it must not call
exit() (if it needs to exit, it should use _exit();
actually, this is also true for the child of a normal fork()).
Related
There are similar question all over and one which is closes is from this stack exchange site. But even tho I learned a lot reading them none of them a exactly answer my question.
Here is the question, say I have this program (it is a very simplified version)
inline void execute(char *cmd)
{
int exitstat = 0;
//lots of other thins happed
exitstat = execve(cmd, cmdarg, environ);
free(cmd), free(other passed adress); ## DO I NEED THIS, since I am already about to exit
_exit (exitstat);
}
int main(void)
{
int childstat;
char *str = malloc(6); //I also have many more allocated space in heap in the parent process
strcpy(str, "Hello");
pid_t childid = fork() //creat a child process, which will also get a copy of all the heap memories even tho it is CoW.
if (childid < 0)
exit(-1);
if (child == 0)
{
execute(str);
}
else
{
wait(&childstat);
free(str);
}
//do somethign else with str with other functions and the rest of the program
}
In this program, the parent process does its thing for a while, allocates a lot of process in the heap, free some, keep other for later and at some point it wants to execute some command, but it doesn't want to terminate, so it creates a child to do its biting.
The child then calls another function which will do some task and in the end use execve to execute the command passed to it. If this was successful there would be no problem, since the executed program will handle all the allocated spaces, but if it failed the child exits with a status code. The parent waits for the child to answer, and when it does it moves on to its next routine. The problem here is that when the child fails to execute all the heap data remains allocated, but does that matter? since,
it is about to be exited int he next line,
Even though I am getting started with this, I have learned that a new process is created during fork, so the memory leak shouldn't affect the parent since the child is about to die.
If my assumption is correct and this leaks doesn't actually matter, why does valgrind lament about them?
Would there be a better way to free all the memories in the child heap with out actually passing all the memores (str, and the others in this example) to the execute function and laboriously call free each time? does kill() have a mechanism for this?
Edit
Here is working code
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <string.h>
inline void execute(char *cmd)
{
extern char **environ;
int exitstat = 0;
char *cmdarg[2];
cmdarg[0] = cmd;
cmdarg[1] = NULL;
exitstat = execve(cmd, cmdarg, environ);
free(cmd);
_exit (exitstat);
}
int main(void)
{
int childstat;
char *str = malloc(6);
pid_t childid;
strcpy(str, "Hello");
childid = fork();
if (childid < 0)
exit(-1);
if (childid == 0)
{
execute(str);
}
else
{
wait(&childstat);
free(str);
}
return (0);
}
When there is a free inside the child process here is valgrinds result.
When run with the free inside the child process is commented.(#free(cmd) in the execute function).
As you might see there is no overall error () since this programm is short lived. But in my real program which has an infinite loop every problem in the child () matters. So I was asking if I be worried by this leaked memories in this child process just before exiting and hunt them down or just leave them as they are in another virtual space and they are cleaned as their processes exit (but in that case why is valgrind still complaining about them if their corresponding process clean them during exit)
exitstat = execve(cmd, cmdarg, environ);
free(cmd), free(other passed adress);
If successful, execve does not return. No code is executed after it, the execution is terminated and the cmd process is started. All memory hold by current process is freed anyway by the operating system. The code free(cmd), free(other passed adress); will not execute if execve call is successful.
The only case where those liens could be relevant are in case of execve error. And in case execve returns with an error, yes, you should free that memory. I much doubt most programs with execve actually do that and I believe they would just call abort() after failed execve call.
inline void execute(char *cmd)
Please read about inline. inline is tricky. I recommend not use it, forget it exists. I doubt the intention here is to make an inline function - there is no other function the compiler can choose from anyway. After fixing typos in code, I could not compile the code because of undefined reference to execute - inline has to be removed.
The problem here is that when the child fails to execute all the heap data remains allocated, but does that matter?
Ultimately it depends on what do you want to do after the child fails. If you want to just call exit and you do not care about memory leaks in such case, then just don't care and it will not matter.
I see glibc exec*.c functions try to avoid dynamic allocation as much as possible and use variable length arrays or alloca().
If my assumption is correct and this leaks doesn't actually matter, why does valgrind lament about them?
It doesn't. If you correct all the mistakes in your code, then valgrind will not "lament" about memory leaks before calling execve (well, unless the execve call fails).
Would there be a better way to free all the memories in the child heap with out actually passing all the memores (str, and the others in this example) to the execute function and laboriously call free each time?
"Better" will tend to be opinion based. Subjectively: no. But writing/using own wrapper around dynamic allocation with garbage collector to free all memory is also an option.
does kill() have a mechanism for this?
No, sending a signal seems unrelated to freeing memory.
There is no simple solution.
On termination, the OS will free all memory for a process. That means that you don't have to free everything. However, for memory that is no longer needed it is usually best to free it as soon as possible. That leaves things like buffers that could potentially be used right up to the last moment before termination.
Reasons to free memory.
There's only one that I can think of. If you have a large number of blocks in use at the end it can be difficult to sort the wheat from the chaff. Either you end up with enormous logs that no-one looks at, or you have to maintain a suppression file. Either way you risk overlooking genuine issues.
Reasons not to free memory
Small performance gain
It can be quite difficult sometimes to free everything. For instance if you use putenv() then knowing whether the string was added (and needs freeing) or replaced (must not be freed) is a bit tricky.
If you need to use atexit() to free things, then there is a limit on the number of functions that you can call via atexit(). That might be a problem if you have many things being freed in many modules.
My advice is try to free most things but don't sweat it or you'll get bitten by the law of diminishing returns. Use a suppression file judiciously for the few percent that are tricky to free.
Can someone help me understand how the system handles variables that are set before a process makes a fork() call. Below is a small test program I wrote to try understanding what is going on behind the scenes.
I understand that the current state of a process is "cloned", variables included, at the time of the forking. My thought was, that if I malloc'd a 2D array before calling fork, I would need to free the array both in the parent and the child processes.
As you can see from the results below the sample code, the two values act as if they are totally separate from each other, yet they have the exact same address space. I expected that my final result for tmp would be -4 no matter which process completed first.
I am newer to C and Linux, so could someone explain how this is possible? Perhaps the variable tmp becomes a pointer to a pointer which is distinct in each process? Thanks so much.
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
int tmp = 1;
pid_t forkReturn= fork();
if(!forkReturn) {
/*child*/
tmp=tmp+5;
printf("Value for child %d\n",tmp);
printf("Address for child %p\n",&tmp);
}
else if(forkReturn > 0){
/*parent*/
tmp=tmp-10;
printf("Value for parent %d\n",tmp);
printf("Address for parent %p\n",&tmp);
}
else {
/*Error calling fork*/
printf("Error calling fork);
}
return 0;
}
RESULTS of standard out:
Value for child 6
Address for child 0xbfb478d8
Value for parent -9
Address for parent 0xbfb478d8
It did indeed copy the entire address space, and changing memory in the child process does not affect the parent. The key to understanding this is to remember that a pointer can only point to something in your own process, and the copy happens at a lower level.
However, you should not call malloc() or free() at all in the child of fork. This can deadlock (another thread was in malloc() when you called fork()). The only functions safe to call in the child are the ones also listed as safe for signal handlers. I used to be able to claim this was true only if you wrote multithreaded code; however Apple was kind enough to spawn a background thread in the standard library, so the deadlock is real all the time. The child of fork should never be allowed to drop out of the if block. Call _exit to make sure it doesn't.
I quote "when a process creates a new process using fork() call, Only the shared memory segments are shared between the parent process and the newly forked child process. Copies of the stack and the heap are made for the newly created process" from "operating system concepts" solutions by Silberschatz.
But when I tried this program out
#include <stdio.h>
#include <sys/types.h>
#define MAX_COUNT 200
void ChildProcess(void); /* child process prototype */
void ParentProcess(void); /* parent process prototype */
void main(void)
{
pid_t pid;
char * x=(char *)malloc(10);
pid = fork();
if (pid == 0)
ChildProcess();
else
ParentProcess();
printf("the address is %p\n",x);
}
void ChildProcess(void)
{
printf(" *** Child process ***\n");
}
void ParentProcess(void)
{
printf("*** Parent*****\n");
}
the result is like:
*** Parent*****
the address is 0x1370010
*** Child process ***
the address is 0x1370010
both parent and child printing the same address which is in heap.
can someone explain me the contradiction here. please clearly state what are all the things shared by the parent and child in memory space.
Quoting myself from another thread.
When a fork() system call is issued, a copy of all the pages
corresponding to the parent process is created, loaded into a separate
memory location by the OS for the child process. But this is not
needed in certain cases. Consider the case when a child executes an
"exec" system call or exits very soon after the fork(). When the
child is needed just to execute a command for the parent process,
there is no need for copying the parent process' pages, since exec
replaces the address space of the process which invoked it with the
command to be executed.
In such cases, a technique called copy-on-write (COW) is used. With
this technique, when a fork occurs, the parent process's pages are not
copied for the child process. Instead, the pages are shared between
the child and the parent process. Whenever a process (parent or child)
modifies a page, a separate copy of that particular page alone is made
for that process (parent or child) which performed the modification.
This process will then use the newly copied page rather than the
shared one in all future references. The other process (the one which
did not modify the shared page) continues to use the original copy of
the page (which is now no longer shared). This technique is called
copy-on-write since the page is copied when some process writes to it.
Also, to understand why these programs appear to be using the same space of memory (which is not the case), I would like to quote a part of the book "Operating Systems: Principles and Practice".
Most modern processors introduce a level of indirection, called
virtual addresses. With virtual addresses, every process's memory
starts at the "same" place, e.g., zero.
Each process thinks that it has the entire machine to itself, although
obviously that is not the case in reality.
So these virtual addresses are translations of physical addresses and doesn't represent the same physical memory space, to leave a more practical example we can do a test, if we compile and run multiple times a program that displays the direction of a static variable, such as this program.
#include <stdio.h>
int main() {
static int a = 0;
printf("%p\n", &a);
getchar();
return 0;
}
It would be impossible to obtain the same memory address in two
different programs if we deal with the physical memory directly.
And the results obtained from running the program several times are...
Yes, both processes are using the same address for this variable, but these addresses are used by different processes, and therefore aren't in the same virtual address space.
This means that the addresses are the same, but they aren't pointing to the same physical memory. You should read more about virtual memory to understand this.
The address is the same, but the address space is not. Each process has its own address space, so parent's 0x1370010 is not the same as child's 0x1370010.
You're probably running your program on an operating system with virtual memory. After the fork() call, the parent and child have separate address spaces, so the address 0x1370010 is not pointing to the same place. If one process wrote to *x, the other process would not see the change. (In fact those may be the same page of memory, or even the same block in a swap-file, until it's changed, but the OS makes sure that the page is copied as soon as either the parent or the child writes to it, so as far as the program can tell it's dealing with its own copy.)
When the kernel fork()s the process, the copied memory information inherits the same address information since the heap is effectively copied as-is. If addresses were different, how would you update pointers inside of custom structs? The kernel knows nothing about that information so those pointers would then be invalidated. Therefore, the physical address may change (and in fact often will change even during the lifetime of your executable even without fork()ing, but the logical address remains the same.
Yes address in both the case is same. But if you assign different value for x in child process and parent process and then also prints the value of x along with address of x, You will get your answer.
#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_COUNT 200
void ChildProcess(void); /* child process prototype */
void ParentProcess(void); /* parent process prototype */
void main(void)
{
pid_t pid;
int * x = (int *)malloc(10);
pid = fork();
if (pid == 0) {
*x = 100;
ChildProcess();
}
else {
*x = 200;
ParentProcess();
}
printf("the address is %p and value is %d\n", x, *x);
}
void ChildProcess(void)
{
printf(" *** Child process ***\n");
}
void ParentProcess(void)
{
printf("*** Parent*****\n");
}
Output of this will be:
*** Parent*****
the address is 0xf70260 and value is 200
*** Child process ***
the address is 0xf70260 and value is 100
Now, You can see that value is different but address is same. So The address space for both the process is different. These addresses are not actual address but logical address so these could be same for different processes.
The following code never ends. Why is that?
#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
#define SIZE 5
int nums[SIZE] = {0, 1, 2, 3, 4};
int main()
{
int i;
pid_t pid;
pid = vfork();
if(pid == 0){ /* Child process */
for(i = 0; i < SIZE; i++){
nums[i] *= -i;
printf(”CHILD: %d “, nums[i]); /* LINE X */
}
}
else if (pid > 0){ /* Parent process */
wait(NULL);
for(i = 0; i < SIZE; i++)
printf(”PARENT: %d “, nums[i]); /* LINE Y */
}
return 0;
}
Update:
This code is just to illustrate some of the confusions I have regarding to vfork(). It seems like when I use vfork(), the child process doesn't copy the address space of the parent. Instead, it shares the address space. In that case, I would expect the nums array get updated by both of the processes, my question is in what order? How the OS synchronizes between the two?
As for why the code never ends, it is probably because I don't have any _exit() or exec() statement explicitly for exit. Am I right?
UPDATE2:
I just read: 56. Difference between the fork() and vfork() system call?
and I think this article helps me with my first confusion.
The child process from vfork() system call executes in the parent’s
address space (this can overwrite the parent’s data and stack ) which
suspends the parent process until the child process exits.
To quote from the vfork(2) man page:
The vfork() function has the same effect as fork(), except that the behaviour is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit() or one of the exec family of functions.
You're doing a whole bunch of those things, so you shouldn't expect it to work. I think the real question here is: why you're using vfork() rather than fork()?
Don't use vfork. That's the simplest advice you can get. The only thing that vfork gives you is suspending the parent until the child either calls exec* or _exit. The part about sharing the address space is incorrect, some operating systems do it, other choose not to because it's very unsafe and has caused serious bugs.
Last time I looked at how applications use vfork in reality the absolute majority did it wrong. It was so bad that I threw away the 6 character change that enabled address space sharing on the operating system I was working on at that time. Almost everyone who uses vfork at least leaks memory if not worse.
If you really want to use vfork, don't do anything other than immediately call _exit or execve after it returns in the child process. Anything else and you're entering undefined territory. And I really mean "anything". You start parsing your strings to make arguments for your exec call and you're pretty much guaranteed that something will touch something it's not supposed to touch. And I also mean execve, not some other function from the exec family. Many libc out there do things in execvp, execl, execle, etc. that are unsafe in a vfork context.
What is specifically happening in your example:
If your operating system shares address space the child returning from main means that your environment cleans things up (flush stdout since you called printf, free memory that was allocated by printf and such things). This means that there are other functions called that will overwrite the stack frame the parent was stuck in. vfork returning in the parent returns to a stack frame that has been overwritten and anything can happen, it might not even have a return address on the stack to return to anymore. You first entered undefined behavior country by calling printf, then the return from main brought you into undefined behavior continent and the cleanup run after the return from main made you travel to undefined behavior planet.
From the official specification:
the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(),
In your program you modify data other than the pid variable, meaning the behavior is undefined.
You also have to call _exit to end the process, or call one of the exec family of functions.
The child must _exit rather than returning from main. If the child returns from main, then the stack frame does not exist for the parent when it returns from vfork.
just call the _exit instead of calling return or insert _exit(0) to the last line in "child process". return 0 calls exit(0) while close the stdout, so when another printf follows, the program crashes.
I have a parent with 5 child processes. I'm wanting to send a random variable to each child process. Each child will square the variable and send it back to the parent and the parent will sum them all together.
Is this even possible? I can't figure it out...
edit: this process would use shared memory.
There are a great number of ways to do this, all involving some form of inter-process communication. Which one you choose will depend on many factors, but some are:
shared memory.
pipes (popen and such).
sockets.
In general, I would probably popen a number of communications sessions in the parent before spawning the children; the parent will know about all five but each child can be configured to use only one.
Shared memory is also a possibility, although you'd probably have to have a couple of values in it per child to ensure communications went smoothly:
a value to store the variable and return value.
a value to store the state (0 = start, 1 = variable ready for child, 2 = variable ready for parent again).
In all cases, you need a way for the children to only pick up their values, not those destined for other children. That may be as simple as adding a value to the shared memory block to store the PID of the child. All children would scan every element in the block but would only process those where the state is 1 and the PID is their PID.
For example:
Main creates shared memory for five children. Each element has state, PID and value.
Main sets all states to "start".
Main starts five children who all attach to the shared memory.
Main stores all their PIDs.
All children start scanning the shared memory for state = "ready for child" and their PID.
Main puts in first element (state = "ready for child", PID = pid1, value = 7).
Main puts in second element (state = "ready for child", PID = pid5, value = 9).
Child pid1 picks up first element, changes value to 49, sets state to "ready for parent"), goes back to monitoring.
Child pid5 picks up second element, changes value to 81, sets state to "ready for parent"), goes back to monitoring.
Main picks up pid5's response, sets that state back to "start.
Main picks up pid1's response, sets that state back to "start.
This gives a measure of parallelism with each child continuously monitoring the shared memory for work it's meant to do, Main places the work there and periodically receives the results.
The nastiest method uses vfork() and lets the different children trample on different parts of memory before exiting; the parent then just adds up the modified bits of memory.
Highly unrecommended - but about the only case I've come across where vfork() might actually have a use.
Just for amusement (mine) I coded this up:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/wait.h>
int main(void)
{
int i;
int array[5];
int square[5];
long sum = 0;
srand(time(0));
for (i = 0; i < 5; i++)
{
array[i] = rand();
if (vfork() == 0)
{
square[i] = array[i] * array[i];
execl("/bin/true", "/bin/true", (char *)0);
}
else
wait(0);
}
for (i = 0; i < 5; i++)
{
printf("in: %d; square: %d\n", array[i], square[i]);
sum += square[i];
}
printf("Sum: %d\n", sum);
return(0);
}
This works. The previous trial version using 'exit(0)' in place of 'execl()' did not work; the square array was all zeroes. Example output (32-bit compilation on Solaris 10, SPARC):
in: 22209; square: 493239681
in: 27082; square: 733434724
in: 2558; square: 6543364
in: 17465; square: 305026225
in: 6610; square: 43692100
Sum: 1581936094
Sometimes, the sum overflows - there is much room for improvement in the handling of that.
The Solaris manual page for 'vfork()' says:
Unlike with the fork() function, the child process borrows
the parent's memory and thread of control until a call to
execve() or an exit (either abnormally or by a call to
_exit() (see exit(2)). Any modification made during this
time to any part of memory in the child process is reflected
in the parent process on return from vfork(). The parent
process is suspended while the child is using its resources.
That probably means the 'wait()' is unnecessary in my code. (However, trying to simplify the code seemed to make it behave indeterminately. It is rather crucial that i does not change prematurely; the wait() does ensure that synchronicity. Using _exit() instead of execl() also seemed to break things. Don't use vfork() if you value your sanity - or if you want any marks for your homework.)
Things like the anti thread might make this a little easier for you, see the examples (in particular the ns lookup program).