vfork() system call - c

I read that a new process created with the vfork() system call executes as a thread in the parent's address space, and that the parent is blocked until the child calls exit() or exec(). So I wrote a program using vfork():
#include <stdio.h>
#include <unistd.h>
int main()
{
pid_t pid;
printf("Parent\n");
pid = vfork();
if(pid==0)
{
printf("Child\n");
}
return 0;
}
I got the output as follows:
Parent
Child
Parent
Child
Parent
Child
....
....
....
I assumed that the return statement calls exit() internally, so I expected the output to be only
Parent
Child
Can somebody explain why it does not stop and instead keeps printing in an infinite loop?

You should read the man page for vfork very carefully:
The vfork() function has the same effect as fork(2), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit(2) or one of the exec(3) family of functions.
(The above is from the POSIX part of the man page, so it potentially applies to environments other than Linux.)
You're calling printf and returning from the child, so the behavior of your program is undefined.

Related

Incorrect result from getpid() for grandchild with vfork() and -lpthread

In one of the special cases shown below, getpid() for the grandchild created with vfork() returns the PID of the parent process.
#include <stdio.h>
#include <stdlib.h>
int main() {
if(vfork()) { /* parent */
printf("parent pid = %d\n", getpid());
exit(0);
} else {
if(vfork()) { /* child */
printf("child pid = %d\n", getpid());
exit(0);
} else { /* grandchild */
printf("grandchild pid = %d\n", getpid());
exit(0);
}
}
}
Compiled as gcc main.c, this works as expected:
grandchild pid = 12241
child pid = 12240
parent pid = 12239
Compiled as gcc main.c -lpthread, the grandchild PID is incorrect:
grandchild pid = 12431
child pid = 12432
parent pid = 12431
Any clues why? Is this one of the undefined behavior cases?
With ps and strace, I can see the correct PID. BTW, the same example code works fine with fork(), i.e. correct getpid() with or without -lpthread.
getpid is not one of the two operations you're permitted to perform after vfork in the child; the only two are execve and _exit. It happens that glibc caches the process's pid in userspace and does not update this cache on vfork (doing so would modify the parent's cached value, and it isn't needed because valid code can't observe the result); that's the mechanism of the behavior you're seeing. The caching behavior differs slightly with -lpthread linked. But the underlying reason is that your code is invalid.
Pretty much, don't use vfork. There's basically nothing you can do with it.
From the manual page for vfork():
The vfork() function has the same effect as fork(2), except that the behavior is undefined if the
process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value
from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully
calling _exit(2) or one of the exec(3) family of functions.
It's not very nicely worded, but what this is saying is that the only things that a child process can do after vfork() are:
Check the return value.
Call one of the exec*() family of functions.
Call _exit().
This is because:
vfork() is a special case of clone(2). It is used to create new processes without copying the page tables of the parent process. It may be useful in performance-sensitive applications where a child is created which then immediately
issues an execve(2).
In other words, the intended use of vfork() is only to create children that will execute other programs through exec*(), making this faster than a normal fork() because the page table of the parent is not duplicated in the child (since it's going to be replaced by exec*() anyway). Even then though, vfork() only has a real advantage if this kind of operation needs to be performed multiple times. Since parent memory is not copied, accessing it in any way is undefined behavior.
Here are the requirements for vfork():
#include <sys/types.h>
#include <unistd.h>
pid_t vfork(void);
Notice that the OP's posted code fails to include the needed header files.

Why does vfork() cause the parent process to crash (segmentation fault)?

vfork() can change variables in the parent process, but why can't it grow the stack?
void f1()
{
vfork();
}
f2() leads to the crash.
void f2()
{
char buf[100];
}
int main()
{
f1();
f2();
_exit(0);
}
If I change vfork() to fork(), the crash won't happen.
The only thing you're allowed to do after calling vfork() is execute a file. It's right in the documentation:
The vfork() function shall be equivalent to fork(), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit() or one of the exec family of functions.
...
The use of vfork() for any purpose except as a prelude to an immediate call to a function from the exec family, or to _exit(), is not advised.
To wit, the only legal calls are _exit and exec*.

vfork never ends

The following code never ends. Why is that?
#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
#define SIZE 5
int nums[SIZE] = {0, 1, 2, 3, 4};
int main()
{
int i;
pid_t pid;
pid = vfork();
if(pid == 0){ /* Child process */
for(i = 0; i < SIZE; i++){
nums[i] *= -i;
printf("CHILD: %d ", nums[i]); /* LINE X */
}
}
else if (pid > 0){ /* Parent process */
wait(NULL);
for(i = 0; i < SIZE; i++)
printf("PARENT: %d ", nums[i]); /* LINE Y */
}
return 0;
}
Update:
This code is just to illustrate some of the confusion I have regarding vfork(). It seems that when I use vfork(), the child process doesn't copy the parent's address space; it shares it. In that case, I would expect the nums array to be updated by both processes. My question is: in what order? How does the OS synchronize the two?
As for why the code never ends, it is probably because I never explicitly call _exit() or exec(). Am I right?
UPDATE2:
I just read: 56. Difference between the fork() and vfork() system call?
and I think this article helps me with my first confusion.
The child process from vfork() system call executes in the parent’s
address space (this can overwrite the parent’s data and stack ) which
suspends the parent process until the child process exits.
To quote from the vfork(2) man page:
The vfork() function has the same effect as fork(), except that the behaviour is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit() or one of the exec family of functions.
You're doing a whole bunch of those things, so you shouldn't expect it to work. I think the real question here is: why are you using vfork() rather than fork()?
Don't use vfork. That's the simplest advice you can get. The only thing that vfork gives you is suspending the parent until the child either calls exec* or _exit. The part about sharing the address space is incorrect: some operating systems do it, others choose not to because it's very unsafe and has caused serious bugs.
Last time I looked at how applications use vfork in reality the absolute majority did it wrong. It was so bad that I threw away the 6 character change that enabled address space sharing on the operating system I was working on at that time. Almost everyone who uses vfork at least leaks memory if not worse.
If you really want to use vfork, don't do anything other than immediately call _exit or execve after it returns in the child process. Anything else and you're entering undefined territory. And I really mean "anything". You start parsing your strings to make arguments for your exec call and you're pretty much guaranteed that something will touch something it's not supposed to touch. And I also mean execve, not some other function from the exec family. Many libc out there do things in execvp, execl, execle, etc. that are unsafe in a vfork context.
What is specifically happening in your example:
If your operating system shares address space the child returning from main means that your environment cleans things up (flush stdout since you called printf, free memory that was allocated by printf and such things). This means that there are other functions called that will overwrite the stack frame the parent was stuck in. vfork returning in the parent returns to a stack frame that has been overwritten and anything can happen, it might not even have a return address on the stack to return to anymore. You first entered undefined behavior country by calling printf, then the return from main brought you into undefined behavior continent and the cleanup run after the return from main made you travel to undefined behavior planet.
From the official specification:
the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(),
In your program you modify data other than the pid variable, meaning the behavior is undefined.
You also have to call _exit to end the process, or call one of the exec family of functions.
The child must _exit rather than returning from main. If the child returns from main, then the stack frame does not exist for the parent when it returns from vfork.
Just call _exit instead of returning, or add _exit(0) as the last line of the child branch. return 0 calls exit(0), which closes stdout, so when another printf follows, the program crashes.

return value in vfork() system call

Considering the below code :
int main()
{
int pid;
pid=vfork();
if(pid==0)
printf("child\n");
else
printf("parent\n");
return 0;
}
In the case of vfork(), the address space used by the parent process and the child process is the same, so there should be a single copy of the variable pid. I can't understand how this pid variable can hold two values returned by vfork(), i.e. zero for the child and nonzero for the parent.
In the case of fork(), the address space is copied, so the child and the parent each have their own copy of pid, and I can understand how the two copies hold different values returned by fork(). But in the case of vfork(), how can pid hold two values?
There aren't 2 copies. When you call vfork, the parent freezes while the child does its thing (until it calls _exit(2) or execve(2)). So at any single moment, there's only a single pid variable.
As a side note, what you are doing is unsafe. The standard spells it clearly:
The vfork() function shall be equivalent to fork(), except that the
behavior is undefined if the process created by vfork() either
modifies any data other than a variable of type pid_t used to store
the return value from vfork(), or returns from the function in which
vfork() was called, or calls any other function before successfully
calling _exit() or one of the exec family of functions.
As a second side note, vfork has been removed from SUSv4 - there's really no point in using it.

Peterson's solution for critical section problem

#include<stdio.h>
#include<sys/types.h>
#include<stdlib.h>
int turn;
int flag[2];
int main(void)
{
int pid,parent=1;
printf("before vfork\n");
if((pid=vfork())<0)
{
perror("vfork error\n");
return 1;
}
while(1)
{
if(pid==0)
{
while(parent==1)
{
sleep(2);
}
parent=0;
flag[0]=1;
turn=1;
while(flag[1]&&turn==1);
printf("This is critical section:parent process\n");
flag[0]=0;
}
else
{
parent=2;
printf("This is parent");
flag[1]=1;
turn=0;
while(flag[0]&&turn==0);
printf("This is critical section:child process %d \n",pid);
flag[1]=0;
}
}
}
This is my code. Can anyone tell me why control never reaches my parent process?
man 2 vfork says:
(From POSIX.1) The vfork() function has the same effect as fork(2),
except that the behavior is undefined if the process created by vfork()
either modifies any data other than a variable of type pid_t used to
store the return value from vfork(), or returns from the function in
which vfork() was called, or calls any other function before
successfully calling _exit(2) or one of the exec(3) family of functions.
Because you are modifying data in child process, vfork() behaviour is undefined, so whatever it does is correct (here by "correct" I mean "complies with the spec").
Because the entire virtual address space of the parent is replicated in the child,
the two processes have separate address spaces, and flag[1] never becomes 1 in the parent process.
vfork() was designed for fast creation of processes with execve(): vfork() followed by an immediate execve(). When the child modifies any data of the parent process, the behavior is undefined. Check the man page for details.
In my case, the behavior was that the parent regained control only after the child had exited.
Here is the important bit from the Linux man page. Keywords: "parent is suspended until the child ..."
vfork() differs from fork(2) in that the parent is suspended until the
child makes a call to execve(2) or _exit(2). The child shares all
memory with its parent, including the stack, until execve(2) is issued by
the child. The child must not return from the current function or call
exit(3), but may call _exit(2).
Your parent process has been suspended. Consider using clone.
This code doesn't guarantee fairness; it only prevents deadlock.
There's nothing stopping the child process from executing over and over again.
If you want the lock to be fair, you'll have to make it so.
