I'm currently studying the fork() function in C. I understand what it does (I think). Why do we check it in the following program?
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
int main()
{
pid_t pid;
pid=fork();
if(pid<0) /* Why is this here? */
{
fprintf(stderr, "Fork failed\n");
exit(EXIT_FAILURE);
}
else if (pid == 0)
{
printf("Printed from the child process\n");
}
else
{
printf("Printed from the parent process\n");
wait(NULL); /* wait() takes an int * for the status, not the pid */
}
}
In this program we check if the PID returned is < 0, which would indicate a failure. Why can fork() fail?
From the man page:
Fork() will fail and no child process will be created if:
[EAGAIN] The system-imposed limit on the total number of processes under execution would be exceeded. This limit is configuration-dependent.
[EAGAIN] The system-imposed limit MAXUPRC (<sys/param.h>) on the total number of processes under execution by a single user would be exceeded.
[ENOMEM] There is insufficient swap space for the new process.
(This is from the OS X man page, but the reasons on other systems are similar.)
fork can fail because you live in the real world, not some infinitely-recursive mathematical fantasy-land, and thus resources are finite. In particular, sizeof(pid_t) is finite, and this puts a hard upper bound of 256^sizeof(pid_t) on the number of times fork could possibly succeed (without any of the processes terminating). Aside from that, you also have other resources to worry about like memory.
Perhaps there is not enough memory available to create the new process.
If the kernel fails to allocate memory, for example, that's pretty bad and would cause fork() to fail.
Have a look at the error codes here:
http://linux.die.net/man/2/fork
Apparently it can also fail (or rather, not really fail but hang indefinitely) when the following things come together:
trying to profile some code
many threads
much memory allocation
See also:
clone() syscall infinitely restarts because of SIGPROF signals #97
Hanging in ARCH_FORK with CPUPROFILE #704
SIGPROF keeps a large task from ever completing a fork(). Bug 645528
Example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main()
{
size_t sz = 32*(size_t)(1024*1024*1024);
char *p = (char*)malloc(sz);
memset(p, 0, sz);
fork();
return 0;
}
Build:
gcc -pg tmp.c
Run:
./a.out
I am reading the Operating Systems: Three Easy Pieces book, Chapter 5.
It says:
The fork() system call is strange; its partner in crime, exec(), is
not so normal either. What it does: given the name of an executable
(e.g., wc), and some arguments (e.g., p3.c), it loads code (and static
data) from that executable and overwrites its current code segment
(and current static data) with it; the heap and stack and other parts
of the memory space of the program are re-initialized.
Then I have a question with this sample code in this chapter:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <assert.h>
#include <sys/wait.h>
int
main(int argc, char *argv[])
{
int rc = fork();
if (rc < 0) {
// fork failed; exit
fprintf(stderr, "fork failed\n");
exit(1);
} else if (rc == 0) {
// child: redirect standard output to a file
close(STDOUT_FILENO);
open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
// now exec "wc"...
char *myargs[3];
myargs[0] = strdup("wc"); // program: "wc" (word count)
myargs[1] = strdup("p4.c"); // argument: file to count
myargs[2] = NULL; // marks end of array
execvp(myargs[0], myargs); // runs word count
} else {
// parent goes down this path (original process)
int wc = wait(NULL);
assert(wc >= 0);
}
return 0;
}
According to man strdup, myargs[0] and myargs[1] are created with malloc on the heap. So when execvp reinitializes the heap and stack of the child's memory space, won't they be cleared or destroyed, so that using myargs[0] and myargs[1] would be undefined behaviour?
The kernel copies the arguments from the myargs array into the new process image before the old process memory is zapped, precisely so there is no problem with memory accesses.
The POSIX specification for execvp() et al says:
The arguments specified by a program with one of the exec functions shall be passed on to the new process image in the corresponding main() arguments.
That page specifies a lot of other key behaviours of the exec() family of functions. You'll probably find that the Linux equivalent page specifies even more things that are affected (or not affected) by the exec() family of functions.
Note that if a function from the exec() family succeeds, it does not return. If it returns, it failed. There's no need to check the return value (because it will always be -1). But there is usually a need to report that the execution failed and most often, a process exits with a non-zero status after a failed exec().
I am writing a pretty simple program.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(){
int pipefd[2];
pid_t c;
int value[2];
c = fork();
if(c<0){
perror("in fork");
exit(1);
}
if(c==0){
printf("i am the child\n");
int buf[2];
buf[0]=3;
buf[1]=0;
write(pipefd[1], buf, 4);
write(pipefd[1],buf+1,4);
close(pipefd[1]);
exit(0);
}
if (pipe(pipefd) == -1) { /*UPDATE */
perror("pipe");
exit(EXIT_FAILURE);
}
read(pipefd[0], value, 4);
read(pipefd[0], value+1, 4);
close(pipefd[0]);
printf("%d %d\n", value[0], value[1]);
exit(0);
}
What I intend to do is to achieve:
value[0] = buf[0];
value[1] = buf[1];
( and print those of course).
But all I get as a result is :
-1299582208 32766
i am the child
Because I have ints, I assumed that each will hold 4 bytes. And I think that for an int array each element will hold 4 bytes. But clearly I am missing something. Any help?
As I mentioned in my top comment: Where is the pipe syscall?
Without it, the write and read calls will probably fail because pipefd has random values.
So, the parent will never have value filled in correctly.
Because these [uninitialized] values are on the stack, they will have random values, which is what you're seeing.
This is UB [undefined behavior].
Different systems/compilers may manipulate the stack differently, which is why you see different [yet still random] results on different configurations.
To fix, add the following above your fork call:
pipe(pipefd);
I downloaded, built, and ran your program. Before I added the fix, I got random values. After applying the fix, I get 3 0 as the output, which is what you expected/wanted.
Note: As others have mentioned, you could check the return codes for read and write. If you had, they might return -1 and put an error code in errno that would have helped you debug the issue.
A very simple fix would be to put a sleep(1) call right above your read() calls - obviously this isn't a great solution.
An important early lesson in multi process programming and communications is "race conditions". Your fork'd child is executing before the parent, it seems. I bet if you ran this 20 times, you might get X number of times where it does what you want!
You cannot guarantee the order of execution. So a sleep(1) will suffice until you learn more advanced techniques on resource locking (mutexes, semaphores).
Within the child process, is there any way that it can determine whether it was launched as a fork with overlay memory, or a vfork with shared memory?
Basically, our logging engine needs to be much more careful (and not log some classes of activity) in vfork. In fork it needs to cooperate with the parent process in ways that it doesn't in vfork. We know how to do those two things, but not how to decide.
I know I could probably intercept the fork/vfork/clone calls, and store the fork/vfork/mapping status as a flag, but it would make life somewhat simpler if there was an API call the child could make to determine its own state.
Extra marks: Ideally I also need to pick up any places in libraries that have done a fork or vfork and then called back into our code. And how can that happen? At least one of the libraries we have offers a popen-like API where a client call-back is called from the fork child before the exec. Clearly the utility of that call-back is significantly restricted in vfork.
All code not specifically designed to work under vfork() doesn't work under vfork().
Technically, you can check if you're in a vfork() child by calling mmap() and checking if the memory mapping was inherited by the parent process under /proc. Do not write this code. It's a really bad idea and nobody should be using it. Really, the best way to tell if you're in a vfork() child or not is to be passed that information. But here comes the punchline. What are you going to do with it?
The things you can't do as a vfork() child include calling fprintf(), puts(), fopen(), or any other standard I/O function, nor malloc() for that matter. Unless the code is very carefully designed, you're best off not calling into your logging framework at all, and if it is carefully designed you don't need to know. A better design would most likely be to log your intent before calling vfork() in the first place.
You ask in comments about a library calling fork() and then back into your code. That's already kind of bad. But no library should ever ever call vfork() and back into your code without being explicitly documented as doing so. vfork() is a constrained environment and calling things not expected to be in that environment really should not happen.
A simple solution could use pthread_atfork(). The callbacks registered with this service are triggered only upon fork(). So, the 3rd parameter of the function, which is called in the child process right after the fork, could update a global variable. The child can check the variable and if it is modified, then it has been forked:
/*
Simple program which demonstrates a solution to
make the child process know if it has been forked or vforked
*/
#include <pthread.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
pid_t forked;
void child_hdl(void)
{
forked = getpid();
}
int main(void)
{
pid_t pid;
pthread_atfork(0, 0, child_hdl);
pid = fork();
if (pid == 0) {
if (forked != 0) {
printf("1. It is a fork()\n");
}
exit(0);
}
// Father continues here
wait(NULL);
pid = vfork();
if (pid == 0) {
if (forked != 0) {
printf("2. It is a fork()\n");
}
_exit(0);
}
// Father continues here
wait(NULL);
return 0;
}
Build/execution:
$ gcc fork_or_vfork.c
$ ./a.out
1. It is a fork()
I came across kcmp today, which looks like it can answer the basic question - i.e. do two tids or pids share the same VM. If you know they represent forked parent/child pids, this can perhaps tell you if they are vfork()ed.
Of course, if they are tids in the same thread group then they will by definition share VM.
https://man7.org/linux/man-pages/man2/kcmp.2.html
int syscall(SYS_kcmp, pid_t pid1, pid_t pid2, int type,
unsigned long idx1, unsigned long idx2);
KCMP_VM
Check whether the processes share the same address space.
The arguments idx1 and idx2 are ignored. See the
discussion of the CLONE_VM flag in clone(2).
If you were created by vfork, your parent will be waiting for you to terminate. Otherwise, it's still running. Here's some very ugly code:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
void doCheck()
{
char buf[512];
sprintf(buf, "/proc/%d/wchan", (int) getppid());
int j = open(buf, O_RDONLY);
if (j < 0) { printf("No open!\n"); return; }
int k = read(j, buf, 500);
close(j);
if (k <= 0) { printf("k=%d\n", k); return; } /* avoid indexing buf with a negative k */
buf[k] = 0;
char *ptr = strstr(buf, "vfork");
if (ptr != NULL)
printf("I am the vfork child!\n");
else
printf("I am the fork child!\n");
}
int main()
{
if (fork() == 0)
{
doCheck();
_exit(0);
}
sleep(1);
if (vfork() == 0)
{
doCheck();
_exit(0);
}
sleep(1);
}
This is not perfect: the parent might be waiting for a subsequent vfork call to complete.
I'm not familiar with C at all.
How do I start a child process? This child process is going to execute the specified command with a call to execve(). It should search the directories specified in the environment variable PATH until the command is found as an executable file.
I've done this so far:
//Do commands
pid_t childId;
// Fork the child process
child_id = safefork.c(); //can't use fork();
safefork.c
Code provided by the tutor; do not damn the messenger!
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/errno.h>
extern int errno;
#define MAX_PROCESSES 6
static int n_processes(void)
{
return system("exit `/bin/ps | /store/bin/wc -l`")/256;
}
pid_t safefork(void)
{
static int n_initial = -1;
if (n_initial == -1) /* First time the function is called: */
n_initial = n_processes();
else if (n_processes() >= n_initial+MAX_PROCESSES) {
sleep(2);
errno = EAGAIN;
return (pid_t)-1;
}
return fork();
}
Fixing your code
In your code, get the name of the variable spelled consistently (child_id is not the same as childId).
pid_t child_id = safefork();
if (child_id < 0)
{
...handle error...
}
else if (child_id == 0)
{
...do childish code - execve() etc...
}
else
{
...do parental code - waitpid() etc...
}
Note that the fork() call within the safefork() function that you're given to use is responsible for creating the new process. That's all it takes; that's the way it's done for all processes except the very first process.
Why not fork()?
What do you mean by "can't use fork()"? The main alternative mechanism is vfork(), which is a very restricted variant of fork() (do not use it); or maybe you could use posix_spawn() or posix_spawnp() — which are incredibly complex alternatives. I don't think there are any other options.
After forking, you might be able to use execvp() instead of execve() — it will do the path search for you. Unless, of course, the purpose of the exercise is to implement execvp() in terms of execve().
Your code uses the notation safefork.c(), but that is not usually correct in C; I could devise a structure type that would make it work, but it probably isn't what you meant.
We got another file called safefork.c — we are not allowed to use fork, only safefork which is already given.
[…before the code was posted]
OK; that's very curious. Presumably, you got a header safefork.h which declares whatever function you're supposed to use (perhaps extern pid_t safefork(void);), and the file safefork.c which does something to wrap around fork(). 'Tis odd: I don't think fork() is a dangerous function. I'd be curious to see what the 'safe fork' does, but I'm sceptical that it is significantly safer than the standard fork function. (I suppose it could do some things like fflush(0) before invoking fork(), or do an error exit if the fork() fails, but that's pushing the envelope.)
[…after the code was posted]
A critique of the code for safefork(), which I fully recognize is not your own code but code that is given to you to use.
The code for safefork() is an abomination. It runs a shell via system() which runs ps and wc to find out how many processes you currently have running, and goes to sleep for 2 seconds if you can't do the fork() because there are too many processes running (more than 6, maybe including the 3 that the safefork() is running!) and then returns "I failed". Someone needs their head seeing to (and no, that isn't you; it is the author of the code).
Oh, and extern int errno; is incorrect; the only safe way to declare errno is by #include <errno.h>. Negative marks to the teacher for that blunder. It is not a good idea to #include <sys/errno.h>, and #include <sys/types.h> is not often needed in modern POSIX (from POSIX 2008 onwards at any rate; it may have been unnecessary before that). In the context of the safefork.h header, making it self-contained does require #include <sys/types.h>.
Even assuming that safefork() is a good idea (it isn't), it should be implemented as shown below.
safefork.h
#ifndef SAFEFORK_H_INCLUDED
#define SAFEFORK_H_INCLUDED
#include <sys/types.h> // pid_t
extern pid_t safefork(void);
#endif
safefork.c
#include "safefork.h"
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_PROCESSES 6
static int n_processes(void)
{
return system("exit `/bin/ps | /store/bin/wc -l`") / 256;
}
pid_t safefork(void)
{
static int n_initial = -1;
if (n_initial == -1)
n_initial = n_processes();
else if (n_processes() >= n_initial+MAX_PROCESSES)
{
errno = EAGAIN;
return (pid_t)-1;
}
return fork();
}
I am having some trouble understanding the following simple C code:
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int n=0;
fork();
n++;
printf("hello: %d\n", n);
}
My current understanding of a fork is that from that line of code on, it will split the rest of the code in 2, that will run in parallel until there is "no more code" to execute.
From that prism, the code after the fork would be:
a)
n++; //sets n = 1
printf("hello: %d\n", n); //prints "hello: 1"
b)
n++; //sets n = 2
printf("hello: %d\n", n); //prints "hello: 2"
What happens, though, is that both print
hello: 1
Why is that?
EDIT: Only now it occurred to me that, contrary to threads, processes don't share the same memory. Is that right? If yes, then that'd be the reason.
After fork() you have two processes, each with its own "n" variable.
fork() starts a new process, sharing no variables/memory locations.
It is very similar to what happens if you execute ./yourprogram twice in a shell, assuming the first thing the program does is forking.
When fork() returns, both processes may still be referring to the same physical copy of n. But at n++, each gets its own copy with n=0 (copy-on-write). At the end of n++, n becomes 1 in both processes. The printf statement outputs this value.
Actually you spawn a new process of the same program. It is not the closure kind of thing. You could use pipes to exchange data between parent and child.
You did indeed answer your own question in your edit.
Examine this code and everything should be clearer (see the man pages if you don't know what a certain function does):
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
int count = 1;
int main(int argc, char *argv[]) {
// set the "startvalue" to create the random numbers
srand(time(NULL));
int pid = -1; // no fork yet: nothing is printed until count reaches 9
// as long as count is <= 50
for (; count<=50; count++) {
// create new proccess if count == 9
if (count==9) {
pid = fork();
// reset start value for generating the random numbers
srand(time(NULL)+pid);
}
if (count<=25) {
// sleep for 300 ms
usleep(3*100000);
} else {
// create a random number between 1 and 5
int r = ( rand() % 5 ) + 1;
// sleep for r * 100 ms
usleep(r*100000);
}
if (pid==0) {
printf("Child: count:%d pid:%d\n", count, pid);
} else if (pid>0) {
printf("Father: count:%d pid:%d\n", count, pid);
}
}
return 0;
}
happy coding ;-)
The system call forks more than the execution thread: also forked is the data space. You have two n variables at that point.
There are a few interesting things that follow from all this:
A program that fork()s must consider unwritten output buffers. They can be flushed before the fork, or cleared after the fork, or the program can _exit() instead of exit() to at least avoid automatic buffer flushing on exit.
Fork is often implemented with copy-on-write in order to avoid unnecessarily duplicating a large data memory that won't be used in the child.
Finally, an alternate call vfork() has been revived in most current Unix versions, after vanishing for a period of time following its introduction in 3.0BSD. Vfork() does not pretend to duplicate the data space, and so the implementation can be even faster than a copy-on-write fork(). (Its implementation in Linux may be due less to speed reasons than because a few programs actually depend on the vfork() semantics.)