Suppose I have this code:
int main () {
int i, r;
i = 5;
printf("%d\n", i);
r = fork();
if (r > 0) {
i = 6;
}
else if (r == 0) {
i = 4;
}
printf("%d\n", i);
}
I was wondering whether the forked child process starts executing from the beginning of the program or from the point where fork() was called. I ask because on my own system I get the output 5,6,4, which suggests that it starts from where fork() was called, but when I run it at http://ideone.com/rHppMp I get 5,6,5,4.
One process calls fork, two processes return (errors notwithstanding). That's how it works. So the child starts from the next "line" (technically it starts with the assignment to r).
What you're seeing here has to do with buffering. In the online case, you'll find that it's using full buffering for standard output, meaning the initial 5 hasn't yet been flushed to the output device.
Hence, at the fork, both parent and child have it and both will flush at some point.
For the line buffered case, the parent flushes on the newline so the line is no longer in the buffer at the fork.
The rules are explicit. Standard output is set to line buffered only if the output device is known to be a terminal. In cases where you redirect to a file, or catch the output in an online environment so you can sanitise it for browser output, it'll be fully buffered.
Hence why you're seeing a difference.
It's generally a good idea to flush all output handles (with fflush) before forking.
You can't tell whether the parent or the child will run first after fork(). The extra 5 in "5 6 5 4" is not a garbage value; it is unflushed data left in the stdout buffer, which fork() copied into the child. Use fflush(NULL) to flush the buffers before fork() and try again.
Generally, your application (with its two processes) has little to no control over how the processes are ordered on the run queue of your OS. After fork(), there is no specific order you should expect. In fact, if your system has multiple CPUs, they may actually both run at the same time, which can result in some unexpected output as both processes compete for stdout.
I have the following C code:
#include <stdio.h>
#include <unistd.h>
int main()
{
int i, pid = 0;
for (i = 0; i < 3; i++)
{
fork();
pid = getpid();
printf("i=%d pid=%d\n", i, pid);
}
return 1;
}
Which is supposed to create a total of 7 new processes after all the iterations in the loop. Analyzing it you can see that 14 lines should be printed before all the processes finish, and that is exactly what you see when you execute it from the command line.
However, when you redirect the output to a file with ./main > output.txt; cat output.txt, you get a completely different result. In total, 24 lines are always printed, some of them repeated with the same i and pid values, and the amount of repetition seems consistent. I attached a screenshot ("Execution example") for clarification. The system I'm using is Ubuntu 20.04.3 in a VirtualBox VM.
I really don't understand why that is happening, I'm guessing it has something to do with race conditions on the output buffer or some other conflict when multiple processes are writing to the file, but that doesn't explain to me why it doesn't happen on the terminal. Can anybody explain this odd behaviour? Thanks!
When the standard output is a terminal, the stream is typically line buffered. The C standard requires it not be fully buffered, meaning it must be line buffered or unbuffered; C 2018 7.21.3 6 says:
… As initially opened, … the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
When the program executes printf("i=%d pid=%d\n", i, pid);, the output is immediately sent to the terminal, either because the stream is line buffered and the new-line character causes the output to be sent or because the stream is unbuffered and the output is always sent in each printf. Then, when the program forks, there is no pending output, because it has already been sent to the terminal. Each forked instance of the program prints only its own output.
When the standard output is redirected to a file, the stream is fully buffered. Then, when the program executes printf("i=%d pid=%d\n", i, pid);, the data is held in a buffer inside the program. It is not sent to the file immediately. (It will be sent when the buffer is full or when a flush is requested, which occurs automatically at normal program termination.) When the program forks, the buffer is copied along with the rest of the program state. Each forked instance of the program accumulates output in the buffer.
When each forked instance of the program exits, pending data in its buffers is flushed. This includes both data added by that particular instance and data that was put into the buffer in parent processes and copied by the fork. Thus multiple copies of data are printed.
To resolve this, execute fflush(stdout); immediately before fork();. This flushes the buffer before forking. Alternately, request that the stream be line-buffered by executing setvbuf(stdout, NULL, _IOLBF, 0); at the start of main.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void) {
for (int i = 1; i < 4; i++) {
printf("%d", i);
int id = fork();
if (id == 0) {
printf("Hello\n");
exit(0);
} else {
exit(0);
}
}
return 0;
}
For this code, it prints 11Hello on my computer. It seems counter-intuitive to me because "1" is printed twice but it's before the fork() is called.
fork() creates a new process, and both processes then continue in parallel with the instruction that follows the call. The parent's printf("%d", i) leaves the value of i in the stdout buffer without flushing it; fork() copies that buffer into the child, so the value is printed again when each process flushes its own copy.
Call fflush(stdout); before fork() so that 'i' gets printed only once per fork.
Alternatively, you could use printf("%d\n", i); where the newline character at the end does the job, provided stdout is line buffered (for example, when it is a terminal).
Where does the process start to execute after fork()
fork() duplicates the image of the process and its context. Both processes continue with the next instruction, the one the instruction pointer points to.
It seems counter-intuitive to me because "1" is printed twice but it's before the fork() is called.
Read printf anomaly after "fork()"
To begin with, the for loop is superfluous in your example, because both the parent and the child call exit(0) in the first iteration.
Recall that the child gets a copy of its parent's memory (code, globals, heap, and stack), registers, and open files. For performance, printf does not necessarily write what it is given straight away; the text stays in the stdio buffer until something triggers a flush, such as a newline on a line-buffered stream.
Before forking, the parent (the main process) has already put 1 into that buffer.
Let's assume we're on a single-core system and the child gets the core first.
1 is in the child's buffer because its parent put it there before forking. The child then reaches the second print statement (it may already be orphaned by then, which does not matter here) and passes the string "Hello\n", newline included. On a line-buffered stream the newline flushes the buffer, including the 1 inherited from the parent, so the child writes 1Hello. The parent writes its own copy of 1 when it exits; in this ordering that 1 appears after the child's line.
Now let's assume the parent gets the core first.
It gives up the core by calling exit, which flushes its copy of 1 and leaves the child orphaned. Orphaning does not leak memory: the child is adopted by init (process ID 1, typically systemd on current Linux distributions), which reaps it when it terminates. The child then writes 1Hello as before, so you end up with 11Hello unless the buffer happens to be flushed automagically before the fork.
I don't have much practical experience with this beyond a university class (a course I failed four times). Still, my advice is to use stderr instead of stdout when dealing with these things, since it is not buffered. There is also some magical way (I forget its name again; you call it at the beginning of main()) to disable buffering for stdout as well.
To become more competent in these topics, have a look at The Linux Programming Interface by Michael Kerrisk and at the posts on these topics by William Pursell, Jonathan Leffler, WhozCraig, John Bollinger, and Nominal Animal. I have learnt a plethora of information from them, even if almost all of it is useless within Turkey's borders.
*Magic means needing a lot of details to explain.
I have a simple C program to time process startup (I'd rather not post the full code as it's an active school assignment). My main function looks like this:
int main(void) {
int i;
for (i = 0; i < 5; i++) {
printf("%lf\n", sample_time());
}
exit(0);
}
sample_time() is a function that times how long it takes to fork a new process and returns the result in seconds as a double. The part of sample_time() that forks:
double sample_time() {
// timing stuff
if (!fork()) exit(0); // immediately close new process
// timing stuff
return a_sample_time;
}
As expected, running the program, times, in the terminal outputs 5 numbers like so:
$ ./times
0.000085
0.000075
0.000079
0.000071
0.000078
However, trying to redirect this into a file (or anywhere else) in a Unix terminal produces unexpected results.
For example, ./times > times.out creates a file with fifteen numbers. Additionally, ./times | wc -l outputs 15, confirming the earlier result. Running ./times | cat, I again see fifteen numbers, more than five of which are distinct.
Does anyone know what on earth can cause something like this? I'm out of ideas.
./times != ./times | cat. Wat.
Prerequisite knowledge
Fact 1 - When stdout is connected to a TTY it is line buffered. When it's connected to a file or a pipeline it is fully buffered. This means it's only flushed once the buffer fills (every 8KB, say) rather than at every line.
Fact 2 - Forked processes have duplicate copies of in-memory data. This includes stdio's output buffers if the data hasn't been flushed yet.
Fact 3 - Calling exit() causes stdio's output buffers to be flushed before the program exits.
Case 1: Output to terminal
When your program prints to the terminal its output is line buffered. Each printf() call that ends with \n immediately prints. This means each line is printed and the in-memory output buffer is emptied before fork() runs.
Result: 5 lines of output.
Case 2: Output to pipeline or file
When libc sees that stdout isn't connected to a TTY it switches to a more efficient full buffering strategy. This causes output to be buffered until the buffer fills (8KB, say, as above). That means the output from the printf()s is saved in memory, and calls to write() are deferred.
if (!fork()) exit(0);
After forking, the child process has a copy of the buffered output. The exit() call then causes that buffer to be flushed. This doesn't affect the parent process, though. Its output is still buffered.
Then when the second line of output is printed, it has two lines buffered. The next child process forks, exits, and prints those two lines. The parent retains its two lines of output, and so on.
Result: The child processes print 0, 1, 2, 3, and 4 lines of output. The main program prints 5 when it finally exits and flushes its output. 0 + 1 + 2 + 3 + 4 + 5 = 15. 15 lines of output instead of 5!
Solutions
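Call _Exit() instead of exit() in the child. The function _Exit() is like exit(), but does not call any functions registered with atexit() and, on POSIX systems, does not flush open stdio streams, so the child discards its copy of the buffer. This would be my preferred solution.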
Explicitly set stdout to be line buffered: setvbuf(stdout, NULL, _IOLBF, 0);
Call fflush(stdout) after each printf.
I'm learning to work with fork(), and I have some questions.
Consider the following code:
#include <stdio.h>
#include <unistd.h>
int main()
{
int i;
for(i = 0; i < 5; i++)
{
printf("%d", i);
if((i%2)==0)
if(fork())
fork();
}
}
When I output to a terminal, I get the result I expect (i.e.: 0,1,1,1,2,2,2,...). But when I output to a file, the result is completely different:
Case 1: (output to terminal, e.g.: ./a.out):
Result is: 0,1,1,1,2,2,2,...
Case 2: (output to file, e.g.: ./a.out > output_file)
Result is: 0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,...
Why is it like this?
When you output to a file, the stdio library automatically block-buffers the outbound bits.
When a program calls exit(3) or returns from main(), any remaining buffered bits are flushed.
In a program like this that doesn't generate much output, all of the I/O will occur after the return from main(), when the destination is not a tty. This will often change the pattern and order of I/O operations all by itself.
In this case, the result is further complicated by the series of fork() calls. This will duplicate the partially filled and as-yet-unflushed I/O buffers in each child image.
Before a program calls fork(), one might first flush I/O using fflush(3). If this flush is not done, then you may want all processes except one (typically: the children) to _exit(2) instead of exit(3) or return from main(), to prevent the same bits from being output more than once. (_exit(2) just does the exit system call.)
The fork() inside the if block is executed only by the parent: once the first fork() succeeds, two processes (parent and child) are running the same code, but fork() returns nonzero only in the parent, so only the parent enters the if body. The output can still differ from what you expect, because the processes' order of execution is not known; either the child or the parent may run first after each fork().
As for the difference in behaviour between terminal output and file output, this is the reason:
The contents you write to the buffer (to be written to the file on disk eventually) are not guaranteed to be written to the file immediately. They are mostly flushed only after the execution of main() is complete, whereas output to a terminal is written during the execution of main().
Before writing to a file on disk, the kernel also copies the data into a buffer of its own; later, in the background, it gathers up all of the dirty buffers, sorts them optimally, and writes them out to the file on disk. This is called writeback. It allows the kernel to defer writes to more idle periods and batch many writes together. (Writeback by itself does not duplicate output; the repetition comes from the user-space stdio buffer being copied by fork().)
In any case, it is always good to have three different condition checks in a program using fork():
int pid;
if((pid = fork()) == -1 )
{ //fork unsuccessful
}
else if ( pid > 0)
{ //This is parent
}
else
{//This is child
}
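Buffered streams can produce some strange results sometimes, especially when you have multiple processes using the same buffered stream. Force the buffer to be flushed and you'll see different results:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    int i;
    const char *yourfile = "output_file"; /* placeholder path; substitute your own */
    FILE *fd = fopen(yourfile, "w");
    if (fd == NULL)
        return 1;
    for (i = 0; i < 5; i++)
    {
        fprintf(fd, "%d", i);
        fflush(fd); /* empty the buffer before any fork() can copy it */
        if ((i % 2) == 0)
            if (fork())
                fork();
    }
    return 0;
}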
Also, for your debugging purposes, it might be nice to dump the process' IDs so you can see which process spawns which, and have a better idea of what's going on. getpid() can help you with that.
Why do I have different output between a terminal and a file when forking?
C standard library functions use internal buffering for speed. Most implementations use fully buffered I/O for file streams, line buffering for stdin/stdout when they refer to a terminal, and no buffering for stderr.
So your problem can be solved in a number of ways:
Use explicit buffer flush before fork via fflush(3)
Set buffer type manually via setvbuf(3)
Use write(2) instead of the standard library's printf(3)
Output to stderr by default via fprintf(3) *
Exit with _exit(2) in forked processes instead of exit(3) **
The last two may not work as expected if:
* your implementation does not use unbuffered writes to stderr by default (ISO C requires only that stderr not be fully buffered, so it may be line buffered)
** you have written more than the default buffer size in the child and it was automatically flushed.
PS. Yet again, if you need deeper knowledge of standard library functions and buffering I recommend reading Advanced Programming in the UNIX Environment (2nd Edition) by W. Richard Stevens and Stephen A. Rago.
PPS. btw, your question is a very popular interview question for C/C++ programmer position.
I was looking at some simple code on fork, and decided to try it out for myself. I compiled and then ran it from inside Emacs, and got a different output to that output produced from running it in Bash.
#include <unistd.h>
#include <stdio.h>
int main() {
if (fork() != 0) {
printf("%d: X\n", getpid());
}
if (fork() != 0) {
printf("%d: Y\n", getpid());
}
printf("%d: Z\n", getpid());
}
I compiled it with gcc, and then ran a.out from inside Emacs, as well as piping it to cat, and grep ., and got this.
2055: X
2055: Y
2055: Z
2055: X
2058: Z
2057: Y
2057: Z
2059: Z
This isn't right. Running it just from Bash I get (which I expected)
2084: X
2084: Y
2084: Z
2085: Y
2085: Z
2087: Z
2086: Z
edit - missed some newlines
What's going on?
The order in which different processes write their output is entirely unpredictable. So the only surprise is that the "X" print statement sometimes happens twice.
I believe this is because sometimes at the second fork(), an output line including "X" is in an output buffer, needing to be flushed. So both processes eventually print it. Since getpid() was already called and converted into the string, they'll show the same pid.
I was able to reproduce multiple "X" lines, but if I add fflush(stdout); just before the second fork(), I always only see one "X" line and always a total of 7 lines.
I think I know what's going on. The stdio buffering will be different when output is a tty versus when it's a pipe or a file. The child processes inherit the parent buffers. When they're flushed, you can get double output.
If you add
fflush(stdout);
right after each printf() call, you'll see what I mean.
The interesting thing is that it's different when standard output is a tty device. It may be that the library knows what that means, and flushes after each line break, or something like that.
So I imagine you are wondering why you are getting more than one "X"?
This is because buffered output is being flushed twice.
When you pipe a program's output, the stdio library recognizes that your output is not a terminal, and it switches to block buffering instead of line buffering. Consequently, there isn't yet any output when the process forks and so now both parent and child have pending output.
If you have used stdout at all before forking, you must call fflush(stdout) before fork() (and likewise for any other output FILEs you use). Failure to do so results in undefined behavior. The effect you're seeing comes from stdout being line-buffered when it's connected to a terminal, but fully buffered when it's connected to a pipe. This is not required, but recommended by the standards (POSIX).