Program output changes when piped - c

I have a simple C program to time process startup (I'd rather not post the full code as it's an active school assignment). My main function looks like this:
int main(void) {
    int i;
    for (i = 0; i < 5; i++) {
        printf("%lf\n", sample_time());
    }
    exit(0);
}
sample_time() is a function that times how long it takes to fork a new process and returns the result in seconds as a double. The part of sample_time() that forks:
double sample_time() {
    // timing stuff
    if (!fork()) exit(0); // immediately close new process
    // timing stuff
    return a_sample_time;
}
As expected, running the program, times, in the terminal outputs 5 numbers like so:
$ ./times
0.000085
0.000075
0.000079
0.000071
0.000078
However, trying to redirect this into a file (or anywhere else) in a Unix terminal produces unexpected results.
For example, ./times > times.out creates a file with fifteen numbers. Additionally, ./times | wc -l outputs 15, confirming the earlier result. Running ./times | cat, I again see fifteen numbers, more than five of which are distinct.
Does anyone know what on earth can cause something like this? I'm out of ideas.
./times != ./times | cat. Wat.

Prerequisite knowledge
Fact 1 - When stdout is connected to a TTY it is line buffered. When it's connected to a file or a pipeline it is fully buffered. This means it's only flushed when the buffer fills (every few kilobytes, say) rather than every line.
Fact 2 - Forked processes have duplicate copies of in-memory data. This includes stdio's output buffers if the data hasn't been flushed yet.
Fact 3 - Calling exit() causes stdio's output buffers to be flushed before the program exits.
Case 1: Output to terminal
When your program prints to the terminal its output is line buffered. Each printf() call that ends with \n immediately prints. This means each line is printed and the in-memory output buffer is emptied before fork() runs.
Result: 5 lines of output.
Case 2: Output to pipeline or file
When libc sees that stdout isn't connected to a TTY it switches to a more efficient full buffering strategy. This causes output to be buffered until the buffer fills (a few kilobytes; the exact size is implementation-defined). That means the output from the printf()s is saved in memory, and calls to write() are deferred.
if (!fork()) exit(0);
After forking, the child process has a copy of the buffered output. The exit() call then causes that buffer to be flushed. This doesn't affect the parent process, though. Its output is still buffered.
Then when the second line of output is printed, it has two lines buffered. The next child process forks, exits, and prints those two lines. The parent retains its two lines of output, and so on.
Result: The child processes print 0, 1, 2, 3, and 4 lines of output. The main program prints 5 when it finally exits and flushes its output. 0 + 1 + 2 + 3 + 4 + 5 = 15. 15 lines of output instead of 5!
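To make the arithmetic concrete, here is a minimal sketch that replays the same loop against a temporary file (a regular file, so the stream is fully buffered) and counts the lines that land in it. The helper name lines_written() and the "sample %d" text are made up for illustration, and the child flushes the stream by hand and then calls _exit() as a stand-in for the flush that exit() performs, so that the caller's other streams are left alone.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: repeats the question's loop against a temporary file
 * and returns how many lines end up in it, or -1 on error. */
static int lines_written(int iterations)
{
    FILE *out = tmpfile();              /* regular file: fully buffered */
    if (out == NULL)
        return -1;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            fclose(out);
            return -1;
        }
        if (pid == 0) {
            /* exit() would flush every open stream. Flush only this one,
             * then leave with _exit() so nothing else is touched. */
            fflush(out);
            _exit(0);
        }
        waitpid(pid, NULL, 0);

        /* Like the printf() in main(): stays in the parent's buffer. */
        fprintf(out, "sample %d\n", i);
    }

    fflush(out);                        /* the parent's flush at exit */
    rewind(out);

    int lines = 0;
    for (int c; (c = fgetc(out)) != EOF; )
        if (c == '\n')
            lines++;

    fclose(out);
    return lines;
}
```

Under these assumptions lines_written(5) should come out as 15: the children contribute 0 + 1 + 2 + 3 + 4 lines and the parent adds its own 5.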
Solutions
Call _Exit() instead of exit() in the child. The function _Exit() is like exit(), but does not call any functions registered with atexit() and, on POSIX systems, does not flush stdio buffers. This would be my preferred solution.
Explicitly set stdout to be line buffered: setvbuf(stdout, NULL, _IOLBF, 0);
Call fflush(stdout) after each printf.
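The three fixes can be compared side by side with a small sketch. Everything named here (enum fix, lines_with_fix(), the "sample %d" text) is hypothetical, and a temporary file stands in for the redirected stdout. The child's manual fflush() imitates what exit() would do; skipping it imitates _Exit(), which on POSIX systems does not flush stdio.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum fix { NO_FIX, CHILD_SKIPS_FLUSH, LINE_BUFFERED, FLUSH_AFTER_PRINT };

/* Hypothetical helper: runs the five-iteration loop with one fix applied
 * and returns the number of lines written, or -1 on error. */
static int lines_with_fix(enum fix fix)
{
    FILE *out = tmpfile();              /* fully buffered by default */
    if (out == NULL)
        return -1;
    if (fix == LINE_BUFFERED)
        setvbuf(out, NULL, _IOLBF, 0);  /* must come before any other I/O */

    for (int i = 0; i < 5; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            fclose(out);
            return -1;
        }
        if (pid == 0) {
            if (fix != CHILD_SKIPS_FLUSH)
                fflush(out);            /* stands in for exit()'s flush */
            _exit(0);                   /* like _Exit(): no stdio flush */
        }
        waitpid(pid, NULL, 0);

        fprintf(out, "sample %d\n", i);
        if (fix == FLUSH_AFTER_PRINT)
            fflush(out);                /* buffer is empty at the next fork */
    }

    fflush(out);
    rewind(out);

    int lines = 0;
    for (int c; (c = fgetc(out)) != EOF; )
        if (c == '\n')
            lines++;

    fclose(out);
    return lines;
}
```

With NO_FIX the count should be 15; each of the other three settings should bring it back to 5, because the child either has nothing buffered or never flushes what it inherited.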

Related

Forking code creates unexpected results when redirecting output to file [duplicate]

I have the following C code:
#include <stdio.h>
#include <unistd.h>
int main()
{
    int i, pid = 0;
    for (i = 0; i < 3; i++)
    {
        fork();
        pid = getpid();
        printf("i=%d pid=%d\n", i, pid);
    }
    return 1;
}
Which is supposed to create a total of 7 new processes after all the iterations in the loop. Analyzing it you can see that 14 lines should be printed before all the processes finish, and that is exactly what you see when you execute it from the command line.
However, when you redirect the output to a file (./main > output.txt; cat output.txt), you get a completely different situation. In total, 24 lines are always printed, some of them repeated for the same i and pid values, and the amount of repetition seems consistent. I attached a screenshot for clarification ("Execution example"). The system that I'm using is Ubuntu 20.04.3 in a VirtualBox VM.
I really don't understand why that is happening. I'm guessing it has something to do with race conditions on the output buffer or some other conflict when multiple processes are writing to the file, but that doesn't explain to me why it doesn't happen on the terminal. Can anybody explain this odd behaviour? Thanks!
When the standard output is a terminal, the stream is typically line buffered. The C standard requires it not be fully buffered, meaning it must be line buffered or unbuffered; C 2018 7.21.3 6 says:
… As initially opened, … the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
When the program executes printf("i=%d pid=%d\n", i, pid);, the output is immediately sent to the terminal, either because the stream is line buffered and the new-line character causes the output to be sent or because the stream is unbuffered and the output is always sent in each printf. Then, when the program forks, there is no pending output, because it has already been sent to the terminal. Each forked instance of the program prints only its own output.
When the standard output is redirected to a file, the stream is fully buffered. Then, when the program executes printf("i=%d pid=%d\n", i, pid);, the data is held in a buffer inside the program. It is not written to the file immediately. (It will be written when the buffer is full or when a flush is requested, which occurs automatically at normal program termination.) When the program forks, the buffer is copied along with the rest of the program state. Each forked instance of the program accumulates output in the buffer.
When each forked instance of the program exits, pending data in its buffers is flushed. This includes both data added by that particular instance and data that was put into the buffer in parent processes and copied by the fork. Thus multiple copies of data are printed.
To resolve this, execute fflush(stdout); immediately before fork();. This flushes the buffer before forking. Alternately, request that the stream be line-buffered by executing setvbuf(stdout, NULL, _IOLBF, 0); at the start of main.

Fork() call process?

Suppose I have this code:
int main () {
    int i, r;
    i = 5;
    printf("%d\n", i);
    r = fork();
    if (r > 0) {
        i = 6;
    }
    else if (r == 0) {
        i = 4;
    }
    printf("%d\n", i);
}
I was wondering whether the forked child process starts executing from the beginning or from where fork() was called. The reason I ask is that on my own system I get the output 5,6,4, which means that it starts from where it is called, but running it on http://ideone.com/rHppMp I get 5,6,5,4?
One process calls fork, two processes return (errors notwithstanding). That's how it works. So the child starts from the next "line" (technically it starts with the assignment to r).
What you're seeing here has to do with buffering. In the online case, you'll find that it's using full buffering for standard output, meaning the initial 5 hasn't yet been flushed to the output device.
Hence, at the fork, both parent and child have it and both will flush at some point.
For the line buffered case, the parent flushes on the newline so the line is no longer in the buffer at the fork.
The rules are explicit. Standard output is set to line buffered only if the output device is known to be a terminal. In cases where you redirect to a file, or catch the output in an online environment so you can sanitise it for browser output, it'll be fully buffered.
Hence why you're seeing a difference.
It's generally a good idea to flush all output handles (with fflush) before forking.
You can't predict whether the parent or the child will execute first; in many cases the child does. The output should not be "5 6 5 4"; the extra 5 is stale data left in the buffer that fork() copied. You can use fflush(NULL) to flush all buffers before fork() and try again.
Generally, your application (with its two processes) has little to no control over how the processes are scheduled on the run queue of your OS. After fork(), there is no specific order you should expect. In fact, if your system has multiple CPUs, they may both run at the same time, which can result in some unexpected output as both processes compete for stdout.

Stdin not flushed after normal dummy program

Here is a piece of C code that I wrote while testing some things.
I know this is not a vulnerability concern, but I don't understand why stdin is not flushed after the program returns normally, at the point where the prompt gets stdin, stdout and stderr back. I mean, why are the remaining characters on stdin redirected to stdout after the normal end of the program instead of being flushed?
$cat dummy.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
int main(){
    char rbuf[100];
    if (read(0, rbuf,5) == -1){
        perror("learn to count");
        printf("errno = %d.\n", errno);
        exit(1);
    }
    //printf("rbuf : %s\n",rbuf);
    return 1;
}
Here the execution:
$ gcc -o dummy dummy.c
$ ./dummy
AAAAA /bin/sh
$ /bin/sh
sh-3.2$ exit
exit
$
I guess this is just the remaining string of the stdin printed on the new stdout, which is the prompt. Plus the line feed at the end, which somehow emulates the Enter pressed by the user to execute a command. What's going on? I'm just curious to know more about that.
Yes, your guess is right, these are extra characters in stdin:
do this:
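void flush_stdin(void)
{
    int c;
    /* Discard the rest of the line; stop at EOF too, or this loops forever. */
    while ((c = getchar()) != '\n' && c != EOF)
        ;
}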
Note: do not use fflush() on stdin because that is undefined behavior
edit
The stdin is wired to the terminal which starts the program(which is bash). This starts a new program dummy and the stdin of dummy is wired to the stdin of bash.
From there on, the dummy process reads five characters and neglects the others (leaving them in the stdin buffer). When control returns to bash, it waits until there is at least one character to read in the buffer. Lo and behold, there are characters in the stdin buffer, so instead of waiting, bash starts to read from stdin, and since the input ends with \n, the command is actually executed. This starts /bin/sh. The rest is up to /bin/sh to worry about!
To execute your program, the shell calls fork(2) to create a child process, and in the child process calls exec(3) to replace itself with the "dummy" program.
I imagine there is something like this in the shell's source code (if it is written in C):
if (fork() == 0)
    execlp(program, program, (char *)NULL);  /* arguments, if any, go before the NULL */
The child process inherits the file descriptors of the parent; in this case the shell. So the child process has the same stdin/stdout as the shell that exec'd it, which is the virtual terminal.
While the child process is running, the parent process (the original shell you typed the command in) is blocked waiting for it and does not read from stdin.
When the program exits, the shell goes back to reading stdin. Any extra characters that weren't read by your program will go to the shell. And then of course the shell just treats them as a command.
If you try using fgetc(3) instead of read(2), at first it appears the extra characters are lost, not sent to the shell... but if you unbuffer stdin, you get the same effect using fgetc(3), i.e. extra characters go back to the shell.
char rbuf[100];
setbuf(stdin, NULL);  // with this line: same effect as using read(2)
                      // without it: extra characters are lost
for (int i = 0; i < 5; ++i)
    rbuf[i] = (char)fgetc(stdin);
When stdin is a terminal it is line buffered by default. So it looks like this behaviour is avoided when using buffered stdin because the entire line is read into the program's own buffer, and the extra characters are discarded when the program exits, whereas unbuffered stdin (or low level reads) will not read until the end of the line, and extra characters remain to be read by the parent (shell) once your program exits.

Why do I have different output between a terminal and a file when forking?

I'm learning to work with fork(), and I have some questions.
Consider the following code:
#include <stdio.h>
#include <unistd.h>
int main()
{
    int i;
    for(i = 0; i < 5; i++)
    {
        printf("%d", i);
        if((i%2)==0)
            if(fork())
                fork();
    }
}
When I output to a terminal, I get the result I expect (i.e.: 0,1,1,1,2,2,2,...). But when I output to a file, the result is completely different:
Case 1: (output to terminal, e.g.: ./a.out):
Result is: 0,1,1,1,2,2,2,...
Case 2: (output to file, e.g.: ./a.out > output_file)
Result is: 0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,...
Why is it like this?
When you output to a file, the stdio library automatically block-buffers the outbound bits.
When a program calls exit(3) or returns from main(), any remaining buffered bits are flushed.
In a program like this that doesn't generate much output, all of the I/O will occur after the return from main(), when the destination is not a tty. This will often change the pattern and order of I/O operations all by itself.
In this case, the result is further complicated by the series of fork() calls. This will duplicate the partially filled and as-yet-unflushed I/O buffers in each child image.
Before a program calls fork(), one might first flush I/O using fflush(3). If this flush is not done, then you may want all processes except one (typically: the children) to _exit(2) instead of exit(3) or return from main(), to prevent the same bits from being output more than once. (_exit(2) just does the exit system call.)
The fork() inside the if block runs only in the parent: fork() returns 0 in the child, so the condition is false there. After the first fork() the program is run by two processes (child and parent), and the parent then creates a second child. The output can differ from what you expect because the order of execution of these processes is not known, i.e. either the child or the parent may execute first after each fork().
As for the difference in behaviour between the terminal and the file, this is the reason.
The contents you write to the buffer (to be written to the file on disk eventually) are not guaranteed to be written to the file immediately. They are mostly flushed only after the execution of main() is complete. When the output goes to a terminal, however, it is written during the execution of main().
Below that there is a second layer of buffering: before writing to the file on disk, the kernel copies the data into a buffer, and later, in the background, it gathers up all of the dirty buffers, sorts them optimally and writes them out to disk. This is called writeback. It also allows the kernel to defer writes to more idle periods and batch many writes together. (The duplicated output comes from the stdio buffer inside each process, not from this kernel buffer.)
To avoid such behaviour, it is always good to have three different condition checks in program using fork()
int pid;
if ((pid = fork()) == -1)
{   // fork unsuccessful
}
else if (pid > 0)
{   // This is parent
}
else
{   // This is child
}
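Buffered streams can produce some strange results sometimes... especially when you have multiple processes using the same buffered stream. Force the buffer to be flushed and you'll see different results:
#include <stdio.h>
#include <unistd.h>
int main()
{
    int i;
    const char *yourfile = "output_file";  /* placeholder: use your own file name */
    FILE *fd = fopen(yourfile, "w");
    if (fd == NULL)
        return 1;
    for (i = 0; i < 5; i++)
    {
        fprintf(fd, "%d", i);
        fflush(fd);  /* nothing is left in the buffer for a child to copy */
        if ((i % 2) == 0)
            if (fork())
                fork();
    }
}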
Also, for your debugging purposes, it might be nice to dump the process' IDs so you can see which process spawns which, and have a better idea of what's going on. getpid() can help you with that.
Why do I have different output between a terminal and a file when forking?
C standard library functions use internal buffering for speed. Most implementations use fully buffered I/O for file streams, line buffering for stdin/stdout when they refer to a terminal, and unbuffered I/O for stderr.
So your problem can be solved in a number of ways:
Use explicit buffer flush before fork via fflush(3)
Set buffer type manually via setvbuf(3)
Use write(2) instead of stdlib's printf(3)
Output to stderr by default via fprintf(3) *
Exit with _exit(2) in forked processes instead of exit(3) **
The last two may not work as expected if:
* your implementation does not use unbuffered writes to stderr by default (ISO C requires only that stderr not be fully buffered)
** you have written more than the default buffer size in the child and it was automatically flushed.
PS. Yet again, if you need deeper knowledge of standard library functions and buffering I recommend reading Advanced Programming in the UNIX Environment (2nd Edition) by W. Richard Stevens and Stephen A. Rago.
PPS. btw, your question is a very popular interview question for C/C++ programmer position.

Why is the output of my forking program different when I pipe its output?

I was looking at some simple code on fork, and decided to try it out for myself. I compiled and then ran it from inside Emacs, and got a different output to that output produced from running it in Bash.
#include <unistd.h>
#include <stdio.h>
int main() {
    if (fork() != 0) {
        printf("%d: X\n", getpid());
    }
    if (fork() != 0) {
        printf("%d: Y\n", getpid());
    }
    printf("%d: Z\n", getpid());
}
I compiled it with gcc, and then ran a.out from inside Emacs, as well as piping it to cat, and grep ., and got this.
2055: X
2055: Y
2055: Z
2055: X
2058: Z
2057: Y
2057: Z
2059: Z
This isn't right. Running it just from Bash I get (which I expected)
2084: X
2084: Y
2084: Z
2085: Y
2085: Z
2087: Z
2086: Z
edit - missed some newlines
What's going on?
The order in which different processes write their output is entirely unpredictable. So the only surprise is that the "X" print statement sometimes happens twice.
I believe this is because sometimes at the second fork(), an output line including "X" is in an output buffer, needing to be flushed. So both processes eventually print it. Since getpid() was already called and converted into the string, they'll show the same pid.
I was able to reproduce multiple "X" lines, but if I add fflush(stdout); just before the second fork(), I always only see one "X" line and always a total of 7 lines.
I think I know what's going on. The stdio buffering will be different when output is a tty versus when it's a pipe or a file. The child processes inherit the parent buffers. When they're flushed, you can get double output.
If you add
fflush(stdout);
right after each printf() call, you'll see what I mean.
The interesting thing is that it's different when standard output is a tty device. It may be that the library knows what that means, and flushes after each line break, or something like that.
So I imagine you are wondering why you are getting more than one "X"?
This is because buffered output is being flushed twice.
When you pipe a program's output, the stdio library recognizes that your output is not a terminal, and it switches to block buffering instead of line buffering. Consequently, there isn't yet any output when the process forks and so now both parent and child have pending output.
If you have used stdout at all before forking, you must call fflush(stdout) before fork() (and likewise for any other output FILEs you use). Failure to do so results in undefined behavior. The effect you're seeing comes from stdout being line-buffered when it's connected to a terminal, but fully buffered when it's connected to a pipe. This is not required, but recommended by the standards (POSIX).

Resources