Stdin not flushed after normal dummy program - c

Here is a piece of C code that I wrote after testing some stuff.
I know this is not a vulnerability concern, but I don't understand why stdin is not flushed after the program returns normally, at the point where the prompt gets stdin, stdout and stderr back. I mean, why do the remaining characters on stdin end up on stdout after the normal execution of the program ends, instead of being flushed?
$ cat dummy.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

int main(){
    char rbuf[100];
    if (read(0, rbuf, 5) == -1){
        perror("learn to count");
        printf("errno = %d.\n", errno);
        exit(1);
    }
    //printf("rbuf : %s\n",rbuf);
    return 1;
}
Here the execution:
$ gcc -o dummy dummy.c
$ ./dummy
AAAAA /bin/sh
$ /bin/sh
sh-3.2$ exit
exit
$
I guess this is just the remaining string from stdin being printed on the new stdout, which is the prompt. Plus, with the line feed at the end, it somehow emulates the user pressing Enter to execute a command. What's going on? I'm just curious to know more about it.

Yes, your guess is right: these are extra characters left in stdin.
To discard them, do this:

void flush_stdin(void)
{
    int c;
    while ((c = getchar()) != '\n' && c != EOF)
        ;   /* discard the rest of the line */
}
Note: do not use fflush() on stdin because that is undefined behavior
edit:
Stdin is wired to the terminal that started the program's parent, which is bash. Bash starts the new program dummy, and dummy's stdin is wired to the same terminal as bash's stdin.
From there on, the dummy process reads five characters and neglects the others, leaving them in the stdin buffer. When control returns to bash, it waits until there is at least one character available to read in that buffer. Lo and behold, there are characters left in the stdin buffer, so instead of waiting, bash starts to read from stdin, and since the leftover input ends with \n, the command is actually executed. This starts /bin/sh. The rest is up to /bin/sh to worry about!
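As a minimal sketch of the fix (my own illustration, not part of the original question): dummy can drain the leftover input itself before exiting, using a helper like the flush_stdin() above, so nothing is left for the shell to read. With the terminal in canonical mode, the getchar() calls pick up whatever remains of the line after read() took its five bytes.

#include <stdio.h>
#include <unistd.h>

static void flush_stdin(void)
{
    int c;
    while ((c = getchar()) != '\n' && c != EOF)
        ;                      /* discard the rest of the line */
}

int main(void){
    char rbuf[100];
    if (read(0, rbuf, 5) == -1){
        perror("read");
        return 1;
    }
    flush_stdin();             /* the leftover " /bin/sh\n" is consumed here, not by the shell */
    return 0;
}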

To execute your program, the shell calls fork(2) to create a child process, and in the child process calls exec(3) to replace itself with the "dummy" program.
I imagine there is something like this in the shell's source code (if it is written in C):
if (fork() == 0)
    execlp(program, arguments);
The child process inherits the file descriptors of the parent; in this case the shell. So the child process has the same stdin/stdout as the shell that exec'd it, which is the virtual terminal.
I'm not sure exactly how, but I'd imagine the parent process (the original shell you typed the command in) disregards stdin somehow whilst the child process is running.
When the program exits, the shell gets its stdin back. Any extra characters that weren't read by your program will go to the shell. And then, of course, the shell just treats them as a command.
If you try using fgetc(3) instead of read(2), at first it appears the extra characters are lost rather than sent to the shell... but if you unbuffer stdin, you get the same effect with fgetc(3), i.e. the extra characters go back to the shell.
char rbuf[100];
setbuf(stdin, NULL); // with this line - same effect as using read(2)
                     // without it - extra characters are lost
for (int i = 0; i < 5; ++i)
    rbuf[i] = (char)fgetc(stdin);
By default, stdin attached to a terminal is line buffered. So it looks like this behaviour is avoided with buffered stdin because the entire line is pulled into the stdio buffer (and the unread characters are simply lost when the process exits), whereas unbuffered stdin (or low-level reads) will not read past the requested characters, so the extra characters remain in the terminal's input buffer to be read by the parent (shell) once your program exits.
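For reference, here is a self-contained version of that experiment (the final printf is my addition, purely to show what was actually read):

#include <stdio.h>

int main(void){
    char rbuf[100] = {0};
    setbuf(stdin, NULL);   /* comment this out and the leftover characters no longer reach the shell */
    for (int i = 0; i < 5; ++i)
        rbuf[i] = (char)fgetc(stdin);
    printf("rbuf : %s\n", rbuf);
    return 0;
}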

Related

Forking code creates unexpected results when redirecting output to file [duplicate]

This question already has answers here:
printf anomaly after "fork()"
(3 answers)
fork() branches more than expected?
(3 answers)
Closed last year.
I have the following C code:
#include <stdio.h>
#include <unistd.h>

int main()
{
    int i, pid = 0;
    for (i = 0; i < 3; i++)
    {
        fork();
        pid = getpid();
        printf("i=%d pid=%d\n", i, pid);
    }
    return 1;
}
Which is supposed to create a total of 7 new processes after all the iterations in the loop. Analyzing it you can see that 14 lines should be printed before all the processes finish, and that is exactly what you see when you execute it from the command line.
However, when you redirect the output to another file ./main > output.txt; cat output.txt, you get a completely different situation. In total, 24 lines are always printed and some of them are repeated for the same i and pid values, and the amount of repetition seems consistent. I'm attaching a screenshot for clarification here Execution example. The system that I'm using is Ubuntu 20.04.3 in a VirtualBox VM.
I really don't understand why that is happening, I'm guessing it has something to do with race conditions on the output buffer or some other conflict when multiple processes are writing to the file, but that doesn't explain to me why it doesn't happen on the terminal. Can anybody explain this odd behaviour? Thanks!
When the standard output is a terminal, the stream is typically line buffered. The C standard requires it not be fully buffered, meaning it must be line buffered or unbuffered; C 2018 7.21.3 6 says:
… As initially opened, … the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
When the program executes printf("i=%d pid=%d\n", i, pid);, the output is immediately sent to the terminal, either because the stream is line buffered and the new-line character causes the output to be sent or because the stream is unbuffered and the output is always sent in each printf. Then, when the program forks, there is no pending output, because it has already been sent to the terminal. Each forked instance of the program prints only its own output.
When the standard output is redirected to a file, the stream is fully buffered. Then, when the program executes printf("i=%d pid=%d\n", i, pid);, the data is held in a buffer inside the program. It is not written to the file immediately. (It will be written when the buffer is full or when a flush is requested, which occurs automatically at normal program termination.) When the program forks, the buffer is copied along with the rest of the program state. Each forked instance of the program accumulates output in the buffer.
When each forked instance of the program exits, pending data in its buffer is flushed. This includes both the data added by that particular instance and the data that was put into the buffer in parent processes and copied by the fork. Thus multiple copies of the data are printed.
To resolve this, execute fflush(stdout); immediately before fork();. This flushes the buffer before forking. Alternately, request that the stream be line-buffered by executing setvbuf(stdout, NULL, _IOLBF, 0); at the start of main.
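A minimal sketch of that fix applied to the program above (the setvbuf() alternative is shown as a comment); with the buffer emptied before each fork(), redirecting to a file produces the same 14 lines as the terminal:

#include <stdio.h>
#include <unistd.h>

int main()
{
    int i, pid = 0;
    /* alternative: setvbuf(stdout, NULL, _IOLBF, 0); at the start of main */
    for (i = 0; i < 3; i++)
    {
        fflush(stdout);    /* no pending output is duplicated into the child */
        fork();
        pid = getpid();
        printf("i=%d pid=%d\n", i, pid);
    }
    return 1;
}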

Implementing shell in C - pipelined input has correct output but exits loop

I am trying to implement a basic shell in C that handles multiple pipes. It waits for input and execs the commands in a for loop. When it receives EOF, it stops waiting for input and exits.
Right now, my shell outputs the correct output when I input a pipelined command, e.g. ls | wc | grep ... but it stops waiting for input and exits the outer while loop instead of waiting for the next line of input.
I found that this happens because the fgets in my while loop is returning NULL (stdin is getting EOF somehow?). I do not get any errors while forking, creating a pipe, or execing.
However, if I enter one command at a time without any pipes e.g. ls, it successfully prints out the correct output and waits for the next line of input, as it should.
My program parses each line of input into a struct before trying to execute each command (omitted below). The struct is designed such that I can easily pass the parsed arguments into execvp, which I will not describe here.
This is a heavily simplified version of my code with most of the error-checking omitted:
FILE* input;
char line[MAX_LINE];

input = stdin;
printf("> ");
fflush(stdout);

while (fgets(line, sizeof(line), input)) {
    int i;
    struct cmdLine;
    /* struct defined elsewhere
    ** commands = # of commands in parsed input
    ** start = index where a command and its args start
    ** args[] = array holding each command/arg
    */

    /* parse input line into cmdLine */
    ...

    /* exec all commands in pipeline except the last */
    for (i = 0; i < cmdLine.commands-1; ++i) {
        int pd[2];
        pipe(pd);
        if (fork() == 0) {
            dup2(pd[1], 1);
            execvp(cmdLine.args[cmdLine.start[i]], &(cmdLine.args[cmdLine.start[i]]));
        } else {
            wait(NULL);
        }
        dup2(pd[0], 0);
        close(pd[1]);
    }

    /* exec last command */
    if (fork() == 0) {
        execvp(cmdLine.args[cmdLine.start[i]], &(cmdLine.args[cmdLine.start[i]]));
    } else {
        wait(NULL);
    }

    if (stdin == input) {
        printf("> "); /* print shell prompt */
        fflush(stdout);
    }
}
I am almost certain I messed up somewhere with my duping, but I've been trying for hours and I don't understand what I'm doing wrong. Is EOF somehow being sent to stdin so the enclosing fgets returns NULL?
By calling dup2 with 0 (=stdin) as the second argument, you are closing the original stdin at the end of each iteration of your for loop, so you can no longer actually talk to your program via the original stdin.
The problem in your code is that you are trying to hand off connecting all of the pipes together to someone else; that is not going to work. Here's what should work:
For n programs, you need at least (n-1) pipes.
Record all of the pipe FDs in arrays: one for the input side of the pipe (that is written to), one for the output side (that is read from).
For each process you are forking, connect the previous pipe's output (if any) to its stdin, and the next pipe's input to its stdout (or your main program's stdout if you're handling the last process in your chain of pipes).
Once you've forked everything: in a loop, poll() on the output FDs of your pipes, read from any that have activity, and write to the input of the next pipe (your own stdout at the end). If you get EOF on one of the pipes, close the next pipe's input (and remove the EOF'd pipe output from your output array). Once all of the FDs are closed, exit your loop.
EDIT: I just thought of another, simpler way that requires fewer code changes, but I haven't completely thought it through. :) The problem is that you are destroying your own stdin. If you do all of this (i.e. the whole "process one line of commands" part) in a forked child, replacing stdin between processes doesn't affect the parent process at all. Still, this would require a lot of buffering in the kernel and so it probably won't scale.
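A minimal sketch of that idea, assuming a hypothetical run_pipeline() helper that wraps the question's parsing and pipe/exec loop: every dup2() over fd 0 now happens in a throwaway child, so the parent's stdin survives and fgets() keeps returning lines.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* hypothetical helper: the question's parsing plus its pipe/exec for loop */
void run_pipeline(char *line);

void shell_loop(void)
{
    char line[1024];
    printf("> ");
    fflush(stdout);
    while (fgets(line, sizeof(line), stdin)) {
        pid_t runner = fork();
        if (runner == 0) {
            run_pipeline(line);    /* any dup2() over fd 0 is confined to this child */
            _exit(0);
        }
        waitpid(runner, NULL, 0);  /* the parent's stdin is never replaced */
        printf("> ");
        fflush(stdout);
    }
}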

Understanding why fork gives different result in C

Although there are some similar questions like this and this, I still cannot understand why fork gives different output with the following two pieces of code.
#include <stdio.h>

void main()
{
    printf("Hello World\n");
    fork();
}
Gives output
Hello World
Where as this code
#include <stdio.h>

void main()
{
    printf("Hello World");
    fork();
}
Gives output
Hello WorldHello World
The second case is clear to me from the other questions, that both processes get a copy of the same buffer. So, after the fork, both processes eventually flush the buffer and print the contents to screen separately.
But I am not clear why the first case is so.
Let me explain it in simple words.
Consider these two statements:
printf("Hello World")
and
printf("Hello World\n")
stdout is line buffered: the output of printf is actually written out only when the buffer is full or when the output is terminated by a newline character. So printf("Hello World") is not guaranteed to display anything until the buffer fills up or a newline is printed.
fork() creates a copy of the process, which means that after the fork there are two identical processes, each having its own copy of the stdout FILE stream, and therefore each having the same buffer contents.
Since in your second program there is no newline character, each process still has the string sitting in its buffer. When each process exits, it flushes its buffer, so the buffered contents are printed once per process. That is why you see two identical outputs.
But in your first program there is a newline character, so the output is displayed immediately after the statement executes, and the buffer is emptied. When fork() is then called, the buffer that gets copied is already empty, so neither process prints anything more.
Thus, the different behaviour is due to the buffering of stdout.
Hope it helps!!
It's due to the buffering behavior.
#include <stdio.h>

void main()
{
    printf("Hello World\n");
    fork();
}
flushes the output before you call fork, so the printing is already finished while there is still only a single process.
As to why printf("Hello World\n"); gets flushed immediately and printf("Hello World"); isn't, that's not so easy to answer; it depends on the situation. In your case you probably ran it on the command line, where line buffering is common, and line buffering means the buffer is flushed once a newline is written. If you write to a file, the stream may buffer more before anything is output, and you might see two outputs in the first case too.
If you want consistent behavior with flushing you might want to do it yourself.
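To see that last point for yourself, here is a small sketch (the setvbuf() call is my addition): forcing full buffering makes even the version with "\n" print twice, just as it can when output goes to a file.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    setvbuf(stdout, NULL, _IOFBF, BUFSIZ); /* behave as if writing to a file: fully buffered */
    printf("Hello World\n");               /* the newline no longer triggers a flush */
    fork();                                /* both processes inherit the buffered line */
    return 0;                              /* each flushes at exit, so the line appears twice */
}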

Program output changes when piped

I have a simple C program to time process startup (I'd rather not post the full code as it's an active school assignment). My main function looks like this:
int main(void) {
    int i;
    for (i = 0; i < 5; i++) {
        printf("%lf\n", sample_time());
    }
    exit(0);
}
sample_time() is a function that times how long it takes to fork a new process and returns the result in seconds as a double. The part of sample_time() that forks:
double sample_time() {
    // timing stuff
    if (!fork()) exit(0); // immediately close new process
    // timing stuff
    return a_sample_time;
}
As expected, running the program, times, in the terminal outputs 5 numbers like so:
$ ./times
0.000085
0.000075
0.000079
0.000071
0.000078
However, trying to redirect this into a file (or anywhere else) in a Unix terminal produces unexpected results.
For example, ./times > times.out creates a file with fifteen numbers. Additionally, ./times | wc -l outputs 15, confirming the earlier result. Running ./times | cat, I again see fifteen numbers, more than five of which are distinct.
Does anyone know what on earth can cause something like this? I'm out of ideas.
./times != ./times | cat. Wat.
Prerequisite knowledge
Fact 1 - When stdout is connected to a TTY it is line buffered. When it's connected to a file or a pipeline it is full buffered. This means it's only flushed every 8KB, say, rather than every line.
Fact 2 - Forked processes have duplicate copies of in-memory data. This includes stdio's output buffers if the data hasn't been flushed yet.
Fact 3 - Calling exit() causes stdio's output buffers to be flushed before the program exits.
Case 1: Output to terminal
When your program prints to the terminal its output is line buffered. Each printf() call that ends with \n immediately prints. This means each line is printed and the in-memory output buffer is emptied before fork() runs.
Result: 5 lines of output.
Case 2: Output to pipeline or file
When libc sees that stdout isn't connected to a TTY it switches to a more efficient full buffering strategy. This causes output to be buffered until 4KB worth have accumulated. That means the output from the printf()s is saved in memory, and calls to write() are deferred.
if (!fork()) exit(0);
After forking, the child process has a copy of the buffered output. The exit() call then causes that buffer to be flushed. This doesn't affect the parent process, though. Its output is still buffered.
Then when the second line of output is printed, it has two lines buffered. The next child process forks, exits, and prints those two lines. The parent retains its two lines of output, and so on.
Result: The child processes print 0, 1, 2, 3, and 4 lines of output. The main program prints 5 when it finally exits and flushes its output. 0 + 1 + 2 + 3 + 4 + 5 = 15. 15 lines of output instead of 5!
Solutions
Call _Exit() instead of exit(), as sketched after this list. The function _Exit() is like exit(), but does not call any functions registered with atexit() and does not flush the stdio output buffers. This would be my preferred solution.
Explicitly set stdout to be line buffered: setvbuf(stdout, NULL, _IOLBF, 0);
Call fflush(stdout) after each printf.
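Here is a minimal sketch of the first solution applied to the snippet from the question (a_sample_time and the omitted timing code are placeholders for the real implementation):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

double sample_time(void) {
    double a_sample_time = 0.0;   /* placeholder: the real code measures fork() latency here */
    // timing stuff
    if (!fork()) _Exit(0);        /* child exits without flushing the stdio buffer it inherited */
    // timing stuff
    return a_sample_time;
}

int main(void) {
    int i;
    for (i = 0; i < 5; i++) {
        printf("%lf\n", sample_time());
    }
    exit(0);
}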

C close STDOUT running forever

I am writing some C code that involves the use of pipes. To make a child process use my pipe instead of STDOUT for output, I used the following lines:
close(STDOUT);
dup2(leftup[1], STDOUT);
However, it seems to go into some sort of infinite loop or hang on those lines. When I get rid of close, it hangs on dup2.
Curiously, the same idea works in the immediately preceding line for STDIN:
close(STDIN);
dup2(leftdown[0], STDIN);
What could be causing this behavior?
Edit: Just to be clear...
#define STDIN 0
#define STDOUT 1
Edit 2: Here is a stripped-down example:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

#define STDIN 0
#define STDOUT 1

main(){
    pid_t child1 = 0;

    int leftdown[2];
    if (pipe(leftdown) != 0)
        printf("ERROR");

    int leftup[2];
    if (pipe(leftup) != 0)
        printf("ERROR");

    printf("MADE PIPES");

    child1 = fork();
    if (child1 == 0){
        close(STDOUT);
        printf("TEST 1");
        dup2(leftup[1], STDOUT);
        printf("TEST 2");
        exit(0);
    }

    return(0);
}
The "TEST 1" line is never reached. The only output is "MADE PIPES".
At a minimum, you should ensure that the dup2 function returns the new file descriptor rather than -1.
There's always a possibility that it will give you an error (for example, if the pipe() call failed previously). In addition, be absolutely certain that you're using the right indexes (0 and 1) - I've been bitten by that before and it depends on whether you're in the parent or child process.
Based on your edit, I'm not the least bit surprised that MADE PIPES is the last thing printed.
When you try to print TEST 1, you have already closed the STDOUT descriptor so that will go nowhere.
When you try to print TEST 2, you have duped the STDOUT descriptor so that will go to the parent but your parent doesn't read it.
If you change your forking code to:
child1 = fork();
if (child1 == 0){
    int count;
    close(STDOUT);
    count = printf("TEST 1\n");
    dup2(leftup[1], STDOUT);
    printf("TEST 2 (%d)\n", count);
    exit(0);
} else {
    char buff[80];
    read(leftup[0], buff, 80);
    printf("%s\n", buff);
    sleep(2);
}
you'll see that the TEST 2 (-1) line is output by the parent because it read it via the pipe. The -1 in there is the return code from the printf you attempted in the child after you closed the STDOUT descriptor (but before you duped it), meaning that it failed.
From ISO C11 7.21.6.3 The printf function:
The printf function returns the number of characters transmitted, or a negative value if an output or encoding error occurred.
Multiple things to mention:
When you use fork, it creates an almost complete copy of the parent process. That also includes the buffer set up for the stdout standard output stream. The stdout stream holds data until the buffer is full or a flush is explicitly requested. Because of this, you now have "MADE PIPES" sitting in the buffer. When you close the STDOUT fd and then use printf, it does nothing but transfer your "TEST 1" and "TEST 2" into the stdout buffer, and it doesn't cause any error or crash (there is enough buffer space). Thus, even after duplicating the pipe fd onto STDOUT, the buffered printf output hasn't even touched the pipe's write end. Most important: use only one set of APIs, i.e. either the *NIX file-descriptor calls or the standard C library stream functions. Make sure you understand the libraries well, as they often play tricks for the sake of optimization.
Now, another thing to mention: make sure that you close the appropriate ends of the pipe in the appropriate process. That is, if pipe-1 is used to communicate from parent to child, close the read end in the parent and the write end in the child. Otherwise your program may hang: because of the reference counts associated with file descriptors, you might think that closing the read end in the child means the pipe's read end is closed, but as long as the parent also keeps its copy of the read end open, there is an extra reference to it and the pipe will never be fully closed.
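As an illustration of that point, here is a minimal sketch (wc -l is just an arbitrary reader): each process closes the pipe end it does not use, so the reader sees EOF as soon as the writer closes its write end.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void){
    int pd[2];
    if (pipe(pd) != 0){
        perror("pipe");
        return 1;
    }
    if (fork() == 0){            /* child: reads from the pipe */
        close(pd[1]);            /* close the write end it does not use */
        dup2(pd[0], 0);          /* the pipe's read end becomes the child's stdin */
        close(pd[0]);
        execlp("wc", "wc", "-l", (char *)NULL);
        _exit(127);              /* only reached if exec fails */
    }
    close(pd[0]);                /* parent: close the read end it does not use */
    const char *msg = "one line of input\n";
    write(pd[1], msg, strlen(msg));
    close(pd[1]);                /* this close is what delivers EOF to wc */
    wait(NULL);
    return 0;
}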
There are many other things to improve in your coding style; the sooner you get a hold of them, the more time it will save you. :)
Error checking is absolutely important, use at least assert to ensure that your assumptions are correct.
If you use printf statements to log errors or as a method of debugging while you are changing the terminal FDs (STDOUT / STDIN / STDERR), it's better to open a log file with the *NIX open call and write your errors and log entries there.
Finally, the strace utility will be a great help to you. It lets you trace the system calls executed while your code runs. It is very straightforward and simple to use, and you can even attach it to a running process, provided you have the right permissions.
