Unexpected behavior of pipes with scanf() - c

It's been a while since I last programmed in C, and I'm having trouble making pipes work. (For sake of clarity, I'm using Cygwin on Windows 7.) In particular, I need help understanding the behavior of the following example:
/* test.c */
#include <stdio.h>
#include <unistd.h>
int main() {
    char c;
    //scanf("%c", &c); // this is problematic
    int p[2];
    pipe(p);
    int out = dup(STDOUT_FILENO);
    // from now on, implicitly read from and write on pipe
    dup2(p[0], STDIN_FILENO);
    dup2(p[1], STDOUT_FILENO);
    printf("hello");
    fflush(stdout);
    // restore stdout
    dup2(out, STDOUT_FILENO);
    // should read from pipe and write on stdout
    putchar(getchar());
    putchar(getchar());
    putchar(getchar());
}
If I invoke:
echo abcde | ./test.exe
I get the following output:
hel
However, if I uncomment the scanf call, I get:
bcd
Which I can't explain. This is actually a very simplified version of a more complex program with a fork/exec structure that started behaving very badly. Despite not having cycles, it somehow began spawning infinite children in an endless loop. So, rules permitting, I'll probably need to extend the question with a more concrete use case. Many thanks.

The stream I/O functions such as scanf generally perform buffering to improve performance. Thus, if you call scanf on the standard input then it will probably read more characters than needed to satisfy the request, and the extra will be waiting, buffered, for the next read.
Swapping out the underlying file descriptor does not affect previously buffered data. When you subsequently read from the stream again, you get the data buffered the first time until it is exhausted, and only then do you get fresh data from the new underlying file.
If you wish, you can turn off buffering of a stream via the setvbuf() function, before any I/O operations have been performed on it:
int result = setvbuf(stdin, NULL, _IONBF, 0);
if (result != 0) {
// handle error ...
}
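The effect can be reproduced without touching stdin at all. The following is a minimal sketch, not part of the original answer: the helper name second_char_after_swap and the two private pipes are assumptions made for illustration. It reads one character from a stream, replaces the descriptor underneath with dup2(), and returns the next character the stream delivers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (name and setup are illustrative assumptions).
 * Pipe a holds "abcde", pipe b holds "hello". A stream is opened on a,
 * one character is read, then the descriptor under the stream is
 * replaced by b. Returns the next character read, or EOF on error. */
int second_char_after_swap(int unbuffered)
{
    int a[2], b[2];
    FILE *in;
    int second;

    if (pipe(a) != 0)
        return EOF;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return EOF;
    }

    /* Preload both pipes, then close the write ends. */
    if (write(a[1], "abcde", 5) != 5 || write(b[1], "hello", 5) != 5) {
        close(a[0]); close(a[1]); close(b[0]); close(b[1]);
        return EOF;
    }
    close(a[1]);
    close(b[1]);

    in = fdopen(a[0], "r");
    if (in == NULL) {
        close(a[0]);
        close(b[0]);
        return EOF;
    }
    /* setvbuf() must come before the first read on the stream. */
    if (unbuffered && setvbuf(in, NULL, _IONBF, 0) != 0) {
        fclose(in);
        close(b[0]);
        return EOF;
    }

    (void)fgetc(in);                    /* consumes 'a' */
    if (dup2(b[0], fileno(in)) < 0) {   /* swap the descriptor */
        fclose(in);
        close(b[0]);
        return EOF;
    }
    close(b[0]);

    second = fgetc(in);                 /* old buffer or new pipe? */
    fclose(in);
    return second;
}
```

With a typical buffered implementation, second_char_after_swap(0) yields 'b' (served from the old buffer) and second_char_after_swap(1) yields 'h' (read from the new descriptor), which mirrors the bcd versus hel outputs in the question.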
This is actually a very simplified version of a more complex program
with a fork/exec structure that started behaving very badly. Despite not
having cycles, it somehow began spawning infinite children in an
endless loop.
I don't see how that behavior would be related to what you've asked here.
So, rules permitting, I'll probably need to extend the
question with a more concrete case of use.
That would be a separate question.

Related

Where does the process start to execute after fork()

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void) {
    for (int i = 1; i < 4; i++) {
        printf("%d", i);
        int id = fork();
        if (id == 0) {
            printf("Hello\n");
            exit(0);
        } else {
            exit(0);
        }
    }
    return 0;
}
For this code, it prints 11Hello on my computer. It seems counter-intuitive to me because "1" is printed twice but it's before the fork() is called.
The fork() system call creates a new process, and both processes continue in parallel from the instruction that follows it. The value of i that was printed before the fork is still sitting in the stdout buffer at that point, so the child inherits a copy of it. Each process later flushes its own copy, and the value is printed twice.
Call fflush(stdout); before fork() so that 'i' gets printed only once per fork.
Alternatively, you could use printf("%d\n", i); where the newline character at the end does the job, as long as stdout is line-buffered (typically when it is a terminal).
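The duplication can be measured rather than eyeballed. This is a rough sketch under stated assumptions: the helper name bytes_after_fork is made up for illustration, and a stream opened on a pipe stands in for stdout so that the total number of bytes written can be counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: writes "1" to a fully buffered stream on a pipe,
 * forks, lets the child add "Hello\n", and returns how many bytes
 * reached the pipe in total, or -1 on error. */
long bytes_after_fork(int flush_first)
{
    int p[2];
    FILE *out;
    pid_t pid;
    char buf[64];
    long total = 0;
    ssize_t n;

    if (pipe(p) != 0)
        return -1;
    out = fdopen(p[1], "w");          /* not a terminal: fully buffered */
    if (out == NULL) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    fprintf(out, "%d", 1);            /* stays in the buffer for now */
    if (flush_first)
        fflush(out);                  /* empties the buffer before the fork */

    pid = fork();
    if (pid < 0) {
        fclose(out);
        close(p[0]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        fprintf(out, "Hello\n");
        fclose(out);                  /* flushes the child's copy of the buffer */
        _exit(0);                     /* leave other inherited streams alone */
    }

    fclose(out);                      /* flushes the parent's copy */
    waitpid(pid, NULL, 0);
    while ((n = read(p[0], buf, sizeof buf)) > 0)
        total += n;
    close(p[0]);
    return total;
}
```

Without the flush, the "1" is written once by each process, so 8 bytes arrive ("1" plus "1Hello\n"); with the flush before fork() only 7 arrive.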
Where does the process start to execute after fork()
fork() duplicates the image of the process and its context. Both processes then continue at the instruction the instruction pointer refers to, that is, right after the fork() call.
It seems counter-intuitive to me because "1" is printed twice but it's before the fork() is called.
Read printf anomaly after "fork()"
To begin with, the for loop is superfluous in your example.
Recall that the child gets a copy of its parent's memory (code, globals, heap and stack), registers, and open files. For performance, and possibly other reasons, printf may not write its output immediately; the text stays in a buffer until something triggers a flush, such as a terminating newline on a line-buffered stream.
Before the fork, only the parent (the main process) is running.
Let's assume we're on a single-core system and the child gets the CPU first.
1 is in the buffer because the parent put it there before forking. The child then reaches the second print statement (it may already be orphaned by then, but that doesn't matter here) and passes the string "Hello\n", which ends with a newline character. Because of the \n, the buffer is flushed, including the 1 inherited from the parent, so the child prints 1Hello. The parent's own copy of 1 is flushed when the parent exits, so 1 appears twice.
Now let's assume the parent gets the CPU first.
It terminates in the exit call, which flushes its 1 and leaves the child orphaned. From that point on the child is looked after by init (the process with ID 1; on many Linux systems this is systemd). Nothing changes in the printing steps, though: the child still flushes its inherited 1 together with Hello, so you again end up with 11Hello, unless the buffer was flushed before the fork.
I don't have much working experience with these topics beyond a university class (I failed the course 4 times). Still, I'd advise you to use stderr whenever possible while dealing with these things, since it is not buffered, instead of stdout. Alternatively, there is a somewhat magical way to disable buffering for stdout as well: call setvbuf() at the beginning of main().
To become more competent in these topics, you should have a look at The Linux Programming Interface by Michael Kerrisk and at the answers by William Pursell, Jonathan Leffler, WhozCraig, John Bollinger, and Nominal Animal. I have learnt a plethora of information from them, even if that information is almost wholly useless within Turkey's borders.
*Magic means needing a lot of details to explain.

How can I ensure a child process eventually writes data in C?

In C, I'd like to fork off a child process, and map its STDIN and STDOUT to pipes. The parent then communicates with the child by writing to or reading from the child's STDIN and STDOUT.
The MWE code below is apparently successful. The parent thread receives the string "Sending some message", and I can send arbitrary messages to the parent thread by writing to stdout. I can also freely read messages from the parent using, e.g. scanf.
The problem is that, once execl is called by the child, the output seems to stop coming through. I know that without the call to setvbuf to unbuffer stdout, this code will hang indefinitely, and so I suppose that the call to execl re-buffers stdout. Since the child program ./a.out is itself interactive, we hit a race condition where the child will not write (because of the buffering), and blocks waiting for input, while the parent blocks waiting for the child to write before producing input for the child.
Is there a nice way to avoid this? In particular, is there a way to use exec that doesn't overwrite the attributes of stdin stdout, etc.?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
int main(int argc, char* argv[]){
    int mgame_read_pipe[2];
    int mgame_write_pipe[2];
    pipe(mgame_read_pipe);
    pipe(mgame_write_pipe);
    pid_t is_child = fork();
    if(is_child == -1){
        perror("Error while forking.");
        exit(1);
    }
    if(is_child==0){
        dup2(mgame_read_pipe[1], STDOUT_FILENO);
        printf("Sending some message.\n");
        dup2(mgame_write_pipe[0], STDIN_FILENO);
        setvbuf(stdin, NULL, _IONBF, 0);
        setvbuf(stdout, NULL, _IONBF, 0);
        close(mgame_read_pipe[0]);
        close(mgame_write_pipe[1]);
        execl("./a.out", "./a.out", (char *)NULL);
    }
    else{
        close(mgame_read_pipe[1]);
        close(mgame_write_pipe[0]);
        int status;
        do{
            printf("SYSTEM: Waiting for inferior process op.\n");
            char buf[BUFSIZ];
            /* read() does not null-terminate, so leave room and do it here */
            ssize_t n = read(mgame_read_pipe[0], buf, BUFSIZ - 1);
            if(n < 0) n = 0;
            buf[n] = '\0';
            printf("%s",buf);
            scanf("%s", buf);
            printf("SYSTEM: Waiting for inferior process ip.\n");
            write(mgame_write_pipe[1], buf, strlen(buf));
        } while( !waitpid(is_child, &status, WNOHANG) );
    }
}
EDIT: For completeness, here's an (untested) example a.out:
#include <stdio.h>
int main(){
    printf("I'm alive!");
    int parent_msg;
    scanf("%d", &parent_msg);
    printf("I got %d\n", parent_msg);
}
Your buffering problems stem from the fact that the buffering is being performed by the C standard library in the program that you are exec-ing, not at the kernel / file descriptor level (as observed by @Claris). There is nothing you can do to affect buffering in another program's own code (unless you modify that program).
This is actually a common problem encountered by anyone trying to automate interaction with a program.
One solution is to use a pseudo-tty, which makes the program think it is actually talking to an interactive terminal, which alters its buffering behaviour, amongst other things.
This article provides a good introduction. There is an example program there showing exactly how to achieve what you are trying to do.
The setvbuf options you are setting apply to stdio streams, not to file descriptors, so they will have no effect on the program you exec.
The read/write system calls are not buffered (aside from caching, which is different and which might exist in the kernel), so you don't need to worry about disabling a buffer or any other such stuff. They will go directly to where they need to go.
That being said, they are blocking: a read blocks at the OS level until some data exists and can be copied into your buffer, and a write blocks until the data can be copied out of it. A read can return fewer bytes than you asked for (on a pipe it returns as soon as any data is available), and it returns 0 when an EOF condition is encountered, so always check the return value.
You may be able to enable non-blocking IO through a system call using the fcntl interface. This would return immediately but is not always supported depending on how you are using a file descriptor. Async IO (for files) is supported through the AIO interface.
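Since a blocking read() on a pipe may hand back fewer bytes than requested, a caller that needs a fixed amount usually loops. The following is a minimal sketch of such a loop; the name read_full is an assumption for illustration and is not a standard function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read() until len bytes have
 * arrived, EOF is reached, or a real error occurs. Returns the number
 * of bytes stored in buf, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;                /* EOF: the writer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal, retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

Note that this only suits protocols where the expected length is known in advance; in an interactive exchange like the one in the question, waiting for a fixed byte count would block just as the original code does.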

Why is the line following printf(), a call to sleep(), executed before anything is printed?

I thought I was doing something simple here, but C decided to go asynchronous on me. I'm not sure what's going on. Here's my code:
#include <stdio.h>
#include <unistd.h>
int main() {
    printf("start");
    sleep(5);
    printf("stop");
}
When I compile and run, I notice that sleep(5) works like a charm. But the compiler decided it was a good idea to skip the first printf() and go out of order, so when running, the program waits for 5 seconds and then prints startstop.
What's the deal? My theory is that the program initiates the print operation with the shell, then continues with the program, leaving Bash to wait until the program is no longer busy to actually render the strings. But I really don't know.
Thanks
printf uses buffered output. This means that data first accumulates in a memory buffer before it is flushed to the output destination, which in this case is stdout (which generally defaults to console output). Use fflush after your first printf statement to force it to flush the buffered data to its destination.
#include <stdio.h>
#include <unistd.h>
int main() {
    printf("start");
    fflush(stdout);
    sleep(5);
    printf("stop");
}
Also see Why does printf not flush after the call unless a newline is in the format string?
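To see that the text really sits in a user-space buffer until it is flushed, one can look at the other end of a pipe right after printing. This is a sketch under assumptions: the helper name visible_after_print is hypothetical, and a stream opened on a pipe stands in for stdout so that the check can be done inside one process.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: prints "start" to a buffered stream on a pipe
 * and reports how many bytes the reading end can see at that moment,
 * i.e. before any sleep() or exit. Returns -1 on error. */
long visible_after_print(int flush)
{
    int p[2];
    FILE *out;
    char buf[16];
    ssize_t n;
    int flags;

    if (pipe(p) != 0)
        return -1;
    out = fdopen(p[1], "w");
    if (out == NULL) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    fputs("start", out);              /* lands in the stdio buffer */
    if (flush)
        fflush(out);                  /* pushes it to the descriptor */

    /* Non-blocking read, so an empty pipe does not stall the check. */
    flags = fcntl(p[0], F_GETFL);
    if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        fclose(out);
        close(p[0]);
        return -1;
    }
    n = read(p[0], buf, sizeof buf);
    if (n < 0)
        n = (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

    fclose(out);                      /* close the writer while a reader exists */
    close(p[0]);
    return n;
}
```

Here visible_after_print(0) finds nothing in the pipe, while visible_after_print(1) finds the 5 bytes of "start", which is the same difference the fflush(stdout) call makes before sleep(5).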
Try adding '\n' to your printf statements, like so:
#include <stdio.h>
#include <unistd.h>
int main() {
    printf("start\n");
    sleep(5);
    printf("stop\n");
}
The compiler is not executing this out of order. The output just accumulates in a buffer and is displayed when the program exits. When stdout is a terminal it is line-buffered, so the '\n' makes the C library flush the buffer right away.
Read this Q&A, it explains it.

C close STDOUT running forever

I am writing some C code that involves the use of pipes. To make a child process use my pipe instead of STDOUT for output, I used the following lines:
close(STDOUT);
dup2(leftup[1], STDOUT);
However, it seems to go into some sort of infinite loop or hang on those lines. When I get rid of close, it hangs on dup2.
Curiously, the same idea works in the immediately preceding line for STDIN:
close(STDIN);
dup2(leftdown[0], STDIN);
What could be causing this behavior?
Edit: Just to be clear...
#define STDIN 0
#define STDOUT 1
Edit 2: Here is a stripped-down example:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#define STDIN 0
#define STDOUT 1
int main(void){
    pid_t child1 = 0;
    int leftdown[2];
    if (pipe(leftdown) != 0)
        printf("ERROR");
    int leftup[2];
    if (pipe(leftup) != 0)
        printf("ERROR");
    printf("MADE PIPES");
    child1 = fork();
    if (child1 == 0){
        close(STDOUT);
        printf("TEST 1");
        dup2(leftup[1], STDOUT);
        printf("TEST 2");
        exit(0);
    }
    return(0);
}
The "TEST 1" line is never reached. The only output is "MADE PIPES".
At a minimum, you should check that the dup2 function returns the new file descriptor rather than -1.
There's always a possibility that it will give you an error (for example, if the pipe() call failed previously). In addition, be absolutely certain that you're using the right indexes (0 and 1) - I've been bitten by that before and it depends on whether you're in the parent or child process.
Based on your edit, I'm not the least bit surprised that MADE PIPES is the last thing printed.
When you try to print TEST 1, you have already closed the STDOUT descriptor so that will go nowhere.
When you try to print TEST 2, you have duped the STDOUT descriptor so that will go to the parent but your parent doesn't read it.
If you change your forking code to:
child1 = fork();
if (child1 == 0){
    int count;
    close(STDOUT);
    count = printf("TEST 1\n");
    dup2(leftup[1], STDOUT);
    printf("TEST 2 (%d)\n", count);
    exit(0);
} else {
    char buff[80];
    /* read() does not add a terminator, so leave room and add one */
    ssize_t n = read (leftup[0], buff, sizeof buff - 1);
    buff[n > 0 ? n : 0] = '\0';
    printf ("%s\n", buff);
    sleep (2);
}
you'll see that the TEST 2 (-1) line is output by the parent because it read it via the pipe. The -1 in there is the return code from the printf you attempted in the child after you closed the STDOUT descriptor (but before you duped it), meaning that it failed.
From ISO C11 7.21.6.3 The printf function:
The printf function returns the number of characters transmitted, or a negative value if an output or encoding error occurred.
Several things are worth mentioning.
When you use fork, it makes an almost complete copy of the parent process. That includes the buffer that is set up for the stdout standard output stream. The stdout stream holds the data until the buffer is full or a flush is explicitly requested. Because of this, you now have "MADE PIPES" sitting in the buffer. When you close the STDOUT fd and then use printf, nothing is written out to the terminal; "TEST 1" and "TEST 2" are simply added to the stdout buffer, without any error or crash (the buffer is large enough). Thus, even after duplicating the pipe fd onto STDOUT, printf hasn't even touched the write end of the pipe, because the output is still buffered. Most important, please use only one set of APIs, i.e. either the *NIX calls or the standard C library functions. Make sure you understand the libraries well, as they often play tricks for the sake of optimization.
Another thing to mention: make sure that you close the appropriate ends of each pipe in the appropriate process. If, say, pipe-1 is used to communicate from parent to child, then close the read end in the parent and the write end in the child. Otherwise your program may hang. File descriptors are reference-counted: you may think that closing the read end in the child means the read end of the pipe is closed, but if you don't also close the read end in the parent, there is an extra reference to it and that end of the pipe never really closes.
There are many other things to say about your coding style; you'd better get a handle on it :)
The sooner you learn it, the more time it will save you. :)
Error checking is absolutely important; use at least assert to ensure that your assumptions are correct.
When you use printf statements to log errors or as a method of debugging while you are also changing the terminal FDs (STDOUT / STDIN / STDERR), it's better to open a log file with *NIX open and write errors / log entries to it.
Lastly, the strace utility will be a great help to you. It lets you track the system calls executed while your code runs. It is very straightforward and simple. You can even attach it to a running process, provided you have the right permissions.
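The advice about checking dup2(), closing the unused pipe ends, and not mixing stdio with raw descriptors can be combined in a small sketch. The helper name read_child_line is an assumption made for illustration; the pipe name leftup is taken from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child redirects its standard output into a
 * pipe and writes one line; the parent reads that line into buf.
 * Returns the number of bytes read, or -1 on error. */
long read_child_line(char *buf, size_t cap)
{
    const char *msg = "TEST 2\n";
    int leftup[2];
    pid_t child;
    ssize_t n;

    if (cap == 0 || pipe(leftup) != 0)
        return -1;

    child = fork();
    if (child < 0) {
        close(leftup[0]);
        close(leftup[1]);
        return -1;
    }
    if (child == 0) {
        close(leftup[0]);                        /* child never reads */
        if (dup2(leftup[1], STDOUT_FILENO) < 0)  /* check the result */
            _exit(1);
        close(leftup[1]);                        /* the duplicate is enough */
        /* write() bypasses the inherited stdio buffer entirely. */
        if (write(STDOUT_FILENO, msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    close(leftup[1]);                            /* parent never writes */
    n = read(leftup[0], buf, cap - 1);
    close(leftup[0]);
    waitpid(child, NULL, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';                               /* read() adds no terminator */
    return n;
}
```

Because the parent closes its copy of the write end before reading, the read() would also see EOF if the child died without writing, instead of blocking forever.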

Why do I have different output between a terminal and a file when forking?

I'm learning to work with fork(), and I have some questions.
Consider the following code:
#include <stdio.h>
#include <unistd.h>
int main()
{
    int i;
    for(i = 0; i < 5; i++)
    {
        printf("%d", i);
        if((i%2)==0)
            if(fork())
                fork();
    }
}
When I output to a terminal, I get the result I expect (i.e.: 0,1,1,1,2,2,2,...). But when I output to a file, the result is completely different:
Case 1: (output to terminal, e.g.: ./a.out):
Result is: 0,1,1,1,2,2,2,...
Case 2: (output to file, e.g.: ./a.out > output_file)
Result is: 0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,...
Why is it like this?
When you output to a file, the stdio library automatically block-buffers the outbound bits.
When a program calls exit(3) or returns from main(), any remaining buffered bits are flushed.
In a program like this that doesn't generate much output, all of the I/O will occur after the return from main(), when the destination is not a tty. This will often change the pattern and order of I/O operations all by itself.
In this case, the result is further complicated by the series of fork() calls. This will duplicate the partially filled and as-yet-unflushed I/O buffers in each child image.
Before a program calls fork(), one might first flush I/O using fflush(3). If this flush is not done, then you may want all processes except one (typically: the children) to _exit(2) instead of exit(3) or return from main(), to prevent the same bits from being output more than once. (_exit(2) just does the exit system call.)
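The difference between a child that flushes its inherited buffer and one that leaves through _exit(2) can be counted. This is a rough sketch, assuming a hypothetical helper named bytes_seen_after_fork and a stream on a pipe standing in for the redirected stdout; the child's explicit fflush() plays the role of what exit(3) or returning from main() would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: buffers "01234", forks, and returns the total
 * number of bytes that end up in the pipe, or -1 on error. */
long bytes_seen_after_fork(int child_flushes)
{
    int p[2];
    FILE *out;
    pid_t pid;
    char buf[64];
    long total = 0;
    ssize_t n;

    if (pipe(p) != 0)
        return -1;
    out = fdopen(p[1], "w");          /* fully buffered, like stdout to a file */
    if (out == NULL) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    fputs("01234", out);              /* still unflushed when we fork */

    pid = fork();
    if (pid < 0) {
        fclose(out);
        close(p[0]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        if (child_flushes)
            fflush(out);              /* stands in for the flush done by exit(3) */
        _exit(0);                     /* _exit(2) itself leaves stdio buffers alone */
    }

    fclose(out);                      /* the parent writes its copy once */
    waitpid(pid, NULL, 0);
    while ((n = read(p[0], buf, sizeof buf)) > 0)
        total += n;
    close(p[0]);
    return total;
}
```

When the child flushes, the five digits arrive twice (10 bytes); when it simply calls _exit(2), only the parent's 5 bytes arrive.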
The fork() inside the if block runs only in the parent, because the first fork() returns 0 in the child and a non-zero PID in the parent. Once a fork succeeds, the program is being run by two processes (child and parent), and their order of execution is not known, i.e. either the child or the parent may run first after each fork(). So the output can differ from what you expect.
As for the difference in behaviour between the terminal and the file, this is the reason.
The contents you write to the buffer (to be written to the file on disk eventually) are not guaranteed to be written to the file immediately. They are mostly flushed only after the execution of main() is complete, whereas output to a terminal appears during the execution of main().
Before writing to a file on disk, the kernel actually copies the data into a buffer, and later, in the background, the kernel gathers up all of the dirty buffers, sorts them optimally and writes them out to the file (disk). This is called writeback. It also allows the kernel to defer writes to more idle periods and batch many writes together.
To avoid such behaviour, it is always good to have three different condition checks in program using fork()
pid_t pid;
if ((pid = fork()) == -1)
{
    // fork unsuccessful
}
else if (pid > 0)
{
    // This is parent
}
else
{
    // This is child
}
Buffered streams can produce some strange results sometimes... especially when you have multiple processes using the same buffered stream. Force the buffer to be flushed and you'll see different results:
#include <stdio.h>
#include <unistd.h>
int main()
{
    int i;
    FILE * fd = fopen(yourfile, "w"); /* yourfile: path of your output file */
    for(i = 0; i < 5; i++)
    {
        fprintf(fd, "%d", i);
        fflush(fd);
        if((i%2)==0)
            if(fork())
                fork();
    }
}
Also, for your debugging purposes, it might be nice to dump the process' IDs so you can see which process spawns which, and have a better idea of what's going on. getpid() can help you with that.
Why do I have different output between a terminal and a file when
forking?
C standard library functions use internal buffering to speed things up. Most implementations use fully buffered I/O for file streams, line-buffered I/O for stdin/stdout when they are attached to a terminal, and unbuffered I/O for stderr.
So your problem can be solved in a number of ways:
Use explicit buffer flush before fork via fflush(3)
Set buffer type manually via setvbuf(3)
Use write(2) instead of stdio's printf(3)
Output to stderr by default via fprintf(3) *
Exit with _exit(2) in forked processes instead of exit(3) **
The last two may not work as expected if:
* your implementation does not use unbuffered writes to stderr by default (ISO C requires only that stderr is not fully buffered)
** you have written more than the default buffer size in the child and it was automatically flushed.
PS. Yet again, if you need deeper knowledge of standard library functions and buffering I recommend reading Advanced Programming in the UNIX Environment (2nd Edition) by W. Richard Stevens and Stephen A. Rago.
PPS. btw, your question is a very popular interview question for C/C++ programmer position.

Resources