why "Line buffer stdout to ensure lines are written atomically and immediately" - c

I'm reading the source code of the wc command, and in the main function I found the following code:
/* Line buffer stdout to ensure lines are written atomically and immediately
so that processes running in parallel do not intersperse their output. */
setvbuf (stdout, NULL, _IOLBF, 0);
So why does line buffering stdout ensure that?

Say block buffering is used for stdout instead of line buffering. (This is the default if stdout refers to a regular file, for example.) Let the buffer size be 1024 bytes (so that output is flushed to the file every 1024 bytes), and pretend that two processes are writing to the same file.
Say that the first process currently has 1020 bytes in its I/O buffer and writes the line "foo_file 37\n" to stdout. This will put "foo_" at the end of the I/O buffer, flush the buffer to the file (since the buffer is now full), and then put "file 37\n" at the beginning of the buffer. Say that the second process then comes along and flushes its buffer, which happens to start with "bar_file 48\n". The resulting line in the output file will then be "foo_bar_file 48", which clearly isn't what we want.
The basic problem is that buffer boundaries do not necessarily correspond to line boundaries when block buffering is used.
You could play around with two instances of the following program writing to the same file to see this effect in action yourself:
#include <stdio.h>
int main(void) {
setvbuf (stdout, NULL, _IOLBF, 0);
for (;;)
puts("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
return 0;
}
With the setvbuf() call commented out, you will see some lines get mixed up with other lines. Be aware that this program will quickly write a huge file, of course. :)

Related

C Read in bash : stdin and stdout

I have a simple C program with the read function and I don't understand the output.
//code1.c
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
int main()
{
int r;
char c; // In C, char values are stored in 1 byte
r = read ( 0, &c, 1);
// DOC:
//ssize_t read (int filedes, void *buffer, size_t size)
//The read function reads up to size bytes from the file with descriptor filedes, storing the results in the buffer.
//The return value is the number of bytes actually read.
// Here:
// filedes is 0, the standard input descriptor (STDIN_FILENO from <unistd.h>)
// *buffer is &c : address in memory of char c
// size is 1 meaning it will read only 1 byte
printf ("r = %d\n", r);
return 0;
}
And here is a screenshot of the result:
I ran this program 2 times as shown above and typed "a" for the first try and "aecho hi" for the second try.
How I try to explain the results:
When read is called it sees that stdin is closed and opens it (from my point of view, why? It should just read it. I don't know why it opens it).
I type "aecho hi" in the bash and press enter.
read has priority to process stdin and reads the first byte of "aecho hi" : "a".
I get the confirmation that read has processed 1 byte with the printf.
a.out has finished and is terminated.
Somehow the remaining data in stdin is processed in bash (the father of my program) and goes to stdout which executes it and for some reason the first byte has been deleted by read.
This is all hypothetical and very blurry. Any help understanding what is happening would be very welcome.
When you type at your terminal emulator, it writes your keystrokes to a "file", in this case an in-memory buffer that, thanks to the file system, looks just like any other file that might be on disk.
Every process inherits 3 open file handles from its parent. We are interested in one of them here, standard input. The program executed by the terminal emulator (here, bash), is given as its standard input the in-memory buffer described in the first paragraph.
a.out, when run by bash, also receives this same file as its standard input. Keep this in mind: bash and a.out are reading from the same, already-opened file.
After you run a.out, its read blocks, because its standard input is empty. When you type aecho hi<enter>, the terminal writes these characters to the buffer (<enter> becoming a single linefeed character). a.out only requests one character, so it gets a and leaves the rest of the characters in the file. (Or more precisely, the file pointer is still pointing at the e after a is read.)
After a.out completes, bash tries to read from the same file. Normally, the file is empty (i.e., the file pointer is at the end of the file), so bash blocks waiting for another command. In this case, though, there is input available already: echo hi\n. bash reads this now the same as if you had typed it after a.out completed.
Check this. As alk suggests stdin and stdout are already open with the program. Now you have to understand, once you type:
aecho hi
and hit return the stdin buffer is filled with all those letters (and space) - and will continue to be as long as you don't flush it. When the program exits, the stdin buffer is still full, and your terminal automatically handles a write into stdin by echoing it to stdout - this is what you're seeing at the end - your shell reading stdin.
Now as you point out, your code "presses return" for you so to speak - in the first execution adding an empty shell line, and in the second executing echo hi. But you must remember, you pressed return, so "\n" is in the buffer! To be explicit, you in fact typed:
aecho hi\n
Once your program exits the shell reads the remaining characters in the buffer, including the return, and that's what you see!

stdout stream changes order after redirection?

I have been studying "apue" lately, and the result of a typical example confused me. The following is the sample code of "sample.c":
#include "apue.h"
#include <stdio.h>
#define BUFF_SZ 4096
int main()
{
int n = 0;
char buff[BUFF_SZ] = {'\0'};
while ((n = read(STDIN_FILENO, buff, BUFF_SZ)) > 0) {
printf("read %d bytes\n", n);
if (write(STDOUT_FILENO, buff, n) != n) {
err_sys("write error");
}
}
if (n < 0) {
err_sys("read error");
}
return 0;
}
After compilation gcc sample.c, you can use this command echo Hello | ./a.out and get the following std output on terminal:
read 6 bytes
Hello
However, if you redirect the output to a file echo Hello | ./a.out > outfile, then use cat outfile to see the content:
Hello
read 6 bytes
The output changes order after redirection! I wonder if someone could tell me the reason?
For the standard I/O function printf, when you output to a terminal, the standard output is by default line buffered.
printf("read %d bytes\n", n);
The \n here causes the output to be flushed.
However, when you output to a file, it's by default fully buffered. The output won't flush unless the buffer is full, or you explicitly flush it.
The low level system call write, on the other hand, is unbuffered.
In general, intermixing standard I/O calls with system calls is not advised.
printf(), by default, buffers its output, while write() does not, and there is no synchronisation between them.
So, in your code, it is possible that printf() stores its data in a buffer and returns, then write() is called, and then, as main() returns, printf()'s buffer is flushed so that the buffered output appears. From your description, that is happening when output is redirected.
It is also possible that printf() writes data immediately, then write() is called. From your description, that happens when output is not redirected.
Typically, one part of redirection of a stream is changing the buffer - and therefore the behaviour when buffering - for streams like stdout and stdin. The precise change depends on what type of redirection is happening (e.g. to a file, to a pipe, to a different display device, etc).
Imagine that printf() writes data to a buffer and, when flushing that buffer, uses write() to produce output. That means all overt calls of write() will have their output produced immediately, but data that is buffered may be printed out of order.
The problem is that the writes are handled by write(2) call, so you effectively lose control of what happens.
If we look at the documentation for write(2), we can see that a successful write() does not guarantee that the data has actually been committed to disk. More specifically:
A successful return from write() does not make any guarantee that data has
been committed to disk. In fact, on some buggy implementations, it does not even
guarantee that space has successfully been reserved for the data. The only way to
be sure is to call fsync(2) after you are done writing all your data.
This means that depending on the implementation and buffering of the write(2) (which may differ even between redirects and printing to screen), you can get different results.

stderr and stdout - not buffered vs. buffered?

I was writing a small program that had various console output strings depending upon different events. As I was looking up the best way to send these messages I came across something that was a bit confusing.
I have read that stderr is used to shoot messages directly to the console - not buffered. While, in contrast, I read that stdout is buffered and is typically used to redirect messages to various streams?, that may or may not be error messages, to an output file or some other medium.
What is the difference when something is said to be buffered and not buffered? It made sense when I was reading that the message is shot directly to the output and is not buffered .. but at the same time I realized that I was not entirely sure what it meant to be buffered.
Typically, stdout is line buffered when it is connected to a terminal, meaning that characters sent to stdout "stack up" until a newline character arrives, at which point they are all output.
A buffered stream is one in which you keep writing until a certain threshold. This threshold could be a specific character, as Konrad mentions for line buffering, or another threshold, such as a specific count of characters written.
Buffering is intended to speed up input/output operations. One of the slowest things a computer does is write to a stream (whether a console or a file). When things don't need to be immediately seen, it saves time to store it up for a while.
You are right, stderr is typically an unbuffered stream while stdout typically is buffered. So there can be times when you output things to stdout then output to stderr and stderr appears first on the console. If you want to make stdout behave similarly, you would have to flush it after every write.
When an output stream is buffered, it means that the stream doesn't necessarily output data the moment you tell it to. There can be significant overhead per IO operation, so lots and lots of little IO operations can create a bottleneck. By buffering IO operations and then flushing many at once, this overhead is reduced.
While stdout and stderr may behave differently regarding buffering, that is generally not the deciding factor between them and shouldn't be relied on. If you absolutely need the output immediately, always manually flush the stream.
Assume
#include <stdio.h>
#include <unistd.h> /* sleep() */
int main(void)
{
    printf("foo\n");
    sleep(10);
    printf("bar\n");
    return 0;
}
when executing it on the console
$ ./a.out
you will see the foo line and, 10 seconds later, the bar line (--> line buffered). When redirecting the output into a file or a pipe
$ ./a.out > /tmp/file
the file stays empty (--> buffered) until the program terminates (--> implicit fflush() at exit).
When the lines above do not contain a \n, you won't see anything on the console either until the program terminates.
Internally, printf() adds characters to a buffer. To keep things simple, let me describe fputs(char const *s, FILE *f) instead. FILE might be defined as
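/* Simplified sketch only: a real FILE has more fields, and these functions
   would clash with the declarations in <stdio.h>. */
enum buffer_mode { UNBUFFERED, LINE_BUFFERED, FULLY_BUFFERED };
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
typedef struct FILE {
    int fd; /* is 0 for stdin, 1 for stdout, 2 for stderr (usually) */
    enum buffer_mode mode;
    char buf[4096];
    size_t count; /* number of chars in buf[] */
} FILE;
int fflush(FILE *f)
{
    /* hand everything buffered so far to the kernel in one write() */
    ssize_t n = write(f->fd, f->buf, f->count);
    f->count = 0;
    return n < 0 ? -1 : 0;
}
int fputc(int c, FILE *f)
{
    /* make room first if the buffer is already full */
    if (f->count >= ARRAY_SIZE(f->buf))
        fflush(f);
    f->buf[f->count++] = c;
    return (unsigned char)c;
}
int fputs(char const *s, FILE *f)
{
    while (*s) {
        char c = *s++;
        fputc(c, f);
        /* line buffering: every newline triggers a flush */
        if (f->mode == LINE_BUFFERED && c == '\n')
            fflush(f);
    }
    /* unbuffered: flush after every call */
    if (f->mode == UNBUFFERED)
        fflush(f);
    return 0;
}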

C: multi-processes stdio append mode

I wrote this code in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>  /* gettimeofday(), struct timeval */
#include <unistd.h>    /* getpid() */
void random_seed(void){
    struct timeval tim;
    gettimeofday(&tim, NULL);
    double t1=tim.tv_sec+(tim.tv_usec/1000000.0);
    srand (t1);
}
int main(void){
    FILE *f;
    int i;
    int size=100;
    char *buf=(char*)malloc(size);
    f = fopen("output.txt", "a");
    setvbuf (f, buf, _IOFBF, size);
    random_seed();
    for(i=0; i<200; i++){
        fprintf(f, "[ xx - %d - 012345678901234567890123456789 - %d]\n", rand()%10, (int)getpid());
        fflush(f);
    }
    fclose(f);
    free(buf);
    return 0;
}
This code opens a file in append mode and appends a string to it 200 times.
I set a buffer of size 100, which can contain the full string.
Then I created multi processes running this code by using this bash script:
#!/bin/bash
gcc source.c
rm output.txt
for i in `seq 1 100`;
do
./a.out &
done
I expected that the strings in the output would never be mixed up, as I read that when a file is opened with the O_APPEND flag the file offset is set to the end of the file prior to each write, and I'm using a fully buffered stream. But the first line of each process gets mixed up like this:
[ xx - [ xx - 7 - 012345678901234567890123456789 - 22545]
and some lines later
2 - 012345678901234567890123456789 - 22589]
It looks like the write is interrupted by the call to the rand function.
So... why do these lines appear?
Is using file locks the only way to prevent this, even though I'm only using append mode?
Thanks in advance!
You will need to implement some form of concurrency control yourself, POSIX makes no guarantees with respect to concurrent writes from multiple processes. You get some guarantees for pipes, but not for regular files written to from different processes.
Quoting POSIX write():
This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control.
(At the end of the Rationale section.)
You open the file in the fully buffered mode. That means that every line of the output first goes into the buffer and when the buffer overflows it gets flushed to the file regardless whether it contains incomplete lines. That causes chunks of output from different processes writing into the same file concurrently to be interleaved.
An easy fix would be to open the file in line buffered mode _IOLBF, so that the buffer gets flushed on each complete line. Just make sure that the buffer size is at least as big as your longest line, otherwise it will end up writing incomplete lines. The buffer is normally flushed with a single write() system call, so that lines from different processes won't interleave each other.
There is no guarantee that write() system call is atomic for different filesystems though, but it normally works as expected because write() normally locks the file descriptor in the kernel with a mutex before proceeding.

Whats wrong with this print order

have a look at this code:
#include<stdio.h>
#include <unistd.h>
int main()
{
int pipefd[2],n;
char buf[100];
if(pipe(pipefd)<0)
printf("Pipe error");
printf("\nRead fd:%d write fd:%d\n",pipefd[0],pipefd[1]);
if(write(pipefd[1],"Hello Dude!\n",12)!=12)
printf("Write error");
if((n=read(pipefd[0],buf,sizeof(buf)))<=0)
printf("Read error");
write(1,buf,n);
return 0;
}
I expect the printf to print Read fd and write fd before Hello Dude is read from the pipe. But that's not the case... see here. When I tried the same program in our college computer lab, my output was
Read fd:3 write fd:4
Hello Dude!
Also, a few of our friends observed that changing the printf statement to contain more \n characters changed the output order. For example, printf("\nRead fd:%d\n write fd:%d\n",pipefd[0],pipefd[1]); meant that Read fd was printed, then the message Hello Dude!, and then write fd. What explains this behaviour?
Note: Our lab uses a Linux server on which we run terminals; I don't remember the compiler version though.
It's because printf to the standard output stream is buffered but write to the standard output file descriptor is not.
That means the behaviour can change based on what sort of buffering you have. In C, standard output is line buffered if it can be determined to be connected to an interactive device. Otherwise it's fully buffered (see here for a treatise on why this is so).
Line buffered means it will flush to the file descriptor when it sees a newline. Fully buffered means it will only flush when the buffer fills (for example, 4K worth of data), or when the stream is closed (or when you fflush).
When you run it interactively, the flush happens before the write because printf encounters the \n and flushes automatically.
However, when you run it otherwise (such as by redirecting output to a file or in an online compiler/executor where it would probably do the very same thing to capture data for presentation), the flush happens after the write (because printf is not flushing after every line).
In fact, you don't need all that pipe stuff in there to see this in action, as per the following program:
#include <stdio.h>
#include <unistd.h>
int main (void) {
printf ("Hello\n");
write (1, "Goodbye\n", 8);
return 0;
}
When I execute myprog ; echo === ; myprog >myprog.out ; cat myprog.out, I get:
Hello
Goodbye
===
Goodbye
Hello
and you can see the difference that the different types of buffering makes.
If you want line buffering regardless of redirection, you can try:
setvbuf (stdout, NULL, _IOLBF, BUFSIZ);
early on in your program - it's implementation defined whether an implementation supports this so it may have no effect but I've not seen many where it doesn't work.
You shouldn't mix calls to write and printf on single file descriptor. Change write to fwrite.
Functions which use FILE are buffered. Functions which use file descriptors are not. This is why you may get mixed order.
You can also try calling fflush before write.
When you write onto the same file, or pipe, or whatever by two means at once (direct IO and output stream) you can get this behaviour. The reason is that the output stream is buffered.
With fflush() you can control that behaviour.
What is happening is that printf writes to stdout in a buffered way -- the string is kept in a buffer before being output -- while the 'write' later on writes to stdout unbuffered. This can have the effect that the output from 'write' appears first if the buffer from the printf is only flushed later on.
You can explicitly flush using fflush(), but even better would be not to mix buffered and non-buffered writes to the same output. Type man 3 printf, man 3 fflush, man 3 fwrite etc. on your terminal to learn more about what these functions do exactly.
