What is meant by stream buffering?

What is meant by stream buffering? - c

I had started learning C programming, so I'm a beginner, while learning about standard streams of text, I came up with the lines "stdout" stream is buffered while "stderr" stream is not buffered, but I am not able to make sense with this lines.
I already have read about "buffer" on this forum and I like candy analogy, but I am not able to figure out what is meant when one says: "This stream is buffered and the other one is not." What is the effect?
What is the difference?
Update: Does it affect the speed of processing?

Buffer is a block of memory which belongs to a stream and is used to hold stream data temporarily. When the first I/O operation occurs on a file, malloc is called and a buffer is obtained. Characters that are written to a stream are normally accumulated in the buffer (before being transmitted to the file in chunks), instead of appearing as soon as they are output by the application program. Similarly, streams retrieve input from the host environment in blocks rather than on a character-by-character basis. This is done to increase efficiency, as file and console I/O is slow in comparison to memory operations.
GCC provides three types of buffering - unbuffered, block buffered, and line buffered. Unbuffered means that characters appear on the destination file as soon as written (for an output stream), or input is read from a file on a character-by-character basis instead of reading in blocks (for input streams). Block buffered means that characters are saved up in the buffer and written or read as a block. Line buffered means that characters are saved up only till a newline is written into or read from the buffer.
stdin and stdout are block buffered if and only if they can be determined not to refer to an interactive device else they are line buffered (this is true of any stream). stderr is always unbuffered by default.
The standard library provides functions to alter the default behaviour of streams. You can use fflush to force the data out of the output stream buffer (fflush is undefined for input streams). You can make the stream unbuffered using the setbuf function.

Buffering is collecting up many elements before writing them, or reading many elements at once before processing them. Lots of information out there on the Internet, for example, this
and other SO questions like this
EDIT in response to the question update: And yes, it's done for performance reasons. Writing and reading from disks etc will in any case write or read a 'block' of some sort for most devices, and there's a fair overhead in doing so. So batching these operations up can make for a dramatic performance difference

A program writing to buffered output can perform the output in the time it takes to write to the buffer which is typically very fast, independent of the speed of the output device which may be slow.
With buffered output the information is queues and a separate process deals with the output rendering.
With unbuffered output, the data is written directly to the output device, so runs at the speed on the device. This is important for error output because if the output were buffered it would be possible for the process to fail before the buffered output has made it to the display - so the program might terminate with no diagnostic output.

Related

Misunderstand line-buffer in Unix

I'm reading Advanced Programming in the UNIX Environment, 3rd Edition and misunderstanding a section in it (page 145, Section 5.4 Buffering, Chapter 5).
Line buffering comes with two caveats. First, the size of the buffer that the
standard I/O library uses to collect each line is fixed, so I/O might take place if
we fill this buffer before writing a newline. Second, whenever input is
requested through the standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream (that requires data to be requested from the kernel),
all line-buffered output streams are flushed. The reason for the qualifier on (b)
is that the requested data may already be in the buffer, which doesn’t require
data to be read from the kernel. Obviously, any input from an unbuffered
stream, item (a), requires data to be obtained from the kernel.
I can't get the bold lines. My English isn't good. So, could you clarify it for me? Maybe in an easier way. Thanks.

The point behind the machinations described is to ensure that prompts appear before the system goes into a mode where it is waiting for input.
If an input stream is unbuffered, every time the standard I/O library needs data, it has to go to the kernel for some information. (That's the last sentence.) That's because the standard I/O library does not buffer any data, so when it needs more data, it has to read from the kernel. (I think that even an unbuffered stream might buffer one character of data, because it would need to read up to a space character, for example, to detect when it has reached the end of a %s format string; it has to put back (ungetc()) the extra character it read so that the next time it needs a character, there is the character it put back. But it never needs more than the one character of buffering.)
If an input stream is line buffered, there may already be some data in its input buffer, in which case it may not need to go to the kernel for more data. In that case, it might not flush anything. This can occur if the scanf() format requested "%s" and you typed hello world; it would read the whole line, but the first scan would stop after hello, and the next scanf() would not need to go to the kernel for the world word because it is already in the buffer.
However, if there isn't any data in the buffer, it has to ask the kernel to read the data, and it ensures that any line-buffered output streams are flushed so that if you write:
printf("Enter name: ");
if (scanf("%63s", name) != 1)
…handle error or EOF…
then the prompt (Enter name:) appears. However, if you'd previously typed hello world and previously read just hello, then the prompt wouldn't necessarily appear because the world was already waiting in the (line buffered) input stream.

This may explain the point.
Let's imagine that you have a pipe in your program and you use it for communication between different parts of your program (single thread program writing and reading from this single pipe).
If you write to the writing end of the pipe, say the letter 'A', and then call the read operation to read from the reading end of the pipe. You would expect that the letter 'A' is read. However, read operation is a system call to the kernel. To be able to return the letter 'A' it must be written to the kernel first. This means that the writing of 'A' must be flushed, otherwise it would stay in your local writing buffer and your program would be locked forever.
In consequence, before calling a read operation all write buffers are flushed. This is what the section (b) says.

The size of the buffer that the standard I/O library is using to collect each line is fixed.
with the help of the fgets function we are getting the line continuously, during that time it will read the content with the specified buffer size or up to newline.
Second, whenever input is requested through the standard I/O library, it can use an unbuffered stream or line-buffered stream.
unbuffered stream - It will not buffer the character, flush the character regularly.
line-buffered - It will store the character into the buffer and then flush when the operation is completed.
lets take without using \n we are going to print the content in printf statement, that time it will buffer all the content until we flush or printing with new line. Like that when the operation is completed the stream buffer is flushed internally.
(b) is that the requested data may already be in the buffer, which doesn't require data to be read from the kernel
In line oriented stream the requested buffer may already in the buffer because the data can be buffered, so we can't required data to read from the kernel once again.
(a) requires data to be obtained from the kernel.
Any input from unbuffered stream item, a data to be get from the kernel due to the unbuffered stream can't store anything in the buffer.

Regarding printing characters in C

int main()
{
printf("Hello"); // doesn't display anything on the screen
printf("\n"); // hello is display on the screen
return 0;
}
All characters(candidate of printing) are buffered until a new line is received? Correct?
Q1 - Why does it wait before printing on terminal until a newline char?
Q2 - Where are the characters of first printf (i.e. "Hello") buffered?
Q3 - What is the flow of printing printf()->puts()->putchar() -> now where? driver? Does the driver has a control to wait until \n?
Q4 - What is the role stdout that is attached to a process?
Looking for a in-depth picture. Feel free to edit the question, if something doesn't makes sense.

printf is not writing directly to the screen, instead it writes to the output stream, which is by default buffered. The reason for this is, that there may not even be a screen attached and the output can go to a file as well. For performance reasons, it is better for a system if access to disc is buffered and then executed in one step with appropriately sized chunks, rather than writing every time.
You can even change the size of the buffer and set it to 0, which means that all output goes directly to the target, which may be usefull for logging purposes.
setbuf(stdout, NULL);
The buffer is flushed either when it is full, or if certain criterions are fullfilled, like printing a newline. So when you would execute the printf in a loop, you would notice that it will write out in chunks unless you have a newline inbetween.

I'll start with some definitions and then go on to answer your questions.
File: It is an ordered sequence of bytes. It can be a disk file, a stream of bytes generated by a program (such as a pipeline), a TCP/IP socket, a stream of bytes received from or sent to a peripheral device (such as the keyboard or the display) etc. The latter two are interactive files. Files are typically the principal means by which a program communicates with its environment.
Stream: It is a representation of flow of data from one place to another, e.g., from disk to memory, memory to disk, one program to another etc. A stream is a source of data where data can be put into (write) or taken data out of (read). Thus, it's an interface for writing data into or reading data from a file which can be any type as stated above. Before you can perform any operation on a file, the file must be opened. Opening a file associates it with a stream. Streams are represented by FILE data type defined in stdio.h header. A FILE object (it's a structure) holds all of the internal state information about the connection to the associated file, including such things as the file position indicator and buffering information. FILE objects are allocated and managed internally by the input/output library functions and you should not try to create your own objects of FILE type, the library does it for us. The programs should deal only with pointers to these objects (FILE *) rather than the objects themselves.
Buffer: Buffer is a block of memory which belongs to a stream and is used to hold stream data temporarily. When the first I/O operation occurs on a file, malloc is called and a buffer is obtained. Characters that are written to a stream are normally accumulated in the buffer (before being transmitted to the file in chunks), instead of appearing as soon as they are output by the application program. Similarly, streams retrieve input from the host environment in blocks rather than on a character-by-character basis. This is done to increase efficiency, as file and console I/O is slow in comparison to memory operations.
The C library provides three predefined text streams (FILE *) open and available for use at program start-up. These are stdin (the standard input stream, which is the normal source of input for the program), stdout (the standard output stream, which is used for normal output from the program), and stderr (the standard error stream, which is used for error messages and diagnostics issued by the program). Whether these streams are buffered or unbuffered is implementation-defined and not required by the standard.
GCC provides three types of buffering - unbuffered, block buffered, and line buffered. Unbuffered means that characters appear on the destination file as soon as written (for an output stream), or input is read from a file on a character-by-character basis instead of reading in blocks (for input streams). Block buffered means that characters are saved up in the buffer and written or read as a block. Line buffered means that characters are saved up only till a newline is written into or read from the buffer.
stdin and stdout are block buffered if and only if they can be determined not to refer to an interactive device else they are line buffered (this is true of any stream). stderr is always unbuffered by default.
The standard library provides functions to alter the default behaviour of streams. You can use fflush to force the data out of the output stream buffer (fflush is undefined for input streams). You can make the stream unbuffered using the setbuf function.
Now, let's come to your questions.
Unmarked question: Yes, becausestdout normally refers to a display terminal unless you have output redirection using > operator.
Q1: It waits because stdout is newline buffered when it refers to a terminal.
Q2: The characters are buffered, well, in the buffer allocated to the stdout stream.
Q3: Flow of the printing is: memory --> stdout buffer --> display terminal. There are kernel buffers as well controlled by the OS which the data pass through before appearing on the terminal.
Q4: stdout refers to the standard output stream which is usually a terminal.
Finally, here's a sample code to experiment things before I finish my answer.
#include <stdio.h>
#include <limits.h>
int main(void) {
// setbuf(stdout, NULL); // make stdout unbuffered
printf("Hello, World!"); // no newline
// printf("Hello, World!"); // with a newline
// only for demonstrating that stdout is line buffered
for(size_t i = 0; i < UINT_MAX; i++)
; // null statement
printf("\n"); // flush the buffer
return 0;
}

Yes, by default, standard output is line buffered when it's connected to a terminal. The buffer is managed by the operating system, normally you don't have to worry about it.
You can change this behavior using setbuf() or setvbuf(), for example, to change it to no buffer:
setbuf(stdout, NULL);
All the functions of printf, puts, putchar outputs to the standard output, so they use the same buffer.

If you wish, you can flush out the characters before the new line by calling
fflush(stdout);
This can be handy if you're slowly printing something like a progress bar where each character gets printed without a newline.
int main()
{
printf("Hello"); // Doesn't display anything on the screen
fflush(stdout); // Now, hello appears on the screen
printf("\n"); // The new line gets printed
return 0;
}

what is the point of using the setvbuf() function in c?

Why would you want to set aside a block of memory in setvbuf()?
I have no clue why you would want to send your read/write stream to a buffer.

setvbuf is not intended to redirect the output to a buffer (if you want to perform IO on a buffer you use sprintf & co.), but to tightly control the buffering behavior of the given stream.
In facts, C IO functions don't immediately pass the data to be written to the operating system, but keep an intermediate buffer to avoid continuously performing (potentially expensive) system calls, waiting for the buffer to fill before actually performing the write.
The most basic case is to disable buffering altogether (useful e.g. if writing to a log file, where you want the data to go to disk immediately after each output operation) or, on the other hand, to enable block buffering on streams where it is disabled by default (or is set to line-buffering). This may be useful to enhance output performance.
Setting a specific buffer for output can be useful if you are working with a device that is known to work well with a specific buffer size; on the other side, you may want to have a small buffer to cut down on memory usage in memory-constrained environments, or to avoid losing much data in case of power loss without disabling buffering completely.

In C files opened with e.g. fopen are by default buffered. You can use setvbuf to supply your own buffer, or make the file operations completely unbuffered (like to stderr is).
It can be used to create fmemopen functionality on systems that doesn't have that function.

The size of a files buffer can affect Standard library call I/O rates. There is a table in Chap 5 of Steven's 'Advanced Programming in the UNIX Environment' that shows I/O throughput increasing dramatically with I/O buffer size, up to ~16K then leveling off. A lot of other factor can influenc overall I/O throughtput, so this one "tuning" affect may or may not be a cureall. This is the main reason for "why" other than turning off/on buffering.

Each FILE structure has a buffer associated with it internally. The reason behind this is to reduce I/O, and real I/O operations are time costly.
All your read/write will be buffered until the buffer is full. All the data buffered will be output/input in one real I/O operation.

Why would you want to set aside a block of memory in setvbuf()?
For buffering.
I have no clue why you would want to send your read/write stream to a buffer.
Neither do I, but as that's not what it does the point is moot.
"The setvbuf() function may be used on any open stream to change its buffer" [my emphasis]. In other words it alread has a buffer, and all the function does is change that. It doesn't say anything about 'sending your read/write streams to a buffer". I suggest you read the man page to see what it actually says. Especially this part:
When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered many characters are saved up and written as a block; when it is line buffered characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin).

How are files written? Why do I not see my data written immediately?

I understand the general process of writing and reading from a file, but I was curious as to what is happening under the hood during file writing. For instance, I have written a program that writes a series of numbers, line by line, to a .txt file. One thing that bothers me however is that I don't see the information written until after my c program is finished running. Is there a way to see the information written while the program is running rather than after? Is this even possible to do? This is a hard question to phrase in one line, so please forgive me if it's already been answered elsewhere.
The reason I ask this is because I'm writing to a file and was hoping that I could scan the file for the highest and lowest values (the program would optimally be able to run for hours).

Research buffering and caching.
There are a number of layers of optimisation performed by:
your application,
your OS, and
your disk driver,
in order to extend the life of your disk and increase performance.
With the careful use of flushing commands, you can generally make things happen "quite quickly" when you really need them to, though you should generally do so sparingly.
Flushing can be particularly useful when debugging.
The GNU C Library documentation has a good page on the subject of file flushing, listing functions such as fflush which may do what you want.

You observe an effect solely caused by the C standard I/O (stdio) buffers. I claim that any OS or disk driver buffering has nothing to do with it.
In stdio, I/O happens in one of three modes:
Fully buffered, data is written once BUFSIZ (from <stdio.h>) characters were accumulated. This is the default when I/0 is redirected to a file or pipe. This is what you observe. Typically BUFSIZ is anywhere from 1k to several kBytes.
Line buffered, data is written once a newline is seen (or BUFSIZ is reached). This is the default when i/o is to a terminal.
Unbuffered, data is written immediately.
You can use the setvbuf() (<stdio.h>) function to change the default, using the _IOFBF, _IOLBF or _IONBF macros, respectively. See your friendly setvbuf man page.
In your case, you can set your output stream (stdout or the FILE * returned by fopen) to line buffered.
Alternatively, you can call fflush() on the output stream whenever you want I/O to happen, regardless of buffering.

Indeed, there are several layers between the writing commands resp. functions and the actual file.
First, you open the file for writing. This causes the file to be either created or emptied. If you write then, the write doesn't actually occur immediately, but the data are cached until the buffer is full or the file is flushed or closed.
You can call fflush() for writing each portion of data, or you can actually wait until the file is closed.

Yes, it is possible to see whats written in the file(s). If you programm under Linux you can open a new Terminal and watch the progress with for example "less Filename".

buffer confusion

Could anyone clarify on the types of buffers used by a program?
For eg:
I have a C program that reads from a stdin to stdout.
What are the buffers involved here? I'm aware that there are 2.
One provided by the kernel on which a user don't have any control.
One provided with standard streams namely stdout, stdin and stderr. Each having a separate buffer.
Is my understanding correct?
Thanks,
John

If you are working on linux/unix then you could more easily understand that there are three streams namely
1.STDIN: FILE DESCRIPTOR VALUE 0 (IN unix)
2.STDOUT :FILE DESCRIPTOR VALUE 1
3.STDERR :FILE DESCRIPTOR VALUE 2
By default these streams correspond to keyboard and monitor.In unix we can change these streams to read input from file instead of keyboard.To display output on a file rather than monitor using close(),dup() system calls.Yes there are 3 buffers involved.To clear the contents of input buffer in c we use fflush() function.
If you want to know more about handling these streams in UNIX then let me Know.

The kernel (or other underlying system) could have any number of layers of buffering, depending on what device is being read from and the details of the kernel implementation; in some systems there is no buffering at that level, with the data being read directly into the userspace buffer.
The stdio library allocates a buffer for stdin; the size is implementation-dependent but you can control the size and even use your own buffer with setvbuf. It also allows you to control whether I/O is fully buffered (as much data is read into the buffer as is available), line buffered (data is is only read until a newline is encountered), or unbuffered. The default is line buffering if the system can determine that the input is a terminal, else fully buffered.
The story is similar for stdout. stderr is by default unbuffered.