Understanding the need for fflush() and problems associated with it - c

Below is sample code for using fflush():
#include <string.h>
#include <stdio.h>
#include <conio.h>
#include <io.h>
void flush(FILE *stream);
int main(void)
{
    FILE *stream;
    char msg[] = "This is a test";
    /* create a file */
    stream = fopen("DUMMY.FIL", "w");
    /* write some data to the file */
    fwrite(msg, strlen(msg), 1, stream);
    clrscr();
    printf("Press any key to flush DUMMY.FIL:");
    getch();
    /* flush the data to DUMMY.FIL without closing it */
    flush(stream);
    printf("\nFile was flushed, Press any key to quit:");
    getch();
    return 0;
}
void flush(FILE *stream)
{
    int duphandle;
    /* flush the stream's internal buffer */
    fflush(stream);
    /* make a duplicate file handle */
    duphandle = dup(fileno(stream));
    /* close the duplicate handle to flush the DOS buffer */
    close(duphandle);
}
All I know about fflush() is that it is a library function used to flush an output buffer. I want to know the basic purpose of fflush() and where I can use it. Mainly, I am interested in what problems can arise from using fflush().

It's a little hard to say what "can be problems with" (excessive?) use of fflush. All kinds of things can be, or become, problems, depending on your goals and approaches. Probably a better way to look at this is what the intent of fflush is.
The first thing to consider is that fflush is defined only on output streams. An output stream collects "things to write to a file" into a large(ish) buffer, and then writes that buffer to the file. The point of this collecting-up-and-writing-later is to improve speed/efficiency, in two ways:
1. On modern OSes, there's some penalty for crossing the user/kernel protection boundary (the system has to change some protection information in the CPU, etc). If you make a large number of OS-level write calls, you pay that penalty for each one. If you collect up, say, 8192 or so individual writes into one large buffer and then make one call, you remove most of that overhead.
2. On many modern OSes, each OS write call will try to optimize file performance in some way, e.g., by discovering that you've extended a short file to a longer one, and it would be good to move the disk block from point A on the disk to point B on the disk, so that the longer data can fit contiguously. (On older OSes, this is a separate "defragmentation" step you might run manually. You can think of this as the modern OS doing dynamic, instantaneous defragmentation.) If you were to write, say, 500 bytes, and then another 200, and then 700, and so on, it will do a lot of this work; but if you make one big call with, say, 8192 bytes, the OS can allocate a large block once, and put everything there and not have to re-defragment later.
So, the folks who provide your C library and its stdio stream implementation do whatever is appropriate on your OS to find a "reasonably optimal" block size, and to collect up all output into chunks of that size. (The numbers 4096, 8192, 16384, and 65536 often, today, tend to be good ones, but it really depends on the OS, and sometimes the underlying file system as well. Note that "bigger" is not always "better": streaming data in chunks of four gigabytes at a time will probably perform worse than doing it in chunks of 64 Kbytes, for instance.)
But this creates a problem. Suppose you're writing to a file, such as a log file with date-and-time stamps and messages, and your code is going to keep writing to that file later, but right now, it wants to suspend for a while and let a log-analyzer read the current contents of the log file. One option is to use fclose to close the log file, then fopen to open it again in order to append more data later. It's more efficient, though, to push any pending log messages to the underlying OS file, but keep the file open. That's what fflush does.
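The log-file case can be sketched as a small helper that writes one message and then flushes, so the file stays open but a reader sees the message right away. This is only a minimal sketch; log_message is a hypothetical name, not something from the question.

```c
#include <stdio.h>

/* Hypothetical helper: append one message to an already-open log stream
 * and push it from the stdio buffer to the OS, keeping the stream open.
 * Returns 0 on success, -1 on failure. */
int log_message(FILE *log, const char *msg)
{
    if (fprintf(log, "%s\n", msg) < 0)
        return -1;
    /* Without this call the message may sit in the stdio buffer
     * until the buffer fills up or the stream is closed. */
    return fflush(log) == 0 ? 0 : -1;
}
```

After log_message() returns 0, another process (or a second handle in the same process) that opens the file can read the message, even though the writer has not called fclose.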
Buffering also creates another problem. Suppose your code has some bug, and it sometimes crashes but you're not sure if it's about to crash. And suppose you've written something and it's very important that this data get out to the underlying file system. You can call fflush to push the data through to the OS, before calling your potentially-bad code that might crash. (Sometimes this is good for debugging.)
Or, suppose you're on a Unix-like system, and have a fork system call. This call duplicates the entire user-space (makes a clone of the original process). The stdio buffers are in user space, so the clone has the same buffered-up-but-not-yet-written data that the original process had, at the time of the fork call. Here again, one way to solve the problem is to use fflush to push buffered data out just before doing the fork. If everything is out before the fork, there's nothing to duplicate; the fresh clone won't ever attempt to write the buffered-up data, as it no longer exists.
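A minimal POSIX sketch of the flush-before-fork idea follows. The helper name write_then_fork is made up for illustration, and it uses fflush(NULL), which flushes every open output stream, so that no stream's buffer is cloned with pending data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write msg to out, flush, then fork.
 * The child exits immediately; the parent waits for it.
 * Returns 0 in the parent on success, -1 on error. */
int write_then_fork(FILE *out, const char *msg)
{
    if (fputs(msg, out) == EOF)   /* msg now sits in the stdio buffer */
        return -1;
    if (fflush(NULL) != 0)        /* empty all output buffers before cloning */
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: exit() flushes stdio buffers, but they are empty now,
         * so msg is not written a second time. */
        exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return 0;
}
```

If the fflush call is removed, both the parent and the child hold a copy of the buffered message, and each may write it out when it flushes or exits.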
The more fflush-es you add, the more you're defeating the original idea of collecting up large chunks of data. That is, you are making a tradeoff: large chunks are more efficient, but are causing some other problem, so you make the decision: "be less efficient here, to solve a problem more important than mere efficiency". You call fflush.
Sometimes the problem is simply "debug the software". In that case, instead of repeatedly calling fflush, you can use functions like setbuf and setvbuf to alter the buffering behavior of a stdio stream. This is more convenient (fewer, or even no, code changes required—you can control the set-buffering call with a flag) than adding a lot of fflush calls, so that could be considered a "problem with use (or excessive-use) of fflush".
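As a rough sketch of the flag-controlled approach, the buffering mode can be picked once, right after opening the stream, instead of sprinkling fflush calls through the code. The helper set_buffering and the mode names "none", "line" and "full" are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose a buffering mode by name.
 * Must be called before any other operation on the stream.
 * Returns 0 on success, -1 for an unknown name or a setvbuf failure. */
int set_buffering(FILE *stream, const char *mode_name)
{
    int mode;

    if (strcmp(mode_name, "none") == 0)
        mode = _IONBF;            /* unbuffered: handy while debugging */
    else if (strcmp(mode_name, "line") == 0)
        mode = _IOLBF;            /* flush at each newline */
    else if (strcmp(mode_name, "full") == 0)
        mode = _IOFBF;            /* flush when the buffer fills */
    else
        return -1;

    /* NULL lets the library allocate the buffer itself. */
    return setvbuf(stream, NULL, mode, BUFSIZ) == 0 ? 0 : -1;
}
```

The mode name could come from a command-line flag or an environment variable, so a debug run can use "none" without any other code change.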

Well, @torek's answer is almost perfect, but one point is not quite accurate:
"The first thing to consider is that fflush is defined only on output streams."
According to man fflush, fflush can also be used on input streams:
"For output streams, fflush() forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function. For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application. The open status of the stream is unaffected."
So, when used on an input stream, fflush just discards the buffered data. (This is what the Linux man page documents; ISO C itself leaves fflush on an input stream undefined, so portable code should not rely on it.)
Here is a demo to illustrate it:
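#include <stdio.h>
#include <stdlib.h>
#define MAXLINE 1024
int main(void) {
    char buf[MAXLINE];
    printf("prompt: ");
    while (fgets(buf, MAXLINE, stdin) != NULL) {
        /* discard input that is buffered but not yet consumed
           (relies on the man page behavior quoted above) */
        fflush(stdin);
        if (fputs(buf, stdout) == EOF)
            printf("output err");
    }
    exit(0);
}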
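Since the C standard only defines fflush for output streams, a common portable way to throw away the rest of an input line is to read and drop characters up to the newline. A minimal sketch, with discard_line as a hypothetical helper name:

```c
#include <stdio.h>

/* Read and discard characters from in, up to and including the next
 * newline, or until end of file. Returns the number of characters
 * discarded. */
long discard_line(FILE *in)
{
    long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        count++;
        if (c == '\n')
            break;
    }
    return count;
}
```

Unlike fflush(stdin), this only drops the remainder of the current line, and it blocks if no newline has been typed yet, so it is not an exact replacement.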

fflush() empties the buffers related to the stream. If you e.g. let a user input some data within a very short timespan (milliseconds) and write some stuff into a file, the write and read buffers may still hold some leftover data. You then call fflush() to empty the buffers and force the pending output out, so that you can be sure the next input you get is what the user actually pressed.
reference: http://www.cplusplus.com/reference/cstdio/fflush/

Related

What is the difference between using fflush(stdout) and not using it?

#include <stdio.h>
int main()
{
    printf("Hello");
    fflush(stdout);
    return 0;
}

#include <stdio.h>
int main()
{
    printf("Hello");
    return 0;
}
I'm trying to understand the use of fflush(stdout) and what is the difference between the 2 programs above?
In a normal C program running on a modern OS, file access is buffered twice (or more when you count buffers like the buffer in your drive). One buffer is implemented in the FILE structure and the other is implemented in the kernel.
Often, the FILE structure buffers the content in a buffer inside of your program. When you write something to a buffered file, the content is kept in the buffer, inside of the running program. It is written to the OS when the buffer is full and, when the buffering mode is line buffered, at the end of a line. This data is written to the OS by a syscall, for example write().
The buffer is there because a syscall requires a context switch from the user program to the kernel, which is relatively expensive (slow); the buffer reduces the number of syscalls. You could also use the syscalls from your program directly without the stdio functions; however, these functions are less portable and more complex to handle.
A fflush(stdout) checks if there are any data in the buffer that should be written and if so, the underlying syscall is used to write the data to the OS.
When the syscall returns, the data is in your kernel. But modern operating systems buffer this data as well. This is used to reduce the number of disk writes, reduce latency and other things. This buffer is completely independent of the FILE buffer inside your program.
Note that this does not apply to all systems. For example microcontroller environments may provide some stdio.h functions that write directly to a UART, without any buffer, neither inside FILE nor any (probably non-existent) OS.
To see what fflush() does to a running program, compare these programs:
#include <stdio.h>
int main(void)
{
    fputs("s", stdout);
    fputs("e", stderr);
}
and
#include <stdio.h>
int main(void)
{
    fputs("s", stdout);
    fflush(stdout);
    fputs("e", stderr);
}
On Linux, stderr is not buffered by default, so fputs("e", stderr); prints the data immediately. stdout, on the other hand, is line buffered by default when it is connected to a terminal, so the output of fputs("s", stdout); is not printed immediately. This causes the first program to output es and not se, but the second one outputs se.
You can change the buffer modes with setvbuf().
When stdout points to a tty, it is, by default, line-buffered. This means the output is buffered inside the computer internals until a full line is received (and output).
Your programs do not send a full line to the computer internals.
In the case of using fflush() you are telling the computer internals to send the current data in the buffer to the device; without fflush() you are relying on the computer internals to do that for you at program termination.
By computer internals I mean the combination of the C library, Operating System, hardware interface, (automatic) buffers between the various interfaces, ...

Long delay hiccups for logging stdout to file

I have a C program that writes 3 lines to stdout every 10 ms. If I redirect the output to a file (using >), there are long delays (60 ms) in the running of the program. The delays are periodic (say, every 5 seconds).
If I just let it write to console or redirect to /dev/null, there is no problem.
I suspected that this is the stdout buffer problem, but using fflush(stdout) didn't solve the problem.
How can I solve the issue?
"If I redirect the output to a file (using > ) there will be long delays (60ms) in the running of the program."
That's because when stdout is a terminal device, it is usually (although not required) line-buffered, that is, the output buffer is flushed when a newline character is written, whereas in the case of regular files, output is fully buffered, meaning the buffers are flushed either when they are full or you close the file (or you explicitly call fflush(), of course).
fflush(stdout) may not be enough for you because that only flushes the standard I/O library buffers, but the kernel also buffers and delays writes to disk. You can call fsync() on the file descriptor to flush the modified buffer cache pages to disk after calling fflush(), as in fsync(STDOUT_FILENO).
Be careful not to call fsync() without calling fflush() first; otherwise data still sitting in the stdio buffer is not included.
UPDATE: You can also try sync(), which POSIX allows to return before the underlying writes have completed (Linux, however, does wait for them). Or, as suggested in another answer, fdatasync() may be a good choice because it avoids the overhead of updating file times.
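The two-step order (stdio buffer first, then the kernel cache) can be wrapped in one small POSIX helper. This is a sketch only, and flush_to_disk is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push fp's data all the way to the storage device.
 * Returns 0 on success, -1 on failure. */
int flush_to_disk(FILE *fp)
{
    if (fflush(fp) != 0)          /* step 1: stdio buffer -> kernel */
        return -1;
    if (fsync(fileno(fp)) != 0)   /* step 2: kernel cache -> device */
        return -1;
    return 0;
}
```

Note that fsync() can fail when the descriptor does not refer to something that supports synchronization, for example when stdout is a pipe or a terminal rather than a regular file, so the return value is worth checking.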
You need to use fsync. The following:
fsync(fileno(stdout));
Should help. Note that the Linux kernel will still limit I/O according to its internal scheduler limits. Running as root and setting a very low nice value might make a difference, if you're not getting the frequency you want.
If it's still too slow, try using fdatasync instead. Every fflush and fsync causes the filesystem to update inode metadata (file size, access time, etc.) as well as the actual data itself. If you know in advance how much data you'll be writing, then you can try the following trick:
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int main(void)
{
    static const char line[] = "Test\n";
    const size_t len = strlen(line);
    /* 100 lines' worth of zero bytes, used to preallocate the file */
    char fill[100 * (sizeof line - 1)] = {0};
    FILE *fp = fopen("test.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    fwrite(fill, 1, sizeof fill, fp);
    fflush(fp);
    fsync(fileno(fp));
    rewind(fp);
    for (int i = 0; i < 100; i++) {
        fwrite(line, len, 1, fp);
        fflush(fp);
        fdatasync(fileno(fp));
    }
    fclose(fp);
    return 0;
}
The first fwrite call writes 5*100 zero bytes to the file in one chunk, and the fsync makes sure they are written to disk and the inode information is updated. Now we can write up to 500 bytes to the file without touching the filesystem metadata. rewind(3) returns the file position to the beginning of the file so we can write over the data without changing the file size recorded in the inode.
Timing that program gives the following:
$ time ./fdatasync
./fdatasync 0.00s user 0.01s system 1% cpu 0.913 total
So it ran fdatasync and sync'ed to disk 100 times in 0.913 seconds, which averages out to ~9ms per write & fdatasync call.
It could just be that every 5 seconds you are filling up your disk buffer, and there is a spike in latency while it is flushed to the actual disk. Check with iostat.

Understanding Buffering in C

I am having a really hard time understanding the depths of buffering especially in C programming and I have searched for really long on this topic but haven't found something satisfying till now.
I will be a little more specific:
I do understand the concept behind it (i.e. coordinating operations by different hardware devices and minimizing the speed difference between these devices), but I would appreciate a fuller explanation of these and other potential reasons for buffering (and by full I mean full: the longer and deeper the better). It would also be really nice to get some concrete examples of how buffering is implemented in I/O streams.
My other question: I noticed that some rules of buffer flushing aren't followed by my programs, as weird as this sounds. Take the following simple fragment:
#include <stdio.h>
int main(void)
{
    FILE * fp = fopen("hallo.txt", "w");
    fputc('A', fp);
    getchar();
    fputc('A', fp);
    getchar();
    return 0;
}
The program is intended to demonstrate that pending input will flush an arbitrary stream immediately when the first getchar() is called, but this simply doesn't happen, however often I try it and with however many modifications. As for stdout (with printf(), for example), the stream is flushed without any input being requested, which also contradicts the rule. Am I understanding this rule wrongly, or is there something else to consider?
I am using Gnu GCC on Windows 8.1.
Update:
I forgot to ask that I read on some sites how people refer to e.g. string literals as buffers or even arrays as buffers; is this correct or am I missing something?
Please explain this point too.
The word buffer is used for many different things in computer science. In the more general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to the final destination (or other buffer).
As you hinted in the question there are many types of buffers, but as a broad grouping:
1. Hardware buffers: These are buffers where data is stored before being moved to a HW device. Or buffers where data is stored while being received from the HW device until it is processed by the application. This is needed because the I/O operation usually has memory and timing requirements, and these are fulfilled by the buffer. Think of DMA devices that read/write directly to memory, if the memory is not set up properly the system may crash. Or sound devices that must have sub-microsecond precision or it will work poorly.
2. Cache buffers: These are buffers where data is grouped before writing into/read from a file/device so that the performance is generally improved.
3. Helper buffers: You move data into/from such a buffer, because it is easier for your algorithm.
Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1ms for just the call plus 1us for each byte (bear with me, things are more complicated in real world). Then, if you do:
FILE *f = fopen("file.txt", "w");
for (int i = 0; i < 1000000; ++i)
    fputc('x', f);
fclose(f);
Without buffering, this code would take 1000000 * (1ms + 1us), that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us). That's just 1.1 seconds!
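To check the arithmetic of this toy model (1 ms of fixed overhead per system call plus 1 us per byte), here is a small sketch. The function name write_cost_seconds is made up for this example, and the numbers are the illustrative ones from the text, not measurements.

```c
/* Toy cost model: each system call costs 1000 us of fixed overhead,
 * plus 1 us for every byte written. Returns the total in seconds. */
double write_cost_seconds(long total_bytes, long buffer_size)
{
    /* number of system calls, rounding up for a partial last buffer */
    long calls = (total_bytes + buffer_size - 1) / buffer_size;
    double micros = calls * 1000.0 + total_bytes * 1.0;
    return micros / 1e6;
}
```

With these assumptions, write_cost_seconds(1000000, 1) gives about 1001 seconds (one call per byte), while write_cost_seconds(1000000, 10000) gives about 1.1 seconds (100 calls), so the per-call overhead dominates the unbuffered case.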
Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!
About your problem with flushing: files are usually flushed only when the buffer fills up, when they are closed, or when they are flushed manually. Some streams, such as stdout when it is connected to a terminal, are line-buffered, that is, they are flushed whenever a '\n' is written. Also, stdin/stdout are special: on many implementations, when you read from stdin then stdout is flushed. Other files are untouched, only stdout. That is handy if you are writing an interactive program.
My case #3 is for example when you do:
FILE *f = fopen("x.txt", "r");
char buffer[1000];
fgets(buffer, sizeof(buffer), f);
int n;
sscanf(buffer, "%d", &n);
You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but in other APIs there may not be the equivalent function, and moreover you have more control this way: you can analyze the type of line, skip comments, count lines...
Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.
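The accumulate-until-Enter pattern can be sketched like this. The struct line_buffer, the function line_buffer_push and the 128-byte limit are all hypothetical choices made for illustration.

```c
#include <stddef.h>

#define LINE_MAX_LEN 128   /* hypothetical capacity, including the NUL */

struct line_buffer {
    char data[LINE_MAX_LEN];
    size_t len;            /* characters accumulated so far */
};

/* Feed one character into the buffer. Returns 1 when a newline completes
 * a line (data then holds the NUL-terminated line, without the newline,
 * and len is reset for the next line), 0 otherwise. Characters beyond
 * the capacity are silently dropped. */
int line_buffer_push(struct line_buffer *lb, char c)
{
    if (c == '\n') {
        lb->data[lb->len] = '\0';
        lb->len = 0;
        return 1;
    }
    if (lb->len < LINE_MAX_LEN - 1)
        lb->data[lb->len++] = c;
    return 0;
}
```

A caller would push each received byte and, whenever the function returns 1, parse lb->data (for example with sscanf) before pushing more characters.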
The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.
One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large sized chunks, whereas programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so that you must record each whole chunk (in a buffer) or else lose that part you're not immediately ready to consume. Similar considerations apply to output.
As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.
The first question is a bit too broad. Buffering is used in many cases, including message storage before actual usage, DMA uses, speedup usages and so on. In short, the entire buffering thing can be summarized as "save my data, let me continue execution while you do something with the data".
Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.
So, please ask a more specific question. As a starting point, Wikipedia is almost always helpful.
As for the code sample, I haven't found any mention of all output buffers being flushed upon getchar. Buffers for files are generally flushed in three cases:
1. fflush() or equivalent is called.
2. The file is closed.
3. The buffer fills up.
Since none of these cases applies when your program calls getchar(), the file is not flushed at that point. (Normal termination, by returning from main or calling exit(), closes and therefore flushes all open streams; a crash or abort() need not.)
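One way to observe this is to look at the file through a second handle before and after an explicit fflush. The sketch below assumes a typical hosted implementation where a freshly opened regular file is fully buffered; visible_size is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical probe: return the size of the file at path as seen
 * through a fresh handle, or -1 on error. Data still held in another
 * stream's stdio buffer is not counted. */
long visible_size(const char *path)
{
    FILE *probe = fopen(path, "rb");
    long size = -1;

    if (probe == NULL)
        return -1;
    if (fseek(probe, 0, SEEK_END) == 0)
        size = ftell(probe);
    fclose(probe);
    return size;
}
```

On such an implementation, after fputc('A', fp) on a newly opened file the probe typically still reports 0 bytes, and it reports 1 byte once fflush(fp) has been called.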
A buffer is simply a small area of memory (RAM) that stores information before it is sent to your program. While I'm typing characters on the keyboard, they are stored in the buffer, and as soon as I press the Enter key they are passed from the buffer to your program. With the help of the buffer, all these characters become available to your program at once (which avoids lag and slowness), and the program can then send them to the output display.

Reading directly from a FILE buffer

The core of my app looks approximately as follows:
size_t bufsize;
char* buf1;
size_t r1;
FILE* f1=fopen("/path/to/file","rb");
...
do {
    r1 = fread(buf1, 1, bufsize, f1);
    processChunk(buf1, r1);
} while (!feof(f1));
...
(In reality, I have multiple FILE*'s and multiple bufN's.) Now, I hear that FILE is quite ready to manage a buffer (referred to as a "stream buffer") all by itself, and this behavior appears to be quite tweakable: https://www.gnu.org/software/libc/manual/html_mono/libc.html#Controlling-Buffering .
How can I refactor the above piece of code to ditch the buf1 buffer and use f1's internal stream buffer instead (while setting it to bufsize)?
If you don't want opaquely buffered I/O, don't use FILE *. Use lower-level APIs that let you manage all the application-side buffering yourself, such as plain POSIX open() and read() for instance.
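A rough sketch of what that could look like with plain POSIX read(): the caller owns the only buffer, and each chunk is handed to a callback. The names read_chunks and chunk_fn are made up here; the callback merely stands in for the question's processChunk().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type, standing in for processChunk(). */
typedef void (*chunk_fn)(const char *chunk, size_t len, void *ctx);

/* Read fd to end of file in chunks of at most bufsize bytes, using the
 * caller's buffer, and pass each chunk to fn (which may be NULL).
 * Returns the total number of bytes read, or -1 on error. */
long read_chunks(int fd, char *buf, size_t bufsize, chunk_fn fn, void *ctx)
{
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, bufsize);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                /* end of file */
        if (fn != NULL)
            fn(buf, (size_t)n, ctx);
        total += n;
    }
    return total;
}
```

Note that read() may return fewer bytes than requested, so the callback has to cope with short chunks.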
So I've read a little bit of the C standard and run some benchmarks and here are my findings:
1) Doing it as in the above example does involve unnecessary in-memory copying, which roughly doubles the user time of a simple cmp program based on the above example. Nevertheless, user time is insignificant for most IO-heavy programs, unless the source of the file is extremely fast.
On in-memory file sources (/dev/shm on Linux), however, turning off FILE buffering (setvbuf(f1, NULL, _IONBF, 0);) does yield a nice and consistent speed increase of about 10–15% on my machine when using buffer sizes close to BUFSIZ (again, measured with the IO-heavy cmp utility based on the above snippet, run 100 times on 2 identical 700MB files).
2) Whereas there is an API for setting the FILE buffer, I haven't found any standardized API for reading it, so I'm going to stick with the tried and tested way of doing it, but with the FILE buffer off (setvbuf(f1, NULL, _IONBF, 0);).
(But I guess I could solve my question by setting my own buffer as the FILE stream buffer with the _IONBF mode option (=turn off buffering), and then I could just access it via some unstandardized pointer in the FILE struct.)
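For reference, the "own buffer, FILE buffering off" variant might look like the sketch below. read_unbuffered is a hypothetical name, the loop tests fread's return value rather than feof, and the comment marks where the question's processChunk() would be called.

```c
#include <stdio.h>

/* Hypothetical helper: turn off f's stdio buffer, then read the stream to
 * the end in chunks of at most bufsize bytes into the caller's buffer.
 * Must be called before any other operation on f.
 * Returns the total number of bytes read, or -1 on error. */
long read_unbuffered(FILE *f, char *buf, size_t bufsize)
{
    long total = 0;
    size_t n;

    if (setvbuf(f, NULL, _IONBF, 0) != 0)
        return -1;

    while ((n = fread(buf, 1, bufsize, f)) > 0) {
        /* processChunk(buf, n) would go here */
        total += (long)n;
    }
    return ferror(f) ? -1 : total;
}
```

Checking fread's return value also ends the loop on a read error, which a feof-only test would not detect.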

what is the point of using the setvbuf() function in c?

Why would you want to set aside a block of memory in setvbuf()?
I have no clue why you would want to send your read/write stream to a buffer.
setvbuf is not intended to redirect the output to a buffer (if you want to perform IO on a buffer you use sprintf & co.), but to tightly control the buffering behavior of the given stream.
In fact, C IO functions don't immediately pass the data to be written to the operating system, but keep an intermediate buffer to avoid continuously performing (potentially expensive) system calls, waiting for the buffer to fill before actually performing the write.
The most basic case is to disable buffering altogether (useful e.g. if writing to a log file, where you want the data to go to disk immediately after each output operation) or, on the other hand, to enable block buffering on streams where it is disabled by default (or is set to line-buffering). This may be useful to enhance output performance.
Setting a specific buffer for output can be useful if you are working with a device that is known to work well with a specific buffer size; on the other side, you may want to have a small buffer to cut down on memory usage in memory-constrained environments, or to avoid losing much data in case of power loss without disabling buffering completely.
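As a small sketch of supplying a buffer of a chosen size, the helper below makes a stream fully buffered using caller-provided storage. The name use_own_buffer is hypothetical; the setvbuf call itself is standard C.

```c
#include <stdio.h>

/* Hypothetical helper: make fp fully buffered, using the caller's storage.
 * Must be called before any other operation on fp, and buf must remain
 * valid until fp is closed. Returns 0 on success, -1 on failure. */
int use_own_buffer(FILE *fp, char *buf, size_t size)
{
    return setvbuf(fp, buf, _IOFBF, size) == 0 ? 0 : -1;
}
```

A typical use is a static array, for example static char storage[256]; followed by use_own_buffer(fp, storage, sizeof storage) right after fopen. A buffer that goes out of scope before fclose leads to undefined behavior.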
In C, files opened with e.g. fopen are buffered by default. You can use setvbuf to supply your own buffer, or make the file operations completely unbuffered (as stderr is).
It can be used to create fmemopen functionality on systems that don't have that function.
The size of a file's buffer can affect standard library I/O rates. There is a table in Chapter 5 of Stevens' 'Advanced Programming in the UNIX Environment' that shows I/O throughput increasing dramatically with I/O buffer size, up to ~16K, then leveling off. A lot of other factors can influence overall I/O throughput, so this one "tuning" effect may or may not be a cure-all. This is the main reason "why", other than turning buffering off or on.
Each FILE structure has a buffer associated with it internally. The reason behind this is to reduce I/O, and real I/O operations are time costly.
All your read/write will be buffered until the buffer is full. All the data buffered will be output/input in one real I/O operation.
Why would you want to set aside a block of memory in setvbuf()?
For buffering.
I have no clue why you would want to send your read/write stream to a buffer.
Neither do I, but as that's not what it does the point is moot.
"The setvbuf() function may be used on any open stream to change its buffer" [my emphasis]. In other words, it already has a buffer, and all the function does is change that. It doesn't say anything about "sending your read/write streams to a buffer". I suggest you read the man page to see what it actually says. Especially this part:
When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered many characters are saved up and written as a block; when it is line buffered characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin).

Resources