What is the difference between using fflush(stdout) and not using it - C

#include <stdio.h>

int main()
{
    printf("Hello");
    fflush(stdout);
    return 0;
}

#include <stdio.h>

int main()
{
    printf("Hello");
    return 0;
}
I'm trying to understand the use of fflush(stdout): what is the difference between the two programs above?

In a normal C program running on a modern OS, file access is buffered at least twice (more if you count hardware buffers such as the cache in your drive). One buffer is implemented in the FILE structure and the other is implemented in the kernel.
Often, the FILE structure buffers the content inside your program: when you write something to a buffered file, the content is kept in a buffer inside the running program. It is written out to the OS when the buffer is full and, when the buffering mode is line buffered, at the end of a line. This data is written to the OS by a syscall, for example write().
The buffer exists because a syscall requires a context switch from the user program to the kernel, which is relatively expensive (slow); the buffer reduces the number of syscalls. You could also use the syscalls directly from your program without the stdio functions; however, those functions are less portable and more complex to handle.
fflush(stdout) checks whether there is any data in the buffer that should be written and, if so, uses the underlying syscall to write that data to the OS.
When the syscall returns, the data is in your kernel. But modern operating systems buffer this data as well, to reduce the number of disk writes, reduce latency, and other things. This kernel buffer is completely independent of the FILE buffer inside your program.
Note that this does not apply to all systems. For example, microcontroller environments may provide some stdio.h functions that write directly to a UART, without any buffering, neither inside FILE nor in any (probably non-existent) OS.
To see what fflush() does to a running program, compare these two programs:
#include <stdio.h>

int main(void)
{
    fputs("s", stdout);
    fputs("e", stderr);
}
and
#include <stdio.h>

int main(void)
{
    fputs("s", stdout);
    fflush(stdout);
    fputs("e", stderr);
}
On Linux, stderr is unbuffered by default, so fputs("e", stderr); prints its data immediately. On the other hand, stdout is line buffered by default on Linux, so the data from fputs("s", stdout); is not printed immediately. This causes the first program to output es and not se, while the second one outputs se.
You can change the buffer modes with setvbuf().
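For instance, a minimal sketch of setvbuf() usage (note that it must be called before the first I/O operation on the stream; the message text is just illustrative):

#include <stdio.h>

int main(void)
{
    static char buf[BUFSIZ];

    /* Fully buffered: output accumulates in buf until it fills,
       fflush() is called, or the program exits. */
    setvbuf(stdout, buf, _IOFBF, sizeof buf);
    /* _IOLBF would request line buffering, _IONBF no buffering at all. */

    printf("this text is held in buf, not written immediately\n");
    return 0;
}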

When stdout points to a tty, it is, by default, line-buffered. This means the output is buffered inside the computer internals until a full line is received (and output).
Your programs do not send a full line to the computer internals.
In the case of using fflush() you are telling the computer internals to send the current data in the buffer to the device; without fflush() you are relying on the computer internals to do that for you at program termination.
By computer internals I mean the combination of the C library, Operating System, hardware interface, (automatic) buffers between the various interfaces, ...
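A common practical case is a prompt that does not end in a newline; a minimal sketch:

#include <stdio.h>

int main(void)
{
    char name[64];

    printf("Enter your name: "); /* no newline, so it may sit in the buffer */
    fflush(stdout);              /* push the prompt out before blocking on input */

    if (fgets(name, sizeof name, stdin) != NULL)
        printf("Hello, %s", name);
    return 0;
}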

Related

Understanding Buffering in C

I am having a really hard time understanding the depths of buffering, especially in C programming, and I have searched this topic for a long time without finding anything satisfying so far.
I will be a little more specific:
I do understand the concept behind it (i.e., coordinating operations by different hardware devices and minimizing the difference in speed of these devices), but I would appreciate a fuller explanation of these and other potential reasons for buffering (and by full I mean the longer and deeper the better). It would also be really nice to see some concrete examples of how buffering is implemented in I/O streams.
The other question is that I noticed some rules of buffer flushing aren't followed by my programs, as weird as this sounds. Take the following simple fragment:
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    fputc('A', fp);
    getchar();
    fputc('A', fp);
    getchar();
    return 0;
}
The program is intended to demonstrate that impending input flushes an arbitrary stream immediately when the first getchar() is called, but this simply doesn't happen, however often I try it and whatever modifications I make. The rule also seems negated for stdout (with printf(), for example), where the stream is flushed without any input being requested. Am I understanding this rule wrongly, or is there something else to consider?
I am using GNU GCC on Windows 8.1.
Update:
I forgot to ask: I read on some sites how people refer to, e.g., string literals or even arrays as buffers; is this correct or am I missing something?
Please explain this point too.
The word buffer is used for many different things in computer science. In the more general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to the final destination (or other buffer).
As you hinted in the question there are many types of buffers, but as a broad grouping:
Hardware buffers: These are buffers where data is stored before being moved to a HW device, or where data received from a HW device is stored until it is processed by the application. They are needed because I/O operations usually have memory and timing requirements, and these are fulfilled by the buffer. Think of DMA devices that read/write directly to memory: if the memory is not set up properly, the system may crash. Or sound devices that must be fed with sub-microsecond precision or they will work poorly.
Cache buffers: These are buffers where data is grouped before writing into/read from a file/device so that the performance is generally improved.
Helper buffers: You move data into/from such a buffer, because it is easier for your algorithm.
Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1ms for just the call plus 1us for each byte (bear with me, things are more complicated in the real world). Then, if you do:
FILE *f = fopen("file.txt", "w");
for (int i = 0; i < 1000000; ++i)
    fputc('x', f);
fclose(f);
Without buffering, this code would take 1000000 * (1ms + 1us); that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us), or just about 1.1 seconds!
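You can observe the effect yourself. A rough benchmark sketch (the temporary file names are illustrative, and clock() measures processor time, so treat the numbers as approximate):

#include <stdio.h>
#include <time.h>

/* Write 1,000,000 bytes one fputc() at a time and return the elapsed
   processor time in seconds. */
static double write_test(const char *path, int mode)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1.0;
    setvbuf(f, NULL, mode, BUFSIZ); /* _IOFBF = block buffered, _IONBF = unbuffered */

    clock_t start = clock();
    for (int i = 0; i < 1000000; ++i)
        fputc('x', f);
    fclose(f);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("buffered:   %.3f s\n", write_test("buffered.tmp", _IOFBF));
    printf("unbuffered: %.3f s\n", write_test("unbuffered.tmp", _IONBF));
    return 0;
}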
Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!
About your problem with flushing: files are usually flushed only when closed or when flushed manually. Some streams, such as stdout, are line-flushed, that is, they are flushed whenever a '\n' is written. Also, stdin/stdout are special: when you read from stdin, stdout is flushed; other files are untouched, only stdout. That is handy if you are writing an interactive program.
My case #3 is for example when you do:
FILE *f = fopen("x.txt", "r");
char buffer[1000];
fgets(buffer, sizeof(buffer), f);
int n;
sscanf(buffer, "%d", &n);
You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but other APIs may not have an equivalent function, and moreover you have more control this way: you can analyze the type of line, skip comments, count lines...
Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.
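In outline, such an accumulate-until-Enter loop might look like this sketch:

#include <stdio.h>

int main(void)
{
    char line[256];
    size_t len = 0;
    int c;

    /* Collect characters in a helper buffer; parse when Enter arrives. */
    while ((c = getchar()) != EOF) {
        if (c == '\n') {
            line[len] = '\0';
            printf("got line: \"%s\"\n", line); /* parse/handle the line here */
            len = 0;
        } else if (len < sizeof line - 1) {
            line[len++] = (char)c;
        }
    }
    return 0;
}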
The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.
One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large sized chunks, whereas programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so that you must record each whole chunk (in a buffer) or else lose the part you're not immediately ready to consume. Similar considerations apply to output.
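For instance, a minimal POSIX sketch where buf is exactly such a block of memory (the file name is illustrative):

#include <stdio.h>
#include <fcntl.h>  /* open() */
#include <unistd.h> /* read(), close() */

int main(void)
{
    char buf[4096]; /* the "buffer" that read() fills */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* consume the n bytes in buf in whatever pieces the program likes */
        fwrite(buf, 1, (size_t)n, stdout);
    }
    close(fd);
    return 0;
}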
As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.
The first question is a bit too broad. Buffering is used in many cases, including storing messages before actual use, DMA, and speedups, among others. In short, the entire buffering thing can be summarized as "save my data, and let me continue execution while you do something with the data".
Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.
So, please ask a more specific question. As a point to begin, use Wikipedia; it is almost always helpful: wiki
As for the code sample, I haven't found any mention of all output buffers being flushed upon getchar. Buffers for files are generally flushed in three cases:
fflush() or an equivalent is called
The file is closed
The buffer overflows
Since none of these cases holds for you, the file is not flushed. (Note that normal program termination closes all open streams and therefore flushes them, but an abnormal termination, such as a crash, does not.)
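Applied to the fragment above, adding explicit flushes makes the behaviour predictable; a minimal sketch:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    if (fp == NULL)
        return 1;

    fputc('A', fp);
    fflush(fp); /* case 1: the first 'A' is now visible in hallo.txt */
    getchar();

    fputc('A', fp);
    fclose(fp); /* case 2: closing the file flushes the second 'A' */
    getchar();
    return 0;
}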
A buffer is a small area inside your memory (RAM) that is responsible for storing information before it is sent to your program. As long as I am typing characters on the keyboard, those characters are stored inside the buffer; as soon as I press the Enter key, they are transported from the buffer into the program. With the help of the buffer, all these characters become available to the program at once (preventing lag and slowness), and it can then send them to the output display screen.

Regarding printing characters in C

int main()
{
    printf("Hello"); // doesn't display anything on the screen
    printf("\n");    // Hello is displayed on the screen
    return 0;
}
All characters (candidates for printing) are buffered until a newline is received? Correct?
Q1 - Why does it wait to print to the terminal until a newline character arrives?
Q2 - Where are the characters of the first printf (i.e. "Hello") buffered?
Q3 - What is the flow of printing: printf() -> puts() -> putchar() -> where next? The driver? Does the driver have control to wait until \n?
Q4 - What is the role of the stdout that is attached to a process?
I'm looking for an in-depth picture. Feel free to edit the question if something doesn't make sense.
printf does not write directly to the screen; instead it writes to the output stream, which is buffered by default. The reason for this is that there may not even be a screen attached, and the output can go to a file as well. For performance reasons, it is better for a system if access to disk is buffered and then executed in one step with appropriately sized chunks, rather than writing every time.
You can even change the size of the buffer, or set it to 0, which means that all output goes directly to the target; this may be useful for logging purposes.
setbuf(stdout, NULL);
The buffer is flushed either when it is full or when certain criteria are fulfilled, such as printing a newline. So if you executed the printf in a loop, you would notice that it writes out in chunks, unless you have a newline in between.
I'll start with some definitions and then go on to answer your questions.
File: It is an ordered sequence of bytes. It can be a disk file, a stream of bytes generated by a program (such as a pipeline), a TCP/IP socket, a stream of bytes received from or sent to a peripheral device (such as the keyboard or the display) etc. The latter two are interactive files. Files are typically the principal means by which a program communicates with its environment.
Stream: It is a representation of the flow of data from one place to another, e.g., from disk to memory, memory to disk, one program to another, etc. A stream is something into which data can be put (write) or out of which data can be taken (read). Thus, it's an interface for writing data into or reading data from a file, which can be of any type as stated above. Before you can perform any operation on a file, the file must be opened. Opening a file associates it with a stream. Streams are represented by the FILE data type defined in the stdio.h header. A FILE object (it's a structure) holds all of the internal state information about the connection to the associated file, including such things as the file position indicator and buffering information. FILE objects are allocated and managed internally by the input/output library functions; you should not try to create your own objects of FILE type, the library does it for us. Programs should deal only with pointers to these objects (FILE *) rather than the objects themselves.
Buffer: Buffer is a block of memory which belongs to a stream and is used to hold stream data temporarily. When the first I/O operation occurs on a file, malloc is called and a buffer is obtained. Characters that are written to a stream are normally accumulated in the buffer (before being transmitted to the file in chunks), instead of appearing as soon as they are output by the application program. Similarly, streams retrieve input from the host environment in blocks rather than on a character-by-character basis. This is done to increase efficiency, as file and console I/O is slow in comparison to memory operations.
The C library provides three predefined text streams (FILE *) open and available for use at program start-up. These are stdin (the standard input stream, which is the normal source of input for the program), stdout (the standard output stream, which is used for normal output from the program), and stderr (the standard error stream, which is used for error messages and diagnostics issued by the program). Whether these streams are buffered or unbuffered is implementation-defined and not required by the standard.
The GNU C library provides three types of buffering: unbuffered, block buffered, and line buffered. Unbuffered means that characters appear on the destination file as soon as they are written (for an output stream), or that input is read from a file on a character-by-character basis instead of in blocks (for input streams). Block buffered means that characters are saved up in the buffer and written or read as a block. Line buffered means that characters are saved up only until a newline is written into or read from the buffer.
stdin and stdout are block buffered if and only if they can be determined not to refer to an interactive device else they are line buffered (this is true of any stream). stderr is always unbuffered by default.
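The library typically makes that determination with something like the POSIX isatty() function; a small sketch you can use to check for yourself:

#include <stdio.h>
#include <unistd.h> /* isatty(), POSIX */

int main(void)
{
    /* Report on stderr, which is itself unbuffered by default. */
    if (isatty(fileno(stdout)))
        fprintf(stderr, "stdout is a terminal: expect line buffering\n");
    else
        fprintf(stderr, "stdout is not a terminal: expect block buffering\n");
    return 0;
}

Run it directly and then with stdout redirected (e.g. ./a.out > out.txt) to see both cases.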
The standard library provides functions to alter the default behaviour of streams. You can use fflush to force the data out of the output stream buffer (fflush is undefined for input streams). You can make the stream unbuffered using the setbuf function.
Now, let's come to your questions.
Unmarked question: Yes, because stdout normally refers to a display terminal unless you have redirected output using the > operator.
Q1: It waits because stdout is line buffered when it refers to a terminal.
Q2: The characters are buffered, well, in the buffer allocated to the stdout stream.
Q3: The flow of printing is: memory --> stdout buffer --> display terminal. There are kernel buffers as well, controlled by the OS, which the data passes through before appearing on the terminal.
Q4: stdout refers to the standard output stream, which is usually a terminal.
Finally, here's some sample code to experiment with before I finish my answer.
#include <stdio.h>
#include <limits.h>

int main(void) {
    // setbuf(stdout, NULL); // make stdout unbuffered
    printf("Hello, World!"); // no newline
    // printf("Hello, World!\n"); // with a newline

    // busy loop, only for demonstrating that stdout is line buffered
    for (size_t i = 0; i < UINT_MAX; i++)
        ; // null statement
    printf("\n"); // flush the buffer
    return 0;
}
Yes, by default, standard output is line buffered when it's connected to a terminal. The buffer is managed by the C library; normally you don't have to worry about it.
You can change this behavior using setbuf() or setvbuf(), for example, to change it to no buffer:
setbuf(stdout, NULL);
The functions printf, puts, and putchar all write to standard output, so they share the same buffer.
If you wish, you can flush out the characters before the new line by calling
fflush(stdout);
This can be handy if you're slowly printing something like a progress bar where each character gets printed without a newline.
int main()
{
    printf("Hello"); // Doesn't display anything on the screen
    fflush(stdout);  // Now, Hello appears on the screen
    printf("\n");    // The new line gets printed
    return 0;
}
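For example, a progress-bar sketch along those lines (sleep() is POSIX and only simulates work being done):

#include <stdio.h>
#include <unistd.h> /* sleep(), POSIX */

int main(void)
{
    for (int i = 0; i <= 10; i++) {
        printf("\rProgress: %3d%%", i * 10); /* '\r' rewrites the line, no newline */
        fflush(stdout);                      /* make each update visible immediately */
        sleep(1);                            /* pretend to do some work */
    }
    printf("\n");
    return 0;
}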

Understanding the need for fflush() and problems associated with it

Below is sample code for using fflush():
#include <string.h>
#include <stdio.h>
#include <conio.h>
#include <io.h>

void flush(FILE *stream);

int main(void)
{
    FILE *stream;
    char msg[] = "This is a test";

    /* create a file */
    stream = fopen("DUMMY.FIL", "w");

    /* write some data to the file */
    fwrite(msg, strlen(msg), 1, stream);
    clrscr();
    printf("Press any key to flush DUMMY.FIL:");
    getch();

    /* flush the data to DUMMY.FIL without closing it */
    flush(stream);
    printf("\nFile was flushed, Press any key to quit:");
    getch();
    return 0;
}

void flush(FILE *stream)
{
    int duphandle;

    /* flush the stream's internal buffer */
    fflush(stream);

    /* make a duplicate file handle */
    duphandle = dup(fileno(stream));

    /* close the duplicate handle to flush the DOS buffer */
    close(duphandle);
}
All I know about fflush() is that it is a library function used to flush an output buffer. I want to know what is the basic purpose of using fflush(), and where can I use it. And mainly I am interested in knowing what problems can there be with using fflush().
It's a little hard to say what "can be problems with" (excessive?) use of fflush. All kinds of things can be, or become, problems, depending on your goals and approaches. Probably a better way to look at this is what the intent of fflush is.
The first thing to consider is that fflush is defined only on output streams. An output stream collects "things to write to a file" into a large(ish) buffer, and then writes that buffer to the file. The point of this collecting-up-and-writing-later is to improve speed/efficiency, in two ways:
On modern OSes, there's some penalty for crossing the user/kernel protection boundary (the system has to change some protection information in the CPU, etc). If you make a large number of OS-level write calls, you pay that penalty for each one. If you collect up, say, 8192 or so individual writes into one large buffer and then make one call, you remove most of that overhead.
On many modern OSes, each OS write call will try to optimize file performance in some way, e.g., by discovering that you've extended a short file to a longer one, and it would be good to move the disk block from point A on the disk to point B on the disk, so that the longer data can fit contiguously. (On older OSes, this is a separate "defragmentation" step you might run manually. You can think of this as the modern OS doing dynamic, instantaneous defragmentation.) If you were to write, say, 500 bytes, and then another 200, and then 700, and so on, it will do a lot of this work; but if you make one big call with, say, 8192 bytes, the OS can allocate a large block once, and put everything there and not have to re-defragment later.
So, the folks who provide your C library and its stdio stream implementation do whatever is appropriate on your OS to find a "reasonably optimal" block size, and to collect up all output into chunk of that size. (The numbers 4096, 8192, 16384, and 65536 often, today, tend to be good ones, but it really depends on the OS, and sometimes the underlying file system as well. Note that "bigger" is not always "better": streaming data in chunks of four gigabytes at a time will probably perform worse than doing it in chunks of 64 Kbytes, for instance.)
But this creates a problem. Suppose you're writing to a file, such as a log file with date-and-time stamps and messages, and your code is going to keep writing to that file later, but right now, it wants to suspend for a while and let a log-analyzer read the current contents of the log file. One option is to use fclose to close the log file, then fopen to open it again in order to append more data later. It's more efficient, though, to push any pending log messages to the underlying OS file, but keep the file open. That's what fflush does.
Buffering also creates another problem. Suppose your code has some bug, and it sometimes crashes but you're not sure if it's about to crash. And suppose you've written something and it's very important that this data get out to the underlying file system. You can call fflush to push the data through to the OS, before calling your potentially-bad code that might crash. (Sometimes this is good for debugging.)
Or, suppose you're on a Unix-like system, and have a fork system call. This call duplicates the entire user-space (makes a clone of the original process). The stdio buffers are in user space, so the clone has the same buffered-up-but-not-yet-written data that the original process had, at the time of the fork call. Here again, one way to solve the problem is to use fflush to push buffered data out just before doing the fork. If everything is out before the fork, there's nothing to duplicate; the fresh clone won't ever attempt to write the buffered-up data, as it no longer exists.
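A minimal POSIX sketch of that fork scenario (remove the fflush line and redirect output to a file to watch the buffered text get duplicated):

#include <stdio.h>
#include <unistd.h> /* fork(), POSIX */

int main(void)
{
    printf("logged before fork"); /* no newline: sits in the stdio buffer */
    fflush(stdout); /* without this, both parent and child inherit and
                       eventually write the same buffered text */

    if (fork() == 0)
        printf(" - child\n");
    else
        printf(" - parent\n");
    return 0;
}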
The more fflush-es you add, the more you're defeating the original idea of collecting up large chunks of data. That is, you are making a tradeoff: large chunks are more efficient, but are causing some other problem, so you make the decision: "be less efficient here, to solve a problem more important than mere efficiency". You call fflush.
Sometimes the problem is simply "debug the software". In that case, instead of repeatedly calling fflush, you can use functions like setbuf and setvbuf to alter the buffering behavior of a stdio stream. This is more convenient (fewer, or even no, code changes required—you can control the set-buffering call with a flag) than adding a lot of fflush calls, so that could be considered a "problem with use (or excessive-use) of fflush".
Well, @torek's answer is almost perfect, but there's one point which is not so accurate.
The first thing to consider is that fflush is defined only on output streams.
According to man fflush, fflush can also be used on input streams:
For output streams, fflush() forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function. For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application. The open status of the stream is unaffected.
So, when used on an input stream, fflush just discards the buffered data.
Here is a demo to illustrate it:
#include <stdio.h>
#include <stdlib.h>

#define MAXLINE 1024

int main(void) {
    char buf[MAXLINE];

    printf("prompt: ");
    fflush(stdout); /* make the prompt visible before blocking on input */

    while (fgets(buf, MAXLINE, stdin) != NULL) {
        fflush(stdin); /* discard any input still buffered (glibc/POSIX behaviour) */
        if (fputs(buf, stdout) == EOF)
            printf("output err");
    }
    exit(0);
}
fflush() empties the buffers related to the stream. If you, for example, let a user input some data in a very short timespan (milliseconds) and also write some stuff into a file, the write and read buffers may have some leftover data remaining in them. You then call fflush() to empty all the buffers and force the standard outputs out, to be sure that the next input you get is what the user actually pressed.
reference: http://www.cplusplus.com/reference/cstdio/fflush/

Write system call writes data to disk directly?

I've read a couple of questions (here) related to this, but I still have some confusion.
My understanding is that the write system call puts the data into the Buffered Cache (the OS caches referred to in that question). When the Buffered Cache gets full, it is written to the disk.
Buffered I/O is a further optimization on top of this. It caches data in the C RTL buffers, and when they get full, a write system call is issued to move the contents to the Buffered Cache. If I use fflush, then the data related to this particular file that is present in the C RTL buffers, as well as in the Buffered Cache, is sent to the disk.
Is my understanding correct?
How the stdio buffers are flushed depends on the standard C library you use. To quote from the Linux manual page:
Note that fflush() only flushes the user-space buffers provided by the C library. To ensure that the data is physically stored on disk the kernel buffers must be flushed too, for example, with sync(2) or fsync(2).
This means that on a Linux system, using fflush or overflowing the buffer will call the write function. But the operating system may keep internal buffers, and not actually write the data to the device. To make sure the data is truly written to the device, use both fflush and the low-level fsync.
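Putting the two layers together, a minimal POSIX sketch (the file name and record text are illustrative):

#include <stdio.h>
#include <unistd.h> /* fsync(), POSIX */

int main(void)
{
    FILE *fp = fopen("journal.log", "a");
    if (fp == NULL)
        return 1;

    fprintf(fp, "important record\n");
    fflush(fp);        /* stdio buffer -> kernel page cache */
    fsync(fileno(fp)); /* kernel page cache -> storage device */

    fclose(fp);
    return 0;
}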

How does scanf() work inside the OS?

I've been wondering how scanf()/printf() actually works in the hardware and OS levels. Where does the data flow and what exactly is the OS doing around these times? What calls does the OS make? And so on...
scanf() and printf() are functions in libc (the C standard library). They call the read() and write() operating system syscalls respectively, talking to the stdin and stdout file descriptors (fscanf and fprintf allow you to specify the file stream you want to read from or write to).
Calls to read() and write() (and all syscalls) result in a 'context switch' out of your user-level application into kernel mode, which means it can perform privileged operations, such as talking directly to hardware. Depending on how you started the application, the 'stdin' and 'stdout' file descriptors are probably bound to a console device (such as tty0), or some sort of virtual console device (like that exposed by an xterm). read() and write() safely copy the data to/from a kernel buffer called a 'uio'.
The format-string conversion part of scanf and printf does not occur in kernel mode, but in ordinary user mode (inside libc). The general rule of thumb with syscalls is to switch to kernel mode as infrequently as possible, both to avoid the performance overhead of context switching and for security (you need to be very careful about anything that happens in kernel mode! Less code in kernel mode means fewer bugs/security holes in the operating system).
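To make the split concrete, here is the unbuffered write() syscall placed next to the user-space formatting it is normally paired with (a POSIX sketch; the message is illustrative):

#include <stdio.h>
#include <unistd.h> /* write(), POSIX */

int main(void)
{
    /* Formatting happens entirely in user space... */
    char msg[64];
    int len = snprintf(msg, sizeof msg, "answer = %d\n", 42);

    /* ...and only the finished bytes cross into the kernel, which is
       roughly what printf("answer = %d\n", 42) boils down to. */
    write(STDOUT_FILENO, msg, (size_t)len);
    return 0;
}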
By the way, all of this was written from a Unix perspective; I don't know how MS Windows works.
On the OS I am working with, scanf and printf are based on the functions getch() and putch().
I think the OS just provides two streams, one for input and the other for output; the streams abstract away how the output data gets presented or where the input data comes from.
So what scanf and printf do is just add bytes to (or consume bytes from) either stream.
scanf, printf, and similar functions can't be written entirely in C/C++; internally they rely on assembly language, via the keyword "asm". Anything written with the "asm" keyword is introduced directly into the object file irrespective of compilation (it is not changed even after compilation), and in assembly there are predefined codes which can implement all these functions. So, in short, scanf, printf, etc. are backed by assembly language internally, and you can design your own input function using the "asm" keyword.
