C - implications of fflush(stdout)

Does fflush(stdout) do anything besides flushing the output buffer?
Or what does flushing the output buffer imply?
Because in a scheduler, I just resolved a segfault by adding an fflush(stdout) to the context switch. For debugging purposes, all writes to stdout had been disabled, which, as far as I'm concerned, should have made any kind of flushing a no-op.

For output streams, fflush() with a non-null argument writes any unwritten data from the stream's buffer to the associated output device. How it does this is implementation-dependent.
Because in a scheduler, I just resolved a segfault by throwing an fflush(stdout) into the context switch
You can close stdout explicitly at the start of the program to verify your findings. Chances are that the problem lies somewhere else, or that the stream implementation on your system is buggy.
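For example, a minimal sketch of that verification (the surrounding scheduler code is assumed and elided):

#include <stdio.h>

int main(void)
{
    /* if writes to stdout are truly disabled, closing the stream up
       front should change nothing; if the crash behavior changes,
       something is still touching stdout */
    fclose(stdout);

    /* ... rest of the program ... */
    return 0;
}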

Without seeing actual code, we can only speculate.
However, adding more code to a function can have the indirect effect of changing the memory layout of your program. The nature of the change, if any, depends on what the function does (does it allocate memory, declare lots of variables, etc.), how the operating system manages the executable code in memory while running it, and so on.
Odds are, somewhere else in your code there is an invalid operation (an invalid pointer access, for example), and the effect of the additional statement is simply to change the symptom, by changing what your program does with the affected memory.
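To illustrate, here is a contrived sketch (hypothetical, not from the question) of how an out-of-bounds write can produce symptoms that shift as unrelated code changes:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];
    int important = 42;

    /* out-of-bounds write: undefined behavior */
    strcpy(buf, "far more than eight bytes");

    /* depending on the memory layout the compiler chose, this may
       print garbage, crash, or appear to work; adding or removing
       unrelated code can change which symptom you see */
    printf("%d\n", important);
    return 0;
}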
I suppose it is remotely possible that there is a bug in fflush(). But I wouldn't bet on it: standard I/O functions like fflush() are used by a lot of people, and bugs in such functions in a library that has existed for a while (e.g. from a vendor that has released several versions) are likely to have been found, reported, and fixed long ago.

Related

Buffering expectations using `printf`

Say there exists a C program that executes in some Linux process. Upon start, the C program calls setvbuf to disable buffering on stdout. The program then alternates between two "logical" calls ("logical" here to set aside the possibility of the compiler reordering instructions): the first invokes printf(), and the second increments a variable.
#include <stdio.h>

int main(int argc, char **argv)
{
    setvbuf(stdout, NULL, _IONBF, 0);   /* disable stdout buffering */
    unsigned int a = 0;
    for (;;) {
        printf("hello world!");
        a++;
    }
}
At some point assume the program receives a signal, e.g. via kill, that causes the program to terminate. Will the contents of stdout always be complete after the signal is received, in the sense that they include the result of all previous invocations to printf(), or is this dependent on other levels of buffering/other behavior not controllable via setvbuf (e.g. kernel buffering)?
The broader context of this question is, if using a synchronous logging mechanism in a C application (e.g. all threads log with printf()), can the log be trusted to be "complete" for all calls that have returned from printf() upon receiving some application-terminating signal?
Edit: I've edited the code snippet and question to remove undefined behavior for clarity.
Any sane interpretation of the expression "unbuffered stream" means that the data has left the stream object when printf returns. In the case of file-descriptor backed streams, that means the data has entered kernel-space, and the kernel should continue sending the data to its final destination (assuming no kernel panic, power loss etc).
But a problem with segfaults is that they may not happen when you think they do. Take for instance the following code:
int *p = NULL;
printf("hello world\n");
*p = 1;
A dumb non-optimizing compiler may create code that segfaults at *p = 1;. But that is not the only possibility according to the C standard. A compiler may, for instance, if it can prove that printf doesn't depend on the contents of *p, reorganize the code like this:
int *p = NULL;
*p = 1;
printf("hello world\n");
In that case printf would never be called.
Another possibility is that, since p == NULL makes *p = 1 invalid, the compiler may scrap that expression altogether.
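If the goal is just to get the message into the kernel before the faulting statement, a direct write(2) call is one way to sidestep stdio buffering; a POSIX-only sketch:

#include <unistd.h>

int main(void)
{
    int *p = NULL;
    /* write(2) is a system call: once it returns, the data is in the
       kernel and survives a subsequent crash of this process */
    write(STDOUT_FILENO, "about to crash\n", 15);
    *p = 1;   /* likely SIGSEGV; formally undefined behavior */
    return 0;
}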
EDIT: The poster has changed the question from segfaulting to being killed. In that case, it should all depend on whether the kernel closes open file descriptors on exit the same way close does.
Given a construct like:
fprintf(file1, "whatever"); fflush(file1);
file2 = fopen(someExistingFile, "w");
there are some circumstances where it may be essential that fopen not overwrite the existing file unless or until the write to file1 is guaranteed to have succeeded. In others, waiting until the success of the fflush can be assured before starting the fopen would needlessly degrade performance. The Standard leaves this open so that designers of C implementations can weigh such considerations however they see fit, and so that implementations need not provide semantic guarantees beyond those offered by the underlying OS. For example, if an OS reports that an fflush() is complete before the data is written to disk, and offers no way of finding out when all pending writes are complete, there would be no way the Standard could usefully require that an implementation targeting that OS must not allow fflush to return at any time when the write could still fail.
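On a POSIX system, code that needs the stronger guarantee can ask for it explicitly instead of relying on fflush alone. A sketch, reusing the placeholders from the construct above (safe_switch is a made-up wrapper; error handling elided):

#include <stdio.h>
#include <unistd.h>   /* fsync and fileno are POSIX */

void safe_switch(FILE *file1, const char *someExistingFile)
{
    FILE *file2;
    fprintf(file1, "whatever");
    fflush(file1);            /* stdio buffer -> kernel */
    fsync(fileno(file1));     /* kernel buffers -> storage */
    file2 = fopen(someExistingFile, "w");  /* existing file truncated only now */
    (void)file2;
}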
So, it appears that there's a basic misunderstanding in your question, and I think it's important to go through the basics of what printf is. If your stdout buffer size is 0, then the answer to "will all data be sent out of the buffer" is always yes, since in theory there is no software buffer holding the data. That said, somewhere in your computer's hardware there is something like a UART chip with a small buffer of its own for transferring data. Most programs I've seen do not manage this hardware buffer directly, so it's not surprising that your program behaves this way.
However, the printf function has an upper-layer buffer (in my application, ~150 characters), and I'm assuming this is the buffer you're asking about. Note that this is not the same thing as the stdout buffer; it's just an allocated piece of memory that stores messages before they're sent wherever you want them to go. Think about it: if there were no printf-specific buffer, you would only be able to send one character per function call.
Now it really depends on whether the implementation of printf on your system is blocking or non-blocking. If it's non-blocking, data may be transferred by an interrupt or a DMA, probably a combination of both, in which case it depends on whether your system stops these transfer mechanisms in the middle of a transfer or allows them to complete. It's impossible for me to say based on the information you've given.
However, in my experience printf is usually a blocking function; that is, it holds up the rest of your code while it's transferring things out of the buffer and moves on to the next statement only once the transfer has completed. In that case, if you have stopped the code from running (again, I'm not certain of the specifics of "kill" on your system), then you have also stopped the transfer.
Your system most likely has blocking printf calls. Since you mention a "kill" signal without further specifics, it's safe to assume that whatever signal you're talking about is not internally stopping your printf function from completing, so your full message will probably be sent before exiting, even if the signal arrives mid-printf. If your printf is being called, it most likely completes and sends the full message, unless this "kill" signal does something odd. That's the best answer I can give from a C standpoint; for a more definitive answer, you would have to share information about the implementation of printf on your operating system, and/or give more specifics on how this "kill" signal works.

Is it possible to implement printf() in a non-blocking manner in a Linux terminal

I have a printf statement after a sleep of 1 second. Since the printf statement takes longer than 1 second, the refresh rate is more than 2 seconds. Here is an example of what I am talking about:
while (1) {
    printf("%s", buf);  /* takes more than one second to print a table;
                           only a few values are updated */
    sleep(1);
}
How can I make printf non-blocking? Is there a way on a standard Linux machine?
If you only care about what shows on the screen, that is one of the problems that curses addresses. Using curses, you could update the display using reasonably optimal output (only the changed areas would be updated rather than printing the whole table each time), and with the typeahead feature, you can alleviate the problem of falling behind if the updates are too rapid.
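A minimal sketch of that approach (assuming ncurses is installed; link with -lcurses):

#include <curses.h>
#include <unistd.h>

int main(void)
{
    initscr();                          /* enter curses mode */
    for (int i = 0; i < 60; i++) {
        mvprintw(0, 0, "tick %d", i);   /* redraw only the changed cell */
        refresh();                      /* push pending changes to the terminal */
        sleep(1);
    }
    endwin();                           /* restore the terminal */
    return 0;
}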
It's more complicated than just printf. With plain printf, the buffer will fill up, and there is nowhere to put the data except the standard output. In some implementations you could use setvbuf to assign a new output buffer, but POSIX frowns on doing that after output has started, saying:
The setvbuf() function may be used after the stream pointed to by stream is associated with an open file but before any other operation (other than an unsuccessful call to setvbuf()) is performed on the stream.
Because of this, ncurses has treated setvbuf (and similar functions such as setbuf) with caution. In the current release, to solve other problems, ncurses no longer uses this function. But it is still documented:
ncurses enabled buffered output during terminal initialization. This was done (as in SVr4 curses) for performance reasons. For testing purposes, both of ncurses and certain applications, this feature was made optional. Setting the NCURSES_NO_SETBUF variable disabled output buffering, leaving the output in the original (usually line buffered) mode.
The printf function is buffered. It only flushes to stdout when the buffer is full or when you force it; if stdout is line buffered, printing a \n will flush the buffer. What you can do is use the fflush function on stdout to force it.
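Applied to the loop from the question (buf is the question's own placeholder), that looks like:

while (1) {
    printf("%s", buf);   /* no newline, so nothing forces a flush */
    fflush(stdout);      /* push the buffered table out now */
    sleep(1);
}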

How are files written? Why do I not see my data written immediately?

I understand the general process of writing to and reading from a file, but I was curious about what is happening under the hood during file writing. For instance, I have written a program that writes a series of numbers, line by line, to a .txt file. One thing that bothers me, however, is that I don't see the information written until after my C program is finished running. Is there a way to see the information written while the program is running rather than after? Is this even possible? This is a hard question to phrase in one line, so please forgive me if it's already been answered elsewhere.
The reason I ask this is because I'm writing to a file and was hoping that I could scan the file for the highest and lowest values (the program would optimally be able to run for hours).
Research buffering and caching.
There are a number of layers of optimisation performed by:
your application,
your OS, and
your disk driver,
in order to extend the life of your disk and increase performance.
With the careful use of flushing commands, you can generally make things happen "quite quickly" when you really need them to, though you should generally do so sparingly.
Flushing can be particularly useful when debugging.
The GNU C Library documentation has a good page on the subject of file flushing, listing functions such as fflush which may do what you want.
You observe an effect solely caused by the C standard I/O (stdio) buffers. I claim that any OS or disk driver buffering has nothing to do with it.
In stdio, I/O happens in one of three modes:
Fully buffered: data is written once BUFSIZ (from <stdio.h>) characters have accumulated. This is the default when I/O is redirected to a file or pipe, and it is what you observe. Typically BUFSIZ is anywhere from 1 kByte to several kBytes.
Line buffered: data is written once a newline is seen (or BUFSIZ is reached). This is the default when I/O goes to a terminal.
Unbuffered: data is written immediately.
You can use the setvbuf() (<stdio.h>) function to change the default, using the _IOFBF, _IOLBF or _IONBF macros, respectively. See your friendly setvbuf man page.
In your case, you can set your output stream (stdout or the FILE * returned by fopen) to line buffered.
Alternatively, you can call fflush() on the output stream whenever you want I/O to happen, regardless of buffering.
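For example, a sketch that line-buffers a log file so each completed line reaches the file promptly (numbers.txt is a hypothetical name; error handling elided):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("numbers.txt", "w");
    setvbuf(f, NULL, _IOLBF, 0);   /* line buffered: flush at each '\n' */

    for (int i = 0; i < 100; i++)
        fprintf(f, "%d\n", i);     /* visible in the file as the program runs */

    fclose(f);
    return 0;
}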
Indeed, there are several layers between the write calls and the actual file.
First, you open the file for writing, which causes the file to be either created or truncated. When you then write, the write doesn't actually occur immediately; the data is cached until the buffer is full or the file is flushed or closed.
You can call fflush() after writing each portion of data, or you can simply wait until the file is closed.
Yes, it is possible to see what's written to the file(s). If you are on Linux, you can open a new terminal and watch the progress with, for example, "less Filename".

flush without sync

From what I've read, flush pushes data into the OS buffers and sync makes sure that data goes down to the storage media. So, if you want to be sure that data is actually written to disk, you need to do a flush followed by a sync. So, are there any cases where you want to call flush but not sync?
You only want to fflush if you're using stdio's FILE *. This writes a user space buffer to the kernel.
The other answers seem to be missing fdatasync. This is the system call you want for flushing a specific file descriptor to disk.
When you fflush, you flush the stdio buffer of one stream to the kernel (unless you pass NULL, in which case it flushes all open output streams). http://www.manpagez.com/man/3/fflush/
When you sync, you flush all the kernel buffers to disk. http://www.manpagez.com/man/2/sync/
The most important thing you should notice is that fflush is a standard C function, while sync is a system call provided by the operating system (Linux, for example).
So basically, if you are writing a portable program, you never actually use sync.
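A sketch of the full chain for one specific file, as suggested above (fp is assumed to be a FILE * opened for writing; fileno and fdatasync are POSIX):

#include <stdio.h>
#include <unistd.h>

void flush_to_disk(FILE *fp)
{
    fflush(fp);               /* stdio buffer -> kernel */
    fdatasync(fileno(fp));    /* file data -> storage, skipping some
                                 metadata updates that fsync would force */
}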
Yes, lots. Most programs most of the time would not bother to call any of the various sync operations; flushing the data into the kernel buffer pool as you close the file is sufficient. This is doubly true if you're using a journalled file system.
Note that flushing is a higher-level operation than write() or similar system calls. It is used by the C <stdio.h> library or the C++ <iostream> library. The system calls inherently deliver the data to the kernel buffer pool (or direct to disk if you're using direct I/O or something similar).
Note, too, that on POSIX-like systems, you can arrange for data sync etc by setting flags on the open() system call (O_SYNC, O_DSYNC, O_RSYNC), or subsequently via fcntl().
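For example, a POSIX sketch using O_SYNC (log.txt is a hypothetical path; error handling elided):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* every write() returns only after the data is on stable storage,
       so no separate fsync() is needed */
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    write(fd, "entry\n", 6);
    close(fd);
    return 0;
}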
Just to clarify, fflush() applies only when using the stdio FILE interface, which buffers writes at the application level. If the raw write() call is used directly, fflush() makes little sense.
Having said that, I can think of two situations where you would like to call fflush() but not sync:
You want to make sure that the data will eventually make it to disk even if the application crashes.
You want to force the data that the application has written to standard output so far onto the screen.
The second case is the most common use I have seen and it is usually required if the printf() call does not end with a new line character ('\n').
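A typical sketch of that second case (done and total are hypothetical counters):

#include <stdio.h>

void show_progress(int done, int total)
{
    /* no trailing '\n', so a line-buffered stdout would otherwise
       show nothing until the line is completed */
    printf("\rprocessed %d of %d", done, total);
    fflush(stdout);   /* to the terminal now; no sync needed or wanted */
}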

refresh stream buffer while reading /proc

I'm reading from /proc/pid/task/stat to keep track of cpu usage in a thread.
fopen on /proc/pid/task/stat
fgets a string from the stream
sscanf on the string
I am having issues, however, getting the stream's buffer to update.
If I fgets 1024 characters it refreshes, but if I fgets 128 characters it never updates and I always get the same stats.
I rewind the stream before the read and have tried fsync.
I do this very frequently, so I'd rather not reopen the file each time.
What is the right way to do this?
Not every program benefits from the use of buffered I/O.
In your case, I think I would just use read(2) [1]. This way, you:
eliminate all stale buffer [2] issues
probably run faster via the elimination of double buffering
probably use less memory
definitely simplify the implementation
For a case like the one you describe, the efficiency gain may not matter on today's remarkably powerful CPUs. But I will point out that programs like cp(1) and other heavy-duty data movers don't use buffered I/O packages.
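A sketch of that unbuffered approach (using /proc/self/stat as a stand-in path; error handling mostly elided):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[1024];
    int fd = open("/proc/self/stat", O_RDONLY);
    if (fd < 0)
        return 1;

    for (int i = 0; i < 10; i++) {
        lseek(fd, 0, SEEK_SET);                     /* rewind the descriptor */
        ssize_t n = read(fd, buf, sizeof buf - 1);  /* fresh data every time */
        if (n <= 0)
            break;
        buf[n] = '\0';
        /* sscanf(buf, ...) as before */
        sleep(1);
    }
    close(fd);
    return 0;
}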
[1] That is, open(2), read(2), lseek(2), and close(2).
[2] And perhaps to intercept an argument: on questions related to this one, someone usually offers a "helpful" suggestion along the lines of fflush(stdin), and then someone else comes along to accurately point out that fflush() is defined by C99 on output streams only, and that it's usually unwise to depend on implementation-specific behavior.
