One way to write to a file is with fprintf(). However, this function does not write the results to the file immediately; it seems instead to write everything at once when the program terminates or finishes.
My question is the following: I have a program that takes a very long time to run (4-5 hours for a big dataset). During this time, I want to see the intermediate results so that I don't have to wait for 5 hours. My university uses Sun Grid Engine for job submission. As most of you know, you have to wait until your job finishes to see your final results. Thus, I want to be able to write the intermediate results to a text file and see the updated results as the program is processing (similar to what I would see if I were using printf).
How can I modify fprintf() to write anything I want immediately to the target file?
You can use the fflush function after each write to flush the output buffer to the operating system.
fprintf(fileptr, "writing to file\n");
fflush(fileptr);
If you're on a POSIX system (i.e. Linux, BSD, etc), and you really want to be sure the file is written to disk, i.e. you want to flush the kernel buffers as well as the userspace buffers, also use fsync:
fsync(fileno(fileptr));
But fflush should be sufficient. Don't bother with fsync unless you find that you need to.
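For the long-running-job case in the question, a minimal sketch might look like this (results.txt and the loop body are invented for illustration):

#include <stdio.h>

int main(void)
{
    FILE *fileptr = fopen("results.txt", "w");   /* invented file name */
    if (fileptr == NULL)
        return 1;

    for (long i = 0; i < 1000000; i++) {
        /* ... long-running computation producing an intermediate result ... */
        fprintf(fileptr, "intermediate result %ld\n", i);
        fflush(fileptr);   /* push the stdio buffer to the OS so other
                              processes can see the line right away */
    }

    fclose(fileptr);
    return 0;
}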
Alternatively, you can put the FILE pointer into unbuffered (_IONBF) mode. Then you don't need to use fflush or fsync at all.
FILE *pFilePointer = fopen(...);
setvbuf(pFilePointer, NULL, _IONBF, 0);  /* switch the stream to unbuffered mode */
fprintf(...)
fprintf(...)
fflush
This works on a FILE *. For your case it looks more appropriate. Please note that fflush(NULL) flushes all open streams and may be CPU intensive, so you may want to avoid fflush(NULL) for performance reasons.
fsync
This works on an int file descriptor. It updates not only the file data but also its metadata, so it can protect the data even across a system crash or reboot. You can check the man page for more details.
Personally I use fflush, and it works fine for me (in Ubuntu / Linux).
On a Linux (Ubuntu Platform) device I use a file to save mission critical data.
From time to time (once in about 10,000 cases), the file gets corrupted for unspecified reasons.
In particular, the file is truncated (instead of several kilobytes it contains only about 100 bytes).
Now, in the sequence of the software:
1) the file is opened,
2) modified, and
3) closed.
Immediately after that, the file might be opened again (4), and something else is being done.
Until now I was not aware that fflush (which is called upon fclose) doesn't write to the file system, but only to an intermediate buffer. Could it be that the time between 3) and 4) is too short and the change from 2) is not yet written to disk, so when I reopen at 4) I get a truncated file which, when it is closed again, leads to permanent loss of that data?
Should I use fsync() in that case after each file write?
What do I have to consider for power outages? It is not unlikely that the data corruption is related to power down.
fwrite writes to an internal buffer first, and then sometimes (at fflush or fclose, or when the buffer is full) calls the OS function write.
The OS also does some buffering, and writes to the device might get delayed.
fsync ensures that the OS writes its buffers to the device.
In your case where you open-write-close you don't need to fsync. The OS knows which parts of the file are not yet written to the device. So if a second process wants to read the file the OS knows that it has the file content in memory and will not read the file content from the device.
Of course when thinking about power outage it might (depending on the circumstances) be a good idea to fsync to be sure that the file content is written to the device (which as Andrew points out, does not necessarily mean that the content is written to disc, because the device itself might do buffering).
Until now I was not aware that fflush (which is called upon fclose) doesn't write to the file system, but only to an intermediate buffer. Could it be that the time between 3) and 4) is too short and the change from 2) is not yet written to disk, so when I reopen at 4) I get a truncated file which, when it is closed again, leads to permanent loss of that data?
No. A system that behaved that way would be unusable.
Should I use fsync() in that case after each file write?
No, that will just slow things down.
What do I have to consider for power outages? It is not unlikely that the data corruption is related to power down.
Use a filesystem that's resistant to such corruption. Possibly even consider using a safer modification algorithm such as writing out a new version of the file with a different name, syncing, and then renaming it on top of the existing file.
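A minimal sketch of that pattern, assuming POSIX; the file names and the save_file helper are invented for illustration, and error handling is abbreviated:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* hypothetical helper: write a new version, sync it, then atomically
   replace the old file */
int save_file(const char *path, const char *tmppath, const char *data)
{
    FILE *f = fopen(tmppath, "w");
    if (f == NULL)
        return -1;
    fputs(data, f);
    fflush(f);                /* stdio buffer -> kernel */
    fsync(fileno(f));         /* kernel buffer -> device */
    fclose(f);

    if (rename(tmppath, path) != 0)   /* atomic replacement */
        return -1;

    /* on some filesystems it is also advisable to sync the containing
       directory so the rename itself survives a crash */
    int dirfd = open(".", O_RDONLY);
    if (dirfd >= 0) {
        fsync(dirfd);
        close(dirfd);
    }
    return 0;
}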
If what you're doing is something like this:
FILE *f = fopen("filename", "w");
while(...) {
    fwrite(data, n, m, f);
}
fclose(f);
Then what can happen is that another process opens the file while it's being written (between the open and write system calls that the C library makes behind the scenes, or between separate write calls). That process would then see only a partially written file.
The workaround to that is to write the file with another name, and rename() it over the actual filename. The downside is that you need double the amount of space.
If you are sure the opening of the file happens only after the write, then that cannot happen. But then there has to be some synchronization between the writer and the reader so that the latter does not start reading too early.
fsync() tells the system to write the changes to the actual storage. It is a bit of an oddball among the POSIX system calls, since as far as I know nothing is specified about a system's behavior if it crashes, and a crash is the only situation where it matters whether some data is on the actual storage rather than in some cache. Even with fsync() it is still possible for the storage hardware to cache the data, or for an unrelated corruption to trash the file system when the system crashes.
If you're happy to let the OS do its job, and don't need to think about crashes, you can ignore fsync() completely and just let the data be written when the OS sees fit. If you do care about crashes, you have to look more closely into what guarantees the filesystem makes (or doesn't make). E.g. at least at some point, the ext* developers pretty much demanded that applications also call fsync() on the containing directory.
I understand the general process of writing to and reading from a file, but I was curious what is happening under the hood during file writing. For instance, I have written a program that writes a series of numbers, line by line, to a .txt file. One thing that bothers me, however, is that I don't see the information written until after my C program has finished running. Is there a way to see the information written while the program is running rather than after? Is this even possible to do? This is a hard question to phrase in one line, so please forgive me if it's already been answered elsewhere.
The reason I ask this is because I'm writing to a file and was hoping that I could scan the file for the highest and lowest values (the program would optimally be able to run for hours).
Research buffering and caching.
There are a number of layers of optimisation performed by:
your application,
your OS, and
your disk driver,
in order to extend the life of your disk and increase performance.
With the careful use of flushing commands, you can generally make things happen "quite quickly" when you really need them to, though you should generally do so sparingly.
Flushing can be particularly useful when debugging.
The GNU C Library documentation has a good page on the subject of file flushing, listing functions such as fflush which may do what you want.
You observe an effect solely caused by the C standard I/O (stdio) buffers. I claim that any OS or disk driver buffering has nothing to do with it.
In stdio, I/O happens in one of three modes:
Fully buffered, data is written once BUFSIZ (from <stdio.h>) characters have accumulated. This is the default when I/O is redirected to a file or pipe. This is what you observe. Typically BUFSIZ is anywhere from 1 kB to several kB.
Line buffered, data is written once a newline is seen (or BUFSIZ is reached). This is the default when I/O goes to a terminal.
Unbuffered, data is written immediately.
You can use the setvbuf() (<stdio.h>) function to change the default, using the _IOFBF, _IOLBF or _IONBF macros, respectively. See your friendly setvbuf man page.
In your case, you can set your output stream (stdout or the FILE * returned by fopen) to line buffered.
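For example, a sketch that makes a file stream line buffered, so every completed line is handed to the OS (log.txt is an invented name):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("log.txt", "w");
    if (f == NULL)
        return 1;
    setvbuf(f, NULL, _IOLBF, BUFSIZ);   /* line buffered: flush at every '\n' */
    fprintf(f, "this line reaches the OS immediately\n");
    fclose(f);
    return 0;
}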
Alternatively, you can call fflush() on the output stream whenever you want I/O to happen, regardless of buffering.
Indeed, there are several layers between the write commands and functions and the actual file.
First, you open the file for writing. This causes the file to be either created or emptied. If you then write, the write doesn't actually occur immediately; instead, the data is buffered until the buffer is full or the file is flushed or closed.
You can call fflush() after writing each portion of data, or you can wait until the file is closed.
Yes, it is possible to see what's written to the file(s). If you program under Linux, you can open a new terminal and watch the progress with, for example, "less Filename".
I would like to ask a fundamental question about when it is useful to use a system call like fsync. I am a beginner and I was always under the impression that write is enough to write to a file, and that samples which use write actually do write to the file in the end.
So what is the purpose of a system call like fsync?
Just to provide some background, I am using the Berkeley DB library version 5.1.19 and there is a lot of talk around the cost of fsync() vs. just writing. That is the reason I am wondering.
Think of it as a layer of buffering.
If you're familiar with the standard C calls like fopen and fprintf, you should already be aware of buffering happening within the C runtime library itself.
The way to flush those buffers is with fflush which ensures that the information is handed from the C runtime library to the OS (or surrounding environment).
However, just because the OS has it, doesn't mean it's on the disk. It could get buffered within the OS as well.
That's what fsync takes care of, ensuring that the stuff in the OS buffers is written physically to the disk.
You may typically see this sort of operation in logging libraries:
fprintf (myFileHandle, "something\n"); // output it
fflush (myFileHandle); // flush to OS
fsync (fileno (myFileHandle)); // flush to disk
fileno is a function which gives you the underlying int file descriptor for a given FILE* file handle, and fsync on the descriptor does the final level of flushing.
Now that is a relatively expensive operation since the disk write is usually considerably slower than in-memory transfers.
As well as logging libraries, one other use case may be useful for this behaviour. Let me see if I can remember what it was. Yes, that's it. Databases! Just like Berzerkely DB. Where you want to ensure the data is on the disk, a rather useful feature for meeting ACID requirements :-)
I am writing the server for a client-server application in C, and I have to save logs to a file.
I write them to the file using fprintf, but when the server goes down I lose the data in the file because I never close the file descriptor. Is there any function that tells my program to save the data?
Thanks
If you call fflush after every fprintf, that helps.
fflush should do what you want — it ensures all output is explicitly written to the file rather than e.g. being cached for later writing. So that moves the data out of user space.
sync can then be used — it causes all buffered file changes to be physically written (though per the spec it needn't block until the writes are complete, so you can be certain they've started but not that they've finished).
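A sketch of that fflush-then-sync combination (the log_line helper is invented for illustration):

#include <stdio.h>
#include <unistd.h>

/* hypothetical logging helper */
void log_line(FILE *f, const char *msg)
{
    fprintf(f, "%s\n", msg);
    fflush(f);   /* move the data out of user space into the kernel */
    sync();      /* ask the kernel to schedule all buffered changes for writing */
}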
From what I've read, flush pushes data into the OS buffers and sync makes sure that data goes down to the storage media. So, if you want to be sure that data is actually written to disk, you need to do a flush followed by a sync. So, are there any cases where you want to call flush but not sync?
You only want to fflush if you're using stdio's FILE *. This writes a user space buffer to the kernel.
The other answers seem to be missing fdatasync. This is the system call you want to flush a specific file descriptor to disk.
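For instance, a sketch using the raw write() path (the write_durably helper and the file layout are invented):

#include <fcntl.h>
#include <unistd.h>

/* hypothetical helper: write a buffer and flush this one descriptor to disk */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    if (fdatasync(fd) != 0) {   /* file data (not all metadata) -> disk */
        close(fd);
        return -1;
    }
    return close(fd);
}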
When you fflush, you flush the stdio buffer of one stream to the kernel (unless you give NULL, in which case it flushes all open streams). http://www.manpagez.com/man/3/fflush/
When you sync, you flush all the kernel buffers to disk. http://www.manpagez.com/man/2/sync/
The most important thing that you should notice is that fflush is a standard function, while sync is a system call provided by the operating system (Linux for example).
So basically, if you are writing a portable program, you in fact never use sync.
Yes, lots. Most programs most of the time would not bother to call any of the various sync operations; flushing the data into the kernel buffer pool as you close the file is sufficient. This is doubly true if you're using a journalled file system.
Note that flushing is a higher-level operation than the write() or similar system calls. It is used by the C <stdio.h> library, or the C++ <iostream> library. The system calls inherently flush the data to the kernel buffer pool (or direct to disk if you're using direct I/O or something similar).
Note, too, that on POSIX-like systems, you can arrange for data sync etc by setting flags on the open() system call (O_SYNC, O_DSYNC, O_RSYNC), or subsequently via fcntl().
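A sketch of the open()-flag variant (critical.log is an invented name):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* with O_SYNC, every write() returns only after the data has been
       transferred to the storage device */
    int fd = open("critical.log", O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return 1;
    write(fd, "logged\n", 7);   /* synchronized write, no separate fsync needed */
    close(fd);
    return 0;
}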
Just to clarify, fflush() applies only when you are using the stdio FILE interface, which buffers writes at the application level. If the plain write() call is used instead, fflush() makes little sense.
Having said that, I can think of two situations where you would like to call fflush() but not sync:
You want to make sure that the data will eventually make it to disk even though the application crashes.
Forcing to the screen the data that the application has written to standard output so far.
The second case is the most common use I have seen, and it is usually required if the printf() call does not end with a newline character ('\n').
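For example (a minimal sketch):

#include <stdio.h>

int main(void)
{
    printf("working...");   /* no '\n', so the text stays in the stdio buffer */
    fflush(stdout);         /* force the text onto the screen now */
    /* ... long computation ... */
    printf(" done\n");
    return 0;
}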