Deferring FILE flush to when the file is closed - c

I want to defer flushing the contents of the FILE* to when fclose is called. In other words, I only want to write to disk when fclose is called and keep buffering the contents until then. Is it possible to do that with FILE*, or do I need to write my own code?

If you want to buffer (and under no circumstances write to the disk until the file is closed), then your best bet is to write to a buffer in memory (assuming the data will fit in memory, of course), then write that buffer out in one go and call fclose().
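On POSIX systems, one way to get exactly this behavior while still using a FILE* is open_memstream, which gives you a stream backed by a growing memory buffer. A minimal sketch, assuming a POSIX 2008 system (the output file name out.txt is made up):

#define _POSIX_C_SOURCE 200809L  /* for open_memstream (POSIX 2008) */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *buf = NULL;
    size_t len = 0;

    /* All writes go to a growing in-memory buffer, never to disk. */
    FILE *mem = open_memstream(&buf, &len);
    if (mem == NULL)
        return 1;

    fprintf(mem, "line one\n");
    fprintf(mem, "line two\n");
    fclose(mem);    /* finalizes buf and len; still nothing on disk */

    /* Now write the whole buffer to disk in one go. */
    FILE *f = fopen("out.txt", "wb");
    if (f != NULL) {
        fwrite(buf, 1, len, f);
        fclose(f);
    }
    free(buf);
    return 0;
}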

Related

c language: fwrite doesn't do anything

{
    FILE* f1 = fopen("C:\\num1.bin", "wb+");  /* creates a new file */
    int A[] = { 1, 3, 6, 28 };                /* int array */
    fwrite(A, sizeof(A), 1, f1);              /* should write the A array to the file */
}
I do see the file, but even after the fwrite the file remains empty (0 bytes). Does anyone know why?
You need to close the file with fclose. Otherwise the write buffer will not (necessarily) be flushed and the file contents will not be written to disk.
A couple of things:
As @Grantly correctly noted above, you are missing a call to fclose or fflush after writing to the file. Without this, any cached/pending writes will not necessarily be written to the open file.
You do not check the return value of fopen. If fopen fails for any reason, it returns a NULL pointer rather than a valid file pointer. Since you're writing directly to the root of drive C:\ on a Windows platform, that's something you definitely do want to check for (not that you shouldn't in other cases too, but under a regular user account that location is often write-protected). A corrected sketch is shown below.
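Putting both points together (keeping the path as posted):

#include <stdio.h>

int main(void)
{
    FILE *f1 = fopen("C:\\num1.bin", "wb+");
    if (f1 == NULL) {                /* fopen can fail, e.g. C:\ is write-protected */
        perror("fopen");
        return 1;
    }

    int A[] = { 1, 3, 6, 28 };
    if (fwrite(A, sizeof(A), 1, f1) != 1)
        perror("fwrite");

    fclose(f1);                      /* flushes the buffer; the data now reaches the file */
    return 0;
}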
The result of fwrite is not required to appear in the file immediately after it returns. That is because file operations usually work in a buffered manner, i.e. writes are cached and then flushed, to speed things up and improve performance.
The content of the file will be updated after you call fclose:
fclose()
(...) Any unwritten buffered data are flushed to the OS. Any unread buffered data are discarded.
You may also explicitly flush the internal buffer without closing the file using fflush:
fflush()
For output streams (and for update streams on which the last operation was output), writes any unwritten data from the stream's buffer to the associated output device.
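For illustration, a minimal sketch of both calls (the file name data.txt is made up):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("data.txt", "w");
    if (f == NULL)
        return 1;

    fputs("hello\n", f);
    fflush(f);    /* pending output is pushed to the OS; the stream stays open */

    /* ... more writes could happen here ... */

    fclose(f);    /* flushes anything still buffered, then closes */
    return 0;
}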

fwrite() return value does not match file size

I am using fwrite() to write data to a file in a loop that repeats 40 times. During the first fwrite() the last few KBs are not written to the file, whereas the subsequent fwrite() calls work fine. When I write a single chunk of, say, 4000 bytes, fwrite() returns 4000, but the file content is less than 4000 bytes (nearly 3000 bytes). No error is reported via perror() either. Kindly help me with this; thanks in advance.
fwrite doesn't write to a file. The function writes to a buffered file stream. The buffer is flushed to disk when you close the file with fclose or flush the buffer with fflush. Use the appropriate function to empty the buffer.
The buffer exists for performance. If performance is not your concern and you need writes to go directly to disk, you can use the setbuf or setvbuf function to disable buffering. (Thanks to Barmar for the comment.)
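As a sketch of that approach (the file name log.txt is made up; note that setvbuf must be called before the first I/O operation on the stream):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("log.txt", "w");
    if (f == NULL)
        return 1;

    setvbuf(f, NULL, _IONBF, 0);    /* make the stream unbuffered */

    fputs("goes straight through to the OS\n", f);    /* no fflush needed */
    fclose(f);
    return 0;
}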

Understanding the need for fflush() and problems associated with it

Below is sample code for using fflush():
#include <string.h>
#include <stdio.h>
#include <conio.h>
#include <io.h>

void flush(FILE *stream);

int main(void)
{
    FILE *stream;
    char msg[] = "This is a test";

    /* create a file */
    stream = fopen("DUMMY.FIL", "w");
    /* write some data to the file */
    fwrite(msg, strlen(msg), 1, stream);

    clrscr();
    printf("Press any key to flush DUMMY.FIL:");
    getch();

    /* flush the data to DUMMY.FIL without closing it */
    flush(stream);

    printf("\nFile was flushed, Press any key to quit:");
    getch();
    return 0;
}

void flush(FILE *stream)
{
    int duphandle;

    /* flush the stream's internal buffer */
    fflush(stream);
    /* make a duplicate file handle */
    duphandle = dup(fileno(stream));
    /* close the duplicate handle to flush the DOS buffer */
    close(duphandle);
}
All I know about fflush() is that it is a library function used to flush an output buffer. I want to know the basic purpose of fflush() and where I can use it. Mainly, I am interested in knowing what problems there can be with using fflush().
It's a little hard to say what "can be problems with" (excessive?) use of fflush. All kinds of things can be, or become, problems, depending on your goals and approaches. Probably a better way to look at this is what the intent of fflush is.
The first thing to consider is that fflush is defined only on output streams. An output stream collects "things to write to a file" into a large(ish) buffer, and then writes that buffer to the file. The point of this collecting-up-and-writing-later is to improve speed/efficiency, in two ways:
On modern OSes, there's some penalty for crossing the user/kernel protection boundary (the system has to change some protection information in the CPU, etc). If you make a large number of OS-level write calls, you pay that penalty for each one. If you collect up, say, 8192 or so individual writes into one large buffer and then make one call, you remove most of that overhead.
On many modern OSes, each OS write call will try to optimize file performance in some way, e.g., by discovering that you've extended a short file to a longer one, and it would be good to move the disk block from point A on the disk to point B on the disk, so that the longer data can fit contiguously. (On older OSes, this is a separate "defragmentation" step you might run manually. You can think of this as the modern OS doing dynamic, instantaneous defragmentation.) If you were to write, say, 500 bytes, and then another 200, and then 700, and so on, it will do a lot of this work; but if you make one big call with, say, 8192 bytes, the OS can allocate a large block once, and put everything there and not have to re-defragment later.
So, the folks who provide your C library and its stdio stream implementation do whatever is appropriate on your OS to find a "reasonably optimal" block size, and to collect up all output into chunks of that size. (The numbers 4096, 8192, 16384, and 65536 often, today, tend to be good ones, but it really depends on the OS, and sometimes the underlying file system as well. Note that "bigger" is not always "better": streaming data in chunks of four gigabytes at a time will probably perform worse than doing it in chunks of 64 Kbytes, for instance.)
But this creates a problem. Suppose you're writing to a file, such as a log file with date-and-time stamps and messages, and your code is going to keep writing to that file later, but right now, it wants to suspend for a while and let a log-analyzer read the current contents of the log file. One option is to use fclose to close the log file, then fopen to open it again in order to append more data later. It's more efficient, though, to push any pending log messages to the underlying OS file, but keep the file open. That's what fflush does.
Buffering also creates another problem. Suppose your code has some bug, and it sometimes crashes but you're not sure if it's about to crash. And suppose you've written something and it's very important that this data get out to the underlying file system. You can call fflush to push the data through to the OS, before calling your potentially-bad code that might crash. (Sometimes this is good for debugging.)
Or, suppose you're on a Unix-like system, and have a fork system call. This call duplicates the entire user-space (makes a clone of the original process). The stdio buffers are in user space, so the clone has the same buffered-up-but-not-yet-written data that the original process had, at the time of the fork call. Here again, one way to solve the problem is to use fflush to push buffered data out just before doing the fork. If everything is out before the fork, there's nothing to duplicate; the fresh clone won't ever attempt to write the buffered-up data, as it no longer exists.
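As a small illustration of the fork case, here is a sketch (POSIX) where the fflush prevents the pre-fork text from being written twice; try removing it and redirecting the output to a file:

#include <stdio.h>
#include <unistd.h>    /* fork */

int main(void)
{
    printf("buffered before fork");    /* no newline: may sit in the stdio buffer */

    fflush(stdout);    /* push it out so the child's copy of the buffer is empty */

    pid_t pid = fork();
    if (pid == 0)
        printf("\nchild\n");
    else if (pid > 0)
        printf("\nparent\n");
    return 0;
}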
The more fflush-es you add, the more you're defeating the original idea of collecting up large chunks of data. That is, you are making a tradeoff: large chunks are more efficient, but are causing some other problem, so you make the decision: "be less efficient here, to solve a problem more important than mere efficiency". You call fflush.
Sometimes the problem is simply "debug the software". In that case, instead of repeatedly calling fflush, you can use functions like setbuf and setvbuf to alter the buffering behavior of a stdio stream. This is more convenient (fewer, or even no, code changes required—you can control the set-buffering call with a flag) than adding a lot of fflush calls, so that could be considered a "problem with use (or excessive-use) of fflush".
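A sketch of that idea, with buffering controlled by a flag rather than scattered fflush calls (the environment variable name DEBUG_UNBUFFERED is made up):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *log = fopen("debug.log", "w");    /* hypothetical log file */
    if (log == NULL)
        return 1;

    /* One switch instead of many fflush calls: in debug runs, make the
       stream unbuffered so every write reaches the OS immediately. */
    if (getenv("DEBUG_UNBUFFERED") != NULL)
        setvbuf(log, NULL, _IONBF, 0);

    fprintf(log, "step 1 done\n");    /* hits the OS at once in debug mode */
    fclose(log);
    return 0;
}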
Well, @torek's answer is almost perfect, but there's one point that is not quite accurate.
The first thing to consider is that fflush is defined only on output streams.
According to man fflush, fflush can also be used on input streams:
For output streams, fflush() forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function. For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application. The open status of the stream is unaffected.
So, when used on an input stream, fflush just discards the buffered data.
Here is a demo to illustrate it:
#include <stdio.h>
#include <stdlib.h>

#define MAXLINE 1024

int main(void)
{
    char buf[MAXLINE];

    printf("prompt: ");
    while (fgets(buf, MAXLINE, stdin) != NULL) {
        fflush(stdin);    /* discard any input still buffered past this line */
        if (fputs(buf, stdout) == EOF)
            printf("output err");
    }
    exit(0);
}
fflush() empties the buffers related to the stream. If you, for example, let a user input some data in a very short timespan (milliseconds) while writing some stuff into a file, the read and write buffers may have some leftover data remaining in them. You then call fflush() to empty all the buffers and force the output out, to be sure the next input you get is what the user actually pressed.
reference: http://www.cplusplus.com/reference/cstdio/fflush/

ftruncate on file opened with fopen

Platform is Ubuntu Linux on ARM.
I want to write a string to a file, but each time I want to truncate the file first and then write the string, i.e. not append.
I have this code:
f = fopen("/home/user1/refresh.txt", "w");
fputs("some string", f);
fflush(f);
ftruncate(fileno(f), (off_t)0);
fclose(f);
If I run it and then check the file, it will be of zero length and when opened, there will be nothing in it.
If I remove the fflush call, it will NOT be 0 (it will be 11), and when I open it there will be "some string" in it.
Is this the normal behavior?
I do not have a problem calling fflush, but I want to do this in a loop and calling fflush may increase the execution time considerably.
You should not really mix file handle and file descriptor calls like that.
What's almost certainly happening without the fflush is that "some string" is waiting in the file handle's buffer for delivery to the file descriptor. You then truncate the file descriptor and fclose the file handle, flushing the string, hence it shows up in the file.
With the fflush, some string is sent to the file descriptor and then you truncate it. With no further flushing, the file stays truncated.
If you want to literally "truncate the file then write", then it's sufficient to:
f = fopen("/home/user1/refresh.txt", "w");
fputs("some string", f);
fclose(f);
Opening the file in mode w will truncate it (as opposed to mode a, which appends to the end).
Also, calling fclose will flush the output buffer, so no data gets lost.
POSIX requires you to take specific actions (which ensure that no ugly side effects of buffering make your program go haywire) when switching between using a FILE stream and a file descriptor to access the same open file. This is described in XSH 2.5.1 Interaction of File Descriptors and Standard I/O Streams.
In your case, I believe it should suffice to just call fflush before ftruncate, like you're doing. Omitting this step, per the rules of 2.5.1, results in undefined behavior.
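If the goal is to keep one stream open and rewrite it repeatedly, a sketch along these lines follows that rule by flushing before the descriptor-level call (rewrite_file is a hypothetical helper, and the path is kept from the question):

#include <stdio.h>
#include <unistd.h>    /* ftruncate (POSIX) */

/* Replace the file's contents on each call, keeping the stream open. */
static int rewrite_file(FILE *f, const char *s)
{
    rewind(f);                        /* back to the start of the file */
    if (fputs(s, f) == EOF)
        return -1;
    if (fflush(f) != 0)               /* required before the fd-level call */
        return -1;
    /* cut off any leftover tail from a longer previous write */
    return ftruncate(fileno(f), (off_t)ftell(f));
}

int main(void)
{
    FILE *f = fopen("/home/user1/refresh.txt", "w");
    if (f == NULL)
        return 1;
    for (int i = 0; i < 10; i++)
        rewrite_file(f, "some string");
    fclose(f);
    return 0;
}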

How the OS performs buffering for a file

I know that when you call fwrite or fprintf or any other function that writes to a file, the contents aren't immediately flushed to disk, but buffered in memory.
Firstly, where does the OS manage these buffers, and how? Secondly, if you write to a file and later read back the content you wrote, and assuming the OS didn't flush the contents between the write and the read, how does it know that it has to serve the read from the buffer? How does it handle this situation?
The reason I want to know this is that I'm interested in implementing my own buffering scheme in user space, rather than the kernel-space one done by the OS. That is, writes to a file would be buffered in user space, and the actual write would only occur at a certain point. Consequently, I also need to handle situations where a read is issued for content that is still in the buffer. Is it possible to do all this in user space?
Firstly, where does the OS manage these buffers, and how?
The functions fwrite and fprintf use stdio buffers, which are already completely in userspace. The buffers are (likely) static arrays or perhaps malloced memory.
how does it know that it has to serve the read from the buffer?
It doesn't, so the changes aren't seen. Nothing actually happens to a file until the underlying system call (write) is called (and even then - read on).
Is it possible to do all this in user space?
No, it's not possible. The good news is that the kernel already has buffers, so not every write you do is actually translated into a write to the file. It is postponed and executed later. If, in the meantime, somebody tries to read from the file, the kernel is smart enough to serve that process from the buffer.
Bits from TLPI:
When working with disk files, the read() and write() system calls don't directly initiate disk access. Instead, they simply copy data between a user-space buffer and a buffer in the kernel buffer cache.
When performing I/O on a disk file, a successful return from write() doesn't guarantee that the data has been transferred to disk, because the kernel performs buffering of disk I/O in order to reduce disk activity and expedite write() calls.
At some later point, the kernel writes (flushes) its buffer to the disk.
If, in the interim, another process attempts to read these bytes of the file, then the kernel automatically supplies the data from the buffer cache, rather than from (the outdated contents of) the file.
So you may want to find out about sync and fsync.
Multiple levels of buffering are generally bad. The reason stdio buffers are useful is that they minimize the number of system calls performed. If system calls were cheaper, nobody would use stdio buffers anymore.
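To make the two levels concrete, here is a minimal POSIX sketch that pushes data through both of them (reusing the DUMMY.FIL name from the earlier example):

#include <stdio.h>
#include <string.h>
#include <unistd.h>    /* fsync, fileno (POSIX) */

int main(void)
{
    const char msg[] = "This is a test";

    FILE *f = fopen("DUMMY.FIL", "w");
    if (f == NULL)
        return 1;

    fwrite(msg, strlen(msg), 1, f);

    fflush(f);           /* level 1: stdio buffer -> kernel buffer cache */
    fsync(fileno(f));    /* level 2: kernel buffer cache -> disk */

    fclose(f);
    return 0;
}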
