The core of my app looks approximately as follows:
size_t bufsize;
char* buf1;
size_t r1;
FILE* f1=fopen("/path/to/file","rb");
...
do{
r1=fread(buf1, 1, bufsize, f1);
processChunk(buf1,r1);
} while (!feof(f1));
...
(In reality, I have multiple FILE*'s and multiple bufN's.) Now, I hear that FILE is quite ready to manage a buffer (referred to as a "stream buffer") all by itself, and this behavior appears to be quite tweakable: https://www.gnu.org/software/libc/manual/html_mono/libc.html#Controlling-Buffering .
How can I refactor the above piece of code to ditch the buf1 buffer and use f1's internal stream buffer instead (while setting it to bufsize)?
If you don't want opaquely buffered I/O, don't use FILE *. Use lower-level APIs that let you manage all the application-side buffering yourself, such as plain POSIX open() and read() for instance.
So I've read a little bit of the C standard and run some benchmarks and here are my findings:
1) Doing it as in the above example does involve unnecessary in-memory copying, which increases the user time of simple cmp program based on the above example about twice. Nevertheless user-time is insignificant for most IO-heavy programs, unless the source of the file is extremely fast.
On in-memory file-sources (/dev/shm on Linux), however, turning off FILE buffering (setvbuf(f1, NULL, _IONBF, 0);) does yield a nice and consistent speed increase of about 10–15% on my machine when using buffsizes close to BUFSIZ (again, measured on the IO-heavy cmp utility based on the above snippet, which I've already mentioned, which I've tested on 2 identical 700MB files 100 times).
2) Whereas there is an API for setting the FILE buffer, I haven't found any standardized API for reading it, so I'm going to stick with the true and tested way of doing, but with the FILE buffer off (setvbuf(f1, NULL, _IONBF, 0);)
(But I guess I could solve my question by setting my own buffer as the FILE stream buffer with the _IONBF mode option (=turn off buffering), and then I could just access it via some unstandardized pointer in the FILE struct.)
Related
I am having a really hard time understanding the depths of buffering especially in C programming and I have searched for really long on this topic but haven't found something satisfying till now.
I will be a little more specific:
I do understand the concept behind it (i.e. coordination of operations by different hardware devices and minimizing the difference in speed of these devices) but I would appreciate a more full explanation of these and other potential reasons for buffering (and by full I mean full the longer and deeper the better) it would also be really nice to give some concrete Examples of how buffering is implemented in I/O streams.
The other questions would be that I noticed that some rules in buffer flushing aren't followed by my programs as weirdly as this sounds like the following simple fragment:
#include <stdio.h>
int main(void)
{
FILE * fp = fopen("hallo.txt", "w");
fputc('A', fp);
getchar();
fputc('A', fp);
getchar();
return 0;
}
The program is intended to demonstrate that impending input will flush arbitrary stream immediately when the first getchar() is called but this simply doesn't happen as often as I try it and with as many modifications as I want — it simply doesn't happen as for stdout (with printf() for example) the stream is flushed without any input requested also negating the rule therefore am I understanding this rule wrongly or is there something other to consider
I am using Gnu GCC on Windows 8.1.
Update:
I forgot to ask that I read on some sites how people refer to e.g. string literals as buffers or even arrays as buffers; is this correct or am I missing something?
Please explain this point too.
The word buffer is used for many different things in computer science. In the more general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to the final destination (or other buffer).
As you hinted in the question there are many types of buffers, but as a broad grouping:
Hardware buffers: These are buffers where data is stored before being moved to a HW device. Or buffers where data is stored while being received from the HW device until it is processed by the application. This is needed because the I/O operation usually has memory and timing requirements, and these are fulfilled by the buffer. Think of DMA devices that read/write directly to memory, if the memory is not set up properly the system may crash. Or sound devices that must have sub-microsecond precision or it will work poorly.
Cache buffers: These are buffers where data is grouped before writing into/read from a file/device so that the performance is generally improved.
Helper buffers: You move data into/from such a buffer, because it is easier for your algorithm.
Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1ms for just the call plus 1us for each byte (bear with me, things are more complicated in real world). Then, if you do:
FILE *f = fopen("file.txt", "w");
for (int i=0; i < 1000000; ++i)
fputc('x', f);
fclose(f);
Without buffering, this code would take 1000000 * (1ms + 1us), that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us). That's just 0.1 seconds!
Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!
About your problem with flushing, files are usually flushed just when closed or manually flushed. Some files, such as stdout are line-flushed, that is, they are flushed whenever a '\n' is written. Also the stdin/stdout are special: when you read from stdin then stdout is flushed. Other files are untouched, only stdout. That is handy if you are writing an interactive program.
My case #3 is for example when you do:
FILE *f = open("x.txt", "r");
char buffer[1000];
fgets(buffer, sizeof(buffer), f);
int n;
sscanf(buffer, "%d", &n);
You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but in other APIs there may not be the equivalent function, and moreover you have more control this way: you can analyze the type if line, skip comments, count lines...
Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.
The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.
One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large sized chunks, where as programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so that you must record each whole chunk (in a buffer) or else lose that part you're not immediately ready to consume. Similar considerations apply to output.
As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.
The first question is a bit too broad. Buffering is used in many cases, including message storage before actual usage, DMA uses, speedup usages and so on. In short, the entire buffering thing can be summarized as "save my data, let me continue execution while you do something with the data".
Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.
So, please ask more specific question. As a point to begin, use wikipedia, it is almost always helpful: wiki
As for the code sample, I haven't found any mention of all output buffers being flushed upon getchar. Buffers for files are generally flushed in three cases:
fflush() or equivalent
File is closed
The buffer is overflown.
Since neither of these cases is true for you, the file is not flushed (note that application termination is not in this list).
Buffer is a simple small area inside your memory (RAM) and that area is responsible of storing information before sent to your program, as long I'm typing the characters from the keyboard these characters will be stored inside the buffer and as soon I press the Enter key these characters will be transported from the buffer into your program so with the help of buffer all these characters are instantly available to your program (prevent lag and the slowly) and sent them to the output display screen
Below is sample code for using fflush():
#include <string.h>
#include <stdio.h>
#include <conio.h>
#include <io.h>
void flush(FILE *stream);
int main(void)
{
FILE *stream;
char msg[] = "This is a test";
/* create a file */
stream = fopen("DUMMY.FIL", "w");
/* write some data to the file */
fwrite(msg, strlen(msg), 1, stream);
clrscr();
printf("Press any key to flush DUMMY.FIL:");
getch();
/* flush the data to DUMMY.FIL without closing it */
flush(stream);
printf("\nFile was flushed, Press any key to quit:");
getch();
return 0;
}
void flush(FILE *stream)
{
int duphandle;
/* flush the stream's internal buffer */
fflush(stream);
/* make a duplicate file handle */
duphandle = dup(fileno(stream));
/* close the duplicate handle to flush the DOS buffer */
close(duphandle);
}
All I know about fflush() is that it is a library function used to flush an output buffer. I want to know what is the basic purpose of using fflush(), and where can I use it. And mainly I am interested in knowing what problems can there be with using fflush().
It's a little hard to say what "can be problems with" (excessive?) use of fflush. All kinds of things can be, or become, problems, depending on your goals and approaches. Probably a better way to look at this is what the intent of fflush is.
The first thing to consider is that fflush is defined only on output streams. An output stream collects "things to write to a file" into a large(ish) buffer, and then writes that buffer to the file. The point of this collecting-up-and-writing-later is to improve speed/efficiency, in two ways:
On modern OSes, there's some penalty for crossing the user/kernel protection boundary (the system has to change some protection information in the CPU, etc). If you make a large number of OS-level write calls, you pay that penalty for each one. If you collect up, say, 8192 or so individual writes into one large buffer and then make one call, you remove most of that overhead.
On many modern OSes, each OS write call will try to optimize file performance in some way, e.g., by discovering that you've extended a short file to a longer one, and it would be good to move the disk block from point A on the disk to point B on the disk, so that the longer data can fit contiguously. (On older OSes, this is a separate "defragmentation" step you might run manually. You can think of this as the modern OS doing dynamic, instantaneous defragmentation.) If you were to write, say, 500 bytes, and then another 200, and then 700, and so on, it will do a lot of this work; but if you make one big call with, say, 8192 bytes, the OS can allocate a large block once, and put everything there and not have to re-defragment later.
So, the folks who provide your C library and its stdio stream implementation do whatever is appropriate on your OS to find a "reasonably optimal" block size, and to collect up all output into chunk of that size. (The numbers 4096, 8192, 16384, and 65536 often, today, tend to be good ones, but it really depends on the OS, and sometimes the underlying file system as well. Note that "bigger" is not always "better": streaming data in chunks of four gigabytes at a time will probably perform worse than doing it in chunks of 64 Kbytes, for instance.)
But this creates a problem. Suppose you're writing to a file, such as a log file with date-and-time stamps and messages, and your code is going to keep writing to that file later, but right now, it wants to suspend for a while and let a log-analyzer read the current contents of the log file. One option is to use fclose to close the log file, then fopen to open it again in order to append more data later. It's more efficient, though, to push any pending log messages to the underlying OS file, but keep the file open. That's what fflush does.
Buffering also creates another problem. Suppose your code has some bug, and it sometimes crashes but you're not sure if it's about to crash. And suppose you've written something and it's very important that this data get out to the underlying file system. You can call fflush to push the data through to the OS, before calling your potentially-bad code that might crash. (Sometimes this is good for debugging.)
Or, suppose you're on a Unix-like system, and have a fork system call. This call duplicates the entire user-space (makes a clone of the original process). The stdio buffers are in user space, so the clone has the same buffered-up-but-not-yet-written data that the original process had, at the time of the fork call. Here again, one way to solve the problem is to use fflush to push buffered data out just before doing the fork. If everything is out before the fork, there's nothing to duplicate; the fresh clone won't ever attempt to write the buffered-up data, as it no longer exists.
The more fflush-es you add, the more you're defeating the original idea of collecting up large chunks of data. That is, you are making a tradeoff: large chunks are more efficient, but are causing some other problem, so you make the decision: "be less efficient here, to solve a problem more important than mere efficiency". You call fflush.
Sometimes the problem is simply "debug the software". In that case, instead of repeatedly calling fflush, you can use functions like setbuf and setvbuf to alter the buffering behavior of a stdio stream. This is more convenient (fewer, or even no, code changes required—you can control the set-buffering call with a flag) than adding a lot of fflush calls, so that could be considered a "problem with use (or excessive-use) of fflush".
Well, #torek's answer is almost perfect, but there's one point which is not so accurate.
The first thing to consider is that fflush is defined only on output
streams.
According to man fflush, fflush can also be used in input streams:
For output streams, fflush() forces a write of all user-space
buffered data for the given output or update stream via the stream's
underlying write function. For
input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by
the application. The open status of
the stream is unaffected.
So, when used in input, fflush just discard it.
Here is a demo to illustrate it:
#include<stdio.h>
#define MAXLINE 1024
int main(void) {
char buf[MAXLINE];
printf("prompt: ");
while (fgets(buf, MAXLINE, stdin) != NULL)
fflush(stdin);
if (fputs(buf, stdout) == EOF)
printf("output err");
exit(0);
}
fflush() empties the buffers related to the stream. if you e.g. let a user input some data in a very shot timespan (milliseconds) and write some stuff into a file, the writing and reading buffers may have some "reststuff" remaining in themselves. you call fflush() then to empty all the buffers and force standard outputs to be sure the next input you get is what the user pressed then.
reference: http://www.cplusplus.com/reference/cstdio/fflush/
My application (C program) opens two file handles to the same file (one in write and one in read mode). Two separate threads in the app read from and write to the file. This works fine.
Since my app runs on embedded device with a limited ram disk size, I would like write FileHandle to wrap to beginning of file on reaching max size and the read FileHandle to follow like a circular buffer. I understand from answers to this question that this should work. However as soon as I do fseek of write FileHandle to beginning of file, fread returns error. Will the EOF get reset on doing fseek to beginning of file? If so, which function should be used to cause write file position to get set to 0 without causing EOF to be reset.
EDIT/UPDATE:
I tried couple of things:
Based on #neodelphi I used pipes this works. However my usecase requires I write to a file. I receive multiple channels of live video surveilance stream that needs to be stored to harddisk and also read back decoded and displayed on monitor.
Thanks to #Clement suggestions on doing ftell I fixed a couple of bugs in my code and wrap works for the reader however, the data read appears to be stale data since write are still buffered but reader reads stale content from hard disk. I cant avoid buffering due to performance considerations (I get 32Mbps of live data that needs to be written to harddisk). I have tried things like flushing writes only in the interval from when write wraps to when read wraps and truncating the file (ftruncate) after read wraps but this doesnt solve the stale data problem.
I am trying to use two files in ping-pong fashion to see if this solves the issue but want to know if there is a better solution
You should have something like that :
// Write
if(ftell(WriteHandle)>BUFFER_MAX) rewind (WriteHandle);
fwrite(WriteHandle,/* ... */);
// Read (assuming binary)
readSize = fread (buffer,1,READ_CHUNK_SIZE,ReadHandle);
if(readSize!=READ_CHUNK_SIZE){
rewind (ReadHandle);
if(fread (buffer+readSize,1,READ_CHUNK_SIZE-readSize,ReadHandle)!=READ_CHUNK_SIZE-readSize)
;// ERROR !
}
Not tested, but it gives an idea. The write should also handle the case BUFFER_MAX is not modulo WRITE_CHUNK_SIZE.
Also, you may read only if you are sure that the data has already been written. But I guess you already do that.
You could mmap the file into you're virtual memory and then just create a normal circular buffer with the pointer returned.
int fd = open(path, O_RDWR);
volatile void * mem = mmap(NULL, max_size, PROT_WRITE, MAP_SHARED, fd, 0);
volatile char * c_mem = (volatile char *)mem;
c_mem[index % max_size] = 'a'; // This line will now write to the offset index in the file
// Now doing
Can also probably be stricter on permissions depending on on exact case.
I'm going to first admit that this is for a class project, since it will be pretty obvious. We are supposed to do reads to probe for the block size of the filesystem. My problem is that the time taken to do this appears to be linearly increasing, with no steps like I would expect.
I am timing the read like this:
double startTime = getticks();
read = fread(x, 1, toRead, fp);
double endTime = getticks();
where getticks uses rdtsc instructions. I am afraid there is caching/prefetching that is causing the reads to not take time during the fread. I tried creating a random file between each execution my program, but that is not alleviating my problem.
What is the best way to accurately measure the time taken for a read from disk? I am pretty sure my block size is 4096, but how can I get data to support that?
The usual way of determining filesystem block size is to ask the filesystem what its blocksize is.
#include <sys/statvfs.h>
#include <stdio.h>
int main() {
struct statvfs fs_stat;
statvfs(".", &fs_stat);
printf("%lu\n", fs_stat.f_bsize);
}
But if you really want, open(…,…|O_DIRECT) or posix_fadvise(…,…,…,POSIX_FADV_DONTNEED) will try to let you bypass the kernel's buffer cache (not guaranteed).
You may want to use the system calls (open(), read(), write(), ...)
directly to reduce the impact of the buffering done by the FILE* stuff.
Also, you may want to use synchronous I/O somehow.
One ways is opening the file with the O_SYNC flag set
(or O_DIRECT as per ephemient's reply).
Quoting the Linux open(2) manual page:
O_SYNC The file is opened for synchronous I/O. Any write(2)s on the
resulting file descriptor will block the calling process until
the data has been physically written to the underlying hardware.
But see NOTES below.
Another options would be mounting the filesystem with -o sync (see mount(8)) or setting the S attribute on the file using the chattr(1) command.
Recently I ran into a "fun" problem with the Microsoft implementation of the CRTL. tmpfile places temp files in the root directory and completely ignores the temp file directory. This has issues with users who do not have privileges to the root directory (say, on our cluster). Moreover, using _tempnam would require the application to remember to delete the temporary files, which it is unable to do without a considerable amount of rework.
Therefore I bit the bullet and wrote Win32 versions of all of the IO routines (create_temp, read, write, seek, flush) which call the appropriate method. One thing I've noticed is the now abysmal performance of the library.
Results from the test suite:
CRTL: 4:30.05 elapsed
Win32: 11:18.06 elapsed
Stats measured in my routines:
Writes: 3129934 ( 44,642,745,008 bytes)
Reads: 935903 ( 8,183,423,744 bytes)
Seeks: 2205757 (2,043,782,657,968 bytes traveled)
Flushes: 92442
Example of a CRTL v. Win32 method:
int io_write(FILE_POINTER fp, size_t words, const void *buffer)
{
#if !defined(USE_WIN32_IO)
{
size_t words_written = 0;
/* read the data */
words_written = fwrite(buffer, sizeof(uint32_t), words, fp);
if (words_written != words)
{
return errno;
}
}
#else /* !defined(USE_WIN32_IO) */
{
DWORD bytesWritten;
if (!WriteFile(fp, buffer, words * sizeof(uint32_t), &bytesWritten, NULL)
|| (bytesWritten != words * sizeof(uint32_t)))
{
return GetLastError();
}
}
#endif /* USE_WIN32_IO */
return E_SUCCESS;
}
As you can see, they are effectively identical, yet the performance (in release mode) is wildly divergent. Time spent in WriteFile and SetFilePointer dwarf the time spent in fwrite and fseeko, which seems counterintuitive.
Ideas?
UPDATE: perfmon notes that fflush is about 10x cheaper than FlushFileBuffers and fwrite is ~1.1x slower than WriteFile. The net result is a huge performance loss with FlushFileBuffers used in the same manner as fflush. There is no change from FILE_ATTRIBUTE_NORMAL to FILE_FLAG_RANDOM_ACCESS either.
I think it's probably due to this issue, described on MSDN's page for FlushFileBuffers:
Due to disk caching interactions
within the system, the
FlushFileBuffers function can be
inefficient when used after every
write to a disk drive device when many
writes are being performed separately.
If an application is performing
multiple writes to disk and also needs
to ensure critical data is written to
persistent media, the application
should use unbuffered I/O instead of
frequently calling FlushFileBuffers.
To open a file for unbuffered I/O,
call the CreateFile function with the
FILE_FLAG_NO_BUFFERING and
FILE_FLAG_WRITE_THROUGH flags. This
prevents the file contents from being
cached and flushes the metadata to
disk with each write. For more
information, see CreateFile.
In general, FlushFileBuffers is an "expensive" operation, since it flushes everything in the write-back cache:
FlushFileBuffers(): This function will flush everything in the write-back cache, as it
does not know what part of the cache belongs to your file. This can take a lot of time,
depending on the cache size and the speed of the media. How necessary is it? There is
a thread which goes through and writes out dirty pages, so it is likely not very
necessary.
I presume that fflush does not flush the entire write-back cache. In that case, it's much more efficient, but that efficiency comes at the risk of potential data loss. The CRT's source code for fflush confirms this, since _commit calls FlushFileBuffers:
/* lowio commit to ensure data is written to disk */
if (str->_flag & _IOCOMMIT) {
return (_commit(_fileno(str)) ? EOF : 0);
}
From the implementation of _commit:
if ( !FlushFileBuffers((HANDLE)_get_osfhandle(filedes)) ) {
retval = GetLastError();
}
Traditionally, the C runtime library functions buffer the data and only trigger the write operation (hence the need for functions like fflush). I don't think that WriteFile buffers the write operation so every time you call WriteFile, an I/O operation gets triggered whereas with fwrite, the I/O gets triggered when the buffer has reached a certain size.
As you can see from your measurements, the buffered I/O tends to be more efficient...
I might be crazy, but wouldn't it be easier to just write a replacement for tmpfile that uses fopen(temporaryname, "wbTD+"), where you generate your own temporaryname?
At least then you don't have to worry about reimplementing <file.h>.
I'm still a little unclear on what the question is. You start out by talking about managing the lifetime of a temporary file and then jump to wrapping an entire file i/o interface. Are you asking about how to manage a temporary file without the performance penalty of wrapping all the file I/O? Or are you interested in how the CRT functions can be faster than the WinAPI functions they are built on top of?
Several of the comparisons being made between the C run-time functions and the WinAPi functions are of the apples and oranges variety.
The C run-time functions buffer the I/O in library memory. There is another layer of buffering (and caching) in the OS.
fflush flushes the data from the library buffers to the OS. It may go directly to disk, or it may go to OS buffers for later writing. FlushFileBuffers gets data from the OS buffers onto the disk, which generally takes longer than moving data from the library buffers to the OS buffers.
Unaligned writes are expensive. The OS buffers make unaligned writes possible, but they don't really speed up the process. The library buffers may accept several writes before pushing data to the OS, effectively reducing the number of unaligned writes to the disk.
It's also possible (though this is just a guess) that the library routines are taking advantage of overlapped (asynchronous) I/O to the disk, where your straight-to-WinAPI implementation is all synchronous.