Why the fwrite libc function is faster than the syscall write function? - c

After providing the same program which reads a random generated input file and echoes the same string it read to an output. The only difference is that on one side I'm providing the read and write methods from linux syscalls, and on the other side I'm using fread/fwrite.
Timing my application with an input of 10Mb in size and echoing it to /dev/null, and making sure the file is not cached, I've found that libc's fwrite is faster by a LARGE scale when using very small buffers (1 byte in case).
Here is my output from time, using fwrite:
real 0m0.948s
user 0m0.780s
sys 0m0.012s
And using the syscall write:
real 0m8.607s
user 0m0.972s
sys 0m7.624s
The only possibility that I can think of is that internally libc is already buffering my input... Unfortunately I couldn't find that much information around the web, so maybe the gurus here could help me out.

Timing my application with an input of
10Mb in size and echoing it to
/dev/null, and making sure the file in
not cached, I've found that libc's
frwite is faster by a LARGE scale when
using very small buffers (1 byte in
case).
fwrite works on streams, which are buffered. Therefore many small buffers will be faster because it won't run a costly system call until the buffer fills up (or you flush it or close the stream). On the other hand, small buffers being sent to write will run a costly system call for each buffer - that's where you're losing the speed. With a 1024 byte stream buffer, and writing 1 byte buffers, you're looking at 1024 write calls for each kilobyte, rather than 1024 fwrite calls turning into one write - see the difference?
For big buffers the difference will be small, because there will be less buffering, and therefore a more consistent number of system calls between fwrite and write.
In other words, fwrite(3) is just a library routine that collects up output into chunks, and then calls write(2). Now, write(2), is a system call which traps into the kernel. That's where the I/O actually happens. There is some overhead for simply calling into the kernel, and then there is the time it takes to actually write something. If you use large buffers, you will find that write(2) is faster because it eventually has to be called anyway, and if you are writing one or more times per fwrite then the fwrite buffering overhead is just that: more overhead.
If you want to read more about it, you can have a look at this document, which explains standard I/O streams.

write(2) is the fundamental kernel operation.
fwrite(3) is a library function that adds buffering on top of write(2).
For small (e.g., line-at-a-time) byte counts, fwrite(3) is faster, because of the overhead for just doing a kernel call.
For large (block I/O) byte counts, write(2) is faster, because it doesn't bother with buffering and you have to call the kernel in both cases.
If you look at the source to cp(1), you won't see any buffering.
Finally, there is one last consideration: ISO C vs Posix. The buffered library functions like fwrite are specified in ISO C whereas kernel calls like write are Posix. While many systems claim Posix-compatibility, especially when trying to qualify for government contracts, in practice it's specific to Unix-like systems. So, the buffered ops are more portable. As a result, a Linux cp will certainly use write but a C program that has to work cross-platform may have to use fwrite.

You can also disable buffering with setbuf() function. When the buffering is disabled, fwrite() will be as slow as write() if not slower.
More information on this subject can be found there: http://www.gnu.org/s/libc/manual/html_node/Controlling-Buffering.html

Related

Understanding low level file routines

I am going through Mark Burgess's "The GNU C Programming Tutorial". I have come across the following information:
Even though low-level fle routines do not use buffering, and once you call write, your data can be read from the file immediately, it may take up to a minute before your data is physically written to disk. (Page:142)
Firstly, is "it may take up to a minute(some time) before your data is written to disk" true?
Secondly, when low level file routines are not using buffering why will the delay take place?
There are two places where I/O buffering can occur (at least — it could be more than just two).
One is in the application; the standard I/O functions using FILE * use buffered I/O unless you use setvbuf() to prevent it.
The other is in the kernel. Disk I/O normally goes into the kernel buffer pool, and eventually gets written by the kernel to disk. There are ways around that (O_DIRECT on Linux; raw devices on classic Unix; etc). The key point is that the write() system call normally writes to he kernel buffer pool. The kernel takes responsibility for ensuring that the data is written to disk safely and correctly (journalling, …).
The kernel doesn't write everything to disk immediately because (a) you may add more changes to the data, (b) other people may need to read or write the data, (c) the disk drive may be busy writing something else at the other end of its 1 TiB of storage and it will take time to get the write head in position to take your data, and it would be better for the overall performance of the system if it scheduled other work before writing your changed buffer to disk. It will get written to disk. It is just not defined when, and it could be fractions of a second or multiple seconds or longer, though most often it will not take minutes for the data to be written to disk.
These days, there could also be buffering in the RAID controllers, and maybe in the individual disks inside the RAID setup, and maybe there's network buffering too if it is a remotely-mounted file system. Those add extra levels of buffering.
The read() and write() and related low-level I/O functions do not have any client-side (application) buffering — unlike the standard C I/O functions.
A file is said to be buffered, when its contents are not outputted or inputted directly. Instead, the file's bytes are written to a temporary buffer in memory.
For example, if you are reading from a file, you are reading from the buffer. Once you have read all the characters in the buffer, it is replenished with new bytes from the file. The reason for this indirectness, is that a memory read is much faster than a hard disk read.
The calls read and write are low-level, and do not perform buffering. The stdio.h calls like getc and putc, do use buffering. These higher-level APIs only call the low level ones, when the buffer must be replenished.
Writing to the hard drive is much slower than writing to RAM. When you write to a drive it writes to memory, but doesn't always write to the disk immediately. The data might not be written to disk until that part of memory needs to be overwritten to make room for something else. This is called a Write-Back cache.

Are fscanf and fprintf buffered in C?

I wish to write an efficient C program, that makes a copy of a file. There doesn't seem to be a function (such as rename) that performs this. I plan to use fscanf an fprintf in stdio.h, but their descriptions do not say how or if they are buffered. Are they buffered between different cache levels? (I assume disk to memory buffer is handled by the OS)
When you open a file with fopen, it will be fully buffered.
You can change the buffering before doing any other operations on the file, using setvbuf (reference).
Using any normal I/O functions on a FILE object will take advantage of the buffering.
If you are just copying the data, you will be doing sequential reads and writes, and will not necessarily need buffering. But doing that efficiently does require choosing an appropriate block size for I/O operations. Traditionally, this is related to the size of a disk sector (4096 bytes) but that value isn't future-proof. The default used by fopen is BUFSIZ.
As with any optimisation, construct actual tests to verify your performance gains (or losses).
In the end, for the fastest I/O you might have to use OS-specific APIs. The C I/O functions just map to the general case of those APIs, but there may be special performance settings for an OS that you cannot control through the C library. I certainly ran into this when writing a fast AVI writer for Windows. Using platform-specific I/O I was able to achieve the maximum read/write speed of the disk: twice the speed of buffered I/O (<stdio.h>) or the native AVI API, and about 20% faster than traditional C unbuffered I/O.
The printf and scanf family of functions are all part of the same buffered "interface". man 3 stdio:
The standard I/O library provides a simple and efficient buffered
stream I/O interface. Input and output is mapped into logical data
streams and the physical I/O characteristics are concealed. The
functions and macros are listed below; more information is
available from the individual man pages.
If you want to avoid buffering, you will have to use a different C library.

are fread and fwrite different in handling the internal buffer?

I keep on reading that fread() and fwrite() are buffered library calls. In case of fwrite(), I understood that once we write to the file, it won't be written to the hard disk, it will fill the internal buffer and once the buffer is full, it will call write() system call to write the data actually to the file.
But I am not able to understand how this buffering works in case of fread(). Does buffered in case of fread() mean, once we call fread(), it will read more data than we originally asked and that extra data will be stored in buffer (so that when 2nd fread() occurs, it can directly give it from buffer instead of going to hard disk)?
And I have following queries also.
If fread() works as I mention above, then will first fread() call read the data that is equal to the size of the internal buffer? If that is the case, if my fread() call ask for more bytes than internal buffer size, what will happen?
If fread() works as I mention above, that means at least one read() system call to kernel will happen for sure in case of fread(). But in case of fwrite(), if we only call fwrite() once during the program execution, we can't say for sure that write() system call be called. Is my understanding correct?
Will the internal buffer be maintained by OS?
Does fclose() flush the internal buffer?
There is buffering or caching at many different levels in a modern system. This might be typical:
C standard library
OS kernel
disk controller (esp. if using hardware RAID)
disk drive
When you use fread(), it may request 8 KB or so if you asked for less. This will be stored in user-space so there is no system call and context switch on the next sequential read.
The kernel may read ahead also; there are library functions to give it hints on how to do this for your particular application. The OS cache could be gigabytes in size since it uses main memory.
The disk controller may read ahead too, and could have a cache size up to hundreds of megabytes on smallish systems. It can't do as much in terms of read-ahead, because it doesn't know where the next logical block is for the current file (indeed it doesn't even know what file it is reading).
Finally, the disk drive itself has a cache, perhaps 16 MB or so. Like the controller, it doesn't know what file it is reading. For many years one disk block was 512 bytes, but it got a little larger (a few KB) recently with multi-terabyte disks.
When you call fclose(), it will probably deallocate the user-space buffer, but not the others.
Your understanding is correct. And any buffered fwrite data will be flushed when the FILE* is closed. The buffered I/O is mostly transparent for I/O on regular files.
But for terminals and other character devices you may care. Another instance where buffered I/O may be an issue is if you read from the file that one process is writing to from another process -- a common example is if a program writes text to a log file during operation, and the user runs a command like tail -f program.log to watch the content of the log file live. If the writing process has buffering enabled and it doesn't explicitly flush the log file, it will make it difficult to monitor the log file.

what is the point of using the setvbuf() function in c?

Why would you want to set aside a block of memory in setvbuf()?
I have no clue why you would want to send your read/write stream to a buffer.
setvbuf is not intended to redirect the output to a buffer (if you want to perform IO on a buffer you use sprintf & co.), but to tightly control the buffering behavior of the given stream.
In facts, C IO functions don't immediately pass the data to be written to the operating system, but keep an intermediate buffer to avoid continuously performing (potentially expensive) system calls, waiting for the buffer to fill before actually performing the write.
The most basic case is to disable buffering altogether (useful e.g. if writing to a log file, where you want the data to go to disk immediately after each output operation) or, on the other hand, to enable block buffering on streams where it is disabled by default (or is set to line-buffering). This may be useful to enhance output performance.
Setting a specific buffer for output can be useful if you are working with a device that is known to work well with a specific buffer size; on the other side, you may want to have a small buffer to cut down on memory usage in memory-constrained environments, or to avoid losing much data in case of power loss without disabling buffering completely.
In C files opened with e.g. fopen are by default buffered. You can use setvbuf to supply your own buffer, or make the file operations completely unbuffered (like to stderr is).
It can be used to create fmemopen functionality on systems that doesn't have that function.
The size of a files buffer can affect Standard library call I/O rates. There is a table in Chap 5 of Steven's 'Advanced Programming in the UNIX Environment' that shows I/O throughput increasing dramatically with I/O buffer size, up to ~16K then leveling off. A lot of other factor can influenc overall I/O throughtput, so this one "tuning" affect may or may not be a cureall. This is the main reason for "why" other than turning off/on buffering.
Each FILE structure has a buffer associated with it internally. The reason behind this is to reduce I/O, and real I/O operations are time costly.
All your read/write will be buffered until the buffer is full. All the data buffered will be output/input in one real I/O operation.
Why would you want to set aside a block of memory in setvbuf()?
For buffering.
I have no clue why you would want to send your read/write stream to a buffer.
Neither do I, but as that's not what it does the point is moot.
"The setvbuf() function may be used on any open stream to change its buffer" [my emphasis]. In other words it alread has a buffer, and all the function does is change that. It doesn't say anything about 'sending your read/write streams to a buffer". I suggest you read the man page to see what it actually says. Especially this part:
When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered many characters are saved up and written as a block; when it is line buffered characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin).

Write system call writes data to disk directly?

I've read couple of questions(here) related to this but I still have some confusion.
My understanding is that write system call puts the data into Buffered Cache(OS caches as referred in that question). When the Buffered Cache gets full it is written to the disk.
Buffered IO is further optimization on top of this. It caches in the C RTL buffers and when they get full a write system call issued to move the contents to Buffered Cache. If I use fflush then data related to this particular file that is present in the C RTL buffers as well as Buffered Cache is sent to the disk.
Is my understanding correct?
How the stdio buffers are flushed is depending on the standard C library you use. To quote from the Linux manual page:
Note that fflush() only flushes the user space buffers provided by the C library.
To ensure that the data is physically stored on disk the kernel buffers must be
flushed too, for example, with sync(2) or fsync(2).
This means that on a Linux system, using fflush or overflowing the buffer will call the write function. But the operating system may keep internal buffers, and not actually write the data to the device. To make sure the data is truly written to the device, use both fflush and the low-level fsync.
Edit: Answer rephrased.

Resources