speed comparison between fgetc/fputc and fread/fwrite in C

So (just for fun), I was trying to write a C program to copy a file. I read around and it seems that all the functions that read from a stream call fgetc() (I hope that is true?), so I used that function:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define FILEr "img1.png"
#define FILEw "img2.png"
int main(void)
{
    clock_t start, diff;
    int msec;
    int c;
    FILE *fr, *fw;

    fr = fopen(FILEr, "r");
    fw = fopen(FILEw, "w");

    start = clock();
    /* copy one byte per call; stop on EOF instead of testing feof()
       after the fact, which would copy a spurious extra byte */
    while ((c = fgetc(fr)) != EOF)
        fputc(c, fw);
    diff = clock() - start;

    msec = diff * 1000 / CLOCKS_PER_SEC;
    printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);

    fclose(fr);
    fclose(fw);
    return 0;
}
This gave a run time of 140 ms for this file on a 2.10 GHz Core 2 Duo T6500 Dell Inspiron laptop.
However, when I try fread/fwrite, the run time keeps dropping as I increase the number of bytes transferred per call (i.e. the variable st in the following code), bottoming out at around 10 ms! Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define FILEr "img1.png"
#define FILEw "img2.png"
int main(void)
{
    clock_t start, diff;
    // number of bytes copied at each step
    size_t st = 10000;
    size_t n;
    int msec;
    FILE *fr, *fw;
    // buffer for the data that is read
    char *x = malloc(st);

    fr = fopen(FILEr, "r");
    fw = fopen(FILEw, "w");

    start = clock();
    // write only as many bytes as fread actually returned, so the last
    // (short) chunk is not padded with stale buffer contents
    while ((n = fread(x, 1, st, fr)) > 0)
        fwrite(x, 1, n, fw);
    diff = clock() - start;

    msec = diff * 1000 / CLOCKS_PER_SEC;
    printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);

    fclose(fr);
    fclose(fw);
    free(x);
    return 0;
}
Why is this happening? I.e. if fread is effectively just multiple calls to fgetc, why the speed difference?
EDIT: clarified that "increasing the number of bytes" refers to the variable st in the second snippet

fread() is not calling fgetc() to read each byte.
It behaves as if fgetc() had been called repeatedly, but it has direct access to the buffer that fgetc() reads from, so it can copy a larger quantity of data at once.
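As a rough illustration (this is not glibc's actual code, just a sketch of the idea), both calls pull bytes from the same internal buffer, but one of them moves whole runs in bulk:

#include <stdio.h>
#include <string.h>

/* Grossly simplified illustration -- NOT glibc's actual code -- of why
   fread() can beat repeated fgetc(): both serve bytes from the same
   internal buffer, but fread() copies a whole run with one memcpy(). */
struct toy_file {
    unsigned char buf[BUFSIZ];  /* data already fetched from the OS */
    size_t pos, len;            /* read cursor and bytes available  */
};

/* one function call per byte */
static int toy_getc(struct toy_file *f)
{
    if (f->pos == f->len)
        return EOF;             /* real code would refill via read(2) */
    return f->buf[f->pos++];
}

/* one function call per request, data copied in bulk */
static size_t toy_read(struct toy_file *f, void *out, size_t want)
{
    size_t have = f->len - f->pos;
    size_t n = want < have ? want : have;
    memcpy(out, f->buf + f->pos, n);
    f->pos += n;
    return n;
}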

You are forgetting about file buffering (inode, dentry and page caches).
Clear them before you run:
echo 3 > /proc/sys/vm/drop_caches
Backgrounder:
Benchmarking is an art. Refer to bonnie++, iozone and phoronix for proper filesystem benchmarking. Characteristically, bonnie++ won't allow a benchmark with a written volume of less than 2x the available system memory.
Why?
(answer: buffering effects!)

As sehe says, it's partly because of buffering, but there is more to it, and I'll explain why that is and, at the same time, why fgetc() gives you more latency.
fgetc() is called for every single byte that is read from the file.
fread() is called once for every n bytes read into the local buffer.
So for a 10 MiB file:
fgetc() is called 10 485 760 times,
while fread() with a 1 KiB buffer is called 10 240 times.
Let's say, for simplicity, that every function call takes 1 ms:
fgetc would take 10 485 760 ms = 10485.76 seconds ≈ 2.91 hours
fread would take 10 240 ms = 10.24 seconds
On top of that, the OS usually does the reading and the writing on the same device; I suppose your example uses a single hard disk. When reading your source file, the OS moves the hard disk heads over the spinning platters to seek to the file, reads 1 byte and puts it in memory, then moves the read/write heads again to wherever the OS and the disk controller agreed to place the destination file, and writes 1 byte from memory. For the example above, this happens over 10 million times for each file, totaling over 20 million times; with the buffered version it happens a grand total of only about 20 000 times.
Besides that, when reading from the disk the OS keeps a few extra KiB of disk data in memory for performance, and this can speed up the program even with the less efficient fgetc, because the program then reads from the OS's memory instead of going directly to the hard disk. This is what sehe's response refers to.
Depending on your machine configuration/load/OS/etc. your results from reading and writing can vary a lot, hence his recommendation to empty the disk caches to get more meaningful results.
When the source and destination files are on different HDDs, things are a lot faster. With SSDs, I'm not really sure whether reads and writes are mutually exclusive.
Summary: every function call has a certain overhead, reading from an HDD has overheads of its own, and caches/buffers help speed things up.
Other info
http://en.wikipedia.org/wiki/Disk_read-and-write_head
http://en.wikipedia.org/wiki/Hard_disk#Components

stdio functions will fill a read buffer, of size "BUFSIZ" as defined in stdio.h, and will only make one read(2) system call every time that buffer is drained. They will not do an individual read(2) system call for every byte consumed -- they read large chunks. BUFSIZ is typically something like 1024 or 4096.
You can also adjust that buffer's size, if you wish, to increase it -- see the man pages for setbuf/setvbuf/setbuffer on most systems -- though that is unlikely to make a huge difference in performance.
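If you want to try it, a sketch of enlarging the stdio buffer with setvbuf() might look like this (the 1 MiB size is an arbitrary choice for illustration):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fr = fopen("img1.png", "rb");
    if (!fr)
        return EXIT_FAILURE;

    /* Must be called after fopen() and before any other I/O on the
       stream; NULL lets the library allocate the buffer itself. */
    if (setvbuf(fr, NULL, _IOFBF, 1 << 20) != 0)
        perror("setvbuf");

    /* ... read from fr as usual ... */
    fclose(fr);
    return 0;
}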
On the other hand, as you note, you can make a read(2) system call of arbitrary size by setting that size in the call, though you get diminishing returns with that at some point.
BTW, you might as well use open(2) and not fopen(3) if you are doing things this way. There is little point in fopen'ing a file you are only going to use for its file descriptor.
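A sketch of the same copy done directly on file descriptors, under the assumption that error handling is kept minimal:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int in  = open("img1.png", O_RDONLY);
    int out = open("img2.png", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    char buf[1 << 16];
    ssize_t n;

    if (in < 0 || out < 0)
        return 1;

    /* read(2)/write(2) may return short counts; a robust version
       would loop on the write side as well */
    while ((n = read(in, buf, sizeof buf)) > 0)
        write(out, buf, n);

    close(in);
    close(out);
    return 0;
}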

Related

Why fread has thread-safety requirements that slow down its calls

I am writing a function to read binary files that are organized as a succession of (key, value) pairs, where the keys are small ASCII strings and the values are ints or doubles stored in binary format.
Implemented naively, this function makes a lot of calls to fread to read very small amounts of data (usually no more than 10 bytes). Even though fread internally uses a buffer to read the file, I implemented my own buffer and observed a speed-up by a factor of 10 on both Linux and Windows. The buffer size used by fread is large enough, and the function-call overhead alone cannot be responsible for such a slowdown. So I dug into the GNU implementation of fread and discovered a lock on the file, plus many other things such as checking that the file is open with read access and so on. No wonder fread is so slow.
But what is the rationale behind fread being thread-safe? The idea of multiple threads calling fread on the same file is mind-boggling to me. These requirements make it slow as hell. What are the advantages?
Imagine you have a file where each 5 bytes can be processed in parallel (let's say, pixel by pixel in an image):
123456789A
One thread needs to pick up 5 bytes, "12345", and the next thread the following 5 bytes, "6789A".
If it were not thread-safe, different threads could pick up the wrong chunks, for example "12367" and "4589A", or even worse (unexpected behaviour, repeated bytes, or worse).
As suggested by nemequ:
Note that if you're on glibc you can use the _unlocked variants (e.g. fread_unlocked). On Windows you can define _CRT_DISABLE_PERFCRIT_LOCKS.
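For illustration, a sketch of what that can look like with glibc (fread_unlocked is a GNU extension, flockfile/funlockfile are POSIX; the function name and record layout here are made up):

#define _GNU_SOURCE          /* for fread_unlocked on glibc */
#include <stdio.h>

/* Read many small records while taking the stream lock only once.
   Assumes no other thread touches fp while the lock is held. */
static size_t read_records(FILE *fp, unsigned char *dst, size_t nrec, size_t recsz)
{
    size_t done = 0;

    flockfile(fp);                           /* lock once ...   */
    while (done < nrec &&
           fread_unlocked(dst + done * recsz, recsz, 1, fp) == 1)
        done++;
    funlockfile(fp);                         /* ... unlock once */

    return done;
}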
Stream I/O is already as slow as molasses. Programmers think that a read from main memory (1000x longer than a CPU cycle) is ages. A read from the physical disk or a network may as well be eternity.
I don't know if that's the #1 reason why the library implementers were ok with adding the lock overhead, but I guarantee it played a significant part.
Yes, it slows it down, but as you discovered, you can manually buffer the read and use your own handling to increase the speed when performance really matters. (That's the key--when you absolutely must read the data as fast as possible. Don't bother manually buffering in the general case.)
That's a rationalization. I'm sure you could think of more!
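For what it's worth, a minimal sketch of the kind of manual buffering layer the question describes might look like this (the names and the 64 KiB size are made up for illustration):

#include <stdio.h>
#include <string.h>

struct rdbuf {
    FILE *fp;
    unsigned char buf[64 * 1024];
    size_t pos, len;
};

/* Serve small requests from the local buffer, refilling it with one
   big fread() when it runs dry. Records larger than the buffer are
   not handled in this sketch. */
static size_t rdbuf_read(struct rdbuf *r, void *out, size_t want)
{
    if (r->len - r->pos < want) {
        /* slide leftover bytes to the front and refill */
        memmove(r->buf, r->buf + r->pos, r->len - r->pos);
        r->len -= r->pos;
        r->pos = 0;
        r->len += fread(r->buf + r->len, 1, sizeof r->buf - r->len, r->fp);
    }
    if (r->len - r->pos < want)
        want = r->len - r->pos;              /* short read at EOF */
    memcpy(out, r->buf + r->pos, want);
    r->pos += want;
    return want;
}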

Long delay hiccups for logging stdout to file

I have a C program that writes 3 lines to stdout every 10 ms. If I redirect the output to a file (using >) there will be long delays (60 ms) in the running of the program. The delays are periodic (say, every 5 seconds).
If I just let it write to console or redirect to /dev/null, there is no problem.
I suspected this was a stdout buffering problem, but fflush(stdout) didn't solve it.
How can I solve the issue?
If I redirect the output to a file (using >) there will be long delays (60 ms) in the running of the program.
That's because when stdout is a terminal device, it is usually (although not required) line-buffered, that is, the output buffer is flushed when a newline character is written, whereas in the case of regular files, output is fully buffered, meaning the buffers are flushed either when they are full or you close the file (or you explicitly call fflush(), of course).
fflush(stdout) may not be enough for you because that only flushes the standard I/O library buffers, but the kernel also buffers and delays writes to disk. You can call fsync() on the file descriptor to flush the modified buffer cache pages to disk after calling fflush(), as in fsync(STDOUT_FILENO).
Be careful not to call fsync() without calling fflush() first.
UPDATE: You can also try sync(), which, unlike fsync(), does not block waiting for the underlying writes to return. Or, as suggested in another answer, fdatasync() may be a good choice because it avoids the overhead of updating file times.
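Putting the pieces together, the flush sequence might look like this (a sketch with error checking omitted):

#include <stdio.h>
#include <unistd.h>

/* Flush the stdio buffer first, then ask the kernel to push the dirty
   pages to the device. Order matters: fsync() before fflush() would
   miss whatever is still sitting in the stdio buffer. */
static void flush_stdout_to_disk(void)
{
    fflush(stdout);
    fsync(STDOUT_FILENO);    /* or fdatasync() to skip the metadata updates */
}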
You need to use fsync. The following should help:
fsync(fileno(stdout))
Note that the Linux kernel will still buffer and limit I/O according to its internal scheduler limits. Running as root and setting a very low nice value might make a difference if you're not getting the frequency you want.
If it's still too slow, try using fdatasync instead. Every fflush and fsync causes the filesystem to update the inode metadata (file size, access time, etc.) as well as the data itself. If you know ahead of time how much data you'll be writing, in blocks, you can try the following trick:
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char **argv)
{
    FILE *fp = fopen("test.txt", "w");
    const char *line = "Test\n";
    char fill[500] = {0};   /* zero-filled block used to pre-extend the file */

    fwrite(fill, 1, 100 * strlen(line), fp);
    fflush(fp);
    fsync(fileno(fp));
    rewind(fp);

    for (int i = 0; i < 100; i++) {
        fwrite(line, strlen(line), 1, fp);
        fflush(fp);
        fdatasync(fileno(fp));
    }

    fclose(fp);
    return 0;
}
The first fwrite call writes 5*100 zero bytes to the file in one chunk and fsyncs, so that the data is on disk and the inode information is updated. Now we can write up to 500 bytes to the file without touching the filesystem metadata again. rewind(3) moves the file position back to the beginning of the file, so we overwrite the data without changing the file size recorded in the inode.
Timing that program gives the following:
$ time ./fdatasync
./fdatasync 0.00s user 0.01s system 1% cpu 0.913 total
So it ran fdatasync and sync'ed to disk 100 times in 0.913 seconds, which averages out to ~9ms per write & fdatasync call.
It could just be that every 5 seconds you are filling up your disk buffer, and there is a latency spike due to flushing to the actual disk. Check with iostat.

Block Linux read(2) until all of count bytes have arrived

I'm using read(2) to read from a file (/dev/random, where data arrives very slowly).
However, read() returns after reading only a few bytes, while I'd like it to wait until the specified number of bytes has been read (or an error has occurred), so the return value should always be count, or -1.
Is there any way to enable this behaviour? The open(2) and read(2) manpages do not contain any useful information on the topic, nor have I found any information about it on the Internet.
I am fully aware of the workaround of just putting the read() inside a while loop and calling it until all the data has been read. I'd just like to know if this can be achieved in a proper way that yields deterministic behaviour and involves only O(1) syscalls, instead of the nondeterministic O(n) of the while-loop solution.
The following minimal example reproduces the problem.
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    int fd = open("/dev/random", O_RDONLY);
    char buf[128];
    ssize_t bytes = read(fd, buf, sizeof(buf));
    printf("Bytes read: %zd\n", bytes); // output is random, usually 8
    close(fd);
    return 0;
}
While read() can be interrupted by a signal before the requested amount of data arrives, what you want cannot really be done without a loop.
You have to check the return value and count bytes, unfortunately.
And yes, the easiest way would be to write a wrapping function.
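Such a wrapper might look like this (a common sketch, often called readn; it still loops internally, since there is no single-syscall way to get this guarantee from an ordinary read()):

#include <unistd.h>
#include <errno.h>

/* Keep calling read(2) until count bytes have arrived, EOF is hit,
   or an error other than EINTR occurs. Returns bytes read, or -1. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t got = 0;

    while (got < count) {
        ssize_t n = read(fd, (char *)buf + got, count - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, just retry */
            return -1;
        }
        if (n == 0)
            break;                   /* EOF */
        got += n;
    }
    return (ssize_t)got;
}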
As everyone has said,
There's no way to guarantee that 128 bytes of randomness are available before your read returns, and
The overhead involved in getting eight bytes at a time is trivial compared to the amortized cost of generating the eight bytes; consequently,
You should remember that entropy comes at a huge cost and take that into account when consuming it.
Nonetheless, no answer to this question would be complete without noting that in man 4 random (on a vaguely recent Linux distro) you should find the following information:
The files in the directory /proc/sys/kernel/random
(present since 2.3.16) provide an additional interface
to the /dev/random device.
...
The file read_wakeup_threshold contains the number of bits of
entropy required for waking up processes that sleep waiting
for entropy from /dev/random. The default is 64.
That is, 64 bits, which is eight bytes. With superuser privileges you could increase this value, but IMHO increasing it to 1024 and then expecting your machine to keep working as normal is probably pretty optimistic. I don't know all the things that want a bit of entropy, but I've certainly noticed that my entropy pool goes up and down, so I know something wants it, and I strongly suspect that whatever that something is would not be happy having to wait for 1024 bits to be available. Anyway, you now have a bit of rope...
From the documentation, /dev/random does its best to return the most reliable randomized data it can, and that limits the number of bytes it returns in one read.
But reading /dev/urandom (notice the 'u') will return as much data as you requested (the buffer size), sometimes with less-random data.
Here is a useful link
About the read() behaviour, I'm quite sure this cannot be changed: read() returns the amount of data that the underlying plumbing (for example disk + driver + ...) decided to return; it's by-design behaviour. The way to do things is, as you said, to loop until you have received as much data as expected.

Overhead of times() system call - relative to file operations

What is the relative overhead of calling times() versus file operations like reading a line with fread()?
I realize this likely differs from OS to OS and depends on how long the line is, where the file is located, if it's really a pipe that's blocked (it's not), etc.
Most likely the file is not local but is on a mounted NFS drive somewhere on the local network. The common case is a line that is 20 characters long. If it helps, assume Linux kernel 2.6.9. The code will not be run on Windows.
I'm just looking for a rough guide. Is it on the same order of magnitude? Faster? Slower?
Ultimate goal: I'm looking at implementing a progress callback routine, but I don't want to call it too frequently (because the callback is likely very expensive). The majority of the work is reading a text file (line by line) and doing something with each line. Unfortunately, some of the lines are very long, so simply invoking the callback every N lines isn't effective in the all-too-common pathological cases.
I'm avoiding writing a benchmark because I'm afraid of writing it wrong and am hoping the wisdom of the crowd is greater than my half-baked tests.
fread() is a C library function, not a system call. fread(), fwrite(), fgets() and friends are all buffered I/O by default (see setbuf) which means that the library allocates a buffer which decreases the frequency with which read() and write() system calls need to be made.
This means that if you're reading sequentially from the file, the library will only issue a system call every, say, 100 reads (subject to the buffer size and how much data you read at a time).
When the read() and write() system calls are made, however, they will definitely be slower than calling times(), simply due to the volume of data that needs to be exchanged between your program and the kernel. If the data is cached in the OS's buffers (e.g. it was written by another process on the same machine moments ago) then it will still be pretty fast. If the data is not cached, then you will have to wait for I/O (be it to the disk or over the network), which is very slow in comparison.
If the data is coming fresh over NFS, then I'd be pretty confident that calling times() will be faster than fread() on average.
On Linux, you could write a little program that does lots of calls to times() and fread() and measure the syscall times with strace -c
e.g.:
for (i = 0; i < RUNS; i++) {
    times(&t_buf);
    fread(buf, 1, BUF, fh);
}
This is with BUF set to 4096 (fread will actually call read() every time):
# strace -c ./time_times
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
59.77 0.001988 0 100000 read
40.23 0.001338 0 99999 times
and this is with BUF set to 16:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.00 0.001387 0 99999 times
1.00 0.000014 0 392 read
times() simply reads kernel-maintained, process-specific data. That data is maintained by the kernel anyway, to supply information for the wait() system call when the process exits. So the data is always kept up to date regardless of whether times() ever gets called, and the extra overhead of calling times() is really low.
fread(), fwrite(), etc. call the underlying system calls read() and write(), which invoke drivers. The drivers then place data in a kernel buffer. This is far more costly in terms of resources than invoking times().
Is this what you are asking?
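For reference, a minimal sketch of reading that data with times() (CPU times come back in clock ticks; sysconf(_SC_CLK_TCK) gives ticks per second):

#include <sys/times.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    /* times() also returns wall-clock ticks since an arbitrary point */
    clock_t wall = times(&t);

    printf("user: %.3fs  system: %.3fs  (wall stamp %ld)\n",
           (double)t.tms_utime / ticks_per_sec,
           (double)t.tms_stime / ticks_per_sec,
           (long)wall);
    return 0;
}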

Probing for filesystem block size

I'm going to first admit that this is for a class project, since it will be pretty obvious. We are supposed to do reads to probe for the block size of the filesystem. My problem is that the time taken to do this appears to increase linearly, with none of the steps I would expect.
I am timing the read like this:
double startTime = getticks();
read = fread(x, 1, toRead, fp);
double endTime = getticks();
where getticks uses the rdtsc instruction. I am afraid caching/prefetching is causing the reads to take no time during the fread. I tried creating a random file between each execution of my program, but that is not alleviating my problem.
What is the best way to accurately measure the time taken for a read from disk? I am pretty sure my block size is 4096, but how can I get data to support that?
The usual way of determining filesystem block size is to ask the filesystem what its blocksize is.
#include <sys/statvfs.h>
#include <stdio.h>
int main() {
    struct statvfs fs_stat;
    statvfs(".", &fs_stat);
    printf("%lu\n", fs_stat.f_bsize);
    return 0;
}
But if you really want, open(…,…|O_DIRECT) or posix_fadvise(…,…,…,POSIX_FADV_DONTNEED) will try to let you bypass the kernel's buffer cache (not guaranteed).
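For example, a sketch that asks the kernel to drop the cached pages for one file before timing a read ("testfile" is a placeholder; posix_fadvise is advisory, so the kernel may ignore it):

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("testfile", O_RDONLY);   /* "testfile" is a placeholder */
    char buf[4096];

    if (fd < 0)
        return 1;

    /* Ask the kernel to discard any cached pages for this file so the
       next read has to go to the disk. Advisory only. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    read(fd, buf, sizeof buf);             /* time this call */
    close(fd);
    return 0;
}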
You may want to use the system calls (open(), read(), write(), ...) directly to reduce the impact of the buffering done by the FILE* layer.
Also, you may want to use synchronous I/O somehow. One way is opening the file with the O_SYNC flag set (or O_DIRECT, as per ephemient's reply).
Quoting the Linux open(2) manual page:
O_SYNC The file is opened for synchronous I/O. Any write(2)s on the
resulting file descriptor will block the calling process until
the data has been physically written to the underlying hardware.
But see NOTES below.
Another option would be mounting the filesystem with -o sync (see mount(8)) or setting the S attribute on the file using the chattr(1) command.
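A sketch of the O_SYNC route (the filename is a placeholder; each write(2) then blocks until the data reaches the device):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Every write() on this descriptor blocks until the data (and the
       associated metadata) has been pushed to the underlying device. */
    int fd = open("probe.bin", O_WRONLY | O_CREAT | O_SYNC, 0644);
    const char block[4096] = {0};

    if (fd < 0)
        return 1;

    write(fd, block, sizeof block);    /* time this call */
    close(fd);
    return 0;
}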
