Comparing time taken to read() from file system

Comparing time taken to read() from file system - c

I have created a program that measures that time taken for a read() to be performed on a file and I do this several times to determine the block size of my file system.
My question:
After plotting this data, everytime I try it, no matter the size I am reading in each iteration, the first read takes significantly longer time compared to any other read. I know that once a block has completed reading, the time to do the next read in the new block will take a bit more time (which I have observed in my plot) but this first read value is much higher than that too.
Does anyone have a filesystems/O.S. based answer to why this is the case?

I can think of a couple of reasons why this might be the case.
The file system might cache (pre-fetch) the data read from disk, so that even if it only returns (say) 1 block to your program, it might have actually read multiple blocks from the disk; so that the next time you do a read, you're actually just pulling more from that cached data. It's also perhaps possible that doing the first read might involve the read head having to move to the start of the file? This is probably very file-system-dependent. I think that cacheing is more likely to be the cause?

Related

Why fread does have thread safe requirements which slows down its call

I am writing a function to read binary files that are organized as a succession of (key, value) pairs where keys are small ASCII strings and value are int or double stored in binary format.
If implemented naively, this function makes a lot of call to fread to read very small amount of data (usually no more than 10 bytes). Even though fread internally uses a buffer to read the file, I have implemented my own buffer and I have observed speed up by a factor of 10 on both Linux and Windows. The buffer size used by fread is large enough and the function call cannot be responsible for such a slowdown. So I went and dug into the GNU implementation of fread and discovered some lock on the file, and many other things such as verifying that the file is open with read access and so on. No wonder why fread is so slow.
But what is the rationale behind fread being thread-safe where it seems that multiple thread can call fread on the same file which is mind boggling to me. These requirements make it slow as hell. What are the advantages?

Imagine you have a file where each 5 bytes can be processed in parallel (let's say, pixel by pixel in an image):
123456789A
One thread needs to pick 5 bytes "12345", the next one the next 5 bytes "6789A".
If it was not thread-safe different threads could pick-up wrong chunks. For example: "12367" and "4589A" or even worst (unexpected behaviour, repeated bytes or worst).

As suggested by nemequ:
Note that if you're on glibc you can use the _unlocked variants (*e.g., fread_unlocked). On Windows you can define _CRT_DISABLE_PERFCRIT_LOCKS

Stream I/O is already as slow as molasses. Programmers think that a read from main memory (1000x longer than a CPU cycle) is ages. A read from the physical disk or a network may as well be eternity.
I don't know if that's the #1 reason why the library implementers were ok with adding the lock overhead, but I guarantee it played a significant part.
Yes, it slows it down, but as you discovered, you can manually buffer the read and use your own handling to increase the speed when performance really matters. (That's the key--when you absolutely must read the data as fast as possible. Don't bother manually buffering in the general case.)
That's a rationalization. I'm sure you could think of more!

Read a file after write and closing it in C

My code does the following
do 100 times of
open a new file; write 10M data; close it
open the 100 files together, read and merge their data into a larger file
do steps 1 and 2 many times in a loop
I was wondering if I can keep the 100 open w/o opening and closing them too many times. What I can do is fopen them with w+. After writing I set position the beginning to read, after read I set position to the beginning to write, and so on.
The questions are:
if I read after write w/o closing, do we always read all the written data
would this save some overhead? File open and close must have some overhead, but is this overhead large enough to save?
Bases on the comments and discussion I will talk about why I need to do this in my work. It is also related to my other post
how to convert large row-based tables into column-based tables efficently
I have a calculation that generates a stream of results. So far the results are saved in a row-storage table. This table has 1M columns, each column could be 10M long. Actually each column is one attribute the calculation produces. At the calculation runs, I dump and append the intermediate results the table. The intermediate results could be 2 or 3 double values at each column. I wanted to dump it soon because it already consumes >16M memory. And the calculate needs more memoy. This ends up a table like the following
aabbcc...zzaabbcc..zz.........aabb...zz
A row of data are stored together. The problem happens when I want to analyze the data column by column. So I have to read 16 bytes then seek to the next row for reading 16 bytes then keep on going. There are too many seeks, it is much slower than if all columns are stored together so I can read them sequentially.
I can make the calculation dump less frequent. But to make the late read more efficent. I may want to have 4K data stored together since I assume each fread gets 4K by default even if I read only 16bytes. But this means I need to buffer 1M*4k = 4G in memory...
So I was thinking if I can merge fragment datas into larger chunks like that the post says
how to convert large row-based tables into column-based tables efficently
So I wanted to use files as offline buffers. I may need 256 files to get a 4K contiguous data after merge if each file contains 1M of 2 doubles. This work can be done as an asynchronous way in terms of the main calculation. But I wanted to ensure the merge overhead is small so when it runs in parallel it can finish before the main calculation is done. So I came up with this question.
I guess this is very related to how column based data base is constructed. When people create them, do they have the similar issues? Is there any description of how it works on creation?

You can use w+ as long as the maximum number of open files on your system allows it; this is usually 255 or 1024, and can be set (e.g. on Unix by ulimit).
But I'm not too sure this will be worth the effort.
On the other hand, 100 files of 10M each is one gigabyte; you might want to experiment with a RAM disk. Or with a large file system cache.
I suspect that huger savings might be reaped by analyzing your specific problem structure. Why is it 100 files? Why 10 M? What kind of "merge" are you doing? Are those 100 files always accessed in the same order and with the same frequency? Could some data be kept in RAM and never be written at all?
Update
So, you have several large buffers like,
ABCDEFG...
ABCDEFG...
ABCDEFG...
and you want to pivot them so they read
AAA...
BBB...
CCC...
If you already have the total size (i.e., you know that you are going to write 10 GB of data), you can do this with two files, pre-allocating the file and using fseek() to write to the output file. With memory-mapped files, this should be quite efficient. In practice, row Y, column X of 1,000,000 , has been dumped at address 16*X in file Y.dat; you need to write it to address 16*(Y*1,000,000 + X) into largeoutput.dat.
Actually, you could write the data even during the first calculation. Or you could have two processes communicating via a pipe, one calculating, one writing to both row-column and column-row files, so that you can monitor the performances of each.
Frankly, I think that adding more RAM and/or a fast I/O layer (SSD maybe?) could get you more bang for the same buck. Your time costs too, and the memory will remain available after this one work has been completed.

Yes. You can keep the 100 files open without doing the opening-closing-opening cycle. Most systems do have a limit on the number of open files though.
if I read after write w/o closing, do we always read all the written data
It depends on you. You can do an fseek goto wherever you want in the file and read data from there. It's all the way you and your logic.
would this save some overhead? File open and close must have some overhead, but is this overhead large enough to save?
This would definitely save some overhead, like additional unnecessary I/O operations and also in some systems, the content which you write to file is not immediately flushed to physical file, it may be buffered and flushed periodically and or done at the time of fclose.
So, such overheads are saved, but, the real question is what do you achieve by saving such overheads? How does it suit you in the overall picture of your application? This is the call which you must take before deciding on the logic.

C read part of file into cache

I have to do a program (for Linux) where there's an extremely large index file and I have to search and interpret the data from the file. Now the catch is, I'm only allowed to have x-bytes of the file cached at any time (determined by argument) so I have to remove certain data from the cache if it's not what I'm looking for.
If my understanding is correct, fopen (r) doesn't put anything in the cache, only when I call getc or fread(specifying size) does it get cached.
So my question is, lets say I use fread and read 100 bytes but after checking it, only 20 of the 100 bytes contains the data I need; how would I remove the useless 80 bytes from cache (or overwrite it) in order to read more from the file.
EDIT By caching I mean data stored in memory, which makes the problem easier

fread's first argument is a pointer to a block of memory. So the way to go about this is to set that pointer to the stuff you want to over write. For example lets say you want to keep bytes 20-40 and overwrite everything else. You could either a) invoke fread on start with a length of 20 then invoke it again on buffer[40] with a size of 60. or b) You could start by defragmenting (ie copy the bytes you want to keep to the start) then invoke fread with a pointer to the next section.

Why do you want to micromanage the cache? Secondly, what makes you think you can? No argument specified on the command line of your program can control what the cache manager does internally - it may decide to read an entire file into RAM, it may decide to read none of it, or it may decide to throw a party. Any control you have over it would use low-level APIs/syscalls and would not very granular.

I think you might be confused about the requirements, or maybe the person who gave them to you. You seem to be referring to the cache managed by the operating system, which there is no need for an application to ever have to worry about. The operating system will make sure it doesn't grow too large automatically.
The other meaning of "cache" is the one you create yourself, the char* buffer or whatever you create to temporarily hold the data in memory while you process it. This one should be fairly easy to manage yourself simply by not allocating too much memory for that buffer.

To discard the read buffer of a file opened with fopen(), you can use fflush(). Also note that you can control the buffer size with setvbuf().
You should consider using open/read (instead of fopen/fread) if you must have exact control over buffering, though.

reading data from filesystem vs compiling the data directly into program

I have a file (10-20MB) containing data, where each line is a single piece of data.
I have a C program that reads the file from the filesystem, and then based on command line input, it reads each line of the file, does a calculation on each line to determine if that line should be returned, and then return a subset of the data.
Assume that the program does an fread and reads the entire file into memory at the beginning, and then parses it directly from memory.
Would the program execute faster if, instead of reading it from the filesystem, I compiled the data into the program directly, by creating an array such as the following?
char *dataArray[] = {"data1", "data2", "data3"....};
Since the OS needs to read the entire binary from the filesystem, my gut feeling is that the execution time of both techniques would be similar, since reading from the filesystem would be the high order bit. However, would anyone have more definitive ideas on this?

Defining everything as a program literal will certainly be faster.
You do not need the relatively slow "open" call for the data file and you don't need to move the data from the buffer to your storage.
This was a common optimization circa. 1970, and every programming/coding style book since then stongly recommends you do not do this. The actual performance increase is minimal and what you gain in performance you lose in maintainability and flexibility.
Should you want a quick maintainable optimisation for this type of problem then look at the "mmap" call which makes the buffer directly available to your program and minimises data movement.

I doubt the difference in execution time will be significant, but from a memory utilization standpoint, putting the data in the executable (and qualifying it const appropriately) will make a big difference.
If you read 10-20 megs of data from a file into memory allocated (e.g. via malloc) in your program, the data initially exists in two places in memory: the filesystem cache, and your program's private memory. The former copy can be discarded if memory is tight, but the latter occupies physical memory or swap permanently until it's freed.
If on the other hand the 10-20 megs of data are part of your program's image (in the executable file), the data will be demand-paged, and can be discarded whenever needed because the OS knows it can reload the pages if it needs them again.

Read a line of input faster than fgets?

I'm writing a program where performance is quite important, but not critical. Currently I am reading in text from a FILE* line by line and I use fgets to obtain each line. After using some performance tools, I've found that 20% to 30% of the time my application is running, it is inside fgets.
Are there faster ways to get a line of text? My application is single-threaded with no intentions to use multiple threads. Input could be from stdin or from a file. Thanks in advance.

You don't say which platform you are on, but if it is UNIX-like, then you may want to try the read() system call, which does not perform the extra layer of buffering that fgets() et al do. This may speed things up slightly, on the other hand it may well slow things down - the only way to find out is to try it and see.

Use fgets_unlocked(), but read carefully what it does first
Get the data with fgetc() or fgetc_unlocked() instead of fgets(). With fgets(), your data is copied into memory twice, first by the C runtime library from a file to an internal buffer (stream I/O is buffered), then from that internal buffer to an array in your program

Read the whole file in one go into a buffer.
Process the lines from that buffer.
That's the fastest possible solution.

You might try minimizing the amount of time you spend reading from the disk by reading large amounts of data into RAM then working on that. Reading from disk is slow, so minimize the amount of time you spend doing that by reading (ideally) the entire file once, then working on it.
Sorta like the way CPU cache minimizes the time the CPU actually goes back to RAM, you could use RAM to minimize the number of times you actually go to disk.

Depending on your environment, using setvbuf() to increase the size of the internal buffer used by file streams may or may not improve performance.
This is the syntax -
setvbuf (InputFile, NULL, _IOFBF, BUFFER_SIZE);
Where InputFile is a FILE* to a file just opened using fopen() and BUFFER_SIZE is the size of the buffer (which is allocated by this call for you).
You can try various buffer sizes to see if any have positive influence. Note that this is entirely optional, and your runtime may do absolutely nothing with this call.

If the data is coming from disk, you could be IO bound.
If that is the case, get a faster disk (but first check that you're getting the most out of your existing one...some Linux distributions don't optimize disk access out of the box (hdparm)), stage the data into memory (say by copying it to a RAM disk) ahead of time, or be prepared to wait.
If you are not IO bound, you could be wasting a lot of time copying. You could benefit from so-called zero-copy methods. Something like memory map the file and only access it through pointers.
That is a bit beyond my expertise, so you should do some reading or wait for more knowledgeable help.
BTW-- You might be getting into more work than the problem is worth; maybe a faster machine would solve all your problems...
NB-- It is not clear that you can memory map the standard input either...

If the OS supports it, you can try asynchronous file reading, that is, the file is read into memory whilst the CPU is busy doing something else. So, the code goes something like:

start asynchronous read
loop:
wait for asynchronous read to complete
if end of file goto exit
start asynchronous read
do stuff with data read from file
goto loop
exit:
If you have more than one CPU then one CPU reads the file and parses the data into lines, the other CPU takes each line and processes it.