Reduce the number of disk accesses while writing to a file in C

I am writing a multi-threaded application and as of now I have this idea. I have a FILE*[n], where n is a number determined at runtime. I open all n files for reading, and then multiple threads can read them. The computation on the data of each file is equivalent, i.e. under serial execution each file would remain in memory for about the same amount of time.
Each file can be arbitrarily large, so one should not assume that they can all be loaded into memory.
Now in such a scenario I want to reduce the number of disk I/Os that occur. It would be great if someone could suggest a shared memory model for such a scenario (I don't know whether I am already using one, because I have very little idea of how these things are implemented). I am not sure how I should achieve this. In other words, I just want to know what the most efficient model is for implementing such a scenario. I am using C.
EDIT: A more detailed scenario.
The actual problem is that I have n Bloom filters for the data contained in n files, and once all the elements from a file have been inserted into the corresponding Bloom filter I need to do membership testing. Since membership testing is a read-only process on the data files, I can read a file from multiple threads and the problem can easily be parallelized. The number of data files is fairly large (around 20k; note that the number of files equals the number of Bloom filters), so I chose to spawn a thread per Bloom filter, i.e. each Bloom filter has its own thread, which reads every other file one by one and tests the membership of that data against the Bloom filter. I want to minimize disk IO in such a case.

At the start, use the mmap() function to map the files into memory instead of opening/reading FILE*'s. After that, spawn the threads that read the files.
That way the OS buffers the accesses in memory, only performing disk I/O when the cache becomes full.
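A minimal sketch of that approach, assuming a POSIX system; the file name and the line-counting work are made up for illustration:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data0.bin", O_RDONLY);        /* hypothetical file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* The mapping lives in the process address space, so every thread of
     * the process can read it; the kernel pages data in on demand. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                                    /* the mapping stays valid */

    /* ... hand `data` and `st.st_size` to the worker threads here ... */
    size_t newlines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n') newlines++;
    printf("%zu lines\n", newlines);

    munmap(data, st.st_size);
    return 0;
}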

If your program is multi-threaded, all the threads share memory unless you take steps to create thread-local storage. You don't need o/s shared memory directly. The way to minimize I/O is to ensure that each file is read only once if at all possible, and similarly that result files are written only once each.
How you do that depends on the processing you're doing.
If each thread is responsible for processing a file in its entirety, then the thread simply reads the file; you can't reduce the I/O any more than that. If a file must be read by several threads, then you should try to memory map the file so that it is available to all the relevant threads. If you're using a 32-bit program and the files are too big to all fit in memory, you can't necessarily do the memory mapping. Then you need to work out how the different threads will process each file, and try to minimize the number of times different threads have to reread the files. If you're using a 64-bit program, you may have enough virtual memory to handle all the files via memory-mapped I/O. You still want to keep the number of times the data is accessed to a minimum. Similar concepts apply to the output files.

Related

Is it possible to do Input/Output operations in parallel in C?

I'd like to write a large array in C to a .csv file.
Would it be possible to write it in parallel?
Maybe using OpenMP?
The piece of code I'd like to parallelize is a typical file I/O operation.
Given a resultVector1 and a resultVector2 of size n:
FILE *fp = fopen("output.csv", "w+");
for (int i = 0; i < n; i++) {
    fprintf(fp, "%f,%f\n", resultVector1[i], resultVector2[i]);
}
fclose(fp);
You are going to run into a number of problems trying to perform a parallel write to a single file.
w+ truncates an existing file to 0 length before the write operations, or creates a new file. How are you going to coordinate the writing of the parallel file pointers?
In any case if you have multiple writers, you will need to synchronize them and you will lose any speed advantage you would have had over a sequential write. In fact, they will probably be slower due to the synchronization overhead than a single dedicated sequential write thread.
Thinking about your question a bit more: if you really had a huge array, say 500 million integers, and you really needed the fastest way to read/write this array to a persistent file, you could divide the array by the number of dedicated threads you can allocate and write each segment to a separate file. You can then read the array back in by doing a parallel read of this data. In this case you can use a parallel-for type of pattern and avoid the synchronization lock overhead you have with a single file.
So in the example I gave, if you have 4 threads, you will divide the array into quarters, where each thread will write/read its own quarter to and from its separate file.
Note: if all the files are on the same disk drive you may have some I/O slowdown due to the multiple simultaneous read/write operations going on at different parts of the disk. This effect can be mitigated if you are able to save each file to a different disk/server.
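A rough sketch of that per-thread-file pattern with OpenMP (the vector contents, sizes and file names are made up; error handling is omitted):

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const long n = 1000000;                        /* hypothetical size */
    double *resultVector1 = malloc(n * sizeof *resultVector1);
    double *resultVector2 = malloc(n * sizeof *resultVector2);
    for (long i = 0; i < n; i++) { resultVector1[i] = i; resultVector2[i] = 2.0 * i; }

    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        int nt = omp_get_num_threads();
        long lo = n * t / nt, hi = n * (t + 1) / nt;

        /* Each thread writes only its own slice to its own file, so no
         * synchronization of a shared file pointer is needed. */
        char name[64];
        snprintf(name, sizeof name, "output_part%d.csv", t);
        FILE *fp = fopen(name, "w");
        for (long i = lo; i < hi; i++)
            fprintf(fp, "%f,%f\n", resultVector1[i], resultVector2[i]);
        fclose(fp);
    }

    free(resultVector1);
    free(resultVector2);
    return 0;
}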
You could open 2 files and write each vector to its own file. This MIGHT help, but I won't bet on it; it would depend on the architecture of your platform, I think. Plus, if you need both in the same file, you still have to copy them together, which again takes time.
Also, the writes to the hard drive itself are probably the bottleneck here, so there is no need to speed up the way you fill the buffer that is sent to the hard drive.
You might open two files on two different hard drives, but I still doubt this would give you a real speed-up.
The question prompted me to write pread, a parallel read method implemented using the pthread library. Given the file size FILESIZE and the number of threads n, the pread method slices the input file into roughly equal chunks of size FILESIZE/n and assigns each chunk to a thread. Each thread then reads its part of the file in parallel using fread from its own offset, in blocks of a predefined BUFFERSIZE. You can find the implementation here.
This is an ongoing implementation; I'm still working on the parallel write side.
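Not the linked implementation, just a minimal pthread sketch of that chunking idea (file name, thread count and buffer size are placeholders):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFERSIZE 65536
#define NTHREADS   4

struct chunk { const char *path; long start; long len; };

static void *read_chunk(void *arg)
{
    struct chunk *c = arg;
    FILE *fp = fopen(c->path, "rb");       /* each thread uses its own FILE* */
    char *buf = malloc(BUFFERSIZE);
    fseek(fp, c->start, SEEK_SET);

    long left = c->len;
    while (left > 0) {
        size_t want = left < BUFFERSIZE ? (size_t)left : BUFFERSIZE;
        size_t got = fread(buf, 1, want, fp);
        if (got == 0) break;
        /* ... process buf[0..got) here ... */
        left -= (long)got;
    }
    free(buf);
    fclose(fp);
    return NULL;
}

int main(void)
{
    const char *path = "input.dat";         /* hypothetical input file */
    FILE *fp = fopen(path, "rb");
    fseek(fp, 0, SEEK_END);
    long filesize = ftell(fp);
    fclose(fp);

    pthread_t tid[NTHREADS];
    struct chunk chunks[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        long lo = filesize * i / NTHREADS;
        long hi = filesize * (i + 1) / NTHREADS;
        chunks[i] = (struct chunk){ path, lo, hi - lo };
        pthread_create(&tid[i], NULL, read_chunk, &chunks[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}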

How does the fio benchmark tool perform sequential disk reads?

I use fio to test read/write bandwidth of my disks.
Even for the sequential read test, I can let it run multiple threads.
What does it mean by running multiple threads on sequential read test?
Does it perform multiple sequential reads? (each thread is assigned a file offset to start the sequential scanning from)
Do the multiple threads share a file offset? (Each thread invokes sequential reads using a single file offset that is shared by the multiple threads)
I tried to read the open source code of fio, but I couldn't really figure it out.
Can anyone give me an idea?
Sadly you didn't include a jobfile with your question and didn't say what platform you're running on. Here's a stab at answers:
Yes, it does multiple sequential reads, though wouldn't it have to do this even with a single thread?
No, each thread has its own offset, but unless you use the offset and size options they will all work inside the same "region".
On Linux fio actually defaults to using separate processes per job and each process has its own file descriptor (for ioengines that use files) for each file used. Further, some ioengines (e.g. libaio, pvsync but there are many others) use syscalls that take the offset you want to do the I/O at with the request itself so even if they do share a descriptor their offset is not impacted by others using the same descriptor.
There may be problems if you use the sync ioengine, ask fio to use threads rather than processes and have those threads work on the same file. That ioengine has to use lseek prior to doing its I/O so perhaps there's a chance for another thread's lseek to sneak in before the I/O is submitted. Note that the sync I/O engine is not the default one used with recent fio versions.
Perhaps the fio mailing list can say more?

Multiple threads on different cores reading same set of files

I have a multi-threaded process, where each thread runs on one core. I am reading the same set of files from each of the threads and processing them. Will reading the same set of files by multiple threads affect the performance of the process?
Not necessarily, but there are a few factors to be taken into account.
When you open a file for READING you don't need to put a read lock on it.
That means multiple threads can be reading from the same file.
In fact all threads of a process share the process memory, so you can use that to your benefit by caching the whole set (or part of it, depending on its size) in the process memory. That will reduce access time.
Otherwise, if we assume all the files are on the same device, the problem is that reading multiple files from the same device at the same time is slow and, depending on the number of threads and the storage type, it can be noticeably slower than reading them one after another.
Reading the same set of files from each different thread may reduce the performance of the process, because IO requests are normally costly and slow, in addition to the same read operation being repeated for each different thread.
One possible solution for dealing with this is to have one thread handle the IO reads/writes and the rest process the data, for example as a producer-consumer.
You may consider Memory-Mapped Files for concurrent read access.
It avoids the overhead of copying the data into every process's address space.

How to prevent C read() from reading from cache

I have a program that is used to exercise several disk units in a raid configuration. 1 process synchronously (O_SYNC) writes random data to a file using write(). It then puts the name of the directory into a shared-memory queue, where a 2nd process is waiting for the queue to have entries to read the data back into memory using read().
The problem that I can't seem to overcome is that when the 2nd process attempts to read the data back into memory, none of the disk units show read accesses. The program has code to check whether or not the data read back in is equal to the code that is written to disk, and the data always matches.
My question is, how can I make the OS (IBM i) not buffer the data when it is written to disk so that the read() system call accesses the data on the disk rather than in cache? I am doing simple throughput calculations and the read() operations are always 10+ times faster than the write operations.
I have tried using the O_DIRECT flag, but cannot seem to get the data to write to the file. It could have to do with setting up the correct aligned buffers. I have also tried the posix_fadvise(fd, offset,len, POSIX_FADV_DONTNEED) system call.
I have read through this similar question but haven't found a solution. I can provide code if it would be helpful.
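For reference, a sketch of how an aligned O_DIRECT write plus posix_fadvise() typically looks on a platform that supports the flag (Linux shown; the asker's IBM i system may behave differently, and this is not the asker's code):

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t blk = 4096;    /* assume a 4 KiB logical block size */
    int fd = open("testfile.dat", O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT generally requires the buffer, length and file offset to be
     * aligned to the device block size. */
    void *buf;
    if (posix_memalign(&buf, blk, blk) != 0) return 1;
    memset(buf, 'x', blk);

    if (write(fd, buf, blk) != (ssize_t)blk) { perror("write"); return 1; }

    /* Alternatively, after a buffered write/read, ask the kernel to drop
     * the cached pages for this range: */
    posix_fadvise(fd, 0, blk, POSIX_FADV_DONTNEED);

    free(buf);
    close(fd);
    return 0;
}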
My thought is that if you write ENOUGH data, then there simply won't be enough memory to cache it, and thus SOME data must be written to disk.
You can also, if you want to make sure that small writes to your file work, try writing ANOTHER large file (either from the same process or a different one - for example, you could start a process like dd if=/dev/zero of=myfile.dat bs=4k count=some_large_number) to force other data to fill the cache.
Another "trick" may be to "chew up" some (more like most) of the RAM in the system - just allocate a large lump of memory, then write to some small part of it at a time - for example, an array of integers, where you write to every 256th entry of the array in a loop, moving one step forward each time - that way, you walk through ALL of the memory quickly, and since you are writing continuously to all of it, the memory will have to be resident. [I used this technique to simulate a "busy" virtual machine when running VM tests.]
The other option is of course to nobble the caching system itself in the OS/filesystem driver, but I would be very worried about doing that - it will almost certainly slow the system down to a crawl, and unless there is an existing option to disable it, you may find it hard to do accurately/correctly/reliably.
...exercise several disk units in a raid configuration... How? IBM i doesn't allow a program access to the hardware. How are you directing I/O to any specific physical disks?
ANSWER: The write/read operations are done in parallel against IFS so the stream file manager is selecting which disks to target. By having enough threads reading/writing, the busyness of SYSBASE or an IASP can be driven up.
...none of the disk units show read accesses. None of them? Unless you are running the sole job on a system in restricted state, there is going to be read activity on the disks from other tasks. Is the system divided into multiple LPARs? Multiple ASPs? I'm suggesting that you may be monitoring disks that this program isn't writing to, because IBM i handles physical I/O, not programs.
ANSWER: I guess "none of them" is a slight exaggeration - I know which disks belong to SYSBASE, and those disks are not being targeted with many read requests. I was just trying to generalize for an audience not familiar with IBM i. In the picture below, you will see that the write reqs are driving the % busyness up, but the read reqs are not, even though they are targeting the same files.
...how can I make the OS (IBM i) not buffer the data when it is written to disk... Use a memory starved main storage pool to maximise paging, write immense blocks of data so as to guarantee that the system and disk controller caches overflow and use a busy machine so that other tasks are demanding disk I/O as well.

Alternatives for reducing the read access time of a large number of binary files from hard disk

In the first prototype of my application, I have to read around 400,000 files (4 KB each, around 1.5 GB of data in total) from hard disk sequentially, do some operation on the data read from each file, and store the results in RAM. With this mechanism I was first doing the I/O for a file and then using the CPU for the operation, then moving on to the next file, but it was a very slow process.
To work around this, we now first read all the files and store all their data in RAM, and only then do the operation (utilizing the CPU). This gave a significant improvement.
But in the second phase of development I have to read 20 GB of data, which I cannot store in RAM. And a single read operation followed by CPU processing is very time consuming.
Can someone please suggest some method to work around this problem?
I am developing this application on Windows in C, with Visual Studio compiler.
There's a technique called Asynchronous I/O (AIO) that lets you keep doing some processing with the CPU while a file is read in the background. You can use this to read the next few files at the same time as you're processing a file.
The various AIO calls are OS-specific. On Windows, Microsoft call it "Overlapped I/O". See this Wikipedia page or this MSDN page for more info.
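A stripped-down sketch of overlapped (asynchronous) reading with the Win32 API (file name and buffer size are made up; error handling is minimal):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileA("file1.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    static char buf[4096];
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    /* Start the read; it completes in the background. */
    if (!ReadFile(h, buf, sizeof buf, NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    /* ... process the previously loaded file here while the read runs ... */

    DWORD bytes = 0;
    GetOverlappedResult(h, &ov, &bytes, TRUE);   /* TRUE = wait for completion */
    printf("read %lu bytes\n", (unsigned long)bytes);

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}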
To work around this, we now first read all the files and store all their data in RAM, and only then do the operation (utilizing the CPU).
(Assuming files can be processed independently...)
You are half-way there. Instead of waiting until all files have been loaded to RAM, start processing as soon as any file is loaded. That would be a form of pipelining.
You'll need three components:
A thread [1] that reads files ("producer").
A thread [2] that processes the files ("consumer").
A message queue [3] between them.
The producer reads the files the way you are already doing it, but instead of processing them, it just enqueues them to the message queue. The consumer thread waits until it can dequeue a file from the queue, processes it, then immediately frees the memory that was occupied by the file and resumes waiting on the queue (a minimal sketch follows the footnotes below).
In case you can process files by sequentially traversing them start-to-finish, you could even devise a more fine-grained "streaming", where files would be both read and processed in chunks, which could lower the peak memory consumption even more (e.g. if you have some extra-large files that would no longer need to be kept whole in memory).
[1] Or a set of threads to parallelize the I/O, if you anticipate reading from multiple physical disks.
[2] Or a set of threads to saturate the CPU cores, if processing a file is not cheaper than reading it.
[3] You don't need a fancy persistent distributed message queue for that. Just a straight in-memory queue, a la BlockingCollection in .NET (I'm sure you'll find something similar for pure C).
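A minimal sketch of such a producer-consumer pipeline in C with pthreads - a fixed-capacity ring buffer guarded by a mutex and condition variables; the file count and sizes are illustrative:

#include <pthread.h>
#include <stdlib.h>

#define QCAP 8

struct item { char *data; size_t len; };   /* one loaded file */

static struct item q[QCAP];
static int qhead, qtail, qcount, done;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void enqueue(struct item it)
{
    pthread_mutex_lock(&mtx);
    while (qcount == QCAP) pthread_cond_wait(&not_full, &mtx);
    q[qtail] = it; qtail = (qtail + 1) % QCAP; qcount++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&mtx);
}

static int dequeue(struct item *it)
{
    pthread_mutex_lock(&mtx);
    while (qcount == 0 && !done) pthread_cond_wait(&not_empty, &mtx);
    if (qcount == 0) { pthread_mutex_unlock(&mtx); return 0; }   /* all done */
    *it = q[qhead]; qhead = (qhead + 1) % QCAP; qcount--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&mtx);
    return 1;
}

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100; i++) {            /* pretend to load 100 files */
        struct item it = { malloc(4096), 4096 };
        /* ... fread() the i-th file into it.data here ... */
        enqueue(it);
    }
    pthread_mutex_lock(&mtx);
    done = 1;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    struct item it;
    while (dequeue(&it)) {
        /* ... process it.data here ... */
        free(it.data);                         /* free as soon as we are done */
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}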
Create threads (in a loop) which will read files into RAM.
Work with the data in RAM in separate thread[s] and free the RAM after processing.
Keep limits and a pool of records about the files (read and processed) in a shared object protected by a mutex.
Use a semaphore to synchronize production/consumption of the resources (files in RAM).
