I need to write a bulk of integers to a file after performing heap operations on them, one by one. I am trying to merge sorted files into a single file. As of now, I write to the file after every operation. I am using a min-heap to merge the files.
My questions are -
When performing file write, is disk accessed every time a file write is made or chunks of memory blocks are written at a time?
Will it improve performance if I take the output of the heap in an array of, say, size 1024 or more, and then perform a single write?
Thank you in advance.
EDIT - Will using setbuffer() help? I feel it should help to a certain extent.
1. When performing file write, is disk accessed every time a file write is made
or chunks of memory blocks are written at a time?
No. With stdio, your output isn't written until the output buffer is full. You can force an immediate write with fflush, which flushes the output stream; otherwise, output is buffered.
2. Will it improve performance if I take the output of the heap in an array of
say size 1024 or more, and then perform a single write?
If you are not exhausting the heap, then no; you will not gain significant performance by putting the storage on the stack, etc. Buffering is always preferred, but if you store all the data in an array and then call write, you still have the same size output buffer to deal with.
When performing file write, is disk accessed every time a file write
is made or chunks of memory blocks are written at a time?
This is up to the kernel. Buffers are flushed when you call fsync() on the file descriptor. fflush() only flushes the data buffered in the FILE structure, it doesn't flush the kernel buffers.
Will it improve performance if I take the output of the heap in an array of
say size 1024 or more, and then perform a single write?
I made tests some time ago to compare the performance of write() and fwrite() against a custom implementation, and it turns out you can gain a fair speedup by calling write() directly with large chunks. This is actually what fwrite() does, but due to the infrastructure it has to maintain, it is slower than a custom implementation. As for buffer size, 1024 is certainly too small. 8K or something would perform better.
It is operating system and implementation specific.
On most Linux systems -with a good filesystem like Ext4- the kernel will try hard to avoid disk accesses by caching a lot of file system data. See linuxatemyram
But I would still recommend avoiding making too many I/O operations, and doing some buffering (if using stdio(3) routines, pass buffers of several dozens of kilobytes to fwrite(3), and use setvbuf(3) and fflush(3) with care; alternatively, use direct syscalls like write(2) or mmap(2) with buffers of e.g. 64 Kbytes...)
BTW, using perhaps the posix_fadvise(2) syscall might marginally help performance (if used wisely).
In reality, the bottleneck is often the hardware. Use RAM filesystems (tmpfs) or fast SSD disks if you can.
On Windows systems (which I have never used) I have no idea, but the general intuition is that some buffering should help.
I have a string composed of some packet statistics, such as packet length, etc.
I would like to store this to a csv file, but if I use the standard fprintf to write to a file, it writes incredibly slowly, and I end up losing information.
How do I write information to a file as quickly as possible in order to minimize information loss from packets? Ideally I would like to support millions of packets per second, which means I need to write millions of lines per second.
I am using XDP to get packet information and send it to the userspace via an eBPF map if that matters.
The optimal performance will depend on the hard drive, drive fragmentation, the filesystem, the OS and the processor. But optimal performance will never be achieved by writing small chunks of data that do not align well with the filesystem's disk structure.
A simple solution would be to use a memory mapped file and let the OS asynchronously deal with actually committing the data to the file - that way it is likely to be optimal for the system you are running on without you having to deal with all the possible variables or work out the optimal write block size of your system.
Even with regular stream I/O you will improve performance drastically by writing to a RAM buffer. Making the buffer size a multiple of the block size of your file system is likely to be optimal. However since file writes may block if there is insufficient buffering in the file system itself for queued writes or write-back, you may not want to make the buffer too large if the data generation and the data write occur in a single thread.
Another solution is to have a separate writer thread, connected to the thread generating the data via a pipe or queue. The writer thread can then simply buffer data from the pipe/queue until it has a "block" (again, matching the file system block size is a good idea), then commit the block to the file. The pipe/queue then acts as a buffer, storing data generated while the writer thread is stalled writing to the file. The buffering afforded by the pipe, the block, the file system, and the disk write-cache will likely accommodate any disk latency, so long as the fundamental write performance of the drive is faster than the rate at which data to write is being generated; nothing but a faster drive will solve that problem.
Use sprintf to write to a buffer in memory.
Make that buffer as large as possible, and when it gets full, use a single fwrite to dump the entire buffer to disk. Hopefully by that point it will contain many hundreds or thousands of lines of CSV data that get written at once, while you begin to fill up another in-memory buffer with more sprintf calls.
What's the most idiomatic/efficient way to read a file of arbitrary length in C?
Get the filesize of the file in bytes and issue a single fread()
Keep fread()ing a constant size buffer until getting EOF
Anything else?
Avoid using any technique which requires knowing the size of the file in advance. That leaves exactly one technique: read the file a bit at a time, in blocks of a convenient size.
Here's why you don't want to try to find the filesize in advance:
If it is not a regular file, there may not be any way to tell. For example, you might be reading directly from a console, or taking piped input from a previous data generator. If your program requires the filesize to be knowable, these useful input mechanisms will not be available to your users, who will complain or choose a different tool.
Even if you can figure out the filesize, you have no way of preventing it from changing while you are reading the file. If you are not careful about how you read the file, you might open a vulnerability which could be exploited by adversarial programs.
For example, if you allocate a buffer of the "correct" size and then read until you get an end-of-file condition, you may end up overwriting random memory. (Multiple reads may be necessary if you use an interface like read() which might read less data than requested.) Or you might find that the file has been truncated; if you don't check the amount of data read, you might end up processing uninitialised memory leading to information leakage.
In practice, you usually don't need to keep the entire file content in memory. You'll often parse the file (notably if it is textual), or at least read the file in smaller pieces, and for that you don't need it entirely in memory. For a textual file, reading it line-by-line (perhaps with some state inside your parser) is often enough (using fgets or getline).
Files exist (notably on disks or SSDs) because usually they can be much "bigger" than your computer's RAM. Actually, files were invented (more than 50 years ago) to deal with data larger than memory. Distributed file systems can also be very big (and accessed remotely, even from a laptop, e.g. by NFS, CIFS, etc...)
Some file systems are capable of storing petabytes of data (on supercomputers), with individual files of many terabytes (much larger than available RAM).
You'll also likely use some databases. These routinely have terabytes of data. See also this answer (about realistic size of sqlite databases).
If you really want to read a file entirely into memory using stdio (though you should generally avoid that, since you usually want your program to handle data larger than memory; reading an entire file into memory is often a design error), you could indeed loop on fread (or fscanf, or even fgetc) until end-of-file. Notice that feof is useful only after some input operation.
On current laptop or desktop computers, you could prefer (for efficiency) to use buffers of a few megabytes, and you certainly can deal with big files of several hundreds of gigabytes (much larger than your RAM).
On POSIX file systems, you might do memory mapped IO with e.g. mmap(2) - but that might not be faster than read(2) with large buffers (of a few megabytes). You could use readahead(2) (Linux specific) and posix_fadvise(2) (or madvise(2) if using mmap) to tune performance by hinting your OS kernel.
If you have to code for Microsoft Windows, you could study its WinAPI and find some way to do memory mapped IO.
In practice, file data (notably if it was accessed recently) often stays in the page cache, which is of paramount importance for performance. When that is not the case, your hardware (disk, controller, ...) becomes the bottleneck and your program becomes I/O bound (in that case, no software trick could significantly improve the performance).
I am writing a function to read binary files that are organized as a succession of (key, value) pairs, where keys are small ASCII strings and values are ints or doubles stored in binary format.
If implemented naively, this function makes a lot of calls to fread to read very small amounts of data (usually no more than 10 bytes). Even though fread internally uses a buffer to read the file, I implemented my own buffer and observed a speedup by a factor of 10 on both Linux and Windows. The buffer size used by fread is large enough, and the function-call overhead cannot be responsible for such a slowdown. So I dug into the GNU implementation of fread and discovered a lock on the file, and many other things, such as verifying that the file is open with read access and so on. No wonder fread is so slow.
But what is the rationale behind fread being thread-safe? It seems that multiple threads can call fread on the same file, which is mind-boggling to me. These requirements make it slow as hell. What are the advantages?
Imagine you have a file where each 5 bytes can be processed in parallel (let's say, pixel by pixel in an image):
123456789A
One thread needs to pick 5 bytes "12345", the next one the next 5 bytes "6789A".
If it were not thread-safe, different threads could pick up wrong chunks, for example "12367" and "4589A", or even worse (unexpected behaviour, repeated bytes, or worse).
As suggested by nemequ:
Note that if you're on glibc you can use the _unlocked variants (e.g., fread_unlocked). On Windows you can define _CRT_DISABLE_PERFCRIT_LOCKS.
Stream I/O is already as slow as molasses. To a programmer, a read from main memory (1000x longer than a CPU cycle) already takes ages, and a read from a physical disk or a network may as well be an eternity.
I don't know if that's the #1 reason why the library implementers were ok with adding the lock overhead, but I guarantee it played a significant part.
Yes, it slows it down, but as you discovered, you can manually buffer the read and use your own handling to increase the speed when performance really matters. (That's the key--when you absolutely must read the data as fast as possible. Don't bother manually buffering in the general case.)
That's a rationalization. I'm sure you could think of more!
I'm programming in C. Sometimes we have to read large data from files, for which we normally use fread or the read system call; that is, either stream I/O or system-call I/O.
I want to ask: if we are reading such large data, will calculating the block size and reading according to it help us read more efficiently?
I know that reading through system calls can be slow, and that there are situations, such as dealing with network sockets, where using them instead of stream-based I/O gives better results. Likewise, I need some tips and tricks for reading large data from files, and the things to take care of.
Also, if mmap can be more advantageous than conventional I/O, please elaborate on the situations where it would be.
Platform : Linux , gcc compiler
Have you considered memory-mapping the file using mmap?
I think it is always a good idea to read in blocks. For huge files, we would obviously not want to allocate a huge amount of memory on the heap.
If the file is of the order of a few MBs then I think we can read the whole file at once in a char buffer and use that buffer to process your data. This would be faster than reading again and again from file.
I'm writing a program where performance is quite important, but not critical. Currently I am reading in text from a FILE* line by line and I use fgets to obtain each line. After using some performance tools, I've found that 20% to 30% of the time my application is running, it is inside fgets.
Are there faster ways to get a line of text? My application is single-threaded with no intentions to use multiple threads. Input could be from stdin or from a file. Thanks in advance.
You don't say which platform you are on, but if it is UNIX-like, then you may want to try the read() system call, which does not perform the extra layer of buffering that fgets() et al do. This may speed things up slightly, on the other hand it may well slow things down - the only way to find out is to try it and see.
Use fgets_unlocked(), but read carefully what it does first
Get the data with fgetc() or fgetc_unlocked() instead of fgets(). With fgets(), your data is copied into memory twice: first by the C runtime library from the file to an internal buffer (stream I/O is buffered), then from that internal buffer to an array in your program.
Read the whole file in one go into a buffer.
Process the lines from that buffer.
That's the fastest possible solution.
You might try minimizing the amount of time you spend reading from the disk by reading large amounts of data into RAM then working on that. Reading from disk is slow, so minimize the amount of time you spend doing that by reading (ideally) the entire file once, then working on it.
Sorta like the way CPU cache minimizes the time the CPU actually goes back to RAM, you could use RAM to minimize the number of times you actually go to disk.
Depending on your environment, using setvbuf() to increase the size of the internal buffer used by file streams may or may not improve performance.
This is the syntax -
setvbuf (InputFile, NULL, _IOFBF, BUFFER_SIZE);
Where InputFile is a FILE* to a file just opened using fopen() and BUFFER_SIZE is the size of the buffer (which is allocated by this call for you).
You can try various buffer sizes to see if any have positive influence. Note that this is entirely optional, and your runtime may do absolutely nothing with this call.
If the data is coming from disk, you could be IO bound.
If that is the case, get a faster disk (but first check that you're getting the most out of your existing one; some Linux distributions don't optimize disk access out of the box, see hdparm), stage the data into memory (say, by copying it to a RAM disk) ahead of time, or be prepared to wait.
If you are not IO bound, you could be wasting a lot of time copying. You could benefit from so-called zero-copy methods: something like memory-mapping the file and only accessing it through pointers.
That is a bit beyond my expertise, so you should do some reading or wait for more knowledgeable help.
BTW-- You might be getting into more work than the problem is worth; maybe a faster machine would solve all your problems...
NB-- It is not clear that you can memory map the standard input either...
If the OS supports it, you can try asynchronous file reading, that is, the file is read into memory whilst the CPU is busy doing something else. So, the code goes something like:
start asynchronous read
loop:
wait for asynchronous read to complete
if end of file goto exit
start asynchronous read
do stuff with data read from file
goto loop
exit:
If you have more than one CPU, then one CPU reads the file and parses the data into lines, while the other CPU takes each line and processes it.