Is it possible to read a file without loading it into memory? - c

I want to read a file, but it is too big to load completely into memory.
Is there a way to read it without loading it into memory? Or is there a better solution?

I want to read a file, but it is too big to load completely into memory.
Be aware that, in practice, files are an abstraction (so somehow an illusion) provided by your operating system through file systems. Read Operating Systems: Three Easy Pieces (freely downloadable) to learn more about OSes. Files can be quite big (even if most of them are small), e.g. many dozens of gigabytes on current laptops or desktops (and many terabytes on servers, perhaps more).
You don't define what "memory" is, and the C11 standard n1570 uses that word in a different way, speaking of memory locations in §3.14 and of memory management functions in §7.22.3...
In practice, a process has its virtual address space, related to virtual memory.
On many operating systems -notably Linux and POSIX- you can change the virtual address space with mmap(2) and related system calls, and you could use memory-mapped files.
Is there a way to read it without loading it into memory?
Of course, you can read and write partial chunks of a file (e.g. using fread, fwrite, fseek, or the lower-level system calls read(2), write(2), lseek(2), ...). For performance reasons, it is better to use large buffers (several kilobytes at least). In practice, most checksums (and cryptographic hash functions) can be computed chunkwise, over a very long stream of data.
Many libraries are built above such primitives (doing direct I/O by chunks). For example, the sqlite database library is able to handle database files of many terabytes (more than the available RAM). And you could use an RDBMS (they are software coded in C or C++).
So of course you can deal with files larger than the available RAM and read or write them by chunks (or "records"), and this has been true since at least the 1960s. I would even say that, intuitively, files can (usually) be much larger than RAM, but smaller than a single disk (however, even this is not always true; some file systems can span several physical disks, e.g. using LVM techniques).
(On my Linux desktop with 32 GB of RAM, the largest file is 69 GB, on an ext4 file system with 669 GB available out of 780 GB total, and I have had files above 100 GB in the past.)
You might find it worthwhile to use some database like sqlite (or to be a client of some RDBMS like PostgreSQL, etc.), or you could be interested in libraries for indexed files like gdbm. Of course, you can also do direct I/O operations (e.g. fseek then fread or fwrite, or lseek then read or write, or pread(2) or pwrite(2) ...).
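As a concrete illustration, here is a minimal sketch of such chunked processing with fread and a large buffer (the file name and chunk size are arbitrary choices for this example):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("huge.dat", "rb");   /* hypothetical file name */
    if (fp == NULL) { perror("fopen"); return EXIT_FAILURE; }

    enum { CHUNK = 64 * 1024 };           /* 64 KiB amortizes the per-call overhead */
    unsigned char *buf = malloc(CHUNK);
    if (buf == NULL) { fclose(fp); return EXIT_FAILURE; }

    size_t n;
    unsigned long long total = 0;
    while ((n = fread(buf, 1, CHUNK, fp)) > 0) {
        /* process buf[0..n-1] here: update a checksum, parse records, ... */
        total += n;
    }
    if (ferror(fp)) perror("fread");
    printf("processed %llu bytes\n", total);

    free(buf);
    fclose(fp);
    return 0;
}

Only CHUNK bytes are ever resident at once, regardless of the file size.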

I need the content to do a checksum, so I need the complete message
Many checksum libraries support incremental updates to the checksum. For example, GLib has g_checksum_update(). So you can read the file a block at a time with fread and update the checksum as you read.
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <glib.h>

int main(void) {
    char filename[] = "test.txt";

    // Create a SHA256 checksum
    GChecksum *sum = g_checksum_new(G_CHECKSUM_SHA256);
    if( sum == NULL ) {
        fprintf(stderr, "Could not create checksum.\n");
        exit(1);
    }

    // Open the file we'll be checksumming.
    FILE *fp = fopen( filename, "rb" );
    if( fp == NULL ) {
        fprintf(stderr, "Could not open %s: %s.\n", filename, strerror(errno));
        exit(1);
    }

    // Read one buffer full at a time (BUFSIZ is from stdio.h)
    // and update the checksum.
    unsigned char buf[BUFSIZ];
    size_t size_read = 0;
    while( (size_read = fread(buf, 1, sizeof(buf), fp)) != 0 ) {
        // Update the checksum
        g_checksum_update(sum, buf, (gssize)size_read);
    }
    if( ferror(fp) ) {
        fprintf(stderr, "Error reading %s: %s.\n", filename, strerror(errno));
        exit(1);
    }

    // Print the checksum.
    printf("%s %s\n", g_checksum_get_string(sum), filename);

    fclose(fp);
    g_checksum_free(sum);
    return 0;
}
And we can check it works by comparing the result with sha256sum.
$ ./test
0c46af5bce717d706cc44e8c60dde57dbc13ad8106a8e056122a39175e2caef8 test.txt
$ sha256sum test.txt
0c46af5bce717d706cc44e8c60dde57dbc13ad8106a8e056122a39175e2caef8 test.txt

One way to do this, if the problem is RAM, not virtual address space, is memory mapping the file, either via mmap on POSIX systems, or CreateFileMapping/MapViewOfFile on Windows.
That can get you what looks like a raw array of the file bytes, but with the OS responsible for paging the contents in (and writing them back to disk if you alter them) as you go. When mapped read-only, it's quite similar to just malloc-ing a block of memory and fread-ing to populate it, but:
It's lazy: for a 1 GB file, you're not waiting 5-30 seconds for the whole thing to be read in before you can work with any part of it; instead, you just pay for each page on access (and sometimes the OS will pre-read in the background, so you don't even have to wait on the per-page load)
It responds better under memory pressure; if you run out of memory, the OS can just drop clean pages from memory without writing them to swap, knowing it can page them back in from the golden copy in the file whenever they're needed; with malloc-ed memory, it has to write it out to swap, increasing disk traffic at a time when you're likely oversubscribed on the disk already
Performance-wise, this can be slightly slower under default settings (since, without memory pressure, reading the whole file in mostly guarantees it will be in memory when asked for, while random access to a memory-mapped file is likely to trigger on-demand page faults to populate each page on first access). However, you can use posix_madvise with POSIX_MADV_WILLNEED (POSIX systems) or PrefetchVirtualMemory (Windows 8 and higher) to provide a hint that the entire file will be needed, causing the system to (usually) page it in in the background, even as you're accessing it.
On POSIX systems, other advice hints can be used for more granular hinting when paging in the whole file at once isn't necessary (or possible). For example, using POSIX_MADV_SEQUENTIAL, if you're reading the file data in order from beginning to end, usually triggers more aggressive prefetch of subsequent pages, increasing the odds that they're in memory by the time you get to them.
By doing so, you get the best of both worlds: you can begin accessing the data almost immediately, with a delay only on pages not yet paged in, while the OS pre-loads the pages for you in the background, so you eventually run at full speed (and remain more resilient to memory pressure, since the OS can just drop clean pages, rather than writing them to swap first).
The main limitation here is virtual address space. If you're on a 32 bit system, you're likely limited to (depending on how fragmented the existing address space is) 1-3 GB of contiguous address space, which means you'd have to map the file in chunks, and can't have on-demand random access to any point in the file at any time without additional system calls. Thankfully, on 64 bit systems, this limitation rarely comes up; even the most limiting 64 bit systems (Windows 7) provide 8 TB of user virtual address space per process, far larger than the vast, vast majority of files you're likely to encounter (and later versions increase the cap to 128 TB).
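A minimal POSIX sketch of that approach, combining mmap with a posix_madvise hint (error handling abbreviated; a non-empty regular file is assumed, and the newline count is just a stand-in for real processing):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

    /* Map the whole file read-only; pages are faulted in on first access. */
    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* Hint that the whole range will be needed so the OS can prefetch. */
    posix_madvise(data, (size_t)st.st_size, POSIX_MADV_WILLNEED);

    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n') lines++;
    printf("%ld lines\n", lines);

    munmap(data, (size_t)st.st_size);
    close(fd);
    return 0;
}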

Related

Reading a file of arbitrary length in C

What's the most idiomatic/efficient way to read a file of arbitrary length in C?
Get the filesize of the file in bytes and issue a single fread()
Keep fread()ing a constant size buffer until getting EOF
Anything else?
Avoid using any technique which requires knowing the size of the file in advance. That leaves exactly one technique: read the file a bit at a time, in blocks of a convenient size.
Here's why you don't want to try to find the filesize in advance:
If it is not a regular file, there may not be any way to tell. For example, you might be reading directly from a console, or taking piped input from a previous data generator. If your program requires the filesize to be knowable, these useful input mechanisms will not be available to your users, who will complain or choose a different tool.
Even if you can figure out the filesize, you have no way of preventing it from changing while you are reading the file. If you are not careful about how you read the file, you might open a vulnerability which could be exploited by adversarial programs.
For example, if you allocate a buffer of the "correct" size and then read until you get an end-of-file condition, you may end up overwriting random memory. (Multiple reads may be necessary if you use an interface like read() which might read less data than requested.) Or you might find that the file has been truncated; if you don't check the amount of data read, you might end up processing uninitialised memory leading to information leakage.
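If you do end up needing the whole content in one buffer, the block-at-a-time approach still applies; here is a hedged sketch with a geometrically growing buffer (the slurp helper name is mine):

#include <stdio.h>
#include <stdlib.h>

/* Read all of fp into a malloc'd buffer; returns NULL on failure.
 * Works on pipes and ttys too, since it never asks for a file size. */
char *slurp(FILE *fp, size_t *out_len) {
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {                       /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); return NULL; } /* read error, not EOF */
    *out_len = len;
    return buf;                                 /* caller frees */
}

Doubling the capacity keeps the number of realloc calls logarithmic in the final size, and the loop simply continues until fread returns 0, so a file that grows or shrinks mid-read never overruns the buffer.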
In practice, you usually don't need to keep the entire file content in memory. You'll often parse the file (notably if it is textual), or at least read it in smaller pieces, and for that you don't need it entirely in memory. For a textual file, reading it line by line (perhaps with some state inside your parser) is often enough, using fgets or getline, as sketched below.
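For instance, a minimal line-by-line sketch with POSIX getline (the file name is just an example):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("input.txt", "r");   /* hypothetical file name */
    if (fp == NULL) return EXIT_FAILURE;

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    /* getline grows the buffer as needed; only one line is in memory at a time. */
    while ((len = getline(&line, &cap, fp)) != -1) {
        /* parse line[0..len-1] here */
    }
    free(line);
    fclose(fp);
    return 0;
}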
Files exist (notably on disks or SSDs) precisely because they can usually be much "bigger" than your computer's RAM. Files were in fact invented (more than 50 years ago) to deal with data larger than memory. Distributed file systems can also be very big (and accessed remotely, even from a laptop, e.g. via NFS, CIFS, etc.)
Some file systems are capable of storing petabytes of data (on supercomputers), with individual files of many terabytes (much larger than available RAM).
You're also likely to use some databases. These routinely hold terabytes of data. See also this answer (about realistic sizes of sqlite databases).
If you really want to read a file entirely into memory using stdio (but you should avoid doing that, because you generally want your program to be able to handle a lot of data; reading the entire file into memory is generally a design error), you could indeed loop on fread (or fscanf, or even fgetc) till end-of-file. Notice that feof is useful only after some input operation.
On current laptop or desktop computers, you might prefer (for efficiency) to use buffers of a few megabytes, and you certainly can deal with big files of several hundred gigabytes (much larger than your RAM).
On POSIX file systems, you might do memory-mapped I/O with e.g. mmap(2), but that might not be faster than read(2) with large buffers (of a few megabytes). You could use readahead(2) (Linux-specific) and posix_fadvise(2) (or madvise(2) if using mmap) to tune performance by hinting your OS kernel.
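For instance, a small sketch of such a hint with posix_fadvise(2) (the open_sequential helper name is mine):

#include <fcntl.h>
#include <stdio.h>

/* Open a file for one front-to-back scan and tell the kernel so,
 * letting it prefetch aggressively and drop pages behind us. */
int open_sequential(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    /* len == 0 means "from offset to the end of the file" */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0)
        fprintf(stderr, "posix_fadvise hint not accepted\n");
    return fd;
}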
If you have to code for Microsoft Windows, you could study its WinAPI and find some way to do memory mapped IO.
In practice, file data (notably if it was accessed recently) often stays in the page cache, which is of paramount importance for performance. When that is not the case, your hardware (disk, controller, ...) becomes the bottleneck and your program becomes I/O-bound (in that case, no software trick can significantly improve performance).

Why use mmap over fread?

Why/when is it better to use mmap(), as opposed to fread()'ing from a filestream in chunks into a byte array?
uint8_t my_buffer[MY_BUFFER_SIZE];
size_t bytes_read;

bytes_read = fread(my_buffer, 1, sizeof(my_buffer), input_file);
if (MY_BUFFER_SIZE != bytes_read) {
    fprintf(stderr, "File read failed: %s\n", filepath);
    exit(1);
}
There are advantages to mapping a file instead of reading it as a stream:
If you intend to perform random access to different widely-spaced areas of the file, mapping might mean that only the pages you access need to be actually read, while keeping your code simple.
If multiple applications are going to be accessing the same file, mapping it means that it will only be read into memory once, as opposed to the situation where each application loads [part of] the file into its own private buffers.
If the file doesn't fit in memory or would take a large chunk of memory, mapping it can supply the illusion that it fits and simplify your program logic, while letting the operating system decide how to manage rotating bits of the file in and out of physical memory.
If the file contents change, you MAY get to see the new contents automatically. (This can be a dubious advantage.)
There are disadvantages to mapping the file:
If you only need sequential access to the file or it is small or you only need access to a small portion of it, the overhead of setting up a memory mapping and then incurring page faults to actually cause the contents to be read can be less efficient than just reading the file.
If there is an I/O error reading the file, your application will most likely be killed on the spot instead of receiving a system call error to which your application can react gracefully. (Technically you can catch the SIGBUS in the former case but recovering properly from that kind of thing is not easy.)
If you are not using a 64-bit architecture and the file is very large, there might not be enough address space to map it.
mmap() is less portable than read() (or fread() as you suggest).
mmap() will only work on regular files (on some filesystems) and some block devices.

High performance reading - linux/pthreads

I have a moderately large binary file consisting of independent blocks like this:
header1
data1
header2
data2
header3
data3
...
The number of blocks, the size of each block, and the total size of the file vary quite a lot, but typical numbers are ~1000 blocks and an average block size of 100 KB. The files are generated by an external application which I have no control over, but I want to read them as fast as possible. In many cases I am only interested in a fraction (say, 10%) of the blocks, and this is the case I will optimize for.
My current implementation is like this:
Open the file and read all the headers - using information in the header to fseek() to the next header location; retain an open FILE * pointer.
When data is requested use fseek() to locate the data block, read all the data and return it.
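A minimal sketch of that indexing pass (the block_header layout here is hypothetical, since the real format isn't shown):

#include <stdio.h>
#include <stdint.h>

/* Hypothetical fixed-size header: a 4-byte id followed by a 4-byte data size. */
struct block_header {
    uint32_t id;
    uint32_t data_size;
};

/* Scan the file once, recording each data block's offset without reading the data. */
long index_blocks(FILE *fp, long offsets[], long max_blocks) {
    struct block_header h;
    long n = 0;
    while (n < max_blocks && fread(&h, sizeof h, 1, fp) == 1) {
        offsets[n++] = ftell(fp);                         /* data starts after the header */
        if (fseek(fp, (long)h.data_size, SEEK_CUR) != 0)  /* skip over the data */
            break;
    }
    return n;   /* number of blocks indexed */
}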
This works fine - but I was thinking maybe(?) it was possible to speed things up using e.g. aio, mmap or other techniques I have only heard of.
Any thoughts?
Joakim
The speed difference between mmap and read is not that big (both need to read the data from disk); the biggest advantage of mmap is avoiding double buffering.
If you are only interested in 10% of the contents, your biggest saving will be to not read the other 90%. This could be done by only reading the headers and seeking to the next header or to the wanted data block. But it all depends on the file format, which the OP did not show in detail.
Most of the time is probably spent accessing the disk, so perhaps buying an SSD is sensible. (Whatever you do, your application is I/O-bound.)
Apparently, your file is only about 100 MB. You could get it into the (kernel) file cache just by reading it, e.g. with cat yourfile > /dev/null before running your program. For such a small file (on a reasonable machine it fits in RAM), I wouldn't worry that much.
You could pre-process the file, e.g. to make a database (for sqlite, or a real RDBMS like PostgreSQL) or just a gdbm indexed file.
If using <stdio.h>, you might use a bigger buffer with setvbuf (or the BSD setbuffer), or call fopen with an "rm" mode (the m is a GNU glibc extension asking to mmap the file).
You could use mmap with madvise.
You could (perhaps in a separate thread) use the readahead syscall.
But your file seems small enough that you should not bother that much. Are you sure it is really a performance issue? Do you read that file many thousands of times per day, or do you have many hundreds of such files?

Why is sequentially reading a large file row by row with mmap and madvise sequential slower than fgets?

Overview
I have a program bounded significantly by IO and am trying to speed it up.
Using mmap seemed to be a good idea, but it actually degrades the performance relative to just using a series of fgets calls.
Some demo code
I've squeezed the demos down to just the essentials, testing against an 800 MB file with about 3.5 million lines:
With fgets:
char buf[4096];
FILE *fp = fopen(argv[1], "r");
while(fgets(buf, 4096, fp) != 0) {
    // do stuff
}
fclose(fp);
return 0;
Runtime for the 800 MB file:
[juhani#xtest tests]$ time ./readfile /r/40/13479/14960
real 0m25.614s
user 0m0.192s
sys 0m0.124s
The mmap version:
struct stat finfo;
int fh, len;
char *mem;
char *row, *end;

if(stat(argv[1], &finfo) == -1) return 0;
if((fh = open(argv[1], O_RDONLY)) == -1) return 0;
mem = (char*)mmap(NULL, finfo.st_size, PROT_READ, MAP_SHARED, fh, 0);
if(mem == MAP_FAILED) return 0;
posix_madvise(mem, finfo.st_size, POSIX_MADV_SEQUENTIAL);
row = mem;
while((end = strchr(row, '\n')) != 0) {
    // do stuff
    row = end + 1;
}
munmap(mem, finfo.st_size);
close(fh);
Runtime varies quite a bit, but never faster than fgets:
[juhani#xtest tests]$ time ./readfile_map /r/40/13479/14960
real 0m28.891s
user 0m0.252s
sys 0m0.732s
[juhani#xtest tests]$ time ./readfile_map /r/40/13479/14960
real 0m42.605s
user 0m0.144s
sys 0m0.472s
Other notes
Watching the process run in top, the memory-mapped version generated a few thousand page faults along the way.
CPU and memory usage are both very low for the fgets version.
Questions
Why is this the case? Is it just because the buffered file access implemented by fopen/fgets is better than the aggressive prefetching of mmap with madvise POSIX_MADV_SEQUENTIAL?
Is there an alternative method of possibly making this faster (other than on-the-fly compression/decompression to shift the I/O load to the processor)? Looking at the runtime of 'wc -l' on the same file, I'm guessing this might not be the case.
POSIX_MADV_SEQUENTIAL is only a hint to the system and may be completely ignored by a particular POSIX implementation.
The difference between your two solutions is that mmap requires the file to be mapped into the virtual address space entirely, whereas fgets has the I/O done entirely in kernel space and just copies the pages into a buffer that doesn't change.
This also has more potential for overlap, since the IO is done by some kernel thread.
You could perhaps increase the perceived performance of the mmap implementation by having one (or more) independent threads reading the first byte of each page. This thread (or these threads) would then take all the page faults, and by the time your application thread came to a particular page, it would already be loaded.
Reading the man pages of mmap reveals that the page faults could be prevented by adding MAP_POPULATE to mmap's flags:
MAP_POPULATE (since Linux 2.5.46): Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults.
This way, a page-faulting pre-load thread (as suggested by Jens) becomes obsolete.
Edit:
First of all the benchmarks you make should be done with the page cache flushed to get meaningful results:
echo 3 | sudo tee /proc/sys/vm/drop_caches
Additionally: the MADV_WILLNEED advice with madvise will pre-fault the required pages in (same as POSIX_FADV_WILLNEED with fadvise). Currently, unfortunately, these calls block until the requested pages are faulted in, even if the documentation says otherwise. But there are kernel patches underway which queue the pre-fault requests into a kernel work queue to make these calls asynchronous as one would expect, making a separate read-ahead user-space thread obsolete.
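For concreteness, a sketch of the MAP_POPULATE variant (Linux-specific; error handling abbreviated, non-empty file assumed):

#define _GNU_SOURCE          /* for MAP_POPULATE on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd == -1 || fstat(fd, &st) == -1) return 1;

    /* MAP_POPULATE pre-faults the whole mapping up front, trading a longer
     * mmap() call for fault-free access afterwards. */
    char *mem = mmap(NULL, (size_t)st.st_size, PROT_READ,
                     MAP_SHARED | MAP_POPULATE, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    /* ... scan mem[0 .. st.st_size-1] without per-page fault stalls ... */

    munmap(mem, (size_t)st.st_size);
    close(fd);
    return 0;
}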
What you're doing - reading through the entire mmap space - is supposed to trigger a series of page faults. With mmap, the OS only lazily loads pages of the mmap'd data into memory (it loads them when you access them); the lazy loading is itself an optimization. Although you interface with mmap as if the entire thing were in RAM, it is not all in RAM - it is just a chunk set aside in virtual memory.
In contrast, when you read a file into a buffer, the OS pulls the entire requested data into RAM (into your buffer). This can apply a lot of memory pressure, crowding out other pages and forcing them to be written back to disk. It can lead to thrashing if you're low on memory.
A common optimization technique when using mmap is to page-walk the data into memory: loop through the mmap space, incrementing your pointer by the page size and accessing a single byte per page, triggering the OS to pull all of the mapping's pages into memory through those page faults. This technique "primes the RAM", pulling the mapping in and readying it for future use, so the OS won't need to do as much lazy loading later. You can do this on a separate thread to load the pages in ahead of your main thread's access - just make sure you don't run out of RAM or get too far ahead of the main thread, or you'll actually begin to degrade performance.
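A sketch of such a page-walk (the volatile read keeps the compiler from optimizing the touches away):

#include <stddef.h>
#include <unistd.h>

/* Touch one byte per page so the OS faults the whole mapping in now,
 * rather than lazily on first real access. */
void prime_pages(const char *mem, size_t length) {
    long page = sysconf(_SC_PAGESIZE);
    volatile char sink;
    for (size_t off = 0; off < length; off += (size_t)page)
        sink = mem[off];
    (void)sink;
}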
What is the difference between page walking w/ mmap and read() into a large buffer? That's kind of complicated.
Older versions of UNIX, and some current versions, don't always use demand-paging (where the memory is divided up into chunks and swapped in / out as needed). Instead, in some cases, the OS uses traditional swapping - it treats data structures in memory as monolithic, and the entire structure is swapped in / out as needed. This may be more efficient when dealing with large files, where demand-paging requires copying pages into the universal buffer cache, and may lead to frequent swapping or even thrashing. Swapping may avoid use of the universal buffer cache - reducing memory consumption, avoiding an extra copy operation and avoiding frequent writes. Downside is you can't benefit from demand-paging.
With mmap, you're guaranteed demand-paging; with read() you are not.
Also bear in mind that page-walking a full mmap space is typically around 60% slower than a flat-out read (not counting the effect of MADV_SEQUENTIAL or other optimizations).
One note when using mmap with MADV_SEQUENTIAL: when you use this, you must be absolutely sure your data IS stored sequentially, otherwise it will actually slow down the paging-in of the file by about 10x. Usually your data is not mapped to a contiguous section of the disk; it's written to blocks that are spread around the disk. So I suggest you be careful and look closely into this.
Remember, too much data in RAM will pollute the RAM, making page faults a lot more common elsewhere. One common misconception about performance is that CPU optimization is more important than memory footprint. Not true - the time it takes to travel to disk exceeds the time of CPU operations by something like 8 orders of magnitude, even with today's SSDs. Therefore, when program execution speed is a concern, memory footprint and utilization are far more important.
A nice thing about read() is the data can be stored on the stack (assuming the stack is large enough), which will further speed up processing.
Using read() with a streaming approach is a good alternative to mmap, if it fits your use case. This is roughly what you're doing with fgets/fputs (which are ultimately implemented on top of read/write). What you do is: in a loop, read into a buffer, process the data, then read in the next section, overwriting the old data. Streaming like this can keep your memory consumption very low and can be the most efficient way of doing I/O. The only downside is that you never have the entire file in memory at once, and it doesn't persist in memory - so it's a one-off approach. If you can use it, great, do it. If not, use mmap.
So whether read or mmap is faster... it depends on many factors. Testing is probably what you need to do. Generally speaking, mmap is nice if you plan on using the data for an extended period, where you will benefit from demand-paging, or if you just can't handle that amount of data in memory at once. read() is better if you are using a streaming approach - the data doesn't have to persist, or the data can fit in memory so memory pressure isn't a concern. Also, if the data won't be in memory for very long, read() may be preferable.
Now, with your current implementation - which is a sort of streaming approach - you are using fgets() and stopping on \n. Large bulk reads are more efficient than a million line-sized reads. You don't have to use a giant buffer - you don't want excess memory pressure (which can pollute your cache, among other things), and the system also has some internal buffering it uses. But you do want to be reading into a buffer of, let's say, 64 KB in size. You definitely don't want to be issuing a read line by line.
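Along those lines, a hedged sketch of bulk read() into a 64 KB buffer, scanning for newlines in user space (stitching of lines that straddle two chunks is omitted; see the note on buffer-stitching below):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) return 1;

    static char buf[64 * 1024];     /* one bulk read instead of one read per line */
    ssize_t n;
    long lines = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* count newlines in this chunk; a real parser would also carry
         * partial lines across chunk boundaries */
        for (char *p = buf; (p = memchr(p, '\n', (size_t)(buf + n - p))) != NULL; p++)
            lines++;
    }
    close(fd);
    printf("%ld lines\n", lines);
    return 0;
}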
You could multithread the parsing of that buffer. Just make sure the threads access data in different cache blocks - so find the size of the cache block, get your threads working on different portions of the buffer distanced by at least the cache block size.
Some more specific suggestions for your particular problem:
You might try reformatting the data into some binary format. For example, try changing the file encoding to a custom format instead of UTF-8 or whatever it is. That could reduce its size. 3.5 million lines is quite a lot of characters to loop through... it's probably ~150 million character comparisons that you are doing.
If you can sort the file by line length before the program runs, you can write an algorithm to parse the lines much more quickly: just increment a pointer and test the character you arrive at, making sure it's '\n'. Then do whatever processing you need to do.
You'll need to find a way to maintain the sorted file by inserting new data into appropriate places with this approach.
You can take this a step further - after sorting your file, maintain a list of how many lines of a given length are in the file. Use that to guide your parsing of lines - jump right to the end of each line w/out having to do character comparisons.
If you can't sort the file, just create a list of all the offsets from the start of each line to its terminating newline. 3.5 million offsets.
Write algorithms to update that list on insertion/deletion of lines from the file
When you get into file-processing algorithms such as this, it begins to resemble the implementation of a noSQL database. An alternative might just be to insert all this data into a noSQL database. It depends on what you need to do: believe it or not, sometimes the raw custom file manipulation & maintenance described above is faster than any database implementation, even noSQL databases.
A few more things:
When you use this streaming approach with read() you must take care to handle the edge cases - where you reach the end of one buffer, and start a new buffer - appropriately. That's called buffer-stitching.
Lastly, on most modern systems, when you use read() the data still gets stored in the page cache and then copied into your process - an extra copy operation. You can bypass that cache (e.g. with O_DIRECT on Linux) to speed up the I/O in certain cases where you're handling big files. Beware that this disables the caching of that data; but if the data is only in memory for a brief time, this doesn't matter.
The buffer cache is important, so find a way to re-enable it after the I/O is finished. Maybe disable it just for the particular process, do your I/O in a separate process, or something... I'm not sure about the details, but this is something that can be done.
I don't think that's actually your problem, though; I think it's the character comparisons - once you fix that, it should be fine.
That's the best I've got, maybe the experts will have other ideas.
Carry onward!

When should I use mmap for file access?

POSIX environments provide at least two ways of accessing files. There's the standard system calls open(), read(), write(), and friends, but there's also the option of using mmap() to map the file into virtual memory.
When is it preferable to use one over the other? What're their individual advantages that merit including two interfaces?
mmap is great if you have multiple processes accessing data in a read only fashion from the same file, which is common in the kind of server systems I write. mmap allows all those processes to share the same physical memory pages, saving a lot of memory.
mmap also allows the operating system to optimize paging operations. For example, consider two programs: program A, which reads a 1 MB file into a buffer created with malloc, and program B, which mmaps the 1 MB file into memory. If the operating system has to swap part of A's memory out, it must write the contents of the buffer to swap before it can reuse the memory. In B's case, any unmodified mmap'd pages can be reused immediately because the OS knows how to restore them from the existing file they were mmap'd from. (The OS can detect which pages are unmodified by initially marking writable mmap'd pages as read-only and catching seg faults, similar to the copy-on-write strategy.)
mmap is also useful for inter process communication. You can mmap a file as read / write in the processes that need to communicate and then use synchronization primitives in the mmap'd region (this is what the MAP_HASSEMAPHORE flag is for).
One place mmap can be awkward is if you need to work with very large files on a 32 bit machine. This is because mmap has to find a contiguous block of addresses in your process's address space that is large enough to fit the entire range of the file being mapped. This can become a problem if your address space becomes fragmented, where you might have 2 GB of address space free, but no individual range of it can fit a 1 GB file mapping. In this case you may have to map the file in smaller chunks than you would like to make it fit.
Another potential awkwardness with mmap as a replacement for read / write is that you have to start your mapping on offsets of the page size. If you just want to get some data at offset X you will need to fixup that offset so it's compatible with mmap.
And finally, read / write are the only way you can work with some types of files. mmap can't be used on things like pipes and ttys.
One area where I found mmap() not to be an advantage was when reading small files (under 16 KB). The overhead of page-faulting to read the whole file was very high compared with just doing a single read() system call. This is because the kernel can sometimes satisfy a read entirely in your time slice, meaning your code doesn't switch away. With a page fault, it seemed more likely that another program would be scheduled, making the file operation have a higher latency.
mmap has the advantage when you have random access on big files. Another advantage is that you access it with memory operations (memcpy, pointer arithmetic), without bothering with buffering. Normal I/O can sometimes be quite difficult when using buffers and you have structures bigger than your buffer; the code to handle that is often difficult to get right, whereas mmap is generally easier. That said, there are certain traps when working with mmap.
As people have already mentioned, mmap is quite costly to set up, so it is worth using only for a given size (varying from machine to machine).
For pure sequential accesses to the file, it is also not always the better solution, though an appropriate call to madvise can mitigate the problem.
You have to be careful with the alignment restrictions of your architecture (SPARC, Itanium); with read/write I/O, the buffers are often properly aligned and do not trap when dereferencing a casted pointer.
You also have to be careful not to access memory outside of the map. This can easily happen if you use string functions on your map and your file does not contain a \0 at the end. It will work most of the time, when your file size is not a multiple of the page size, because the last page is filled with 0 (the mapped area is always a multiple of your page size).
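One way to avoid that trap is to use length-bounded routines such as memchr instead of strchr; a small sketch:

#include <string.h>

/* Walk a mapped region line by line without relying on a trailing '\0'. */
void scan_lines(const char *mem, size_t size) {
    const char *row = mem, *end;
    /* memchr takes an explicit length, so it can never run past the map */
    while (row < mem + size &&
           (end = memchr(row, '\n', (size_t)(mem + size - row))) != NULL) {
        /* process the line row[0 .. end-row-1] */
        row = end + 1;
    }
    /* a final unterminated line, if any, spans row .. mem+size */
}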
In addition to the other nice answers, a quote from Linux System Programming, written by Google's expert Robert Love:
Advantages of mmap()
Manipulating files via mmap() has a handful of advantages over the standard read() and write() system calls. Among them are:
Reading from and writing to a memory-mapped file avoids the extraneous copy that occurs when using the read() or write() system calls, where the data must be copied to and from a user-space buffer.
Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory.
When multiple processes map the same object into memory, the data is shared among all the processes. Read-only and shared writable mappings are shared in their entirety; private writable mappings have their not-yet-COW (copy-on-write) pages shared.
Seeking around the mapping involves trivial pointer manipulations. There is no need for the lseek() system call.
For these reasons, mmap() is a smart choice for many applications.
Disadvantages of mmap()
There are a few points to keep in mind when using mmap():
Memory mappings are always an integer number of pages in size. Thus, the difference between the size of the backing file and an integer number of pages is "wasted" as slack space. For small files, a significant percentage of the mapping may be wasted. For example, with 4 KB pages, a 7 byte mapping wastes 4,089 bytes.
The memory mappings must fit into the process' address space. With a 32-bit address space, a very large number of various-sized mappings can result in fragmentation of the address space, making it hard to find large free contiguous regions. This problem, of course, is much less apparent with a 64-bit address space.
There is overhead in creating and maintaining the memory mappings and associated data structures inside the kernel. This overhead is generally obviated by the elimination of the double copy mentioned in the previous section, particularly for larger and frequently accessed files.
For these reasons, the benefits of mmap() are most greatly realized when the mapped file is large (and thus any wasted space is a small percentage of the total mapping), or when the total size of the mapped file is evenly divisible by the page size (and thus there is no wasted space).
Memory mapping has the potential for a huge speed advantage compared to traditional I/O. It lets the operating system read the data from the source file as the pages in the memory-mapped file are touched. This works by creating faulting pages, which the OS detects, and then the OS loads the corresponding data from the file automatically.
This works the same way as the paging mechanism and is usually optimized for high-speed I/O by reading data on system page boundaries and in page-sized units (usually 4K) - a size for which most file system caches are optimized.
An advantage that isn't listed yet is the ability of mmap() to keep a read-only mapping as clean pages. If one allocates a buffer in the process's address space, then uses read() to fill the buffer from a file, the memory pages corresponding to that buffer are now dirty since they have been written to.
Dirty pages cannot be dropped from RAM by the kernel. If there is swap space, then they can be paged out to swap. But this is costly, and on some systems, such as small embedded devices with only flash memory, there is no swap at all. In that case, the buffer will be stuck in RAM until the process exits, or perhaps gives it back with madvise().
mmap() pages that have not been written to are clean. If the kernel needs RAM, it can simply drop them and reuse the RAM the pages were in. If the process that had the mapping accesses it again, that causes a page fault and the kernel re-loads the pages from the file they came from originally, the same way they were populated in the first place.
This doesn't require more than one process using the mapped file to be an advantage.
