mmapping large files (for persistent large arrays) - C

I'm implementing persistent large constant arrays via mmap. Are there any tips, tricks, or gotchas one should be aware of when using mmap?

All pointers stored inside the mmap'd region should be stored as offsets from the base of the region, not as real pointers! You won't necessarily get the same base address the next time you mmap the region on the next run of the program. (I have had to clean up code that made incorrect assumptions about mmap region base address constancy.)
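For illustration, a minimal sketch of that offset idiom; the record type and field names here are hypothetical:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical record stored inside the mapped region: instead of
     * a real `struct node *` pointer, keep the target's offset from
     * the mapping base.  Offset 0 can serve as "null" as long as the
     * base itself is never a link target. */
    struct node {
        int32_t  value;
        uint64_t next_off;   /* offset of next node from base, 0 = none */
    };

    /* Turn a stored offset back into a pointer against this run's base. */
    static inline struct node *node_at(void *base, uint64_t off)
    {
        return off ? (struct node *)((char *)base + off) : NULL;
    }

    /* Turn a pointer inside the region into a stable offset. */
    static inline uint64_t off_of(void *base, struct node *n)
    {
        return n ? (uint64_t)((char *)n - (char *)base) : 0;
    }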

This is the most straightforward use case for mmap(), so there shouldn't be much to trip you up.
You are effectively just loading a large constant array. Since the data is constant, you shouldn't need to worry about synchronization. It would be advisable to set the prot parameter to PROT_READ only, since you won't be writing.
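A minimal sketch of such a read-only mapping (error handling abbreviated; the file name and element type are placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("constants.bin", O_RDONLY);   /* placeholder path */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Read-only, private mapping of the whole file. */
        const double *table = mmap(NULL, st.st_size, PROT_READ,
                                   MAP_PRIVATE, fd, 0);
        if (table == MAP_FAILED) { perror("mmap"); return 1; }
        close(fd);              /* the mapping survives the close */

        printf("first entry: %f\n", table[0]);
        munmap((void *)table, st.st_size);
        return 0;
    }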
If one or more programs using the constants will be run repeatedly, it might be worthwhile to have a separate program that loads the data and keeps it resident. Runs of the other programs then essentially just do a shared-memory attach rather than repeatedly reading the file into memory.

Make sure you check for restrictions on open file size or memory usage. On Linux there is a built-in shell command, ulimit. Run ulimit -a to see the current settings.
Flush writes to the in-memory array out to the file with the msync(2) syscall; otherwise they may stay in memory until munmap(2), and a power outage or crash before then could lose them!
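A sketch of that flush, assuming base and length are the values used with mmap:

    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>

    /* Push dirty pages of a writable MAP_SHARED mapping out to the
     * backing file.  MS_SYNC blocks until write-back completes;
     * MS_ASYNC merely schedules it. */
    static int flush_mapping(void *base, size_t length)
    {
        if (msync(base, length, MS_SYNC) < 0) {
            perror("msync");
            return -1;
        }
        return 0;
    }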
If multiple processes are mmap'ing the same region shared with read and write privileges, make sure that only one writes to it at a time to avoid corrupting your data, or use file locking or some other means of synchronization.

Related

Read mmapped data memory efficient

I want to mmap a big file into memory and parse it sequentially. As I understand it, once bytes have been lazily read into memory, they stay there. Is there a way to periodically tell the system to release the previously read contents?
This understanding is only a superficial view.
To understand what really happens, you have to take into account the difference between your process's virtual memory and the machine's actual physical memory. Mapping a huge file means reserving space in your virtual address space. It's probably platform-dependent whether anything is actually read at this point.
When you actually access the data, the OS has to fill an actual page of memory. When you access other parts, those parts have to be brought into memory too. It's completely up to the OS when it will reuse that memory. Normally this happens when some data is accessed by you or another process and no free memory is available, but it could happen at any time. If you access the same data again later, it might still be in memory, or it will be brought back by the OS; there is no way for your process to tell the difference.
In short: you don't need to care about that. The OS manages it all in the background.
One point might be that if you map a really huge file, this takes up space in your virtual address space, which is limited. So if you deal with many huge mappings and/or huge allocations, you might want to map only parts of the file at a given time.
ADDITION: after thinking a bit about it, I came up with a reason why it might be smarter to process the file blockwise-sequentially, although I doubt you would be able to measure the difference.
Any reasonable OS will look for a block to unload, when in need, in something like the following order:
1. unmapped files (not needed anymore)
2. LRU unmodified mapped file (can be retrieved from disk)
3. LRU modified mapped file (same as 2, but needs to be written back to disk before unload)
4. LRU allocated memory (needs to be written to swap)
So by unmapping blocks known to be never needed again as you go, you give the OS a hint that they can be freed earlier. This gives data that has been used less recently, but might be accessed again in the future, a bigger chance of staying in memory.
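A sketch of that blockwise-sequential pattern; the window size and the process_block callback are hypothetical:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/types.h>

    /* Map one window at a time, parse it, and unmap it immediately,
     * hinting to the OS that the block will not be needed again.
     * The window size must be a multiple of the page size. */
    static void parse_file(int fd, off_t file_size,
                           void (*process_block)(const char *, size_t))
    {
        const off_t window = 64 * 1024 * 1024;      /* 64 MiB windows */
        for (off_t off = 0; off < file_size; off += window) {
            size_t len = (size_t)(file_size - off < window
                                  ? file_size - off : window);
            char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
            if (p == MAP_FAILED) { perror("mmap"); return; }
            process_block(p, len);
            munmap(p, len);                 /* done with this window */
        }
    }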

How to selectively put memory into swap? (Linux)

In the case that memory is allocated and it's known that it (almost certainly / probably) won't be used for a long time, it could be useful to tag this memory to be moved more aggressively into swap space.
Is there some command to tell the kernel of this?
Failing that, it may be better to dump the data out to temp files, but I was curious about the ability to send it to swap (or something similar).
Of course, if there is no swap space this would do nothing, and in that case writing temp files may be better.
You can use the madvise call to tell the kernel what you will likely be doing with the memory in the future. For example:
madvise(base, length, MADV_PAGEOUT);    /* Linux 5.4 and later */
tells the kernel that you won't need the memory in question any time soon, so it can be reclaimed: flushed to backing store (or just dropped if it was mapped from a file and is unchanged).
There's also MADV_DONTNEED which allows the kernel to drop the contents even if modified (so when you next access the memory, if you do, it might be zeroed or reread from the original mapped file).
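A runnable sketch of the madvise() hint on an anonymous mapping (MADV_PAGEOUT assumes Linux 5.4 or later):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 16 * 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        memset(p, 'x', len);            /* dirty the pages */

        /* Ask the kernel to reclaim the range now; the pages go to
         * swap (if any) and fault back in on the next access. */
        if (madvise(p, len, MADV_PAGEOUT) < 0)
            perror("madvise");

        munmap(p, len);
        return 0;
    }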
The closest thing I can think of would be mmap; see Memory-mapped I/O. This does not write to the Linux swap partition, but it allows complete pages of memory to be paged out to disk. Temporary files and directories are also available with tmpfile, mkstemp, and mkdtemp, but again, these do not write to the swap partition; the data goes to the normal filesystem.
Other than features similar to the above, I do not believe there is anything that allows direct access to the swap partition (other than exhausting system memory).

When would you use mmap

So, I understand that if you need some dynamically allocated memory, you can use malloc(). For example, your program reads a variable-length file into a char[]. You don't know in advance how big to make the array, so you allocate the memory at runtime.
I'm trying to understand when you would use mmap(). I have read the man page and to be honest, I don't understand what the use case is.
Can somebody explain a use case to me in simple terms? Thanks in advance.
mmap can be used for a few things. First, a file-backed mapping. Instead of allocating memory with malloc and reading the file, you map the whole file into memory without explicitly reading it. Now when you read from (or write to) that memory area, the operations act on the file, transparently. Why would you want to do this? It lets you easily process files that are larger than the available memory using the OS-provided paging mechanism. Even for smaller files, mmapping reduces the number of memory copies.
mmap can also be used for an anonymous mapping. This mapping is not backed by a file, and is basically a request for a chunk of memory. If that sounds similar to malloc, you are right. In fact, most implementations of malloc will internally use an anonymous mmap to provide a large memory area.
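A sketch of such an anonymous mapping used like a plain allocation:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1 << 20;           /* 1 MiB of zeroed memory */
        double *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (a == MAP_FAILED) { perror("mmap"); return 1; }

        a[0] = 3.14;                    /* use it like malloc'd memory */
        munmap(a, len);
        return 0;
    }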
Another common use case is to have multiple processes map the same file as a shared mapping to obtain a shared memory region. The file doesn't have to be actually written to disk. shm_open is a convenient way to make this happen.
Whenever you need to read/write blocks of data of a fixed size, it's much simpler (and faster) to simply map the data file on disk to memory using mmap and access it directly, rather than allocating memory, reading the file, accessing the data, potentially writing it back to disk, and freeing the memory.
Consider the famous producer-consumer problem: the producer creates a shared memory object using shm_open(), and, since our goal is to make the producer and consumer share data, we use the mmap syscall to map that shared memory region into the process's address space. The consumer can then open that shared memory object (shared memory objects are referred to by a "name") and mmap it into its own address space the same way, and read from it.
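A producer-side sketch of that pattern; the object name and size are placeholders, and the consumer performs the same shm_open/mmap with read-only flags (link with -lrt on older glibc):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello from the producer");  /* visible to consumers */

        munmap(p, 4096);
        close(fd);
        /* shm_unlink("/demo_shm") once the object is no longer needed */
        return 0;
    }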

How to copy a ram-based file to disk efficiently

I want to copy a large ram-based file (located in the /dev/shm directory) to the local disk. Is there some way to do the copy efficiently, instead of reading characters one by one or allocating another piece of memory? I can use only the C language here. Is there any way to put the memory file directly onto disk? Thanks!
I would mmap() the files and do memcpy() between them.
Thank you guys for the help! I made it work by mmap'ing the ram-based file and writing the entire block directly to the destination. memcpy was not used because I am actually writing to a parallel file system (PVFS), which does not support mmap operations.
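A sketch of that approach; the paths are placeholders, and a robust version would loop on short writes:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int in  = open("/dev/shm/tmp", O_RDONLY);
        int out = open("/tmp/out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(in, &st) < 0) { perror("fstat"); return 1; }

        /* Map the ram-based source and hand the whole block to write(). */
        void *src = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, in, 0);
        if (src == MAP_FAILED) { perror("mmap"); return 1; }

        if (write(out, src, st.st_size) != (ssize_t)st.st_size) {
            perror("write"); return 1;
        }

        munmap(src, st.st_size);
        close(in);
        close(out);
        return 0;
    }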
/dev/shm is shared memory, so one way to copy it would be to open it as shared memory, but frankly I don't think you will gain anything.
When writing your memory file to disk, the bottleneck will be the disk.
Just be sure to write the data in big chunks, and you should be fine.
You can just copy it like any other file:
cp /dev/shm/tmp ~/tmp
So, a quick, simple way is to issue a cp command via system().
You could try to see if the splice system call works for this. I'm not sure if it will, since it has some restrictions on the types of files it can work with, but if it does work, you would call it repeatedly with page-sized (or some multiple of the page size) requests until it finished, and the kernel would handle it very efficiently.
If this doesn't work you'll need to do either mmap or do plain old read/write.
Reading and writing in memory-page-sized chunks makes things much more efficient. It can be even more efficient if your buffers are page-aligned, since that opens up the opportunity for the kernel to just move the data to/from your process's memory via memory-management trickery rather than actually copying it around.
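If splice() does apply here, the loop might look like this sketch; since splice cannot copy file-to-file directly, the data is shovelled through a pipe (Linux-specific, and whether it beats plain read/write depends on the kernel and the target filesystem):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>

    static int splice_copy(int in_fd, int out_fd, size_t len)
    {
        int p[2];
        if (pipe(p) < 0)
            return -1;

        while (len > 0) {
            /* file -> pipe */
            ssize_t n = splice(in_fd, NULL, p[1], NULL, len,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (n <= 0)
                break;
            /* pipe -> file */
            if (splice(p[0], NULL, out_fd, NULL, (size_t)n,
                       SPLICE_F_MOVE | SPLICE_F_MORE) != n)
                break;
            len -= (size_t)n;
        }
        close(p[0]);
        close(p[1]);
        return len == 0 ? 0 : -1;
    }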
The only thing you can do is read() in page-aligned, page-sized chunks. I'm assuming you need to guarantee the data is written, which is going to mean bypassing buffers via posix_fadvise() or using O_DIRECT (I typically use posix_fadvise(), but O_DIRECT is appropriate here).
In this case, the speed of the media being written to alone dictates how quickly this will happen.
If you don't need to bypass buffers, the operation will complete faster, but there's no guarantee that the data will actually be written in the event of a reboot / power outage / etc. Since the source of the data is in shared memory, I'm (again) guessing you want the write to be guaranteed.
The only thing you can optimize is how long it takes read() to get the data from shared memory into your own address space, and page-aligned chunks will improve that.
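A sketch of the aligned read/write loop with the posix_fadvise() cache drop described above:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Copy src_fd to dst_fd in page-aligned, page-sized chunks, then
     * force the data to the media and drop the cached pages. */
    static int copy_aligned(int src_fd, int dst_fd)
    {
        long pagesz = sysconf(_SC_PAGESIZE);
        void *buf;
        if (posix_memalign(&buf, (size_t)pagesz, (size_t)pagesz) != 0)
            return -1;

        ssize_t n;
        while ((n = read(src_fd, buf, (size_t)pagesz)) > 0) {
            if (write(dst_fd, buf, (size_t)n) != n) { free(buf); return -1; }
        }
        free(buf);
        if (n < 0)
            return -1;

        if (fsync(dst_fd) < 0)      /* guarantee the data is written */
            return -1;
        posix_fadvise(dst_fd, 0, 0, POSIX_FADV_DONTNEED);
        return 0;
    }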

When should I use mmap for file access?

POSIX environments provide at least two ways of accessing files. There's the standard system calls open(), read(), write(), and friends, but there's also the option of using mmap() to map the file into virtual memory.
When is it preferable to use one over the other? What're their individual advantages that merit including two interfaces?
mmap is great if you have multiple processes accessing data in a read only fashion from the same file, which is common in the kind of server systems I write. mmap allows all those processes to share the same physical memory pages, saving a lot of memory.
mmap also allows the operating system to optimize paging operations. For example, consider two programs: program A, which reads a 1 MB file into a buffer created with malloc, and program B, which mmaps the 1 MB file into memory. If the operating system has to swap part of A's memory out, it must write the contents of the buffer to swap before it can reuse the memory. In B's case, any unmodified mmap'd pages can be reused immediately because the OS knows how to restore them from the existing file they were mmap'd from. (The OS can detect which pages are unmodified by initially marking writable mmap'd pages as read-only and catching seg faults, similar to a copy-on-write strategy.)
mmap is also useful for inter process communication. You can mmap a file as read / write in the processes that need to communicate and then use synchronization primitives in the mmap'd region (this is what the MAP_HASSEMAPHORE flag is for).
One place mmap can be awkward is if you need to work with very large files on a 32 bit machine. This is because mmap has to find a contiguous block of addresses in your process's address space that is large enough to fit the entire range of the file being mapped. This can become a problem if your address space becomes fragmented, where you might have 2 GB of address space free, but no individual range of it can fit a 1 GB file mapping. In this case you may have to map the file in smaller chunks than you would like to make it fit.
Another potential awkwardness with mmap as a replacement for read / write is that you have to start your mapping on an offset that is a multiple of the page size. If you just want to get some data at offset X, you will need to fix up that offset so it's compatible with mmap.
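A sketch of that fixup: round the requested offset down to a page boundary and re-apply the delta after mapping:

    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Map `len` bytes at an arbitrary file offset x.  Returns a pointer
     * to the data; *map_base and *map_len receive what to pass to
     * munmap later. */
    static void *map_at(int fd, off_t x, size_t len,
                        void **map_base, size_t *map_len)
    {
        long pagesz = sysconf(_SC_PAGESIZE);
        off_t aligned = x & ~((off_t)pagesz - 1);   /* round down */
        size_t delta = (size_t)(x - aligned);

        void *base = mmap(NULL, len + delta, PROT_READ, MAP_PRIVATE,
                          fd, aligned);
        if (base == MAP_FAILED)
            return NULL;

        *map_base = base;
        *map_len  = len + delta;
        return (char *)base + delta;    /* the data at offset x */
    }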
And finally, read / write are the only way you can work with some types of files. mmap can't be used on things like pipes and ttys.
One area where I found mmap() not to be an advantage was when reading small files (under 16 KB). The overhead of page-faulting to read the whole file was very high compared with just doing a single read() system call. This is because the kernel can sometimes satisfy a read entirely within your time slice, meaning your code doesn't get switched away. With a page fault, it seemed more likely that another program would be scheduled, giving the file operation higher latency.
mmap has the advantage when you have random access on big files. Another advantage is that you access the data with memory operations (memcpy, pointer arithmetic) without bothering with buffering. Normal I/O can be quite difficult when you have structures bigger than your buffer; the code to handle that is often hard to get right, while mmap is generally easier. That said, there are certain traps when working with mmap.
As people have already mentioned, mmap is quite costly to set up, so it is generally worth using only above a certain size (which varies from machine to machine).
For pure sequential accesses to the file, it is also not always the better solution, though an appropriate call to madvise can mitigate the problem.
You have to be careful with the alignment restrictions of your architecture (SPARC, Itanium); with read/write I/O the buffers are often properly aligned and do not trap when dereferencing a cast pointer.
You also have to be careful not to access outside of the map. This can easily happen if you use string functions on your map and your file does not contain a \0 at the end. It will work most of the time, when your file size is not a multiple of the page size, because the last page is filled with 0 (the mapped area is always a multiple of your page size).
In addition to other nice answers, a quote from Linux System Programming, written by Google's expert Robert Love:
Advantages of mmap()
Manipulating files via mmap() has a handful of advantages over the standard read() and write() system calls. Among them are:
Reading from and writing to a memory-mapped file avoids the extraneous copy that occurs when using the read() or write() system calls, where the data must be copied to and from a user-space buffer.
Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory.
When multiple processes map the same object into memory, the data is shared among all the processes. Read-only and shared writable mappings are shared in their entirety; private writable mappings have their not-yet-COW (copy-on-write) pages shared.
Seeking around the mapping involves trivial pointer manipulations. There is no need for the lseek() system call.
For these reasons, mmap() is a smart choice for many applications.
Disadvantages of mmap()
There are a few points to keep in mind when using mmap():
Memory mappings are always an integer number of pages in size. Thus, the difference between the size of the backing file and an integer number of pages is "wasted" as slack space. For small files, a significant percentage of the mapping may be wasted. For example, with 4 KB pages, a 7 byte mapping wastes 4,089 bytes.
The memory mappings must fit into the process's address space. With a 32-bit address space, a very large number of various-sized mappings can result in fragmentation of the address space, making it hard to find large free contiguous regions. This problem, of course, is much less apparent with a 64-bit address space.
There is overhead in creating and maintaining the memory mappings and associated data structures inside the kernel. This overhead is generally obviated by the elimination of the double copy mentioned in the previous section, particularly for larger and frequently accessed files.
For these reasons, the benefits of mmap() are most greatly realized when the mapped file is large (and thus any wasted space is a small percentage of the total mapping), or when the total size of the mapped file is evenly divisible by the page size (and thus there is no wasted space).
Memory mapping has the potential for a huge speed advantage compared to traditional I/O. It lets the operating system read the data from the source file as the pages in the memory-mapped file are touched. This works by creating faulting pages, which the OS detects, whereupon it loads the corresponding data from the file automatically.
This works the same way as the paging mechanism and is usually optimized for high-speed I/O by reading data on system page boundaries and sizes (usually 4 KB), a size for which most filesystem caches are optimized.
An advantage that hasn't been listed yet is mmap()'s ability to keep a read-only mapping as clean pages. If one allocates a buffer in the process's address space and then uses read() to fill the buffer from a file, the memory pages corresponding to that buffer are now dirty, since they have been written to.
Dirty pages cannot be dropped from RAM by the kernel. If there is swap space, they can be paged out to swap, but this is costly, and on some systems, such as small embedded devices with only flash memory, there is no swap at all. In that case the buffer will be stuck in RAM until the process exits, or perhaps gives it back with madvise().
Pages of an mmap() mapping that have not been written to are clean. If the kernel needs RAM, it can simply drop them and reuse the RAM they occupied. If the process that had the mapping accesses the pages again, that causes a page fault and the kernel reloads them from the file they originally came from, the same way they were populated in the first place.
This is an advantage even when only one process is using the mapped file.
