Force loading of mmap'ed pages - c

I have mapped a file into memory using mmap. Now I would like to ensure that there will be no page faults when accessing this memory, i.e. I want to force the system to actually read the data from the hard disk and store it in RAM. I believe that once the data is there, I can prevent swapping with mlockall. But what is the proper way to get the system to load the data?
I could obviously just do dummy reads of all the pages, but this seems like an ugly hack. Also, I don't want to worry about the compiler being too smart and optimizing away the dummy reads.
Any suggestions?

Why do you think mlock() or mlockall() wouldn't work? Guaranteeing that the affected pages are in RAM is exactly their purpose. Quoting from the manpage:
All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
You can use other methods like madvise() to ask for the pages to be loaded into RAM but it's not guaranteed the kernel will comply with that and it's not guaranteed that they will stay in RAM even if the kernel does bring them in. I believe mmap(MAP_POPULATE) also doesn't guarantee that the pages will stay in RAM.
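A minimal sketch of the mlock() approach for a file mapping. The helper name map_and_lock is made up for illustration, and the call can fail if the process's locked-memory limit is too low:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): map a whole file read-only
 * and lock it into RAM. Returns the mapping and stores its length in
 * *len_out, or returns NULL on failure. */
void *map_and_lock(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;

    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    /* On success, every page of the range is resident and stays resident
     * until munlock() or munmap(). Typical failures are ENOMEM or EPERM
     * when the locked-memory limit is exceeded. */
    if (mlock(p, len) < 0) {
        munmap(p, len);
        return NULL;
    }

    *len_out = len;
    return p;
}
```

The caller releases the memory with munmap(p, len), which also drops the lock.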

You're looking for MAP_POPULATE.
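As a rough sketch, MAP_POPULATE is passed in the flags argument of mmap(). It is Linux-specific, and population is best effort: mmap() does not fail if the pages cannot be prefaulted, and nothing stops the kernel from evicting them later. The helper name map_populated is an assumption for illustration:

```c
#define _GNU_SOURCE             /* MAP_POPULATE is Linux-specific */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and ask the kernel to
 * prefault the mapping (read the file ahead) during the mmap() call.
 * Returns the mapping and stores its length in *len_out, or NULL. */
void *map_populated(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = len;
    return p;
}
```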

Related

Read mmapped data memory efficient

I want to mmap a big file into memory and parse it sequentially. As I understand if bytes have been lazily read into memory once, they stay there. Is there a way to periodically tell the system to release the previously read contents?
This understanding is only a very superficial view.
To understand what really happens you have to take into account the difference of the virtual memory of your process and the actual real memory of the machine. Mapping a huge file means reserving space in your virtual address-space. It's probably platform-dependent if anything is already read at this point.
When you actually access the data, the OS has to fill an actual page of memory. When you access other parts, those parts have to be brought into memory too. It is completely up to the OS when it will reuse the memory. Normally this happens when you or another process accesses some data and no free memory is available, but it could happen at any time. If you access the data again later, it might still be in memory or it will be brought back by the OS. Your process normally cannot tell the difference.
In short: You don't need to care about that. The OS manages all that in the background.
One point might be that mapping a really huge file takes up space in your virtual address space, which is limited. So if you deal with many huge mappings and/or huge allocations, you might want to map only parts of the file at a given time.
ADDITION: after thinking a bit about it, I came up with a reason why it might be smarter to do it blockwise-sequential. Although I doubt you will be able to measure that.
Any reasonable OS will look for a block to unload when in need in something like the following order:
1. unmapped files (not needed anymore)
2. LRU unmodified mapped file (can be retrieved from disk)
3. LRU modified mapped file (same as 2, but needs to be updated on disk before unload)
4. LRU allocated memory (needs to be written to swap)
So by unmapping, as you go, the blocks that are known never to be used again, you give the OS a hint that these should be freed earlier. This will give data that has been used less recently but might be accessed in the future a bigger chance to stay in memory.
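The blockwise approach could look roughly like the following sketch, which sums the bytes of a file one window at a time. The function name, the byte-sum "parsing" and the 1 MiB window size are all assumptions for illustration:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed window size: 1 MiB, a multiple of the common page sizes,
 * so every window offset is page-aligned as mmap() requires. */
#define WINDOW_SIZE ((off_t)1 << 20)

/* Hypothetical example: process a file sequentially, mapping one window
 * at a time and unmapping it before moving on. Returns 0 on success. */
int sum_file_windowed(const char *path, uint64_t *sum_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    for (off_t off = 0; off < st.st_size; off += WINDOW_SIZE) {
        off_t remaining = st.st_size - off;
        size_t len = (size_t)(remaining < WINDOW_SIZE ? remaining : WINDOW_SIZE);

        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        for (size_t i = 0; i < len; i++)
            sum += p[i];

        munmap(p, len);             /* this window is never touched again */
    }

    close(fd);
    *sum_out = sum;
    return 0;
}
```

Only one window of address space is in use at any time, which also helps with the address-space concern mentioned above.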

Is it possible to control page-out and page-in by user programming? If yes then how?

My questions are as follows:
I mmap(memory mapping) a file into the virtual memory space.
When I access the first byte of the file through a pointer for the first time, the OS tries to access the data in memory, but this fails and raises a page fault, because the data is not present in memory yet. So the OS reads the data from disk into memory, and finally my access succeeds.
(question is coming)
When I modify the data (in memory) and write it back to the disk file, how can I free just the physical memory for other uses, but keep the virtual memory so that the data can be fetched back into memory as needed?
This sounds like the page-out and page-in behavior where the OS, knowing that memory is exhausted, swaps the LRU (or something like that) memory pages out to disk (swap files), frees the physical memory for other processes, and fetches the evicted data back into memory as needed. But this mechanism is controlled by the OS.
For some reasons, I need to control the page-out and page-in behavior myself. So how should I do that? Hack the kernel?
You can use the madvise system call. Its behaviour is affected by the advice argument; there are many choices for advice and the optimal one should be picked based on the specifics of your application.
The flag MADV_DONTNEED means that the physical frames backing the given range should be freed unconditionally. The contents are dropped rather than written to swap, so modifications in a private mapping are lost. Also:
After a successful MADV_DONTNEED operation, the semantics of memory access in the specified region are changed: subsequent accesses of pages in the range will succeed, but will result in either repopulating the memory contents from the up-to-date contents of the underlying mapped file (for shared file mappings, shared anonymous mappings, and shmem-based techniques such as System V shared memory segments) or zero-fill-on-demand pages for anonymous private mappings.
This could be useful if you're absolutely certain that it will be very long until you access the same position again.
However, it might not be necessary to force the kernel to actually page out. If you're accessing the mapping sequentially, another possibility is to use madvise with MADV_SEQUENTIAL to tell the kernel that you will access your memory mapping mostly sequentially:
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
or MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
These are not as aggressive as explicitly calling MADV_DONTNEED to page out. (Of course you can combine these with MADV_DONTNEED as well)
In recent kernel versions there is also the MADV_FREE flag which will lazily free the page frames; they will stay mapped in if enough memory is available, but are reclaimed by the kernel if the memory pressure grows.
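A small sketch of how these calls might be wrapped for a MAP_SHARED file mapping. Both helper names are hypothetical. The msync() call is there because the question asks for modified data to be written back before the memory is released. Note that for shared mappings the kernel may delay actually freeing the pages:

```c
#define _GNU_SOURCE             /* madvise() and MADV_* need a GNU/BSD feature macro */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper for a MAP_SHARED file mapping: write modified pages
 * back to the file, then tell the kernel the range is not needed for now.
 * The mapping stays valid; a later access faults the data back in from
 * the file. addr must be page-aligned. Returns 0 on success, -1 on error. */
int flush_and_release(void *addr, size_t len)
{
    if (msync(addr, len, MS_SYNC) < 0)
        return -1;
    return madvise(addr, len, MADV_DONTNEED);
}

/* Hypothetical helper: the gentler alternative, a hint that the range
 * will be read mostly in order. Returns 0 on success, -1 on error. */
int advise_sequential(void *addr, size_t len)
{
    return madvise(addr, len, MADV_SEQUENTIAL);
}
```

Do not use flush_and_release on a MAP_PRIVATE mapping that holds modified data: msync() does not write private changes to the file, and MADV_DONTNEED would then discard them.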
You can check out mlock + munlock to lock/unlock the pages. This will give you control over pages being swapped out.
On Linux, an unprivileged process can lock memory only up to its RLIMIT_MEMLOCK limit; you need the CAP_IPC_LOCK capability to go beyond that.

madvise: not understood

CONTEXT:
I run on an old laptop. I have only about 128 MB of RAM free out of 512 MB total. No money to buy more RAM.
I use mmap to help me circumvent this issue and it works quite well.
C code.
Debian 64 bits.
PROBLEM:
Despite all my efforts, I am running out of memory pretty quickly right now, and I would like to know if I could release the mmapped regions I have read to free my RAM.
I read that madvise could help, especially the option MADV_SEQUENTIAL.
But I don't quite understand the whole picture.
THE NEED:
To be able to free mmapped memory after a region has been read, so that large files don't fill my whole RAM. I will not read it again soon, so it is garbage to me. It is pointless to keep it in RAM.
Update: I am not done with the file, so I don't want to call munmap. I have other things to do with it, but in other regions of it. Random reads.
For random read/write access to a mmap()ed file, MADV_SEQUENTIAL is probably not very useful (and may in fact cause undesired behavior). MADV_RANDOM or MADV_DONTNEED would be better options in this case. However, be aware that the kernel is free to ignore any madvise() - although in my understanding, Linux currently does not, as it tends to treat madvise() more as a command than an advisory...
Another option would be to mmap() only selected sections of the file as needed, and munmap() them as you're done with them, perhaps maintaining a pool of some small number of currently active mappings (i.e. mapping more than one region at once if needed, but still keeping it limited).
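Mapping only a section of the file needs a little care, because the file offset passed to mmap() must be a multiple of the page size. A minimal sketch with made-up names (struct region, region_map, region_unmap, read_byte_at) that rounds the offset down and remembers the adjustment:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one mapped section of a file. */
struct region {
    void  *base;     /* page-aligned address returned by mmap() */
    size_t map_len;  /* length to pass to munmap() */
    char  *data;     /* points at the requested offset inside the mapping */
};

/* Map len bytes starting at an arbitrary file offset. The caller must make
 * sure the range lies inside the file. Returns 0 on success, -1 on error. */
int region_map(struct region *r, int fd, off_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || len == 0)
        return -1;

    off_t aligned = offset - (offset % page);   /* round down to a page */
    size_t delta = (size_t)(offset - aligned);

    void *p = mmap(NULL, len + delta, PROT_READ, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    r->base = p;
    r->map_len = len + delta;
    r->data = (char *)p + delta;
    return 0;
}

/* Unmap a section once it is no longer needed. */
int region_unmap(struct region *r)
{
    return munmap(r->base, r->map_len);
}

/* Usage example: read one byte at the given offset through a short-lived
 * mapping. Returns 0 on success, -1 on error. */
int read_byte_at(const char *path, off_t offset, char *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct region r;
    int rc = region_map(&r, fd, offset, 1);
    close(fd);
    if (rc < 0)
        return -1;

    *out = r.data[0];
    return region_unmap(&r);
}
```

A pool of active mappings could be built on top of this by keeping a small array of struct region and unmapping the least recently used entry when a new section is needed.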
Of course you must free resources when you're done with them in order not to leak them and thus run out of available space too soon.
Not sure what the question is, if you know about mmap() then surely you know about munmap() too? It's right there on the same manual page.

fadvise vs madvise? can I use both together?

I'm randomly reading data (each read < page size) throughout a huge file (far too big to fit in memory).
I normally set MADV_DONTNEED, but looking at the docs + info it seems I instead need FADV_NOREUSE.
I'm not really getting how madvise() and fadvise() work together. Are they synonymous? Does it matter if I prefer one or the other? Can they be used together? Are they different kernel subsystems? Is FADV_NOREUSE what I'm looking for to gain optimal performance?
madvise() and posix_fadvise() are not synonymous.
madvise() tells the kernel (gives advice) what to do with an existing memory region, while fadvise() tells the kernel what to do with the cached (or future cached) data of a file.
For example, if you mmap() an anonymous region you should use madvise() to hint the kernel not to swap out (MADV_RANDOM) or to swap out only after access (MADV_SEQUENTIAL).
If you mmap() a file, or part of a file, you can use either madvise() or fadvise() to hint the kernel to read ahead for you (MADV_WILLNEED), to free that cache (MADV_DONTNEED), or to free after access (POSIX_FADV_NOREUSE, fadvise() only), in addition to the above.
If you use file without mapping the data to your process memory (without using mmap()), you should use fadvise() only. madvise() has no meaning.
As for the kernel subsystem: in Linux it is the same subsystem, simply different ways to refer to memory pages and the file cache. Please note that those are only hints, and when memory is scarce the kernel might decide to swap out or reuse cached data despite the hint. Only mlock() and mlockall() can prevent that.
In your case, not giving any hint may help, especially if some pages are being read more than others, since the kernel will figure out which pages are "hot" and will attempt to keep them in memory.
If you are only reading from a file then you actually don't need either. The paging daemon will automatically free RAM pages that are associated with non-dirty or shared file-backed mappings. If you keep calling madvise with MADV_DONTNEED, you are specifically instructing the kernel to do this, which may hurt performance if you happen to access the same page again in the near future.
fadvise is only useful if you access your file with read/lseek. For mmapped pages it has no effect.
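For the read()-based case, a sketch of what the posix_fadvise() calls might look like. The helper names are made up, and whether dropping the cache after each read actually helps depends on the access pattern, as noted above:

```c
#define _GNU_SOURCE             /* for pread() and posix_fadvise() declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a file for random read() access and hint that
 * readahead is not useful. Returns the descriptor, or -1 on error. */
int open_for_random_reads(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        /* offset 0 with length 0 means "the whole file". The hint is
         * optional, so its result is ignored here. */
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
    }
    return fd;
}

/* Hypothetical helper: read len bytes at offset, then hint that the cached
 * copy of that range will not be needed again. Returns the number of bytes
 * read, or -1 on error. */
ssize_t read_once(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t n = pread(fd, buf, len, offset);
    if (n > 0) {
        /* posix_fadvise() returns an error number instead of setting
         * errno; a failed hint is harmless, so it is ignored. */
        (void)posix_fadvise(fd, offset, n, POSIX_FADV_DONTNEED);
    }
    return n;
}
```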

Do mmap/mprotect-readonly zero pages count towards committed memory?

I want to keep virtual address space reserved in my process for memory that was previously used but is not presently needed. I'm interested in the situation where the host kernel is Linux and it's configured to prevent overcommit (which it does by detailed accounting for all committed memory).
If I just want to prevent the data that my application is no longer using from occupying physical memory or getting swapped to disk (wasting resources either way), I can madvise the kernel that it's unneeded, or mmap new zero pages over top of it. But neither of these approaches will necessarily reduce the amount of memory that counts as committed, which other processes are then prevented from using.
What if I replace the pages with fresh zero pages that are marked read-only? My intent is that they don't count towards committed memory, and further that I can later use mprotect to make them writable, and that it would fail if making them writable would go over the committed memory limit. Is my understanding correct? Will this work?
If you're not using the page (reading or writing to it), it won't be committed to your address space (only reserved).
But your address space is limited, so you can't play as you want/like with it.
See for example ElectricFence which may fail for large number of allocations, because of insertion of "nul page/guard page" (anonymous memory with no access).
Have a look at this thread: "mprotect() failed: Cannot allocate memory":
http://thread.gmane.org/gmane.comp.lib.glibc.user/538/focus=976052
On Linux, assuming overcommit has not been disabled, you can use the MAP_NORESERVE flag to mmap, which will ensure that the page in question will not be accounted as allocated memory prior to being accessed. If overcommit has been completely disabled, see below about multiple-mapping pages.
Note that Linux's behavior for zero pages has changed at times in the past; with some kernel versions, simply reading the page would cause it to be allocated. With others, a write is necessary. Note that the protection flags do not cause allocation directly; however they can prevent you from accidentally triggering an allocation. Therefore, for most reliable results you should avoid accessing the page at all by mprotecting with PROT_NONE.
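A sketch of the reserve-then-enable pattern described above, with hypothetical helper names. How the kernel accounts for such regions depends on the overcommit mode and kernel version, so treat this as an illustration of the calls rather than a guarantee about committed memory:

```c
#define _GNU_SOURCE             /* MAP_ANONYMOUS and MAP_NORESERVE */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve address space that cannot be accessed.
 * Returns the region, or NULL on failure. */
void *reserve_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: make part of a reserved region usable. addr must be
 * page-aligned. Returns 0 on success, -1 on error (for example ENOMEM). */
int commit_region(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Hypothetical helper: throw the contents away and go back to the reserved
 * state by mapping fresh inaccessible pages over the same addresses. */
int decommit_region(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

After decommit_region followed by commit_region, the range reads as zeros again, since the old pages were replaced rather than kept.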
As another, more portable option, you can map the same page at multiple locations. That is, create and open an empty temp file, unlink it, ftruncate to some reasonable number of pages, then mmap repeatedly at offset 0 into the file. This will absolutely guarantee the memory only counts once against your program's memory usage. You can even use MAP_PRIVATE to auto-reallocate it when you write to the page.
This may have higher memory usage than the MAP_NORESERVE technique (both for kernel tracking data, and for the pages of the temp file itself), however, so I would recommend using MAP_NORESERVE instead when available. If you do use this technique, try to make the region being mapped reasonably large (and put it in /dev/shm if on Linux, to avoid actual disk IO). Each individual mmap call will consume a certain amount of (non-swappable) kernel memory to track it, so it's good to keep that count down.
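The multiple-mapping technique might be sketched as follows. The helper names and the file name template are assumptions. The directory is a parameter so that /dev/shm can be used on Linux and another temp directory elsewhere:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temp file of len bytes in dir.
 * Returns an open descriptor, or -1 on error. */
int make_backing_file(const char *dir, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/backing-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file lives on while fd or mappings exist */

    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: map the start of the backing file one more time.
 * MAP_PRIVATE makes each mapping copy-on-write, so a write gets its own
 * page while untouched pages stay shared. Returns NULL on failure. */
void *map_backing(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Calling map_backing repeatedly on the same descriptor yields distinct address ranges that all start out backed by the same zero-filled file pages.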

Resources