I'm randomly reading data (each read < page size) throughout a huge file (far too big to fit in memory).
I normally set MADV_DONTNEED, but looking at the docs + info it seems I instead need FADV_NOREUSE.
I'm not really getting how madvise() and fadvise() work together. Are they synonymous? Does it matter if I prefer one or the other? Can they be used together? Are they different kernel subsystems? Is FADV_NOREUSE what I'm looking for to gain optimal performance?
madvise() and posix_fadvise() are not synonymous.
madvise() tells the kernel (give advise) what to do with existing memory region while fadvise() tells the kernel what to do with cached (or future cache) of a file data.
For example, if you mmap() anonymous region you should use madvise() to hint the kernal not to swap out (MADV_RANDOM) or to swap out only after access. (MADV_SEQUENTIAL)
If you mmap() a file, or part of a file, you can use either madvise() or fadvise() to hint the kernel to readahead for you (MADV_WILLNEED) or to free that cache (MADV_DONTNEED) or to free after access (POSIX_FADV_NOREUSE, fadvise() only) in additional to the above.
If you use file without mapping the data to your process memory (without using mmap()), you should use fadvise() only. madvise() has no meaning.
As far as kernel subsystem, in linux, it is the same subsystem, simply different ways to refer to memory pages and file cache. Please note that those are only hints and when memory is in dire, the kernel might decide to swap out or reuse cached data despite the hint. Only mlock() and mlockall() can prevent that.
In your case, not giving any hint may help, especially if some pages are being read more than other, since the kernel will figure out which pages are "hot" and will attempt to keep in memory.
If you are only reading from a file then you actually don't need either. The paging daemon will automatically free RAM pages that are associated with non-dirty or shared file-backed mappings. If you keep calling madvise/MADV_DONTNEED then you are specifically instructing the kernel to do this. Which may cause performance impact if you access the same page by chance again in the near future.
fadvise is only useful if you access your file with read/lseek. For mmapped pages it has no effect.
Related
How to have an isolated part of memory, that is NOT at all backed to any file or extra management layers such as piping and that can be shared between two dedicated processes on the same Windows machine?
Majority of articles point me into the direction of CreateFileMapping. Let's start from there:
How does CreateFileMapping with hFile=INVALID_HANDLE_VALUE actually work?
According to
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366537(v=vs.85).aspx
it
"...creates a file mapping object of a specified size that is backed by the system paging file instead of by a file in the file system..."
Assume I write something into the memory which is mapped by CreateFileMapping with hFile=INVALID_HANDLE_VALUE. Under which conditions will this content be written to the page file on disk?
Also my understanding of what motivates the use of shared memory is to keep performance up and optimized. Why is the article "Creating Named Shared Memory"
(https://msdn.microsoft.com/de-de/library/windows/desktop/aa366551(v=vs.85).aspx) refering to
CreateFileMapping, if there is not a single attribute combination, that would prevent writing to files, e.g. the page file?
Going back to the original question: I am afraid, that CreateFileMapping is not good enough... So what would work?
You misunderstand what it means for memory to be "backed" by the system paging file. (Don't feel bad; Raymond Chen has described the text you quoted from MSDN as "one of the most misunderstood sentences in the Win32 documentation.") Almost all of the computer's memory is "backed" by something on disk; only the "non-paged pool", used exclusively by the kernel and as little as possible, isn't. If a page isn't backed by an ordinary named file, then it's backed by the system paging file. The operating system won't write pages out to the system paging file unless it needs to, but it can if it does need to.
This architecture is intended to ensure that processes can be completely "paged out" of RAM when they have nothing to do. This used to be much more important than it is nowadays, but it's still valuable; a typical Windows desktop will have dozens of processes "idle" waiting for events (e.g. needing to spool a print job) that may never happen. Those processes can get paged out and the memory can be put to more constructive use.
CreateFileMapping with hfile=INVALID_HANDLE_VALUE is, in fact, what you want. As long as the processes sharing the memory are actively doing stuff with it, it will remain resident in RAM and there will be no performance problem. If they go idle, yeah, it may get paged out, but that's fine because they're not doing anything with it.
You can direct the system not to page out a chunk of memory; that's what VirtualLock is for. But it's meant to be used for small chunks of memory containing secret information, where writing it to the page file could conceivably leak the secret. The MSDN page warns you that "Each version of Windows has a limit on the maximum number of pages a process can lock. This limit is intentionally small to avoid severe performance degradation."
My questions are as follows:
I mmap(memory mapping) a file into the virtual memory space.
When I access the first byte of the file using a pointer at the first time, the OS will try to access the data in memory, but it will fails and raises the page fault, because the data doesn't present in memory now. So the OS will swap the data from disk into memory. Finally my access will success.
(question is coming)
When I modify the data(in-memory) and write back into disk file, how could I just free the physical memory for other using, but remain virtual memory for fetching the data back into memory as needed?
It sounds like the page-out and page-in behaviors where the OS know the memory is exhaust, it will swap the LRU(or something like that) memory page into disk(swap files) and free the physical memory for other process, and fetch the evicted data back into memory as needed. But this mechanism is controlled by OS.
For some reasons, I need to control the page-out and page-in behaviors by myself. So how should I do? Hack the kernel?
You can use the madvise system call. Its behaviour is affected by the advice argument; there are many choices for advice and the optimal one should be picked based on the specifics of your application.
The flag MADV_DONTNEED means that the given range of physical backing frames should be unconditionally freed (i.e. paged out). Also:
After a successful MADV_DONTNEED operation, the semantics of
memory access in the specified region are changed: subsequent
accesses of pages in the range will succeed, but will result
in either repopulating the memory contents from the up-to-date
contents of the underlying mapped file (for shared file
mappings, shared anonymous mappings, and shmem-based
techniques such as System V shared memory segments) or zero-
fill-on-demand pages for anonymous private mappings.
This could be useful if you're absolutely certain that it will be very long until you access the same position again.
However it might not be necessary to force the kernel to actually page out; instead another possibility, if you're accessing the mapping sequentially is to use madvise with MADV_SEQUENTIAL to tell kernel that you'd access your memory mapping mostly sequentially:
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
or MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
These are not as aggressive as explicitly calling MADV_DONTNEED to page out. (Of course you can combine these with MADV_DONTNEED as well)
In recent kernel versions there is also the MADV_FREE flag which will lazily free the page frames; they will stay mapped in if enough memory is available, but are reclaimed by the kernel if the memory pressure grows.
You can checout mlock+munlock to lock/unlock the pages. This will give you control over pages being swapped out.
You need to have CAP_IPC_LOCK capability to perform this operation though.
In the case that memory is allocated and its known that it (almost certainly / probably) won't be used for a long time, it could be useful to tag this memory to be more aggressively moved into swap-space.
Is there some command to tell the kernel of this?
Failing that, it may be better to dump these out to temp files, but I was curious about the ability to send-to-swap (or something similar).
Of course if there is no swap-space, this would do nothing, and in that case writing temp files may be better.
You can use the madvise call to tell the kernel what you will likely be doing with the memory in the future. For example:
madvise(base, length, MADV_SOFT_OFFLINE);
tells the kernel that you wont need the memory in quesion any time soon, so it can be flushed to backing store (or just dropped if it was mapped from a file and is unchanged).
There's also MADV_DONTNEED which allows the kernel to drop the contents even if modified (so when you next access the memory, if you do, it might be zeroed or reread from the original mapped file).
The closest thing I can think of would be mmap see: Memory-mapped I/O. This does not write to the linux swap partition, but allows for paging (complete pages of memory) to disk for access. Temporary files and directories are also available with tempfile, mkstemp and mkdtemp, but again, this does not write to the swap partition, but instead it occurs on the normal filesystem.
Other than features similar to the above, I do not believe there is anything that allows direct access to the swap partition (other than exhausting system memory).
CONTEXT:
I run on an old laptop. I only just have 128Mo ram free on 512Mo total. No money to buy more ram.
I use mmap to help me circumvent this issue and it works quite well.
C code.
Debian 64 bits.
PROBLEM:
Besides all my efforts, I am running out of memory pretty quick right know and I would like to know if I could release the mmaped regions I read to free my ram.
I read that madvise could help, especially the option MADV_SEQUENTIAL.
But I don't quite understand the whole picture.
THE NEED:
To be able to free mmaped allocated memory after the region is read so that it doesn't fill my whole ram with large files. I will not read it soon so it is garbage to me. It is pointless to keep it in ram.
Update: I am not done with the file so don't want to call munmap. I have other stuffs to do with it but in another regions of it. Random reads.
For random read/write access to a mmap()ed file, MADV_SEQUENTIAL is probably not very useful (and may in fact cause undesired behavior). MADV_RANDOM or MADV_DONTNEED would be better options in this case. However, be aware that the kernel is free to ignore any madvise() - although in my understanding, Linux currently does not, as it tends to treat madvise() more as a command than an advisory...
Another option would be to mmap() only selected sections of the file as needed, and munmap() them as you're done with them, perhaps maintaining a pool of some small number of currently active mappings (i.e. mapping more than one region at once if needed, but still keeping it limited).
Or course you must free resources when you're done with them in order not to leak them and thus run out of available space too soon.
Not sure what the question is, if you know about mmap() then surely you know about munmap() too? It's right there on the same manual page.
I have mapped a file into memory using mmap. Now I would like to ensure that there will be no page faults when accessing this memory, i.e. I want to force the system to actually read the data from the harddisk and store it in RAM. I believe
that once the data is there, I can prevent swapping with mlockall. But what is the proper way to get the system to load the data?
I could obviously just do dummy reads of all the pages, but this seems like an ugly hack. Also, I don't want to worry about the compiler being too smart and optimizing away the dummy reads.
Any suggestions?
Why do you think mlock() or mlockall() wouldn't work? Guaranteeing that the affected pages are in RAM is exactly what its purpose is. Quoting from the manpage:
All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
You can use other methods like madvise() to ask for the pages to be loaded into RAM but it's not guaranteed the kernel will comply with that and it's not guaranteed that they will stay in RAM even if the kernel does bring them in. I believe mmap(MAP_POPULATE) also doesn't guarantee that the pages will stay in RAM.
You're looking for MAP_POPULATE.