I want to mmap a big file into memory and parse it sequentially. As I understand it, once bytes have been lazily read into memory, they stay there. Is there a way to periodically tell the system to release the previously read contents?
This understanding is only a very superficial view.
To understand what really happens you have to take into account the difference between the virtual memory of your process and the actual physical memory of the machine. Mapping a huge file means reserving space in your virtual address space. Whether anything is actually read at this point is probably platform-dependent.
When you actually access the data, the OS has to fill an actual page of memory. When you access other parts, those parts have to be brought into memory as well. It's completely up to the OS when it will re-use the memory. Normally this happens when some data is accessed by you or another process and no free memory is available, but it could happen at any time. If you access the data again later, it might still be in memory or it will be brought back by the OS. There is no way for your process to tell the difference.
In short: You don't need to care about that. The OS manages all that in the background.
One point might be that mapping a really huge file takes up space in your virtual address space, which is limited. So if you deal with many huge mappings and/or huge allocations, you might want to map only parts of the file at any given time.
ADDITION: after thinking a bit about it, I came up with a reason why it might be smarter to do it blockwise-sequential. Although I doubt you will be able to measure that.
When it needs memory, any reasonable OS will look for a block to unload in something like the following order:
1. unmapped files (not needed anymore)
2. LRU unmodified mapped file (can be retrieved from disk)
3. LRU modified mapped file (same as 2, but needs to be updated on disk before unload)
4. LRU allocated memory (needs to be written to swap)
So by unmapping blocks that you know will never be used again as you go, you give the OS a hint that these should be freed earlier. This gives data that has been used less recently but might be accessed in the future a better chance of staying in memory.
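The blockwise-sequential idea could look roughly like the C sketch below. The function name sum_file_windowed and the byte-sum workload are illustrative assumptions, not something from the question; the only point is that each block is unmapped as soon as it has been consumed. The window size is assumed to be a multiple of the page size so that every mmap offset stays aligned.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum all bytes of a file while mapping at most `window` bytes at a time.
 * `window` must be a non-zero multiple of the page size.
 * Returns 0 and stores the sum in *out on success, -1 on error. */
int sum_file_windowed(const char *path, size_t window, uint64_t *out)
{
    assert(out != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || window == 0 || window % (size_t)page != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    for (off_t off = 0; off < st.st_size; off += (off_t)window) {
        size_t len = (size_t)(st.st_size - off);
        if (len > window)
            len = window;

        /* Map only the current block of the file. */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        for (size_t i = 0; i < len; i++)
            sum += p[i];

        /* This block is never needed again, so unmap it right away. */
        munmap(p, len);
    }

    close(fd);
    *out = sum;
    return 0;
}
```

A caller would pick a window such as a few megabytes (rounded to the page size); whether this is measurably better than mapping the whole file depends on the workload.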
Related
I'm trying to write a program that reads the memory of other processes. This seems quite straightforward, but I came across one problem. If the process I'm trying to read from has memory mapped but has not used that memory so far (e.g. the process called malloc, but never read from or wrote to that memory), the physical pages are not yet allocated. I presume this is lazy allocation.
Now, when my program tries to read that memory, it's immediately allocated. I would like to avoid that, as I noticed some programs seem to allocate (or map) very large amounts of memory that they don't normally use. This either causes very large memory consumption after the read or, in extreme cases, the process even being killed by the OOM killer because the mapped region was bigger than the available RAM. This is of course not acceptable, as my reader should ideally be transparent and not affect the process being examined.
So far I have only found information about /proc/pid/pagemap, but I'm not sure whether it suits my needs. There is a 'present' bit, but I did not find information on what exactly it means and whether this is what I'm looking for.
Can someone confirm that? Or is there another way of achieving the same goal?
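For reference, a minimal C sketch of querying that bit is shown below. It relies on the documented pagemap layout (one 64-bit entry per virtual page, bit 63 set when the page is present in RAM); the helper name page_present is hypothetical, and reading another process's pagemap is assumed to require suitable permissions. Reading the pagemap entry does not touch the target memory itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the page containing `vaddr` in process `pid` is present in
 * RAM, 0 if it is not, and -1 if the pagemap entry could not be read. */
int page_present(pid_t pid, uintptr_t vaddr)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/pagemap", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    uint64_t entry = 0;
    off_t off = (off_t)(vaddr / (uintptr_t)page) * (off_t)sizeof entry;
    ssize_t n = pread(fd, &entry, sizeof entry, off);
    close(fd);

    if (n != (ssize_t)sizeof entry)
        return -1;

    /* Bit 63: page present in RAM. */
    return (int)((entry >> 63) & 1);
}
```

A reader tool could call this per page and skip pages that report 0 instead of reading them.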
My questions are as follows:
I mmap(memory mapping) a file into the virtual memory space.
When I access the first byte of the file through a pointer for the first time, the access fails and raises a page fault, because the data is not present in memory yet. The OS then reads the data from disk into memory, and finally my access succeeds.
(question is coming)
When I modify the data (in memory) and write it back to the disk file, how can I free just the physical memory for other uses, while keeping the virtual mapping so the data can be fetched back into memory as needed?
This sounds like the page-out and page-in behavior where the OS, once memory is exhausted, swaps LRU (or similar) pages out to disk (swap files), frees the physical memory for other processes, and fetches the evicted data back into memory as needed. But this mechanism is controlled by the OS.
For some reasons, I need to control the page-out and page-in behavior myself. How should I do that? Hack the kernel?
You can use the madvise system call. Its behaviour is affected by the advice argument; there are many choices for advice and the optimal one should be picked based on the specifics of your application.
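The flag MADV_DONTNEED means that the physical frames backing the given range should be released immediately (they are dropped from the process's resident set; private anonymous contents are discarded rather than written to swap). Also: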
After a successful MADV_DONTNEED operation, the semantics of
memory access in the specified region are changed: subsequent
accesses of pages in the range will succeed, but will result
in either repopulating the memory contents from the up-to-date
contents of the underlying mapped file (for shared file
mappings, shared anonymous mappings, and shmem-based
techniques such as System V shared memory segments) or zero-
fill-on-demand pages for anonymous private mappings.
This could be useful if you're absolutely certain that it will be very long until you access the same position again.
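The zero-fill behaviour described in the quoted passage can be observed with a small C sketch like the one below. It assumes Linux semantics for MADV_DONTNEED on an anonymous private mapping; the function name byte_after_dontneed is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Writes a marker byte into a fresh anonymous private mapping, discards the
 * range with MADV_DONTNEED, and returns the byte read back afterwards.
 * Returns -1 if mmap or madvise fails. */
int byte_after_dontneed(size_t len)
{
    assert(len > 0);

    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0xAB;  /* fault the page in and modify it */

    if (madvise((void *)p, len, MADV_DONTNEED) != 0) {
        munmap((void *)p, len);
        return -1;
    }

    /* The access still succeeds, but the old contents are gone. */
    int value = p[0];

    munmap((void *)p, len);
    return value;
}
```

The modified byte is lost, which is why MADV_DONTNEED on private anonymous memory should only be applied to data that is truly disposable.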
However, it might not be necessary to force the kernel to actually page out. If you're accessing the mapping sequentially, another possibility is to use madvise with MADV_SEQUENTIAL to tell the kernel that you will access your memory mapping mostly sequentially:
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
or MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
These are not as aggressive as explicitly calling MADV_DONTNEED to page out. (Of course you can combine these with MADV_DONTNEED as well.)
Since Linux 4.5 there is also the MADV_FREE flag, which lazily frees the page frames of private anonymous mappings; they stay mapped in if enough memory is available, but are reclaimed by the kernel if memory pressure grows.
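Combining the sequential hint with explicit release of consumed chunks might look like the following C sketch. The function name count_newlines_releasing and the newline-counting workload are assumptions for illustration; both madvise calls are treated purely as hints, so their return values are ignored. The chunk size is assumed to be a multiple of the page size because madvise requires a page-aligned start address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts '\n' bytes in a file that is mapped as a whole, releasing each
 * chunk after it has been scanned. `chunk` must be a non-zero multiple of
 * the page size. Returns the count, or -1 on error. */
long count_newlines_releasing(const char *path, size_t chunk)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (chunk == 0 || chunk % (size_t)page != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    if (size == 0) {          /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    /* Hint: the mapping will be read front to back. */
    (void)madvise(base, size, MADV_SEQUENTIAL);

    long count = 0;
    for (size_t off = 0; off < size; off += chunk) {
        size_t len = size - off < chunk ? size - off : chunk;

        for (size_t i = 0; i < len; i++)
            if (base[off + i] == '\n')
                count++;

        /* This chunk will not be read again; let the kernel drop it. */
        (void)madvise(base + off, len, MADV_DONTNEED);
    }

    munmap(base, size);
    return count;
}
```

Because the mapping is read-only and file-backed, a dropped chunk is simply re-read from the file if it is ever touched again.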
You can check out mlock and munlock to lock and unlock pages. Locking prevents pages from being swapped out; it does not let you force pages out.
An unprivileged process can only lock memory up to its RLIMIT_MEMLOCK limit; beyond that you need the CAP_IPC_LOCK capability.
In the case that memory is allocated and it's known that it (almost certainly / probably) won't be used for a long time, it could be useful to tag this memory to be more aggressively moved into swap space.
Is there some command to tell the kernel of this?
Failing that, it may be better to dump these out to temp files, but I was curious about the ability to send-to-swap (or something similar).
Of course if there is no swap-space, this would do nothing, and in that case writing temp files may be better.
You can use the madvise call to tell the kernel what you will likely be doing with the memory in the future. For example:
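madvise(base, length, MADV_PAGEOUT);
tells the kernel that you won't need the memory in question any time soon, so it can be flushed to backing store (or just dropped if it was mapped from a file and is unchanged). MADV_PAGEOUT requires Linux 5.4 or later. (MADV_SOFT_OFFLINE is not a substitute: it is meant for testing memory-error handling, requires privileges, and only moves the contents to a different physical frame.)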
There's also MADV_DONTNEED which allows the kernel to drop the contents even if modified (so when you next access the memory, if you do, it might be zeroed or reread from the original mapped file).
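On Linux 5.4 and later, MADV_PAGEOUT is the advice value intended for asking the kernel to reclaim a range while preserving its contents. A hedged C sketch follows; the wrapper name hint_pageout is hypothetical, and the #ifdef is there on the assumption that older headers may not define the constant. On a system without swap, anonymous pages simply stay resident.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Asks the kernel to page out [base, base + len). The contents are kept:
 * a later access brings them back from swap or from the mapped file.
 * Returns 0 on success, -1 with errno set otherwise. */
int hint_pageout(void *base, size_t len)
{
    assert(base != NULL);
#ifdef MADV_PAGEOUT
    return madvise(base, len, MADV_PAGEOUT);
#else
    /* Headers too old to know the constant: report it as unsupported. */
    (void)len;
    errno = ENOSYS;
    return -1;
#endif
}
```

A failure here is harmless for correctness, since the call is only a hint; the caller can fall back to doing nothing or to writing the data to a temporary file.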
The closest thing I can think of would be mmap (see: Memory-mapped I/O). This does not write to the Linux swap partition, but it allows complete pages of memory to be paged to and from a file on disk. Temporary files and directories are also available with tempfile, mkstemp and mkdtemp, but again, these do not use the swap partition; they live on the normal filesystem.
Other than features similar to the above, I do not believe there is anything that allows direct access to the swap partition (other than exhausting system memory).
CONTEXT:
I run on an old laptop. I have only 128 MB of RAM free out of 512 MB total. No money to buy more RAM.
I use mmap to help me circumvent this issue and it works quite well.
C code.
Debian 64 bits.
PROBLEM:
Despite all my efforts, I am running out of memory pretty quickly right now, and I would like to know if I could release the mmapped regions I have already read to free my RAM.
I read that madvise could help, especially the option MADV_SEQUENTIAL.
But I don't quite understand the whole picture.
THE NEED:
To be able to free mmapped memory after the region has been read, so that large files don't fill my whole RAM. I will not read it again soon, so it is garbage to me. It is pointless to keep it in RAM.
Update: I am not done with the file, so I don't want to call munmap. I have other things to do with it, but in other regions of it. Random reads.
For random read/write access to a mmap()ed file, MADV_SEQUENTIAL is probably not very useful (and may in fact cause undesired behavior). MADV_RANDOM or MADV_DONTNEED would be better options in this case. However, be aware that the kernel is free to ignore any madvise() - although in my understanding, Linux currently does not, as it tends to treat madvise() more as a command than an advisory...
Another option would be to mmap() only selected sections of the file as needed, and munmap() them as you're done with them, perhaps maintaining a pool of some small number of currently active mappings (i.e. mapping more than one region at once if needed, but still keeping it limited).
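Mapping only a selected section requires the file offset passed to mmap to be page-aligned, so a helper has to round the offset down and remember the slack. The C sketch below shows one way to do that; struct region, region_map and region_unmap are hypothetical names, and pooling several such regions is left to the caller.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct region {
    void *base;                 /* page-aligned start of the mapping */
    size_t map_len;             /* length to pass to munmap */
    const unsigned char *data;  /* first byte the caller asked for */
};

/* Maps bytes [offset, offset + len) of `fd` read-only.
 * Returns 0 on success, -1 on error. */
int region_map(int fd, off_t offset, size_t len, struct region *r)
{
    assert(r != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || len == 0)
        return -1;

    /* mmap needs a page-aligned file offset: round down, keep the slack. */
    off_t aligned = offset - (offset % (off_t)page);
    size_t slack = (size_t)(offset - aligned);

    void *base = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    r->base = base;
    r->map_len = len + slack;
    r->data = (const unsigned char *)base + slack;
    return 0;
}

/* Releases a region obtained from region_map. */
int region_unmap(struct region *r)
{
    assert(r != NULL);
    return munmap(r->base, r->map_len);
}
```

A small fixed-size array of struct region entries, recycled in least-recently-used order, would be one simple way to keep the number of active mappings bounded.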
Of course you must free resources when you're done with them, in order not to leak them and thus run out of available space too soon.
Not sure what the question is, if you know about mmap() then surely you know about munmap() too? It's right there on the same manual page.
I have to write a simple key-value store for a very specific use. This store will be running in the same memory space as the process that uses it.
One requirement for this store is that it is kept in RAM, and it has to be as fast as possible. We haven't decided on a data structure yet, but we might use an LLRB tree.
How can I make sure that my data structure will always be kept in RAM? Not swapped, not paged, not cached somewhere else but exclusively in-memory.
If you use Linux, then check mlock()
mlock() and mlockall() respectively lock part or all of the calling
process's virtual address space into RAM, preventing that memory from
being paged to the swap area.
(man page)
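For a store like this, one option is to carve the data structure out of an arena that is locked as a whole. The C sketch below assumes Linux; locked_alloc and locked_free are hypothetical helper names, and the lock can fail if the requested size exceeds the process's RLIMIT_MEMLOCK limit, in which case the sketch returns NULL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Allocates `len` bytes of zeroed, page-aligned memory and locks it into
 * RAM so it cannot be swapped out. Returns NULL if either step fails. */
void *locked_alloc(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* mlock also faults the pages in, so they are resident on return. */
    if (mlock(p, len) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Unlocks and releases memory obtained from locked_alloc. */
void locked_free(void *p, size_t len)
{
    if (p == NULL)
        return;
    munlock(p, len);
    munmap(p, len);
}
```

The tree nodes would then be allocated from this arena by the store's own allocator; mlockall(MCL_CURRENT | MCL_FUTURE) is the coarser alternative that locks the entire address space.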