How to make sure an allocated buffer is in memory - C

Let's say I allocate a buffer using mmap in C. Is there any Linux operation I can use to make sure that this buffer has been paged into memory and that there is an entry for it in the page table? I want this because I see some page faults in my application even though my memory is much larger than the application requires. I am using CentOS 7.

Pass MAP_POPULATE as a flag to the mmap call. That's precisely what it's there for. It won't guarantee the pages don't page out later under memory pressure, but it will page them in at mmap time if possible. Quoting the man page:
MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults. MAP_POPULATE is supported for private mappings only since Linux 2.6.23.
If you really want to force the pages to stay locked in memory, you could also try passing the MAP_LOCKED flag (which mlocks the memory, preventing page-out). This is dangerous, though, since it thwarts memory management; as a result, the cap on mlock-ed pages is often quite low to avoid causing problems.
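As a rough sketch of the MAP_POPULATE approach, the helpers below map an anonymous buffer with that flag and then use mincore() to count how many of its pages the kernel reports as resident. The names map_populated and resident_pages are made up for this example, and the anonymous private mapping is an assumption; the question does not say what kind of mapping is used.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS, MAP_POPULATE, mincore */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of anonymous memory and ask the kernel to prefault it.
 * Returns NULL on failure. (Hypothetical helper name.) */
void *map_populated(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Count the pages of [addr, addr + len) that mincore() reports as resident.
 * `addr` must be page-aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t n = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(n);
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++)
            count += vec[i] & 1;    /* bit 0 set means the page is resident */
    }
    free(vec);
    return count;
}
```

Calling resident_pages right after map_populated should normally report every page as resident, but as noted above nothing stops the kernel from paging them out again later.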

Related

Can static memory be lazily allocated?

Having a static array in a C program:
#define MAXN (1<<13)
void f() {
static int X[MAXN];
//...
}
Can the Linux kernel choose to not map the addresses to physical memory until each page is actually used? How can X be full of 0s then, is the memory zeroed when each page is accessed? How does that not impact the performance of the program?
Can the Linux kernel choose to not map the addresses to physical memory until each page is actually used?
Yes, it does this for all memory (except special memory used by drivers and the kernel itself).
How can X be full of 0s then, is the memory zeroed when each page is accessed?
You're supposed to ignore this detail. As long as the memory is full of zeroes when you access it, we say it's full of zeroes.
How does that not impact the performance of the program?
It does.
Can the Linux kernel choose to not map the addresses to physical memory until each page is actually used?
Yes, with userspace memory it is always done.
How can X be full of 0s then, is the memory zeroed when each page is accessed?
The kernel maintains a page full of 0s. When the program first reads a page of the static array (static, thus full of 0s before first use), the kernel provides the zeroed page without permission for the program to write to it. Writing to the array triggers the copy-on-write mechanism: a page fault occurs, the kernel allocates a writable page, maps it, and resumes the program at the instruction that could not complete because of the page fault. Note that prezeroing optimizations change the implementation details here, but the theory is the same.
How does that not impact the performance of the program?
The program doesn't have to zero (potentially) a lot of pages on start, and the kernel doesn't actually have to have the memory (one can ask for more memory than the system has, as long as it is not used). Page faults will be generated during the execution of the program, but they can be minimized; see mmap() and madvise() with MADV_SEQUENTIAL. Remember that the Translation Lookaside Buffer is not infinite: there are only so many entries it can maintain.
Sources: A linux memory FAQ, Introduction to Memory Management in Linux by Alan Ott
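One way to observe the lazy mapping described above is to count minor page faults with getrusage() around a loop that writes to each page of a static array. This is only a sketch: the array demo_array, its size DEMO_N, and the helper names are invented for the example, and the array is made much larger than the one in the question so that the effect is easy to see.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

#define DEMO_N (1 << 22)        /* hypothetical size: 4 Mi ints */

static int demo_array[DEMO_N];  /* zero-initialized, not yet backed by RAM */

/* Minor page faults taken by this process so far, or -1 on error. */
long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Write one int in every page of demo_array and return how many minor
 * faults that caused, or -1 on error. */
long touch_static_array(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t step = (size_t)page / sizeof demo_array[0];
    volatile int *x = demo_array;   /* volatile: keep the stores */

    long before = minor_faults();
    for (size_t i = 0; i < DEMO_N; i += step)
        x[i] = 1;
    long after = minor_faults();

    return (before < 0 || after < 0) ? -1 : after - before;
}
```

The first call typically reports a fault for each page (fewer if transparent huge pages are in use), while a second call should report close to none, since the pages are already mapped.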

How is zero-fill-on-demand implemented in the Linux kernel, and where can I disable it?

When we malloc memory, only virtual memory is made available, and it actually points to the zero page. The real physical memory is allocated when we first write to the malloced memory; at that moment, copy-on-write copies zeros from the zero page into the physical memory mapped by the page fault. My problem is: how and where is zero-fill-on-demand implemented in the Linux source code? I want to disable this functionality to do some tests. I guess it happens in the page-fault path rather than in brk() or mmap().
Similar topics related to zero-fill-on-demand. ZFOD and COW.
You want to use the MAP_UNINITIALIZED parameter to mmap and enable CONFIG_MMAP_ALLOW_UNINITIALIZED in your kernel compilation.
MAP_UNINITIALIZED (since Linux 2.6.33)
Don't clear anonymous pages. This flag is intended to improve
performance on embedded devices. This flag is honored only if
the kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
option. Because of the security implications, that option
is normally enabled only on embedded devices (i.e., devices
where one has complete control of the contents of user memory).
If you want your userspace process to get real memory on every *alloc call, I can think of the following options:
If it is for performance reasons, you can replace all calloc calls with malloc+memset so that the process always has real memory pages behind the allocation. However, the kernel may still be able to merge some memory pages.
Disable memory overcommit so that every malloc is accounted for at the moment it is made. This way, your program will not be able to allocate more memory than is available (RAM + swap). See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
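A minimal sketch of the first option follows. Instead of memset it writes one byte per page through a volatile pointer, on the assumption that some compilers may fold malloc followed by memset(0) back into a calloc call; the helper name malloc_touched is made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate `len` bytes and write to every page of the block so that each
 * page is backed by real memory before the pointer is returned.
 * Returns NULL on failure. (Hypothetical helper name.) */
void *malloc_touched(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    volatile unsigned char *p = malloc(len);
    if (p == NULL)
        return NULL;

    /* One store per page; volatile keeps the compiler from dropping them. */
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = 0;

    /* malloc memory need not be page-aligned, so the last byte can sit on
     * a page the loop above did not reach. */
    if (len > 0)
        p[len - 1] = 0;

    return (void *)p;
}
```

Only the touched bytes are initialized here; the rest of the block has the usual indeterminate malloc contents.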

Is it possible to control page-out and page-in by user programming? If yes then how?

My questions are as follows:
I mmap(memory mapping) a file into the virtual memory space.
When I access the first byte of the file through a pointer for the first time, the OS tries to access the data in memory, but this fails and raises a page fault because the data is not present in memory yet. So the OS reads the data from disk into memory, and my access finally succeeds.
(question is coming)
After I modify the data in memory and write it back to the file on disk, how can I free just the physical memory for other uses while keeping the virtual mapping, so the data can be fetched back into memory as needed?
This sounds like ordinary page-out and page-in: when the OS knows memory is exhausted, it swaps out LRU (or similar) pages to disk (swap files), frees the physical memory for other processes, and fetches the evicted data back into memory as needed. But this mechanism is controlled by the OS.
For some reasons, I need to control page-out and page-in myself. How should I do that? Hack the kernel?
You can use the madvise system call. Its behaviour is affected by the advice argument; there are many choices for advice and the optimal one should be picked based on the specifics of your application.
The flag MADV_DONTNEED means that the given range of physical backing frames should be unconditionally freed (i.e. paged out). Also:
After a successful MADV_DONTNEED operation, the semantics of
memory access in the specified region are changed: subsequent
accesses of pages in the range will succeed, but will result
in either repopulating the memory contents from the up-to-date
contents of the underlying mapped file (for shared file
mappings, shared anonymous mappings, and shmem-based
techniques such as System V shared memory segments) or zero-
fill-on-demand pages for anonymous private mappings.
This could be useful if you're absolutely certain that it will be very long until you access the same position again.
However, it might not be necessary to force the kernel to actually page out. If you're accessing the mapping sequentially, another possibility is to use madvise with MADV_SEQUENTIAL to tell the kernel that you will access your memory mapping mostly sequentially:
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
or MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
These are not as aggressive as explicitly calling MADV_DONTNEED to page out. (Of course you can combine these with MADV_DONTNEED as well)
In recent kernel versions there is also the MADV_FREE flag which will lazily free the page frames; they will stay mapped in if enough memory is available, but are reclaimed by the kernel if the memory pressure grows.
You can check out mlock and munlock to lock and unlock the pages. This will give you control over pages being swapped out.
You need to have the CAP_IPC_LOCK capability, or stay within the RLIMIT_MEMLOCK resource limit, to perform this operation though.

Force loading of mmap'ed pages

I have mapped a file into memory using mmap. Now I would like to ensure that there will be no page faults when accessing this memory, i.e. I want to force the system to actually read the data from the harddisk and store it in RAM. I believe that once the data is there, I can prevent swapping with mlockall. But what is the proper way to get the system to load the data?
I could obviously just do dummy reads of all the pages, but this seems like an ugly hack. Also, I don't want to worry about the compiler being too smart and optimizing away the dummy reads.
Any suggestions?
Why do you think mlock() or mlockall() wouldn't work? Guaranteeing that the affected pages are in RAM is exactly what its purpose is. Quoting from the manpage:
All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
You can use other methods like madvise() to ask for the pages to be loaded into RAM but it's not guaranteed the kernel will comply with that and it's not guaranteed that they will stay in RAM even if the kernel does bring them in. I believe mmap(MAP_POPULATE) also doesn't guarantee that the pages will stay in RAM.
You're looking for MAP_POPULATE.
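For comparison, the "dummy reads" fallback from the question can be written so that the compiler cannot drop the reads, by going through a volatile pointer. This is a sketch of that fallback only, not a replacement for MAP_POPULATE or mlock; touch_pages and map_file_prefaulted are hypothetical names.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read one byte from every page of [addr, addr + len). `addr` is assumed
 * to be page-aligned, as mmap results are. The volatile pointer forces
 * real loads; the sum is returned so the result is also used. */
unsigned long touch_pages(const void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    const volatile unsigned char *p = addr;
    unsigned long sum = 0;

    if (page <= 0)
        return 0;
    for (size_t off = 0; off < len; off += (size_t)page)
        sum += p[off];
    return sum;
}

/* Map a whole file read-only and touch every page of the mapping.
 * Stores the mapped length in *len_out. Returns NULL on failure. */
void *map_file_prefaulted(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    void *p = NULL;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        size_t len = (size_t)st.st_size;
        p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            p = NULL;
        } else {
            touch_pages(p, len);
            *len_out = len;
        }
    }
    close(fd);                  /* the mapping stays valid after close */
    return p;
}
```

As with MAP_POPULATE, the touched pages can still be evicted later unless they are locked.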

Do mmap/mprotect-readonly zero pages count towards committed memory?

I want to keep virtual address space reserved in my process for memory that was previously used but is not presently needed. I'm interested in the situation where the host kernel is Linux and it's configured to prevent overcommit (which it does by detailed accounting for all committed memory).
If I just want to prevent the data that my application is no longer using from occupying physical memory or getting swapped to disk (wasting resources either way), I can madvise the kernel that it's unneeded, or mmap new zero pages over top of it. But neither of these approaches will necessarily reduce the amount of memory that counts as committed, which other processes are then prevented from using.
What if I replace the pages with fresh zero pages that are marked read-only? My intent is that they don't count towards committed memory, and further that I can later use mprotect to make them writable, and that it would fail if making them writable would go over the committed memory limit. Is my understanding correct? Will this work?
If you're not using the page (reading or writing to it), it won't be committed to your address space (only reserved).
But your address space is limited, so you can't play as you want/like with it.
See for example ElectricFence which may fail for large number of allocations, because of insertion of "nul page/guard page" (anonymous memory with no access).
Have a look at this thread, "mprotect() failed: Cannot allocate memory":
http://thread.gmane.org/gmane.comp.lib.glibc.user/538/focus=976052
On Linux, assuming overcommit has not been disabled, you can use the MAP_NORESERVE flag to mmap, which will ensure that the page in question will not be accounted as allocated memory prior to being accessed. If overcommit has been completely disabled, see below about multiple-mapping pages.
Note that Linux's behavior for zero pages has changed at times in the past; with some kernel versions, simply reading the page would cause it to be allocated. With others, a write is necessary. Note that the protection flags do not cause allocation directly; however they can prevent you from accidentally triggering an allocation. Therefore, for most reliable results you should avoid accessing the page at all by mprotecting with PROT_NONE.
As another, more portable option, you can map the same page at multiple locations. That is, create and open an empty temp file, unlink it, ftruncate to some reasonable number of pages, then mmap repeatedly at offset 0 into the file. This will absolutely guarantee the memory only counts once against your program's memory usage. You can even use MAP_PRIVATE to auto-reallocate it when you write to the page.
This may have higher memory usage than the MAP_NORESERVE technique (both for kernel tracking data, and for the pages of the temp file itself), however, so I would recommend using MAP_NORESERVE instead when available. If you do use this technique, try to make the region being mapped reasonably large (and put it in /dev/shm if on Linux, to avoid actual disk IO). Each individual mmap call will consume a certain amount of (non-swappable) kernel memory to track it, so it's good to keep that count down.
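The multiple-mapping technique could look roughly like the sketch below: an unlinked temporary file is sized with ftruncate and then mapped more than once at offset 0. The helper names and the /tmp path template are assumptions made for the example (the answer suggests /dev/shm on Linux), and MAP_SHARED is used so that the aliasing is visible.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temporary file of `len` bytes and return its
 * descriptor, or -1 on failure. The path template is a placeholder. */
int make_backing_file(size_t len)
{
    char path[] = "/tmp/zero-backing-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    unlink(path);               /* the file lives on until fd is closed */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the first `len` bytes of `fd` at a new address. Every call returns
 * another view of the same underlying pages. Returns NULL on failure. */
void *map_alias(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Two map_alias calls on the same descriptor return different addresses backed by the same page, so a write through one is visible through the other. With MAP_PRIVATE instead, a write would give that mapping its own copy, as the answer describes.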

Resources