Multiple mappings for a physical page

I want to create a copy-on-write like interface for accessing a mmap()ed file in GNU C. Here is the way it should work:
I will map the file to the address space using mmap(). Doing so, I will have a pointer to a contiguous region of memory which will contain real data.
Using some sort of magic, I will have another part of the address space pointing to the exact same physical pages. In other words, I will have two different addresses to access any physical page on memory for the mmap()ed region.
Once an instruction tries to write to a page using the second mapping, I will change the mapping for that particular page to point to a different physical page (which I will create in an on-demand fashion).
At some point, I will sync the dirty page with the originally mapped page and change the alias back to point to the memory-mapped page.
Here is the question: What is the best way to do this?

Still not entirely clear on your exact requirements. But here are some options I see.
Let mmap handle the COW for you using MAP_PRIVATE. Then when you are ready to sync just create a regular mmap (or direct file open) to the original file and do your sync with the modified MAP_PRIVATE page.
That doesn't allow you to know whether the MAP_PRIVATE page was actually modified or not. If you want that (e.g. so that you can optimise and skip the sync unless a page has changed), then you can make the MAP_PRIVATE page read-only. On first access a SEGV will occur. Catch that SEGV with a signal handler, re-map the MAP_PRIVATE page to be writeable, and internally note it as dirty.
And finally, if you want full copy control, don't use MAP_PRIVATE at all; instead create a read-only mapping and a signal handler. In the signal handler, allocate some memory, copy the original page, and remap the faulting page.
Hope that all makes sense.

Related

How to drop dirty pages in mmaped memory and prepare for quick munmap?

I'm trying to implement a file backed memory allocator for a swapless system.
For each new allocation, I use mkstemp to create a temporary file as a backstore and mmap it as MAP_SHARED to allow pages to be swapped to the backstore when the system's memory pressure is high. I think I've got this part working.
However I'm having difficulty implementing the deallocation case.
Since at the moment of deallocation neither the content of the backstore nor the content of the resident or dirty pages matters any more, the quickest way to do this is to drop and free all resident pages and leave the backstore unchanged. However, I didn't find a madvise flag that can do this.
MADV_DONTNEED seems excessive because it will commit the dirty pages to back store. (Not true, see answer below)
MADV_DONTNEED
After a successful MADV_DONTNEED operation, the semantics of memory access in the specified region are changed: subsequent accesses of pages in the range will succeed, but will result in either repopulating the memory contents from the up-to-date contents of the underlying mapped file (for shared file mappings, shared anonymous mappings, and shmem-based techniques such as System V shared memory segments) or zero-fill-on-demand pages for anonymous private mappings.
MADV_REMOVE seems excessive as well, because not only does it drop resident pages, it also drops the backstore itself.
MADV_REMOVE
Free up a given range of pages and its associated backing store. This is equivalent to punching a hole in the corresponding byte range of the backing store (see fallocate(2)). Subsequent accesses in the specified address range will see bytes containing zero.
So what steps are the quickest path of unmap/close/delete a mmaped file?
Maybe mmap the same region again as MAP_PRIVATE (like this) and then munmap it?
According to this question, MADV_DONTNEED does exactly this: drop the pages without writing back to the back store.
The clause repopulating the memory contents from the up-to-date contents of the underlying mapped file means loads after MADV_DONTNEED will reload from the back store.
All the dirty pages before MADV_DONTNEED weren't committed to the back store and so will be lost.
In summary: MADV_DONTNEED drops all mapped pages (dirty pages are dropped without being committed to the back store) and leaves the back store as is.
madvise with MADV_DONTNEED on dirty pages doesn't discard the changes.
I just wrote a 5GB file using mmap/madvise MADV_DONTNEED/munmap with the last 2 steps taking place in 2-4MB chunks, and the file was created without a single error.
However, MADV_DONTNEED has no effect on the dirty pages' data. munmap reduces the RSS of the process, but writeback is still done at Linux's discretion.

Is it possible to control page-out and page-in by user programming? If yes then how?

My questions are as follows:
I mmap (memory-map) a file into the virtual address space.
When I access the first byte of the file through a pointer for the first time, the OS will try to access the data in memory, but it will fail and raise a page fault, because the data isn't present in memory yet. So the OS will swap the data from disk into memory, and finally my access will succeed.
(question is coming)
When I modify the data (in memory) and write it back to the disk file, how can I free the physical memory for other uses, but keep the virtual memory mapping so the data can be fetched back into memory as needed?
This sounds like the page-out and page-in behavior: when the OS knows memory is exhausted, it swaps the LRU (or similarly chosen) memory pages to disk (swap files), frees the physical memory for other processes, and fetches the evicted data back into memory as needed. But this mechanism is controlled by the OS.
For some reasons, I need to control the page-out and page-in behaviors by myself. So how should I do? Hack the kernel?
You can use the madvise system call. Its behaviour is affected by the advice argument; there are many choices for advice and the optimal one should be picked based on the specifics of your application.
The flag MADV_DONTNEED means that the given range of physical backing frames should be unconditionally freed (i.e. paged out). Also:
After a successful MADV_DONTNEED operation, the semantics of memory access in the specified region are changed: subsequent accesses of pages in the range will succeed, but will result in either repopulating the memory contents from the up-to-date contents of the underlying mapped file (for shared file mappings, shared anonymous mappings, and shmem-based techniques such as System V shared memory segments) or zero-fill-on-demand pages for anonymous private mappings.
This could be useful if you're absolutely certain that it will be a very long time before you access the same position again.
However, it might not be necessary to force the kernel to actually page out. Another possibility, if you're accessing the mapping sequentially, is to use madvise with MADV_SEQUENTIAL to tell the kernel that you'll access your memory mapping mostly sequentially:
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
or MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
These are not as aggressive as explicitly calling MADV_DONTNEED to page out. (Of course you can combine these with MADV_DONTNEED as well)
In recent kernel versions there is also the MADV_FREE flag, which lazily frees the page frames; they will stay mapped if enough memory is available, but will be reclaimed by the kernel as memory pressure grows.
You can check out mlock + munlock to lock/unlock the pages. This will give you control over pages being swapped out.
You need to have CAP_IPC_LOCK capability to perform this operation though.

How do I implement dynamic shared memory resizing?

Currently I use shm_open to get a file descriptor and then use ftruncate and mmap whenever I want to add a new buffer to the shared memory. Each buffer is used individually for its own purposes.
Now what I need to do is arbitrarily resize buffers.
And also munmap buffers and reuse the free space again later.
The only solution I can come up with for the first problem is: ftruncate(file_size + old_buffer_size + extra_size), mmap, copy the data across into the new buffer, and then munmap the original data. This looks very expensive to me, and there is probably a better way. It also entails removing the original buffer every time.
For the second problem I don't even have a bad solution; I clearly can't move memory around every time a buffer is removed. And if I keep track of free memory and use it whenever possible, it will slow down the allocation process as well as leave me with unused bits and pieces in between.
I hope this is not too confusing.
Thanks
As best as I understand you need to grow (or shrink) the existing memory mapping.
Under Linux, shared memory is implemented as a file located in the /dev/shm memory filesystem. All operations on this file are the same as on regular files (and file descriptors).
If you want to grow the existing mapping, first expand the file size with ftruncate (as you wrote), then use mremap to expand the mapping to the requested size.
If you store pointers into this region you may have to update them, but first try calling mremap with flags set to 0. In that case the system tries to grow the existing mapping in place (if there is no collision with another reserved memory region) and the pointers remain valid.
If the previous option is not available, use the MREMAP_MAYMOVE flag. In this case the system remaps to another location, but usually does so efficiently (no copying is performed by the system); then update the pointers.
Shrinking works the same way, in reverse order.
I wrote an open source library for just this purpose:
rszshm - resizable pointer-safe shared memory
To quote from the description page:
To accommodate resizing, rszshm first maps a large, private, noreserve map. This serves to claim a span of addresses. The shared file mapping then overlays the beginning of the span. Later calls to extend the mapping overlay more of the span. Attempts to extend beyond the end of the span return an error.
I extend a mapping by calling mmap with MAP_FIXED at the original address, and with the new size.

Exchange the mapping to two physical pages of two pages in virtual memory

Here is the situation:
A process has two pages vp1 and vp2. These two pages are mapped to 2 physical pages or 2 pages in the swap. Let's call these physical (or in swap) pages pp1 and pp2. The mapping is:
vp1->pp1
vp2->pp2
Now, if I want to change the mapping to:
vp1->pp2
vp2->pp1
That means, reading from vp2 by the process will get the content originally in vp1. Is there a method to do this without changing the kernel on Linux?
Yes, but you have to do some work first. One way to accomplish this is to create two shared memory objects. Then you can map and unmap the shared memory objects in the process address space. See the system calls shmat, shmdt, shmget, and shmctl for details.
Mapping and unmapping is likely to take considerable time, so it may not save time over using some pointer scheme to choose which addresses a process uses to access data.
No, not in the general case, if you want to keep your system working. But if you control how the mappings are created, you can create them with mmap of a file (or of an object from shm_open), and when you need to swap them, just overwrite them with mmap(... MAP_FIXED ...).

Manage virtual memory from userspace

What I actually want to do is to redirect writes in a certain memory area to a separate memory area which is shared between two processes. Can this be done at user level? For example, for some page X. What I want to do is to change its (virtual to physical) mapping to some shared mapping when it's written. Is this achievable? I need to do it transparently too, that is the program still uses the variables in page X by their names or pointers, but behind the scenes, we are using a different page.
Yes, it is possible to replace memory mappings in Linux, though it is not advisable to do it since it is highly non-portable.
First, you should find out in which page the X variable is located by taking its address and masking out the last several bits; query the system page size with sysconf(_SC_PAGE_SIZE) to know how many bits to mask out. Then you can create a shared memory mapping that overlays this page using the MAP_FIXED | MAP_SHARED flags to mmap(2) or mmap2(2). You should copy the initial content of the page and restore it after creating the new mapping. Since other variables may reside in the same page, you should be very careful about memory layout, and better use a dedicated shared memory object.
What you're trying to do isn't entirely possible, because, at least on x86, memory cannot be remapped at that fine a granularity. The smallest quantum you can remap is a 4k page, and the page containing any given variable (e.g., X) is likely to contain other variables or program data as well.
That being said, you can share memory between processes using the mmap() system call.
