Manage virtual memory from userspace - c

What I actually want to do is redirect writes in a certain memory area to a separate memory area which is shared between two processes. Can this be done at user level? For example, take some page X: what I want to do is change its (virtual-to-physical) mapping to some shared mapping when it is written. Is this achievable? I need to do it transparently too; that is, the program still uses the variables in page X by their names or pointers, but behind the scenes we are using a different page.

Yes, it is possible to replace memory mappings in Linux, though it is not advisable, since it is highly non-portable.
First, find out which page the X variable is located in by taking its address and masking out the last several bits; query the system page size with sysconf(_SC_PAGE_SIZE) to know how many bits to mask out. Then you can create a shared memory mapping that overlaps this page using the MAP_FIXED | MAP_SHARED flags to mmap(2) or mmap2(2). You should copy the initial content of the page and restore it after creating the new mapping. Since other variables may reside in the same page, you should be very careful about memory layout; it is better to use a dedicated shared memory object.
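For concreteness, here is a minimal sketch of that recipe, assuming a global variable x and a POSIX shared memory object named /x_backing (both names are made up for illustration); on older glibc you may need to link with -lrt for shm_open:

```c
/* Sketch only: overlay a MAP_SHARED mapping onto the page that holds
 * the (hypothetical) global variable x.  Overwriting a page of the
 * data segment like this is fragile: every other variable on that
 * page is now backed by the shared object too. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

static int x;   /* the variable whose page we replace */

int main(void)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);

    /* Round the address of x down to its page boundary. */
    void *page = (void *)((uintptr_t)&x & ~((uintptr_t)pagesize - 1));

    /* Save the current contents: other variables live on this page too. */
    char *save = malloc(pagesize);
    memcpy(save, page, pagesize);

    /* A dedicated shared memory object, one page long. */
    int fd = shm_open("/x_backing", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, pagesize);

    /* Replace the page's mapping in place.  MAP_FIXED silently unmaps
     * whatever was there before. */
    if (mmap(page, pagesize, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Restore the saved contents so the rest of the page is unchanged. */
    memcpy(page, save, pagesize);
    free(save);

    x = 42;   /* this write now goes to the shared memory object */
    printf("x = %d\n", x);
    return 0;
}
```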

What you're trying to do isn't entirely possible, because, at least on x86, memory cannot be remapped at that fine a granularity. The smallest quantum you can remap memory on is a 4k page, and the page containing any given variable (e.g., X) is likely to contain other variables or program data.
That being said, you can share memory between processes using the mmap() system call.
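As a minimal illustration of that point, the sketch below shares one anonymous page between a parent and a child created with fork(); nothing here is specific to the original question beyond showing that mmap() with MAP_SHARED gives two processes the same physical memory:

```c
/* Minimal example of sharing memory between two processes with mmap():
 * a MAP_SHARED | MAP_ANONYMOUS region is visible to both parent and
 * child after fork(). */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    *shared = 0;
    if (fork() == 0) {           /* child writes through the shared page */
        *shared = 123;
        _exit(0);
    }
    wait(NULL);                  /* parent sees the child's write */
    printf("value written by child: %d\n", *shared);
    return 0;
}
```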

Related

Is it possible to control page-out and page-in by user programming? If yes then how?

My questions are as follows:
I mmap (memory-map) a file into the virtual address space.
When I access the first byte of the file through a pointer for the first time, the CPU tries to access the data in memory, but this fails and raises a page fault, because the data is not present in memory yet. So the OS brings the data from disk into memory, and my access finally succeeds.
(question is coming)
When I modify the data (in memory) and it is written back to the disk file, how can I free the physical memory for other uses, while keeping the virtual mapping so the data can be fetched back into memory as needed?
This sounds like the usual page-out and page-in behaviour: when the OS knows memory is exhausted, it swaps the LRU (or similar) memory pages out to disk (swap files), frees the physical memory for other processes, and fetches the evicted data back into memory as needed. But this mechanism is controlled by the OS.
For certain reasons, I need to control the page-out and page-in behaviour myself. How should I do that? By hacking the kernel?
You can use the madvise system call. Its behaviour is affected by the advice argument; there are many choices for advice and the optimal one should be picked based on the specifics of your application.
The flag MADV_DONTNEED means that the given range of physical backing frames should be unconditionally freed (i.e. paged out). Also:
After a successful MADV_DONTNEED operation, the semantics of memory access in the specified region are changed: subsequent accesses of pages in the range will succeed, but will result in either repopulating the memory contents from the up-to-date contents of the underlying mapped file (for shared file mappings, shared anonymous mappings, and shmem-based techniques such as System V shared memory segments) or zero-fill-on-demand pages for anonymous private mappings.
This could be useful if you're absolutely certain that it will be a very long time before you access the same region again.
However, it might not be necessary to force the kernel to actually page out. Another possibility, if you're accessing the mapping sequentially, is to use madvise with MADV_SEQUENTIAL to tell the kernel that you will access your memory mapping mostly sequentially:
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
or MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
These are not as aggressive as explicitly calling MADV_DONTNEED to page out. (Of course you can combine these with MADV_DONTNEED as well)
In recent kernel versions there is also the MADV_FREE flag which will lazily free the page frames; they will stay mapped in if enough memory is available, but are reclaimed by the kernel if the memory pressure grows.
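To make the above concrete, here is a rough sketch that maps a file (the name data.bin is just an example), hints MADV_SEQUENTIAL, and drops each page with MADV_DONTNEED once it has been processed; treat it as an illustration of the calls, not a tuned solution:

```c
/* Sketch: read a mapped file sequentially and release each page's
 * physical frame as soon as we are done with it. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);
    long pagesize = sysconf(_SC_PAGE_SIZE);

    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* Tell the kernel we will read mostly sequentially. */
    madvise(map, st.st_size, MADV_SEQUENTIAL);

    long sum = 0;
    for (off_t off = 0; off < st.st_size; off += pagesize) {
        for (off_t i = off; i < off + pagesize && i < st.st_size; i++)
            sum += map[i];
        /* Done with this page: let the kernel free its frame now.
         * A later access would repopulate it from the file. */
        madvise(map + off, pagesize, MADV_DONTNEED);
    }

    printf("sum = %ld\n", sum);
    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```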
You can check out mlock + munlock to lock/unlock pages. This will give you control over which pages can be swapped out.
You need the CAP_IPC_LOCK capability to perform this operation, though.
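A minimal sketch of that approach (assuming the process has CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK) might look like this:

```c
/* Sketch: pin a buffer in RAM with mlock() so it cannot be paged out,
 * then release it again with munlock(). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1 << 20;              /* 1 MiB buffer */
    char *buf = malloc(len);

    if (mlock(buf, len) != 0) {        /* pages now stay resident */
        perror("mlock");
        return 1;
    }
    /* ... use buf, guaranteed not to be swapped out ... */
    munlock(buf, len);                 /* allow paging again */
    free(buf);
    return 0;
}
```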

How does mmap work when 2 programs map the same file

I am trying to understand how mmap works while looking at man mmap.
As I understand it, it adds a mapping to the page table that maps between the file and the virtual address (the void *addr that mmap returns).
So, what happens when 2 programs map the same file?
Are there 2 entries in the page table, one for each program?
So, what happens when 2 programs map the same file? Are there 2 entries in the page table, one for each program?
In modern operating systems, each process has its own page table for its memory, that may point to pages of physical memory shared with other user and kernel processes.
With MAP_SHARED, this mapping is shared: updates to the mapping are visible to other processes that map this file, and are carried through to the underlying file. The file may not actually be updated until msync(2) or munmap() is called.
This seems very interesting, but there are numerous caveats:
The actual pages mmapped by both processes for the same file may reside at the same address or at different addresses in each process, so storing pointers into this shared memory may not allow the other process to use them, as they might point to inconsistent addresses.
The implementation may or may not use the same physical memory pages for both mappings: for subtle reasons (cache strategies, out-of-sync reads, ...), even if it is the same physical memory, modifications made by one process to its memory may not be immediately reflected in the memory of the other process.
So the modification may or may not be immediately visible to the other processes mmapping the file or reading it via read or the FILE* stream API.
If one of the processes calls msync(), the modifications should be visible in all maps and for all yet unread portions of the file, bearing in mind that the FILE* streaming APIs may have buffered some data in internal unshared buffers: modifications in this area will not be reflected.
Conclusion: it is risky and unreliable to use these mechanisms to implement inter-process communication. The behavior may depend on system-specific characteristics such as the OS strategies, the CPU and cache architectures, the type of RAM in use, the clock speed, and who knows what else. It is safer to rely on proven APIs that may indeed be implemented using mmapped memory, but only if they are known to provide the correct semantics.
The actual system implementation is different. At the risk of oversimplification (and omitting paging here):
A mmap will map physical page frames to a file.
So, what happens when 2 programs map the same file? Are there 2 entries in the page table, one for each program?
If two processes (P and Q) map the same file, then P and Q will each have their own page table; each page table will have an entry mapping to the same physical page frame (which could be mapped at different addresses within P and Q).
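A small sketch along those lines: both processes map the same file (shared.dat is just a placeholder name) with MAP_SHARED, so each has its own page-table entry but both entries resolve to the same physical frame, and the child's write is visible to the parent. The second process is created with fork() purely for brevity; two unrelated processes opening and mmapping the same file behave the same way.

```c
/* Sketch: parent and child share a file-backed MAP_SHARED page. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void)
{
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }
    ftruncate(fd, 4096);               /* make sure one page exists */

    char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {
        map[0] = 'A';                  /* child writes to the shared page */
        msync(map, 4096, MS_SYNC);     /* push the change to the file */
        _exit(0);
    }
    wait(NULL);
    printf("parent reads: %c\n", map[0]);   /* prints 'A' */
    return 0;
}
```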

Multiple mappings for a physical page

I want to create a copy-on-write like interface for accessing a mmap()ed file in GNU C. Here is the way it should work:
I will map the file to the address space using mmap(). Doing so, I will have a pointer to a contiguous region of memory which will contain real data.
Using some sort of magic, I will have another part of the address space pointing to the exact same physical pages. In other words, I will have two different addresses to access any physical page on memory for the mmap()ed region.
Once an instruction tries to write to a page using the second mapping, I will change the mapping for that particular page to point to a different physical page (which I will create in an on-demand fashion).
At some point, I will sync the dirty page with the originally mapped page and change the alias to point back to the memory-mapped page.
Here is the question: What is the best way to do this?
Still not entirely clear on your exact requirements. But here are some options I see.
Let mmap handle the COW for you using MAP_PRIVATE. Then when you are ready to sync just create a regular mmap (or direct file open) to the original file and do your sync with the modified MAP_PRIVATE page.
That doesn't allow you to know whether the MAP_PRIVATE page was actually modified or not. If you want that (e.g. so that you can optimise and not do a sync unless a page has changed) then you can make the MAP_PRIVATE page readonly. On first access a SEGV will occur. Catch that SEGV with a signal handler and re-map the MAP_PRIVATE page to be writeable and internally note it as dirty.
And finally, if you want full copy control, don't use MAP_PRIVATE at all: do a read-only mapping plus a signal handler. In the signal handler, allocate some memory, copy the original page, and remap the faulting page.
Hope that all makes sense.
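As a rough sketch of the second option (read-only MAP_PRIVATE mapping plus a SIGSEGV handler for dirty tracking), assuming a file data.bin that is at least NPAGES pages long; this relies on Linux behaviour (calling mprotect() from the signal handler) and is an illustration, not a hardened implementation:

```c
/* Sketch: detect the first write to each page of a read-only
 * MAP_PRIVATE mapping via SIGSEGV, mark the page dirty, and then
 * allow the write by upgrading the protection with mprotect(). */
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define NPAGES 16

static char *base;
static long  pagesize;
static int   dirty[NPAGES];            /* which pages were written to */

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t fault = (uintptr_t)si->si_addr;
    uintptr_t start = (uintptr_t)base;

    if (fault < start || fault >= start + NPAGES * pagesize)
        _exit(1);                       /* not our mapping: real crash */

    size_t page = (fault - start) / pagesize;
    dirty[page] = 1;                    /* note the page as dirty ... */
    mprotect(base + page * pagesize, pagesize,
             PROT_READ | PROT_WRITE);   /* ... and allow the write */
}

int main(void)
{
    pagesize = sysconf(_SC_PAGE_SIZE);

    int fd = open("data.bin", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    base = mmap(NULL, NPAGES * pagesize, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    base[0] = 'x';                      /* faults once, then succeeds */
    printf("page 0 dirty: %d\n", dirty[0]);
    return 0;
}
```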

Exchange the mapping to two physical pages of two pages in virtual memory

Here is the situation:
A process has two pages vp1 and vp2. These two pages are mapped to 2 physical pages or 2 pages in the swap. Let's call these physical (or in swap) pages pp1 and pp2. The mapping is:
vp1->pp1
vp2->pp2
Now, if I want to change the mapping to:
vp1->pp2
vp2->pp1
That means reading from vp2 would now return the content that was originally in vp1. Is there a method to do this without changing the kernel on Linux?
Yes, but you have to do some work first. One way to accomplish this is to create two shared memory objects. Then you can map and unmap the shared memory objects in the process address space. See the system calls shmat, shmdt, shmget, and shmctl for details.
Mapping and unmapping is likely to take considerable time, so it may not save time over using some pointer scheme to choose which addresses a process uses to access data.
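A hedged sketch of that SysV shared-memory route (error handling omitted, page size hard-coded to 4096 for brevity) could look like this:

```c
/* Sketch: attach two SysV segments, then detach and re-attach them at
 * each other's addresses to swap which segment each address maps. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Two one-page segments standing in for pp1 and pp2. */
    int id1 = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    int id2 = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    char *vp1 = shmat(id1, NULL, 0);    /* vp1 -> pp1 */
    char *vp2 = shmat(id2, NULL, 0);    /* vp2 -> pp2 */
    strcpy(vp1, "one");
    strcpy(vp2, "two");

    /* Swap: detach both, then re-attach each segment at the other's
     * old (page-aligned) address. */
    shmdt(vp1);
    shmdt(vp2);
    shmat(id2, vp1, 0);                 /* vp1 -> pp2 */
    shmat(id1, vp2, 0);                 /* vp2 -> pp1 */

    printf("%s %s\n", vp1, vp2);        /* prints "two one" */

    shmctl(id1, IPC_RMID, NULL);
    shmctl(id2, IPC_RMID, NULL);
    return 0;
}
```

Note that the window between shmdt and shmat is a race in a multithreaded program, which is part of why this is slower and more fragile than simply swapping pointers.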
No, not in the general case, if you want to keep your system working. But if you control how the mappings are created, you can create them with mmap of a file or of an object from shm_open, and when you need to swap them, just overwrite them with mmap(... MAP_FIXED ...).
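A minimal sketch of that approach, using a two-page shm_open object (the name /swapdemo is arbitrary; link with -lrt on older glibc):

```c
/* Sketch: back vp1 and vp2 with a two-page shared memory object, then
 * swap which page of the object each address maps by re-mmapping with
 * MAP_FIXED over the existing mappings. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(void)
{
    long ps = sysconf(_SC_PAGE_SIZE);

    int fd = shm_open("/swapdemo", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, 2 * ps);

    /* Initially: vp1 -> object page 0 (pp1), vp2 -> object page 1 (pp2). */
    char *vp1 = mmap(NULL, ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *vp2 = mmap(NULL, ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, ps);
    strcpy(vp1, "one");
    strcpy(vp2, "two");

    /* Swap the mappings: MAP_FIXED replaces what is mapped at vp1 and
     * vp2 without changing the pointer values themselves. */
    mmap(vp1, ps, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, ps);
    mmap(vp2, ps, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);

    printf("%s %s\n", vp1, vp2);        /* prints "two one" */

    shm_unlink("/swapdemo");
    return 0;
}
```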

How does mprotect() work?

I was stracing some of the common commands on Linux, and saw that mprotect() was used many times. I'm just wondering: what is the deciding factor that mprotect() uses to find out that the memory address it is setting a protection value for is in its own address space?
On architectures with an MMU1, the address that mprotect() takes as an argument is a virtual address. Each process has its own independent virtual address space, so there's only two possibilities:
The requested address is within the process's own address range; or
The requested address is within the kernel's address range (which is mapped into every process).
mprotect() works internally by altering the flags attached to a VMA2. The first thing it must do is look up the VMA corresponding to the address that was passed; if the passed address is within the kernel's address range, then there is no VMA, and so this search will fail. This is exactly what happens if you try to change the protections on an area of the address space that is not mapped.
You can see a representation of the VMAs in a process's address space by examining /proc/<pid>/smaps or /proc/<pid>/maps.
1. Memory Management Unit
2. Virtual Memory Area, a kernel data structure describing a contiguous section of a process's memory.
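A small sketch illustrating the point: mprotect() succeeds on an address inside one of the process's own VMAs and fails with ENOMEM when there is no VMA for the range, which is also what happens for kernel addresses:

```c
/* Sketch: mprotect() on a mapped page vs. an unmapped one. */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    long ps = sysconf(_SC_PAGE_SIZE);

    /* A page we own: changing its protection works. */
    char *p = mmap(NULL, ps, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mprotect(p, ps, PROT_READ) == 0)
        printf("own page: protection changed to read-only\n");

    /* The same range after munmap: no VMA, so the lookup fails. */
    munmap(p, ps);
    if (mprotect(p, ps, PROT_READ) == -1)
        printf("unmapped page: %s\n", strerror(errno));  /* ENOMEM */

    return 0;
}
```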
This is about virtual memory, and about the dynamic linker/loader. Most mprotect(2) syscalls you see in the trace are probably related to bringing in library dependencies, though the malloc(3) implementation might call it too.
Edit:
To answer your question in the comments: the MMU and the code inside the kernel protect one process from another. Each process has the illusion of a full 32-bit or 64-bit address space. The addresses you operate on are virtual and belong to a given process. The kernel, with the help of the hardware, maps those to physical memory pages. These pages can be shared between processes implicitly as code, or explicitly for interprocess communication.
The kernel looks up the address you pass to mprotect in the current process's page table. If it is not in there, then the call fails. If it is in there, the kernel may attempt to mark the page with the new access rights. I'm not sure, but it may still be possible that the kernel would return an error here if there were some special reason that the access could not be granted (such as trying to change the permissions of a memory-mapped shared file area to writable when the file was actually read-only).
Keep in mind that the page table that the processor uses to determine if an area of memory is accessible is not the one that the kernel used to look up that address. The processor's table may have holes in it for things like pages that are swapped out to disk. The tables are related, but not the same.

Resources