Byte-level write access protection in C

Protecting a page for read and/or write access is possible, since there are bits in the page table entry that can be turned on and off at kernel level. Is there a way to protect a certain region of memory from write access? Let's say a C structure has certain variable(s) that need to be write-protected, so that any write access to them triggers a segfault and a core dump. It would be something like a scaled-down mprotect(), which works at page granularity: is there a mechanism to do the same kind of thing at byte granularity in user space?
thanks, Kapil Upadhayay.

No, there is no such facility. If you need per-data-object protections, you'll have to allocate at least a page per object (using mmap). If you also want to have some protection against access beyond the end of the object (for arrays) you might allocate at least one more page than what you need, align the object so it ends right at a page boundary, and use mprotect to protect the one or more additional pages you allocated.
Of course this kind of approach will result in programs that are very slow and waste lots of resources. It's probably not viable except as a debugging technique, and valgrind can meet that need much more effectively without having to modify your program...

One way, although terribly slow, is to protect the whole page in which the object lies. Whenever a write access to that page happens, your custom handler for invalid page access gets called and resolves the situation by quickly unprotecting the page, writing the data and then protecting the page again.
This works fine for single-threaded programs, I'm not sure what to do for multi-threaded programs.
This idea is probably not new, so you may be able to find some information or even a ready-made implementation of it.

Related

windows - plain shared memory between 2 processes (no file mapping, no pipe, no other extra)

How can I get an isolated region of memory that is NOT backed by any file and involves no extra management layers such as pipes, and that can be shared between two dedicated processes on the same Windows machine?
Majority of articles point me into the direction of CreateFileMapping. Let's start from there:
How does CreateFileMapping with hFile=INVALID_HANDLE_VALUE actually work?
According to
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366537(v=vs.85).aspx
it
"...creates a file mapping object of a specified size that is backed by the system paging file instead of by a file in the file system..."
Assume I write something into the memory which is mapped by CreateFileMapping with hFile=INVALID_HANDLE_VALUE. Under which conditions will this content be written to the page file on disk?
Also, my understanding is that the motivation for using shared memory is to keep performance up. So why does the article "Creating Named Shared Memory"
(https://msdn.microsoft.com/de-de/library/windows/desktop/aa366551(v=vs.85).aspx) refer to
CreateFileMapping, if there is no attribute combination that would prevent writing to files, e.g. the page file?
Going back to the original question: I am afraid that CreateFileMapping is not good enough... So what would work?
You misunderstand what it means for memory to be "backed" by the system paging file. (Don't feel bad; Raymond Chen has described the text you quoted from MSDN as "one of the most misunderstood sentences in the Win32 documentation.") Almost all of the computer's memory is "backed" by something on disk; only the "non-paged pool", used exclusively by the kernel and as little as possible, isn't. If a page isn't backed by an ordinary named file, then it's backed by the system paging file. The operating system won't write pages out to the system paging file unless it needs to, but it can if it does need to.
This architecture is intended to ensure that processes can be completely "paged out" of RAM when they have nothing to do. This used to be much more important than it is nowadays, but it's still valuable; a typical Windows desktop will have dozens of processes "idle" waiting for events (e.g. needing to spool a print job) that may never happen. Those processes can get paged out and the memory can be put to more constructive use.
CreateFileMapping with hfile=INVALID_HANDLE_VALUE is, in fact, what you want. As long as the processes sharing the memory are actively doing stuff with it, it will remain resident in RAM and there will be no performance problem. If they go idle, yeah, it may get paged out, but that's fine because they're not doing anything with it.
You can direct the system not to page out a chunk of memory; that's what VirtualLock is for. But it's meant to be used for small chunks of memory containing secret information, where writing it to the page file could conceivably leak the secret. The MSDN page warns you that "Each version of Windows has a limit on the maximum number of pages a process can lock. This limit is intentionally small to avoid severe performance degradation."
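For reference, the pagefile-backed named shared memory pattern from the MSDN article cited above looks roughly like this (the object name and size are made up; error handling is abbreviated):

```c
#include <windows.h>

#define SHM_NAME "Local\\MySharedMem"   /* hypothetical object name */
#define SHM_SIZE 4096

/* Create (or open, if it already exists) a pagefile-backed shared
 * memory region and map a view of it.  A second process runs the same
 * code with the same name and sees the same pages. */
static void *open_shared(HANDLE *hOut) {
    HANDLE h = CreateFileMappingA(
        INVALID_HANDLE_VALUE,           /* backed by the paging file */
        NULL, PAGE_READWRITE,
        0, SHM_SIZE, SHM_NAME);
    if (h == NULL)
        return NULL;
    void *view = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, SHM_SIZE);
    if (view == NULL) {
        CloseHandle(h);
        return NULL;
    }
    *hOut = h;
    return view;
}
```

As the answer explains, these pages only touch the paging file if the memory manager needs to evict them; while both processes are actively using the region it stays in RAM.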

Is it possible to manage your own memory pages for a program?

I'm going into distributed programming and I'm thinking about possible ways of sharing data to multiple computers.
Assume I'll manage memory coherence through proper task management (only one task writing to a memory region, or one or more tasks reading from the same memory), together with custom page management, which is what I'm asking about in this question.
I would like to work with pointers directly, for the least overhead (no processing of shared data such as index/pointer rewriting). I would also like to lazy-load memory pages / shared data on first access.
So, is it possible to manage memory pages as follows:
allocate a precisely specified memory range (no data processing, no pointer rewriting)
use a custom function/callback when a memory page is missing - load the data from the network
force the system to remove a page from memory; this should cause a page miss on the next access to the same page
optional: don't remove the page, but mark it "unsafe" - a custom function then has to check whether some other machine modified that region (and not transfer it again if that isn't needed)
Assume an x86-64 (64-bit) architecture, and I only care about Linux (for now).

fadvise vs madvise? can I use both together?

I'm randomly reading data (each read < page size) throughout a huge file (far too big to fit in memory).
I normally set MADV_DONTNEED, but looking at the docs + info it seems I instead need FADV_NOREUSE.
I'm not really getting how madvise() and fadvise() work together. Are they synonymous? Does it matter if I prefer one or the other? Can they be used together? Are they different kernel subsystems? Is FADV_NOREUSE what I'm looking for to gain optimal performance?
madvise() and posix_fadvise() are not synonymous.
madvise() tells the kernel (gives it advice) what to do with an existing memory region, while fadvise() tells the kernel what to do with the cached (or yet-to-be-cached) data of a file.
For example, if you mmap() an anonymous region you should use madvise() to hint the kernel about your access pattern: expect random access and skip readahead (MADV_RANDOM), or expect sequential access, read ahead aggressively, and allow pages to be freed soon after access (MADV_SEQUENTIAL).
If you mmap() a file, or part of a file, you can use either madvise() or fadvise() to hint the kernel to read ahead for you (MADV_WILLNEED), to free that cache (MADV_DONTNEED), or to free it right after access (POSIX_FADV_NOREUSE, fadvise() only), in addition to the above.
If you use a file without mapping its data into your process memory (without mmap()), you should use fadvise() only; madvise() has no meaning there.
As far as kernel subsystems go: in Linux it is the same subsystem, just different ways of referring to memory pages and the file cache. Note that these are only hints; when memory is scarce, the kernel might decide to swap out or reuse cached data despite them. Only mlock() and mlockall() can prevent that.
In your case, not giving any hint at all may help, especially if some pages are read more often than others, since the kernel will figure out which pages are "hot" and attempt to keep them in memory.
If you are only reading from a file, you actually don't need either. The paging daemon automatically frees RAM pages associated with non-dirty or shared file-backed mappings. By calling madvise(MADV_DONTNEED) repeatedly you are specifically instructing the kernel to do this, which may hurt performance if you happen to access the same page again soon.
fadvise() is only useful if you access your file with read()/lseek(). For mmapped pages it has no effect.

Force loading of mmap'ed pages

I have mapped a file into memory using mmap. Now I would like to ensure that there will be no page faults when accessing this memory, i.e. I want to force the system to actually read the data from the hard disk and store it in RAM. I believe that once the data is there, I can prevent swapping with mlockall. But what is the proper way to get the system to load the data?
I could obviously just do dummy reads of all the pages, but this seems like an ugly hack. Also, I don't want to worry about the compiler being too smart and optimizing away the dummy reads.
Any suggestions?
Why do you think mlock() or mlockall() wouldn't work? Guaranteeing that the affected pages are in RAM is exactly their purpose. Quoting from the man page:
All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
You can use other methods like madvise() to ask for the pages to be loaded into RAM but it's not guaranteed the kernel will comply with that and it's not guaranteed that they will stay in RAM even if the kernel does bring them in. I believe mmap(MAP_POPULATE) also doesn't guarantee that the pages will stay in RAM.
You're looking for MAP_POPULATE.

Is there a better way than parsing /proc/self/maps to figure out memory protection?

On Linux (or Solaris) is there a better way than hand parsing /proc/self/maps repeatedly to figure out whether or not you can read, write or execute whatever is stored at one or more addresses in memory?
For instance, in Windows you have VirtualQuery.
In Linux, I can use mprotect to change those permissions, but I can't read them back.
Furthermore, is there any way to know when those permissions change (e.g. when someone uses mmap on a file behind my back) other than doing something terribly invasive and using ptrace on all threads in the process and intercepting any attempt to make a syscall that could affect the memory map?
Update:
Unfortunately, I'm using this inside of a JIT that has very little information about the code it is executing to get an approximation of what is constant. Yes, I realize I could have a constant map of mutable data, like the vsyscall page used by Linux. I can safely fall back on an assumption that anything that isn't included in the initial parse is mutable and dangerous, but I'm not entirely happy with that option.
Right now what I do is read /proc/self/maps and build a structure I can binary-search for a given address's protection. Any time I need to know something about a page that isn't in my structure, I reread /proc/self/maps, assuming it has been added in the meantime or I'd be about to segfault anyway.
It just seems that parsing text to get at this information and not knowing when it changes is awfully crufty. (/dev/inotify doesn't work on pretty much anything in /proc)
I do not know an equivalent of VirtualQuery on Linux. But some other ways to do it which may or may not work are:
you set up a signal handler trapping SIGBUS/SIGSEGV and go ahead with your read or write. If the memory is protected, your signal-trapping code will be called; if not, it won't be. Either way you win.
you could track each call to mprotect and build a corresponding data structure that tells you whether a region is read- or write-protected. This is good if you have access to all the code that uses mprotect.
you can monitor all the mprotect calls in your process by linking your code with a library that redefines the function mprotect. You can then maintain the necessary data structure for knowing whether a region is read- or write-protected before calling the system mprotect to really set the protection.
you may try to use /dev/inotify to monitor the file /proc/self/maps for any change. I suspect this one does not work, but it should be worth a try.
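A sketch of the re-parse approach the questioner describes, assuming Linux's /proc/self/maps line format (an address range followed by a permission field like "rw-p"):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Scan /proc/self/maps and copy the "rwxp"-style permission field of
 * the region containing `addr` into perms[5].  Returns 0 if a region
 * was found, -1 otherwise.  Linux-specific; every call re-reads the
 * file, so cache the results if you call it often. */
static int query_protection(const void *addr, char perms[5]) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    uintptr_t target = (uintptr_t)addr;
    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, f)) {
        uintptr_t lo, hi;
        char p[5];
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s", &lo, &hi, p) == 3
            && lo <= target && target < hi) {
            memcpy(perms, p, 5);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

This is essentially what the questioner's binary-search structure is built from; the staleness problem (mappings changing behind your back) remains.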
There sort of is/was /proc/[pid|self]/pagemap, documented in the kernel tree; caveats here:
https://lkml.org/lkml/2015/7/14/477
So it isn't completely harmless...
