Linux - Duplicate a virtual memory address from malloc or move a virtual memory address - c

Short question:
Is it possible to map a buffer that has been malloc'd to have two ways (two pointers pointing to the same physical memory) of accessing the same buffer?
Or, is it possible to temporarily move a virtual memory address received by malloc? Or is it possible to point from one location in virtual space to another?
Background:
I am working with DirectFB, a surface management and 2D graphics composting library. I am trying to enforce the Locking protocol which is to Lock a surface, modify the memory only while locked (the pointer is to system memory allocated using malloc), and unlocking the surface.
I am currently trying to trace down a bug in an application that is locking a surface and then storing the pixel pointer and modifying the surface later. This means that the library does not know when it is safe to read or write to a surface. I am trying to find a way to detect that the locking protocol has been violated. What I would like is a way to invalidate the pointer passed to the user after the unlock call is made. Even better, I would like the application to seg fault if it tries to access the memory after the lock. This would stop in the debugger and give us an idea of which surface is involved, which routine is involved, who called it, etc.
Possible solutions:
Create a temporary buffer, pass the buffer pointer to the user, on unlock copy the pixels to the actual buffer, delete the temporary
buffer.
Pros: This is an implementable solution.
Cons: Performance is slow as it requires a copy which is expensive, also the memory may or may not be available. There is no
way to guarantee that one temporary surface overlaps another allowing
an invalidated pointer to suddenly work again.
Make an additional map to a malloc'd surface and pass that to the user. On unlock, unmap the memory.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Need to set aside a reserved range of addresses are never used by anything else (including malloc or the kernel). Also need to
ensure that no two surfaces overlap which could allow an old pointer
to suddenly point to something valid and not seg fault when it should.
Take advantage of the fact that the library does not access the memory while locked by the user and simply move the virtual address on
a lock and move it back on an unlock.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Same as "2" above.
Is this feasible?
Additional info:
This is using Linux 2.6, using stdlib.
The library is written in C.
The library and application run in user space.
There is a possibility of using a kernel module (to write a custom memory allocation routine), but the difficulty of writing a module in
my current working climate would probably reduce the chances to near
zero levels that I could actually implement this solution. But if this
is the only way, it would be good to know.
The underlying processor is x86.

The function you want to create multiple mappings of a page is shm_open.
You may only be using the memory within one process, but it's still "shared memory" - that is to say, multiple virtual mappings for the same underlying physical page will exist.
However, that's not what you want to do. What you should actually do is have your locking functions use the mprotect system call to render the memory unreadable on unlock and restore the permissions on lock; any access without the lock being held will cause a segfault. Of course, this'll only work with a single simultaneous accessing thread...
Another, possibly better, way to track down the problem would be to run your application in valgrind or another memory analysis tool. This will greatly slow it down, but allows you very fine control: you can have a valgrind script that will mark/unmark memory as accessible and the tool will kick you straight into the debugger when a violation occurs. But for one-off problem solving like this, I'd say install an #ifdef DEBUG-wrapped mprotect call in your lock/unlock functions.

Related

Read mmapped data memory efficient

I want to mmap a big file into memory and parse it sequentially. As I understand if bytes have been lazily read into memory once, they stay there. Is there a way to periodically tell the system to release the previously read contents?
This understanding is only a very superficial view.
To understand what really happens you have to take into account the difference of the virtual memory of your process and the actual real memory of the machine. Mapping a huge file means reserving space in your virtual address-space. It's probably platform-dependent if anything is already read at this point.
When you actually access the data the OS has to fill an actual page of memory. When you access other parts these parts have to be brought into memory. It's completely up to the OS when it will re-use the memory. Normally this happens when some data is accessed by you or an other process and no free memory is available. But could happen at any time. If you access it again later it might be still in memory or will brought back by the OS. No way for your process to tell the difference.
In short: You don't need to care about that. The OS manages all that in the background.
One point might be that if you map a really huge file this takes up space in your virtual address-space which is limited. So if you deal with many huge mappings and or huge allocations you might want to only map parts of the file at a given time.
ADDITION: after thinking a bit about it, I came up with a reason why it might be smarter to do it blockwise-sequential. Although I doubt you will be able to measure that.
Any reasonable OS will look for a block to unload when in need in something like the following order:
unmapped files ( not needed anymore)
LRU unmodified mapped file (can be retrieved from disc)
LRU modified mapped file (same as 2. but needs to be updated on disc before unload)
LRU allocated memory (needs to be written to swap)
So unmapping blocks known to be never used again as you go, you give the OS a hint that these should be freed earlier. This will give data that has been used less recently but might be accessed in the future a bigger chance to stay in memory.

How to determine if specific memory locations are unallocated in Linux

I am trying to determine whether a group of contiguous memory locations on my Linux system are currently holding data for a process. If they are not then I want to allocate these locations to my programme and ensure that other processes know that those memory locations are reserved for my programme.
I need to access a specific memory location, e.g. 0xF0A35194. Any old location will not do.
I have been searching for solutions but have not been brave enough to execute and test any code for fear that I will destroy my system.
I do not want to run this code in kernel-space so a user-space solution is preferable.
Any help would be greatly appreciated!
There are a lot of issues at play here. To start, each process has its own memory address space. Even if a particular memory address is allocated for one process, it won't be for a different process. Memory virtualize and paging is a complex and opaque abstraction that can't be broken from within user space.
Next, the only reason that I can imagine you want to do something like this is to go hunting for particular DMA ranges for devices. That is also not allowed from user space and there are much better ways to accomplish that.
If you can post what you are trying to achieve more directly, we can provide a better solution.

Is it possible to control page-out and page-in by user programming? If yes then how?

My questions are as follows:
I mmap(memory mapping) a file into the virtual memory space.
When I access the first byte of the file using a pointer at the first time, the OS will try to access the data in memory, but it will fails and raises the page fault, because the data doesn't present in memory now. So the OS will swap the data from disk into memory. Finally my access will success.
(question is coming)
When I modify the data(in-memory) and write back into disk file, how could I just free the physical memory for other using, but remain virtual memory for fetching the data back into memory as needed?
It sounds like the page-out and page-in behaviors where the OS know the memory is exhaust, it will swap the LRU(or something like that) memory page into disk(swap files) and free the physical memory for other process, and fetch the evicted data back into memory as needed. But this mechanism is controlled by OS.
For some reasons, I need to control the page-out and page-in behaviors by myself. So how should I do? Hack the kernel?
You can use the madvise system call. Its behaviour is affected by the advice argument; there are many choices for advice and the optimal one should be picked based on the specifics of your application.
The flag MADV_DONTNEED means that the given range of physical backing frames should be unconditionally freed (i.e. paged out). Also:
After a successful MADV_DONTNEED operation, the semantics of
memory access in the specified region are changed: subsequent
accesses of pages in the range will succeed, but will result
in either repopulating the memory contents from the up-to-date
contents of the underlying mapped file (for shared file
mappings, shared anonymous mappings, and shmem-based
techniques such as System V shared memory segments) or zero-
fill-on-demand pages for anonymous private mappings.
This could be useful if you're absolutely certain that it will be very long until you access the same position again.
However it might not be necessary to force the kernel to actually page out; instead another possibility, if you're accessing the mapping sequentially is to use madvise with MADV_SEQUENTIAL to tell kernel that you'd access your memory mapping mostly sequentially:
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
or MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
These are not as aggressive as explicitly calling MADV_DONTNEED to page out. (Of course you can combine these with MADV_DONTNEED as well)
In recent kernel versions there is also the MADV_FREE flag which will lazily free the page frames; they will stay mapped in if enough memory is available, but are reclaimed by the kernel if the memory pressure grows.
You can checout mlock+munlock to lock/unlock the pages. This will give you control over pages being swapped out.
You need to have CAP_IPC_LOCK capability to perform this operation though.

The implementation of copy_from_user()

I am just wondering why does copy_from_user(to, from, bytes) do real copy? Because it just wants kernel to access user-space data, can it directly maps physical address to kernel's address space without moving the data?
Thanks,
copy_from_user() is usually used when writing certain device drivers. Note that there is no "mapping" of bytes here, the only thing that is happening is the copying of bytes from a certain virtual location mapped in user-space to bytes in a location in kernel-space. This is done to enforce separation of kernel and user and to prevent any security flaws -- you never want the kernel to start accessing and reading arbitrary user memory locations or vice-versa. That is why arguments and results from syscalls are copied to/from the user before they actually run.
"Before this it's better to know why copy_from_user() is used"
Because the Kernel never allow a user space application to access Kernel memory directly, because if the memory pointed is invalid or a fault occurs while reading, this would the kernel to panic by just simply using a user space application.
"And that's why!!!!!!"
So while using copy_from_user is all that it could create an error to the user and it won't affect the kernel functionality
Even though it's an extra effort it ensures the safe and secure operation of Kernel
copy_from_user() does a few checks before it starts copying data. Directly manipulating data from user-space is never a good idea because it exists in a virtual address space which might get swapped out.
http://www.ibm.com/developerworks/linux/library/l-kernel-memory-access/
one of the major requirement in system call implementation is to check the validity of user parameter pointer passed as argument, kernel should not blindly follow the user pointer as the user pointer can play tricks in many ways. Major concerns are:
1. it should be a pointer from that process address space - so that it cant get into some other process address space.
2. it should be a pointer from user space - it should not trick to play with a kernel space pointer.
3. it should not bypass memory access restrictions.
that is why copy_from_user() is performed. It is blocking and process sleeps until page fault handler can bring the page from swap file to physical memory.

mmaping large files(for persistent large arrays)

I'm implementing persistent large constant arrays via mmap. Is there any tips and tricks or gotchas one should be aware when using mmap?
All pointers that are stored inside the mmap'd region should be done as offsets from the base of the mmap'd region, not as real pointers! You won't necessarily be getting the same base address when you mmap the region on the next run of the program. (I have had to clean up code that made incorrect assumptions about mmap region base address constancy).
This is the most straight forward use case for mmap() so there shouldn't be much to trip you up.
You are effectively just loading a large constant array. Being constants you shouldn't need to worry about synchronization. It would be advisable to make sure the prot parameter is set to PROT_READ only since you won't be writing.
If one or more programs using the constants are going to be continually run, it might be worthwhile to have a separate program that loads the data and keeps it resident. Runs of the other programs then essentially are just doing an shared memory attach rather than continually reading the file into memory.
Make sure you check for restrictions on open file size or memory usage. On Linux there is a built in shell command ulimit. Run as ulimit -a to see the current settings.
Flush writes to the in-memory array to the file with the msync(2) syscall or else they may stay in memory until munmap(2) and there may be a power outage or something before then!
If multiple processes are mmap'ing the same memory region shared with read and write privileges, make sure that only one is writing to it at a time to avoid corrupting your data. Or use file locking or some other means of synchronization.

Resources