memfd_secret(): how is it supposed to work?

memfd_secret() was merged into the kernel, but I do not see its real security benefit. I mean, the idea is to avoid side-channel attacks, but this is like locking the car keys away where nobody knows they are.
AFAIK, the page given to the application simply is not mapped when in kernel mode, but this can't be used to isolate a virus, or whatever, from the kernel itself.
How is it supposed to be safer to isolate a range of memory from the kernel?
Could someone provide a code example showing how this protects against Spectre or similar attacks?
Update
#include <stdio.h>      /* perror */
#include <unistd.h>     /* fork */

int main(int argc, char *argv[]) {
    /* Warning: this is effectively a fork bomb; every child repeats the loop. */
    while (1) {
        int rc = fork();
        if (rc == -1) {
            perror("fork error");
        }
    }
}

memfd_secret() allows a user-space process to have a "secret" memory area. In this context, "secret" means that other processes cannot access that memory area (not even the kernel itself, or at least not by accident).
This syscall allows a process to store confidential information (like a password or a private key) in a more secure way, because it is harder for malware to access that secret memory area. The syscall should also protect against vulnerabilities like Spectre, because the secret memory area is uncached; and it should also protect (not completely, but at least partially) against kernel bugs, since the kernel has no access to that memory area.
In order to use this syscall (available since Linux 5.14), you first call memfd_secret() to obtain a file descriptor; then you call ftruncate() to choose the size of the secret memory region; and finally you use mmap() to map the secret memory, so you can access it via pointers as usual.
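Putting those three steps together, here is a minimal sketch. It assumes an x86_64 kernel built with CONFIG_SECRETMEM and booted with secretmem.enable=1, and that your glibc has no wrapper yet, so the raw syscall number (447 on x86_64) is used; note that the secret pages count against RLIMIT_MEMLOCK:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447   /* x86_64 syscall number; check your arch */
#endif

int main(void) {
    /* Step 1: obtain a file descriptor for the secret area. */
    int fd = (int)syscall(SYS_memfd_secret, 0);
    if (fd < 0) { perror("memfd_secret"); return 1; }

    /* Step 2: choose the size of the secret memory region. */
    size_t len = 4096;
    if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }

    /* Step 3: map it; the pages are removed from the kernel direct map. */
    char *secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (secret == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(secret, "a password");   /* use it like ordinary memory */

    munmap(secret, len);
    close(fd);
    return 0;
}

If memfd_secret() fails with ENOSYS, the kernel was built or booted without secretmem support.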
Other details are available here.
EDIT: unfortunately, the "uncached" feature that made memfd_secret() less vulnerable to attacks like Spectre has been removed because of performance concerns.
EDIT 2: additional details about why secret memory areas obtained with memfd_secret() make programs safer (source, slightly modified by me for clarity):
Enhanced protection (in conjunction with all the other in-kernel attack prevention systems) against ROP attacks. Secret memory makes "simple" ROP insufficient to perform exfiltration, which increases the required complexity of the attack. Along with other protections like the kernel stack size limit and address space layout randomization, which make finding gadgets really hard, the absence of any in-kernel primitive for accessing secret memory means a one-gadget ROP attack can't work. Since the only way to access secret memory is to reconstruct the missing mapping entry, the attacker has to recover the physical page, insert a PTE pointing to it in the kernel, and then retrieve the contents. That takes at least three gadgets, which is a level of difficulty beyond most standard attacks.
Prevent cross-process secret user-space memory exposures. Once the secret memory is allocated, the user can't accidentally pass it into the kernel to be transmitted somewhere. The secret memory pages cannot be accessed via the direct map and they are disallowed in GUP.
Harden against exploited kernel flaws. In order to access secret memory, a kernel-side attack would need to either walk the page tables and create new ones, or spawn a new privileged user-space process to perform secrets exfiltration using ptrace.
EDIT 3: just a note that I think may be relevant: secret memory areas can be accessed by child processes created with fork(), so one must be cautious. At least, with the O_CLOEXEC flag passed to memfd_secret(), the file descriptor is closed on execve(), so a program started with execve() will not have the secret memory made available to it.
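As a small illustration of that caveat, here is a sketch under the same assumptions as the example above: the forked child can still read the secret, while a program started with execve() would find the O_CLOEXEC descriptor already closed:

#define _GNU_SOURCE
#include <fcntl.h>          /* O_CLOEXEC */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447   /* x86_64 syscall number */
#endif

int main(void) {
    /* O_CLOEXEC: the descriptor disappears across execve(). */
    int fd = (int)syscall(SYS_memfd_secret, O_CLOEXEC);
    if (fd < 0) { perror("memfd_secret"); return 1; }
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    char *secret = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (secret == MAP_FAILED) { perror("mmap"); return 1; }
    strcpy(secret, "hunter2");

    if (fork() == 0) {
        /* The child inherits the mapping and can read the secret... */
        printf("child sees: %s\n", secret);
        /* ...but a program exec'd now would have neither this mapping
           nor (thanks to O_CLOEXEC) the descriptor to recreate it. */
        _exit(0);
    }
    wait(NULL);
    return 0;
}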

Related

Segments in RAM memory

I am confused about the segments in RAM memory; please clarify the following doubts:
1. RAM is divided into user space and kernel space; is this division done by the OS, or by the hardware (CPU)?
2. What are the contents of kernel space? As far as I have understood, there will be the kernel image only; please correct me if I am wrong.
3. Where do the code, data, stack and heap segments exist?
a) Do user space and kernel space have separate code, data, stack and heap segments?
b) Are these segments created by the hardware or by the OS?
4. Can I find the amount of memory occupied by kernel space and by user space?
a) Is there any Linux command (or system call) to find this?
5. Why has RAM been divided into user space and kernel space?
a) I feel it is done to keep the kernel safe from application programs. Is this the only reason?
I am a beginner, so please suggest some good books, links and ways to approach these concepts.
I took up the challenge and tried to give rather short answers:
1. Execution happens in user and kernel space. The BIOS and CPU support the OS in detecting and separating resources/address ranges such as main memory and devices (see the related question) in order to establish protected mode. In protected mode, memory is separated via virtual address spaces, which are mapped page-wise (usually in blocks of 4096 bytes) to real addresses of physical memory by the MMU (Memory Management Unit).
From user space, one cannot access memory directly (as one could in real mode); one has to access it via the MMU, which acts like a transparent proxy with access protection. Access errors are known as segmentation faults, access violations, or segmentation violations (SIGSEGV), and are abstracted as NullPointerException (NPE) in high-level programming languages like Java. A short example follows this list.
Read about protected mode, real mode and 'rings'.
Note: special CPUs, such as those in embedded systems, don't necessarily have an MMU and are therefore limited to special OSes like µClinux or FreeRTOS.
2. The kernel also allocates buffers of its own, and the same goes for drivers (e.g. IO buffers for disks, network interfaces and GPUs).
3. Generally, resources exist per space and per process/thread.
a) The kernel puts its own, protected stack on top of the user-space stack (per thread) and also has separate code (also called 'text'), data and heap segments. In addition, each process has its own resources.
b) CPU architectures impose certain requirements (depending on the degree of support they offer), but in the end it is the software (the kernel and the user-space libraries used for interfacing) that creates these structures.
4. Every reasonable OS provides at least one way to do that.
a) Try sudo cat /proc/slabinfo or simply sudo slabtop.
5. See 1.
a) Primarily, yes: user-space processes are isolated from the kernel just as they are isolated from each other, with exceptions for special techniques such as CMA (Cross Memory Attach), which newer kernels provide for fast direct access.
Search the Stack Exchange sites for recommended books.
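As promised above, a minimal illustration of a segmentation fault: dereferencing an unmapped (null) address makes the MMU raise a fault, which the kernel delivers to the process as SIGSEGV:

#include <stdio.h>

int main(void) {
    int *p = NULL;          /* address 0 has no mapping in user space */
    printf("about to dereference a null pointer...\n");
    *p = 42;                /* MMU fault -> kernel sends SIGSEGV */
    return 0;               /* never reached */
}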

The implementation of copy_from_user()

I am just wondering why copy_from_user(to, from, bytes) does a real copy. Since it just wants the kernel to access user-space data, couldn't it directly map the physical address into the kernel's address space without moving the data?
Thanks,
copy_from_user() is usually used when writing certain device drivers. Note that there is no "mapping" of bytes here; the only thing that is happening is the copying of bytes from a certain virtual location mapped in user space to a location in kernel space. This is done to enforce the separation of kernel and user and to prevent security flaws: you never want the kernel to start accessing and reading arbitrary user memory locations, or vice versa. That is why arguments to and results from syscalls are copied to and from user space when they run.
"Before this it's better to know why copy_from_user() is used"
Because the Kernel never allow a user space application to access Kernel memory directly, because if the memory pointed is invalid or a fault occurs while reading, this would the kernel to panic by just simply using a user space application.
"And that's why!!!!!!"
So while using copy_from_user is all that it could create an error to the user and it won't affect the kernel functionality
Even though it's an extra effort it ensures the safe and secure operation of Kernel
copy_from_user() does a few checks before it starts copying data. Directly manipulating data from user space is never a good idea, because it lives in a virtual address space whose pages might get swapped out.
http://www.ibm.com/developerworks/linux/library/l-kernel-memory-access/
One of the major requirements in system call implementation is to check the validity of a user pointer passed as an argument; the kernel should not blindly follow the user pointer, as it can play tricks in many ways. The major concerns are:
1. It should be a pointer into that process's address space, so that it can't be used to reach into some other process's address space.
2. It should be a pointer into user space; it should not trick the kernel into dereferencing a kernel-space pointer.
3. It should not bypass memory access restrictions.
That is why copy_from_user() is performed. It may block, and the process sleeps until the page fault handler can bring the page from the swap file into physical memory.
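To make that concrete, here is a minimal sketch of the write handler of a hypothetical character device (my_write and kbuf are made-up names; the handler shape is the usual file_operations one):

#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/uaccess.h>

static char kbuf[64];

static ssize_t my_write(struct file *file, const char __user *ubuf,
                        size_t len, loff_t *off)
{
    if (len > sizeof(kbuf))
        len = sizeof(kbuf);
    /* copy_from_user() walks the user mapping safely, faulting pages in
       as needed, and returns the number of bytes it could NOT copy. */
    if (copy_from_user(kbuf, ubuf, len))
        return -EFAULT;   /* invalid or inaccessible user pointer */
    return len;
}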

Linux - Duplicate a virtual memory address from malloc or move a virtual memory address

Short question:
Is it possible to map a buffer that has been malloc'd so that there are two ways (two pointers pointing to the same physical memory) of accessing the same buffer?
Or, is it possible to temporarily move a virtual memory address received from malloc? Or to make one location in virtual space point to another?
Background:
I am working with DirectFB, a surface management and 2D graphics compositing library. I am trying to enforce the locking protocol, which is: lock a surface, modify the memory only while it is locked (the pointer is to system memory allocated using malloc), and unlock the surface.
I am currently trying to track down a bug in an application that locks a surface, stores the pixel pointer, and modifies the surface later. This means the library does not know when it is safe to read from or write to a surface. I am trying to find a way to detect that the locking protocol has been violated. What I would like is a way to invalidate the pointer passed to the user once the unlock call is made. Even better, I would like the application to seg fault if it tries to access the memory after the unlock. That would stop in the debugger and give us an idea of which surface is involved, which routine is involved, who called it, and so on.
Possible solutions:
1. Create a temporary buffer, pass the buffer pointer to the user, and on unlock copy the pixels into the actual buffer and delete the temporary buffer.
Pros: This is an implementable solution.
Cons: Performance suffers, because it requires an expensive copy, and the memory may or may not be available. There is also no way to guarantee that one temporary buffer does not reuse the addresses of another, which would allow an invalidated pointer to suddenly work again.
2. Make an additional mapping of the malloc'd surface and pass that to the user. On unlock, unmap the memory.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Need to set aside a reserved range of addresses that is never used by anything else (including malloc or the kernel). Also need to ensure that no two surfaces overlap, which could allow an old pointer to suddenly point to something valid and not seg fault when it should.
3. Take advantage of the fact that the library does not access the memory while it is locked by the user, and simply move the virtual address on lock and move it back on unlock.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Same as 2 above.
Is this feasible?
Additional info:
This is on Linux 2.6, using the standard C library.
The library is written in C.
The library and application run in user space.
There is a possibility of using a kernel module (to write a custom memory allocation routine), but the difficulty of writing a module in my current working climate would probably reduce the chances that I could actually implement this solution to near zero. But if this is the only way, it would be good to know.
The underlying processor is x86.
The function you want for creating multiple mappings of a page is shm_open().
You may only be using the memory within one process, but it's still "shared memory" - that is to say, multiple virtual mappings for the same underlying physical page will exist.
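A minimal sketch of that idea, with a hypothetical object name "/surface" (on older glibc, link with -lrt):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = shm_open("/surface", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    /* Two independent virtual mappings of the same physical pages. */
    char *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(a, "hello");          /* write through one mapping... */
    printf("%s\n", b);           /* ...read it back through the other */

    shm_unlink("/surface");
    return 0;
}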
However, that's not what you want to do. What you should actually do is have your locking functions use the mprotect system call to render the memory inaccessible on unlock and restore the permissions on lock; any access without the lock being held will then cause a segfault. Of course, this will only work with a single accessing thread at a time...
Another, possibly better, way to track down the problem would be to run your application in valgrind or another memory analysis tool. This will greatly slow it down, but allows you very fine control: you can have a valgrind script that will mark/unmark memory as accessible and the tool will kick you straight into the debugger when a violation occurs. But for one-off problem solving like this, I'd say install an #ifdef DEBUG-wrapped mprotect call in your lock/unlock functions.
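For completeness, a rough sketch of the mprotect approach; alloc_surface, lock_surface and unlock_surface are hypothetical names, and a real implementation would round surface sizes up to a page multiple:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

/* Surfaces come from mmap() rather than malloc() so they are
   page-aligned, which mprotect() requires. */
static void *alloc_surface(size_t len) {
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void lock_surface(void *p, size_t len) {
    mprotect(p, len, PROT_READ | PROT_WRITE);  /* accessible while locked */
}

static void unlock_surface(void *p, size_t len) {
    mprotect(p, len, PROT_NONE);  /* any later access raises SIGSEGV */
}

int main(void) {
    char *surface = alloc_surface(4096);
    if (!surface) return 1;

    lock_surface(surface, 4096);
    surface[0] = 0x7f;             /* fine: the surface is locked */
    unlock_surface(surface, 4096);
    /* surface[0] = 0; */          /* would now seg fault, catching the bug */
    return 0;
}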

read() system call does a copy of data instead of passing the reference

The read() system call causes the kernel to copy the data instead of passing the buffer by reference. I was asked the reason for this in an interview. The best I could come up with were:
To avoid concurrent writes on the same buffer across multiple processes.
If the user-level process tries to access a buffer mapped into the kernel virtual memory area, it will result in a segfault.
As it turns out the interviewer was not entirely satisfied with either of these answers. I would greatly appreciate if anybody could elaborate on the above.
A zero copy implementation would mean the user level process would have to be given access to the buffers used internally by the kernel/driver for reading. The user would have to make an explicit call to the kernel to free the buffer after they were done with it.
Depending on the type of device being read from, the buffers could be more than just an area of memory. (For example, some devices could require the buffers to be in a specific area of memory. Or they could only support writing to a fixed area of memory be given to them at startup.) In this case, failure of the user program to "free" those buffers (so that the device could write more data to them) could cause the device and/or its driver to stop functioning properly, something a user program should never be able to do.
The buffer is specified by the caller, so the only way to get the data there is to copy them. And the API is defined the way it is for historical reasons.
Note that your two points above are no problem for the alternative, mmap, which does pass the buffer by reference (but writing to it then writes to the file, so you can't process the data in place, while many users of read do just that).
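A small sketch contrasting the two, using /etc/hostname as an arbitrary readable file: read() fills a caller-supplied buffer (a copy), while mmap() makes the process page tables reference the page cache directly:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* read(): the kernel copies into a caller-supplied buffer. */
    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) { buf[n] = '\0'; printf("read(): %s", buf); }

    /* mmap(): no copy; the mapping references the page cache. */
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            printf("mmap(): %.*s", (int)st.st_size, p);
            munmap(p, st.st_size);
        }
    }
    close(fd);
    return 0;
}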
I might have been prepared to dispute the interviewer's assertion. The buffer in a read() call is supplied by the user process and therefore comes from the user address space. It's also not guaranteed to be aligned in any particular way with respect to page frames. That makes it tricky to do what is necessary to perform IO directly into the buffer ie. map the buffer into the device driver's address space or wire it for DMA. However, in limited circumstances, this may be possible.
I seem to remember that the BSD subsystem used by Mac OS X for copying data between address spaces had an optimisation in this respect, although I may be completely mistaken.

Shared memory and IPC

I was reading a tutorial about shared memory and found the following statement: "If a process wishes to notify another process that new data has been inserted into the shared memory, it will have to use signals, message queues, pipes, sockets, or other types of IPC." So what is the main advantage of using shared memory plus another type of IPC for notification, instead of using a single IPC mechanism that needs no other IPC type, like a message queue or a socket?
The distinction here is IPC mechanisms for signalling versus shared state.
Signalling (signals, message queues, pipes, etc.) is appropriate for information that tends to be short, timely and directed. Events over these mechanisms tend to wake up or interrupt another program. The analogy would be, "what would one program SMS to another?"
Hey, I added a new entry to the hash table!
Hey, I finished that work you asked me to do!
Hey, here's a picture of my cat. Isn't he cute?
Hey, would you like to go out, tonight? There's this new place called the hard drive.
Shared memory, compared with the above, is more effective for sharing relatively large, stable objects that change in small parts or are read repeatedly. Programs might consult shared memory from time to time or after receiving some other signal. Consider, what would a family of programs write on a (large) whiteboard in their home's kitchen?
Our favorite recipes.
Things we know.
Our friends' phone numbers and other contact information.
The latest manuscript of our family's illustrious history, organized by prison time served.
With these examples, you might say that shared memory is closer to a file than to an IPC mechanism in the strictest sense, with the obvious exceptions that shared memory is
Random access, whereas files are sequential.
Volatile, whereas files tend to survive program crashes.
An example of where you want shared memory is a shared hash table (or btree or other compound structure). You could have every process receive update messages and update a private copy of the structure, or you can store the hash table in shared memory and use semaphores for locking.
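A rough sketch of that arrangement, shrunk from a hash table down to a shared counter; "/table" and "/table_lock" are hypothetical names, and on older glibc you link with -lrt -pthread:

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* Shared state lives in a named POSIX shared memory object... */
    int fd = shm_open("/table", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, sizeof(int)) < 0) { perror("ftruncate"); return 1; }
    int *counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) { perror("mmap"); return 1; }

    /* ...and a named semaphore serializes access to it. */
    sem_t *lock = sem_open("/table_lock", O_CREAT, 0600, 1);
    if (lock == SEM_FAILED) { perror("sem_open"); return 1; }

    sem_wait(lock);               /* take the lock */
    (*counter)++;                 /* every run of the program adds one */
    printf("counter = %d\n", *counter);
    sem_post(lock);               /* release the lock */

    sem_close(lock);
    munmap(counter, sizeof(int));
    close(fd);
    return 0;
}

Run the program from several shells at once: each process sees and increments the same counter, with the semaphore preventing lost updates.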
Shared memory is very fast - that is the main advantage and reason you would use it. You can use part of the memory to keep flags/timestamps regarding the data validity, but you can use other forms of IPC for signaling if you want to avoid polling the shared memory.
Shared memory is used to transfer the data between processes (and also to read/write disk files fast). If you don't need to transfer the data and need to only notify other process, don't use shared memory - use other notification mechanisms (semaphores, events, etc) instead.
Depending on the amount of data to be passed from process to process, shared memory would be more efficient because you would minimize the number of times that data would be copied from userland memory to kernel memory and back to userland memory.
