I am just wondering why copy_from_user(to, from, bytes) does a real copy. Since the kernel only wants to access user-space data, can't it simply map the physical address into the kernel's address space instead of moving the data?
Thanks,
copy_from_user() is usually used when writing certain device drivers. Note that there is no "mapping" of bytes here; the only thing that happens is the copying of bytes from a virtual location mapped in user space to a location in kernel space. This is done to enforce the separation of kernel and user and to prevent security flaws -- you never want the kernel to start accessing and reading arbitrary user memory locations, or vice versa. That is why arguments to syscalls are copied in from user space before the syscall actually runs, and results are copied back out afterwards.
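To make that concrete, here is a minimal, untested sketch of how a character-device write handler typically uses copy_from_user(); the function and buffer names are hypothetical:

    #include <linux/fs.h>
    #include <linux/uaccess.h>

    /* Hypothetical write handler for a character device. */
    static ssize_t my_dev_write(struct file *filp, const char __user *buf,
                                size_t count, loff_t *ppos)
    {
        char kbuf[128];

        if (count > sizeof(kbuf))
            count = sizeof(kbuf);

        /* copy_from_user() validates the user pointer and returns the number
         * of bytes it could NOT copy; any nonzero result means a fault. */
        if (copy_from_user(kbuf, buf, count))
            return -EFAULT;

        /* ... safely work on kbuf in kernel space ... */
        return count;
    }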
"Before this it's better to know why copy_from_user() is used"
Because the Kernel never allow a user space application to access Kernel memory directly, because if the memory pointed is invalid or a fault occurs while reading, this would the kernel to panic by just simply using a user space application.
"And that's why!!!!!!"
So while using copy_from_user is all that it could create an error to the user and it won't affect the kernel functionality
Even though it's an extra effort it ensures the safe and secure operation of Kernel
copy_from_user() does a few checks before it starts copying data. Directly manipulating data from user-space is never a good idea because it exists in a virtual address space which might get swapped out.
http://www.ibm.com/developerworks/linux/library/l-kernel-memory-access/
One of the major requirements in a system call implementation is to check the validity of the user pointer passed as an argument; the kernel should not blindly follow a user pointer, since it can play tricks in many ways. The major concerns are:
1. It should be a pointer into that process's address space, so it cannot reach into some other process's address space.
2. It should be a pointer into user space; it must not trick the kernel into dereferencing a kernel-space pointer.
3. It should not bypass memory access restrictions.
That is why copy_from_user() is used. It can block: the process sleeps until the page fault handler brings the page from the swap file (or backing store) into physical memory.
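As a small illustration, a single scalar can be fetched with get_user(), which performs the same kind of checking; this is only a sketch with hypothetical names:

    #include <linux/uaccess.h>

    /* Hypothetical helper on a syscall/ioctl path: fetch one int from user space. */
    static long read_user_flag(const int __user *uptr, int *out)
    {
        int value;

        /* get_user() verifies that uptr is a user-space address and copes with
         * a possible page fault; it returns 0 on success or -EFAULT. */
        if (get_user(value, uptr))
            return -EFAULT;

        *out = value;
        return 0;
    }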
I am writing a device driver that uses a character device file to copy data from a user-space buffer (allocated with malloc) into a kernel buffer. Currently I am using the copy_from_user API to copy the user data into the kernel buffer. I am trying to find a way to avoid copying data between user and kernel space. Is there any way to access a user-space buffer (allocated by malloc) from kernel space without using copy_from_user?
Let's start by answering the question you've asked. Yes, you can access memory malloced by the userspace from the kernel. I'm not sure of the technicalities of how, but if your kernel code runs in the context of the calling thread, it might be possible to simply use the user space pointer given to you.
Please don't do that, however.
The reason you shouldn't do that is that if the user-space pointer is bad, or the memory is too short, or any other problem exists, the user-space process may need to crash with a segmentation fault. Sadly, you are not running user-space code, you are running kernel code, and the kernel equivalent of a segmentation fault is a kernel panic. Don't inflict that on your users.
A better approach is to use a mechanism where userspace writes the data once, and then the character device can simply use it. Having user space mmap the page was mentioned in the comments. A method that might be more standard to the way file descriptors work would be to implement splice_read in the device's fileops struct.
Now, splice is not an easy interface to work with, but the main advantage is that if your source is also splice aware, the user can pass data directly from the source to your driver without it ever passing through user space.
If you want to save the copies, I suggest you go with one of those solutions. Again, do not access the user supplied pointer directly unless you know what you're doing.
The get_user_pages() API can be used to pin user pages so that they are not swapped out of physical memory; the kernel can then access that memory area through the physical pages backing the corresponding virtual addresses.
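Roughly, the usage looks like the sketch below; note that the get_user_pages()/pin_user_pages() family has changed signature several times across kernel versions, so treat this as an approximation for a recent kernel, with hypothetical names:

    #include <linux/mm.h>
    #include <linux/highmem.h>

    /* Hypothetical: pin and touch the single user page containing uaddr. */
    static int touch_user_page(unsigned long uaddr)
    {
        struct page *page;
        void *kaddr;
        int ret;

        /* Pin the page so it cannot be swapped out while the kernel uses it. */
        ret = pin_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, &page);
        if (ret != 1)
            return ret < 0 ? ret : -EFAULT;

        kaddr = kmap(page);          /* map the physical page into kernel space */
        /* ... read or write the page contents through kaddr ... */
        kunmap(page);

        unpin_user_page(page);       /* drop the pin when done */
        return 0;
    }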
I was reading a paragraph from the "The Linux Kernel Module Programming Guide" and I have a couple of doubts related to the following paragraph.
The reason for copy_from_user or get_user is that Linux memory (on Intel architecture, it may be different under some other processors) is segmented. This means that a pointer, by itself, does not reference a unique location in memory, only a location in a memory segment, and you need to know which memory segment it is to be able to use it. There is one memory segment for the kernel, and one for each of the processes.
However, it is my understanding that Linux uses paging rather than segmentation, and that virtual addresses at and above 0xc0000000 have the kernel mapped in.
1. Do we use copy_from_user in order to accommodate older kernels?
2. Do current Linux kernels use segmentation in any way at all? If so, how?
3. If (1) is not true, are there any other advantages to using copy_from_user?
Yeah. I don't like that explanation either. The details are essentially correct in a technical sense (see also Why does Linux on x86 use different segments for user processes and the kernel?) but, as you say, Linux typically maps user memory so that kernel code can access it directly, so I don't think it's a good explanation for why copy_from_user, etc. actually exist.
IMO, the primary reason for using copy_from_user / copy_to_user (and friends) is simply that there are a number of things to be checked (dangers to be guarded against), and it makes sense to put all of those checks in one place. You wouldn't want every place that needs to copy data in and out from user-space to have to re-implement all those checks. Especially when the details may vary from one architecture to the next.
For example, it's possible that a user-space page is actually not present when you need to copy to or from that memory and hence it's important that the call be made from a context that can accommodate a page fault (and hence being put to sleep).
Also, user-space data pointers need to be checked carefully to ensure that they actually point to user-space and that they point to data regions, and that the copy length doesn't wrap beyond the end of the valid regions, and so forth.
Finally, it's possible that user-space actually doesn't share the same page mappings with the kernel. There used to be a linux patch for 32-bit x86 that made the complete 4G of virtual address space available to user-space processes. In that case, kernel code could not make the assumption that a user-space pointer was directly accessible, and those functions might need to map individual user-space pages one at a time in order to access them. (See 4GB/4GB Kernel VM Split)
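Because all of those checks (and any architecture-specific handling) are hidden behind one call, the call sites stay short. A minimal, untested sketch of a read handler with hypothetical names, relying on copy_to_user() for all the policing:

    #include <linux/fs.h>
    #include <linux/uaccess.h>

    static ssize_t my_dev_read(struct file *filp, char __user *buf,
                               size_t count, loff_t *ppos)
    {
        static const char msg[] = "hello from the kernel\n";
        size_t avail = sizeof(msg);

        if (*ppos >= avail)
            return 0;                               /* EOF */
        if (count > avail - *ppos)
            count = avail - *ppos;

        /* Returns the number of bytes it could not copy; 0 means success. */
        if (copy_to_user(buf, msg + *ppos, count))
            return -EFAULT;

        *ppos += count;
        return count;
    }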
I have to scan the memory space of a calling process in C. This is for homework. My problem is that I don't fully understand virtual memory addressing.
I'm scanning the memory space by attempting to read and write to a memory address. I can not use proc files or any other method.
So my problem is setting the pointers.
From what I understand the "User Mode Space" begins at address 0x0, however, if I set my starting point to 0x0 for my function, then am I not scanning the address space for my current process? How would you recommend adjusting the pointer -- if at all -- to address the parent process address space?
edit: Ok, sorry for the confusion, and I appreciate the help. We cannot use the proc file system, because the assignment is intended for us to learn about signals.
So, basically I'm going to try to read and then write to an address in each page of memory to test whether it is readable, readable/writable, or not accessible. To see whether I was successful I will be listening for certain signals -- I'm not sure how to go about that part yet. I will be creating a linked list of structures to represent the accessibility of the memory. The program will be compiled as a 32-bit program.
With respect to parent process and child process: the exact text states
When called, the function will scan the entire memory area of the calling process...
Perhaps I am mistaken about the child and parent interaction, due to the fact we've been covering this (fork function etc.) in class, so I assumed that my function would be scanning a parent process. I'm going to be asking for clarification from the prof.
So, judging from this picture I'm just going to start from 0x0.
From a userland process's perspective, its address space starts at address 0x0, but not every address in that space is valid or accessible to the process. In particular, address 0x0 itself is never a valid address. If a process attempts to access memory (in its address space) that is not actually assigned to it, then a segmentation fault results.
You could actually use the segmentation fault behavior to help you map out what parts of the address space are in fact assigned to the process. Install a signal handler for SIGSEGV, and skip through the whole space, attempting to read something from somewhere in each page. Each time you trap a SIGSEGV you know that page is not mapped for your process. Go back afterward and scan each accessible page.
Do only read, however. Do not attempt to write to random memory, because much of the memory accessible to your programs is the binary code of the program itself and of the shared libraries it uses. Not only do you not want to crash the program, but also much of that memory is probably marked read-only for the process.
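A rough, untested sketch of that read-only probing loop; the start/end addresses and the 32-bit assumption are mine, not part of your assignment:

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static sigjmp_buf probe_env;

    static void segv_handler(int sig)
    {
        (void)sig;
        siglongjmp(probe_env, 1);   /* bail out of the faulting access */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        long page = sysconf(_SC_PAGESIZE);
        unsigned long readable = 0;

        /* Assumed range for a 32-bit process: skip page 0, stop below the kernel split. */
        for (unsigned long addr = 0x10000; addr < 0xC0000000UL; addr += page) {
            if (sigsetjmp(probe_env, 1) == 0) {     /* also saves the signal mask */
                volatile char c = *(volatile char *)addr;   /* may raise SIGSEGV */
                (void)c;
                readable++;
            }
            /* else: SIGSEGV fired, so this page is not readable */
        }
        printf("readable pages: %lu\n", readable);
        return 0;
    }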
EDIT: Generally speaking, a process can only access its own (virtual) address space. As #cmaster observed, however, there is a syscall (ptrace()) that allows some processes access to some other processes' memory in the context of the observed process's address space. This is how general-purpose debuggers usually work.
You could read (from your program) the /proc/self/maps file. Try first the following two commands in a terminal
cat /proc/self/maps
cat /proc/$$/maps
(at least to understand what the address space looks like)
Then read proc(5), mmap(2) and of course wikipages about processes, address space, virtual memory, MMU, shared memory, VDSO.
If you want to share memory between two processes, read first shm_overview(7)
If you can't use /proc/ (which is a pity) consider mincore(2)
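For instance (my own hedged sketch, not from your assignment): mincore() fails with ENOMEM when the queried range contains an unmapped page, which lets you test whether a page is mapped at all:

    #define _DEFAULT_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Returns 1 if the page containing addr is mapped in this process,
     * 0 if not, -1 on another error. */
    static int page_is_mapped(void *addr)
    {
        unsigned char vec;
        long page = sysconf(_SC_PAGESIZE);
        void *base = (void *)((unsigned long)addr & ~(page - 1));

        if (mincore(base, (size_t)page, &vec) == 0)
            return 1;
        return errno == ENOMEM ? 0 : -1;
    }

    int main(void)
    {
        int x = 42;
        printf("stack page mapped: %d\n", page_is_mapped(&x));
        printf("page at 0x10 mapped: %d\n", page_is_mapped((void *)0x10));
        return 0;
    }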
You could also, non-portably, try reading from (and perhaps rewriting the same value through a volatile int* into) some address and catching the SIGSEGV signal (with a sigsetjmp(3) in the signal handler), doing that in a dichotomic loop (in multiples of 4 Kbytes) from some sane start and end addresses (certainly not from 0, but probably from (void*)0x10000 up to (void*)0xffffffffff600000).
See signal(7).
You could also use the Linux (Gnu libc) specific dladdr(3). Look also into ptrace(2) (which should be often used from some other process).
Also, you could study elf(5) and read your own executable ELF file. Canonically it is /proc/self/exe (a symlink), but you should be able to get its path from the argv[0] of your main (perhaps with the convention that your program should be started with its full path name).
Be aware of ASLR and disable it if your teacher permits that.
PS. I cannot figure out what your teacher is expecting from you.
It is a bit more difficult than it seems at first sight. In Linux, every process has its own address space; an arbitrary memory address refers to the address space of the current process only. However, there are mechanisms that allow one process to access memory regions of another, and certain Linux functions provide this shared-memory feature. For example, take a look at this link, which gives some examples of using shared memory under Linux with shmget, shmctl and other system calls. You can also look at the mmap system call, which is used to map a file into a process's memory but can also be used for the purpose of sharing memory between processes.
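A tiny sketch of the System V calls mentioned above (error handling trimmed; the segment size is arbitrary):

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* Create a 4 KiB System V shared memory segment. */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return 1; }

        char *mem = shmat(id, NULL, 0);     /* attach it into our address space */
        if (mem == (void *)-1) { perror("shmat"); return 1; }

        strcpy(mem, "visible to any process that attaches this segment");
        /* A child created with fork(), or another process given `id`,
         * could shmat() the same segment and see this data. */

        shmdt(mem);                         /* detach */
        shmctl(id, IPC_RMID, NULL);         /* mark the segment for removal */
        return 0;
    }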
Short question:
Is it possible to map a buffer that has been malloc'd to have two ways (two pointers pointing to the same physical memory) of accessing the same buffer?
Or, is it possible to temporarily move a virtual memory address received by malloc? Or is it possible to point from one location in virtual space to another?
Background:
I am working with DirectFB, a surface management and 2D graphics compositing library. I am trying to enforce the locking protocol, which is to lock a surface, modify the memory only while it is locked (the pointer is to system memory allocated using malloc), and then unlock the surface.
I am currently trying to trace down a bug in an application that is locking a surface and then storing the pixel pointer and modifying the surface later. This means that the library does not know when it is safe to read or write to a surface. I am trying to find a way to detect that the locking protocol has been violated. What I would like is a way to invalidate the pointer passed to the user after the unlock call is made. Even better, I would like the application to seg fault if it tries to access the memory after the lock. This would stop in the debugger and give us an idea of which surface is involved, which routine is involved, who called it, etc.
Possible solutions:
1. Create a temporary buffer, pass the buffer pointer to the user, and on unlock copy the pixels into the actual buffer and delete the temporary buffer.
Pros: This is an implementable solution.
Cons: Performance is slow, since it requires an expensive copy, and the extra memory may or may not be available. There is also no way to guarantee that one temporary buffer does not reuse the address range of another, which would let an invalidated pointer suddenly work again.
2. Make an additional mapping to the malloc'd surface and pass that to the user. On unlock, unmap the memory.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Need to set aside a reserved range of addresses that is never used by anything else (including malloc or the kernel). Also need to ensure that no two surfaces overlap, which could allow an old pointer to suddenly point to something valid and not seg fault when it should.
3. Take advantage of the fact that the library does not access the memory while it is locked by the user, and simply move the virtual address on a lock and move it back on an unlock.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Same as 2 above.
Is this feasible?
Additional info:
This is using Linux 2.6, using stdlib.
The library is written in C.
The library and application run in user space.
There is a possibility of using a kernel module (to write a custom memory allocation routine), but the difficulty of writing a module in my current working climate would probably reduce the chances that I could actually implement this solution to near zero. But if this is the only way, it would be good to know.
The underlying processor is x86.
The function you want, for creating multiple mappings of the same page, is shm_open().
You may only be using the memory within one process, but it's still "shared memory" - that is to say, multiple virtual mappings for the same underlying physical page will exist.
However, that's not what you want to do. What you should actually do is have your locking functions use the mprotect system call to render the memory unreadable on unlock and restore the permissions on lock; any access without the lock being held will cause a segfault. Of course, this'll only work with a single simultaneous accessing thread...
Another, possibly better, way to track down the problem would be to run your application in valgrind or another memory analysis tool. This will greatly slow it down, but allows you very fine control: you can have a valgrind script that will mark/unmark memory as accessible and the tool will kick you straight into the debugger when a violation occurs. But for one-off problem solving like this, I'd say install an #ifdef DEBUG-wrapped mprotect call in your lock/unlock functions.
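A hedged sketch of the mprotect() approach, assuming the surface memory is page-aligned and a whole number of pages (e.g. obtained with mmap rather than plain malloc, since mprotect on malloc'd memory is not portable); all names here are hypothetical:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SURFACE_SIZE (16 * 4096)        /* assumed: a multiple of the page size */

    /* Hypothetical debug lock/unlock: any access after unlock segfaults. */
    static void *surface_lock(void *buf)
    {
        mprotect(buf, SURFACE_SIZE, PROT_READ | PROT_WRITE);
        return buf;
    }

    static void surface_unlock(void *buf)
    {
        mprotect(buf, SURFACE_SIZE, PROT_NONE);   /* revoke all access */
    }

    int main(void)
    {
        /* Page-aligned allocation; plain malloc() memory may share pages
         * with other allocations, so mmap() is safer for this trick. */
        void *buf = mmap(NULL, SURFACE_SIZE, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        char *pixels = surface_lock(buf);
        memset(pixels, 0xff, SURFACE_SIZE);       /* OK while locked */
        surface_unlock(buf);

        pixels[0] = 0;       /* SIGSEGV: the protocol violation is caught here */
        return 0;
    }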
The read() system call causes the kernel to copy the data instead of passing the buffer by reference. I was asked the reason for this in an interview. The best I could come up with were:
To avoid concurrent writes on the same buffer across multiple processes.
If the user-level process tries to access a buffer mapped into the kernel virtual memory area, it will result in a segfault.
As it turns out the interviewer was not entirely satisfied with either of these answers. I would greatly appreciate if anybody could elaborate on the above.
A zero copy implementation would mean the user level process would have to be given access to the buffers used internally by the kernel/driver for reading. The user would have to make an explicit call to the kernel to free the buffer after they were done with it.
Depending on the type of device being read from, the buffers could be more than just an area of memory. (For example, some devices could require the buffers to be in a specific area of memory. Or they could only support writing to a fixed area of memory given to them at startup.) In this case, failure of the user program to "free" those buffers (so that the device could write more data to them) could cause the device and/or its driver to stop functioning properly, something a user program should never be able to do.
The buffer is specified by the caller, so the only way to get the data there is to copy it. And the API is defined the way it is for historical reasons.
Note that your two points above are not a problem for the alternative, mmap, which does pass the buffer by reference (writing to the mapping then writes to the file, so you can't process the data in place, while many users of read do just that).
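For comparison, a small sketch of the mmap alternative: the file's page-cache pages are mapped into the process instead of being copied into a caller-supplied buffer (the file path here is just an example):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/etc/hostname", O_RDONLY);   /* any readable, non-empty file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);
        if (st.st_size == 0) { close(fd); return 0; }   /* mmap of length 0 fails */

        /* No copy into a user buffer: the file's pages are mapped read-only
         * into our address space. */
        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        fwrite(data, 1, st.st_size, stdout);

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }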
I might have been prepared to dispute the interviewer's assertion. The buffer in a read() call is supplied by the user process and therefore comes from the user address space. It's also not guaranteed to be aligned in any particular way with respect to page frames. That makes it tricky to do what is necessary to perform IO directly into the buffer ie. map the buffer into the device driver's address space or wire it for DMA. However, in limited circumstances, this may be possible.
I seem to remember that the BSD subsystem Mac OS X uses to copy data between address spaces had an optimisation in this respect, although I may be completely mistaken.