mmap() in linux kernel to access unmapped memory - c

I am trying to use the mamp() functionality provided in linux-kernel.
As we call mmap() in user-space we try to map virtual memory area of user-space process to the memory in the kernel-space.
the definition of mamp() inside kernel is done in my kernel module which try to allocate some memory in pages & maps it during mmap system call. The memory content of this kernel-space memory could be filled by this module.
The question i want to ask is that after memory mapping the user-space process could access the mapped memory directly with-out any extra kernel overload so there will be no system-call like read() but if the memory(allocated inside kernel-space & mapped in the kernel-space) is containing the pointer to other memory(not mapped) allocated inside the kernel-space then could the user-space process be able to access this unmapped memory with the help of mapped memory's content which are pointer to this unmapped memory.

No, userspace can't chase pointers in mapped memory that point to unmapped kernel memory.

No user-space process can not be able to access the unmapped memory. Kernel wont allow you to access that memory.
You are able to access only that portion of memory which is mapped via mmap.
I think use can use remap_pfn_range function explicitly to remapping the region.
From Linux mmap man page
The effect of changing
the size of the underlying file of a mapping on the pages that correspond to
added or removed regions of the file is unspecified.

No,you can't.
However,If your purpose is to change your mmaped area on the fly,Here are some options:
A. In user space, you can use mremap which expands (or shrinks) an existing memory mapping.
B. In kernel space,in your driver, you need to implement nopage() method or remap_pfn_range,but remap_pfn_range has its limitation which Linux only gives the reserved pages and you even cant remap normal address,such as the one allocated by get_free_page()

Related

Linux: Mapping debugee memory into debugger memory space

Basically, I want to avoid system calls for reading to/writing from the debugee memory space. I only want to map a single mapping from /proc/pid/maps, I tried just mmap()ing from /proc/pid/mem but turns out procfs doesn't support mmap.
Tuxifan

About memory allocation, does C malloc/calloc depends on Linux mmap/malloc or the opposite?

As far as I know, C has the following functions, e.g: malloc, calloc, realloc, to allocate memory. And the linux kernel also has the following functions, e.g: malloc, mmap, kmalloc, vmalloc... to allocate memory.
I want to know which is the lowest function.
If you say, "Linux kernel is the lowest function, your C program must allocate memory with Linux kernel", then how does the Linux kernel allocate it's own memory?
Or if you say, "Linux kernel is the lowest function.", then when I write a C program and run in the Linux system, to allocate memory, I should through the system call.
Hope to have an answer.
I want to know which is the lowest function.
The user-level malloc function calls brk or malloc (depending on the library used and depending on the Linux version).
... how does the Linux kernel allocate it's own memory?
On a system without MMU this is easy:
Let's say we have system with 8 MB RAM and we know that the address of the RAM is 0x20000000 to 0x20800000.
The kernel contains an array that contains information about which pages are in use. Let's say the size of a "page" is 0x1000 (this is the page size in x86 systems with MMU).
In old Linux versions the array was named mem_map. Each element in the array corresponds to one memory "page". It is zero if the page is free.
When the system is started, the kernel itself initializes this array (writes the initial values in the array).
If the kernel needs one page of memory, it searches for an element in the array whose value is 0. Let's say mem_map[0x123] is 0. The kernel now sets mem_map[0x123]=1;. mem_map[0x123] corresponds to the address 0x20123000, so the kernel "allocated" some memory at address 0x20123000.
If the kernel wants to "free" some memory at address 0x20234000, it simply sets mem_map[0x234]=0;.
On a system with MMU, it is a bit more complicated but the principle is the same.
On the linux OS, the C functions malloc, calloc, realloc used by user mode programs are implemented in the C library and handle pages of memory mapped in the process address space using the mmap system call. mmap associates pages of virtual memory with addresses in the process address space. When the process then accesses these addresses, actual RAM is mapped by the kernel to this virtual space. Not every call to malloc maps memory pages, only those for which not enough space was already requested from the system.
In the kernel space, a similar process takes place but the caller can require that the RAM be mapped immediately.

How is the anatomy of memory mapped Kernel space

I try to understand the mechanism in Linux of mapping kernel mode space into user mode space using mmap.
First I have a loadable kernel module (LKM) which provides a character device with mmap-functionality. Then a user space application open the device and calls mmap the LKM allocate memory space on the heap of the LKM inside the kernel mode space (virtual high address). On user space side the data pointer points to a virtual low address.
The following picture shows how I imagine the anatomy of memory is. Is this right?
Please let me know if question is not clear, I will try to add more details.
Edit: The picture was edited regarding to Gil Hamilton. The black arrow now points to a physical address.
The drawing is missing out a few important underlying assumptions.
The kernel does not need to mmap() to access user space memory. If a user process has the memory, it's already mapped in the address space by definition. In that sense, the memory is already shared between user and kernel.
mmap() creates a new region in user's virtual address space, so that the address region can be populated by physical memory if later accessed. The actual allocation of memory and modifying the page table entry is done by the kernel.
mmap() only makes sense for managing user-half of the virtual address space. Kernel-half of the address space is managed completely differently.
Also, the kernel-half is shared by all processes in the system. Each process has its dedicated virtual address space, but the page tables are programmed in such a way that the page table entries for the kernel-half are set exactly the same for all processes.
Again, the kernel does not mmap() in order to access user space memory. mmap() is rather a service provided by kernel to user to modify the current mapping in user's virtual address space.
BTW, the kernel actually has a few ways to access user memory if it wants to.
First of all, the kernel has a dedicated region of kernel address space (as part of its kernel space) which maps the entirety of the physical memory present in consecutive fashion. (This is true in all 64-bit system. In 32-bit system the kernel has to 'remap' on-the-fly to achieve this.)
Second, if the kernel is entered via a system call or exception, not by hardware interrupt, you have valid process context, so the kernel can directly "dereference" user space pointer to get the correct value.
Third, if kernel wants to deference a user space pointer of a process while executing in a borrowed context such as in an interrupt handler, kernel can trace process's virtual address by traversing the vm_area_struct tree for permission and walking the page table to find out actual physical page frame.
You can check the memory regions by iterating through vma's "struct vm_area_struct" through current.
If you walk pagetables and derive mapped physical addresses for virtual addresses which is not related to user space then memory layout will be more clear.
Apart from this minor correction in this figure,
BSS is not a segment but section which is embed to Data segment, refer ELF specification for more details, linker script

An alternative to map physical memory to user virtual address space without using "mmap" call [duplicate]

This question already has answers here:
Mapping DMA buffers to userspace [closed]
(5 answers)
Closed 9 years ago.
In Linux we know that we can map physical memory to the user virtual address space using mmap call in user-space app and implementing mmap function pointer in our device driver(using remap_pfn_range). But is there any other way to map the physical memory to user virtual address space without the mmap call. May be we can use malloc call and make an "IOCTL" call passing user virtual start address and then using kmalloc and remap_pfn_range we can map.
I tried once but failed. Is it the correct way or any other way exists.
-Sumeet
Physical memory is mapped to virtual address space automatically, without any mmap call, by the Operating System. It is called memory management or memory paging. This means that with any new memory your process needs, e.g. by malloc, or in the stack segment, the process asks for new pages and OS creates mapping from virtual to physical memory automatically. But, you need not to know all this - the operating system will hide everything from you. Just be happy in the virtual address space and that's enough :-)
mmap, on the other hand is mapping files to virtual memory. It is special case of mapping. You can also use IPC shared memory (shmget) to share memory between processes. You can also do this using mmap only without actually writing to file.
Depends what you want to do.

How does mprotect() work?

I was stracing some of the common commands in the linux kernel, and saw mprotect() was used a lot many times. I'm just wondering, what is the deciding factor that mprotect() uses to find out that the memory address it is setting a protection value for, is in its own address space?
On architectures with an MMU1, the address that mprotect() takes as an argument is a virtual address. Each process has its own independent virtual address space, so there's only two possibilities:
The requested address is within the process's own address range; or
The requested address is within the kernel's address range (which is mapped into every process).
mprotect() works internally by altering the flags attached to a VMA2. The first thing it must do is look up the VMA corresponding to the address that was passed - if the passed address was within the kernel's address range, then there is no VMA, and so this search will fail. This is exactly the same thing happens if you try to change the protections on an area of the address space that is not mapped.
You can see a representation of the VMAs in a process's address space by examining /proc/<pid>/smaps or /proc/<pid>/maps.
1. Memory Management Unit
2. Virtual Memory Area, a kernel data structure describing a contiguous section of a process's memory.
This is about virtual memory. And about dynamic linker/loader. Most mprotect(2) syscalls you see in the trace are probably related to bringing in library dependencies, though malloc(3) implementation might call it too.
Edit:
To answer your question in comments - the MMU and the code inside the kernel protect one process from the other. Each process has an illusion of a full 32-bit or 64-bit address space. The addresses you operate on are virtual and belong to a given process. Kernel, with the help of the hardware, maps those to physical memory pages. These pages could be shared between processes implicitly as code, or explicitly for interprocess communications.
The kernel looks up the address you pass mprotect in the current process's page table. If it is not in there then it fails. If it is in there the kernel may attempt to mark the page with new access rights. I'm not sure, but it may still be possible that the kernel would return an error here if there were some special reason that the access could not be granted (such as trying to change the permissions of a memory mapped shared file area to writable when the file was actually read only).
Keep in mind that the page table that the processor uses to determine if an area of memory is accessible is not the one that the kernel used to look up that address. The processor's table may have holes in it for things like pages that are swapped out to disk. The tables are related, but not the same.

Resources