How does mprotect() work? - c

I was stracing some of the common commands in the linux kernel, and saw mprotect() was used a lot many times. I'm just wondering, what is the deciding factor that mprotect() uses to find out that the memory address it is setting a protection value for, is in its own address space?

On architectures with an MMU1, the address that mprotect() takes as an argument is a virtual address. Each process has its own independent virtual address space, so there's only two possibilities:
The requested address is within the process's own address range; or
The requested address is within the kernel's address range (which is mapped into every process).
mprotect() works internally by altering the flags attached to a VMA2. The first thing it must do is look up the VMA corresponding to the address that was passed - if the passed address was within the kernel's address range, then there is no VMA, and so this search will fail. This is exactly the same thing happens if you try to change the protections on an area of the address space that is not mapped.
You can see a representation of the VMAs in a process's address space by examining /proc/<pid>/smaps or /proc/<pid>/maps.
1. Memory Management Unit
2. Virtual Memory Area, a kernel data structure describing a contiguous section of a process's memory.

This is about virtual memory. And about dynamic linker/loader. Most mprotect(2) syscalls you see in the trace are probably related to bringing in library dependencies, though malloc(3) implementation might call it too.
Edit:
To answer your question in comments - the MMU and the code inside the kernel protect one process from the other. Each process has an illusion of a full 32-bit or 64-bit address space. The addresses you operate on are virtual and belong to a given process. Kernel, with the help of the hardware, maps those to physical memory pages. These pages could be shared between processes implicitly as code, or explicitly for interprocess communications.

The kernel looks up the address you pass mprotect in the current process's page table. If it is not in there then it fails. If it is in there the kernel may attempt to mark the page with new access rights. I'm not sure, but it may still be possible that the kernel would return an error here if there were some special reason that the access could not be granted (such as trying to change the permissions of a memory mapped shared file area to writable when the file was actually read only).
Keep in mind that the page table that the processor uses to determine if an area of memory is accessible is not the one that the kernel used to look up that address. The processor's table may have holes in it for things like pages that are swapped out to disk. The tables are related, but not the same.

Related

How is the anatomy of memory mapped Kernel space

I try to understand the mechanism in Linux of mapping kernel mode space into user mode space using mmap.
First I have a loadable kernel module (LKM) which provides a character device with mmap-functionality. Then a user space application open the device and calls mmap the LKM allocate memory space on the heap of the LKM inside the kernel mode space (virtual high address). On user space side the data pointer points to a virtual low address.
The following picture shows how I imagine the anatomy of memory is. Is this right?
Please let me know if question is not clear, I will try to add more details.
Edit: The picture was edited regarding to Gil Hamilton. The black arrow now points to a physical address.
The drawing is missing out a few important underlying assumptions.
The kernel does not need to mmap() to access user space memory. If a user process has the memory, it's already mapped in the address space by definition. In that sense, the memory is already shared between user and kernel.
mmap() creates a new region in user's virtual address space, so that the address region can be populated by physical memory if later accessed. The actual allocation of memory and modifying the page table entry is done by the kernel.
mmap() only makes sense for managing user-half of the virtual address space. Kernel-half of the address space is managed completely differently.
Also, the kernel-half is shared by all processes in the system. Each process has its dedicated virtual address space, but the page tables are programmed in such a way that the page table entries for the kernel-half are set exactly the same for all processes.
Again, the kernel does not mmap() in order to access user space memory. mmap() is rather a service provided by kernel to user to modify the current mapping in user's virtual address space.
BTW, the kernel actually has a few ways to access user memory if it wants to.
First of all, the kernel has a dedicated region of kernel address space (as part of its kernel space) which maps the entirety of the physical memory present in consecutive fashion. (This is true in all 64-bit system. In 32-bit system the kernel has to 'remap' on-the-fly to achieve this.)
Second, if the kernel is entered via a system call or exception, not by hardware interrupt, you have valid process context, so the kernel can directly "dereference" user space pointer to get the correct value.
Third, if kernel wants to deference a user space pointer of a process while executing in a borrowed context such as in an interrupt handler, kernel can trace process's virtual address by traversing the vm_area_struct tree for permission and walking the page table to find out actual physical page frame.
You can check the memory regions by iterating through vma's "struct vm_area_struct" through current.
If you walk pagetables and derive mapped physical addresses for virtual addresses which is not related to user space then memory layout will be more clear.
Apart from this minor correction in this figure,
BSS is not a segment but section which is embed to Data segment, refer ELF specification for more details, linker script

Does Virtual Memory area struct only comes into picture when there is a page fault?

Virtual Memory is a quite complex topic for me. I am trying to understand it. Here is my understanding for a 32-bit system. Example RAM is just 2GB. I have tried reading many links, and I am not confident at the moment. I would like you people to help me in clearing up my concepts. Please acknowledge my points, and also please answer for what you feel is wrong. I have also a confused section in my points. So, here starts the summary.
Every process thinks it is only running. It can access the 4GB of memory - virtual address space.
When a process access a virtual address it is translated to physical address via MMU.
This MMU is a part of a CPU - a hardware.
When the MMU cannot translate the address to a physical one, it raises a page fault.
On page fault, the kernel is notified. The kernel check the VM area struct. If it can find it - may be on disk. It will do some page-in /page-out. And get this memory on the RAM.
Now MMU will again try and will succeed this time.
In case the kernel cannot find the address, it will raise a signal. For example, invalid access will raise a SIGSEGV.
Confused points.
Does Page table is maintained in Kernel? This VM area struct has a page table ?
How MMU cannot find the address in physical RAM. Let's say it translates to some wrong address in RAM. Still the code will execute, but it will be a bad address. How MMU ensures that it is reading a right data? Does it consult Kernel VM area everytime?
Is the Mapping table - virtual to physical is inside a MMU. I have read it that is maintained by an individual process. If it is inside a process, why I can't see it.
Or if it is MMU, how MMU generates the address - is it that Segment + 12-bit shift -> Page frame number, and then the addition of offset (bits -1 to 10) -> gives a physical address.
Does it mean that for a 32-bit architecture, with this calculation in my mind. I can determine the physical address from a virtual address.
cat /proc/pid_value/maps. This shows me the current mapping of the vmarea. Basically, it reads the Vmarea struct and prints it. That means that this is important. I am not able to fit this piece in the complete picture. When the program is executed does the vmarea struct is generated. Is VMAREA comes only into the picture when the MMU cannnot translate the address i.e. Page fault? When I print the vmarea it displays the address range , permission and mapped to file descriptor, and offset. I am sure this file descriptor is the one in the hard-disk and the offset is for that file.
The high-mem concept is that kernel cannot directly access the Memory region greater than 1 GB(approx). Thus, it needs a page table to indirectly map it. Thus, it will temporarily load some page table to map the address. Does HIGH MEM will come into the picture everytime. Because Userspace can directly translate the address via MMU. On what scenario, does kernel really want to access the High MEM. I believe the kernel drivers will mostly be using kmalloc. This is a direct memory + offset address. In this case no mapping is really required. So, the question is on what scenario a kernel needs to access the High Mem.
Does the processor specifically comes with the MMU support. Those who doesn't have MMU support cannot run LInux?
Does Page table is maintained in Kernel? This VM area struct has a page table ?
Yes. Not exactly: each process has a mm_struct, which contains a list of vm_area_struct's (which represent abstract, processor-independent memory regions, aka mappings), and a field called pgd, which is a pointer to the processor-specific page table (which contains the current state of each page: valid, readable, writable, dirty, ...).
The page table doesn't need to be complete, the OS can generate each part of it from the VMAs.
How MMU cannot find the address in physical RAM. Let's say it translates to some wrong address in RAM. Still the code will execute, but it will be a bad address. How MMU ensures that it is reading a right data? Does it consult Kernel VM area everytime?
The translation fails, e.g. because the page was marked as invalid, or a write access was attempted against a readonly page.
Is the Mapping table - virtual to physical is inside a MMU. I have read it that is maintained by an individual process. If it is inside a process, why I can't see it.
Or if it is MMU, how MMU generates the address - is it that Segment + 12-bit shift -> Page frame number, and then the addition of offset (bits -1 to 10) -> gives a physical address.
Does it mean that for a 32-bit architecture, with this calculation in my mind. I can determine the physical address from a virtual address.
There are two kinds of MMUs in common use. One of them only has a TLB (Translation Lookaside Buffer), which is a cache of the page table. When the TLB doesn't have a translation for an attempted access, a TLB miss is generated, the OS does a page table walk, and puts the translation in the TLB.
The other kind of MMU does the page table walk in hardware.
In any case, the OS maintains a page table per process, this maps Virtual Page Numbers to Physical Frame Numbers. This mapping can change at any moment, when a page is paged-in, the physical frame it is mapped to depends on the availability of free memory.
cat /proc/pid_value/maps. This shows me the current mapping of the vmarea. Basically, it reads the Vmarea struct and prints it. That means that this is important. I am not able to fit this piece in the complete picture. When the program is executed does the vmarea struct is generated. Is VMAREA comes only into the picture when the MMU cannnot translate the address i.e. Page fault? When I print the vmarea it displays the address range , permission and mapped to file descriptor, and offset. I am sure this file descriptor is the one in the hard-disk and the offset is for that file.
To a first approximation, yes. Beyond that, there are many reasons why the kernel may decide to fiddle with a process' memory, e.g: if there is memory pressure it may decide to page out some rarely used pages from some random process. User space can also manipulate the mappings via mmap(), execve() and other system calls.
The high-mem concept is that kernel cannot directly access the Memory region greater than 1 GB(approx). Thus, it needs a page table to indirectly map it. Thus, it will temporarily load some page table to map the address. Does HIGH MEM will come into the picture everytime. Because Userspace can directly translate the address via MMU. On what scenario, does kernel really want to access the High MEM. I believe the kernel drivers will mostly be using kmalloc. This is a direct memory + offset address. In this case no mapping is really required. So, the question is on what scenario a kernel needs to access the High Mem.
Totally unrelated to the other questions. In summary, high memory is a hack to be able to access lots of memory in a limited address space computer.
Basically, the kernel has a limited address space reserved to it (on x86, a typical user/kernel split is 3Gb/1Gb [processes can run in user space or kernel space. A process runs in kernel space when a syscall is invoked. To avoid having to switch the page table on every context-switch, on x86 typically the address space is split between user-space and kernel-space]). So the kernel can directly access up to ~1Gb of memory. To access more physical memory, there is some indirection involved, which is what high memory is all about.
Does the processor specifically comes with the MMU support. Those who doesn't have MMU support cannot run Linux?
Laptop/desktop processors come with an MMU. x86 supports paging since the 386.
Linux, specially the variant called µCLinux, supports processors without MMUs (!MMU). Many embedded systems (ADSL routers, ...) use processors without an MMU. There are some important restrictions, among them:
Some syscalls don't work at all: e.g fork().
Some syscalls work with restrictions and non-POSIX conforming behavior: e.g mmap()
The executable file format is different: e.g bFLT or ELF-FDPIC instead of ELF.
The stack cannot grow, and its size has to be set at link-time.
When a program is loaded first the kernel will setup a kernel VM-Area for that process is it? This Kernel VM Area actually holds where the program sections are there in the memory/HDD. Then the entire story of updating CR3 register, and page walkthrough or TLB comes into the picture right? So, whenever there is a pagefault - Kernel will update the page table by looking at Kernel virtual memory area is it? But they say Kernel VM area keeps updating. How this is possible, since cat /proc/pid_value/map will keep updating.The map won't be constant from start to end. SO, the real information is available in the Kernel VM area struct is it? This is the acutal information where the section of program lies, it could be HDD or physical memory -- RAM? So, this is filled during process loading is it, the first job? Kernel does the page in page out on page fault, and will update the Kernel VM area is it? So, it should also know the entire program location on the HDD for page-in / page out right? Please correct me here. This is in continuation to my first question of the previous comment.
When the kernel loads a program, it will setup several VMAs (mappings), according to the segments in the executable file (which on ELF files you can see with readelf --segments), which will be text/code segment, data segment, etc... During the lifetime of the program, additional mappings may be created by the dynamic/runtime linkers, by the memory allocator (malloc(), which may also extend the data segment via brk()), or directly by the program via mmap(),shm_open(), etc..
The VMAs contain the necessary information to generate the page table, e.g. they tell whether that memory is backed by a file or by swap (anonymous memory). So, yes, the kernel will update the page table by looking at the VMAs. The kernel will page in memory in response to page faults, and will page out memory in response to memory pressure.
Using x86 no PAE as an example:
On x86 with no PAE, a linear address can be split into 3 parts: the top 10 bits point to an entry in the page directory, the middle 10 bits point to an entry in the page table pointed to by the aforementioned page directory entry. The page table entry may contain a valid physical frame number: the top 22 bits of a physical address. The bottom 12 bits of the virtual address is an offset into the page that goes untranslated into the physical address.
Each time the kernel schedules a different process, the CR3 register is written to with a pointer to the page directory for the current process. Then, each time a memory access is made, the MMU tries to look for a translation cached in the TLB, if it doesn't find one, it looks for one doing a page table walk starting from CR3. If it still doesn't find one, a GPF fault is raised, the CPU switches to Ring 0 (kernel mode), and the kernel tries to find one in the VMAs.
Also, I believe this reading from CR, page directory->page-table->Page frame number-memory address this all done by MMU. Am I correct?
On x86, yes, the MMU does the page table walk. On other systems (e.g: MIPS), the MMU is little more than the TLB, and on TLB miss exceptions the kernel does the page table walk by software.
Though this is not going to be the best answer, iw ould like to share my thoughts on confused points.
1. Does Page table is maintained...
Yes. kernel maintains the page tables. In fact it maintains nested page tables. And top of the page tables is stored in top_pmd. pmd i suppose it is page mapping directory. You can traverse through all the page tables using this structure.
2. How MMU cannot find the address in physical RAM.....
I am not sure i understood the question. But in case because of some problem, the instruction is faulted or out of its instruction area is being accessed, you generally get undefined instruction exception resulting in undefined exception abort. If you look at the crash dumps, you can see it in the kernel log.
3. Is the Mapping table - virtual to physical is inside a MMU...
Yes. MMU is SW+HW. HW is like TLB and all. The mapping tables are stored here. For instructions, that is for code section i always converted the physical-virtual address and always they matched. And almost all the times it matches for Data sections as well.
4. cat /proc/pid_value/maps. This shows me the current mapping of the vmarea....
This is more used for analyzing the virtual addresses of user space stacks. As you know virtually all the user space programs can have 4 GB of virtual address. So unlike kernel if i say 0xc0100234. You cannot directly go and point to the istruction. So you need this mapping and the virtual address to point the instruction based on the data you have.
5. The high-mem concept is that kernel cannot directly access the Memory...
High-mem corresponds to user space memory(some one correct me if i am wrong). When kernel wants to read some data from a address at user space you will be accessing the HIGHMEM.
6. Does the processor specifically comes with the MMU support. Those who doesn't have MMU support cannot run LInux?
MMU as i mentioned is HW + SW. So mostly it would be coming with the chipset. and the SW would be generally architecture dependent. You can disable MMU from kernel config and build. I have never tried it though. Mostly these days allthe chipsets have it. But small boards i think they disable MMU. I am not entirely sure though.
As all these are conceptual questions, i may be lacking some knowledge and be wrong at places. If so others please correct me.

Is the Kernel Virtual Memory struct first formed when the process is about to execute?

I have been bothering with similar questions indirectly on my other posts. Now, my understanding is better. Thus, my questions are better. So, I want to summarize the facts here. This example is based on X86-32-bit system.
Please say yes/no to my points. If no, then please explain.
MMU will look into the CR3 register to find the Process - Page Directory base address.
The CR3 register is set by the kernel.
Now MMU after reading the Page directory base address, will offset to the Page Table index (calculated from VA), from here it will read the Page frame number, now it will find the offset on the page frame number based on the VA given. It gets the physical memory address. All this is done in MMU right? Don't know when MMU is disabled, who will do all this circus? If software then it will be slow right?
I know then page fault occurs when the MMU cannot resolve the address. The kernel is informed. The kernel will update the page table based on the reading from kernel virtual memory area struct. Am I correct?
Keeping in mind, the point 4. Does it mean that before executing any process. Perhaps during loading process. Does Kernel first fills the kernel virtual memory area struct. For example, where the section of memory will be BSS, Code, DS,etc. It could be that some sections are in RAM, and some are in Storage device. When the sections of the program is moved from storage to main memory, I am assuming that kernel would be updating the Kernel virtual memory area struct. Am I correct here? So, it is the kernel who keeps a close track on the program location - whether in storage device or RAM - inode number of device and file offset.
Sequence wise -> During Process loading ( may be a loader program)-> Kernel will populate the data in the kernel virtual memory area struct. It will also set the CR3 register. Now Process starts executing, it will initially get some frequent page faults.Now the VM area struct will be updated (if required) and then the page table. Now, MMU will succeed in translating the address. So, when I say process accessing a memory, it is the MMU which is accessing the memory on behalf of the process. This is all about user-space. Kernel space is entirely different. The kernel space doesn't need the MMU, it can directly map to the physical address - low mem. For high mem ( to access user space from kernel space), it will do the temporary page table updation - internally. This is a separate page table for kernel, a temporary one. The kernel space doesn't need MMU. Am I correct?
Don't know when MMU is disabled, who will do all this circus?
Nobody. All this circus is intended to do two things: translate the virtual address you gave it into a real address, and if it can't do that then to abort the instruction entirely and start executing a routine addressed from an architecturally pre-defined address, see "page fault" there for the basic one.
When the MMU is shut off, no translation is done and the address you gave it is fed directly down the CPU's address-processing pipe just as any address the MMU might have translated it to would have been.
So, when I say process accessing a memory, it is the MMU which is accessing the memory on behalf of the process.
You're on the right track here, the MMU is mediating the access, but it isn't doing the access. It's doing only what you described before, translating it. What's generally called the Load/Store unit, gets it next, and it's the one that handles talking to whatever holds the closest good copy of the data at that address, "does the access".
The kernel space doesn't need the MMU, it can directly map to the physical address
That depends on how you define "need". It can certainly shut it off, but it almost never does. First, it has to talk to user space, and the MMU has to be running to translate what user space has to addresses the Load-Store unit can use. Second, the flexibility and protection provided by the MMU are very valuable, they're not discarded without a really compelling reason. I know at least one OS will (or would, it's been a while) run some bulk copies MMU-off, but that's about it.

C accessing memory location

I mean the physical memory, the RAM.
In C you can access any memory address, so how does the operating system then prevent your program from changing memory address which is not in your program's memory space?
Does it set specific memory adresses as begin and end for each program, if so how does it know how much is needed.
Your operating system kernel works closely with memory management (MMU) hardware, when the hardware and OS both support this, to make it impossible to access memory you have been disallowed access to.
Generally speaking, this also means the addresses you access are not physical addresses but rather are virtual addresses, and hardware performs the appropriate translation in order to perform the access.
This is what is called a memory protection. It may be implemented using different methods. I'd recommend you start with a Wikipedia article on this subject — http://en.wikipedia.org/wiki/Memory_protection
Actually, your program is allocated virtual memory, and that's what you work with. The OS gives you a part of the RAM, you can't access other processes' memory (unless it's shared memory, look it up).
It depends on the architecture, on some it's not even possible to prevent a program from crashing the system, but generally the platform provides some means to protect memory and separate address space of different processes.
This has to do with a thing called 'paging', which is provided by the CPU itself. In old operating systems, you had 'real mode', where you could directly access memory addresses. In contrast, paging gives you 'virtual memory', so that you are not accessing the raw memory itself, but rather, what appears to your program to be the entire memory map.
The operating system does "memory management" often coupled with TLB's (Translation Lookaside Buffers) and Virtual Memory, which translate any address to pages, which the operation system can tag readable or executable in the current processes context.
The minimum requirement for a processors MMU or memory management unit is in current context restrict the accessable memory to a range which can be only set in processors registers in supervisor mode (as opposed to user mode).
The logical address is generated by the CPU which is mapped to the physical address by the memory mapping unit. Unlike the physical address space the logical address is not restricted by memory size and you just get to work with the logical address space. The address binding is done by MMU. So you never deal with the physical address directly.
Most computers (and all PCs since the 386) have something called the Memory Management Unit (or MMU). It's job is to translate local addresses used by a program into the physical addresses needed to fetch real bytes from real memory. It's the operating system's job to program the MMU.
As a result of this, programs can be loaded into any region of memory and appear, from that program's point of view while executing, to be be any any other address. It's common to find that the code for all programs appear (locally) to be at the same address and their data always appears (locally) to be at the same address even though physically they will be in different locations. With each memory access, the MMU transparently translates from the local address space to the physical one.
If a program trys to access a memory address that has not been mapped into its local address space, the hardware generates an exception and typically gets flagged as a "segmentation violation", followed by the forcible termination of the program. This protects from accessing the memory of other processes.
But that doesn't have to be the case! On systems with "virtual memory" and current resource demands on RAM that exceed the amount of physical memory, some pages (just blocks of memory of a common size, often on the order of 4-8kB) can be written out to disk and given as RAM to a program trying to allocate and use new memory. Later on, when that page is needed by whatever program owns it, the memory access causes an exception and the OS swaps out some other memory page and re-loads the needed one from disk. The program that "page-faulted" gets delayed while this happens but otherwise notices nothing.
There are lots of other tricks the MMU/OS can do as well, like sharing memory between processes, making a disk file appear to be direct-memory-accessible, setting some pages as "NX" so they can't be treated as executable code, using arbitrary sections of the logical memory space regardless of how much and at what address the physical ram uses, and more.

Can malloc return same address in two different processes?

Suppose I have two process a and b on Linux. and in both process I use malloc() to allocate a memory,
Is there any chances that malloc() returns the same starting address in two processes?
If no, then who is going to take care of this.
If yes, then both process can access the same data at this address.
Is there any chances that malloc() return same starting address in two process.
Yes, but this is not a problem.
What you're not understanding is that operating systems firstly handle your physical space for you - programs etc only see virtual addresses. There is only one virtual address space, however, the operating system (let's stick with 32-bit for now) divides that up. On Windows, the top half (0xA0000000+) belongs to the kernel and the lower half to user mode processes. This is referred to as the 2GB/2GB split. On Linux, the divide is 3GB/1GB - see this article:
Kernel memory is defined to start at PAGE_OFFSET,which in x86 is 0XC0000000, or 3 gigabytes. (This is where the 3gig/1gig split is defined.) Every virtual address above PAGE_OFFSET is the kernel, any address below PAGE_OFFSET is a user address.
Now, when a process switch (as opposed to a context switch) occurs, all of the pages belonging to the current process are unmapped from virtual memory (not necessarily paging them) and all of the pages belonging to the to-be-run process are copied in (disclaimer: this might not exactly be true; one could mark pages dirty etc and copy on access instead, theoretically).
The reason for the split is that, for performance reasons, the upper half of the virtual memory space can remained mapped to the operating system kernel.
So, although malloc might return the same value in two given processes, that doesn't matter because:
physically, they're not the same address.
the processes don't share virtual memory anywhere.
For 64-bit systems, since we're currently only using 48 of those bits there is a gulf between the bottom of user mode and kernel mode which is not addressable (yet).
Yes, malloc() can return the same pointer value in separate processes, if the processes run in separate address spaces, which is achieved via virtual memory. But they won't access the same physical memory location in that case and the data at the address need not be the same, obviously.
Process is a collection of threads plus an address-space. This address-space is referred as virtual because every byte of it is not necessarily backed by physical memory. Segments of a virtual address-space will be eventually backed by physical memory if the application in the process ends up by using effectively this memory.
So, malloc() may return an identical address for two process, but it is no problem since these malloced memories will be backed by different segments of physical memory.
Moreover malloc() implementation is moslty not reentrant, therefore calling malloc() in differents threads sharing the same address-space hopefully won't result in returning the same virtual address.

Resources