System Wide Page Table - c

I have a doubt: if each process has its own separate page table, why is a system-wide page table required? Also, if a page table maps virtual addresses to physical addresses, then I think two processes may map to the same physical address, because all processes have the same virtual address space. Is this true?

About the second part, mapping virtual addresses to the same physical address: for library code and for different instances of an application's code, this is indeed what is done. The code is given read-only access, and the same virtual address is mapped to the same physical address. In this way there is no need to keep multiple copies of the same code in physical memory, all this assuming ASLR is not enabled.
Now concerning the data part: modern OSs like Linux use demand paging, that is, a page is only brought into physical memory when it is accessed (read or written). At that point, the kernel can make sure to assign a unique physical address to that page. I don't know what the purpose of a system-wide page table is, though.

A system-wide page table would be used by the kernel, which in most systems is always mapped into memory. (Typically a 32-bit system will allocate the lower 2-3 GB of virtual address space to the user process and the upper 1-2 GB to the kernel.) Making the kernel mappings common across all processes means that you don't have to worry about making sure the kernel code you're about to run is mapped when you enter a system call from userland.
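To make the split concrete, here is a minimal C sketch of an address-range check under the classic 3 GB/1 GB layout; the PAGE_OFFSET constant is a stand-in for whatever boundary a given kernel actually uses.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical boundary for a 3 GB user / 1 GB kernel split. */
    #define PAGE_OFFSET 0xC0000000UL

    /* Kernel mappings live above PAGE_OFFSET in every process,
       so they are valid no matter which page table is loaded. */
    static bool is_kernel_address(uintptr_t vaddr)
    {
        return vaddr >= PAGE_OFFSET;
    }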

Related

How does page fault handler fill up the page table entry with a physical address if CPU sees virtual addresses?

I was reading about page faulting and from what I read, the MMU consults the page table to translate virtual addresses into physical addresses. It is the responsibility of the OS (via the page fault handler) to fill up these page table entries.
What confuses me is how the page fault handler obtains physical addresses in the first place. In the diagrams and notes I saw, the CPU seems to use virtual addresses, and the MMU transparently translates them to physical addresses. Does the CPU work specially with physical addresses rather than virtual addresses for page fault handling?
If there is an access to some 4K page that is not present in memory, and the page fault handler successfully locates the corresponding 4K page on disk, how does it acquire a free 4K page of physical memory and determine that page's physical address?
Part of the OS's responsibility is to keep track of a list of physical pages. You can look on OSDev to see how this is done - usually by querying BIOS/UEFI-exposed functions, which give you (usually non-contiguous) lists of free memory.
UEFI in particular exposes GetMemoryMap at boot time to get an array of memory descriptors.
Given a maintained list of available physical pages, when the OS handles a page fault it has access to the faulting virtual address, and it can decide what to do. If it needs to allocate a new page, it will consult its list of free pages and choose an available physical page to map into the virtual address space. On x86 this mapping is done by modifying the page tables that the cr3 register points to (and flushing the affected TLB entries).
Once the page is mapped, it can be written to using virtual addresses.
The operating system works with virtual addresses at the fundamental level, but it does have to manage physical addresses as well. In an OS's memory-management subsystem there is a part called the physical memory manager. Basically, at startup it reads a table given to it by the firmware which tells it which memory regions are free. It sets up a free list containing all the free pages in this map. When a page fault occurs, it pulls a page off this list, maps it into the PTE, grabs another page to create a page table if no page table exists for this address (note that it will keep doing that step depending on how many levels haven't been mapped yet), flushes the TLB entry for this address, and then carries on.
Note that most physical memory allocators are much more complex than this, but, fundamentally, that is the algorithm; a sketch of the free-list part follows.
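Here is a minimal C sketch of such a free list, under the simplifying (and hypothetical) assumption that free physical pages are identity-mapped, so the allocator can link them through their own first bytes:

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096

    /* Each free physical page stores a pointer to the next free page.
       This assumes the pages are identity-mapped and writable here. */
    struct page_frame {
        struct page_frame *next;
    };

    static struct page_frame *free_list_head = NULL;

    /* Called at boot for every free region reported by the firmware. */
    void free_list_add_region(uintptr_t base, size_t length)
    {
        for (uintptr_t p = base; p + PAGE_SIZE <= base + length; p += PAGE_SIZE) {
            struct page_frame *frame = (struct page_frame *)p;
            frame->next = free_list_head;
            free_list_head = frame;
        }
    }

    /* Called by the page-fault handler; returns 0 when memory is exhausted. */
    uintptr_t alloc_physical_page(void)
    {
        struct page_frame *frame = free_list_head;
        if (frame == NULL)
            return 0;
        free_list_head = frame->next;
        return (uintptr_t)frame;
    }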

Does the virtual memory area struct only come into the picture when there is a page fault?

Virtual memory is quite a complex topic for me, and I am trying to understand it. Here is my understanding for a 32-bit system where, for example, the RAM is just 2 GB. I have tried reading many links, and I am not confident at the moment. I would like you people to help me clear up my concepts. Please acknowledge my points, and also please answer where you feel I am wrong. I also have a section of confused points. So, here starts the summary.
Every process thinks it is the only one running. It can access the 4 GB of memory of its virtual address space.
When a process accesses a virtual address, it is translated to a physical address via the MMU.
The MMU is a part of the CPU - it is hardware.
When the MMU cannot translate the address to a physical one, it raises a page fault.
On a page fault, the kernel is notified. The kernel checks the VM area structs. If it finds the address there, the data may be on disk; it will do some page-in/page-out and bring this memory into RAM.
Now the MMU will try again and will succeed this time.
In case the kernel cannot find the address, it will raise a signal. For example, an invalid access will raise a SIGSEGV.
Confused points.
Is the page table maintained in the kernel? Does the VM area struct have a page table?
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM. The code will still execute, but at a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM areas every time?
Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained by each individual process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address? Is it segment + 12-bit shift -> page frame number, and then adding the offset (the low 12 bits) -> gives a physical address?
Does it mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically, it reads the VM area structs and prints them. That means this is important, but I am not able to fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate the address, i.e., on a page fault? When I print the VM areas, they display the address range, permissions, the mapped file, and an offset. I am sure this file is on the hard disk and the offset is into that file.
The high-mem concept is that the kernel cannot directly access memory regions greater than roughly 1 GB, so it needs page-table entries to map them indirectly: it will temporarily set up a mapping for the address. Does HIGHMEM come into the picture every time? User space can have its addresses translated directly by the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers will mostly be using kmalloc, which returns a direct (memory + offset) address, so no mapping is really required there. So the question is: in what scenario does the kernel need to access high memory?
Does the processor specifically come with MMU support? Can processors that don't have MMU support not run Linux?
Is the page table maintained in the kernel? Does the VM area struct have a page table?
Yes and not exactly: each process has an mm_struct, which contains a list of vm_area_structs (which represent abstract, processor-independent memory regions, aka mappings) and a field called pgd, which is a pointer to the processor-specific page table (which contains the current state of each page: valid, readable, writable, dirty, ...).
The page table doesn't need to be complete; the OS can generate each part of it from the VMAs.
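For orientation, here is a heavily abridged C sketch of how those two structures relate; the real definitions live in the kernel's <linux/mm_types.h>, and recent kernels keep VMAs in a maple tree rather than a simple linked list:

    typedef unsigned long pgd_t;            /* stand-in for the real type */

    struct vm_area_struct {
        unsigned long vm_start;             /* first address of the region */
        unsigned long vm_end;               /* first address past the region */
        unsigned long vm_flags;             /* read/write/execute, shared, ... */
        struct vm_area_struct *vm_next;     /* next region (older kernels) */
    };

    struct mm_struct {
        struct vm_area_struct *mmap;        /* head of the VMA list */
        pgd_t *pgd;                         /* root of the hardware page table */
    };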
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM. The code will still execute, but at a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM areas every time?
The translation fails, e.g. because the page was marked as invalid, or because a write access was attempted against a read-only page.
Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained by each individual process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address? Is it segment + 12-bit shift -> page frame number, and then adding the offset (the low 12 bits) -> gives a physical address?
Does it mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
There are two kinds of MMUs in common use. One of them only has a TLB (Translation Lookaside Buffer), which is a cache of the page table. When the TLB doesn't have a translation for an attempted access, a TLB miss is generated, the OS does a page table walk, and puts the translation in the TLB.
The other kind of MMU does the page table walk in hardware.
In any case, the OS maintains a page table per process; this maps virtual page numbers to physical frame numbers. This mapping can change at any moment: when a page is paged in, the physical frame it is mapped to depends on the availability of free memory.
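The number/offset arithmetic is simple enough to show in a few lines of C. This sketch assumes 4 KB pages and a hypothetical flat page_table[] array indexed by virtual page number:

    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)      /* 4096 */
    #define PAGE_MASK  (PAGE_SIZE - 1)

    extern uint32_t page_table[];               /* hypothetical: VPN -> PFN */

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;  /* virtual page number */
        uint32_t offset = vaddr & PAGE_MASK;    /* offset within the page */
        uint32_t pfn    = page_table[vpn];      /* physical frame number */
        return (pfn << PAGE_SHIFT) | offset;    /* physical address */
    }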
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically, it reads the VM area structs and prints them. That means this is important, but I am not able to fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate the address, i.e., on a page fault? When I print the VM areas, they display the address range, permissions, the mapped file, and an offset. I am sure this file is on the hard disk and the offset is into that file.
To a first approximation, yes. Beyond that, there are many reasons why the kernel may decide to fiddle with a process's memory, e.g. if there is memory pressure it may decide to page out some rarely used pages from some random process. User space can also manipulate the mappings via mmap(), execve() and other system calls.
The high-mem concept is that the kernel cannot directly access memory regions greater than roughly 1 GB, so it needs page-table entries to map them indirectly: it will temporarily set up a mapping for the address. Does HIGHMEM come into the picture every time? User space can have its addresses translated directly by the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers will mostly be using kmalloc, which returns a direct (memory + offset) address, so no mapping is really required there. So the question is: in what scenario does the kernel need to access high memory?
Totally unrelated to the other questions. In summary, high memory is a hack to be able to access lots of memory on a computer with a limited address space.
Basically, the kernel has a limited address space reserved to it. On x86 a typical user/kernel split is 3 GB/1 GB: processes can run in user space or kernel space, and a process runs in kernel space when a syscall is invoked; to avoid having to switch the page table on every context switch, on x86 the address space is typically split between user space and kernel space. So the kernel can directly access up to ~1 GB of memory. To access more physical memory, there is some indirection involved, which is what high memory is all about.
Does the processor specifically come with MMU support? Can processors that don't have MMU support not run Linux?
Laptop/desktop processors come with an MMU; x86 has supported paging since the 386.
Linux, notably the variant called µClinux, supports processors without MMUs (!MMU). Many embedded systems (ADSL routers, ...) use processors without an MMU. There are some important restrictions, among them:
Some syscalls don't work at all: e.g. fork().
Some syscalls work with restrictions and non-POSIX-conforming behavior: e.g. mmap().
The executable file format is different: e.g bFLT or ELF-FDPIC instead of ELF.
The stack cannot grow, and its size has to be set at link-time.
When a program is loaded, the kernel first sets up the VM areas for that process, is it? These VM areas hold where the program's sections are in memory or on the HDD. Then the entire story of updating the CR3 register, page table walks, and the TLB comes into the picture, right? So, whenever there is a page fault, the kernel will update the page table by looking at the VM areas, is it? But they say the VM areas keep updating; how is this possible? cat /proc/pid_value/maps will keep updating; the map won't be constant from start to end. So the real information is available in the VM area structs, is it? This is the actual record of where each section of the program lies, whether on the HDD or in physical memory (RAM)? And this is filled in during process loading, as the first job? The kernel does the page-in/page-out on a page fault and updates the VM areas, is it? So it should also know the entire program's location on the HDD for page-in/page-out, right? Please correct me here. This is in continuation of my first question in the previous comment.
When the kernel loads a program, it will set up several VMAs (mappings) according to the segments in the executable file (which, for ELF files, you can see with readelf --segments); these will be the text/code segment, data segment, etc. During the lifetime of the program, additional mappings may be created by the dynamic/runtime linkers, by the memory allocator (malloc(), which may also extend the data segment via brk()), or directly by the program via mmap(), shm_open(), etc.
The VMAs contain the necessary information to generate the page table, e.g. they tell whether that memory is backed by a file or by swap (anonymous memory). So, yes, the kernel will update the page table by looking at the VMAs. The kernel will page in memory in response to page faults, and will page out memory in response to memory pressure.
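You can watch a VMA appear from user space. This small Linux-only C program (assuming /proc is mounted) creates an anonymous mapping and then prints its own VMA list:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Ask the kernel for a new anonymous mapping: this adds a VMA. */
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("new mapping at %p\n", p);
        system("cat /proc/self/maps");  /* the new range shows up here */
        munmap(p, 4096);
        return 0;
    }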
Using x86 no PAE as an example:
On x86 with no PAE, a linear address can be split into 3 parts: the top 10 bits point to an entry in the page directory, and the middle 10 bits point to an entry in the page table pointed to by the aforementioned page directory entry. The page table entry may contain a valid physical frame number: the top 20 bits of a physical address. The bottom 12 bits of the virtual address form an offset into the page that goes untranslated into the physical address.
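In C, that 10/10/12 split looks like this (a sketch of the decomposition only, not of the table walk itself):

    #include <stdint.h>

    void split_linear(uint32_t vaddr,
                      uint32_t *pd_index, uint32_t *pt_index, uint32_t *offset)
    {
        *pd_index = (vaddr >> 22) & 0x3FF;  /* top 10 bits: page directory */
        *pt_index = (vaddr >> 12) & 0x3FF;  /* middle 10 bits: page table */
        *offset   = vaddr & 0xFFF;          /* low 12 bits: untranslated */
    }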
Each time the kernel schedules a different process, the CR3 register is written with a pointer to the page directory for the current process. Then, each time a memory access is made, the MMU tries to look for a translation cached in the TLB; if it doesn't find one, it looks for one by doing a page table walk starting from CR3. If it still doesn't find one, a page fault is raised, the CPU switches to Ring 0 (kernel mode), and the kernel tries to find a translation in the VMAs.
Also, I believe this chain of reading from CR3 (page directory -> page table -> page frame number -> memory address) is all done by the MMU. Am I correct?
On x86, yes, the MMU does the page table walk. On other systems (e.g. MIPS), the MMU is little more than the TLB, and on TLB-miss exceptions the kernel does the page table walk in software.
Though this is not going to be the best answer, I would like to share my thoughts on the confused points.
1. Is the page table maintained in the kernel...
Yes, the kernel maintains the page tables. In fact it maintains nested (multi-level) page tables, and the top of the page tables is stored in top_pmd (pmd stands for page middle directory). You can traverse all the page tables using this structure.
2. How can the MMU fail to find the address in physical RAM...
I am not sure I understood the question, but if, because of some problem, an instruction faults or something outside its instruction area is accessed, you generally get an undefined-instruction exception resulting in an abort. If you look at the crash dumps, you can see it in the kernel log.
3. Is the mapping table (virtual to physical) inside the MMU...
Yes. The MMU is software + hardware; the hardware is things like the TLB, and the mapping tables are stored there. For instructions, that is, for the code section, whenever I converted between physical and virtual addresses they always matched, and almost all the time they match for data sections as well.
4. cat /proc/pid_value/maps shows me the current mapping of the VM areas...
This is used more for analyzing the virtual addresses of user-space stacks. As you know, virtually all user-space programs can have 4 GB of virtual address space, so, unlike in the kernel, if I say 0xc0100234 you cannot directly go and point to the instruction. You need this mapping plus the virtual address to locate the instruction from the data you have.
5. The high-mem concept is that the kernel cannot directly access memory...
High-mem corresponds to user-space memory (someone correct me if I am wrong). When the kernel wants to read some data from an address in user space, it will be accessing HIGHMEM.
6. Does the processor specifically come with MMU support? Can processors that don't have MMU support not run Linux?
The MMU, as I mentioned, is hardware + software, so the hardware part would mostly come with the chipset, and the software is generally architecture dependent. You can disable the MMU in the kernel config and build, though I have never tried it. Mostly, these days, all chipsets have one, but I think some small boards disable the MMU. I am not entirely sure, though.
As all these are conceptual questions, I may be lacking some knowledge and be wrong in places. If so, others please correct me.

How is a virtual address translated to its physical address on the backing store?

We have an address translation table to translate a virtual address (VA) of a process to its corresponding physical address in RAM, but if the table does not have an entry for a VA, the result is a page fault, and the kernel goes to the backing store (often a hard drive), fetches the corresponding data, and updates RAM and the address translation table. So my question is: how does the OS come to know the address corresponding to a VA on the backing store? Does it have a separate translation table for that?
A process starts by allocating virtual memory. That will eventually cause a page fault when the program starts actually addressing the virtual memory. The OS knows that the memory access is valid, since the memory was allocated explicitly.
So no harm done; the OS simply maps the VM address to a physical address.
If the page fault is for an address that was not previously requested as a valid VM address, then the processor will discover that there is no page table entry for the address, and will instead raise a fault: an access violation or segfault in your program. Kaboom, program over.
There is no direct correlation, at least not in the way that you suppose.
The operating system divides virtual and physical RAM, as well as swap space (the backing store) and mapped files, into pages, most commonly 4096 bytes.
When your code accesses a certain address, this is always a virtual address within a page that is either valid-in-core, valid-not-accessed, valid-out-of-core, or invalid. The OS may have other properties (such as "has been written to") in its books, but they're irrelevant for us here.
If the page is in-core, then it has a physical address, otherwise it does not. When swapped out and in again, the same identical page could in theory very well land in a different physical region of memory. Similarly, the page after some other page in memory (virtual or physical) could be before that page in the swap file or in a memory-mapped file. There's no guarantee for that.
Thus, there is no such thing as translating a virtual address to a physical address in backing store. There is only a translation from a virtual address to a page which may temporarily have a physical address. In the easiest case, "translating" means dividing by 4096, but of course other schemes are possible.
Further, every time your code accesses a memory location, the virtual address must be translated to a physical one. There exists dedicated logic inside a CPU to do this translation fully automatically (for a very small subset of "hot" pages, often as few as 64), or in a hardware-assisted way, which usually involves a lookup in a more or less complicated hierarchical structure.
This is also a fault, but it's one that you don't see. The only faults that you get to see are the ones when the OS doesn't have a valid page (or can't supply it for some reason), and thus can't assign a physical address to the to-be-translated virtual one.
When your program asks for memory, the OS remembers that certain pages are valid, but they do not exist yet because you have never accessed them.
The first time you access a page, a fault happens and obviously its address is nowhere in the translation tables (how could it be, it doesn't exist!). Thus the OS looks into its books and (assuming the page is valid) it either loads the page from disk or assigns the address of a zero page otherwise.
Not rarely, the OS will cheat: all zero pages are the same write-protected zero page until you actually write to one (at which point a fault occurs and you are secretly redirected to a different physical memory area, one which you can write to, too).
Otherwise, that is if you haven't reserved memory, the OS sends a signal (or an equivalent, Windows calls it "exception") which will terminate your process unless handled.
For a multitude of reasons, the OS may later decide to remove one or several pages from your working set. This normally does not remove them immediately, but marks them as candidates for being swapped (for non-mapped data) or discarded (for mapped data) in case more memory is needed. When you access an address in one of these pages again, it is either re-added to your working set (likely pushing another one out) or reloaded from disk.
In either case, all the OS needs to know is how to translate your virtual address to a page identifier of some sort (e.g. "page frame number"), and whether the page is resident (and at what address).
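One common way to keep those books is to reuse the page table entry itself: when a page is not present, the bits that would otherwise hold the frame number can record where the page lives on the backing store. A minimal sketch of such a (hypothetical) PTE layout in C:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t pte_t;

    #define PTE_PRESENT 0x1u

    static bool pte_is_present(pte_t pte) { return pte & PTE_PRESENT; }

    /* When present, bits 12..31 hold the physical frame number. */
    static uint32_t pte_to_pfn(pte_t pte) { return pte >> 12; }

    /* When not present (but valid), the same bits can hold a swap-slot
       number instead, telling the OS where to find the page on disk. */
    static uint32_t pte_to_swap_slot(pte_t pte) { return pte >> 12; }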
I think the answer to your question is about the interrupt table.
A page fault is a kind of software interrupt, and the operating system must have a handler for that interrupt. The handler code is already in the OS kernel, and that piece of code's address is in the interrupt table. So when a page fault happens, the OS goes to that piece of code to bring the unmapped page into physical memory.
This is OS-specific, but many implementations share logic with memory-mapped file features (so that anonymous pages actually are memory-mapped views of the pagefile, flagged so that the content can be discarded at unmapping instead of flushed).
For Windows, much of this is documented on the CreateFileMapping page.

Mapping of Virtual Address to Physical Address

I have a doubt: if each process has its own separate page table, why is a system-wide page table required? Also, if a page table maps virtual addresses to physical addresses, then I think two processes may map to the same physical address, because all processes have the same virtual address space. Any good link on system-wide page tables would also solve my problem.
Each process has its own independent virtual address space - two processes can have virtpage 1 map to different physpages. Processes can participate in shared memory, in which case they each have some virtpage mapping to the same physpage.
The virtual address space of a process can be used to map virtpages to physpages, to memory mapped files, devices, etc. Virtpages don't have to be wired to RAM. A process could memory-map an entire 1GB file - in which case, its physical memory usage might only be a couple megs, but its virtual address space usage would be 1GB or more. Many processes could do this, in which case the sum of virtual address space usage across all processes might be, say, 40 GB, while the total physical memory usage might be only, say, 100 megs; this is very easy to do on 32-bit systems.
Since lots of processes load the same libraries, the OS typically puts the libs in one set of read-only executable pages, and then loads mappings in the virtpage space for each process to point to that one set of pages, to save on physical memory.
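From user space you can lean on this sharing explicitly. This Linux C sketch maps a library file read-only; clean file-backed pages like these live in the page cache, so every process mapping the same file shares one set of physical pages (the path below is illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/usr/lib/libexample.so", O_RDONLY);  /* hypothetical path */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Read-only, private: unmodified pages stay shared across processes. */
        void *lib = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (lib == MAP_FAILED) { perror("mmap"); return 1; }

        printf("library mapped at %p\n", lib);
        munmap(lib, st.st_size);
        close(fd);
        return 0;
    }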
Processes may have virtpage mappings that don't point to anything, for instance if part of the process's memory got written to the page file. The process will try to access that page, the CPU will trigger a page fault, and the OS will handle it by suspending the process, reading the pages back into RAM from the page file, and then resuming the process.
There are typically 3 types of page faults (see the sketch after this list). The first type is when the CPU does not have the virtual-physical mapping in the TLB: the processor invokes the page-fault software interrupt in the OS, the OS puts the mapping into the processor for that process, then the processor re-runs the offending instruction. These happen thousands of times a second.
The second type is when the OS has no mapping in place because, say, the memory for the process has been swapped to disk, as explained above. These happen infrequently on a lightly loaded machine, but happen more often as memory pressure increases, up to 100s to 1000s of times per second, maybe even more.
The third type is when the OS has no mapping because the mapping does not exist: the process is trying to access memory that does not belong to it. This generates a segfault, and typically the process is killed. These aren't supposed to happen often; they depend solely on how well written the software on the machine is and have nothing to do with scheduling or machine load.
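A compact C sketch of that three-way dispatch, with all types and helpers as hypothetical stand-ins:

    #include <stdint.h>

    enum fault_action { REFILL_TLB, SWAP_IN, SEGFAULT };

    struct vma { uintptr_t start, end; int swapped_out; };

    /* Hypothetical lookup in the faulting process's mapping list. */
    extern struct vma *find_vma(uintptr_t addr);

    enum fault_action classify_fault(uintptr_t fault_addr)
    {
        struct vma *v = find_vma(fault_addr);
        if (v == NULL)
            return SEGFAULT;    /* type 3: no mapping exists at all */
        if (v->swapped_out)
            return SWAP_IN;     /* type 2: mapping exists, page is on disk */
        return REFILL_TLB;      /* type 1: just reload the TLB entry */
    }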
Even if you already knew that, I figured I'd throw it in for the community.

Two Processes in Virtual Memory

I have read about virtual memory, and I have a doubt. Suppose there are two processes, P and Q. Both will have the same virtual memory addressing, and both have their own page tables. There will also be a system-wide page table. How, then, are these two processes distinguished in RAM if we use a system-wide page table?
Each process has a virtual address space which has a mapping to physical memory, but which can also be virtualized to, typically, disk.
It is because the virtual address space is split into user and kernel space. After the boot process starts, the paging unit is enabled, execution jumps into kernel-space virtual addresses, and finally control is passed to user space. The system-wide page table is for the kernel, and each process has its own page table.
When execution enters the kernel it uses the kernel page table, and when it switches back to user mode it uses the user process's page table.
Each process has its own page table, and that is what differentiates the two processes.
If a single system-wide page table were in use all the time, that could not work: there would be only one virtual mapping, and the two processes would step on each other's memory.
*nix systems, however, keep one page table per process (simply speaking) and switch between them when the kernel schedules a process to run. That way each process can have the same virtual addresses mapped to different physical addresses, and there's no problem.
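On 32-bit x86, that switch boils down to one register write. A kernel-context C sketch (the task struct is a hypothetical stand-in; real kernels do this in their context-switch path):

    struct task {
        unsigned long pgd_phys;  /* physical address of this task's page directory */
    };

    static void switch_address_space(struct task *next)
    {
        /* Loading CR3 installs the next process's page table and
           flushes the non-global TLB entries. Ring 0 only. */
        asm volatile("mov %0, %%cr3" :: "r"(next->pgd_phys) : "memory");
    }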

Resources