What's the mechanism behind the P2V and V2P macros in xv6 (C)?

I know there is a mapping mechanism by which a virtual address is translated into a physical one. A linear address contains three parts:
Page Directory index
Page Table index
Offset
Now, when I take a look at the xv6 source code, in memlayout.h:
#define V2P(a) (((uint) (a)) - KERNBASE)
#define P2V(a) (((void *) (a)) + KERNBASE)
#define V2P_WO(x) ((x) - KERNBASE) // same as V2P, but without casts
#define P2V_WO(x) ((x) + KERNBASE) // same as P2V, but without casts
How can the V2P or P2V work correctly without doing the process of the address translation?

The V2P and P2V macros don't do more than you think they do: they just subtract or add the KERNBASE constant, which is defined as 2 GB.
It seems you understand the MMU's hardware mapping mechanism correctly.
The mapping rules are stored in per-process page tables; those tables form the process's virtual address space.
Specifically, xv6 builds each process's virtual address space (via the appropriate mappings) as shown in the virtual address space layout diagram.
As the diagram above shows, xv6 builds the process's virtual address space so that virtual addresses 2GB to 4GB map to physical addresses 0 to PHYSTOP, respectively. As explained in the xv6 official commentary:
Xv6 includes all mappings needed for the kernel to run in every process’s page table;
these mappings all appear above KERNBASE. It maps virtual addresses KERNBASE:KERNBASE+PHYSTOP
to 0:PHYSTOP.
The motivation for this decision is also specified:
One reason for this mapping is so that the
kernel can use its own instructions and data. Another reason is that the kernel sometimes
needs to be able to write a given page of physical memory, for example when
creating page table pages; having every physical page appear at a predictable virtual
address makes this convenient.
In other words, because the kernel sets up every page table to map virtual addresses from 2GB upward onto physical addresses 0 to PHYSTOP, we can easily find the physical address of any virtual address above 2GB.
For example, the physical address of the virtual address 0x80001000 is easily found: it is 0x00001000. We just subtract 2GB, because that's how the mapping was built.
This "workaround" lets the kernel do the conversion cheaply, for example when building and manipulating page tables (where a physical address needs to be calculated and written to memory).
Of course, the above "workaround" is not free: it wastes valuable virtual address space (2 GB of it), because now every physical address already has at least one virtual address, even if we never use it. That leaves the actual process only the remaining 2 GB of virtual address space (4 GB in total, because addresses are 32 bits). This is also explained in the xv6 official commentary:
A defect of this arrangement is that xv6 cannot make
use of more than 2 GB of physical memory
I recommend reading more about this matter in the xv6 commentary, under the "Process address space" header: XV6 Commentary


where does address of variables stored in a memory?

Whenever we need to find the address of a variable in C, we use the syntax below, and it prints an address. What I am trying to understand is: is the returned address an actual physical memory location, or is the compiler producing some arbitrary number? Either way, where does that number come from, and where does it have to be stored in memory? And does the address of a memory location itself take up space in memory?
int a = 10;
printf("ADDRESS: %p", (void *)&a);  /* %p is the correct conversion for pointers */
ADDRESS: 2234xxxxxxxx
This location is from the virtual address space, which is allocated to your program. In other words, this is from the virtual memory, which your OS maps to a physical memory, as and when needed.
It depends on what type of system you've got.
Low-end systems such as microcontroller applications often support only physical addresses.
Mid-range CPUs often come with an MMU (memory management unit), which allows so-called virtual memory to be placed on top of the physical memory. That means a certain piece of code can work with addresses 0 to x, though in reality those virtual addresses are just aliases for physical ones.
High-end systems like PCs typically allow only virtual memory access and deny applications direct access to physical memory. They often also use address space layout randomization (ASLR) to produce random address layouts for certain kinds of memory, in order to prevent hacks that exploit hard-coded addresses.
In either case, the actual address itself does not take up space in memory.
Higher abstraction layer concepts such as file systems may however store addresses in look-up tables etc and then they will take up memory.
… is the returned address an actual physical memory location, or is the compiler producing some arbitrary number?
In general-purpose operating systems, the addresses in your C program are virtual memory addresses.1
Either way, where does that number come from, and where does it have to be stored in memory?
The software that loads your program into memory makes the final decisions about what addresses are used2, and it may inform your program about those addresses in various ways, including:
It may put the start addresses of certain parts of the program in designated processor registers. For example, the start address of the read-only data of your program might be put in R17, and then your program would use R17 as a base address for accessing that data.
It may “fix up” addresses built into your program’s instructions and data. The program’s executable file may contain information about places in your program’s instructions or data that need to be updated when the virtual addresses are decided. After the instructions and data are loaded into memory, the loader will use the information in the file to find those places and update them.
With position-independent code, the program counter itself (a register in the processor that contains the address of the instruction the processor is currently executing or about to execute) provides address information.
So, when your program wants to evaluate &x, it may take the offset of x from the start of the section it is in (that offset is built into the program by the compiler and possibly updated by the linker) and add it to the base address of that section. The resulting sum is the address of x.
Does the address of a memory location itself take up space in memory?
The C standard does not require the program to use any memory for the address of x, &x. The result of &x is a value, like the result of 3*x. The only thing the compiler has to do with a value is ensure it gets used for whatever further expression it is used in. It is not required to store it in memory. However, if the program is dealing with many values in a piece of code, so there are not enough processor registers to hold them all, the compiler may choose to store values in memory temporarily.
Footnotes
1 Virtual memory is a conceptual or “imaginary” address space. Your program can execute with virtual addresses because the hardware automatically translates virtual addresses to physical addresses while it is executing the program. The operating system creates a map that tells the hardware how to translate virtual addresses to physical addresses. (The map may also tell the hardware certain virtual memory is not actually in physical memory at the moment. In this case, the hardware interrupts the program and starts an operating system routine which deals with the issue. That routine arranges for the needed data to be loaded into memory and then updates the virtual memory map to indicate that.)
2 There is usually a general scheme for how parts of the program are laid out in memory, such as starting the instructions in one area and setting up space for stack in another area. In modern systems, some randomness is intentionally added to the addresses to foil malicious people trying to take advantage of bugs in programs.

In ARMv7, is the address used in TTBR0 and TTBR1 physical or virtual

I've been looking in the ARM Architecture Reference Manual for v7-A and v7-R in Section B3 and I can't figure out if the address used in the TTBR0 and TTBR1 registers is supposed to be a virtual or physical address.
Physical would make the most sense, but I'd like to know definitively.
So, is this address supposed to be physical or virtual?
Is it required to keep the page table location mapped as an identity address (PA == VA)?
Imagine it were a virtual address...
The CPU issues a transaction to a virtual address. In order to translate it, the MMU needs to do a table walk. For that it needs to know what bit of RAM to address on the bus, so it looks in the base register. Great, now it has the virtual base address, it just needs to translate that to a physical address to know what bit of RAM to address on the bus, so it needs to do a table walk. For that it needs... etc. etc.
In short, yes, they're definitely physical addresses. The fact that TTBRn are 64-bit on LPAE implementations is also a bit of a clue.*
Once the page tables are set up and the MMU is on, it's not required to keep them mapped at all, let alone in any particular relationship - if the data's physically there in RAM, the MMU is quite happy. The CPU only needs to map that RAM into its address space if it's updating the tables - the rest of the time they'd just be a waste of address space.
* ...and this is of course a complete lie when the Virtualisation Extensions are involved ;) In that case, they're intermediate physical addresses, and entirely subject to the whims of stage 2 translation. For which the above applies. Fun.
As per the Linux kernel code, the physical address of the pgd is written to TTBR:
http://lxr.free-electrons.com/source/arch/arm/include/asm/proc-fns.h#L116
#define cpu_switch_mm(pgd,mm) cpu_do_switch_mm(virt_to_phys(pgd),mm)
Physical. In this regard it is unchanged from the ARMv5.

Using mremap() to merge two identical pages into one physical page

I have C code where I know that the content of the page pointed to by void *p1 is the same as the content of the page pointed to by void *p2. p1 and p2 were dynamically allocated. My question is: can I use mremap() to make these two virtual pages point to the same physical page, instead of keeping two identical physical pages?
Edit: I am trying to change the virtual-to-physical mapping in the page table of this process so that p1 and p2 point to the same physical address. I do not want to make p1 and p2 point to the same thing virtually.
If you are trying to map multiple virtual memory addresses to a single physical address using the Linux page scheme, that isn't what mremap() is for. mremap() is for moving (remapping) an existing region, and if you use it to map to a specific new address, any old mappings to that address become invalid (per the man page). http://man7.org/linux/man-pages/man2/mremap.2.html
See the emphasized section...
MREMAP_FIXED (since Linux 2.3.31)
This flag serves a similar purpose to the MAP_FIXED flag of
mmap(2). If this flag is specified, then mremap() accepts a
fifth argument, void *new_address, which specifies a page-
aligned address to which the mapping must be moved. Any
previous mapping at the address range specified by new_address
and new_size is unmapped. If MREMAP_FIXED is specified, then
MREMAP_MAYMOVE must also be specified.
If you are simply trying to merge the storage of 2 identical data structures, you wouldn't need mremap() to point 2 "pages" to the same identical page; you'd need to point the 2 different data structure pointers to the same page and free the redundant page.
If the content is the same, you'd need to convert any pointers that are pointing to p2 to addresses into p1.
Even using mremap properly requires you to take care of your own pointer housekeeping, it doesn't magically do that for you; if you fail to do that, after the remap you may have dangling pointers.
PS: It's been years since I did kernel programming, so I might be wrong or out of date in my next statement, but I think you will need kernel calls (i.e. kernel module / driver level calls) to get at the physical mappings, because mmap() and mremap() are user-land calls and work within the virtual address space. The page mapping is done at the kernel level, outside of user space.

Dereferencing a pointer at lower level in C

When malloc returns a pointer (a virtual address of a block of data),
char *p = malloc (10);
p has a virtual address, (say x). And p holds a virtual address of a block of 10 addresses.
Say these virtual addresses are from y to y+10.
These 10 addresses belong to a page , and the virtual --> physical mapping is placed in the page table.
When processor dereferences the pointer p, say printf("%c", *p); , how does the processor know that it has to access the address at y ?
Is the page table accessed twice in order to dereference a pointer, in other words, to print the value at the address held in p? How exactly is it done? Can anybody explain?
Also, for accessing stack variables, does the processor have to access it through page table ?
Isn't the stack pointer register (SP) already pointing to the stack?
I think there's a muddling of different layers.
First, page tables: This is a data structure that uses some memory to provide pointers to more memory. Given a particular virtual address, it can be deconstructed into indices into the tables. Right now, this is happening under the covers in the kernel, but it's possible to implement this same idea in user space.
Now, the next step is processes. Each process gets its own view of memory and hence has its own set of page tables. How does the processor know where these different page tables reside? Through a special control register called cr3. Changing processes is sometimes called a context switch, and rightly so, because setting cr3 changes the process's view of virtual memory.
But the next question is: how does the processor even understand the concept of virtual memory? Well, on some older architectures (MIPS comes to mind), the system kept a cache of recent translations that was refilled by software, with the OS handling each miss. On x86, that cache (more commonly called a translation lookaside buffer, or TLB) is filled by hardware. The processor stores these translations so it can handle page table lookups automatically. If there's a cache miss, it actually traverses the page table structure as set up by the OS to look up the translation.
Of course, this means there must be at least two different modes for the processor: one that treats addresses as direct, and one that traverses the page tables. The first mode, real mode, is there at boot and only around long enough to set up the tables, before the boot code switches to protected mode with paging enabled and jumps to the beginning of the rest of the code.
The short answer to my long explanation is that in all likelihood, the page tables aren't accessed at all because the processor already has the address translations.
And p holds a virtual address of a block of 10 addresses.
You're confused. p is a pointer holding the address of a 10-byte block; how these bytes are interpreted is up to the application.

Process Page Tables

I'm interested in gaining a greater understanding of the virtual memory and page mechanism, specifically for Windows x86 systems. From what I have gathered from various online resources (including other questions posted on SO),
1) The individual page tables for each process are located within the kernel address space of that same process.
2) There is only a single page table per process, containing the mapping of virtual pages onto physical pages (or frames).
3) The physical address corresponding to a given virtual address is calculated by the memory management unit (MMU) essentially by using the first 20 bits of the provided virtual address as the index of the page table, using that index to retrieve the beginning address of the physical frame and then applying some offset to that address according to the remaining 12 bits of the virtual address.
Are these three statements correct? Or am I misinterpreting the information?
So, first lets clarify some things:
In the case of the x86 architecture, it is not the operating system that determines the paging policy; it is the CPU (more specifically, its MMU). How the operating system views the paging system is independent of the way it is implemented. As a commenter rightly pointed out, there is an OS-specific component to paging models, but it is subordinate to the hardware's way of doing things.
32 bit and 64 bit x86 processors have different paging schemes so you can't really talk about the x86 paging model without also specifying the word size of the processor.
What follows is a massively condensed version of the 32-bit x86 paging model, using the simplest version of it. There are many additional tweaks that are possible, and I know that various OSes make use of them. I'm not going into those because I'm not really familiar with the internals of most OSes, and because you really shouldn't go into that until you have a grasp on the simpler stuff. If you want to know all of the wonderful quirks of the x86 paging model, you can go to the Intel docs: Intel System Programming Guide
In the simplest paging model, the memory space is divided into 4KB blocks called pages. A contiguous chunk of 1024 of these is mapped by a page table (which is also 4KB in size). For a further level of indirection, up to 1024 page tables are mapped by a 4KB page directory, and the base of this directory sits in a special register, %cr3, in the processor. This two-level structure is in place because most memory spaces in the OS are sparse, which means that most of the space is unused. You don't want to keep a bunch of page tables around for memory that isn't touched.
When you get a memory address, the most significant 10 bits index into the page directory, which gives you the base of the page table. The next 10 bits index into that page table to give you the base of the physical page (also called the physical frame). Finally, the last 12 bits index into the frame. The MMU does all of this for you, assuming you've set %cr3 to the correct value.
64-bit systems have a 4-level paging scheme because their memory spaces are much more sparse. Also, it is possible to use page sizes other than 4KB.
To actually get to your questions:
All of this paging information (tables, directories, etc.) sits in kernel memory. Note that kernel memory is one big chunk, and there is no concept of having kernel memory for a single process.
There is only one page directory per process. This is because the page directory defines a memory space and each process has exactly one memory space.
The last paragraph above gives you the way an address is chopped up.
Edit: Clean up and minor modifications.
Overall that's pretty much correct.
If memory serves, a few details are a bit off though:
The paging for the kernel memory doesn't change per-process, so all the page tables are always visible to the kernel.
In theory, there's also a segment-based translation step. Most practical systems (e.g., *BSD, Linux, Windows, OS/X), however, use segments with their base set to 0 and limit set to the address space limit, so this step ends up as essentially a NOP.
