Using mremap() to merge two identical pages into one physical page


I have C code where I know that the content of the page pointed to by void *p1 is the same as the content of the page pointed to by void *p2. p1 and p2 were dynamically allocated. My question is: can I use mremap() to let these two pages point to the same physical page instead of having two identical physical pages?
Edit: I am trying to change the virtual to physical mapping in the page table of this process so that p1 and p2 point to the same physical address. I do not want to make p1 and p2 to point to the same thing virtually.

If you are trying to map multiple virtual memory addresses to a single physical address using Linux paging, that isn't what mremap() is for. mremap() is for moving (remapping) an existing region, and if you use it to map to a specific new_address, any old mappings at that address become invalid (per the man page). http://man7.org/linux/man-pages/man2/mremap.2.html
See the emphasized section...
MREMAP_FIXED (since Linux 2.3.31)
This flag serves a similar purpose to the MAP_FIXED flag of
mmap(2). If this flag is specified, then mremap() accepts a
fifth argument, void *new_address, which specifies a page-
aligned address to which the mapping must be moved. Any
previous mapping at the address range specified by new_address
and new_size is unmapped. If MREMAP_FIXED is specified, then
MREMAP_MAYMOVE must also be specified.
If you are simply trying to merge the storage of two identical data structures, you wouldn't need mremap() to point two "pages" to the same page; you'd need to point the two data structure pointers to the same page and free the redundant page.
If the content is the same, you'd need to convert any pointers that point to p2 into addresses in p1.
Even using mremap properly requires you to take care of your own pointer housekeeping. It doesn't magically do that for you; if you fail to do that, you may have dangling pointers after the remap.
PS: It's been years since I did kernel programming, so I might be wrong or out of date in my next statement, but I think you will need to use kernel calls (i.e. kernel module / driver level calls) to get to the physical mappings, because mmap() and mremap() are user-land calls and work within the virtual address space. The "page mapping" is done at the kernel level, outside of user space.
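The MREMAP_FIXED behavior quoted above can be seen in a minimal sketch. It assumes Linux with glibc; the helper name remap_p2_over_p1 and the strings are made up for illustration. It shows that moving p2 onto p1 replaces p1's old page rather than sharing one page between two addresses.

```c
#define _GNU_SOURCE          /* needed for mremap() and the MREMAP_* flags */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: maps two anonymous pages, fills them with different
 * text, then moves the second mapping on top of the first with MREMAP_FIXED.
 * On success, copies what is now visible at p1 into `out` and returns 0.
 */
int remap_p2_over_p1(char *out, size_t outlen)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    char *p1 = mmap(NULL, page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p1 == MAP_FAILED)
        return -1;
    char *p2 = mmap(NULL, page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p2 == MAP_FAILED) {
        munmap(p1, page);
        return -1;
    }

    strcpy(p1, "old p1 contents");
    strcpy(p2, "moved from p2");

    /* The old mapping at p1 is discarded; the page that backed p2 now
       appears at p1. */
    void *moved = mremap(p2, page, page, MREMAP_MAYMOVE | MREMAP_FIXED, p1);
    if (moved == MAP_FAILED) {
        munmap(p1, page);
        munmap(p2, page);
        return -1;
    }

    /* p2 is no longer mapped: dereferencing it here would fault. */
    snprintf(out, outlen, "%s", (char *)moved);
    munmap(moved, page);
    return 0;
}
```

After the call, only one virtual address refers to the page, so any pointer that still holds the old p2 address is dangling.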

Related

Is it possible to allocate a single byte of memory at a specific address?

Is it possible to allocate a single byte of memory at a specific desired address, say 0x123?
This suggests follow up questions:
Is it possible to know if a specific address has already been malloced?
Some complications could be:
The byte at the desired address 0x123 was already malloc'ed. In this case, can I move the byte value elsewhere and notify the compiler (or whatever's keeping track of these things) of the new address of the byte?
The byte at the desired address 0x123 was malloc'ed along with other bytes. E.g. char *str = malloc(8); and str <= 0x123 < str + 8, or in other words, 0x123 overlaps some portion of already malloc'ed memory. In this case, is it possible to move the portion of malloc'ed memory elsewhere and notify the compiler (or whatever's keeping track of these things)?
There are also several variations:
Is this possible if the desired address is known at compile time?
Is this possible if the desired address is known at run time?
I know mmap takes a hint addr, but it allocates in multiples of the pagesize and may or may not allocate at the given hint addr.

It is possible to assign a specific value to a pointer as follows:
unsigned char *p = (unsigned char *)0x123;
However dereferencing such a pointer will almost certainly result in undefined behavior on any hosted system.
The only time such a construct would be valid is on an embedded system where it is allowed to access an arbitrary address and the implementation documents specific addresses for specific uses.
As for trying to manipulate the inner workings of a malloc implementation, such a task is very system specific and not likely to yield any benefit.

There are operating-system-specific ways to do this. On Windows, you can use VirtualAlloc (with the MEM_COMMIT | MEM_RESERVE flags). On Linux, you can use mmap (with the MAP_FIXED_NOREPLACE flag). These are the operating system functions which give you full control over your own address space.
In either case, you can only map entire pages. Addresses only become valid and invalid a page at a time. You can't have a page that is only half valid, and you can't have a page where only one address is valid. This is a CPU limitation.
If the page you want is already allocated, then obviously you can't allocate it again.
On both Windows and Linux, you can't allocate the first page. This is so that accesses to NULL pointers (which point to the first page) will always crash.
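As a rough sketch of the Linux side, assuming glibc and a kernel that knows MAP_FIXED_NOREPLACE (4.17 or later): the helper name map_page_at is hypothetical, and the fallback #define is an assumption for older headers. It maps the whole page containing the requested address, since single bytes cannot be mapped.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumption: value from the generic Linux ABI, for older headers only. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Hypothetical helper: try to map the single page that contains `addr`.
 * Returns the page's base address, or NULL if that page is unavailable.
 */
void *map_page_at(uintptr_t addr)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Round down to the start of the page: only whole pages can be mapped. */
    void *want = (void *)(addr & ~(uintptr_t)(page - 1));

    void *got = mmap(want, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (got == MAP_FAILED)
        return NULL;            /* e.g. EEXIST: something is already mapped there */
    if (got != want) {
        /* Kernels that do not know the flag treat the address as a hint. */
        munmap(got, page);
        return NULL;
    }
    return got;
}
```

For an address such as 0x123 the call is expected to fail for an ordinary process, because the lowest pages are normally off limits, as noted above.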

Is it possible to allocate a single byte of memory at a specific desired address, say 0x123?
Generally: no. The C language doesn't cover allocation at specific addresses, it only covers how to access a specific address. Many compilers do provide non-standard language extensions for allocating at a fixed address. When sticking to standard C, the actual allocation must be done either:
In hardware, by for example having a MCU which provides a memory-mapped register map, or
By the system-specific linker, through custom linker scripts.
See How to access a hardware register from firmware? for details.
malloc doesn't make any sense in either case, since it exclusively uses heap allocation and the heap sits inside a pre-designated address space.

What's the mechanism behind P2V, V2P macro in Xv6

I know there is a mapping mechanism in how a virtual address turns out into a physical.
Just like the following, a linear address contains three parts
Page Directory index
Page Table index
Offset
Here is the illustration:
Now, when I take a look at the source code of Xv6 in memorylayout.h
#define V2P(a) (((uint) (a)) - KERNBASE)
#define P2V(a) (((void *) (a)) + KERNBASE)
#define V2P_WO(x) ((x) - KERNBASE) // same as V2P, but without casts
#define P2V_WO(x) ((x) + KERNBASE) // same as P2V, but without casts
How can the V2P or P2V work correctly without doing the process of the address translation?

The V2P and P2V macros don't do more than you think they do. They just subtract and add the KERNBASE constant, which is set at 2 GB.
It seems you understand the MMU's hardware mapping mechanism correctly.
The mapping rules are stored in per-process page tables. Those tables form the process's virtual address space.
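For reference, the three-part split of a 32-bit linear address mentioned in the question can be sketched as below. This assumes the classic x86 two-level layout (10-bit directory index, 10-bit table index, 12-bit offset); the function names are illustrative and are not taken from the xv6 source.

```c
#include <stdint.h>

/* Top 10 bits: index into the page directory. */
uint32_t pd_index(uint32_t va)    { return (va >> 22) & 0x3FF; }

/* Middle 10 bits: index into the page table chosen by the directory entry. */
uint32_t pt_index(uint32_t va)    { return (va >> 12) & 0x3FF; }

/* Low 12 bits: byte offset inside the 4 KB page. */
uint32_t page_offset(uint32_t va) { return va & 0xFFF; }
```

For the address 0x80001234 this gives directory index 0x200, table index 0x001 and offset 0x234.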
Specifically, in XV6, each process's virtual address space is built (through the appropriate mappings) as follows: (figure: virtual address space layout)
As the diagram above shows, XV6 specifically builds the process's virtual address space so that 2GB - 4GB virtual addresses maps to 0 to PHYSTOP physical addresses (respectively). As explained in the XV6 official commentary:
Xv6 includes all mappings needed for the kernel to run in every process’s page table;
these mappings all appear above KERNBASE. It maps virtual addresses KERNBASE:KERNBASE+PHYSTOP
to 0:PHYSTOP.
The motivation for this decision is also specified:
One reason for this mapping is so that the
kernel can use its own instructions and data. Another reason is that the kernel sometimes
needs to be able to write a given page of physical memory, for example when
creating page table pages; having every physical page appear at a predictable virtual
address makes this convenient.
In other words, because the kernel made all page tables map 2GB virtual to 0 physical (and up to PHYSTOP), we can easily find the physical address of a virtual address that is above 2GB.
For example, the physical address of the virtual address 0x80001000 is easily found: it is 0x00001000. We just subtract 2 GB (0x80000000), because that's how we mapped it.
This "workaround" helps the kernel do the conversion easily, for example when building and manipulating page tables (where physical addresses need to be calculated and written to memory).
Of course, this "workaround" is not free: it costs us valuable virtual address space (2 GB of it), because every physical address now already has at least one virtual address, even if we are never going to use it. That leaves the process itself only the remaining 2 GB of the virtual address space (4 GB in total, because addresses are 32 bits wide). This was also explained in the XV6 official commentary:
A defect of this arrangement is that xv6 cannot make
use of more than 2 GB of physical memory
I recommend reading more about this matter in the XV6 commentary under the "Process address space" header. XV6 Commentary
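The arithmetic described in this answer can be written out as a small sketch. It assumes KERNBASE is 0x80000000 (2 GB) and uses plain functions instead of the xv6 macros; the names v2p and p2v are illustrative. The result is only meaningful for kernel virtual addresses in KERNBASE:KERNBASE+PHYSTOP.

```c
#include <stdint.h>

#define KERNBASE 0x80000000u   /* assumed: start of the kernel's direct mapping */

/* Kernel virtual address to physical address: undo the fixed offset. */
uint32_t v2p(uint32_t va) { return va - KERNBASE; }

/* Physical address to kernel virtual address: apply the fixed offset. */
uint32_t p2v(uint32_t pa) { return pa + KERNBASE; }
```

No page table walk is needed because the kernel itself set up every page table so that this offset holds.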

Passing information from UEFI to the OS

I am familiar with the BIOS int 15 - E820 function, where you could choose a fixed physical location, put whatever you wanted there, the OS would not overwrite it, and you could just access that fixed memory address (possibly after mapping it to a virtual pointer first, etc.).
But in the UEFI case, as far as I am aware, there is no memory area reserved for the user, so I can't rely on allocating a buffer at a specific memory address (if that's even possible?). Therefore I have to use a UEFI memory allocation function, which returns a pointer that is not fixed.
So my questions are -
Is it possible to allocate a buffer that will not be overwritten once the OS comes up?
How can I pass the OS the pointer to the allocated buffer, so that I can access it from the OS (again, since the allocation is not at a fixed location, and assuming the buffer itself is not overwritten)?
Thank you!

Yes. Allocate memory of a non-reclaimable type, such as EfiRuntimeServicesData.
The mechanism UEFI uses is called configuration tables.
Note: EfiPersistentMemory is something completely different.
Configuration tables are installed by calling InstallConfigurationTable during boot services, with the two parameters being a GUID and a pointer to the physical address of the data structure you want to pass. This pair is then linked into an array pointed to by the UEFI System Table.
How you extract that information in Windows, I do not know. In Linux, the UEFI system table is globally accessible in kernel space (efi->systab), so the pointer can be extracted from there.

When would one use mmap MAP_FIXED?

I've been looking at the different flags for the mmap function, namely MAP_FIXED, MAP_SHARED, MAP_PRIVATE. Can someone explain to me the purpose of MAP_FIXED? There's no guarantee that the address space will be used in the first place.

MAP_FIXED is dup2 for memory mappings, and it's useful in exactly the same situations where dup2 is useful for file descriptors: when you want to perform a replace operation that atomically reassigns a resource identifier (memory range in the case of MAP_FIXED, or fd in the case of dup2) to refer to a new resource without the possibility of races where it might get reassigned to something else if you first released the old resource then attempted to regain it for the new resource.
As an example, take loading a shared library (by the dynamic loader). It consists of at least three types of mappings: read+exec-only mapping of the program code and read-only data from the executable file, read-write mapping of the initialized data (also from the executable file, but typically with a different relative offset), and read-write zero-initialized anonymous memory (for .bss). Creating these as separate mappings would not work because they must be at fixed relative addresses relative to one another. So instead you first make a dummy mapping of the total length needed (the type of this mapping doesn't matter) without MAP_FIXED just to reserve a sufficient range of contiguous addresses at a kernel-assigned location, then you use MAP_FIXED to map over top of parts of this range as needed with the three or more mappings you need to create.
Further, note that use of MAP_FIXED with a hard-coded address or a random address is always a bug. The only correct way to use MAP_FIXED is to replace an existing mapping whose address was assigned by a previous successful call to mmap without MAP_FIXED, or in some other way where you feel it's safe to replace whole pages. This aspect too is completely analogous to dup2; it's always a bug to use dup2 when the caller doesn't already have an open file on the target fd with the intent to replace it.
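The reserve-then-replace pattern described above might look like the following minimal sketch. It assumes Linux or another POSIX system with anonymous mappings; the helper name reserve_then_overlay is made up, and a real loader would overlay file-backed mappings rather than anonymous memory.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve `total` bytes at a kernel-chosen address, then
 * replace the sub-range [offset, offset + len) with readable and writable
 * anonymous memory. `offset` and `len` must be multiples of the page size.
 * Returns the base of the reservation, or NULL on failure.
 */
void *reserve_then_overlay(size_t total, size_t offset, size_t len)
{
    /* Step 1: dummy PROT_NONE mapping without MAP_FIXED, so the kernel
       picks a free contiguous range for us. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Step 2: MAP_FIXED atomically replaces part of our own reservation. */
    void *seg = mmap(base + offset, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (seg == MAP_FAILED) {
        munmap(base, total);
        return NULL;
    }
    return base;
}
```

Because the target range already belongs to the caller, MAP_FIXED cannot clobber anything it does not own, which is the property the answer insists on.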

If the file you are loading contains pointers, you will need to load it at a fixed location in order to ensure that the pointers are correct. In some cases, this can merely be an optimization.
Executables which are not position-independent must be loaded at fixed addresses.
Shared memory may contain pointers.
Executables which use prebinding will attempt to load dynamic libraries at predetermined memory locations as an optimization, but will fall back to normal loading techniques if a different location is used (or if the library has changed).
So MAP_FIXED is not typical usage.

Dereferencing a pointer at lower level in C

When malloc returns a pointer (a virtual address of a block of data),
char *p = malloc (10);
p has a virtual address, (say x). And p holds a virtual address of a block of 10 addresses.
Say these virtual addresses are from y to y+10.
These 10 addresses belong to a page, and the virtual --> physical mapping is placed in the page table.
When the processor dereferences the pointer p, say in printf("%c", *p);, how does the processor know that it has to access the address at y?
Is the page table accessed twice in order to dereference a pointer, in other words to print what p points to? How exactly is it done? Can anybody explain?
Also, for accessing stack variables, does the processor have to go through the page table?
Isn't the stack pointer register (SP) pointing to the stack already?

I think there's a muddling of different layers.
First, page tables: This is a data structure that uses some memory to provide pointers to more memory. Given a particular virtual address, it can deconstruct it into indices into the table. Right now, this is happening under the cover in the kernel, but it's possible to implement this same idea in user space.
Now, the next step is processes. Each process gets its own view of memory and hence has its own set of page tables. How does the processor know where these different page tables reside? In a special control register called cr3. Changing processes is sometimes called a context switch, and rightly so, because setting cr3 changes the process's view of virtual memory.
But the next question is, how does the processor even understand the concept of virtual memory? On some architectures (MIPS comes to mind), the cache of recent translations is managed by software: on a miss, the processor traps and the operating system supplies the translation. On x86, the cache (more commonly called a translation lookaside buffer) is filled by hardware. The processor stores these translations so it can handle the page table lookups automatically. If there's a cache miss, it will actually traverse the page table structure as set up by the OS to look up what it should reference.
Of course, this means there must be at least two different modes for the processor: one that assumes the addresses are direct and one that traverses the page tables. The first mode, real mode, is there on boot and only around long enough to set up the tables before the bootloader turns on paging and jumps to the beginning of the rest of the code.
The short answer to my long explanation is that in all likelihood, the page tables aren't accessed at all because the processor already has the address translations.

And p holds a virtual address of a block of 10 addresses.
You're confused. p is a pointer holding the address of a 10-byte block; how these bytes are interpreted is up to the application.
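The distinction between x (where p itself lives) and y (the address p holds) can be made concrete with a small sketch; the struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

struct pointer_view {
    uintptr_t x;   /* where the pointer variable p itself is stored */
    uintptr_t y;   /* the address stored in p: start of the 10-byte block */
    char first;    /* the result of *p */
};

struct pointer_view inspect_pointer(void)
{
    struct pointer_view v = {0};
    char *p = malloc(10);
    if (p == NULL)
        return v;

    p[0] = 'A';
    v.x = (uintptr_t)&p;   /* p is an ordinary variable with its own address */
    v.y = (uintptr_t)p;    /* its value is the address of the block */
    v.first = *p;          /* load from virtual address y; the MMU translates it */

    free(p);
    return v;
}
```

Both x and y are virtual addresses, so each memory access is translated the same way, whether it targets the stack or the heap.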