What is use of extended page table?

What is use of extended page table? - c

Can we show page table address using c program?
what is the difference between page table and extended page table?

Can we show page table address using c program?
Not using a plain-old C program, no you can't. User-mode programs run in virtual memory, which is provided by the kernel, using paging mechanisms. All of this is abstracted away so userspace knows nothing about it.
The Linux kernel does provide a mechanism for userspace to observe the pagetables however, as indicated at this question.
what is the difference between page table and extended page table?
"Extended page tables" are Intel's implementation of Second Level Address Translation (SLAT), also known as nested paging, which is used to more efficiently virtualize the memory of guest VMs.
Basically, guest virtual addresses are first translated to guest physical addresses, which are then translated to host physical addresses. This is all done in hardware (by the MMU) to avoid extra work needing to be done in software by the VMM.

1] What is second level Page Table
Extended page tables are a mechanism to allow each virtual machine to manage its page table, without giving access to the underlying host machine's MMU - Hardware.
Have a quick look at the link below. It should give an idea
http://www.cs.cmu.edu/~dga/15-440/F10/lectures/vm-ucsd.pdf
2] Is it possible to print Page Table using a C program?
- Its perfectly possible. There will be an MMU driver in your system. MMU driver will be setting up the Page Tables in some part of RAM. You need to know that location.
In conventional operating systems [linux,windows etc] . This memory area would be privileged, so applications may not get direct access.
If your platform is an embedded system with a micro-kernel running on it, probably you will be able to access this table.

Related

How does kernel restrict processes to their own memory pool?

This is purely academical question not related to any OS
We have x86 CPU and operating memory, this memory resembles some memory pool, that consist of addressable memory units that can be read or written to, using their address by MOV instruction of CPU (we can move memory from / to this memory pool).
Given that our program is the kernel, we have a full access to whole this memory pool. However if our program is not running directly on hardware, the kernel creates some "virtual" memory pool which lies somewhere inside the physical memory pool, our process consider it just as the physical memory pool and can write to it, read from it, or change its size usually by calling something like sbrk or brk (on Linux).
My question is, how is this virtual pool implemented? I know I can read whole linux source code and maybe one year I find it, but I can also ask here :)
I suppose that one of these 3 potential solutions is being used:
Interpret the instructions of program (very ineffective and unlikely): the kernel would just read the byte code of program and interpret each instruction individually, eg. if it saw a request to access memory the process isn't allowed to access it wouldn't let it.
Create some OS level API that would need to be used in order to read / write to memory and disallow access to raw memory, which is probably just as ineffective.
Hardware feature (probably best, but have no idea how that works): the kernel would say "dear CPU, now I will send you instructions from some unprivileged process, please restrict your instructions to memory area 0x00ABC023 - 0xDEADBEEF" the CPU wouldn't let the user process do anything wrong with the memory, except for that range approved by kernel.
The reason why am I asking, is to understand if there is any overhead in running program unprivileged behind the kernel (let's not consider overhead caused by multithreading implemented by kernel itself) or while running program natively on CPU (with no OS), as well as overhead in memory access caused by computer virtualization which probably uses similar technique.

You're on the right track when you mention a hardware feature. This is a feature known as protected mode and was introduced to x86 by Intel on the 80286 model. That evolved and changed over time, and currently x86 has 4 modes.
Processors start running in real mode and later a privileged software (ring0, your kernel for example) can switch between these modes.
The virtual addressing is implemented and enforced using the paging mechanism (How does x86 paging work?) supported by the processor.

On a normal system, memory protection is enforced at the MMU, or memory management unit, which is a hardware block that configurably maps virtual to physical addresses. Only the kernel is allowed to directly configure it, and operations which are illegal or go to unmapped pages raise exceptions to the kernel, which can then discipline the offending process or fetch the missing page from disk as appropriate.
A virtual machine typically uses CPU hardware features to trap and emulate privileged operations or those which would too literally interact with hardware state, while allowing ordinary operations to run directly and thus with moderate overall speed penalty. If those are unavailable, the whole thing must be emulated, which is indeed slow.

Which operating systems, and how, can pin pages in a database buffer pool?

Most relational database construction textbooks talk about the concept of being able to pin a page, i.e. prevent the operating system from swapping it out of memory. The concept is so that the database software can use it's own buffer replacement algorithm, which might be a better fit than whatever the OS virtual memory policy provides.
It is unclear to me whether typical desktop operating systems actually provide the programmer with the capability to pin pages. The best I can find on OS X, for example, refers to wired pages, but these seem to be only usable by the superuser.
Is the concept of pinning pages, and of defining appropriate buffer replacement strategies that supersede that of the OS, only of theoretical interest and not really implemented by real relational database systems? Or is it the case that typical desktop OS'es (Linux, Windows, OS X) do include hooks for pinning, and typical relational DB software (Oracle, SQL Server, PostgreSQL, MySQL, etc) uses them?

In PostgreSQL, the database server copies the pages from the file (or from the OS, really) into a shared memory segment which PostgreSQL controls. The OS doesn't know what the mapping is between the file system blocks and the shared memory blocks, so the OS couldn't write those pages back out to their disk locations even if it wanted to, until PostgreSQL tells it to do so by issuing a seek and a write.
The OS could decide to swap parts of shared memory out to disk into a swap partition (for example, if it were under severe memory stress), but it can't write them back to their native location on disk since it doesn't know what that location is.
There are ways to tell the OS not to page out certain parts of memory, such as shmctl(shmid,SHM_LOCK,NULL). But these are mostly intended for security purposes, not performance purposes. For example, you use it to prevent very sensitive information (like the decrypted copy of a private key) from accidentally getting written to swap partitions, from which it might be recovered by the bad guys.

#jjanes is correct to say that the OS can't really write out Pg's shared memory buffer, and can't control what PostgreSQL reads into it, so it doesn't make sense to "pin" it. But that's only half the story.
PostgreSQL does not offer any feature for pinning pages from tables in its shared memory segment. It could do so, and it might arguably be useful, but nobody has implemented it. In most cases the buffer replacement algorithm does a pretty good job by its self.
Partly this is because PostgreSQL relies heavily on the operating system's buffer caches, rather than trying to implement its own. Data might be evicted from shared_buffers, but it's usually still cached in the OS. It's not unreasonable to think of shared_buffers as a first-level cache, and the OS disk cache as the second-level cache.
The features available to control what's kept in the operating system's disk cache are whatever the OS provides. In general, that's not much, because again modern OSes tend to do a better job if you leave them alone and let them manage things themselves.
The idea of manual buffer management, etc, is IMO largely a relic of times when systems had simpler and less effective algorithms for managing caches and buffers automatically.
The main time that automation falls down is if you have something that's used only intermittently, but you want to ensure is available with extremely good response times when it is used; i.e. you wish to degrade the overall system's throughput to make one part of it more responsive. PostgreSQL doesn't offer much control over that; most people simply ensure that they have something regularly querying the data of interest to keep it warm in the cache.
You could write a relatively simple extension to mmap() a file and mlock() its range, but it'd be pretty wasteful and you'd have to fiddle with the default OS limits designed to stop you from locking too much memory.
(FWIW, I think Oracle offers quite a bit of control over pinning relations, indexes, etc, in tune with its "manually control everything whether you want to or not" philosophy, and it bypasses much of the operating system in the process.)

Speaking for SQL Server (on Windows, obviously), there's an OS setting that allows the SQL engine to ignore requests from the OS in response to memory pressure. That setting is called Lock Pages in Memory (LPIM). That permissions is granted on a per-account basis and needs to be granted to the account running your SQL service when the service is started.
Keep in mind that this isn't always a good idea. For example, in a virtualized environment, the hypervisor communicates its memory needs via a balloon driver process in the guest. If the hypervisor needs more memory, it inflates the memory needs of the balloon in the guest. If your SQL process has LPIM turned on, it won't respond and the hypervisor can start flagging as a result. And if the hypervisor isn't happy, ain't nobody happy.

Virtual Memory allocation without Physical Memory allocation

I'm working on a Linux kernel project and i need to find a way to allocate Virtual Memory without allocating Physical Memory. For example if I use this :
char* buffer = my_virtual_mem_malloc(sizeof(char) * 512);
my_virtual_mem_malloc is a new SYSCALL implemented by my kernel module. All data written on this buffer is stocked on file or on other server by using socket (not on Physical Memory). So to complete this job, i need to request Virtual Memory and get access to the vm_area_struct structure to redefine vm_ops struct.
Do you have any ideas about this ?
thx

This is not architecturally possible. You can create vm areas that have a writeback routine that copies data somewhere, but at some level, you must allocate physical pages to be written to.
If you're okay with that, you can simply write a FUSE driver, mount it somewhere, and mmap a file from it. If you're not, then you'll have to just write(), because redirecting writes without allocating a physical page at all is not supported by the x86, at the very least.

There are a few approaches to this problem, but most of them require you to first write to an intermediate memory.
Network File System (NFS)
The easiest approach is simply to have the server open some sort of a shared file system such as NFS and using mmap() to map a remote file to a memory address. Then, writing to that address will actually write the OS's page cache, wich will eventually be written to the remote file when the page cache is full or after predefined system timeout.
Distributed Shared Memory (DSM)
An alternative approach is using DSM with a very small cache size.
In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as one logically shared address space.
[...] Software DSM systems can be implemented in an operating system, or as a programming library and can be thought of as extensions of the underlying virtual memory architecture. When implemented in the operating system, such systems are transparent to the developer; which means that the underlying distributed memory is completely hidden from the users.
It means that each virtual address is logically mapped to a virtual address on a remote machine and writing to it will do the following: (a) receive the page from the remote machine and gain exclusive access. (b) update the page data. (c) release the page and send it back to the remote machine when it reads it again.
On typical DSM implementation, (c) will only happen when the remote machine will read the data again, but you might start from existing DSM implementation and change the behavior so that the data is sent once the local machine page cache is full.
I/O MMU
[...] the IOMMU maps device-visible virtual addresses (also called device addresses or I/O addresses in this context) to physical addresses.
This basically means to write directly to the network device buffer, which is actually implementing an alternative driver for that device.
Such approach seems the most complicated and I don't see any benefit from that approach.
This approach is actually not using any intermediate memory but is definitely not recommended unless the system has a heavy realtime requirement.

Is there a way to dump all the physical memory value?

I understand that each user process is given a virtual address space, and that can be dumped. But is there a way to dump the Physical Address Space? Suppose I have 32-bit system with 4GB memory, can i write a program to print each physical memory location.
I understand it violates memory protection etc. but if its possible how can convert this into a kernel process or lower level process to allow me access to the entire memory..?
I'd like to know how to write such code (if possible) on Windows/Linux platform( or kernel).. OR in case I've to use Assembly or something like that, how to shift to that privilege level.

In Linux, you can open and map the device file /dev/mem (if you have read permission to it). This corresponds to physical memory.

can i write a program to print each physical memory location.
I think no operating system gives the user access to physical memory location. So, you cann't. What ever, you are seeing are virtual addresses produced by the Operating System.

It is possible, on Windows, to access physical memory directly. Some of the things you can do:
Use the Device\PhysicalMemory object -- you can't access all physical memory, and user-mode access to it is restricted starting from Windows Server 2003 SP1.
Use Address Windowing Extensions -- you can control your own virtual-to-physical address mappings, so in a sense you are accessing physical memory directly, although still through page tables.
Write a kernel-mode driver -- there are kernel-mode APIs to access physical memory directly, to allocate physical memory pages, etc. One reason for that is DMA (Direct Memory Access).
None of these methods will give you easy, unrestricted access to any physical memory location.
If I may ask, what are you trying to accomplish?

I'm thinking you could probably do it with a kernel mode driver, but the result would be gibberish as what is in the user section of RAM at the time you grabbed it would be what the OS had paged in, it may be part of one application or a mish mash of a whole bunch. This previous SO question may also be helpful: How does a Windows Kernel mode Driver, access paged memory ?

Try this NTMIO - A WINDOWS COMMAND LINE TO ACCESS HARDWARE RESOURCES http://siliconkit.com/ocart/index.php?route=product/product&keyword=ntmio&category_id=0&product_id=285

How to use more than 3 GB in a process on 32-bit PAE-enabled Linux app?

PAE (Physical Address Extension) was introduced in CPUs back in 1994. This allows a 32-bit processor to access 64 GB of memory instead of 4 GB. Linux kernels offer support for this starting with 2.3.23. Assume I am booting one of these kernels, and want to write an application in C that will access more than 3 GB of memory (why 3 GB? See this).
How would I go about accessing more than 3 GB of memory? Certainly, I could fork off multiple processes; each one would get access to 3 GB, and could communicate with each other. But that's not a realistic solution for most use cases. What other options are available?
Obviously, the best solution in most cases would be to simply boot in 64-bit mode, but my question is strictly about how to make use of physical memory above 4 GB in an application running on a PAE-enabled 32-bit kernel.

You don't, directly -- as long as you're running on 32-bit, each process will be subject to the VM split that the kernel was built with (2GB, 3GB, or if you have a patched kernel with the 4GB/4GB split, 4GB).
One of the simplest ways to have a process work with more data and still keep it in RAM is to create a shmfs and then put your data in files on that fs, accessing them with the ordinary seek/read/write primitives, or mapping them into memory one at a time with mmap (which is basically equivalent to doing your own paging). But whatever you do it's going to take more work than using the first 3GB.

Or you could fire up as many instances of memcached as needed until all physical memory is mapped. Each memcached instance could make 3GiB available on a 32 bit machine.
Then access memory in chunks via the APIs and language bindings for memcached. Depending on the application, it might be almost as fast as working on a 64-bit platform directly. For some applications you get the added benefit of creating a scalable program. Not many motherboards handle more than 64GiB RAM but with memcached you have easy access to as much RAM as you can pay for.
Edited to note, that this approach of course works in Windows too, or any platform which can run memcached.

PAE is an extension of the hardware's address bus, and some page table modifications to handle that. It doesn't change the fact that a pointer is still 32 bits, limiting you to 4G of address space in a single process. Honestly, in the modern world the proper way to write an application that needs more than 2G (windows) or 3G (linux) of address space is to simply target a 64 bit platform.

On Unix one way to access that more-than 32bit addressable memory in user space by using mmap/munmap if/when you want to access a subset of the memory that you aren't currently using. Kind of like manually paging. Another way (easier) is to implicitly utilize the memory by using different subsets of the memory in multiple processes (if you have a multi-process archeteticture for your code).
The mmap method is essentially the same trick as commodore 128 programmers used to do for bank switching. In these post commodore-64 days, with 64-bit support so readily available, there aren't many good reasons to even think about it;)
I had fun deleting all the hideous PAE code from our product a number of years ago.

You can't have pointers pointing to > 4G of address space, so you'd have to do a lot of tricks.
It should be possible to switch a block of address space between different physical pages by using mmap to map bits of a large file; you can change the mapping at any time by another call to mmap to change the offset into the file (in multiples of the OS page size).
However this is a really nasty technique and should be avoided. What are you planning on using the memory for? Surely there is an easier way?

Obviously, the best solution in most cases would be to simply boot in 64-bit mode, but my question is strictly about how to make use of physical memory above 4 GB in an application running on a PAE-enabled 32-bit kernel.
There's nothing special you need to do. Only the kernel needs to address physical memory, and with PAE, it knows how to address physical memory above 4 GB. The application will use memory above 4 GB automatically and with no issues.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight