ARM MMU, handle L2 page table

How can we determine the base address of the L2 page table? (Using ARM Cortex-A9)
For example, if I have a programme which requires 7KB of data space and starts at the address 0x0, I need two pages of 4KB.
To do that, I add an entry in the L1 page table which points to the L2 page table base address.
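Then I add two entries to the L2 page table like this (with addr = 0x0 for the first page and 0x1000 for the second):
u32 *ptr;
u32 index;
/* bits [19:12] of the virtual address select one of the 256 L2 entries */
index = (addr >> 12) & 0xFF;
/* pointer arithmetic on u32 * scales the index by 4 bytes per entry */
ptr = (u32 *)L2_table_base_addr + index;
/* small page descriptor: physical base in bits [31:12], attributes below */
*ptr = (addr & 0xFFFFF000) | attributes;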
Now there is one thing that I still do not understand.
How can I determine the L2 page table base address? Should I put the table right after the L1 page table?
Where can I store the address? I know that the base address of the L1 page table is stored in a coprocessor register but I did not find any register to store the L2 base address.
Another question, just to be sure: the coprocessor registers TTBR0 and TTBR1 both hold the base address of an L1 page table, each its own. It is not TTBR0 for L1 and TTBR1 for L2, is it?

I would recommend reading Chapter 9, Memory Management Unit, of the Cortex-A Series Programmer's Guide, which clearly explains where the base addresses are stored. The base address of the level 1 translation table is stored in one of the two base registers (TTBR0 or TTBR1, depending on whether the table maps OS code or user process code). Having two base registers is helpful when context switching.
The address of the required L2 translation table entry is calculated by taking the (1KB-aligned) base address of the level 2 translation table, which is given by the level 1 translation table entry, and using 8 bits of the virtual address (bits [19:12]) to index into the 256 entries of the L2 translation table (256 entries of 4 bytes each, so 256*4 bytes = 1KB).

Related

How to get frame number from page table?

My understanding is that the page table is already specified somewhere in memory, so I just need the page number to get the frame number, from which I can get the physical address. How do I implement the page table so I can look up the frame number?
Given a virtual address:
page_number = virtual >> page_bits;        /* which virtual page */
page_offset = virtual & (page_size - 1);   /* offset inside the page */
physical = (page_table[page_number] << page_bits) | page_offset;  /* frame number -> byte address */
How is the page table implemented?

MMU: Long descriptor page table sizes in AARCH64

I would like to understand the memory covered at each level by page tables in AArch64 with a 4KB granule.
With a 47-bit VA, one could have levels 0 to 3.
At level 0 there could be one table which describes 512 level 1 page tables.
Each level 1 page table can then describe 512 level 2 page tables, and each level 2 page table can describe 512 level 3 page tables.
So at level 3 there are 512 page tables of size 4KB each, and the memory covered is 512*4KB = 2MB. This is what only one level 2 page table can cover, and if we have 512 such page tables at level 2, then the total memory covered is 512*2MB = 1GB, right?
Similarly, each table at level 1 points to 512 level 2 page tables (where each level 2 page table covers 2MB).
So 512*2MB = 1GB, and if we have 512 level 1 page tables, the total memory covered is 512GB, right?
Similarly, the total memory covered at level 0 is 1024GB, right?
You seem to have mixed up single page table entries with entire page tables at one point, lost a level somewhere, and added a bit rather than subtracting one.
Single page: 4'096
Level 3 table: 4096*512 = 2'097'152 = 2MB
Level 2 table: 4096*512*512 = 1'073'741'824 = 1GB
Level 1 table: 4096*512*512*512 = 549'755'813'888 = 512GB
Level 0 table: 4096*512*512*512*512 = 281'474'976'710'656 = 256TB
Note, though, that the above applies to a 48-bit address: 12 bits of the address are used for the page offset, and four groups of 9 bits as page table indices (12 + 4*9 = 48).
For 47 bits, you simply only have 256 entries in the level 0 table, so you end up with 128TB of addressable memory.

How can a page frame be assigned to more than one process?

I'm reading the book "Understanding the Linux Kernel", and this is what it says about the _count field of the page descriptor (struct page):
_count:
A usage reference counter for the page. If it is set to -1, the corresponding page frame is free and can be assigned to any process or to the kernel itself. If it is set to a value greater than or equal to 0, the page frame is assigned to one or more processes or is used to store some kernel data structures. The page_count() function returns the value of the _count field increased by one, that is, the number of users of the page.
My question is, if the same page is assigned to two processes, can't one process access memory assigned to the other by just decrementing/incrementing the linear address by a value smaller than PAGE_SIZE?

IA-32E Paging Example

When trying to set up virtual memory, I'm a bit confused about how to map a given virtual address to a physical address. Working on the x86 architecture in IA-32e mode, I have a function to map a new virtual page:
int allocate_page(page_table_entry* p)
{
//Get Physical Block for this page table entry to point to
void* phys = getFreeBlock();
//Here I need to map the virtual entry to this physical page frame
}
According to the Intel manual, for a 4-level page table the first 4 bits of the virtual address should give the Page Directory Pointer Table, the next 9 bits the Page Directory, the next 9 bits the Page Table, and the last 12 bits the offset in the page frame, for a total of 52 bits. Does anybody have resources or suggestions for how to implement the rest of this function so that I can take a given virtual address and map it to a free page frame?

Adaptive Radix Tree

A week ago I came across an interesting topic called Adaptive Radix Tree, which I found to be a very useful technique for in-memory indexing, especially on modern hardware architectures.
However, there is one point on page 4 that I could not understand: Node48.
I have attached a picture of what I mean.
http://s30.postimg.org/nff1am2r5/xadaptive_radix.png
Also, this is the main page of the article: http://www-db.in.tum.de/~leis/papers/ART.pdf
Could anybody who understands it better than me explain it? I would be very happy.
Thanks.
I believe that you understand how NODE_4 and NODE_16 work.
In NODE_4 and NODE_16, the 4 or 16 8-bit keys are stored in the first part of the node. Searching for a key therefore means reading 32 bits or 128 bits, which fit into a regular register and a SIMD register, respectively.
However, if we used the same approach in NODE_48, a search would have to read 384 bits (48*8), which does not even fit into a 256-bit SIMD register. So Viktor Leis et al. use a child index instead of keys in the first part of NODE_48. The child index contains 256 8-bit offsets, each giving the position of a child pointer. For example, to search for key 103 (keys range from 0 to 255) in a NODE_48, the program would:
jump to slot 103 of the child index (counting from 0)
read the value stored in slot 103; assume it is 5
jump to slot 5 of the child pointer array
read the pointer stored in slot 5 and go to the next node
This takes two offset calculations instead of comparing against 48 keys (SIMD selections).
Addendum:
To look up key 103 in NODE_4 (or NODE_16):
read the "key" part of the node, which holds the 4 keys (16 in NODE_16) belonging to the child pointers.
execute a SIMD instruction to compare 103 with all 4 (or 16) keys.
if one of the keys (let's say the 3rd key) is 103, follow the 3rd child pointer.
if none of the keys is 103, return not found.
