How this simple paging in ARMv8a works

According to the ARM manual:
In the case of a 4kB granule, the hardware can use a 4-level look up
process. The 48-bit address has nine address bits for each level
translated (that is, 512 entries each), with the final 12 bits
selecting a byte within the 4kB coming directly from the original
address.
Bits [47:39] of the virtual address index into the 512 entry L0 table.
Each of these table entries spans a 512GB range and points to an L1
table. Within that 512 entry L1 table, bits [38:30] are used as index
to select an entry and each entry points to either a 1GB block or an
L2 table.
Bits [29:21] index into a 512 entry L2 table and each entry points to
a 2MB block or next table level. At the last level, bits [20:12] index
into a 512 entry L2 table and each entry points to a 4kB block
This makes 100% sense to me: L0, L1 and L2 tables, then the final offset to arrive at a physical address.
However, look at this code: https://github.com/bztsrc/raspi3-tutorial/blob/abaf5a5b2bc1a9fdfe5a9d8191c061671555da3d/10_virtualmemory/mmu.c#L66, explained here:
Because we choose 4k as page size, and one translation entry is 8
bytes, that means we have 512 entries on each page. Therefore indeces
0..511 belong to the first page, 512..1023 to the second and so forth. With other words, the address of paging[0] equals to _end (first
page), and paging[512] equals to _end + PAGESIZE (second page).
It looks like it's setting up the L0, L1 and L2 tables mentioned in the manual. So paging[0..511] would be the entries of the L0 table, paging[512..1023] the L1 table and paging[1024..1535] the L2 table.
However, the code then does this:
paging[4*512+511]=(unsigned long)((unsigned char*)&_end+5*PAGESIZE) | // physical address
PT_PAGE | // we have area in it mapped by pages
PT_AF | // accessed flag
PT_KERNEL | // privileged
PT_ISH | // inner shareable
PT_MEM; // normal memory
The index 4*512+511 = 2559 is way past the L2 table I imagined. I think I have badly misunderstood something!
Should paging[0] to paging[511] span the first table (L0), then paging[512] to paging[1023] span the second table (L1) and paging[1024] to paging[2559] span the last table (L2)?
What about r<<21 and r*PAGESIZE, what do these mean?

There are two sets of tables, pointed to by TTBR0 and TTBR1.
The first, TTBR0, points directly at &paging[0], and forms the L0, L1, L2 page hierarchy:
paging[0] points at &paging[512*2]
paging[512*2] points at &paging[512*3]
paging[512*3..512*3+511] contains page descriptors for physical memory at 0x0..0x200000.
Additionally:
paging[512*2+1..512*2+511] contains large (2 MB block) descriptors for physical memory at 0x200000..0x40000000.
The second (kernel), TTBR1, points directly at &paging[512], forming a similar L0, L1, L2 hierarchy:
paging[512+511] points at &paging[512*4]
paging[512*4+511] points at &paging[512*5]
paging[512*5] contains a descriptor for MMIO_BASE+0x201000.
The reason the second set uses the 511th descriptor of its first two tables is to place the mapping at a very high virtual address.
The virtual address decoding is selected by the translation control register's T1SZ field; the code annotates it as 3 levels, or 39 bits of virtual addressing:
12 bits of offset and
27 bits of table indices (9 bits * 3 levels).
Address bits 63..39 traditionally must all have the same value, either all zeroes or all ones. This can be loosened in a control register, but regardless, bit 63 chooses which of TTBR0 and TTBR1 is used, and so which of the two L0 page table sets.
Traditionally, each process will have its own TTBR0, and the kernel will have a TTBR1 which is used for all processes (and thus needn't be changed on a context switch).
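To make the index arithmetic concrete, here is a minimal C sketch, assuming the 39-bit, three-level layout described above and this answer's L0..L2 numbering. The helper names (va_index, va_offset, paging_slot, block_base, page_base) are made up for illustration and are not part of the tutorial. It also suggests a reading of the two expressions from the question: r<<21 is r times 2 MB, the base of the r-th 2 MB block, and r*PAGESIZE is the base of the r-th 4 KB page.

```c
#include <assert.h>
#include <stdint.h>

#define PAGESIZE 4096u          /* 4 KB granule, as in the question */
#define ENTRIES_PER_TABLE 512u  /* 4096 bytes / 8 bytes per descriptor */

/* Table index for a 39-bit virtual address. Levels are numbered 0..2 as in
   the answer: level 0 = bits [38:30], 1 = bits [29:21], 2 = bits [20:12]. */
static inline unsigned va_index(uint64_t va, unsigned level)
{
    assert(level <= 2);
    unsigned shift = 30u - 9u * level;
    return (unsigned)((va >> shift) & (ENTRIES_PER_TABLE - 1u));
}

/* Byte offset inside the final 4 KB page: bits [11:0]. */
static inline unsigned va_offset(uint64_t va)
{
    return (unsigned)(va & (PAGESIZE - 1u));
}

/* Position in paging[] of entry `index` of the table that lives in page
   number `page` counted from _end (each page holds 512 entries). */
static inline unsigned paging_slot(unsigned page, unsigned index)
{
    return page * ENTRIES_PER_TABLE + index;
}

/* Base address of the r-th 2 MB block (2 MB = 1 << 21). */
static inline uint64_t block_base(uint64_t r)
{
    return r << 21;
}

/* Base address of the r-th 4 KB page. */
static inline uint64_t page_base(uint64_t r)
{
    return r * PAGESIZE;
}
```

Under these assumptions, the virtual address 0xFFFFFFFFFFE00000 decodes to indices 511, 511, 0 with offset 0, which is the TTBR1 path listed above, and paging_slot(4, 511) is 2559, the index that puzzled the asker.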

Related

How do I do the BSR SUBR, and define the SUBR part of the code?

Question
The program is supposed to do the following:
Add up the first 6 data items ($1 to $6) stored at address label DATA.
Store the sum of the first 6 data items to address label SUM1.
Multiply the sum stored at address label SUM1 by 8, and store the result at address label MUL8. (loop then add)
Add up the last 6 data items ($7 to $C) stored at address label DATA.
Store the sum of the last 6 data items to address label SUM2.
Divide the sum stored at address label SUM2 by 4, and store the result at address label DIV4.

How do I do the BSR SUBR, and define the SUBR part of the code?
You can't solve this task without consulting the Programmer's Reference Manual.
There's really nothing more to do for the BSR SUBR instruction: it already performs a 'Branch to Subroutine' (BSR). Defining the SUBR part is just a matter of writing down the instructions that perform the six steps outlined in your task description and then executing a 'Return from Subroutine' (RTS).
To get you on your way, here's a detailed explanation for step 1:
Add up the first 6 data items ($1 to $6) stored at address label DATA.
In order to sum up 6 bytes from the array, we can load the first byte into a data register and then add the next 5 bytes in a loop.
Before the loop we:
load an address register like A1 with the address of the DATA label. The movea.l #DATA, a1 instruction does that.
load a data register like D1 with the loop count, which is 5. The moveq.l #5, d1 instruction does that. To load small numbers in the range [-128,+127], always prefer moveq over move because it is both faster and has a smaller encoding.
load another data register like D0 with the first byte from the array. The move.b (a1)+, d0 instruction does that. Because this instruction uses the post-increment addressing mode and because the size attribute is byte, the address held in the A1 address register will automatically increment by 1. This way we can step through the array.
In the loop we:
add the next byte to the chosen D0 data register. The add.b (a1)+, d0 instruction does that.
decrement the loop count we have in the D1 data register. The subq.l #1, d1 instruction does that. To subtract small numbers in the range [1,8], always prefer subq over sub/subi because it has a smaller encoding and is much faster than subi.
branch back to the top of the loop only if the decrement on the loop counter did not produce 0. The bne.s loop1 instruction does that.
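        movea.l #DATA, a1       ; A1 -> first byte of the array
        moveq.l #5, d1          ; 5 more bytes to add after the first one
        move.b  (a1)+, d0       ; D0 = first byte, A1 advances by 1
loop1:
        add.b   (a1)+, d0       ; add the next byte, A1 advances by 1
        subq.l  #1, d1          ; one byte fewer to go
        bne.s   loop1           ; repeat until the count reaches 0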
I'll throw in the next step since it is rather trivial:
Store the sum of the first 6 data items to address label SUM1.
Step 1 left the sum in the D0 data register. Just move it to the SUM1 variable but be sure to use the correct size tag which is .b in accordance with how the SUM1 variable was defined:
move.b d0, SUM1
Good luck with steps 3 to 6...
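For checking the assembler results afterwards, a small C reference model can help. This is only a sketch under stated assumptions: DATA is assumed to hold the twelve bytes $1..$C from the task text, every variable is assumed to be byte-sized, the multiplication is done by adding the value eight times (the "loop then add" hint), and the division by 4 is done with a right shift, which is one possible choice. The function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed contents of DATA: the twelve bytes $1..$C named in the task. */
static const uint8_t DATA[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

/* Mirrors the loop from step 1; the comments name the matching instruction. */
static inline uint8_t sum_bytes(const uint8_t *a1, size_t count)
{
    assert(count >= 2);              /* the loop body always runs at least once */
    size_t d1 = count - 1;           /* moveq.l #5, d1 when count is 6 */
    uint8_t d0 = *a1++;              /* move.b (a1)+, d0 */
    do {
        d0 = (uint8_t)(d0 + *a1++);  /* add.b (a1)+, d0 */
        d1--;                        /* subq.l #1, d1 */
    } while (d1 != 0);               /* bne.s loop1 */
    return d0;
}

/* Multiply by 8 by adding the value eight times, wrapping at 8 bits. */
static inline uint8_t times8_by_adding(uint8_t value)
{
    uint8_t result = 0;
    for (int i = 0; i < 8; i++)
        result = (uint8_t)(result + value);
    return result;
}

/* Unsigned divide by 4 as a right shift by two bits. */
static inline uint8_t div4_by_shifting(uint8_t value)
{
    return (uint8_t)(value >> 2);
}
```

With those assumed data bytes the model gives SUM1 = 21 ($15), MUL8 = 168 ($A8), SUM2 = 57 ($39) and DIV4 = 14 ($0E), which are the values the finished subroutine should leave in memory.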

CHS to LBA mapping - (Disk Storage)

Before LBA you simply had the physical mapping of a disk, which on an old IBM-PC compatible machine would look something like the following:
Cylinder Number : (10-bits)
0-1023 (1024 = 2^10 values)
Head Number : (8-bits)
0-255 (256 = 2^8 values)
Sector Number : (6-bits)
1-63 (63 = 2^6 - 1 values); sector numbers start at 1, so 0 is not used
The "boot sector" is typically the very first sector (c-0,h-0,s-1)
Total CHS address bits : 24-Bits
Back in the day the usual (file|block|sector) size was 512B.
Example from wikipedia:
512 (bytes) × 63 (sectors) × 256 (heads) × 1024 (cylinders) = 8064 MiB (which yields what is known as the 8 GiB limit)
What I'm confused about is what a head actually means when it is referred to as heads-per-cylinder in the LBA formula. It doesn't make sense to me because, from what I know, a head is a head, and unless it is removable media each platter has two of them (top, bottom), one for each of its surfaces.
In my mind it would make more sense to refer to them as heads-per-disk or heads-per-surface, since a cylinder goes through the entire disk (multiple platters).
Logical Block Addressing:
Formula: A = (c ⋅ Nheads + h) ⋅ Nsectors + (s − 1)
A - Logical Block Address
Nheads - number of heads on a disk (heads-per-disk)
Nsectors - number of sectors on a track (sectors-per-track)
c,h,s - the cylinder, head and sector numbers, 24 bits total (10+8+6)
Looking at the first example on here:
For geometry 1020 16 63 of a disk with 1028160 sectors CHS 3 2 1 is LBA 3150=(3× 16+2)× 63
Geometry:
Cylinder Number - 1020 (0-1023)
Head Number - 16 (0-255)
Sector Number - 63 (1-63)
How do these geometry numbers relate to the CHS tuple (3,2,1) that is used in this formula?

I don't think heads is a number to be taken too literally. I've taken a few drives apart to salvage the neodymium magnets and only ever seen one disk, except on big 5-1/4 inch drives. And 2 heads. And cylinders and heads start at 0 but sectors start at 1. Some early Windows versions could only deal with 255 heads so the numbers get played with.
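Short answer: the geometry numbers are counts, so multiplying cylinders times heads times sectors gives the total number of sectors (1020 × 16 × 63 = 1028160), while the LBA of one particular (c,h,s) comes from the formula in the question. I tried pasting an OpenBSD fdisk listing in here but it's a whole 80 characters wide and the web page wouldn't take it.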

The term head generally isn't referring to the actual physical head, but rather to the two sides of the platter. So C,H,S can be thought of as P,T,S (platter, track, sector). First it's narrowed down to a specific layer, then to the radial track from the centre of the disk to the outside, then to the individual sector. On a floppy it's similar but there are no platters, so you just use T,S (track, sector).
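The formula from the question can be sketched in C as follows. The struct and function names are made up for this example; nheads and nsectors are the geometry counts (16 and 63 in the quoted example), while c, h and s are the coordinates of one sector. The inverse mapping is included only to show that the two directions agree.

```c
#include <assert.h>
#include <stdint.h>

struct chs {
    uint32_t c;  /* cylinder, counted from 0 */
    uint32_t h;  /* head, counted from 0 */
    uint32_t s;  /* sector, counted from 1 */
};

/* A = (c * Nheads + h) * Nsectors + (s - 1) */
static inline uint32_t chs_to_lba(struct chs a, uint32_t nheads, uint32_t nsectors)
{
    assert(a.h < nheads && a.s >= 1 && a.s <= nsectors);
    return (a.c * nheads + a.h) * nsectors + (a.s - 1u);
}

/* Inverse mapping: peel off the sector, then the head, then the cylinder. */
static inline struct chs lba_to_chs(uint32_t lba, uint32_t nheads, uint32_t nsectors)
{
    struct chs a;
    a.c = lba / (nheads * nsectors);
    a.h = (lba / nsectors) % nheads;
    a.s = lba % nsectors + 1u;
    return a;
}
```

For the geometry 1020 16 63, the tuple (3,2,1) gives (3 × 16 + 2) × 63 + 0 = 3150, and the last sector (1019,15,63) gives 1028159, one less than the total of 1028160 sectors.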

data structure for quick access

The following data is the input to an algorithm:
F1    P1    P2    P3    ...
2     4     5     2     ...
5     2     10    1     ...
1     4     15    0     ...
(The numbers under F1 are unique and non-zero; the numbers under P1, P2, P3 may repeat and may be zero.)
The algorithm chooses some numbers from F1 and gives positions 0, 1, 2, ... at which the numbers belonging to the selected F1 number have to be accessed (position 0 corresponds to P1, 1 to P2, and so on).
Further processing is then done on the numbers at the selected positions.
I have made a data structure (linked below) in which all the numbers from F1 go into Part I of Portion A in sorted order, and Part II of Portion A points to an array holding the numbers from P1, P2, P3, so that when the algorithm selects an F1 number and a position, that position can be accessed quickly and the number retrieved from it.
DS Image is in the link.
https://www.dropbox.com/s/iz9nqfg8jy4iekn/DS.png
(In case the image is not accessible: the DS consists of an array of structures with two members. Member 1 stores the sorted numbers from F1 and member 2 points to an array holding the numbers from P1, P2, P3 that correspond to that particular F1 number.)
The access time has to be reduced. That is why I have taken everything into memory and access the positions quickly through that index into the horizontal array. Using a simple 2-D array would add the overhead of moving the numbers of the horizontal array, because the numbers of F1 have to be kept in sorted order in Part I.
How can I improve this?
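For reference, the structure described in the question could look roughly like this in C. This is a sketch of the layout as I read it, not the asker's actual code: the names (struct row, index_rows, lookup) are invented, and the standard qsort and bsearch functions stand in for whatever sorting and searching the real program uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct row {
    int key;          /* Part I: a number from F1 (unique, non-zero) */
    const int *vals;  /* Part II: the numbers from P1, P2, P3, ... */
    size_t nvals;     /* how many positions this row has */
};

/* Orders rows by key; used for both sorting and searching. */
static int cmp_row(const void *a, const void *b)
{
    int ka = ((const struct row *)a)->key;
    int kb = ((const struct row *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Sorts only the small structs; the value arrays themselves never move. */
void index_rows(struct row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    qsort(rows, n, sizeof *rows, cmp_row);
}

/* Returns a pointer to the number at `pos` for the given F1 key,
   or NULL if the key or the position does not exist. */
const int *lookup(const struct row *rows, size_t n, int key, size_t pos)
{
    struct row probe = { key, NULL, 0 };
    const struct row *hit = bsearch(&probe, rows, n, sizeof *rows, cmp_row);
    if (hit == NULL || pos >= hit->nvals)
        return NULL;
    return &hit->vals[pos];
}
```

Because only the two-member structs are reordered, keeping Part I sorted does not require moving the horizontal arrays, and each lookup is a binary search followed by one array index.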

Perfmon, how to combine FirstValueA and FirstValueB?

I am using Performance Monitor to collect counter data and save it to a database. Here is the DB structure as defined on MSDN: http://msdn.microsoft.com/en-us/library/windows/desktop/aa371915(v=VS.85).aspx
Based on DB structure, here is the definition of the FirstValueA:
Combine this 32-bit value with the value of FirstValueB to create the
FirstValue member of PDH_RAW_COUNTER. FirstValueA contains the low
order bits.
And the FirstValueB:
Combine this 32-bit value with the value of FirstValueA to create the
FirstValue member of PDH_RAW_COUNTER. FirstValueB contains the high
order bits.
The fields FirstValueA and FirstValueB should be combined to create the FirstValue, and similarly the SecondValue.
How do you combine FirstValueA and FirstValueB to get the FirstValue in SQL Server?

So what they're saying is that you need to comingle the two, like this:
//for reference, this is 32 bits
12345678901234567890123456789012
000000000000000000000FirstValueA
000000000000000000000FirstValueB
What it says is that we need to combine the two: A holds the low order bits, and B holds the high order bits.
Wikipedia's article on the least significant bit (http://en.wikipedia.org/wiki/Least_significant_bit) shows that the low order is on the --> right, and the high order is on the <-- left.
low order -> right
high order <- left
A -> right
B <- left
So we're going to end up with (our previous example)
//for reference, this is 32 bits
12345678901234567890123456789012
000000000000000000000FirstValueA
000000000000000000000FirstValueB
becomes
//for reference, this is 64 bits
1234567890123456789012345678901234567890123456789012345678901234
000000000000000000000FirstValueB000000000000000000000FirstValueA
Now, that picture is only schematic. The real combined value is a run of 64 ones and zeroes, more like this:
//for reference, this is 64 bits
1234567890123456789012345678901234567890123456789012345678901234
1001101100110100101011010001010100101000010110000101010011101010
//the above string of 1's and 0's is more correct for the example
What you're given is not two binary strings, but two integers. So you have to multiply the left (high order) value, B, by 2**32 and add the right (low order) value, A, to it. (The result needs a 64 bit field, by the way.)
Let's examine, though, why the low order bits are on the right and the high order bits are on the left:
Binary is written just like Arabic numerals. In Arabic numerals, the number:
123456
means one hundred twenty-three thousand, four hundred fifty-six. The one hundred thousand is the most significant part (given that we would shorten this to "just over one hundred thousand dollars" instead of "a lot over 6 dollars") and the six is the part we most freely drop. So we could say that 123 is the value that contains the high order digits, and 456 is the value that contains the low order digits. To put them back together we multiply 123 by 10^3 and add 456 (this is a mathematical fact, not a guess, so trust me on this), because the two halves line up like this:
123000
   456
and so the same works for the binary, with B shifted left by 32 places:
//for reference, each line is 32 bits
12345678901234567890123456789012
000000000000000000000FirstValueB
000000000000000000000FirstValueA
tl;dr:
Multiply B by 2^32 and add to A
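The same arithmetic can be sketched in C. This is an illustration of the bit manipulation only, not SQL Server syntax: it assumes both halves arrive as signed 32-bit integers, so each one is reinterpreted as unsigned before widening, otherwise a negative low half would corrupt the high bits. The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* value_a holds the low order 32 bits, value_b the high order 32 bits. */
static inline uint64_t combine_halves(int32_t value_a, int32_t value_b)
{
    uint64_t low  = (uint32_t)value_a;  /* drop the sign, keep the bit pattern */
    uint64_t high = (uint32_t)value_b;
    return (high << 32) | low;          /* same as high * 2^32 + low */
}

/* The opposite direction: split a 64-bit value into its two halves. */
static inline void split_halves(uint64_t value, uint32_t *value_a, uint32_t *value_b)
{
    assert(value_a != NULL && value_b != NULL);
    *value_a = (uint32_t)(value & 0xFFFFFFFFu);
    *value_b = (uint32_t)(value >> 32);
}
```

Whatever language does the work, the key point is that B must be widened to a 64-bit type before it is multiplied or shifted; multiplying in 32 bits would overflow.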

Console.WriteLine("{0} {1} {2} : {3} {4}", p.CategoryName, p.InstanceName, p.CounterName, p.RawValue, p.CounterType.GetHashCode());
// Split each 64-bit value into its low (A) and high (B) 32-bit halves.
ulong firstValue = (ulong)p.NextValue();
Console.WriteLine("FirstValueA :{0}", firstValue & 0xFFFFFFFF);
Console.WriteLine("FirstValueB :{0}", firstValue >> 32);
// Take a single sample so that both halves come from the same timestamp.
long timeStamp = p.NextSample().TimeStamp;
Console.WriteLine("SecondValueA :{0}", timeStamp & 0xFFFFFFFF);
Console.WriteLine("SecondValueB :{0}", timeStamp >> 32);

Reading memory in correct order - need some help

We are storing records in memory, laid out as follows:
----------------------------------------------
|EventID | Timestamp | Variable Data | Length |
----------------------------------------------
The lengths of these fields are as follows:
EventID + Timestamp is 12 bytes.
The Length field is 4 bytes; it indicates the length of the Variable Data field.
Millions of such records are placed one after the other, and I have a pointer to the current index (the end of the last record). To read all the records, I read the 4 length bytes just before the pointer, fetch that record, step back past it, and repeat until I have read the complete memory space. The problem with this method is that it reads the records in the reverse of the order in which they were entered.
I need to devise a method that allows me to read these records in the same order in which they were entered, with minimal space complexity.

I have another great solution for you!
Read your records in reverse order (end to beginning) and swap the in-memory values of the EventID and Length fields as you go.
When accessing rows afterwards, just keep the new layout in mind: Length | Timestamp | Data | EventID

As the variable length data section comes before the length, it is impossible to read the data starting from the beginning memory address. Assuming no changes can be made to the architecture or storage, one possible option is to use your current system to build an index of the variable data lengths. Then, once you reach the beginning of the data, you read the records in the correct order, using the previously built index to determine each variable data length.
However, you mention this dataset contains millions of records, so storing an index of all variable data lengths before processing may not be feasible. One solution to this problem is to index only every other entry, or every fourth, eighth, etc., depending upon your specific requirements. Then you start at each indexed record and work backwards, temporarily saving the data lengths, until you reach a record you have already processed. Then you work forward again using the saved lengths.
For example, let's say you index every 8 records on your first pass. Then you start at record 8 and save the length of that record, then go to 7, 6, 5, 4, 3, 2, 1. Now you've saved the first 8 lengths, so process records 1, 2, 3, 4, 5, 6, 7, and 8. Now you don't know the length of 9, so jump to 16 and record the lengths of 16, 15, 14, ..., 9. Then, as before, process 9, 10, 11, ..., 16. Now repeat.

Try to 'reverse' the record order while fetching the first time, and then make a second fetch using the same process (this needs the same amount of memory again to hold the reversed copy).
As the variable data has variable length, and the length value is in the last position, I see no way to do this fetching from left to right.

There is another way to find the end of a row with no additional memory, provided that:
All EventID values fall into a definite range, and could be sequential.
All Timestamp values have a definite range too (say, from 2009/09/09 through 2011/11/11).
The Length of one row and the EventID and Timestamp of the next row are adjacent, with a fixed total length of 16 bytes (4 for Length, 4 for EventID, and 8 for Timestamp).
Under these assumptions you could write a function that searches for the end of a row, e.g.
#include <stdint.h>
#include <string.h>

typedef unsigned char byte;

/* Returns the start of the row after rowStart, memBlockEnd if rowStart is
   the last row, or NULL if no plausible end of row was found. */
byte* FindNextRow(byte* rowStart, byte* memBlockEnd,
                  uint32_t minEventID, uint32_t maxEventID,
                  uint64_t minTimestamp, uint64_t maxTimestamp)
{
    byte* data = rowStart + 12;   // skip EventID + Timestamp, move to 'data'
    // ptr is a candidate position for this row's 4-byte 'length' field
    for (byte* ptr = data; memBlockEnd - ptr >= 4; ptr++) {
        uint32_t length;
        memcpy(&length, ptr, 4);  // safe even if ptr is not aligned
        // a real 'length' field equals the number of data bytes before it
        if (length != (uint32_t)(ptr - data))
            continue;
        // check if this is the last row
        if (memBlockEnd - ptr == 4)
            return memBlockEnd;
        // the next row needs room for its 12-byte header
        if (memBlockEnd - ptr < 16)
            continue;
        // then check 'EventID' and 'Timestamp' for the next row
        uint32_t eventID;
        memcpy(&eventID, ptr + 4, 4);
        if (eventID < minEventID || eventID > maxEventID)
            continue; // you might add a sequence check: eventID == this row's EventID + 1
        uint64_t timestamp;
        memcpy(&timestamp, ptr + 8, 8);
        if (timestamp < minTimestamp || timestamp > maxTimestamp)
            continue; // you might add a sequence check: timestamp > this row's Timestamp
        // this is the match
        return ptr + 4;
    }
    return NULL;  // no candidate passed all the checks
}
WARNING: this does not guarantee correctness (variable data can happen to look like a valid row boundary), but you could try to find a workaround this way.

Is allocating one pointer (on a 32-bit machine, usually 4 bytes) per message acceptable to you?
If it is, you could, starting from the end:
1. Read the length at current position - 4
2. Get the pointer to the 1st byte of the event id with: current position - 4 - length - 12
3. Resize the pointer array if needed
4. Store that pointer in the array
5. Repeat from 1, with that pointer as the new current position
Of course, you would need to realloc() as the pointer array grows (no need to realloc every time; do it in chunks).
I am assuming you are treating the records as a char array, so the char pointer difference between contiguous elements (n and n-1) gives you the size of the entire message.
This wastes memory. I know you don't want that, but if you can't do as Opillect said and swap the EventID and Length fields because they have different sizes, this seems like a good way to do it.
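The steps above can be sketched in C as follows. This is a minimal illustration under assumptions that are not in the question: the fields are stored in the machine's native byte order, the EventID is 4 bytes and the Timestamp 8 bytes, and offsets are stored instead of raw pointers. Both function names are invented, and append_record exists only to make the layout explicit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_BYTES 12u  /* EventID (4) + Timestamp (8) */
#define LENGTH_BYTES 4u   /* trailing Length field */

/* Hypothetical writer, only here to show the layout: appends one record at
   offset `used` and returns the new end offset. The caller provides room. */
size_t append_record(unsigned char *buf, size_t used, uint32_t event_id,
                     uint64_t timestamp, const void *data, uint32_t length)
{
    memcpy(buf + used, &event_id, 4);
    memcpy(buf + used + 4, &timestamp, 8);
    memcpy(buf + used + HEADER_BYTES, data, length);
    memcpy(buf + used + HEADER_BYTES + length, &length, LENGTH_BYTES);
    return used + HEADER_BYTES + length + LENGTH_BYTES;
}

/* Walks backwards from `used` and returns a malloc'ed array with the start
   offset of every record in insertion order. Returns NULL on allocation
   failure or if the buffer is malformed; the caller frees the array. */
size_t *index_records(const unsigned char *buf, size_t used, size_t *count)
{
    assert(buf != NULL && count != NULL);
    size_t capacity = 64, n = 0, pos = used;
    size_t *starts = malloc(capacity * sizeof *starts);
    *count = 0;
    if (starts == NULL)
        return NULL;
    while (pos > 0) {
        uint32_t length;
        if (pos < HEADER_BYTES + LENGTH_BYTES) { free(starts); return NULL; }
        memcpy(&length, buf + pos - LENGTH_BYTES, LENGTH_BYTES);   /* step 1 */
        if (length > pos - HEADER_BYTES - LENGTH_BYTES) { free(starts); return NULL; }
        pos -= HEADER_BYTES + LENGTH_BYTES + length;               /* step 2 */
        if (n == capacity) {                                       /* step 3 */
            size_t *bigger = realloc(starts, 2 * capacity * sizeof *starts);
            if (bigger == NULL) { free(starts); return NULL; }
            starts = bigger;
            capacity *= 2;
        }
        starts[n++] = pos;                                         /* step 4 */
    }
    /* The offsets were collected newest first; reverse them in place. */
    for (size_t i = 0; i < n / 2; i++) {
        size_t tmp = starts[i];
        starts[i] = starts[n - 1 - i];
        starts[n - 1 - i] = tmp;
    }
    *count = n;
    return starts;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the record count, which is the "do it in chunks" advice above; the cost is one size_t per record.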
