I would like to understand the memory covered at each level by page tables in AArch64 with 4K granularity.
With 47 bits of VA, one could have levels 0 to 3.
At level 0 there could be one table which describes 512 level 1 page tables.
Now each level 1 page table can describe 512 level 2 page tables, and further each level 2 page table can describe 512 level 3 page tables.
So at level 3 there are 512 page tables of size 4K each and the memory covered is 512*4K = 2MB; this is what only one page table of level 2 can cover, and if we have 512 such page tables at level 2 then the total memory covered is 512*2MB = 1GB, right?
In a similar way, each table at level 1 points to 512 level 2 page tables (where each level 2 page table covers 2MB).
So, 512*2MB = 1GB, and if we have 512 level 1 page tables then the total memory covered is 512GB, right?
In a similar way, the total memory covered at level 0 is 1024GB, right?
You seem to have mixed up single page table entries with entire page tables at one point, lost one level somehow and added a bit rather than subtracted it.
Single page: 4'096
Level 3 table: 4096*512 = 2'097'152 = 2MB
Level 2 table: 4096*512*512 = 1'073'741'824 = 1GB
Level 1 table: 4096*512*512*512 = 549'755'813'888 = 512GB
Level 0 table: 4096*512*512*512*512 = 281'474'976'710'656 = 256TB
Note though that the above applies to a 48-bit address. That is, 12 bits of the address are used for the page offset, and four groups of 9 bits each as page table indices (12 + 4*9 = 48).
For 47 bits, you simply only have 256 entries in the level 0 table, so you end up with 128TB of addressable memory.
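To make the 48-bit breakdown concrete, here is a minimal C sketch (my own illustration, not from the answer) that splits a virtual address into the four 9-bit table indices plus the 12-bit page offset, and prints how much memory one table at each level covers:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t va = 0x00007f1234567890ULL; /* arbitrary 48-bit example address */

    /* 4K granule: bits [11:0] are the page offset, then 9 bits per level */
    uint64_t offset = va & 0xfff;
    uint64_t l3 = (va >> 12) & 0x1ff;
    uint64_t l2 = (va >> 21) & 0x1ff;
    uint64_t l1 = (va >> 30) & 0x1ff;
    uint64_t l0 = (va >> 39) & 0x1ff;

    printf("L0 %llu, L1 %llu, L2 %llu, L3 %llu, offset %llu\n",
           (unsigned long long)l0, (unsigned long long)l1,
           (unsigned long long)l2, (unsigned long long)l3,
           (unsigned long long)offset);

    /* one table at level N covers 4096 * 512^(4-N) bytes */
    unsigned long long covered = 4096;
    for (int level = 3; level >= 0; level--) {
        covered *= 512;
        printf("one level %d table covers %llu bytes\n", level, covered);
    }
    return 0;
}
The loop reproduces the numbers above: 2MB for a level 3 table, 1GB for level 2, 512GB for level 1 and 256TB for level 0.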
CREATE SEQUENCE has a CACHE option.
MSDN defines it as
[ CACHE [<constant> ] | NO CACHE ]
Increases performance for applications that use sequence objects by
minimizing the number of disk IOs that are required to generate
sequence numbers. Defaults to CACHE. For example, if a cache size of
50 is chosen, SQL Server does not keep 50 individual values cached. It
only caches the current value and the number of values left in the
cache. This means that the amount of memory required to store the
cache is always two instances of the data type of the sequence object.
I understand that it improves performance by avoiding disk I/O and keeping some information in memory that helps reliably generate the next number in the sequence, but I cannot picture what a simple in-memory representation of the cache would look like for what MSDN describes in the example.
Can someone explain how the cache would work with this sequence
CREATE SEQUENCE s
AS INT
START WITH 0
INCREMENT BY 25
CACHE 5
describing what the cache memory would hold when each of the following statements is executed independently:
SELECT NEXT VALUE FOR s -- returns 0
SELECT NEXT VALUE FOR s -- returns 25
SELECT NEXT VALUE FOR s -- returns 50
SELECT NEXT VALUE FOR s -- returns 75
SELECT NEXT VALUE FOR s -- returns 100
SELECT NEXT VALUE FOR s -- returns 125
This paragraph in the doc is very helpful:
For example, a new sequence is created with a starting value of 1 and a cache size of 15. When the first value is needed, values 1
through 15 are made available from memory. The last cached value (15)
is written to the system tables on the disk. When all 15 numbers are
used, the next request (for number 16) will cause the cache to be
allocated again. The new last cached value (30) will be written to the
system tables.
So, in your scenario
CREATE SEQUENCE s
AS INT
START WITH 0
INCREMENT BY 25
CACHE 5
You will have 0, 25, 50, 75 and 100 in memory, and you will get only one I/O write to disk: 100.
The problem you could have, as explained in the doc, is that if the server goes down and you haven't used all 5 cached values, the next time you ask for a value you'll get 125.
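For intuition, here is a toy C model (my own sketch; SQL Server's actual internals are not documented beyond the quote above) of the two pieces of state the doc says are kept: the current value and the number of values left in the cache:
#include <stdio.h>

/* Hypothetical model of the cache: only the current value and the
 * count of values left are kept in memory; the last value of each
 * allocated block is what gets persisted. */
struct seq_cache {
    int current;    /* next value to hand out */
    int left;       /* values remaining in the cached block */
    int increment;
    int cache_size;
    int on_disk;    /* last cached value written to the system tables */
};

static int next_value(struct seq_cache *s)
{
    if (s->left == 0) {
        /* allocate a new block: persist only its last value */
        s->on_disk = s->current + (s->cache_size - 1) * s->increment;
        s->left = s->cache_size;
        printf("disk write: %d\n", s->on_disk);
    }
    int v = s->current;
    s->current += s->increment;
    s->left--;
    return v;
}

int main(void)
{
    struct seq_cache s = { 0, 0, 25, 5, 0 }; /* START 0, INCREMENT 25, CACHE 5 */
    for (int i = 0; i < 6; i++)
        printf("NEXT VALUE -> %d\n", next_value(&s));
    return 0;
}
Running it prints one simulated disk write (100) before the first five values, and a second one (225) only when the sixth request forces a new block to be allocated.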
MSDN says here (msdn maximums) that the max data file size is "16 Terabytes". I'm not sure whether their definition of terabyte is 1024^4 or 1000^4, so the valid max page number might be 2,147,483,648 (for the 1024 basis) or 1,953,125,000 (for the 1000 basis), or perhaps something else. Does anyone know with certainty?
I have heard that this limit should increase with future releases; right now I'm using 2012.
Yes, it is based on 1024, which is a kilobyte. Multiply that by 1024 and you get a megabyte, and so on.
You're also correct that newer versions have larger maximums.
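Both of the candidate page counts from the question can be checked with a couple of lines of C (my own check, using the 8 KB page size discussed in the next answer); with the 1024 basis confirmed, the max page number is 2,147,483,648:
#include <stdio.h>

int main(void)
{
    unsigned long long tb1024 = 16ULL * 1024 * 1024 * 1024 * 1024; /* 16 TiB */
    unsigned long long tb1000 = 16ULL * 1000 * 1000 * 1000 * 1000; /* 16 TB  */
    printf("%llu pages (1024 basis)\n", tb1024 / 8192); /* 2,147,483,648 */
    printf("%llu pages (1000 basis)\n", tb1000 / 8192); /* 1,953,125,000 */
    return 0;
}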
The minimum unit required for storing any type of data in SQL Server is a page, which is 8 KB (exactly 8192 bytes) in size; pages are stored in logical extents.
Page Header
Yet not all of the 8192 bytes are available for data storage; some of that space is used to store information about the page itself. It is called the page header and it takes 96 bytes.
Row Set
This is another section of the page, containing information about the rows on that page; it begins at the end of the page and takes another 36 bytes out of the total page size of 8192 bytes.
Total Space Available for Data Storage
8192 Total space on a page
- 96 Space taken by the page header
- 36 Space taken by the Row set
----------------------------------------------
8060 Total Space Available for Data Storage
So if you are trying to calculate the amount of data you will be able to store in a database, especially when you are talking in terabytes, don't forget to take the page header and row set into consideration.
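Combining this with the 16 TiB file limit from the previous answer gives a back-of-the-envelope capacity figure (my own rough calculation; it ignores all other overhead such as system tables, indexes and allocation maps):
#include <stdio.h>

int main(void)
{
    unsigned long long pages  = 16ULL * 1024 * 1024 * 1024 * 1024 / 8192;
    unsigned long long usable = pages * 8060ULL; /* bytes available for rows */
    printf("%llu pages -> %llu bytes (~%.2f TiB) of row data\n",
           pages, usable, usable / 1099511627776.0);
    return 0;
}
That is roughly 15.74 TiB of row data out of the nominal 16, before any other overhead.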
I am trying to run the following script on Informix:
CREATE TABLE REG_PATH (
REG_PATH_ID SERIAL UNIQUE,
REG_PATH_VALUE LVARCHAR(750) NOT NULL,
REG_PATH_PARENT_ID INTEGER,
REG_TENANT_ID INTEGER DEFAULT 0,
PRIMARY KEY(REG_PATH_ID, REG_TENANT_ID) CONSTRAINT PK_REG_PATH
);
CREATE INDEX IDX1 ON REG_PATH(REG_PATH_VALUE, REG_TENANT_ID);
But it gives the following error:
517: The total size of the index is too large or too many parts in index.
I am using Informix version 11.50FC9TL. My dbspace chunk size is 5M.
What is the reason for this error, and how can I fix it?
I believe 11.50 has support for large page sizes, and to create an index on a column that is LVARCHAR(750) (plus a 4-byte INTEGER), you will need to use a bigger page size for the dbspace that holds the index. Offhand, I think the page size will need to be at least 4 KiB, rather than the default 2 KiB you almost certainly are using. The rule of thumb I remember is 'at least 5 index keys per page', and at 754 bytes plus some overhead, 5 keys squeaks in at just under 4 KiB.
This is different from the value quoted by Bohemian in his answer.
See the IDS 12.10 Information Center for documentation about Informix 12.10.
Creating a dbspace with a non-default page size
CREATE INDEX statement
Index key specification
This last reference has a table of dbspace page sizes and maximum key sizes permitted:
Page Size Maximum Index Key Size
2 kilobytes 387 bytes
4 kilobytes 796 bytes
8 kilobytes 1,615 bytes
12 kilobytes 2,435 bytes
16 kilobytes 3,245 bytes
If 11.50 doesn't have support for large page sizes, you will have to migrate to a newer version (12.10 recommended, 11.70 a possibility) if you must create such an index.
One other possibility to consider is whether you really want such a large key string; could you reduce it to, say, 350 bytes? That would then fit in your current system.
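If you want to automate the choice, here is a small C sketch (my own; the limits are simply copied from the table above) that picks the smallest dbspace page size whose maximum key size accommodates a given key:
#include <stdio.h>

/* Maximum index key size per dbspace page size (from the table above). */
static const struct { int page_kb; int max_key; } limits[] = {
    { 2, 387 }, { 4, 796 }, { 8, 1615 }, { 12, 2435 }, { 16, 3245 },
};

static int needed_page_kb(int key_bytes)
{
    for (int i = 0; i < (int)(sizeof limits / sizeof limits[0]); i++)
        if (key_bytes <= limits[i].max_key)
            return limits[i].page_kb;
    return -1; /* key too large for any supported page size */
}

int main(void)
{
    /* LVARCHAR(750) + 4-byte INTEGER = 754 bytes, as in the question */
    printf("a 754-byte key needs a %d KiB page\n", needed_page_kb(754));
    return 0;
}
For the 754-byte key in the question this prints 4 KiB, matching the estimate above.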
From the Informix documentation:
You can include up to 16 columns in a composite index. The total width of all indexed columns in a single composite index cannot exceed 380 bytes.
One of the columns you want to add to your index is REG_PATH_VALUE LVARCHAR(750); 750 bytes is longer than the 380 maximum allowed.
You can't "fix" this per se; either make the column size smaller, or don't include it in the index.
I am working on a problem where three memory pages are available and data is supposed to be written to one of the pages.
To keep history, the data is first written to the 1st page, and when that is full the next page shall be used. Finally, when the last page is also full, we have to erase the data in the first page and use the first page again. And so on...
How can I know which of the pages is the 'oldest'? How do I determine which to erase?
I think that a counter is needed, and this counter increments every time a new page is used. The counter values are read at the beginning to find which page is the newest, and then the next page is the oldest (since it is a circular approach). However, eventually the counter will overflow and restart, and it will no longer be possible to be sure which value is the highest (since the new value is 0).
Example:
0 0 0 (from beginning)
1 0 0 (page0 was used)
1 2 0 (page1 was used)
1 2 3 (page2 was used)
4 2 3 (page0 was used)
4 5 3 (page1 was used)
...
255 0 254 (I don't know...)
Is the problem clear? Otherwise I can try to re-explain.
This is a technique used in EEPROM wear leveling. The concept is that since EEPROM usually has a limited life of write/erase cycles, we spread the wear across the memory so that its effective life increases. Since data in EEPROM survives power-off, we may have to store log values of some variables periodically in the EEPROM for later use.
One simple approach, as suggested in the comments, is to keep updating the counter as (counter modulo 3).
Another (more general) approach is to keep three counters, one per page. Whenever you have to write to a page, first scan these three counters and check for the place where the sequence breaks, i.e. where C[i] != C[i-1] + 1 (a sketch of this scan follows the example below):
0 0 0
1 0 0 // 1 to 0
1 2 0 // 2 to 0
1 2 3 // 3 to 1
4 2 3 // 4 to 2
...
255 0 254 // 0 to 254.
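Here is a minimal C sketch of that scan (my own illustration of the approach, assuming 8-bit counters that each increment by exactly 1 per write, so the 255 -> 0 wraparound is handled by unsigned arithmetic):
#include <stdio.h>
#include <stdint.h>

#define NPAGES 3

/* The newest page is the one whose successor (in circular page order)
 * does NOT hold counter+1. Unsigned 8-bit subtraction makes the
 * 255 -> 0 wraparound look like an ordinary +1 step. */
static int newest_page(const uint8_t c[NPAGES])
{
    for (int i = 0; i < NPAGES; i++) {
        int next = (i + 1) % NPAGES;
        if ((uint8_t)(c[next] - c[i]) != 1)
            return i; /* sequence breaks after page i */
    }
    return 0; /* not reached: the loop always finds a break */
}

int main(void)
{
    uint8_t c1[NPAGES] = { 4, 2, 3 };     /* from the example above */
    uint8_t c2[NPAGES] = { 255, 0, 254 }; /* wrapped-around case    */

    int n1 = newest_page(c1), n2 = newest_page(c2);
    printf("newest %d, oldest %d\n", n1, (n1 + 1) % NPAGES); /* 0, 1 */
    printf("newest %d, oldest %d\n", n2, (n2 + 1) % NPAGES); /* 1, 2 */
    return 0;
}
Note that for the fresh all-zero state the scan returns page 0, which is exactly where writing should start.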
This link has more information about this subject: Is there a general algorithm for microcontroller EEPROM wear leveling?
Your idea of using a circular buffer is a good one. All you need in addition to that are two indices, one to point at the oldest page and one to point at the newest. You need to update those indices whenever you add or replace a page.
The reason you need both is that in the beginning -- until the buffer is full -- only one of them will be advancing while the other remains stationary.
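A rough C sketch of that two-index bookkeeping (my own illustration; in practice the indices themselves would have to live somewhere persistent too):
#include <stdio.h>

#define NPAGES 3

struct ring {
    int oldest; /* page to erase next */
    int newest; /* page most recently written */
    int count;  /* pages used so far, saturates at NPAGES */
};

/* Called each time a page has been filled and we move on. */
static void advance(struct ring *r)
{
    r->newest = (r->newest + 1) % NPAGES;
    if (r->count < NPAGES)
        r->count++;                            /* still filling: oldest stays put */
    else
        r->oldest = (r->oldest + 1) % NPAGES;  /* full: both indices advance */
}

int main(void)
{
    struct ring r = { 0, -1, 0 }; /* nothing written yet */
    for (int i = 0; i < 5; i++) {
        advance(&r);
        printf("newest=%d oldest=%d\n", r.newest, r.oldest);
    }
    return 0;
}
The output shows the filling phase (oldest stays at 0 for the first three writes) and then both indices advancing in lockstep.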
I handle this kind of cycle like this:
// init (PAGE0_ADDR .. PAGE2_ADDR are the device-specific page addresses)
int page0 = PAGE0_ADDR; // oldest data
int page1 = PAGE1_ADDR; // old data
int page2 = PAGE2_ADDR; // actual data (page being written)
// after page 2 is full, rotate the roles
int tmp = page0;
page0 = page1;
page1 = page2;
page2 = tmp;
This way you always know which page is which:
page0 - always the oldest data
page1 - always the old data
page2 - always the actual data
It is easily extendable to any number of pages. Instead of the address you can store the page number; use whatever is more suitable for your task.
I have a static database of ~60,000 rows. There is a certain column for which there are ~30,000 unique entries. Given that ratio (60,000 rows/30,000 unique entries in a certain column), is it worth creating a new table with those entries in it, and linking to it from the main table? Or is that going to be more trouble than it's worth?
To put the question in a more concrete way: Will I gain a lot more efficiency by separating out this field into its own table?
** UPDATE **
We're talking about a VARCHAR(100) field, but in reality, I doubt any of the entries use that much space -- I could most likely trim it down to VARCHAR(50). Example entries: "The Gas Patch and Little Canada" and "Kora Temple Masonic Bldg. George Coombs"
If the field is a VARCHAR(255) that normally contains about 30 characters, and the alternative is to store a 4-byte integer in the main table and use a second table with a 4-byte integer and the VARCHAR(255), then you're looking at some space saving.
Old scheme:
T1: 30 bytes * 60 K entries = 1800 KiB.
New scheme:
T1: 4 bytes * 60 K entries = 240 KiB
T2: (4 + 30) bytes * 30 K entries = 1020 KiB
So, that's crudely 1800 - 1260 = 540 KiB space saving. If, as would be necessary, you build an index on the integer column in T2, you lose some more space. If the average length of the data is larger than 30 bytes, the space saving increases. If the ratio of repeated rows ever increases, the saving increases.
Whether the space saving is significant depends on your context. If you need half a megabyte more memory, you just got it — and you could squeeze more if you're sure you won't need to go above 65535 distinct entries by using 2-byte integers instead of 4 byte integers (120 + 960 KiB = 1080 KiB; saving 720 KiB). On the other hand, if you really won't notice the half megabyte in the multi-gigabyte storage that's available, then it becomes a more pragmatic problem. Maintaining two tables is harder work, but guarantees that the name is the same each time it is used. Maintaining one table means that you have to make sure that the pairs of names are handled correctly — or, more likely, you ignore the possibility and you end up without pairs where you should have pairs, or you end up with triplets where you should have doubletons.
Clearly, if the type that's repeated is a 4 byte integer, using two tables will save nothing; it will cost you space.
A lot, therefore, depends on what you've not told us. The type is one key issue. The other is the semantics behind the repetition.
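To make the trade-off easy to re-run with other numbers, here is a small C sketch using the same crude arithmetic as above (my own; per-row and index overhead are deliberately ignored, as they are in the estimates above):
#include <stdio.h>

/* Crude space estimate: one denormalized table vs. a main table with an
 * integer key plus a lookup table holding the distinct strings. */
static void estimate(long rows, long distinct, long avg_len, long key_size)
{
    long old_scheme = rows * avg_len;
    long new_scheme = rows * key_size + distinct * (key_size + avg_len);
    printf("old %ld bytes, new %ld bytes, saving %ld bytes\n",
           old_scheme, new_scheme, old_scheme - new_scheme);
}

int main(void)
{
    estimate(60000, 30000, 30, 4); /* 4-byte key, as in the answer */
    estimate(60000, 30000, 30, 2); /* 2-byte key variant */
    return 0;
}
With the 60,000 rows / 30,000 distinct / 30-byte figures, this reproduces the roughly 540 KiB and 720 KiB savings worked out above.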