I've been trawling books online and Google incantations trying to find out what fill factor physically is in a leaf page (SQL Server 2000 and 2005).
I understand that it's the amount of room left free on a page when an index is created, but what I've not found is how that space is actually left: i.e., is it one big chunk towards the end of the page, or is it several gaps throughout the data?
For example (just to keep things simple), assume a page can hold only 100 rows. If the fill factor is stated to be 75%, does this mean that the first (or last) 75% of the page is data and the rest is free, or is every fourth row free (i.e., the page looks like: data, data, data, free, data, data, data, free, ...)?
The long and short of this is that I'm trying to get a handle on exactly what physical operations occur when a row is inserted into a table with a clustered index and the insert isn't happening at the end of the table. If multiple gaps are left throughout a page, then an insert has minimal impact (at least until a page split), as the number of rows that may need to be moved to accommodate the insert is minimised. If the free space is one big chunk at the end of the page, then the overhead of juggling the rows around would (in theory at least) be significantly greater.
If someone knows an MSDN reference, please point me to it! I can't find one at the moment (still looking, though). From what I've read it's implied that there are many gaps, but this doesn't seem to be explicitly stated.
From MSDN:
The fill-factor setting applies only when the index is created or rebuilt. The SQL Server Database Engine does not dynamically keep the specified percentage of empty space in the pages. Trying to maintain the extra space on the data pages would defeat the purpose of fill factor because the Database Engine would have to perform page splits to maintain the percentage of free space specified by the fill factor on each page as data is entered.
and, further:
When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations. When frequent page splits occur, the index can be rebuilt by using a new or existing fill factor value to redistribute the data.
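To make the split mechanics concrete, here is a toy sketch in Python (my own illustration, not SQL Server's actual algorithm): when a page is full, roughly half its rows move to a new page, and the insert is retried.

```python
# Toy model of a page split (illustration only, not SQL Server's
# actual implementation): a "page" is a bounded, sorted list of keys.

PAGE_CAPACITY = 4  # tiny capacity so splits are easy to trigger

def insert(pages, key):
    # Pick the page whose key range should hold the new key.
    page = next((p for p in pages if not p or key <= p[-1]), pages[-1])
    if len(page) < PAGE_CAPACITY:
        page.append(key)
        page.sort()
        return
    # Page is full: move roughly half the rows to a new page (the split),
    # then retry the insert; one of the halves now has room.
    half = len(page) // 2
    new_page = page[half:]
    del page[half:]
    pages.insert(pages.index(page) + 1, new_page)
    insert(pages, key)

pages = [[]]
for k in (10, 20, 30, 40, 25):
    insert(pages, k)
print(pages)  # [[10, 20], [25, 30, 40]]
```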
SQL Server's data page consists of the following elements:
Page header: 96 bytes, fixed.
Data: variable.
Row offset array: variable.
The row offset array is always stored at the end of the page and grows backwards.
Each element of the array is the 2-byte value holding the offset to the beginning of each row within the page.
Rows are not ordered within the data page: instead, their order (in the case of clustered storage) is determined by the row offset array. It's the row offsets that are sorted.
Say we insert a 100-byte row with a cluster key value of 10 into a clustered table and it goes into a free page. It gets inserted as follows:
[00 - 95 ] Header
[96 - 195 ] Row 10
[196 - 8189 ] Free space
[8190 - 8191 ] Row offset array: [96]
Then we insert a new row into the same page, this time with the cluster key value of 9:
[00 - 95 ] Header
[96 - 195 ] Row 10
[196 - 295 ] Row 9
[296 - 8187 ] Free space
[8188 - 8191 ] Row offset array: [196] [96]
The row is prepended logically but appended physically.
The offset array is reordered to reflect the logical order of the rows.
Given this, we can easily see that rows are appended to the free space starting from the beginning of the page, while pointers to the rows are prepended to the free space starting from the end of the page.
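Here is a minimal Python sketch of that slot-array behaviour (an illustration of the layout described above, not real storage-engine code): rows are appended physically in arrival order, while the offset array is kept sorted by key.

```python
# Toy model of the row offset (slot) array: physical order is
# append-only, logical order lives entirely in the sorted slots.

class Page:
    def __init__(self):
        self.rows = []   # physical storage, append-only
        self.slots = []  # indices into self.rows, sorted by key

    def insert(self, key, payload):
        self.rows.append((key, payload))       # physical append
        self.slots.append(len(self.rows) - 1)  # new slot entry
        self.slots.sort(key=lambda i: self.rows[i][0])  # logical order

    def scan(self):
        # Logical (key) order, regardless of physical placement.
        return [self.rows[i] for i in self.slots]

p = Page()
p.insert(10, "row 10")  # first row
p.insert(9, "row 9")    # physically after row 10, logically before it
print(p.rows)   # [(10, 'row 10'), (9, 'row 9')]  physical order
print(p.scan()) # [(9, 'row 9'), (10, 'row 10')]  logical order
```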
This is the first time I've thought about this, and I'm not positive about the conclusion, but:
Since the smallest amount of data that SQL Server can retrieve in a single read IO is one complete page, why would the rows within a single page need to be physically sorted in the first place? I'd bet that they're not, so even if the free space is all in one big chunk at the end, new records can be added at the end regardless of whether that matches the sort order (since there's no reason to sort records within a page in the first place).
Secondly, thinking about the write side of the IO, I think the smallest write unit is an entire page as well (even the smallest change requires the entire page to be written back to disk). This means that all the rows on a page could be sorted in memory every time the page is written: even if you were inserting at the beginning of a sorted set of rows on a single page, the whole page gets read out, the new record is inserted into its proper slot in memory, and then the whole new sorted page gets written back to disk...
Recently I started doing some research on PostgreSQL free space management and defragmentation. I understand that each heap page contains a page header, item identifiers, free space, and items. When inserting a new tuple, a new item identifier is added at the beginning of the free space and the new item data is inserted at the end of the free space.
After using VACUUM, the item identifiers and item data of dead tuples will be removed. If the removed item identifier is in the middle of other identifiers, there will be a gap between the identifiers. Since new identifiers will usually be added at the beginning of the free space, will this freed space in between ever be reused? If so, how can we find this space?
Here is a visual example of this scenario:
There is unused space between (0,3) and (0,5) after removing some tuples. How will this space be reused?
Thanks!
The PostgreSQL technical term for what you call “item identifier” is “line pointer”. The “item pointer” or “tuple identifier” is a combination of page number and line pointer (the (0,5) in your image).
This indirection is awkward at first glance, but the advantage is that the actual tuple data can be re-shuffled any time to defragment the free space without changing the address of the tuples.
The line pointers form an array after the page header. When a new tuple is added to a page, any free line pointer can be reused. If there is no free line pointer, a new one is added at the end of the array. For reference, see PageAddItemExtended in src/backend/storage/page/bufpage.c.
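A toy Python model of that reuse rule (just an illustration of the behaviour PageAddItemExtended implements, not its actual code):

```python
# Toy model of line-pointer reuse. None marks an unused (reusable)
# line pointer slot.

line_pointers = ["tuple A", "tuple B", "tuple C"]

# VACUUM removes tuple B but keeps its line pointer slot around:
line_pointers[1] = None

def add_tuple(lp, data):
    # Reuse any free line pointer first...
    for i, slot in enumerate(lp):
        if slot is None:
            lp[i] = data
            return i
    # ...otherwise extend the array at the end.
    lp.append(data)
    return len(lp) - 1

print(add_tuple(line_pointers, "tuple D"))  # 1 -> reused B's slot
print(add_tuple(line_pointers, "tuple E"))  # 3 -> appended at the end
```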
Assume that the table contains 2 pages and we look at the first page (page 0). Let's assume some data has been inserted and the page has three tuples (rows). Now, if we delete tuple 2, PG removes tuple number 2, reorders the remaining tuples to defragment the page, and then updates both the FSM and the VM for this page. PostgreSQL (vacuum) continues this process up to the last page.
When you run VACUUM, the now-unnecessary line pointers are not removed; they will be reused in the future.
That is because, if line pointers were removed, all index tuples of the associated indexes would have to be updated.
The line pointer array is never compacted unless the page is truncated away and later re-added. The unused array slots will be reused as new tuples are added. If the page was once full of a large number of small tuples, then deleted and repopulated with a small number of large tuples, there will be extra unused line pointers taking up a small amount of space that will never be reclaimed.
You can use heap_page_items from pageinspect, filtering on lp_off = 0, to find them (see the sketch below).
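For example, a minimal sketch using psycopg2 (assuming the pageinspect extension is installed; the table name 'mytable' and the connection string are placeholders):

```python
# Find unused line pointers on page 0 of a table via pageinspect.
# Assumes CREATE EXTENSION pageinspect has been run.

import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection
cur = conn.cursor()
cur.execute("""
    SELECT lp, lp_off, lp_flags
    FROM heap_page_items(get_raw_page('mytable', 0))
    WHERE lp_off = 0
""")
for lp, lp_off, lp_flags in cur.fetchall():
    # lp_flags = 0 means LP_UNUSED: the slot is free for reuse.
    print(f"line pointer {lp} is unused (lp_off={lp_off}, flags={lp_flags})")
```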
Let's say I have a 2D array. Along the first axis I have a series of properties for one individual measurement; along the second axis I have a series of measurements.
So, for example, the array could look something like this:
personA personB personC
height 1.8 1.75 2.0
weight 60.5 82.0 91.3
age 23.1 65.8 48.5
or anything similar.
I want to change the size of the array very often: for example, ignoring personB's data and including personD and personE. I will be looping through "time", probably with >10^5 timesteps. At each timestep, there is a chance that each "person" in the array will be deleted, and a chance that they will introduce several new people into the simulation.
From what I can see there are several ways to manage an array like this:
Overwriting and infrequent reallocation
I could use a very large array with an extra row, in which I put a "skip" flag. So, if I decide I no longer need personB, I set the flag to 1 and ignore personB every time I loop through the list of people. When I need to add personD, I search through the list for the first person with skip == 1, replace that data with the data for personD, and set skip = 0. If there aren't any people with skip == 1, I copy the array, deallocate it, reallocate it with several more columns, and then fill the first new column with personD's data. (This option is sketched in code after the list below.)
Advantages:
infrequent allocation - possibly better performance
easy access to array elements
easier to optimise
Disadvantages:
if my array shrinks a lot, I'll be wasting a lot of memory
I need a whole extra row in the data, and I have to perform checks to make sure I don't use the irrelevant data. If the array shrinks from 1000 people to 1, I'm going to have to loop through 999 extra records
could encounter memory issues, if I have a very large array to copy
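A minimal sketch of this first option, assuming Python with NumPy (the question itself is language-agnostic, so treat this as illustrative pseudocode that happens to run):

```python
# Skip-flag approach: people are columns; a boolean mask plays the
# role of the skip row, and the array is only grown when it is full.

import numpy as np

data = np.array([[1.80, 1.75, 2.00],   # height
                 [60.5, 82.0, 91.3],   # weight
                 [23.1, 65.8, 48.5]])  # age
alive = np.array([True, True, True])   # inverse of the skip flag

# "Delete" personB: just flip the flag, leave the column in place.
alive[1] = False

# Add personD: reuse the first dead slot if one exists...
new_person = np.array([1.65, 55.0, 30.2])
dead = np.flatnonzero(~alive)
if dead.size:
    data[:, dead[0]] = new_person
    alive[dead[0]] = True
else:
    # ...otherwise grow the array (the infrequent reallocation).
    data = np.hstack([data, new_person[:, None]])
    alive = np.append(alive, True)

print(data[:, alive])  # only the live columns
```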
Frequent reallocation
Every time I want to add or remove some data, I copy and reallocate the entire array. (Also sketched after the list below.)
Advantages:
I know every piece of data in the array is relevant, so I don't have to check them
easy access to array elements
no wasted memory
easier to optimise
Disadvantages:
probably slow
could encounter memory issues, if I have a very large array to copy
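For contrast, the frequent-reallocation option in the same assumed NumPy setting; note that np.delete and np.hstack each build a fresh copy of the array:

```python
# Frequent reallocation: every add or remove copies the whole array.

import numpy as np

data = np.array([[1.80, 1.75, 2.00],   # height
                 [60.5, 82.0, 91.3],   # weight
                 [23.1, 65.8, 48.5]])  # age

data = np.delete(data, 1, axis=1)             # drop personB (full copy)
personD = np.array([[1.65], [55.0], [30.2]])
data = np.hstack([data, personD])             # add personD (another full copy)
print(data.shape)  # (3, 3)
```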
A linked list
I refactor everything so that each individual's data includes a pointer to the next individual's data. Then, when I need to delete an individual, I simply remove it from the pointer chain and deallocate it, and when I need to add an individual, I just add some pointers to the end of the chain. (Sketched after the list below.)
Advantages:
every record in the chain is relevant, so I don't have to do any checks
no wasted memory
less likely to encounter memory problems, as I don't have to copy the entire array at once
Disadvantages:
no easy access to array elements. I can't slice with data(height,:), for example
more difficult to optimise
I'm not sure how this option will perform compared to the other two.
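And a sketch of the linked-list option, again assuming Python:

```python
# Linked-list approach: deletion is an O(1) relink once the node is
# found, at the cost of easy slicing across all individuals.

class Person:
    def __init__(self, height, weight, age):
        self.props = (height, weight, age)
        self.next = None

head = Person(1.80, 60.5, 23.1)            # personA
head.next = Person(1.75, 82.0, 65.8)       # personB
head.next.next = Person(2.00, 91.3, 48.5)  # personC

# Delete personB: unlink it from the chain.
head.next = head.next.next

# Traversal replaces slicing like data(height, :).
node = head
while node:
    print(node.props)  # personA, then personC
    node = node.next
```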
--
So, to the questions: are there other options? When should I use each of these options? Is one of these options better than all of the others in my case?
I want to load data with 4 columns and 80 million rows from MySQL into Redis, so that I can reduce fetching delay.
However, when I try to load all the data, it becomes 5 times larger.
The original data was 3 GB (when exported to CSV format), but when I load it into Redis, it takes 15 GB... it's too large for our system.
I also tried different data types:
1) 'table_name:row_number:column_name' -> string
2) 'table_name:row_number' -> hash
but all of them take too much memory.
Am I missing something?
Added:
My data has 4 columns: user id (PK), count, created time, and a date.
The most memory-efficient way is to store the values as a JSON array and to split your keys such that you can store them in a ziplist-encoded hash.
Encode your data as, say, a JSON array, so you have key=value pairs like user:1234567 -> [21,'25-05-2012','14-06-2010'].
Split your keys into two parts, such that the second part has about 100 possibilities; for example, user:12345 and 67.
Store this combined key in a hash like this: hset user:12345 67 <json>.
To retrieve user details for user id 9876523, simply do hget user:98765 23 and parse the JSON array.
Make sure to adjust the settings hash-max-ziplist-entries and hash-max-ziplist-value. (A sketch of these steps follows below.)
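A minimal sketch of those four steps, assuming Python with redis-py (the key prefix and the 5+2 digit split are taken from the example above; the data values are placeholders):

```python
# Key-splitting scheme: the last two digits of the user id become the
# hash field, so each hash holds ~100 users and stays ziplist-encoded
# (assuming hash-max-ziplist-entries / -value are set high enough).

import json
import redis

r = redis.Redis()

def split(user_id):
    s = str(user_id)
    return s[:-2], s[-2:]          # e.g. 1234567 -> ('12345', '67')

def save_user(user_id, values):
    key, field = split(user_id)
    r.hset(f"user:{key}", field, json.dumps(values))

def load_user(user_id):
    key, field = split(user_id)
    raw = r.hget(f"user:{key}", field)
    return json.loads(raw) if raw else None

save_user(1234567, [21, "25-05-2012", "14-06-2010"])
print(load_user(1234567))  # [21, '25-05-2012', '14-06-2010']
```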
Instagram wrote a great blog post explaining this technique, so I will skip explaining why this is memory efficient.
Instead, I can tell you the disadvantages of this technique.
You cannot access or update a single attribute on a user; you have to rewrite the entire record.
You'd always have to fetch the entire JSON object, even if you only care about some fields.
Finally, you have to write this key-splitting logic yourself, which is added maintenance.
As always, this is a trade-off. Identify your access patterns and see if such a structure makes sense. If not, you'd have to buy more memory.
+1 idea that may free some memory in this case: key zipping based on a crumbs dictionary, plus base62 encoding for storing integers.
It shrinks user:12345 60 to 'u:3d7' 'Y', which takes half the memory for storing the key.
And with custom compression of the data, not into an array but into one long int: it's possible to convert [21,'25-05-2012','14-06-2010'] to such an int, 212505201214062010, since the last two parts have fixed length, so it's obvious how to pack/unpack such a value (sketched below).
So the whole set of keys/values is now about 1.75 times smaller.
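A sketch of that fixed-width packing (my own illustration of the idea, using the example values above):

```python
# Pack count + two fixed-width dates into one integer and back.
# The dates are dd-mm-yyyy: stripping dashes leaves 8 digits each,
# so the count takes whatever digits remain at the front.

def pack(count, date1, date2):
    return int(str(count) + date1.replace("-", "") + date2.replace("-", ""))

def unpack(n):
    s = str(n)
    count, d1, d2 = int(s[:-16]), s[-16:-8], s[-8:]
    def as_date(d):
        return d[:2] + "-" + d[2:4] + "-" + d[4:]
    return [count, as_date(d1), as_date(d2)]

n = pack(21, "25-05-2012", "14-06-2010")
print(n)          # 212505201214062010
print(unpack(n))  # [21, '25-05-2012', '14-06-2010']
```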
If your codebase is Ruby-based, I may suggest the me-redis gem, which seamlessly implements all the ideas from Sripathi's answer plus the ones given here.
I was googling for something unrelated when I happened upon this page:
http://msdn.microsoft.com/en-us/library/ff647694.aspx
Which gives the following when dealing with Command objects:
"Use CommandBehavior.SequentialAccess for very wide rows or for rows with binary large objects (BLOBs)."
So, when is a row considered to be very wide?
On SQL Server, a row can only be as wide as the page size minus some overhead: a maximum of 8060 bytes. The only exceptions are rows with out-of-row storage, i.e., rows with a BLOB where the row itself stores only a pointer to the actual data.
Therefore, a very wide row would be one near or above about 8000 bytes.
I've been trying to estimate the size of an Access table with a certain number of records.
It has 4 Longs (4 bytes each), and a Currency (8 bytes).
In theory: 1 record = 24 bytes, so 500,000 records = ~11.5MB.
However, the accdb file (even after compacting) increases by almost 30MB (~61 bytes per record). A few extra bytes of padding wouldn't be so bad, but 2.5x seems a bit excessive, even for Microsoft bloat.
What's with the discrepancy? The four Longs form a compound key; would that matter?
This is the result of my tests, all conducted with an A2003 MDB, not with A2007 ACCDB:
98,304 IndexTestEmpty.mdb
131,072 IndexTestNoIndexesNoData.mdb
11,223,040 IndexTestNoIndexes.mdb
15,425,536 IndexTestPK.mdb
19,644,416 IndexTestPKIndexes1.mdb
23,838,720 IndexTestPKIndexes2.mdb
24,424,448 IndexTestPKCompound.mdb
28,041,216 IndexTestPKIndexes3.mdb
28,655,616 IndexTestPKCompoundIndexes1.mdb
32,849,920 IndexTestPKCompoundIndexes2.mdb
37,040,128 IndexTestPKCompoundIndexes3.mdb
The names should be pretty self-explanatory, I think. I used an append query with Rnd() to append 524,288 records of fake data, which made the file 11MBs. The indexes I created on the other fields were all non-unique. As you can see, though, the compound 4-column index increased the size from 11MBs (no indexes) to well over 24MBs, while a PK on the first column increased the size only from 11MBs to 15.4MBs (using fake MBs, of course, i.e., like hard drive manufacturers).
Notice how each single-column index added approximately 4MBs to the file size. If you consider that the 4 columns with no indexes totalled 11MBs, that seems about right based on my comment above, i.e., that each index should increase the file size by about the amount of data in the field being indexed. I am surprised that the clustered index did this, too; I thought the clustered index would use less space, but it doesn't.
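As a back-of-envelope check of that observation (my arithmetic, not part of the original tests): the raw key data in one indexed Long column is about half of what each index actually added, the rest presumably being page and pointer overhead.

```python
# Rough check: raw key data in one Long column vs. the file growth
# the PK actually caused (figures from the test listing above).

records = 524288
raw_key_bytes = records * 4                 # one Long column of keys
pk_growth = 15425536 - 11223040             # IndexTestPK - IndexTestNoIndexes

print(raw_key_bytes)                        # 2097152 (~2 real MBs)
print(pk_growth)                            # 4202496 (~4.2 fake MBs)
print(round(pk_growth / raw_key_bytes, 2))  # ~2.0x the raw key data
```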
For comparison, a non-PK (i.e., non-clustered) unique index on the first column, starting from IndexTestNoIndexes.mdb, gives exactly the same size as the database with the first column as the PK, so there are no space savings from the clustered index at all. On the off chance that the ordinal position of the indexed field might make a difference, I also tried a unique index on the second column only, and it came out exactly the same size.
Now, I hadn't read your question carefully and had omitted the Currency field, but if I add it to the non-indexed table and the table with the compound index and populate it with random data, I get this:
98,304 IndexTestEmpty.mdb
131,072 IndexTestNoIndexesNoData.mdb
11,223,040 IndexTestNoIndexes.mdb
15,425,536 IndexTestPK.mdb
15,425,536 IndexTestIndexUnique2.mdb
15,425,536 IndexTestIndexUnique1.mdb
15,482,880 IndexTestNoIndexes+Currency.mdb
19,644,416 IndexTestPKIndexes1.mdb
23,838,720 IndexTestPKIndexes2.mdb
24,424,448 IndexTestPKCompound.mdb
28,041,216 IndexTestPKIndexes3.mdb
28,655,616 IndexTestPKCompoundIndexes1.mdb
28,692,480 IndexTestPKCompound+Currency.mdb
32,849,920 IndexTestPKCompoundIndexes2.mdb
37,040,128 IndexTestPKCompoundIndexes3.mdb
The points of comparison are:
11,223,040 IndexTestNoIndexes.mdb
15,482,880 IndexTestNoIndexes+Currency.mdb
24,424,448 IndexTestPKCompound.mdb
28,692,480 IndexTestPKCompound+Currency.mdb
So, the Currency field added another 4.5MBs, and its index added another 4MBs. And if I add non-unique indexes to the 2nd, 3rd and 4th Long fields, the database grows to 41,336,832 bytes, an increase of just under 12MBs (or ~4MBs per additional index).
So, this basically replicates your results, no? And I ended up with the same file sizes, roughly speaking.
The answer to your question is INDEXES, though there is obviously more overhead in the A2007 ACCDB format, since I saw an increase of only 20MBs, not 30MBs.
One thing I did notice was that I could add an index that made the file larger, then delete the index and compact, and the file would return to exactly the same size as before. So you should be able to take a single copy of your database and experiment with what removing the indexes does to your file size.