I am doing performance tuning on one of the largest tables in our project. While reading about indexes I came across partial indexes in PostgreSQL. It sounds like a very nice idea to index only the rows that are accessed frequently.
However, I am not able to figure out how a partial index gets updated. For example, I have a table with the following columns:
task_uuid, job_id, enqueued_at, updated_at, task_status
task_status is one of: ENQUEUED, RUNNING, ASSIGNED, FAILED
We search for records in the ENQUEUED state very frequently. If we add a partial index on (task_uuid, task_status) it will build a unique key and improve the performance. But I want to know what happens when the record gets updated, for example when we update it to RUNNING status. (task_uuid, task_status) is still unique, but will the record be removed from the partial index, since it no longer fulfills the condition?
If we add a partial index on (task_uuid, task_status) it will build a unique key and improve the performance.
It will only be built as unique if you specify that in the definition of the index. Otherwise it won't be a unique index, even if those columns do happen to be unique.
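A minimal sketch of that point, using Python's built-in sqlite3 module as a stand-in so it can be run anywhere (SQLite also supports partial indexes, and PostgreSQL accepts the same CREATE UNIQUE INDEX ... WHERE form). The table name tasks, the index name, and the sample values are hypothetical. It only illustrates what the definition means, not how PostgreSQL maintains the index internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (task_uuid TEXT, job_id INTEGER, task_status TEXT)"
)
# UNIQUE has to be spelled out; the WHERE clause is what makes the index partial.
conn.execute(
    "CREATE UNIQUE INDEX tasks_enqueued_idx "
    "ON tasks (task_uuid, task_status) "
    "WHERE task_status = 'ENQUEUED'"
)

conn.execute("INSERT INTO tasks VALUES ('t-1', 1, 'ENQUEUED')")

# A second ENQUEUED row with the same key violates the partial unique index.
try:
    conn.execute("INSERT INTO tasks VALUES ('t-1', 2, 'ENQUEUED')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Rows outside the predicate are not covered by the index,
# so duplicates are allowed among them.
conn.execute("INSERT INTO tasks VALUES ('t-2', 3, 'RUNNING')")
conn.execute("INSERT INTO tasks VALUES ('t-2', 4, 'RUNNING')")
running_rows = conn.execute(
    "SELECT COUNT(*) FROM tasks WHERE task_status = 'RUNNING'"
).fetchone()[0]

print(duplicate_rejected, running_rows)
```

Uniqueness is enforced only among rows that satisfy the index predicate, which is why an index on (task_uuid, task_status) restricted to ENQUEUED rows says nothing about rows in other states.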
When a record gets updated so that it no longer matches the WHERE predicate of the index, nothing happens to the index. It still has a pointer to the row, it just points to something no longer valid. If you did specify the index as UNIQUE, then upon inserting a conflicting index tuple, it will follow the pointer for the old tuple to the table, realize it is invalid, and allow the insertion to continue.
The next time the table is vacuumed, those obsolete pointers will be cleaned up. Queue tables with partial indexes should usually be vacuumed frequently (more frequently than the default) because the index bloats easily. The Autovac settings depend on the fraction of the table rows obsoleted, not on the fraction of the index rows obsoleted. For partial indexes, these fractions are not the same. (On the other hand, you don't seem to have a status for "COMPLETED". If completed tasks are immediately deleted, perhaps the queue table will stay small enough that this does not matter.)
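To put rough numbers on this, here is a small sketch of the autovacuum trigger arithmetic. It assumes the documented rule (vacuum once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * table rows) with the default values 50 and 0.2. The table size and the number of indexed rows are hypothetical, and treating every dead tuple as one obsolete index entry is a simplifying worst-case assumption.

```python
def autovacuum_trigger(table_rows, threshold=50, scale_factor=0.2):
    """Dead-tuple count at which autovacuum vacuums the table."""
    return threshold + scale_factor * table_rows

table_rows = 10_000_000   # hypothetical size of the whole queue table
indexed_rows = 20_000     # hypothetical number of ENQUEUED rows in the partial index

trigger = autovacuum_trigger(table_rows)
table_fraction = trigger / table_rows
# Assumed worst case: every dead tuple left one obsolete entry in the partial index.
obsolete_per_live_entry = trigger / indexed_rows

print(f"autovacuum after about {trigger:,.0f} dead tuples")
print(f"that is {table_fraction:.1%} of the table")
print(f"but about {obsolete_per_live_entry:.0f} obsolete index entries per live one")
```

If the numbers look like this for your table, a per-table setting such as ALTER TABLE tasks SET (autovacuum_vacuum_scale_factor = 0.01) is one way to make autovacuum run more often; the right value depends on your workload.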
Also, when an index scan follows the pointer from the index into the table and finds the row is no longer visible to anyone, it will mark the index entry as dead. Then future index scans won't have to pointlessly jump to the table. But this "microvacuum" only happens for regular index scans, not bitmap scans, and it only happens for queries done on the master, not for any done just on a hot standby.
I have read that an RDBMS stores table data on disk in some form of B-tree, and also that table indexes are stored in B-tree form.
I have also read that a primary-key index is created automatically when a primary key is defined, but that it can be dropped at any time. This implies that the primary-key index is an additional structure next to the B-tree used for storing the table data.
Isn't that a waste of resources? Why wouldn't all table data be kept in the primary-key index?
If it doesn't work like that, which order is used for the B-tree that stores the table data?
Thanks for clarifying
The primary key index is an optimization for finding the place on disk where the row is held. As a structure, it contains simply the PK data, not the whole row.
On a database, performance is often gated by how many pages are read from disk vs. cache. Since the PK index is smaller than the whole table, it is more likely to be in cache, it causes fewer blocks to be read from disk, and fewer blocks of other tables are evicted from cache. It therefore is a major performance optimization.
Further, while modifying the table data, rows are locked. If the primary key were being scanned from the table data on disk, locked rows would slow access for all the other queries. By keeping the index as a separate structure, the index can be used even while the row being pointed to is locked.
So overall, the separate PK structure is a classic space-for-time optimization.
EDIT What is the order of the rows in the table? The following answer is for Oracle, but is applicable to many databases.
Short answer: rows are not ordered on disk which is why the PK index (and other indexes) are so important.
Long answer:
While the primary-key B-tree structure is necessarily sorted, the rows of the table are scattered across the tablespace. To understand this we need to drill down through the various data structures.
First, the database is structured into logical entities called tablespaces. A tablespace is the space in one or more files on one or more disks. The files start empty. When the tablespace becomes full (technically, when the data in it reaches a threshold) the tablespace can be grown automatically. It can also be grown manually by enlarging a file (adding an 'extent') or by adding new files. Tablespaces can be clustered across multiple machines as well as disks.
Second: A tablespace is divided into segments, each segment for the use of a single table or index.
Third: The segment is divided into blocks, each block has space for one or more rows. These blocks are not the same as disk or OS blocks; Oracle blocks are one or more OS blocks. (This is for transportability, and for managing media with different block sizes).
On insert, a database will select a space in a block from anywhere in the tablespace. The row can be inserted sequentially (especially when bulk inserting into an empty table), but normally the database will also reuse space where rows have been deleted or moved due to some types of update. While the placement is theoretically somewhat predictable, in practice you should never rely on or expect the row to be placed in any specific block.
One interesting thing in Oracle is the ROWID. This is the reference stored in the index that allows the DB to look up the row:
An extended rowid has a four-piece format, OOOOOOFFFBBBBBBRRR:
The first 6 characters, OOOOOO, represent the data object number, using 32 bits.
The next 3 characters, FFF, represent the tablespace-relative datafile number, using 10 bits.
The next 6 characters, BBBBBB, represent the block number, using 22 bits.
The last 3 characters, RRR, represent the row number, using 16 bits.
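As an illustration of the four pieces, here is a small sketch that splits an 18-character extended rowid and decodes each piece. It assumes the characters are base-64 digits in the order A-Z, a-z, 0-9, +, / (values 0 to 63); the helper names are hypothetical, and inside Oracle itself you would normally use the DBMS_ROWID package instead.

```python
import string

# Assumed digit alphabet: A-Z, a-z, 0-9, '+', '/' map to 0..63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def _decode(chars):
    """Interpret a run of characters as a base-64 number."""
    value = 0
    for ch in chars:
        value = value * 64 + ALPHABET.index(ch)
    return value

def split_rowid(rowid):
    """Split an extended rowid (OOOOOOFFFBBBBBBRRR) into its four pieces."""
    if len(rowid) != 18:
        raise ValueError("an extended rowid must be 18 characters long")
    return {
        "data_object": _decode(rowid[0:6]),
        "relative_file": _decode(rowid[6:9]),
        "block": _decode(rowid[9:15]),
        "row": _decode(rowid[15:18]),
    }

parts = split_rowid("AAAAECAABAAAAgiAAA")
print(parts)
```

The sample value is just an example of the format; the point is that the rowid is a physical address (object, file, block, row), not a key value.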
For much more detail, see http://docs.oracle.com/cd/E11882_01/server.112/e25789/logical.htm#autoId0
One other thought: There is a concept in the DB world called partitioning, where a dataset is divided across different tablespaces (frequently different disks or nodes in a cluster) depending on some expression logic. For example, on a table of customers, a horizontal partition could be defined by the country of the person. That way you can ensure that the US customers are physically on one disk while the Australians are on another.
[11] says:
"In a nonclustered index, the leaf level does not contain all the data. In addition to the key values, each index row in the leaf level (the lowest level of the tree) contains a bookmark that tells SQL Server where to find the data row corresponding to the key in the index.
A bookmark can take one of two forms. If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. If the table is a heap (in other words, it has no clustered index), the bookmark is a row identifier (RID), which is an actual row locator in the form File#:Page#:Slot#."
Is this a copy of the clustered index key, or does the nonclustered index hold a pointer to it?
Must the whole clustered index structure, i.e. the B-tree with its intermediate levels, be traversed to get to the row data through a non-clustered index bookmark on a clustered table?
What does a clustered index introduce that makes direct referencing impossible?
Update:
Let me re-phrase the question. I can read up on how this is done myself, but I want to understand why it is done this way.
Wouldn't it be much more efficient to keep referencing row data by RID from a non-clustered index after a clustered one is added?
Suppose a table has only non-clustered index(es) (but no clustered index).
The non-clustered index leaves contain RIDs to the real data, giving direct access to row data without any lookups or traversals.
Adding a clustered index means eliminating the IAM (Index Allocation Map) pages and replacing all RIDs in all non-clustered indexes with clustered index keys, plus the necessity of an additional lookup instead of direct access.
What is the point in this?
Update2:
Was my question downvoted by Microsoft himself? Thanks again, this is an honor.
It is pointless to downvote without explaining.
Update3:
@PerformanceDBA, I could not understand this phrase in your answer:
"It also means the B-Tree is reduced by one level in index height (which is why they are tiny if you inspected them)."
Can you explain it?
Yes, I'd like illustrations.
I started to read: Debunking myths about clustered indexes - part 4 (CIXs, TPC-C & Oracle clusters) and it, as many other sources, explicitly refers to the fact that SQL Server, in contrast to Oracle, lacks direct access features on a clustered table.
Update4 (Update5 - corrected by strike-out):
A few answerers referred to the fact that a bookmark CI key in NCI leaf is for address independence in case of page splits.
During index reorganization or de-fragmentation of an NCI (non-clustered index), aren't the rows relocated and the corresponding RIDs in the NCI modified?
This seems to me to be a deficiency of the addressing scheme: the row should have moved with its address, shouldn't it?
Also, is a heap completely immune to page splits, for example when variable-size columns in a row grow?
Related questions:
What do I miss in understanding the clustered index?
Why/when/how is whole clustered index scan chosen rather than full table scan?
Reasons not to have a clustered index in SQL Server 2005
Cited:
[11] Kalen Delaney (Solid Quality Learning), Inside Microsoft® SQL Server™ 2005: The Storage Engine. Microsoft Press, October 11, 2006. Print ISBN-10: 0-7356-2105-5; Print ISBN-13: 978-0-7356-2105-3. 464 pages.
[11a] p. 250, section "Index Organization" in Chapter 7, "Index Internals and Management". A helpful online copy of it, though without any credit to the source, is at http://sqlserverindexeorgnization.blogspot.com/
The problem is that the doco is gobbledegook, and increases the very confusion it is alleging it clarifies. If you forget about all that and start again, it is actually quite straight-forward. Since you are inquiring re data storage structures, and concerned re performance, let's look at that perspective (not the logical). There is no data storage structure called "Table".
Heap
Data pages containing rows. There is no Clustered Index. The rows are not shifted as a result of Inserts/Deletes. The rows can be read in entirety (table scan) or singly (via a NonClustered Index). It gets badly fragmented.
Clustered Index
B-Tree. The Index is Clustered with the Data Rows. The leaf level is the data row. That means one less I/O on every access. It also means the B-Tree is reduced by one level in index height (which is why they are tiny if you inspected them). The Heap (entire data storage structure) is eliminated. There are no pointers. The rows are maintained in Clustered Index Key order (rows are moved on the page as a result of Insert/Delete/Expand). Pages are trimmed within the extents.
NonClustered Index
B-Tree. Full height as required by number of rows.
1. Where there is a Clustered Index, the Leaf level is the Clustered Index Key (so that it can go to the exact location in the CI, which is the row).
2. Where there is no Clustered Index, the Leaf level is a pointer: File:Page:Offset (so that it can go to the Heap, and get the row). The RowIds in the Heap do not change (if they did, every time you Inserted/Deleted one row, you would have to update all the NCI entries in all associated NCIs, for all other rows on the page).
That is why, when you create a CI, all NCIs are automatically rebuilt (they have to be switched from [2] to [1]). Obviously, always create the CI before the NCIs.
There is no File:Page:Slot; the row length is variable, so it is Offset within Page.
There is no Bookmark or other gobbledegook.
Re "No direct access to data row in clustered table - why"
Nonsense. You have direct and immediate access to each data row, via the CI (one less I/O) or the NCI⇢CI Key.
This is very fast, invented by Britton Lee; re-implemented and patented by Sybase; obtained by dishonesty and for a pittance by Darth Vader.
If you need further clarification, I can provide illustrations.
Responses to Comments
"It also means the B-Tree is reduced by one level in index height (which is why they are tiny if you inspected them)."
Let's say you have a table with 1 billion rows. The "height" of the B-Tree of any given index (eg. Unique, on PK) drawn vertically is say 8; or you can say the index is 8 levels deep, between the top (a single entry) and the bottom, the leaf level. The leaf level is of course the widest, and most populated; it will have 1 billion entries. Given that each index page contains say 256 entries, the leaf-minus-one level contains about 3.9 million entries (1 billion / 256).
The CI B-tree (index only portion) will contain 7 levels, with about 3.9 million entries at its lowest level, taking on the order of 100MB; because the leaf level IS the data row (of which there are 1 billion entries, spread nicely across 100GB), and is thus excluded, or not repeated.
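The level arithmetic can be sketched as follows. This is a simplified model that assumes every page on every level is full and holds exactly 256 entries, and that each page contributes one entry to the level above it; the function name is hypothetical. Under these assumptions the computed height can differ from the illustrative "say 8" used in the text, and real indexes vary with key width and fill factor.

```python
def btree_levels(rows, entries_per_page):
    """Entry counts per level, leaf level first, ending with the root page's entries."""
    levels = [rows]
    while levels[-1] > entries_per_page:
        pages = -(-levels[-1] // entries_per_page)  # ceiling division
        levels.append(pages)  # the parent level holds one entry per child page
    return levels

levels = btree_levels(1_000_000_000, 256)
for depth, entries in enumerate(levels):
    print(f"level {depth} (0 = leaf): {entries:,} entries")

# Everything above the leaf level is the "index only portion" of a clustered index.
non_leaf_entries = sum(levels[1:])
print(f"non-leaf entries: {non_leaf_entries:,}")
```

Whatever the exact height, the shape is the same: the leaf level dominates, and every level above it is smaller by a factor of roughly the page fan-out, which is why the index-only portion of a clustered index is small compared with the data.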
Yes, I'd like illustrations.
Ok. I have a set of finished Sybase docs; I have butchered one for you, so as to avoid confusion, and excluded the bits that Sybase has, that MS does not. Sorry. Don't follow the links, just stay on the one page. Also, the very low-level details of Fragmentation in the Heap are different, but the fragmentation in the Heap is massive in both Sybase and MS, so I have left that intact.
Data Storage Basics
(That is a condensed version of my much more elaborate Sybase diagrams, which I have butchered for the MS context. There is a link at the bottom of that doc, if you want the full Sybase set.)
"I started to read: Debunking myths about clustered indexes - part 4 (CIXs, TPC-C & Oracle clusters) and it, as many other sources, explicitly refers to the fact that SQL Server, in contrast to Oracle, lacks direct access features on a clustered table."
Be careful what you read, the web is full of superficial information; half truths discussed out of context; misinformation (from vendors as well as well-meaning ignorants). As you notice, I just answer questions; I do not waste time answering points raised in references.
Just think about this. Well-implemented Tables with a CI do not need de-fragmentation; and when implemented badly, need infrequent de-fragmentation; tables without a CI need frequent and pretty much offline de-fragmentation. That's your maintenance window running into Monday morning. Just an example of why discussing items in isolation is actually misinformation. Which is why my docs are all linked and related to one another.
"A few answerers referred to the fact that CI key in NCI leaf is for address independence in case of page splits."
Yes, I just would not put it that way; that's as confusing as the first reference you posted. Page splits have nothing to do with it. I put it the way I did in my post above on purpose, for clarity. Since the rows move (the CI keeps the pages and Extents trim), the NCI MUST have the CI key, in order to find the row. It can't use a RowId, which would keep changing all the time. Unless you have wide CI keys, this is no big deal; a 4-byte RowId (plus processing overhead) vs an 8-byte CI key (minus said overhead) ... who cares (ok, maybe you). Address the higher level issues, and the low level issues will be small enough to not warrant address. Squeezing out 1% performance improvement at the low level when your db is fragmented and unnormalised is more than a bit silly.
A system is an integrated set of components; none can be changed or evaluated in isolation. A bunch of components that are not integrated are dis-integrated, not a system. At your level of questioning, you are not yet in a position to form conclusions, or have grudges against this or that; if you do, they are premature conclusions and grudges that will impede your progress. On top of that, there is a big difference between knowledge gained by question-and-answer vs knowledge gained by reading plus experience.
"Aren't during index reorganization or defragmenting of non-clustered table with CI the rows relocated and corresponding RIDs in NCI change in NCI?"
Do you mean "non-clustered INDEX with CI" ? Well the NCIs are not worth de-fragmenting, just drop/create them.
Or do you mean "de-fragmenting a CI [whole table]" ? I have already posted, when you re-create the CI (or de-fragment it in place), the NCIs are automatically rebuilt. It is not about RowIds, it is about the change: when you drop the CI, the NCIs have to be rewritten from CI keys to RowIds; when you create the CI, the NCIs have to be changed back to CI Keys. Switched on DBAs drop the NCI before dropping the CI.
"This seems to me to be a deficiency of the addressing scheme: the row should have moved with its address, shouldn't it?"
You're getting too low-level without understanding the higher levels. If the row moves, its address changes; if the address changes, the row moves. Either you have a CI (rows move) xor you have a Heap (rows do not move).
"Also, is a heap completely immune to page splits?"
No. Page Splits still happen when variable length rows expand and there is no room on the page. But in the scheme of things, compared with the massive fragmentation on Heaps (due to rows never moving, due to the structure being RowId based, which the NCIs rely on), this is a small item.
Let me re-phrase the question. I can read up on how this is done myself, but I want to understand why it is done this way.
Wouldn't it be much more efficient to keep referencing row data by RID from a non-clustered index after a clustered one is added?
NO! If a row is inserted and a page split occurs, you would potentially have to update a lot of references that use a RID, pointing them to the new locations of the data rows that were moved to a new page. That's exactly why the SQL Server team chose to use the clustering key as the "data pointer", so to speak. The value of the clustering key does not change when a page split occurs, so no updates to the indexes are needed.
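Here is a toy model of that argument, not SQL Server's actual page format. Pages are small sorted lists, a RID is modelled as (page id, slot), and a full page splits in half into a newly allocated page. The page capacity, page ids, and function names are all hypothetical.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical and tiny, so a split is easy to trigger

def rids(pages):
    """Map each key to its physical locator (page id, slot)."""
    return {key: (pid, slot)
            for pid, page in pages.items()
            for slot, key in enumerate(page)}

def insert(pages, chain, key):
    """Insert key in clustered order, splitting a full page in half."""
    # Find the first page in logical order whose last key is >= key.
    pos = next((i for i, pid in enumerate(chain) if pages[pid][-1] >= key),
               len(chain) - 1)
    page = pages[chain[pos]]
    if len(page) >= PAGE_CAPACITY:
        new_id = max(pages) + 1            # new page allocated elsewhere
        half = len(page) // 2
        pages[new_id] = page[half:]        # upper half moves to the new page
        del page[half:]
        chain.insert(pos + 1, new_id)      # link it into the logical order
        if key > page[-1]:
            page = pages[new_id]
    bisect.insort(page, key)

pages = {0: [10, 20, 30, 40], 1: [50, 60, 70, 80]}  # page id -> sorted keys
chain = [0, 1]                                      # logical page order

before = rids(pages)
insert(pages, chain, 25)
after = rids(pages)

# Existing rows whose physical address changed because of the split.
moved = sorted(k for k in before if before[k] != after[k])
print(moved)
```

Any index that stored the old (page, slot) addresses of the moved rows would now be stale and would have to be rewritten, whereas an index that stored the key values would still be correct without any update.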
Wouldn't it be much more efficient to keep referencing row data by RID from a non-clustered index after a clustered one is added?
The whole point of a clustered index is that the records are accessed via a logical locator (which is not normally meant to change), not physical.
If the indexes were pointing to a physical RID and a row changed its physical location (say from a page split), all indexes would need to be updated too.
It's exactly the kind of problem the clustered indexes were invented to deal with.
If a table has a clustered index, each non-clustered index row contains a copy of the clustered index key.
If a table does not have a clustered index, i.e. the table is a heap, each non-clustered index row contains a pointer built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).
When you identify (select) a row using a clustered index, you have all the columns from the row.
When you identify a row in a non-clustered index, you need to perform another lookup step to obtain the columns not included in the non-clustered index.
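The two lookup paths can be sketched with plain dictionaries. This is only a conceptual model: the table contents, the email column, and the function names are hypothetical, and a dictionary lookup hides the fact that the second step on a clustered table is itself a B-tree traversal.

```python
# Clustered table: the clustered index maps key -> full row (the leaf IS the row).
clustered_rows = {
    1: {"id": 1, "email": "ann@example.com", "name": "Ann"},
    2: {"id": 2, "email": "bob@example.com", "name": "Bob"},
}
# Non-clustered index on email: each leaf entry carries a copy of the clustering key.
nci_on_clustered = {"ann@example.com": 1, "bob@example.com": 2}

# Heap: rows are addressed by a RID, modelled here as (file, page, slot).
heap_rows = {
    (1, 100, 0): {"id": 1, "email": "ann@example.com", "name": "Ann"},
    (1, 100, 1): {"id": 2, "email": "bob@example.com", "name": "Bob"},
}
# Non-clustered index on email: each leaf entry carries the RID.
nci_on_heap = {"ann@example.com": (1, 100, 0), "bob@example.com": (1, 100, 1)}

def lookup_clustered(email):
    key = nci_on_clustered[email]   # step 1: seek the non-clustered index
    return clustered_rows[key]      # step 2: seek the clustered index by key

def lookup_heap(email):
    rid = nci_on_heap[email]        # step 1: seek the non-clustered index
    return heap_rows[rid]           # step 2: go straight to the physical address

print(lookup_clustered("bob@example.com")["name"])
print(lookup_heap("bob@example.com")["name"])
```

In both cases the second step is skipped entirely when the non-clustered index already contains every column the query asks for.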
Wouldn't it be much more efficient to keep referencing row data by RID from a non-clustered index after a clustered one is added?
In many cases it would be more efficient, yes. I believe that clustered indexes were originally implemented that way (in version 6.0?). Presumably they were changed for the reasons that marc_s mentioned, which make sense if your clustered index is such that it has a lot of page splits.
I would not have posted my question had I seen, before posting, the answer by AlexSmith there, which I noticed only a few minutes after posting, by which time the question had already been answered here:
"Greg Linwood wrote a great series of blogs. It is a must read: Debunking myths about clustered indexes"
It is a pity that it is impossible to accept that answer here.
Update:
The answer accepted here, by PerformanceDBA, said: "The problem is that the doco is gobbledegook, and increases the very confusion it is alleging it clarifies"
Well, all the MSDN docs say and show (compare, for example, the pictures in Clustered Index Structures and "Heap Structures") that a clustered table does not have an IAM page. Meanwhile, the output of the code from Inside the Storage Engine: Using DBCC PAGE and DBCC IND to find out if page splits ever roll back shows the opposite.
Having no desire to continue spamming here, I have moved my questions and participation to /www.sqlservercentral.com/Forums
My related question there:
Does clustered table have IAM page?
I am struggling to understand what a clustered index in SQL Server 2005 is. I read the MSDN article Clustered Index Structures (among other things) but I am still unsure whether I understand it correctly.
The (main) question is: what happens if I insert a row (with a "low" key) into a table with a clustered index?
The above mentioned MSDN article states:
The pages in the data chain and the rows in them are ordered on the value of the clustered index key.
And Using Clustered Indexes for example states:
For example, if a record is added to the table that is close to the beginning of the sequentially ordered list, any records in the table after that record will need to shift to allow the record to be inserted.
Does this mean that if I insert a row with a very "low" key into a table that already contains a gazillion rows literally all rows are physically shifted on disk? I cannot believe that. This would take ages, no?
Or is it rather (as I suspect) that there are two scenarios depending on how "full" the first data page is.
A) If the page has enough free space to accommodate the record it is placed into the existing data page and data might be (physically) reordered within that page.
B) If the page does not have enough free space for the record a new data page would be created (anywhere on the disk!) and "linked" to the front of the leaf level of the B-Tree?
This would then mean the "physical order" of the data is restricted to the "page level" (i.e. within a data page) but not to the pages residing on consecutive blocks on the physical hard drive. The data pages are then just linked together in the correct order.
Or formulated in an alternative way: if SQL Server needs to read the first N rows of a table that has a clustered index it can read data pages sequentially (following the links) but these pages are not (necessarily) block wise in sequence on disk (so the disk head has to move "randomly").
How close am I? :)
If you happen to insert a row with a "low" ID as you say, then yes - it will be placed in the vicinity of your other rows that are already there with similar ID's.
If your SQL Server page (8K chunks) is filled to the max, then a page split will occur: half the rows will remain on that page, and the other half will be moved to a new page. Both pages will then have some capacity for new rows.
That's one of the reasons why you don't want to use something very random as your clustering key, e.g. a GUID, which will cause rows to be inserted all over the place.
Trying to avoid page splits (which are quite expensive operations) is one of the main reasons why gurus like Kimberly Tripp heavily advocate using something that is ever increasing as your clustering key - e.g. an INT IDENTITY column. Here, a new value is always guaranteed to be larger than anything that's already in your database, so new rows are always added at the "end" of the food chain.
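A rough simulation of this effect, as a toy model rather than SQL Server's real allocation logic: pages hold up to a fixed number of keys, a full page splits half-and-half, and (by assumption) a key larger than everything on a full last page simply starts a fresh page without moving any rows. The page capacity, row count, and function name are hypothetical.

```python
import bisect
import random

def count_page_splits(keys, capacity=100):
    """Insert keys into key-ordered pages; return the number of half-and-half splits."""
    pages = [[]]  # pages kept in key order
    splits = 0
    for key in keys:
        # Locate the page that should hold the key, using each page's first key.
        firsts = [p[0] for p in pages[1:]]
        idx = bisect.bisect_right(firsts, key)
        page = pages[idx]
        if len(page) >= capacity:
            if idx == len(pages) - 1 and key > page[-1]:
                # Assumed append case: start a fresh page at the end, no rows move.
                pages.append([])
                page = pages[-1]
            else:
                half = len(page) // 2
                pages.insert(idx + 1, page[half:])  # upper half moves out
                del page[half:]
                splits += 1
                if key >= pages[idx + 1][0]:
                    page = pages[idx + 1]
        bisect.insort(page, key)
    return splits

n = 10_000
sequential_splits = count_page_splits(range(n))   # IDENTITY-like keys

shuffled = list(range(n))
random.Random(42).shuffle(shuffled)               # GUID-like, arbitrary order
random_splits = count_page_splits(shuffled)

print(sequential_splits, random_splits)
```

In this model ever-increasing keys only ever append, while keys arriving in arbitrary order keep landing on full pages in the middle of the chain and force rows to be moved.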
For more excellent background info, see Kimberly Tripp's blog, especially her Clustering Key category!
How close are you? Very!
These articles may help consolidate your understanding:
http://msdn.microsoft.com/en-us/library/aa964133(SQL.90).aspx
http://www.sql-server-performance.com/articles/per/index_fragmentation_p1.aspx
If I have a table column with data and create an index on this column, will the index take the same amount of disc space as the column itself?
I'm interested because I'm trying to understand whether b-trees actually keep copies of column data in leaf nodes or somehow point to it.
Sorry if this is a "Will Java replace XML?" kind of question.
UPDATE:
created a table without index with a single GUID column, added 1M rows - 26MB
same table with a primary key (clustered index) - 25MB (even less!), index size - 176KB
same table with a unique key (nonclustered index) - 26MB, index size - 27MB
So only nonclustered indexes take as much space as the data itself.
All measurements were done in SQL Server 2005
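To make those measurements easier to compare, here is a small calculation of bytes per row for each figure. It uses only the numbers reported above and assumes that MB and KB mean 2^20 and 2^10 bytes and that a GUID is stored as 16 raw bytes; the labels are my own shorthand.

```python
ROWS = 1_000_000
GUID_BYTES = 16          # raw size of one GUID value
KB = 1024
MB = 1024 * KB

# Sizes as reported in the measurements above.
measurements = {
    "heap, no index": 26 * MB,
    "clustered table (data)": 25 * MB,
    "clustered index (reported size)": 176 * KB,
    "nonclustered index": 27 * MB,
}

per_row = {name: size / ROWS for name, size in measurements.items()}
for name, value in per_row.items():
    print(f"{name}: {value:.2f} bytes per row (raw GUID: {GUID_BYTES})")
```

The nonclustered index costs about as many bytes per row as the table itself, while the reported clustered index size works out to a fraction of a byte per row, which is consistent with the data rows themselves forming its leaf level.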
The B-Tree points to the row in the table, but the B-Tree itself still takes some space on disk.
Some databases have a special kind of table that embeds the main index and the data. In Oracle, it's called an IOT (index-organized table).
Each row in a regular table can be identified by an internal ID (but it's database specific) which is used by the B-Tree to identify the row. In Oracle, it's called rowid and looks like AAAAECAABAAAAgiAAA :)
If I have a table column with data and create an index on this column, will the index take the same amount of disc space as the column itself?
In a basic B-Tree, you have the same number of nodes as the number of items in the column.
Consider 1,2,3,4:
1
/
2
\ 3
\ 4
The exact space can still be a bit different (the index is probably a bit bigger as it needs to store links between nodes, it may not be perfectly balanced, etc.), and I guess a database can use optimizations to compress part of the index. But the index and the column data should be of the same order of magnitude.
I'm almost sure it's quite DB dependent, but generally, yeah, they take additional space. This happens for two reasons:
This way you can utilize the fact that the data in BTREE leaves is sorted;
You gain a lookup speed advantage, as you don't have to seek back and forth to fetch the necessary stuff.
PS just checked our mysql server: for a 20GB table indexes take 10GB of space :)
Judging by this article, it will, in fact, take at least the same amount of space as the data in the column (in PostgreSQL, anyway).
The article also goes on to suggest a strategy to reduce disk and memory usage.
A way to check for yourself would be to use e.g. the derby DB, create a table with a million rows and a single column, check its size, create an index on the column and check its size again. If you take the 10-15 minutes to do so, let us know the results. :)
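The same experiment can be sketched in a few lines. Note that this swaps in SQLite (through Python's built-in sqlite3 module) for Derby, purely because it needs no installation, and uses 100,000 rows to keep it quick. The table, column, and index names are hypothetical, and the absolute numbers will differ from other database engines.

```python
import os
import sqlite3
import tempfile
import uuid

def db_bytes(conn):
    """Database size in bytes, computed as page count times page size."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return pages * page_size

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "size_test.db"))
    conn.execute("CREATE TABLE t (val TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?)",
        ((str(uuid.uuid4()),) for _ in range(100_000)),
    )
    conn.commit()
    before = db_bytes(conn)          # table only

    conn.execute("CREATE INDEX t_val_idx ON t (val)")
    conn.commit()
    after = db_bytes(conn)           # table plus index
    conn.close()

print(f"table only:   {before:,} bytes")
print(f"with index:   {after:,} bytes")
print(f"index alone:  {after - before:,} bytes")
```

Comparing the last line with the first gives a direct answer for this particular engine; repeating it with a different column type or row count shows how much the ratio depends on the data.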