I implemented a heap with MVCC; that is, each tuple in the heap has a history list so that updating the tuple does not block reading. I also implemented a secondary B+ tree; that is, it stores (primary key, RID) in its leaves, where RID is the address of the tuple with that primary key in the heap. I use the following locking scheme in my implementation of B+ tree search, insertion and deletion (actually mark-deletion).
Search: When the search key is found, an S-lock on the search key is requested. Once acquired, if the search key is not marked as deleted, the corresponding RID is used to access the tuple in the heap.
Insertion: After having found the leaf to accommodate the inserted key, an X-lock on the inserted key is requested. Once acquired, if the key is not present, the key is inserted into the leaf; otherwise, a uniqueness violation is issued.
Deletion: After having found the leaf where the key to delete resides, an X-lock on the deleted key is requested. Once acquired, if the key is present and not marked deleted, the key is delete-marked.
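Here is a minimal sketch of this scheme in Python; LockManager, s_lock and x_lock are illustrative stand-ins for my lock table, and the leaf level is modeled as a plain dict:

```python
class UniquenessViolation(Exception):
    pass

class SecondaryIndex:
    """Toy model of the scheme above: leaves map key -> (rid, delete_marked);
    locks are taken per index key."""

    def __init__(self, lock_manager):
        self.locks = lock_manager   # assumed to expose s_lock(txn, key) / x_lock(txn, key)
        self.leaf = {}

    def search(self, txn, key):
        if key not in self.leaf:
            return None
        self.locks.s_lock(txn, key)        # blocks while another txn holds an X-lock on key
        rid, deleted = self.leaf[key]
        return None if deleted else rid    # caller uses rid to read the heap tuple (history list)

    def insert(self, txn, key, rid):
        self.locks.x_lock(txn, key)
        if key in self.leaf:
            raise UniquenessViolation(key)
        self.leaf[key] = (rid, False)

    def delete(self, txn, key):
        self.locks.x_lock(txn, key)
        if key in self.leaf and not self.leaf[key][1]:
            rid, _ = self.leaf[key]
            self.leaf[key] = (rid, True)   # mark-delete only; physical removal happens later
```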
My Question: My current implementation obviously does not guarantee "writing does not block reading" of MVCC. To see this, suppose transaction T1 deletes the key "hello" and transaction T2 at the same time wants to read the key "hello". In MVCC, T2 will not be blocked; it reads the corresponding history list of "hello". However, in my implementation, T2 will be blocked since it needs to request an S-lock on "hello" (B+ tree search) which has been X-locked by T1 already (B+ tree deletion). I wonder if there is any better way to do the locking for the B+ tree.
P.S. I considered not using an S-lock in the search. In that case, for a delete-marked entry (key, RID), it is not possible to tell whether the deletion has been committed. If it has been committed, the RID may be invalid, since the corresponding tuple may already have been removed from the heap. If the deletion has not yet been committed, the RID is still valid and can be used to visit the heap.
Suppose there is no power loss or catastrophic OS crash, so fsync() calls succeed and DB changes are truly written to disk.
Since PRAGMA integrity_check does NOT [1] actually detect data corruption in a SQLite database, the developer needs to come up with a way to detect data corruption themselves.
Storing a checksum of the entire database file works but is too inefficient to do for every row insert (suppose the database is multiple gigabytes in size).
Keeping a checksum for each row is the obvious solution but this does not detect the case where entire rows are deleted by the data corruption. So we need a way to detect whether rows have been deleted from the table.
The method I thought up to detect missing rows is to keep an XOR sum of all row checksums. This is efficient because every time a row is updated, we simply XOR the sum with the row's old checksum and then XOR the sum with the row's new checksum. It occurs to me that this is not the strongest method, but I have not been able to find any stronger alternatives that are efficient, so suggestions are welcome.
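Here is a sketch of what I mean, assuming Python's standard sqlite3 module and an illustrative table t(id, payload, crc) plus a meta row holding the running XOR sum:

```python
import sqlite3
import zlib

def row_checksum(values):
    """CRC-32 over a canonical encoding of the row; any fast per-row hash works."""
    return zlib.crc32(repr(tuple(values)).encode())

con = sqlite3.connect("example.db")
con.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, payload TEXT, crc INTEGER)")
con.execute("CREATE TABLE IF NOT EXISTS meta (name TEXT PRIMARY KEY, xor_sum INTEGER)")
con.execute("INSERT OR IGNORE INTO meta VALUES ('t', 0)")

def xor_into_sum(*crcs):
    """Fold one or more checksums into the running XOR sum (O(1) per change)."""
    (s,) = con.execute("SELECT xor_sum FROM meta WHERE name = 't'").fetchone()
    for c in crcs:
        s ^= c
    con.execute("UPDATE meta SET xor_sum = ? WHERE name = 't'", (s,))

def insert_row(rowid, payload):
    crc = row_checksum((rowid, payload))
    with con:
        con.execute("INSERT INTO t VALUES (?, ?, ?)", (rowid, payload, crc))
        xor_into_sum(crc)

def update_row(rowid, payload):
    (old_crc,) = con.execute("SELECT crc FROM t WHERE id = ?", (rowid,)).fetchone()
    new_crc = row_checksum((rowid, payload))
    with con:
        con.execute("UPDATE t SET payload = ?, crc = ? WHERE id = ?", (payload, new_crc, rowid))
        xor_into_sum(old_crc, new_crc)   # remove the old checksum, add the new one

def verify():
    """Full scan: per-row checksums catch changed rows, the XOR sum catches lost rows."""
    (expected,) = con.execute("SELECT xor_sum FROM meta WHERE name = 't'").fetchone()
    actual = 0
    for rowid, payload, crc in con.execute("SELECT id, payload, crc FROM t"):
        assert crc == row_checksum((rowid, payload)), f"row {rowid} is corrupt"
        actual ^= crc
    assert actual == expected, "rows are missing or duplicated"
```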
EDIT: I have thought of an alternative method which requires all tables to be append-only. In an append-only table we can assume that the rowids are consecutive, which means we can easily notice any missing rows except the last one, whose rowid we can store somewhere else.
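A sketch of this append-only variant (again with sqlite3; the log table name is just illustrative):

```python
import sqlite3

con = sqlite3.connect("example.db")
con.execute("CREATE TABLE IF NOT EXISTS log (id INTEGER PRIMARY KEY, entry TEXT)")

max_rowid, n_rows = con.execute("SELECT max(rowid), count(*) FROM log").fetchone()
# With consecutive rowids starting at 1, any deleted row shows up as a gap.
assert (max_rowid or 0) == n_rows, "rows are missing from the append-only table"
# Rows lost from the end leave no gap, so the last rowid is also stored
# somewhere else (as mentioned above) and compared against max_rowid.
```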
References:
[1] https://www.sqlite.org/pragma.html
Here is a scheme that should work:
In the table, add a "prev" and a "next" column which hold the primary keys of the previous and next rows respectively, so you can think of the table as a doubly linked list. This means that when a row is deleted from the table via data corruption, a scan through the table will find that a "next" key does not match the key of the next row, or that a "prev" key does not match the key of the previous row. These columns should also be made UNIQUE so that two rows can't have the same "next" or "prev" row, and they should have FOREIGN KEY constraints referencing the primary key (see the sketch below).
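For example, assuming SQLite via Python's sqlite3 module (table and column names are only illustrative):

```python
import sqlite3

con = sqlite3.connect("example.db")
con.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces FKs when this is on
con.executescript("""
CREATE TABLE item (
    id      INTEGER PRIMARY KEY,
    data    TEXT,
    prev_id INTEGER UNIQUE REFERENCES item (id),  -- key of the previous row (NULL for the first)
    next_id INTEGER UNIQUE REFERENCES item (id)   -- key of the next row (NULL for the last)
);
""")

def verify_chain():
    """Walk the doubly linked list; a row lost to corruption breaks the chain."""
    for id_, nxt in con.execute("SELECT id, next_id FROM item WHERE next_id IS NOT NULL"):
        back = con.execute("SELECT prev_id FROM item WHERE id = ?", (nxt,)).fetchone()
        assert back is not None and back[0] == id_, f"chain broken after row {id_}"
```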
See below for the original, incorrect answer:
PRAGMA integrity_check will in fact detect missing rows; see where it says:
Missing or surplus index entries
So if you lose a row then it will say something like:
row 12345 missing from index some_index
EDIT: This answer is actually incorrect. The "row x missing" message only indicates that a row does not have a corresponding entry in the index; it does NOT mean that an entry in the index does not have a corresponding row (which is what would happen if a row was deleted).
I am doing performance tuning on one of the largest tables in our project. While reading about indexes I came across partial indexes in PostgreSQL. Putting an index only on the rows that are accessed frequently sounds like a very nice idea.
However, I am not able to figure out how the partial index gets updated. For example, I have a table with the following columns:
task_uuid, job_id, enqueued_at, updated_at, task_status
task_status= ENQUEUED, RUNNING, ASSIGNED, FAILED
We search for the records which are in the ENQUEUED state very frequently. If we add a partial index on (task_uuid, task_status), it will build a unique key and improve performance. But I want to know what happens when the record gets updated, say to the RUNNING status. (task_uuid, task_status) is still unique, but will the entry be removed from the partial index, since the record no longer fulfills the index condition?
If we add a partial index on (task_uuid, task_status), it will build a unique key and improve performance.
It will only build it as unique if you specify that in the definition of the index. Otherwise it won't be a unique index, even if those columns do happen to be unique.
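For example (a sketch assuming psycopg2; the index name, connection string and uuid value are placeholders):

```python
import psycopg2

conn = psycopg2.connect("dbname=example")   # connection string is a placeholder
some_uuid = "placeholder-task-uuid"

with conn, conn.cursor() as cur:
    # UNIQUE must be spelled out explicitly; the WHERE clause limits the
    # index to rows in the ENQUEUED state.
    cur.execute("""
        CREATE UNIQUE INDEX tasks_enqueued_uq
            ON tasks (task_uuid, task_status)
            WHERE task_status = 'ENQUEUED'
    """)
    # Moving a row out of ENQUEUED does not remove its index entry immediately;
    # the entry becomes a dead pointer that vacuum cleans up later.
    cur.execute("UPDATE tasks SET task_status = 'RUNNING' WHERE task_uuid = %s",
                (some_uuid,))
```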
When a record gets updated so that it no longer matches the WHERE predicate of the index, nothing happens to the index. It still has a pointer to the row, it just points to something no longer valid. If you did specify the index as UNIQUE, then upon inserting a conflicting index tuple, it will follow the pointer for the old tuple to the table, realize it is invalid, and allow the insertion to continue.
The next time the table is vacuumed, those obsolete pointers will be cleaned up. Queue tables with partial indexes should usually be vacuumed frequently (more frequently than the default) because the index bloats easily. The Autovac settings depend on the fraction of the table rows obsoleted, not on the fraction of the index rows obsoleted. For partial indexes, these fractions are not the same. (On the other hand, you don't seem to have a status for "COMPLETED". If completed tasks are immediately deleted, perhaps the queue table will stay small enough that this does not matter.)
Also, when an index scan follows the pointer from the index into the table and finds the row is no longer visible to anyone, it will mark the index entry as dead. Then future index scans won't have to pointlessly jump to the table. But this "microvacuum" only happens for regular index scans, not bitmap scans, and it only happens for queries done on the master, not for any done just on a hot standby.
From Database System Concepts
We use the term hash index to denote hash file structures as well as
secondary hash indices. Strictly speaking, hash indices are only
secondary index structures.
A hash index is never needed as a clustering index structure, since, if a file itself is organized by hashing, there is no need for a
separate hash index structure on it. However, since hash file
organization provides the same direct access to records that indexing
provides, we pretend that a file organized by hashing also has a
clustering hash index on it.
Is "secondary index" the same concept as "nonclustering index" (which is what I understood from the book)?
Is a hash index never a clustering index or not?
Could you rephrase or explain why the reason for "A hash index is never needed as a clustering index structure" is that "if a file itself is organized by hashing, there is no need for a separate hash index structure on it"? What about the case where a file is not organized by hashing?
Thanks.
The text tries to explain something but unfortunately creates more confusion than it resolves.
At the logical level, database tables (correct term : "relations") are made up of rows (correct term : "tuples") which represent facts about the real world the db is aimed to represent/reflect. Don't ever call those rows/tuples "records" because "records" is a concept pertaining to the physical level, which is distinct from the logical.
Typically, but this is not a universal law cast in stone, you will find that the physical organization consists of a "main" datastore which has a record for each tuple, and that record contains each and every attribute (column) value of the tuple (row). (That's unless there are LOBs in play or so.) Those records must be given a physical location in the store, and this is usually done using a B-tree on the primary key values. This facilitates:
retrieving only specific [tuples/rows with] primary key values from the relation/table.
traversing the [tuples of] relation in-order of primary key values
retrieving only [tuples/rows within] specific ranges of primary key values from the relation/table.
This B-tree on the primary key values is typically called the "clustering" index.
Often, there is also a frequent need for retrieving only [tuples/rows with] specific values of attributes that are not the primary key. If that needs to be done as efficiently/fast as it can be for values of the primary key, we use similar indexes that are then sometimes called "secondary". Those indexes typically do not contain all the attribute/column values of the tuple/row indexed, but only the attribute values to be indexed plus a mention of the primary key value (so we can find the rest of the attributes in the "main" datastore).
Those "secondary" indexes will mostly also be B-tree indexes which will permit in-order traversal for the attributes being indexed, but they can potentially also be hashing indexes, which permit only to look up tuples/rows using equality comparisons with a given key value ("key" = index key, nothing to do with the keys on the relation/table, though obviously for most keys on the table/relation, there will be a dedicated index too where the index key has the same attributes as the table key it supports).
Finally, there is no theoretical reason why a "primary" (/"clustered") index could not be a hash index (the text kinda suggests the opposite but that is plain wrong). But given the poor level of the explanation in your textbook, it is probably not expected of you to be taught that.
Also note that there are still other ways to physically organize a database than just using B-tree or hash indexes.
So to sum up :
"Clustered" usually refers to the index on the primary data records store
and is usually a B-tree [or some such] on the primary key
and the textbook presumably does not want you to know about more advanced possibilities
"Secondary" usually refers to additional indexes that provide additional "fast access to specific tuples/rows"
and is usually also a B-tree that permits in-order traversal just like the "clustered"/"primary" index
but can also be a hash index that permits only "access by given value" and no in-order traversal (see the small illustration below).
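To make the last point concrete, a tiny Python illustration (purely conceptual, not how any engine stores things):

```python
import bisect

# Hash "index": O(1) equality lookups only.
hash_idx = {"alice": 7, "bob": 3, "carol": 9}   # key -> record locator
print(hash_idx["bob"])                          # fine
# There is no meaningful "next key after 'bob'" in a hash table.

# B-tree-like "index": a sorted structure also supports ranges / in-order traversal.
btree_idx = sorted(hash_idx.items())            # [(key, locator), ...] in key order
lo = bisect.bisect_left(btree_idx, ("b",))
hi = bisect.bisect_right(btree_idx, ("c",))
print(btree_idx[lo:hi])                         # all keys in the range ['b', 'c')
```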
Hope it helps.
I will try to oversimplify just to point out where your confusion is.
There are different types of index organisation:
Clustered
Non Clustered
Each of them may use one of the following file structures:
Sequential File organisation
Hash file organisation
We can have clustered indexes and non clustered indexes using hash file organisations.
Your textbook assumes that clustered indexes are used only on primary keys.
It also assumes that hash indexes, by which I think it means non-clustered indexes using hash file organisation, are only used as secondary indexes (on non-primary-key fields).
But you can actually have clustered indexes on primary keys and non-primary keys. Maybe it is a simplification done for the sake of comprehension, or it is based on a specific implementation of a DB.
The example
Take the example of a Music table from https://www.amazon.com/Amazon-DynamoDB-Developer-Guide-Services-ebook/dp/B007Q4JGBM. This table has partition key Artist and sort key SongTitle.
The question
If I query for a particular song by a particular artist, is the performance O(1), or does it depend on how many entries that artist has in the database?
Prior research
The linked documentation indicates constant performance:
You can access any item in the Music table immediately, if you provide
the Artist and SongTitle values for that item.
However, the phrasing is ambiguous and no support is given.
Here, the architecture is described in a way that indicates performance would not be constant:
DynamoDB uses the partition key value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored. All items with the same partition key are stored together, in sorted order by sort key value.
I would expect this to result in O(lg m) performance, where m is the number of entries in the database for that particular partition key. This non-constant time would be necessary to search the sorted list of entries for the one with the correct sort key -- in this case, to search for the right SongTitle.
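Here is a toy model of that description in Python (purely illustrative, not DynamoDB internals): hashing the partition key selects the partition in O(1), and a binary search over the sort-key-ordered items inside it costs O(lg m):

```python
import bisect
from hashlib import blake2b

N_PARTITIONS = 8
partitions = [[] for _ in range(N_PARTITIONS)]   # each: ((Artist, SongTitle), item) tuples, kept sorted

def _partition_for(artist):
    digest = blake2b(artist.encode(), digest_size=4).digest()
    return partitions[int.from_bytes(digest, "big") % N_PARTITIONS]

def put_item(artist, song_title, item):
    # assumes (Artist, SongTitle) is unique, as a primary key must be
    bisect.insort(_partition_for(artist), ((artist, song_title), item))

def get_item(artist, song_title):
    part = _partition_for(artist)                            # O(1): hash of the partition key
    i = bisect.bisect_left(part, ((artist, song_title),))    # O(lg m): binary search on the sort key
    return part[i][1] if i < len(part) and part[i][0] == (artist, song_title) else None

put_item("Some Artist", "Some Song", {"Price": 1.98})
print(get_item("Some Artist", "Some Song"))
```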
Thank you!
If you get an item by its full primary key, it's constant.
If you query by Artist only, meaning you don't supply a full primary key, then getting the pointer to the first matching item is still constant.
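Concretely, assuming the boto3 SDK and the Music table from the question (attribute values are placeholders):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Music")

# Full primary key (partition key + sort key): a single-item lookup.
item = table.get_item(Key={"Artist": "Some Artist", "SongTitle": "Some Song"}).get("Item")

# Partition key only: returns every song by that artist, already sorted by SongTitle;
# the cost grows with the number (and size) of matching items that are read.
songs = table.query(KeyConditionExpression=Key("Artist").eq("Some Artist"))["Items"]
```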
I have been thinking about two questions, and couldn't find any resources on the internet about them. How do DBMSs handle this? Or do they? Especially Oracle.
Before the questions, here is an example: say I have a master table "MASTER" and a slave table "SLAVE".
The master table has an "ID" column which is the primary key, and the index is created by Oracle. The slave table has the foreign key "MASTER_ID", which refers to the master table, and "SLAVE_NO". These two together are the primary key of the slave table, which is again indexed.
**MASTER** | **SLAVE**
(P) ID <------> (P)(F) MASTER_ID
                (P) SLAVE_NO
Now the questions:
1- If MASTER_ID is an auto-incremented column, and no record is ever deleted, doesn't this make the table's index unbalanced? Does Oracle rebuild indexes periodically? As far as I know, Oracle only balances index branches at build time. Does Oracle ever rebuild indexes automatically, say if the level goes up to some high value?
2- Assuming Oracle does not rebuild automatically, apart from scheduling a job that rebuilds the index periodically, would it be wiser to order the SLAVE table's primary key columns in reverse? I mean, instead of "MASTER_ID", "SLAVE_NO", ordering it as "SLAVE_NO", "MASTER_ID": would that help the slave table's B-tree index be more balanced? (Well, each master record might not have exactly the same number of slave records, but it still seems better than the other order.)
Does anyone know anything about this? Or have opinions?
If MASTER_ID is an auto-incremented column, and no record is ever deleted, doesn't this make the table's index unbalanced?
Oracle's indexes are never "unbalanced": every leaf in the index is at the same depth as any other leaf.
No page split introduces a new level by itself: a leaf page does not become a parent for new pages like it would be on a non-self-balancing tree.
Instead, a sibling for the split page is made and the new record (plus possibly some of the records from the old page) go to the new page. A pointer to the new page is added to the parent.
If the parent page is out of space too (can't accept the pointer to the newly created leaf page), it gets split as well, and so on.
These splits can propagate up to the root page, whose split is the only thing which increases the index depth (and does it for all pages at once).
Index pages are additionally organized into doubly linked lists, one list per level. This would be impossible if the tree were unbalanced.
If master_id is auto-incremented, all splits occur at the right-hand end (so-called 90/10 splits), which makes the index as dense as possible.
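A toy illustration of why (plain Python, not Oracle internals): ascending keys with "leave the old leaf full" splits pack leaves almost completely, while random inserts with 50/50 splits leave them only about 70% full on average:

```python
import bisect
import random

LEAF_CAPACITY = 10

def fill_factor(leaves):
    return sum(len(leaf) for leaf in leaves) / (LEAF_CAPACITY * len(leaves))

def insert_ascending(n):
    """Monotonically increasing keys: every overflow happens at the rightmost
    leaf, and (as with a 90/10 split) the old leaf is left full while the new
    key starts a fresh leaf."""
    leaves = [[]]
    for key in range(n):
        if len(leaves[-1]) == LEAF_CAPACITY:
            leaves.append([])
        leaves[-1].append(key)
    return leaves

def insert_random(n):
    """Keys arriving in random order: an overflowing leaf is split 50/50, so
    leaves end up roughly 70% full on average."""
    keys = list(range(n))
    random.shuffle(keys)
    leaves = [[keys[0]]]                 # each leaf is a sorted list of keys
    for key in keys[1:]:
        idx = max(bisect.bisect_right([leaf[0] for leaf in leaves], key) - 1, 0)
        leaf = leaves[idx]
        bisect.insort(leaf, key)
        if len(leaf) > LEAF_CAPACITY:
            mid = len(leaf) // 2
            leaves[idx:idx + 1] = [leaf[:mid], leaf[mid:]]
    return leaves

print("ascending inserts, 90/10-style splits:", fill_factor(insert_ascending(10_000)))
print("random inserts,    50/50 splits:      ", fill_factor(insert_random(10_000)))
```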
Would it be wiser to order SLAVE table's primary key columns reverse?
No, it would not, for the reasons above.
If you join slave to master often, you may consider creating a CLUSTER of the two tables, indexed by master_id. This means that the records from both tables, sharing the same master_id, go to the same or nearby data pages which makes a join between them very fast.
When the engine finds a record from master, with an index or whatever, it has thereby also already found the records from slave to be joined with that master. And vice versa: locating a slave also means locating its master.
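A sketch of what that looks like, assuming the python-oracledb driver (the cluster, index and column names, and the credentials, are placeholders):

```python
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/XEPDB1")  # placeholders
cur = conn.cursor()
for ddl in (
    "CREATE CLUSTER master_slave_cl (master_id NUMBER)",
    "CREATE INDEX master_slave_cl_idx ON CLUSTER master_slave_cl",
    "CREATE TABLE master (id NUMBER PRIMARY KEY) CLUSTER master_slave_cl (id)",
    """CREATE TABLE slave (
           master_id NUMBER REFERENCES master (id),
           slave_no  NUMBER,
           PRIMARY KEY (master_id, slave_no)
       ) CLUSTER master_slave_cl (master_id)""",
):
    cur.execute(ddl)  # rows of master and slave with the same master_id now share data blocks
```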
The b-tree index on MASTER_ID will remain balanced for most useful definitions of "balanced". In particular, the distance between the root block and any child block will always be the same, and the amount of data in any branch will be at least roughly equal to the amount of data in any other sibling branch.
As you insert data into an index on a sequence-generated column, Oracle will perform 90-10 block splits on the leaves when the amount of data in any particular level increases. If, for example, you have a leaf block that can hold 10 rows, when it is full and you want to add an 11th row, Oracle creates a new block, leaves the first 9 entries in the first block, puts 2 entries in the new block, and updates the parent block with the address of the new block. If the parent block needs to split because it is holding addresses for too many children, a similar process takes place. That leaves the indexes relatively balanced throughout their life. Richard Foote (the expert on Oracle indexes) has an excellent blog on When Does an Oracle Index Increase in Height that goes into much more detail about how this works.
The only case where you potentially have to worry about an index becoming skewed is when you regularly delete most, but not all, of the entries from the left-hand side of the index. If, for example, you decide to delete 90% of the data from the left-hand side of the index, leaving one row in every leaf node pointing to data, your index can become unbalanced in the sense that some paths lead to vastly more data than other paths. Oracle doesn't reclaim index leaf nodes until they are completely empty, so you can end up with the left-hand side of the index using a lot more space than is really necessary. This doesn't really affect the performance of the system much (it is more of a space utilization issue), but it can be fixed by coalescing the index after you've purged the data (or by structuring your purges so that you don't leave lots of sparse blocks).