Some databases (like MySQL) use B+trees to physically organize records. The primary index (also called the clustered index) stores the keys in the internal nodes and the records at the leaf level.
Since all pages have a fixed size, I suppose that internal nodes and leaf nodes have different degrees: the degree of the internal nodes is derived from the primary key size, and the degree of the leaf nodes is derived from the record size.
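The arithmetic behind that supposition can be sketched as follows. This assumes a 16 KiB page (InnoDB's default page size) and ignores page headers and per-record overhead; the 8-byte key, 6-byte child pointer, and 200-byte record are hypothetical sizes chosen only for illustration.

```python
PAGE_SIZE = 16 * 1024  # bytes; InnoDB's default page size, used as an example


def internal_fanout(key_size, child_pointer_size):
    # Each internal entry holds a separator key plus a pointer to a child page.
    return PAGE_SIZE // (key_size + child_pointer_size)


def leaf_capacity(record_size):
    # In a clustered index each leaf entry is a whole (fixed-size) record.
    return PAGE_SIZE // record_size


print(internal_fanout(8, 6))  # 1170 children per internal page
print(leaf_capacity(200))     # 81 records per leaf page
```

The two numbers differ by an order of magnitude, which is why the two levels end up with different degrees. With variable-length records, `leaf_capacity` stops being a single number.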
What if the record size is variable? How is the degree of the leaf level set?
Additionally, I have always studied B+trees under the assumption that keys have the same size. That assumption gives a straightforward method for splitting nodes and redistributing entries between them. What about the leaf level? How do you balance the nodes when the records to be moved have variable sizes?
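One common way to handle variable-length entries is to measure node capacity in bytes rather than in entry count, and to choose the split point by accumulated size. The function below is a minimal sketch of that idea, not any particular engine's algorithm: `choose_split_point` and the 800-byte page in the example are hypothetical, and it assumes every single record fits in a page (real engines also need a strategy for oversized records, such as storing long columns off-page).

```python
PAGE_SIZE = 16 * 1024  # hypothetical page capacity in bytes


def choose_split_point(record_sizes, page_size=PAGE_SIZE):
    """Return i such that records[:i] stay in the old page and records[i:] move.

    Sizes are in bytes. Instead of splitting by count, pick the boundary whose
    left/right byte totals are closest while both halves fit in one page.
    """
    total = sum(record_sizes)
    best, best_diff = None, None
    left = 0
    for i in range(1, len(record_sizes)):
        left += record_sizes[i - 1]
        right = total - left
        if left > page_size or right > page_size:
            continue  # one side would still overflow
        diff = abs(left - right)
        if best_diff is None or diff < best_diff:
            best, best_diff = i, diff
    if best is None:
        raise ValueError("no split point lets both halves fit in one page")
    return best


# An overflowing 800-byte page holding records of uneven size.
print(choose_split_point([100, 100, 100, 700], page_size=800))  # 3
```

With sizes `[100, 100, 100, 700]`, a count-based 50/50 split would leave 200 bytes on one page and 800 on the other; the byte-based choice returns 3, giving 300 and 700 bytes.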
What is the most efficient way to search for a key within a B-Tree node? For example, if I have a node with 100 sorted keys and I'm looking for key 23, what would be the fastest way of searching for that key within that node? What do most B-Tree implementations do?
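Because the keys in a node are kept sorted, a common approach is binary search, which needs about log2(100) ≈ 7 comparisons for a 100-key node instead of up to 100 for a linear scan. A minimal sketch using Python's `bisect` module (the helper name `search_node` is hypothetical):

```python
from bisect import bisect_left


def search_node(keys, target):
    """Binary search over one node's sorted keys.

    Returns (found, index); if not found, index is where target would be inserted.
    """
    i = bisect_left(keys, target)
    return (i < len(keys) and keys[i] == target), i


node = list(range(0, 200, 2))  # 100 sorted keys: 0, 2, ..., 198
print(search_node(node, 23))   # (False, 12)
print(search_node(node, 24))   # (True, 12)
```

When the key is absent, the returned index is the position where it would be inserted.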
I am pretty new to database systems, and I have some doubts that I could not find answers to.
If I am not wrong, the DBMS stores tables and their associated B+ trees on disk, and commonly each table has a B+ tree based on its primary key.
However, suppose I have a table with two columns, student_id and GPA, and I want to run a range query on GPA. The DBMS cannot perform this query efficiently because there is no B+ tree built on GPA. How does a DBMS address a problem like this? Does it maintain a B+ tree for every column in the table?
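The usual answer is a secondary (non-clustered) index: a separate tree built only on the columns you explicitly index, whose entries point back to the table's rows. The sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in for a DBMS; the table, data, and index name `idx_gpa` are made up for illustration.

```python
import sqlite3

# In-memory database; the table, data, and index name are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, gpa REAL)")
conn.executemany(
    "INSERT INTO students (student_id, gpa) VALUES (?, ?)",
    [(1, 3.9), (2, 2.7), (3, 3.4), (4, 3.1), (5, 3.6)],
)

# Secondary index: a separate B-tree ordered by gpa whose entries point back
# to rows of the table (here via the rowid, i.e. student_id).
conn.execute("CREATE INDEX idx_gpa ON students (gpa)")

query = "SELECT student_id FROM students WHERE gpa BETWEEN 3.0 AND 3.5 ORDER BY gpa"
rows = [r[0] for r in conn.execute(query)]
# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(rows)  # [4, 3]
print(plan)  # exact wording varies by SQLite version, but it names idx_gpa
```

Only columns you choose to index get such a tree; each extra index speeds up reads on that column at the cost of extra work on every insert, update, and delete.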
I am trying to answer the following question, but I am confused about B+ trees. Here is the question:
Is it possible to define an insertion algorithm for a B+ tree on a non-key search field? If yes, explain by showing insertions with appropriate search values (you need not define the algorithm in formal terms). If not, provide necessary justification.
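One common way to make this work is to append the record pointer (rid) to the non-key value so that every entry in the tree is unique; the usual insertion and split logic then applies unchanged, and duplicates of the same value may simply span several leaves. The class below is a hypothetical, simplified two-level tree (a single root of separators over leaf pages, and the root never splits) with a hypothetical capacity of three entries per leaf; the GPA values and rids are made up.

```python
from bisect import bisect_right, insort

LEAF_CAPACITY = 3  # hypothetical: entries per leaf page


class TwoLevelBPlusTree:
    """Simplified B+ tree on a non-key field: one root of separators over leaves.

    Entries are (search_value, rid) pairs, so equal search values still form
    distinct, totally ordered keys. The root itself never splits in this sketch.
    """

    def __init__(self):
        self.separators = []  # separators[i] is the smallest entry of leaves[i + 1]
        self.leaves = [[]]    # list order stands in for the leaf sibling chain

    def insert(self, value, rid):
        entry = (value, rid)
        i = bisect_right(self.separators, entry)  # choose the leaf to descend into
        leaf = self.leaves[i]
        insort(leaf, entry)
        if len(leaf) > LEAF_CAPACITY:
            mid = len(leaf) // 2
            right = leaf[mid:]
            del leaf[mid:]
            self.leaves.insert(i + 1, right)
            self.separators.insert(i, right[0])  # copy the sibling's first entry up

    def search(self, value):
        # (value,) sorts before every (value, rid), so this finds the leftmost
        # leaf that may hold the value; then scan right until values exceed it.
        i = bisect_right(self.separators, (value,))
        rids = []
        for leaf in self.leaves[i:]:
            for v, rid in leaf:
                if v > value:
                    return rids
                if v == value:
                    rids.append(rid)
        return rids


tree = TwoLevelBPlusTree()
for rid, gpa in enumerate([3.5, 3.0, 3.5, 3.9, 3.5, 2.8, 3.5]):
    tree.insert(gpa, rid)
print(tree.leaves)
print(tree.search(3.5))  # [0, 2, 4, 6]
```

In the example, entries for the value 3.5 end up spread over three leaves, and the search still finds all of them because it starts at the leftmost leaf that could hold the value and scans forward.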
I have been thinking about two questions and couldn't find any resources on the internet about them. How do DBMSs handle this, if they do at all? Especially Oracle.
Before the questions, here is an example: say I have a master table "MASTER" and a slave table "SLAVE".
The master table has an "ID" column, which is the primary key and is indexed by Oracle. The slave table has a foreign key "MASTER_ID", which refers to the master table, and a "SLAVE_NO" column. Together these two columns form the primary key of the slave table, which is also indexed.
| **MASTER** | **SLAVE** |
|---|---|
| (P) ID | (P)(F) MASTER_ID, references MASTER.ID |
| | (P) SLAVE_NO |
Now the questions:
1- If MASTER_ID is an auto-incremented column and no record is ever deleted, doesn't this make the table's index unbalanced? Does Oracle rebuild indexes periodically? As far as I know, Oracle only balances index branches at build time. Does Oracle ever rebuild indexes automatically, say, if the number of levels grows very high?
2- Assuming Oracle does not rebuild indexes automatically, and apart from scheduling a job that rebuilds the index periodically, would it be wiser to order the SLAVE table's primary key columns in reverse? I mean, instead of "MASTER_ID", "SLAVE_NO", ordering them as "SLAVE_NO", "MASTER_ID": would that help the slave table's B-tree index be more balanced? (Admittedly, each master record might not have exactly the same number of slave records, but it still seems better than the current order.)
Does anyone know anything about this, or have an opinion?
If MASTER_ID is an auto-incremented column, and no record is ever deleted, doesn't this make the table's index unbalanced?
Oracle's indexes are never "unbalanced": every leaf in the index is at the same depth as any other leaf.
A page split does not, by itself, introduce a new level: a leaf page does not become a parent of new pages as it would in a non-self-balancing tree.
Instead, a sibling is created for the split page, and the new record (plus possibly some records from the old page) goes to the new page. A pointer to the new page is added to the parent.
If the parent page is also out of space (it can't accept the pointer to the newly created leaf page), it is split as well, and so on.
These splits can propagate up to the root page, whose split is the only event that increases the index depth (and it does so for all leaves at once).
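The split-and-propagate behavior described above can be sketched as a small in-memory B+ tree. This is a hypothetical illustration, not Oracle's code: it uses a capacity of four keys per node, even 50/50 splits, and omits the sibling links between pages. The only place `height` grows is when the root itself splits.

```python
from bisect import bisect_right, insort

MAX_KEYS = 4  # hypothetical page capacity, in keys per node


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # stays empty for leaf nodes


class BPlusTree:
    def __init__(self):
        self.root = Node(leaf=True)
        self.height = 1

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # Only a root split adds a level, and it adds it above every leaf at once.
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root
            self.height += 1

    def _insert(self, node, key):
        """Insert below node; return (separator, new_sibling) if node had to split."""
        if node.leaf:
            insort(node.keys, key)
        else:
            i = bisect_right(node.keys, key)
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            sep, right = split
            # The parent gets a pointer to the new sibling; this may overflow it too.
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
        if len(node.keys) <= MAX_KEYS:
            return None
        return self._split(node)

    @staticmethod
    def _split(node):
        mid = len(node.keys) // 2
        right = Node(leaf=node.leaf)
        if node.leaf:
            # Leaf split: the sibling's first key is copied up as the separator.
            right.keys = node.keys[mid:]
            node.keys = node.keys[:mid]
            return right.keys[0], right
        # Internal split: the middle key moves up and stays in neither half.
        sep = node.keys[mid]
        right.keys = node.keys[mid + 1:]
        right.children = node.children[mid + 1:]
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return sep, right

    def leaves(self):
        """Return (depth, keys) for every leaf, left to right."""
        out = []

        def walk(node, depth):
            if node.leaf:
                out.append((depth, node.keys))
            for child in node.children:
                walk(child, depth + 1)

        walk(self.root, 1)
        return out


tree = BPlusTree()
for k in range(1, 101):  # ascending keys, like a sequence-generated ID
    tree.insert(k)
# The set of leaf depths holds a single value, equal to the tree height.
print(tree.height, {depth for depth, _ in tree.leaves()})
```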
Index pages are additionally organized into doubly linked lists, one list per level. This would be impossible if the tree were unbalanced.
If master_id is auto-incremented, all splits occur at the right-hand edge of the index (so-called 90/10 splits), which produces the densest index possible.
Would it be wiser to order the SLAVE table's primary key columns in reverse?
No, it would not, for the reasons above.
If you often join slave to master, you might consider creating a CLUSTER of the two tables, indexed by master_id. Records from both tables that share the same master_id then go to the same or nearby data pages, which makes a join between them very fast.
Once the engine has found a record from master (through an index or any other access path), it has also found the slave records to be joined with it. Conversely, locating a slave record also locates its master.
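In Oracle, such a cluster is created with `CREATE CLUSTER` plus an index on the cluster key. The toy model below only illustrates the storage idea in Python; the class and method names are hypothetical, and a dictionary entry stands in for a data block.

```python
from collections import defaultdict


class ToyCluster:
    """Hypothetical model: MASTER and SLAVE rows sharing a master_id live in one block."""

    def __init__(self):
        self.blocks = defaultdict(lambda: {"master": None, "slaves": []})

    def insert_master(self, master_id, row):
        self.blocks[master_id]["master"] = row

    def insert_slave(self, master_id, slave_no, row):
        self.blocks[master_id]["slaves"].append((slave_no, row))

    def join(self, master_id):
        # A single lookup on the cluster key returns the master and all its slaves.
        block = self.blocks.get(master_id)
        if block is None:
            return None, []
        return block["master"], sorted(block["slaves"], key=lambda s: s[0])


cluster = ToyCluster()
cluster.insert_master(1, "master row 1")
cluster.insert_slave(1, 2, "slave row 1/2")
cluster.insert_slave(1, 1, "slave row 1/1")
print(cluster.join(1))  # ('master row 1', [(1, 'slave row 1/1'), (2, 'slave row 1/2')])
```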
The B-tree index on MASTER_ID will remain balanced for most useful definitions of "balanced". In particular, the distance between the root block and any leaf block will always be the same, and the amount of data in any branch will be at least roughly equal to the amount of data in any sibling branch.
As you insert data into an index on a sequence-generated column, Oracle performs 90-10 block splits on the leaves whenever a leaf block fills up. If, for example, you have a leaf block that can hold 10 rows, when it is full and you want to add an 11th row, Oracle creates a new block, leaves the first 9 entries in the first block, puts 2 entries in the new block, and updates the parent block with the address of the new block. If the parent block needs to split because it holds addresses for too many children, a similar process takes place. That keeps the index relatively balanced throughout its life. Richard Foote (the expert on Oracle indexes) has an excellent blog post, "When Does an Oracle Index Increase in Height", that goes into much more detail about how this works.
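A rough way to see why right-edge splits give a dense index is to simulate ascending inserts at the leaf level only and compare an even split with the "keep 9, move 2" rule from the 10-row example above. The function name and the capacity of 10 entries are hypothetical, and upper levels are ignored.

```python
def leaf_fill_after_ascending_inserts(n_keys, capacity, keep_left):
    """Average fill of completed leaves after inserting 0..n_keys-1 in order.

    keep_left(n) says how many of the n entries of an overflowing leaf stay in
    the old leaf; the rest move to the new right-hand sibling.
    """
    leaves = [[]]
    for key in range(n_keys):
        rightmost = leaves[-1]  # ascending keys always land in the last leaf
        rightmost.append(key)
        if len(rightmost) > capacity:
            k = keep_left(len(rightmost))
            leaves[-1] = rightmost[:k]
            leaves.append(rightmost[k:])
    done = leaves[:-1]  # the last leaf is still filling, so leave it out
    return sum(len(leaf) for leaf in done) / (len(done) * capacity)


def fifty_fifty(n):
    return n // 2  # even split


def ninety_ten(n):
    return n - 2  # keep 9 of 11, move 2, as in the 10-row example


print(f"50/50: {leaf_fill_after_ascending_inserts(1000, 10, fifty_fifty):.0%}")  # 50/50: 50%
print(f"90/10: {leaf_fill_after_ascending_inserts(1000, 10, ninety_ten):.0%}")   # 90/10: 90%
```

With an even split, every completed leaf is left half empty because later keys never come back to it; keeping most entries in the old leaf avoids that waste for ascending keys.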
The only case in which you might have to worry about an index becoming skewed is when you regularly delete most, but not all, of the entries from the left-hand side of the index. If, for example, you decide to delete 90% of the data from the left-hand side of the index, leaving one row in every leaf node pointing to data, your index can become unbalanced in the sense that some paths lead to vastly more data than other paths. Oracle doesn't reclaim index leaf nodes until they are completely empty, so you can end up with the left-hand side of the index using a lot more space than is really necessary. This doesn't really affect the performance of the system much; it is more of a space-utilization issue. It can be fixed by coalescing the index after you've purged the data (or by structuring your purges so that you don't leave lots of sparse blocks).
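The effect of such a left-hand purge, and of repacking afterwards, can be modeled on a list of leaves. In Oracle the real operation is `ALTER INDEX <index_name> COALESCE`; the function below is only a simplified, hypothetical stand-in that packs entries in key order into as few leaves as possible, and it does not model upper levels, free-space settings, or block reuse.

```python
def coalesce(leaves, capacity):
    """Repack entries, in key order, into as few leaves as possible.

    A simplified stand-in for coalescing: it only merges leaf contents and
    ignores upper levels, free-space settings, and block reuse.
    """
    merged, current = [], []
    for leaf in leaves:
        for key in leaf:
            if len(current) == capacity:
                merged.append(current)
                current = []
            current.append(key)
    if current:
        merged.append(current)
    return merged


# 100 full leaves of 10 keys; then purge 9 of every 10 keys from the left half.
leaves = [list(range(i * 10, i * 10 + 10)) for i in range(100)]
purged = [leaf[:1] if i < 50 else leaf for i, leaf in enumerate(leaves)]
packed = coalesce(purged, capacity=10)
print(len(purged), len(packed))  # 100 55
```

Before repacking, 50 of the 100 leaves each hold a single entry; afterwards the same 550 entries fit in 55 full leaves.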