B-link tree insert operation - database

Lehman and Yao (1981), who introduced the B-link tree, claim that any insert operation needs at most 3 locks simultaneously.
I have a hard time finding a concrete scenario where 3 locks are acquired. What is the scenario?

The scenario appears when:
Split leaf. A leaf node (originally A) is split (into A1 and A2), so the split key (max(A1)) needs to be inserted into the parent node (T).
Parent node has link. The parent node T also has a valid link pointer that points to S. The key has to be inserted into S instead of T.
The three locks are:
On leaf node A1: to prevent further splits of this node (and of the nodes its link points to)
On T: when Move.right is performed (see below).
On S: when Move.right is performed (see below).
[Move.right]
while True:
    S = scannode(v, T)
    if isLinkPointer(S):
        lock(S)    # <-- 3 locks *
        unlock(T)  # <-- back to 2 locks
        T = S
    else:
        break
Locks 2 and 3 are more like "transition locks" that have to be acquired when moving right. Therefore the 3-lock scenario really lasts only a tiny, tiny amount of time.
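To make the lock count concrete, here is a minimal Python-style sketch of the scenario described above. It is not the paper's pseudocode: Node and do_insert are simplified stand-ins, and the scannode/isLinkPointer check is written directly against a high key and a right-link field.

import threading

class Node:
    # Minimal stand-in for a B-link node: keys/children omitted, only what the sketch needs.
    def __init__(self, high_key=None, link=None):
        self.lock = threading.Lock()
        self.high_key = high_key   # largest key this node is responsible for
        self.link = link           # right-link pointer (None on the rightmost node)

def move_right_and_insert(A1, T, v, do_insert):
    # A1 is the already-locked left half of the split leaf (lock 1),
    # T is the candidate parent, v = max(A1) is the separator key to insert.
    T.lock.acquire()                 # 2 locks held: A1 + T
    while T.link is not None and v > T.high_key:
        S = T.link                   # scannode(v, T) returned a link pointer
        S.lock.acquire()             # 3 locks held: A1 + T + S   (*)
        T.lock.release()             # immediately back to 2 locks
        T = S                        # keep moving right
    A1.lock.release()                # the leaf lock is no longer needed
    do_insert(T, v)                  # insert the separator into the correct parent
    T.lock.release()

The asterisked line is the only point where three locks coexist, which is why the window is so short.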
A Crude Graphic Illustration.

Related

If the leaf node to the left of the current node has space, should I redistribute the records before insertion?

I'm working on a homework problem involving B+ Trees. The question asks us to insert (2,3,5,7,11,17,19,23,29,31) in a B+ Tree whose node structure has 4 pointers.
This is where I have reached.
[Image 2: solution so far]
I want to know: now, to insert 17, should I split the current node into two and distribute the key-values (same as step 2, Image 1), or should I move 5 to the leaf node on the left, update the key-value in the parent node to 7, and then insert 17 by shifting 7 and 11 one place left (as in Image 2)? The second approach would impose computational overhead but would save space. But there is no mention of this in the text (Database System Concepts).

Priority queue/min heap for directed graph with Dijkstra's algorithm

For an assignment I need to implement Dijkstra's algorithm using a priority queue implemented as a min heap. I already have an implementation in which I use a distance table and an unordered array of the unprocessed vertexes. (As described towards the bottom of the page here).
An example input file is given below, where the first line is the number of vertexes and each line after is: source, destination, weight.
3
1 2 3
3 2 1
1 3 1
My initial thought was to treat each edge in the graph as a node in the heap, but that isn't right because one of the test files has 15677372 edges and I can't even make an array that big without an immediate segfault. After implementing the method with the array of unprocessed vertexes it seems that I need to somehow replace that array with the heap, but I'm not sure how to go about that.
Could anyone point me in the right direction?
Typically, in Dijkstra's algorithm, your priority queue will hold the nodes in the graph along with the best estimated distance to each node so far. The standard technique is to enqueue all nodes in the graph into the priority queue initially at distance ∞, then to use the queue's decrease-key operation to lower them as needed.
This is often completely infeasible from a memory standpoint, so an alternative is to keep the queue empty initially and then seed it with the start node at distance 0. Every time you process a node, you then update the distance estimate for each of its adjacent nodes. You do this as follows:
If the node is already in the priority queue, you can use decrease-key to lower the node's distance.
If the node is not already in the priority queue, then you insert it at the required priority.
This keeps the priority queue small for graphs that aren't absurdly dense.
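Since Python's heapq has no decrease-key, a common substitute for the approach just described is to push a fresh entry whenever a node's distance improves and to skip stale entries when they are popped. A minimal sketch, assuming vertices are numbered 1..n and edges are (source, destination, weight) triples as in the input format from the question:

import heapq

def dijkstra(n, edges, source):
    adj = {u: [] for u in range(1, n + 1)}
    for u, v, w in edges:
        adj[u].append((v, w))            # directed edge u -> v with weight w

    dist = {source: 0}
    pq = [(0, source)]                   # seed the queue with the start node only
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                     # stale entry: a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd             # emulates decrease-key by pushing a new entry
                heapq.heappush(pq, (nd, v))
    return dist

print(dijkstra(3, [(1, 2, 3), (3, 2, 1), (1, 3, 1)], source=1))   # {1: 0, 2: 2, 3: 1}

Only vertices that are actually reached ever enter dist or the queue, which mirrors the point above about keeping the priority queue small.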
Hope this helps!

Understanding B+ tree insertion

I'm trying to create a B+ tree with the following sequence,
10 20 30 40 50 60 70 80 90 100
all index nodes should have a minimum of 2 and a maximum of 3 keys. I was able to insert up to 90, but as soon as I insert 100 it increases the height from 2 to 3.
The problem is that the second child of the root ends up with only one key, and I cannot fix it. It should have at least 2, right? Can someone guide me?
UPDATE: I'm following this algorithm:
If the bucket is not full (at most b - 1 entries after the insertion), add the record.
Otherwise, split the bucket.
Allocate new leaf and move half the bucket's elements to the new bucket.
Insert the new leaf's smallest key and address into the parent.
If the parent is full, split it too.
Add the middle key to the parent node.
Repeat until a parent is found that does not need to be split.
If the root splits, create a new root which has one key and two pointers. (That is, the value that gets pushed to the new root gets removed from the original node)
P.S: I'm doing it manually, by hand, to understand the algorithm. There's no code!
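For reference, here is a rough Python sketch of the leaf-split step from the algorithm quoted above. Leaf, max_keys and the leaf-chaining field are simplified assumptions, and the parent/root handling from the later steps is left out.

class Leaf:
    def __init__(self, keys=None, next=None):
        self.keys = keys or []
        self.next = next                    # next leaf in the leaf-level linked list

def insert_into_leaf(leaf, key, max_keys):
    leaf.keys.append(key)
    leaf.keys.sort()
    if len(leaf.keys) <= max_keys:          # bucket not full: nothing more to do
        return None
    mid = len(leaf.keys) // 2               # split: move half of the elements out
    new_leaf = Leaf(keys=leaf.keys[mid:], next=leaf.next)
    leaf.keys = leaf.keys[:mid]
    leaf.next = new_leaf
    return new_leaf.keys[0], new_leaf       # the new leaf's smallest key goes to the parent

The returned (separator key, new leaf) pair is what the remaining steps insert into the parent, splitting again at that level if it is already full.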
I believe your B+ Tree is OK, assuming the order of your B+ Tree is 3. If the order is m, each internal node can have ⌈m/2⌉ to m children. In your case, each internal node can have 2 to 3 children. In a B+ Tree, if a node has just 2 children, then it requires only 1 key, so no constraints are violated by your B+ Tree.
If you are still confused, look at this B+ Tree Simulator. Try it.
To get the tree you've drawn after inserting the values 10 to 100, the order of your tree must be 4, not 3. Otherwise the answer given is correct: order m allows m-1 keys in each leaf and each internal node. After that the Wikipedia description gets a bit confusing, as it concentrates on children rather than keys and doesn't mention what to do about rounding. Dealing with just keys, the rules are:
Max keys for all nodes = Order-1
Min keys for leaf nodes = floor(Order/2)
Min keys for internal nodes = floor(maxkeys/2)
So you are correct in having one key in the node (order=4, max=3, minleaf=2, minnode=1). You might find this page useful as it has an online JavaScript version of the processes as well as documentation of both insert and delete:
http://goneill.co.nz/btree.php
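To make the arithmetic in these rules explicit, here are a few lines of Python that just restate the three formulas above:

def key_limits(order):
    max_keys = order - 1        # Max keys for all nodes = Order - 1
    min_leaf = order // 2       # Min keys for leaf nodes = floor(Order / 2)
    min_node = max_keys // 2    # Min keys for internal nodes = floor(maxkeys / 2)
    return max_keys, min_leaf, min_node

print(key_limits(4))   # (3, 2, 1): matches "order=4, max=3, minleaf=2, minnode=1" above
print(key_limits(3))   # (2, 1, 1)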

Parallel binary tree using MPI

How are binary trees implemented in C/MPI? And in particular, how can I gather on one processor the missing parts of the tree?
Say I want to divide a rectangle into N sub-rectangles, N being the total number of processes.
I start with a division into two sub-rectangles (along a direction and at a position that are given by a certain rule, never mind here). The left sub-rectangle is assigned to processes [0, mid_process] and the right sub-rectangle is assigned to the other processes.
Then I do this recursively inside these two sub-rectangles until, eventually, only one process is left in the current sub-rectangle.
By doing this, each process is going to have a "local" part of the tree, which consists of the path from itself to the root node.
However, later on, once the tree is built, I want each process (i.e. each final sub-rectangle) to identify its neighboring processes.
Assuming I have the whole tree locally, this can be done by traversing the tree and, for each leaf, looking at its spatial extent and comparing its boundary with that of the sub-rectangle whose neighbors we're looking for.
The thing is, I'm not sure how to send and receive the missing parts of the tree.
The figure here shows an example of a decomposition among 11 processes, and its associated binary tree.
Let's say I'm process 2. Locally, the tree I have in memory consists of the following nodes: [root], [0-4], [2-4], [2], where the numbers inside [] are the ranks of the processes. The neighbors of process [2] are [0], [1], [3], [4], [5], [8].
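As a rough illustration of the decomposition described above, each rank can compute its own branch of the process-range tree locally. The midpoint split below is an assumed placeholder for the question's actual (geometry-driven) rule, so the intervals it produces won't exactly match the figure:

def local_branch(rank, nprocs):
    # Chain of rank intervals from the root down to this rank's leaf,
    # i.e. something of the shape [root range, ..., (rank, rank)].
    lo, hi = 0, nprocs - 1
    path = [(lo, hi)]                  # root: all processes
    while lo < hi:
        mid = (lo + hi) // 2           # assumed rule: halve the rank range
        if rank <= mid:
            hi = mid                   # descend into the left sub-rectangle
        else:
            lo = mid + 1               # descend into the right sub-rectangle
        path.append((lo, hi))
    return path

print(local_branch(2, 11))

With mpi4py, something like comm.allgather(local_branch(comm.Get_rank(), comm.Get_size())) would then give every rank all the branches, from which the full tree (and hence the neighbor search) could be reconstructed, at the cost of exchanging more than the strictly missing parts.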

Lowest Common Ancestor of a Binary Tree (Not a Binary Search Tree)

I tried working out the problem using Tarjan's algorithm and one algorithm from this website: http://discuss.techinterview.org/default.asp?interview.11.532716.6, but neither is clear. Maybe my recursion concepts are not built up properly. Please give a small demonstration to explain the two approaches above. I have an idea of the Union-Find data structure.
It looks like a very interesting problem, so I have to figure it out anyhow. I'm preparing for interviews.
If any other logic/algorithm exists, please share.
The LCA algorithm tries to do a simple thing: Figure out paths from the two nodes in question to the root. Now, these two paths would have a common suffix (assuming that the path ends at the root). The LCA is the first node where the suffix begins.
Consider the following tree:
          r *
           / \
        s *   *
         / \
      u *   * t
       /   / \
      * v *   *
           / \
          *   *
In order to find the LCA(u, v) we proceed as follows:
Path from u to root: Path(u, r) = usr
Path from v to root: Path(v, r) = vtsr
Now, we check for the common suffix:
Common suffix: 'sr'
Therefore LCA(u, v) = first node of the suffix = s
Note the actual algorithms do not go all the way up to the root. They use Disjoint-Set data structures to stop when they reach s.
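Here is a small Python sketch of the path/common-suffix idea, assuming each node carries a parent pointer (if it doesn't, the two root-to-node paths can be found by searching down from the root instead); the Node class and labels are just for the example tree above:

class Node:
    def __init__(self, parent=None, label=""):
        self.parent = parent
        self.label = label

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path                              # e.g. [u, s, r] for u in the tree above

def lca(x, y):
    px, py = path_to_root(x), path_to_root(y)
    ancestor = None
    while px and py and px[-1] is py[-1]:    # peel the common suffix off the root end
        ancestor = px.pop()
        py.pop()
    return ancestor                          # first node of the common suffix

r = Node(label="r"); s = Node(r, "s"); u = Node(s, "u"); t = Node(s, "t"); v = Node(t, "v")
print(lca(u, v).label)                       # s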
An excellent set of alternative approaches is explained here.
Since you mentioned job interviews, I thought of the variation of this problem where you are limited to O(1) memory usage.
In this case, consider the following algorithm:
1) Scan the tree from node u up to the root, finding the path length L(u)
2) Scan the tree from node v up to the root, finding the path length L(v)
3) Calculate the path length difference D = |L(u)-L(v)|
4) Walk up D nodes from the deeper node (the one with the longer path to the root)
5) Walk up the tree in parallel from the two nodes, until you hit the same node
6) Return this node as the LCA
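A Python sketch of these six steps, assuming nodes with a parent pointer as in the earlier sketch; it only ever stores the two cursors and two depths, hence the O(1) extra memory:

def depth(node):
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d

def lca_constant_memory(u, v):
    du, dv = depth(u), depth(v)          # steps 1-2: path lengths L(u), L(v)
    if du < dv:
        u, v, du, dv = v, u, dv, du      # make u the deeper node
    for _ in range(du - dv):             # steps 3-4: walk up the difference D
        u = u.parent
    while u is not v:                    # step 5: walk up in lock step
        u, v = u.parent, v.parent
    return u                             # step 6: this node is the LCA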
Assuming you only need to solve the problem once (per data set), a simple approach is to collect the set of ancestors of one node (along with the node itself), and then walk the list of ancestors of the other until you find a member of that set, which is necessarily the lowest common ancestor. Pseudocode for that is given:
Let A and B begin as the nodes in question.
seen := set containing the root node
while A is not root:
    add A to seen
    A := A's parent
while B is not in seen:
    B := B's parent
B is now the lowest common ancestor.
Another method is to compute the entire path-to-root for each node, then scan from the right looking for a common suffix. Its first element is the LCA. Which one of these is faster depends on your data.
If you will be needing to find LCAs of many pairs of nodes, then you can make various space/time trade-offs:
You could, for instance, pre-compute the depth of each node, which would allow you to avoid re-creating the sets (or paths) each time: first walk from the deeper node up to the depth of the shallower node, and then walk the two nodes toward the root in lock step; when these paths meet, you have the LCA.
Another approach annotates nodes with their next ancestor at depth-mod-H, so that you first solve a similar-but-H-times-smaller problem and then an H-sized instance of the first problem. This is good on very deep trees, and H is generally chosen as the square root of the average depth of the tree.
