Deletion in B Plus Tree Korth Pseudocode - database

The pseudocode given in Database System Concepts by Korth is the following.
[Pseudocode figure not reproduced here]
Consider that we have the following tree with order = 5:
[7.0]
[3.0, 5.0] [9.0, 11.0, 13.0]
<Leaf Nodes Here> <Leaf Nodes Here>
Now imagine that we are at level 2, in the call delete_entry([3.0, 5.0], 3.0, Pointer to Child). As per the algorithm we delete 3.0, so we are left with [5.0] and [9.0, 11.0, 13.0]. Looking at the first marked if condition, we see that these nodes can be merged, since (1 + 3) <= (order - 1), i.e. 4 keys fit in one node. But since they are not leaf nodes, the second marked line in the code will execute and, apart from merging these two nodes, it will also insert 7 into them. That gives us a node with 5 keys, which violates the condition that the number of keys in a non-leaf node should be <= order - 1.
Am I doing something wrong here?
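For what it's worth, the counting in the question can be restated in a few lines of Python. This only reproduces the arithmetic from the example above; it does not implement the Korth pseudocode, and the node contents are taken from the question:

```python
# Key counts from the example above: order = 5, so a node holds
# at most order - 1 = 4 keys. Illustration of the question's arithmetic only.
order = 5
left = [5.0]                 # [3.0, 5.0] after deleting 3.0
right = [9.0, 11.0, 13.0]

# For leaf nodes, coalescing just concatenates the keys:
print(len(left) + len(right))        # 4, which fits in order - 1

# For internal nodes, the separator key (7.0) is pulled down as well:
print(len(left) + 1 + len(right))    # 5, which exceeds order - 1
```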

Related

Rascal MPL print id of all traversed nodes in a visit

Is it possible to print all node ids / values that are traversed in a visit? Take the following code sample:
top-down visit(asts) {
    case x: println(typeOf(x));
}
I now get all types of the nodes that were visited. I would instead like to know the ids / values of all x that were encountered (without printing the entire tree). In other words, I am curious as to how Rascal traverses the list[Declaration] asts containing lang::java::m3::ASTs.
As a corollary, is it possible to print the direct ancestors / descendants of a Declaration regardless of their type? Or print the total number of children of an ast?
In the documentation (https://www.rascal-mpl.org/docs/rascal/expressions/visit/) there is no mention of this.
In principle a Rascal value does not have an "id". Its structure is its "id" and its id is its structure. You can print a hash if you want, using for example md5sum from the IO module: case value x : println(md5sum(x));
Rascal traverses lists from left to right. In top-down mode it visits the list itself first and then its elements; in bottom-up mode it goes to the children first.
printing the total number of nodes in a tree: (0 | it + 1 | /_ <- t)
printing all children of a node: import Node; and then case node n: println(getChildren(n));
Ancestors are not accessible unless you match deeper and include two levels of nodes in a pattern.

Proof or disproof: adding two minimal values to a B-tree and then deleting them

I came across a question and I'm not sure about the right answer:
We insert two new minimal values w and z, with w > z, into a B-tree --
first we insert w and then z. Right afterwards we delete them in the same order. Does the original B-tree structure stay the same, or do we get a different tree?
It is not guaranteed that the B-tree remains the same. It would be guaranteed if the deletions happened in the opposite order from the insertions, but if the order is:
Insert w
Insert z
Delete w
Delete z
...then it depends on implementation choices, notably how the deletion of a value that occurs in a non-leaf node is dealt with.
Here is a counterexample using a 2-3 tree, i.e. a B-tree of order 3:
[5 , -]
/ |
[4,-] [6,7]
So we have a root with (separator) value 5 and an empty slot. There are two leaves: the first leaf is half filled, with value 4, while the right leaf is completely occupied with values 6 and 7.
Now let w=2 and z=1.
After we insert 2, we get this tree -- nothing special happens:
[5 , -]
/ |
[2,4] [6,7]
Then, to insert 1, we must split the leftmost leaf, and move 2 as separator value to the parent node:
[2 , 5]
/ | \
[1,-] [4,-] [6,7]
Now we get to the critical part: the deletion of 2 gives us a choice. Wikipedia describes that choice as follows:
Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
If we choose the second option, then that means we choose 4 as new separator value to replace the value 2. This gives us the following intermediate situation:
[4 , 5]
/ | \
[1,-] [-,-] [6,7]
The empty leaf in the middle is underflowing, so we try to rotate. We must perform a rotation with the right-sided neighbor, as the other one does not have enough values, and so we move the 6 up, and the 5 down:
[4 , 6]
/ | \
[1,-] [5,-] [7,-]
...and the tree is valid again. But,... it is not the original tree.
So, this single counterexample is enough to prove that the claim is false.
If, however, there were the extra information that the algorithm always takes the first alternative for the deletion of an internal value, then the claim appears to be true.

Check if 2 BSTs have the same in-order traversal simultaneously using recursion

I am looking at this problem discussed on YouTube:
Given two binary trees, determine whether they have the same inorder traversal:
Tree 1 Tree 2
5 3
/ \ / \
3 7 1 6
/ / / \
1 6 5 7
[1,3,5,6,7] [1,3,5,6,7]
I wanted to know how to solve this problem by doing a simultaneous in-order traversal of both trees using only recursion. I know people alluded to it in the comment section but I assume they mean iteratively.
My first thought was to use 1 function and pass in 2 lists to hold the values of the trees, and then compare them in the parent function. But this seems to work only on trees that have the same height (I think).
def dual_inorder_traversal(self, p, q, pn=[], qn=[]):
    if not p and not q:
        return pn, qn
    if not q:
        pn.append(p.val)
    if not p:
        qn.append(q.val)
    if p and q:
        self.dual_inorder_traversal(p.left, q.left, pn, qn)
        pn.append(p.val)
        qn.append(q.val)
        self.dual_inorder_traversal(p.right, q.right, pn, qn)
    return pn, qn
I then tried mutual recursion, in which I have two functions, one that recurses through tree1 and another that recurses through tree2, and have them call each other. My idea was that we could append the root in the tree1 function when we visit it, and then pop it from the list and compare when we visit the root in the tree2 function. Not even going to post what I tried because it didn't work at all lol. Also not sure if mutual recursion is even possible in this case.
You are right that your first attempt to recurse through both trees in a single recursive function is difficult to do when the trees have different shapes: you may need to recur deeper in one tree, while in the other you are stuck at a leaf. This means you need to maintain lists to collect values (as can be seen in your code).
But storing values in lists is really defeating the purpose of the challenge. You might then as well collect all values in two lists and compare the lists, or collect the values of one tree in one list and compare against that list while making the traversal in the other (as is done in the video you linked to). Yet, it should not be necessary to use that much space for the job.
The idea to have two separate traversals going on in tandem is more promising. It becomes really easy when you make the traversal function a generator. This way that function stops running whenever the next value is found, giving you the opportunity to proceed in the other traversal also with one step.
Here is an inorder generator you could define in the Node class:
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def inorder(self):
        if self.left:
            yield from self.left.inorder()
        yield self.value
        if self.right:
            yield from self.right.inorder()
Now it is a matter of iterating through p.inorder() and q.inorder() in tandem. You can write this with a loop, but itertools has a very useful zip_longest function for that purpose:
from itertools import zip_longest

def same_inorder(p, q):
    return all(v1 == v2 for v1, v2 in zip_longest(p.inorder(), q.inorder()))
There is a boundary case when the trees have a different number of nodes, but their inorder traversal is the same until the smaller tree is fully iterated. Then zip_longest will fill up the gap and produce None values to compare with. So as long as the tree nodes do not have None as value, that is fine. If however you expect nodes to have None values (quite odd), then use a different filler value. For instance nan could be an option:
zip_longest(p.inorder(), q.inorder(), fillvalue=float("nan"))
As a demo, let's take these two trees:
5 3
/ \ / \
3 7 1 6
/ / / \
1 6 5 7
Then you can create and compare their inorder traversals as follows:
tree1 = Node(5,
    Node(3,
        Node(1)
    ),
    Node(7,
        Node(6)
    )
)
tree2 = Node(3,
    Node(1),
    Node(6,
        Node(5),
        Node(7)
    )
)
print(same_inorder(tree1, tree2)) # True
Time and Space use
Let's define 𝑚 as the number of nodes in the first tree and 𝑛 as the number of nodes in the second tree.
Runtime complexity = O(𝑚+𝑛)
Average auxiliary space complexity = O(log(𝑛) + log(𝑚))
The auxiliary space excludes the space already used by the input, which is O(𝑛+𝑚). On top of this we use the call stack during the recursion. On average (for reasonably balanced trees) that will be O(log(𝑛) + log(𝑚)); for degenerate trees it can reach O(𝑛+𝑚) in the worst case.

How to find the maximum sum of nodes in a tree

I am given two arrays: one defines the parent of each node and the other gives the values of the nodes.
arr1 = {0, 1, 1, 1, 3, 3, 4}
arr2 = {22, 100, 3, 3, 4, 5, 9}
arr1 defines the relationships: node 1 is the root (parent 0), the parent of nodes 2, 3 and 4 is node 1, the parent of nodes 5 and 6 is node 3, and the parent of node 7 is node 4.
arr2 gives the values of the nodes: node 1 has a value of 22 and node 2 has a value of 100.
I have to find the maximum sum of nodes such that no two included nodes have a parent or grandparent relationship.
sample input:
a[i]=[0,1,1,1,3,3,6,6]
b[i]=[1,2,3,4,5,100,7,8]
output: 111
I am new to DS and algorithms and am not able to even think of a solution. Any help is appreciated.
You can solve it using Dynamic Programming.
Consider an array dp[] which stores the answer for each of the vertex and its subtree.
Now the state of the DP would be:
dp[currentVertex] = max(sum of all children's dp[],
                        b[currentVertex] + sum of dp[] of all vertices
                        whose great-grandparent is currentVertex)
You need to build the DP table using a bottom-up approach, so start from the leaves.
The answer is dp[root] after all the calculations.
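A minimal bottom-up sketch of that DP in Python, assuming the recurrence above: if you include a vertex, its children and grandchildren are excluded and you continue at its great-grandchildren; if you exclude it, you take the best answer of each child subtree. The function and variable names here are my own:

```python
def max_node_sum(a, b):
    # a[i-1] is the parent of node i (0 marks the root); b[i-1] is node i's value.
    n = len(a)
    children = [[] for _ in range(n + 1)]   # 1-indexed adjacency lists
    root = 0
    for i, parent in enumerate(a, start=1):
        if parent == 0:
            root = i
        else:
            children[parent].append(i)

    dp = [0] * (n + 1)

    def solve(v):
        # post-order: solve all child subtrees first
        for c in children[v]:
            solve(c)
        exclude = sum(dp[c] for c in children[v])
        # if v is included, the next selectable level is its great-grandchildren
        ggc = sum(dp[g]
                  for c in children[v]
                  for gc in children[c]
                  for g in children[gc])
        dp[v] = max(exclude, b[v - 1] + ggc)

    solve(root)
    return dp[root]

print(max_node_sum([0, 1, 1, 1, 3, 3, 6, 6],
                   [1, 2, 3, 4, 5, 100, 7, 8]))   # 111, matching the sample
```

On the sample input this picks nodes 2, 4, 5 and 6 (2 + 4 + 5 + 100 = 111); siblings are allowed, only parent and grandparent pairs are forbidden.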

Approximate Order-Preserving Huffman Code

I am working on an assignment for an Algorithms and Data Structures class. I am having trouble understanding the instructions given. I will do my best to explain the problem.
The input I am given is a positive integer n followed by n positive integers which represent the frequency (or weight) for symbols in an ordered character set. The first goal is to construct a tree that gives an approximate order-preserving Huffman code for each character of the ordered character set. We are to accomplish this by "greedily merging the two adjacent trees whose weights have the smallest sum."
In the assignment we are shown that a conventional Huffman code tree is constructed by first inserting the weights into a priority queue. Then, by using a delmin() function to "pop" the root off the priority queue, I can obtain the two nodes with the lowest frequencies and merge them into one node whose left and right children are these two lowest-frequency nodes and whose priority is the sum of the priorities of its children. This merged node is then inserted back into the min-heap. The process is repeated until all input nodes have been merged. I have implemented this using an array of size 2n-1, with the input nodes at positions 0...n-1 and the merged nodes at positions n...2n-1.
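The conventional construction described above can be sketched with Python's heapq; this is the textbook algorithm, not the order-preserving variant the assignment asks for, and the names are mine:

```python
import heapq

def huffman(weights):
    # Heap entries are (weight, tie_breaker, tree). The tie breaker keeps
    # tuple comparison from ever reaching the (incomparable) tree part.
    heap = [(w, i, i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    next_id = len(weights)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # the two lowest-frequency trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    return heap[0]   # (total weight, id, nested-tuple tree)

total, _, tree = huffman([1, 2, 3, 3, 2, 1, 1, 2, 3])
print(total)   # 18, the sum of all input weights
```

Leaves here are represented by their index in the input and internal nodes by (left, right) tuples; the array-of-size-2n-1 representation from the question would work just as well.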
I do not understand how I can greedily merge the two adjacent trees whose weights have the smallest sum. My input has basically been organized into a min-heap and from there I must find the two adjacent nodes that have the smallest sum and merge them. By adjacent I assume my professor means that they are next to each other in the input.
Example Input:
9
1
2
3
3
2
1
1
2
3
Then my min-heap would look like so:
1
/ \
2 1
/ \ / \
2 2 3 1
/ \
3 3
The two adjacent trees (or nodes) with the smallest sum, then, are the two consecutive 1's that appear near the end of the input. What logic can I apply to start with these nodes? I seem to be missing something but I can't quite grasp it. Please, let me know if you need any more information. I can elaborate myself or provide the entire assignment page if something is unclear.
I think this can be done with a small modification to the conventional algorithm. Instead of storing single trees in your priority queue, store pairs of adjacent trees. Then, at each step, you remove the minimum pair (t1, t2), as well as the up to two pairs that also contain those trees, i.e. (u, t1) and (t2, r). Then merge t1 and t2 into a new tree t', re-insert the pairs (u, t') and (t', r) into the heap with updated weights, and repeat.
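A simpler (though quadratic) way to realize the same greedy rule is to keep the trees in a list in input order and repeatedly merge the adjacent pair with the smallest combined weight. This sketch implements that plain scan, not the pair-heap bookkeeping described above:

```python
def order_preserving_huffman(weights):
    # trees[i] = (weight, tree); a leaf is represented by its weight,
    # an internal node by a (left, right) tuple. Input order is preserved.
    trees = [(w, w) for w in weights]
    while len(trees) > 1:
        # index of the adjacent pair with the smallest combined weight
        # (ties go to the leftmost pair)
        i = min(range(len(trees) - 1),
                key=lambda k: trees[k][0] + trees[k + 1][0])
        (w1, t1), (w2, t2) = trees[i], trees[i + 1]
        trees[i:i + 2] = [(w1 + w2, (t1, t2))]
    return trees[0]

weight, tree = order_preserving_huffman([1, 2, 3, 3, 2, 1, 1, 2, 3])
print(weight)   # 18, the sum of all input weights
```

Because only adjacent trees are ever merged, an in-order walk of the final tree yields the leaves in their original input order, which is what makes the resulting code order-preserving.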
You need to pop two trees and make a third tree: join the tree with the smaller sum to its left node and the second tree to its right node. Put this tree back on the heap. From your example:
Pop 2 trees from the heap:
1 1
Make a tree:
?
/ \
? ?
Put the smaller tree in the left node:
min(1, 1) = 1
?
/ \
1 ?
Put the second tree in the right node:
?
/ \
1 1
The tree you made has sum = sum of left node + sum of right node:
2
/ \
1 1
Put the new tree (sum 2) on the heap.
Finally you will have one tree; it's the Huffman tree.