AVL Tree insertion - c

How do I calculate the balance factor for a particular node while recursively calling an insert function to add a node to an AVL tree? I haven't started on the rotation logic; I simply want to calculate the balance factors.
In my current attempt I am forced to store the heights of the left and right subtrees, because I can't find the balance factor without them.
typedef struct _avlTree
{
    int num;
    int balFactor;
    int height[2]; // left & right subtree heights
    struct _avlTree *left, *right;
} *avlTree;

int avlAdd(avlTree a, avlTree aNew)
{
    ...
    if(a->left == NULL) // left subtree insertion case
    {
        a->left = aNew;
        return(1);
    }
    else
    {
        a->height[0] = avlAdd(a->left, aNew);
        a->balFactor = a->height[0] - a->height[1];
        return( (a->height[0] > a->height[1]) ? (a->height[0]+1) : (a->height[1]+1) );
    }
    ...
}

The balance factor of a node is the difference between the heights of its right and left subtrees.
When creating a new node, initialize its balance factor to zero, since it has no subtrees and is therefore balanced.
If an insertion goes into the right subtree and makes it taller, increase the balance factor by 1.
If an insertion goes into the left subtree and makes it taller, decrease the balance factor by 1.
After rebalancing (rotating), if the height of the subtree rooted at this node increased, recursively propagate the height increase to the parent node. A sketch of this incremental bookkeeping follows below.
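For illustration, here is a minimal sketch of that bookkeeping, assuming a hypothetical node layout (Node, num, balFactor; not the asker's struct) and ignoring duplicate keys; the insert returns 1 when the subtree it was called on grew taller, which is exactly the information the parent needs:
/* Hypothetical node type for this sketch (not the asker's avlTree struct). */
typedef struct Node {
    int num;
    int balFactor;                /* height(right) - height(left); 0 for a new node */
    struct Node *left, *right;
} Node;

/* Inserts nNew below n and returns 1 if the subtree rooted at n grew taller.
   Rotations are intentionally left out, matching the question; once they are
   added, the +2 / -2 cases would be handled before returning. */
static int insert(Node *n, Node *nNew)
{
    int grew;
    if (nNew->num < n->num) {                 /* go left */
        if (n->left == NULL) {
            n->left = nNew;
            grew = 1;
        } else {
            grew = insert(n->left, nNew);
        }
        if (grew)
            n->balFactor -= 1;                /* the left side got taller */
        /* n itself grows only if it is now left-heavy */
        return grew && n->balFactor < 0;
    } else {                                  /* go right */
        if (n->right == NULL) {
            n->right = nNew;
            grew = 1;
        } else {
            grew = insert(n->right, nNew);
        }
        if (grew)
            n->balFactor += 1;                /* the right side got taller */
        return grew && n->balFactor > 0;
    }
}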

Here is a very simple approach. Given a recursive height() function, the balance factor can be computed simply as
node->balFactor = height( node->right ) - height( node->left );
This is not the best approach, though, since height() recomputes the height by visiting every node of the subtree, so the cost at each node is proportional to the subtree's size. A better approach needs a bigger discussion :)
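For concreteness, the simple approach above might be implemented like the sketch below (the node layout here is hypothetical and does not cache heights, which is exactly why the recomputation is costly):
/* Hypothetical node layout for this sketch only. */
struct node {
    int num;
    int balFactor;
    struct node *left, *right;
};

/* Height of the subtree rooted at n; an empty subtree has height 0. */
static int height(const struct node *n)
{
    if (n == NULL)
        return 0;
    int hl = height(n->left);
    int hr = height(n->right);
    return (hl > hr ? hl : hr) + 1;
}

/* Recompute the balance factor of a single node from scratch. */
static void setBalanceFactor(struct node *n)
{
    n->balFactor = height(n->right) - height(n->left);
}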
There are numerous resources on AVL trees on the web; a chosen few are:
http://en.wikipedia.org/wiki/AVL_tree
C implementation: http://www.stanford.edu/~blp/avl/libavl.html
Animation: http://www.cs.jhu.edu/~goodrich/dsa/trees/avltree.html
Animation: http://www.strille.net/works/media_technology_projects/avl-tree_2001/
BTW, the avlAdd() function looks wrong: I don't see where aNew->num is compared to a->num. Whether to go into the left or the right subtree must depend on that comparison; the given code seems to add to the left subtree unconditionally.

Related

Implement data structure with efficient insert/findKthSmallest operations

I was asked this question by an interviewer. I tried solving it using an array, making sure the array stays sorted on each insert, but I don't think that is the best solution. What would be a good solution for this problem?
You can use a balanced binary search tree (e.g. a red-black BST) for this.
Simply insert the nodes into the balanced BST, and have each node maintain its rank within its own subtree. Since we need to find the k-th smallest element, we can maintain the count of elements in the left subtree.
Now start traversing from the root and check:
If k = N + 1, where N is the number of nodes in the root's left subtree, then the root is the k-th node.
Else if k <= N, continue searching in the left subtree of the root.
Else if k > N + 1, continue searching in the right subtree, this time for the (k - N - 1)-th smallest element.
The time complexity of insertion into a balanced BST is O(log n), and searching for the k-th smallest element is O(h), where h is the height of the tree.
You can use an order statistic tree. Essentially, it is a self-balancing binary search tree in which each node also stores the cardinality (size) of its subtree. Maintaining the cardinality does not increase the complexity of any tree operation, but it allows your query to be answered in O(log n) time (counting k from 0 here):
If the left subtree has cardinality > k, recurse on the left subtree.
Otherwise, if the left subtree has cardinality < k, recurse on the right subtree, searching there for the element of rank k - cardinality - 1.
Otherwise, the left subtree's cardinality equals k, so return the current node's element.
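As a sketch of that query (not a full order statistic tree; the self-balancing machinery is omitted and the node layout is invented for the example), with k counted from 0 as in the steps above:
#include <stddef.h>

/* Hypothetical order-statistic node: a BST node that also caches the
   number of nodes in its subtree. Rebalancing is omitted here. */
struct osnode {
    int key;
    size_t size;                /* number of nodes in this subtree */
    struct osnode *left, *right;
};

static size_t subtree_size(const struct osnode *n)
{
    return n ? n->size : 0;
}

/* Returns the k-th smallest key, k counted from 0.
   Assumes 0 <= k < subtree_size(n). */
static int kth_smallest(const struct osnode *n, size_t k)
{
    size_t left = subtree_size(n->left);
    if (k < left)
        return kth_smallest(n->left, k);            /* rank k is in the left subtree */
    if (k == left)
        return n->key;                              /* this node has rank k */
    return kth_smallest(n->right, k - left - 1);    /* skip left subtree and this node */
}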

Find maximum subtree in the given BST such that it has no duplicates

Given a BST that allows duplicates as separate vertices, how do I find the highest subtree that contains no duplicates?
This is the idea:
(1) Check whether the root value appears in its right subtree (values are inserted this way: left < root <= right). If not, the tree has no duplicate of that value. I always look for it by going left from the root's right child.
(2) Traversing the tree and doing (1) at each node, I can find all subtrees without duplicates, storing their root pointers and heights.
(3) Comparing heights, I can find the largest sought subtree.
I don't know how to store this information while traversing. I found programs for finding all duplicate subtrees of a BST that use hash maps, but if possible I would prefer to avoid hash maps, as I haven't covered them in my course yet.
typedef struct vertex {
    int data;
    struct vertex *left;
    struct vertex *right;
} vertex, *pvertex;

// Utility functions
int Height(pvertex t){
    if (t == NULL)
        return 0;
    if (Height(t->left) > Height(t->right))
        return Height(t->left) + 1;
    else
        return Height(t->right) + 1;
}
int DoesItOccur(pvertex t, int k){
    if(!t)
        return 0;
    if(t->data==k)
        return 1;
    if(t->data<k){
        return DoesItOccur(t->left,k);
    }
}
// My function
pvertex MaxSeeked(pvertex t){
    if(!t)
        return NULL;
    if(DoesItOccur(t->right,t->data)==0)
        return t;
    else if{
        if(t->left && t->right){
            if(Height(MaxSeeked(t->left))>Height(MaxSeeked(t->right)))
                return t->left;
            else
                return t->right;
        }
    }
    else if{
        ......
    }
}
Note in the first place that you only need to track the subtrees of the maximal height discovered so far, or perhaps just one of them, if one is all you need to report. For efficiency, you should also track what that maximal height actually is.
I'll suppose that you must not add members to your node structure; if you could, you could add a member or two in which to record whether the tree rooted at each node contains any dupes and how high that tree is. You could populate those data as you go, remember what the maximum height is, and then make a second traversal to collect the nodes.
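For instance, were modifying the node allowed, the augmentation might look like this (a hypothetical variant of the question's struct, shown only for contrast):
/* Hypothetical augmented node, only relevant if adding members were allowed. */
typedef struct vertex_aug {
    int data;
    struct vertex_aug *left;
    struct vertex_aug *right;
    int height;      /* height of the subtree rooted here */
    int dupe_free;   /* 1 if this subtree contains no duplicate values */
} vertex_aug;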
But without modifying the nodes themselves, you can still track the current candidates by other means, such as a linked list, and you can put whatever metadata you want into that tracking structure. For example,
struct nondupe_subtree {
    struct vertex *root;
    int height;
    struct nondupe_subtree *next;
};
You can then, say, perform a selective traversal of your tree in breadth-first order, carrying along a linked list of struct nondupe_subtree nodes:
Start by visiting the root node.
Test the subtree rooted at each visited node to see whether it contains any dupes, according to the procedure you have described.
If it does, enqueue its children for traversal.
If it does not, measure the subtree's height and update your linked list (or not) accordingly, and do not enqueue this node's children.
When no more nodes are enqueued for traversal, your linked list contains the roots of all the maximal-height subtrees without dupes. (A sketch of the list-update step appears after the notes below.)
Note that this algorithm would in many cases be significantly sped up if you could compute and store all the subtree heights in an initial DFS pass, because it is otherwise prone to performing duplicate tree-height computations, many of them in some cases.
Note also that although it does simplify this particular algorithm, your rule of always putting dupes to the right works against keeping the tree balanced, which may also reduce performance. In the worst case, where all vertices are duplicates, your "tree" will perforce be linear.
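As a sketch of just that list-update step (the part the question asks about, i.e. how to remember candidates while traversing), assuming the question's struct vertex and Height() and the struct nondupe_subtree above; the helper name record_candidate is invented for this illustration:
#include <stdlib.h>

/* Assumed to exist: the question's struct vertex and Height(), and the
   struct nondupe_subtree defined above. */
struct vertex;
int Height(struct vertex *t);

/* Record t as a candidate dupe-free subtree. If it beats the best height
   seen so far, discard the old candidates; if it ties, add it alongside them. */
static void record_candidate(struct nondupe_subtree **best, int *best_height,
                             struct vertex *t)
{
    int h = Height(t);
    if (h < *best_height)
        return;                        /* shorter than the current maximum: ignore */
    if (h > *best_height) {
        *best_height = h;              /* new maximum: drop the old list */
        while (*best) {
            struct nondupe_subtree *dead = *best;
            *best = dead->next;
            free(dead);
        }
    }
    struct nondupe_subtree *entry = malloc(sizeof *entry);
    entry->root = t;
    entry->height = h;
    entry->next = *best;
    *best = entry;
}

/* Usage: start with
       struct nondupe_subtree *best = NULL;
       int best_height = 0;
   and call record_candidate(&best, &best_height, node) for every visited
   node whose subtree turned out to be dupe-free. */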

How to check whether a complete binary tree is value-balanced

How can I check whether a given complete binary tree, represented by an array, is a value-balanced binary tree? By value-balanced I mean that for each and every node, the sum of the integer values of the nodes on its left-hand side equals the sum of the values on its right-hand side. What is a C-like algorithm for this?
It's easy to find the indices of the nodes that have children, but I'm unable to develop the logic for computing the sum at each node recursively. The sum needs to be computed in such a way that, for any particular node, the sum of all nodes of its left subtree equals that of its right-hand counterpart, and likewise all the way down. How is this possible using an array?
You can do a post-order traversal of the tree that sums each subtree and, when back at the root (of each subtree), checks whether the two subtrees have the same weight.
C-like pseudo-code:
int res = 1; // global flag; a pointer to res could be passed around instead
int verifySums(Node* root) {
    if (root == NULL) return 0;
    int leftSum = verifySums(getLeft(root));
    int rightSum = verifySums(getRight(root));
    if (leftSum != rightSum) res = 0;
    return leftSum + rightSum + getValue(root);
}
Where
Node* getLeft(Node*) returns a pointer to a Node representing the left child of the argument,
Node* getRight(Node*) returns a pointer to a Node representing the right child of the argument,
int getValue(Node*) returns the value of the given node.
The idea is to do a post-order traversal that sums the values of all children to the left, gets the sum to the right, and then:
Verifies the two sums match; if they don't, the answer for the entire tree is no, which is recorded in res.
Adds the two sums plus the current node's value, and returns that back to the parent.
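Since the question stores the complete binary tree in an array, the same post-order idea can be written with index arithmetic instead of pointers (children of index i at 2*i + 1 and 2*i + 2, 0-based); the following is a sketch with names chosen for the example, not part of the answer above:
/* Post-order sum over a complete binary tree stored in an array a[0..n-1].
   Clears *balanced if any node's left and right subtree sums differ. */
static int subtree_sum(const int a[], int n, int i, int *balanced)
{
    if (i >= n)
        return 0;                                    /* no node at this index */
    int leftSum  = subtree_sum(a, n, 2 * i + 1, balanced);
    int rightSum = subtree_sum(a, n, 2 * i + 2, balanced);
    if (leftSum != rightSum)
        *balanced = 0;
    return leftSum + rightSum + a[i];
}

/* Usage:
       int a[] = {1, 2, 2, 3, 3, 3, 3};   // a value-balanced complete tree
       int balanced = 1;
       subtree_sum(a, 7, 0, &balanced);   // balanced remains 1 here */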

Need some explanation about trees in C

Leaf *findLeaf(Leaf *R, int data)
{
    if(R->data >= data)
    {
        if(R->left == NULL) return R;
        else return findLeaf(R->left,data);
    }
    else
    {
        if(R->right == NULL) return R;
        else return findLeaf(R->right,data);
    }
}
void traverse(Leaf *R)
{
    if(R==root){printf("ROOT is %d\n",R->data);}
    if(R->left != NULL)
    {
        printf("Left data %d\n",R->left->data);
        traverse(R->left);
    }
    if(R->right != NULL)
    {
        printf("Right data %d\n",R->right->data);
        traverse(R->right);
    }
}
These code snippets work fine, but I wonder how they work. I need a brief explanation of recursion. I'm thankful for your help.
A Leaf struct will look something like this:
typedef struct struct_t {
    int data;
    struct struct_t *left;  // These allow structs to be chained together: each node
    struct struct_t *right; // has two pointers to two more nodes, so the number of
} Leaf;                     // nodes can grow exponentially with the depth.
The function takes a pointer to a Leaf that we call R and some data to search against; it returns a pointer to a Leaf.
Leaf *findLeaf(Leaf *R,int data){
This piece of code decides whether we should go left or right; the tree is known to be ordered because the insert function follows this same rule for going left and right.
if(R->data >= data ){
This is the base case of the recursive function: if we have reached the last node on this path, the Leaf, we return that Leaf.
The base case of a recursive function has the task of ending the recursion and returning a result. Without it, the function would never finish.
if(R->left == NULL) return R;
This is how we walk through the tree. Here we traverse down the left side because the node's data was larger than (or equal to) the data we are searching for; smaller or equal data always goes to the left, which keeps the tree ordered.
What is happening is that we now call findLeaf() with R->left, but imagine we get to this point again in that next call.
It will become R->left->left in reference to the first call. If instead the data were larger than the current node's, we would go right.
else return findLeaf(R->left,data);
Now we are at the case where the data was larger than the current node's, so we are going right.
} else {
This is exactly the same as with the left.
if(R->right == NULL) return R;
else return findLeaf(R->right,data);
}
}
In the end, the result of the function can be conceptualized as following a chain like R->right->right->left, stopping at the Leaf whose pointer in that direction is NULL.
Let's take an example tree and operate on it with findLeaf(); in the tree used below, 8 is the root, 3 is its left child, 6 is 3's right child, and 4 is 6's left child.
findLeaf(root, 4) // in this example, root is already pointing to (8)
We start at the root, at the top of the tree, which contains 8.
First we check R->data >= data, where we know R->data is (8) and data is (4). Since data is smaller than R->data (the current node), we enter the if statement.
Here we look at the left Leaf, checking whether it is NULL. It isn't, and so we skip to the else.
Now we return findLeaf(R->left, data);, but to return it we must evaluate it first. This takes us into a second call, where we compare (3) to (4) and try again.
Going through the entire process again, we compare (6) to (4) and then finally find our node when we compare (4) to (4). Now we backtrack through the calls, and the result comes back along a chain like this:
R(8)->(3)->(6)->(4)
Each Leaf contains three values:
data - an integer
left and right, both pointers to another leaf.
left, right or both, might be NULL, meaning there isn't another leaf in that direction.
So that's a tree. There's one Leaf at the root, and you can follow the trail of left or right pointers until you reach a NULL.
The key to recursion is that if you follow the path by one Leaf, the remaining problem is exactly the same (but "one smaller") as the problem you had when you were at the root. So you can call the same function to solve the problem. Eventually the routine will be at a Leaf with NULL as its pointer, and you've solved the problem.
It's probably easiest to understand a list before you understand a tree. So instead of a Leaf with two pointers, left and right, you have a Node with just one pointer, next. To follow the list to its end, recursively:
// minimal Node type assumed for this sketch
typedef struct node_t {
    int data;
    struct node_t *next;
} Node;

Node *findEnd(Node *node) {
    if(node->next == NULL) {
        return node; // Solved!!
    } else {
        return findEnd(node->next);
    }
}
What's different about your findLeaf? Well, it uses the data parameter to decide whether to follow the left or right pointer, but otherwise it's exactly the same.
You should be able to make sense of traverse() with this knowledge. It uses the same principle of recursion to visit every Leaf in the structure.
Recursion means a function breaks a problem down into two parts:
one step of solving the problem, plus calling itself with the remainder of the problem;
the last step of solving the problem (the base case).
Recursion is simply a different way of looping through code.
Recursive algorithms generally work hand in hand with some form of data structure, in your case the tree. You need to imagine the recursion, at a very high level, as "reapply the same logic on a subset of the problem".
In your case the subset of the problem is either the branch of the tree on the right or the branch of the tree on the left.
So, let's look at the traverse algorithm:
It takes the leaf you pass to the method and, if it's the ROOT leaf, says so.
Then, if there is a "left" sub-leaf, it displays the data attached to it and restarts the algorithm (the recursion) on that left node.
If that left node were the ROOT it would say so, but that has no chance of happening after the first recursion, since the ROOT is at the top.
Then, if there is a "left" sub-leaf of our left node, it displays it and restarts the algorithm on this left, left node.
When reaching the bottom left, i.e. when there is no left leaf left (following? :) ), it does the same on the first right leaf. If there is neither a left leaf nor a right leaf, meaning we are at a real leaf with no sub-leafs, the recursive call ends; the algorithm resumes from the place it was at before recursing, with all the variables in the state they were in at that point.
After the first recursive call terminates, you move from the bottom-left leaf up one leaf, go down the right leaf if there is one, and start again, printing and moving to the left.
All in all, the end result is that you walk through your whole tree in a left-first way.
Tell me if it's not crystal clear and try to apply the same pattern on the findLeaf recursive algorithm.
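To make that walk concrete, here is a small driver using the question's Leaf, traverse() and global root (the 8/3/6/4 example tree from the earlier answer); makeLeaf() is just a helper invented for this sketch:
#include <stdio.h>
#include <stdlib.h>

Leaf *root;                       /* the global that traverse() compares against */

static Leaf *makeLeaf(int data)
{
    Leaf *l = malloc(sizeof *l);
    l->data = data;
    l->left = l->right = NULL;
    return l;
}

int main(void)
{
    /* Build 8 with left child 3, whose right child is 6, whose left child is 4. */
    root = makeLeaf(8);
    root->left = makeLeaf(3);
    root->left->right = makeLeaf(6);
    root->left->right->left = makeLeaf(4);

    traverse(root);
    /* Prints:
         ROOT is 8
         Left data 3
         Right data 6
         Left data 4   */
    return 0;
}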
A little comment about recursion and then a little comment about searching in a tree:
let's suppose you want to calculate n!. You can do (pseudocode)
fac 0 = 1
fac (n+1) = (n+1) * fac n
So recursion is solving a problem by manipulating the result of solving the same problem on smaller data. See http://en.wikipedia.org/wiki/Recursion.
So now, let's suppose we have a tree data structure
T = (L, e, R)
with L the left subtree, e the root and R the right subtree. So let's say you want to find the value v in that tree; you would do
find v LEAF = false // you can't find any value in an empty tree, base case
find v (L, e, R) =
    if v == e
    then something(e)
    else
        if v < e
            find v L (here we have recursion: we say 'go and search for v in the left subtree')
        else
            find v R (here we have recursion: we say 'go and search for v in the right subtree')
        end
    end
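Translated into C, that pseudocode might look like the sketch below (using the Leaf type assumed earlier in this thread; returning the matching node plays the role of something(e), and NULL plays the role of false):
/* Recursive BST search: returns the node holding v, or NULL if v is absent. */
Leaf *find(Leaf *t, int v)
{
    if (t == NULL)
        return NULL;               /* base case: empty tree, v cannot be here */
    if (v == t->data)
        return t;                  /* found it */
    if (v < t->data)
        return find(t->left, v);   /* recursion: search the left subtree */
    return find(t->right, v);      /* recursion: search the right subtree */
}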

Center of Binary Tree

How can we find the center of a binary tree?
What would be the most efficient algorithm? The center of a binary tree is the midpoint of the path corresponding to the diameter of the tree. We can find the diameter of a tree without actually knowing the path; is there any similar technique for finding the center of a binary tree?
If you know the diameter, D,
and you know the max depth of the tree, M,
then your center will be at the (M - D/2)th node (from the root) on the deepest path. (It might be M - (D-1)/2 depending on parity; you need to check that yourself.)
If you have more than one path from root to leaf with M nodes, then the center is the root. (This is only true when the longest path goes through the root.)
EDIT:
To answer your remark:
If the diameter path doesn't go through the root, take the (D/2)th node on the diameter; it will still be on the longest side of the diameter path, which is in all cases the longest path from root to leaf, and therefore M - D/2 still locates this point counting from the root.
Taking the (M - D/2)th node from the root is the same as taking the (D/2)th node from the leaf end of the longest path.
Am I clear enough? You might just want to draw it to check.
You could calculate this in linear time O(N) by storing a list of the nodes that you have traversed if you are using a recursive method where you calculate the diameter by using the height of the tree (see this website here).
For instance, adapt the linear-time diameter function at the link I posted above so that you are also collecting a list of the nodes you have visited, and not just distance information. On each recursive call, you would select the list that went along with the longer traversed distance. The middle of the list that represented the diameter of the tree would be the "center" of the tree.
Your setup would look like the following:
typedef struct linked_list
{
    tree_node* node;
    struct linked_list* next;
} linked_list;

typedef struct list_pair
{
    linked_list* tree_height;
    linked_list* full_path;
} list_pair;

//some standard functions for working with the structure data-types
//they're not defined here for the sake of brevity
void back_insert_node(linked_list** tree, tree_node* add_node);
void front_insert_node(linked_list** tree, tree_node* add_node);
int list_length(linked_list* list);
void destroy_list(linked_list* list);
linked_list* copy_list(linked_list* list);
linked_list* append_list(linked_list* first, linked_list* second);
//main function for finding the diameter of the tree
list_pair diameter_path(tree_node* tree)
{
    if (tree == NULL)
    {
        list_pair return_list_pair = {NULL, NULL};
        return return_list_pair;
    }
    list_pair rhs = diameter_path(tree->right);
    list_pair lhs = diameter_path(tree->left);
    linked_list* highest_tree =
        list_length(rhs.tree_height) > list_length(lhs.tree_height) ?
        rhs.tree_height : lhs.tree_height;
    linked_list* longest_path =
        list_length(rhs.full_path) > list_length(lhs.full_path) ?
        rhs.full_path : lhs.full_path;
    //insert the current node onto the sub-branch with the highest height;
    //we need to make sure that the full path, when appending the
    //rhs and lhs trees, will read from left to right
    if (highest_tree == rhs.tree_height)
        front_insert_node(&highest_tree, tree);
    else
        back_insert_node(&highest_tree, tree);
    //make temporary copies of the subtree lists and append them to
    //create a full path that represents a potential diameter of the tree
    linked_list* temp_rhs = copy_list(rhs.tree_height);
    linked_list* temp_lhs = copy_list(lhs.tree_height);
    linked_list* appended_list = append_list(temp_lhs, temp_rhs);
    longest_path =
        list_length(appended_list) > list_length(longest_path) ?
        appended_list : longest_path;
    list_pair return_list_pair;
    return_list_pair.tree_height = copy_list(highest_tree);
    return_list_pair.full_path = copy_list(longest_path);
    destroy_list(rhs.tree_height);
    destroy_list(rhs.full_path);
    destroy_list(lhs.tree_height);
    destroy_list(lhs.full_path);
    return return_list_pair;
}
Now the function returns, in the full_path structure member, a series of pointers that can be used to cycle through and find the middle node, which will be the "center" of the tree.
P.S. I understand that utilizing copying functions is not the fastest approach, but I wanted to be clearer rather than make something faster with too much pointer-twiddling.
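For instance, that last step might look like the sketch below, reusing the list_length() helper declared above (its body was omitted by the author for brevity):
/* Walks the diameter path and returns its middle node, i.e. the "center".
   With an even number of nodes this picks the later of the two middle nodes. */
tree_node* path_center(linked_list* full_path)
{
    int steps = list_length(full_path) / 2;   /* index of the middle element */
    linked_list* cur = full_path;
    while (steps-- > 0)
        cur = cur->next;
    return cur->node;
}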
Optimized implementation: the above implementation can be optimized by calculating the height in the same recursion rather than calling a separate height() function.
/* The second parameter is to store the height of the tree.
   Initially, we need to pass a pointer to a location with value 0.
   So the function should be used as follows:
       int height = 0;
       struct node *root = SomeFunctionToMakeTree();
       int diameter = diameterOpt(root, &height); */

/* small helper, since C has no built-in integer max() */
static int max(int a, int b) { return a > b ? a : b; }

int diameterOpt(struct node *root, int* height)
{
    /* lh --> height of left subtree
       rh --> height of right subtree */
    int lh = 0, rh = 0;
    /* ldiameter --> diameter of left subtree
       rdiameter --> diameter of right subtree */
    int ldiameter = 0, rdiameter = 0;
    if (root == NULL)
    {
        *height = 0;
        return 0; /* diameter is also 0 */
    }
    /* Get the heights of the left and right subtrees in lh and rh,
       and store the returned diameters in ldiameter and rdiameter */
    ldiameter = diameterOpt(root->left, &lh);
    rdiameter = diameterOpt(root->right, &rh);
    /* Height of the current node is the max of the heights of the left
       and right subtrees plus 1 */
    *height = max(lh, rh) + 1;
    return max(lh + rh + 1, max(ldiameter, rdiameter));
}
Time Complexity: O(n)

Resources