Perfect Balanced Binary Search Tree - c

I have a theoretical question about balanced BSTs.
I would like to build a perfectly balanced tree with 2^k - 1 nodes from a regular, unbalanced BST. The easiest solution I can think of is to put the elements in a sorted array or linked list, recursively divide it into sub-arrays, and build the perfectly balanced BST from that.
However, for extremely large trees I would need to create an array or list of the same size, so this method would consume a large amount of memory.
Another option is to use AVL rotation functions: insert the elements one by one and rebalance the tree with rotations according to the balance factor, i.e. the difference in height between the left and right subtrees.
My questions: am I right about my assumptions? Is there any other way to create a perfect BST from an unbalanced BST?

AVL and similar trees are not perfectly balanced so I'm not sure how they are useful in this context.
You can build a doubly-linked list out of tree nodes, using left and right pointers in lieu of forward and backward pointers. Sort that list, and build the tree recursively from the bottom up, consuming the list from left to right.
Building a tree of size 1 is trivial: just bite the leftmost node off the list.
Now, if you can build a tree of size N, you can also build a tree of size 2N+1: build a tree of size N, then bite off a single node, then build another tree of size N. The single node becomes the root of the larger tree, and the two smaller trees become its left and right subtrees. Since the list is sorted, the tree is automatically a valid search tree.
This is easy to modify for sizes other than 2^k-1 too.
Update: since you are starting from a search tree, you can build a sorted list directly from it in O(N) time and O(log N) space (perhaps even in O(1) space with a little ingenuity), and build the tree bottom-up also in O(N) time and O(log N) (or O(1)) space.
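The flatten-and-rebuild approach above can be sketched in C roughly as follows. This is a minimal sketch under assumptions: the `Node` struct with an `int` key and the names `tree_to_vine`, `build`, `rebalance` and `height` are hypothetical. For the O(1)-space flattening it borrows the tree-to-vine phase of the Day-Stout-Warren (DSW) algorithm, which straightens the tree with right rotations instead of a recursive traversal; the rebuild is the bottom-up construction described above, so the only extra space is the O(log N) recursion stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: only left/right pointers and a key are needed. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Flatten the BST into a sorted "vine" (a list threaded through the right
   pointers, all left pointers NULL) using right rotations: O(N) time,
   O(1) extra space. Returns the node with the smallest key. */
static Node *tree_to_vine(Node *root) {
    Node dummy = {0, NULL, root};
    Node *tail = &dummy;               /* last node known to be on the vine */
    while (tail->right != NULL) {
        Node *cur = tail->right;
        if (cur->left != NULL) {       /* rotate right around cur */
            Node *l = cur->left;
            cur->left = l->right;
            l->right = cur;
            tail->right = l;
        } else {
            tail = cur;                /* no left child: cur is in place */
        }
    }
    return dummy.right;
}

/* Build a height-balanced tree of exactly n nodes bottom-up, consuming the
   vine from left to right: left subtree first, then bite off one node as
   the root, then the right subtree. Recursion depth is O(log n). */
static Node *build(Node **head, size_t n) {
    if (n == 0) return NULL;
    size_t left_n = (n - 1) / 2;
    Node *left = build(head, left_n);
    Node *root = *head;
    assert(root != NULL);              /* the vine must hold n nodes */
    *head = root->right;               /* advance before reusing root->right */
    root->left = left;
    root->right = build(head, n - 1 - left_n);
    return root;
}

/* Rebuild any BST as a balanced one; perfect when the size is 2^k - 1. */
Node *rebalance(Node *root) {
    Node *vine = tree_to_vine(root);
    size_t n = 0;
    for (Node *p = vine; p != NULL; p = p->right) n++;
    return build(&vine, n);
}

/* Number of levels in the tree (0 for an empty tree). */
int height(const Node *t) {
    if (t == NULL) return 0;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}
```

For example, a degenerate chain 1 → 2 → ... → 7 comes back with 4 at the root, 2 and 6 below it, and height 3. Sizes other than 2^k - 1 still work: the extra node simply goes to the right subtree at each level.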

I have not yet found a situation that really needs a perfectly balanced search tree. If your case really needs it, I would like to hear about it. Usually it is better and faster to have an almost balanced tree.
If you have a large search tree, throwing away all of its existing structure is usually not a good idea. Using rotation functions is a good way to get a more balanced tree while preserving most of the existing structure. But normally you use a suitable data structure to make sure you never end up with a completely unbalanced tree: a so-called self-balancing tree.
Examples are the AVL tree, the red-black tree and the splay tree, which use slightly different variants of rotation to keep the tree balanced (in the splay tree's case, to keep the amortized cost of each operation logarithmic rather than to bound the height).
If you really have a totally unbalanced tree, you may have a different problem. In your case, rotating it the AVL way is probably the best way to fix it.
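For reference, here is a sketch of the two basic rotations that AVL, red-black and splay trees build on, assuming a hypothetical `Node` struct. It deliberately omits the height or balance-factor bookkeeping that a real AVL tree would update after each rotation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Right rotation around t: t's left child l becomes the subtree root, and
   l's old right subtree becomes t's new left subtree. The in-order sequence
   of keys is unchanged, so the tree remains a valid BST.
   Returns the new subtree root; the caller must re-link it to the parent. */
Node *rotate_right(Node *t) {
    Node *l = t->left;
    assert(l != NULL);
    t->left = l->right;
    l->right = t;
    return l;
}

/* Mirror image: t's right child becomes the new subtree root. */
Node *rotate_left(Node *t) {
    Node *r = t->right;
    assert(r != NULL);
    t->right = r->left;
    r->left = t;
    return r;
}
```

Each rotation is O(1) and touches only three pointers, which is why rebalancing by rotation can preserve most of the existing structure.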

If you are memory constrained, you can use the split and join operations, which can be done on an AVL tree in O(log n) time and, I believe, constant space.
If you also maintain order statistics (subtree sizes), you can split on the median, make the left and right halves perfect, and then join.
The pseudocode for a recursive version would be:
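// Pseudocode: IsEmpty, SplitOnMedian and Join are assumed helpers.
AVLTree MakePerfect(AVLTree tree) {
    if (IsEmpty(tree)) return tree;    // base case ends the recursion
    AVLTree left, right;
    Data median;
    SplitOnMedian(tree, &left, &median, &right);
    left = MakePerfect(left);
    right = MakePerfect(right);
    return Join(left, median, right);  // median becomes the new root
}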
This can be implemented in O(n) time and O(log n) space, I believe.

Related

Can we delete an AVL tree node this way?

I was studying AVL trees and came across deleting a particular node from the tree. It was done by deleting the node as in an ordinary BST and then restoring the balance factors. But suppose we keep an array that records the order in which the elements were inserted. To delete a node, I remove its occurrence from the array and then construct the AVL tree from scratch. Is this a good way to do the deletion?
Of course you can delete this way. But the real question when discussing algorithms is: what is its complexity?
The standard AVL deletion algorithm runs in O(log n); explanations of why are easy to find online.
Now let's look at your method. Converting the AVL tree to a sorted array with an in-order traversal takes O(n), and constructing an AVL tree from a sorted array is also O(n). If you instead rebuild by re-inserting the elements one at a time in their original insertion order, as the question describes, the rebuild costs O(n log n).
So the bottom line is that this method is just less efficient.
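The O(n) sorted-array-to-tree step mentioned above could look like this in C. It is a sketch assuming a hypothetical `Node` type and helper names `from_sorted` and `free_tree`. Picking the middle element as each root makes sibling subtree sizes differ by at most one, so the result satisfies the AVL height condition, although an actual AVL implementation would also have to fill in its stored heights or balance factors.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for illustration. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Build a height-balanced BST from keys[lo..hi): the middle element becomes
   the root and each half becomes a subtree. Every key is visited exactly
   once, so the total work is O(n). */
Node *from_sorted(const int *keys, size_t lo, size_t hi) {
    if (lo >= hi) return NULL;
    size_t mid = lo + (hi - lo) / 2;
    Node *n = malloc(sizeof *n);
    assert(n != NULL);                 /* sketch: no real error handling */
    n->key = keys[mid];
    n->left = from_sorted(keys, lo, mid);
    n->right = from_sorted(keys, mid + 1, hi);
    return n;
}

/* Release every node (post-order, so children go before their parent). */
void free_tree(Node *t) {
    if (t == NULL) return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

With the keys 1 to 7, the root is 4, its children are 2 and 6, and the leaves are 1, 3, 5 and 7.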

Why Binary search tree?

Why do people use binary search trees?
Why not simply do a binary search on the array sorted from lowest to highest?
To me, the insertion/deletion cost seems to be the same, so why complicate life with processes such as max/min heapify?
Is it just because of random access required within a data structure?
The cost of insertion is not the same. If you want to insert an item in the middle of an array, you have to move every element to the right of the inserted element by one position, and the effort for that is proportional to the size of the array: O(N). With a self-balancing binary tree the complexity of insertion is much lower: O(log N).
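To make the O(N) cost concrete, here is a C sketch of insertion into a sorted `int` array; the function name `sorted_insert` and its signature are hypothetical. The binary search that finds the slot is cheap, but `memmove` has to shift the whole tail of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Insert x into the sorted array a[0..*n), which must have room for one
   more element (cap is the array's capacity). */
void sorted_insert(int *a, size_t *n, size_t cap, int x) {
    assert(*n < cap);
    /* O(log N): find the first index whose element is >= x */
    size_t lo = 0, hi = *n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < x) lo = mid + 1;
        else hi = mid;
    }
    /* O(N): shift everything from lo onward one slot to the right */
    memmove(&a[lo + 1], &a[lo], (*n - lo) * sizeof a[0]);
    a[lo] = x;
    (*n)++;
}
```

Inserting near the front of a large array moves almost every element, which is exactly the cost a balanced tree avoids.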

Binary trees and quicksort?

I have a homework assignment that reads as follows (don't flame/worry, I am not asking you to do my homework):
Write a program that sorts a set of numbers by using the Quick Sort method using a binary search tree. The recommended implementation is to use a recursive algorithm.
What does this mean? Here are my interpretations thus far, and as I explain below, I think both are flawed:
A. Get an array of numbers (integers, or whatever) from the user. Quicksort them with the normal quicksort algorithm on arrays. Then put stuff into a binary search tree, make the middle element of the array the root, et cetera, done.
B. Get numbers from the user, put them directly one by one into the tree, using standard properties of binary search trees. Tree is 'sorted', all is well--done.
Here's why I'm confused. Option 'A' does everything the assignment asks for, except that it doesn't really use the binary tree; it just throws one in at the last minute because this is a homework assignment on binary trees. This makes me think the intended exercise couldn't have been 'A', since the main topic is binary trees, not quicksort.
But option 'B' isn't much better--it doesn't use quicksort at all! So, I'm confused.
Here are my questions:
If the interpretation is option 'A', just say so; then I have no further questions. Thank you for your time, goodbye.
If the interpretation is option 'B', why is the method used for inserting values into a binary tree the same as quicksort? They don't seem inherently similar, other than the fact that both (in the forms I've learned so far) use the recursive divide-and-conquer strategy and split their input in two.
If the interpretation is something else, what am I supposed to do?
Here's a really cool observation. Suppose you insert a series of values into a binary search tree in some order of your choosing. Some values will end up in the left subtree, and some will end up in the right subtree. Specifically, the values in the left subtree are less than the root, and the values in the right subtree are greater than the root.
Now, imagine that you were quicksorting the same elements, except that you use the value that was in the root of the BST as the pivot. You'd then put a bunch of elements into the left subarray - the ones less than the pivot - and a bunch of elements into the right subarray - the ones greater than the pivot. Notice that the elements in the left subtree and the right subtree of the BST will correspond perfectly to the elements in the left subarray and the right subarray of the first quicksort step!
When you're putting things into a BST, after you've compared the element against the root, you then descend into either the left or the right subtree and compare against the root there. In quicksort, after you've partitioned the array into a left and a right subarray, you pick a pivot for the left subarray and partition it, and pick a pivot for the right subarray and partition it. Again, there's a beautiful correspondence here: each subtree in the overall BST corresponds to doing a pivot step in quicksort using the root of the subtree, then recursively doing the same in the left and right subtrees.
Taking this a step further, we get the following claim:
Every run of quicksort corresponds to a BST, where the root is the initial pivot and each left and right subtree corresponds to the quicksort recursive call in the appropriate subarrays.
This connection is extremely strong: every comparison made in that run of quicksort will be made when inserting the elements into the BST and vice-versa. The comparisons aren't made in the same order, but they're still made nonetheless.
So I suspect that what your instructor is asking you to do is to implement quicksort in a different way: rather than doing manipulations on arrays and pivots, instead just toss everything into a BST in whatever order you'd like, then walk the tree with an inorder traversal to get back the elements in sorted order.
A really cool consequence of this is that you can think of quicksort as essentially a space-optimized implementation of binary tree sort. The partitioning and pivoting steps correspond to building left and right subtrees and no explicit pointers are needed.
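Here is a sketch of that interpretation in C: binary tree sort with an ordinary unbalanced BST. The names (`Node`, `insert`, `drain`, `tree_sort`) are hypothetical, and this is only one reading of the assignment, not necessarily what the instructor intends. Inserting the elements in input order makes the first element the root, which mirrors a quicksort that always picks the first element of each subarray as its pivot.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for illustration. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Plain BST insertion; equal keys go right so duplicates are kept.
   Each comparison against a subtree root plays the role of comparing
   against that subarray's pivot in quicksort. */
static Node *insert(Node *t, int key) {
    if (t == NULL) {
        Node *n = malloc(sizeof *n);
        assert(n != NULL);             /* sketch: no real error handling */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key) t->left = insert(t->left, key);
    else t->right = insert(t->right, key);
    return t;
}

/* In-order traversal: writes keys to out[] in sorted order, freeing nodes. */
static void drain(Node *t, int *out, size_t *pos) {
    if (t == NULL) return;
    drain(t->left, out, pos);
    out[(*pos)++] = t->key;
    drain(t->right, out, pos);
    free(t);
}

/* Binary tree sort: O(n log n) on average, O(n^2) in the worst case
   (e.g. already sorted input), just like first-element-pivot quicksort. */
void tree_sort(int *a, size_t n) {
    Node *root = NULL;
    for (size_t i = 0; i < n; i++) root = insert(root, a[i]);
    size_t pos = 0;
    drain(root, a, &pos);
}
```

Unlike in-place quicksort, this needs a node per element, which is the extra space the "space-optimized" remark above refers to.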

How to store adjacent nodes for Dijkstra algorithm?

Most articles about Dijkstra's algorithm only focus on which data structure should be used to perform the "relaxing" of nodes.
I'm going to use a min-heap, which runs in O(m log n), I believe.
My real question is: what data structure should I use to store the adjacent nodes of each node?
I'm thinking about using an adjacency list because I can find all nodes adjacent to u in O(deg(u)). Is this the fastest method?
How will that change the running time of the algorithm?
For the algorithm itself, I think you should aim for a compact representation of the graph. If it has a lot of links per node, a matrix may be best, but usually an adjacency list takes less space and therefore causes fewer cache misses.
It may be worth looking at how you are building the graph, and any other operations you do on it.
With Dijkstra's algorithm you just loop through the list of neighbours of a node once, so a simple array or linked list storing the adjacent nodes (or simply their indices in a global list) at each node (as in an adjacency list) would be sufficient.
As for "How will that change the running time of the algorithm?": compared to what? I'm pretty sure the standard complexity analysis assumes an adjacency-list implementation. With a Fibonacci heap the running time is O(E + V log V); with a binary min-heap, as in the question, it is O((E + V) log V).
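A sketch of Dijkstra's algorithm in C with an adjacency list and a binary min-heap. It assumes a hypothetical compact adjacency-list layout in which every vertex's neighbours are stored contiguously (`first`/`dst`/`w` arrays), and it uses lazy deletion (stale heap entries are skipped) instead of a decrease-key operation. Neighbours of u are scanned in O(deg(u)), and with this heap the total is O((E + V) log V).

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical compact adjacency list: the neighbours of u are
   dst[first[u] .. first[u+1]) with matching weights in w[]. */
typedef struct {
    int n;              /* number of vertices */
    const int *first;   /* n + 1 offsets into dst/w */
    const int *dst;     /* edge targets */
    const int *w;       /* non-negative edge weights */
} Graph;

typedef struct { long d; int v; } Item;

/* Binary min-heap on Item.d: sift the new item up. */
static void push(Item *h, int *len, Item x) {
    int i = (*len)++;
    while (i > 0 && h[(i - 1) / 2].d > x.d) {
        h[i] = h[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h[i] = x;
}

/* Remove and return the minimum: sift the last item down from the root. */
static Item pop(Item *h, int *len) {
    Item top = h[0], last = h[--(*len)];
    int i = 0;
    for (;;) {
        int c = 2 * i + 1;
        if (c >= *len) break;
        if (c + 1 < *len && h[c + 1].d < h[c].d) c++;
        if (h[c].d >= last.d) break;
        h[i] = h[c];
        i = c;
    }
    h[i] = last;        /* harmless if the heap just became empty */
    return top;
}

/* Fills dist[] with shortest distances from src (LONG_MAX = unreachable). */
void dijkstra(const Graph *g, int src, long *dist) {
    int m = g->first[g->n];                  /* number of edges */
    /* each edge pushes at most once, plus the initial push */
    Item *heap = malloc((size_t)(m + 1) * sizeof *heap);
    assert(heap != NULL);
    int len = 0;
    for (int v = 0; v < g->n; v++) dist[v] = LONG_MAX;
    dist[src] = 0;
    push(heap, &len, (Item){0, src});
    while (len > 0) {
        Item it = pop(heap, &len);
        if (it.d > dist[it.v]) continue;     /* stale entry: skip it */
        for (int e = g->first[it.v]; e < g->first[it.v + 1]; e++) {
            long nd = it.d + g->w[e];        /* relax edge it.v -> dst[e] */
            if (nd < dist[g->dst[e]]) {
                dist[g->dst[e]] = nd;
                push(heap, &len, (Item){nd, g->dst[e]});
            }
        }
    }
    free(heap);
}
```

Keeping all neighbour lists in one contiguous array is one way to get the compact, cache-friendly representation recommended above; a linked list per vertex gives the same O(deg(u)) scan but with more pointer chasing.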

Binary search vs binary search tree

What is the benefit of a binary search tree over a sorted array with binary search? Just with mathematical analysis I do not see a difference, so I assume there must be a difference in the low-level implementation overhead. Analysis of average case run time is shown below.
Sorted array with binary search
search: O(log(n))
insertion: O(log(n)) (we run binary search to find where to insert the element)
deletion: O(log(n)) (we run binary search to find the element to delete)
Binary search tree
search: O(log(n))
insertion: O(log(n))
deletion: O(log(n))
Binary search trees have a worst case of O(n) for operations listed above (if tree is not balanced), so this seems like it would actually be worse than sorted array with binary search.
Also, I am not assuming that we have to sort the array beforehand (which would cost O(n log n)); we would insert elements one by one into the array, just as we would for the binary tree. The only benefit of a BST I can see is that it supports other kinds of traversal, such as inorder, preorder and postorder.
Your analysis is wrong: both insertion and deletion are O(n) for a sorted array, because you have to physically move the data to make space for the insertion, or compact it to cover up the deleted item.
Oh and the worst case for completely unbalanced binary search trees is O(n), not O(logn).
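Deletion from a sorted array has the same shape. In this C sketch (the name `sorted_delete` is hypothetical), locating the element is a binary search, but closing the gap moves every later element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Remove one occurrence of x from the sorted array a[0..*n).
   Returns false (and changes nothing) if x is not present. */
bool sorted_delete(int *a, size_t *n, int x) {
    assert(n != NULL);
    /* O(log n): find the first index whose element is >= x */
    size_t lo = 0, hi = *n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < x) lo = mid + 1;
        else hi = mid;
    }
    if (lo == *n || a[lo] != x) return false;
    /* O(n): shift the tail left by one to cover the deleted slot */
    memmove(&a[lo], &a[lo + 1], (*n - lo - 1) * sizeof a[0]);
    (*n)--;
    return true;
}
```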
There's not much of a benefit in querying either one.
But constructing a sorted tree is a lot faster than constructing a sorted array, when you're adding elements one at a time. So there's no point in converting it to an array when you're done.
Note also that there are standard algorithms for maintaining balanced binary search trees. They get rid of the deficiencies in binary trees and maintain all of the other strengths. They are complicated, though, so you should learn about binary trees first.
Beyond that, the big-O may be the same, but the constants aren't always. With binary trees if you store the data correctly, you can get very good use of caching at multiple levels. The result is that if you are doing a lot of querying, most of your work stays inside of CPU cache which greatly speeds things up. This is particularly true if you are careful in how you structure your tree. See http://blogs.msdn.com/b/devdev/archive/2007/06/12/cache-oblivious-data-structures.aspx for an example of how clever layout of the tree can improve performance greatly. An array that you do a binary search of does not permit any such tricks to be used.
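Adding to @Blindy's answer, I would say insertion into a sorted array costs more in O(n) memory operations (std::rotate()) than in O(log n) CPU instructions; compare insertion sort.
#include <algorithm>
#include <vector>

// Keep sorted_array sorted while inserting x (MYINTTYPE is the element type).
template <typename MYINTTYPE>
void insert_sorted(std::vector<MYINTTYPE>& sorted_array, MYINTTYPE x) {
    // insert x at the end
    sorted_array.push_back(x);
    // O(log n) CPU operation: find x's position among the old elements
    auto insertion_point = std::lower_bound(sorted_array.begin(), sorted_array.end() - 1, x);
    // O(n) memory operation: rotate the new last element into place
    std::rotate(insertion_point, sorted_array.end() - 1, sorted_array.end());
}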
I suspect a left-child right-sibling tree combines the essence of a binary tree and a sorted array:

| data structure | operation | CPU cost | memory-operation cost |
|---|---|---|---|
| sorted array | insert | O(log n) (benefits from pipelining) | O(n) memory operation; see insertion sort using std::rotate() |
| sorted array | search | O(log n) | benefits from inline implementation |
| sorted array | delete | O(log n) (when pipelined with the memory operation) | O(n) memory operation; see std::vector::erase() |
| balanced binary tree | insert | O(log n) (branch misprediction hurts pipelining; tree rotations add cost) | extra pointers that exhaust the cache |
| balanced binary tree | search | O(log n) | |
| balanced binary tree | delete | O(log n) (same as insert) | |
| left-child right-sibling tree (combines sorted array and binary tree) | insert | O(log n) on average | no std::rotate() needed when inserting as a left child, if kept unbalanced |
| left-child right-sibling tree | search | O(log n) (O(n) worst case when unbalanced) | exploits cache locality in the right-sibling search; see std::lower_bound() |
| left-child right-sibling tree | delete | O(log n) (with hyperthreading/pipelining) | O(n) memory operation; see std::vector::erase() |