Binary trees and quicksort? - C

I have a homework assignment that reads as follows (don't flame/worry, I am not asking you to do my homework):
Write a program that sorts a set of numbers by using the Quick Sort method using a binary search
tree. The recommended implementation is to use a recursive algorithm.
What does this mean? Here are my interpretations thus far, and as I explain below, I think both are flawed:
A. Get an array of numbers (integers, or whatever) from the user. Quicksort them with the normal quicksort algorithm on arrays. Then put stuff into a binary search tree, make the middle element of the array the root, et cetera, done.
B. Get numbers from the user, put them directly one by one into the tree, using standard properties of binary search trees. Tree is 'sorted', all is well--done.
Here's why I'm confused. Option 'A' does everything the assignment asks for, except it doesn't really use the binary tree so much as throw it in at the last minute because it's a homework assignment on binary trees. This makes me think the intended exercise couldn't have been 'A', since the main topic's not quicksort, but binary trees.
But option 'B' isn't much better--it doesn't use quicksort at all! So, I'm confused.
Here are my questions:
if the interpretation is option 'A', just say so, I have no questions, thank you for your time, goodbye.
if the interpretation is option 'B', why is the sorting method used for inserting values into binary trees the same as quicksort? They don't seem inherently similar, other than the fact that they both (in the forms I've learned so far) use a recursive divide-and-conquer strategy and divide their input in two.
if the interpretation is something else...what am I supposed to do?

Here's a really cool observation. Suppose you insert a series of values into a binary search tree in some order of your choosing. Some values will end up in the left subtree, and some values will end up in the right subtree. Specifically, the values in the left subtree are less than the root, and the values in the right subtree are greater than the root.
Now, imagine that you were quicksorting the same elements, except that you use the value that was in the root of the BST as the pivot. You'd then put a bunch of elements into the left subarray - the ones less than the pivot - and a bunch of elements into the right subarray - the ones greater than the pivot. Notice that the elements in the left subtree and the right subtree of the BST will correspond perfectly to the elements in the left subarray and the right subarray of the first quicksort step!
When you're putting things into a BST, after you've compared the element against the root, you'd then descend into either the left or right subtree and compare against the root there. In quicksort, after you've partitioned the array into a left and right subarray, you'll pick a pivot for the left subarray and partition it, and pick a pivot for the right subarray and partition it. Again, there's a beautiful correspondence here - each subtree in the overall BST corresponds to doing a pivot step in quicksort using the root of the subtree, then recursively doing the same in the left and right subtrees.
Taking this a step further, we get the following claim:
Every run of quicksort corresponds to a BST, where the root is the initial pivot and each left and right subtree corresponds to the quicksort recursive call in the appropriate subarrays.
This connection is extremely strong: every comparison made in that run of quicksort will be made when inserting the elements into the BST and vice-versa. The comparisons aren't made in the same order, but they're still made nonetheless.
So I suspect that what your instructor is asking you to do is to implement quicksort in a different way: rather than doing manipulations on arrays and pivots, instead just toss everything into a BST in whatever order you'd like, then walk the tree with an inorder traversal to get back the elements in sorted order.
A really cool consequence of this is that you can think of quicksort as essentially a space-optimized implementation of binary tree sort. The partitioning and pivoting steps correspond to building left and right subtrees and no explicit pointers are needed.
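To make that concrete, here's a minimal C sketch of binary tree sort under that interpretation (the node struct and function names are mine, not from the assignment):

#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int val;
    struct node *left, *right;
} node;

/* Standard BST insert. The comparison against each subtree's root is
   exactly a quicksort partition decision, with the root as the pivot. */
node *insert(node *root, int val)
{
    if (root == NULL) {
        node *n = malloc(sizeof *n);
        n->val = val;
        n->left = n->right = NULL;
        return n;
    }
    if (val < root->val)
        root->left = insert(root->left, val);
    else
        root->right = insert(root->right, val);
    return root;
}

/* Inorder traversal visits the values in sorted order. */
void inorder(const node *root)
{
    if (root == NULL)
        return;
    inorder(root->left);
    printf("%d ", root->val);
    inorder(root->right);
}

Insert each number as it is read, then one call to inorder on the root prints the whole set in sorted order.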

Related

Can I use Day-Stout-Warren to rebalance a Binary Search Tree implemented in an array?

So, I've implemented a binary search tree backed by an array. The full implementation is here.
Because the tree is backed by an array, I determine left and right children by performing arithmetic on the current index.
private Integer getLeftIdx(Integer rootIndex) {
    return 2 * rootIndex + 1;  // binary-heap layout: left child of node i is at 2i + 1
}

private Integer getRightIdx(Integer rootIndex) {
    return 2 * rootIndex + 2;  // right child of node i is at 2i + 2
}
I've realized that this can become really inefficient as the tree becomes unbalanced, partly because the array will be sparsely populated, and partly because the tree height will increase, causing searches to tend towards O(n).
I'm looking at ways to rebalance the tree, but I keep coming across algorithms like Day-Stout-Warren which seem to rely on a linked-list implementation for the tree.
Is this just the tradeoff for an array implementation? I can't seem to think of a way to rebalance without creating a second array.
Imagine you have an array of length M that contains N items (with N < M, of course) at various positions, and you want to redistribute them into "valid positions" without changing their order.
To do that you can first walk through the array from end to start, packing all the items together at the end, and then walk through the array from start to end, moving an item into each valid position you find until you run out of items.
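Here's a minimal C sketch of that easy "index order" version, assuming (my assumption, not part of the question) that empty slots hold a sentinel EMPTY value:

#define EMPTY 0  /* sentinel for an unused slot; assumes stored items are nonzero */

/* Pack the n items scattered through a[0..m-1] into a[0..n-1], keeping order. */
void compact(int a[], int m, int n)
{
    int w = m;                         /* one past the next write position */
    /* Pass 1: end to start, packing all items together at the end. */
    for (int r = m - 1; r >= 0; r--) {
        if (a[r] != EMPTY) {
            a[--w] = a[r];
            if (w != r)
                a[r] = EMPTY;
        }
    }
    /* Pass 2: start to end, moving one item into each valid position. */
    for (int i = 0; i < n; i++) {
        a[i] = a[w];
        if (w != i)
            a[w] = EMPTY;
        w++;
    }
}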
This easy problem is the same as your problem, except that you don't want to walk through the array in "index order"; you want to walk through it in the tree's in-order traversal order.
You want to move all the items into "valid positions", i.e. the part of the array corresponding to indexes < N, and you don't want to change their in-order traversal order.
So, walk the array in reverse in-order order, packing items into the in-order-last possible positions. Then walk forward over the items in order, putting each item into the in-order-first available valid position until you run out of items.
BUT NOTE: This is fun to consider, but it's not going to make your tree efficient for inserts -- you have to do too many rebalancings to keep the array at a reasonable size.
BUT BUT NOTE: You don't actually have to rebalance the whole tree. When there's no free place for the insert, you only have to rebalance the smallest subtree on the path that has an extra space. I vaguely remember a result that I think applies, which suggests that the amortized cost of an insert using this method is O(log^2 N) when your array has a fixed number of extra levels. I'll do the math and figure out the real cost when I have time.
I keep coming across algorithms like Day-Stout-Warren which seem to rely on a linked-list implementation for the tree.
That is not quite correct. The original paper discusses the case where the tree is embedded into an array. In fact, section 3 is devoted to the changes necessary. It shows how to do so with constant auxiliary space.
Note that there's a difference between their implementation and yours, though.
Your idea is to use a binary-heap order, where once you know a single-number index i, you can determine the indices of the children (or the parent). The array is, in general, not sorted in increasing indices.
The idea in the paper is to use an array sorted in increasing indices, and to compact the elements toward the beginning of the array on a rebalance. Using this implementation, you would not specify an element by an index i. Instead, as in binary search, you would indirectly specify an element by a pair (b, e), where the idea is that the index is implicitly specified as ⌊(b + e) / 2⌋, but the information allows you to determine how to go left or right.
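Under that scheme, finding an element is just binary search. A minimal sketch, using my own names and a half-open range [b, e) (the paper's exact conventions may differ):

/* Find key in a sorted array viewed as an implicit BST: the "root" of the
   range [b, e) sits at the middle index, and going left or right means
   shrinking the range, exactly as in binary search. Returns the index,
   or -1 if the key is absent. */
int find(const int *a, int b, int e, int key)
{
    while (b < e) {
        int mid = b + (e - b) / 2;   /* the implicitly specified root */
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            b = mid + 1;             /* descend into the right subtree */
        else
            e = mid;                 /* descend into the left subtree */
    }
    return -1;
}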

Why Binary search tree?

Why do people use binary search trees?
Why not simply do a binary search on the array sorted from lowest to highest?
To me, the insertion/deletion cost seems to be the same, so why complicate life with processes such as max/min heapify, etc.?
Is it just because of random access required within a data structure?
The cost of insertion is not the same. If you want to insert an item in the middle of an array, you have to move all elements to the right of the inserted element by one position; the effort for that is proportional to the size of the array: O(N). With a self-balancing binary tree the complexity of insertion is much lower: O(log N).
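To make the O(N) cost concrete, here's a minimal C sketch of a sorted-array insert (names mine). The position could be found in O(log N) with binary search, but the shift dominates:

#include <string.h>

/* Insert x into the sorted array a[0..*n-1] (capacity must exceed *n). */
void sorted_insert(int a[], int *n, int x)
{
    int i = 0;
    while (i < *n && a[i] < x)       /* find the insertion point */
        i++;
    memmove(&a[i + 1], &a[i], (*n - i) * sizeof a[0]);  /* the O(N) shift */
    a[i] = x;
    (*n)++;
}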

Perfect Balanced Binary Search Tree

I have a theoretical question about balanced BSTs.
I would like to build a perfectly balanced tree that has 2^k - 1 nodes from a regular unbalanced BST. The easiest solution I can think of is to use a sorted array/linked list and recursively divide the array into sub-arrays, building the perfectly balanced BST from them.
However, in the case of extremely large tree sizes, I would need to create an array/list of the same size, so this method would consume a large amount of memory.
Another option is to use the AVL rotation functions, inserting elements one by one and balancing the tree with rotations depending on the balance factor - the difference between the heights of the left and right subtrees.
My questions are, am I right about my assumptions? Is there any other way to create a perfect BST from unbalanced BST?
AVL and similar trees are not perfectly balanced so I'm not sure how they are useful in this context.
You can build a doubly-linked list out of tree nodes, using left and right pointers in lieu of forward and backward pointers. Sort that list, and build the tree recursively from the bottom up, consuming the list from left to right.
Building a tree of size 1 is trivial: just bite the leftmost node off the list.
Now if you can build a tree of size N, you can also build a tree of size 2N+1: build a tree of size N, then bite off a single node, then build another tree of size N. The single node will be the root of your larger tree, and the two smaller trees will be its left and right subtrees. Since the list is sorted, the tree is automatically a valid search tree.
This is easy to modify for sizes other than 2^k-1 too.
Update: since you are starting from a search tree, you can build a sorted list directly from it in O(N) time and O(log N) space (perhaps even in O(1) space with a little ingenuity), and build the tree bottom-up also in O(N) time and O(log N) (or O(1)) space.
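Here's a minimal C sketch of that bottom-up construction, assuming (my convention, not the answer's) that the sorted list is threaded through the nodes' right pointers:

typedef struct node {
    int val;
    struct node *left, *right;   /* right doubles as the list's next pointer */
} node;

/* Bite n nodes off the front of *list and return them as a balanced BST.
   Because the list is sorted, the result is a valid search tree. */
node *build(node **list, int n)
{
    if (n == 0)
        return NULL;
    node *left = build(list, n / 2);          /* left subtree, bottom-up */
    node *root = *list;                       /* bite one node off the list */
    *list = root->right;
    root->left = left;
    root->right = build(list, n - n / 2 - 1); /* right subtree */
    return root;
}

Calling build with n = 2N+1 builds a size-N tree, bites off the root, then builds the other size-N tree, exactly as described above.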
I have not yet found a situation that really needs a perfectly balanced search tree. If your case really needs it, I would like to hear about it. Usually it is better and faster to have an almost balanced tree.
If you have a large search tree, throwing away all existing structure is usually not a good idea. Using rotation functions is a good way of getting a more balanced tree while preserving most of the existing structure. But normally you use a suitable data structure to make sure you never have a completely unbalanced tree in the first place: a so-called self-balancing tree.
There are, for example, AVL trees, red-black trees, and splay trees, which use slightly different variants of rotation to keep the tree balanced.
If you really have a totally unbalanced tree, you might have a different problem - but in your case, rotating it the AVL way is probably the best way to fix it.
If you are memory constrained, then you can use the split and join operations which can be done on an AVL tree in O(log n) time, and I believe constant space.
If you also were able to maintain the order statistics, then you can split on median, make the LHS and RHS perfect and then join.
The pseudo-code (for a recursive version) will be
AVLTree MakePerfect(AVLTree tree) {
    AVLTree left, right;
    Data median;
    SplitOnMedian(tree, &left, &median, &right);  /* needs order statistics */
    left = MakePerfect(left);
    right = MakePerfect(right);
    return Join(left, median, right);
}
This can be implemented in O(n) time and O(log n) space, I believe.

How to calculate difference between two sets in C?

I have two arrays, say A and B with |A|=8 and |B|=4. I want to calculate the set difference A-B. How do I proceed? Please note that there are no repeated elements in either of the sets.
Edit: Thank you so much everybody for a myriad of elegant solutions. Since I am in the prototyping stage of my project, for now I implemented the simplest solution told by Brian and Owen. But I do appreciate the clever use of data structures as suggested here by the rest of you, even though I am not a computer scientist but an engineer and never studied data structures as a course. Looks like it's about time I should really start reading CLRS, which I have been procrastinating on for quite a while :) Thanks again!
sort arrays A and B
the result will be in C
let a = the first elem of A
let b = the first elem of B
then repeat:
1) while a < b: insert a into C and a = next elem of A
2) while a > b: b = next elem of B
3) if a = b: a = next elem of A and b = next elem of B
4) if B runs out: insert the rest of A into C and stop
5) if A runs out: stop
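Rendered as C, that merge looks something like this (the function name and signature are mine):

/* Compute C = A - B for sorted arrays without duplicates; returns |C|. */
int set_difference(const int *A, int na, const int *B, int nb, int *C)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (A[i] < B[j])
            C[k++] = A[i++];          /* A[i] can't be in B: keep it */
        else if (A[i] > B[j])
            j++;                      /* B[j] matches nothing left in A */
        else {
            i++;                      /* common element: drop from both */
            j++;
        }
    }
    while (i < na)                    /* B exhausted: the rest of A survives */
        C[k++] = A[i++];
    return k;
}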
Iterate over the elements of A; for each one that is not in B, add it to a new set C.
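As a rough C sketch, that's just two nested loops, O(|A|*|B|), which is fine for sets this small (names mine):

/* Naive set difference: keep each element of A that does not occur in B. */
int naive_difference(const int *A, int na, const int *B, int nb, int *C)
{
    int k = 0;
    for (int i = 0; i < na; i++) {
        int in_b = 0;
        for (int j = 0; j < nb; j++) {
            if (A[i] == B[j]) {
                in_b = 1;
                break;
            }
        }
        if (!in_b)
            C[k++] = A[i];            /* A[i] not found in B: keep it */
    }
    return k;
}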
It depends on how you want to represent your sets, but if they are just packed bits then you can use bitwise operators, e.g. D = A & ~B; would give you the set difference A-B if the sets fit into an integer type. For larger sets you might use arrays of integer types and iterate, e.g.
for (i = 0; i < N; ++i)
{
    D[i] = A[i] & ~B[i];
}
The following assumes the sets are stored as a sorted container (as std::set does).
There's a common algorithm for merging two ordered lists to produce a third. The idea is that when you look at the heads of the two lists, you can determine which is the lower, extract that, and add it to the tail of the output, then repeat.
There are variants which detect the case where the two heads are equal, and treat this specially. Set intersections and unions are examples of this.
With a set asymmetric difference, the key point is that for A-B, when you extract the head of B, you discard it. When you extract the head of A, you add it to the output, unless the head of B is equal, in which case you extract that too and discard both.
Although this approach is designed for sequential-access data structures (and tape storage, etc.), it's sometimes very useful to do the same thing for a random-access data structure, so long as it's reasonably efficient to access it sequentially anyway. And you don't necessarily have to extract things for real - you can copy items and step along instead.
The key point is that you step through the inputs sequentially, always looking at the lowest remaining value next, so that (if the inputs have no duplicates) you will always encounter matched items together. You therefore always know whether your next lowest value to handle is an item from A with no match in B, an item in B with no match in A, or an item that's equal in both A and B.
More generally, the algorithm for the set difference depends on the representation of the set. For example, if the set is represented as a bit-vector, the above would be overcomplex and slow - you'd just loop through the vectors doing bitwise operations. If the set is represented as a hashtable (as in the tr1 unordered_set) the above is wrong as it requires ordered inputs.
If you have your own binary tree code that you're using for the sets, one good option is to convert both trees into linked lists, work on the lists, then convert the resulting list to a perfectly balanced tree. The linked-list set-difference is very simple, and the two conversions are re-usable for other similar operations.
EDIT
On the complexity - using these ordered merge-like algorithms is O(n) provided you can do the in-order traversals in O(n). Converting to a list and back is also O(n) as each of the three steps is O(n) - tree-to-list, set-difference and list-to-tree.
Tree-to-list basically does a depth-first traversal, deconstructing the tree as it goes. There's a trick for making this iterative, storing the "stack" in part-handled nodes - changing a left-child pointer into a parent pointer just before you step to the left child. This is a good idea if the tree may be large and unbalanced.
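That pointer-reversal trick is fiddly to get right. As an alternative O(1)-space illustration, here's the rotation-based "tree-to-vine" step from the Day-Stout-Warren paper discussed earlier, which also flattens a BST into a sorted right-spine list in O(n) time (a sketch with my own node type, not the trick described above):

typedef struct node {
    int val;
    struct node *left, *right;
} node;

/* Flatten a BST into a right-leaning "vine" (a sorted list linked through
   right pointers) by rotating right wherever a left child exists. */
node *tree_to_vine(node *root)
{
    node dummy = { 0, NULL, root };  /* dummy head; its right spine becomes the result */
    node *tail = &dummy;
    while (tail->right != NULL) {
        node *cur = tail->right;
        if (cur->left != NULL) {     /* rotate right around cur */
            node *l = cur->left;
            cur->left = l->right;
            l->right = cur;
            tail->right = l;
        } else {
            tail = cur;              /* cur has no left child: it's in final position */
        }
    }
    return dummy.right;
}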
Converting a list to a tree basically involves a depth-first traversal of an imaginary tree (based on the size, known from the start) building it for real as you go. If a tree has 5 nodes, for instance, you can say that the root will be node 3. You recurse to build a two-node left subtree, then grab the next item from the list for that root, then recurse to build a two-node right subtree.
The list-to-tree conversion shouldn't need to be implemented iteratively - recursive is fine as the result is always perfectly balanced. If you can't handle the log n recursion depth, you almost certainly can't handle the full tree anyway.
Implement a set object in C. You can do it using a hash table for the underlying storage. This is obviously a non-trivial exercise, but a few open-source solutions exist. Then you simply need to add all the elements of A, then iterate over B and remove any that are elements of your set.
The key point is to use the right data structure for the job.
For larger sets I'd suggest sorting the numbers and iterating through them by emulating the code at http://www.cplusplus.com/reference/algorithm/set_difference/ which would be O(N log N), but since the set sizes are so small, the solution given by Brian seems fine even though it's theoretically slower at O(N^2).

How to sort a bunch of polygons/polyhedra by a specific value at their vertices (or some other distance measure)

I am working on a project which will be using large datasets (both 2D and 3D) which I will be turning into triangles, or tetrahedra, in order to render them.
I will also be performing calculations on these tris/tets. Which tris/tets to use for each calculation depends on the greatest and smallest values of their vertices.
So I need to sort the tris/tets in order of their greatest valued vertex.
--
I have tried quicksort, and binary insertion sort. Quicksort so far offers the quickest solution but it is still quite slow due to the size of the data sets.
I was thinking along the lines of a bucket/map sort when creating the tris/tets in the first place: a bucket for each of the greatest vertex values encountered, adding pointers to all the triangles that have that value as their greatest vertex value.
This approach should be linear in time, but obviously requires more memory. That is not an issue, but my programming language of choice is C, and I'm not entirely sure how I would go about coding such a thing (see the sketch after this question).
So my question to you is: how would you go about storing the triangles/tets in such a way that you could iterate through them, from the triangle whose greatest vertex value is the greatest in the entire data set, all the way down to the triangle with the smallest greatest vertex value? :)
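A rough C sketch of the bucket idea from the question, assuming purely for illustration that the greatest vertex values can be quantized to integers in a small known range (all names and the MAXV bound are mine):

#include <stddef.h>

#define MAXV 1023                /* assumed upper bound on quantized vertex values */

typedef struct tri {
    int max_vertex;              /* greatest (quantized) value among the vertices */
    struct tri *next;            /* next triangle in the same bucket */
} tri;

static tri *bucket[MAXV + 1];    /* bucket[v] chains triangles whose max value is v */

void add_triangle(tri *t)
{
    t->next = bucket[t->max_vertex];
    bucket[t->max_vertex] = t;
}

/* Visit every triangle from the greatest max-vertex value down to the
   smallest: O(MAXV + number of triangles), i.e. linear. */
void for_each_descending(void (*visit)(tri *))
{
    for (int v = MAXV; v >= 0; v--)
        for (tri *t = bucket[v]; t != NULL; t = t->next)
            visit(t);
}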
Can't you just store them in a binary search tree as you generate them? That would keep them in order and easily searchable (O(log n) for both insertion and lookup).
You could use a priority queue based on a heap data structure. This should also get you O(log n) insertion and extraction.
