After getting the code for inserting a node into a Min Heap right, I'm confused about what changes I should make if I want to prioritize the left child of a node when rearranging the heap.
The input would be something like:
I 5 //insert number 5 in the Min Heap
I 4
I 3
I 2
I 1
and the output should be:
1 2 3 4 5
instead of the usual:
1 2 4 5 3
Any ideas on how to get to this output? Thanks in advance.
The structure of the heap depends entirely on the order in which you insert items. The reason is that, when inserting, you add the new node to the end of the heap and then sift it up toward the root. The rules are:
1. Add the item to the end of the heap.
2. If the item is greater than or equal to its parent, then done.
3. Swap the item with its parent.
4. Go to 2.
Given those rules, let's walk through what happens when you insert items in the order [5,4,3,2,1].
[5]
[5,4] // the new item is smaller than its parent, so swap
[4,5]
[4,5,3] // the new item is smaller than its parent, so swap
[3,5,4]
[3,5,4,2] // the new item is smaller than its parent, so swap
[3,2,4,5] // still smaller than its parent
[2,3,4,5]
[2,3,4,5,1] // 1 is smaller than 3, so swap
[2,1,4,5,3] // 1 is smaller than 2, so swap
[1,2,4,5,3]
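For reference, here is that insertion procedure as a small Python sketch (my own code, following rules 1-4 above); running it on the question's input reproduces the final layout:

    def heap_insert(heap, item):
        heap.append(item)                  # rule 1: add at the end
        i = len(heap) - 1
        while i > 0:
            parent = (i - 1) // 2
            if heap[i] >= heap[parent]:    # rule 2: not smaller than parent, done
                break
            heap[i], heap[parent] = heap[parent], heap[i]  # rule 3: swap
            i = parent                     # rule 4: go to 2

    heap = []
    for x in [5, 4, 3, 2, 1]:
        heap_insert(heap, x)
    print(heap)  # [1, 2, 4, 5, 3]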
There's no efficient way to "prioritize" a particular subtree, especially in a binary heap. It looks simple enough in a heap with just five items, but every level you add increases the cost of keeping sibling nodes in the proper order. You're better off just sorting the nodes and creating a heap from the resulting array.
Not that a sorted heap helps you much. As soon as you removed the first item, rearranging the heap would cause it to no longer be sorted.
Related
What is the best algorithm (in terms of time complexity) to find the minimum element in a max heap?
The minimum element in a max-heap is guaranteed to be among the last ceil(n/2) items (the leaves), where n is the number of items in the heap. So the best way to find it is to do a sequential scan of those last ceil(n/2) items. Consider, for example, a heap with 5 items:
   5
 4   1
3 2
The smallest item will never have children, so it must be either on the bottom row of the heap, or on the next row up.
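In the array representation, that scan is trivial; a minimal sketch (the function name is mine):

    def min_of_max_heap(heap):
        n = len(heap)
        return min(heap[n // 2:])  # the last ceil(n/2) slots are the leaves

    print(min_of_max_heap([5, 4, 1, 3, 2]))  # 1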
I recently came across an interesting coding problem, which is as follows:
There are n boxes, let us assume this is an array of n boxes.
For each index i of this array, three values are given -
1.) Weight(i)
2.) Left(i)
3.) Right(i)
left[i] means that if weight[i] is chosen, we are not allowed to choose any of the left[i] elements immediately to the left of this ith element.
Similarly, right[i] means that if arr[i] is chosen, we are not allowed to choose any of the right[i] elements immediately to the right of it.
Example :
Weight[2] = 5
Left[2] = 1
Right[2] = 3
Then, if I pick the element at position 2, I get a weight of 5 units, but I cannot pick the element at position {1} (due to the left constraint) or the elements at positions {3,4,5} (due to the right constraint).
Objective - We need to calculate the maximum sum of the weights we can pick.
Sample Test Case :-
**Input:**
5
2 0 3
4 0 0
3 2 0
7 2 1
9 2 0
**Output:**
13
Note: the first column is the weights, the second column the left constraints, and the third column the right constraints.
I used a dynamic programming approach (similar to Longest Increasing Subsequence) to reach an O(n^2) solution, but I am not able to think of an O(n log n) solution. (n can be up to 10^5.)
I also tried to use a priority queue, in which elements with a lower value of (right[i] + i) are given higher priority (with higher priority assigned to the element with the lower value of i when the primary keys are equal), but it is also giving a timeout error.
Any other approach for this? Or any optimization of the priority queue method? I can post both of my codes if needed.
Thanks.
One approach is to use a binary indexed tree to create a data structure that makes it easy to do two operations in O(log n) time each:
Insert number into an array
Find maximum in a given range
We will use this data structure to hold the maximum weight that can be achieved by selecting box i along with an optimal selection of boxes to the left.
The key is that we will only insert values into this data structure when we reach a point where the right constraint has been met.
To find the best value for box i, we need to find the maximum value in the data structure for all points up to location i-left[i], which can be done in O(log n).
The final algorithm is to loop over i=0..n-1 and for each i:
Compute result for box i by finding maximum in range 0..(i-left[i])
Schedule the result to be added when we reach location i+right[i]
Add any previously scheduled results into our data structure
The final result is the maximum value in the whole data structure.
Overall, the complexity is O(n log n), because each value of i results in one lookup and one update operation.
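Here is a minimal Python sketch of this scheme (my own code: the Fenwick tree does point updates and prefix-maximum queries, and the exact off-by-one treatment of the constraints is my interpretation of the problem statement):

    from collections import defaultdict

    class MaxBIT:
        # Fenwick tree over positions 1..n: point update and prefix-maximum
        # query, both O(log n).
        def __init__(self, n):
            self.n, self.t = n, [0] * (n + 1)
        def update(self, i, v):            # raise the value stored at position i
            while i <= self.n:
                self.t[i] = max(self.t[i], v)
                i += i & -i
        def query(self, i):                # maximum over positions 1..i
            r = 0
            while i > 0:
                r = max(r, self.t[i])
                i -= i & -i
            return r

    def max_weight(boxes):                 # boxes: list of (weight, left, right)
        n = len(boxes)
        bit = MaxBIT(n)
        pending = defaultdict(list)        # position -> results usable from there on
        best = 0
        for i, (w, left, right) in enumerate(boxes):
            for pos, val in pending.pop(i, ()):
                bit.update(pos + 1, val)   # box pos's right constraint is now met
            usable = i - left              # boxes 0..i-left-1 may combine with box i
            res = w + (bit.query(usable) if usable > 0 else 0)
            due = i + right + 1            # first position allowed to build on box i
            if due < n:
                pending[due].append((i, res))
            best = max(best, res)          # track the overall maximum directly
        return best

    print(max_weight([(2, 0, 3), (4, 0, 0), (3, 2, 0), (7, 2, 1), (9, 2, 0)]))  # 13

On the sample input above it prints 13, matching the expected output.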
So I have this project for school: I have a linked list of 50,000 numbers and a second empty list. I only have a very limited set of instructions. They are:
"sa" swaps the first two elements of list 1
"sb" swaps the first two elements of list 2
"ss" is "sa" and "sb" at the same time
"pa" : push top element of list 2 on top of list 1
"pb": push top element of list 1 on top of list 2
"ra": rotate list 1 (first element becomes last)
"rb":rotate list 2 (first becomes last)
"rr": "ra"and "rb" at once
"rra": rotate list 1 (last becomes first)
"rrb": rotate list 2(last becomes first)
"rrr": "rra" and "rrb" at once
I have to implement a sorting algorithm in C, and the goal is to do it with the smallest number of instructions.
I tried a very simple algorithm that rotated list 1 until the maximum was on top and pushed it onto list 2, repeating until everything was in list 2, and then pushed everything back into list 1. But I was not able to sort lists of more than 5k numbers in a reasonable amount of time.
I think I've figured out how to do this using quick sort. Here's some pseudocode.
edit: updated pseudocode to focus on what it's doing and not unnecessary syntax
quicksort(int n)
    if n <= 1 return
    int top_half_len = 0
    choose a median // up to you; it must send at least one value to each side, or the recursion never shrinks
    repeat n times { // filter all values above the median into list 2
        if (top value > median) {
            push list 1 top to list 2 // list 2 stores the larger half
            top_half_len++
        } else {
            rotate list 1 forward // small values wrap around to the bottom
        }
    }
    // bring the rotated smaller half back to the top
    rotate list 1 backward (n - top_half_len) times
    // push larger half onto smaller half
    push list 2 top to list 1 top_half_len times
    // recursively call this on the larger half
    quicksort(top_half_len)
    // rotate smaller half to front
    rotate list 1 forward top_half_len times
    // recursively call this on smaller half
    quicksort(n - top_half_len)
    // rotate list back to original position
    rotate list 1 backward top_half_len times
Basically, it splits the list into a portion less than or equal to the median (the smaller half) and a portion greater than the median (the larger half). Then it calls itself on both of these halves. Once they're length 1, the algorithm is done, since a length-1 list is sorted. Google quicksort for an actual explanation.
I think this should work, but I may have missed some edge case, so don't blindly follow this. Also, if you were dealing with arrays, I'd recommend stopping the quicksort at a certain recursion depth and switching to heap sort (or something else that prevents the worst-case O(n^2) complexity), but I'm not sure what would be efficient here. Update: according to Peter Cordes, you should use insertion sort when you get below a certain array size (IMO you should also switch at a certain recursion depth).
Apparently merge sort is faster on linked lists. It probably wouldn't be too hard to modify this to implement merge sort. Merge sort is pretty similar to quick sort.
why is merge sort preferred over quick sort for sorting linked lists
The problem statement is missing a compare function, so I would define compare(lista, listb) to compare the first node of lista with the first node of listb, returning -1 for <, 0 for =, and 1 for >. (All that is really needed for merge sort is 0 for <= and 1 for >.)
Also missing is a return value to indicate that a list is empty when doing pa or pb. I would define pa and pb to return 1 if the source list is not empty and 0 if it is empty (no node to copy).
It's not clear whether the goal of "smallest amount of instructions" refers to the number of instructions in the source code or the number of instructions executed during the sort.
The smallest number of instructions in the code would rotate list2, based on compares with list1, to insert nodes into list2 at the proper location. Start with a pb, and set the list2 size to 1. Then rb or rrb is done to rotate list2 to where the next pb should be done. The code would keep track of the list2 "offset" to the smallest node in order to avoid an endless loop when rotating list2. Complexity is O(n^2).
I'm thinking the fastest sort, and perhaps the fewest instructions executed, is a bottom-up merge sort.
Do a bottom-up merge sort while rotating the lists, using them like circular buffers / lists. Copy list1 to list2 to generate a count of nodes, using the sequence: count = 0; while(pb){ rb; count += 1; }.
Using the count, move every other node from list2 to list1 using {pa, rr}, n/2 times. Always keep track of the actual number of nodes on each list in order to know when the logical end of a list is reached. Also keep track of a run counter for each list to know when the logical end of a run is reached. At this point you have two lists where the even nodes are on list1 and odd nodes are on list2.
Run size starts off at 1 and doubles on each pass. For the first pass with run size of 1, merge even nodes with odd nodes, creating sorted runs of size 2, alternately appending the sorted pairs of nodes to list1 and list2. For example, if appending to list1, and the list1 node <= the list2 node, use {ra, run1count -= 1}, else use {pa, ra, run2count -= 1}. When the end of a run is reached, append the rest of the remaining run to the end of its list. On the next pass, merge sorted runs of 2 nodes from the lists, alternately appending sorted runs of 4 nodes to each list. Continue this for runs of 8, 16, ... until all nodes end up on one list.
So that's one pass to count the nodes, one pass to split up the even and odd nodes, and ceil(log2(n)) passes to do the merge sort. The overhead for the linked list operations is small (rotate removes and appends a node), so the overall merge should be fairly quick.
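For the 50,000-number lists from the question, a quick back-of-the-envelope check (my arithmetic, assuming the pass structure above):

    import math

    n = 50_000
    passes = 2 + math.ceil(math.log2(n))  # count pass + split pass + merge passes
    print(passes)                         # 18 passes, each touching about n nodes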
The number of instructions on the count pass could be reduced with while(pb){ count += 1; }, which would copy list1 to list2 reversed. Then splitting up list2 into list1 would also be done using rrr to un-reverse them.
Complexity is O(n log(n)).
So I have a problem which I'm pretty sure is solvable, but after many, many hours of thinking and discussion, only partial progress has been made.
The issue is as follows. I'm building a BTree of, potentially, a few million keys. When searching the BTree, it is paged on demand from disk into memory, and each page-in operation is relatively expensive. This effectively means that we want to traverse as few nodes as possible (once a node has been paged in, traversing within it costs nothing). As a result, we don't want to waste space by having lots of nodes near minimum capacity. In theory, this should be preventable (within reason), since the structure of the tree depends on the order in which the keys were inserted.
So, the question is how to order the keys so that, after the BTree is built, the fewest nodes are used. Here's an example:
[image: a space-optimal vs. a space-pessimal tree built from the same keys, from the Optimal 2,3-Trees paper]
I did stumble on the question "In what order should you insert a set of known keys into a B-Tree to get minimal height?", which unfortunately asks a slightly different question, and its answers don't seem to solve my problem. It is also worth adding that we want the mathematical guarantees that come from not building the tree manually and only using the insert operation. We don't want to build a tree manually, make a mistake, and then find it is unsearchable!
I've also stumbled upon 2 research papers which are so close to solving my question but aren't quite there!
Time- and Space-Optimality in B-Trees and Optimal 2,3-Trees (which is where I took the above image from, in fact) discuss and quantify the differences between space-optimal and space-pessimal BTrees, but as far as I can see they don't go so far as to describe how to design an insert order.
Any help on this would be greatly, greatly appreciated.
Thanks
Research papers can be found at:
http://www.uqac.ca/rebaine/8INF805/Automne2007/Sujets2007Automne/p174-rosenberg.pdf
http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=1143&context=hmc_fac_pub
EDIT: I ended up filling a btree skeleton, constructed as described in the above papers, with the FILLORDER algorithm. As previously mentioned, I was hoping to avoid this; however, I ended up implementing it before the 2 excellent answers were posted!
The algorithm below should work for B-Trees with minimum number of keys per node d and maximum 2*d. I suppose it can be generalized for 2*d + 1 max keys if a way of selecting the median is known.
The algorithm below is designed to minimize the number of nodes, not just the height of the tree.
The method is based on the idea of putting keys into any non-full leaf or, if all leaves are full, putting the key under the lowest non-full node.
More precisely, the tree generated by the proposed algorithm meets the following requirements:
It has the minimum possible height;
It has no more than two non-full nodes on each level. (They are always the two rightmost nodes.)
Since we know that the number of nodes on any level except the root is strictly equal to the sum of the node count and the total key count on the level above, we can prove that no valid rearrangement of nodes between levels decreases the total number of nodes. For example, increasing the number of keys inserted above a certain level increases the number of nodes on that level, and consequently the total number of nodes; while any attempt to decrease the number of keys above a certain level decreases the node count on that level and fails to fit all the keys on that level without increasing the tree height.
It is also obvious that the arrangement of keys on any certain level is one of the optimal ones.
Using the reasoning above, a more formal proof through mathematical induction can be constructed.
The idea is to hold a list of counters (the size of the list is no bigger than the height of the tree) to track how many keys have been added on each level. Once d keys have been added to some level, a node filled in half has been created on that level; if there are enough keys to fill the other half of this node, we should skip those keys and add a root for the higher level. This way the root is placed exactly between the first half of the previous subtree and the first half of the next subtree; it causes a split in which the root takes its place and the two halves of the subtrees become separated. The places for the skipped keys remain safe while we go through bigger keys, and can be filled later.
Here is nearly-working (pseudo)code; the array needs to be sorted:
PushArray(BTree bTree, int d, key[] Array)
{
    int N = Array.Length;
    List<int> counters = new List<int>{0};
    // skip[order] holds the number of keys to skip
    // after filling a node of that order in half
    List<int> skip = new List<int>();
    List<Pair<int,int>> skipList = new List<Pair<int,int>>();
    int i = -1;
    while (true)
    {
        int order = 0;
        while (counters[order] == d) order += 1;
        for (int j = order - 1; j >= 0; j--) counters[j] = 0;
        if (counters.Count <= order + 1) counters.Add(0);
        counters[order] += 1;
        if (skip.Count <= order)
            skip.Add(i + 2);
        if (order > 0)
            skipList.Add({i, order}); // skipped spans that will be filled later
        i += skip[order];
        if (i >= N) break;
        bTree.Push(Array[i]);
    }
    // now we need to add all skipped keys in the correct order
    foreach (Pair<int,int> p in skipList)
    {
        for (int j = p.2; j > 0; j--)
            PushArray(bTree, d, Array.SubArray(p.1 + skip[j - 1], skip[j] - 1));
    }
}
Example:
Here is how the numbers and the corresponding counters should be arranged for d = 2 during the first pass through the array. I marked the keys pushed into the B-Tree during the first pass (before the loop with recursion) with 'o' and the skipped ones with 'x'.
24
4 9 14 19 29
0 1 2 3 5 6 7 8 10 11 12 13 15 16 17 18 20 21 22 23 25 26 27 28 30 ...
o o x x o o o x x o o o x x x x x x x x x x x x o o o x x o o ...
1 2 0 1 2 0 1 2 0 1 2 0 1 ...
0 0 1 1 1 2 2 2 0 0 0 1 1 ...
0 0 0 0 0 0 0 0 1 1 1 1 1 ...
skip[0] = 1
skip[1] = 3
skip[2] = 13
Since we don't iterate through the skipped keys, the first pass takes O(n) time for a sorted array, not counting the cost of the B-Tree insertions themselves.
In this form it may be unclear how it works when there are not enough keys to fill the second half of a node after a skipped block, but we can avoid skipping all skip[order] keys when the total length of the array is less than about i + 2 * skip[order], and skip skip[order - 1] keys instead. Such a line, placed after changing the counters but before changing the variable i, might be added:
while(order > 0 && i + 2*skip[order] > N) --order;
This is correct because, if the total count of keys on the current level is less than or equal to 3*d, they are still split correctly when added in the original order. It leads to a slightly different rearrangement of keys between the two last nodes on some levels, but it does not break any of the described requirements, and maybe it makes the behavior easier to understand.
Maybe it's reasonable to find some animation and watch how it works; here is the sequence which should be generated on the 0..29 range: 0 1 4 5 6 9 10 11 24 25 26 29 /end of first pass/ 2 3 7 8 14 15 16 19 20 21 12 13 17 18 22 23 27 28
The algorithm below attempts to prepare the order of the keys so that you need no control over, or even knowledge of, the insertion procedure. The only assumption is that overfilled tree nodes are split either at the middle or at the position of the last inserted element; otherwise, the B-tree can be treated as a black box.
The trick is to trigger node splits in a controlled way. First you fill a node exactly, the left half with keys that belong together and the right half with another range of keys that belong together. Finally you insert a key that falls in between those two ranges but belongs with neither; the two subranges are split into separate nodes and the last inserted key ends up in the parent node. After splitting off in this fashion you can fill the remainder of both child nodes to make the tree as compact as possible. This also works for parent nodes with more than two child nodes; just repeat the trick with one of the children until the desired number of child nodes is created. Below, I use what is conceptually the rightmost child node as the "splitting ground" (steps 5 and 6.1).
Apply the splitting trick recursively, and all elements should end up in their ideal place (which depends on the number of elements). I believe the algorithm below guarantees that the height of the tree is always minimal and that all nodes except for the root are as full as possible. However, as you can probably imagine it is hard to be completely sure without actually implementing and testing it thoroughly. I have tried this on paper and I do feel confident that this algorithm, or something extremely similar, should do the job.
Implied: a tree T with maximum branching factor M.
Top procedure with keys of length N:
1. Sort the keys.
2. Set minimal-tree-height to ceil(log(N + 1) / log(M)).
3. Call insert-chunk with chunk = keys and H = minimal-tree-height.
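For example (my arithmetic, not from the answer): with M = 3 and N = 26 sorted keys, step 2 gives minimal-tree-height = ceil(log(27) / log(3)) = 3; a height-3 tree with branching factor 3 can hold up to 3^3 - 1 = 26 keys, so the bound is tight here.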
Procedure insert-chunk with chunk of length L, subtree height H:
1. If H is equal to 1:
   1.1. Insert all keys from the chunk into T.
   1.2. Return immediately.
2. Set the ideal subchunk size S to pow(M, H - 1).
3. Set the number of subtrees K to ceil((L + 1) / S).
4. Set the actual subchunk size S' to ceil((L + 1) / K).
5. Recursively call insert-chunk with chunk' = the last floor((S - 1) / 2) keys of chunk and H' = H - 1.
6. For each of the ceil(L / S') subchunks (of size S') except for the last, with index I:
   6.1. Recursively call insert-chunk with chunk' = the first ceil((S - 1) / 2) keys of subchunk I and H' = H - 1.
   6.2. Insert the last key of subchunk I into T (this insertion purposefully triggers a split).
   6.3. Recursively call insert-chunk with chunk' = the remaining keys of subchunk I (if any) and H' = H - 1.
7. Recursively call insert-chunk with chunk' = the remaining keys of the last subchunk and H' = H - 1.
Note that the recursive procedure is called twice for each subtree; that is fine, because the first call always creates a perfectly filled half subtree.
Here is a way which leads to minimum height in any BST (including a B-tree):
Sort the array.
Say a B-tree node can hold m keys.
Recursively divide the array into m+1 equal parts, using m keys as the parent.
Construct each child tree from its n/(m+1) sorted keys using recursion.
Example:
m = 2, array = [1 2 3 4 5 6 7 8 9 10]
Divide the array into three parts:
root = [4,8]
Recursively solve:
child1 = [1 2 3]
root1 = [2]
left1 = [1]
right1 = [3]
Similarly, solve recursively for all children.
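A small Python sketch of this construction (assumptions of mine: the dict-based node shape and how the m+1 part sizes are balanced; very small subarrays may therefore split slightly differently than the hand-worked example, while keeping the same height):

    def build(keys, m):
        # keys must be sorted; returns a node with up to m keys and m+1
        # recursively built children
        n = len(keys)
        if n <= m:
            return {"keys": list(keys), "children": []}
        rest = n - m                   # keys left over for the m+1 children
        sizes = [rest // (m + 1) + (1 if i < rest % (m + 1) else 0)
                 for i in range(m + 1)]
        node = {"keys": [], "children": []}
        pos = 0
        for i, size in enumerate(sizes):
            node["children"].append(build(keys[pos:pos + size], m))
            pos += size
            if i < m:                  # one separator key goes into this node
                node["keys"].append(keys[pos])
                pos += 1
        return node

    root = build(list(range(1, 11)), 2)
    print(root["keys"])  # [4, 8], the root from the example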
So is this about optimising the creation procedure, or optimising the tree?
You can clearly create a maximally efficient B-Tree by first creating a full Balanced Binary Tree, and then contracting nodes.
At any level in a binary tree, the gap in numbers between two nodes contains all the numbers between those two values, by the definition of a binary tree, and this is more or less the definition of a B-Tree. You simply start contracting the binary-tree divisions into B-Tree nodes. Since the binary tree is balanced by construction, the gaps between nodes on the same level always contain the same number of nodes (assuming the tree is filled), so the B-Tree constructed this way is guaranteed to be balanced.
In practice this is probably quite a slow way to create a BTree, but it certainly meets your criteria for constructing the optimal B-Tree, and the literature on creating balanced binary trees is comprehensive.
=====================================
In your case, where you might take an off-the-shelf "better" over a constructed optimal version, have you considered simply changing the number of children nodes can have? Your diagram looks like a classic 2-3 tree, but it's perfectly possible to have a 3-4 tree, or a 3-5 tree, which means that every node will have at least three children.
Your question is about B-tree optimization. It is unlikely that you are doing this just for fun, so I can only assume that you would like to optimize data accesses, maybe as part of database programming or something like that. You wrote: "When searching the BTree, it is paged on demand from disk into memory", which means that you either do not have enough memory to do any sort of caching or have a policy to use as little memory as possible. Either way, this may be the root cause of why any answer to your question will not be satisfying. Let me explain why.
When it comes to data access optimization, memory is your friend. It does not matter whether you do read or write optimization: you need memory. Any sort of write optimization always works on the assumption that it can read information in a quick way (from memory), because sorting needs data. If you do not have enough memory for read optimization, you will not have enough for write optimization either.
As soon as you are willing to accept at least some memory utilization, you can rethink your statement "When searching the BTree, it is paged on demand from disk into memory", which makes room for balancing between read and write optimization. A maximally optimized B-tree is maximum write optimization. In most data access scenarios I know of, you get one write per 10-100 reads, which means that maximum write optimization is likely to give poor performance in terms of overall data access optimization. That is why databases accept restructuring cycles, key space waste, unbalanced btrees and things like that...
I am working on an assignment for an Algorithms and Data Structures class. I am having trouble understanding the instructions given. I will do my best to explain the problem.
The input I am given is a positive integer n followed by n positive integers which represent the frequency (or weight) for symbols in an ordered character set. The first goal is to construct a tree that gives an approximate order-preserving Huffman code for each character of the ordered character set. We are to accomplish this by "greedily merging the two adjacent trees whose weights have the smallest sum."
In the assignment we are shown that a conventional Huffman code tree is constructed by first inserting the weights into a priority queue. Then, by using a delmin() function to "pop" off the root from the priority queue, I can obtain the two nodes with the lowest frequencies and merge them into one node whose left and right children are these two lowest-frequency nodes and whose priority is the sum of the priorities of its children. This merged node is then inserted back into the min-heap. The process is repeated until all input nodes have been merged. I have implemented this using an array of size 2n - 1, with the input nodes at 0...n-1 and the merged nodes at n...2n-1.
I do not understand how I can greedily merge the two adjacent trees whose weights have the smallest sum. My input has basically been organized into a min-heap and from there I must find the two adjacent nodes that have the smallest sum and merge them. By adjacent I assume my professor means that they are next to each other in the input.
Example Input:
9
1
2
3
3
2
1
1
2
3
Then my min-heap would look like so:
        1
      /   \
     2     1
    / \   / \
   2   2 3   1
  / \
 3   3
The two adjacent trees (or nodes) with the smallest sum, then, are the two consecutive 1's that appear near the end of the input. What logic can I apply to start with these nodes? I seem to be missing something but I can't quite grasp it. Please, let me know if you need any more information. I can elaborate myself or provide the entire assignment page if something is unclear.
I think this can be done with a small modification to the conventional algorithm. Instead of storing single trees in your priority queue heap, store pairs of adjacent trees. Then, at each step you remove the minimum pair (t1, t2) as well as the up to two pairs that also contain those trees, i.e. (u, t1) and (t2, r). Then merge t1 and t2 into a new tree t', re-insert the pairs (u, t') and (t', r) into the heap with updated weights, and repeat.
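A sketch of that idea in Python (my own construction; instead of deleting the two affected pairs eagerly, it leaves them in the heap and discards stale ones lazily when popped, which is equivalent and simpler to code):

    import heapq, itertools

    def order_preserving_huffman(weights):
        # Trees live in a doubly linked list; every adjacent pair sits in a
        # min-heap keyed by weight sum. Stale pairs (one side already merged)
        # are discarded lazily when popped.
        tie = itertools.count()  # tie-breaker so dicts never get compared
        def node(w, kids=None):
            return {"w": w, "kids": kids, "prev": None, "next": None, "alive": True}
        trees = [node(w) for w in weights]
        if len(trees) == 1:
            return trees[0]
        for a, b in zip(trees, trees[1:]):
            a["next"], b["prev"] = b, a
        heap = [(a["w"] + b["w"], next(tie), a, b) for a, b in zip(trees, trees[1:])]
        heapq.heapify(heap)
        for _ in range(len(trees) - 1):
            while True:
                s, _, a, b = heapq.heappop(heap)
                if a["alive"] and b["alive"] and a["next"] is b:
                    break  # the pair is still current
            t = node(s, kids=(a, b))  # merge: a is the left child, b the right
            a["alive"] = b["alive"] = False
            t["prev"], t["next"] = a["prev"], b["next"]
            if t["prev"]:
                t["prev"]["next"] = t
                heapq.heappush(heap, (t["prev"]["w"] + s, next(tie), t["prev"], t))
            if t["next"]:
                t["next"]["prev"] = t
                heapq.heappush(heap, (s + t["next"]["w"], next(tie), t, t["next"]))
        return t

    root = order_preserving_huffman([1, 2, 3, 3, 2, 1, 1, 2, 3])
    print(root["w"])  # 18, the total weight of the example input

Since every merge pushes at most two new pairs, the heap holds O(n) entries and the whole thing runs in O(n log n).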
You need to pop two trees and make a third tree. Join the tree with the smaller sum to its left node and the second tree to its right node, then put this tree back into the heap. From your example:
Pop 2 trees from the heap:
1   1
Make a tree:
  ?
 / \
?   ?
Put the smaller tree in the left node, min(1, 1) = 1:
  ?
 / \
1   ?
Put the second tree in the right node:
  ?
 / \
1   1
The tree you made has sum = sum of left node + sum of right node:
  2
 / \
1   1
Put the new tree (sum 2) back into the heap.
Finally you will have one tree: that's the Huffman tree.
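For reference, a minimal Python sketch of the merging loop described here (conventional Huffman with heapq; the names are mine):

    import heapq, itertools

    def huffman(weights):
        # Pop the two smallest trees, hang the smaller one on the left,
        # push the merged tree back; repeat until one tree remains.
        tie = itertools.count()  # tie-breaker for equal sums
        heap = [(w, next(tie), ("leaf", w)) for w in weights]
        heapq.heapify(heap)
        while len(heap) > 1:
            s1, _, left = heapq.heappop(heap)   # smaller sum -> left node
            s2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (s1 + s2, next(tie), (left, right)))
        return heap[0]

    total, _, tree = huffman([1, 2, 3, 3, 2, 1, 1, 2, 3])
    print(total)  # 18, the sum of all input weights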