Traversing a complete binary min heap - arrays

I am not sure how to traverse the tree structure below so that the nodes are always visited in ascending order. Heapifying the array [9 8 7 6 5 4 3 2 1 0] results in the array [0 1 3 2 5 4 7 9 6 8], which I think corresponds to this representation:

             0
           /   \
          1     3
         / \   / \
        2   5 4   7
       / \  |
      9   6 8

Wanting to keep the array as is (because I want to do efficient inserts later), how can I efficiently traverse it in ascending order? (That is, visiting the nodes in this order: [0 1 2 3 4 5 6 7 8 9].)

Just sort the array. It will still be a min-heap afterward, and no other algorithm is asymptotically more efficient.

You can't traverse the heap in the same sense that you can traverse a BST. @Dukeling is right that a BST is the better choice if sorted traversal is an important operation. However, you can use the following algorithm, which requires only O(1) additional space.
Assume you have the heap in the usual array form.
Remove items one at a time in sorted order to "visit" them for the traversal.
After visiting the i'th item, put it back in the heap array at location n-i, which is currently unused by the heap (assuming zero-based array indices).
After traversal reverse the array to create a new heap.
Removing the items requires O(n log n) time. Reversing is another O(n).
If you don't need to traverse all the way, you can stop at any time and "fix up" the array by running the O(n) heapify operation; standard pseudocode for it is easy to find.
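For illustration, here is a minimal C++ sketch of this (my own code, using the heap array from the question). std::pop_heap with std::greater<> pops the current minimum of a min-heap and parks it just past the shrinking live range, which is exactly the "put it back at location n-i" step; reversing at the end restores a valid min-heap.

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> heap{0, 1, 3, 2, 5, 4, 7, 9, 6, 8};

    // Visit elements in ascending order; each pop_heap moves the current
    // minimum to just past the live heap range, so afterwards the array
    // is sorted in descending order.
    for (auto end = heap.end(); end != heap.begin(); --end) {
        std::cout << heap.front() << ' ';                // visit the minimum
        std::pop_heap(heap.begin(), end, std::greater<>{});
    }
    std::cout << '\n';

    // A descending array reversed is ascending, and an ascending array
    // already satisfies the min-heap property, so the heap is restored.
    std::reverse(heap.begin(), heap.end());
}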

I would actually rather suggest a self-balancing binary search tree (BST) here:
A binary search tree (BST) ... is a node-based binary tree data structure which has the following properties:
The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
The left and right subtree each must also be a binary search tree.
There must be no duplicate nodes.
It's simpler and more space efficient to traverse a BST in sorted order (a so-called in-order traversal) than doing so with a heap.
A BST would support O(log n) inserts, and O(n) traversal.
If you're doing tons of inserts before you traverse again, it might be more efficient to collect the values in a plain array and sort it just before traversing: inserts become O(1) appends, and getting the sorted order costs O(n log n). The exact point at which this option becomes more efficient than using a BST will need to be benchmarked.
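For illustration, a minimal sketch with std::multiset, one common off-the-shelf self-balancing BST (a red-black tree in typical implementations) which, unlike std::set, keeps duplicates:

#include <iostream>
#include <set>

int main() {
    std::multiset<int> tree;
    for (int v : {9, 8, 7, 6, 5, 4, 3, 2, 1, 0})
        tree.insert(v);                 // O(log n) per insert

    for (int v : tree)                  // O(n) in-order (sorted) traversal
        std::cout << v << ' ';          // 0 1 2 ... 9
    std::cout << '\n';
}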
For the sake of curiosity, here's how you traverse a heap in sorted order (if you, you know, don't want to just keep removing the minimum from the heap until it's empty, which is probably the simpler option, since removing the minimum is a standard heap operation).
From the properties of a heap, there's nothing stopping some element from being in the left subtree, the element following it from being in the right, the one after that in the left again, and so on. This means that you can't just completely finish the left subtree and then start on the right; you may need to keep a lot of the heap in memory as you do this.
The main idea is based on the fact that an element is always smaller than both its children.
Based on this, we could construct the following algorithm:
Create a heap (another one)
Insert the root of the original heap into the new heap
While the new heap has elements:
Remove the minimum from the new heap
Output that element
Add that element's children from the original heap, if it has any, to the new heap
This takes O(n log n) time and O(n) space (for reference, the BST traversal takes O(log n) space), not to mention the added code complexity.
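Here is a minimal C++ sketch of that algorithm (my own code; the auxiliary heap stores indices into the original heap array, compared by the values found there):

#include <cstddef>
#include <iostream>
#include <queue>
#include <vector>

// Visit a min-heap's elements in ascending order without modifying it.
// O(n log n) time, O(n) extra space in the worst case.
void traverseSorted(const std::vector<int>& heap) {
    auto byValue = [&heap](std::size_t a, std::size_t b) {
        return heap[a] > heap[b];   // min-heap on the values
    };
    std::priority_queue<std::size_t, std::vector<std::size_t>, decltype(byValue)>
        next(byValue);

    if (!heap.empty()) next.push(0);              // start at the root
    while (!next.empty()) {
        std::size_t i = next.top();
        next.pop();
        std::cout << heap[i] << ' ';              // output the current minimum
        if (2 * i + 1 < heap.size()) next.push(2 * i + 1);  // left child
        if (2 * i + 2 < heap.size()) next.push(2 * i + 2);  // right child
    }
    std::cout << '\n';
}

int main() {
    traverseSorted({0, 1, 3, 2, 5, 4, 7, 9, 6, 8});  // 0 1 2 ... 9
}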

You can use std::set, if you're ok without duplicates. A std::set iterator will traverse in order and maintains ordering based on the default comparator. In the case of int, it's <, but if you traverse in reverse order with rbegin(), you can traverse from highest to lowest. Or you can add a custom comparator. The former is presented:
#include <iostream>
#include <vector>
#include <set>

using namespace std;

int main() {
    vector<int> data{ 5, 2, 1, 9, 10, 3, 4, 7, 6, 8 };
    set<int> ordered;
    for (auto n : data) {
        ordered.insert(n);

        // print in order (highest to lowest)
        for (auto it = ordered.rbegin(); it != ordered.rend(); it++) {
            cout << *it << " ";
        }
        cout << endl;
    }
    return 0;
}
Output:
5
5 2
5 2 1
9 5 2 1
10 9 5 2 1
10 9 5 3 2 1
10 9 5 4 3 2 1
10 9 7 5 4 3 2 1
10 9 7 6 5 4 3 2 1
10 9 8 7 6 5 4 3 2 1

Related

Efficient removal of duplicates in array

How can duplicates be removed and recorded from an array with the following constraints:
The running time must be at most O(n log n)
The additional memory used must be at most O(n)
The result must fulfil the following:
Duplicates must be moved to the end of the original array
The order of the first occurrence of each unique element must be preserved
For example, from this input:
int A[] = {2,3,7,3,2,11,2,3,1,15};
The result should be similar to this (only the order of duplicates may differ):
2 3 7 11 1 15 3 3 2 2
As I understand it, the goal is to split an array into two parts: unique elements and duplicates in such a way that the order of the first occurrence of the unique elements is preserved.
Using the array of the OP as an example:
A={2,3,7,3,2,11,2,3,1,15}
A solution could do the following:
Step 1: Initialize the helper array B with indices 0, ..., n-1:
B={0,1,2,3,4,5,6,7,8,9}
Step 2: Sort the pairs (A[i],B[i]), using A[i] as the key, with a stable sorting algorithm of complexity O(n log n):
A={1,2,2,2,3,3,3,7,11,15}
B={8,0,4,6,1,3,7,2,5, 9}
Step 3: With n being the size of the array, go through the pairs (A[i],B[i]) and, for all duplicates (A[i]==A[i-1]), add n to B[i]:
A={1,2, 2, 2,3, 3, 3,7,11,15}
B={8,0,14,16,1,13,17,2, 5, 9}
Step 4: Sort the pairs (A[i],B[i]) again, but now using B[i] as the key:
A={2,3,7,11,1,15, 3, 2, 2, 3}
B={0,1,2, 5,8, 9,13,14,16,17}
A then contains the desired result.
Steps 1 and 3 are O(n) and steps 2 and 4 can be done in O(n log n), so overall complexity is O(n log n).
Note that this method also preserves the order of duplicates. If you want them sorted, you can assign indices n, n+1, ... in step 3 instead of adding n.
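A minimal C++ sketch of steps 1-4 above (my own code, using std::stable_sort for the stable step 2):

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

// Pair each value with its index, stable-sort by value, push duplicate
// indices past n, then sort by index again.
void partitionDuplicates(std::vector<int>& a) {
    const int n = (int)a.size();
    std::vector<std::pair<int, int>> p(n);            // (A[i], B[i])
    for (int i = 0; i < n; ++i) p[i] = {a[i], i};     // step 1

    // Step 2: a stable sort by value keeps first occurrences first.
    std::stable_sort(p.begin(), p.end(),
        [](const auto& x, const auto& y) { return x.first < y.first; });

    // Step 3: every duplicate gets its index pushed past n.
    for (int i = 1; i < n; ++i)
        if (p[i].first == p[i - 1].first) p[i].second += n;

    // Step 4: sort by the (possibly offset) index.
    std::sort(p.begin(), p.end(),
        [](const auto& x, const auto& y) { return x.second < y.second; });

    for (int i = 0; i < n; ++i) a[i] = p[i].first;
}

int main() {
    std::vector<int> a{2, 3, 7, 3, 2, 11, 2, 3, 1, 15};
    partitionDuplicates(a);
    for (int v : a) std::cout << v << ' ';  // 2 3 7 11 1 15 3 2 2 3
}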
Here is a very important hint: when an algorithm is permitted O(n) extra space, that is not the same as saying it can only use the same amount of memory as the input array!
For example, given the input array int array[] = {2,3,7,3,2,11,2,3,1,15}; (10 elements), that is a total space of 10 * sizeof(int) bytes. With a typical 4-byte int, that makes the array 40 bytes of data.
However, I can use more space for my extra array than just those 40 bytes! In fact, I can make a histogram structure that looks like this:
struct histogram
{
    bool   is_used;  // Is this element in use in the histogram?
    int    value;    // The integer value represented by this element
    size_t index;    // The index in the output array of the FIRST instance of the value
    size_t count;    // The number of times the value appears in the source array
};
typedef struct histogram histogram;
And since that is a fixed, finite amount of space, I can feel totally free to allocate n of them!
histogram * new_histogram( size_t size )
{
    return calloc( size, sizeof(struct histogram) );
}
On my machine that’s 240 bytes.
And yes, this absolutely, totally complies with the O(n) extra space requirement! (Because we are only using space for n extra items. Bigger items, yes, but only n of them.)
Goals
So, why make a histogram with all that extra stuff in it?
We are counting duplicates — suggesting that we should be looking at a Counting Sort, and hence, a histogram.
Accept integers in a range beyond [0,n).
The example array has 10 items, so our histogram should only have 10 slots. But there are integer values larger than 9.
Keep all the non-duplicate values in the same order as input
So we need to track the index of the first instance of each value in the input array.
We are obviously not sorting the data, but the basic idea behind a Counting Sort is to build a histogram and then use that histogram to overwrite the array with the ordered elements.
This is a powerful idea. We are going to tweak it.
The Algorithm
Remember that our input array is also our output array! So we will overwrite the array’s input values with our algorithm.
Let’s look at our example again:
value: 2  3  7  3  2  11  2  3  1  15
index: 0  1  2  3  4   5  6  7  8   9
❶ Build the histogram:
slot:    0    1    2    3    4    5    6    7    8    9   (index in histogram)
used?:  no  yes  yes  yes  yes  yes   no  yes   no   no
value:   0   11    2    3    1   15    0    7    0    0
index:   0    3    0    1    4    5    0    2    0    0
count:   0    1    3    3    1    1    0    1    0    0
I used a simple non-negative modulo function to get a hash index into the histogram: abs(value) % histogram_size, then found the first matching or unused entry, again modulo the histogram size. Our histogram has a single collision: 1 and 11 (mod 10) both hash to 1. Since we encountered 11 first it gets stored at index 1 of the histogram, and for 1 we had to seek to the first unused index: 4.
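In code, that lookup might be a small helper like this (a sketch; find_slot is my own name for it, and it assumes the histogram struct above plus abs from the standard library):

size_t find_slot( histogram *h, size_t size, int value )
{
    size_t i = (size_t)abs( value ) % size;   // non-negative hash index
    // probe forward, wrapping around, until we hit this value's slot
    // or the first unused one
    while (h[i].is_used && h[i].value != value)
        i = (i + 1) % size;
    return i;
}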
We can see that the duplicate values all have a count of 2 or more, and all non-duplicate values have a count of 1.
The magic here is the index value. Look at 11. Its index is 3, not 5. If we look at our desired output we can see why:

value: 2  3  7  11  1  15  2  2  3  3
index: 0  1  2   3  4   5  6  7  8  9

The 11 is in index 3 of the output. This is a very simple counting trick when building the histogram. Keep a running index that we only increment when we first add a value to the histogram. This index is where the value should appear in the output!
❷ Use the histogram to put the non-duplicate values into the array.
Clearly, anything with a non-zero count appears at least once in the input, so it must also be output.
Here’s where our magic histogram index first helps us. We already know exactly where in the array to put the value!
value: 2  3  7  11  1  15
index: 0  1  2   3  4   5   ⟵  index in the array at which to put the value
You should take a moment to compare the array output index with the index values stored in the histogram above and convince yourself that it works.
❸ Use the histogram to put the duplicate values into the array.
So, at what index do we start putting duplicates into the array? Do we happen to have some magic index laying around somewhere that could help? From when we built the histogram?
Again stating the obvious, anything with a count greater than 1 is a value with duplicates. For each duplicate, put count-1 copies into the array.
We don’t care what order the duplicates appear, so we’ll just take them in the order they are stored in the histogram.
Complexity
The complexity of a Counting Sort is O(n+k): one pass over the input array (to build the histogram) and one pass over the histogram data (to rebuild the array in sorted order).
Our modification is: one pass over the input array (to build the histogram), then one pass over the histogram to build the non-duplicate partition, then one more pass over the histogram to build the duplicates partition. That’s a complexity of O(n+2k).
In both cases it reduces to an O(n) worst-case complexity. In fact, it is also an Ω(n) best-case complexity, making it a Θ(n) complexity — it takes the same processing per element no matter what the input.
Aaaaaahhhh! I gotta code that!!!?
Yep. It is only a tiny bit more complex than what you are used to. Remember, you only need a few things:
An array of integer values (obtained from the user?)
A histogram array
A function to turn an integer value into an index into the histogram
A function that does the three things:
Build the histogram from the array
Use the histogram to write the non-duplicate values back into the array in the correct spots
Use the histogram to write the duplicate values to the end of the array
Ability to print an integer array
Your main() should look something like this:
#include <stdio.h>
#include <stdlib.h>

int  partition( int *array, int size );                        // you write this
void print_array( const char *title, int *array, int size );  // and this

int main(void)
{
    // Get number of integers to input
    int size = 0;
    scanf( "%d", &size );

    // Allocate and get the integers
    int * array = malloc( size * sizeof *array );
    for (int n = 0; n < size; n++)
        scanf( "%d", &array[n] );

    // Partition the array between non-duplicate and duplicate values
    int pivot = partition( array, size );

    // Print the results
    print_array( "non-duplicates:", array, pivot );
    print_array( "duplicates:    ", array+pivot, size-pivot );

    free( array );
    return 0;
}
Notice the complete lack of input error checking. You can assume that your professor will test your program without inputting hello or anything like that.
You can do this!

Can we use binary search to find the most frequently occurring integer in a sorted array? [closed]

Problem:
Given a sorted array of integers find the most frequently occurring integer. If there are multiple integers that satisfy this condition, return any one of them.
My basic solution:
Scan through the array and keep track of how many times you've seen each integer. Since it's sorted, you know that once you see a different integer, you've gotten the frequency of the previous integer. Keep track of which integer had the highest frequency.
This is O(N) time, O(1) space solution.
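For reference, a minimal C++ sketch of this baseline scan (my own code; it assumes a non-empty array):

#include <cstddef>
#include <iostream>
#include <vector>

// One pass over the sorted array, tracking the length of the current run
// and the best run seen so far. O(N) time, O(1) space.
int mostFrequent(const std::vector<int>& a) {  // assumes a is non-empty
    int best = a[0], bestCount = 0, count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        count = (i > 0 && a[i] == a[i - 1]) ? count + 1 : 1;
        if (count > bestCount) { bestCount = count; best = a[i]; }
    }
    return best;
}

int main() {
    std::cout << mostFrequent({1, 1, 2, 2, 2, 3}) << '\n';  // prints 2
}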
I am wondering if there's a more efficient algorithm that uses some form of binary search. It will still be O(N) time, but it should be faster for the average case.
Asymptotically (big-oh wise), you cannot use binary search to improve the worst case, for the reasons the answers above mine have presented. However, here are some ideas that may or may not help you in practice.
For each integer, binary search for its last occurrence. Once you find it, you know how many times it appears in the array, and can update your counts accordingly. Then, continue your search from the position you found.
This is advantageous if you have only a few elements that repeat a lot of times, for example:
1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Because you will only do 3 binary searches. If, however, you have many distinct elements:
1 2 3 4 5 6
Then you will do O(n) binary searches, resulting in O(n log n) complexity, so worse.
This gives you a better best case and a worse worst case than your initial algorithm.
Can we do better? We could improve the worst case by finding the last occurrence of the number at position i like this: look at 2i, then at 4i, etc., as long as the value at those positions is the same. Once it differs, binary search between the last two probes (e.g., look at (i + 2i) / 2), and so on.
For example, consider the array:
        i
index:  1  2  3  4  5  6  7 ...
value:  1  1  1  1  1  2  2  2  2  3  3  3  3  3  3  3  3  3  3
We look at 2i = 2, it has the same value. We look at 4i = 4, same value. We look at 8i = 8, different value. We backtrack to (4 + 8) / 2 = 6. Different value. Backtrack to (4 + 6) / 2 = 5. Same value. Try (5 + 6) / 2 = 5, same value. We search no more, because our window has width 1, so we're done. Continue the search from position 6.
This should improve the best case, while keeping the worst case as fast as possible.
Asymptotically, nothing is improved. To see if it actually works better on average in practice, you'll have to test it.
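Here is a minimal C++ sketch of that doubling ("galloping") step (my own code; it doubles the offset from i, which matches the 2i, 4i, ... probes in the example above where i = 1). It returns the first index past the run containing position i, and the caller continues the scan from there:

#include <algorithm>
#include <cstddef>
#include <vector>

// Find the first index past the run of equal values starting at i:
// double the probe offset until we overshoot the run, then binary
// search inside the last bracketed window.
std::size_t pastRun(const std::vector<int>& a, std::size_t i) {
    const int v = a[i];
    std::size_t bound = 1;
    while (i + bound < a.size() && a[i + bound] == v)
        bound *= 2;                                  // gallop outward
    // The run's end is now bracketed: a[lo] == v, and a[hi] != v or hi == n.
    std::size_t lo = i + bound / 2;
    std::size_t hi = std::min(i + bound, a.size());
    while (lo + 1 < hi) {                            // plain binary search
        std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == v) lo = mid; else hi = mid;
    }
    return hi;
}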
Binary search, which eliminates half of the remaining candidates, probably wouldn't work. There are some techniques you could use to avoid reading every element in the array. Unless your array is extremely long or you're solving a problem for curiosity, the naive (linear scan) solution is probably good enough.
Here's why I think binary search wouldn't work: given the value of the middle item of the array, you do not have enough information to eliminate the lower or upper half from the search.
However, we can scan the array in multiple passes, each time checking twice as many elements. When we find two checked elements that are the same, make one final pass. If no other elements were repeated, you've found the longest element run (without even knowing how many of that element are in the sorted list).
Otherwise, investigate the two (or more) longer sequences to determine which is longest.
Consider a sorted list.
Index  0 1 2 3 4 5 6 7 8 9 a b c d e f
List   1 2 3 3 3 3 3 3 3 4 5 5 6 6 6 7
Pass1  1 . . . . . . 3 . . . . . . . 7
Pass2  1 . . 3 . . . 3 . . . 5 . . . 7
Pass3  1 2 . 3 . x . 3 . 4 . 5 . 6 . 7
After pass 3, we know that the run of 3's must be at least 5, while the longest run of any other number is at most 3. Therefore, 3 is the most frequently occurring number in the list.
Using the right data structures and algorithms (use binary-tree-style indexing), you can avoid reading values more than once. You can also avoid reading the 3 (marked as an x in pass 3) since you already know its value.
This solution has running time O(n/k) which degrades to O(n) for k=1 for a list with n elements and a longest run of k elements. For small k, the naive solution will perform better due to simpler logic, data structures, and higher RAM cache hits.
If you need to determine the frequency of the most common number, it would take O((n/k) log k) as indicated by David to find the first and last position of the longest run of numbers using binary search on up to n/k groups of size k.
The worst case cannot be better than O(n) time. Consider the case where each element exists once, except for one element which exists twice. In order to find that element, you'd need to look at every element in the array until you find it. This is because knowing the value of any array element does not give you any information regarding the location of the duplicate element, until it's actually found. This is in contrast to binary search, where the value of an array element allows you to rule out many other elements.
No, in the worst case we have to scan at least n - 2 elements, but see below for an algorithm that exploits inputs with many duplicates.

Consider an adversary that, for the first n - 3 distinct probes into the n-element array, returns m for the value at index m. Now the algorithm knows that the array looks like

1 2 3 ... i-1 ??? i+1 ... j-1 ??? j+1 ... k-1 ??? k+1 ... n-2 n-1 n.

Depending on what the ???s are, the sole correct answer could be j-1 or j+1, so the algorithm isn't done yet.

This example involved an array where there were very few duplicates. In fact, we can design an algorithm that, if the most frequent element occurs k times out of n, uses O((n/k) log k) probes into the array. For j from ceil(log2(n)) - 1 down to 0, examine the subarray consisting of every (2**j)th element, and stop if we find a duplicate. The cost so far is O(n/k). Now, for each element in the subarray, use binary search to find its extent (O(n/k) searches in subarrays of size O(k), for a total of O((n/k) log k)).

It can be shown that all algorithms have a worst case of Omega((n/k) log k), making this one optimal in the worst case up to constant factors.

Replace each number a[i] with the next higher number on its right side

Given an array of integers, replace each number a[i] with the next higher number (by value) on its right side whose value is closest to a[i] (if no such number is present, keep it as it is).
For example:

input  -> 3 7 5
output -> 5 7 5

input  -> 3 6 2 6 4 7 1
output -> 4 7 4 7 7 7 1
This question was asked in an interview.
I could start from the right, inserting each element into a BST and then finding the closest higher value in the BST, but this approach would also be O(n^2) in the worst case.
Is there any optimized approach for this?
You can build a balanced BST for the entire list of numbers. Then, go through the list again, using the tree to find the next larger number. After each item is done, remove it from the tree.
The depth of the tree never increases, so the total complexity is O(n log n) for building the tree in the first place, O(log n) per item for finding the next largest item, and O(log n) for removing the current item. Overall O(n log n) with no fancy data structures.
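A minimal C++ sketch of this (my own code, with std::multiset playing the balanced-BST role; erasing the current element before the query means the tree only ever holds values to its right):

#include <iostream>
#include <set>
#include <vector>

int main() {
    std::vector<int> a{3, 6, 2, 6, 4, 7, 1};

    // Build the tree from the entire list: O(n log n).
    std::multiset<int> tree(a.begin(), a.end());

    for (int& x : a) {
        tree.erase(tree.find(x));       // remove the current item: O(log n)
        auto it = tree.upper_bound(x);  // next larger value, necessarily
                                        // from the right side: O(log n)
        if (it != tree.end()) x = *it;  // otherwise keep x as it is
    }
    for (int x : a) std::cout << x << ' ';  // 4 7 4 7 7 7 1
}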

Maximizing a particular sum over all possible subarrays

Consider an array like this one below:
{1, 5, 3, 5, 4, 1}
When we choose a subarray, we reduce it to the lowest number in the subarray. For example, the subarray {5, 3, 5} becomes {3, 3, 3}. Now, the sum of the subarray is defined as the sum of the resultant subarray. For example, {5, 3, 5} the sum is 3 + 3 + 3 = 9. The task is to find the largest possible sum that can be made from any subarray. For the above array, the largest sum is 12, given by the subarray {5, 3, 5, 4}.
Is it possible to solve this problem in time better than O(n^2)?
I believe that I have an algorithm for this that runs in O(n) time. I'll first describe an unoptimized version of the algorithm, then give a fully optimized version.
For simplicity, let's initially assume that all values in the original array are distinct. This isn't true in general, but it gives a good starting point.
The key observation behind the algorithm is the following. Find the smallest element in the array, then split the array into three parts - all elements to the left of the minimum, the minimum element itself, and all elements to the right of the minimum. Schematically, this would look something like
+-----------------------+-----+-----------------------+
|      left values      | min |      right values     |
+-----------------------+-----+-----------------------+
Here's the key observation: if you take the subarray that gives the optimum value, one of three things must be true:
The subarray contains the minimum value. Its sum is then the minimum times its length, which is maximized by taking every element; this has total value min * n, where n is the number of elements.
The subarray lies purely to the left of the minimum value.
The subarray lies purely to the right of the minimum value.
In the latter two cases the subarray does not include the minimum element itself.
This gives a nice initial recursive algorithm for solving this problem:
If the sequence is empty, the answer is 0.
If the sequence is nonempty:
Find the minimum value in the sequence.
Return the maximum of the following:
The best answer for the subarray to the left of the minimum.
The best answer for the subarray to the right of the minimum.
The number of elements times the minimum.
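A direct C++ sketch of this recursion (my own code; this is the unoptimized version, before the Cartesian-tree speedup discussed below):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// The best subarray either spans the whole range (value = min * length)
// or lies strictly to one side of the minimum. O(n^2) in the worst case.
long long bestSum(const std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return 0;                      // empty sequence
    auto minIt = std::min_element(a.begin() + lo, a.begin() + hi);
    std::size_t m = (std::size_t)(minIt - a.begin());
    long long whole = (long long)(*minIt) * (long long)(hi - lo);
    return std::max({whole, bestSum(a, lo, m), bestSum(a, m + 1, hi)});
}

int main() {
    std::vector<int> a{1, 5, 3, 5, 4, 1};
    std::cout << bestSum(a, 0, a.size()) << '\n';  // prints 12
}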
So how efficient is this algorithm? Well, that really depends on where the minimum elements are. If you think about it, we do linear work to find the minimum, then divide the problem into two subproblems and recurse on each. This is the exact same recurrence you get when considering quicksort. This means that in the best case it will take Θ(n log n) time (if we always have the minimum element in the middle of each half), but in the worst case it will take Θ(n^2) time (if we always have the minimum value purely on the far left or the far right).
Notice, however, that all of the effort we're spending is being used to find the minimum value in each of the subarrays, which takes O(k) time for k elements. What if we could speed this up to O(1) time? In that case, our algorithm would do a lot less work. More specifically, it would do only O(n) work. The reason for this is the following: each time we make a recursive call, we do O(1) work to find the minimum element, then remove that element from the array and recursively process the remaining pieces. Each element can therefore be the minimum element of at most one of the recursive calls, and so the total number of recursive calls can't be any greater than the number of elements. This means that we make at most O(n) calls that each do O(1) work, which gives a total of O(n) work.
So how exactly do we get this magical speedup? This is where we get to use a surprisingly versatile and underappreciated data structure called the Cartesian tree. A Cartesian tree is a binary tree created out of a sequence of elements that has the following properties:
Each node is smaller than its children, and
An inorder walk of the Cartesian tree gives back the elements of the sequence in the order in which they appear.
For example, the sequence 4 6 7 1 5 0 2 8 3 has this Cartesian tree:
          0
        /   \
       1     2
      / \     \
     4   5     3
      \       /
       6     8
        \
         7
And here's where we get the magic. We can immediately find the minimum element of the sequence by just looking at the root of the Cartesian tree - that takes only O(1) time. Once we've done that, when we make our recursive calls and look at all the elements to the left of or to the right of the minimum element, we're just recursively descending into the left and right subtrees of the root node, which means that we can read off the minimum elements of those subarrays in O(1) time each. Nifty!
The real beauty is that it is possible to construct a Cartesian tree for a sequence of n elements in O(n) time; the construction is detailed in the Wikipedia article on Cartesian trees. This means that we can get a super fast algorithm for solving your original problem as follows:
Construct a Cartesian tree for the array.
Use the above recursive algorithm, but use the Cartesian tree to find the minimum element rather than doing a linear scan each time.
Overall, this takes O(n) time and uses O(n) space, which is a time improvement over the O(n^2) algorithm you had initially.
At the start of this discussion, I made the assumption that all array elements are distinct, but this isn't really necessary. You can still build a Cartesian tree for an array with non-distinct elements in it by changing the requirement that each node is smaller than its children to be that each node is no bigger than its children. This doesn't affect the correctness of the algorithm or its runtime; I'll leave that as the proverbial "exercise to the reader." :-)
This was a cool problem! I hope this helps!
Assuming that the numbers are all non-negative, isn't this just the "maximize the rectangle area in a histogram" problem, which has by now become famous?
O(n) solutions are possible. This site: http://blog.csdn.net/arbuckle/article/details/710988 has a bunch of neat solutions.
To elaborate on what I am thinking (it might be incorrect): think of each number as a histogram rectangle of width 1.
By "minimizing" a subarray [i,j] and adding up, you are basically getting the area of the rectangle in the histogram which spans from i to j.
This has appeared before on SO: Maximize the rectangular area under Histogram, where you'll find code, an explanation, and a link to the official solutions page (http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html).
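For reference, here is a minimal C++ sketch of the classic stack-based O(n) solution to that histogram problem (my own code, applied directly to this problem's array). Indices are kept on a stack in increasing order of height; when a bar is popped, the current position is its right boundary and the new stack top marks its left boundary, so the popped height times that width is the best sum over subarrays whose minimum is that bar:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <stack>
#include <vector>

long long maxMinSum(const std::vector<int>& a) {
    std::stack<std::size_t> st;          // indices with increasing heights
    long long best = 0;
    for (std::size_t i = 0; i <= a.size(); ++i) {
        int h = (i < a.size()) ? a[i] : 0;   // height-0 sentinel flushes the stack
        while (!st.empty() && a[st.top()] >= h) {
            long long height = a[st.top()];
            st.pop();
            std::size_t left = st.empty() ? 0 : st.top() + 1;
            best = std::max(best, height * (long long)(i - left));
        }
        st.push(i);
    }
    return best;
}

int main() {
    std::cout << maxMinSum({1, 5, 3, 5, 4, 1}) << '\n';  // prints 12
}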
The following algorithm I tried has the running time of whatever algorithm is initially used to sort the array. For example, if the initial array is sorted with binary tree sort, it takes O(n) in the best case and O(n log n) on average.
Gist of algorithm:
The array is sorted. The sorted values and their corresponding old indices are stored. A binary search tree is created from the corresponding old indices, which is used to determine how far an element can extend forwards and backwards without encountering a value less than the current value; this gives the maximum possible subarray for that element.
I will explain the method with the array in the question [1, 5, 3, 5, 4, 1]
values:         1  5  3  5  4  1
array indices:  0  1  2  3  4  5

This array is sorted. Store the values and their indices in ascending order of value, which will be as follows:

sorted values:           1  1  3  4  5  5
original array indices:  0  5  2  4  1  3
(referred to as old_index)
It is important to have a reference to both the value and their old indices; like an associative array;
Few terms to be clear:
old_index refers to the corresponding original index of an element (that is index in original array);
For example, for element 4, old_index is 4; current_index is 3;
whereas, current_index refers to the index of the element in the sorted array;
current_array_value refers to the current element value in the sorted array.
pre refers to inorder predecessor; succ refers to inorder successor
Also, min and max values can be got directly, from first and last elements of the sorted array, which are min_value and max_value respectively;
Now, the algorithm is as follows which should be performed on sorted array.
Algorithm:
Proceed from the left most element.
For each element from the left of the sorted array, apply this algorithm
if (element == min_value) {
    max_sum = element * array_length;
    if (max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
} else if (element == max_value) {
    // here current_index is the index in the sorted array
    max_sum = element * (array_length - current_index);
    if (max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
} else {
    // pseudocode steps to determine the maximum possible subarray containing the current element
    // pre is the inorder predecessor and succ the inorder successor of old_index in the BST
    get the inorder predecessor and successor from the BST;
    if (pre == NULL) {
        max_sum = succ * current_array_value;
        if (max_sum > current_max)
            current_max = max_sum;
    } else if (succ == NULL) {
        max_sum = ((array_length - pre) - 1) * current_array_value;
        if (max_sum > current_max)
            current_max = max_sum;
    } else {
        // find the maximum possible subarray streak from the values
        max_sum = (((succ - old_index) - 1) + ((old_index - pre) - 1) + 1) * current_array_value;
        if (max_sum > current_max)
            current_max = max_sum;
    }
    push old_index into the BST;
}
For example,
original array is
values:         1  5  3  5  4  1
array indices:  0  1  2  3  4  5

and the sorted array is

sorted values:           1  1  3  4  5  5
original array indices:  0  5  2  4  1  3
(referred to as old_index)
After the first element:
max_sum = 6 [it will reduce to 1*6]

0

After the second element:
max_sum = 6 [it will reduce to 1*6]

0
 \
  5

After the third element:

0
 \
  5
 /
2
inorder traversal results in: 0 2 5
applying the algorithm,
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
max_sum = [((5-2)-1) + ((2-0)-1) + 1] * 3
= 12
current_max = 12 [the maximum possible value]
After the fourth element:

0
 \
  5
 /
2
 \
  4
inorder traversal results in: 0 2 4 5
applying the algorithm,
max_sum = 8 [which is discarded since it is less than 12]
After fifth element:
max_sum = 10 [reduces to 2 * 5, discarded since it is less than 12]
After last element:
max_sum = 5 [reduces to 1 * 5, discarded since it is less than 12]
This algorithm has the running time of whatever algorithm is initially used to sort the array; for example, if the initial array is sorted with binary tree sort, it takes O(n) in the best case and O(n log n) on average.
The space complexity is O(3n) = O(n) [n for the sorted values, another n for the old indices, and another n for constructing the BST]. However, I'm not sure about this. Any feedback on the algorithm is appreciated.

Is there a more elegant way of doing this?

Given an array of positive integers a, I want to output an array of integers b such that b[i] is the closest number to a[i] that is smaller than a[i] and is in {a[0], ..., a[i-1]}. If no such number exists, then b[i] = -1.
Example:
a = 2 1 7 5 7 9
b = -1 -1 2 2 5 7
b[0] = -1 since there is no number that is smaller than 2
b[1] = -1 since there is no number that is smaller than 1 from {2}
b[2] = 2, closest number to 7 that is smaller than 7 from {2,1} is 2
b[3] = 2, closest number to 5 that is smaller than 5 from {2,1,7} is 2
b[4] = 5, closest number to 7 that is smaller than 7 from {2,1,7,5} is 5
I was thinking about implementing balanced binary tree, however it will require a lot of work. Is there an easier way of doing this?
Here is one approach:
b[0] = -1
for i ← 1 to length(A) - 1 {
    // A[i] is to be inserted into the sorted sequence A[0 .. i-1];
    // save A[i] to make a hole at index j
    item = A[i]
    j = i
    // keep moving the hole to the next smaller index until A[j - 1] <= item
    while j > 0 and A[j - 1] > item {
        A[j] = A[j - 1]   // move hole to next smaller index
        j = j - 1
    }
    A[j] = item   // put item in the hole
    // if there are elements to the left of A[j] in the sorted sequence, store the nearest one in b
    // TODO: run a loop here so that duplicate entries won't hamper the results
    if j > 0
        b[i] = A[j - 1]
    else
        b[i] = -1
}
Dry run:
a = 2 1 7 5 7 9
a[1] = 2
it's straightforward: set b[1] to -1
a[2] = 1
insert into subarray : [1 ,2]
any elements before 1 in sorted array ? no.
So set b[2] to -1 . b: [-1, -1]
a[3] = 7
insert into subarray : [1 ,2, 7]
any elements before 7 in sorted array ? yes. its 2
So set b[3] to 2. b: [-1, -1, 2]
a[4] = 5
insert into subarray : [1 ,2, 5, 7]
any elements before 5 in sorted array ? yes. its 2
So set b[4] to 2. b: [-1, -1, 2, 2]
and so on..
Here's a sketch of a (nearly) O(n log n) algorithm that's somewhere in between the difficulty of implementing an insertion sort and balanced binary tree: Do the problem backwards, use merge/quick sort, and use binary search.
Pseudocode:
let c be a copy of a
let b be an array sized the same as a
sort c using an O(n log n) algorithm
for i from a.length-1 down to 1
    binary search over c for key a[i]   // O(log n) time; find the leftmost
                                        // occurrence, so a duplicate is never
                                        // reported as its own predecessor
    remove the item found               // could take O(n) time
    if there exists an item to the left of that position, b[i] = that item
    otherwise, b[i] = -1
b[0] = -1
return b
There's a few implementation details that can make this have poor runtime.
For instance, since you have to remove items, doing this on a regular array and shifting things around will make this algorithm still take O(n^2) time. So, you could store key-value pairs instead. One would be the key, and the other would be the number of those keys (kind of like a multiset implemented on an array). "Removing" one would just be subtracting the second item from the pair and so on.
Eventually you will be left with a bunch of 0-value keys. This would eventually make the if there exists an item to the left take roughly O(n) time, and therefore, the entire algorithm would degrade to a O(n^2) for that reason. So another optimization might be to batch remove all of them periodically. For instance, when 1/2 of them are 0-values, perform a pruning.
The ideal option might be to implement another data structure that has a much more favorable remove time. Something along the lines of a modified unrolled linked list with indices could work, but it would certainly increase the implementation complexity of this approach.
I've actually implemented this. I used the first two optimizations above (storing key-value pairs for compression, and pruning when 1/2 of them are 0s). Here's some benchmarks to compare using an insertion sort derivative to this one:
a.length    This method    Insertion sort method
     100      0.0262 ms        0.0204 ms
    1000      0.2300 ms        0.8793 ms
   10000      2.7303 ms       75.7155 ms
  100000     32.6601 ms     7740.36  ms
  300000     98.9956 ms    69523.6  ms
 1000000    333.501  ms      ????? Not patient enough
So, as you can see, this algorithm grows much, much slower than the insertion sort method I posted before. However, it took 73 lines of code vs 26 lines of code for the insertion sort method. So in terms of simplicity, the insertion sort method might still be the way to go if you don't have time requirements/the input is small.
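For comparison, here is a minimal sketch of the same backwards idea with std::multiset standing in for the sorted copy c (my own code; the container's O(log n) erase plays the role of the "data structure with a much more favorable remove time" mentioned above):

#include <cstddef>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

int main() {
    std::vector<int> a{2, 1, 7, 5, 7, 9};
    std::vector<int> b(a.size(), -1);
    std::multiset<int> c(a.begin(), a.end());     // sorted copy of a

    for (std::size_t i = a.size(); i-- > 1; ) {   // backwards, n-1 down to 1
        auto it = c.lower_bound(a[i]);            // leftmost occurrence of a[i]
        if (it != c.begin())
            b[i] = *std::prev(it);                // item to its left: the largest
                                                  // value strictly below a[i]
        c.erase(it);                              // remove a[i] itself
    }
    for (int v : b) std::cout << v << ' ';        // -1 -1 2 2 5 7
}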
You could treat it like an insertion sort.
Pseudocode:
let arr be one array with enough space for every item in a
let b be another array with, again, enough space for all elements in a
for each item in a:
    perform insertion sort of item into arr
    after performing the insertion, if there exists a number to the left of item, append that number to b
    otherwise, append -1 to b
return b
The main thing you have to worry about is making sure that you don't make the mistake of reallocating arrays (because it would reallocate n times, which would be extremely costly). This will be an implementation detail of whatever language you use (std::vector's reserve for C++ ... arr.reserve(n) for D ... ArrayList's ensureCapacity in Java...)
A potential downfall with this approach compared to using a binary tree is that it's O(n^2) time. However, the constant factors using this method vs binary tree would make this faster for smaller sizes. If your n is smaller than 1000, this would be an appropriate solution. However, O(n log n) grows much slower than O(n^2), so if you expect a's size to be significantly higher and if there's a time limit that you are likely to breach, you might consider a more complicated O(n log n) algorithm.
There are ways to slightly improve the performance (such as using a binary insertion sort: using binary search to find the position to insert into), but generally they won't improve performance enough to matter in most cases since it's still O(n^2) time to shift elements to fit.
Consider this:
a = 2 1 7 5 7 9
b = -1 -1 2 2 5 7
c:  0  1  2  3  4  5  6  7  8  9
    -  -  -  -  -  -  -  -  -  -

where the index into c is the value of a[i] (so indices 0, 3, 4, 6 and 8 will keep their null values), and each entry of c holds the highest-to-date value smaller than that index's value.

So by step a[3] we have the following:

c:  0   1   2  3  4  5  6  7  8  9
    -  -1  -1  -  -  2  -  2  -  -

and by step a[5] we have the following:

c:  0   1   2  3  4  5  6  7  8  9
    -  -1  -1  -  -  2  -  5  -  7
This way, when we get to the 2nd 7 at a[4], we know that 2 is the largest smaller value to date, and all we need to do is loop back through a[i-1], a[i-2], ... until we encounter a 7 again, comparing each intervening value to the one stored in c[7]; if it is bigger (while still smaller than 7), replace c[7]. Once a[i-1] equals the earlier 7, we put c[7] into b[i] and move on to the next a[i].
The main downfalls to this approach that I can see are:
footprint size, depending on how large c[] needs to be dimensioned.
the fact that you have to revisit elements of a[] that you've already touched. If the distribution of data is such that there are significant spaces between the two 7's then keeping track of the highest value as you go would presumably be faster. Alternatively it might be better to gather statistics on the a[i] up front to know what distributions exist and then use a hybrid method maintaining the max until such time that no more instances of that number are in the statistics.
