For each element of an unordered array output the number of greater elements - arrays

I guess the question is quite straightforward, so let me explain with an example:
Input Array = {3 1 8 2 5 3 6 7};
Output Required = {4,7,0,6,3,4,2,1};
Greater than 3 are 4 elements in array (5,6,7,8)
Greater than 1 are 7 elements in array (2,3,3,5,6,7,8)
Greater than 8 are 0 elements in array ()
Greater than 2 are 6 elements in array (3,3,5,6,7,8)
Greater than 5 are 3 elements in array (6,7,8)
Greater than 3 are 4 elements in array (5,6,7,8)
Greater than 6 are 2 elements in array (7,8)
Greater than 7 are 1 elements in array (8)
So one approach will be just to run two nested for loops and be done with it,
time complexity O(N^2), space complexity O(1)
How can this be further optimized?

If you create a copy of the list and sort it, then (assuming unique elements), the 'greater element count' for a value is just
(total number of elements - 1 - position of value in sorted_list),
where we subtract 1 since indices start at 0 and we only want strictly greater elements.
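As a minimal sketch of the unique-elements case only (the function name greater_counts_unique is made up for illustration, and the code assumes no value repeats):

```python
def greater_counts_unique(arr):
    """Greater-element counts for a list whose values are all distinct."""
    sorted_arr = sorted(arr)
    # Position of each value in the sorted copy (valid only without repeats)
    position = {value: index for index, value in enumerate(sorted_arr)}
    n = len(arr)
    return [n - 1 - position[value] for value in arr]


print(greater_counts_unique([3, 1, 8, 2, 5]))
```

For this input, 3 sits at position 2 of the sorted copy, so its count is 5 - 1 - 2 = 2 (namely 5 and 8).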
When elements can be repeated, we should now find the unique elements of the original list and sort them, but also keep track of how many times each element appeared. Then, we need the 'weighted position' of each value in the sorted list, which is the sum of counts of all values at or before that index.
After creating a mapping from each unique value to the count of strictly greater elements, iterate over the original list, replacing each element with the count it's been mapped to.
Since we can convert between 'greater element counts' and the full sorted list in linear time, this method is asymptotically optimal, as it finds greater element counts in O(n log n) time.
Here's a short Python implementation of that idea.
import collections
from typing import List

def greater_element_counts(arr: List[int]) -> List[int]:
    """Return a list with the number of strictly larger elements
    in arr for each position in arr."""
    # How many times each distinct value occurs
    element_to_counts = collections.Counter(arr)
    unique_sorted_elements = sorted(element_to_counts.keys())
    # Start by counting every element, then walk upward through the sorted
    # values, removing the elements that are not strictly greater
    greater_element_count = len(arr)
    answer_by_element = {}
    for unique_element in unique_sorted_elements:
        greater_element_count -= element_to_counts[unique_element]
        answer_by_element[unique_element] = greater_element_count
    return [answer_by_element[element] for element in arr]

Related

Efficient removal of duplicates in array

How can duplicates be removed and recorded from an array with the following constraints:
The running time must be at most O(n log n)
The additional memory used must be at most O(n)
The result must fulfil the following:
Duplicates must be moved to the end of the original array
The order of the first occurrence of each unique element must be preserved
For example, from this input:
int A[] = {2,3,7,3,2,11,2,3,1,15};
The result should be similar to this (only the order of duplicates may differ):
2 3 7 11 1 15 3 3 2 2
As I understand it, the goal is to split an array into two parts: unique elements and duplicates in such a way that the order of the first occurrence of the unique elements is preserved.
Using the array of the OP as an example:
A={2,3,7,3,2,11,2,3,1,15}
A solution could do the following:
Initialize the helper array with indices 0, ..., n-1:
B={0,1,2,3,4,5,6,7,8,9}
Sort the pairs (A[i],B[i]) using A[i] as key and with a stable sorting algorithm of complexity O(n log n):
A={1,2,2,2,3,3,3,7,11,15}
B={8,0,4,6,1,3,7,2,5, 9}
With n being the size of the array, go through the pairs (A[i],B[i]) and for all duplicates (A[i]==A[i-1]), add n to B[i]:
A={1,2, 2, 2,3, 3, 3,7,11,15}
B={8,0,14,16,1,13,17,2, 5, 9}
Sort the pairs (A[i],B[i]) again, but now using B[i] as key:
A={2,3,7,11,1,15, 3, 2, 2, 3}
B={0,1,2, 5,8, 9,13,14,16,17}
A then contains the desired result.
Steps 1 and 3 are O(n) and steps 2 and 4 can be done in O(n log n), so overall complexity is O(n log n).
Note that this method also preserves the order of duplicates. If you want them sorted, you can assign indices n, n+1, ... in step 3 instead of adding n.
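The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name move_duplicates_to_end is hypothetical, it returns a new list rather than rearranging A in place, and it relies on Python's built-in sorted() being stable.

```python
def move_duplicates_to_end(a):
    """Return a's unique values in first-occurrence order, then the duplicates."""
    n = len(a)
    # Steps 1-2: sort the indices 0..n-1 by value; sorted() is stable, so
    # equal values stay in their original index order
    order = sorted(range(n), key=lambda i: a[i])
    # Step 3: a value equal to its predecessor in sorted order is a duplicate;
    # add n to its index so it sorts after every first occurrence
    keyed = []
    for pos, idx in enumerate(order):
        is_duplicate = pos > 0 and a[order[pos - 1]] == a[idx]
        keyed.append((idx + n if is_duplicate else idx, a[idx]))
    # Step 4: sort by the adjusted index (all keys are distinct)
    keyed.sort()
    return [value for _, value in keyed]


print(move_duplicates_to_end([2, 3, 7, 3, 2, 11, 2, 3, 1, 15]))
```

On the example array this yields 2 3 7 11 1 15 3 2 2 3, matching the final A shown above.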
Here is a very important hint: when an algorithm is permitted O(n) extra space, that is not the same as saying it can only use the same amount of memory as the input array!
For example, take the input array int array[] = {2,3,7,3,2,11,2,3,1,15}; (10 elements). That is a total space of 10 * sizeof(int) bytes. On a typical 64-bit machine an int is 4 bytes long, making the array 40 bytes of data.
However, I can use more space for my extra array than just 40 bytes! In fact, I can make a histogram structure that looks like this:
struct histogram
{
bool is_used; // Is this element in use in the histogram?
int value; // The integer value represented by this element
size_t index; // The index in the output array of the FIRST instance of the value
size_t count; // The number of times the value appears in the source array
};
typedef struct histogram histogram;
And since that is a fixed, finite amount of space, I can feel totally free to allocate n of them!
histogram * new_histogram( size_t size )
{
return calloc( size, sizeof(struct histogram) );
}
On my machine that’s 240 bytes.
And yes, this absolutely, totally complies with the O(n) extra space requirement! (Because we are only using space for n extra items. Bigger items, yes, but only n of them.)
Goals
So, why make a histogram with all that extra stuff in it?
We are counting duplicates — suggesting that we should be looking at a Counting Sort, and hence, a histogram.
Accept integers in a range beyond [0,n).
The example array has 10 items, so our histogram should only have 10 slots. But there are integer values larger than 9.
Keep all the non-duplicate values in the same order as input
So we need to track the index of the first instance of each value in the input array.
We are obviously not sorting the data, but the basic idea behind a Counting Sort is to build a histogram and then use that histogram to overwrite the array with the ordered elements.
This is a powerful idea. We are going to tweak it.
The Algorithm
Remember that our input array is also our output array! So we will overwrite the array’s input values with our algorithm.
Let’s look at our example again:
2 3 7 3 2 11 2 3 1 15
  0    1    2    3    4    •5     6    7    8     9
❶ Build the histogram:
        0    1    2    3    4    5    6    7    8    9    (index in histogram)
used?:  no   yes  yes  yes  yes  yes  no   yes  no   no
value:  0    11   2    3    1    15   0    7    0    0
index:  0    3    0    1    4    5    0    2    0    0
count:  0    1    3    3    1    1    0    1    0    0
I used a simple non-negative modulo function to get a hash index into the histogram: abs(value) % histogram_size, then found the first matching or unused entry, again modulo the histogram size. Our histogram has a single collision: 1 and 11 (mod 10) both hash to 1. Since we encountered 11 first it gets stored at index 1 of the histogram, and for 1 we had to seek to the first unused index: 4.
We can see that the duplicate values all have a count of 2 or more, and all non-duplicate values have a count of 1.
The magic here is the index value. Look at 11. Its index is 3, not 5. If we look at our desired output we can see why:
2 3 7 11 1 15   2 2 3 3.
  0    1    2    •3     4     5       6    7    8    9
The 11 is in index 3 of the output. This is a very simple counting trick when building the histogram. Keep a running index that we only increment when we first add a value to the histogram. This index is where the value should appear in the output!
❷ Use the histogram to put the non-duplicate values into the array.
Clearly, anything with a non-zero count appears at least once in the input, so it must also be output.
Here’s where our magic histogram index first helps us. We already know exactly where in the array to put the value!
2 3 7 11 1 15
  0    1    2     3     4     5    ⟵   index into the array to put the value
You should take a moment to compare the array output index with the index values stored in the histogram above and convince yourself that it works.
❸ Use the histogram to put the duplicate values into the array.
So, at what index do we start putting duplicates into the array? Do we happen to have some magic index lying around somewhere that could help? From when we built the histogram?
Again stating the obvious, anything with a count greater than 1 is a value with duplicates. For each duplicate, put count-1 copies into the array.
We don’t care what order the duplicates appear, so we’ll just take them in the order they are stored in the histogram.
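The three steps can be sketched compactly in Python. Note the substitutions: a Python dict stands in for the hand-built hash histogram (dicts preserve insertion order in Python 3.7+, which plays the role of the stored index field), and the function name partition_duplicates is an assumption chosen to mirror the partition() call used in main().

```python
def partition_duplicates(array):
    """Rearrange array in place: unique values first, duplicates after.

    Returns the pivot, i.e. the number of distinct values.
    """
    # Step 1: build the histogram (value -> count), in first-seen order
    counts = {}
    for value in array:
        counts[value] = counts.get(value, 0) + 1

    # Step 2: write each distinct value once, in first-occurrence order
    pivot = 0
    for value in counts:
        array[pivot] = value
        pivot += 1

    # Step 3: write count-1 extra copies of every repeated value
    out = pivot
    for value, count in counts.items():
        for _ in range(count - 1):
            array[out] = value
            out += 1
    return pivot


data = [2, 3, 7, 3, 2, 11, 2, 3, 1, 15]
pivot = partition_duplicates(data)
print(data[:pivot], data[pivot:])
```

Overwriting the input is safe here because the histogram already holds everything needed to rebuild the array.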
Complexity
The complexity of a Counting Sort is O(n+k): one pass over the input array (to build the histogram) and one pass over the histogram data (to rebuild the array in sorted order).
Our modification is: one pass over the input array (to build the histogram), then one pass over the histogram to build the non-duplicate partition, then one more pass over the histogram to build the duplicates partition. That’s a complexity of O(n+2k).
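In both cases it reduces to O(n), provided each histogram lookup takes O(1) time. That is the expected case; with the simple modulo hash and linear probing used here, an input whose values all collide can degrade the histogram build to O(n^2). Absent such collisions it is also Ω(n), making it Θ(n): it takes the same processing per element no matter what the input.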
Aaaaaahhhh! I gotta code that!!!?
Yep. It is only a tiny bit more complex than you are used to. Remember, you only need a few things:
An array of integer values (obtained from the user?)
A histogram array
A function to turn an integer value into an index into the histogram
A function that does the three things:
Build the histogram from the array
Use the histogram to write the non-duplicate values back into the array in the correct spots
Use the histogram to write the duplicate values to the end of the array
Ability to print an integer array
Your main() should look something like this:
#include <stdio.h>
#include <stdlib.h>

// partition() and print_array() are the functions described in the list above

int main(void)
{
    // Get number of integers to input
    int size = 0;
    scanf( "%d", &size );

    // Allocate and get the integers
    int * array = malloc( size * sizeof(int) );
    for (int n = 0; n < size; n++)
        scanf( "%d", &array[n] );

    // Partition the array between non-duplicate and duplicate values
    int pivot = partition( array, size );

    // Print the results
    print_array( "non-duplicates:", array, pivot );
    print_array( "duplicates: ", array+pivot, size-pivot );

    free( array );
    return 0;
}
Notice the complete lack of input error checking. You can assume that your professor will test your program without inputting hello or anything like that.
You can do this!

Counting according to query

Given an array A of N positive elements, suppose we list all N × (N+1) / 2 non-empty contiguous subarrays of A and then replace each subarray with the maximum element present in it. We now have N × (N+1) / 2 elements, where each element is the maximum of its subarray.
Now we have Q queries, where each query is one of 3 types:
1 K : We need to count the numbers strictly greater than K among those N × (N+1) / 2 elements.
2 K : We need to count the numbers strictly less than K among those N × (N+1) / 2 elements.
3 K : We need to count the numbers equal to K among those N × (N+1) / 2 elements.
The main problem I am facing is that N can be up to 10^6, so I can't generate all those N × (N+1) / 2 elements. Please help me solve this problem.
Example : Let N=3 and we have Q=2. Let array A be [1,2,3] then all sub arrays are :
[1] -> [1]
[2] -> [2]
[3] -> [3]
[1,2] -> [2]
[2,3] -> [3]
[1,2,3] -> [3]
So now we have [1,2,3,2,3,3]. As Q=2 so :
Query 1 : 3 3
It means we need to tell count of numbers equal to 3. So answer is 3 as there are 3 numbers equal to 3 in the generated array.
Query 2 : 1 4
It means we need to tell count of numbers greater than 4. So answer is 0 as no one is greater than 4 in generated array.
Now both N and Q can be up to 10^6. So how can this problem be solved? Which data structure would be suitable for it?
I believe I have a solution in O(N + Q*log N) (More about time complexity). The trick is to do a lot of preparation with your array before even the first query arrives.
For each number, figure out where is the first number on left / right of this number that is strictly bigger.
Example: for array: 1, 8, 2, 3, 3, 5, 1 both 3's left block would be position of 8, right block would be the position of 5.
This can be determined in linear time. This is how: keep the previous maximums in a stack. When a new element arrives, remove maximums from the stack until you get to an element strictly bigger than the current one. For example:
Suppose the stack holds [15, 13, 11, 10, 7, 3] (you will of course keep the indexes, not the values; I will just use values for better readability).
Now we read 8, 8 >= 3 so we remove 3 from stack and repeat. 8 >= 7, remove 7. 8 < 10, so we stop removing. We set 10 as 8's left block, and add 8 to the maximums stack.
Also, whenever you remove from the stack (3 and 7 in this example), set the right block of removed number to the current number. One problem though: right block would be set to the next number bigger or equal, not strictly bigger. You can fix this with simply checking and relinking right blocks.
Compute how many times each number is the maximum of some subarray.
Since for each number you now know where the next bigger number to its left / right is, I trust you with finding the appropriate math formula for this.
Then, store the results in a hashmap: the key is the value of a number, and the value is how many times that number is the maximum of some subarray. For example, the record [4->12] would mean that the number 4 is the maximum in 12 subarrays.
Lastly, extract all key-value pairs from the hashmap into an array, and sort that array by the keys. Finally, create a prefix sum for the values of that sorted array.
Handle a request
For request "exactly k", just binary search in your array; for "more/less than k", binary search for key k and then use the prefix array.
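The whole pipeline described above (stack pass, counting, sorting, prefix sums, binary search) might look like the sketch below. The names preprocess and answer are hypothetical. One deliberate deviation is an assumption on my part: instead of relinking right blocks to be strictly bigger, the sketch keeps the left block strictly bigger and the right block bigger-or-equal, so that a subarray containing several equal maxima is counted exactly once.

```python
import bisect
from collections import defaultdict
from itertools import accumulate


def preprocess(a):
    """Return (keys, prefix): sorted distinct values and cumulative counts of
    subarrays whose maximum is <= the corresponding key."""
    n = len(a)
    left = [-1] * n   # nearest index to the left holding a strictly greater value
    right = [n] * n   # nearest index to the right holding a greater-or-equal value
    stack = []        # indexes whose values are strictly decreasing
    for i, x in enumerate(a):
        while stack and a[stack[-1]] <= x:
            right[stack.pop()] = i
        left[i] = stack[-1] if stack else -1
        stack.append(i)

    times = defaultdict(int)  # value -> number of subarrays it is the maximum of
    for i, x in enumerate(a):
        times[x] += (i - left[i]) * (right[i] - i)

    keys = sorted(times)
    prefix = list(accumulate(times[k] for k in keys))
    return keys, prefix


def answer(keys, prefix, kind, k):
    """kind 1: maxima > k, kind 2: maxima < k, kind 3: maxima == k."""
    total = prefix[-1] if prefix else 0
    lo = bisect.bisect_left(keys, k)    # number of keys < k
    hi = bisect.bisect_right(keys, k)   # number of keys <= k
    less = prefix[lo - 1] if lo else 0
    less_or_equal = prefix[hi - 1] if hi else 0
    if kind == 1:
        return total - less_or_equal
    if kind == 2:
        return less
    return less_or_equal - less


keys, prefix = preprocess([1, 2, 3])
print(answer(keys, prefix, 3, 3), answer(keys, prefix, 1, 4))
```

The formula used for each element is (i - left) * (right - i): the number of choices for the subarray's start times the number of choices for its end.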
This answer is an adaptation of this other answer I wrote earlier. The first part is exactly the same, but the others are specific for this question.
Here's an implementation of an O(n log n + q log n) version using a simplified version of a segment tree.
Creating the segment tree: O(n)
In practice, what it does is to take an array, let's say:
A = [5,1,7,2,3,7,3,1]
And construct an array-backed tree that looks like this:
In the tree, the first number is the value and the second is the index where it appears in the array. Each node is the maximum of its two children. This tree is backed by an array (pretty much like a heap tree) where the children of the index i are in the indexes i*2+1 and i*2+2.
Then, for each element, it becomes easy to find the nearest greater elements (before and after each element).
To find the nearest greater element to the left, we go up in the tree searching for the first parent where the left node has value greater and the index lesser than the argument. The answer must be a child of this parent, then we go down in the tree looking for the rightmost node that satisfies the same condition.
Similarly, to find the nearest greater element to the right, we do the same, but looking for a right node with an index greater than the argument. And when going down, we look for the leftmost node that satisfies the condition.
Creating the cumulative frequency array: O(n log n)
From this structure, we can compute the frequency array, that tells how many times each element appears as maximum in the subarray list. We just have to count how many lesser elements are on the left and on the right of each element and multiply those values. For the example array ([1, 2, 3]), this would be:
[(1, 1), (2, 2), (3, 3)]
This means that 1 appears only once as maximum, 2 appears twice, etc.
But we need to answer range queries, so it's better to have a cumulative version of this array, that would look like:
[(1, 1), (2, 3), (3, 6)]
The (3, 6) means, for example, that there are 6 subarrays with maxima less than or equal to 3.
Answering q queries: O(q log n)
Then, to answer each query, you just have to make binary searches to find the value you want. For example. If you need to find the exact number of 3, you may want to do: query(F, 3) - query(F, 2). If you want to find those lesser than 3, you do: query(F, 2). If you want to find those greater than 3: query(F, float('inf')) - query(F, 3).
Implementation
I've implemented it in Python and it seems to work well.
import bisect
from collections import defaultdict
from math import log, ceil

# Padding for unused leaves: compares below every real (value, index) pair
EMPTY = (float('-inf'), -1)

def make_tree(A):
    n = 2**(int(ceil(log(len(A), 2))))
    T = [EMPTY]*(2*n-1)
    for i, x in enumerate(A):
        T[n-1+i] = (x, i)
    for i in reversed(range(n-1)):
        T[i] = max(T[i*2+1], T[i*2+2])
    return T

def print_tree(T):
    # Emit the tree in Graphviz dot format
    print('digraph {')
    for i, x in enumerate(T):
        print('    ' + str(i) + '[label="' + str(x) + '"]')
        if i*2+2 < len(T):
            print('    ' + str(i) + '->' + str(i*2+1))
            print('    ' + str(i) + '->' + str(i*2+2))
    print('}')

def find_generic(T, i, fallback, check, first, second):
    j = len(T)//2+i
    original = T[j]
    j = (j-1)//2
    #go up in the tree searching for a value that satisfies check
    while j > 0 and not check(T[second(j)], original):
        j = (j-1)//2
    #go down in the tree searching for the left/rightmost node that satisfies check
    while j*2+1 < len(T):
        if check(T[first(j)], original):
            j = first(j)
        elif check(T[second(j)], original):
            j = second(j)
        else:
            return fallback
    return j-len(T)//2

def find_left(T, i, fallback):
    return find_generic(T, i, fallback,
        lambda a, b: a[0]>b[0] and a[1]<b[1],  #value greater, index before
        lambda j: j*2+2,  #rightmost first
        lambda j: j*2+1   #leftmost second
    )

def find_right(T, i, fallback):
    return find_generic(T, i, fallback,
        lambda a, b: a[0]>=b[0] and a[1]>b[1],  #value greater or equal, index after
        lambda j: j*2+1,  #leftmost first
        lambda j: j*2+2   #rightmost second
    )

def make_frequency_array(A):
    T = make_tree(A)
    D = defaultdict(int)
    for i, x in enumerate(A):
        left = find_left(T, i, -1)
        right = find_right(T, i, len(A))
        D[x] += (i-left) * (right-i)
    F = sorted(D.items())
    #turn the per-value counts into cumulative counts
    for i in range(1, len(F)):
        F[i] = (F[i][0], F[i-1][1] + F[i][1])
    return F

def query(F, n):
    #number of subarrays whose maximum is less than or equal to n
    idx = bisect.bisect(F, (n,))
    if idx >= len(F): return F[-1][1]
    if F[idx][0] != n:
        #n is not a key: use the cumulative count of the largest key below n
        return F[idx-1][1] if idx > 0 else 0
    return F[idx][1]

F = make_frequency_array([1,2,3])
print(query(F, 3)-query(F, 2)) #3 3
print(query(F, float('inf'))-query(F, 4)) #1 4
print(query(F, float('inf'))-query(F, 1)) #1 1
print(query(F, 2)) #2 3
Your problem can be divided into several steps:
For each element of the initial array, calculate the number of "subarrays" where the current element is the maximum. This will involve a bit of combinatorics. First, for each element you need to know the index of the previous and the next element that is bigger than the current element. Then calculate the number of subarrays as (i - iprev) * (inext - i). Finding iprev and inext requires two traversals of the initial array: in forward and backward order. For iprev you need to traverse your array left to right. During the traversal, maintain a BST that contains the biggest of the previous elements along with their indices. For each element of the original array, find the minimal element in the BST that is bigger than the current one. Its index, stored as the value, will be iprev. Then remove from the BST all elements that are smaller than the current one. This operation should be O(logN), as you are removing whole subtrees. This step is required, as the current element you are about to add will "override" all elements that are less than it. Then add the current element to the BST with its index as the value. At each point in time, the BST will store the descending subsequence of previous elements in which each element is bigger than all the elements that follow it in the array (for previous elements {1,2,44,5,2,6,26,6} the BST will store {44,26,6}). The backward traversal to find inext is similar.
After the previous step you'll have pairs K→P, where K is the value of some element from the initial array and P is the number of subarrays where this element is the maximum. Now you need to group these pairs by K. This means calculating the sum of the P values of the elements with equal K. Be careful about the corner cases when two equal elements could share the same subarrays.
As Ritesh suggested: put all grouped K→P pairs into an array, sort it by K, and calculate the cumulative sum of P for each element in one pass. In this case your queries will be binary searches in this sorted array. Each query will be performed in O(log(N)) time.
Create a sorted value-to-index map. For example,
[34,5,67,10,100] => {5:1, 10:3, 34:0, 67:2, 100:4}
Precalculate the queries in two passes over the value-to-index map:
Top to bottom - maintain an augmented tree of intervals. Each time an index is added,
split the appropriate interval and subtract the relevant segments from the total:
indexes    intervals     total sub-arrays with maximum greater than
4          (0,3)         67 => 15 - (4*5/2) = 5
2,4        (0,1)(3,3)    34 => 5 + (4*5/2) - 2*3/2 - 1 = 11
0,2,4      (1,1)(3,3)    10 => 11 + 2*3/2 - 1 = 13
3,0,2,4    (1,1)         5  => 13 + 1 = 14
Bottom to top - maintain an augmented tree of intervals. Each time an index is added,
adjust the appropriate interval and add the relevant segments to the total:
indexes    intervals     total sub-arrays with maximum less than
1          (1,1)         10  => 1*2/2 = 1
1,3        (1,1)(3,3)    34  => 1 + 1*2/2 = 2
0,1,3      (0,1)(3,3)    67  => 2 - 1 + 2*3/2 = 4
0,1,3,2    (0,3)         100 => 4 - 4 + 4*5/2 = 10
The third query can be pre-calculated along with the second:
indexes    intervals     total sub-arrays with maximum exactly
1          (1,1)         5  => 1
1,3        (3,3)         10 => 1
0,1,3      (0,1)         34 => 2
0,1,3,2    (0,3)         67 => 3 + 3 = 6
Insertion and deletion in augmented trees are of O(log n) time-complexity. Total precalculation time-complexity is O(n log n). Each query after that ought to be O(log n) time-complexity.

An array with O(n) inversions

Trying to figure out what type of array has at most n inversions, with n being the array size. I was thinking an array that is nearly sorted would fall under this case, and also an array that is almost completely sorted with the max element and min element switched, for instance:
9 2 3 4 5 6 7 8 1
So my thinking is that when an array has at most n inversions, is it safe to say that the array is nearly sorted? Or are there other cases where the array would have at most n inversions and not be nearly sorted.
The 'least' sorted array (i.e. reverse sorted) has 1 + 2 + 3 + ... + n-1 = n(n-1)/2 inversions.
The fewer inversions an array has, the 'more' sorted it is.
And, since n is quite a bit smaller than n(n-1)/2, one can probably call an array with n inversions 'nearly sorted'.
This array has n-1 inversions:
9 1 2 3 4 5 6 7 8
In response to your comment, insertion sort's complexity is O(n + d), where d is the number of inversions, thus it will run in O(n) for O(n) inversions.
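To make the O(n + d) claim concrete, here is a small sketch of insertion sort that also counts its element shifts; each shift removes exactly one inversion, so the shift count equals d. The function name insertion_sort_count_shifts is made up for illustration.

```python
def insertion_sort_count_shifts(values):
    """Return (sorted copy, number of shifts); shifts equals the inversion count."""
    a = list(values)
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        # Move every larger element one slot right; each move undoes one inversion
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = x
    return a, shifts


print(insertion_sort_count_shifts([9, 1, 2, 3, 4, 5, 6, 7, 8])[1])
print(insertion_sort_count_shifts([9, 2, 3, 4, 5, 6, 7, 8, 1])[1])
```

The first array (n = 9) needs 8 shifts, i.e. n-1 inversions. The array from the question, with max and min swapped, needs 15 shifts: 7 from moving 2 through 8 past the 9, plus 8 from moving the 1 to the front.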

2sum with duplicate values

The classic 2sum question is simple and well-known:
You have an unsorted array, and you are given a value S. Find all pairs of elements in the array that add up to value S.
And it's always been said that this can be solved with the use of HashTable in O(N) time & space complexity or O(NlogN) time and O(1) space complexity by first sorting it and then moving from left and right,
well these two solution are obviously correct BUT I guess not for the following array :
{1,1,1,1,1,1,1,1}
Is it possible to print ALL pairs which add up to 2 in this array in O(N) or O(NlogN) time complexity ?
No, printing out all pairs (including duplicates) takes O(N^2). The reason is that the output size is O(N^2), so the running time cannot be less than that (it takes some constant amount of time to print each element in the output, so simply printing the output would take C·N^2 = O(N^2) time).
If all the elements are the same, e.g. {1,1,1,1,1}, every possible pair would be in the output:
1. 1 1
2. 1 1
3. 1 1
4. 1 1
5. 1 1
6. 1 1
7. 1 1
8. 1 1
9. 1 1
10. 1 1
This is N-1 + N-2 + ... + 2 + 1 (by taking each element with all elements to the right), which is
N(N-1)/2 = O(N^2), which is more than O(N) or O(N log N).
However, you should be able to simply count the pairs in expected O(N) by:
Creating a hash-map map mapping each element to the count of how often it appears.
Looping through the hash-map and summing, for each element x up to S/2 (if we go up to S we'll include the pair x and S-x twice, let map[x] == 0 if x doesn't exist in the map):
map[x]*map[S-x] if x != S-x (which is the number of ways to pick x and S-x)
map[x]*(map[x]-1)/2 if x == S-x (from N(N-1)/2 above).
Of course you can also print the distinct pairs in O(N) by creating a hash-map similar to the above and looping through it, and only outputting x and S-x the value if map[S-x] exists.
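The counting approach above can be sketched like this. The function name count_pairs is hypothetical; instead of looping "up to S/2", the sketch compares x with S-x directly, which is equivalent and also behaves for negative values.

```python
from collections import Counter


def count_pairs(arr, s):
    """Count unordered pairs of positions (i < j) with arr[i] + arr[j] == s."""
    counts = Counter(arr)
    total = 0
    for x, c in counts.items():
        y = s - x
        if x < y:
            # Handle each {x, y} once: pick any x together with any y
            total += c * counts.get(y, 0)
        elif x == y:
            # Pairs among c equal elements: c choose 2
            total += c * (c - 1) // 2
    return total


print(count_pairs([1] * 8, 2))
```

For the array of eight 1s from the question this gives 8*7/2 = 28 pairs, found in expected O(N) time without listing them.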
Displaying or storing the results is O(N^2) only. The worst case as highlighted by you clearly has on the order of N^2 pairs, and writing them to the screen or storing them in a result array would clearly require at least that much time. In short, you are right!
No
You can pre-compute them in O(n log n) using sorting, but to print them you may need more than O(n log n). In the worst case it can be O(N^2).
Let's modify the algorithm to find all duplicate pairs.
As an example:
a[ ]={ 2 , 4 , 3 , 2 , 9 , 3 , 3 } and sum =6
After sorting:
a[ ] = { 2 , 2 , 3 , 3 , 3 , 4 , 9 }
Suppose you found the pair {2,4}. Now you have to find the counts of 2 and 4 and multiply them to get the number of duplicate pairs. Here 2 occurs 2 times and 4 occurs 1 time, hence {2,4} will appear 2*1 = 2 times in the output. Now consider the special case when both numbers are the same: count the occurrences c and take c*(c-1)/2, since an element cannot be paired with itself. Here {3,3} sums to 6 and 3 occurs 3 times in the array, hence {3,3} will appear 3*2/2 = 3 times in the output.
In your array {1,1,1,1,1} only the pair {1,1} sums to 2 and the count of 1 is 5, hence there are going to be 5*4/2 = 10 pairs of {1,1} in the output.

Maximizing a particular sum over all possible subarrays

Consider an array like this one below:
{1, 5, 3, 5, 4, 1}
When we choose a subarray, we reduce it to the lowest number in the subarray. For example, the subarray {5, 3, 5} becomes {3, 3, 3}. Now, the sum of the subarray is defined as the sum of the resultant subarray. For example, {5, 3, 5} the sum is 3 + 3 + 3 = 9. The task is to find the largest possible sum that can be made from any subarray. For the above array, the largest sum is 12, given by the subarray {5, 3, 5, 4}.
Is it possible to solve this problem in time better than O(n^2)?
I believe that I have an algorithm for this that runs in O(n) time. I'll first describe an unoptimized version of the algorithm, then give a fully optimized version.
For simplicity, let's initially assume that all values in the original array are distinct. This isn't true in general, but it gives a good starting point.
The key observation behind the algorithm is the following. Find the smallest element in the array, then split the array into three parts - all elements to the left of the minimum, the minimum element itself, and all elements to the right of the minimum. Schematically, this would look something like
+-----------------------+-----+-----------------------+
| left values | min | right values |
+-----------------------+-----+-----------------------+
Here's the key observation: if you take the subarray that gives the optimum value, one of three things must be true:
That array consists of all the values in the array, including the minimum value. This has total value min * n, where n is the number of elements.
That array does not include the minimum element and lies purely to the left of the minimum value.
That array does not include the minimum element and lies purely to the right of the minimum value.
This gives a nice initial recursive algorithm for solving this problem:
If the sequence is empty, the answer is 0.
If the sequence is nonempty:
Find the minimum value in the sequence.
Return the maximum of the following:
The best answer for the subarray to the left of the minimum.
The best answer for the subarray to the right of the minimum.
The number of elements times the minimum.
So how efficient is this algorithm? Well, that really depends on where the minimum elements are. If you think about it, we do linear work to find the minimum, then divide the problem into two subproblems and recurse on each. This is the exact same recurrence you get when considering quicksort. This means that in the best case it will take Θ(n log n) time (if we always have the minimum element in the middle of each half), but in the worst case it will take Θ(n^2) time (if we always have the minimum value purely on the far left or the far right).
Notice, however, that all of the effort we're spending is being used to find the minimum value in each of the subarrays, which takes O(k) time for k elements. What if we could speed this up to O(1) time? In that case, our algorithm would do a lot less work. More specifically, it would do only O(n) work. The reason for this is the following: each time we make a recursive call, we do O(1) work to find the minimum element, then remove that element from the array and recursively process the remaining pieces. Each element can therefore be the minimum element of at most one of the recursive calls, and so the total number of recursive calls can't be any greater than the number of elements. This means that we make at most O(n) calls that each do O(1) work, which gives a total of O(n) work.
So how exactly do we get this magical speedup? This is where we get to use a surprisingly versatile and underappreciated data structure called the Cartesian tree. A Cartesian tree is a binary tree created out of a sequence of elements that has the following properties:
Each node is smaller than its children, and
An inorder walk of the Cartesian tree gives back the elements of the sequence in the order in which they appear.
For example, the sequence 4 6 7 1 5 0 2 8 3 has this Cartesian tree:
        0
       / \
      1   2
     / \   \
    4   5   3
     \     /
      6   8
       \
        7
And here's where we get the magic. We can immediately find the minimum element of the sequence by just looking at the root of the Cartesian tree - that takes only O(1) time. Once we've done that, when we make our recursive calls and look at all the elements to the left of or to the right of the minimum element, we're just recursively descending into the left and right subtrees of the root node, which means that we can read off the minimum elements of those subarrays in O(1) time each. Nifty!
The real beauty is that it is possible to construct a Cartesian tree for a sequence of n elements in O(n) time. This algorithm is detailed in this section of the Wikipedia article. This means that we can get a super fast algorithm for solving your original problem as follows:
Construct a Cartesian tree for the array.
Use the above recursive algorithm, but use the Cartesian tree to find the minimum element rather than doing a linear scan each time.
Overall, this takes O(n) time and uses O(n) space, which is a time improvement over the O(n^2) algorithm you had initially.
At the start of this discussion, I made the assumption that all array elements are distinct, but this isn't really necessary. You can still build a Cartesian tree for an array with non-distinct elements in it by changing the requirement that each node is smaller than its children to be that each node is no bigger than its children. This doesn't affect the correctness of the algorithm or its runtime; I'll leave that as the proverbial "exercise to the reader." :-)
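Here is one possible sketch of the two steps above, assuming non-negative values. The names build_cartesian_tree and best_reduced_sum are made up for illustration; the tree is built with the standard stack-based linear-time construction, and the recursion is replaced by an explicit work list so that long sorted inputs do not hit Python's recursion limit.

```python
def build_cartesian_tree(a):
    """Return (root, left, right) for the min-Cartesian tree of a.

    left[i] / right[i] are child indexes, or -1 when there is no child.
    """
    n = len(a)
    left = [-1] * n
    right = [-1] * n
    stack = []  # indexes on the tree's right spine, values non-decreasing
    for i in range(n):
        last = -1
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        left[i] = last              # everything popped hangs below i on the left
        if stack:
            right[stack[-1]] = i    # i becomes the right child of the new top
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def best_reduced_sum(a):
    """Maximum over subarrays of (subarray minimum) * (subarray length)."""
    root, left, right = build_cartesian_tree(a)
    best = 0
    work = [(root, 0, len(a))] if root != -1 else []
    while work:
        node, lo, hi = work.pop()   # node holds the minimum of a[lo:hi]
        best = max(best, a[node] * (hi - lo))
        if left[node] != -1:
            work.append((left[node], lo, node))
        if right[node] != -1:
            work.append((right[node], node + 1, hi))
    return best


print(best_reduced_sum([1, 5, 3, 5, 4, 1]))
```

For the array in the question this prints 12, from the subarray {5, 3, 5, 4}. Equal values are handled by popping only strictly larger elements, so each node ends up no bigger than its children.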
This was a cool problem! I hope this helps!
Assuming that the numbers are all non-negative, isn't this just the "maximize the rectangle area in a histogram" problem? which has now become famous...
O(n) solutions are possible. This site: http://blog.csdn.net/arbuckle/article/details/710988 has a bunch of neat solutions.
To elaborate what I am thinking (it might be incorrect) think of each number as histogram rectangle of width 1.
By "minimizing" a subarray [i,j] and adding up, you are basically getting the area of the rectangle in the histogram which spans from i to j.
This has appeared before on SO: Maximize the rectangular area under Histogram, you find code and explanation, and a link to the official solutions page (http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html).
The running time of the following algorithm, which I tried, is dominated by the algorithm used initially to sort the array. For example, if the initial array is sorted with binary tree sort, it will take O(n) in the best case and O(n log n) in the average case.
Gist of algorithm:
The array is sorted, and the sorted values are stored together with their corresponding old indices. A binary search tree is then built from those old indices. It is used to determine how far a subarray can extend forwards and backwards from the current element without encountering a value less than the current value, which gives the largest possible subarray for that element.
I will explain the method with the array in the question [1, 5, 3, 5, 4, 1]
                 1   5   3   5   4   1
                 -------------------------
array indices => 0   1   2   3   4   5
                 -------------------------
Sort this array, storing each value together with its original index, in ascending order of value. The result is as follows:
                          1   1   3   4   5   5
                          -------------------------
original array indices => 0   5   2   4   1   3
(referred as old_index)   -------------------------
It is important to keep a reference to both each value and its old index, like an associative array.
A few terms, to be clear:
old_index refers to the original index of an element (that is, its index in the original array);
current_index refers to the index of the element in the sorted array;
for example, for element 4, old_index is 4 and current_index is 3;
current_array_value refers to the current element value in the sorted array.
pre refers to the inorder predecessor; succ refers to the inorder successor.
Also, the min and max values can be read directly from the first and last elements of the sorted array; they are min_value and max_value respectively.
The algorithm, which is performed on the sorted array, is as follows.
Algorithm:
Proceed from the leftmost element of the sorted array.
For each element in turn, from left to right, apply this algorithm:
if(element == min_value){
    //the minimum value can span the whole array
    max_sum = element * array_length;
    if(max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
}else if(element == max_value){
    //here current_index is the index in the sorted array
    //note: this counts the remaining maximum values, so it is only exact
    //when they are adjacent in the original array
    max_sum = element * (array_length - current_index);
    if(max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
}else {
    //pseudo code steps to determine maximum possible sub array with the current element
    //pre is inorder predecessor and succ is inorder successor of old_index
    push old_index into the BST;
    get the inorder predecessor and successor of old_index from the BST;
    if(pre == NULL){
        max_sum = succ * current_array_value;
        if(max_sum > current_max)
            current_max = max_sum;
    }else if (succ == NULL){
        max_sum = ((array_length - pre) - 1) * current_array_value;
        if(max_sum > current_max)
            current_max = max_sum;
    }else {
        //find the maximum possible sub array streak from the values
        max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
        if(max_sum > current_max)
            current_max = max_sum;
    }
}
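The steps above can be sketched in Python roughly as follows. This is a simplified variant under a few assumptions of mine, not a literal transcription: a plain sorted list with the standard `bisect` module stands in for the BST, the sentinels -1 and n replace the NULL checks, and the special cases for min_value and max_value are dropped so that every element goes through the general predecessor/successor branch (which also copes with equal maximum values that are not adjacent). The function name is hypothetical.

```python
import bisect
from typing import List


def max_min_times_length_sorted(arr: List[int]) -> int:
    """Maximum of min(subarray) * len(subarray), processing values in ascending order."""
    n = len(arr)
    # Old indices, ordered by the value they hold (the "sorted array").
    order = sorted(range(n), key=lambda i: arr[i])
    seen: List[int] = []  # sorted old indices already processed (values <= current)
    best = 0
    for old_index in order:
        pos = bisect.bisect_left(seen, old_index)
        pre = seen[pos - 1] if pos > 0 else -1    # nearest processed index to the left
        succ = seen[pos] if pos < len(seen) else n  # nearest processed index to the right
        # The current value is the minimum of everything strictly between pre and succ.
        best = max(best, arr[old_index] * (succ - pre - 1))
        seen.insert(pos, old_index)
    return best
```

Note that inserting into a Python list costs O(n) in the worst case, so this sketch only illustrates the logic. To keep the whole thing at O(n log n), the list would have to be replaced by a balanced search tree, as in the description above.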
For example,
original array is
                 1   5   3   5   4   1
                 -------------------------
array indices => 0   1   2   3   4   5
                 -------------------------
and the sorted array is
                          1   1   3   4   5   5
                          -------------------------
original array indices => 0   5   2   4   1   3
(referred as old_index)   -------------------------
After first element:
max_sum = 6 [it will reduce to 1*6]
0
After second element:
max_sum = 6 [it will reduce to 1*6]
  0
   \
    5
After third element:
  0
   \
    5
   /
  2
inorder traversal results in: 0 2 5
applying the algorithm,
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
max_sum = [((5-2)-1) + ((2-0)-1) + 1] * 3
= 12
current_max = 12 [the maximum possible value]
After fourth element:
  0
   \
    5
   /
  2
   \
    4
inorder traversal results in: 0 2 4 5
applying the algorithm,
max_sum = 8 [which is discarded since it is less than 12]
After fifth element:
max_sum = 10 [reduces to 2 * 5, discarded since it is less than 12]
After last element:
max_sum = 5 [reduces to 1 * 5, discarded since it is less than 12]
The running time of this algorithm is determined by the algorithm used for the initial sort. For example, if the array is sorted with binary tree sort, it is O(n log n) in the average case.
The space complexity will be O(3n), which is O(n) [n for the sorted values, another n for the old indices, and another n for constructing the BST]. However, I'm not sure about this. Any feedback on the algorithm is appreciated.

Resources