Counting according to query - arrays

Given an array A of N positive elements, suppose we list all N × (N+1) / 2 non-empty contiguous subarrays of A and then replace each subarray with the maximum element present in it. We now have N × (N+1) / 2 elements, where each element is the maximum of its subarray.
We then have Q queries, where each query is one of 3 types:
1 K : Count the numbers strictly greater than K among those N × (N+1) / 2 elements.
2 K : Count the numbers strictly less than K among those N × (N+1) / 2 elements.
3 K : Count the numbers equal to K among those N × (N+1) / 2 elements.
The main problem I am facing is that N can be up to 10^6, so I can't generate all those N × (N+1) / 2 elements. Please help me solve this problem.
Example : Let N=3 and we have Q=2. Let array A be [1,2,3] then all sub arrays are :
[1] -> [1]
[2] -> [2]
[3] -> [3]
[1,2] -> [2]
[2,3] -> [3]
[1,2,3] -> [3]
So now we have [1,2,3,2,3,3]. As Q=2 so :
Query 1 : 3 3
It means we need to tell count of numbers equal to 3. So answer is 3 as there are 3 numbers equal to 3 in the generated array.
Query 2 : 1 4
It means we need to tell count of numbers greater than 4. So answer is 0 as no one is greater than 4 in generated array.
Both N and Q can be up to 10^6. How can this problem be solved, and which data structure is suitable for it?

I believe I have a solution in O(N + Q*log N) (More about time complexity). The trick is to do a lot of preparation with your array before even the first query arrives.
For each number, figure out where is the first number on left / right of this number that is strictly bigger.
Example: for array: 1, 8, 2, 3, 3, 5, 1 both 3's left block would be position of 8, right block would be the position of 5.
This can be determined in linear time. This is how: keep the previous maximums in a stack. When a new number arrives, remove maximums from the stack until you reach an element strictly bigger than the current one. Illustration:
In this example, the stack is: [15, 13, 11, 10, 7, 3] (you will of course keep the indexes, not the values; I just use values for better readability).
Now we read 8, 8 >= 3 so we remove 3 from stack and repeat. 8 >= 7, remove 7. 8 < 10, so we stop removing. We set 10 as 8's left block, and add 8 to the maximums stack.
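Also, whenever you remove a number from the stack (3 and 7 in this example), set the right block of the removed number to the current number. One problem though: the right block is then the next number bigger or equal, not strictly bigger. You can fix this by simply checking and relinking right blocks. (For the counting step, keeping this asymmetry is actually convenient: when equal values occur, a strict bound on one side and a non-strict bound on the other makes each subarray count towards exactly one of its maxima.)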
Compute how many times each number is the maximum of some subarray.
Since for each number you now know where the next bigger number on the left / right is, I trust you with finding the appropriate math formula for this.
Then, store the results in a hashmap: the key is the value of a number, and the value is how many times that number is the maximum of some subarray. For example, the record [4->12] would mean that number 4 is the maximum in 12 subarrays.
Lastly, extract all key-value pairs from the hashmap into an array, and sort that array by the keys. Finally, create a prefix sum for the values of that sorted array.
Handle a request
For request "exactly k", just binary search in your array; for more/less than k, binary search for key k and then use the prefix array.
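The steps above can be sketched in Python as follows. This is only a sketch under stated assumptions: the names build_counts and answer_query are made up for illustration, and the right block is deliberately left as "bigger or equal" (not relinked) so that equal values are not counted twice.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from itertools import accumulate

def build_counts(a):
    """Return (sorted distinct values, prefix sums of their subarray counts)."""
    n = len(a)
    left = [-1] * n    # index of nearest strictly bigger element on the left
    right = [n] * n    # index of nearest bigger-or-equal element on the right
    stack = []         # indexes whose values are strictly decreasing
    for i, x in enumerate(a):
        while stack and a[stack[-1]] <= x:
            right[stack.pop()] = i
        left[i] = stack[-1] if stack else -1
        stack.append(i)
    count = defaultdict(int)
    for i, x in enumerate(a):
        # a[i] is the maximum of (i - left) * (right - i) subarrays
        count[x] += (i - left[i]) * (right[i] - i)
    keys = sorted(count)
    prefix = [0] + list(accumulate(count[k] for k in keys))
    return keys, prefix

def answer_query(keys, prefix, kind, k):
    lo = bisect_left(keys, k)    # number of distinct values < k
    hi = bisect_right(keys, k)   # number of distinct values <= k
    if kind == 1:                # strictly greater than k
        return prefix[-1] - prefix[hi]
    if kind == 2:                # strictly less than k
        return prefix[lo]
    return prefix[hi] - prefix[lo]   # equal to k

keys, prefix = build_counts([1, 2, 3])
print(answer_query(keys, prefix, 3, 3))  # query "3 3" from the question
print(answer_query(keys, prefix, 1, 4))  # query "1 4" from the question
```

Sorting the distinct values costs O(N log N) in the worst case; each query is then two binary searches.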

This answer is an adaptation of this other answer I wrote earlier. The first part is exactly the same, but the others are specific for this question.
Here's an O(n log n + q log n) implementation using a simplified version of a segment tree.
Creating the segment tree: O(n)
In practice, what it does is to take an array, let's say:
A = [5,1,7,2,3,7,3,1]
And construct an array-backed tree that looks like this:
In the tree, the first number is the value and the second is the index where it appears in the array. Each node is the maximum of its two children. This tree is backed by an array (pretty much like a heap tree) where the children of the index i are in the indexes i*2+1 and i*2+2.
Then, for each element, it becomes easy to find the nearest greater elements (before and after each element).
To find the nearest greater element to the left, we go up in the tree searching for the first parent where the left node has value greater and the index lesser than the argument. The answer must be a child of this parent, then we go down in the tree looking for the rightmost node that satisfies the same condition.
Similarly, to find the nearest greater element to the right, we do the same, but looking for a right node with an index greater than the argument. And when going down, we look for the leftmost node that satisfies the condition.
Creating the cumulative frequency array: O(n log n)
From this structure, we can compute the frequency array, that tells how many times each element appears as maximum in the subarray list. We just have to count how many lesser elements are on the left and on the right of each element and multiply those values. For the example array ([1, 2, 3]), this would be:
[(1, 1), (2, 2), (3, 3)]
This means that 1 appears only once as maximum, 2 appears twice, etc.
But we need to answer range queries, so it's better to have a cumulative version of this array, that would look like:
[(1, 1), (2, 3), (3, 6)]
The (3, 6) means, for example, that there are 6 subarrays with maxima less than or equal to 3.
Answering q queries: O(q log n)
Then, to answer each query, you just have to make binary searches to find the value you want. Here query(F, k) returns the number of subarrays whose maximum is less than or equal to k. For example, if you need the exact number of 3s, you may want to do: query(F, 3) - query(F, 2). If you want to find those lesser than 3, you do: query(F, 2). If you want to find those greater than 3: query(F, float('inf')) - query(F, 3).
Implementation
I've implemented it in Python and it seems to work well.
import bisect
from collections import defaultdict

def make_tree(A):
    # Round the number of leaves up to a power of two.
    n = 1
    while n < len(A):
        n *= 2
    # Padding leaves compare below every real (value, index) pair.
    T = [(float('-inf'), -1)] * (2*n - 1)
    for i, x in enumerate(A):
        T[n-1+i] = (x, i)
    for i in reversed(range(n-1)):
        T[i] = max(T[i*2+1], T[i*2+2])
    return T

def print_tree(T):
    # Dump the tree in Graphviz dot format.
    print('digraph {')
    for i, x in enumerate(T):
        print('  ' + str(i) + '[label="' + str(x) + '"]')
        if i*2+2 < len(T):
            print('  ' + str(i) + '->' + str(i*2+1))
            print('  ' + str(i) + '->' + str(i*2+2))
    print('}')

def find_generic(T, i, fallback, check, first, second):
    j = len(T)//2 + i
    original = T[j]
    j = (j-1)//2
    # go up in the tree searching for a value that satisfies check
    while j > 0 and not check(T[second(j)], original):
        j = (j-1)//2
    # go down in the tree searching for the left/rightmost node that satisfies check
    while j*2+1 < len(T):
        if check(T[first(j)], original):
            j = first(j)
        elif check(T[second(j)], original):
            j = second(j)
        else:
            return fallback
    return j - len(T)//2

def find_left(T, i, fallback):
    return find_generic(T, i, fallback,
        lambda a, b: a[0] > b[0] and a[1] < b[1],   # value greater, index before
        lambda j: j*2+2,                            # rightmost first
        lambda j: j*2+1                             # leftmost second
    )

def find_right(T, i, fallback):
    return find_generic(T, i, fallback,
        lambda a, b: a[0] >= b[0] and a[1] > b[1],  # value greater or equal, index after
        lambda j: j*2+1,                            # leftmost first
        lambda j: j*2+2                             # rightmost second
    )

def make_frequency_array(A):
    T = make_tree(A)
    D = defaultdict(int)
    for i, x in enumerate(A):
        left = find_left(T, i, -1)
        right = find_right(T, i, len(A))
        D[x] += (i-left) * (right-i)
    F = sorted(D.items())
    # turn the per-value counts into cumulative counts
    for i in range(1, len(F)):
        F[i] = (F[i][0], F[i-1][1] + F[i][1])
    return F

def query(F, n):
    # number of subarrays whose maximum is <= n
    idx = bisect.bisect(F, (n, float('inf')))
    if idx == 0:
        return 0
    return F[idx-1][1]

F = make_frequency_array([1,2,3])
print(query(F, 3) - query(F, 2))             # query "3 3"
print(query(F, float('inf')) - query(F, 4))  # query "1 4"
print(query(F, float('inf')) - query(F, 1))  # query "1 1"
print(query(F, 2))                           # query "2 3"

Your problem can be divided into several steps:
For each element of the initial array, calculate the number of "subarrays" where the current element is the maximum. This will involve a bit of combinatorics. First, for each element you need to know the index of the previous and of the next element that is bigger than the current element. Then calculate the number of subarrays as (i - iprev) * (inext - i). Finding iprev and inext requires two traversals of the initial array: in forward and in backward order. For iprev you need to traverse your array left to right. During the traversal, maintain a BST that contains the biggest of the previous elements along with their indexes. For each element of the original array, find the minimal element in the BST that is bigger than the current one. Its index, stored as the value, will be iprev. Then remove from the BST all elements that are smaller than the current one. This operation should be O(log N), as you are removing whole subtrees. This step is required, as the current element you are about to add will "override" all elements that are less than it. Then add the current element to the BST with its index as the value. At each point in time, the BST will store the descending subsequence of previous elements in which each element is bigger than everything that follows it in the array (for previous elements {1,2,44,5,2,6,26,6} the BST will store {44,26,6}). The backward traversal to find inext is similar.
After the previous step you'll have pairs K→P, where K is the value of some element from the initial array and P is the number of subarrays where this element is the maximum. Now you need to group these pairs by K. This means calculating the sum of the P values of the elements with equal K. Be careful about the corner case where two equal elements share the same subarrays.
As Ritesh suggested: put all grouped K→P pairs into an array, sort it by K and calculate the cumulative sum of P for each element in one pass. In this case your queries will be binary searches in this sorted array. Each query will be performed in O(log(N)) time.

Create a sorted value-to-index map. For example,
[34,5,67,10,100] => {5:1, 10:3, 34:0, 67:2, 100:4}
Precalculate the queries in two passes over the value-to-index map:
Top to bottom - maintain an augmented tree of intervals. Each time an index is added,
split the appropriate interval and subtract the relevant segments from the total:
indexes    intervals     total sub-arrays with maximum greater than
4          (0,3)         67 => 15 - (4*5/2) = 5
2,4        (0,1)(3,3)    34 => 5 + (4*5/2) - 2*3/2 - 1 = 11
0,2,4      (1,1)(3,3)    10 => 11 + 2*3/2 - 1 = 13
3,0,2,4    (1,1)         5 => 13 + 1 = 14
Bottom to top - maintain an augmented tree of intervals. Each time an index is added,
adjust the appropriate interval and add the relevant segments to the total:
indexes    intervals     total sub-arrays with maximum less than
1          (1,1)         10 => 1*2/2 = 1
1,3        (1,1)(3,3)    34 => 1 + 1*2/2 = 2
0,1,3      (0,1)(3,3)    67 => 2 - 1 + 2*3/2 = 4
0,1,3,2    (0,3)         100 => 4 - 4 + 4*5/2 = 10
The third query can be pre-calculated along with the second:
indexes    intervals     total sub-arrays with maximum exactly
1          (1,1)         5 => 1
1,3        (3,3)         10 => 1
0,1,3      (0,1)         34 => 2
0,1,3,2    (0,3)         67 => 3 + 3 = 6
Insertion and deletion in augmented trees are of O(log n) time-complexity. Total precalculation time-complexity is O(n log n). Each query after that ought to be O(log n) time-complexity.

Related

Number of ways of partitioning an array

Given an array of n elements, a k-partitioning of the array would be to split the array in k contiguous subarrays such that the maximums of the subarrays are non-increasing. Namely max(subarray1) >= max(subarray2) >= ... >= max(subarrayK).
In how many ways can an array be partitioned into valid partitions like the ones mentioned before?
Note: k isn't given as input or anything, I merely used it to illustrate the general case. A partition could have any size from 1 to n; we just need to find all the valid ones.
Example, the array [3, 2, 1] can be partitioned in 4 ways, you can see them below:
The valid partitions: [[3, 2, 1]]; [[3], [2, 1]]; [[3, 2], [1]]; [[3], [2], [1]].
I've found a similar problem related to linear partitioning, but I couldn't find a way to adapt the thinking to this problem. I'm pretty sure this is dynamic programming, but I haven't been able to properly identify how to model the problem using a recurrence relation.
How would you solve this?
Call an element of the input a tail-max if it is at least as great as all elements that follow. For example, in the following input:
5 9 3 3 1 2
the following elements are tail-maxes:
5 9 3 3 1 2
  ^ ^ ^   ^
In a valid partition, every subarray must contain the next tail-max at or after the subarray's starting position; otherwise, the next tail-max will be the max of some later subarray, and the condition of non-increasing subarray maximums will be violated.
On the other hand, if every subarray contains the next tail-max at or after the subarray's starting position, then the partition must be valid, as the definition of a tail-max ensures that the maximum of a later subarray cannot be greater.
If we identify the tail-maxes of an array, for example
1 1 9 2 1 6 5 1
. . X . . X X X
where X means tail-max and . means not, then we can't place any subarray boundaries before the first tail-max, because if we do, the first subarray won't contain a tail-max. We can place at most one subarray boundary between a tail-max and the next; if we place more, we get a subarray that doesn't contain a tail-max. The last tail-max must be the last element of the input, so we can't place a subarray boundary after the last tail-max.
If there are m non-tail-max elements between a tail-max and the next, that gives us m+2 options: m+1 places to put an array boundary, or we can choose not to place a boundary between these elements. These factors are multiplicative.
We can make one pass from the end of the input to the start, identifying the lengths of the gaps between tail-maxes and multiplying together the appropriate factors to solve the problem in O(n) time:
def partitions(array):
    tailmax = None
    factor = 1
    result = 1
    for i in reversed(array):
        if tailmax is None:
            tailmax = i
            continue
        factor += 1
        if i >= tailmax:
            # i is a new tail-max.
            # Multiply the result by a factor indicating how many options we
            # have for placing a boundary between i and the old tail-max.
            tailmax = i
            result *= factor
            factor = 1
    return result
Update: Sorry, I misunderstood the problem. In this case, split the array into sub-arrays where every tail is the max element of its sub-array; then it will work in narrow cases. e.g. [2 4 5 9 6 8 3 1] would be split to [[2 4 5 9] 6 8 9 3 1] first. Then we can freely choose in the range 0 - 5 to decide whether the following elements are included. You can use an array to record the results of the DP. Our goal is res[0]. We already have res[0] = res[5] + res[6] + res[7] + res[8] + res[9] + res[10] in the above example, and res[10] = 1.
def getnum(array):
    res = [-1 for x in range(len(array))]
    res[0] = valueAt(array, res, 0)
    return res[0]

def valueAt(array, res, i):
    # find the first occurrence of the maximum of array[i:]
    m = array[i]
    idx = i
    for index in range(i, len(array), 1):
        if array[index] > m:
            idx = index
            m = array[index]
    # one partition with no further cut, plus one group per cut position after idx
    value = 1
    for index in range(idx + 1, len(array), 1):
        if res[index] == -1:
            res[index] = valueAt(array, res, index)
        value = value + res[index]
    return value
This is slower than the answer above: every call to valueAt scans the rest of the array, so the DP takes roughly O(n^2) time overall.
Old Answer: If no duplicate elements in an array is allowed, the following way would work:
Notice that the number of partitions does not depend on the values of the elements if there are no duplicates. Let N(n) denote that number for an array of n elements.
The largest element must be in the first sub-array; the other elements may or may not be in the first sub-array. Depending on whether they are in the first sub-array, the number of partitions for the remaining elements varies.
So,
N(n) = C(n-1, 1)N(n-1) + C(n-1, 2)N(n-2) + ... + C(n-1, n-1)N(0)
where C(n,k) is the binomial coefficient n! / (k! (n-k)!).
Then it can be solved by DP.
Hope this helps

Smallest "n" sums from n arrays

I was trying to do my friend's problem set from a few years ago to sharpen up my knowledge of data structures etc. I came across this problem, and I'm not really sure where to start. Hopefully someone can help me out!
We are given n unsorted arrays, each array has n elements. Ex.
3 1 2
7 6 9
4 9 12
Now, say we take one element from each array and add them up. Let's just call the sum of these elements an "n-sum".
I need to devise an algorithm that gives us the n smallest "n-sums" (duplicates are allowed).
In our above ex, the answer would be:
11, 12, 12
# 11 comes from: 1 (first array) + 6 (second array) + 4 (third array)
# 12 comes from: 2 (first array) + 6 (second array) + 4 (third array)
# 12 comes from: 1 (first array) + 7 (second array) + 4 (third array)
One of the suggestions given was to use a priority queue.
Thanks!
The time is at least O(n^2): you must visit all array elements, because if all elements were equal to 1000 except one in each row being 0, you would have to look at the n elements equal to 0, or you couldn't find the smallest sum.
Sort each row, taking O(n^2 log n) steps. In each row, subtract the first element from all elements in the row, so the first element in each row is 0; after you have found the smallest sums you can compensate for that. Your example turns into
3 1 2  -> 1 2 3  -> 0 1 2
7 6 9  -> 6 7 9  -> 0 1 3
4 9 12 -> 4 9 12 -> 0 5 8
Now finding all sums ≤ K can be done in m steps if there are m sums: In the first row, pick all values in turn as long as they are ≤ K. In the second row, pick all values in turn as long as the sum from two rows is ≤ K and so on. Since each row starts with 0, no time is wasted.
For example, sums ≤ 5 are: 0+0+0, 0+0+5, 0+1+0, 0+3+0, 1+0+0, 1+1+0, 1+3+0, 2+0+0, 2+1+0, 2+3+0. Many more than the three that we needed. If we stop after finding 3 sums ≤ 5, we know very quickly "there are at least 3 sums ≤ 5". We need to have an early stop, because in the general case there could be n^n possible sums.
If you pick K = "largest element in the second column", then you know there are at least n+1 sums with a value ≤ K, because you can pick all 0's, or all 0's except one value from the second column. In your example, K = 5 (we know that worked). Let X be the value where there are at least n sums ≤ X but fewer than n sums ≤ X - 1. We find X with binary search between 0 and K, and then we find the sums. Example:
K = 5 is known to be big enough. We try K = 2, and find 4 sums (actually we stop at 3 sums). Too many. We try K = 1, and there are three solutions 0+0+0, 0+1+0 and 1+0+0. We try K = 0, but only one solution.
This part goes very quickly, so we'd try to reduce the time used for sorting. We notice that in this case looking at the first two columns was enough. We can find the two smallest items in each row, and in this case that would be enough. If the two smallest items are not enough to determine the n smallest sums, find the third smallest item etc. where needed. For example, since the 2nd smallest item of the last row is 5, we wouldn't need the third item of that row, because even the 5 is not an element of a sum if K ≤ 4.
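For comparison, the priority-queue suggestion mentioned in the question can be sketched as below. This is a different technique from the binary search described above, and the function names are made up for illustration. It combines the rows one at a time and keeps only the n smallest partial sums after each step, which is enough because any of the n smallest full sums must extend one of the n smallest partial sums.

```python
import heapq

def merge_smallest(a, b, n):
    """n smallest values of x + y with x in a, y in b; a and b must be sorted."""
    # One heap entry per element of a, each paired with the smallest element of b.
    heap = [(a[i] + b[0], i, 0) for i in range(len(a))]
    heapq.heapify(heap)
    out = []
    while heap and len(out) < n:
        s, i, j = heapq.heappop(heap)
        out.append(s)
        if j + 1 < len(b):
            # Advance the popped entry to the next element of b.
            heapq.heappush(heap, (a[i] + b[j + 1], i, j + 1))
    return out

def n_smallest_sums(rows):
    n = len(rows)
    best = sorted(rows[0])
    for row in rows[1:]:
        best = merge_smallest(best, sorted(row), n)
    return best

print(n_smallest_sums([[3, 1, 2], [7, 6, 9], [4, 9, 12]]))  # [11, 12, 12]
```

Each merge does O(n) heap operations, so the whole thing is O(n^2 log n), the same order as sorting every row.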

Smallest number that cannot be formed from sum of numbers from array

I was asked this problem in an Amazon interview:
Given an array of positive integers, you have to find the smallest positive integer that cannot be formed from the sum of numbers from the array.
Example:
Array:[4 13 2 3 1]
result= 11 { Since 11 was smallest positive number which can not be formed from the given array elements }
What I did was:
sorted the array
calculated the prefix sum
Traverse the sum array and check whether the next element is at most 1 greater than the sum so far, i.e. A[j]<=(sum+1). If not, then the answer is sum+1
But this was an O(n log n) solution.
The interviewer was not satisfied with this and asked for a solution in less than O(n log n) time.
There's a beautiful algorithm for solving this problem in time O(n + Sort), where Sort is the amount of time required to sort the input array.
The idea behind the algorithm is to sort the array and then ask the following question: what is the smallest positive integer you cannot make using the first k elements of the array? You then scan forward through the array from left to right, updating your answer to this question, until you find the smallest number you can't make.
Here's how it works. Initially, the smallest number you can't make is 1. Then, going from left to right, do the following:
If the current number is bigger than the smallest number you can't make so far, then you know the smallest number you can't make - it's the one you've got recorded, and you're done.
Otherwise, the current number is less than or equal to the smallest number you can't make. Right now, you know the smallest number you can't make with the first k elements of the array (call it candidate) and are now looking at value A[k]. Every number from 0 to candidate - 1 can be made with the first k elements (0 by taking nothing), so by adding A[k] to each of them you can also make every number from A[k] to candidate + A[k] - 1. Since A[k] ≤ candidate, these two ranges together cover everything from 1 to candidate + A[k] - 1 with no gap. Therefore, set candidate to candidate + A[k] and increment k.
In pseudocode:
Sort(A)
candidate = 1
for i from 1 to length(A):
    if A[i] > candidate: return candidate
    else: candidate = candidate + A[i]
return candidate
Here's a test run on [4, 13, 2, 1, 3]. Sort the array to get [1, 2, 3, 4, 13]. Then, set candidate to 1. We then do the following:
A[1] = 1, candidate = 1:
A[1] ≤ candidate, so set candidate = candidate + A[1] = 2
A[2] = 2, candidate = 2:
A[2] ≤ candidate, so set candidate = candidate + A[2] = 4
A[3] = 3, candidate = 4:
A[3] ≤ candidate, so set candidate = candidate + A[3] = 7
A[4] = 4, candidate = 7:
A[4] ≤ candidate, so set candidate = candidate + A[4] = 11
A[5] = 13, candidate = 11:
A[5] > candidate, so return candidate (11).
So the answer is 11.
The runtime here is O(n + Sort) because outside of sorting, the runtime is O(n). You can clearly sort in O(n log n) time using heapsort, and if you know some upper bound on the numbers you can sort in time O(n log U) (where U is the maximum possible number) by using radix sort. If U is a fixed constant (say, 10^9), then radix sort runs in time O(n) and this entire algorithm then runs in time O(n) as well.
Hope this helps!
Use bitvectors to accomplish this in linear time.
Start with an empty bitvector b. Then for each element k in your array, do this:
b = b | b << k | 2^(k-1)
To be clear, the i'th element is set to 1 to represent the number i, and | k is setting the k-th element to 1.
After you finish processing the array, the index of the first zero in b is your answer (counting from the right, starting at 1).
b=0
process 4: b = b | b<<4 | 1000 = 1000
process 13: b = b | b<<13 | 1000000000000 = 10001000000001000
process 2: b = b | b<<2 | 10 = 1010101000000101010
process 3: b = b | b<<3 | 100 = 1011111101000101111110
process 1: b = b | b<<1 | 1 = 11111111111001111111111
First zero: position 11.
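A minimal Python sketch of the bitvector idea, using an arbitrary-precision int as the bitvector (the function name is made up for illustration). Note the assumption hidden in "linear time": each shift and OR costs time proportional to the length of the bitvector, so this is only linear when the total sum fits in a small, fixed number of machine words.

```python
def smallest_unformable(nums):
    b = 0  # bit (i - 1) of b is set when i can be formed as a sum
    for k in nums:
        # Every old sum s gives a new sum s + k; k itself is also a sum.
        b |= (b << k) | (1 << (k - 1))
    # Find the lowest zero bit, counting positions from 1.
    pos = 1
    while b & 1:
        b >>= 1
        pos += 1
    return pos

print(smallest_unformable([4, 13, 2, 3, 1]))  # 11, as in the trace above
```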
Consider all integers in the interval [2^i .. 2^(i+1) - 1], and suppose all integers below 2^i can be formed from sums of numbers from the given array. Also suppose that we already know C, which is the sum of all array numbers below 2^i. If C >= 2^(i+1) - 1, every number in this interval may be represented as a sum of given numbers. Otherwise we could check if the interval [2^i .. C + 1] contains any number from the given array. And if there is no such number, C + 1 is what we searched for.
Here is a sketch of an algorithm:
1. For each input number, determine to which interval it belongs, and update the corresponding sum: S[int_log(x)] += x.
2. Compute the prefix sum for array S: foreach i: C[i] = C[i-1] + S[i].
3. Filter array C to keep only entries with values lower than the next power of 2.
4. Scan the input array once more and notice which of the intervals [2^i .. C + 1] contain at least one input number: i = int_log(x) - 1; B[i] |= (x <= C[i] + 1).
5. Find the first interval that is not filtered out on step #3 and whose corresponding element of B[] is not set on step #4.
If it is not obvious why we can apply step 3, here is the proof. Choose any number between 2^i and C, then sequentially subtract from it all the numbers below 2^i in decreasing order. Eventually we get either some number less than the last subtracted number or zero. If the result is zero, just add together all the subtracted numbers and we have the representation of the chosen number. If the result is non-zero and less than the last subtracted number, this result is also less than 2^i, so it is "representable" and none of the subtracted numbers are used for its representation. When we add these subtracted numbers back, we have the representation of the chosen number. This also suggests that instead of filtering intervals one by one we could skip several intervals at once by jumping directly to int_log of C.
Time complexity is determined by function int_log(), which is integer logarithm or index of the highest set bit in the number. If our instruction set contains integer logarithm or any its equivalent (count leading zeros, or tricks with floating point numbers), then complexity is O(n). Otherwise we could use some bit hacking to implement int_log() in O(log log U) and obtain O(n * log log U) time complexity. (Here U is largest number in the array).
If step 1 (in addition to updating the sum) will also update minimum value in given range, step 4 is not needed anymore. We could just compare C[i] to Min[i+1]. This means we need only single pass over input array. Or we could apply this algorithm not to array but to a stream of numbers.
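The single-pass variant just described can be sketched in Python as below. This is an illustration under assumptions, not the exact procedure above: int.bit_length() stands in for int_log, the function name is made up, and empty intervals are simply skipped rather than filtered as in step 3.

```python
def smallest_missing_sum(nums):
    sums = {}  # interval index i -> sum of inputs in [2^i, 2^(i+1) - 1]
    mins = {}  # interval index i -> smallest input in that interval
    for x in nums:
        i = x.bit_length() - 1          # integer logarithm of x
        sums[i] = sums.get(i, 0) + x
        mins[i] = min(mins.get(i, x), x)
    c = 0  # sum of all inputs in the intervals handled so far
    for i in range(max(sums, default=-1) + 1):
        if i not in sums:
            continue                    # no input falls in this interval
        if mins[i] > c + 1:
            return c + 1                # nothing can fill the gap at c + 1
        c += sums[i]
    return c + 1

for example in ([4, 13, 2, 3, 1], [1, 2, 3, 9], [1, 1, 2, 9]):
    print(example, smallest_missing_sum(example))
```

The second loop runs over at most about log U intervals, so apart from the pass over the input it adds very little work.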
Several examples:
Input:       [ 4 13  2  3  1]    [ 1  2  3  9]    [ 1  1  2  9]
int_log:       2  3  1  1  0       0  1  1  3       0  0  1  3

int_log:       0  1  2  3          0  1  2  3       0  1  2  3
S:             1  5  4 13          1  5  0  9       2  2  0  9
C:             1  6 10 23          1  6  6 15       2  4  4 13
filtered(C):   n  n  n  n          n  n  n  n       n  n  n  n
number in
[2^i..C+1]:    2  4  -             2  -  -          2  -  -
C+1:                11                7                5
For multi-precision input numbers this approach needs O(n * log M) time and O(log M) space. Where M is largest number in the array. The same time is needed just to read all the numbers (and in the worst case we need every bit of them).
Still this result may be improved to O(n * log R) where R is the value found by this algorithm (actually, the output-sensitive variant of it). The only modification needed for this optimization is instead of processing whole numbers at once, process them digit-by-digit: first pass processes the low order bits of each number (like bits 0..63), second pass - next bits (like 64..127), etc. We could ignore all higher-order bits after result is found. Also this decreases space requirements to O(K) numbers, where K is number of bits in machine word.
If you sort the array, it will work for you. Counting sort could do it in O(n), but in a practically large scenario the range of values can be pretty high.
A comparison sort such as quicksort, O(n log n), will do the work for you:
def smallestPositiveInteger(array):
    candidate = 1
    n = len(array)
    array = sorted(array)
    for i in range(0, n):
        if array[i] <= candidate:
            # everything below candidate + array[i] can now be formed
            candidate += array[i]
        else:
            break
    return candidate

Maximizing a particular sum over all possible subarrays

Consider an array like this one below:
{1, 5, 3, 5, 4, 1}
When we choose a subarray, we reduce it to the lowest number in the subarray. For example, the subarray {5, 3, 5} becomes {3, 3, 3}. Now, the sum of the subarray is defined as the sum of the resultant subarray. For example, {5, 3, 5} the sum is 3 + 3 + 3 = 9. The task is to find the largest possible sum that can be made from any subarray. For the above array, the largest sum is 12, given by the subarray {5, 3, 5, 4}.
Is it possible to solve this problem in time better than O(n^2)?
I believe that I have an algorithm for this that runs in O(n) time. I'll first describe an unoptimized version of the algorithm, then give a fully optimized version.
For simplicity, let's initially assume that all values in the original array are distinct. This isn't true in general, but it gives a good starting point.
The key observation behind the algorithm is the following. Find the smallest element in the array, then split the array into three parts - all elements to the left of the minimum, the minimum element itself, and all elements to the right of the minimum. Schematically, this would look something like
+-----------------------+-----+-----------------------+
| left values | min | right values |
+-----------------------+-----+-----------------------+
Here's the key observation: if you take the subarray that gives the optimum value, one of the following must be true:
That array consists of all the values in the array, including the minimum value. This has total value min * n, where n is the number of elements.
That array does not include the minimum element. In that case, the subarray has to be purely to the left or purely to the right of the minimum value and cannot include the minimum value itself.
This gives a nice initial recursive algorithm for solving this problem:
If the sequence is empty, the answer is 0.
If the sequence is nonempty:
    Find the minimum value in the sequence.
    Return the maximum of the following:
        The best answer for the subarray to the left of the minimum.
        The best answer for the subarray to the right of the minimum.
        The number of elements times the minimum.
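As a sketch, the recursive steps above translate almost line for line into Python (best_sum is a made-up name; this is the unoptimized version that rescans for the minimum on every call):

```python
def best_sum(a, lo=0, hi=None):
    """Best value of min(subarray) * len(subarray) within a[lo:hi]."""
    if hi is None:
        hi = len(a)
    if lo >= hi:
        return 0                      # empty sequence
    # Linear scan for the position of the minimum of a[lo:hi].
    m = min(range(lo, hi), key=a.__getitem__)
    return max(
        best_sum(a, lo, m),           # best answer left of the minimum
        best_sum(a, m + 1, hi),       # best answer right of the minimum
        (hi - lo) * a[m],             # whole range times the minimum
    )

print(best_sum([1, 5, 3, 5, 4, 1]))  # 12, from the subarray {5, 3, 5, 4}
```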
So how efficient is this algorithm? Well, that really depends on where the minimum elements are. If you think about it, we do linear work to find the minimum, then divide the problem into two subproblems and recurse on each. This is the exact same recurrence you get when considering quicksort. This means that in the best case it will take Θ(n log n) time (if we always have the minimum element in the middle of each half), but in the worst case it will take Θ(n^2) time (if we always have the minimum value purely on the far left or the far right).
Notice, however, that all of the effort we're spending is being used to find the minimum value in each of the subarrays, which takes O(k) time for k elements. What if we could speed this up to O(1) time? In that case, our algorithm would do a lot less work. More specifically, it would do only O(n) work. The reason for this is the following: each time we make a recursive call, we do O(1) work to find the minimum element, then remove that element from the array and recursively process the remaining pieces. Each element can therefore be the minimum element of at most one of the recursive calls, and so the total number of recursive calls can't be any greater than the number of elements. This means that we make at most O(n) calls that each do O(1) work, which gives a total of O(n) work.
So how exactly do we get this magical speedup? This is where we get to use a surprisingly versatile and underappreciated data structure called the Cartesian tree. A Cartesian tree is a binary tree created out of a sequence of elements that has the following properties:
Each node is smaller than its children, and
An inorder walk of the Cartesian tree gives back the elements of the sequence in the order in which they appear.
For example, the sequence 4 6 7 1 5 0 2 8 3 has this Cartesian tree:
          0
        /   \
       1     2
      / \     \
     4   5     3
      \       /
       6     8
        \
         7
And here's where we get the magic. We can immediately find the minimum element of the sequence by just looking at the root of the Cartesian tree - that takes only O(1) time. Once we've done that, when we make our recursive calls and look at all the elements to the left of or to the right of the minimum element, we're just recursively descending into the left and right subtrees of the root node, which means that we can read off the minimum elements of those subarrays in O(1) time each. Nifty!
The real beauty is that it is possible to construct a Cartesian tree for a sequence of n elements in O(n) time. This algorithm is detailed in this section of the Wikipedia article. This means that we can get a super fast algorithm for solving your original problem as follows:
Construct a Cartesian tree for the array.
Use the above recursive algorithm, but use the Cartesian tree to find the minimum element rather than doing a linear scan each time.
Overall, this takes O(n) time and uses O(n) space, which is a time improvement over the O(n^2) algorithm you had initially.
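Here is a minimal sketch of those two steps in Python. It assumes the goal is to maximize (minimum of a subarray) × (length of that subarray), which is how the other answers in this thread read the problem. The function names are made up for illustration, and the recursion is replaced by an explicit stack so that long arrays don't hit Python's recursion limit.

```python
def build_cartesian_tree(a):
    """Build a min-rooted Cartesian tree in O(n).

    Returns (root, left, right), where left[i] / right[i] are the indices
    of the children of element i, or -1 if there is no child.
    """
    n = len(a)
    left = [-1] * n
    right = [-1] * n
    stack = []  # indices on the rightmost spine, values non-decreasing
    for i in range(n):
        last = -1
        # Spine nodes strictly larger than a[i] become i's left subtree.
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def best_min_times_length(a):
    """Maximum over all subarrays of min(subarray) * len(subarray)."""
    root, left, right = build_cartesian_tree(a)
    best = 0
    # Each entry is (node, lo, hi): node is the minimum of the range a[lo:hi].
    work = [(root, 0, len(a))] if a else []
    while work:
        node, lo, hi = work.pop()
        if node == -1:
            continue
        best = max(best, a[node] * (hi - lo))
        work.append((left[node], lo, node))
        work.append((right[node], node + 1, hi))
    return best


print(best_min_times_length([1, 5, 3, 5, 4, 1]))  # 12
```

For the sequence 4 6 7 1 5 0 2 8 3, `build_cartesian_tree` returns index 5 (the element 0) as the root.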
At the start of this discussion, I made the assumption that all array elements are distinct, but this isn't really necessary. You can still build a Cartesian tree for an array with non-distinct elements in it by changing the requirement that each node is smaller than its children to be that each node is no bigger than its children. This doesn't affect the correctness of the algorithm or its runtime; I'll leave that as the proverbial "exercise to the reader." :-)
This was a cool problem! I hope this helps!
Assuming that the numbers are all non-negative, isn't this just the "maximize the rectangle area in a histogram" problem, which has now become famous?
O(n) solutions are possible. This site: http://blog.csdn.net/arbuckle/article/details/710988 has a bunch of neat solutions.
To elaborate on what I am thinking (it might be incorrect): think of each number as a histogram rectangle of width 1.
By "minimizing" a subarray [i,j] and adding up, you are basically getting the area of the rectangle in the histogram which spans from i to j.
This has appeared before on SO: Maximize the rectangular area under Histogram, you find code and explanation, and a link to the official solutions page (http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html).
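For reference, here is a sketch of the usual stack-based O(n) approach to the histogram problem in Python. This is my own illustration, not code taken from the linked pages, and the function name is made up. It assumes all heights are non-negative.

```python
def largest_rectangle(heights):
    """Largest rectangle area in a histogram whose bars have width 1."""
    best = 0
    stack = []  # indices of bars with strictly increasing heights
    # A trailing bar of height 0 flushes everything left on the stack.
    for i, h in enumerate(list(heights) + [0]):
        while stack and heights[stack[-1]] >= h:
            top = stack.pop()
            # The popped bar extends left to just after the new stack top
            # and right to just before i.
            left = stack[-1] + 1 if stack else 0
            best = max(best, heights[top] * (i - left))
        stack.append(i)
    return best


print(largest_rectangle([1, 5, 3, 5, 4, 1]))  # 12
```

Each index is pushed and popped at most once, which is where the linear running time comes from.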
The following algorithm I tried has the same time complexity as the algorithm used to sort the array initially. For example, if the initial array is sorted with binary tree sort, it will have O(n) in the best case and O(n log n) as an average case.
Gist of algorithm:
The array is sorted, and the sorted values are stored together with their corresponding old indices. A binary search tree is built from those old indices and is used to determine how far one can go forwards and backwards from an element without encountering a value less than the current value, which gives the longest possible subarray for that element.
I will explain the method with the array in the question [1, 5, 3, 5, 4, 1]
                           1  5  3  5  4  1
                           -------------------------
array indices =>           0  1  2  3  4  5
                           -------------------------
This array is sorted. Store the value and their indices in ascending order, which will be as follows
                           1  1  3  4  5  5
                           -------------------------
original array indices =>  0  5  2  4  1  3
(referred as old_index)    -------------------------
It is important to have a reference to both the value and their old indices; like an associative array;
Few terms to be clear:
old_index refers to the corresponding original index of an element (that is, its index in the original array);
current_index refers to the index of the element in the sorted array;
for example, for element 4, old_index is 4 and current_index is 3;
current_array_value refers to the current element value in the sorted array.
pre refers to inorder predecessor; succ refers to inorder successor
Also, min and max values can be got directly, from first and last elements of the sorted array, which are min_value and max_value respectively;
Now, the algorithm is as follows which should be performed on sorted array.
Algorithm:
Proceed from the left most element.
For each element from the left of the sorted array, apply this algorithm
if(element == min_value){
    max_sum = element * array_length;
    if(max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
}else if(element == max_value){
    //here current_index is the index in the sorted array
    max_sum = element * (array_length - current_index);
    if(max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
}else {
    //pseudo code steps to determine maximum possible sub array with the current element
    //pre is inorder predecessor and succ is inorder successor
    push old_index into the BST;
    get the inorder predecessor and successor of old_index from the BST;
    if(pre == NULL){
        max_sum = succ * current_array_value;
        if(max_sum > current_max)
            current_max = max_sum;
    }else if (succ == NULL){
        max_sum = ((array_length - pre) - 1) * current_array_value;
        if(max_sum > current_max)
            current_max = max_sum;
    }else {
        //find the maximum possible sub array streak from the values
        max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
        if(max_sum > current_max)
            current_max = max_sum;
    }
}
For example,
original array is
                           1  5  3  5  4  1
                           -------------------------
array indices =>           0  1  2  3  4  5
                           -------------------------
and the sorted array is
                           1  1  3  4  5  5
                           -------------------------
original array indices =>  0  5  2  4  1  3
(referred as old_index)    -------------------------
After first element:
max_sum = 6 [it will reduce to 1*6]
0
After second element:
max_sum = 6 [it will reduce to 1*6]
  0
   \
    5
After third element:
  0
   \
    5
   /
  2
inorder traversal results in: 0 2 5
applying the algorithm,
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
max_sum = [((5-2)-1) + ((2-0)-1) + 1] * 3
= 12
current_max = 12 [the maximum possible value]
After fourth element:
  0
   \
    5
   /
  2
   \
    4
inorder traversal results in: 0 2 4 5
applying the algorithm,
max_sum = 8 [which is discarded since it is less than 12]
After fifth element:
max_sum = 10 [reduces to 2 * 5, discarded since it is less than 12]
After last element:
max_sum = 5 [reduces to 1 * 5, discarded since it is less than 12]
This algorithm has the same time complexity as the algorithm used to sort the array initially. For example, if the initial array is sorted with binary tree sort, it will have O(n) in the best case and O(n log n) as an average case.
The space complexity will be O(3n) [O(n + n + n), n for sorted values, another n for old indices, and another n for constructing the BST]. However, I'm not sure about this. Any feedback on the algorithm is appreciated.
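To make the idea concrete, here is a rough Python sketch of the same approach. It is a simplification under a few assumptions: a sorted Python list with `bisect` stands in for the BST, the special cases for min_value and max_value are dropped in favour of one predecessor/successor formula, and the function name is made up. Inserting into a Python list is linear, so this only illustrates the logic; a balanced BST would be needed to actually stay within O(n log n).

```python
from bisect import bisect_left, insort


def max_min_times_length(a):
    """Process elements in ascending order of value. For each one, the
    nearest already-processed indices on either side bound the widest
    subarray in which it can serve as the minimum."""
    n = len(a)
    # Original indices ordered by value (the "old_index" row above).
    order = sorted(range(n), key=lambda i: a[i])
    seen = []  # sorted old indices of elements processed so far
    best = 0
    for old_index in order:
        pos = bisect_left(seen, old_index)
        pre = seen[pos - 1] if pos > 0 else -1       # inorder predecessor
        succ = seen[pos] if pos < len(seen) else n   # inorder successor
        best = max(best, a[old_index] * (succ - pre - 1))
        insort(seen, old_index)
    return best


print(max_min_times_length([1, 5, 3, 5, 4, 1]))  # 12
```

On the example array this visits the old indices in the order 0, 5, 2, 4, 1, 3 and finds 12 at the third element, the same as the walkthrough above.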

Highest Percentage Increase

Let's say we have the following set of numbers representing values over time
1 2 3 10 1 20 40 60
Now I am looking for an algorithm to find the highest percentage increase from one time to another. In the above case, the answer would be the pair (1, 60), which has a 5900% increase.
So far, the best algorithm I can think of is a brute-force method. We consider all possible pairs using a series of iterations:
1st Iteration:
1-2 1-3 1-10 .. 1-60
2nd Iteration
2-3 2-10 2-1 ... 2-60
(etc.)
This has complexity O(n^3).
I've also been thinking about another approach. Find all the strictly increasing sequences, and determine only the percentage increase in those strictly increasing sequences.
Does any other idea strike you guys? Please do correct me if my ideas are wrong!
I may have misunderstood the problem, but it seems that all you want is the largest and smallest numbers, since those are the two numbers that matter.
while true:
    indexOfMax = index of max(list)
    indexOfMin = index of min(list)
    list.remove(indexOfMax)
    list.remove(indexOfMin)
    if (indexOfMax < indexOfMin)
        continue
    else if (indexOfMax == indexOfMin)
        return -1
    else
        SUCCESS
As I understand (you didn't correct me in your comment), you want to maximize a[i]/a[j] for all j <= i. If that's correct, then for each i we only need to know smallest value before it.
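// INFINITY comes from <math.h>; min and max are assumed to be available.
double current_min = INFINITY;  // smallest value seen so far
double max_increase = 0;        // best ratio a[i] / a[j] with j <= i
for (int i = 0; i < n; ++i) {
    current_min = min(current_min, a[i]);
    max_increase = max(max_increase, a[i] / current_min);
}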
So you just want to compare each number pair-wise and see which pair has the highest ratio from the second number to the first number? Just iterating with two loops (one with i=0 to n, and an inner loop with j=i+1 to n) is going to give you O(n^2). I guess this is actually your original solution, but you incorrectly said the complexity was O(n^3). It's n^2.
You could get to O(n log n), though. Take your list, make it into a list where each element is a pair of (value, index). Then sort it by value, the first element of the pair. Then have two pointers into the list, one coming from the left (0 to n-1), and the other coming from the right (n-1 to 0). Find the first pair of elements such that the left element's original index is less than the right element's original index. Done.
1 2 3 10 1 20 40 60
becomes
(1,0) (2,1) (3,2) (10,3) (1,4) (20,5) (40,6) (60,7)
becomes
(1,0) (1,4) (2,1) (3,2) (10,3) (20,5) (40,6) (60,7)
So your answer is 60/1, from index 0 to index 7.
If this isn't what you're looking for, it would help if you said what the right answer was for your example numbers.
If I understand your problem correctly, you are looking for two indices (i, j) in the array with i < j that has the highest ratio A[j]/A[i]. If so, then you can reduce it to this related problem, which asks you to find the indices (i, j) with i ≤ j such that A[j] - A[i] is as large as possible. That problem has a very fast O(n)-time, O(1)-space algorithm that can be adapted to this problem as well. The intuition is to solve the problem for the array consisting of just the first element of your array, then for the first two elements, then the first three, etc. Once you've solved the problem for the first n elements of the array, you have an overall solution to the problem.
Let's think about how to do this. Initially, when you consider just the first element of the array, the best percentage increase you can get is 0% by comparing the element with itself. Now, suppose (inductively) that you've solved the problem for the first k array elements and want to see what happens when you look at the next array element. When this happens, one of two conditions holds. First, the maximum percentage increase over the first k elements might also be the maximum percentage increase over the first (k + 1) elements as well. For example, if the (k+1)st array element is an extremely small number, then chances are you can't get a large percentage increase from something in the first k elements to that value. Second, the maximum percentage increase might be from one of the first k elements to the (k + 1)st element. If this is the case, the highest percentage increase would be from the smallest of the first k elements to the (k + 1)st element.
Combining these two cases, we get that the best percentage increase over the first k + 1 elements is the maximum of
The highest percentage increase of the first k elements, or
The percentage increase from the smallest of the first k elements to the (k + 1)st element.
You can implement this by iterating across the array elements keeping track of two values - the minimum value you've seen so far and the pair that maximizes the percent increase. As an example, for your original example array of
1 2 3 10 1 20 40 60
The algorithm would work like this:
      1        2        3        10       1        20       40       60
min   1        1        1        1        1        1        1        1
best  (1,1)    (1,2)    (1,3)    (1,10)   (1,10)   (1,20)   (1,40)   (1,60)
and you'd output (1, 60) as the highest percentage increase. On a different array, like this one:
3 1 4 2 5
then the algorithm would trace out like this:
      3        1        4        2        5
min   3        1        1        1        1
best  (3,3)    (3,3)    (1,4)    (1,4)    (1,5)
and you'd output (1, 5).
This whole algorithm uses only O(1) space and runs in O(n) time, which is an extremely good solution to the problem.
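A minimal Python sketch of that scan might look like the following. The function name is made up for illustration, and it assumes all values are strictly positive so that the ratios are well defined. Ratios are compared by cross-multiplication to avoid division.

```python
def highest_percentage_increase(values):
    """Return the pair (low, high), with low appearing no later than high,
    that maximizes high / low. Assumes all values are positive."""
    if not values:
        return None
    lowest = values[0]               # minimum value seen so far
    best = (values[0], values[0])    # best (low, high) pair so far
    for v in values[1:]:
        # v / lowest > best_high / best_low, rewritten without division.
        if v * best[0] > best[1] * lowest:
            best = (lowest, v)
        lowest = min(lowest, v)
    return best


print(highest_percentage_increase([1, 2, 3, 10, 1, 20, 40, 60]))  # (1, 60)
print(highest_percentage_increase([3, 1, 4, 2, 5]))               # (1, 5)
```

The two calls reproduce the final entries of the "best" rows in the traces above.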
Alternatively, you can think about reducing this problem directly to the maximum single-sell profit problem by taking the logarithm of all of the values in your array. In that case, if you find a pair of values where log A[j] - log A[i] is maximized, this is equivalent (using properties of logarithms) to finding a pair of values where log (A[j] / A[i]) is maximized. Since the log function is monotonically increasing, this means that you have found a pair of values where A[j] / A[i] is maximized, as intended.
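As a rough illustration of the logarithm reduction, here is a Python sketch. It takes logs, runs the usual single-pass maximum-difference scan, and exponentiates the result to get back the ratio. The function name is made up, and the values are assumed to be strictly positive since the logarithm is undefined otherwise.

```python
import math


def max_ratio_via_logs(values):
    """Largest A[j] / A[i] with i <= j, computed as exp of the largest
    log A[j] - log A[i]. Assumes a non-empty list of positive values."""
    logs = [math.log(v) for v in values]
    lowest = logs[0]   # smallest log seen so far
    best_diff = 0.0    # comparing an element with itself gives 0
    for x in logs[1:]:
        best_diff = max(best_diff, x - lowest)
        lowest = min(lowest, x)
    return math.exp(best_diff)
```

Because this goes through floating point, the result is only approximately equal to the exact ratio, so working directly with ratios is preferable when exact answers matter.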

Resources