Number of ways of partitioning an array - arrays

Given an array of n elements, a k-partitioning of the array would be to split the array in k contiguous subarrays such that the maximums of the subarrays are non-increasing. Namely max(subarray1) >= max(subarray2) >= ... >= max(subarrayK).
In how many ways can an array be partitioned into valid partitions like the ones mentioned before?
Note: k isn't given as input or anything, I mereley used it to illustrate the general case. A partition could have any size from 1 to n, we just need to find all the valid ones.
Example, the array [3, 2, 1] can be partitioned in 4 ways, you can see them below:
The valid partitions :[3, 2, 1]; [3, [2, 1]]; [[3, 2], 1]; [[3], [2], [1]].
I've found a similar problem related to linear partitioning, but I couldn't find a way to adapt the thinking to this problem. I'm pretty sure this is dynamic programming, but I haven't been able to properly identify
how to model the problem using a recurrence relation.
How would you solve this?

Call an element of the input a tail-max if it is at least as great as all elements that follow. For example, in the following input:
5 9 3 3 1 2
the following elements are tail-maxes:
5 9 3 3 1 2
^ ^ ^ ^
In a valid partition, every subarray must contain the next tail-max at or after the subarray's starting position; otherwise, the next tail-max will be the max of some later subarray, and the condition of non-increasing subarray maximums will be violated.
On the other hand, if every subarray contains the next tail-max at or after the subarray's starting position, then the partition must be valid, as the definition of a tail-max ensures that the maximum of a later subarray cannot be greater.
If we identify the tail-maxes of an array, for example
1 1 9 2 1 6 5 1
. . X . . X X X
where X means tail-max and . means not, then we can't place any subarray boundaries before the first tail-max, because if we do, the first subarray won't contain a tail-max. We can place at most one subarray boundary between a tail-max and the next; if we place more, we get a subarray that doesn't contain a tail-max. The last tail-max must be the last element of the input, so we can't place a subarray boundary after the last tail-max.
If there are m non-tail-max elements between a tail-max and the next, that gives us m+2 options: m+1 places to put an array boundary, or we can choose not to place a boundary between these elements. These factors are multiplicative.
We can make one pass from the end of the input to the start, identifying the lengths of the gaps between tail-maxes and multiplying together the appropriate factors to solve the problem in O(n) time:
def partitions(array):
tailmax = None
factor = 1
result = 1
for i in reversed(array):
if tailmax is None:
tailmax = i
continue
factor += 1
if i >= tailmax:
# i is a new tail-max.
# Multiply the result by a factor indicating how many options we
# have for placing a boundary between i and the old tail-max.
tailmax = i
result *= factor
factor = 1
return result

Update: Sorry I misunderstanding the problem. In this case, split the arrays to sub-arrays where every tails is the max element in the array, then it will work in narrow cases. e.g. [2 4 5 9 6 8 3 1] would be split to [[2 4 5 9] 6 8 9 3 1] first. Then we can freely chose range 0 - 5 to decide whether following are included. You can use an array to record the result of DP. Our goal is res[0]. We already have res[0] = res[5] + res[6] + res[7] + res[8] + res[9] + res[10] in above example and res[10] = 1
def getnum(array):
res = [-1 for x in range(len(array))]
res[0] = valueAt(array, res, 0)
return res[0]
def valueAt(array, res, i):
m = array[i]
idx = i
for index in range(i, len(array), 1):
if array[index] > m:
idx = index
m = array[index]
value = 1;
for index in range(idx + 1, len(array), 1):
if res[index] == -1:
res[index] = valueAt(array, res, index)
value = value + res[index]
return value;
Worse than the answer above in time consuming. DP always costs a lot.
Old Answer: If no duplicate elements in an array is allowed, the following way would work:
Notice that the number of sub-arrays is not depends on the values of elements if no duplicate. We can remark the number is N(n) if there is n elements in array.
The largest element must be in the first sub-arrays, other elements can be in or not in the first sub-array. Depends on whether they are in the first sub-array, the number of partitions for the remaining elements varies.
So,
N(n) = C(n-1, 1)N(n-1) + C(n-1, 2)N(n-2) + ... + C(n-1, n-1)N(0)
where C(n,k) means:
Then it can be solved by DP.
Hope this helps

Related

Convert sorted array into low high array

Interview question:
Given a sorted array of this form :
1,2,3,4,5,6,7,8,9
( A better example would be 10,20,35,42,51,66,71,84,99 but let's use above one)
Convert it to the following low high form without using extra memory or a standard library
1,9,2,8,3,7,4,6,5
A low-high form means that we use the smallest followed by highest. Then we use the second smallest and second-highest.
Initially, when he asked, I had used a secondary array and used the 2 pointer approach. I kept one pointer in front and the second pointer at last . then one by one I copied left and right data to my new array and then moved left as left ++ and right as --right till they cross or become same.
After this, he asked me to do it without memory.
My approach to solving it without memory was on following lines . But it was confusing and not working
1) swap 2nd and last in **odd** (pos index 1)
1,2,3,4,5,6,7,8,9 becomes
1,9,3,4,5,6,7,8,2
then we reach even
2) swap 3rd and last in **even** (pos index 2 we are at 3 )
1,9,3,4,5,6,7,8,2 becomes (swapped 3 and 2_ )
1,9,2,4,5,6,7,8,3
and then sawp 8 and 3
1,9,2,4,5,6,7,8,3 becomes
1,9,2,4,5,6,7,3,8
3) we reach in odd (pos index 3 we are at 4 )
1,9,2,4,5,6,7,3,8
becomes
1,9,2,8,5,6,7,3,4
4) swap even 5 to last
and here it becomes wrong
Let me start by pointing out that even registers are a kind of memory. Without any 'extra' memory (other than that occupied by the sorted array, that is) we don't even have counters! That said, here goes:
Let a be an array of n > 2 positive integers sorted in ascending order, with the positions indexed from 0 to n-1.
From i = 1 to n-2, bubble-sort the sub-array ranging from position i to position n-1 (inclusive), alternatively in descending and ascending order. (Meaning that you bubble-sort in descending order if i is odd and in ascending order if it is even.)
Since to bubble-sort you only need to compare, and possibly swap, adjacent elements, you won't need 'extra' memory.
(Mind you, if you start at i = 0 and first sort in ascending order, you don't even need a to be pre-sorted.)
And one more thing: as there was no talk of it in your question, I will keep very silent on the performance of the above algorithm...
We will make n/2 passes and during each pass we will swap each element, from left to right, starting with the element at position 2k-1, with the last element. Example:
pass 1
V
1,2,3,4,5,6,7,8,9
1,9,3,4,5,6,7,8,2
1,9,2,4,5,6,7,8,3
1,9,2,3,5,6,7,8,4
1,9,2,3,4,6,7,8,5
1,9,2,3,4,5,7,8,6
1,9,2,3,4,5,6,8,7
1,9,2,3,4,5,6,7,8
pass 2
V
1,9,2,3,4,5,6,7,8
1,9,2,8,4,5,6,7,3
1,9,2,8,3,5,6,7,4
1,9,2,8,3,4,6,7,5
1,9,2,8,3,4,5,7,6
1,9,2,8,3,4,5,6,7
pass 3
V
1,9,2,8,3,4,5,6,7
1,9,2,8,3,7,5,6,4
1,9,2,8,3,7,4,6,5
1,9,2,8,3,7,4,5,6
pass 4
V
1,9,2,8,3,7,4,5,6
1,9,2,8,3,7,4,6,5
This should take O(n^2) swaps and uses no extra memory beyond the counters involved.
The loop invariant to prove is that the first 2k+1 positions are correct after iteration k of the loop.
Alright, assuming that with constant space complexity, we need to lose some of our time complexity, the following algorithm possibly works in O(n^2) time complexity.
I wrote this in python. I wrote it as quickly as possible so apologies for any syntactical errors.
# s is the array passed.
def hi_low(s):
last = len(s)
for i in range(0, last, 2):
if s[i+1] == None:
break
index_to_swap = last
index_to_be_swapped = i+1
while s[index_to_be_swapped] != s[index_to_swap]:
# write your own swap func here
swap(s[index_to_swap], s[index_to_swap-1])
index_to_swap -=1
return s
Quick explanation:
Suppose the initial list given to us is:
1 2 3 4 5 6 7 8 9
So in our program, initially,
index_to_swap = last
meaning that it is pointing to 9, and
index_to_be_swapped = i+1
is i+1, i.e one step ahead of our current loop pointer. [Also remember we're looping with a difference of 2].
So initially,
i = 0
index_to_be_swapped = 1
index_to_swap = 9
and in the inner loop what we're checking is: until the values in both of these indexes are same, we keep on swapping
swap(s[index_to_swap], s[index_to_swap-1])
so it'll look like:
# initially:
1 2 3 4 5 6 7 8 9
^ ^---index_to_swap
^-----index_to_be_swapped
# after 1 loop
1 2 3 4 5 6 7 9 8
^ ^-----index_to_swap
^----- index_to_be_swapped
... goes on until
1 9 2 3 4 5 6 7 8
^-----index_to_swap
^-----index_to_be_swapped
Now, the inner loop's job is done, and the main loop is run again with
1 9 2 3 4 5 6 7 8
^ ^---- index_to_swap
^------index_to_be_swapped
This runs until it's behind 2.
So the outer loop runs for almost n\2 times, and for each outer loop the inner loop runs for almost n\2 times in the worst case so the time complexity if n/2*n/2 = n^2/4 which is the order of n^2 i.e O(n^2).
If there are any mistakes please feel free to point it out.
Hope this helps!
It will work for any sorted array
let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9];
let i = arr[0];
let j = arr[arr.length - 1];
let k = 0;
while(k < arr.length) {
arr[k] = i;
if(arr[k+1]) arr[k+1] = j;
i++;
k += 2;
j--;
}
console.log(arr);
Explanation: Because its a sorted array, you need to know 3 things to produce your expected output.
Starting Value : let i = arr[0]
Ending Value(You can also find it with the length of array by the way): let j = arr[arr.length -1]
Length of Array: arr.length
Loop through the array and set the value like this
arr[firstIndex] = firstValue, arr[thirdIndex] = firstValue + 1 and so on..
arr[secondIndex] = lastValue, arr[fourthIndex] = lastValue - 1 and so on..
Obviously you can do the same things in a different way. But i think that's the simplest way.

Counting appropriate number of subarrays in an array excluding some specific pairs?

Let's say, I have an array like this:
1 2 3 4 5
And given pair is (2,3), then number of possible subarrays that don't have (2,3) present in them will be,,
1. 1
2. 2
3. 3
4. 4
5. 5
6. 1 2
7. 3 4
8. 4 5
9. 3 4 5
So, the answer will be 9.
Obviously, there can be more of such pairs.
Now, one method that I thought of is of O(n^2) which involves finding all such elements of maximum length n. Can I do better? Thanks!
Let's see, this adhoc pseudocode should be O(n):
array = [1 2 3 4 5]
pair = [2 3]
length = array.length
n = 0
start = 0
while (start < length)
{
# Find next pair
pair_pos = start
while (pair_pos < length) and (array[pair_pos,pair_pos+1] != pair) # (**1)
{
pair_pos++
}
# Count subarrays
n += calc_number_of_subarrays(pair_pos-start) # (**2)
# Continue after the pair
start = pair_pos+2
}
print n
Note **1: This seems to involve a loop inside the outer loop. Since every element of the array is visited exactly once, both loops together are O(n). In fact, it is probably easy to refactor this to use only one while loop.
Note **2: Given an array of length l, there are l+(l-1)+(l-2)+...+1 subarrays (including the array itself). Which is easy to calculate in O(1), there is no loop involved. c/f Euler. :)
You don't need to find which subarrays are in an array to know how many of them there are. Finding where the pair is in the array is at most 2(n-1) array operations. Then you only need to do a simple calculation with the two lengths you extract from that. The amount of subarrays in an array of length 3 is, for example, 3 + 2 + 1 = 6 = (n(n+1))/2.
The solution uses that in a given array [a, ..., p1, p2, ..., b], the amount of subarrays without the pair is the amount of subarrays for [a, ..., p1] + the amount of subarrays for [p2, ..., b]. If multiple of such pairs exist, we repeat the same trick on [p2, ..., b] as if it was the whole array.
function amount_of_subarrays ::
index := 1
amount := 0
lastmatch := 0
while length( array ) > index do
if array[index] == pair[1] then
if array[index+1] == pair[2] then
length2 := index - lastmatch
amount := amount + ((length2 * (length2 + 1)) / 2)
lastmatch := index
fi
fi
index := index + 1
od
//index is now equal to the length
length2 := index - lastmatch
amount := amount + ((length2 * (length2 + 1)) / 2)
return amount
For an array [1, 2, 3, 4, 5] with pair [2, 3], index will be 2 when the two if-statements are true. amount will be updated to 3 and lastmatch will be updated to 2. No more matches will be found, so lastmatch is 2 and index is 5. amount will be 3 + 6 = 9.

Counting according to query

Given an array of N positive elements. Lets suppose we list all N × (N+1) / 2 non-empty continuous subarrays of the array A and then replaced all the subarrays with the maximum element present in the respective subarray. So now we have N × (N+1) / 2 elements where each element is maximum among its subarray.
Now we are having Q queries, where each query is one of 3 types :
1 K : We need to count of numbers strictly greater than K among those N × (N+1) / 2 elements.
2 K : We need to count of numbers strictly less than K among those N × (N+1) / 2 elements.
3 K : We need to count of numbers equal to K among those N × (N+1) / 2 elements.
Now main problem am facing is N can be upto 10^6. So i can't generate all those N × (N+1) / 2 elements. Please help to solve this porblem.
Example : Let N=3 and we have Q=2. Let array A be [1,2,3] then all sub arrays are :
[1] -> [1]
[2] -> [2]
[3] -> [3]
[1,2] -> [2]
[2,3] -> [3]
[1,2,3] -> [3]
So now we have [1,2,3,2,3,3]. As Q=2 so :
Query 1 : 3 3
It means we need to tell count of numbers equal to 3. So answer is 3 as there are 3 numbers equal to 3 in the generated array.
Query 2 : 1 4
It means we need to tell count of numbers greater than 4. So answer is 0 as no one is greater than 4 in generated array.
Now both N and Q can be up to 10^6. So how to solve this problem. Which data structure should be suitable to solve it.
I believe I have a solution in O(N + Q*log N) (More about time complexity). The trick is to do a lot of preparation with your array before even the first query arrives.
For each number, figure out where is the first number on left / right of this number that is strictly bigger.
Example: for array: 1, 8, 2, 3, 3, 5, 1 both 3's left block would be position of 8, right block would be the position of 5.
This can be determined in linear time. This is how: Keep a stack of previous maximums in a stack. If a new maximum appears, remove maximums from the stack until you get to a element bigger than or equal to the current one. Illustration:
In this example, in the stack is: [15, 13, 11, 10, 7, 3] (you will of course keep the indexes, not the values, I will just use value for better readability).
Now we read 8, 8 >= 3 so we remove 3 from stack and repeat. 8 >= 7, remove 7. 8 < 10, so we stop removing. We set 10 as 8's left block, and add 8 to the maximums stack.
Also, whenever you remove from the stack (3 and 7 in this example), set the right block of removed number to the current number. One problem though: right block would be set to the next number bigger or equal, not strictly bigger. You can fix this with simply checking and relinking right blocks.
Compute what number is how many times a maximum of some subsequence.
Since for each number you now know where is the next left / right bigger number, I trust you with finding appropriate math formula for this.
Then, store the results in a hashmap, key would be a value of a number, and value would be how many times is that number a maximum of some subsequence. For example, record [4->12] would mean that number 4 is the maximum in 12 subsequences.
Lastly, extract all key-value pairs from the hashmap into an array, and sort that array by the keys. Finally, create a prefix sum for the values of that sorted array.
Handle a request
For request "exactly k", just binary search in your array, for more/less thank``, binary search for key k and then use the prefix array.
This answer is an adaptation of this other answer I wrote earlier. The first part is exactly the same, but the others are specific for this question.
Here's an implemented a O(n log n + q log n) version using a simplified version of a segment tree.
Creating the segment tree: O(n)
In practice, what it does is to take an array, let's say:
A = [5,1,7,2,3,7,3,1]
And construct an array-backed tree that looks like this:
In the tree, the first number is the value and the second is the index where it appears in the array. Each node is the maximum of its two children. This tree is backed by an array (pretty much like a heap tree) where the children of the index i are in the indexes i*2+1 and i*2+2.
Then, for each element, it becomes easy to find the nearest greater elements (before and after each element).
To find the nearest greater element to the left, we go up in the tree searching for the first parent where the left node has value greater and the index lesser than the argument. The answer must be a child of this parent, then we go down in the tree looking for the rightmost node that satisfies the same condition.
Similarly, to find the nearest greater element to the right, we do the same, but looking for a right node with an index greater than the argument. And when going down, we look for the leftmost node that satisfies the condition.
Creating the cumulative frequency array: O(n log n)
From this structure, we can compute the frequency array, that tells how many times each element appears as maximum in the subarray list. We just have to count how many lesser elements are on the left and on the right of each element and multiply those values. For the example array ([1, 2, 3]), this would be:
[(1, 1), (2, 2), (3, 3)]
This means that 1 appears only once as maximum, 2 appears twice, etc.
But we need to answer range queries, so it's better to have a cumulative version of this array, that would look like:
[(1, 1), (2, 3), (3, 6)]
The (3, 6) means, for example, that there are 6 subarrays with maxima less than or equal to 3.
Answering q queries: O(q log n)
Then, to answer each query, you just have to make binary searches to find the value you want. For example. If you need to find the exact number of 3, you may want to do: query(F, 3) - query(F, 2). If you want to find those lesser than 3, you do: query(F, 2). If you want to find those greater than 3: query(F, float('inf')) - query(F, 3).
Implementation
I've implemented it in Python and it seems to work well.
import sys, random, bisect
from collections import defaultdict
from math import log, ceil
def make_tree(A):
n = 2**(int(ceil(log(len(A), 2))))
T = [(None, None)]*(2*n-1)
for i, x in enumerate(A):
T[n-1+i] = (x, i)
for i in reversed(xrange(n-1)):
T[i] = max(T[i*2+1], T[i*2+2])
return T
def print_tree(T):
print 'digraph {'
for i, x in enumerate(T):
print ' ' + str(i) + '[label="' + str(x) + '"]'
if i*2+2 < len(T):
print ' ' + str(i)+ '->'+ str(i*2+1)
print ' ' + str(i)+ '->'+ str(i*2+2)
print '}'
def find_generic(T, i, fallback, check, first, second):
j = len(T)/2+i
original = T[j]
j = (j-1)/2
#go up in the tree searching for a value that satisfies check
while j > 0 and not check(T[second(j)], original):
j = (j-1)/2
#go down in the tree searching for the left/rightmost node that satisfies check
while j*2+1<len(T):
if check(T[first(j)], original):
j = first(j)
elif check(T[second(j)], original):
j = second(j)
else:
return fallback
return j-len(T)/2
def find_left(T, i, fallback):
return find_generic(T, i, fallback,
lambda a, b: a[0]>b[0] and a[1]<b[1], #value greater, index before
lambda j: j*2+2, #rightmost first
lambda j: j*2+1 #leftmost second
)
def find_right(T, i, fallback):
return find_generic(T, i, fallback,
lambda a, b: a[0]>=b[0] and a[1]>b[1], #value greater or equal, index after
lambda j: j*2+1, #leftmost first
lambda j: j*2+2 #rightmost second
)
def make_frequency_array(A):
T = make_tree(A)
D = defaultdict(lambda: 0)
for i, x in enumerate(A):
left = find_left(T, i, -1)
right = find_right(T, i, len(A))
D[x] += (i-left) * (right-i)
F = sorted(D.items())
for i in range(1, len(F)):
F[i] = (F[i][0], F[i-1][1] + F[i][1])
return F
def query(F, n):
idx = bisect.bisect(F, (n,))
if idx>=len(F): return F[-1][1]
if F[idx][0]!=n: return 0
return F[idx][1]
F = make_frequency_array([1,2,3])
print query(F, 3)-query(F, 2) #3 3
print query(F, float('inf'))-query(F, 4) #1 4
print query(F, float('inf'))-query(F, 1) #1 1
print query(F, 2) #2 3
You problem can be divided into several steps:
For each element of initial array calculate the number of "subarrays" where current element is maximum. This will involve a bit of combinatorics. First you need for each element to know index of previous and next element that is bigger than current element. Then calculate the number of subarrays as (i - iprev) * (inext - i). Finding iprev and inext requires two traversals of the initial array: in forward and backward order. For iprev you need to traverse you array left to right. During the traversal maintain the BST that contains the biggest of the previous elements along with their index. For each element of original array, find the minimal element in BST that is bigger than current. It's index, stored as value, will be iprev. Then remove from BST all elements that are smaller that current. This operation should be O(logN), as you are removing whole subtrees. This step is required, as current element you are about to add will "override" all element that are less than it. Then add current element to BST with it's index as value. At each point of time, BST will store the descending subsequence of previous elements where each element is bigger than all it's predecessors in array (for previous elements {1,2,44,5,2,6,26,6} BST will store {44,26,6}). The backward traversal to find inext is similar.
After previous step you'll have pairs K→P where K is the value of some element from the initial array and P is the number of subarrays where this element is maxumum. Now you need to group this pairs by K. This means calculating sum of P values of the equal K elements. Be careful about the corner cases when two elements could have share the same subarrays.
As Ritesh suggested: Put all grouped K→P into an array, sort it by K and calculate cumulative sum of P for each element in one pass. It this case your queries will be binary searches in this sorted array. Each query will be performed in O(log(N)) time.
Create a sorted value-to-index map. For example,
[34,5,67,10,100] => {5:1, 10:3, 34:0, 67:2, 100:4}
Precalculate the queries in two passes over the value-to-index map:
Top to bottom - maintain an augmented tree of intervals. Each time an index is added,
split the appropriate interval and subtract the relevant segments from the total:
indexes intervals total sub-arrays with maximum greater than
4 (0,3) 67 => 15 - (4*5/2) = 5
2,4 (0,1)(3,3) 34 => 5 + (4*5/2) - 2*3/2 - 1 = 11
0,2,4 (1,1)(3,3) 10 => 11 + 2*3/2 - 1 = 13
3,0,2,4 (1,1) 5 => 13 + 1 = 14
Bottom to top - maintain an augmented tree of intervals. Each time an index is added,
adjust the appropriate interval and add the relevant segments to the total:
indexes intervals total sub-arrays with maximum less than
1 (1,1) 10 => 1*2/2 = 1
1,3 (1,1)(3,3) 34 => 1 + 1*2/2 = 2
0,1,3 (0,1)(3,3) 67 => 2 - 1 + 2*3/2 = 4
0,1,3,2 (0,3) 100 => 4 - 4 + 4*5/2 = 10
The third query can be pre-calculated along with the second:
indexes intervals total sub-arrays with maximum exactly
1 (1,1) 5 => 1
1,3 (3,3) 10 => 1
0,1,3 (0,1) 34 => 2
0,1,3,2 (0,3) 67 => 3 + 3 = 6
Insertion and deletion in augmented trees are of O(log n) time-complexity. Total precalculation time-complexity is O(n log n). Each query after that ought to be O(log n) time-complexity.

Smallest number that cannot be formed from sum of numbers from array

This problem was asked to me in Amazon interview -
Given a array of positive integers, you have to find the smallest positive integer that can not be formed from the sum of numbers from array.
Example:
Array:[4 13 2 3 1]
result= 11 { Since 11 was smallest positive number which can not be formed from the given array elements }
What i did was :
sorted the array
calculated the prefix sum
Treverse the sum array and check if next element is less than 1
greater than sum i.e. A[j]<=(sum+1). If not so then answer would
be sum+1
But this was nlog(n) solution.
Interviewer was not satisfied with this and asked a solution in less than O(n log n) time.
There's a beautiful algorithm for solving this problem in time O(n + Sort), where Sort is the amount of time required to sort the input array.
The idea behind the algorithm is to sort the array and then ask the following question: what is the smallest positive integer you cannot make using the first k elements of the array? You then scan forward through the array from left to right, updating your answer to this question, until you find the smallest number you can't make.
Here's how it works. Initially, the smallest number you can't make is 1. Then, going from left to right, do the following:
If the current number is bigger than the smallest number you can't make so far, then you know the smallest number you can't make - it's the one you've got recorded, and you're done.
Otherwise, the current number is less than or equal to the smallest number you can't make. The claim is that you can indeed make this number. Right now, you know the smallest number you can't make with the first k elements of the array (call it candidate) and are now looking at value A[k]. The number candidate - A[k] therefore must be some number that you can indeed make with the first k elements of the array, since otherwise candidate - A[k] would be a smaller number than the smallest number you allegedly can't make with the first k numbers in the array. Moreover, you can make any number in the range candidate to candidate + A[k], inclusive, because you can start with any number in the range from 1 to A[k], inclusive, and then add candidate - 1 to it. Therefore, set candidate to candidate + A[k] and increment k.
In pseudocode:
Sort(A)
candidate = 1
for i from 1 to length(A):
if A[i] > candidate: return candidate
else: candidate = candidate + A[i]
return candidate
Here's a test run on [4, 13, 2, 1, 3]. Sort the array to get [1, 2, 3, 4, 13]. Then, set candidate to 1. We then do the following:
A[1] = 1, candidate = 1:
A[1] ≤ candidate, so set candidate = candidate + A[1] = 2
A[2] = 2, candidate = 2:
A[2] ≤ candidate, so set candidate = candidate + A[2] = 4
A[3] = 3, candidate = 4:
A[3] ≤ candidate, so set candidate = candidate + A[3] = 7
A[4] = 4, candidate = 7:
A[4] ≤ candidate, so set candidate = candidate + A[4] = 11
A[5] = 13, candidate = 11:
A[5] > candidate, so return candidate (11).
So the answer is 11.
The runtime here is O(n + Sort) because outside of sorting, the runtime is O(n). You can clearly sort in O(n log n) time using heapsort, and if you know some upper bound on the numbers you can sort in time O(n log U) (where U is the maximum possible number) by using radix sort. If U is a fixed constant, (say, 109), then radix sort runs in time O(n) and this entire algorithm then runs in time O(n) as well.
Hope this helps!
Use bitvectors to accomplish this in linear time.
Start with an empty bitvector b. Then for each element k in your array, do this:
b = b | b << k | 2^(k-1)
To be clear, the i'th element is set to 1 to represent the number i, and | k is setting the k-th element to 1.
After you finish processing the array, the index of the first zero in b is your answer (counting from the right, starting at 1).
b=0
process 4: b = b | b<<4 | 1000 = 1000
process 13: b = b | b<<13 | 1000000000000 = 10001000000001000
process 2: b = b | b<<2 | 10 = 1010101000000101010
process 3: b = b | b<<3 | 100 = 1011111101000101111110
process 1: b = b | b<<1 | 1 = 11111111111001111111111
First zero: position 11.
Consider all integers in interval [2i .. 2i+1 - 1]. And suppose all integers below 2i can be formed from sum of numbers from given array. Also suppose that we already know C, which is sum of all numbers below 2i. If C >= 2i+1 - 1, every number in this interval may be represented as sum of given numbers. Otherwise we could check if interval [2i .. C + 1] contains any number from given array. And if there is no such number, C + 1 is what we searched for.
Here is a sketch of an algorithm:
For each input number, determine to which interval it belongs, and update corresponding sum: S[int_log(x)] += x.
Compute prefix sum for array S: foreach i: C[i] = C[i-1] + S[i].
Filter array C to keep only entries with values lower than next power of 2.
Scan input array once more and notice which of the intervals [2i .. C + 1] contain at least one input number: i = int_log(x) - 1; B[i] |= (x <= C[i] + 1).
Find first interval that is not filtered out on step #3 and corresponding element of B[] not set on step #4.
If it is not obvious why we can apply step 3, here is the proof. Choose any number between 2i and C, then sequentially subtract from it all the numbers below 2i in decreasing order. Eventually we get either some number less than the last subtracted number or zero. If the result is zero, just add together all the subtracted numbers and we have the representation of chosen number. If the result is non-zero and less than the last subtracted number, this result is also less than 2i, so it is "representable" and none of the subtracted numbers are used for its representation. When we add these subtracted numbers back, we have the representation of chosen number. This also suggests that instead of filtering intervals one by one we could skip several intervals at once by jumping directly to int_log of C.
Time complexity is determined by function int_log(), which is integer logarithm or index of the highest set bit in the number. If our instruction set contains integer logarithm or any its equivalent (count leading zeros, or tricks with floating point numbers), then complexity is O(n). Otherwise we could use some bit hacking to implement int_log() in O(log log U) and obtain O(n * log log U) time complexity. (Here U is largest number in the array).
If step 1 (in addition to updating the sum) will also update minimum value in given range, step 4 is not needed anymore. We could just compare C[i] to Min[i+1]. This means we need only single pass over input array. Or we could apply this algorithm not to array but to a stream of numbers.
Several examples:
Input: [ 4 13 2 3 1] [ 1 2 3 9] [ 1 1 2 9]
int_log: 2 3 1 1 0 0 1 1 3 0 0 1 3
int_log: 0 1 2 3 0 1 2 3 0 1 2 3
S: 1 5 4 13 1 5 0 9 2 2 0 9
C: 1 6 10 23 1 6 6 15 2 4 4 13
filtered(C): n n n n n n n n n n n n
number in
[2^i..C+1]: 2 4 - 2 - - 2 - -
C+1: 11 7 5
For multi-precision input numbers this approach needs O(n * log M) time and O(log M) space. Where M is largest number in the array. The same time is needed just to read all the numbers (and in the worst case we need every bit of them).
Still this result may be improved to O(n * log R) where R is the value found by this algorithm (actually, the output-sensitive variant of it). The only modification needed for this optimization is instead of processing whole numbers at once, process them digit-by-digit: first pass processes the low order bits of each number (like bits 0..63), second pass - next bits (like 64..127), etc. We could ignore all higher-order bits after result is found. Also this decreases space requirements to O(K) numbers, where K is number of bits in machine word.
If you sort the array, it will work for you. Counting sort could've done it in O(n), but if you think in a practically large scenario, range can be pretty high.
Quicksort O(n*logn) will do the work for you:
def smallestPositiveInteger(self, array):
candidate = 1
n = len(array)
array = sorted(array)
for i in range(0, n):
if array[i] <= candidate:
candidate += array[i]
else:
break
return candidate

Maximizing a particular sum over all possible subarrays

Consider an array like this one below:
{1, 5, 3, 5, 4, 1}
When we choose a subarray, we reduce it to the lowest number in the subarray. For example, the subarray {5, 3, 5} becomes {3, 3, 3}. Now, the sum of the subarray is defined as the sum of the resultant subarray. For example, {5, 3, 5} the sum is 3 + 3 + 3 = 9. The task is to find the largest possible sum that can be made from any subarray. For the above array, the largest sum is 12, given by the subarray {5, 3, 5, 4}.
Is it possible to solve this problem in time better than O(n2)?
I believe that I have an algorithm for this that runs in O(n) time. I'll first describe an unoptimized version of the algorithm, then give a fully optimized version.
For simplicity, let's initially assume that all values in the original array are distinct. This isn't true in general, but it gives a good starting point.
The key observation behind the algorithm is the following. Find the smallest element in the array, then split the array into three parts - all elements to the left of the minimum, the minimum element itself, and all elements to the right of the minimum. Schematically, this would look something like
+-----------------------+-----+-----------------------+
| left values | min | right values |
+-----------------------+-----+-----------------------+
Here's the key observation: if you take the subarray that gives the optimum value, one of three things must be true:
That array consists of all the values in the array, including the minimum value. This has total value min * n, where n is the number of elements.
That array does not include the minimum element. In that case, the subarray has to be purely to the left or to the right of the minimum value and cannot include the minimum value itself.
This gives a nice initial recursive algorithm for solving this problem:
If the sequence is empty, the answer is 0.
If the sequence is nonempty:
Find the minimum value in the sequence.
Return the maximum of the following:
The best answer for the subarray to the left of the minimum.
The best answer for the subarray to the right of the minimum.
The number of elements times the minimum.
So how efficient is this algorithm? Well, that really depends on where the minimum elements are. If you think about it, we do linear work to find the minimum, then divide the problem into two subproblems and recurse on each. This is the exact same recurrence you get when considering quicksort. This means that in the best case it will take Θ(n log n) time (if we always have the minimum element in the middle of each half), but in the worst case it will take Θ(n2) time (if we always have the minimum value purely on the far left or the far right.
Notice, however, that all of the effort we're spending is being used to find the minimum value in each of the subarrays, which takes O(k) time for k elements. What if we could speed this up to O(1) time? In that case, our algorithm would do a lot less work. More specifically, it would do only O(n) work. The reason for this is the following: each time we make a recursive call, we do O(1) work to find the minimum element, then remove that element from the array and recursively process the remaining pieces. Each element can therefore be the minimum element of at most one of the recursive calls, and so the total number of recursive calls can't be any greater than the number of elements. This means that we make at most O(n) calls that each do O(1) work, which gives a total of O(1) work.
So how exactly do we get this magical speedup? This is where we get to use a surprisingly versatile and underappreciated data structure called the Cartesian tree. A Cartesian tree is a binary tree created out of a sequence of elements that has the following properties:
Each node is smaller than its children, and
An inorder walk of the Cartesian tree gives back the elements of the sequence in the order in which they appear.
For example, the sequence 4 6 7 1 5 0 2 8 3 has this Cartesian tree:
0
/ \
1 2
/ \ \
4 5 3
\ /
6 8
\
7
And here's where we get the magic. We can immediately find the minimum element of the sequence by just looking at the root of the Cartesian tree - that takes only O(1) time. Once we've done that, when we make our recursive calls and look at all the elements to the left of or to the right of the minimum element, we're just recursively descending into the left and right subtrees of the root node, which means that we can read off the minimum elements of those subarrays in O(1) time each. Nifty!
The real beauty is that it is possible to construct a Cartesian tree for a sequence of n elements in O(n) time. This algorithm is detailed in this section of the Wikipedia article. This means that we can get a super fast algorithm for solving your original problem as follows:
Construct a Cartesian tree for the array.
Use the above recursive algorithm, but use the Cartesian tree to find the minimum element rather than doing a linear scan each time.
Overall, this takes O(n) time and uses O(n) space, which is a time improvement over the O(n2) algorithm you had initially.
At the start of this discussion, I made the assumption that all array elements are distinct, but this isn't really necessary. You can still build a Cartesian tree for an array with non-distinct elements in it by changing the requirement that each node is smaller than its children to be that each node is no bigger than its children. This doesn't affect the correctness of the algorithm or its runtime; I'll leave that as the proverbial "exercise to the reader." :-)
This was a cool problem! I hope this helps!
Assuming that the numbers are all non-negative, isn't this just the "maximize the rectangle area in a histogram" problem? which has now become famous...
O(n) solutions are possible. This site: http://blog.csdn.net/arbuckle/article/details/710988 has a bunch of neat solutions.
To elaborate what I am thinking (it might be incorrect) think of each number as histogram rectangle of width 1.
By "minimizing" a subarray [i,j] and adding up, you are basically getting the area of the rectangle in the histogram which spans from i to j.
This has appeared before on SO: Maximize the rectangular area under Histogram, you find code and explanation, and a link to the official solutions page (http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html).
The following algorithm I tried will have the order of the algorithm which is initially used to sort the array. For example, if the initial array is sorted with binary tree sort, it will have O(n) in best case and O(n log n) as an average case.
Gist of algorithm:
The array is sorted. The sorted values and the correponding old indices are stored. A binary search tree is created from the corresponding older indices which is used to determine how far it can go forwards and backwards without encountering a value less than the current value, which will result in the maximum possible sub array.
I will explain the method with the array in the question [1, 5, 3, 5, 4, 1]
1 5 3 5 4 1
-------------------------
array indices => 0 1 2 3 4 5
-------------------------
This array is sorted. Store the value and their indices in ascending order, which will be as follows
1 1 3 4 5 5
-------------------------
original array indices => 0 5 2 4 1 3
(referred as old_index) -------------------------
It is important to have a reference to both the value and their old indices; like an associative array;
Few terms to be clear:
old_index refers to the corresponding original index of an element (that is index in original array);
For example, for element 4, old_index is 4; current_index is 3;
whereas, current_index refers to the index of the element in the sorted array;
current_array_value refers to the current element value in the sorted array.
pre refers to inorder predecessor; succ refers to inorder successor
Also, min and max values can be got directly, from first and last elements of the sorted array, which are min_value and max_value respectively;
Now, the algorithm is as follows which should be performed on sorted array.
Algorithm:
Proceed from the left most element.
For each element from the left of the sorted array, apply this algorithm
if(element == min_value){
max_sum = element * array_length;
if(max_sum > current_max)
current_max = max_sum;
push current index into the BST;
}else if(element == max_value){
//here current index is the index in the sorted array
max_sum = element * (array_length - current_index);
if(max_sum > current_max)
current_max = max_sum;
push current index into the BST;
}else {
//pseudo code steps to determine maximum possible sub array with the current element
//pre is inorder predecessor and succ is inorder successor
get the inorder predecessor and successor from the BST;
if(pre == NULL){
max_sum = succ * current_array_value;
if(max_sum > current_max)
current_max = max_sum;
}else if (succ == NULL){
max_sum = (array_length - pre) - 1) * current_array_value;
if(max_sum > current_max)
current_sum = max_sum;
}else {
//find the maximum possible sub array streak from the values
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
if(max_sum > current_max)
current_max = max_sum;
}
}
For example,
original array is
1 5 3 5 4 1
-------------------------
array indices => 0 1 2 3 4 5
-------------------------
and the sorted array is
1 1 3 4 5 5
-------------------------
original array indices => 0 5 2 4 1 3
(referred as old_index) -------------------------
After first element:
max_sum = 6 [it will reduce to 1*6]
0
After second element:
max_sum = 6 [it will reduce to 1*6]
0
\
5
After third element:
0
\
5
/
2
inorder traversal results in: 0 2 5
applying the algorithm,
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
max_sum = [((5-2)-1) + ((2-0)-1) + 1] * 3
= 12
current_max = 12 [the maximum possible value]
After fourth element:
0
\
5
/
2
\
4
inorder traversal results in: 0 2 4 5
applying the algorithm,
max_sum = 8 [which is discarded since it is less than 12]
After fifth element:
max_sum = 10 [reduces to 2 * 5, discarded since it is less than 8]
After last element:
max_sum = 5 [reduces to 1 * 5, discarded since it is less than 8]
This algorithm will have the order of the algorithm which is initially used to sort the array. For example, if the initial array is sorted with binary sort, it will have O(n) in best case and O(n log n) as an average case.
The space complexity will be O(3n) [O(n + n + n), n for sorted values, another n for old indices, and another n for constructing the BST]. However, I'm not sure about this. Any feedback on the algorithm is appreciated.

Resources