How do I maximise the sum in this game? - arrays

So, I was asked this question in an interview:
There are two friends playing a game in which they select numbers from an array containing n positive numbers. Both friends select one number at a time, and both players play the game optimally. You have to find the maximum sum (of the selected numbers) that you could obtain after the game ends. The constraints, which were given only after I had answered the question without them, were:
On their first move, both players can select any number.
Apart from the first move, they can only select a number which is adjacent to a previously selected number in the given array and which hasn't been selected by either player up to that moment in the game. (Clarified in the edit below.)
If a player is not able to make a move, he/she stops playing, and the game ends when both players cannot make a move.
Now, the solution that I gave was:
Make a structure containing the value as well as the index of the value in the input array.
Make an array of this structure and store the values from the first step in it.
Sort this array in non-decreasing order on the basis of values.
Start selecting a value in a greedy manner and print the maximum value.
They were looking more for pseudo-code, though I can code it too. But the interviewer said this would fail for some cases. I thought a lot about which cases it would fail on, but couldn't find any. Therefore, I need help with this question.
Also, if possible, please include a pseudo-code of what I can do to improve this.
Edit: I guess I wasn't clear enough in my question, specifically in the 2nd point. What the interviewer meant was:
If it is not the first move of a player, he has to choose a number which is adjacent to one of the numbers he has already selected in previous moves.
Also, yes, both the players play the game optimally and they choose numbers turn by turn.
Edit2: The same question was asked of my friend, but modified a little. Instead of an array, he was given a graph. So, where in my case I can select only the indices adjacent to my previously selected indices, he was given an undirected graph (adjacency list as input) and, in a particular move, could select only those vertices which are directly connected to any previously selected vertex.
For eg:
Let's say the number of positive integers is 3. The values of those integers are 4, 2, 4, and if I name the positive integers A, B and C, then:
A - B
B - C
The above was the example that my friend was given and the answer to the above would be 6. Can you just point me in the right direction as to how I can begin with this? Thanks!

Notice that if you make the first move at index x and your opponent plays optimally, their first move will have to be at index x-1 or x+1. Otherwise, they would leave elements that they could have picked but didn't. To see this, consider two non-adjacent starting points:
-------y-------------x-------
Eventually they will both take elements from the array and end up with something like:
yyyyyyyyyyyyyyxxxxxxxxxxxxxxx
So you can relocate the starting points to the middle yx, obtaining the same solution.
So, assume you move first at x. Let:
s_left_x = a[0] + ... + a[x]
s_right_x = a[x] + ... + a[n - 1]
s_left_y = a[0] + ... + a[x - 1]
s_right_y = a[x + 1] + ... + a[n - 1]
Let's say you want to win the game: have a larger sum than your opponent at the end. If your opponent picks x + 1, you want s_left_x > s_right_y, and if your opponent picks x - 1, you want s_right_x > s_left_y. That is the ideal case for winning. It's not always possible to win, though, and your question doesn't ask how to win, but rather how to get the largest sum.
Since your opponent will play optimally, he will force you into the worst case. So for each x as your first move, the best you can do is min(s_left_x, s_right_x). Pick the maximum of this expression over all indices x; after some precomputation, each index can be evaluated in O(1).
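For reference, a minimal sketch of this in Java (the method name is mine): one prefix-sum array gives both s_left_x and s_right_x in O(1) per index, so the whole scan is O(n).

static long bestFirstMoveSum(int[] a) {
    int n = a.length;
    long[] prefix = new long[n + 1];            // prefix[i] = a[0] + ... + a[i-1]
    for (int i = 0; i < n; i++) prefix[i + 1] = prefix[i] + a[i];
    long best = Long.MIN_VALUE;
    for (int x = 0; x < n; x++) {
        long sLeftX = prefix[x + 1];            // a[0] + ... + a[x]
        long sRightX = prefix[n] - prefix[x];   // a[x] + ... + a[n-1]
        best = Math.max(best, Math.min(sLeftX, sRightX));
    }
    return best;
}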

OK, I think this is the same solution, formulated more briefly:
Player 1 must pick the item that splits the array such that the difference between the two resulting arrays' sums is less than the picked item's value.
If that is achievable, p1 wins; if not, p2 wins.
Obviously, on his first move p2 must choose the item next to p1's, as it is the only way for him to get the maximum sum. He chooses the item on the side where the sum of the remaining items is bigger; this is also the maximum sum p2 can get.
p1's maximum sum will be the sum of the remaining items (the items on the side p2 has not chosen, plus the item p1 picked on his first move).

As the OP mentioned that both players play the game optimally, I am going to present an algorithm under this assumption. If both players play optimally, then the sum they obtain at the end is necessarily the maximum they could get; otherwise they are not playing optimally.
There are two different cases here:
I make the first move and pick element at position x
Now, because we have to obey the condition that only adjacent elements can be picked, let me define two arrays here.
left[x]: the sum array[0] + array[1] + ... + array[x-1], i.e. the sum of the elements to the left of x.
right[x]: the sum array[x+1] + array[x+2] + ... + array[n-1], i.e. the sum of the elements to the right of x.
Now, because the other player also plays optimally, what he will do is check what I can possibly achieve, and he finds that I could achieve the following:
array[x] + left[x] = S1
OR
array[x] + right[x] = S2
So what the other player does is find the minimum of S1 and S2.
If S1 < S2, this means that if the other player picks the element at x+1, he has just taken the better part of the array away from us, because we are now left with the smaller sum S1.
If S1 > S2, this means that if the other player picks the element at x-1, he has just taken the better part of the array away from us, because we are now left with the smaller sum S2.
Because I am also playing optimally, on the very first move I would pick the x with the minimum absolute value of (right[x] - left[x]), so that even if my opponent takes the better part of the array from me, he is only able to take away the minimum.
Therefore, if both players play optimally, the maximum sums obtained are:
Update:
array[x] + left[x] and right[x] // when the second player, on his first move, picks x+1
Therefore, in this case the moves made are:
Player 1: picks the elements at positions x, x-1, x-2, ..., 0.
Player 2: picks the elements at positions x+1, x+2, ..., n-1.
This is because each player has to pick an element adjacent to the one he previously picked.
OR
array[x] + right[x] and left[x] // when the second player, on his first move, picks x-1
Therefore, in this case the moves made are:
Player 1: picks the elements at positions x, x+1, x+2, ..., n-1.
Player 2: picks the elements at positions x-1, x-2, ..., 0.
Again, each player has to pick an element adjacent to the one he previously picked.
Here x is such that we obtain the minimum absolute value of (right[x] - left[x]).
Since the OP insisted on pseudocode, here is one:
Computing the left and right arrays:
for(i = 0 to n-1)
{
    if(i == 0)
        left[i] = 0;
    else
        left[i] = left[i-1] + array[i-1];
    j = n-1-i;
    if(j == n-1)
        right[j] = 0;
    else
        right[j] = right[j+1] + array[j+1];
}
The left and right arrays initially have 0 in all positions.
Computing the max sums:
Find_the_max_sums()
{
    min = absolute_value_of(right[0] - left[0])
    x = 0;
    for(i = 1 to n-1)
    {
        if(absolute_value_of(right[i] - left[i]) < min)
        {
            min = absolute_value_of(right[i] - left[i]);
            x = i;
        }
    }
    return x; // the first move; the resulting sums follow from the two cases above
}
Clearly, both the space and time complexity of this algorithm are linear.

To complete @IVlad's answer: there exists a strategy with which p1 never loses (there may be a draw in the worst case). She can always find an x such that the sum she obtains is not smaller than p2's sum.
The proof goes as follows -- sorry, this is more math than coding. I consider an array of positive numbers (up to a constant translation, there is no loss of generality here).
I denote by a = [a[1],...,a[n]] the array under consideration and, for any k <= l, by S[k,l] = a[k]+...+a[l] the sum of all terms from rank k to rank l. By convention, S[k,l] = 0 if k > l.
As explained in previous comments, picking up a[k+1] guarantees p1 to win if:
a[k+1] >= max( S[1,k]-S[k+2,n] , S[k+2,n]-S[1,k] )
Let us consider D[k] = S[1,k] - S[k+1,n], the difference between the sum of the first k terms and the sum of the other terms. The sequence D is increasing, negative for k=0 and positive for k=n. So we can find an index i such that:
D[i] <= 0
D[i+1] >= 0
(in fact, one of the two above inequalities must be strict).
Substituting for S and noticing that S[1,i+1] = S[1,i] + a[i+1] and S[i+1,n] = S[i+2,n] + a[i+1], the two inequalities in D imply:
S[1,i] <= S[i+2,n] + a[i+1]
S[1,i] + a[i+1] >= S[i+2,n]
or equivalently:
a[i+1] >= S[1,i] - S[i+2,n]
a[i+1] >= S[i+2,n] - S[1,i]
In other words, choosing a[i+1] is a strategy with which p1 cannot lose.
In fact, this strategy offers the largest payoff for p1. Note that it is not unique, even if all terms of the array are strictly positive: consider for example a = [1,2,3], where p1 can indifferently choose 2 or 3.

Given your constraints, the maximum sum in any game will always be the addition of a number of adjacent values, from a given split position S to either position 1 or position N. Player 1 chooses an initial split point S and Player 2 chooses his side of the array to sum (so Player 2 also chooses Player 1's side). One player will add from 1 to S (or S-1) and the other from S (or S+1) to N. The higher sum wins.
In order to win the game, Player 1 must find a split position S such that both additions, from 1 to S-1 and from S+1 to N, are strictly smaller than the sum of the other side plus the value at S. This way, no matter which side Player 2 chooses to add, its sum will be smaller.
Pseudocode:
1. For each position S from 1 to N, repeat:
1.1. Add values from 1 to S-1 (set to zero if S is 1), assign to S1.
1.2. Add values from S+1 to N (set to zero if S is N), assign to SN.
1.3. If S1 is smaller than S+SN and SN is smaller than S+S1, then S is the winning position for Player 1. If not, repeat.
2. If you have found no winning position, then whatever you choose Player 2 can win choosing in turn an optimal position.

Related

Find the longest subarray that contains a majority element

I am trying to solve this algorithmic problem:
https://dunjudge.me/analysis/problems/469/
For convenience, I have summarized the problem statement below.
Given an array of length (<= 2,000,000) containing integers in the range [0, 1,000,000], find the
longest subarray that contains a majority element.
A majority element is defined as an element that occurs > floor(n/2) times in a list of length n.
Time limit: 1.5s
For example:
If the given array is [1, 2, 1, 2, 3, 2],
The answer is 5 because the subarray [2, 1, 2, 3, 2] of length 5 from position 1 to 5 (0-indexed) has the number 2 which appears 3 > floor(5/2) times. Note that we cannot take the entire array because 3 = floor(6/2).
My attempt:
The first thing that comes to mind is an obvious brute force (but correct) solution which fixes the start and end indexes of a subarray and loop through it to check if it contains a majority element. Then we take the length of the longest subarray that contains a majority element. This works in O(n^2) with a small optimization. Clearly, this will not pass the time limit.
I was also thinking of dividing the elements into buckets that contain their indexes in sorted order.
Using the example above, these buckets would be:
1: 0, 2
2: 1, 3, 5
3: 4
Then for each bucket, I would make an attempt to merge the indexes together to find the longest subarray that contains k as the majority element where k is the integer label of that bucket.
We could then take the maximum length over all values of k. I didn't try out this solution as I didn't know how to perform the merging step.
Could someone please advise me on a better approach to solve this problem?
Edit:
I solved this problem thanks to the answers of PhamTrung and hk6279. Although I accepted the answer from PhamTrung because he first suggested the idea, I highly recommend looking at the answer by hk6279 because his answer elaborates the idea of PhamTrung and is much more detailed (and also comes with a nice formal proof!).
Note: attempt 1 is wrong, as @hk6279 has given a counterexample. Thanks for pointing it out.
Attempt 1:
The answer is quite complex, so I will only sketch the idea.
Let process each unique number one by one.
Processing each occurrence of number x from left to right: at index i, add a segment (i, i) indicating the start and end of the current subarray. After that, look at the left neighbour of this segment and try to merge it into (i, i) (so, if the left neighbour is (st, ed), we try to make the segment (st, i)) if the condition is satisfied, and continue to merge until we are not able to, or there is no left neighbour.
We keep all those segments in a stack for faster look up/add/remove.
Finally, for each segment, we try to enlarge it as much as possible, and keep the biggest result.
Time complexity should be O(n), as each element can only be merged once.
Attempt 2:
Let process each unique number one by one
For each unique number x, we maintain an array of counters. Going from 0 to the end of the array, if we encounter the value x we increase the count, and if we don't we decrease it. So for the array
[0,1,2,0,0,3,4,5,0,0] and the number 0, we have this counter array
[1,0,-1,0,1,0,-1,-2,-1,0]
So, in order to make a valid subarray which ends at a specific index i, the value counter[i] - counter[start - 1] must be greater than 0. (This can be easily explained if you view the array as made of +1 and -1 entries, with +1 wherever x occurs and -1 otherwise; the problem is then converted into finding a subarray with a positive sum.)
So, with the help of binary search, the above algorithm still has a complexity of O(n^2 log n): in case we have n/2 unique numbers, we need to run the above process n/2 times, each time taking O(n log n).
To improve it, we make the observation that we don't actually need to store the counter values for all positions, just the values at the occurrences of x; for the counter array above we can store:
[1,#,#,0,1,#,#,#,-1,0]
This leads to an O(n log n) solution, which only goes through each element once.
This elaborates and explains how attempt 2 in @PhamTrung's solution works.
To get the length of the longest subarray, we should:
Find the max number of occurrences of the majority element over all valid subarrays, denote it m
This is done by attempt 2 in @PhamTrung's solution
Return min(2*m - 1, length of the given array)
Concept
The attempt stems from a method to solve the longest positive subarray problem.
We maintain an array of counters for each unique number x. We do a +1 when we encounter x; otherwise, we do a -1.
Take the array [0,1,2,0,0,3,4,5,0,0,1,0] and the unique number 0: we have the counter array [1,0,-1,0,1,0,-1,-2,-1,0,-1,0]. If we blank out the positions that do not hold the target number, we get [1,#,#,0,1,#,#,#,-1,0,#,0].
We can get a valid subarray from the blanked counter array whenever there exist two counters such that the value of the right counter is greater than or equal to the left one. See the Proof part.
To further improve it, we can ignore all # entries, as they are useless, and we get [1(0),0(3),1(4),-1(8),0(9),0(11)] in count(index) format.
We can improve this further by not recording a counter that is greater than its previous effective counter. Take the counters at indexes 8 and 9 as an example: if you can form a subarray ending at index 9, then you must be able to form one ending at index 8. So we only need [1(0),0(3),-1(8)] for the computation.
You can check whether a valid subarray can be formed at the current index by binary searching the counter array for the closest value that is less than or equal to the current counter value (if one exists).
Proof
When the right counter is greater than the left counter by r for a particular x, and there are k non-x elements between them (k, r >= 0), there must be k+r occurrences of x and k non-x elements after the left counter. Thus:
The two counters are at index positions i and r+2k+i
The subarray formed over [i, r+2k+i] has exactly k+r+1 occurrences of x
The subarray length is 2k+r+1
The subarray is valid as (2k+r+1) <= 2*(k+r+1) - 1
Procedure
Let m = 1.
Loop over the array from left to right. For each index p_i:
If the number is encountered for the first time:
Create a new counter array [1(p_i)]
Create a new index record storing the current index value (p_i) and counter value (1)
Otherwise, reuse the counter array and index record of the number and perform:
1. Calculate the current counter value c_i as c_prev + 2 - (p_i - p_prev), where c_prev and p_prev are the counter value and index value in the index record
2. Perform a binary search to find the longest subarray that can be formed between the current index position and a previous index position, i.e. find the closest c, c_closest, in the counter array where c <= c_i. If not found, jump to step 5
3. Calculate the number of x in the subarray found in step 2:
r = c_i - c_closest
k = (p_i - p_closest - r) / 2
number of x = k + r + 1
4. Update m if this subarray's number of x > m
5. Append the current counter to the counter array if its value is less than the last recorded counter value
6. Update the index record with the current index (p_i) and counter value (c_i)
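For reference, here is a sketch of this procedure in Java (class, method and variable names are mine). Per value, it keeps the strictly decreasing counter records described above and binary searches them; the example from the question serves as a smoke test:

import java.util.*;

public class LongestMajoritySubarray {
    public static int solve(int[] a) {
        int n = a.length;
        Map<Integer, List<long[]>> recs = new HashMap<>();  // value -> decreasing (counter, index) records
        Map<Integer, long[]> lastSeen = new HashMap<>();    // value -> (counter, index) of its last occurrence
        long m = 1;  // max occurrences of a majority element over all valid subarrays
        for (int i = 0; i < n; i++) {
            int x = a[i];
            long c;
            if (!lastSeen.containsKey(x)) {
                c = 1;  // first encounter: counter starts at 1
                recs.put(x, new ArrayList<>());
            } else {
                long[] prev = lastSeen.get(x);
                c = prev[0] + 2 - (i - prev[1]);  // +1 for this x, -1 per element skipped
            }
            List<long[]> rec = recs.get(x);
            // binary search: leftmost recorded counter <= c (records are decreasing)
            int lo = 0, hi = rec.size() - 1, pos = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (rec.get(mid)[0] <= c) { pos = mid; hi = mid - 1; }
                else lo = mid + 1;
            }
            if (pos >= 0) {
                long r = c - rec.get(pos)[0];
                long k = (i - rec.get(pos)[1] - r) / 2;
                m = Math.max(m, k + r + 1);  // occurrences of x in that subarray
            }
            // record only counters strictly smaller than the last recorded one
            if (rec.isEmpty() || c < rec.get(rec.size() - 1)[0]) rec.add(new long[]{c, i});
            lastSeen.put(x, new long[]{c, i});
        }
        return (int) Math.min(2 * m - 1, n);  // enlarge the subarray, as explained above
    }

    public static void main(String[] args) {
        System.out.println(solve(new int[]{1, 2, 1, 2, 3, 2}));  // prints 5
    }
}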
For completeness, here's an outline of an O(n) theory. Consider the following, where * are characters different from c:
* c * * c * * c c c
i: 0 1 2 3 4 5 6 7 8 9
A plot for adding 1 for c and subtracting 1 for a character other than c could look like:
sum_sequence
0 c c
-1 * * c c
-2 * * c
-3 *
A plot for the minimum of the above sum sequence, seen for c, could look like:
min_sum
0 c * *
-1 * c * *
-2 c c c
Clearly, for each occurrence of c, we are looking for the leftmost occurrence of c with sum_sequence lower than or equal to the current sum_sequence. A non-negative difference would mean c is a majority, and leftmost guarantees the interval is the longest up to our position. (We can extrapolate a maximal length that is bounded by characters other than c from the inner bounds of c as the former can be flexible without affecting the majority.)
Observe that from one occurrence of c to the next, its sum_sequence can decrease by an arbitrary amount. However, it can only ever increase by 1 between two consecutive occurrences of c. Rather than recording each value of min_sum for c, we can record linear segments, marked by c's occurrences. A visual example:
[start_min
\
\
\
\
end_min, start_min
\
\
end_min]
We iterate over occurrences of c and maintain a pointer to the optimal segment of min_sum. Clearly we can derive the next sum_sequence value for c from the previous one since it is exactly diminished by the number of characters in between.
An increase in sum_sequence for c corresponds with a shift of 1 back or no change in the pointer to the optimal min_sum segment. If there is no change in the pointer, we hash the current sum_sequence value as a key to the current pointer value. There can be O(num_occurrences_of_c) such hash keys.
With an arbitrary decrease in c's sum_sequence value, either (1) sum_sequence is lower than the lowest min_sum segment recorded so we add a new, lower segment and update the pointer, or (2) we've seen this exact sum_sequence value before (since all increases are by 1 only) and can use our hash to retrieve the optimal min_sum segment in O(1).
As Matt Timmermans pointed out in the question comments, if we were just to continually update the pointer to the optimal min_sum by iterating over the list, we would still only perform O(1) amortized-time iterations per character occurrence. We see that for each increasing segment of sum_sequence, we can update the pointer in O(1). If we used binary search only for the descents, we would add at most (log k) iterations for every k occurrences (this assumes we jump down all the way), which keeps our overall time at O(n).
Algorithm:
Essentially, what Boyer-Moore does is look for a suffix suf of nums in which suf[0] is the majority element of that suffix. To do this, we maintain a count, which is incremented whenever we see an instance of our current candidate for majority element and decremented whenever we see anything else. Whenever count equals 0, we effectively forget about everything in nums up to the current index and consider the current number as the candidate for majority element. It is not immediately obvious why we can get away with forgetting prefixes of nums - consider the following examples (pipes are inserted to separate runs of nonzero count).
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 7, 7, 7, 7]
Here, the 7 at index 0 is selected to be the first candidate for majority element. count will eventually reach 0 after index 5 is processed, so the 5 at index 6 will be the next candidate. In this case, 7 is the true majority element, so by disregarding this prefix, we are ignoring an equal number of majority and minority elements - therefore, 7 will still be the majority element in the suffix formed by throwing away the first prefix.
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 5, 5, 5, 5]
Now, the majority element is 5 (we changed the last run of the array from 7s to 5s), but our first candidate is still 7. In this case, our candidate is not the true majority element, but we still cannot discard more majority elements than minority elements (this would imply that count could reach -1 before we reassign candidate, which is obviously false).
Therefore, given that it is impossible (in both cases) to discard more majority elements than minority elements, we are safe in discarding the prefix and attempting to recursively solve the majority element problem for the suffix. Eventually, a suffix will be found for which count does not hit 0, and the majority element of that suffix will necessarily be the same as the majority element of the overall array.
Here's a Java solution:
Time complexity : O(n)
Space complexity : O(1)
public int majorityElement(int[] nums) {
int count = 0;
Integer candidate = null;
for (int num : nums) {
if (count == 0) {
candidate = num;
}
count += (num == candidate) ? 1 : -1;
}
return candidate;
}

Algorithm for two piles of equal heights by removing boxes at top or bottom

Assume you have two piles, each made of N boxes of different heights. You want to remove boxes so as to obtain two piles of equal heights (if possible). You cannot remove a box which is not at the top or the bottom of the pile! One can see, for instance, that by removing suitable boxes (shown in red in the original figure, omitted here) we can obtain two towers of equal heights.
Another way to state this problem is: given two arrays of positive numbers, are there two consecutive sub-sequences (one in each array) whose sums are equal?
This problem looks similar to this one, in which we have an array A of size N and a target t, and we want to find a consecutive sub-sequence of A whose sum is t (equivalently, we have a single pile of boxes and we want to remove boxes at the top and the bottom so as to obtain a pile of height t). This problem can be solved in time O(N). On the other hand, the above-mentioned problem has a trivial O(N^2) solution (see the answers below), but is there also a o(N^2) algorithm (O(N log N) for instance)?
Addendum: The sizes of the boxes are positive integers. They are assumed to be all different (otherwise the problem can be trivially solved in O(N)). If we denote by H the total size of the two piles, there is an O(H log H) algorithm (see the answers below). Thus, the problem starts to be interesting for H greater than N (an efficient solution for H = N^2 would be a good start ;)
Let's find all the sums in O(n^2) and add them to a hash table:
for l in [0, n)
cur_sum = 0
for r in [l, n)
cur_sum += a[r]
hash_table.add(cur_sum)
After that, we need to do the same thing for the second array and check if at least one sum occurs in the hash table.
That's much better than a naive O(n^3) solution.
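A runnable Java version of this sketch (the method name is mine; long sums guard against overflow):

import java.util.HashSet;
import java.util.Set;

static boolean equalPilesPossible(int[] a, int[] b) {
    // collect every contiguous-subsequence sum of the first pile: O(n^2)
    Set<Long> sums = new HashSet<>();
    for (int l = 0; l < a.length; l++) {
        long cur = 0;
        for (int r = l; r < a.length; r++) {
            cur += a[r];
            sums.add(cur);
        }
    }
    // check each contiguous-subsequence sum of the second pile against the set
    for (int l = 0; l < b.length; l++) {
        long cur = 0;
        for (int r = l; r < b.length; r++) {
            cur += b[r];
            if (sums.contains(cur)) return true;
        }
    }
    return false;
}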
It's also possible to solve this problem in O(H log H) time, where H is the sum of boxes' heights if all heights are integers, but it's useful only if the total height is limited.
The O(H log H) solution goes like this:
We can find all subarray sums quickly. Let's take a look at what a subarray sum S is. It's a difference of two prefix sums, P[r] - P[l] = S. If we add H to both sides of the equation, we obtain P[r] + (H - P[l]) = H + S. So we need to check if there exist two elements, in the arrays P[i] and H - P[i] respectively, that add up to a given value. Let's not just check their existence, but count the number of such pairs. Now it looks exactly like a multiplication of two polynomials. The coefficients of the first polynomial are the numbers of occurrences of each specific prefix sum (C[i] = count(j: P[j] == i)). The second polynomial is essentially the same thing reversed. We can multiply these two polynomials in O(H log H) time using the fast Fourier transform (as they are both of degree at most H). After that, we just need to check, for each candidate sum i, whether the (H + i) coefficient of the product is non-zero.
Once we know all sums for the first and the second array, we need to check if they have at least one in common.
This sounds similar to the algorithm for finding the length of the longest palindrome in a string. In the palindrome algorithm, the beginning and end of the string are examined to determine if a palindrome exists; if not, the first character of the string is removed and the rest is compared with the result of taking off the last character.
In this case, you are seeking a box that results from taking boxes off the top and bottom of the pile. If the current pile does not match a box of equal size in the other pile, take off the top of the pile and compare the result, then do the same taking off the bottom. The recursive pseudocode could look similar to:
global pile2
global HashTableForPile2  # holds every contiguous-subsequence sum of pile2
def FindBoxInPile(pile1, top, bottom):
    if top < bottom:  # base case: no boxes left in pile1
        return False
    # height of the box formed by the boxes from bottom to top
    sumPile1 = 0
    for i in range(bottom, top + 1):
        sumPile1 += pile1[i]
    # Determine if an identical box is inside both piles
    # (assuming HashTableForPile2 was calculated earlier)
    if sumPile1 in HashTableForPile2:
        return True  # a box of this height is inside both piles
    # take the top OR the bottom off pile1 and compare the results
    return (FindBoxInPile(pile1, top - 1, bottom) or
            FindBoxInPile(pile1, top, bottom + 1))
A dynamic programming solution would run in O(N) time.

Finding the Average case complexity of an Algorithm

I have an algorithm for Sequential search of an unsorted array:
SequentialSearch(A[0..n-1], K)
    i = 0
    while i < n and A[i] != K do
        i = i + 1
    if i < n then return i
    else return -1
Where we have an input array A[0...n-1] and a search key K
I know that the worst case is n, because we would have to search the entire array, hence n items: O(n).
I know that the best case is 1, since that would mean the first item we search is the one we want (or the array has all the same items); either way it's O(1).
But I have no idea on how to calculate the average case. The answer my textbook gives is:
C_avg(n) = (p/n)[1 + 2 + ... + i + ... + n] + n(1-p)
is there a general formula I can follow for when I see an algorithm like this one, to calculate it?
C_avg(n) = (p/n)[1 + 2 + ... + i + ... + n] + n(1-p)
p here is the probability that the search key is present in the array. Since there are n positions, p/n is the probability of finding the key at one particular index. We are essentially doing a weighted average: each successful position weighs in as 1 comparison, 2 comparisons, ..., up to n comparisons. Because we have to take all inputs into account, the second part, n(1-p), accounts for the inputs whose key doesn't exist in the array (probability 1-p); in that case we search through the entire array, which takes n comparisons.
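For a closed form, use 1 + 2 + ... + n = n(n+1)/2 (a quick simplification of the textbook expression above):

C_avg(n) = (p/n) * n(n+1)/2 + n(1-p) = p(n+1)/2 + n(1-p)

Sanity checks: with p = 1 (the key is always present) this gives (n+1)/2, i.e. about half the array is scanned on average; with p = 0 (the key is never present) it gives n, a full scan.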
You'd need to consider the input cases, something like equivalence classes of input, which depends on the context of the algorithm. If none of those things are known, then assuming that the input is an array of random integers, the average case would probably be O(n). This is because, roughly, you have no way of proving to a useful extent how often your query will be found in an array of N integer values in the range of ~-32k to ~32k.
More formally, let X be a discrete random variable denoting the number of elements of the array A that are needed to be scanned. There are n elements and since all positions are equally likely for inputs generated randomly, X ~ Uniform(1,n) where X = 1,..,n, given that search key is found in the array (with probability p), otherwise all the elements need to be scanned, with X=n (with probability 1-p).
Hence, P(X=x) = (1/n)*p*I{x<n} + ((1/n)*p + (1-p))*I{x=n} for x = 1,...,n, where I{x=n} is the indicator function: it has value 1 iff x=n, and 0 otherwise.
Average time complexity of the algorithm is the expected time taken to execute the algorithm when the input is an arbitrary sequence. By definition,

E[X] = sum over x of x*P(X=x) = (p/n)(1 + 2 + ... + n) + n(1-p) = p(n+1)/2 + n(1-p)
(A figure in the original answer, omitted here, showed how the time taken for searching the array changes with n and p.)

Consecutive Subset Array Sum is a certain integer algorithm

Here is the problem:
Given is an array A of n integers, a separate integer M, and an integer d. Find a consecutive subarray S of A such that the size of the subarray is less than or equal to d and the sum of all the elements in S is M. Return the left and right indexes into A that delimit the subarray S. All numbers are positive.
If there is more than one result, give the rightmost result.
We have to make the algorithm run in better time than O(n^2) or O(n*d), so basically it has to be O(n log(n)), and I'm assuming divide and conquer is the way to go. I know how to do the maximum continuous subarray problem, but that one is made easier because when you divide and conquer you know you are looking for max subarrays. With this one you don't really know what you are looking for in the subarrays, if that makes sense, since the solution could come from combinations of subarrays with small numbers and subarrays with big ones.
Any help to lead me to the solution would be greatly appreciated!
I'm about 80% sure at this point that this is not possible... I keep looking it over and I cannot think of a single way to make this work. Is it possible this is a massive trick question?
This is relatively easy if the integers in A are >= 0, because you can just maintain a couple of pointers that define an interval with sum close to M and slide this along the array from right to left. Is it possible that you have missed some extra information like this in the question?
OK - here is some expansion. You have a left pointer and a right pointer. Move the right hand pointer from right to left, maintaining the invariants that the left hand pointer is no more than d places from the right hand pointer, and that the sum of the elements enclosed by the two pointers is the greatest possible number <= M. Repeatedly move the right hand pointer one step to the left, then move the left hand pointer to the left until either you reach the limit of d or moving it further would produce a sum > M. Each time you move a pointer you can increment or decrement a running total of the sum enclosed by the two pointers.
Because the numbers are >= 0, every time you move the right hand pointer the sum decreases or stays the same, so you always want to leave the left hand pointer where it is or move it further left. Because the numbers are >= 0, you also know that if there is an answer starting at the right hand pointer's position, you will find it at this left hand pointer position: anything that doesn't extend as far as the left hand pointer is too small, and anything that extends further is too large. The exception is when there are zeros, in which case you will still find a solution; there are just other solutions as well.
Each pointer is moved only in one direction, so the maximum number of pointer movements is O(n), and the cost per pointer move is fixed, so the complexity is O(n).
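Here is a sketch of that scan in Java (my naming; it assumes all numbers are positive and M > 0). Scanning the start index from the right means the first match found is the rightmost one:

static int[] rightmostWindow(int[] a, int d, long M) {
    int n = a.length;
    int e = n - 1;   // right end of the current window a[s..e]
    long sum = 0;    // before a[s] is added below, sum covers a[s+1..e]
    for (int s = n - 1; s >= 0; s--) {
        sum += a[s];
        // shrink from the right while the sum is too large or the window too long
        while (sum > M || e - s + 1 > d) {
            sum -= a[e];
            e--;
        }
        if (sum == M) return new int[]{s, e}; // rightmost window summing to M
    }
    return null; // no window of length <= d sums to M
}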
If all numbers are non-negative, this has a straightforward O(N) solution. The requirement of length<=d doesn't add any complexity, just add a check current_length<=d. I assume there are negative numbers in the array. We need additional O(N) space.
Compute prefix-sum of each index of S: p(i) = sum(S,0,i). Place p(i) in an additional array P: P[i]=p(i).
Make a copy of P: PSorted = P. Sort PSorted with a stable sort algorithm. We use it as a map prefix-sum -> index, with the index being a tie-breaker.
For each index k of S, starting from the largest:
Set p = P[k].
Look up p - M in PSorted using binary search, biased to the right. Say the resulting index is q.
If found, with q < k and k - q <= d, return the answer (q+1, k).
This has overall O(n log n) complexity.
Expected running time can be reduced to O(N) if one uses a hash table instead of a sorted array, but one needs to be careful to always return the rightmost index which is smaller than the current index.
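A sketch of that hash-table variant in Java (my naming): the map keeps, for each prefix-sum value, the largest index seen so far, which is exactly the rightmost smaller index needed here. This version also works when negative numbers are present:

import java.util.HashMap;
import java.util.Map;

static int[] rightmostSubarraySum(int[] a, int d, long M) {
    int n = a.length;
    long[] P = new long[n + 1];                   // P[i] = a[0] + ... + a[i-1]
    for (int i = 0; i < n; i++) P[i + 1] = P[i] + a[i];
    Map<Long, Integer> latest = new HashMap<>();  // prefix value -> largest index seen
    int[] best = null;
    for (int k = 0; k <= n; k++) {
        if (k > 0) {
            // a[q..k-1] sums to M exactly when P[k] - P[q] == M
            Integer q = latest.get(P[k] - M);
            if (q != null && k - q <= d) best = new int[]{q, k - 1}; // keep the rightmost
        }
        latest.put(P[k], k); // a later, larger q overwrites an earlier one
    }
    return best; // 0-indexed, inclusive bounds, or null if none exists
}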
A correctly working algorithm; its time complexity is O(n) if you count the number of operations closely.
public void SubArraySum(int[] arr, int d, int sum)
{
    int n = arr.Length;
    int curr_sum = arr[0], start = 0, i;
    /* Add elements one by one to curr_sum; if curr_sum exceeds the
       target sum, remove elements from the start of the window */
    for (i = 1; i <= n; i++)
    {
        // If curr_sum exceeds the sum, then remove the starting elements
        while (curr_sum > sum && start < i - 1)
        {
            curr_sum = curr_sum - arr[start];
            start++;
        }
        // The current window is arr[start..i-1], of length i - start
        if (curr_sum == sum && i - start <= d)
        {
            Console.WriteLine("Sum found between indexes {0} and {1}", start, i - 1);
            return;
        }
        // Add this element to curr_sum
        if (i < n)
            curr_sum = curr_sum + arr[i];
    }
    // If we reach here, then no subarray was found
    Console.WriteLine("No subarray found");
}
Hope this helps :)

Need idea for solving this algorithm puzzle

I've come across some similar problems in the past, but I still haven't got a good idea how to solve this one. The problem goes like this:
You are given a positive integer array of size n <= 1000, and k <= n, which is the number of contiguous subarrays that you will have to split your array into. You have to output the minimum m, where m = max{s[1],..., s[k]} and s[i] is the sum of the i-th subarray. All integers in the array are between 1 and 100. Example:
Input:        Output:
5 3           3
2 1 1 2 3

(Here n = 5 and k = 3.) Splitting the array into 2+1 | 1+2 | 3 will minimize m.
My brute force idea was to make the first subarray end at position i (for all possible i) and then try to split the rest of the array into k-1 subarrays in the best way possible. However, this is an exponential solution and will never work.
So I'm looking for good ideas to solve it. If you have one please tell me.
Thanks for your help.
You can use dynamic programming to solve this problem, but you can actually solve it with greedy and binary search on the answer. This algorithm's complexity is O(n log d), where d is the output answer (an upper bound would be the sum of all the elements in the array); equivalently, it is linear in n times the number of bits of the output.
The idea is to binary search on what your m would be, and then greedily move forward on the array, adding the current element to the current partition unless adding it would push the partition over the candidate m; in that case you start a new partition. A candidate m is a success (and thus you adjust your upper bound) if the number of partitions used is less than or equal to your given input k. Otherwise, you used too many partitions, and you raise your lower bound on m.
Some pseudocode:
// binary search
binary_search ( array, N, k ) {
    lower = max( array ), upper = sum( array )
    while lower < upper {
        mid = ( lower + upper ) / 2
        // if the greedy check succeeds, mid is feasible: try a smaller m
        if partitions( array, mid ) <= k
            upper = mid
        else
            lower = mid + 1
    }
    return lower
}
partitions( array, m ) {
    count = 0
    running_sum = 0
    for x in array {
        if running_sum + x > m {
            running_sum = 0
            count++
        }
        running_sum += x
    }
    if running_sum > 0
        count++
    return count
}
This should be easier to come up with conceptually. Also note that because of the monotonic nature of the partitions function, you can actually skip the binary search and do a linear search, if you are sure that the output d is not too big:
for i = 0 to infinity
if partitions( array, i ) <= k
return i
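The same idea as runnable Java (names are mine; long sums guard against overflow):

static long minimaxSplit(int[] a, int k) {
    long lower = 0, upper = 0;
    for (int x : a) { lower = Math.max(lower, x); upper += x; }
    while (lower < upper) {
        long mid = lower + (upper - lower) / 2;
        if (partitions(a, mid) <= k) upper = mid;  // mid is feasible; try a smaller m
        else lower = mid + 1;                      // too many partitions; raise m
    }
    return lower;  // e.g. a = {2, 1, 1, 2, 3}, k = 3 gives 3
}

// Greedy helper: how many subarrays are needed if no subarray may sum above m.
static int partitions(int[] a, long m) {
    int count = 1;
    long running = 0;
    for (int x : a) {
        if (running + x > m) { count++; running = 0; }
        running += x;
    }
    return count;
}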
Dynamic programming. Make an array
int best[k+1][n+1];
where best[i][j] is the best you can achieve splitting the first j elements of the array into i subarrays. best[1][j] is simply the sum of the first j array elements. Having row i, you calculate row i+1 as follows:
for(j = i+1; j <= n; ++j){
    temp = max(best[i][i], arraysum[i+1 .. j]);
    for(h = i+1; h < j; ++h){
        if (max(best[i][h], arraysum[h+1 .. j]) < temp){
            temp = max(best[i][h], arraysum[h+1 .. j]);
        }
    }
    best[i+1][j] = temp;
}
best[k][n] will contain the solution. Note the max: the cost of a candidate split is the larger of best[i][h] and the last subarray's sum, and we take the minimum of that over h. The algorithm is O(n^2*k), probably something better is possible.
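A direct Java translation of the DP above (a sketch; names are mine), using a prefix-sum array so that arraysum[h+1 .. j] is an O(1) lookup:

static long minimaxSplitDP(int[] a, int k) {
    int n = a.length;
    long[] prefix = new long[n + 1];               // prefix[j] = a[0] + ... + a[j-1]
    for (int j = 0; j < n; j++) prefix[j + 1] = prefix[j] + a[j];
    long[][] best = new long[k + 1][n + 1];
    for (int j = 1; j <= n; j++) best[1][j] = prefix[j];
    for (int i = 1; i < k; i++) {
        for (int j = i + 1; j <= n; j++) {
            long t = Long.MAX_VALUE;
            // the last subarray is a[h..j-1]; the first h elements form i subarrays
            for (int h = i; h < j; h++)
                t = Math.min(t, Math.max(best[i][h], prefix[j] - prefix[h]));
            best[i + 1][j] = t;
        }
    }
    return best[k][n];  // e.g. a = {2, 1, 1, 2, 3}, k = 3 gives 3
}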
Edit: a combination of the ideas of ChingPing, toto2, Coffee on Mars and rds (in the order they appear as I currently see this page).
Set A = ceiling(sum/k). This is a lower bound for the minimum. To find a good upper bound for the minimum, create a good partition by any of the mentioned methods, moving borders until you don't find any simple move that still decreases the maximum subsum. That gives you an upper bound B, not much larger than the lower bound (if it were much larger, you'd find an easy improvement by moving a border, I think).
Now proceed with ChingPing's algorithm, with the known upper bound reducing the number of possible branches. This last phase is O((B-A)*n), with B unknown, but I guess better than O(n^2).
I have a sucky branch and bound algorithm (please don't downvote me).
First take the sum of the array and divide it by k, which gives you the best-case bound for your answer, i.e. the average A. We will also keep the best solution seen so far for any branch, GO (the global optimum). Consider placing a (logical) divider after some array element as a partition boundary; we have to place k-1 of them. Now we place the dividers greedily this way:
Traverse the array elements, summing them up, until you see that at the next position you would exceed A. Now make two branches, one where you put the divider at this position and another where you put it at the next position. Do this recursively and set GO = min(GO, answer for a branch).
If at any point in any branch we have a partition greater than GO, or the number of positions left is less than the number of partitions still to be placed, we bound. In the end you should have GO as your answer.
EDIT:
As suggested by Daniel, we could modify the divider-placing strategy a little: place a divider either once you reach a sum of elements equal to A, or once the remaining positions are fewer than the remaining dividers.
This is just a sketch of an idea... I'm not sure that it works, but it's very easy (and probably fast too).
You start say by putting the separations evenly distributed (it does not actually matter how you start).
Make the sum of each subarray.
Find the subarray with the largest sum.
Look at the right and left neighbor subarrays and move the separation on the left by one if the subarray on the left has a lower sum than the one on the right (and vice-versa).
Redo for the subarray with the current largest sum.
You'll reach a situation where you keep bouncing the separation between the same two positions, which will probably mean that you have the solution.
EDIT: see the comment by @rds. You'll have to think harder about bouncing solutions and the end condition.
My idea, which unfortunately does not work:
Split the array into N subarrays
Locate the two contiguous subarrays whose sum is the least
Merge the subarrays found in step 2 to form a new contiguous subarray
If the total number of subarrays is greater than k, iterate from step 2; else finish.
If your array has random numbers, you can hope that a partition where each subarray has n/k elements is a good starting point.
From there
Evaluate this candidate solution, by computing the sums
Store this candidate solution. For instance with:
an array of the indexes of every sub-array
the corresponding maximum of the sums over the sub-arrays
Reduce the size of the max sub-array: create two new candidates: one with the sub-array starting at index+1 ; one with sub-array ending at index-1
Evaluate the new candidates.
If their maximum is higher, discard them.
If their maximum is lower, iterate on step 2, except if this candidate was already evaluated, in which case it is the solution.
