Three prong partition (Dynamic Programming example) - c

I have an array of int which contains numbers like {47, 94, 79, 90, 89, 14, 82, 92}. The array must be divided into three sub-arrays so that the largest of the three sums is as small as possible, i.e. minimal. I think it's worth solving using recursion, however the approach escapes me. I also thought of using qsort on the initial array and then dividing it "greedily" (e.g. taking the lowest and highest number and so on), but it doesn't work all the time.
For example the numbers above would be divided into:
1) {94, 90, 14}
2) {92, 89}
3) {82, 79, 47}
Here the third array has the largest sum, 208, and no division achieves a smaller maximum. The order of the numbers does not matter. The question is how to fairly divide the numbers into three groups so that the largest group sum is as low as possible. Do I have to test all possibilities?

The described problem can be modelled using dynamic programming. We can define a state space as follows.
v[i,t1,t2] := minimal load in partition 3 attainable for items
              in {0,...,i} where the total load in partition 1
              is exactly t1 and the total load in partition 2
              is exactly t2, if such an assignment exists, and
              positive infinity otherwise
For the state space, i is in {0,...,n} and t1, t2 are in {0,...,P}, where P is the total sum of the items; P is an upper bound on the objective value and is itself bounded by n*smax, where smax is the largest value occurring in the input.
We obtain the following recurrence relation, where the cases correspond to iteratively choosing, for each element, the partition to which it is assigned; s_i denotes the size of the i-th item.
v[i,t1,t2] = min { v[i-1,t1-s_i,t2],
                   v[i-1,t1,t2-s_i],
                   v[i-1,t1,t2] + s_i }
The first term in the minimum expression corresponds to assigning item i to partition 1, the second case corresponds to assigning item i to partition 2 and the third case corresponds to assigning item i to partition 3. After the state space is filled, the desired result (namely the minimal maximum load over the partitions) can be obtained by evaluating the following expression.
Result = min { max { t1, t2, v[n,t1,t2] } : t1, t2 in {0,...,P} }
In the maximum expression above, t1 corresponds to the load in partition 1, t2 corresponds to the load in partition 2, and the state value v[n,t1,t2] corresponds to the load in partition 3. The running time of the sketched algorithm can be bounded by O(n*P^2), i.e. O(n^3*smax^2), which is a pseudopolynomial runtime bound. If additionally the optimal assignment of the items to the partitions is desired, either backtracking or auxiliary data structures have to be used.
Note that it seems artificial to give one of the identical partitions a special role, as its load is the value of the states while the loads of the other partitions are used for the axes of the state space. Furthermore, at first glance, the value of a state might seem to be trivially obtainable, as it is simply the remaining total load
sum_{j=1}^{i} s_j - ( t1 + t2 )
but this is not the case, as the above quantity only equals the load in partition 3 if such an assignment actually exists; in the definition of the state space, positive infinity indicates the nonexistence of such an assignment.
The approach is very similar to the one described here, page 12 ff. In total, the described problem can be seen as a scheduling problem, namely minimization of the makespan of 3 identical parallel machines. In the so-called three-field notation, the problem is denoted as P3||Cmax, which means that the number of machines is not part of the input, but fixed.
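For concreteness, here is a minimal C sketch of this DP on the example from the question (variable names are my own; two rolling layers keep the memory at O(P^2), and INT_MAX plays the role of positive infinity):

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
    int s[] = {47, 94, 79, 90, 89, 14, 82, 92};
    int n = sizeof s / sizeof s[0];
    int P = 0;
    for (int i = 0; i < n; i++) P += s[i];
    int W = P + 1;

    /* prev = v[i-1,.,.], cur = v[i,.,.]; INT_MAX encodes "no such assignment" */
    int *prev = malloc((size_t)W * W * sizeof *prev);
    int *cur = malloc((size_t)W * W * sizeof *cur);
    for (int t1 = 0; t1 < W; t1++)
        for (int t2 = 0; t2 < W; t2++)
            prev[t1 * W + t2] = (t1 == 0 && t2 == 0) ? 0 : INT_MAX;

    for (int i = 0; i < n; i++) {
        for (int t1 = 0; t1 < W; t1++)
            for (int t2 = 0; t2 < W; t2++) {
                int best = INT_MAX, c;
                /* assign item i to partition 1 */
                if (t1 >= s[i] && (c = prev[(t1 - s[i]) * W + t2]) < best)
                    best = c;
                /* assign item i to partition 2 */
                if (t2 >= s[i] && (c = prev[t1 * W + t2 - s[i]]) < best)
                    best = c;
                /* assign item i to partition 3 */
                if (prev[t1 * W + t2] != INT_MAX
                        && (c = prev[t1 * W + t2] + s[i]) < best)
                    best = c;
                cur[t1 * W + t2] = best;
            }
        int *tmp = prev; prev = cur; cur = tmp; /* advance to layer i */
    }

    /* Result = min over (t1,t2) of max { t1, t2, v[n,t1,t2] } */
    int result = INT_MAX;
    for (int t1 = 0; t1 < W; t1++)
        for (int t2 = 0; t2 < W; t2++) {
            int v3 = prev[t1 * W + t2];
            if (v3 == INT_MAX) continue;
            int m = t1 > t2 ? t1 : t2;
            if (v3 > m) m = v3;
            if (m < result) result = m;
        }
    printf("minimal maximum load: %d\n", result); /* prints 208 */
    free(prev); free(cur);
    return 0;
}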

Maximize sum of weights with constraints given on left and right indices in array

I recently came across an interesting coding problem, which is as follows:
There are n boxes, let us assume this is an array of n boxes.
For each index i of this array, three values are given -
1.) Weight(i)
2.) Left(i)
3.) Right(i)
left[i] means: if weight[i] is chosen, we are not allowed to choose any of the left[i] elements immediately to the left of the ith element.
Similarly, right[i] means: if weight[i] is chosen, we are not allowed to choose any of the right[i] elements immediately to its right.
Example :
Weight[2] = 5
Left[2] = 1
Right[2] = 3
Then, if I pick the element at position 2, I get a weight of 5 units. But I cannot pick the element at position {1} (due to the left constraint), and I cannot pick the elements at positions {3,4,5} (due to the right constraint).
Objective - We need to calculate the maximum sum of the weights we can pick.
Sample Test Case :-
Input:
5
2 0 3
4 0 0
3 2 0
7 2 1
9 2 0
Output:
13
Note - the first column is the weights, the second column the left constraints, and the third column the right constraints
I used a Dynamic Programming approach (similar to Longest Increasing Subsequence) to reach an O(n^2) solution, but I am not able to think of an O(n log n) solution. (n can be up to 10^5.)
I also tried to use a priority queue, in which elements with a lower value of (right[i] + i) are given higher priority (with higher priority assigned to the element with the lower value of "i" in case the primary key values are equal). But it is also giving a timeout error.
Any other approach for this? Or any optimization of the priority queue method? I can post both of my codes if needed.
Thanks.
One approach is to use a binary indexed tree to create a data structure that makes it easy to do two operations in O(log n) time each:
Insert number into an array
Find maximum in a given range
We will use this data structure to hold the maximum weight that can be achieved by selecting box i along with an optimal selection of boxes to the left.
The key is that we will only insert values into this data structure when we reach a point where the right constraint has been met.
To find the best value for box i, we need to find the maximum value in the data structure for all points strictly before location i-left[i], which can be done in O(log n).
The final algorithm is to loop over i=0..n-1 and for each i:
Compute the result for box i by finding the maximum in the range 0..(i-left[i]-1)
Schedule the result to be added when we reach location i+right[i]
Add any previously scheduled results into our data structure
The final result is the maximum value in the whole data structure.
Overall, the complexity is O(n log n), because each value of i results in one lookup and one update operation.
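Here is a C sketch of this approach on the sample input (the scheduling lists and all names are my own scaffolding): a Fenwick tree over box positions answers prefix-maximum queries, and a box's result is inserted only once its right constraint has expired:

#include <stdio.h>

#define MAXN 100005

int n;
long long tree[MAXN];          /* Fenwick tree for prefix maxima, 1-based */
long long dp[MAXN];            /* dp[i]: best sum with box i chosen last */
int head[MAXN + 1], nxt[MAXN]; /* boxes scheduled for insertion per position */

void bit_update(int i, long long val) { /* raise position i to at least val */
    for (; i <= n; i += i & (-i))
        if (tree[i] < val) tree[i] = val;
}

long long bit_query(int i) { /* max over positions 1..i (0 if i <= 0) */
    long long best = 0;
    for (; i > 0; i -= i & (-i))
        if (tree[i] > best) best = tree[i];
    return best;
}

int main(void) {
    /* sample input from the question */
    int weight[] = {2, 4, 3, 7, 9};
    int left[] = {0, 0, 2, 2, 2};
    int right[] = {3, 0, 0, 1, 0};
    n = 5;

    for (int i = 0; i <= n; i++) head[i] = -1;

    long long answer = 0;
    for (int i = 0; i < n; i++) {
        /* add boxes whose right constraint expired before position i */
        for (int j = head[i]; j != -1; j = nxt[j])
            bit_update(j + 1, dp[j]);
        /* boxes 0 .. i-left[i]-1 are compatible on the left */
        dp[i] = weight[i] + bit_query(i - left[i]);
        /* dp[i] becomes usable from position i + right[i] + 1 onwards */
        int avail = i + right[i] + 1;
        if (avail < n) { nxt[i] = head[avail]; head[avail] = i; }
        if (dp[i] > answer) answer = dp[i];
    }
    printf("%lld\n", answer); /* prints 13 */
    return 0;
}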

The best order to choose elements in the random array to maximize output?

We have an array as input to production.
R = [5, 2, 8, 3, 6, 9]
If the ith input is chosen, the output is the sum of the ith element, the max element whose index is less than i, and the min element whose index is greater than i.
For example if I take 8, output would be 8+5+3=16.
Selected items cannot be selected again. So, if I select 8, the array for the next selection would look like R = [5, 2, 3, 6, 9].
In what order should all inputs be chosen to maximize the total output? If possible, please send dynamic programming solutions.
I'll start the bidding with an O(n * 2^n) solution...
There are a number of ambiguities in your description of the problem that you have declined to address in comments. None of these ambiguities affects the runtime complexity of this solution, but they do affect implementation details of the solution, so the solution is necessarily somewhat of a sketch.
The solution is as follows:
Create an array results of 2^n integers. Each array index i will denote a certain subsequence of the input, and results[i] will be the greatest sum that we can achieve starting with that subsequence.
A convenient way to manage the index-to-subsequence mapping is to represent the first element of the input using the least significant bit (the 1's place), the second element with the 2's place, etc.; so, for example, if our input is [5, 2, 8, 3, 6, 9], then the subsequence 5 2 8 would be represented as array index 000111 in binary = 7, meaning results[7]. (You can also start with the most significant bit, which is probably more intuitive, but then the implementation of that mapping is a little bit less convenient. Up to you.)
Then proceed in order, from subset #0 (the empty subset) up through subset #(2^n - 1) (the full input), calculating each array element by seeing how much we get if we select each possible element and add the corresponding previously stored values. So, for example, to calculate results[7] (for the subsequence 5 2 8), we select the largest of these values:
results[6] plus how much we get if we select the 5
results[5] plus how much we get if we select the 2
results[3] plus how much we get if we select the 8
Now, it might seem like it should require O(n^2) time to compute any given array element, since there are n elements in the input that we could potentially select, and seeing how much we get if we do so requires examining all other elements (to find the maximum among prior elements and the minimum among later elements). However, we can actually do it in just O(n) time by first making a pass from right to left to record the minimal value that is later than each element of the input, and then proceeding from left to right to try each possible value. (Two O(n) passes add up to O(n).)
An important caveat: I suspect that the correct solution only ever involves, at each step, selecting either the rightmost or second-to-rightmost element. If so, then the above solution calculates many, many more values than an algorithm that took that into account. For example, the result at index 111000 in binary is clearly not relevant in that case. But I can't prove this suspicion, so I present the above O(n * 2^n) solution as the fastest solution whose correctness I'm certain of.
(I'm assuming that the elements are nonnegative absent a suggestion to the contrary.)
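A C sketch of this subset DP, under the stated assumptions (nonnegative elements; an empty max or min contributes 0). The two passes per mask give the O(n) per-element work described above:

#include <stdio.h>
#include <limits.h>

#define N 6

int main(void) {
    int R[N] = {5, 2, 8, 3, 6, 9};
    static long long results[1 << N]; /* results[mask]: best sum starting from subsequence mask */

    results[0] = 0;
    for (int mask = 1; mask < (1 << N); mask++) {
        /* right-to-left pass: minRight[i] = min R[j] over j > i in mask (0 if none) */
        int minRight[N], cur = INT_MAX;
        for (int i = N - 1; i >= 0; i--) {
            minRight[i] = (cur == INT_MAX) ? 0 : cur;
            if (((mask >> i) & 1) && R[i] < cur) cur = R[i];
        }
        /* left-to-right pass: try selecting each element of mask first */
        long long best = LLONG_MIN;
        int maxLeft = 0; /* max R[j] over j < i in mask (0 if none) */
        for (int i = 0; i < N; i++) {
            if ((mask >> i) & 1) {
                long long cand = results[mask ^ (1 << i)]
                               + R[i] + maxLeft + minRight[i];
                if (cand > best) best = cand;
                if (R[i] > maxLeft) maxLeft = R[i];
            }
        }
        results[mask] = best;
    }
    printf("best total: %lld\n", results[(1 << N) - 1]);
    return 0;
}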
Here's an O(n^2)-time algorithm based on ruakh's conjecture that there exists an optimal solution where every selection is from the rightmost two, which I prove below.
The states of the DP are (1) n, the number of elements remaining, and (2) k, the index of the rightmost element. We have the recurrence
OPT(n, k) = max(max(R(0), ..., R(n - 2)) + R(n - 1) + R(k) + OPT(n - 1, k),
                max(R(0), ..., R(n - 1)) + R(k) + OPT(n - 1, n - 1)),
where the first line is when we take the second rightmost element, and the second line is when we take the rightmost. The empty max is zero. The base cases are
OPT(1, k) = R(k)
for all k.
Proof: the condition of choosing from the two rightmost elements is equivalent to the restriction that the element at index i (counting from zero) can be chosen only when at most i + 2 elements remain. We show by induction that there exists an optimal solution satisfying this condition for all i < j where j is the induction variable.
The base case is trivial, since every optimal solution satisfies the vacuous restriction for j = 0. In the inductive case, assume that there exists an optimal solution satisfying the restriction for all i < j. If j is chosen when there are more than j + 2 elements left, let's consider what happens if we defer that choice until there are exactly j + 2 elements left. None of the elements left of j are chosen in this interval by the inductive hypothesis, so they are irrelevant. Choosing the elements right of j can only be at least as profitable, since including j cannot decrease the max. Meanwhile, the set of elements left of j is the same at both times, and the set of the elements right of j is a subset at the later time as compared to the earlier time, so the min does not decrease. We conclude that this deferral does not affect the profitability of the solution.
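The recurrence can be implemented directly. Below is a C sketch with the indexing pinned to one concrete convention (my reading of the states, with the base case at an empty prefix): OPT[p][k] is the best total achievable when the prefix R[0..p-1] plus the rightmost element R[k] remain, so the answer is OPT[N-1][N-1].

#include <stdio.h>

#define N 6

int main(void) {
    int R[N] = {5, 2, 8, 3, 6, 9};

    /* prefMax[p] = max(R[0..p-1]); prefMax[0] = 0 (the empty max is zero) */
    long long prefMax[N + 1];
    prefMax[0] = 0;
    for (int p = 0; p < N; p++)
        prefMax[p + 1] = R[p] > prefMax[p] ? R[p] : prefMax[p];

    /* OPT[p][k]: best total when R[0..p-1] plus R[k] remain (k >= p) */
    static long long OPT[N][N];
    for (int k = 0; k < N; k++) OPT[0][k] = R[k];
    for (int p = 1; p < N; p++)
        for (int k = p; k < N; k++) {
            /* take the second-rightmost element R[p-1]: its left max is
               prefMax[p-1] and R[k] is the only element to its right */
            long long a = R[p - 1] + prefMax[p - 1] + R[k] + OPT[p - 1][k];
            /* take the rightmost element R[k]: only a left max applies */
            long long b = R[k] + prefMax[p] + OPT[p - 1][p - 1];
            OPT[p][k] = a > b ? a : b;
        }
    printf("best total: %lld\n", OPT[N - 1][N - 1]);
    return 0;
}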

How to find the biggest subset of an array given some constraints?

There is an array A[1........N]. How do we find the largest subset of the array such that the product of any two distinct elements of the subset is not a perfect cube? The upper bound for N is 100000.
Example:
For A = 1 2 4 8, the answer will be {1, 2} or {1, 4} or {8, 2} or {8, 4}.
1 and 8 cannot come together in the solution.
Similarly 2 and 4.
My approach:
1.) Check all the subsets of the given array and return the subset of maximum length which satisfies the constraint. It will take O(N^2 * 2^N).
2.) Create a graph out of the given array. Two nodes in the graph are connected if their product is a perfect cube. Our main task is to remove the minimum number of nodes such that no edges are left in the graph (when we remove a node, all the edges associated with it disappear). Here the main issue is space (the representation of the graph): in the worst case the size of the graph will be O(N^2).
Please help.
Explanation
Consider the factorization of each number as follows:
A[i] = x^3.y^2.z
i.e. we first find the largest cube x^3 that divides A[i], then the largest square y^2 that divides the remainder, and call whatever is left over z.
The product of A[i] with another A[j]=X^3.Y^2.Z will be a cube if and only if Y=z and Z=y.
Therefore, if you consider groups of numbers with the same value of y^2.z, these groups form into pairs, where for each pair you cannot take an element from both linked groups.
Clearly the best case is to take all the elements from whichever group is the largest in each pair.
There is one special case, where y^2.z is equal to 1. In this case, any number in the group is already a perfect cube and cannot be paired with another number from the same group. Therefore you can add just 1 number from the set of perfect cubes.
Example
Suppose our array was (expressed as a prime factorization):
A[0] = 2^3
A[1] = 3^3
A[2] = 2^2.3.5^3
A[3] = 2^2.3.7^3
A[4] = 2.3^2.13^3
We first assign these into groups:
Value 1 = Group A (2^3, 3^3)
Value 2^2.3 = Group B (2^2.3.5^3, 2^2.3.7^3)
Value 2.3^2 = Group C (2.3^2.13^3)
Group A is paired with itself, while group B is paired with group C.
Therefore we can take one element from group A, and the whole of group B, for a total of 3 elements in the final subset.
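Here is a hedged C sketch of this procedure (helper names are mine; linear scans keep it short, whereas a hash map would make the group counting constant-time). Each number is reduced to its signature y^2.z by dropping the largest cube divisor; the partner signature z^2.y identifies the linked group:

#include <stdio.h>

/* reduce v to its cube-free signature y^2.z; *partner receives z^2.y */
long long cube_free_sig(long long v, long long *partner) {
    long long sig = 1, par = 1;
    for (long long p = 2; p * p <= v; p++) {
        if (v % p) continue;
        int e = 0;
        while (v % p == 0) { v /= p; e++; }
        e %= 3;
        if (e == 1) { sig *= p;     par *= p * p; }
        if (e == 2) { sig *= p * p; par *= p;     }
    }
    if (v > 1) { sig *= v; par *= v * v; } /* leftover prime has exponent 1 */
    *partner = par;
    return sig;
}

int count_of(const long long *a, int n, long long key) {
    int c = 0;
    for (int i = 0; i < n; i++) if (a[i] == key) c++;
    return c;
}

int main(void) {
    long long A[] = {1, 2, 4, 8};
    int n = 4, ans = 0, have_cube = 0;
    long long sig[4], par[4];

    for (int i = 0; i < n; i++)
        sig[i] = cube_free_sig(A[i], &par[i]);

    for (int i = 0; i < n; i++) {
        if (sig[i] == 1) { have_cube = 1; continue; } /* perfect cube: special case */
        int seen = 0; /* process each signature group only once */
        for (int j = 0; j < i; j++) if (sig[j] == sig[i]) seen = 1;
        if (seen) continue;
        int c1 = count_of(sig, n, sig[i]);
        int c2 = count_of(sig, n, par[i]);
        /* count each pair of linked groups once, taking the larger group */
        if (sig[i] < par[i] || c2 == 0)
            ans += c1 > c2 ? c1 : c2;
    }
    if (have_cube) ans += 1; /* at most one perfect cube fits in the subset */
    printf("%d\n", ans); /* prints 2 for A = {1, 2, 4, 8} */
    return 0;
}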
You can formulate it as a largest clique problem.
Create a graph with each number as a vertex and connect two vertices if their product is not a cube.
Now find the largest clique in the graph. See https://en.wikipedia.org/wiki/Clique_problem#Finding_maximum_cliques_in_arbitrary_graphs

Optimize queries performed on a subarray

I recently interviewed at Google. Because of this question my process didn't move forward after 2 rounds.
Suppose you are given an array of numbers. You can be given queries
to:
Find the sum of the values between indexes i and j.
Update value at index i to a new given value.
Find the maximum of the values between indexes i and j.
Check whether the subarray between indexes i and j, both inclusive, is in ascending or descending order.
I gave him a solution, but it simply scanned the subarray between indexes i and j. He asked me to optimize it. I thought of using a hashtable so that if the starting index is the same and the ending index is greater than a previously queried one, we store the maximum and whether the subarray is ascending or descending, and check only the remaining part. But that also didn't optimize it as much as required.
I'd love to know how I can optimize the solution so as to make it acceptable.
Constraints:
Everything from [1,10^5]
Thanks :)
All these queries can be answered in O(log N) time per query in the worst case, with O(N) time for preprocessing. You can just build a segment tree and maintain, for each node, the sum, the maximum and two boolean flags (indicating whether the range which corresponds to this node is sorted in ascending/descending order or not). All these values can be recomputed efficiently for an update query because only O(log N) nodes can change (they lie on the path from the root to the leaf which corresponds to the changing element). All other range queries (sum, max, sorted or not) are decomposed into O(log N) nodes (due to the properties of a segment tree), and it is easy to combine the values of two nodes in O(1) (for example, for sum the result of combining 2 nodes is just the sum of their values).
Here is some pseudocode. It shows what data should be stored in a node and how to combine the values of 2 nodes:
class Node {
    bool is_ascending   // is the covered range sorted in ascending order?
    bool is_descending  // is the covered range sorted in descending order?
    int sum             // sum of the covered range
    int max             // maximum of the covered range
    int leftPos         // first array index covered by this node
    int rightPos        // last array index covered by this node
}

Node merge(Node left, Node right) {
    res = Node()
    res.leftPos = left.leftPos
    res.rightPos = right.rightPos
    res.sum = left.sum + right.sum
    res.max = max(left.max, right.max)
    // sorted iff both halves are sorted and the border elements fit
    res.is_ascending = left.is_ascending and right.is_ascending
                       and array[left.rightPos] < array[right.leftPos]
    res.is_descending = left.is_descending and right.is_descending
                        and array[left.rightPos] > array[right.leftPos]
    return res
}
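For concreteness, the same structure rendered as a C sketch (the array bounds, function names and the demo in main are my own assumptions; query returns the combined node for a range, update rewrites one element):

#include <stdio.h>

#define MAXN 100005

typedef struct {
    int is_ascending, is_descending;
    long long sum;
    int max;
    int leftPos, rightPos;
} Node;

int a[MAXN];
Node tree[4 * MAXN];

Node merge_nodes(Node l, Node r) {
    Node res;
    res.leftPos = l.leftPos;
    res.rightPos = r.rightPos;
    res.sum = l.sum + r.sum;
    res.max = l.max > r.max ? l.max : r.max;
    res.is_ascending = l.is_ascending && r.is_ascending
        && a[l.rightPos] < a[r.leftPos];
    res.is_descending = l.is_descending && r.is_descending
        && a[l.rightPos] > a[r.leftPos];
    return res;
}

void build(int node, int lo, int hi) {
    if (lo == hi) {
        Node leaf = {1, 1, a[lo], a[lo], lo, lo};
        tree[node] = leaf;
        return;
    }
    int mid = (lo + hi) / 2;
    build(2 * node, lo, mid);
    build(2 * node + 1, mid + 1, hi);
    tree[node] = merge_nodes(tree[2 * node], tree[2 * node + 1]);
}

void update(int node, int lo, int hi, int pos, int val) {
    if (lo == hi) {
        a[pos] = val;
        Node leaf = {1, 1, val, val, pos, pos};
        tree[node] = leaf;
        return;
    }
    int mid = (lo + hi) / 2;
    if (pos <= mid) update(2 * node, lo, mid, pos, val);
    else update(2 * node + 1, mid + 1, hi, pos, val);
    tree[node] = merge_nodes(tree[2 * node], tree[2 * node + 1]);
}

Node query(int node, int lo, int hi, int l, int r) {
    if (l == lo && r == hi) return tree[node];
    int mid = (lo + hi) / 2;
    if (r <= mid) return query(2 * node, lo, mid, l, r);
    if (l > mid) return query(2 * node + 1, mid + 1, hi, l, r);
    return merge_nodes(query(2 * node, lo, mid, l, mid),
                       query(2 * node + 1, mid + 1, hi, mid + 1, r));
}

int main(void) {
    int n = 7, init[] = {3, 8, 10, -5, 2, 12, 7};
    for (int i = 0; i < n; i++) a[i] = init[i];
    build(1, 0, n - 1);
    Node q = query(1, 0, n - 1, 2, 5);
    printf("sum=%lld max=%d ascending=%d\n", q.sum, q.max, q.is_ascending);
    return 0;
}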
As andy pointed out in the comments: The queries are quite different in nature, so the "best" solution will probably depend on which query type is executed most frequently.
However, the task
Find the sum of the values between indexes i and j.
can efficiently be solved by performing a scan/prefix sum computation of the array. Imagine an array of int values
index: 0 1 2 3 4 5 6
array: [ 3, 8, 10, -5, 2, 12, 7 ]
Then you compute the Prefix Sum:
index: 0 1 2 3 4 5 6 7
prefix: [ 0, 3, 11, 21, 16, 18, 30, 37 ]
Note that this can be computed particularly efficiently in parallel. In fact, this is one important building block of many parallel algorithms, as described in the thesis "Vector Models for Data-Parallel Computing" by Guy E. Blelloch (thesis as PDF File).
Additionally, it can be computed in two ways: Either starting with the value from array[0], or starting with 0. This will, of course, affect how the resulting prefix array has to be accessed. Here, I started with 0, and made the resulting array one element longer than the input array. This may also be implemented differently, but in this case, it makes it easier to obey the array limits (although one would still have to clarify in which cases indices should be considered as inclusive or exclusive).
However, given this prefix sum array, one can compute the sum of elements between indices i and j in constant time, by simply subtracting the corresponding values of the prefix sum array:
sum(n=i..j)(array[n]) = (prefix[j+1] - prefix[i])
For example:
sum(n=2..5)(array[n]) = 10 + (-5) + 2 + 12 = 19
prefix[5+1] - prefix[2] = 30 - 11 = 19
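A minimal, self-contained C sketch of the above, using the 0-starting convention and the example array:

#include <stdio.h>

int main(void) {
    int array[] = {3, 8, 10, -5, 2, 12, 7};
    int n = 7;
    long long prefix[8];

    prefix[0] = 0; /* convention: start with 0, one element longer than the input */
    for (int i = 0; i < n; i++)
        prefix[i + 1] = prefix[i] + array[i];

    /* sum of array[i..j], both inclusive, in constant time */
    int i = 2, j = 5;
    printf("%lld\n", prefix[j + 1] - prefix[i]); /* 30 - 11 = 19 */
    return 0;
}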
For task 2,
Update value at index i to a new given value.
this would mean that the prefix sums would have to be updated. This could be done brute-force, in linear time, by just adding the difference between the old value and the new value to all prefix sums that appear after the modified element (but for this, also see the notes in the last section of this answer).
The tasks 3 and 4
Find the maximum of the values between indexes i and j.
Check whether the subarray between indexes i and j, both inclusive, is in ascending or descending order.
I could imagine that the maximum value could simply be tracked while building the prefix sums, as well as checking whether the values are only ascending or descending. However, when values are updated, this information would have to be re-computed.
In any case, there are some data structures that deal with prefix sums in particular. I think that a Fenwick tree might allow implementing some of the O(n) operations mentioned above in O(log n), but I have not yet looked at this in detail.
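For what it's worth, here is a hedged C sketch of a Fenwick tree for the sum queries (the function names are mine): both point updates and range sums run in O(log n), which replaces the linear-time update described for task 2:

#include <stdio.h>

#define N 7
long long fen[N + 1]; /* 1-based Fenwick tree over the array */

void fen_add(int i, long long delta) { /* array[i-1] += delta */
    for (; i <= N; i += i & (-i)) fen[i] += delta;
}

long long fen_prefix(int i) { /* sum of the first i elements */
    long long s = 0;
    for (; i > 0; i -= i & (-i)) s += fen[i];
    return s;
}

long long fen_range(int i, int j) { /* sum of elements i..j, 1-based inclusive */
    return fen_prefix(j) - fen_prefix(i - 1);
}

int main(void) {
    int array[] = {3, 8, 10, -5, 2, 12, 7};
    for (int i = 0; i < N; i++) fen_add(i + 1, array[i]);
    printf("%lld\n", fen_range(3, 6)); /* 10 + (-5) + 2 + 12 = 19 */
    fen_add(4, 9 - (-5));              /* update: set array[3] to 9 */
    printf("%lld\n", fen_range(3, 6)); /* now prints 33 */
    return 0;
}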

efficient methods to do summation

Are there any efficient techniques to do the following summation?
Given a finite set A containing n integers A = {X1, X2, ..., Xn}, where Xi is an integer. Now there are n subsets of A, denoted by A1, A2, ..., An. We want to calculate the summation for each subset. Are there some efficient techniques?
(Note that n is typically larger than the average size of all the subsets of A.)
For example, if A = {1,2,3,4,5,6,7,9}, A1 = {1,3,4,5}, A2 = {2,3,4}, A3 = ... . A naive way of computing the summations for A1 and A2 needs 5 flops for additions:
Sum(A1)=1+3+4+5=13
Sum(A2)=2+3+4=9
...
Now, if we compute 3+4 first and record its result 7, we only need 3 flops for additions:
Sum(A1)=1+7+5=13
Sum(A2)=2+7=9
...
What about the generalized case? Are there any efficient methods to speed up the calculation? Thanks!
For some choices of subsets there are ways to speed up the computation, if you don't mind doing some (potentially expensive) precomputation, but not for all. For instance, suppose your subsets are {1,2}, {2,3}, {3,4}, {4,5}, ..., {n-1,n}, {n,1}; then the naive approach uses one arithmetic operation per subset, and you obviously can't do better than that. On the other hand, if your subsets are {1}, {1,2}, {1,2,3}, {1,2,3,4}, ..., {1,2,...,n} then you can get by with n-1 arithmetic ops, whereas the naive approach is much worse.
Here's one way to do the precomputation. It will not always find optimal results. For each pair of subsets, define the transition cost to be min(size of symmetric difference, size of Y - 1). (The symmetric difference of X and Y is the set of things that are in X or Y but not both.) So the transition cost is the number of arithmetic operations you need to do to compute the sum of Y's elements, given the sum of X's. Add the empty set to your list of subsets, and compute a minimum-cost directed spanning tree using Edmonds' algorithm (http://en.wikipedia.org/wiki/Edmonds%27_algorithm) or one of the faster but more complicated variations on that theme. Now make sure that when your spanning tree has an edge X -> Y you compute X before Y. (This is a "topological sort" and can be done efficiently.)
This will give distinctly suboptimal results when, e.g., you have {1,2}, {3,4}, {1,2,3,4}, {5,6}, {7,8}, {5,6,7,8}. After deciding your order of operations using the procedure above you could then do an optimization pass where you find cheaper ways to evaluate each set's sum given the sums already computed, and this will probably give fairly decent results in practice.
I suspect, but have made no attempt to prove, that finding an optimal procedure for a given set of subsets is NP-hard or worse. (It is certainly computable; the set of possible computations you might do is finite. But, on the face of it, it may be awfully expensive; potentially you might be keeping track of about 2^n partial sums, be adding any one of them to any other at each step, and have up to about n^2 steps, for a super-naive cost of (2^2n)^(n^2) = 2^(2n^3) operations to try every possibility.)
Assuming that 'addition' isn't simply an ADD operation but instead some very intensive function involving two integer operands, then an obvious approach would be to cache the results.
You could achieve that via a suitable data structure, for example a key-value dictionary containing keys formed by the two operands and the answers as the value.
But as you specified C in the question, then the simplest approach would be an n by n array of integers, where the solution to x + y is stored at array[x][y].
You can then repeatedly iterate over the subsets, and for each pair of operands you check the appropriate position in the array. If no value is present then it must be calculated and placed in the array. The value then replaces the two operands in the subset and you iterate.
If the operation is commutative then the operands should be sorted prior to looking up the array (i.e. so that the first index is always the smallest of the two operands) as this will maximise "cache" hits.
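A small C sketch of such a cache; expensive_add is a hypothetical stand-in for the costly operation, and MAXV is an assumed bound on all operand values (including intermediate sums):

#include <stdio.h>

#define MAXV 16

long long cache[MAXV][MAXV];
int cached[MAXV][MAXV];

long long expensive_add(int x, int y) { /* stand-in for the costly operation */
    return (long long)x + y;
}

long long cached_add(int x, int y) {
    if (x > y) { int t = x; x = y; y = t; } /* commutative: sort the operands */
    if (!cached[x][y]) {
        cache[x][y] = expensive_add(x, y);
        cached[x][y] = 1;
    }
    return cache[x][y];
}

int main(void) {
    int A1[] = {1, 3, 4, 5};
    long long sum = A1[0];
    for (int i = 1; i < 4; i++)
        sum = cached_add((int)sum, A1[i]); /* partial sums are reused across subsets */
    printf("%lld\n", sum); /* prints 13 */
    return 0;
}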
A common optimization technique is to pre-compute intermediate results. In your case, you might pre-compute all sums with 2 summands from A and store them in a lookup table. This will result in |A|*(|A|+1)/2 table entries, where |A| is the cardinality of A.
In order to compute the element sum of Ai, you:
look up the sum of the first two elements of Ai and save it in tmp
while there is an element x left in Ai:
look up the sum of tmp and x and save it in tmp
In order to compute the element sum of A1 = {1,3,4,5} from your example, you do the following:
lookup(1,3) = 4
lookup(4,4) = 8
lookup(8,5) = 13
Note that computing the sum of any given Ai doesn't require summation, since all the work has already been conducted while pre-computing the lookup table.
If you store the lookup table in a hash table, then lookup() is in O(1).
Possible optimizations to this approach:
construct the lookup table while computing the summation results; hence, you only compute those summations that you actually need. Your lookup table is now a cache.
if your addition operation is commutative, you can save half of your cache size by storing only those summations where the smaller summand comes first. Then modify lookup() such that lookup(a,b) = lookup(b,a) if a > b.
If summation is assumed to be a time-consuming operation, you can find the LCS of every pair of subsets (assuming they are sorted, as mentioned in the comments; if they are not sorted, sort them first). After that, calculate the sum of the LCS of maximum length (over all LCSs of pairs), then replace its value in the related arrays with the corresponding number, update their LCSs, and continue this way until there is no LCS with more than one number left. Sure, this is not optimal, but it is better than the naive algorithm (a smaller number of summations). However, you can do backtracking to find the best solution.
e.g For your sample input:
A1={1,3,4,5} , A2={2,3,4}
LCS(A1, A2) = {3,4} ==> 7 ==> replace it:
A1 = {1,5,7}, A2 = {2,7} ==> LCS = {7}; the maximum LCS length is 1, so calculate the sums.
You can still improve it by calculating the sum of two random numbers, then again taking the LCS, ...
NO. There is no efficient technique.
Because it is an NP-complete problem, and there are no efficient solutions for such problems.
why is it NP-complete?
We could use an algorithm for this problem to solve the set cover problem, just by putting into the collection an extra set containing all the elements.
Example:
We have sets of elements
A1={1,2}, A2={2,3}, A3 = {3,4}
We want to solve the set cover problem.
We add to this collection a set containing all the elements:
A4 = {1,2,3,4}
We use the algorithm that John Smith is asking for, and we check what the solution for A4 is represented with.
We have solved an NP-complete problem.
