Array balancing

We have two arrays a[] and b[]. We need to find the minimum absolute difference between the sums of the two arrays a & b, and the minimum number of moves (transfers of an element from one array to the other) needed to reach that minimum difference.
Example: a[] = {70,30,33,23,4,4,34,95}, sum = 293; b[] = {50,10,10,7}, sum = 77
Move 95 and 23 from array a to b.
Move 10 from array b to a.
After these moves, both arrays' sums become 185.
Output: 0, 3 (difference between the two arrays' sums, number of moves)

The first part of your problem, "find minimum absolute difference between sum of two arrays a & b", is a variation of the Knapsack problem. Wikipedia defines that as "Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible."
To see this, combine all the values in a and in b into a new array ab and find half the sum of its values. You want to find elements in ab that sum to that half-sum, or as close to it as possible. You could then place those values in a and the rest in b, and that is one of the ways to get the minimum absolute difference.
To find your "minimum number of moves" we could find all the ways to solve the knapsack problem, then for each solution find how many moves it would take to get back to the original a and b (or the original b and a if that takes fewer moves).
The computational complexity of just the first part of your problem is famously NP-complete, so expect a long-running program for any sizable arrays. The Wikipedia article has a variety of algorithms to solve that first part of your problem, so you can start there and make a choice of algorithms.
No wonder this is a competitive-programming problem!
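For illustration, here is a minimal Python sketch of that reduction (the function name is mine, not from the question). It computes only the first part, the minimum absolute difference, using the classic set-of-reachable-sums trick; counting the moves would additionally require recovering the chosen subsets, as described above.

    def min_abs_diff(a, b):
        # Combine both arrays and look for a reachable subset sum
        # as close as possible to half the total.
        ab = a + b
        total = sum(ab)
        reachable = {0}
        for x in ab:
            reachable |= {s + x for s in reachable}
        best = max(s for s in reachable if s <= total // 2)
        return total - 2 * best

    # min_abs_diff([70,30,33,23,4,4,34,95], [50,10,10,7]) -> 0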

Related

Algorithm for searching an array for 5 elements which sum to a value

[I recently asked a similar question, Search unsorted array for 3 elements which sum to a value,
and got wonderful answers, thank you all! :)]
I need your help solving the following problem:
I am looking for an algorithm; its time complexity must be Θ(n³).
The algorithm searches an unsorted array (of n integers) for 5 different integers
which sum to a given z.
E.g.: for the input: ({2,5,7,6,3,4,9,8,21,10} , 22)
the output should be true for we can sum up 2+7+6+3+4=22
(the sorting doesn't really matter. The array can be sorted first without affecting the complexity.
So you can look at the problem as if the array is already sorted.)
-No memory constraints-
-We only know that the array elements are n integers.-
Any help would be appreciated.
Algorithm:
1) Generate an array consisting of all pairs of your initial integers and sort it by pair sum. That step will take O(n^2 log(n^2)) = O(n^2 log n) time.
2) Choose a value from your initial array. O(n) ways.
3) Now you have a very similar problem to the linked one. You have to choose two pairs such that their sum equals z minus the chosen value. Thankfully, you already have a sorted array of all pairs, of length O(n^2). Finding such pairs should be straightforward: it is the same thing you did in the 3-integer sum problem. You make two pointers and move them O(n^2) times in total.
O(n^3) total complexity.
You may run into problems with pairs that contain your chosen value. Skip every pair that contains the chosen value (just move the pointer on when you reach such a pair, as if it never existed).
Let's say that you have two pairs, p1 and p2, such that sum(p1) + sum(p2) + chosen value = z. If all of the integers in p1 and p2 are different, you have the solution. If not, that's where it gets a little bit messy.
Let's fix p1 and check the next value after p2. It may have the same sum as p2, since two different pairs can have the same sum. If it does, it definitely will not have the same collision with p1 that p2 had, but it may collide with the other integer of p1. If so, check the second value after p2; if it also has the same sum, it definitely won't have any collision with p1.
So, assuming that there are at least 3 pairs with the same sum as p1 or p2, you will always find a solution by checking 3 values for fixed p1 or checking 3 values for fixed p2.
The only possibility left is that there are fewer than 3 pairs with the same sum as p1 and fewer than 3 pairs with the same sum as p2. You can choose them in up to 4 ways; just check each possibility.
It is a bit unpleasant, but a constant number of operations handles such cases, so the total complexity stays O(n^3).
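To make the scheme concrete, here is a rough Python sketch (names mine). The collision handling just checks up to three same-sum pairs on each side, following the argument above, so treat it as an illustration under those assumptions rather than a hardened implementation.

    from itertools import combinations

    def five_sum(arr, z):
        n = len(arr)
        # Step 1: all pairs (i < j), sorted by sum -- O(n^2 log n).
        pairs = sorted((arr[i] + arr[j], i, j)
                       for i, j in combinations(range(n), 2))
        m = len(pairs)
        for k in range(n):                       # Step 2: fix one element.
            target = z - arr[k]
            lo, hi = 0, m - 1
            while lo < hi:                       # Step 3: two pointers.
                s = pairs[lo][0] + pairs[hi][0]
                if s < target:
                    lo += 1
                elif s > target:
                    hi -= 1
                else:
                    # Check up to 3 same-sum pairs per side (collision
                    # argument above), then skip both equal-sum runs.
                    los = [p for p in pairs[lo:lo + 3] if p[0] == pairs[lo][0]]
                    his = [p for p in pairs[max(hi - 2, lo):hi + 1]
                           if p[0] == pairs[hi][0]]
                    for _, a, b in los:
                        for _, c, d in his:
                            if len({a, b, c, d, k}) == 5:
                                return True
                    while lo < hi and pairs[lo][0] == los[0][0]:
                        lo += 1
                    while hi > lo and pairs[hi][0] == his[-1][0]:
                        hi -= 1
        return False

    # five_sum([2,5,7,6,3,4,9,8,21,10], 22) -> True  (2+7+6+3+4)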

Minimum Complexity of two lists element summation comparison

I have a question in algorithm design about arrays, which should be implemented in the C language.
Suppose that we have an array of n elements, where for simplicity n is a power of 2 (1, 2, 4, 8, 16, etc.). I want to separate it into two parts of n/2 elements each. The condition for separating is the lowest absolute difference between the sums of all elements in the two parts. For example, the array (9,2,5,3,6,1,4,7) separates into (9,5,1,3) and (6,7,4,2): the first part's elements sum to 18, the second's to 19, the difference is 1, and these two arrays are the answer. Two arrays like (9,5,4,2) and (7,6,3,1) are not the answer, because the difference of their element sums is 3 and we have already found 1, so 3 isn't the minimum difference. How to solve this?
Thank you.
This is the Partition Problem, which is unfortunately NP-Hard.
However, since your numbers are integers, if they are relatively small there is a pseudo-polynomial O(W*n^2) solution using dynamic programming (where W is the sum of all elements).
The idea is to create the DP matrix of size (W/2+1) x (n+1) x (n/2+1), based on the following recursive formula:
D(0,i,0) = true
D(0,i,k) = false   for k != 0
D(x,i,k) = false   for x < 0
D(x,0,k) = false   for x > 0
D(x,i,0) = false   for x > 0
D(x,i,k) = D(x,i-1,k) OR D(x-arr[i], i-1, k-1)
The above gives a 3d matrix, where each entry D(x,i,k) says if there is a subset containing exactly k elements, that sums to x, and uses the first i elements as candidates.
Once you have this matrix, you just need to find the highest x (at most SUM/2) such that D(x, n, n/2) = true.
Later, you can get the relevant subset by going back on the table and "retracing" your choices at each step. This thread deals with how it is done on a very similar problem.
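For illustration, here is a minimal Python sketch of this DP (the function name is mine), with the i dimension rolled away by iterating x and k backwards; the retracing step is omitted. It assumes nonnegative integers and an even n.

    def min_diff_equal_halves(arr):
        n, total = len(arr), sum(arr)
        half = total // 2
        # D[x][k]: can some subset of exactly k elements sum to x?
        D = [[False] * (n // 2 + 1) for _ in range(half + 1)]
        D[0][0] = True
        for v in arr:
            for x in range(half, v - 1, -1):     # backwards, so each
                for k in range(n // 2, 0, -1):   # element is used once
                    if D[x - v][k - 1]:
                        D[x][k] = True
        best = max(x for x in range(half + 1) if D[x][n // 2])
        return total - 2 * best

    # min_diff_equal_halves([9,2,5,3,6,1,4,7]) -> 1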
For small sets, there is also the alternative of a naive brute-force solution, which basically tries every balanced split of the array (n!/((n/2)!*(n/2)!) of those) and picks the best one out of them.

efficient methods to do summation

Are there any efficient techniques to do the following summation?
Given a finite set A containing n integers A = {X1, X2, …, Xn}, where Xi is an integer. Now there are n subsets of A, denoted by A1, A2, ..., An. We want to calculate the summation for each subset. Are there some efficient techniques?
(Note that n is typically larger than the average size of the subsets of A.)
For example, if A = {1,2,3,4,5,6,7,9}, A1 = {1,3,4,5}, A2 = {2,3,4}, A3 = ... . A naive way of computing the summations for A1 and A2 needs 5 additions (flops):
Sum(A1)=1+3+4+5=13
Sum(A2)=2+3+4=9
...
Now, if we compute 3+4 first and record its result 7, we only need 3 further additions:
Sum(A1)=1+7+5=13
Sum(A2)=2+7=9
...
What about the generalized case? Are there any efficient methods to speed up the calculation? Thanks!
For some choices of subsets there are ways to speed up the computation, if you don't mind doing some (potentially expensive) precomputation, but not for all. For instance, suppose your subsets are {1,2}, {2,3}, {3,4}, {4,5}, ..., {n-1,n}, {n,1}; then the naive approach uses one arithmetic operation per subset, and you obviously can't do better than that. On the other hand, if your subsets are {1}, {1,2}, {1,2,3}, {1,2,3,4}, ..., {1,2,...,n} then you can get by with n-1 arithmetic ops, whereas the naive approach is much worse.
Here's one way to do the precomputation. It will not always find optimal results. For each pair of subsets, define the transition cost to be min(size of symmetric difference, size of Y - 1). (The symmetric difference of X and Y is the set of things that are in X or Y but not both.) So the transition cost is the number of arithmetic operations you need to do to compute the sum of Y's elements, given the sum of X's. Add the empty set to your list of subsets, and compute a minimum-cost directed spanning tree using Edmonds' algorithm (http://en.wikipedia.org/wiki/Edmonds%27_algorithm) or one of the faster but more complicated variations on that theme. Now make sure that when your spanning tree has an edge X -> Y you compute X before Y. (This is a "topological sort" and can be done efficiently.)
This will give distinctly suboptimal results when, e.g., you have {1,2}, {3,4}, {1,2,3,4}, {5,6}, {7,8}, {5,6,7,8}. After deciding your order of operations using the procedure above you could then do an optimization pass where you find cheaper ways to evaluate each set's sum given the sums already computed, and this will probably give fairly decent results in practice.
I suspect, but have made no attempt to prove, that finding an optimal procedure for a given set of subsets is NP-hard or worse. (It is certainly computable; the set of possible computations you might do is finite. But, on the face of it, it may be awfully expensive; potentially you might be keeping track of about 2^n partial sums, be adding any one of them to any other at each step, and have up to about n^2 steps, for a super-naive cost of (2^(2n))^(n^2) = 2^(2n^3) operations to try every possibility.)
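For what it's worth, here is a rough Python sketch of that precomputation, leaning on the networkx library's minimum_spanning_arborescence (an Edmonds'-style implementation; assumed available). Node 0 stands for the empty set, and omitting edges into it forces the arborescence to be rooted there.

    import networkx as nx

    def summation_plan(subsets):
        # Nodes are frozensets; edge X -> Y costs min(|X ^ Y|, |Y| - 1),
        # the ops needed to get sum(Y) given sum(X).
        nodes = [frozenset()] + [frozenset(s) for s in subsets]
        G = nx.DiGraph()
        for i, X in enumerate(nodes):
            for j, Y in enumerate(nodes):
                if j != i and j != 0:            # no edges into the root
                    G.add_edge(i, j, weight=min(len(X ^ Y), len(Y) - 1))
        T = nx.minimum_spanning_arborescence(G)
        # Topological order: compute each parent's sum before its children.
        plan = []
        for j in nx.topological_sort(T):
            if j != 0:
                parent = next(iter(T.predecessors(j)))
                plan.append((nodes[parent], nodes[j]))
        return plan  # (already-known set, set to derive from it) pairs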
Assuming that 'addition' isn't simply an ADD operation but instead some very intensive function involving two integer operands, then an obvious approach would be to cache the results.
You could achieve that via a suitable data structure, for example a key-value dictionary containing keys formed by the two operands and the answers as the value.
But as you specified C in the question, then the simplest approach would be an n by n array of integers, where the solution to x + y is stored at array[x][y].
You can then repeatedly iterate over the subsets, and for each pair of operands you check the appropriate position in the array. If no value is present then it must be calculated and placed in the array. The value then replaces the two operands in the subset and you iterate.
If the operation is commutative then the operands should be sorted prior to looking up the array (i.e. so that the first index is always the smallest of the two operands) as this will maximise "cache" hits.
A common optimization technique is to pre-compute intermediate results. In your case, you might pre-compute all sums with 2 summands from A and store them in a lookup table. This will result in |A|*(|A|+1)/2 table entries, where |A| is the cardinality of A.
In order to compute the element sum of Ai, you:
look up the sum of the first two elements of Ai and save it in tmp
while there is an element x left in Ai:
    look up the sum of tmp and x and save it in tmp
In order to compute the element sum of A1 = {1,3,4,5} from your example, you do the following:
lookup(1,3) = 4
lookup(4,4) = 8
lookup(8,5) = 13
Note that computing the sum of any given Ai requires no further summation, since all the work has already been done while building the lookup table (intermediate results such as 4 and 8 above must also be present in the table; the caching optimization below takes care of that).
If you store the lookup table in a hash table, then lookup() is in O(1).
Possible optimizations to this approach:
construct the lookup table while computing the summation results; hence, you only compute those summations that you actually need. Your lookup table is now a cache.
if your addition operation is commutative, you can save half of your cache size by storing only those summations where the smaller summand comes first. Then modify lookup() such that lookup(a,b) = lookup(b,a) if a > b.
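Both optimizations together amount to a small memoized "addition", sketched here in Python (names mine; a real expensive operation would replace the a + b):

    cache = {}  # (smaller operand, larger operand) -> result

    def add(a, b):
        # Commutative: normalize so the smaller operand comes first.
        key = (a, b) if a <= b else (b, a)
        if key not in cache:
            cache[key] = a + b   # stand-in for the expensive operation
        return cache[key]

    def subset_sum(subset):
        # Fold the subset through the cached operation, as in the
        # lookup steps above.
        total = subset[0]
        for x in subset[1:]:
            total = add(total, x)
        return total

    # subset_sum([1,3,4,5]) -> 13; later sums reuse the cached entries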
If we assume summation is a time-consuming operation, you can find the LCS of every pair of subsets (assuming they are sorted, as mentioned in the comments; if they are not sorted, sort them first). After that, calculate the sum of the LCS of maximum length (over all LCSs of pairs), replace its value in the related arrays with the computed number, update their LCSs, and continue this way until no LCS with more than one number remains. This is certainly not optimal, but it is better than the naive algorithm (a smaller number of summations). You could, however, do backtracking to find the best solution.
e.g. for your sample input:
A1={1,3,4,5}, A2={2,3,4}
LCS(A1,A2) = {3,4} ==> 7 ==> replace it:
A1={1,5,7}, A2={2,7} ==> LCS = {7}; the maximum LCS length is 1, so calculate the sums.
You can still improve it by calculating the sum of two random numbers and then taking LCSs again, and so on.
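Since for sorted subsets the LCS of a pair is just their multiset intersection, the greedy procedure can be sketched in Python like this (names mine; this is the non-backtracking variant):

    from collections import Counter

    def lcs_reduce(subsets):
        subsets = [list(s) for s in subsets]
        ops = 0
        while True:
            # Find the largest common chunk over all pairs of subsets.
            best, bi, bj = [], -1, -1
            for i in range(len(subsets)):
                for j in range(i + 1, len(subsets)):
                    chunk = list((Counter(subsets[i])
                                  & Counter(subsets[j])).elements())
                    if len(chunk) > len(best):
                        best, bi, bj = chunk, i, j
            if len(best) < 2:          # no chunk worth pre-summing
                break
            partial = sum(best)        # costs len(best) - 1 additions
            ops += len(best) - 1
            for k in (bi, bj):         # substitute the partial result
                rest = Counter(subsets[k]) - Counter(best)
                subsets[k] = list(rest.elements()) + [partial]
        sums = []
        for s in subsets:              # finish each subset naively
            ops += max(len(s) - 1, 0)
            sums.append(sum(s))
        return sums, ops

    # lcs_reduce([[1,3,4,5], [2,3,4]]) -> ([13, 9], 4), vs. 5 naively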
NO. There is no efficient technique.
The problem is NP-complete, and there are no known efficient solutions for such problems.
Why is it NP-complete?
We could use an algorithm for this problem to solve the set cover problem, just by adding an extra set to the collection that contains all the elements.
Example:
We have sets of elements
A1={1,2}, A2={2,3}, A3 = {3,4}
We want to solve set cover problem.
We add to this collection a set containing all the elements:
A4 = {1,2,3,4}
We run the algorithm the asker wants and check which sets the solution for A4 is represented with.
We would thereby have solved an NP-complete problem.

Dynamic Programming Problem.. Array Partitioning..

The question says that,
given an array of size n, we have to output the subsets of the array which sum to N.
For example:
I/p: arr = {2,4,5,7}, n = 4, N (sum) = 7 (given)
O/p: {2,5}, {7}
I saw similar kind of problem/explanation in the url Dynamic Programming3
And I have the following queries in the pdf:-
How could we find the subsets which sum to N, since the logic only tells whether such a subset exists or not?
Also, if we change the question a bit, can we find two subsets which have equal averages, using the same ideology?
Can anybody throw some light on this dynamic programming problem.. :)
Thanks in advance..
You can try to proceed recursively:
Given a SORTED array X = {x1 ... xn}, xi != 0, and an integer N.
First find all the possibilities "made" with just one element:
here, if N = xp, record it as a solution and eliminate all xi s.t. i >= p.
Second, find all the possibilities made with 2 elements:
{ (x1,x2) .... (xp-2,xp-1) }
Sort by sum, record the sums equal to N as solutions, and eliminate all the sums >= N,
and you have the rule: xi cannot go with xj when xi + xj >= N.
Third, with 3 elements:
create all the tuples that respect the above rule,
and repeat step 2.
etc...
Example:
X = {1,2,4,7,9,10}, N = 9
Step 1:
{9} is a solution
X' = {1,2,4,7}
Step 2: cannot choose 9 or 10 (already eliminated)
X = {(1,2) (1,4) (2,4) (1,7) (2,7) (4,7)}
{2,7} is a solution
X' = {(1,2) (1,4) (2,4) (1,7)}
Step 3: 2 and 4 cannot go with 7:
X = {(1,2,4)}
no solution here
{9} and {2,7} are the only solutions.
This diminishes the total number of comparisons (which would naively be 2^n = 2^6 = 64); here you only did 12 comparisons.
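Here is a small Python sketch of this level-by-level enumeration, assuming positive integers (names mine): partial subsets hitting exactly N are recorded as solutions, and anything reaching N or more is pruned instead of extended.

    def subsets_summing_to(X, N):
        X = sorted(X)
        solutions = []
        level = [((), 0, -1)]          # (values, sum, index of last pick)
        while level:
            nxt = []
            for vals, s, last in level:
                for i in range(last + 1, len(X)):
                    t = s + X[i]
                    if t == N:
                        solutions.append(vals + (X[i],))
                    elif t < N:        # prune everything >= N
                        nxt.append((vals + (X[i],), t, i))
            level = nxt
        return solutions

    # subsets_summing_to([1,2,4,7,9,10], 9) -> [(9,), (2, 7)]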
hope it helps
Unfortunately, this is a very difficult problem. Even determining if there exists a single subset summing to your target value is NP-Complete.
If the problem is more restricted, you might be able to find a good algorithm. For example:
Do the subsets have to be contiguous?
Can you ignore subsets with more than K values?
Are the array values guaranteed to be positive?
Are the array values guaranteed to be distinct? What about differing from the other values by at least some constant factor?
Is there some bound on the difference between the smallest and largest value?
The proposed algorithm stores only a single bit of information in the temporary array T[N], namely whether that sum is reachable at all. Obviously, you can store more information at each index N, such as the values C[i] used to get there. (It's a variation of the "Dealing with Unlimited Copies" chapter in the PDF.)
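Concretely, one cheap way to store that extra information is to remember, for each reachable sum, which element reached it, then walk backwards; a minimal Python sketch (names mine):

    def find_subset(arr, N):
        parent = {0: None}             # sum -> (previous sum, element used)
        for x in arr:
            # Snapshot the keys so each element is used at most once.
            for s in list(parent):
                t = s + x
                if t <= N and t not in parent:
                    parent[t] = (s, x)
        if N not in parent:
            return None
        subset, s = [], N              # walk back to recover the subset
        while parent[s] is not None:
            s, x = parent[s]
            subset.append(x)
        return subset

    # find_subset([2,4,5,7], 7) -> [5, 2] (one valid subset)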

Given a string of red and blue balls, find min number of swaps to club the colors together

We are given a string of the form: RBBR, where R - red and B - blue.
We need to find the minimum number of swaps required in order to club the colors together. In the above case that answer would be 1 to get RRBB or BBRR.
I feel like an algorithm to sort a partially sorted array would be useful here since a simple sort would give us the number of swaps, but we want the minimum number of swaps.
Any ideas?
This is allegedly a Microsoft interview question according to this.
Take one pass over the string and count the number of reds (#R) and the number of blues (#B). Then take a second pass counting the number of reds in the first #R balls (#r) and the number of blue balls in the first #B balls (#b). The lesser of (#R - #r) and (#B - #b) will be the minimum number of swaps needed.
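A Python sketch of the two passes (function name mine):

    def min_swaps(s):
        R = s.count('R')               # pass 1: count the colors
        B = len(s) - R
        r = s[:R].count('R')           # reds already in the first R slots
        b = s[:B].count('B')           # blues already in the first B slots
        return min(R - r, B - b)

    # min_swaps("RBBR") -> 1; min_swaps("RBRRBRBR") -> 2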
We are given the string S that we have to convert to the final string F = R^a B^b or B^b R^a. The number of differences between S and F should be even because for every misplaced R there will be a complementary misplaced B. So why not find the minimum number of differences between S and both possible F's and divide that by 2?
For example, you're given S = RBRRBRBR which should convert to
RRRRRBBB
or
BBBRRRRR
Comparing S and F character by character for each possibility, there are 4 differences for each possible final string, so either way the minimum is 4/2 = 2 swaps.
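A Python sketch of this difference-counting idea (names mine):

    def min_swaps_by_diff(s):
        R = s.count('R')
        f1 = 'R' * R + 'B' * (len(s) - R)   # R^a B^b
        f2 = f1[::-1]                        # B^b R^a
        def diffs(f):
            return sum(a != b for a, b in zip(s, f))
        return min(diffs(f1), diffs(f2)) // 2

    # min_swaps_by_diff("RBRRBRBR") -> 2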
Let's look at your example. You know that the end state will be RRBB or BBRR. In other words, the end state is always nR mB or mB nR, where n is the number of R's and m is the number of B's in your string.
Since the end state is defined, maybe some sort of path-finding algorithm would be a good approach for this? How about considering each swap as a state change and thinking of a heuristic function to approximate the number of leftover swaps needed.
I'm just throwing an idea in the air, but I hope this helps.
Start with two indices simultaneously from the right and left end of the string. Advance the left index until you find an R. Advance the right index backwards until you find a B. Swap them. Repeat until the left index meets the right index, and count the swaps. Then, do the same, but look for B on the left and R on the right. The minimum is the lower of both swap counts.
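Sketched in Python (names mine), trying both target orders and taking the lower count:

    def min_swaps_two_pointer(s):
        def swaps_to(left, right):     # target: all `left` before `right`
            i, j, count = 0, len(s) - 1, 0
            while True:
                while i < j and s[i] == left:    # already in place
                    i += 1
                while i < j and s[j] == right:
                    j -= 1
                if i >= j:
                    return count
                count += 1                       # swap the misplaced pair
                i, j = i + 1, j - 1
        return min(swaps_to('B', 'R'), swaps_to('R', 'B'))

    # min_swaps_two_pointer("RBBR") -> 1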
I think the number of swaps can be derived from the number of inversions required to sort the vector. This is an example of doing the same with a permutation vector.
This isn't a technical answer, but I looked at this more intuitively.
RRBBBBR can be reduced to RBR, since a group of same-colored balls can be moved as a single block. This means that the string is really just N sets of RB.
The only thing that matters is the number of N sets of RB blocks (including incomplete blocks for the last one).
RBR -> 1 swap to get to RRB (2 sets of RB block, RB and R)
RBRB-> 1 swap to get to RRBB (2 full sets of RB blocks)
RBRBRB-> 2 swaps to get to RRRBBB (3 full sets of RB blocks)
RBRBRBRB -> 4 sets of RB = 3 swaps
So to generalize this, the number of swaps needed = the number of RB blocks, N (including incomplete blocks), minus 1.
