efficient methods to do summation - c

Is there any efficient techniques to do the following summation ?
Given a finite set A containing n integers A={X1,X2,…,Xn}, where Xi is an integer. Now there are n subsets of A, denoted by A1, A2, ... , An. We want to calculate the summation for each subset. Are there some efficient techniques ?
(Note that n is typically larger than the average size of all the subsets of A.)
For example, if A={1,2,3,4,5,6,7,9}, A1={1,3,4,5} , A2={2,3,4} , A3= ... . A naive way of computing the summation for A1 and A2 needs 5 Flops for additions:
Sum(A1)=1+3+4+5=13
Sum(A2)=2+3+4=9
...
Now, if computing 3+4 first, and then recording its result 7, we only need 3 Flops for addtions:
Sum(A1)=1+7+5=13
Sum(A2)=2+7=9
...
What about the generalized case ? Is there any efficient methods to speed up the calculation? Thanks!

For some choices of subsets there are ways to speed up the computation, if you don't mind doing some (potentially expensive) precomputation, but not for all. For instance, suppose your subsets are {1,2}, {2,3}, {3,4}, {4,5}, ..., {n-1,n}, {n,1}; then the naive approach uses one arithmetic operation per subset, and you obviously can't do better than that. On the other hand, if your subsets are {1}, {1,2}, {1,2,3}, {1,2,3,4}, ..., {1,2,...,n} then you can get by with n-1 arithmetic ops, whereas the naive approach is much worse.
Here's one way to do the precomputation. It will not always find optimal results. For each pair of subsets, define the transition cost to be min(size of symmetric difference, size of Y - 1). (The symmetric difference of X and Y is the set of things that are in X or Y but not both.) So the transition cost is the number of arithmetic operations you need to do to compute the sum of Y's elements, given the sum of X's. Add the empty set to your list of subsets, and compute a minimum-cost directed spanning tree using Edmonds' algorithm (http://en.wikipedia.org/wiki/Edmonds%27_algorithm) or one of the faster but more complicated variations on that theme. Now make sure that when your spanning tree has an edge X -> Y you compute X before Y. (This is a "topological sort" and can be done efficiently.)
This will give distinctly suboptimal results when, e.g., you have {1,2}, {3,4}, {1,2,3,4}, {5,6}, {7,8}, {5,6,7,8}. After deciding your order of operations using the procedure above you could then do an optimization pass where you find cheaper ways to evaluate each set's sum given the sums already computed, and this will probably give fairly decent results in practice.
I suspect, but have made no attempt to prove, that finding an optimal procedure for a given set of subsets is NP-hard or worse. (It is certainly computable; the set of possible computations you might do is finite. But, on the face of it, it may be awfully expensive; potentially you might be keeping track of about 2^n partial sums, be adding any one of them to any other at each step, and have up to about n^2 steps, for a super-naive cost of (2^2n)^(n^2) = 2^(2n^3) operations to try every possibility.)

Assuming that 'addition' isn't simply an ADD operation but instead some very intensive function involving two integer operands, then an obvious approach would be to cache the results.
You could achieve that via a suitable data structure, for example a key-value dictionary containing keys formed by the two operands and the answers as the value.
But as you specified C in the question, then the simplest approach would be an n by n array of integers, where the solution to x + y is stored at array[x][y].
You can then repeatedly iterate over the subsets, and for each pair of operands you check the appropriate position in the array. If no value is present then it must be calculated and placed in the array. The value then replaces the two operands in the subset and you iterate.
If the operation is commutative then the operands should be sorted prior to looking up the array (i.e. so that the first index is always the smallest of the two operands) as this will maximise "cache" hits.

A common optimization technique is to pre-compute intermediate results. In your case, you might pre-compute all sums with 2 summands from A and store them in a lookup table. This will result in |A|*|A+1|/2 table entries, where |A| is the cardinality of A.
In order to compute the element sum of Ai, you:
look up the sum of the first two elements of Ai and save them in tmp
while there is an element x left in Ai:
look up the sum of tmp and x
In order to compute the element sum of A1 = {1,3,4,5} from your example, you do the following:
lookup(1,3) = 4
lookup(4,4) = 8
lookup(8,5) = 13
Note that computing the sum of any given Ai doesn't require summation, since all the work has already been conducted while pre-computing the lookup table.
If you store the lookup table in a hash table, then lookup() is in O(1).
Possible optimizations to this approach:
construct the lookup table while computing the summation results; hence, you only compute those summations that you actually need. Your lookup table is now a cache.
if your addition operation is commutative, you can save half of your cache size by storing only those summations where the smaller summand comes first. Then modify lookup() such that lookup(a,b) = lookup(b,a) if a > b.

If assuming summation is time consuming action you can find LCS of every pair of subsets (by assuming they are sorted as mentioned in comments, or if they are not sorted sort them), after that calculate sum of LCS of maximum length (over all LCS in pairs), then replace it's value in related arrays with related numbers, update their LCS and continue this way till there is no LCS with more than one number. Sure this is not optimum, but it's better than naive algorithm (smaller number of summation). However you can do backtracking to find best solution.
e.g For your sample input:
A1={1,3,4,5} , A2={2,3,4}
LCS (A_1,A_2) = {3,4} ==>7 ==>replace it:
A1={1,5,7}, A2={2,7} ==> LCS = {7}, maximum LCS length is `1`, so calculate sums.
Still you can improve it by calculation sum of two random numbers, then again taking LCS, ...

NO. There is no efficient techique.
Because it is NP complete problem. and there are no efficient solutions for such problem
why is it NP-complete?
We could use algorithm for this problem to solve set cover problem, just by putting extra set in set, conatining all elements.
Example:
We have sets of elements
A1={1,2}, A2={2,3}, A3 = {3,4}
We want to solve set cover problem.
we add to this set, set of numbers containing all elements
A4 = {1,2,3,4}
We use algorhitm that John Smith is aking for and we check solution A4 is represented whit.
We solved NP-Complete problem.

Related

Sum over n-tuples with total sum equal to k

I want to sum over tuples of length n, i.e. I have a vector (m_1,...,m_n) where mi is an integer greater or equal to zero with the constraint that the sum of all vector elements is equal to k.
What is the most efficient way to implement this?
My naive approach would be to iterate through all combinations with m_i between 0 and k and check if they satisfy the criterion, but this seems inefficient.
For instance, if k=2 and n=2, then
(2,0),(1,1),(0,2) would be the possible values of m1,m2 that I would like to have. Is there a way to generate these numbers efficiently (I don't necessarily have to store them all in an array, but I want to iterate over all possible combinations)
Ok, random stuff I deleted.
If you look at FXT book/library by J.Arndt, there is on page 342 section 16.3 "Partition into m parts"
Here is algorithm and reference to the code to generate exactly m-vector of partitioning of n.
You'll probably need to modify it, he doesn't have bins with zeros, starts with ones.
And some thoughts on the matter. n is sum, and you have k bins. Start with |n|0|...|0| combination. Define operation "distribute 1" which is take one from the leftmost bin and distribute it into all other bins.
E.g. D1(|n|0|...|0|)=tuple(|n-1|1|...|0|, ..., |n-1|0|...|1|)
Then you apply D1() to the tuple, and get tuple of tuples. And so on and so forth, till first bin is exhausted.
You could think this as a tree:
root |n|0|...|0|
D1 applied once, k-1 leaves |n-1|1|...|0| ... |n-1|0|...|1|
Next tree level, D1 applied once to previous level, each node getting k-1 children.
THe only thing left is how to traverse it - DFS, BFS, or anything else from https://en.wikipedia.org/wiki/Tree_traversal

Determine if two unsorted (short) arrays share the same elements?

this is a follow-up question to Determine if two unsorted arrays are identical?
Given two unsorted arrays A and B with the same number of distinct elements (positive integers>0), determine if A and B can be rearranged so that they are identical.
I don't want to actually rearrange the elements, just perform a quick and inexpensive check if it is possible (I need to perform this on a large number of such arrays).
I was thinking about a check based on the sum and product of the elements. I.e., if 1. and 2. are true, A and B can be rearranged so that they are identical:
a_1+a_2+...+a_n = b_1+b_2+...+b_n
a_1*a_2*...*a_n = b_1*b_2*...*b_n
However, the mathematical foundations of this approach seem shaky to me. Are there similar proofs, which are mathematically more rigorous?
By The Vieta formulas, the sum and the product of n numbers are the second and last coefficients of a polynomial having those numbers for roots (to a change of sign). The other coefficients remain free, leaving many possibilities for distinct numbers.
E.g. sum = 3, product = 4.
The polynomial x³-3x²-21x-4 has the roots -3.19, -0.19634, 6.3863.
The polynomial x³-3x²-12x-4 has the roots -2, -0.37228, 5.3723.
These two distinct triples have the desired properties.
Addendum:
Comparing all coefficients of the expansion of (x-a)(x-b)...(x-z), which are known as the elementary symmetric polynomials (a+b+...z, ab+bc+...za, abc+bcd+...zab, ..., ab..z) is enough to prove equality of the roots, whatever the order. But I would not recommend this very costly method.

Algorithm for searching an array for 5 elements which sum to a value

[I asked lately a similar question, Search unsorted array for 3 elements which sum to a value
and got wonderful answers, thank you all! :)]
I need your help for solving the following problem:
I am looking for an algorithm, the time-complexity must be ϴ( n³ ).
The algorithm searches an unsorted array (of n integers) for 5 different integers
which sum to a given z.
E.g.: for the input: ({2,5,7,6,3,4,9,8,21,10} , 22)
the output should be true for we can sum up 2+7+6+3+4=22
(the sorting doesn't really matter. The array can be sorted first without affecting the complexity.
So you can look at the problem as if the array is already sorted.)
-No memory constraints-
-We only know that the array elements are n integers.-
Any help would be appriciated.
Algorithm:
1) Generate an array consisting of pairs of your initial integers and sort it. That step will take O(n^2 * log (n^2)) time.
2) Choose a value from your initial array. O(n) ways.
3) Now you have a very similar problem to the linked one. You have to choose two pairs such that their sum will be equal to z - chosen value. Thankfully, you have an array of all pairs, already sorted, of length O(n^2). Finding such pairs should be straightforward -- same thing you did in a 3 integer sum problem. You make two pointers and move both of them O(n^2) times in total.
O(n^3) total complexity.
You may get into some problems with finding pairs that consist of your chosen value. Skip every pair that consists of your chosen value (just move the pointer further when you reach such a pair like it never existed).
Let's say that you have two pairs, p1 and p2, such that sum(p1) + sum(p2) + chosen value = z. If all of the integers in p1 and p2 are different, you have the solution. If not, that's where it gets a little bit messy.
Let's fix p1 and check the next value after p2. It may have the same sum as p2 since two different pairs can have same sum. If it does, definitely you will not have the same collision with p1 as you had with p2, but you may get a collision with the other integer of p1. If so, check the second value after p2, if it also has the same sum -- it definitely won't have any collision with p1.
So assuming that there are at least 3 pairs with same sum as p1 or p2, you will always find a solution checking 3 values for fixed p1 or checking 3 values for fixed p2.
The only possibility left is that there are less than 3 pairs with same sum as p1 and there are less than 3 pairs with same sum as p2. You can choose them in up to 4 ways -- just check each possibility.
It is a bit unpleasant, but in constant amount of operations you are able to handle such problems. That means the total complexity is O(n^3).

Finding max of difference + element for an array

We have an array consisting of each entry as a tuple of two integers. Let the array be A = [(a1, b1), (a2, b2), .... , (an, bn)]. Now we have multiple queries where we are given an integer x, we need to find the maximum value of ai + |x - bi| for 1 <= i <= n.
I understand this can be easily achieved in O(n) time complexity for each query but I am looking for something faster than that, probably O(log n) for each query. I can preprocess the array in O(n) time, but the queries should be done faster than O(n).
Any kind of help would be appreciated.
It seems to be way too easy to over-think this.
For n = 1, the function is v-shaped with a minimum of a1 at b1, with slopes of -1 and 1, respectively - let's call these values ac and bc (for combined).
For an additional pair (ai, bi), one of the pairs may dominate the other (|bc - bi| ≤ |ac - ai), which may then be ignored.
Otherwise, the falling slope of the combination will be from the pair with the larger b, the rising slope from the other.
The minimum will be between the individual b, closer to the b of the pair with the larger a, the distance being half the difference between the (absolute value of the) "coordinate" differences, the minimum value that amount higher.
The main catch is that neither needs to be an integer - the only alternative being exactly in the middle between two integers.
(Ending up with the falling slope from max ai + bi, and the rising slope of max ai - bi.)

Find all possible distances from two arrays

Given two sorted array A and B length N. Each elements may contain natural number less than M. Determine all possible distances for all combinations elements A and B. In this case, if A[i] - B[j] < 0, then the distance is M + (A[i] - B[j]).
Example :
A = {0,2,3}
B = {1,2}
M = 5
Distances = {0,1,2,3,4}
Note: I know O(N^2) solution, but I need faster solution than O(N^2) and O(N x M).
Edit: Array A, B, and Distances contain distinct elements.
You can get a O(MlogM) complexity solution in the following way.
Prepare an array Ax of length M with Ax[i] = 1 if i belongs to A (and 0 otherwise)
Prepare an array Bx of length M with Bx[M-1-i] = 1 if i belongs to B (and 0 otherwise)
Use the Fast Fourier Transform to convolve these 2 sequences together
Inspect the output array, non-zero values correspond to possible distances
Note that the FFT is normally done with floating point numbers, so in step 4 you probably want to test if the output is greater than 0.5 to avoid potential rounding noise issues.
I possible done with optimized N*N.
If convert A to 0 and 1 array where 1 on positions which present in A (in range [0..M].
After convert this array into bitmasks, size of A array will be decreased into 64 times.
This will allow insert results by blocks of size 64.
Complexity still will be N*N but working time will be greatly decreased. As limitation mentioned by author 50000 for A and B sizes and M.
Expected operations count will be N*N/64 ~= 4*10^7. It will passed in 1 sec.
You can use bitvectors to accomplish this. Bitvector operations on large bitvectors is linear in the size of the bitvector, but is fast, easy to implement, and may work well given your 50k size limit.
Initialize two bitvectors of length M. Call these vectA and vectAnswer. Set the bits of vectA that correspond to the elements in A. Leave vectAnswer with all zeroes.
Define a method to rotate a bitvector by k elements (rotate down). I'll call this rotate(vect,k).
Then, for every element b of B, vectAnswer = vectAnswer | rotate(vectA,b).

Resources