Maximum subset sum with two arrays

I am not even sure if this can be done in polynomial time.
Problem:
Given two arrays of real numbers,
A = (a[1], a[2], ..., a[n]),
B = (b[1], b[2], ..., b[n]), (b[j] > 0, j = 1, 2, ..., n)
and a number k, find a subset A' of A, A' = (a[i(1)], a[i(2)], ..., a[i(k)]), which contains exactly k elements, such that (sum a[i(j)])/(sum b[i(j)]) is maximized, where j = 1, 2, ..., k.
For example, if k == 3, and {a[1], a[5], a[7]} is the result, then
(a[1] + a[5] + a[7])/(b[1] + b[5] + b[7])
should be larger than any other combination. Any clue?

Assuming that the entries of B are positive (it sounds as though this special case might be useful to you), there is an O(n^2 log n) algorithm.
Let's first solve the problem of deciding, for a particular t, whether there exists a solution such that
(sum a[i(j)])/(sum b[i(j)]) >= t.
Clearing the denominator, this condition is equivalent to
sum (a[i(j)] - t*b[i(j)]) >= 0.
All we have to do is choose the k largest values of a[i] - t*b[i] and check whether their sum is nonnegative.
Now, in order to solve the problem when t is unknown, we use a kinetic algorithm. Think of t as being a time variable; we are interested in the evolution of a one-dimensional physical system with n particles having initial positions A and velocities -B. Each particle crosses each other particle at most once, so the number of events is O(n^2). In between crossings, the optimum of sum (a[i(j)] - t*b[i(j)]) changes linearly, because the same subset of size k remains optimal. Sorting the crossing events and maintaining the optimal subset as t advances dominates the running time, which gives the O(n^2 log n) bound.
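For illustration, here is a sketch (in Java, my choice) of the decision procedure above, wrapped in a plain binary search on t. Note this is not the kinetic algorithm: it finds the optimal ratio only to fixed precision, and the ratio bounds and iteration count are assumptions.

    static boolean achievable(double[] a, double[] b, int k, double t) {
        // Decide: is there a k-subset with (sum a) / (sum b) >= t?
        // Equivalently: do the k largest values of a[i] - t*b[i] sum to >= 0?
        int n = a.length;
        double[] slack = new double[n];
        for (int i = 0; i < n; i++) slack[i] = a[i] - t * b[i];
        java.util.Arrays.sort(slack);                    // ascending
        double sum = 0;
        for (int i = n - k; i < n; i++) sum += slack[i]; // the k largest values
        return sum >= 0;
    }

    static double bestRatio(double[] a, double[] b, int k) {
        double num = 0, minB = Double.MAX_VALUE;
        for (double x : a) num += Math.abs(x);
        for (double x : b) minB = Math.min(minB, x);
        double lo = -num / minB, hi = num / minB; // crude bounds on the achievable ratio
        for (int it = 0; it < 100; it++) {        // bisection; 100 halvings is ample
            double mid = (lo + hi) / 2;
            if (achievable(a, b, k, mid)) lo = mid;
            else hi = mid;
        }
        return lo;
    }

Since b[i] > 0, achievability is monotone in t (any ratio below an achievable one is also achievable), which is what makes the bisection valid.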

If B can contain negative numbers, then this is NP-hard, because the following problem is NP-hard:
Given k and an array B, is there a subset of B of size exactly k which sums to zero?
The array A becomes immaterial in that case.
Of course, from your comment it seems that B must contain positive numbers.


Can I sort an array in O(n) if all values are positive and with digits <= k?

Suppose I have an array A[1 ... n], with no range given for its values apart from the fact that they are positive. If I know they have at most k digits, is it possible to sort the array in O(n)?
All the examples I have come across for O(n) sorting give an upper bound for the values in the array. If this is a duplicate, please let me know.
This depends on whether k is a constant or not.
If your numbers have k digits each, then you do have a bound on the numbers, since they can't be any bigger than 10^k - 1. You could therefore use radix sort to sort the integers. The runtime of radix sort is O((n + b) log_b U), where n is the number of numbers to sort, b is the base of your radix sort, and U is the maximum value that you're sorting. In your case, that works out to
O((n + b) log_b 10^k) = O(k(n + b)).
This is where the "it depends" comes in. If k is some fixed number that never changes - say, it's always 137 or something like that - then the above expression reduces to O(n + b), and picking b to be any constant (say, base-2 for your radix sort) gives a runtime of O(n). On the other hand, if k can vary (say, the numbers are allowed to be as big as you'd like them to be, and then after seeing the numbers you work out what k is), then the above expression can't be simplified beyond O(kn) because k is a parameter to the algorithm.
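For concreteness, here is a sketch (in Java, my choice) of a base-10 LSD radix sort making k passes. It assumes non-negative ints with at most k digits; switch div to long if k > 9.

    static void radixSortBase10(int[] a, int k) {
        int[] out = new int[a.length];
        int div = 1;                                  // 10^pass; use long if k > 9
        for (int pass = 0; pass < k; pass++) {
            int[] count = new int[10];
            for (int x : a) count[(x / div) % 10]++;  // histogram of the current digit
            for (int d = 1; d < 10; d++) count[d] += count[d - 1]; // prefix sums -> end positions
            for (int i = a.length - 1; i >= 0; i--)   // stable placement, right to left
                out[--count[(a[i] / div) % 10]] = a[i];
            System.arraycopy(out, 0, a, 0, a.length);
            div *= 10;
        }
    }

Each pass is a counting sort on one digit, so the whole thing is O(k(n + 10)) = O(kn), matching the analysis above with b = 10.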
Hope this helps!
If k <= n, you can, otherwise it's not guaranteed.
If I give you n = 4 and k = 5, where would you place an element 3?

Finding the smallest sum of the difference of A[i] and a constant

For an assignment I need to solve a mathematical problem. I narrowed it down to the following:
Let A[1, ... ,n] be an array of n integers.
Let y be an integer constant.
Now, I have to write an algorithm that finds the minimum of M(y) in O(n) time:
M(y) = Sum |A[i] - y|, for i = 1 to n. Note that I do not just take A[i] - y, but the absolute value |A[i] - y|.
For clarity, I also put this equation in Wolfram Alpha.
I have considered the least squares method, but this will not yield the minimum of M(y); it gives more of an average value of A, I think. Since I'm taking the absolute value of A[i] - y, there is also no way I can differentiate this function with respect to y. Also, I can't just come up with any algorithm, because I have to do it in O(n) time. I also believe there can be more than one correct answer for y in some cases; in that case, the value of y must be equal to one of the integer elements of A.
This has really been eating me for a whole week now and I still haven't figured it out. Can anyone please teach me the way to go or point me in the right direction? I'm stuck. Thank you so much for your help.
You want to pick a y for which M(y) = sum(abs(A[i] - y)) is minimal. Let's assume every A[i] is positive (this does not change the result, because the problem is invariant under translation).
Let's start with two simple observations. First, if you pick y such that y < min(A) or y > max(A), you end up with a greater value of M(y) than if you had picked y with min(A) <= y <= max(A). Also, there is a unique local minimum or range of minima of M (M(y) is convex).
So we can start by picking some y in the interval [min(A) .. max(A)] and try to move this value around so that we get a smaller M(y). To make things easier to understand, let's sort A and pick an i in [1 .. n] (so y = A[i]).
There are three cases to consider.
If A[i+1] > A[i], and either {n is odd and i < (n+1)/2} or {n is even and i < n/2}, then M(A[i+1]) < M(A[i]).
This is because, going from M(A[i]) to M(A[i+1]), the number of terms that decrease (namely n-i) is greater than the number of terms that increase (namely i), and every term increases or decreases by the same amount. In the case where n is odd, i < (n+1)/2 <=> 2*i < n+1 <=> 2*i < n, because 2*i is even while n is odd, so 2*i can never equal n.
In more formal terms, M(A[i]) = sum(A[i]-A[s]) + sum(A[g]-A[i]), where s and g represent indices such that A[s] < A[i] and A[g] > A[i]. So if A[i+1] > A[i], then M(A[i+1]) = sum(A[i]-A[s]) + i*(A[i+1]-A[i]) + sum(A[g]-A[i]) - (n-i)*(A[i+1]-A[i]) = M(A[i]) + (2*i-n)*(A[i+1]-A[i]). Since 2*i < n and A[i+1] > A[i], (2*i-n)*(A[i+1]-A[i]) < 0, so M(A[i+1]) < M(A[i]).
Similarly, if A[i-1] < A[i], and either {n is odd and i > (n+1)/2} or {n is even and i > (n/2)+1}, then M(A[i-1]) < M(A[i]).
Finally, if {n is odd and i = (n+1)/2} or {n is even and i = (n/2) or (n/2)+1}, then you have a minimum, because decrementing or incrementing i will eventually lead you to the first or second case, respectively. There are leftover possible values for i, but all of them lead to A[i] being a minimum too.
The median of A is exactly the value A[i] where i satisfies the last case. If the number of elements in A is odd, then you have exactly one such value, y = A[(n+1)/2] (though possibly multiple indices holding it); if it's even, then you have a range (which may contain just one integer) of such values, A[n/2] <= y <= A[n/2+1].
There is a standard C++ algorithm that can help you find the median in O(n) time: nth_element. If you are using another language, look up the median of medians algorithm (which Nico Schertler pointed out) or even introselect (which is what nth_element typically uses).
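If your language lacks an nth_element equivalent, a randomized quickselect gives the same expected O(n) behavior. A sketch in Java (names are mine; the worst case is O(n^2) unless you switch to median of medians or introselect):

    static int quickselect(int[] a, int k) {
        // Returns the k-th smallest element, k being a 0-based rank.
        // For the lower median use k = (a.length - 1) / 2.
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int pivotIdx = lo + java.util.concurrent.ThreadLocalRandom.current().nextInt(hi - lo + 1);
            int p = partition(a, lo, hi, pivotIdx);
            if (k == p) return a[k];
            if (k < p) hi = p - 1;
            else lo = p + 1;
        }
        return a[lo];
    }

    static int partition(int[] a, int lo, int hi, int pivotIdx) {
        // Lomuto partition: elements < pivot end up left of the returned index.
        int pivot = a[pivotIdx];
        swap(a, pivotIdx, hi);
        int store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot) swap(a, i, store++);
        swap(a, store, hi);
        return store;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }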

Finding the probability that two items are compared. (hints please)

I'm attempting to solve the following problem (from Prof. Jeff Erickson's notes). The algorithm below takes an unsorted array A and returns the k-th smallest element of the array; Partition does what its name implies via the standard quicksort method, given the pivot returned by Random (which is assumed to return a uniformly random integer between 1 and n in linear time), and returns the new index of the pivot. We are to find the exact probability that this algorithm compares the i-th smallest and j-th smallest elements of the input array.
    QuickSelect(A[1..n], k):
        r <-- Partition(A[1..n], Random(n))
        if k < r:
            return QuickSelect(A[1..r-1], k)
        else if k > r:
            return QuickSelect(A[r+1..n], k - r)
        else:
            return A[k]
Now, I can see that the probability of the first if statement being true is (n-k)/n, the probability of the second block being true is (k-1)/n, and the probability of executing the else statement is 1/n. I also know that (assuming i < j) the probability of i < r < j is (j-i-1)/n which guarantees that the two elements are never compared. On the other hand, if i==r or j==r, then i and j are guaranteed to be compared. The part that really trips me up is what happens if r < i or j < r, because whether or not i and j are compared depends on the value of k (whether or not we are able to recursively call QuickSelect).
Any hints and/or suggestions would be greatly appreciated. This is for homework, so I would rather not have full solutions given to me so that I may actually learn a bit. Thanks in advance!
As has already been mentioned, a Monte Carlo simulation is a simple way to get a fast (in the sense of implementation effort) approximation.
There is also a way to compute the exact probability using dynamic programming.
Here we will assume that all elements in the array are distinct and that A[i] < A[j].
Let P(i, j, k, n) denote the probability of comparing the i-th and j-th smallest elements while selecting the k-th smallest in an n-element array.
The pivot index r is equally likely to be any of 1..n, each with probability 1/n. Also note that these events are disjoint and together they cover the whole space of events.
Let us look carefully at each possible value of r.
If r = 1..i-1 then i and j fall into the same part and the probability of their comparison is P(i-r, j-r, k-r, n-r) if k > r and 0 otherwise.
If r = i the probability is 1.
If r = i+1..j-1 the probability is 0.
If r = j the probability is 1.
If r = j+1..n the probability is P(i, j, k, r-1) if k < r and 0 otherwise.
So the full recurrence is P(i, j, k, n) = 1/n * (2 + Sum for r = 1..min(i, k)-1 of P(i-r, j-r, k-r, n-r) + Sum for r = max(j, k)+1..n of P(i, j, k, r-1)).
Finally, for n = 2 (the smallest n for which i and j can differ), the only possible values are P(1, 2, 1, 2) and P(1, 2, 2, 2), and both equal 1 (no matter what r is, there will be a comparison).
Time complexity is O(n^5) and space complexity is O(n^4). It is also possible to optimize the calculations and bring the time complexity down to O(n^4). Since we only consider A[i] < A[j] and i, j, k <= n, the multiplicative constant is 1/8, so it would be possible to compute any value for n up to 100 in a couple of minutes using the straightforward algorithm described, or up to 300 with the optimized one.
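To make the recurrence concrete, here is a memoized sketch in Java (my own translation; indices are 1-based, and it assumes 1 <= i < j <= n and 1 <= k <= n). The flat 4D memo is memory-hungry for large n; exploiting the symmetries mentioned above (the 1/8 constant) would shrink it.

    static double[][][][] memo; // memo[i][j][k][n]; -1 marks "not computed yet"

    static double comparisonProbability(int i, int j, int k, int n) {
        memo = new double[n + 1][n + 1][n + 1][n + 1];
        for (double[][][] a : memo)
            for (double[][] b : a)
                for (double[] c : b)
                    java.util.Arrays.fill(c, -1.0);
        return p(i, j, k, n);
    }

    static double p(int i, int j, int k, int n) {
        if (memo[i][j][k][n] >= 0) return memo[i][j][k][n];
        double total = 2.0;                           // r = i and r = j each contribute probability 1
        for (int r = 1; r < Math.min(i, k); r++)      // r < i and r < k: i, j, k all fall right of the pivot
            total += p(i - r, j - r, k - r, n - r);
        for (int r = Math.max(j, k) + 1; r <= n; r++) // r > j and r > k: i, j, k all fall left of the pivot
            total += p(i, j, k, r - 1);
        return memo[i][j][k][n] = total / n;          // for n = 2 both loops are empty: (2 + 0 + 0)/2 = 1
    }

For example, comparisonProbability(1, 2, 1, 2) returns 1.0, matching the base case above.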
Note that two positions are only compared if one of them is the pivot. So the best way to look at this is to look at the sequence of chosen pivots.
Suppose the k-th smallest element is between i and j. Then i and j are not compared if and only if an element between them is selected as a pivot before i or j are. What is the probability that this happens?
Now suppose the k-th smallest element is after j. i and j are not compared if and only if an element between i+1 and k (excluding j) is selected as a pivot before i or j are. What is the probability that this happens?

Algorithm help: how to divide array into N segments with least possible largest segment (balanced segmenting)

I came across this problem on one of the russian programming forums, but haven't come up with an elegant solution.
Problem:
You have an array with N positive integers, and you need to divide it into M contiguous segments so that the total of the largest segment is the smallest possible value. By a segment's total, I mean the sum of all its integers. In other words, I want a well-balanced segmentation of the array, where no single segment is too large.
Example:
Array: [4, 7, 12, 5, 3, 16]
M = 3, meaning that I need to divide my array into 3 subarrays.
Solution would be: [4,7] [12, 5] [3, 16] so that the largest segment is [3, 16] = 19 and no other segmentation variant can produce the largest segment with smaller total.
Another example:
Array [3, 13, 5, 7, 18, 8, 20, 1]
M = 4
Solution: [3, 13, 5] [7, 18] [8] [20, 1], the "fattest" segment is [7, 18] = 25 (correct me if I am wrong, I made up this example)
I have a feeling that this is some classic CS/math problem, probably with some famous person's name associated with it, like Dijkstra's problem.
- Is there any known solution for it?
- If not, can you come up with some solution other than brute force, which is, as far as I understand its time complexity, exponential (N^M, to be more specific)?
Thanks in advance, stackoverflowers.
Let's do a binary search over the answer.
For a fixed answer X, it is easy to check whether it is feasible or not: use a greedy algorithm (always take the longest possible segment whose sum is <= X) and compare the resulting number of segments to M.
The total time complexity is O(N * log(sum of all elements)).
Here is some Java code, with the greedy check filled in as described above:

    static boolean isFeasible(int[] array, long candidate, int m) {
        // Greedy: extend the current segment while its sum stays <= candidate.
        // This yields the minimum number of segments (minSegments) whose sums
        // are all <= candidate.
        int minSegments = 1;
        long sum = 0;
        for (int x : array) {
            if (x > candidate) return false; // a single element already exceeds the bound
            if (sum + x > candidate) {       // current segment is full; start a new one
                minSegments++;
                sum = x;
            } else {
                sum += x;
            }
        }
        return minSegments <= m;
    }

    static long getMinimumSum(int[] array, int m) {
        long low = 0;  // too small (infeasible)
        long high = 0; // sum of all elements: definitely big enough
        for (int x : array) high += x;
        while (high - low > 1) {
            long mid = low + (high - low) / 2;
            if (isFeasible(array, mid, m))
                high = mid;
            else
                low = mid;
        }
        return high;
    }
I like ILoveCoding's approach. Here's another way that takes O(MN^2) time, which will be faster if N and M are small but the numbers in the array are very large (specifically, if log(sum) >> MN, which is possible but admittedly doesn't sound very realistic). It uses dynamic programming.
Let's consider partitioning just the subarray consisting of the first i <= N entries into j <= M segments. Let f(i, j) be the weight of the largest segment in the best solution for this subproblem -- i.e. the weight of the largest segment in that j-partition of the first i numbers whose largest segment is smallest among all such partitions. We want to compute f(N, M), as well as a (there may be more than one) partition that corresponds to it.
It's easy to compute f(i, 1) -- that's just the sum of the first i elements:
f(i, 1) = x[1] + ... + x[i]
To compute f(i, j) for j >= 2, observe that element i must be the final element of some segment that starts at some position 1 <= k <= i, and which is preceded by j-1 segments -- and in any optimal solution for parameters (i, j), those j-1 preceding segments must themselves be optimal for parameters (k-1, j-1). So if we consider every possible start position k for this final segment and take the best, we will calculate the best j-partition of the first i elements:
[EDIT 3/2/2015: We need to take the max of the new segment and the largest remaining segment, instead of adding them!]
f(i, j >= 2) = minimum of (max(f(k-1, j-1), x[k] + ... + x[i])) over all 1 <= k <= i
If we try k values in decreasing order, then we can easily build up the sum in constant time per k value, so calculating a single f(i, j) value takes O(N) time. We have MN of these values to compute, so the total time needed is O(MN^2).
One more boundary condition is needed to forbid trying to partition into more segments than there are elements:
f(i, j > i) = infinity
Once we have calculated f(N, M), we could extract a corresponding partition by tracing back through the DP matrix in the usual way -- but in this case, it's probably easier just to build the partition using ILoveCoding's greedy algorithm. Either way takes O(N) time.
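For completeness, here is a sketch of the DP in Java (my translation; f[i][j] is as defined above, with x 1-indexed in the math and 0-indexed in the code):

    static long minLargestSegment(int[] x, int m) {
        int n = x.length;
        long INF = Long.MAX_VALUE / 2;
        long[][] f = new long[n + 1][m + 1];
        for (long[] row : f) java.util.Arrays.fill(row, INF); // covers f(i, j > i) = infinity
        f[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= Math.min(i, m); j++) {
                long segSum = 0;
                for (int k = i; k >= j; k--) {        // final segment covers elements k..i (1-based)
                    segSum += x[k - 1];               // built up in O(1) per k, as described
                    f[i][j] = Math.min(f[i][j], Math.max(f[k - 1][j - 1], segSum));
                }
            }
        }
        return f[n][m];
    }

On the first example, minLargestSegment(new int[]{4, 7, 12, 5, 3, 16}, 3) returns 19.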

fast algorithm of finding sums in array

I am looking for a fast algorithm:
I have an int array of size n, and the goal is to find all patterns in the array where
x1, x2, x3 are different elements of the array such that x1 + x2 = x3.
For example, for an int array of size 3, [1, 2, 3], there is only one possibility: 1 + 2 = 3 (1+2 and 2+1 count as the same).
I am thinking about using pairs and hash maps to make the algorithm fast (the fastest one I have now is still O(n^2)).
Please share your ideas for this problem, thank you.
Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you're not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).
This is basically the 3SUM problem. Without potentially unboundedly large elements, the best known algorithms are approximately O(n^2), but we've only proved that it can't be faster than O(n lg n) for most models of computation.
If the integer elements lie in the range [u, v], you can do a slightly different version of this in O(n + (v-u) lg (v-u)) with an FFT. I'm going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.
The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence a, b, c with c - b = b - a, or equivalently, a + c = 2b.
Unfortunately, the last step of the transformation back isn't as fast as I'd like, but I'll talk about that when we get there.
Let's call your original array X, which contains integers x_1, ..., x_n. We want to find indices i, j, k such that x_i + x_j = x_k.
Find the minimum u and maximum v of X in O(n) time. Let u' be min(u, u*2) and v' be max(v, v*2).
Construct a binary array (bitstring) Z of length v' - u' + 1; Z[i] will be true if either X or its double [x_1*2, ..., x_n*2] contains u' + i. This is O(n) to initialize; just walk over each element of X and set the two corresponding elements of Z.
As we're building this array, we can save the indices of any duplicates we find into an auxiliary list Y. Once Z is complete, we just check for 2 * x_i for each x_i in Y. If any are present, we're done; otherwise the duplicates are irrelevant, and we can forget about Y. (The only situation slightly more complicated is if 0 is repeated; then we need three distinct copies of it to get a solution.)
Now, a solution to your problem, i.e. x_i + x_j = x_k, will appear in Z as three evenly-spaced ones, since some simple algebraic manipulations give us 2*x_j - x_k = x_k - 2*x_i. Note that the elements on the ends are our special doubled entries (from 2X) and the one in the middle is a regular entry (from X).
Consider Z as a representation of a polynomial p, where the coefficient for the term of degree i is Z[i]. If X is [1, 2, 3, 5], then Z is 1111110001 (because we have 1, 2, 3, 4, 5, 6, and 10); p is then 1 + x + x^2 + x^3 + x^4 + x^5 + x^9.
Now, remember from high school algebra that the coefficient of x^c in the product of two polynomials is the sum, over all a, b with a + b = c, of the first polynomial's coefficient for x^a times the second's coefficient for x^b. So, if we consider q = p^2, the coefficient of x^(2j) (for a j with Z[j] = 1) will be the sum over all i of Z[i] * Z[2*j - i]. But since Z is binary, that's exactly the number of triplets i,j,k which are evenly-spaced ones in Z. Note that (j, j, j) is always such a triplet, so we only care about ones with values > 1.
We can then use a Fast Fourier Transform to find p^2 in O(|Z| log |Z|) time, where |Z| is v' - u' + 1. We get out another array of coefficients; call it W.
Loop over each x_k in X. (Recall that our desired evenly-spaced ones are all centered on an element of X, not 2*X.) If the corresponding W for twice this element, i.e. W[2*(x_k - u')], is 1, we know it's not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)
Otherwise, it might be the center of a progression that we want (so we need to find i and j). But, unfortunately, it might also be the center of a progression that doesn't have our desired form. So we need to check. Loop over the other elements x_i of X, and check whether there's a triple 2*x_i, x_k, 2*x_j for some j (by checking Z[2*(x_k - x_i) - u']). If so, we have an answer; if we make it through all of X without a hit, then the FFT found only spurious answers, and we have to check another element of W.
This last step is therefore O(n * (1 + the number of x_k with W[2*(x_k - u')] > 1 that aren't actually solutions)), which may well be O(n^2), and that is obviously not okay. There should be a way to avoid generating these spurious answers in the output W; if we knew that any appropriate W coefficient definitely had an answer, this last step would be O(n) and all would be well.
I think it's possible to use a somewhat different polynomial to do this, but I haven't gotten it to actually work. I'll think about it some more....
Partially based on this answer.
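To make the construction above concrete, here is a sketch in Java of building Z and squaring p (my translation). A naive O(|Z|^2) convolution stands in for the FFT below, so the asymptotics don't match; swapping in an FFT-based polynomial multiply restores the O(|Z| log |Z|) bound.

    static long[] squaredCoefficients(int[] x) {
        int u = java.util.Arrays.stream(x).min().getAsInt();
        int v = java.util.Arrays.stream(x).max().getAsInt();
        int uP = Math.min(u, 2 * u), vP = Math.max(v, 2 * v); // u' and v'
        int len = vP - uP + 1;
        long[] z = new long[len];
        for (int xi : x) {                // mark X and its double 2X
            z[xi - uP] = 1;
            z[2 * xi - uP] = 1;
        }
        long[] w = new long[2 * len - 1]; // coefficients of p^2
        for (int i = 0; i < len; i++)     // naive convolution; replace with FFT
            if (z[i] == 1)
                for (int j = 0; j < len; j++)
                    if (z[j] == 1) w[i + j]++;
        return w; // for each x_k, inspect w[2 * (x_k - uP)] as described above
    }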
It has to be at least O(n^2), as there are n(n-1)/2 different pair sums to check against the other members. You have to compute all of them, because any pair's sum may equal any other member (start with one example and permute the elements to convince yourself that all must be checked). Or look at the Fibonacci numbers for something concrete.
So computing all of those and looking up the members in a hash table gives amortised O(n^2). Or use an ordered tree if you need the best worst case.
You essentially need to find all the different sums of value pairs, so I don't think you're going to do any better than O(n^2). But you can optimize by sorting the list and de-duplicating values, then only pairing a value with anything equal or greater, and stopping when the sum exceeds the maximum value in the list.
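Expanding on the hashing idea from both answers, here is a minimal sketch in Java of the O(n^2)-pairs approach (the output itself can be larger, as discussed in the edit above); names are mine:

    import java.util.*;

    static List<int[]> findSumTriples(int[] a) {
        // Map each value to the list of indices holding it.
        Map<Integer, List<Integer>> byValue = new HashMap<>();
        for (int i = 0; i < a.length; i++)
            byValue.computeIfAbsent(a[i], v -> new ArrayList<>()).add(i);
        List<int[]> triples = new ArrayList<>();
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)       // unordered pairs: 1+2 == 2+1
                for (int k : byValue.getOrDefault(a[i] + a[j], Collections.emptyList()))
                    if (k != i && k != j)                // x3 must be a different element
                        triples.add(new int[]{i, j, k});
        return triples;
    }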
