Consider an array filled with elements a0, a1, a2, ..., a(n-1).
Assume the array is already sorted; that makes the problem easier to describe.
Now the range of the array is defined as the biggest element - smallest element.
Say this range is some value x.
Now the problem I have is that I want to change the elements in such a way that the range becomes less than or equal to some target value y.
I also want the total cost of the changes to be minimal. Consider an element a(i) with value z: if I change it by an amount r, this costs r^2.
So, what is an efficient algorithm that updates the array so the range is less than or equal to the target y, while minimizing the total cost?
An example:
Array = [ 0, 3, 19, 20, 23 ] Target range is 17.
I would make the new array [ 3, 3, 19, 20, 20 ] . The cost is (3)^2 + (3)^2 = 18.
This is the minimal cost.
If you add to or remove from a certain element a(i), you must apply that whole quantity q at once. You cannot remove 1 unit from an element 3 separate times; you must remove 3 units in a single step.
I think you can build two heaps from the array - one min-heap, one max-heap. Take the top element of each heap, peek at the element right under it, and compare the two differences. Move in on the side with the bigger difference; if that difference is more than you still need, only move by the required amount, and add the cost.
If you had to take the whole difference and still didn't reach the goal, repeat this step. However, when you pull from the same heap again, remember to add the cost not only for the element you are taking out of the heap in that step, but also for all elements that were taken out of that heap before.
This yields an O(N log N) algorithm; I'm not sure whether it can be done faster.
Example:
Array [2,5,10,12] , I want difference 4.
The first heap has 2 on top, the second one 12. The 2 is 3 away from 5 and the 12 is 2 away from 10, so I take the min-heap and the 2 has to be changed by 3. So now we have a new situation:
[5, 10, 12]
The 12 is 2 far from 10 and we take it, subtract 2 and get new situation:
[5,10]
Now we can choose either heap; both differences are the same (the same numbers :-) ). We only need to change by 1 more, so we subtract 1 from the 10 and reach the target range. And because we pulled the 10 down to 9, the number that was originally 12 (already pulled down to 10) has to be moved once more, to 9. So the result is:
[2 - changed to 5, 5 - unchanged, 10 - changed to 9, 12 - changed to 9].
Here is a linear-time algorithm that minimizes the piecewise quadratic objective function. Probably it can be simplified.
Let the range be [x, x + y], where x is a variable. For different choices of x, there are at most 2n + 1 possibilities for which points lie in the range, arising from 2n critical values a0 - y, a1 - y, ..., a(n-1) - y, a0, a1, ..., a(n-1). One linear-time merge yields the critical values in sorted order. For each of the 2n - 1 intervals [w, z] between critical values where the range contains at least one point, we can construct and minimize a quadratic function consisting of a sum where every point aj less than w yields a term (x - aj)^2 and every point aj greater than z + y yields a term (x + y - aj)^2. The global minimum lies at the mean of aj (for terms of the first type) or aj - y (for terms of the second type); the endpoints of the interval must be checked as well. Naively, this gives a quadratic-time algorithm.
To get down to linear time, it suffices to update the sum preceding the mean computation incrementally. Each of the critical values has an associated event indicating whether the point responsible for it is entering or leaving the interval, meaning that that point's term should enter or leave the sum.
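If strictly linear time is not required, there is a simpler route: the total cost as a function of the window start x is convex, so a plain ternary search over x also works. Below is a minimal sketch of that simpler approach (not the linear-time algorithm above; the function names are just illustrative):
def window_cost(a, x, y):
    # cost of clamping every element of a into the window [x, x + y]
    cost = 0.0
    for v in a:
        if v < x:
            cost += (x - v) ** 2
        elif v > x + y:
            cost += (v - (x + y)) ** 2
    return cost

def min_cost_for_range(a, y, iters=200):
    # ternary search over the window start x; valid because the cost is convex in x
    lo, hi = min(a), max(a)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if window_cost(a, m1, y) < window_cost(a, m2, y):
            hi = m2
        else:
            lo = m1
    return window_cost(a, lo, y)

print(round(min_cost_for_range([0, 3, 19, 20, 23], 17)))  # 18, matching the example above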
Let's say you have k arrays of size N, each containing unique values from 1 to N.
How would you find the two numbers that are on average the furthest away from each other?
For example, given the arrays:
[1,4,2,3]
[4,2,3,1]
[2,3,4,1]
Then the answer would be the numbers 1 and 2, because they are 2 positions apart in the first two arrays and 3 positions apart in the last one.
I am aware of an O(kN^2) solution (by measuring the distance between each pair of numbers for each of the k arrays), but is there a better solution?
I want to implement such an algorithm in C++, but any description of a solution would be helpful.
After a linear-time transformation indexing the numbers, this problem boils down to computing the diameter of a set of points with respect to L1 distance. Unfortunately this problem is subject to the curse of dimensionality.
Given the three input arrays (positions 1 through 4 in each):
1: [1,4,2,3]
2: [4,2,3,1]
3: [2,3,4,1]
we compute, for each value, its position in arrays 1 through 3:
1: [1,4,4]
2: [3,2,1]
3: [4,3,2]
4: [2,1,3]
and then the L1 distance between 1 and 2 is |1-3| + |4-2| + |4-1| = 7, which is k = 3 times their average distance (in problem terms).
That being said, you can apply an approximate nearest-neighbor algorithm, using the position vectors above as the database and, as queries, the image of each database point under the coordinate-wise map v -> N+1-v.
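For concreteness, here is a small sketch (the function names are mine) of the position-vector transformation followed by a brute-force L1-diameter check; an exact or approximate nearest-neighbor structure would replace the final pairwise loop:
from itertools import combinations

def position_vectors(arrays):
    # pos[v][j] = index of value v in arrays[j]; built in O(kN)
    N, k = len(arrays[0]), len(arrays)
    pos = {v: [0] * k for v in range(1, N + 1)}
    for j, arr in enumerate(arrays):
        for i, v in enumerate(arr):
            pos[v][j] = i
    return pos

def furthest_pair(arrays):
    # brute-force O(kN^2) diameter of the position vectors under L1 distance
    pos = position_vectors(arrays)
    def l1(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))
    return max(combinations(pos, 2), key=lambda vw: l1(pos[vw[0]], pos[vw[1]]))

print(furthest_pair([[1, 4, 2, 3], [4, 2, 3, 1], [2, 3, 4, 1]]))  # (1, 2)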
I have a suggestion for the best case; you can follow a heuristic approach.
For instance, you know that if N=4, then N-1=3 is the maximum possible distance and 1 is the minimum. The mean distance within one array is 10/6 ≈ 1.67 (the sum of distances over all pairs of positions, divided by the number of pairs).
Then, if two numbers sit at the edges in k/2 of the arrays (as often happens), they are already above average (>= 2 of distance) even if they are only 1 apart in the other k/2 arrays. That could give a best-case answer in O(2k) = O(k).
Given a sorted sequence of n elements, find the maximum over all length-k subsequences of the minimum difference between any pair of elements in the subsequence.
Here 1<=n<=10^5
and 2<=k<=n
For example: [2, 3, 5, 9] and k = 3
there are 4 subsequences:
[2, 3, 5] - min diff of all pairs = 1
[2, 3, 9] - min diff of all pairs = 1
[3, 5, 9] - min diff of all pairs = 2
[2, 5, 9] - min diff of all pairs = 3
So answer is max of all min diffs = 3
The naive way is to enumerate all length-k subsequences, find the minimum pairwise difference in each, and take the maximum of those, but that will time out given the constraints.
Apart from that, my idea was to pick the subsequence whose elements are spread out as evenly as possible so that the minimum gap becomes as large as possible.
Can someone give an optimal and better solution?
Suppose that your sequence of integers is a[i]. Then my solution will find the answer in time O(n log((a[n-1]-a[0])/n)). If your sequence is floating point numbers it will likely run in similar time, but could theoretically be as bad as O(n^3).
The key observation is this. It is easy to construct the most compact sequence starting at the first element whose minimum gap is at least m. Just take the first element, and then take each subsequent one when it is at least m bigger than the last one that you took. So we can write a function that constructs this sequence and tracks 3 numbers:
How many elements we got
The size of the smallest gap that we found.
The next smallest m that would result in a more compact sequence. That is, the largest gap to an element that we didn't include.
In the case of your sequence if we did this with a gap of 2 we'd find that we took 3 elements, the smallest gap is 3, and we'd get a different sequence if we had looked for a gap of 1.
This is enough information to construct a binary search for the desired gap length. With the key logic looking like this:
lower_bound = 0
upper_bound = (a[n-1] - a[0])/(k-1)
while lower_bound < upper_bound:
    # Whether int or float, we want float to land "between" guesses
    guess = (lower_bound + upper_bound + 0.0) / 2
    (size, gap_found, gap_different) = min_gap_stats(a, guess)
    if size < k:
        # Our guess was too large - we must pick a more compact sequence
        upper_bound = gap_different
    else:
        # We can get this big a gap, but maybe we can get bigger?
        lower_bound = gap_found
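The helper min_gap_stats is not spelled out above; a minimal sketch of it, tracking exactly the three numbers listed earlier, could look like this:
def min_gap_stats(a, m):
    # Greedily keep elements of the sorted list a whose gap to the last kept
    # element is at least m, and report:
    #   size          - how many elements were kept
    #   gap_found     - the smallest gap between consecutive kept elements
    #   gap_different - the largest gap to a skipped element, i.e. the biggest
    #                   threshold that would have produced a different sequence
    last, size = a[0], 1
    gap_found = float("inf")
    gap_different = 0
    for x in a[1:]:
        gap = x - last
        if gap >= m:
            size += 1
            gap_found = min(gap_found, gap)
            last = x
        else:
            gap_different = max(gap_different, gap)
    return size, gap_found, gap_different

# min_gap_stats([2, 3, 5, 9], 2) == (3, 3, 1), matching the walkthrough above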
If we ran this for your sequence we'd first set a lower_bound of 0 and an upper_bound of 7/2 = 3 (thanks to integer division). And we'd immediately find the answer.
If you had a sequence of floats with the same values it would take one more pass. The upper bound would start at 3.5; the first guess of 1.75 already finds our sequence of 3 with smallest gap 3, raising the lower bound to 3; the next guess of 3.25 yields a sequence of only 2 with a different decision at 3, which pulls the upper bound down to 3 and ends the search.
The binary search will usually make this take a logarithmic number of passes.
However each time we set either the upper or lower bound to the size of an actual pairwise gap. Since there are only O(n^2) gaps, we are guaranteed to need no more than that many passes.
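For completeness, here is the same key logic wrapped into a function (using the min_gap_stats sketch from above) so it can be run directly:
def max_min_gap(a, k):
    lower_bound, upper_bound = 0, (a[-1] - a[0]) / (k - 1)
    while lower_bound < upper_bound:
        guess = (lower_bound + upper_bound) / 2.0
        size, gap_found, gap_different = min_gap_stats(a, guess)
        if size < k:
            upper_bound = gap_different
        else:
            lower_bound = gap_found
    return lower_bound

print(max_min_gap([2, 3, 5, 9], 3))  # 3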
What I am trying to convey with the title is the following exercise:
One is given a list of N numbers, both positive and negative, where 0 < N < 100 000. From this list one needs to find the maximum subarray sum that does not exceed a certain threshold value (x). The exercise is meant to be solved in C++, but that doesn't really matter. The naive O(n^2) approach isn't fast enough for the given time constraints, and I couldn't think of a simple trick like the one for the ordinary maximum subarray sum.
For example:
The list: 1 -4 5 6 -3 -2 14
If the threshold is 4, the best solution would be {-4 5 6 -3}
If the threshold is 9, the best solution would be {-3 -2 14}
If the threshold is 100, the best solution would be {5 6 -3 -2 14}
If the threshold is 7, the best solution would be {-4 5 6}
If the threshold is 2, the best solution would be {1 -4 5}
For those interested in how I approached the problem:
My O(n^2) solution just looped over every possible subarray sum. It did not recompute each sum from scratch; it extended the running sum with each new number it came across.
Compute partial sums: P(i) = sum(a[0]...a[i])
For a given position pair x, y (x <= y) the sum is P(y) - P(x-1).
So if we fix y, we are looking for the smallest partial sum that is greater than or equal to P(y) - target; subtracting it from P(y) gives the largest subarray sum ending at y that stays within the target.
So iterate over the array, inserting each partial sum into a balanced tree (like set in most languages). For each y you can then look up the needed value (in C++, std::set::lower_bound returns the smallest element >= a given key, which is exactly what you need) and form the candidate sum. Keep doing this and track the largest sum that satisfies the condition.
Lookup and insertion are O(log N), so overall the solution is O(N log N).
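A small Python sketch of the same idea (using bisect on a plain list, which makes insertion O(N); a balanced tree or a sorted-list container would restore the stated O(N log N) bound):
import bisect

def max_subarray_sum_at_most(a, t):
    best = None
    prefixes = [0]            # partial sums seen so far, kept sorted; P(-1) = 0
    running = 0
    for v in a:
        running += v          # running == P(y)
        # smallest partial sum >= P(y) - t gives the largest sum <= t ending at y
        i = bisect.bisect_left(prefixes, running - t)
        if i < len(prefixes):
            cand = running - prefixes[i]
            if best is None or cand > best:
                best = cand
        bisect.insort(prefixes, running)
    return best

print(max_subarray_sum_at_most([1, -4, 5, 6, -3, -2, 14], 9))  # 9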
Problem from:
https://www.hackerrank.com/contests/epiccode/challenges/white-falcon-and-sequence.
Visit link for references.
I have a sequence A of integers (each between -10^6 and 10^6). I need to choose two contiguous, disjoint subsequences of A, let's say x and y, of the same size n.
After that I calculate the sum given by ∑x(i)y(n−i+1) (1-indexed).
And I have to choose x and y such that sum is maximised.
Eg:
Input:
12
1 7 4 0 9 4 0 1 8 8 2 4
Output: 120
Where x = {4,0,9,4}
y = {8,8,2,4}
∑x(i)y(n−i+1)=4×4+0×2+9×8+4×8=120
Now, the approach that I was thinking of is something along the lines of O(n^2), as follows:
Initialise two variables l = 0 and r = N-1. Here, N is the size of the array.
Now, for l=0, I calculate the sum while (l<r); this basically covers the subsequences that start from the 0th position of the array. Then I increment l and decrement r to cover subsequences that start one position further right and, on the right-hand side, end one position further left.
Is there any better approach I can use? Anything more efficient? I thought of sorting, but we cannot sort the numbers since that would change their order.
To answer the question we first define S(i, j) to be the max sum of multiplying the paired sub-sequence items within the sub-array A[i...j], when sub-sequence x starts at position i and sub-sequence y ends at position j.
For example, if A=[1 7 4 0 9 4 0 1 8 8 2 4], then S(1, 2)=1*7=7 and S(2, 5)=7*9+4*0=63.
The recursive rule to compute S is: S(i, j)=max(0, S(i+1, j-1)+A[i]*A[j]), and the end condition is S(i, j)=0 iff i>=j.
The requested final answer is simply the maximum value of S(i, j) over all combinations i=1..N, j=1..N, since one of the S(i, j) values corresponds to the optimal x, y pair and thus equals the maximum for the whole array. Computing all S(i, j) values takes O(N^2) with dynamic programming: computing one S(i, j) may recursively touch up to N other S(i', j') values, but ultimately each combination is computed only once.
def max_sum(l):
    # m[i][j] memoizes S(i, j): the best sum when x starts at i and y ends at j
    def _max_sub_sum(i, j):
        if m[i][j] is None:
            v = 0
            if i < j:
                v = max(0, _max_sub_sum(i + 1, j - 1) + l[i] * l[j])
            m[i][j] = v
        return m[i][j]
    n = len(l)
    m = [[None for i in range(n)] for j in range(n)]
    v = 0
    for i in range(n):
        for j in range(i, n):
            v = max(v, _max_sub_sum(i, j))
    return v
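Running this on the example sequence reproduces the expected output:
print(max_sum([1, 7, 4, 0, 9, 4, 0, 1, 8, 8, 2, 4]))  # 120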
WARNING:
This method assumes the numbers are non-negative, so it does not answer the poster's actual problem now that it has been clarified that negative input values are allowed.
Trick 1
Assuming the numbers are always non-negative, it is always best to make the sequences as wide as possible given the location where they meet.
Trick 2
We can change the sum into a standard convolution by summing over all values of i. This produces twice the desired result (as we get both the product of x with y, and y with x), but we can divide by 2 at the end to get the original answer.
Trick 3
You are now attempting to find the maximum of a convolution of a signal with itself. There is a standard method for computing such a convolution quickly: the fast Fourier transform. Some libraries have this built in, e.g. SciPy has fftconvolve.
Python code
Note that you don't allow the central value to be reused (e.g. for a sequence 1,3,2 we can't make x = 1,3 and y = 3,1), so we need to examine only alternate values of the convolved output.
We can now compute the answer in Python via:
import scipy.signal
A = [1, 7, 4, 0, 9, 4, 0, 1, 8, 8, 2, 4]
print(max(scipy.signal.fftconvolve(A, A)[1::2]) / 2)
Ideally I'm looking for a C# solution, but any help on the algorithm will do.
I have a 2-dimension array (x,y). The max columns (max x) varies between 2 and 10 but can be determined before the array is actually populated. Max rows (y) is fixed at 5, but each column can have a varying number of values, something like:
1 2 3 4 5 6 7...10
A 1 1 7 9 1 1
B 2 2 5 2 2
C 3 3
D 4
E 5
I need to come up with all possible row-wise sums (one value from each column) for the purpose of looking for a specific total. That is, a row-wise total could be the cells A1 + B2 + A3 + B5 + D6 + A7 (any combination of one value from each column).
This process will be repeated several hundred times with different cell values each time, so I'm looking for a somewhat elegant solution (better than what I've been able to come with). Thanks for your help.
The Problem Size
Let's first consider the worst case:
You have 10 columns and 5 (full) rows per column. It should be clear that you will be able to get (with the appropriate number population for each place) up to 5^10 ≅ 10^7 different results (solution space).
For example, the following matrix will give you the worst case for 3 columns:
| 1 10 100 |
| 2 20 200 |
| 3 30 300 |
| 4 40 400 |
| 5 50 500 |
resulting in 5^3 = 125 different results. Each result is of the form {a1 a2 a3} with ai ∈ {1, ..., 5}.
It's quite easy to show that such a matrix will always exist for any number n of columns.
Now, to get each numerical result, you will need to do n-1 sums, adding up to a problem size of O(n 5^n). So, that's the worst case and I think nothing can be done about it, because to know the possible results you NEED to effectively perform the sums.
More benign incarnations:
The problem complexity may be reduced in two ways:
Fewer numbers (i.e. not all columns are full)
Repeated results (i.e. several partial sums give the same value, so you can join them into one thread). Much more on this below.
Let's see a simplified example of the latter:
| 7 6 100 |
| 3 4 200 |
| 1 2 200 |
at first sight you will need to do 2·3^3 sums. But that's not really the case: as you add up the first two columns you don't get the expected 9 different results, but only 6 ({13,11,9,7,5,3}).
So you don't have to carry your nine results up to the third column, but only 6.
Of course, that comes at the expense of deleting the repeated numbers from the list. "Removal of Repeated Integer Elements" has been discussed on SO before and I'll not repeat the discussion here; suffice it to say that a mergesort, O(m log m) in the list size m, will remove the duplicates. If you want something easier, a double loop O(m^2) will do.
Anyway, I'll not try to calculate the size of the (mean) problem in this way for several reasons. One of them is that the "m" in the sort merge is not the size of the problem, but the size of the vector of results after adding up any two columns, and that operation is repeated (n-1) times ... and I really don't want to do the math :(.
The other reason is that, since I implemented the algorithm, we can use some experimental results and save ourselves from my surely leaky theoretical considerations.
The Algorithm
With what we said before, it is clear that we should optimize for the benign cases, as the worst case is a lost cause.
For doing so, we need to use lists (or variable dim vectors, or whatever can emulate those) for the columns and do a merge after every column add.
The merge may be replaced by several other algorithms (such as an insertion on a BTree) without modifying the results.
So the algorithm (procedural pseudocode) is something like:
Set result_vector to Column 1
For column i in (2 to n)
    Remove repeated integers in the result_vector
    Add every element of result_vector to every element of column i,
        giving a new result_vector
Next column
Remove repeated integers in the result_vector
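As an illustrative Python sketch of that procedure (a set does the "remove repeated integers" step as we go):
def all_column_sums(columns):
    # columns: one list of candidate values per column
    results = {0}
    for col in columns:
        # add every element of the column to every partial result;
        # the set silently discards the duplicates
        results = {r + v for r in results for v in col}
    return sorted(results)

# The 3x3 example from above: 27 raw combinations collapse to 12 distinct sums
print(all_column_sums([[7, 3, 1], [6, 4, 2], [100, 200, 200]]))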
Or as you asked for it, a recursive version may work as follows:
function genResVector(a:list, b:list): returns list
local c:list
{
Set c = CartesianProduct (a x b)
Set c = Sum up each element {a[i],b[j]} of c
Drop repeated elements of c
Return(c)
}
function RecursiveAdd(a:matrix, i integer): returns list
{
genResVector[Column i from a, RecursiveAdd[a, i-1]];
}
function RecursiveAdd(a:matrix, i==0 integer): returns list={0}
Algorithm Implementation (Recursive)
I chose a functional language; I guess it's no big deal to translate it to any procedural one.
Our program has two functions:
genResVector, which sums two lists giving all possible results with repeated elements removed, and
recursiveAdd, which recurses on the matrix columns adding up all of them.
The code is:
genResVector[x__, y__] := (* Header: A function that takes two lists as input *)
Union[ (* remove duplicates from resulting list *)
Apply (* distribute the following function on the lists *)
[Plus, (* "Add" is the function to be distributed *)
Tuples[{x, y}],2] (*generate all combinations of the two lists *)];
recursiveAdd[t_, i_] := genResVector[t[[i]], recursiveAdd[t, i - 1]];
(* Recursive add function *)
recursiveAdd[t_, 0] := {0}; (* With its stop pit *)
Test
If we take your example list
| 1 1 7 9 1 1 |
| 2 2 5 2 2 |
| 3 3 |
| 4 |
| 5 |
And run the program the result is:
{11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
The maximum and minimum are very easy to verify since they correspond to taking the Min or Max from each column.
Some interesting results
Let's consider what happens when the numbers in each position of the matrix are bounded. For that we will take a full 5x10 matrix and populate it with random integers.
In the extreme case where the integers are only zeros or ones, we may expect two things:
A very small result set
Fast execution, since there will be a lot of duplicate intermediate results
If we increase the Range of our Random Integers we may expect increasing result sets and execution times.
Experiment 1: 5x10 matrix populated with varying range random integers
It's clear enough that for a result set near the maximum result set size (5^10 ≅ 10^7) the calculation time and the "number of distinct results" approach an asymptote. The fact that we still see increasing curves just denotes that we are far from that point.
Moral: the smaller your elements are, the better chance you have of getting it fast. This is because you are likely to have a lot of repetitions!
Note that our MAX calculation time is near 20 secs for the worst case tested
Experiment 2: Optimizations that aren't
Having a lot of memory available, we can calculate by brute force, not removing the repeated results.
The result is interesting ... 10.6 secs! ... Wait! What happened? Our little "remove repeated integers" trick is eating up a lot of time: when there are not many duplicates to remove there is no gain, only the overhead of trying to get rid of them.
But we may get a lot of benefit from the optimization when the max numbers in the matrix are well under 5·10^5. Remember that I'm doing these tests with the 5x10 matrix fully loaded.
The moral of this experiment is: the repeated-integer removal algorithm is critical.
HTH!
PS: I have a few more experiments to post, if I get the time to edit them.