Maximum average distance between two numbers across multiple arrays - arrays

Let's say you have k arrays of size N, each containing unique values from 1 to N.
How would you find the two numbers that are on average the furthest away from each other?
For example, given the arrays:
[1,4,2,3]
[4,2,3,1]
[2,3,4,1]
Then the answer would be item 1 and 2, because they are of distance 2 apart in the first two arrays, and 3 numbers apart in the last one.
I am aware of an O(kN^2) solution (by measuring the distance between each pair of numbers for each of the k arrays), but is there a better solution?
I want to implement such an algorithm in C++, but any description of a solution would be helpful.

After a linear-time transformation indexing the numbers, this problem boils down to computing the diameter of a set of points with respect to L1 distance. Unfortunately this problem is subject to the curse of dimensionality.
Given
1 2 3 4
1: [1,4,2,3]
2: [4,2,3,1]
3: [2,3,4,1]
we compute
1 2 3
1: [1,4,4]
2: [3,2,1]
3: [4,3,2]
4: [2,1,3]
and then the L1 distance between 1 and 2 is |1-3| + |4-2| + |4-1| = 8, which is their average distance (in problem terms) times k = 3.
That being said, you can apply an approximate nearest neighbor algorithm using the input above as the database and the image of each point in the database under N+1-v as a query.

I've a suggestion for the best case. You can follow an heuristical approach.
For instance, You know that if N=4, N-1=3 will be the maximum distance and 1 will be the minimum. The mean distance is 10/6=1,66667 (sums of distances among pairs within array / number of pairs within an array).
Then, you know that if two numbers are on the edges for k/2 arrays (most of the times), it is already on the average top (>= 2 of distance), even if they're just 1 distance apart in the other k/2 arrays. That could be a solution for a best case in O(2k) = O(k).

Related

Algorithm for the maximum absolute sub array sum under threshold value

What I am trying to convey with the title is the following exercise:
One is given an list of numbers both positive and negative where 0 < N < 100 000. From this list one needs to find the maximum sub array sum, but it should be the maximum sum under a certain threshold value (x). The exercise is meant to be solved in c++, but that doesn't really matter. Also the naive approach which is O(n^2) isn't fast enough with the time constraints given. I also couldn't think of a simple approach to make this work like there is with the maximum sub array sum.
For example:
The list: 1 -4 5 6 -3 -2 14
If the threshold is 4, the best solution would be {-4}
If the threshold is 9, the best solution would be {-3 -2 14}
If the threshold is 100, the best solution would be {5 6 -3 -2 14}
If the threshold is 7, the best solution would be {-4 5 6}
If the threshold is 2, the best solution would be {-2}
For the people who are interested in how I approached the problem:
The O(n^2) solution I used just looped over every possible sub array sum. It did not recalculate every sum rather it just add a new number it came across.
Compute partial sums: P(i) = sum(a[0]...a[i])
For a given position pair x, y (x <= y) the sum is P(y) - P(x-1).
So if we fix y, then you are looking for the smallest value greater or equal to y - target.
So iterate over the set and insert items into balanced tree (like set in most languages). Then you can lookup the value in the set and return the value (in C++ there's upper_bound that does almost exactly what you need). Keep doing this and find the largest combination that satisfies the condition.
Lookup, and insertion should be O(log N) so overall the solution is going to be O(N logN).

Determine the adjacency of two fibonacci number

I have many fibonacci numbers, if I want to determine whether two fibonacci number are adjacent or not, one basic approach is as follows:
Get the index of the first fibonacci number, say i1
Get the index of the second fibonacci number, say i2
Get the absolute value of i1-i2, that is |i1-i2|
If the value is 1, then return true.
else return false.
In the first step and the second step, it may need many comparisons to get the correct index by using accessing an array.
In the third step, it need one subtraction and one absolute operation.
I want to know whether there exists another approach to quickly to determine the adjacency of the fibonacci numbers.
I don't care whether this question could be solved mathematically or by any hacking techniques.
If anyone have some idea, please let me know. Thanks a lot!
No need to find the index of both number.
Given that the two number belongs to Fibonacci series, if their difference is greater than the min. number among them then those two are not adjacent. Other wise they are.
Because Fibonacci series follows following rule:
F(n) = F(n-1) + F(n-2) where F(n)>F(n-1)>F(n-2).
So F(n) - F(n-1) = F(n-2) ,
=> Diff(n,n-1) < F(n-1) < F(n-k) for k >= 1
Difference between two adjacent fibonaci number will always be less than the min number among them.
NOTE : This will only hold if numbers belong to Fibonacci series.
Simply calculate the difference between them. If it is smaller than the smaller of the 2 numbers they are adjacent, If it is bigger, they are not.
Each triplet in the Fibonacci sequence a, b, c conforms to the rule
c = a + b
So for every pair of adjacent Fibonaccis (x, y), the difference between them (y-x) is equal to the value of the previous Fibonacci, which of course must be less than x.
If 2 Fibonaccis, say (x, z) are not adjacent, then their difference must be greater than the smaller of the two. At minimum, (if they are one Fibonacci apart) the difference would be equal to the Fibonacci between them, (which is of course greater than the smaller of the two numbers).
Since for (a, b, c, d)
since c= a+b
and d = b+c
then d-b = (b+c) - b = c
By Binet's formula, the nth Fibonacci number is approximately sqrt(5)*phi**n, where phi is the golden ration. You can use base phi logarithms to recover the index easily:
from math import log, sqrt
def fibs(n):
nums = [1,1]
for i in range(n-2):
nums.append(sum(nums[-2:]))
return nums
phi = (1 + sqrt(5))/2
def fibIndex(f):
return round((log(sqrt(5)*f,phi)))
To test this:
for f in fibs(20): print(fibIndex(f),f)
Output:
2 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
10 55
11 89
12 144
13 233
14 377
15 610
16 987
17 1597
18 2584
19 4181
20 6765
Of course,
def adjacentFibs(f,g):
return abs(fibIndex(f) - fibIndex(g)) == 1
This fails with 1,1 -- but there is little point for explicit testing special logic for such an edge-case. Add it in if you want.
At some stage, floating-point round-off error will become an issue. For that, you would need to replace math.log by an integer log algorithm (e.g. one which involves binary search).
On Edit:
I concentrated on the question of how to recover the index (and I will keep the answer since that is an interesting problem in its own right), but as #LeandroCaniglia points out in their excellent comment, this is overkill if all you want to do is check if two Fibonacci numbers are adjacent, since another consequence of Binet's formula is that sufficiently large adjacent Fibonacci numbers have a ratio which differs from phi by a negligible amount. You could do something like:
def adjFibs(f,g):
f,g = min(f,g), max(f,g)
if g <= 34:
return adjacentFibs(f,g)
else:
return abs(g/f - phi) < 0.01
This assumes that they are indeed Fibonacci numbers. The index-based approach can be used to verify that they are (calculate the index and then use the full-fledged Binet's formula with that index).

Can we use binary search to find most frequently occuring integer in sorted array? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
Problem:
Given a sorted array of integers find the most frequently occurring integer. If there are multiple integers that satisfy this condition, return any one of them.
My basic solution:
Scan through the array and keep track of how many times you've seen each integer. Since it's sorted, you know that once you see a different integer, you've gotten the frequency of the previous integer. Keep track of which integer had the highest frequency.
This is O(N) time, O(1) space solution.
I am wondering if there's a more efficient algorithm that uses some form of binary search. It will still be O(N) time, but it should be faster for the average case.
Asymptotically (big-oh wise), you cannot use binary search to improve the worst case, for the reasons the answers above mine have presented. However, here are some ideas that may or may not help you in practice.
For each integer, binary search for its last occurrence. Once you find it, you know how many times it appears in the array, and can update your counts accordingly. Then, continue your search from the position you found.
This is advantageous if you have only a few elements that repeat a lot of times, for example:
1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Because you will only do 3 binary searches. If, however, you have many distinct elements:
1 2 3 4 5 6
Then you will do O(n) binary searches, resulting in O(n log n) complexity, so worse.
This gives you a better best case and a worse worst case than your initial algorithm.
Can we do better? We could improve the worst case by finding the last occurrence of the number at position i like this: look at 2i, then at 4i etc. as long as the value at those positions are the same. If they are not, look at (i + 2i) / 2 etc.
For example, consider the array:
i
1 2 3 4 5 6 7 ...
1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
We look at 2i = 2, it has the same value. We look at 4i = 4, same value. We look at 8i = 8, different value. We backtrack to (4 + 8) / 2 = 6. Different value. Backtrack to (4 + 6) / 2 = 5. Same value. Try (5 + 6) / 2 = 5, same value. We search no more, because our window has width 1, so we're done. Continue the search from position 6.
This should improve the best case, while keeping the worst case as fast as possible.
Asymptotically, nothing is improved. To see if it actually works better on average in practice, you'll have to test it.
Binary search, which eliminates half of the remaining candidates, probably wouldn't work. There are some techniques you could use to avoid reading every element in the array. Unless your array is extremely long or you're solving a problem for curiosity, the naive (linear scan) solution is probably good enough.
Here's why I think binary search wouldn't work: start with an array: given the value of the middle item, you do not have enough information to eliminate the lower or upper half from the search.
However, we can scan the array in multiple passes, each time checking twice as many elements. When we find two elements that are the same, make one final pass. If no other elements were repeated, you've found the longest element run (without even knowing how many of that element is in the sorted list).
Otherwise, investigate the two (or more) longer sequences to determine which is longest.
Consider a sorted list.
Index 0 1 2 3 4 5 6 7 8 9 a b c d e f
List 1 2 3 3 3 3 3 3 3 4 5 5 6 6 6 7
Pass1 1 . . . . . . 3 . . . . . . . 7
Pass2 1 . . 3 . . . 3 . . . 5 . . . 7
Pass3 1 2 . 3 . x . 3 . 4 . 5 . 6 . 7
After pass 3, we know that the run of 3's must be at least 5, while the longest run of any other number is at most 3. Therefore, 3 is the most frequently occurring number in the list.
Using the right data structures and algorithms (use binary-tree-style indexing), you can avoid reading values more than once. You can also avoid reading the 3 (marked as an x in pass 3) since you already know its value.
This solution has running time O(n/k) which degrades to O(n) for k=1 for a list with n elements and a longest run of k elements. For small k, the naive solution will perform better due to simpler logic, data structures, and higher RAM cache hits.
If you need to determine the frequency of the most common number, it would take O((n/k) log k) as indicated by David to find the first and last position of the longest run of numbers using binary search on up to n/k groups of size k.
The worst case cannot be better than O(n) time. Consider the case where each element exists once, except for one element which exists twice. In order to find that element, you'd need to look at every element in the array until you find it. This is because knowing the value of any array element does not give you any information regarding the location of the duplicate element, until it's actually found. This is in contrast to binary search, where the value of an array element allows you to rule out many other elements.
No, in the worst case we have to scan at least n - 2 elements, but see
below for an algorithm that exploits inputs with many duplicates.
Consider an adversary that, for the first n - 3 distinct probes into the
n-element array, returns m for the value at index m. Now the algorithm
knows that the array looks like
1 2 3 ... i-1 ??? i+1 ... j-1 ??? j+1 ... k-1 ??? k+1 ... n-2 n-1 n.
Depending on what the ???s are, the sole correct answer could be j-1
or j+1, so the algorithm isn’t done yet.
This example involved an array where there were very few duplicates. In
fact, we can design an algorithm that, if the most frequent element
occurs k times out of n, uses O((n/k) log k) probes into the array. For
j from ceil(log2(n)) - 1 down to 0, examine the subarray consisting of
every (2**j)th element. Stop if we find a duplicate. The cost so far
is O(n/k). Now, for each element in the subarray, use binary search to
find its extent (O(n/k) searches in subarrays of size O(k), for a total
of O((n/k) log k)).
It can be shown that all algorithms have a worst case of Omega((n/k) log
k), making this one optimal in the worst case up to constant factors.

Making Minimal Changes to Change Range of the Array

Consider having an array filled with elements a0,a1,a2,....,a(n-1).
Consider that this array is sorted already; it will be easier to describe the problem.
Now the range of the array is defined as the biggest element - smallest element.
Say this range is some value x.
Now the problem I have is that, I want to change the elements in such a way that the range becomes less than/equal to some target value y.
I also have the additional constraint that I want to change minimal amount for each element. Consider an element a(i) that has value z. If I change it by r amount, this costsr^2.
Thus, what is an efficient algorithm to update this array to make the range less than or equal to target range y that minimizes the cost.
An example:
Array = [ 0, 3, 19, 20, 23 ] Target range is 17.
I would make the new array [ 3, 3, 19, 20, 20 ] . The cost is (3)^2 + (3)^2 = 18.
This is the minimal cost.
If you are adding/removing to some certain element a(i), you must add/remove that quantity q all at once. You can not remove 3 times 1 unit from a certain element, but must remove a quantity of 3 units once.
I think you can build two heaps from the array - one min-heap, one max-heap. Now you will take the top elements of both heaps and peek at the ones right under them and compare the differences. The one that has the bigger difference you will take and if that difference is bigger than you need, you will just take the required size and add the cost.
Now, if you had to take the whole difference and didn't achieve your goal, you will need to repeat this step. However, if you once again choose from the same heap, you have to remember to add the cost for the element you are taking out of the heap in that steps AND also for those that have been taken out of the processed heap before.
This yields an O(N*logN) algorithm, I'm not sure if it can be done faster.
Example:
Array [2,5,10,12] , I want difference 4.
First heap has 2 on top, second one 12. the 2 is 3 far from 5 and 12 is 2 far from 10 so I take the min-heap and the two will have to be changed by 3. So now we have a new situation:
[5, 10, 12]
The 12 is 2 far from 10 and we take it, subtract 2 and get new situation:
[5,10]
Now we can choose any heap, both differences are the same (the same numbers :-) ). We just need to change by 1 so we get subtract 1 from 10 and get the right result. Now, because we changed 5 to 6 we would also have to change the number that was originally 12 once more to 9 so the resulting cost:
[2 - changed to 5, 5 - unchanged, 10 - changed to 9, 12 - changed to 9].
Here is a linear-time algorithm that minimizes the piecewise quadratic objective function. Probably it can be simplified.
Let the range be [x, x + y], where x is a variable. For different choices of x, there are at most 2n + 1 possibilities for which points lie in the range, arising from 2n critical values a0 - y, a1 - y, ..., a(n-1) - y, a0, a1, ..., a(n-1). One linear-time merge yields the critical values in sorted order. For each of the 2n - 1 intervals [w, z] between critical values where the range contains at least one point, we can construct and minimize a quadratic function consisting of a sum where every point aj less than w yields a term (x - aj)^2 and every point aj greater than z + y yields a term (x + y - aj)^2. The global minimum lies at the mean of aj (for terms of the first type) or aj - y (for terms of the second type); the endpoints of the interval must be checked as well. Naively, this gives a quadratic-time algorithm.
To get down to linear time, it suffices to update the sum preceding the mean computation incrementally. Each of the critical values has an associated event indicating whether the point responsible for it is entering or leaving the interval, meaning that that point's term should enter or leave the sum.

Maximizing a particular sum over all possible subarrays

Consider an array like this one below:
{1, 5, 3, 5, 4, 1}
When we choose a subarray, we reduce it to the lowest number in the subarray. For example, the subarray {5, 3, 5} becomes {3, 3, 3}. Now, the sum of the subarray is defined as the sum of the resultant subarray. For example, {5, 3, 5} the sum is 3 + 3 + 3 = 9. The task is to find the largest possible sum that can be made from any subarray. For the above array, the largest sum is 12, given by the subarray {5, 3, 5, 4}.
Is it possible to solve this problem in time better than O(n2)?
I believe that I have an algorithm for this that runs in O(n) time. I'll first describe an unoptimized version of the algorithm, then give a fully optimized version.
For simplicity, let's initially assume that all values in the original array are distinct. This isn't true in general, but it gives a good starting point.
The key observation behind the algorithm is the following. Find the smallest element in the array, then split the array into three parts - all elements to the left of the minimum, the minimum element itself, and all elements to the right of the minimum. Schematically, this would look something like
+-----------------------+-----+-----------------------+
| left values | min | right values |
+-----------------------+-----+-----------------------+
Here's the key observation: if you take the subarray that gives the optimum value, one of three things must be true:
That array consists of all the values in the array, including the minimum value. This has total value min * n, where n is the number of elements.
That array does not include the minimum element. In that case, the subarray has to be purely to the left or to the right of the minimum value and cannot include the minimum value itself.
This gives a nice initial recursive algorithm for solving this problem:
If the sequence is empty, the answer is 0.
If the sequence is nonempty:
Find the minimum value in the sequence.
Return the maximum of the following:
The best answer for the subarray to the left of the minimum.
The best answer for the subarray to the right of the minimum.
The number of elements times the minimum.
So how efficient is this algorithm? Well, that really depends on where the minimum elements are. If you think about it, we do linear work to find the minimum, then divide the problem into two subproblems and recurse on each. This is the exact same recurrence you get when considering quicksort. This means that in the best case it will take Θ(n log n) time (if we always have the minimum element in the middle of each half), but in the worst case it will take Θ(n2) time (if we always have the minimum value purely on the far left or the far right.
Notice, however, that all of the effort we're spending is being used to find the minimum value in each of the subarrays, which takes O(k) time for k elements. What if we could speed this up to O(1) time? In that case, our algorithm would do a lot less work. More specifically, it would do only O(n) work. The reason for this is the following: each time we make a recursive call, we do O(1) work to find the minimum element, then remove that element from the array and recursively process the remaining pieces. Each element can therefore be the minimum element of at most one of the recursive calls, and so the total number of recursive calls can't be any greater than the number of elements. This means that we make at most O(n) calls that each do O(1) work, which gives a total of O(1) work.
So how exactly do we get this magical speedup? This is where we get to use a surprisingly versatile and underappreciated data structure called the Cartesian tree. A Cartesian tree is a binary tree created out of a sequence of elements that has the following properties:
Each node is smaller than its children, and
An inorder walk of the Cartesian tree gives back the elements of the sequence in the order in which they appear.
For example, the sequence 4 6 7 1 5 0 2 8 3 has this Cartesian tree:
0
/ \
1 2
/ \ \
4 5 3
\ /
6 8
\
7
And here's where we get the magic. We can immediately find the minimum element of the sequence by just looking at the root of the Cartesian tree - that takes only O(1) time. Once we've done that, when we make our recursive calls and look at all the elements to the left of or to the right of the minimum element, we're just recursively descending into the left and right subtrees of the root node, which means that we can read off the minimum elements of those subarrays in O(1) time each. Nifty!
The real beauty is that it is possible to construct a Cartesian tree for a sequence of n elements in O(n) time. This algorithm is detailed in this section of the Wikipedia article. This means that we can get a super fast algorithm for solving your original problem as follows:
Construct a Cartesian tree for the array.
Use the above recursive algorithm, but use the Cartesian tree to find the minimum element rather than doing a linear scan each time.
Overall, this takes O(n) time and uses O(n) space, which is a time improvement over the O(n2) algorithm you had initially.
At the start of this discussion, I made the assumption that all array elements are distinct, but this isn't really necessary. You can still build a Cartesian tree for an array with non-distinct elements in it by changing the requirement that each node is smaller than its children to be that each node is no bigger than its children. This doesn't affect the correctness of the algorithm or its runtime; I'll leave that as the proverbial "exercise to the reader." :-)
This was a cool problem! I hope this helps!
Assuming that the numbers are all non-negative, isn't this just the "maximize the rectangle area in a histogram" problem? which has now become famous...
O(n) solutions are possible. This site: http://blog.csdn.net/arbuckle/article/details/710988 has a bunch of neat solutions.
To elaborate what I am thinking (it might be incorrect) think of each number as histogram rectangle of width 1.
By "minimizing" a subarray [i,j] and adding up, you are basically getting the area of the rectangle in the histogram which spans from i to j.
This has appeared before on SO: Maximize the rectangular area under Histogram, you find code and explanation, and a link to the official solutions page (http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html).
The following algorithm I tried will have the order of the algorithm which is initially used to sort the array. For example, if the initial array is sorted with binary tree sort, it will have O(n) in best case and O(n log n) as an average case.
Gist of algorithm:
The array is sorted. The sorted values and the correponding old indices are stored. A binary search tree is created from the corresponding older indices which is used to determine how far it can go forwards and backwards without encountering a value less than the current value, which will result in the maximum possible sub array.
I will explain the method with the array in the question [1, 5, 3, 5, 4, 1]
1 5 3 5 4 1
-------------------------
array indices => 0 1 2 3 4 5
-------------------------
This array is sorted. Store the value and their indices in ascending order, which will be as follows
1 1 3 4 5 5
-------------------------
original array indices => 0 5 2 4 1 3
(referred as old_index) -------------------------
It is important to have a reference to both the value and their old indices; like an associative array;
Few terms to be clear:
old_index refers to the corresponding original index of an element (that is index in original array);
For example, for element 4, old_index is 4; current_index is 3;
whereas, current_index refers to the index of the element in the sorted array;
current_array_value refers to the current element value in the sorted array.
pre refers to inorder predecessor; succ refers to inorder successor
Also, min and max values can be got directly, from first and last elements of the sorted array, which are min_value and max_value respectively;
Now, the algorithm is as follows which should be performed on sorted array.
Algorithm:
Proceed from the left most element.
For each element from the left of the sorted array, apply this algorithm
if(element == min_value){
max_sum = element * array_length;
if(max_sum > current_max)
current_max = max_sum;
push current index into the BST;
}else if(element == max_value){
//here current index is the index in the sorted array
max_sum = element * (array_length - current_index);
if(max_sum > current_max)
current_max = max_sum;
push current index into the BST;
}else {
//pseudo code steps to determine maximum possible sub array with the current element
//pre is inorder predecessor and succ is inorder successor
get the inorder predecessor and successor from the BST;
if(pre == NULL){
max_sum = succ * current_array_value;
if(max_sum > current_max)
current_max = max_sum;
}else if (succ == NULL){
max_sum = (array_length - pre) - 1) * current_array_value;
if(max_sum > current_max)
current_sum = max_sum;
}else {
//find the maximum possible sub array streak from the values
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
if(max_sum > current_max)
current_max = max_sum;
}
}
For example,
original array is
1 5 3 5 4 1
-------------------------
array indices => 0 1 2 3 4 5
-------------------------
and the sorted array is
1 1 3 4 5 5
-------------------------
original array indices => 0 5 2 4 1 3
(referred as old_index) -------------------------
After first element:
max_sum = 6 [it will reduce to 1*6]
0
After second element:
max_sum = 6 [it will reduce to 1*6]
0
\
5
After third element:
0
\
5
/
2
inorder traversal results in: 0 2 5
applying the algorithm,
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
max_sum = [((5-2)-1) + ((2-0)-1) + 1] * 3
= 12
current_max = 12 [the maximum possible value]
After fourth element:
0
\
5
/
2
\
4
inorder traversal results in: 0 2 4 5
applying the algorithm,
max_sum = 8 [which is discarded since it is less than 12]
After fifth element:
max_sum = 10 [reduces to 2 * 5, discarded since it is less than 8]
After last element:
max_sum = 5 [reduces to 1 * 5, discarded since it is less than 8]
This algorithm will have the order of the algorithm which is initially used to sort the array. For example, if the initial array is sorted with binary sort, it will have O(n) in best case and O(n log n) as an average case.
The space complexity will be O(3n) [O(n + n + n), n for sorted values, another n for old indices, and another n for constructing the BST]. However, I'm not sure about this. Any feedback on the algorithm is appreciated.

Resources