Let's say you have k arrays of size N, each containing unique values from 1 to N.
How would you find the two numbers that are on average the furthest away from each other?
For example, given the arrays:
[1,4,2,3]
[4,2,3,1]
[2,3,4,1]
Then the answer would be the numbers 1 and 2, because they are 2 apart in each of the first two arrays and 3 apart in the last one.
I am aware of an O(kN^2) solution (by measuring the distance between each pair of numbers for each of the k arrays), but is there a better solution?
I want to implement such an algorithm in C++, but any description of a solution would be helpful.
After a linear-time transformation indexing the numbers, this problem boils down to computing the diameter of a set of points with respect to L1 distance. Unfortunately this problem is subject to the curse of dimensionality.
Given
1 2 3 4
1: [1,4,2,3]
2: [4,2,3,1]
3: [2,3,4,1]
we compute
1 2 3
1: [1,4,4]
2: [3,2,1]
3: [4,3,2]
4: [2,1,3]
and then the L1 distance between 1 and 2 is |1-3| + |4-2| + |4-1| = 7, which is their average distance (in problem terms) times k = 3.
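As a sketch (function names are mine), the transformation plus the brute-force O(kN^2) diameter over L1 distance looks like this; 0-based indexes are used, which shifts the coordinates but leaves all pairwise distances unchanged:

```python
def positions(arrays):
    """Map each value 1..N to its index in every array (the transformation above)."""
    n = len(arrays[0])
    pos = {v: [] for v in range(1, n + 1)}
    for a in arrays:
        for i, v in enumerate(a):
            pos[v].append(i)
    return pos

def l1_diameter_pair(arrays):
    """Brute-force pair of values at maximum L1 distance: O(k*N^2)."""
    pos = positions(arrays)
    vals = sorted(pos)
    return max(((u, w) for i, u in enumerate(vals) for w in vals[i + 1:]),
               key=lambda p: sum(abs(x - y) for x, y in zip(pos[p[0]], pos[p[1]])))
```

For the example above, `l1_diameter_pair([[1,4,2,3],[4,2,3,1],[2,3,4,1]])` returns the pair (1, 2).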
That being said, you can apply an approximate nearest neighbor algorithm, using the table above as the database and, as queries, the image of each point under the map v -> N+1-v (so a near neighbor of a reflected point corresponds to a far neighbor of the original).
I have a suggestion for the best case: you can follow a heuristic approach.
For instance, you know that if N=4, then N-1=3 is the maximum possible distance and 1 is the minimum. The mean distance is 10/6 ≈ 1.667 (the sum of distances over all pairs within an array, divided by the number of pairs within an array).
Then, if two numbers sit at opposite ends in k/2 of the arrays (which happens often), their average distance is already above the mean (>= 2), even if they are just 1 apart in the other k/2 arrays. That could give a best-case solution in O(2k) = O(k).
Given an array A of N positive elements, suppose we list all N × (N+1) / 2 non-empty contiguous subarrays of A and replace each subarray with the maximum element it contains. We now have N × (N+1) / 2 elements, each the maximum of its subarray.
Now we have Q queries, where each query is one of 3 types:
1 K : count the numbers strictly greater than K among those N × (N+1) / 2 elements.
2 K : count the numbers strictly less than K among them.
3 K : count the numbers equal to K among them.
The main problem I am facing is that N can be up to 10^6, so I can't generate all those N × (N+1) / 2 elements explicitly. Please help me solve this problem.
Example: let N=3 and Q=2. Let the array A be [1,2,3]; then all the subarrays are:
[1] -> [1]
[2] -> [2]
[3] -> [3]
[1,2] -> [2]
[2,3] -> [3]
[1,2,3] -> [3]
So now we have [1,2,3,2,3,3]. As Q=2, the queries are:
Query 1 : 3 3
It means we need to report the count of numbers equal to 3. The answer is 3, as there are 3 numbers equal to 3 in the generated array.
Query 2 : 1 4
It means we need to report the count of numbers greater than 4. The answer is 0, as nothing in the generated array is greater than 4.
Both N and Q can be up to 10^6. How can this problem be solved, and which data structure is suitable for it?
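This is only a restatement of the problem, not a solution for N up to 10^6, but a brute-force reference (all names are mine) is handy for checking a fast implementation on small inputs:

```python
def answer_queries_brute(A, queries):
    # materialize all N*(N+1)/2 subarray maxima: O(N^2) time and space
    maxima = [max(A[i:j + 1]) for i in range(len(A)) for j in range(i, len(A))]
    results = []
    for typ, k in queries:
        if typ == 1:                                   # strictly greater than k
            results.append(sum(x > k for x in maxima))
        elif typ == 2:                                 # strictly less than k
            results.append(sum(x < k for x in maxima))
        else:                                          # typ == 3: equal to k
            results.append(sum(x == k for x in maxima))
    return results
```

On the example, `answer_queries_brute([1, 2, 3], [(3, 3), (1, 4)])` reproduces the expected answers 3 and 0.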
I believe I have a solution in O(N + Q*log N). The trick is to do a lot of preparation with your array before the first query even arrives.
For each number, figure out where the first strictly bigger number is to its left and to its right.
Example: for the array 1, 8, 2, 3, 3, 5, 1, both 3's left block would be the position of the 8, and their right block the position of the 5.
This can be determined in linear time, as follows: keep a stack of previous maximums. When a new element arrives, pop elements from the stack until the top is strictly bigger than the current one. Illustration:
In this example, the stack is [15, 13, 11, 10, 7, 3] (you will of course keep the indexes, not the values; I just use the values for readability).
Now we read 8: 8 >= 3, so we remove 3 from the stack and repeat; 8 >= 7, so we remove 7; 8 < 10, so we stop removing. We set 10 as 8's left block and push 8 onto the stack of maximums.
Also, whenever you remove a number from the stack (3 and 7 in this example), set the right block of the removed number to the current number. One problem though: the right block ends up pointing at the next number bigger or equal, not strictly bigger. You can fix this by simply checking and relinking the right blocks afterwards.
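Here is a sketch of both stack passes in Python. Rather than the relinking fix described above, it uses the equivalent common convention of a strict comparison on one side and a non-strict one on the other, so that each subarray is attributed to exactly one position; the function names are mine:

```python
def greater_neighbors(A):
    """left[i]: nearest index j < i with A[j] > A[i] (or -1);
    right[i]: nearest index j > i with A[j] >= A[i] (or len(A))."""
    n = len(A)
    left, right = [-1] * n, [n] * n
    stack = []                          # indexes of a decreasing run of values
    for i, x in enumerate(A):
        while stack and A[stack[-1]] <= x:
            stack.pop()                 # x "overrides" smaller/equal previous maxima
        if stack:
            left[i] = stack[-1]
        stack.append(i)
    stack = []
    for i in range(n - 1, -1, -1):      # symmetric backward pass
        while stack and A[stack[-1]] < A[i]:
            stack.pop()
        if stack:
            right[i] = stack[-1]
        stack.append(i)
    return left, right

def max_counts(A):
    """For each index i, the number of subarrays in which A[i] is the maximum."""
    left, right = greater_neighbors(A)
    return [(i - l) * (r - i) for i, (l, r) in enumerate(zip(left, right))]
```

On the example 1, 8, 2, 3, 3, 5, 1, both 3's get the 8 (index 1) as their left block, and the per-position counts sum to 7*8/2 = 28, i.e. one per subarray.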
Compute, for each number, in how many subarrays it is the maximum.
Since for each number you now know where the next bigger number is on its left / right, I trust you with finding the appropriate math formula for this.
Then store the results in a hashmap: the key is the value of a number, and the value is in how many subarrays that number is the maximum. For example, the record [4->12] would mean that the number 4 is the maximum in 12 subarrays.
Lastly, extract all key-value pairs from the hashmap into an array, and sort that array by the keys. Finally, create a prefix sum for the values of that sorted array.
Handling a request
For a request "exactly K", just binary search for the key K in your array; for "more than / less than K", binary search for the key K and then use the prefix-sum array.
This answer is an adaptation of this other answer I wrote earlier. The first part is exactly the same, but the others are specific for this question.
Here's an implementation of an O(n log n + q log n) version using a simplified segment tree.
Creating the segment tree: O(n)
In practice, what it does is to take an array, let's say:
A = [5,1,7,2,3,7,3,1]
And construct an array-backed tree that looks like this:
In the tree, the first number is the value and the second is the index where it appears in the array. Each node is the maximum of its two children. This tree is backed by an array (pretty much like a heap tree) where the children of the index i are in the indexes i*2+1 and i*2+2.
Then, for each element, it becomes easy to find the nearest greater elements (before and after each element).
To find the nearest greater element to the left, we go up in the tree searching for the first parent where the left node has value greater and the index lesser than the argument. The answer must be a child of this parent, then we go down in the tree looking for the rightmost node that satisfies the same condition.
Similarly, to find the nearest greater element to the right, we do the same, but looking for a right node with an index greater than the argument. And when going down, we look for the leftmost node that satisfies the condition.
Creating the cumulative frequency array: O(n log n)
From this structure, we can compute the frequency array, that tells how many times each element appears as maximum in the subarray list. We just have to count how many lesser elements are on the left and on the right of each element and multiply those values. For the example array ([1, 2, 3]), this would be:
[(1, 1), (2, 2), (3, 3)]
This means that 1 appears only once as maximum, 2 appears twice, etc.
But we need to answer range queries, so it's better to have a cumulative version of this array, that would look like:
[(1, 1), (2, 3), (3, 6)]
The (3, 6) means, for example, that there are 6 subarrays with maxima less than or equal to 3.
Answering q queries: O(q log n)
Then, to answer each query, you just have to do binary searches for the value you want. For example, if you need the exact count of 3s, you compute query(F, 3) - query(F, 2). For those less than 3: query(F, 2). For those greater than 3: query(F, float('inf')) - query(F, 3).
Implementation
I've implemented it in Python and it seems to work well.
import bisect
from collections import defaultdict
from math import ceil, log2

def make_tree(A):
    # array-backed max-tree (like a heap); leaves start at index n-1
    n = 2**int(ceil(log2(len(A))))
    T = [(float('-inf'), -1)]*(2*n-1)  # padding that sorts below every real leaf
    for i, x in enumerate(A):
        T[n-1+i] = (x, i)
    for i in reversed(range(n-1)):
        T[i] = max(T[i*2+1], T[i*2+2])
    return T

def print_tree(T):
    print('digraph {')
    for i, x in enumerate(T):
        print(' ' + str(i) + '[label="' + str(x) + '"]')
        if i*2+2 < len(T):
            print(' ' + str(i) + '->' + str(i*2+1))
            print(' ' + str(i) + '->' + str(i*2+2))
    print('}')

def find_generic(T, i, fallback, check, first, second):
    j = len(T)//2 + i
    original = T[j]
    j = (j-1)//2
    # go up in the tree searching for a value that satisfies check
    while j > 0 and not check(T[second(j)], original):
        j = (j-1)//2
    # go down in the tree searching for the left/rightmost node that satisfies check
    while j*2+1 < len(T):
        if check(T[first(j)], original):
            j = first(j)
        elif check(T[second(j)], original):
            j = second(j)
        else:
            return fallback
    return j - len(T)//2

def find_left(T, i, fallback):
    return find_generic(T, i, fallback,
                        lambda a, b: a[0] > b[0] and a[1] < b[1],  # value greater, index before
                        lambda j: j*2+2,  # rightmost first
                        lambda j: j*2+1)  # leftmost second

def find_right(T, i, fallback):
    return find_generic(T, i, fallback,
                        lambda a, b: a[0] >= b[0] and a[1] > b[1],  # value greater or equal, index after
                        lambda j: j*2+1,  # leftmost first
                        lambda j: j*2+2)  # rightmost second

def make_frequency_array(A):
    T = make_tree(A)
    D = defaultdict(int)
    for i, x in enumerate(A):
        left = find_left(T, i, -1)
        right = find_right(T, i, len(A))
        D[x] += (i-left) * (right-i)
    F = sorted(D.items())
    for i in range(1, len(F)):
        F[i] = (F[i][0], F[i-1][1] + F[i][1])
    return F

def query(F, n):
    # number of subarray maxima less than or equal to n
    idx = bisect.bisect(F, (n, float('inf')))
    return F[idx-1][1] if idx else 0

F = make_frequency_array([1,2,3])
print(query(F, 3) - query(F, 2))             # 3 3
print(query(F, float('inf')) - query(F, 4))  # 1 4
print(query(F, float('inf')) - query(F, 1))  # 1 1
print(query(F, 2))                           # 2 3
Your problem can be divided into several steps:
For each element of the initial array, calculate the number of subarrays in which the current element is the maximum. This involves a bit of combinatorics. First, for each element you need to know the index of the previous and of the next element that is bigger than the current one; then the number of subarrays is (i - iprev) * (inext - i).

Finding iprev and inext requires two traversals of the initial array: in forward and in backward order. For iprev, traverse the array left to right. During the traversal, maintain a BST that contains the biggest of the previous elements along with their indexes. For each element of the original array, find the minimal element in the BST that is bigger than the current one; its index, stored as the value, is iprev. Then remove from the BST all elements that are smaller than the current one. This operation is O(log N), as you are removing whole subtrees; it is required because the current element you are about to add "overrides" all elements that are less than it. Then add the current element to the BST with its index as the value. At each point in time, the BST stores a descending subsequence of the previous elements, in which each element is bigger than all the elements that come after it (for previous elements {1,2,44,5,2,6,26,6} the BST stores {44,26,6}). The backward traversal to find inext is similar.
After the previous step you'll have pairs K→P, where K is the value of some element of the initial array and P is the number of subarrays in which that element is the maximum. Now group these pairs by K, i.e. sum the P values of entries with equal K. Be careful about the corner cases where two equal elements could share the same subarrays.
As Ritesh suggested: put all grouped K→P pairs into an array, sort it by K, and calculate the cumulative sum of P in one pass. Your queries then become binary searches in this sorted array, each performed in O(log N) time.
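Steps 2 and 3 might look like this in Python (build_prefix and count_le are names I made up):

```python
import bisect
from collections import defaultdict

def build_prefix(pairs):
    """pairs: iterable of (K, P); returns [(K, cumulative P)] sorted by K."""
    sums = defaultdict(int)
    for k, p in pairs:
        sums[k] += p                     # group equal K's, summing their P's
    prefix, total = [], 0
    for k in sorted(sums):
        total += sums[k]
        prefix.append((k, total))
    return prefix

def count_le(prefix, k):
    """Number of subarray maxima <= k, by binary search: O(log N) per query."""
    idx = bisect.bisect_right(prefix, (k, float('inf')))
    return prefix[idx - 1][1] if idx else 0
```

For A = [1, 2, 3] the pairs are (1,1), (2,2), (3,3), the prefix array is [(1,1), (2,3), (3,6)], and "equal to K" is answered as count_le(prefix, K) - count_le(prefix, K - 1).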
Create a sorted value-to-index map. For example,
[34,5,67,10,100] => {5:1, 10:3, 34:0, 67:2, 100:4}
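In Python, for instance, such a map can be built in one pass plus a sort (assuming distinct values):

```python
A = [34, 5, 67, 10, 100]
# sort (value, index) pairs by value; dicts preserve insertion order (Python 3.7+)
value_to_index = dict(sorted((v, i) for i, v in enumerate(A)))
print(value_to_index)  # {5: 1, 10: 3, 34: 0, 67: 2, 100: 4}
```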
Precalculate the queries in two passes over the value-to-index map:
Top to bottom - maintain an augmented tree of intervals. Each time an index is added,
split the appropriate interval and subtract the relevant segments from the total:
indexes intervals total sub-arrays with maximum greater than
4 (0,3) 67 => 15 - (4*5/2) = 5
2,4 (0,1)(3,3) 34 => 5 + (4*5/2) - 2*3/2 - 1 = 11
0,2,4 (1,1)(3,3) 10 => 11 + 2*3/2 - 1 = 13
3,0,2,4 (1,1) 5 => 13 + 1 = 14
Bottom to top - maintain an augmented tree of intervals. Each time an index is added,
adjust the appropriate interval and add the relevant segments to the total:
indexes intervals total sub-arrays with maximum less than
1 (1,1) 10 => 1*2/2 = 1
1,3 (1,1)(3,3) 34 => 1 + 1*2/2 = 2
0,1,3 (0,1)(3,3) 67 => 2 - 1 + 2*3/2 = 4
0,1,3,2 (0,3) 100 => 4 - 4 + 4*5/2 = 10
The third query can be pre-calculated along with the second:
indexes intervals total sub-arrays with maximum exactly
1 (1,1) 5 => 1
1,3 (3,3) 10 => 1
0,1,3 (0,1) 34 => 2
0,1,3,2 (0,3) 67 => 3 + 3 = 6
Insertion and deletion in augmented trees are of O(log n) time-complexity. Total precalculation time-complexity is O(n log n). Each query after that ought to be O(log n) time-complexity.
Let's say we have the following set of numbers representing values over time:
1 2 3 10 1 20 40 60
Now I am looking for an algorithm to find the highest percentage increase from one time to another. In the above case, the answer would be the pair (1, 60): a factor of 60, i.e. a 5900% increase.
So far, the best algorithm I can think of is a brute-force method. We consider all possible pairs using a series of iterations:
1st Iteration:
1-2 1-3 1-10 .. 1-60
2nd Iteration
2-3 2-10 2-1 ... 2-60
(etc.)
This has complexity O(n^3).
I've also been thinking about another approach: find all the strictly increasing sequences, and determine the percentage increase only within those strictly increasing sequences.
Does any other idea strike you guys? Please do correct me if my ideas are wrong!
I may have misunderstood the problem, but it seems that all you want is the largest and smallest numbers, since those are the two numbers that matter.
def find_pair(lst):
    # repeatedly take the global max and min of what remains
    while True:
        index_of_max = lst.index(max(lst))
        index_of_min = lst.index(min(lst))
        if index_of_min < index_of_max:
            return lst[index_of_min], lst[index_of_max]   # success
        if index_of_min == index_of_max:
            return -1        # only one element left, no valid pair
        # the max occurs before the min: discard both and try again
        del lst[index_of_min]                 # higher index first
        del lst[index_of_max]
As I understand (you didn't correct me in your comment), you want to maximize a[i]/a[j] for all j <= i. If that's correct, then for each i we only need to know smallest value before it.
int current_min = INT_MAX;   // from <climits>
double max_increase = 0;
for (int i = 0; i < n; ++i) {
    current_min = std::min(current_min, a[i]);
    max_increase = std::max(max_increase, (double)a[i] / current_min);
}
So you just want to compare each number pair-wise and see which pair has the highest ratio from the second number to the first number? Just iterating with two loops (one with i=0 to n, and an inner loop with j=i+1 to n) is going to give you O(n^2). I guess this is actually your original solution, but you incorrectly said the complexity was O(n^3). It's n^2.
You could get to O(n log n), though. Take your list, make it into a list where each element is a pair of (index, value). Then sort it by the second element of the pair. Then have two pointers into the list, one coming from the left (0 to n-1), and the other coming from the right (n-1 to 0). Find the first pair of elements such that the left element's original index is less than the right element's original index. Done.
1 2 3 10 1 20 40 60
becomes
(1,0) (2,1) (3,2) (10,3) (1, 4) (20, 5) (40, 6) (60,7)
becomes
(1,0) (1,4) (2,1) (3,2) (10,3) (20,5) (40,6) (60,7)
So your answer is 60/1, from index 0 to index 7.
If this isn't what you're looking for, it would help if you said what the right answer was for your example numbers.
If I understand your problem correctly, you are looking for two indices (i, j) in the array with i < j that has the highest ratio A[j]/A[i]. If so, then you can reduce it to this related problem, which asks you to find the indices (i, j) with i ≤ j such that A[j] - A[i] is as large as possible. That problem has a very fast O(n)-time, O(1)-space algorithm that can be adapted to this problem as well. The intuition is to solve the problem for the array consisting of just the first element of your array, then for the first two elements, then the first three, etc. Once you've solved the problem for the first n elements of the array, you have an overall solution to the problem.
Let's think about how to do this. Initially, when you consider just the first element of the array, the best percentage increase you can get is 0%, by comparing the element with itself. Now, suppose (inductively) that you've solved the problem for the first k array elements and want to see what happens when you look at the next one. One of two cases holds. First, the maximum percentage increase over the first k elements might also be the maximum percentage increase over the first (k + 1) elements; for example, if the (k + 1)st array element is an extremely small number, then chances are you can't get a large percentage increase from something in the first k elements to that value. Second, the maximum percentage increase might be from one of the first k elements to the (k + 1)st element; if this is the case, the highest percentage increase would be from the smallest of the first k elements to the (k + 1)st element.
Combining these two cases, we get that the best percentage increase over the first k + 1 elements is the maximum of
The highest percentage increase of the first k elements, or
The percentage increase from the smallest of the first k elements to the (k + 1)st element.
You can implement this by iterating across the array elements keeping track of two values - the minimum value you've seen so far and the pair that maximizes the percent increase. As an example, for your original example array of
1 2 3 10 1 20 40 60
The algorithm would work like this:
1 2 3 10 1 20 40 60
min 1 1 1 1 1 1 1 1
best (1,1) (1, 2) (1, 3) (1, 10) (1, 10) (1, 20) (1, 40) (1, 60)
and you'd output (1, 60) as the highest percentage increase. On a different array, like this one:
3 1 4 2 5
then the algorithm would trace out like this:
3 1 4 2 5
min 3 1 1 1 1
best (3,3) (3,3) (1,4) (1,4) (1,5)
and you'd output (1, 5).
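A direct transcription of this scan (a sketch assuming all values are positive; cross-multiplication avoids division):

```python
def best_increase(a):
    low = a[0]                  # smallest value seen so far
    best = (a[0], a[0])         # 0% increase: the first element vs itself
    for x in a[1:]:
        # x / low > best[1] / best[0], written without division
        if x * best[0] > best[1] * low:
            best = (low, x)
        low = min(low, x)       # update the running minimum afterwards
    return best
```

This reproduces both traces: `best_increase([1, 2, 3, 10, 1, 20, 40, 60])` gives (1, 60), and `best_increase([3, 1, 4, 2, 5])` gives (1, 5).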
This whole algorithm uses only O(1) space and runs in O(n) time, which is an extremely good solution to the problem.
Alternatively, you can think about reducing this problem directly to the maximum single-sell profit problem by taking the logarithm of all of the values in your array. In that case, if you find a pair of values where log A[j] - log A[i] is maximized, this is equivalent (using properties of logarithms) to finding a pair of values where log (A[j] / A[i]) is maximized. Since the log function is monotonically increasing, this means that you have found a pair of values where A[j] / A[i] is maximized, as intended.
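A sketch of that reduction (again assuming positive values; the original value of the running minimum is tracked alongside its logarithm):

```python
from math import log

def best_increase_via_logs(a):
    best_diff = 0.0                      # best log(a[j]) - log(a[i]) so far
    low_log, low_val = log(a[0]), a[0]   # running minimum, in log space
    pair = (a[0], a[0])
    for x in a[1:]:
        lx = log(x)
        if lx - low_log > best_diff:     # maximum single-sell profit on the logs
            best_diff = lx - low_log
            pair = (low_val, x)
        if lx < low_log:
            low_log, low_val = lx, x
    return pair
```

Since log is monotonic, the pair maximizing the log difference is exactly the pair maximizing the ratio A[j] / A[i].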