Find the median of the sum of the arrays - arrays

Two sorted arrays of length n are given and the question is to find, in O(n) time, the median of their sum array, which contains all the possible pairwise sums between every element of array A and every element of array B.
For instance: Let A[2,4,6] and B[1,3,5] be the two given arrays.
The sum array is [2+1,2+3,2+5,4+1,4+3,4+5,6+1,6+3,6+5]. Find the median of this array in O(n).
Solving the question in O(n^2) is pretty straight-forward but is there any O(n) solution to this problem?
Note: This is an interview question asked to one of my friends and the interviewer was quite sure that it can be solved in O(n) time.

The correct O(n) solution is quite complicated, and takes a significant amount of text, code and skill to explain and prove. More precisely, it takes 3 pages to do so convincingly, as can be seen in detail here http://www.cse.yorku.ca/~andy/pubs/X+Y.pdf (found by simonzack in the comments).
It is basically a clever divide-and-conquer algorithm that, among other things, takes advantage of the fact that in a sorted n-by-n matrix, one can find in O(n) the number of elements that are smaller/greater than a given number k. It recursively breaks down the matrix into smaller submatrices (by taking only the odd rows and columns, resulting in a submatrix that has n/2 columns and n/2 rows) which, combined with the step above, results in a complexity of O(n) + O(n/2) + O(n/4)... = O(2*n) = O(n). It is crazy!
I can't explain it better than the paper, which is why I'll explain a simpler, O(n logn) solution instead :).
O(n * logn) solution:
It's an interview! You can't get that O(n) solution in time. So hey, why not provide a solution that, although not optimal, shows you can do better than the other obvious O(n²) candidates?
I'll make use of the O(n) algorithm mentioned above, to find the amount of numbers that are smaller/greater than a given number k in a sorted n-by-n matrix. Keep in mind that we don't need an actual matrix! The Cartesian sum of two arrays of size n, as described by the OP, results in a sorted n-by-n matrix, which we can simulate by considering the elements of the array as follows:
a[3] = {1, 5, 9};
b[3] = {4, 6, 8};
//a + b:
{1+4, 1+6, 1+8,
5+4, 5+6, 5+8,
9+4, 9+6, 9+8}
Thus each row contains non-decreasing numbers, and so does each column. Now, pretend you're given a number k. We want to find in O(n) how many of the numbers in this matrix are smaller than k, and how many are greater. Clearly, if both values are less than (n²+1)/2, that means k is our median!
The algorithm is pretty simple:
int smaller_than_k(int k){
    int x = 0, j = n-1;
    for(int i = 0; i < n; ++i){
        while(j >= 0 && k <= a[i]+b[j]){
            --j;
        }
        x += j+1;
    }
    return x;
}
This basically counts how many elements fit the condition at each row. Since the rows and columns are already sorted as seen above, this will provide the correct result. And as both i and j iterate at most n times each, the algorithm is O(n) [Note that j does not get reset within the for loop]. The greater_than_k algorithm is similar.
Now, how do we choose k? That is the logn part. Binary Search! As has been mentioned in other answers/comments, the median must be a value contained within this array:
candidate[n] = {a[0]+b[n-1], a[1]+b[n-2],... a[n-1]+b[0]};
Simply sort this array [also O(n*logn)], and run the binary search on it. Since the array is now in non-decreasing order, it is straight-forward to notice that the amount of numbers smaller than each candidate[i] is also a non-decreasing value (monotonic function), which makes it suitable for the binary search. The largest number k = candidate[i] whose result smaller_than_k(k) returns smaller than (n²+1)/2 is the answer, and is obtained in log(n) iterations:
int b_search(){
    int lo = 0, hi = n, mid, n2 = (n*n+1)/2;
    while(hi-lo > 1){
        mid = (hi+lo)/2;
        if(smaller_than_k(candidate[mid]) < n2)
            lo = mid;
        else
            hi = mid;
    }
    return candidate[lo]; // the median
}
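As a rough, runnable illustration of the counting idea, here is a Python sketch. It is not the exact method above: instead of searching the candidate array, it swaps in a binary search over the integer value range of the sums, so it assumes integer inputs and costs O(n log R), where R is the spread between the smallest and largest sum. The names count_less_than and median_of_pairwise_sums are made up for this example.

```python
def count_less_than(a, b, k):
    """Count pairs (i, j) with a[i] + b[j] < k; a and b are sorted ascending."""
    total = 0
    j = len(b) - 1
    for x in a:
        # j only ever moves left, so the whole scan is O(len(a) + len(b))
        while j >= 0 and x + b[j] >= k:
            j -= 1
        total += j + 1
    return total


def median_of_pairwise_sums(a, b):
    """Lower median of all a[i] + b[j] (assumption: sorted lists of integers)."""
    target = (len(a) * len(b) + 1) // 2      # 1-based rank of the lower median
    lo, hi = a[0] + b[0], a[-1] + b[-1]
    # Find the largest value v such that fewer than `target` sums are below v.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_less_than(a, b, mid) < target:
            lo = mid
        else:
            hi = mid - 1
    return lo


print(median_of_pairwise_sums([2, 4, 6], [1, 3, 5]))
```

For the arrays in the question, the nine sums sorted are 3, 5, 5, 7, 7, 7, 9, 9, 11, so the call prints 7.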

Let's say the arrays are A = {A[1] ... A[n]}, and B = {B[1] ... B[n]}, and the pairwise sum array is C = {A[i] + B[j], where 1 <= i <= n, 1 <= j <= n} which has n^2 elements and we need to find its median.
Median of C must be an element of the array D = {A[1] + B[n], A[2] + B[n - 1], ... A[n] + B[1]}: if you fix A[i], and consider all the sums A[i] + B[j], you would see that only A[i] + B[j = n + 1 - i] (which is one of D) could be the median. That is, it may not be the median, but if it is not, then all other A[i] + B[j] are also not the median.
This can be proved by considering all B[j] and counting the number of values that are lower and the number of values that are greater than A[i] + B[j] (we can do this quite accurately because the two arrays are sorted -- the calculation is a bit messy though). You'd see that for A[i] + B[n + 1 - i] these two counts are most "balanced".
The problem then reduces to finding median of D, which has only n elements. An algorithm such as Hoare's will work.
UPDATE: this answer is wrong. The real conclusion here is that the median is one of D's elements, but D's median is not the same as C's median.

Doesn't this work?:
You can compute the rank of a number in linear time as long as A and B are sorted. The technique you use for computing the rank can also be used to find all things in A+B that are between some lower bound and some upper bound in time linear in the size of the output plus |A|+|B|.
Randomly sample n things from A+B. Take the median, say foo. Compute the rank of foo. With constant probability, foo's rank is within n of the median's rank. Keep doing this (an expected constant number of times) until you have lower and upper bounds on the median that are within 2n of each other. (This whole process takes expected linear time, but it's obviously slow.)
All you have to do now is enumerate everything between the bounds and do a linear-time selection on a linear-sized list.
(Unrelatedly, I wouldn't excuse the interviewer for asking such an obviously crappy interview question. Stuff like this in no way indicates your ability to code.)
EDIT: You can compute the rank of a number x by doing something like this:
Set i = 0, j = |B| - 1, rank = 0.
While i < |A| {
    While j >= 0 and A[i] + B[j] > x, j--.
    If j < 0, break.
    rank += j+1.
    i++.
}
FURTHER EDIT: Actually, the above trick only narrows down the candidate space to about n log(n) members of A+B. Then you have a general selection problem within a universe of size n log(n); you can do basically the same trick one more time and find a range of size proportional to sqrt(n) log(n) where you do selection.
Here's why: If you sample k things from an n-set and take the median, then the sample median's order is between the (1/2 - sqrt(log(n) / k))th and the (1/2 + sqrt(log(n) / k))th elements with at least constant probability. When n = |A+B|, we'll want to take k = sqrt(n) and we get a range of about sqrt(n log n) elements --- that's about |A| log |A|. But then you do it again and you get a range on the order of sqrt(n) polylog(n).

You should use a selection algorithm to find the median of an unsorted list in O(n). Look at this: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

Related

Find three elements in a sorted array which sum to a fourth element

A friend of mine recently got this interview question, which seems to us to be solvable but not within the asymptotic time bounds that the interviewer thought should be possible. Here is the problem:
You have an array of N integers, xs, sorted but possibly non-distinct. Your goal is to find four array indices(1) (a,b,c,d) such that the following two properties hold:
xs[a] + xs[b] + xs[c] = xs[d]
a < b < c < d
The goal is to do this in O(N^2) time.
First, an O(N^3 log(N)) solution is obvious: for each (a,b,c) ordered triple, use binary search to see if an appropriate d can be found. Now, how to do better?
One interesting suggestion from the interviewer is to rewrite the first condition as:
xs[a] + xs[b] = xs[d] - xs[c]
It's not clear what to do after this, but perhaps we could choose some pivot value P, and search for an (a,b) pair adding up to P, and a (d,c) pair subtracting to it. That search is easy enough to do in O(N) time for a given P, by searching inwards from both ends of the array. However, it seems to me that the problem with this is that there are N^2 such values P, not just N of them, so we haven't actually reduced the problem size at all: we're doing O(N) work, O(N^2) times.
We found some related problems being discussed online elsewhere: Find 3 numbers in an array adding to a given sum is solvable in N^2 time, but requires that the sum be fixed ahead of time; adapting the same algorithm but iterating through each possible sum leaves us at N^3 as always.
Another related problem seems to be Find all triplets in array with sum less than or equal to given sum, but I'm not sure how much of the stuff there is relevant here: an inequality rather than an equality mixes things up quite a bit, and of course the target is fixed rather than varying.
So, what are we missing? Is the problem impossible after all, given the performance requirements? Or is there a clever algorithm we're unable to spot?
(1) Actually the problem as posed is to find all such (a,b,c,d) tuples, and return a count of how many there are. But I think even finding a single one of them in the required time constraints is hard enough.
If the algorithm would have to list the solutions (i.e. the sets of a, b, c, and d that satisfy the condition), the worst case time complexity is O(n⁴):
1. There can be O(n⁴) solutions
The trivial example is an array with only 0 values in it. Then a, b, c and d have all the freedom as long as they stay in order. This represents O(n⁴) solutions.
But more generally arrays which follow the following pattern have O(n⁴) solutions:
w, w, w, ... x, x, x, ..., y, y, y, ... z, z, z, ....
With just as many occurrences of each, and:
w + x + y = z
However, to only produce the number of solutions, an algorithm can have a better time complexity.
2. Algorithm
This is a slight variation of the already posted algorithm, which does not involve the H factor. It also describes how to handle cases where different configurations lead to the same sums.
Retrieve all pairs and store them in an array X, where each element gets the following information:
a: the smallest index of the two
b: the other index
sum: the value of xs[a] + xs[b]
At the same time also store for each such pair in another array Y, the following:
c: the smallest index of the two
d: the other index
sum: the value of xs[d] - xs[c]
The above operation has a time complexity of O(n²)
Sort both arrays by their element's sum attribute. In case of equal sum values, the sort order will be determined as follows: for the X array by increasing b; for the Y array by decreasing c. Sorting can be done in O(n²logn) time.
[Edit: I could not prove the earlier claim of O(n²) (unless some assumptions are made that allow for a radix/bucket sorting algorithm, which I will not assume). As noted in comments, in general an array with n² elements can be sorted in O(n²logn²), which is O(n²logn), but not O(n²)]
Go through both arrays in "tandem" to find pairs of sums that are equal. If that is the case, it needs to be checked that X[i].b < Y[j].c. If so it represents a solution. But there could be many of them, and counting those in an acceptable time needs special care.
Let m = n(n-1)/2, i.e. the number of elements in array X (which is also the size of array Y):
count = 0
i = 0
j = 0
while i < m and j < m:
    if X[i].sum < Y[j].sum:
        i = i + 1
    elif X[i].sum > Y[j].sum:
        j = j + 1
    else:
        # Same sum in X and Y. Need to count all pairs with this sum.
        s = X[i].sum
        # Y[j..jEnd-1] is the run of Y with this sum, ordered by decreasing c:
        jEnd = j
        while jEnd < m and Y[jEnd].sum == s:
            jEnd = jEnd + 1
        # Y[j..k-1] are the entries with c > X[i].b; k only shrinks as b grows:
        k = jEnd
        while i < m and X[i].sum == s:
            while k > j and Y[k-1].c <= X[i].b:
                k = k - 1
            # add chunk to `count`:
            count = count + (k - j)
            i = i + 1
        j = jEnd
Note that although there are nested loops, the variable i only ever increments, and so does j. The variable k always decrements in the innermost loop. Although it also gets higher values to start from, it can never address the same Y element more than a constant number of times via the k index, because while decrementing this index, it stays within the "same sum" range of Y.
So this means that this last part of the algorithm runs in O(m), which is O(n²). As my latest edit confirmed that the sorting step is not O(n²), that step determines the overall time-complexity: O(n²logn).
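The sort-and-sweep counting described above can be sketched in Python roughly as follows. This is an illustration under my own naming (count_quadruples is hypothetical), with X and Y stored as plain tuples; it uses a comparison sort, so it runs in O(n² log n) as discussed.

```python
def count_quadruples(xs):
    """Count index tuples a < b < c < d with xs[a] + xs[b] + xs[c] == xs[d]."""
    n = len(xs)
    # X holds (sum, b) for every pair a < b, sorted by sum, then increasing b.
    X = sorted((xs[a] + xs[b], b) for a in range(n) for b in range(a + 1, n))
    # Y holds (difference, -c) for every pair c < d; negating c makes the
    # default tuple order "by difference, then decreasing c".
    Y = sorted((xs[d] - xs[c], -c) for c in range(n) for d in range(c + 1, n))
    m = len(X)
    count = 0
    i = j = 0
    while i < m and j < m:
        if X[i][0] < Y[j][0]:
            i += 1
        elif X[i][0] > Y[j][0]:
            j += 1
        else:
            s = X[i][0]
            j_end = j
            while j_end < m and Y[j_end][0] == s:
                j_end += 1
            k = j_end                      # Y[j:k] are the entries with c > b
            while i < m and X[i][0] == s:
                b = X[i][1]
                while k > j and -Y[k - 1][1] <= b:
                    k -= 1                 # this c is too small for the current b
                count += k - j
                i += 1
            j = j_end
    return count
```

For example, count_quadruples([1, 2, 3, 6]) finds the single tuple (0, 1, 2, 3).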
So one solution can be :
List all x[a] + x[b] value possible such that a < b and hash them in this fashion
key = (x[a]+x[b]) and value = (a,b).
Complexity of this step - O(n^2)
Now List all x[d] - x[c] values possible such that d > c. Also for each x[d] - x[c] search the entry in your hash map by querying. We have a solution if there exists an entry such that c > b for any hit.
Complexity of this step - O(n^2) * H.
Where H is the search time in your hashmap.
Total complexity - O(n^2) * H. Now H may be O(1). This could be done if the range of values in the array is small. Also, the choice of hash function would depend on the properties of the elements in the array.
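A minimal Python sketch of this hashing idea, for finding one solution only (not counting them all). It assumes Python's dict gives expected O(1) lookups, so the expected running time is O(n^2). The function name find_quadruple is hypothetical, and the trick of keeping only the pair with the smallest b per sum is my own simplification.

```python
def find_quadruple(xs):
    """Return indices (a, b, c, d) with a < b < c < d and
    xs[a] + xs[b] == xs[d] - xs[c], or None if no such indices exist."""
    n = len(xs)
    # For every pair sum, remember the pair whose larger index b is smallest:
    # if any pair with this sum fits to the left of c, this one does too.
    best_pair = {}
    for a in range(n):
        for b in range(a + 1, n):
            s = xs[a] + xs[b]
            if s not in best_pair or b < best_pair[s][1]:
                best_pair[s] = (a, b)
    for c in range(n):
        for d in range(c + 1, n):
            pair = best_pair.get(xs[d] - xs[c])
            if pair is not None and pair[1] < c:
                return (pair[0], pair[1], c, d)
    return None
```

Note that this sketch never uses the fact that the array is sorted; it only relies on the index ordering.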

Does "Find all triplets whose sum is less than some number" have any solution better than O(n^3) runtime? [duplicate]

This question already has answers here:
Find all triplets in array with sum less than or equal to given sum
(5 answers)
Closed 8 years ago.
I got asked this on an interview.
Given an array of ints, find all triplets whose sum is less than some number
After some scrambling I told the interviewer that the best solution would still lead to worst-case runtime O(n^3) and possibly would need O(n^3).
The interviewer blatantly disagreed with me and told me "you need to go back to your algorithms...".
Am I missing something?
A possible optimization will be:
Remove all elements in the array that are bigger than sum (this is only valid if all elements are non-negative);
Sort the array;
Run O(N^2) to pick up a[i] + a[j], then binary search for sum - a[i] - a[j] in the range of [j + 1, N], the index is the number of possible candidates, but you should subtract j since they have been covered.
The complexity will be O(N^2 log N), slightly better.
You can solve this O(n^2) time:
First, sort the array.
Then, loop over the array with the first pointer i.
Now, use a second pointer j to loop up from there and a third pointer k to simultaneously loop down from the end.
Whenever you're in a situation where A[i]+A[j]+A[k] < X, you know that the same holds for all j < k' <= k, so increment your count by k-j and increment j. I keep the hidden invariant that A[i]+A[j]+A[k+1] >= X, so incrementing j only makes that statement stronger.
Otherwise, decrement k. When j and k meet, increment i.
You will only increment j and decrement k, so they need O(n) amortized time to meet.
In pseudocode:
count = 0
for i = 0; i < N; i++
    j = i+1
    k = N-1
    while j < k
        if A[i] + A[j] + A[k] < X
            count += k-j
            j++
        else
            k--
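The pseudocode translates almost line for line into Python. The sketch below is only meant as a runnable illustration; the name count_triplets_below is made up, and it counts triplets of distinct positions whose sum is strictly below the limit.

```python
def count_triplets_below(values, limit):
    """Count triplets of distinct positions whose sum is strictly below limit."""
    a = sorted(values)
    n = len(a)
    count = 0
    for i in range(n - 2):
        j, k = i + 1, n - 1
        while j < k:
            if a[i] + a[j] + a[k] < limit:
                # every k' with j < k' <= k also works for this (i, j)
                count += k - j
                j += 1
            else:
                k -= 1
    return count


print(count_triplets_below([1, 2, 3, 4, 5], 9))
```

With the values 1..5 and limit 9, the qualifying triplets are (1,2,3), (1,2,4), (1,2,5) and (1,3,4), so this prints 4.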
I see that you ask for all triplets. It is quite obvious that there can be O(n^3) triplets, so if you want them all you will need as much time, worst case.
This is an example of a problem where the output size matters. For example, if the array contains just 1, 2, 3, 4, 5, ..., n and the maximum value is set at 3n then every single triplet will be an answer, and you have to do Ω(n^3) work just to list them all. On the other hand, if the maximum value had been 0, it would be nice to finish in O(n) time after confirming all the items are too large.
Basically, we want an output-sensitive algorithm with a running time that's something like O(f(n) + t) where t is the output size and n is the input size.
An O(n^2 + t) algorithm would work by essentially tracking the transition points where triplets transitioned from being over the limit to under the limit. Then it would yield everything under that surface. The space is three-dimensional so the surface is two-dimensional, and you can track along it from point to point in aggregate constant time.
Here's some python code (untested!):
def findTripletsBelow(items, limit):
    surfaceCoords = []
    s = sorted(items)
    for i in range(len(s)):
        k = len(s)-1
        for j in range(i+1, len(s)):
            # k only moves down as j moves up, so this is O(n) per i
            while k > j and s[i]+s[j]+s[k] > limit:
                k -= 1
            if k <= j: break
            surfaceCoords.append((i,j,k))
    results = []
    for (i,j,k) in surfaceCoords:
        # only positions after j, so each triplet is listed once
        for k2 in range(j+1, k+1):
            results.append((s[i], s[j], s[k2]))
    return results
O(n^2) algorithm.
Sort the list.
For every element ai, this is how you calculate the number of combinations:
Binary search and find maximum aj such that j < i and ai+aj <= total.
Binary search and find maximum ak such that k < j and ai+aj+ak <= total
For this particular combination of (ai, aj), k is the number of sums that is less than or equal to total.
Now decrement j and increment k as much as possible (but ai+aj+ak <= total )
The total number of increments and decrements is less than i. So for a particular i the complexity is O(i). Therefore overall complexity is O(n^2).
I am leaving out many corner conditions, but this should give you an idea.
Edit:
In the worst case there are O(n^3) solutions. So outputting them explicitly would certainly require O(n^3) time. There is no way around it.
But if you want to return an implicit list (i.e. a compressed list of combinations) this would still work. An example of compressed output would be (ai, aj, ak) for k in 1:p.

Is there an O(n) algorithm to generate a prefix-less array for a positive integer array?

For array [4,3,5,1,2],
we call prefix of 4 is NULL, prefix-less of 4 is 0;
prefix of 3 is [4], prefix-less of 3 is 0, because none in prefix is less than 3;
prefix of 5 is [4,3], prefix-less of 5 is 2, because 4 and 3 are both less than 5;
prefix of 1 is [4,3,5], prefix-less of 1 is 0, because none in prefix is less than 1;
prefix of 2 is [4,3,5,1], prefix-less of 2 is 1, because only 1 is less than 2
So for array [4, 3, 5, 1, 2], we get the prefix-less array [0, 0, 2, 0, 1].
Can we get an O(n) algorithm to get prefix-less array?
It can't be done in O(n) for the same reasons a comparison sort requires O(n log n) comparisons. The number of possible prefix-less arrays is n! so you need at least log2(n!) bits of information to identify the correct prefix-less array. log2(n!) is Θ(n log n), by Stirling's approximation.
Assuming that the input elements are always fixed-width integers you can use a technique based on radix sort to achieve linear time:
L is the input array
X is the list of indexes of L in focus for current pass
n is the bit we are currently working on
Count is the number of 0 bits at bit n left of current location
Y is the list of indexes of a subsequence of L for recursion
P is a zero initialized array that is the output (the prefixless array)
In pseudo-code...
Def PrefixLess(L, X, n)
    if (n == 0)
        return;
    // setup prefix less for bit n: an element with bit n set is greater than
    // every earlier element of X whose bit n is clear
    Count = 0
    For I in 1 to |X|
        If (L(X(I))[n] == 1)
            P(X(I)) += Count
        Else
            Count++;
    // go through subsequence with bit n-1 with bit(n) = 1
    Y = []
    For I in 1 to |X|
        If (L(X(I))[n] == 1)
            Y.append(X(I))
    PrefixLess(L, Y, n-1)
    // go through subsequence on bit n-1 where bit(n) = 0
    Y = []
    For I in 1 to |X|
        If (L(X(I))[n] == 0)
            Y.append(X(I))
    PrefixLess(L, Y, n-1)
    return P
and then execute:
PrefixLess(L, 1..|L|, 32)
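To make the bit-by-bit idea concrete, here is a small Python sketch. It assumes non-negative integers that fit in a fixed number of bits (32 by default) and numbers the bits from 0. The function name prefix_less and the bits parameter are my own. The total work is proportional to len(values) times the bit width.

```python
def prefix_less(values, bits=32):
    """For each position, count the earlier elements that are strictly smaller.

    Assumption: every value is a non-negative integer below 2**bits.
    """
    result = [0] * len(values)

    def recurse(indexes, bit):
        # All values at `indexes` agree on every bit above `bit`.
        if bit < 0 or len(indexes) < 2:
            return
        zeros, ones = [], []
        for idx in indexes:
            if (values[idx] >> bit) & 1:
                # Earlier elements with a 0 here are smaller than this one.
                result[idx] += len(zeros)
                ones.append(idx)
            else:
                zeros.append(idx)
        recurse(ones, bit - 1)
        recurse(zeros, bit - 1)

    recurse(list(range(len(values))), bits - 1)
    return result


print(prefix_less([4, 3, 5, 1, 2]))
```

For the array from the question this prints [0, 0, 2, 0, 1]. Equal values never differ in any bit, so duplicates are not counted as "less".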
I think this should work, but double check the details. Let's call an element in the original array a[i] and one in the prefix array as p[i] where i is the ith element of the respective arrays.
So, say we are at a[i] and we have already computed the value of p[i]. There are three possible cases. If a[i] == a[i+1], then p[i] == p[i+1]. If a[i] < a[i+1], then p[i+1] >= p[i] + 1. This leaves us with the case where a[i] > a[i+1]. In this situation we know that p[i+1] <= p[i], because everything in the prefix that is less than a[i+1] is also less than a[i].
In the naïve case, we go back through the prefix and start counting items less than a[i]. However, we can do better than that. First, recognize that the minimum value for p[i] is 0 and the maximum is i. Next look at the case of an index j, where i > j. If a[i] >= a[j], then p[i] >= p[j]. If a[i] < a[j], then p[i] <= p[j] + (i - j - 1), since at most the i - j - 1 elements strictly between positions j and i can add to the count. So, we can start going backwards through p updating the values for p[i]_min and p[i]_max. If p[i]_min equals p[i]_max, then we have our solution.
Doing a back of the envelope analysis of the algorithm, it has O(n) best case performance. This is the case where the list is already sorted. The worst case is where it is reversed sorted. Then the performance is O(n^2). The average performance is going to be O(k*n) where k is how much one needs to backtrack. My guess is for randomly distributed integers, k will be small.
I am also pretty sure there would be ways to optimize this algorithm for cases of partially sorted data. I would look at Timsort for some inspiration on how to do this. It uses run detection to detect partially sorted data. So the basic idea for the algorithm would be to go through the list once and look for runs of data. For ascending runs of data you are going to have the case where p[i+1] = p[i]+1. For descending runs, p[i] = p_run[0] where p_run is the first element in the run.

Finding kth smallest number from n sorted arrays

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e the combined array formed by merging all the n sorted arrays)
I have been trying it and its other variants for quite a while now, and till now I only feel comfortable in the case where there are two arrays of equal length, both sorted and one has to return the median of these two.
This has logarithmic time complexity.
After this I tried to generalize it to finding kth smallest among two sorted arrays. Here is the question on SO.
Even here the solution given is not obvious to me. But even if I somehow manage to convince myself of this solution, I am still curious as to how to solve the absolute general case (which is my question)
Can somebody explain to me a step by step solution (which again in my opinion should take logarithmic time, i.e. O(log(n1) + log(n2) + ... + log(nN)), where n1, n2...nN are the lengths of the n arrays) which starts from the more specific cases and moves on to the more general one?
I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.
Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to able to generalize it.
Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.
PS: Do you think this can be further generalized to the case of unsorted arrays?
PPS: It's not a homework problem, I am just preparing for interviews.
This doesn't generalize the links, but does solve the problem:
Go through all the arrays and if any have length > k, truncate to length k (this is silly, but we'll mess with k later, so do it anyway)
Identify the largest remaining array A. If more than one, pick one.
Pick the middle element M of the largest array A.
Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
Based on the indexes of the various elements, calculate the total number of elements <= M and > M. This should give you two numbers: L, the number <= M and G, the number > M
If k <= L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element (k-L)).
When you get to the point where you only have one element per array (or 0), make a new array of size n with those data, sort, and pick the kth element.
Because you're always guaranteed to remove at least half of one array, in N iterations, you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2 That's all, of course, worst case, based on the assumption that you only get rid of half of the largest array, not of the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.
It can not be done in less than O(n) time. Proof Sketch If it did, it would have to completely not look at at least one array. Obviously, one array can arbitrarily change the value of the kth element.
I have a relatively simple O(n*log(n)*log(m)) where m is the length of the longest array. I'm sure it is possible to be slightly faster, but not a lot faster.
Consider the simple case where you have n arrays each of length 1. Obviously, this is isomorphic to finding the kth element in an unsorted list of length n. It is possible to find this in O(n), see Median of Medians algorithm, originally by Blum, Floyd, Pratt, Rivest and Tarjan, and no (asymptotically) faster algorithms are possible.
Now the problem is how to expand this to longer sorted arrays. Here is the algorithm: Find the median of each array. Build the list of tuples (median, length of array/2) and sort it by median. Walk through keeping a sum of the lengths, until you reach a sum greater than k. You now have a pair of medians, such that you know the kth element is between them. Now for each median, we know if the kth is greater or less than it, so we can throw away half of each array. Repeat. Once the arrays are all one element long (or less), we use the selection algorithm.
Implementing this will reveal additional complexities and edge conditions, but nothing that increases the asymptotic complexity. Each step
Finds the medians of the arrays, O(1) each, so O(n) total
Sorts the medians O(n log n)
Walks through the sorted list O(n)
Slices the arrays O(1) each so, O(n) total
that is O(n) + O(n log n) + O(n) + O(n) = O(n log n). And, we must perform this until the longest array is length 1, which will take log m steps for a total of O(n*log(n)*log(m))
You ask if this can be generalized to the case of unsorted arrays. Sadly, the answer is no. Consider the case where we only have one array, then the best algorithm will have to compare at least once with each element for a total of O(m). If there were a faster solution for n unsorted arrays, then we could implement selection by splitting our single array into n parts. Since we just proved selection is O(m), we are stuck.
You could look at my recent answer on the related question here. The same idea can be generalized to multiple arrays instead of 2. In each iteration you could reject the second half of the array with the largest middle element if k is less than sum of mid indexes of all arrays. Alternately, you could reject the first half of the array with the smallest middle element if k is greater than sum of mid indexes of all arrays, adjust k. Keep doing this until you have all but one array reduced to 0 in length. The answer is kth element of the last array which wasn't stripped to 0 elements.
Run-time analysis:
You get rid of half of one array in each iteration. But to determine which array is going to be reduced, you spend time linear in the number of arrays. Assuming each array is of the same length, the run time is going to be c*c*log(n), where c is the number of arrays and n is the length of each array.
There exists a generalization that solves the problem in O(N log k) time, see the question here.
Old question, but none of the answers were good enough. So I am posting the solution using sliding window technique and heap:
import java.util.List;
import java.util.PriorityQueue;

class Node {
    int elementIndex;
    int arrayIndex;
    public Node(int elementIndex, int arrayIndex) {
        this.elementIndex = elementIndex;
        this.arrayIndex = arrayIndex;
    }
}
public class KthSmallestInMSortedArrays {
    public int findKthSmallest(List<Integer[]> lists, int k) {
        int ans = 0;
        // min-heap ordered by the value each node currently points at
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) ->
            Integer.compare(lists.get(a.arrayIndex)[a.elementIndex],
                            lists.get(b.arrayIndex)[b.elementIndex]));
        for (int i = 0; i < lists.size(); i++) {
            Integer[] arr = lists.get(i);
            if (arr != null && arr.length > 0) {
                pq.add(new Node(0, i));
            }
        }
        int count = 0;
        while (!pq.isEmpty()) {
            Node curr = pq.poll();
            ans = lists.get(curr.arrayIndex)[curr.elementIndex];
            if (++count == k) {
                break;
            }
            // re-insert only if this array still has elements left
            if (++curr.elementIndex < lists.get(curr.arrayIndex).length) {
                pq.offer(curr);
            }
        }
        return ans;
    }
}
The maximum number of elements that we need to access here is O(K) and there are M arrays. So the effective time complexity will be O(K*log(M)).
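For comparison, the same heap-based merge can be sketched in a few lines of Python with the standard library. heapq.merge performs a lazy k-way merge of sorted inputs, so only the first k elements are ever produced. The function name kth_smallest is made up for this example, k is 1-based, and the sketch raises StopIteration if k exceeds the total number of elements.

```python
import heapq
from itertools import islice


def kth_smallest(arrays, k):
    """Return the k-th smallest element (1-based) across several sorted arrays."""
    merged = heapq.merge(*arrays)              # lazy k-way merge backed by a heap
    return next(islice(merged, k - 1, None))   # skip k-1 items, take the next one


print(kth_smallest([[1, 4, 7], [2, 5], [3, 6, 8, 9]], 5))
```

The merged order here is 1, 2, 3, 4, 5, ..., so the call prints 5.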
This would be the code. O(k*log(m))
public int findKSmallest(int[][] A, int k) {
    PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt(x -> A[x[0]][x[1]]));
    for (int i = 0; i < A.length; i++)
        if (A[i].length > 0) // skip empty arrays
            queue.offer(new int[] { i, 0 });
    int ans = 0;
    while (!queue.isEmpty() && --k >= 0) {
        int[] el = queue.poll();
        ans = A[el[0]][el[1]];
        if (el[1] < A[el[0]].length - 1) {
            el[1]++;
            queue.offer(el);
        }
    }
    return ans;
}
If k is not that huge, we can maintain a min priority queue. Then loop over the heads of the sorted arrays to get the smallest element and enqueue it. When the size of the queue is k, we have the first k smallest elements.
Maybe we can regard the n sorted arrays as buckets, then try the bucket sort method.
This could be considered the second half of a merge sort. We could simply merge all the sorted lists into a single list...but only keep k elements in the combined lists from merge to merge. This has the advantage of only using O(k) space, and of doing somewhat better than merge sort's O(n log n) complexity. That is, it should in practice operate slightly faster than a merge sort. Choosing the kth smallest from the final combined list is O(1). This kind of complexity is not so bad.
It can be done by doing binary search in each array, while calculating the number of smaller elements.
I used the bisect_left and bisect_right to make it work for non-unique numbers as well,
from bisect import bisect_left
from bisect import bisect_right
def kthOfPiles(givenPiles, k, count):
    '''
    Perform binary search for kth element in multiple sorted list
    parameters
    ==========
    givenPiles are list of sorted list
    count is the total number of elements in all piles
    k is the target index in range [0..count-1]
    '''
    begins = [0 for pile in givenPiles]
    ends = [len(pile) for pile in givenPiles]
    #print('finding k=', k, 'count=', count)
    for pileidx,pivotpile in enumerate(givenPiles):
        while begins[pileidx] < ends[pileidx]:
            mid = (begins[pileidx]+ends[pileidx])>>1
            midval = pivotpile[mid]
            smaller_count = 0
            smaller_right_count = 0
            for pile in givenPiles:
                smaller_count += bisect_left(pile,midval)
                smaller_right_count += bisect_right(pile,midval)
            #print('check midval', midval,smaller_count,k,smaller_right_count)
            if smaller_count <= k and k < smaller_right_count:
                return midval
            elif smaller_count > k:
                ends[pileidx] = mid
            else:
                begins[pileidx] = mid+1
    return -1
Please find the below C# code to Find the k-th Smallest Element in the Union of Two Sorted Arrays. Time Complexity : O(logk)
public int findKthElement(int k, int[] array1, int start1, int end1, int[] array2, int start2, int end2)
{
    // k is 0-based; end1 and end2 are exclusive.
    // Assumes 0 <= k < (end1 - start1) + (end2 - start2).
    if (start1 == end1)
    {
        return array2[start2 + k];
    }
    if (start2 == end2)
    {
        return array1[start1 + k];
    }
    if (k == 0)
    {
        return Math.Min(array1[start1], array2[start2]);
    }
    int mid = (k + 1) / 2; // how many elements we try to discard (at least 1)
    int sub1 = Math.Min(mid, end1 - start1);
    int sub2 = Math.Min(mid, end2 - start2);
    if (array1[start1 + sub1 - 1] < array2[start2 + sub2 - 1])
    {
        // the first sub1 elements of array1 all rank below k: drop them
        return findKthElement(k - sub1, array1, start1 + sub1, end1, array2, start2, end2);
    }
    else
    {
        // the first sub2 elements of array2 all rank below k: drop them
        return findKthElement(k - sub2, array1, start1, end1, array2, start2 + sub2, end2);
    }
}

Total number of possible triangles from n numbers

If n numbers are given, how would I find the total number of possible triangles? Is there any method that does this in less than O(n^3) time?
I am considering a+b>c, b+c>a and a+c>b conditions for being a triangle.
Assume there are no equal numbers among the given n and that it is allowed to use one number more than once. For example, given the numbers {1,2,3}, we can create 7 triangles:
1 1 1
1 2 2
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3
If any of those assumptions doesn't hold, it's easy to modify the algorithm.
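The seven triangles listed above can be reproduced with a brute-force O(n^3) enumeration, which is also handy as a reference when testing faster versions. This is only a sketch under the stated assumptions (distinct positive numbers, repetition allowed); `triangles_brute_force` is a name I made up.

```python
from itertools import combinations_with_replacement

def triangles_brute_force(values):
    """List all triples a <= b <= c drawn from values with a + b > c."""
    result = []
    for a, b, c in combinations_with_replacement(sorted(values), 3):
        # With a <= b <= c, a + b > c is the only inequality that can fail.
        if a + b > c:
            result.append((a, b, c))
    return result

for triangle in triangles_brute_force([1, 2, 3]):
    print(*triangle)
```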
Here I present an algorithm that takes O(n^2) time in the worst case:
1. Sort the numbers in ascending order. We will take triples ai <= aj <= ak such that i <= j <= k.
2. For each pair i, j, find the largest k that satisfies ak < ai + aj. Then every triple (ai, aj, al) with j <= l <= k is a triangle (because ak >= aj >= ai, the only inequality that can be violated is ak < ai + aj).
Consider two pairs (i, j1) and (i, j2) with j1 <= j2. It's easy to see that k2 (found in step 2 for (i, j2)) >= k1 (found in step 2 for (i, j1)). This means that as you iterate over j, you only need to check numbers starting from the previous k. That gives O(n) time for each particular i, which implies O(n^2) for the whole algorithm.
C++ source code:
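#include <algorithm>

int Solve(int* a, int n)
{
    int answer = 0;
    std::sort(a, a + n);
    for (int i = 0; i < n; ++i)
    {
        int k = i; // k only moves forward while j advances
        for (int j = i; j < n; ++j)
        {
            // Advance k to the first index with a[k] >= a[i] + a[j].
            while (n > k && a[i] + a[j] > a[k])
                ++k;
            answer += k - j; // triples (a[i], a[j], a[l]) for j <= l < k
        }
    }
    return answer;
}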
Update for downvoters:
This definitely is O(n^2)! Please read carefully the chapter on amortized analysis in "Introduction to Algorithms" by Thomas H. Cormen et al. (17.2 in the second edition).
Determining complexity by counting nested loops is sometimes completely wrong.
Here I try to explain it as simply as I can. Let's fix the variable i. For that i we iterate j from i to n (O(n) operations), and the inner while loop moves k from i to n (also O(n) operations in total). Note: I don't restart the while loop from the beginning for each j. We do this for each i from 0 to n, which gives n * (O(n) + O(n)) = O(n^2).
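To make the amortized argument concrete, one can instrument the loop and count how often k is actually advanced. The sketch below is a Python port with an extra counter; the function name and the `steps` counter are my additions, and it assumes distinct positive numbers as in the answer.

```python
def count_triangles_with_steps(values):
    """Return (triangle count, total number of times k was advanced)."""
    a = sorted(values)
    n = len(a)
    answer = 0
    steps = 0
    for i in range(n):
        k = i  # reset once per i, never once per j
        for j in range(i, n):
            while k < n and a[i] + a[j] > a[k]:
                k += 1
                steps += 1
            answer += k - j
    return answer, steps

n = 200
answer, steps = count_triangles_with_steps(range(1, n + 1))
# For a fixed i, k climbs from i to at most n, so steps <= n + (n-1) + ... + 1.
print(steps <= n * (n + 1) // 2)  # prints True
```

The inner while loop never performs more than n - i increments for a given i, no matter how many values of j there are.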
There is a simple algorithm in O(n^2 * logn).
Assume you want all triangles as triples (a, b, c) where a <= b <= c.
There are 3 triangle inequalities, but a + b > c alone suffices (the others then hold trivially).
And now:
Sort the sequence in O(n * logn), e.g. by merge sort.
For each pair (a, b) with a <= b, the remaining value c needs to be at least b and less than a + b.
So you need to count the number of items in the interval [b, a+b).
This can be done simply by binary-searching for a+b (O(logn)); the number of items in [b, a+b) is then the difference between the position found and the position of b.
Altogether O(n * logn + n^2 * logn), which is O(n^2 * logn). Hope this helps.
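A minimal sketch of this binary-search variant in Python, using the standard `bisect` module. It assumes distinct positive numbers with repetition allowed (as in the question's example); `count_triangles_bisect` is a hypothetical name.

```python
from bisect import bisect_left

def count_triangles_bisect(values):
    """Count triples a <= b <= c from values with a + b > c, in O(n^2 log n)."""
    xs = sorted(values)
    total = 0
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            # First position at or after j whose value is >= a + b.
            end = bisect_left(xs, xs[i] + xs[j], j)
            # Positions j .. end-1 hold exactly the values in [b, a+b).
            total += end - j
    return total

print(count_triangles_bisect([1, 2, 3]))  # prints 7
```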
If you use a binary sort, that's O(n log(n)), right? Keep your binary tree handy, and for each pair (a, b) where a < b, look up the values c with c > b and c < (a+b).
Let a, b and c be three sides. The below condition must hold for a triangle (Sum of two sides is greater than the third side)
i) a + b > c
ii) b + c > a
iii) a + c > b
Following are the steps to count triangles.
1. Sort the array in non-decreasing order.
2. Initialize two pointers 'i' and 'j' to the first and second elements respectively, and initialize the count of triangles as 0.
3. Fix 'i' and 'j' and find the rightmost index 'k' (or largest 'arr[k]') such that 'arr[i] + arr[j] > arr[k]'. The number of triangles that can be formed with 'arr[i]' and 'arr[j]' as two sides is 'k - j'. Add 'k - j' to the count of triangles.
Let us consider 'arr[i]' as 'a', 'arr[j]' as 'b' and all elements between 'arr[j+1]' and 'arr[k]' as 'c'. The above-mentioned conditions (ii) and (iii) are satisfied because 'arr[i] < arr[j] < arr[k]'. And we check condition (i) when we pick 'k'.
4. Increment 'j' to fix the second element again.
Note that in step 3, we can use the previous value of 'k'. The reason is simple: if we know that 'arr[i] + arr[j-1]' is greater than 'arr[k]', then 'arr[i] + arr[j]' will also be greater than 'arr[k]', because the array is sorted in increasing order.
5. If 'j' has reached the end, then increment 'i'. Initialize 'j' as 'i + 1', 'k' as 'i + 2' and repeat steps 3 and 4.
Time Complexity: O(n^2).
The time complexity looks higher because of the 3 nested loops. If we take a closer look at the algorithm, we observe that k is initialized only once per iteration of the outermost loop. The innermost loop executes at most O(n) times in total for every iteration of the outermost loop, because k starts from i+2 and goes up to n across all values of j. Therefore, the time complexity is O(n^2).
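The steps above can be sketched in Python as follows. Unlike the first answer, this variant uses three different array positions per triangle. The function name is hypothetical, and in the code `k` ends up one past the rightmost valid index, so the number added is `k - j - 1`.

```python
def count_triangles_distinct(arr):
    """Count index triples i < j < k whose values can form a triangle."""
    arr = sorted(arr)
    n = len(arr)
    count = 0
    for i in range(n - 2):
        k = i + 2  # initialized once per i and only ever moved forward
        for j in range(i + 1, n - 1):
            k = max(k, j + 1)  # guard: the third side must come after j
            # Advance k to the first index with arr[k] >= arr[i] + arr[j].
            while k < n and arr[i] + arr[j] > arr[k]:
                k += 1
            # Indices j+1 .. k-1 are valid third sides.
            count += k - j - 1
    return count

print(count_triangles_distinct([4, 6, 3, 7]))  # prints 3
```

For [4, 6, 3, 7] the three triangles are (3, 4, 6), (3, 6, 7) and (4, 6, 7).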
I have worked out an algorithm that runs in O(n^2 lg n) time. I think it's correct...
The code is written in C++...
#include <algorithm>
#include <vector>

/* Returns the index of the last element in the sorted range A[p..q] that is
   strictly less than n, or p-1 if there is no such element */
int Search_Closest(const std::vector<int>& A, int p, int q, int n)
{
    while (p <= q)
    {
        int r = p + (q - p) / 2;
        if (A[r] < n)
            p = r + 1;
        else
            q = r - 1;
    }
    return q;
}

/* Returns the no of triangles possible in A[p..q] */
int no_of_triangles(std::vector<int>& A, int p, int q)
{
    int sum = 0;
    std::sort(A.begin() + p, A.begin() + q + 1); // Sorts A[p..q] in O(n lg n) time
    for (int i = p; i <= q; i++)
        for (int j = i + 1; j <= q; j++)
        {
            int c = A[i] + A[j];
            int k = Search_Closest(A, j + 1, q, c);
            /* A[j+1..k] are the third sides shorter than A[i]+A[j], so the
               no of triangles with A[i] and A[j] as two sides is k-j */
            sum += k - j;
        }
    return sum;
}
Hope it helps........
possible answer
We can also use binary search to find the value of 'k' and hence improve the time complexity!
Given N0,N1,N2,...Nn-1
sort them into
X0,X1,X2,...Xn-1 such that X0>=X1>=X2>=...>=Xn-1
Choose X0 (up to Xn-3) as the longest side and choose two items from the rest (X1...)
For a choice such as (X0,X1,X2)
check (X0<X1+X2)
If OK, a triangle is found; continue
If NG, skip the rest of the choices
It seems there is no algorithm better than O(n^3). In the worst case, the result set itself has O(n^3) elements.
For example, if n equal numbers are given, the algorithm has to return n*(n-1)*(n-2) results.

Resources