Finding kth smallest number from n sorted arrays - arrays

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e. the array formed by merging all n sorted arrays).
I have been trying it and its other variants for quite a while now, and till now I only feel comfortable in the case where there are two arrays of equal length, both sorted and one has to return the median of these two.
This has logarithmic time complexity.
After this I tried to generalize it to finding kth smallest among two sorted arrays. Here is the question on SO.
Even here the solution given is not obvious to me. But even if I somehow manage to convince myself of this solution, I am still curious as to how to solve the absolute general case (which is my question)
Can somebody explain to me a step-by-step solution (which, in my opinion, should take logarithmic time, i.e. O(log(n1) + log(n2) + ... + log(nN)), where n1, n2, ..., nN are the lengths of the n arrays) that starts from the more specific cases and moves on to the more general one?
I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.
Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to be able to generalize it.
Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.
PS: Do you think this can be further generalized to the case of unsorted arrays?
PPS: It's not a homework problem, I am just preparing for interviews.

This doesn't generalize the links, but does solve the problem:
Go through all the arrays and if any have length > k, truncate to length k (this is silly, but we'll mess with k later, so do it anyway)
Identify the largest remaining array A. If more than one, pick one.
Pick the middle element M of the largest array A.
Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
Based on the indexes of the various elements, calculate the total number of elements <= M and > M. This should give you two numbers: L, the number <= M and G, the number > M
If k <= L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element (k-L)).
When you get to the point where you only have one element per array (or 0), make a new array of size n with those data, sort, and pick the kth element.
Because you're always guaranteed to remove at least half of one array, in N iterations you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2. That's all, of course, worst case, based on the assumption that you only get rid of half of the largest array, not of the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.
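A minimal Python sketch of the steps above, under a few assumptions that are not part of the answer: k is 1-based, the function name `kth_smallest` is made up for illustration, and instead of truncating arrays it keeps a `[lo, hi)` window per array. It also counts elements strictly smaller than the pivot separately from elements equal to it, which is one way to guarantee progress when there are duplicates.

```python
from bisect import bisect_left, bisect_right

def kth_smallest(arrays, k):
    """Return the k-th smallest (1-based) element across sorted lists."""
    total = sum(len(a) for a in arrays)
    if not 1 <= k <= total:
        raise IndexError("k out of range")
    lo = [0] * len(arrays)             # window start per array (inclusive)
    hi = [len(a) for a in arrays]      # window end per array (exclusive)
    while True:
        # Pivot M: middle element of the largest remaining window.
        p = max(range(len(arrays)), key=lambda i: hi[i] - lo[i])
        pivot = arrays[p][(lo[p] + hi[p]) // 2]
        # Binary-search every window for its split points around the pivot.
        left = [bisect_left(a, pivot, lo[i], hi[i]) for i, a in enumerate(arrays)]
        right = [bisect_right(a, pivot, lo[i], hi[i]) for i, a in enumerate(arrays)]
        less = sum(l - s for l, s in zip(left, lo))      # elements < pivot
        at_most = sum(r - s for r, s in zip(right, lo))  # elements <= pivot (L)
        if k <= less:
            hi = left                  # answer lies in the bottom parts
        elif k > at_most:
            lo = right                 # answer lies in the top parts
            k -= at_most
        else:
            return pivot               # less < k <= at_most
```

For example, `kth_smallest([[1, 3, 5], [2, 4, 6], [0, 7]], 4)` walks the windows down to the pivot 3 and returns it.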

It cannot be done in less than O(n) time. Proof sketch: if it could, the algorithm would have to skip at least one array entirely. But a single array can arbitrarily change the value of the kth element.
I have a relatively simple O(n*log(n)*log(m)) where m is the length of the longest array. I'm sure it is possible to be slightly faster, but not a lot faster.
Consider the simple case where you have n arrays each of length 1. Obviously, this is isomorphic to finding the kth element in an unsorted list of length n. It is possible to find this in O(n), see Median of Medians algorithm, originally by Blum, Floyd, Pratt, Rivest and Tarjan, and no (asymptotically) faster algorithms are possible.
Now the problem is how to expand this to longer sorted arrays. Here is the algorithm: Find the median of each array. Build a list of tuples (median, length of array/2) and sort it by median. Walk through it keeping a sum of the lengths, until you reach a sum greater than k. You now have a pair of medians such that you know the kth element is between them. Now for each median, we know if the kth is greater or less than it, so we can throw away half of each array. Repeat. Once the arrays are all one element long (or less), we use the selection algorithm.
Implementing this will reveal additional complexities and edge conditions, but nothing that increases the asymptotic complexity. Each step
Finds the medians of the arrays, O(1) each, so O(n) total
Sorts the medians O(n log n)
Walks through the sorted list O(n)
Slices the arrays O(1) each so, O(n) total
that is O(n) + O(n log n) + O(n) + O(n) = O(n log n). And we must repeat this until the longest array has length 1, which takes log m steps, for a total of O(n*log(n)*log(m)).
You ask if this can be generalized to the case of unsorted arrays. Sadly, the answer is no. Consider the case where we only have one array, then the best algorithm will have to compare at least once with each element for a total of O(m). If there were a faster solution for n unsorted arrays, then we could implement selection by splitting our single array into n parts. Since we just proved selection is O(m), we are stuck.

You could look at my recent answer on the related question here. The same idea can be generalized to multiple arrays instead of 2. In each iteration you could reject the second half of the array with the largest middle element if k is less than sum of mid indexes of all arrays. Alternately, you could reject the first half of the array with the smallest middle element if k is greater than sum of mid indexes of all arrays, adjust k. Keep doing this until you have all but one array reduced to 0 in length. The answer is kth element of the last array which wasn't stripped to 0 elements.
Run-time analysis:
You get rid of half of one array in each iteration. But to determine which array is going to be reduced, you spend time linear in the number of arrays. Assuming each array is of the same length, the run time is going to be c*c*log(n), where c is the number of arrays and n is the length of each array.

There exists a generalization that solves the problem in O(N log k) time, see the question here.

Old question, but none of the answers were good enough. So I am posting the solution using sliding window technique and heap:
import java.util.List;
import java.util.PriorityQueue;

class Node {
    int elementIndex;
    int arrayIndex;

    public Node(int elementIndex, int arrayIndex) {
        this.elementIndex = elementIndex;
        this.arrayIndex = arrayIndex;
    }
}

public class KthSmallestInMSortedArrays {
    public int findKthSmallest(List<Integer[]> lists, int k) {
        int ans = 0;
        // Min-heap ordered by the value each node currently points at.
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) ->
            Integer.compare(lists.get(a.arrayIndex)[a.elementIndex],
                            lists.get(b.arrayIndex)[b.elementIndex]));
        // Seed the heap with the first element of every non-empty array.
        for (int i = 0; i < lists.size(); i++) {
            Integer[] arr = lists.get(i);
            if (arr != null && arr.length > 0) {
                pq.add(new Node(0, i));
            }
        }
        int count = 0;
        while (!pq.isEmpty()) {
            Node curr = pq.poll();
            ans = lists.get(curr.arrayIndex)[curr.elementIndex];
            if (++count == k) {
                break;
            }
            // Re-insert only if this array still has elements left.
            if (++curr.elementIndex < lists.get(curr.arrayIndex).length) {
                pq.offer(curr);
            }
        }
        return ans;
    }
}
The maximum number of elements that we need to access here is O(K) and there are M arrays. So the effective time complexity will be O(K*log(M)).

This would be the code. O(k*log(m))
public int findKSmallest(int[][] A, int k) {
    PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt(x -> A[x[0]][x[1]]));
    for (int i = 0; i < A.length; i++)
        queue.offer(new int[] { i, 0 });
    int ans = 0;
    while (!queue.isEmpty() && --k >= 0) {
        int[] el = queue.poll();
        ans = A[el[0]][el[1]];
        if (el[1] < A[el[0]].length - 1) {
            el[1]++;
            queue.offer(el);
        }
    }
    return ans;
}

If k is not that large, we can maintain a min-priority queue. Enqueue the head of every sorted array, then repeatedly dequeue the smallest element and enqueue the next element from the same array. After k dequeues we have the k smallest elements, and the last one dequeued is the kth smallest.
Alternatively, maybe we can regard the n sorted arrays as buckets and try a bucket-sort-style method.
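The priority-queue idea from this answer can be sketched in Python with the standard `heapq` module. This is only an illustration: the function name `kth_smallest_heap` and the 1-based k are assumptions, not something stated in the answer.

```python
import heapq

def kth_smallest_heap(arrays, k):
    """Return the k-th smallest (1-based) element across sorted lists."""
    if k < 1:
        raise IndexError("k must be at least 1")
    # Seed the heap with (value, array index, element index) for each head.
    heap = [(a[0], i, 0) for i, a in enumerate(arrays) if a]
    heapq.heapify(heap)
    # Discard the k-1 smallest elements, refilling from the same array.
    for _ in range(k - 1):
        if not heap:
            break
        _, i, j = heapq.heappop(heap)
        if j + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    if not heap:
        raise IndexError("k exceeds the total number of elements")
    return heap[0][0]
```

The heap never holds more than one entry per array, so each of the k steps costs O(log n) for n arrays.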

This could be considered the second half of a merge sort. We could simply merge all the sorted lists into a single list, but only keep k elements in the combined list from merge to merge. This has the advantage of only using O(k) space, with somewhat better time complexity than merge sort's O(n log n). That is, it should in practice operate slightly faster than a merge sort. Choosing the kth smallest from the final combined list is O(1). This kind of complexity is not so bad.
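A short Python sketch of this truncated-merge idea, assuming a 1-based k. The helper names `merge_keep_k` and `kth_smallest_by_merging` are invented for the example.

```python
def merge_keep_k(a, b, k):
    """Merge two sorted lists, stopping once k elements are produced."""
    out = []
    i = j = 0
    while len(out) < k and (i < len(a) or j < len(b)):
        # Take from a when b is exhausted or a's head is not larger.
        if j >= len(b) or (i < len(a) and a[i] <= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out

def kth_smallest_by_merging(arrays, k):
    """Fold all arrays into one list of at most k smallest elements."""
    acc = []
    for arr in arrays:
        acc = merge_keep_k(acc, arr, k)
    if not 1 <= k <= len(acc):
        raise IndexError("k out of range")
    return acc[k - 1]
```

Each merge touches at most k elements, so for n arrays the work is about O(n*k) with O(k) extra space.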

It can be done by doing binary search in each array, while calculating the number of smaller elements.
I used the bisect_left and bisect_right to make it work for non-unique numbers as well,
from bisect import bisect_left
from bisect import bisect_right

def kthOfPiles(givenPiles, k, count):
    '''
    Perform binary search for kth element in multiple sorted list
    parameters
    ==========
    givenPiles are list of sorted list
    count is the total number of
    k is the target index in range [0..count-1]
    '''
    begins = [0 for pile in givenPiles]
    ends = [len(pile) for pile in givenPiles]
    #print('finding k=', k, 'count=', count)
    for pileidx,pivotpile in enumerate(givenPiles):
        while begins[pileidx] < ends[pileidx]:
            mid = (begins[pileidx]+ends[pileidx])>>1
            midval = pivotpile[mid]
            smaller_count = 0
            smaller_right_count = 0
            for pile in givenPiles:
                smaller_count += bisect_left(pile,midval)
                smaller_right_count += bisect_right(pile,midval)
            #print('check midval', midval,smaller_count,k,smaller_right_count)
            if smaller_count <= k and k < smaller_right_count:
                return midval
            elif smaller_count > k:
                ends[pileidx] = mid
            else:
                begins[pileidx] = mid+1
    return -1

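Below is C# code to find the k-th smallest element in the union of two sorted arrays. Time complexity: O(log k)
// k is 0-based; end1 and end2 are exclusive bounds.
public int findKthElement(int k, int[] array1, int start1, int end1, int[] array2, int start2, int end2)
{
    if (k < 0 || k >= (end1 - start1) + (end2 - start2))
    {
        throw new ArgumentOutOfRangeException(nameof(k));
    }
    // If one range is empty, index directly into the other one.
    if (start1 == end1)
    {
        return array2[start2 + k];
    }
    if (start2 == end2)
    {
        return array1[start1 + k];
    }
    if (k == 0)
    {
        return Math.Min(array1[start1], array2[start2]);
    }
    // Look at the first "half" elements of each range (at least one each).
    int half = (k + 1) / 2;
    int sub1 = Math.Min(half, end1 - start1);
    int sub2 = Math.Min(half, end2 - start2);
    if (array1[start1 + sub1 - 1] < array2[start2 + sub2 - 1])
    {
        // The first sub1 elements of array1 all rank below k: skip them.
        return findKthElement(k - sub1, array1, start1 + sub1, end1, array2, start2, end2);
    }
    else
    {
        // Otherwise the first sub2 elements of array2 can be skipped.
        return findKthElement(k - sub2, array1, start1, end1, array2, start2 + sub2, end2);
    }
}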

Related

Algorithm for deciding if a,b,c exist in an array so that a+b+c = z? [duplicate]

This question already has answers here:
Finding three elements in an array whose sum is closest to a given number
(15 answers)
O(NlogN) finding 3 numbers that have a sum of any arbitrary T in an array
(7 answers)
Closed 5 years ago.
a little stuck finding an efficient algorithm for the following problem. The algo has to decide if there are 3 elements a,b and c in an array so that a+b+c is equal to a given number z.
The naive way would be to try out the combinations, of course, but asymptotically the time needed would be too large.
Finding a and b in an array so that their sum is z is much easier: sort the given array in ascending order and check for every element a whether z-a exists. But I'm not sure how I'd extend that to the 3-element problem and what running time it would need.
Any help is much appreciated!
Edit: a,b,c and z are integers.
The approach is very similar to finding a and b with sum z.
First sort the array. Then fix a at position i and check whether two elements with sum z-a exist in positions i + 1 to n.
Since you have an O(n) algorithm to check whether a sum z exists with a and b, we only extend it to fix a and check whether two other elements can produce the remaining sum, giving an overall run time of O(n^2).
From here
// returns true if there is triplet with sum equal
// to 'sum' present in A[]. Also, prints the triplet
bool find3Numbers(int A[], int arr_size, int sum)
{
    int l, r;
    /* Sort the elements */
    sort(A, A+arr_size);
    /* Now fix the first element one by one and find the
       other two elements */
    for (int i=0; i<arr_size-2; i++)
    {
        // To find the other two elements, start two index
        // variables from two corners of the array and move
        // them toward each other
        l = i + 1; // index of the first element in the
                   // remaining elements
        r = arr_size-1; // index of the last element
        while (l < r)
        {
            if( A[i] + A[l] + A[r] == sum)
            {
                printf("Triplet is %d, %d, %d", A[i],
                       A[l], A[r]);
                return true;
            }
            else if (A[i] + A[l] + A[r] < sum)
                l++;
            else // A[i] + A[l] + A[r] > sum
                r--;
        }
    }
    // If we reach here, then no triplet was found
    return false;
}
I suppose I should've written a short comment as an answer, but I don't have enough reputation for it ... So here goes nothing!
The best algorithm I can come up with right now is O(n^2). To explain it, let's start with the a+b = z case, which is O(n) (or O(n lg n) if the array isn't sorted yet).
First of all, iterate {a}, and find {b} such that a+b = z. Naively, if you iterate over all b, this costs O(n) per {a}, leading to an O(n^2) solution. However, if you iterate {a} in increasing order, the matching value of {b} can only decrease. We can make use of this information to reduce the time complexity, as in this code:
for a = first element, b = last element; a != last; a = next a
    while ( ( b != first element ) and (a + b > z) )
        b = previous element of b
    if a + b == z
        return true
Note that {b} only goes through the whole list once over the entire loop, so the loop as a whole runs in O(n) (amortized O(1) per step).
Now we can apply this principle back to the original problem, we could iterate through {a}, and apply this O(n) algorithm to {b, c} to find {z-a}, the total complexity is O(n*n = n^2).
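As a rough Python illustration of that principle (the names `has_pair_sum` and `has_triplet_sum` are made up here, and the three elements are assumed to come from distinct positions):

```python
def has_pair_sum(sorted_vals, start, target):
    """Two-pointer scan over sorted_vals[start:] for a pair summing to target."""
    lo, hi = start, len(sorted_vals) - 1
    while lo < hi:
        s = sorted_vals[lo] + sorted_vals[hi]
        if s == target:
            return True
        if s < target:
            lo += 1      # need a larger sum
        else:
            hi -= 1      # need a smaller sum
    return False

def has_triplet_sum(values, z):
    """True if three elements at distinct positions sum to z; O(n^2) overall."""
    vals = sorted(values)
    for i in range(len(vals) - 2):
        # Fix a = vals[i], then look for b + c = z - a to its right.
        if has_pair_sum(vals, i + 1, z - vals[i]):
            return True
    return False
```

Sorting costs O(n log n), and each of the n pair scans is O(n), which gives the O(n^2) total mentioned above.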
Hopefully there is a solution with a lower complexity, I don't think O(n^2) is impressive but I just can't come up with a better one.

Merge k sorted arrays using C

I need to merge k (1 <= k <= 16) sorted arrays into one sorted array. This is for a homework assignment and the Professor requires that this be done using an O(n) algorithm. Merging 2 arrays is no problem and I can do it easily using an O(n) algorithm. I feel that what my professor is asking is undoable for n arrays with an O(n) algorithm.
I am using the algorithm below to split the array indices and running InsertionSort on each partition. I could save these start and end indices into a 2D array. I just don't see how the merging can be done in O(n), because it is going to require more than one loop. If it is possible, does anyone have any hints? I'm not looking for actual code, just a hint as to where I should start.
int chunkSize = round(float(arraySize) / numThreads);
for (int i = 0; i < numThreads; i++) {
    int start = i * chunkSize;
    int end = start + chunkSize - 1;
    if (i == numThreads - 1) {
        end = arraySize - 1;
    }
    InsertionSort(&array[start], end - start + 1);
}
EDIT: The requirement is that the algorithm be O(n) where n is the number of elements in the array. Also, I need to solve this without using a min heap.
EDIT #2: Here is an algorithm I came up with. The problem here is that I'm not storing the result of each iteration back into the original array. I could just copy all of it back in a for loop, but that would be expensive. Is there any way I can do this, other than using something like memcpy? In the code below, indices is a 2D array [numThreads][2] where indices[i][0] is the start index and indices[i][1] is the end index of the ith array.
void mergeArrays(int array[], int indices[][2], int threads, int result[]) {
    for (int i = 0; i < threads - 1; i++) {
        int resPos = 0;
        int lhsPos = 0;
        int lhsEnd = indices[i][1];
        int rhsPos = indices[i+1][0];
        int rhsEnd = indices[i+1][1];
        while (lhsPos <= lhsEnd && rhsPos <= rhsEnd) {
            if (array[lhsPos] <= array[rhsPos]) {
                result[resPos] = array[lhsPos];
                lhsPos++;
            } else {
                result[resPos] = array[rhsPos];
                rhsPos++;
            }
            resPos++;
        }
        while (lhsPos <= lhsEnd) {
            result[resPos] = array[lhsPos];
            lhsPos++;
            resPos++;
        }
        while (rhsPos <= rhsEnd) {
            result[resPos] = array[rhsPos];
            rhsPos++;
            resPos++;
        }
    }
}
You can merge K sorted arrays in one sorted array with O(N*log(K)) algorithm, using priority queue with K entries, where N is overall number of elements in all arrays.
If K is considered as constant value (it is limited by 16 in your case), then complexity is O(N).
Note again: N is number of elements in my post, not number of arrays.
It is impossible to merge arrays in O(K) - simple copy takes O(N)
Using the facts you provided:
(1) n is the number of arrays to merge;
(2) the arrays to be merged are already sorted;
(3) the merge needs to be of order n, that is linear in the number of arrays
(and NOT linear in the number of elements in each array, as you might mistakenly think at first sight).
Use the analogy of merging 4 sorted piles of cards, low to high, face up. You would pick the card with the lowest face value from one of the piles and put it (face down) on the merged deck, until all piles are exhausted.
For your program: keep a counter for each array for the number of elements you have already transferred to the output. This is at the same time an index to the next element in each array NOT merged in the output. Pick the smallest element that you find at one of these locations. You have to lookup the first waiting element in all the arrays for that, so that is of order n.
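The card-pile procedure described above could look roughly like this in Python (the question is about C, so treat this purely as an illustration; `merge_by_scanning` is a made-up name). It scans every array's next waiting element on each step instead of using a heap.

```python
def merge_by_scanning(arrays):
    """Merge sorted lists by scanning all heads for the minimum each step."""
    pos = [0] * len(arrays)            # next unmerged index per array
    total = sum(len(a) for a in arrays)
    out = []
    for _ in range(total):
        best = None
        # Look at the first waiting element of every array.
        for i, a in enumerate(arrays):
            if pos[i] < len(a) and (
                best is None or a[pos[i]] < arrays[best][pos[best]]
            ):
                best = i
        out.append(arrays[best][pos[best]])
        pos[best] += 1
    return out
```

Each output element costs one pass over the array heads, so with a fixed number of arrays (at most 16 in the question) the merge is linear in the total number of elements.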
Also, I don't understand why the answer from MoB got up-votes, it does not answer the question.
Here is one way to do it (pseudocode)
input array[k][n]
init indices[k] = { 0, 0, 0, ... }
init queue = { empty priority queue }
for i in 0..k-1:
    insert i into queue with priority (array[i][0])
while queue is not empty:
    let x = pop queue
    output array[x][indices[x]]
    increment indices[x]
    if indices[x] < n:
        insert x into queue with priority (array[x][indices[x]])
This can probably be simplified further in C. You would have to find a suitable queue implementation to use though as there are none in libc.
Complexity for this operation:
"while queue is not empty" => O(n)
"insert x into queue ..." => O(log k)
=> O(n log k)
Which, if you consider k = constant, is O(n).
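For comparison, Python's standard library ships this kind of k-way merge as `heapq.merge`, which lazily merges already-sorted inputs. The wrapper name `merge_sorted` below is just for illustration and is not a replacement for a C implementation.

```python
import heapq

def merge_sorted(arrays):
    """Merge any number of sorted lists into one sorted list."""
    # heapq.merge takes the sorted inputs as separate arguments and
    # returns an iterator over the merged sequence.
    return list(heapq.merge(*arrays))

if __name__ == "__main__":
    print(merge_sorted([[1, 4, 9], [2, 3], [0, 10]]))
```

With k inputs and n total elements this matches the O(n log k) bound derived above.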
After sorting the k sub-arrays (the method doesn't matter), the code does a k-way merge. The simplest implementation does k-1 compares to determine the smallest leading element of each of the k arrays, then moves that element from its sub-array to the output array and gets the next element from that array. When the end of an array is reached, the algorithm drops down to a (k-1)-way merge, then a (k-2)-way merge, until finally there's just one sub-array left and it's copied. This will be O(n) time since k-1 is a constant.
The k-1 compares can be sped up by using a minimum heap (which is how some priority queues are implemented), but it's still O(n), with just a smaller constant. The heap needs to be initialized at the start, then updated each time an element is removed and a new one added.

max. distance of a number greater than a given number in array

I was going through an interview question and came up with logic that requires the following:
Find an index j for an element a[j] larger than a[i] (with j < i), such that (i-j) is the largest. And I want to find this j for every index i in the array, in O(n) or O(n log n) time with O(n) extra space.
What I have done until now :
1) O(n^2) by using simple for loops
2) Build balanced B.S.T. as we scan the elements from left to right and for i'th element find index of element greater than it. But I realized that it can easily be O(n) for single element, therefore O(n^2) for entire array.
I want to know if it is possible to do it in O(n) or O(n log n). If yes, please give some hints.
EDIT: I think I did not explain my question well, so let me state it clearly:
I want arr[j] to the left of arr[i] such that (i-j) is the largest possible and arr[j] > arr[i], and I want to find this for every index i, i.e. for (i = 0 to n-1).
EDIT 2 :example - {2,3,1,6,0}
for 2 , ans=-1
for 3 , ans=-1
for 1 , ans=2 (i-j)==(2-0)
for 6 , ans=-1
for 0 , ans=4 (i-j)==(4-0)
Create an auxiliary array of maximums, let it be maxs, which will basically contain the max value of the array up to the current index.
Formally: maxs[i] = max { arr[0], arr[1], ..., arr[i] }
Note that this is pre processing step that can be done in O(n)
Now for each element i, you are looking for the first element in maxs that is larger than arr[i]. This can be done using binary search, and is O(log n) per operation.
Gives you total of O(nlogn) time and O(n) extra space.
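A possible Python rendering of this prefix-maximum approach, returning the distance i-j as in the question's example (or -1 when no larger element exists to the left). The function name `farthest_greater_left` is an assumption made for the sketch.

```python
from bisect import bisect_right
from itertools import accumulate

def farthest_greater_left(arr):
    """For each i, return i - j for the smallest j < i with arr[j] > arr[i]."""
    # maxs[i] = max(arr[0..i]); it is non-decreasing, so it can be bisected.
    maxs = list(accumulate(arr, max))
    out = []
    for i, v in enumerate(arr):
        # First position whose prefix maximum exceeds v. At that position
        # the element itself must exceed v, since the previous maximum did not.
        j = bisect_right(maxs, v)
        out.append(i - j if j < i else -1)
    return out
```

On the question's input `[2, 3, 1, 6, 0]` this yields `[-1, -1, 2, -1, 4]`, matching the expected answers listed there.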
You can do this in O(n) time using a stack data structure for array indexes for which you have yet to find a solution. It can be implemented as an array of at most n elements.
Iterate over the input array from right to left, starting with the last element:
Pop all indexes from the stack for which the array element is less than the current element. Mark the index of the current element as the solution for each index you pop.
Push the index of the current element on the stack.
Invariant: the array items corresponding to the indexes in the stack are always in ascending order, with the least item on top.
When you reach the beginning of the input, mark any items that still remain on the stack with -1; for them there is no answer.
Each array index is pushed into the stack exactly once and popped at most once, so this algorithm runs in O(n) time.
An example in Python:
def solution(arr):
    stack = []
    out = [-1]*len(arr)
    for i in range(len(arr)-1, -1, -1):
        while len(stack) > 0 and arr[stack[-1]] < arr[i]:
            out[stack.pop()] = i
        stack.append(i)
    return out
Note that the correct answer for input [2, 4, 1, 5, 3] is [-1, -1, 1, -1, 3]: for a fixed i, the difference j-i is greatest when j is greatest, so you are looking for the nearest index j to the left of i, which minimizes the distance. (When j < i, the difference j-i is negative.)
The fastest solution I can think of is allocating a second array and scanning the array left-to-right. As you traverse the array and scan each element, append the index of the element to your second array if arr[index] is greater than the right-most element of your second array. This is O(1) time per append, maximum of n appends, so O(n).
Finally, once your array is complete, take a second pass through your array. For each element, scan your second array using binary search (this is possible since it is implicitly sorted) and find the leftmost (earliest inserted) index j in your array such that arr[j] > arr[i].
To do this, you have to do a modification of binary search. If you find an index j such that arr[j] > arr[i], you still have to check to see if there are any indices k to the left such that arr[k] > arr[i]. You must do this until you find the left-most index.
I think this is O(log n) per binary search and you have to do the search for n elements. So the total time complexity would be close to O(n log n), but I am not sure of this. Any comments/suggestions to this answer would be much appreciated.
Here's my solution in C++
We maintain a non-decreasing array of running maximums. Compare the current element with the element at the back of that array.
If it is larger than or equal to the largest element so far, then append this element to the array and return -1, since there's no larger element on its left.
If not, we use a binary search to find the index and return the difference.
(We still need to append vec.back() to the array, because we cannot change the index)
int findIdx(vector<int>& vec, int target){
    auto it = upper_bound(vec.begin(), vec.end(), target);
    int idx = int(it-vec.begin());
    return idx;
}
vector<int> farestBig(vector<int>& arr){
    vector<int> ans{-1};
    vector<int> vec{arr[0]};
    int n = (int)arr.size();
    for(int i=1; i<n; i++){
        if(arr[i] >= vec.back()){
            ans.push_back(-1);
            vec.push_back(arr[i]);
        }
        else{
            int idx = findIdx(vec, arr[i]);
            ans.push_back(i-idx);
            vec.push_back(vec.back());
        }
    }
    return ans;
}

Find the median of the sum of the arrays

Two sorted arrays of length n are given and the question is to find, in O(n) time, the median of their sum array, which contains all the possible pairwise sums between every element of array A and every element of array B.
For instance: Let A[2,4,6] and B[1,3,5] be the two given arrays.
The sum array is [2+1,2+3,2+5,4+1,4+3,4+5,6+1,6+3,6+5]. Find the median of this array in O(n).
Solving the question in O(n^2) is pretty straight-forward but is there any O(n) solution to this problem?
Note: This is an interview question asked to one of my friends and the interviewer was quite sure that it can be solved in O(n) time.
The correct O(n) solution is quite complicated, and takes a significant amount of text, code and skill to explain and prove. More precisely, it takes 3 pages to do so convincingly, as can be seen in details here http://www.cse.yorku.ca/~andy/pubs/X+Y.pdf (found by simonzack in the comments).
It is basically a clever divide-and-conquer algorithm that, among other things, takes advantage of the fact that in a sorted n-by-n matrix, one can find in O(n) the number of elements that are smaller/greater than a given number k. It recursively breaks down the matrix into smaller submatrices (by taking only the odd rows and columns, resulting in a submatrix that has n/2 columns and n/2 rows) which, combined with the step above, results in a complexity of O(n) + O(n/2) + O(n/4)... = O(2*n) = O(n). It is crazy!
I can't explain it better than the paper, which is why I'll explain a simpler, O(n logn) solution instead :).
O(n * logn) solution:
It's an interview! You can't get that O(n) solution in time. So hey, why not provide a solution that, although not optimal, shows you can do better than the other obvious O(n²) candidates?
I'll make use of the O(n) algorithm mentioned above, to find the amount of numbers that are smaller/greater than a given number k in a sorted n-by-n matrix. Keep in mind that we don't need an actual matrix! The Cartesian sum of two arrays of size n, as described by the OP, results in a sorted n-by-n matrix, which we can simulate by considering the elements of the array as follows:
a[3] = {1, 5, 9};
b[3] = {4, 6, 8};
//a + b:
{1+4, 1+6, 1+8,
5+4, 5+6, 5+8,
9+4, 9+6, 9+8}
Thus each row contains non-decreasing numbers, and so does each column. Now, pretend you're given a number k. We want to find in O(n) how many of the numbers in this matrix are smaller than k, and how many are greater. Clearly, if both values are less than (n²+1)/2, that means k is our median!
The algorithm is pretty simple:
int smaller_than_k(int k){
    int x = 0, j = n-1;
    for(int i = 0; i < n; ++i){
        while(j >= 0 && k <= a[i]+b[j]){
            --j;
        }
        x += j+1;
    }
    return x;
}
This basically counts how many elements fit the condition at each row. Since the rows and columns are already sorted as seen above, this will provide the correct result. And as both i and j iterate at most n times each, the algorithm is O(n) [Note that j does not get reset within the for loop]. The greater_than_k algorithm is similar.
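The same counting step, written as a self-contained Python function for illustration (the name `count_sums_less_than` is invented, and the arrays are passed in explicitly rather than read from globals):

```python
def count_sums_less_than(a, b, k):
    """Count pairs (i, j) with a[i] + b[j] < k, for sorted a and b, in O(n)."""
    count = 0
    j = len(b) - 1
    for x in a:
        # As x grows, the last usable column can only move left,
        # so j is never reset between rows.
        while j >= 0 and x + b[j] >= k:
            j -= 1
        count += j + 1
    return count
```

With a = [1, 5, 9] and b = [4, 6, 8] from the example above, the sums below 10 are 5, 7, 9 and 9, so `count_sums_less_than(a, b, 10)` gives 4.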
Now, how do we choose k? That is the logn part. Binary Search! As has been mentioned in other answers/comments, the median must be a value contained within this array:
candidates[n] = {a[0]+b[n-1], a[1]+b[n-2],... a[n-1]+b[0]};.
Simply sort this array [also O(n*logn)], and run the binary search on it. Since the array is now in non-decreasing order, it is straight-forward to notice that the amount of numbers smaller than each candidate[i] is also a non-decreasing value (monotonic function), which makes it suitable for the binary search. The largest number k = candidate[i] whose result smaller_than_k(k) returns smaller than (n²+1)/2 is the answer, and is obtained in log(n) iterations:
int b_search(){
    int lo = 0, hi = n, mid, n2 = (n*n+1)/2;
    while(hi-lo > 1){
        mid = (hi+lo)/2;
        if(smaller_than_k(candidate[mid]) < n2)
            lo = mid;
        else
            hi = mid;
    }
    return candidate[lo]; // the median
}
Let's say the arrays are A = {A[1] ... A[n]}, and B = {B[1] ... B[n]}, and the pairwise sum array is C = {A[i] + B[j], where 1 <= i <= n, 1 <= j <= n} which has n^2 elements and we need to find its median.
Median of C must be an element of the array D = {A[1] + B[n], A[2] + B[n - 1], ... A[n] + B[1]}: if you fix A[i], and consider all the sums A[i] + B[j], you would see that only A[i] + B[j = n + 1 - i] (which is one of D) could be the median. That is, it may not be the median, but if it is not, then all other A[i] + B[j] are also not the median.
This can be proved by considering all B[j] and counting the number of values that are lower and the number of values that are greater than A[i] + B[j] (we can do this quite accurately because the two arrays are sorted, although the calculation is a bit messy). You'd see that for A[i] + B[n + 1 - i] these two counts are most "balanced".
The problem then reduces to finding median of D, which has only n elements. An algorithm such as Hoare's will work.
UPDATE: this answer is wrong. The real conclusion here is that the median is one of D's elements, but D's median is not the same as C's median.
Doesn't this work?:
You can compute the rank of a number in linear time as long as A and B are sorted. The technique you use for computing the rank can also be used to find all things in A+B that are between some lower bound and some upper bound in time linear in the size of the output plus |A|+|B|.
Randomly sample n things from A+B. Take the median, say foo. Compute the rank of foo. With constant probability, foo's rank is within n of the median's rank. Keep doing this (an expected constant number of times) until you have lower and upper bounds on the median that are within 2n of each other. (This whole process takes expected linear time, but it's obviously slow.)
All you have to do now is enumerate everything between the bounds and do a linear-time selection on a linear-sized list.
(Unrelatedly, I wouldn't excuse the interviewer for asking such an obviously crappy interview question. Stuff like this in no way indicates your ability to code.)
EDIT: You can compute the rank of a number x by doing something like this:
Set i = 0, j = |B| - 1, rank = 0.
While i < |A| {
    While j >= 0 and A[i] + B[j] > x, j--.
    If j < 0, break.
    rank += j+1.
    i++.
}
FURTHER EDIT: Actually, the above trick only narrows down the candidate space to about n log(n) members of A+B. Then you have a general selection problem within a universe of size n log(n); you can do basically the same trick one more time and find a range of size proportional to sqrt(n) log(n) where you do selection.
Here's why: If you sample k things from an n-set and take the median, then the sample median's order is between the (1/2 - sqrt(log(n) / k))th and the (1/2 + sqrt(log(n) / k))th elements with at least constant probability. When n = |A+B|, we'll want to take k = sqrt(n) and we get a range of about sqrt(n log n) elements --- that's about |A| log |A|. But then you do it again and you get a range on the order of sqrt(n) polylog(n).
You should use a selection algorithm to find the median of an unsorted list in O(n). Look at this: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

Find n-th smallest element in array without sorting?

I want to write a program to find the n-th smallest element without using any sorting technique..
Can we do it recursively, divide and conquer style like quick-sort?
If not, how?
You can find information about that problem here: Selection algorithm.
What you are referring to is the Selection Algorithm, as previously noted. Specifically, your reference to quicksort suggests you are thinking of the partition based selection.
Here's how it works:
Like in Quicksort, you start by picking a good pivot: something that you think is nearly half-way through your list. Then you go through your entire list of items swapping things back and forth until all the items less than your pivot are in the beginning of the list, and all things greater than your pivot are at the end. Your pivot goes into the leftover spot in the middle.
Normally in a quicksort you'd recurse on both sides of the pivot, but for the Selection Algorithm you'll only recurse on the side that contains the index you are interested in. So, if you want to find the 3rd lowest value, recurse on whichever side contains index 2 (because index 0 is the 1st lowest value).
You can stop recursing when you've narrowed the region to just the one index. At the end, you'll have one unsorted list of the "m-1" smallest objects, and another unsorted list of the "n-m" largest objects. The "m"th object will be in between.
This algorithm is also good for finding a sorted list of the highest m elements... just select the m'th largest element, and sort the list above it. Or, for an algorithm that is a little bit faster, do the Quicksort algorithm, but decline to recurse into regions not overlapping the region for which you want to find the sorted values.
The really neat thing about this is that it runs in O(n) time on average. The first time through, it sees the entire list. On the first recursion, it sees about half, then one quarter, etc. So, it looks at about 2n elements in total (n + n/2 + n/4 + ... < 2n), therefore it runs in O(n) time. Unfortunately, as in quicksort, if you consistently pick a bad pivot, you'll be running in O(n^2) time.
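A minimal sketch of partition-based selection in Python, for illustration only. It assumes a random pivot and a Lomuto-style partition, neither of which is prescribed above, and it takes a 0-based index k with 0 <= k < len(items). The name `quickselect` is just a label for this sketch.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (k is 0-based) of a non-empty list."""
    items = list(items)  # work on a copy so the caller's list is untouched
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        # Pick a random pivot and move it to the end of the current region.
        pivot_index = random.randint(lo, hi)
        pivot = items[pivot_index]
        items[pivot_index], items[hi] = items[hi], items[pivot_index]
        # Partition: everything smaller than the pivot goes to the front.
        store = lo
        for i in range(lo, hi):
            if items[i] < pivot:
                items[store], items[i] = items[i], items[store]
                store += 1
        # Put the pivot into the leftover spot in the middle.
        items[store], items[hi] = items[hi], items[store]
        # Continue only on the side that contains index k.
        if k == store:
            return items[store]
        elif k < store:
            hi = store - 1
        else:
            lo = store + 1
```

For example, `quickselect([7, 2, 9, 4, 1], 2)` selects the 3rd lowest value, which sits at index 2 of the sorted order. The loop replaces the recursion described above, since only one side is ever followed.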
This task can be completed in roughly O(n) time (n being the length of the list) when m is small, by using a heap structure (specifically, a priority queue based on a Fibonacci heap, which gives O(1) insertion time and O(log n) removal time).
Consider the task of retrieving the m-th smallest element from the list. Loop over the list and add each item to a priority queue bounded to size m, evicting the largest item whenever the queue grows past m, so that the queue always holds the m smallest items seen so far. Each insertion is O(1), but each eviction costs O(log m). At the end, the element with lowest priority in the queue (highest priority being the smallest item) is the m-th smallest, and removing it takes O(log m) time.
So overall, the time complexity of the algorithm is O(n log m) in the worst case, since up to n - m evictions may be needed. When m is a small constant this reduces to O(n), but for m proportional to n it does not; a guaranteed O(n) bound for any m needs a selection algorithm such as median of medians.
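A rough sketch of the bounded priority queue idea in Python. Note the substitution: it uses the standard `heapq` module, which is a binary heap rather than a Fibonacci heap, and it stores negated values to get max-heap behaviour, so it assumes numeric items. It also assumes 1 <= m <= len(items). The name `mth_smallest_bounded` is hypothetical.

```python
import heapq

def mth_smallest_bounded(items, m):
    """Return the m-th smallest item (m is 1-based) using a heap of size m."""
    heap = []  # max-heap (via negation) of the m smallest items seen so far
    for value in items:
        if len(heap) < m:
            heapq.heappush(heap, -value)
        elif value < -heap[0]:
            # value beats the largest item currently kept: swap it in.
            # heapreplace pops the root and pushes the new item, O(log m).
            heapq.heapreplace(heap, -value)
    # The largest of the m smallest items is the m-th smallest overall.
    return -heap[0]
```

The heap never holds more than m items, so the extra memory is O(m) regardless of the list length.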
You can use a binary heap if you don't want to use a Fibonacci heap.
Algorithm:
Construct a min binary heap from the array; this operation takes O(n) time.
Since this is a min binary heap, the element at the root is the minimum value, and reading it is an O(1) operation.
So keep removing the root element until you reach your kth minimum value.
Make sure you restore the heap after every removal; each restore is O(log n), so k removals cost O(k log n).
So the running time is O(n) + O(k log n) = O(n + k log n).
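The binary heap steps above could look like this in Python, as a sketch using the standard `heapq` module. It assumes k is 1-based with 1 <= k <= len(items); the function name `kth_smallest_by_heap` is made up for illustration.

```python
import heapq

def kth_smallest_by_heap(items, k):
    """Return the k-th smallest item (k is 1-based) via a min binary heap."""
    heap = list(items)       # copy so the caller's list is not reordered
    heapq.heapify(heap)      # builds the min-heap in O(n)
    for _ in range(k - 1):
        heapq.heappop(heap)  # removes the root and restores the heap, O(log n)
    return heap[0]           # the root is now the k-th smallest
```

Only k - 1 removals are needed, because after them the k-th smallest value is sitting at the root and can be read without another pop.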
Two stacks can be used like this to locate the Nth smallest number in one pass.
Start with empty Stack-A and Stack-B
PUSH the first number into Stack-A
The next number onwards, choose to PUSH into Stack-A only if the number is smaller than its top
When you have to PUSH into Stack-A, run through these steps
While TOP of Stack-A is larger than new number, POP TOP of Stack-A and push it into Stack-B
When Stack-A goes empty or its TOP is smaller than new number, PUSH in the new number and restore the contents of Stack-B over it
At this point you have inserted the new number to its correct (sorted) place in Stack-A and Stack-B is empty again
If Stack-A depth is now sufficient you have reached the end of your search
I generally agree with Noldorin's optimization analysis.
This stack solution aims at a simple scheme that will work (with relatively more data movement across the two stacks). The heap scheme reduces the fetch for the Nth smallest number to a tree traversal (log m).
If your target is an optimal solution (say for a large set of numbers or maybe for a programming assignment, where optimization and the demonstration of it are critical) you should use the heap technique.
The stack solution can be compressed in space requirements by implementing the two stacks within the same space of K elements (where K is the size of your data set). So, the downside is just extra stack movement as you insert.
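One possible reading of the two-stack scheme, sketched in Python with lists as stacks. The description leaves open what happens once Stack-A is N deep. This sketch assumes the current top (the largest kept value) is dropped to make room for a smaller newcomer, and that the whole input is scanned before answering. Both of these are assumptions, as is the name `nth_smallest_two_stacks`.

```python
def nth_smallest_two_stacks(numbers, n):
    """Return the n-th smallest number (n is 1-based), assuming len(numbers) >= n >= 1."""
    stack_a = []  # ascending from bottom to top, kept at most n deep
    stack_b = []  # temporary holding area, empty between insertions
    for value in numbers:
        if len(stack_a) == n:
            if value >= stack_a[-1]:
                continue  # not smaller than the top, so not among the n smallest
            stack_a.pop()  # assumption: discard the largest to keep depth n
        # Move larger items aside until the insertion point is exposed.
        while stack_a and stack_a[-1] > value:
            stack_b.append(stack_a.pop())
        stack_a.append(value)
        # Restore the larger items on top of the new number.
        while stack_b:
            stack_a.append(stack_b.pop())
    return stack_a[-1]  # top of Stack-A is the n-th smallest
```

Each insertion can move up to n items across the stacks, so this reading costs O(n) per input number in the worst case, which is the extra data movement mentioned above.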
Here is an answer for finding the Kth smallest element of an array:
#include <iostream>
#include <vector>
using namespace std;

// Returns the Nth smallest distinct value using repeated linear scans,
// which takes O(NoOfElements * Nthsmall) time. Assumes NoOfElements >= 1.
int GetNthSmall(const int numbers[], int NoOfElements, int Nthsmall);

int main()
{
    int size;
    cout << "Enter Size of array\n";
    cin >> size;
    vector<int> arr(size);
    cout << "\nEnter array elements\n";
    for (int i = 0; i < size; i++)
        cin >> arr[i];
    cout << "\n";
    for (int i = 0; i < size; i++)
        cout << arr[i] << " ";
    cout << "\n";
    int result = GetNthSmall(arr.data(), size, 3);
    cout << "Result = " << result << "\n";
    return 0;
}

int GetNthSmall(const int numbers[], int NoOfElements, int Nthsmall)
{
    // First pass: find the overall minimum.
    int min = numbers[0];
    for (int i = 1; i < NoOfElements; i++)
    {
        if (numbers[i] < min)
            min = numbers[i];
    }
    // Each further pass finds the smallest value strictly greater than
    // the value found by the previous pass.
    for (int j = 1; j < Nthsmall; j++)
    {
        bool found = false;
        int next = min;
        for (int i = 0; i < NoOfElements; i++)
        {
            if (numbers[i] > min && (!found || numbers[i] < next))
            {
                next = numbers[i];
                found = true;
            }
        }
        if (!found)
            break; // fewer than Nthsmall distinct values: return the largest
        min = next;
    }
    return min;
}
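A simple way to find the nth largest element in an array without using any sorting methods.
public class KthLargest {
    public static void main(String[] args) {
        kthLargestElement();
    }

    // Finds the n-th largest value by stepping from the minimum to the next
    // larger value (a.length - n) times. Assumes distinct values and
    // 1 <= n <= a.length. Takes O(a.length^2) comparisons in the worst case.
    public static void kthLargestElement() {
        int[] a = { 5, 4, 3, 2, 1, 9, 8 };
        int n = 3;
        int max = a[0], min = a[0];
        for (int i = 0; i < a.length; i++) {
            if (a[i] < min) {
                min = a[i];
            }
            if (a[i] > max) {
                max = a[i];
            }
        }
        int max1 = max; // remember the overall maximum to reset the search bound
        for (int c = 0; c < a.length - n; c++) {
            for (int j = 0; j < a.length; j++) {
                // Narrow max down to the smallest value strictly greater than min.
                if (a[j] > min && a[j] < max) {
                    max = a[j];
                }
            }
            min = max;
            max = max1;
        }
        System.out.println(min); // prints 5 for the sample array
    }
}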

Resources