I am looking for an algorithm that returns the index of the kth largest element in an array. I found many algorithms, but most of them return the list of the k largest elements (Extract K largest elements from array of N integers in O(N + K) time, Best way to retrieve K largest elements from large unsorted arrays?, ...).
In this case, only the index of the kth largest element is needed; the full set of the k largest elements is not. As the array and k are large, I would like to avoid allocating an array (or other structure, e.g. a linked list) of dimension k, and the initial array must remain unchanged. What is (are) the most efficient algorithm(s)?
Finding the k'th largest element in an array cannot be done in less than O(k*n) (or O((n-k)*n)) time without modifying the input array or allocating more than O(1) additional space. If you do not permute the array, you can't do any better than brute force; if you do permute the array, you can't reverse the permutation without keeping extra information around to do it.
(A randomized selection algorithm can achieve linearithmic expected time, but cannot improve on the worst-case time.)
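A minimal sketch of the brute-force approach described above, assuming k is 1-based and that equal values are ranked by lower index first. The function name kth_largest_index is made up for illustration. It only reads the array and keeps O(1) extra state, at a cost of O(k*n) comparisons.

```python
def kth_largest_index(a, k):
    """Return the index of the k-th largest element of a (k is 1-based).

    The input is never modified and only a few scalars are kept.
    Elements are ranked by value descending, ties by index ascending.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k must be between 1 and len(a)")
    prev = None  # index of the element selected in the previous pass
    for _ in range(k):
        best = None
        for i, x in enumerate(a):
            if prev is not None:
                # Skip anything ranked at or before the previous pick.
                comes_after = x < a[prev] or (x == a[prev] and i > prev)
                if not comes_after:
                    continue
            # Strict '>' keeps the lowest index among equal values.
            if best is None or x > a[best]:
                best = i
        prev = best
    return prev
```

For example, with a = [4, 5, 9, 1] the third largest value is 4, so kth_largest_index(a, 3) gives index 0.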
We are given a stream of numbers and Q queries.
At each query, we are given a number k.
We need to find the kth smallest number at that point of the stream.
How to approach this problem?
total size of stream is < 10^5
1 < number < 10^9
I tried a linked list, but finding the right position is time-consuming, and in an array inserting is time-consuming.
You can use some kind of search tree. There are many different kinds of search trees, and all the common balanced ones allow insertion in O(log n); if each node also stores the size of its subtree, finding the kth element is O(log n) too.
If the stream is too long to keep all the numbers in memory and you also know an upper bound on k, you can prune the tree by only keeping a number of elements equal to the upper bound.
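As a rough sketch of the same idea, the block below swaps the search tree for a Fenwick (binary indexed) tree over compressed values, which also gives O(log n) insertion and kth lookup. It assumes the whole list of operations is available up front so the values can be compressed. The names Fenwick, process and the ('add', x) / ('query', k) encoding are invented for this example.

```python
class Fenwick:
    """Counts of values by rank (1-based), with prefix sums in O(log n)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def add(self, i, delta=1):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def kth(self, k):
        """Smallest rank whose prefix count is >= k (binary descent)."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < k:
                pos = nxt
                k -= self.tree[nxt]
            step >>= 1
        return pos + 1


def process(ops):
    """ops is a list of ('add', x) or ('query', k); returns the query answers."""
    values = sorted({x for op, x in ops if op == 'add'})
    rank = {v: i + 1 for i, v in enumerate(values)}
    fw = Fenwick(len(values))
    answers = []
    for op, x in ops:
        if op == 'add':
            fw.add(rank[x])
        else:  # query for the x-th smallest value seen so far
            answers.append(values[fw.kth(x) - 1])
    return answers
```

If the stream really has to be handled online, a balanced tree with subtree sizes plays the same role without the compression step.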
You can use a max heap with size=k.
Push elements until the heap's size reaches k. After that, push each new element and then pop the heap's root, so that the size stays at k. Removing (extracting) the root makes sense because there are at least k elements no larger than the root value.
When you have finished iterating over the stream, the root of the heap will be the k-th smallest element, because the heap holds the k smallest elements and the root is the largest among them.
As the heap's size is k, the time complexity is O(n lg k), which can be a bit better than O(n lg n), and the implementation is very easy.
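A small sketch of this bounded heap, assuming a single fixed k and a finite stream. Python's heapq is a min-heap, so values are negated to get max-heap behaviour; the function name kth_smallest is only illustrative.

```python
import heapq


def kth_smallest(stream, k):
    """Return the k-th smallest value of stream using a heap of size k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    heap = []  # negated values: heap[0] is minus the largest value kept
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        else:
            # Push x, then drop the largest of the k + 1 candidates.
            heapq.heappushpop(heap, -x)
    if len(heap) < k:
        raise ValueError("stream has fewer than k elements")
    return -heap[0]
```

For example, kth_smallest([7, 3, 9, 1, 5], 2) evaluates to 3.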
Q) Given an array A1, A2 ... AN and K count how many subarrays have inversion count greater than or equal to K.
N <= 10^5
K <= N*(N-1)/2
I came across this question in an interview. I came up with the naive solution of forming all subarrays with two for loops (O(n^2)) and counting the inversions in each subarray using a modified merge sort, which is O(nlogn). This leads to a complexity of O(n^3logn), which I guess can be improved. Any leads on how I can improve it? Thanks!
You can solve it in O(nlogn) if I'm not wrong, using two moving pointers.
Start with the left pointer in the first element and move the right pointer until you have a subarray with >= K inversions. To do that, you can use any balanced binary search tree and every time you move the pointer to the right, count how many elements bigger than this one are already in the tree. Then you insert the element in the tree too.
When you hit the point in which you already have >= K inversions, you know that every longer subarray with the same starting element also satisfies the restriction, so you can add them all.
Then move the left pointer one position to the right and subtract the inversions that the removed element contributed (again, look in the tree for the elements smaller than it), and delete that element from the tree. Now you can do the same as before again.
An amortized analysis easily shows that this is O(nlogn), as each of the two pointers traverses the array only once and each operation in the tree is O(logn).
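The two-pointer idea can be sketched as follows. This version replaces the balanced binary search tree with a Fenwick tree over value ranks, which supports the same "how many elements are smaller / larger" counts in O(log n). The function name count_subarrays is an assumption made for the example.

```python
def count_subarrays(a, k):
    """Count subarrays of a whose inversion count is >= k."""
    n = len(a)
    if k <= 0:
        return n * (n + 1) // 2  # every non-empty subarray qualifies
    rank = {v: i + 1 for i, v in enumerate(sorted(set(a)))}
    m = len(rank)
    tree = [0] * (m + 1)  # Fenwick tree of counts by rank

    def add(i, delta):
        while i <= m:
            tree[i] += delta
            i += i & -i

    def prefix(i):  # number of window elements with rank <= i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    total = 0
    inv = 0    # inversions inside the window a[left:right]
    right = 0
    for left in range(n):
        while right < n and inv < k:
            r = rank[a[right]]
            inv += (right - left) - prefix(r)  # window elements > a[right]
            add(r, 1)
            right += 1
        if inv < k:
            break  # later starts only have fewer inversions
        total += n - right + 1  # every end index >= right - 1 works
        r = rank[a[left]]
        inv -= prefix(r - 1)    # window elements < a[left]
        add(r, -1)
    return total
```

For [3, 2, 1] and K = 1 the qualifying subarrays are [3, 2], [2, 1] and [3, 2, 1], so the function returns 3.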
I know that I can get the Kth order statistic (i.e. the kth smallest number in an array) by using quickselect in almost linear time, but what if I needed the k smallest elements of an array?
The Wikipedia link has pseudocode for the single-element lookup, but not for the k-smallest-elements lookup.
How should quickselect be modified to attain it in linear time (if possible) ?
I believe that after you use quickselect to find the k-th statistic, you will automatically find that the first k elements of the resulting array are the k smallest elements, only probably not sorted.
Moreover, quickselect actually partitions the array with respect to the k-th statistic: all the elements before the k-th statistic are smaller than (or equal to) it, and all the elements after it are bigger or equal. This is easy to prove.
Note, for example, what the documentation says for C++ nth_element:
The other elements are left without any specific order, except that
none of the elements preceding nth are greater than it, and none of
the elements following it are less.
If you need not just k smallest elements, but sorted k smallest elements, you can of course sort them after quickselect.
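To illustrate the point, here is a sketch of an iterative quickselect with a Hoare-style partition and a random pivot. It works on a copy so the caller's list is left alone, which is a choice made for this example rather than part of the algorithm. The name k_smallest is hypothetical.

```python
import random


def k_smallest(values, k):
    """Return the k smallest elements of values, in no particular order."""
    a = list(values)  # work on a copy
    if not 0 <= k <= len(a):
        raise ValueError("k must be between 0 and len(values)")
    if k == 0:
        return []
    target = k - 1  # index of the k-th order statistic
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Now a[lo..j] <= pivot <= a[i..hi], and j < i.
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            break  # a[target] equals the pivot and is already in place
    return a[:k]
```

Sorting the returned slice afterwards gives the k smallest elements in order, at an extra O(k log k) cost.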
Actually modifying quickselect is not needed. If I had an array (called arrayToSearch in this example) and I wanted the k smallest items I'd do this:
int i;
int k = 10; // if you wanted the 10 smallest elements
int[] smallestItems = new int[k];
for (i = 0; i < k; i++)
{
    // quickselect(i, arr) is assumed to return the i-th smallest element (0-based) of arr
    smallestItems[i] = quickselect(i, arrayToSearch);
}
Edit: I was under the assumption that k would be a relatively small number which would make the effective Big-O O(n). If not assuming k is small this would have a speed of O(k*n), not linear time. My answer is easier to comprehend, and applicable for most practical purposes. recursion.ninja's answer may be more technically correct, and therefore better for academic purposes.
I have an array
A[4]={4,5,9,1}
I need it to give the top 3 elements, like 9,5,4
I know how to find the max element, but how do I find the 2nd and 3rd max?
i.e. if
max = A[0];
location = 0;
for (i = 1; i < 4; i++)
{
    if (A[i] > max)
    {
        max = A[i];
        location = i;   /* 0-based position of the current maximum */
    }
}
Actually, sorting will not be suitable for my application because
the position number is also important to me, i.e. I have to know at which positions the first 3 maxima occur; here they are at the 0th, 1st and 2nd positions... so I am thinking of a logic
where, after getting the max value, I could put 0 at that location and apply the same steps to the new array, i.e. {4,5,0,1}
But I am a bit confused about how to put my logic into code.
Consider using the technique employed in the Python standard library. It uses an underlying heap data structure:
from heapq import heapify, heappushpop
from itertools import islice

def nlargest(n, iterable):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, reverse=True)[:n]
    """
    if n < 0:
        return []
    it = iter(iterable)
    result = list(islice(it, n))
    if not result:
        return result
    heapify(result)               # min-heap of the n candidates so far
    for elem in it:
        heappushpop(result, elem)  # keeps the n largest seen so far
    result.sort(reverse=True)
    return result
The steps are:
Make an n length fixed array to hold the results.
Populate the array with the first n elements of the input.
Transform the array into a minheap.
Loop over remaining inputs, replacing the top element of the heap if new data element is larger.
If needed, sort the final n elements.
The heap approach is memory efficient (not requiring more memory than the target output) and typically has a very low number of comparisons (see this comparative analysis).
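Since the question also needs the positions, one possible way to apply the same heap technique is to select indices rather than values, using the key parameter of heapq.nlargest. The helper name top_positions is made up for this sketch.

```python
import heapq


def top_positions(a, n):
    """Return the indices of the n largest elements of a, largest first."""
    return heapq.nlargest(n, range(len(a)), key=lambda i: a[i])
```

With A = [4, 5, 9, 1], top_positions(A, 3) gives [2, 1, 0], i.e. the positions of 9, 5 and 4, and A itself is not modified.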
You can use the selection algorithm
Also worth mentioning is that the complexity will be O(n), i.e. O(n) for the selection and O(n) for iterating, so the total is also O(n).
What you're essentially asking is equivalent to sorting your array in descending order. The fastest way to do this is using heapsort or quicksort, depending on the size of your array.
Once your array is sorted your largest number will be at index 0, your second largest will be at index 1, ...., in general your nth largest will be at index n-1
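If the original positions matter, one way to keep them while sorting is to sort the indices instead of the values. A minimal sketch, with the function name positions_by_value_desc chosen only for illustration:

```python
def positions_by_value_desc(a):
    """Return the indices of a ordered from largest value to smallest."""
    return sorted(range(len(a)), key=lambda i: a[i], reverse=True)
```

The n-th largest element of a is then a[order[n - 1]], where order is the returned list, and its position is order[n - 1].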
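You can follow this procedure, where n is the number of largest elements wanted and m is the length of A:
1. Copy the first n elements of A to another array B of size n.
2. Sort B in ascending order, so that B[0] is its smallest element.
3. Then for each remaining element A[k], with k in n...m-1, check whether
A[k]>B[0]
If so, A[k] is among the n largest elements seen so far, so
drop B[0], search for the proper position for A[k] in B and shift the smaller elements to the left to make room, so that B still contains the n largest elements in sorted order.
4. Repeat this for all the remaining elements of A.
At the end B will hold the n largest elements.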
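The procedure can be sketched with the standard bisect module, which does the "search for the proper position" step. The function name n_largest_sorted_buffer is hypothetical, and the buffer is kept in ascending order as an assumption of this sketch.

```python
import bisect


def n_largest_sorted_buffer(a, n):
    """Return the n largest elements of a in ascending order."""
    if n <= 0:
        return []
    b = sorted(a[:n])          # steps 1 and 2: sorted buffer of size n
    for x in a[n:]:            # steps 3 and 4: scan the rest of a
        if x > b[0]:
            b.pop(0)           # drop the current smallest
            bisect.insort(b, x)  # insert x at its sorted position
    return b
```

Each replacement costs O(n) because of the shifting, so the whole scan is O(m*n) in the worst case; a heap avoids the shifting.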
(This question is inspired by deque::insert() at index?, I was surprised that it wasn't covered in my algorithm lecture and that I also didn't find it mentioned in another question here and even not in Wikipedia :). I think it might be of general interest and I will answer it myself ...)
Dynamic arrays are data structures that allow addition of elements at the end in amortized constant time O(1) (by doubling the size of the allocated memory each time it needs to grow, see Amortized time of dynamic array for a short analysis).
However, insertion of a single element in the middle of the array takes linear time O(n), since in the worst case (i.e. insertion at the first position) all other elements need to be shifted by one.
If I want to insert k elements at a specific index in the array, the naive approach of performing the insert operation k times would thus lead to a complexity of O(n*k) and, if k=O(n), to a quadratic complexity of O(n²).
If I know k in advance, the solution is quite easy: expand the array if necessary (possibly reallocating space), shift the elements starting at the insertion point by k and simply copy the new elements.
But there might be situations, where I do not know the number of elements I want to insert in advance: For example I might get the elements from a stream-like interface, so I only get a flag when the last element is read.
Is there a way to insert multiple (k) elements, where k is not known in advance, into a dynamic array at consecutive positions in linear time?
In fact there is a way and it is quite simple:
First append all k elements at the end of the array. Since appending one element takes O(1) time, this will be done in O(k) time.
Second, rotate the elements into place. If you want to insert the elements at position pos, you need to rotate the subarray A[pos..n-1] by k positions to the right (or n-pos-k positions to the left, which is equivalent), where n here is the length of the array after appending. Rotation can be done in linear time by use of a reverse operation as explained in Algorithm to rotate an array in linear time. Thus the time needed for rotation is O(n).
Therefore the total time for the algorithm is O(k)+O(n)=O(n+k). If the number of elements to be inserted is in the order of n (k=O(n)), you'll get O(n+n)=O(2n)=O(n) and thus linear time.
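A minimal sketch of the append-then-rotate approach, using a Python list as the dynamic array and three in-place reversals for the rotation. The names reverse_range and insert_stream are invented for the example, and the new elements come from an arbitrary iterator so that k is not known in advance.

```python
def reverse_range(a, lo, hi):
    """Reverse a[lo..hi] in place (both bounds inclusive)."""
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1


def insert_stream(a, pos, items):
    """Insert all items from an iterator at index pos of a, in O(n + k)."""
    old_len = len(a)
    for x in items:          # amortized O(1) per append
        a.append(x)
    # a is now prefix + tail + new; we want prefix + new + tail.
    reverse_range(a, pos, old_len - 1)     # reverse the tail
    reverse_range(a, old_len, len(a) - 1)  # reverse the new elements
    reverse_range(a, pos, len(a) - 1)      # reverse both together
```

For example, starting from a = [1, 2, 3, 4], calling insert_stream(a, 1, iter([9, 8])) leaves a as [1, 9, 8, 2, 3, 4].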
You could simply allocate a new array of length k+n and insert the desired elements linearly.
newArr = new T[k + n];
// The new elements are placed right after position insertionIndex of oldArr.
for (int i = 0; i < k + n; i++)
    newArr[i] = i <= insertionIndex     ? oldArr[i]
              : i <= insertionIndex + k ? toInsert[i - insertionIndex - 1]
              :                           oldArr[i - k];
return newArr;
Each iteration takes constant time, and the loop runs k+n times, thus O(k+n) (or O(n) if k = O(n)).