Kth smallest element in an array using partition - c

Suppose you are provided with the following function declaration in the C programming language.
int partition(int a[], int n);
The function treats the first element of a[] as a pivot and rearranges the array so that all elements less than or equal to the pivot are in the left part of the array, and all elements greater than the pivot are in the right part. In addition, it moves the pivot so that the pivot is the last element of the left part. The return value is the number of elements in the left part.
The following partially completed function in the C programming language is used to find the kth smallest element in an array a[] of size n using the partition function. We assume k ≤ n.
int kth_smallest (int a[], int n, int k)
{
    int left_end = partition (a, n);
    if (left_end+1 == k) {
        return a[left_end];
    }
    if (left_end+1 > k) {
        return kth_smallest (___________);
    } else {
        return kth_smallest (___________);
    }
}
The missing argument lists are, respectively:
(1) (a, left_end, k) and (a+left_end+1, n-left_end-1, k-left_end-1)
(2) (a, left_end, k) and (a, n-left_end-1, k-left_end-1)
(3) (a, left_end+1, n-left_end-1, k-left_end-1) and (a, left_end, k)
(4) (a, n-left_end-1, k-left_end-1) and (a, left_end, k)
I found here a nice explanation about "How to find the kth largest element in an unsorted array of length n in O(n)?"
I've read about partition as it is used in quick sort. The given answer is option (1). I agree with the answer, but I need a formal explanation.
Can you explain a little bit, please?
Edit: AFAIK, the partition algorithm puts the chosen pivot in its correct position. We need to apply the partition algorithm recursively to find the kth smallest element in an array. The partition algorithm is then run on a single side of the array, either to the left or to the right of its placed pivot. I got stuck here. I'm thinking it depends on the kth index number?

It's simple. Say the pivot turns out to be the qth smallest element of the array. In that case, the partition has q-1 elements in the left part and n-q elements in the right part, while the qth element is the pivot. Now there are 3 possibilities:
If q is k, you get the answer, which is your return statement.
If q > k, then the kth smallest element is in the left part of the array, and, in the left part, it is still the kth smallest element. So we pass the left part of the array, and the same k, to find the kth smallest element there.
If q < k, then the kth smallest element is in the right part of the array. Also, since there are q elements smaller than the smallest element of this right part, the kth smallest element of the original array is the (k-q)th smallest in the right part. So we pass the right part, and k-q, to find the (k-q)th smallest element of that part.
EDIT:
Adding comments to your code:
int partition(int a[], int n); // breaks the array into 2 parts according to the pivot (the 1st element of the array): the left part is smaller than or equal to the pivot and the right part is larger than the pivot.
Now, your recursive algorithm:
int kth_smallest (int a[], int n, int k)
{
    int left_end = partition (a, n); // left_end is the pivot's final position: a[left_end] is the (left_end+1)th smallest value
    if (left_end+1 == k) { // kth smallest element found
        return a[left_end];
    }
    if (left_end+1 > k) { // kth smallest element is in the left part of the array, and is still the kth smallest there
        return kth_smallest (___________);
    } else { // kth smallest element is in the right part of the array, and is the (k - left_end - 1)th smallest there
        return kth_smallest (___________);
    }
}
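For completeness, here is a sketch of kth_smallest with option (1) filled in, under the reading used in the comments above (a[left_end] ends up as the pivot's final position); this is my own illustration, not part of the original question:

int kth_smallest(int a[], int n, int k)
{
    int left_end = partition(a, n);          /* pivot lands at index left_end */
    if (left_end + 1 == k) {
        return a[left_end];                  /* the pivot itself is the kth smallest */
    }
    if (left_end + 1 > k) {
        /* the answer lies among the left_end elements before the pivot */
        return kth_smallest(a, left_end, k);
    } else {
        /* skip the left part and the pivot; adjust k accordingly */
        return kth_smallest(a + left_end + 1, n - left_end - 1, k - left_end - 1);
    }
}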

Related

How to choose pivot value in QuickSort?

I have been studying quicksort for a few hours and am confused about choosing a pivot value. Does the pivot value need to exist in the array?
For example, if the array is 1,2,5,6, can we use the value 3 or 4 as a pivot?
We use the position of the pivot for dividing the array into sub-arrays, but I am a little confused about what the pivot's position will be after we move values < 5 to the left of the array and values > 5 to the right:
7,1,5,3,3,5,8,9,2,1
I dry-ran the algorithm with the pivot 5 and came up with the following result:
1,1,5,3,3,5,8,9,2,7
1,1,5,3,3,5,8,9,2,7
1,1,5,3,3,5,8,2,9,7
We can see that the value 2 is still not in the correct position. What am I doing wrong? Sorry if it's a silly question.
I came up with the following code, but it only works when the pivot is the leftmost element; I can't use a random pivot.
template <class T>
void quickSort(vector <T> & arr, int p, int r, bool piv_flag) {
if (p < r) {
int q, piv(p); counter++;
//piv = ((p + r) / 2); doesn't work
q = partition(arr, p, r, piv);
quickSort(arr, p, q - 1, piv_flag); //Sort left half
quickSort(arr, q + 1, r, piv_flag); //Sort right half
}
return;
}
template <class T>
int partition(vector <T> & arr, int left, int right, int piv) {
int i{ left - 0 }, j{ right + 0 }, pivot{ arr[piv] };
while (i < j) {
while (arr[i] <= pivot) { i++; }
while (arr[j] > pivot) { j--; }
if (i < j) (swap(arr[i], arr[j]));
else {
swap(arr[j], arr[piv]);
return j;
}
}
}
Thank you.
In many applications the pivot is chosen as some element of the array, but it can also be any value you use to separate the numbers in the array into two groups. If the pivot value you choose is a specific element of the array, you need to place it between those two groups after you partition the array. If not, you can just proceed with the recursive sorting process by passing the indices properly (i.e. keeping in mind that there is no pivot element in the array, just the two groups of values).
See this response to a similar question for a concise explanation of some widely-used alternatives for selecting a pivot.
The most important function of the pivot is to serve as a boundary between the groups we are trying to create during the partitioning phase of quicksort. The goal/challenge here is to create those groups in such a way that they are equal or almost equal in size, so that quicksort can work efficiently. That challenge is the reason so many pivot selection methods have been conceived (i.e. so that, at least in most cases, the numbers are separated into groups of similar size).
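As one concrete example of such a method, here is a small median-of-three helper (my own sketch, not from the linked answer, assuming the same includes/using-directive as the code in this thread): it picks the median of the first, middle and last elements, which tends to avoid the worst case on already-sorted input.

template <class T>
int medianOfThreePivot(vector<T>& arr, int left, int right) {
    int mid = left + (right - left) / 2;
    // order arr[left], arr[mid], arr[right] so that the median ends up at mid
    if (arr[mid] < arr[left])   swap(arr[mid], arr[left]);
    if (arr[right] < arr[left]) swap(arr[right], arr[left]);
    if (arr[right] < arr[mid])  swap(arr[right], arr[mid]);
    return mid;   // pass this index as piv to partition
}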
As to the second part of your question regarding how the position of the pivot will change once the partitioning is done, see below for a sample partitioning phase.
Say we have an array A with elements [4,1,5,3,3,5,8,9,2,1] and we choose the pivot to be the first element, namely 4. The letter E used below marks the end of the elements that are smaller than the pivot (i.e. the last element known to be smaller than the pivot).
   E
[4,1,5,3,3,5,8,9,2,1]
     E
[4,1,3,5,3,5,8,9,2,1]
       E
[4,1,3,3,5,5,8,9,2,1]
         E
[4,1,3,3,2,5,8,9,5,1]
           E
[4,1,3,3,2,1,8,9,5,5]
[1,1,3,3,2,4,8,9,5,5] // swap pivot with the rightmost element that is smaller than its value
After this partitioning, the elements are still not sorted, obviously. But all the elements to the left of 4 are smaller than 4, and all the ones to its right are larger than 4. To sort them, we recursively run quicksort on those groups.
Based on your code, below is a sample partitioning code based on the procedure I described above. You may also observe its execution here.
template <class T>
int partition(vector<T>& arr, int left, int right, int piv) {
int leftmostSmallerThanPivot = left; // index of the last element known to be smaller than the pivot (starts at the pivot's slot)
if(piv != left)
swap(arr[piv], arr[left]);
for(int i=left+1; i <= right; ++i) {
if(arr[i] < arr[left])
swap(arr[++leftmostSmallerThanPivot], arr[i]);
}
swap(arr[left], arr[leftmostSmallerThanPivot]);
return leftmostSmallerThanPivot;
}
template <class T>
void quickSort(vector<T>& arr, int p, int r) {
if (p < r) {
int q, piv(p);
piv = ((p + r) / 2); // works
q = partition(arr, p, r, piv);
quickSort(arr, p, q - 1); //Sort left half
quickSort(arr, q + 1, r); //Sort right half
}
}
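A minimal driver for the code above might look like this (my own example, assuming the usual <vector>/<iostream> includes and using namespace std, as in the snippets above):

int main() {
    vector<int> v{7, 1, 5, 3, 3, 5, 8, 9, 2, 1};
    quickSort(v, 0, (int)v.size() - 1);
    for (int x : v) cout << x << ' ';   // prints 1 1 2 3 3 5 5 7 8 9
    cout << '\n';
}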

Smallest Lexicographic Subsequence of size k in an Array

Given an array of integers, find the smallest lexicographic subsequence of size k.
Ex: Array: [3,1,5,3,5,9,2], k = 4
Expected solution: 1 3 5 2
The problem can be solved in O(n) by maintaining a double-ended queue (deque). We iterate over the elements from left to right and ensure that the deque always holds the smallest lexicographic sequence up to that point. We should only pop off an element if the current element is smaller than the element at the back of the deque and the number of elements in the deque plus the number remaining to be processed is at least k.
vector<int> smallestLexo(vector<int> s, int k) {
deque<int> dq;
for(int i = 0; i < s.size(); i++) {
while(!dq.empty() && s[i] < dq.back() && (dq.size() + (s.size() - i - 1)) >= k) {
dq.pop_back();
}
dq.push_back(s[i]);
}
return vector<int> (dq.begin(), dq.end());
}
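For the example in the question, a small driver along these lines (my own sketch, assuming the function above together with the usual <vector>, <deque> and <iostream> includes and using namespace std) prints 1 3 5 2:

int main() {
    std::vector<int> a{3, 1, 5, 3, 5, 9, 2};
    for (int x : smallestLexo(a, 4)) std::cout << x << ' ';   // prints 1 3 5 2
    std::cout << '\n';
}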
Here is a greedy algorithm that should work:
Choose Next Number ( lastChosenIndex, k ) {
minNum = the smallest number from lastChosenIndex to ArraySize-k
//Now we know this number is the best possible candidate to be the next number.
lastChosenIndex = the earliest occurrence of minNum after lastChosenIndex
//do the same process for k-1
Choose Next Number ( lastChosenIndex, k-1 )
}
The algorithm above has high complexity.
But we can pre-sort all the array elements, paired with their array indices, and do the same process greedily using a single loop.
Since we used sorting, the complexity will still be O(n log n).
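A direct O(n*k) translation of this recursive greedy (my own sketch, before the sorting optimization) might look like:

#include <vector>

std::vector<int> smallestLexoGreedy(const std::vector<int>& a, int k) {
    std::vector<int> out;
    int n = (int)a.size();
    int start = 0;                              // first index still allowed to be picked
    for (int need = k; need > 0; --need) {
        int best = start;
        // the chosen element must leave need-1 elements to its right
        for (int i = start + 1; i <= n - need; ++i)
            if (a[i] < a[best]) best = i;
        out.push_back(a[best]);
        start = best + 1;
    }
    return out;
}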
Ankit Joshi's answer works. But I think it can be done with just a vector, not using a deque, since all the operations used are available on a vector too. Also, in Ankit Joshi's answer the deque can contain extra elements; we have to manually pop off those elements before returning. Add these lines before returning:
while(dq.size() > k)
{
dq.pop_back();
}
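Putting both remarks together, a vector-only version (my own sketch of what this answer describes) could look like:

#include <vector>

std::vector<int> smallestLexoVec(const std::vector<int>& s, int k) {
    std::vector<int> st;                       // plays the role of the deque; back() is the most recent element
    int n = (int)s.size();
    for (int i = 0; i < n; ++i) {
        while (!st.empty() && s[i] < st.back() &&
               (int)st.size() + (n - i - 1) >= k)
            st.pop_back();
        st.push_back(s[i]);
    }
    st.resize(k);                              // drop any extra elements beyond k
    return st;
}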
It can be done with RMQ (range minimum query) in O(n) + K·log(n).
Construct an RMQ structure in O(n).
Now build the sequence where the ith element (for i in [1..K]) is the smallest value in positions [x(i-1)+1, n-(K-i)], where x0 = 0 and xi is the position of the element chosen in step i.
If I've understood the question right, here's a DP algorithm that should work, but it takes O(NK) time.
//k is the given size and n is the size of the array
create an array dp[k+1][n+1]
initialize the first column with the maximum integer value (we'll need it later)
and the first row with 0's (keep element dp[0][0] = 0)
now run the loop while building the solution
for(int i=1; i<=k; i++) {
for(int j=1; j<=n; j++) {
//if the number of elements in the array is less than the size required (K)
//initialize it with the maximum integer value
if( j < i ) {
dp[i][j] = MAX_INT_VALUE;
}else {
//last minimum of size k-1 with present element or last minimum of size k
dp[i][j] = minimum (dp[i-1][j-1] + arr[j-1], dp[i][j-1]);
}
}
}
//dp[k][n] contains the solution
return dp[k][n];
The last cell of the table contains the solution.
I suggest you try a modified merge sort. The modification is in the merge step: discard duplicate values and select the smallest four.
The complexity is O(n log n).
I am still thinking about whether the complexity can be O(n).

Skiena's Quick Sort implementation

I find it hard to understand Skiena's quicksort. Specifically, what is he doing with the partition function, especially the firsthigh variable?
quicksort(item_type s[], int l, int h) {
int p; /* index of partition */
if ((h - l) > 0) {
p = partition(s, l, h);
quicksort(s, l, p-1);
quicksort(s, p+1, h);
}
}
We can partition the array in one linear scan for a particular pivot element by maintaining three sections of the array: less than the pivot (to the left of firsthigh), greater than or equal to the pivot (between firsthigh and i), and unexplored (to the right of i), as implemented below:
int partition(item_type s[], int l, int h) {

int i; /* counter */
int p; /* pivot element index */
int firsthigh; /* divider position for pivot element */
p = h;
firsthigh = l;
for (i = l; i <h; i++) {
if (s[i] < s[p]) {
swap(&s[i],&s[firsthigh]);
firsthigh ++;
}
swap(&s[p],&s[firsthigh]);
return(firsthigh);
}
I recommend following the reasoning with pencil and paper while reading through this answer and the example case it considers.
Some braces are missing from the snippet (as posted, the final swap and return end up inside the for loop):
int partition(item_type s[], int l, int h)
{
int i;/* counter */
int p;/* pivot element index */
int firsthigh;/* divider position for pivot element */
p = h;
firsthigh = l;
for (i = l; i < h; i++) {
if (s[i] < s[p]) {
swap(s[i], s[firsthigh]);
firsthigh++;
}
}
swap(s[p], s[firsthigh]);
return(firsthigh);
}
void quicksort(item_type s[], int l, int h)
{
int p; /* index of partition */
if ((h - l)>0) {
p = partition(s, l, h);
quicksort(s, l, p - 1);
quicksort(s, p + 1, h);
}
}
Anyway, the partition function works as follows: suppose we have the array { 2,4,5,1,3 } of size 5. The algorithm grabs the last element, 3, as the pivot and starts exploring the items iteratively:
2 is encountered first. Since 2 is less than the pivot element 3, it is swapped with position 0, pointed to by firsthigh. This has no effect since 2 is already at position 0.
2,4,5,1,3
^
firsthigh is incremented since 2 is now a stable value at that position.
Then 4 is encountered. This time 4 is greater than 3 (the pivot), so no swap is necessary. Notice that firsthigh keeps pointing at 4. The same happens for 5.
When 1 is encountered, this value should be placed after 2, so it is swapped with the position pointed to by firsthigh, i.e. with 4's position:
2,4,5,1,3
  ^   ^ swap
2,1,5,4,3
    ^ now firsthigh points here
When the elements end, the pivot element is swapped with firsthigh's position and therefore we get
2,1,| 3,| 4,5
Notice how the values less than the pivot are put on the left while the values greater than the pivot remain on the right: exactly what is expected of a partition function.
The position of the pivot element is returned and the process is repeated on the subarrays on the left and right of the pivot until a group of 0 elements is encountered (the if condition is the bottom of the recursion).
Therefore firsthigh means: the first element greater than the pivot that I know of. In the example above firsthigh is put on the first element since we still don't know if that element is greater or less than the pivot
2,4,5,1,3
^
As soon as we realize 2 is not the first element greater than the pivot, or we swap a less-than-the-pivot element into that position, we keep our invariant valid: ok, advance firsthigh and consider 4 as the first element greater than the pivot. This gives us the three sections cited in the textbook.
At all times, everything strictly to the left of firstHigh is known to be less than the pivot (notice that there are initially no elements in this set), and everything at or to the right of it is either unknown, or known to be >= the pivot. Think of firstHigh as the next place where we can put a value lower than the pivot.
This algorithm is very similar to the in-place algorithm you would use to delete all items that are >= the pivot while "compacting" the remaining items as far to the left as possible. For the latter, you would maintain two indices l and firstHigh (which you could think of as from and to, respectively) that both start at 0, and walk l through the array; whenever you encounter an s[l] that should not be removed, you shunt it as far left as possible: i.e., you copy it to s[firstHigh] and then you increment firstHigh. This is safe because we always have firstHigh <= l. The only difference here is that we can't afford to overwrite the deleted (possibly >=-pivot) item currently residing at s[firstHigh], so we swap the two items instead.
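As a rough illustration of that compaction idea (my own sketch, not Skiena's code), the "delete everything >= pivot" loop would look like the following; partition differs only in swapping instead of overwriting:

// Keep only the elements smaller than pivot, compacted to the left.
// firstHigh is the next slot to fill; firstHigh <= i at all times.
int compactLess(int s[], int l, int h, int pivot) {
    int firstHigh = l;
    for (int i = l; i < h; ++i)
        if (s[i] < pivot)
            s[firstHigh++] = s[i];   // overwriting is safe because firstHigh <= i
    return firstHigh;                // new logical end of the kept elements
}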

quicksort code understanding

I have quicksort code that is supposed to run on the text "B A T T A J U S" (ignore the blanks), but I don't seem to understand the code that well.
void quicksort (itemType a[], int l, int r)
{
int i, j; itemType v;
if (r>l)
{
v = a[r]; i = l-1; j = r;
for (;;)
{
while (a[++i] < v);
while (a[--j] >= v);
if (i >= j) break;
swap(a,i,j);
}
swap(a,i,r);
quicksort(a,l,i-1);
quicksort(a,i+1,r);
}
}
I can explain what I understand: the first if checks whether r > l, which in this case it is, since s is greater than b. Then I get a little confused: v is set equal to a[r]; does this mean S, since S is all the way to the right? Then i is set to just outside the array, since it's l-1 = -1 (so it's undefined, I assume). Then j is set equal to r, but is that the position r, as in S?
I kind of don't understand which values are set to what, whether a[r] is the letter at that position or anything else. Hopefully someone can explain how the first swap works, so I can hopefully learn this.
It is probably better to start with an understanding of the QuickSort algorithm, and then see how the code corresponds to it, than to study the code to try to figure out how QuickSort works. Basic QuickSort (which is what you have) is in fact a pretty simple algorithm. To sort an array A:
If the length of A is less than 2 then the array is already sorted. Otherwise,
Select any element of A to be a "pivot element".
Rearrange the other elements as needed so that all those that are less than the pivot are at the beginning of A, and those that are greater than or equal to the pivot are at the end. (This particular version also puts the pivot itself between the two, which is common but not strictly necessary; it could simply be included in the upper subarray, and the algorithm would still work.)
Apply the QuickSort procedure to each of the two sub-arrays produced by (3).
Your particular code chooses the right-most element of each (sub)array as the pivot element, and at step (4) it excludes the pivot from the sub-arrays to be recursively sorted.
Quicksort works by separating your array into a "left" subarray which contains only values strictly less than an arbitrarily chosen pivot value and a "right" subarray that contains only elements that are greater than or equal to the pivot. Once the array has been divided like this, each of the two subarrays is sorted using the same algorithm. Here is how this applies to your code:
v = a[r] sets the pivot value to the last element in the array. This works well since the array is presumably unsorted to begin with, so a[r] is as good a value as any.
while(a[++i] < v) ; keeps advancing i and stops at the first element of the left sub-array that is greater than or equal to the pivot, v. When this loop ends, i is the index of an element that should be in the right sub-array rather than the left.
while(a[--j] >= v) ; does the same thing, except that it stops at the last element of the right sub-array that is strictly less than the pivot, v. When this loop ends, j is the index of an element that should be in the left sub-array rather than the right.
Whenever we find a pair of elements that are in the wrong sub-arrays, we swap them.
When all of the elements in the array have been partitioned (i meets j), we swap the pivot with the element at index i (which is now guaranteed to belong in the right sub-array).
Since the pivot is guaranteed to be in the right position (left sub-array is strictly less and right sub-array is greater than or equal), we need to sort the sub-arrays but not the pivot. That is why the recursive calls use indices l,i-1 and i+1,r, leaving the pivot at index i.
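To see the letters move concretely, here is a small self-contained sketch (my own, assuming itemType is char and an index-based swap helper) that runs the quoted routine on B A T T A J U S. One caveat: the quoted inner loop while (a[--j] >= v) can step past the left end of a subarray when the pivot is its smallest element (which happens here on the B,A,J,A subarray), so the sketch adds a j > l guard; everything else is unchanged.

#include <iostream>

typedef char itemType;

void swap(itemType a[], int i, int j) { itemType t = a[i]; a[i] = a[j]; a[j] = t; }

void quicksort(itemType a[], int l, int r) {
    if (r > l) {
        itemType v = a[r];                      // pivot: rightmost element
        int i = l - 1, j = r;
        for (;;) {
            while (a[++i] < v);
            while (j > l && a[--j] >= v);       // guard keeps j inside the subarray
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, r);                          // put the pivot into its final slot i
        quicksort(a, l, i - 1);
        quicksort(a, i + 1, r);
    }
}

int main() {
    itemType a[] = {'B', 'A', 'T', 'T', 'A', 'J', 'U', 'S'};
    quicksort(a, 0, 7);
    for (itemType c : a) std::cout << c << ' ';   // prints A A B J S T T U
    std::cout << '\n';
}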
I can't offer a solution in that exact form; that code is overly complicated to my way of thinking.
I'm also not sure whether what I'm proposing is a bubble sort or a modified bubble sort, but to me it's just easier. My added comment is that quicksort() calls itself, so it is recursive. Not good in my book for something as simple as a sort. This all depends on what you need in terms of size and efficiency. If you're sorting many terms, then my proposed sort is not the best.
for(i = 0; i < (n - 1); i++) {
for(j = (i + 1); j < n; j++) {
if(value[i] > value[j]) {
tmp = value[i];
value[i] = value[j];
value[j] = tmp;
}
}
}
Where:
n is the total number of elements,
i, j, and tmp are integers,
value[] is the array of integers to sort.

Given 2 sorted arrays of integers, find the nth largest number in sublinear time [duplicate]

Possible Duplicate:
How to find the kth smallest element in the union of two sorted arrays?
This is a question one of my friends told me he was asked while interviewing, and I've been thinking about a solution.
Sublinear time implies logarithmic to me, so perhaps some kind of divide and conquer method. For simplicity, let's say both arrays are the same size and that all elements are unique.
I think this is two concurrent binary searches on the subarrays A[0..n-1] and B[0..n-1], which is O(log n).
Given sorted arrays, you know that the nth largest will appear somewhere before or at A[n-1] if it is in array A, or B[n-1] if it is in array B.
Consider item at index a in A and item at index b in B.
Perform binary search as follows (pretty rough pseudocode, not taking in account 'one-off' problems):
If a + b > n, then reduce the search set
if A[a] > B[b] then b = b / 2, else a = a / 2
If a + b < n, then increase the search set
if A[a] > B[b] then b = 3/2 * b, else a = 3/2 * a (halfway between a and previous a)
If a + b = n then the nth largest is max(A[a], B[b])
I believe worst case O(ln n), but in any case definitely sublinear.
I believe that you can solve this problem using a variant on binary search. The intuition behind this algorithm is as follows. Let the two arrays be A and B and let's assume for the sake of simplicity that they're the same size (this isn't necessary, as you'll see). For each array, we can construct parallel arrays Ac and Bc such that for each index i, Ac[i] is the number of elements in the two arrays that are no larger than A[i] and Bc[i] is the number of elements in the two arrays that are no larger than B[i]. If we could construct these arrays efficiently, then we could find the kth smallest element efficiently by doing binary searches on both Ac and Bc to find the value k. The corresponding entry of A or B for that entry is then the kth smallest element. The binary search is valid because the two arrays Ac and Bc are sorted, which I think you can convince yourself of pretty easily.
Of course, this solution doesn't work in sublinear time because it takes O(n) time to construct the arrays Ac and Bc. The question then is - is there some way that we can implicitly construct these arrays? That is, can we determine the values in these arrays without necessarily constructing each element? I think that the answer is yes, using this algorithm. Let's begin by searching array A to see if it has the kth smallest value. We know for a fact that the kth smallest value can't appear in array A after position k (assuming all the elements are distinct). So let's focus just on the first k elements of array A. We'll do a binary search over these values as follows. Start at position k/2; this is the k/2th smallest element in array A. Now do a binary search in array B to find the largest value in B smaller than this value and look at its position in the array; this is the number of elements in B smaller than the current value. If we add up the positions of the elements in A and B, we have the total number of elements in the two arrays smaller than the current element. If this is exactly k, we're done. If this is less than k, then we recurse in the upper half of the first k elements of A, and if this is greater than k we recurse in the lower half of the first k elements of A. Eventually, we'll either find that the kth smallest element is in array A, in which case we're done; otherwise, repeat this process on array B.
The runtime for this algorithm is as follows. The search of array A does a binary search over k elements, which takes O(lg k) iterations. Each iteration costs O(lg n), since we have to do a binary search in B. This means that the total time for this search is O(lg k · lg n). The time to do this in array B is the same, so the net runtime for the algorithm is O(lg k · lg n) = O(lg² n) = o(n), which is sublinear.
This is quite a similar answer to Kirk's.
Let Find(nth, A, B) be a function that returns the nth number, with |A| + |B| >= nth. This is simple pseudocode that does not handle the case where one of the arrays is small (fewer than 3 elements); in that case one or two binary searches in the larger array are enough to find the needed element.
Find( nth, A, B )
If A.last() <= B.first():
return B[nth - A.size()]
If B.last() <= A.first():
return A[nth - B.size()]
Let a and b indexes of middle elements of A and B
Assume that A[a] <= B[b] (if not swap arrays)
if nth <= a + b:
return Find( nth, A, B.first_half(b) )
return Find( nth - a, A.second_half(a), B )
It is O(log|A| + log|B|), and because the input arrays can be taken to have n elements each, it is O(log n) complexity.
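For reference, here is a compact C++ sketch of the kth-smallest-of-two-sorted-arrays idea (the names and the exact halving rule are mine, not from this answer): each round compares the k/2-th candidates of the two arrays and discards the block that cannot contain the answer, giving O(log k). For the nth largest, call it with k = |A| + |B| - n + 1.

#include <vector>
#include <algorithm>
#include <iostream>

// k-th smallest (1-based) element of the union of two ascending vectors.
int kthOfTwoSorted(const std::vector<int>& a, const std::vector<int>& b, int k) {
    int i = 0, j = 0;                                   // current fronts of a and b
    while (true) {
        if (i == (int)a.size()) return b[j + k - 1];    // a exhausted
        if (j == (int)b.size()) return a[i + k - 1];    // b exhausted
        if (k == 1) return std::min(a[i], b[j]);
        int da = std::min(k / 2, (int)a.size() - i);    // tentative take from a
        int db = std::min(k / 2, (int)b.size() - j);    // tentative take from b
        if (a[i + da - 1] <= b[j + db - 1]) { i += da; k -= da; }
        else                                { j += db; k -= db; }
    }
}

int main() {
    std::vector<int> a{1, 3, 5, 7, 9}, b{2, 4, 6, 8, 10};
    std::cout << kthOfTwoSorted(a, b, 7) << '\n';       // prints 7, the 7th smallest of the union
}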
int[] a = new int[] { 11, 9, 7, 5, 3 };
int[] b = new int[] { 12, 10, 8, 6, 4 };
int n = 7;
int result = 0;
if (n > (a.Length + b.Length))
throw new Exception("n is greater than a.Length + b.Length");
else if (n < (a.Length + b.Length) / 2)
{
int ai = 0;
int bi = 0;
for (int i = n; i > 0; i--)
{
// find the highest from a or b
if (ai < a.Length)
{
if (bi < b.Length)
{
if (a[ai] > b[bi])
{
result = a[ai];
ai++;
}
else
{
result = b[bi];
bi++;
}
}
else
{
result = a[ai];
ai++;
}
}
else
{
if (bi < b.Length)
{
result = b[bi];
bi++;
}
else
{
// error, n is greater than a.Length + b.Length
}
}
}
}
else
{
// go in reverse
int ai = a.Length - 1;
int bi = b.Length - 1;
for (int i = a.Length + b.Length - n; i >= 0; i--)
{
// find the lowest from a or b
if (ai >= 0)
{
if (bi >= 0)
{
if (a[ai] < b[bi])
{
result = a[ai];
ai--;
}
else
{
result = b[bi];
bi--;
}
}
else
{
result = a[ai];
ai--;
}
}
else
{
if (bi >= 0)
{
result = b[bi];
bi--;
}
else
{
// error, n is greater than a.Length + b.Length
}
}
}
}
Console.WriteLine("{0} th highest = {1}", n, result);
Sublinear in what, though? You can't have an algorithm that doesn't check at least n elements; even verifying a solution would require checking that many. But the size of the problem here should surely mean the size of the arrays, so an algorithm that only checks n elements is sublinear.
So I think there's no trick here: start with the list with the smaller starting element and advance until you either (a small sketch follows the list):
Reach the nth element, and you're done.
Find that the next element is bigger than the next element in the other list, at which point you switch to the other list.
Run out of elements, and switch.
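A minimal sketch of that walk (my own, written for the nth smallest of two ascending arrays; for the nth largest, walk from the ends instead, or use n' = |A| + |B| - n + 1):

#include <vector>

// Walk both ascending arrays, always consuming the smaller front value,
// until the nth element (1-based) of the merged order has been consumed.
// Assumes 1 <= n <= A.size() + B.size(). Runs in O(n).
int nthSmallestByWalk(const std::vector<int>& A, const std::vector<int>& B, int n) {
    int i = 0, j = 0;
    while (true) {
        int next;
        if (j == (int)B.size() || (i < (int)A.size() && A[i] <= B[j])) next = A[i++];
        else                                                           next = B[j++];
        if (--n == 0) return next;          // we just consumed the nth smallest
    }
}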
