I need to implement a quicksort algorithm that uses a random pivot; I'm working with big matrices, so I can't afford the worst case.
Now, I've found this implementation that works correctly, but it uses the first element as the pivot.
I've modified it to fit my scenario (I'm working with Sparse Matrices, and I need to sort the elements by "row index, col index") and this is what I have:
void quicksortSparseMatrix(struct sparsematrix *matrix,int first,int last){
    int i, j, pivot, temp_I, temp_J;
    double temp_val;
    if(first<last){
        pivot=first; //(rand() % (last - first + 1)) + first;
        i=first;
        j=last;
        while(i<j){
            while(lessEqual(matrix,i, pivot)&&i<last)
                i++;
            while(greater(matrix,j, pivot))
                j--;
            if(i<j){
                temp_I = matrix->I[i];
                temp_J = matrix->J[i];
                temp_val = matrix->val[i];
                matrix->I[i] = matrix->I[j];
                matrix->J[i] = matrix->J[j];
                matrix->val[i] = matrix->val[j];
                matrix->I[j]=temp_I;
                matrix->J[j]=temp_J;
                matrix->val[j]=temp_val;
            }
        }
        temp_I = matrix->I[pivot];
        temp_J = matrix->J[pivot];
        temp_val = matrix->val[pivot];
        matrix->I[pivot] = matrix->I[j];
        matrix->J[pivot] = matrix->J[j];
        matrix->val[pivot] = matrix->val[j];
        matrix->I[j]=temp_I;
        matrix->J[j]=temp_J;
        matrix->val[j]=temp_val;
        quicksortSparseMatrix(matrix,first,j-1);
        quicksortSparseMatrix(matrix,j+1,last);
    }
}
Now, the problem is that some of the matrices I'm working with are almost sorted, and the algorithm runs extremely slowly. I want to modify my algorithm to use a random pivot, but if I apply the change commented out in the code above, pivot=(rand() % (last - first + 1)) + first;, the algorithm does not sort the data correctly.
Can anyone help me figure out how to change the algorithm to use a random pivot and sort the data correctly?
EDIT: this is the struct sparsematrix definition. I don't think you need it, but for completeness:
struct sparsematrix {
    int M, N, nz;
    int *I, *J;
    double *val;
};
Pivot should be a value, not an index. The first comparison should be less-than (not less-than-or-equal), which also eliminates the need to check i < last. After swapping, there should be i++ and j--. For this variation of the Hoare partition scheme, the last two lines should be quicksortSparseMatrix(matrix,first,j); and quicksortSparseMatrix(matrix,i,last);. Example code for an array:
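Applied to the sparse-matrix struct from the question, those changes might look roughly like the sketch below. It assumes entries are ordered by (I, J) and replaces the question's `lessEqual`/`greater` with a comparison against a copied pivot key; the helpers `entry_cmp` and `swap_entries` are hypothetical names, not part of the original code.

```c
#include <stdlib.h>

struct sparsematrix {
    int M, N, nz;
    int *I, *J;
    double *val;
};

/* Hypothetical helper: compare entry k with the pivot key (pi, pj), row first, then column. */
static int entry_cmp(const struct sparsematrix *m, int k, int pi, int pj)
{
    if (m->I[k] != pi) return m->I[k] < pi ? -1 : 1;
    if (m->J[k] != pj) return m->J[k] < pj ? -1 : 1;
    return 0;
}

/* Hypothetical helper: swap entries a and b in all three parallel arrays. */
static void swap_entries(struct sparsematrix *m, int a, int b)
{
    int ti = m->I[a], tj = m->J[a];
    double tv = m->val[a];
    m->I[a] = m->I[b]; m->J[a] = m->J[b]; m->val[a] = m->val[b];
    m->I[b] = ti;      m->J[b] = tj;      m->val[b] = tv;
}

void quicksortSparseMatrix(struct sparsematrix *m, int first, int last)
{
    if (first >= last)
        return;
    /* Copy the pivot's key (a value, not an index): swaps may move that entry. */
    int k = first + rand() % (last - first + 1);
    int pi = m->I[k], pj = m->J[k];
    int i = first, j = last;
    while (i <= j) {
        while (entry_cmp(m, i, pi, pj) < 0) i++;   /* strict less-than */
        while (entry_cmp(m, j, pi, pj) > 0) j--;
        if (i > j)
            break;
        swap_entries(m, i, j);
        i++;                                       /* advance both after a swap */
        j--;
    }
    quicksortSparseMatrix(m, first, j);
    quicksortSparseMatrix(m, i, last);
}
```

Because the first pass always swaps at least once, j ends below last and i above first, so both recursive ranges are strictly smaller whichever element is picked as the pivot.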
#include <stdlib.h>

void QuickSort(int *a, int lo, int hi)
{
    int i, j;
    int p, t;
    if(lo >= hi)
        return;
    p = a[lo + 1 + (rand() % (hi - lo))];
    i = lo;
    j = hi;
    while (i <= j){
        while (a[i] < p)i++;
        while (a[j] > p)j--;
        if (i > j)
            break;
        t = a[i];
        a[i] = a[j];
        a[j] = t;
        i++;
        j--;
    }
    QuickSort(a, lo, j);
    QuickSort(a, i, hi);
}
A merge sort on an array of indexes into the rows of the (I, J, val) arrays may be faster: there are more moves of the indexes, but fewer comparisons of matrix rows. Merge sort needs a second temporary array of indexes.
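A rough sketch of that idea, assuming the same `struct sparsematrix`: an array `idx` of entry positions is sorted by (row, column) with a top-down merge sort, using a second array `tmp` as scratch space. The function names `entry_le`, `msort_idx` and `sort_indices` are hypothetical.

```c
#include <string.h>

struct sparsematrix {
    int M, N, nz;
    int *I, *J;
    double *val;
};

/* Hypothetical helper: nonzero if entry a may come before (or tie with) entry b. */
static int entry_le(const struct sparsematrix *m, int a, int b)
{
    if (m->I[a] != m->I[b]) return m->I[a] < m->I[b];
    return m->J[a] <= m->J[b];
}

/* Sort idx[lo..hi) by (row, col); tmp must hold at least hi elements. */
static void msort_idx(const struct sparsematrix *m, int *idx, int *tmp, int lo, int hi)
{
    if (hi - lo < 2)
        return;
    int mid = lo + (hi - lo) / 2;
    msort_idx(m, idx, tmp, lo, mid);
    msort_idx(m, idx, tmp, mid, hi);
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = entry_le(m, idx[i], idx[j]) ? idx[i++] : idx[j++];
    while (i < mid) tmp[k++] = idx[i++];
    while (j < hi)  tmp[k++] = idx[j++];
    memcpy(idx + lo, tmp + lo, (size_t)(hi - lo) * sizeof *idx);
}

/* Fill idx with 0..nz-1 and sort it; only the indexes move, the entries stay put. */
void sort_indices(const struct sparsematrix *m, int *idx, int *tmp)
{
    for (int k = 0; k < m->nz; k++)
        idx[k] = k;
    msort_idx(m, idx, tmp, 0, m->nz);
}
```

Afterwards the entries can be visited in order as `I[idx[k]]`, `J[idx[k]]`, `val[idx[k]]`, or copied into new arrays in one pass.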
I am currently writing a program to analyze sorting algorithms. My inputs range from 1,000 up to 1,000,000 numbers.
Currently I'm having some problems with the Quick Sort function. With an input of 1,000,000 similar numbers (values between 1 and 10), this code throws an error (0xC00000FD), which seems to be a stack overflow exception. I don't know how to lower the number of recursive calls, or how to increase the stack size so that more recursive calls fit. Here is the code for the Quick Sort:
void swap(int *xp, int *yp)
{
    int temp = *xp;
    *xp = *yp;
    *yp = temp;
}

int partition (int arr[], int low, int high)
{
    int pivot = arr[(low+high)/2];
    int i = (low - 1);
    for (int j = low; j <= high - 1; j++)
    {
        if (arr[j] < pivot)
        {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

void quicksort(int A[], int l, int h)
{
    if (l < h) {
        int p = partition(A, l, h);
        quicksort(A, l, p - 1);
        quicksort(A, p + 1, h);
    }
}
If you get stack overflows during recursion, it means that your recursion is broken. Recursion in general should be avoided, since it has a huge potential for creating slow and dangerous algorithms. If you are a beginner programmer, I would strongly advise you to simply forget that you ever heard about recursion and stop reading here.
The only time it is reasonably acceptable is when the recursive call is placed at the end of the function, so-called "tail call recursion". This is pretty much the only form of recursion that the compiler can actually optimize and replace with a loop.
If the compiler cannot perform tail-call optimization, a function is actually called each time you recurse. That means the stack keeps piling up and you also pay the function call overhead. This is both needlessly slow and unacceptably dangerous. All recursive functions you ever write must therefore be disassembled for the target, to check that the code has not gone haywire.
Since this code seems to be taken from this site https://www.geeksforgeeks.org/iterative-quick-sort/, the site already describes most of these problems with the code. It has a "quickSortIterative" function at the bottom, which is a much better implementation.
My take is that the aim of the tutorial is to show you some broken code (the code in your question) then demonstrate how to write it correctly, by getting rid of the recursion.
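For comparison, here is a minimal sketch of an iterative quicksort that uses an explicit stack of (low, high) ranges instead of the call stack. It is not the geeksforgeeks code: it assumes a Hoare partition around the middle element, and `quicksort_iterative` is a hypothetical name. The larger side is pushed and the smaller side is handled immediately, which keeps the number of pending ranges at about log2(n).

```c
static void swap_int(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

/* Sort a[lo..hi] (inclusive) without recursion. */
void quicksort_iterative(int a[], int lo, int hi)
{
    int stack_lo[64], stack_hi[64];   /* ample: at most ~32 ranges pend for int indexes */
    int top = 0;
    stack_lo[top] = lo;
    stack_hi[top] = hi;
    top++;
    while (top > 0) {
        top--;
        lo = stack_lo[top];
        hi = stack_hi[top];
        while (lo < hi) {
            /* Hoare partition around the middle element */
            int p = a[lo + (hi - lo) / 2];
            int i = lo - 1, j = hi + 1;
            for (;;) {
                while (a[++i] < p);
                while (a[--j] > p);
                if (i >= j)
                    break;
                swap_int(&a[i], &a[j]);
            }
            /* now a[lo..j] <= p <= a[j+1..hi]; push the larger side, keep the smaller */
            if (j - lo < hi - j) {
                stack_lo[top] = j + 1; stack_hi[top] = hi; top++;
                hi = j;
            } else {
                stack_lo[top] = lo; stack_hi[top] = j; top++;
                lo = j + 1;
            }
        }
    }
}
```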
Stack overflow can be avoided by only recursing on the smaller partition:
void quicksort(int A[], int l, int h)
{
    while (l < h) {
        int p = partition(A, l, h);
        if((p - l) <= (h - p)){
            quicksort(A, l, p - 1);
            l = p + 1;
        } else {
            quicksort(A, p + 1, h);
            h = p - 1;
        }
    }
}
However, worst-case time complexity remains O(n^2), and the Lomuto partition scheme used in the question's code has issues with a large number of duplicate values. The Hoare partition scheme doesn't have this issue (in fact, more duplicates result in less time).
https://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
Example code with the partition logic inside quicksort:
void quicksort(int a[], int lo, int hi)
{
    int p;
    int i, j;
    while (lo < hi){
        p = a[lo + (hi - lo) / 2];
        i = lo - 1;
        j = hi + 1;
        while (1){
            while (a[++i] < p);
            while (a[--j] > p);
            if (i >= j)
                break;
            swap(a+i, a+j);
        }
        if(j - lo < hi - j){
            quicksort(a, lo, j);
            lo = j+1;
        } else {
            quicksort(a, j+1, hi);
            hi = j;
        }
    }
}
Let us suppose we have two arrays A[] and B[]. Each array contains n distinct integers which are not sorted. We need to find the kth ranked element in the union of the two arrays as efficiently as possible.
(Please don't post answers about merging the arrays, sorting them, and then returning the kth index of the merged array.)
You can use a selection algorithm to find the kth item in O(N) time, where N is the sum of the sizes of the arrays (expected O(N) for quickselect with a random pivot, worst-case O(N) for median of medians). Obviously, you treat the two arrays as a single large array.
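A minimal sketch of treating the two arrays as one without copying them, assuming quickselect with a random pivot and a Lomuto partition. The hypothetical helper `at()` maps an index of the combined sequence onto A or B, so both arrays are partitioned in place (their contents get reordered). `kth_of_union` is also a hypothetical name, and k is 0-based.

```c
#include <stdlib.h>

/* Hypothetical helper: address of element i of the virtual concatenation A ++ B. */
static int *at(int *a, int na, int *b, int i)
{
    return i < na ? &a[i] : &b[i - na];
}

static void swap_ptr(int *x, int *y)
{
    int t = *x; *x = *y; *y = t;
}

/* Return the k-th smallest element (k = 0 is the minimum) of A ++ B.
   Assumes 0 <= k < na + nb; reorders A and B. */
int kth_of_union(int *a, int na, int *b, int nb, int k)
{
    int lo = 0, hi = na + nb - 1;
    for (;;) {
        if (lo == hi)
            return *at(a, na, b, lo);
        /* move a random pivot to the end of the current range */
        int r = lo + rand() % (hi - lo + 1);
        swap_ptr(at(a, na, b, r), at(a, na, b, hi));
        int pivot = *at(a, na, b, hi);
        int store = lo;
        for (int j = lo; j < hi; j++)
            if (*at(a, na, b, j) < pivot)
                swap_ptr(at(a, na, b, j), at(a, na, b, store++));
        swap_ptr(at(a, na, b, store), at(a, na, b, hi));
        /* the pivot now sits at its sorted position 'store'; keep only k's side */
        if (k == store)
            return pivot;
        if (k < store)
            hi = store - 1;
        else
            lo = store + 1;
    }
}
```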
Union of arrays can be done in linear time. I am skipping that part.
You can use the partition() algorithm that is used in quick sort. In quick sort, the function has to recurse into both branches. Here, however, we make the recursive call conditionally on just one side, so the recursion has only one branch.
Main concept: partition() places the chosen pivot element at its final sorted position. We can use this property to select the half of the array we are interested in and recurse only on that half, which avoids sorting the entire array.
I have written the code below based on this concept. Assumption: rank = 0 denotes the smallest element in the array.
#include <stdio.h>

void swap (int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

int partition (int a[], int start, int end)
{
    /* choose a fixed pivot for now */
    int pivot = a[end];
    int i = start, j;
    for (j = start; j <= end-1; j++) {
        if (a[j] < pivot) {
            swap (&a[i], &a[j]);
            i++;
        }
    }
    /* Now swap the ith element with the pivot */
    swap (&a[i], &a[end]);
    return i;
}

int find_k_rank (int a[], int start, int end, int k)
{
    int x = partition (a, start, end);
    if (x == k) {
        return a[x];
    } else if (k < x) {
        return find_k_rank (a, start, x-1, k);
    } else {
        return find_k_rank (a, x+1, end, k);
    }
}

int main()
{
    int a[] = {10,2,7,4,8,3,1,5,9,6};
    int N = 10;
    int rank = 3;
    printf ("%d\n", find_k_rank (a, 0, N-1, rank));
}
void mergeSubArr(int arr[], int lb, int mid, int ub){
    int temp[size], i = lb, j = mid, k = 0;
    while(i<mid && j<=ub){
        if(arr[i] <= arr[j])
            temp[k] = arr[i++];
        else
            temp[k] = arr[j++];
        k++;
    }
    while(i<mid)
        temp[k++] = arr[i++];
    while(j<=ub)
        temp[k++] = arr[j++];
    for(k=0;k<size;k++)
        arr[k] = temp[k];
}

void mergeArr(int num[], int lb, int ub){
    int mid;
    if((ub-lb)>1){
        mid = (lb+ub)/2;
        mergeArr(num, lb, mid);
        mergeArr(num, mid+1, ub);
        mergeSubArr(num, lb, mid+1, ub);
    }
}
When I call mergeArr, the output contains some elements that were not in the original array. I think there is something wrong with the mergeSubArr function; please help me find a solution.
There are basically two things wrong with your algorithm:
You fill the array temp from index 0 on. Later, you need to copy it to arr, but starting at the lower bound. You also shouldn't copy size elements, because you can't be sure that the original array even has size elements. (Thanks, BLUEPIXY, for spotting this one.) You should probably also make sure that size is big enough to hold all elements.
Your upper bound is inclusive. That means that if (ub - lb > 1) catches the case where there are more than two elements. You still have to treat the case where there are two elements, which may be in the wrong order, either by making the above condition ub - lb > 0 or by having an else clause where you swap the elements if they are in the wrong order.
So make these changes:
int temp[size], i = lb, j = mid, k = lb;
Start the temp array from the lower bound.
for(k = lb; k <= ub; k++)
    arr[k] = temp[k];
Copy only the subarray in question.
if (ub - lb > 0) ...
Catch the case where there are two elements, too.
On a personal note, I find the mergesort algorithm much easier when it passes subarrays via pointer arithmetic and length. It forgoes much fiddling with offsets.
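A minimal sketch of that pointer-and-length style, assuming an `int` array and a caller-supplied scratch buffer `tmp` of at least `n` elements; `merge_sort` is an illustrative name. Each call sees only a base pointer and a length, so there is no lower bound to carry around and the copy-back always starts at index 0.

```c
#include <string.h>

/* Sort a[0..n) using tmp[0..n) as scratch space. */
void merge_sort(int *a, int *tmp, size_t n)
{
    if (n < 2)
        return;
    size_t half = n / 2;
    merge_sort(a, tmp, half);             /* left part:  a[0..half)           */
    merge_sort(a + half, tmp, n - half);  /* right part: (a + half)[0..n-half) */
    size_t i = 0, j = half, k = 0;
    while (i < half && j < n)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  /* <= keeps equal keys in order */
    while (i < half)
        tmp[k++] = a[i++];
    while (j < n)
        tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);        /* copy back exactly n elements */
}
```

A two-element subarray has n = 2, so it is split and merged like any other; no special case is needed.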
Everything else is OK. Just replace if((ub-lb)>1) with
if((ub-lb)>0)
OR
if(ub > lb)
I'm a computer science student (just started), and I was writing a randomized-pivot version of Quicksort from pseudocode. I've written and tested it, and it all works perfectly, however...
The partition part looks a bit too complicated; it feels like I have missed something or overthought it. I can't tell whether it's OK or whether I made some avoidable mistakes.
So long story short: it works, but how to do better?
Thanks in advance for all the help
void partition(int a[],int start,int end)
{
    srand (time(NULL));
    int pivotpos = 3; //start + rand() % (end-start);
    int i = start; // index 1
    int j = end; // index 2
    int flag = 1;
    int pivot = a[pivotpos]; // sets the pivot's value
    while(i<j && flag) // main loop
    {
        flag = 0;
        while (a[i]<pivot)
        {
            i++;
        }
        while (a[j]>pivot)
        {
            j--;
        }
        if(a[i]>a[j]) // swap && sets new pivot, and restores the flag
        {
            swap(&a[i],&a[j]);
            if(pivotpos == i)
                pivotpos = j;
            else if(pivotpos == j)
                pivotpos = i;
            flag++;
        }
        else if(a[i] == a[j]) // avoids getting stuck on a mirror of values (e.g. pivot on pos 3 of : 1-0-0-1-1)
        {
            if(pivotpos == i)
                j--;
            else if(pivotpos == j)
                i++;
            else
            {
                i++;
                j--;
            }
            flag++;
        }
    }
}
This is the pseudo code of partition() from Introduction to Algorithms, which is called Lomuto's Partitioning Algorithm; there's a good explanation below it in the book.
PARTITION(A, p, r)
    x ← A[r]
    i ← p - 1
    for j ← p to r - 1
        do if A[j] ≤ x
            then i ← i + 1
                 exchange A[i] ↔ A[j]
    exchange A[i + 1] ↔ A[r]
    return i + 1
You can easily implement a randomized partition based on the pseudo code above. As the comment pointed out, move srand() out of partition() so that it is called only once.
// srand(time(NULL));
int partition(int* arr, int start, int end)
{
    int pivot_index = start + rand() % (end - start + 1);
    int pivot = arr[pivot_index];
    swap(&arr[pivot_index], &arr[end]); // swap random pivot to end.
    pivot_index = end;
    int i = start -1;
    for(int j = start; j <= end - 1; j++)
    {
        if(arr[j] <= pivot)
        {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[pivot_index]); // place the pivot to right place
    return i + 1;
}
And there is another partition method mentioned in the book, which is called Hoare's Partitioning Algorithm, the pseudo code is as below:
Hoare-Partition(A, p, r)
    x = A[p]
    i = p - 1
    j = r + 1
    while true
        repeat
            j = j - 1
        until A[j] <= x
        repeat
            i = i + 1
        until A[i] >= x
        if i < j
            swap( A[i], A[j] )
        else
            return j
After the partition, every element in A[p...j] ≤ every element in A[j+1...r]. So the quicksort would be:
QUICKSORT (A, p, r)
    if p < r then
        q = Hoare-Partition(A, p, r)
        QUICKSORT(A, p, q)
        QUICKSORT(A, q+1, r)
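A direct C translation of these two routines, as a sketch. The random pivot is an addition that is not part of the book's pseudo code: because x is taken from A[p], the randomly chosen element is swapped into A[p] first (taking x from A[r] instead could make the partition return j = r and recurse forever). The names `hoare_partition` and `quicksort_hoare` are illustrative.

```c
#include <stdlib.h>

static void swap_int(int *x, int *y)
{
    int t = *x; *x = *y; *y = t;
}

/* Hoare partition of a[p..r]: afterwards every element of a[p..j]
   is <= every element of a[j+1..r], with p <= j < r. */
static int hoare_partition(int a[], int p, int r)
{
    /* random pivot moved to a[p], since x is read from there */
    swap_int(&a[p], &a[p + rand() % (r - p + 1)]);
    int x = a[p];
    int i = p - 1, j = r + 1;
    for (;;) {
        do { j--; } while (a[j] > x);   /* repeat ... until a[j] <= x */
        do { i++; } while (a[i] < x);   /* repeat ... until a[i] >= x */
        if (i < j)
            swap_int(&a[i], &a[j]);
        else
            return j;
    }
}

void quicksort_hoare(int a[], int p, int r)
{
    if (p < r) {
        int q = hoare_partition(a, p, r);
        quicksort_hoare(a, p, q);        /* note: q, not q - 1 */
        quicksort_hoare(a, q + 1, r);
    }
}
```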
There are multiple ways to partition for quicksort; the following is likely the simplest I can muster. Generally, two schools of partitioning are used:
The Squeeze - collapses both ends of the sequence until a suitable swap pair is found, then swaps two elements into proper sides of the partition. Not trivial to implement, but can be more efficient (reduced swap count) than the alternative...
The Sweep - uses a single left-to-right (or right-to-left) sweep over the values, swapping values to an incrementing pivot index that moves as the algorithm runs. Very simple to implement, as you'll see below.
I prefer the Sweep algorithm for people learning quicksort and partitioning only because it is so dead-simple to implement. Both can be implemented to perform in-place partitioning, as is the case in the implementation below. At no time except in swap() will you see a value stored in temp-storage.
Using a random pivot selection is only a small part of this. The following shows how to initialize the random number generator, and demonstrates what is likely the simplest partition algorithm you're going to find, along with a quicksort that uses it.
It demonstrates, among other things, that in C/C++, you don't need both ends of a partition since simple pointer arithmetic can be used to adjust the "top" half of a partition. See the quicksort() function for how this is done.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void swap(int *lhs, int *rhs)
{
    if (lhs == rhs)
        return;
    int tmp = *lhs;
    *lhs = *rhs;
    *rhs = tmp;
}

int partition(int ar[], int len)
{
    int i, pvt=0;
    // swap random slot selection to end.
    // ar[len-1] will hold the pivot value.
    swap(ar + (rand() % len), ar+(len-1));
    for (i=0; i<len; ++i)
    {
        if (ar[i] < ar[len-1])
            swap(ar + i, ar + pvt++);
    }
    // swap the pivot value into position
    swap(ar+pvt, ar+(len-1));
    return pvt;
}

void quicksort(int ar[], int len)
{
    if (len < 2)
        return;
    int pvt = partition(ar, len);
    quicksort(ar, pvt++); // note increment. skips pivot slot
    quicksort(ar+pvt, len-pvt);
}

int main()
{
    srand((unsigned int)time(NULL));
    const int N = 20;
    int data[N];
    for (int i=0; i<N; ++i)
    {
        data[i] = rand() % 50 + 1;
        printf("%d ", data[i]);
    }
    puts("");
    quicksort(data, N);
    for (int i=0; i<N; ++i)
        printf("%d ", data[i]);
    puts("");
    return 0;
}
Output (varies, obviously)
32 49 42 49 5 18 41 48 22 33 40 27 12 47 41 6 50 27 8 7
5 6 7 8 12 18 22 27 27 32 33 40 41 41 42 47 48 49 49 50
Note: this does NOT account for the modulo bias of using rand() % len, and frankly it would be overkill to do so for this example. If it were critical, I would use another generator entirely. An outstanding discussion of methods for choosing random pivot locations for quicksort partitioning can be found at this post on this site, including many links to different methods. I suggest reviewing it.
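If the bias did matter and you wanted to stay with rand(), one common fix is rejection sampling: draws that fall into the incomplete top bucket are discarded, so every residue is equally likely. This is a sketch; `random_index` is a hypothetical name and assumes 0 < n <= RAND_MAX.

```c
#include <stdlib.h>

/* Return a value in [0, n) without modulo bias, assuming 0 < n <= RAND_MAX. */
int random_index(int n)
{
    /* limit = largest multiple of n that is <= RAND_MAX + 1 */
    unsigned long range = (unsigned long)RAND_MAX + 1;
    unsigned long limit = range - range % (unsigned long)n;
    unsigned long r;
    do {
        r = (unsigned long)rand();   /* draws >= limit would favour small residues */
    } while (r >= limit);
    return (int)(r % (unsigned long)n);
}
```

In the partition above, `rand() % len` would simply become `random_index(len)`.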
I am writing a simple merge sort function to sort based on a given compar function:
void merge(int left, int mid, int right, int(*compar)(const void *, const void *))
{
    // sublist sizes
    int left_size = mid - left + 1;
    int right_size = right - mid;
    // counts
    int i, j, k;
    // create left and right arrays
    B *left_list = (B*) malloc(left_size*sizeof(B));
    B *right_list = (B*) malloc(right_size*sizeof(B));
    // copy sublists, could be done with memcpy()?
    for (i = 0; i < left_size; i++)
        left_list[i] = list[left + i];
    for (j = 0; j < right_size; j++)
        right_list[j] = list[mid + j + 1];
    // reset counts
    i = 0; j = 0;
    for (k = left; k <= right; k++)
    {
        if (j == right_size)
            list[k] = left_list[i++];
        else if (i == left_size)
            list[k] = right_list[j++];
        // here we call the given comparison function
        else if (compar(&left_list[i], &right_list[j]) < 0)
            list[k] = left_list[i++];
        else
            list[k] = right_list[j++];
    }
}

void sort(int left, int right, int(*compar)(const void *, const void *))
{
    if (left < right)
    {
        // find the pivot point
        int mid = (left + right) / 2;
        // recursive step
        sort(left, mid, compar);
        sort(mid + 1, right, compar);
        // merge resulting sublists
        merge(left, mid, right, compar);
    }
}
I am then calling this several times on the same list array using different comparison functions. I am finding that the sort is stable for the first call, but then after that I see elements are swapped even though they are equal.
Can anyone suggest the reason for this behaviour?
I'm not sure if this will do it but try changing this line:
compar(&left_list[i], &right_list[j]) < 0
to this:
compar(&left_list[i], &right_list[j]) <= 0
This makes it so that, if they are equal, it takes the first branch (copying from the left list), which will (hopefully) preserve stability rather than moving things around.
This is just a guess though.
I think you got your sizes wrong
int left_size = mid - left;
And, as pointed out by arasmussen, you need to give preference to the left list in order to maintain stability:
compar(&left_list[i], &right_list[j]) <= 0
In addition to all of this, you are not calling free after malloc-ing the helper lists. This will not make the algorithm return incorrect results, but it will cause your program's memory use to grow irreversibly every time you call the sort function.
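Putting the `<= 0` comparison and the missing free() together, a self-contained sketch might look like this. The question's global `list` and element type `B` are not shown, so this assumes a hypothetical `struct item` (a key plus its original position) passed in explicitly, keeps the question's inclusive bounds (left..mid and mid+1..right), and uses one scratch buffer per merge instead of two.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type: key is compared, pos records the original order. */
struct item { int key; int pos; };

typedef int (*cmp_fn)(const void *, const void *);

/* Merge list[left..mid] and list[mid+1..right] (inclusive bounds). */
static void merge(struct item *list, int left, int mid, int right, cmp_fn compar)
{
    int n = right - left + 1;
    struct item *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    int i = left, j = mid + 1, k = 0;
    while (i <= mid && j <= right) {
        /* <= 0: on ties take from the left run, which keeps the sort stable */
        if (compar(&list[i], &list[j]) <= 0)
            tmp[k++] = list[i++];
        else
            tmp[k++] = list[j++];
    }
    while (i <= mid)
        tmp[k++] = list[i++];
    while (j <= right)
        tmp[k++] = list[j++];
    memcpy(list + left, tmp, (size_t)n * sizeof *tmp);
    free(tmp);   /* release the scratch buffer on every call */
}

void sort(struct item *list, int left, int right, cmp_fn compar)
{
    if (left < right) {
        int mid = left + (right - left) / 2;
        sort(list, left, mid, compar);
        sort(list, mid + 1, right, compar);
        merge(list, left, mid, right, compar);
    }
}

/* Example comparator: ascending by key. */
static int by_key(const void *a, const void *b)
{
    const struct item *x = a, *y = b;
    return (x->key > y->key) - (x->key < y->key);
}
```

Sorting `{2,0},{1,1},{2,2},{1,3},{0,4},{2,5}` with `by_key` should leave equal keys in their original `pos` order, which is what makes repeated sorts with different comparators behave predictably.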