Point at which insertion sort beats quicksort - c

I'm writing a quicksort that switches to insertion sort when the array being sorted is small. I've run some tests and, from what I see, when the array being sorted has 200 elements or fewer, it is faster to switch to insertion sort. This seems really large; I was expecting something closer to 20. Is 200 reasonable, or is one of my implementations way off? I used the basic Wikipedia algorithms.
void quicksort(int list[asize], int left, int right)
{
int pivotindex;
int newpivotindex;
if (left < right)
{
if ((right-left)<=200)
{
insertionsort(list,left,right);
}
else
{
pivotindex = r_rand(left,right);
newpivotindex = partition(list, left, right, pivotindex);
quicksort(list, left, newpivotindex-1);
quicksort(list, newpivotindex+1, right);
}
}
return;
}
int partition(int* list,int left, int right, int pivotindex)
{
int pivotvalue = list[pivotindex];
int temp;
int i;
int storeindex;
temp=list[pivotindex];
list[pivotindex]=list[right];
list[right]=temp;
storeindex=left;
for (i=left;i<right;i++)
{
if (list[i] < pivotvalue)
{
temp=list[i];
list[i]=list[storeindex];
list[storeindex]=temp;
storeindex++;
}
}
temp=list[storeindex];
list[storeindex]=list[right];
list[right]=temp;
return storeindex;
}
void insertionsort(int* list, int left, int right)
{
int i;
int item;
int ihole;
for (i=left;i<=right;i++)
{
item=list[i];
ihole=i;
while (ihole > 0 && list[ihole-1] > item)
{
list[ihole]=list[ihole-1];
ihole--;
}
list[ihole]=item;
}
return;
}
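A cutoff as high as 200 is hard to judge without a reproducible measurement. The sketch below is one way to make the threshold a parameter so that several values can be timed against each other. `hybrid_quicksort` and `bounded_insertionsort` are hypothetical names, the pivot is the middle element rather than the `r_rand` call above (whose definition is not shown), and the insertion sort stops at `left` instead of index 0, which is an assumption about what was intended.

```c
#include <assert.h>

/* Insertion sort restricted to list[left..right] (inclusive bounds). */
static void bounded_insertionsort(int *list, int left, int right)
{
    for (int i = left + 1; i <= right; i++) {
        int item = list[i];
        int hole = i;
        /* Stop at `left`, not 0, so the scan never leaves the sub-array. */
        while (hole > left && list[hole - 1] > item) {
            list[hole] = list[hole - 1];
            hole--;
        }
        list[hole] = item;
    }
}

/* Quicksort that hands ranges with right - left <= cutoff to insertion sort. */
void hybrid_quicksort(int *list, int left, int right, int cutoff)
{
    assert(cutoff >= 0);
    while (left < right) {
        if (right - left <= cutoff) {
            bounded_insertionsort(list, left, right);
            return;
        }
        /* Lomuto partition around the middle element (assumed pivot rule). */
        int mid = left + (right - left) / 2;
        int pivot = list[mid];
        list[mid] = list[right];
        list[right] = pivot;
        int store = left;
        for (int i = left; i < right; i++) {
            if (list[i] < pivot) {
                int t = list[i];
                list[i] = list[store];
                list[store] = t;
                store++;
            }
        }
        list[right] = list[store];
        list[store] = pivot;
        /* Recurse into the smaller side and loop on the larger one. */
        if (store - left < right - store) {
            hybrid_quicksort(list, left, store - 1, cutoff);
            left = store + 1;
        } else {
            hybrid_quicksort(list, store + 1, right, cutoff);
            right = store - 1;
        }
    }
}
```

Timing `hybrid_quicksort` with `clock()` from `<time.h>` on identical copies of the same input for several `cutoff` values, with compiler optimization enabled, should show where the crossover lies on a given machine.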

Related

C - recursive function loses tracks of array's address?

I am implementing a binary search. I seem to have solved my problem but I'd like to understand what was going on.
Here is what I was originally doing for my search:
void find_index(double *list, int val, int low, int high, int *target)
{
int mid;
if (list[low] == val)
{
target[0] = low;
return;
}
if (list[high] == val)
{
target[0] = high;
return;
}
mid = (int)((double)(low + high) / 2.0);
if (list[mid] == val || mid == low || mid == high)
{
target[0] = mid;
return;
}
if (list[low] < val && val < list[mid])
{
high = mid;
find_index(list, val, low, high, target);
}
if (list[mid] < val && val < list[high])
{
low = mid;
find_index(list, val, low, high, target);
}
return;
}
In the code above, list is a sorted array of doubles. I allocated it dynamically in the main,
double *list = malloc((N+1)*sizeof(double));
and val is the value to be searched (guaranteed to be in list). The main calls func1, which calls find_index. Here's the story:
1. Initially I statically allocated a size-1 array, int target[1], in func1 and passed it to find_index. I got a segmentation fault. Tracking the origin of the error with valgrind, I found that, in func1, I was using an uninitialized value AND THAT THE UNINITIALIZED VALUE WAS CREATED IN func1. target is the only array I allocate in that function, so it had to be it.
2. I changed the declaration of find_index. Specifically, int *target became int target[1], as I understand from this question that this is the proper way of passing statically allocated arrays (although it deals with 2D arrays). I got a segmentation fault again and the same message from valgrind.
3. I declared target dynamically in the main, int *target = malloc(1*sizeof(int));, passed it to func1 and then to find_index. The way I passed it to both functions is the same as before, i.e., both functions take an int *target as argument. It worked, with no complaint from valgrind.
4. However, I am suspicious now, so I decided to change the return type of find_index to int and declared target as a scalar inside find_index itself. As far as I understand recursion, this should be safe.
However, I do not understand what was going on during steps 1-3 and I'd like to, so as to prevent future mistakes. My only guess is that, during the numerous recursive calls that find_index makes, it loses track of the address of the array I originally allocated. Is this something that can happen? Under what circumstances? Could it be related to how I pass the array to the functions and/or to where I originally declared it?
I have searched but could not find a truly related question.
EDIT: minimal reproducible example shows the situation is more involved
The minimal reproducible example I could build follows:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>
#include <time.h>
void func1(double **list, double **another_list, int N);
void func2(double **list, double **another_list, int N);
void find_index(double *list, int val, int low, int high, int *target);
void q_sort_with_doublelst(double *ary,double *lst,int low,int high);
int q_partition_with_doublelst(double *ary,double *lst,int low,int high);
void swap_with_doublelst(double *ary,double *lst,int i,int j);
////////////////////////////////////////////////////////////////////////
// quick sort
void q_sort_with_doublelst(double *ary,double *lst,int low,int high)
{
int pivotloc;
if(low<high)
{
pivotloc=q_partition_with_doublelst(ary,lst,low,high);
q_sort_with_doublelst(ary,lst,low,pivotloc-1);
q_sort_with_doublelst(ary,lst,pivotloc+1,high);
}
}
int q_partition_with_doublelst(double *ary,double *lst,int low,int high)
{
int i,pivotloc;
int pivotkey;
swap_with_doublelst(ary,lst,low,(low+high)/2);
pivotkey=ary[low];
pivotloc=low;
for(i=low+1;i<=high;i++)
{
if(ary[i]<pivotkey) swap_with_doublelst(ary,lst,++pivotloc,i);
}
swap_with_doublelst(ary,lst,low,pivotloc);
return pivotloc;
}
void swap_with_doublelst(double *ary,double *lst,int i,int j)
{
int tmp_ary;
double tmp_lst;
tmp_ary=ary[i]; tmp_lst=lst[i];
ary[i]=ary[j]; lst[i]=lst[j];
ary[j]=tmp_ary; lst[j]=tmp_lst;
}
////////////////////////////////////////////////////////////////////////
void func2(double **list, double **another_list, int N)
{
int i, j;
double x;
for(i=0; i<=N; i++)
{
j = (int) lrand48()%(N+1);
x = lrand48();
list[0][i] = j;
list[1][i] = x;
j = (int) lrand48()%(N+1);
x = lrand48();
another_list[0][i] = j;
another_list[1][i] = x;
}
}
void func1(double **list, double **another_list, int N)
{
int i, j;
int binary_index[1];
double x, y;
for(i=0; i<=N; i++)
{
find_index(list[0], i, 0, N, binary_index);
j = binary_index[0];
x = another_list[0][j];
y = another_list[1][j];
}
return;
}
void find_index(double *list, int val, int low, int high, int *target)
{
int mid;
if (list[low] == val)
{
target[0] = low;
return;
}
if (list[high] == val)
{
target[0] = high;
return;
}
mid = (int)((double)(low + high) / 2.0);
if (list[mid] == val || mid == low || mid == high)
{
target[0] = mid;
return;
}
if (list[low] < val && val < list[mid])
{
high = mid;
find_index(list, val, low, high, target);
}
if (list[mid] < val && val < list[high])
{
low = mid;
find_index(list, val, low, high, target);
}
return;
}
int main (int argc, char **argv)
{
int i, N;
int seed=time(0);
srand48(seed);
N = 1000000;
double **list = (double **)malloc(2*sizeof(double *));
list[0] = (double *)malloc((N+1)*sizeof(double)); // contains only ints
list[1] = (double *)malloc((N+1)*sizeof(double)); // contains double
double **another_list = (double **)malloc(2*sizeof(double *));
another_list[0] = (double *)malloc((N+1)*sizeof(double)); // contains only ints
another_list[1] = (double *)malloc((N+1)*sizeof(double)); // contains double
for(i=0; i<=N; i++) list[0][i] = i;
while(1 < 2)
{
func2(list, another_list, N);
q_sort_with_doublelst(list[0], list[1], 0, N);
func1(list, another_list, N);
}
free(list[0]);
free(list[1]);
free(list);
free(another_list[0]);
free(another_list[1]);
free(another_list);
return 0;
}
I tried to reproduce the error by deterministically filling list[0] with the for loop in the main (line 173), so as to have it already sorted. It turns out it does not crash. The code I am posting is more similar to my true code, but now I am afraid (as comments suggest) that I have a problem with recursion. A summary of what the minimal example does (very similarly to my code):
1. Fill list[0] with random integers in [0, N]. Fill list[1] with random doubles.
2. Fill another_list in the same way.
3. Sort list[0] while preserving the relation with list[1].
4. Binary-search list[0] and use the index found to take values from list[1] and another_list.
There is a difference between this example and my true code: here list[0] is not guaranteed to contain the value I am searching for. Further, here list[0] may contain duplicates. In principle the search is written with the idea that the value searched for need not be present and that there are no duplicates. Thus, I would say that if the value is not present, the search should give me the closest value found, but I have no clue what happens if there are duplicates. I guess it should give me the index of one of the possible duplicates, but I am definitely unsure. In my original code there are no duplicates.
I am afraid the present code crashes because of the presence of duplicates. Still, it would be great if someone could identify a different problem. Let me stress that valgrind gives me the same exact message: the error should be with target, as it is the only array allocated in func1.
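One reading of the valgrind message, offered as a guess from the posted code only: when val is smaller than list[low] or larger than list[high], none of the branches in find_index assigns target[0], so func1 would then read binary_index[0] uninitialized and use it as an index. The sketch below shows an iterative search whose result is defined for absent values and for duplicates. `lower_bound_index` is a hypothetical name, and returning the index by value instead of through a pointer is a design assumption, not the original interface.

```c
#include <assert.h>

/*
 * Returns the first index idx in [0, n) with list[idx] >= val,
 * or n if every element is smaller than val.
 * With duplicates, this is the index of the first occurrence.
 */
int lower_bound_index(const double *list, int n, double val)
{
    assert(n >= 0);
    int low = 0;
    int high = n;                 /* search the half-open range [low, high) */
    while (low < high) {
        int mid = low + (high - low) / 2;
        if (list[mid] < val)
            low = mid + 1;        /* answer lies strictly right of mid */
        else
            high = mid;           /* mid itself may be the answer */
    }
    return low;
}
```

The caller still has to check that the result is smaller than n, and that list[result] == val if an exact hit is required, before using it as an index into another array.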

My quick sort implementation doesn't work

Can anyone help me with this code? When it's executed, it doesn't sort the array correctly. I can't figure out what's wrong.
I use this struct and get the data from a file
typedef struct record {
int id_field;
char string_field[20];
int int_field;
double double_field;
} record;
typedef int (*CompareFunction)(void *, void *);
and this is the quick sort:
void swap(void **a, void **b) {
void *tmp;
tmp = *a;
*a = *b;
*b = tmp;
}
void quick_sort(void **array, int left, int right, CompareFunction compare) {
int index;
if (left < right) {
index = partition(array, left, right, compare);
quick_sort(array, left, index - 1, compare);
quick_sort(array, index + 1, right, compare);
}
}
int partition(void **array, int left, int right, CompareFunction compare) {
int pivot = left + (right - left) / 2;
int i = left;
int j = right;
while (i < j) {
if (compare(array[i], array[pivot]) < 0) {
i++;
} else {
if (compare(array[j], array[pivot]) > 0) {
j--;
} else {
swap(&array[i], &array[j]);
i++;
j--;
}
}
}
swap(&array[pivot], &array[j]);
return j;
}
This is the compare function for int:
int compare_int_struct(void *ptr1, void *ptr2) {
int i1 = (*((record *) ptr1)).int_field;
int i2 = (*((record *) ptr2)).int_field;
if (i1 < i2) {
return -1;
}
if (i1 == i2) {
return 0;
}
return 1;
}
for example:
given array | sorted array
233460 | 233460
4741192 | 1014671
1014671 | 1188961
496325 | 3119429
4476757 | 496325
3754104 | 2146160
4271997 | 2163766
4896376 | 2369159
2735414 | 3754104
2163766 | 2735414
2369159 | 4271997
1188961 | 4476757
3843159 | 4741192
2146160 | 3843159
It seems it orders in small blocks
The problems are in your partition routine.
You are selecting a pivot index and then partitioning the (sub)array by comparing, indirectly via this index, against the value identified by i or j, respectively.
However, as you go, swapping values, sooner or later your selected pivot value will change its position in the array and you're now comparing to whatever happens to wind up at the pivot index.
Instead, you should save off the pivot value and compare to that. This has the added benefit that it saves array indexing within the inner loops:
int partition(void **array, int left, int right, CompareFunction compare) {
int pivot = left + (right - left) / 2;
void *pivotValue = array[pivot]; // ********
int i = left;
int j = right;
while (i < j) {
if (compare(array[i], pivotValue) < 0) { // ********
i++;
} else {
if (compare(array[j], pivotValue) > 0) { // ********
j--;
} else {
swap(&array[i], &array[j]);
i++;
j--;
}
}
}
swap(&array[pivot], &array[j]);
return j;
}
And then there's that final swap. This is something you would use if you had chosen, up front, to move the selected pivot to the beginning or the end of the array and exclude that index from the remaining partitioning process. Several variants do this, but here that swap is just messing things up and should be removed.
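The variant mentioned above, where the pivot is first moved to one end and excluded from the scan, can be sketched as follows. This is a Lomuto-style partition, not the two-index scheme from the question, and it is only one of several ways to do it. `partition_pivot_last`, `quick_sort_sketch`, `swap_ptr` and `compare_int_ptr` are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*CompareFunction)(void *, void *);

static void swap_ptr(void **a, void **b)
{
    void *tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Partitions array[left..right] and returns the pivot's final index. */
static int partition_pivot_last(void **array, int left, int right,
                                CompareFunction compare)
{
    int mid = left + (right - left) / 2;
    swap_ptr(&array[mid], &array[right]);    /* park the pivot at the end */
    void *pivotValue = array[right];         /* saved value, not an index */
    int store = left;
    for (int i = left; i < right; i++) {     /* the pivot slot is excluded */
        if (compare(array[i], pivotValue) < 0) {
            swap_ptr(&array[i], &array[store]);
            store++;
        }
    }
    swap_ptr(&array[store], &array[right]);  /* here the final swap is needed */
    return store;
}

void quick_sort_sketch(void **array, int left, int right,
                       CompareFunction compare)
{
    assert(array != NULL);
    if (left < right) {
        int index = partition_pivot_last(array, left, right, compare);
        quick_sort_sketch(array, left, index - 1, compare);
        quick_sort_sketch(array, index + 1, right, compare);
    }
}

/* Example comparator for elements that point at plain ints. */
int compare_int_ptr(void *p1, void *p2)
{
    int a = *(int *)p1;
    int b = *(int *)p2;
    return (a > b) - (a < b);
}
```

Because the pivot sits at `right` for the whole scan, its position cannot change underneath the comparisons, and the final swap has a well-defined job: it puts the pivot between the two partitions.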
Thank you all for your answers. I modified the partition and now it works:
int pivot = left;
int i = left + 1;
int j = right;
while (i <= j) {
if (compare(array[i], array[pivot]) < 0) {
i++;
} else {
if (compare(array[j], array[pivot]) > 0) {
j--;
} else {
swap(&array[i], &array[j]);
i++;
j--;
}
}
}
swap(&array[pivot], &array[j]);
return j;

How do I write a sort function without using loops?

I am trying to write a recursive sort function with no loops at all.
void insertionSortRecursive(int arr[], int n)
{
if (n <= 1)
return;
insertionSortRecursive( arr, n-1 );
int last = arr[n-1];
int j = n-2;
while (j >= 0 && arr[j] > last)
{
arr[j+1] = arr[j];
j--;
}
arr[j+1] = last;
}
Is there a way to get rid of while loop and still make this function work?
Use the following code, which removes the while loop by replacing it with another recursive function:
void insertInOrder( int element,int *a, int first, int last)
{
if (element >= a[last])
a[last+1] = element;
else if (first < last)
{
a[last+1] = a[last];
insertInOrder(element, a, first, last-1);
}
else // first == last and element < a[last]
{
a[last+1] = a[last];
a[last] = element;
}
}
void insertion_sort_recur(int *arr, int first, int last)
{
if(first < last)
{
insertion_sort_recur(arr, first, last-1); // avoids looping thru arr[0..last-1]
insertInOrder(arr[last], arr, first, last-1); // considers arr[last] as the first element in the unsorted list
}
}
int main(void)
{
int A[]={5,3,2,4,6,1};
insertion_sort_recur(A,0,5);
return 0;
}
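The same idea can also be applied directly to the function from the question, keeping its `(arr, n)` signature. In this sketch the while loop becomes a small recursive helper; `shift_and_insert` and `insertion_sort_no_loops` are hypothetical names.

```c
#include <assert.h>

/*
 * Recursive replacement for the while loop: shifts elements greater
 * than `value` one slot to the right, starting at index j and moving
 * left, then writes `value` into the gap.
 */
static void shift_and_insert(int arr[], int j, int value)
{
    if (j >= 0 && arr[j] > value) {
        arr[j + 1] = arr[j];
        shift_and_insert(arr, j - 1, value);
    } else {
        arr[j + 1] = value;
    }
}

void insertion_sort_no_loops(int arr[], int n)
{
    assert(n >= 0);
    if (n <= 1)
        return;
    insertion_sort_no_loops(arr, n - 1);       /* sort the first n-1 elements */
    shift_and_insert(arr, n - 2, arr[n - 1]);  /* insert the last one */
}
```

The recursion depth grows linearly with n, so this form is suited to small arrays or exercises rather than large inputs.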

Introsort - iterative variant gets slower

For a bit of fun, I'm trying to implement an iterative variant of Introsort. The default implementation looks something like this:
void introsort_loop(int *a, size_t left, size_t right, size_t threshold, size_t depth) {
while(right-left > threshold) {
if(depth == 0) {
heapsort(a, left, right);
return;
}
--depth;
int p = partition(a, left, right);
introsort_loop(a, p, right, threshold, depth);
right = p-1;
}
}
void introsort(int *a, size_t num, size_t threshold) {
if(num > 1) {
introsort_loop(a, 0, num-1, threshold, log2(num)*2);
insertionsort(a, num);
}
}
I'm using the glibc implementation of qsort as the basis for an iterative introsort, since qsort happens to implement an iterative quicksort.
My implementation looks like this:
#include <limits.h>
#include <math.h>
#include "introsort.h"
// Stack node declarations used to store unfulfilled partition obligations.
typedef struct {
int lo;
int hi;
} stack_node;
// The next 4 #defines implement a very fast in-line stack abstraction.
// The stack needs log (total_elements) entries (we could even subtract
// log(threshold)). Since num has type size_t, we get as
// upper bound for log (num):
// bits per byte (CHAR_BIT) * sizeof(size_t).
#define STACK_SIZE (CHAR_BIT*sizeof(size_t))
#define PUSH(low, high) ((top->lo = (low)), (top->hi = (high)), ++top)
#define POP(low, high) (--top, ((low) = top->lo), ((high) = top->hi))
#define STACK_NOT_EMPTY (stack < top)
#define SWAP(a, i, j) { int tmp = a[i]; a[i] = a[j]; a[j] = tmp; }
#define PARENT(i) ((i-1)/2)
#define LEFT_CHILD(i) (((i)<<1)+1)
void heapify_i(int *a, int left, int right) {
int child, swap;
int root = left;
while((child = LEFT_CHILD(root)) <= right) {
swap = root;
if(a[swap] < a[child]) {
swap = child;
}
if(child+1 <= right && a[swap] < a[child+1]) {
swap = child+1;
}
if(swap == root) {
break;
} else {
SWAP(a, root, swap);
root = swap;
}
}
}
void heapsort_i(int *a, int left, int right) {
int start = left;
int end = right;
for(start = PARENT(end); start >= left; --start) {
heapify_i(a, start, end);
}
start = left;
while(start < end) {
SWAP(a, start, end);
heapify_i(a, start, --end);
}
}
void quicksort_i(int *a, size_t num, size_t threshold, size_t depth) {
//========== QUICKSORT ==========//
if(num > threshold) {
stack_node stack[STACK_SIZE];
stack_node *top = stack;
PUSH(-1, -1);
int low = 0;
int high = num-1;
int left, mid, right;
while(STACK_NOT_EMPTY) {
if(depth == 0) {
heapsort_i(a, low, high);
break;
} else {
--depth;
//========== PIVOT = MID (MEDIAN OF THREE) ==========
mid = low+(high-low)/2;
if(a[mid] < a[low]) {
SWAP(a, mid, low);
}
if(a[high] < a[mid]) {
SWAP(a, high, mid);
} else {
goto jump_qi;
}
if(a[mid] < a[low]) {
SWAP(a, mid, low);
}
jump_qi:;
//========== PARTITIONING ==========//
left = low+1;
right = high-1;
while(left < right) {
while(a[left] < a[mid]) {
++left;
}
while(a[mid] < a[right]) {
--right;
}
if(left < right) {
SWAP(a, left, right);
if(mid == left) {
mid = right;
} else if(mid == right) {
mid = left;
}
++left;
--right;
}
}
// Set up pointers for next iteration. First determine whether
// left and right partitions are below the threshold size. If so,
// ignore one or both. Otherwise, push the larger partition's
// bounds on the stack and continue sorting the smaller one.
if(right-low < threshold) {
if(high-left <= threshold) {
// ignore both small partitions
POP(low, high);
} else {
// ignore small left partition
low = left;
}
} else if(high-left <= threshold) {
// ignore small right partition
high = right;
} else if(right-low > high-left) {
// push larger left partition
PUSH(low, right);
low = left;
} else {
// push larger right partition
PUSH(left, high);
high = right;
}
}
}
}
//========== INSERTION SORT ==========//
int e, i, j;
for(i = 1; i <= num; ++i) {
e = a[i];
for(j = i-1; j >= 0 && e < a[j]; --j) {
a[j+1] = a[j];
}
a[j+1] = e;
}
}
void introsort_i(int *array, size_t num, size_t threshold) {
if(num > 1) {
quicksort_i(array, num-1, threshold, log2(num)*2);
}
}
For inputs of 10 to 100'000 random elements it appears to run fine, but when I test with a million elements, it suddenly slows down to a couple of seconds, which is far too long for a single array of 1 million elements.
How do I fix this?
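One possible contributor, judging only from the posted code: `depth` in `quicksort_i` is a single counter that is decremented on every partitioning step, whereas in the recursive version each call carries its own remaining depth. Once the shared counter reaches zero, the loop breaks and whatever is still unsorted is left to the final insertion sort. The sketch below stores the remaining depth in each stack entry instead. All names (`pending_range`, `introsort_sketch`, and the helpers) are hypothetical, the partition is a plain Lomuto scheme rather than the median-of-three code above, and the heapsort takes a base pointer so that heap indices are relative to the sub-range.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define STACK_CAP (CHAR_BIT * sizeof(size_t))

/* Half-open range [lo, hi) plus the partitioning depth it may still use. */
typedef struct { size_t lo, hi; unsigned depth; } pending_range;

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Restores the max-heap property below `root` in base[0..n-1]. */
static void sift_down(int *base, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= n) return;
        if (child + 1 < n && base[child] < base[child + 1]) child++;
        if (base[root] >= base[child]) return;
        swap_int(&base[root], &base[child]);
        root = child;
    }
}

/* Heapsort of base[0..n-1]; callers pass a + lo as base. */
static void heapsort_range(int *base, size_t n)
{
    for (size_t i = n / 2; i-- > 0; ) sift_down(base, i, n);
    for (size_t end = n; end-- > 1; ) {
        swap_int(&base[0], &base[end]);
        sift_down(base, 0, end);
    }
}

/* Lomuto partition of a[lo..hi] (inclusive); returns the pivot index. */
static size_t partition_range(int *a, size_t lo, size_t hi)
{
    swap_int(&a[lo + (hi - lo) / 2], &a[hi]);
    int pivot = a[hi];
    size_t store = lo;
    for (size_t i = lo; i < hi; i++)
        if (a[i] < pivot) swap_int(&a[i], &a[store++]);
    swap_int(&a[store], &a[hi]);
    return store;
}

void introsort_sketch(int *a, size_t num, size_t threshold)
{
    if (num < 2) return;
    unsigned max_depth = 0;                 /* roughly 2 * log2(num) */
    for (size_t n = num; n > 1; n >>= 1) max_depth += 2;

    pending_range stack[STACK_CAP];
    size_t top = 0;
    stack[top++] = (pending_range){0, num, max_depth};

    while (top > 0) {
        pending_range r = stack[--top];
        while (r.hi - r.lo > threshold && r.hi - r.lo > 1) {
            if (r.depth == 0) {             /* budget of this range only */
                heapsort_range(a + r.lo, r.hi - r.lo);
                break;                      /* other ranges stay on the stack */
            }
            size_t p = partition_range(a, r.lo, r.hi - 1);
            pending_range left = {r.lo, p, r.depth - 1};
            pending_range right = {p + 1, r.hi, r.depth - 1};
            assert(top < STACK_CAP);
            /* Push the larger side, keep working on the smaller one. */
            if (left.hi - left.lo > right.hi - right.lo) {
                stack[top++] = left;
                r = right;
            } else {
                stack[top++] = right;
                r = left;
            }
        }
    }
    /* Final insertion pass over the nearly sorted array. */
    for (size_t i = 1; i < num; i++) {
        int e = a[i];
        size_t j = i;
        while (j > 0 && e < a[j - 1]) { a[j] = a[j - 1]; j--; }
        a[j] = e;
    }
}
```

Here an exhausted depth budget only ends work on the range that exhausted it, so the final insertion pass should only ever see runs no longer than `threshold` that are out of order.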

Quicksort C in-place partition does not work (Revision)

I get stack overflow errors and would appreciate it if anyone could point out where I am going wrong. The language is C, compiled with Visual Studio. Note that this is only a revision given to me by my lecturer. I think the rest of my code, e.g. the sorting, is fine.
#include <stdio.h>
void QuickSort (int data[], int size);
void QuickSortHelp (int data[], int from, int to);
void QuickSort (int data[], int size)
{
QuickSortHelp(data, 0, size-1);
}
void QuickSortHelp (int data[], int from, int to)
{
if (from >= to)
return;
else
{
int pivot;
int index;
int temp;
int left, right;
pivot = data[from];
left = from + 1;
right = to;
for (left = from + 1;left<= right;left++,right--)
{
if (data[left] > pivot)
{
if (data[right]<= pivot)
{
temp = data[left];
data[left] = data[right];
data[right] = temp;
}
}
}
temp = data[from];
data[from] = data[left];
data[left] = temp;
index = left;
QuickSortHelp (data, from, index-1);
QuickSortHelp (data, index+1, to);
}
}
int main()
{
int data[] = {4,5,3,8,2,6,1,7};
int i;
printf("Test");
QuickSort (data, 8);
while (i<=8)
{
printf("%d", data[i]);
i++;
}
printf("Test done");
}
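Tracing the posted loop for a two-element range (from = 0, to = 1) suggests where the overflow may come from: `left` and `right` both move on every iteration, so the loop exits with `left` equal to `to + 1`, the swap touches an element outside the range, and the first recursive call receives the same range (from, to) again. Separately, `i` in `main` is never initialized and `i<=8` reads one element past the array. The sketch below shows one conventional way to write the partition, where each index only moves past elements that are already on the correct side. `quick_sort_fixed` and `quick_sort_range` are hypothetical names.

```c
#include <assert.h>

static void quick_sort_range(int data[], int from, int to)
{
    if (from >= to)
        return;
    int pivot = data[from];
    int left = from + 1;
    int right = to;
    while (left <= right) {
        /* Skip elements that are already on the correct side. */
        while (left <= right && data[left] <= pivot)
            left++;
        while (left <= right && data[right] > pivot)
            right--;
        if (left < right) {
            int temp = data[left];
            data[left] = data[right];
            data[right] = temp;
        }
    }
    /* `right` is now the last index holding a value <= pivot (or `from`). */
    data[from] = data[right];
    data[right] = pivot;
    /* Both sub-ranges exclude the pivot, so they are strictly smaller. */
    quick_sort_range(data, from, right - 1);
    quick_sort_range(data, right + 1, to);
}

void quick_sort_fixed(int data[], int size)
{
    assert(size >= 0);
    quick_sort_range(data, 0, size - 1);
}
```

When printing the result, a loop such as `for (i = 0; i < 8; i++)` would stay inside the eight-element array.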
