The parallel kth smallest algorithm with OpenMP selecting wrong number

I am working on a parallel algorithm that finds the k-th smallest number in an array using OpenMP. I have a sequential version, based on quicksort's partitioning step, that works fine.
// K'th smallest element in arr[l..r]
int kthSmallest(int arr[], int l, int r, int k)
{
    if (k > 0 && k <= r - l + 1) {
        int pos = partition(arr, l, r);
        if (pos - l == k - 1)
            return arr[pos];
        if (pos - l > k - 1) {
            return kthSmallest(arr, l, pos - 1, k);
        } else {
            return kthSmallest(arr, pos + 1, r, k - pos + l - 1);
        }
    }
    return INT_MAX;
}
void swap(int* a, int* b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++) {
        if (arr[j] <= x) {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]);
    return i;
}
I am new to OpenMP. At first I tried wrapping the function call in the driver in a parallel region. That did not help: the selected number was correct, but the run was no faster than the sequential version.
int main()
{
    omp_set_num_threads(8);
    double start_time, run_time;
    int *data = (int *)malloc(N * sizeof(int));
    clock_t start, end;
    printf("Creating the array.......\n");
    randomArray(data, N);
    printf("Created the array........\n");
    //start = clock();
    start_time = omp_get_wtime();
    #pragma omp parallel
    {
        #pragma omp single nowait
        printf("\nKth [%d] value is %d\n", K,
               kthSmallest(data, 0, N - 1, K));
    }
    //end = clock();
    run_time = omp_get_wtime() - start_time;
When trying to parallelize the partition function itself, the timing improves, but the selected number is different each time.
int partition(int arr[], int l, int r)
{
    int j;
    int x = arr[r], i = l;
    #pragma omp parallel for shared(arr, l, r) private(j) schedule(static)
    for ( j = l; j <= r - 1; j++) {
        if (arr[j] <= x) {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]);
    return i;
}
What is wrong with my approach, and how can I parallelize this algorithm correctly?
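One observation about the parallel partition above: i is shared by every thread, so the swap through arr[i] and the i++ race with each other, and that alone can give a different result on each run. The following is only a minimal sketch of one race-free building block, not a full parallel partition. The helper name countLessEqual is hypothetical (it is not in the original code); it counts the elements that are not greater than the pivot, using an OpenMP reduction so that each thread has its own counter.

```c
/* Hypothetical helper (not part of the original code): counts the
   elements of arr[l..r-1] that are <= pivot.
   reduction(+:count) gives each thread a private counter and sums them
   at the end, so there is no shared variable to race on.
   Without -fopenmp the pragma is ignored and the loop runs sequentially. */
long countLessEqual(const int arr[], int l, int r, int pivot)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count) schedule(static)
    for (int j = l; j < r; j++) {
        if (arr[j] <= pivot)
            count++;
    }
    return count;
}
```

With the pivot taken from arr[r] as in partition(), l + countLessEqual(arr, l, r, arr[r]) is the position the pivot would end up in, which is enough to decide which side holds the k-th element. Moving the elements themselves in parallel needs more machinery than this sketch shows.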

Related

Comparison of shell sort and merge sort

When I compare the running times of these two sorts, they take almost the same time, and on already sorted arrays shell sort is 4 times faster. This should not be possible, because in the best case shell sort runs in n(log n)^2 time while merge sort runs in n log n time, which is faster. What could be the problem? I tried it on different PCs, but shell sort is still faster.
UPD: For randomly filled arrays it behaves as expected, but for already sorted arrays shell sort is twice as fast.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#define DIFF 3276
double wtime(void);
void randArray(int* array, int size);
void shellSort(int a[], int size);
void merge(int arr[], int l, int m, int r);
void mergeSort(int arr[], int l, int r);
int main(void)
{
FILE* file1;
FILE* file2;
int* array = NULL;
unsigned seed = time(NULL);
if (!(file1 = fopen("shell.txt", "w"))) {
return 1;
}
for (int step = DIFF; step <= 32760; step += DIFF) {
srand(seed);
array = (int*)malloc(step * sizeof(int));
double start, end;
randArray(array, step);
start = wtime();
shellSort(array, step);
end = wtime();
free(array);
fprintf(file1, "%d\t%lf\n", step, end - start);
}
fclose(file1);
if (!(file2 = fopen("merge.txt", "w"))) {
return 1;
}
for (int step = DIFF; step <= 32760; step += DIFF) {
srand(seed);
array = (int*)malloc(step * sizeof(int));
double start, end;
randArray(array, step);
start = wtime();
mergeSort(array, 0, step - 1);
end = wtime();
free(array);
fprintf(file2, "%d\t%lf\n", step, end - start);
}
fclose(file2);
}
void randArray(int* array, int size)
{
for (int i = 0; i < size; i++) {
array[i] = i;
}
}
double wtime(void)
{
struct timeval t;
gettimeofday(&t, NULL);
return (double)t.tv_sec + (double)t.tv_usec * 1E-6;
}
void shellSort(int a[], int size)
{
int i, j;
int s = size / 2;
while (s > 0) {
for (i = s; i < size; i++) {
int temp = a[i];
for (j = i - s; (j >= 0) && (a[j] > temp); j -= s)
a[j + s] = a[j];
a[j + s] = temp;
}
s /= 2;
}
}
void merge(int arr[], int l, int m, int r)
{
int i, j, k;
int n1 = m - l + 1;
int n2 = r - m;
int L[n1], R[n2];
for (i = 0; i < n1; i++)
L[i] = arr[l + i];
for (j = 0; j < n2; j++)
R[j] = arr[m + 1 + j];
i = 0;
j = 0;
k = l;
while (i < n1 && j < n2) {
if (L[i] <= R[j]) {
arr[k] = L[i];
i++;
} else {
arr[k] = R[j];
j++;
}
k++;
}
while (i < n1) {
arr[k] = L[i];
i++;
k++;
}
while (j < n2) {
arr[k] = R[j];
j++;
k++;
}
}
void mergeSort(int arr[], int l, int r)
{
if (l < r) {
int m = l + (r - l) / 2;
mergeSort(arr, l, m);
mergeSort(arr, m + 1, r);
merge(arr, l, m, r);
}
}

Your shell sort works in place, and for a sorted array it never moves a single element. It only pays for the compares, and the branch predictor will predict each compare perfectly.
Your merge sort, on the other hand, copies the data to temporary arrays and back at every level. That is 2 * log(n) copies of the whole array. The extra memory may also exceed the L1 cache of your CPU for the larger test runs, which makes this considerably worse. You can roughly halve the copying by alternating between 2 arrays: about log(n) + 1 copies in the worst case. Branch prediction for the sorted case is perfect here too; it is just the copies that cost you.
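One way to read "alternating between 2 arrays" is sketched below. It is a sketch under assumptions, not the answerer's code: the names mergeRuns, splitMerge and mergeSortPingPong are made up for illustration, ranges are half-open [lo, hi), and the scratch buffer comes from malloc rather than from variable-length arrays on the stack. The data is copied once up front, and after that the two arrays swap roles at every recursion level, so each level is a single pass over the data.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void mergeRuns(const int src[], int dst[],
                      size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/* Sort the range [lo, hi) so that the result ends up in dst.
   On entry src and dst must hold the same data in that range.
   The recursive calls swap the roles of the two arrays: the halves are
   sorted into src, then merged from src into dst. */
static void splitMerge(int src[], int dst[], size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;                      /* 0 or 1 element: already in dst */
    size_t mid = lo + (hi - lo) / 2;
    splitMerge(dst, src, lo, mid);
    splitMerge(dst, src, mid, hi);
    mergeRuns(src, dst, lo, mid, hi);
}

/* Sorts a[0..n-1] in ascending order.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int mergeSortPingPong(int a[], size_t n)
{
    if (n < 2)
        return 0;
    int *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;
    memcpy(scratch, a, n * sizeof *scratch);  /* the one up-front copy */
    splitMerge(scratch, a, 0, n);             /* final result lands in a */
    free(scratch);
    return 0;
}
```

Allocating the scratch buffer once also avoids the per-merge temporary arrays of the original merge().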

C code doesn't run all the way through on CMD but works fine in other IDEs (sorting algorithms)

The main.c file works fine in Repl.it, OnlineGDB, and Mimir. I originally wrote the code in VSCode, however, and when I run it from the command prompt it stops at random points. Sometimes it prints only two lines, sometimes it gets as far as 40,000, and only rarely does it run all the way through. It seems as though there is some limitation in the command prompt or my compiler. Attached are my main.c file and a screenshot of my command prompt output. Jamila suggested adding system("PAUSE"); before return 0; in the main function, but that did not help. I had Jon try the code through his command prompt and he did not have an issue either, so it seems to come down to my computer. I have reinstalled MinGW according to the instructions from Intro to C, but the issue is still present. I have an i9 processor and 16 GB of RAM, so it should not be a hardware limitation. This is odd behavior, and I want to understand why only my computer has this problem. I have also tried running it with leak_detector_c.c, but that makes no difference.
IMAGE 1 IMAGE 2 IMAGE 3
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define MAXVAL 100000
void randArray(int A[], int size, int maxval)
{
int i;
for(i=0; i<size; i++)
A[i] = rand()%maxval + 1;
}
void arrayCopy(int from[], int to[], int size)
{
int j;
for(j=0; j<size; j++)
to[j] = from[j];
}
long timediff(clock_t t1, clock_t t2)
{
long elapsed;
elapsed = ((double)t2-t1) / CLOCKS_PER_SEC * 1000;
return elapsed;
}
void swap(int *a, int *b)
{
int temp = *a;
*a = *b;
*b = temp;
}
void bubbleSort(int A[], int n)
{
int i, j;
for(i=n-2; i>=0; i--)
{
for(j=0; j<=i; j++)
if(A[j] > A[j+1])
swap(&A[j], &A[j+1]);
}
}
void insertionSort(int arr[], int n)
{
int i, item, j;
for (i = 1; i < n; i++)
{
item = arr[i];
/* Move elements of arr[0..i-1], that are
greater than key, to one position ahead
of their current position */
for(j=i-1; j>=0; j--)
{
if(arr[j]>item)
arr[j+1] = arr[j];
else
break;
}
arr[j+1] = item;
}
}
void merge(int arr[], int l, int m, int r)
{
int i, j, k;
int n1 = m - l + 1;
int n2 = r - m;
/* create temp arrays */
int *L = (int*) malloc(n1*sizeof(int));
int *R = (int*) malloc(n2*sizeof(int));
/* Copy data to temp arrays L[] and R[] */
for (i = 0; i < n1; i++)
L[i] = arr[l + i];
for (j = 0; j < n2; j++)
R[j] = arr[m + 1+ j];
/* Merge the temp arrays back into arr[l..r]*/
i = 0; // Initial index of first subarray
j = 0; // Initial index of second subarray
k = l; // Initial index of merged subarray
while (i < n1 && j < n2)
{
if (L[i] <= R[j])
{
arr[k] = L[i];
i++;
}
else
{
arr[k] = R[j];
j++;
}
k++;
}
/* Copy the remaining elements of L[], if there
are any */
while (i < n1)
{
arr[k] = L[i];
i++;
k++;
}
/* Copy the remaining elements of R[], if there
are any */
while (j < n2)
{
arr[k] = R[j];
j++;
k++;
}
free(L);
free(R);
}
void mergeSort(int arr[], int l, int r)
{
if (l < r)
{
// get the mid point
int m = (l+r)/2;
// Sort first and second halves
mergeSort(arr, l, m);
mergeSort(arr, m+1, r);
// printf("Testing l=%d r=%d m=%d\n", l, r, m);
merge(arr, l, m, r);
}
}
int partition(int *vals, int low, int high)
{
// Pick a random partition element and swap it into index low.
int i = low + rand()%(high-low+1);
swap(&vals[low], &vals[i]);
int lowpos = low; //here is our pivot located.
low++; //our starting point is after the pivot.
// Run the partition so long as the low and high counters don't cross.
while(low<=high)
{
// Move the low pointer until we find a value too large for this side.
while(low<=high && vals[low]<=vals[lowpos]) low++;
// Move the high pointer until we find a value too small for this side.
while(high>=low && vals[high] > vals[lowpos]) high--;
// Now that we've identified two values on the wrong side, swap them.
if (low<high)
swap(&vals[low], &vals[high]);
}
// Swap the pivot element element into its correct location.
swap(&vals[lowpos], &vals[high]);
return high; //return the partition point
}
// Pre-condition: s and f are value indexes into numbers.
// Post-condition: The values in numbers will be sorted in between indexes s
// and f.
void quickSort(int* numbers, int low, int high) {
// Only have to sort if we are sorting more than one number
if (low < high) {
int split = partition(numbers,low,high);
quickSort(numbers,low,split-1);
quickSort(numbers,split+1,high);
}
}
void selectionSort(int arr[], int n)
{
int i, j, min_idx, temp;
// One by one move boundary of unsorted subarray
for (i = 0; i < n-1; i++)
{
//printf("\nIteration# %d\n",i+1);
// Find the minimum element in unsorted array
min_idx = i;
for (j = i+1; j < n; j++)
if (arr[j] < arr[min_idx])
min_idx = j;
// Swap the found minimum element with the first element
temp = arr[i];
arr[i] = arr[min_idx];
arr[min_idx] = temp;
}
}
int main()
{
int sizes[] = {1000, 10000, 20000, 40000, 50000, 100000, 1000000};
int *originalArray;
int* sortedArray;
int i, j;
long elapsed;
clock_t start, end;
for(i=0; i<7; i++)
{
originalArray = malloc(sizeof(int)*sizes[i]);
sortedArray = malloc(sizeof(int)*sizes[i]);
randArray(originalArray, sizes[i], MAXVAL);
arrayCopy(originalArray, sortedArray, sizes[i]);
start = clock();
bubbleSort(sortedArray, sizes[i]);
end= clock();
elapsed=timediff(start,end);
printf("Sorting %d values took %ld milliseconds for Bubble sort.\n", sizes[i], elapsed);
arrayCopy(originalArray, sortedArray, sizes[i]);
start = clock();
insertionSort(sortedArray, sizes[i]);
end= clock();
elapsed=timediff(start, end);
printf("Sorting %d values took %ld milliseconds for Insertion sort.\n", sizes[i], elapsed);
arrayCopy(originalArray, sortedArray, sizes[i]);
start = clock();
mergeSort(sortedArray, 0, sizes[i]);
end = clock();
elapsed=timediff(start, end);
printf("Sorting %d values took %ld milliseconds for Merge sort.\n", sizes[i], elapsed);
arrayCopy(originalArray, sortedArray, sizes[i]);
start = clock();
selectionSort(sortedArray, sizes[i]);
end = clock();
elapsed=timediff(start, end);
printf("Sorting %d values took %ld milliseconds for Selection sort.\n", sizes[i], elapsed);
arrayCopy(originalArray, sortedArray, sizes[i]);
start = clock();
quickSort(sortedArray, 0, sizes[i]);
end = clock();
elapsed=timediff(start, end);
printf("Sorting %d values took %ld milliseconds for Quick sort.\n", sizes[i], elapsed);
free(sortedArray);
free(originalArray);
}
return 0;
}

This answer was wrong, but I am not deleting it yet so that I can reply to comments.
Another guess: quickSort and partition look as if they assume that low and high are both inclusive. If so, the first call should be
quickSort(sortedArray, 0, sizes[i] - 1);
instead of
quickSort(sortedArray, 0, sizes[i]);
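Unlike bubbleSort, insertionSort, and selectionSort, which take the array's length, quickSort expects the index of the last element as its final parameter. mergeSort(arr, l, r) treats r the same way (merge reads arr[r]), so the call mergeSort(sortedArray, 0, sizes[i]) has the same off-by-one problem.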

Get the sorted indices of an array using quicksort

I have adapted quicksort code that I got from tutorialgatway.org so that it sorts an array of floats. However, I need the sorted indices. I am aware that the qsort library function can be used to get the sorted indices, and I could implement that. However, I want to avoid the standard library (I know this is not recommended). The reason is that I need to sort a large number of arrays in a loop that I want to parallelize with OpenMP, so writing the function explicitly would let me parallelize the quicksort calls in that loop.
/* C Program for Quick Sort */
#include <stdio.h>
void Swap(float *x, float *y) {
float Temp;
Temp = *x;
*x = *y;
*y = Temp;
}
void quickSort(float a[], int first, int last) {
int i, j;
int pivot;
if (first < last) {
pivot = first;
i = first;
j = last;
while (i < j) {
while (a[i] <= a[pivot] && i < last)
i++;
while (a[j] > a[pivot])
j--;
if (i < j) {
Swap(&a[i], &a[j]);
}
}
Swap(&a[pivot], &a[j]);
quickSort(a, first, j - 1);
quickSort(a, j + 1, last);
}
}
int main() {
int number, i;
float a[100];
printf("\n Please Enter the total Number of Elements : ");
scanf("%d", &number);
printf("\n Please Enter the Array Elements : ");
for (i = 0; i < number; i++)
scanf("%f", &a[i]);
quickSort(a, 0, number - 1);
printf("\n Quick Sort Result : ");
for (i = 0; i < number; i++) {
printf(" %f \t", a[i]);
}
printf("\n");
return 0;
}
How can I return the sorted indices in this code?

You need to generate an array of indexes from 0 to size-1, then sort the array of indexes according to the array values. The code compares using array[index[...]] and swaps entries of index[...].
An alternative is to generate an array of pointers from &array[0] to &array[size-1]. When the pointers are sorted, you can convert them to indexes by using: index[i] = pointer[i] - &array[0] (could use a union for the indexes and pointers).
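The pointer alternative could look roughly like the sketch below. The names sortPointers and sortedIndices are hypothetical, the caller is assumed to supply the scratch array P, and an insertion sort stands in for quicksort only to keep the example short; any comparison sort that moves the pointers would work the same way.

```c
#include <stddef.h>

/* Hypothetical helper: sorts P[0..n-1], an array of pointers into A,
   by the floats they point to. Insertion sort keeps the sketch short. */
static void sortPointers(const float *P[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        const float *p = P[i];
        size_t j = i;
        while (j > 0 && *P[j - 1] > *p) {
            P[j] = P[j - 1];
            j--;
        }
        P[j] = p;
    }
}

/* Fills index[0..n-1] with the positions of A's elements in ascending
   order of value. A itself is not modified.
   P is caller-provided scratch space for n pointers. */
void sortedIndices(const float A[], const float *P[], size_t index[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        P[i] = &A[i];                        /* pointers &A[0] .. &A[n-1] */
    sortPointers(P, n);
    for (size_t i = 0; i < n; i++)
        index[i] = (size_t)(P[i] - &A[0]);   /* pointer difference = index */
}
```

For A = {3.0, 1.0, 2.0, 0.5} this yields the indexes 3, 1, 2, 0.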
Example program that uses a standard version of the Hoare partition scheme to sort the array of indexes in I[] according to the floats in A[]:
#include <stdio.h>
#include <stdlib.h>
void QuickSort(float A[], size_t I[], size_t lo, size_t hi)
{
    if (lo < hi)
    {
        float pivot = A[I[lo + (hi - lo) / 2]];
        size_t t;
        size_t i = lo - 1;
        size_t j = hi + 1;
        while (1)
        {
            while (A[I[++i]] < pivot);
            while (A[I[--j]] > pivot);
            if (i >= j)
                break;
            t = I[i];
            I[i] = I[j];
            I[j] = t;
        }
        QuickSort(A, I, lo, j);
        QuickSort(A, I, j + 1, hi);
    }
}
#define COUNT (4*1024*1024) // number of values to sort
int main(int argc, char**argv)
{
    int r; // random number
    size_t i;
    float * A = (float *) malloc(COUNT*sizeof(float));
    size_t * I = (size_t *) malloc(COUNT*sizeof(size_t));
    for(i = 0; i < COUNT; i++){ // random floats
        r = (((rand()>>4) & 0xff)<< 0);
        r += (((rand()>>4) & 0xff)<< 8);
        r += (((rand()>>4) & 0xff)<<16);
        r += (((rand()>>4) & 0xff)<<24);
        A[i] = (float)r;
    }
    for(i = 0; i < COUNT; i++) // array of indexes
        I[i] = i;
    QuickSort(A, I, 0, COUNT-1);
    for(i = 1; i < COUNT; i++){
        if(A[I[i-1]] > A[I[i]]){
            printf("error\n");
            break;
        }
    }
    free(I);
    free(A);
    return(0);
}
This version of quicksort avoids stack overflow by recursing only on the smaller side of the partition and looping on the larger side. Worst-case time complexity is still O(n^2), but stack space is limited to O(log(n)).
void QuickSort(float A[], size_t I[], size_t lo, size_t hi)
{
    while (lo < hi)
    {
        float pivot = A[I[lo + (hi - lo) / 2]];
        size_t t;
        size_t i = lo - 1;
        size_t j = hi + 1;
        while (1)
        {
            while (A[I[++i]] < pivot);
            while (A[I[--j]] > pivot);
            if (i >= j)
                break;
            t = I[i];
            I[i] = I[j];
            I[j] = t;
        }
        /* avoid stack overflow */
        if((j - lo) < (hi - j)){
            QuickSort(A, I, lo, j);
            lo = j+1;
        } else {
            QuickSort(A, I, j + 1, hi);
            hi = j;
        }
    }
}

Why is this implementation of Quick Sort slower than qsort?

Doing some refreshing on algorithms. I wrote this Quick sort implementation:
#define SWAP(X, Y, Type) { Type Temp = (X); (X) = (Y); (Y) = Temp; }
int QSPartition(int* Array, int StartIndex, int EndIndex)
{
int PivotDestinationIndex = StartIndex;
int Pivot = Array[EndIndex];
for(int i = StartIndex; i <= EndIndex; i++)
{
if (Array[i] < Pivot)
{
SWAP(Array[i], Array[PivotDestinationIndex], int);
PivotDestinationIndex++;
}
}
SWAP(Array[PivotDestinationIndex], Array[EndIndex], int);
return PivotDestinationIndex;
}
void QuickSort(int* Array, int StartIndex, int EndIndex)
{
if (StartIndex >= EndIndex)
return;
int PivotIndex = QSPartition(Array, StartIndex, EndIndex);
QuickSort(Array, PivotIndex + 1, EndIndex);
QuickSort(Array, StartIndex, PivotIndex - 1);
}
In my main program:
#define Measure(Iterations, What, Code)\
{\
clock_t Begin = clock();\
for(int _im_ = 0; _im_ < Iterations; _im_++)\
{\
Code;\
}\
clock_t End = clock();\
printf("Time took to " What ": %.1f ms\n", (double)(End - Begin));\
}
int CompareInt(const void* x, const void* y) { return *(int*)x - *(int*)y; }
int main()
{
srand(time(0));
int Count = 100000;
int* Array0 = (int*)malloc(sizeof(int) * Count);
int* Array1 = (int*)malloc(sizeof(int) * Count);
for (int i = 0; i < Count; i++)
{
int RandomValue = rand() % 100;
Array0[i] = RandomValue;
Array1[i] = RandomValue;
}
Measure(1, "My Quick", QuickSort(Array0, 0, Count - 1));
Measure(1, "C Quick", qsort(Array1, Count, sizeof(int), CompareInt));
getchar();
return 0;
}
The output is that "My Quick" is always around 125 ms while "C Quick" is always around 25 ms. (32-bit DEBUG build - On RELEASE still the same 5 times slower)
What is going on? Why is my implementation slower? (I tried inlining the functions but it didn't do much) Is there something else wrong? (the way I'm timing this, or the way I'm populating the arrays with random values, or...?)

Try one of these. For C, you will need to replace std::swap with your own swap.
QuickSort using middle value for pivot:
void QuickSort(uint32_t a[], int lo, int hi) {
    int i = lo, j = hi;
    uint32_t pivot = a[(lo + hi) / 2];
    while (i <= j) { // partition
        while (a[i] < pivot)
            i++;
        while (a[j] > pivot)
            j--;
        if (i <= j) {
            std::swap(a[i], a[j]);
            i++;
            j--;
        }
    }
    if (lo < j) // recurse
        QuickSort(a, lo, j);
    if (i < hi)
        QuickSort(a, i, hi);
}
QuickSort using median of low, middle, and high values as pivot:
void QuickSort(uint32_t a[], int lo, int hi) {
    int i = lo, j = (lo + hi)/2, k = hi;
    uint32_t pivot;
    if (a[k] < a[i]) // median of 3
        std::swap(a[k], a[i]);
    if (a[j] < a[i])
        std::swap(a[j], a[i]);
    if (a[k] < a[j])
        std::swap(a[k], a[j]);
    pivot = a[j];
    while (i <= k) { // partition
        while (a[i] < pivot)
            i++;
        while (a[k] > pivot)
            k--;
        if (i <= k) {
            std::swap(a[i], a[k]);
            i++;
            k--;
        }
    }
    if (lo < k) // recurse
        QuickSort(a, lo, k);
    if (i < hi)
        QuickSort(a, i, hi);
}
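For a plain C build, the middle-pivot variant could be ported roughly as follows. This is a sketch: the names swapU32 and QuickSortC are made up here to avoid clashing with the functions above, and the midpoint is computed as lo + (hi - lo) / 2 instead of (lo + hi) / 2 so the sum cannot overflow for very large indexes. It expects lo <= hi, both inclusive.

```c
#include <stdint.h>

/* Hypothetical C replacement for std::swap on uint32_t values. */
static void swapU32(uint32_t *x, uint32_t *y)
{
    uint32_t t = *x;
    *x = *y;
    *y = t;
}

/* Sorts a[lo..hi] (both bounds inclusive) in ascending order. */
void QuickSortC(uint32_t a[], int lo, int hi)
{
    int i = lo, j = hi;
    uint32_t pivot = a[lo + (hi - lo) / 2];   /* middle value as pivot */
    while (i <= j) {                          /* partition */
        while (a[i] < pivot)
            i++;
        while (a[j] > pivot)
            j--;
        if (i <= j) {
            swapU32(&a[i], &a[j]);
            i++;
            j--;
        }
    }
    if (lo < j)                               /* recurse */
        QuickSortC(a, lo, j);
    if (i < hi)
        QuickSortC(a, i, hi);
}
```

A call for an array of Count elements would be QuickSortC(Array, 0, Count - 1). Because both scans stop on elements equal to the pivot, inputs with many duplicates are still split near the middle.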

Quick sort runtime speed concerns

Something is troubling me. I have my own implementation of the quick sort algorithm, but when I test it on an array of more than 30 integers, sorting takes, in my opinion, too long: sometimes more than 10 seconds. My selection sort, insertion sort and bubble sort are faster on 10000 elements than quick sort is on 100 elements.
Here is my solution, please give advice :)
void kvikSort(int a[], int l, int d) {
int i, k;
if (l >= d)
return;
k = l;
swap(&a[l], &a[(l + d) / 2]);
for (i = l + 1; i <= d; i++)
if (a[i] < a[l])
swap(&a[++k], &a[i]);
swap(&a[l], &a[k]);
kvikSort(a, 0, k-1);
kvikSort(a, k+1, d);
}
EDIT: I am using GCC v 4.7.2 on my Linux Mint 14, proc: intel core2duo e7400
EDIT: My other algorithms:
void selectionSort(int a[], int n) {
int i, j, min;
for (i = 0; i < n - 1; i++) {
min = i;
for (j = i + 1; j < n; j++)
if (a[j] < a[min])
min = j;
if (min != i)
swap(&a[min], &a[i]);
}
}
void insertionSort(int a[], int n) {
int i, j;
for (i = 0; i < n - 1; i++)
for (j = i + 1; j > 0 && a[j] < a[j-1]; j--)
swap(&a[j], &a[j-1]);
}
void bubbleSort(int a[], int n) {
int i, j;
for (i = n - 1; i > 0; i--)
for (j = 0; j < i; j++)
if (a[j] > a[j+1])
swap(&a[j], &a[j+1]);
}
void swap(int *i, int *j) {
int tmp;
tmp = *i;
*i = *j;
*j = tmp;
}
EDIT: Maybe I should mention that my test program first writes the randomly generated array to a text file, then the sorted array to another text file. So it is certainly running slowly, but that is not the problem; the problem is that quick sort runs a lot slower than the rest.

Your first recursive call
kvikSort(a, 0, k-1);
has the wrong lower bound, it should be
kvikSort(a, l, k-1);
With a lower bound of 0, you re-sort the initial part of the array again and again.
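With that one change applied, the routine would look like this. It is a sketch that reuses the question's own code; swap is repeated only so the block compiles on its own.

```c
/* Same helper as in the question, repeated for self-containment. */
static void swap(int *i, int *j)
{
    int tmp = *i;
    *i = *j;
    *j = tmp;
}

/* Sorts a[l..d] (both bounds inclusive) in ascending order. */
void kvikSort(int a[], int l, int d)
{
    int i, k;
    if (l >= d)
        return;
    k = l;
    swap(&a[l], &a[(l + d) / 2]);      /* move the middle element to the front as pivot */
    for (i = l + 1; i <= d; i++)
        if (a[i] < a[l])
            swap(&a[++k], &a[i]);      /* grow the "smaller than pivot" block */
    swap(&a[l], &a[k]);                /* put the pivot in its final place */
    kvikSort(a, l, k - 1);             /* was kvikSort(a, 0, k-1) */
    kvikSort(a, k + 1, d);
}
```

Each recursive call now stays inside a[l..d], so the part of the array to the left of l is not sorted again.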

Here's the problem:
void kvikSort(int a[], int l, int d) {
int i, k;
if (l >= d)
return;
k = l;
swap(&a[l], &a[(l + d) / 2]);
for (i = l + 1; i <= d; i++)
if (a[i] < a[l])
swap(&a[++k], &a[i]);
swap(&a[l], &a[k]);
>>> kvikSort(a, 0, k-1);
kvikSort(a, l, k-1);
kvikSort(a, k+1, d);