Parallel merge in single thread mode very slow

I have two sets of sorted elements and want to merge them in a way that I can parallelize later. I have a simple merge implementation that has data dependencies because it uses the maximum function, and a first version of a parallelizable merge that uses binary search to find the rank and compute the output index for a given value.
The getRank function returns the number of elements less than or equal to the given needle.
#define ATYPE int
int getRank(ATYPE needle, ATYPE *haystack, int size) {
int low = 0, mid;
int high = size - 1;
int cmp;
ATYPE midVal;
while (low <= high) {
mid = ((unsigned int) (low + high)) >> 1;
midVal = haystack[mid];
cmp = midVal - needle;
if (cmp < 0) {
low = mid + 1;
} else if (cmp > 0) {
high = mid - 1;
} else {
return mid; // key found
}
}
return low; // key not found
}
The merge algorithms operate on the two sorted sets a and b and store the result in c.
void simpleMerge(ATYPE *a, int n, ATYPE *b, int m, ATYPE *c) {
int i, l = 0, r = 0;
for (i = 0; i < n + m; i++) {
if (l < n && (r == m || max(a[l], b[r]) == b[r])) {
c[i] = a[l];
l++;
} else {
c[i] = b[r];
r++;
}
}
}
void merge(ATYPE *a, int n, ATYPE *b, int m, ATYPE *c) {
int i;
for (i = 0; i < n; i++) {
c[i + getRank(a[i], b, m)] = a[i];
}
for (i = 0; i < m; i++) {
c[i + getRank(b[i], a, n)] = b[i];
}
}
The merge function is very slow for a large number of elements, although it can still be parallelized; simpleMerge is always faster, even though it cannot be parallelized.
So my question is: do you know a better approach for parallel merging, and if so, can you point me in the right direction? Or is my code just bad?

Complexity of simpleMerge function:
O(n + m)
Complexity of merge function:
O(n log m + m log n)
Without having thought about this too much, my suggestion for parallelizing it is to pick a single value near the middle of each array, locate the matching split point in the other array with something similar to the getRank function, and run simpleMerge on each part independently. That costs O(n + m + log m + log n) = O(n + m) in total, even if you do a small, constant number of lookups to find a value near the middle.
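The splitting idea above can be sketched roughly as follows. This is only an illustration under my own assumptions: the names lower_bound, seq_merge and split_merge are hypothetical, the element type is fixed to int, and only one split (two parts) is shown. The split in b is not guaranteed to be balanced, since it depends on the data.

```c
#include <stddef.h>

/* Number of elements in the sorted array b[0..m) that are strictly less than key. */
static size_t lower_bound(const int *b, size_t m, int key) {
    size_t lo = 0, hi = m;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (b[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Plain sequential merge of a[0..n) and b[0..m) into c[0..n+m). */
static void seq_merge(const int *a, size_t n, const int *b, size_t m, int *c) {
    size_t i = 0, j = 0, k = 0;
    while (i < n && j < m)
        c[k++] = (b[j] < a[i]) ? b[j++] : a[i++];
    while (i < n) c[k++] = a[i++];
    while (j < m) c[k++] = b[j++];
}

/* Cut a in the middle, find the matching cut in b, then merge the two parts.
 * Everything in the left part is <= everything in the right part. */
void split_merge(const int *a, size_t n, const int *b, size_t m, int *c) {
    size_t ia = n / 2;
    size_t ib = (n == 0) ? 0 : lower_bound(b, m, a[ia]);
    /* The two calls write to disjoint parts of c, so they could run on
     * separate threads; here they simply run one after the other. */
    seq_merge(a, ia, b, ib, c);
    seq_merge(a + ia, n - ia, b + ib, m - ib, c + ia + ib);
}
```

Applying the same split recursively, or choosing several split values up front, would give more than two independent parts while keeping the total work at O(n + m) plus a few binary searches.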

The algorithm used by the sequential merge function (simpleMerge) is asymptotically optimal. Its complexity is O(n + m), and you cannot find a better algorithm, since just reading the input and writing the output already takes O(n + m).

Related

What's wrong with this merge sort code I have done from the CLRS?

Wrong output!
I have tried each and every condition but failed to get the correct result.
I tried to implement this from the pseudo-code in the CLRS book, but I failed.
I am trying to write merge sort in C using iterators, implementing the pseudo-code myself. For some reason the code compiles, but the output is not sorted. Can someone figure out what is wrong with it? It seems perfectly fine to my untrained eyes.
#include <stdio.h>
#include<math.h>
#include <stdlib.h>
int a[] = {5,3,65,6,7,3,7,8};
void print_array(int a[], int size)
{
int i;
for(i = 0;i < size;i++)
{
printf("%d ",a[i]);
}
}
void merge(int a[],int p,int q,int r)
{
int n1,n2,i,j,k;
n1 = q - p + 1;
n2 = r - q;
int l[n1];
int m[n2];
for(i = 0; i < n1; i++)
l[i] = a[i+p];
for(j = 0; j < n2; j++)
m[j] = a[q+1+j];
l[n1] = 9999999;
m[n2] = 9999999;
i = 0;
j = 0;
for(k = p;k < r; k++)
{
if(l[i] <= m[j])
{
a[k] = l[i];
i = i+1;
}
else
{
a[k] = m[j];
j = j+1;
}
}
}
void merge_sort(int a[],int p,int r)
{
if(p < r)
{
int q = floor((p + r) / 2);
merge_sort(a,p,q);
merge_sort(a,q+1,r);
merge(a,p,q,r);
}
}
int main()
{
int size = (sizeof(a) / sizeof(a[0]));
print_array(a,size);
printf("\n");
merge_sort(a,0,size);
print_array(a,size);
return 0;
}
//for this input, the output shown is
//-1 -1 3 3 3 -1 6 7

Please pay attention to array bounds and sizes:
Your parameter r is not the size of the array, but the index of the rightmost element, so you should call merge_sort(a, 0, size - 1);.
When you want to use a large sentinel value after the actual array, you must allocate space for it, so:
int l[n1 + 1];
int m[n2 + 1];
Because your value r is the index of the last element, you must consider it when merging and your loop condition should be for(k = p; k <= r; k++).
(Not really a problem, but you don't need to use floor like in JavaScript. When a and b are integers, a / b will perform a division that results in an integer.)
In C, arrays (and ranges in general) have inclusive lower bounds and exclusive upper bounds: lo is the first valid index and hi is the first invalid index after the valid range. For array indices, lo and hi are zero and the array size.
Embrace this convention. The C indices lead to the following style:
The length of a range is hi - lo;
Forward loops are for (i = lo; i < hi; i++);
Adjacent ranges share the hi and lo values.
For example, in your merge function the middle value q would be the first index of the right range, but also the exclusive upper bound of the left range.
If pseudocode or code in other languages uses one-based indices, I recommend translating it to the zero-based, exclusive upper-bound style of C. After a while, you'll get suspicious of spurious - 1's and <='s. :)
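To make the convention concrete, here is a rough sketch of merge sort written with half-open ranges [lo, hi). It is not the CLRS sentinel version from the question: it uses a scratch array instead of sentinels, and the names merge_ranges, sort_range and merge_sort_halfopen are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted half-open ranges a[lo..mid) and a[mid..hi), using tmp as scratch. */
static void merge_ranges(int a[], int tmp[], int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof *a);
}

/* Sort a[lo..hi); the length of the range is simply hi - lo. */
static void sort_range(int a[], int tmp[], int lo, int hi) {
    if (hi - lo < 2)
        return;
    int mid = lo + (hi - lo) / 2;
    sort_range(a, tmp, lo, mid);   /* left range ends where ...   */
    sort_range(a, tmp, mid, hi);   /* ... the right range begins  */
    merge_ranges(a, tmp, lo, mid, hi);
}

/* Returns 0 on success, -1 if the scratch array cannot be allocated. */
int merge_sort_halfopen(int a[], int size) {
    if (size < 2)
        return 0;
    int *tmp = malloc((size_t)size * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    sort_range(a, tmp, 0, size);
    free(tmp);
    return 0;
}
```

Note that the top-level call passes 0 and size directly, and no - 1 or <= appears anywhere in the index arithmetic.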

How to assign an entity of type DTYPE to SWAP

This is a sorting algorithm. I am trying to sort in descending order by the sum (x + y) of the fields x and y in the struct.
I wrote the quicksort for descending order, but I have a question about SWAP(A[N/2].sum, A[0].sum); and pivot.
typedef struct DATA { double x; double y; double sum;} DTYPE;
//#define DTYPE double
#define SWAP(aa,bb) {DTYPE tmp; tmp = aa; aa=bb; bb=tmp;}
void sort_201821100(DTYPE A[], int N)
{
int piv;
if (N > 1) {
piv = partition(A, N);
sort_201821100(A, piv);
sort_201821100(A + piv + 1, N - piv - 1);
}
}
int partition(DTYPE A[], int N) {
int P = N / 2;
int i = 0, j = N;
for (i = 0; i < j; i++)
{
A[i].sum = A[i].x + A[i].y;
}
DTYPE pivot;
SWAP(A[N / 2].sum, A[0].sum);
pivot = A[0].sum;
while (1) {
while ((A[++i].sum > pivot) && (i < N));
while ((A[--j].sum < pivot) && (j > 0));
if (i >= j) break;
SWAP(A[i], A[j]);
}
SWAP(A[0], A[j]);
return j;
}
SWAP fails because a value of type double cannot be assigned to an entity of type DTYPE.
And pivot needs to have an arithmetic type.
I want to know how to assign to an entity of type DTYPE and how to give pivot an arithmetic type.
Any reply would be appreciated. Best regards.

You should replace your SWAP macro with two functions:
void swap_dtype(DTYPE *a, DTYPE *b);
Then you can call swap_dtype(A, A + j) instead of SWAP(A[0], A[j]).
void swap_double(double *a, double *b);
Then you can call swap_double(&(A[N / 2].sum), &(A[0].sum)) instead of SWAP(A[N / 2].sum, A[0].sum).
Once this works properly, you can replace these two functions with a macro. Since macros are error-prone, you should consider this step carefully.
You should also think hard about swapping the sums. A sorting algorithm should not modify the elements being sorted, only their positions; swapping only the sum fields is counterintuitive.
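As a rough illustration of these points, the sketch below swaps whole DTYPE elements and keeps the pivot as a plain double. It is not the asker's partition scheme: for brevity it uses a Lomuto-style partition, and the names compute_sums, partition_desc and sort_desc are invented for this example.

```c
#include <stddef.h>

typedef struct DATA { double x; double y; double sum; } DTYPE;

/* Swap two whole elements, so x, y and sum always travel together. */
static void swap_dtype(DTYPE *a, DTYPE *b) {
    DTYPE tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Fill in sum = x + y once, before sorting starts. */
void compute_sums(DTYPE A[], size_t N) {
    for (size_t i = 0; i < N; i++)
        A[i].sum = A[i].x + A[i].y;
}

/* Lomuto-style partition for descending order; returns the pivot's final index. */
static size_t partition_desc(DTYPE A[], size_t N) {
    swap_dtype(&A[N / 2], &A[N - 1]);  /* park the pivot element at the end */
    double pivot = A[N - 1].sum;       /* the pivot value is a double, not a DTYPE */
    size_t store = 0;
    for (size_t i = 0; i + 1 < N; i++)
        if (A[i].sum > pivot)
            swap_dtype(&A[i], &A[store++]);
    swap_dtype(&A[store], &A[N - 1]);
    return store;
}

/* Quicksort by sum, largest first. Call compute_sums() beforehand. */
void sort_desc(DTYPE A[], size_t N) {
    if (N < 2)
        return;
    size_t piv = partition_desc(A, N);
    sort_desc(A, piv);
    sort_desc(A + piv + 1, N - piv - 1);
}
```

Computing the sums in a separate pass also avoids recomputing them on every recursive partition call.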

Merge Sort - getting incorrect results and corrupted data (duplicating the elements) [closed]

I am trying to implement a basic merge sort, but something goes wrong: it duplicates some elements of my input array and even changes others, so the output array is corrupted. I use tmp[] as a globally declared array pointer (long *tmp; in the global declarations). What am I missing or doing wrong?
Also, how can I improve the time complexity of this algorithm?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void merge(long *arr, int l, int m, int r);
void mergeSort(long *arr, int l, int r);
//Global Declarations
long *tmp;
//Merge Sort
void Merge_Sort(long *Array, int Size) {
tmp = malloc(sizeof(long) * Size);
mergeSort(Array, 0, Size - 1);
}
//Merge Sort helper function
void mergeSort(long *arr, int l, int r) {
if (l >= r)
return;
// divide the array into two arrays
// call mergeSort with each array
// merge the two arrays into one
int m = l + (r - l) / 2; //avoids integer overflow
mergeSort(arr, l, m);
mergeSort(arr, m + 1, r);
merge(arr, l, m, r);
}
//merge function
static void merge(long *arr, int l, int m, int r) {
//tmp[] is a global array with the same size as arr[]
memcpy(&tmp[l], &arr[l], m - l + 1); //copy left subarray to tmp
memcpy(&tmp[m + 1], &arr[m + 1], r - m); //copy right subarray to tmp
int i = l;
int j = m + 1;
for (int k = l; k <= r; k++) {
if (i > m)
arr[k] = tmp[j++]; //if the left sub-array is exhausted
else
if (j > r)
arr[k] = tmp[i++]; //if the right sub-array is exhausted
else
if (tmp[j] < tmp[i])
arr[k] = tmp[j++]; //compare the current values
else
arr[k] = tmp[i++];
}
}
int main() {
long array[10] = {
-3153274050600690459,
6569843820458972605,
-6837880721686463424,
1876340121514080353,
-1767506107468465601,
-1913444019437311076,
-426543213433372251,
6724963487502039099,
-1272217999899710623,
3399373277871640777,
};
Merge_Sort(array, 10);
for (int i = 0; i < 10; i++) {
printf("%ld\n", array[i]);
}
return 0;
}
Output (incorrect):
-1913444019437311076
-426543213433372251
140464981228095
140388532523709
94285492859968
94285492861503
-1767506107468465601
6724963487502039099
-1272217999899710623
3399373277871640777
Expected output:
-6837880721686463424
-3153274050600690459
-1913444019437311076
-1767506107468465601
-1272217999899710623
-426543213433372251
1876340121514080353
3399373277871640777
6569843820458972605
6724963487502039099

The merge function does not copy the correct number of bytes:
memcpy(&tmp[l], &arr[l], m - l + 1); //copy left subarray to tmp
memcpy(&tmp[m + 1], &arr[m + 1], r - m); //copy right subarray to tmp
You must compute the correct number of bytes by multiplying the number of elements by the size of the element. Note also that the left and right subarrays are contiguous, so it suffices to write:
memcpy(&tmp[l], &arr[l], sizeof(*tmp) * (r - l + 1));
There are other problems:
avoid using a global variable tmp, just pass it to mergeSort as an extra argument
you must free the temporary array after mergeSort() finishes.
Here is a modified version:
#include <stdlib.h>
#include <string.h>
//merge function
static void merge(long *arr, int l, int m, int r, long *tmp) {
//tmp[] is a scratch array with the same size as arr[]
memcpy(&tmp[l], &arr[l], sizeof(*tmp) * (r - l + 1));
for (int k = l, i = l, j = m + 1; k <= r; k++) {
if (i <= m && (j > r || tmp[i] <= tmp[j]))
arr[k] = tmp[i++];
else
arr[k] = tmp[j++];
}
}
//Merge Sort helper function
static void mergeSort(long *arr, int l, int r, long *tmp) {
if (l < r) {
// divide the array into two arrays
// call mergeSort with each array
// merge the two arrays into one
int m = l + (r - l) / 2; //avoid integer overflow
mergeSort(arr, l, m, tmp);
mergeSort(arr, m + 1, r, tmp);
merge(arr, l, m, r, tmp);
}
}
//Merge Sort
void Merge_Sort(long *array, int size) {
long *tmp = malloc(sizeof(*tmp) * size);
mergeSort(array, 0, size - 1, tmp);
free(tmp);
}
Regarding your other question: how can I improve the time complexity of this algorithm?
The merge sort algorithm has a time complexity of O(N * log(N)) regardless of the distribution of the input. This is considered optimal for generic data. If your data happens to have known specific characteristics, other algorithms may have a lower complexity:
if all values are in a small range, counting sort is a good alternative.
if there are many duplicates and a small number K of distinct values, the complexity can be reduced to O(N + K * log(K)).
integer values can be sorted with radix sort, which can be more efficient for large arrays.
if the array is almost sorted, insertion sort or a modified merge sort (testing if the left and right subarrays are already in order with a single initial test) can be faster too.
Using Timsort can result in faster execution for many non random distributions.
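For the first case in the list above, a counting sort could look roughly like the following. This is a sketch under the assumption that the caller already knows bounds min and max for the values and that max - min is small enough for a count table; the function name and signature are my own.

```c
#include <stdlib.h>

/* Sort a[0..n) in place, assuming min <= a[i] <= max for every element.
 * Runs in O(n + range) time with range = max - min + 1.
 * Returns 0 on success, -1 if the count table cannot be allocated. */
int counting_sort(long *a, size_t n, long min, long max) {
    size_t range = (size_t)(max - min) + 1;
    size_t *count = calloc(range, sizeof *count);
    if (count == NULL)
        return -1;
    /* First pass: count how often each value occurs. */
    for (size_t i = 0; i < n; i++)
        count[a[i] - min]++;
    /* Second pass: write each value back as many times as it was seen. */
    size_t k = 0;
    for (size_t v = 0; v < range; v++)
        for (size_t c = count[v]; c > 0; c--)
            a[k++] = min + (long)v;
    free(count);
    return 0;
}
```

This only makes sense when the range is comparable to N or smaller; for the 64-bit values in the question the table would be far too large, which is where radix sort comes in.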
Here is an implementation of radix_sort() for arrays of long:
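#include <limits.h>
#include <stdlib.h>
#include <string.h>
#define VAL_MIN LONG_MIN /* adding this bias makes unsigned order match signed order */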
void radix_sort(long *a, size_t size) {
size_t counts[sizeof(*a)][256] = {{ 0 }}, *cp;
size_t i, sum;
unsigned int n;
unsigned long *tmp, *src, *dst, *aa;
dst = tmp = malloc(size * sizeof(*a));
src = (unsigned long *)a;
for (i = 0; i < size; i++) {
unsigned long v = src[i] + (unsigned long)VAL_MIN;
for (n = 0; n < sizeof(*a) * 8; n += 8)
counts[n >> 3][(v >> n) & 255]++;
}
for (n = 0; n < sizeof(*a) * 8; n += 8) {
cp = &counts[n >> 3][0];
for (i = 0, sum = 0; i < 256; i++)
cp[i] = (sum += cp[i]) - cp[i];
for (i = 0; i < size; i++)
dst[cp[((src[i] + (unsigned long)VAL_MIN) >> n) & 255]++] = src[i];
aa = src;
src = dst;
dst = aa;
}
if (src == tmp)
memcpy(a, src, size * sizeof(*a));
free(tmp);
}

Is there a way to iterate over order?

How can one iterate through order of execution?
I am developing a piece of software that has several steps to compute over some data, and I was thinking of changing the order of those steps programmatically so I can check what the best order would be for some data.
Let me exemplify: I have let's say 3 steps (it's actually more):
stepA(data);
stepB(data);
stepC(data);
And I want a contraption that allows me to walk through every permutation of those steps and then check the results. Something like this:
data = originalData; i=0;
while (someMagic(&data,[stepA,stepB,stepC],i++)){
checkResults(data);
data = originalData;
}
Then someMagic executes A, B, then C on i == 0; A, C, then B on i == 1; B, A, then C on i == 2; and so on.

You can use function pointers, maybe something like the following:
typedef void (*func)(void *data);
int someMagic(void *data, func *func_list, int i) {
switch (i) {
case 0:
func_list[0](data);
func_list[1](data);
func_list[2](data);
break;
case 1:
func_list[0](data);
func_list[2](data);
func_list[1](data);
break;
case 2:
func_list[1](data);
func_list[0](data);
func_list[2](data);
break;
default: return 0;
}
return 1;
}
func steps[3] = {
stepA,
stepB,
stepC
};
while (someMagic(&data, steps, i++)) {
....
}

The key is to find a way to iterate over the set of permutations of the [0, n[ integer interval.
A permutation (in the mathematical meaning) can be seen as a bijection of [0, n[ into itself and can be represented by the image of this permutation, applied to [0, n[.
For example, consider the permutation of [0, 3[:
0 -> 1
1 -> 2
2 -> 0
It can be seen as the tuple (1, 2, 0), which in C translates naturally to the array of integers permutation = (int []){1, 2, 0};.
Suppose you have an array of function pointers steps, then for each permutation, you'll then want to call steps[permutation[i]], for each value of i in [0, n[.
The following code implements this algorithm:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
static void stepA(int data) { printf("%d: %s\n", data, __func__); }
static void stepB(int data) { printf("%d: %s\n", data, __func__); }
static void stepC(int data) { printf("%d: %s\n", data, __func__); }
static void (* const steps[])(int) = {stepA, stepB, stepC,};
static int fact(int n) { return n == 0 ? 1 : fact(n - 1) * n; }
static int compare_int(const void *pa, const void *pb)
{
return *(const int *)pa - *(const int *)pb;
}
static void get_next_permutation(int tab[], size_t n)
{
int tmp;
unsigned i;
unsigned j;
unsigned k;
/* to find the next permutation in the lexicographic order
* source: question 4 (in french, sorry ^^) of
* https://liris.cnrs.fr/~aparreau/Teaching/INF233/TP2-permutation.pdf
. */
/* 1. find the biggest index i for which tab[i] < tab[i+1] */
for (k = 0; k < n - 1; k++)
if (tab[k] < tab[k + 1])
i = k;
/* 2. Find the index j of the smallest element, bigger than tab[i],
* located after i */
j = i + 1;
for (k = i + 1; k < n; k++)
if (tab[k] > tab[i] && tab[k] < tab[j])
j = k;
/* 3. Swap the elements of index i and j */
tmp = tab[i];
tab[i] = tab[j];
tab[j] = tmp;
/* 4. Sort the array in ascending order, after index i */
qsort(tab + i + 1, n - (i + 1), sizeof(*tab), compare_int);
}
int main(void)
{
int n = sizeof(steps) / sizeof(*steps);
int j;
int i;
int permutation[n];
int f = fact(n);
/* first permutation is identity */
for (i = 0; i < n; i++)
permutation[i] = i;
for (j = 0; j < f; j++) {
for (i = 0; i < n; i++)
steps[permutation[i]](i);
if (j != f - 1)
get_next_permutation(permutation, n);
}
return EXIT_SUCCESS;
}
The outer loop in main, indexed by j, iterates over all the n! permutations, while the inner one, indexed by i, iterates over the n steps.
The get_next_permutation modifies the permutation array in place, to obtain the next permutation in the lexicographical order.
Note that it doesn't work when the permutation in input is the last one (n - 1, ..., 1, 0), hence the if (j != f - 1) test.
One could enhance it to detect this case (i isn't set) and to put the first permutation (0, 1, ..., n - 1) into the permutation array.
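A possible version of that enhancement is sketched below. It is not the code above with a small patch: it scans from the right to find i, and it reverses the suffix instead of calling qsort (the suffix is in descending order at that point, so reversing it sorts it). The names next_permutation and reverse_ints are my own.

```c
#include <stddef.h>

/* Reverse tab[lo..hi) in place. */
static void reverse_ints(int tab[], size_t lo, size_t hi) {
    while (lo + 1 < hi) {
        hi--;
        int tmp = tab[lo];
        tab[lo] = tab[hi];
        tab[hi] = tmp;
        lo++;
    }
}

/* Advance tab[0..n) to its successor in lexicographic order.
 * Returns 1 on success. If tab already holds the last permutation,
 * it is reset to the first one (ascending order) and 0 is returned. */
int next_permutation(int tab[], size_t n) {
    if (n < 2)
        return 0;
    /* 1. Find the start i of the longest non-increasing suffix. */
    size_t i = n - 1;
    while (i > 0 && tab[i - 1] >= tab[i])
        i--;
    if (i == 0) {                 /* whole array is descending: last permutation */
        reverse_ints(tab, 0, n);
        return 0;
    }
    /* 2. Find the rightmost element bigger than tab[i - 1]. */
    size_t j = n - 1;
    while (tab[j] <= tab[i - 1])
        j--;
    /* 3. Swap them, then 4. put the suffix back into ascending order. */
    int tmp = tab[i - 1];
    tab[i - 1] = tab[j];
    tab[j] = tmp;
    reverse_ints(tab, i, n);
    return 1;
}
```

With this return value, the loop in main could be written as do { ... } while (next_permutation(permutation, n)); without computing n! at all.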
The code can be compiled with:
gcc main.c -o main -Wall -Wextra -Werror -O0 -g3
And I strongly suggest using valgrind as a way to detect off-by-one errors.
EDIT: I just realized I didn't answer the OP's question precisely. The someMagic() function would allow a direct access to the i-th permutation, while my algorithm only allows to compute the successor in the lexicographic order. But if the aim is to iterate on all the permutations, it will work fine. Otherwise, maybe an answer like this one should match the requirement.
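For direct access to the i-th permutation, one common approach is to decode the index in the factorial number system. The sketch below is my own illustration of that idea, not code from any linked answer; nth_permutation is a hypothetical name, and n is capped at 12 so that n! fits in a 32-bit unsigned long.

```c
#include <stddef.h>

/* Write the index-th permutation of 0..n-1 (lexicographic order, 0-based)
 * into out[0..n). Returns 1 on success, 0 if n > 12 or index >= n!. */
int nth_permutation(size_t n, unsigned long index, int out[]) {
    if (n > 12)
        return 0;
    unsigned long fact = 1;
    for (size_t k = 2; k <= n; k++)
        fact *= k;                       /* fact = n! */
    if (index >= fact)
        return 0;
    /* out[pos..n) always holds the not-yet-used values in ascending order. */
    for (size_t k = 0; k < n; k++)
        out[k] = (int)k;
    for (size_t pos = 0; pos < n; pos++) {
        fact /= (unsigned long)(n - pos);   /* now (n - pos - 1)! */
        size_t d = (size_t)(index / fact);  /* which remaining value to pick */
        index %= fact;
        /* Move the d-th remaining value to position pos, keeping the rest ordered. */
        int chosen = out[pos + d];
        for (size_t k = pos + d; k > pos; k--)
            out[k] = out[k - 1];
        out[pos] = chosen;
    }
    return 1;
}
```

A someMagic-style function could then call nth_permutation(n, i, order), return 0 when it fails, and otherwise call steps[order[k]](data) for k from 0 to n - 1. For three steps this gives A, B, C for index 0, then A, C, B for index 1, then B, A, C for index 2, matching the order in the question.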

I've come to a solution that is simple enough:
void stepA(STRUCT_NAME *data);
void stepB(STRUCT_NAME *data);
void stepC(STRUCT_NAME *data);
typedef void (*check)(STRUCT_NAME *data);
void swap(check *x, check *y) {
check temp;
temp = *x;
*x = *y;
*y = temp;
}
void permute(check *a, int l, int r,STRUCT_NAME *data) {
int i, j;
if (l == r) {
j = 0;
while (j <= r) a[j++](data);
} else {
for (i = l; i <= r; i++) {
swap((a + l), (a + i));
permute(a, l + 1, r, data);
swap((a + l), (a + i));
}
}
}
check checks[3] = {
stepA,
stepB,
stepC,
};
int main(void){
...
permute(checks, 0, 2, data);
}

Problem with my quicksort implementation

Newbie programmer here trying to implement quicksort, yet it won't work. I've looked at online resources but I just can't seem to spot the error in my implementation. Thanks in advance.
EDIT: The issue I'm having is that it seems to get stuck in the quicksort function and the program just hangs. When I tried debugging it with printf calls, the original array seemed to have been modified with unexpected numbers that were not in the original list, such as 0's.
void quicksort(int a[], const int start, const int end)
{
if( (end - start + 1 ) < 2)
return;
int pivot = a[rand()%(end - start)];
//Two pointers
int L = start;
int R = end;
while(L < R)
{
while(a[L] < pivot)
L++;
while(a[R] > pivot)
R--;
if(L < R)
swap(a,L,R);
}
quicksort(a, start, L-1);
quicksort(a, L+1, end );
}
void swap(int a[], const int pos1, const int pos2)
{
a[pos1] ^= a[pos2];
a[pos2] ^= a[pos1];
a[pos1] ^= a[pos2];
}
int main()
{
int array[20] = {0};
int size = sizeof(array)/sizeof(array[0]);//index range = size - 1
int i = 0;
printf("Original: ");
for (i; i < size; i++)
{
array[i] = rand()%100+ 1;
printf("%d ", array[i]);
}
printf("\n");
quicksort(array,0,size-1);
int j = 0;
printf("Sorted: ");
for(j; j < size; j++)
printf("%d ", array[j]);
printf("\n");
}
Additional Question: In regards to calling quicksort recursively, would the left and right pointer always point towards the pivot at the end of each partition? If so, is calling quicksort from start to L-1 and L+1 to end correct?
Also, is the if (L < R) before the swap necessary?

I believe that the problems stem from two errors in the logic. The first one is here:
int pivot = a[rand()%(end - start)];
Note that this always picks a pivot in the range [0, end - start) instead of [start, end). I think you want to have something like
int pivot = a[rand()%(end - start) + start];
so that you pick a pivot in the range you want.
The other error is in this looping code:
while(L < R)
{
while(a[L] < pivot)
L++;
while(a[R] > pivot)
R--;
if(L < R)
swap(a,L,R);
}
Suppose that L < R, but that a[L], a[R], and pivot are all the same value. This might come up, for example, if you were quicksorting a range containing duplicate elements. It also comes up when you use rand with the standard Linux implementation of rand (I tried this on my machine and 27 was duplicated twice). If this is the case, then you never move L or R, because the conditions in the loops always evaluate to false. You will need to update your logic for partitioning elements when duplicates are possible, since otherwise you'll go into an infinite loop here.
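One way to update the partitioning logic, as a sketch rather than the only possible fix: always advance both pointers after a swap, so equal elements cannot stall the loop, and recurse on [start, R] and [L, end]. The function names below are my own. A temporary variable replaces the XOR swap, because an XOR swap of an element with itself sets it to 0, and here L and R can be equal when swapping.

```c
#include <stdlib.h>

/* Swap with a temporary; safe even when i == j. */
static void swap_ints(int a[], int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

/* Sort a[start..end], both bounds inclusive as in the question. */
void quicksort_dups(int a[], int start, int end) {
    if (end - start + 1 < 2)
        return;
    /* Pick the pivot from inside [start, end]. */
    int pivot = a[start + rand() % (end - start + 1)];
    int L = start;
    int R = end;
    while (L <= R) {
        while (a[L] < pivot)
            L++;
        while (a[R] > pivot)
            R--;
        if (L <= R) {
            swap_ints(a, L, R);
            L++;   /* always step past the swapped pair, */
            R--;   /* even if both values equal the pivot */
        }
    }
    /* Here R < L: everything in [start, R] is <= pivot,
     * everything in [L, end] is >= pivot. */
    quicksort_dups(a, start, R);
    quicksort_dups(a, L, end);
}
```

Both recursive ranges are strictly smaller than the original one, so the recursion terminates even when every element is the same.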
Hope this helps!

After the while loop, R should be less than L. Try this:
quicksort(a, start, R);
quicksort(a, L, end );
And the statement if(L < R) is not necessary.
