Merge sort using 2 thread IDs - C

Hi, I want to merge sort an array using threads. I need to use two thread IDs in order to sort recursively.
Here is my code:
void Recursive_Divition(int a[], int l, int h, int Degree_of_parallelism, int count) // this function divides the array recursively
{                                                                                    // and sends it to merge sort
    struct thread_data thread_data_array[Degree_of_parallelism];
    int i, len = (h - l + 1);
    if (len == 2 * Degree_of_parallelism) // stopping condition: if the subarray size equals twice the degree of parallelism, send it to be sorted
    {
        pthread_t *thread = (pthread_t *)malloc(Degree_of_parallelism * sizeof(pthread_t)); // pointer to a thread array
        thread_data_array[count].low = l;       // the arguments for the merge sort
        thread_data_array[count].high = h/2;
        pthread_create(&thread[count], NULL, threaded_merge_sort, (void *)&thread_data_array[count]); // create a thread and send it to the function
        count++;
        thread_data_array[count].low = h/2 + 1; // the arguments for the other thread
        thread_data_array[count].high = h;
        pthread_create(&thread[count], NULL, threaded_merge_sort, (void *)&thread_data_array[count]);
        pthread_join(thread[count-1], NULL);
        pthread_join(thread[count], NULL);
        free(thread);
    }
    if (len > Degree_of_parallelism) {
        Recursive_Divition(a, l, l + len/2 - 1, Degree_of_parallelism, count); // for the first half
        Recursive_Divition(a, l + len/2, h, Degree_of_parallelism, count);     // for the second half
        merge(a, l, l + len/2 - 1, h);
    }
}
And here is the threaded merge sort:
void *threaded_merge_sort(void *param) // this function calls the merge sort function
{
    printf("Create a thread %u\n", (unsigned int)pthread_self());
    struct thread_data *my_data;
    my_data = (struct thread_data *)param;
    int l = my_data->low;
    int h = my_data->high;
    printf("low is : %d high is : %d\n", l, h);
    mergeSort(array, l, h);
    pthread_exit(NULL);
}
And I got the following output. The problem appears to be in the indices, and I don't know how to fix it:
Amount of numbers that sort: 16
Degree of parallelism: 8
Array Before Sort: 1,5,6,33,77,12,90,87,0,10,34,2,741,453,19,132
Create a thread 3158492928
low is : 0 high is : 1
Create a thread 3150100224
low is : 2 high is : 3
Create a thread 3141707520
low is : 4 high is : 3
Create a thread 3133314816
low is : 4 high is : 7
Create a thread 3122734848
low is : 8 high is : 5
Create a thread 3114342144
low is : 6 high is : 11
Create a thread 3105949440
Create a thread 3097556736
low is : 8 high is : 15
low is : 12 high is : 7
Array After Sort: 1,5,6,10,19,33,12,34,0,2,77,87,90,132,453,741

The indexing problem is here:
thread_data_array[count].low = l;       // the arguments for the merge sort
thread_data_array[count].high = h/2;
[...]
thread_data_array[count].low = h/2 + 1; // the argument for the other thread
thread_data_array[count].high = h;
That does not correctly split the (sub)array into halves unless l is 0. For example, if h is 16 then your array break is always at 8, even if that's less than l.
Instead, you appear to want
thread_data_array[count].low = l;
thread_data_array[count].high = (l + h) / 2;
and
thread_data_array[count].low = (l + h) / 2 + 1;
thread_data_array[count].high = h;
Also, I suspect you want an else before the second if in Recursive_Divition(), and you may also need to add a final else block.
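Putting the pieces together, a minimal sketch of the corrected branch might look like this (assuming merge(), threaded_merge_sort(), and struct thread_data behave as in the question):

if (len == 2 * Degree_of_parallelism) // stopping condition
{
    int mid = (l + h) / 2;                  /* midpoint of THIS subarray, not of the whole array */
    pthread_t thread[2];
    thread_data_array[count].low = l;       /* first half: [l, mid] */
    thread_data_array[count].high = mid;
    pthread_create(&thread[0], NULL, threaded_merge_sort, (void *)&thread_data_array[count]);
    count++;
    thread_data_array[count].low = mid + 1; /* second half: [mid+1, h] */
    thread_data_array[count].high = h;
    pthread_create(&thread[1], NULL, threaded_merge_sort, (void *)&thread_data_array[count]);
    pthread_join(thread[0], NULL);
    pthread_join(thread[1], NULL);
    merge(a, l, mid, h);                    /* merge the two halves the threads just sorted */
}
else if (len > Degree_of_parallelism)
{
    Recursive_Divition(a, l, l + len/2 - 1, Degree_of_parallelism, count);
    Recursive_Divition(a, l + len/2, h, Degree_of_parallelism, count);
    merge(a, l, l + len/2 - 1, h);
}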

Related

Why does my program print 14000000 instead of 10000000 using threads?

I wrote a simple C program to make every thread multiply its index by 1000000 and add it to sum. I created 5 threads, so the logical answer would be (0+1+2+3+4)*1000000, which is 10000000, but it prints 14000000 instead. Could anyone help me understand this?
#include <pthread.h>
#include <stdio.h>

typedef struct argument {
    int index;
    int sum;
} arg;

void *fonction(void *arg0) {
    ((arg *) arg0)->sum += ((arg *) arg0)->index * 1000000;
    return NULL;
}
int main() {
    pthread_t thread[5];
    int order[5];
    arg a;
    for (int i = 0; i < 5; i++)
        order[i] = i;
    a.sum = 0;
    for (int i = 0; i < 5; i++) {
        a.index = order[i];
        pthread_create(&thread[i], NULL, fonction, &a);
    }
    for (int i = 0; i < 5; i++)
        pthread_join(thread[i], NULL);
    printf("%d\n", a.sum);
    return 0;
}
It is 14000000 because the behavior is undefined. The results will differ between machines, runs, and other environmental factors. The undefined behavior is caused by all threads accessing the same object (see the &a given to each thread), which is modified after the first thread is created.
When each thread runs, it accesses the same index (as part of accessing a member of the same object &a). Thus the assumption that the threads will see [0,1,2,3,4] is incorrect: multiple threads likely see the same value of index (e.g. [0,2,4,4,4]¹) when they run. This depends on how the threads are scheduled relative to the loop creating them, since that loop also modifies the shared object.
When each thread updates sum, it has to read and write the same shared memory. This is inherently prone to race conditions and unreliable results. For example, it could be a lack of memory visibility (thread X doesn't see the value updated by thread Y), or a conflicting thread schedule between the read and the write (thread X reads, thread Y reads, thread X writes, thread Y writes), etc.
Creating a new arg object for each thread avoids both of these problems. While the sum issue could be fixed with appropriate locking, the index issue can only be fixed by not sharing the object given as the thread input.
// create 5 arg objects, one for each thread
arg a[5];
for (..) {
    a[i].index = i;
    // give a DIFFERENT object to each thread
    pthread_create(.., &a[i]);
}
// after all threads complete
int sum = 0;
for (..) {
    sum += a[i].sum;
}
¹ Even assuming there is no race condition on sum in a given execution, a sequence in which the threads see index values [0,2,4,4,4] (which sum to 14000000 after multiplication) might look as follows:
a.index <- 0 ; create thread A
thread A reads a.index (0)
a.index <- 1 ; create thread B
a.index <- 2 ; create thread C
thread B reads a.index (2)
a.index <- 3 ; create thread D
a.index <- 4 ; create thread E
thread D reads a.index (4)
thread C reads a.index (4)
thread E reads a.index (4)
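For completeness, a self-contained version of that fix might look like this (a sketch; it keeps the question's names and lets each thread write only to its own arg object):

#include <pthread.h>
#include <stdio.h>

typedef struct argument {
    int index;
    int sum;
} arg;

void *fonction(void *arg0) {
    arg *a = arg0;
    a->sum = a->index * 1000000; /* each thread touches only its own object */
    return NULL;
}

int main(void) {
    pthread_t thread[5];
    arg a[5];                    /* one arg object per thread: nothing is shared */
    for (int i = 0; i < 5; i++) {
        a[i].index = i;
        a[i].sum = 0;
        pthread_create(&thread[i], NULL, fonction, &a[i]);
    }
    int sum = 0;
    for (int i = 0; i < 5; i++) {
        pthread_join(thread[i], NULL);
        sum += a[i].sum;         /* combine the per-thread results after joining */
    }
    printf("%d\n", sum);         /* prints 10000000 */
    return 0;
}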

Why is the array sum less than the actual sum when OpenMP is used?

I have written the following C program using OpenMP to find the sum of an array of size 10000000 in parallel. The expected output is a sum of 10000000, but the output I get is less than that.
#include <stdio.h>
#include <omp.h>
#define ARR_SIZE 10000000

int a[ARR_SIZE];

int main(int argc, char* argv[])
{
    int i, tid, numt;
    int sum = 0;
    double t1, t2;
    for (i = 0; i < ARR_SIZE; i++)
        a[i] = 1;
    t1 = omp_get_wtime();
    #pragma omp parallel default(shared) private(i, tid)
    {
        int from, to;
        tid = omp_get_thread_num();
        numt = omp_get_num_threads();
        from = (ARR_SIZE/numt) * tid;
        to = (ARR_SIZE/numt) * (tid+1) - 1;
        if (tid == numt-1)
            to = ARR_SIZE - 1;
        printf("Hello from %d of %d , my range is from = %d to %d \n", tid, numt, from, to);
        for (i = from; i <= to; i++)
            sum += a[i];
    }
    t2 = omp_get_wtime();
    printf("Sum of the array elements = %d time = %g \n", sum, t2 - t1);
    return 0;
}
Some of the sample outputs are :
Output 1
Hello from 0 of 4 , my range is from = 0 to 2499999
Hello from 3 of 4 , my range is from = 7500000 to 9999999
Hello from 1 of 4 , my range is from = 2500000 to 4999999
Hello from 2 of 4 , my range is from = 5000000 to 7499999
Sum of the array elements = 3235618 time = 0.118754
Output 2
Hello from 3 of 4 , my range is from = 7500000 to 9999999
Hello from 0 of 4 , my range is from = 0 to 2499999
Hello from 2 of 4 , my range is from = 5000000 to 7499999
Hello from 1 of 4 , my range is from = 2500000 to 4999999
Sum of the array elements = 2964874 time = 0.129216
What is the reason that the computed sum is less than the actual sum?
The update of the sum variable isn't an atomic operation and is prone to races. Races of this type are likely to yield a smaller sum than expected.
The summing boils down to something of this kind:
Load to register from memory location
Add new value to the register
Store the register value back to the memory
Now, when 4 threads perform the above without regard for one another, some additions will be lost, resulting in a sum below what was expected.
For example, with 2 threads (for simplicity):
Thread 1: Load to a register from memory location
Thread 2: Load to a register from memory location
Thread 1: Add new value to the register
Thread 2: Add new value to the register
Thread 1: Store the register value back to the memory
Thread 2: Store the register value back to the memory
In this example, the addition performed by thread 1 is overwritten at the end.
You should make sure the summation is done atomically to avoid races.
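One idiomatic way to do that is OpenMP's reduction clause, which gives each thread a private partial sum and combines the partials safely at the end. A minimal sketch (it replaces the manual from/to range splitting entirely rather than patching it):

int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (i = 0; i < ARR_SIZE; i++)
    sum += a[i];

Alternatively, keep the manual ranges, accumulate into a thread-local variable, and add that local total to sum once per thread under #pragma omp atomic or a critical section.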

Am I using the mutex_trylock correctly?

The racers should have an equal chance of winning. When I run the program the results seem to be correct (both racers win about half the time), but I don't think I am using mutex_trylock correctly. Is it actually doing anything the way I implemented it? I am new to C, so I don't know a lot about this.
Program Description:
We assume two racers at two diagonally opposite corners of a rectangular region. They have to traverse the roads along the periphery of the region. There are two bridges on two opposite sides of the rectangle. To complete one round of traversal, a racer has to hold the passes for both bridges. The conditions of the race are:
1) Only one racer can hold a given pass at a time.
2) Before starting a round, a racer has to request and get both passes; after finishing that round he has to release the passes and try again to get them for the next round.
3) Racer 1 (R1) will acquire bridge-pass B1 first, then B0. R0 will acquire B0 and then B1.
4) There is a prefixed maximum number of rounds. Whoever reaches that number first is the winner and the race stops.
This is how the situation looks before starting.
B0
R0-------- ~ -------------
| |
| |
| |
| |
--------- ~ ------------- R1
B1
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

#define THREAD_NUM 2
#define MAX_ROUNDS 200000
#define TRUE 1
#define FALSE 0

/* mutex locks for each bridge */
pthread_mutex_t B0, B1;
/* racer ID */
int r[THREAD_NUM] = {0, 1};
/* number of rounds completed by each racer */
int numRounds[THREAD_NUM] = {0, 0};

void *racer(void *); /* prototype of racer routine */

int main()
{
    pthread_t tid[THREAD_NUM];
    void *status;
    int i, j;

    /* create 2 threads representing 2 racers */
    for (i = 0; i < THREAD_NUM; i++)
    {
        /* Your code here */
        pthread_create(&tid[i], NULL, racer, &r[i]);
    }

    /* wait for the join of 2 threads */
    for (i = 0; i < THREAD_NUM; i++)
    {
        /* Your code here */
        pthread_join(tid[i], &status);
    }

    printf("\n");
    for (i = 0; i < THREAD_NUM; i++)
        printf("Racer %d finished %d rounds!!\n", i, numRounds[i]);
    if (numRounds[0] >= numRounds[1]) printf("\n RACER-0 WINS.\n\n");
    else printf("\n RACER-1 WINS..\n\n");
    return (0);
}

void *racer(void *arg)
{
    int index = *(int*)arg, NotYet;

    while ((numRounds[0] < MAX_ROUNDS) && (numRounds[1] < MAX_ROUNDS))
    {
        NotYet = TRUE;
        /* RACER 0 tries to get both locks before she makes a round */
        if (index == 0)
        {
            /* Your code here */
            pthread_mutex_trylock(&B0);
            pthread_mutex_trylock(&B1);
        }
        /* RACER 1 tries to get both locks before she makes a round */
        if (index == 1)
        {
            /* Your code here */
            pthread_mutex_trylock(&B1);
            pthread_mutex_trylock(&B0);
        }
        numRounds[index]++; /* make one more round */
        /* unlock both locks */
        pthread_mutex_unlock(&B0);
        pthread_mutex_unlock(&B1);
        /* random yield to another thread */
    }
    printf("racer %d made %d rounds !\n", index, numRounds[index]);
    pthread_exit(0);
}
Not as written: you never check trylock's return value, so a racer can "make a round" without actually holding either pass. With blocking pthread_mutex_lock, acquiring the locks in opposite orders (one thread locks B0 while the other locks B1) would cause a deadlock, each holding one lock and waiting for the other. The correct trylock pattern is: if the first mutex is acquired but the second is not, release the first mutex and loop to try again. The retry loop can be made tighter by taking the first lock with mutex_lock and only the second with trylock.
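That pattern for racer 0 might look like the sketch below (an assumed fill-in for the /* Your code here */ slots, not the assignment's official solution; racer 1 would do the same with B1 first). Note that the mutexes also need to be initialized, e.g. with PTHREAD_MUTEX_INITIALIZER:

/* keep trying until BOTH bridge passes are held at once */
NotYet = TRUE;
while (NotYet) {
    if (pthread_mutex_trylock(&B0) == 0) {     /* got the first pass */
        if (pthread_mutex_trylock(&B1) == 0) { /* got the second pass too */
            NotYet = FALSE;                    /* holding both: run the round */
        } else {
            pthread_mutex_unlock(&B0);         /* back off so the other racer can progress */
        }
    }
}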

Parallelize a function using OpenMP in C

I wrote a program which inputs a matrix size and a number of threads and then generates a random binary matrix of 0's and 1's. Then I need to find clusters of 1's and give each cluster a unique number.
I am getting the correct output, but I am having a problem parallelizing the function.
My professor asked me to break the matrix rows into "thread_cnt" parts, i.e. if the thread count is 4 and the matrix size is 8, it breaks into 4 submatrices having 2 rows each.
The code is as follows:
// Inputted matrix size n and generated a binary matrix rand1[][]
begin = omp_get_wtime();
width = n/thread_cnt;
#pragma omp parallel num_threads(thread_cnt) for
for (d = 0; d < n; d = d++)
{
    b = d + width;
    Mat(d, b);
    d = (d-1) + width;
}
Mat(int w, int x)
{
    //printf("\n Entered function\n");
    for (i = w; i < x; i++)
    {
        for (j = 0; j < n; j++)
        {
            //printf("\n Entered the loop also\n");
            //printf("i = %d, j = %d\n", i, j);
            if (rand1[i][j] == 1)
            {
                rand1[i][j] = q;
                adj(i, j, q);
                q++;
            }
        }
    }
}
adj(int p, int e, int m) // function to find adjacent 1's
{
    //printf("\n Entered adj function\n");
    //printf("\n p = %d e = %d m = %d\n", p, e, m);
    if (rand1[p][e+1] == 1)
    {
        //printf("Test1\n");
        rand1[p][e+1] = m;
        adj(p, e+1, m);
    }
    if (rand1[p+1][e] == 1)
    {
        rand1[p+1][e] = m;
        //printf("Test2\n");
        adj(p+1, e, m);
    }
    if (rand1[p][e-1] == 1 && e-1 >= 0)
    {
        rand1[p][e-1] = m;
        //printf("Test3\n");
        adj(p, e-1, m);
    }
    if (p-1 >= 0 && rand1[p-1][e] == 1)
    {
        rand1[p-1][e] = m;
        //printf("Test4\n");
        adj(p-1, e, m);
    }
}
The code gives me the correct output, but the time increases instead of decreasing when I increase the number of threads: for 1 thread I get 0.000076 and for 2 threads I get 0.000136. It looks like it's iterating instead of parallelizing.
Can anyone help me out on this?
PS: I need to show both the serial time and the parallel time and show a performance increase due to parallelization.
The reason time increases with the thread count is that each thread executes the whole first loop. You don't hand the submatrices to individual threads; instead, every thread operates on every submatrix, i.e. the whole matrix.
To make the threads work on the matrix separately, you should use their unique thread id, which you can get with this line:
tid = omp_get_thread_num();
Then make a simple mapping: if tid is i, operate on the (i+1)-th submatrix, where 0 <= i <= nthreads-1. This can be coded as:
Mat(tid*width, tid*width + width)
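Put together, the parallel region might look like the sketch below. This only fixes the work division: q is still shared between threads and adj() can still recurse across band boundaries, so those would need separate synchronization.

/* assumes n is divisible by thread_cnt and Mat(w, x) processes rows w..x-1 */
width = n / thread_cnt;
#pragma omp parallel num_threads(thread_cnt)
{
    int tid = omp_get_thread_num();        /* unique id in [0, thread_cnt) */
    Mat(tid * width, tid * width + width); /* each thread works on its own band of rows */
}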

Dynamically allocating work to pthreads via a queue

Okay, so I'm having an issue with dynamically allocating work to pthreads in a queue.
For example, in my code I have a struct like below:
struct calc
{
    double num;
    double calcVal;
};
I store each struct in an array of length l like below.
struct calc **calcArray;
/* then I initialize calcArray to, say, length l and
   fill each calc struct with a num */
Now, based on num, I want to find the value of calcVal. Each struct calc has a different value for num.
I want to spawn 4 pthreads, which is easy enough, but I want to make it so that at the start:
thread 0 gets calcArray[0]
thread 1 gets calcArray[1]
thread 2 gets calcArray[2]
thread 3 gets calcArray[3]
Now, assuming the calculation takes a different amount of time for each calc:
if thread 1 finishes first, it then gets calcArray[4];
then thread 3 finishes and gets calcArray[5];
and this continues until the end of calcArray (length l) is reached.
I know I could just split the array into l/4 chunks (each thread gets one quarter of the calcs), but I don't want to do this. Instead, I want the work to behave like a queue. Any ideas on how to do this?
You could accomplish this pretty easily by creating a variable containing the index of the next element to be assigned, secured by a mutex.
Example:
// Index of next element to be worked on
int next_pos;
// Mutex that secures next_pos access
pthread_mutex_t next_pos_lock;

int main() {
    // ...
    // Initialize the mutex before you create any threads
    pthread_mutex_init(&next_pos_lock, NULL);
    next_pos = NUM_THREADS;
    // Create the threads
    // ...
}

void *threadfunc(void *arg) {
    int index = ...;
    while (index < SIZE_OF_WORK_ARRAY) {
        // Do your work
        // Update your index
        pthread_mutex_lock(&next_pos_lock);
        index = next_pos;
        next_pos++;
        pthread_mutex_unlock(&next_pos_lock);
    }
}
See also: POSIX Threads Programming - Mutex Variables
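For reference, a self-contained sketch of the whole pattern (the square-root calculation, the array length, and the start array are hypothetical stand-ins, not from the original post):

#include <math.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 4
#define SIZE_OF_WORK_ARRAY 16

struct calc {
    double num;
    double calcVal;
};

struct calc **calcArray;
int next_pos;                  /* index of the next element to hand out */
pthread_mutex_t next_pos_lock; /* protects next_pos */

void *threadfunc(void *arg) {
    int index = *(int *)arg;   /* each thread starts at its own slot */
    while (index < SIZE_OF_WORK_ARRAY) {
        /* the "work": a stand-in calculation */
        calcArray[index]->calcVal = sqrt(calcArray[index]->num);
        pthread_mutex_lock(&next_pos_lock);
        index = next_pos++;    /* claim the next unworked slot */
        pthread_mutex_unlock(&next_pos_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_THREADS];
    int start[NUM_THREADS];

    calcArray = malloc(SIZE_OF_WORK_ARRAY * sizeof *calcArray);
    for (int i = 0; i < SIZE_OF_WORK_ARRAY; i++) {
        calcArray[i] = malloc(sizeof **calcArray);
        calcArray[i]->num = i;
    }

    pthread_mutex_init(&next_pos_lock, NULL);
    next_pos = NUM_THREADS;    /* slots 0..3 go to the initial threads */

    for (int i = 0; i < NUM_THREADS; i++) {
        start[i] = i;
        pthread_create(&tid[i], NULL, threadfunc, &start[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < SIZE_OF_WORK_ARRAY; i++)
        printf("sqrt(%g) = %g\n", calcArray[i]->num, calcArray[i]->calcVal);
    return 0;
}

(Compile with -lm for sqrt.)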
