I hope I can state my question clearly. I am programming with pthreads. Briefly: I calculate the number of threads needed, create them, and pass each one to a function that transposes a different block of the matrix, so each thread has its own block.
To check that I am really using different threads, I print the result of pthread_self(), but I face two problems:
it seems that only one and the same thread is used, and I always get a warning about the output type of self_t. The simplified code below shows the main points.
Any ideas where I went wrong?
First, here are the struct and main:
pthread_mutex_t mutexZ; // Mutex initialize
int array[nn][nn];
struct v
{
int i, j; // threaded Row,Col
int n, y; //
int iMAX; //
};
void *transposeM(void *arg);
int main(int argc, char *argv[])
{
int Thread_Num = 10;
pthread_t t_ID[Thread_Num]; // the number of threads depending on # blocks
printf("Thread_Num %d\n", Thread_Num);
struct v *data = (struct v *) malloc(sizeof(struct v));
int i, j; //loop varables
//#############################################################
printf("Matrix Initial before Transpose Done\n");
// printing the Matrix Before any transpose if needed testing
for (i = 0; i < nn; i++){
for(j = 0; j< nn; j++){
array[i][j] = i*nn + j;
printf("%d ", array[i][j]);
}
printf("\n");}
//************************************************************/
// Initialize the mutex
pthread_mutex_init(&mutexZ, NULL);
pthread_attr_t attr; //Set of thread attributes
pthread_attr_init(&attr);
int n, y; // Loop Variables for tiling
//************************************************************/
//Start of loop transpose:
int start = 0;
for (n = 0; n < nn; n += TILE)
{
data->n = n; // row
for (y = 0; y <= n; y += TILE) {
data->y = y; // column
printf("y Tile:%d \n", y);
printf("Start before:%d \n", start);
//Transpose the other blocks, thread created for each Block transposed
pthread_create(&(t_ID[start]), NULL, transposeM, (void*) data); // Send the thread to the function
pthread_join(t_ID[start], NULL);
if (start < Thread_Num)
{
start = start + 1;
}
printf("Start after:%d \n", start);
} // End the Y column TileJump loop
} // End of n Row TileJump loop
}
The thread function, modified according to the notes:
void *transposeM(void *arg)
{
// Transposing the tiles
struct v *data = arg;
int i, j; //loop row and column
int temp = 0;
pthread_mutex_lock(&mutexZ); //lock the running thread here,so keeps block until thread that holds mutex releases it
pthread_t self_t; // To check the thread id - my own check, not mandatory
self_t = pthread_self();
printf("Thread number Main = %u \n ", self_t); // %u used here because pthread_t seems to be an unsigned long data type
//*******************************************************
//here some function to work
//########################################################
pthread_mutex_unlock(&mutexZ);
pthread_exit(NULL);
return (NULL);
} // End
There are two conceptual issues with your code:
1. You pass the same reference/address to each thread, making each thread work on the same data.
2. You join each thread immediately after creating it. As joining blocks until the joined thread has ended, this serialises the running of all threads.
To get around 1., create a unique instance of what data points to for each thread.
To fix 2., move the call to pthread_join() out of the loop creating the threads and put it into a second loop that runs after the creation loop.
...
printf("Thread_Num %d\n", Thread_Num);
pthread_t t_ID[Thread_Num]; // the number of threads depending on # blocks
struct v data_ID[Thread_Num]; // one instance of data for each thread
memset(data_ID, 0, sizeof data_ID); // a variable-length array cannot take an initializer; needs <string.h>
...
for (n = 0; n < nn; n += TILE) // limit of row
{
for (y = 0; y <= n; y += TILE) // limit of column - with y < n the diagonal tile is not transposed
{
struct v * data = data_ID + start; // thread-specific instance, picked per created thread
data->n = n; // row
data->y = y; // column
...
pthread_create(&(t_ID[start]), NULL, transposeM, (void*) data); // Send the thread to the function
...
} // End the Y column TileJump loop
} // End of n Row TileJump loop
while (start > 0) // assumes start now holds the number of threads created
{
pthread_join(t_ID[--start], NULL);
}
...
Modifications to the thread function:
void *transposeM(void *arg)
{
struct v *data = arg;
...
pthread_t self = pthread_self(); // better naming
...
pthread_exit(NULL); // the thread function exits here.
return NULL; // this is never reached, but is necessary to calm down the compiler.
} // End
Given this code:
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
void *findPrimes(void *arg)
{
int val = *(int *)arg;
for (int i = val * 1000; i < val * 1000 + 1000; i++)
{
int isPrime = 1;
for (int j = 2; j < i; j++)
{
if (i % j == 0)
{
isPrime = 0;
break;
}
}
if (isPrime)
{
printf("%d\n", i);
}
}
pthread_exit(NULL);
}
int main()
{
pthread_t p[3];
int val[3] = {0, 1, 2};
for (int i = 0; i < 3; i++)
{
pthread_create(&p[i], NULL, findPrimes, &val[i]);
}
for (int i = 0; i < 3; i++)
{
pthread_join(p[i], NULL);
}
return 0;
}
which prints, in 3 threads, all the prime numbers between 0 and 3000.
I want to print them in order. How can I do it?
My professor suggests using an array of semaphores.
In order to synchronize the actions of all the threads I suggest using a pthread_mutex_t and a pthread_cond_t (a condition variable). You also need a way to share data between threads, so I'd create a struct for that:
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
typedef struct {
unsigned whos_turn;
pthread_mutex_t mtx;
pthread_cond_t cv;
} shared_data;
whos_turn is used here to tell the threads whose turn it is to print the primes found.
Each thread also needs some thread-unique information. You called it val so I'll call it val here too. We can compare val with whos_turn to decide which thread it is that should print its result. In order to pass both the shared data and val to a thread, you can package that in a struct too:
typedef struct {
unsigned val;
shared_data *sd; // will point to the one and only instance of `shared_data`
} work_order;
Now, findPrimes needs somewhere to store the primes it calculates before it's time to print them. Since the range to search is hardcoded, I'd just add an array for that:
#define SEARCH_RANGE (1000ULL)
void *findPrimes(void *arg) {
work_order *wo = arg;
uintmax_t primes[SEARCH_RANGE]; // to store the found primes
int found_count = 0;
for (uintmax_t i = wo->val*SEARCH_RANGE+1; i <= (wo->val+1)*SEARCH_RANGE; i += 2) {
bool isPrime = true;
for (uintmax_t j = 3; j < i; j += 2) {
if (i % j == 0) { // note: both i and j are odd
isPrime = false;
break;
}
}
if (isPrime) {
primes[found_count++] = i;
}
}
if(wo->val == 0) { // special case for the first range
primes[0] = 2; // 1 is not a prime, but 2 is.
}
// ... to be continued below ...
So far, nothing spectacular. The thread has now found all primes in its range and has come to the synchronizing part. The thread must:
- lock the mutex
- wait for its turn (called "the predicate")
- let other threads do the same
Here's one common pattern:
// ... continued from above ...
// synchronize
pthread_mutex_lock(&wo->sd->mtx); // lock the mutex
// only 1 thread at a time reaches here
// check the predicate: that it's this thread's turn to print
while(wo->val != wo->sd->whos_turn) { // <- the predicate
// if control enters here, it was not this thread's turn
// cond_wait internally "unlocks" the mutex to let other threads
// reach here and wait for the condition variable to get signalled
pthread_cond_wait(&wo->sd->cv, &wo->sd->mtx);
// and here the lock is only held by one thread at a time again
}
// only the thread whose turn it is reaches here
Now, the thread has reached the point where it is its turn to print. It holds the mutex, so no other thread can reach this point at the same time.
// print the collected primes
for(int i = 0; i < found_count; ++i)
printf("%ju\n", primes[i]);
And hand over to the next thread in line to print the primes it has found:
// step the "whos_turn" indicator
wo->sd->whos_turn++;
pthread_mutex_unlock(&wo->sd->mtx); // release the mutex
pthread_cond_broadcast(&wo->sd->cv); // signal all threads to check the predicate
pthread_exit(NULL);
}
And it can be tied together quite neatly in main:
#define Size(x) (sizeof (x) / sizeof *(x))
int main() {
shared_data sd = {.whos_turn = 0,
.mtx = PTHREAD_MUTEX_INITIALIZER,
.cv = PTHREAD_COND_INITIALIZER};
pthread_t p[3];
work_order wos[Size(p)];
for (unsigned i = 0; i < Size(p); i++) {
wos[i].val = i; // the thread-unique information
wos[i].sd = &sd; // all threads will point at the same `shared_data`
pthread_create(&p[i], NULL, findPrimes, &wos[i]);
}
for (unsigned i = 0; i < Size(p); i++) {
pthread_join(p[i], NULL);
}
}
I have built this code using pthreads. The goal is to build an array X[N][D] and assign random values to it. You can read the elements of this array as the coordinates of some points.
In the next step I am trying to calculate an array distances[N] which holds the distances between the last element (the Nth) and every other element. The distance calculation is executed using pthreads.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <math.h>
#define N 10
#define D 2 //works for any d
#define NUM_THREADS 8
//double *distances;
//int global_index = 0;
pthread_mutex_t lock;
double *X;
typedef struct
{
//int thread_id;
double *distances;
int *global_index ;
pthread_mutex_t lock;
double *X;
}parms;
void *threadDistance(void *arg)
{
parms *data = (parms *) arg;
double *distances = data->distances;
double *X = data->X;
int *global_idx = data -> global_index;
int idx,j;
//long id = (long)arg;
pthread_mutex_lock(&lock);
while(*global_idx<N)
{
//printf("Thread #%ld , is calculating\n", id);
idx = *(global_idx);
(*global_idx)++;
pthread_mutex_unlock(&lock);
for(j=0 ; j<D; j++)
{
distances[idx] = pow(X[(j+1)*N-1]-X[j*N+idx], 2);
//printf("dis[%d]= ", dis);
//printf("%f\n",distances[idx]);
}
//printf("global : %d\n", *global_idx);
}
pthread_exit(NULL);
}
void calcDistance(double * X, int n, int d)
{
int i;
int temp=0;
pthread_t threads[NUM_THREADS];
double *distances = malloc(n * sizeof(double));
parms arg;
arg.X = X;
arg.distances = distances;
arg.global_index = &temp;
for (i=0 ; i<NUM_THREADS ; i++)
{
pthread_create(&threads[i], NULL, threadDistance, (void *) &arg);
}
for(i = 0 ; i<NUM_THREADS; i++)
{
pthread_join(threads[i], NULL);
}
/*----print dstances[] array-------*/
printf("--------\n");
for(int i = 0; i<N; i++)
{
printf("%f\n", distances[i]);
}
/*------------*/
free(distances);
}
int main()
{
srand(time(NULL));
//allocate the proper space for X
X = malloc(D*N*(sizeof(double)));
//fill X with numbers in space (0,1)
for(int i = 0 ; i<N ; i++)
{
for(int j=0; j<D; j++)
{
X[i+j*N] = (double) (rand() / (RAND_MAX + 2.0));
}
}
calcDistance(X, N, D);
return 0;
}
The problem is that the code executes completely only when N=100000. If N!=100000 the code just hangs and I have found that the source of the problem is the pthread_join() function. First of all I cannot understand why the hang depends on the value of N.
Secondly, I have tried printf()ing the value of global_index (as you can see it is commented out in this particular sample of code). As soon as I uncomment the printf("global : %d\n", *global_idx); command the program stops hanging, regardless of the value of N.
It seems crazy to me as the differences between hanging and not hanging are so irrelevant.
Regarding:
pthread_mutex_lock(&lock);
while(*global_idx<N)
{
// ...
pthread_mutex_unlock(&lock);
The mutex is locked once, before the loop, but unlocked on every iteration. The result is that after the first iteration of the loop the mutex is always unlocked, so *global_idx is read and incremented without protection. I suggest moving the call to pthread_mutex_lock() to the top of the loop body.
After making the above correction, I set N to 10000 and recompiled. The result was a seg fault, so the mishandling of the mutex is not the only problem.
Regarding:
"First of all I cannot understand why the hang depends on the value of N."
It seems the program is actually crashing with a seg fault, not hanging.
I want to read two tables A and B as input from the user, compute their inner product (a1b1+a2b2+...+anbn), save it in local_sum and then add it to a total_sum variable. I am using the code below, but there is a segmentation fault. For some reason tables A and B can't be passed to the function mul. Any help would be great, thank you!
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#define N 2
int p;
int A[N],B[N];
int local_sum;
void *mul(void *arg)
{
int lines, start, end, i, j;
int id = *(int*)arg;
lines = N / p;
start = id * lines;
end = start + lines;
for (i = start; i < end; i++)
local_sum = A[i] * B[i] + local_sum;
return NULL;
}
int main (int argc, char *argv[])
{
int i;
pthread_t *tid;
if (argc != 2)
{
printf("Provide number of threads.\n");
exit(1);
}
p = atoi(argv[1]);
tid = (pthread_t *)malloc(p * sizeof(pthread_t));
if (tid == NULL)
{
printf("Could not allocate memory.\n");
exit(1);
}
printf("Give Table A\n");
for (int i = 0; i < N; i++)
{
scanf("%d", &A[i]);
}
printf("Give Table B\n");
for (int i = 0; i < N; i++)
{
scanf("%d", &B[i]);
}
for (i = 0; i < p; i++)
{
int *a;
a = malloc(sizeof(int));
*a = 0;
pthread_create(&tid[i], NULL, mul, a);
}
for (i = 0; i < p; i++)
pthread_join(tid[i], NULL);
printf("%d", local_sum);
return 0;
}
Let's see:
You want to have p threads working on the vectors A and B.
You must be aware that threads share the same memory and might be interrupted at any time.
You've got p threads, all trying to write to one shared variable local_sum. This leads to unpredictable results, since one thread overwrites the value another thread has written there before.
You can avoid this problem either by ensuring that only one thread at a time accesses this variable, using a mutex or the like, or by having one variable per thread: each thread produces an intermediate result and, after joining all threads, you collapse the intermediate results into the final one.
To do the latter, your main should look something like this (assuming your compiler supports a recent C standard):
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#define N 2
/* these are variables shared amongst all threads */
int p;
int A[N], B[N];
/* array with one slot per thread to receive the partial result of each thread */
int* partial_sum;
/* prototype of thread function, just to be independent of the place mul will be placed in the source file... */
void *mul(void *arg);
int main (int argc, char** argv)
{
pthread_t* tid;
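if (argc != 2 || (p = atoi(argv[1])) <= 0) /* without this, a missing argument or p == 0 crashes below */
{
fprintf(stderr, "Usage: %s <number of threads>\n", argv[0]);
exit(EXIT_FAILURE);
}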
const size_t n_by_p = N/p;
if(n_by_p * p != N)
{
fprintf(stderr, "Number of threads must be an integral factor of N\n");
exit(EXIT_FAILURE) ;
}
tid = calloc(p, sizeof(pthread_t));
partial_sum = calloc(p, sizeof(int)) ;
printf("Give Table A\n");
for(size_t i = 0; i < N; ++i)
{
scanf("%d",&A[i]);
}
printf("Give Table B\n");
for(size_t i = 0; i < N; ++i)
{
scanf("%d",&B[i]);
}
for (size_t i =0; i < p; ++i)
{
/* clumsy way to pass a thread its slot number, but works as a starter... */
int *a;
a = malloc(sizeof(int));
*a = i;
pthread_create(&tid[i], 0, mul, a);
}
for (size_t i = 0; i < p; ++i)
{
pthread_join(tid[i], 0);
}
free(tid);
tid = 0;
int total_sum = 0;
for (size_t i = 0; i < p; ++i)
{
total_sum += partial_sum[i] ;
}
free(partial_sum);
partial_sum = 0;
printf("%d",total_sum);
return EXIT_SUCCESS;
}
Your thread function mul should now write only to its own partial_sum slot:
void *mul(void *arg)
{
int slot_num = *(int*)arg;
free(arg);
arg = 0;
const size_t lines = N/p;
const size_t start = slot_num * lines;
const size_t end = start + lines;
partial_sum[slot_num] = 0;
for(size_t i = start; i < end; ++i)
{
partial_sum[slot_num] += A[i]*B[i];
}
return 0;
}
Beware: this code works correctly only if N is an integral multiple of p.
If this condition is not met, some elements of the vectors will not be processed, due to truncation in N/p.
However, fixing these cases is not the core of this question IMHO.
I left out all kinds of error checking, which you should add if this code is to become part of some operational setup...
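A comparison needs ==, not =:
if (tid=NULL)
-->
if (tid==NULL)
and regarding
for (i=start;i<end;i++)
if you prefer a loop that counts from zero, as in
for (i=0;i<end-start;i++)
then the elements must be indexed as A[start+i] and B[start+i]; otherwise every thread sums the same first elements.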
I'm using pthreads in C in order to perform two operations on an int array: one operation doubles the value of a cell, the other operation halves the value of the cell. If after doubling a cell its value will become greater than the max allowed value the thread needs to wait until another thread will halve the value of that cell. The way I initialized the array is that the first 5 cells have value that is very close to max allowed and the other five have a value far from the max.
I decided to use a global mutex and condition variable for this. In main I first spawn 10 doubler threads and then another 10 halver threads. But then my program freezes. I can't understand what the problem is; any help is appreciated.
My motivation is to better understand pthreads and condition variables.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#include <ntsid.h>
#include <pthread.h>
#include <unistd.h>
#define MAX 20
#define THREADS_NUM 10
#define OFFSET 10
typedef struct myStruct {
int cellId;
} myStruct;
int * cells;
pthread_mutex_t globalMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t globalCond = PTHREAD_COND_INITIALIZER;
pthread_t threads[THREADS_NUM * 2];
void * DoublerThread(void * arg) {
myStruct * myStr = (myStruct *) arg;
int id = myStr->cellId;
pthread_mutex_t mutex = globalMutex;
pthread_cond_t condition = globalCond;
pthread_mutex_lock(&mutex);
while((cells[id] * 2) > MAX) {
printf("Waiting... id = %d\n", id);
pthread_cond_wait(&condition, &mutex);
}
cells[id] *= 2;
printf("new val = %d, id = %d\n", cells[id], id);
pthread_mutex_unlock(&mutex);
pthread_exit(NULL);
}
void * HalverThread(void * arg) {
myStruct * myStr = (myStruct *) arg;
int id = myStr->cellId;
pthread_mutex_t mutex = globalMutex;
pthread_cond_t condition = globalCond;
sleep(1);
pthread_mutex_lock(&mutex);
cells[id] /= 2;
pthread_cond_broadcast(&condition);
pthread_mutex_unlock(&mutex);
pthread_exit(NULL);
}
void initMyStructs(myStruct ** myStructs) {
int i;
for(i = 0; i < THREADS_NUM * 2; i++) {
myStructs[i] = (myStruct *) malloc(sizeof(myStruct) * 2);
if(!myStructs[i]) {
printf("malloc error\n");
exit(EXIT_FAILURE);
}
myStructs[i]->cellId = i % THREADS_NUM;
}
}
void initCells() {
int i, tmp;
cells =(int *) malloc(sizeof(int));
if(!cells) {
printf("malloc error\n");
exit(EXIT_FAILURE);
}
for(i = 0; i <= THREADS_NUM; i++) {
if(i < THREADS_NUM / 2) {
cells[i] = MAX - 1;
} else {
tmp = cells[i] = 1;
}
}
}
int main() {
int i;
myStruct ** myStructs;
initMyStructs(myStructs);
initCells();
//create 10 Doubler threads
for(i = 0; i < THREADS_NUM; i++) {
pthread_create(&threads[i], NULL, DoublerThread, (void *) myStructs[i]);
}
//create 10 Halver threads
for(i = 0; i < THREADS_NUM; i++) {
pthread_create(&threads[i + OFFSET], NULL, HalverThread, (void *) myStructs[i + OFFSET]);
}
for(i = 0; i < THREADS_NUM + OFFSET; i++) {
pthread_join(threads[i], NULL);
}
return 0;
}
You have made “private” copies of the mutex and the condition variable in each thread, so the threads are not synchronizing in any (meaningful) way. Rather than this:
pthread_mutex_t mutex = globalMutex;
pthread_cond_t condition = globalCond;
Just use globalMutex and globalCond directly -- that is what you actually want.
[
I moved this in here, because I think we are supposed to. I can't intuit SO-iquette.
]
"By the way, just to make sure I understand this, the mutex is per cell, so that multiple threads can work on multiple cells simultaneously, right? Just not two threads on the same cell."
So, what you probably want is something more like:
typedef struct myStruct {
int cellId;
pthread_mutex_t lock;
pthread_cond_t wait;
} myStruct;
and in initMyStructs():
myStructs[i]->cellId = i % THREADS_NUM;
pthread_mutex_init(&myStructs[i]->lock, NULL);
pthread_cond_init(&myStructs[i]->wait, NULL);
and in HalverThread:
pthread_mutex_lock(&myStr->lock);
cells[id] /= 2;
pthread_cond_broadcast(&myStr->wait);
pthread_mutex_unlock(&myStr->lock);
and in DoublerThread:
...
pthread_mutex_lock(&myStr->lock);
while((cells[id] * 2) > MAX) {
printf("Waiting... id = %d\n", id);
pthread_cond_wait(&myStr->wait, &myStr->lock);
}
cells[id] *= 2;
printf("new val = %d, id = %d\n", cells[id], id);
pthread_mutex_unlock(&myStr->lock);
"So currently, only one thread can make changes to the array at a time? But then the program exits after about a second. If threads couldn't be making changes to the array simultaneously, then wouldn't the program take 10 seconds to finish, because each HalverThread sleeps for 1 second?" - Yos
The Halvers sleep before grabbing the mutex, so they all sleep nearly simultaneously, wake up, compete for the mutex and continue.
Hello,
I have created a multithreaded application for multiplying two matrices using pthreads, but to my surprise the multithreaded program is taking much more time than I expected.
I don't know where the problem in my code is; the code snippet is given below:
#include "pthreads.h"
#include "cv.h"
#include "cxcore.h"
CvMat * matA; /* first matrix */
CvMat * matB; /* second matrix */
CvMat * matRes; /* result matrix */
int size_x_a; /* this variable will be used for the first dimension */
int size_y_a; /* this variable will be used for the second dimension */
int size_x_b,size_y_b;
int size_x_res;
int size_y_res;
struct v {
int i; /* row */
int j; /* column */
};
void *printThreadID(void *threadid)
{
/*long id = (long) threadid;
//printf("Thread ID: %ld\n", id);
arrZ[id] = arrX[id] + arrY[id];
pthread_exit(NULL);*/
return 0;
}
int main()
{
/* assigining the values of sizes */
size_x_a = 200;
size_y_a = 200;
size_x_b = 200;
size_y_b = 200;
/* resultant matrix dimensions */
size_x_res = size_x_a;
size_y_res = size_y_b;
matA = cvCreateMat(size_x_a,size_y_a,CV_64FC1);
matB = cvCreateMat(size_x_b,size_y_b,CV_64FC1);
matRes = cvCreateMat(size_x_res,size_y_res,CV_64FC1);
pthread_t thread1;
pthread_t thread2;
pthread_t multThread[200][200];
int res1;
int res2;
int mulRes;
/*******************************************************************************/
/*Creating a thread*/
res1 = pthread_create(&thread1,NULL,initializeA,(void*)matA);
if(res1!=0)
{
perror("thread creation of thread1 failed");
exit(EXIT_FAILURE);
}
/*Creating a thread*/
res2 = pthread_create(&thread2,NULL,initializeB,(void*)matB);
if(res2!=0)
{
perror("thread creation of thread2 failed");
exit(EXIT_FAILURE);
}
pthread_join(thread1,NULL);
pthread_join(thread2,NULL);
/*Multiplication of matrices*/
for(int i=0;i<size_x_a;i++)
{
for(int j=0;j<size_y_b;j++)
{
struct v * data = (struct v*)malloc(sizeof(struct v));
data->i = i;
data->j = j;
mulRes = pthread_create(&multThread[i][j],NULL,multiplication, (void*)data);
}
}
for(int i=0;i<size_x_a;i++)
{
for(int j=0;j<size_y_b;j++)
{
pthread_join(multThread[i][j],NULL);
}
}
for(int i =0;i<size_x_a;i++)
{
for(int j = 0;j<size_y_a;j++)
{
printf("%f ",cvmGet(matA,i,j));
}
}
return 0;
}
void * multiplication(void * param)
{
struct v * data = (struct v *)param;
double sum =0;
for(int k=0;k<size_x_a;k++)
sum += cvmGet(matA,data->i,k) * cvmGet(matB,k,data->j);
cvmSet(matRes,data->i,data->j,sum);
pthread_exit(0);
return 0;
}
void * initializeA(void * arg)
{
CvMat * matA = (CvMat*)arg;
//matA = (CvMat*)malloc(size_x_a * sizeof(CvMat *));
/*initialiazing random values*/
for (int i = 0; i < size_x_a; i++)
{
for (int j = 0; j < size_y_a; j++)
{
cvmSet(matA,i,j,size_y_a + j); /* just some unique number for each element */
}
}
return 0;
}
void * initializeB(void * arg)
{
CvMat* matB = (CvMat*)arg;
//matB = (CvMat*)malloc(size_x_b * sizeof(CvMat *));
/*initialiazing random values*/
for (int i = 0; i < size_x_b; i++)
{
for (int j = 0; j < size_y_b; j++)
{
cvmSet(matB,i,j,size_y_b + j); /* just some unique number for each element */
}
}
return 0;
}
void * initializeRes(void * arg)
{
CvMat * res = (CvMat*)arg;
//res = (CvMat*)malloc(size_x_res * sizeof(CvMat *));
/* for matrix matRes, allocate storage for an array of ints */
for (int i = 0; i < size_x_res; i++)
{
for (int j = 0; j < size_y_res; j++)
{
cvmSet(matRes,i,j,0);
}
}
return 0;
}
I am doing this multithreading for the first time.
Kindly help me with this; any suggestion or correction will be very helpful.
Thanks in advance.
You're creating a lot of threads (one per element of the result, so 40,000 here), which will involve lots of context switches. If each thread is doing pure calculations and won't involve any sort of waiting (like networking, sockets, etc.), there is no reason why threading will be faster than not threading. Unless, of course, you are on a multi-CPU/core machine; then you should create one thread per core. With this sort of processing, more threads than cores will just slow it down.
What you could do is divide the work set into tasks that can be enqueued, and have worker threads (one per CPU core) that pull the tasks off a common work queue. This is a standard producer/consumer problem.
Here is some generic info about the producer/consumer problem.
It's been a long time since I've done matrix multiplication, so bear with me :) It appears that you could divide the following into separate tasks:
/*Multiplication of matrices*/
for(int i=0;i<size_x_a;i++)
{
for(int j=0;j<size_y_b;j++)
{
struct v * data = (struct v*)malloc(sizeof(struct v));
data->i = i;
data->j = j;
/* Instead of creating a thread, create a task and put it on the queue
* mulRes = pthread_create(&multThread[i][j],NULL,multiplication, (void*)data);
*/
/* Im not going to implement the queue here, since there are several available
* But remember that the queue access MUST be mutex protected. */
enqueue_task(data);
}
}
Before that, you will have to have created what is called the thread pool (the worker threads, one per CPU core), whose worker function tries to pull tasks off the queue and execute them. There are ways to do this with pthread condition variables, whereby the threads block on the condition variable while the queue is empty; once the queue is populated, the condition variable is signalled, releasing the threads so they can start working.
If this is not a logical division of work, and you can't find one, then perhaps this problem is not suitable for multi-threading.