Matrix multiplication using pthreads - c

I am trying to do matrix multiplication using pthreads, creating one thread per row of the result instead of one per element. Suppose there are two matrices
A[M][K] and B[K][N]. Where am I going wrong?
int A[M][K];
int B[K][N];
int C[][];
void *runner (void *param);
struct v
{
int i;
int j;
};
pthread_t tid[M];
for (i = 0; i < M; i++) // It should create M threads
{
struct v *data = (struct v *) malloc (sizeof (struct v));
data->i = i;
data->j = j;
pthread_create (&tid[count], &attr, runner, data);
pthread_join (tid[count], NULL);
count++;
}
runner (void *param) //
{
struct v *test;
int t = 0;
test = (struct v *) param;
for (t = 0; t < K; t++) // I want to compute it for a row instead of an element
{
C[test->i][test->j] = C[test->i][test->j] + A[test->i][t] * B[t][test->j];
}
pthread_exit (0);
}

First, get rid of data->j. If you are computing entire rows, the row index is the only thing your thread needs. Right now your runner(..) computes a single element; it has to iterate over all N elements of its row, computing them one by one.
Second, do not join a thread right after it is created. That way you have only one thread running at a time. Start joining threads only after all of them have been created.
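A minimal sketch of both points, assuming small fixed dimensions and sample values that are not in the question (M, K, N, the matrix contents, and the names row_runner and multiply_by_rows are all made up for illustration). Each thread gets only its row index, fills the whole row, and the joins happen in a second loop:

```c
#include <pthread.h>

/* Dimensions and contents chosen for illustration only. */
#define M 3
#define K 2
#define N 3

static int A[M][K] = {{1, 4}, {2, 5}, {3, 6}};
static int B[K][N] = {{8, 7, 6}, {5, 4, 3}};
static int C[M][N];

/* Thread function: computes every element of one row of C. */
static void *row_runner(void *param)
{
    int i = *(int *)param;               /* the row index is all the thread needs */
    for (int j = 0; j < N; j++) {        /* iterate over all columns of row i */
        int sum = 0;
        for (int t = 0; t < K; t++)
            sum += A[i][t] * B[t][j];
        C[i][j] = sum;
    }
    return NULL;
}

/* Creates one thread per row, then joins them all. Returns 0 on success. */
int multiply_by_rows(void)
{
    pthread_t tid[M];
    int rows[M];                         /* one argument slot per thread */
    int created = 0, rc = 0;

    for (; created < M; created++) {
        rows[created] = created;
        if (pthread_create(&tid[created], NULL, row_runner, &rows[created]) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < created; i++)    /* join only after the creation loop */
        pthread_join(tid[i], NULL);
    return rc;
}
```

Calling multiply_by_rows() from main fills C; because every thread writes a different row, no mutex is needed in this sketch.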

Related

Attempting to understand multithreading which involves struct. Getting output "Segmentation fault (core dumped)"

I created this program to understand multithreading; I have tested it with a single thread and it works. Basically you enter 3 numbers. The first is the initial number, the second is how many iterations to run, and the last is the number of threads required. The program stores the first 2 numbers in a struct that has: start, iteration and result. The algorithm multiplies the first number by 2, repeatedly, for the number of times given by the second number.
Example: 1 3 2.
I've written the program without threads and it works, but once I introduce pthread I get a "Segmentation fault (core dumped)" error. I've spent hours trying to identify what is causing it, but no luck.
//The program will do: 1 * 2 = 2, 2 * 2 = 4, 4 * 2 = 8
//The results will be stored in the struct's result member, which is a pointer.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
struct Params
{
int start;
int iteration;
int *result;
};
void *double_number(void *vFirststruct)
{
struct Params *Firststruct = (struct Params *)vFirststruct;
int iter = 0;
Firststruct->result = (int *)malloc(sizeof(int) * Firststruct->iteration);
for (iter = 0; iter < Firststruct->iteration; iter++)
{
// printf("%d\n", Firststruct->start);
Firststruct->start = Firststruct->start * 2;
Firststruct->result[iter] = Firststruct->start;
}
}
void double_number_Single_Thread(struct Params *Firststruct)
{
int iter = 0;
Firststruct->result = (int *)malloc(sizeof(int) * Firststruct->iteration);
for (iter = 0; iter < Firststruct->iteration; iter++)
{
printf("%d\n", Firststruct->start);
Firststruct->start = Firststruct->start * 2;
Firststruct->result[iter] = Firststruct->start;
}
}
int main(int argc, char *argv[])
{
struct Params *Firststruct = (struct Params *)malloc(sizeof(struct Params));
Firststruct->start = atoi(argv[1]);
Firststruct->iteration = atoi(argv[2]);
int threads = atoi(argv[3]);
//For Single Thread
// double_number_Single_Thread(Firststruct); // <-- testing on single thread
// for (int i = 0; i < Firststruct->iteration; i++)
// {
// printf("%d %d\n", i, Firststruct->result[i]);
// }
//End for Single Thread
//Start of Single thread using pthread-Thread
pthread_t *t = (pthread_t *)malloc(threads * sizeof(pthread_t));
pthread_create(&t[0], NULL, &double_number, (void *)&Firststruct);
pthread_join(t[0], NULL);
//End for Single Thread
//Start of Multi thread
// for (int i = 0; i < threads; i++)
// {
// pthread_create(&t[i], NULL, &double_number, (void *)&Firststruct);
// }
// for (int i = 0; i < threads; i++)
// {
// pthread_join(t[i], NULL);
// }
free(Firststruct);
return 0;
}
The main problem you have (ignoring the fact that different threads will modify the same data) is your pthread_create call.
pthread_create(&t[0], NULL, &double_number, (void *) & Firststruct);
Should be
pthread_create(&t[0], NULL, &double_number, (void *) Firststruct);
Firststruct is already a pointer to struct Params, so the extra & passes the address of the pointer variable itself (a struct Params **). The thread then treats that pointer variable as if it were the struct, which causes the mess.
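As a hedged illustration of the corrected call, here is a small sketch built around the question's struct. The helper run_doubling is hypothetical (not part of the question), and the thread function here also returns a value, which a function declared void * is expected to do:

```c
#include <pthread.h>
#include <stdlib.h>

struct Params {
    int start;
    int iteration;
    int *result;
};

static void *double_number(void *arg)
{
    struct Params *p = arg;              /* arg already is a struct Params * */
    p->result = malloc(sizeof(int) * (size_t)p->iteration);
    if (p->result == NULL)
        return NULL;
    int value = p->start;
    for (int i = 0; i < p->iteration; i++) {
        value *= 2;
        p->result[i] = value;
    }
    return NULL;
}

/* Hypothetical helper: runs one worker thread and returns the last
 * doubled value, or -1 if something failed. */
int run_doubling(int start, int iteration)
{
    struct Params *params = malloc(sizeof *params);
    if (params == NULL)
        return -1;
    params->start = start;
    params->iteration = iteration;
    params->result = NULL;

    pthread_t t;
    int last = -1;
    /* Pass params itself, not &params. */
    if (pthread_create(&t, NULL, double_number, params) == 0) {
        pthread_join(t, NULL);
        if (params->result != NULL && iteration > 0)
            last = params->result[iteration - 1];
    }
    free(params->result);
    free(params);
    return last;
}
```

With the question's example input (start 1, 3 iterations), run_doubling(1, 3) walks through 2, 4, 8.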

pthread same ID and output self_t

I hope I can put my question clearly. I am programming with pthreads. Briefly, I calculate the number of threads needed, and pass the created threads to a function and back; the function transposes different blocks, so each thread has its own block.
To check that I'm sending different threads, I call pthread_self(), but I face two problems:
only one and the same thread seems to be used, and I always get a warning about the output type of self_t. The code below is simplified to show the main points.
Any ideas where I went wrong?
First, here are the struct and main:
pthread_mutex_t mutexZ; // Mutex initialize
int array[nn][nn];
struct v
{
int i, j; // threaded Row,Col
int n, y; //
int iMAX; //
};
void *transposeM(void *arg);
int main(int argc, char *argv[])
{
int Thread_Num = 10;
pthread_t t_ID[Thread_Num]; // the number of threads depending on # blocks
printf("Thread_Num %d\n", Thread_Num);
struct v *data = (struct v *) malloc(sizeof(struct v));
int i, j; //loop varables
//#############################################################
printf("Matrix Initial before Transpose Done\n");
// printing the Matrix Before any transpose if needed testing
for (i = 0; i < nn; i++){
for(j = 0; j< nn; j++){
array[i][j] = i*nn + j;
printf("%d ", array[i][j]);
}
printf("\n");}
//************************************************************/
// Initialize the mutex
pthread_mutex_init(&mutexZ, NULL);
pthread_attr_t attr; //Set of thread attributes
pthread_attr_init(&attr);
int n, y; // Loop Variables for tiling
//************************************************************/
//Start of loop transpose:
int start = 0;
for (n = 0; n < nn; n += TILE)
{
data->n = n; // row
for (y = 0; y <= n; y += TILE) {
data->y = y; // column
printf("y Tile:%d \n", y);
printf("Start before:%d \n", start);
//Transpose the other blocks, thread created for each Block transposed
pthread_create(&(t_ID[start]), NULL, transposeM, (void*) data); // Send the thread to the function
pthread_join(t_ID[start], NULL);
if (start < Thread_Num)
{
start = start + 1;
}
printf("Start after:%d \n", start);
} // End the Y column TileJump loop
} // End of n Row TileJump loop
}
Modified according to the notes,
void *transposeM(void *arg)
{
// Transposing the tiles
struct v *data = arg;
int i, j; //loop row and column
int temp = 0;
pthread_mutex_lock(&mutexZ); //lock the running thread here,so keeps block until thread that holds mutex releases it
pthread_t self_t; // To check the thread id - my check not Mandetory to use
self_t = pthread_self();
printf("Thread number Main = %u \n ", self_t); //here we used u% coz seems the pthread_t is unsigned long data type
//*******************************************************
//here some function to work
//########################################################
pthread_mutex_unlock(&mutexZ);
pthread_exit(NULL);
return (NULL);
} // End
There are two conceptual issues with your code:
1. You pass the same reference/address to each thread, making every thread work on the same data.
2. You join each thread immediately after creating it. As joining blocks until the joined thread has ended, this serialises the running of all threads.
To get around 1., create a unique instance of what data points to for each thread.
To fix 2., move the call to pthread_join() out of the loop creating the threads and put it in a second loop that runs after the creation loop.
...
printf("Thread_Num %d\n", Thread_Num);
pthread_t t_ID[Thread_Num]; // the number of threads depending on # blocks
struct v data_ID[Thread_Num]; // one instance of data for each thread (a variable-length array cannot take an initializer)
...
for (n = 0; n < nn; n += TILE) //limit of row
{
for (y = 0; y <= n; y += TILE) // limit of column; with y < n the diagonal tile would not be transposed
{
struct v * data = data_ID + start; // assign thread specific instance
data->n = n; // row
...
pthread_create(&(t_ID[start]), NULL, transposeM, (void*) data); // Send the thread to the function
...
} // End the Y column TileJump loop
} // End of n Row TileJump loop
while (start > 0) // join every thread that was created
{
pthread_join(t_ID[--start], NULL);
}
...
Modifications to the thread function:
void *transposeM(void *arg)
{
struct v *data = arg;
...
pthread_t self = pthread_self(); // better naming
...
pthread_exit(NULL); // the thread function exits here.
return NULL; // this is never reached, but is necessary to calm down the compiler.
} // End
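Putting the two fixes together, a self-contained sketch might look like the following. The sizes NN and TILE, the struct tile_arg, and the function names are assumptions for illustration; the actual transpose body is elided in the question, so the swap logic here is only one plausible version. Each thread owns one tile on or below the diagonal and swaps it with its mirror image, so the threads touch disjoint elements and this sketch needs no mutex:

```c
#include <pthread.h>

#define NN 4                      /* matrix size, assumed; must be a multiple of TILE */
#define TILE 2                    /* tile size, assumed */
#define SIDE (NN / TILE)
#define MAX_TILES (SIDE * (SIDE + 1) / 2)   /* tiles on or below the diagonal */

struct tile_arg {
    int n, y;                     /* row and column origin of the tile */
};

static int array[NN][NN];

/* Swaps tile (n, y) with its mirror tile (y, n). */
static void *transpose_tile(void *arg)
{
    const struct tile_arg *t = arg;
    for (int i = t->n; i < t->n + TILE; i++) {
        /* On a diagonal tile stop at the diagonal so no pair is swapped twice. */
        int jend = (t->n == t->y) ? i : t->y + TILE;
        for (int j = t->y; j < jend; j++) {
            int tmp = array[i][j];
            array[i][j] = array[j][i];
            array[j][i] = tmp;
        }
    }
    return NULL;
}

/* Returns the number of threads used, or -1 if a thread could not be created. */
int transpose_tiled(void)
{
    pthread_t tid[MAX_TILES];
    struct tile_arg args[MAX_TILES];   /* a unique argument instance per thread */
    int count = 0, ok = 1;

    for (int n = 0; ok && n < NN; n += TILE)
        for (int y = 0; ok && y <= n; y += TILE) {
            args[count].n = n;
            args[count].y = y;
            if (pthread_create(&tid[count], NULL, transpose_tile, &args[count]) == 0)
                count++;
            else
                ok = 0;
        }
    for (int i = 0; i < count; i++)    /* second loop: join after all were created */
        pthread_join(tid[i], NULL);
    return ok ? count : -1;
}
```

Filling array[i][j] with i*NN + j before the call, as the question does, makes the result easy to check: afterwards array[i][j] should hold j*NN + i.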

Calculate series with multithreading in C doesn't work as expected

I am trying to write a program in C that calculates the series:
for(i=0; i <= n; i++){
(2*i+1)/factorial(2*i);
}
n is the number of elements, determined by the user as an argument.
The user also determines the number of threads that are going to calculate the series.
I divide the series into subseries, each covering only part of the series, and each subseries should be calculated by a single thread. The problem is that my threads probably share memory, because some series members are calculated many times and others are not calculated at all. Do you know why? Please help!
Here is the problematic part of the code:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <gmp.h>
#include <math.h>
#include <pthread.h>
/* a struct to pass function arguments to the thread */
struct intervals_struct {
int **intervals;
mpf_t *result;
int thread_index;
};
/* calculate the sum of the elements of the subseries;
doesn't work properly for more than one thread */
void* sum_subinterval(void *args) {
/* Initialize the local variables here */
struct intervals_struct *p = (struct intervals_struct*)args;
for(i=(*p).intervals[(*p).thread_index][0]; i<=(*p).intervals[(*p).thread_index][1]; i++){
/* Do something with the local variables here */
}
mpf_set((*p).result[(*p).thread_index],sum);
/* Free resources used by the local variables here */
}
/* calculate the sum of all subseries */
void main(int argc, char * argv[]){
int p, t, i;
p = atoi(argv[1]);
assert( p >= 0);
t = atoi(argv[2]);
assert( t >= 0);
int **intervals_arr;
intervals_arr = (int**)malloc(t * sizeof(int *));
for(i = 0; i < t; i++) {
intervals_arr[i] = (int *)malloc(2 * sizeof(int));
}
/* Calculate intervals and store them in intervals_arr here */
mpf_t *subinterval_sum;
subinterval_sum = (mpf_t*)malloc(t * sizeof(mpf_t));
for(i=0; i < t; i++) {
mpf_init(subinterval_sum[i]);
}
pthread_t *tid;
tid = (pthread_t *)malloc(t * sizeof(pthread_t));
for(i = 0; i < t; i++) {
struct intervals_struct args = {intervals_arr, subinterval_sum, i};
pthread_create(&(tid[i]), NULL, sum_subinterval, (void*)&args);
}
for(i = 0; i < t; i++) {
pthread_join(tid[i], NULL);
}
/* Sum the elements of the result array and free resources used in main here */
}
The problem is probably here:
for(i = 0; i < t; i++) {
struct intervals_struct args = {intervals_arr, subinterval_sum, i};
pthread_create(&(tid[i]), NULL, sum_subinterval, (void*)&args);
}
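You are passing the address of args to your new thread, but the lifetime of that variable ends with the loop iteration, right after the pthread_create call and possibly before the new thread has even started running. The compiler can and will reuse the stack space occupied by args between different loop iterations, so several threads may read the same (or an already overwritten) thread_index.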
Try allocating an array on the heap with malloc instead.
Edit: What I meant by that last sentence is something like this:
struct intervals_struct * args = (struct intervals_struct *) calloc(t, sizeof(struct intervals_struct));
for(i = 0; i < t; i++) {
args[i].intervals = intervals_arr;
args[i].result = subinterval_sum;
args[i].thread_index = i;
pthread_create(&(tid[i]), NULL, sum_subinterval, (void*)&args[i]);
}
// at the end of main(), or at least after every thread has been joined
free(args);
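For a version that can be compiled and checked without GMP, here is a sketch of the same idea using plain double arithmetic. The swap from mpf_t to double, the struct interval_arg, and the function series_sum are assumptions made for illustration, not the asker's code. Every thread gets its own element of a heap-allocated argument array that stays alive until all threads are joined:

```c
#include <pthread.h>
#include <stdlib.h>

struct interval_arg {
    int first, last;     /* inclusive index range handled by this thread */
    double sum;          /* partial result, written only by the owning thread */
};

static void *sum_subinterval(void *arg)
{
    struct interval_arg *a = arg;
    double sum = 0.0;
    for (int i = a->first; i <= a->last; i++) {
        double fact = 1.0;                 /* (2i)! computed directly */
        for (int k = 2; k <= 2 * i; k++)
            fact *= k;
        sum += (2.0 * i + 1.0) / fact;
    }
    a->sum = sum;
    return NULL;
}

/* Sums terms 0..n of (2i+1)/(2i)! using t threads; returns -1.0 on failure. */
double series_sum(int n, int t)
{
    if (n < 0 || t <= 0)
        return -1.0;
    pthread_t *tid = malloc((size_t)t * sizeof *tid);
    struct interval_arg *args = calloc((size_t)t, sizeof *args);
    if (tid == NULL || args == NULL) {
        free(tid);
        free(args);
        return -1.0;
    }

    int total = n + 1, chunk = total / t, extra = total % t;
    int next = 0, created = 0;
    for (; created < t; created++) {
        int len = chunk + (created < extra ? 1 : 0);
        args[created].first = next;
        args[created].last = next + len - 1;   /* empty range when len == 0 */
        next += len;
        if (pthread_create(&tid[created], NULL, sum_subinterval, &args[created]) != 0)
            break;
    }

    double result = 0.0;
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        result += args[i].sum;
    }
    int ok = (created == t);
    free(args);                                /* only after every thread is joined */
    free(tid);
    return ok ? result : -1.0;
}
```

As a sanity check, the series converges to e (it splits into the series for sinh(1) and cosh(1)), so series_sum(20, 4) should land very close to 2.71828 regardless of the thread count.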

Segmentation Fault at pthread_join

When I run my code, I get a segmentation fault right at the pthread_join. There is a print statement after my pthread_join that doesn't run. Does anyone have any idea why? Could you give me some hints or ideas as to how to figure this out?
The output prints all of the row numbers for my matrix until the end, then it leaves the matrixCalc function and prints "after threads are created". This happens when I pass 1 as the number of threads.
I've included a small section of my code here:
int main(int argc, char*argv[])
{
//takes in number of threads as 1st arg
pthread_attr_init(&attr);
//initialize matrix here
//passes num of threads through matrixcalc
for(i = 0; i < numberOfThreads; i++)
{
threadCount++;
pthread_create(&tid, &attr, matrixCalc(threadCount), NULL);
}
printf("after threads are created\n");
pthread_join(tid, NULL);
printf("after join\n");
exit(0);
return 0;
}
Here is matrix calc function:
void *matrixCalc(threadCount)
{
int i, j, sum, tempNum, currentRow;
currentRow = threadCount;
sum=0;
while(currentRow < 1200)
{
//cycles through the column j for matrix B
for(j=0; j<500; j++)
{
//cycles through the diff i values for the set row in matrix A and column in matrix B
for(i=0; i<1000; i++)
{
//Matrix A set i value is at threadcount-1
//Matrix B i value = j
//Matrix B j value = i
//Multiply together and add to sum
tempNum = (matrixA[currentRow-1][i])*(matrixB[i][j]);
sum = sum+tempNum;
}
//Set Matrix C at i value = currentRow and jvalue = i to sum
matrixC[currentRow-1][j] = sum;
//printf("%d\n", matrixC[currentRow-1][i]);
}
//increase threadcount by number of threads
//until you hit max/past max val
currentRow = currentRow + nThreads;
//printf("%d\n", currentRow);
}
return NULL;
}
When calling pthread_create() you need to pass the address of a function of type void *(void *), that is, a pointer of type void *(*)(void *). What the code does instead is call the function right there, so its result gets passed to pthread_create().
Change this line
pthread_create(&tid, &attr, matrixCalc(threadCount), NULL);
to become
pthread_create(&tid, &attr, matrixCalc, NULL);
or
pthread_create(&tid, &attr, &matrixCalc, NULL);
which in fact is the same.
As already mentioned above, the thread function needs to have the type void *(void *).
So change this
void *matrixCalc(threadCount)
which will become this
void * matrixCalc(void * pv)
As the code seems to try to spawn off multiple threads and all of them should be joined, prepare room to store the several pthread IDs.
This could for example be done using an array like this:
pthread_t tid[numberOfThreads];
Then create the thread like this:
pthread_create(&tid[i], &attr, matrixCalc, NULL);
To pass the thread number (counter i) down to the thread, also give it room by defining
int thread_counts[numberOfThreads];
assign it and pass it as the 4th parameter on the thread's creation:
thread_counts[i] = i;
pthread_create(&tid[i], &attr, matrixCalc, &thread_counts[i]);
Down in the thread function then get it by modifying
void *matrixCalc(threadCount)
{
int i, j, sum, tempNum, currentRow;
currentRow = threadCount;
...
like this:
void * matrixCalc(void * pv)
{
int i, j, sum, tempNum, currentRow;
currentRow = *((int*) pv);
...
Finally, to join all threads, replace the single call to pthread_join() with a loop:
for (i = 0; i < numberOfThreads; ++i)
{
pthread_join(tid[i], NULL);
}
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void * (*start_routine) (void *), void *arg);
The third parameter is a start function taking a void pointer and returning a void pointer.
The fourth parameter is a void pointer to the data you want to pass, in this case the thread count.
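The pieces above can be assembled into one compilable sketch. The small dimensions, the zero-based row numbering, and the function name multiply_strided are assumptions for illustration (the question uses 1200 x 1000 x 500 and counts rows from 1). The sketch also resets sum for every element, which the posted loop does not appear to do:

```c
#include <pthread.h>

/* Small sizes assumed for illustration. */
#define ROWS 6
#define INNER 4
#define COLS 5
#define NUM_THREADS 3

static int matrixA[ROWS][INNER];
static int matrixB[INNER][COLS];
static int matrixC[ROWS][COLS];

/* Thread i handles rows i, i + NUM_THREADS, i + 2*NUM_THREADS, ... */
static void *matrixCalc(void *pv)
{
    int first = *(int *)pv;                 /* thread number passed as 4th argument */
    for (int row = first; row < ROWS; row += NUM_THREADS) {
        for (int j = 0; j < COLS; j++) {
            int sum = 0;                    /* reset for every element of C */
            for (int i = 0; i < INNER; i++)
                sum += matrixA[row][i] * matrixB[i][j];
            matrixC[row][j] = sum;
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if a thread could not be created. */
int multiply_strided(void)
{
    pthread_t tid[NUM_THREADS];
    int thread_counts[NUM_THREADS];         /* one argument slot per thread */
    int created = 0;

    for (; created < NUM_THREADS; created++) {
        thread_counts[created] = created;
        /* Pass the function itself, not the result of calling it. */
        if (pthread_create(&tid[created], NULL, matrixCalc, &thread_counts[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++)       /* join every thread that was started */
        pthread_join(tid[i], NULL);
    return created == NUM_THREADS ? 0 : -1;
}
```

Because thread_counts outlives the join loop, each thread can safely dereference its own slot at any time.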

Pthread program taking longer time than expected

Hello,
I have created a multithreaded application for multiplying two matrices using pthreads, but to my surprise the multithreaded program is taking much more time than I expected.
I don't know where the problem in my code is; the code snippet is given below:
#include "pthreads.h"
#include "cv.h"
#include "cxcore.h"
CvMat * matA; /* first matrix */
CvMat * matB; /* second matrix */
CvMat * matRes; /* result matrix */
int size_x_a; /* this variable will be used for the first dimension */
int size_y_a; /* this variable will be used for the second dimension */
int size_x_b,size_y_b;
int size_x_res;
int size_y_res;
struct v {
int i; /* row */
int j; /* column */
};
void *printThreadID(void *threadid)
{
/*long id = (long) threadid;
//printf("Thread ID: %ld\n", id);
arrZ[id] = arrX[id] + arrY[id];
pthread_exit(NULL);*/
return 0;
}
int main()
{
/* assigining the values of sizes */
size_x_a = 200;
size_y_a = 200;
size_x_b = 200;
size_y_b = 200;
/* resultant matrix dimensions */
size_x_res = size_x_a;
size_y_res = size_y_b;
matA = cvCreateMat(size_x_a,size_y_a,CV_64FC1);
matB = cvCreateMat(size_x_b,size_y_b,CV_64FC1);
matRes = cvCreateMat(size_x_res,size_y_res,CV_64FC1);
pthread_t thread1;
pthread_t thread2;
pthread_t multThread[200][200];
int res1;
int res2;
int mulRes;
/*******************************************************************************/
/*Creating a thread*/
res1 = pthread_create(&thread1,NULL,initializeA,(void*)matA);
if(res1!=0)
{
perror("thread creation of thread1 failed");
exit(EXIT_FAILURE);
}
/*Creating a thread*/
res2 = pthread_create(&thread2,NULL,initializeB,(void*)matB);
if(res2!=0)
{
perror("thread creation of thread2 failed");
exit(EXIT_FAILURE);
}
pthread_join(thread1,NULL);
pthread_join(thread2,NULL);
/*Multiplication of matrices*/
for(int i=0;i<size_x_a;i++)
{
for(int j=0;j<size_y_b;j++)
{
struct v * data = (struct v*)malloc(sizeof(struct v));
data->i = i;
data->j = j;
mulRes = pthread_create(&multThread[i][j],NULL,multiplication, (void*)data);
}
}
for(int i=0;i<size_x_a;i++)
{
for(int j=0;j<size_y_b;j++)
{
pthread_join(multThread[i][j],NULL);
}
}
for(int i =0;i<size_x_a;i++)
{
for(int j = 0;j<size_y_a;j++)
{
printf("%f ",cvmGet(matA,i,j));
}
}
return 0;
}
void * multiplication(void * param)
{
struct v * data = (struct v *)param;
double sum =0;
for(int k=0;k<size_x_a;k++)
sum += cvmGet(matA,data->i,k) * cvmGet(matB,k,data->j);
cvmSet(matRes,data->i,data->j,sum);
pthread_exit(0);
return 0;
}
void * initializeA(void * arg)
{
CvMat * matA = (CvMat*)arg;
//matA = (CvMat*)malloc(size_x_a * sizeof(CvMat *));
/*initialiazing random values*/
for (int i = 0; i < size_x_a; i++)
{
for (int j = 0; j < size_y_a; j++)
{
cvmSet(matA,i,j,size_y_a + j); /* just some unique number for each element */
}
}
return 0;
}
void * initializeB(void * arg)
{
CvMat* matB = (CvMat*)arg;
//matB = (CvMat*)malloc(size_x_b * sizeof(CvMat *));
/*initialiazing random values*/
for (int i = 0; i < size_x_b; i++)
{
for (int j = 0; j < size_y_b; j++)
{
cvmSet(matB,i,j,size_y_b + j); /* just some unique number for each element */
}
}
return 0;
}
void * initializeRes(void * arg)
{
CvMat * res = (CvMat*)arg;
//res = (CvMat*)malloc(size_x_res * sizeof(CvMat *));
/* for matrix matRes, allocate storage for an array of ints */
for (int i = 0; i < size_x_res; i++)
{
for (int j = 0; j < size_y_res; j++)
{
cvmSet(matRes,i,j,0);
}
}
return 0;
}
I am doing this multithreading for the first time.
Kindly help me with this; any suggestion or correction will be very helpful.
Thanks in advance.
You're creating a LOT of threads (200 x 200 = 40,000, one per result element), which involves lots of context switches and thread-creation overhead. If each thread is doing pure calculations and doesn't involve any sort of waiting (like networking, sockets, etc.), there is no reason why threading will be faster than not threading, unless of course you are on a multi-CPU/core machine, in which case you should create one thread per core. With this sort of processing, more threads than cores will just slow it down.
What you could do is divide the work-set into tasks that can be enqueued, and have worker threads (one per CPU core) that pull the tasks off a common work queue. This is a standard producer/consumer problem.
Here is some generic info about the producer/consumer problem.
It's been a long time since I've done matrix multiplication, so bear with me :) It appears that you could divide the following into separate tasks:
/*Multiplication of matrices*/
for(int i=0;i<size_x_a;i++)
{
for(int j=0;j<size_y_b;j++)
{
struct v * data = (struct v*)malloc(sizeof(struct v));
data->i = i;
data->j = j;
/* Instead of creating a thread, create a task and put it on the queue
* mulRes = pthread_create(&multThread[i][j],NULL,multiplication, (void*)data);
*/
/* I'm not going to implement the queue here, since there are several available.
 * But remember that the queue access MUST be mutex protected. */
enqueue_task(data);
}
}
Beforehand, you will have to have created what is called the thread pool (the worker threads, one per CPU core), whose worker function tries to pull tasks off the queue and execute them. There are ways to do this with pthread condition variables, whereby the threads block on the condition variable while the queue is empty, and once the queue is populated the condition variable is signalled, releasing the threads so they can start working.
If this is not a logical division of work, and you can't find one, then perhaps this problem is not suitable for multi-threading.
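A rough sketch of such a pool, under several assumptions that are not in the original post: plain double arrays stand in for the CvMat matrices, one task is one result row rather than one element, the queue is a fixed array sized to the task count, and WORKERS is a guessed core count. The names worker and multiply_with_pool are hypothetical; enqueue_task matches the placeholder used above but takes a row index here:

```c
#include <pthread.h>

#define SIZE 8          /* matrix dimension, assumed small for illustration */
#define WORKERS 4       /* assumed core count */

static double matA[SIZE][SIZE], matB[SIZE][SIZE], matRes[SIZE][SIZE];

/* Fixed-capacity queue of row indices: one task per result row. */
static int queue[SIZE];
static int head, tail, done;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qready = PTHREAD_COND_INITIALIZER;

static void enqueue_task(int row)
{
    pthread_mutex_lock(&qlock);
    queue[tail++] = row;            /* capacity equals the task count, so no overflow */
    pthread_cond_signal(&qready);   /* wake one waiting worker */
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (head == tail && !done)           /* block while the queue is empty */
            pthread_cond_wait(&qready, &qlock);
        if (head == tail) {                     /* drained and producer finished */
            pthread_mutex_unlock(&qlock);
            return NULL;
        }
        int row = queue[head++];
        pthread_mutex_unlock(&qlock);           /* compute outside the lock */

        for (int j = 0; j < SIZE; j++) {
            double sum = 0.0;
            for (int k = 0; k < SIZE; k++)
                sum += matA[row][k] * matB[k][j];
            matRes[row][j] = sum;
        }
    }
}

/* Returns 0 on success, -1 if no worker thread could be started. */
int multiply_with_pool(void)
{
    pthread_t pool[WORKERS];
    int started = 0;

    head = tail = done = 0;
    while (started < WORKERS &&
           pthread_create(&pool[started], NULL, worker, NULL) == 0)
        started++;

    if (started > 0)
        for (int i = 0; i < SIZE; i++)
            enqueue_task(i);

    pthread_mutex_lock(&qlock);
    done = 1;                                   /* no more tasks will arrive */
    pthread_cond_broadcast(&qready);
    pthread_mutex_unlock(&qlock);

    for (int i = 0; i < started; i++)
        pthread_join(pool[i], NULL);
    return started > 0 ? 0 : -1;
}
```

Making a task a whole row rather than a single element keeps the locking cost small relative to the arithmetic; whether that granularity pays off for a 200 x 200 matrix would need measuring on the target machine.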

Resources