Synchronization problem between threads using pthread_mutex_t - c

Basically i need to make three threads B,C,D to work simultaneously. Thread B sums up the even indexes in a global array X , C sums up the odd indexes in X, D sums up both results while B and C are still summing. I used two mutexes to do so but its not working properly.
In the array X given in the code below the results should be: evenSum = 47,oddSum = 127 ,bothSum = 174.
any help is greatly appreciated!
Thanks!!
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#define SIZE 20
int X[SIZE] = {5,4,5,3,7,9,3,3,1,2,9,0,3,43,3,56,7,3,4,4};
int evenSum = 0;
int oddSum = 0;
int bothSum = 0;
//Initialize two mutex semaphores
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
void* sum_even_indexes(void* args){
int i;
for(i=0 ; i<20 ; i+=2){
pthread_mutex_lock(&mutex1);
evenSum+=X[i];
pthread_mutex_unlock(&mutex1);
}
pthread_exit(NULL);
}
void* sum_odd_indexes(void* args){
int i;
for(i=1 ; i<20 ; i+=2){
pthread_mutex_lock(&mutex2);
oddSum+=X[i];
pthread_mutex_unlock(&mutex2);
}
pthread_exit(NULL);
}
void* sum_of_both(void* args){
int i;
for(i=0 ; i<SIZE ; i++){
pthread_mutex_lock(&mutex1);
pthread_mutex_lock(&mutex2);
bothSum += (evenSum+oddSum);
pthread_mutex_unlock(&mutex2);
pthread_mutex_unlock(&mutex1);
}
pthread_exit(NULL);
}
int main(int argc, char const *argv[]){
pthread_t B,C,D;
/***
* Create three threads:
* thread B : Sums up the even indexes in the array X
* thread C : Sums up the odd indexes in the array X
* thread D : Sums up both B and C results
*
* Note:
* All threads must work simultaneously
*/
pthread_create(&B,NULL,sum_even_indexes,NULL);
pthread_create(&C,NULL,sum_odd_indexes,NULL);
pthread_create(&D,NULL,sum_of_both,NULL);
//Wait for all threads to finish
pthread_join(B,NULL);
pthread_join(C,NULL);
pthread_join(D,NULL);
pthread_mutex_destroy(&mutex1);
pthread_mutex_destroy(&mutex2);
//Testing Print
printf("Odds Sum = %d\n",oddSum);
printf("Evens Sum = %d\n",evenSum);
printf("Both Sum = %d\n",bothSum);
return 0;
}

The mutexes will not enforce any ordering between D and B or C.
To make such ordering, you need to run D later (after B and C joined), or make D wait on a condition, which is set, when B an C are done.
mutex1 and mutex2 are actually not needed, if you implement proper waiting. You need no mutexes if you start D later, or you need only one, if you'll use waiting on a condition variable
Also it is not clear why you need a loop in D. You have just a sum of two variables.
For real application, parallel array processing is usually done by partitioning array by ranges, not by modulo. It does not make sense at all for small sizes like 20. You'd prefer threadpool threads to avoid thread startup overhead and control count of threads. And sure for just evenSum+oddSum you wouldn't need a thread.

Related

why is this c code causing a race condition?

I'm trying to count the number of prime numbers up to 10 million and I have to do it using multiple threads using Posix threads(so, that each thread computes a subset of 10 million). However, my code is not checking for the condition IsPrime. I'm thinking this is due to a race condition. If it is what can I do to ameliorate this issue?
I've tried using a global integer array with k elements but since k is not defined it won't let me declare that at the file scope.
I'm running my code using gcc -pthread:
/*
Program that spawns off "k" threads
k is read in at command line each thread will compute
a subset of the problem domain(check if the number is prime)
to compile: gcc -pthread lab5_part2.c -o lab5_part2
*/
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>
#include <stdlib.h>
typedef int bool;
#define FALSE 0
#define TRUE 1
#define N 10000000 // 10 Million
int k; // global variable k willl hold the number of threads
int primeCount = 0; //it will hold the number of primes.
//returns whether num is prime
bool isPrime(long num) {
long limit = sqrt(num);
for(long i=2; i<=limit; i++) {
if(num % i == 0) {
return FALSE;
}
}
return TRUE;
}
//function to use with threads
void* getPrime(void* input){
//get the thread id
long id = (long) input;
printf("The thread id is: %ld \n", id);
//how many iterations each thread will have to do
int numOfIterations = N/k;
//check the last thread. to make sure is a whole number.
if(id == k-1){
numOfIterations = N - (numOfIterations * id);
}
long startingPoint = (id * numOfIterations);
long endPoint = (id + 1) * numOfIterations;
for(long i = startingPoint; i < endPoint; i +=2){
if(isPrime(i)){
primeCount ++;
}
}
//terminate calling thread.
pthread_exit(NULL);
}
int main(int argc, char** args) {
//get the num of threads from command line
k = atoi(args[1]);
//make sure is working
printf("Number of threads is: %d\n",k );
struct timespec start,end;
//start clock
clock_gettime(CLOCK_REALTIME,&start);
//create an array of threads to run
pthread_t* threads = malloc(k * sizeof(pthread_t));
for(int i = 0; i < k; i++){
pthread_create(&threads[i],NULL,getPrime,(void*)(long)i);
}
//wait for each thread to finish
int retval;
for(int i=0; i < k; i++){
int * result = NULL;
retval = pthread_join(threads[i],(void**)(&result));
}
//get the time time_spent
clock_gettime(CLOCK_REALTIME,&end);
double time_spent = (end.tv_sec - start.tv_sec) +
(end.tv_nsec - start.tv_nsec)/1000000000.0f;
printf("Time tasken: %f seconds\n", time_spent);
printf("%d primes found.\n", primeCount);
}
the current output I am getting: (using the 2 threads)
Number of threads is: 2
Time tasken: 0.038641 seconds
2 primes found.
The counter primeCount is modified by multiple threads, and therefore must be atomic. To fix this using the standard library (which is now supported by POSIX as well), you should #include <stdatomic.h>, declare primeCount as an atomic_int, and increment it with an atomic_fetch_add() or atomic_fetch_add_explicit().
Better yet, if you don’t care about the result until the end, each thread can store its own count in a separate variable, and the main thread can add all the counts together once the threads finish. You will need to create, in the main thread, an atomic counter per thread (so that updates don’t clobber other data in the same cache line), pass each thread a pointer to its output parameter, and then return the partial tally to the main thread through that pointer.
This looks like an exercise that you want to solve yourself, so I won’t write the code for you, but the approach to use would be to declare an array of counters like the array of thread IDs, and pass &counters[i] as the arg parameter of pthread_create() similarly to how you pass &threads[i]. Each thread would need its own counter. At the end of the thread procedure, you would write something like, atomic_store_explicit( (atomic_int*)arg, localTally, memory_order_relaxed );. This should be completely wait-free on all modern architectures.
You might also decide that it’s not worth going to that trouble to avoid a single atomic update per thread, declare primeCount as an atomic_int, and then atomic_fetch_add_explicit( &primeCount, localTally, memory_order_relaxed ); once before the thread procedure terminates.

Array iteration in multiple threads

I've got the following example, let's say I want for each thread to count from 0 to 9.
void* iterate(void* arg) {
int i = 0;
while(i<10) {
i++;
}
pthread_exit(0);
}
int main() {
int j = 0;
pthread_t tid[100];
while(j<100) {
pthread_create(&tid[j],NULL,iterate,NULL);
pthread_join(tid[j],NULL);
}
}
variable i - is in a critical section, it will be overwritten multiple times and therefore threads will fail to count.
int* i=(int*)calloc(1,sizeof(int));
doesn't solve the problem either. I don't want to use mutex. What is the most common solution for this problem?
As other users are commenting, there are severals problems in your example:
Variable i is not shared (it should be a global variable, for instance), nor in a critical section (it is a local variable to each thread). To have a critical section you should use locks or transactional memory.
You don't need to create and destroy threads every iteration. Just create a number of threads at the beggining and wait for them to finish (join).
pthread_exit() is not necessary, just return from the thread function (with a value).
A counter is a bad example for threads. It requires atomic operations to avoid overwriting the value of other threads. Actually, a multithreaded counter is a typical example of why atomic accesses are necessary (see this tutorial, for example).
I recommend you to start with some tutorials, like this or this.
I also recommend frameworks like OpenMP, they simplify the semantics of multithreaded programs.
EDIT: example of a shared counter and 4 threads.
#include <stdio.h>
#include <pthread.h>
#define NUM_THREADS 4
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;
void* iterate(void* arg) {
int i = 0;
while(i++ < 10) {
// enter critical section
pthread_mutex_lock(&mutex);
++counter;
pthread_mutex_unlock(&mutex);
}
return NULL;
}
int main() {
int j;
pthread_t tid[NUM_THREADS];
for(j = 0; j < NUM_THREADS; ++j)
pthread_create(&tid[j],NULL,iterate,NULL);
// let the threads do their magic
for(j = 0; j < NUM_THREADS; ++j)
pthread_join(tid[j],NULL);
printf("%d", counter);
return 0;
}

How to create thread blocks?

I have two questions.
First:
I need to create thread blocks gradually not more then some max value, for example 20.
For example, first 20 thread go, job is finished, only then 20 second thread go, and so on in a loop.
Total number of jobs could be much larger then total number of threads (in our example 20), but total number of threads should not be bigger then our max value (in our example 20).
Second:
Could threads be added continuously? For example, 20 threads go, one thread job is finished, we see that total number of threads is 19 but our max value is 20, so we can create one more thread, and one more thread go :)
So we don't waste a time waiting another threads job to be done and our total threads number is not bigger then our some max value (20 in our example) - sounds cool.
Conclusion:
For total speed I consider the second variant would be much faster and better, and I would be very graceful if you help me with this, but also tell how to do the first variant.
Here is me code (it's not working properly and the result is strange - some_array elements become wrong after eleven step in a loop, something like this: Thread counter = 32748):
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <math.h>
#define num_threads 5 /* total max number of threads */
#define lines 17 /* total jobs to be done */
/* args for thread start function */
typedef struct {
int *words;
} args_struct;
/* thread start function */
void *thread_create(void *args) {
args_struct *actual_args = args;
printf("Thread counter = %d\n", *actual_args->words);
free(actual_args);
}
/* main function */
int main(int argc, char argv[]) {
float block;
int i = 0;
int j = 0;
int g;
int result_code;
int *ptr[num_threads];
int some_array[lines] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17};
pthread_t threads[num_threads];
/* counting how many block we need */
block = ceilf(lines / (double)num_threads);
printf("blocks= %f\n", block);
/* doing ech thread block continuously */
for (g = 1; g <= block; g++) {
//for (i; i < num_threads; ++i) { i < (num_threads * g),
printf("g = %d\n", g);
for (i; i < lines; ++i) {
printf("i= %d\n", i);
/* locate memory to args */
args_struct *args = malloc(sizeof *args);
args->words = &some_array[i];
if(pthread_create(&threads[i], NULL, thread_create, args)) {
free(args);
/* goto error_handler */
}
}
/* wait for each thread to complete */
for (j; j < lines; ++j) {
printf("j= %d\n", j);
result_code = pthread_join(threads[j], (void**)&(ptr[j]));
assert(0 == result_code);
}
}
return 0;
}

Why does this code work without a mutex?

I am trying to learn how locks work in multi-threading. When I execute the following code without lock, it worked fine even though the variable sum is declared as a global variable and multiple threads are updating it. Could anyone please explain why here threads are working perfectly fine on a shared variable without locks?
Here is the code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NTHREADS 100
#define ARRAYSIZE 1000000
#define ITERATIONS ARRAYSIZE / NTHREADS
double sum=0.0, a[ARRAYSIZE];
pthread_mutex_t sum_mutex;
void *do_work(void *tid)
{
int i, start, *mytid, end;
double mysum=0.0;
/* Initialize my part of the global array and keep local sum */
mytid = (int *) tid;
start = (*mytid * ITERATIONS);
end = start + ITERATIONS;
printf ("Thread %d doing iterations %d to %d\n",*mytid,start,end-1);
for (i=start; i < end ; i++) {
a[i] = i * 1.0;
mysum = mysum + a[i];
}
/* Lock the mutex and update the global sum, then exit */
//pthread_mutex_lock (&sum_mutex); //here I tried not to use locks
sum = sum + mysum;
//pthread_mutex_unlock (&sum_mutex);
pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
int i, start, tids[NTHREADS];
pthread_t threads[NTHREADS];
pthread_attr_t attr;
/* Pthreads setup: initialize mutex and explicitly create threads in a
joinable state (for portability). Pass each thread its loop offset */
pthread_mutex_init(&sum_mutex, NULL);
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
for (i=0; i<NTHREADS; i++) {
tids[i] = i;
pthread_create(&threads[i], &attr, do_work, (void *) &tids[i]);
}
/* Wait for all threads to complete then print global sum */
for (i=0; i<NTHREADS; i++) {
pthread_join(threads[i], NULL);
}
printf ("Done. Sum= %e \n", sum);
sum=0.0;
for (i=0;i<ARRAYSIZE;i++){
a[i] = i*1.0;
sum = sum + a[i]; }
printf("Check Sum= %e\n",sum);
/* Clean up and exit */
pthread_attr_destroy(&attr);
pthread_mutex_destroy(&sum_mutex);
pthread_exit (NULL);
}
With and without lock I got the same answer!
Done. Sum= 4.999995e+11
Check Sum= 4.999995e+11
UPDATE: Change suggested by user3386109
for (i=start; i < end ; i++) {
a[i] = i * 1.0;
//pthread_mutex_lock (&sum_mutex);
sum = sum + a[i];
//pthread_mutex_lock (&sum_mutex);
}
EFFECT :
Done. Sum= 3.878172e+11
Check Sum= 4.999995e+11
Mutexes are used to prevent race conditions which are undesirable situations when you have two or more threads accessing a shared resource. Race conditions such as the one in your code happen when the shared variable sum is being accessed by multiple threads. Sometimes the access to the shared variable will be interleaved in such a way that the result is incorrect and sometimes the result will be correct.
For example lets say you have two threads, thread A and thread B both adding 1 to a shared value, sum, which starts at 5. If thread A reads sum and then thread B reads sum and then thread A writes a new value followed by thread B writing a new value you will can an incorrect result, 6 as opposed to 7. However it is also possible than thread A reads and then writes a value (specifically 6) followed by thread B reading and writing a value (specifically 7) and then you get the correct result. The point being that some interleavings of operations result in the correct value and some interleavings result in an incorrect value. The job of the mutex is to force the interleaving to always be correct.

Matrix Multiply with Threads (each thread does single multiply)

I'm looking to do a matrix multiply using threads where each thread does a single multiplication and then the main thread will add up all of the results and place them in the appropriate spot in the final matrix (after the other threads have exited).
The way I am trying to do it is to create a single row array that holds the results of each thread. Then I would go through the array and add + place the results in the final matrix.
Ex: If you have the matrices:
A = [{1,4}, {2,5}, {3,6}]
B = [{8,7,6}, {5,4,3}]
Then I want an array holding [8, 20, 7, 16, 6, 12, 16 etc]
I would then loop through the array adding up every 2 numbers and placing them in my final array.
This is a HW assignment so I am not looking for exact code, but some logic on how to store the results in the array properly. I'm struggling with how to keep track of where I am in each matrix so that I don't miss any numbers.
Thanks.
EDIT2: Forgot to mention that there must be a single thread for every single multiplication to be done. Meaning for the example above, there will be 18 threads each doing its own calculation.
EDIT: I'm currently using this code as a base to work off of.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10
int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];
struct v {
int i; /* row */
int j; /* column */
};
void *runner(void *param); /* the thread */
int main(int argc, char *argv[]) {
int i,j, count = 0;
for(i = 0; i < M; i++) {
for(j = 0; j < N; j++) {
//Assign a row and column for each thread
struct v *data = (struct v *) malloc(sizeof(struct v));
data->i = i;
data->j = j;
/* Now create the thread passing it data as a parameter */
pthread_t tid; //Thread ID
pthread_attr_t attr; //Set of thread attributes
//Get the default attributes
pthread_attr_init(&attr);
//Create the thread
pthread_create(&tid,&attr,runner,data);
//Make sure the parent waits for all thread to complete
pthread_join(tid, NULL);
count++;
}
}
//Print out the resulting matrix
for(i = 0; i < M; i++) {
for(j = 0; j < N; j++) {
printf("%d ", C[i][j]);
}
printf("\n");
}
}
//The thread will begin control in this function
void *runner(void *param) {
struct v *data = param; // the structure that holds our data
int n, sum = 0; //the counter and sum
//Row multiplied by column
for(n = 0; n< K; n++){
sum += A[data->i][n] * B[n][data->j];
}
//assign the sum to its coordinate
C[data->i][data->j] = sum;
//Exit the thread
pthread_exit(0);
}
Source: http://macboypro.wordpress.com/2009/05/20/matrix-multiplication-in-c-using-pthreads-on-linux/
You need to store M * K * N element-wise products. The idea is presumably that the threads will all run in parallel, or at least will be able to do, so each thread needs its own distinct storage location of appropriate type. A straightforward way to do that would be to create an array with that many elements ... but of what element type?
Each thread will need to know not only where to store its result, but also which multiplication to perform. All of that information needs to be conveyed via a single argument of type void *. One would typically, then, create a structure type suitable for holding all the data needed by one thread, create an instance of that structure type for each thread, and pass pointers to those structures. Sounds like you want an array of structures, then.
The details could be worked a variety of ways, but the one that seems most natural to me is to give the structure members for the two factors, and a member in which to store the product. I would then have the main thread declare a 3D array of such structures (if the needed total number is smallish) or else dynamically allocate one. For example,
struct multiplication {
// written by the main thread; read by the compute thread:
int factor1;
int factor2;
// written by the compute thread; read by the main thread:
int product;
} partial_result[M][K][N];
How to write code around that is left as the exercise it is intended to be.
Not sure haw many threads you would need to dispatch and I am also not sure if you would use join later to pick them up. I am guessing you are in C here so I would use the thread id as a way to track which row to process .. something like :
#define NUM_THREADS 64
/*
* struct to pass parameters to a dispatched thread
*/
typedef struct {
int value; /* thread number */
char somechar[128]; /* char data passed to thread */
unsigned long ret;
struct foo *row;
} thread_parm_t;
Where I am guessing that each thread will pick up its row data in the pointer *row which has some defined type foo. A bunch of integers or floats or even complex types. Whatever you need to pass to the thread.
/*
* the thread to actually crunch the row data
*/
void *thr_rowcrunch( void *parm );
pthread_t tid[NUM_THREADS]; /* POSIX array of thread IDs */
Then in your main code segment something like :
thread_parm_t *parm=NULL;
Then dispatch the threads with something like :
for ( i = 0; i < NUM_THREADS; i++) {
parm = malloc(sizeof(thread_parm_t));
parm->value = i;
strcpy(parm->somechar, char_data_to-pass );
fill_in_row ( parm->row, my_row_data );
pthread_create(&tid[i], NULL, thr_insert, (void *)parm);
}
Then later on :
for ( i = 0; i < NUM_THREADS; i++)
pthread_join(tid[i], NULL);
However the real work needs to be done in thr_rowcrunch( void *parm ) which receives the row data and then each thread just knows its own thread number. The guts of what you do in that dispatched thread however I can only guess at.
Just trying to help here, not sure if this is clear.

Resources