Compute the summation of a given interval using multiple threads - c

For my homework, I need to compute the squares of integers in the interval (0,N) (e.g. (0,50) in a way that the load is distributed equally among threads (e.g. 5 threads). I have been advised to use small chunks from the interval and assign it to the thread. For that, I am using a queue. Here's my code:
#include <stdio.h>
#include <pthread.h>
#define QUEUE_SIZE 50
typedef struct {
int q[QUEUE_SIZE];
int first,last;
int count;
} queue;
void init_queue(queue *q)
{
q->first = 0;
q->last = QUEUE_SIZE - 1;
q->count = 0;
}
void enqueue(queue *q,int x)
{
q->last = (q->last + 1) % QUEUE_SIZE;
q->q[ q->last ] = x;
q->count = q->count + 1;
}
int dequeue(queue *q)
{
int x = q->q[ q->first ];
q->first = (q->first + 1) % QUEUE_SIZE;
q->count = q->count - 1;
return x;
}
queue q; //declare the queue data structure
void* threadFunc(void* data)
{
int my_data = (int)data; /* data received by thread */
int sum=0, tmp;
while (q.count)
{
tmp = dequeue(&q);
sum = sum + tmp*tmp;
usleep(1);
}
printf("SUM = %d\n", sum);
printf("Hello from new thread %u - I was created in iteration %d\n",pthread_self(), my_data);
pthread_exit(NULL); /* terminate the thread */
}
int main(int argc, char* argv[])
{
init_queue(&q);
int i;
for (i=0; i<50; i++)
{
enqueue(&q, i);
}
pthread_t *tid = malloc(5 * sizeof(pthread_t) );
int rc; //return value
for(i=0; i<5; i++)
{
rc = pthread_create(&tid[i], NULL, threadFunc, (void*)i);
if(rc) /* could not create thread */
{
printf("\n ERROR: return code from pthread_create is %u \n", rc);
return(-1);
}
}
for(i=0; i<5; i++)
{
pthread_join(tid[i], NULL);
}
}
The output is not always correct. Most of the time it is correct, 40425, but sometimes, the value is bigger. Is it because of the threads are running in parallel and accessing the queue at the same time (the processor on my laptop is is intel i7)? I would appreciate the feedback on my concerns.

I think contrary to what some of the other people here suggested, you don't need any synchronization primitives like semaphores or mutexes at all. Something like this:
Given some array like
int values[50];
I'd create a couple of threads (say: 5), each of which getting a pointer to a struct with the offset into the values array and a number of squares to compute, like
typedef struct ThreadArgs {
int *values;
size_t numSquares;
} ThreadArgs;
You can then start your threads, each of which being told to process 10 numbers:
for ( i = 0; i < 5; ++i ) {
ThreadArgs *args = malloc( sizeof( ThreadArgs ) );
args->values = values + 10 * i;
args->numSquares = 10;
pthread_create( ...., threadFunc, args );
}
Each thread then simply computes the squares it was assigned, like:
void *threadFunc( void *data )
{
ThreadArgs *args = data;
int i;
for ( i = 0; i < args->numSquares; ++i ) {
args->values[i] = args->values[i] * args->values[i];
}
free( args );
}
At the end, you'd just use a pthread_join to wait for all threads to finish, after which you have your squares in the values array.

All your threads read from the same queue. This leads to a race condition. For instance, if the number 10 was read simultaneously by two threads, your result will be offset by 100. You should protect your queue with a mutex. Put the following print in deque function to know which numbers are repeated:
printf("Dequeing %d in thread %d\n", x, pthread_self());
Your code doesn't show where the results are accumulated to a single variable. You should protect that variable with a mutex as well.
Alternatively, you can pass the start number as the input parameter to each thread from the loop so that each thread can work on its set of numbers. First thread will work on 1-10, the second one on 11-20 and so on. In this approach, you have to use mutex only the part where the threads update the global sum variable at the end of their execution.

First you need to define what it means to be "distributed equally among threads." If you mean that each thread does the same amount of work as the other threads, then I would create a single queue, put all the numbers in the queue, and start all threads (which are the same code.) Each thread tries to get a value from the queue which must be protected by a mutex unless it is thread safe, calculates the partial answer from the value taken from the thread, and adds the result to the total which must also be protected by a mutex. If you mean that each thread will execute an equal amount of times as each of the other threads, then you need to make a priority queue and put all the numbers in the queue along with the thread number that should compute on it. Each thread then tries to get a value from the queue that matches its thread number. From the thread point of view, it should try to get a value from the queue, do the work, then try to get another value. If there are no more values to get, then the thread should exit. The main program does a join on all threads and the program exits when all threads have exited.

Related

Why does my simple counting program take longer to run with multiple threads? (in C)

Here's my code:
#define COUNT_TO 100000000
#define MAX_CORES 4
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
long long i = 0;
void* start_counting(void *arg){
for(;;){
pthread_mutex_lock(&mutex);
if(i >= COUNT_TO){
pthread_mutex_unlock(&mutex);
return NULL;
}
i++;
pthread_mutex_unlock(&mutex);
//printf("i = %lld\n", i);
}
}
int main(int argc, char* argv[]){
int i = 0;
pthread_t * thread_group = malloc(sizeof(pthread_t) * MAX_CORES);
for(i = 0; i < MAX_CORES; i++){
pthread_create(&thread_group[i], NULL, start_counting, NULL);
}
for(i = 0; i < MAX_CORES; i++){
pthread_join(thread_group[i], NULL);
}
return 0;
}
This is what your threads do:
Read the value of i.
Increment the value we read.
Write back the incremented value of i.
Go to step 1.
Cleary, another thread cannot read the value of i after a different thread has accomplished step 1 but before it has completed step 3. So there can be no overlap between two threads doing steps 1, 2, or 3.
So all your threads are fighting over access to the same resource -- i (or the mutex that protects it). No thread can make useful forward progress without exclusive access to one or both of those. Given that, there is no benefit to using multiple threads since only one of them can accomplish useful work at a time.

Trying to understand Race Conditions/Threads in C

For staters, I am a student who wasn't a CS undergrad, but am moving into a CS masters. So I welcome any and all help anyone is willing to give.
The purpose of this was to create N threads between 2-4, then using a randomly generated array of lower case characters, make them uppercase.
This needed to be done using the N threads (defined by the command line when executed), dividing the work up as evenly as possible, using pthread.
My main question I'm trying to ask, is if I avoided race conditions between my threads?
I am also struggling to understand dividing the work among the threads. As I understand (correct me if I'm wrong), in general the threads functioning will be chosen at random during execution. So, I'm assuming I need to do something along the lines of dynamically dividing the array among the N number of threads and setting it so that each thread will perform the uppercasing of a same sized subsection of the array?
I know there are likely a number of other discrepancies I need to get better at within my code, but I haven't coded long and just started using C/C++ about a month ago.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#include <ctype.h>
//Global variable for threads
char randChars[60];
int j=0;
//Used to avoid race conditions
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
//Establish the threads
void* upperThread(void* argp)
{
while(randChars[j])
{
pthread_mutex_lock( &mutex1 );
putchar (toupper(randChars[j]));
j++;
pthread_mutex_unlock( &mutex1 );
}
return NULL;
}
int main(int argc, char **argv)
{
//Initializae variables and thread
int N,randNum,t;
long i;
pthread_t pth[N];
pthread_mutex_init(&mutex1, NULL);
char randChar = ' ';
//Check number of command inputs given
if(argc!=2)
{
fprintf(stderr,"usage: %s <enter a value for N>\n", argv[0]);
exit(0);
}
N = atoi(argv[1]);
//Checks command inputs for correct values
if(N<2||N>4){
printf("Please input a value between 2 and 4 for the number of threads.\n");
exit(0);
}
//Seed random to create a randomized value
srand(time(NULL));
printf("original lower case version:\n");
for (i=0; i<61; i++)
{
//Generate a random integer in lower alphabetical range
randNum = rand()%26;
randNum = randNum+97;
//Convert int to char and add to array
randChar = (char) randNum;
randChars[i] = randChar;
printf("%c", randChar);
}
//Create N threads
for (i=0; i<N; i++)
{
pthread_create(pth + i, NULL, upperThread, (void *)i);
}
printf("\n\nupper case version:\n");
//Join the threads
for(t=0; t < N; t++)
{
pthread_join(pth[t], NULL);
}
printf("\n");
pthread_exit(NULL);
return 0;
}
The example you provided is not a good multithreaded program. The reason is that your threads will constantly wait for the one which holds the lock. Which basically makes your program sequential. I would change your upperThread to
void* upperThread(void* argp){
int temp;
while(randChars[j]){
pthread_mutex_lock( &mutex1 );
temp = j;
j++;
pthread_mutex_unlock( &mutex1 );
putchar (toupper(randChars[temp]));
}
return NULL;
}
This way your threads will wait for one that holds the lock until it extracts the value of j , increment it and release the lock and then do the rest of its operations.
The general rule is that you have to acquire the lock only when you deal with critical section or critical data in this case it is an index of your string. Read about critical sections and racing conditions here

Multithreaded search in c

I'm supposed to have two threads that search for the minimum element in an array: the first one searches the first half, and the second thread searches the other half. However, when I run my code, it seems that it chooses a thread randomly. I'm not sure what I'm doing wrong, but it probably has to do with the "mid" part. I tried dividing an array into two, finding the midpoint and then writing the conditions from there, but I probably went wrong somewhere. I also tried putting array[i] in the conditions, but in that case only thread2 executes.
EDIT: I'm really trying my best here, but I'm not getting anywhere. I edited the code in a way that made sense to me, and I probably typecasted "min" wrong but now it doesn't even execute it just gives me an error, even though it compiles just fine. I'm just a beginner, and while I do understand everything you guys are talking about, I have a hard time implementing the ideas, so really, any help with fixing this is appreciated!
EDIT2: Okay so the previous code made no sense at all, I do apologize but I was exhausted while writing it. Anyway, I came up with something else that works partially! I split the array into two halves, however only the first element is accessible when using the pointer. But would it work if the whole array was being accessed and if so how can I fix that then?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#define size 20
void *smallest(void *arg);
pthread_t th, th2;
int array[size], i, min;
int main(int argc, char *argv[]) {
srand ( time(NULL) );
for(i = 0; i < size; i++)
{
array[i] = (rand() % 100)+1;
printf("%d ", array[i]);
}
int *array1 = malloc(10 * sizeof(int));
int *array2 = malloc(10 * sizeof(int));
memcpy(array1, array, 10 * sizeof(int));
memcpy(array2, array + 10, 10 * sizeof(int));
printf("\nFirst half gives %d \n", *array1);
printf("Second half gives %d \n", *array2);
pthread_create(&th, NULL, smallest, (void*) array1);
pthread_create(&th2, NULL, smallest, (void*) array2);
pthread_join(th, NULL);
pthread_join(th2, NULL);
//printf("\nFirst half gives %d\n", array1);
//printf("Second half gives %d\n", array2);
if (*array1 < *array2) {
printf("\nThread1 finds the number\n");
printf("The smallest element is %i\n", *array1);
}
else {
printf("\nThread2 finds the number\n");
printf("The smallest element is %i\n", *array2);
}
return 0;
}
void *smallest(void* arg){
int *array = (int*)arg;
min = array[0];
for (i = 0; i < size; i++) {
if (array[i] < min) {
min = array[i];
}
}
pthread_exit(NULL);
}
The code you've set up never runs more than one thread. Notice that if you run the first branch of the if statement, you fire off one thread to search half the array, wait for it to finish, then continue onward, and if the else branch executes, the same thing happens in the second half of the array. Fundamentally, you probably want to rethink your strategy here by having the code always launch two threads and join each of them only after both threads have started running.
The condition within your if statement also seems like it's mistaken. You're asking whether the middle element of the array is greater than its index. I assume this isn't what you're trying to do.
Finally, the code you have in each thread always looks at the entire array, not just a half of it. I would recommend rewriting the thread routine so that its argument represents the start and end indices of the range to take the minimum of. You would then update the code in main so that when you fire off the thread, you specify which range to search.
I would structure things like this:
Fire off a thread to find the minimum of the first half of the array.
Fire off a thread to find the minimum of the second half of the array.
Join both threads.
Use the results from each thread to find the minimum.
As one final note, since you'll have two different threads each running at the same time, you'll need to watch for data races as both threads try to read or write the minimum value. Consider having each thread use its exit code to signal where the minimum is and then resolving the true minimum back in main. This eliminates the race condition. Alternatively, have one global minimum value, but guard it with a mutex.
1) You´re redeclaring the global variables in the main function, so there´s actually no point in declaring i, low, high, min:
int array[size], i, low, high, min;
The problem you´re having is with the scope of the variables when you redeclare the variables in the main function, the global ones with the same name become "invisible"
int *low = array;
int *high = array + (size/2);
int mid = (*low + *high) / 2;
So when you run the threads all the values of your variables(low, high, min;
) are 0, this is because they are never actually modified by the main and because they start in 0 default(startup code,etc).
Anyways I wouldn´t really recommend(it´s really frowned upon) using global variables unless it´s a really small proyect for personal use.
2) Another crucial problem is that you´re ignorning the main idea behind threads which is running both simultaneously
if (array[mid] > mid) {
pthread_create(&th, NULL, &smallest, NULL);
pthread_join(th, NULL);
printf("\nThread1 finds the number\n");
}
else if (array[mid] < mid) {
pthread_create(&th2, NULL, &smallest, NULL);
pthread_join(th2, NULL);
printf("\nThread2 finds the number\n");
}
You´re actually only running one thread when executing.
Try something like this:
pthread_create(&th, NULL, &smallest, NULL);
pthread_create(&th2, NULL, &smallest, NULL);
pthread_join(th2, NULL);
pthread_join(th, NULL);
3) You are trying to have two threads access the same variable this can result in undefined behaviour, you MUST use a muthex to avoid a number from not actually being stored.
This guide is pretty complete regarding mutexes but if you need anyhelp please let me know.
This is a single threaded version of what you are asking.
#include <stdio.h>
#include <stdlib.h>
/*
I can not run pthread on my system.
So this is some code that should kind of work the same way
*/
typedef int pthread_t;
typedef int pthread_attr_t;
typedef void*(*threadfunc)(void*);
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void*), void *arg)
{
start_routine(arg);
return 0;
}
int pthread_join(pthread_t thread, void **value_ptr)
{
return 0;
}
struct context
{
int* begin;
int* end;
int* result;
};
//the function has to be castable to the threadfunction type
//that way you do not have to worry about casting the argument.
//be careful though - if something does not match these errors may be hard to track
void * smallest(context * c) //signature needet for start routine
{
c->result = c->begin;
for (int* current = c->begin; current < c->end; ++current)
{
if (*current < *c->result)
{
c->result = current;
}
}
return 0; // not needet with the way the argument is set up.
}
int main(int argc, char *argv[])
{
pthread_t t1, t2;
#define size 20
int array[size];
srand(0);
for (int i = 0; i < size; ++i)
{
array[i] = (rand() % 100) + 1;
printf("%d ", array[i]);
}
//prepare data
//one surefire way of messing up in multithreading is sharing data between threads.
//even a simple approach like storing in a variable who is accessing will not solve the issues
//to properly lock data you would have to dive into the memory model.
//either lock with mutexes or memory barriers or just don' t share data between threads.
context c1;
context c2;
c1.begin = array;
c1.end = array + (size / 2);
c2.begin = c1.end + 1;
c2.end = array + size;
//start threads - here your threads would go
//note the casting - you may wnt to wrap this in its own function
//there is error potential here, especially due to maintainance etc...
pthread_create(&t1, 0, (void*(*)(void*))smallest, &c1); //without typedef
pthread_create(&t2, 0, (threadfunc)smallest, &c2); //without typedef
pthread_join(t1, 0);//instead of zero you could have a return value here
pthread_join(t1, 0);//as far as i read 0 throws the return value away
//return value could be useful for error handling
//evaluate
if (*c1.result < *c2.result)
{
printf("\nThread1 finds the number\n");
printf("The smallest element is %i\n", *c1.result);
}
else
{
printf("\nThread2 finds the number\n");
printf("The smallest element is %i\n", *c2.result);
}
return 0;
}
Edit:
I edited some stubs in to give you an idea of how to use multithreading.
I never used pthread but this should likely work.
I used this source for prototype information.

How to synchronize manager/worker pthreads without a join?

I'm familiar with multithreading and I've developed many multithreaded programs in Java and Objective-C successfully. But I couldn't achieve the following in C using pthreads without using a join from the main thread:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_OF_THREADS 2
struct thread_data {
int start;
int end;
int *arr;
};
void print(int *ints, int n);
void *processArray(void *args);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
ints[i] = i;
}
print(ints, numOfInts); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pthread_t threads[NUM_OF_THREADS];
struct thread_data thread_data[NUM_OF_THREADS];
// these vars are used to calculate the index ranges for each thread
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
thread_data[i].arr = ints;
thread_data[i].start = startRange;
thread_data[i].end = endRange;
pthread_create(&threads[i], NULL, processArray, (void *)&thread_data[i]);
remainingWork -= amountOfWork;
}
// 1. Signal to the threads to start working
// 2. Wait for them to finish
print(ints, numOfInts); // should print [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
free(ints);
return 0;
}
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
// 1. Wait for a signal to start from the main thread
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// 2. Signal to the main thread that you're done
pthread_exit(NULL);
}
void print(int *ints, int n)
{
printf("[");
for (int i = 0; i < n; i++) {
printf("%d", ints[i]);
if (i+1 != n)
printf(", ");
}
printf("]\n");
}
I would like to achieve the following in the above code:
In main():
Signal to the threads to start working.
Wait for the background threads to finish.
In processArray():
Wait for a signal to start from the main thread
Signal to the main thread that you're done
I don't want to use a join in the main thread because in the real application, the main thread will create the threads once, and then it will signal to the background threads to work many times, and I can't let the main thread proceed unless all the background threads have finished working. In the processArray function, I will put an infinite loop as following:
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
while (1)
{
// 1. Wait for a signal to start from the main thread
int *arr = data->arr;
int start = data->start;
int end = data->end;
// Process
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// 2. Signal to the main thread that you're done
}
pthread_exit(NULL);
}
Note that I'm new to C and the posix API, so excuse me if I'm missing something obvious. But I really tried many things, starting from using a mutex, and an array of semaphores, and a mixture of both, but without success. I think a condition variable may help, but I couldn't understand how it could be used.
Thanks for your time.
Problem Solved:
Thank you guys so much! I was finally able to get this to work safely and without using a join by following your tips. Although the solution is somewhat ugly, it gets the job done and the performance gains is worth it (as you'll see below). For anyone interested, this is a simulation of the real application I'm working on, in which the main thread keeps giving work continuously to the background threads:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_OF_THREADS 5
struct thread_data {
int id;
int start;
int end;
int *arr;
};
pthread_mutex_t currentlyIdleMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t currentlyIdleCond = PTHREAD_COND_INITIALIZER;
int currentlyIdle;
pthread_mutex_t workReadyMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t workReadyCond = PTHREAD_COND_INITIALIZER;
int workReady;
pthread_cond_t currentlyWorkingCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t currentlyWorkingMutex= PTHREAD_MUTEX_INITIALIZER;
int currentlyWorking;
pthread_mutex_t canFinishMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t canFinishCond = PTHREAD_COND_INITIALIZER;
int canFinish;
void print(int *ints, int n);
void *processArray(void *args);
int validateResult(int *ints, int num, int start);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
ints[i] = i;
}
// print(ints, numOfInts);
pthread_t threads[NUM_OF_THREADS];
struct thread_data thread_data[NUM_OF_THREADS];
workReady = 0;
canFinish = 0;
currentlyIdle = 0;
currentlyWorking = 0;
// these vars are used to calculate the index ranges for each thread
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
// Create the threads and give each one its data struct.
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
thread_data[i].id = i;
thread_data[i].arr = ints;
thread_data[i].start = startRange;
thread_data[i].end = endRange;
pthread_create(&threads[i], NULL, processArray, (void *)&thread_data[i]);
remainingWork -= amountOfWork;
}
int loops = 1111111;
int expectedStartingValue = ints[0] + loops; // used to validate the results
// The elements in ints[] should be incremented by 1 in each loop
while (loops-- != 0) {
// Make sure all of them are ready
pthread_mutex_lock(&currentlyIdleMutex);
while (currentlyIdle != NUM_OF_THREADS) {
pthread_cond_wait(&currentlyIdleCond, &currentlyIdleMutex);
}
pthread_mutex_unlock(&currentlyIdleMutex);
// All threads are now blocked; it's safe to not lock the mutex.
// Prevent them from finishing before authorized.
canFinish = 0;
// Reset the number of currentlyWorking threads
currentlyWorking = NUM_OF_THREADS;
// Signal to the threads to start
pthread_mutex_lock(&workReadyMutex);
workReady = 1;
pthread_cond_broadcast(&workReadyCond );
pthread_mutex_unlock(&workReadyMutex);
// Wait for them to finish
pthread_mutex_lock(&currentlyWorkingMutex);
while (currentlyWorking != 0) {
pthread_cond_wait(&currentlyWorkingCond, &currentlyWorkingMutex);
}
pthread_mutex_unlock(&currentlyWorkingMutex);
// The threads are now waiting for permission to finish
// Prevent them from starting again
workReady = 0;
currentlyIdle = 0;
// Allow them to finish
pthread_mutex_lock(&canFinishMutex);
canFinish = 1;
pthread_cond_broadcast(&canFinishCond);
pthread_mutex_unlock(&canFinishMutex);
}
// print(ints, numOfInts);
if (validateResult(ints, numOfInts, expectedStartingValue)) {
printf("Result correct.\n");
}
else {
printf("Result invalid.\n");
}
// clean up
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_cancel(threads[i]);
}
free(ints);
return 0;
}
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
while (1) {
// Set yourself as idle and signal to the main thread, when all threads are idle main will start
pthread_mutex_lock(&currentlyIdleMutex);
currentlyIdle++;
pthread_cond_signal(&currentlyIdleCond);
pthread_mutex_unlock(&currentlyIdleMutex);
// wait for work from main
pthread_mutex_lock(&workReadyMutex);
while (!workReady) {
pthread_cond_wait(&workReadyCond , &workReadyMutex);
}
pthread_mutex_unlock(&workReadyMutex);
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// mark yourself as finished and signal to main
pthread_mutex_lock(&currentlyWorkingMutex);
currentlyWorking--;
pthread_cond_signal(&currentlyWorkingCond);
pthread_mutex_unlock(&currentlyWorkingMutex);
// Wait for permission to finish
pthread_mutex_lock(&canFinishMutex);
while (!canFinish) {
pthread_cond_wait(&canFinishCond , &canFinishMutex);
}
pthread_mutex_unlock(&canFinishMutex);
}
pthread_exit(NULL);
}
int validateResult(int *ints, int n, int start)
{
int tmp = start;
for (int i = 0; i < n; i++, tmp++) {
if (ints[i] != tmp) {
return 0;
}
}
return 1;
}
void print(int *ints, int n)
{
printf("[");
for (int i = 0; i < n; i++) {
printf("%d", ints[i]);
if (i+1 != n)
printf(", ");
}
printf("]\n");
}
I'm not sure though if pthread_cancel is enough for clean up! As for the barrier, it would've been of a great help if it wasn't limited to some OSs as mentioned by #Jeremy.
Benchmarks:
I wanted to make sure that these many conditions aren't actually slowing down the algorithm, so I've setup this benchmark to compare the two solutions:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#define NUM_OF_THREADS 5
struct thread_data {
int start;
int end;
int *arr;
};
pthread_mutex_t currentlyIdleMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t currentlyIdleCond = PTHREAD_COND_INITIALIZER;
int currentlyIdle;
pthread_mutex_t workReadyMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t workReadyCond = PTHREAD_COND_INITIALIZER;
int workReady;
pthread_cond_t currentlyWorkingCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t currentlyWorkingMutex= PTHREAD_MUTEX_INITIALIZER;
int currentlyWorking;
pthread_mutex_t canFinishMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t canFinishCond = PTHREAD_COND_INITIALIZER;
int canFinish;
void *processArrayMutex(void *args);
void *processArrayJoin(void *args);
double doItWithMutex(pthread_t *threads, struct thread_data *data, int loops);
double doItWithJoin(pthread_t *threads, struct thread_data *data, int loops);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *join_ints = malloc(numOfInts * sizeof(int));
int *mutex_ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
join_ints[i] = i;
mutex_ints[i] = i;
}
pthread_t join_threads[NUM_OF_THREADS];
pthread_t mutex_threads[NUM_OF_THREADS];
struct thread_data join_thread_data[NUM_OF_THREADS];
struct thread_data mutex_thread_data[NUM_OF_THREADS];
workReady = 0;
canFinish = 0;
currentlyIdle = 0;
currentlyWorking = 0;
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
join_thread_data[i].arr = join_ints;
join_thread_data[i].start = startRange;
join_thread_data[i].end = endRange;
mutex_thread_data[i].arr = mutex_ints;
mutex_thread_data[i].start = startRange;
mutex_thread_data[i].end = endRange;
pthread_create(&mutex_threads[i], NULL, processArrayMutex, (void *)&mutex_thread_data[i]);
remainingWork -= amountOfWork;
}
int numOfBenchmarkTests = 100;
int numberOfLoopsPerTest= 1000;
double join_sum = 0.0, mutex_sum = 0.0;
for (int i = 0; i < numOfBenchmarkTests; i++)
{
double joinTime = doItWithJoin(join_threads, join_thread_data, numberOfLoopsPerTest);
double mutexTime= doItWithMutex(mutex_threads, mutex_thread_data, numberOfLoopsPerTest);
join_sum += joinTime;
mutex_sum+= mutexTime;
}
double join_avg = join_sum / numOfBenchmarkTests;
double mutex_avg= mutex_sum / numOfBenchmarkTests;
printf("Join average : %f\n", join_avg);
printf("Mutex average: %f\n", mutex_avg);
double diff = join_avg - mutex_avg;
if (diff > 0.0)
printf("Mutex is %.0f%% faster.\n", 100 * diff / join_avg);
else if (diff < 0.0)
printf("Join is %.0f%% faster.\n", 100 * diff / mutex_avg);
else
printf("Both have the same performance.");
free(join_ints);
free(mutex_ints);
return 0;
}
// From https://stackoverflow.com/a/2349941/408286
double get_time()
{
struct timeval t;
struct timezone tzp;
gettimeofday(&t, &tzp);
return t.tv_sec + t.tv_usec*1e-6;
}
double doItWithMutex(pthread_t *threads, struct thread_data *data, int num_loops)
{
double start = get_time();
int loops = num_loops;
while (loops-- != 0) {
// Make sure all of them are ready
pthread_mutex_lock(&currentlyIdleMutex);
while (currentlyIdle != NUM_OF_THREADS) {
pthread_cond_wait(&currentlyIdleCond, &currentlyIdleMutex);
}
pthread_mutex_unlock(&currentlyIdleMutex);
// All threads are now blocked; it's safe to not lock the mutex.
// Prevent them from finishing before authorized.
canFinish = 0;
// Reset the number of currentlyWorking threads
currentlyWorking = NUM_OF_THREADS;
// Signal to the threads to start
pthread_mutex_lock(&workReadyMutex);
workReady = 1;
pthread_cond_broadcast(&workReadyCond );
pthread_mutex_unlock(&workReadyMutex);
// Wait for them to finish
pthread_mutex_lock(&currentlyWorkingMutex);
while (currentlyWorking != 0) {
pthread_cond_wait(&currentlyWorkingCond, &currentlyWorkingMutex);
}
pthread_mutex_unlock(&currentlyWorkingMutex);
// The threads are now waiting for permission to finish
// Prevent them from starting again
workReady = 0;
currentlyIdle = 0;
// Allow them to finish
pthread_mutex_lock(&canFinishMutex);
canFinish = 1;
pthread_cond_broadcast(&canFinishCond);
pthread_mutex_unlock(&canFinishMutex);
}
return get_time() - start;
}
double doItWithJoin(pthread_t *threads, struct thread_data *data, int num_loops)
{
double start = get_time();
int loops = num_loops;
while (loops-- != 0) {
// create them
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_create(&threads[i], NULL, processArrayJoin, (void *)&data[i]);
}
// wait
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_join(threads[i], NULL);
}
}
return get_time() - start;
}
void *processArrayMutex(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
while (1) {
// Set yourself as idle and signal to the main thread, when all threads are idle main will start
pthread_mutex_lock(&currentlyIdleMutex);
currentlyIdle++;
pthread_cond_signal(&currentlyIdleCond);
pthread_mutex_unlock(&currentlyIdleMutex);
// wait for work from main
pthread_mutex_lock(&workReadyMutex);
while (!workReady) {
pthread_cond_wait(&workReadyCond , &workReadyMutex);
}
pthread_mutex_unlock(&workReadyMutex);
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// mark yourself as finished and signal to main
pthread_mutex_lock(&currentlyWorkingMutex);
currentlyWorking--;
pthread_cond_signal(&currentlyWorkingCond);
pthread_mutex_unlock(&currentlyWorkingMutex);
// Wait for permission to finish
pthread_mutex_lock(&canFinishMutex);
while (!canFinish) {
pthread_cond_wait(&canFinishCond , &canFinishMutex);
}
pthread_mutex_unlock(&canFinishMutex);
}
pthread_exit(NULL);
}
void *processArrayJoin(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
pthread_exit(NULL);
}
And the output is:
Join average : 0.153074
Mutex average: 0.071588
Mutex is 53% faster.
Thank you again. I really appreciate your help!
There are several synchronization mechanisms you can use (condition variables, for example). I think the simplest would be to use a pthread_barrier to synchronize the the start of the threads.
Assuming that you want all of the threads to 'sync up' on each loop iteration, you can just reuse the barrier. If you need something more flexible, a condition variable might be more appropriate.
When you decide it's time for the thread to wrap up (you haven't indicated how the threads will know to break out of the infinite loop - a simple shared variable might be used for that; the shared variable could be an atomic type or protected with a mutex), the main() thread should use pthread_join() to wait for all the threads to complete.
You need to use a different synchronization technique than join, that's clear.
Unfortunately you have a lot of options. One is a "synchronization barrier", which basically is a thing where each thread that reaches it blocks until they've all reached it (you specify the number of threads in advance). Look at pthread_barrier.
Another is to use a condition-variable/mutex pair (pthread_cond_*). When each thread finishes it takes the mutex, increments a count, signals the condvar. The main thread waits on the condvar until the count reaches the value it expects. The code looks like this:
// thread has finished
mutex_lock
++global_count
// optional optimization: only execute the next line when global_count >= N
cond_signal
mutex_unlock
// main is waiting for N threads to finish
mutex_lock
while (global_count < N) {
cond_wait
}
mutex_unlock
Another is to use a semaphore per thread -- when the thread finishes it posts its own semaphore, and the main thread waits on each semaphore in turn instead of joining each thread in turn.
You also need synchronization to re-start the threads for the next job -- this could be a second synchronization object of the same type as the first, with details changed for the fact that you have 1 poster and N waiters rather than the other way around. Or you could (with care) re-use the same object for both purposes.
If you've tried these things and your code didn't work, maybe ask a new specific question about the code you tried. All of them are adequate to the task.
You are working at the wrong level of abstraction. This problem has been solved already. You are reimplementing a work queue + thread pool.
OpenMP seems like a good fit for your problem. It converts #pragma annotations into threaded code. I believe it would let you express what you're trying to do pretty directly.
Using libdispatch, what you're trying to do would be expressed as a dispatch_apply targeting a concurrent queue. This implicitly waits for all child tasks to complete. Under OS X, it's implemented using a non-portable pthread workqueue interface; under FreeBSD, I believe it manages a group of pthreads directly.
If it is portability concerns driving you to use raw pthreads, don't use pthread barriers. Barriers are an additional extension over and above basic POSIX threads. OS X for example does not support it. For more, see POSIX.
Blocking the main thread till all child threads have completed can be done using a count protected by a condition variable or, even more simply, using a pipe and a blocking read where the number of bytes to read matches the number of threads. Each thread writes one byte on work completion, then sleeps till it gets new work from the main thread. The main thread unblocks once each thread has written its "I'm done!" byte.
Passing work to the child threads can be done using a mutex protecting the work-descriptor and a condition to signal new work. You could use a single array of work descriptors that all threads draw from. On signal, each one tries to grab the mutex. On grabbing the mutex, it would dequeue some work, signal anew if the queue is nonempty, and then process its work, after which it would signal completion to the master thread.
You could reuse this "work queue" to unblock the main thread by enqueueing the results, with the main thread waiting till the result queue length matches the number of threads; the pipe approach is just using a blocking read to do this count for you.
To tell all the threads to start working, it can be as simple as a global integer variable which is initialized to zero, and the threads simply wait until it's non-zero. This way you don't need the while (1) loop in the thread function.
For waiting until they are all done, pthread_join is simplest as it will actually block until the thread it's joining is done. It's also needed to clean up system stuff after the thread (like otherwise the return value from the thread will be stored for the remainder of the program). As you have an array of all pthread_t for the threads, just loop over them one by one. As that part of your program doesn't do anything else, and has to wait until all threads are done, just waiting for them in order is okay.

C: pthread performance woes. How can I make this code perform as expected?

I have created this little program to calculate pi using probability and ratios. In order to make it run faster I decided to give multithreading with pthreads a shot. Unfortunately, even after doing much searching around I was unable to solve the problem I have in that when I run the threadFunc function, with one thread, whether that be with a pthread, or just normally called from the calculate_pi_mt function, the performance is much better (at least twice or if not 3 times better) than when I try running it with two threads on my dual core machine. I have tried disabling optimizations to no avail. As far as I can see, when the thread is running it is using local variables apart from at the end when I have used a mutex lock to create the sum of hits...
Firstly are there any tips for creating code that will run better here? (ie style) because I'm just learning by trying this stuff.
And secondly would there be any reason for these obvious performance problems?
When running with number of threads set to 1, one of my cpus maxes out at 100%. When set to two, the second cpu rises to roughly 80%-90%, but all this extra work it is apparently doing is to no avail! Could it be the use of the rand() function?
struct arguments {
int n_threads;
int rays;
int hits_in;
pthread_mutex_t *mutex;
};
void *threadFunc(void *arg)
{
struct arguments* args=(struct arguments*)arg;
int n = 0;
int local_hits_in = 0;
double x;
double y;
double r;
while (n < args->rays)
{
n++;
x = ((double)rand())/((double)RAND_MAX);
y = ((double)rand())/((double)RAND_MAX);
r = (double)sqrt(pow(x, 2) + pow(y, 2));
if (r < 1.0){
local_hits_in++;
}
}
pthread_mutex_lock(args->mutex);
args->hits_in += local_hits_in;
pthread_mutex_unlock(args->mutex);
return NULL;
}
double calculate_pi_mt(int rays, int threads){
double answer;
int c;
unsigned int iseed = (unsigned int)time(NULL);
srand(iseed);
if ( (float)(rays/threads) != ((float)rays)/((float)threads) ){
printf("Error: number of rays is not evenly divisible by threads\n");
}
/* argument initialization */
struct arguments* args = malloc(sizeof(struct arguments));
args->hits_in = 0;
args->rays = rays/threads;
args->n_threads = 0;
args->mutex = malloc(sizeof(pthread_mutex_t));
if (pthread_mutex_init(args->mutex, NULL)){
printf("Error creating mutex!\n");
}
pthread_t thread_ary[MAXTHREADS];
c=0;
while (c < threads){
args->n_threads += 1;
if (pthread_create(&(thread_ary[c]),NULL,threadFunc, args)){
printf("Error when creating thread\n");
}
printf("Created Thread: %d\n", args->n_threads);
c+=1;
}
c=0;
while (c < threads){
printf("main waiting for thread %d to terminate...\n", c+1);
if (pthread_join(thread_ary[c],NULL)){
printf("Error while waiting for thread to join\n");
}
printf("Destroyed Thread: %d\n", c+1);
c+=1;
}
printf("Hits in %d\n", args->hits_in);
printf("Rays: %d\n", rays);
answer = 4.0 * (double)(args->hits_in)/(double)(rays);
//freeing everything!
pthread_mutex_destroy(args->mutex);
free(args->mutex);
free(args);
return answer;
}
There's a couple of problems I can see:
rand() is not thread-safe. Use drand48_r() (which generates a double in the range [0.0, 1.0) natively, which is what you want)
You only create one struct arguments structure, then try to use that for multiple threads. You need to create a seperate one for each thread (just use an array).
Here's how I'd clean up your approach. Note how we don't need to use any mutexes - each thread just stashes its own return value in a seperate location, and the main thread adds them up after the other threads have finished:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
#include <pthread.h>
struct thread_info {
int thread_n;
pthread_t thread_id;
int rays;
int hits_in;
};
void seed_rand(int thread_n, struct drand48_data *buffer)
{
struct timeval tv;
gettimeofday(&tv, NULL);
srand48_r(tv.tv_sec * thread_n + tv.tv_usec, buffer);
}
void *threadFunc(void *arg)
{
struct thread_info *thread_info = arg;
struct drand48_data drand_buffer;
int n = 0;
const int rays = thread_info->rays;
int hits_in = 0;
double x;
double y;
double r;
seed_rand(thread_info->thread_n, &drand_buffer);
for (n = 0; n < rays; n++)
{
drand48_r(&drand_buffer, &x);
drand48_r(&drand_buffer, &y);
r = x * x + y * y;
if (r < 1.0){
hits_in++;
}
}
thread_info->hits_in = hits_in;
return NULL;
}
double calculate_pi_mt(int rays, int threads)
{
int c;
int hits_in = 0;
if (rays % threads) {
printf("Error: number of rays is not evenly divisible by threads\n");
rays = (rays / threads) * threads;
}
/* argument initialization */
struct thread_info *thr = malloc(threads * sizeof thr[0]);
for (c = 0; c < threads; c++) {
thr[c].thread_n = c;
thr[c].rays = rays / threads;
thr[c].hits_in = 0;
if (pthread_create(&thr[c].thread_id, NULL, threadFunc, &thr[c])) {
printf("Error when creating thread\n");
}
printf("Created Thread: %d\n", thr[c].thread_n);
}
for (c = 0; c < threads; c++) {
printf("main waiting for thread %d to terminate...\n", c);
if (pthread_join(thr[c].thread_id, NULL)) {
printf("Error while waiting for thread to join\n");
}
hits_in += thr[c].hits_in;
printf("Destroyed Thread: %d\n", c+1);
}
printf("Hits in %d\n", hits_in);
printf("Rays: %d\n", rays);
double answer = (4.0 * hits_in) / rays;
free(thr);
return answer;
}
You're using far too many synchronization primitives. You should sum the local_hits at the end in the main thread, and not use a mutex to update it in an asynchronous fashion. Or, at least, you could use an atomic operation (it's just an int) to do it instead of lock an entire mutex to update one int.
Threading has a cost. It may be that, as your useful computing code looks very simple, the cost of thread management (cost paid when changing thread and synchronisation cost) is much higher than the benefit.

Resources