why are Threads loading data from array randomly? - c

I'm new with threads and I’m trying to create a program where four threads do some parallel computations using values from a global array. But, The problem that the threads are not loading the data in order.
#define QUARTER 64
#define RANGE_STEP 256
struct thread_data
{
unsigned start;
unsigned stop;
__m256* re_fc;
__m256* im_fc;
};
#define NUM_THREADS 4
struct thread_data thread_data_array[NUM_THREADS];
void *routine(void *thread_info)
{
int n,t;
unsigned t_start,t_stop;
__m256 *Re_fac , *Im_fac;
struct thread_data *mydata;
mydata = (struct thread_data*) thread_info;
t_start = mydata->start;
t_stop = mydata->stop;
Re_fac = mydata->re_fc;
Im_fac = mydata->im_fc;
t = t_start;
for (n = t_start; n < t_stop; n += 8)
{
// computations
RE_m256_fac = Re_fac[t];
IM_m256_fac = Im_fac[t];
// computations
t++;
}
pthread_exit(NULL);
}
int main()
{
unsigned t,i=0;
for(t=0;t<RANGE_STEP;t+=QUARTER)
{
thread_data_array[i].start = t;
thread_data_array[i].stop = t+QUARTER;
thread_data_array[i].re_fc = RE_factors;
thread_data_array[i].im_fc = IM_factors;
pthread_create(&threads[i],NULL,routine,(void *)&thread_data_array[i]);
i++;
}
for(i=0; i<NUM_THREADS; i++)
{
int rc = pthread_join(threads[i], NULL);
if (rc)
{
fprintf(stderr, "failed to join thread #%u - %s\n",i, strerror(rc));
}
}
}
The problem that I am talking about is occurring in the routine of the thread inside the for() loop exactly with these two load instructions RE_m256_fac = Re_fac[t];and IM_m256_fac = Im_fac[t]; the loaded data is not correct ... I think the index t is a local variable so no synchros is needed , or I am wrong?

After some digging it turned to be that since I am reading from a global shared array I have to use the mutex mechanism to prevent the mutual exclusion as so:
void *routine(void *thread_info)
{
int n,t;
unsigned t_start,t_stop;
__m256 *Re_fac , *Im_fac;
struct thread_data *mydata;
mydata = (struct thread_data*) thread_info;
t_start = mydata->start;
t_stop = mydata->stop;
Re_fac = mydata->re_fc;
Im_fac = mydata->im_fc;
t = t_start;
for (n = t_start; n < t_stop; n += 8)
{
pthread_mutex_lock(&mutex);
// computations
RE_m256_fac = Re_fac[t];
IM_m256_fac = Im_fac[t];
// computations
pthread_mutex_unlock(&mutex);
t++;
}
pthread_exit(NULL);
}
and since then, I can see that the threads are loading the values correctly from the shared array.

Related

Segmentation Fault when dealing with large array size

I've been trying to use Pthreads in a program to count the 3's in a 3000000 element int array, and when it runs sequentially without using pthreads, it works perfectly.
Using pthreads leads to a segmentation fault and i can't figure out why. the execution stops when each thread reaches around 300000 iteration in case of 4 threads on a 8gb of RAM and 256K L2 cache.
Here is my code
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <pthread.h>
#define LENGTH 3000000
#define NUM_THREADS 4
int countarr[128];
int* array;
struct Op_data{
int start_index;
int count_index;
int CHUNK;
int ID;
};
void* count3s(void* data) // you need to parallelize this
{
struct Op_data *info;
info = (struct Op_data *) data;
int count_i = info -> count_index;
int i = info->start_index;
int CHUNK = info -> CHUNK;
printf("Thread data:\nCOUNT_INDEX:\t\t%d\nSTART_INDEX:\t\t%d\nCHUNK_SIZE:\t\t%d\n",count_i,i,CHUNK);
int c = 0;
struct timeval t1, t2;
gettimeofday(&t1, NULL);
for(i;i<i+CHUNK;i++)
{
if(array[i]==3)
{
c++;
}
}
countarr[count_i] = c;
gettimeofday(&t2, NULL);
double t = (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec ) / 1000000.0;
printf("execution time = %f seconds\n",t);
}
int main(int argc, char * argv[])
{
int i = 0;
int ok = 0;
int count=0;
array = malloc(LENGTH * sizeof(int));
// initialize the array randomly. Make sure the number of 3's doesn't exceed 500000
srand(12);
for(i= 0;i<LENGTH;i++)
{
array[i]=rand()%10;
if(array[i]==3)
{
ok++; // keep track of how many 3's are there, we will use this later to confirm the correctness of the code
if(ok>500000)
{
ok=500000;
array[i]=12;
}
}
}
pthread_t threads[NUM_THREADS];
struct Op_data* t_data[NUM_THREADS];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
int rc;
int CHUNK = LENGTH/NUM_THREADS;
for(int t=0;t<NUM_THREADS;t++)
{
t_data[t] = (struct Op_data*) malloc(sizeof(struct Op_data));
t_data[t] -> start_index = t*CHUNK;
t_data[t] -> count_index = t*(128/NUM_THREADS);
t_data[t] -> CHUNK = CHUNK;
t_data[t] -> ID = t;
rc = pthread_create(&threads[t], &attr, count3s, (void *)t_data[t]);
if (rc) {
printf("Error:unable to create thread,%d\n",rc);
exit(-1);
}
printf("Thread (%d) has been created.\n",t);
}
for( int g = 0; g < NUM_THREADS; g++ ) {
rc = pthread_join(threads[g], NULL);
if (rc) {
printf("Error:unable to join,%d\n",rc);
exit(-1);
}
}
pthread_attr_destroy(&attr);
for(int x=0;x<NUM_THREADS;x++)
{
count += countarr[x*(128/NUM_THREADS)];
}
if( ok == count ) // check if the result is correct
{
printf("Correct Result!\n");
printf("Number of 3`s: %d\n_________________________________\n",count);
}
else
{
printf("Wrong Result! Your count:%d\n",count);
printf("The correct number of 3`s is: %d\n",ok);
}
pthread_exit(NULL);
return 0;
}
for(i;i<i+CHUNK;i++)
Since i will always be less than i + CHUNK (unless it overflows), this loop will overrun the bounds of the array.

Segmentation fault: core dumped during execution of multi-threaded program

I have realized that my code was too lengthy and rather hard to read.
Can you check over the way I pass in the arguments and constructing the arguments in the main body?
Essentially, provided that I have correct implementation of "produce" and "consume" functions, I want to pass in a shared circular queue and semaphores and mutexes to each produce/consume threads.
typedef struct circularQueue
{
int *items;
int *head;
int *tail;
int numProduced;
int numConsumed;
} circularQueue;
typedef struct threadArg
{
int id;
circularQueue *queue;
pthread_mutex_t *mutex;
sem_t *spaces;
sem_t *itemAvail;
int numItems;
int bufferSize;
int numProducer;
int numConsumer;
} threadArg;
pthread_t *producerThd;
pthread_t *consumerThd;
int main(int argc, char* argv[])
{
pthread_attr_t attr;
// In fo to pass to thread arg
circularQueue *myQueue;
pthread_mutex_t useSharedMem;
sem_t spaces;
sem_t itemAvail;
int numItems;
int bufferSize;
int numProducer;
int numConsumer;
int i, j, k, l;
if(argc != 5)
{
printf("Enter in 4 arguments - N B P C\n");
return -1;
}
numItems = atoi(argv[1]);
bufferSize = atoi(argv[2]);
numProducer = atoi(argv[3]);
numConsumer = atoi(argv[4]);
if(numItems == 0 || bufferSize == 0 || numProducer == 0 || numConsumer == 0)
{
printf("Parameters should not be 0\n");
return -1;
}
// Initialize list of threads
producerThd = malloc(sizeof(pthread_t) * numProducer);
consumerThd = malloc(sizeof(pthread_t) * numConsumer);
// Initialize semaphores
sem_init(&spaces, 0, bufferSize);
sem_init(&itemAvail, 0, 0);
// Initialize mutex
pthread_mutex_init(&useSharedMem, NULL);
// Initialzie thread attributes
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// Initialize queue
myQueue = (circularQueue*)malloc(sizeof(circularQueue));
myQueue->items = (int*)malloc(sizeof(int)*bufferSize);
myQueue->head = myQueue->items;
myQueue->tail = myQueue->items;
myQueue->numProduced = 0;
myQueue->numConsumed = 0;
// thread arguments
for(i = 0; i < numProducer; i++)
{
// Initialize thraed args
threadArg *args = (threadArg*)malloc(sizeof(threadArg));
args->queue = (circularQueue*)malloc(sizeof(circularQueue));
args->mutex = &useSharedMem;
args->spaces = &spaces;
args->itemAvail = &itemAvail;
args->numItems = numItems;
args->bufferSize = bufferSize;
args->numProducer = numProducer;
args->numConsumer = numConsumer;
args->id = i;
pthread_t thisThread = *(producerThd + i);
pthread_create(&thisThread, &attr, produce, args);
}
for(j = 0; j < numConsumer; j++)
{
// Initialize thraed args
threadArg *args = (threadArg*)malloc(sizeof(threadArg));
args->queue = (circularQueue*)malloc(sizeof(circularQueue));
args->mutex = &useSharedMem;
args->spaces = &spaces;
args->itemAvail = &itemAvail;
args->numItems = numItems;
args->bufferSize = bufferSize;
args->numProducer = numProducer;
args->numConsumer = numConsumer;
args->id = j;
pthread_t thisThread = *(consumerThd + i);
pthread_create(&thisThread, &attr, consume, args);
}
for(k = 0; k < numProducer; k++)
{
pthread_join(*(producerThd+k), NULL);
}
printf("Finished waiting for producers\n");
for(l = 0; l < numConsumer; l++)
{
pthread_join(*(consumerThd+l), NULL);
}
printf("Finished waiting for consumers\n");
free(producerThd);
free(consumerThd);
free(myQueue->items);
free(myQueue);
sem_destroy(&spaces);
sem_destroy(&itemAvail);
fflush(stdout);
return 0;
}
Thank you
There are multiple sources of undefined behavior in your code, you are either compiling without enabling compilation warnings, or what I consider worst you ignore them.
You have the wrong printf() specifier in
printf("cid %d found this item %d as valid item %d\n", myArgs->id, thisItem, validItem);
because validItem is a double, so the last specifier should be %f.
Your thread functions never return a value, but you declare them to return void * which is the signature required for such functions.
You are freeing and dereferencing myQueue in the main() function but you have not initialized it because that code is commented.
Your code is also too hard to read because you have no consistent style and you mix declarations with statements, which make everything very confusing, e.g. determining the scope of a variable is very difficult.
Fixing the code will not only help others read it, but will also help you fix it and find issues quickly.

Where is my message queue producing a segmentation fault?

The message queue simply stops working when dealing with many many threads. It only seems to work okay with 10 threads, for exmaple. GDB tells me
Program received signal SIGSEGV, Segmentation fault.
__GI_____strtol_l_internal (nptr=0x0, endptr=endptr#entry=0x0, base=base#entry=10, group=group#entry=0, loc=0x7ffff78b0060 <_nl_global_locale>)
at ../stdlib/strtol_l.c:298
298 ../stdlib/strtol_l.c: No such file or directory.
But I have no idea what this means. The same code on Windows works fine but on linux it doesn't, which confuses me more.
You can see below how this queue works. It is a singly linked list with locking while receiving messages. Please help me find where I messed up.
typedef struct Message {
unsigned type;
unsigned code;
void *data;
} Message;
typedef struct MessageQueueElement {
Message message;
struct MessageQueueElement *next;
} MessageQueueElement;
typedef struct MessageQueue {
MessageQueueElement *first;
MessageQueueElement *last;
} MessageQueue;
MessageQueue mq;
pthread_mutex_t emptyLock, sendLock;
pthread_cond_t emptyCond;
void init() {
mq.first = malloc(sizeof(MessageQueueElement));
mq.last = mq.first;
pthread_mutex_init(&emptyLock, NULL);
pthread_mutex_init(&sendLock, NULL);
pthread_cond_init(&emptyCond, NULL);
}
void clean() {
free(mq.first);
pthread_mutex_destroy(&emptyLock);
pthread_mutex_destroy(&sendLock);
pthread_cond_destroy(&emptyCond);
}
void sendMessage(MessageQueue *this, Message *message) {
pthread_mutex_lock(&sendLock);
if (this->first == this->last) {
pthread_mutex_lock(&emptyLock);
this->last->message = *message;
this->last = this->last->next = malloc(sizeof(MessageQueueElement));
pthread_cond_signal(&emptyCond);
pthread_mutex_unlock(&emptyLock);
} else {
this->last->message = *message;
this->last = this->last->next = malloc(sizeof(MessageQueueElement));
}
pthread_mutex_unlock(&sendLock);
}
int waitMessage(MessageQueue *this, int (*readMessage)(unsigned, unsigned, void *)) {
pthread_mutex_lock(&emptyLock);
if (this->first == this->last) {
pthread_cond_wait(&emptyCond, &emptyLock);
}
pthread_mutex_unlock(&emptyLock);
int n = readMessage(this->first->message.type, this->first->message.code, this->first->message.data);
MessageQueueElement *temp = this->first;
this->first = this->first->next;
free(temp);
return n;
}
some test code:
#define EXIT_MESSAGE 0
#define THREAD_MESSAGE 1
#define JUST_A_MESSAGE 2
#define EXIT 0
#define CONTINUE 1
int readMessage(unsigned type, unsigned code, void *data) {
if (type == THREAD_MESSAGE) {
printf("message from thread %d: %s\n", code, (char *)data);
free(data);
} else if (type == JUST_A_MESSAGE) {
puts((char *)data);
free(data);
} else if (type == EXIT_MESSAGE) {
puts("ending the program");
return EXIT;
}
return CONTINUE;
}
int nThreads;
int counter = 0;
void *worker(void *p) {
double pi = 0.0;
for (int i = 0; i < 1000000; i += 1) {
pi += (4.0 / (8.0 * i + 1.0) - 2.0 / (8.0 * i + 4.0) - 1.0 / (8.0 * i + 5.0) - 1.0 / (8.0 * i + 6.0)) / pow(16.0, i);
}
char *s = malloc(100);
sprintf(s, "pi equals %.8f", pi);
sendMessage(&mq, &(Message){.type = THREAD_MESSAGE, .code = (int)(intptr_t)p, .data = s});
counter += 1;
char *s2 = malloc(100);
sprintf(s2, "received %d message%s", counter, counter == 1 ? "" : "s");
sendMessage(&mq, &(Message){.type = JUST_A_MESSAGE, .data = s2});
if (counter == nThreads) {
sendMessage(&mq, &(Message){.type = EXIT_MESSAGE});
}
}
int main(int argc, char **argv) {
clock_t timer = clock();
init();
nThreads = atoi(argv[1]);
pthread_t threads[nThreads];
for (int i = 0; i < nThreads; i += 1) {
pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i);
}
while (waitMessage(&mq, readMessage));
for (int i = 0; i < nThreads; i += 1) {
pthread_join(threads[i], NULL);
}
clean();
timer = clock() - timer;
printf("%.2f\n", (double)timer / CLOCKS_PER_SEC);
return 0;
}
--- EDIT ---
Okay I managed to fix the problem by changing the program a bit using semaphores. The waitMessage function doesn't have to be locked since it is accessed by only one thread and the values that it modifies does not clash with sendMessage.
MessageQueue mq;
pthread_mutex_t mutex;
sem_t sem;
void init() {
mq.first = malloc(sizeof(MessageQueueElement));
mq.last = mq.first;
pthread_mutex_init(&mutex, NULL);
sem_init(&sem, 0, 0);
}
void clean() {
free(mq.first);
pthread_mutex_destroy(&mutex);
sem_destroy(&sem);
}
void sendMessage(MessageQueue *this, Message *message) {
pthread_mutex_lock(&mutex);
this->last->message = *message;
this->last = this->last->next = malloc(sizeof(MessageQueueElement));
pthread_mutex_unlock(&mutex);
sem_post(&sem);
}
int waitMessage(MessageQueue *this, int (*readMessage)(unsigned, unsigned, void *)) {
sem_wait(&sem);
int n = readMessage(this->first->message.type, this->first->message.code, this->first->message.data);
MessageQueueElement *temp = this->first;
this->first = this->first->next;
free(temp);
return n;
}
Your waitMessage function is modifying this->first outside of any locking. This is a bad thing.
It's often not worth recreating things that are already provided for you by an OS. You're effectively trying to set up a pipe of Message structures. You could simply use an anonymous pipe instead (see here for Linux, or here for Windows) and write/read Message structures to/from it. There's also POSIX message queues which are probably a bit more efficient.
In your case with multiple worker threads you'd have to have a supplementary mutex semaphore to control which worker is trying to read from the pipe or message queue.

passing struct to pthread as an argument

Ok I am trying to pass pair of numbers through struct to pthread_create function in pthread. But the numbers i am passing and numbers i am getting when the function is called are different and random
Here is the struct
struct Pairs {
long i,j;
};
And inside main
void main()
{
long thread_cmp_count = (long)n*(n-1)/2;
long t,index = 0;
struct Pairs *pair;
pair = malloc(sizeof(struct Pairs));
cmp_thread = malloc(thread_cmp_count*sizeof(pthread_t));
for(thread = 0;(thread < thread_cmp_count); thread++){
for(t = thread+1; t < n; t++){
(*pair).i = thread;
(*pair).j = t;
pthread_create(&cmp_thread[index++], NULL, Compare, (void*) pair);
}
}
for(thread= 0;(thread<thread_cmp_count); thread++){
pthread_join(cmp_thread[thread], NULL);
}
free(cmp_thread);
}
And function Compare
void* Compare(void* pair){
struct Pairs *my_pair = (struct Pairs*)pair;
printf("\nThread %ld, %ld", (*my_pair).i, (*my_pair).j);
return NULL;
}
Number I am getting and it is also random.
Thread 0,2
Thread 1,2
Thread 2,3
Thread 2,3
Thread 2,3
Thread 2,3
am i passing the struct wrong ?
That is because you are passing the same pointer to all pthreads.
When you invoke pthread_create(..., (void*) pair) you are passing the pointer to the new thread, but in the next iteration you are overwriting that memory (potentially before the new thread has extracted those values).
long thread_cmp_count = (long)n*(n-1)/2;
long t,index = 0;
struct Pairs *pair;
cmp_thread = malloc(thread_cmp_count*sizeof(pthread_t));
for(thread = 0;(thread < thread_cmp_count); thread++){
for(t = thread+1; t < n; t++){
// allocate a separate pair for each thread
pair = malloc(sizeof(struct Pairs));
(*pair).i = thread;
(*pair).j = t;
pthread_create(&cmp_thread[index++], NULL, Compare, (void*) pair);
}
}
for(thread= 0;(thread<thread_cmp_count); thread++){
pthread_join(cmp_thread[thread], NULL);
}
free(cmp_thread);
.
void* Compare(void* pair){
struct Pairs *my_pair = (struct Pairs*)pair;
printf("\nThread %ld, %ld", (*my_pair).i, (*my_pair).j);
// free that memory after it has been used
free (pair);
return NULL;
}
Problem solved. Problem was with overlap. using pointer as an array of type struct Pairs it is solved
Here is the correct code
long thread_cmp_count = (long)n*(n-1)/2;
long t,index = 0;
Pair * pair;
pair = malloc(thread_cmp_count*sizeof(Pair));
free(thread_handles);
thread_handles = malloc(thread_cmp_count*sizeof(pthread_t));
for(thread = 0;(thread < n-1); thread++){
for(t = thread+1; t < n; t++){
(pair+index)->i = thread;
(pair+index)->j = t;
pthread_create(&thread_handles[index], NULL, Compare, (void*) (pair+index));
index++;
}
}
for(thread= 0;(thread<thread_cmp_count); thread++){
pthread_join(thread_handles[thread], NULL);
}
free(thread_handles);
And the function Compare
void* Compare(void* pair){
long t,i,j;
Pair *my_pair = (Pair*)pair;
i = my_pair->i;
j = my_pair->j;
printf("\n..................................................................");
if((x_array[i] < x_array[j])&&(x_array[i] != x_array[j])){
w_array[i] = 0;
printf(
"\nThread T(%ld,%ld)"
" compares x[%ld] = %ld and x[%ld] = %ld,"
" and writes 0 to w[%ld]", i, j,
i,x_array[i],
j,x_array[j],
i);
}
else if((x_array[i] > x_array[j])&&(x_array[i] != x_array[j])){
w_array[j] = 0;
printf(
"\nThread T(%ld,%ld)"
" compares x[%ld] = %ld and x[%ld] = %ld,"
" and writes 0 to w[%ld]", i, j,
i,x_array[i],
j,x_array[j],
j);
}
else
return NULL;
return NULL;
}

How to synchronize manager/worker pthreads without a join?

I'm familiar with multithreading and I've developed many multithreaded programs in Java and Objective-C successfully. But I couldn't achieve the following in C using pthreads without using a join from the main thread:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_OF_THREADS 2
struct thread_data {
int start;
int end;
int *arr;
};
void print(int *ints, int n);
void *processArray(void *args);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
ints[i] = i;
}
print(ints, numOfInts); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pthread_t threads[NUM_OF_THREADS];
struct thread_data thread_data[NUM_OF_THREADS];
// these vars are used to calculate the index ranges for each thread
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
thread_data[i].arr = ints;
thread_data[i].start = startRange;
thread_data[i].end = endRange;
pthread_create(&threads[i], NULL, processArray, (void *)&thread_data[i]);
remainingWork -= amountOfWork;
}
// 1. Signal to the threads to start working
// 2. Wait for them to finish
print(ints, numOfInts); // should print [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
free(ints);
return 0;
}
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
// 1. Wait for a signal to start from the main thread
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// 2. Signal to the main thread that you're done
pthread_exit(NULL);
}
void print(int *ints, int n)
{
printf("[");
for (int i = 0; i < n; i++) {
printf("%d", ints[i]);
if (i+1 != n)
printf(", ");
}
printf("]\n");
}
I would like to achieve the following in the above code:
In main():
Signal to the threads to start working.
Wait for the background threads to finish.
In processArray():
Wait for a signal to start from the main thread
Signal to the main thread that you're done
I don't want to use a join in the main thread because in the real application, the main thread will create the threads once, and then it will signal to the background threads to work many times, and I can't let the main thread proceed unless all the background threads have finished working. In the processArray function, I will put an infinite loop as following:
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
while (1)
{
// 1. Wait for a signal to start from the main thread
int *arr = data->arr;
int start = data->start;
int end = data->end;
// Process
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// 2. Signal to the main thread that you're done
}
pthread_exit(NULL);
}
Note that I'm new to C and the posix API, so excuse me if I'm missing something obvious. But I really tried many things, starting from using a mutex, and an array of semaphores, and a mixture of both, but without success. I think a condition variable may help, but I couldn't understand how it could be used.
Thanks for your time.
Problem Solved:
Thank you guys so much! I was finally able to get this to work safely and without using a join by following your tips. Although the solution is somewhat ugly, it gets the job done and the performance gains is worth it (as you'll see below). For anyone interested, this is a simulation of the real application I'm working on, in which the main thread keeps giving work continuously to the background threads:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_OF_THREADS 5
struct thread_data {
int id;
int start;
int end;
int *arr;
};
pthread_mutex_t currentlyIdleMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t currentlyIdleCond = PTHREAD_COND_INITIALIZER;
int currentlyIdle;
pthread_mutex_t workReadyMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t workReadyCond = PTHREAD_COND_INITIALIZER;
int workReady;
pthread_cond_t currentlyWorkingCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t currentlyWorkingMutex= PTHREAD_MUTEX_INITIALIZER;
int currentlyWorking;
pthread_mutex_t canFinishMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t canFinishCond = PTHREAD_COND_INITIALIZER;
int canFinish;
void print(int *ints, int n);
void *processArray(void *args);
int validateResult(int *ints, int num, int start);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
ints[i] = i;
}
// print(ints, numOfInts);
pthread_t threads[NUM_OF_THREADS];
struct thread_data thread_data[NUM_OF_THREADS];
workReady = 0;
canFinish = 0;
currentlyIdle = 0;
currentlyWorking = 0;
// these vars are used to calculate the index ranges for each thread
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
// Create the threads and give each one its data struct.
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
thread_data[i].id = i;
thread_data[i].arr = ints;
thread_data[i].start = startRange;
thread_data[i].end = endRange;
pthread_create(&threads[i], NULL, processArray, (void *)&thread_data[i]);
remainingWork -= amountOfWork;
}
int loops = 1111111;
int expectedStartingValue = ints[0] + loops; // used to validate the results
// The elements in ints[] should be incremented by 1 in each loop
while (loops-- != 0) {
// Make sure all of them are ready
pthread_mutex_lock(&currentlyIdleMutex);
while (currentlyIdle != NUM_OF_THREADS) {
pthread_cond_wait(&currentlyIdleCond, &currentlyIdleMutex);
}
pthread_mutex_unlock(&currentlyIdleMutex);
// All threads are now blocked; it's safe to not lock the mutex.
// Prevent them from finishing before authorized.
canFinish = 0;
// Reset the number of currentlyWorking threads
currentlyWorking = NUM_OF_THREADS;
// Signal to the threads to start
pthread_mutex_lock(&workReadyMutex);
workReady = 1;
pthread_cond_broadcast(&workReadyCond );
pthread_mutex_unlock(&workReadyMutex);
// Wait for them to finish
pthread_mutex_lock(&currentlyWorkingMutex);
while (currentlyWorking != 0) {
pthread_cond_wait(&currentlyWorkingCond, &currentlyWorkingMutex);
}
pthread_mutex_unlock(&currentlyWorkingMutex);
// The threads are now waiting for permission to finish
// Prevent them from starting again
workReady = 0;
currentlyIdle = 0;
// Allow them to finish
pthread_mutex_lock(&canFinishMutex);
canFinish = 1;
pthread_cond_broadcast(&canFinishCond);
pthread_mutex_unlock(&canFinishMutex);
}
// print(ints, numOfInts);
if (validateResult(ints, numOfInts, expectedStartingValue)) {
printf("Result correct.\n");
}
else {
printf("Result invalid.\n");
}
// clean up
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_cancel(threads[i]);
}
free(ints);
return 0;
}
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
while (1) {
// Set yourself as idle and signal to the main thread, when all threads are idle main will start
pthread_mutex_lock(&currentlyIdleMutex);
currentlyIdle++;
pthread_cond_signal(&currentlyIdleCond);
pthread_mutex_unlock(&currentlyIdleMutex);
// wait for work from main
pthread_mutex_lock(&workReadyMutex);
while (!workReady) {
pthread_cond_wait(&workReadyCond , &workReadyMutex);
}
pthread_mutex_unlock(&workReadyMutex);
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// mark yourself as finished and signal to main
pthread_mutex_lock(&currentlyWorkingMutex);
currentlyWorking--;
pthread_cond_signal(&currentlyWorkingCond);
pthread_mutex_unlock(&currentlyWorkingMutex);
// Wait for permission to finish
pthread_mutex_lock(&canFinishMutex);
while (!canFinish) {
pthread_cond_wait(&canFinishCond , &canFinishMutex);
}
pthread_mutex_unlock(&canFinishMutex);
}
pthread_exit(NULL);
}
int validateResult(int *ints, int n, int start)
{
int tmp = start;
for (int i = 0; i < n; i++, tmp++) {
if (ints[i] != tmp) {
return 0;
}
}
return 1;
}
void print(int *ints, int n)
{
printf("[");
for (int i = 0; i < n; i++) {
printf("%d", ints[i]);
if (i+1 != n)
printf(", ");
}
printf("]\n");
}
I'm not sure though if pthread_cancel is enough for clean up! As for the barrier, it would've been of a great help if it wasn't limited to some OSs as mentioned by #Jeremy.
Benchmarks:
I wanted to make sure that these many conditions aren't actually slowing down the algorithm, so I've setup this benchmark to compare the two solutions:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#define NUM_OF_THREADS 5
struct thread_data {
int start;
int end;
int *arr;
};
pthread_mutex_t currentlyIdleMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t currentlyIdleCond = PTHREAD_COND_INITIALIZER;
int currentlyIdle;
pthread_mutex_t workReadyMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t workReadyCond = PTHREAD_COND_INITIALIZER;
int workReady;
pthread_cond_t currentlyWorkingCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t currentlyWorkingMutex= PTHREAD_MUTEX_INITIALIZER;
int currentlyWorking;
pthread_mutex_t canFinishMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t canFinishCond = PTHREAD_COND_INITIALIZER;
int canFinish;
void *processArrayMutex(void *args);
void *processArrayJoin(void *args);
double doItWithMutex(pthread_t *threads, struct thread_data *data, int loops);
double doItWithJoin(pthread_t *threads, struct thread_data *data, int loops);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *join_ints = malloc(numOfInts * sizeof(int));
int *mutex_ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
join_ints[i] = i;
mutex_ints[i] = i;
}
pthread_t join_threads[NUM_OF_THREADS];
pthread_t mutex_threads[NUM_OF_THREADS];
struct thread_data join_thread_data[NUM_OF_THREADS];
struct thread_data mutex_thread_data[NUM_OF_THREADS];
workReady = 0;
canFinish = 0;
currentlyIdle = 0;
currentlyWorking = 0;
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
join_thread_data[i].arr = join_ints;
join_thread_data[i].start = startRange;
join_thread_data[i].end = endRange;
mutex_thread_data[i].arr = mutex_ints;
mutex_thread_data[i].start = startRange;
mutex_thread_data[i].end = endRange;
pthread_create(&mutex_threads[i], NULL, processArrayMutex, (void *)&mutex_thread_data[i]);
remainingWork -= amountOfWork;
}
int numOfBenchmarkTests = 100;
int numberOfLoopsPerTest= 1000;
double join_sum = 0.0, mutex_sum = 0.0;
for (int i = 0; i < numOfBenchmarkTests; i++)
{
double joinTime = doItWithJoin(join_threads, join_thread_data, numberOfLoopsPerTest);
double mutexTime= doItWithMutex(mutex_threads, mutex_thread_data, numberOfLoopsPerTest);
join_sum += joinTime;
mutex_sum+= mutexTime;
}
double join_avg = join_sum / numOfBenchmarkTests;
double mutex_avg= mutex_sum / numOfBenchmarkTests;
printf("Join average : %f\n", join_avg);
printf("Mutex average: %f\n", mutex_avg);
double diff = join_avg - mutex_avg;
if (diff > 0.0)
printf("Mutex is %.0f%% faster.\n", 100 * diff / join_avg);
else if (diff < 0.0)
printf("Join is %.0f%% faster.\n", 100 * diff / mutex_avg);
else
printf("Both have the same performance.");
free(join_ints);
free(mutex_ints);
return 0;
}
// From https://stackoverflow.com/a/2349941/408286
double get_time()
{
struct timeval t;
struct timezone tzp;
gettimeofday(&t, &tzp);
return t.tv_sec + t.tv_usec*1e-6;
}
double doItWithMutex(pthread_t *threads, struct thread_data *data, int num_loops)
{
double start = get_time();
int loops = num_loops;
while (loops-- != 0) {
// Make sure all of them are ready
pthread_mutex_lock(&currentlyIdleMutex);
while (currentlyIdle != NUM_OF_THREADS) {
pthread_cond_wait(&currentlyIdleCond, &currentlyIdleMutex);
}
pthread_mutex_unlock(&currentlyIdleMutex);
// All threads are now blocked; it's safe to not lock the mutex.
// Prevent them from finishing before authorized.
canFinish = 0;
// Reset the number of currentlyWorking threads
currentlyWorking = NUM_OF_THREADS;
// Signal to the threads to start
pthread_mutex_lock(&workReadyMutex);
workReady = 1;
pthread_cond_broadcast(&workReadyCond );
pthread_mutex_unlock(&workReadyMutex);
// Wait for them to finish
pthread_mutex_lock(&currentlyWorkingMutex);
while (currentlyWorking != 0) {
pthread_cond_wait(&currentlyWorkingCond, &currentlyWorkingMutex);
}
pthread_mutex_unlock(&currentlyWorkingMutex);
// The threads are now waiting for permission to finish
// Prevent them from starting again
workReady = 0;
currentlyIdle = 0;
// Allow them to finish
pthread_mutex_lock(&canFinishMutex);
canFinish = 1;
pthread_cond_broadcast(&canFinishCond);
pthread_mutex_unlock(&canFinishMutex);
}
return get_time() - start;
}
double doItWithJoin(pthread_t *threads, struct thread_data *data, int num_loops)
{
double start = get_time();
int loops = num_loops;
while (loops-- != 0) {
// create them
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_create(&threads[i], NULL, processArrayJoin, (void *)&data[i]);
}
// wait
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_join(threads[i], NULL);
}
}
return get_time() - start;
}
void *processArrayMutex(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
while (1) {
// Set yourself as idle and signal to the main thread, when all threads are idle main will start
pthread_mutex_lock(&currentlyIdleMutex);
currentlyIdle++;
pthread_cond_signal(&currentlyIdleCond);
pthread_mutex_unlock(&currentlyIdleMutex);
// wait for work from main
pthread_mutex_lock(&workReadyMutex);
while (!workReady) {
pthread_cond_wait(&workReadyCond , &workReadyMutex);
}
pthread_mutex_unlock(&workReadyMutex);
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// mark yourself as finished and signal to main
pthread_mutex_lock(&currentlyWorkingMutex);
currentlyWorking--;
pthread_cond_signal(&currentlyWorkingCond);
pthread_mutex_unlock(&currentlyWorkingMutex);
// Wait for permission to finish
pthread_mutex_lock(&canFinishMutex);
while (!canFinish) {
pthread_cond_wait(&canFinishCond , &canFinishMutex);
}
pthread_mutex_unlock(&canFinishMutex);
}
pthread_exit(NULL);
}
void *processArrayJoin(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
pthread_exit(NULL);
}
And the output is:
Join average : 0.153074
Mutex average: 0.071588
Mutex is 53% faster.
Thank you again. I really appreciate your help!
There are several synchronization mechanisms you can use (condition variables, for example). I think the simplest would be to use a pthread_barrier to synchronize the the start of the threads.
Assuming that you want all of the threads to 'sync up' on each loop iteration, you can just reuse the barrier. If you need something more flexible, a condition variable might be more appropriate.
When you decide it's time for the thread to wrap up (you haven't indicated how the threads will know to break out of the infinite loop - a simple shared variable might be used for that; the shared variable could be an atomic type or protected with a mutex), the main() thread should use pthread_join() to wait for all the threads to complete.
You need to use a different synchronization technique than join, that's clear.
Unfortunately you have a lot of options. One is a "synchronization barrier", which basically is a thing where each thread that reaches it blocks until they've all reached it (you specify the number of threads in advance). Look at pthread_barrier.
Another is to use a condition-variable/mutex pair (pthread_cond_*). When each thread finishes it takes the mutex, increments a count, signals the condvar. The main thread waits on the condvar until the count reaches the value it expects. The code looks like this:
// thread has finished
mutex_lock
++global_count
// optional optimization: only execute the next line when global_count >= N
cond_signal
mutex_unlock
// main is waiting for N threads to finish
mutex_lock
while (global_count < N) {
cond_wait
}
mutex_unlock
Another is to use a semaphore per thread -- when the thread finishes it posts its own semaphore, and the main thread waits on each semaphore in turn instead of joining each thread in turn.
You also need synchronization to re-start the threads for the next job -- this could be a second synchronization object of the same type as the first, with details changed for the fact that you have 1 poster and N waiters rather than the other way around. Or you could (with care) re-use the same object for both purposes.
If you've tried these things and your code didn't work, maybe ask a new specific question about the code you tried. All of them are adequate to the task.
You are working at the wrong level of abstraction. This problem has been solved already. You are reimplementing a work queue + thread pool.
OpenMP seems like a good fit for your problem. It converts #pragma annotations into threaded code. I believe it would let you express what you're trying to do pretty directly.
Using libdispatch, what you're trying to do would be expressed as a dispatch_apply targeting a concurrent queue. This implicitly waits for all child tasks to complete. Under OS X, it's implemented using a non-portable pthread workqueue interface; under FreeBSD, I believe it manages a group of pthreads directly.
If it is portability concerns driving you to use raw pthreads, don't use pthread barriers. Barriers are an additional extension over and above basic POSIX threads. OS X for example does not support it. For more, see POSIX.
Blocking the main thread till all child threads have completed can be done using a count protected by a condition variable or, even more simply, using a pipe and a blocking read where the number of bytes to read matches the number of threads. Each thread writes one byte on work completion, then sleeps till it gets new work from the main thread. The main thread unblocks once each thread has written its "I'm done!" byte.
Passing work to the child threads can be done using a mutex protecting the work-descriptor and a condition to signal new work. You could use a single array of work descriptors that all threads draw from. On signal, each one tries to grab the mutex. On grabbing the mutex, it would dequeue some work, signal anew if the queue is nonempty, and then process its work, after which it would signal completion to the master thread.
You could reuse this "work queue" to unblock the main thread by enqueueing the results, with the main thread waiting till the result queue length matches the number of threads; the pipe approach is just using a blocking read to do this count for you.
To tell all the threads to start working, it can be as simple as a global integer variable which is initialized to zero, and the threads simply wait until it's non-zero. This way you don't need the while (1) loop in the thread function.
For waiting until they are all done, pthread_join is simplest as it will actually block until the thread it's joining is done. It's also needed to clean up system stuff after the thread (like otherwise the return value from the thread will be stored for the remainder of the program). As you have an array of all pthread_t for the threads, just loop over them one by one. As that part of your program doesn't do anything else, and has to wait until all threads are done, just waiting for them in order is okay.

Resources