Incorrect output of code after using pthreads - c

I am working in some code trying to make it use threads, with little success. The code is as follows (using some online tuts)
(1) Create an array to save arguments to pass to each thread.
(2) The struct where thread args are stored
(3) Function each thread executes
(4) Main, where I create each pthread and pass args. end and begin are dummy i and i+1 just for testing purposes.
#include <pthread.h>
struct thread_data* thread_data_array;
struct thread_data{
int thread_id;
int begin;
int end;
};
void *proccessData(void *threadarg)
{
struct thread_data *my_data = (struct thread_data *) threadarg;
int id = my_data->thread_id;
int start = my_data->begin;
int end = my_data->end;
printf("%s%d%s%d%s%d%s","Process: Id ",id," begin: ",start," end: ",end,"\n");
// Init
FILE* fileOut;
// do stuff
fileOut = fopen(someNameUsingId, "w");
// more stuff
fclose(fileOut);
}
int main(int argc, char **argv)
{
int n_th = atoi(argv[1]);
thread_data_array = malloc(n_th*sizeof(struct thread_data));
pthread_t threads[n_th];
int i;
for (i=0; i<n_th; i++)
{
thread_data_array[i].thread_id = i;
thread_data_array[i].begin = i;
thread_data_array[i].end = i+1;
pthread_create(&threads[i], NULL, proccessData,(void *) &thread_data_array[i]);
}
}
What I am getting: Sometimes nothing is printed, Sometimes some ids are printed twice. Files are not created (one in every 5 lets say, files are created, and are empty)
But using this in main works as I am expecting, id 1 from 0 to 1 is printed, id 2 from 1 to 2, and both files are created and have the right content
thread_data_array[0].thread_id = 1;
thread_data_array[0].begin = 0;
thread_data_array[0].end = 1;
proccessData((void *) &thread_data_array[0]);
thread_data_array[1].thread_id = 2;
thread_data_array[1].begin = 1;
thread_data_array[1].end = 2;
proccessData((void *) &thread_data_array[1]);
Can someone point out what am I doing wrong, and how to solve it?
Thanks in advance

Your main is terminating before the other threads. main is special, as returning from it is equivalent to call exit.
Either use pthread_exit to end main or use pthread_join to wait for the termination of the other threads.

pthread_join will solve the issue, call it for each created thread in main. pthread_join() function is called to wait for the threads to complete. If main is done before threads are done, they will die before they finished their job.

Related

Changing parts of arrays/structs/.. in threads without blocking the whole thing, in pure c

I want to modify some (not all) fields of an array (or structs) in multiple threads, with out blocking the rest of the array as the rest of it is being modified in other threads. How is this achieved? I found some answers, but they are for C++ and I want to do it in C.
Here is the code I got so far:
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <semaphore.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#define ARRAYLENGTH 5
#define TARGET 10000
int target;
typedef struct zstr{
int* array;
int place;
int run;
pthread_mutex_t* locks;
}zstr;
void *countup(void *);
int main(int argc, char** args){
int al;
if(argc>2){
al=atoi(args[1]);
target=atoi(args[2]);
}else{
al=ARRAYLENGTH;
target=TARGET;
}
printf("%d %d\n", al, target);
zstr* t=malloc(sizeof(zstr));
t->array=calloc(al, sizeof(int));
t->locks=calloc(al, sizeof(pthread_mutex_t));
int* rua=calloc(al, sizeof(int));
pthread_t id[4*al];
for(int i=0; i<al; i++)
pthread_mutex_init(&(t->locks[i]), NULL);
for(int j=0; j<4*al; j++){
int st=j%al;
t->run=rua[st]++;
t->place=st;
pthread_create(&id[j], NULL, &countup, t);
}
for(int k=0; k<4*al; k++){
pthread_join(id[k], NULL);
}
for(int u=0; u<al; u++)
printf("%d\n", t->array[u]);
free(rua);
free(t->locks);
free(t->array);
return 0;
}
void *countup(void* table){
zstr* nu=table;
if(!nu->run){
pthread_mutex_lock(nu->locks + nu->place);
}else{
pthread_mutex_trylock(nu->locks + nu->place);
}
while(nu->array[nu->place]<target)
nu->array[nu->place]++;
pthread_mutex_unlock(nu->locks + nu->place);
return NULL;
}
Sometimes this works just fine, but then calculates wrong values and for quiet sort problems (like the default values), it takes super long (strangely it worked once when I handed them in as parameters).
There isn't anything special about part of an array or structure. What matters is that the mutex or other synchronization you apply to a given value is used correctly.
In this case, it seems like you're not checking your locking function results.
The design of the countup function only allows a single thread to ever access the object, running the value all the way up to target before releasing the lock, but you don't check the trylock result.
So what's probably happening is the first thread gets the lock, and subsequent threads on the same mutex call trylock and fail to get the lock, but the code doesn't check the result. Then you get multiple threads incrementing the same value without synchronization. Given all the pointer dereferences the index and increment operations are not guaranteed to be atomic, leading to problems where the values grow well beyond target.
The moral of the story is to check function results and handle errors.
Sorry, don't have enough reputation to comment, yet.
Adding to Brad's comment of not checking the result of pthread_mutex_trylock, there's a misconception that shows many times with Pthreads:
You assume, that pthread_create will start immediately, and receive the values passed (here pointer t to your struct) and it's content read atomically. That is not true. The thread might start any time later and will find the contents, like t->run and t->place already changed by the next iteration of the j-loop in main.
Moreover, you might want to read David Butenhof's book "Programming with Posix Threads" (old, but still a good reference) and check on synchronization and condition variables.
It's not that good style to start that many threads in the first place ;)
As this has come up a few times and might come up again, I have restructured that a bit to issue work_items to the started threads. The code below might be amended by a function, that maps the index into array to always the same area_lock, or by adding a queue to feed the running threads with further work-item...
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>
/*
* Macros for default values. To make it more interesting, set:
* ARRAYLENGTH != THREADS
* INCREMENTS != TARGET
* NUM_AREAS != THREADS
* Please note, that NUM_AREAS must be <= ARRAY_LENGTH.
*/
#define ARRAYLENGTH 10
#define TARGET 100
#define INCREMENTS 10
#define NUM_AREAS 2
#define THREADS 5
/* These variables are initialized once in main, then only read... */
int array_len;
int target;
int num_areas;
int threads;
int increments;
/**
* A long array that is going to be equally split into number of areas.
* Each area is covered by a lock. The number of areas do not have to
* equal the length of the array, but must be smaller...
*/
typedef struct shared_array {
int * array;
int num_areas;
pthread_mutex_t * area_locks;
} shared_array;
/**
* A work-item a thread is assigned to upon startup (or later on).
* Then a value of { 0, any } might signal the ending of this thread.
* The thread is working on index within zstr->array, counting up increments
* (or up until the target is reached).
*/
typedef struct work_item {
shared_array * zstr;
int work_on_index;
int increments;
} work_item;
/* Local function declarations */
void * countup(void *);
int main(int argc, char * argv[]) {
int i;
shared_array * zstr;
if (argc == 1) {
array_len = ARRAYLENGTH;
target = TARGET;
num_areas = NUM_AREAS;
threads = THREADS;
increments = INCREMENTS;
} else if (argc == 6) {
array_len = atoi(argv[1]);
target = atoi(argv[2]);
num_areas = atoi(argv[3]);
threads = atoi(argv[4]);
increments = atoi(argv[5]);
} else {
fprintf(stderr, "USAGE: %s len target areas threads increments", argv[0]);
exit(-1);
}
assert(array_len >= num_areas);
zstr = malloc(sizeof (shared_array));
zstr->array = calloc(array_len, sizeof (int));
zstr->num_areas = num_areas;
zstr->area_locks = calloc(num_areas, sizeof (pthread_mutex_t));
for (i = 0; i < num_areas; i++)
pthread_mutex_init(&(zstr->area_locks[i]), NULL);
pthread_t * id = calloc(threads, sizeof (pthread_t));
work_item * work_items = calloc(threads, sizeof (work_item));
for (i = 0; i < threads; i++) {
work_items[i].zstr = zstr;
work_items[i].work_on_index = i % array_len;
work_items[i].increments = increments;
pthread_create(&(id[i]), NULL, &countup, &(work_items[i]));
}
// Let's just do this one work-item.
for (i = 0; i < threads; i++) {
pthread_join(id[i], NULL);
}
printf("Array: ");
for (i = 0; i < array_len; i++)
printf("%d ", zstr->array[i]);
printf("\n");
free(id);
free(work_items);
free(zstr->area_locks);
free(zstr->array);
return 0;
}
void *countup(void* first_work_item) {
work_item * wi = first_work_item;
int inc;
// Extract the information from this work-item.
int idx = wi->work_on_index;
int area = idx % wi->zstr->num_areas;
pthread_mutex_t * lock = &(wi->zstr->area_locks[area]);
pthread_mutex_lock(lock);
for (inc = wi->increments; inc > 0 && wi->zstr->array[idx] < target; inc--)
wi->zstr->array[idx]++;
pthread_mutex_unlock(lock);
return NULL;
}

Compute the summation of a given interval using multiple threads

For my homework, I need to compute the squares of integers in the interval (0,N) (e.g. (0,50) in a way that the load is distributed equally among threads (e.g. 5 threads). I have been advised to use small chunks from the interval and assign it to the thread. For that, I am using a queue. Here's my code:
#include <stdio.h>
#include <pthread.h>
#define QUEUE_SIZE 50
typedef struct {
int q[QUEUE_SIZE];
int first,last;
int count;
} queue;
void init_queue(queue *q)
{
q->first = 0;
q->last = QUEUE_SIZE - 1;
q->count = 0;
}
void enqueue(queue *q,int x)
{
q->last = (q->last + 1) % QUEUE_SIZE;
q->q[ q->last ] = x;
q->count = q->count + 1;
}
int dequeue(queue *q)
{
int x = q->q[ q->first ];
q->first = (q->first + 1) % QUEUE_SIZE;
q->count = q->count - 1;
return x;
}
queue q; //declare the queue data structure
void* threadFunc(void* data)
{
int my_data = (int)data; /* data received by thread */
int sum=0, tmp;
while (q.count)
{
tmp = dequeue(&q);
sum = sum + tmp*tmp;
usleep(1);
}
printf("SUM = %d\n", sum);
printf("Hello from new thread %u - I was created in iteration %d\n",pthread_self(), my_data);
pthread_exit(NULL); /* terminate the thread */
}
int main(int argc, char* argv[])
{
init_queue(&q);
int i;
for (i=0; i<50; i++)
{
enqueue(&q, i);
}
pthread_t *tid = malloc(5 * sizeof(pthread_t) );
int rc; //return value
for(i=0; i<5; i++)
{
rc = pthread_create(&tid[i], NULL, threadFunc, (void*)i);
if(rc) /* could not create thread */
{
printf("\n ERROR: return code from pthread_create is %u \n", rc);
return(-1);
}
}
for(i=0; i<5; i++)
{
pthread_join(tid[i], NULL);
}
}
The output is not always correct. Most of the time it is correct, 40425, but sometimes, the value is bigger. Is it because of the threads are running in parallel and accessing the queue at the same time (the processor on my laptop is is intel i7)? I would appreciate the feedback on my concerns.
I think contrary to what some of the other people here suggested, you don't need any synchronization primitives like semaphores or mutexes at all. Something like this:
Given some array like
int values[50];
I'd create a couple of threads (say: 5), each of which getting a pointer to a struct with the offset into the values array and a number of squares to compute, like
typedef struct ThreadArgs {
int *values;
size_t numSquares;
} ThreadArgs;
You can then start your threads, each of which being told to process 10 numbers:
for ( i = 0; i < 5; ++i ) {
ThreadArgs *args = malloc( sizeof( ThreadArgs ) );
args->values = values + 10 * i;
args->numSquares = 10;
pthread_create( ...., threadFunc, args );
}
Each thread then simply computes the squares it was assigned, like:
void *threadFunc( void *data )
{
ThreadArgs *args = data;
int i;
for ( i = 0; i < args->numSquares; ++i ) {
args->values[i] = args->values[i] * args->values[i];
}
free( args );
}
At the end, you'd just use a pthread_join to wait for all threads to finish, after which you have your squares in the values array.
All your threads read from the same queue. This leads to a race condition. For instance, if the number 10 was read simultaneously by two threads, your result will be offset by 100. You should protect your queue with a mutex. Put the following print in deque function to know which numbers are repeated:
printf("Dequeing %d in thread %d\n", x, pthread_self());
Your code doesn't show where the results are accumulated to a single variable. You should protect that variable with a mutex as well.
Alternatively, you can pass the start number as the input parameter to each thread from the loop so that each thread can work on its set of numbers. First thread will work on 1-10, the second one on 11-20 and so on. In this approach, you have to use mutex only the part where the threads update the global sum variable at the end of their execution.
First you need to define what it means to be "distributed equally among threads." If you mean that each thread does the same amount of work as the other threads, then I would create a single queue, put all the numbers in the queue, and start all threads (which are the same code.) Each thread tries to get a value from the queue which must be protected by a mutex unless it is thread safe, calculates the partial answer from the value taken from the thread, and adds the result to the total which must also be protected by a mutex. If you mean that each thread will execute an equal amount of times as each of the other threads, then you need to make a priority queue and put all the numbers in the queue along with the thread number that should compute on it. Each thread then tries to get a value from the queue that matches its thread number. From the thread point of view, it should try to get a value from the queue, do the work, then try to get another value. If there are no more values to get, then the thread should exit. The main program does a join on all threads and the program exits when all threads have exited.

passing struct to pthread in c not working. Attempt to access passed struct results in Error: expression must have struct or union type

I am writing a multi-threaded program that reads in a file of matrices and creates threads that multiply each row of 2 matrices together. I managed to get everything to work properly using only a single thread, but when I tried to create multiple threads simultaneously, I ran into problems. First, I am storing the result matrices in a global struct containing an array of struct that contains an array of matrix values. (ex. struct matrixArray.array -> struct matrix.array). When my function that was being called by pthread_create ran, my global matrixArray lost all of its data. So, I tried to pass the matrixArray into the function called by pthread_create. However, I keep getting the error: expression must have struct or union type. This is when I try to access the struct inside the function being called, despite the fact I am declaring it as struct matrixArray.
Here are my structures:
typedef struct matrix{
int rows;
int cols;
int multRow; // The "MULTIPLIED ROW" This is for determing which row the current thread needs to use for multiplication. This only applies for Matrix A in each set.
int size;
int set; // This is for which set the matrix belongs to.
char letter; // This is for labeling the matrices A B and C
int * array;
unsigned int * threadID; // Array containing the thread ids that are used to create the result
} matrix;
typedef struct matrixArray{
int size;
matrix * array;
} matrixArray;
Here is the main function:
int main(int argc, char *argv[])
{
pthread_t * tid; /* the thread identifier */
pthread_attr_t attr; /* set of attributes for the thread */
int i; // Counter
int aIndex; // Index of the current 'A' matrix being multiplied.
int rows,cols;
// Checke to make sure we have the correct number of arguments supplied
// when running the program.
if(argc < 1){
printf("Error: You did not provide the correct number of arguments.\n\n");
return 0;
}
// Read the file and create the matrices
readFile();
// Initialize the result matrix before we start creating threads that use it.
mtxResults = newMatrixArray();
// Get the default attributes
pthread_attr_init(&attr);
// Set the current set to be mutliplied to 1
currentSet = 1;
// Create a new matrixArray to pass to the threads
//struct matrixArray *mtxPassed = malloc(sizeof(struct matrixArray));
struct matrixArray *mtxPassed = newMatrixArray();
memcpy(mtxPassed, &mtxResults, sizeof(struct matrixArray));
// Allocate size of tid array based on number of threads
tid = malloc(threads * sizeof(pthread_t));
// Create the threads.
for(i = 0; i < threads; i++){
//pthread_create(&tid[i], &attr, runner, argv[1]);
pthread_create(&tid[i], &attr, runner, mtxPassed);
// Increment currentSet when the current row evalutated
// in the current set is equal to the total number of rows available.
aIndex = ((currentSet * 2) - 2);
if(mtx.array[aIndex].multRow == mtx.array[aIndex].rows){
currentSet++;
}
}
// Wait for threads to finish
for(i = 0; i < threads; i++){
pthread_join(tid[i], NULL);
}
// Print the matrices
//printMatrices();
} // End of main()
And here is the function runner, being called by pthread_create:
// The thread will begin control in this function
void *runner(void *param)
{
struct matrixArray *mtxA = newMatrixArray();
mtxA = (struct matrixArray *)param;
printf("mtxPassed.size = %i\n",mtxA.size);
// Do the matrix multiplication for a single row
matrixMultiply(currentSet, (unsigned int)pthread_self(), mtxA);
pthread_exit(0);
}
Originally, the error was occurring when I passed the mtxA structure to matrixMultiply function, but I have since tried accessing it inside runner directly to see if it worked there. It still is not.
I get the error at "printf("mtxPassed.size = %I\n",mtxA.size);" Instead of the first 2 lines of runner, I have also tried "struct matrixArray *mtxA = (struct matrixArray *)param;" That did not work either.
Here is the error:
[nsltg2#lewis assign2]$ cc -pthread -lpthread assign2.c
assign2.c(602): error: expression must have struct or union type
printf("mtxPassed.size = %i\n",mtxPassed.size);
^
I would really appreciate any help you can offer. Thank you so much!

How to synchronize manager/worker pthreads without a join?

I'm familiar with multithreading and I've developed many multithreaded programs in Java and Objective-C successfully. But I couldn't achieve the following in C using pthreads without using a join from the main thread:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_OF_THREADS 2
struct thread_data {
int start;
int end;
int *arr;
};
void print(int *ints, int n);
void *processArray(void *args);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
ints[i] = i;
}
print(ints, numOfInts); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pthread_t threads[NUM_OF_THREADS];
struct thread_data thread_data[NUM_OF_THREADS];
// these vars are used to calculate the index ranges for each thread
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
thread_data[i].arr = ints;
thread_data[i].start = startRange;
thread_data[i].end = endRange;
pthread_create(&threads[i], NULL, processArray, (void *)&thread_data[i]);
remainingWork -= amountOfWork;
}
// 1. Signal to the threads to start working
// 2. Wait for them to finish
print(ints, numOfInts); // should print [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
free(ints);
return 0;
}
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
// 1. Wait for a signal to start from the main thread
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// 2. Signal to the main thread that you're done
pthread_exit(NULL);
}
void print(int *ints, int n)
{
printf("[");
for (int i = 0; i < n; i++) {
printf("%d", ints[i]);
if (i+1 != n)
printf(", ");
}
printf("]\n");
}
I would like to achieve the following in the above code:
In main():
Signal to the threads to start working.
Wait for the background threads to finish.
In processArray():
Wait for a signal to start from the main thread
Signal to the main thread that you're done
I don't want to use a join in the main thread because in the real application, the main thread will create the threads once, and then it will signal to the background threads to work many times, and I can't let the main thread proceed unless all the background threads have finished working. In the processArray function, I will put an infinite loop as following:
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
while (1)
{
// 1. Wait for a signal to start from the main thread
int *arr = data->arr;
int start = data->start;
int end = data->end;
// Process
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// 2. Signal to the main thread that you're done
}
pthread_exit(NULL);
}
Note that I'm new to C and the posix API, so excuse me if I'm missing something obvious. But I really tried many things, starting from using a mutex, and an array of semaphores, and a mixture of both, but without success. I think a condition variable may help, but I couldn't understand how it could be used.
Thanks for your time.
Problem Solved:
Thank you guys so much! I was finally able to get this to work safely and without using a join by following your tips. Although the solution is somewhat ugly, it gets the job done and the performance gains is worth it (as you'll see below). For anyone interested, this is a simulation of the real application I'm working on, in which the main thread keeps giving work continuously to the background threads:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_OF_THREADS 5
struct thread_data {
int id;
int start;
int end;
int *arr;
};
pthread_mutex_t currentlyIdleMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t currentlyIdleCond = PTHREAD_COND_INITIALIZER;
int currentlyIdle;
pthread_mutex_t workReadyMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t workReadyCond = PTHREAD_COND_INITIALIZER;
int workReady;
pthread_cond_t currentlyWorkingCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t currentlyWorkingMutex= PTHREAD_MUTEX_INITIALIZER;
int currentlyWorking;
pthread_mutex_t canFinishMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t canFinishCond = PTHREAD_COND_INITIALIZER;
int canFinish;
void print(int *ints, int n);
void *processArray(void *args);
int validateResult(int *ints, int num, int start);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
ints[i] = i;
}
// print(ints, numOfInts);
pthread_t threads[NUM_OF_THREADS];
struct thread_data thread_data[NUM_OF_THREADS];
workReady = 0;
canFinish = 0;
currentlyIdle = 0;
currentlyWorking = 0;
// these vars are used to calculate the index ranges for each thread
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
// Create the threads and give each one its data struct.
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
thread_data[i].id = i;
thread_data[i].arr = ints;
thread_data[i].start = startRange;
thread_data[i].end = endRange;
pthread_create(&threads[i], NULL, processArray, (void *)&thread_data[i]);
remainingWork -= amountOfWork;
}
int loops = 1111111;
int expectedStartingValue = ints[0] + loops; // used to validate the results
// The elements in ints[] should be incremented by 1 in each loop
while (loops-- != 0) {
// Make sure all of them are ready
pthread_mutex_lock(&currentlyIdleMutex);
while (currentlyIdle != NUM_OF_THREADS) {
pthread_cond_wait(&currentlyIdleCond, &currentlyIdleMutex);
}
pthread_mutex_unlock(&currentlyIdleMutex);
// All threads are now blocked; it's safe to not lock the mutex.
// Prevent them from finishing before authorized.
canFinish = 0;
// Reset the number of currentlyWorking threads
currentlyWorking = NUM_OF_THREADS;
// Signal to the threads to start
pthread_mutex_lock(&workReadyMutex);
workReady = 1;
pthread_cond_broadcast(&workReadyCond );
pthread_mutex_unlock(&workReadyMutex);
// Wait for them to finish
pthread_mutex_lock(&currentlyWorkingMutex);
while (currentlyWorking != 0) {
pthread_cond_wait(&currentlyWorkingCond, &currentlyWorkingMutex);
}
pthread_mutex_unlock(&currentlyWorkingMutex);
// The threads are now waiting for permission to finish
// Prevent them from starting again
workReady = 0;
currentlyIdle = 0;
// Allow them to finish
pthread_mutex_lock(&canFinishMutex);
canFinish = 1;
pthread_cond_broadcast(&canFinishCond);
pthread_mutex_unlock(&canFinishMutex);
}
// print(ints, numOfInts);
if (validateResult(ints, numOfInts, expectedStartingValue)) {
printf("Result correct.\n");
}
else {
printf("Result invalid.\n");
}
// clean up
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_cancel(threads[i]);
}
free(ints);
return 0;
}
void *processArray(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
while (1) {
// Set yourself as idle and signal to the main thread, when all threads are idle main will start
pthread_mutex_lock(&currentlyIdleMutex);
currentlyIdle++;
pthread_cond_signal(&currentlyIdleCond);
pthread_mutex_unlock(&currentlyIdleMutex);
// wait for work from main
pthread_mutex_lock(&workReadyMutex);
while (!workReady) {
pthread_cond_wait(&workReadyCond , &workReadyMutex);
}
pthread_mutex_unlock(&workReadyMutex);
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// mark yourself as finished and signal to main
pthread_mutex_lock(&currentlyWorkingMutex);
currentlyWorking--;
pthread_cond_signal(&currentlyWorkingCond);
pthread_mutex_unlock(&currentlyWorkingMutex);
// Wait for permission to finish
pthread_mutex_lock(&canFinishMutex);
while (!canFinish) {
pthread_cond_wait(&canFinishCond , &canFinishMutex);
}
pthread_mutex_unlock(&canFinishMutex);
}
pthread_exit(NULL);
}
int validateResult(int *ints, int n, int start)
{
int tmp = start;
for (int i = 0; i < n; i++, tmp++) {
if (ints[i] != tmp) {
return 0;
}
}
return 1;
}
void print(int *ints, int n)
{
printf("[");
for (int i = 0; i < n; i++) {
printf("%d", ints[i]);
if (i+1 != n)
printf(", ");
}
printf("]\n");
}
I'm not sure though if pthread_cancel is enough for clean up! As for the barrier, it would've been of a great help if it wasn't limited to some OSs as mentioned by #Jeremy.
Benchmarks:
I wanted to make sure that these many conditions aren't actually slowing down the algorithm, so I've setup this benchmark to compare the two solutions:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#define NUM_OF_THREADS 5
struct thread_data {
int start;
int end;
int *arr;
};
pthread_mutex_t currentlyIdleMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t currentlyIdleCond = PTHREAD_COND_INITIALIZER;
int currentlyIdle;
pthread_mutex_t workReadyMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t workReadyCond = PTHREAD_COND_INITIALIZER;
int workReady;
pthread_cond_t currentlyWorkingCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t currentlyWorkingMutex= PTHREAD_MUTEX_INITIALIZER;
int currentlyWorking;
pthread_mutex_t canFinishMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t canFinishCond = PTHREAD_COND_INITIALIZER;
int canFinish;
void *processArrayMutex(void *args);
void *processArrayJoin(void *args);
double doItWithMutex(pthread_t *threads, struct thread_data *data, int loops);
double doItWithJoin(pthread_t *threads, struct thread_data *data, int loops);
int main(int argc, const char * argv[])
{
int numOfInts = 10;
int *join_ints = malloc(numOfInts * sizeof(int));
int *mutex_ints = malloc(numOfInts * sizeof(int));
for (int i = 0; i < numOfInts; i++) {
join_ints[i] = i;
mutex_ints[i] = i;
}
pthread_t join_threads[NUM_OF_THREADS];
pthread_t mutex_threads[NUM_OF_THREADS];
struct thread_data join_thread_data[NUM_OF_THREADS];
struct thread_data mutex_thread_data[NUM_OF_THREADS];
workReady = 0;
canFinish = 0;
currentlyIdle = 0;
currentlyWorking = 0;
int remainingWork = numOfInts, amountOfWork;
int startRange, endRange = -1;
for (int i = 0; i < NUM_OF_THREADS; i++) {
amountOfWork = remainingWork / (NUM_OF_THREADS - i);
startRange = endRange + 1;
endRange = startRange + amountOfWork - 1;
join_thread_data[i].arr = join_ints;
join_thread_data[i].start = startRange;
join_thread_data[i].end = endRange;
mutex_thread_data[i].arr = mutex_ints;
mutex_thread_data[i].start = startRange;
mutex_thread_data[i].end = endRange;
pthread_create(&mutex_threads[i], NULL, processArrayMutex, (void *)&mutex_thread_data[i]);
remainingWork -= amountOfWork;
}
int numOfBenchmarkTests = 100;
int numberOfLoopsPerTest= 1000;
double join_sum = 0.0, mutex_sum = 0.0;
for (int i = 0; i < numOfBenchmarkTests; i++)
{
double joinTime = doItWithJoin(join_threads, join_thread_data, numberOfLoopsPerTest);
double mutexTime= doItWithMutex(mutex_threads, mutex_thread_data, numberOfLoopsPerTest);
join_sum += joinTime;
mutex_sum+= mutexTime;
}
double join_avg = join_sum / numOfBenchmarkTests;
double mutex_avg= mutex_sum / numOfBenchmarkTests;
printf("Join average : %f\n", join_avg);
printf("Mutex average: %f\n", mutex_avg);
double diff = join_avg - mutex_avg;
if (diff > 0.0)
printf("Mutex is %.0f%% faster.\n", 100 * diff / join_avg);
else if (diff < 0.0)
printf("Join is %.0f%% faster.\n", 100 * diff / mutex_avg);
else
printf("Both have the same performance.");
free(join_ints);
free(mutex_ints);
return 0;
}
// From https://stackoverflow.com/a/2349941/408286
double get_time()
{
struct timeval t;
struct timezone tzp;
gettimeofday(&t, &tzp);
return t.tv_sec + t.tv_usec*1e-6;
}
double doItWithMutex(pthread_t *threads, struct thread_data *data, int num_loops)
{
double start = get_time();
int loops = num_loops;
while (loops-- != 0) {
// Make sure all of them are ready
pthread_mutex_lock(&currentlyIdleMutex);
while (currentlyIdle != NUM_OF_THREADS) {
pthread_cond_wait(&currentlyIdleCond, &currentlyIdleMutex);
}
pthread_mutex_unlock(&currentlyIdleMutex);
// All threads are now blocked; it's safe to not lock the mutex.
// Prevent them from finishing before authorized.
canFinish = 0;
// Reset the number of currentlyWorking threads
currentlyWorking = NUM_OF_THREADS;
// Signal to the threads to start
pthread_mutex_lock(&workReadyMutex);
workReady = 1;
pthread_cond_broadcast(&workReadyCond );
pthread_mutex_unlock(&workReadyMutex);
// Wait for them to finish
pthread_mutex_lock(&currentlyWorkingMutex);
while (currentlyWorking != 0) {
pthread_cond_wait(&currentlyWorkingCond, &currentlyWorkingMutex);
}
pthread_mutex_unlock(&currentlyWorkingMutex);
// The threads are now waiting for permission to finish
// Prevent them from starting again
workReady = 0;
currentlyIdle = 0;
// Allow them to finish
pthread_mutex_lock(&canFinishMutex);
canFinish = 1;
pthread_cond_broadcast(&canFinishCond);
pthread_mutex_unlock(&canFinishMutex);
}
return get_time() - start;
}
double doItWithJoin(pthread_t *threads, struct thread_data *data, int num_loops)
{
double start = get_time();
int loops = num_loops;
while (loops-- != 0) {
// create them
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_create(&threads[i], NULL, processArrayJoin, (void *)&data[i]);
}
// wait
for (int i = 0; i < NUM_OF_THREADS; i++) {
pthread_join(threads[i], NULL);
}
}
return get_time() - start;
}
void *processArrayMutex(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
while (1) {
// Set yourself as idle and signal to the main thread, when all threads are idle main will start
pthread_mutex_lock(&currentlyIdleMutex);
currentlyIdle++;
pthread_cond_signal(&currentlyIdleCond);
pthread_mutex_unlock(&currentlyIdleMutex);
// wait for work from main
pthread_mutex_lock(&workReadyMutex);
while (!workReady) {
pthread_cond_wait(&workReadyCond , &workReadyMutex);
}
pthread_mutex_unlock(&workReadyMutex);
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
// mark yourself as finished and signal to main
pthread_mutex_lock(&currentlyWorkingMutex);
currentlyWorking--;
pthread_cond_signal(&currentlyWorkingCond);
pthread_mutex_unlock(&currentlyWorkingMutex);
// Wait for permission to finish
pthread_mutex_lock(&canFinishMutex);
while (!canFinish) {
pthread_cond_wait(&canFinishCond , &canFinishMutex);
}
pthread_mutex_unlock(&canFinishMutex);
}
pthread_exit(NULL);
}
void *processArrayJoin(void *args)
{
struct thread_data *data = (struct thread_data *)args;
int *arr = data->arr;
int start = data->start;
int end = data->end;
// Do the work
for (int i = start; i <= end; i++) {
arr[i] = arr[i] + 1;
}
pthread_exit(NULL);
}
And the output is:
Join average : 0.153074
Mutex average: 0.071588
Mutex is 53% faster.
Thank you again. I really appreciate your help!
There are several synchronization mechanisms you can use (condition variables, for example). I think the simplest would be to use a pthread_barrier to synchronize the the start of the threads.
Assuming that you want all of the threads to 'sync up' on each loop iteration, you can just reuse the barrier. If you need something more flexible, a condition variable might be more appropriate.
When you decide it's time for the thread to wrap up (you haven't indicated how the threads will know to break out of the infinite loop - a simple shared variable might be used for that; the shared variable could be an atomic type or protected with a mutex), the main() thread should use pthread_join() to wait for all the threads to complete.
You need to use a different synchronization technique than join, that's clear.
Unfortunately you have a lot of options. One is a "synchronization barrier", which basically is a thing where each thread that reaches it blocks until they've all reached it (you specify the number of threads in advance). Look at pthread_barrier.
Another is to use a condition-variable/mutex pair (pthread_cond_*). When each thread finishes it takes the mutex, increments a count, signals the condvar. The main thread waits on the condvar until the count reaches the value it expects. The code looks like this:
// thread has finished
mutex_lock
++global_count
// optional optimization: only execute the next line when global_count >= N
cond_signal
mutex_unlock
// main is waiting for N threads to finish
mutex_lock
while (global_count < N) {
cond_wait
}
mutex_unlock
Another is to use a semaphore per thread -- when the thread finishes it posts its own semaphore, and the main thread waits on each semaphore in turn instead of joining each thread in turn.
You also need synchronization to re-start the threads for the next job -- this could be a second synchronization object of the same type as the first, with details changed for the fact that you have 1 poster and N waiters rather than the other way around. Or you could (with care) re-use the same object for both purposes.
If you've tried these things and your code didn't work, maybe ask a new specific question about the code you tried. All of them are adequate to the task.
You are working at the wrong level of abstraction. This problem has been solved already. You are reimplementing a work queue + thread pool.
OpenMP seems like a good fit for your problem. It converts #pragma annotations into threaded code. I believe it would let you express what you're trying to do pretty directly.
Using libdispatch, what you're trying to do would be expressed as a dispatch_apply targeting a concurrent queue. This implicitly waits for all child tasks to complete. Under OS X, it's implemented using a non-portable pthread workqueue interface; under FreeBSD, I believe it manages a group of pthreads directly.
If it is portability concerns driving you to use raw pthreads, don't use pthread barriers. Barriers are an additional extension over and above basic POSIX threads. OS X for example does not support it. For more, see POSIX.
Blocking the main thread till all child threads have completed can be done using a count protected by a condition variable or, even more simply, using a pipe and a blocking read where the number of bytes to read matches the number of threads. Each thread writes one byte on work completion, then sleeps till it gets new work from the main thread. The main thread unblocks once each thread has written its "I'm done!" byte.
Passing work to the child threads can be done using a mutex protecting the work-descriptor and a condition to signal new work. You could use a single array of work descriptors that all threads draw from. On signal, each one tries to grab the mutex. On grabbing the mutex, it would dequeue some work, signal anew if the queue is nonempty, and then process its work, after which it would signal completion to the master thread.
You could reuse this "work queue" to unblock the main thread by enqueueing the results, with the main thread waiting till the result queue length matches the number of threads; the pipe approach is just using a blocking read to do this count for you.
To tell all the threads to start working, it can be as simple as a global integer variable which is initialized to zero, and the threads simply wait until it's non-zero. This way you don't need the while (1) loop in the thread function.
For waiting until they are all done, pthread_join is simplest as it will actually block until the thread it's joining is done. It's also needed to clean up system stuff after the thread (like otherwise the return value from the thread will be stored for the remainder of the program). As you have an array of all pthread_t for the threads, just loop over them one by one. As that part of your program doesn't do anything else, and has to wait until all threads are done, just waiting for them in order is okay.

Passing Data to Multi Threads

I study this code from some book:
#include <pthread.h>
#include <stdio.h>
/* Parameters to print_function. */
struct char_print_parms {
/* The character to print. */
char character;
/* The number of times to print it. */
int count;
};
/* Prints a number of characters to stderr, as given by PARAMETERS,
which is a pointer to a struct char_print_parms. */
void* char_print(void* parameters) {
/* Cast the cookie pointer to the right type. */
struct char_print_parms* p = (struct char_print_parms*) parameters;
int i;
for (i = 0; i < p->count; ++i)
fputc(p->character, stderr);
return NULL;
}
/* The main program. */
int main() {
pthread_t thread1_id;
pthread_t thread2_id;
struct char_print_parms thread1_args;
struct char_print_parms thread2_args;
/* Create a new thread to print 30,000 ’x’s. */
thread1_args.character = 'x';
thread1_args.count = 30000;
pthread_create(&thread1_id, NULL, &char_print, &thread1_args);
/* Create a new thread to print 20,000 o’s. */
thread2_args.character = 'o';
thread2_args.count = 20000;
pthread_create(&thread2_id, NULL, &char_print, &thread2_args);
usleep(20);
return 0;
}
after running this code, I see different result each time. and some time corrupted result. what is wrong and what the correct way to do that?
Add:
pthread_join( thread1_id, NULL);
pthread_join( thread2_id, NULL);
to the bottom of your code, before the return in main. Your prosess ends before your threads can complete. A sleep of 20 micro seconds is not enough to let your threads complete executing. Safer to wait for the threads to return.

Resources