How to speed up C mutex?

I have this incorrect code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX 1000

struct TContext {
    const char* Name;
    int* Counter;
    int Mod;
};

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (*counter < MAX) {
        if (*counter % 2 == ctxt->Mod) {
            printf("%d ", (*counter)++);
        }
    }
    pthread_exit(0);
}

int main()
{
    pthread_t t1;
    pthread_t t2;
    int counter = 0;
    struct TContext ctxt1 = {"even", &counter, 0};
    struct TContext ctxt2 = {"odd", &counter, 1};

    pthread_create(&t1, 0, ThreadFunc, &ctxt1);
    pthread_create(&t2, 0, ThreadFunc, &ctxt2);

    pthread_join(t1, 0);
    pthread_join(t2, 0);
    printf("\n");
    return 0;
}
My aim is to synchronize it and get the sequence 0, 1, 2, 3, 4, 5... .
I tried to lock and unlock a mutex in this way:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (*counter < MAX) {
        if (*counter % 2 == ctxt->Mod) {
            pthread_mutex_lock(&mutex);
            printf("%d ", (*counter)++);
            pthread_mutex_unlock(&mutex);
        }
    }
    pthread_exit(0);
}
But it works very slowly (I hit the one-second time limit).
How can I synchronize this code in a more effective way? Or maybe I can optimize the C mutex?

A slightly more traditional way than Chris Hall's is:
pthread_cond_t cv;
pthread_mutex_t lock;

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    pthread_mutex_lock(&lock);
    while (*counter < MAX) {
        if (*counter % 2 == ctxt->Mod) {
            printf("%d ", (*counter)++);
            pthread_cond_broadcast(&cv);
        } else {
            pthread_cond_wait(&cv, &lock);
        }
    }
    pthread_mutex_unlock(&lock);
    pthread_exit(0);
}
and in main:
pthread_mutex_init(&lock, 0);
pthread_cond_init(&cv, 0);
somewhere before creating the threads. This also lets you add an arbitrary number of even + odd threads without interference (although with no speedup; just intellectual curiosity).
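For completeness, a minimal main() for this answer might look like the sketch below; it is just the question's own main() with the two init calls added, reusing the question's struct TContext:

int main()
{
    pthread_t t1;
    pthread_t t2;
    int counter = 0;
    struct TContext ctxt1 = {"even", &counter, 0};
    struct TContext ctxt2 = {"odd", &counter, 1};

    pthread_mutex_init(&lock, 0);   /* must happen before the threads start */
    pthread_cond_init(&cv, 0);

    pthread_create(&t1, 0, ThreadFunc, &ctxt1);
    pthread_create(&t2, 0, ThreadFunc, &ctxt2);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    printf("\n");
    return 0;
}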

I suggest:
void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    volatile int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (1)
    {
        int count;
        count = *counter;          // NB: volatile*
        if (count >= MAX)
            break;
        if ((count % 2) == ctxt->Mod)
        {
            printf("%d ", count);
            *counter = count + 1;
        }
    }
    pthread_exit(0);
}
Which, for x86/x86_64 at least, will have the effect I think you were looking for, namely that the two threads take turns in incrementing the counter.
The really interesting question is why this works :-)
Postscript
The code above depends, critically, on four things:
1. there is only one value being shared between the threads -- the counter;
2. the counter is simultaneously data and control -- the least-significant bit of the counter signals which thread should proceed;
3. reading and writing the counter must be atomic -- so every read of the counter reads the last value written (and not some combination of the previous and current write);
4. the compiler must emit code to actually read/write the counter from/to memory inside the loop.
Now (1) and (2) are specific to this particular problem. (3) is generally true for int (though it may require correct alignment). (4) is achieved by defining the counter as volatile.
So, I originally said that this would work "for x86/x86_64 at least" because I know (3) is true for those devices, but I also believe it is true for many (most ?) common devices.
A cleaner implementation would define the counter _Atomic, as follows:
#include <stdatomic.h>

void* ThreadFunc(void* arg) {
    struct TContext* ctxt = arg;
    atomic_int* counter = ctxt->Counter;
    fprintf(stderr, "This is %s thread\n", ctxt->Name);
    while (1)
    {
        int count;
        count = atomic_load_explicit(counter, memory_order_relaxed);
        if (count > MAX)           // printing up to and including MAX
            break;
        if ((count % 2) == ctxt->Mod)
        {
            printf("%d ", count);
            atomic_store_explicit(counter, count + 1, memory_order_relaxed);
        }
    }
    pthread_exit(0);
}
Which makes (3) and (4) explicit. But note that (1) and (2) still mean that we don't need any memory ordering. Every time each thread reads the counter, bit 0 tells it whether it "owns" the counter. If it does not own the counter, the thread loops to read it again. If it does own the counter, it uses the value and then writes a new value -- and because that passes "ownership", it returns to the read loop (it cannot do anything further with the counter until it "owns" it again). Once MAX+1 has been written to the counter, neither thread will use or change it, so that's safe too.
Brother Employed Russian is correct: there is a "data race" here, but it is resolved by a data dependency particular to this case.
More Generally
The code above is not terribly useful, unless you have other applications with a single shared value. But this can be generalised, using memory_order_acquire and memory_order_release atomic operations.
Suppose we have some struct shared which contains some (non-trivial) amount of data which one thread will produce and another will consume. Suppose we again use atomic_uint counter (initially zero) to manage access to a given struct shared parcel. Now we have a producer thread which:
void* ThreadProducerFunc(void* arg)
{
    atomic_uint* counter = &count;    // somehow -- pointer to the shared counter
    ....
    while (1)
    {
        unsigned count;
        do
            count = atomic_load_explicit(counter, memory_order_acquire);
        while ((count & 1) == 1);     // wait until the counter is even: parcel empty
        ... fill the struct shared parcel, somehow ...
        atomic_store_explicit(counter, count + 1, memory_order_release);
    }
    ....
}
And a consumer thread which:
void* ThreadConsumerFunc(void* arg)
{
    atomic_uint* counter = &count;    // somehow -- pointer to the shared counter
    ....
    while (1)
    {
        unsigned count;
        do
            count = atomic_load_explicit(counter, memory_order_acquire);
        while ((count & 1) == 0);     // wait until the counter is odd: parcel full
        ... empty the struct shared parcel, somehow ...
        atomic_store_explicit(counter, count + 1, memory_order_release);
    }
    ....
}
The load-acquire operations synchronize with the store-release operations, so:
in the producer: the filling of the parcel will not start until the producer has "ownership" (as above), and will then "complete" (writes become visible to the other thread) before the count is updated (and the new value becomes visible to the other thread).
in the consumer: the emptying of the parcel will not start until the consumer has "ownership" (as above), and will then "complete" (all reads will have read from memory) before the count is updated (and the new value becomes visible to the other thread).
Clearly, the two threads are busy waiting for each other. But with two or more parcels and counters, the threads can progress at the speed of the slower.
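For illustration, a rough sketch of the multi-parcel idea (NUM_PARCELS and the parcel payload are my inventions, not part of the answer above): each parcel gets its own counter, and the producer moves round-robin from one parcel to the next, so it can be filling one while the consumer empties another.

#include <stdatomic.h>

#define NUM_PARCELS 2                      /* illustrative */

struct shared { int data[1024]; };         /* illustrative payload */

static struct shared parcels[NUM_PARCELS];
static atomic_uint counters[NUM_PARCELS];  /* all initially zero */

void* MultiParcelProducerFunc(void* arg)
{
    unsigned i;
    for (i = 0; ; i++)
    {
        unsigned p = i % NUM_PARCELS;
        unsigned count;
        do
            count = atomic_load_explicit(&counters[p], memory_order_acquire);
        while ((count & 1) == 1);          /* wait until parcel p is empty */
        /* ... fill parcels[p], somehow ... */
        atomic_store_explicit(&counters[p], count + 1, memory_order_release);
    }
    return 0;
}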
Finally -- x86/x86_64 and acquire/release
With x86/x86_64, all memory reads and writes are implicitly acquire-reads and release-writes. This means that there is zero overhead in atomic_load_explicit(..., memory_order_acquire) and atomic_store_explicit(..., memory_order_release).
Conversely, all read-modify-write operations (and memory_order_seq_cst operations) carry overheads of several tens of clocks -- 30? 50? more if the operation is contended (depending on the device).
So, where performance is critical, it may be worth understanding what's possible (and what isn't).
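To make that difference concrete (my sketch, not part of the answer): on x86/x86_64 a typical compiler emits a plain MOV for the first two functions below, but a LOCK XADD for the third, even with relaxed ordering.

#include <stdatomic.h>

static atomic_int x;

int  load_acq(void)      { return atomic_load_explicit(&x, memory_order_acquire); }         /* plain MOV   */
void store_rel(int v)    { atomic_store_explicit(&x, v, memory_order_release); }             /* plain MOV   */
int  fetch_add_one(void) { return atomic_fetch_add_explicit(&x, 1, memory_order_relaxed); }  /* LOCK XADD   */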

How can I synchronize this code in a more effective way?
You can't: the code is fundamentally inefficient.
The issue is that the amount of work that you do (incrementing an integer) is minuscule compared to the synchronization overhead, so the latter dominates.
To fix the problem, you need to do more work for each lock/unlock pair.
In a real program, you would have each thread perform 1000 or 10000 "work items" for each lock/unlock iteration. Something like:
pthread_mutex_lock(&mutex);
const int start = *ctx->Counter;
*ctx->Counter += N;
pthread_mutex_unlock(&mutex);

for (int j = start; j < start + N; j++)
    /* do work on the j-th iteration here */;
But your toy program isn't amenable to this.
Or maybe I can optimize the C mutex?
I suggest trying to implement a correct mutex first. You'll quickly discover that this is far from trivial.
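To illustrate the point: even the most naive C11 spinlock (a sketch, nowhere near a full mutex) already needs carefully chosen atomics and memory orderings, and it still busy-waits instead of sleeping, has no fairness, no error checking, and no priority handling:

#include <stdatomic.h>

typedef struct { atomic_flag flag; } naive_spinlock;  /* initialise with ATOMIC_FLAG_INIT */

void naive_lock(naive_spinlock *s)
{
    /* acquire ordering: later reads/writes may not move before the lock */
    while (atomic_flag_test_and_set_explicit(&s->flag, memory_order_acquire))
        ;  /* spin; a real mutex would put the thread to sleep instead */
}

void naive_unlock(naive_spinlock *s)
{
    /* release ordering: earlier reads/writes complete before the unlock */
    atomic_flag_clear_explicit(&s->flag, memory_order_release);
}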

Related

Why does my simple counting program take longer to run with multiple threads? (in C)

Here's my code:
#include <pthread.h>
#include <stdlib.h>

#define COUNT_TO 100000000
#define MAX_CORES 4

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

long long i = 0;

void* start_counting(void *arg){
    for(;;){
        pthread_mutex_lock(&mutex);
        if(i >= COUNT_TO){
            pthread_mutex_unlock(&mutex);
            return NULL;
        }
        i++;
        pthread_mutex_unlock(&mutex);
        //printf("i = %lld\n", i);
    }
}

int main(int argc, char* argv[]){
    int i = 0;
    pthread_t *thread_group = malloc(sizeof(pthread_t) * MAX_CORES);
    for(i = 0; i < MAX_CORES; i++){
        pthread_create(&thread_group[i], NULL, start_counting, NULL);
    }
    for(i = 0; i < MAX_CORES; i++){
        pthread_join(thread_group[i], NULL);
    }
    free(thread_group);
    return 0;
}
This is what your threads do:
1. Read the value of i.
2. Increment the value we read.
3. Write back the incremented value of i.
4. Go to step 1.
Clearly, another thread cannot read the value of i after a different thread has accomplished step 1 but before it has completed step 3. So there can be no overlap between two threads doing steps 1, 2, or 3.
So all your threads are fighting over access to the same resource -- i (or the mutex that protects it). No thread can make useful forward progress without exclusive access to one or both of those. Given that, there is no benefit to using multiple threads since only one of them can accomplish useful work at a time.
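By contrast, here is a sketch of a version that can scale (not part of the original answer; the interleaved splitting scheme is illustrative): give each thread a private counter and combine the results only once at the end, so the threads never contend.

#include <pthread.h>

#define COUNT_TO 100000000
#define MAX_CORES 4

static long long partial[MAX_CORES];   /* one slot per thread: nothing shared */

static void* count_share(void *arg)
{
    int id = (int)(long)arg;
    long long local = 0;
    long long n;
    for (n = id; n < COUNT_TO; n += MAX_CORES)
        local++;                       /* stand-in for real per-item work */
    partial[id] = local;               /* each thread writes only its own slot */
    return NULL;
}

/* main would create MAX_CORES threads with (void*)(long)id as the argument,
 * join them, then sum partial[0..MAX_CORES-1] -- the only "shared" step. */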

Consumer/producer program got stuck

Below is the code of an example consumer/producer model:
#include <pthread.h>
#include <stdio.h>

#define MAX 3        /* buffer capacity; the question below assumes MAX = 3 */

int loops = 10;      /* iteration count; not defined in the original snippet */

int buffer[MAX];
int fill_ptr = 0;
int use_ptr = 0;
int count = 3;

void put(int value) {
    buffer[fill_ptr] = value;
    fill_ptr = (fill_ptr + 1) % MAX;
    count++;
}

int get() {
    int tmp = buffer[use_ptr];
    use_ptr = (use_ptr + 1) % MAX;
    count--;
    return tmp;
}

pthread_cond_t empty, fill;
pthread_mutex_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        pthread_mutex_lock(&mutex);            // p1
        while (count == MAX)                   // p2
            pthread_cond_wait(&empty, &mutex); // p3
        put(i);                                // p4
        pthread_cond_signal(&fill);            // p5
        pthread_mutex_unlock(&mutex);          // p6
    }
    return NULL;
}

void* consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        pthread_mutex_lock(&mutex);           // c1
        while (count == 0)                    // c2
            pthread_cond_wait(&fill, &mutex); // c3
        int tmp = get();                      // c4
        pthread_cond_signal(&empty);          // c5
        pthread_mutex_unlock(&mutex);         // c6
        printf("%d\n", tmp);
    }
    return NULL;
}
However, I think there is a problem in here. Assume MAX=3, and initially the buffer is full (count = 3), then the consumer can execute a get() and signal to the producer. After the producer receives a signal, it wakes up and begins to execute put() in buffer[0] with the mutex held.
Assume the producer just got stuck in put(); then the consumer cannot continue either (because the mutex is held by producer) even though there are 2 resources left.
Is my understanding correct? If so, that's unfair because there are 2 resources left which can be consumed.
Is my understanding correct?
Both yes and no.
Yes, it is correct that the consumer will be stuck if the producer calls put and put gets stuck (e.g. by entering an endless loop).
However, you can't assume that there are 2 resources left. pthread_cond_signal does not promise that the producer executes before the consumer has read all 3 elements. All you know is that the consumer read at least one element, but it may have read 2 or even 3 before the producer executes.
If so, that's unfair ....
No, it is not unfair. It is exactly what a mutex is for, i.e. making sure that only one thread has access to the shared resource.
Therefore it is important to make sure that a thread holding a mutex will not get stuck! That is your responsibility as a programmer.
Note: In your case there is nothing inside put that can cause the thread to be stuck.
Assume the producer just got stuck in put(); then the consumer cannot continue either (because the mutex is held by producer) even though there are 2 resources left.
You must never do something that can get stuck while you hold the mutex, and your code does not.
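To make that concrete, a small sketch (slow_operation is a hypothetical blocking call, not from the question): keep anything that can block outside the critical section, exactly as the posted consumer already does with its printf.

/* Risky: everyone waits while this thread blocks inside the lock. */
pthread_mutex_lock(&mutex);
int tmp = get();
slow_operation(tmp);              /* hypothetical blocking call */
pthread_mutex_unlock(&mutex);

/* Better: hold the mutex only around the shared-state access. */
pthread_mutex_lock(&mutex);
int tmp2 = get();
pthread_mutex_unlock(&mutex);
slow_operation(tmp2);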

Changing parts of arrays/structs/.. in threads without blocking the whole thing, in pure c

I want to modify some (not all) fields of an array (or struct) in multiple threads, without blocking the rest of the array while it is being modified in other threads. How is this achieved? I found some answers, but they are for C++ and I want to do it in C.
Here is the code I got so far:
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <semaphore.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define ARRAYLENGTH 5
#define TARGET 10000

int target;

typedef struct zstr{
    int* array;
    int place;
    int run;
    pthread_mutex_t* locks;
}zstr;

void *countup(void *);

int main(int argc, char** args){
    int al;
    if(argc>2){
        al=atoi(args[1]);
        target=atoi(args[2]);
    }else{
        al=ARRAYLENGTH;
        target=TARGET;
    }
    printf("%d %d\n", al, target);
    zstr* t=malloc(sizeof(zstr));
    t->array=calloc(al, sizeof(int));
    t->locks=calloc(al, sizeof(pthread_mutex_t));
    int* rua=calloc(al, sizeof(int));
    pthread_t id[4*al];
    for(int i=0; i<al; i++)
        pthread_mutex_init(&(t->locks[i]), NULL);
    for(int j=0; j<4*al; j++){
        int st=j%al;
        t->run=rua[st]++;
        t->place=st;
        pthread_create(&id[j], NULL, &countup, t);
    }
    for(int k=0; k<4*al; k++){
        pthread_join(id[k], NULL);
    }
    for(int u=0; u<al; u++)
        printf("%d\n", t->array[u]);
    free(rua);
    free(t->locks);
    free(t->array);
    return 0;
}

void *countup(void* table){
    zstr* nu=table;
    if(!nu->run){
        pthread_mutex_lock(nu->locks + nu->place);
    }else{
        pthread_mutex_trylock(nu->locks + nu->place);
    }
    while(nu->array[nu->place]<target)
        nu->array[nu->place]++;
    pthread_mutex_unlock(nu->locks + nu->place);
    return NULL;
}
Sometimes this works just fine, but sometimes it calculates wrong values, and for quite short problems (like the default values) it takes super long (strangely, it worked once when I handed them in as parameters).
There isn't anything special about part of an array or structure. What matters is that the mutex or other synchronization you apply to a given value is used correctly.
In this case, it seems like you're not checking your locking function results.
The design of the countup function only allows a single thread to ever access the object, running the value all the way up to target before releasing the lock, but you don't check the trylock result.
So what's probably happening is the first thread gets the lock, and subsequent threads on the same mutex call trylock and fail to get the lock, but the code doesn't check the result. Then you get multiple threads incrementing the same value without synchronization. Given all the pointer dereferences the index and increment operations are not guaranteed to be atomic, leading to problems where the values grow well beyond target.
The moral of the story is to check function results and handle errors.
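For example, a sketch of the missing check, using the question's own names: act on the trylock result instead of falling through.

if (pthread_mutex_trylock(nu->locks + nu->place) != 0) {
    /* the slot is already owned by another thread: don't touch it */
    return NULL;
}
while (nu->array[nu->place] < target)
    nu->array[nu->place]++;
pthread_mutex_unlock(nu->locks + nu->place);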
Sorry, don't have enough reputation to comment, yet.
Adding to Brad's comment about not checking the result of pthread_mutex_trylock, there's a misconception that shows up many times with Pthreads:
You assume that the thread started by pthread_create will run immediately and that the values passed (here, the pointer t to your struct) and its contents are read atomically. That is not true. The thread might start any time later, and may find the contents, like t->run and t->place, already changed by the next iteration of the j-loop in main.
Moreover, you might want to read David Butenhof's book "Programming with Posix Threads" (old, but still a good reference) and check on synchronization and condition variables.
It's not that good style to start that many threads in the first place ;)
As this has come up a few times and might come up again, I have restructured that a bit to issue work_items to the started threads. The code below might be amended with a function that maps an index into array to always the same area_lock, or by adding a queue to feed the running threads further work-items...
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>

/*
 * Macros for default values. To make it more interesting, set:
 * ARRAYLENGTH != THREADS
 * INCREMENTS != TARGET
 * NUM_AREAS != THREADS
 * Please note that NUM_AREAS must be <= ARRAYLENGTH.
 */
#define ARRAYLENGTH 10
#define TARGET 100
#define INCREMENTS 10
#define NUM_AREAS 2
#define THREADS 5

/* These variables are initialized once in main, then only read... */
int array_len;
int target;
int num_areas;
int threads;
int increments;

/**
 * A long array that is going to be equally split into a number of areas.
 * Each area is covered by a lock. The number of areas does not have to
 * equal the length of the array, but must be smaller...
 */
typedef struct shared_array {
    int * array;
    int num_areas;
    pthread_mutex_t * area_locks;
} shared_array;

/**
 * A work-item a thread is assigned to upon startup (or later on).
 * A value of { 0, any } might then signal the ending of this thread.
 * The thread is working on an index within zstr->array, counting up increments
 * (or up until the target is reached).
 */
typedef struct work_item {
    shared_array * zstr;
    int work_on_index;
    int increments;
} work_item;

/* Local function declarations */
void * countup(void *);

int main(int argc, char * argv[]) {
    int i;
    shared_array * zstr;
    if (argc == 1) {
        array_len = ARRAYLENGTH;
        target = TARGET;
        num_areas = NUM_AREAS;
        threads = THREADS;
        increments = INCREMENTS;
    } else if (argc == 6) {
        array_len = atoi(argv[1]);
        target = atoi(argv[2]);
        num_areas = atoi(argv[3]);
        threads = atoi(argv[4]);
        increments = atoi(argv[5]);
    } else {
        fprintf(stderr, "USAGE: %s len target areas threads increments", argv[0]);
        exit(-1);
    }
    assert(array_len >= num_areas);
    zstr = malloc(sizeof (shared_array));
    zstr->array = calloc(array_len, sizeof (int));
    zstr->num_areas = num_areas;
    zstr->area_locks = calloc(num_areas, sizeof (pthread_mutex_t));
    for (i = 0; i < num_areas; i++)
        pthread_mutex_init(&(zstr->area_locks[i]), NULL);

    pthread_t * id = calloc(threads, sizeof (pthread_t));
    work_item * work_items = calloc(threads, sizeof (work_item));
    for (i = 0; i < threads; i++) {
        work_items[i].zstr = zstr;
        work_items[i].work_on_index = i % array_len;
        work_items[i].increments = increments;
        pthread_create(&(id[i]), NULL, &countup, &(work_items[i]));
    }
    // Let's just do this one work-item.
    for (i = 0; i < threads; i++) {
        pthread_join(id[i], NULL);
    }
    printf("Array: ");
    for (i = 0; i < array_len; i++)
        printf("%d ", zstr->array[i]);
    printf("\n");
    free(id);
    free(work_items);
    free(zstr->area_locks);
    free(zstr->array);
    return 0;
}

void *countup(void* first_work_item) {
    work_item * wi = first_work_item;
    int inc;
    // Extract the information from this work-item.
    int idx = wi->work_on_index;
    int area = idx % wi->zstr->num_areas;
    pthread_mutex_t * lock = &(wi->zstr->area_locks[area]);
    pthread_mutex_lock(lock);
    for (inc = wi->increments; inc > 0 && wi->zstr->array[idx] < target; inc--)
        wi->zstr->array[idx]++;
    pthread_mutex_unlock(lock);
    return NULL;
}

Multithreaded search in c

I'm supposed to have two threads that search for the minimum element in an array: the first one searches the first half, and the second thread searches the other half. However, when I run my code, it seems that it chooses a thread randomly. I'm not sure what I'm doing wrong, but it probably has to do with the "mid" part. I tried dividing an array into two, finding the midpoint and then writing the conditions from there, but I probably went wrong somewhere. I also tried putting array[i] in the conditions, but in that case only thread2 executes.
EDIT: I'm really trying my best here, but I'm not getting anywhere. I edited the code in a way that made sense to me, and I probably typecast "min" wrong, but now it doesn't even execute; it just gives me an error, even though it compiles just fine. I'm just a beginner, and while I do understand everything you guys are talking about, I have a hard time implementing the ideas, so really, any help with fixing this is appreciated!
EDIT2: Okay, so the previous code made no sense at all; I do apologize, but I was exhausted while writing it. Anyway, I came up with something else that works partially! I split the array into two halves; however, only the first element is accessible through each pointer. Would it work if the whole array was being accessed, and if so, how can I fix that?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

#define size 20

void *smallest(void *arg);

pthread_t th, th2;
int array[size], i, min;

int main(int argc, char *argv[]) {
    srand ( time(NULL) );
    for(i = 0; i < size; i++)
    {
        array[i] = (rand() % 100)+1;
        printf("%d ", array[i]);
    }
    int *array1 = malloc(10 * sizeof(int));
    int *array2 = malloc(10 * sizeof(int));
    memcpy(array1, array, 10 * sizeof(int));
    memcpy(array2, array + 10, 10 * sizeof(int));
    printf("\nFirst half gives %d \n", *array1);
    printf("Second half gives %d \n", *array2);
    pthread_create(&th, NULL, smallest, (void*) array1);
    pthread_create(&th2, NULL, smallest, (void*) array2);
    pthread_join(th, NULL);
    pthread_join(th2, NULL);
    //printf("\nFirst half gives %d\n", array1);
    //printf("Second half gives %d\n", array2);
    if (*array1 < *array2) {
        printf("\nThread1 finds the number\n");
        printf("The smallest element is %i\n", *array1);
    }
    else {
        printf("\nThread2 finds the number\n");
        printf("The smallest element is %i\n", *array2);
    }
    return 0;
}

void *smallest(void* arg){
    int *array = (int*)arg;
    min = array[0];
    for (i = 0; i < size; i++) {
        if (array[i] < min) {
            min = array[i];
        }
    }
    pthread_exit(NULL);
}
The code you've set up never runs more than one thread. Notice that if you run the first branch of the if statement, you fire off one thread to search half the array, wait for it to finish, then continue onward, and if the else branch executes, the same thing happens in the second half of the array. Fundamentally, you probably want to rethink your strategy here by having the code always launch two threads and join each of them only after both threads have started running.
The condition within your if statement also seems like it's mistaken. You're asking whether the middle element of the array is greater than its index. I assume this isn't what you're trying to do.
Finally, the code you have in each thread always looks at the entire array, not just a half of it. I would recommend rewriting the thread routine so that its argument represents the start and end indices of the range to take the minimum of. You would then update the code in main so that when you fire off the thread, you specify which range to search.
I would structure things like this:
1. Fire off a thread to find the minimum of the first half of the array.
2. Fire off a thread to find the minimum of the second half of the array.
3. Join both threads.
4. Use the results from each thread to find the minimum.
As one final note, since you'll have two different threads each running at the same time, you'll need to watch for data races as both threads try to read or write the minimum value. Consider having each thread use its exit code to signal where the minimum is and then resolving the true minimum back in main. This eliminates the race condition. Alternatively, have one global minimum value, but guard it with a mutex.
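A sketch of that last idea (struct range and the variable names are mine, not from the question): each thread hands its result back through pthread_join, so no global is shared.

struct range { int *begin; int *end; };   /* illustrative argument type */

void *smallest(void *arg) {
    struct range *r = arg;
    int *min = r->begin;
    for (int *p = r->begin; p < r->end; p++)
        if (*p < *min)
            min = p;
    return min;                 /* retrieved by pthread_join in main */
}

In main, you would pass { array, array + size/2 } and { array + size/2, array + size } to the two threads, collect both results with pthread_join(th, &result), and compare the two pointed-to ints.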
1) You're redeclaring the global variables in the main function, so there's actually no point in declaring i, low, high, min:
int array[size], i, low, high, min;
The problem you're having is with the scope of the variables: when you redeclare them in the main function, the global ones with the same name become "invisible":
int *low = array;
int *high = array + (size/2);
int mid = (*low + *high) / 2;
So when you run the threads, all the values of your variables (low, high, min) are 0; this is because they are never actually modified by main, and because they start at 0 by default (startup code, etc.).
Anyway, I wouldn't really recommend using global variables (it's really frowned upon) unless it's a really small project for personal use.
2) Another crucial problem is that you're ignoring the main idea behind threads, which is running both simultaneously:
if (array[mid] > mid) {
    pthread_create(&th, NULL, &smallest, NULL);
    pthread_join(th, NULL);
    printf("\nThread1 finds the number\n");
}
else if (array[mid] < mid) {
    pthread_create(&th2, NULL, &smallest, NULL);
    pthread_join(th2, NULL);
    printf("\nThread2 finds the number\n");
}
You're actually only running one thread when executing.
Try something like this:
pthread_create(&th, NULL, &smallest, NULL);
pthread_create(&th2, NULL, &smallest, NULL);
pthread_join(th2, NULL);
pthread_join(th, NULL);
3) You're trying to have two threads access the same variable; this can result in undefined behaviour. You MUST use a mutex to avoid a value not actually being stored.
This guide is pretty complete regarding mutexes, but if you need any help please let me know.
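A minimal sketch of that mutex usage (the variable names are mine): each thread computes a local minimum first and takes the lock only for the final comparison.

#include <limits.h>
#include <pthread.h>

pthread_mutex_t min_lock = PTHREAD_MUTEX_INITIALIZER;
int global_min = INT_MAX;

void update_min(int local_min) {
    pthread_mutex_lock(&min_lock);     /* serialize only the final update */
    if (local_min < global_min)
        global_min = local_min;
    pthread_mutex_unlock(&min_lock);
}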
This is a single threaded version of what you are asking.
#include <stdio.h>
#include <stdlib.h>

/*
 * I can not run pthread on my system.
 * So this is some code that should kind of work the same way.
 */
typedef int pthread_t;
typedef int pthread_attr_t;
typedef void*(*threadfunc)(void*);

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void*), void *arg)
{
    start_routine(arg);
    return 0;
}

int pthread_join(pthread_t thread, void **value_ptr)
{
    return 0;
}

typedef struct context
{
    int* begin;
    int* end;
    int* result;
} context;

//the function has to be castable to the threadfunc type
//that way you do not have to worry about casting the argument.
//be careful though - if something does not match, these errors may be hard to track
void * smallest(context * c) //signature needed for start routine
{
    c->result = c->begin;
    for (int* current = c->begin; current < c->end; ++current)
    {
        if (*current < *c->result)
        {
            c->result = current;
        }
    }
    return 0; // not needed with the way the argument is set up.
}

int main(int argc, char *argv[])
{
    pthread_t t1, t2;
#define size 20
    int array[size];
    srand(0);
    for (int i = 0; i < size; ++i)
    {
        array[i] = (rand() % 100) + 1;
        printf("%d ", array[i]);
    }
    //prepare data
    //one surefire way of messing up in multithreading is sharing data between threads.
    //even a simple approach like storing in a variable who is accessing will not solve the issues
    //to properly lock data you would have to dive into the memory model.
    //either lock with mutexes or memory barriers or just don't share data between threads.
    context c1;
    context c2;
    c1.begin = array;
    c1.end = array + (size / 2);
    c2.begin = c1.end;      // the second half starts where the first half ends
    c2.end = array + size;
    //start threads - here your threads would go
    //note the casting - you may want to wrap this in its own function
    //there is error potential here, especially due to maintenance etc...
    pthread_create(&t1, 0, (void*(*)(void*))smallest, &c1); //without typedef
    pthread_create(&t2, 0, (threadfunc)smallest, &c2);      //with typedef
    pthread_join(t1, 0); //instead of zero you could have a return value here
    pthread_join(t2, 0); //as far as I read, 0 throws the return value away
    //return value could be useful for error handling
    //evaluate
    if (*c1.result < *c2.result)
    {
        printf("\nThread1 finds the number\n");
        printf("The smallest element is %i\n", *c1.result);
    }
    else
    {
        printf("\nThread2 finds the number\n");
        printf("The smallest element is %i\n", *c2.result);
    }
    return 0;
}
Edit:
I edited some stubs in to give you an idea of how to use multithreading.
I never used pthread but this should likely work.
I used this source for prototype information.

Compute the summation of a given interval using multiple threads

For my homework, I need to compute the squares of integers in the interval (0,N) (e.g. (0,50)) in such a way that the load is distributed equally among threads (e.g. 5 threads). I have been advised to use small chunks from the interval and assign them to threads. For that, I am using a queue. Here's my code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

#define QUEUE_SIZE 50

typedef struct {
    int q[QUEUE_SIZE];
    int first,last;
    int count;
} queue;

void init_queue(queue *q)
{
    q->first = 0;
    q->last = QUEUE_SIZE - 1;
    q->count = 0;
}

void enqueue(queue *q,int x)
{
    q->last = (q->last + 1) % QUEUE_SIZE;
    q->q[ q->last ] = x;
    q->count = q->count + 1;
}

int dequeue(queue *q)
{
    int x = q->q[ q->first ];
    q->first = (q->first + 1) % QUEUE_SIZE;
    q->count = q->count - 1;
    return x;
}

queue q; //declare the queue data structure

void* threadFunc(void* data)
{
    int my_data = (int)(long)data; /* data received by thread */
    int sum=0, tmp;
    while (q.count)
    {
        tmp = dequeue(&q);
        sum = sum + tmp*tmp;
        usleep(1);
    }
    printf("SUM = %d\n", sum);
    printf("Hello from new thread %lu - I was created in iteration %d\n",
           (unsigned long)pthread_self(), my_data);
    pthread_exit(NULL); /* terminate the thread */
}

int main(int argc, char* argv[])
{
    init_queue(&q);
    int i;
    for (i=0; i<50; i++)
    {
        enqueue(&q, i);
    }
    pthread_t *tid = malloc(5 * sizeof(pthread_t) );
    int rc; //return value
    for(i=0; i<5; i++)
    {
        rc = pthread_create(&tid[i], NULL, threadFunc, (void*)(long)i);
        if(rc) /* could not create thread */
        {
            printf("\n ERROR: return code from pthread_create is %d \n", rc);
            return(-1);
        }
    }
    for(i=0; i<5; i++)
    {
        pthread_join(tid[i], NULL);
    }
    return 0;
}
The output is not always correct. Most of the time it is correct (40425), but sometimes the value is bigger. Is it because the threads run in parallel and access the queue at the same time? (The processor on my laptop is an Intel i7.) I would appreciate feedback on my concerns.
I think that, contrary to what some of the other people here suggested, you don't need any synchronization primitives like semaphores or mutexes at all. Something like this:
Given some array like
int values[50];
I'd create a couple of threads (say: 5), each of which gets a pointer to a struct with the offset into the values array and the number of squares to compute, like
typedef struct ThreadArgs {
    int *values;
    size_t numSquares;
} ThreadArgs;
You can then start your threads, each of which is told to process 10 numbers:
for ( i = 0; i < 5; ++i ) {
    ThreadArgs *args = malloc( sizeof( ThreadArgs ) );
    args->values = values + 10 * i;
    args->numSquares = 10;
    pthread_create( ...., threadFunc, args );
}
Each thread then simply computes the squares it was assigned, like:
void *threadFunc( void *data )
{
    ThreadArgs *args = data;
    int i;
    for ( i = 0; i < args->numSquares; ++i ) {
        args->values[i] = args->values[i] * args->values[i];
    }
    free( args );
    return NULL;
}
At the end, you'd just use a pthread_join to wait for all threads to finish, after which you have your squares in the values array.
All your threads read from the same queue. This leads to a race condition. For instance, if the number 10 were read simultaneously by two threads, your result would be offset by 100. You should protect your queue with a mutex. Put the following print in the dequeue function to see which numbers are repeated:
printf("Dequeuing %d in thread %lu\n", x, (unsigned long)pthread_self());
Your code doesn't show where the results are accumulated to a single variable. You should protect that variable with a mutex as well.
Alternatively, you can pass the start number as an input parameter to each thread from the loop, so that each thread works on its own set of numbers. The first thread will work on 1-10, the second on 11-20, and so on. In this approach, you have to use a mutex only in the part where the threads update the global sum variable at the end of their execution.
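A sketch of that second approach (the names and the fixed range size of 10 are illustrative): each thread squares its own ten numbers locally and takes the mutex exactly once, at the end.

pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;
long long total = 0;

void* sum_range(void* arg)
{
    int start = (int)(long)arg;        /* first number in this thread's range */
    long long local = 0;
    int n;
    for (n = start; n < start + 10; n++)
        local += (long long)n * n;     /* all work done on private data */
    pthread_mutex_lock(&sum_lock);     /* shared state touched exactly once */
    total += local;
    pthread_mutex_unlock(&sum_lock);
    return NULL;
}

/* main: for (i = 0; i < 5; i++)
 *           pthread_create(&tid[i], NULL, sum_range, (void*)(long)(i * 10)); */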
First you need to define what it means for the load to be "distributed equally among threads."
If you mean that each thread does the same amount of work as the other threads, then I would create a single queue, put all the numbers in the queue, and start all the threads (which run the same code). Each thread tries to get a value from the queue (which must be protected by a mutex unless it is thread safe), calculates the partial answer from the value taken from the queue, and adds the result to the total, which must also be protected by a mutex.
If you mean that each thread will execute an equal number of times as each of the other threads, then you need to make a priority queue and put all the numbers in the queue along with the thread number that should compute on it. Each thread then tries to get a value from the queue that matches its thread number.
From the thread's point of view, it should try to get a value from the queue, do the work, then try to get another value. If there are no more values to get, the thread should exit. The main program joins all threads and exits when all threads have exited.
