This is the implementation of the sloppy counter from OSTEP, which I am currently reading, and there are a few things that I don't understand.
The following code is assumed to run on a 4-core CPU:
typedef struct __counter_t {
int global; // global count
pthread_mutex_t glock; // global lock
int local[NUMCPUS]; // local count (per cpu)
pthread_mutex_t llock[NUMCPUS]; // ... and locks
int threshold; // update frequency
}counter_t;
// init: record threshold, init locks, init values
// of all local counts and global count
void init(counter_t* c, int threshold) {
c->threshold = threshold;
c->global = 0;
pthread_mutex_init(&c->glock, NULL);
int i;
for (i = 0; i < NUMCPUS; i++) {
c->local[i] = 0;
pthread_mutex_init(&c->llock[i], NULL);
}
}
// update: usually, just grab local lock and update local amount
// once local count has risen by ‘threshold’, grab global
// lock and transfer local values to it
void update(counter_t* c, int threadID, int amt) {
pthread_mutex_lock(&c->llock[threadID]);
c->local[threadID] += amt; // assumes amt > 0
if (c->local[threadID] >= c->threshold) { // transfer to global
pthread_mutex_lock(&c->glock);
c->global += c->local[threadID];
pthread_mutex_unlock(&c->glock);
c->local[threadID] = 0;
}
pthread_mutex_unlock(&c->llock[threadID]);
}
// get: just return global amount (which may not be perfect)
int get(counter_t* c) {
pthread_mutex_lock(&c->glock);
int val = c->global;
pthread_mutex_unlock(&c->glock);
return val; // only approximate!
}
Why must there be a lock for each local counter in __counter_t? The thread's ID is passed to update() as an argument, so doesn't that mean only one thread can ever access local[threadID]? Even if a context switch happens, the other thread would only access the local[] element that corresponds to its own threadID. I don't understand why a thread must take a lock before accessing its own element of local[], since no other thread accesses that element and no other thread calls update() with the same threadID.
Why must there be a lock for each local counter
To quote the book
In addition to these counters, there are also locks: one for each local counter1, and one for the global counter.
What does the 1 mean? At the end of the page:
1 We need the local locks because we assume there may be more than one thread on each core. If, instead, only one thread ran on each core, no local lock would be needed.
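To make the footnote concrete, here is a minimal sketch (not the book's code) in which eight threads share four local slots, so two threads update each slot. The names sloppy_demo, worker and THRESHOLD, and the mapping of thread i to slot i % NUMCPUS, are my own assumptions for illustration. With the per-slot lock in place the final total is exact; if you remove the llock calls, the two threads on a slot race on local_count[slot] and increments can be lost.

```c
#include <assert.h>
#include <pthread.h>

#define NUMCPUS 4      /* core count assumed in the question */
#define THRESHOLD 64   /* arbitrary value picked for this sketch */

static int global_count;
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;
static int local_count[NUMCPUS];
static pthread_mutex_t llock[NUMCPUS];
static int iterations;  /* increments performed by each thread */

/* Same shape as update() in the question, indexed by slot. */
static void update(int slot, int amt) {
    pthread_mutex_lock(&llock[slot]);
    local_count[slot] += amt;
    if (local_count[slot] >= THRESHOLD) {   /* transfer to global */
        pthread_mutex_lock(&glock);
        global_count += local_count[slot];
        pthread_mutex_unlock(&glock);
        local_count[slot] = 0;
    }
    pthread_mutex_unlock(&llock[slot]);
}

static void *worker(void *arg) {
    int slot = *(int *)arg;
    for (int i = 0; i < iterations; i++)
        update(slot, 1);
    return NULL;
}

/* Runs 2 * NUMCPUS threads, so every local slot is shared by two threads.
   Returns the global count plus whatever is still parked in the slots. */
int sloppy_demo(int iters) {
    enum { NTHREADS = 2 * NUMCPUS };
    pthread_t tid[NTHREADS];
    int slot[NTHREADS];

    global_count = 0;
    iterations = iters;
    for (int i = 0; i < NUMCPUS; i++) {
        local_count[i] = 0;
        pthread_mutex_init(&llock[i], NULL);
    }
    for (int i = 0; i < NTHREADS; i++) {
        slot[i] = i % NUMCPUS;              /* two threads per slot */
        int rc = pthread_create(&tid[i], NULL, worker, &slot[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    int total = global_count;
    for (int i = 0; i < NUMCPUS; i++) {
        total += local_count[i];
        pthread_mutex_destroy(&llock[i]);
    }
    return total;
}
```

Calling sloppy_demo(10000) from main should return 80000 (8 threads times 10000 increments), because every update to a shared slot is serialized by that slot's lock.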
I wrote a simple C program in which every thread multiplies its index by 1000000 and adds the result to sum. I created 5 threads, so the expected answer is (0+1+2+3+4)*1000000, which is 10000000, but the program prints 14000000 instead. Could anyone help me understand this?
#include<pthread.h>
#include<stdio.h>
typedef struct argument {
int index;
int sum;
} arg;
void *fonction(void *arg0) {
((arg *) arg0) -> sum += ((arg *) arg0) -> index * 1000000;
return NULL;
}
int main() {
pthread_t thread[5];
int order[5];
arg a;
for (int i = 0; i < 5; i++)
order[i] = i;
a.sum = 0;
for (int i = 0; i < 5; i++) {
a.index = order[i];
pthread_create(&thread[i], NULL, fonction, &a);
}
for (int i = 0; i < 5; i++)
pthread_join(thread[i], NULL);
printf("%d\n", a.sum);
return 0;
}
The result is 14000000 because the behavior is undefined; the output can differ between machines and from run to run. The undefined behavior arises because all threads access the same object (note the &a given to each thread), and that object is modified after the first thread has been created.
When each thread runs it accesses the same index (as part of accessing a member of the same object (&a)). Thus the assumption that the threads will see [0,1,2,3,4] is incorrect: multiple threads likely see the same value of index (eg. [0,2,4,4,4]1) when they run. This depends on the scheduling with the loop creating threads as it also modifies the shared object.
When each thread updates sum it has to read and write the same shared memory. This is inherently prone to race conditions and unreliable results. For example, it could be a lack of memory visibility (thread X doesn't see the value written by thread Y), or it could be a conflicting thread schedule between the read and the write (thread X reads, thread Y reads, thread X writes, thread Y writes), etc.
If a separate arg object is created for each thread, both of these problems are avoided. While the sum issue can also be fixed with appropriate locking, the index issue can only be fixed by not sharing the object given as the thread input.
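// create 5 arg objects, one for each thread
arg a[5];
for (int i = 0; i < 5; i++) {
a[i].index = i;
a[i].sum = 0;
// give a DIFFERENT object to each thread
pthread_create(&thread[i], NULL, fonction, &a[i]);
}
// after all threads complete (each one is joined before its result is read)
int sum = 0;
for (int i = 0; i < 5; i++) {
pthread_join(thread[i], NULL);
sum += a[i].sum;
}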
1 Even assuming that there is no race condition in the current execution wrt. the usage of sum, the sequence for the different threads seeing index values as [0,2,4,4,4], the sum of which is 14, might look as follows:
a.index <- 0 ; create thread A
thread A reads a.index (0)
a.index <- 1 ; create thread B
a.index <- 2 ; create thread C
thread B reads a.index (2)
a.index <- 3 ; create thread D
a.index <- 4 ; create thread E
thread D reads a.index (4)
thread C reads a.index (4)
thread E reads a.index (4)
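As a variation on the fix above, here is a rough sketch of the other option mentioned: keep a single shared sum but protect it with a mutex, while still giving every thread its own index. The names task_t, add_index and sum_with_lock are made up for this example, and I use long for the sum only to stay clear of int overflow on unusual platforms.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 5

typedef struct {
    int index;               /* private to one thread, never shared */
    long *sum;               /* shared accumulator */
    pthread_mutex_t *lock;   /* protects *sum */
} task_t;

static void *add_index(void *p) {
    task_t *t = p;
    long contribution = (long)t->index * 1000000L;

    /* Only the read-modify-write of the shared sum needs the lock. */
    pthread_mutex_lock(t->lock);
    *t->sum += contribution;
    pthread_mutex_unlock(t->lock);
    return NULL;
}

long sum_with_lock(void) {
    pthread_t tid[NTHREADS];
    task_t task[NTHREADS];
    pthread_mutex_t lock;
    long sum = 0;

    pthread_mutex_init(&lock, NULL);
    for (int i = 0; i < NTHREADS; i++) {
        task[i].index = i;        /* each thread gets its own object */
        task[i].sum = &sum;
        task[i].lock = &lock;
        pthread_create(&tid[i], NULL, add_index, &task[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&lock);
    return sum;                   /* safe to read: all threads have been joined */
}
```

Here sum_with_lock() returns 10000000 regardless of how the threads are scheduled, because no thread ever reads an index that main is still changing and every update to sum happens under the lock.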
I have an array of 100 requests (integers). I want to create 4 threads that all run the same function (thread_function), and I want the threads to take the requests one by one:
(thread0->request0,
thread1->request1,
thread2->request2,
thread3->request3
and then thread0->request4, and so on up to request 100), all of this using mutexes.
Here is the code I have written so far:
threadRes = pthread_create(&(threadID[i]), NULL,thread_function, (void *)id_size);
This is inside my main, in a loop that runs 4 times. Now, outside my main:
void *thread_function(void *arg){
int *val_p=(int *) arg;
for(i=0; i<200; i=i+2)
{
f=false;
for (j= 0; j<100; j++)
{
if (val_p[i]==cache[j].id)
f=true;
}
if(f==true)
{
printf("The request %d has been served.\n",val_p[i]);
}
else
{
cache[k].id=val_p[i];
printf("\nCurrent request to be served:%d \n",cache[k].id);
k++;
}
}
return NULL;
}
Where: val_p is the array with the requests, and cache is an array of structs that stores the ids of the requests.
So now I want mutexes to synchronize my threads. I considered using this inside my main:
pthread_join(threadID[0], NULL);
pthread_join(threadID[1], NULL);
pthread_join(threadID[2], NULL);
pthread_join(threadID[3], NULL);
pthread_mutex_destroy(&mutex);
and inside the function to use:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
Before I finish, I would like to say that so far the result of my program is that the 4 threads run 100 requests each (400 in total), and what I want to achieve is that the 4 threads run 100 requests in total.
Thanks for your time.
You need to use a loop that looks like this:
1. Acquire the lock.
2. See if there's any work to be done. If not, release the lock and terminate.
3. Mark the work that we're going to do as not needing to be done anymore.
4. Release the lock.
5. Do the work.
6. (If necessary) Acquire the lock. Mark the work done and/or report results. Release the lock.
7. Go to step 1.
Notice how while holding the lock, the thread discovers what work it should do and then prevents any other thread from taking the same assignment before it releases the lock. Note also that the lock is not held while doing the work so that multiple threads can work concurrently.
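The loop described above could be sketched roughly like this for 100 requests and 4 threads. This is only an illustration: the "work" is a placeholder that bumps a per-request counter, and the names serve_all, worker, times_served, next_request and completed are invented for the example rather than taken from your code.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_REQUESTS 100
#define NUM_WORKERS 4

static int times_served[NUM_REQUESTS]; /* how often each request was handled */
static int next_request;               /* first request nobody has claimed yet */
static int completed;                  /* number of finished requests */
static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        /* Acquire the lock and look for work. */
        pthread_mutex_lock(&work_lock);
        if (next_request >= NUM_REQUESTS) {
            pthread_mutex_unlock(&work_lock);
            break;                      /* nothing left: terminate */
        }
        int mine = next_request++;      /* claim it so no other thread takes it */
        pthread_mutex_unlock(&work_lock);

        /* Do the work WITHOUT holding the lock. Only the thread that
           claimed index `mine` touches times_served[mine]. */
        times_served[mine]++;

        /* Re-acquire the lock to report the result. */
        pthread_mutex_lock(&work_lock);
        completed++;
        pthread_mutex_unlock(&work_lock);
    }
    return NULL;
}

/* Returns the number of completed requests, or -1 if any request
   was served zero times or more than once. */
int serve_all(void) {
    pthread_t tid[NUM_WORKERS];

    next_request = 0;
    completed = 0;
    for (int i = 0; i < NUM_REQUESTS; i++)
        times_served[i] = 0;

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < NUM_REQUESTS; i++)
        if (times_served[i] != 1)
            return -1;
    return completed;
}
```

Note that this hands out requests in order but does not force thread0 to get request0, thread1 request1 and so on; whichever thread asks first gets the next request, which is usually what you want for throughput.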
You may want to post more of your code. How the arrays are set up, how the segment is passed to the individual threads, etc.
Note that using printf will perturb the timing of the threads. It takes its own lock on stdout, so it's probably better to compile these calls out. Alternatively, use a set of per-thread logfiles so the printf calls don't block against one another.
Also, in your thread loop, once you set f to true, you can issue a break as there's no need to scan further.
val_p[i] is loop invariant, so we can fetch that just once at the start of the i loop.
We don't see k and cache, but you'd need to mutex wrap the code that sets these values.
But, that does not protect against races in the for loop. You'd have to wrap the fetch of cache[j].id in a mutex pair inside the loop. You might be okay without the mutex inside the loop on some arches that have good cache snooping (e.g. x86).
You might be better off using stdatomic.h primitives. Here's a version that illustrates that. It compiles but I've not tested it:
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
// atomic_load() and friends require _Atomic-qualified objects
atomic_int k;
#define true 1
#define false 0
struct cache {
atomic_int id;
};
struct cache cache[100];
#ifdef DEBUG
#define dbgprt(_fmt...) \
printf(_fmt)
#else
#define dbgprt(_fmt...) \
do { } while (0)
#endif
void *
thread_function(void *arg)
{
int *val_p = arg;
int i;
int j;
int cval;
atomic_int *cptr;
for (i = 0; i < 200; i += 2) {
int pval = val_p[i];
int f = false;
// decide if request has already been served
for (j = 0; j < 100; j++) {
cptr = &cache[j].id;
cval = atomic_load(cptr);
if (cval == pval) {
f = true;
break;
}
}
if (f == true) {
dbgprt("The request %d has been served.\n",pval);
continue;
}
// increment the global k value [atomically]
int kold = atomic_load(&k);
int knew;
while (1) {
knew = kold + 1;
if (atomic_compare_exchange_strong(&k,&kold,knew))
break;
}
// get current cache value
cptr = &cache[kold].id;
int oldval = atomic_load(cptr);
// mark the cache
// this should never loop because we atomically got our slot with
// the k value
while (1) {
if (atomic_compare_exchange_strong(cptr,&oldval,pval))
break;
}
dbgprt("\nCurrent request to be served:%d\n",pval);
}
return (void *) 0;
}
I'm writing an extremely simple program to demonstrate a Pthreads implementation I ported from C++ back to C.
I create two lock-step threads and give them two jobs
One increments a1 once per step
One decrements a2 once per step
During the synchronized phase (When the mutexes are locked for both t1 and t2) I compare a1 and a2 to see if we should stop stepping.
I'm wondering if I'm going crazy here, because not only do the variables not always change after stepping and locking, but they sometimes change at different rates, as if the threads were still running after the locks were taken.
EDIT: Yes, I did research this. Yes, the C++ implementation works. Yes, the C++ implementation is nearly identical to this one, but I had to cast PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER in c and pass this as the first argument to every function. I spent a while trying to debug this (short of whipping out gdb) to no avail.
#ifndef LOCKSTEPTHREAD_H
#define LOCKSTEPTHREAD_H
#include <pthread.h>
#include <stdio.h>
typedef struct {
pthread_mutex_t myMutex;
pthread_cond_t myCond;
pthread_t myThread;
int isThreadLive;
int shouldKillThread;
void (*execute)();
} lsthread;
void init_lsthread(lsthread* t);
void start_lsthread(lsthread* t);
void kill_lsthread(lsthread* t);
void kill_lsthread_islocked(lsthread* t);
void lock(lsthread* t);
void step(lsthread* t);
void* lsthread_func(void* me_void);
#ifdef LOCKSTEPTHREAD_IMPL
//function declarations
void init_lsthread(lsthread* t){
//pthread_mutex_init(&(t->myMutex), NULL);
//pthread_cond_init(&(t->myCond), NULL);
t->myMutex = (pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER;
t->myCond = (pthread_cond_t)PTHREAD_COND_INITIALIZER;
t->isThreadLive = 0;
t->shouldKillThread = 0;
t->execute = NULL;
}
void destroy_lsthread(lsthread* t){
pthread_mutex_destroy(&t->myMutex);
pthread_cond_destroy(&t->myCond);
}
void kill_lsthread_islocked(lsthread* t){
if(!t->isThreadLive)return;
//lock(t);
t->shouldKillThread = 1;
step(t);
pthread_join(t->myThread,NULL);
t->isThreadLive = 0;
t->shouldKillThread = 0;
}
void kill_lsthread(lsthread* t){
if(!t->isThreadLive)return;
lock(t);
t->shouldKillThread = 1;
step(t);
pthread_join(t->myThread,NULL);
t->isThreadLive = 0;
t->shouldKillThread = 0;
}
void lock(lsthread* t){
if(pthread_mutex_lock(&t->myMutex))
puts("\nError locking mutex.");
}
void step(lsthread* t){
if(pthread_cond_signal(&(t->myCond)))
puts("\nError signalling condition variable");
if(pthread_mutex_unlock(&(t->myMutex)))
puts("\nError unlocking mutex");
}
void* lsthread_func(void* me_void){
lsthread* me = (lsthread*) me_void;
int ret;
if (!me)pthread_exit(NULL);
if(!me->execute)pthread_exit(NULL);
while (!(me->shouldKillThread)) {
ret = pthread_cond_wait(&(me->myCond), &(me->myMutex));
if(ret)pthread_exit(NULL);
if (!(me->shouldKillThread) && me->execute)
me->execute();
}
pthread_exit(NULL);
}
void start_lsthread(lsthread* t){
if(t->isThreadLive)return;
t->isThreadLive = 1;
t->shouldKillThread = 0;
pthread_create(
&t->myThread,
NULL,
lsthread_func,
(void*)t
);
}
#endif
#endif
This is my driver program:
#define LOCKSTEPTHREAD_IMPL
#include "include/lockstepthread.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
unsigned char a1, a2;
void JobThread1(){
unsigned char copy = a1;
copy++;
a1 = copy;
}
void JobThread2(){
unsigned char copy = a2;
copy--;
a2 = copy;
}
int main(){
char inputline[2048];
inputline[2047] = '\0';
lsthread t1, t2;
init_lsthread(&t1);
init_lsthread(&t2);
t1.execute = JobThread1;
t2.execute = JobThread2;
printf(
"\nThis program demonstrates threading by having"
"\nTwo threads \"walk\" toward each other using unsigned chars."
"\nunsigned Integer overflow guarantees the two will converge."
);
printf("\nEnter a number for thread 1 to process: ");
fgets(inputline, 2047,stdin);
a1 = (unsigned char)atoi(inputline);
printf("\nEnter a number for thread 2 to process: ");
fgets(inputline, 2047,stdin);
a2 = (unsigned char)atoi(inputline);
start_lsthread(&t1);
start_lsthread(&t2);
unsigned int i = 0;
lock(&t1);
lock(&t2);
do{
printf("\n%u: a1 = %d, a2 = %d",i++,(int)a1,(int)a2);
fflush(stdout);
step(&t1);
step(&t2);
lock(&t1);
lock(&t2);
}while(a1 < a2);
kill_lsthread_islocked(&t1);
kill_lsthread_islocked(&t2);
destroy_lsthread(&t1);
destroy_lsthread(&t2);
return 0;
}
Example program usage:
Enter a number for thread 1 to process: 5
Enter a number for thread 2 to process: 10
0: a1 = 5, a2 = 10
1: a1 = 5, a2 = 10
2: a1 = 5, a2 = 10
3: a1 = 5, a2 = 10
4: a1 = 5, a2 = 10
5: a1 = 5, a2 = 10
6: a1 = 6, a2 = 9
7: a1 = 6, a2 = 9
8: a1 = 7, a2 = 9
9: a1 = 7, a2 = 9
10: a1 = 7, a2 = 9
11: a1 = 7, a2 = 9
12: a1 = 8, a2 = 9
So, what's the deal?
Generally speaking, it sounds like what you're really looking for is a barrier. Nevertheless, I answer the question as posed.
Yes, the C++ implementation works. Yes, the C++ implementation is
nearly identical to this one, but I had to cast
PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER in c and pass
this as the first argument to every function. I spent a while trying
to debug this (short of whipping out gdb) to no avail.
That seems unlikely. There are data races and undefined behavior all over the code presented, whether interpreted as C or as C++.
General design
Since you provide an explicit lock() function, which seems reasonable, you should provide an explicit unlock() function as well. Any other functions that expect to be called with the mutex locked should return with the mutex locked, so that the caller can explicitly pair lock() calls with unlock() calls. Failure to adhere to this pattern invites bugs.
In particular, step() should not unlock the mutex unless it also locks it, but I think a non-locking version will suit the purpose.
Initialization
I had to cast PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER in c
No, you didn't, because you can't, at least not if pthread_mutex_t and pthread_cond_t are structure types. Initializers for structure types are not values. They do not have types and cannot be cast. But you can form compound literals from them, and that is what you have inadvertently done. This is not a conforming way to assign a value to a pthread_mutex_t or a pthread_cond_t.* The initializer macros are designated for use only in initializing variables in their declarations. That's what "initializer" means in this context.
Example:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
Example:
struct {
pthread_mutex_t mutex;
pthread_cond_t cv;
} example = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
To initialize a mutex or condition variable object in any other context requires use of the corresponding initialization function, pthread_mutex_init() or pthread_cond_init().
Data races
Non-atomic accesses to shared data by multiple concurrently-running threads must be protected by a mutex or other synchronization object if any of the accesses are writes (exceptions apply for accesses to mutexes and other synchronization objects themselves). Shared data in your example include file-scope variables a1 and a2, and most of the members of your lsthread instances. Your lsthread_func and driver both sometimes fail to lock the appropriate mutex before accessing those shared data, and some of the accesses involved are indeed writes, so undefined behavior ensues. Observing unexpected values of a1 and a2 is an entirely plausible manifestation of that undefined behavior.
Condition variable usage
The thread that calls pthread_cond_wait() must do so while holding the specified mutex locked. Your lsthread_func() does not adhere to that requirement, so more undefined behavior ensues. If you're very lucky, that might manifest as an immediate spurious wakeup.
And speaking of spurious wakeups, you do not guard against them. If one does occur then lsthread_func() blithely goes on to perform another iteration of its loop. To avoid this, you need shared data somewhere upon which the condition of the condition variable is predicated. The standard usage of a CV is to check that predicate before waiting, and to loop back and check it again after waking up, repeatedly if necessary, not proceeding until the predicate evaluates true.
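For reference, the standard predicate-plus-loop usage looks roughly like the sketch below. It is deliberately unrelated to your lsthread type: mailbox_t, consumer and handoff are names I made up, and the predicate is a plain int flag named ready.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int ready;      /* the predicate the CV is about; protected by lock */
    int payload;    /* data handed over; protected by lock */
    int received;   /* what the consumer saw */
} mailbox_t;

static void *consumer(void *p) {
    mailbox_t *mb = p;

    pthread_mutex_lock(&mb->lock);   /* must hold the mutex before waiting */
    while (!mb->ready)               /* re-check after EVERY wakeup */
        pthread_cond_wait(&mb->cond, &mb->lock);
    mb->received = mb->payload;
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

int handoff(int value) {
    mailbox_t mb = { .ready = 0, .payload = 0, .received = -1 };
    pthread_t tid;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.cond, NULL);
    pthread_create(&tid, NULL, consumer, &mb);

    /* Change the predicate under the mutex, then signal. */
    pthread_mutex_lock(&mb.lock);
    mb.payload = value;
    mb.ready = 1;
    pthread_cond_signal(&mb.cond);
    pthread_mutex_unlock(&mb.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&mb.cond);
    pthread_mutex_destroy(&mb.lock);
    return mb.received;
}
```

Because the consumer tests ready before it ever waits, it does not matter whether the signal is sent before or after the consumer reaches pthread_cond_wait(), and a spurious wakeup simply sends it back around the loop.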
Synchronized stepping
The worker threads do not synchronize directly with each other, so only the driver can ensure that one does not run ahead of the other. But it doesn't. The driver does nothing at all to ensure that either thread has completed a step before signaling both threads to perform another. Condition variables do not store signals, so if, by some misfortune of scheduling or by the nature of the tasks involved, one thread should get a step ahead of the other, it will remain ahead until and unless the error happens to be spontaneously balanced by a misstep on the other side.
Probably you want to add a lsthread_wait() function that waits for the thread to complete a step. That would involve use of the CV from the opposite direction.
Overall, you could provide (better) for single-stepping by
Adding a member to type lsthread to indicate whether the thread should be or is executing a step vs. whether it is between steps and should wait.
typedef struct {
// ...
_Bool should_step;
} lsthread;
Adding the aforementioned lsthread_wait(), maybe something like this:
// The calling thread must hold t->myMutex locked
void lsthread_wait(lsthread *t) {
// Wait, if necessary, for the thread to complete a step
while (t->should_step) {
pthread_cond_wait(&t->myCond, &t->myMutex);
}
assert(!t->should_step);
// Prepare to perform another step
t->should_step = 1;
}
That would be paired with a revised version of lsthread_func():
void* lsthread_func(void* me_void){
lsthread* me = (lsthread*) me_void;
int ret; // declared here so it is in scope for both calls below
if (!me) pthread_exit(NULL);
lock(me); // needed to protect access to *me members and to globals
while (!me->shouldKillThread && me->execute) {
while (!me->should_step && !me->shouldKillThread) {
ret = pthread_cond_wait(&(me->myCond), &(me->myMutex));
if (ret) {
unlock(me); // mustn't forget to unlock
pthread_exit(NULL);
}
}
assert(me->should_step || me->shouldKillThread);
if (!me->shouldKillThread && me->execute) {
me->execute();
}
// Mark and signal step completed
me->should_step = 0;
ret = pthread_cond_broadcast(&(me->myCond)); // takes a pointer to the CV
if (ret) break;
}
unlock(me);
pthread_exit(NULL);
}
Modifying step() to avoid it unlocking the mutex.
Modifying the driver loop to use the new wait function appropriately
lock(&t1);
lock(&t2);
do {
printf("\n%u: a1 = %d, a2 = %d", i++, (int) a1, (int) a2);
fflush(stdout);
step(&t1);
step(&t2);
lsthread_wait(&t1);
lsthread_wait(&t2);
} while (a1 < a2); // both locks must be held when this condition is evaluated
kill_lsthread_islocked(&t2);
kill_lsthread_islocked(&t1);
unlock(&t2);
unlock(&t1);
That's not necessarily all the changes that would be required, but I think I've covered all the key points.
Final note
The above suggestions are based on the example program, in which different worker threads do not access any of the same shared data. That makes it feasible to use per-thread mutexes to protect the shared data they do access. If the workers accessed any of the same shared data, and those or any thread running concurrently with them modified that same data, then per-worker-thread mutexes would not offer sufficient protection.
* And if pthread_mutex_t and pthread_cond_t were pointer or integer types, which is allowed, then the compiler would have accepted the assignments in question without a cast (which would actually be a cast in that case), but those assignments still would be non-conforming as far as pthreads is concerned.
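Coming back to the barrier remark at the top of this answer: a lock-step driver built on pthread_barrier_t could look something like the following sketch. It is a different design from your lsthread type, not a patch to it. The names lockstep, inc_worker, dec_worker, step_start, step_done and stop are mine, and pthread barriers are an optional POSIX feature (not available on every platform), so treat this as an outline under those assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

static pthread_barrier_t step_start, step_done;
static unsigned char a1, a2;
static int stop;   /* written by the driver only while the workers are parked */

static void *inc_worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_barrier_wait(&step_start);   /* wait for the driver's go */
        if (stop) break;
        a1++;
        pthread_barrier_wait(&step_done);    /* report that the step is finished */
    }
    return NULL;
}

static void *dec_worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_barrier_wait(&step_start);
        if (stop) break;
        a2--;
        pthread_barrier_wait(&step_done);
    }
    return NULL;
}

/* Steps both workers until a1 >= a2 and returns the number of steps taken. */
int lockstep(unsigned char start1, unsigned char start2) {
    pthread_t t1, t2;
    int steps = 0;

    a1 = start1;
    a2 = start2;
    stop = 0;
    pthread_barrier_init(&step_start, NULL, 3);  /* two workers + the driver */
    pthread_barrier_init(&step_done, NULL, 3);
    pthread_create(&t1, NULL, inc_worker, NULL);
    pthread_create(&t2, NULL, dec_worker, NULL);

    while (a1 < a2) {                        /* workers are parked here, so the read is safe */
        pthread_barrier_wait(&step_start);   /* release both workers */
        pthread_barrier_wait(&step_done);    /* wait until both have finished */
        steps++;
    }
    stop = 1;
    pthread_barrier_wait(&step_start);       /* let the workers see stop and exit */

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&step_start);
    pthread_barrier_destroy(&step_done);
    return steps;
}
```

With the inputs from your example run, lockstep(5, 10) goes 6/9, 7/8, 8/7 and returns 3. Neither worker can get ahead of the other, because each step is bracketed by two barriers that all three threads must reach.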
The summary of the problem is the following: given a global resource of size N and M threads with resource sizes Xi (i = 1..M), synchronize the threads such that each thread is allocated its resources, does its work, and is then deallocated.
The main problem arises when there are not enough resources available and a thread has to wait until there is enough memory. I have tried to "block" it with a while statement, but I realized that two threads can pass the while loop, and the first one to be allocated can change the global resource so that the second thread no longer has enough space, even though it has already passed the conditional section.
//piece of pseudocode
...
int MAXRES = 100;
// in thread function
{
while (thread_res > MAXRES);
lock();
allocate_res(); // MAXRES-=thread_res;
unlock();
// do own stuff
lock()
deallocate(); // MAXRES +=thread_res;
unlock();
}
To make a robust solution you need something more. As you noted, you need a way to wait until the condition "there are enough resources available for me" holds, and a plain mutex has no such mechanism. At any rate, you likely want to bury that in the allocator, so your mainline thread code looks like:
....
n = Acquire(n)
do stuff
Release(n)
....
and deal with the contention:
int navail = N;
int Acquire(int n) {
lock();
while (n > navail) { // keep waiting while the request exceeds what is available
unlock();
lock();
}
navail -= n;
unlock();
return n;
}
void Release(int n) {
lock();
navail += n;
unlock();
}
But such a spin waiting system may fail because of priorities -- if the highest priority thread is spinning, a thread trying to Release may not be able to run. Spinning isn't very elegant, and wastes energy. You could put a short nap in the spin, but if you make the nap too short it wastes energy, too long and it increases latency.
You really want a signalling primitive like semaphores or condvars rather than a locking primitive. With a semaphore, it could look impressively like:
Semaphore *sem;
int Acquire(int n) {
senter(sem, n);
return n;
}
int Release(int n) {
sleave(sem, n);
return n;
}
void init(int N) {
sem = screate(N);
}
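For comparison, the same Acquire/Release pair can be built from a POSIX mutex and a condition variable. This is a sketch under my own assumptions: POOL_SIZE, the per-thread needs in pool_demo, and the peak_in_use bookkeeping exist only so the example can check itself, and every request is assumed to be no larger than the pool.

```c
#include <assert.h>
#include <pthread.h>

#define POOL_SIZE 100
#define NUM_THREADS 5

static int navail = POOL_SIZE;
static int peak_in_use;   /* highest amount ever checked out, for the self-check */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_freed = PTHREAD_COND_INITIALIZER;

int Acquire(int n) {
    pthread_mutex_lock(&pool_lock);
    while (n > navail)                        /* sleep instead of spinning */
        pthread_cond_wait(&pool_freed, &pool_lock);
    navail -= n;                              /* test and update are one critical section */
    if (POOL_SIZE - navail > peak_in_use)
        peak_in_use = POOL_SIZE - navail;
    pthread_mutex_unlock(&pool_lock);
    return n;
}

void Release(int n) {
    pthread_mutex_lock(&pool_lock);
    navail += n;
    /* Broadcast, because waiters need different amounts and any of
       them might now fit. */
    pthread_cond_broadcast(&pool_freed);
    pthread_mutex_unlock(&pool_lock);
}

static void *user(void *arg) {
    int need = *(int *)arg;
    for (int round = 0; round < 1000; round++) {
        Acquire(need);
        /* ... do own stuff ... */
        Release(need);
    }
    return NULL;
}

/* Returns the peak usage, or -1 if the pool ended up inconsistent. */
int pool_demo(void) {
    static int need[NUM_THREADS] = { 60, 50, 40, 30, 70 };
    pthread_t tid[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, user, &need[i]);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

    return (navail == POOL_SIZE && peak_in_use <= POOL_SIZE) ? peak_in_use : -1;
}
```

The point of the design is that the check (n > navail) and the subtraction happen while the same lock is held, so two threads can no longer both pass the check and then overdraw the pool, which was the race in the original while loop.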
Update: Revised to use System V semaphores, which provide the ability to "check out" and "check in" arbitrary amounts against the semaphore value. The overall logic is the same.
Disclaimer: I have not used System V semaphores for a few years; test before using, in case I missed some details.
int semid ;
// Call from main
void do_init() {
key_t semkey = ftok(...) ;
semid = semget(semkey, 1, ...) ; // semget(), not shmget(): we want a semaphore set
// Setup 100 as max resources.
struct sembuf so ;
so.sem_num = 0 ;
so.sem_op = 100 ;
so.sem_flg = 0 ;
semop(semid, &so, 1) ;
}
// Thread work
do_thread_work() {
int N = <number of resources>
struct sembuf so ;
so.sem_num = 0;
so.sem_op = -N ;
so.sem_flg = 0 ;
semop(semid, &so, 1) ;
... Do thread work
// Return resources
so.sem_op = +N ;
semop(semid, &so, 1) ;
}
As per: https://pubs.opengroup.org/onlinepubs/009695399/functions/semop.html, this will result in atomic checkout of multiple items.
Note: the ... marks elided sections related to IPC_NOWAIT and SEM_UNDO, which are not relevant to this case.
If sem_op is a negative integer and the calling process has alter permission, one of the following shall occur:
If semval (see <sys/sem.h>) is greater than or equal to the absolute value of sem_op, the absolute value of sem_op is subtracted from semval. ...
...
If semval is less than the absolute value of sem_op ..., semop() shall increment the semncnt associated with the specified semaphore and suspend execution of the calling thread until one of the following conditions occurs:
The value of semval becomes greater than or equal to the absolute value of sem_op. When this occurs, the value of semncnt associated with the specified semaphore shall be decremented, the absolute value of sem_op shall be subtracted from semval and, ... .
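A small single-threaded check of the semantics quoted above might look like this. It is a sketch with several assumptions of mine: it uses IPC_PRIVATE instead of an ftok() key, sets the initial value with semctl(SETVAL) rather than a semop of +100, and the names sysv_demo, sem_adjust and union semun_arg are invented (on Linux the caller has to define the semctl argument union itself).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Argument type for semctl(); Linux leaves its definition to the caller. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Add delta to semaphore 0 of the set; blocks if that would go negative. */
static int sem_adjust(int semid, short delta) {
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Returns 0 if checking out 30 of 100 units and returning them behaves
   as described, 1 if the values are unexpected, -1 on a system call error. */
int sysv_demo(void) {
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    union semun_arg arg = { .val = 100 };        /* 100 units available */
    if (semctl(semid, 0, SETVAL, arg) == -1)
        goto fail;

    if (sem_adjust(semid, -30) == -1)            /* atomic checkout of 30 units */
        goto fail;
    int after_take = semctl(semid, 0, GETVAL);

    if (sem_adjust(semid, +30) == -1)            /* check the units back in */
        goto fail;
    int after_return = semctl(semid, 0, GETVAL);

    semctl(semid, 0, IPC_RMID);                  /* remove the set again */
    return (after_take == 70 && after_return == 100) ? 0 : 1;

fail:
    semctl(semid, 0, IPC_RMID);
    return -1;
}
```

In the threaded program each worker would do the -N and +N operations around its own work, exactly as in do_thread_work(); a worker asking for more than is currently left simply sleeps inside semop() until enough units have been returned.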
Okay, so I'm having an issue with dynamically allocating work to pthreads in a queue.
For example, in my code I have a struct like below:
struct calc
{
double num;
double calcVal;
};
I store each struct in an array of length l like below.
struct calc **calcArray;
/* then I initialize the calcArray to say length l and
fill each calc struct with a num*/
Now, based on num, I want to find the value of calcVal. Each struct calc has a different value for num.
I want to spawn 4 pthreads which is easy enough but I want to make it so at the start,
thread 0 gets calcArray[0]
thread 1 gets calcArray[1]
thread 2 gets calcArray[2]
thread 3 gets calcArray[3]
Now assuming that it will take different times for each thread to do the calculations for each calc,
if thread 1 finishes first, it will then get calcArray[4]
then thread 3 finishes and gets calcArray[5] to do
and this continues until it reaches the end of calcArray[l].
I know I could just split the array into l/4 (each thread gets one quarter of the calcs) but I don't want to do this. Instead I want to make the work like a queue. Any ideas on how to do this?
You could accomplish it pretty easily, by creating a variable containing the index of the next element to be assigned, and then having it secured by a mutex.
Example:
// Index of next element to be worked on
int next_pos;
// Mutex that secures next_pos-access
pthread_mutex_t next_pos_lock;
int main() {
// ...
// Initialize the mutex before you create any threads
pthread_mutex_init(&next_pos_lock, NULL);
next_pos = NUM_THREADS;
// Create the threads
// ...
}
void *threadfunc(void *arg) {
int index = ...;
while (index < SIZE_OF_WORK_ARRAY) {
// Do your work
// Update your index
pthread_mutex_lock(&next_pos_lock);
index = next_pos;
next_pos++;
pthread_mutex_unlock(&next_pos_lock);
}
return NULL;
}
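If a C11 compiler is available, the shared index can also be handed out with an atomic fetch-and-add instead of a mutex. This is an alternative technique to the one above, shown only as a sketch: the squaring is a stand-in for the real calculation, all indices (including the first ones) are claimed from the counter rather than preassigned per thread, and run_queue, worker, items and NUM_ITEMS are names I picked for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_ITEMS 100
#define NUM_THREADS 4

struct calc {
    double num;
    double calcVal;
};

static struct calc items[NUM_ITEMS];
static atomic_int next_pos;    /* index of the next unclaimed element */

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        /* Atomically claim one index; no two threads get the same value. */
        int i = atomic_fetch_add(&next_pos, 1);
        if (i >= NUM_ITEMS)
            break;
        items[i].calcVal = items[i].num * items[i].num;   /* placeholder work */
    }
    return NULL;
}

/* Returns how many elements ended up with the expected calcVal. */
int run_queue(void) {
    pthread_t tid[NUM_THREADS];

    for (int i = 0; i < NUM_ITEMS; i++) {
        items[i].num = (double)i;
        items[i].calcVal = -1.0;          /* marks "not computed yet" */
    }
    atomic_store(&next_pos, 0);

    for (int t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, worker, NULL);
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    int done = 0;
    for (int i = 0; i < NUM_ITEMS; i++)
        if (items[i].calcVal == items[i].num * items[i].num)
            done++;
    return done;
}
```

A thread that finishes early just comes back for the next index, so slow elements do not hold up the others, and no element is ever processed twice.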
See also: POSIX Threads Programming - Mutex Variables