Threading in C: Producer-Consumer taking forever to run

I'm new to the concept of threading.
I was doing the producer-consumer problem in C, but the consumer thread doesn't run in parallel with the producer.
My code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int S;
int E;
int F;

void waitS(){
    //printf("hbasd");
    while(S<=0);
    S--;
}
void signalS(){
    S++;
}
void waitE(){
    while(E<=0);
    E--;
}
void signalE(){
    E++;
}
void waitF(){
    while(F<=0);
    F--;
}
void signalF(){
    F++;
}

int p,c;

void* producer(void *n){
    int *j = (int *)n;
    int i = *j;
    while(1){
        waitS();
        waitE();
        printf("Producer %d\n",E);
        signalS();
        signalF();
        p++;
        if(p>=i){
            printf("Exiting: producer\n");
            pthread_exit(0);
        }
    }
}

void* consumer(void *n){
    int *j = (int *)n;
    int i = *j;
    while(1){
        waitS();
        waitF();
        printf("Consumer %d\n",E);
        signalS();
        signalE();
        c++;
        if(c>=i){
            printf("Exiting Consumer\n");
            pthread_exit(0);
        }
    }
}

int main(int argc, char* argv[]){
    int n = atoi(argv[1]);
    E = n;
    S = 1;
    F = 0;
    int pro = atoi(argv[2]);
    int con = atoi(argv[3]);
    pthread_t pid, cid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_create(&pid,&attr,producer,(void *)&pro);
    pthread_create(&cid,&attr,consumer,(void *)&con);
    pthread_join(pid,NULL);
    pthread_join(cid,NULL);
}
When I give the input ./a.out 3 4 3
(i.e. n = 3, pro = 4, con = 3)
I get no output, just a deadlock-like situation.
I expect an output like
Producer 2
Producer 1
Producer 0
Consumer 0
Consumer 1
Producer 0
Exiting: producer
Consumer 0
Exiting: consumer
...and similar output where the producer runs 4 times and the consumer 3 times.
When I give an input like ./a.out 4 4 3
I get the following output:
Producer 3
Producer 2
Producer 1
Producer 0
Exiting: producer
Consumer 0
Consumer 1
Consumer 2
Exiting: consumer
From these results I conclude that the producer thread executes first and then the consumer thread runs.
I want both of them to execute simultaneously, so that I get output similar to the first expected output when test cases like 3 4 3 are given.

You are accessing non-atomic variables from different threads without any kind of synchronization; this is a race condition and it leads to undefined behavior.
In particular, modern CPUs provide separate registers and separate caches to each CPU core. That means that if a thread running on CPU core #1 modifies the value of a variable, the modification may remain solely in CPU #1's cache for quite a while without getting "pushed out" to RAM, and so another thread running on CPU core #2 may not "see" thread #1's update for a long time (or perhaps never).
The traditional way to deal with this problem is either to serialize accesses to your shared variables with one or more mutexes (see pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), etc.), or to use atomic variables rather than plain ints for values you want to access from multiple threads simultaneously. Both of those mechanisms have safeguards to ensure that undefined behavior won't occur (if you are using them correctly).
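To make the mutex suggestion concrete, here is a minimal sketch of a single-slot producer/consumer written with a pthread mutex and condition variable. The names (slot_full, slot_value) and the item count of 5 are assumptions made for the example, not something from the question:
#include <stdio.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int slot_full = 0;    /* 0 = empty, 1 = holds one item */
static int slot_value = 0;

static void *producer(void *arg)
{
    int items = *(int *)arg;
    for (int i = 0; i < items; i++) {
        pthread_mutex_lock(&lock);
        while (slot_full)                 /* wait until the consumer emptied the slot */
            pthread_cond_wait(&cond, &lock);
        slot_value = i;
        slot_full = 1;
        printf("Producer put %d\n", i);
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    int items = *(int *)arg;
    for (int i = 0; i < items; i++) {
        pthread_mutex_lock(&lock);
        while (!slot_full)                /* wait until the producer filled the slot */
            pthread_cond_wait(&cond, &lock);
        printf("Consumer got %d\n", slot_value);
        slot_full = 0;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    int items = 5;                        /* arbitrary count for the sketch */
    pthread_t p, c;
    pthread_create(&p, NULL, producer, &items);
    pthread_create(&c, NULL, consumer, &items);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
}
Because both threads sleep inside pthread_cond_wait() instead of spinning, neither burns a core while waiting, and the mutex guarantees the updates are visible to the other thread.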

You cannot access the same memory from two different threads without synchronization. The standard for pthreads spells it out quite clearly here:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads.
Besides, even if we ignore that many CPUs don't synchronize memory unless you explicitly ask them to, your code is still incorrect in plain C, because variables that can be changed behind your back should be volatile. But even though volatile might help on some CPUs, it is still not correct for pthreads.
Just use proper locking and don't spin on global variables; there are much cheaper ways to heat a room than a CPU.
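In the spirit of "use proper locking", here is a hedged sketch of the same S/E/F idea with POSIX semaphores (sem_t) replacing the hand-rolled busy-wait counters. The buffer size of 3 and the item count of 5 are illustrative assumptions, and sem_init() assumes unnamed POSIX semaphores are available (e.g. on Linux):
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t empty_slots;   /* counts free buffer slots (the old E) */
static sem_t full_slots;    /* counts produced items   (the old F) */
static sem_t mutex;         /* binary semaphore        (the old S) */
static int produced = 0;

static void *producer(void *arg)
{
    for (int i = 0; i < 5; i++) {
        sem_wait(&empty_slots);     /* blocks instead of spinning */
        sem_wait(&mutex);
        printf("Produced item %d\n", ++produced);
        sem_post(&mutex);
        sem_post(&full_slots);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    for (int i = 0; i < 5; i++) {
        sem_wait(&full_slots);
        sem_wait(&mutex);
        printf("Consumed an item\n");
        sem_post(&mutex);
        sem_post(&empty_slots);
    }
    return NULL;
}

int main(void)
{
    sem_init(&empty_slots, 0, 3);   /* buffer size 3, as in ./a.out 3 ... */
    sem_init(&full_slots, 0, 0);
    sem_init(&mutex, 0, 1);
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&empty_slots);
    sem_destroy(&full_slots);
    sem_destroy(&mutex);
}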

In general, you should use synchronization primitives, but unlike the other answerers I believe we might not need any if we run this program on the x86 architecture and prevent the compiler from optimizing away some critical parts of the code.
According to Wikipedia, the x86 architecture has almost sequential consistency, which is more than enough to implement a producer-consumer algorithm.
The rules for successfully implementing such a producer-consumer algorithm are quite simple:
We must avoid writing the same variable from different threads, i.e. if one thread writes to variable X, the other thread only reads from X.
We must tell the compiler explicitly that our variables might change somewhere else, i.e. use the volatile keyword on all variables shared between threads.
And here is a working example based on your code. The producer produces numbers from 5 down to 0, and the consumer consumes them. Please remember, this will work on x86 only, due to the weaker ordering on other architectures:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

volatile int P = 0;
volatile int C = 0;
volatile int value = 0;

void produce(int v)
{
    value = v;
    P++;
}

int consume()
{
    int v = value;
    C++;
    return v;
}

void waitForConsumer()
{
    while (C != P)
        ;
}

void waitForProducer()
{
    while (C == P)
        ;
}

void *producer(void *n)
{
    int i = *(int *)n;
    while (1) {
        waitForConsumer();
        printf("Producing %d\n", i);
        produce(i);
        i--;
        if (i < 0) {
            printf("Exiting: producer\n");
            pthread_exit(0);
        }
    }
}

void *consumer(void *n)
{
    while (1) {
        waitForProducer();
        int v = consume();
        printf("Consumed %d\n", v);
        if (v == 0) {
            printf("Exiting: consumer\n");
            pthread_exit(0);
        }
    }
}

int main(int argc, char *argv[])
{
    int pro = 5;
    pthread_t pid, cid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_create(&pid, &attr, producer, (void *)&pro);
    pthread_create(&cid, &attr, consumer, NULL);
    pthread_join(pid, NULL);
    pthread_join(cid, NULL);
}
Produces the following result:
$ ./a.out
Producing 5
Producing 4
Consumed 5
Consumed 4
Producing 3
Producing 2
Consumed 3
Consumed 2
Producing 1
Producing 0
Exiting: producer
Consumed 1
Consumed 0
Exiting: consumer
For more information, I really recommend Herb Sutter's presentation called atomic<> Weapons, which is quite long but has everything you need to know about ordering and atomics.
Although the code listed above will work OK on x86, I really encourage you to watch the presentation and use the built-in atomics, such as __atomic_load_n(), which will generate the correct assembly on any platform.
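For illustration only, and assuming GCC or Clang, the spin-wait helpers from the listing above could look roughly like this when written with the __atomic builtins instead of volatile (the chosen memory orders are my assumption for this pattern; the rest of the program stays the same):
/* Drop-in replacements for the helpers in the listing above. */
int P = 0;     /* no longer needs volatile; the builtins handle visibility */
int C = 0;
int value = 0;

void produce(int v)
{
    __atomic_store_n(&value, v, __ATOMIC_RELAXED);
    __atomic_fetch_add(&P, 1, __ATOMIC_RELEASE);   /* publish the new item */
}

int consume(void)
{
    int v = __atomic_load_n(&value, __ATOMIC_RELAXED);
    __atomic_fetch_add(&C, 1, __ATOMIC_RELEASE);   /* mark the item consumed */
    return v;
}

void waitForConsumer(void)
{
    while (__atomic_load_n(&C, __ATOMIC_ACQUIRE) !=
           __atomic_load_n(&P, __ATOMIC_ACQUIRE))
        ;
}

void waitForProducer(void)
{
    while (__atomic_load_n(&C, __ATOMIC_ACQUIRE) ==
           __atomic_load_n(&P, __ATOMIC_ACQUIRE))
        ;
}
The release increment of P pairs with the acquire load in waitForProducer(), so the consumer is guaranteed to see the value written before P was bumped, and symmetrically for C.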

Create a new thread for each producer and each consumer, i.e. all producers and all consumers have their own threads.
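If it helps, a rough sketch of that suggestion; the thread counts and the producer/consumer stubs are placeholders for whatever your real worker functions are:
#include <pthread.h>

#define NUM_PRODUCERS 4
#define NUM_CONSUMERS 3

/* Placeholder workers; substitute your real producer/consumer logic. */
static void *producer(void *arg) { return NULL; }
static void *consumer(void *arg) { return NULL; }

int main(void)
{
    pthread_t producers[NUM_PRODUCERS], consumers[NUM_CONSUMERS];
    int i;

    for (i = 0; i < NUM_PRODUCERS; i++)
        pthread_create(&producers[i], NULL, producer, NULL);
    for (i = 0; i < NUM_CONSUMERS; i++)
        pthread_create(&consumers[i], NULL, consumer, NULL);

    for (i = 0; i < NUM_PRODUCERS; i++)
        pthread_join(producers[i], NULL);
    for (i = 0; i < NUM_CONSUMERS; i++)
        pthread_join(consumers[i], NULL);
    return 0;
}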

Related

Producer / Consumer using semaphore

I'm starting my studies of synchronized threads using semaphores.
I just did a test using a binary semaphore (2 threads only) and it worked fine.
Imagine a LAN house that has 3 computers (threads) and some clients (threads). If all computers are busy, the clients wait in a queue with a known limit (e.g. 15 clients).
I can't understand how the threads will relate to each other.
As far as I know, a semaphore is used to control the access of threads to a certain critical region / memory area / global variable.
1) Create 1 semaphore to control the clients accessing the computers (but both are threads);
2) Create 1 semaphore to control the clients in the queue;
But how do threads relate to other threads? How will the semaphore know which thread(s) it should work with?
I don't need a full answer. I just need to understand how the threads will relate to each other. Some help understanding the situation would be appreciated.
This is my code so far and it's not working ;P I can't control the clients' access to the 3 available computers.
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ROOM_SIZE 15

sem_t queue, pc, mutex;
int room [ROOM_SIZE];
int pcsAvaliable = 3, nAvaliable = 0, roomAvaliable = ROOM_SIZE;
int computers [3]; // 0 -> Empty | 1 -> Ocuppied

void* Lan(void* arg)
{
    //Enter the lanhouse
    //Verify if there is a computer avaliable
    sem_wait(&pc);
    if(pcsAvaliable > 0)
    {
        sem_wait(&mutex);
        pcsAvaliable--;
        computers[nAvaliable] = 1;
        printf("Cliente pegou pc: %d\n", nAvaliable);
        nAvaliable++;
        sem_post(&mutex);

        //Wait for 80~90ms
        printf("Client liberou pc: %d\n", nAvaliable);
        computers[nAvaliable] = 0;
        nAvaliable--;
        sem_post(&pc);
    }
    else
    {
        printf("No computer avaliable...\n");
        //Check the waiting room for avaliable slot
        if(roomAvaliable > 0)
        {
            roomAvaliable--;
            printf("Client entered the waiting room.");
        }
        else
            printf("No avaliable space in waiting room..\n");
    }
}

int main(int argc, char const *argv[])
{
    int i;
    if(argc > 1)
    {
        int numClients = atoi(argv[1]);
        sem_init(&pc, 0, 3);
        sem_init(&mutex, 0, 1);

        pthread_t clients[numClients];

        //Create Clients
        for(i=0; i< numClients; i++)
        {
            pthread_create(&clients[i], NULL, Lan, NULL);
        }

        //Join Clients
        for(i=0; i< numClients; i++)
        {
            pthread_join(clients[i], NULL);
        }
    }
    else
        printf("Please, insert a parameter.");

    pthread_exit(NULL);
    sem_destroy(&pc);
    return 0;
}
To be technical about it: if you're synchronizing tasks between threads, you should use a semaphore. An example is reading input before parsing it.
Here's an answer on semaphores.
But if you're using shared resources and need to avoid race conditions / two threads accessing the resource at the same time, you should use mutexes. Here's a question on what a mutex is.
Also look at the disambiguation by Michael Barr, which is really good.
I would read both questions thoroughly, along with the disambiguation, and you might actually end up not using a semaphore at all and just using mutexes, since from what you explained you're only controlling a shared resource.
Common semaphore functions
int sem_init(sem_t *sem, int pshared, unsigned int value); //Use pshared with 0, starts the semaphore with a given value
int sem_wait(sem_t *sem);//decreases the value of a semaphore, if it's in 0 it waits until it's increased
int sem_post(sem_t *sem);//increases the semaphore by 1
int sem_getvalue(sem_t *sem, int *valp);// returns in valp the value of the semaphore the returned int is error control
int sem_destroy(sem_t *sem);//destroys a semaphore created with sem_init
Common mutex functions (for Linux; not sure what OS you're running on)
int pthread_mutex_init(pthread_mutex_t *p_mutex, const pthread_mutexattr_t *attr); //starts mutex pointed by p_mutex, use attr NULL for simple use
int pthread_mutex_lock(pthread_mutex_t *p_mutex); //locks the mutex
int pthread_mutex_unlock(pthread_mutex_t *p_mutex); //unlocks the mutex
int pthread_mutex_destroy(pthread_mutex_t *p_mutex);//destroys the mutex
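To show how those calls fit together, here is a small sketch; the slot count of 3 and the names (slots, users) are made up for illustration:
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t slots;                              /* counting semaphore */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int users = 0;                            /* shared data guarded by the mutex */

static void *worker(void *arg)
{
    sem_wait(&slots);                            /* acquire one of the available slots */
    pthread_mutex_lock(&lock);
    users++;                                     /* protected update of shared state */
    printf("workers inside: %d\n", users);
    pthread_mutex_unlock(&lock);

    pthread_mutex_lock(&lock);
    users--;
    pthread_mutex_unlock(&lock);
    sem_post(&slots);                            /* release the slot */
    return NULL;
}

int main(void)
{
    sem_init(&slots, 0, 3);                      /* at most 3 workers inside at once */
    pthread_t t[5];
    for (int i = 0; i < 5; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 5; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&slots);
}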
You can treat the computers as resources. The data structure for the resource can be initialized by the main thread. Then, client threads try to acquire an instance of the resource (a computer). You can use a counting semaphore with a value of 3 for the number of computers. To acquire a computer, a client thread does
P (computer_sem).
Similarly, to release it, the client thread does
V (computer_sem)
For more information on threads and semaphore usage, refer to
POSIX Threads Synchronization in C.
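One way this could look in code for the scenario above; computer_sem, the client count of 10, and the roughly 85 ms usleep are assumptions for illustration:
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t computer_sem;           /* counting semaphore: number of free PCs */

static void *client(void *arg)
{
    long id = (long)arg;
    sem_wait(&computer_sem);         /* P(computer_sem): take a computer (blocks if none free) */
    printf("client %ld got a computer\n", id);
    usleep(85 * 1000);               /* "use" it for roughly 85 ms */
    printf("client %ld released a computer\n", id);
    sem_post(&computer_sem);         /* V(computer_sem): give it back */
    return NULL;
}

int main(void)
{
    sem_init(&computer_sem, 0, 3);   /* 3 computers in the LAN house */
    pthread_t clients[10];
    for (long i = 0; i < 10; i++)
        pthread_create(&clients[i], NULL, client, (void *)i);
    for (int i = 0; i < 10; i++)
        pthread_join(clients[i], NULL);
    sem_destroy(&computer_sem);
}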

Test program for processor load generates only 3 threads but we need more

I wrote a simple test program to produce some processor load. It spawns 6 threads and calculates pi in every thread. But only 3 threads are created on the target platform (ARM); the same program on a normal Linux PC creates all 6 threads.
What is the problem?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

#define ITERATIONS 10000000000000
#define NUM_THREADS 6

void *calculate_pi(void *threadID) {
    double i;
    double pi;
    int add = 0;
    pi = 4;
    for (i = 0; i < ITERATIONS; i++) {
        if (add == 1) {
            pi = pi + (4/(3+i*2));
            add = 0;
        } else {
            pi = pi - (4/(3+i*2));
            add = 1;
        }
    }
    printf("pi from thread %d = %20lf in %20lf iterations\n", (int)threadID, pi, i);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    pthread_t threads[NUM_THREADS];
    int rc;
    int i;
    for ( i = 0 ; i < NUM_THREADS; i++) {
        rc = pthread_create(&threads[i], NULL, calculate_pi, (void *)i);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(EXIT_FAILURE);
        }
    }
    for ( i = 0 ; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return(EXIT_SUCCESS);
}
When your main thread creates a new thread, depending on how many CPUs you've got and a bunch of other things, the library/OS can decide to switch to the new thread immediately and run that new thread until it blocks or terminates; then switch back to the main thread which creates another new thread that runs until it blocks or terminates, and so on. In this case you'd never have more than 2 threads actually running at the same time (the main thread, and one of the new threads).
Of course, the more CPUs you have, the more likely it is that the main thread will keep running long enough to spawn all of the new threads. I'm guessing that this is what has happened - your PC simply has a lot more CPUs than the ARM system.
The best way to prevent this would be to make the new threads lower priority than the main thread. That way, when the higher priority main thread creates a lower priority thread, the library/kernel should be smart enough not to stop running the higher priority thread.
Sadly, the implementation of pthreads on Linux has a habit of ignoring normal pthreads thread priorities. The last time I looked into it, the only alternative was to use real time thread priorities instead, and this required root access and creates a security/permissions disaster. This is possibly due to limitations of the underlying scheduler in the kernel (e.g. a problem that the pthreads library can't work around).
There is another alternative. If your main thread acquires a mutex before creating any new threads and releases it after all new threads are created, and if the other threads attempt to acquire (and release) the same mutex before doing any real work, then you'd force all 7 threads to exist at the same time.
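Here is a rough sketch of that mutex trick; the worker body is a placeholder, and the only point is that no thread starts real work until main() has finished creating all of them:
#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 6

static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    /* Block here until main() releases the gate, i.e. until all threads exist. */
    pthread_mutex_lock(&gate);
    pthread_mutex_unlock(&gate);

    printf("worker %ld doing real work\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    pthread_mutex_lock(&gate);                 /* close the gate first */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    pthread_mutex_unlock(&gate);               /* all created: open the gate */

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
}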
If the purpose is just to load the processors, and you have a compiler that supports OpenMP, you can use the following:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <omp.h>

double calculate_pi(int iterations) {
    double pi;
    int add = 0;
    pi = 4;
    for (int ii = 0; ii < iterations; ii++) {
        if (add == 1) {
            pi = pi + (4.0/(3.0+ii*2));
            add = 0;
        } else {
            pi = pi - (4.0/(3.0+ii*2));
            add = 1;
        }
    }
    return pi;
}

int main(int argc, char *argv[]) {
    if ( argc != 2 ) {
        printf("Usage: %s <niter>",argv[0]);
        return 1;
    }
    const int iterations = atoi(argv[1]);

    #pragma omp parallel
    {
        double pi = calculate_pi(iterations);
        printf("Thread %d, pi = %g\n",omp_get_thread_num(),pi);
    }
    return 0;
}
In this way you can set the number of iterations from the command line, and the number of threads from the environment variable OMP_NUM_THREADS. For instance:
export OMP_NUM_THREADS=4
./pi.x 1000
will run the executable with 1000 iterations and 4 threads.
There's nothing that guarantees that the operating system will create as many kernel-level threads/tasks as the threads you spawn with pthread_create. There are pthreads implementations that do everything in userland and only use one kernel-level thread and CPU. Many (most?) implementations do 1:1 threading, where one thread is one kernel-level thread, because it's the simplest to implement. Some implement an M:N hybrid model, where the userland library decides how many kernel-level threads to spawn. This might be the case for the implementation you use. "ps -eLF" will only show you the kernel-level threads; it doesn't have information about user-level threads.
The advantage of M:N threading is that context switching between the various user-level threads can be orders of magnitude faster in some cases. The disadvantage is that it's much more complicated to implement, and the implementations are usually very fragile.
Maybe 1000 seconds (in the sleep) is not enough for that many iterations to finish, so the program might be exiting before the 6 threads are done.
Have you tried joining instead of sleeping?
Try replacing the sleep() with this:
for ( i = 0 ; i < NUM_THREADS; i++) {
    s = pthread_join(threads[i], NULL);
}

Output in multi threading program

I'm writing my first basic multithreading programs and I'm running into several difficulties.
In the program below, if I put the sleep at position 1, the value of the shared data being printed is always 10, while with the sleep at position 2 the value of the shared data is always 0.
Why is this output produced?
How do I decide where the sleep should go?
Does this mean that if we place the sleep inside the mutex, the other thread is not executed at all, and thus the shared data stays 0?
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

pthread_mutex_t lock;
int shared_data = 0;

void * function(void *arg)
{
    int i;
    for(i = 0; i < 10; i++)
    {
        pthread_mutex_lock(&lock);
        shared_data++;
        pthread_mutex_unlock(&lock);
    }
    pthread_exit(NULL);
}

int main()
{
    pthread_t thread;
    void * exit_status;
    int i;
    pthread_mutex_init(&lock, NULL);
    i = pthread_create(&thread, NULL, function, NULL);
    for(i = 0; i < 10; i++)
    {
        sleep(1); //POSITION 1
        pthread_mutex_lock(&lock);
        //sleep(1); //POSITION 2
        printf("Shared data value is %d\n", shared_data);
        pthread_mutex_unlock(&lock);
    }
    pthread_join(thread, &exit_status);
    pthread_mutex_destroy(&lock);
}
When you sleep before you lock the mutex, then you're giving the other thread plenty of time to change the value of the shared variable. That's why you're seeing a value of "10" with the 'sleep' in position #1.
When you grab the mutex first, you're able to lock it fast enough that you can print out the value before the other thread has a chance to modify it. The other thread sits and blocks on the pthread_mutex_lock() call until your main thread has finished sleeping and unlocked it. At that point, the second thread finally gets to run and alter the value. That's why you're seeing a value of "0" with the 'sleep' at position #2.
This is a classic case of a race condition. On a different machine, the same code might not display "0" with the sleep call at position #2. It's entirely possible that the second thread has the opportunity to alter the value of the variable once or twice before your main thread locks the mutex. A mutex can ensure that two threads don't access the same variable at the same time, but it doesn't have any control over the order in which the two threads access it.
I had a full explanation here but ended up deleting it. This is a basic synchronization problem and you should be able to trace and identify it before tackling anything more complicated.
But I'll give you a hint: It's only the sleep() in position 1 that matters; the other one inside the lock is irrelevant as long as it doesn't change the code outside the lock.

static storage with pthread functions

I was practicing some multithreaded programs, but I could not figure out the logic behind this output.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int print_message(void* ptr);

int main()
{
    pthread_t thread1, thread2;
    char *mesg1 = "Thread 1";
    char *mesg2 = "Thread 2";
    int iret1, iret2;
    pthread_create(&thread1, NULL, print_message, (void *)mesg1);
    pthread_create(&thread2, NULL, print_message, (void *)mesg2);
    pthread_join(thread1, (void*)&iret1);
    pthread_join(thread2, (void*)&iret2);
    printf("Thread 1 return : %d\n", (int)iret1);
    printf("Thread 2 return : %d\n", (int)iret2);
    return 0;
}

int print_message(void *ptr)
{
    char *mesg;
    static int i = 0;
    mesg = (char *)ptr;
    printf("%s\n", mesg);
    i++;
    return ((void*)i);
}
I was expecting the output
Thread 1
Thread 2
Thread 1 return : 1
Thread 2 return : 2
but I am getting the output
Thread 1
Thread 2
Thread 1 return : 0
Thread 2 return : 2
Could someone please clarify this for me? And please point out any errors in my usage of the pthread functions.
The variable i is shared between both threads because it is static. The behaviour of modifying a variable between multiple threads is undefined, so, in fact, both the output you get and the output you want to get are “wrong” in the sense that the compiler is under no obligation to give it to you. In fact, I was able to get the output to change depending on the optimisation level I used and it will undoubtedly be different based on the platform.
If you want to modify i, you should use a mutex:
int print_message(void *ptr)
{
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    char *mesg;
    static int i = 0;
    int local_i;
    mesg = (char *)ptr;
    printf("%s\n", mesg);
    if (pthread_mutex_lock(&mutex) == 0) {
        local_i = ++i;
        pthread_mutex_unlock(&mutex);
    }
    return ((void*)local_i);
}
If you do not use a mutex, you will never be sure to get the output you think you should get.
There are several good books on multi-threading. I found Butenhof's Programming with POSIX Threads quite interesting, but more recent books exist.
You may also want to read this pthreads tutorial online.
Basically, each thread in your program might not view memory as intuitively as you expect (cache coherence, multiprocessing, memory models, C11).
Practically speaking, any access to data shared between threads should be protected by synchronization primitives, e.g. mutexes or rwlocks.
Also, note that debugging multi-threaded programs is challenging due to non-determinism and heisenbugs.
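Since rwlocks were mentioned alongside mutexes, here is a minimal sketch of guarding shared data with a pthread_rwlock_t; the counter and the thread counts are illustrative, not from the question:
#include <stdio.h>
#include <pthread.h>

static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_counter = 0;

static void *reader(void *arg)
{
    pthread_rwlock_rdlock(&rwlock);      /* many readers may hold this at once */
    printf("reader sees %d\n", shared_counter);
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

static void *writer(void *arg)
{
    pthread_rwlock_wrlock(&rwlock);      /* writers get exclusive access */
    shared_counter++;
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

int main(void)
{
    pthread_t r[3], w;
    pthread_create(&w, NULL, writer, NULL);
    for (int i = 0; i < 3; i++)
        pthread_create(&r[i], NULL, reader, NULL);
    pthread_join(w, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(r[i], NULL);
}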

How can I wait for any/all pthreads to complete?

I just want my main thread to wait for any and all my (p)threads to complete before exiting.
The threads come and go a lot for different reasons, and I really don't want to keep track of all of them - I just want to know when they're all gone.
wait() does this for child processes, returning ECHILD when there are no children left; however, wait() does not (appear to) work with (p)threads.
I really don't want to go through the trouble of keeping a list of every single outstanding thread (as they come and go) and then having to call pthread_join on each.
Is there a quick-and-dirty way to do this?
Do you want your main thread to do anything in particular after all the threads have completed?
If not, you can have your main thread simply call pthread_exit() instead of returning (or calling exit()).
If main() returns it implicitly calls (or behaves as if it called) exit(), which will terminate the process. However, if main() calls pthread_exit() instead of returning, that implicit call to exit() doesn't occur and the process won't immediately end - it'll end when all threads have terminated.
http://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_exit.html
Can't get too much quick-n-dirtier.
Here's a small example program that will let you see the difference. Pass -DUSE_PTHREAD_EXIT to the compiler to see the process wait for all threads to finish. Compile without that macro defined to see the process stop threads in their tracks.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>
static
void sleep(int ms)
{
struct timespec waittime;
waittime.tv_sec = (ms / 1000);
ms = ms % 1000;
waittime.tv_nsec = ms * 1000 * 1000;
nanosleep( &waittime, NULL);
}
void* threadfunc( void* c)
{
int id = (int) c;
int i = 0;
for (i = 0 ; i < 12; ++i) {
printf( "thread %d, iteration %d\n", id, i);
sleep(10);
}
return 0;
}
int main()
{
int i = 4;
for (; i; --i) {
pthread_t* tcb = malloc( sizeof(*tcb));
pthread_create( tcb, NULL, threadfunc, (void*) i);
}
sleep(40);
#ifdef USE_PTHREAD_EXIT
pthread_exit(0);
#endif
return 0;
}
The proper way is to keep track of all of your pthread_ids, but you asked for a quick and dirty way, so here it is. Basically:
just keep a total count of running threads,
increment it in the main loop before calling pthread_create,
decrement the thread count as each thread finishes.
Then sleep at the end of the main process until the count returns to 0.
volatile int running_threads = 0;
pthread_mutex_t running_mutex = PTHREAD_MUTEX_INITIALIZER;

void * threadStart()
{
    // do the thread work
    pthread_mutex_lock(&running_mutex);
    running_threads--;
    pthread_mutex_unlock(&running_mutex);
}

int main()
{
    for (i = 0; i < num_threads; i++)
    {
        pthread_mutex_lock(&running_mutex);
        running_threads++;
        pthread_mutex_unlock(&running_mutex);
        // launch thread
    }

    while (running_threads > 0)
    {
        sleep(1);
    }
}
If you don't want to keep track of your threads then you can detach the threads so you don't have to care about them, but in order to tell when they are finished you will have to go a bit further.
One trick would be to keep a list (linked list, array, whatever) of the threads' statuses. When a thread starts it sets its status in the array to something like THREAD_STATUS_RUNNING and just before it ends it updates its status to something like THREAD_STATUS_STOPPED. Then when you want to check if all threads have stopped you can just iterate over this array and check all the statuses.
Don't forget though that if you do something like this, you will need to control access to the array so that only one thread can access (read and write) it at a time, so you'll need to use a mutex on it.
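A minimal sketch of that status-array idea, assuming detached threads and made-up names (THREAD_STATUS_RUNNING and friends are illustrative):
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#define NUM_THREADS 4
enum { THREAD_STATUS_RUNNING, THREAD_STATUS_STOPPED };

static int status[NUM_THREADS];
static pthread_mutex_t status_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long idx = (long)arg;
    /* ... do the real work here ... */
    pthread_mutex_lock(&status_lock);
    status[idx] = THREAD_STATUS_STOPPED;     /* mark ourselves finished */
    pthread_mutex_unlock(&status_lock);
    return NULL;
}

static int all_stopped(void)
{
    int done = 1;
    pthread_mutex_lock(&status_lock);
    for (int i = 0; i < NUM_THREADS; i++)
        if (status[i] != THREAD_STATUS_STOPPED)
            done = 0;
    pthread_mutex_unlock(&status_lock);
    return done;
}

int main(void)
{
    for (long i = 0; i < NUM_THREADS; i++) {
        pthread_t t;
        status[i] = THREAD_STATUS_RUNNING;
        pthread_create(&t, NULL, worker, (void *)i);
        pthread_detach(t);                   /* we never join these threads */
    }
    while (!all_stopped())
        sleep(1);
    printf("All threads done\n");
}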
You could keep a list of all your thread ids and then do pthread_join on each one.
Of course, you will need a mutex to control access to the thread id list. You will
also need some kind of list that can be modified while being iterated over, maybe a std::set<pthread_t>?
int main() {
    pthread_mutex_lock(&mutex);
    void *data;
    for(threadId in threadIdList) {
        pthread_mutex_unlock(&mutex);
        pthread_join(threadId, &data);
        pthread_mutex_lock(&mutex);
    }
    printf("All threads completed.\n");
}

// called by any thread to create another
void CreateThread()
{
    pthread_t id;
    pthread_mutex_lock(&mutex);
    pthread_create(&id, NULL, ThreadInit, &id); // pass the id so the thread can use it to remove itself
    threadIdList.add(id);
    pthread_mutex_unlock(&mutex);
}

// called by each thread before it dies
void RemoveThread(pthread_t& id)
{
    pthread_mutex_lock(&mutex);
    threadIdList.remove(id);
    pthread_mutex_unlock(&mutex);
}
Thanks all for the great answers! There has been a lot of talk about using memory barriers etc., so I figured I'd post an answer that properly shows them used for this.
#define NUM_THREADS 5

unsigned int thread_count;

void *threadfunc(void *arg) {
    printf("Thread %p running\n", arg);
    sleep(3);
    printf("Thread %p exiting\n", arg);
    __sync_fetch_and_sub(&thread_count, 1);
    return 0L;
}

int main() {
    int i;
    pthread_t thread[NUM_THREADS];

    thread_count = NUM_THREADS;
    for (i = 0; i < NUM_THREADS; i++) {
        pthread_create(&thread[i], 0L, threadfunc, &thread[i]);
    }

    do {
        __sync_synchronize();
    } while (thread_count);

    printf("All threads done\n");
}
Note that the __sync functions are "non-standard" GCC built-ins. LLVM supports these too, but if you're using another compiler, you may have to do something different.
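For example (a sketch assuming a C11 compiler, not a drop-in for every toolchain), the same counter could be written portably with <stdatomic.h>:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 5

static atomic_uint thread_count;

static void *threadfunc(void *arg) {
    printf("Thread %p running\n", arg);
    sleep(3);
    printf("Thread %p exiting\n", arg);
    atomic_fetch_sub(&thread_count, 1);          /* same role as __sync_fetch_and_sub */
    return NULL;
}

int main(void) {
    pthread_t thread[NUM_THREADS];
    atomic_store(&thread_count, NUM_THREADS);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&thread[i], NULL, threadfunc, &thread[i]);
    while (atomic_load(&thread_count) != 0)       /* spin until every thread has counted down */
        ;
    printf("All threads done\n");
}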
Another big thing to note: why would you burn an entire core, or waste "half" of a CPU, spinning in a tight poll loop just waiting for the others to finish, when you could easily put it to work? The following mod uses the initial thread to run one of the workers, then waits for the others to complete:
    thread_count = NUM_THREADS;
    for (i = 1; i < NUM_THREADS; i++) {
        pthread_create(&thread[i], 0L, threadfunc, &thread[i]);
    }

    threadfunc(&thread[0]);

    do {
        __sync_synchronize();
    } while (thread_count);

    printf("All threads done\n");
}
Note that we start creating the threads at "1" instead of "0", then directly run "thread 0" inline, waiting for all threads to complete after it's done. We pass &thread[0] to it for consistency (even though it's meaningless here), though in reality you'd probably pass your own variables/context.
