Use pthread condition variables in a ping-pong test (C)

I am using pthread condition variables as the synchronization primitive in a ping-pong test. The ping-pong test consists of two threads that execute alternately: each thread writes to the other thread's memory and wakes it up with a signal, then sleeps on its own memory, which the other thread will write later. Here is my first version. It works fine when I loop the ping-pong test 10,000 times, but when I change the count to 100,000 it hangs occasionally, and N = 1,000,000 makes it hang every time. I tried to debug by printing the loop number on each iteration, but the program never hangs again once I add the print statement, which is frustrating. Here is the ping-pong test code:
for(i=0; i<N+1; i++)
{
    if(i==N)
    {
        pthread_cond_signal(&cond[dest]);
        break;
    }
    pthread_mutex_lock(&mutex[dest]);
    messages[dest]=my_rank;
    pthread_cond_signal(&cond[dest]);
    pthread_mutex_unlock(&mutex[dest]);
    pthread_mutex_lock(&mutex[my_rank]);
    while(pthread_cond_wait(&cond[my_rank], &mutex[my_rank]) && messages[my_rank]!=dest);
    messages[my_rank]=my_rank;
    pthread_mutex_unlock(&mutex[my_rank]);
    printf("rank=%ld i=%ld messages[%ld]=%ld\n", my_rank, i, my_rank, messages[my_rank]);
}
Then I tried a second version, which works and never hangs, even when I set N to 1,000,000. I changed from two mutexes to one mutex shared by the two condition variables. I am not sure if this is the right way to go, but this one never hangs. Here is the code:
for(i=0; i<N+1; i++)
{
    if(i==N)
    {
        pthread_cond_signal(&cond[dest]);
        break;
    }
    pthread_mutex_lock(&mutex[0]);
    messages[dest]=my_rank;
    pthread_cond_signal(&cond[dest]);
    while(pthread_cond_wait(&cond[my_rank], &mutex[0]) && messages[my_rank]!=dest);
    messages[my_rank]=my_rank;
    pthread_mutex_unlock(&mutex[0]);
}
I am very confused. Could somebody explain why the first version hangs but the second version does not? Is it correct for two condition variables to share a single mutex?
Thanks.
Thanks to everyone, especially caf. Here is my final code, which works without hanging.
for(i=0; i<N+1; i++)
{
    pthread_mutex_lock(&mutex[dest]);
    messages[dest]=my_rank;
    pthread_cond_signal(&cond[dest]);
    pthread_mutex_unlock(&mutex[dest]);
    if(i!=N)
    {
        pthread_mutex_lock(&mutex[my_rank]);
        while(messages[my_rank]!=dest)
            pthread_cond_wait(&cond[my_rank], &mutex[my_rank]);
        messages[my_rank]=my_rank;
        pthread_mutex_unlock(&mutex[my_rank]);
    }
}
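For anyone who wants to reproduce it, here is a minimal self-contained harness around this final loop. The thread setup is my own reconstruction (the question doesn't show it): it assumes exactly two threads, with my_rank being 0 or 1 and dest = 1 - my_rank.

#include <pthread.h>
#include <stdio.h>

#define N 1000000L

static long messages[2] = {-1, -1}; /* -1 means "no message pending" */
static pthread_mutex_t mutex[2] = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER};
static pthread_cond_t cond[2] = {PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER};

static void *pingpong(void *arg)
{
    long my_rank = (long)arg;
    long dest = 1 - my_rank;
    long i;

    for(i=0; i<N+1; i++)
    {
        pthread_mutex_lock(&mutex[dest]);
        messages[dest]=my_rank;
        pthread_cond_signal(&cond[dest]);
        pthread_mutex_unlock(&mutex[dest]);
        if(i!=N)
        {
            pthread_mutex_lock(&mutex[my_rank]);
            while(messages[my_rank]!=dest)
                pthread_cond_wait(&cond[my_rank], &mutex[my_rank]);
            messages[my_rank]=my_rank; /* consume the message */
            pthread_mutex_unlock(&mutex[my_rank]);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    long r;

    for(r=0; r<2; r++)
        pthread_create(&t[r], NULL, pingpong, (void *)r);
    for(r=0; r<2; r++)
        pthread_join(t[r], NULL);
    puts("done");
    return 0;
}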

The problem is in this line:
while (pthread_cond_wait(&cond[my_rank], &mutex[my_rank]) && messages[my_rank]!=dest);
If the 'dest' thread gets scheduled after you unlock mutex[dest] and before you lock mutex[my_rank], it will set messages[my_rank] and signal the condition variable before this thread calls pthread_cond_wait(), so this thread will wait forever.
The fix for this is very easy: test messages[my_rank] before waiting on the condition variable. You also don't want && here, because you always want to keep looping as long as messages[my_rank] != dest is true; you don't want to break out at the first non-zero return from pthread_cond_wait(). So, if you want to ignore errors from pthread_cond_wait() (as your original does, and this is perfectly fine if you aren't using error-checking or robust mutexes, since those are the only cases in which pthread_cond_wait() is allowed to fail), use:
while (messages[my_rank] != dest)
    pthread_cond_wait(&cond[my_rank], &mutex[my_rank]);
The reason your alternate version doesn't have this bug is that the lock is held continuously between signalling the dest thread and waiting on the condition variable, so the dest thread doesn't get a chance to run until we're definitely waiting.
As for your supplementary question:
Is it correct for two condition variables to share a single mutex?
Yes, this is allowed, but the converse is not (you cannot have two threads waiting on the same condition variable at the same time, using different mutexes).

First of all, sorry, I wanted to put this into a comment but I can't yet.
In your code I don't quite understand what "my_rank" and "dest" are, as I would expect my_rank to vary within those loops, but I have found the following, which may help you: http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003125000000000000000
There it says:
You should always call pthread_cond_signal() under the protection of the same mutex used with the condition variable being signaled. Otherwise, the condition variable could be signaled between the test of the associated condition and blocking in pthread_cond_wait(), which can cause an infinite wait.
As you're using multiple mutexes in the first version, that may be the reason.
(An admin could move this to the comments, please?)

Related

How to properly synchronize threads at barriers

I am encountering an issue where I have a hard time telling which synchronization primitive I should use.
I am creating n parallel threads that work on a region of memory; each is assigned a specific part of this region and can accomplish its task independently of the others. At some point, though, I need to collect the results of all the threads' work, which is a good case for using barriers, and that is what I'm doing.
I must use one of the n worker threads to collect the results of all their work. For this I have the following code, which follows the computation code in my thread function:
if (pthread_barrier_wait(thread_args->barrier)) {
    // Only gets called on the last thread that goes through the barrier
    // This is where I want to collect the results of the worker threads
}
So far so good, but here is where I get stuck: the code above is in a loop, as I want the threads to do their work again for a certain number of loop spins. The idea is that each time pthread_barrier_wait unblocks, all threads have finished their work and the next iteration of the loop / parallel work can start again.
The problem with this is that the statements in the result-collector block are not guaranteed to execute before the other threads start working on the region again, so there is a race condition. I am thinking of using a condition variable like this:
// This code is placed in the thread entry point function, inside
// a loop that also contains the parallel processing code.
if (pthread_barrier_wait(thread_args->barrier)) {
    // We lock the mutex
    pthread_mutex_lock(thread_args->mutex);
    collectAllWork(); // We process the work from all threads
    // Set ready to 1
    thread_args->ready = 1;
    // We broadcast the condition variable and check it was successful
    if (pthread_cond_broadcast(thread_args->cond)) {
        printf("Error while broadcasting\n");
        exit(1);
    }
    // We unlock the mutex
    pthread_mutex_unlock(thread_args->mutex);
} else {
    // Wait until the collector thread has finished its work so
    // we can start working again
    pthread_mutex_lock(thread_args->mutex);
    while (thread_args->ready == 0) {
        pthread_cond_wait(thread_args->cond, thread_args->mutex);
    }
    pthread_mutex_unlock(thread_args->mutex);
}
There are multiple issues with this:
For some reason pthread_cond_broadcast never wakes any of the threads blocked in pthread_cond_wait, and I have no idea why.
What happens if a thread calls pthread_cond_wait after the collector thread has already broadcast? I believe while (thread_args->ready == 0) and thread_args->ready = 1 prevent this, but then see the next point...
On the next loop spin, ready will still be set to 1, so no thread will call pthread_cond_wait again. I don't see a good place to set ready back to 0: if I do it in the else block after pthread_cond_wait, there is the possibility that a thread that wasn't waiting yet reads 0 and starts waiting even though I have already broadcast from the if block.
Note I am required to use barriers for this.
How can I solve this issue?
You could use two barriers (work and collector):
while (true) {
    //do work
    //every thread waits until the last thread has finished its work
    if (pthread_barrier_wait(thread_args->work_barrier)) {
        //only one gets through, then does the collecting
        collectAllWork();
    }
    //every thread will wait until the collector has reached this point
    pthread_barrier_wait(thread_args->collect_barrier);
}
You could use a kind of double buffering.
Each worker would have two storage slots for its results.
Between the barriers, the workers would store their results to one slot while the collector reads results from the other slot (a sketch follows the workflow below).
This approach has a few advantages:
no extra barriers
no condition queues
no locking
the slot identifier does not even have to be atomic, because each thread can keep its own copy of it and toggle it whenever it reaches a barrier
much more performant, as workers can keep working while the collector processes the other slot
Exemplary workflow:
Iteration 1.
workers write to slot 0
collector does nothing because no data is ready
all wait for barrier
Iteration 2.
workers write to slot 1
collector reads from slot 0
all wait for barrier
Iteration 3.
workers write to slot 0
collector reads from slot 1
all wait for barrier
Iteration 4.
go to iteration 2
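Here is a minimal sketch of that workflow with a single barrier; the result array, the placeholder work, and the summing collector are illustrative assumptions, not code from the question:

#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4
#define N_ITERS 100

static pthread_barrier_t barrier;
static long results[2][N_THREADS]; /* two result slots per worker */

static void *worker(void *arg)
{
    long id = (long)arg;
    int slot = 0; /* private slot index, toggled at every barrier */
    int iter;

    for (iter = 0; iter < N_ITERS; iter++) {
        results[slot][id] = id * 1000 + iter; /* placeholder "work" */

        /* everyone meets here; exactly one thread gets the serial value */
        if (pthread_barrier_wait(&barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
            /* collect the slot that was just filled; the other workers are
               already free to start filling the opposite slot */
            long sum = 0;
            int i;
            for (i = 0; i < N_THREADS; i++)
                sum += results[slot][i];
            printf("iteration %d: sum = %ld\n", iter, sum);
        }
        slot ^= 1; /* flip to the other slot for the next round */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[N_THREADS];
    long i;

    pthread_barrier_init(&barrier, NULL, N_THREADS);
    for (i = 0; i < N_THREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

Whichever thread receives PTHREAD_BARRIER_SERIAL_THREAD collects the slot that was just filled while the others are already filling the opposite slot; nobody can write a given slot again until the next barrier releases, and the collector only reaches that barrier after it has finished reading.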

What will be the behavior of a while loop that encloses pthread_cond_wait?

I would like to know what happens with the while after a waiting thread is woken.
To avoid 'spurious wake-ups', the pthreads documentation points out that you need to use pthread_cond_wait inside a while statement.
So, when pthread_cond_wait is called, the calling thread is blocked. And after it is signaled, the thread resumes inside the while.
In my code, I'm handling a pthread_cond_wait() this way:
pthread_mutex_lock(&waiting_call_lock);
while(1) {
    pthread_cond_wait(&queue_cond[thread_ctx], &waiting_call_lock);
    break;
}
pthread_mutex_unlock(&waiting_call_lock);
The question is: will it try to enter the while again, or does it somehow break out of the while and go on? In other words, is the break after the pthread_cond_wait() necessary?
To get this right, I think it is best to start by asking yourself: "what is the thread waiting for?"
The answer should not be "it should wait until another thread signals it", because the way condition variables work assumes you have something else, some kind of mutex-protected information, that the thread is waiting for.
To illustrate this I will invent an example here where the thread is supposed to wait until a variable called counter is larger than 7. The variable counter is accessible by multiple threads and is protected by a mutex that I will call theMutex. Then the code involving the pthread_cond_wait call could look like this:
pthread_mutex_lock(&theMutex);
while(counter <= 7) {
    pthread_cond_wait(&cond, &theMutex);
}
pthread_mutex_unlock(&theMutex);
Now, if a "spurious wake-up" happens, the program will check the condition (counter <= 7) again and find it is still not satisfied, so it will stay in the loop and call pthread_cond_wait again. This ensures that the thread will not proceed past the while loop until the condition is satisfied.
Since spurious wake-ups rarely happen in practice, it may be of interest to trigger one deliberately to check that your implementation works properly; here is a discussion about that: How to trigger spurious wake-up within a Linux application?
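To make the waiting pattern concrete, here is a self-contained sketch; the incrementing thread, the initial values, and the loop bound are my own illustrative additions:

#include <pthread.h>
#include <stdio.h>

static int counter = 0;
static pthread_mutex_t theMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void *incrementer(void *arg)
{
    int i;
    for (i = 0; i < 10; i++) {
        pthread_mutex_lock(&theMutex);
        counter++;                  /* change the shared state...        */
        pthread_cond_signal(&cond); /* ...then notify the waiting thread */
        pthread_mutex_unlock(&theMutex);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, incrementer, NULL);

    pthread_mutex_lock(&theMutex);
    while (counter <= 7) /* re-checked after every wake-up, spurious or not */
        pthread_cond_wait(&cond, &theMutex);
    printf("counter reached %d\n", counter); /* still under the lock */
    pthread_mutex_unlock(&theMutex);

    pthread_join(t, NULL);
    return 0;
}

Note that the break in the original snippet defeats this re-checking: without a predicate, a spurious wake-up lets the thread fall straight through.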

CreateMutex with bInitialOwner=true seems to be acting weird

I am really at a loss here.
I read the MSDN articles thoroughly but still can't figure out what is going on.
My problem is this: when running a few processes with the following code, all is working:
HANDLE m = CreateMutex(NULL, FALSE, L"MyMutex");
char c[20] = "2print";
for(int iter = 0; iter<100; ++iter) {
    WaitForSingleObject(m,INFINITE);
    for(int i=0;i<(int)strlen(c);++i) {
        for(int j=0;j<10000;++j)
            printf("%c",c[i]);
    }
    BOOL ok = ReleaseMutex(m);
}
CloseHandle(m);
that is, the different processes each print in their turn and release the mutex, until all the printing is done.
BUT after changing the CreateMutex call (bInitialOwner from FALSE to TRUE) to:
HANDLE m = CreateMutex(NULL, TRUE, L"MyMutex");
the first creator never releases the mutex! The other processes just sit there.
What amazed me was that adding an additional ReleaseMutex, that is, changing:
BOOL ok = ReleaseMutex(m);
into:
BOOL ok = ReleaseMutex(m);
ok = ReleaseMutex(m);
makes it work!
I am really confused: why wouldn't the first creator release the mutex correctly?
I tried printing all the errors using GetLastError, and what I get seems reasonable, like ERROR_ALREADY_EXISTS for the creators following the first one, which is just what I expect (MSDN says that in this situation the bInitialOwner flag is simply ignored).
When you use bInitialOwner=TRUE, the mutex creator automatically acquires the mutex. Then, when you call WaitForSingleObject, it acquires the mutex again. Since Win32 mutexes are recursive, you must release the mutex once for each time you acquired it, so two ReleaseMutex calls are needed for the initial creator (but every other process must release only once!).
You can avoid this by either not using bInitialOwner, or by skipping the WaitForSingleObject on the first loop iteration if and only if GetLastError() != ERROR_ALREADY_EXISTS. If you choose the latter, you will need to call SetLastError(0) prior to CreateMutex to clear the error code.
If you only need bInitialOwner for some kind of initial setup, it will simplify your code if you drop the mutex prior to entering the common loop. Otherwise, I would strongly recommend simply not using bInitialOwner, unless you have a compelling reason to do so.
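A rough sketch of the ERROR_ALREADY_EXISTS variant, based on the code from the question (error handling trimmed, so treat it as an illustration rather than production code):

#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    SetLastError(0); /* clear any stale error code before CreateMutex */
    HANDLE m = CreateMutex(NULL, TRUE, L"MyMutex");
    /* only the actual creator owns the mutex at this point */
    BOOL owned = (GetLastError() != ERROR_ALREADY_EXISTS);

    char c[20] = "2print";
    for (int iter = 0; iter < 100; ++iter) {
        if (!owned)                    /* the creator skips the first wait */
            WaitForSingleObject(m, INFINITE);
        owned = FALSE;                 /* skip only on the first iteration */
        for (int i = 0; i < (int)strlen(c); ++i)
            for (int j = 0; j < 10000; ++j)
                printf("%c", c[i]);
        ReleaseMutex(m);               /* one release per acquisition */
    }
    CloseHandle(m);
    return 0;
}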

Multithreaded spin lock?

My daemon initializes itself in four different threads before it starts doing its work. Right now I use a counter which is incremented when a thread is started and decremented when it is finished. When the counter hits 0, I call the initialization-finished callback.
Is this the preferred way to do it, or are there better ways? I'm using POSIX threads (pthread), and I currently just spin in a while loop waiting for the counter to hit 0.
Edit: pthread_barrier_* functions are not available on my platform although they do seem to be the best choice.
Edit 2: Not all threads exit. Some initialize and then listen to events. Basically the thread needs to say, "I'm done initializing".
A barrier is what you need. They were created for that, when you need to "meet up" at certain points before continuing. See pthread_barrier_*
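Where pthread barriers are available (the question's edit notes they aren't on this platform), the usage is roughly this; N_THREADS and the placement of the calls are illustrative:

pthread_barrier_t bar;
/* count is the workers plus the main thread that waits for them */
pthread_barrier_init(&bar, NULL, N_THREADS + 1);

/* in each worker, once its initialization is done: */
pthread_barrier_wait(&bar);

/* in main, to block until every worker has initialized: */
pthread_barrier_wait(&bar);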
Rather than spinning, use the pthread mutex/condvar primitives. I'd suggest a single mutex to protect both the count of threads outstanding, and the condvar.
The main loop looks like this:
pthread_mutex_lock(&mutex);
count = N_THREADS;
/* start your N threads here */
while (count != 0)
    pthread_cond_wait(&condvar, &mutex);
pthread_mutex_unlock(&mutex);
And when each thread is ready it would do something like this:
pthread_mutex_lock(&mutex);
count--;
pthread_cond_signal(&condvar);
pthread_mutex_unlock(&mutex);
(EDIT: I have assumed that the threads are to keep going once they have done their initialisation stuff. If they are to finish, use pthread_join as others have said.)
pthread_join is the preferred way to wait for pthreads.
That sounds ... weird. Shouldn't you just be using pthread_join() to wait for the threads to complete? Maybe I don't understand the question.
As Klas Lindbäck pointed out in his answer, joining threads is a preferred way to go.
In case your threads are not exiting (i.e. they are part of a reusable pool, etc.), the logic sounds good. The only thing is that using the counter without any synchronisation is dangerous. You have to use either a mutex with a condition variable or an atomic integer. I'd recommend mutex + condition if you don't want the waiting thread to spin on an atomic counter.
So, what happens if one thread finishes initialization before any of the others begin?
So, one way to do it:
initialize an atomic counter to 0
when each thread is done with its init, increment the counter and retrieve the value atomically. With GCC you can use __sync_add_and_fetch()
if the retrieved counter value < N_threads, block on a pthread condition variable
if the retrieved counter value == N_threads, the init phase is done; signal the condition and continue
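A hedged sketch of that scheme using the GCC builtin; the mutex, condition variable, and function names are illustrative:

#include <pthread.h>

#define N_THREADS 4

static int counter = 0;
static pthread_mutex_t init_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t init_cv = PTHREAD_COND_INITIALIZER;

/* each thread calls this once its own initialization is done */
static void init_done(void)
{
    int v = __sync_add_and_fetch(&counter, 1); /* atomic increment-and-read */

    pthread_mutex_lock(&init_mu);
    if (v == N_THREADS) {
        pthread_cond_broadcast(&init_cv); /* last one in: wake everybody */
    } else {
        /* adding 0 is an atomic read of the counter */
        while (__sync_add_and_fetch(&counter, 0) < N_THREADS)
            pthread_cond_wait(&init_cv, &init_mu);
    }
    pthread_mutex_unlock(&init_mu);
}

The predicate is re-checked under the mutex, so a thread that only reaches the wait after the final broadcast simply sees the counter already at N_THREADS and skips waiting.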

Barriers for thread syncing

I'm creating n threads and then starting their execution after a barrier breakdown.
In global data space:
int bkdown = 0;
In main():
pthread_barrier_init(&bar,NULL,n);
for(i=0;i<n;i++)
{
    pthread_create(&threadIdArray[i],NULL,runner,NULL);
    if(i==n-2)printf("breakdown imminent!\n");
    if(i==n-1)printf("breakdown already occurred!\n");
}
In thread runner function:
void *runner(void *param)
{
    pthread_barrier_wait(&bar);
    if(bkdown==0){bkdown=1;printf("barrier broken down!\n");}
    ...
    pthread_exit(NULL);
}
Expected order:
breakdown imminent!
barrier broken down!
breakdown already occurred!
Actual order: (tested repeatedly)
breakdown imminent!
breakdown already occurred!
barrier broken down!
Could someone explain why I am not getting the "broken down" message before the "already occurred" message?
The order in which threads are run is dependent on the operating system. Just because you start a thread doesn't mean the OS is going to run it immediately.
If you really want to control the order in which threads are executed, you have to put some kind of synchronization in there (with mutexes or condition variables.)
for(i=0;i<n;i++)
{
    pthread_create(&threadIdArray[i],NULL,runner,NULL);
    if(i==n-2)printf("breakdown imminent!\n");
    if(i==n-1)printf("breakdown already occurred!\n");
}
Nothing stops this loop from executing until i == n-1. pthread_create() just fires off a thread to be run; it doesn't wait for the thread to start or finish. Thus you're at the mercy of the scheduler, which might decide to continue executing your loop, or switch to one of the newly created threads (or do both, on an SMP system).
You're also initializing the barrier to n, so in any case none of the threads will get past the barrier until you've created all of them.
In addition to the answers of nos and Starkey, you have to take into account that you have another serialization in your code that is often neglected: you are doing IO on the same FILE variable, namely stdout.
The access to that variable is mutexed internally, and the order in which your n+1 threads (including your calling thread) get access to that mutex is implementation defined; take it as basically random in your case.
So the order in which you get your printf output is the order in which your threads pass through these wormholes.
You can get the expected order in one of two ways:
Create each thread with a higher priority than the main thread. This will ensure that each new thread runs immediately after creation and waits on the barrier.
Move the "breakdown imminent!\n" print before the pthread_create() and add a sched_yield() call after every pthread_create(). This asks the scheduler to run the newly created thread (a sketch of this follows).
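A sketch of the second option, based on the loop from the question; sched_yield() (from <sched.h>) is only a hint to the scheduler, so the expected order becomes likely rather than guaranteed:

for(i=0;i<n;i++)
{
    if(i==n-1)printf("breakdown imminent!\n");
    pthread_create(&threadIdArray[i],NULL,runner,NULL);
    sched_yield(); /* give the new thread a chance to reach the barrier */
    if(i==n-1)printf("breakdown already occurred!\n");
}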
