Possible causes of infinite wait in pthread_cond_wait - C

I am currently trying to analyse an issue in third-party source code where a thread (the code snippet corresponds to THREAD-T1) is in an infinite wait state. The suspicion is that the thread is stuck in pthread_cond_wait. The details are as follows.
Code description
T1 makes an asynchronous call to an API exposed by T2.
T1 then moves to a blocking wait on a condition variable (say cond_t).
The condition variable cond_t is signalled in the callback event generated by T2.
The above cycle is repeated n times until the API returns success.
To summarize, this series of steps makes the asynchronous call behave like a synchronous one through the use of condition variables.
Sample code
#define MAX_RETRY (3)
bool g_b_ret_val;
pthread_mutex_t g_cond_mutex;
pthread_mutex_t g_ret_val_mutex; /* Assume initialized in the main thread */
pthread_cond_t g_cond_t; /* Assume initialized in the main thread */
void retry_async_call_routine(void) /* Thread-T1 */
{
    int retry = 0;

    while ((false == g_b_ret_val) && (retry < MAX_RETRY))
    {
        (void)invoke_async_api();
        pthread_mutex_init(&g_cond_mutex, NULL);
        pthread_mutex_lock(&g_cond_mutex);
        pthread_cond_wait(&g_cond_t, &g_cond_mutex);
        pthread_mutex_unlock(&g_cond_mutex);
        pthread_mutex_destroy(&g_cond_mutex);
        retry++;
    }
}
void callback_routine(void) /* Thread-T2 */
{
    pthread_mutex_lock(&g_ret_val_mutex);
    g_b_ret_val = true; /* May be false also on failure */
    pthread_mutex_unlock(&g_ret_val_mutex);
    pthread_cond_signal(&g_cond_t);
}
Known issues that I see in the code
Missing retest of the condition in a while loop around pthread_cond_wait
Missing mutex lock while signalling
Questions
Please point out any more loopholes or possibilities of infinite wait (if any).
g_cond_t is not reset using pthread_cond_destroy between successive waits; what is the behaviour in that case? (Any references regarding this?)

This code seems absurd. You are not supposed to create and destroy a mutex just so you can wait on the condition variable. A mutex needs to be created before thread-shared data are used, then the mutex must be used to protect the shared data. In this case, that's g_ret_val_mutex which protects g_b_ret_val.
The condition variable itself is just used for waiting (with regular or timed wait) and signaling (signal or broadcast). It generally does not need its own lock, and in fact, having a separate one (as in the above loop) gets in the way of calling pthread_cond_wait, which takes only one mutex to unlock, not two. There's no need to destroy and re-create condition variables unless you need new/different attributes.
The key to "not getting stuck"—avoiding infinite wait—is to guarantee that, whenever a thread calls pthread_cond_wait, there is definitely some other thread that will, in the future, call pthread_cond_signal (or pthread_cond_broadcast). That is, the waiter tests "why to wait" first, with the "why" part locked, then waits only if the "why" part says "you should wait". The wake-up thread may use the same lock to determine that a wake-up is necessary, or—if the wake-up thread is "lazy", as in the above example—simply issues a "wake up call" every time.
The minimal change for correctness would thus seem to be to change the loop to read:
pthread_mutex_lock(&g_ret_val_mutex);
for (retry = 0; retry < MAX_RETRY && !g_b_ret_val; retry++) {
    (void)invoke_async_api();
    pthread_cond_wait(&g_cond_t, &g_ret_val_mutex);
}
success = g_b_ret_val; /* if false, we failed */
/* or: success = retry < MAX_RETRY; -- same result */
pthread_mutex_unlock(&g_ret_val_mutex);
(Aside: g_cond_t is a terrible name for a variable; the _t suffix is meant for types.)
It's sometimes wise to separate "some thread needs a wake-up" from "final result of that thread is success". If needed, I'd probably add that using a second boolean. Let's call it g_waiting, which we set true when callback_routine() is (supposedly) guaranteed to be called and it should do a wake-up event, and false when it's not guaranteed to be called or the wakeup is not required. This kind of coding allows you to switch to pthread_cond_timedwait, in case the asynchronous event might never occur for some reason.
Given that g_ret_val_mutex protects g_b_ret_val, it's appropriate to use that for the "waiting" flag as well—adding another mutex just offers more opportunities for problems, here. So now we get:
pthread_mutex_lock(&g_ret_val_mutex);
for (retry = 0; retry < MAX_RETRY && !g_b_ret_val; retry++) {
    (void)invoke_async_api();
    compute_wakeup_time(&abstime);
    g_waiting = true;
    pthread_cond_timedwait(&g_cond_t, &g_ret_val_mutex, &abstime);
    if (g_waiting) {
        /* timeout occurred, we never got our callback */
        /* may want something special for this case */
    } else {
        /* wakeup occurred, result is in g_b_ret_val */
    }
}
success = g_b_ret_val;
/* or: success = retry < MAX_RETRY; */
g_waiting = false;
pthread_mutex_unlock(&g_ret_val_mutex);
Meanwhile:
void callback_routine(void) /* Thread-T2 */
{
    pthread_mutex_lock(&g_ret_val_mutex);
    g_b_ret_val = compute_success_or_failure();
    if (g_waiting) {
        g_waiting = false;
        pthread_cond_signal(&g_cond_t);
    }
    pthread_mutex_unlock(&g_ret_val_mutex);
}
I've moved the signal to "inside" the mutex, although it's OK either way, so that I can do it only if g_waiting is set, and clear g_waiting. Since we hold the mutex, it's OK to clear g_waiting either before or after calling pthread_cond_signal (as long as no other code will interrupt the sequence).
Note: if we do start using timedwait, we need to find out whether it is OK to call invoke_async_api again when an earlier invocation produced no result before the timeout.
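In case it helps, here is a minimal sketch of what the hypothetical compute_wakeup_time() helper used above might look like, assuming a fixed timeout (the 5 seconds here is purely illustrative) and the CLOCK_REALTIME clock that pthread_cond_timedwait() uses by default:
#include <time.h>

#define WAKEUP_TIMEOUT_SEC 5  /* assumed timeout; tune as needed */

/* Hypothetical helper: compute an absolute deadline for
 * pthread_cond_timedwait(), which measures against CLOCK_REALTIME
 * unless the condvar was created with a different clock attribute. */
static void compute_wakeup_time(struct timespec *abstime)
{
    clock_gettime(CLOCK_REALTIME, abstime);
    abstime->tv_sec += WAKEUP_TIMEOUT_SEC;
}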

Related

C Timer with cancellation and callback

I've created a Timer pseudo-class in C that has callback capability and can be cancelled. I come from the .NET/C# world where this is all done by the framework, and I'm not an expert with pthreads.
In .NET there are cancellation tokens which you can wait on, which means I don't need to worry so much about the nuts and bolts.
However, using pthreads is a bit more low-level than I am used to, so my question is:
Are there any issues with the way I have implemented this?
Thanks in anticipation for any comments you may have.
Timer struct:
typedef struct _timer
{
    pthread_cond_t Condition;
    pthread_mutex_t ConditionMutex;
    bool IsRunning;
    pthread_mutex_t StateMutex;
    pthread_t Thread;
    int TimeoutMicroseconds;
    void * Context;
    void (*Callback)(bool isCancelled, void * context);
} TimerObject, *Timer;
C Module:
static void *
TimerTask(void *arg)
{
    Timer timer = (Timer)arg; /* signature matches what pthread_create expects */
    struct timespec timespec;
    int returnValue = 0;

    clock_gettime(CLOCK_REALTIME, &timespec);
    timespec.tv_sec += timer->TimeoutMicroseconds / 1000000;
    timespec.tv_nsec += (timer->TimeoutMicroseconds % 1000000) * 1000; /* microseconds to nanoseconds */
    if (timespec.tv_nsec >= 1000000000) { /* keep tv_nsec in range */
        timespec.tv_sec += 1;
        timespec.tv_nsec -= 1000000000;
    }
    pthread_mutex_lock(&timer->StateMutex);
    timer->IsRunning = true;
    pthread_mutex_unlock(&timer->StateMutex);
    pthread_mutex_lock(&timer->ConditionMutex);
    returnValue = pthread_cond_timedwait(&timer->Condition, &timer->ConditionMutex, &timespec);
    pthread_mutex_unlock(&timer->ConditionMutex);
    if (timer->Callback != NULL)
    {
        (*timer->Callback)(returnValue != ETIMEDOUT, timer->Context);
    }
    pthread_mutex_lock(&timer->StateMutex);
    timer->IsRunning = false;
    pthread_mutex_unlock(&timer->StateMutex);
    return 0;
}
void
Timer_Initialize(Timer timer, void (*callback)(bool isCancelled, void * context))
{
    pthread_mutex_init(&timer->ConditionMutex, NULL);
    timer->IsRunning = false;
    timer->Callback = callback;
    pthread_mutex_init(&timer->StateMutex, NULL);
    pthread_cond_init(&timer->Condition, NULL);
}
bool
Timer_IsRunning(Timer timer)
{
    pthread_mutex_lock(&timer->StateMutex);
    bool isRunning = timer->IsRunning;
    pthread_mutex_unlock(&timer->StateMutex);
    return isRunning;
}
void
Timer_Start(Timer timer, int timeoutMicroseconds, void * context)
{
    timer->Context = context;
    timer->TimeoutMicroseconds = timeoutMicroseconds;
    pthread_create(&timer->Thread, NULL, TimerTask, (void *)timer);
}
void
Timer_Stop(Timer timer)
{
    void * returnValue;
    pthread_mutex_lock(&timer->StateMutex);
    if (!timer->IsRunning)
    {
        pthread_mutex_unlock(&timer->StateMutex);
        return;
    }
    pthread_mutex_unlock(&timer->StateMutex);
    pthread_cond_broadcast(&timer->Condition);
    pthread_join(timer->Thread, &returnValue);
}
void
Timer_WaitFor(Timer timer)
{
    void * returnValue;
    pthread_join(timer->Thread, &returnValue);
}
Example use:
void
TimerExpiredCallback(bool cancelled, void * context)
{
    fprintf(stderr, "TimerExpiredCallback %s with context %s\n",
            cancelled ? "Cancelled" : "Timed Out",
            (char *)context);
}
void
ThreadedTimerExpireTest()
{
    TimerObject timerObject;
    Timer_Initialize(&timerObject, TimerExpiredCallback);
    Timer_Start(&timerObject, 5 * 1000000, "Threaded Timer Expire Test");
    Timer_WaitFor(&timerObject);
}
void
ThreadedTimerCancelTest()
{
    TimerObject timerObject;
    Timer_Initialize(&timerObject, TimerExpiredCallback);
    Timer_Start(&timerObject, 5 * 1000000, "Threaded Timer Cancel Test");
    Timer_Stop(&timerObject);
}
Overall, it seems pretty solid work for someone who ordinarily works in different languages and who has little pthreads experience. The idea seems to revolve around pthread_cond_timedwait() to achieve a programmable delay with a convenient cancellation mechanism. That's not unreasonable, but there are, indeed, a few problems.
For one, your condition variable usage is non-idiomatic. The conventional and idiomatic use of a condition variable associates with each wait a condition for whether the thread is clear to proceed. This is tested, under protection of the mutex, before waiting. If the condition is satisfied then no wait is performed. It is tested again after each wakeup, because there is a variety of scenarios in which a thread may return from waiting even though it is not actually clear to proceed. In these cases, it loops back and waits again.
I see at least two such possibilities with your timer:
The timer is cancelled very quickly, before its thread starts to wait. Condition variables do not queue signals, so in this case the cancellation would be ineffective. This is a form of race condition.
Spurious wakeup. This is always a possibility that must be considered. Spurious wakeups are rare under most circumstances, but they really do happen.
It seems natural to me to address that by generalizing your IsRunning to cover more states, perhaps something more like
enum { NEW, RUNNING, STOPPING, FINISHED, ERROR } State;
instead.
Of course, you still have to test that under protection of the appropriate mutex, which brings me to my next point: one mutex should suffice. That one can and should serve both to protect shared state and as the mutex associated with the CV wait. This, too, is idiomatic. It would lead to code in TimerTask() more like this:
// ...
pthread_mutex_lock(&timer->StateMutex);
// Responsibility for setting the state to RUNNING transferred to Timer_Start()
while (timer->State == RUNNING) {
    returnValue = pthread_cond_timedwait(&timer->Condition, &timer->StateMutex, &timespec);
    switch (returnValue) {
        case 0:
            if (timer->State == STOPPING) {
                timer->State = FINISHED;
            }
            break;
        case ETIMEDOUT:
            timer->State = FINISHED;
            break;
        default:
            timer->State = ERROR;
            break;
    }
}
pthread_mutex_unlock(&timer->StateMutex);
// ...
The accompanying Timer_Start() and Timer_Stop() would be something like this:
void Timer_Start(Timer timer, int timeoutMicroseconds, void * context) {
    timer->Context = context;
    timer->TimeoutMicroseconds = timeoutMicroseconds;
    pthread_mutex_lock(&timer->StateMutex);
    timer->State = RUNNING;
    // start the thread before releasing the mutex so that no one can see state
    // RUNNING before the thread is actually running
    pthread_create(&timer->Thread, NULL, TimerTask, (void *)timer);
    pthread_mutex_unlock(&timer->StateMutex);
}
void Timer_Stop(Timer timer) {
    _Bool should_join = 0;
    pthread_mutex_lock(&timer->StateMutex);
    switch (timer->State) {
        case NEW:
            timer->State = FINISHED;
            break;
        case RUNNING:
            timer->State = STOPPING;
            should_join = 1;
            break;
        case STOPPING:
            should_join = 1;
            break;
        default:
            // no action
            break;
    }
    pthread_mutex_unlock(&timer->StateMutex);
    // Harmless if the timer has already stopped:
    pthread_cond_broadcast(&timer->Condition);
    if (should_join) {
        pthread_join(timer->Thread, NULL);
    }
}
A few other, smaller adjustments would be needed elsewhere.
Additionally, although the example code above omits it for clarity, you really should test the return values of all the functions that report status that way, unless you don't care whether they succeeded. That includes almost all standard library and Pthreads functions. What you should do in the event that one fails is highly contextual, but pretending (or assuming) that it succeeded is rarely a good choice.
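As a hedged illustration of one possible policy (the name and the abort-on-failure choice are mine, not from the code above):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

/* One possible policy: a mutex lock should never fail in a correct
 * program, so report the error and abort rather than limp onward. */
static void lock_or_die(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc != 0) {
        /* Pthreads functions return the error code directly, not via errno. */
        fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
        abort();
    }
}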
An alternative
Another approach to a cancellable delay would revolve around select() or pselect() with a timeout. To arrange for cancellation, you set up a pipe and have select() listen to the read end. Writing anything to the write end will then wake select().
This is in several ways easier to code, because you don't need any mutexes or condition variables. Also, data written to a pipe persists until it is read (or the pipe is closed), which smooths out some of the timing-related issues that the CV-based approach has to code around.
With select, however, you need to be prepared to deal with signals (at minimum by blocking them), and the timeout is a duration, not an absolute time.
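A minimal sketch of that idea (illustrative names; error handling mostly omitted; assumes a pipe created with pipe() whose read end is passed in):
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns true if cancelled (the pipe became readable), false on timeout.
 * cancel_fd is the read end of a pipe; another thread cancels the delay
 * by writing a byte to the write end. */
static bool wait_or_cancel(int cancel_fd, int timeout_us)
{
    fd_set readfds;
    struct timeval timeout;

    FD_ZERO(&readfds);
    FD_SET(cancel_fd, &readfds);
    timeout.tv_sec = timeout_us / 1000000;
    timeout.tv_usec = timeout_us % 1000000;

    /* select() returns 0 on timeout, >0 if cancel_fd is readable. */
    int ready = select(cancel_fd + 1, &readfds, NULL, NULL, &timeout);
    if (ready > 0) {
        char byte;
        (void)read(cancel_fd, &byte, 1); /* drain the cancellation byte */
        return true;
    }
    return false; /* timed out (real code should also check ready < 0) */
}

/* Cancellation from another thread is then just:
 *     (void)write(cancel_write_fd, "x", 1);
 */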
pthread_mutex_lock(&timer->StateMutex);
timer->IsRunning = true;
pthread_mutex_unlock(&timer->StateMutex);
pthread_mutex_lock(&timer->ConditionMutex);
returnValue = pthread_cond_timedwait(&timer->Condition, &timer->ConditionMutex, &timespec);
pthread_mutex_unlock(&timer->ConditionMutex);
if (timer->Callback != NULL)
{
    (*timer->Callback)(returnValue != ETIMEDOUT, timer->Context);
}
You have two bugs here.
A cancellation can slip in after IsRunning is set to true and before pthread_cond_timedwait gets called. In this case, you'll wait out the entire timer. This bug exists because ConditionMutex doesn't protect any shared state. To use a condition variable properly, the mutex associated with the condition variable must protect the shared state. You can't trade the right mutex for the wrong mutex and then call pthread_cond_timedwait, because that creates a race condition. The entire point of a condition variable is to provide an atomic "unlock and wait" operation to prevent this race condition, and your code goes out of its way to break that logic.
You don't check the return value of pthread_cond_timedwait, so if neither the timeout has expired nor cancellation has been requested, you call the callback anyway. Condition variables are stateless. It is your responsibility to track and check state; the condition variable will not do this for you. You need to call pthread_cond_timedwait in a loop until either the state is set to STOPPING or the timeout is reached. Note that the mutex associated with the condition variable, as in 1 above, must protect the shared state -- in this case, the state.
I think you have a fundamental misunderstanding about how condition variables work and what they're for. They are used when you have a mutex that protects shared state and you want to wait for that shared state to change. The mutex associated with the condition variable must protect the shared state to avoid the classic race condition where the state changes after you released the lock but before you managed to start waiting.
UPDATE:
To provide some more useful information, let me briefly explain what a condition variable is for. Say you have some shared state protected by a mutex. And say some thread can't make forward progress until that shared state changes.
You have a problem. You have to hold the mutex that protects the shared state to see what the state is. When you see that it's in the wrong state, you need to wait. But you also need to release the mutex or no other thread can change the shared state.
But if you unlock the mutex and then wait (which is what your code does above!) you have a race condition. After you unlock the mutex but before you wait, another thread can acquire the mutex and change the shared state such that you no longer want to wait. So you need an atomic "unlock the mutex and wait" operation.
That is the purpose, and the only purpose, of condition variables: so you can atomically release the mutex that protects some shared state and wait for a signal, with no chance for the signal to be lost between when you released the mutex and when you waited.
Another important point -- condition variables are stateless. They have no idea what you are waiting for. You must never call pthread_cond_wait or pthread_cond_timedwait and make assumptions about the state. You must check it yourself. Your code releases the mutex after pthread_cond_timedwait returns. You only want to do that if the call times out.
If pthread_cond_timedwait doesn't timeout (or, in any case, when pthread_cond_wait returns), you don't know what happened until you check the state. That's why these functions re-acquire the mutex -- so you can check the state and decide what to do. This is why these functions are almost always called in a loop -- if the thing you're waiting for still hasn't happened (which you determine by checking the shared state that you are responsible for), you need to keep waiting.
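To make that pattern concrete, here is a minimal sketch of the loop being described; the names mutex, cond, state, and STOPPING are illustrative, not from the posted code:
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state, protected by `mutex`. */
static enum { RUNNING, STOPPING } state = RUNNING;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void wait_until_stopped_or_deadline(const struct timespec *deadline)
{
    pthread_mutex_lock(&mutex);
    while (state != STOPPING) {                 /* re-check after every wakeup */
        int rc = pthread_cond_timedwait(&cond, &mutex, deadline);
        if (rc == ETIMEDOUT)
            break;                              /* deadline reached, stop waiting */
        /* rc == 0: signalled or spurious wakeup; the loop re-tests the state */
    }
    pthread_mutex_unlock(&mutex);
}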

What's a good way to stop a pool of threads from running?

I've been working on a project lately, and I need to manage a pair of thread pools.
What the worker threads in the pools do is basically execute some kind of pop operation on each respective queue, eventually waiting on a condition variable (pthread_cond_t) if there is no available value in the queue; once they get an item, they parse it and execute operations accordingly.
What I'm concerned about is the fact that I want to have no memory leaks, and to achieve that I noticed that calling pthread_cancel on each thread when the main process is exiting is definitely a bad idea, as it leaves a lot of garbage around.
The point is, my first thought was to use an exit flag which I can set when the threads need to exit, so that they can easily free memory and call pthread_exit...
I guess I should set this flag, then send a broadcast signal to the threads waiting on the condition variable and check the flag right after the pop operation...
Is this really the correct way to implement a good thread pool termination? I don't feel that confident about this...
I'm writing some pseudo-code here to explain what I'm talking about.
Each pool thread will run some code structured like this:
/* Worker thread (which will run on each pool thread) */
{ /* Thread initialization */ ... }
loop {
    value = pop();
    { /* Mutex lock because of the shared flag */ ... }
    if (flag) { { /* Free memory and unlock mutex */ ... } pthread_exit(); }
    { /* Unlock the mutex */ ... }
    { /* Elaborate value */ ... }
}
return NULL;
And there will be some kind of pool_stopRunning() function which will look like:
/* pool_stopRunning() function code */
{ /* Acquire flag mutex */ ... }
setFlag(flag);
{ /* Unlock flag mutex */ ... }
{ /* Acquire queue mutex */ ... }
pthread_cond_broadcast(...);
{ /* Unlock queue mutex */ ... }
Thanks in advance; I just need to be sure that there isn't a fancier way to stop a thread pool... (or to learn a better way, by any chance).
As always, I'm sorry if there are any typos; I'm not an English speaker and it's kind of late right now >:
What you are describing will work, but I would suggest a different approach...
You already have a mechanism for assigning tasks to threads, complete with all appropriate synchronization. So instead of complicating the design with some new parallel mechanism, just define a new type of task called "STOP". If there are N threads servicing a queue and you want to terminate them, push N STOP tasks onto the queue. Then just wait for all of the threads to terminate. (This last can be done via "join", so it should not require any new mechanism, either.)
No muss, no fuss.
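Sketched as code, the "STOP task" (poison pill) idea might look like this; the queue layout and all names here are illustrative, not taken from the asker's project, and error checking is omitted for brevity:
#include <pthread.h>
#include <stdlib.h>

enum task_type { TASK_WORK, TASK_STOP };

struct task {
    enum task_type type;
    void *payload;
    struct task *next;
};

struct queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct task *head, *tail;
};
/* e.g.: struct queue q = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL, NULL }; */

static void queue_push(struct queue *q, struct task *t)
{
    t->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocking pop: sleeps on the condition variable until a task arrives. */
static struct task *queue_pop(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)                 /* guards against spurious wakeups */
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct task *t = q->head;
    q->head = t->next;
    if (q->head == NULL) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return t;
}

static void *worker(void *arg)
{
    struct queue *q = arg;
    for (;;) {
        struct task *t = queue_pop(q);
        if (t->type == TASK_STOP) {         /* poison pill: clean up and leave */
            free(t);
            return NULL;
        }
        /* ... process t->payload ... */
        free(t);
    }
}

/* Shutdown: push one STOP task per worker, then join them all. */
static void pool_shutdown(struct queue *q, pthread_t *threads, int n)
{
    for (int i = 0; i < n; i++) {
        struct task *t = calloc(1, sizeof *t);
        t->type = TASK_STOP;
        queue_push(q, t);
    }
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
}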
With respect to symmetry with setting the flag and reducing serialization, this code:
{ /* Mutex lock because of the shared flag */ ... }
if (flag) {{ /* Free memory and unlock mutex */ ... } pthread_exit(); }
{ /* Unlock the mutex */ ... }
should look like this:
{ /* Mutex lock because of the shared flag */ ... }
flagcopy = readFlag();
{ /* Unlock the mutex */ ... }
if (flagcopy) { { /* Free memory */ ... } pthread_exit(); }
Having said that, you can (should?) factor the mutex code into the setFlag and readFlag methods.
There is one more thing. If the flag is only a boolean and it is only changed once before the whole thing shuts down (i.e., it's never unset after being set), then I would argue that protecting the read with a mutex is not required.
I say this because if the above assumptions are true and if the loop's duration is very short and the loop iteration frequency is high, then you would be imposing undue serialization upon the business task and potentially increasing the response time unacceptably.
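For example, the factored-out accessors might look like this (a sketch; the flag and its mutex are assumed to live at module scope):
#include <pthread.h>
#include <stdbool.h>

static bool stop_flag = false;
static pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

static void setFlag(void)
{
    pthread_mutex_lock(&flag_mutex);
    stop_flag = true;
    pthread_mutex_unlock(&flag_mutex);
}

static bool readFlag(void)
{
    pthread_mutex_lock(&flag_mutex);
    bool copy = stop_flag;              /* copy out under the lock */
    pthread_mutex_unlock(&flag_mutex);
    return copy;
}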

Event object manual-reset, wrong thread synchronization

I'm getting started with C Windows programming, in particular threads, concurrency and synchronization.
To experiment, I'm writing a C program that accepts N parameters.
Each parameter indicates a path to a file system directory tree, and the program has to compare the contents of all the directories to decide whether they are all the same or not.
The main runs a "reading" thread for each parameter, while a single "comparison" thread compares the names of all the entries found. For each file/directory found, the "reading" threads synchronize themselves by activating the "comparison" thread.
I wrote the program with Semaphore objects and now I'm trying with Event objects.
The idea is to use N auto-reset Events and a single manual-reset Event.
The N events are used by the N "reading" threads to signal the "comparison" thread, which waits in WaitForMultipleObjects for an INFINITE time. When all the signals are available, it starts comparing the entries and then performs a SetEvent() on the manual-reset object.
The "reading" threads wait for this set, then Reset the event and continue working on the next entry.
Some code for the N reading threads:
void ReadingTraverseDirectory(LPTSTR StartPathName, DWORD i) {
    //variables and some work
    do {
        //take the next entry and put it in current_entry;
        gtParams[i].entry = current_entry; //global var for comparison
        SetEvent(glphReadingEvent[i]); //signal the comparison thread
        WaitForSingleObject(ghComparisonEvent, INFINITE); //wait for the signal to restart working
        ResetEvent(ghComparisonEvent); //reset the event
        if (current_entry == TYPE_DIR) {
            ReadingTraverseDirectory(current_entry, i); //recur to explore the next dir
        }
    } while (FindNextFile(SearchHandle, &FindData)); //while there are still entries
    //
    return;
}
Some code for the comparison thread:
DWORD WINAPI CompareThread(LPVOID arg) {
    while (entries are equal) {
        WaitForMultipleObjects(N, glphReadingEvent, TRUE, 1000);
        for (r = 0; r < nworkers - 1; r++) {
            if (_tcscmp(entries) != 0) {
                //entries are different. exit and close.
            }
        }
        SetEvent(ghComparisonEvent);
    }
}
The problem:
Sometimes one reading thread manages to proceed without respecting the synchronization with the other threads. If I put a printf() or Sleep(1) between the Wait and the Set of the comparison thread, the program works perfectly.
My opinion:
I think the manual-reset Event is not safe for this kind of (barrier) synchronization.
A reading thread may be too fast in calling ResetEvent(): if the scheduler slows down other threads, some of them risk staying blocked while the one which performed the Reset continues its work. However, if that were the case, the comparison thread should block on WaitForMultipleObjects, causing a deadlock... Actually there is no deadlock, but one thread is able to cycle more times than the others.
What I'm trying to understand is why a simple Sleep(1) can solve the issue. Is it a matter of scheduling, or a wrong implementation of the synchronization?
Thank you.

Why do we need a condition check before pthread_cond_wait

I am trying to learn the basics of pthread_cond_wait. In all the usages, I see either
if(cond is false)
pthread_cond_wait
or
while(cond is false)
pthread_cond_wait
My question is: we want to cond_wait only because the condition is false, so why should I take the pain of explicitly putting in an if/while check? I can understand that without any if/while check before cond_wait we would hit it directly and it would never return. Is the condition check solely for solving this purpose, or does it have any other significance? If it is for avoiding an unnecessary condition wait, then isn't putting in a condition check and avoiding the cond_wait similar to polling? I am using cond_wait like this.
void* proc_add(void *name){
    struct vars *my_data = (struct vars*)name;
    printf("In thread Addition and my id = %d\n",pthread_self());
    while(1){
        pthread_mutex_lock(&mutexattr);
        while(!my_data->ipt){ // If no input get in
            pthread_cond_wait(&mutexaddr_add,&mutexattr); // Wait till signalled
            my_data->opt = my_data->a + my_data->b;
            my_data->ipt=1;
            pthread_cond_signal(&mutexaddr_opt);
        }
        pthread_mutex_unlock(&mutexattr);
        if(my_data->end)
            pthread_exit((void *)0);
    }
}
The logic is, I am asking the input thread to process the data whenever an input is available and signal the output thread to print it.
You need a while loop because the thread that called pthread_cond_wait might wake up even when the condition you are waiting for hasn't been reached. This phenomenon is called "spurious wakeup".
This is not a bug; it is the way condition variables are implemented.
This can also be found in man pages:
Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.
Update regarding the actual code:
void* proc_add(void *name)
{
    struct vars *my_data = (struct vars*)name;
    printf("In thread Addition and my id = %d\n",pthread_self());
    while(1) {
        pthread_mutex_lock(&mutexattr);
        while(!my_data->ipt){ // If no input get in
            pthread_cond_wait(&mutexaddr_add,&mutexattr); // Wait till signalled
        }
        my_data->opt = my_data->a + my_data->b;
        my_data->ipt=1;
        pthread_cond_signal(&mutexaddr_opt);
        pthread_mutex_unlock(&mutexattr);
        if(my_data->end)
            pthread_exit((void *)0);
    }
}
You must test the condition under the mutex before waiting because signals of the condition variable are not queued (condition variables are not semaphores). That is, if a thread calls pthread_cond_signal() when no threads are blocked in pthread_cond_wait() on that condition variable, then the signal does nothing.
This means that if you had one thread set the condition:
pthread_mutex_lock(&m);
cond = true;
pthread_cond_signal(&c);
pthread_mutex_unlock(&m);
and then another thread unconditionally waited:
pthread_mutex_lock(&m);
pthread_cond_wait(&c, &m);
/* cond now true */
this second thread would block forever. This is avoided by having the second thread check for the condition:
pthread_mutex_lock(&m);
if (!cond)
pthread_cond_wait(&c, &m);
/* cond now true */
Since cond is only modified with the mutex m held, this means that the second thread waits if and only if cond is false.
The reason a while () loop is used in robust code instead of an if () is because pthread_cond_wait() does not guarantee that it will not wake up spuriously. Using a while () also means that signalling the condition variable is always perfectly safe - "extra" signals don't affect the program's correctness, which means that you can do things like move the signal outside of the locked section of code.
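Stated as code, the robust version of the second thread's wait is then just (a sketch, using the same cond, c, and m as above):
pthread_mutex_lock(&m);
while (!cond)                /* re-test after spurious or "extra" wakeups */
    pthread_cond_wait(&c, &m);
/* cond now true */
pthread_mutex_unlock(&m);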

How can barriers be destroyable as soon as pthread_barrier_wait returns?

This question is based on:
When is it safe to destroy a pthread barrier?
and the recent glibc bug report:
http://sourceware.org/bugzilla/show_bug.cgi?id=12674
I'm not sure about the semaphores issue reported in glibc, but presumably it's supposed to be valid to destroy a barrier as soon as pthread_barrier_wait returns, as per the above linked question. (Normally, the thread that got PTHREAD_BARRIER_SERIAL_THREAD, or a "special" thread that already considered itself "responsible" for the barrier object, would be the one to destroy it.) The main use case I can think of is when a barrier is used to synchronize a new thread's use of data on the creating thread's stack, preventing the creating thread from returning until the new thread gets to use the data; other barriers probably have a lifetime equal to that of the whole program, or controlled by some other synchronization object.
In any case, how can an implementation ensure that destruction of the barrier (and possibly even unmapping of the memory it resides in) is safe as soon as pthread_barrier_wait returns in any thread? It seems the other threads that have not yet returned would need to examine at least some part of the barrier object to finish their work and return, much like how, in the glibc bug report cited above, sem_post has to examine the waiters count after having adjusted the semaphore value.
I'm going to take another crack at this with an example implementation of pthread_barrier_wait() that uses mutex and condition variable functionality as might be provided by a pthreads implementation. Note that this example doesn't try to deal with performance considerations (specifically, when the waiting threads are unblocked, they are all re-serialized when exiting the wait). I think that using something like Linux Futex objects could help with the performance issues, but Futexes are still pretty much out of my experience.
Also, I doubt that this example handles signals or errors correctly (if at all in the case of signals). But I think proper support for those things can be added as an exercise for the reader.
My main fear is that the example may have a race condition or deadlock (the mutex handling is more complex than I like). Also note that it is an example that hasn't even been compiled. Treat it as pseudo-code. Also keep in mind that my experience is mainly in Windows - I'm tackling this more as an educational opportunity than anything else. So the quality of the pseudo-code may well be pretty low.
However, disclaimers aside, I think it may give an idea of how the problem asked in the question could be handled (i.e., how the pthread_barrier_wait() function can allow the pthread_barrier_t object it uses to be destroyed by any of the released threads without danger of one or more threads using the barrier object on their way out).
Here goes:
/*
* Since this is a part of the implementation of the pthread API, it uses
* reserved names that start with "__" for internal structures and functions
*
* Functions such as __mutex_lock() and __cond_wait() perform the same function
* as the corresponding pthread API.
*/
// struct __barrier_waitdata is intended to hold all the data
// that `pthread_barrier_wait()` will need after releasing
// waiting threads. This will allow the function to avoid
// touching the passed in pthread_barrier_t object after
// the wait is satisfied (since any of the released threads
// can destroy it)
struct __barrier_waitdata {
    struct __mutex cond_mutex;
    struct __cond cond;
    unsigned waiter_count;
    int wait_complete;
};
struct __barrier {
    unsigned count;
    struct __mutex waitdata_mutex;
    struct __barrier_waitdata* pwaitdata;
};
typedef struct __barrier pthread_barrier_t;
int __barrier_waitdata_init( struct __barrier_waitdata* pwaitdata)
{
    int rc;
    pwaitdata->waiter_count = 0;
    pwaitdata->wait_complete = 0;
    rc = __mutex_init( &pwaitdata->cond_mutex, NULL);
    if (rc) {
        return rc;
    }
    rc = __cond_init( &pwaitdata->cond, NULL);
    if (rc) {
        __mutex_destroy( &pwaitdata->cond_mutex);
        return rc;
    }
    return 0;
}
int pthread_barrier_init(pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned int count)
{
    int rc;
    rc = __mutex_init( &barrier->waitdata_mutex, NULL);
    if (rc) return rc;
    barrier->pwaitdata = NULL;
    barrier->count = count;
    //TODO: deal with attr
    return 0;
}
int pthread_barrier_wait(pthread_barrier_t *barrier)
{
    int rc;
    struct __barrier_waitdata* pwaitdata;
    unsigned target_count;
    // potential waitdata block (only one thread's will actually be used)
    struct __barrier_waitdata waitdata;
    // nothing to do if we only need to wait for one thread...
    if (barrier->count == 1) return PTHREAD_BARRIER_SERIAL_THREAD;
    rc = __mutex_lock( &barrier->waitdata_mutex);
    if (rc) return rc;
    if (!barrier->pwaitdata) {
        // no other thread has claimed the waitdata block yet -
        // we'll use this thread's
        rc = __barrier_waitdata_init( &waitdata);
        if (rc) {
            __mutex_unlock( &barrier->waitdata_mutex);
            return rc;
        }
        barrier->pwaitdata = &waitdata;
    }
    pwaitdata = barrier->pwaitdata;
    target_count = barrier->count;
    // all data necessary for handling the return from a wait is pointed to
    // by `pwaitdata`, and `pwaitdata` points to a block of data on the stack of
    // one of the waiting threads. We have to make sure that the thread that owns
    // that block waits until all others have finished with the information
    // pointed to by `pwaitdata` before it returns. However, after the 'big' wait
    // is completed, the `pthread_barrier_t` object that's passed into this
    // function isn't used. The last operation done to `*barrier` is to set
    // `barrier->pwaitdata = NULL` to satisfy the requirement that this function
    // leaves `*barrier` in a state as if `pthread_barrier_init()` had been called - and
    // that operation is done by the thread that signals the wait condition
    // completion before the completion is signaled.
    // note: we're still holding `barrier->waitdata_mutex`;
    rc = __mutex_lock( &pwaitdata->cond_mutex);
    pwaitdata->waiter_count += 1;
    if (pwaitdata->waiter_count < target_count) {
        // need to wait for other threads
        __mutex_unlock( &barrier->waitdata_mutex);
        do {
            // TODO: handle the return code from `__cond_wait()` to break out of this
            // if a signal makes that necessary
            __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
        } while (!pwaitdata->wait_complete);
    }
    else {
        // this thread satisfies the wait - unblock all the other waiters
        pwaitdata->wait_complete = 1;
        // 'release' our use of the passed in pthread_barrier_t object
        barrier->pwaitdata = NULL;
        // unlock the barrier's waitdata_mutex - the barrier is
        // ready for use by another set of threads
        __mutex_unlock( &barrier->waitdata_mutex);
        // finally, unblock the waiting threads
        __cond_broadcast( &pwaitdata->cond);
    }
    // at this point, barrier->waitdata_mutex is unlocked, the
    // barrier->pwaitdata pointer has been cleared, and no further
    // use of `*barrier` is permitted...
    // however, each thread still has a valid `pwaitdata` pointer - the
    // thread that owns that block needs to wait until all others have
    // dropped the pwaitdata->waiter_count
    // also, at this point the `pwaitdata->cond_mutex` is locked, so
    // we're in a critical section
    rc = 0;
    pwaitdata->waiter_count--;
    if (pwaitdata == &waitdata) {
        // this thread owns the waitdata block - it needs to hang around until
        // all other threads are done
        // as a convenience, this thread will be the one that returns
        // PTHREAD_BARRIER_SERIAL_THREAD
        rc = PTHREAD_BARRIER_SERIAL_THREAD;
        while (pwaitdata->waiter_count != 0) {
            __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
        }
        __mutex_unlock( &pwaitdata->cond_mutex);
        __cond_destroy( &pwaitdata->cond);
        __mutex_destroy( &pwaitdata->cond_mutex);
    }
    else {
        // a non-owner thread must always release cond_mutex on its way out;
        // the last one out also wakes the owner so it can clean up
        if (pwaitdata->waiter_count == 0) {
            __cond_signal( &pwaitdata->cond);
        }
        __mutex_unlock( &pwaitdata->cond_mutex);
    }
    return rc;
}
17 July 2011: Update in response to a comment/question about process-shared barriers
I forgot completely about the situation with barriers that are shared between processes. And as you mention, the idea I outlined will fail horribly in that case. I don't really have experience with POSIX shared memory use, so any suggestions I make should be tempered with scepticism.
To summarize (for my benefit, if no one else's):
When any of the threads gets control after pthread_barrier_wait() returns, the barrier object needs to be in the 'init' state (however the most recent pthread_barrier_init() on that object set it). Also implied by the API is that once any of the threads returns, one or more of the following things could occur:
another call to pthread_barrier_wait() to start a new round of synchronization of threads
pthread_barrier_destroy() on the barrier object
the memory allocated for the barrier object could be freed or unshared if it's in a shared memory region.
These things mean that before the pthread_barrier_wait() call allows any thread to return, it pretty much needs to ensure that all waiting threads are no longer using the barrier object in the context of that call. My first answer addressed this by creating a 'local' set of synchronization objects (a mutex and an associated condition variable) outside of the barrier object that would block all the threads. These local synchronization objects were allocated on the stack of the thread that happened to call pthread_barrier_wait() first.
I think that something similar would need to be done for barriers that are process-shared. However, in that case simply allocating those sync objects on a thread's stack isn't adequate (since the other processes would have no access). For a process-shared barrier, those objects would have to be allocated in process-shared memory. I think the technique I listed above could be applied similarly:
the waitdata_mutex that controls the 'allocation' of the local sync variables (the waitdata block) would be in process-shared memory already by virtue of it being in the barrier struct. Of course, when the barrier is set to PTHREAD_PROCESS_SHARED, that attribute would also need to be applied to the waitdata_mutex
when __barrier_waitdata_init() is called to initialize the local mutex & condition variable, it would have to allocate those objects in shared memory instead of simply using the stack-based waitdata variable.
when the 'cleanup' thread destroys the mutex and the condition variable in the waitdata block, it would also need to clean up the process-shared memory allocation for the block.
in the case where shared memory is used, there needs to be some mechanism to ensure that the shared memory object is opened at least once in each process, and closed the correct number of times in each process (but not closed entirely before every thread in the process is finished using it). I haven't thought through exactly how that would be done...
I think these changes would allow the scheme to operate with process-shared barriers. The last bullet point above is a key item to figure out. Another is how to construct a name for the shared memory object that will hold the 'local' process-shared waitdata. There are certain attributes you'd want for that name:
you'd want the storage for the name to reside in the struct pthread_barrier_t structure so all process have access to it; that means a known limit to the length of the name
you'd want the name to be unique to each 'instance' of a set of calls to pthread_barrier_wait(), because it might be possible for a second round of waiting to start before all threads have gotten all the way out of the first round of waiting (so the process-shared memory block set up for the waitdata might not have been freed yet). So the name probably has to be based on things like process id, thread id, the address of the barrier object, and an atomic counter.
I don't know whether or not there are security implications to having the name be 'guessable'. If so, some randomization needs to be added; no idea how much. Maybe you'd also need to hash the data mentioned above along with the random bits. Like I said, I really have no idea if this is important or not.
As far as I can see, there is no need for pthread_barrier_destroy to be an immediate operation. You could have it wait until all threads that are still in their wakeup phase are woken up.
E.g., you could have an atomic counter awakening that is initially set to the number of threads that are woken up. It would then be decremented as the last action before pthread_barrier_wait returns. pthread_barrier_destroy could then just spin until that counter falls to 0.
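As an illustration only (a sketch using C11 atomics, not an actual implementation; the structure and names are mine):
#include <stdatomic.h>
#include <sched.h>

struct barrier {
    /* ... the rest of the barrier state ... */
    atomic_uint awakening;   /* set to the thread count when the barrier trips;
                              * counts woken threads still inside wait() */
};

/* Last action in the hypothetical wait(), just before returning: */
static void wait_epilogue(struct barrier *b)
{
    atomic_fetch_sub(&b->awakening, 1);
}

/* destroy() spins until every woken thread has actually left wait(). */
static void barrier_destroy(struct barrier *b)
{
    while (atomic_load(&b->awakening) != 0)
        sched_yield();   /* a real implementation might use a futex here */
    /* ... free the remaining resources ... */
}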
