Background
I am somewhat confused about my understanding of how condition variables operate in conjunction with concurrent access to shared data. The following is pseudo code to depict my current issue.
// Thread 1: Producer
void cakeMaker()
{
lock(some_lock);
while(number_of_cakes == MAX_CAKES)
wait(rack_has_space);
number_of_cakes++;
signal(rack_has_cakes);
unlock(some_lock);
}
// Thread 2: Consumer
void cakeEater()
{
lock(some_lock);
while(number_of_cakes == 0)
wait(rack_has_cakes);
number_of_cakes--;
signal(rack_has_space);
unlock(some_lock);
}
Let's consider the scenario where the value of number_of_cakes is 0. As a result, Thread 2 is blocked at wait(rack_has_cakes). When Thread 1 runs and increments the value of number_of_cakes to 1, it signals rack_has_cakes. However, Thread 2 wakes up before Thread 1 releases the lock on some_lock, causing it to go back to sleep and miss the signal.
I am unclear about the operation of wait and signal. Are they like a toggle switch that gets set to 1 when signal is called and 0 when wait succeeds? Can someone explain what is happening behind the scenes?
Question
Can someone walk me through one iteration of the above code step-by-step, with a strong emphasis on the events that occur during the signal and wait method calls?
thread 2 wakes up before Thread 1 calls unlock(some_lock), so it goes
back to sleep again and the signal has been missed.
No, that's not how it works. I will use C++ std::condition_variable for my cite, but POSIX threads, and most run-of-the-mill implementation of mutexes and condition variables work the same way. The underlying concepts are the same.
Thread 2 has the mutex locked, when it starts waiting on a condition variable. The wait() operation unlocks the mutex and waits on the condition variable atomically:
Atomically releases lock, blocks the current executing thread, and
adds it to the list of threads waiting on *this.
This operation is considered "atomic"; in other words, indivisible.
Then, when the condition variable is signaled, the thread re-locks the mutex:
When unblocked, regardless of the reason, lock is reacquired and wait
exits.
The thread does not "go back to sleep" before the other thread "calls unlock". If the mutex has not yet been unlocked: when the thread wakes up upon being signaled by a condition variable, the thread will always wait until it succeeds in locking the mutex again. This is unconditional. When wait() returns the mutex is still locked. Then, and only then, the wait() function returns. So, the sequence of events is:
One thread has the mutex locked, sets some counter, variable, or any kind of mutex-protected data to the state that the other thread is waiting for. After doing so the thread signals the condition variable, and then unlocks the mutex at its leisure.
The other thread has locked the mutex before it wait()s on the condition variable. One of wait()'s prerequisites is that the mutex must be locked before wait()ing on the linked condition variable. So, the wait() operation unlocks the mutex "atomically". That is, there is no instance when the mutex is unlocked, and the thread is not yet waiting on the condition variable. When wait() unlocks the mutex, you are guaranteed that the thread will be waiting, and it will wake up. You can take it to the bank.
Once the condition variable is signaled, the wait()ing thread does not return from wait() until it can re-lock the mutex. Having received a signal from the condition variable is just the first step, the mutex must be locked again, by thread, in the final step of the wait() operation. Which, of course, only happens after the signaling thread unlocks the mutex.
When a thread gets signaled by a condition variable, it will return from wait(). But not immediately, it must wait until the thread locks the mutex again, however long it takes. It will not go "back to sleep", but wait until it has the mutex locked again, and then return. You are guaranteed that a received condition variable signal will cause the thread to return from wait(), and the mutex will be re-locked by the thread. And because the original unlock-then-wait operation was atomic, you are guaranteed to receive the condition variable signal.
Lets say we currently have number_of_cakes = 0, so Thread 2 is currently stuck on wait(rack_has_cakes). Thread 1 runs and increments number_of_cakes by 1. Then it calls signal(rack_has_cakes) - this wakes up Thread 2, unfortunately Thread 2 wakes up before Thread 1 calls unlock(some_lock), so it goes back to sleep again and the signal has been missed.
You are right, that might be happens, because your signal command order was not correct.
In both Producer and Consumer, you have set the following order of commands:
signal(rack_has_cakes);
unlock(some_lock);
But the order should be:
unlock(some_lock);
signal(rack_has_cakes);
You first have to unlock the mutex and then signal the other thread.
Since signal command is condition variable wait() and signal() commands are thread safe, you should not worry about releasing the lock before.
But this step is very important as it give the other thread a chance to lock the mutex.
Related
For the code below, the mutex will not available by the time second cond_broadcast is executed(assuming multiple threads already waiting on the condition). In such situation, does the broadcast select the thread(waiting on the condition) to hand the mutex to and wait for the mutex to be unlocked by some other thread or the second cond_broadcast is just ignored?
void* func(void* arg){
pthread_mutex_lock(&m);
while(condition){
pthread_cond_wait(&c,&m);
}
pthread_cond_broadcast(&c);
pthread_mutex_unlock(&m);
pthread_cond_broadcast(&c);
}
For the code below, the mutex will not available by the time second
cond_broadcast is executed(assuming multiple threads already waiting
on the condition).
I think you mean that the mutex will not be available to the thread calling pthread_cond_broadcast() at the second call to that function, but that's irrelevant. Calling pthread_cond_broadcast() is independent of holding any mutex.
Or perhaps you mean that one of the previously blocked threads will have acquired the mutex by the time the second broadcast happens, but (1) that's not certain, and (2) if it does happen, that has no particular significance with respect to the broadcast.
In such situation, does the broadcast select the
thread(waiting on the condition) to hand the mutex to and wait for the
mutex to be unlocked by some other thread or the second cond_broadcast
is just ignored?
Neither. pthread_cond_broadcast() and pthread_cond_signal() have no role in locking or transferring control of any mutex. They just wake threads blocked on the associated CV. That each such thread must acquire the mutex before returning from the call is a separate consideration -- they all contend normally to lock the mutex, and they do not return from pthread_cond_wait() until they do. They also do not go back to waiting without first returning from their wait and then calling pthread_cond_wait() again.
But that does not mean that the second pthread_cond_broadcast() in your code necessarily will have no effect. One of the just-woken threads might loop around and wait on the CV again between the two calls, or some other thread might arrive at the CV. That becomes possible as soon as the first thread releases the mutex, and the fact that the first thing that thread tries to do is another broadcast does not ensure that the broadcast happens before another thread can start waiting.
It is unlikely that you want two broadcasts one after the other like that, but which one you retain has little, if any, effect on the overall semantics of the program.
This question is regarding the pthread tutorial in llnl.
Say there are three threads.
Thread 1:
pthread_mutex_lock(&mutex)
do_something...
if condition
pthread_cond_signal(&con)
pthread_mutex_unlock(&mutex)
repeat
Thread 2:
pthread_mutex_lock(&mutex)
do_something...
if condition
pthread_cond_signal(&con)
pthread_mutex_unlock(&mutex)
repeat
Thread 3:
pthread_mutex_lock(&mutex)
while(condition not holds)
pthread_cond_wait(&con)
do_something...
pthread_mutex_unlock(&mutex)
Assume that Thread1 checks that condition is satisfied and then sends signal to wake up Thread3. Finally it unlocks the mutex. But at the same time, Thread 2 is trying to lock the mutex.
My question is: is there any guarantee that Thread3 will always win in the competition?
If not then after Thread2 do_something..., the condition variable may be changed, then when Thread3 gets to lock the mutex, the condition variable is different from what it expects.
My question is: is there any guarantee that Tread3 will always win in the competition?
No such guarantees. From POSIX:
The pthread_cond_signal() function shall unblock at least one of the
threads that are blocked on the specified condition variable cond (if
any threads are blocked on cond).
If more than one thread is blocked on a condition variable, the
scheduling policy shall determine the order in which threads are
unblocked. When each thread unblocked as a result of a
pthread_cond_broadcast() or pthread_cond_signal() returns from its
call to pthread_cond_wait() or pthread_cond_timedwait(), the thread
shall own the mutex with which it called pthread_cond_wait() or
pthread_cond_timedwait(). The thread(s) that are unblocked shall
contend for the mutex according to the scheduling policy (if
applicable), and as if each had called pthread_mutex_lock().
(Emphasis mine).
If not then after Tread2 do_something..., the condition variable may be changed, then when Tread3 gets to lock the mutex, the condition variable is different from what it expects.
If the order of execution of threads matter then you probably need to rewrite your code.
There is no guarantee. Returning from pthread_cond_wait() should be treated as implying that the condition might have changed, not that it definitely has - so you need to re-check the condition, and may need to wait again. This is exactly why pthread_cond_wait() should be called within a loop that checks the condition.
This question concerns the pthread API for Posix systems.
My understanding is that when waiting for a conditional variable, or more specifically a pthread_cond_t, the flow goes something like this.
// imagine the mutex is named mutex and the conditional variable is named cond
// first we lock the mutex to prevent race conditions
pthread_mutex_lock(&mutex);
// then we wait for the conditional variable, releasing the mutex
pthread_cond_wait(&cond, &mutex);
// after we're done waiting we own the mutex again have to release it
pthread_mutex_unlock(&mutex);
In this example we stop waiting for the mutex when some other thread follows a procedure like this.
// lock the mutex to prevent race conditions
pthread_mutex_lock(&mutex);
// signal the conditional variable, giving up control of the mutex
pthread_cond_signal(&cond);
My understanding is that if multiple threads are waiting some kind of scheduling policy will be applied, and whichever thread is unblocked also gets back the associated mutex.
Now what I don't understand is what happens when some thread calls pthread_cond_broadcast(&cond) to awake all of the threads waiting on the conditional variable.
Does only one thread get to own the mutex? Do I need to wait in a fundamentally different manner when waiting for a broadcast than when waiting for a signal (i.e. by not calling pthread_mutex_unlock unless I can confirm this thread acquired the mutex)? Or am I wrong in my whole understanding of how the mutex/cond relationship works?
Most importantly, if (as I think is probably the case) pthread_cond_broadcast causes threads to compete for the associated mutex as if they had all tried to lock it, does that mean only one thread will really wake up?
When some thread calls pthread_cond_broadcast while holding the mutex, it holds the mutex. Which means that once pthread_cond_broadcast returns, it still owns the mutex.
The other threads will all wake up, try to lock the mutex, then go to sleep to wait for the mutex to become available.
If you call pthread_cond_broadcast while not holding the mutex, then one of the other threads will be able to lock the mutex immediately. All the others will have to wait to lock the mutex.
Regarding this:
How To Use Condition Variable
Say we have number of consumer threads that execute such code (copied from the referenced page):
while (TRUE) {
s = pthread_mutex_lock(&mtx);
while (avail == 0) { /* Wait for something to consume */
s = pthread_cond_wait(&cond, &mtx);
}
while (avail > 0) { /* Consume all available units */
avail--;
}
s = pthread_mutex_unlock(&mtx);
}
I assume that scenario here is: main thread calls pthread_cond_signal() to tell consumer threads to do some work.
As I understand it - subsequent threads call pthread_mutex_lock() and then pthread_cond_wait() (which atomically unlocks the mutex). By now none of the consumer threads is claiming the mutex, they all wait on pthread_cond_wait().
When the main thread calls pthread_cond_signal(), following the manpage, at least one thread is waken up. When any of them returns from pthread_cond_wait() it automatically claims the mutex.
So my question is: what happens now regarding the provided example code? Namely, what does the thread that lost the contest for the mutex do now?
(AFAICT the thread that won the mutex, should run the rest of the code and release the mutex. The one that lost should be stuck waiting on the mutex - somewhere in the 1st nested while loop - while the winner holds it and after it's been released start blocking on pthread_cond_wait() beacuse the while (avail == 0) will be satisfied by then. Am I correct?)
Note that pthread_cond_signal() is generally intended to wake up only one waiting thread (that's all that it guarantees). But it could wake more 'accidentally'. The while (avail > 0) loop performs two functions:
it allows the one thread guaranteed to be woken up to consume all queued work units
it prevents additional 'accidentally' awakened threads from assuming that there's work to be done, when there might not be since the initial thread would have handled all of them.
It also prevents a race condition where a work unit might have been placed on the queue after the while (avail > 0) has completed, but before the worker thread has waited on the condition again - but that race is also handled by the if test just before calling pthread_cond_wait().
Basically when a thread is awakened, all it knows is that there might be work units for it to consume, but there might not (another thread might have consumed them).
So the sequence of events that occurs when pthread_cond_signal() is called is:
the system will wake one or more threads waiting on the condition
all the threads that are awakened will then try to acquire the mutex - only one of them can acquire it at any particular moment, since that's the purpose of a mutex
that thread will then proceed, perform the work in the while (avail > 0) loop, then will release the mutex
at that point one of the other threads that were previously woken up will acquire the mutex and work the same loop, then release the mutex. Generally, there will be no work units available anymore (since the first thread would have consumed all of them), but if another thread had added an additional unit (or more), then this thread would handle that work
the next thread will acquire the mutex and perform that same set of logic
pthread_cond_wait() has to acquire given mutex once signaled/woken up. If another thread wins that race, the function blocks until the mutex is released. So from the application point of view it doesn't return until current thread holds the mutex. The wait is always done in a loop (while (avail == 0) { ... above) to make sure that application condition we are waiting for still holds (buffer not empty, more work available, etc.)
Hope this helps.
The thread that lost the contest wakes up once the mutex is unlocked, checks the condition again, then goes to sleep on the condition variable.
When any of them returns from pthread_cond_wait() it automatically claims the mutex.
Ah, but it doesn't. Not "automatically", that is, depending on what "automatically" means. You might be confused by the "atomic" semantics of pthread_cond_wait; but that semantics is played out on the entry side: a thread is somehow registered for waiting on the condition before giving up the mutex, so that there isn't any window during which the thread no longer has the mutex, and is not yet waiting on the variable.
Each thread which returns from pthread_cond_wait has to acquire the mutex and therefore contend for it. Those which lose the race for the mutex have to block on the mutex, similarly as if they called pthread_mutex_lock.
The way the mutex is acquired on exit from pthread_cond_wait can be modeled as a regular pthread_mutex_lock operation. Essentially, the threads have to queue up on the mutex in order to exit. Each thread which acquires the mutex then returns from the function; the others have to wait until that thread gives up the mutex before they are allowed to return.
No thread woken up by the signal gets the mutex "automatically", in the sense of somehow being transferred ownership due to special eligibility. Firstly, on a multiprocessor, a woken thread can lose the race to a thread already running on another processor which snatches the mutex, if it is available, or else queue to wait on the mutex ahead of the thread which received the signal. Secondly, the thread which calls pthread_cond_signal may itself not have given up the mutex, and may continue to hold it indefinitely, which means that all the woken threads will queue up on a mutex lock operation and none will emerge from pthread_mutex_lock until that thread gives up the mutex.
All that is "automatic" is that the pthread_cond_wait operation doesn't return until acquiring the mutex again, and so the application doesn't have to take the step to acquire the mutex.
I'm learning pthread and wait conditions. As far as I can tell a typical waiting thread is like this:
pthread_mutex_lock(&m);
while(!condition)
pthread_cond_wait(&cond, &m);
// Thread stuff here
pthread_mutex_unlock(&m);
What I can't understand is why the line while(!condition) is necessary even if I use pthread_cond_signal() to wake up the thread.
I can understand that if I use pthread_cond_broadcast() I need to test condition, because I wake up all waiting threads and one of them can make the condition false again before unlocking the mutex (and thus transferring execution to another waked up thread which should not execute at that point).
But if I use pthread_cond_signal() I wake up just one thread so the condition must be true. So the code could look like this:
pthread_mutex_lock(&m);
pthread_cond_wait(&cond, &m);
// Thread stuff here
pthread_mutex_unlock(&m);
I read something about spurious signals that may happen. Is this (and only this) the reason? Why should I have spurious singnals? Or there is something else I don't get?
I assume the signal code is like this:
pthread_mutex_lock(&m);
condition = true;
pthread_cond_signal(&cond); // Should wake up *one* thread
pthread_mutex_unlock(&m);
The real reason you should put pthread_cond_wait in a while loop is not because of spurious wakeup. Even if your condition variable did not have spurious wakeup, you would still need the loop to catch a common type of error. Why? Consider what can happen if multiple threads wait on the same condition:
Thread 1 Thread 2 Thread 3
check condition (fails)
(in cond_wait) unlock mutex
(in cond_wait) wait
lock mutex
set condition
signal condvar
unlock mutex
lock mutex
check condition (succeeds)
do stuff
unset condition
unlock mutex
(in cond_wait) wake up
(in cond_wait) lock mutex
<thread is awake, but condition
is unset>
The problem here is that the thread must release the mutex before waiting, potentially allowing another thread to 'steal' whatever that thread was waiting for. Unless it is guaranteed that only one thread can wait on that condition, it is incorrect to assume that the condition is valid when a thread wakes up.
Suppose you don't check the condition. Then usually you can't avoid the following bad thing happening (at least, you can't avoid it in one line of code):
Sender Receiver
locks mutex
sets condition
signals condvar, but nothing
is waiting so has no effect
releases mutex
locks mutex
waits. Forever.
Of course your second code example could avoid this by doing:
pthread_mutex_lock(&m);
if (!condition) pthread_cond_wait(&cond, &m);
// Thread stuff here
pthread_mutex_unlock(&m);
Then it would certainly be the case that if there is only ever at most one receiver, and if cond_signal were the only thing that could wake it up, then it would only ever wake up when the condition was set and hence would not need a loop. nos covers why the second "if" isn't true.