For the code below, the mutex will not be available by the time the second cond_broadcast is executed (assuming multiple threads are already waiting on the condition). In such a situation, does the broadcast select a thread (waiting on the condition) to hand the mutex to and wait for the mutex to be unlocked by some other thread, or is the second cond_broadcast just ignored?
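void* func(void* arg){
    pthread_mutex_lock(&m);
    while(condition){
        pthread_cond_wait(&c,&m);
    }
    pthread_cond_broadcast(&c);   /* first broadcast: mutex still held */
    pthread_mutex_unlock(&m);
    pthread_cond_broadcast(&c);   /* second broadcast: mutex no longer held */
    return NULL;                  /* a void* thread function must return a value */
}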
For the code below, the mutex will not be available by the time the second
cond_broadcast is executed (assuming multiple threads are already waiting
on the condition).
I think you mean that the mutex will not be available to the thread calling pthread_cond_broadcast() at the second call to that function, but that's irrelevant. Calling pthread_cond_broadcast() is independent of holding any mutex.
Or perhaps you mean that one of the previously blocked threads will have acquired the mutex by the time the second broadcast happens, but (1) that's not certain, and (2) if it does happen, that has no particular significance with respect to the broadcast.
In such a situation, does the broadcast select a
thread (waiting on the condition) to hand the mutex to and wait for the
mutex to be unlocked by some other thread, or is the second cond_broadcast
just ignored?
Neither. pthread_cond_broadcast() and pthread_cond_signal() have no role in locking or transferring control of any mutex. They just wake threads blocked on the associated CV. That each such thread must acquire the mutex before returning from the call is a separate consideration -- they all contend normally to lock the mutex, and they do not return from pthread_cond_wait() until they do. They also do not go back to waiting without first returning from their wait and then calling pthread_cond_wait() again.
But that does not mean that the second pthread_cond_broadcast() in your code necessarily will have no effect. One of the just-woken threads might loop around and wait on the CV again between the two calls, or some other thread might arrive at the CV. That becomes possible as soon as the first thread releases the mutex, and the fact that the first thing that thread tries to do is another broadcast does not ensure that the broadcast happens before another thread can start waiting.
It is unlikely that you want two broadcasts one after the other like that, but which one you retain has little, if any, effect on the overall semantics of the program.
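To make the point concrete, here is a minimal sketch of the double-broadcast pattern from the question. The names run_double_broadcast, waiter, go, and finished are hypothetical and not part of the original code; the predicate go stands in for the unspecified condition. It only illustrates that the second broadcast is legal without the mutex and that the waiters finish regardless of which broadcast reaches them.

```c
#include <pthread.h>

#define MAX_WAITERS 8

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static int go = 0;        /* hypothetical predicate, protected by m */
static int finished = 0;  /* waiters that got past the wait loop, protected by m */

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    while (!go)                      /* re-check the predicate after every wakeup */
        pthread_cond_wait(&c, &m);
    finished++;                      /* the mutex is held here */
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Returns the number of waiters that finished, or -1 if a broadcast failed. */
int run_double_broadcast(int nthreads) {
    pthread_t t[MAX_WAITERS];
    int rc1, rc2, i;

    if (nthreads > MAX_WAITERS)
        nthreads = MAX_WAITERS;
    go = 0;
    finished = 0;
    for (i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, waiter, NULL);

    pthread_mutex_lock(&m);
    go = 1;                          /* change the predicate under the mutex */
    rc1 = pthread_cond_broadcast(&c);   /* broadcast while holding the mutex */
    pthread_mutex_unlock(&m);
    rc2 = pthread_cond_broadcast(&c);   /* broadcast without the mutex: also legal */

    for (i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return (rc1 == 0 && rc2 == 0) ? finished : -1;
}
```

Calling run_double_broadcast(4) from a main() should return 4. A waiter that has not yet reached pthread_cond_wait() when the broadcasts happen simply sees go already set and never blocks, which is why the predicate loop matters more than the number of broadcasts.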
Related
What happens when you call pthread_cond_broadcast() and multiple threads wake up just to compete for the same mutex lock. One of the threads takes the mutex lock but what happens to the other threads? Do they go back to sleep? Or do they spin until the lock is available again?
What happens when you call pthread_cond_broadcast() and multiple
threads wake up just to compete for the same mutex lock. One of the
threads takes the mutex lock but what happens to the other threads? Do
they go back to sleep? Or do they spin until the lock is available
again?
When you call pthread_cond_broadcast(), all threads then waiting on the specified condition variable stop doing so. All such threads will have passed (a pointer to) the same mutex to pthread_cond_wait(), else the behavior is undefined. Each thread that was unblocked will (re)acquire that mutex before returning successfully from pthread_cond_wait(). That may require some or even all of them to block, just as if they were all contending for the same mutex under any other circumstances. They do not spin, and they do not require any further interaction with the CV for them to resume, but each one will hold the mutex locked when it returns from pthread_cond_wait(), just as it did when it called that function.
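The claim that every woken thread holds the mutex when pthread_cond_wait() returns can be checked with a small sketch. It assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), for which a second pthread_mutex_lock() by the owner fails with EDEADLK. All names here (count_owners_after_broadcast, waiting, ready, owned_on_return) are made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK in strict -std= modes */
#endif
#include <errno.h>
#include <pthread.h>
#include <sched.h>

#define NWAITERS 4

static pthread_mutex_t mtx;            /* initialised as error-checking below */
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int waiting = 0;                /* threads that have entered the wait loop */
static int ready = 0;                  /* hypothetical predicate */
static int owned_on_return = 0;        /* threads that owned mtx after the wait */

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    waiting++;
    while (!ready)
        pthread_cond_wait(&cv, &mtx);
    /* With an error-checking mutex, relocking as the owner yields EDEADLK. */
    if (pthread_mutex_lock(&mtx) == EDEADLK)
        owned_on_return++;
    pthread_mutex_unlock(&mtx);
    return NULL;
}

int count_owners_after_broadcast(void) {
    pthread_mutexattr_t attr;
    pthread_t t[NWAITERS];
    int i, n;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    waiting = ready = owned_on_return = 0;

    for (i = 0; i < NWAITERS; i++)
        pthread_create(&t[i], NULL, waiter, NULL);

    /* Poll until every waiter has released mtx inside pthread_cond_wait(). */
    do {
        pthread_mutex_lock(&mtx);
        n = waiting;
        pthread_mutex_unlock(&mtx);
        if (n < NWAITERS)
            sched_yield();
    } while (n < NWAITERS);

    pthread_mutex_lock(&mtx);
    ready = 1;
    pthread_cond_broadcast(&cv);       /* wakes all waiters; they now contend for mtx */
    pthread_mutex_unlock(&mtx);

    for (i = 0; i < NWAITERS; i++)
        pthread_join(t[i], NULL);
    pthread_mutex_destroy(&mtx);
    return owned_on_return;
}
```

The function should return NWAITERS: each thread leaves the wait one at a time, and each one owns the mutex at that moment.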
How atomic is the unlock-and-wait of pthread_cond_wait() call?
Is there a window where the mutex is already unlocked but the thread is not yet actually waiting and able to receive notifications?
IOW, is there a potential for missed wakeups in the pthread_cond_wait() function itself?
The POSIX.1-2008 specification of pthread_cond_wait addresses this question in its second paragraph:
These functions [pthread_cond_wait and pthread_cond_timedwait] atomically release mutex and cause the calling thread to block on the condition variable cond; atomically here means "atomically with respect to access by another thread to the mutex and then the condition variable". That is, if another thread is able to acquire the mutex after the about-to-block thread has released it, then a subsequent call to pthread_cond_broadcast() or pthread_cond_signal() in that thread shall behave as if it were issued after the about-to-block thread has blocked.
So, to first order, the answer is "yes, it's atomic", but pay careful attention to the last sentence. There is no missed-wakeup window as long as the call to pthread_cond_broadcast or pthread_cond_signal comes from a thread that successfully acquired the mutex after the sleeping thread released it, and then, perhaps, released it again. If the call comes from a thread that has not acquired the mutex at least once since it was released, the wakeup might get lost.
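A minimal sketch of the safe pattern implied by that last sentence: the notifying thread acquires the mutex, changes the shared state, and only then signals. The names handoff, receiver, has_value, and value are hypothetical. Because the waiter also re-checks a predicate, the result does not depend on whether the receiver reaches pthread_cond_wait() before or after the signal.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int has_value = 0;   /* predicate, protected by lock */
static int value = 0;       /* payload, protected by lock */

static void *receiver(void *out) {
    pthread_mutex_lock(&lock);
    while (!has_value)                  /* skipped entirely if the value is already there */
        pthread_cond_wait(&cv, &lock);  /* releases lock and blocks atomically */
    *(int *)out = value;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Hands v to a freshly created receiver thread and returns what it saw. */
int handoff(int v) {
    pthread_t t;
    int received = -1;

    has_value = 0;
    pthread_create(&t, NULL, receiver, &received);

    pthread_mutex_lock(&lock);          /* acquire the mutex at least once ... */
    value = v;
    has_value = 1;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&cv);           /* ... before signaling, so no wakeup is lost */

    pthread_join(t, NULL);
    return received;
}
```

handoff(42) should return 42 on every run. Dropping the lock/unlock pair around the assignment would reintroduce exactly the window the specification warns about.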
Background
I am somewhat confused about my understanding of how condition variables operate in conjunction with concurrent access to shared data. The following is pseudo code to depict my current issue.
// Thread 1: Producer
void cakeMaker()
{
    lock(some_lock);
    while(number_of_cakes == MAX_CAKES)
        wait(rack_has_space);
    number_of_cakes++;
    signal(rack_has_cakes);
    unlock(some_lock);
}
// Thread 2: Consumer
void cakeEater()
{
    lock(some_lock);
    while(number_of_cakes == 0)
        wait(rack_has_cakes);
    number_of_cakes--;
    signal(rack_has_space);
    unlock(some_lock);
}
Let's consider the scenario where the value of number_of_cakes is 0. As a result, Thread 2 is blocked at wait(rack_has_cakes). When Thread 1 runs and increments the value of number_of_cakes to 1, it signals rack_has_cakes. However, Thread 2 wakes up before Thread 1 releases the lock on some_lock, causing it to go back to sleep and miss the signal.
I am unclear about the operation of wait and signal. Are they like a toggle switch that gets set to 1 when signal is called and 0 when wait succeeds? Can someone explain what is happening behind the scenes?
Question
Can someone walk me through one iteration of the above code step-by-step, with a strong emphasis on the events that occur during the signal and wait method calls?
thread 2 wakes up before Thread 1 calls unlock(some_lock), so it goes
back to sleep again and the signal has been missed.
No, that's not how it works. I will cite the C++ std::condition_variable documentation, but POSIX threads and most run-of-the-mill implementations of mutexes and condition variables work the same way. The underlying concepts are the same.
Thread 2 has the mutex locked, when it starts waiting on a condition variable. The wait() operation unlocks the mutex and waits on the condition variable atomically:
Atomically releases lock, blocks the current executing thread, and
adds it to the list of threads waiting on *this.
This operation is considered "atomic"; in other words, indivisible.
Then, when the condition variable is signaled, the thread re-locks the mutex:
When unblocked, regardless of the reason, lock is reacquired and wait
exits.
The thread does not "go back to sleep" before the other thread "calls unlock". If the mutex has not yet been unlocked when the thread wakes up upon being signaled, the thread waits until it succeeds in locking the mutex again. This is unconditional. Then, and only then, does wait() return, with the mutex locked. So, the sequence of events is:
One thread has the mutex locked, sets some counter, variable, or any kind of mutex-protected data to the state that the other thread is waiting for. After doing so the thread signals the condition variable, and then unlocks the mutex at its leisure.
The other thread has locked the mutex before it wait()s on the condition variable. One of wait()'s prerequisites is that the mutex must be locked before wait()ing on the linked condition variable. So, the wait() operation unlocks the mutex "atomically". That is, there is no instance when the mutex is unlocked, and the thread is not yet waiting on the condition variable. When wait() unlocks the mutex, you are guaranteed that the thread will be waiting, and it will wake up. You can take it to the bank.
Once the condition variable is signaled, the wait()ing thread does not return from wait() until it can re-lock the mutex. Having received a signal from the condition variable is just the first step; the mutex must be locked again, by the waiting thread, in the final step of the wait() operation. That, of course, only happens after the signaling thread unlocks the mutex.
When a thread gets signaled by a condition variable, it will return from wait(). But not immediately, it must wait until the thread locks the mutex again, however long it takes. It will not go "back to sleep", but wait until it has the mutex locked again, and then return. You are guaranteed that a received condition variable signal will cause the thread to return from wait(), and the mutex will be re-locked by the thread. And because the original unlock-then-wait operation was atomic, you are guaranteed to receive the condition variable signal.
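The pseudocode from the question can be turned into a runnable sketch with POSIX threads. This is one possible translation, not the asker's actual code: the loop counts, the cakes_eaten counter, and the run_bakery driver are assumptions added so the example terminates and can be checked. Note that signal is called while the mutex is still held, exactly as in the question, and no signal is lost.

```c
#include <pthread.h>

#define MAX_CAKES 3

static pthread_mutex_t some_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t rack_has_space = PTHREAD_COND_INITIALIZER;
static pthread_cond_t rack_has_cakes = PTHREAD_COND_INITIALIZER;
static int number_of_cakes = 0;   /* protected by some_lock */
static int cakes_eaten = 0;       /* protected by some_lock; added for checking */

/* Thread 1: producer. arg points to the number of cakes to bake. */
static void *cake_maker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&some_lock);
        while (number_of_cakes == MAX_CAKES)
            pthread_cond_wait(&rack_has_space, &some_lock);
        number_of_cakes++;
        pthread_cond_signal(&rack_has_cakes);  /* signal while holding the lock */
        pthread_mutex_unlock(&some_lock);      /* the woken eater can now proceed */
    }
    return NULL;
}

/* Thread 2: consumer. arg points to the number of cakes to eat. */
static void *cake_eater(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&some_lock);
        while (number_of_cakes == 0)
            pthread_cond_wait(&rack_has_cakes, &some_lock);
        number_of_cakes--;
        cakes_eaten++;
        pthread_cond_signal(&rack_has_space);
        pthread_mutex_unlock(&some_lock);
    }
    return NULL;
}

/* Returns the number of cakes eaten, or -1 if the rack is not empty at the end. */
int run_bakery(int n) {
    pthread_t maker, eater;

    number_of_cakes = 0;
    cakes_eaten = 0;
    pthread_create(&maker, NULL, cake_maker, &n);
    pthread_create(&eater, NULL, cake_eater, &n);
    pthread_join(maker, NULL);
    pthread_join(eater, NULL);
    return (number_of_cakes == 0) ? cakes_eaten : -1;
}
```

run_bakery(10) should return 10. If a signal could be "missed" in the way the question fears, one of the two threads would block forever and the joins would never complete.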
Lets say we currently have number_of_cakes = 0, so Thread 2 is currently stuck on wait(rack_has_cakes). Thread 1 runs and increments number_of_cakes by 1. Then it calls signal(rack_has_cakes) - this wakes up Thread 2, unfortunately Thread 2 wakes up before Thread 1 calls unlock(some_lock), so it goes back to sleep again and the signal has been missed.
You are right, that might happen, because the order of your signal and unlock calls is not correct.
In both Producer and Consumer, you have set the following order of commands:
signal(rack_has_cakes);
unlock(some_lock);
But the order should be:
unlock(some_lock);
signal(rack_has_cakes);
You first have to unlock the mutex and then signal the other thread.
Since the condition variable's wait() and signal() operations are thread safe, you do not need to worry about releasing the lock before signaling.
But this step is very important, as it gives the other thread a chance to lock the mutex.
This question concerns the pthread API for POSIX systems.
My understanding is that when waiting on a condition variable, or more specifically a pthread_cond_t, the flow goes something like this.
// imagine the mutex is named mutex and the condition variable is named cond
// first we lock the mutex to prevent race conditions
pthread_mutex_lock(&mutex);
// then we wait on the condition variable, releasing the mutex
pthread_cond_wait(&cond, &mutex);
// after we're done waiting we own the mutex again and have to release it
pthread_mutex_unlock(&mutex);
In this example we stop waiting on the condition variable when some other thread follows a procedure like this.
// lock the mutex to prevent race conditions
pthread_mutex_lock(&mutex);
// signal the condition variable, giving up control of the mutex
pthread_cond_signal(&cond);
My understanding is that if multiple threads are waiting some kind of scheduling policy will be applied, and whichever thread is unblocked also gets back the associated mutex.
Now what I don't understand is what happens when some thread calls pthread_cond_broadcast(&cond) to wake all of the threads waiting on the condition variable.
Does only one thread get to own the mutex? Do I need to wait in a fundamentally different manner when waiting for a broadcast than when waiting for a signal (i.e. by not calling pthread_mutex_unlock unless I can confirm this thread acquired the mutex)? Or am I wrong in my whole understanding of how the mutex/cond relationship works?
Most importantly, if (as I think is probably the case) pthread_cond_broadcast causes threads to compete for the associated mutex as if they had all tried to lock it, does that mean only one thread will really wake up?
When some thread calls pthread_cond_broadcast while holding the mutex, it keeps holding the mutex: once pthread_cond_broadcast returns, the caller still owns it.
The other threads will all wake up, try to lock the mutex, then go to sleep to wait for the mutex to become available.
If you call pthread_cond_broadcast while not holding the mutex, then one of the other threads will be able to lock the mutex immediately. All the others will have to wait to lock the mutex.
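The first case can be sketched as follows: the broadcaster keeps the mutex after broadcasting, and no waiter gets out of pthread_cond_wait() until the mutex is released. The names broadcast_while_locked, waiting, ready, released, and saw_release are hypothetical, and the sched_yield() loop is only there to give the woken threads a chance to run while the mutex is still held.

```c
#include <pthread.h>
#include <sched.h>

#define NWAITERS 4

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static int waiting = 0;      /* threads that have entered the wait loop */
static int ready = 0;        /* hypothetical predicate */
static int released = 0;     /* set just before the broadcaster unlocks */
static int saw_release = 0;  /* waiters that returned only after the unlock */

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    waiting++;
    while (!ready)
        pthread_cond_wait(&c, &m);
    if (released)            /* true only if the wait returned after the unlock step */
        saw_release++;
    pthread_mutex_unlock(&m);
    return NULL;
}

int broadcast_while_locked(void) {
    pthread_t t[NWAITERS];
    int i;

    waiting = ready = released = saw_release = 0;
    for (i = 0; i < NWAITERS; i++)
        pthread_create(&t[i], NULL, waiter, NULL);

    /* Wait until every waiter is blocked on the condition variable. */
    pthread_mutex_lock(&m);
    while (waiting < NWAITERS) {
        pthread_mutex_unlock(&m);
        sched_yield();
        pthread_mutex_lock(&m);
    }

    ready = 1;
    pthread_cond_broadcast(&c);   /* all waiters wake up ... */
    for (i = 0; i < 100; i++)
        sched_yield();            /* ... but they block on m, which is still held here */
    released = 1;
    pthread_mutex_unlock(&m);     /* only now can the waiters return, one at a time */

    for (i = 0; i < NWAITERS; i++)
        pthread_join(t[i], NULL);
    return saw_release;
}
```

The function should return NWAITERS, since released is always set by the time any waiter manages to reacquire the mutex.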
Regarding this:
How To Use Condition Variable
Say we have number of consumer threads that execute such code (copied from the referenced page):
while (TRUE) {
    s = pthread_mutex_lock(&mtx);
    while (avail == 0) {   /* Wait for something to consume */
        s = pthread_cond_wait(&cond, &mtx);
    }
    while (avail > 0) {    /* Consume all available units */
        avail--;
    }
    s = pthread_mutex_unlock(&mtx);
}
I assume that scenario here is: main thread calls pthread_cond_signal() to tell consumer threads to do some work.
As I understand it - subsequent threads call pthread_mutex_lock() and then pthread_cond_wait() (which atomically unlocks the mutex). By now none of the consumer threads is claiming the mutex, they all wait on pthread_cond_wait().
When the main thread calls pthread_cond_signal(), following the manpage, at least one thread is woken up. When any of them returns from pthread_cond_wait() it automatically claims the mutex.
So my question is: what happens now regarding the provided example code? Namely, what does the thread that lost the contest for the mutex do now?
(AFAICT the thread that won the mutex should run the rest of the code and release the mutex. The one that lost should be stuck waiting on the mutex, somewhere in the first nested while loop, while the winner holds it. After the mutex has been released, it should start blocking on pthread_cond_wait() because the while (avail == 0) condition will be satisfied by then. Am I correct?)
Note that pthread_cond_signal() is generally intended to wake up only one waiting thread (that's all that it guarantees). But it could wake more 'accidentally'. The while (avail > 0) loop performs two functions:
it allows the one thread guaranteed to be woken up to consume all queued work units
it prevents additional 'accidentally' awakened threads from assuming that there's work to be done, when there might not be since the initial thread would have handled all of them.
It also prevents a race condition where a work unit might have been placed on the queue after the while (avail > 0) loop has completed, but before the worker thread has waited on the condition again. That race is also handled by the while (avail == 0) test just before calling pthread_cond_wait().
Basically when a thread is awakened, all it knows is that there might be work units for it to consume, but there might not (another thread might have consumed them).
So the sequence of events that occurs when pthread_cond_signal() is called is:
the system will wake one or more threads waiting on the condition
all the threads that are awakened will then try to acquire the mutex - only one of them can acquire it at any particular moment, since that's the purpose of a mutex
that thread will then proceed, perform the work in the while (avail > 0) loop, then will release the mutex
at that point one of the other threads that were previously woken up will acquire the mutex and work the same loop, then release the mutex. Generally, there will be no work units available anymore (since the first thread would have consumed all of them), but if another thread had added an additional unit (or more), then this thread would handle that work
the next thread will acquire the mutex and perform that same set of logic
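The sequence above can be sketched with several consumers and one producer. The original loop runs forever, so this version adds a hypothetical done flag and a consumed counter (neither is in the referenced code) to let the threads exit and to make the result checkable. run_consumers is likewise a made-up driver.

```c
#include <pthread.h>

#define NCONSUMERS 3

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int avail = 0;      /* units ready to consume, protected by mtx */
static int done = 0;       /* hypothetical shutdown flag, protected by mtx */
static int consumed = 0;   /* total units consumed, protected by mtx */

static void *consumer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    for (;;) {
        while (avail == 0 && !done)       /* wait for something to consume */
            pthread_cond_wait(&cond, &mtx);
        if (avail == 0)                   /* woken for shutdown, nothing left */
            break;
        while (avail > 0) {               /* consume all available units */
            avail--;
            consumed++;
        }
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Produces the given number of units and returns how many were consumed. */
int run_consumers(int units) {
    pthread_t t[NCONSUMERS];
    int i;

    avail = done = consumed = 0;
    for (i = 0; i < NCONSUMERS; i++)
        pthread_create(&t[i], NULL, consumer, NULL);

    for (i = 0; i < units; i++) {
        pthread_mutex_lock(&mtx);
        avail++;                          /* publish one unit under the mutex */
        pthread_mutex_unlock(&mtx);
        pthread_cond_signal(&cond);       /* wake at least one consumer */
    }

    pthread_mutex_lock(&mtx);
    done = 1;
    pthread_mutex_unlock(&mtx);
    pthread_cond_broadcast(&cond);        /* wake everyone so they can exit */

    for (i = 0; i < NCONSUMERS; i++)
        pthread_join(t[i], NULL);
    return consumed;
}
```

run_consumers(100) should return 100 regardless of which consumer wins the mutex each time. A consumer that wakes up and finds avail == 0 just goes back to waiting, which is the "lost the contest" case from the question.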
pthread_cond_wait() has to acquire the given mutex once it is signaled/woken up. If another thread wins that race, the function blocks until the mutex is released. So from the application's point of view, it doesn't return until the current thread holds the mutex. The wait is always done in a loop (while (avail == 0) { ... above) to make sure that the application condition we are waiting for actually holds (buffer not empty, more work available, etc.)
The thread that lost the contest wakes up once the mutex is unlocked, checks the condition again, then goes to sleep on the condition variable.
When any of them returns from pthread_cond_wait() it automatically claims the mutex.
Ah, but it doesn't. Not "automatically", that is, depending on what "automatically" means. You might be confused by the "atomic" semantics of pthread_cond_wait; but that semantics is played out on the entry side: a thread is somehow registered for waiting on the condition before giving up the mutex, so that there isn't any window during which the thread no longer has the mutex, and is not yet waiting on the variable.
Each thread which returns from pthread_cond_wait has to acquire the mutex and therefore contend for it. Those which lose the race for the mutex have to block on the mutex, similarly as if they called pthread_mutex_lock.
The way the mutex is acquired on exit from pthread_cond_wait can be modeled as a regular pthread_mutex_lock operation. Essentially, the threads have to queue up on the mutex in order to exit. Each thread which acquires the mutex then returns from the function; the others have to wait until that thread gives up the mutex before they are allowed to return.
No thread woken up by the signal gets the mutex "automatically", in the sense of somehow being transferred ownership due to special eligibility. Firstly, on a multiprocessor, a woken thread can lose the race to a thread already running on another processor which snatches the mutex, if it is available, or else queue to wait on the mutex ahead of the thread which received the signal. Secondly, the thread which calls pthread_cond_signal may itself not have given up the mutex, and may continue to hold it indefinitely, which means that all the woken threads will queue up on a mutex lock operation and none will emerge from pthread_mutex_lock until that thread gives up the mutex.
All that is "automatic" is that the pthread_cond_wait operation doesn't return until acquiring the mutex again, and so the application doesn't have to take the step to acquire the mutex.