I'm trying to do the Dining Philosophers problem. In my code, after a thread drops the stick, it also sends a broadcast to all threads waiting in the while loop so that they can move forward, but apparently this is not happening and I don't know why.
https://github.com/lucizzz/Philosophers/blob/main/dinning.c
Your code has a lot of bugs, but the most fundamental one is that you access shared state without holding the mutex that protects that state. For example, the while loop in routine_1 tests the stick array without holding the mutex. It even calls pthread_cond_wait without holding the mutex.
This is wrong for many reasons, but the most obvious is this -- what if the while loop decides to call pthread_cond_wait, but before you actually call it, the thread holding the resource releases it? Now you are calling pthread_cond_wait to wait for something that has already happened -- you will be waiting forever.
You must hold the mutex both when you decide whether to call pthread_cond_wait and when you actually do call pthread_cond_wait or your code will wait forever if a thread releases the resource before you were able to wait for it.
Fundamentally, the whole point of condition variables is to provide an atomic "unlock and wait" operation to avoid this race condition. But your code doesn't use the mutexes correctly.
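For illustration, here is a minimal sketch of the pattern described above: the predicate is tested and pthread_cond_wait is called with the mutex held. The names (table_lock, stick_freed, stick_taken, take_sticks, drop_sticks, run_philosophers) are hypothetical and are not taken from dinning.c. It also assumes that both sticks are taken in one critical section, which is a design choice of this sketch and not necessarily how the original code is meant to work.

```c
#include <pthread.h>
#include <stdbool.h>

#define NUM_STICKS 5

/* Hypothetical shared state; one mutex protects the whole stick array. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stick_freed = PTHREAD_COND_INITIALIZER;
static bool stick_taken[NUM_STICKS];

/* Block until both sticks are free, then take them. */
void take_sticks(int left, int right)
{
    pthread_mutex_lock(&table_lock);
    /* The predicate is tested with the mutex held, so no release can
       slip in between the test and the wait. */
    while (stick_taken[left] || stick_taken[right])
        pthread_cond_wait(&stick_freed, &table_lock); /* unlocks and waits atomically */
    stick_taken[left] = true;
    stick_taken[right] = true;
    pthread_mutex_unlock(&table_lock);
}

/* Release both sticks and wake every waiter so it can re-check its predicate. */
void drop_sticks(int left, int right)
{
    pthread_mutex_lock(&table_lock);
    stick_taken[left] = false;
    stick_taken[right] = false;
    pthread_cond_broadcast(&stick_freed);
    pthread_mutex_unlock(&table_lock);
}

static void *philosopher(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < 1000; i++) {
        take_sticks(id, (id + 1) % NUM_STICKS);
        /* eating would happen here */
        drop_sticks(id, (id + 1) % NUM_STICKS);
    }
    return NULL;
}

/* Runs five philosophers to completion and returns how many sticks
   are still marked as taken afterwards. */
int run_philosophers(void)
{
    pthread_t threads[NUM_STICKS];
    int ids[NUM_STICKS];
    for (int i = 0; i < NUM_STICKS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, philosopher, &ids[i]);
    }
    for (int i = 0; i < NUM_STICKS; i++)
        pthread_join(threads[i], NULL);
    int still_taken = 0;
    for (int i = 0; i < NUM_STICKS; i++)
        still_taken += stick_taken[i];
    return still_taken;
}
```

Calling run_philosophers() from main should return once every thread has finished its loop. Because a philosopher takes both sticks or neither, no thread ever holds one stick while waiting for the other.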
So the idea of pthread_cond_wait() is that it will unlock the mutex and wait for the condition.
Let's suppose that you manually unlocked the mutex first and then waited for the condition. In the window between those two operations, you have to assume that something bad can happen: another thread could lock the mutex, which is not good. The reverse order, waiting first and then unlocking, is not possible either.
So, here comes my question:
how does pthread_cond_wait() actually work?
Does the thread call the function, pass a locked mutex, and thereafter wait for the condition to settle?
How does another thread then modify the variable, if it is already locked by this thread?
My first thought was, that the mutex has to be recursive, however being recursive only allows the same thread to lock the mutex multiple times.
Not sure why I didn't just google the specification for pthread_cond_wait().
I guess I didn't completely know what my question would turn out to be when I started.
Anyways the answer to my question can be found here: http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_cond_wait.html
Scenario 1: release mutex then wait
Scenario 2: wait and then release mutex
Trying to understand conceptually what it does.
If the mutex were released before the calling thread is considered "blocked" on the condition variable, then another thread could lock the mutex, change the state that the predicate is based on, and call pthread_cond_signal without the waiting thread ever waking up (since it's not yet blocked). That's the problem.
Scenario 2, waiting then releasing the mutex, is internally how any real-world implementation has to work, since there's no such thing as an atomic implementation of the necessary behavior. But from the application's perspective, there's no way to observe the thread being part of the blocked set without the mutex also being released, so in the sense of the "abstract machine", it's atomic.
Edit: To go into more detail, the real-world implementation of a condition variable wait generally looks like:
1. Modify some internal state of the condition variable object such that the caller is considered to be part of the blocked set for it.
2. Unlock the mutex.
3. Perform a blocking wait operation, with the special property that it will return immediately if the state of the condition variable object from step 1 has changed due to a signal from any other thread.
Thus, the act of "blocking" is split between two steps, one of which happens before the mutex is unlocked (gaining membership in the blocked set) and the other of which happens after the mutex is unlocked (possibly sleeping and yielding control to other threads). It's this split that's able to make the "condition wait" operation "atomic" in the abstract machine.
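The three steps can be sketched with a toy condition variable built on a sequence counter. This is only an illustration under stated assumptions: toy_cond, toy_cond_wait, toy_cond_signal and run_toy_demo are hypothetical names, the "blocking wait" is a yield loop rather than a real kernel sleep, and the signal wakes every current waiter. It is not how any particular pthreads library is written.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Toy condition variable (hypothetical): every signal bumps seq. */
typedef struct {
    atomic_uint seq;
} toy_cond;

void toy_cond_wait(toy_cond *c, pthread_mutex_t *m)
{
    /* Step 1: record the current state while still holding the mutex.
       From here on, any signal is visible as a change of seq. */
    unsigned seen = atomic_load(&c->seq);
    /* Step 2: unlock the mutex. */
    pthread_mutex_unlock(m);
    /* Step 3: wait, but fall through immediately if seq already changed.
       A real implementation would sleep in the kernel instead of yielding. */
    while (atomic_load(&c->seq) == seen)
        sched_yield();
    /* Re-acquire the mutex before returning, as pthread_cond_wait does. */
    pthread_mutex_lock(m);
}

void toy_cond_signal(toy_cond *c)
{
    atomic_fetch_add(&c->seq, 1); /* wakes every thread that recorded the old seq */
}

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static toy_cond cond; /* static storage, so seq starts at 0 */
static int ready;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)
        toy_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts one waiter, makes the predicate true, signals, and joins. */
int run_toy_demo(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_mutex_unlock(&lock);
    toy_cond_signal(&cond);
    pthread_join(t, NULL);
    return ready;
}
```

The signaller cannot change ready until the waiter has released the mutex, and by then the waiter has already recorded seq. A signal arriving between steps 2 and 3 therefore changes seq, and the wait returns immediately instead of being lost.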
Suppose a condition variable is used in a situation where the signaling thread modifies the state affecting the truth value of the predicate and calls pthread_cond_signal without holding the mutex associated with the condition variable? Is it true that this type of usage is always subject to race conditions where the signal may be missed?
To me, there seems to always be an obvious race:
Waiter evaluates the predicate as false, but before it can begin waiting...
Another thread changes state in a way that makes the predicate true.
That other thread calls pthread_cond_signal, which does nothing because there are no waiters yet.
The waiter thread enters pthread_cond_wait, unaware that the predicate is now true, and waits indefinitely.
But does this same kind of race condition always exist if the situation is changed so that either (A) the mutex is held while calling pthread_cond_signal, just not while changing the state, or (B) so that the mutex is held while changing the state, just not while calling pthread_cond_signal?
I'm asking from a standpoint of wanting to know if there are any valid uses of the above not-best-practices usages, i.e. whether a correct condition-variable implementation needs to account for such usages in avoiding race conditions itself, or whether it can ignore them because they're already inherently racy.
The fundamental race here looks like this:
THREAD A                THREAD B
Mutex lock
Check state
                        Change state
                        Signal
cvar wait
(never awakens)
If we take a lock EITHER on the state change OR the signal, OR both, then we avoid this; it's not possible for both the state-change and the signal to occur while thread A is in its critical section and holding the lock.
If we consider the reverse case, where thread A interleaves into thread B, there's no problem:
THREAD A                THREAD B
                        Change state
Mutex lock
Check state
( no need to wait )
Mutex unlock
                        Signal (nobody cares)
So there's no particular need for thread B to hold a mutex over the entire operation; it just needs to hold the mutex for some, possibly infinitesimally small, interval between the state change and the signal. Of course, if the state itself requires locking for safe manipulation, then the lock must be held over the state change as well.
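As a rough sketch of one of the safe variants discussed above, thread B below holds the mutex over the state change and signals after unlocking. All names (m, c, state_ready, thread_a, thread_b, run_signal_demo) are made up for the example.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static bool state_ready;

/* Thread A: checks the state and waits inside one critical section. */
static void *thread_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!state_ready)
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Thread B: changes the state under the mutex, then signals without it. */
static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    state_ready = true;     /* cannot happen between A's check and A's wait */
    pthread_mutex_unlock(&m);
    pthread_cond_signal(&c);
    return NULL;
}

/* Runs both threads to completion; returns 1 if the state was published. */
int run_signal_demo(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return state_ready ? 1 : 0;
}
```

Whichever thread gets the mutex first, thread A either sees the new state and skips the wait, or is already blocked on the condition variable by the time thread B can lock the mutex and signal.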
Finally, note that dropping the mutex early is unlikely to be a performance improvement in most cases. Requiring the mutex to be held reduces contention over the internal locks in the condition variable, and in modern pthreads implementations, the system can 'move' the waiting thread from waiting on the cvar to waiting on the mutex without waking it up (thus avoiding it waking up only to immediately block on the mutex).
As pointed out in the comments, dropping the mutex may improve performance in some cases, by reducing the number of syscalls needed. Then again it could also lead to extra contention on the condition variable's internal mutex. Hard to say. It's probably not worth worrying about in any case.
Note that the applicable standards require that pthread_cond_signal be safely callable without holding the mutex:
The pthread_cond_signal() or pthread_cond_broadcast() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits [...]
This usually means that condition variables have an internal lock over their internal data structures, or otherwise use some very careful lock-free algorithm.
The state must be modified inside a mutex, if for no other reason than the possibility of spurious wake-ups, which would lead to the reader reading the state while the writer is in the middle of writing it.
You can call pthread_cond_signal anytime after the state is changed. It doesn't have to be inside the mutex. POSIX guarantees that at least one waiter will awaken to check the new state. More to the point:
Calling pthread_cond_signal doesn't guarantee that a reader will acquire the mutex first. Another writer might get in before a reader gets a chance to check the new status. Condition variables don't guarantee that readers immediately follow writers (After all, what if there are no readers?)
Calling it after releasing the lock is actually better, since you don't risk having the just-awoken reader immediately going back to sleep trying to acquire the lock that the writer is still holding.
EDIT: @DietrichEpp makes a good point in the comments. The writer must change the state in such a way that the reader can never access an inconsistent state. It can do so either by acquiring the mutex used with the condition variable, as I indicate above, or by ensuring that all state changes are atomic.
The answer is, there is a race, and to eliminate that race, you must do this:
/* atomic op outside of mutex, and then: */
pthread_mutex_lock(&m);
pthread_mutex_unlock(&m);
pthread_cond_signal(&c);
The protection of the data doesn't matter, because you don't hold the mutex when calling pthread_cond_signal anyway.
See, by locking and unlocking the mutex, you have created a barrier. During that brief moment when the signaler has the mutex, there is a certainty: no other thread has the mutex. This means no other thread is executing any critical regions.
This means that all threads are either about to get the mutex to discover the change you have posted, or else they have already found that change and run off with it (releasing the mutex), or else they have not found what they are looking for and have atomically given up the mutex and gone to sleep (and are guaranteed to be waiting nicely on the condition).
Without the mutex lock/unlock, you have no synchronization. The signal will sometimes fire as threads which didn't see the changed atomic value are transitioning to their atomic sleep to wait for it.
So this is what the mutex does from the point of view of a thread which is signaling. You can get the atomicity of access from something else, but not the synchronization.
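To make the ritual concrete, here is a small user-space sketch using pthreads and C11 atomics. It is an assumption-laden illustration, not the kernel code mentioned in the P.S., and the names (posted, waiter, signaler, run_barrier_demo) are invented for the example.

```c
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static atomic_int posted; /* changed atomically, outside the mutex */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    /* The check and the wait form one critical section. */
    while (atomic_load(&posted) == 0)
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *signaler(void *arg)
{
    (void)arg;
    atomic_store(&posted, 1);  /* atomic op outside of the mutex */
    pthread_mutex_lock(&m);    /* barrier: no waiter can be between its */
    pthread_mutex_unlock(&m);  /* check and its wait while we hold this */
    pthread_cond_signal(&c);
    return NULL;
}

/* Runs one waiter and one signaler; returns the final value of posted. */
int run_barrier_demo(void)
{
    pthread_t w, s;
    pthread_create(&w, NULL, waiter, NULL);
    pthread_create(&s, NULL, signaler, NULL);
    pthread_join(w, NULL);
    pthread_join(s, NULL);
    return atomic_load(&posted);
}
```

If the lock/unlock pair in signaler were removed, the store and the signal could both land after the waiter's check but before its wait, and the waiter would sleep forever.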
P.S. I have implemented this logic before. The situation was in the Linux kernel (using my own mutexes and condition variables).
In my situation, it was impossible for the signaler to hold the mutex for the atomic operation on shared data. Why? Because the signaler did the operation in user space, inside a buffer shared between the kernel and user, and then (in some situations) made a system call into the kernel to wake up a thread. User space simply made some modifications to the buffer, and then if some conditions were satisfied, it would perform an ioctl.
So in the ioctl call I did the mutex lock/unlock thing, and then hit the condition variable. This ensured that the thread would not miss the wake up related to that latest modification posted by user space.
At first I just had the condition variable signal, but it looked wrong without the involvement of the mutex, so I reasoned about the situation a little bit and realized that the mutex must simply be locked and unlocked to conform to the synchronization ritual which eliminates the lost wakeup.
I have a situation where thread 1 is waiting on a condition variable A, which should be signalled by thread 2. Meanwhile thread 2 is waiting on a condition variable B, which should be signalled by thread 1. In the scenario where I am using the condition variables, I cannot avoid such a deadlock. I detect the cycle (deadlock) and terminate one of the threads participating in it.
Now, what I am not sure about is how to simply terminate a thread, say thread 1, which is waiting on a condition variable.
Would be grateful for some pointers.
Thanks
Condition variables aren't like mutexes. By that I mean they aren't only usable by a single thread controlling them. The mutex that protects the condition variable is treated that way but that's only locked for short periods of time, unlocked manually by a thread after kicking (signalling) the condition variable, and automatically by a thread waiting for such a kick.
You can have a totally separate thread (like your deadlock detector, let's call it thread 3) simply kick one of the condition variables and it will wake up the thread waiting for it.
The usual use case for condition variables is for threads to wait for the kick then check to ensure you have work anyway (don't assume there is work simply because the variable was kicked). That's to take care of spurious wake-ups.
One possibility is to have a "global" deadlock_occurred flag which thread 3 sets when it detects deadlock, then also have thread 3 kick all the condition variables.
The first thing that threads 1 and 2 should do after being woken, should be to check that flag and take appropriate action (probably exit the thread).
You'll find you get into a lot less deadlock-type trouble if you architect your applications so that threads are responsible for their own lifetime. It's too easy to externally kill threads when they're not in a state amenable to being terminated. Don't get me wrong, there are other ways to handle it (such as with cancel points), but my tried and tested solution is by far the easiest I've ever found.
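A minimal sketch of the flag-plus-kick idea might look like the following. It assumes both condition variables share a single mutex, and every name except deadlock_occurred (lock, cond_a, cond_b, worker, break_deadlock, run_deadlock_demo) is hypothetical. In the demo, the calling thread plays the role of thread 3, and the work flags are never set, which stands in for the deadlock.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_a = PTHREAD_COND_INITIALIZER;
static pthread_cond_t cond_b = PTHREAD_COND_INITIALIZER;
static bool work_a, work_b;    /* never set here: stands in for the deadlock */
static bool deadlock_occurred; /* the "global" flag set by thread 3 */

typedef struct {
    pthread_cond_t *cond;      /* condition variable this worker waits on */
    bool *work;                /* predicate this worker is waiting for */
    int exited_on_deadlock;    /* set to 1 if the worker gave up */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *w = p;
    pthread_mutex_lock(&lock);
    /* Wake-ups may be spurious, so re-check both the work and the flag. */
    while (!*w->work && !deadlock_occurred)
        pthread_cond_wait(w->cond, &lock);
    if (deadlock_occurred)
        w->exited_on_deadlock = 1; /* the thread ends its own lifetime */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* What thread 3 does once it has detected the cycle. */
void break_deadlock(void)
{
    pthread_mutex_lock(&lock);
    deadlock_occurred = true;
    pthread_cond_broadcast(&cond_a); /* kick all the condition variables */
    pthread_cond_broadcast(&cond_b);
    pthread_mutex_unlock(&lock);
}

/* Returns how many workers exited because of the flag. */
int run_deadlock_demo(void)
{
    worker_arg a = { &cond_a, &work_a, 0 };
    worker_arg b = { &cond_b, &work_b, 0 };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    break_deadlock();
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a.exited_on_deadlock + b.exited_on_deadlock;
}
```

Neither worker is killed from outside. Each one notices the flag under the mutex and returns on its own, so it can release whatever it holds first.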
I have encountered a problem while implementing wait and signal conditions on multiple threads.
A thread needs to lock a mutex and wait on a condition variable until some other thread signals it. In the meantime, another thread locks the same mutex and waits on the same condition variable. A third thread, which runs concurrently throughout the process, then signals the condition variable, but I want only the first waiting thread to be woken and not the others.
If two threads wait on the same condition variable, they must be prepared to handle the same conditions, or you must carefully construct your program so they are never waiting on the condition variable at the same time.
Why does this notification have to be handled by the first thread and not the second?
You may be better off with two separate condition variables.
Use pthread_cond_signal() to wake up one of the threads.
However, more than one might be awoken; this is termed spurious wakeup. You need a variable to track your application state, as described in the manual page linked above.
Your requirement is impossible. You say "... I want only the first thread that is waiting must be signalled and not the others." But condition variables never, ever provide any way to ensure a thread isn't signaled. So if you have a requirement that a thread must not be signaled, you cannot use condition variables.
You must always use a condition variable like this:
while (NotSupposedToRun)
    pthread_cond_wait(...);
So if the thread wakes up when it's not supposed to, the while condition is still true and the thread just goes back to sleep. This is mandatory because POSIX does not ever provide any guarantee that a thread won't be woken. An implementation is perfectly free to implement pthread_cond_signal as a call to pthread_cond_broadcast and unblock all threads on every signal if it wants to.
Because condition variables are stateless, the implementation never knows whether a thread is supposed to be woken or not for sure. It is your job to call pthread_cond_wait always, and only, when a thread should not be running.
See http://en.wikipedia.org/wiki/Spurious_wakeup for more details.
If you cannot precisely specify the wakeup conditions for each thread in a while loop like the one above, you should not be using condition variables.
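One way to give each thread a precise wake-up condition is a shared turn counter, sketched below. This is just one possible design: the names (current_turn, turn_changed, waiter, run_turn_demo) are hypothetical, and it uses a broadcast so that every waiter re-checks its own predicate.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_changed = PTHREAD_COND_INITIALIZER;
static int current_turn; /* 0 means nobody may run yet */
static int order[3];     /* records the order in which threads ran */
static int order_len;

static void *waiter(void *arg)
{
    int my_turn = *(int *)arg;
    pthread_mutex_lock(&lock);
    /* Each thread states exactly when it is allowed to run. */
    while (current_turn != my_turn)
        pthread_cond_wait(&turn_changed, &lock);
    order[order_len++] = my_turn;
    current_turn++;                        /* hand over to the next thread */
    pthread_cond_broadcast(&turn_changed); /* everyone re-checks the predicate */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts three waiters in reverse order, then opens turn 1.
   Returns 1 if the threads ran in the order 1, 2, 3. */
int run_turn_demo(void)
{
    pthread_t t[3];
    int ids[3] = { 3, 2, 1 };
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, waiter, &ids[i]);
    pthread_mutex_lock(&lock);
    current_turn = 1;
    pthread_cond_broadcast(&turn_changed);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return order[0] == 1 && order[1] == 2 && order[2] == 3;
}
```

A thread that is woken out of turn finds its predicate unsatisfied and goes back to waiting, so the ordering comes from the shared state rather than from who happened to receive the signal.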