POSIX thread exit/crash/exception-crash while holding mutex

Is there well-defined behavior for POSIX mutex ownership in each of the following cases?
1. Thread exits
2. Thread crashes
3. Thread crashes due to an exception
Suppose thread-1 owns a mutex and thread-2 is waiting to acquire the same mutex. Thread-1 then goes through scenario 1, 2, or 3. What is the effect on thread-2?
PS: I believe the behavior for a spin lock is NOT to unblock thread-2, on the reasoning that the section protected by the spin lock is in bad shape anyway.

If you're worried about these issues, Robust Mutexes may be the tool you're looking for:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_setrobust.html
After a thread that owns a robust mutex terminates without unlocking it, the next thread that attempts to lock it will get EOWNERDEAD and become the new owner. This signals that it's responsible for cleaning up the state the mutex protects, and marking it consistent again with the pthread_mutex_consistent function before unlocking it. Unlocking it without marking it consistent puts the mutex in a permanently unrecoverable state.
Note that with robust mutexes, all code that locks the mutex must be aware of the possibility that EOWNERDEAD could be returned.
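To make the recovery sequence concrete, here is a minimal sketch, assuming a platform that implements the POSIX.1-2008 robust mutex calls (Linux with glibc, for instance). The names robust_lock, die_holding_lock and robust_recovery_demo are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t robust_lock;

/* Worker that takes the lock and terminates without releasing it. */
static void *die_holding_lock(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&robust_lock);
    return NULL;
}

/* Returns 0 if the abandoned mutex was detected and recovered. */
int robust_recovery_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t worker;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    assert(rc == 0);
    rc = pthread_mutex_init(&robust_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_create(&worker, NULL, die_holding_lock, NULL);
    assert(rc == 0);
    pthread_join(worker, NULL);          /* the owner is now gone */

    rc = pthread_mutex_lock(&robust_lock);
    if (rc != EOWNERDEAD)
        return -1;                       /* not the outcome this sketch expects */

    /* We own the mutex now. Repair the protected state here, then: */
    if (pthread_mutex_consistent(&robust_lock) != 0)
        return -1;
    pthread_mutex_unlock(&robust_lock);

    /* After being marked consistent, the mutex locks normally again. */
    rc = pthread_mutex_lock(&robust_lock);
    if (rc != 0)
        return -1;
    pthread_mutex_unlock(&robust_lock);
    pthread_mutex_destroy(&robust_lock);
    return 0;
}
```

Calling robust_recovery_demo() from main should return 0 on such a platform. Skipping the pthread_mutex_consistent call before unlocking is what leaves the mutex unrecoverable.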

It's really simple. If you don't explicitly unlock the mutex, it remains locked, regardless of what happened or why. This is C, not Ruby on Rails or Visual Basic.

Related

Do we indeed need mutex with attribute "PTHREAD_MUTEX_STALLED" which is opposite to "PTHREAD_MUTEX_ROBUST"?

I'm reading APUE Chapter 12 (3rd edition), which says that we can set either PTHREAD_MUTEX_STALLED or PTHREAD_MUTEX_ROBUST on a mutex. But I don't see why we need PTHREAD_MUTEX_STALLED at all: a mutex should always be "robust", so that we are notified if the side that locked the mutex is dead. If the mutex is "stalled", we will be suspended forever.
I also know that a Windows mutex is always "robust": we are notified with WAIT_ABANDONED if the side that locked the mutex is dead. So in what kind of scenario do we have to use a "stalled" mutex rather than a "robust" one?
Thanks for your attention.

I see the following reasons why stalled mutexes exist:
1. If a robust mutex is used, then every time you lock it you have to check for EOWNERDEAD. So it requires an additional check.
2. If pthread_mutex_lock() returns EOWNERDEAD, you probably need to check all the state of the shared objects protected by that mutex, and the mutex state has to be reinstated by calling pthread_mutex_consistent().
3. It's the default mutex attribute. Hence there is no need for the application to call pthread_mutexattr_setrobust().
4. Historical: early pthread implementations didn't have robust mutexes.
So all the additional checks mentioned above are only required if an application thinks a thread might die unexpectedly while holding a mutex, which is not how most threaded applications are designed. It's a decision for the application to make whether the default behaviour (stalled) is sufficient or robust mutexes are needed.
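One way to keep that extra check in a single place is a small wrapper around pthread_mutex_lock. This is only a sketch: repair_fn, lock_with_owner_check and init_robust are hypothetical names, and what the repair callback has to do depends entirely on the application.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback type: repairs the data guarded by the mutex. */
typedef void (*repair_fn)(void *state);

/*
 * Lock a robust mutex. Returns 0 with the mutex held and usable, or an
 * error code (for example ENOTRECOVERABLE) with the mutex not held.
 */
int lock_with_owner_check(pthread_mutex_t *mutex, repair_fn repair, void *state)
{
    int rc = pthread_mutex_lock(mutex);
    if (rc != EOWNERDEAD)
        return rc;                 /* 0 on the normal path, or a real error */

    /* The previous owner died: we hold the lock, but the data is suspect. */
    if (repair != NULL)
        repair(state);
    rc = pthread_mutex_consistent(mutex);
    if (rc != 0)
        pthread_mutex_unlock(mutex);   /* give up; mutex becomes unrecoverable */
    return rc;
}

/* Initialise a process-private robust mutex. */
int init_robust(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Every call site then pays for one comparison on the normal path, which is the overhead point 1 refers to. A stalled mutex avoids it by never reporting the dead owner at all.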

Will killed process/thread release mutex?

Several processes access shared memory, locking it with the mutex and pthread_mutex_lock() for synchronization, and each process can be killed at any moment (in fact I described php-fpm with APC extension, but it doesn't matter).
Will the mutex be unlocked automatically, if the process locked the mutex and then was killed?
Or is there a way to unlock it automatically?
Edit: As it turns out, dying processes and threads have similar behavior in this situation, which depends on robust attribute of mutex.

That depends on the type of mutex. A "robust" mutex will survive the death of the thread/process. See this question: POSIX thread exit/crash/exception-crash while holding mutex
The next thread that attempts to lock it will receive an EOWNERDEAD error code.
Note: Collected information from the comments.
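For the multi-process case in the question, a rough sketch follows. It assumes Linux-like behaviour: anonymous shared memory via mmap with MAP_ANONYMOUS, and a mutex that is both PTHREAD_PROCESS_SHARED and PTHREAD_MUTEX_ROBUST. The function name shared_robust_demo is invented for the example.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the parent observed EOWNERDEAD after the child died. */
int shared_robust_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    pid_t child;
    int rc, result = -1;

    /* The mutex must live in memory that both processes can see. */
    m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    assert(rc == 0);
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    assert(rc == 0);
    rc = pthread_mutex_init(m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pthread_mutex_lock(m);
        _exit(0);                  /* the process dies while holding the lock */
    }
    waitpid(child, NULL, 0);

    rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        /* Check or rebuild the shared data here before continuing. */
        if (pthread_mutex_consistent(m) == 0)
            result = 0;
    }
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(m);
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return result;
}
```

With a default (stalled) mutex in the same position, the parent's pthread_mutex_lock would simply block forever.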

pthreads lock recovery

I am working on a multi-threaded network server application. At the moment, I am having issues with lock recovery. If a thread dies unexpectedly while it is holding a lock, say a mutex, rwlock, spinlock, etc..., is it possible to recover the lock from a different thread without having to go into the lock struct itself and manually disassociate the owner from the lock. I would like to not have to go to this extreme to clear it as this will make the code non-portable. I have attempted to force a lock owner change by doing a pthread_kill on the offending thread and looking at the return code. But even using a mutex type attribute of PTHREAD_MUTEX_ERRORCHECK, I still cannot gain control of the mutex from another thread if the locking thread has quit. This can be a problem if some internal table is being updated when the thread bails out as it will eventually cause the entire server application to halt.
I have used Google extensively and I'm getting conflicting information, even on here. Any suggestions or ideas that I can explore?
This is on FreeBSD 9.3 using clang-llvm compiler.

For mutexes which are shared between processes (PTHREAD_PROCESS_SHARED) you can set them PTHREAD_MUTEX_ROBUST... but you are stuck with the problem that the state protected by the mutex may be invalid -- depending on the application.
For mutexes which are not shared between processes, there is no standard notion of "robustness", because a thread cannot spontaneously die on its own -- a thread will run until either it is cancelled, it exits or the process exits or dies.
You can use:
void pthread_cleanup_push(void (*routine)(void*), void *arg);
void pthread_cleanup_pop(int execute);
to arrange for a mutex to be released if the thread is cancelled or exits while holding the mutex -- something like:
static void unlock_foo(void *m) { pthread_mutex_unlock(m); } // handler must have type void (*)(void *)
pthread_mutex_lock(&foo); // as now
pthread_cleanup_push(unlock_foo, &foo); // extra step
....
pthread_cleanup_pop(1); // replacing the pthread_mutex_unlock()
HOWEVER: you still need to think very carefully about what state the data protected by the mutex is in when the thread is cancelled or exits !!
You may be much better off examining why the thread needs this, and perhaps sort out any error/exception handling to pass the error/exception up and out of the critical section (leaving the critical section cleanly).
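A fuller sketch of the cleanup-handler approach, showing that a cancelled thread releases the mutex on its way out. The names table_lock, unlock_on_exit, table_worker and cancel_cleanup_demo are illustrative only, and sleep stands in for real work that contains a cancellation point.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handlers take void *, so pthread_mutex_unlock needs a wrapper. */
static void unlock_on_exit(void *mutex)
{
    pthread_mutex_unlock(mutex);
}

static void *table_worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&table_lock);
    pthread_cleanup_push(unlock_on_exit, &table_lock);
    sleep(60);                 /* cancellation point standing in for real work */
    pthread_cleanup_pop(1);    /* normal path: run the handler, i.e. unlock */
    return NULL;
}

/* Returns 0 if the mutex is free again after the worker was cancelled. */
int cancel_cleanup_demo(void)
{
    pthread_t worker;
    void *result;
    int rc = pthread_create(&worker, NULL, table_worker, NULL);
    assert(rc == 0);

    pthread_cancel(worker);
    pthread_join(worker, &result);
    if (result != PTHREAD_CANCELED)
        return -1;

    rc = pthread_mutex_trylock(&table_lock);   /* 0 means nobody holds it */
    if (rc == 0)
        pthread_mutex_unlock(&table_lock);
    return rc;
}
```

The handler only releases the lock. It does nothing about a half-updated table, which is the caveat raised above.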

Is this usage of condition variables ALWAYS subject to a lost-signal race?

Suppose a condition variable is used in a situation where the signaling thread modifies the state affecting the truth value of the predicate and calls pthread_cond_signal without holding the mutex associated with the condition variable? Is it true that this type of usage is always subject to race conditions where the signal may be missed?
To me, there seems to always be an obvious race:
Waiter evaluates the predicate as false, but before it can begin waiting...
Another thread changes state in a way that makes the predicate true.
That other thread calls pthread_cond_signal, which does nothing because there are no waiters yet.
The waiter thread enters pthread_cond_wait, unaware that the predicate is now true, and waits indefinitely.
But does this same kind of race condition always exist if the situation is changed so that either (A) the mutex is held while calling pthread_cond_signal, just not while changing the state, or (B) so that the mutex is held while changing the state, just not while calling pthread_cond_signal?
I'm asking from a standpoint of wanting to know if there are any valid uses of the above not-best-practices usages, i.e. whether a correct condition-variable implementation needs to account for such usages in avoiding race conditions itself, or whether it can ignore them because they're already inherently racy.

The fundamental race here looks like this:
THREAD A              THREAD B
Mutex lock
Check state
                      Change state
                      Signal
cvar wait
(never awakens)
If we take a lock EITHER on the state change OR the signal, OR both, then we avoid this; it's not possible for both the state-change and the signal to occur while thread A is in its critical section and holding the lock.
If we consider the reverse case, where thread A interleaves into thread B, there's no problem:
THREAD A              THREAD B
                      Change state
Mutex lock
Check state
( no need to wait )
Mutex unlock
                      Signal (nobody cares)
So there's no particular need for thread B to hold a mutex over the entire operation; it just needs to hold the mutex for some, possibly infinitesimally small, interval between the state change and the signal. Of course, if the state itself requires locking for safe manipulation, then the lock must be held over the state change as well.
Finally, note that dropping the mutex early is unlikely to be a performance improvement in most cases. Requiring the mutex to be held reduces contention over the internal locks in the condition variable, and in modern pthreads implementations, the system can 'move' the waiting thread from waiting on the cvar to waiting on the mutex without waking it up (thus avoiding it waking up only to immediately block on the mutex).
As pointed out in the comments, dropping the mutex may improve performance in some cases, by reducing the number of syscalls needed. Then again it could also lead to extra contention on the condition variable's internal mutex. Hard to say. It's probably not worth worrying about in any case.
Note that the applicable standards require that pthread_cond_signal be safely callable without holding the mutex:
The pthread_cond_signal() or pthread_cond_broadcast() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits [...]
This usually means that condition variables have an internal lock over their internal data structures, or otherwise use some very careful lock-free algorithm.

The state must be modified inside a mutex, if for no other reason than the possibility of spurious wake-ups, which would lead to the reader reading the state while the writer is in the middle of writing it.
You can call pthread_cond_signal anytime after the state is changed. It doesn't have to be inside the mutex. POSIX guarantees that at least one waiter will awaken to check the new state. More to the point:
Calling pthread_cond_signal doesn't guarantee that a reader will acquire the mutex first. Another writer might get in before a reader gets a chance to check the new status. Condition variables don't guarantee that readers immediately follow writers (After all, what if there are no readers?)
Calling it after releasing the lock is actually better, since you don't risk having the just-awoken reader immediately going back to sleep trying to acquire the lock that the writer is still holding.
EDIT: @DietrichEpp makes a good point in the comments. The writer must change the state in such a way that the reader can never access an inconsistent state. It can do so either by acquiring the mutex used in the condition-variable, as I indicate above, or by ensuring that all state-changes are atomic.
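A small sketch of the pattern this answer describes: the writer changes the state under the mutex and signals after releasing it, while the reader re-checks the predicate in a loop so that spurious wake-ups are harmless. All identifiers here (state_lock, state_cond, ready and so on) are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;     /* the predicate, protected by state_lock */

static void *reader(void *seen_out)
{
    pthread_mutex_lock(&state_lock);
    while (!ready)             /* loop: wake-ups may be spurious */
        pthread_cond_wait(&state_cond, &state_lock);
    *(bool *)seen_out = ready;
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

static void writer(void)
{
    pthread_mutex_lock(&state_lock);
    ready = true;              /* state change inside the mutex */
    pthread_mutex_unlock(&state_lock);
    pthread_cond_signal(&state_cond);   /* signal after releasing the lock */
}

/* Returns 0 if the reader observed the new state. */
int signal_after_unlock_demo(void)
{
    pthread_t t;
    bool seen = false;
    int rc = pthread_create(&t, NULL, reader, &seen);
    assert(rc == 0);
    writer();
    pthread_join(t, NULL);
    return seen ? 0 : -1;
}
```

Because the reader checks ready while holding state_lock, it either sees the new value and never waits, or it is already inside pthread_cond_wait when the writer gets the lock, so the signal cannot be lost.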

The answer is, there is a race, and to eliminate that race, you must do this:
/* atomic op outside of mutex, and then: */
pthread_mutex_lock(&m);
pthread_mutex_unlock(&m);
pthread_cond_signal(&c);
The protection of the data doesn't matter, because you don't hold the mutex when calling pthread_cond_signal anyway.
See, by locking and unlocking the mutex, you have created a barrier. During that brief moment when the signaler has the mutex, there is a certainty: no other thread has the mutex. This means no other thread is executing any critical regions.
This means that all threads are either about to get the mutex and discover the change you have posted, or have already found that change and run off with it (releasing the mutex), or have not found what they are looking for and have atomically given up the mutex and gone to sleep (and are guaranteed to be waiting nicely on the condition).
Without the mutex lock/unlock, you have no synchronization. The signal will sometimes fire as threads which didn't see the changed atomic value are transitioning to their atomic sleep to wait for it.
So this is what the mutex does from the point of view of a thread which is signaling. You can get the atomicity of access from something else, but not the synchronization.
P.S. I have implemented this logic before. The situation was in the Linux kernel (using my own mutexes and condition variables).
In my situation, it was impossible for the signaler to hold the mutex for the atomic operation on shared data. Why? Because the signaler did the operation in user space, inside a buffer shared between the kernel and user, and then (in some situations) made a system call into the kernel to wake up a thread. User space simply made some modifications to the buffer, and then if some conditions were satisfied, it would perform an ioctl.
So in the ioctl call I did the mutex lock/unlock thing, and then hit the condition variable. This ensured that the thread would not miss the wake up related to that latest modification posted by user space.
At first I just had the condition variable signal, but it looked wrong without the involvement of the mutex, so I reasoned about the situation a little bit and realized that the mutex must simply be locked and unlocked to conform to the synchronization ritual which eliminates the lost wakeup.
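The lock/unlock ritual can be sketched in user space roughly as follows. This is not the kernel code described above, just an approximation using pthreads and C11 atomics, with made-up names (wake_lock, wake_cond, posted).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_mutex_t wake_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake_cond = PTHREAD_COND_INITIALIZER;
static atomic_int posted;      /* zero-initialised; changed outside the mutex */

static void *sleeper(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&wake_lock);
    while (atomic_load(&posted) == 0)
        pthread_cond_wait(&wake_cond, &wake_lock);
    pthread_mutex_unlock(&wake_lock);
    return NULL;
}

static void post(void)
{
    atomic_store(&posted, 1);          /* atomic op outside of the mutex */
    pthread_mutex_lock(&wake_lock);    /* barrier: no waiter can be between */
    pthread_mutex_unlock(&wake_lock);  /* its check and its wait right now   */
    pthread_cond_signal(&wake_cond);
}

/* Returns 0 once the sleeper has woken up and exited. */
int barrier_signal_demo(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, sleeper, NULL);
    assert(rc == 0);
    post();
    pthread_join(t, NULL);     /* would hang here if the wake-up were lost */
    return atomic_load(&posted) == 1 ? 0 : -1;
}
```

If the lock/unlock pair in post is removed, the store and the signal can both land between the sleeper's check and its pthread_cond_wait, which is the lost wake-up the answer is about.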

What happens to Mutex when the thread which acquired it exits?

Suppose there are two threads, the main thread and say thread B(created by main). If B acquired a mutex(say pthread_mutex) and it has called pthread_exit without unlocking the lock. So what happens to the mutex? Does it become free?

Nope. The mutex remains locked. What actually happens to such a lock depends on its type. You can read about that here or here

If you created a robust mutex by setting up the right attributes before calling pthread_mutex_init, the mutex will enter a special state when the thread that holds the lock terminates, and the next thread to attempt to acquire the mutex will obtain an error of EOWNERDEAD. It is then responsible for cleaning up whatever state the mutex protects and calling pthread_mutex_consistent to make the mutex usable again, or calling pthread_mutex_unlock (which will make the mutex permanently unusable; further attempts to use it will return ENOTRECOVERABLE).
For non-robust mutexes, the mutex is permanently unusable if the thread that locked it terminates without unlocking it. Per the standard (see the resolution to issue 755 on the Austin Group tracker), the mutex remains locked and its formal ownership continues to belong to the thread that exited, and any thread that attempts to lock it will deadlock. If another thread attempts to unlock it, that's normally undefined behavior, unless the mutex was created with the PTHREAD_MUTEX_ERRORCHECK attribute, in which case an error will be returned.
On the other hand, many (most?) real-world implementations don't actually follow the requirements of the standard. An attempt to lock or unlock the mutex from another thread might spuriously succeed, since the thread id (used to track ownership) might have been reused and may now refer to a different thread (possibly the one making the new lock/unlock request). At least glibc's NPTL is known to exhibit this behavior.
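The error-checking case can be tried out with a sketch like this one. To stay clear of the thread-id reuse problem just mentioned, the owner stays alive while another thread attempts the unlock. The names checked_lock, unlock_as_non_owner and errorcheck_unlock_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t checked_lock;

/* Runs in a second thread that does not own the mutex. */
static void *unlock_as_non_owner(void *rc_out)
{
    *(int *)rc_out = pthread_mutex_unlock(&checked_lock);
    return NULL;
}

/* Returns the error code the non-owner got from pthread_mutex_unlock. */
int errorcheck_unlock_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t other;
    int unlock_rc = -1;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&checked_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&checked_lock);   /* the calling thread is the owner */
    rc = pthread_create(&other, NULL, unlock_as_non_owner, &unlock_rc);
    assert(rc == 0);
    pthread_join(other, NULL);

    pthread_mutex_unlock(&checked_lock); /* the real owner releases it */
    pthread_mutex_destroy(&checked_lock);
    return unlock_rc;
}
```

For an error-checking mutex POSIX specifies EPERM as the result of the non-owner's unlock. With a default mutex the same call would be undefined behavior.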
