Right before exiting, I call the following from main(), in this order:
pthread_cancel() on the other threads that use mtx and are "waiting" (they are waiting on another condition variable and mutex; maybe that's the problem?)
pthread_cond_destroy(&cnd) (cnd is "coupled" with mtx)
pthread_mutex_unlock(&mtx)
pthread_mutex_destroy(&mtx)
However, the last call returns EBUSY. Whenever another thread uses the mutex, it releases it almost immediately. Also, as mentioned, I cancel all those threads before trying to destroy the mutex.
Why is it happening?
As per man pthread_mutex_destroy:
The pthread_mutex_destroy() function may fail if:
EBUSY
The implementation has detected an attempt to destroy the object referenced by mutex while it is locked or referenced (for example, while being used in a pthread_cond_timedwait() or pthread_cond_wait()) by another thread.
Check that no other thread is still using the mutex when you try to destroy it.
pthread_cancel() on the other threads that use mtx and are "waiting" (they are waiting on another condition variable and mutex)
Cancellation runs asynchronously with respect to the cancelling thread; that is, pthread_cancel() may well return before the thread being cancelled has ended.
As a result, resources (mutexes, condition variables, ...) used by the thread being cancelled may still be in use when you call pthread_mutex_destroy() immediately afterwards.
The only way to test whether cancellation succeeded is to call pthread_join() on the cancelled thread and check that the exit status it reports is PTHREAD_CANCELED. This implies that the thread being cancelled must not be detached.
This is one possible issue with cancelling threads; there are others. Avoid all of this by not using pthread_cancel() at all, and instead implement a proper design that ends all threads in a well-defined manner.
I'm trying to implement the Dining Philosophers problem. In my code, after a thread drops its sticks, it also broadcasts to all threads waiting in the while loop so that they can move forward, but apparently this is not happening and I don't know why.
https://github.com/lucizzz/Philosophers/blob/main/dinning.c
Your code has a lot of bugs, but the most fundamental one is that you access shared state without holding the mutex that protects that state. For example, the while loop in routine_1 tests the stick array without holding the mutex. It even calls pthread_cond_wait without holding the mutex.
This is wrong for many reasons, but the most obvious is this: what if the while loop decides to call pthread_cond_wait, but before it actually calls pthread_cond_wait, the thread holding the resource releases it? Now you are calling pthread_cond_wait to wait for something that has already happened, so you will wait forever.
You must hold the mutex both when you decide whether to call pthread_cond_wait and when you actually call it; otherwise your code will wait forever if another thread releases the resource before you begin waiting for it.
Fundamentally, the whole point of condition variables is to provide an atomic "unlock and wait" operation to avoid this race condition. But your code doesn't use the mutexes correctly.
In the pthread library there is the concept of cancellation points. Most system functions that may block execution for a long time (or wait on some resources...) can be aborted by pthread cancellation points.
Suppose some data is protected by a condition variable and accessed in a thread, as in the pseudocode below. The thread has a cleanup procedure that is called if a cancellation request is made to it.
THREAD_CLEANUP_PROC {
UNLOCK(mutex) // Is this unlock required?
}
THREAD_PROC {
SET THREAD_CLEANUP = THREAD_CLEANUP_PROC
LOOP {
LOCK(mutex)
WHILE (condition == false) {
condition.WAIT(mutex) // wait interrupted, cancel point is called
}
// ... we have the lock
UNLOCK(mutex)
condition.NOTIFY_ALL()
read(descriptor); // wait for some data on a file descriptor while lock is not acquired
}
}
If someone cancels the thread (pthread_cancel()) while it is waiting on the condition variable, the documentation of pthread_cond_wait says that the thread is unblocked, re-acquires the lock, and starts executing the cleanup handler before the thread ends.
Am I right that the cleanup handler is now responsible for unlocking that lock (mutex)? What if, as in my example, there is another blocking call such as read that blocks while waiting for data, but without holding the lock? In that case, read is also unblocked and the cleanup handler is called as before. Only this time the cleanup handler must not unlock the mutex. Am I correct? If so, what is the best way to handle this situation? Are there common concepts that should be followed?
Thread cancellation is messy. Generally speaking, you should not do it.
In the pthread library there is the concept of cancellation points.
Yes.
Most system functions that may block execution for a long time (or wait on some resources...) can be aborted by pthread cancellation points.
Not exactly. Many functions such as the ones you describe are cancellation points. A thread with the "deferred" cancellation type will abort when it calls a function that is a cancellation point if it is currently cancellable and has a cancellation request pending. It is the cancellation request, not the cancellation point, that ends the thread: a thread already suspended in such a function when the request arrives is woken up to act on it. Threads with "asynchronous" cancellation can be cancelled at more or less any time, including while blocked on a long-running task, but cancellation points are irrelevant in that case.
If someone cancels the thread (pthread_cancel()) while it is waiting on the condition variable, the documentation of pthread_cond_wait says that the thread is unblocked, re-acquires the lock, and starts executing the cleanup handler before the thread ends.
Yes, provided that the thread has "deferred" cancellation type.
Am I right that the cleanup handler is now responsible for unlocking that lock (mutex)?
Yes. In this case, the thread holds the mutex locked when it commences its cancellation procedure. If it does not unlock the mutex before it terminates then at minimum you're in for a big hassle. Some types of mutexes (supported by pthreads) may provide for a way to recover from this situation, but you would do well to avoid it.
What if, as in my example, there is another blocking call such as read that blocks while waiting for data, but without holding the lock? In that case, read is also unblocked and the cleanup handler is called as before. Only this time the cleanup handler must not unlock the mutex. Am I correct?
Again, there are various types of mutex, and the situation may differ depending on which you use, but by far the best choice is to carefully avoid any thread trying to unlock a mutex that it does not hold locked.
If so, what is the best way to handle this situation?
The best way to handle the situation is to avoid it in the first place. Do not use thread cancellation, especially for threads susceptible to such issues, which are in fact common.
Instead, write multithreaded programs carefully to afford yourself alternative means to shut down threads or the whole program in a timely manner. There is a whole host of such techniques, more than I could reasonably summarize in an SO answer.
Your code should be changed to look like this:
THREAD_CLEANUP_PROC {
UNLOCK(mutex) // Is this unlock required? YES
}
THREAD_PROC {
LOOP {
LOCK(mutex)
SET THREAD_CLEANUP_PUSH = THREAD_CLEANUP_PROC // After acquiring the lock
WHILE (condition == false) {
condition.WAIT(mutex) // wait interrupted, cancel point is called
}
// ... we have the lock
THREAD_CLEANUP_POP(1) // This unlocks the mutex and removes the cleanup handler
// UNLOCK(mutex) is no longer needed
condition.NOTIFY_ALL()
read(descriptor); // wait for some data on a file descriptor while lock is not acquired
}
}
I'm currently using WaitForSingleObject((HANDLE)handle,INFINITE) function to mutex-lock some parts of my code.
Now I have a situation where I do not want to lock it, but just peek at whether it is locked. With POSIX I can do that with pthread_mutex_trylock(): when it fails, I know the mutex is already locked.
So how can this be done with a WaitForSingleObject() call? How can I find out whether the related mutex is already locked?
I guess it has something to do with the dwMilliseconds parameter, but I don't understand how to tell whether the call acquired the lock or just returned because another thread holds it...
WaitForSingleObject (and its family of functions) is used to efficiently put a thread to sleep while waiting on various types of Windows handles. The calling thread does not continue until the function returns. When the handle is a mutex, a successful wait also acquires the lock, and the thread keeps the mutex locked until it calls ReleaseMutex.
dwMilliseconds merely specifies the wait timeout. Normally you should use the constant INFINITE here. You can also pass the value 0 to dwMilliseconds to have the function check the status of the handle and immediately return and continue execution. If it returns WAIT_OBJECT_0 (or equivalent), you have the mutex lock. This is the equivalent to pthread_mutex_trylock.
If you do specify a timeout, WaitForSingleObject returns WAIT_TIMEOUT when it could not acquire the object within the specified time period. With WaitForMultipleObjects, you also need to check the return value to see which object you got.
Example from MSDN: https://learn.microsoft.com/en-us/windows/win32/sync/using-mutex-objects
There is no "peeking" a mutex's current state. Either you lock the mutex or you don't, there is no peek.
pthread_mutex_trylock() always returns immediately and does not block the calling thread whether the mutex is already locked or not. However, if the mutex was not locked and trylock is successful then the lock has been obtained and you must unlock it. You must check the return value to know which is the case.
To replicate the same behavior with WaitForSingleObject(), simply set the timeout to 0 so it exits immediately without blocking. If the mutex is not already locked and the wait is successful, the lock is obtained and you must unlock it. Again, you must check the return value.
Note: there is a subtle but important difference between a pthread mutex and a Win32 mutex. A Win32 mutex is always recursive. When a thread already holds a Win32 mutex, it can safely relock the same mutex without blocking itself. An internal lock count is incremented each time the mutex is relocked, and the thread simply needs to unlock the mutex as many times as it (re)locked it in order to release the mutex for other threads. A pthread mutex, on the other hand, is recursive only if the mutex creator explicitly requests it (with a mutex attribute of type PTHREAD_MUTEX_RECURSIVE) when calling pthread_mutex_init(). So be careful with your (re)locking to avoid deadlocking your code.
I'm trying to understand how pthread_cond_broadcast() works and whether it is possible to "attach" a thread to the waiting list (or queue) of an event (broadcast signal) on which another thread is already blocked.
Let's assume that we have two threads.
Thread #1 in a waiting loop
pthread_mutex_lock(&mtx);
while (condition_is_false)
    pthread_cond_wait(&cond, &mtx);
pthread_mutex_unlock(&mtx);
Then, somewhere in the middle of this process, while thread #1 is already blocked, another thread #2 calls the same (or almost the same) code, hoping to be "attached" to the same condvar:
pthread_mutex_lock(&mtx);
while (condition_is_false)
    pthread_cond_wait(&cond, &mtx); /* or pthread_cond_timedwait(&cond, &mtx, &deadline) */
pthread_mutex_unlock(&mtx);
As I understand it, thread #2 would not get access to the code protected by the mutex until the mutex is unlocked. Am I right?
I'm trying to implement the following case: some tasks take time to complete. While a task is in progress, other threads are not allowed to duplicate it; instead, they must wait until it finishes. When the task finally finishes, all threads must get the same result.
Your scenario is exactly the one for which condition variables are designed.
The second thread has no problem acquiring the mutex, because pthread_cond_wait (and its variants) releases the mutex while waiting and re-acquires it before returning.
You should definitely read more of the abundant documentation about mutexes and condition variables.
(This question might be somewhat related to "pthread_exit in signal handler causes segmentation fault".) I'm writing a deadlock prevention library in which a checking thread continuously does graph computations to detect deadlocks; if it finds one, it signals one of the conflicting threads. When that thread catches the signal, it releases all the mutexes it owns and exits. There are multiple resource mutexes (obviously) and one critical-region mutex; every call that acquires or releases a resource lock, or does graph calculations, must obtain this lock first.
Now here is the problem. With 2 competing threads (not counting the checking thread), the program sometimes deadlocks after one thread gets killed. In gdb, it says the dead thread owns the critical-region lock but never released it. After adding a breakpoint in the signal handler and stepping through, it appears that the lock belongs to someone else (as expected) right before pthread_exit(), but ownership magically passes to this thread after pthread_exit().
My only guess is that the thread to be killed was blocked in pthread_mutex_lock trying to acquire the critical-region lock (because it wanted another resource mutex) when the signal arrived and interrupted pthread_mutex_lock. Since this call is not signal-safe, something weird happened? Maybe the signal handler returned, the thread got the lock, and then it exited? I don't know. Any insight is appreciated!
pthread_exit is not async-signal-safe, and thus the only way you can call it from a signal handler is if you ensure that the signal is not interrupting any non-async-signal-safe function.
As a general principle, using signals as a method of communication with threads is usually a really bad idea. You end up mixing two issues that are already difficult enough on their own: thread-safety (proper synchronization between threads) and reentrancy within a single thread.
If your goal with signals is just to instruct a thread to terminate, a better mechanism might be pthread_cancel. To use it safely, however, the thread to be cancelled must set up cancellation cleanup handlers at the proper points and/or temporarily disable cancellation when it's not safe (with pthread_setcancelstate). Also, be aware that pthread_mutex_lock is not a cancellation point. There's no safe way to interrupt a thread that's blocked waiting to obtain a mutex, so if you need interruptibility like this, you probably need either a more elaborate synchronization setup with condition variables (condvar waits are cancellable) or semaphores instead of mutexes.
Edit: If you really do need a way to terminate threads waiting for mutexes, you could replace calls to pthread_mutex_lock with calls to your own function that loops calling pthread_mutex_timedlock and checking for an exit flag on each timeout.