I am a beginner on SO, so please let me know if the question is not clear.
I am using two threads, say A and B, and I have a global variable 'p'.
Thread A loops and increments the value of 'p'. At the same time, B tries to set 'p' to some other value (the two run different thread functions).
Suppose I use a mutex for synchronization, and thread A acquires the mutex and increments 'p' in a while loop but never releases the mutex.
My question: if thread A doesn't release the mutex, can thread B access the variable 'p'?
EDIT
Thread B's access to 'p' is also protected by the mutex.
If thread A locks with pthread_mutex_lock() and doesn't release it, what happens when the same thread (A) tries to take the lock again (remember that thread A is looping)?
For example
while(1)
{
pthread_mutex_lock(&mutex);
p = 10;
}
Is there any problem with this code if the mutex is never released?
You can still access the variable from thread B, because the mutex is a separate object that is not connected to the variable. If you call the mutex lock from thread B before accessing p, thread B will wait for the mutex to be released. In fact, thread A will execute the loop body only once, because on the second iteration it will wait for the mutex to be released before it can lock it again.
If you don't unlock the mutex, any call to lock the same mutex will wait indefinitely (this is what happens with the default mutex type on Linux), but the variable will remain writable.
In your example, the access to variable p is what is called a critical section: the part of the code between the mutex lock and the mutex release.
A mutex imposes no restriction by itself; you need to write your program so that it follows the rules of using a mutex.
Here are the basic steps for using a mutex on a shared resource:
Acquire the lock first
Do the job (increment for A, set the value for B)
Release the lock
If both A and B follow the rules, then B can't modify the variable while A holds the lock.
If thread B doesn't acquire the lock first, it can of course modify the variable, but that would be a bug in a concurrent program.
By the way, you can also use a condition variable together with the mutex, so that threads wait for and notify each other instead of looping all the time, which wastes machine resources.
For your updated question
On Linux, in C, there are mainly three functions for acquiring a mutex; what happens when a thread can't get the lock depends on which one you use.
int pthread_mutex_lock(pthread_mutex_t * mutex );
If the mutex is already locked by another thread, the call blocks until the mutex is unlocked.
int pthread_mutex_trylock(pthread_mutex_t * mutex );
Similar to pthread_mutex_lock(), but it doesn't block; it returns the error EBUSY instead.
int pthread_mutex_timedlock(pthread_mutex_t *restrict mutex, const struct timespec *restrict abs_timeout);
Similar to pthread_mutex_lock(), but it waits only until the absolute time abs_timeout and then returns the error ETIMEDOUT.
Simple example of statically initialized mutex
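#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int p = 0;
static pthread_mutex_t locker = PTHREAD_MUTEX_INITIALIZER;
static void *
threadFunc(void *arg)
{
    int err;
    (void)arg;
    /* pthread functions return the error number instead of setting
       errno, so report it with strerror() rather than perror() */
    err = pthread_mutex_lock(&locker);
    if (err != 0){
        fprintf(stderr, "pthread_mutex_lock failed: %s\n", strerror(err));
        exit(1);
    }
    p++;
    err = pthread_mutex_unlock(&locker);
    if (err != 0){
        fprintf(stderr, "pthread_mutex_unlock failed: %s\n", strerror(err));
        exit(1);
    }
    return NULL;
}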
int
main(int argc, char *argv[])
{
pthread_t A, B;
pthread_create(&A, NULL, threadFunc, NULL);
pthread_create(&B, NULL, threadFunc, NULL);
pthread_join(A, NULL);
pthread_join(B, NULL);
printf("p = %d\n", p);
return 0;
}
Error checking in main is omitted for brevity but should be used. If a thread does not release the mutex, the program will never finish, because the other thread will never get the lock.
Story
According to the man page https://linux.die.net/man/3/pthread_mutex_lock
The mutex object referenced by mutex shall be locked by calling pthread_mutex_lock(). If the mutex is already locked, the calling thread shall block until the mutex becomes available.
I have a program with a thread. Here is the program flow:
The main process and the thread always call pthread_mutex_lock inside of a loop.
When the main process is holding the lock, the thread that asks for the lock blocks (waiting for the lock to be granted).
When the main process releases the lock with pthread_mutex_unlock, the thread should immediately get the lock.
When the main process asks for the lock again, the main process should wait for the thread to release the lock.
The problem is that, at point 3, the thread does not get the lock as soon as the main process releases it. The main process gets it first when it calls pthread_mutex_lock in the next loop cycle (at point 4).
How to deal with this situation?
Question
How can I make the thread get the lock as soon as the main process releases the lock?
Simple code to reproduce the problem
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;
void *
my_thread(void *p)
{
(void)p;
while (1) {
pthread_mutex_lock(&my_mutex);
printf("The thread is holding the lock...\n");
sleep(1);
pthread_mutex_unlock(&my_mutex);
}
}
int
main()
{
pthread_t t;
pthread_create(&t, NULL, my_thread, NULL);
pthread_detach(t);
while (1) {
pthread_mutex_lock(&my_mutex);
printf("The main process is holding the lock...\n");
sleep(1);
pthread_mutex_unlock(&my_mutex);
}
}
Compile and Run
gcc test.c -o test -lpthread
./test
Expected result
The main process is holding the lock...
The thread is holding the lock...
The main process is holding the lock...
The thread is holding the lock...
The main process is holding the lock...
The thread is holding the lock...
The main process is holding the lock...
...
Actual result
The main process is holding the lock...
The main process is holding the lock...
The main process is holding the lock...
The main process is holding the lock...
The main process is holding the lock...
The main process is holding the lock...
The main process is holding the lock...
...
Calls story in order
main -> [1] call lock (get the lock)
thread -> [2] call lock (waiting for main to unlock)
main -> [3] call unlock
thread -> [4] (still does not get the lock from [2], why? even though it has been unlocked?)
main -> [5] lock (get the lock again)
thread -> [6] (still does not get the lock from [2])
main -> [7] call unlock
thread -> [8] (still does not get the lock from [2], why? even though it has been unlocked?)
main -> [9] lock (get the lock again)
... and so on ...
Summary
pthread_mutex_lock does not guarantee the order of lock requests.
pthread_mutex_lock guarantees that it will block until the mutex becomes available. That does not mean that each lock() call enters a queue and is guaranteed to get the mutex lock next. It only means that nobody else will have the lock at the same time.
If you require a certain order, an option would be to use condition variables. That way, you can have a flag that is set to the next member which should get the mutex. You can then wait on the condition variable until the value is as expected. See https://linux.die.net/man/3/pthread_cond_wait.
Alternatively, if your example has sleeps in it anyway as above, you can just move the sleep after the unlock() call. While that is not strictly speaking a guarantee, it will most definitely do the trick for a simple test. I do not recommend this approach for anything more serious/complex though.
EDIT: As Shawn correctly added, you can also use pthread_yield(3) to allow another thread to acquire the mutex if you don't care which other thread it is. Note that pthread_yield is non-standard; sched_yield is the POSIX call. Some intricacies with yielding are described in sched_yield(2).
PS: I would comment, but my rep is not high enough yet :)
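The condition-variable idea from this answer can be sketched roughly like this. It is only an illustration: the names turn, take_turn and run_alternating are made up, the loop runs a fixed number of rounds instead of forever, and the order is recorded in a string rather than printed.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define ROUNDS 4

static pthread_mutex_t turn_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_changed = PTHREAD_COND_INITIALIZER;
static int turn = 0;                /* 0: main's turn, 1: thread's turn */
static char order[2 * ROUNDS + 1];  /* who ran, in order: 'M' or 'T' */
static int pos = 0;

/* Wait until it is our turn, do the work, then hand over the turn. */
static void take_turn(int me, char tag)
{
    pthread_mutex_lock(&turn_lock);
    while (turn != me)                        /* re-check after each wakeup */
        pthread_cond_wait(&turn_changed, &turn_lock);
    order[pos++] = tag;                       /* the protected "work" */
    turn = !me;
    pthread_cond_signal(&turn_changed);       /* only one other waiter */
    pthread_mutex_unlock(&turn_lock);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        take_turn(1, 'T');
    return NULL;
}

/* Runs main and one thread in strict alternation; returns the order. */
const char *run_alternating(void)
{
    pthread_t t;
    int rc;

    pos = 0;
    turn = 0;
    memset(order, 0, sizeof order);
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    for (int i = 0; i < ROUNDS; i++)
        take_turn(0, 'M');
    pthread_join(t, NULL);
    return order;
}
```

Since each side waits for the flag rather than for the bare mutex, the alternation no longer depends on which thread the scheduler happens to wake first.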
Here is a 'fairlock' as an example; you can do better:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
typedef struct fairlock FairLock;
struct fairlock {
pthread_mutex_t lock;
pthread_cond_t cv;
long scount;
long wcount;
};
#define err(x,v) do { int t; if ((t=(x)) != (v)) { \
error(__FILE__,__LINE__, #x, t, (v)); \
}} while (0)
static void error(char *fn, int lno, char *s, long x, long v) {
fprintf(stderr, "%s:%d %s returned %ld rather than %ld\n",
fn, lno, s, x, v);
exit(1);
}
void Lock(FairLock *f) {
err(pthread_mutex_lock(&f->lock), 0);
long me = f->scount++;
while (f->wcount != me) {
err(pthread_cond_wait(&f->cv, &f->lock), 0);
}
err(pthread_mutex_unlock(&f->lock), 0);
}
void UnLock(FairLock *f) {
err(pthread_mutex_lock(&f->lock), 0);
if (f->scount > f->wcount) {
f->wcount++;
err(pthread_cond_broadcast(&f->cv), 0);
}
err(pthread_mutex_unlock(&f->lock), 0);
}
FairLock *NewLock(void) {
FairLock *p = malloc(sizeof *p);
if (p != 0) {
err(pthread_mutex_init(&p->lock, 0),0);
err(pthread_cond_init(&p->cv, 0),0);
p->scount = p->wcount = 0;
}
return p;
}
void DoneLock(FairLock *f) {
err(pthread_mutex_destroy(&f->lock), 0);
err(pthread_cond_destroy(&f->cv), 0);
}
And your testlock.c changed to utilize it; again there is room for improvement, but you should be able to stick sleeps just about anywhere and it will remain fair....
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include "fairlock.c"
FairLock *my;
void *
my_thread(void *p)
{
    while (1) {
        Lock(my);
        /* p is a void *, so cast it before passing it to %s */
        printf("%s is holding the lock...\n", (char *)p);
        UnLock(my);
    }
}
int
main()
{
pthread_t t;
my = NewLock();
pthread_create(&t, NULL, my_thread, "one");
pthread_detach(t);
pthread_create(&t, NULL, my_thread, "two");
pthread_detach(t);
pthread_create(&t, NULL, my_thread, "three");
pthread_detach(t);
pthread_create(&t, NULL, my_thread, "four");
pthread_detach(t);
pthread_create(&t, NULL, my_thread, "five");
pthread_detach(t);
while (1) {
Lock(my);
printf("main process is holding the lock...\n");
UnLock(my);
}
}
TLDR Version:
The very next thing that either of your two loops does after releasing the lock is to try to acquire it again.
When there's a race between thread A which has just released a lock and thread B which has been blocked, waiting for the lock, thread A almost always will win because thread A already is running, and thread B still is "asleep."
Releasing the lock doesn't instantaneously "wake up" the waiting thread. All it does is change the status of the other thread from "waiting for the lock" to "waiting to be assigned a CPU to run on." Very soon after, the scheduler will get around to restoring the context of thread B on another CPU, and thread B will start running, but by then it will be too late. Thread A will have already re-locked the lock.
I'm writing a C application that reads and parses data from a set of sensors and, according to the readings of the sensors, turns actuators on or off.
For my application I will be using two threads, one to read and parse the data from the sensors and another one to act on the actuators. Obviously we may face the problem of one thread reading data from a certain variable while another one is trying to write on it. This is a sample code.
#include <pthread.h>
int sensor_values;
void* reads_from_sensor(void *arg){
    //writes on sensor_values, while(1) loop
    return NULL;
}
void* turns_on_or_off(void *arg){
    //reads from sensor_values, while(1) loop
    return NULL;
}
int main(){
    pthread_t threads[2];
    //valid indices of threads[2] are 0 and 1
    pthread_create(&threads[0],NULL,reads_from_sensor,NULL);
    pthread_create(&threads[1],NULL,turns_on_or_off,NULL);
    //code continues after
}
My question is how I can solve this issue, of a certain thread writing on a certain global variable while other thread is trying to read from it, at the same time. Thanks in advance.
OP wrote in the comments
The project is still in an alpha stage. I'll make sure I optimize it once it is done. #Pablo, the shared variable is sensor_values. reads_from_sensors write on it and turns_on_or_off reads from it.
...
sensor_value would be a float as it stores a value measured by a certain sensor. That value can either be voltage, temperature or humidity
In that case I'd use condition variables with pthread_cond_wait and
pthread_cond_signal. With these functions you can synchronize threads
with each other.
The idea is that both threads get a pointer to a mutex, the condition variable
and the shared resource. Whether you declare them as globals or pass them as
thread arguments doesn't change the idea. In the code below I'm passing all
of these as thread arguments, because I don't like global variables.
The reading thread locks the mutex and, when it reads a new value from the
sensor, writes the new value to the shared resource. Then it calls
pthread_cond_signal to signal the turning thread that a new value
has arrived and that it can read it.
The turning thread also locks the mutex and calls pthread_cond_wait to
wait for the signal. The locking must be done this way because
pthread_cond_wait releases the lock and blocks the thread until the
signal is sent:
man pthread_cond_wait
DESCRIPTION
The pthread_cond_timedwait() and pthread_cond_wait() functions shall block on a condition variable. The application shall ensure that
these functions are called with mutex locked by the calling thread; otherwise, an error (for PTHREAD_MUTEX_ERRORCHECK and robust
mutexes) or undefined behavior (for other mutexes) results.
These functions atomically release mutex and cause the calling thread to block on the condition variable cond; atomically here means
atomically with respect to access by another thread to the mutex and then the condition variable. That is, if another thread is
able to acquire the mutex after the about-to-block thread has released it, then a subsequent call to pthread_cond_broadcast() or
pthread_cond_signal() in that thread shall behave as if it were issued after the about-to-block thread has blocked.
Example:
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
struct thdata {
pthread_mutex_t *mutex;
pthread_cond_t *cond;
int *run;
float *sensor_value; // the shared resource
};
void *reads_from_sensors(void *tdata)
{
struct thdata *data = tdata;
int i = 0;
while(*data->run)
{
pthread_mutex_lock(data->mutex);
// read from sensor
*data->sensor_value = (rand() % 2000 - 1000) / 10.0;
// just for testing, send a signal only every
// 3 reads
if((++i % 3) == 0)
{
printf("read: value == %f, sending signal\n", *data->sensor_value);
pthread_cond_signal(data->cond);
}
pthread_mutex_unlock(data->mutex);
sleep(1);
}
// sending signal so that other thread can
// exit
pthread_mutex_lock(data->mutex);
pthread_cond_signal(data->cond);
pthread_mutex_unlock(data->mutex);
puts("read: bye");
pthread_exit(NULL);
}
void *turns_on_or_off (void *tdata)
{
struct thdata *data = tdata;
while(*data->run)
{
pthread_mutex_lock(data->mutex);
pthread_cond_wait(data->cond, data->mutex);
printf("turns: value read: %f\n\n", *data->sensor_value);
pthread_mutex_unlock(data->mutex);
usleep(1000);
}
puts("turns: bye");
pthread_exit(NULL);
}
int main(void)
{
srand(time(NULL));
struct thdata thd[2];
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
// controlling vars
int run_rfs = 1;
int run_tof = 1;
float sensor_value;
thd[0].run = &run_rfs;
thd[1].run = &run_tof;
thd[0].mutex = &mutex;
thd[1].mutex = &mutex;
thd[0].cond = &cond;
thd[1].cond = &cond;
thd[0].sensor_value = &sensor_value;
thd[1].sensor_value = &sensor_value;
pthread_t th[2];
printf("Press ENTER to exit...\n");
pthread_create(th, NULL, reads_from_sensors, thd);
pthread_create(th + 1, NULL, turns_on_or_off, thd + 1);
getchar();
puts("Stopping threads...");
run_rfs = 0;
run_tof = 0;
pthread_join(th[0], NULL);
pthread_join(th[1], NULL);
return 0;
}
Output:
$ ./a
Press ENTER to exit...
read: value == -99.500000, sending signal
turns: value read: -99.500000
read: value == -25.200001, sending signal
turns: value read: -25.200001
read: value == 53.799999, sending signal
turns: value read: 53.799999
read: value == 20.400000, sending signal
turns: value read: 20.400000
Stopping threads...
read: bye
turns: value read: 20.400000
turns: bye
Note that in the example I only send the signal every 3 seconds (and do a long
sleep(1)) for testing purposes, otherwise the terminal would overflow immediately
and you would have a hard time reading the output.
See also: understanding of pthread_cond_wait() and pthread_cond_signal()
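One thing the example above leaves out: pthread_cond_wait can wake up spuriously, and a signal sent while nobody is waiting is lost, so the wait is usually wrapped in a loop that checks a flag. Here is a rough sketch of that pattern. The fresh and done fields and the publish/finish/receive helpers are my own additions for illustration, not part of the code above.

```c
#include <assert.h>
#include <pthread.h>

struct channel {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    float value;  /* latest sensor value */
    int fresh;    /* 1 while value has not been consumed yet */
    int done;     /* 1 once the producer has finished */
};

static struct channel ch = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0.0f, 0, 0
};

/* Producer side: store a value and mark it as not yet consumed. */
void publish(struct channel *c, float v)
{
    pthread_mutex_lock(&c->mutex);
    c->value = v;
    c->fresh = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->mutex);
}

/* Producer side: tell the consumer that no more values will come. */
void finish(struct channel *c)
{
    pthread_mutex_lock(&c->mutex);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->mutex);
}

/* Consumer side: returns 1 with a value, or 0 once the producer is done. */
int receive(struct channel *c, float *out)
{
    int got;
    pthread_mutex_lock(&c->mutex);
    while (!c->fresh && !c->done)            /* the predicate loop */
        pthread_cond_wait(&c->cond, &c->mutex);
    got = c->fresh;
    if (got) {
        *out = c->value;
        c->fresh = 0;
    }
    pthread_mutex_unlock(&c->mutex);
    return got;
}

static void *consumer(void *arg)
{
    float *last = arg;
    float v;
    while (receive(&ch, &v))
        *last = v;
    return NULL;
}

/* Publish one value, stop, and return what the consumer saw last. */
float demo_handoff(void)
{
    pthread_t t;
    float last = 0.0f;
    int rc;

    ch.fresh = ch.done = 0;
    rc = pthread_create(&t, NULL, consumer, &last);
    assert(rc == 0);
    publish(&ch, 42.5f);
    finish(&ch);
    pthread_join(t, NULL);
    return last;
}
```

Because the flags live under the mutex, it does not matter whether the consumer starts waiting before or after the producer signals, and the stop flag is no longer read without synchronization.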
Your question is too generic. There are different multithreaded synchronization methods: mutexes, reader-writer locks, condition variables and so on.
The simplest are mutexes (mutual exclusion). They are variables of type pthread_mutex_t. You first need to initialize them; you can do it in two ways:
assigning the constant value PTHREAD_MUTEX_INITIALIZER to the mutex variable
calling the function pthread_mutex_init
Then, before reading or writing a shared variable, you call the function int pthread_mutex_lock(pthread_mutex_t *mutex); and after leaving the critical section you must release the mutex by calling int pthread_mutex_unlock(pthread_mutex_t *mutex);.
If the resource is busy, the lock call blocks the execution of your code until the mutex is released. If you want to avoid that, take a look at int pthread_mutex_trylock(pthread_mutex_t *mutex);.
If your program has much more reads than writes on the same shared variable, take a look at the Reader-Writer locks.
A naive question.
I have read that "a mutex has to be unlocked only by the thread that locked it."
But I have written a program where THREAD1 locks mutexVar and goes to sleep. Then THREAD2 can directly unlock mutexVar, do some operations and return.
==> I know everyone will ask why I am doing this, but my question is: is this the right behaviour for a mutex?
==> Adding the sample code
void *functionC()
{
pthread_mutex_lock( &mutex1 );
counter++;
sleep(10);
printf("Thread01: Counter value: %d\n",counter);
pthread_mutex_unlock( &mutex1 );
}
void *functionD()
{
pthread_mutex_unlock( &mutex1 );
pthread_mutex_lock( &mutex1 );
counter=10;
printf("Counter value: %d\n",counter);
}
int main()
{
int rc1, rc2;
pthread_t thread1, thread2;
if(pthread_mutex_init(&mutex1, NULL))
printf("Error while using pthread_mutex_init\n");
if( (rc1=pthread_create( &thread1, NULL, &functionC, NULL)) )
{
printf("Thread creation failed: %d\n", rc1);
}
if( (rc2=pthread_create( &thread2, NULL, &functionD, NULL)) )
{
printf("Thread creation failed: %d\n", rc2);
}
Pthreads has 3 different kinds of mutexes: Fast mutex, recursive mutex, and error checking mutex. You used a fast mutex which, for performance reasons, will not check for this error. If you use the error checking mutex on Linux you will find you get the results you expect.
Below is a small hack of your program as an example and proof. It locks the mutex in main() and the unlock in the created thread will fail.
#define _GNU_SOURCE /* glibc exposes the _NP initializer only with this */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
/*** NOTE THE ATTR INITIALIZER HERE! ***/
pthread_mutex_t mutex1 = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
int counter = 0;
void *functionD(void* data)
{
int rc;
if ((rc = pthread_mutex_unlock(&mutex1)) != 0)
{
errno = rc;
perror("other thread unlock result");
exit(1);
}
pthread_mutex_lock(&mutex1);
counter=10;
printf("Thread02: Counter value: %d\n",counter);
return(data);
}
int main(int argc, char *argv[])
{
int rc1;
pthread_t thread1;
if ((rc1 = pthread_mutex_lock(&mutex1)) != 0)
{
errno = rc1;
perror("main lock result");
}
if( (rc1 = pthread_create(&thread1, NULL, &functionD, NULL)))
{
printf("Thread creation failed: %d\n", rc1);
}
pthread_join(thread1, NULL);
}
What you've done is simply not legal, and the behavior is undefined. Mutexes only exclude threads that play by the rules. If you tried to lock mutex1 from thread 2, the thread would be blocked, of course; that's the required thing to do. There's nothing in the spec that says what happens if you try to unlock a mutex you don't own!
A mutex is used to prevent multiple threads from executing code that is only safe for one thread at a time.
To do this a mutex has several features:
A mutex can handle the race conditions associated with multiple threads trying to "lock" the mutex at the same time, and always results in exactly one thread winning the race.
Any thread that loses the race is put to sleep until the mutex is unlocked. The mutex maintains a list of these threads.
A mutex will hand the "lock" to one and only one of the waiting threads when it is unlocked by the thread that was just using it. The mutex will wake that thread.
If that type of pattern is useful for some other purpose then go ahead and use it for a different reason.
Back to your question. Let's say you were protecting some code from multiple thread accesses with a mutex, and let's say 5 threads were waiting while thread A was executing the code. If thread B (not one of the ones waiting, since they are asleep at the moment) unlocks the mutex, another thread will start executing the code at the same time as thread A. Probably not desired.
Maybe if we knew what you were thinking about using the mutex for we could give a better answer. Are you trying to unlock a mutex after a thread was canceled? Do you have code that can handle 2 threads at a time but not three and there is no mutex that lets 2 threads through at a time?
I have written the following program that runs two POSIX threads. There is a global shared variable sum, which is accessed by the two threads simultaneously. I lock and unlock a mutex inside each thread while accessing the shared variable. I have a question. Here I have used the same mutex (pthread_mutex_lock(&mutex)) inside the two threads. What will happen if I use two different mutexes inside the threads (such as pthread_mutex_lock(&mutex) in thread1 and pthread_mutex_lock(&mutex1) in thread2)? I have marked the lines in question with comments in the code.
My sample code fragment:
#include<stdio.h>
#include<pthread.h>
#include<stdlib.h>
#include<unistd.h>
pthread_mutex_t mutex=PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex1=PTHREAD_MUTEX_INITIALIZER;
int sum=0;
void * threadFunc1(void * arg)
{
    int i;
    for(i=1;i<100;i++)
    {
        printf("%s\n",(char*)arg);
        pthread_mutex_lock(&mutex);
        sum++;
        pthread_mutex_unlock(&mutex);
        sleep(1);
    }
    return NULL;
}
void * threadFunc2(void * arg)
{
    int i;
    for(i=1;i<100;i++)
    {
        printf("%s\n",(char*)arg);
        pthread_mutex_lock(&mutex); //what will happen if I use mutex1 here
        sum--;
        pthread_mutex_unlock(&mutex); //what will happen if I use mutex1 here
        sleep(1);
    }
    return NULL;
}
int main(void)
{
pthread_t thread1;
pthread_t thread2;
char * message1 = "i am thread 1";
char * message2 = "i am thread 2";
pthread_create(&thread1,NULL,threadFunc1,(void*)message1 );
pthread_create(&thread2,NULL,threadFunc2,(void*)message2 );
pthread_join(thread1,NULL);
pthread_join(thread2,NULL);
return 0;
}
What is the basic difference between using the same mutex locks and different mutex lock in accessing a shared variable?
The purpose of the mutex is to prevent one thread from accessing the shared variable while another thread is, or might be, modifying it. The semantics of a mutex are that two threads cannot lock the same mutex at the same time. If you use two different mutexes, you don't prevent one thread from accessing the shared variable while another thread is modifying it, since threads can hold different mutexes at the same time. So the code will no longer be guaranteed to work.
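As a sketch of the working case, here both threads go through the same mutex, so the increments and decrements cannot interleave and the total comes out at zero. The names sum_lock, adjust and run_balanced are made up for this example, and the iteration count is arbitrary. If one of the threads locked a second mutex instead, the two `sum += delta` updates could overlap and the result would no longer be guaranteed.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 100000

static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;
static long sum = 0;

/* Adds *arg (+1 or -1) to sum ITERATIONS times, always under sum_lock. */
static void *adjust(void *arg)
{
    long delta = *(long *)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&sum_lock);
        sum += delta;
        pthread_mutex_unlock(&sum_lock);
    }
    return NULL;
}

/* Runs one incrementing and one decrementing thread; returns final sum. */
long run_balanced(void)
{
    pthread_t up, down;
    long plus = 1, minus = -1;
    int rc;

    sum = 0;
    rc = pthread_create(&up, NULL, adjust, &plus);
    assert(rc == 0);
    rc = pthread_create(&down, NULL, adjust, &minus);
    assert(rc == 0);
    pthread_join(up, NULL);
    pthread_join(down, NULL);
    return sum;
}
```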
Consider the following test program:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <strings.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
pthread_mutex_t mutex;
pthread_mutexattr_t mattr;
pthread_t thread1;
pthread_t thread2;
pthread_t thread3;
void mutex_force_unlock(pthread_mutex_t *mutex, pthread_mutexattr_t *mattr)
{
int e;
e = pthread_mutex_destroy(mutex);
printf("mfu: %s\n", strerror(e));
e = pthread_mutex_init(mutex, mattr);
printf("mfu: %s\n", strerror(e));
}
void *thread(void *d)
{
int e;
e = pthread_mutex_trylock(&mutex);
if (e != 0)
{
printf("thr: %s\n", strerror(e));
mutex_force_unlock(&mutex, &mattr);
e = pthread_mutex_unlock(&mutex);
printf("thr: %s\n", strerror(e));
if (e != 0) pthread_exit(NULL);
e = pthread_mutex_lock(&mutex);
printf("thr: %s\n", strerror(e));
}
pthread_exit(NULL);
}
void * thread_deadtest(void *d)
{
int e;
e = pthread_mutex_lock(&mutex);
printf("thr2: %s\n", strerror(e));
e = pthread_mutex_lock(&mutex);
printf("thr2: %s\n", strerror(e));
pthread_exit(NULL);
}
int main(void)
{
/* Setup */
pthread_mutexattr_init(&mattr);
pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_ERRORCHECK);
//pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_NORMAL);
pthread_mutex_init(&mutex, &mattr);
/* Test */
pthread_create(&thread1, NULL, &thread, NULL);
pthread_join(thread1, NULL);
if (pthread_kill(thread1, 0) != 0) printf("Thread 1 has died.\n");
pthread_create(&thread2, NULL, &thread, NULL);
pthread_join(thread2, NULL);
pthread_create(&thread3, NULL, &thread_deadtest, NULL);
pthread_join(thread3, NULL);
return(0);
}
Now when this program runs, I get the following output:
Thread 1 has died.
thr: Device busy
mfu: Device busy
mfu: No error: 0
thr: Operation not permitted
thr2: No error: 0
thr2: Resource deadlock avoided
Now I know this has been asked a number of times before, but is there any way to forcefully unlock a mutex? It seems the implementation will only allow the mutex to be unlocked by the thread that locked it as it seems to actively check, even with a normal mutex type.
Why am I doing this? It has to do with coding a bullet-proof network server that has the ability to recover from most errors, including ones where the thread terminates unexpectedly. At this point, I can see no way of unlocking a mutex from a thread that is different than the one that locked it. So the way that I see it is that I have a few options:
Abandon the mutex and create a new one. This is the undesirable option as it creates a memory leak.
Close all network ports and restart the server.
Go into the kernel internals and release the mutex there bypassing the error checking.
I have asked this before but, the powers that be absolutely want this functionality and they will not take no for an answer (I've already tried), so I'm kinda stuck with this. I didn't design it this way, and I would really like to shoot the person who did, but that's not an option either.
And before someone says anything, my usage of pthread_kill is legal under POSIX...I checked.
I forgot to mention, this is FreeBSD 9.3 that we are working with.
Use a robust mutex, and if the locking thread dies, fix the mutex with pthread_mutex_consistent().
If mutex is a robust mutex in an inconsistent state, the
pthread_mutex_consistent() function can be used to mark the state
protected by the mutex referenced by mutex as consistent again.
If an owner of a robust mutex terminates while holding the mutex, the
mutex becomes inconsistent and the next thread that acquires the mutex
lock shall be notified of the state by the return value [EOWNERDEAD].
In this case, the mutex does not become normally usable again until
the state is marked consistent.
If the thread which acquired the mutex lock with the return value
[EOWNERDEAD] terminates before calling either
pthread_mutex_consistent() or pthread_mutex_unlock(), the next thread
that acquires the mutex lock shall be notified about the state of the
mutex by the return value [EOWNERDEAD].
Well, you cannot do what you ask with a normal pthread mutex, since, as you say, you can only unlock a mutex from the thread that locked it.
What you can do is wrap locking/unlocking of a mutex such that you have a pthread cancel handler that unlocks the mutex if the thread terminates. To give you an idea:
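void cancel_unlock_handler(void *p)
{
    pthread_mutex_unlock(p);
}
/* Conceptual only: POSIX requires pthread_cleanup_push and
   pthread_cleanup_pop to be paired in the same lexical scope, and
   they are often macros, so splitting them across two functions
   like this may not compile as written. */
int my_pthread_mutex_lock(pthread_mutex_t *m)
{
    int rc;
    pthread_cleanup_push(cancel_unlock_handler, m);
    rc = pthread_mutex_lock(m); /* m is already a pointer */
    if (rc != 0) {
        pthread_cleanup_pop(0);
    }
    return rc;
}
int my_pthread_mutex_unlock(pthread_mutex_t *m)
{
    pthread_cleanup_pop(0);
    return pthread_mutex_unlock(m);
}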
Now you'll need to use the my_pthread_mutex_lock/my_pthread_mutex_unlock instead of the pthread lock/unlock functions.
Now, threads don't really terminate "unexpectedly": a thread either calls pthread_exit, returns from its start function, or is cancelled with pthread_cancel. Cleanup handlers run on pthread_exit and on cancellation, so for those cases the above will suffice (also note that with deferred cancellation a thread exits only at certain cancellation points, so there are no race conditions, e.g. between pushing the cleanup handler and locking the mutex). However, a logical error or undefined behavior might leave erroneous state affecting the whole process, and then you're better off restarting the whole process.
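Here is a sketch of the cleanup-handler idea with push and pop kept inside one function, which is the form that compiles everywhere. The names guarded, unlock_on_exit and mutex_free_after_cancel are invented for this example, and the worker just sleeps in place of real work.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t guarded = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs if the thread is cancelled or calls pthread_exit. */
static void unlock_on_exit(void *m)
{
    pthread_mutex_unlock(m);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&guarded);            /* not a cancellation point */
    pthread_cleanup_push(unlock_on_exit, &guarded);
    for (;;)
        sleep(1);                            /* cancellation point */
    pthread_cleanup_pop(1);                  /* pairs with the push above */
    return NULL;
}

/* Cancels the worker and returns the result of a trylock afterwards. */
int mutex_free_after_cancel(void)
{
    pthread_t t;
    void *res;
    int rc;

    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_cancel(t);
    assert(rc == 0);
    pthread_join(t, &res);
    assert(res == PTHREAD_CANCELED);

    rc = pthread_mutex_trylock(&guarded);    /* 0 if the handler unlocked */
    if (rc == 0)
        pthread_mutex_unlock(&guarded);
    return rc;
}
```

The handler runs in the cancelled thread itself, which is why the unlock is legal: the mutex is still released by its owner.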
I have come up with a workable method to deal with this situation. As I mentioned before, FreeBSD does not support robust mutexes, so that option is out. Also, once a thread has locked a mutex, it cannot be unlocked by any other means.
So what I have done to solve the problem is to abandon the mutex and place its pointer onto a list. Since the lock wrapper code uses pthread_mutex_trylock and then relinquishes the CPU if it fails, no thread can get stuck waiting for a permanently locked mutex. In the case of a robust mutex, the thread locking the mutex will be able to recover it if it gets EOWNERDEAD as the return code.
Here's some things that are defined:
/* Checks to see if we have access to robust mutexes. */
#ifndef PTHREAD_MUTEX_ROBUST
#define TSRA__ALTERNATE
#define TSRA_MAX_MUTEXABANDON (TSRA_MAX_MUTEX * 4)
#endif
/* Mutex: Mutex Data Table Datatype */
typedef struct mutex_lock_table_tag__ mutexlock_t;
struct mutex_lock_table_tag__
{
pthread_mutex_t *mutex; /* PThread Mutex */
tsra_daclbk audcallbk; /* Audit Callback Function Pointer */
tsra_daclbk reicallbk; /* Reinit Callback Function Pointer */
int acbkstat; /* Audit Callback Status */
int rcbkstat; /* Reinit Callback Status */
pthread_t owner; /* Owner TID */
#ifdef TSRA__OVERRIDE
tsra_clnup_t *cleanup; /* PThread Cleanup */
#endif
};
/* ******** ******** Global Variables */
pthread_rwlock_t tab_lock; /* RW lock for mutex table */
pthread_mutexattr_t mtx_attrib; /* Mutex attributes */
mutexlock_t *mutex_table; /* Mutex Table */
int tabsizeentry; /* Table Size (Entries) */
int tabsizebyte; /* Table Size (Bytes) */
int initialized = 0; /* Modules Initialized 0=no, 1=yes */
#ifdef TSRA__ALTERNATE
pthread_mutex_t *mutex_abandon[TSRA_MAX_MUTEXABANDON];
pthread_mutex_t mtx_abandon; /* Abandoned Mutex Lock */
int mtx_abandon_count; /* Abandoned Mutex Count */
int mtx_abandon_init = 0; /* Initialization Flag */
#endif
pthread_mutex_t mtx_recover; /* Mutex Recovery Lock */
And here's some code for the lock recovery:
/* Attempts to recover a broken mutex. */
int tsra_mutex_recover(int lockid, pthread_t tid)
{
int result;
/* Check Prerequisites */
if (initialized == 0) return(EDOOFUS);
if (lockid < 0 || lockid >= tabsizeentry) return(EINVAL);
/* Check Mutex Owner */
result = pthread_equal(tid, mutex_table[lockid].owner);
if (result != 0) return(0);
/* Lock Recovery Mutex */
result = pthread_mutex_lock(&mtx_recover);
if (result != 0) return(result);
/* Check Mutex Owner, Again */
result = pthread_equal(tid, mutex_table[lockid].owner);
if (result != 0)
{
pthread_mutex_unlock(&mtx_recover);
return(0);
}
/* Unless the system supports robust mutexes, there is
really no way to recover a mutex that is being held
by a thread that has terminated. At least in FreeBSD,
trying to destroy a mutex that is held will result
in EBUSY. Trying to overwrite a held mutex results
in a memory fault and core dump. The only way to
recover is to abandon the mutex and create a new one. */
#ifdef TSRA__ALTERNATE /* Abandon Mutex */
pthread_mutex_t *ptr;
/* Too many abandoned mutexes? */
if (mtx_abandon_count >= TSRA_MAX_MUTEXABANDON)
{
result = TSRA_PROGRAM_ABORT;
goto error_1;
}
/* Get a read lock on the mutex table. */
result = pthread_rwlock_rdlock(&tab_lock);
if (result != 0) goto error_1;
/* Perform associated data audit. */
if (mutex_table[lockid].acbkstat != 0)
{
result = mutex_table[lockid].audcallbk();
if (result != 0)
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
}
/* Allocate New Mutex */
ptr = malloc(sizeof(pthread_mutex_t));
if (ptr == NULL)
{
result = errno;
goto error_2;
}
/* Init new mutex and abandon the old one. */
result = pthread_mutex_init(ptr, &mtx_attrib);
if (result != 0) goto error_3;
mutex_abandon[mtx_abandon_count] = mutex_table[lockid].mutex;
mtx_abandon_count++;
mutex_table[lockid].mutex = ptr;
/* Release the read lock taken on the mutex table above. */
pthread_rwlock_unlock(&tab_lock);
#else /* Recover Mutex */
/* Try locking the mutex and see what we get. */
result = pthread_mutex_trylock(mutex_table[lockid].mutex);
switch (result)
{
case 0: /* No error, unlock and return */
pthread_mutex_unlock(mutex_table[lockid].mutex);
pthread_mutex_unlock(&mtx_recover); /* Don't leak the recovery lock */
return(0);
break;
case EBUSY: /* No error, return */
pthread_mutex_unlock(&mtx_recover); /* Don't leak the recovery lock */
return(0);
break;
case EOWNERDEAD: /* Error, try to recover mutex. */
if (mutex_table[lockid].acbkstat != 0)
{
result = mutex_table[lockid].audcallbk();
if (result != 0)
{
if (mutex_table[lockid].rcbkstat != 0)
{
result = mutex_table[lockid].reicallbk();
if (result != 0)
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
}
else
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
}
}
else
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
break;
case EDEADLK: /* Error, deadlock avoided, abort */
case ENOTRECOVERABLE: /* Error, recovery failed, abort */
/* NOTE: We shouldn't get this, but if we do... */
abort();
break;
default:
/* Ambiguous situation, best to abort. */
abort();
break;
}
pthread_mutex_consistent(mutex_table[lockid].mutex);
pthread_mutex_unlock(mutex_table[lockid].mutex);
#endif
/* Housekeeping */
mutex_table[lockid].owner = pthread_self();
pthread_mutex_unlock(&mtx_recover);
/* Return */
return(0);
/* We only get here on errors. */
#ifdef TSRA__ALTERNATE
error_3:
free(ptr);
error_2:
pthread_rwlock_unlock(&tab_lock);
#else
error_2:
pthread_mutex_unlock(mutex_table[lockid].mutex);
#endif
error_1:
pthread_mutex_unlock(&mtx_recover);
return(result);
}
Because FreeBSD, like Linux, is an evolving operating system, I have made provisions for using robust mutexes in the future. Without robust mutexes, there is really no way to do the enhanced error checking that becomes available when they are supported.
For a robust mutex, enhanced error checking is performed to verify that the mutex really needs to be recovered. On systems that do not support robust mutexes, we have to trust the caller to have verified that the mutex in question needs to be recovered. In addition, there is some checking to make sure that only one thread performs the recovery. All other threads waiting on the mutex stay blocked in the meantime. I have given some thought to how to signal other threads that a recovery is in progress, but that aspect of the routine still needs work. For the recovery case, I'm thinking about comparing pointer values to see whether the mutex was replaced.
In both cases, an audit routine can be set as a callback function. The purpose of the audit routine is to verify and correct any discrepancies in the protected data. If the audit fails to correct the data, then another callback routine, the data reinitialize routine, is invoked. Its purpose is to reinitialize the data that is protected by the mutex. If that fails too, then abort() is called to terminate program execution and drop a core file for debugging purposes.
For the abandoned mutex case, the pointer is not thrown away, but is placed on a list. If too many mutexes are abandoned, then the program is aborted. As mentioned above, in the mutex lock routine, pthread_mutex_trylock is used instead of pthread_mutex_lock. This way, no thread can be permanently blocked on a dead mutex. So once the pointer is switched in the mutex table to point to the new mutex, all threads waiting on the mutex will immediately switch to the new mutex.
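The lock routine itself is not shown here, so the following is only a sketch of the idea as I read it, not the actual implementation. It polls with `pthread_mutex_trylock` and re-reads the mutex pointer on every attempt, so a waiter notices when the pointer has been switched. `struct slot`, `slot_lock`, the atomic pointer and the 1 ms back-off are assumptions made for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical table slot: a recovery routine may swap the pointer. */
struct slot {
    _Atomic(pthread_mutex_t *) mutex;
};

/* Poll with trylock, re-reading the pointer each time so that a swap
   is noticed. Returns the mutex that was locked, or NULL on error. */
static pthread_mutex_t *slot_lock(struct slot *s)
{
    const struct timespec backoff = { 0, 1000000 }; /* 1 ms, arbitrary */

    for (;;) {
        pthread_mutex_t *m = atomic_load(&s->mutex);
        int rc = pthread_mutex_trylock(m);
        if (rc == 0) {
            /* Swapped between the load and the lock? Retry on the new one. */
            if (m == atomic_load(&s->mutex)) return m;
            pthread_mutex_unlock(m);
            continue;
        }
        if (rc != EBUSY) return NULL;
        nanosleep(&backoff, NULL);
    }
}

/* Waiter thread: takes the lock, releases it, reports which mutex it got. */
static void *waiter(void *arg)
{
    pthread_mutex_t *m = slot_lock(arg);
    if (m != NULL) pthread_mutex_unlock(m);
    return m;
}

/* Returns 0 if the waiter ended up on the replacement mutex. */
static int demo_swap(void)
{
    pthread_mutex_t dead, fresh;
    struct slot s;
    pthread_t t;
    void *got = NULL;

    pthread_mutex_init(&dead, NULL);
    pthread_mutex_init(&fresh, NULL);
    atomic_init(&s.mutex, &dead);
    pthread_mutex_lock(&dead);       /* never unlocked: stands in for a dead owner */
    if (pthread_create(&t, NULL, waiter, &s) != 0) return -1;
    atomic_store(&s.mutex, &fresh);  /* what the recovery routine would do */
    pthread_join(t, &got);
    pthread_mutex_destroy(&fresh);
    /* 'dead' is deliberately left alone, like an abandoned mutex. */
    return got == &fresh ? 0 : -1;
}
```

This also shows why the old pointer has to stay on a list instead of being freed: a waiter may still call `pthread_mutex_trylock` on it once more before it sees the new pointer.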
I am sure there are bugs/errors in this code, but this is a work in progress. Although not quite finished and debugged, I feel that there is enough here to warrant an answer to this question.
Well, as you are probably aware, a thread that locks a mutex has sole ownership of that resource, so only that thread has the right to unlock it. There is no way, at least so far, to force a thread to give up its resource without a roundabout approach like the one in your code.
However, this would be my approach.
Have a single thread, called the resource thread, that owns the mutex. Make sure that this thread receives and responds to events from the other worker threads.
When a worker thread wants to enter the critical section, it registers with the resource thread, which locks the mutex on its behalf. Once that is done, the worker thread can assume it has exclusive access to the critical section. The assumption is valid because any other worker thread that needs access to the critical section has to go through the same step.
Now assume there is another thread that wants to force the first worker thread to unlock. It can make a special call, maybe with a flag or with a higher priority, to be granted access. The resource thread, after checking the flag or priority of the requesting thread, will unlock the mutex and lock it again on behalf of the requesting thread.
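As a rough illustration of the bookkeeping involved, here is a simplified sketch. Note that it swaps the dedicated resource thread and its event handling for a small monitor (a mutex plus a condition variable) that workers call directly, just to keep the example short. All names (`struct arbiter`, `arb_acquire`, the `force` flag) are hypothetical.

```c
#include <pthread.h>

/* Hypothetical arbiter: records which worker currently holds the grant. */
struct arbiter {
    pthread_mutex_t mtx;    /* protects owner; only held briefly */
    pthread_cond_t  freed;  /* signalled when the grant is released */
    int owner;              /* worker id, or -1 when free */
};

static void arb_init(struct arbiter *a)
{
    pthread_mutex_init(&a->mtx, NULL);
    pthread_cond_init(&a->freed, NULL);
    a->owner = -1;
}

/* Request the grant. With force != 0 the current holder is overridden
   instead of waited for. */
static void arb_acquire(struct arbiter *a, int id, int force)
{
    pthread_mutex_lock(&a->mtx);
    while (a->owner != -1 && !force)
        pthread_cond_wait(&a->freed, &a->mtx);
    a->owner = id;
    pthread_mutex_unlock(&a->mtx);
}

/* A worker should check this before it touches the shared data. */
static int arb_holds(struct arbiter *a, int id)
{
    int held;
    pthread_mutex_lock(&a->mtx);
    held = (a->owner == id);
    pthread_mutex_unlock(&a->mtx);
    return held;
}

/* Release the grant; a no-op if it was already taken away. */
static void arb_release(struct arbiter *a, int id)
{
    pthread_mutex_lock(&a->mtx);
    if (a->owner == id) {
        a->owner = -1;
        pthread_cond_broadcast(&a->freed);
    }
    pthread_mutex_unlock(&a->mtx);
}
```

A worker would call `arb_acquire(&a, id, 0)`, check `arb_holds(&a, id)` before each update, and finish with `arb_release(&a, id)`. Keep in mind that the override is cooperative: it does not stop the old holder in the middle of an update, it only takes effect the next time that worker checks `arb_holds`.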
I don't know your use case fully, so this is just my 2 cents. If you like it, don't forget to vote for my answer.
You could restart just the process containing the crashed thread, using a function from the exec family to replace the process image. I assume that it will be faster to reload the process than to reboot the server.