how to unlock and destroy a mutex atomically - c

I'm reading stevens's book: apue. 2e. I have encountered a problem, about Mutex.
Let's see a peice of code first: (the following code comes from figure 11.10 of apue 2e)
#include <stdlib.h>
#include <pthread.h>
struct foo {
int f_count;
pthread_mutex_t f_lock;
/* ... more stuff here ... */
};
struct foo *
foo_alloc(void) /* allocate the object */
{
struct foo *fp;
if((fp = malloc(sizeof(struct foo))) != NULL){
fp->f_count = 1;
if(pthread_mutex_init(&fp->f_lock, NULL) != 0){
free(fp);
return NULL;
}
/* ... continue initialization ... */
}
return (fp);
}
void
foo_hold(struct foo *fp) /* add a reference to the object */
{
pthread_mutex_lock(&fp->f_lock);
fp->f_count++;
pthread_mutex_unlock(&fp->f_lock);
}
void
foo_rele(struct foo *fp) /* release a reference to the object */
{
pthread_mutex_lock(&fp->f_lock);
if(--fp->f_count == 0){ /* last reference */
pthread_mutex_unlock(&fp->f_lock); /* step 1 */
pthread_mutex_destroy(&fp->f_lock); /* step 2 */
free(fp);
}else{
pthread_mutex_unlock(&fp->f_lock);
}
}
Assuming we have two threads: thread1 and thread2. thread1 is running right now.
it calls function foo_rele and has finished the execution of step 1(see above),
and prepare to execute step2.
however, context switch occurs at this moment. and thread2 starts to execute.
thread2 locks the lock fp->f_lock then does something. but before thread2 unlocks the lock, context switch occurs again.
thread2 stops executing and thread1 starts to execute. thread1 destroy the lock and it's known to us all that error generates.
so, my problem is: Is the situation mentioned above possible to happen?
how can we avoid it and is there any interface(API) that can unlock and destroy a mutex atomically?

I agree with usr (+1), the design is wrong.
Generally speaking, before one thread can destroy anything it must establish that all other threads have finished using it. That requires some other form of inter-thread synchronisation. What you need is a way in which thread 1 can tell thread 2 that the resource needs to be freed, and for thread 2 to acknowledge that so that thread 1 can call foo_rele() knowing for sure that thread 2 will not try to use fp ever again. Or, for thread 2 telling thread 1 that it no longer needs fp and thread 1 calling foo_rele() as a consequence.
This can be accomplished several ways. One is for thread 1 to set a flag under the protection of the lock, and then wait for thread 2 to quit. Meanwhile, thread 2 eventually gets round to seeing the flag, and then quits. This releases thread 1 which then calls foo_rele(). Another way is message passing over pipes (my preferred means of inter-thread synchronisation); something along the lines of thread 1 -> thread 2, "Please stop using fp": thread 2 -> thread 1, "Ok" (though of course a better defined message set based on enums is advisable, but you get my meaning).
What you can't ever have is thread 1 calling foo_rele() without thread 2 either knowing that it must never touch fp ever again or thread 2 having already quit.

This is not well-designed because thread2 accesses an object which might have been destroyed already. Add a reference (increment f_count) before starting thread2. That way thread2 already starts out with the guarantee that the object is stable while it is running.

Related

Understanding pthreads locks and condition variables

I had an exercise about threads, locks, and condition variables in C. I needed to write a program that get data, turn it into a linked list, starting 3 threads each calculating result for each node in the list and main thread printing the results after evreyone finished.
This is the main function:
int thread_finished_count;
// Lock and Conditional variable
pthread_mutex_t list_lock;
pthread_mutex_t thread_lock;
pthread_cond_t thread_cv;
int main(int argc, char const *argv[])
{
node *list;
int pairs_count, status;
thread_finished_count = 0;
/* get the data and start the threads */
node *head = create_numbers(argc, argv, &pairs_count);
list = head; // backup head for results
pthread_t *threads = start_threads(&list);
/* wait for threads and destroy lock */
status = pthread_cond_wait(&thread_cv, &list_lock);
chcek_status(status);
status = pthread_mutex_destroy(&list_lock);
chcek_status(status);
status = pthread_mutex_destroy(&thread_lock);
chcek_status(status);
/* print result in original list */
print_results(head);
/* cleanup */
wait_for_threads(threads, NUM_THREADS);
free_list(head);
free(threads);
return EXIT_SUCCESS;
}
Please note that the create_numbers function is working properly, and the list is working as intended.
Here is the start_thread and thread_function code:
pthread_t *start_threads(node **list)
{
int status;
pthread_t *threads = (pthread_t *)malloc(sizeof(pthread_t) * NUM_THREADS);
check_malloc(threads);
for (int i = 0; i < NUM_THREADS; i++)
{
status = pthread_create(&threads[i], NULL, thread_function, list);
chcek_status(status);
}
return threads;
}
void *thread_function(node **list)
{
int status, self_id = pthread_self();
printf("im in %u\n", self_id);
node *currentNode;
while (1)
{
if (!(*list))
break;
status = pthread_mutex_lock(&list_lock);
chcek_status(status);
printf("list location %p thread %u\n", *list, self_id);
if (!(*list))
{
status = pthread_mutex_unlock(&list_lock);
chcek_status(status);
break;
}
currentNode = (*list);
(*list) = (*list)->next;
status = pthread_mutex_unlock(&list_lock);
chcek_status(status);
currentNode->gcd = gcd(currentNode->num1, currentNode->num2);
status = usleep(10);
chcek_status(status);
}
status = pthread_mutex_lock(&thread_lock);
chcek_status(status);
thread_finished_count++;
status = pthread_mutex_unlock(&thread_lock);
chcek_status(status);
if (thread_finished_count != 3)
return NULL;
status = pthread_cond_signal(&thread_cv);
chcek_status(status);
return NULL;
}
void chcek_status(int status)
{
if (status != 0)
{
fputs("pthread_function() error\n", stderr);
exit(EXIT_FAILURE);
}
}
Note that self_id is used for debugging purposes.
My problem
My main problem is about splitting the work. Each thread so get an element from the global linked list, calculate gcd, then go on and take the next element. I get this effect only if I'm adding usleep(10) after I unlock the mutex in the while loop. If I don't add the usleep, the FIRST thread will go in and do all the work while other thread just wait and come in after all the work has been done.
Please note!: I thought about the option that maybe the first thread is created, and untill the second thread is created the first one already finish all the jobs. This is why I added the "I'm in #threadID" check with usleep(10) when evrey thread is created. They all come in but only first one is doing all the jobs.
Here is the output example if I do usleep after the mutex unlock (notice diffrent thread ID)
with usleep
./v2 nums.txt
im in 1333593856
list location 0x7fffc4fb56a0 thread 1333593856
im in 1316685568
im in 1325139712
list location 0x7fffc4fb56c0 thread 1333593856
list location 0x7fffc4fb56e0 thread 1316685568
list location 0x7fffc4fb5700 thread 1325139712
list location 0x7fffc4fb5720 thread 1333593856
list location 0x7fffc4fb5740 thread 1316685568
list location 0x7fffc4fb5760 thread 1325139712
list location 0x7fffc4fb5780 thread 1333593856
list location 0x7fffc4fb57a0 thread 1316685568
list location 0x7fffc4fb57c0 thread 1325139712
list location 0x7fffc4fb57e0 thread 1333593856
list location 0x7fffc4fb5800 thread 1316685568
list location (nil) thread 1325139712
list location (nil) thread 1333593856
...
normal result output
...
And thats the output if I comment out the usleep after mutex lock (Notice the same thread ID)
without usleep
./v2 nums.txt
im in 2631730944
list location 0x7fffe5b946a0 thread 2631730944
list location 0x7fffe5b946c0 thread 2631730944
list location 0x7fffe5b946e0 thread 2631730944
list location 0x7fffe5b94700 thread 2631730944
list location 0x7fffe5b94720 thread 2631730944
list location 0x7fffe5b94740 thread 2631730944
list location 0x7fffe5b94760 thread 2631730944
list location 0x7fffe5b94780 thread 2631730944
list location 0x7fffe5b947a0 thread 2631730944
list location 0x7fffe5b947c0 thread 2631730944
list location 0x7fffe5b947e0 thread 2631730944
list location 0x7fffe5b94800 thread 2631730944
im in 2623276800
im in 2614822656
...
normal result output
...
My second question is about order of threads working. My exercise ask me to not use join to synchronize the threads (only use at the end to "free resources") but instand use that condition variable.
My goal is that each thread will take element, do the calculation and in the meanwhile another thread will go in and take another element, and new thread will take each element (or at least close to that)
Thanks for reading and I appriciate your help.
First, you are doing the gcd() work while holding the lock... so (a) only one thread will do any work at any one time, though (b) that does not entirely explain why only one thread appears to do (nearly) all the work -- as KamilCuk says, it may be that there is so little work to do, that it's (nearly) all done before the second thread wakes up properly. [More exotic, there may be some latency between thread 'a' unlocking the mutex and another thread starting to run, such that thread 'a' can acquire the mutex before another thread gets there.]
POSIX says that when a mutex is unlocked, if there are waiters then "the scheduling policy shall determine which thread shall acquire the mutex". The default "scheduling policy" is (to the best of my knowledge) implementation defined.
You could try a couple of things: (1) use a pthread_barrier_t to hold all the threads at the start of thread_function() until they are all running; (2) use sched_yield(void) after pthread_mutex_unlock() to prompt the system into running the newly runnable thread.
Second, you should not under any circumstance treat a 'condition variable' as a signal. For main() to know that all threads have finished you need a count -- which could be a pthread_barrier_t; or it could be simple integer, protected by a mutex, with a 'condition variable' to hold the main thread on while it waits; or it could be a count (in main()) and a semaphore (posted once by each thread as it exits).
Third, you show pthread_cond_wait(&cv, &lock); in main(). At that point main() must own lock... and it matters when that happened. But: as it stands, the first thread to find the list empty will kick cv, and main() will proceed, even though other threads are still running. Though once main() does re-acquire lock, any threads which are then still running will either be exiting or will be stuck on the lock. (It's a mess.)
In general, the template for using a 'condition variable' is:
pthread_mutex_lock(&...lock) ;
while (!(... thing we need ...))
pthread_cond_wait(&...cond_var, &...lock) ;
... do stuff now we have what we need ....
pthread_mutex_unlock(&...lock) ;
NB: a 'condition variable' does not have a value... despite the name, it is not a flag to signal that some condition is true. A 'condition variable' is, essentially, a queue of threads waiting to be re-started. When a 'condition variable' is signaled, at least one waiting thread will be re-started -- but if there are no threads waiting, nothing happens, in particular the (so called) 'condition variable' retains no memory of the signal.
In the new code, following the above template, main() should:
/* wait for threads .... */
status = pthread_mutex_lock(&thread_lock);
chcek_status(status);
while (thread_finished_count != 3)
{
pthread_cond_wait(&thread_cv, &thread_lock) ;
chcek_status(status);
} ;
status = pthread_mutex_unlock(&thread_lock) ;
chcek_status(status);
So what is going on here ?
main() is waiting for thread_finished_count == 3
thread_finished_count is a shared variable "protected" by the thread_lock mutex.
...so it is incremented in thread_function() under the mutex.
...and main() must also read it under the mutex.
if main() finds thread_finished_count != 3 it must wait.
to do that it does: pthread_cond_wait(&thread_cv, &thread_lock), which:
unlocks thread_lock
places the thread on the thread_cv queue of waiting threads.
and it does those atomically.
when thread_function() does the pthread_cond_signal(&thread_cv) it wakes up the waiting thread.
when the main() thread wakes up, it will first re-acquire the thread_lock...
...so it can then proceed to re-read thread_finished_count, to see if it is now 3.
FWIW: I recommend not destroying the mutexes etc until after all the threads have been joined.
I have looked deeper into how glibc (v2.30 on Linux & x86_64, at least) implements pthread_mutex_lock() and _unlock().
It turns out that _lock() works something like this:
if (atomic_cmp_xchg(mutex->lock, 0, 1))
return <OK> ; // mutex->lock was 0, is now 1
while (1)
{
if (atomic_xchg(mutex->lock, 2) == 0)
return <OK> ; // mutex->lock was 0, is now 2
...do FUTEX_WAIT(2)... // suspend thread iff mutex->lock == 2...
} ;
And _unlock() works something like this:
if (atomic_xchg(mutex->lock, 0) == 2) // set mutex->lock == 0
...do FUTEX_WAKE(1)... // if may have waiter(s) start 1
Now:
mutex->lock: 0 => unlocked, 1 => locked-but-no-waiters, 2 => locked-with-waiter(s)
'locked-but-no-waiters' optimizes for the case where there is no lock contention and there is no need to do FUTEX_WAKE in _unlock().
the _lock()/_unlock() functions are in the library -- they are not in the kernel.
...in particular, the ownership of the mutex is a matter for the library, not the kernel.
FUTEX_WAIT(2) is a call to the kernel, which will place the thread on a pending queue associated with the mutex, unless mutex->lock != 2.
The kernel checks for mutex->lock == 2 and adds the thread to the queue atomically. This deals with the case of _unlock() being called after the atomic_xchg(mutex->lock, 2).
FUTEX_WAKE(1) is also a call to the kernel, and the futex man page tells us:
FUTEX_WAKE (since Linux 2.6.0)
This operation wakes at most 'val' of the waiters that are waiting ... No guarantee is provided about which waiters are awoken (e.g., a waiter with a higher scheduling priority is not guaranteed to be awoken in preference to a waiter with a lower priority).
where 'val' in this case is 1.
Although the documentation says "no guarantee about which waiters are awoken", the queue appears to be at least FIFO.
Note especially that:
_unlock() does not pass the mutex to the thread started by the FUTEX_WAKE.
once woken up, the thread will again try to obtain the lock...
...but may be beaten to it by any other running thread -- including the thread which just did the _unlock().
I believe this is why you have not seen the work being shared across the threads. There is so little work for each one to do, that a thread can unlock the mutex, do the work and be back to lock the mutex again before a thread woken up by the unlock can get going and succeed in locking the mutex.

How to be notified when a thread has been terminated for some error

I am working on a program with a fixed number of threads in C using posix threads.
How can i be notified when a thread has been terminated due to some error?
Is there a signal to detect it?
If so, can the signal handler create a new thread to keep the number of threads the same?
Make the threads detached
Get them to handle errors gracefully. i.e. Close mutexs, files etc...
Then you will have no probolems.
Perhaps fire a USR1 signal to the main thread to tell it that things have gone pear shaped (i was going to say tits up!)
Create your threads by passing the function pointers to an intermediate function. Start that intermediate function asynchronously and have it synchronously call the passed function. When the function returns or throws an exception, you can handle the results in any way you like.
With the latest inputs you've provided, I suggest you do something like this to get the number of threads a particular process has started-
#include<stdio.h>
#define THRESHOLD 50
int main ()
{
unsigned count = 0;
FILE *a;
a = popen ("ps H `ps -A | grep a.out | awk '{print $1}'` | wc -l", "r");
if (a == NULL)
printf ("Error in executing command\n");
fscanf(a, "%d", &count );
if (count < THRESHOLD)
{
printf("Number of threads = %d\n", count-1);
// count - 1 in order to eliminate header.
// count - 2 if you don't want to include the main thread
/* Take action. May be start a new thread etc */
}
return 0;
}
Notes:
ps H displays all threads.
$1 prints first column where PID is displayed on my system Ubuntu. The column number might change depending on the system
Replace a.out it with your process name
The backticks will evaluate the expression within them and give you the PID of your process. We are taking advantage of the fact that all POSIX threads will have same PID.
I doubt Linux would signal you when a thread dies or exits for any reason. You can do so manually though.
First, let's consider 2 ways for the thread to end:
It terminates itself
It dies
In the first method, the thread itself can tell someone (say the thread manager) that it is being terminated. The thread manager will then spawn another thread.
In the second method, a watchdog thread can keep track of whether the threads are alive or not. This is done more or less like this:
Thread:
while (do stuff)
this_thread->is_alive = true
work
Watchdog:
for all threads t
t->timeout = 0
while (true)
for all threads t
if t->is_alive
t->timeout = 0
t->is_alive = false
else
++t->timeout
if t->timeout > THRESHOLD
Thread has died! Tell the thread manager to respawn it
If for any reason one could not go for Ed Heal's "just work properly"-approach (which is my favorite answer to the OP's question, btw), the lazy fox might take a look at the pthread_cleanup_push() and pthread_cleanup_pop() macros, and think about including the whole thread function's body in between such two macros.
The clean way to know whether a thread is done is to call pthread_join() against that thread.
// int pthread_join(pthread_t thread, void **retval);
int retval = 0;
int r = pthread_join(that_thread_id, &retval);
... here you know that_thread_id returned ...
The problem with pthread_join() is, if the thread never returns (continues to run as expected) then you are blocked. That's therefore not very useful in your case.
However, you may actually check whether you can join (tryjoin) as follow:
//int pthread_tryjoin_np(pthread_t thread, void **retval);
int retval = 0;
int r = pthread_tryjoin_np(that_thread_id, &relval);
// here 'r' tells you whether the thread returned (joined) or not.
if(r == 0)
{
// that_thread_id is done, create new thread here
...
}
else if(errno != EBUSY)
{
// react to "weird" errors... (maybe a perror() at least?)
}
// else -- thread is still running
There is also a timed join which will wait for the amount of time you specified, like a few seconds. Depending on the number of threads to check and if your main process just sits around otherwise, it could be a solution. Block on thread 1 for 5 seconds, then thread 2 for 5 seconds, etc. which would be 5,000 seconds per loop for 1,000 threads (about 85 minutes to go around all threads with the time it takes to manage things...)
There is a sample code in the man page which shows how to use the pthread_timedjoin_np() function. All you would have to do is put a for loop around to check each one of your thread.
struct timespec ts;
int s;
...
if (clock_gettime(CLOCK_REALTIME, &ts) == -1) {
/* Handle error */
}
ts.tv_sec += 5;
s = pthread_timedjoin_np(thread, NULL, &ts);
if (s != 0) {
/* Handle error */
}
If your main process has other things to do, I would suggest you do not use the timed version and just go through all the threads as fast as you can.

How can I kill a pthread that is in an infinite loop, from outside that loop?

I create a thread and I put it into an infinite loop. I get memory leaks when checking the code with valgrind. Here is my code:
#include <pthread.h>
#include <time.h>
void thread_do(void){
while(1){}
}
int main(){
pthread_t th;
pthread_create(&th, NULL, (void *)thread_do, NULL);
sleep(2);
/* I want to kill thread here */
sleep(2);
return 0;
}
So a thread is created in main and just runs thread_do() all the time. Is there a way to kill it from inside main after 2 seconds? I have tried both pthread_detach(th) and pthread_cancel(th) but I still get leaks.
As #sarnold pointed out, by default your thread can't be cancelled with pthread_cancel() without calling any functions that are cancellation points... but this can be changed by using pthread_setcanceltype() to set the thread's cancellation type to asynchronous instead of deferred. To do that, you'd add something like pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS,NULL); near the start of your thread function, before you start the loop. You would then be able to terminate the thread by calling pthread_cancel(th) from main().
Note, though, that cancelling threads this way (whether asynchronous or not) doesn't clean up any resources allocated in the thread function (as noted by Kevin in a comment). In order to do this cleanly, you can:
Ensure that the thread doesn't do anything it needs to clean up before exit (e.g. using malloc() to allocate a buffer)
Ensure that you have some way of cleaning up after the thread elsewhere, after the thread exits
Use pthread_cleanup_push() and pthread_cleanup_pop() to add cleanup handlers to clean up resources when the thread is cancelled. Note that this is still risky if the cancellation type is asynchronous, because the thread could be cancelled between allocating a resource and adding the cleanup handler.
Avoid using pthread_cancel() and have the thread check some condition to determine when to terminate (which would be checked in long-running loops). Since your thread then checks for termination itself, it can do whatever cleanup it needs to after the check.
One way of implementing the last option is to use a mutex as a flag, and test it with pthread_mutex_trylock() wrapped in a function to use in the loop tests:
#include <pthread.h>
#include <unistd.h>
#include <errno.h>
/* Returns 1 (true) if the mutex is unlocked, which is the
* thread's signal to terminate.
*/
int needQuit(pthread_mutex_t *mtx)
{
switch(pthread_mutex_trylock(mtx)) {
case 0: /* if we got the lock, unlock and return 1 (true) */
pthread_mutex_unlock(mtx);
return 1;
case EBUSY: /* return 0 (false) if the mutex was locked */
return 0;
}
return 1;
}
/* Thread function, containing a loop that's infinite except that it checks for
* termination with needQuit()
*/
void *thread_do(void *arg)
{
pthread_mutex_t *mx = arg;
while( !needQuit(mx) ) {}
return NULL;
}
int main(int argc, char *argv[])
{
pthread_t th;
pthread_mutex_t mxq; /* mutex used as quit flag */
/* init and lock the mutex before creating the thread. As long as the
mutex stays locked, the thread should keep running. A pointer to the
mutex is passed as the argument to the thread function. */
pthread_mutex_init(&mxq,NULL);
pthread_mutex_lock(&mxq);
pthread_create(&th,NULL,thread_do,&mxq);
sleep(2);
/* unlock mxq to tell the thread to terminate, then join the thread */
pthread_mutex_unlock(&mxq);
pthread_join(th,NULL);
sleep(2);
return 0;
}
If the thread is not detached (it generally isn't by default), you should call pthread_join() after stopping the thread. If the thread is detached, you don't need to join it, but you won't know exactly when it terminates (or even approximately, unless you add another way to indicate its exit).
A few small thoughts:
You're trying to cancel your thread, but if the cancellation policy in place is for a deferred cancellation, your thread_do() will never be canceled, because it never calls any functions that are cancellation points:
A thread's cancellation type, determined by
pthread_setcanceltype(3), may be either asynchronous or
deferred (the default for new threads). Asynchronous
cancelability means that the thread can be canceled at any
time (usually immediately, but the system does not guarantee
this). Deferred cancelability means that cancellation will
be delayed until the thread next calls a function that is a
cancellation point. A list of functions that are or may be
cancellation points is provided in pthreads(7).
You're not joining the thread in your simple example code; call pthread_join(3) before the end of your program:
After a canceled thread has terminated, a join with that
thread using pthread_join(3) obtains PTHREAD_CANCELED as the
thread's exit status. (Joining with a thread is the only way
to know that cancellation has completed.)

Conditional wait with pthreads

I seem to be running in to a possible deadlock with a pthreads conditional variable.
Here is the code
thread function(){
for (condition){
do work
/* should the thread continue? */
if (exit == 1){
break; /* exit for */
}
} /* end for */
pthread_mutex_lock(&mtxExit);
exit = 0;
pthread_cond_signal(&condVar);
pthread_mutex_unlock(&mtxExit);
}
The main function is as follows:
function main(){
if (thread is still active){
pthread_mutex_lock(&mtxExit);
exit = 1;
pthread_mutex_unlock(&mtxExit);
} /* end if */
while (exit == 1){
pthread_mutex_lock(&mtxExit);
/* check again */
if (exit == 1)
pthread_cond_wait(&condVar, &mtxExit);
pthread_mutex_unlock(&mtxExit);
}
create new thread()
....
}
The code is always getting stuck at cond_wait. :(
EDIT:
Let me add some clarification to the thread to explain what I am doing.
At any given time, I need only one thread running. I have a function that starts the thread, tells it what to do and the main thread continues it work.
The next time the main thread decides it needs to spawn another thread, it has to make sure the thread that was previously started has exited. I cannot have two threads alive at the same time as they will interfere with each other. This is by design and by definition of the problem I am working on.
That is where I am running in to problems.
This is my approach:
Start the thread, let it do its job.
the thread checks in every step of its job to see if it is still relevant. This is where "exit" comes in to picture. The main thread sets "exit" to 1, if it needs to tell the thread that it is no longer relevant.
In most cases, the thread will exit before the main thread decides to spawn another thread. But I still need to factor in the case that the thread is still alive by the time the main thread is ready to start another one.
So the main thread sets the value of "exit" and needs to wait for the thread to exit. I dont want to use pthread_kill with 0 as signal because then main thread will be in a loop wasting CPU cycles. I need the main thread to relinquish control and sleep/wait till the thread exits.
Since I only need one thread at a time, I dont need to worry about scaling to more threads. The solution will never have more than one thread. I just need a reliable mechanism to test if my thread is still alive, if it is, signal it to exit, wait for it to exit and start the next one.
From my testing, it looks like, the main thread is still entering the conditional variable even if the thread may have exited or that the signal is not getting delivered to the main thread at all. And its waiting there forever. And is some cases, in debugger I see that the value of exit is set to 0 and still the main thread is waiting at signal. There seems to be a race condition some where.
I am not a fan of how I set up the code right now, its too messy. Its only a proof of concept right now, I will move to a better solution soon. My challenge is to reliably signal the thread to exit, wait on it to exit.
I appreciate your time.
Did you forget to initialize your condition variable?
pthread_cond_init(&condVar, NULL)
while (exit == 1) {
In the code you quote, the way you quote I do not see any particular problem. It is not clean, but it appears functional. What leads me to believe that somewhere else you are setting exit to 0 without signaling that. Or the thread is getting stuck somewhere doing the work.
But considering the comments which hint that you try to signal one thread to terminate before starting another thread, I think you are doing it wrong. Generally pthread condition signaling shouldn't be relied upon if a signal may not be missed. Though it seems that state variable exit covers that, it is still IMO wrong application of the pthread conditions.
In the case you can try to use a semaphores. While terminating, the thread increments the termination semaphore so that main can wait (decrement) the semaphore.
thread function()
{
for (condition)
{
do work
/* should the thread continue? */
if (exit == 1) {
break; /* exit for */
}
} /* end for */
sem_post(&termSema);
}
function main()
{
if (thread is still active)
{
exit = 1;
sem_wait(&termSema);
exit = 0;
}
create new thread()
....
}
As a general remark, I can suggest to look for some thread pool implementations. Because using a state variable to sync threads is still wrong and doesn't scale to more than one thread. And error prone.
When the code is stuck in pthread_cond_wait, is exit 1 or 0? If exit is 1, it should be stuck.
If exit is 0, one of two things are most likely the case:
1) Some code set exit to 0 but didn't signal the condition variable.
2) Some thread blocked on pthread_cond_wait, consumed a signal, but didn't do whatever it is you needed done.
You have all sorts of timing problems with your current implementation (hence the problems).
To ensure that the thread has finished (and its resources have been released), you should call pthread_join().
There is no need for a pthread_cond_t here.
It might also make more sense to use pthread_cancel() to notify the thread that it is no longer required, rather than a flag like you are currently doing.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *thread_func(void *arg) {
int i;
for (i = 0; i < 10; i++) {
/* protect any regions that must not be cancelled... */
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
/* very important work */
printf("%d\n", i);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
/* force a check to see if we're finished */
pthread_testcancel();
/* sleep (for clarity in the example) */
sleep(1);
}
return NULL;
}
void main(void) {
int ret;
pthread_t tid;
ret = pthread_create(&tid, NULL, thread_func, NULL);
if (ret != 0) {
printf("pthread_create() failed %d\n", ret);
exit(1);
}
sleep(5);
ret = pthread_cancel(tid);
if (ret != 0) {
printf("pthread_cancel() failed %d\n", ret);
exit(1);
}
ret = pthread_join(tid, NULL);
if (ret != 0) {
printf("pthread_join() failed %d\n", ret);
exit(1);
}
printf("finished...\n");
}
It's also worth noting:
exit() is a library function - you should not declare anything with the same name as something else.
Depending on your specific situation, it might make sense to keep a single thread alive always, and provide it with jobs to do, rather than creating / cancelling threads continuously (research 'thread pools')

How do I terminate a thread that is waiting for a semaphore operation

I am writing a program that uses shared memory and semaphores for ipc. There is one main server process that creates the shared memory and semaphores. Any number of client processes can attach to the shared memory and read and write to it when allowed. The semaphores provide the blocking mechanism to control reads and writes. Everything works fine except when a I try to terminate a client. The semaphore block to access the shared memory is in a thread and on process termination I have no way to release the semaphore block so the thread exits correctly. How would I go about this? This is for Linux.
To be specific, there is one shm and two sems. The first sem blocks writing, and the second blocks reading. When a client has something to write, it waits for the write sem to be 0, then sets it to 1, writes, then sets the read sem to 0 which releases the waiting server to read what the client wrote. once read the server sets the write sem back to 0 and the next client in line gets to write. It hangs on a semop call which releases when read sem is 0. This semop call is in a thread and I need to figure out how to exit that thread correctly before letting the main thread terminate.
Here is an example of what i want to do but isn't working (the sleep is pretending to be the hanging semop call):
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
void termination_handler (int signum) {
printf( "Got Signal\n" );
}
void *threadfunc( void *parm ) {
struct sigaction action;
action.sa_handler = termination_handler;
sigemptyset( &action.sa_mask );
action.sa_flags = 0;
sigaction( SIGUSR1, &action, NULL );
printf("Thread executing\n");
sleep( 100 ); // pretending to be the semaphore
pthread_exit( NULL );
}
int main() {
int status;
pthread_t threadid;
int thread_stat;
status = pthread_create( &threadid, NULL, threadfunc, NULL );
if ( status < 0) {
perror("pthread_create failed");
exit(1);
}
sleep( 5 );
status = pthread_kill( threadid, SIGUSR1 );
if ( status < 0 )
perror("pthread_kill failed");
status = pthread_join( threadid, (void *)&thread_stat );
if ( status < 0 )
perror("pthread_join failed");
exit( 0 );
}
He said, this is for Linux.
It would be useful if you could say exactly how are you doing it. I assume you are blocking in sem_wait or sem_timedwait. If your thread is blocking there and you want to interrupt it, you can use pthread_kill.
pthread_kill(blocking_thread_id, SIGUSR1);
Of course, you need to setup the proper signal handler (man sigaction) to catch SIGUSR1 and you need to check the return code of sem_wait() for EINTR, in which case you can do whatever you want to do knowing that you were interrupted and did not get the lock.
In the case you are using processes you would use simply kill() and not pthread_kill() providing the process id. (sorry, initially I misread and thought you were using threads)
I have two and a half answers for you. :)
First, your example code works for me (on Linux): the pthread_kill successfully EINTRupts the worker thread's sleep as expected after about five seconds, as revealed with a few printfs and remembering the return value of sleep. AFAICT, if you want to signal-interrupt a specific thread, you've done it.
Second, try SEM_UNDO. This flag may be set in the sem_flg member passed in the sembuf argument semop, and it will, as the name suggests, undo semaphore adjustments upon process termination. IIUC, when you kill a client, that client leaves a semaphore inappropriately locked. SEM_UNDO was made for just this circumstance.
Finally and respectfully, have you perhaps inverted the logic of semaphores here? As I read your question, a semval of zero indicates "resource free" and a semval of one is "resource locked" (quote: "...[a client] waits for the write sem to be 0, then sets it to 1, writes..."). However, if two or more writing clients are waiting for a SysV sem to drop to zero, they will all be released together when that occurs. That's rather an unpleasant race condition, which might at the least result in unanticipated semaphore decrements and increments.
Depending on your environment perhaps you could only try to take the semaphore with a timeout. After each timeout check if a close thread has been requested and simply give up and shutdown.
It may not be the best idea to use blocking mutexes/semaphores if operations on protected area may last too long for your purposes.
You can solve the issue by putting read and write requests to a queue (linked list, for example) and let the first in the queue to operate on protected area and remove it from the list once it enters the area.
In case of read-only op you may access other reads enter the protected area as well as long as the first operation is read only. When the first operation is write, protected area must be empty before allowing it to access.
List modifications must be protected by mutex (or something alike) but that is near constant time and you probably can pay that.
When threads are sitting in a queue, each has its private condition variable which you can use to wake up any of them. Condition variable must be protected by mutex too. You may store condition variable, mutex etc into a structure and put the into an array or list and store thread id with each so that it will be easy to find the thread you want to wake up.
Once thread wakes up, it first checks what was the reason it had to wake up. If exit flag is set, then the thread know to exit.

Resources