pthread_join() hangs in stress test - c

I am running a pthread test program until it fails. Here is the main skeleton of the code:
int authSessionListMutexUnlock()
{
int rc = 0;
int rc2 = 0;
rc2 = pthread_mutex_trylock(&mutex);
ERR_IF( rc2 != EBUSY && rc2 != 0 );
rc2 = pthread_mutex_unlock(&mutex);
ERR_IF( rc2 != 0 );
cleanup:
return rc;
}
static void cleanup_handler(void *arg)
{
int rc = 0;
(void)arg;
rc = authSessionListMutexUnlock();
if (rc != 0)
AUTH_DEBUG5("authSessionListMutexUnlock() failed\n");
}
static void *destroy_expired_sessions(void *t)
{
int rc2 = 0;
(void)t;
pthread_cleanup_push(cleanup_handler, NULL);
rc2 = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
if (rc2 != 0)
AUTH_DEBUG5("pthread_setcancelstate(): rc2 == %d\n", rc2);
rc2 = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
if (rc2 != 0)
AUTH_DEBUG5("pthread_setcanceltype(): rc2 == %d\n", rc2);
while (1)
{
... // destroy expired session
sleep(min_timeout);
}
pthread_cleanup_pop(0);
}
int authDeinit( char *path )
{
...
rc2 = authSessionListDeInit();
ERR_IF( rc2 != 0 );
rc2 = pthread_cancel(destroy_thread);
ERR_IF( rc2 != 0 );
rc2 = pthread_join(destroy_thread, &status);
ERR_IF( rc2 != 0 || (int *)status != PTHREAD_CANCELED );
...
return 0;
}
It runs well with the test program, but the test program hangs at round #53743 with pthread_join():
(gdb) bt
#0 0x40000410 in __kernel_vsyscall ()
#1 0x0094aa77 in pthread_join () from /lib/libpthread.so.0
#2 0x08085745 in authDeinit ()
at /users/qixu/src/moja/auth/src//app/libauth/authAPI.c:1562
#3 0x0807e747 in main ()
at /users/qixu/src/moja/auth/src//app/tests/test_session.c:45
It looks like pthread_join() caused a deadlock. But looking at the code, I see no reason why pthread_join() would deadlock. By the time pthread_join() runs, the only mutex operation is the one performed by the thread itself, so there should be no conflict, right? I'm really confused here...

At least one "oddity" shows in your code; your cleanup handler will always unlock the mutex even if you're not the thread holding it.
From the manual:
Calling pthread_mutex_unlock() with a mutex that the calling thread
does not hold will result in undefined behavior.
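One way to avoid unlocking a mutex the thread does not hold is to register the cleanup handler only while the lock is actually held, and to pop it when the lock is released. The following is a minimal sketch under deferred cancellation, not the asker's actual code; `list_mutex`, `unlock_handler`, `worker` and `run_demo` are hypothetical names introduced for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER; /* hypothetical */

/* Runs only while the lock is held: on cancellation, or on pop(1). */
static void unlock_handler(void *arg)
{
    assert(arg != NULL);
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&list_mutex);
        pthread_cleanup_push(unlock_handler, &list_mutex);
        /* ... work on the protected data goes here ... */
        pthread_cleanup_pop(1);  /* pop the handler and run it: unlocks */
        sleep(1);                /* cancellation point, lock not held here */
    }
    return NULL;
}

/* Returns 1 if the worker ended as PTHREAD_CANCELED and left the mutex free. */
static int run_demo(void)
{
    pthread_t tid;
    void *status = NULL;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return 0;
    /* The request stays pending until the worker reaches sleep(). */
    if (pthread_cancel(tid) != 0) return 0;
    if (pthread_join(tid, &status) != 0) return 0;
    if (status != PTHREAD_CANCELED) return 0;
    /* A leaked lock would make trylock fail with EBUSY. */
    if (pthread_mutex_trylock(&list_mutex) != 0) return 0;
    pthread_mutex_unlock(&list_mutex);
    return 1;
}
```

Calling `run_demo()` from `main` should return 1. Because the handler is pushed right after the lock and popped at the unlock, it can never run while the thread does not own the mutex.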

A bigger problem with your code, and probably the cause of the deadlocks, is your use of asynchronous cancellation mode (I missed this before). Only 3 functions in POSIX are async-cancel-safe:
pthread_cancel()
pthread_setcancelstate()
pthread_setcanceltype()
Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_05_04
You certainly cannot lock and unlock mutexes while async cancel mode is enabled.
For async cancellation to be usable, you have to do one of the following things:
Use it only with code that's purely computational, e.g. doing heavy math without any library calls at all, just arithmetic operations, or
Constantly toggle it off and back on around each library call you make.
Edit: Based on the comments, I think you have a misunderstanding of what asynchronous cancellation type means. It has nothing to do with the manner in which cleanup handlers run. It's purely a matter of what point the thread can catch the cancellation request and begin acting on it.
When the target is in deferred cancellation mode, calling pthread_cancel on it will not necessarily do anything right away, unless it's already blocked in a function (like read or select) that's a cancellation point. Instead it will just set a flag, and the next time a function which is a cancellation point is called, the thread will instead block any further cancellation attempts, run the cancellation cleanup handlers in the reverse order they were pushed, and exit with a special status indicating that the thread was cancelled.
When the target is in asynchronous cancellation mode, calling pthread_cancel on it will interrupt the thread immediately (possibly between any pair of adjacent machine code instructions). If you don't see why this is potentially dangerous, think about it for a second. Any function that has internal state (static/global variables, file descriptors or other resources being allocated/freed, etc.) could be in inconsistent state at the point of the interruption: a variable partially modified, a lock halfway obtained, a resource obtained but with no record of it having been obtained, or freed but with no record of it having been freed, etc.
At the point of the asynchronous interruption, further cancellation requests are blocked, so there's no danger of calling whatever function you like from your cleanup handlers. When the cleanup handlers finish running, the thread of course ceases to exist.
One other potential source of confusion: cleanup handlers do not run in parallel with the thread being cancelled. When cancellation is acted upon, the cancelled thread stops running the normal flow of code, and instead runs the cleanup handlers then exits.
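To make the last two points concrete, here is a small sketch in deferred mode that records the order in which cleanup handlers run and checks the exit status seen by pthread_join(). The names `record`, `target` and `cancel_and_check` are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static int order[2];       /* tags of the handlers, in the order they ran */
static int order_len = 0;

static void record(void *arg)
{
    assert(order_len < 2);
    order[order_len++] = *(int *)arg;
}

static void *target(void *unused)
{
    static int first = 1, second = 2;
    (void)unused;
    pthread_cleanup_push(record, &first);
    pthread_cleanup_push(record, &second);
    for (;;)
        sleep(1);          /* deferred mode: cancellation is acted on here */
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns 1 if the handlers ran in reverse push order (2, then 1)
   and the thread's exit status is PTHREAD_CANCELED. */
static int cancel_and_check(void)
{
    pthread_t tid;
    void *status = NULL;
    if (pthread_create(&tid, NULL, target, NULL) != 0) return 0;
    if (pthread_cancel(tid) != 0) return 0;
    if (pthread_join(tid, &status) != 0) return 0;
    return status == PTHREAD_CANCELED
        && order_len == 2 && order[0] == 2 && order[1] == 1;
}
```

The handlers execute in the cancelled thread itself, after it has stopped its normal flow, so the plain `order` array needs no locking: the main thread reads it only after pthread_join() has returned.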

Related

Why is the thread not waiting for the cancellation point in deferred mode?

Why is the thread in my program cancelled before it reaches the testcancel call? I expected the thread to be cancelled when testcancel is called, but it is cancelled immediately after the cancel state changes to enabled.
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
int i = 0;
/* Thread start routines must have the signature void *(*)(void *). */
void *proc1(void *arg)
{
    (void)arg;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    for (i = 0; i < 7; i++)
    {
        if (i == 3) {
            pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
            pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
        }
        if (i == 5) {
            pthread_testcancel();
        }
        printf("I'm still running! %d\n", i);
        sleep(1);
    }
    return NULL;
}
int main(void)
{
    pthread_t thread;
    pthread_create(&thread, NULL, proc1, NULL);
    sleep(1);
    printf("Requested to cancel the thread\n");
    pthread_cancel(thread);
    pthread_join(thread, NULL);
    printf("The thread is stopped\n");
    return 0;
}
Result:
I tried running it without printf (since printf may be a cancellation point too), but that didn't solve the problem.
I expected the thread to be cancelled when testcancel is called,
This expectation is not correct.
From the pthread_cancel spec:
Deferred cancelability means that cancellation will be delayed until the thread next calls a function that is a cancellation point.
There is also a link included to check what a cancellation point is:
The following functions are required to be cancellation points by POSIX.1-2001 and/or POSIX.1-2008:
...
pthread_testcancel()
...
sleep()
...
Each of them will make your thread respond to cancellation.
This also means that the following assumption is not fully correct:
but it is cancelled immediately after the cancel state changes to enabled.
Instead your thread is cancelled as soon as it calls sleep in the same iteration when it sets cancel state to enabled. (BTW: Cancel type is deferred by default)
You seem to expect that the thread only checks whether it is cancelled, when it actively queries for cancel state. I don't think this can be done using pthread_cancel.
Instead you need to introduce some communication mechanism (maybe via sockets) to tell the thread that it shall terminate itself.
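If the goal is for the thread to act on a cancellation request only at one explicit check, one possible approach (an assumption on my part, not something the answer above proposes) is to keep cancellation disabled and enable it only around the pthread_testcancel() call. POSIX holds a request pending while the cancel state is disabled. In this sketch `worker`, `run` and `last_iteration` are hypothetical names, and the sleep is shortened to 20 ms.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static int last_iteration = -1;  /* last loop index the worker completed */

static void *worker(void *unused)
{
    (void)unused;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    for (int i = 0; i < 7; i++) {
        if (i == 5) {
            /* The only place where a pending request may be acted upon. */
            pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
            pthread_testcancel();
            pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
        }
        struct timespec nap = {0, 20 * 1000 * 1000};  /* 20 ms */
        nanosleep(&nap, NULL);  /* a cancellation point, but state is disabled */
        last_iteration = i;
    }
    return NULL;
}

/* Returns the last completed iteration if the worker was cancelled, else -1. */
static int run(void)
{
    pthread_t tid;
    void *status = NULL;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;
    if (pthread_cancel(tid) != 0) return -1;  /* stays pending until i == 5 */
    if (pthread_join(tid, &status) != 0) return -1;
    assert(last_iteration < 7);
    return status == PTHREAD_CANCELED ? last_iteration : -1;
}
```

Here `run()` should return 4: iterations 0 through 4 complete even though the request was sent immediately, and the thread exits at the explicit check in iteration 5.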

Why is sem_timedwait() not waking up?

I work on an embedded system with eCos:
I have 2 threads within the same process and 1 semaphore.
Thread A initializes a semaphore to 0 so that the 1st attempt to take it will block.
Thread A sends a command to Thread B, providing a callback.
Thread A waits on the semaphore with sem_timedwait
Thread B processes the command and increments the semaphore
Thread A should be woken up but is still blocked
Here is the code:
Thread A
static sem_t semaphore;
void callback()
{
// Do some stuff
int ret = sem_post(&semaphore);
// print confirmation message
}
void foo()
{
int ret = sem_init(&semaphore, 0, 0);
if(ret != 0)
{
// print errno
}
struct timespec ts;
clock_gettime(CLOCK_REALTIME,&ts); // Get current date
ts.tv_sec += 2; // Add 2s for the deadline
send_command_to_thread_B(&callback);
ret = sem_timedwait(&semaphore, &ts);
if(ret != 0)
{
// print errno
}
// print waking up message
}
What is in Thread B is not relevant.
For debug I tried the following:
Using sem_wait rather than sem_timedwait works: Thread A blocks, then is released after the callback. But I don't want to use it, because if a failure in the callback path prevents the semaphore from being incremented, Thread A will wait forever.
If I don't add the 2s to the timespec struct, sem_timedwait returns immediately and errno is set to ETIMEDOUT (seems legit). The callback is called, but too late for Thread A.
I put traces in the callback to ensure that the semaphore is indeed incremented from 0 to 1: the whole process completes and the callback exits, but Thread A is still blocked.
Do you have any clue? Am I missing something?
OK, so actually everything is fine with this code; the issue was elsewhere: I had a reentrancy problem that caused a deadlock.
Moral: carefully protect your resources and addresses in a multithreaded context.
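For reference, the timed-wait pattern from the question can be wrapped in a small helper. This is only a sketch, assuming a platform that provides POSIX semaphores and CLOCK_REALTIME; `sem_wait_ms` and `demo` are hypothetical names, and the helper additionally retries when a signal interrupts the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Waits on sem for at most timeout_ms milliseconds.
   Returns 0 if the semaphore was taken, else an errno value (e.g. ETIMEDOUT). */
static int sem_wait_ms(sem_t *sem, long timeout_ms)
{
    struct timespec deadline;
    assert(timeout_ms >= 0);
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {  /* keep tv_nsec in range */
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    for (;;) {
        if (sem_timedwait(sem, &deadline) == 0)
            return 0;
        if (errno != EINTR)  /* retry only if a signal interrupted the wait */
            return errno;
    }
}

/* Returns 1 if waiting on an empty semaphore times out and
   waiting after sem_post() succeeds. */
static int demo(void)
{
    sem_t sem;
    if (sem_init(&sem, 0, 0) != 0) return 0;
    int timed_out = sem_wait_ms(&sem, 50);  /* nobody posts: times out */
    sem_post(&sem);
    int taken = sem_wait_ms(&sem, 50);      /* count is 1: returns at once */
    sem_destroy(&sem);
    return timed_out == ETIMEDOUT && taken == 0;
}
```

Because the deadline is absolute, it is computed once before the retry loop, so an interrupted wait does not extend the total timeout.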

pthread_cancel() function failed to terminate a thread

I'm using pthread_create() and pthread_cancel() functions to create a multithreaded program, but I noticed that pthread_cancel() did not really terminate the thread it was supposed to.
void check(void *param){
header_t *Bag = (header_t *) param;
pthread_t timer_thread;
while(true){
if(Bag->size && TIMER_OFF){
pthread_create(&timer_thread, NULL, (void *) &timer, (void *) Bag);
printf("\nCREATE THREAD ID = %d\n", timer_thread);
// ADD
}
else if(!TIMER_OFF && Bag->size >= 2 && Bag->next->time <= CURRENT_APT_TIME && CRRENT_TAG != Bag->next->tag){
printf("\nOLD THREAD ID = %d TO BE CANCELLED\n", timer_thread);
pthread_cancel(timer_thread);
pthread_create(&timer_thread, NULL, (void *) &timer, (void *) Bag);
printf("\nNEW THREAD ID = %d\n", timer_thread);
// Replace
}
Sleep(1);
}
}
The timer function, void timer(void *), is exactly what it sounds like, and I've included a couple of lines to print out its own thread ID.
When tested, the following was seen:
...
OLD THREAD ID = 6041240 TO BE CANCELLED
NEW THREAD ID = 6046456
...
THREAD ID EXECUTING = 6041240
So the timer function was not terminated by calling pthread_cancel()?
By default your thread is created with the cancel type PTHREAD_CANCEL_DEFERRED, which means that you need to make sure that your thread has a so-called cancellation point. That is, a point where cancel requests are checked and reacted upon.
This page contains a list of functions that are guaranteed to be cancellation points. Examples are some of the sleep() and pthread functions. You can also use pthread_testcancel() if you just need a function to purely test whether the thread has been canceled, and nothing else.
Another option is to set your threads canceltype to be PTHREAD_CANCEL_ASYNCHRONOUS, by using pthread_setcanceltype(), which will make your thread cancelable even without cancellation points. However, the system is still free to choose when the thread should actually be canceled, and you'll need to take great care to make sure that the system isn't left in an inconsistent state when cancelled (typically means avoiding any system calls and similar - see the list of Async-cancel-safe functions - it's short!).
In general, asynchronous cancellation should only be used for more or less "pure" processing threads, whereas threads performing system calls and similar are better implemented with deferred cancellation and careful placement of cancellation points.
Also, as long as you are not detaching your thread (by either creating it detached through its attribute, or by calling pthread_detach() after it is created), you will need to call pthread_join() on the timer thread to make sure that all resources are cleaned up after canceling it. Otherwise you might end up in a state without any spare threading resources, where you cannot create any new threads.
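The cancel-then-join advice can be sketched as a helper that replaces a running timer thread. This is an illustration under assumptions, not the asker's code: `timer` here is a stand-in that only sleeps (sleep() is a cancellation point), and `replace_timer` and `demo` are hypothetical names. It also uses the `void *(*)(void *)` signature that pthread_create() expects, which avoids the function pointer casts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Stand-in timer body: it only sleeps, and sleep() is a cancellation point. */
static void *timer(void *bag)
{
    (void)bag;
    for (;;)
        sleep(1);
    return NULL;
}

/* Cancels the current timer thread, waits until it has really terminated,
   then starts a new one. Returns 0 on success or a pthread error number. */
static int replace_timer(pthread_t *timer_thread, void *bag)
{
    assert(timer_thread != NULL);
    int rc = pthread_cancel(*timer_thread);
    if (rc != 0) return rc;
    rc = pthread_join(*timer_thread, NULL);  /* also releases its resources */
    if (rc != 0) return rc;
    return pthread_create(timer_thread, NULL, timer, bag);
}

/* Starts a timer, replaces it once, then shuts the replacement down. */
static int demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, timer, NULL) != 0) return -1;
    if (replace_timer(&t, NULL) != 0) return -1;
    if (pthread_cancel(t) != 0) return -1;
    return pthread_join(t, NULL);
}
```

After pthread_join() returns, the old thread can no longer print anything, so an "OLD THREAD" message appearing after the replacement would not be possible with this ordering.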
The default cancellation type is deferred cancellation. So when you invoke pthread_cancel(), the target thread is not actually cancelled until it reaches a cancellation point. You can call pthread_setcanceltype() to set a different cancellation type for your thread.

Conditional wait with pthreads

I seem to be running into a possible deadlock with a pthreads condition variable.
Here is the code
thread function(){
for (condition){
do work
/* should the thread continue? */
if (exit == 1){
break; /* exit for */
}
} /* end for */
pthread_mutex_lock(&mtxExit);
exit = 0;
pthread_cond_signal(&condVar);
pthread_mutex_unlock(&mtxExit);
}
The main function is as follows:
function main(){
if (thread is still active){
pthread_mutex_lock(&mtxExit);
exit = 1;
pthread_mutex_unlock(&mtxExit);
} /* end if */
while (exit == 1){
pthread_mutex_lock(&mtxExit);
/* check again */
if (exit == 1)
pthread_cond_wait(&condVar, &mtxExit);
pthread_mutex_unlock(&mtxExit);
}
create new thread()
....
}
The code is always getting stuck at cond_wait. :(
EDIT:
Let me add some clarification to the thread to explain what I am doing.
At any given time, I need only one thread running. I have a function that starts the thread, tells it what to do, and the main thread continues its work.
The next time the main thread decides it needs to spawn another thread, it has to make sure the thread that was previously started has exited. I cannot have two threads alive at the same time as they will interfere with each other. This is by design and by definition of the problem I am working on.
That is where I am running into problems.
This is my approach:
Start the thread, let it do its job.
The thread checks at every step of its job to see if it is still relevant. This is where "exit" comes into the picture. The main thread sets "exit" to 1 if it needs to tell the thread that it is no longer relevant.
In most cases, the thread will exit before the main thread decides to spawn another thread. But I still need to factor in the case where the thread is still alive by the time the main thread is ready to start another one.
So the main thread sets the value of "exit" and needs to wait for the thread to exit. I don't want to use pthread_kill with 0 as the signal, because then the main thread will be in a loop wasting CPU cycles. I need the main thread to relinquish control and sleep/wait until the thread exits.
Since I only need one thread at a time, I don't need to worry about scaling to more threads. The solution will never have more than one thread. I just need a reliable mechanism to test if my thread is still alive and, if it is, signal it to exit, wait for it to exit, and start the next one.
From my testing, it looks like the main thread still enters the wait on the condition variable even if the thread may have exited, or the signal is not getting delivered to the main thread at all. And it waits there forever. In some cases, I see in the debugger that the value of exit is set to 0 and the main thread is still waiting for the signal. There seems to be a race condition somewhere.
I am not a fan of how I set up the code right now; it's too messy. It's only a proof of concept, and I will move to a better solution soon. My challenge is to reliably signal the thread to exit and wait for it to exit.
I appreciate your time.
Did you forget to initialize your condition variable?
pthread_cond_init(&condVar, NULL)
while (exit == 1) {
In the code as you quote it, I do not see any particular problem. It is not clean, but it appears functional. This leads me to believe that somewhere else you are setting exit to 0 without signaling it, or that the thread is getting stuck somewhere while doing the work.
But considering the comments, which hint that you are trying to signal one thread to terminate before starting another, I think you are doing it wrong. Generally, pthread condition signaling shouldn't be relied upon when a signal must not be missed. Though it seems that the state variable exit covers that, it is still, IMO, a wrong application of pthread conditions.
In that case you can try using a semaphore. While terminating, the thread increments the termination semaphore so that main can wait on (decrement) it.
thread function()
{
for (condition)
{
do work
/* should the thread continue? */
if (exit == 1) {
break; /* exit for */
}
} /* end for */
sem_post(&termSema);
}
function main()
{
if (thread is still active)
{
exit = 1;
sem_wait(&termSema);
exit = 0;
}
create new thread()
....
}
As a general remark, I suggest looking at some thread pool implementations, because using a state variable to sync threads is still wrong, error prone, and doesn't scale to more than one thread.
When the code is stuck in pthread_cond_wait, is exit 1 or 0? If exit is 1, it should be stuck.
If exit is 0, one of two things is most likely the case:
1) Some code set exit to 0 but didn't signal the condition variable.
2) Some thread blocked on pthread_cond_wait, consumed a signal, but didn't do whatever it is you needed done.
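Both failure modes are usually avoided by changing the shared state and signaling under the same mutex, and by waiting in a loop on a predicate. The sketch below is one possible arrangement, not the asker's code; `stop_requested` plays the role of "exit", and `worker_done`, `worker`, `stop_worker_and_wait` and `demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int stop_requested = 0;  /* plays the role of "exit" */
static int worker_done = 0;     /* the predicate main waits for */

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&mtx);
        int stop = stop_requested;  /* read the flag under the mutex */
        pthread_mutex_unlock(&mtx);
        if (stop)
            break;
        struct timespec nap = {0, 1000000};  /* 1 ms of pretend work */
        nanosleep(&nap, NULL);
    }
    pthread_mutex_lock(&mtx);
    worker_done = 1;             /* change state and signal under one lock */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Asks the worker to stop, waits for it to finish, then joins it. */
static int stop_worker_and_wait(pthread_t tid)
{
    pthread_mutex_lock(&mtx);
    stop_requested = 1;
    while (!worker_done)         /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &mtx);
    pthread_mutex_unlock(&mtx);
    return pthread_join(tid, NULL);
}

static int demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;
    int rc = stop_worker_and_wait(tid);
    assert(worker_done == 1);
    return rc;
}
```

Since the predicate is checked while holding the mutex, a signal sent before main starts waiting is not lost: main sees `worker_done == 1` and never blocks.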
You have all sorts of timing problems with your current implementation (hence the problems).
To ensure that the thread has finished (and its resources have been released), you should call pthread_join().
There is no need for a pthread_cond_t here.
It might also make more sense to use pthread_cancel() to notify the thread that it is no longer required, rather than a flag like you are currently doing.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h> /* sleep() */
void *thread_func(void *arg) {
int i;
for (i = 0; i < 10; i++) {
/* protect any regions that must not be cancelled... */
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
/* very important work */
printf("%d\n", i);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
/* force a check to see if we're finished */
pthread_testcancel();
/* sleep (for clarity in the example) */
sleep(1);
}
return NULL;
}
int main(void) {
int ret;
pthread_t tid;
ret = pthread_create(&tid, NULL, thread_func, NULL);
if (ret != 0) {
printf("pthread_create() failed %d\n", ret);
exit(1);
}
sleep(5);
ret = pthread_cancel(tid);
if (ret != 0) {
printf("pthread_cancel() failed %d\n", ret);
exit(1);
}
ret = pthread_join(tid, NULL);
if (ret != 0) {
printf("pthread_join() failed %d\n", ret);
exit(1);
}
printf("finished...\n");
}
It's also worth noting:
exit() is a library function - you should not declare anything with the same name as something else.
Depending on your specific situation, it might make sense to keep a single thread alive always, and provide it with jobs to do, rather than creating / cancelling threads continuously (research 'thread pools')

WaitForSingleObject and WaitForMultipleObjects equivalent in Linux?

I am migrating an application from Windows to Linux. I am facing a problem with the WaitForSingleObject and WaitForMultipleObjects interfaces.
In my application I spawn multiple threads where all threads wait for events from parent process or periodically run for every t seconds.
I have checked pthread_cond_timedwait, but we have to specify absolute time for this.
How can I implement this in Unix?
Stick to pthread_cond_timedwait and use clock_gettime. For example:
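struct timespec ts;
int ret = 0;
clock_gettime(CLOCK_REALTIME, &ts);
ts.tv_sec += 10; // ten seconds
pthread_mutex_lock(&mutex); // the mutex must be held while waiting
while (!some_condition && ret == 0)
    ret = pthread_cond_timedwait(&cond, &mutex, &ts);
pthread_mutex_unlock(&mutex);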
Wrap it in a function if you wish.
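A possible wrapper, as a sketch only: it takes a relative timeout in milliseconds, converts it to the absolute CLOCK_REALTIME deadline that pthread_cond_timedwait() expects, and waits on a caller-supplied flag. The names `wait_for_flag` and `demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Waits until *flag becomes nonzero or timeout_ms milliseconds elapse.
   The caller must hold mutex. Returns 0 on success or ETIMEDOUT. */
static int wait_for_flag(pthread_cond_t *cond, pthread_mutex_t *mutex,
                         const int *flag, long timeout_ms)
{
    struct timespec ts;
    int ret = 0;
    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {  /* keep tv_nsec in range */
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    while (!*flag && ret == 0)        /* loop guards against spurious wakeups */
        ret = pthread_cond_timedwait(cond, mutex, &ts);
    return *flag ? 0 : ret;
}

/* Returns 1 if the wait times out while the flag is clear and
   returns immediately once the flag is set. */
static int demo(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
    int flag = 0;
    pthread_mutex_lock(&m);
    int first = wait_for_flag(&c, &m, &flag, 50);   /* nobody signals */
    flag = 1;
    int second = wait_for_flag(&c, &m, &flag, 50);  /* already satisfied */
    pthread_mutex_unlock(&m);
    return first == ETIMEDOUT && second == 0;
}
```

The deadline is computed once, outside the loop, so repeated wakeups do not push the timeout further into the future.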
UPDATE: complementing the answer based on our comments.
POSIX doesn't have a single API to wait for "all types" of events/objects as Windows does. Each one has its own functions. The simplest way to notify a thread for termination is using atomic variables/operations. For example:
Main thread:
// Declare it globally (argh!) or pass by argument when the thread is created
atomic_t must_terminate = ATOMIC_INIT(0);
// "Signal" termination by changing the initial value
atomic_inc(&must_terminate);
Secondary thread:
// While it holds the default value
while (atomic_read(&must_terminate) == 0) {
// Keep it running...
}
// Do proper cleanup, if needed
// Call pthread_exit() providing the exit status
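The `atomic_t`, `ATOMIC_INIT`, `atomic_inc` and `atomic_read` names in the snippets above resemble the Linux kernel's API rather than anything in POSIX or standard C. In user-space code, a comparable sketch can be written with C11 `<stdatomic.h>`, assuming a compiler that supports it; `secondary` and `demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Static storage: zero-initialized, i.e. "keep running". */
static atomic_int must_terminate;

static void *secondary(void *unused)
{
    (void)unused;
    /* While the flag holds its default value, keep running. */
    while (atomic_load(&must_terminate) == 0) {
        struct timespec nap = {0, 1000000};  /* 1 ms of pretend work */
        nanosleep(&nap, NULL);
    }
    /* Do proper cleanup here, if needed, then finish. */
    return NULL;
}

/* Starts the secondary thread, signals termination, and joins it. */
static int demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, secondary, NULL) != 0) return -1;
    atomic_store(&must_terminate, 1);  /* "signal" termination */
    int rc = pthread_join(tid, NULL);
    assert(atomic_load(&must_terminate) == 1);
    return rc;
}
```

Unlike cancellation, this approach lets the thread stop only at points it chooses, at the cost of polling the flag; a thread blocked in a long system call will not notice the flag until the call returns.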
Another alternative is to send a cancellation request using pthread_cancel. The thread being cancelled must have called pthread_cleanup_push to register any necessary cleanup handlers. These handlers are invoked in the reverse order in which they were registered. Never call pthread_exit from a cleanup handler, because it's undefined behaviour. The exit status of a cancelled thread is PTHREAD_CANCELED. If you opt for this alternative, I recommend reading up on cancellation points and cancellation types.
And last but not least, calling pthread_join will make the current thread block until the thread passed by argument terminates. As bonus, you'll get the thread's exit status.
For what it's worth, we (NeoSmart Technologies) have just released an open source (MIT licensed) library called pevents which implements WIN32 manual and auto-reset events on POSIX, and includes both WaitForSingleObject and WaitForMultipleObjects clones.
Although I'd personally advise you to use POSIX multithreading and signaling paradigms when coding on POSIX machines, pevents gives you another choice if you need it.
I realise this is an old question now, but for anyone else who stumbles across it, this source suggests that pthread_join() does effectively the same thing as WaitForSingleObject():
http://www.ibm.com/developerworks/linux/library/l-ipc2lin1/index.html
Good luck!
For WaitForMultipleObjects with false WaitAll try this:
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; // protects finishedTask
pthread_cond_t eventPosted = PTHREAD_COND_INITIALIZER; // a task has finished
pthread_cond_t eventHandled = PTHREAD_COND_INITIALIZER; // main thread consumed the event
int finishedTask = -1;
void* task(void *data)
{
    int num = *(int*)data;
    // Do some work
    sleep(9-num);
    // Task finished
    pthread_mutex_lock(&mutex);
    while (finishedTask >= 0) // wait until the previous event has been processed by main thread
        pthread_cond_wait(&eventHandled, &mutex);
    finishedTask = num; // memorize task number
    pthread_cond_signal(&eventPosted);
    pthread_mutex_unlock(&mutex);
    return NULL;
}
int main(int argc, char *argv[])
{
    pthread_t thread[10];
    int numbers[10];
    for (int i = 0; i < 10; i++) {
        numbers[i] = i;
        printf("created %d\n", i); // Creating 10 asynchronous tasks
        pthread_create(&thread[i], NULL, task, &numbers[i]);
    }
    pthread_mutex_lock(&mutex); // the mutex must be held around pthread_cond_wait
    for (int i = 0; i < 10;)
    {
        if (finishedTask >= 0) {
            printf("Task %d finished\n", finishedTask); // handle event
            finishedTask = -1; // reset event variable
            i++;
            pthread_cond_signal(&eventHandled); // let the next finished task report
        } else {
            pthread_cond_wait(&eventPosted, &mutex); // waiting for event
        }
    }
    pthread_mutex_unlock(&mutex);
    for (int i = 0; i < 10; i++)
        pthread_join(thread[i], NULL); // release each thread's resources
    return 0;
}
