I seem to be running in to a possible deadlock with a pthreads conditional variable.
Here is the code
thread function(){
for (condition){
do work
/* should the thread continue? */
if (exit == 1){
break; /* exit for */
}
} /* end for */
pthread_mutex_lock(&mtxExit);
exit = 0;
pthread_cond_signal(&condVar);
pthread_mutex_unlock(&mtxExit);
}
The main function is as follows:
function main(){
if (thread is still active){
pthread_mutex_lock(&mtxExit);
exit = 1;
pthread_mutex_unlock(&mtxExit);
} /* end if */
while (exit == 1){
pthread_mutex_lock(&mtxExit);
/* check again */
if (exit == 1)
pthread_cond_wait(&condVar, &mtxExit);
pthread_mutex_unlock(&mtxExit);
}
create new thread()
....
}
The code is always getting stuck at cond_wait. :(
EDIT:
Let me add some clarification to the thread to explain what I am doing.
At any given time, I need only one thread running. I have a function that starts the thread, tells it what to do and the main thread continues it work.
The next time the main thread decides it needs to spawn another thread, it has to make sure the thread that was previously started has exited. I cannot have two threads alive at the same time as they will interfere with each other. This is by design and by definition of the problem I am working on.
That is where I am running in to problems.
This is my approach:
Start the thread, let it do its job.
the thread checks in every step of its job to see if it is still relevant. This is where "exit" comes in to picture. The main thread sets "exit" to 1, if it needs to tell the thread that it is no longer relevant.
In most cases, the thread will exit before the main thread decides to spawn another thread. But I still need to factor in the case that the thread is still alive by the time the main thread is ready to start another one.
So the main thread sets the value of "exit" and needs to wait for the thread to exit. I dont want to use pthread_kill with 0 as signal because then main thread will be in a loop wasting CPU cycles. I need the main thread to relinquish control and sleep/wait till the thread exits.
Since I only need one thread at a time, I dont need to worry about scaling to more threads. The solution will never have more than one thread. I just need a reliable mechanism to test if my thread is still alive, if it is, signal it to exit, wait for it to exit and start the next one.
From my testing, it looks like, the main thread is still entering the conditional variable even if the thread may have exited or that the signal is not getting delivered to the main thread at all. And its waiting there forever. And is some cases, in debugger I see that the value of exit is set to 0 and still the main thread is waiting at signal. There seems to be a race condition some where.
I am not a fan of how I set up the code right now, its too messy. Its only a proof of concept right now, I will move to a better solution soon. My challenge is to reliably signal the thread to exit, wait on it to exit.
I appreciate your time.
Did you forget to initialize your condition variable?
pthread_cond_init(&condVar, NULL)
while (exit == 1) {
In the code you quote, the way you quote I do not see any particular problem. It is not clean, but it appears functional. What leads me to believe that somewhere else you are setting exit to 0 without signaling that. Or the thread is getting stuck somewhere doing the work.
But considering the comments which hint that you try to signal one thread to terminate before starting another thread, I think you are doing it wrong. Generally pthread condition signaling shouldn't be relied upon if a signal may not be missed. Though it seems that state variable exit covers that, it is still IMO wrong application of the pthread conditions.
In the case you can try to use a semaphores. While terminating, the thread increments the termination semaphore so that main can wait (decrement) the semaphore.
thread function()
{
for (condition)
{
do work
/* should the thread continue? */
if (exit == 1) {
break; /* exit for */
}
} /* end for */
sem_post(&termSema);
}
function main()
{
if (thread is still active)
{
exit = 1;
sem_wait(&termSema);
exit = 0;
}
create new thread()
....
}
As a general remark, I can suggest to look for some thread pool implementations. Because using a state variable to sync threads is still wrong and doesn't scale to more than one thread. And error prone.
When the code is stuck in pthread_cond_wait, is exit 1 or 0? If exit is 1, it should be stuck.
If exit is 0, one of two things are most likely the case:
1) Some code set exit to 0 but didn't signal the condition variable.
2) Some thread blocked on pthread_cond_wait, consumed a signal, but didn't do whatever it is you needed done.
You have all sorts of timing problems with your current implementation (hence the problems).
To ensure that the thread has finished (and its resources have been released), you should call pthread_join().
There is no need for a pthread_cond_t here.
It might also make more sense to use pthread_cancel() to notify the thread that it is no longer required, rather than a flag like you are currently doing.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *thread_func(void *arg) {
int i;
for (i = 0; i < 10; i++) {
/* protect any regions that must not be cancelled... */
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
/* very important work */
printf("%d\n", i);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
/* force a check to see if we're finished */
pthread_testcancel();
/* sleep (for clarity in the example) */
sleep(1);
}
return NULL;
}
void main(void) {
int ret;
pthread_t tid;
ret = pthread_create(&tid, NULL, thread_func, NULL);
if (ret != 0) {
printf("pthread_create() failed %d\n", ret);
exit(1);
}
sleep(5);
ret = pthread_cancel(tid);
if (ret != 0) {
printf("pthread_cancel() failed %d\n", ret);
exit(1);
}
ret = pthread_join(tid, NULL);
if (ret != 0) {
printf("pthread_join() failed %d\n", ret);
exit(1);
}
printf("finished...\n");
}
It's also worth noting:
exit() is a library function - you should not declare anything with the same name as something else.
Depending on your specific situation, it might make sense to keep a single thread alive always, and provide it with jobs to do, rather than creating / cancelling threads continuously (research 'thread pools')
Related
I have read this and this post on stackoverflow, but no one of them give me what I want to do.
In my case, I want to create a Thread, launch it and let it running with no blocking stat as long as the main process runs. This thread has no communication, no synchronization with the main process, it do his job fully independent.
Consider this code:
#define DAY_PERIOD 86400 /* 3600*24 seconds */
int main() {
char wDir[255] = "/path/to/log/files";
compress_logfiles(wDir);
// do other things, this things let the main process runs all the time.
// just segmentation fault, stackoverflow, memory overwrite or
// somethings like that stop it.
return 0;
}
/* Create and launch thread */
void compress_logfiles(char *wDir)
{
pthread_t compressfiles_th;
if (pthread_create(&compressfiles_th, NULL, compress, wDir))
{
fprintf(stderr, "Error create compressfiles thread\n");
return;
}
if (pthread_join(compressfiles_th, NULL))
{
//fprintf(stderr, "Error joining thread\n");
return;
}
return;
}
void *compress(void *wDir)
{
while(1)
{
// Do job to compress files
// and sleep for one day
sleep(DAY_PERIOD); /* sleep one day*/
}
return NULL;
}
With ptheard_join in compress_logfiles function, the thread compresses all files successfully and never returns because it is in infinite while loop, so the main process still blocked all the time. If I remove ptheard_join from compress_logfiles function, the main process is not blocked because it don't wait thread returns, but the thread compresses one file and exit (there a lot of files, arround one haundred).
So, is there a way to let main process launch compressfiles_th thread and let it do his job without waiting it to finish or exit?
I found pthread_tryjoin_np and pthread_timedjoin_np in Linux Programmer's Manual, it seems that pthread_tryjoin_np do the job if I don't care of the returned value, it is good idea to use it?
Thank you.
Edit 1:
Please note that the main process is daemonized after call to compress_logfiles(wDir), perhaps the daemonization kill the main process and re-launch it is the problem?
Edit 2: the solution
Credit to dbush
Yes, fork causes the problem, and pthread_atfork() solves it. I made this change to run the compressfiles_th without blocking main process:
#define DAY_PERIOD 86400 /* 3600*24 seconds */
char wDir[255] = "/path/to/log/files"; // global now
// function added
void child_handler(){
compress_logfiles(wDir); // wDir is global variable now
}
int main()
{
pthread_atfork(NULL, NULL, child_handler);
// Daemonize the process.
becomeDaemon(BD_NO_CHDIR & BD_NO_CLOSE_FILES & BD_NO_REOPEN_STD_FDS & BD_NO_UMASK0 & BD_MAX_CLOSE);
// do other things, this things let the main process runs all the time.
// just segmentation fault, stackoverflow, memory overwrite or
// somethings like that stop it.
return 0;
}
child_handler() function is called after fork. pthread_atfork
When you fork a new process, only the calling thread is duplicated, not all threads.
If you wish to daemonize, you need to fork first, then create your threads.
From the man page for fork:
The child process is created with a single thread--the one that
called fork(). The entire virtual address space of the parent is
replicated in the child, including the states of mutexes, condition
variables, and other pthreads objects; the use of pthread_atfork(3)
may be helpful for dealing with problems that this can cause.
This code plays a sound clip by creating a thread to do it. When bleep() runs, it sets the global variable bleep_playing to TRUE. In its main loop, if it notices that bleep_playing has been set to FALSE, it terminates that loop, cleans up (closing files, freeing buffers), and exits. I don't know the correct way to wait for a detached thread to finish. pthread_join() doesn't do the job. The while loop here continually checks bleep_id to see if it's valid. When it isn't, execution continues. Is this the correct and portable way to tell a thread to clean up and terminate before the next thread is allowed to be created?
if (bleep_playing) {
bleep_playing = FALSE;
while (pthread_kill(bleep_id, 0) == 0) {
/* nothing */
}
}
err = pthread_create(&bleep_id, &attr, (void *) &bleep, &effect);
I
Hmm... pthread_join should do the job. As far as I remember the thread has to call pthread_exit...?
/* bleep thread */
void *bleep(void *)
{
/* do bleeping */
pthread_exit(NULL);
}
/* main thread */
if (pthread_create(&thread, ..., bleep, ...) == 0)
{
/*
** Try sleeping for some ms !!!
** I've had some issues on multi core CPUs requiring a sleep
** in order for the created thread to really "exist"...
*/
pthread_join(&thread, NULL);
}
Anyway if it isn't doing its thing you shouldn't poll a global variable since it will eat up your CPU. Instead create a mutex (pthread_mutex_*-functions) which is initially locked and freed by the "bleep thread". In your main thread you can wait for that mutex which makes your thread sleep until the "bleep thread" frees the mutex.
(or quick & and dirty: sleep for a small amount of time while waiting for bleep_playing becoming FALSE)
My problem is that I cannot reuse cancelled pthread. Sample code:
#include <pthread.h>
pthread_t alg;
pthread_t stop_alg;
int thread_available;
void *stopAlgorithm() {
while (1) {
sleep(6);
if (thread_available == 1) {
pthread_cancel(alg);
printf("Now it's dead!\n");
thread_available = 0;
}
}
}
void *algorithm() {
while (1) {
printf("I'm here\n");
}
}
int main() {
thread_available = 0;
pthread_create(&stop_alg, NULL, stopAlgorithm, 0);
while (1) {
sleep(1);
if (thread_available == 0) {
sleep(2);
printf("Starting algorithm\n");
pthread_create(&alg, NULL, algorithm, 0);
thread_available = 1;
}
}
}
This sample should create two threads - one will be created at the program beginning and will try to cancel second as soon it starts, second should be rerunned as soon at it was cancelled and say "I'm here". But when algorithm thread cancelled once it doesn't start once again, it says "Starting algorithm" and does nothing, no "I'm here" messages any more. Could you please tell me the way to start cancelled(immediately stopped) thread once again?
UPD: So, thanks to your help I understood what is the problem. When I rerun algorithm thread it throws error 11:"The system lacked the necessary resources to create another thread, or the system-imposed limit on the total number of threads in a process PTHREAD_THREADS_MAX would be exceeded.". Actually I have 5 threads, but only one is cancelled, others stop by pthread_exit. So after algorithm stopped and program went to standby mode I checked status of all threads with pthread_join - all thread show 0(cancelled shows PTHREAD_CANCELED), as far as I can understand this means, that all threads stopped successfully. But one more try to run algorithm throws error 11 again. So I've checked memory usage. In standby mode before algorithm - 10428, during the algorithm, when all threads used - 2026m, in standby mode after algorithm stopped - 2019m. So even if threads stopped they still use memory, pthread_detach didn't help with this. Are there any other ways to clean-up after threads?
Also, sometimes on pthread_cancel my program crashes with "libgcc_s.so.1 must be installed for pthread_cancel to work"
Several points:
First, this is not safe:
int thread_available;
void *stopAlgorithm() {
while (1) {
sleep(6);
if (thread_available == 1) {
pthread_cancel(alg);
printf("Now it's dead!\n");
thread_available = 0;
}
}
}
It's not safe for at least reasons. Firstly, you've not marked thread_available as volatile. This means that the compiler can optimise stopAlgorithm to read the variable once, and never reread it. Secondly, you haven't ensured access to it is atomic, or protected it by a mutex. Either declare it:
volatile sig_atomic_t thread_available;
(or similar), or better, protect it by a mutex.
But for the general case of triggering one thread from another, you are better using a condition variable (and a mutex), using pthread_condwait or pthread_condtimedwait in the listening thread, and pthread_condbroadcast in the triggering thread.
Next, what's the point of the stopAlgorithm thread? All it does is cancel the algorithm thread after an unpredictable amount of time between 0 and 6 seconds? Why not just sent the pthread_cancel from the main thread?
Next, do you care where your algorithm is when it is cancelled? If not, just pthread_cancel it. If so (and anyway, I think it's far nicer), regularly check a flag (either atomic and volatile as above, or protected by a mutex) and pthread_exit if it's set. If your algorithm does big chunks every second or so, then check it then. If it does lots of tiny things, check it (say) every 1,000 operations so taking the mutex doesn't introduce a performance penalty.
Lastly, if you cancel a thread (or if it pthread_exits), the way you start it again is simply to call pthread_create again. It's then a new thread running the same code.
I cant terminate the thread, it keeps sending things even after I close the terminal...
void *RTPfun(void * client_addr);
int main(int argc, char *argv[])
{
pthread_t RTPthread;
pthread_create(&RTPthread, NULL, &RTPfun, (void*)client_addr);
...
...
pthread_exit(&RTPfun);
return 0;
}
void *RTPfun(void * client_addr)
{
...
...
return 0;
}
Can someone tell me what am I doing wrong?
Thanks!
pthread_exit kills your current thread.
Notice, that if you kill the main thread as you do, it does not terminate the process. Other threads keep running.
You probably want to use pthread_cancel.
More generally though, killing threads is a bad idea. Correct way is to ask your threads politely to terminate and wait till they do.
If you call exit() from main, it will terminate main thread with all other thread.
If you call the method pthread_exit() from your main it will terminate main thread and let other thread will run continuously.
In your case you are calling pthread_exit() from main so your main thread get terminated, and other thread running until thread gets finish the job.
To cancel thread Add Below in RTPfun and add pthread_cancel in main.
/* call this when you are not ready to cancel the thread */
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
/* call this when you are ready to cancel the thread */
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
Working sample code:
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
void *RTPfun(void * client_addr);
int main(int argc, char *argv[])
{
pthread_t RTPthread;
int client_addr;
pthread_create(&RTPthread, NULL, &RTPfun, (void*)client_addr);
sleep(2);
pthread_cancel(RTPthread);
pthread_join(RTPthread, NULL);
return 0;
}
void *RTPfun(void * client_addr)
{
int count = 0;
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
while(1) {
if(count > 10) {
printf("thread set for cancel\n");
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
}
sleep(1);
printf("count:%d\n", count);
count ++;
}
return 0;
}
Used sleep in code for just understanding.
My understanding:
From "man pthread_exit" clearly talks about the description and rational behavior.
we will need to clean-up all respected resource that are been used in the thread created.
" The pthread_exit() function shall terminate the calling thread and make the value value_ptr available to any successful join with the terminating thread."
we shall pass "value_ptr" to exit(value_ptr) --> this will help to analyse what was the results of exit process.
obviously exit() - call exiting from the process, will release resources that used for.
In other way, you can create pthread in detached state, this attribute will reclaim the resources implicitly when pthread_exit . Refer "man pthread_detach".
We don't need to use pthread_join .. either go for pthread_detach again its simple go ahead set attribute .
/* set the thread detach state */
ret = pthread_attr_setdetachstate(&tattr,PTHREAD_CREATE_DETACHED);
Thanks.
Sankar
Your pthread_exit() exits the current thread. The parameter is used to pass a return value to any thread that then wants to join with it - and not to specify which thread to exit, as your code is implying.
The nicest way to do what you want is to use pthread_cancel() from your main thread. It takes the thread to cancel as a parameter, and then sends a cancellation request to that thread. Notice though, that by default cancellation is deferred, so your thread will keep on running until it hits a function that is a cancellation point - if you don't use any of those functions, you can insert an explicit cancellation point with a call to pthread_testcancel().
If you need to do some cleanup (for instance to free allocated memory, unlock mutexes, etc), you can insert a cleanup handler that is automatically called when canceling the thread. Have a look at pthread_cleanup_push() for this.
You can also set your thread to use asynchronous cancellation - your thread can then be canceled immediately, without hitting a cancellation point. However, asynchronous cancellation should only be used if you aren't using any system calls at all (that is, it's okay if you're purely doing calculations on already available data - but not if you're for instance using printf, file I/O, socket communication or similar), as otherwise you'll risk your system ending up in an inconsistent state.
After calling pthread_cancel(), your main thread should call pthread_join() on the canceled thread, to make sure all thread resources are cleaned up (unless you create the thread as detached).
You can of course also just have a shared doExit flag between the two threads, that the main thread can set, and which the other thread looks at from time to time. It's basically a manual way of using pthread_cancel().
I am working on a program with a fixed number of threads in C using posix threads.
How can i be notified when a thread has been terminated due to some error?
Is there a signal to detect it?
If so, can the signal handler create a new thread to keep the number of threads the same?
Make the threads detached
Get them to handle errors gracefully. i.e. Close mutexs, files etc...
Then you will have no probolems.
Perhaps fire a USR1 signal to the main thread to tell it that things have gone pear shaped (i was going to say tits up!)
Create your threads by passing the function pointers to an intermediate function. Start that intermediate function asynchronously and have it synchronously call the passed function. When the function returns or throws an exception, you can handle the results in any way you like.
With the latest inputs you've provided, I suggest you do something like this to get the number of threads a particular process has started-
#include<stdio.h>
#define THRESHOLD 50
int main ()
{
unsigned count = 0;
FILE *a;
a = popen ("ps H `ps -A | grep a.out | awk '{print $1}'` | wc -l", "r");
if (a == NULL)
printf ("Error in executing command\n");
fscanf(a, "%d", &count );
if (count < THRESHOLD)
{
printf("Number of threads = %d\n", count-1);
// count - 1 in order to eliminate header.
// count - 2 if you don't want to include the main thread
/* Take action. May be start a new thread etc */
}
return 0;
}
Notes:
ps H displays all threads.
$1 prints first column where PID is displayed on my system Ubuntu. The column number might change depending on the system
Replace a.out it with your process name
The backticks will evaluate the expression within them and give you the PID of your process. We are taking advantage of the fact that all POSIX threads will have same PID.
I doubt Linux would signal you when a thread dies or exits for any reason. You can do so manually though.
First, let's consider 2 ways for the thread to end:
It terminates itself
It dies
In the first method, the thread itself can tell someone (say the thread manager) that it is being terminated. The thread manager will then spawn another thread.
In the second method, a watchdog thread can keep track of whether the threads are alive or not. This is done more or less like this:
Thread:
while (do stuff)
this_thread->is_alive = true
work
Watchdog:
for all threads t
t->timeout = 0
while (true)
for all threads t
if t->is_alive
t->timeout = 0
t->is_alive = false
else
++t->timeout
if t->timeout > THRESHOLD
Thread has died! Tell the thread manager to respawn it
If for any reason one could not go for Ed Heal's "just work properly"-approach (which is my favorite answer to the OP's question, btw), the lazy fox might take a look at the pthread_cleanup_push() and pthread_cleanup_pop() macros, and think about including the whole thread function's body in between such two macros.
The clean way to know whether a thread is done is to call pthread_join() against that thread.
// int pthread_join(pthread_t thread, void **retval);
int retval = 0;
int r = pthread_join(that_thread_id, &retval);
... here you know that_thread_id returned ...
The problem with pthread_join() is, if the thread never returns (continues to run as expected) then you are blocked. That's therefore not very useful in your case.
However, you may actually check whether you can join (tryjoin) as follow:
//int pthread_tryjoin_np(pthread_t thread, void **retval);
int retval = 0;
int r = pthread_tryjoin_np(that_thread_id, &relval);
// here 'r' tells you whether the thread returned (joined) or not.
if(r == 0)
{
// that_thread_id is done, create new thread here
...
}
else if(errno != EBUSY)
{
// react to "weird" errors... (maybe a perror() at least?)
}
// else -- thread is still running
There is also a timed join which will wait for the amount of time you specified, like a few seconds. Depending on the number of threads to check and if your main process just sits around otherwise, it could be a solution. Block on thread 1 for 5 seconds, then thread 2 for 5 seconds, etc. which would be 5,000 seconds per loop for 1,000 threads (about 85 minutes to go around all threads with the time it takes to manage things...)
There is a sample code in the man page which shows how to use the pthread_timedjoin_np() function. All you would have to do is put a for loop around to check each one of your thread.
struct timespec ts;
int s;
...
if (clock_gettime(CLOCK_REALTIME, &ts) == -1) {
/* Handle error */
}
ts.tv_sec += 5;
s = pthread_timedjoin_np(thread, NULL, &ts);
if (s != 0) {
/* Handle error */
}
If your main process has other things to do, I would suggest you do not use the timed version and just go through all the threads as fast as you can.