I use the pthread library to write multithreaded C programs. My threads often hang waiting on mutexes. When I use strace and find a thread in the FUTEX_WAIT state, I want to know which thread holds that mutex at that moment, but I don't know how to find out. Are there any utilities that can do this?
Someone told me the Java virtual machine supports this, so I want to know whether Linux supports it too.
You can use knowledge of the mutex internals to do this. Ordinarily this wouldn't be a very good idea, but it's fine for debugging.
Under Linux with the NPTL implementation of pthreads (which is any modern glibc), you can examine the __data.__owner member of the pthread_mutex_t structure to find out the thread that currently has it locked. This is how to do it after attaching to the process with gdb:
(gdb) thread 2
[Switching to thread 2 (Thread 0xb6d94b90 (LWP 22026))]#0 0xb771f424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb771f424 in __kernel_vsyscall ()
#1 0xb76fec99 in __lll_lock_wait () from /lib/i686/cmov/libpthread.so.0
#2 0xb76fa0c4 in _L_lock_89 () from /lib/i686/cmov/libpthread.so.0
#3 0xb76f99f2 in pthread_mutex_lock () from /lib/i686/cmov/libpthread.so.0
#4 0x080484a6 in thread (x=0x0) at mutex_owner.c:8
#5 0xb76f84c0 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6 0xb767784e in clone () from /lib/i686/cmov/libc.so.6
(gdb) up 4
#4 0x080484a6 in thread (x=0x0) at mutex_owner.c:8
8 pthread_mutex_lock(&mutex);
(gdb) print mutex.__data.__owner
$1 = 22025
(gdb)
(I switch to the hung thread; do a backtrace to find the pthread_mutex_lock() it's stuck on; change stack frames to find out the name of the mutex that it's trying to lock; then print the owner of that mutex). This tells me that the thread with LWP ID 22025 is the culprit.
You can then use thread find 22025 to find out the gdb thread number for that thread and switch to it.
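The same field can also be read from inside the program while debugging. The following is a minimal sketch, assuming glibc with NPTL on Linux; the helper names `current_tid` and `mutex_owner_tid` are made up for illustration, and `__data.__owner` is an internal field that may change between glibc versions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID (LWP) of the caller: the same number gdb and strace show. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Debug-only peek at glibc/NPTL internals (not a supported API):
 * LWP ID of the thread that currently holds the mutex, or 0 if unlocked.
 * The read is unsynchronized, so treat the result as a hint. */
static pid_t mutex_owner_tid(const pthread_mutex_t *m)
{
    assert(m != NULL);
    return (pid_t)m->__data.__owner;
}
```

Comparing `mutex_owner_tid(&mutex)` with the LWP numbers that gdb prints in `info threads` identifies the holder without changing stack frames.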
I don't know of any such facility so I don't think you will get off that easily - and it probably wouldn't be as informative as you think in helping to debug your program. As low tech as it might seem, logging is your friend in debugging these things. Start collecting your own little logging functions. They don't have to be fancy, they just have to get the job done while debugging.
Sorry for the C++ but something like:
void logit(const bool acquired, const char* lockname, const int linenum)
{
    pthread_mutex_lock(&log_mutex);
    if (! acquired)
        logfile << pthread_self() << " tries lock " << lockname << " at " << linenum << endl;
    else
        logfile << pthread_self() << " has lock " << lockname << " at " << linenum << endl;
    pthread_mutex_unlock(&log_mutex);
}
void someTask()
{
logit(false, "some_mutex", __LINE__);
pthread_mutex_lock(&some_mutex);
logit(true, "some_mutex", __LINE__);
// do stuff ...
pthread_mutex_unlock(&some_mutex);
}
Logging isn't a perfect solution but nothing is. It usually gets you what you need to know.
Normally libc/platform calls are wrapped by an OS abstraction layer. Mutex deadlocks can be tracked using an owner variable and pthread_mutex_timedlock. Whenever a thread takes the lock, it should update the variable with its own tid (from gettid(); another variable can store the pthread ID). When another thread blocks and times out in pthread_mutex_timedlock, it can print the owner's tid and pthread ID. This way you can easily find out the owner thread. Please find the code snippet below; note that not all error conditions are handled.
pid_t ownerTid;   // written only by the thread that holds the mutex
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
class TimedMutex {
public:
    TimedMutex()
    {
        struct timespec abs_time;
        while(1)
        {
            // pthread_mutex_timedlock() measures its timeout against CLOCK_REALTIME
            clock_gettime(CLOCK_REALTIME, &abs_time);
            abs_time.tv_sec += 10;
            if(pthread_mutex_timedlock(&mutex,&abs_time) == ETIMEDOUT)
            {
                log("Lock held by thread=%d for more than 10 secs",ownerTid);
                continue;
            }
            ownerTid = gettid();
            break;   // lock acquired, stop retrying
        }
    }
    ~TimedMutex()
    {
        pthread_mutex_unlock(&mutex);
    }
};
There are other ways to find out dead locks, maybe this link might help http://yusufonlinux.blogspot.in/2010/11/debugging-core-using-gdb.html.
Please read the link below. It has a generic solution for finding the lock owner, and it works even if the lock is inside a library and you don't have the source code.
https://en.wikibooks.org/wiki/Linux_Applications_Debugging_Techniques/Deadlocks
I am trying to implement a single-writer, multiple-reader queue in pthreads. The synchronization pattern works but eventually deadlocks after repeated requests (I believe). It works indefinitely with one writer boss thread and one reader worker thread, but with one writer boss thread and multiple reader worker threads it eventually hangs. When I backtrace in gdb, I see this:
// Boss:
Thread 1 (Thread 0x7ffff7fd1780 (LWP 21029)):
#0 0x00007ffff7bc44b0 in futex_wait
...
// Worker:
Thread 2 (Thread 0x7ffff42ff700 (LWP 21033)):
#0 0x00007ffff7bc39f3 in futex_wait_cancelable
...
// Worker:
Thread 3 (Thread 0x7ffff3afe700 (LWP 21034)):
#0 0x00007ffff7bc39f3 in futex_wait_cancelable
...
To me this seems like the workers are waiting on the signal, and the boss is hanging on the signal and not sending it. But, I don't know why that would happen.
I have tried this synchronization pattern:
// Boss:
pthread_mutex_lock(&queue_mutex);
queue_push(&queue, data);
pthread_cond_signal(&queue_condition);
pthread_mutex_unlock(&queue_mutex);
return;
// Worker(s):
pthread_mutex_lock(&queue_mutex);
while((queue_isempty(&queue)) > 0) {
pthread_cond_wait(&queue_condition, &queue_mutex);
}
data_t *data = queue_pop(&queue);
pthread_mutex_unlock(&queue_mutex);
do_work(data);
To the best of my knowledge, this is the correct synchronization pattern. But, evidence suggests I am not applying the correct pattern. Could someone help me understand why this single-writer, multiple-reader queue access in pthreads would not work as I intend?
Here is my best guess based on the code shown. Note that pthread_cond_wait() atomically releases the mutex while a worker is blocked and re-acquires it before returning, so waiting workers do not keep the boss from taking the lock, and the pattern in the question is the standard one. If it still hangs, the cause may lie in code that is not shown (for example queue_push(), queue_pop() or queue_isempty()). Signalling after unlocking, as below, is a permitted variant, but each worker must still hold the mutex while it checks the queue and calls pthread_cond_wait():
// Boss:
pthread_mutex_lock(&queue_mutex);
queue_push(&queue, data);
pthread_mutex_unlock(&queue_mutex);
pthread_cond_signal(&queue_condition);
return;
// Worker(s):
pthread_mutex_lock(&queue_mutex);
while((queue_isempty(&queue)) > 0) { // assume queue_isempty(const void*);
    pthread_cond_wait(&queue_condition, &queue_mutex);
}
data_t *data = queue_pop(&queue);
pthread_mutex_unlock(&queue_mutex);
do_work(data);
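For reference, here is a self-contained sketch of a single-writer, multiple-reader queue in which every worker holds the mutex around both the emptiness check and pthread_cond_wait(). The fixed-size ring buffer, the `STOP` sentinel and the names `boss_push`, `worker` and `run_queue_demo` are assumptions made for the example, not the asker's code.

```c
#include <assert.h>
#include <pthread.h>

#define QCAP 16      /* hypothetical fixed capacity */
#define STOP (-1)    /* hypothetical sentinel telling a worker to exit */

static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static int items[QCAP];
static int head, count;
static long total;   /* sum of processed items, guarded by queue_mutex */

static void boss_push(int v)
{
    pthread_mutex_lock(&queue_mutex);
    while (count == QCAP)                       /* wait for room */
        pthread_cond_wait(&not_full, &queue_mutex);
    items[(head + count) % QCAP] = v;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&queue_mutex);
}

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&queue_mutex);
        while (count == 0)                      /* re-check after every wakeup */
            pthread_cond_wait(&not_empty, &queue_mutex);
        int v = items[head];
        head = (head + 1) % QCAP;
        count--;
        pthread_cond_signal(&not_full);
        if (v != STOP)
            total += v;                         /* stand-in for do_work() */
        pthread_mutex_unlock(&queue_mutex);
        if (v == STOP)
            return NULL;
    }
}

/* Push 1..n through nworkers workers; returns the sum they processed. */
static long run_queue_demo(int nworkers, int n)
{
    pthread_t tid[8];
    assert(nworkers >= 1 && nworkers <= 8);
    head = count = 0;
    total = 0;
    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int v = 1; v <= n; v++)
        boss_push(v);
    for (int i = 0; i < nworkers; i++)
        boss_push(STOP);                        /* one sentinel per worker */
    for (int i = 0; i < nworkers; i++)
        pthread_join(tid[i], NULL);
    return total;
}
```

In a real program the work would normally run after the mutex is released; it is kept under the lock here only so that the sum is easy to check.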
I have read this and this post on Stack Overflow, but neither of them gives me what I want.
In my case, I want to create a thread, launch it, and let it run without blocking for as long as the main process runs. This thread has no communication and no synchronization with the main process; it does its job fully independently.
Consider this code:
#define DAY_PERIOD 86400 /* 3600*24 seconds */
int main() {
char wDir[255] = "/path/to/log/files";
compress_logfiles(wDir);
// do other things, this things let the main process runs all the time.
// just segmentation fault, stackoverflow, memory overwrite or
// somethings like that stop it.
return 0;
}
/* Create and launch thread */
void compress_logfiles(char *wDir)
{
pthread_t compressfiles_th;
if (pthread_create(&compressfiles_th, NULL, compress, wDir))
{
fprintf(stderr, "Error create compressfiles thread\n");
return;
}
if (pthread_join(compressfiles_th, NULL))
{
//fprintf(stderr, "Error joining thread\n");
return;
}
return;
}
void *compress(void *wDir)
{
while(1)
{
// Do job to compress files
// and sleep for one day
sleep(DAY_PERIOD); /* sleep one day*/
}
return NULL;
}
With pthread_join in the compress_logfiles function, the thread compresses all files successfully but never returns because it is in an infinite while loop, so the main process stays blocked forever. If I remove pthread_join from compress_logfiles, the main process is not blocked because it doesn't wait for the thread to return, but the thread compresses one file and exits (there are a lot of files, around one hundred).
So, is there a way to let the main process launch the compressfiles_th thread and let it do its job without waiting for it to finish or exit?
I found pthread_tryjoin_np and pthread_timedjoin_np in the Linux Programmer's Manual. It seems that pthread_tryjoin_np does the job if I don't care about the returned value. Is it a good idea to use it?
Thank you.
Edit 1:
Please note that the main process is daemonized after the call to compress_logfiles(wDir). Perhaps the problem is that daemonizing kills the main process and relaunches it?
Edit 2: the solution
Credit to dbush
Yes, fork causes the problem, and pthread_atfork() solves it. I made this change to run the compressfiles_th without blocking main process:
#define DAY_PERIOD 86400 /* 3600*24 seconds */
char wDir[255] = "/path/to/log/files"; // global now
// function added
void child_handler(){
compress_logfiles(wDir); // wDir is global variable now
}
int main()
{
pthread_atfork(NULL, NULL, child_handler);
// Daemonize the process.
becomeDaemon(BD_NO_CHDIR & BD_NO_CLOSE_FILES & BD_NO_REOPEN_STD_FDS & BD_NO_UMASK0 & BD_MAX_CLOSE);
// do other things, this things let the main process runs all the time.
// just segmentation fault, stackoverflow, memory overwrite or
// somethings like that stop it.
return 0;
}
The child_handler() function is called in the child process after fork(); see pthread_atfork.
When you fork a new process, only the calling thread is duplicated, not all threads.
If you wish to daemonize, you need to fork first, then create your threads.
From the man page for fork:
The child process is created with a single thread--the one that
called fork(). The entire virtual address space of the parent is
replicated in the child, including the states of mutexes, condition
variables, and other pthreads objects; the use of pthread_atfork(3)
may be helpful for dealing with problems that this can cause.
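A minimal sketch of the "fork first, then create your threads" order follows. The pipe is there only so that the parent can observe that the thread really ran in the child; `background_job` and `fork_then_thread` are hypothetical names, and a real daemon would carry on with its main loop where this child calls pthread_join().

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the periodic compression job: report back once. */
static void *background_job(void *arg)
{
    assert(arg != NULL);
    ssize_t n = write(*(int *)arg, "T", 1);
    (void)n;
    return NULL;
}

/* Fork first, then start the thread inside the child.
 * Returns the byte sent by the child's thread, or -1 on error. */
static int fork_then_thread(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                     /* child: the future daemon */
        pthread_t th;
        close(fds[0]);
        if (pthread_create(&th, NULL, background_job, &fds[1]) != 0)
            _exit(1);
        pthread_join(th, NULL);         /* a real daemon would keep running here */
        _exit(0);
    }
    close(fds[1]);                      /* parent: wait for the child's report */
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return (n == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? c : -1;
}
```

A thread created before the fork() would exist only in the parent, which is why the job appeared to stop once the process was daemonized.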
first of all I'd like to apologize for the confusing title. But here's my question:
I have a main function which spawns another thread. That thread only works from time to time, with "sleep(3)" in between.
Inside main.c, I have a while loop which runs forever, so to cancel the program I have to press Ctrl+C. To catch that, I added a signal handler at the beginning of the main function:
signal(SIGINT, quitProgram);
This is my quitProgram function:
void quitProgram() {
printf("CTRL + C received. Quitting.\n");
running = 0;
return;
}
So when running == 0, the loop is left.
It all seems to work, at least until the thread mentioned started. When I hit Ctrl+C after the thread has started, I get a strange error message:
*** longjmp causes uninitialized stack frame ***: ./cluster_control terminated
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x68e4e)[0xb7407e4e]
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x6b)[0xb749a85b]
/lib/i386-linux-gnu/libc.so.6(+0xfb70a)[0xb749a70a]
/lib/i386-linux-gnu/libc.so.6(__longjmp_chk+0x42)[0xb749a672]
./cluster_control[0x8058427]
[0xb76e2404]
[0xb76e2428]
/lib/i386-linux-gnu/libc.so.6(nanosleep+0x46)[0xb7454826]
/lib/i386-linux-gnu/libc.so.6(sleep+0xcd)[0xb74545cd]
./cluster_control[0x804c0e6]
./cluster_control[0x804ae61]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb73b8a83]
./cluster_control[0x804a331]
When I try to debug it using gdb I get the following:
(gdb) where
#0 0xb7fdd428 in __kernel_vsyscall ()
#1 0xb7d4f826 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#2 0xb7d4f5cd in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#3 0x0804c0e6 in master_main (mastBroad_sock=3, workReady_ptr=0xbffff084) at master.c:150
#4 0x0804ae61 in main () at main.c:84
The line 150 in master.c is this:
sleep(PING_PERIOD);
So my guess at what's happening: the main thread is exiting while the master_main thread is sleeping, and this is causing the error. But how can I fix this? Is there a better way to let the master_main thread run every few seconds, or to prevent the main thread from exiting while master_main is still asleep?
I tried to use a mutex, but it didn't work (I locked the mutex before master_main sleeps and unlocked it afterwards, and the exiting main thread needed that mutex to exit).
Additionally, I passed a pointer with a state from main to master_main, so that I could set the state of master_main to "exit" before leaving the main function, but that didn't work either.
So are there any other ideas? I'm running linux and programming language is C99.
Update 1
Sorry guys, I think I gave you wrong information. The method which causes trouble isn't even inside a thread. Here's an excerpt from my main method:
int main() {
[...]
signal(SIGINT, quitProgram);
while (running)
{
// if system is the current master
if (master_i)
{
master_main(mastBroad_sock, &workReady_condMtx);
pthread_mutex_lock(&(timeCount_mtx_sct.mtx));
master_i = 0;
pthread_mutex_unlock(&(timeCount_mtx_sct.mtx));
}
[...]
}
return 0;
}
And also an excerpt from the master_main which I guess is the problem.
int master_main(int mastBroad_sock, struct cond_mtx *workReady_ptr) {
[...]
while (master_i)
{
// do something
sleep(5); // to perform this loop only every 5 seconds, this is line 150 in master.c
}
}
Update 2
Forgot to add the code which catches Ctrl+C inside the main.c:
void quitProgram() {
printf("CTRL + C received. Quitting.\n");
running = 0;
return;
}
The simplest solution that comes to mind is to have a global flag that tells the thread that the program is shutting down. When the main function wants to shut down, it sets the flag and then waits for the thread to terminate.
See Joining and Detaching Threads. Depending on what the thread is doing, you might also want to take a look at Condition Variables.
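The flag-and-join idea might look like the sketch below. It assumes C11 atomics are available; `shutting_down`, `periodic_worker` and `run_then_shutdown` are illustrative names, and the 1 ms nap stands in for the much longer sleep in the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int shutting_down;   /* set once by the main thread */
static atomic_ulong rounds;        /* how many times the worker ran */

static const struct timespec one_ms = { 0, 1000000 };

static void *periodic_worker(void *unused)
{
    (void)unused;
    while (!atomic_load(&shutting_down)) {
        atomic_fetch_add(&rounds, 1);   /* "do something" */
        nanosleep(&one_ms, NULL);       /* stand-in for sleep(5) */
    }
    return NULL;
}

/* Start the worker, let it run at least once, then shut down cleanly.
 * Returns the number of rounds the worker completed. */
static unsigned long run_then_shutdown(void)
{
    pthread_t th;
    int rc = pthread_create(&th, NULL, periodic_worker, NULL);
    assert(rc == 0);
    (void)rc;
    while (atomic_load(&rounds) == 0)
        nanosleep(&one_ms, NULL);
    atomic_store(&shutting_down, 1);    /* ask the thread to stop */
    pthread_join(th, NULL);             /* and wait until it has */
    return atomic_load(&rounds);
}
```

With a long sleep the join can take up to one full period; waiting on a condition variable with a timeout instead of sleeping lets the thread react at once.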
I'm writing a code in which I have two threads running in parallel.
1st is the main thread which started the 2nd thread.
2nd thread is just a simple thread executing empty while loop.
Now I want to pause / suspend the execution of 2nd thread by 1st thread who created it.
And after some time I want to resume the execution of 2nd thread (by issuing some command or function) from where it was paused / suspended.
This question is not about how to use mutexes, but how to suspend a thread.
Some Unix systems provide a non-portable thread function called pthread_suspend_np, and another called pthread_resume_np, but they are not part of POSIX, and Linux (glibc) does not implement them.
So to understand it, the functions simply are not there. There are workarounds but unfortunately it is just not the same as calling SuspendThread on windows. You have to do all kinds of non-portable stuff to make a thread stop and start using signals.
Stopping and resuming threads is vital for debuggers and garbage collectors. For example, I have seen a version of Wine which is not able to properly implement the "SuspendThread" function. Thus any windows program using it will not work properly.
I thought that it was possible to do it properly using signals, based on the fact that the JVM uses this signal technique for its garbage collector, but I have also just seen some articles online where people are noticing deadlocks and so on with the JVM, sometimes unreproducible.
So to come around to answer the question, you cannot properly suspend and resume threads with Unix unless you have a nice Unix that implements pthread_suspend_np. Otherwise you are stuck with signals.
The big problem with signals is when you have about five different libraries all linked into the same program and all trying to use the same signals at the same time. For this reason I believe that you cannot actually use something like Valgrind and, for example, the Boehm GC in one program, at least without major coding at the very lowest levels of userspace.
Another answer to this question could be: do what Linus Torvalds does to NVidia, flip the finger at him and get him to implement the two most critical parts missing from Linux. First, pthread_suspend, and second, a dirty bit on memory pages so that proper garbage collectors can be implemented. Start a large petition online and keep flipping that finger. Maybe by the time Windows 20 comes out, they will realise that suspending and resuming threads, and having dirty bits, is actually one of the fundamental reasons Windows and Mac are better than Linux, or any Unix that does not implement pthread_suspend and also a dirty bit on virtual pages, like VirtualAlloc does in Windows.
I do not live in hope. Actually for me I spent a number of years planning my future around building stuff for Linux but have abandoned hope as a reliable thing all seems to hinge on the availability of a dirty bit for virtual memory, and for suspending threads cleanly.
As far as I know you can't really just pause some other thread using pthreads. You have to have something in your 2nd thread that checks for times it should be paused using something like a condition variable. This is the standard way to do this sort of thing.
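A minimal sketch of such a cooperative pause point, built on a mutex and a condition variable, is shown here. The `struct pausable` type and the function names are invented for the example, and the 1 ms nap stands in for real work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct pausable {
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    bool pause_requested;     /* set by the controlling thread */
    bool parked;              /* set by the worker once it has stopped */
    bool stop;
    unsigned long work_done;
};

static void *pausable_worker(void *arg)
{
    struct pausable *p = arg;
    const struct timespec nap = { 0, 1000000 };      /* 1 ms of pretend work */
    assert(p != NULL);
    pthread_mutex_lock(&p->mtx);
    while (!p->stop) {
        if (p->pause_requested) {
            p->parked = true;                        /* tell the controller */
            pthread_cond_broadcast(&p->cv);
            pthread_cond_wait(&p->cv, &p->mtx);      /* releases mtx while asleep */
            continue;                                /* re-check flags on wakeup */
        }
        p->parked = false;
        p->work_done++;
        pthread_mutex_unlock(&p->mtx);
        nanosleep(&nap, NULL);                       /* work happens outside the lock */
        pthread_mutex_lock(&p->mtx);
    }
    pthread_mutex_unlock(&p->mtx);
    return NULL;
}

/* Pause or resume; when pausing, return only after the worker has parked. */
static void pausable_set_paused(struct pausable *p, bool paused)
{
    pthread_mutex_lock(&p->mtx);
    p->pause_requested = paused;
    pthread_cond_broadcast(&p->cv);
    while (paused && !p->parked)
        pthread_cond_wait(&p->cv, &p->mtx);
    pthread_mutex_unlock(&p->mtx);
}

static void pausable_stop(struct pausable *p)
{
    pthread_mutex_lock(&p->mtx);
    p->stop = true;
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->mtx);
}

static unsigned long pausable_work_done(struct pausable *p)
{
    pthread_mutex_lock(&p->mtx);
    unsigned long n = p->work_done;
    pthread_mutex_unlock(&p->mtx);
    return n;
}
```

The worker only stops at the point where it checks the flag, so the pause takes effect between units of work rather than at an arbitrary instruction. That is the main difference from a true SuspendThread-style call.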
I tried suspending and resuming thread using signals, here is my solution. Please compile and link with -pthread.
Signal SIGUSR1 suspends the thread by calling pause() and SIGUSR2 resumes the thread.
From the man page of pause:
pause() causes the calling process (or thread) to sleep until a
signal is delivered that either terminates the process or causes the
invocation of a signal-catching function.
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <signal.h>
// Only two threads here, so two flags are enough;
// an array of flags would scale better to `n` threads.
// volatile so that the busy-wait in main() re-reads them on every pass.
static volatile int is_th1_ready = 0;
static volatile int is_th2_ready = 0;
static void cb_sig(int signal)
{
switch(signal) {
case SIGUSR1:
pause();
break;
case SIGUSR2:
break;
}
}
static void *thread_job(void *t_id)
{
int i = 0;
struct sigaction act;
pthread_detach(pthread_self());
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
act.sa_handler = cb_sig;
if (sigaction(SIGUSR1, &act, NULL) == -1)
printf("unable to handle siguser1\n");
if (sigaction(SIGUSR2, &act, NULL) == -1)
printf("unable to handle siguser2\n");
if (t_id == (void *)1)
is_th1_ready = 1;
if (t_id == (void *)2)
is_th2_ready = 1;
while (1) {
printf("thread id: %p, counter: %d\n", t_id, i++);
sleep(1);
}
return NULL;
}
int main()
{
int terminate = 0;
int user_input;
pthread_t thread1, thread2;
pthread_create(&thread1, NULL, thread_job, (void *)1);
// Spawned thread2 just to make sure it isn't suspended/paused
// when thread1 received SIGUSR1/SIGUSR2 signal
pthread_create(&thread2, NULL, thread_job, (void *)2);
while (!is_th1_ready || !is_th2_ready); // wait until both threads have installed their handlers
while (!terminate) {
// to test, I am sending signals depending on input from STDIN
printf("0: pause thread1, 1: resume thread1, -1: exit\n");
scanf("%d", &user_input);
switch(user_input) {
case -1:
printf("terminating\n");
terminate = 1;
break;
case 0:
printf("raising SIGUSR1 to thread1\n");
pthread_kill(thread1, SIGUSR1);
break;
case 1:
printf("raising SIGUSR2 to thread1\n");
pthread_kill(thread1, SIGUSR2);
break;
}
}
// Returning from main() ends the process and both detached threads with it.
// (pthread_kill() with SIGKILL would kill the whole process, not a single thread.)
return 0;
}
There is no pthread_suspend(), pthread_resume() kind of APIs in POSIX.
Mostly condition variables can be used to control the execution of other threads.
The condition variable mechanism allows threads to suspend execution
and relinquish the processor until some condition is true. A condition
variable must always be associated with a mutex to avoid a race
condition created by one thread preparing to wait and another thread
which may signal the condition before the first thread actually waits
on it resulting in a deadlock.
For more info
Pthreads
Linux Tutorial Posix Threads
If you can use processes instead, you can send job control signals (SIGSTOP / SIGCONT) to the second process. If you still want to share the memory between those processes, you can use SysV shared memory (shmop, shmget, shmctl...).
Even though I haven't tried it myself, it might be possible to use the lower-level clone() syscall to spawn threads that don't share signals. With that, you might be able to send SIGSTOP and SIGCONT to the other thread.
For implementing the pause on a thread, you need to make it wait for some event to happen. Waiting on a spin-lock mutex is CPU cycle wasting. IMHO, this method should not be followed as the CPU cycles could have been used up by other processes/threads.
Wait on a non-blocking descriptor (pipe, socket or some other). Example code for using pipes for inter-thread communication can be seen here
Above solution is useful, if your second thread has more information from multiple sources than just the pause and resume signals. A top-level select/poll/epoll can be used on non-blocking descriptors. You can specify the wait time for select/poll/epoll system calls, and only that much micro-seconds worth of CPU cycles will be wasted.
I mention this solution with forward-thinking that your second thread will have more things or events to handle than just getting paused and resumed. Sorry if it is more detailed than what you asked.
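A small sketch of the descriptor-based wait: the second thread would call something like the hypothetical `wait_for_command()` below at the top of its loop, and the main thread would write one-byte commands such as 'p' (pause) or 'r' (resume) into the pipe. The command letters are an assumption for the example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for a one-byte command on fd.
 * Returns the command byte (commands must be nonzero), 0 on timeout,
 * or -1 on error or when the writing end has been closed. */
static int wait_for_command(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0)
        return 0;                       /* nothing arrived in time */
    if (rc < 0)
        return -1;                      /* includes EINTR; callers may retry */
    unsigned char cmd;
    return read(fd, &cmd, 1) == 1 ? cmd : -1;
}
```

A timeout of -1 blocks without consuming CPU until a command arrives, which suits the paused state. A short timeout lets the thread check for commands between units of work.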
Another, simpler approach is to have a boolean variable shared between these threads.
The main thread is the writer of the variable: 0 signifies stop, 1 signifies resume.
The second thread only reads the value of the variable. To implement the '0' state, use usleep for some microseconds and then check the value again, assuming a delay of a few microseconds is acceptable in your design.
To implement '1', check the value of the variable after doing a certain number of operations.
Otherwise, you can also implement a signal for moving from the '1' to the '0' state.
You can use mutex to do that, pseudo code would be:
while (1) {
    /* pause/resume gate */
    pthread_mutex_lock(&my_lock);   /* if this is locked by thread1, thread2 will wait until thread1 */
                                    /* unlocks it */
    pthread_mutex_unlock(&my_lock); /* unlock so that thread1 can lock it again to pause thread2 */
    /* do actual work here */
}
You can suspend a thread simply by signal
// needs <pthread.h>, <signal.h>, <string.h> and <unistd.h>
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static void thread_control_handler(int n, siginfo_t* siginfo, void* sigcontext) {
    // Block here until thread_suspend() releases the mutex.
    // Caution: pthread_mutex_lock() is not async-signal-safe, so treat this as a debugging hack.
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
}
// suspend a thread for `time` seconds
void thread_suspend(pthread_t tid, int time) {
    struct sigaction act;
    struct sigaction oact;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = thread_control_handler;
    act.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&act.sa_mask);
    if (!sigaction(SIGURG, &act, &oact)) {
        pthread_mutex_lock(&mutex);
        pthread_kill(tid, SIGURG);   // deliver to that thread only; kill() targets the whole process
        sleep(time);
        pthread_mutex_unlock(&mutex);
    }
}
Not sure if you will like my answer or not. But you can achieve it this way.
If it is a separate process instead of a thread, I have a solution (This might even work for thread, maybe someone can share your thoughts) using signals.
There is no system currently in place to pause or resume the execution of the processes. But surely you can build one.
Steps I would do if I want it in my project:
1. Register a signal handler for the second process.
2. Inside the signal handler, wait for a semaphore.
3. Whenever you want to pause the other process, just send in a signal that you registered the other process with. The program will go into sleep state.
4. When you want to resume the process, you can send a different signal again. Inside that signal handler, you will check if the semaphore is locked or not. If it is locked, you will release the semaphore. So the process 2 will continue its execution.
If you can implement this, please do share your feedback on whether it worked for you or not. Thanks.
gcc 4.4.3 c89 pthreads
I use valgrind for checking memory errors.
I am just wondering if there is any tool for linux that can detect running threads that haven't been terminated after the program finishes.
I am running a multi-thread application and need a tool to make sure all threads have finished.
Many thanks for any suggestions,
If the program has terminated (because the initial thread returned from main(), some thread called exit(), or a fatal signal was received by the process) then you are guaranteed that all threads have been terminated with extreme prejudice.
If you want to write your program so that it ensures that all its threads have exited before main() exits, then you need to loop over all your threads at the end of main(), calling pthread_join() on each one. (This also means that you shouldn't create your threads detached, or detach them).
A Tool Approach
You can use Valgrind to help with this (via its Helgrind tool), but it requires minor modification of the code. For each thread, you make the thread lock a unique mutex when the thread is created, and release the mutex when the thread exits. Then, when run under Helgrind, you will get a warning if the thread hasn't exited when the program terminates because the thread will still be holding the lock to the mutex. Consider this example thread start routine:
void * thread_start (void *arg)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&mutex);
    // ...
    // Here the thread does whatever it normally does
    // ...
    // Unlock the mutex before exiting
    pthread_mutex_unlock(&mutex);
    return NULL;
}
Simply run the program using Valgrind's Helgrind tool like so:
$ valgrind --tool=helgrind ./<program-name>
If the thread didn't exit when the program terminated, then Helgrind produces a warning like this:
==2203== Thread #2 was created
==2203== at 0x31C96D3CDE: clone (in /lib64/libc-2.5.so)
==2203== by 0x31CA206D87: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
==2203== by 0x4A0B206: pthread_create_WRK (hg_intercepts.c:229)
==2203== by 0x4A0B2AD: pthread_create@* (hg_intercepts.c:256)
==2203== by 0x40060A: main (main.c:26)
==2203==
==2203== Thread #2: Exiting thread still holds 1 lock
==2203== at 0x4005DD: thread_start (main.c:13)
==2203== by 0x4A0B330: mythread_wrapper (hg_intercepts.c:201)
==2203== by 0x31CA20673C: start_thread (in /lib64/libpthread-2.5.so)
==2203== by 0x31C96D3D1C: clone (in /lib64/libc-2.5.so)
You will get false positives using this method if you don't add the mutex unlock code anywhere the thread may exit (e.g. using pthread_exit), but fixing such a false-positive is easy once it is identified.
An Alternative Approach (Recommended)
Having said all of the above, that's probably not the approach I myself would take. Instead, I would write the program such that it cannot terminate until all threads have exited. The simplest way to achieve this is to call pthread_exit from the main thread before returning from main. Doing so will mean that the process will stay alive so long as any other thread is still running.
If you take this approach, and the process doesn't quit when you expect it to, then you know that a thread is still running. You can then attach a debugger to the process to determine which threads are still running and what they are doing.
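The effect of calling pthread_exit from the initial thread can be observed with the sketch below. It is wrapped in fork() purely so that a parent can watch the child process stay alive after its initial thread has gone. In a real program the pthread_exit(NULL) call would simply replace the return at the end of main(). The names `late_writer` and `outlives_initial_thread` are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Runs after the initial thread has already called pthread_exit(). */
static void *late_writer(void *arg)
{
    const struct timespec nap = { 0, 50000000 };    /* 50 ms */
    int fd = (int)(intptr_t)arg;
    assert(fd >= 0);
    nanosleep(&nap, NULL);
    ssize_t n = write(fd, "L", 1);
    (void)n;
    return NULL;    /* last thread ends: the process exits with status 0 */
}

/* Returns the byte written by the surviving thread, or -1 on error. */
static int outlives_initial_thread(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    fflush(NULL);                   /* avoid duplicated stdio output in the child */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child plays the role of the program */
        pthread_t th;
        close(fds[0]);
        if (pthread_create(&th, NULL, late_writer,
                           (void *)(intptr_t)fds[1]) != 0)
            _exit(1);
        pthread_exit(NULL);         /* what main() would do instead of return */
    }
    close(fds[1]);
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return (n == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? c : -1;
}
```

The file descriptor is passed by value because the initial thread's local variables must not be used after that thread has exited.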
If you plan to use Boost.Threads library, then you can use the .join() method.
For example:
#include <boost/thread/thread.hpp>
#include <iostream>
void hello()
{
std::cout <<
"Hello world, I'm a thread!"
<< std::endl;
}
int main(int argc, char* argv[])
{
boost::thread thrd(&hello);
thrd.join();
return 0;
}
There is a simple trick in this similar question: Multiple threads in C program
If you call pthread_exit from main, your process will not exit until all the other threads complete.
Original answer was updated to address pthread_exit() scenario.
Assuming you want to tell whether all threads were pthread_join()-ed properly before you return from main(), there are a few ways:
Run it under gdb and break on the last line of main(), then look at the output of the "info threads" command. Only the main thread should remain.
Make a shared library that overrides pthread_create with a wrapper that keeps a count of how many threads are running. The thread wrapper increments the counter and calls the actual thread function, and a destructor registered with pthread_key_create() decrements it when a thread returns or exits. The library destructor checks whether the counter is zero, which means that all of them were terminated. Use it with your executable with LD_PRELOAD=checker.so ./your_executable (no code modification necessary).
Tested on Debian 5.0.5.
checker.c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <dlfcn.h>
#include <stdlib.h>
/* thread-local storage key */
static pthread_key_t tls_key = 0;
static int counter = 0;
static pthread_mutex_t g_mutex;
/* TLS destructor prototype */
void on_thread_end(void*);
void __attribute__ ((constructor))
init_checker()
{
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutex_init(&g_mutex, &attr);
pthread_mutexattr_destroy(&attr);
pthread_key_create(&tls_key, &on_thread_end);
}
void __attribute__ ((destructor))
finalize_checker()
{
int remain;
pthread_mutex_lock(&g_mutex);
remain = counter;
pthread_mutex_unlock(&g_mutex);
pthread_mutex_destroy(&g_mutex);
if (remain)
fprintf(stderr, "Warning: %d threads not terminated\n", remain);
pthread_key_delete(tls_key);
}
/* thread function signature */
typedef void* (*ThreadFn)(void*);
struct wrapper_arg
{
ThreadFn fn;
void* arg;
};
/* TLS destructor: called for every thread we created
when it exits */
void
on_thread_end(void *arg)
{
free(arg);
pthread_mutex_lock(&g_mutex);
--counter;
pthread_mutex_unlock(&g_mutex);
}
static void*
thread_wrapper(void *arg)
{
void *ret;
struct wrapper_arg *warg;
warg = (struct wrapper_arg*)arg;
/* Thread started, increment count. */
pthread_mutex_lock(&g_mutex);
++counter;
pthread_mutex_unlock(&g_mutex);
/* set thread-specific data to avoid leaks
* when thread exits
*/
pthread_setspecific(tls_key, arg);
/* Run the actual function. */
ret = (*warg->fn)(warg->arg);
/* Thread finishes, TLS destructor will be called. */
return ret;
}
/* pthread_create signature */
typedef int (*CreateFn)(pthread_t*,const pthread_attr_t*,ThreadFn,void*);
/* Overriding pthread_create */
int
pthread_create(
pthread_t *thread,
const pthread_attr_t *attr,
ThreadFn start_routine,
void *arg)
{
CreateFn libc_pthread_create;
struct wrapper_arg *warg;
/* Get a handle to the real function. */
libc_pthread_create
= (CreateFn)dlsym(RTLD_NEXT, "pthread_create");
if (!libc_pthread_create)
return -1;
/* Wrap user function. */
warg = malloc(sizeof(struct wrapper_arg));
if (!warg)
return -1;
warg->fn = start_routine;
warg->arg = arg;
/* Create a thread with a wrapper. */
return libc_pthread_create(thread, attr, &thread_wrapper, warg);
}
Makefile
CFLAGS+=-fpic -O3
checker.so: checker.o
	gcc -shared -Wl,-soname,$@ -o $@ $^ -ldl -lpthread
Correct me if wrong, but a program is not finished until all running threads have ended.
You don't need any external tool for this: i would track the threads by using a simple semaphore instead.
1) set it up so that its initial count is the same as the number of your threads:
sem_init( &semThreadCount, 0, threadCount );
2) Modify your threads to "notify" they are exiting gracefully:
sem_wait( &semThreadCount );
3) You can either quit whenever the threads are finished or when the semaphore is 0, or just print the remaining semaphore value and quit, that will be the number of still-running threads:
int v;
sem_getvalue( &semThreadCount, &v );
This way you can both ensure no threads are still running if your exit or, with some logging, being able to know which ones are still running after you quit.
Remember to sem_destroy the semaphore as well.
If you cannot use C++, and therefore KMan's answer, you can also join pthreads using the "C" API, as long as they were not created detached. (Joining means waiting for a thread to finish its work.)
See the pthread tutorial.
The existence of the process, that is, whether any thread is still running, can be checked with waitpid.
If you just want your process to continue with all the threads, but you don't need the one of main anymore you can end that thread by pthread_exit. Other than an explicit exit or a simple return this will not terminate your other threads.
Such tools already exist. On Linux you can use ps or top. On Windows, good ole Task Manager does the job. Just check whether your process still exists:
if the process still exists, it means that one or more threads in it are running.
if there are no more threads running, the process is terminated.
If they're threads (rather than processes) then you just need to check whether your process is still running, because threads run inside a process.
You can check if a process is running with ps -ef then pipe the result into grep to find your specific process.
If you want an external means to observe the threads in execution for your process, on Linux you can look in /proc/(pid)/task. That's the method tools like ps(1) or top(1) use.
See http://linux.die.net/man/5/proc
You're missing out on the important part:
A program cannot exit unless all its threads are terminated.
What you should do, however, is pthread_join() on all the threads before exiting. This ensures that all threads terminated, and will allow you to free() all their respective pthread_ts, so that you do not leak memory from them.
Having said that, valgrind can give you a comprehensive view of threads you haven't cleaned up after. Run it with --leak-check=full and make sure you are not leaving various structs behind you. Those will indicate there is a thread you haven't terminated properly.