pthread_join function crash


pthread_join crashes when main calls it after the child thread has already terminated. This is the backtrace from gdb:
Core was generated by `./bin/test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xb76fb530 in __call_tls_dtors@plt () from /lib/libpthread.so.0
(gdb) bt
#0 0xb76fb530 in __call_tls_dtors@plt () from /lib/libpthread.so.0
#1 0xb76fdd5a in start_thread (arg=0xb40fab40) at pthread_create.c:319
#2 0xb762f74e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129
The thread start function receives a NULL argument and returns NULL. I am clueless as to why I see this crash consistently.
Could somebody help me find what is wrong in the child thread's start function? I am using Fedora 20 and gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC).
A skeleton of the child start function is below:
void* testControl(void* param)
{
...................
return NULL;
}
As my code is huge, here is a snippet that shows how I create the child threads, how they exit, and how they are joined.
unsigned long int rcThId1;
unsigned long int rcThId2;
unsigned long int rcThId3;
unsigned long int rcThId4;
unsigned long int rcThId5;
unsigned long int rcThId6;
void* rcControl1(void* arg)
{
bool th_loop = true;
while(th_loop)
{
/*Listen and receive the message on message queue*/
...........
..........
switch(message_type)
{
............
............
case EXIT:
th_loop = false;
break;
default:
break;
}
}
return NULL;
}
/*Start functions for rcControl2 rcControl3 rcControl4
rcControl5 rcControl6 are similar to the definition of rcControl1*/
int main(void)
{
pthread_create(&rcThId1,NULL,rcControl1,NULL);
pthread_create(&rcThId2,NULL,rcControl2,NULL);
pthread_create(&rcThId3,NULL,rcControl3,NULL);
pthread_create(&rcThId4,NULL,rcControl4,NULL);
pthread_create(&rcThId5,NULL,rcControl5,NULL);
pthread_create(&rcThId6,NULL,rcControl6,NULL);
..............
..............
/*Post EXIT event to Thread1*/
/*Post EXIT event to Thread2*/
/*Post EXIT event to Thread3*/
/*Post EXIT event to Thread4*/
/*Post EXIT event to Thread5*/
/*Post EXIT event to Thread6*/
/*By now all threads would have already exited */
pthread_join(rcThId1, NULL);/*Inside this function crash is happening*/
pthread_join(rcThId2, NULL);
pthread_join(rcThId3, NULL);
pthread_join(rcThId4, NULL);
pthread_join(rcThId5, NULL);
pthread_join(rcThId6, NULL);
return 0;
}
The crash happens inside the pthread_join(rcThId1, NULL); call.
Thanks

In the (pseudo)code you posted, the main issue is the type of the thread identifiers: they should all be of type pthread_t, but you have declared them as unsigned long int. The crash is most likely because pthread_join() attempts to read rcThId1 et al. as if they were pthread_t, which they are not.
Change the type of rcThId1 .. rcThId6 to pthread_t.
You should be getting some warnings. If not, compile with:
gcc -Wall -Wextra -pedantic-errors
Aside:
You probably don't need the thread IDs to be global. Move them inside main() unless you have a good reason not to.
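To illustrate the suggestion, here is a minimal sketch, not your actual code: worker() is a hypothetical stand-in for rcControl1 .. rcControl6, and run_workers() is a name I made up. It keeps the IDs in a local pthread_t array and joins every thread that was created.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_WORKERS 6  /* matches the six threads in the question */

/* Hypothetical stand-in for rcControl1..rcControl6: returns immediately. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Creates NUM_WORKERS threads, joins each one, and returns how many
 * were joined successfully. */
int run_workers(void)
{
    pthread_t ids[NUM_WORKERS];   /* pthread_t, not unsigned long int */
    int created = 0;
    int joined = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&ids[i], NULL, worker, NULL);
        if (rc != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            break;
        }
        created++;
    }

    /* Joining a thread that has already terminated is valid. */
    for (int i = 0; i < created; i++) {
        if (pthread_join(ids[i], NULL) == 0)
            joined++;
    }
    return joined;
}
```

Call run_workers() from main() and build with gcc -Wall -Wextra -pthread so the threading library is linked in. Checking the return value of pthread_create also tells you whether a thread was ever started before you try to join it.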


Core is getting generated when the alarm goes off

I'm trying to call func1 three times using SIGALRM, but the program crashes intermittently and a core is generated.
This is the sample code:
int func_incr = 0;
volatile sig_atomic_t valData=0;
void alarm_hdlr(int signum)
{
func_incr++;
valData = 1;
}
void func1()
{
signal(SIGALRM, alarm_hdlr);
alarm(300);
while(func_incr !=3)
{
if(valData) // while debugging using gdb, in this line SIGABRT signal is received (Thread 1 "func1" SIGABRT signal received)
{
valData=0;
func1();
}
}
}
int func1()
{
func2();
}
Can someone help me?
Thanks

Since you are using gdb and you are getting SIGABRT, you should be able to look at the backtrace to see what raised the SIGABRT.

catch Ctrl+C in C program [duplicate]

I use the following code to catch Ctrl+C in my C program:
void sig_handler(int signo)
{
if (signo == SIGINT)
exit(EXIT_SUCCESS);
}
void main ()
{
......
if(signal(SIGINT, sig_handler)== SIG_ERR)
{
printf(">>>>>>>>>>>>>>>>>>>>> SIG INT EROOR !!!! sigint=%d ID=%d \n",SIGINT, getpid());
}
else
printf(">>>>>>>>>>AFTER>>>>>>>>>>> SIG INT sigint=%d PID=%d \n",SIGINT, getpid());
char *buf = NULL;
asprintf(&buf, "%d", getpid());
write(fd, buf, strlen(buf));
free(buf);
uloop_run(); //entering main loop
ubus_exit();
uloop_done();
xml_exit();
config_exit();
free(tmp);
closelog();
log_message(NAME, L_NOTICE, "exiting\n");
return 0;
}
My purpose is to catch Ctrl+C, but it seems the signal handler function, i.e. sig_handler(), doesn't run.
How can I fix it?

As iharob answered, you should add the handler for the signal.
However, you should carefully read signal(7) and notice that it is not legal to call printf from inside a signal handler (since printf is not an async-signal-safe function). You should use write(2) instead of printf(3).
This restriction is significant and important. Don't forget that e.g. both printf and malloc could be interrupted at arbitrary moments, but they are not designed for that.
At the very least, call fflush(3) and/or end your printf format string with a \n; but that would still be undefined behavior (though you might be "unlucky" enough to have it do what you want most of the time).
BTW, it is recommended today to use sigaction(2) instead of the "obsolete" signal(2).
In practice, the recommended approach inside a signal handler is most of the time to set some volatile sig_atomic_t flag (to be tested outside the handler), or to call siglongjmp(3). If you insist on doing something else, be sure that you use (even indirectly) only async-signal-safe functions (and there are few of them, mostly the syscalls(2) ....). In particular, stdio(3) & malloc(3) should never be used from a signal handler (and that rules out most of the standard C functions, or most library functions).
You may want to have some event loop around poll(2) (then you might be interested in the Linux-specific signalfd(2)....); you should compile with all warnings and debug info (gcc -Wall -Wextra -g). Then use the gdb debugger (and also strace(1)) to debug your program.
Are you sure that the functions you are using (e.g. uloop_run, etc...) are not blocking or ignoring signals? You should strace your program to find out!
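As a rough sketch of the flag approach described above (the names got_sigint, on_sigint, install_sigint_handler and sigint_seen are mine, not from your program), the handler only sets a volatile sig_atomic_t flag and everything else happens outside it. This does not address whatever uloop_run does with signals internally.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction under strict -std= modes */
#endif
#include <signal.h>
#include <string.h>

/* Set by the handler, tested by the main loop. */
static volatile sig_atomic_t got_sigint = 0;

/* The handler does nothing except set the flag. */
static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/* Installs on_sigint for SIGINT; returns 0 on success, -1 on error. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/* Non-zero once SIGINT has been delivered. */
int sigint_seen(void)
{
    return got_sigint != 0;
}
```

The main loop would call sigint_seen() between units of work and do its cleanup (printf, free, exit and so on) from normal code rather than from the handler.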

You should add the handler to the signal with this function
sighandler_t signal(int signum, sighandler_t handler);
in your case
signal(SIGINT, sig_handler);
One more thing: your main function must return int, so void main() is wrong; it should be int main().
The uloop_run function, from OpenWrt, installs its own signal handler for SIGINT, which overrides yours, so your handler cannot interrupt it.
That is the actual reason why your signal handler is never called.
Your handler won't see the signal until the uloop_run function exits. This is the uloop_run source with the relevant part:
static void uloop_setup_signals(bool add)
{
struct sigaction s;
struct sigaction *act, *oldact;
memset(&s, 0, sizeof(struct sigaction));
if (add) {
s.sa_handler = uloop_handle_sigint;
s.sa_flags = 0;
act = &s;
oldact = &org_sighandler;
} else {
act = &org_sighandler;
oldact = NULL;
}
sigaction(SIGINT, act, oldact);
if (uloop_handle_sigchld) {
if (add) {
//act already points to s, so no need to update pointer
s.sa_handler = uloop_sigchld;
oldact = &org_sighandler_child;
} else {
act = &org_sighandler_child;
oldact = NULL;
}
sigaction(SIGCHLD, act, oldact);
}
}
void uloop_run(void)
{
struct timeval tv;
/*
* Handlers are only updated for the first call to uloop_run() (and restored
* when this call is done).
*/
if (!uloop_recursive_count++)
uloop_setup_signals(true);
while(!uloop_cancelled)
{
uloop_gettime(&tv);
uloop_gettime(&tv);
uloop_run_events(uloop_get_next_timeout(&tv));
}
if (!--uloop_recursive_count)
uloop_setup_signals(false);
}
As you can see, uloop_setup_signals(true); installs a new signal handler for SIGINT, and when the loop is finished uloop_setup_signals(false); is called, restoring the previous signal handler.
So, this is the reason.

How to detect program termination in C/Linux?

How can an application find out that it has just started terminating? Can I use a signal handler for that?

Use atexit(). It registers a function that is called when the program terminates normally.
Sample code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* for sleep() */
void funcall(void);
void fnExit1 (void)
{
printf ("Exit function \n");
}
int main ()
{
atexit (fnExit1);
printf ("Main function start\n");
funcall();
printf ("Main function end\n");
return 0;
}
void funcall(void)
{
sleep(2);
exit(0);
}
Output:
Main function start
Exit function

You could try int raise(int sig)
and handle SIGTERM when it is raised. Note that SIGKILL cannot be caught or handled.

You can also register a function to be called upon exit of a process. See man atexit

You can install a signal handler for SIGINT and SIGSEGV (SIGKILL cannot be caught, blocked, or ignored). In the signal handler you can take a stack dump so you can debug your application later. In the signal handler, set the disposition of the signals you caught back to the default.

Why realloc deadlock after clone syscall?

I have a problem that realloc() deadlocks sometime after clone() syscall.
My code is:
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/types.h>
#define CHILD_STACK_SIZE 4096*4
#define gettid() syscall(SYS_gettid)
#define log(str) fprintf(stderr, "[pid:%d tid:%d] "str, getpid(),gettid())
int clone_func(void *arg){
int *ptr=(int*)malloc(10);
int i;
for (i=1; i<200000; i++)
ptr = realloc(ptr, sizeof(int)*i);
free(ptr);
return 0;
}
int main(){
int flags = 0;
flags = CLONE_VM;
log("Program started.\n");
int *ptr=NULL;
ptr = malloc(16);
void *child_stack_start = malloc(CHILD_STACK_SIZE);
int ret = clone(clone_func, child_stack_start +CHILD_STACK_SIZE, flags, NULL, NULL, NULL, NULL);
int i;
for (i=1; i<200000; i++)
ptr = realloc(ptr, sizeof(int)*i);
free(ptr);
return 0;
}
The call stack in gdb is:
[pid:13268 tid:13268] Program started.
^Z[New LWP 13269]
Program received signal SIGTSTP, Stopped (user).
0x000000000040ba0e in __lll_lock_wait_private ()
(gdb) bt
#0 0x000000000040ba0e in __lll_lock_wait_private ()
#1 0x0000000000408630 in _L_lock_11249 ()
#2 0x000000000040797f in realloc ()
#3 0x0000000000400515 in main () at test-realloc.c:36
(gdb) i thr
2 LWP 13269 0x000000000040ba0e in __lll_lock_wait_private ()
* 1 LWP 13268 0x000000000040ba0e in __lll_lock_wait_private ()
(gdb) thr 2
[Switching to thread 2 (LWP 13269)]#0 0x000000000040ba0e in __lll_lock_wait_private ()
(gdb) bt
#0 0x000000000040ba0e in __lll_lock_wait_private ()
#1 0x0000000000408630 in _L_lock_11249 ()
#2 0x000000000040797f in realloc ()
#3 0x0000000000400413 in clone_func (arg=0x7fffffffe53c) at test-realloc.c:20
#4 0x000000000040b889 in clone ()
#5 0x0000000000000000 in ?? ()
My OS is Debian linux-2.6.32-5-amd64, with GNU C Library (Debian EGLIBC 2.11.3-4) stable release version 2.11.3. I strongly suspect that eglibc is the culprit for this bug.
Is the clone() syscall on its own not enough before using realloc()?

You cannot use clone with CLONE_VM yourself, or if you do, you have to at least make sure you refrain from invoking any function from the standard library after calling clone in either the parent or the child. In order for multiple threads or processes to share the same memory, the implementations of any functions which access shared resources (like the heap) need to
be aware of the fact that multiple flows of control are potentially accessing it, so they can arrange to perform the appropriate synchronization, and
be able to obtain information about their own identities via the thread pointer, usually stored in a special machine register. This is completely implementation-internal, and thus you cannot arrange for a new "thread" which you create yourself via clone to have a properly set up thread pointer.
The proper solution is to use pthread_create, not clone.
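For comparison, here is a minimal sketch of the same experiment done with pthread_create. The helper names (grow_buffer, thread_func, run_both) and the reduced step count are my own choices for illustration; the point is only that both threads can call realloc() concurrently because the library sets up the new thread itself.

```c
#include <pthread.h>
#include <stdlib.h>

#define GROW_STEPS 20000  /* smaller than the question's 200000 to keep the run short */

/* Grows a private buffer one int at a time.
 * Returns 0 on success, -1 if realloc fails. */
static int grow_buffer(void)
{
    int *ptr = NULL;
    for (int i = 1; i < GROW_STEPS; i++) {
        int *tmp = realloc(ptr, sizeof(int) * (size_t)i);
        if (tmp == NULL) {
            free(ptr);
            return -1;
        }
        ptr = tmp;
    }
    free(ptr);
    return 0;
}

/* Thread start function: stores the result through the int pointer in arg. */
static void *thread_func(void *arg)
{
    int *result = arg;
    *result = grow_buffer();
    return NULL;
}

/* Runs grow_buffer() concurrently in a new thread and in the caller.
 * Returns 0 if both completed, -1 otherwise. */
int run_both(void)
{
    pthread_t tid;
    int child_result = -1;

    if (pthread_create(&tid, NULL, thread_func, &child_result) != 0)
        return -1;

    int parent_result = grow_buffer();

    if (pthread_join(tid, NULL) != 0)
        return -1;
    return (child_result == 0 && parent_result == 0) ? 0 : -1;
}
```

Build with gcc -pthread. Unlike the clone() version, the thread's stack is also allocated for you, and pthread_join makes sure the child is finished before the process exits.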

You cannot do this:
for (i=0; i<200000; i++)
ptr = realloc(ptr, sizeof(int)*i);
free(ptr);
The first time through the loop, i is zero. realloc( ptr, 0 ) is equivalent to free( ptr ), and you cannot free twice.

I added the CLONE_SETTLS flag to the clone() call, and the deadlock is gone.
So I think eglibc's realloc() uses some TLS data. When a new thread is created without its own TLS, some locks (in TLS) are shared between this thread and its parent, and realloc() gets stuck on those locks. So if somebody wants to use clone() directly, the best way is to allocate a new TLS area for the new thread.
The code snippet looks like this:
flags = CLONE_VM | CLONE_SETTLS;
struct user_desc* p_tls_desc = malloc(sizeof(struct user_desc));
clone(clone_func, child_stack_start +CHILD_STACK_SIZE, flags, NULL, NULL, p_tls_desc, NULL);

check for threads still running after program exits

gcc 4.4.3 c89 pthreads
I use valgrind for checking memory errors.
I am just wondering if there is any tool for Linux that can detect running threads that haven't been terminated when the program finishes.
I am running a multi-thread application and need a tool to make sure all threads have finished.
Many thanks for any suggestions,

If the program has terminated (because the initial thread returned from main(), some thread called exit(), or a fatal signal was received by the process) then you are guaranteed that all threads have been terminated with extreme prejudice.
If you want to write your program so that it ensures that all its threads have exited before main() exits, then you need to loop over all your threads at the end of main(), calling pthread_join() on each one. (This also means that you shouldn't create your threads detached, or detach them).

A Tool Approach
You can use Valgrind to help with this (via its Helgrind tool), but it requires minor modification of the code. For each thread, you make the thread lock a unique mutex when the thread is created, and release the mutex when the thread exits. Then, when run under Helgrind, you will get a warning if the thread hasn't exited when the program terminates because the thread will still be holding the lock to the mutex. Consider this example thread start routine:
void * thread_start (void *arg)
{
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&mutex);
// ...
// Here the thread does whatever it normally does
// ...
// Unlock the mutex before exiting
pthread_mutex_unlock(&mutex);
return NULL;
}
Simply run the program using Valgrind's Helgrind tool like so:
$ valgrind --tool=helgrind ./<program-name>
If the thread didn't exit when the program terminated, then Helgrind produces a warning like this:
==2203== Thread #2 was created
==2203== at 0x31C96D3CDE: clone (in /lib64/libc-2.5.so)
==2203== by 0x31CA206D87: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
==2203== by 0x4A0B206: pthread_create_WRK (hg_intercepts.c:229)
==2203== by 0x4A0B2AD: pthread_create@* (hg_intercepts.c:256)
==2203== by 0x40060A: main (main.c:26)
==2203==
==2203== Thread #2: Exiting thread still holds 1 lock
==2203== at 0x4005DD: thread_start (main.c:13)
==2203== by 0x4A0B330: mythread_wrapper (hg_intercepts.c:201)
==2203== by 0x31CA20673C: start_thread (in /lib64/libpthread-2.5.so)
==2203== by 0x31C96D3D1C: clone (in /lib64/libc-2.5.so)
You will get false positives using this method if you don't add the mutex unlock code anywhere the thread may exit (e.g. using pthread_exit), but fixing such a false-positive is easy once it is identified.
An Alternative Approach (Recommended)
Having said all of the above, that's probably not the approach I myself would take. Instead, I would write the program such that it cannot terminate until all threads have exited. The simplest way to achieve this is to call pthread_exit from the main thread before returning from main. Doing so will mean that the process will stay alive so long as any other thread is still running.
If you take this approach, and the process doesn't quit when you expect it to, then you know that a thread is still running. You can then attach a debugger to the process to determine which threads are still running and what they are doing.

If you plan to use Boost.Threads library, then you can use the .join() method.
For example:
#include <boost/thread/thread.hpp>
#include <iostream>
void hello()
{
std::cout <<
"Hello world, I'm a thread!"
<< std::endl;
}
int main(int argc, char* argv[])
{
boost::thread thrd(&hello);
thrd.join();
return 0;
}

There is a simple trick in this similar question: Multiple threads in C program
If you call pthread_exit from main, your process will not exit until all the other threads complete.

Original answer was updated to address pthread_exit() scenario.
Assuming you want to tell whether all threads were pthread_join()-ed properly before you return from main(), there are a few ways:
Run it under gdb and break on the last line of main(), then look at the output of the "info threads" command. There should only be the main thread.
Make a shared library that overrides pthread_create with a wrapper that keeps a counter of how many threads are started. The thread wrapper increments a counter and calls the actual thread function, and a destructor registered with pthread_key_create() decrements it when a thread returns or exits. The library destructor checks whether the counter is zero, which means that all of them were terminated. Use it with your executable with LD_PRELOAD=checker.so ./your_executable (no code modification necessary).
Tested on Debian 5.0.5.
checker.c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <dlfcn.h>
#include <stdlib.h>
/* thread-local storage key */
static pthread_key_t tls_key = 0;
static int counter = 0;
static pthread_mutex_t g_mutex;
/* TLS destructor prototype */
void on_thread_end(void*);
void __attribute__ ((constructor))
init_checker()
{
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutex_init(&g_mutex, &attr);
pthread_mutexattr_destroy(&attr);
pthread_key_create(&tls_key, &on_thread_end);
}
void __attribute__ ((destructor))
finalize_checker()
{
int remain;
pthread_mutex_lock(&g_mutex);
remain = counter;
pthread_mutex_unlock(&g_mutex);
pthread_mutex_destroy(&g_mutex);
if (remain)
fprintf(stderr, "Warning: %d threads not terminated\n", remain);
pthread_key_delete(tls_key);
}
/* thread function signature */
typedef void* (*ThreadFn)(void*);
struct wrapper_arg
{
ThreadFn fn;
void* arg;
};
/* TLS destructor: called for every thread we created
when it exits */
void
on_thread_end(void *arg)
{
free(arg);
pthread_mutex_lock(&g_mutex);
--counter;
pthread_mutex_unlock(&g_mutex);
}
static void*
thread_wrapper(void *arg)
{
void *ret;
struct wrapper_arg *warg;
warg = (struct wrapper_arg*)arg;
/* Thread started, increment count. */
pthread_mutex_lock(&g_mutex);
++counter;
pthread_mutex_unlock(&g_mutex);
/* set thread-specific data to avoid leaks
* when thread exits
*/
pthread_setspecific(tls_key, arg);
/* Run the actual function. */
ret = (*warg->fn)(warg->arg);
/* Thread finishes, TLS destructor will be called. */
return ret;
}
/* pthread_create signature */
typedef int (*CreateFn)(pthread_t*,const pthread_attr_t*,ThreadFn,void*);
/* Overriding pthread_create */
int
pthread_create(
pthread_t *thread,
const pthread_attr_t *attr,
ThreadFn start_routine,
void *arg)
{
CreateFn libc_pthread_create;
struct wrapper_arg *warg;
/* Get a handle to the real function. */
libc_pthread_create
= (CreateFn)dlsym(RTLD_NEXT, "pthread_create");
if (!libc_pthread_create)
return -1;
/* Wrap user function. */
warg = malloc(sizeof(struct wrapper_arg));
if (!warg)
return -1;
warg->fn = start_routine;
warg->arg = arg;
/* Create a thread with a wrapper. */
return libc_pthread_create(thread, attr, &thread_wrapper, warg);
}
Makefile
CFLAGS+=-fpic -O3
checker.so: checker.o
gcc -shared -Wl,-soname,$@ -o $@ $^ -ldl -lpthread

Correct me if wrong, but a program is not finished until all running threads have ended.

You don't need any external tool for this: I would track the threads by using a simple semaphore instead.
1) set it up so that its initial count is the same as the number of your threads:
sem_init( &semThreadCount, 0, threadCount );
2) Modify your threads to "notify" they are exiting gracefully:
sem_wait( &semThreadCount );
3) You can either quit whenever the threads are finished or when the semaphore is 0, or just print the remaining semaphore value and quit, that will be the number of still-running threads:
int v;
sem_getvalue( &semThreadCount, &v );
This way you can both ensure that no threads are still running when you exit and, with some logging, find out which ones are still running after you quit.
Remember to sem_destroy the semaphore as well.
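Putting the three steps together, a minimal sketch could look like the following. THREAD_COUNT, worker() and remaining_threads() are names I picked for illustration, and the sketch joins the threads before reading the semaphore only so that the result is deterministic; in a real program you would read it wherever you want the check.

```c
#include <pthread.h>
#include <semaphore.h>

#define THREAD_COUNT 4   /* hypothetical number of worker threads */

static sem_t semThreadCount;

static void *worker(void *arg)
{
    (void)arg;
    /* ... the thread's real work would go here ... */
    sem_wait(&semThreadCount);   /* notify: one fewer thread outstanding */
    return NULL;
}

/* Starts THREAD_COUNT workers, joins the ones that started, and returns
 * the semaphore value: the number of threads that never reported exit
 * (including any that failed to start). Returns -1 on setup failure. */
int remaining_threads(void)
{
    pthread_t ids[THREAD_COUNT];
    int started = 0;
    int value = -1;

    if (sem_init(&semThreadCount, 0, THREAD_COUNT) != 0)
        return -1;

    for (int i = 0; i < THREAD_COUNT; i++) {
        if (pthread_create(&ids[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(ids[i], NULL);

    sem_getvalue(&semThreadCount, &value);
    sem_destroy(&semThreadCount);
    return value;
}
```

A return value of 0 means every thread reached its sem_wait call. Since each thread decrements at most once and the initial count equals the number of threads, sem_wait never blocks here.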

If you cannot use C++, and therefore KMan's answer, then you can also join pthreads using the "C" API. (Joining means waiting for a thread to finish its work; note that only joinable threads, not detached ones, can be joined.)
See the pthread tutorial.

The existence of the process, that is, whether any thread is still running, can be checked with waitpid.
If you just want your process to continue with all the other threads, but you don't need the main thread anymore, you can end that thread with pthread_exit. Unlike an explicit exit or a simple return, this will not terminate your other threads.

Such tools already exist. On Linux you can use ps or top. On Windows, good ole Task Manager does the job. Just check whether your process still exists:
if the process still exists, it means that one or more threads in it are running.
if there are no more threads running, the process is terminated.

If they're threads (rather than processes) then you just need to check whether your process is still running, because threads run inside a process.
You can check if a process is running with ps -ef then pipe the result into grep to find your specific process.

If you want an external means to observe the threads in execution for your process, on Linux you can look in /proc/(pid)/task. That's the method tools like ps(1) or top(1) use.
See http://linux.die.net/man/5/proc
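If you want to do the same check from code, here is a minimal Linux-only sketch that counts the entries in that directory. The function names count_threads and count_own_threads are mine; each numeric entry under /proc/(pid)/task corresponds to one thread.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts the numeric entries in /proc/<pid>/task, one per thread.
 * Returns -1 if the directory cannot be opened. Linux-specific. */
int count_threads(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and ".."; thread IDs start with a digit. */
        if (isdigit((unsigned char)entry->d_name[0]))
            count++;
    }
    closedir(dir);
    return count;
}

/* Convenience wrapper: thread count of the calling process. */
int count_own_threads(void)
{
    return count_threads(getpid());
}
```

Calling count_own_threads() near the end of main() and logging anything above 1 gives a quick indication that some thread other than the main one is still alive.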

You're missing out on the important part:
A program cannot exit unless all its threads are terminated.
What you should do, however, is pthread_join() on all the threads before exiting. This ensures that all threads have terminated and releases the resources associated with each one, so that you do not leak memory from them.
Having said that, valgrind can give you a comprehensive view of threads you haven't cleaned up after. Run it with --leak-check=full and make sure you are not leaving various structs behind you. Those will indicate there is a thread you haven't terminated properly.