Using a function pointer to pthread_create causes segfault - c

I am attempting to stub out pthread_create to be able to fully unit test a module. When the function pointer is called from within the test framework a segmentation fault occurs. If I debug the program using 'gdb' I am able to directly call the function pointer and it works correctly.
I am using CppUTest as the unit test framework and have compiled my object files using gcc.
This function has worked in production code prior to altering it to use a function pointer for pthread_create, so I am confident in the function in general.
Stack Trace from GDB
> Starting program:
> /home/lucid/depot/torr_linux_common_dev/main/src/Utilities/tests/testRunner
> [Thread debugging using libthread_db enabled] Using host libthread_db
> library "/lib/i386-linux-gnu/libthread_db.so.1".
>
> Program received signal SIGSEGV, Segmentation fault. 0x080660c4 in
> sys_pthreads_create () (gdb) backtrace
> #0 0x080660c4 in sys_pthreads_create ()
> #1 0x08049ee4 in th_start_thread_name (thread=0x8049e64 <TestThread>, arg=0x0, opts=0x0, name=0x0) at thr.c:177
> #2 0x08049e47 in test_ThreadTestGroup_ThreadCreateUnnamed_wrapper_c () at thr_test.c:66
> #3 0x08049223 in TEST_ThreadTestGroup_ThreadCreateUnnamed_Test::testBody
> (this=0x806cc90) at testRunner.c:21
> #4 0x0805576a in PlatformSpecificSetJmpImplementation ()
> #5 0x08053ab7 in Utest::run() ()
> #6 0x080550d5 in UtestShell::runOneTestInCurrentProcess(TestPlugin*, TestResult&) ()
> #7 0x08053645 in helperDoRunOneTestInCurrentProcess ()
> #8 0x0805576a in PlatformSpecificSetJmpImplementation ()
> #9 0x08053b8f in UtestShell::runOneTest(TestPlugin*, TestResult&) ()
> #10 0x080530ef in TestRegistry::runAllTests(TestResult&) ()
> #11 0x0804a3ef in CommandLineTestRunner::runAllTests() ()
> #12 0x0804a4e9 in CommandLineTestRunner::runAllTestsMain() ()
> #13 0x0804a628 in CommandLineTestRunner::RunAllTests(int, char const**) ()
> #14 0x08049246 in main (argc=1, argv=0xbffff244) at testRunner.c:25
If I call the function pointer from within gdb it works
(gdb) p (*sys_pthreads_create)(&thr, 0, thread, arg)
[New Thread 0xb7c01b40 (LWP 17717)]
$4 = 0
Function I am testing
#include <pthread.h>
#include "mypthreads.h"
long th_start_thread_name(TH_THREAD_FUNC thread, void *arg, th_opts *opts, const char* name)
{
pthread_t thr;
int ret, sret;
//pthread_create(opts ? &opts->thr : &thr, NULL, thread, arg);
ret = (*sys_pthreads_create)(opts ? &opts->thr : &thr, 0, thread, arg);
if (ret == 0 && name != NULL)
{
extern int pthread_setname_np(pthread_t thr, const char *name); /* Fix warning from missing prototype. */
sret = pthread_setname_np(opts ? opts->thr : thr, name);
/* pthreads says that thread names must not exceed 16, including NULL. */
if (sret != 0 && strlen(name) > 15)
{
ret = -1;
}
}
return (long)ret;
}
mypthreads.h
extern int (*sys_pthreads_create(pthread_t *, const pthread_attr_t *,
void *(*) (void*), void *));
mypthreads.c
#include <stdio.h>
#include <pthread.h>
int my_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg)
{
printf("Did you get the messsage?");
return pthread_create(thread, attr, start_routine, arg);
}
int (*sys_pthreads_create)(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg) = my_pthread_create;
Edit: Added output from gdb when I call the function pointer and it succeeds.

The problem is that your declaration in mypthreads.h has the wrong type:
extern int (*sys_pthreads_create(pthread_t *, const pthread_attr_t *, void *(*) (void*), void *));
Due to a misplaced parenthesis, this declares a function that returns a pointer to int, but your actual sys_pthreads_create object is a pointer to a function.
This means that when you call:
ret = (*sys_pthreads_create)(opts ? &opts->thr : &thr, 0, thread, arg);
sys_pthreads_create is converted to a pointer to a function by implicitly taking the address of it, then that address is dereferenced and called. But that's not really the address of a function - it's the address of a pointer to a function! So the call jumps into the data segment where sys_pthreads_create lives and crashes when it tries to execute the function pointer as code (or crashes due to a non-executable mapping).
There's a clue to this in the gdb output:
#0 0x080660c4 in sys_pthreads_create ()
It says that it's executing within sys_pthreads_create - but sys_pthreads_create is a variable, not a function.
The compiler would have diagnosed this for you if you had included mypthreads.h in mypthreads.c, because the conflicting types for sys_pthreads_create would have been visible to it (that's why you should always include the header file that declares objects in the source files that define those objects).
The correct declaration of course is the one that matches mypthreads.c:
extern int (*sys_pthreads_create)(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg);
The reason that gdb was able to call the function pointer successfully was that gdb uses the type information stored in the debugging info to determine the type of sys_pthreads_create, not the bogus information from the header file.
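One way to make this kind of mismatch harder to write is to give the function-pointer type a name and to keep the declaration and the definition visible to the same translation unit. The following is a minimal single-file sketch of that idea, not the asker's actual code: the names pthread_create_fn, counting_pthread_create, create_calls and start_and_join are hypothetical and introduced only for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* --- what would live in mypthreads.h --- */
/* A typedef keeps the tricky parentheses in one place. */
typedef int (*pthread_create_fn)(pthread_t *, const pthread_attr_t *,
                                 void *(*)(void *), void *);
extern pthread_create_fn sys_pthreads_create;

/* --- what would live in mypthreads.c, which includes the header --- */
static int create_calls; /* hypothetical counter a unit test could inspect */

static int counting_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                                   void *(*start_routine)(void *), void *arg)
{
    create_calls++;
    return pthread_create(thread, attr, start_routine, arg);
}

/* If this type disagreed with the extern declaration above,
   the compiler would reject it. */
pthread_create_fn sys_pthreads_create = counting_pthread_create;

/* --- a caller that goes through the pointer --- */
static void *noop_thread(void *arg)
{
    return arg;
}

/* Starts a thread through the pointer and joins it; returns 0 on success. */
int start_and_join(void)
{
    pthread_t thr;
    int ret = sys_pthreads_create(&thr, NULL, noop_thread, NULL);

    if (ret == 0)
        ret = pthread_join(thr, NULL);
    return ret;
}
```

In a unit test, sys_pthreads_create could then be reassigned to a stub with the same signature before the code under test runs.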

Related

Interposing library functions via linker script: other functions cause segfault

I'm currently utilizing the LD_PRELOAD trick and am utilizing a linker version script as detailed in an article on another website. My MCVE code is included below.
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>
#define BUFFER_SIZE (1024)
int __printf__(const char *fmt, ...)
{
char buf[BUFFER_SIZE] = { 0 };
int ret;
int len;
va_list args;
va_start(args, fmt);
vsnprintf(buf, BUFFER_SIZE - 1, fmt, args);
#if 1
//typeof(vsnprintf) *real_func = dlsym(RTLD_NEXT, "vsnprintf");
//(*real_func)(buf, BUFFER_SIZE - 1, fmt, args);
#endif
len = strlen(buf);
ret = write(STDOUT_FILENO, buf, len);
va_end(args);
return ret;
}
asm(".symver __printf__, __printf_chk@GLIBC_2.3.4");
If I modify my custom printf function to simply write a static string, no problems. However, I want to modify the data being sent to the console via printf (add a prefix, suffix, and set certain character to UPPERCASE, etc). It seems that whenever I attempt to use any other printf-family functions to generate a copy of the user-provided string, I get a segfault, as shown below.
Program received signal SIGSEGV, Segmentation fault.
strchrnul () at ../sysdeps/x86_64/strchr.S:32
32 ../sysdeps/x86_64/strchr.S: No such file or directory.
(gdb) bt
#0 strchrnul () at ../sysdeps/x86_64/strchr.S:32
#1 0x00007ffff78591c8 in __find_specmb (format=0x1 <error: Cannot access memory at address 0x1>) at printf-parse.h:108
#2 _IO_vfprintf_internal (s=s@entry=0x7fffffffc380, format=format@entry=0x1 <error: Cannot access memory at address 0x1>, ap=ap@entry=0x7fffffffc4f8) at vfprintf.c:1312
#3 0x00007ffff7882989 in _IO_vsnprintf (string=0x7fffffffc510 "", maxlen=<optimized out>, format=0x1 <error: Cannot access memory at address 0x1>, args=0x7fffffffc4f8)
at vsnprintf.c:114
#4 0x00007ffff7bd58a1 in __printf__ (fmt=0x1 <error: Cannot access memory at address 0x1>) at libfakeprintf.c:19
#5 0x00000000004004aa in printf (__fmt=0x400644 "%s received %d args\n") at /usr/include/x86_64-linux-gnu/bits/stdio2.h:104
#6 main (argc=<optimized out>, argv=<optimized out>) at print_args.c:5
(gdb) quit
What is causing this crash?
Thank you.
You have overridden the glibc internal function __printf_chk, but that function does not have the same prototype as printf. Its prototype is:
int __printf_chk(int flag, const char * format, ...);
So make sure your own __printf__ function has that prototype too.
There's a brief description of __printf_chk here
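As a rough illustration of what a replacement with the matching parameter list could look like, here is a minimal sketch. The name my_printf_chk is hypothetical; to actually interpose, it would still need to be bound to __printf_chk in the shared library, for example with the .symver directive from the question. The flag argument is ignored here, which is an assumption made to keep the sketch short.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 1024

/* Same parameter list as __printf_chk: the flag comes before the format. */
int my_printf_chk(int flag, const char *fmt, ...)
{
    char buf[BUFFER_SIZE];
    va_list args;
    int len;

    (void)flag; /* not used in this sketch */

    va_start(args, fmt);
    len = vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);

    if (len < 0)
        return len;
    if ((size_t)len >= sizeof buf)
        len = (int)sizeof buf - 1; /* the output was truncated */

    /* buf could be modified here (prefix, suffix, upper-casing, ...). */
    return (int)write(STDOUT_FILENO, buf, (size_t)len);
}
```

With the flag in the first position, fmt receives the real format string rather than the small integer (0x1) visible in the backtrace above.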

Portable way to programmatically detect where a signal occurred

Assuming the following code (main.c):
#include <unistd.h>
#include <signal.h>
void handler(int sig)
{
pause(); /* line 7 */
}
int main(void)
{
signal(SIGALRM, handler);
alarm(1);
pause();
}
When I run this in gdb, set a breakpoint inside handler(), run the code and wait a second, I can do the following:
(gdb) b 7
Breakpoint 1 at 0x4005b7: file main.c, line 7.
(gdb) r
Starting program: a.out
Breakpoint 1, handler (sig=14) at main.c:7
7 pause();
(gdb) bt
#0 handler (sig=14) at main.c:7
#1 <signal handler called>
#2 0x00007ffff7afd410 in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:82
#3 0x00000000004005e0 in main () at main.c:14
Is there a portable way to get 0x00007ffff7afd410 or 0x00000000004005e0?
With sigaction instead of signal the handler is called with the ucontext of the location where the signal occurred:
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <ucontext.h>
static void handler(int sig, siginfo_t *siginfo, void *context)
{
ucontext_t *ucontext = context;
printf("rip %p\n", (void *)ucontext->uc_mcontext.gregs[REG_RIP]);
pause();
}
int main(void)
{
struct sigaction sact;
memset(&sact, 0, sizeof sact);
sact.sa_sigaction = handler;
sact.sa_flags = SA_SIGINFO;
if (sigaction(SIGALRM, &sact, NULL) < 0) {
perror("sigaction");
return 1;
}
alarm(1);
pause();
return 0;
}
rip output and gdb bt output:
(gdb) b 13
Breakpoint 1 at 0x4006de: file main.c, line 13.
(gdb) r
Starting program: /home/osboxes/a.out
rip 0x7ffff7ae28a0
Breakpoint 1, handler (sig=14, siginfo=0x7fffffffdf70, context=0x7fffffffde40)
at main.c:13
13 pause();
(gdb) bt
#0 handler (sig=14, siginfo=0x7fffffffdf70, context=0x7fffffffde40)
at main.c:13
#1 <signal handler called>
#2 0x00007ffff7ae28a0 in __pause_nocancel () from /lib64/libc.so.6
#3 0x0000000000400758 in main () at main.c:28
Not extremely portable I guess, but backtrace(3) is available in glibc and a few other libc's:
backtrace() returns a backtrace for the calling program, in the array
pointed to by buffer. A backtrace is the series of currently active
function calls for the program.
You'd have to check how many entries up the stack you need to look. It should be consistent for Linux at least.
If you want to translate the backtrace to something resembling gdb's display, you could use addr2line(1) from binutils. With something like
popen("addr2line -Cfip -e ./myprog", "w")
you could even do it at runtime by writing addresses (as strings) to the FILE* you get back.
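A minimal sketch of the backtrace(3) approach follows, assuming glibc's <execinfo.h> is available. The helper name dump_backtrace and the MAX_FRAMES limit are hypothetical. Note that backtrace() is not documented as async-signal-safe, so calling it from a signal handler is a pragmatic choice rather than a guaranteed one.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 32

/* Captures the current call stack and writes one line per frame to fd.
   Returns the number of frames captured. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* backtrace_symbols_fd() writes straight to fd and, unlike
       backtrace_symbols(), does not call malloc(). */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

Called as dump_backtrace(STDERR_FILENO) from the SIGALRM handler, the first entries belong to the helper, the handler and the signal trampoline; which entry corresponds to the interrupted code has to be checked on the target platform, as noted above.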

gnu c on_exit - segmentation fault

Out of curiosity I am trying to get the libc on_exit function to work, but I have run into a problem with a segmentation fault. The difficulty I am having is finding an explanation of the proper use of this function. The function is defined in glibc as:
Function: int on_exit (void (*function)(int status, void *arg), void *arg)
This function is a somewhat more powerful variant of atexit. It accepts two arguments, a function and an arbitrary pointer arg. At normal program termination, the function is called with two arguments: the status value passed to exit, and the arg.
I created a small test, and I cannot find where the segmentation fault is generated:
#include <stdio.h>
#include <stdlib.h>
void *
exitfn (int stat, void *arg) {
printf ("exitfn has been run with status %d and *arg %s\n", stat, (char *)arg);
return NULL;
}
int
main (void)
{
static char *somearg="exit_argument";
int exit_status = 1;
on_exit (exitfn (exit_status, somearg), somearg);
exit (EXIT_SUCCESS);
}
Compiled with: gcc -Wall -o fn_on_exit fnc-on_exit.c
The result is:
$ ./fn_on_exit
exitfn has been run with status 1 and *arg exit_argument
Segmentation fault
Admittedly, this is probably readily apparent for seasoned coders, but I am not seeing it. What is the proper setup for use of the on_exit function and why in this case is a segmentation fault generated?
The line of code
on_exit (exitfn (exit_status, somearg), somearg);
Should be
on_exit (exitfn, somearg);
You do not want to call exitfn at this point: the call returns NULL, and that NULL is what ends up registered as the exit handler.
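Put together, a corrected registration could look like the sketch below. Besides passing the function itself, it changes the handler's return type to void, which is what the on_exit prototype quoted in the question expects. The wrapper register_exit_handler is a hypothetical name used only to keep the example self-contained.

```c
#define _DEFAULT_SOURCE /* on_exit() is a glibc extension */
#include <stdio.h>
#include <stdlib.h>

/* on_exit() expects a handler that returns void, not void *. */
static void exitfn(int status, void *arg)
{
    printf("exitfn has been run with status %d and arg %s\n",
           status, (const char *)arg);
}

/* Registers the handler; returns 0 on success, like on_exit() itself. */
int register_exit_handler(void)
{
    static char somearg[] = "exit_argument";

    /* Pass the function itself; do not call it here. */
    return on_exit(exitfn, somearg);
}
```

The status printed by the handler is then the value passed to exit() (or returned from main), not a value chosen at registration time.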

reading the environment when executing ELF IFUNC dispatch functions

The IFUNC mechanism in recent ELF tools on (at least) Linux allows choosing an implementation of a function at runtime. See the ifunc attribute in the GCC documentation for a more detailed description: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
Another description of the IFUNC mechanism: http://www.agner.org/optimize/blog/read.php?i=167
I would like to choose my implementation depending on the value of an environment variable. However, my experiments show me that the libc (at least the part about environment) is not yet initialized when the resolver function is run. So, the classical interfaces (extern char**environ or getenv()) do not work.
Does anybody know how to access the environment of a program on Linux at a very early stage? The environment is set up by the kernel at the execve(2) system call, so it is already somewhere (but where exactly?) in the program's address space at early initialization.
Thanks in advance
Vincent
Program to test:
#include <stdio.h>
#include <stdlib.h>
extern char** environ;
char** save_environ;
char* toto;
int saved=0;
extern int fonction ();
int fonction1 () {
return 1;
}
int fonction2 () {
return 2;
}
static typeof(fonction) * resolve_fonction (void) {
saved=1;
save_environ=environ;
toto=getenv("TOTO");
/* no way to choose between fonction1 and fonction2 with the TOTO envvar */
return fonction1;
}
int fonction () __attribute__ ((ifunc ("resolve_fonction")));
void print_saved() {
printf("saved: %d\n", saved);
if (saved) {
printf("prev environ: %p\n", save_environ);
printf("prev TOTO: %s\n", toto);
}
}
int main() {
print_saved();
printf("main environ: %p\n", environ);
printf("main environ[0]: %s\n", environ[0]);
printf("main TOTO: %s\n", getenv("TOTO"));
printf("main value: %d\n", fonction());
return 0;
}
Compilation and execution:
$ gcc -Wall -g ifunc.c -o ifunc
$ env TOTO=ok ./ifunc
saved: 1
prev environ: (nil)
prev TOTO: (null)
main environ: 0x7fffffffe288
main environ[0]: XDG_VTNR=7
main TOTO: ok
main value: 1
$
In the resolver function, environ is NULL and getenv("TOTO") returns NULL. In the main function, the information is available.
Function Pointer
I found no legitimate way to use the environment at this early stage. The resolver function runs in the dynamic linker, even earlier than the preinit_array functions. The only legitimate approach is to use a function pointer and choose the implementation in a function placed in the .preinit_array section:
extern char** environ;
int(*f)();
void preinit(int argc, char **argv, char **envp) {
f = f1;
environ = envp; // actually, it is done a bit later
char *v = getenv("TOTO");
if (v && strcmp(v, "ok") == 0) {
f = f2;
}
}
__attribute__((section(".preinit_array"))) typeof(preinit) *__preinit = preinit;
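For reference, a self-contained variant of the .preinit_array approach might look like the sketch below. It assumes GCC (or a compatible compiler) and glibc, which calls .preinit_array entries of the main executable with (argc, argv, envp). The names impl_default, impl_alt, selected_impl and preinit_entry are hypothetical, and the sketch scans envp directly instead of assigning environ, which is a deliberate simplification rather than the only option.

```c
#include <stddef.h>
#include <string.h>

static int impl_default(void) { return 1; } /* stands in for f1 */
static int impl_alt(void)     { return 2; } /* stands in for f2 */

/* Dispatch pointer; it starts at the default so it is never NULL. */
static int (*selected_impl)(void) = impl_default;

/* Entries in .preinit_array are called with (argc, argv, envp). */
static void preinit(int argc, char **argv, char **envp)
{
    (void)argc;
    (void)argv;

    /* Scan envp directly instead of relying on getenv() this early. */
    for (char **e = envp; e != NULL && *e != NULL; e++) {
        if (strcmp(*e, "TOTO=ok") == 0)
            selected_impl = impl_alt;
    }
}

/* Place a pointer to preinit() in the executable's .preinit_array. */
__attribute__((section(".preinit_array"), used))
static void (*preinit_entry)(int, char **, char **) = preinit;

/* Public entry point: forwards to whichever implementation was chosen. */
int fonction(void)
{
    return selected_impl();
}
```

The price compared with IFUNC is one indirect call through selected_impl on every invocation; .preinit_array is also only honoured in the main executable, not in shared libraries.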
ifunc & GNU ld internals
glibc's dynamic linker (ld.so) contains a local symbol _environ, and it is initialized, but it is rather hard to extract. There is another way I found, but it is a bit tricky and rather unreliable.
At the dynamic linker's entry point _start, only the stack is initialized. Program arguments and environment values are passed to the process via the stack. They are stored in the following order:
argc, argv, argv + 1, ..., argv + argc - 1, NULL, ENV...
The dynamic linker exports a global symbol _dl_argv, which points to this place on the stack. With its help we can extract all the variables we need:
extern char** environ;
extern char **_dl_argv;
char** get_environ() {
int argc = *(int*)(_dl_argv - 1);
char **my_environ = (char**)(_dl_argv + argc + 1);
return my_environ;
}
typeof(f1) * resolve_f() {
environ = get_environ();
const char *var = getenv("TOTO");
if (var && strcmp(var, "ok") == 0) {
return f2;
}
return f1;
}
int f() __attribute__((ifunc("resolve_f")));

pthread_cond_broadcast broken with dlsym?

I am trying to interpose calls to pthread_cond_broadcast using the LD_PRELOAD mechanism. My interposed pthread_cond_broadcast function just calls the original pthread_cond_broadcast. However, for a very simple pthread program where both pthread_cond_wait and pthread_cond_broadcast get invoked, I either end up with a segfault in glibc (for glibc 2.11.1) or the program hangs (for glibc 2.15). Any clues on what is going on?
The interposition code (that gets compiled as a shared library):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <dlfcn.h>
static int (*orig_pthread_cond_broadcast)(pthread_cond_t *cond) = NULL;
__attribute__((constructor))
static void start() {
orig_pthread_cond_broadcast =
(int (*)()) dlsym(RTLD_NEXT, "pthread_cond_broadcast");
if (orig_pthread_cond_broadcast == NULL) {
printf("pthread_cond_broadcast not found!!!\n");
exit(1);
}
}
__attribute__((__visibility__("default")))
int pthread_cond_broadcast(pthread_cond_t *cond) {
return orig_pthread_cond_broadcast(cond);
}
The simple pthread program:
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
pthread_mutex_t cond_mutex;
pthread_cond_t cond_var;
int condition;
void *thread0_work(void *arg) {
pthread_mutex_lock(&cond_mutex);
printf("Signal\n");
condition = 1;
pthread_cond_broadcast(&cond_var);
pthread_mutex_unlock(&cond_mutex);
return NULL;
}
void *thread1_work(void *arg) {
pthread_mutex_lock(&cond_mutex);
while (condition == 0) {
printf("Wait\n");
pthread_cond_wait(&cond_var, &cond_mutex);
printf("Done waiting\n");
}
pthread_mutex_unlock(&cond_mutex);
return NULL;
}
int main() {
pthread_t thread1;
pthread_mutex_init(&cond_mutex, NULL);
pthread_cond_init(&cond_var, NULL);
pthread_create(&thread1, NULL, thread1_work, NULL);
// Slowdown this thread, so the thread 1 does pthread_cond_wait.
usleep(1000);
thread0_work(NULL);
pthread_join(thread1, NULL);
return 0;
}
EDIT:
For glibc 2.11.1, gdb bt gives:
(gdb) set environment LD_PRELOAD=./libintercept.so
(gdb) run
Starting program: /home/seguljac/intercept/main
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7436700 (LWP 19165)]
Wait
Signal
Before pthread_cond_broadcast
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff79ca0e7 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
(gdb) bt
#0 0x00007ffff79ca0e7 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007ffff7bdb769 in pthread_cond_broadcast () from ./libintercept.so
#2 0x00000000004008e8 in thread0_work ()
#3 0x00000000004009a4 in main ()
EDIT 2:
(Solved)
As suggested by R.. (thanks!), the issue is that on my platform pthread_cond_broadcast is a versioned symbol, and dlsym gives the wrong version. This blog explains this situation in great detail: http://blog.fesnel.com/blog/2009/08/25/preloading-with-multiple-symbol-versions/
The call through your function seems to end up in a different version of the function:
With LD_PRELOAD: __pthread_cond_broadcast_2_0 (cond=0x804a060) at old_pthread_cond_broadcast.c:37
Without LD_PRELOAD: pthread_cond_broadcast@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_broadcast.S:39
So your situation is similar to this question, i.e. you are getting incompatible versions of pthread functions: symbol versioning and dlsym
This page gives one way to solve the problem, though a bit complex: http://blog.fesnel.com/blog/2009/08/25/preloading-with-multiple-symbol-versions/
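Independently of the approach described on that page, glibc also provides dlvsym(), which takes an explicit version string. The sketch below wraps it in a hypothetical helper, lookup_next_versioned. Which version string is correct depends on the platform: "GLIBC_2.3.2" is the one visible in the gdb output above and is only an assumption for other systems.

```c
#define _GNU_SOURCE /* needed for dlvsym() and RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

/* Looks up one specific version of a symbol in the objects that come
   after the caller in the search order. Returns NULL if the symbol is
   not available in that version. */
void *lookup_next_versioned(const char *name, const char *version)
{
    return dlvsym(RTLD_NEXT, name, version);
}
```

In the interposer's constructor, the dlsym call could then become something like orig_pthread_cond_broadcast = (int (*)(pthread_cond_t *)) lookup_next_versioned("pthread_cond_broadcast", "GLIBC_2.3.2"), with the same NULL check as before. On older glibc releases the library has to be linked with -ldl.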
