Who called atexit()?

I have a C program that quits unexpectedly on Linux and I have a hard time finding out why (no core dump, see XIO: fatal IO error 11). I placed an atexit() at the beginning of the program and the callback function is indeed being called when the crash happens.
How can I know what called the atexit callback function? From reading the man page, the atexit handler is called at exit (d'oh!) or on return from main. I can exclude the latter because there are a bunch of printf calls at the end of main and I don't see their output. And I can exclude the former simply because there aren't any exit() calls in my program.
That leaves only one solution: exit is being called from a library function. Is that the only possibility? And how can I know from where? Is it possible to print out a stack trace or force a core dump from inside the atexit callback?

Call e.g. abort() in your atexit handler, and inspect the coredump in gdb. The gdb backtrace command shows you where it exits, if the atexit handler is run. Here's a demonstration:
#include <stdlib.h>
void exit_handler(void)
{
abort();
}
void startup()
{
#ifdef DO_EXIT
exit(99);
#endif
}
int main(int argc, char *argv[])
{
atexit(exit_handler);
startup();
return 0;
}
And doing this:
$ gcc -DDO_EXIT -g atexit.c
$ ulimit -c unlimited
$ ./a.out
Aborted (core dumped)
$ gdb ./a.out core.28162
GNU gdb (GDB) Fedora 7.7.1-19.fc20
..
Core was generated by `./a.out'.
Program terminated with signal SIGABRT, Aborted.
#0 0xb77d7424 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install glibc-2.18-16.fc20.i686
(gdb) bt
#0 0xb77d7424 in __kernel_vsyscall ()
#1 0x42e1a8e7 in raise () from /lib/libc.so.6
#2 0x42e1c123 in abort () from /lib/libc.so.6
#3 0x0804851b in exit_handler () at atexit.c:6
#4 0x42e1dd61 in __run_exit_handlers () from /lib/libc.so.6
#5 0x42e1ddbd in exit () from /lib/libc.so.6
#6 0x0804852d in startup () at atexit.c:12
#7 0x08048547 in main (argc=1, argv=0xbfc39fb4) at atexit.c:21
As expected, it shows startup() calling exit.
You can of course debug this interactively too: start your program in gdb and set a breakpoint in the atexit handler.
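If a core dump is not convenient, the question's other idea (printing a stack trace from inside the handler) can be sketched with glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. This is a minimal sketch that assumes glibc; the names capture_stack, trace_exit and install_exit_tracer are made up for illustration.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace, backtrace_symbols_fd (glibc) */
#include <stddef.h>
#include <stdlib.h>     /* atexit */
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_FRAMES 64

/* Fill frames[] with return addresses of the current call stack. */
int capture_stack(void **frames, int max)
{
    assert(frames != NULL && max > 0);
    return backtrace(frames, max);
}

/* atexit handler: write the stack that led to exit() to stderr. */
void trace_exit(void)
{
    void *frames[MAX_FRAMES];
    int count = capture_stack(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
}

/* Register the handler; returns 0 on success, like atexit(). */
int install_exit_tracer(void)
{
    return atexit(trace_exit);
}
```

Calling install_exit_tracer() at the top of main should make the frames between exit() and its caller show up on stderr at termination. Function names for your own code may only appear if the program is linked with -rdynamic; otherwise expect raw addresses that can be fed to addr2line.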

The standard only says "at normal program termination", so maybe on Linux this is more than exit or return from main. Also you forgot pthread_exit, which also may terminate the thread of main and thus the whole program.
In any case, there is no way to see immediately where the termination was issued from. The atexit handlers are usually called by the initialization code that invoked main. By definition, all application code other than the atexit handlers has finished by that point.
You could try tracing execution in a debugger to nail down the place where the termination happens.

Related

Why does getline call glibc's malloc instead of __wrap_malloc when passing `--wrap=malloc` to the linker?

When I try to pass --wrap=malloc to the ld linker for hooking malloc, I find that getline will not call __wrap_malloc, but call glibc's malloc.
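// foo.c
#include <stdio.h>
#include <stdlib.h>
void *__real_malloc(size_t sz); // resolved by the linker to the original malloc
void* __wrap_malloc(size_t sz) {
printf("abc");
return __real_malloc(sz);
}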
int main() {
char *line = NULL;
size_t n = 0;
getline(&line, &n, stdin); // will call malloc
....
}
Run as:
$ gcc foo.c -Wl,--wrap=malloc && ./a.out
It doesn't print "abc", it seems that __wrap_malloc isn't called.
But run as:
$ gcc -static foo.c -Wl,--wrap=malloc && ./a.out
I get a segmentation fault. The backtrace:
#0 0x00000000004490e7 in vfprintf ()
#1 0x0000000000407a76 in printf ()
#2 0x0000000000400ab7 in __wrap_malloc (size=1024) at foo.c:26
#3 0x0000000000456a2b in _IO_file_doallocate ()
#4 0x000000000040d4a5 in _IO_doallocbuf ()
#5 0x000000000040c4d8 in _IO_new_file_overflow ()
#6 0x000000000040b421 in _IO_new_file_xsputn ()
#7 0x000000000044921f in vfprintf ()
#8 0x0000000000407a76 in printf ()
#9 0x0000000000400ab7 in __wrap_malloc (size=1024) at foo.c:26
....
It seems that __wrap_malloc (called in printf initially) is called recursively, which means that getline calls __wrap_malloc instead of glibc's malloc.
What happens when -static is passed to the linker? How can I force getline to call __wrap_malloc instead of glibc's malloc?
If -static is passed to the link editor, the link editor can see all internal calls to malloc and will try to wrap them, causing the infinite recursion because your own malloc implementation calls printf, which in turn calls malloc in some cases.
Without -static, the link editor cannot rewrite the internal malloc references within libc.so.6, so internal calls in glibc are never redirected. There is no infinite recursion, but you do not see those internal calls, either.
glibc supports replacing malloc using ELF symbol interposition (as far as such a thing is possible); see "Replacing malloc" in the glibc manual.
You still need to be careful to avoid triggering infinite recursion, though.
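One way to avoid the recursion is a re-entrancy guard combined with logging that bypasses stdio buffering. The sketch below is illustrative only: logged_alloc, in_hook and logged_calls are hypothetical names, the real allocator is passed in as a function pointer so the example links without --wrap, and the guard is a plain static, so it assumes a single-threaded program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>    /* snprintf */
#include <stdlib.h>
#include <unistd.h>   /* write, STDERR_FILENO */

static int in_hook;   /* non-zero while the hook's own logging runs */
int logged_calls;     /* how many allocations were actually logged */

/* Log one allocation, then forward to the real allocator. */
void *logged_alloc(size_t sz, void *(*real_alloc)(size_t))
{
    assert(real_alloc != NULL);
    if (in_hook)                 /* re-entered from our own logging */
        return real_alloc(sz);

    in_hook = 1;
    char msg[64];                /* stack buffer: no heap needed here */
    int len = snprintf(msg, sizeof msg, "alloc %zu\n", sz);
    if (len > 0) {
        size_t n = (size_t)len < sizeof msg ? (size_t)len : sizeof msg - 1;
        /* write(2) goes straight to the fd, so no stdio buffer is allocated */
        ssize_t ignored = write(STDERR_FILENO, msg, n);
        (void)ignored;
    }
    logged_calls++;
    in_hook = 0;

    return real_alloc(sz);
}
```

With --wrap=malloc, the same body would live in __wrap_malloc and forward to __real_malloc. snprintf into a stack buffer is not expected to allocate for a simple integer format, and the guard covers the case where it does.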

How can this code corrupt my stack trace?

This simple test program,
$ cat crash.c
int main() {
int x = 0;
*(&x + 5) = 10;
return 0;
}
Compiled with GCC 7.4.0,
$ gcc -O0 -g crash.c
Has an unexpected stack trace
$ ./a.out
Segmentation fault (core dumped)
$ gdb ./a.out /tmp/wk_cores/core-pid_19675.dump
Reading symbols from ./a.out...done.
[New LWP 19675]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f450000000a in ?? ()
(gdb) bt
#0 0x00007f450000000a in ?? ()
#1 0x0000000000000001 in ?? ()
#2 0x00007fffd6f97598 in ?? ()
#3 0x0000000100008000 in ?? ()
#4 0x00005632be83d66a in frame_dummy ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I don't understand why the stack doesn't show the invalid store to privileged memory? Can someone help me understand this?
"I don't understand why the stack doesn't show the invalid store to privileged memory?"
Because you didn't store anything to privileged memory.
To do that, you need to write way outside of stack, something like:
*(&x + 0x10000) = 5;
As is, your program does exhibit undefined behavior, but it doesn't write to "privileged" memory, just to memory that is writable but that you shouldn't write to.
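One plausible reading of frame #0 (0x00007f450000000a) is that the 4-byte store of 10 replaced the low half of main's saved return address and left the upper half 0x00007f45 intact. This assumes x86-64 and that &x + 5 happened to alias that slot; the question does not confirm the exact frame layout. The arithmetic can be sketched as follows, with store_low32 as a made-up helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model a 4-byte store landing on the low half of an 8-byte slot,
 * as it would on a little-endian machine. */
void store_low32(uint64_t *slot, uint32_t value)
{
    assert(slot != NULL);
    *slot = (*slot & 0xffffffff00000000ULL) | value;
}
```

Applied to a hypothetical return address such as 0x00007f4512345678, storing 10 yields 0x00007f450000000a, which has the same shape as the bogus address gdb reports. Returning to such an address faults after the frame has been torn down, which would explain why the backtrace is garbage.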
x is the last variable on your stack. So if you write at &x + 5, no matter how far you go, you always write to stack memory beyond the region currently allocated to your frame. Therefore it always fails.

gdb bt gives only ??, how can I debug?

/var/log/messages:
segfault at 0 ip 00007fcd16e5853a sp 00007ffd98e37e58 error 4 in libc-2.24.so[7fcd16dc9000+195000]
addr2line -e a.out 00007fcd16e5853a
??:0
gdb bt
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fcd16e5853a in ?? ()
(gdb) bt
#0 0x00007fcd16e5853a in ?? ()
#1 0x000055f2f45fe95b in ?? ()
#2 0x000055f200000080 in ?? ()
#3 0x00007fcd068c2040 in ?? ()
#4 0x000055f2f6109c48 in ?? ()
#5 0x0000000000000000 in ?? ()
build with gcc -Wall -O0 -g
How can I debug this, are there more methods?
gdb bt
Surely that is not the command you actually executed.
Most likely you did something like this:
gdb /path/to/core
(gdb) bt
Don't do that. Do this instead:
gdb /path/to/a.out /path/to/core
(gdb) bt
If you already did invoke GDB correctly, other likely reasons why bt did not work:
You are analyzing the core on a different machine from the one on which it was produced. See this answer.
You rebuilt a.out with different flags. Use the exact binary that crashed.
You have updated libc after the core was produced. Restore it to the version that was current as of when the core was produced.
P.S. This command
addr2line -e a.out 00007fcd16e5853a
makes no sense: the error message told you that the address 00007fcd16e5853a is in libc-2.24.so. The a.out has nothing to do with that address.
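The command you want to use takes the offset of the faulting address within libc, that is, ip minus the mapping base (0x7fcd16e5853a - 0x7fcd16dc9000 = 0x8f53a; the 195000 in the log is the mapping size, not an address):
addr2line -fe /path/to/libc-2.24.so 0x8f53a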
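The address to hand to addr2line can be derived mechanically from the kernel's segfault line. The sketch below assumes the usual "... ip <hex> ... name[<base>+<size>]" layout and that the mapping starts at file offset 0; segfault_offset is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>   /* strtoull */
#include <string.h>   /* strstr, strrchr */

/* Extract ip and the mapping base from a kernel segfault line and
 * store ip - base in *offset. Returns 0 on success, -1 otherwise. */
int segfault_offset(const char *line, uint64_t *offset)
{
    assert(line != NULL && offset != NULL);

    const char *ip_field = strstr(line, " ip ");
    const char *bracket = strrchr(line, '[');
    if (ip_field == NULL || bracket == NULL)
        return -1;

    uint64_t ip = strtoull(ip_field + 4, NULL, 16);    /* stops at the space */
    uint64_t base = strtoull(bracket + 1, NULL, 16);   /* stops at the '+' */
    if (ip < base)
        return -1;

    *offset = ip - base;
    return 0;
}
```

For the log line in the question this yields 0x8f53a, which is the offset to look up in libc-2.24.so.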
P.P.S.
segfault at 0 ip 00007fcd16e5853a ...
This means: NULL pointer dereference inside libc. The most probable cause: not checking for error return, e.g. something like:
FILE *fp = fopen("/some/file", "r");
fgets(buffer, sizeof(buffer), fp); // Oops: didn't check fp for NULL.
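A checked version of that pattern might look like the sketch below. read_first_line is a hypothetical helper, not something from the question; the point is only that the fopen result is tested before any stdio call touches it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read the first line of path into buf. Returns 0 on success,
 * -1 if the file cannot be opened or is empty. */
int read_first_line(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL && size > 0);

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);            /* report why fopen failed */
        return -1;               /* never pass a NULL FILE* to stdio */
    }

    int ok = fgets(buf, (int)size, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}
```

With a missing file this returns -1 and prints the reason, instead of crashing inside libc.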

SDL simple threading produces segmentation fault

I've been trying to use SDL threading, but I've not been able to make it work properly. Even the simplest programs cause a segmentation fault.
I'm using mingw64 (not mingw32) inside the cygwin environment.
This code produces a segmentation fault both when run inside cygwin and outside it:
#include <SDL.h>
#include <SDL_thread.h>
int threadFunction(void* data) {
return 0;
}
int main(int argc,char* argv[]) {
int dataIn=0;
int dataOut=0;
SDL_Thread* thread;
SDL_Window* window;
SDL_Init(0);
thread=SDL_CreateThread(&threadFunction,"Thread",&dataIn);
SDL_WaitThread(thread,&dataOut);
SDL_Quit();
return 0;
}
However, once printf statements are added in between, like so:
SDL_Init(0);
printf("#1 ");
thread=SDL_CreateThread(&threadFunction,"Thread",&dataIn);
printf("#2 ");
SDL_WaitThread(thread,&dataOut);
printf("#3 ");
SDL_Quit();
printf("#4 ");
return 0;
Something very strange happens. When this code is run inside the cygwin environment, either with or without the -mwindows option, it works fine and the program exits normally. When the same code is run outside the cygwin environment, it crashes only without the -mwindows option. Moving the printf calls around gives varying results, sometimes making it crash and sometimes not.
When this code is run, the output is this:
Cygwin: ./test -> #1 #2 #3 #4
Windows: test -> #1 (program freezes and a "test.exe has stopped working" window appears)
The compilation command I'm using is this: gcc -I./SDL2 -o test test.c -lmingw32 -lSDL2main -lSDL2 where gcc is a symlink to the x86_64 version of the mingw64 C compiler. I've also tried with the normal x86 (i386, 32-bits) version of mingw64/SDL and the same thing happens.
When compiling with the -g option and using gdb on the first example (without printf's) this is shown:
Starting program: /Test/test
[New Thread 5828.0x1084]
Program received signal SIGSEGV, Segmentation fault.
0x000000006c868e30 in ?? () from /Test/SDL2.dll
(gdb) where
#0 0x000000006c868e30 in ?? () from /Test/SDL2.dll
#1 0x000000006c830d29 in SDL_LogCritical ()
from /Test/SDL2.dll
#2 0x000000006c7cb85f in SDL_LogCritical ()
from /Test/SDL2.dll
#3 0x0000000000401600 in SDL_main (argc=1, argv=0x2f0010) at test.c:17
#4 0x0000000000402cfa in console_main ()
#5 0x0000000000402db1 in WinMain ()
#6 0x00000000004013e8 in __tmainCRTStartup ()
at /usr/src/debug/mingw64-x86_64-runtime-4.0.6-1/crt/crtexe.c:332
#7 0x000000000040151b in mainCRTStartup ()
at /usr/src/debug/mingw64-x86_64-runtime-4.0.6-1/crt/crtexe.c:212
Whereas the printf version:
Starting program: /home/Mimi/Escocia/C/Test/test
[New Thread 3380.0x98c]
[New Thread 3380.0x1a80]
[Thread 3380.0x1a80 exited with code 0]
#1 #2 #3 #4 [Inferior 1 (process 3380) exited normally]
(But running this program outside cygwin produces a segfault)
Is linking with libmingw32 on mingw64 wrong? (It doesn't seem right either, but it didn't compile without it.) Am I missing something obvious? Does anybody know what I'm doing wrong? Is it a bug? (If it is, where should I report it to: SDL, mingw64 or Cygwin?)
Thanks in advance.
EDIT: After recompiling SDL the problem is gone. Does anyone know why it works now? Are there any disadvantages to compiling it myself?

GDB breakpoint when executing external program through system()

I use system() to run a separate program that has already been compiled. I'd like to be able to set a breakpoint in functions within THAT specific program.
So:
File A:
system("./fileB");
File B:
void main() {
/* etc */
}
I'd like to be able to set a breakpoint at main after the system command is called.
Any help would be appreciated!
Newer versions of GDB (7.1+) can debug multiple programs at once and can indeed support this:
run-program.c
#include <stdlib.h>
int main()
{
system("./program-i-want-to-debug");
return 0;
}
program-i-want-to-debug.c
#include <stdio.h>
int main()
{
printf("Hello, World\n");
return 0;
}
run-program.gdb
set detach-on-fork off
set target-async on
set pagination off
set non-stop on
add-inferior -exec program-i-want-to-debug
break program-i-want-to-debug.c:5
file run-program
run
inferior 3
backtrace
Sample session
$ gdb -q -x run-program.gdb
Added inferior 2
Breakpoint 1 at 0x400441: file program-i-want-to-debug.c, line 5.
[New process 20297]
process 20297 is executing new program: /usr/bin/bash
process 20297 is executing new program: /home/scottt/Dropbox/stackoverflow/program-i-want-to-debug
Reading symbols from /home/scottt/Dropbox/stackoverflow/program-i-want-to-debug...done.
Breakpoint 1, main () at program-i-want-to-debug.c:5
5 printf("Hello, World\n");
[Switching to inferior 3 [process 20297] (/home/scottt/Dropbox/stackoverflow/program-i-want-to-debug)]
[Switching to thread 2 (process 20297)]
#0 main () at program-i-want-to-debug.c:5
5 printf("Hello, World\n");
#0 main () at program-i-want-to-debug.c:5
Obviously you'd want to compile the programs with debug info (gcc -g).
Maybe I'm missing your point, but it seems that starting gdb on File A and setting a breakpoint on "FileB:line of main" would solve your problem.

Resources