I am trying to print a backtrace when my C++ program terminates. The function that prints the backtrace looks like this:
#include <execinfo.h>
#include <stdlib.h>
#include <syslog.h>

void print_backtrace(void) {
    void *tracePtrs[10];
    int count = backtrace(tracePtrs, 10);
    char **funcNames = backtrace_symbols(tracePtrs, count);

    for (int i = 0; i < count; i++)
        syslog(LOG_INFO, "%s\n", funcNames[i]);

    free(funcNames);
}
It gives output like this:
desktop program: Received SIGSEGV signal, last error is : Success
desktop program: ./program() [0x422225]
desktop program: ./program() [0x422371]
desktop program: /lib/libc.so.6(+0x33af0) [0x7f0710f75af0]
desktop program: /lib/libc.so.6(+0x12a08e) [0x7f071106c08e]
desktop program: ./program() [0x428895]
desktop program: /lib/libc.so.6(__libc_start_main+0xfd) [0x7f0710f60c4d]
desktop program: ./program() [0x4082c9]
Is there a way to get a more detailed backtrace, with function names and line numbers, like the one gdb produces?
Yes - pass the -rdynamic flag to the linker. It causes the linker to put the names of all the non-static functions in your code into the dynamic symbol table, not just the exported ones.
The price you pay is a slightly longer startup time for your program. For small to medium programs you won't notice it. What you get is that backtrace() is able to give you the names of all the non-static functions in your backtrace.
However - BEWARE: there are several gotchas you need to be aware of:
backtrace_symbols() allocates memory with malloc(). If you got the SIGSEGV because of malloc arena corruption (quite common), you will double-fault here and never see your backtrace.
Depending on the platform this runs on (e.g. x86), the address/function name of the exact function where you crashed will be replaced in place on the stack with the return address of the signal handler. You need to get the right EIP of the crashed function from the signal handler parameters on those platforms (see the sketch below).
syslog() is not an async-signal-safe function. It may take a lock internally, and if that lock was held when the crash occurred (because you crashed in the middle of another syslog() call), you have a deadlock.
If you want to learn all the gory details, check out this video of me giving a talk about it at OLS: http://free-electrons.com/pub/video/2008/ols/ols2008-gilad-ben-yossef-fault-handlers.ogg
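To illustrate the second and third gotchas, here is a minimal sketch, assuming Linux/x86-64 with glibc (the REG_RIP register index is architecture- and libc-specific): the handler is installed with SA_SIGINFO, pulls the faulting instruction pointer out of the ucontext_t, and writes everything with backtrace_symbols_fd() to stderr instead of calling syslog() or malloc():

#define _GNU_SOURCE            /* exposes REG_RIP on glibc */
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

static void segv_handler(int sig, siginfo_t *info, void *ucontext) {
    (void)sig; (void)info;
    ucontext_t *uc = (ucontext_t *)ucontext;
    /* The faulting instruction pointer, taken from the signal context
       (x86-64/glibc specific; other platforms use different names). */
    void *crash_pc = (void *)uc->uc_mcontext.gregs[REG_RIP];

    /* Print the faulting PC on its own line, then the raw backtrace.
       backtrace_symbols_fd() does not call malloc(). */
    backtrace_symbols_fd(&crash_pc, 1, STDERR_FILENO);

    void *frames[32];
    int n = backtrace(frames, 32);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    _exit(1);
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    volatile char *p = NULL;
    *p = 0;                    /* deliberately crash for the demo */
    return 0;
}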
Feed the addresses to addr2line and it will show you the file name, line number, and function name.
If you're fine with only getting proper backtraces when running through valgrind, then this might be an option for you:
VALGRIND_PRINTF_BACKTRACE(format, ...):
It will give you the backtrace for all functions, including static ones.
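As a rough sketch (assuming the Valgrind development headers are installed, providing <valgrind/valgrind.h>), the macro can be dropped anywhere in your code; it is a no-op when the program is not running under Valgrind:

#include <valgrind/valgrind.h>

void report_unexpected_state(int error_code) {
    /* Prints the formatted message followed by a full backtrace,
       but only when the program is actually running under valgrind. */
    VALGRIND_PRINTF_BACKTRACE("unexpected state, error=%d\n", error_code);
}

Run the program as valgrind ./program to actually see the output.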
The better option I have found is libbacktrace by Ian Lance Taylor:
https://github.com/ianlancetaylor/libbacktrace
backtrace_symbols() only prints exported symbols, and it is hardly portable, since it requires the GNU libc.
addr2line is nice because it includes file names and line numbers. But it fails as soon as the loader performs relocations, and nowadays, with ASLR being common, that means it fails very often.
libunwind alone will not allow one to print file names and line numbers. To do this, one needs to parse DWARF debugging information inside the ELF binary file. This can be done using libdwarf, though. But why bother when libbacktrace gives you everything required for free?
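For reference, a minimal sketch of how libbacktrace is typically used (build with -g and link with -lbacktrace; treat the exact signatures as an assumption to check against your copy of backtrace.h):

#include <backtrace.h>
#include <inttypes.h>
#include <stdio.h>

static void error_cb(void *data, const char *msg, int errnum) {
    (void)data;
    fprintf(stderr, "libbacktrace error: %s (%d)\n", msg, errnum);
}

static int frame_cb(void *data, uintptr_t pc, const char *filename,
                    int lineno, const char *function) {
    (void)data;
    fprintf(stderr, "0x%" PRIxPTR " %s:%d %s\n", pc,
            filename ? filename : "??", lineno, function ? function : "??");
    return 0;                       /* 0 means: keep walking the stack */
}

static struct backtrace_state *bt_state;   /* create once, reuse */

void print_backtrace(void) {
    if (!bt_state)                  /* NULL filename: let libbacktrace locate the executable */
        bt_state = backtrace_create_state(NULL, 1, error_cb, NULL);
    backtrace_full(bt_state, 0, frame_cb, error_cb, NULL);
}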
Create a pipe
fork()
Make child process execute addr2line
In parent process, convert the addresses returned from backtrace() to hexadecimal
Write the hex addresses to the pipe
Read back the output from addr2line and print/log it
Since you're doing all this from a signal handler, make sure not to use functionality which is not async-signal-safe. The list of async-signal-safe POSIX functions can be found in the signal-safety(7) man page.
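A rough sketch of those steps, assuming the executable path is known and glossing over error handling (and noting, as another answer below points out, that fork() is not formally async-signal-safe on Linux); for brevity, addr2line inherits stdout here instead of having its output read back over a second pipe:

#include <execinfo.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void log_backtrace_with_addr2line(const char *exe_path) {
    void *addrs[32];
    int n = backtrace(addrs, 32);

    int to_child[2];
    if (pipe(to_child) == -1)
        return;

    pid_t pid = fork();
    if (pid == 0) {                        /* child: run addr2line */
        dup2(to_child[0], STDIN_FILENO);   /* addresses arrive on stdin */
        close(to_child[0]);
        close(to_child[1]);
        execlp("addr2line", "addr2line", "-f", "-C", "-e", exe_path, (char *)NULL);
        _exit(127);                        /* exec failed */
    }

    /* parent: write the addresses as hex, one per line */
    close(to_child[0]);
    for (int i = 0; i < n; i++) {
        char buf[32];
        int len = snprintf(buf, sizeof(buf), "%p\n", addrs[i]);
        if (write(to_child[1], buf, len) < 0)
            break;
    }
    close(to_child[1]);                    /* EOF tells addr2line we're done */
    waitpid(pid, NULL, 0);                 /* addr2line prints to our stdout */
}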
If you don't want to take the "signal a different process that runs gdb on you" approach, which I think gby is advocating, you can also slightly alter your code to call open() on a crash log file and then backtrace_symbols_fd() with the fd returned by open() - both functions are async-signal-safe according to the glibc manual. You'll still need -rdynamic, of course. Also, from what I've seen, you sometimes still need to run addr2line on some addresses that the backtrace*() functions can't decode.
Also note that fork() is not async-signal-safe: http://article.gmane.org/gmane.linux.man/1893/match=fork+async, at least not on Linux. Neither is syslog(), as somebody already pointed out.
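A minimal sketch of that open() + backtrace_symbols_fd() route (the file name crash.log is just an example):

#include <execinfo.h>
#include <fcntl.h>
#include <unistd.h>

/* Suitable for calling from a SIGSEGV handler: open() and
   backtrace_symbols_fd() avoid malloc() and stdio buffering. */
static void dump_backtrace_to_file(void) {
    void *frames[64];
    int n = backtrace(frames, 64);

    int fd = open("crash.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd >= 0) {
        backtrace_symbols_fd(frames, n, fd);
        close(fd);
    }
}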
If you want a very detailed backtrace, you can use ptrace(2) to trace the process you want the backtrace of.
You will be able to see all the functions your process goes through, but you need some basic asm knowledge.
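A minimal sketch of attaching with ptrace(2) and reading the current instruction pointer, assuming Linux/x86-64 (the register names in struct user_regs_struct differ on other architectures); walking the whole call stack from there means reading saved return addresses with PTRACE_PEEKDATA, which is where the asm knowledge comes in:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

int print_instruction_pointer(pid_t pid) {
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    waitpid(pid, NULL, 0);                 /* wait until the target stops */

    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
    printf("rip = 0x%llx, rbp = 0x%llx\n", regs.rip, regs.rbp);

    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return 0;
}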
How can I debug a C application that does not crash when attached with gdb and run inside of gdb?
It crashes consistently when run standalone - even the same debug build!
A few of us are getting this error with a C program written for BSD/Linux, and we are compiling on macOS with OpenSSL.
app(37457,0x7000017c7000) malloc: *** mach_vm_map(size=13835058055282167808) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ERROR: malloc(buf->length + 1) failed!
I know, not helpful.
Recompiling the application with -g -rdynamic gives the same error. Ok, so now we know it isn't because of a release build as it continues to fail.
It works when running within a gdb debugging session though!!
$ sudo gdb app
(gdb) b malloc_error_break
Function "malloc_error_break" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (malloc_error_break) pending.
(gdb) run -threads 8
Starting program: ~/code/app/app -threads 8
[New Thread 0x1903 of process 45436]
warning: unhandled dyld version (15)
And it runs for hours. CTRL-C, and run ./app -threads 8 and it crashes after a second or two (a few million iterations).
Obviously there's an issue within one of the threads. But those workers for the threads are pretty big (a few hundred lines of code). Nothing stands out.
Note that the threads iterate over loops of about 20 million per second.
macOS 10.12.3
Homebrew w/GNU gcc and openssl (linking to crypto)
P.S. I'm not too familiar with C - especially not with any type of debugging. Be kind and expressive/verbose in answers. :)
One debugging technique that is sometimes overlooked is to include debug prints in the code. Of course it has its disadvantages, but it also has advantages. One thing you must keep in mind, though, in the face of abnormal termination, is to make sure the printouts actually get printed. Often it's enough to print to stderr (but if that doesn't do the trick, you may need to fflush the stream explicitly).
Another trick is to stop the program before the error occurs. This requires you to know when the program is about to crash, preferably as close as possible. You do this by using raise:
raise(SIGSTOP);
This does not terminate the program, it just suspends execution. Now you can attach with gdb using the command gdb <program-name> <pid> (use ps to find the pid of the process). Now in gdb you have to tell it to ignore SIGSTOP:
> handle SIGSTOP ignore
Then you can set break-points. You can also step out of the raise function using the finish command (may have to be issued multiple times to return to your code).
This technique gives the program normal behaviour up to the point where you decide to stop it; hopefully the final part, running under gdb, does not alter the behaviour enough to matter.
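A minimal sketch of the suspend-and-attach trick; the loop bound and the "just before the crash" iteration count are made-up placeholders, and do_work() stands in for the real worker:

#include <signal.h>

static void do_work(long i) { (void)i; /* hypothetical stand-in for the real worker */ }

int main(void) {
    for (long i = 0; i < 20000000; i++) {
        if (i == 19999000)      /* hypothetical point shortly before the crash */
            raise(SIGSTOP);     /* suspend; now attach with: gdb ./app <pid> */
        do_work(i);
    }
    return 0;
}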
A third option is to use valgrind. Normally when you see this kind of error there are problems that valgrind will pick up, such as out-of-range accesses and use of uninitialised variables.
Many memory managers initialise memory to a known bad value to expose problems like this (e.g. Microsoft's CRT uses a range of values: 0xCD means uninitialised, 0xDD means already freed, etc.).
After each malloc, try memset'ing the memory to 0xCD (or some other constant value). This will allow you to identify uninitialised memory more easily with the debugger. Don't use 0x00, as this is a 'normal' value and will be harder to spot if it's wrong (it will also probably 'fix' your problem).
Something like:
void *memory = malloc(sizeof(my_object));
memset(memory, 0xCD, sizeof(my_object));
If you know the size of the blocks, you could do something similar before free (this is sometimes harder unless you know the size of your objects, or track it in some way):
memset(memory, 0xDD, sizeof(my_object));
free(memory);
I'm developing in C++ (g++) with a closed-source library.
Every time I run the program, the library crashes (it double-frees some memory).
That's acceptable for my program at the moment, but it's bad for profiling. I use -pg to profile the program, and as a result of the crash no gmon.out is generated, so I cannot profile it at all.
Question:
How do I profile a 'crashy' program (with gprof)?
P.S. valgrind is fine for analysing a crashy program.
regards!
There's a function you can call from your program to dump profile data (the same one that's automatically installed as an atexit handler when you link with -pg), but I don't know what it's called offhand.
The easiest thing to do is to just insert an exit(0) call at a suitable point in your program. Alternatively, you can set a breakpoint and use call exit(0) in GDB (except that debugging the program will affect the profile data if you stop it in the middle).
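If there is no single suitable point, one hedged variant (a hack, not something the gprof documentation prescribes) is to turn the crash itself into a clean exit, so that the atexit handler installed by -pg still writes gmon.out:

#include <signal.h>
#include <stdlib.h>

static void bail_out(int sig) {
    (void)sig;
    /* exit() - unlike _exit() - runs the atexit handlers, including the
       one installed by -pg that writes gmon.out. Calling exit() from a
       signal handler is not strictly safe, but for profiling purposes
       it usually gets the data onto disk. */
    exit(1);
}

int main(void) {
    signal(SIGSEGV, bail_out);   /* the library's double free may also raise SIGABRT */
    signal(SIGABRT, bail_out);
    /* ... the rest of the program ... */
    return 0;
}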
We have an embedded version of the Linux kernel running on a MIPS core. The program we have written runs a particular test suite. During one of the stress tests (which runs for about 12 hours) we get a seg fault. This in turn generates a core dump.
Unfortunately the core dump is not very useful. The crash is in some system library that is dynamically linked (probably pthread or glibc). The backtrace in the core dump is not helpful because it only shows the crash point and no other callers (our user space app is built with -g -O0, but still no back trace info):
Cannot access memory at address 0x2aab1004
(gdb) bt
#0 0x2ab05d18 in ?? ()
warning: GDB can't find the start of the function at 0x2ab05d18.
GDB is unable to find the start of the function at 0x2ab05d18
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
This problem is most likely caused by an invalid program counter or
stack pointer.
However, if you think GDB should simply search farther back
from 0x2ab05d18 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
Another unfortunate thing is that we cannot run gdb/gdbserver: it keeps breaking on __nptl_create_event. Since the test creates threads and timers and destroys them every 5 s, it is almost impossible to sit there for a long time hitting continue on them.
EDIT:
Another note: backtrace and backtrace_symbols are not supported on our toolchain.
Hence:
Is there a way of trapping the seg fault and generating more backtrace data: stack pointers, call stack, etc.?
Is there a way of getting more data from a core dump that crashed in a .so file?
Thanks.
GDB can't find the start of the function at 0x2ab05d18
What is at that address at the time of the crash?
Do info shared, and find out if there is a library that contains that address.
The most likely cause of your troubles: did you run strip libpthread.so.0 before uploading it to your target? Don't do that: GDB requires libpthread.so.0 to not be stripped. If your toolchain contains libpthread.so.0 with debug symbols (and thus too large for the target), run strip -g on it, not a full strip.
Update:
info shared produced Cannot access memory at address 0x2ab05d18
This means that GDB can not access the shared library list (which would then explain the missing stack trace). The most usual cause: the binary that actually produced the core does not match the binary you gave to GDB. A less common cause: your core dump was truncated (perhaps due to ulimit -c being set too low).
If all else fails, run the command under the debugger!
Just put "gdb" in front of your normal start command and enter "c" (continue) to get the process running. When the task segfaults it will return to the interactive gdb prompt rather than core dumping. You should then be able to get more meaningful stack traces etc.
Another option is to use "truss" if it is available. This will tell you which system calls were being used at the time of the abend.
I am currently working on "Creation of Postmortem data logger in Linux on Intel architecture".
It is essentially the creation of a core utility.
Can anybody share the details of how the handling of the various signals (SIGSEGV, SIGABRT, SIGFPE, etc.) that produce a core dump when an application crashes is implemented internally in the Linux kernel? I need to rewrite these handlers for my own user-specific needs and rebuild the kernel, so that my kernel produces the core file (upon an application crash) with user-specific content such as registers, a stack dump, a backtrace, etc.
Can anybody share the details about it?
Thanks in advance to all repliers :)
You may not need to modify the kernel at all - the kernel supports invoking a userspace application when a core dump occurs. From the core(5) man page:
Since kernel 2.6.19, Linux supports an alternate syntax for the /proc/sys/kernel/core_pattern file. If the first character of this file is a pipe symbol (|), then the remainder of the line is interpreted as a program to be executed. Instead of being written to a disk file, the core dump is given as standard input to the program.
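For illustration, a minimal sketch of such a pipe handler follows; the install path and the %e/%p specifiers follow core(5), but treat the exact setup as an assumption to verify on your kernel version. It would be registered (as root) with something like echo '|/usr/local/bin/core_handler %e %p' > /proc/sys/kernel/core_pattern:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    /* argv[1] = executable name (%e), argv[2] = pid (%p); both illustrative. */
    char path[256];
    snprintf(path, sizeof(path), "/var/crash/core.%s.%s",
             argc > 1 ? argv[1] : "unknown",
             argc > 2 ? argv[2] : "0");

    int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0)
        return 1;

    /* The kernel feeds the core image on stdin; copy it to disk. */
    char buf[65536];
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
        if (write(out, buf, n) != n)
            break;

    close(out);
    return 0;
}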
The actual dumping code depends on the format of the dump. For the ELF format, look at the fs/binfmt_elf.c file. It has an elf_core_dump function. (Same with the other formats.)
This is triggered by get_signal_to_deliver in kernel/signal.c, which calls into do_coredump in fs/exec.c, which calls the handler.
LXR, the Linux Cross-Reference, is usually helpful when you want to know how something is done in the Linux kernel. It's a browsing and searching tool for the kernel sources.
Searching “core dump” returns a lot of hits, but two of the most promising-looking are in fs/exec.c and fs/proc/kcore.c (promising because the file names are fairly generic, in particular you don't want to start with architecture-specific stuff). kcore.c is actually for a kernel core dump, but the hit in fs/exec.c is in the function do_coredump, which is the main function for dumping a process's core. From there, you can both read the function to see what it does, and search to see where it's called.
Most of the code in do_coredump is about determining whether to dump core and where the dump should go. What to dump is handled near the end: binfmt->core_dump(&cprm), i.e. this is dependent on the executable format (ELF, a.out, …). So your next search is on the core_dump struct field, specifically its “usage”; then select the hit corresponding to an executable format. ELF is probably the one you want, and so you get to the elf_core_dump function.
That being said, I'm not convinced from your description of your goals that what you want is really to change the core dump format, as opposed to writing a tool that analyses existing dumps.
You may be interested in existing work on analyzing kernel crash dumps. Some of that work is relevant to process dumps as well, for example the gcore extension to include process dumps in kernel crash dumps.
What does the extension of the core dump mean, and how do I read a core dump file? When I open the file in a text editor, I get garbage values.
Note: its extension is something like .2369
You can use gdb to read the core dump. The extension is the process id.
Here is a link to a thread explaining how to do this.
And here is a gdb tutorial.
A core file is the memory image of a process at that point in time when it was terminated. Termination could for example happen through a segmentation fault or a failed assert. To "view" a coredump you will need a debugger. It will allow you to examine the state of the process. This includes listing the stack traces for all the threads of the process. Printing the values of variables and registers. Note that this works "better" if you have debug information available.
Traditionally core files are just named "core". This has the not-so-nice effect that cores overwrite themselves before a developer/admin discovers them. Many modern platforms allow core files to be given custom names that contain additional information. The number at the end of your core file could, for example, be the PID of the process that the core belonged to.
The extension is most often the process ID that crashed. You need to examine the file with a debug tool.
Wikipedia can explain core dumps better than I, but
It is the dump of "core" memory; that is, the memory, registers, and other program state that the process holds when it crashes.
The value at the end of the filename is system dependent. I normally use a debugger like GDB, together with my program, to examine such files.