How to debug cannot access address using gdb - c

When I try to debug a core file with gdb, I don't see a valid stack trace to work from (similar to the issue reported in the link below). How should I debug this further? Any pointers or gdb commands that would help triage the problem are welcome.
GDB debugging trace with no relevant info (#0 0x2e6e6f69 in ?? ())
(gdb) where
#0 0x76c0da28 in ?? ()
#1 0x76c0d9e0 in ?? ()
#2 0x76c0d9e0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
EDIT: To rule out a missing -g flag and host environment issues, I intentionally added code that crashes and was able to get a correct stack trace from the resulting core file.

When I try to debug a core file with gdb, I don't see a valid stack trace
Was the core produced on the same host where it is being analyzed?
If not, this answer explains what you need to do.

Related

Debugging functions in __libc_start_main

I'm writing a library that hooks some CUDA functions to add some functionality. The "constructor" hooks the CUDA functions and sets up a message queue and shared memory to communicate with other hooked CUDA binaries. When launching several hooked CUDA binaries (by python subprocess.Popen('<path-to-binary>', shell=True)) some processes hang. So I used gdb -p <pid> to attach to one of the hung processes, hoping to figure out what's going wrong. Here's the result:
Attaching to process 7445
Reading symbols from /bin/dash...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.27.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
78 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
#1 0x000055fff93be8a0 in ?? ()
#2 0x000055fff93c009d in ?? ()
#3 0x000055fff93ba6d8 in ?? ()
#4 0x000055fff93b949e in ?? ()
#5 0x000055fff93b9eda in ?? ()
#6 0x000055fff93b7944 in ?? ()
#7 0x00007f9cefdc8b97 in __libc_start_main (main=0x55fff93b7850, argc=3, argv=0x7ffca7c7beb8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca7c7bea8) at ../csu/libc-start.c:310
#8 0x000055fff93b7a4a in ?? ()
I've added the -g flag, but it seems that the program hangs in wait4 before entering main.
Thanks for any insights on:
How can I load these debug symbols to get rid of ??
Where is ../csu/libc-start.c:310 located?
What else can I do to locate the bug?
System Info: gcc 6.5.0, Ubuntu 18.04 with 4.15.0-54-generic.
How can I load these debug symbols to get rid of ??
You appear to need the debug symbols for /bin/dash, which are probably going to be in a package called dash-dbg or dash-dbgsym or something like that.
Also, I suspect your stack trace would make more sense if you compiled your library with -fno-optimize-sibling-calls.
Where is ../csu/libc-start.c:310 located?
See this answer.
What else can I do to locate the bug?
You said that you are writing a library that uses __attribute__((constructor)), but you showed a stack trace for /bin/dash (which I presume is DASH and not a program you wrote) that does not appear to involve symbols from your library. I infer from this that your library is loaded with LD_PRELOAD into programs that are not expecting it to be there.
Both of those things -- LD_PRELOAD and __attribute__((constructor)) -- break the normal expectations of both whatever unsuspecting program is involved, and the C library. You should only do those things if you have no other choice, and you should try to do as little as possible within the injected code. (In particular, I do not think any design that involves spawning processes from a constructor function will be workable, period.) If you tell us about your larger goals we may be able to suggest alternative means that are less troublesome.
EDIT:
subprocess.Popen('<path-to-binary>', shell=True)
With shell=True, Python doesn't invoke the program directly, it runs a command of the form /bin/sh -c 'string passed to Popen'. In many cases this will naturally produce a /bin/dash process sleeping (not hung) in a wait syscall for the entire lifetime of the actual binary. Unless you actually need to evaluate some shell code before running the program, try the default shell=False instead and see if that makes your problem go away. (If you do need to evaluate shell code, try Popen('<shell code>; exec <binary>', shell=True).)
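The two launch styles suggested above can be sketched in Python as follows. This is a minimal illustration, not the asker's actual code: the helper names launch_direct and launch_via_exec are made up for this example, and the exec variant assumes a POSIX /bin/sh.

```python
import subprocess
import sys

def launch_direct(argv):
    """Start argv without an intermediate shell (shell=False is the default)."""
    # The returned Popen object's pid is the pid of the program itself,
    # so `gdb -p <pid>` attaches to the binary rather than to /bin/sh.
    return subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)

def launch_via_exec(shell_code, binary):
    """Run some shell code first, then replace the shell with the binary."""
    # `exec` makes the shell process become the binary, so no shell is
    # left waiting in a wait syscall. `binary` is interpolated as-is;
    # quote it yourself if it may contain spaces.
    return subprocess.Popen(f"{shell_code}; exec {binary}", shell=True,
                            stdout=subprocess.PIPE, text=True)

if __name__ == "__main__":
    proc = launch_direct([sys.executable, "-c", "print('hello')"])
    out, _ = proc.communicate()
    print(proc.pid, out.strip())
```

With either helper, the pid reported by Popen belongs to the program being debugged, so attaching gdb to it shows that program's stack instead of the shell's.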

GDB symbols missing - libc claimed to be wrong library or version mismatch

I am having trouble showing proper debug symbols in the backtrace in GDB in an ARM cross-compiled system, built using Yocto.
abc.c is a simple printf("Hello world\n"); program in C (nothing tricky). On the build machine:
> yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/bin/arm-angstrom-linux-gnueabi/arm-angstrom-linux-gnueabi-gcc abc.c --sysroot=yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm -g -O0 -o abc
> scp abc root@DEVICE-IP:~
On the ARM target:
> gdbserver :2345 abc
Start GDB on the build machine (from installed Yocto SDK):
> /usr/local/oecore-x86_64/sysroots/x86_64-angstromsdk-linux/usr/bin/arm-angstrom-linux-gnueabi/arm-angstrom-linux-gnueabi-gdb abc
GNU gdb (Linaro GDB) 7.8-2014.09
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-angstromsdk-linux --target=arm-angstrom-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.linaro.org>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from abc...done.
(gdb) target remote DEVICE-IP:2345
Remote debugging using DEVICE-IP:2345
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Cannot access memory at address 0x0
0x4ae90a20 in ?? ()
(gdb) bt
#0 0x4ae90a20 in ?? ()
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) set sysroot yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm
Reading symbols from yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3...done.
Loaded symbols for yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3
Cannot access memory at address 0x0
After setting the sysroot, it still does not give symbols.
(gdb) bt
#0 0x4ae90a20 in ?? ()
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) b main
Breakpoint 1 at 0x84a8: file abc.c, line 5.
(gdb) c
Continuing.
Breakpoint 1, main () at abc.c:5
5 printf("Hello world\n");
Okay, when it hits a breakpoint, it does display symbols.
(gdb) bt
Cannot access memory at address 0x0
#0 main () at abc.c:5
However, it goes weird stepping beyond there.
(gdb) n
Cannot access memory at address 0x1
0x4aea6ea0 in ?? ()
(gdb) bt
#0 0x4aea6ea0 in ?? ()
#1 0x0000a014 in do_lookup_unique (Cannot access memory at address 0x1
undef_map=0x1, ref=0x0, strtab=0x56ebb27 <error: Cannot access memory at address 0x56ebb27>, sym=0x84a0 <main>, type_class=-1224757248, result=0x1, map=<optimized out>,
new_hash=<optimized out>, undef_name=<optimized out>) at /usr/src/debug/glibc/2.24-r0/git/elf/dl-lookup.c:332
#2 do_lookup_x (undef_name=<optimized out>, new_hash=<optimized out>, old_hash=<optimized out>, ref=0x0, result=<optimized out>, scope=0x177ff8e, i=<optimized out>, version=<optimized out>,
flags=-1224757248, skip=0x1, type_class=100, undef_map=0x1) at /usr/src/debug/glibc/2.24-r0/git/elf/dl-lookup.c:544
#3 0x4aec0b10 in ?? ()
Cannot access memory at address 0x1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
It can't find the proper version of libc.so.6.
(gdb) info sharedlibrary
warning: .dynamic section for "yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6" is not at the expected address (wrong library or version mismatch?)
From        To          Syms Read   Shared Object Library
0x000007d0  0x0001bee0  Yes         yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/ld-linux.so.3
0x4aee73c0  0x4afe2018  No          yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6
(gdb) n
Cannot find bounds of current function
It does not give an ideal debugging experience.
There is a gcc inside yocto-dir sysroot (as used above), as well as in /usr/local/oecore-x86_64. They both behave the same. The /usr/local/oecore-x86_64 SDK is freshly built and installed.
Similarly, there is an imx28scm sysroot inside yocto-dir (as used above), as well as in /usr/local/oecore-x86_64, and they both behave the same. However, they clearly do have different versions of libc.so.6: yocto-dir's is 14.8MB, and /usr/local/oecore-x86_64's is 1.3MB. This is a concern; however, setting either of these locations as the sysroot does not fix the problem.
One workaround is to link with -static. GDB does give symbols in this case:
(gdb) target remote DEVICE-IP:2345
Remote debugging using DEVICE-IP:2345
_start () at ../sysdeps/arm/start.S:79
79 ../sysdeps/arm/start.S: No such file or directory.
(gdb) set sysroot yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm
(gdb) bt
#0 _start () at ../sysdeps/arm/start.S:79
(gdb) b main
Breakpoint 1 at 0x8480: file abc.c, line 5.
(gdb) c
Continuing.
Breakpoint 1, main () at abc.c:5
5 printf("Hello world\n");
(gdb) n
6 return 0;
(gdb) n
7 }
Linking with -Wl,--verbose seems to show it is linking with the library in the expected sysroot:
yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/libexec/arm-angstrom-linux-gnueabi/gcc/arm-angstrom-linux-gnueabi/6.2.1/ld: Attempt to open yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/lib/libc.so.6 succeeded
The linker also finds this one, but it isn't referred to as libc.so.6, so presumably this is not interfering.
yocto-dir/build/tmp-angstrom-glibc/sysroots/x86_64-linux/usr/libexec/arm-angstrom-linux-gnueabi/gcc/arm-angstrom-linux-gnueabi/6.2.1/ld: Attempt to open yocto-dir/build/tmp-angstrom-glibc/sysroots/imx28scm/usr/lib/libc.so succeeded
Why is there a library version mismatch in this case? How can I get GDB to display symbols from the library which it expects? I do not wish to link statically.
Please make sure the libc on the target device is the same as the one on your build server.
Sorry, this should be a comment, but I don't have enough reputation yet.
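One way to check whether two copies of libc.so.6 come from the same build is to compare their GNU build IDs, which survive stripping, so a large unstripped sysroot copy and a small stripped target copy can still match. The sketch below is a heuristic under stated assumptions: the toolchain emitted a build-ID note, the file is little-endian, and the raw byte scan (rather than a proper ELF header walk) finds the right note. The function names are hypothetical; readelf -n is the conventional tool for the same job.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used by the GNU toolchain for build IDs

def find_gnu_build_id(data):
    """Return the GNU build ID in `data` as a hex string, or None.

    Heuristic: scans raw bytes for a little-endian ELF note header
    (namesz=4, descsz, type=3) followed by the owner name "GNU" plus a
    NUL byte, instead of walking the ELF section/program headers.
    """
    marker = b"GNU\x00"
    pos = data.find(marker)
    while pos != -1:
        if pos >= 12:
            namesz, descsz, ntype = struct.unpack_from("<III", data, pos - 12)
            end = pos + 4 + descsz
            if (namesz == 4 and ntype == NT_GNU_BUILD_ID
                    and 0 < descsz <= 64 and end <= len(data)):
                return data[pos + 4:end].hex()
        pos = data.find(marker, pos + 1)
    return None

def same_build(path_a, path_b):
    """True if both files carry the same (non-missing) GNU build ID."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        id_a = find_gnu_build_id(fa.read())
        id_b = find_gnu_build_id(fb.read())
    return id_a is not None and id_a == id_b
```

To use it, copy libc.so.6 off the device and call same_build on that copy and the one in the sysroot passed to set sysroot. A False result would be consistent with the "wrong library or version mismatch" warning.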
Apparently GDB for ARM target has trouble with trying to load symbols before main() (Debugging shared libraries with gdbserver):
The problem I had was that gdbserver stops at the dynamic loader, before main, and the dynamic libraries are not yet loaded at that point, and so GDB does not know where the symbols will go in memory yet.
GDB appears to have some mechanisms to automatically load shared library symbols, and if I compile for host, and run gdbserver locally, running to main is not needed. But on the ARM target, that is the most reliable thing to do.
Therefore, set it to load shared symbols after main has been hit:
> b main
> c
<breakpoint hit>
> set sysroot <sysroot>
Or reload the symbols after you hit main.
> set sysroot <sysroot>
...
> b main
> c
<breakpoint hit>
> nosharedlibrary
> sharedlibrary
Or it might be useful in interfacing with IDE debuggers to set auto loading of symbols to be off on GDB startup:
> set auto-solib-add off

How to gdb a core file with seg fault in centos 6.5?

My program is multithreaded. I got a core file, and when I try to debug it I get this:
Program terminated with signal 11, Segmentation fault.
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
91 movl %ecx, (%rdi)
Missing separate debuginfos, use: debuginfo-install libssh2-1.8.0-2.0.cf.rhel6.x86_64
(gdb) bt
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
#1 0x00007f981b342feb in ?? ()
#2 0x00000000025f1ef0 in ?? ()
#3 0x00000000025edef0 in ?? ()
#4 0x00007fff4b65a810 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x00000000025cb800 in ?? ()
#7 0x00000000025ccea0 in ?? ()
#8 0x0000000000000000 in ?? ()
Why are the bt entries "??"? Can I identify which thread caused the segfault, and where?
Thank you.
In order to run gdb and make the best use of it, firstly, you need to compile your source with the -g or -ggdb3 option of gcc in the following way:
gcc -ggdb3 sample.c -o sample
After this you will get an executable or binary file which you can execute. Upon execution the program will generate a segfault and a coredump will be created. You can use this core file in the following way with gdb to obtain the backtrace:
gdb ./sample /path/to/core/file
You can even launch your program using gdb without actually executing it separately and generating the core file explicitly. If you want to do this, execute the following command:
gdb ./sample
The "??" entries are where symbol translation failed. Stack walking – which produces the stack trace – can also fail. In that case you'll likely see a single valid frame, then a small number of bogus addresses. If symbols or stacks are too badly broken to make sense of the stack trace, then there are usually ways to fix it: installing debug info packages (giving gdb more symbols, and letting it do DWARF-based stack walks), or recompiling the software from source with frame pointers and debugging information (-fno-omit-frame-pointer -g).
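Since the program in question is multithreaded, it can help to dump a backtrace for every thread in the core rather than only the current one. The sketch below builds such a gdb command line using gdb's -batch and -ex options and the thread apply all bt command. The helper name is made up for this example, and ./sample and /path/to/core/file are the placeholder paths from the answer above.

```python
import shlex

def all_threads_backtrace_argv(executable, core, gdb="gdb"):
    """Build an argv that prints a backtrace for every thread in a core."""
    return [
        gdb,
        "-batch",                      # run the commands, then exit
        "-ex", "thread apply all bt",  # backtrace of each thread
        executable,
        core,
    ]

if __name__ == "__main__":
    argv = all_threads_backtrace_argv("./sample", "/path/to/core/file")
    # Print a copy-pasteable shell command; pass argv to subprocess.run
    # instead if gdb is installed and the core file exists.
    print(shlex.join(argv))
```

Frames that still show "??" in this output point at modules whose symbols are missing, which is where the debuginfo packages mentioned above come in.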

No Debug Symbols cross compile ARM on BusyBox

I'm trying to debug a C program that runs on an ARM926EJ-S rev 5 (v5l). The software was cross-compiled (and is statically linked) with the standard arm-linux-gnueabi compiler (installed via Synaptic). I run Ubuntu 13.04 64-bit. The device runs BusyBox v1.18.2. I successfully compiled gdbserver (with host=arm-linux-gnueabi) and gdb (with target=arm-linux-gnueabi) and can start my program on the embedded device via the locally running gdb...
My problem now is that I don't get a proper backtrace.
Message of gdb:
Remote debugging using 192.168.21.127:2345
0x0000a79c in ?? ()
(gdb) run
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) continue
Continuing.
Cannot access memory at address 0x0
Program received signal SIGINT, Interrupt.
0x00026628 in ?? ()
(gdb) backtrace
#0 0x00026628 in ?? ()
#1 0x00036204 in ?? ()
#2 0x00036204 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I tried compiling the software with -g, -g3 -gdwarf-2, -ggdb, and -ggdb3, without any difference.
Does anybody have an idea what I am missing here?
Is this perhaps a problem with BusyBox, or do I need additional libs on my host system?
I also tried the function backtrace_symbols from execinfo.h, with nearly the same output...
Thanks in advance for any reply.
Another way to debug is to run gdb on the board itself. Follow the steps below.
1) Run gdb and attach it to your process using the attach <pid> command.
2) Continue your process using the c command in gdb.
Whenever the process receives a SIGINT or SIGSEGV, inspect its stack using the bt command in gdb.

gdb : address range mappings

I am analyzing this core dump
Program received signal SIGABRT, Aborted.
0xb7fff424 in __kernel_vsyscall ()
(gdb) where
#0 0xb7fff424 in __kernel_vsyscall ()
#1 0x0050cd71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0050e64a in abort () at abort.c:92
#3 0x08083b3b in ?? ()
#4 0x08095461 in ?? ()
#5 0x0808bdea in ?? ()
#6 0x0808c4e2 in ?? ()
#7 0x080b683b in ?? ()
#8 0x0805d845 in ?? ()
#9 0x08083eb6 in ?? ()
#10 0x08061402 in ?? ()
#11 0x004f8cc6 in __libc_start_main (main=0x805f390, argc=15, ubp_av=0xbfffef64, init=0x825e220, fini=0x825e210,
rtld_fini=0x4cb220 <_dl_fini>, stack_end=0xbfffef5c) at libc-start.c:226
#12 0x0804e5d1 in ?? ()
I can't tell which function each ?? maps to, or, for instance, which address range #10 0x08061402 in ?? () falls in.
Please help me debug this.
Your program has no debugging symbols. Recompile it with -g. Make sure you haven't stripped your executable, e.g. by passing -s to the linker.
Even though @user794080 didn't say so, it appears exceedingly likely that his program is a 32-bit Linux executable.
There are two possible reasons (that I can think of) for symbols from the main executable not to show up (all symbols in the stack trace in the range [0x08040000,0x08100000) are from the main executable):
1) The main executable has in fact been stripped (this is the same as ninjalj's answer); this often happens when '-s' is passed to the linker, perhaps inadvertently.
2) The executable has been compiled with a new(er) GCC, but is being debugged by an old(er) GDB, which chokes on some newer DWARF construct (there should be a warning from GDB about that).
To find out which libraries are mapped into the application, note the pid of your program while it is stopped in gdb, and run in another console:
cat /proc/$pid/maps
where $pid is the pid of the stopped process. The format of the maps file is described at http://linux.die.net/man/5/proc, starting from "/proc/[number]/maps: A file containing the currently mapped memory regions and their access permissions."
Also, if your OS doesn't use ASLR (address space layout randomization), or it is disabled for your program, you can use
ldd ./program
to list the linked libraries and their load addresses. But if ASLR is turned on, you will not be able to get the real memory mapping ranges this way, as they change on each run of the program. Even then, you will know which libraries are linked in dynamically and can install debuginfo for them.
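The /proc/$pid/maps lookup described above can be automated. The following is a minimal sketch that parses maps-format text and reports which region a given address (such as a "??" frame from the backtrace) falls into. The function names and the sample mapping lines are made up for illustration and are not taken from the asker's process.

```python
def parse_maps(maps_text):
    """Parse /proc/<pid>/maps text into (start, end, perms, path) tuples."""
    regions = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # the path (6th field) may contain spaces
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append((int(start_s, 16), int(end_s, 16), fields[1], path))
    return regions

def find_mapping(regions, addr):
    """Return the region containing addr (start <= addr < end), or None."""
    for region in regions:
        if region[0] <= addr < region[1]:
            return region
    return None

if __name__ == "__main__":
    # Made-up sample in the maps format, for illustration only.
    sample = (
        "08048000-08100000 r-xp 00000000 08:01 1234 /home/user/prog\n"
        "004e2000-00639000 r-xp 00000000 08:01 5678 /lib/libc-2.12.so\n"
    )
    print(find_mapping(parse_maps(sample), 0x08061402))
```

For a live process, read the text from /proc/<pid>/maps and pass each unresolved frame address to find_mapping. An address that lands in the main executable's region points to missing symbols in the program itself rather than in a library.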
The stack might be corrupted. The "??" can happen if the return address on the stack has been overwritten by, for example, a buffer overflow.

Resources