I have a segfault coming from an OpenMP (pthreads) region, and I get a very unhelpful backtrace:
/usr/src/packages/BUILD/glibc-2.11.1/string/wordcopy.c:85
/usr/src/packages/BUILD/glibc-2.11.1/string/./memmove.c:73
??:0
??:0
??:0
/usr/src/packages/BUILD/glibc-2.11.1/nptl/pthread_create.c:301
Is there any way to get the actual backtrace within the pthread context?
The OP doesn't say, but I'm assuming the code is built with gcc/g++.
If the code is compiled with -g, try this command at the gdb prompt when you debug the core:
thread apply all bt full
It shows the full stack trace, including local variables, for every thread in the process.
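A complementary trick, not part of the original answer: with many threads, the thread apply all bt full output is easier to read if each thread has a recognizable name. The sketch below assumes Linux and glibc, where pthread_setname_np is a GNU extension limited to 16 bytes including the terminating NUL; recent gdb versions show these names in info threads when attached to a live process. name_current_thread, worker and run_named_workers are hypothetical helpers.

```c
/* Sketch: name worker threads so they are easy to tell apart in gdb.
 * Assumes Linux + glibc (pthread_setname_np is a GNU extension). */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

#define THREAD_NAME_MAX 16   /* Linux limit, including the terminating NUL */
#define MAX_WORKERS 8

/* Name the calling thread "worker-<index>". Returns 0 on success. */
int name_current_thread(int index)
{
    char name[THREAD_NAME_MAX];
    snprintf(name, sizeof name, "worker-%d", index);  /* truncates safely */
    return pthread_setname_np(pthread_self(), name);
}

/* Hypothetical worker body: name the thread first, then do the real work. */
static void *worker(void *arg)
{
    int index = *(const int *)arg;
    name_current_thread(index);
    /* ... real work goes here; a crash here shows up as "worker-<index>" ... */
    return NULL;
}

/* Start up to MAX_WORKERS named threads and join them. Returns 0 on success. */
int run_named_workers(int n)
{
    pthread_t tids[MAX_WORKERS];
    int ids[MAX_WORKERS];
    int started = 0, rc = 0;

    if (n > MAX_WORKERS)
        n = MAX_WORKERS;
    for (; started < n; started++) {
        ids[started] = started;
        if (pthread_create(&tids[started], NULL, worker, &ids[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++)   /* join only the threads that started */
        pthread_join(tids[i], NULL);
    return rc;
}
```

In an OpenMP region, calling name_current_thread(omp_get_thread_num()) at the top of the parallel block (built with -fopenmp) should name libgomp's worker threads the same way, though that variant is untested here.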
Related
I'm writing a library that hooks some CUDA functions to add some functionality. The "constructor" hooks the CUDA functions and sets up a message queue and shared memory to communicate with other hooked CUDA binaries. When I launch several hooked CUDA binaries (with Python's subprocess.Popen('<path-to-binary>', shell=True)), some processes hang. So I used gdb -p <pid> to attach to one of the suspended processes, hoping to figure out what's going wrong. Here's the result:
Attaching to process 7445
Reading symbols from /bin/dash...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.27.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
78 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
#1 0x000055fff93be8a0 in ?? ()
#2 0x000055fff93c009d in ?? ()
#3 0x000055fff93ba6d8 in ?? ()
#4 0x000055fff93b949e in ?? ()
#5 0x000055fff93b9eda in ?? ()
#6 0x000055fff93b7944 in ?? ()
#7 0x00007f9cefdc8b97 in __libc_start_main (main=0x55fff93b7850, argc=3, argv=0x7ffca7c7beb8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca7c7bea8) at ../csu/libc-start.c:310
#8 0x000055fff93b7a4a in ?? ()
I've added the -g flag, but it seems that the program hangs in wait4 before entering main.
Thanks for any insights on:
How can I load these debug symbols to get rid of ??
Where is ../csu/libc-start.c:310 located?
What else can I do to locate the bug?
System info: gcc 6.5.0, Ubuntu 18.04 with kernel 4.15.0-54-generic.
How can I load these debug symbols to get rid of ??
You appear to need the debug symbols for /bin/dash, which are probably going to be in a package called dash-dbg or dash-dbgsym or something like that.
Also, I suspect your stack trace would make more sense if you compiled your library with -fno-optimize-sibling-calls.
Where is ../csu/libc-start.c:310 located?
See this answer.
What else can I do to locate the bug?
You said that you are writing a library that uses __attribute__((constructor)), but you showed a stack trace for /bin/dash (which I presume is DASH and not a program you wrote) that does not appear to involve symbols from your library. I infer from this that your library is loaded with LD_PRELOAD into programs that are not expecting it to be there.
Both of those things -- LD_PRELOAD and __attribute__((constructor)) -- break the normal expectations of both whatever unsuspecting program is involved, and the C library. You should only do those things if you have no other choice, and you should try to do as little as possible within the injected code. (In particular, I do not think any design that involves spawning processes from a constructor function will be workable, period.) If you tell us about your larger goals we may be able to suggest alternative means that are less troublesome.
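To illustrate the "do as little as possible" advice, here is a minimal sketch that assumes the heavy setup can wait until the first hooked call: the constructor only records that the library was loaded, and pthread_once runs the real initialization exactly once on first use, even if several threads race. hooks_ctor, hooks_init_real, hooked_entry_point and constructor_has_run are hypothetical names, not part of any CUDA API.

```c
/* Sketch: trivial constructor, real setup deferred to first use. */
#include <pthread.h>
#include <stdatomic.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static atomic_int init_runs = 0;   /* how many times the real init ran */
static atomic_int ctor_ran = 0;    /* set by the constructor only */

/* Runs before main (or at dlopen time): just record that we were loaded.
 * No threads, no IPC setup, no fork/exec in here. */
__attribute__((constructor))
static void hooks_ctor(void)
{
    atomic_store(&ctor_ran, 1);
}

/* Hypothetical heavy setup (message queue, shared memory, ...). */
static void hooks_init_real(void)
{
    atomic_fetch_add(&init_runs, 1);
}

/* Hypothetical hooked entry point: the first caller triggers the setup,
 * later callers (from any thread) skip it. */
int hooked_entry_point(void)
{
    pthread_once(&init_once, hooks_init_real);
    return atomic_load(&init_runs);
}

int constructor_has_run(void)
{
    return atomic_load(&ctor_ran);
}
```

Deferring setup this way also means that processes which never call a hooked function, such as an intermediate /bin/sh, never pay for it.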
EDIT:
subprocess.Popen('<path-to-binary>', shell=True)
With shell=True, Python doesn't invoke the program directly, it runs a command of the form /bin/sh -c 'string passed to Popen'. In many cases this will naturally produce a /bin/dash process sleeping (not hung) in a wait syscall for the entire lifetime of the actual binary. Unless you actually need to evaluate some shell code before running the program, try the default shell=False instead and see if that makes your problem go away. (If you do need to evaluate shell code, try Popen('<shell code>; exec <binary>', shell=True).)
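For readers launching the binaries from C rather than Python, a rough equivalent of the two Popen forms is sketched below with posix_spawnp: run_direct corresponds to shell=False, and run_via_shell_exec to the 'exec <binary>' variant, where the shell replaces itself with the program instead of waiting in wait4. The helper names and the 256-byte command buffer are assumptions of this sketch.

```c
/* Sketch: spawn a program directly vs. through "sh -c 'exec ...'". */
#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn argv[0] (searched in PATH), wait for it, return its exit status or -1. */
static int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Like Popen(['prog'], shell=False): no intermediate shell process. */
int run_direct(const char *prog)
{
    char *argv[] = { (char *)prog, NULL };
    return spawn_and_wait(argv);
}

/* Like Popen('exec prog', shell=True): sh parses the string, then "exec"
 * replaces the shell with prog, so no /bin/sh is left sleeping in wait4. */
int run_via_shell_exec(const char *prog)
{
    char cmd[256];
    snprintf(cmd, sizeof cmd, "exec %s", prog);
    char *argv[] = { (char *)"sh", (char *)"-c", cmd, NULL };
    return spawn_and_wait(argv);
}
```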
My program is multithreaded. I got a core file, and when I try to debug it I get this:
Program terminated with signal 11, Segmentation fault.
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
91 movl %ecx, (%rdi)
Missing separate debuginfos, use: debuginfo-install libssh2-1.8.0-2.0.cf.rhel6.x86_64
(gdb) bt
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
#1 0x00007f981b342feb in ?? ()
#2 0x00000000025f1ef0 in ?? ()
#3 0x00000000025edef0 in ?? ()
#4 0x00007fff4b65a810 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x00000000025cb800 in ?? ()
#7 0x00000000025ccea0 in ?? ()
#8 0x0000000000000000 in ?? ()
Why are the bt entries "??"? Can I identify which thread caused the segfault, and where?
Thank you.
To run gdb and make the best use of it, you first need to compile your source with gcc's -g or -ggdb3 option, as follows:
gcc -ggdb3 sample.c -o sample
This gives you an executable. When you run it, the program segfaults and, provided core dumps are enabled (for example with ulimit -c unlimited), a core file is created. You can then load that core file into gdb to obtain the backtrace:
gdb ./sample /path/to/core/file
You can also launch the program under gdb directly, without running it separately and generating a core file first. To do this, start gdb on the executable as below, then type run at the (gdb) prompt and bt once it crashes:
gdb ./sample
The "??" entries are where symbol translation failed. Stack walking, which produces the stack trace, can also fail; in that case you'll typically see a single valid frame followed by a few bogus addresses. If the symbols or the stack are too badly broken to make sense of the trace, there are usually ways to fix it: install the debug info packages (which gives gdb more symbols and lets it do DWARF-based stack walks), or recompile the software from source with frame pointers and debugging information (-fno-omit-frame-pointer -g).
When I try to debug a core file with gdb, I don't see any valid stack trace to work from (similar to the issue reported in the link below). Can you please help me with how to debug this further? Any pointers or gdb commands that would help in triaging the problem are welcome.
GDB debugging trace with no relevant info (#0 0x2e6e6f69 in ?? ())
(gdb) where
#0 0x76c0da28 in ?? ()
#1 0x76c0d9e0 in ?? ()
#2 0x76c0d9e0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
EDIT: To rule out problems with the -g flag or the host environment, I intentionally added code that crashes, and I was able to get the correct stack trace from that core file.
When I try to debug core file with gdb I dont see any valid stack trace
Was the core produced on the same host where it is being analyzed?
If not, this answer explains what you need to do.
I have core files such as core.1678 and core.1689. How can I resolve this problem using gdb? I have tried the gdb bt option, but it does not resolve the error.
gdb -bt core.1678
(gdb) core
No core file now.
(gdb) n
The program is not being run.
(gdb) r
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.
(gdb) core.1678
/home/deepak/deepak/mss/.1678: No such file or directory.
(gdb) /home/deepak/deepak/mss/core.1678
Please help me out.
many core files (e.g. core.1678 etc.)...
This indicates that the same program, or different programs, in that particular directory keep crashing. When your machine is configured to generate dump files, it creates each one in the form core.(PID). There are many useful articles about core dump files; you may also refer to my blog, which explains core dump analysis and its internals.
http://mantoshopensource.blogspot.sg/2011/02/core-dump-analysis-part-ii.html
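Whether a core file appears at all also depends on the process's core-size limit. A minimal sketch, assuming Linux: enable_core_dumps raises the soft RLIMIT_CORE limit to the hard limit (what ulimit -c does from the shell, within the hard cap), and read_core_pattern shows where the kernel will write the file; the core.(PID) naming itself is controlled by /proc/sys/kernel/core_pattern and /proc/sys/kernel/core_uses_pid. Both helper names are hypothetical.

```c
/* Sketch: allow this process (and children it starts later) to dump core. */
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Raise the soft core-size limit to the hard limit. Returns 0 on success. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* unprivileged processes may go up to max */
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Copy the kernel's core_pattern into buf. Returns 0 on success. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```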
The basic command to load and analyze the core dump file using GDB is as follows:
mantosh#ubuntu:~$ gdb
// This is how you would open the core dump file.
(gdb) core core.23515
(no debugging symbols found)
Core was generated by `./otest LinuxWorldRocks 10'.
Program terminated with signal 11, Segmentation fault.
[New process 23515]
==> Signal 11(SIGSEGV) was the reason for this core-dump file
==> pid of a program is 23515
#0 0x080485f8 in ?? ()
// Load the debug symbols of your program (built with the -g option)
(gdb) symbol ./otest
Reading symbols from /home/mantosh/Desktop/otest...done.
// Now you can run any normal command you would use while debugging (except breakpoints).
(gdb) bt
#0 0x080485f8 in printf_info (info=0x8ec5008 "LinuxWorld") at test.c:58
#1 0x080485c2 in my_memcpy (dest=0x8ec5012 "", source=0xbfb9c6fe "Rocks",
length=10) at test.c:47
#2 0x0804855d in main (argc=3, argv=0xbfb9b3f4) at test.c:33
EDIT
A core dump file is a snapshot of the program at the time of the exception/segmentation fault. So once you load a core dump in GDB, you can only run commands that read its memory and state (bt, print, info registers, x, and so on). You cannot use execution commands such as breakpoints, continue, run, etc.
I'm trying to debug a C program that runs on an ARM926EJ-S rev 5 (v5l). The software was cross-compiled (and statically linked) with the standard arm-linux-gnueabi compiler (installed via Synaptic). I run Ubuntu 13.04 64-bit. The device runs BusyBox v1.18.2. I successfully compiled gdbserver (with host=arm-linux-gnueabi) and gdb (with target=arm-linux-gnueabi) and can start my program on the embedded device via the locally running gdb...
My problem now is that I don't get proper backtrace output.
gdb output:
Remote debugging using 192.168.21.127:2345
0x0000a79c in ?? ()
(gdb) run
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) continue
Continuing.
Cannot access memory at address 0x0
Program received signal SIGINT, Interrupt.
0x00026628 in ?? ()
(gdb) backtrace
#0 0x00026628 in ?? ()
#1 0x00036204 in ?? ()
#2 0x00036204 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I tried compiling the software with -g, -g3 -gdwarf-2, -ggdb, and -ggdb3, without any difference.
Does anybody have an idea what I am missing here?
Is this maybe a problem with BusyBox, or do I need additional libraries on my host system?
I also tried the function backtrace_symbols from execinfo.h, with nearly the same output...
Thanks in advance for any reply.
Another way to debug is to run gdb on the board itself, following these steps:
1) Start gdb and attach it to your process using the attach <pid> command.
2) Continue your process using the c command in gdb.
3) Whenever you get a SIGINT or SIGSEGV, look at the stack of your process using the bt command in gdb.
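As for the backtrace_symbols route mentioned in the question, a common pattern is to dump the crashing thread's stack from a SIGSEGV handler and then re-raise the signal so a core is still written. This is a sketch assuming glibc's execinfo.h, with caveats: backtrace is not formally async-signal-safe, function names only resolve for symbols exported with -rdynamic, and on ARM the code may also need -funwind-tables, so on a statically linked target like the one above it may still print only raw addresses. dump_backtrace_fd and install_segv_backtrace are hypothetical helper names.

```c
/* Sketch: print the faulting thread's backtrace from a SIGSEGV handler. */
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write the calling thread's stack to fd. Returns the number of frames. */
int dump_backtrace_fd(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);   /* no malloc, unlike backtrace_symbols */
    return n;
}

static void segv_handler(int sig)
{
    static const char msg[] = "caught SIGSEGV, backtrace:\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    dump_backtrace_fd(STDERR_FILENO);
    /* Restore the default action and re-raise: the signal stays pending until
     * the handler returns, then terminates the process and dumps core. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install the handler. Returns 0 on success. */
int install_segv_backtrace(void)
{
    void *warmup[1];
    struct sigaction sa;

    backtrace(warmup, 1);   /* first call may load libgcc; do it outside the handler */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```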