I placed a breakpoint on a function label in gdb.
The breakpoint was hit several times, but the last time it appears to have been missed, and a segmentation fault occurred.
What could cause gdb to not stop on a breakpoint?
Why didn't it stop on line 1018 this last time?
(gdb) b ambl_test_event_processor
Breakpoint 1 at 0x5852d6e0: file code/comn/ambl/ambltest.c, line 1018.
(gdb) c
Continuing.
[LWP 1139 exited]
[Switching to LWP 1148]
Thread 39 "Metaswitch_0_3" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72f97b28, mib_data=0x7922b8f4, req_data=0x7280e784, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 code/comn/ambl/ambltest.c: No such file or directory.
(gdb) c
Continuing.
[Switching to LWP 1151]
Thread 42 "Metaswitch_0_6" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72ff9508, mib_data=0x7932bda0, req_data=0x7280e71c, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 in code/comn/ambl/ambltest.c
(gdb) c
Continuing.
[Switching to LWP 1145]
Thread 36 "Metaswitch_0_0" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72efb544, mib_data=0x7280b4e4, req_data=0x7280e6e8, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 in code/comn/ambl/ambltest.c
(gdb) c
Continuing.
[Switching to LWP 1150]
Thread 41 "Metaswitch_0_5" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72ffbc8c, mib_data=0x79622694, req_data=0x7280e750, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 in code/comn/ambl/ambltest.c
(gdb)
Continuing.
Thread 42 "Metaswitch_0_6" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1151]
0x5852d996 in ambl_test_event_processor (ambl_data=0x0, mib_data=0x7932bda0, req_data=0x7280e71c, row_cb=<optimized out>, rc=<optimized out>,
ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1568
1568 in code/comn/ambl/ambltest.c
What could cause gdb to not stop on a breakpoint? Why didn't it stop on line 1018 this last time?
You haven't shown any proof that it didn't.
You have:
Thread 42 "Metaswitch_0_6" hit Breakpoint 1
Thread 36 "Metaswitch_0_0" hit Breakpoint 1,
Thread 41 "Metaswitch_0_5" hit Breakpoint 1,
Thread 42 "Metaswitch_0_6" received signal SIGSEGV
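There is no indication that thread 42 returned from ambl_test_event_processor() after hitting the breakpoint, entered it again without stopping, and only then crashed.
More likely, thread 42 hit the breakpoint at line 1018 (other threads then hit it too, but that is irrelevant), continued inside the same call, and crashed at line 1568 of the same function. The crash frame shows the same mib_data and req_data values that thread 42 had when it hit the breakpoint, which is consistent with a single call.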
Related
I'm trying to debug a hard fault in a C++ firmware project for the microbit v1.5.
After a hard fault I would like to reset the microcontroller and start anew, but issuing the dreaded monitor reset halt does not work: execution never restarts properly after a hard fault.
I'm using pyocd (v0.33.1) as my gdb debugserver and a custom built gdb (v8.2.1) with proper support for the nrf51 series.
This is an example interaction with gdb. I set a breakpoint on HardFault_Handler and start execution. The firmware correctly spawns tasks but eventually one of the tasks faults and the HardFault handler gets called. After this I would like to reset the microcontroller and start anew.
I expect the microcontroller to spawn the same set of tasks, but this never happens, and it also never goes back to main, so I think there must be a specific way to reset it correctly.
What command should I issue to reset the flow of execution to start with main or one of the routines from gcc_startup?
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y 0x000290e2 ../support/libs/nrfx/mdk/gcc_startup_nrf51.S:234
(gdb) c
Continuing.
[New Thread 2]
[New Thread 536884080]
[New Thread 536880760]
[New Thread 536884152]
Thread 2 "Handler mode" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2]
0x000006b0 in ?? ()
(gdb) info threads
Id Target Id Frame
* 2 Thread 2 "Handler mode" (HardFault) 0x000006b0 in ?? ()
3 Thread 536884080 "IDL" (Ready; Priority 0) prvIdleTask (pvParameters=0x0)
at ../support/freertos/tasks.c:3225
4 Thread 536880760 "KNL" (Ready; Priority 1) starlight::sys::Task::<lambda(void*)>::_FUN(void *)
() at ../include/starlight/sys/task.hpp:154
5 Thread 536884152 "Tmr" (Running; Priority 2) __DSB ()
at ../support/libs/CMSIS-Core/Include/cmsis_gcc.h:946
(gdb) monitor reset halt
Resetting target with halt
Successfully halted device on reset
(gdb) c
Continuing.
[New Thread 1]
Thread 6 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1]
0x000006b0 in ?? ()
(gdb) info threads
Id Target Id Frame
* 6 Thread 1 (HardFault) 0x000006b0 in ?? ()
(gdb) monitor reset halt
Resetting target with halt
Successfully halted device on reset
(gdb) c
Continuing.
Thread 6 received signal SIGSEGV, Segmentation fault.
0x000006b0 in ?? ()
(gdb) backtrace
#0 0x000006b0 in ?? ()
#1 <signal handler called>
Backtrace stopped: Cannot access memory at address 0x4b0547f8
I have a program with over 300 threads, to which I have attached gdb. I need to identify one particular thread: the one whose call stack has a frame containing a variable with a specific value. Can I script this in gdb?
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f16c1eeb700 (LWP 18833))]
#4 0x00007f17f3a3bdd5 in start_thread () from /lib64/libpthread.so.0
(gdb) backtrace
#0 0x00007f17f3a3fd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f17e72838be in __afr_shd_healer_wait (healer=healer@entry=0x7f17e05203d0) at afr-self-heald.c:101
#2 0x00007f17e728392d in afr_shd_healer_wait (healer=healer@entry=0x7f17e05203d0) at afr-self-heald.c:125
#3 0x00007f17e72848e8 in afr_shd_index_healer (data=0x7f17e05203d0) at afr-self-heald.c:572
#4 0x00007f17f3a3bdd5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f17f3302ead in clone () from /lib64/libc.so.6
(gdb) frame 3
#3 0x00007f17e72848e8 in afr_shd_index_healer (data=0x7f17e05203d0) at afr-self-heald.c:572
572 afr_shd_healer_wait (healer);
(gdb) p this->name
$6 = 0x7f17e031b910 "testvol-replicate-0"
For example, can I run a macro that loops over each thread, goes to frame 3 in each of them, inspects the variable this->name, and prints the thread number only if the value matches testvol-replicate-0?
GDB can be scripted with Python. With the GDB Python API, you can loop over the threads and search for a match. Below are two articles on debugging threads with GDB and Python.
https://www.linuxjournal.com/article/11027
https://fy.blackhats.net.au/blog/html/2017/08/04/so_you_want_to_script_gdb_with_python.html
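As a rough sketch of that idea (not taken from the linked articles), the script below walks every thread, selects frame 3, and prints the GDB thread number when this->name equals testvol-replicate-0. The helper names frame_at and find_threads are made up for illustration, and the guarded import exists only so the file can also be loaded outside GDB. It assumes the variable of interest always sits at the same frame level, as in the question.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None  # assumption: allows loading the helpers outside GDB


def frame_at(api, level):
    """Return the frame `level` steps older than the newest one, or None."""
    frame = api.newest_frame()
    for _ in range(level):
        if frame is None:
            return None
        frame = frame.older()
    return frame


def find_threads(api, level, expression, wanted):
    """Return GDB thread numbers where `expression`, evaluated in frame
    `level`, is a string equal to `wanted`."""
    original = api.selected_thread()
    matches = []
    try:
        for thread in api.selected_inferior().threads():
            thread.switch()
            try:
                frame = frame_at(api, level)
                if frame is None:
                    continue  # this stack is shallower than `level`
                frame.select()
                if api.parse_and_eval(expression).string() == wanted:
                    matches.append(thread.num)
            except (api.error, UnicodeError):
                continue  # expression is not valid or not a string here
    finally:
        if original is not None:
            original.switch()  # restore the thread that was selected
    return sorted(matches)


if gdb is not None:
    for num in find_threads(gdb, 3, "this->name", "testvol-replicate-0"):
        print("thread %d matches" % num)
```

Save it under any name (find_thread.py is a hypothetical choice) and run source find_thread.py at the (gdb) prompt while attached to the process. If the frame level differs between threads, the loop in frame_at could be replaced by a search over all frames.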
Consider the following program.
#include <unistd.h>
int main(){
sleep(1000);
}
If we run strace on this program, the last line that appears before the long sleep is the following.
nanosleep({1000, 0},
While the program is asleep, the code is executing (likely blocked) inside the OS kernel.
When I run the program under gdb, if I send SIGINT in the middle of the sleep, I can collect various information about the main thread, such as its backtrace and various register values.
Is there some expression in gdb that evaluates to true iff the thread must cross a syscall boundary before executing code in userspace again?
Ideally, there would be a cross-platform solution, but platform-specific solutions are also useful.
Clarification: I do not care whether the thread is actually executing; only whether its most recent program counter value was in kernel code or user code.
Put another way, can gdb tell us whether a particular thread has entered the kernel but not yet exited the kernel?
Is there some expression in gdb that evaluates to true if the
thread must cross a syscall boundary before executing code in
userspace again?
You can try to use catch syscall nanosleep, see documentation.
catch syscall nanosleep stops on two events: entry to the system call and return from it. You can use info breakpoints to see how many times the catchpoint has been hit. If the count is even, you should be in user space. If it is odd, you should be in kernel space:
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) catch syscall nanosleep
Catchpoint 1 (syscall 'nanosleep' [35])
(gdb) i b
Num Type Disp Enb Address What
1 catchpoint keep y syscall "nanosleep"
(gdb) r
Starting program: /home/ks1322/a.out
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.27-8.fc28.x86_64
Catchpoint 1 (call to syscall nanosleep), 0x00007ffff7adeb54 in nanosleep () from /lib64/libc.so.6
(gdb) i b
Num Type Disp Enb Address What
1 catchpoint keep y syscall "nanosleep"
catchpoint already hit 1 time
(gdb) c
Continuing.
Catchpoint 1 (returned from syscall nanosleep), 0x00007ffff7adeb54 in nanosleep () from /lib64/libc.so.6
(gdb) i b
Num Type Disp Enb Address What
1 catchpoint keep y syscall "nanosleep"
catchpoint already hit 2 times
(gdb) c
Continuing.
[Inferior 1 (process 19515) exited normally]
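For a Linux-specific cross-check outside gdb, the kernel exposes a per-thread file, /proc/PID/task/TID/syscall, that reports the system call a blocked thread is currently in. This is not part of the catchpoint approach above, only a sketch under the assumption that the file is available and readable on your system. The function names are made up for illustration, and it is worth checking on your kernel how a thread stopped by gdb is reported.

```python
def parse_syscall_status(text):
    """Interpret one line read from /proc/PID/task/TID/syscall.

    Returns the syscall number if the thread is blocked inside a system
    call, or None if it is running or blocked outside a system call.
    Note that 0 is a valid syscall number, so compare against None.
    """
    fields = text.split()
    if not fields or fields[0] == "running":
        return None  # thread is not blocked at all
    number = int(fields[0])
    # The kernel prints -1 when the thread is blocked outside a syscall.
    return number if number >= 0 else None


def thread_syscall(pid, tid):
    """Read and interpret the syscall file for one thread (Linux only)."""
    path = "/proc/%d/task/%d/syscall" % (pid, tid)
    with open(path) as handle:
        return parse_syscall_status(handle.read())
```

The TID is the LWP number that gdb prints in info threads. On x86-64, a thread sleeping in the program above would be expected to report 35, the nanosleep number shown by the catchpoint.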
I want to debug a C function of an R package using R -d gdb. After setting a breakpoint at the C function C_MIM(), I get the output below, along with "cannot find bound of the current function", so I cannot print any variable values. Is there something I am doing wrong? Or is it not possible to debug some R packages?
Breakpoint 1, 0x00007fffdee0035f in C_MIM ()
from /home/sunxd/R/x86_64-pc-linux-gnu-library/3.4/praznik/libs/praznik.so
(gdb) list
76 in ../sysdeps/unix/syscall-template.S
(gdb) n
Single stepping until exit from function C_MIM,
which has no line number information.
^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7fffdddfa700 (LWP 21179)]
---Type <return> to continue, or q <return> to quit---
0x00007ffff45c707e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
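It turns out that one must have the source code and compile the R package with debug information enabled. The message "which has no line number information" means praznik.so was built without it. Compiling with gcc/cc debug options such as -g (and ideally -O0) gives gdb the line and variable information it needs.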
I'd like to know if my program is accessing NULL pointers or stale memory.
The backtrace looks like this:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b0fa4c8 (LWP 1333)]
0x299a6ad4 in pthread_mutex_lock () from /lib/libpthread.so.0
(gdb) bt
#0 0x299a6ad4 in pthread_mutex_lock () from /lib/libpthread.so.0
#1 0x0058e900 in ?? ()
With GDB 7 and higher, you can examine the $_siginfo structure that is filled out when the signal occurs, and determine the faulting address:
(gdb) p $_siginfo._sifields._sigfault.si_addr
If it shows (void *) 0x0 (or a small number) then you have a NULL pointer dereference.
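The "small number" case arises because reading a member through a NULL struct pointer faults at the member's offset, not at address 0. The sketch below illustrates this with a hypothetical struct layout (Node and its fields are invented for the example). The 4096-byte threshold is only a heuristic assumption for "small", based on a typical page size.

```python
import ctypes


class Node(ctypes.Structure):
    # Hypothetical layout, for illustration only.
    _fields_ = [
        ("id", ctypes.c_int),
        ("label", ctypes.c_char * 64),
        ("next", ctypes.c_void_p),
    ]


def fault_address(base, field):
    """Address touched when reading `field` through a pointer equal to `base`."""
    return base + getattr(Node, field).offset


def looks_like_null_deref(address, page_size=4096):
    """Heuristic: an address inside the first page suggests NULL plus an offset."""
    return 0 <= address < page_size


# Reading node->next with node == NULL touches a small, nonzero address.
print(hex(fault_address(0, "next")))
print(looks_like_null_deref(fault_address(0, "next")))
```

An address far from zero, like the 0x299a6ad4 code address in the backtrace above, would not pass this check. That points toward stale or corrupted memory more than toward a plain NULL pointer.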
Run your program under GDB. When the segfault occurs, GDB reports where it happened: the function and, if the program was built with debug information, the source file and line.
You can use the "print" (p) command in GDB to inspect variables. If the crash occurred in a library call, you can use the "frame" commands (frame, up, down) to select the stack frame of your own code and inspect the arguments that were passed in.