Reset after hard fault - arm

I'm trying to debug a hard fault in a C++ firmware project for the micro:bit v1.5.
After a hard fault I would like to reset the microcontroller and start anew,
but issuing the dreaded monitor reset halt does not work: execution never restarts properly.
I'm using pyocd (v0.33.1) as my gdb debugserver and a custom built gdb (v8.2.1) with proper support for the nrf51 series.
This is an example interaction with gdb. I set a breakpoint on HardFault_Handler and start execution. The firmware correctly spawns tasks but eventually one of the tasks faults and the HardFault handler gets called. After this I would like to reset the microcontroller and start anew.
I expect the microcontroller to spawn the same set of tasks, but this never happens, and it never gets back to main either, so I suspect there must be a specific way to reset it correctly.
What command should I issue to reset the flow of execution to start with main or one of the routines from gcc_startup?
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y 0x000290e2 ../support/libs/nrfx/mdk/gcc_startup_nrf51.S:234
(gdb) c
Continuing.
[New Thread 2]
[New Thread 536884080]
[New Thread 536880760]
[New Thread 536884152]
Thread 2 "Handler mode" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2]
0x000006b0 in ?? ()
(gdb) info threads
Id Target Id Frame
* 2 Thread 2 "Handler mode" (HardFault) 0x000006b0 in ?? ()
3 Thread 536884080 "IDL" (Ready; Priority 0) prvIdleTask (pvParameters=0x0)
at ../support/freertos/tasks.c:3225
4 Thread 536880760 "KNL" (Ready; Priority 1) starlight::sys::Task::<lambda(void*)>::_FUN(void *)
() at ../include/starlight/sys/task.hpp:154
5 Thread 536884152 "Tmr" (Running; Priority 2) __DSB ()
at ../support/libs/CMSIS-Core/Include/cmsis_gcc.h:946
(gdb) monitor reset halt
Resetting target with halt
Successfully halted device on reset
(gdb) c
Continuing.
[New Thread 1]
Thread 6 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1]
0x000006b0 in ?? ()
(gdb) info threads
Id Target Id Frame
* 6 Thread 1 (HardFault) 0x000006b0 in ?? ()
(gdb) monitor reset halt
Resetting target with halt
Successfully halted device on reset
(gdb) c
Continuing.
Thread 6 received signal SIGSEGV, Segmentation fault.
0x000006b0 in ?? ()
(gdb) backtrace
#0 0x000006b0 in ?? ()
#1 <signal handler called>
Backtrace stopped: Cannot access memory at address 0x4b0547f8

Related

What can cause gdb to not stop on a breakpoint?

I placed a breakpoint on a function label in gdb.
This breakpoint was hit multiple times, but the last time it was missed and a segmentation fault occurred.
What could cause gdb to not stop on a breakpoint?
Why didn't it stop on line 1018 this last time?
(gdb) b ambl_test_event_processor
Breakpoint 1 at 0x5852d6e0: file code/comn/ambl/ambltest.c, line 1018.
(gdb) c
Continuing.
[LWP 1139 exited]
[Switching to LWP 1148]
Thread 39 "Metaswitch_0_3" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72f97b28, mib_data=0x7922b8f4, req_data=0x7280e784, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 code/comn/ambl/ambltest.c: No such file or directory.
(gdb) c
Continuing.
[Switching to LWP 1151]
Thread 42 "Metaswitch_0_6" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72ff9508, mib_data=0x7932bda0, req_data=0x7280e71c, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 in code/comn/ambl/ambltest.c
(gdb) c
Continuing.
[Switching to LWP 1145]
Thread 36 "Metaswitch_0_0" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72efb544, mib_data=0x7280b4e4, req_data=0x7280e6e8, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 in code/comn/ambl/ambltest.c
(gdb) c
Continuing.
[Switching to LWP 1150]
Thread 41 "Metaswitch_0_5" hit Breakpoint 1, ambl_test_event_processor (ambl_data=0x72ffbc8c, mib_data=0x79622694, req_data=0x7280e750, row_cb=0x0,
rc=1, ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1018
1018 in code/comn/ambl/ambltest.c
(gdb)
Continuing.
Thread 42 "Metaswitch_0_6" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1151]
0x5852d996 in ambl_test_event_processor (ambl_data=0x0, mib_data=0x7932bda0, req_data=0x7280e71c, row_cb=<optimized out>, rc=<optimized out>,
ambl_replication=0 '\000') at code/comn/ambl/ambltest.c:1568
1568 in code/comn/ambl/ambltest.c
What could cause gdb to not stop on a breakpoint? Why didn't it stop on line 1018 this last time?
You haven't shown any proof that it didn't.
You have:
Thread 42 "Metaswitch_0_6" hit Breakpoint 1
Thread 36 "Metaswitch_0_0" hit Breakpoint 1,
Thread 41 "Metaswitch_0_5" hit Breakpoint 1,
Thread 42 "Metaswitch_0_6" received signal SIGSEGV
There is no indication that thread 42 has finished running ambl_test_event_processor() after it hit the breakpoint, and then somehow skipped that breakpoint, then crashed.
More likely is that it hit the breakpoint (then some other threads also hit it, but that is irrelevant), then crashed.

Is it possible to determine in gdb whether a thread is executing (or blocked) in kernel or user space?

Consider the following program.
#include <unistd.h>
int main(){
sleep(1000);
}
If we run strace on this program, the last line that appears before the long sleep is the following.
nanosleep({1000, 0},
While the program is asleep, the code is executing (likely blocked) inside the OS kernel.
When I run the program under gdb, if I send SIGINT in the middle of the sleep, I can collect various information about the main thread, such as its backtrace and various register values.
Is there some expression in gdb that evaluates to true iff the thread must cross a syscall boundary before executing code in userspace again?
Ideally, there would be a cross-platform solution, but platform-specific solutions are also useful.
Clarification: I do not care whether the thread is actually executing; only whether its most recent program counter value was in kernel code or user code.
Put another way, can gdb tell us whether a particular thread has entered the kernel but not yet exited the kernel?
Is there some expression in gdb that evaluates to true if the
thread must cross a syscall boundary before executing code in
userspace again?
You can try to use catch syscall nanosleep, see documentation.
catch syscall nanosleep stops on two events: the call to the system call and the return from it. You can use info breakpoints to see how many times this catchpoint has been hit. If the count is even, you should be in user space; if it is odd, you should be in kernel space:
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) catch syscall nanosleep
Catchpoint 1 (syscall 'nanosleep' [35])
(gdb) i b
Num Type Disp Enb Address What
1 catchpoint keep y syscall "nanosleep"
(gdb) r
Starting program: /home/ks1322/a.out
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.27-8.fc28.x86_64
Catchpoint 1 (call to syscall nanosleep), 0x00007ffff7adeb54 in nanosleep () from /lib64/libc.so.6
(gdb) i b
Num Type Disp Enb Address What
1 catchpoint keep y syscall "nanosleep"
catchpoint already hit 1 time
(gdb) c
Continuing.
Catchpoint 1 (returned from syscall nanosleep), 0x00007ffff7adeb54 in nanosleep () from /lib64/libc.so.6
(gdb) i b
Num Type Disp Enb Address What
1 catchpoint keep y syscall "nanosleep"
catchpoint already hit 2 times
(gdb) c
Continuing.
[Inferior 1 (process 19515) exited normally]
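On Linux specifically, a different route that does not go through gdb is the per-thread file /proc/<pid>/task/<tid>/syscall described in proc(5). That file contains "running" if the thread is not blocked, a leading -1 if it is blocked outside a system call, and otherwise the system call number followed by register values. The sketch below is not part of the answer above; it only classifies that first field. The names thread_state, classify_syscall_line and read_thread_state are made up for illustration. It assumes a kernel that provides the file and a caller with ptrace-level permission on the target. What the file reports for a thread that gdb currently has stopped may differ and is not covered here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical names for illustration; not part of gdb or any library. */
enum thread_state {
    THREAD_STATE_UNKNOWN = 0,   /* line could not be read or interpreted */
    THREAD_RUNNING,             /* not blocked: the file contains "running" */
    THREAD_BLOCKED_NO_SYSCALL,  /* blocked, but not inside a system call (-1) */
    THREAD_IN_SYSCALL           /* blocked inside the system call that leads the line */
};

/* Classify one line in the format proc(5) documents for /proc/<pid>/syscall.
 * On THREAD_IN_SYSCALL, *nr (if non-NULL) receives the system call number. */
enum thread_state classify_syscall_line(const char *line, long *nr)
{
    char *end;
    long value;

    if (line == NULL)
        return THREAD_STATE_UNKNOWN;
    if (strncmp(line, "running", 7) == 0)
        return THREAD_RUNNING;

    value = strtol(line, &end, 10);
    if (end == line)
        return THREAD_STATE_UNKNOWN;        /* no leading number */
    if (value < 0)
        return THREAD_BLOCKED_NO_SYSCALL;
    if (nr != NULL)
        *nr = value;
    return THREAD_IN_SYSCALL;
}

/* Read and classify /proc/<pid>/task/<tid>/syscall for one thread. */
enum thread_state read_thread_state(pid_t pid, pid_t tid, long *nr)
{
    char path[64];
    char line[256];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/task/%ld/syscall",
             (long)pid, (long)tid);
    f = fopen(path, "r");
    if (f == NULL)
        return THREAD_STATE_UNKNOWN;        /* file missing or no permission */
    if (fgets(line, sizeof line, f) == NULL)
        line[0] = '\0';                     /* empty read is classified as unknown */
    fclose(f);
    return classify_syscall_line(line, nr);
}
```

For the sleeping program in the question, one would expect read_thread_state() to return THREAD_IN_SYSCALL with the number that gdb printed for nanosleep (35 in the session above), as long as the process is not being held by a debugger at that moment.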

Debugging a C function in an R package

I want to debug a C function of an R package using R -d gdb. After setting a breakpoint at the C function C_MIM(), I got the output below, along with "Cannot find bounds of current function", so I could not print any variable values. Am I doing something wrong, or is it simply not possible to debug some R packages?
Breakpoint 1, 0x00007fffdee0035f in C_MIM ()
from /home/sunxd/R/x86_64-pc-linux-gnu-library/3.4/praznik/libs/praznik.so
(gdb) list
76 in ../sysdeps/unix/syscall-template.S
(gdb) n
Single stepping until exit from function C_MIM,
which has no line number information.
^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7fffdddfa700 (LWP 21179)]
---Type <return> to continue, or q <return> to quit---
0x00007ffff45c707e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
It turns out one must have the package's source code and compile the R package with debugging options for gcc/cc (for example, -g, so that line number information is included).

Getting and setting CPU registers of multiple threads using ptrace

I am interested in running a multithreaded application under the supervision of another monitoring process. The monitoring process should be able to get and set the CPU registers of all the threads in the monitored application. I know how to do this for a single-threaded application, but I'd like to know how to extend it to multithreaded applications.
You can pass a thread ID instead of the PID to ptrace and it should work fine. However, you need to manage the threads yourself.
Using the thread ID instead of the PID in ptrace is not a solution,
because on 64-bit Linux pthread_t is an unsigned long while pid_t is an int.
I wondered about this issue, too.
I have another method to get a thread's register information, using gdb.
This is my code:
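#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void *ThrFunc(void *para)
{
    (void)para;
    printf("hello world.\n");
    sleep(-1); // -1 converts to UINT_MAX seconds, so the thread sleeps indefinitely.
    return NULL;
}

int main(void)
{
    pthread_t ptid;
    int ret = pthread_create(&ptid, NULL, ThrFunc, NULL);
    if (ret != 0)
    {
        exit(ret); // pthread_create returns the error code directly rather than setting errno.
    }
    pthread_join(ptid, NULL); // Block the main thread until ThrFunc returns.
    return 0;
}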
The following is gdb debug details:
(gdb) info thread
2 Thread 0x7ffff7fe9700 (LWP 4533) 0x00000033d98ab91d in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7ffff7feb720 (LWP 4530) 0x00000033d9c080ad in pthread_join () from /lib64/libpthread.so.0
(gdb) info reg
rax 0xfffffffffffffe00 -512
...
rip 0x33d9c080ad 0x33d9c080ad <pthread_join+269>
eflags 0x246 [ PF ZF IF ]
...
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff7fe9700 (LWP 4533))]#0 0x00000033d98ab91d in nanosleep () from /lib64/libc.so.6
(gdb) info thread
* 2 Thread 0x7ffff7fe9700 (LWP 4533) 0x00000033d98ab91d in nanosleep () from /lib64/libc.so.6
1 Thread 0x7ffff7feb720 (LWP 4530) 0x00000033d9c080ad in pthread_join () from /lib64/libpthread.so.0
(gdb) info reg
rax 0xfffffffffffffdfc -516
...
rip 0x33d98ab91d 0x33d98ab91d <nanosleep+45>
eflags 0x293 [ CF AF SF IF ]
...
I hope this helps.
By the way, I would also like to know how to use ptrace() to get a thread's register details.
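As a rough sketch of the ptrace route: the ID that ptrace expects is the kernel thread ID (the LWP number gdb prints above, also listed under /proc/<pid>/task), not the pthread_t handle. The code below assumes Linux and a monitor that is allowed to attach to the target (system ptrace restrictions may prevent this). The function names list_thread_ids and read_thread_regs are made up for illustration. It reads registers with PTRACE_GETREGSET and NT_PRSTATUS, which is one of several requests that can do this.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <elf.h>          /* NT_PRSTATUS */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>      /* struct iovec */
#include <sys/user.h>     /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Collect the kernel thread IDs of `pid` from /proc/<pid>/task.
 * Returns the number of IDs stored in `tids` (at most `max`), or -1 on error. */
int list_thread_ids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    struct dirent *entry;
    DIR *dir;
    int count = 0;

    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while (count < max && (entry = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(entry->d_name, &end, 10);
        if (end != entry->d_name && *end == '\0')   /* skips "." and ".." */
            tids[count++] = (pid_t)tid;
    }
    closedir(dir);
    return count;
}

/* Attach to one thread of another process, copy its general-purpose
 * registers into *regs, then detach. Returns 0 on success, -1 on failure. */
int read_thread_regs(pid_t tid, struct user_regs_struct *regs)
{
    struct iovec iov = { regs, sizeof *regs };
    int status;
    int rc = -1;

    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
        return -1;
    /* __WALL lets waitpid() report stops of threads that are not the group leader. */
    if (waitpid(tid, &status, __WALL) == tid && WIFSTOPPED(status) &&
        ptrace(PTRACE_GETREGSET, tid, (void *)(long)NT_PRSTATUS, &iov) != -1)
        rc = 0;
    ptrace(PTRACE_DETACH, tid, NULL, NULL);
    return rc;
}
```

A monitor would call list_thread_ids() on the target PID and then read_thread_regs() on each returned ID. Writing registers back should work the same way with PTRACE_SETREGSET while the thread is still attached and stopped. Threads created after the listing are not picked up by this sketch, which is the "thread management" the first answer refers to.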

How can I get GDB to tell me what address caused a segfault?

I'd like to know if my program is accessing NULL pointers or stale memory.
The backtrace looks like this:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b0fa4c8 (LWP 1333)]
0x299a6ad4 in pthread_mutex_lock () from /lib/libpthread.so.0
(gdb) bt
#0 0x299a6ad4 in pthread_mutex_lock () from /lib/libpthread.so.0
#1 0x0058e900 in ?? ()
With GDB 7 and higher, you can examine the $_siginfo structure that is filled out when the signal occurs, and determine the faulting address:
(gdb) p $_siginfo._sifields._sigfault.si_addr
If it shows (void *) 0x0 (or a small number) then you have a NULL pointer dereference.
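To make it concrete what that field holds, here is a small sketch that reads the same si_addr value from inside the program, using a SIGSEGV handler installed with SA_SIGINFO. It assumes Linux; probe_read and map_inaccessible_page are hypothetical helper names, and jumping out of the handler with siglongjmp is only done here to keep the demonstration short.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_return;        /* where to resume after a fault */
static void *volatile fault_address;   /* si_addr captured by the handler */

static void on_sigsegv(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    fault_address = info->si_addr;     /* the field gdb exposes via $_siginfo */
    siglongjmp(fault_return, 1);
}

/* Try to read one byte at `p`.
 * Returns 1 and stores the reported faulting address in *addr if the read
 * raised SIGSEGV, 0 if the read succeeded, -1 if no handler could be set. */
int probe_read(const volatile char *p, void **addr)
{
    struct sigaction sa, old;
    int faulted;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsegv;
    sa.sa_flags = SA_SIGINFO;          /* request the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) == -1)
        return -1;

    if (sigsetjmp(fault_return, 1) == 0) {
        (void)*p;                      /* volatile read; may fault */
        faulted = 0;
    } else {
        *addr = fault_address;         /* reached via siglongjmp from the handler */
        faulted = 1;
    }
    sigaction(SIGSEGV, &old, NULL);    /* restore the previous handler */
    return faulted;
}

/* Map one page with no access rights so that any read from it faults.
 * Returns NULL if the mapping fails. */
void *map_inaccessible_page(void)
{
    void *page = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return page == MAP_FAILED ? NULL : page;
}
```

Probing an address inside a page returned by map_inaccessible_page() should report that same address back, whereas a NULL pointer dereference would report 0 or a small offset, in line with the rule of thumb above.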
Run your program under GDB. When the segfault occurs, GDB will inform you of the line and statement of your program, along with the variable and its associated address.
You can use the "print" (p) command in GDB to inspect variables. If the crash occurred in a library call, you can use the "frame" series of commands to see the stack frame in question.

Resources