I'm trying to debug my application that is based on an STM32F3 uC running FreeRTOS. I have manually set the PSP to an invalid value (e.g. 0) at random places in thread context in the application expecting my memManageFault/busFault/usageFault/hardFault handlers to fire. Unfortunately none of the fault handlers are executed, but the core locks up on the first push to the invalid stack. What am I missing?
Some more details from the lockup state:
SCB->SHCSR: 0x74001 (all three faultHandlers are enabled, busFault pending, memFault active)
SCB->HFSR:0x40000000 (fault escalated to hardFault even though all handlers are defined and enabled)
SCB->CFSR: 0x28601 (BFAR valid, precise error)
SCB->BFAR/SCB->MMFAR: 0xfffffff7 (erroneous SP after sub, I assume)
PRIMASK/FAULTMASK/BASEPRI: 0
MSP: 0x2000ffe0 (still valid, the handler should run just fine)
Any ideas are welcome.
It seems like once again the core is right and I am wrong. The mistake I made was that although I have implemented the HardFault_Handler as a naked function, all the other fault handlers were simple application failure hooks implemented in C, trying to access the stack in whatever context they interrupted. Needless to say, things went dirty quickly.
Implementing all handlers in asm solved the issue of the core locking up on corrupted a SP.
busFault pending, memFault active - memFault has caused busError - and it kills the micro
Exception stacking uses the same stack as the current context. By providing an invalid stack pointer, you've prevented any of the exception handlers being able to complete. Lockup specifically addresses this scenario.
Related
I'm running on a Raspberry Pi Pico (RP2040, Cortex-M0+ core, debugging via VSCode cortex-debug using JLink SWD), and I'm seeing strange behaviour regarding PendSV.
Immediately prior, the SVCall exception handler requested PendSV via the ICSR register. But on exception return, rather than tail-chaining the PendSV, execution instead returns to the calling code and continues non-exception execution.
All the while the ICSR register shows the pending PendSV, even while thread code instructions are repeatedly stepped. System handler priorities are all zero, IRQ priorities are lower.
According to the ARMv6-M reference manual, PendSV cannot be disabled.
So, what am I missing that would cause this behaviour?
Edited to add:
Perhaps it's a debugger interaction? The JLink software (v4.95d) is still in Beta...
I see that the debugger can actually disable PendSV and Systick - C1.5.1 Debug Stepping: "Optionally, the debugger can set DHCSR.C_MASKINTS to 1 to prevent PendSV, SysTick, and external configurable interrupts from occurring. This is described as masking these interrupts. Table C1-7 on page C1-326 summarizes instruction stepping control."
It turns out that the problem is caused by single-stepping the instruction that writes to the PENDSVSET bit in the ICSR: the bit is set, and the VECTPENDING field shows 0xe, but the PendSV never fires.
Free-running over that instruction to a later breakpoint sees the PendSV fire correctly.
So it is indeed a debugger interaction.
Whether that's to do with interrupts being inhibited as #cooperised suggests isn't clear - the DHCSR's C_MASKINTS bit reads as zero throughout, but how that bit is manipulated during the actual step operation isn't visible at this level.
Which makes me wonder whether the way the JLink is performing the step induces unpredictable/indeterminate behaviour - e.g. as per the warning in the C_MASKINTS description. Or perhaps this is simply what happens in an M0+ under these circumstances, and I've never single-stepped this instruction before.
In any case, the workaround is simply to not single-step the instruction that sets PENDSVSET.
Edited to add:
Finally, #cooperised was correct.
On taking more care to distinguish exactly between stepping (including stepping over function calls) and running (including running to the very next instruction), it's clear that stepping disables interrupts including PendSV.
The same thing happened to me but I found that the reason was that I was not closing the previous PensSV interrupt by returning through LR containing 0xFFFFFFF9. Instead I was returning via the PC to a previous routine's return address.
Since I did not return via 0xFFFFFFF9 it was not properly closing the previous PendSV and did not recognize subsequent ones.
I'm integrating FreeRTOS cmsis_v2 on my STM32F303VCx and come to a certain problem then using Event Flags when blocking the task to wait for operation approval from another task.
If the task executes the following code, all other tasks get minimal runtime (understandably because OS is constantly checking evt_flg):
for(;;)
{
flag = osEventFlagsWait (evt_flg, EventOccured, osFlagsWaitAny, 0);
if (flag == EventOccured)
{
/* Task main route */
osEventFlagsClear (evt_flg,EventOccured);
}
}
But if to set timeout to osWaitForver: osEventFlagsWait (evt_flg, EventOccured, osFlagsWaitAny, osWaitForver ), the whole program goes into HardFault.
What's the best solution for such behavior? I need the task to wait for a flag and don't block other ones, such as terminal input read, from running.
The task code the question provides is constantly busy, polling the RTOS event.
This is a design antipattern, it is virtually always better to have the task block until the event source has fired. The only exception where a call to osEventFlagsWait() with a zero timeout could make more sense is if you have to monitor several different event/data sources for which there is not a common RTOS API to wait for (and even then, this is only an "emergency exit"). Hence, osWaitForver shall be used.
Next, the reason for the HardFault should be sought. Alone in this task code, I don't see a reason for this - the HardFault source is likely somewhere else. When the area the HardFault can come from, that could be worth a new question (or already fixed). Good luck!
I've written a program which triggers the HardFault_Handler. I believe it is because of a out of memory exception but I want to be completely sure about it. I've seen people disable system interrupt handlers on M3/M4 cores and the reference datasheet states that
(19-Feb-2016) Nested Vector Interrupt Controller
Removed MemManage_Handler, BusFault_Handler,Usagefault
_Handler and DebugMon_Handler from Table 53: List of vectors.
Updated EXTI_IMR reset value. (19-Feb-2016)
This means that once upon a time the MemManage_Handler existed and that it could be enabled/disabled. But no documentation of this exists. Is it possible to enable this handler?
I personally find it hard to believe that ST has completely removed this handler from the silicon and as such a dormant part of a register should be written to to enable this handler.
See page 2-17/2-18 of ARM's Cortex-M0+ Devices Generic User Guide, which shows the exceptions native to the processor. This part doesn't have a MemManage exception and all exceptions handled by the fault handler go through to the HardFault.
I suspect that ST's employees made a copy paste error of the vector table at some point from elsewhere, which did have the MemManage_Handler. This also explains the note in the datasheet as they fixed a mistake instead of hiding away a feature.
You can't.
According to Section 3.5 in Managing memory protection unit (MPU) in STM32 MCUs there is no MemMange Fault for Cortex-M0+ devices, it only can trigger HardFault for MPU error.
It is not a ST's decision to remove this feature but Cortex-M0+ simply doesn't have MemMange Fault. I think ST made a copy-and-paste mistake in their documentation.
However, I believe you still can catch MPU errors in the HardFault Handler.
I attended a lecture on FreeRtos and Cortex M where the instructor advised that if ISR safe version of API is not used from ISR it can lead to Usage fault exception in Cortex M processors.This would happen because this can involve going from interrupt context(interrupt handler) to task context(thread handler)
My question is why this task switch would be considered illegal and what would be the repercussions of such a switch?
On a Cortex-M (some of) the current context will be stored on stack in use before the interrupt (on interrupt entry), so if you were in a task and it was interrupted some of the current context would be stored on the task stack (via the PSP). The interrupt itself always runs on the MSP. If the interrupt does not return to the task which it interrupted then the interrupted task will have a messed up stack (as it restores the stored context on exit) as well as trying to restore an incorrect context for the task which was switched to.
On a context switch(which happens in an interrupt) again some of the context is automatically stored on the task stack, but the OS also stores the rest of the context on the task stack as well. When it does the switch and exits the interrupt, the OS restores the context it stored itself for the task and then the rest of the context is automatically restored by exiting the interrupt. This works as it is ensured that the stack is left in the correct format. Look at interrupt entry / exit in the Cortex-M4 Generic User Guide.
Not all processors work like this.
This answer is generic and not FreeRTOS nor Cortex-M specific - it applies to any typical RTOS on any platform:
RTOS API calls that cause the scheduler to run cannot be called from the interrupt service routine. For example, if you give a semaphore, normally the scheduler runs to switch to any task pending on that semaphore; this is inappropriate in an ISR, where the scheduler must run once when the interrupt context exits; obviously you do not want a context switch before the interrupt has completed and there may be other API calls or preemption by higher priority interrupts that cause a different task to become runable; making that assessment only once when the context switch will occur maintains deterministic behaviour.
The ISR specific versions of functions do not invoke the scheduler immediately; instead they set a flag to indicate that the scheduler must run on exit from the interrupt context.
Typically an ISR that makes RTOS API calls must have a prologue and epilogue; specific enter/exit interrupt calls or macros. This prologue, increments a counter that is decremented in the epilogue; if the counter is zero and the schedule flag is set, the scheduler will run. The counter is to prevent the scheduler from running on exit from a nested interrupt. This ensures that the scheduler only runs once on exit form the lowest priority pending interrupt.
Whether or why a "uasge fault" would occur is a FreeRTOS specific implementation detail, and largely academic. An RTOS could equally trap the usage of a non-ISR safe call in an ISR and run a more specific error handler, if it does not bother to do so, then the resulting behaviour might trigger a usage fault; it seems a somewhat crude mechanism to rely on; Usage Fault is a very broad trap and could happen for a number of reasons:
Usage Fault: detects execution of undefined instructions, unaligned
memory access for load/store multiple. When enabled, divide-by-zero
and other unaligned memory accesses are also detected.
Some RTOS do not have ISR specific functions, instead the API calls that cause scheduling detect internally the ISR context and behave differently in that context - this is simpler and safer for the programmer, but carries a small overhead to test the context on every such call. An API that handles ISR safety internally also means that calling functions that may themselves make OS API calls is simpler since those functions themselves need not be ISR specific.
I have an application that I am porting from the Keil IDE to build with the GNU toolchain due to license issues. I have successfully be able to set up, build, flash and run the application on the device.
The application on the GNU side is for some reason is getting stuck in the weak linked IRQ handler for the WWDG which is an infinite loop. The application does not enable the WWDG, and it is disabled at reset by default. I have also verified that the configuration registers are at their default startup values.
The only difference, other than compilers, are the linker and startup files. However, both the startup files, and linker files used by both toolchains are defaults generated by STM.
Any idea what may be causing this? I'm about at my wits end here.
Using the stm32f103XX, let me know if any other information would be helpful.
EDIT:
Using the comments below I was able to ascertain that it is, in fact, the HardFault_Handler that is being triggered.
I have included the backtrace output below if that may be of help
GDB BT:
0 HardFault_Handler ()
1 (signal handler called)
2 0x720a3de in ?? ()
3 0x80005534 in foo ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
2 things stand out to me, though im no gdb expert. 1) foo is not a function, it is a const array of chars and 2) 0x0720a3de is not a valid memory address the flash address range starts at 0x08000000
So thanks to the kick in the pants by D Krueger. I was able to figure out that the HardFault_Handler was what was actually being called. So, anyone that stumbles on this post, verify which IRQ is truly being called by writing temporary functions to cover the likely culprits i.e. HardFault. The true issue for the IRQ call is a bad memory access by memcpy which I am on my way to solving next.
I had exactly the same error as OP (apparent WWDG interrupt, but actually the HardFault_Handler firing) when porting an example for the STM32F3 Discovery board to compile in CooCox CoIDE 1.7.7 with STM32Cube F3 libraries (v1.1.0). The code ran fine as long as I didn't try using any interrupts, but as soon as I turned on the SysTick timer interrupt, the HardFault exception tripped.
The problem was that I had neglected to include the stm32f3xx_it.h and stm32f3xx_it.c files in the project. Their absence wasn't causing any compiler warnings/errors. Once they were compiled & linked in, the code with interrupts ran fine.
I've had this problem due to the same root cause as awilhite. I'm using Atollic TrueStudio 8.0.0. I used it to start a project for STM32F030 and (probably manually) added libraries folder with stm32f0xx.h, which defines ADC1_IRQn (IRQ channel number used in NVIC setup).
And I implemented ADC1_IRQHandler(void) in my main.c (as I'm used to and it always worked so far -- x_IRQn -> x_IRQHandler)
But after 2 days frustration, I found out, that startup_stm32f0xx.s in my project defines ADC1_COMP_IRQHandler.
So, ultimately, my ADC interrupt handler was undefined and when the ADC generated the interrupt, the program crashed (WWDG interrupt).
I hope this helps to people like me, who think they did implement their handler but in fact, they did not.
I had a very similar problem when merging two projects generated separately by STM32CubeMX for an STM32F2XX processor. One project was using the Ethernet peripheral, while the other was not. Besides that one difference, the two projects used the same set of peripherals.
After integrating the two projects together by manually copying files, the application would end up in the WWDG_IRQHandler after starting the first task (when interrupts are enabled for the first time). I first confirmed that the WDGA bit of the WWDG register was indeed not set and, therefore, that the WWDG peripheral was disabled. Next, I verified that the interrupt vector table was initialized correctly. Finally, after several hours of digging, I realized that I had not defined the ETH_IRQHandler function in stm32f2xx_it.c, which provoked the Ethernet interrupt to be handled by the default handler, masking itself as the WWDG_IRQHandler -- likely due to optimization.
The core problem is that the Default Handler is called instead of another irq handler. I doubt that our situations are the same but here is my solution:
I was working on a c++ project, the same happened to me. This was the first time I made a project from scratch & with CMSIS. After some unsuccessful attempts I went through a generated project when I noticed that in the stm32xxxx_it.h the IRQ handler function prototypes are guarded by these:
extern "C"
{
void TIM7_IRQHandler(void);
}
With these guards the linker could find my own interrupt handler functions.
I'll expand a bit on what led me here, and how I use the insight from #Mike to correct it.
I had a project running fine on a demo project in Eclipse SW4STM32, but with sources and headers scattered all over the place so I wanted to have a more "compact" project easier to customize and use as a base for minor modifications (and easier to follow in Git).
I created an empty AC6 project targetting the same board. It generated the HAL drivers, the startup_stm32.s and LinkerScript.ld. I then copied all of the .c and corresponding .h from the original project to my new project (which was a pain in itself because they were scattered in BSP, CMSIS, Components, Middlewares, etc. directories). Everything compiled and seemed to work, until I started modifying a bit.
In the debugger, it seemed all function calls were working until the while(1) main loop where I ended up in the Default_Handler defined in the startup_stm32.s, seemingly from WWDG_IRQHandler. That was, in fact, the default IRQ handler for not-user-defined handlers (WWDG_IRQHandler being the first one declared, it was reported as such by gdb, as indicated by #D Krüger).
I started looking at compiler and linker options or linker script, without much luck, until I realized the only file I didn't check was the startup_stm32.s, which was indeed different.
I blindly copy-pasted it and voilà!
The explanation I could give is that the STM32 is calling IRQ handlers defined in the startup_stm32.s when interrupt occur, all of them initially pointing to Default_Handler() (later overriden by the linker). So if a .c file you copied defines a handler with a slightly different name (but consistent with its own startup_xxx.s), you'll end up with the Default_Handler() being called (which is an infinite loop) instead of the one you defined. And things go wrong.
See https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html for more information.
N.B. I'm not happy to blindly copy-paste without fully understanding, but time constraints and milestones usually push you to territories you're not happy to explore...
Will add my 5 cents. I had this issue on stm32h7, but for me the cause was that the cube "forgot" to add TIM16_IRQHandler when TIM16 is used as the timebase source. It was not happening at the beginning but after several code regenerations. Looks like a bug in the cube, as the TIM16 was still set, but the interrupt handler got removed. So toggking to TIM17 and back resolved the issue.
In my case, I had a function written in the GCC assembly that was migrated from the ARM assembly. The problem went away after I had added the .thumb_func line to the assembly file.
I was getting this error:
(gdb) c
+c
Continuing.
Program received signal SIGINT, Interrupt.
WWDG_IRQHandler () at ...startup_stm32f40_41xxx.s:121
(gdb) bt
#0 WWDG_IRQHandler () at ...startup_stm32f40_41xxx.s:12
#1 <signal handler called>
#2 RTOS_SysTick_Handler () at ...osKernel.s:18
#3 <signal handler called>
#4 0x0800021a in task0 () at ...main.cpp:10
#5 0x08000214 in frame_dummy ()
#6 0x00000000 in ?? ()
RTOS_SysTick_Handler is a function written in assembly and the WWDG_IRQHandler was always triggered before any first assembly instructions in that function (tried different instructions and it didn't change anything).
I was doing some tweaks around the C code and at some point, I hit another handler: UsageFault which led me to the .thumb_func hint: ARM Cortex M4 SVC_Handler "UsageFault".