I have an application that I am porting from the Keil IDE to build with the GNU toolchain due to license issues. I have successfully been able to set up, build, flash, and run the application on the device.
The application built with the GNU toolchain is, for some reason, getting stuck in the weak-linked IRQ handler for the WWDG, which is an infinite loop. The application does not enable the WWDG, and it is disabled at reset by default. I have also verified that the configuration registers are at their default startup values.
The only differences, other than the compilers, are the linker and startup files. However, both the startup files and linker files used by both toolchains are the defaults generated by ST.
Any idea what may be causing this? I'm about at my wits' end here.
I'm using the STM32F103xx; let me know if any other information would be helpful.
EDIT:
Using the comments below I was able to ascertain that it is, in fact, the HardFault_Handler that is being triggered.
I have included the backtrace output below in case it is of help.
GDB BT:
#0 HardFault_Handler ()
#1 <signal handler called>
#2 0x720a3de in ?? ()
#3 0x80005534 in foo ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Two things stand out to me, though I'm no GDB expert: 1) foo is not a function, it is a const array of chars, and 2) 0x0720a3de is not a valid memory address (the flash address range starts at 0x08000000).
So, thanks to the kick in the pants by D Krueger, I was able to figure out that the HardFault_Handler was what was actually being called. So, anyone who stumbles on this post: verify which IRQ is truly being called by writing temporary functions to cover the likely culprits, i.e. HardFault. The true issue behind the IRQ call is a bad memory access by memcpy, which I am on my way to solving next.
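For anyone doing the same, here is a minimal sketch of such temporary stubs (assuming a GNU/CMSIS-style startup file where every handler is a weak alias of the same Default_Handler); the marker variable is only there for illustration:

#include <stdint.h>

/* Temporary diagnostic stubs: a strong definition overrides the weak alias in
   the startup file, so each suspect gets its own address and the debugger
   halts in a uniquely named function instead of the shared infinite loop. */
volatile uint32_t fault_id;

void HardFault_Handler(void)
{
    fault_id = 1;
    while (1) { }    /* halt here; now you know it really was a HardFault */
}

void WWDG_IRQHandler(void)
{
    fault_id = 2;
    while (1) { }
}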
I had exactly the same error as OP (apparent WWDG interrupt, but actually the HardFault_Handler firing) when porting an example for the STM32F3 Discovery board to compile in CooCox CoIDE 1.7.7 with STM32Cube F3 libraries (v1.1.0). The code ran fine as long as I didn't try using any interrupts, but as soon as I turned on the SysTick timer interrupt, the HardFault exception tripped.
The problem was that I had neglected to include the stm32f3xx_it.h and stm32f3xx_it.c files in the project. Their absence wasn't causing any compiler warnings/errors. Once they were compiled & linked in, the code with interrupts ran fine.
I've had this problem due to the same root cause as awilhite. I'm using Atollic TrueStudio 8.0.0. I used it to start a project for an STM32F030 and (probably manually) added a libraries folder with stm32f0xx.h, which defines ADC1_IRQn (the IRQ channel number used in the NVIC setup).
And I implemented ADC1_IRQHandler(void) in my main.c (as I'm used to, and it has always worked so far: x_IRQn -> x_IRQHandler).
But after two days of frustration, I found out that startup_stm32f0xx.s in my project defines ADC1_COMP_IRQHandler.
So, ultimately, my ADC interrupt handler was undefined, and when the ADC generated the interrupt, the program crashed (apparent WWDG interrupt).
I hope this helps people like me, who think they implemented their handler but in fact did not.
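For illustration, assuming the stock startup_stm32f0xx.s, the fix is simply to define the handler under the exact name the startup file declares:

/* Must match the vector-table name in startup_stm32f0xx.s exactly; otherwise
   the weak Default_Handler alias stays in place and the ADC interrupt lands
   in the default infinite loop. */
void ADC1_COMP_IRQHandler(void)
{
    /* handle the ADC / comparator flags here */
}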
I had a very similar problem when merging two projects generated separately by STM32CubeMX for an STM32F2XX processor. One project was using the Ethernet peripheral, while the other was not. Besides that one difference, the two projects used the same set of peripherals.
After integrating the two projects by manually copying files, the application would end up in the WWDG_IRQHandler after starting the first task (when interrupts are enabled for the first time). I first confirmed that the WDGA bit of the WWDG register was indeed not set and, therefore, that the WWDG peripheral was disabled. Next, I verified that the interrupt vector table was initialized correctly. Finally, after several hours of digging, I realized that I had not defined the ETH_IRQHandler function in stm32f2xx_it.c, which caused the Ethernet interrupt to be handled by the default handler, masking itself as the WWDG_IRQHandler -- likely due to optimization.
The core problem is that the Default_Handler is called instead of the intended IRQ handler. I doubt that our situations are the same, but here is my solution:
I was working on a C++ project, and the same thing happened to me. This was the first time I had made a project from scratch with CMSIS. After some unsuccessful attempts, I went through a generated project and noticed that in stm32xxxx_it.h the IRQ handler function prototypes are guarded by this:
extern "C"
{
void TIM7_IRQHandler(void);
}
With these guards the linker could find my own interrupt handler functions.
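For reference, here is a sketch of the usual pattern (the #ifdef __cplusplus wrapper reflects how the generated headers typically do it); the key point is that the handler needs C linkage, otherwise the mangled C++ name never matches and the weak Default_Handler alias from the startup file stays in force:

/* stm32xxxx_it.h (sketch) */
#ifdef __cplusplus
extern "C" {
#endif

void TIM7_IRQHandler(void);   /* C linkage: a C++ definition now overrides the weak alias */

#ifdef __cplusplus
}
#endif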
I'll expand a bit on what led me here and how I used the insight from @Mike to correct it.
I had a project running fine, based on a demo project in Eclipse SW4STM32, but with sources and headers scattered all over the place, so I wanted a more "compact" project that would be easier to customize and use as a base for minor modifications (and easier to follow in Git).
I created an empty AC6 project targeting the same board. It generated the HAL drivers, the startup_stm32.s, and LinkerScript.ld. I then copied all of the .c files and corresponding .h files from the original project into my new project (which was a pain in itself, because they were scattered across BSP, CMSIS, Components, Middlewares, etc. directories). Everything compiled and seemed to work, until I started modifying things a bit.
In the debugger, it seemed all function calls were working until the while(1) main loop, where I ended up in the Default_Handler defined in startup_stm32.s, seemingly from WWDG_IRQHandler. That was, in fact, the default IRQ handler for all handlers not defined by the user (WWDG_IRQHandler being the first one declared, it was reported as such by gdb, as indicated by @D Krüger).
I started looking at compiler and linker options or linker script, without much luck, until I realized the only file I didn't check was the startup_stm32.s, which was indeed different.
I blindly copy-pasted it and voilà!
The explanation I could give is that the STM32 calls the IRQ handlers defined in startup_stm32.s when an interrupt occurs, all of them initially pointing to Default_Handler() (later overridden by the linker). So if a .c file you copied defines a handler with a slightly different name (but consistent with its own startup_xxx.s), you'll end up with Default_Handler() being called (which is an infinite loop) instead of the one you defined. And things go wrong.
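To make the mechanism concrete, here is roughly the C-level equivalent of what those startup files do (a sketch using GCC's weak/alias attributes; the handler names are the standard CMSIS ones):

/* Every vector-table entry starts out as a weak alias of Default_Handler. A
   strong definition elsewhere, with exactly the same name, replaces it; a
   definition with a slightly different name replaces nothing. */
void Default_Handler(void)
{
    while (1) { }
}

void WWDG_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));
void TIM7_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));
/* ...and so on for every entry in the vector table... */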
See https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html for more information.
N.B. I'm not happy to blindly copy-paste without fully understanding, but time constraints and milestones usually push you to territories you're not happy to explore...
I'll add my five cents. I had this issue on an STM32H7, but for me the cause was that CubeMX "forgot" to add TIM16_IRQHandler when TIM16 is used as the timebase source. It did not happen at the beginning, only after several code regenerations. It looks like a bug in CubeMX, as TIM16 was still selected, but the interrupt handler got removed. Toggling to TIM17 and back resolved the issue.
In my case, I had a function written in GCC assembly that had been migrated from ARM assembly. The problem went away after I added the .thumb_func directive to the assembly file.
I was getting this error:
(gdb) c
+c
Continuing.
Program received signal SIGINT, Interrupt.
WWDG_IRQHandler () at ...startup_stm32f40_41xxx.s:121
(gdb) bt
#0 WWDG_IRQHandler () at ...startup_stm32f40_41xxx.s:12
#1 <signal handler called>
#2 RTOS_SysTick_Handler () at ...osKernel.s:18
#3 <signal handler called>
#4 0x0800021a in task0 () at ...main.cpp:10
#5 0x08000214 in frame_dummy ()
#6 0x00000000 in ?? ()
RTOS_SysTick_Handler is a function written in assembly, and the WWDG_IRQHandler was always triggered before the first assembly instruction in that function (I tried different instructions and it didn't change anything).
I was making some tweaks to the C code and at some point hit another handler, UsageFault, which led me to the .thumb_func hint: ARM Cortex M4 SVC_Handler "UsageFault".
I'm running on a Raspberry Pi Pico (RP2040, Cortex-M0+ core, debugging via VSCode cortex-debug using JLink SWD), and I'm seeing strange behaviour regarding PendSV.
Immediately prior, the SVCall exception handler requested PendSV via the ICSR register. But on exception return, rather than tail-chaining the PendSV, execution instead returns to the calling code and continues non-exception execution.
All the while the ICSR register shows the pending PendSV, even while thread code instructions are repeatedly stepped. System handler priorities are all zero, IRQ priorities are lower.
According to the ARMv6-M reference manual, PendSV cannot be disabled.
So, what am I missing that would cause this behaviour?
Edited to add:
Perhaps it's a debugger interaction? The JLink software (v4.95d) is still in Beta...
I see that the debugger can actually disable PendSV and Systick - C1.5.1 Debug Stepping: "Optionally, the debugger can set DHCSR.C_MASKINTS to 1 to prevent PendSV, SysTick, and external configurable interrupts from occurring. This is described as masking these interrupts. Table C1-7 on page C1-326 summarizes instruction stepping control."
It turns out that the problem is caused by single-stepping the instruction that writes to the PENDSVSET bit in the ICSR: the bit is set, and the VECTPENDING field shows 0xe, but the PendSV never fires.
Free-running over that instruction to a later breakpoint sees the PendSV fire correctly.
So it is indeed a debugger interaction.
Whether that's to do with interrupts being inhibited as @cooperised suggests isn't clear - the DHCSR's C_MASKINTS bit reads as zero throughout, but how that bit is manipulated during the actual step operation isn't visible at this level.
Which makes me wonder whether the way the JLink is performing the step induces unpredictable/indeterminate behaviour - e.g. as per the warning in the C_MASKINTS description. Or perhaps this is simply what happens in an M0+ under these circumstances, and I've never single-stepped this instruction before.
In any case, the workaround is simply to not single-step the instruction that sets PENDSVSET.
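For reference, the instruction in question is just the store that pends PendSV, e.g. with the CMSIS macros (a sketch); the workaround is to free-run over it rather than single-step it:

SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;   /* pends PendSV; don't single-step this store */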
Edited to add:
Finally, @cooperised was correct.
On taking more care to distinguish exactly between stepping (including stepping over function calls) and running (including running to the very next instruction), it's clear that stepping disables interrupts including PendSV.
The same thing happened to me, but I found that the reason was that I was not closing the previous PendSV interrupt by returning through an LR containing 0xFFFFFFF9. Instead I was returning via the PC to a previous routine's return address.
Since I did not return via 0xFFFFFFF9, the previous PendSV was never properly closed and subsequent ones were not recognized.
I've written a program which triggers the HardFault_Handler. I believe it is because of an out-of-memory exception, but I want to be completely sure about it. I've seen people disable system interrupt handlers on M3/M4 cores, and the reference datasheet states that:
(19-Feb-2016) Nested Vector Interrupt Controller: Removed MemManage_Handler, BusFault_Handler, UsageFault_Handler and DebugMon_Handler from Table 53: List of vectors.
(19-Feb-2016) Updated EXTI_IMR reset value.
This means that once upon a time the MemManage_Handler existed and that it could be enabled/disabled. But no documentation of this exists. Is it possible to enable this handler?
I personally find it hard to believe that ST has completely removed this handler from the silicon, and suspect that a dormant part of a register could be written to enable this handler.
See page 2-17/2-18 of ARM's Cortex-M0+ Devices Generic User Guide, which shows the exceptions native to the processor. This part doesn't have a MemManage exception, and all faults that would otherwise have their own handler go through to the HardFault.
I suspect that ST's employees made a copy paste error of the vector table at some point from elsewhere, which did have the MemManage_Handler. This also explains the note in the datasheet as they fixed a mistake instead of hiding away a feature.
You can't.
According to Section 3.5 of "Managing memory protection unit (MPU) in STM32 MCUs", there is no MemManage fault on Cortex-M0+ devices; an MPU error can only trigger a HardFault.
It is not ST's decision to remove this feature; the Cortex-M0+ simply doesn't have a MemManage fault. I think ST made a copy-and-paste mistake in their documentation.
However, I believe you still can catch MPU errors in the HardFault Handler.
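As an illustration only (a minimal sketch for a Cortex-M0+ with a GCC/CMSIS-style toolchain; real code usually uses a small assembly trampoline so the compiler prologue cannot disturb LR), a HardFault handler can recover the stacked PC of the faulting instruction:

#include <stdint.h>

void HardFault_Handler(void)
{
    uint32_t lr;
    uint32_t *frame;

    /* Bit 2 of EXC_RETURN (in LR) tells us which stack holds the exception frame. */
    __asm volatile ("mov %0, lr" : "=r" (lr));
    if (lr & 0x4)
        __asm volatile ("mrs %0, psp" : "=r" (frame));
    else
        __asm volatile ("mrs %0, msp" : "=r" (frame));

    volatile uint32_t faulting_pc = frame[6];   /* stacked frame: r0,r1,r2,r3,r12,lr,pc,xpsr */
    (void)faulting_pc;                          /* inspect in the debugger */
    while (1) { }
}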
Let's say you have a pointer to a function whose source you do not have and which is "untrusted" because it might read/write to a disallowed memory region.
Before it executes each assembly instruction, you want to verify that it doesn't access disallowed memory regions.
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
This is for a functionality that needs to be enabled not only during development but during normal runtime.
Ideally, it'd run something like this:
void (*fptr)(int);
fptr = &someFunction; // untrusted, don't have source
// enable interrupts for each assembly instruction
_EN_INT();
// call the function
fptr();
// everytime the PC increments, some other code runs which verifies that if any load/stores are executed, it doesn't access some disallowed memory range
// disable interrupts for each assembly instruction
_DIS_INT();
QUESTION
Is it possible to call that function and pause execution after every assembly instruction?
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
My answer assumes that you can modify the "OS" the way you need it...
Cortex MK20DX256VLH7
This seems to be a Cortex M4 CPU.
how to single-step code on-target with no jtag, breakpoints
From the doc, it doesn't say whether you NEED an external debugger to resume execution.
If the CPU is really stopped, you'll definitely need an external signal (e.g. from a debugger).
However most CPUs support software debugging. This means that an interrupt service routine is executed whenever a breakpoint is hit. To continue execution you simply return from the interrupt service routine.
I don't know about the Cortex M4 but for the Cortex M3 you'll have to set some special registers to enable that feature. Whenever a "BKPT" instruction is hit then interrupt #12 (*) is executed.
For code in RAM you simply write a BKPT instruction (0xBExx, e.g. 0xBEBE) to the address where you want to set your breakpoint. (Before writing, you read out the value so you can restore it later on.)
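A sketch of that RAM case (assuming Thumb code on a core without an instruction cache; the Thumb bit of a function pointer must be cleared to get the instruction address, and 0xBE00 is BKPT #0):

#include <stdint.h>

static uint16_t saved_opcode;

/* Plant a software breakpoint: remember the original halfword, write BKPT. */
void set_ram_breakpoint(uint16_t *addr)
{
    saved_opcode = *addr;
    *addr = 0xBE00;                /* BKPT #0 */
}

/* Restore the original instruction when the breakpoint is no longer needed. */
void clear_ram_breakpoint(uint16_t *addr)
{
    *addr = saved_opcode;
}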
For code in Flash the M3 has a "Flash patching unit" which allows you to specify up to three addresses which shall be read out as 0xBExx (0xBEBE ?) even if other data is stored there. This allows you to set up to 3 breakpoints in Flash.
Interesting for you: The register controlling the debug features in the M3 (named "DEMCR") also has a bit named "MON_STEP":
If you set this bit in interrupt handler #12 then exactly one instruction is executed after returning from the interrupt handler and interrupt #12 is triggered again. The use case for this feature is - of course - single-stepping code!
To stop single-stepping you'll have to clear the MON_STEP bit again...
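Sketched with the CMSIS register names (an illustration only, not a complete monitor; the device header include is a placeholder, and a halting debugger must not be attached for the monitor to be used):

#include "device.h"   /* placeholder: your CMSIS device header, which defines
                         CoreDebug and the DEMCR bit masks */

/* Enable the DebugMonitor exception so BKPT traps to vector #12 instead of
   halting the core. */
void monitor_enable(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_MON_EN_Msk;
}

/* DebugMonitor exception handler. With MON_STEP set, exactly one instruction
   runs after the handler returns, then the handler is entered again. */
void DebugMon_Handler(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_MON_STEP_Msk;   /* keep single-stepping */
    /* ...inspect the stacked frame here; clear MON_STEP to stop stepping... */
}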
Important 1:
I don't know if the MK20DX256VLH7 really has all these features. However, because it is a Cortex M4 chip, and the M4 should have nearly all the features of the M3, these features should be present...
Important 2:
Implementing single-stepping and debugging is not done quickly. Assembly language knowledge will be very helpful and you'll need a lot of time...
From the doc, ...
You will not only need the documentation for the MK20DX256VLH7 from NXP but you'll also need the Cortex M4 documentation from ARM.
(*) Offset 4*12 in the vector table is meant here (which is named "IRQ(-4)" in some ARM documents); not IRQ12.
Yes, the ARM emulator/interpreter sounds exactly like what I want. Is there a free one?
QEMU is open source; most of it is GPLv2. https://wiki.qemu.org/License. You'd probably need to modify it a lot, because it's designed for use as a stand-alone wrapper for a whole Unix process (qemu-user) or a whole machine (qemu-system).
I googled, and there's also http://www.unicorn-engine.org/ which is designed to be used as part of a larger program (written in C with bindings for calling from various languages). It's also GPLv2 (not LGPL), so you can use it if the rest of your code is also Free software.
It's actually based on the CPU-emulation code from QEMU; they stripped out all the device / BIOS emulation stuff to make a flexible library for just emulating CPUs.
Presumably you could configure some memory protections for it and set up the starting machine state, and let it run your function (with a return address that leads to some code that hands control back to your main code?)
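To give a flavour of Unicorn's C API (a sketch only; the load address and the two Thumb instructions are made up for the example, and the hook just prints instead of enforcing a policy):

#include <stdio.h>
#include <unicorn/unicorn.h>

/* Called for every data read/write so "disallowed" ranges could be checked. */
static void hook_mem(uc_engine *uc, uc_mem_type type, uint64_t addr,
                     int size, int64_t value, void *user_data)
{
    printf("mem access type=%d addr=0x%llx size=%d\n",
           type, (unsigned long long)addr, size);
}

int main(void)
{
    /* movs r0, #1 ; nop  (Thumb encoding, little-endian) */
    const uint8_t code[] = { 0x01, 0x20, 0x00, 0xbf };
    uc_engine *uc;
    uc_hook hh;

    uc_open(UC_ARCH_ARM, UC_MODE_THUMB, &uc);
    uc_mem_map(uc, 0x1000, 0x1000, UC_PROT_ALL);          /* map 4 KB for the code */
    uc_mem_write(uc, 0x1000, code, sizeof(code));
    uc_hook_add(uc, &hh, UC_HOOK_MEM_READ | UC_HOOK_MEM_WRITE,
                hook_mem, NULL, 1, 0);                     /* begin > end: hook everywhere */
    uc_emu_start(uc, 0x1000 | 1, 0x1000 + sizeof(code), 0, 0);  /* |1 selects Thumb */
    uc_close(uc);
    return 0;
}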
I want to write an LKM (Linux Kernel Module) that hijacks the real-time clock (interrupt 8). So I want the interrupt to be pointed at my function, and at some point sent back to the old function.
I have tried to use the request_irq function without any success, probably because the kernel handler that is already there is not willing to share the interrupt (which is a good decision, I guess).
I also tried to edit the IDT (Interrupt Descriptor Table), according to some pages I found. None of them worked; most didn't even compile, since they were written for kernel 2.6 and I'm working with 3.10.
This is a simplified code that I have just to give you the idea of what I'm doing.
kpage =__get_free_page( GFP_KERNEL);
asm("sidt %0": : "m"(*idtr) : );
memcpy(kpage, idtr, 256*sizeof(kpage));
newidt = (unsigned long long *)(*(unsigned long*)(idtr+1));
newidt[8] = &my_function;
asm("lidt %0": "=m"(newidt):);
All my attempts ended in good times with a segmentation fault, and in bad times with the kernel crashing forcing me to reboot (luckily I work with a virtual machine and snapshots).
So how can I hijack the realtime interrupt so it does my stuff? (And then send it back to the original function to get executed.)
Here is some nice code to change the pagefault function on the IDT. I couldn't make it work, since it's also written for kernel 2.6. This question is also worth looking into.
To get the bounty please publish working code, or at least sufficient info to make it run.
This can help you: http://cormander.com/2011/12/how-to-hook-into-hijack-linux-kernel-functions-via-lkm/
Why don't you simply hook a function that is called every x steps, as you want, and execute whatever you need?
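(The linked article patches kernel functions directly; a more conventional way to run your own code whenever a given kernel function fires is a kprobe. A sketch, with the probed symbol chosen purely as an example:)

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* Runs just before the probed function; keep it short, this is a hot path. */
static int pre_handler(struct kprobe *p, struct pt_regs *regs)
{
    /* do your periodic work here */
    return 0;
}

static struct kprobe kp = {
    .symbol_name = "hrtimer_interrupt",   /* illustrative target symbol only */
    .pre_handler = pre_handler,
};

static int __init hook_init(void)
{
    return register_kprobe(&kp);
}

static void __exit hook_exit(void)
{
    unregister_kprobe(&kp);
}

module_init(hook_init);
module_exit(hook_exit);
MODULE_LICENSE("GPL");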
I have a strange bug in my code which disappears when I try to debug it.
In my timer interrupt (always running system ticker) I have something like this:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
in my main loop I have
if (some_global_flag)
{
some_global_flag = 0;
do_something_very_important(); // breakpoint 1
}
This condition in the main loop is never satisfied, even when the conditions in the timer are (I think) fulfilled. The conditions are external (port pins, ADC results, etc.).
First I put a breakpoint at position 1, and it is never triggered.
To check it, I put breakpoint nr. 2 on the line some_global_flag = 1;, and in this case the code works: both breakpoints are triggered when the conditions are true.
Update 1:
To investigate whether some timing condition is responsible, and whether the if in the timer is simply never entered when running without the debugger, I added the following to my timer:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
if (some_global_flag)
{
#asm("NOP"); // breakpoint 3
}
The flag is not used anywhere else in the code. It is in RAM, and the RAM is cleared to zero at the beginning.
Now, when all the breakpoints are disabled (or only breakpoint 1 in main is enabled), the code does not work correctly: the function is not executed. However, if I enable only breakpoint 3 on the NOP, the code works! The breakpoint is triggered, and after continuing, the function is executed. (It has visible and audible output, so it's obvious if it runs.)
Update 2:
The timer interrupt was interruptible, by means of an "SEI" at its beginning. I removed that line, but the behavior did not change in any noticeable way.
Update 3:
I'm not using any external memory.
As I'm very close to the flash limit, I have the compiler's size optimization set to maximum.
Can the compiler (CodeVision) be responsible, or did I do something very wrong?
Debuggers can and do change the way the processor runs and code executes, so this is not surprising.
Divide and conquer. Start removing things until it works. In parallel with that, start with nothing: add only the timer interrupt and the few lines of code in the main loop, with do_something_very_important() being something simple like blinking an LED or spitting something out the UART. If that doesn't work, you won't get the bigger app to work. If it does work, start adding init code and more conditions in your interrupt, but do not complicate the main loop any more than the few lines described. Increase the interrupt handler conditions by adding more of the code back in until it fails.
When you reach the boundary where adding one thing fails and removing it doesn't, do some disassembly to see if it is a compiler thing. This might warrant another SO ticket if it is not obvious: "why does my AVR interrupt handler break when I add ..."
If you are able to get this down to a small number of lines of code, a dozen or so in main and just the few interrupt lines, post it so others can try it on their own hardware and perhaps figure it out in parallel.
This is probably a typical optimizing/debugging bug. Make sure that some_global_flag is marked as volatile. It may be an int, uint8_t, uint64_t, whatever you like...
volatile int some_global_flag;
This way you tell the compiler not to make any assumptions on what the value of some_global_flag will be. You must do this because the compiler/optimizer can't see any call to your interrupt routine, so it assumes some_global_flag is always 0 (the initial state) and never changed.
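In context, that means the flag shared between the timer interrupt and the main loop looks something like this (a sketch; the ISR registration syntax depends on the compiler):

#include <stdint.h>

volatile uint8_t some_global_flag = 0;   /* volatile: always re-read from RAM */

void timer_isr(void)                     /* hooked up as the timer interrupt */
{
    some_global_flag = 1;
}

int main(void)
{
    for (;;)
    {
        if (some_global_flag)            /* without volatile, this test can be optimized away */
        {
            some_global_flag = 0;
            /* do_something_very_important(); */
        }
    }
}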
Sorry, I misread the part where you already tried it...
You can try to compile the code with avr-gcc and see if you have the same behavior...
It might seem strange, but it finally proved to be caused by strong transients on one of the input lines (which powers the system, but whose ADC measurement is also used as a condition).
The system can have short periodic power failures, and important temporary data is kept in part of the internal SRAM, which is not cleared after startup and is designed to retain the data (for as much as 10 minutes or more) with the help of a small capacitor while the CPU is in brown-out.
I did not post this in the question because I had tested this part of the system and it worked perfectly, so I did not want to throw you off course.
What I found out in the end is that a new feature was used in an environment that created very strong transients, one of the conditions in my question depended on a state derived from one of those variables in the "permanent RAM", and using a breakpoint happened to shield me from the effects of that transient.
Finally the problem was solved with adjustments in timing.
Edit: what helped me find the location of the problem was that I logged the values of my most important variables in the "permanent RAM" area and could see that a few of them got corrupted.
I may be wrong here, but if you are using a debugger to attach to the board in question and debug the program on the hardware it was supposed to run on, I think it can change the behavior of the microcontroller when it performs the attach... Other than that, and the volatile keyword suggested above, I have no clues.
This is written assuming an ARM processor.
Using a breakpoint (RAM or ROM breakpoint) forces the processor to switch from Run mode to Debug mode at the breakpoint (either halt mode or monitor mode) and forces it to run at debug speed or to run an abort handler; hence JTAG-based debugging is basically intrusive debugging.
The ETM (Embedded Trace Macrocell), specifically on ARM (or other types of bus instrumentation), is designed to be non-intrusive and can log instructions and data in real time so that we can inspect what really happened.