I'm debugging a piece of (embedded) software. I've set a breakpoint on a function, and for some reason, once I've reached that breakpoint and continue, I always come back to the function (it is an initialisation function which should only be called once). When I remove the breakpoint and continue, GDB tells me:
Program received signal SIGTRAP, Trace/breakpoint trap.
Since I was working with breakpoints, I'm assuming I fell in a "breakpoint trap". What is a breakpoint trap?
Breakpoint trap just means the processor has hit a breakpoint. There are two possibilities for why this is happening. Most likely, your initialization code is being hit because your CPU is resetting and hitting the breakpoint again. The other possibility would be that the code where you set the breakpoint is actually run in places other than initialization. Sometimes with aggressive compiler optimization it can be hard to tell exactly which code your breakpoint maps to and which execution paths can get there.
The other possibility I can think of:
1. Your process is running more than one thread, say x and y.
2. Thread y hits the breakpoint, but you have attached gdb to thread x.
This case also shows up as a Trace/breakpoint trap.
I got this problem running a Linux project in Visual Studio 2015 and debugging remotely. My solution was: Project properties -> Configuration properties -> Debugging -> Debugging mode, and change the value from "gdbserver" to "gdb".
If you use VBAT as a backup supply and your backup voltage drops below 1.65 V, then you get the same problem after connecting to a power supply.
In this case you have to disconnect all power supplies and reconnect with the correct voltage level. Then the problem with debugging goes away.
I was stuck with the same problem, and in my case the solution was to decrease the SWD frequency.
(I have soldered wiring between the MCU and the host, which is not very reliable.)
I changed 4000 kHz to 100 kHz and the problem was gone.
I'm running on a Raspberry Pi Pico (RP2040, Cortex-M0+ core, debugging via VSCode cortex-debug using JLink SWD), and I'm seeing strange behaviour regarding PendSV.
Immediately prior, the SVCall exception handler requested PendSV via the ICSR register. But on exception return, rather than tail-chaining the PendSV, execution instead returns to the calling code and continues non-exception execution.
All the while the ICSR register shows the pending PendSV, even while thread code instructions are repeatedly stepped. System handler priorities are all zero, IRQ priorities are lower.
According to the ARMv6-M reference manual, PendSV cannot be disabled.
So, what am I missing that would cause this behaviour?
Edited to add:
Perhaps it's a debugger interaction? The JLink software (v4.95d) is still in Beta...
I see that the debugger can actually disable PendSV and Systick - C1.5.1 Debug Stepping: "Optionally, the debugger can set DHCSR.C_MASKINTS to 1 to prevent PendSV, SysTick, and external configurable interrupts from occurring. This is described as masking these interrupts. Table C1-7 on page C1-326 summarizes instruction stepping control."
It turns out that the problem is caused by single-stepping the instruction that writes to the PENDSVSET bit in the ICSR: the bit is set, and the VECTPENDING field shows 0xe, but the PendSV never fires.
Free-running over that instruction to a later breakpoint sees the PendSV fire correctly.
So it is indeed a debugger interaction.
Whether that's to do with interrupts being inhibited as #cooperised suggests isn't clear - the DHCSR's C_MASKINTS bit reads as zero throughout, but how that bit is manipulated during the actual step operation isn't visible at this level.
Which makes me wonder whether the way the JLink is performing the step induces unpredictable/indeterminate behaviour - e.g. as per the warning in the C_MASKINTS description. Or perhaps this is simply what happens in an M0+ under these circumstances, and I've never single-stepped this instruction before.
In any case, the workaround is simply to not single-step the instruction that sets PENDSVSET.
Edited to add:
Finally, #cooperised was correct.
On taking more care to distinguish exactly between stepping (including stepping over function calls) and running (including running to the very next instruction), it's clear that stepping disables interrupts including PendSV.
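For reference, a minimal sketch of how the PendSV request itself is made, assuming direct register access rather than the CMSIS SCB interface (the address and bit position are as given in the ARMv6-M reference manual):

#include <stdint.h>

/* ICSR lives at 0xE000ED04 in the System Control Space; writing 1 to
   bit 28 (PENDSVSET) pends the PendSV exception. */
#define ICSR            (*(volatile uint32_t *)0xE000ED04u)
#define ICSR_PENDSVSET  (1u << 28)

static inline void request_pendsv(void)
{
    ICSR = ICSR_PENDSVSET;
    __asm volatile ("dsb" ::: "memory");   /* ensure the write completes */
    __asm volatile ("isb");                /* before execution continues */
}

Single-stepping the store in request_pendsv() reproduces the behaviour above: the pend bit is set, but the exception is held off until free-running resumes.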
The same thing happened to me, but I found that the reason was that I was not closing the previous PendSV interrupt by returning through LR containing 0xFFFFFFF9. Instead I was returning via the PC to a previous routine's return address.
Since I did not return via 0xFFFFFFF9, the previous PendSV was never properly closed, and subsequent ones were not recognized.
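To illustrate, a hedged sketch of a hand-written handler that returns correctly: it branches to the EXC_RETURN value left in LR on exception entry (0xFFFFFFF9 meaning "return to Thread mode, main stack"), rather than to a saved PC. The handler body here is a placeholder.

/* A naked handler has no compiler prologue/epilogue, so LR still holds
   EXC_RETURN. Branching to it triggers the hardware exception return;
   branching to an ordinary code address leaves the exception active. */
__attribute__((naked)) void PendSV_Handler(void)
{
    __asm volatile (
        /* ... context-switch work would go here ... */
        "bx lr\n"
    );
}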
I am trying to set up SPI communication between an OMAP processor and a SAM4L, with the OMAP configured as the master. The test data I am sending reaches the SAM4L correctly, and I can see the ISR printing that data. With several printf calls here and there in the ISR, the expected operation happens; but if I remove all the printfs, I can't see any operation happening at all. What can be the cause of this anomaly? Is it a usual case of wrong frequency settings or something?
If code is needed I will post that too, but it's big.
Thanks
I think you are trying to print messages in a driver.
Printing messages to the console slows your driver down; that extra delay may be the only reason the driver appears to work.
Use pr_info() for debugging, and stop messages from going to the console by setting /proc/sys/kernel/printk to 4 4 1 7:
-> Debug messages will be stored in the kernel ring buffer.
-> The driver is not slowed down by printing messages on the screen.
-> You can read them later with the dmesg command.
Then find the original problem which may be causing the error.
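A minimal sketch of that logging style, assuming a hypothetical ISR (my_spi_isr and its registration are illustrative, not from the question):

#include <linux/interrupt.h>
#include <linux/printk.h>

/* pr_info() writes to the kernel ring buffer; with the console log
   level lowered as above, it avoids the slow console path entirely,
   and the messages can be read later with dmesg. */
static irqreturn_t my_spi_isr(int irq, void *dev_id)
{
    pr_info("my_spi_isr: irq %d fired\n", irq);
    return IRQ_HANDLED;
}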
If a routine works with printf calls "here and there" and not otherwise, almost certainly the problem is a timing issue. As a trivial example, say you write to an SPI flash and then check its content. The flash write takes some time, so if you check immediately, the data will not be valid yet; but if you insert a printf call in between, it may take long enough that the read-back is now valid.
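A sketch of the kind of fix this usually calls for, assuming a hypothetical SPI-flash driver API (both spi_flash_* names are placeholders): poll the device's busy flag rather than relying on an incidental printf() delay.

#include <stdint.h>

#define FLASH_STATUS_BUSY  (1u << 0)   /* write-in-progress bit, typical for SPI flash */

/* Hypothetical driver calls -- substitute your actual SPI layer. */
extern void    spi_flash_program(uint32_t addr, const uint8_t *buf, uint32_t len);
extern uint8_t spi_flash_read_status(void);

void flash_write_blocking(uint32_t addr, const uint8_t *buf, uint32_t len)
{
    spi_flash_program(addr, buf, len);
    while (spi_flash_read_status() & FLASH_STATUS_BUSY)
        ;   /* wait for the write to finish before reading back */
}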
I have an application that I am porting from the Keil IDE to the GNU toolchain due to license issues. I have successfully been able to set up, build, flash and run the application on the device.
The application built with the GNU toolchain is, for some reason, getting stuck in the weak-linked IRQ handler for the WWDG, which is an infinite loop. The application does not enable the WWDG, and it is disabled at reset by default. I have also verified that the configuration registers are at their default startup values.
The only differences, other than the compilers, are the linker and startup files. However, both the startup files and the linker files used by the two toolchains are the defaults generated by STM.
Any idea what may be causing this? I'm about at my wits end here.
I'm using the STM32F103xx; let me know if any other information would be helpful.
EDIT:
Using the comments below I was able to ascertain that it is, in fact, the HardFault_Handler that is being triggered.
I have included the backtrace output below if that may be of help
GDB BT:
#0 HardFault_Handler ()
#1 <signal handler called>
#2 0x0720a3de in ?? ()
#3 0x80005534 in foo ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Two things stand out to me, though I'm no gdb expert: 1) foo is not a function, it is a const array of chars, and 2) 0x0720a3de is not a valid memory address; the flash address range starts at 0x08000000.
So, thanks to the kick in the pants by D Krueger, I was able to figure out that the HardFault_Handler was what was actually being called. So, anyone who stumbles on this post: verify which IRQ is truly being called by writing temporary functions to cover the likely culprits, i.e. HardFault. The true issue for the IRQ call is a bad memory access by memcpy, which I am on my way to solving next.
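A sketch of such temporary handlers, assuming CMSIS-style weak symbols in the startup file (bkpt halts immediately under a debugger):

/* Override the weak defaults so each fault is distinguishable instead
   of everything funnelling into the shared default handler loop. */
void HardFault_Handler(void)
{
    __asm volatile ("bkpt #0");
    for (;;) { }
}

void WWDG_IRQHandler(void)
{
    __asm volatile ("bkpt #0");
    for (;;) { }
}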
I had exactly the same error as OP (apparent WWDG interrupt, but actually the HardFault_Handler firing) when porting an example for the STM32F3 Discovery board to compile in CooCox CoIDE 1.7.7 with STM32Cube F3 libraries (v1.1.0). The code ran fine as long as I didn't try using any interrupts, but as soon as I turned on the SysTick timer interrupt, the HardFault exception tripped.
The problem was that I had neglected to include the stm32f3xx_it.h and stm32f3xx_it.c files in the project. Their absence wasn't causing any compiler warnings/errors. Once they were compiled & linked in, the code with interrupts ran fine.
I've had this problem due to the same root cause as awilhite. I'm using Atollic TrueStudio 8.0.0, which I used to start a project for the STM32F030, and I (probably manually) added a libraries folder with stm32f0xx.h, which defines ADC1_IRQn (the IRQ channel number used in the NVIC setup).
And I implemented ADC1_IRQHandler(void) in my main.c (as I'm used to, and it has always worked so far -- x_IRQn -> x_IRQHandler).
But after 2 days of frustration, I found out that startup_stm32f0xx.s in my project defines ADC1_COMP_IRQHandler.
So, ultimately, my ADC interrupt handler was undefined, and when the ADC generated the interrupt, the program crashed (apparent WWDG interrupt).
I hope this helps to people like me, who think they did implement their handler but in fact, they did not.
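In code, the fix is simply to use the exact name from the startup file's vector table -- a sketch:

/* startup_stm32f0xx.s names the vector slot ADC1_COMP_IRQHandler, so a
   function called ADC1_IRQHandler never overrides the weak default. */
void ADC1_COMP_IRQHandler(void)    /* must match the startup file exactly */
{
    /* ... handle the ADC/COMP interrupt ... */
}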
I had a very similar problem when merging two projects generated separately by STM32CubeMX for an STM32F2XX processor. One project was using the Ethernet peripheral, while the other was not. Besides that one difference, the two projects used the same set of peripherals.
After integrating the two projects together by manually copying files, the application would end up in WWDG_IRQHandler after starting the first task (when interrupts are enabled for the first time). I first confirmed that the WDGA bit of the WWDG register was indeed not set and, therefore, that the WWDG peripheral was disabled. Next, I verified that the interrupt vector table was initialized correctly. Finally, after several hours of digging, I realized that I had not defined the ETH_IRQHandler function in stm32f2xx_it.c, which caused the Ethernet interrupt to be handled by the default handler, which shows up as WWDG_IRQHandler -- likely because all the unimplemented vectors share the default handler's address.
The core problem is that the default handler is called instead of the expected IRQ handler. I doubt that our situations are the same, but here is my solution:
I was working on a C++ project when the same thing happened to me. This was the first time I had made a project from scratch with CMSIS. After some unsuccessful attempts, I went through a generated project and noticed that in stm32xxxx_it.h the IRQ handler function prototypes are guarded like this:
extern "C"
{
void TIM7_IRQHandler(void);
}
With these guards the linker could find my own interrupt handler functions: without extern "C", the C++ compiler mangles the handler's name, so it never overrides the weak symbol from the startup file.
I'll expand a bit on what led me here, and how I used the insight from #Mike to correct it.
I had a project running fine on a demo project in Eclipse SW4STM32, but with sources and headers scattered all over the place so I wanted to have a more "compact" project easier to customize and use as a base for minor modifications (and easier to follow in Git).
I created an empty AC6 project targeting the same board. It generated the HAL drivers, the startup_stm32.s and LinkerScript.ld. I then copied all of the .c and corresponding .h files from the original project into my new project (which was a pain in itself, because they were scattered in BSP, CMSIS, Components, Middlewares, etc. directories). Everything compiled and seemed to work, until I started modifying things a bit.
In the debugger, it seemed all function calls were working until the while(1) main loop, where I ended up in the Default_Handler defined in startup_stm32.s, seemingly from WWDG_IRQHandler. That was, in fact, the default IRQ handler for all handlers not defined by the user (WWDG_IRQHandler being the first one declared, it was reported as such by gdb, as indicated by #D Krüger).
I started looking at compiler and linker options or linker script, without much luck, until I realized the only file I didn't check was the startup_stm32.s, which was indeed different.
I blindly copy-pasted it and voilà!
The explanation I can give is that the STM32 calls the IRQ handlers defined in startup_stm32.s when an interrupt occurs, all of them initially pointing to Default_Handler() (later overridden by the linker). So if a .c file you copied defines a handler with a slightly different name (but one consistent with its own startup_xxx.s), you'll end up with Default_Handler() being called (an infinite loop) instead of the handler you defined. And things go wrong.
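A C rendering of the pattern the startup file uses (the real file does this with .weak and .thumb_set assembler directives; the handler names here are just examples):

/* Every vector is a weak alias of Default_Handler; defining a function
   with exactly the same name overrides the alias at link time, while a
   misspelled name silently leaves the infinite loop in place. */
void Default_Handler(void)
{
    for (;;) { }
}

void WWDG_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));
void TIM7_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));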
See https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html for more information.
N.B. I'm not happy to blindly copy-paste without fully understanding, but time constraints and milestones usually push you to territories you're not happy to explore...
Will add my 5 cents. I had this issue on an STM32H7, but for me the cause was that the Cube "forgot" to add TIM16_IRQHandler when TIM16 is used as the timebase source. It did not happen at the beginning, but only after several code regenerations. Looks like a bug in the Cube: TIM16 was still set as the timebase, but the interrupt handler got removed. Toggling to TIM17 and back resolved the issue.
In my case, I had a function written in GNU (GCC) assembly that was migrated from ARM assembly. The problem went away after I added the .thumb_func directive to the assembly file.
I was getting this error:
(gdb) c
+c
Continuing.
Program received signal SIGINT, Interrupt.
WWDG_IRQHandler () at ...startup_stm32f40_41xxx.s:121
(gdb) bt
#0 WWDG_IRQHandler () at ...startup_stm32f40_41xxx.s:12
#1 <signal handler called>
#2 RTOS_SysTick_Handler () at ...osKernel.s:18
#3 <signal handler called>
#4 0x0800021a in task0 () at ...main.cpp:10
#5 0x08000214 in frame_dummy ()
#6 0x00000000 in ?? ()
RTOS_SysTick_Handler is a function written in assembly, and the WWDG_IRQHandler was always triggered before the first assembly instruction in that function (I tried different instructions and it didn't change anything).
I was doing some tweaks around the C code and at some point, I hit another handler: UsageFault which led me to the .thumb_func hint: ARM Cortex M4 SVC_Handler "UsageFault".
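For reference, a sketch of the fix using top-level inline assembly in a C file (the handler name is the one from the backtrace above; the body is a placeholder):

/* .thumb_func marks the symbol as Thumb code, so its address in the
   vector table gets bit 0 set; without it the core attempts an
   ARM-state exception entry, which faults on Cortex-M. */
__asm__(
    ".syntax unified              \n"
    ".thumb                       \n"
    ".global RTOS_SysTick_Handler \n"
    ".thumb_func                  \n"
    "RTOS_SysTick_Handler:        \n"
    "    bx lr                    \n"
);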
Usually when an interrupt occurs, the program returns to the line from which the interrupt was generated.
I want to run the program from a new line after the ISR routine completes, i.e. I don't want it to go back to where the interrupt was generated.
Would I have to change the IP stored on the stack, or something else?
thanks
The PC (program counter), commonly called the instruction pointer (IP) on Intel x86, holds the address of the next instruction. You need to change the PC to the new line at the end of the interrupt routine.
You can also increment the PC value stored on the stack at the end of the interrupt routine; on return it will be loaded into the PC.
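As a sketch on a Cortex-M target like the others in this thread (assuming the interrupted code uses the main stack and the interrupted instruction is a 16-bit Thumb one -- both must hold for this to be safe):

/* On exception entry the hardware stacks r0-r3, r12, lr, pc, xPSR; the
   stacked pc is the 7th word (offset 24). A naked handler keeps the
   compiler from pushing anything before we read MSP. */
__attribute__((naked)) void SysTick_Handler(void)    /* example handler */
{
    __asm volatile (
        "mrs   r0, msp        \n"   /* r0 -> exception stack frame   */
        "ldr   r1, [r0, #24]  \n"   /* fetch the stacked return pc   */
        "adds  r1, r1, #2     \n"   /* skip one 16-bit instruction   */
        "str   r1, [r0, #24]  \n"
        "bx    lr             \n"   /* exception return to new pc    */
    );
}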
Your ISR has no idea of the point of execution of what it has interrupted and no clue as to what is stored on the stack of what it has interrupted. Just 'jumping' to another 'line', without a stack-cleanup operation, (which is not possible 'cos you don't know what's on it), will generate UB, (probably UB erring on the AV/segFault side).
The only way I know of to achieve something like what you seem to want is to swap to a different stack: signal a semaphore/event upon which a thread is waiting, and request an OS scheduler run on ISR exit. The newly ready thread may well then run immediately after the ISR completes (depending on loading/priorities etc.), maybe even preempting the thread that was interrupted, and so 'run the program from a new line', sort of... :)
I have a strange bug in my code which disappears when I try to debug it.
In my timer interrupt (always running system ticker) I have something like this:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
in my main loop I have
if (some_global_flag)
{
some_global_flag = 0;
do_something_very_important(); // breakpoint 1
}
The body of this condition in the main loop is never executed, even though the conditions in the timer are (I think) fulfilled. The conditions are external (port pins, ADC results, etc.).
First I put a breakpoint at the position 1, and it is never triggered.
To check it, I put breakpoint nr. 2 on the line some_global_flag = 1;, and in this case the code works: both breakpoints are triggered when the conditions are true.
Update 1:
To investigate whether some timing condition is responsible, and whether the if in the timer is never entered when running without the debugger, I added the following to my timer:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
if (some_global_flag)
{
#asm("NOP"); // breakpoint 3
}
The flag is not used anywhere else in the code. It is in RAM, and the RAM is cleared to zero at the beginning.
Now, when all the breakpoints are disabled (or only breakpoint 1 in the main is enabled), the code does not work correctly, the function is not executed. However, if I enable only the breakpoint 3 on the NOP, the code works! The breakpoint is triggered, and after continuing, the function is executed. (It has visible and audible output, so it's obvious if it runs)
Update 2:
The timer interrupt was interruptible, by means of an "SEI" at its beginning. I removed that line, but the behavior did not change in any noticeable way.
Update 3:
I'm not using any external memory.
As I'm very close to the flash limit, I have the compiler's size optimization on maximum.
Can the compiler (CodeVision) be responsible, or did I do something very wrong?
Debuggers can/do change the way the processor runs and code executes so this is not surprising.
Divide and conquer. Start removing things until it works. In parallel with that, start with nothing: add only the timer interrupt and the few lines of code in the main loop, with do_something_very_important() being something simple like blinking an LED or spitting something out the UART. If that doesn't work, you won't get the bigger app to work. If it does work, start adding init code and more conditions in your interrupt, but do not complicate the main loop any more than the few lines described. Keep adding the interrupt handler conditions back in until it fails.
When you reach the boundary where adding one thing fails and removing it doesn't, do some disassembly to see if it is a compiler thing. This might warrant another SO ticket if it is not obvious: "why does my AVR interrupt handler break when I add ..."
If you are able to get this down to a small number of lines of code, a dozen or so in main and just the few interrupt lines, post that so others can try it on their own hardware and perhaps figure it out in parallel.
This is probably a typical optimization/debugging bug. Make sure that some_global_flag is marked as volatile. It may be an int, uint8_t, uint64_t, whatever you like...
volatile int some_global_flag;
This way you tell the compiler not to make any assumptions about the value of some_global_flag. You must do this because the compiler/optimizer can't see any call to your interrupt routine, so it assumes some_global_flag is always 0 (the initial state) and never changes.
Sorry, I misread the part where you already tried it...
You can try to compile the code with avr-gcc and see if you get the same behavior...
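For completeness, a minimal avr-gcc version of the pattern (the vector name and the two helper functions are illustrative):

#include <avr/interrupt.h>
#include <stdint.h>

volatile uint8_t some_global_flag = 0;   /* volatile: shared with the ISR */

extern uint8_t conditions_met(void);             /* placeholder */
extern void do_something_very_important(void);   /* placeholder */

ISR(TIMER0_OVF_vect)   /* avr-gcc syntax; CodeVision writes
                          interrupt [TIM0_OVF] void ... instead */
{
    if (conditions_met())
        some_global_flag = 1;
}

int main(void)
{
    sei();
    for (;;) {
        if (some_global_flag) {    /* re-read from RAM every pass */
            some_global_flag = 0;
            do_something_very_important();
        }
    }
}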
It might seem strange, but the problem finally proved to be caused by strong transients on one of the input lines (which powers the system, but whose ADC measurement is also used as one of the conditions).
The system can have periodic power fails for a short time, and important temporary data is kept in part of the internal SRAM, which is not cleaned after startup and designed to retain the data (for as much as 10 minutes or more) with the use of a small capacitor while the CPU is in brown-out.
I did not post this in the question because I had tested this part of the system and it worked perfectly, so I did not want to throw you off course.
What I found out in the end is that a new feature was used in an environment which created very strong transients; one of the conditions in my question depended on a state which in turn depended on one of those variables in the "permanent RAM", and using a breakpoint happened to shield me from the effects of that transient.
Finally the problem was solved with adjustments in timing.
Edit: what helped me find the location of the problem was that I logged the values of my most important variables in the "permanent RAM" area and could see that a few of them got corrupted.
I may be wrong here, but if you are using a debugger to attach to the board in question and debug the program on the hardware it is supposed to run on, I think it can change the behavior of the microcontroller when it performs the attach... Other than that, and the volatile keyword suggested above, I have no clues.
This is written assuming an ARM processor.
Using a breakpoint (RAM or ROM breakpoint) forces the processor to switch from run mode to debug mode at the breakpoint (either halt mode or monitor mode) and forces it to run at debug speed or to run an abort handler; hence JTAG-based debugging is inherently intrusive debugging.
The ETM (Embedded Trace Macrocell), specifically on ARM (or other types of bus instrumentation), is designed to be non-intrusive and can log instructions and data in real time so that we can inspect what really happened.