Schrödinger bug disappearing when breakpoint is set

Schrödinger bug disappearing when breakpoint is set - c

I have a strange bug in my code which disappears when I try to debug it.
In my timer interrupt (always running system ticker) I have something like this:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
in my main loop I have
if (some_global_flag)
{
some_global_flag = 0;
do_something_very_important(); // breakpoint 1
}
This condition in the main loop is never called when the conditions in the timer are (I think) fulfilled. The conditions are external (portpins, ADC results, etc).
First I put a breakpoint at the position 1, and it is never triggered.
To check it, I put breakpoint nr. 2 on the line some_global_flag = 1;, and in this case the code works: both breakpoints are triggered when the conditions are true.
Update 1:
To research whether some timing condition is responsible, and the if in the timer is never entered if running without debugging, I added the following in my timer:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
if (some_global_flag)
{
#asm("NOP"); // breakpoint 3
}
The flag is not used anywhere else in the code. It is in RAM, and the RAM is cleared to zero at the beginning.
Now, when all the breakpoints are disabled (or only breakpoint 1 in the main is enabled), the code does not work correctly, the function is not executed. However, if I enable only the breakpoint 3 on the NOP, the code works! The breakpoint is triggered, and after continuing, the function is executed. (It has visible and audible output, so it's obvious if it runs)
Update 2:
The timer interrupt was interruptible, by means of a "SEI" at its beginning. I removed that line, but the behavior is not changed in any noticeable way.
Update 3:
I'm not using any external memory.
As I'm very close to the limit in the flash, I have size optimization in the compiler on maximum.
Can the compiler (CodeVision) be responsible, or did I do something very wrong?

Debuggers can/do change the way the processor runs and code executes so this is not surprising.
divide and conquer. Start removing things until it works. In parallel with that start with nothing add only the timer interrupt and the few lines of code in the main loop with do_something_very_important() being something simple like blinking an led or spitting something out the uart. if that doesnt work you wont get the bigger app to work. If that does work start adding init code and more conditions in your interrupt, but do not complicate the main loop any more than the few lines described. Increase the interrupt handler conditions by adding more of the code back in until it fails.
When you reach the boundary where you can add one thing and fail and remove it and not fail then do some disassembly to see if it is a compiler thing. this might warrant another SO ticket if it is not obvious, "why does my avr interrupt handler break when I add ..."
If you are able to get this down to a small number of lines of code a dozen or so main and just the few interrupt lines, post that so others can try it on their own hardware and perhaps figure it out in parallel.

This is probably an typical optimizing / debugging bug. Make sure that some_global_flag is marked as volatile. This may be an int uint8 uint64 whatever you like...
volatile int some_global_flag
This way you tell the compiler not to make any assumptions on what the value of some_global_flag will be. You must do this because the compiler/optimizer can't see any call to your interrupt routine, so it assumes some_global_flag is always 0 (the initial state) and never changed.
Sorry misread the part where you already tried it...
You can try to compile the code with avr-gcc and see if you have the same behavior...

It might seem strange but it finally proved to be caused by strong transients on one of the input lines (which powers the system but its ADC measurement is also used as a condition).
The system can have periodic power fails for a short time, and important temporary data is kept in part of the internal SRAM, which is not cleaned after startup and designed to retain the data (for as much as 10 minutes or more) with the use of a small capacitor while the CPU is in brown-out.
I did not post this in the question because I tested this part of the system it and worked perfectly, so I did not want to throw you off course.
What I found out at the end, is that a new feature was used in an environment which created very strong transients, and one of the conditions in my question depended on a state which depended on one of those variables in the "permanent RAM", and finally using a breakpoint saved me from the effects of that transient.
Finally the problem was solved with adjustments in timing.
Edit: what helped me find the location of the problem was that I logged the values of my most important variables in the "permanent RAM" area and could see that a few of them got corrupted.

I may be wrong here but if you are using a debugger to attach to the board in question and debug the program on the hardware it was supposed to run on i think it can change the behavior of the microcontroller when it performs an attach.... Other that that and the volatile keyword suggested above i have no clues.

This is written assuming an ARM processor.
using a breakpoint ( RAM or ROM bkpoint ) forces processor to switch from Run mode to Debug Mode at the breakpoint ( either to halt mode or Monitor mode) and force it to run in Debug speed or to run an abort handler and hence JTAG based debugging is basically intrusive debugging.
ETM( embedded Trace Macrocell),specifically in ARM (or other types of bus instrumentation ) is designed to be non intrusive and can log the instructions and data in real time so that we can inspect what really happened.

Related

How Can I save some data before hardware reset of microcontroller?

I'm working on one of Freesacle micro controller. This microcontroller has several reset sources (e.g. clock monitor reset, watchdog reset and ...).
Suppose that because of watchdog, my micro controller is reset. How can I save some data just before reset happens. I mean for example how can I understand that where had been the program counter just before watchdog reset. With this method I want to know where I have error (in another words long process) that causes watchdog reset.

Most Freescale MCUs work like this:
RAM is preserved after watchdog reset. But probably not after LVD reset and certainly not after power-on reset. This is in most cases completely undocumented.
The MCU will either have a status register where you can check the reset cause (for example HCS08, MPC5x, Kinetis), or it will have special reset vectors for different reset causes (for example HC11, HCS12, Coldfire).
There is no way to save anything upon reset. Reset happens and only afterwards can you find out what caused the reset.
It is however possible to reserve a chunk of RAM as a special segment. Upon power-on reset, you can initialize this segment by setting everything to zero. If you get a watchdog reset, you can assume that this RAM segment is still valid and intact. So you don't initialize it, but leave it as it is. This method enables you to save variable values across reset. Probably - this is not well documented for most MCU families. I have used this trick at least on HCS08, HCS12 and MPC56.
As for the program counter, you are out of luck. It is reset with no means to recover it. Meaning that the only way to find out where a watchdog reset occurred is the tedious old school way of moving a breakpoint bit by bit down your code, run the program and check if it reached the breakpoint.
Though in case of modern MCUs like MPC56 or Cortex M, you simply check the trace buffer and see what code that caused the reset. Not only do you get the PC, you get to see the C source code. But you might need a professional, Eclipse-free tool chain to do this.

Depending on your microcontroller you may get Reset Reason, but getting previous program counter (PC/IP) after reset is not possible.
Most of modern microcontrollers have provision for Watchdog Interrupt Instead of reset.
You can configure watchdog peripheral to enable interrupt , In that ISR you can check stored context on stack. ( You can take help from JTAG debugger to check call stack).
There are multiple debugging methods available if your micro-controller dosent support above method.
e.g
In simple while(1) based architecture you can use a HW timer and restart it after some section of code. In Timer ISR you will know which code section is consuming long enough than the timer.

Two things:
Write a log! And rotate that log to keep the last 30 min. or whatever reasonable amount of time you think you need to reproduce the error. Where the log stops, you can see what happened just before that. Even in production-level devices there is some level of logging.
(Less, practical) You can attach a debugger to nearly every micrcontroller and step through the code. Probably put a break-point that is hit just before you enter the critical section of code. Some IDEs/uCs allow having "data-breakpoints" that get triggered when certain variables contain certain values.
Disclaimer: I am not familiar with the exact microcontroller that you are using.

It is written in your manual.
I don't know that specific processor but in most microprocessors a watchdog reset is a soft reset, meaning that certain registers will keep information about the reset source and sometimes reason.
You need to post more specific information on your Freescale μC for this be answered properly.

Even if you could get the Program Counter before reset, it wouldn't be advisable to blindly set the program counter to another after reset --- as there would likely have been stack and heap information as well as the data itself may also have changed.
It depends on what you want to preserve after reset, certain behaviour or data? Volatile memory may or may not have been cleared after watchdog (see your uC datasheet) and you will be able to detect a reset after checking reset registers (again see your uC datasheet). By detecting a reset and checking volatile memory you may be able to prepare your uC to restart in a way that you'd prefer after the unlikely event of a reset occurring. You could create a global value and set it to a particular value in global scope, then if it resets, check the value against it when a reset event occurs -- if it is the same, you could assume other memory may also be the same. If volatile memory is not an option you'll need to have a look at the datasheet for non-volatile options, however it is also advisable not to continually write to non-volatile memory due to writing limitations.

The only reliable solution is to use a debugger with trace capability if your chip supports embedded instruction trace.
Some devices have an option to redirect the watchdog timeout to an interrupt rather then a reset. This would allow you to write the watchdog timeout handler much like an exception handler and dump or store the stack information including the return address which will indicate the location the interrupt occurred.
However in some cases, neither solution is a reliable method of achieving your aim. In a multi-tasking environment or system with interrupt handlers, the code running when the watchdog timeout occurs may not be the process that is causing the problem.

How do you know when a micro-controller reset?

I am learning embedded systems on the ARM9 processor (SAM9G20). I am more familiar with procedural programming for general purpose. Thus what I am doing is going through the data sheet and learning what registers there are and how to manipulate them.
My question is, how do I know when the computer reset? I know that there is a Reset Controller that manages resets. A register called the Status Register (RSTC_SR) stores the source of the reset. Do I need to keep periodically reading this register?
My solution is to store the number of resets in the FRAM (or start by setting it to 0), once a reset happens, I compare this variable with the register value in my main function. If the register value is higher then obviously it reset. However I am sure there is a more optimized way (perhaps using interrupts). Or is this how its usually done?

You do not need to periodically check, since every time the machine is reset your program will re-start from the beginning.
Simply add checks to the startup code, i.e. early in main(), as needed. If you want to figure out things like how often you reset, then that is more difficult since typically (no experience with SAMs, I'm an STM32 type of guy) on-board timers etc will also reset. Best would be some kind of real-world independent clock, like an RTC that you can poll and save the value of. Please consider if you really need this, though.

A simple solution is to exploit the structure of your code.
Many code bases for embedded take this form:
int main(void)
{
// setup stuff here
while (1)
{
// handle stuff here
}
return 0;
}
You can exploit that the code above while(1) is only run once at startup. You could increment a counter there, and save it in non-volatile storage. That would tell you how many times the microcontroller has reset.
Another example is on Arduino, where the code is structured such that a function called setup() is called once, and a function called loop() is called continuously. With this structure, you could increment the variable in the setup()-function to achieve the same effect.

Whenever your processor starts up, it has by definition come out of reset. What the reset status register does is indicate the source or reason for the reset, such as power-on, watchdog-timer, brown-out, software-instruction, reset-pin etc.
It is not a matter of knowing when your processor has reset - that is implicit by the fact that your code has restarted. It is rather a matter of knowing the cause of the reset.
You need not monitor or read the reset status at all if your application has no need of it, but in some applications perhaps it is a useful diagnostic for example to maintain a count of various reset causes as it may be indicative of the stability of your system software, its power-supply or the behaviour of the operators. Ideally you'd want to log the cause with a timestamp assuming you have an suitable RTC source early enough in your start-up. The timing of resets is often a useful diagnostic where simply counting them may not be.
Any counting of the reset cause should occur early in your code start-up before any interrupts are enabled (because an interrupt may itself cause a reset). This may require you to implement the counters in the start-up code before main() is invoked in cases where the start-up code might enable interrupts - for stdio or filesystem support fro example.

A way to do this is to run the code in debug mode (if you got a debugger for the SAM). After a reset the program counter(PC) points to the address where your code starts.

HCS12 embedded: counter timer and calculated output compare values

I'm having trouble with timer output compare interrupts on the HCS12. The problem seems to be that I'm writing calculated values to the output compare registers rather than immediates, ie...
OCval = x + y ; ldd OC1, OCval ; // what I need to do
ldd OC1, #3000 ; // what works
With the calculated values, the timer interrupt is erratic, which is unacceptable in my application. The problem has been firmly pinned down to the documented requirement to access the timer and OC registers in a single cycle, anything other than an immediate write violates this. I also note that all the sample code on the web uses immediate ops.
Just wondering if there is a software workaround. I need to allow the counter to free run (ie. no reset) because there are other output compares with immediate writes that have to remain in action. Only two of my interrupts need to be calculated.
A software fix would be nice because the only other options I can see involve additional hardware to handle the dynamic timing, messy. TIA

This is a bit tentative, but early tests are encouraging. I've moved the offending interrupts from the main timer to the modulo downcounter, which also provides clocked interrupts. The documentation states that setting the count register is subject to the same single-cycle write rules, however my extensive testing with the main timer indicates that problems are very unlikely to occur until the counter has run for a while with the new setting. The advantage of the new approach is that a value needs to be written only once, on initially setting the time value, unlike the main timer where a rewrite needed to be performed many times a second.
In case it helps, I'm stopping the counter before doing the write, then restarting it.

Where does finite-state machine code belong in µC?

I asked this question on EE forum. You guys on StackOverflow know more about coding than we do on EE so maybe you can give more detail information about this :)
When I learned about micrcontrollers, teachers taught me to always end the code with while(1); with no code inside that loop.
This was to be sure that the software get "stuck" to keep interruption working. When I asked them if it was possible to put some code in this infinite loop, they told me it was a bad idea. Knowing that, I now try my best to keep this loop empty.
I now need to implement a finite state machine in a microcontroller. At first view, it seems that that code belong in this loop. That makes coding easier.
Is that a good idea? What are the pros and cons?
This is what I plan to do :
void main(void)
{
// init phase
while(1)
{
switch(current_State)
{
case 1:
if(...)
{
current_State = 2;
}
else(...)
{
current_State = 3;
}
else
current_State = 4;
break;
case 2:
if(...)
{
current_State = 3;
}
else(...)
{
current_State = 1;
}
else
current_State = 5;
break;
}
}
Instead of:
void main(void)
{
// init phase
while(1);
}
And manage the FSM with interrupt

It is like saying return all functions in one place, or other habits. There is one type of design where you might want to do this, one that is purely interrupt/event based. There are products, that go completely the other way, polled and not even driven. And anything in between.
What matters is doing your system engineering, thats it, end of story. Interrupts add complication and risk, they have a higher price than not using them. Automatically making any design interrupt driven is automatically a bad decision, simply means there was no effort put into the design, the requirements the risks, etc.
Ideally you want most of your code in the main loop, you want your interrupts lean and mean in order to keep the latency down for other time critical tasks. Not all MCUs have a complicated interrupt priority system that would allow you to burn a lot of time or have all of your application in handlers. Inputs into your system engineering, may help choose the mcu, but here again you are adding risk.
You have to ask yourself what are the tasks your mcu has to do, what if any latency is there for each task from when an event happens until they have to start responding and until they have to finish, per event/task what if any portion of it can be deferred. Can any be interrupted while doing the task, can there be a gap in time. All the questions you would do for a hardware design, or cpld or fpga design. except you have real parallelism there.
What you are likely to end up with in real world solutions are some portion in interrupt handlers and some portion in the main (infinite) loop. The main loop polling breadcrumbs left by the interrupts and/or directly polling status registers to know what to do during the loop. If/when you get to where you need to be real time you can still use the main super loop, your real time response comes from the possible paths through the loop and the worst case time for any of those paths.
Most of the time you are not going to need to do this much work. Maybe some interrupts, maybe some polling, and a main loop doing some percentage of the work.
As you should know from the EE world if a teacher/other says there is one and only one way to do something and everything else is by definition wrong...Time to find a new teacher and or pretend to drink the kool-aid, pass the class and move on with your life. Also note that the classroom experience is not real world. There are so many things that can go wrong with MCU development, that you are really in a controlled sandbox with ideally only a few variables you can play with so that you dont have spend years to try to get through a few month class. Some percentage of the rules they state in class are to get you through the class and/or to get the teacher through the class, easier to grade papers if you tell folks a function cant be bigger than X or no gotos or whatever. First thing you should do when the class is over or add to your lifetime bucket list, is to question all of these rules. Research and try on your own, fall into the traps and dig out.

When doing embedded programming, one commonly used idiom is to use a "super loop" - an infinite loop that begins after initialization is complete that dispatches the separate components of your program as they need to run. Under this paradigm, you could run the finite state machine within the super loop as you're suggesting, and continue to run the hardware management functions from the interrupt context as it sounds like you're already doing. One of the disadvantages to doing this is that your processor will always be in a high power draw state - since you're always running that loop, the processor can never go to sleep. This would actually also be a problem in any of the code you had written however - even an empty infinite while loop will keep the processor running. The solution to this is usually to end your while loop with a series of instructions to put the processor into a low power state (completely architecture dependent) that will wake it when an interrupt comes through to be processed. If there are things happening in the FSM that are not driven by any interrupts, a normally used approach to keep the processor waking up at periodic intervals is to initialize a timer to interrupt on a regular basis to cause your main loop to continue execution.
One other thing to note, if you were previously executing all of your code from the interrupt context - interrupt service routines (ISRs) really should be as short as possible, because they literally "interrupt" the main execution of the program, which may cause unintended side effects if they take too long. A normal way to handle this is to have handlers in your super loop that are just signalled to by the ISR, so that the bulk of whatever processing that needs to be done is done in the main context when there is time, rather than interrupting a potentially time critical section of your main context.

What should you implement is your choice and debugging easiness of your code.
There are times that it will be right to use the while(1); statement at the end of the code if your uC will handle interrupts completely (ISR). While at some other application the uC will be used with a code inside an infinite loop (called a polling method):
while(1)
{
//code here;
}
And at some other application, you might mix the ISR method with the polling method.
When said 'debugging easiness', using only ISR methods (putting the while(1); statement at the end), will give you hard time debugging your code since when triggering an interrupt event the debugger of choice will not give you a step by step event register reading and following. Also, please note that writing a completely ISR code is not recommended since ISR events should do minimal coding (such as increment a counter, raise/clear a flag, e.g.) and being able to exit swiftly.

It belongs in one thread that executes it in response to input messages from a producer-consumer queue. All the interrupts etc. fire input to the queue and the thread processes them through its FSM serially.
It's the only way I've found to avoid undebuggable messes whilst retaining the low latencty and efficient CPU use of interrupt-driven I/O.
'while(1);' UGH!

breakpoint inside interrupt C

I'm using LCDK C6748 of Texas Intruments with Code Composer Studio and TMDSEMU100V2U-14T - XDS100v2 USB JTAG Emulator.
LCDK comes with bunch of support functions, including a function that initialize the board and defines which callback functions are called for each interrupt.
I just implemented the callback function, so it does something whenever a new sample comes from the ADC.
I tried to set a breakpoint inside the interrupt but in run time the program "flow" didn't get there.
Furthermore, I've done something simpler:
volatile int flag = 0;
interrupt void interrupt4(void) // interrupt service routine
{
flag = 1;
return;
}
int main(){
// board initializing function, defining sampling rate etc.
L138_initialise_intr(FS_48000_HZ,ADC_GAIN_0DB,DAC_ATTEN_0DB);
while(1){
if (flag == 1){
printf("interrupt entered");
flag = 0;
}
}
}
but the from some reason the while loop was entered only once.
it surprised me because if I don't set breakpoint the interrupt is entered continuously-
I tried to just pass the samples to the speakers line without doing anything else and I heard music.
I have a feeling that I'm missing something very basic about interrupts, I'm quite new to this subject.
Can someone please explain to me [or link me to good source that explain how the mechnism works in DSP]:
1) why we can't set a breakpoint inside interrupt?
2) why even if I set breakpoint in the main, it seems the interrupt doesn't occur, and if I don't it does.
3) which ways I have to have access to the variables in run time, in CCS?
thanks

Try putting breakpoint and then run. see if it hits atleast once. If it does, then your interrupt source is not cleared automatically [because you are not doing so explicitly inside ISR]. in TI controller they expect you to clear ISR path to receive next as per my experience,.
If you dont receive even 1st time interrupt then, check assembly generated for ISR and optimization done by compiler.
Although, you might need to see the timing and global variable protection later, in case of conflicts but as of now above 2 suggestions shall do.

I think that your interrupt is a Timer Interrupt. In many cases jtag, when a breakpoint is triggered, stops a lot of MPU/DSP modules, but timer continue running. This causes overflow of timer, that means that overflow flag is set and the interrupt will be never called until the flag is reset.
I don't know if you can set jtag to stop timers also when a breakpoint is triggered. With freescale MPUs, IAR IDE and segger jtag I can.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight