Variable is not incremented in ISR - c

I have an ISR which increments a variable 'head' of an array. The problem is after a few hours this variable even after getting incremented comes back to its previous value. Something like:
array[head] = val;
head++;
/*val is the byte that came from ISR and I am assigning it to my buffer 'array' at head position*/
Now when I run the code for several hours, I observe that if head was say 119, stored the byte from ISR, became 120, and on next Interrupt instead of storing the next byte on 120 and incrementing head to 121, head becomes 120 again and overwrites that byte in my array. What could be the problem? Any suggestions are welcome!
Note:
head is a volatile variable.
Speed of interrupt is very high.
code snippet:
/*before storing on to the circular buffer check whether it is full*/
if ((COM1RxBufHead == COM1RxBufTail - 1) || ((COM1RxBufHead == (COM1RXBUFSIZE - 1)) && (COM1RxBufTail == 0)))
{
    logDEBUG("[FULL]");
    U1STAbits.OERR = 0;
    return;
}
else
{
    /* Byte can be safely stored on to buffer*/
    COM1RxBuf[COM1RxBufHead] = U1RXREG;
    if (COM1RxBufHead == (COM1RXBUFSIZE - 1))
    {
        COM1RxBufHead = 0;
    }
    else
    {
        COM1RxBufHead++;
    }
}

You're working at the wrong abstraction level. Although it's perfectly okay to use C for coding this stuff up, finding problems that seem to be impossible based on the C code itself means that you need to step down one level.
And that means going into the assembler/machine-architecture arena.
The following are general suggestions only since I don't actually know your machine architecture.
Examine the actual assembly language that's generated by your compiler. Seeing the translated code may make it obvious what could be causing this issue, such as the use of cached values (even though you state that you've marked the variable volatile).
Make sure that further interrupts are disabled while you're running the ISR. I don't know off the top of my head any architectures where this isn't the case but they may exist, requiring you to disable and re-enable manually.
Even if interrupts are automatically disabled in the ISR, there are architectures that have priority levels of interrupts, where a higher priority one can interrupt a lower priority ISR in progress. There are also NMIs, non-maskable interrupts, which can interrupt anything (though they tend to be used for more serious things).
Make sure that, if you're modifying the variable outside of the ISR, you disable interrupts while doing it. That's to prevent the possibility of an ISR running halfway through an update, if the update itself isn't an atomic operation interrupt-wise. That's likely to be the case since incrementing a pointer with potential circular buffer wrap-around will almost certainly be a multiple instruction (and hence interruptible) process.
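As a rough illustration of the last point, here is a minimal sketch of a receive buffer where the ISR is the only writer of the head index and the main code is the only writer of the tail index, with the main-side update wrapped in an interrupt-disabled section. All names here (rx_isr_store, rx_get, irq_disable, irq_enable, RX_BUF_SIZE) are hypothetical, and the two irq_* functions are host stand-ins for whatever your target uses to mask interrupts:

```c
#include <stdint.h>

#define RX_BUF_SIZE 8u                 /* hypothetical size */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint16_t rx_head;      /* written only by the ISR */
static volatile uint16_t rx_tail;      /* written only by the main code */

/* Host stand-ins: replace with the target's real interrupt mask control. */
static int irq_masked;
static void irq_disable(void) { irq_masked = 1; }
static void irq_enable(void)  { irq_masked = 0; }

/* Called from the receive ISR with the byte just read from the UART. */
static int rx_isr_store(uint8_t byte)
{
    uint16_t next = (uint16_t)((rx_head + 1u) % RX_BUF_SIZE);
    if (next == rx_tail) {
        return -1;                     /* buffer full: byte is dropped */
    }
    rx_buf[rx_head] = byte;
    rx_head = next;                    /* index advances only after the data is stored */
    return 0;
}

/* Called from the main code; returns the next byte or -1 if empty. */
static int rx_get(void)
{
    int result = -1;
    irq_disable();                     /* the ISR cannot run during the update */
    if (rx_tail != rx_head) {
        result = rx_buf[rx_tail];
        rx_tail = (uint16_t)((rx_tail + 1u) % RX_BUF_SIZE);
    }
    irq_enable();
    return result;
}
```

With one slot kept free to tell "full" from "empty", a buffer of 8 entries holds at most 7 bytes. Whether the critical section is strictly needed depends on whether your target can read and write the index type in a single instruction.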

Related

Thread safe queue implementation (or alternative data structure)

I'm trying to implement a threadsafe queue that will hold data coming in on a UART buffer. The queue is written to as part of the UART RX-complete-ISR. This queue now holds the data that came in on the UART RX channel. The queue also needs to be read by the application using another thread to process the data. But since I'm running all of this on a bare-metal system without any RTOS support, I'm wondering if there is a better data structure to use here. Because when I'm using queues there is one common variable that both the threads need to access and this might cause a race condition.
I realize as I'm writing this that this is the producer-consumer problem and the only way I have solved this in the past is with mutexes. Is there an alternative to that approach?
Edit:
The processor being used is a ST micro cortex-M0 based processor. I looked into some mutex implementations for M0 but couldn't find anything definitive. This is mostly because the M0 processor does not support LDREX or STREX instructions that are usually present in M3 and M4 systems and are used for implementing atomic operations required for mutexes.
As for the system, the code runs straight to main after booting and has NO OS functionality. Even the scheduler was something that was written by me and simply looks at a table that holds function pointers and calls them.
The requirement is that one thread writes into a memory location from the ISR to store data coming in through the UART RX channel and another thread reads from those memory locations to process the data received. So my initial thought was that I would push to a queue from the ISR and read from it using the application thread, but that is looking less and less feasible because of the race condition that comes out of a producer-consumer setup (with the ISR being the producer and the application being the consumer).
Your M0 is a uniprocessor, so you can disable interrupts to get basic mutual exclusion:
int q_put(int c, Q *q) {
    int ps, n, r;
    ps = disable();
    if ((n = q->tail+1) == q->len) {
        n = 0;
    }
    if (n != q->head) {
        q->buf[q->tail] = c;
        q->tail = n;
        r = 0;
    } else {
        r = -1;
    }
    restore(ps);
    return r;
}
int q_get(Q *q) {
    int ps, n, r;
    ps = disable();
    if ((n=q->head) == q->tail) {
        r = -1;
    } else {
        r = q->buf[n] & 0xff;
        q->head = n+1 == q->len ? 0 : n+1;
    }
    restore(ps);
    return r;
}
where disable disables interrupts returning the previous state, and restore sets the interrupt state to its argument.
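The save-and-restore contract matters when critical sections nest. A minimal host-side sketch of that contract follows, where irq_mask is a made-up variable standing in for the CPU's interrupt mask. On a Cortex-M0 with CMSIS, one common mapping (an assumption about your toolchain, not something stated above) would be to read __get_PRIMASK(), call __disable_irq(), and later write the saved value back with __set_PRIMASK():

```c
/* Host-side model of the interrupt mask; on a Cortex-M0 this would be the
   PRIMASK register (1 = interrupts masked, 0 = enabled). */
static volatile int irq_mask;

/* Mask interrupts and return the state that was in effect before the call. */
static int disable(void)
{
    int previous = irq_mask;
    irq_mask = 1;
    return previous;
}

/* Put the mask back to a state previously returned by disable(). */
static void restore(int previous)
{
    irq_mask = previous;
}

/* Example of nesting: the inner section must not re-enable interrupts. */
static int nested_sections_keep_mask(void)
{
    int outer = disable();
    int inner = disable();
    restore(inner);                    /* still masked: inner saw mask == 1 */
    int masked_after_inner = irq_mask;
    restore(outer);                    /* back to the state before the outer call */
    return masked_after_inner;
}
```

Because restore() writes back the saved state rather than blindly enabling, q_put and q_get stay safe even if they are called from code that already runs with interrupts masked.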
If it is bare metal, then you won't have any mutex or higher level concepts, so you need to implement something similar yourself. This is a common scenario however.
The normal datatype to use for this is a ring buffer, which is a kind of queue, implemented over a circular array. You should write that one as a separate module, but include two parameters: interrupt register and bit mask for setting/clearing that register. Then let the ring buffer code temporarily disable the UART RX interrupt during copy from the ring buffer to the caller application. This will protect against race conditions.
Since UART is most of the time relatively slow (typically 115.2 kbps or less), disabling the RX interrupt for a brief moment is harmless, since you only need a couple of microseconds to do the copy. The theory behind this is that the ISR will run once per data byte received, but the caller runs completely asynchronously in relation to the data. So the caller should not be allowed to block the ISR for longer than the time it takes to clock in 2 data bytes; otherwise there will be overrun errors and data loss.
Which in practice means that the caller should only block the ISR for shorter time than it takes to clock in 1 data byte, because we don't know how far the UART has gotten in clocking in the current byte, at the time we disable the RX interrupt. If the interrupt is disabled at the point the byte is clocked in, that should still be fine since it should become a pending interrupt and trigger once you enable the RX interrupt once again. (At least all UART hardware I've ever used works like this, but double-check the behavior of your specific one just to be sure.)
So this all assumes that you can do the copy faster than the time it takes to clock in 1+8+1 new bits on the UART (start bit, 8 data bits, no parity, 1 stop bit). So if you are running at, for example, 115.2 kbps, your code must be faster than 1/115200 * (1+8+1) = 86.8 us. If you are only copying less than a 32-bit word during that time, a Cortex M should have no trouble keeping up, assuming you run a sensible clock speed (8-48 MHz, something like that) and not some low-power clock.
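To put numbers on that budget, here is a small sketch that computes the frame time and the number of CPU cycles that fit into it. The function names are made up for illustration, and the cycle count is only an upper bound since it ignores interrupt latency and flash wait states:

```c
#include <stdint.h>

/* Time to receive one UART frame, in nanoseconds (rounded down).
   bits_per_frame: start + data + parity + stop bits (10 for 8N1). */
static uint32_t uart_frame_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000000u) / baud);
}

/* CPU cycles available within one frame time at the given core clock. */
static uint32_t cycles_per_frame(uint32_t cpu_hz, uint32_t baud,
                                 uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)cpu_hz * bits_per_frame) / baud);
}
```

For 115200 baud and 10 bits per frame this gives 86805 ns, which is roughly 694 cycles at 8 MHz and 4166 cycles at 48 MHz.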
You always need to check for overrun and framing errors. Depending on UART hardware, these might be separate interrupts or the same one as RX. Then handle those errors in whatever way makes sense for the application. If both sender and receiver are configured correctly and you didn't mess up the timing calculations, you shouldn't have any such errors.

Issue with global variable while making 32-bit counter

I am trying to do quadrature decoding using atmel xmega avr microcontroller. Xmega has only 16-bit counters. And in addition I have used up all the available timers.
Now, to make a 32-bit counter, I have used one 16-bit counter, and in its overflow/underflow interrupt I increment/decrement a 16-bit global variable, so that by combining them we get a 32-bit counter.
ISR(timer_16bit)
{
if(quad_enc_mov_forward)
{
timer_over_flow++;
}
else if (quad_enc_mov_backward)
{
timer_over_flow--;
}
}
So far it is working fine. But I need to use this 32-bit value in various tasks running in parallel. I'm trying to read the 32-bit value as below:
uint32_t current_count = timer_over_flow;
current_count = current_count << 16;
current_count = current_count + timer_16bit_count;
timer_16bit_count is a hardware register.
Now the problem I am facing is that after I read timer_over_flow into current_count in the first statement, and by the time I add timer_16bit_count, there may have been an overflow and the 16-bit timer may have become zero. This may result in a totally wrong value.
And I am trying to read this 32-bit value in multiple tasks.
Is there a way to prevent this data corruption and get a working model of the 32-bit value?
Details sought by different members:
My motor can move forward or backward and accordingly counter increments/decrements.
In the case of the ISR, before starting my motor I set the global variables (quad_enc_mov_forward & quad_enc_mov_backward) so that if there is an overflow/underflow, timer_over_flow will be changed accordingly.
Variables that are modified in the ISR are declared as volatile.
Multiple tasks means that I'm using an RTOS kernel with about 6 tasks (mostly 3 tasks running in parallel).
In the XMEGA I'm directly reading TCCO_CNT register for the lower byte.
One solution is:
uint16_t a, b, c;
do {
a = timer_over_flow;
b = timer_16bit_count;
c = timer_over_flow;
} while (a != c);
uint32_t counter = (uint32_t) a << 16 | b;
Per comment from user5329483, this must not be used with interrupts disabled, since the hardware counter fetched into b may be changing while the interrupt service routine (ISR) that modifies timer_over_flow would not run if interrupts are disabled. It is necessary that the ISR interrupt this code if a wrap occurs during it.
This gets the counters and checks whether the high word changed. If it did, this code tries again. When the loop exits, we know the low word did not wrap during the reads. (Unless there is a possibility we read the high word, then the low word wrapped, then we read the low word, then it wrapped the other way, then we read the high word. If that can happen in your system, an alternative is to add a flag that the ISR sets when the high word changes. The reader would clear the flag, read the timer words, and read the flag. If the flag is set, it tries again.)
Note that timer_over_flow, timer_16bit_count, and the flag, if used, must be volatile.
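The flag variant described above could look roughly like the sketch below. It is written so that it can run on a host: timer_16bit_count is a plain variable standing in for the hardware register, and high_word_changed, timer_wrap_isr, inject_between_reads and simulate_wrap_upward are names invented here. The hook exists only to simulate an interrupt arriving between the two reads:

```c
#include <stdint.h>
#include <stddef.h>

static volatile uint16_t timer_over_flow;     /* high word, updated by the ISR */
static volatile uint16_t timer_16bit_count;   /* host stand-in for the hardware counter */
static volatile uint8_t  high_word_changed;   /* hypothetical flag, set by the ISR */

/* Overflow/underflow ISR body: adjust the high word and raise the flag. */
static void timer_wrap_isr(int forward)
{
    if (forward) {
        timer_over_flow++;
    } else {
        timer_over_flow--;
    }
    high_word_changed = 1;
}

/* Host-only hook that lets a test inject a wrap between the two reads. */
static void (*inject_between_reads)(void) = NULL;

static void simulate_wrap_upward(void)
{
    timer_16bit_count = 0;    /* counter rolled over from 0xFFFF to 0 */
    timer_wrap_isr(1);
}

static uint32_t read_counter32(void)
{
    uint16_t high, low;
    do {
        high_word_changed = 0;
        high = timer_over_flow;
        if (inject_between_reads != NULL) {   /* simulation only */
            void (*hook)(void) = inject_between_reads;
            inject_between_reads = NULL;
            hook();
        }
        low = timer_16bit_count;
    } while (high_word_changed);              /* ISR ran: high and low may not match */
    return ((uint32_t)high << 16) | low;
}
```

On the real target the hook disappears and the ISR simply fires on its own. As with the loop above, this only works if interrupts are enabled while the reader runs.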
If the wrap-two-times scenario cannot happen, then you can eliminate the loop:
Read a, b, and c as above.
Compare b to 0x8000.
If b has a high value, either there was no wrap, it was read before a wrap upward (0xffff to 0), or it was read after a wrap downward. Use the lower of a or c.
Otherwise, either there was no wrap, b was read after a wrap upward, or it was read before a wrap downward. Use the larger of a or c.
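A sketch of that loop-free selection, as a pure function so it can be checked off-target. The name combine_reads is made up, and the plain comparison assumes the high word itself does not wrap between 0xFFFF and 0 during the reads:

```c
#include <stdint.h>

/* a and c are the two reads of the high word, b is the hardware counter
   read in between. Assumes at most one wrap happened during the reads and
   that the high word itself did not roll over between 0xFFFF and 0. */
static uint32_t combine_reads(uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t lower  = (a < c) ? a : c;
    uint16_t larger = (a < c) ? c : a;
    /* High b: it belongs to the period before an upward wrap or after a
       downward wrap, so the lower high word matches it. Low b: the reverse. */
    uint16_t high = (b >= 0x8000u) ? lower : larger;
    return ((uint32_t)high << 16) | b;
}
```

For example, reading a = 5, then b = 0xFFFF just before an upward wrap, then c = 6 gives 0x0005FFFF, while a = 5, b = 0x0000 just after the wrap, c = 6 gives 0x00060000.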
The #1 fundamental embedded systems programming FAQ:
Any variable shared between the caller and an ISR, or between different ISRs, must be protected against race conditions. To prevent some compilers from doing incorrect optimizations, such variables should also be declared as volatile.
Those who don't understand the above are not qualified to write code containing ISRs. Or programs containing multiple processes or threads for that matter. Programmers who don't realize the above will always write very subtle, very hard-to-catch bugs.
Some means to protect against race conditions could be one of these:
Temporarily disabling the specific interrupt during access.
Temporarily disabling all maskable interrupts during access (crude way).
Atomic access, verified in the machine code.
A mutex or semaphore. On single-core MCUs where interrupts cannot be interrupted in turn, you can use a bool as "poor man's mutex".
Just reading TCCO_CNT in multithreaded code is a race condition if you do not handle it correctly. Check the section on reading 16-bit registers in the XMega manual. You should read the lower byte first (this will probably be handled transparently by the compiler for you). When the lower byte is read, the higher byte is (atomically) copied into the TEMP register. Then, reading the high byte reads the TEMP register, not the counter. In this way atomic reading of the 16-bit value is ensured, but only if there is no access to the TEMP register between the low and high byte reads.
Note that this TEMP register is shared between all counters, so a context switch at the right (wrong) moment will probably trash its content and therefore your high byte. You need to disable interrupts for this 16-bit read. Because XMega will execute one instruction after the sei with interrupts still disabled, the best way is probably:
cli
ld [low_byte]
sei
ld [high_byte]
It disables interrupts for four CPU cycles (if I counted it correctly).
An alternative would be to save the shared TEMP register(s) on each context switch. It is possible (not sure if likely) that your OS already does this, but be sure to check. Even so, you need to make sure colliding access does not occur from an ISR.
This precaution should be applied to any 16bit register read in your code. Either make sure TEMP register is correctly saved/restored (or not used by multiple threads at all) or disable interrupts when reading/writing 16bit value.
This problem is indeed a very common and very hard one. All solutions to it will have a caveat regarding timing constraints in the lower priority layers. To clarify this: the highest priority function in your system is the hardware counter; its response time defines the maximum frequency that you can eventually sample. The next lower priority in your solution is the interrupt routine, which tries to keep track of bit 2^16, and the lowest is your application-level code, which tries to read the 32-bit value. The question now is whether you can quantify the shortest time between two level changes on the A and B inputs of your encoder. The shortest time usually occurs not at the highest speed at which your real-world axis is rotating, but when halting at a position: through minimal vibrations the encoder can swing back and forth between two increments, thereby producing e.g. a falling and a rising edge on the same encoder output in short succession. Iff (if and only if) you can guarantee that your interrupt processing time is shorter (by a margin) than this minimal time, you can use such a method to virtually extend the coordinate range of your encoder.

Why NOP/few extra lines of code/optimization of pointer aliasing helps? [Fujitsu MB90F543 MCU C code]

I am trying to fix a bug found in a mature program for the Fujitsu MB90F543. The program has worked for nearly 10 years so far, but it was discovered that under some special circumstances it fails to do two things at its very beginning. One of them is crucial.
After low- and high-level initialization (ports, pins, peripherals, IRQ handlers), configuration data is read over SPI from EEPROM and the status LEDs are turned on for a moment (to turn them on, data is sent over SPI to an LED driver).
When those special circumstances occur, the first, and only the first, function that performs just a few EEPROM reads fails, and additionally a few of the LEDs that should turn on don't.
The program is written in C and compiled using Softune v30L32.
Surprisingly, it is sufficient to add a single __asm(" NOP ") in the low-level hardware init to make the program work as expected under the mentioned circumstances. It is also sufficient to turn off 'Control optimization of pointer aliasing' in the Optimization settings. Adding just a few lines of code in various places helps too.
I have compared (DIFFed) ASM listings of compiled program for a version with and without __asm(" NOP ") and with both aforementioned optimizer settings and they all look just fine.
The only warning Softune compiler has been printing for years during compilation is as follows:
*** W1372L: The section is placed outside the RAM area or the I/O area (IOXTND)
I do realize it's a rather general question, but maybe someone who has a bigger picture will be able to point out a possible cause.
Have you got an idea what may cause such a weird behaviour? How to locate the bug and fix it?
During the initialization a few long (about 20 ms) delay loops are used. They don't help, even though they were increased from about 2 ms, yet a single NOP on any line of the hardware initialization function, or even before or after the function, helps.
Both wait loops work. I have checked this with an oscilloscope (I added an LED turn-on before and a turn-off after).
I have checked the timing hypothesis by slowing down the SPI clock from 1 MHz to 500 kHz. It does not change anything. Slowing down to 250 kHz causes watchdog resets, as some parts of the code take too long to execute (>25 ms).
One more thing. I have observed that adding local variables in any source file sometimes makes the problem disappear or reappear. The same goes for initializing uninitialized local variables. Adding a few extra lines of code in any of the files hides or reveals the problem.
void main(void)
{
watchdog_init();
// waiting for power supply to stabilize
wait; // about 45ms
hardware_init();
clear_watchdog();
application_init();
clear_watchdog();
wait; // about 20ms
test_LED();
{...}
}
void hardware_init (void)
{
__asm("NOP"); // how come it helps? - it may be on any line of the function
io_init(); // ports initialization
clk_init();
timer_init();
adc_init();
spi_init();
LED_init();
spi_start();
key_driver_init();
can_init();
irq_init(); // set IRQ priorities and global IRQ enable
}
Could be one of many things but two spring to mind.
Timing.
Maybe the wait is not long enough for power to stabilize and not everything is synced to the clock. The NOP gets everything back in sync.
Alignment.
Perhaps the NOP gets your instructions aligned on a 32 or 64 bit boundary expected by the hardware. (We used to do this a lot on mainframe assemblers, as IO operations often expected things to be on double word boundaries.)
The problem was solved. It was caused by a trivial bug.
The EEPROM's nHOLD and nCS signals were not initialized immediately after the MCU's reset, but only just before the first use of the EEPROM. As a result they were 0, i.e. active.
This means the EEPROM was selected, but waiting on hold. In the meantime, another SPI transfer started. After 6 out of 8 CLK pulses, the EEPROM's nHOLD I/O pin was initialized and brought high. The EEPROM was no longer on hold, so it clocked in the last two bits of data meant for another peripheral. Every subsequent operation on the EEPROM found its CLK and MOSI out of sync.
When I added a NOP or anything else, the moment of the nHOLD 0->1 edge was shifted to happen after the last CLK pulse. Now CLK and MOSI were in sync.
All I had to do was to initialize all the EEPROM's SPI lines, in particular nHOLD and nCS, right after the MCU reset.
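A rough sketch of what such an early pin setup can look like. PORT_DATA and PORT_DIR are host stand-ins invented here, not the MB90F543's actual register names, and the bit assignments are assumptions as well. The point is only the order: write the inactive (high) level to the output latch first, then switch the pins to outputs, so they never drive low in between:

```c
#include <stdint.h>

/* Hypothetical stand-ins for a port data register and its direction register. */
static volatile uint8_t PORT_DATA;
static volatile uint8_t PORT_DIR;        /* 1 = output */

#define EEPROM_NCS_BIT   (1u << 0)       /* assumed pin assignment */
#define EEPROM_NHOLD_BIT (1u << 1)       /* assumed pin assignment */

/* Call this first thing after reset, before any other SPI traffic. */
static void eeprom_pins_init_early(void)
{
    /* Set the latch high first so the pins come up inactive ... */
    PORT_DATA |= (uint8_t)(EEPROM_NCS_BIT | EEPROM_NHOLD_BIT);
    /* ... then switch them to outputs. */
    PORT_DIR  |= (uint8_t)(EEPROM_NCS_BIT | EEPROM_NHOLD_BIT);
}
```

In the program above this would run at the very start of hardware_init(), ahead of spi_init() and LED_init().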

Reducing Clock Cycle Count in While Loop for Granularity

I have a while loop implemented in C for an MSP430 processor that currently looks like this:
register unsigned int sw_loop_count = 0U;
...
while (TACCTL0 & CCIE)
{
++sw_loop_count;
}
...
#pragma vector=TIMERA0_VECTOR
__interrupt void Timer_A(void)
{
// Clear the timer interrupt flag and disable the interrupt.
TACCTL0 &= ~CCIFG;
TACCTL0 &= ~CCIE;
}
I'm using this loop for calibration purposes, the context of which I don't think matters too much for my question. I've calculated that each iteration of the loop, including the check TACCTL0 & CCIE, takes 11 clock cycles. For purposes of granularity, I would really like to get this number as low as possible, and programmatically if possible. I might be being a complete moron, but I can't think of a way of reducing the cycle count for the loop, so any advice would be appreciated. I need the sw_loop_count value, one way or another.
Hmm, after I put a comment I realized that there may be something you can do ;-) In your while() condition you are checking two values. From the looks of it both those values must be defined as volatile so that they are read from memory every single time they are used...
Can you reduce those two into a single one? Have your interrupt handler do the necessary comparison and set a single flag that you will be checking in your loop.
Or you can get really fancy and do it another way around, like that:
// signed and global (or you can pass its address into your interrupt's routine)
volatile signed int sw_loop_count = 0;
Then there's your "measurement" loop:
while(++sw_loop_count) {}
and in your interrupt routine:
if(TACCTL0 & CCIE)
{
real_count = sw_loop_count; // save the value for future use before we destroy it
sw_loop_count = -1; // this will turn into 0 in that while's pre-increment, ending the loop
}
OTOH... introducing the volatile may get so much hit from memory access that it may in fact slow down the while() loop. It really does all depend on your actual architecture (down to what type of memory controller and cache controllers there are) and I still maintain that you may be better off running it through an assembler mode and looking at what the compiler is doing.

Schrödinger bug disappearing when breakpoint is set

I have a strange bug in my code which disappears when I try to debug it.
In my timer interrupt (always running system ticker) I have something like this:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
in my main loop I have
if (some_global_flag)
{
some_global_flag = 0;
do_something_very_important(); // breakpoint 1
}
The body of this condition in the main loop is never executed, even when the conditions in the timer are (I think) fulfilled. The conditions are external (port pins, ADC results, etc).
First I put a breakpoint at the position 1, and it is never triggered.
To check it, I put breakpoint nr. 2 on the line some_global_flag = 1;, and in this case the code works: both breakpoints are triggered when the conditions are true.
Update 1:
To research whether some timing condition is responsible, and the if in the timer is never entered if running without debugging, I added the following in my timer:
if (a && lot && of && conditions)
{
some_global_flag = 1; // breakpoint 2
}
if (some_global_flag)
{
#asm("NOP"); // breakpoint 3
}
The flag is not used anywhere else in the code. It is in RAM, and the RAM is cleared to zero at the beginning.
Now, when all the breakpoints are disabled (or only breakpoint 1 in the main is enabled), the code does not work correctly, the function is not executed. However, if I enable only the breakpoint 3 on the NOP, the code works! The breakpoint is triggered, and after continuing, the function is executed. (It has visible and audible output, so it's obvious if it runs)
Update 2:
The timer interrupt was interruptible, by means of a "SEI" at its beginning. I removed that line, but the behavior is not changed in any noticeable way.
Update 3:
I'm not using any external memory.
As I'm very close to the limit in the flash, I have size optimization in the compiler on maximum.
Can the compiler (CodeVision) be responsible, or did I do something very wrong?
Debuggers can/do change the way the processor runs and code executes so this is not surprising.
Divide and conquer. Start removing things until it works. In parallel with that, start with nothing and add only the timer interrupt and the few lines of code in the main loop, with do_something_very_important() being something simple like blinking an LED or spitting something out the UART. If that doesn't work, you won't get the bigger app to work. If that does work, start adding init code and more conditions in your interrupt, but do not complicate the main loop any more than the few lines described. Increase the interrupt handler conditions by adding more of the code back in until it fails.
When you reach the boundary where you can add one thing and fail, and remove it and not fail, then do some disassembly to see if it is a compiler thing. This might warrant another SO ticket if it is not obvious: "why does my avr interrupt handler break when I add ..."
If you are able to get this down to a small number of lines of code, a dozen or so in main and just the few interrupt lines, post that so others can try it on their own hardware and perhaps figure it out in parallel.
This is probably a typical optimizing / debugging bug. Make sure that some_global_flag is marked as volatile. It may be an int, uint8, uint64, whatever you like...
volatile int some_global_flag;
This way you tell the compiler not to make any assumptions on what the value of some_global_flag will be. You must do this because the compiler/optimizer can't see any call to your interrupt routine, so it assumes some_global_flag is always 0 (the initial state) and never changed.
Sorry misread the part where you already tried it...
You can try to compile the code with avr-gcc and see if you have the same behavior...
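For reference, a minimal host-runnable sketch of the flag pattern from the question with the volatile qualifier in place. timer_isr and main_loop_step are made-up wrappers around the question's two fragments, conditions_met stands in for the real port-pin and ADC checks, and important_calls is only there to make the effect observable:

```c
/* Shared between the timer ISR and the main loop, hence volatile. */
static volatile int some_global_flag;

static int important_calls;              /* counts calls, for illustration only */

static void do_something_very_important(void)
{
    important_calls++;
}

/* Body of the timer interrupt; conditions_met replaces the real checks. */
static void timer_isr(int conditions_met)
{
    if (conditions_met) {
        some_global_flag = 1;
    }
}

/* One pass of the main loop: consume the flag and act on it once. */
static void main_loop_step(void)
{
    if (some_global_flag) {
        some_global_flag = 0;
        do_something_very_important();
    }
}
```

If a small test like this behaves on the target while the full application does not, the compiler is a less likely culprit than the conditions feeding the flag.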
It might seem strange but it finally proved to be caused by strong transients on one of the input lines (which powers the system but its ADC measurement is also used as a condition).
The system can have periodic power fails for a short time, and important temporary data is kept in part of the internal SRAM, which is not cleaned after startup and designed to retain the data (for as much as 10 minutes or more) with the use of a small capacitor while the CPU is in brown-out.
I did not post this in the question because I tested this part of the system and it worked perfectly, so I did not want to throw you off course.
What I found out at the end, is that a new feature was used in an environment which created very strong transients, and one of the conditions in my question depended on a state which depended on one of those variables in the "permanent RAM", and finally using a breakpoint saved me from the effects of that transient.
Finally the problem was solved with adjustments in timing.
Edit: what helped me find the location of the problem was that I logged the values of my most important variables in the "permanent RAM" area and could see that a few of them got corrupted.
I may be wrong here, but if you are using a debugger to attach to the board in question and debug the program on the hardware it was supposed to run on, I think it can change the behavior of the microcontroller when it performs an attach... Other than that and the volatile keyword suggested above, I have no clues.
This is written assuming an ARM processor.
Using a breakpoint (RAM or ROM breakpoint) forces the processor to switch from run mode to debug mode at the breakpoint (either halt mode or monitor mode), and forces it to run at debug speed or to run an abort handler; hence JTAG-based debugging is basically intrusive debugging.
The ETM (Embedded Trace Macrocell), specifically on ARM (or other types of bus instrumentation), is designed to be non-intrusive and can log the instructions and data in real time so that we can inspect what really happened.

Resources