static void RadioReleaseSPI(void) {
__disable_interrupt();
spiTxRxByteCount &= ~0x0100;
__enable_interrupt();
}
I understand that multiple tasks may attempt to use the SPI resource. spiTxRxByteCount is a global variable used to keep track of whether the SPI is currently in use by another task. When a task requires the SPI it can check the status of spiTxRxByteCount to see if the SPI is being used. When a task is done using the SPI it calls this function and clears the bit, to indicate that the SPI is now free. But why disable the interrupts first and then re-enable them after? Just paranoia?
The &= will do a read-modify-write operation - it's not atomic. You don't want an interrupt changing things in the middle of that, resulting in the write over-writing with an incorrect value.
You need to disable interrupts to ensure atomic access. You don't want any other process to access and potentially modify that variable while you're reading it.
From Introduction to Embedded Computing:
The Need for Atomic Access
Imagine this scenario: foreground program, running on an 8-bit uC,
needs to examine a 16-bit variable, call it X. So it loads the high
byte and then loads the low byte (or the other way around, the order
doesn’t matter), and then examines the 16-bit value. Now imagine an
interrupt with an associated ISR that modifies that 16-bit variable.
Further imagine that the value of the variable happens to be 0x1234 at
a given time in the program execution. Here is the Very Bad Thing
that can happen:
foreground loads high byte (0x12)
ISR occurs, modifies X to 0xABCD
foreground loads low byte (0xCD)
foreground program sees a 16-bit value of 0x12CD.
The problem is that a supposedly indivisible piece of data, our
variable X, was actually modified in the process of accessing it,
because the CPU instructions to access the variable were divisible.
And thus our load of variable X has been corrupted. You can see that
the order of the variable read does not matter. If the order were
reversed in our example, the variable would have been incorrectly read
as 0xAB34 instead of 0x12CD. Either way, the value read is neither
the old valid value (0x1234) nor the new valid value (0xABCD).
Writing ISR-referenced data is no better. This time assume that the
foreground program has written, for the benefit of the ISR, the
previous value 0x1234, and then needs to write a new value 0xABCD. In
this case, the VBT is as follows:
foreground stores new high byte (0xAB)
ISR occurs, reads X as 0xAB34
foreground stores new low byte (0xCD)
Once again the code (this time the ISR) sees neither the previous
valid value of 0x1234, nor the new valid value of 0xABCD, but rather
the invalid value of 0xAB34.
While spiTxRxByteCount &= ~0x0100; may look like a single instruction in C, it is actually several instructions to the CPU. Compiled in GCC, the assembly listing looks like so:
57:atomic.c **** spiTxRxByteCount &= ~0x0100;
68 .loc 1 57 0
69 004d A1000000 movl _spiTxRxByteCount, %eax
69 00
70 0052 80E4FE andb $254, %ah
71 0055 A3000000 movl %eax, _spiTxRxByteCount
71 00
If an interrupt comes in in-between any of those instructions and modifies the data, your first ISR can potentially read the wrong value. So you need to disable interrupts before you operate on it and also declare the variable volatile.
There are two reasons for why you should be disabling interrupts:
The &= is a read-modify-write operation which is in nature not atomic. It consists of a read, a bitwise-and, and a write. You don't want this operation to be interrupted by an ISR (interrupt service route). The ISR could modify spiTxRxByteCount after the read and before the write. The write would then be based on an outdated value and you would lose information.
__disable_interrupt() and __enable_interrupt() serve as software barriers. Even if optimization is enabled, the compiler must not move the read or the write across the two barriers. Also, the compiler must not cache the value of spiTxRxByteCount across the two barriers. If there were no barriers, the compiler would be allowed to hold a copy of spiTxRxByteCount in some CPU register even across multiple invocations of RadioReleaseSPI(). This would typically happen if inlining is enabled and RadioReleaseSPI() is called repeatedly.
That disabling and enabling interrupts serves as barriers is at least as important as avoiding the interruption by an ISR, IMHO. But it seems to be overlooked, sometimes.
Related
When writing to device registers on a Cortex M0 (in my case, on an STM32L073), a question arises as to how careful one should be in a) ordering accesses to device memory and b) deciding that a change to a peripheral configuration has actually completed to the point that any dependencies become valid.
Taking a specific example to change the internal voltage regulator to a different voltage. You write the change to PWR->CR and read the status from PWR->CSR. I see code that does something like this:
Write to PWR->CR to set the voltage range
Spin until (PWR->CSR & voltage flag) becomes zero
In my mind there are three issues here:
Access ordering. This is Device Memory so transaction order is preserved relative to other Device access transactions. I would assume this means a DSB is not required between the write to CR and the read from CSR. A linked question and the answer to this is: [ARM CortexA]Difference between Strongly-ordered and Device Memory Type
Device memory can be buffered. Is there a possibility that a write to CR could still be in process when the read from CSR occurs. This would mean that the voltage flag would be clear and the code would proceed. In actual fact the flag hasn't gone high yet!
Hardware response time. Is there a latency between the write and the effects becoming final? In actuality this should always be documented - for the STM32 the docs definitively say that the flag is set when the CR register changes.
Are there any race condition possibilities here? It's really the buffering that worries me - that a peripheral write is still in progress when a peripheral read takes place.
Access ordering.
Accesses are strongly ordered and you do not need barrier instructions to read back the same register.
Device memory can be buffered. Is there a possibility that a write to CR
Yes, it is possible. But it is not because of buffering but because of the bus propagation time. It may take several clocks before a particular operation will go through all bridges.
Hardware response time. Is there a latency between the write and the
effects becoming final
Even if there is a latency it is not important from your point of view. If you set bit in the CR register and wait for the result in the status register. Simply wait for the status bit to have the expected value.
I have read the ARM documentation and it appears that they say in some places that the Cortex M4 can reorder memory writes, while in other places it indicates that M4 will not.
Specifically I am wondering if the DBM instruction is needed like:
volatile int flag=0;
char buffer[10];
void foo(char c)
{
__ASM volatile ("dbm" : : : "memory");
__disable_irq(); //disable IRQ as we use flag in ISR
buffer[0]=c;
flag=1;
__ASM volatile ("dbm" : : : "memory");
__enable_irq();
}
Uh, it depends on what your flag is, and it also varies from chip to chip.
In case that flag is stored in memory:
DSB is not needed here. An interrupt handler that would access flag would have to load it from memory first. Even if your previous write is still in progress the CPU will make sure that the load following the store will happen in the correct order.
If your flag is stored in peripheral memory:
Now it gets interesting. Lets assume flag is in some hardware peripheral. A write to it may make an interrupt pending or acknowledge an interrupt (aka clear a pending interrupt). Contrary to the memory example above this effect happens without the CPU having to read the flag first. So the automatic ordering of stores and loads won't help you. Also writes to flag may take effect with a surprisingly long delay due to different clock domains between the CPU and the peripheral.
So the following szenario can happen:
you write flag=1 to clear an handled interrupt.
you enable interrupts by calling __enable_irq()
interrupts get enabled, write to flag=1 is still pending.
wheee, an interrupt is pending and the CPU jumps to the interrupt handler.
flag=1 takes effect. You're now in an interrupt handler without anything to do.
Executing a DSB in front of __enable_irq() will prevent this problem because whatever is triggered by flag=1 will be in effect before __enable_irq() executes.
If you think that this case is purely academic: Nope, it's real.
Just think about a real-time clock. These usually runs at 32khz. If you write into it's peripheral space from a CPU running at 64Mhz it can take a whopping 2000 cycles before the write takes effect. Now for real-time clocks the data-sheet usually shows specific sequences that make sure you don't run into this problem.
The same thing can however happen with slow peripherals.
My personal anecdote happened when implementing power-saving late in a project. Everything was working fine. Then we reduced the peripheral clock speed of I²C and SPI peripherals to the lowest possible speed we could get away with. This can save lots of power and extend battery live. What we found out was that suddenly interrupts started to do unexpected things. They seem to fire twice each time wrecking havoc. Putting a DSB at the end of each affected interrupt handler fixed this because - you can guess - the lower clock speed caused us to leave the interrupt handlers before clearing the interrupt source was in effect due to the slow peripheral clock.
This section of the Cortex M4 generic device user guide enumerates the factors which can affect reordering.
the processor can reorder some memory accesses to improve efficiency, providing this does not affect the behavior of the instruction sequence.
the processor has multiple bus interfaces
memory or devices in the memory map have different wait states
some memory accesses are buffered or speculative.
You should also bear in mind that both DSB and ISB are often required (in that order), and that C does not make any guarantees about the ordering (except in-thread volatile accesses).
You will often observe that the short pipeline and instruction sequences can combine in such a way that the race conditions seem unreachable with a specific compiled image, but this isn't something you can rely on. Either the timing conditions might be rare (but possible), or subsequent code changes might change the resulting instruction sequence.
I am trying to do quadrature decoding using atmel xmega avr microcontroller. Xmega has only 16-bit counters. And in addition I have used up all the available timers.
Now to make 32-bit counter I have used one 16-bit counter and in its over/under flow interrupt I have increment/decrement a 16-bit global variable, so that by combining them we can make 32-bit counter.
ISR(timer_16bit)
{
if(quad_enc_mov_forward)
{
timer_over_flow++;
}
else if (quad_enc_mov_backward)
{
timer_over_flow--;
}
}
so far it is working fine. But I need to use this 32-bit value in various tasks running parallel. I'm trying to read 32-bit values as below
uint32_t current_count = timer_over_flow;
current_count = current_count << 16;
current_count = current_count + timer_16bit_count;
`timer_16_bit_count` is a hardware register.
Now the problem I am facing is when I read the read timer_over_flow to current_count in the first statement and by the time I add the timer_16bit_count there may be overflow and the 16bit timer may have become zero. This may result in taking total wrong value.
And I am trying to read this 32-bit value in multiple tasks .
Is there a way to prevent this data corruption and get the working model of 32-bit value.
Details sought by different members:
My motor can move forward or backward and accordingly counter increments/decrements.
In case of ISR, before starting my motor I'm making the global variables(quad_enc_mov_forward & quad_enc_mov_backward) set so that if there is a overflow/underflow timer_over_flow will get changed accordingly.
Variables that are modified in the ISR are declared as volatile.
Multiple tasks means that I'm using RTOS Kernel with about 6 tasks (mostly 3 tasks running parallel).
In the XMEGA I'm directly reading TCCO_CNT register for the lower byte.
One solution is:
uint16_t a, b, c;
do {
a = timer_over_flow;
b = timer_16bit_count;
c = timer_over_flow;
} while (a != c);
uint32_t counter = (uint32_t) a << 16 | b;
Per comment from user5329483, this must not be used with interrupts disabled, since the hardware counter fetched into b may be changing while the interrupt service routine (ISR) that modifies timer_over_flow would not run if interrupts are disabled. It is necessary that the ISR interrupt this code if a wrap occurs during it.
This gets the counters and checks whether the high word changed. If it did, this code tries again. When the loop exits, we know the low word did not wrap during the reads. (Unless there is a possibility we read the high word, then the low word wrapped, then we read the low word, then it wrapped the other way, then we read the high word. If that can happen in your system, an alternative is to add a flag that the ISR sets when the high word changes. The reader would clear the flag, read the timer words, and read the flag. If the flag is set, it tries again.)
Note that timer_over_flow, timer_16bit_count, and the flag, if used, must be volatile.
If the wrap-two-times scenario cannot happen, then you can eliminate the loop:
Read a, b, and c as above.
Compare b to 0x8000.
If b has a high value, either there was no wrap, it was read before a wrap upward (0xffff to 0), or it was read after a wrap downward. Use the lower of a or c.
Otherwise, either there was no wrap, b was read after a wrap upward, or it was read before a wrap downward. Use the larger of a or c.
The #1 fundamental embedded systems programming FAQ:
Any variable shared between the caller and an ISR, or between different ISRs, must be protected against race conditions. To prevent some compilers from doing incorrect optimizations, such variables should also be declared as volatile.
Those who don't understand the above are not qualified to write code containing ISRs. Or programs containing multiple processes or threads for that matter. Programmers who don't realize the above will always write very subtle, very hard-to-catch bugs.
Some means to protect against race conditions could be one of these:
Temporary disabling the specific interrupt during access.
Temporary disabling all maskable interrupts during access (crude way).
Atomic access, verified in the machine code.
A mutex or semaphore. On single-core MCU:s where interrupts cannot be interrupted in turn, you can use a bool as "poor man's mutex".
Just reading TCCO_CNT in multithreaded code is race condition if you do not handle it correctly. Check the section on reading 16bit registers in XMega manual. You should read lower byte first (this will be probably handled transparently by compiler for you). When lower byte is read, higher byte is (atomically) copied into the TEMP register. Then, reading high byte does read the TEMP register, not the counter. In this way atomic reading of 16bit value is ensured, but only if there is no access to TEMP register between low and high byte read.
Note that this TEMP register is shared between all counters, so context switch in right (wrong) moment will probably trash its content and therefore your high byte. You need to disable interrupts for this 16bit read. Because XMega will execute one instruction after the sei with interrupts disabled, the best way is probably:
cli
ld [low_byte]
sei
ld [high byte]
It disables interrupts for four CPU cycles (if I counted it correctly).
An alternative would to save shared TEMP register(s) on each context switch. It is possible (not sure if likely) that your OS already does this, but be sure to check. Even so, you need to make sure colliding access does not occur from an ISR.
This precaution should be applied to any 16bit register read in your code. Either make sure TEMP register is correctly saved/restored (or not used by multiple threads at all) or disable interrupts when reading/writing 16bit value.
This problem is indeed a very common and very hard one. All solutions will toit will have a caveat regarding timing constraints in the lower priority layers. To clarify this: the highest priority function in your system is the hardware counter - it's response time defines the maximum frequency that you can eventually sample. The next lower priority in your solution is the interrupt routine which tries to keep track of bit 2^16 and the lowest is your application level code which tries to read the 32-bit value. The question now is, if you can quantify the shortest time between two level changes on the A- and B- inputs of your encoder. The shortest time usually does occur not at the highest speed that your real world axis is rotating but when halting at a position: through minimal vibrations the encoder can double swing between two increments, thereby producing e.g. a falling and a rising edge on the same encoder output in short succession. Iff (if and only if) you can guarantee that your interrupt processing time is shorter (by a margin) than this minmal time you can use such a method to virtually extend the coordinate range of your encoder.
I am trying to fix an bug found in a mature program for Fujitsu MB90F543. The program works for nearly 10 years so far, but it was discovered, that under some special circumstances it fails to do two things at it's very beginning. One of them is crucial.
After low and high level initialization (ports, pins, peripherials, IRQ handlers) configuration data is read over SPI from EEPROM and status LEDs are turned on for a moment (to turn them a data is send over SPI to a LED driver).
When those special circumstances occur first and only first function invoking just a few EEPROM reads fails and additionally a few of the LEDs that should, don't turn on.
The program is written in C and compiled using Softune v30L32.
Surprisingly it is sufficient to add single __asm(" NOP ") in low level hardware init to make the program work as expected under mentioned circumstances. It is sufficient to turn off 'Control optimization of pointer aliasing' in Optimization settings. Adding just a few lines of code in various places helps too.
I have compared (DIFFed) ASM listings of compiled program for a version with and without __asm(" NOP ") and with both aforementioned optimizer settings and they all look just fine.
The only warning Softune compiler has been printing for years during compilation is as follows:
*** W1372L: The section is placed outside the RAM area or the I/O area (IOXTND)
I do realize it's rather general question, but maybe someone who has a bigger picture will be able to point out possible cause.
Have you got an idea what may cause such a weird behaviour? How to locate the bug and fix it?
During the initialization a few long (about 20ms) delay loops are used. They don't help although they were increased from about 2ms, yet single NOP in any line of the hardware initialization function and even before or after the function helps.
Both the wait loops works. I have checked it using an oscilloscope. (I have added LED turn on before and off after).
I have checked timming hypothesis by slowing down SPI clock from 1MHz to 500kHz. It does not change anything. Slowing down to 250kHz makes watchdog resets, as some parts of the code execute too long (>25ms).
One more thing. I have observed that adding local variables in any source file sometimes makes the problem disappear or reappear. The same concerns initializing uninitialized local variables. Adding a few extra lines of a code in any of the files helps or reveals the problem.
void main(void)
{
watchdog_init();
// waiting for power supply to stabilize
wait; // about 45ms
hardware_init();
clear_watchdog();
application_init();
clear_watchdog();
wait; // about 20ms
test_LED();
{...}
}
void hardware_init (void)
{
__asm("NOP"); // how it comes it helps? - it may be in any line of the function
io_init(); // ports initialization
clk_init();
timer_init();
adc_init();
spi_init();
LED_init();
spi_start();
key_driver_init();
can_init();
irq_init(); // set IRQ priorities and global IRQ enable
}
Could be one of many things but two spring to mind.
Timing.
Maybe the wait is not long enough for power to stabilize and not everything is synced to the clock. The NOP gets everything back in sync.
Alignment.
Perhaps the NOP gets your instructions aligned on a 32 or 64 bit boundary expected by the hardware. (we used to do this a lot on mainframe assemblers as IO operations often expected things to be on double word boundarys).
The problem was solved. It was caused by a trivial bug.
EEPROM's nHOLD and nCS signals were not initialized immediately after MCU's reset, but before the first use of the EEPROM. As a result they were 0's, so active.
This means EEPROM was selected, but waiting on hold. Meantime other transfer using SPI started. After 6 out of 8 CLK pulses EEPROM's nHOLD I/O pin was initialized and brought high. EEPROM was no longer on hold so it clocked in last two bits of a data for an other peripheral. Every subsequent operation on the EEPROM found it being having not synchronized CLK and MOSI.
When I have added NOP or anything other the moment of nHOLD 0->1 edge was shifted to happen after the last CLK pulse. Now CLK-MOSI were in sync.
All I have had to do was to initialize all the EEPROM's SPI lines, in
particular nHOLD and nCS right after the MCU reset.
Can someone please explain to me what happens inside an interrupt service routine (although it depends upon specific routine, a general explanation is enough)? This always used be a black box for me.
There is a good wikipedia page on interrupt handlers.
"An interrupt handler, also known as an interrupt service routine (ISR), is a callback subroutine in an operating system or device driver whose execution is triggered by the reception of an interrupt. Interrupt handlers have a multitude of functions, which vary based on the reason the interrupt was generated and the speed at which the Interrupt Handler completes its task."
Basically when a piece of hardware (a hardware interrupt) or some OS task (software interrupt) needs to run it triggers an interrupt. If these interrupts aren't masked (ignored) the OS will stop what it's doing and call some special code to handle this new event.
One good example is reading from a hard drive. The drive is slow and you don't want your OS to wait for the data to come back; you want the OS to go and do other things. So you set up the system so that when the disk has the data requested, it raises an interrupt. In the interrupt service routine for the disk the CPU will take the data that is now ready and will return it to the requester.
ISRs often need to happen quickly as the hardware can have a limited buffer, which will be overwritten by new data if the older data is not pulled off quickly enough.
It's also important to have your ISR complete quickly as while the CPU is servicing one ISR other interrupts will be masked, which means if the CPU can't get to them quickly enough data can be lost.
Minimal 16-bit example
The best way to understand is to make some minimal examples yourself.
First learn how to create a minimal bootloader OS and run it on QEMU and real hardware as I've explained here: https://stackoverflow.com/a/32483545/895245
Now you can run in 16-bit real mode:
movw $handler0, 0x00
mov %cs, 0x02
movw $handler1, 0x04
mov %cs, 0x06
int $0
int $1
hlt
handler0:
/* Do 0. */
iret
handler1:
/* Do 1. */
iret
This would do in order:
Do 0.
Do 1.
hlt: stop executing
Note how the processor looks for the first handler at address 0, and the second one at 4: that is a table of handlers called the IVT, and each entry has 4 bytes.
Minimal example that does some IO to make handlers visible.
Protected mode
Modern operating systems run in the so called protected mode.
The handling has more options in this mode, so it is more complex, but the spirit is the same.
Minimal example
See also
Related question: What does "int 0x80" mean in assembly code?
While the 8086 is executing a program an interrupt breaks the normal sequence of execution of instruction, divert its execution to some other program called interrupt service Routine (ISR). after executing, control return the back again to the main program.
An interrupt is used to cause a temporary halt in the execution of program. Microprocessor responds to the interrupt service routine, which is short program or subroutine that instruct the microprocessor on how to handle the interrupt.