Reducing Clock Cycle Count in While Loop for Granularity


I have a while loop implemented in C for an MSP430 processor that currently looks like this:
register unsigned int sw_loop_count = 0U;
...
while (TACCTL0 & CCIE)
{
    ++sw_loop_count;
}
...
#pragma vector=TIMERA0_VECTOR
__interrupt void Timer_A(void)
{
    // Clear the timer interrupt flag and disable the timer interrupt.
    TACCTL0 &= ~CCIFG;
    TACCTL0 &= ~CCIE;
}
I'm using this loop for calibration purposes; I don't think the context matters much for my question. I've calculated that each iteration of the loop, including the check TACCTL0 & CCIE, takes 11 clock cycles. For the sake of granularity, I would really like to get this number as low as possible, programmatically if possible. I might be being a complete moron, but I can't think of a way of reducing the cycle count for the loop, so any advice would be appreciated. I need the sw_loop_count value, one way or another.

Hmm, after I put a comment I realized that there may be something you can do ;-) In your while() condition you are checking two values. From the looks of it, both those values must be defined as volatile so that they are read from memory every single time they are used...
Can you reduce those two into a single one? Have your interrupt handler do the necessary comparison and set a single flag that you will be checking in your loop.
Or you can get really fancy and do it another way around, like that:
// signed and global (or you can pass its address into your interrupt routine)
volatile signed int sw_loop_count = 0;
Then there's your "measurement" loop:
while(++sw_loop_count) {}
and in your interrupt routine:
if (TACCTL0 & CCIE)
{
    real_count = sw_loop_count; // save the value for future use before we destroy it
    sw_loop_count = -1; // this will turn into 0 in that while's pre-increment, ending the loop
}
OTOH... introducing the volatile may get so much hit from memory access that it may in fact slow down the while() loop. It really does all depend on your actual architecture (down to what type of memory controller and cache controllers there are) and I still maintain that you may be better off running it through an assembler mode and looking at what the compiler is doing.

Related

Interrupts stop executing after vector, SEI and intflags are still set. (AtMega4809)

I have several interrupts in the system, including TC capture, external interrupts and ADC conversion interrupts. There's one vector in particular that, when executed, will prevent ANY vector from executing even after the reti() instruction.
The TCB2 Capture interrupt vector (Vector 25) is the only vector in the system that stops everything leading to a WDT reset.
Global Interrupt Flag is 1, and the INTFLAGS for the activated interrupts get set correctly when an interrupt occurs. But no vector is executed.
This applies for all interrupts.
I've tried changing the vector a little, making sure the reti() instruction is present and making sure the vector table didn't change.
I also already made sure the code kept executing normally after the reti(). The only thing that seems not to work are vectors.
volatile uint8_t new_input_regs = 0x00;
/*[Other code, including other vectors without this issue]*/
void __vector_25() {
    new_input_regs = PORTD_IN; //Get PORTD input values
    TCB2_CTRLA &= ~(0x01); //Clear TC enable and flags
    TCB2_INTFLAGS = 0xFF;
    asm("WDR"); //Reset the WD timer
    event_type = EVENT_INPUT_INTERRUPT; //Set the event
    reti();
}
After this executes, all interrupts stop working until reset (This also causes a Watchdog Reset eventually due to no interrupts executing).
More info about the register and vector table here: https://imgur.com/a/SpLKrgT

Thanks to the comment, I noticed the ISRs were over-optimized: the compiler was even skipping retis and merging vectors into one single block of assembly with no returns. The default vector functions (like __vector_25() used above) do not work as expected, leading to unterminated ISRs and a missing reti() even when it is written manually. This led to completely unexpected behaviour.
I've changed the vector definition to ISR(_VECTOR(25)) and removed reti() at the end of functions. Interrupts have resumed working as they should.
Thanks for the help.

Is there a standard function in C that allows for setting a specific delay in nSec?

In my application I need to write a function in C that provides a specific time delay in nanoseconds. This delay must be done in software as I don't have any hardware timers left in my AVR MCU. My problem is that I would like to be able to set the value in nanoseconds. My MCU clock is 20 MHz (50 ns period). I thought of a quick "for" loop, like:
for (n=0; n<value; n++)
but that won't take into account how many cycles each pass around the loop adds once compiled. Has anyone got any suggestions? I really don't want to write the code in assembler.

You've given us too little information, but I think I can answer without it, though that makes the answer long. Let's start with the easier problem: your action needs to be executed less often than your most frequent ISR runs. For example, you need to send a byte every 1 s but your ISR executes every 1 ms. In short, you need to send a byte every 1000 executions of the ISR, so you keep a counter that is incremented on every ISR and, when it reaches 1000, you send the byte and reset cnt to 0.
ISR()
{
    cnt++;
    if(cnt >= 1000)
    {
        execute(Z);
        cnt = 0;
    }
}
When you have the opposite problem, where the ISR fires less often than you need to execute your actions, I'd advise redesigning your use of timers. You should make this ISR execute faster and then divide the time by counting executed ISRs as I described above. This was mentioned in the comments.
My suggestion is that you rethink the way you use timers.
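To illustrate the "make the ISR faster and divide" idea, here is a minimal sketch in which one fast tick drives two actions at different rates. The divider values, the counters standing in for the real work, and the simulate_ticks helper are all assumptions made for illustration; on a real target timer_tick would be the body of the timer ISR.

```c
#include <stdint.h>

#define FAST_DIV 5u      /* hypothetical: fast action every 5 ticks    */
#define SLOW_DIV 1000u   /* hypothetical: slow action every 1000 ticks */

/* Stand-ins for the real work; each count is one executed action. */
static volatile uint32_t fast_actions;
static volatile uint32_t slow_actions;

/* Body of the timer ISR: call once per timer tick. */
static void timer_tick(void)
{
    static uint16_t fast_cnt, slow_cnt;

    if (++fast_cnt >= FAST_DIV) { fast_cnt = 0; fast_actions++; }
    if (++slow_cnt >= SLOW_DIV) { slow_cnt = 0; slow_actions++; }
}

/* Host-side helper that exercises the ISR body without hardware. */
static void simulate_ticks(uint32_t n)
{
    while (n--) {
        timer_tick();
    }
}
```

The tick period then only has to be a common divisor of the periods you need, and each additional rate costs one counter and one comparison inside the ISR.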

Since you are using an AVR, you should look into using the AVR-Libc delay functions, _delay_us and _delay_ms, which are documented here:
https://www.nongnu.org/avr-libc/user-manual/group__util__delay.html
They are standard in the context of AVRs, but not standard for all C environments in general.
Some example code to get you started:
#define F_CPU 20000000
#include <util/delay.h>
int main() {
    while (1) {
        _delay_us(0.05);
    }
}
Note that even though the _delay_us and _delay_ms functions each take a double as an argument, all floating point arithmetic is done at compile time if possible in order to produce efficient code for your delay.
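To get a feel for the resolution you can expect at the 20 MHz clock from the question, it can help to convert a requested delay into whole CPU cycles first. The sketch below is plain C and not part of AVR-Libc; the helper name ns_to_cycles is made up for illustration. It rounds up so that the delay is never shorter than requested.

```c
#include <stdint.h>

/* Hypothetical helper: number of CPU cycles needed to cover at least `ns`
 * nanoseconds at a clock of `f_cpu_hz`. Rounds up to a whole cycle. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t f_cpu_hz)
{
    /* 64-bit intermediate so ns * f_cpu_hz cannot overflow. */
    uint64_t scaled = (uint64_t)ns * f_cpu_hz;

    return (uint32_t)((scaled + 999999999ULL) / 1000000000ULL);
}
```

At 20 MHz, a request for 120 ns comes out as 3 cycles, i.e. 150 ns: anything that is not a multiple of 50 ns is rounded up to the next cycle, and call overhead comes on top of that. If the cycle count is a compile-time constant, avr-gcc's __builtin_avr_delay_cycles may be worth a look as well.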

Issue with global variable while making 32-bit counter

I am trying to do quadrature decoding using an Atmel XMEGA AVR microcontroller. The XMEGA has only 16-bit counters, and in addition I have used up all the available timers.
To make a 32-bit counter I use one 16-bit counter, and in its overflow/underflow interrupt I increment/decrement a 16-bit global variable, so that combining the two gives a 32-bit counter.
ISR(timer_16bit)
{
    if(quad_enc_mov_forward)
    {
        timer_over_flow++;
    }
    else if (quad_enc_mov_backward)
    {
        timer_over_flow--;
    }
}
So far it is working fine. But I need to use this 32-bit value in various tasks running in parallel. I'm trying to read the 32-bit value as below:
uint32_t current_count = timer_over_flow;
current_count = current_count << 16;
current_count = current_count + timer_16bit_count;
timer_16bit_count is a hardware register.
The problem I am facing is that after I read timer_over_flow into current_count in the first statement, and before I add timer_16bit_count, there may be an overflow and the 16-bit timer may have wrapped to zero. This may result in a totally wrong value.
And I am trying to read this 32-bit value in multiple tasks.
Is there a way to prevent this data corruption and get a working 32-bit value?
Details sought by different members:
My motor can move forward or backward and accordingly counter increments/decrements.
As for the ISR: before starting my motor I set the global variables (quad_enc_mov_forward and quad_enc_mov_backward) so that on an overflow/underflow timer_over_flow changes accordingly.
Variables that are modified in the ISR are declared as volatile.
Multiple tasks means that I'm using RTOS Kernel with about 6 tasks (mostly 3 tasks running parallel).
In the XMEGA I'm directly reading the TCC0_CNT register for the lower word.

One solution is:
uint16_t a, b, c;
do {
    a = timer_over_flow;
    b = timer_16bit_count;
    c = timer_over_flow;
} while (a != c);
uint32_t counter = (uint32_t) a << 16 | b;
Per comment from user5329483, this must not be used with interrupts disabled, since the hardware counter fetched into b may be changing while the interrupt service routine (ISR) that modifies timer_over_flow would not run if interrupts are disabled. It is necessary that the ISR interrupt this code if a wrap occurs during it.
This gets the counters and checks whether the high word changed. If it did, this code tries again. When the loop exits, we know the low word did not wrap during the reads. (Unless there is a possibility we read the high word, then the low word wrapped, then we read the low word, then it wrapped the other way, then we read the high word. If that can happen in your system, an alternative is to add a flag that the ISR sets when the high word changes. The reader would clear the flag, read the timer words, and read the flag. If the flag is set, it tries again.)
Note that timer_over_flow, timer_16bit_count, and the flag, if used, must be volatile.
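The flag idea from the parenthetical above could be sketched as follows. This is a variation rather than the exact scheme described: instead of a flag that the reader clears, the ISR increments a small event counter, so that several tasks can read concurrently without clearing each other's flag. The names wrap_events, on_overflow and read_counter32 are hypothetical, and timer_16bit_count is an ordinary variable here standing in for the hardware register.

```c
#include <stdint.h>

/* Stand-ins for the real objects. On the target, timer_16bit_count would be
 * the hardware count register. */
static volatile uint16_t timer_over_flow;
static volatile uint16_t timer_16bit_count;
static volatile uint8_t  wrap_events;   /* hypothetical: bumped on every wrap */

/* What the overflow/underflow ISR would do. */
static void on_overflow(int forward)
{
    if (forward) {
        timer_over_flow++;
    } else {
        timer_over_flow--;
    }
    wrap_events++;
}

/* Retry until no wrap was recorded while both halves were being read. */
static uint32_t read_counter32(void)
{
    uint8_t  before;
    uint16_t hi, lo;

    do {
        before = wrap_events;
        hi = timer_over_flow;
        lo = timer_16bit_count;
    } while (before != wrap_events);

    return ((uint32_t)hi << 16) | lo;
}
```

Unlike comparing the high word before and after, this also catches a wrap up followed by a wrap back down, because the event counter changes even when timer_over_flow ends up at its old value. As with the loop above, interrupts must stay enabled while it runs.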
If the wrap-two-times scenario cannot happen, then you can eliminate the loop:
Read a, b, and c as above.
Compare b to 0x8000.
If b has a high value, either there was no wrap, it was read before a wrap upward (0xffff to 0), or it was read after a wrap downward. Use the lower of a or c.
Otherwise, either there was no wrap, b was read after a wrap upward, or it was read before a wrap downward. Use the larger of a or c.
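The loop-free selection above can be written as a small pure function. This is only a sketch: the name combine_counter is made up, and "lower" and "larger" are taken in the modular sense so that the choice still works when the high word itself rolls over from 0xFFFF to 0.

```c
#include <stdint.h>

/* a and c are the high word read before and after b (the hardware count).
 * Assumes at most one wrap happened between the reads of a and c. */
static uint32_t combine_counter(uint16_t a, uint16_t b, uint16_t c)
{
    /* Signed distance from a to c: +1 after an upward wrap, -1 after a
     * downward wrap, 0 if nothing wrapped between the two reads. */
    int16_t step = (int16_t)(uint16_t)(c - a);

    uint16_t lower  = (step > 0) ? a : c;
    uint16_t larger = (step > 0) ? c : a;

    /* High b: it belongs to the lower high word. Low b: to the larger one. */
    uint16_t hi = (b >= 0x8000u) ? lower : larger;

    return ((uint32_t)hi << 16) | b;
}
```

For example, with a = 5, b = 0xFFF0 and c = 6, b was read just before the upward wrap, so the result uses 5 as the high word.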

The #1 fundamental embedded systems programming FAQ:
Any variable shared between the caller and an ISR, or between different ISRs, must be protected against race conditions. To prevent some compilers from doing incorrect optimizations, such variables should also be declared as volatile.
Those who don't understand the above are not qualified to write code containing ISRs. Or programs containing multiple processes or threads for that matter. Programmers who don't realize the above will always write very subtle, very hard-to-catch bugs.
Some means to protect against race conditions could be one of these:
Temporarily disabling the specific interrupt during access.
Temporarily disabling all maskable interrupts during access (crude way).
Atomic access, verified in the machine code.
A mutex or semaphore. On single-core MCUs where interrupts cannot be interrupted in turn, you can use a bool as a "poor man's mutex".

Just reading TCC0_CNT in multithreaded code is a race condition if you do not handle it correctly. Check the section on reading 16-bit registers in the XMEGA manual. You should read the lower byte first (the compiler will probably handle this transparently for you). When the lower byte is read, the higher byte is (atomically) copied into the TEMP register. Reading the high byte then reads the TEMP register, not the counter. In this way an atomic read of the 16-bit value is ensured, but only if there is no access to the TEMP register between the low and high byte reads.
Note that this TEMP register is shared between all counters, so a context switch at the right (wrong) moment will probably trash its content and therefore your high byte. You need to disable interrupts for this 16-bit read. Because the XMEGA will execute one instruction after the sei with interrupts still disabled, the best way is probably:
cli
ld [low_byte]
sei
ld [high byte]
It disables interrupts for four CPU cycles (if I counted it correctly).
An alternative would be to save the shared TEMP register(s) on each context switch. It is possible (not sure if likely) that your OS already does this, but be sure to check. Even so, you need to make sure colliding access does not occur from an ISR.
This precaution should be applied to any 16bit register read in your code. Either make sure TEMP register is correctly saved/restored (or not used by multiple threads at all) or disable interrupts when reading/writing 16bit value.

This problem is indeed a very common and very hard one. All solutions to it will have a caveat regarding timing constraints in the lower-priority layers. To clarify: the highest-priority function in your system is the hardware counter; its response time defines the maximum frequency that you can eventually sample. The next lower priority in your solution is the interrupt routine, which tries to keep track of bit 2^16, and the lowest is your application-level code, which tries to read the 32-bit value. The question now is whether you can quantify the shortest time between two level changes on the A and B inputs of your encoder. The shortest time usually occurs not at the highest speed at which your real-world axis rotates but when halting at a position: through minimal vibrations the encoder can swing back and forth between two increments, producing e.g. a falling and a rising edge on the same encoder output in short succession. Iff (if and only if) you can guarantee that your interrupt processing time is shorter (by a margin) than this minimal time, you can use such a method to virtually extend the coordinate range of your encoder.

Variable is not incremented in ISR

I have an ISR which increments a variable 'head' of an array. The problem is after a few hours this variable even after getting incremented comes back to its previous value. Something like:
array[head] = val;
head++;
/*val is the byte that came from ISR and I am assigning it to my buffer 'array' at head position*/
Now when I run the code for several hours, I observe that if head was say 119, stored the byte from ISR, became 120, and on next Interrupt instead of storing the next byte on 120 and incrementing head to 121, head becomes 120 again and overwrites that byte in my array. What could be the problem? Any suggestions are welcome!
Note:
head is a volatile variable.
Speed of interrupt is very high.
code snippet:
/*before storing on to the circular buffer check whether it is full*/
if ((COM1RxBufHead == COM1RxBufTail - 1) || ((COM1RxBufHead == (COM1RXBUFSIZE - 1)) && (COM1RxBufTail == 0)))
{
    logDEBUG("[FULL]");
    U1STAbits.OERR = 0;
    return;
}
else
{
    /* Byte can be safely stored on to buffer*/
    COM1RxBuf[COM1RxBufHead] = U1RXREG;
    if (COM1RxBufHead == (COM1RXBUFSIZE - 1))
    {
        COM1RxBufHead = 0;
    }
    else
    {
        COM1RxBufHead++;
    }
}

You're working at the wrong abstraction level. Although it's perfectly okay to use C for coding this stuff up, finding problems that seem to be impossible based on the C code itself means that you need to step down one level.
And that means going into the assembler/machine-architecture arena.
The following are general suggestions only since I don't actually know your machine architecture.
Examine the actual assembly language that's generated by your compiler. Seeing the translated code may make it obvious what could be causing this issue, such as the use of cached values (even though you state that you've marked the variable volatile).
Make sure that further interrupts are disabled while you're running the ISR. I don't know off the top of my head any architectures where this isn't the case but they may exist, requiring you to disable and re-enable manually.
Even if interrupts are automatically disabled in the ISR, there are architectures that have priority levels of interrupts, where a higher priority one can interrupt a lower priority ISR in progress. There are also NMIs, non-maskable interrupts, which can interrupt anything (though they tend to be used for more serious things).
Make sure that, if you're modifying the variable outside of the ISR, you disable interrupts while doing it. That's to prevent the possibility of an ISR running halfway through an update, if the update itself isn't an atomic operation interrupt-wise. That's likely to be the case since incrementing a pointer with potential circular buffer wrap-around will almost certainly be a multiple instruction (and hence interruptible) process.
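One common way to sidestep the last point is to arrange the buffer so that each index has exactly one writer: the ISR is the only code that writes the head, and the main code is the only code that writes the tail. The sketch below is a generic illustration of that layout, not the asker's code; the names and the buffer size are hypothetical, and it assumes that reading or writing a single-byte index is atomic on the target.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 8u   /* hypothetical size; one slot always stays empty */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;   /* written only by the ISR       */
static volatile uint8_t rx_tail;   /* written only by the main code */

/* Called from the receive ISR. Returns false if the buffer is full. */
static bool rx_put(uint8_t byte)
{
    uint8_t head = rx_head;
    uint8_t next = (uint8_t)((head + 1u) % RX_BUF_SIZE);

    if (next == rx_tail) {
        return false;
    }
    rx_buf[head] = byte;
    rx_head = next;          /* publish only after the byte is stored */
    return true;
}

/* Called from the main code. Returns false if the buffer is empty. */
static bool rx_get(uint8_t *out)
{
    uint8_t tail = rx_tail;

    if (tail == rx_head) {
        return false;
    }
    *out = rx_buf[tail];
    rx_tail = (uint8_t)((tail + 1u) % RX_BUF_SIZE);
    return true;
}
```

With this split, the main code never modifies the head, so there is no multi-instruction update of it for the ISR to interrupt. If anything outside the ISR does write the head (for example a "flush" routine), that write still needs interrupts disabled around it.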

Temporarily disable timer interrupt

I'm working on an embedded project in C on a stm32f4xx uC.
I have a portion of code that performs a loop operation XYZ continuously, and from time to time a TIM4 interrupt changes some global parameters and causes the operation XYZ to restart.
The code is something like this:
for (;;) {
    //line A
    XYZ;
    //line B
}
XYZ is a complex operation involving transfers of data between buffers, among other things.
The TIM4 interrupt handler does this: stops XYZ and changes some globals that affect XYZ operations.
So basically I want XYZ to execute repeatedly and TIM4 interrupt to stop XYZ, change the parameters and then the loop must restart by restarting XYZ with the new global parameters.
PROBLEM IS: Since XYZ has many instructions, TIM4 IRQ may come right in the middle of it and, after the IRQHandler changes the globals, the operations resume from the middle of XYZ which ruins the program.
MY INITIAL SOLUTION: Disable interrupts on line A with __disable_irq() and restore them on line B with __enable_irq()
Fails because the XYZ complex operation must use other interrupts (other than TIM4).
NEXT SOLUTION: Disable only the TIM4 interrupt on line A with:
TIM_ITConfig(TIM4, TIM_IT_Update , DISABLE)
and enable it back on line B with:
TIM_ITConfig(TIM4, TIM_IT_Update , ENABLE)
Fails because I am losing the interrupt: when the int is restored, the interrupt that arrived during XYZ is ignored. This is a big problem (one of the reasons is that TIM4 IRQHandler changes the globals and then activates the TIM4 again to give an interrupt later, I do this because the period between interrupts varies).
Can anyone give me a solution to this problem? Is there a better way to disable/restore TIM4 IRQ and NOT lose any interrupt?

You could operate on a copy of the global variables and swap in the new value from the interrupt once you're done with XYZ.
It's not clear from the question whether you need to stop processing of XYZ immediately when the globals change, or if you can wait till XYZ finishes processing to swap in new copies of the variables. I'll operate under the assumption that you need to break out of processing XYZ but it's easy enough to not.
Code would look something like this:
volatile int x;
int x_prime;
int main(void)
{
    while(1)
    {
        //copy in new global parameter value
        x_prime = x;
        while(1)
        {
            //do stuff with x_prime
            if (x_prime != x)
            {
                break;
            }
            //do more stuff with x_prime
            if (x_prime != x)
            {
                break;
            }
        }
    }
}
// interrupt handler
void TIM_IT_Update(void)
{
    x++;
}
The break patterns assume that you're not changing x_prime. If you need to modify x_prime, you'll need another copy.
Since the interrupt is never disabled, you never have to worry about losing any of them. And since you're operating on a copy of the parameters changed by the interrupt, it doesn't matter if the interrupt changes the parameters in the middle of execution because you're not looking at those values until you make copies.

There's a few options potentially available (I'm not 100% on ARM architecture):
Alter the interrupt priority/level mask register to only mask off TIM4, leaving other interrupts to happen. Hopefully, if TIM4 is fired whilst masked, on restoring the level mask it will remember & fire the ISR.
Mask off interrupts and manually check for the TIM4 interrupt flag being set during XYZ
Break XYZ into smaller sections and only mask off TIM4 when absolutely necessary.
Operate on a copy of the data, optionally checking the TIM4 interrupt flag to decide whether to continue/keep the result or to discard & restart.
Check the time & avoid starting XYZ if TIM4 is likely to fire soon, or only run XYZ N times after TIM4 fires.

When I find myself in a similar situation, where processing may take longer time than the interruption period, I use a FIFO to detach the processing from the incoming data. I.E: TIM4 fills a FIFO. XYZ (or a manager) consume the FIFO and process the data.
(I feel that your design may be wrong, since you shouldn't be using globals to control the data or process flow.)
Book reference for study on the matter: Making Embedded Systems: Design Patterns for Great Software

Before XYZ make a copy of everything from the buffer and work with copies. I believe it's the best way, helped during writing a gps parser.

Have you considered using an RTOS for your system? I realize that would require some restructuring, but it may provide you the task handling flexibility and resolution you need for your system. If you're using STM32CubeIDE, enabling, configuring, and getting up and running with an RTOS is fairly straightforward.
