I am currently programming an ATmega32u4. I have implemented serial communication which is implemented using a build in interrupt that executes every time there is a byte received on the Rx pin. The byte on the Rx pin is placed in a one byte buffer which is replaced when another byte is received on the Rx pin. This is a built in library in atmel.
ISR(USART1_RX_vect, ISR_BLOCK)
{
RingBuffer_Insert(&usart_rx_buffer,UDR1);
}
My code executes an interrupt when a byte is received on the Rx pin. When a byte is receives this byte is entered into my ring buffer uart_rx_buffer where it is later decoded.
If an interrupt is being executed and this causes the one byte buffer to be replaced before the UART interrupt can be executed, this byte is lost.
The result of this is that other interrupts cannot take longer than the baud rate to execute otherwise serial bytes are lost.Is there any way to avoid this problem?
One way to solve this problem would be to use the attribute ISR_NOBLOCK in all interrupts that take longer than the baud rate, causing the interrupt enable flag to be activated by the compiler as early as possible within the ISR and allowing the USART1_RX_vect to be executed inside other interrupts. However, "care should be taken to avoid stack overflows, or to avoid infinitely entering the ISR for those cases where the AVR hardware does not clear the respective interrupt flag before entering the ISR".
I've experienced this same problem and so far this was the best solution I could think of. I didn't use it nor tested it, though.
Edit: keep in mind that all other interrupts could also be executed inside interrupts declared with the attribute ISR_NOBLOCK, not just the interrupt you want. So you would basically allow all interrupts to be nested inside all interrupts, except USART1_RX_vect (and those declared with ISR_BLOCK). This is the main problem with this solution (besides the stack overflow problem).
The result of this is that other interrupts cannot take longer than the baud rate to execute otherwise serial bytes are lost. Is there any way to avoid this problem?
All your observations are correct. While allowing nested interrupts like suggested in Nuno's answer could work, it is normally something you would/should want to avoid. Allowing nested interrupts everywhere makes code petty unpredictable.
I would first try to optimize the execution time of the interrupts that are blocking your UART receive ISR. Take a look at the interrupt priorities. If several interrupts are pending, they will be executed according to this priority. This can result in "starvation" of lower level interrupts, if there is "always" a higher level interrupt pending.
What is your baud rate? Even at 115200 bit/s you can execute about 700 instructions (assuming 8MHz) per byte received. ISRs should be as short as possible. If there is one single ISR that is taking long and you can't optimize it for what reason whatsoever, you could consider just allowing nested interrupts in this single ISR (this is only feasible if the execution is not critical).
If you use a high baud rate, consider reducing it. 9600 baud is often enough, but may require asynchronous sending to prevent blocking code.
Related
I have been implementing device driver for the SPI peripheral of the MCU in C language.
I would like to exploit interrupt mechanism for reception and also for transmission.
As far as the reception part I think that I can implement this via exposing
the function SpiRegisterCallback into the SPI driver interface. This function
enables the client register its function which will be invoked as soon as
data byte is received (reception buffer full interrupt is invoked).
As far as the transmission part I would like to use some SpiTransmit function
which will receive pointer to the data bytes to be transmitted and number of bytes
to be transmitted. As far as implementation I am going to define some internal
callback function of the SPI driver. This internal callback will be registered
for transmission buffer empty interrupt. In this callback function the passed data bytes will be gradually placed into the transmission buffer. I am not sure whether this approach
is appropriate. Can anybody give me an advice how to implement SPI peripheral
driver which exploits interrupts for data transmission? Thanks in advance for any
suggestions.
SPI is often very real-time critical, introducing a callback with function pointers means needless overhead code. The actual copying of data from SPI to RAM must be done internally by your driver. That's all the ISR should be doing. Some general guidance can be found here.
So your ISR should be filling up a buffer, then swap pointers to buffers (no slow memcpy!) in a protected way, so that the caller always has one buffer with valid data, and the ISR always has one working buffer to fill up. Let the caller poll a flag rather than to invoke a callback from inside an ISR. I like to use tripple buffering if I can spare the RAM. That is: one buffer for the ISR, one buffer for the caller and one spare that the ISR can swap with without disrupting the caller.
This is all rather intricate to code and most programmers get it wrong. DMA is superior to interrupts here, so you should really be considering DMA instead. This is something you should be considering when picking MCU.
A request for "any suggestions" does not really make this a great question because multiple answers may be acceptable, and few will be comprehensive. It invites comments rather then answers. However I will indulge:
First, this is not by any definition an exploit. To "exploit" implies making use of something for a purpose it was not intended - that is not the correct term in this case, you are not "exploiting" the interrupt mechanism, you are simply using it.
At high clock rates, in some cases the interrupt latency and context switch time involved in processing the interrupts may be less efficient than a simple busy-wait. If the transfers are more than two or three bytes at a time, you should in any case consider using DMA if available - so the interrupt will be the DMA interrupt for a complete transfer rather then a single character. For applications such as SD card interfacing or EEPROM, DMA will have a significant performance impact and free up the CPU to do other useful work concurrently. A driver that uses a busy-wait for single byte/word transfers and DMA for block transfers may be optimal. This is particularly true perhaps if you are using an RTOS and the ISR triggers a task context to process the data - the context switch overhead may be nearly as much or more than a busy-wait for a single byte. If your SPI clock is > 1MHz for example, you will wait 8us for a byte transfer, your ISR and call backs could easily be greater then that, in which case it is not worthwhile.
So my advice here is to only consider interrupts for SPI if you are using a slow clock and can get other useful work done whilst waiting for the interrupt.
A problem with allowing call-backs in interrupts is it allows the callback provider to do things ill-advised or illegal in an interrupt context, and you loose the ability to control the processing time of the interrupt. It is fine perhaps if the callback is intended for use by someone writing a device driver - they should be aware of what they are doing, but this is the device driver.
I have read the ARM documentation and it appears that they say in some places that the Cortex M4 can reorder memory writes, while in other places it indicates that M4 will not.
Specifically I am wondering if the DBM instruction is needed like:
volatile int flag=0;
char buffer[10];
void foo(char c)
{
__ASM volatile ("dbm" : : : "memory");
__disable_irq(); //disable IRQ as we use flag in ISR
buffer[0]=c;
flag=1;
__ASM volatile ("dbm" : : : "memory");
__enable_irq();
}
Uh, it depends on what your flag is, and it also varies from chip to chip.
In case that flag is stored in memory:
DSB is not needed here. An interrupt handler that would access flag would have to load it from memory first. Even if your previous write is still in progress the CPU will make sure that the load following the store will happen in the correct order.
If your flag is stored in peripheral memory:
Now it gets interesting. Lets assume flag is in some hardware peripheral. A write to it may make an interrupt pending or acknowledge an interrupt (aka clear a pending interrupt). Contrary to the memory example above this effect happens without the CPU having to read the flag first. So the automatic ordering of stores and loads won't help you. Also writes to flag may take effect with a surprisingly long delay due to different clock domains between the CPU and the peripheral.
So the following szenario can happen:
you write flag=1 to clear an handled interrupt.
you enable interrupts by calling __enable_irq()
interrupts get enabled, write to flag=1 is still pending.
wheee, an interrupt is pending and the CPU jumps to the interrupt handler.
flag=1 takes effect. You're now in an interrupt handler without anything to do.
Executing a DSB in front of __enable_irq() will prevent this problem because whatever is triggered by flag=1 will be in effect before __enable_irq() executes.
If you think that this case is purely academic: Nope, it's real.
Just think about a real-time clock. These usually runs at 32khz. If you write into it's peripheral space from a CPU running at 64Mhz it can take a whopping 2000 cycles before the write takes effect. Now for real-time clocks the data-sheet usually shows specific sequences that make sure you don't run into this problem.
The same thing can however happen with slow peripherals.
My personal anecdote happened when implementing power-saving late in a project. Everything was working fine. Then we reduced the peripheral clock speed of I²C and SPI peripherals to the lowest possible speed we could get away with. This can save lots of power and extend battery live. What we found out was that suddenly interrupts started to do unexpected things. They seem to fire twice each time wrecking havoc. Putting a DSB at the end of each affected interrupt handler fixed this because - you can guess - the lower clock speed caused us to leave the interrupt handlers before clearing the interrupt source was in effect due to the slow peripheral clock.
This section of the Cortex M4 generic device user guide enumerates the factors which can affect reordering.
the processor can reorder some memory accesses to improve efficiency, providing this does not affect the behavior of the instruction sequence.
the processor has multiple bus interfaces
memory or devices in the memory map have different wait states
some memory accesses are buffered or speculative.
You should also bear in mind that both DSB and ISB are often required (in that order), and that C does not make any guarantees about the ordering (except in-thread volatile accesses).
You will often observe that the short pipeline and instruction sequences can combine in such a way that the race conditions seem unreachable with a specific compiled image, but this isn't something you can rely on. Either the timing conditions might be rare (but possible), or subsequent code changes might change the resulting instruction sequence.
I'm developping a bare-metal project on a STM32L4 and I'm starting from an existing code base.
The ISRs have been implemented the following way:
read interrupt status in the peripheral to know what event(s) provoked the interrupt
do something
clear the flags that have read at the beginning.
Is it the right way to clear the flag ? Shouldn't the flags be cleared at the very beginning of the ISR ? My understanding is that, if the same peripheral event is happening a second time during step 2, it will not provoke a second IRQ so it would be lost. On the other hand if you clear the flag as soon as you can, this second event would pulse the interrupt whose state in the CPU would change to "pending and active": a second IRQ would happen.
PS: From STM32 Processor Programming Manual I read: "STM32 interrupts are both level-sensitive and pulse-sensitive".
Definitely at the beginning (unless you have special reasons in the program logic) as some time is needed the for actual write to the flag clear register to propagate through the buses.
If you decide for some reason to put it at the end of the interrupt you should leave some instructions, place the barrier instruction or read back the register before the interrupt routine return to make sure that the clear operation has propagated across the buses. Otherwise you may have a "phantom" duplicate routine calls.
This is my first post at this forum.
I am developing a MIDI sequencer device based on a STM32F429DISCOVERY board running at stock 180MHz. In order to send midi messages the USART1 is configured for 31250 bauds and the appropriate DMA is configured to transfer a 3 byte array stored in ram to the USART. I was doing tests of even timing of sending of midi messages, by configuring the Timer 4 update interrupt, within the service routine of which I am enabling the memory-to-peripheralUSART1 DMA operation. This gives me a periodic sending of a 3 byte message over the USART1 peripheral.
Everything works great and with correct frequency and correct data, but i have a small issue which i have been researching for few days now and have not been able to correct. To make things clearer, within the timer interrupt routine I set a led on the discovery (RG13) to momentarily blink and connected 1 channel of an oscilloscope to the led pin. The second channel of the oscilloscope is connected to the USART TX pin. Now, when the code is executed, i can see the led pulse on the oscilloscope's CH1, followed by the USART serial data on the CH2. But for some reason the time between the led pulse and the beginning of the serial data transfer fluctuates with every sending of the data. It increments with every sending, going from around 1uS to around 30uS, and then jumps back to 1.
I noticed that if i slightly change the USART baudrate, the time fluctuation between the pulse and the data sending changes in pattern, going faster or slower and with longer or shorter range.
I have tried resetting all the apropriate flags from USART as well as DMA, have tried to disable/enable the timer, played with interrupt priorities, but nothing has worked to get rid of the time fluctuation.
As you can imagine, the stability of this is crucial for a MIDI sequencer hardware application as it bases the timing of the musical events, which must be rock solid.
I have also tried using the USART by itself without DMA, manually sending every byte, basically same results. Interrupt driven USART TX exhibited likewise results.
The only thing which seemed to work to get rid of the time fluctuation of USART TX response is, before every sending operation to deinitialize USART and the DMA modules and reinitialize them again. This seemed to give a stable operation but inserts a long delay between the timer interrupt and the actual sending of the data over the USART, which is unacceptable.
If anyone has any thoughts on this or have done anything similar, I need an advice on where to look at.
Thanks a lot in advance!
Best regards,
Konstantin
Even based on your detailed description, there are various possibilities for errors, so best I can do is guess:
Maybe just one of the TIM setting is just slightly wrong: What about the timer's auto-reload register (TIM4_ARR)?
The period setting must be just one unit lower than the desired transmission period divided by the (possibly prescaled) clock period (see details upcounting/downcounting spec).
Now, if the reload value were just equal to the value instead, the second trigger would be late by one tiny period, the third trigger twice as much and so on (which may look like what you described).
This "ramp of delays" would then rise until the unwanted delay sums up to one UART bit period (which happens to be 32uS for 31250 bauds, quite near to the "around 30uS" you described). The next trigger would then just fit for the neighbouring UART bit cycle (without much delay).
Comparing this hypothesis with your other findings...
Changing the UART baud rate would preserve the fundamental error, but the duration of the irritating delay changes. It can appear to change its sign ("faster or slower"), depending on the beat characteristics between the (actual) TIM period and the UART bit period. => OK
Changing the event processing from DMA to IRQ handler wouldn't change much about the problem but only the "phase" of the initial delay (by the time the CPU needs to execute a different ST library function). => OK
Disabling and re-enabling the UART might have changed the behaviour because the UART clock might re-synchronize newly with the underlying bus clock (APB2 for USART2), so the delay after the TIM trigger would appear constant, and you wouldn't notice fluctuations. => OK
I am programming a microcontroller of the PIC24H family and using xc16 compiler.
I am relaying U1RX-data to U2TX within main(), but when I try that in an ISR it does not work.
I am sending commands to the U1RX and the ISR() is down below. At U2RX, there are databytes coming in constantly and I want to relay 500 of them with the U1TX. The results of this is that U1TX is relaying the first 4 databytes from U2RX but then re-sending the 4th byte over and over again.
When I copy the for loop below into my main() it all works properly. In the ISR(), its like that U2RX's corresponding FIFObuffer is not clearing when read so the buffer overflows and stops reading further incoming data to U2RX. I would really appreciate if someone could show me how to approach the problem here. The variables tmp and command are globally declared.
void __attribute__((__interrupt__, auto_psv, shadow)) _U1RXInterrupt(void)
{
command = U1RXREG;
if(command=='d'){
for(i=0;i<500;i++){
while(U2STAbits.URXDA==0);
tmp=U2RXREG;
while(U1STAbits.UTXBF==1); //
U1TXREG=tmp;
}
}
}
Edit: I added the first line in the ISR().
Trying to draw an answer from the various comments.
If the main() has nothing else to do, and there are no other interrupts, you might be able to "get away with" patching all 500 chars from one UART to another under interrupt, once the first interrupt has ocurred, and perhaps it would be a useful exercise to get that working.
But that's not how you should use an interrupt. If you have other tasks in main(), and equal or lower priority interrupts, the relatively huge time that this interrupt will take (500 chars at 9600 baud = half a second) will make the processor what is known as "interrupt-bound", that is, the other processes are frozen out.
As your project gains complexity, you won't want to restrict main() to this task, and there is no need to for it be involved at all, after setting up the UARTs and IRQs. After that it can calculate π ad infinitum if you want.
I am a bit perplexed as to your sequence of operations. A command 'd' is received from U1 which tells you to patch 500 chars from U2 to U1.
I suggest one way to tackle this (and there are many) seeing as you really want to use interrupts, is to wait until the command is received from U1 - in main(). You then configure, and enable, interrupts for RXD on U2.
Then the job of the ISR will be to receive data from U2 and transmit it thru U1. If both UARTS have the same clock and the same baud rate, there should not be a synchronisation problem, since a UART is typically buffered internally: once it begins to transmit, the TXD register is available to hold another character, so any stagnation in the ISR should be minimal.
I can't write the actual code for you, since it would be supposed to work, but here is some very pseudo code, and I don't have a PIC handy (or wish to research its operational details).
ISR
has been invoked because U2 has a char RXD
you *might* need to check RXD status as a required sequence to clear the interrupt
read the RXD register, which also might clear the interrupt status
if not, specifically clear the interrupt status
while (U1 TXD busy);
write char to U1
if (chars received == 500)
disable U2 RXD interrupt
return from interrupt
ISR's must be kept lean and mean and the code made hyper-efficient if there is any hope of keeping up with the buffer on a UART. Experiment with the BAUD rate just to find the point at which your code can keep up, to help discover the right heuristic and see how far away you are from achieving your goal.
Success could depend on how fast your micro controller is, as well, and how many tasks it is running. If the microcontroller has a built in UART theoretically you should be able to manage keeping the FIFO from overflowing. On the other hand, if you paired up a UART with an insufficiently-powered micro controller, you might not be able to optimize your way out of the problem.
Besides the suggestion to offload the lower-priority work to the main thread and keep the ISR fast (that someone made in the comments), you will want to carefully look at the timing of all of the lines of code and try every trick in the book to get them to run faster. One expensive instruction can ruin your whole day, so get real creative in finding ways to save time.
EDIT: Another thing to consider - look at the assembly language your C compiler creates. A good compiler should let you inline assembly language instructions to allow you to hyper-optimize for your particular case. Generally in an ISR it would just be a small number of instructions that you have to find and implement.
EDIT 2: A PIC 24 series should be fast enough if you code it right and select a fast oscillator or crystal and run the chip at a good clock rate. Also consider the divisor the UART might be using to achieve its rate vs. the PIC clock rate. It is conceivable (to me) that an even division that could be accomplished internally via shifting would be better than one where math was required.