How to exploit interrupts for data transfer over SPI peripheral

I have been implementing a device driver for the SPI peripheral of an MCU in C.
I would like to exploit the interrupt mechanism for both reception and transmission.
For the reception part, I think I can implement this by exposing
a function SpiRegisterCallback in the SPI driver interface. This function
lets the client register a function of its own, which will be invoked as soon as
a data byte is received (that is, when the reception-buffer-full interrupt fires).
For the transmission part, I would like to provide a SpiTransmit function
that takes a pointer to the data bytes to be transmitted and the number of bytes
to transmit. For the implementation, I am going to define an internal
callback function in the SPI driver. This internal callback will be registered
for the transmission-buffer-empty interrupt. In this callback, the passed data bytes will be placed into the transmission buffer one at a time. I am not sure whether this approach
is appropriate. Can anybody give me advice on how to implement an SPI peripheral
driver which exploits interrupts for data transmission? Thanks in advance for any
suggestions.

SPI is often very real-time critical, and introducing a callback through function pointers adds needless overhead code. The actual copying of data from SPI to RAM must be done internally by your driver. That's all the ISR should be doing. Some general guidance can be found here.
So your ISR should be filling up a buffer, then swapping pointers to buffers (no slow memcpy!) in a protected way, so that the caller always has one buffer with valid data and the ISR always has one working buffer to fill up. Let the caller poll a flag rather than invoking a callback from inside an ISR. I like to use triple buffering if I can spare the RAM. That is: one buffer for the ISR, one buffer for the caller and one spare that the ISR can swap with without disrupting the caller.
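The pointer-swap scheme described above can be sketched roughly as follows. This is only a minimal illustration, not a finished driver: the names `spi_rx_isr`, `spi_poll_frame`, `SPI_BUF_LEN` and the `SPI_ENTER_CRITICAL`/`SPI_EXIT_CRITICAL` hooks are all hypothetical, and the critical-section hooks are no-ops here so that the sketch builds on a host. On a real MCU they would mask and restore the SPI interrupt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPI_BUF_LEN 4u  /* hypothetical frame length in bytes */

/* Hypothetical critical-section hooks. No-ops on a host build; on a real
 * target they would disable and re-enable the SPI receive interrupt. */
#define SPI_ENTER_CRITICAL() ((void)0)
#define SPI_EXIT_CRITICAL()  ((void)0)

static uint8_t storage[3][SPI_BUF_LEN];
static uint8_t *volatile isr_buf   = storage[0]; /* being filled by the ISR   */
static uint8_t *volatile spare_buf = storage[1]; /* newest full frame or free */
static uint8_t *volatile user_buf  = storage[2]; /* owned by the caller       */
static volatile bool frame_ready = false;
static size_t isr_index = 0;

/* Body of the receive ISR: store one byte, swap pointers when the frame is
 * full. No memcpy and no callback into client code. */
void spi_rx_isr(uint8_t rx_byte)
{
    isr_buf[isr_index++] = rx_byte;
    if (isr_index == SPI_BUF_LEN) {
        uint8_t *tmp = isr_buf;
        isr_buf = spare_buf;   /* continue filling the spare buffer */
        spare_buf = tmp;       /* publish the frame just completed  */
        isr_index = 0;
        frame_ready = true;
    }
}

/* Polled from the main loop. Returns the newest complete frame, or NULL if
 * nothing new has arrived. The returned buffer stays valid until the next
 * successful poll. */
const uint8_t *spi_poll_frame(void)
{
    const uint8_t *result = NULL;
    SPI_ENTER_CRITICAL();
    if (frame_ready) {
        uint8_t *tmp = user_buf;
        user_buf = spare_buf;  /* take ownership of the published frame */
        spare_buf = tmp;       /* hand the old caller buffer back       */
        frame_ready = false;
        result = user_buf;
    }
    SPI_EXIT_CRITICAL();
    return result;
}
```

With this arrangement, a frame that is never polled is simply overwritten by a newer one. Whether "newest wins" is acceptable depends on the application.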
This is all rather intricate to code and most programmers get it wrong. DMA is superior to interrupts here, so you should really be considering DMA instead. This is something you should be considering when picking MCU.

A request for "any suggestions" does not really make this a great question because multiple answers may be acceptable, and few will be comprehensive. It invites comments rather than answers. However I will indulge:
First, this is not by any definition an exploit. To "exploit" implies making use of something for a purpose it was not intended - that is not the correct term in this case, you are not "exploiting" the interrupt mechanism, you are simply using it.
At high clock rates, the interrupt latency and context-switch time involved in processing the interrupts may in some cases be less efficient than a simple busy-wait. If the transfers are more than two or three bytes at a time, you should in any case consider using DMA if available, so that the interrupt is the DMA interrupt for a complete transfer rather than one per character. For applications such as SD card or EEPROM interfacing, DMA will have a significant performance impact and free up the CPU to do other useful work concurrently. A driver that uses a busy-wait for single byte/word transfers and DMA for block transfers may be optimal. This is perhaps particularly true if you are using an RTOS and the ISR triggers a task context to process the data: the context-switch overhead may be nearly as much as, or more than, a busy-wait for a single byte. If your SPI clock is 1 MHz, for example, you will wait 8 µs for a byte transfer (less at higher clock rates); your ISR and callbacks could easily take longer than that, in which case it is not worthwhile.
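The arithmetic behind that comparison is simple enough to put in a helper. The sketch below is illustrative only: the function names are hypothetical, and the ISR overhead figure has to come from measurement on your own target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Time needed to shift `bits` bits at an SPI clock of `spi_hz`, in
 * nanoseconds, rounded up. */
uint32_t spi_transfer_ns(uint32_t spi_hz, uint32_t bits)
{
    uint64_t ns = ((uint64_t)bits * 1000000000u + spi_hz - 1u) / spi_hz;
    return (uint32_t)ns;
}

/* True when the measured per-byte interrupt overhead (entry, handler, exit,
 * any context switch) is at least as long as simply waiting for the byte. */
bool spi_prefer_busy_wait(uint32_t spi_hz, uint32_t isr_overhead_ns)
{
    return isr_overhead_ns >= spi_transfer_ns(spi_hz, 8u);
}
```

For example, `spi_transfer_ns(1000000u, 8u)` gives 8000 ns, matching the 8 µs figure above, so an interrupt path costing 10 µs per byte would not pay off at that clock rate.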
So my advice here is to only consider interrupts for SPI if you are using a slow clock and can get other useful work done whilst waiting for the interrupt.
A problem with allowing call-backs in interrupts is that it allows the callback provider to do things that are ill-advised or illegal in an interrupt context, and you lose the ability to control the processing time of the interrupt. It is fine perhaps if the callback is intended for use by someone writing a device driver - they should be aware of what they are doing, but this is the device driver.

Related

STM32 DMA from timer count to memory

I'm using an STM32H743. I have an external clock signal coming in on a GPIO pin, and I want to very accurately measure elapsed time between each rising (or falling) edge in the external clock signal. So I set things up so that TIM4 is triggered by the external clock, and TIM5 is triggered by the internal oscillator.
I wrote an IRQ so that whenever TIM4 triggers, an interrupt runs that captures TIM5's value. It seems to work OK, but I'm wondering if I can do it through DMA to avoid all the context switching and free up the CPU. Basically I want to set up a DMA so that each TIM4 event initiates a DMA transfer that copies the TIM5 counter value to a circular buffer somewhere.
I've searched through forums and the DMA documentation but I'm hazy on whether a timer register can be a valid DMA source. I was thinking maybe I could do something like this:
hDma->PAR = (uint32_t) &htim5.Instance->CNT;
hDma->M0AR = (uint32_t) myBufferPtr;
hDma->NDTR = myBufferSize;
hDma->CR |= (uint32_t)DMA_SxCR_EN;
But I'm not sure if this can work.
Short version: Can I use the timer's CNT register as a DMA transfer source? Would it be a peripheral-to-memory transfer? Or a memory-to-memory transfer? Are there other flags I need to make this work? Or is it not possible? Or is there another STM32 feature that would make it easier to count time between pulses?

Disclaimer
I must confess that my long practical experience with STM32 has so far been limited to mainstream controller families like STM32F0, STM32F3, STM32F4 and STM32L4.
Therefore I'm answering based on what those controllers would offer you in your situation.
The STM32H7 series is much more powerful, and it also offers several additional DMA technologies like DMA2D, MDMA and lots of other stuff that I'm not sure about.
But I think a simplified answer might also help you for now, so I'm daring to write it.
Can I use the timer's CNT register as a DMA transfer source? Would it be a peripheral-to-memory transfer? Or a memory-to-memory transfer? Are there other flags I need to make this work? Or is it not possible?
I would expect this to work.
I don't see a reason not to read the TIMx_CNT register in a DMA transfer.
The CNT register is definitely a peripheral address so you have to configure it as a peripheral-to-memory transfer.
I believe that the peripheral/memory separation refers to the bus from which the DMA controller fetches the data (or to which it delivers them) in the bus matrix implemented in every STM32.
Or is there another STM32 feature that would make it easier to count time between pulses?
Yes, there is:
Many of the TIM peripherals (not all are the same) offer you a feature called "Input Capture" that connects the channel (sub-)peripheral of the TIM instance to the input and has the main part of the (same!) TIM peripheral do the internal clocking.
A prerequisite of this is, that the pin you'd like to measure has a TIMx_CHy alternate function, not "only" a TIMx_ETR one.
The TIM peripherals offer a rich range of different configuration options, which look like a complicated mess until you get used to them.
As an introduction and a good overview, I recommend two application notes from ST:
AN4013 Application note. "STM32 cross-series timer overview", Rev.8
Which timers you have on your µC, and which features are offered by which one.
AN4776 Application note. "General-purpose timer cookbook for STM32 microcontrollers", Rev.3
How to use the timers you have. Check out section 2.6, input capture is on page 27.
Looking up those two, I found a third one you might want to check out for better precision, related to HRTIM timers:
AN4539 Application note. "HRTIM cookbook", Rev.4

It is easily done using the STM32CubeIDE configurator:
Configure the timer: enable an input capture channel, enable DMA (circular mode, peripheral to memory, data width word/word) and enable interrupts.
Prepare a buffer for storing captured counter values.
Start input capture in DMA mode before the main loop.
For high-speed operation you may copy data from timerCaptureBuffer to timerCaptureBufferSafe inside the half-complete and complete callbacks. For example, use a DMA memory-to-memory transfer to minimize the time spent in the HAL_TIM_IC_CaptureHalfCpltCallback and HAL_TIM_IC_CaptureCallback interrupts. Process adjacent captured values stored in timerCaptureBufferSafe after the DMA memory-to-memory callback signals that the data is ready. You may use signaling flags so that timerCaptureBufferSafe is not overwritten while it is being processed.
Here is an example:
#define TIM_BUFFER_SIZE 128
uint32_t timerCaptureBuffer[TIM_BUFFER_SIZE];      // filled by the timer's circular DMA
uint32_t timerCaptureBufferSafe[TIM_BUFFER_SIZE];  // stable copy for processing
// ...
HAL_DMA_RegisterCallback(&hdma_memtomem_dma2_stream2,
                         HAL_DMA_XFER_CPLT_CB_ID,
                         myDMA_Callback22);
// ...
HAL_TIM_IC_Start_DMA(&htim2, TIM_CHANNEL_1, (uint32_t *)timerCaptureBuffer, TIM_BUFFER_SIZE);
// ...
void HAL_TIM_IC_CaptureHalfCpltCallback(TIM_HandleTypeDef *htim)
{
    // First half of the circular buffer is complete: copy it out.
    HAL_DMA_Start_IT(&hdma_memtomem_dma2_stream2,
                     (uint32_t)&timerCaptureBuffer[0],
                     (uint32_t)&timerCaptureBufferSafe[0],
                     sizeof(timerCaptureBuffer) / 2 / 4);  // length in 32-bit words
    // ...
}
void HAL_TIM_IC_CaptureCallback(TIM_HandleTypeDef *htim)
{
    // Second half of the circular buffer is complete: copy it out.
    HAL_DMA_Start_IT(&hdma_memtomem_dma2_stream2,
                     (uint32_t)&timerCaptureBuffer[TIM_BUFFER_SIZE / 2],
                     (uint32_t)&timerCaptureBufferSafe[TIM_BUFFER_SIZE / 2],
                     sizeof(timerCaptureBuffer) / 2 / 4);  // length in 32-bit words
    // ...
}
void myDMA_Callback22(DMA_HandleTypeDef *_hdma)
{
    // The copied half of timerCaptureBufferSafe is now ready for processing.
    //...
}
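The "process adjacent captured values" step might look something like the sketch below. It assumes the capture timer is a free-running 32-bit counter, so that unsigned subtraction remains correct across a single wraparound. The helper names `capture_delta` and `captures_to_periods` are hypothetical and not part of the HAL.

```c
#include <stddef.h>
#include <stdint.h>

/* Elapsed ticks between two captures of a free-running 32-bit counter.
 * Unsigned arithmetic wraps modulo 2^32, so the result is still correct
 * if the counter overflowed once between the two captures. */
static inline uint32_t capture_delta(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

/* Convert n consecutive capture values into n-1 periods (in timer ticks).
 * `periods` must have room for n-1 entries. Returns the number written. */
size_t captures_to_periods(const uint32_t *captures, size_t n, uint32_t *periods)
{
    if (n < 2u) {
        return 0u;
    }
    for (size_t i = 0u; i + 1u < n; i++) {
        periods[i] = capture_delta(captures[i], captures[i + 1u]);
    }
    return n - 1u;
}
```

To avoid losing the interval that spans two half-buffers, the caller would also need to keep the last capture of the previous half and pass it in as the first element.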

Relaying UART1 with UART2 within an ISR() (PIC24H)

I am programming a microcontroller of the PIC24H family and using xc16 compiler.
I am relaying U1RX-data to U2TX within main(), but when I try that in an ISR it does not work.
I am sending commands to the U1RX and the ISR() is down below. At U2RX, there are databytes coming in constantly and I want to relay 500 of them with the U1TX. The results of this is that U1TX is relaying the first 4 databytes from U2RX but then re-sending the 4th byte over and over again.
When I copy the for loop below into my main() it all works properly. In the ISR(), its like that U2RX's corresponding FIFObuffer is not clearing when read so the buffer overflows and stops reading further incoming data to U2RX. I would really appreciate if someone could show me how to approach the problem here. The variables tmp and command are globally declared.
void __attribute__((__interrupt__, auto_psv, shadow)) _U1RXInterrupt(void)
{
    command = U1RXREG;
    if(command=='d'){
        for(i=0;i<500;i++){
            while(U2STAbits.URXDA==0);
            tmp=U2RXREG;
            while(U1STAbits.UTXBF==1); //
            U1TXREG=tmp;
        }
    }
}
Edit: I added the first line in the ISR().

Trying to draw an answer from the various comments.
If main() has nothing else to do, and there are no other interrupts, you might be able to "get away with" patching all 500 chars from one UART to another under interrupt, once the first interrupt has occurred, and perhaps it would be a useful exercise to get that working.
But that's not how you should use an interrupt. If you have other tasks in main(), and equal or lower priority interrupts, the relatively huge time that this interrupt will take (500 chars at 9600 baud = half a second) will make the processor what is known as "interrupt-bound", that is, the other processes are frozen out.
As your project gains complexity, you won't want to restrict main() to this task, and there is no need for it to be involved at all after setting up the UARTs and IRQs. After that it can calculate π ad infinitum if you want.
I am a bit perplexed as to your sequence of operations. A command 'd' is received from U1 which tells you to patch 500 chars from U2 to U1.
I suggest one way to tackle this (and there are many) seeing as you really want to use interrupts, is to wait until the command is received from U1 - in main(). You then configure, and enable, interrupts for RXD on U2.
Then the job of the ISR will be to receive data from U2 and transmit it thru U1. If both UARTS have the same clock and the same baud rate, there should not be a synchronisation problem, since a UART is typically buffered internally: once it begins to transmit, the TXD register is available to hold another character, so any stagnation in the ISR should be minimal.
I can't write the actual code for you, since it would be supposed to work, but here is some very pseudo code, and I don't have a PIC handy (or wish to research its operational details).
ISR
has been invoked because U2 has a char RXD
you *might* need to check RXD status as a required sequence to clear the interrupt
read the RXD register, which also might clear the interrupt status
if not, specifically clear the interrupt status
while (U1 TXD busy);
write char to U1
if (chars received == 500)
disable U2 RXD interrupt
return from interrupt
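For illustration, the pseudo code above could be fleshed out along these lines. This is a host-buildable sketch rather than working PIC24H code: every `hw_*` function and `sim_*` variable is a hypothetical stand-in for the real registers (U2RXREG, U1STAbits.UTXBF, U1TXREG and the U2 receive interrupt flag and enable bits), and would need to be replaced with the actual register accesses.

```c
#include <stdbool.h>
#include <stdint.h>

#define RELAY_COUNT 500u

/* Simulated hardware state, so the sketch compiles and runs on a host. */
static uint8_t sim_rx_byte;               /* stands in for U2RXREG          */
static uint8_t sim_tx_last;               /* stands in for U1TXREG          */
static bool sim_tx_full = false;          /* stands in for U1STAbits.UTXBF  */
static bool sim_rx_irq_enabled = false;   /* stands in for the U2 RX enable */

/* Hypothetical hardware hooks; replace the bodies with register accesses. */
static uint8_t hw_read_u2_rx(void)        { return sim_rx_byte; }
static bool hw_u1_tx_full(void)           { return sim_tx_full; }
static void hw_write_u1_tx(uint8_t b)     { sim_tx_last = b; }
static void hw_clear_u2_rx_flag(void)     { /* clear the interrupt flag */ }
static void hw_enable_u2_rx_irq(bool on)  { sim_rx_irq_enabled = on; }

static volatile uint16_t relayed = 0;

/* Called from main() once the 'd' command has been received on U1. */
void relay_start(void)
{
    relayed = 0;
    hw_enable_u2_rx_irq(true);
}

/* Body of the U2 receive ISR: move exactly one byte per interrupt. */
void u2_rx_isr(void)
{
    uint8_t b = hw_read_u2_rx();   /* reading may also clear the status */
    hw_clear_u2_rx_flag();         /* otherwise clear it explicitly     */
    while (hw_u1_tx_full()) {
        /* short wait; the transmitter is buffered, so this is brief */
    }
    hw_write_u1_tx(b);
    if (++relayed >= RELAY_COUNT) {
        hw_enable_u2_rx_irq(false); /* done: stop relaying */
    }
}
```

The point of the structure is that each interrupt handles a single character and returns, instead of one interrupt looping over all 500.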

ISRs must be kept lean and mean and the code made hyper-efficient if there is any hope of keeping up with the buffer on a UART. Experiment with the baud rate just to find the point at which your code can keep up, to help discover the right heuristic and see how far away you are from achieving your goal.
Success could depend on how fast your micro controller is, as well, and how many tasks it is running. If the microcontroller has a built in UART theoretically you should be able to manage keeping the FIFO from overflowing. On the other hand, if you paired up a UART with an insufficiently-powered micro controller, you might not be able to optimize your way out of the problem.
Besides the suggestion to offload the lower-priority work to the main thread and keep the ISR fast (that someone made in the comments), you will want to carefully look at the timing of all of the lines of code and try every trick in the book to get them to run faster. One expensive instruction can ruin your whole day, so get real creative in finding ways to save time.
EDIT: Another thing to consider - look at the assembly language your C compiler creates. A good compiler should let you inline assembly language instructions to allow you to hyper-optimize for your particular case. Generally in an ISR it would just be a small number of instructions that you have to find and implement.
EDIT 2: A PIC 24 series should be fast enough if you code it right and select a fast oscillator or crystal and run the chip at a good clock rate. Also consider the divisor the UART might be using to achieve its rate vs. the PIC clock rate. It is conceivable (to me) that an even division that could be accomplished internally via shifting would be better than one where math was required.

AVR8 Real Time Scheduler, Serial Communication

I am currently programming an ATmega32u4. I have implemented serial communication using a built-in interrupt that executes every time a byte is received on the Rx pin. The byte on the Rx pin is placed in a one-byte buffer, which is overwritten when another byte is received on the Rx pin. This is a built-in library in atmel.
ISR(USART1_RX_vect, ISR_BLOCK)
{
RingBuffer_Insert(&usart_rx_buffer,UDR1);
}
My code executes an interrupt when a byte is received on the Rx pin. When a byte is received, it is entered into my ring buffer usart_rx_buffer, where it is later decoded.
If an interrupt is being executed and this causes the one byte buffer to be replaced before the UART interrupt can be executed, this byte is lost.
The result of this is that other interrupts cannot take longer than the baud rate to execute otherwise serial bytes are lost. Is there any way to avoid this problem?

One way to solve this problem would be to use the attribute ISR_NOBLOCK in all interrupts that take longer than the baud rate, causing the interrupt enable flag to be activated by the compiler as early as possible within the ISR and allowing the USART1_RX_vect to be executed inside other interrupts. However, "care should be taken to avoid stack overflows, or to avoid infinitely entering the ISR for those cases where the AVR hardware does not clear the respective interrupt flag before entering the ISR".
I've experienced this same problem and so far this was the best solution I could think of. I didn't use it nor tested it, though.
Edit: keep in mind that all other interrupts could also be executed inside interrupts declared with the attribute ISR_NOBLOCK, not just the interrupt you want. So you would basically allow all interrupts to be nested inside all interrupts, except USART1_RX_vect (and those declared with ISR_BLOCK). This is the main problem with this solution (besides the stack overflow problem).

The result of this is that other interrupts cannot take longer than the baud rate to execute otherwise serial bytes are lost. Is there any way to avoid this problem?
All your observations are correct. While allowing nested interrupts as suggested in Nuno's answer could work, it is normally something you would/should want to avoid. Allowing nested interrupts everywhere makes code pretty unpredictable.
I would first try to optimize the execution time of the interrupts that are blocking your UART receive ISR. Take a look at the interrupt priorities. If several interrupts are pending, they will be executed according to this priority. This can result in "starvation" of lower level interrupts, if there is "always" a higher level interrupt pending.
What is your baud rate? Even at 115200 bit/s you can execute about 700 instructions per byte received (assuming an 8 MHz clock, 10 bits per byte on the wire and mostly single-cycle instructions). ISRs should be as short as possible. If there is one single ISR that takes long and you can't optimize it for whatever reason, you could consider allowing nested interrupts in just this single ISR (this is only feasible if the execution is not critical).
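The "about 700" figure comes from a simple division, sketched below. The helper name is hypothetical, and the result is in CPU cycles, which only roughly equals instructions on an AVR.

```c
#include <stdint.h>

/* CPU cycles available between two back-to-back received bytes.
 * bits_per_frame is 10 for the common 8N1 format (start + 8 data + stop). */
uint32_t cycles_per_uart_byte(uint32_t f_cpu_hz, uint32_t baud,
                              uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)f_cpu_hz * bits_per_frame) / baud);
}
```

With an 8 MHz clock this gives 694 cycles per byte at 115200 baud and 8333 cycles at 9600 baud, which is the budget that all blocking ISRs together must stay under.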
If you use a high baud rate, consider reducing it. 9600 baud is often enough, but may require asynchronous sending to prevent blocking code.

Interrupt-safe buffer

I'm writing code for an embedded system (Cortex M0) and do not have all the luxuries of mutexes/spinlocks/etc. Is there a simple way to add data to a shared buffer (log-file) which will be flushed to disk from my Main() loop?
If there is only a single producer (1 interrupt) and single consumer (main-loop), I could use a simple buffer where the producer increases the 'head' and the consumer the 'tail'. And it will be perfectly safe. But now that I have multiple producers (interrupts) it seems like I'm stuck.
I could give each interrupt its own buffer, and combine them in Main(), but this will require a lot of extra RAM and complexity.

You can implement this through a simple ring buffer (circular array), where you turn off the hardware interrupt sources during access. It only needs the functions init, add and remove.
I'm not certain how your particular MCU handles interrupts, but most likely they will remain pending, as long as you only enable/disable the particular hardware peripheral's interrupt. Depending on the nature of your application, you could also disable the global interrupt mask, but that's rather crude.
Generally, you don't need to worry about missing out interrupts, because if the code that handles the incoming interrupts is slower than the interrupt frequency, no software in the world will fix it. You would either have to accept data losses or increase the CPU clock to dodge such scenarios. But of course you should always try to keep the code inside the ISR as compact as possible.
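A minimal sketch of such a ring buffer, with the three functions mentioned above, might look like this. The `irq_save_disable` and `irq_restore` hooks are hypothetical and are no-ops here so that the code builds on a host. On a Cortex-M0 they could mask just the producing peripherals' interrupts, or more crudely save and restore PRIMASK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_BUF_SIZE 8u  /* hypothetical capacity in bytes */

/* Hypothetical interrupt-masking hooks (no-ops on a host build). On the
 * target they must prevent every producer ISR from running in between. */
static inline uint32_t irq_save_disable(void) { return 0u; }
static inline void irq_restore(uint32_t state) { (void)state; }

typedef struct {
    uint8_t data[LOG_BUF_SIZE];
    size_t head;   /* index of the next write */
    size_t tail;   /* index of the next read  */
    size_t count;  /* number of bytes stored  */
} ring_t;

void ring_init(ring_t *r)
{
    r->head = 0u;
    r->tail = 0u;
    r->count = 0u;
}

/* Safe to call from any ISR or from main(). Returns false if full. */
bool ring_add(ring_t *r, uint8_t byte)
{
    bool ok = false;
    uint32_t state = irq_save_disable();
    if (r->count < LOG_BUF_SIZE) {
        r->data[r->head] = byte;
        r->head = (r->head + 1u) % LOG_BUF_SIZE;
        r->count++;
        ok = true;
    }
    irq_restore(state);
    return ok;
}

/* Called from the main loop. Returns false if the buffer is empty. */
bool ring_remove(ring_t *r, uint8_t *out)
{
    bool ok = false;
    uint32_t state = irq_save_disable();
    if (r->count > 0u) {
        *out = r->data[r->tail];
        r->tail = (r->tail + 1u) % LOG_BUF_SIZE;
        r->count--;
        ok = true;
    }
    irq_restore(state);
    return ok;
}
```

Saving and restoring the previous interrupt state, rather than unconditionally re-enabling, keeps `ring_add` usable from inside an ISR that may itself be interrupted by a higher-priority producer.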

Why are nanosleep() and usleep() too slow?

I have a program that generates packets to send to a receiver. I need an efficient method of introducing a small delay between the sending of each packet so as not to overrun the receiver. I've tried usleep() and nanosleep() but they seem to be too slow. I've implemented a busy wait loop and had more success, but it's not the most efficient method, I know. I'm interested in anyone's experiences in trying to do what I'm doing. Do others find usleep() and nanosleep() to function well for this type of application?
Thanks,
Danny Llewallyn

The behaviour of the sleep functions for very small intervals is heavily dependent on the kernel version and configuration.
If you have a "tickless" kernel (CONFIG_NO_HZ) and high resolution timers, then you can expect the sleeps to be quite close to what you ask for.
Otherwise, you'll generally end up sleeping at the granularity of the timer interrupt. The timer interrupt interval is configurable (CONFIG_HZ) - 10ms, 4ms, 3.3ms and 1ms are the common choices.
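One way to see which case applies on a given machine is to measure how long a short sleep really takes. The sketch below uses only POSIX `clock_gettime` and `nanosleep`; the function name `measure_sleep_ns` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sleep for request_ns nanoseconds and return how long the call actually
 * took, in nanoseconds, or -1 on error (including interruption by a
 * signal, which this sketch does not retry). */
int64_t measure_sleep_ns(long request_ns)
{
    struct timespec req = {
        .tv_sec  = request_ns / 1000000000L,
        .tv_nsec = request_ns % 1000000000L
    };
    struct timespec t0, t1;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
        return -1;
    }
    if (nanosleep(&req, NULL) != 0) {
        return -1;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
        return -1;
    }
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Requesting, say, 50 µs and getting back several milliseconds suggests the sleep is being rounded up to the timer tick. A result close to the request suggests high resolution timers are active.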

Assuming that the higher level approaches other commenters have mentioned are not available to you, then a common approach in embedded/microcontroller land is to create a NOP-loop of the required length.
A NOP operation takes one CPU cycle, and in an embedded environment you typically know exactly what clock speed your processor is running at, so you can just use a simple for-loop containing _NOP(). If only a very short delay is required, don't bother with a loop; just add in the required number of NOPs.
regTX = 0xFF; // Transmit FF on special register
// Wait three clock cycles
_NOP();
_NOP();
_NOP();
regTX = 0x00; // Transmit 00

This seems like a bad design. Ideally the receiver would queue any extra data it receives, and then do its message processing in a separate thread. In that way, it can handle bursts of data without relying on the sender to throttle its requests.
But perhaps such an approach is not practical if (for example) you do not have control of the receiver's code, or if this is an embedded application.

I can speak for Solaris here, in that it uses an OS timer to wake up sleep calls. By default the minimum wait time will be 10ms, regardless of what you specify in your usleep. However, you can use the parameters hires_tick = 1 (1ms wakeups) and hires_hz = in the /etc/system configuration file to increase the frequency of timer wake up calls.

Instead of doing things at the packet level, where you need to worry about such things as overrunning the receiver, why not use a TCP stream to transmit the data? Let TCP handle things like flow rate control and packet retransmission.
If you've already got a lot invested in the packetized approach, you can always use a layer on top of TCP to extract the original packets of data from the TCP stream and feed these into your existing functions.
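Such a layer can be as small as a length prefix in front of each packet. The sketch below assumes a 2-byte big-endian length header, which is just one possible framing choice. The names `frame_encode` and `frame_decode` are hypothetical, and the socket I/O itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HDR 2u  /* assumed header: 16-bit big-endian payload length */

/* Write header + payload into out. Returns the number of bytes written,
 * or 0 if out is too small. */
size_t frame_encode(const uint8_t *payload, uint16_t len,
                    uint8_t *out, size_t out_cap)
{
    if (out_cap < FRAME_HDR + (size_t)len) {
        return 0u;
    }
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xFFu);
    memcpy(out + FRAME_HDR, payload, len);
    return FRAME_HDR + (size_t)len;
}

/* Try to extract one packet from the bytes received so far. On success,
 * sets *payload and *len and returns the number of stream bytes consumed.
 * Returns 0 if a complete packet has not arrived yet. */
size_t frame_decode(const uint8_t *stream, size_t avail,
                    const uint8_t **payload, uint16_t *len)
{
    if (avail < FRAME_HDR) {
        return 0u;
    }
    uint16_t n = (uint16_t)(((uint16_t)stream[0] << 8) | stream[1]);
    if (avail < FRAME_HDR + (size_t)n) {
        return 0u;
    }
    *payload = stream + FRAME_HDR;
    *len = n;
    return FRAME_HDR + (size_t)n;
}
```

The receiver would append whatever `recv()` returns to a buffer, call `frame_decode` repeatedly until it returns 0, and pass each extracted packet to the existing packet-handling functions.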
