STM32F429 Timer triggered USART DMA transfer issue - timer

This is my first post at this forum.
I am developing a MIDI sequencer device based on an STM32F429DISCOVERY board running at the stock 180 MHz. To send MIDI messages, USART1 is configured for 31250 baud and the appropriate DMA is configured to transfer a 3-byte array stored in RAM to the USART. I was testing how evenly MIDI messages are sent by configuring the Timer 4 update interrupt; inside its service routine I enable the memory-to-peripheral USART1 DMA operation. This gives me periodic sending of a 3-byte message over the USART1 peripheral.
Everything works great, with the correct frequency and correct data, but I have a small issue which I have been researching for a few days now and have not been able to correct. To make things clearer, within the timer interrupt routine I make an LED on the Discovery board (PG13) blink momentarily, and I connected channel 1 of an oscilloscope to the LED pin. The second channel of the oscilloscope is connected to the USART TX pin. Now, when the code is executed, I can see the LED pulse on the oscilloscope's CH1, followed by the USART serial data on CH2. But for some reason the time between the LED pulse and the beginning of the serial data transfer fluctuates with every transmission. It increases with every transmission, going from around 1 µs to around 30 µs, and then jumps back to 1 µs.
I noticed that if I slightly change the USART baud rate, the pattern of the time fluctuation between the pulse and the data changes: it drifts faster or slower and over a longer or shorter range.
I have tried resetting all the appropriate flags of the USART as well as the DMA, have tried disabling/enabling the timer, and have played with interrupt priorities, but nothing has worked to get rid of the time fluctuation.
As you can imagine, the stability of this is crucial for a MIDI sequencer hardware application as it bases the timing of the musical events, which must be rock solid.
I have also tried using the USART by itself without DMA, manually sending every byte, with basically the same results. Interrupt-driven USART TX gave similar results.
The only thing which seemed to work to get rid of the time fluctuation of USART TX response is, before every sending operation to deinitialize USART and the DMA modules and reinitialize them again. This seemed to give a stable operation but inserts a long delay between the timer interrupt and the actual sending of the data over the USART, which is unacceptable.
If anyone has any thoughts on this or has done anything similar, I would appreciate advice on where to look.
Thanks a lot in advance!
Best regards,

Even based on your detailed description, there are various possibilities for errors, so best I can do is guess:
Maybe just one of the TIM settings is slightly wrong: what about the timer's auto-reload register (TIM4_ARR)?
The reload value must be one less than the desired transmission period divided by the (possibly prescaled) timer clock period (see the details of the upcounting/downcounting specification).
Now, if the reload value were just equal to that quotient instead, the second trigger would be late by one tiny period, the third trigger twice as much and so on (which may look like what you described).
This "ramp of delays" would then rise until the unwanted delay sums up to one UART bit period (which happens to be 32 µs for 31250 baud, quite near to the "around 30 µs" you described). The next trigger would then just fit the neighbouring UART bit cycle (without much delay).
Comparing this hypothesis with your other findings...
Changing the UART baud rate would preserve the fundamental error, but the duration of the irritating delay changes. It can appear to change its sign ("faster or slower"), depending on the beat characteristics between the (actual) TIM period and the UART bit period. => OK
Changing the event processing from DMA to IRQ handler wouldn't change much about the problem but only the "phase" of the initial delay (by the time the CPU needs to execute a different ST library function). => OK
Disabling and re-enabling the UART might have changed the behaviour because the UART clock might re-synchronize with the underlying bus clock (APB2 for USART1), so the delay after the TIM trigger would appear constant, and you wouldn't notice fluctuations. => OK
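The beat between the timer period and the UART bit period described above can be checked with a small host-side model. This is only a sketch of the hypothesis, not of the real USART hardware: it assumes the transmitter can start a frame only on a free-running bit-clock grid, and it assumes a hypothetical 1 MHz counter clock, so one tick is 1 µs and one bit at 31250 baud is 32 ticks. The function names are made up for illustration.

```python
def auto_reload_value(counter_clock_hz: int, event_rate_hz: int) -> int:
    """Return ARR for an upcounting timer, which counts ARR + 1 ticks per period."""
    ticks, remainder = divmod(counter_clock_hz, event_rate_hz)
    if remainder:
        raise ValueError("event rate does not divide the counter clock evenly")
    return ticks - 1


def start_delays(period_ticks: int, bit_ticks: int, count: int) -> list:
    """Ticks from trigger n until the next edge of the free-running bit clock.

    Trigger 0 is assumed to fall exactly on a bit-clock edge.
    """
    return [(-n * period_ticks) % bit_ticks for n in range(count)]


BIT_TICKS = 32  # 1 MHz counter clock / 31250 baud (assumed values)

arr = auto_reload_value(1_000_000, 125)       # 8000 ticks per period -> ARR = 7999
exact = start_delays(arr + 1, BIT_TICKS, 5)   # period is a whole number of bit times
one_short = start_delays(arr, BIT_TICKS, 34)  # period is one tick too short

print("ARR:", arr)
print("exact period:", exact)
print("off by one tick:", one_short[:4], "...", one_short[31:33])
```

In this model a period that is a whole number of bit times keeps the delay constant, while a period that is off by a single tick makes the delay ramp by 1 µs per trigger and wrap after 32 triggers, which resembles the pattern in the question.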


How reliable is DMA to GPIO on STM32 MCUs?

ST has some application notes that talk about emulating a parallel bus using DMA to GPIO. I appreciate that, but it doesn't answer important questions. I am looking through the reference manual, and I can't seem to find anything that clarifies the things I am concerned about.
I am most concerned about the jitter. The reference manual repeatedly states that when DMA is triggered (e.g., by a timer), the DMA controller will read the memory and transfer the value to the peripheral. That might be fine with peripherals that have their own FIFO. There, when space is available in the FIFO, DMA is triggered and fills the FIFO. That will probably happen before the FIFO runs empty.
But with GPIO, if the DMA channel doesn't have a FIFO itself, the data will not be ready when the timer triggers and it needs to be fetched from SRAM. So between the timer triggering and the value actually arriving in the GPIO output register, some time may pass. This might be measurable when looking at the clock output by the timer and the GPIO pins. The DMA controller has to compete for access to the SRAM with the running program, so certain activities by the program may increase the jitter.
Maybe that is a colossal oversight on my part, but ST's reference manual doesn't seem to mention a FIFO as part of the DMA. If that is the case, that would result in jitter which may impact performance at higher frequencies.
I need to toggle 3 to 4 pins synchronously to a clock from 100kHz to 1MHz. I am considering DMA to GPIO and also abusing a QuadSPI controller. I am currently testing on a STM32L4 but I'm also considering STM32F4 or even F1.
DMA to/from GPIO is just a memory-to-memory transfer. Many STM32 uCs have built-in DMA FIFOs, but they are of no use here.
The core always has priority over the DMA, so if that turns out to be the issue (very unlikely), place the data that the core accesses while the DMA is active in a separate memory area, for example CCM (if your uC has one).
Answering the question
memory to/FROM GPIO is very reliable - I personally did not have any problems with it.
If your clock can be anything between 100 kHz and 1 MHz, I guess you're not worried about jitter in the clock itself, only jitter in the data versus the clock. If your clock need not be continuous, a novel idea then is to do some preprocessing of the data to include the clock signal as part of the GPIO data. Then you could trigger the DMA at regular intervals using a timer, and you'll get the data frequency on the bus at half that rate with perfect alignment between clock and data.
So if you want to send the four-bit data 5 6 B D with data valid on the positive clock edge, prepare the DMA buffer like so: 05 15 06 16 0B 1B 0D 1D, and connect GPIO pin 4 as the clock. Leave a final byte in the buffer to reset the clock/bus to its idle state, if you need to.
You can of course extend the idea and incorporate control signals such as chip selects and tri-state signals for external buffers, if needed.
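The buffer preparation described above can be sketched on the host side as follows. This is an illustration only: the function name, the default idle value of 0x00 and the choice of one byte per DMA transfer are assumptions, while the clock on bit 4 and the example data come from the answer.

```python
def interleave_clock(values, clock_bit=4, idle=0x00):
    """Build a DMA buffer with the clock embedded in the GPIO data.

    Each data value is emitted twice: first with the clock bit low (data
    set-up), then with the clock bit high (rising edge, data valid).
    A final idle byte returns the clock and bus to their idle state.
    """
    clock_mask = 1 << clock_bit
    buffer = bytearray()
    for value in values:
        if not 0 <= value < clock_mask:
            raise ValueError("data value overlaps the clock bit")
        buffer.append(value)               # clock low, data changes
        buffer.append(value | clock_mask)  # clock high, data held stable
    buffer.append(idle)
    return bytes(buffer)


buf = interleave_clock([0x5, 0x6, 0xB, 0xD])
print(buf.hex(" "))  # 05 15 06 16 0b 1b 0d 1d 00
```

Because clock and data are written to the port in the same transfer, any jitter in the DMA timing shifts both together, and the data only changes while the clock bit is low.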
Also take note that not all DMA blocks may have access to the AHB bus which is holding the GPIO registers. For example on STM32F40x, only DMA2 can be used (this is what got me, until I read this answer).
I haven't fully explored this space yet, but disabling interrupts and polling for interrupt flags in my main loop has made the jitter on my GPIO DMA basically disappear! Granted, it might just be the set of interrupts I have enabled, but everything down to the SysTick timer was killing me. Polling the interrupt flags in the main loop seems to have fixed my issue.
Note that this is on an STM32F042, and I never exceed 6 MHz for my period. When I try to, i.e. try to go to 8 MHz sampling out, everything falls apart. YMMV

Raspberry: how does the PWM via DMA work?

I read that the driver for "Software PWM" somehow runs on the PWM hardware and accesses all GPIOs without using the CPU. Can someone explain how that works? Is there a second processor in the Raspberry Pi used for the PWM and PCM modules (is there a diagram for the blocks)?
The question is related to this excellent driver which I used a lot in my robots.
Here is the explanation, which I unfortunately don't understand...
"The driver works by setting up a linked list of DMA control blocks with the
last one linked back to the first, so once initialised the DMA controller
cycles round continuously and the driver does not need to get involved except
when a pulse width needs to be changed. For a given period there are two DMA
control blocks; the first transfers a single word to the GPIO 'clear output'
register, while the second transfers some number of words to the PWM FIFO to
generate the required pulse width time. In addition, interspersed with these
control blocks is one for each configured servo which is used to set an output.
While the driver does use the PWM peripheral, it only uses it to pace the DMA
transfers, so as to generate accurate delays."
Is the following understanding right:
The DMA controller is like a second processor. You can run code on it. So it is used here to control the high/low states of all the Raspberry Pi GPIO pins together with the PWM block. The DMA controller does this continuously. There is probably more than one DMA controller in the Raspberry Pi, so the speed of the Linux OS is not affected much by one DMA controller being unavailable.
I don't understand how exactly DMA and PWM work together.
I recommend reading RPIO source code together with ServoBlaster's, as it's slightly simplified and can help understanding. Also very important: Broadcom's BCM2835 manual which contains all the tiny details.
is there a diagram for the blocks
The manual contains all the functionalities offered by the chip (not in a block diagram though, as far as I’ve seen).
Is the following understanding right:
The DMA controller is part of the main chip (Broadcom, although I think the same happens on desktop CPUs). It can't exactly run code, but it can copy memory across peripherals by itself, without consuming the main processor’s time. The DMA controller has different channels which can copy memory independently and runs independently of the CPU.
It is configurable via "control blocks" (BCM manual page 40): you can tell the DMA controller to first copy memory from A to B, then from C to D and so on.
don't understand how exactly DMA and PWM work together
DMA is used to send data to the PWM controller ("Pulse Width Modulator", BCM manual page 138, chap. 9), which consumes the data and this creates a very precise delay. Interestingly, the PWM controller is... not used to generate any PWM pulse, but just to wait.
Can someone explain how that works?
Ultimately, you configure the value of the GPIO pins (or the settings of the PWM or PCM generator), by setting memory at a special address; the memory in that region represents the peripheral configuration (BCM manual page 89, chapter 6).
So the idea is: copy 1 onto the memory that controls the GPIO pin value, using the DMA controller; wait the pulse width; copy 0 onto the GPIO pin value; wait the remaining part of the period; loop. Since the DMA controller does it, it doesn't consume CPU cycles.
The key point here is being able to make the DMA controller "wait" an exact amount of time, and for this, RPIO and ServoBlaster use the PWM controller in FIFO mode (the PCM generator also has such functionality, but let's stick to PWM). This means that the PWM controller will "send" the data it reads from its so-called FIFO queue, and then stop. It doesn't matter how it's "sent" (BCM manual page 139, 9.4 MSENi=0), the key point is that it requires a fixed amount of time. As a matter of fact, it doesn't even matter which data is sent: the DMA controller is configured to write into the FIFO queue and then wait until the PWM controller has finished sending data, and this creates a very precise delay.
The resolution of the resulting pulse is given by the duration of the PWM transfer, which depends on the frequency at which the PWM controller is running.
For example, suppose we have a maximum resolution of 1 ms (given by the PWM delay), and we want a pulse with a 25% duty cycle and a frequency of 125 Hz. The period of a pulse is thus 8 ms. The DMA operations performed will be:
1. Set the pin to 1 (DMA write to GPIO mem)
2. Wait 1 ms (DMA write to PWM FIFO)
3. Wait 1 ms (DMA write to PWM FIFO)
4. Set the pin to 0 (DMA write to GPIO mem)
5. Wait 1 ms (DMA write to PWM FIFO)
6. ...repeat "Wait 1 ms" 4 more times.
7. Wait 1 ms (DMA write to PWM FIFO) and jump back to step 1.
This will thus require at least 10 DMA control blocks (8 wait instructions, given by period / delay plus 2 write operations).
Note: in ServoBlaster and RPIO, it will consume exactly 16 DMA control blocks, because (for higher precision), they always perform a "memory copy" operation before a "wait operation". The "memory copy" operation is just a dummy unless it needs to change the pin value.
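The block counts in this example can be reproduced with a small host-side sketch. It only models the order of the control blocks as plain strings; it is not code from ServoBlaster or RPIO, and the function names and labels are made up for illustration.

```python
def minimal_blocks(period_ms, high_ms, step_ms=1):
    """One control block per pin change plus one per wait step."""
    total = period_ms // step_ms
    high = high_ms // step_ms
    blocks = ["set pin"] + ["wait"] * high
    blocks += ["clear pin"] + ["wait"] * (total - high)
    return blocks  # the last block links back to the first


def paired_blocks(period_ms, high_ms, step_ms=1):
    """A memory copy (a dummy unless the pin changes) before every wait."""
    total = period_ms // step_ms
    high = high_ms // step_ms
    blocks = []
    for slot in range(total):
        if slot == 0:
            copy = "set pin"
        elif slot == high:
            copy = "clear pin"
        else:
            copy = "dummy copy"
        blocks += [copy, "wait"]
    return blocks  # the last block links back to the first


# 125 Hz pulse (8 ms period), 25% duty cycle (2 ms high), 1 ms resolution
print(len(minimal_blocks(8, 2)), minimal_blocks(8, 2))
print(len(paired_blocks(8, 2)), paired_blocks(8, 2))
```

The first list has 10 entries (2 pin writes and 8 waits) and the second has 16 (8 copies and 8 waits), matching the counts given above.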

Troubleshooting Missing Data Sample Packets in xBee?

I have my first two nodes setup, I have a ZigBee Coordinator API module and a ZigBee End Device API module. I have the end point connected on Analog pins 1-3 with sensors for temp, moisture, and light.
I have the pins D1-3 configured for ADC, and the IR sample rate setting at EA60 for once per 60 seconds.
The frames log on the co-ordinator shows a stream of Explicit RX Indicator frames and Transmit Status frames, but I am seeing no IO Data Sample RX Indicator frames.
Also, I wired an LED to the sleep indicator pin, and it is almost constantly lit, it's certainly not sleeping for a minute at a time.
Any help would be greatly appreciated.
Do the Explicit RX Indicator frames look like they might contain your I/O samples? You might need to set ATAO=0 to receive the 0x92 frame type, but you're probably better off sticking with parsing the Explicit Rx to find the I/O sample payload and using that.
Regarding your sleeping end device, have you configured the various XBee registers to have it sleep? Find the section on sleep in your XBee documentation and read through it entirely -- there are many configuration options. For the ZigBee specification, you'll need to wake up every 7 seconds, even if it's just a short wakeup for the device to ping its parent device and check for network messages.
Finally, make sure you've wired your LED correctly. If the sleep indicator pin is active low, it will be pulled low whenever sleeping. And the end device will be waking for a short amount of time, possibly too short to see on an LED. You could use a scope or a logic analyzer to monitor the pin for changes instead.

AVR8 Real Time Scheduler, Serial Communication

I am currently programming an ATmega32u4. I have implemented serial communication using a built-in interrupt that executes every time a byte is received on the Rx pin. The byte on the Rx pin is placed in a one-byte buffer, which is overwritten when another byte is received on the Rx pin. This comes from a built-in Atmel library.
My code executes an interrupt when a byte is received on the Rx pin. When a byte is received, it is entered into my ring buffer uart_rx_buffer, where it is later decoded.
If another interrupt is being executed and the one-byte buffer is overwritten before the UART interrupt can run, that byte is lost.
The result of this is that other interrupts cannot take longer than the baud rate to execute, otherwise serial bytes are lost. Is there any way to avoid this problem?
One way to solve this problem would be to use the attribute ISR_NOBLOCK in all interrupts that take longer than the baud rate, causing the interrupt enable flag to be activated by the compiler as early as possible within the ISR and allowing the USART1_RX_vect to be executed inside other interrupts. However, "care should be taken to avoid stack overflows, or to avoid infinitely entering the ISR for those cases where the AVR hardware does not clear the respective interrupt flag before entering the ISR".
I've experienced this same problem and so far this was the best solution I could think of. I didn't use it nor tested it, though.
Edit: keep in mind that all other interrupts could also be executed inside interrupts declared with the attribute ISR_NOBLOCK, not just the interrupt you want. So you would basically allow all interrupts to be nested inside all interrupts, except USART1_RX_vect (and those declared with ISR_BLOCK). This is the main problem with this solution (besides the stack overflow problem).
The result of this is that other interrupts cannot take longer than the baud rate to execute otherwise serial bytes are lost. Is there any way to avoid this problem?
All your observations are correct. While allowing nested interrupts as suggested in Nuno's answer could work, it is normally something you would/should want to avoid. Allowing nested interrupts everywhere makes code pretty unpredictable.
I would first try to optimize the execution time of the interrupts that are blocking your UART receive ISR. Take a look at the interrupt priorities. If several interrupts are pending, they will be executed according to this priority. This can result in "starvation" of lower level interrupts, if there is "always" a higher level interrupt pending.
What is your baud rate? Even at 115200 bit/s you can execute about 700 instructions (assuming 8 MHz) per byte received. ISRs should be as short as possible. If there is one single ISR that takes long and you can't optimize it for whatever reason, you could consider allowing nested interrupts in just this single ISR (this is only feasible if its execution is not critical).
If you use a high baud rate, consider reducing it. 9600 baud is often enough, but may require asynchronous sending to prevent blocking code.
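The "about 700 instructions" figure can be reproduced with a quick calculation. This sketch assumes 8N1 framing (10 bits on the wire per byte) and counts CPU clock cycles, which only approximates the instruction count because many, but not all, AVR instructions take one cycle. The function name is hypothetical.

```python
def cycles_per_byte(cpu_hz, baud, bits_per_frame=10):
    """CPU clock cycles available between two received bytes.

    bits_per_frame = 10 assumes 1 start bit, 8 data bits and 1 stop bit.
    """
    return cpu_hz * bits_per_frame // baud


print(cycles_per_byte(8_000_000, 115_200))  # 694
print(cycles_per_byte(8_000_000, 9_600))    # 8333
```

Dropping from 115200 to 9600 baud gives every other ISR roughly twelve times as long before a received byte can be overwritten.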

DAC Signal Generator stm32

I am programming the DAC peripheral of an stm32f2xx. I have an array of bytes (sound) and I would like to generate a signal with a sample rate of 8 kHz.
Now my question is:
How do I specify sample rate?
I googled a lot. I am only finding triangle wave generation and sine wave generation using DMA. I don't want to use DMA.
Thanks in advance for help...
It's not practical to play waveforms out of the DAC without using DMA. You set up the DMA with your samples, and you set up the DAC to use a timer as the trigger. Then you set up your timer to trigger at your desired sample rate.
I would agree with TJD that in general it is not practical to do so without DMA, however it is not impossible, particularly at a low sample rate.
One could use a timer set to trigger every 1/8000th of a second as the fixed time base. From there, the interrupt routine would need to load up the next sample into the DAC. The sample rate could be varied by changing the timer's time base.
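As a rough illustration of setting the time base, the prescaler and auto-reload values for a 16-bit timer can be worked out on the host. The 60 MHz timer clock below is an assumption (the real value depends on the clock tree configuration), the function names are hypothetical, and the sketch relies on the usual STM32 convention that the update rate is the timer clock divided by (PSC + 1) * (ARR + 1).

```python
def timer_settings(timer_clock_hz, sample_rate_hz, max_count=65536):
    """Return (PSC, ARR) so the update rate is close to sample_rate_hz.

    Update rate = timer_clock_hz / ((PSC + 1) * (ARR + 1)).
    """
    ticks = timer_clock_hz // sample_rate_hz  # timer ticks per sample
    psc = (ticks - 1) // max_count            # smallest prescaler that fits ARR
    arr = ticks // (psc + 1) - 1
    return psc, arr


def actual_rate(timer_clock_hz, psc, arr):
    """Update rate that really results from the chosen register values."""
    return timer_clock_hz / ((psc + 1) * (arr + 1))


psc, arr = timer_settings(60_000_000, 8_000)  # 60 MHz is an assumed timer clock
print(psc, arr, actual_rate(60_000_000, psc, arr))  # 0 7499 8000.0
```

Changing the sample rate then only means writing a different pair of values to the timer, whether the update event fires an interrupt that loads the DAC or triggers a DMA transfer.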
It would be a similar effort to write the code to configure the DMA controller when compared to writing the code to move the correct sample into the buffer. However, the DMA approach would be more reliable, would likely have less jitter in the sample rate, and would free up the core to execute other code that may be needed. In fact, with the TIM/DMA/DAC setup, you may be able to halt the core or enter a sleep mode that keeps the peripheral clocks running.
Yes, I agree with TJD too.
Using DMA is efficient and frees up the CPU for other tasks [good].
Managing the timing in software (a busy loop on the core) [bad] will not produce good results, so use a timer for timing [good].
For the copying, you would have to dedicate the CPU to copying each sample to the DAC register after a specific interval (from a busy loop or a timer timeout) [bad].
In the end I recommend connecting the DMA and the timer, so that on each timeout the DMA copies data to the DAC register [good]. This solution only appears hard; it is actually much easier to work with once it is set up.
[note: written in pov of someone who is trying to understand/start on something like this]
