Implementing an SSI slave interface on STM32 Board

Implementing an SSI slave interface on STM32 Board - arm

I am trying to implement a SSI Slave Protocol on a STM32 Board. Since the STM32 Boards don't have a SSI interface, I used its SPI interface in Slave(Transmit only mode). The master SSI sends 24 clock signals and the slave reacts by sending its data(3 Bytes) over the MISO pins. The problem I am facing is that the data is always shifted on the left on every clock signal coming from the master. For example assuming I am constantly sending 0x010101 from slave.
At first transmission the master receives 0x010101
At Second transmission the master receives 0x020202
At third transmission the master receives 0x040404
Can someone please give me some hints on how to solve this problem?

The data-shift with each transmission can happen when the SPI slave recognizes an (unexpected) additional clock pulse. Looking at the SSI protocol description on Wikipedia this actually makes sense:
In order to transmit N bits of data the master emits N clock cycles, followed by another clock pulse to signal the end of the transfer (so-called "Monoflop Time" - referring to the original hardware implementation of the SSI interface). Since the SPI protocol / SPI slave does not know about this additional clock pulse, it begins to output the first bit of the next data byte, which is in turn not recognized by the SSI master. As a result this leads to a shift in the data bits recognized by the SSI master on the next SSI frame.
Unfortunately, it is not easy to handle the Monoflop time correctly with the SPI slave. In order to deal with the additional clock pulse, we could try to set the SPI frame size to 25 bits on the slave side. Since the STM32 hardware only supports SPI frame sizes between 4 bit and 16 bit, the only choice is to set it to 5 bit. This is not very convenient, since we need to convert the 3 byte (24 bit) output data into 5 blocks of 5 bit (24 bit output data + 1 bit dummy data), but it should work for a "normal" transfer.
Things get more complicated though, if we also want to handle the cases "Multiple transmissions" and "Interrupting transmission" correctly. We need to monitor the clock signal to be able to detect the monoflop timeout. This can be done using a STM32 hardware timer with an external trigger. When the timer expires, we need to reset the SPI unit (in order to handle an interrupted transmission) and update the output value. This "simple" task can be quite challenging since it requires a couple of instructions - requiring a fast MCU depending on the SSI clock frequency.
Alternatively the SSI protocol can be implemented using a software-only "bit banging" solution. But this requires a fast MCU as well in order to handle a fast SSI clock correctly.
IMHO the best solution is to use a small (inexpensive) FPGA to implement the SSI slave and let the MCU feed it with data over a traditional SPI interface.

Related

How reliable is DMA to GPIO on STM32 MCUs?

ST has some application notes that talk about emulating a parallel bus using DMA to GPIO. I appreciate that, but it doesn't answer important questions. I am looking through the reference manual, and I can't seem to find clarify the things that I am concerned about.
I am most concerned about the jitter. The reference manual repeatedly states, that when DMA is triggered (e.g., by a timer), the DMA controller will read the memory and transfer the value to the peripheral. That might be fine with peripherals that have their own FIFO. There, when space is available in the FIFO, DMA is triggered and fills the FIFO. That will probably happen before the FIFO runs empty.
But with GPIO, if the DMA channels doesn't have a FIFO itself, the data will not be ready when the timer triggers and it needs to be fetched from SRAM. So between the timer triggering and between the value actually arriving in the GPIO output register, some time may pass. This might be measurable when looking at the clock output by the timer and the GPIO pins. The DMA controller has to compete for access to the SRAM with the running program, so certain activities by the program may increase the jitter.
Maybe that is a colossal oversight on my part, but ST's reference manual doesn't seem mention a FIFO as part of the DMA. If that is the case, that would result in jitter which may impact performance at higher frequencies.
I need to toggle 3 to 4 pins synchronously to a clock from 100kHz to 1MHz. I am considering DMA to GPIO and also abusing a QuadSPI controller. I am currently testing on a STM32L4 but I'm also considering STM32F4 or even F1.

DMA to/from GPIOit is just memory-to-memory transfer. Many STM32 uCs have built in DMA FIFOs - but they will have not use here.
The core has always priority over the DMA so if it can be the issue (very unlikely) place the core accesible data (this data which uC will access when DMA is active in the separate memory area - for example CCM (if your uC has one)
Answering the question
memory to/FROM GPIO is very reliable - I personally did not have any problems with it.

If your clock can be anything between 100 kHz and 1 MHz, I guess you're not worried about jitter in the clock itself, only jitter in the data versus the clock. If your clock need not be continuous, a novel idea then is to do some preprocessing of the data to include the clock signal as part of the GPIO data. Then you could trigger the DMA at regular intervals using a timer, and you'll get the data frequency on the bus at half that rate with perfect alignment between clock and data.
So if you you want to send the four-bit data 5 6 B D with data valid on the positive clock edge, prepare the DMA buffer as so: 05 15 06 16 0B 1B 0D 1D and connect the GPIO pin 4 as the clock. Leave a final byte in the buffer to reset the clock/bus to idle state, if you need.
You can of course extend the idea and incorporate control signals such as chip selects and tri-state signals for external buffers, if needed.
Also take note that not all DMA blocks may have access to the AHB bus which is holding the GPIO registers. For example on STM32F40x, only DMA2 can be used (this is what got me, until I read this answer https://stackoverflow.com/a/46619315/6552613).

I haven't fully explored this space yet, but, by disabling interrupts and polling for interrupt flags in my main loop, it's made the jitter on my GPIO DMA basically disappear! Granted it might just be the set of interrupts have enabled, but everything down to the systick timer was killing me. By polling the interrupts in the main loop it seems to have fixed my issue.
Note that this is on an STM32F042, and I never exceed 6 MHz for my period. When I try to, i.e. try to go to 8 MHz sampling out, everything falls apart. YMMV

Read non conventional ADC with STM32F3

I'm attempting to interface an STM32F303 Nucleo with an AD7748-4 ADC. Datasheet for the ADC:
https://www.analog.com/media/en/technical-documentation/data-sheets/ad7768-7768-4.pdf
The issue is, the ADC DOES NOT output the converted value through the SPI port, but rather employs a Data Ready Signal (DRDY), a Data Clock (DCLK), and a combination of 4 Data Outputs (DOUT0-DOUT3). The output streams 96 bits serially through one wire if I set it up that way, but timing is critical in my application and I need to clock the data in using DOUT0 to DOUT2, which would each output 32 bits. If I were serially streaming the data, I could trick the SPI port into reading it, but I'm not. The ADC is running at 20MHz, so DCLK will be operating at the same frequency. The Nucleo runs at a maximum of 72MHz, but when the DAM is utilized, it sets the clock to 64MHz.
In the STM manual, it describes a "GPIO port input data register (GPIOx_IDR) (x = A..H)" as being a read only register - my understanding is that the lower 16 bits can store an inputted value up to 16 bits (most likely for memory data R/W) - so the question is, how can I configure the GPIO to read in the data? I'm at a slight impass here. My instinct tells me that the Nucleo may not be fast enough to read the data coming from the ADC... Any ideas? All being written in C/C++ basically bare metal... I'm new to the Nucleo, haven't written code in 4 years - pardon any lapse in knowledge...

If DCLK works at 20Mhz, the uC is obviously not fast enough (you have about 3 instructions between each cycle, so even assembly language would be difficult to implement...). As I am not familiar with the stm architecture, I can only suggest a trick that will maybe spark some ideas in your head. Rather than using a crystal for the ADC, use a timer from the STM that is connected to an output pin, and clock the ADC using that pin (MCLK). When configuring the ADC using spi, idle mode, etc. you can leave this clock signal at 20Mhz. But when you need a sample from the ADC, stop the STM timer and clock the ADC "manually". (you practically control the DCLK signal). After your conversion routine is over, restart the timer at 20Mhz.

Raspberry: how does the PWM via DMA work?

I read that the driver for "Software PWM" is running somehow on the PWM-HW and acessing all GPIOs without using the CPU. Can someone explain how that works? Is there a second processor in the Raspberry Pi used for PWM and PCM module(is there a diagram for the blocks)?
The question is related to this excellent driver which I used a lot in my robots.
Here is the explanation, which I unfortunately don't understand...
The driver works by setting up a linked list of DMA control blocks with the
last one linked back to the first, so once initialised the DMA controller
cycles round continuously and the driver does not need to get involved except
when a pulse width needs to be changed. For a given period there are two DMA
control blocks; the first transfers a single word to the GPIO 'clear output'
register, while the second transfers some number of words to the PWM FIFO to
generate the required pulse width time. In addition, interspersed with these
control blocks is one for each configured servo which is used to set an output.
While the driver does use the PWM peripheral, it only uses it to pace the DMA
transfers, so as to generate accurate delays."
Is the following understanding right:
The DMA controller is like a second processor. You can run code on it. So it is used here to control all the Raspberry GPIO pins high/low states together with the PWM block. DMA Controller does this continously. There are probably more than one DMA controller in the Raspberry, so the speed of the OS Linux is not influenced much due to one missing DMA controller.
I don't understand how exactly DMA and PWM work together.

I recommend reading RPIO source code together with ServoBlaster's, as it's slightly simplified and can help understanding. Also very important: Broadcom's BCM2835 manual which contains all the tiny details.
is there a diagram for the blocks
The manual contains all the functionalities offered by the chip (not in a block diagram though, as far as I’ve seen).
Is the following understanding right:
The DMA controller is part of the main chip (Broadcom, although I think the same happens on desktop CPUs). It can't exactly run code, but it can copy memory across peripherals by itself, without consuming the main processor’s time. The DMA controller has different channels which can copy memory independently and runs independently of the CPU.
It is configurable via "control blocks" (BCM manual page 40, 4.2.1.1): you can tell the DMA controller to first copy memory from A to B, then from C to D and so on.
don't understand how exactly DMA and PWM work together
DMA is used to send data to the PWM controller ("Pulse Width Modulator", BCM manual page 138, chap. 9), which consumes the data and this creates a very precise delay. Interestingly, the PWM controller is... not used to generate any PWM pulse, but just to wait.
Can someone explain how that works?
Ultimately, you configure the value of the GPIO pins (or the settings of the PWM or PCM generator), by setting memory at a special address; the memory in that region represents the peripheral configuration (BCM manual page 89, chapter 6).
So the idea is: copy 1 onto the memory that controls the GPIO pin value, using the DMA controller; wait the pulse width; copy 0 onto the GPIO pin value; wait the remaining part of the period; loop. Since the DMA controller does it, it doesn't consume CPU cycles.
The key point here is being able to make the DMA controller "wait" an exact amount of time, and for this, RPIO and ServoBlaster use the PWM controller in FIFO mode (the PCM generator also has such functionality, but let's stick to PWM). This means that the PWM controller will "send" the data it reads from its so-called FIFO queue, and then stop. It doesn't matter how it's "sent" (BCM manual page 139, 9.4 MSENi=0), the key point is that it requires a fixed amount of time. As a matter of fact, it doesn't even matter which data is sent: the DMA controller is configured to write into the FIFO queue and then wait until the PWM controller has finished sending data, and this creates a very precise delay.
The resolution of the resulting pulse is given by the duration of the PWM transfer, which depends on the frequency at which the PWM controller is running.
Example
We have a maximum resolution of 1ms (given by the PWM delay), and we want to have a pulse of 25% duty cycle with frequency 125Hz. The period of a pulse is thus 8ms. The DMA operation performed will be
Set pin to 1 (DMA write to GPIO mem)
Wait 1ms (DMA write to PWM FIFO)
Wait 1ms (DMA write to PWM FIFO)
Set the pin to 0 (DMA write to GPIO mem)
Wait 1ms (DMA write to PWM FIFO)
...repeat "Wait 1ms" 4 more times.
Wait 1ms (DMA write to PWM FIFO) and jump back to 1.
This will thus require at least 10 DMA control blocks (8 wait instructions, given by period / delay plus 2 write operations).
Note: in ServoBlaster and RPIO, it will consume exactly 16 DMA control blocks, because (for higher precision), they always perform a "memory copy" operation before a "wait operation". The "memory copy" operation is just a dummy unless it needs to change the pin value.

Setting nss_soft in Master (SPI)

I want to set the NSS pin to software mode in master using Nucleo STM32F103RB.
In the reference manual, they say,
In NSS Software mode, set the SSM and SSI bits in the SPI_CR1 register. If the NSS pin is required in output mode, the SSOE bit only should be set.
Why do we need to set the SSI bit with SSM?
What is the purpose of the SSOE bit?

It's related to the rarely used multi-master communication.
In a multi-master setup, the NSS signal controls access to the SPI bus. The ST documentation is unfortunately a bit vague there, but my understanding is that
NSS high input means the bus is free, and you are allowed to transmit
NSS low input means someone else is transmitting, and you become a slave.
Why do we need to set SSI bit with SSM?
If the SSM (Software Slave Management) bit is set in master mode, then the SSI (Slave Select Internal) bit becomes the source of the NSS signal instead of the pin. Setting SSI to 1 allows the master to transmit, setting it to 0 makes it a slave (clears the MSTR bit in CR1).
If you have a single master, just set
SPI->CR1 = SPI_CR1_MSTR | SPI_CR1_SPE | SPI_CR1_SSM | SPI_CR1_SSI
and don't worry about the rest. It's the most flexible way, and you can control as many slaves as you like with GPIO outputs connected to the CS lines separately. You can use the NSS pin as GPIO as well.
What is the purpose of SSOE bit?
It changes the NSS pin to an output. Initially set to high, it becomes low when the controller starts transmitting (when the DR register is written to). Note that it won't automatically become high again when the transfer is finished, but by setting SPI_CR1_SPE to 0.
Using SSOE can be useful when a single master is talking to a single slave, because CS is controlled by the SPI registers. Not having to talk to a GPIO peripheral at all, there isn't any need to load its base address to a register and holding it there, saving some cycles and a couple of bytes in flash, making it possible to use a register for something else by an optimizing compiler.

Implementing a non-standard SPI variation on ARM Cortex M3

I need to create a driver for a flash memory chip connected to a STM32 Cortex M3 MCU. The chip is controlled via an SPI bus. I intended to use integrated SPI peripheral of the MCU, but unfortunately it only supports 8- or 16-bit data packets while the flash chip commands are 14 bit long. Thus, I have to implement the protocol from scratch using GPIOs. My question is: what is the right way to ensure correct timings of the signals? I currently think of inserting delays between asserting and deasserting GPIO lines with interrupts disabled, but it seems fairly unreliable to me. Are there any better methods?

Jeb's answer is the preferred method and you should use the hardware SPI if possible, and if DMA is an option that is nice as well.
If you for some reason find out that you cannot use the hardware SPI, but that you must implement it using "bit-banging" over GPIO, you should check what options there are available in the timer/PWM hardware on the MCU. You cannot and should not use blunt "hobbyist burn-away delays" as in the link you posted, the real-time performance will be crap and you will occupy the CPU 100%.
Most MCU timers come with a pin output feature, that would allow a pin to change state when the timer elapses. The pseudo code would then be:
Determine if the next bit to send is 1 or 0.
Set the MCU polarity register accordingly, so that it will switch the pin to a high or low level.
When the timer elapses, you need to set the polarity once again, likely through an interrupt. How to do this is very hardware-dependent.
At the same time as you bit-bang the data (MOSI), you also need to generate the clock and chip select. The clock can be generated in the same way as the data, or possibly through a PWM signal if that option is available. Chip select is the easiest part as you only need to pull a pin low during the data transmission.
Finally, there is most likely some application note or official example over how to write a software SPI for your particular MCU.

I would recommend to use the build in SPI and DMA if possible!
You could remapping your data into an array of bytes with a size of a multiple of 14bits.
So you have to send a multiple of 7*4Bits=28bytes each time.
Then you can use the standard SPI with 8Bit-size.
But this should be much faster with SPI/DMA than bit banging the GPIO's.

Some devices that use obscure data lengths are designed so that at the start of a transaction they will either ignore all "0" bits that are clocked in before the first "1", or all "1" bits that are clocked in before the first "0". If your device happens to be designed in such a fashion, you may be able to use 8- or 16-bit SPI mode by clocking out two "junk" bits along with the bits of interest.