Implementing a non-standard SPI variation on ARM Cortex M3 - c

I need to create a driver for a flash memory chip connected to an STM32 Cortex M3 MCU. The chip is controlled via an SPI bus. I intended to use the integrated SPI peripheral of the MCU, but unfortunately it only supports 8- or 16-bit data packets while the flash chip commands are 14 bits long. Thus, I have to implement the protocol from scratch using GPIOs. My question is: what is the right way to ensure correct signal timings? My current idea is to insert delays between asserting and deasserting the GPIO lines, with interrupts disabled, but that seems fairly unreliable to me. Are there any better methods?

Jeb's answer describes the preferred method: use the hardware SPI if at all possible, and use DMA as well if it is an option.
If you for some reason find that you cannot use the hardware SPI and must implement the protocol by "bit-banging" GPIOs, you should check what options are available in the timer/PWM hardware of the MCU. You cannot and should not use blunt "hobbyist burn-away delays" as in the link you posted: the real-time performance will be crap and you will occupy the CPU 100%.
Most MCU timers come with a pin-output (output-compare) feature that allows a pin to change state when the timer elapses. The pseudo code would then be:
Determine if the next bit to send is 1 or 0.
Set the MCU polarity register accordingly, so that it will switch the pin to a high or low level.
When the timer elapses, you need to set the polarity once again, likely through an interrupt. How to do this is very hardware-dependent.
At the same time as you bit-bang the data (MOSI), you also need to generate the clock and chip select. The clock can be generated in the same way as the data, or possibly through a PWM signal if that option is available. Chip select is the easiest part as you only need to pull a pin low during the data transmission.
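For illustration, here is a minimal sketch in C of the timer-driven approach, simplified so that the pin updates are done from the timer ISR rather than through the output-compare hardware. It assumes the timer fires at twice the desired SCK rate, and gpio_write_mosi/gpio_write_sck/gpio_write_cs are placeholder helpers for whatever pin-access method you use (BSRR writes, HAL calls, etc.):
#include <stdint.h>
#include <stdbool.h>

#define FRAME_BITS 14

extern void gpio_write_mosi(int level);   /* placeholder pin helpers - adapt to your code */
extern void gpio_write_sck(int level);
extern void gpio_write_cs(int level);

static volatile uint16_t tx_frame;        /* bits to send, MSB first            */
static volatile uint8_t  bits_left;       /* bits remaining in the frame        */
static volatile bool     sck_high;        /* current state of the clock line    */

void soft_spi_send(uint16_t frame)        /* start one 14-bit transfer          */
{
    tx_frame  = (uint16_t)(frame << (16 - FRAME_BITS)); /* left-align, MSB first */
    bits_left = FRAME_BITS;
    sck_high  = false;
    gpio_write_cs(0);                     /* assert chip select (active low)    */
    /* start the timer here */
}

void soft_spi_timer_isr(void)             /* hook this to the timer interrupt, 2x SCK rate */
{
    /* clear the timer interrupt flag here */
    if (!sck_high) {
        gpio_write_mosi((tx_frame & 0x8000u) != 0);  /* put the next bit on MOSI */
        gpio_write_sck(1);                /* rising edge: slave samples MOSI     */
        sck_high = true;
    } else {
        gpio_write_sck(0);                /* falling edge                        */
        tx_frame <<= 1;                   /* move on to the next bit             */
        sck_high = false;
        if (--bits_left == 0) {
            gpio_write_cs(1);             /* done: release CS, stop the timer here */
        }
    }
}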
Finally, there is most likely some application note or official example over how to write a software SPI for your particular MCU.

I would recommend using the built-in SPI and DMA if possible!
You could repack your data into an array of bytes whose length corresponds to a multiple of 14 bits.
Four 14-bit commands are 56 bits, so you would send a multiple of 7 bytes each time.
Then you can use the standard SPI peripheral with an 8-bit frame size.
This should be much faster with SPI/DMA than bit-banging the GPIOs.
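As a concrete sketch of the repacking (the MSB-first bit order is an assumption - adjust it to whatever the flash chip expects), four 14-bit commands can be packed into seven bytes and handed to the 8-bit SPI peripheral or the DMA controller:
#include <stdint.h>

/* Pack four 14-bit commands (56 bits) into seven bytes, MSB first. */
void pack_14bit_frames(const uint16_t cmd[4], uint8_t out[7])
{
    uint32_t acc   = 0;   /* bit accumulator                              */
    int      nbits = 0;   /* number of bits currently in the accumulator  */
    int      o     = 0;

    for (int i = 0; i < 4; i++) {
        acc = (acc << 14) | (cmd[i] & 0x3FFFu);   /* append the next 14 bits */
        nbits += 14;
        while (nbits >= 8) {                      /* flush whole bytes       */
            nbits -= 8;
            out[o++] = (uint8_t)(acc >> nbits);
        }
    }
}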

Some devices that use obscure data lengths are designed so that at the start of a transaction they will either ignore all "0" bits that are clocked in before the first "1", or all "1" bits that are clocked in before the first "0". If your device happens to be designed in such a fashion, you may be able to use 8- or 16-bit SPI mode by clocking out two "junk" bits along with the bits of interest.
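If your chip does tolerate leading junk bits, the padding itself is trivial; a sketch assuming the device ignores leading '0' bits:
#include <stdint.h>

/* Pad a 14-bit command to one 16-bit SPI frame with two leading '0' junk bits. */
static inline uint16_t pad_cmd_to_16bit(uint16_t cmd14)
{
    return cmd14 & 0x3FFFu;   /* bits 15..14 stay 0, the command sits in bits 13..0 */
}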

Related

Implementing an SSI slave interface on STM32 Board

I am trying to implement an SSI slave protocol on an STM32 board. Since the STM32 boards don't have an SSI interface, I used the SPI interface in slave (transmit-only) mode. The SSI master sends 24 clock signals and the slave reacts by sending its data (3 bytes) over the MISO pin. The problem I am facing is that the data is shifted to the left on every transmission from the master. For example, assume I am constantly sending 0x010101 from the slave.
At first transmission the master receives 0x010101
At Second transmission the master receives 0x020202
At third transmission the master receives 0x040404
Can someone please give me some hints on how to solve this problem?
The data-shift with each transmission can happen when the SPI slave recognizes an (unexpected) additional clock pulse. Looking at the SSI protocol description on Wikipedia this actually makes sense:
In order to transmit N bits of data the master emits N clock cycles, followed by another clock pulse to signal the end of the transfer (so-called "Monoflop Time" - referring to the original hardware implementation of the SSI interface). Since the SPI protocol / SPI slave does not know about this additional clock pulse, it begins to output the first bit of the next data byte, which is in turn not recognized by the SSI master. As a result this leads to a shift in the data bits recognized by the SSI master on the next SSI frame.
Unfortunately, it is not easy to handle the Monoflop time correctly with the SPI slave. In order to deal with the additional clock pulse, we could try to set the SPI frame size to 25 bits on the slave side. Since the STM32 hardware only supports SPI frame sizes between 4 bit and 16 bit, the only choice is to set it to 5 bit. This is not very convenient, since we need to convert the 3 byte (24 bit) output data into 5 blocks of 5 bit (24 bit output data + 1 bit dummy data), but it should work for a "normal" transfer.
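As an illustration of that conversion, here is a small sketch that splits one 24-bit value into five 5-bit frames, MSB first, with a trailing dummy bit (the bit ordering is an assumption):
#include <stdint.h>

/* 24 data bits + 1 dummy bit = 25 bits = five 5-bit SPI frames. */
void split_24bit_to_5bit_frames(uint32_t value24, uint8_t frames[5])
{
    uint32_t v = (value24 & 0xFFFFFFu) << 1;                 /* append the dummy bit    */
    for (int i = 0; i < 5; i++) {
        frames[i] = (uint8_t)((v >> (20 - 5 * i)) & 0x1Fu);  /* MSB-first 5-bit chunks  */
    }
}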
Things get more complicated though, if we also want to handle the cases "multiple transmissions" and "interrupted transmission" correctly. We need to monitor the clock signal to be able to detect the monoflop timeout. This can be done using an STM32 hardware timer with an external trigger. When the timer expires, we need to reset the SPI unit (in order to handle an interrupted transmission) and update the output value. This "simple" task can be quite challenging since it takes a number of instructions - requiring a fast MCU depending on the SSI clock frequency.
Alternatively the SSI protocol can be implemented using a software-only "bit banging" solution. But this requires a fast MCU as well in order to handle a fast SSI clock correctly.
IMHO the best solution is to use a small (inexpensive) FPGA to implement the SSI slave and let the MCU feed it with data over a traditional SPI interface.

Which MCU(Cortex-M) for time critical GPIO application?

We have an application which runs on a PIC24H and we would like to port it to another MCU, preferably an ARM Cortex. The application is extremely time-critical, meaning that we need extremely deterministic code behaviour. In short, pulses are delivered via special hardware to GPIO pins and the data is analyzed right away. Processing of the data is not complex (we don't need a beefy CPU/MCU to do it). After analyzing the data, the GPIO output pins are written with their values.
App in 3 short lines:
process input pins
determine pattern within processing of input pins
based on the received pattern write output pins
The PIC24H is working at 40 MHz and we can toggle a pin in 25 ns; we would be glad to have at least 2x that speed for future upgrades. So an MCU which can run deterministic code and toggle pins at 80 MHz (12.5 ns) would be just fine. We don't need to toggle the pins at a constant fast rate, we need an MCU which can toggle a pin in less than 25 ns. We can't waste cycles while toggling: if one cycle is off we lose synchronization. Everything must be done with one-cycle precision (or two, but a constant two cycles), so the code must be 100% deterministic.
Please let me know if I'm missing something or if what we need can be done using some other method on a Cortex-M. Just keep in mind that if one cycle is lost (due to cache or similar) we lose signal sync and the app will not do its work right, or at all.
Thanks!
Br
According to this blog post, the interrupt latency for Cortex-M ranges from 12 to 16 cycles (assuming you are not using FPU registers) with best-case memories. M0 and M0+ are slower than M3/M4/M7. On top of this, you need to add the GPIO access times (and watch out for different clock frequencies between the core and the peripherals). Cortex-M7 will support higher clock speeds than M3/M4.
It still isn't clear how many cycles are consumed in recognising a pattern, and how an interrupt is useful in doing this - generally a low latency interface function like this would be an obvious target for dedicated hardware, but since you have an existing software solution it seems the problem is mis-specified.
Providing you avoid accessing any 'slow' peripherals which might stall the bus, the interrupt latency should be deterministic - any specific device should have documentation which covers this.
NXP have an application note which describes some of the detail of how to measure what is going on.
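For reference, the GPIO side of the latency can be kept to a single bus write on STM32-style parts by using the bit set/reset register; a minimal sketch (device header, port, pin and EXTI line are arbitrary choices):
#include "stm32f4xx.h"              /* device header - adjust to your part */

void EXTI0_IRQHandler(void)
{
    GPIOA->BSRR = (1u << 5);        /* set PA5 with one write, no read-modify-write */
    EXTI->PR = (1u << 0);           /* clear the pending flag for EXTI line 0       */
}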

Read non conventional ADC with STM32F3

I'm attempting to interface an STM32F303 Nucleo with an AD7768-4 ADC. Datasheet for the ADC:
https://www.analog.com/media/en/technical-documentation/data-sheets/ad7768-7768-4.pdf
The issue is, the ADC DOES NOT output the converted value through the SPI port, but rather employs a Data Ready signal (DRDY), a data clock (DCLK), and a combination of 4 data outputs (DOUT0-DOUT3). The output streams 96 bits serially through one wire if I set it up that way, but timing is critical in my application and I need to clock the data in using DOUT0 to DOUT2, which would each output 32 bits. If I were serially streaming the data, I could trick the SPI port into reading it, but I'm not. The ADC is running at 20 MHz, so DCLK will be operating at the same frequency. The Nucleo runs at a maximum of 72 MHz, but when the DMA is utilized, it sets the clock to 64 MHz.
In the STM manual, it describes a "GPIO port input data register (GPIOx_IDR) (x = A..H)" as being a read-only register - my understanding is that the lower 16 bits can store an inputted value up to 16 bits (most likely for memory data R/W) - so the question is, how can I configure the GPIO to read in the data? I'm at a slight impasse here. My instinct tells me that the Nucleo may not be fast enough to read the data coming from the ADC... Any ideas? All being written in C/C++, basically bare metal... I'm new to the Nucleo, haven't written code in 4 years - pardon any lapse in knowledge...
If DCLK works at 20 MHz, the µC is obviously not fast enough (you have about 3 instructions between each cycle, so even assembly language would be difficult)... As I am not familiar with the STM32 architecture, I can only suggest a trick that will maybe spark some ideas in your head. Rather than using a crystal for the ADC, use a timer on the STM32 that is connected to an output pin, and clock the ADC using that pin (MCLK). When configuring the ADC over SPI, in idle mode, etc., you can leave this clock signal at 20 MHz. But when you need a sample from the ADC, stop the STM32 timer and clock the ADC "manually" (you practically control the DCLK signal). After your conversion routine is over, restart the timer at 20 MHz.
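A rough sketch of the manual-clocking idea, assuming the ADC clock pin is driven from a GPIO under software control and DOUT0..DOUT2 are wired to three adjacent bits of one port - all pin assignments here are made up for illustration:
#include <stdint.h>
#include "stm32f3xx.h"                      /* device header - adjust to your board   */

#define CLK_PORT   GPIOA                    /* assumed: ADC clock driven from PA8     */
#define CLK_PIN    8u
#define DOUT_PORT  GPIOB                    /* assumed: DOUT0..DOUT2 on PB0..PB2      */
#define DOUT_MASK  0x7u

/* Clock out one 96-bit conversion result: 32 clock cycles, 3 data lanes in parallel. */
void ad7768_read_frame(uint32_t word[3])
{
    word[0] = word[1] = word[2] = 0;
    for (int i = 0; i < 32; i++) {
        CLK_PORT->BSRR = (1u << CLK_PIN);          /* clock high                      */
        uint32_t in = DOUT_PORT->IDR & DOUT_MASK;  /* sample DOUT0..DOUT2 from IDR    */
        CLK_PORT->BSRR = (1u << (CLK_PIN + 16));   /* clock low                       */
        word[0] = (word[0] << 1) | ( in       & 1u);
        word[1] = (word[1] << 1) | ((in >> 1) & 1u);
        word[2] = (word[2] << 1) | ((in >> 2) & 1u);
    }
}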

How does the LPC1788FBD144 chip configure the ADC function to collect two signals simultaneously?

I need to configure the ADC sampling feature of the LPC1788FBD144 chip, and I need to be able to read two signals simultaneously. However, there is only one ADC in the chip, so how can I sample two signals? From the chip manual, the chip has one ADC port with 8 channels. In software mode, only one channel can be sampled at a time; in hardware scan mode, whichever of the 8 channel bits are set to 1, the sampled values of those channels can be read. I suspect I need to configure hardware scan mode to sample both signals simultaneously.
My question is:
1. The LPC1788FBD144 chip has only one ADC port; how can it sample two signals simultaneously?
2. The first 8 bits of the LPC1788FBD144's AD control register select the input channels. In software mode, only one of them can be set to 1; in hardware scan mode, any combination of those 8 bits can be written. I need to sample two signals, which requires two channels, so two channels must be enabled in hardware scan mode. So what exactly is hardware scan mode, and how do I start it?
The LPC1788FBD144 chip has only one ADC port; how can it sample two signals simultaneously?
You can't read them at exactly the same time. Microcontroller SAR ADCs work by connecting one pin at a time to the actual ADC. How fast it can do this depends on the sample rate and ADC clock. According to the product brief of that part, it has a conversion rate of up to 400 kHz, meaning you'll get at best a 2.5 µs delay between samples. Check the manual for details.
This is usually good enough for the vast majority of applications. If you have tighter real-time requirements than that, you should probably be using a DSP instead of some general-purpose microcontroller.
You could of course get an MCU with two ADCs, or use an external ADC. But I kind of doubt that your real-time specification "read at exactly the same time" makes sense. What is the purpose of the ADC read?
As for how to use your specific ADC, I don't know, but typically you'd set it up for "continuous conversion", where it keeps cycling through the channels you have enabled and writes the results to their corresponding data registers.
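As a rough sketch of what hardware scan (burst) mode looks like on this family, based on the AD0CR layout in the LPC17xx/LPC178x user manuals - the register and field names follow the NXP CMSIS header for this family, and every bit position should be checked against the LPC1788 manual before use:
#include <stdint.h>
#include "LPC177x_8x.h"                 /* CMSIS device header - names may differ      */

/* Enable burst (hardware scan) mode on ADC channels 0 and 1. */
void adc_burst_init(void)
{
    LPC_SC->PCONP |= (1u << 12);        /* PCADC: power up the ADC block               */

    LPC_ADC->CR = (0x3u << 0)           /* SEL: enable channels 0 and 1                */
                | (7u   << 8)           /* CLKDIV: example divider, keep ADC clock in spec */
                | (1u   << 16)          /* BURST: hardware scan mode                   */
                | (1u   << 21);         /* PDN: ADC operational                        */
}

/* Read the latest result for channel 0 or 1 (12-bit value in bits 15:4 of DR). */
uint16_t adc_read_channel(int ch)
{
    uint32_t dr = LPC_ADC->DR[ch];
    return (uint16_t)((dr >> 4) & 0xFFFu);
}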

Can I disable Interrupts on a BBB for a short duration (0.5ms)?

I am trying to write a small driver program on a Beaglebone Black that needs to send a signal with timings like this:
I need to send 360 bits of information. I'm wondering if I can turn off all interrupts on the board for a duration of 500µs while I send the signal. I have no idea if I can just turn off all the interrupts like that. Searches have been unkind to me so far. Any ideas how I might achieve this? I do have some prototypes in assembly language for the signal, but I'm pretty sure it's being broken by interrupts.
So for example, I'm hoping I could have something like this:
disable_irq();
/* asm code to send my bytes */
reenable_irq();
What would the bodies of disable_irq() and reenable_irq() look like?
The calls you would want to use are local_irq_disable() and local_irq_enable() to disable & enable IRQs locally on the current CPU. This also has the effect of disabling all preemption on the CPU.
Now let's talk about your general approach. If I understand you correctly, you'd like to bit-bang your protocol over a GPIO with timing accurate to less than 1/3 µs.
This will be a challenge. Tests show that the BeagleBone Black GPIO toggle frequency maxes out at ~2.78 MHz when writing directly to the SoC IO registers in kernel mode (~0.18 µs minimum pulse width).
So, although this might be achievable by the thinnest of margins by writing atomic code in kernel space, I propose another concept:
Implement your custom serial protocol on the SPI bus.
Why?
The SPI bus can be clocked up to 48 MHz on the BeagleBone Black, it's buffered, and it can be used with the DMA engine. Therefore, you don't have to worry about disabling interrupts and monopolizing your CPU for this one interface. With a timing resolution of ~0.021 µs (at 48 MHz), you should be able to meet your timing needs with an acceptable margin of error.
With the bus configured for Single Channel Continuous Transfer Transmit-Only Master mode and a 30-bit word length (two 30-bit words for each bit of your protocol):
To write a '0' with your protocol, you'd write the 2-word sequence - 17 '1's followed by 43 '0's - on SPI (at 48 MHz).
To write a '1' with your protocol, you'd write the 2-word sequence - 43 '1's followed by 17 '0's - on SPI (at 48 MHz).
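For illustration, the two 30-bit patterns could be precomputed like this (values assume MSB-first transmission; check how your SPI driver packs sub-32-bit words):
#include <stdint.h>

/* '0' protocol bit: 17 ones then 43 zeros, as two 30-bit SPI words. */
static const uint32_t ZERO_PATTERN[2] = { 0x3FFFE000u, 0x00000000u };
/* '1' protocol bit: 43 ones then 17 zeros, as two 30-bit SPI words. */
static const uint32_t ONE_PATTERN[2]  = { 0x3FFFFFFFu, 0x3FFE0000u };

/* Append the two-word pattern for one protocol bit into the SPI/DMA buffer. */
void encode_protocol_bit(uint32_t *buf, int bit_index, int value)
{
    const uint32_t *p = value ? ONE_PATTERN : ZERO_PATTERN;
    buf[2 * bit_index]     = p[0];
    buf[2 * bit_index + 1] = p[1];
}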
From your signal timings it's easy to figure out that SPI or another serial peripheral cannot meet your requirements. In your timings, the encoding is based on the width of the pulse. So let's get to the point:
Q1 Could you turn off all interrupts for a duration of 500µs?
A: 0.5 ms is quite a long time in an embedded system. Interrupts exist to enable concurrency between tasks and to improve real-time capability. You should keep in mind that ISRs and context switches (on some chip architectures) are all influenced by the global interrupt enable.
But if your top priority is to meet the timings, and the real-time windows of the other tasks can tolerate it, of course you can disable the global interrupt for that duration, or even longer. If not, don't hold an atomic section for such a long time.
Q2 How?
A: For any given chip, there are undoubtedly assembly instructions for enabling/disabling the global interrupt. Find those instructions, or the APIs provided by your OS, and do the 3 steps below (pseudocode):
state_t tState = get_interrupt_status( );
disable_interrupt( );
... /*your operation here*/
resume_interrupt( tState );
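On the BeagleBone Black specifically, if this code runs in kernel context (a driver or module), the three steps above map onto the kernel's local_irq_save()/local_irq_restore() pair mentioned earlier; a minimal sketch:
#include <linux/irqflags.h>

void send_signal_atomically(void)
{
    unsigned long flags;

    local_irq_save(flags);       /* remember the IRQ state and disable IRQs on this CPU */
    /* time-critical bit-banging of the 360 bits goes here */
    local_irq_restore(flags);    /* restore the previous IRQ state */
}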
