STM32F303 DMA throughput

It may be an easy question, but I am struggling to find the answer: what is the theoretical throughput of a DMA controller?
There are no conflicts between the core and the DMA, as all data (except the data for the DMA transfers) is located in the CCMRAM.
Only one controller is transferring at a time.
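As a rough back-of-envelope upper bound (not a datasheet figure), peak DMA throughput is limited by the AHB clock, the data width and how many bus cycles each data item costs. The sketch below assumes HCLK = 72 MHz (the STM32F303 maximum) and a hypothetical cost of 2 AHB cycles per item (one read plus one write, no arbitration overhead); the real per-item cost should be checked against the STM32F3 reference manual and measured on the target.

```c
#include <stdint.h>

/* Upper-bound estimate of DMA throughput in bytes per second.
 * hclk_hz:         AHB clock frequency (assumed 72 MHz on STM32F303).
 * width_bytes:     size of one data item (1, 2 or 4 bytes).
 * cycles_per_item: ASSUMED AHB cycles per item (read + write + any
 *                  arbitration); this is a placeholder, not a datasheet value. */
uint64_t dma_peak_bytes_per_s(uint64_t hclk_hz, uint32_t width_bytes,
                              uint32_t cycles_per_item)
{
    return hclk_hz * width_bytes / cycles_per_item;
}
```

Under that assumption, 32-bit items give 144 MB/s and byte-wide items only 36 MB/s, so in this model the transfer width matters as much as the clock.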

Related

STM32 - How to choose between DMA or Interrupt for peripheral R/W in HAL library

I am using STM32F3 microcontrollers and the HAL library. For many peripherals (e.g. ADC, SPI, I2C), the HAL library provides 3 ways to read/write data: polling mode, interrupt mode, and DMA mode. I know I don't want the polling mode because it's blocking. However, I am unsure how to choose between interrupt and DMA mode. Is there a general rule of thumb? I feel like DMA mode should always be better because it can write values into memory without CPU intervention?
The advantage of DMA is that it does not require CPU intervention. DMA transfers can run while the CPU is busy doing other things, or while it is idle.
Some disadvantages of DMA are that:
Most microcontrollers have a limited number of DMA channels, so it may not be possible to use DMA for all peripherals.
The overhead of setting up and executing a DMA transfer may negate its benefits when many small transfers are required, e.g. when receiving individual characters over a USART.
Unusual interactions with devices (like bidirectional data transfers with some SPI devices) are often not supported with DMA.
DMA transfers place heavier (and less predictable) loads on the microcontroller's bus matrix, making them a frequent source of errata.
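The set-up overhead point can be made concrete with a simple cost model. The cycle counts used here are hypothetical placeholders, not measured STM32F3/HAL figures: interrupt mode costs one ISR per byte, while DMA mode costs a fixed channel set-up plus one transfer-complete interrupt. The function returns the smallest transfer length at which DMA uses fewer CPU cycles.

```c
#include <stdint.h>

/* CPU cycles spent moving n bytes in interrupt mode: one ISR per byte. */
uint32_t irq_mode_cycles(uint32_t n, uint32_t isr_cycles)
{
    return n * isr_cycles;
}

/* CPU cycles spent in DMA mode: configure the channel once,
 * then service a single transfer-complete interrupt. */
uint32_t dma_mode_cycles(uint32_t setup_cycles, uint32_t complete_isr_cycles)
{
    return setup_cycles + complete_isr_cycles;
}

/* Smallest n for which DMA costs the CPU strictly fewer cycles:
 * n * isr > dma  <=>  n >= dma / isr + 1 (integer division). */
uint32_t dma_break_even_bytes(uint32_t isr_cycles, uint32_t setup_cycles,
                              uint32_t complete_isr_cycles)
{
    uint32_t dma = dma_mode_cycles(setup_cycles, complete_isr_cycles);
    return dma / isr_cycles + 1u;
}
```

With made-up numbers of 100 cycles per ISR, 200 for set-up and 100 for completion, DMA only pays off from 4 bytes upward; for single characters it is pure overhead.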
Generally speaking, I'd advise against using DMA for I2C. The protocol typically only runs at 100 - 200 kHz, so using interrupts will not place an especially heavy load on the microcontroller.
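To put the I2C load in numbers: each data byte occupies 9 SCL periods (8 data bits plus the ACK bit), so the byte interrupt rate is the SCL rate divided by 9. This sketch ignores START/STOP and address bytes, and the 72 MHz CPU clock used in the example is an assumption.

```c
#include <stdint.h>

/* Each I2C data byte takes 9 SCL periods: 8 data bits plus the ACK bit.
 * START/STOP conditions and address bytes are ignored in this estimate. */
#define I2C_CLOCKS_PER_BYTE 9u

/* Bytes per second, i.e. byte interrupts per second, at a given SCL rate. */
uint32_t i2c_bytes_per_s(uint32_t scl_hz)
{
    return scl_hz / I2C_CLOCKS_PER_BYTE;
}

/* CPU cycles available between two consecutive byte interrupts. */
uint32_t cpu_cycles_per_i2c_byte(uint32_t cpu_hz, uint32_t scl_hz)
{
    return (uint32_t)((uint64_t)cpu_hz * I2C_CLOCKS_PER_BYTE / scl_hz);
}
```

At 100 kHz that is about 11,111 interrupts per second, or 6,480 CPU cycles between interrupts at 72 MHz; even at 200 kHz there are still 3,240 cycles per byte.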

Dual port RAM best practices?

I have a setup with an FPGA acting as a dual-ported RAM shared between a PC and a microcontroller. There are FPGA semaphores that protect the RAM from simultaneous access, so I can avoid reading data in the middle of an update. So far, I’ve been using a byte buffer with a fixed order that I read into some structs to pass data in each direction, updated at 100 Hz. This has worked well.
I will be expanding the size of the RAM window between the two processors, and would like to be able to pass large files between them. Is there a standard set of techniques for using dual-ported RAM this way?
If you have an FPGA, implement a FIFO for each direction of communication between the two. This means file sizes and synchronization are no longer a hardware-related problem. When your struct or file is packed, have a DMA or interrupt handler transfer it over, and vice versa. This will make your code simpler and more reliable.
If there is high-rate data that would be blocked by a large file transfer, you will need a high-priority and a low-priority FIFO.
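The FIFOs suggested here would live in the FPGA. If one were instead built in software over the shared RAM window, the usual pattern is a single-producer/single-consumer ring buffer: only the writer advances `head`, only the reader advances `tail`, so the data itself needs no lock. The following is a hypothetical plain-C model (names, the 256-byte size and byte-sized elements are illustrative) including the high/low priority split; on real shared memory the indices would also need `volatile`/atomic accesses and memory barriers suited to both processors.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 256u  /* must be a power of two */

/* Single-producer/single-consumer ring buffer.
 * head and tail are free-running counters; unsigned wrap-around keeps
 * (head - tail) equal to the number of bytes currently stored. */
typedef struct {
    uint32_t head;             /* written only by the producer */
    uint32_t tail;             /* written only by the consumer */
    uint8_t  data[FIFO_SIZE];
} fifo_t;

bool fifo_put(fifo_t *f, uint8_t byte)
{
    if (f->head - f->tail == FIFO_SIZE)
        return false;                          /* full */
    f->data[f->head & (FIFO_SIZE - 1u)] = byte;
    f->head++;   /* on real shared RAM, a barrier belongs before this update */
    return true;
}

bool fifo_get(fifo_t *f, uint8_t *byte)
{
    if (f->head == f->tail)
        return false;                          /* empty */
    *byte = f->data[f->tail & (FIFO_SIZE - 1u)];
    f->tail++;
    return true;
}

/* Receiver side of the two-priority scheme: always drain the
 * high-priority FIFO first so bulk file data cannot starve it. */
bool fifo_get_prioritized(fifo_t *high, fifo_t *low, uint8_t *byte)
{
    return fifo_get(high, byte) || fifo_get(low, byte);
}
```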

How do I write to an SD Card using SPI for the PSoC 5LP chip?

How do I write to an SD Card using SPI with DMA available for the PSoC 5LP (32-bit Cortex-M3) chip?
I currently have a DMA and SPI TX/RX pair working, but for a different purpose, so the actual transmission is not the issue; I just don't know how to interact with the SD card.
The datasheet for the PSoC 5LP is here.
Basic Info:
I am using the DMA in simple mode and the DMA TD chain is set up for:
8-bit width, 4-byte bursts
auto complete the full TD (only needs initial HW request)
Loop back to beginning of initial TD when done and wait for HW request
The SPI Master is initialized in the GUI; I have it set to a 16 MHz clock, 8-bit TX/RX transfers with a 4-byte TX/RX buffer. Interrupts are set on RX FIFO full, and an RX DMA is connected to them.
The pointers for the SD card SPI RX/TX are SPIM_RX_PTR and SPIM_TX_PTR respectively. The DMA transfers to and from them. The arrays that I am transferring from and to are SDcardout and SDcardin.
Having SPI communication will only get you the lowest command/block-level access to the card; you will need a file system. SD cards typically come pre-formatted as FAT32, so a FAT filesystem will provide the greatest compatibility, though not the greatest reliability (corruption is likely if a write is interrupted by power loss or reset, for example). It also has the advantage of being relatively simple to implement and requiring few resources.
There are several commercial and open-source FAT filesystem libraries available. I suggest that you look at ELM FatFs or ELM Petit FatFs; both have permissive licences and are well documented. In each case you simply need to implement the disk I/O stubs to map them to your SPI driver. There are plenty of examples, documentation and application notes on the site to help you. You can start with an SPI SD implementation example for another target and adapt it to your driver (or perhaps adapt your driver). Other FAT filesystem libraries are broadly similar and also require an I/O layer implementation.
The diskio layer of ELM FatFs is not media specific, so you in fact need an additional MMC/SD layer between that and the SPI driver. It is unlikely that you will find an example for your specific target, but it is possible to work from examples for other targets, since MMC/SD over SPI itself is not target specific; the hardware dependencies come only at the SPI level and in the GPIO implementation for the (optional) card-detect and write-protect signals. There are several examples for various ARM targets here, and a project for PSoC support here (apparently a work in progress at the time of writing).
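At that MMC/SD-over-SPI layer, every command is a 6-byte frame: a start byte `0x40 | index`, a 32-bit big-endian argument, and a final byte holding a CRC7 (polynomial x^7 + x^3 + 1) shifted left with an end bit of 1. In SPI mode the card ignores the CRC by default except for CMD0 and CMD8, but always computing it is harmless. A minimal sketch of a frame builder (the function names are hypothetical; clocking the bytes out through SPIM_TX_PTR or the TX DMA is left to the existing driver):

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 used by SD commands: polynomial x^7 + x^3 + 1 (0x09), MSB first. */
uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            /* Feedback = incoming data bit XOR bit shifted out of the register. */
            if ((d & 0x80u) ^ (crc & 0x80u))
                crc ^= 0x09u;
            d <<= 1;
        }
    }
    return crc & 0x7Fu;
}

/* Build the 6-byte SPI-mode command frame for command index cmd. */
void sd_build_cmd(uint8_t frame[6], uint8_t cmd, uint32_t arg)
{
    frame[0] = (uint8_t)(0x40u | (cmd & 0x3Fu));  /* start bit 0, transmission bit 1 */
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 1u);  /* CRC7 + end bit */
}
```

CMD0 with argument 0 produces the familiar `40 00 00 00 00 95` frame, and CMD8 with argument 0x1AA produces `48 00 00 01 AA 87`.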
I have done work on exactly this problem.
I found that the existing SPI module provided with the PSoC 5 components library is not ideally suited to bulk transfers to / from an SD card. As far as I could tell, it was necessary to clear SPI module flags in software on each byte transfer, rendering DMA much less useful. I think one solution is to use two TDs (Transfer descriptors) - one to perform the data transfer and a second to clear the RX flag after the first TD has completed - anyway, that's off topic.
I also found that the emFile component supplied in the components library is limited in its capabilities. I couldn't see any way to attach DMA, and even if I could, its clock speed appeared to be very poor. On top of this, emFile requires compile-time selection of FAT16 or FAT32, limiting your design to one or another filesystem only.
As I didn't like the idea of a more complicated DMA setup, I decided to design my own SPI component hardware in the UDB editor. The project containing the component can be found at: https://github.com/PolyVinalDistillate/NSDSPI
This incorporates the excellent FatFS library mentioned above (thanks ChaN), which takes care of FAT12, FAT16 and FAT32 formatted cards. As stated, without the filesystem layer, you will only be accessing raw data blocks of 512 bytes each. With FatFS, you get analogues of fopen(), fclose(), etc.
If you look at my component in PSoC Creator, you'll see it's actually composed of 2 components: One is the specialised UDB component implementing the main SPI logic, the other is a schematic connecting my UDB component to DMA and some control logic. This second component also has the API files containing my hardware-specific code and is the component to drop into your TopDesign schematic.
FatFS is included as a precompiled library, and LowLevelFilesys.h in the API folder provides access to all the file functions.
This component was designed with bulk-reads in mind and the API does the following for read:
Sets up a DMA TD of the required data length and tells my SPI component how many bytes will be transferred.
Triggers the transfer, causing my SPI component to send 0xFF automatically (no need to write 0xFF to the SPI for every byte received), while copying each received byte into the receive buffer via DMA.
Writing the card is performed in a more typical fashion, with the DMA simply sending data to the SPI module after preparing the SD card for it.
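For reference, on a single-block read (CMD17) the card returns 0xFF filler bytes until a 0xFE start token, then 512 data bytes and a 16-bit CRC (CRC-16-CCITT, polynomial 0x1021, initial value 0). The sketch below is hypothetical post-processing, not code from the NSDSPI component: it assumes the bytes have already been captured into a buffer (e.g. by the RX DMA while 0xFF is clocked out) and then extracts and checks the block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SD_BLOCK_LEN   512u
#define SD_START_TOKEN 0xFEu

/* CRC-16-CCITT as used for SD data blocks: poly 0x1021, init 0, MSB first. */
uint16_t sd_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Skip 0xFF filler, expect the start token, copy the 512-byte block to out
 * and verify its CRC. Returns false if no complete, valid block is found. */
bool sd_extract_block(const uint8_t *rx, size_t rx_len, uint8_t out[SD_BLOCK_LEN])
{
    size_t i = 0;
    while (i < rx_len && rx[i] == 0xFFu)
        i++;
    if (i == rx_len || rx[i] != SD_START_TOKEN)
        return false;                       /* timeout or error token */
    i++;
    if (rx_len - i < SD_BLOCK_LEN + 2u)
        return false;                       /* block truncated */
    memcpy(out, &rx[i], SD_BLOCK_LEN);
    uint16_t rx_crc = (uint16_t)((rx[i + SD_BLOCK_LEN] << 8) | rx[i + SD_BLOCK_LEN + 1u]);
    return sd_crc16(out, SD_BLOCK_LEN) == rx_crc;
}
```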
If you run my project on your PSoC system, it will perform a read / write test on the SD card, depositing a file reporting the specs:
Testing Speed
Writing 16000 bytes to file, non-DMA
Took 94 ms
Rate 1361 kbps
Reading 16000 bytes to file, non-DMA
Took 50 ms
Verifying... All Good! :D
Rate 2560 kbps
Writing 16000 bytes to file, DMA
Took 17 ms
Rate 7529 kbps
Reading 16000 bytes to file, DMA
Took 12 ms
Verifying... All Good! :D
Rate 10666 kbps
Some SD cards give better results, some give worse. I believe this is down to the SD card itself (e.g. class, usage, age of tech, etc).
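The "Rate" figures in the report are consistent with kilobits per second computed as bytes × 8 / milliseconds, truncated to an integer (16000 × 8 / 94 ≈ 1361). That formula is inferred from the numbers, not taken from the project's source. A small helper (hypothetical name) that reproduces them, if you want to log the same metric from your own tests:

```c
#include <stdint.h>

/* Throughput in kbit/s (1 kbit = 1000 bits): bits per millisecond,
 * truncated toward zero. */
uint32_t rate_kbps(uint32_t bytes, uint32_t elapsed_ms)
{
    return (uint32_t)((uint64_t)bytes * 8u / elapsed_ms);
}
```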

Clock and Bus how they have been connected

I am learning about these hardware clocks and Bus communication.
As I understand it, if two processors (say, an ARM and a DSP) are to communicate/transfer data over a bus, they need a clock for synchronous access.
In that case, will there be a single clock for both master and slave, or can there be an individual clock for each master and slave, both running at the same frequency?
I am specifically interested in AMBA AHB/AXI. Can somebody help me understand this correctly, or point me to more resources?
This is the wrong forum for this.
First off, in general you do not necessarily need a shared clock; it depends on the interface (Ethernet or UARTs, for example). The clock can be recovered from the data, and/or both sides agree on the same nominal clock and deal with the drift between their oscillators (UART).
For AMBA/AXI, that is all within the same silicon: a chip vendor that chooses to have an ARM and a DSP builds them on the same die and manages the clocks. That bus does have clocks, absolutely. The problem is solved by basic design: if you have to interface between two buses on the same chip, you ... interface between two buses on the same chip.
If you are crossing chips, then you are not using AMBA/AXI...

DMA controller features and terminology

I'm reading the technical reference manual of a DMA controller, but I don't understand many things; it seems that I'm missing some points. I have no practical experience dealing directly with DMA, but I want to understand at least the theory.
In the document it is mentioned:
The DMA controller contains an instruction processing block that enables it to process program code that controls a DMA transfer.
So what's the purpose of that instruction processing block? Who loads instructions into it? I mean, if I write a driver for my device, then in order to transfer a big chunk of data from/to my device, should I load instructions into the DMA to do that? (Is it the same bunch of instructions that I would feed the main processor with if there were no DMA?)
The DMAC also contains an ARM AMBA and AXI master interface unit to fetch the program code from system memory into instruction cache. The DMA instruction execution engine executes the program code from its instruction cache and schedules read/write AXI instructions through the respective instruction queue.
I know what AMBA and AXI are, but still: is it the same as before? What is the program code, and what is its purpose? And if the DMA controller itself has the previous "instruction processing block", then what is the "instruction cache"? What exactly is the "instruction execution engine"? Is it like the CPU of the DMA?
These are the main things; I assume that after understanding them, other things will be clearer as well.
I'd appreciate any good answers and references on the subject, since I didn't find much.
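The quoted wording (an AXI master fetching "program code" into an instruction cache, run by an instruction execution engine) matches programmable DMA controllers such as ARM's PL330 DMAC. There, the CPU writes a short channel program into memory and the DMAC fetches and runs it; the program is not the code the CPU would run for a copy, but a few DMA-specific instructions (PL330's set includes DMAMOV, DMALP, DMALD, DMAST, DMALPEND and DMAEND). The following is a deliberately simplified C model of that idea, not the PL330 encoding: the opcodes, struct layout and `run_channel` are hypothetical, and "memory" is just a word array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified DMA instruction set loosely named after PL330 mnemonics. */
typedef enum { OP_MOV_SAR, OP_MOV_DAR, OP_LP, OP_LD, OP_ST, OP_LPEND, OP_END } dma_op_t;

typedef struct {
    dma_op_t op;
    uint32_t imm;   /* word address for MOV, iteration count (>= 1) for LP */
} dma_insn_t;

/* Execute one channel program against mem (addressed in 32-bit words).
 * Returns the number of words written to the destination. */
size_t run_channel(const dma_insn_t *prog, uint32_t *mem)
{
    uint32_t sar = 0, dar = 0;     /* source / destination address registers */
    uint32_t data = 0;             /* stand-in for the channel's data FIFO */
    uint32_t loop_count = 0;
    size_t loop_start = 0, pc = 0, moved = 0;

    for (;;) {
        const dma_insn_t *in = &prog[pc++];   /* "fetch from instruction cache" */
        switch (in->op) {
        case OP_MOV_SAR: sar = in->imm; break;
        case OP_MOV_DAR: dar = in->imm; break;
        case OP_LP:      loop_count = in->imm; loop_start = pc; break;
        case OP_LD:      data = mem[sar++]; break;            /* bus read */
        case OP_ST:      mem[dar++] = data; moved++; break;   /* bus write */
        case OP_LPEND:   if (--loop_count != 0) pc = loop_start; break;
        case OP_END:     return moved;
        }
    }
}
```

In this model the driver's only job is to build `prog` and start the channel; the DMAC then steps through the instructions by itself, which is roughly the role the quoted "instruction cache" and "execution engine" play in hardware.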

Resources