Dual port RAM best practices? - c

I have a setup with an FPGA acting as a dual ported RAM shared between a PC and a micro controller. There are fpga semaphores that protect the ram from simultaneous access so I can avoid reading data in the middle of an update. So far, I’ve been using a byte buffer with a fixed order that I am reading into some structs to pass data in each direction, updated at 100 Hz. This has worked well.
I will be expanding the size of the ram window between the two processors, and would like to be able to pass large files between them. Is there a standard set of techniques for using dual ported ram this way?

If you have a FPGA implement a FIFO for each direction of communication between the two. This would mean file sizes and synchronization is no longer a hardware related problem. When your struct or file is packed have a DMA or Interrupt handler transfer it over and visa versa. This will make you code simpler and more reliable.
If there is high rate data that will be blocked by a large file transfer you will need a high and a low priority FIFO.

Related

Is it wrong to log inner working of an IoT device in case of failure?

I'm currently working on an IoT project and I want to log the execution of my software and hardware.
I want to log them then send them to some DB in case I need to have a look at my device remotely.
The wip IoT device will have to be as minimal as possible so the act of having to write very often inside a flash memory module seems weird to me.
I know that it will run the RTOS OS Nucleus on an Cortex-M4 with some modules connected through SPI.
Can someone with more expertise enlighten me ?
Thanks.
You will have to estimate your hourly/daily/whatever data volume that needs to go into the log and extrapolate to the expected lifetime of your product. Microcontroller flash usually isn't made for logging and thus it features neither enduring flash cells (some 10K-100K write cycles usually compared to 1M or more for dedicated data chips - look it up in the uC spec sheet) nor wear leveling. Wear leveling is any method which prevents software from writing to the same physical cell too frequently (which would e.g. be the directory for a simple file system).
For your log you will have to create a quite clever or complex method to circumvent any flash lifetime problems.
But the problems don't stop there: usually the MCU isn't able to read from Flash memory when writing to it where "writing" means a prolonged (several microseconds up to milliseconds depending on the chip) sequence of instructions controlling the internal Flash statemachine (programming voltage, saturation times, etc.) until the new values have reliably settled in the memory. And, maybe you guessed it, "reading" in this context also means reading instructions, that is you have to make sure that whichever code and interrupts that may occur during the Flash write are only executing code in RAM, cache or other memories and not in the normal instruction memory. It is doable but the more complex the SW system that you are running above the HW layer, the less likely it will work reliably.

How do I increase the speed of my USB cdc device?

I am upgrading the processor in an embedded system for work. This is all in C, with no OS. Part of that upgrade includes migrating the processor-PC communications interface from IEEE-488 to USB. I finally got the USB firmware written, and have been testing it. It was going great until I tried to push through lots of data only to discover my USB connection is slower than the old IEEE-488 connection. I have the USB device enumerating as a CDC device with a baud rate of 115200 bps, but it is clear that I am not even reaching that throughput, and I thought that number was a dummy value that is a holdover from RS232 days, but I might be wrong. I control every aspect of this from the front end on the PC to the firmware on the embedded system.
I am assuming my issue is how I write to the USB on the embedded system side. Right now my USB_Write function is run in free time, and is just a while loop that writes one char to the USB port until the write buffer is empty. Is there a more efficient way to do this?
One of my concerns that I have, is that in the old system we had a board in the system dedicated to communications. The CPU would just write data across a bus to this board, and it would handle communications, which means that the CPU didn't have to waste free time handling the actual communications, but could offload the communications to a "co processor" (not a CPU but functionally the same here). Even with this concern though I figured I should be getting faster speeds given that full speed USB is on the order of MB/s while IEEE-488 is on the order of kB/s.
In short is this more likely a fundamental system constraint or a software optimization issue?
I thought that number was a dummy value that is a holdover from RS232 days, but I might be wrong.
You are correct, the baud number is a dummy value. If you create a CDC/RS232 adapter you would use this to configure your RS232 hardware, in this case it means nothing.
Is there a more efficient way to do this?
Absolutely! You should be writing chunks of data the same size as your USB endpoint for maximum transfer speed. Depending on the device you are using your stream of single byte writes may be gathered into a single packet before sending but from my experience (and your results) this is unlikely.
Depending on your latency requirements you can stick in a circular buffer and only issue data from it to the USB_Write function when you have ENDPOINT_SZ number of byes. If this results in excessive latency or your interface is not always communicating you may want to implement Nagles algorithm.
One of my concerns that I have, is that in the old system we had a board in the system dedicated to communications.
The NXP part you mentioned in the comments is without a doubt fast enough to saturate a USB full speed connection.
In short is this more likely a fundamental system constraint or a software optimization issue?
I would consider this a software design issue rather than an optimisation one, but no, it is unlikely you are fundamentally stuck.
Do take care to figure out exactly what sort of USB connection you are using though, if you are using USB 1.1 you will be limited to 64KB/s, USB 2.0 full speed you will be limited to 512KB/s. If you require higher throughput you should migrate to using a separate bulk endpoint for the data transfer.
I would recommend reading through the USB made simple site to get a good overview of the various USB speeds and their capabilities.
One final issue, vendor CDC libraries are not always the best and implementations of the CDC standard can vary. You can theoretically get more data through a CDC endpoint by using larger endpoints, I have seen this bring host side drivers to their knees though - if you go this route create a custom driver using bulk endpoints.
Try testing your device on multiple systems, you may find you get quite different results between windows and linux. This will help to point the finger at the host end.
And finally, make sure you are doing big buffered reads on the host side, USB will stop transferring data once the host side buffers are full.

How do I write to an SD Card using SPI for the PSoC 5LP chip?

How do I write to an SD Card using SPI with DMA available for the PSoC 5LP (32-bit Cortex-M3) chip?
I currently have a DMA and SPI tx/rx pair working, but for a different purpose so if the actual transmission is not an issue, I just don't know how to interact with the SDcard.
The datasheet for the PSoC 5LP is here.
Basic Info:
I am using the DMA in simple mode and the DMA TD chain is setup for:
8 bit width, 4 Byte bursts
auto complete the full TD (only needs initial HW request)
Loop back to beginning of initial TD when done and wait for HW request
The SPI Master is initialized in a gui, I have it set using a 16Mhz clock, 8 bit tx/rx transfers with a 4 Byte tx/rx buffer. interrupts are set on rx FIFO full, connected to them is an rx DMA.
The pointers for the SDcard SPI rx/tx are SPIM_RX_PTR and SPIM_TX_PTR respectively. The DMA transfers to and from them. The Arrays that I am transferring from and to are SDcardout and SDcardin.
Having SPI communication will only get you the lowest command/block level access to the card; you will need a file system. SD cards come pre-formatted as FAT32, so a FAT file-system will provide the greatest comparability, is not the greatest reliability (corruption is likely if write is interrupted by power loss or reset for example). It also has the advantage of being relatively simple to implement and requires few resources.
There are several commercial and open-source FAT filesystems libraries available. I suggest that you look at ELM FatFs or ELM Petit FatFs both have permissive licences and are well documented. In each case you simply need to implement the disk I/O stubs to map them to your SPI driver. There are plenty of examples, documentation and application notes on the site to help you. You can start with an SPI SD implementation example for another target and adapt it to your driver (or adapt your driver perhaps). Other FAT filesystem libraries are broadly similar to this and require I/O layer implementation.
The diskio layer of ELM FatFs is not media specific, so you in fact need an additional MMC/SD layer between that and the SPI driver. It is unlikely that you will find an example for your specific target, but it is possible to work from examples for other targets since MMC/SD over SPI itself is not target specific, the hardware dependencies come only at the SPI level and the GPIO implementation for the card-detect and write-protect (optional) signals. There are several examples for various ARM targets here, a project for PSoC support here (apparently a work-in-progress at time of writing).
I have done work on exactly this problem.
I found that the existing SPI module provided with the PSoC 5 components library is not ideally suited to bulk transfers to / from an SD card. As far as I could tell, it was necessary to clear SPI module flags in software on each byte transfer, rendering DMA much less useful. I think one solution is to use two TDs (Transfer descriptors) - one to perform the data transfer and a second to clear the RX flag after the first TD has completed - anyway, that's off topic.
I also found that the emFile component supplied in the components library is limited in its capabilities. I couldn't see any way to attach DMA, and even if I could, its clock speed appeared to be very poor. On top of this, emFile requires compile-time selection of FAT16 or FAT32, limiting your design to one or another filesystem only.
As I didn't like the idea of a more complicated DMA setup, I decided to design my own SPI component hardware in the UDB editor. The project containing the component can be found at: https://github.com/PolyVinalDistillate/NSDSPI
This incorporates the excellent FatFS library mentioned above (thanks ChaN), which takes care of FAT12, FAT16 and FAT32 formatted cards. As stated, without the filesystem layer, you will only be accessing raw data blocks of 512 bytes each. With FatFS, you get analogues of fopen(), fclose(), etc.
If you look at my component in PSoC Creator, you'll see it's actually composed of 2 components: One is the specialised UDB component implementing the main SPI logic, the other is a schematic connecting my UDB component to DMA and some control logic. This second component also has the API files containing my hardware-specific code and is the component to drop into your TopDesign schematic.
FatFS is included as a precompiled library, and LowLevelFilesys.h in the API folder provides access to all the file functions.
This component was designed with bulk-reads in mind and the API does the following for read:
Sets up a DMA TD of the required data length and tells my SPI component how many bytes will be transferred.
Triggers the transfer, causing my SPI component to send 0xFF automatically (no need to write 0xFF to the SPI for every byte received), while copying each received byte into the receive buffer via DMA.
Writing the card is performed in a more typical fashion, with the DMA simply sending data to the SPI module after preparing the SD card for it.
If you run my project on your PSoC system, it will perform a read / write test on the SD card, depositing a file reporting the specs:
Testing Speed
Writing 16000 bytes to file, non-DMA
Took 94 ms
Rate 1361 kbps
Reading 16000 bytes to file, non-DMA
Took 50 ms
Verifying... All Good! :D
Rate 2560 kbps
Writing 16000 bytes to file, DMA
Took 17 ms
Rate 7529 kbps
Reading 16000 bytes to file, DMA
Took 12 ms
Verifying... All Good! :D
Rate 10666 kbps
Some SD cards give better results, some give worse. I believe this is down to the SD card itself (e.g. class, usage, age of tech, etc).

C application for parallel communication with direct memory access

I'm having a problem with a parallel connection I've got to establish using DMA (Direct Acces Memory).
I've got to write some characters to a parallel port with a given address, through a C application. I know that for a PIO access, there are the _inp/_outp functions, but I don't know how to manage a direct memory access parallel communication.
Does anyone know how I should do or has any good links (I couldn't find any even after long research on the Web
This is not something that can be answered generically.
DMA access is determined by either a DMA controller (in OLD PC's), or using "bus mastering" (PCI onwards). Either of these solutions requires access to the relevant hardware manuals for the device that you are working with (and the DMA controller, if applicable).
In general, the principle works as this:
Reserve a piece of memory (DMA buffer) for the device to store data in.
Configure the device to store the data in said region (remember that in nearly all cases, DMA happens on physical addresses, as opposed to the virtual addresses that Windows or Linux uses).
When the device has stored the requested data, an interrupt is fired, the software responsible for the device takes the interrupt and signals some higher level software that the data is ready, and (perhaps) reprograms the device to start storing data again (either after copying the DMA buffer to someplace else, or assigning a new DMA buffer).

How expensive is opening a HANDLE?

I am currently writing a time-sensitive application, and it got me thinking: How expensive is opening/closing a handle (in my case a COM port) compared to reading/writing from the handle?
I know the relative cost of other operations (like dynamic allocation vs. stack allocation), but I haven't found anything in my travels about this.
There isn't an unique answer, specially in case of devices. In general, the "open" operation (CreateFile) involves more work by the device driver. Device drivers are inclined to do the most work they can do at the initialization/opening in order to optimize subsequent read/write operations. Moreover, many devices may require a long setup. E.g. the "classic" serial driver takes much time to initialize the baudrate prescaler and the handshake signals. Instead, when the device is open and ready, the read and write operations are usually quite fast. But this is just a hint, it depends on the particular driver you are using (traditional COM? USB converter? The drivers are very different). I recommend you an investigation by a profiler.

Resources