DMA controller features and terminology

DMA controller features and terminology - arm

I'm reading some technical reference of a DMA controller but I don't understand many things, it seems that I'm missing some points, I have no practical experience dealing directly with DMA but want to understand at least the theory.
In the document it is mentioned:
The DMA controller contains an instruction processing block that
enables it to process program code that controls a DMA transfer.
So what's the purpose of that instruction processing block? who load instructions into it? I mean, if I write a driver to my device, then in order to transfer a big chunk of data from/to my device I should load the instructions to the DMA to do that? (is it the same bunch of instructions that I would feed the main processor with if there was no DMA?
The DMAC also contains an ARM AMBA and AXI master interface unit to
fetch the program code from system memory into instruction cache. The DMA instruction execution engine executes the program code from its instruction cache and schedules read/write AXI instructions through the respective instruction queue.
I know what is AMBA and AXI but still. Is it the same as before? what is the program code, what its purpose? and if the DMA controller itself has the previous "instruction processing block" then what is the "instruction cache"? What is exactly "instruction execution engine"? is it like the cpu of the DMA?
These are the main things, I assume that after understanding them, other things would be more clear as well.
I'll appreciate any good answer and reference about the field since I didn't find much.

Related

Is it wrong to log inner working of an IoT device in case of failure?

I'm currently working on an IoT project and I want to log the execution of my software and hardware.
I want to log them then send them to some DB in case I need to have a look at my device remotely.
The wip IoT device will have to be as minimal as possible so the act of having to write very often inside a flash memory module seems weird to me.
I know that it will run the RTOS OS Nucleus on an Cortex-M4 with some modules connected through SPI.
Can someone with more expertise enlighten me ?
Thanks.

You will have to estimate your hourly/daily/whatever data volume that needs to go into the log and extrapolate to the expected lifetime of your product. Microcontroller flash usually isn't made for logging and thus it features neither enduring flash cells (some 10K-100K write cycles usually compared to 1M or more for dedicated data chips - look it up in the uC spec sheet) nor wear leveling. Wear leveling is any method which prevents software from writing to the same physical cell too frequently (which would e.g. be the directory for a simple file system).
For your log you will have to create a quite clever or complex method to circumvent any flash lifetime problems.
But the problems don't stop there: usually the MCU isn't able to read from Flash memory when writing to it where "writing" means a prolonged (several microseconds up to milliseconds depending on the chip) sequence of instructions controlling the internal Flash statemachine (programming voltage, saturation times, etc.) until the new values have reliably settled in the memory. And, maybe you guessed it, "reading" in this context also means reading instructions, that is you have to make sure that whichever code and interrupts that may occur during the Flash write are only executing code in RAM, cache or other memories and not in the normal instruction memory. It is doable but the more complex the SW system that you are running above the HW layer, the less likely it will work reliably.

ARM Linux reboot process [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
How reboot procedure works on ARM SOCs running Linux, e.g do boot loaders reinitialize DDR memory? can anybody please explain me rebooting process in detail.

How reboot procedure works on ARM SOCs running Linux, ... ?
The typical ARM processor in use today is integrated with peripherals on a single IC called a SoC, system on a chip. Typically the reboot procedure is nearly identical to a power-on boot procedure. On a reset the ARM processor typically jumps to address 0.
Main memory, e.g. DRAM, and non-volatile storage, e.g. NAND flash, are typically external to the SoC (that is Linux capable) for maximum design flexibility.
But typically there is a small (perhaps 128KB) embedded ROM (read-only memory) to initialize the minimal system components (e.g. clocks, external memories) to begin bootstrap operations. A processor reset will cause execution of this boot ROM. (This ROM is truly read-only, and cannot be modified. The code is masked into the silicon during chip fabrication.)
The SoC may have a strapping option to instead execute an external boot memory, such as NOR flash or EEPROM, which can be directly executed (i.e. XIP, execute in place).
The salient characteristic of any ROM, flash, and SRAM that the first-stage boot program uses is that these memories must be accessible immediately after a reset.
One of the problems of bootstrapping a system that uses DRAM for main memory is its hardware initialization. The DRAM memory controller has to be initialized with board-specific parameters before code can be loaded into DRAM and executed. So from where does this board-specific initialization code execute, since it can't be in main memory?
Each vendor has their own solution.
Some require memory configuration data to be stored in nonvolatile memory for the boot ROM to access.
Some SoCs have integrated SRAM (which does not require initialization like DRAM) to execute a small second-stage bootstrap program.
Some SoCs use NOR flash to hold a XIP (execute in place) bootstrap program (e.g. the SPL program of U-Boot).
Each SoC vendor has its own bootstrap method to get the OS loaded and executing.
Some use hardware strapping read through GPIO pins to determine the source of the next stage of the bootstrap sequence.
Another vendor may use an ordered list of memories and devices to probe for a bootstrap program.
Another technique, is to branch to firmware in NOR flash, which can be directly executed (i.e. XIP, execute in place).
Once the bootstrap program has initialized the DRAM, then this main memory can be used to load the next stage of booting. That could be a sophisticated boot utility such as U-Boot, or (if the bootstrap program is capable) the Linux kernel. A ROM boot program could do everything to load an ARM Linux kernel (e.g. ETRAX), but more common is that there will be several bootstrap programs or stages that have be performed between processor reset to execution of the OS.
The requirements of booting the Linux ARM kernel are spelled out in the following document: Booting ARM Linux
Older versions of Linux ARM used the ATAGs list to pass basic configuration information to the kernel. Modern versions provide a complete board configuration using a compiled binary of a Device Tree.
... e.g do boot loaders reinitialize DDR memory?
Of the few examples that I have seen, the boot programs unconditionally configure the dynamic RAM controller.
PCs have a BIOS and Power On Self Tests, aka POST. The execution of POST is the primary difference between a power-on reset (aka cold boot) versus a software reset (aka warm boot or reboot). ARM systems typically do not perform POST, so you typically will see minimal to no difference between types of reset.

This is way too broad. It's not only SoC vendor dependent, but also hardware and software dependent.
However, the most typical setup is:
CPU executes first-stage bootloader (FSB).
FSB is located on the chip itself in ROM or EEPROM and is very small (AT91RM9200 FSB is 10kB max, AFAIR). FSB then initializes minimum set of peripherals (clocks, RAM, flash), transfers second-stage bootloader (U-Boot) to RAM, and executes it.
U-Boot starts.
U-Boot initializes some other hardware (serial, ethernet, etc), transfers Linux kernel to RAM, prepares the pointer to kernel input parameters and jumps into it's entry point.
Linux kernel starts.
Magic happens here. The system now able to serve you cookies via SSH console and/or executes whatever needs to be executed.
A bit more in-depth info about warm start:
Warm start is a software reset, while cold start is power-on or hardware reset. Some (most?) SoC's are able to pass the info to FSB/SSB about warm start. This way bootloaders are able to minimize the overall boot time by skipping re-initializion of already initialized peripherals.
Again, this is most typical setup from my 15+ years experience in embedded world.

It varies a lot depending on the SoC. I'll describe something like a "typical" one (Freescale iMX6)...
Typically an on-chip Watchdog Timer is used to reset the SoC cleanly. Sometimes, an external Power Management IC can be provoked to perform a board-wide reset (this method may be better, as it avoids the risk of external chips getting "stuck" in an unexpected state, but not all board designs support it).
Upon reset, the SoC will start its normal boot process: checking option pins, fuse settings and initializing clocks and the boot device (e.g. eMMC). This is typically controlled by CPU code executing from a small on-chip ROM.
Either the internal boot ROM will initialize DDR SDRAM (using settings taken from fuses or read from a file on the boot device), or the bootloader gets loaded into internal RAM then it takes care of DDR initialization (and other things). The U-Boot bootloader can be configured to work either way.
Finally, the kernel and DTB are loaded into memory and started.

note uboot, etc are not required they are GROSS overkill, they are operating systems in their own right. to load and run linux you need memory up and running copy the kernel branch to it with some registers set to point at tables that you setup or copied from flash along with the kernel.
What you do on a cold reset or warm is up to you, same chip and board no reason necessarily why any two solutions have to do the exact same thing unless it is driven by hardware (if you do a wdt reset to start over and that reset wipes out the whole chip including the ddr controller). You just have to put the system in the same state that linux expects.

DMA access to DWT registers

I'm wrapping my head around the following question:
is it possible to access the registers of the DWT unit in Cortex-M devices with a DMA transfer?
My intention is to get readings of the DWT_CPICNT register without executing instructions on the core.

From the general description of the Core's debug subsystems elsewhere in the TRM (emphasis mine):
All the debug components exist on the internal Private Peripheral Bus (PPB) and can be accessed using privileged code.
A look at the topology in the block diagram also makes it pretty clear that these are internal to the debug layer wrapped around the core, and it's only the core itself and the external debug port which have any access.
As #LPs points out, even it it were an external block there's still no guarantee that it would be a valid DMA target, as that would further depend on the DMA controller and the interconnects within the SoC - only the manual for that particular SoC can tell you what the DMA has access to.

Can I run a program from a SD memory instead of flash on an evaluation board (embedded programming)?

I have an evaluation board (Olimex STM32-P103) which supports a SD-card connector. I want to put my program in to a SD memory instead of internal flash of the micro-controler; and run it from there.
I don't know if it is possible to do that according to boot-loader issue!
P.S my goal is running linux on this board and then port my application over it.

To run programs from SD-Card in general you should know that you can't run them "right away". This means, you have to load it in a executable memory somewhere in your address space which is done by a (more or less) simple bootloader. In the simplest instance, the bootloader is capable to read from a SD-Card a specific binary and copy it into the memory.
That being said you should think about this considering you only got 20k of RAM and 128k of Flash on your board. So where should your program go? Or better: Why not flashing the program in the 128k of Flash from the very beginning? Especially you should know that Linux is a bit "hungry" in terms of memory.
If your goal is to run a "normal" Linux on this board, I'm afraid you're screwed. This because from what I know Linux needs a MMU to run and the chip on this board does not provide one (as far as researchable without access to datasheets from ST).
If you're lucky you can go with uCLinux. I'm not sure if a finished port exists for the STM32 but it seems there are some resources based on a short google search for "STM32 uCLinux". But even if you manage to run uCLinux I'm afraid there's not much left in your system for your application, so the result might be a bit disappointing.
Depending on why you are looking for Linux running on this MCU, there are maybe other solutions like a FreeRTOS in combination with a lwIP-stack (if networking is needed) or a FAT library like FullFAT if you are looking for reading SD-Cards and stuff.
Edit: One thing i'd like to add is that booting from the SD-Card is typically something you do with "bigger" (not much but slightly) systems where you have enough RAM to keep the whole image you'd like to run in it and still have some space left for the data you want to process.

You're going to have to have some code in the STM's onboard flash (typically called a "boot loader") that implements this since the "bare metal" very likely can't boot from SD card.
You're going to have to build that code, which figures out how to use the STM's onboard peripherals to talk to the SD card, finds the file you want to run in the file system (which you also have to implement), and loads it.
I wanted to include a link to the STM standard peripheral library, but it seems to be down (being moved). :/

The data on the SD card is not memory mapped, so cannot be executed directly.
It is possible to dynamically load the data from the card into RAM for execution. WindRiver's VxWorks RTOS supports loading and linking object modules dynamically, I know of no other OS that would scale to a Cortex-M that directly supports that but it would be possible to write your own.
However, I would suggest that in the case of the microcontroller you are using the idea is ill-advised; optimal performance on Cortex-M is achieved when the code is in on-chip flash and data in RAM allowing the data and instruction to be fetch to occur simultaneously on the separate buses (Harvard architecture). If you execute the code from RAM the performance will be severely hit since then data and instructions must be fetched sequentially over the same bus.
The board is entirely unsuited to running Linux, with only 128K Bytes Program Flash, and 20K Bytes RAM is is not at all feasible. Even the smallest Linux distribution requires 600Kb RAM plus whatever is needed by application code. uClinux can just about run on higher-end STM32 with external RAM and Flash, but that would suffer from the same bus contention performance hit and Linux without an MMU is rather missing the one major benefit of using Linux at all. The part on your board lacks an external memory interface, so cannot be expanded to support Linux.
If you need an OS consider a RTOS such as uC/OS-II, FreeRTOS, or emBOS for example.

AS other says you cannot directly execute your code directly from the SD CARD.
But like those "linux board", you can load the stored kernel/programm into an external SDRAM that can be mapped and execute it from there.
You'll still need to write that "bootloader" and store it in the internal flash.
That'is a lot work to my opinion, for limited application.
If you want to write your application in a linux environnement then port it suck small target, I would rather design my application using dependency injection, or even use an emulator.

Any open-source ARM7 emulators suitable for linking with C?

I have an open-source Atari 2600 emulator (Z26), and I'd like to add support for cartridges containing an embedded ARM processor (NXP 21xx family). The idea would be to simulate the 6507 until it tries to read or write a byte of memory (which it will do every 841ns). If the 6507 performs a write, put the address and data on some of the ARM's I/O ports and let the ARM code run 20 cycles, confirm that the ARM is floating its data bus, and let the ARM run for another 38 cycles. If the 6507 performs a read, put the address on the ARM's I/O ports, let the ARM run 38 cycles, grab the data from the ARM's I/O port (hopefully the ARM software will have put it there), and let the ARM run another 20 cycles.
The ARM7 seems pretty straightforward to implement; I don't need to simulate a whole lot of hardware features. Any thoughts?
Edit
What I have in mind would be a routine that would take as a parameter a struct holding the machine state and pointers to a memory access routine. When called, the routine would emulate the ARM's instruction engine, generating appropriate reads, writes, and code fetches. I could then write the memory access routine to regard appropriate areas as flash (with roughly-approximated wait states), RAM, I/O ports, and timer registers. Some other areas would be marked as don't-care, and accesses to any other areas would flag an error and stop the emulator.
Perhaps QEMU uses such a thing internally. Since the ARM emulation would be integrated into an already-existing emulation engine (which I didn't write and don't fully understand--the only parts of Z26 I've patched have been the memory read/write logic) I would need something with a fairly small footprint.
Any idea how QEMU works inside? Any idea what the GPL licence would require if I just use 2% of the code in QEMU--whether I'd have to bundle the code for the whole thing, or just the part that I use, or what?

Try QEMU.

With some work, you can make my emulator do what you want. It was written for ARM920, and the Thumb instruction set isn't done yet. Neither is the MMU/cache interface. Also, it's slow because it is an interpreter. On the bright side, it's all written in C99.
http://code.google.com/p/gp2xemu/
I haven't worked on it for a while (The svn trunk is 2 years old), but if you're going to use the code, I'll be glad to help you out with the missing features. It is licensed under MIT, so it's just the same as the broad BSD license.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight