Entering sleep mode on ARM Cortex-M4 - arm

I'm trying to put a Cortex-M4 processor to sleep for a little less than a second. I want to be able to tell it to sleep, then pick up right where I left off a second later, or when a button is pressed. I've looked in the reference manual and VLPS mode looks like it would fit my needs, but I don't know how to begin to enter that mode or how to program the NVIC.
More Info:
I am doing this in C, on the bare metal.

You can download and inspect the code that implements this demo. Although the demo is for an RTOS, the code used to place the CPU into a sleep mode is the same whether an RTOS is being used or the application is running on bare metal.
There are generic things you can do to place a Cortex-M3 core into a low power state (see the WFI instruction). To get extremely low power you have to do chip-specific things as well. The above linked code performs some chip-specific pre-sleep processing (turning off peripherals, setting the chip's own sleep mode, etc.) before calling WFI, then does some chip-specific things when it returns from the WFI instruction.
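As a minimal sketch of that generic sequence (assuming CMSIS core headers; prepare_chip_for_low_power() and restore_after_wakeup() are hypothetical stand-ins for the chip-specific steps, not a real API):

```c
#include "device.h"   /* placeholder: your vendor's CMSIS device header */

extern void prepare_chip_for_low_power(void);  /* hypothetical: gate unused peripherals, etc. */
extern void restore_after_wakeup(void);        /* hypothetical: re-enable clocks, peripherals */

void enter_low_power(void)
{
    prepare_chip_for_low_power();

    /* Request deep sleep; vendor modes such as VLPS also need chip-specific
     * power-controller setup before this point. */
    SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;

    __DSB();
    __WFI();                     /* sleep until an enabled interrupt fires */

    restore_after_wakeup();
}
```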

You don't need an RTOS in order to wake a Cortex-M4 from sleep; what you need is an interrupt (ISR). You should refer to the manufacturer's manual: you may wake up with a timer (ISR) or a button (GPIO), depending on the sleep/hibernation modes of your particular chip. Here is an ARM document that covers it in more depth:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/BABGGICD.html
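To make that concrete, here is a rough sketch of the wake-on-interrupt pattern (assuming CMSIS intrinsics; configure_lptmr_1s() and configure_button_irq() are hypothetical placeholders for your chip-specific timer and GPIO setup, and the handler names must match your vector table):

```c
#include <stdbool.h>
#include "device.h"   /* placeholder: your vendor's CMSIS device header */

extern void configure_lptmr_1s(void);   /* hypothetical: arm a ~1 s low-power timer interrupt */
extern void configure_button_irq(void); /* hypothetical: enable the button's GPIO interrupt */

static volatile bool woke_up;

void Timer_IRQHandler(void)    /* fires roughly one second after sleep entry */
{
    /* clear the timer interrupt flag here (chip specific) */
    woke_up = true;
}

void Button_IRQHandler(void)   /* fires when the button is pressed */
{
    /* clear the GPIO interrupt flag here (chip specific) */
    woke_up = true;
}

void sleep_until_timer_or_button(void)
{
    woke_up = false;
    configure_lptmr_1s();
    configure_button_irq();

    while (!woke_up)
        __WFI();               /* core sleeps; execution resumes right after the ISR returns */
}
```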

Related

ARM Cortex M3 - Add a new interrupt to the end of the vector table?

I am doing some bare metal C development on an ARM Cortex M3 SoC, and I wanted to check and see if it is possible to add a new user-defined interrupt handler to the NVIC. I am adding my own IRQ with the plan of triggering it via software, either via NVIC_SetPendingIRQ() or via the NVIC->STIR register. Neither seems to work.
I have added my interrupt vector name to the end of the vector list in the CMSIS startup assembler file, and added the corresponding enum to the system header, but while debugging, executing the function call NVIC_EnableIRQ() doesn't correctly update the NVIC->ISER (Interrupt Set Enable Register). So I guess the question is, can you even add your own interrupt? There are 256 total interrupts that can be used in the ARM Cortex-M3, and I just followed how the others were added, so I figured it wouldn't be an issue.
Thank you.
The datasheet for your SoC should say how many interrupts are supported by the NVIC. While 240 is the maximum possible number for a Cortex-M3 device in general, the actual number on your chip is defined by the implementation and it makes sense to have that number be as small as possible to reduce costs.
In general, there is no way to add interrupts in software, but you might be able to use the SVCall interrupt, which is designed to be triggered by software. Or you could find some other interrupt you aren't using in your system, and which is not being activated by hardware, and try to use that for your purposes.
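As a rough sketch of that last suggestion, assuming CMSIS device headers and that IRQ number 30 happens to be implemented by your NVIC but unused by any peripheral (the number is purely illustrative; check your datasheet):

```c
#include "device.h"   /* placeholder: your vendor's CMSIS device header */

#define SPARE_IRQn   ((IRQn_Type)30)     /* assumption: an implemented but unused IRQ */

void Spare_IRQHandler(void)              /* name must match the entry in your vector table */
{
    /* handle the software-triggered event */
}

void trigger_soft_irq(void)
{
    NVIC_EnableIRQ(SPARE_IRQn);
    NVIC_SetPendingIRQ(SPARE_IRQn);      /* or write the IRQ number to NVIC->STIR */
}
```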
References:
Nested Vectored Interrupt Controller in the Cortex-M3 Devices Generic User Guide
The SVC instruction invokes the SVCall handler with an 8-bit service number available to the handler, which can be used to dispatch to a handler from a look-up table (essentially a secondary vector table for software interrupts).
An example of that can be found at https://developer.arm.com/documentation/ka004005/latest, except there it uses a switch rather than a look-up table - to the same effect.
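A rough C sketch of that dispatch (assuming a GCC toolchain and a CMSIS-style SVC_Handler entry in the vector table; the service functions are placeholders):

```c
#include <stdint.h>

typedef void (*svc_fn_t)(void);

static void svc_service_0(void) { /* ... */ }
static void svc_service_1(void) { /* ... */ }

static const svc_fn_t svc_table[] = { svc_service_0, svc_service_1 };

void SVC_Handler(void)
{
    uint32_t *frame;

    /* Select the stack frame that was in use when the SVC was taken.
     * (Production code usually does this in a small assembly shim so the
     * compiler prologue cannot disturb LR first.) */
    __asm volatile ("tst lr, #4      \n"
                    "ite eq          \n"
                    "mrseq %0, msp   \n"
                    "mrsne %0, psp   \n"
                    : "=r" (frame));

    /* frame[6] is the stacked return address; the SVC immediate sits in the
     * two bytes just before it in the instruction stream. */
    uint8_t svc_number = ((const uint8_t *)frame[6])[-2];

    if (svc_number < sizeof svc_table / sizeof svc_table[0])
        svc_table[svc_number]();
}

/* Triggered from thread code with, for example: __asm volatile ("svc #1"); */
```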

Timer supply to CPU in QEMU

I am trying to emulate the clock control for an STM32 machine with a Cortex-M4 CPU. The STM32 reference manual says the clock supplied to the core is HCLK:
The RCC feeds the external clock of the Cortex System Timer (SysTick) with the AHB clock (HCLK) divided by 8. The SysTick can work either with this clock or with the Cortex clock (HCLK), configurable in the SysTick control and status register.
Now, the Cortex-M4 is already emulated by QEMU and I am using that for the STM32 emulation. My confusion is: should the "HCLK" I have developed for the STM32 supply clock pulses to the Cortex-M4, or does the Cortex-M4 manage its own clock with the HCLK frequency of 168 MHz? Or is the clock frequency different?
If I have to pass this frequency to the Cortex-M4, how do I do that?
QEMU's emulation does not generally try to emulate actual clock lines which send pulses at megahertz rates (this would be incredibly inefficient). Instead when the guest programs a timer device the model of the timer device sets up an internal QEMU timer to fire after the appropriate duration (and the handler for that then raises the interrupt line or does whatever is necessary for emulating the hardware behaviour). The duration is calculated from the values the guest has written to the device registers together with a value for what the clock frequency should be.
QEMU doesn't have any infrastructure for handling things like programmable clock dividers or a "clock tree" that routes clock signals around the SoC (one could be added, but nobody has got around to it yet). Instead timer devices are usually either written with a hard-coded frequency, or may be written to have a QOM property that allows the frequency to be set by the board or SoC model code that creates them.
In particular for the SysTick device in the Cortex-M models the current implementation will program the QEMU timer it uses with durations corresponding to a frequency of:
1MHz, if the guest has set the CLKSOURCE bit to 1 (processor clock)
something which the board model has configured via the 'system_clock_scale' global variable (eg 25MHz for the mps2 boards), if the guest has set CLKSOURCE to 0 (external reference clock)
(The system_clock_scale global should be set to NANOSECONDS_PER_SECOND / clk_frq_in_hz.)
The 1MHz is just a silly hardcoded value that nobody has yet bothered to improve upon, because we haven't run into guest code that cares yet. The system_clock_scale global is clunky but works.
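For example, in a board model written against this pre-clock-tree API, the setting is just an assignment in the board/SoC init code (a sketch, assuming a 25 MHz reference clock like the mps2 boards use):

```c
/* Tell the SysTick model the external reference clock frequency
 * (older QEMU, before the docs/devel/clocks.rst infrastructure). */
system_clock_scale = NANOSECONDS_PER_SECOND / 25000000;   /* 25 MHz */
```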
None of this affects the speed of the emulated QEMU CPU (ie how many instructions it executes in a given time period). By default QEMU CPUs will run "as fast as possible". You can use the -icount option to specify that you want the CPU to run at a particular rate relative to real time, which sort of implicitly sets the 'cpu frequency', but this will only sort of roughly set an average -- some instructions will run much faster than others, in a not very predictable way. In general QEMU's philosophy is "run guest code as fast as we can", and we don't make any attempt at anything approaching cycle-accurate or otherwise tightly timed emulation.
Update as of 2020: QEMU now has some API and infrastructure for modelling clock trees, which is documented in docs/devel/clocks.rst in the source tree. This is basically a formalized version of the concepts described above, to make it easier for one device to tell another "my clock rate is 20MHz now" without hacks like the "system_clock_scale" global variable or ad-hoc QOM properties.
SysTick is supplied via a multiplexer: you can choose either the AHB bus clock (HCLK) or that clock divided by 8 as the system timer clock.
An old thread and an oft-asked question, so this should help some of you trying to emulate Cortex systems.
If you are booting with a .dtb, then in your .dts you can add a line of clock-frequency = <value>; to the 'timers' block and recompile it. This will indeed increase the speed of Cortex processors. Clearly, value is some large number.

Which MCU (Cortex-M) for a time-critical GPIO application?

We have an application which runs on a PIC24H; we would like to port it to another MCU, preferably an ARM Cortex. The application is extremely time-critical, meaning that we need extremely deterministic code behaviour. In short, there are pulses which are delivered via special hardware to GPIO pins, and the data is analyzed right away. Processing of the data is not complex (we don't need a beefy CPU/MCU to do it). After analyzing the data, GPIO output pins are written with their values.
App in 3 short lines:
process input pins
determine pattern within processing of input pins
based on the received pattern write output pins
The PIC24H is running at 40 MHz and we can toggle a pin in 25 ns; we would like at least 2x that speed for future upgrades. So an MCU which can run deterministic code and toggle pins at 80 MHz (12.5 ns) would be just fine. We don't need to toggle the pins at a constant fast rate; we need an MCU which can toggle a pin in less than 25 ns. We can't waste cycles while toggling: if one cycle is off we lose synchronization. Everything must be done with one-cycle precision (or two, but a constant two cycles), so the code should be 100% deterministic.
Please let me know if I'm missing something or if what we need can be done using some other methods on Cortex-M. Just keep in mind that if one cycle is lost (due to cache or similar) we lose signal sync and the app will not do its work right, or at all.
Thanks!
Br
According to this blog post, the interrupt latency for Cortex-M ranges from 12 to 16 cycles (assuming you are not using FPU registers) with best-case memories. The M0 and M0+ are slower than the M3/M4/M7. On top of this, you need to add the GPIO access times (and watch out for different clock frequencies between the core and the peripherals). The Cortex-M7 will support higher clock speeds than the M3/M4.
It still isn't clear how many cycles are consumed in recognising a pattern, and how an interrupt is useful in doing this - generally a low latency interface function like this would be an obvious target for dedicated hardware, but since you have an existing software solution it seems the problem is mis-specified.
Providing you avoid accessing any 'slow' peripherals which might stall the bus, the interrupt latency should be deterministic - any specific device should have documentation which covers this.
NXP have an application note which describes some of the detail of how to measure what is going on.
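For reference, the structure being timed here is roughly the following (a sketch only; the register names and addresses are placeholders, not any particular vendor's GPIO block, and decode_pattern() stands in for the application-specific logic):

```c
#include <stdint.h>

/* Placeholder memory-mapped registers; substitute your device's GPIO block. */
#define GPIO_IN    (*(volatile uint32_t *)0x40020010u)   /* hypothetical input data register */
#define GPIO_OUT   (*(volatile uint32_t *)0x40020014u)   /* hypothetical output data register */

extern uint32_t decode_pattern(uint32_t in);   /* application-specific pattern matching */

void EdgeCapture_IRQHandler(void)   /* entered 12-16 core cycles after the pulse, plus bus time */
{
    uint32_t in  = GPIO_IN;             /* 1. process input pins */
    uint32_t out = decode_pattern(in);  /* 2. determine the pattern */
    GPIO_OUT = out;                     /* 3. write output pins */
    /* clear the pin-change interrupt flag here (chip specific) */
}
```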

How to Benchmark Runtime on Cortex-M4

I'm pretty new to ARM and am trying to get timing results for functions written in C for a Cortex-M4 processor. Would any of you be able to tell me what steps I need to take to get timing results?
I've been running my code in Keil uVision, but I'm unable to use the program's Performance Analyzer during a real-environment debug. From what I've read, it seems that the Performance Analyzer only works outside of simulated debug sessions if one is using a proprietary connector from Keil.
Set a pin high at the start of the function you wish to time, set it low at the end, and measure the pulse width with an oscilloscope.
Depending on which Cortex-M4 you're using there may be a cycle count register, DWT->CYCCNT, but its inclusion is vendor-defined. Details can be found in the Cortex-M4 Technical Reference Manual. Your processor's datasheet, reference manual and programming manual should provide more information if required.
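If your part does include it, a minimal sketch using the CMSIS core definitions (assuming your vendor's device header pulls in core_cm4.h) might look like this:

```c
#include <stdint.h>
#include "device.h"        /* placeholder: your vendor's CMSIS device header */

static void cyccnt_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   /* enable the DWT/ITM block */
    DWT->CYCCNT = 0;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;             /* start the cycle counter */
}

static uint32_t time_function(void (*fn)(void))
{
    uint32_t start = DWT->CYCCNT;
    fn();
    return DWT->CYCCNT - start;    /* elapsed core clock cycles (wraps at 2^32) */
}
```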
Alternatively, if you have a fast timer, such as the SysTick running from the processor clock, you could initialise the count to 0x00FFFFFF, start it down-counting at the beginning of your function and stop it at the end; you can then work out the time taken as (0x00FFFFFF - SysTick->CVR) * (1 / SysTick frequency).
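A sketch of that SysTick approach using the CMSIS register names (SysTick->VAL is the CVR register; the clock source is set to the processor clock):

```c
#include <stdint.h>
#include "device.h"        /* placeholder: your vendor's CMSIS device header */

static void systick_start(void)
{
    SysTick->LOAD = 0x00FFFFFF;                   /* maximum 24-bit reload value */
    SysTick->VAL  = 0;                            /* any write clears the counter */
    SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk    /* count the processor clock */
                  | SysTick_CTRL_ENABLE_Msk;      /* start down-counting */
}

static uint32_t systick_elapsed_ticks(void)
{
    /* Down-counter: ticks elapsed since start, valid for up to 2^24 cycles. */
    return 0x00FFFFFF - SysTick->VAL;
}
```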

FIQ & IRQ handlers -- arm

I am new to ARM and have some doubts related to IRQ and FIQ. Please help clarify these.
How many FIQ and IRQ channels does ARM have?
And how many handlers can we write for each channel?
Also, if we can register multiple handlers for a single interrupt channel, how does ARM know which handler to run?
The distinction between IRQ and FIQ goes right back to the early days of ARM, when it was designed by Acorn. It was always the case that the IRQ line was attached to an interrupt controller that multiplexed a large number of interrupt sources together. This is precisely what happens in all modern ARMs.
The rationale behind the FIQ was to provide an extremely low-latency response with maximum priority (it can safely pre-empt the IRQ handler). The comparatively large set of shadow registers makes it possible to write handlers that keep their state in CPU registers without hitting the stack.
The shadowed registers are almost the opposite of the set the APCS commonly uses for function calls, so writing handlers in C would cause a push and an eventual pop of up to 8 non-shadowed registers. Having any kind of interrupt demultiplexing wipes out any performance advantage that FIQ might have given.
All of this means that there is only really any benefit in using FIQ for very specialised applications where really hard real-time interrupt response is required for one interrupting device, and you're willing to write your handler in assembler. You'll also be left with working out how to synchronise with the rest of the system - some of which would rely on disabling IRQ to keep data synchronised.
Traditionally the ARM has one interrupt line which you can send to one of two handlers, FIQ or IRQ. FIQ has a larger bank of FIQ-mode-only registers, so you have fewer registers that you need to store on the stack. From there you read the vendor-specific registers, if any, to determine the source of the interrupt, and then branch into separate handlers.
More recently there have been ARM architectures with many interrupts (128, 256), each with a separate handler. So asking generically about ARM is a bit like asking something generic about x86.
All of this information is easily available in the ARM Architecture Reference Manuals for the different architectures, and the pinouts to the core (what the vendor builds its chip around) are documented in the Technical Reference Manuals for the various cores (also very easy to obtain). infocenter.arm.com has the architecture and technical reference manuals as well as AMBA/AXI (the data bus that the vendor connects to). Your question is completely answered in those documents.
The ARM processor directly supports only ONE IRQ and ONE FIQ. ARM supports multiple interrupts through a peripheral called an interrupt controller. ARM's standard interrupt controllers are called GICs (Generic Interrupt Controllers).
The GIC has a number of inputs for peripherals to connect their interrupt lines, and two output lines that connect to IRQ and FIQ. Basically it acts as a MUX. A GIC driver will set up configuration such as interrupt priority, type (IRQ/FIQ), masking, etc.
In traditional ARM systems there is one entry each for IRQ and FIQ in the Exception Vectors. Depending on which line the interrupt fired, IRQ or FIQ handler is called. The interrupt handler queries the GIC (GIC CPU interface registers, to be specific) to get the interrupt number. Based on this interrupt number, corresponding device handler is invoked.
Number of interrupts depends on the specific GIC implementation. So you would have to check the manual for the interrupt controller in your system to get those specifics.
Note: The interrupt handling is slightly different depending on which specific ARM core you are coding for.
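As an illustration of that query-and-dispatch step on a GICv2 system (the base address and the handler table are placeholders, not taken from any specific SoC):

```c
#include <stdint.h>

#define GIC_CPU_BASE  0x2C002000u                  /* placeholder: check your SoC manual */
#define GICC_IAR      (*(volatile uint32_t *)(GIC_CPU_BASE + 0x0Cu))  /* interrupt acknowledge */
#define GICC_EOIR     (*(volatile uint32_t *)(GIC_CPU_BASE + 0x10u))  /* end of interrupt */

extern void (*device_handlers[1020])(void);        /* per-interrupt-ID device handlers */

void irq_handler(void)                             /* reached via the IRQ exception vector */
{
    uint32_t iar    = GICC_IAR;                    /* reading acknowledges the interrupt */
    uint32_t int_id = iar & 0x3FFu;

    if (int_id < 1020u)                            /* IDs 1020-1023 are special/spurious */
        device_handlers[int_id]();

    GICC_EOIR = iar;                               /* tell the GIC we're done */
}
```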
Actually the question is a bit tricky. You must specify which ARM architecture you are working with. The ARMv7-A and ARMv7-R Architecture Reference Manual (ARM ARM) specifies one FIQ and one IRQ, as many have already answered. But ARMv7-M (used in Cortex-M processors) integrates an interrupt controller into the processor, and thus offers one NMI (instead of FIQ) and up to 240 IRQ lines.
For more information: ARMv7-A and ARMv7-R Architecture Reference Manual: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html
ARMv7-M Architecture Reference Manual: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403e.b/index.html
As an example, the Cortex-M4 spec sheet: http://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php
