Benchmarking on CMSIS RTOS, Cortex-M33 (arm)

I'm trying to time the duration of a function on a Cortex M33 with CMSIS RTOS. I'm currently reading cycles directly from the ARM_CM_DWT_CYCCNT register.
This is working, but I'm wondering whether I can do anything else to improve the precision and reduce the variance of my measurement, e.g. by limiting interrupts?
Some third party code has included the use of int_lock() and int_unlock(lock) but I can't find any CMSIS RTOS documentation of this usage.

Your "third-pary" routines are most likely some sort of wrapper around a CPSR manipulation. Intuitively I'd expect to wrap the code section in something like :
lock_handle = int_lock();
...
int_unlock( lock_handle );
But unless you have documentation or access to the source of these functions to be certain how they behave, you might do well to avoid using them.
CMSIS RTOS does not provide facilities to disable interrupts or implement critical sections. In general, you have failed in your design if your application needs critical sections. More generally, however, CMSIS-Core (CMSIS is more than just CMSIS RTOS) has the NVIC API with NVIC_DisableIRQ() and NVIC_EnableIRQ() functions to enable/disable individual interrupts (in case some, such as SysTick, are essential to your benchmark).
The means to globally disable all interrupts are part of the CMSIS Core register access functions, which define __disable_irq() and __enable_irq().
It is likely that the third-party enable/disable functions provide a handle to ensure that enabling and disabling are correctly paired, or perhaps a nest counter so that only the outermost enable of a nested disable re-enables interrupts. These are all mechanisms that are trivial to implement using the available CMSIS primitives, but they may not be needed in your case.
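For illustration only (this is not from the original question or answer; the function name is made up), a measurement wrapper combining the DWT cycle counter with the CMSIS-Core interrupt-mask intrinsics might look like the sketch below. It assumes your device header exposes the standard CMSIS DWT and CoreDebug definitions and that ARM_CM_DWT_CYCCNT corresponds to DWT->CYCCNT; some devices need additional unlock steps before the cycle counter will run.

uint32_t benchmark_cycles(void (*fn)(void))
{
    /* Enable the DWT cycle counter (TRCENA must be set first). */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0u;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

    /* Mask interrupts for the duration of the measurement, then restore
       the previous state so the RTOS tick is not disabled for good. */
    uint32_t primask = __get_PRIMASK();
    __disable_irq();

    uint32_t start = DWT->CYCCNT;
    fn();
    uint32_t cycles = DWT->CYCCNT - start;

    __set_PRIMASK(primask);
    return cycles;
}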

Related

ARM Cortex M3 - Add a new interrupt to the end of the vector table?

I am doing some bare-metal C development on an ARM Cortex-M3 SoC, and I wanted to check and see if it is possible to add a new user-defined interrupt handler to the NVIC. I am adding my own IRQ with the plan of triggering it via software, either via NVIC_SetPendingIRQ() or via the NVIC->STIR register. Neither seems to work.
I have added my interrupt vector name to the end of the vector list in the CMSIS startup assembler file, and added the corresponding enum value to the system header, but while debugging, the call to NVIC_EnableIRQ() doesn't correctly update NVIC->ISER (the Interrupt Set Enable Register). So I guess the question is: can you even add your own interrupt? There are 256 total interrupts that can be used in the ARM Cortex-M3, and I just followed how the others were added, so I figured it wouldn't be an issue.
Thank you.
The datasheet for your SoC should say how many interrupts are supported by the NVIC. While 240 is the maximum possible number for a Cortex-M3 device in general, the actual number on your chip is defined by the implementation and it makes sense to have that number be as small as possible to reduce costs.
In general, there is no way to add interrupts in software, but you might be able to use the SVCall interrupt, which is designed to be triggered by software. Or you could find some other interrupt you aren't using in your system, and which is not being activated by hardware, and try to use that for your purposes.
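As a rough sketch of that last suggestion (not from the answer above; the IRQ number and handler name are hypothetical and entirely device-specific), a spare device interrupt can be pended from software via the CMSIS NVIC API:

#define SPARE_IRQn   ((IRQn_Type)17)     /* hypothetical unused device IRQ */

void spare_irq_init(void)
{
    NVIC_SetPriority(SPARE_IRQn, 3);     /* whatever priority your design needs */
    NVIC_EnableIRQ(SPARE_IRQn);
}

void trigger_soft_irq(void)
{
    NVIC_SetPendingIRQ(SPARE_IRQn);      /* handler runs once priority permits */
}

void SPARE_IRQHandler(void)              /* name must match the vector-table entry */
{
    /* software-interrupt work here */
}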
References:
Nested Vectored Interrupt Controller in the Cortex-M3 Devices Generic User Guide
The SVC instruction invokes the SVCall handler with an 8-bit service number available to the handler, which can be used to select a handler from a look-up table (essentially a secondary vector table for software interrupts).
An example of that can be found at https://developer.arm.com/documentation/ka004005/latest, except there it uses a switch rather than a look-up table - to the same effect.
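To make the look-up-table idea concrete, here is a rough sketch (not from the linked example; the table and service names are made up, and it assumes a GCC-style toolchain with the standard CMSIS SVC_Handler vector name):

#include <stdint.h>

typedef void (*svc_fn_t)(void);

static void svc_service_0(void) { /* ... */ }
static void svc_service_1(void) { /* ... */ }

static svc_fn_t const svc_table[] = { svc_service_0, svc_service_1 };

/* Receives a pointer to the exception stack frame: r0-r3, r12, lr, pc, xPSR. */
void SVC_Handler_C(uint32_t *frame)
{
    /* The stacked PC points just past the SVC instruction; the low byte of
       that instruction (Thumb encoding 0xDFnn) is the service number. */
    uint8_t svc_number = ((uint8_t *)frame[6])[-2];
    if (svc_number < sizeof svc_table / sizeof svc_table[0])
        svc_table[svc_number]();
}

/* Small shim to pass the active stack pointer (MSP or PSP) to the C handler. */
__attribute__((naked)) void SVC_Handler(void)
{
    __asm volatile(
        "tst lr, #4      \n"
        "ite eq          \n"
        "mrseq r0, msp   \n"
        "mrsne r0, psp   \n"
        "b SVC_Handler_C \n");
}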

Critical sections in ARM

I am experienced in implementing critical sections on the AVR family of processors, where all you do is disable interrupts (with a memory barrier of course), do the critical operation, and then reenable interrupts:
void my_critical_function()
{
    cli();  // Disable interrupts
    // Mission critical code here
    sei();  // Enable interrupts
}
Now my question is this:
Does this simple method apply to the ARM architecture of processor as well? I have heard things about the processor doing lookahead on the instructions, and other black magic, and was wondering primarily if these types of things could be problematic to this implementation of critical sections.
Assuming you're on a Cortex-M processor, take a look at the LDREX and STREX instructions, which are available in C via the __LDREXW() and __STREXW() macros provided by CMSIS (the Cortex Microcontroller Software Interface Standard). They can be used to build extremely lightweight mutual exclusion mechanisms.
Basically,
data = __LDREXW(address)
works like data = *address except that it sets an 'exclusive access flag' in the CPU. When you've finished manipulating your data, write it back using
success = __STREXW(data, address)
which is like *address = data but will only succeed in writing if the exclusive access flag is still set. If it does succeed in writing then it also clears the flag. It returns 0 on success and 1 on failure. If the STREX fails, you have to go back to the LDREX and try again.
For simple exclusive access to a shared variable, nothing else is required. For example:
do {
    data = __LDREXW(address);
    data++;
} while (__STREXW(data, address));    /* retry if the exclusive store fails */
The interesting thing about this mechanism is that it's effectively 'last come, first served'; if this code is interrupted and the interrupt uses LDREX and STREX, the interrupt's STREX will succeed and the (lower-priority) user code will have to retry.
If you're using an operating system, the same primitives can be used to build 'proper' semaphores and mutexes (see this application note, for example); but then again if you're using an OS you probably already have access to mutexes through its API!
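As a small illustration of that point (this is not from the answer; the names are made up, and the CMSIS core header is assumed to be included), a minimal try-lock can be built from the same intrinsics together with __CLREX() and __DMB():

static volatile uint32_t lock_word = 0;      /* 0 = free, 1 = taken */

int try_lock(void)
{
    if (__LDREXW(&lock_word) != 0) {
        __CLREX();                           /* lock busy: release the exclusive monitor */
        return 0;
    }
    if (__STREXW(1, &lock_word) != 0)        /* note the CMSIS order: value, then address */
        return 0;                            /* lost the race; the caller may retry */
    __DMB();                                 /* make the locked state visible before use */
    return 1;
}

void unlock(void)
{
    __DMB();                                 /* complete protected accesses first */
    lock_word = 0;
}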
The ARM architecture is very broad, and as I understand it you probably mean ARM Cortex-M microcontrollers.
You can use this technique, but many ARM MCUs offer much more. As I do not know what your actual hardware is, I can only give you some examples:
Bit-band regions: in these memory areas you can set and reset individual bits atomically.
Hardware semaphores (STM32H7).
Hardware mutexes (some NXP MCUs).
etc.
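For the bit-band point, a minimal sketch (Cortex-M3/M4 SRAM bit-band region; the macro and variable names are made up): each bit of a word in 0x20000000-0x200FFFFF has a word-sized alias at 0x22000000 + byte_offset*32 + bit*4, and a single write to that alias sets or clears just that bit, atomically.

#define BITBAND_SRAM(addr, bit) \
    ((volatile uint32_t *)(0x22000000u + \
        (((uint32_t)(addr) - 0x20000000u) * 32u) + ((uint32_t)(bit) * 4u)))

static volatile uint32_t flags;                            /* must live in the bit-band region */

void set_flag3(void)   { *BITBAND_SRAM(&flags, 3) = 1; }   /* atomic bit set */
void clear_flag3(void) { *BITBAND_SRAM(&flags, 3) = 0; }   /* atomic bit clear */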

how to single-step code on-target with no jtag, breakpoints, simulator, emulator

Let's say you have a pointer to function whose source you do not have and which is "untrusted" because it might read/write to disallowed memory region.
Before it executes each assembly instruction, you want to verify that it doesn't access disallowed memory regions.
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
This is for a functionality that needs to be enabled not only during development but during normal runtime.
Ideally, it'd run something like this:
void (*fptr)(int);
fptr = &someFunction; // untrusted, don't have source
// enable interrupts for each assembly instruction
_EN_INT();
// call the function
fptr();
// everytime the PC increments, some other code runs which verifies that if any load/stores are executed, it doesn't access some disallowed memory range
// disable interrupts for each assembly instruction
_DIS_INT();
QUESTION
Is it possible to call that function and pause execution after every assembly instruction?
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
My answer assumes that you can modify the "OS" the way you need it...
Cortex MK20DX256VLH7
This seems to be a Cortex M4 CPU.
how to single-step code on-target with no jtag, breakpoints
From the doc, it doesn't say whether you NEED an external debugger to resume execution.
If the CPU is really stopped, you'll definitely need an external signal (e.g. from a debugger).
However most CPUs support software debugging. This means that an interrupt service routine is executed whenever a breakpoint is hit. To continue execution you simply return from the interrupt service routine.
I don't know about the Cortex M4 but for the Cortex M3 you'll have to set some special registers to enable that feature. Whenever a "BKPT" instruction is hit then interrupt #12 (*) is executed.
For code in RAM you simply write a BKPT instruction (0xBExx, e.g. 0xBEBE) to the address where you want to set your breakpoint. (Before writing, you read out the original value so you can restore it later on.)
For code in Flash the M3 has a "Flash patching unit" which allows you to specify up to three addresses which shall be read out as 0xBExx (0xBEBE ?) even if other data is stored there. This allows you to set up to 3 breakpoints in Flash.
Interesting for you: The register controlling the debug features in the M3 (named "DEMCR") also has a bit named "MON_STEP":
If you set this bit in interrupt handler #12 then exactly one instruction is executed after returning from the interrupt handler and interrupt #12 is triggered again. The use case for this feature is - of course - single-stepping code!
To stop single-stepping you'll have to clear the MON_STEP bit again...
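A rough sketch of that (not from the answer; it uses the CMSIS CoreDebug definitions and assumes no halting debugger is attached, since MON_EN has no effect while halting debug is enabled):

static volatile int done_stepping;                          /* illustrative stop condition */

void start_single_stepping(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_MON_EN_Msk;         /* route debug events to DebugMon */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_MON_STEP_Msk;       /* one instruction per return */
}

void DebugMon_Handler(void)                                 /* exception #12 from the answer */
{
    /* Inspect the stacked PC/registers of the interrupted code here,
       e.g. to check the next load/store address. */
    if (done_stepping)
        CoreDebug->DEMCR &= ~CoreDebug_DEMCR_MON_STEP_Msk;  /* stop single-stepping */
}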
Important 1:
I don't know if the MK20DX256VLH7 really has all these features. However, because it is a Cortex-M4 chip, and the M4 has nearly all the features of the M3 (and more), these features should be present...
Important 2:
Implementing single-stepping and debugging is not done quickly. Assembly language knowledge will be very helpful and you'll need a lot of time...
From the doc, ...
You will not only need the documentation for the MK20DX256VLH7 from NXP but you'll also need the Cortex M4 documentation from ARM.
(*) Offset 4*12 in the vector table is meant here (which is named "IRQ(-4)" in some ARM documents); not IRQ12.
Yes, the ARM emulator/interpreter sounds exactly like what I want. Is there a free one?
qemu is open-source, most of it is GPLv2. https://wiki.qemu.org/License. You'd probably need to modify it a lot, because it's designed for use as a stand-alone wrapper for a whole Unix process (qemu-user) or whole machine (qemu-system).
I googled, and there's also http://www.unicorn-engine.org/ which is designed to be used as part of a larger program (written in C with bindings for calling from various languages). It's also GPLv2 (not LGPL), so you can use it if the rest of your code is also Free software.
It's actually based on the CPU-emulation code from QEMU; they stripped out all the device / BIOS emulation stuff to make a flexible library for just emulating CPUs.
Presumably you could configure some memory protections for it and set up the starting machine state, and let it run your function (with a return address that leads to some code that hands control back to your main code?)

How to disable nesting in NVIC Interrupts in ARM Cortex M0+?

I have started using the ARM Cortex-M0+ for GPIO interrupts. I want to disable the nesting feature of ARM interrupts. Is there any way to do it? I know that nesting is enabled by default on ARM; I want to disable it.
The ARM Cortex-M0/M0+ does not support the interrupt priority grouping into preemption priority (nestable) and sub-priority (non-nestable) that is available on the M3/M4/M7, for example.
If you wish to prevent interrupt nesting, it would be necessary to either:
set all interrupts to the same priority, or
disable and re-enable interrupts on entry and exit to all handlers.
The first of these options is the simplest, but gives no control over execution order (which seldom matters for asynchronous events, but may lead to non-deterministic behaviour and timing). The second does not strictly prevent nesting; it only permits nesting in the brief window before the lower-priority handler has disabled interrupts - that is, before it has started processing the actual event. The result is behaviour similar to that of the sub-priorities available on the Cortex-M3 etc.
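To illustrate both options (the IRQ and handler names are hypothetical, and the sketch is not from the answer above):

/* Option 1: give every interrupt in use the same priority, so none can preempt another. */
void set_equal_priorities(void)
{
    NVIC_SetPriority(GPIO0_IRQn, 1);     /* hypothetical IRQ numbers */
    NVIC_SetPriority(GPIO1_IRQn, 1);
    /* ...repeat for every interrupt in use... */
}

/* Option 2: mask interrupts for the duration of each handler. */
void GPIO0_IRQHandler(void)
{
    __disable_irq();
    /* handle the GPIO event */
    __enable_irq();
}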

Atomic Block for reading vs ARM SysTicks

I am currently porting my DCF77 library (you may find the source code at GitHub) from Arduino (AVR based) to Arduino Due (ARM Cortex M3). I am an absolute beginner with the ARM platform.
With the AVR-based Arduino I can use avr-libc to get atomic blocks. Basically this blocks all interrupts for the duration of the block and re-enables them afterwards. For the AVR this was fine. Now for the ARM Cortex things start to get complicated.
First of all: for the current uses of the library this approach would work as well. So my first question is: is there something similar to the "ATOMIC" macros of avr-libc for ARM? Obviously other people have thought of something in this direction. Since I am using gcc I could enhance these macros to work almost exactly like the avr-libc ATOMIC macros. I already found some CMSIS documentation, however this seems only to provide an "enable_irq" macro instead of a "restore_irq" macro.
Question 1: is there any library out there (for gcc) that already does this?
Because ARM has interrupts of different priorities, I could establish the atomicity in different ways as well. In my case the "atomic" blocks must only make sure that they are not interrupted by the SysTick interrupt. So I would not actually need to block everything to make my blocks "atomic enough". Searching further I found an ARM synchronization primitives article in the developer infocenter. In particular there is a hint at lockless programming. According to the article this is an advanced concept, and there are many publications on it. Searching the net I found only general explanations of the concept, e.g. here. I assume that a lockless implementation would be very cool, but at this time I do not feel confident enough on ARM to implement this from scratch.
Question 2: does anyone have some hints for me on lockless reads of memory blocks on ARM Cortex M3?
As I already said I only need to protect the lower priority thread from sysTicks. So another option would be to disable sysTicks briefly. Since I am implementing a timing sensitive clock algorithm this must not slow down the overall sysTick frequency in the long run. Introducing some small jitter would be OK though. At this time I would find this most attractive.
Question 3: is there any good way to block sysTick interrupts without losing any ticks?
I also found the CMSIS documentation for semaphores. However I am somewhat overwhelmed. Especially I am wondering if I should use CMSIS and how to do this on an Arduino Due.
Question 4: What would be my best option? Or where should I continue reading?
Partial Answer:
With the hint from Notlikethat I implemented:
#if defined(ARDUINO_ARCH_AVR)
#include <util/atomic.h>
#define CRITICAL_SECTION ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
#elif defined(ARDUINO_ARCH_SAM)
// Workaround as suggested by Stackoverflow user "Notlikethat"
// http://stackoverflow.com/questions/27998059/atomic-block-for-reading-vs-arm-systicks
static inline int __int_disable_irq(void) {
    int primask;
    asm volatile("mrs %0, PRIMASK\n" : "=r"(primask));
    asm volatile("cpsid i\n");
    return primask & 1;
}

static inline void __int_restore_irq(int *primask) {
    if (!(*primask)) {
        asm volatile ("" ::: "memory");
        asm volatile("cpsie i\n");
    }
}
// This critical section macro borrows heavily from
// avr-libc util/atomic.h
// --> http://www.nongnu.org/avr-libc/user-manual/atomic_8h_source.html
#define CRITICAL_SECTION for (int primask_save __attribute__((__cleanup__(__int_restore_irq))) = __int_disable_irq(), __ToDo = 1; __ToDo; __ToDo = 0)
#else
#error Unsupported controller architecture
#endif
This macro does more or less what I need. However, I find there is room for improvement, as this blocks all interrupts although it would be sufficient to block only SysTick. So Question 3 is still open.
Most of what you've referenced is about synchronising memory accesses between multiple CPUs, or pre-emptively scheduled threads on the same CPU, which seems entirely inappropriate given the stated situation. "Atomicity" in that sense refers to guaranteeing that when one observer is updating memory, any observer reading memory sees either the initial state, or the updated state, but never something part-way in between.
"Atomicity" with respect to interrupts follows the same principle - i.e. ensuring that if an interrupt occurs, a sequence of code has either not run at all, or run completely - but is a conceptually different thing1. There are only two things guaranteed to be atomic w.r.t. interrupts: a single instruction2, or a sequence of instructions executed with interrupts disabled.
The "right" way to achieve that is indeed via the CPSID/CPSIE instructions, which are wrapped in the __disable_irq()/__enable_irq() intrinsics. Note that there are two "stages" of interrupt handling in the system: the M3 core itself only has a single IRQ signal - it's the external NVIC's job to do all the routing/multiplexing/prioritisation of the system IRQs into this one line. When the CPU wants to enter a critical section, all it needs to do is mask its own IRQ input with CPSID, do what it needs, then unmask with CPSIE, at which point any pending IRQ from the NVIC will be taken immediately.
For the case of nested/re-entrant critical sections, the intrinsics provide a handy int __disable_irq(void) form which returns the previous state, so you can unmask conditionally on that.
For other compilers which don't offer such intrinsics, it's straightforward enough to roll your own, e.g.:
static inline int disable_irq(void) {
    int primask;
    asm volatile("mrs %0, PRIMASK\n"
                 "cpsid i\n" : "=r"(primask) : : "memory");
    return primask & 1;
}

static inline void enable_irq(int primask) {
    if (!primask)    /* only re-enable if interrupts were enabled beforehand */
        asm volatile("cpsie i\n" : : : "memory");
}
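A typical (illustrative) use pairs the two so that nested critical sections restore the previous state correctly:

int was_masked = disable_irq();
/* ...read or update the data shared with the SysTick handler... */
enable_irq(was_masked);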
[1] One confusing overlap is that the latter sense is often used to achieve the former in single-CPU multitasking - if interrupts are off, no other thread can get scheduled until you've finished, and thus will never see partially-updated memory.
[2] With the possible exception of load/store-multiple instructions - in the low-latency interrupt configuration, these can be interrupted, and either restarted or continued upon return.
