Measuring Cycle Count on an505 M33 Qemu - arm

I'm trying to emulate an Arm cortex M33 using QEMU, using the an505 model. I've used this git repo as a starting point.
I've successfully built the project and even managed to debug into it however now I want to measure the cpu cycles consumed - without any luck.
Firstly, I tried to access the DWT registers like so:
#define ARM_CM_DEMCR (*(uint32_t *)0xE000EDFC)
#define ARM_CM_DWT_CTRL (*(uint32_t *)0xE0001000)
#define ARM_CM_DWT_CYCCNT (*(uint32_t *)0xE0001004)
main()
{
if (ARM_CM_DWT_CTRL != 0) { /* See if DWT is available */
ARM_CM_DEMCR |= 1 << 24; /* Set bit 24 */
ARM_CM_DWT_CYCCNT = 0;
ARM_CM_DWT_CTRL |= 1 << 0; /* Set bit 24 */
}
}
however ARM_CM_DWT_CTRL appears to be 0, indicating that the DWT register is not configured. This code is working on the actual M33 hardware I have. I've checked the an505 doc and I can't see anything wrt DWT. Is this therefore a lost cause? Is it not likely that the FPGA would implement the DWT register?
I moved on to using the SysTick API as described here however when I access SysTick->VAL it equals 0.
I've also tried to read from STCVR = 0xE000E018 as per the instructions here but also, this returns 0.
Am I missing something fundamental here?

QEMU is not a cycle accurate model. We also don't implement the DWT unit.
The SysTick timer in QEMU's implementation will essentially measure real-world time. This is not a very good guide to what real-hardware performance will look like.
If you want to measure the performance of code then your best bet is usually to do it on the real hardware that you care about. Getting performance information out of an emulation is tricky at best. This old answer to another question has a runthrough of some of the difficulties.

Related

Using PMU to count ticks on ARM

I'm trying to use the PMU (specifically use PMCCNTR to determine ticks per usec) from userspace in ARM. I have an arm64 kernel running an arm32 bit userspace app.
I created an LKM to force the PMUSERENR.EN bit to be on, which works. I then I ran a test program from userspace:
asm volatile ("mrc p15, 0, %[en], c9, c14, 0"
: [en] "=r" (pmuserenr));
printf(" -- %08x\n", pmuserenr);
The first few times I ran this, the bit was correctly marked as turned on. But every third or fourth time I run it, the bit shows up as off. After puzzling over this, I came across this link, where the user is enabling this bit for each cpu. Does each core have it's own instance of a PMU, or is there only one per SoC? If there are multiple per SoC, are the PMUCCNTR registers synced up? (That is, if I read the PMUCCNTR, then do a context switch to another core, and read PMUCCNTR again, can I safely compare the two to see how many ticks have elapsed?).
Even if I modify my LKM to initialize on every online-cpu (and register a notifier to enable it on all new cpu's coming up), I'm still seeing the same behavior, where every nth read from userspace shows the bit not set.
My other issue is that the PMCCNTR is not incrementing. I set the PMCR.E bit but it's still not incrementing. I found one website that says I need to set the PMCR.C bit -- but this has a side effect of clearing the counter, which I don't want to do in fear of creating a race condition (where someone else is using the counter as I clear it). Any help or insights would be very welcome.
Each cpu has its own PMU instance, they are entirely independant (except for a small number of cluster wide events, where each PMU keeps its own count).
See, for example, the first hit today for "debug memory map pmu", remembering that v7 and v8 debug memory map standards are different. The actual answer is too 'obvious' to expect it to be explicitly documented.
0x010000 - 0x010FFF CPU 0 Debug
0x030000 - 0x030FFF CPU 0 PMU
0x110000 - 0x110FFF CPU 1 Debug
0x130000 - 0x130FFF CPU 1 PMU
0x210000 - 0x210FFF CPU 2 Debug
0x230000 - 0x230FFF CPU 2 PMU
0x310000 - 0x310FFF CPU 3 Debug
0x330000 - 0x330FFF CPU 3 PMU

Programming GPIO pins on a FINTEK F81866A chipset

I have a Cincoze DE-1000 industrial PC, that features a Fintek F81866A chipset. I have to manage the DIO pins to read the input from a phisical button and to set on/off a LED. I have experience in C++ programming, but not at low/hardware level.
On the documentation accompanying the PC, there is the following C code:
#define AddrPort 0x4E
#define DataPort 0x4F
//<Enter the Extended Function Mode>
WriteByte(AddrPort, 0x87)
WriteByte(AddrPort, 0x87) //Must write twice to entering Extended mode
//<Select Logic Device>
WriteByte(AddrPort, 0x07)
WriteByte(DataPort, 0x06)
//Select logic device 06h
//<Input Mode Selection> //Set GP74 to GP77 input mode
WriteByte(AddrPort, 0x80) //Select configuration register 80h
WriteByte(DataPort, 0x0X)
//Set (bit 4~7) = 0 to select GP 74~77 as Input mode.
//<input Value>
WriteByte(AddrPort, 0x82) // Select configuration register 82h
ReadByte(DataPort, Value) // Read bit 4~7(0xFx)= GP74 ~77 as High.
//<Leave the Extended Function Mode>
WriteByte(AddrPort, 0xAA)
As far as I understood, the above code should read the value of the four input PINs (so it should read 1 for each PIN), but I am really struggling to understand how it actually works. I have understood the logic (selecting an address and reading/writing an hex value to it), but I cannot figure out what kind of C instructions WriteByte() and ReadByte() are. Also, I do not understand where Value in the line ReadByte(DataPort, Value) comes from. It should read the 4 PINs all together, so it should be some kind of "byte" type and it should contain 1 in its bits 4-7, but again I cannot really grasp the meaning of that line.
I have found an answer for a similar chip, but it did not help me in understanding.
Please advice me or point me to some relevant documentation.
That chip looks like a fairly typical Super I/O controller, which is basically the hub where all of the "slow" peripherals are combined into a single chipset.
Coreboot has a wiki page that talks about how to access the super I/O.
On the PC architecture, Port I/O is accomplished using special CPU instructions, namely in and out. These are privileged instructions, which can only be used from a kernel-mode driver (Ring 0), or a userspace process which has been given I/O privileges.
Luckily, this is easy in Linux. Check out the man page for outb and friends.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
So we could adapt your function into a Linux environment like this:
/* Untested: Use at your own risk! */
#include <sys/io.h>
#include <stdio.h>
#define ReadByte(port) inb(port)
#define WriteByte(port, val) outb(val, port)
int main(void)
{
if (iopl(3) < 0) {
fprintf(stderr, "Failed to get I/O privileges (are you root?)\n");
return 2;
}
/* Your code using ReadByte / WriteByte here */
}
Warning
You should be very careful when using this method to talk directly to the Super IO, because your operating system almost certainly has device drivers that are also talking to the chip.
The right way to accomplish this is to write a device driver that properly coordinates with other kernel code to avoid concurrent access to the device.
The Linux kernel provides GPIO access to at least some Super I/O devices; it should be straightforward to port one of these to your platform. See this pull request for the IT87xx chipset.
WriteByte() and ReadByte() are not part of the C language. By the looks of things, they are functions intended to be placeholders for some form of system call for the OS kernel port IO (not macros doing memory mapped IO directly as per a previous version of this answer).
The prototypes for the functions would be something along the lines of:
#include <stdint.h>
void WriteByte(unsigned port, uint8_t value);
void ReadByte(unsigned port, uint8_t *value);
Thus the Value variable would be a pointer to an 8 bit unsigned integer (unsigned char could also be used), something like:
uint8_t realValue;
uint8_t *Value = &realValue;
Of course it would make much more sense to have Value just be a uint8_t and have ReadByte(DataPort, &Value). But then the example code also doesn't have any semicolons, so was probably never anything that actually ran. Either way, this is how Value would contain the data you are looking for.
I also found some more documentation of the registers here - https://www.electronicsdatasheets.com/download/534cf560e34e2406135f469d.pdf?format=pdf
Hope this helps.

Atomic Block for reading vs ARM SysTicks

I am currently porting my DCF77 library (you may find the source code at GitHub) from Arduino (AVR based) to Arduino Due (ARM Cortex M3). I am an absolute beginner with the ARM platform.
With the AVR based Arduino I can use avr-libc to get atomic blocks. Basically this blocks all interrupts during the block and will allow interrupts later on again. For the AVR this was fine. Now for the ARM Cortex things start to get complicated.
First of all: for the current uses of the library this approach would work as well. So my first question is: is there someting similar to the "ATOMIC" macros of avr-libc for ARM? Obviously other people have thought of something in this directions. Since I am using gcc I could enhance these macors to work almost exactly like the avr-libv ATOMIC macors. I already found some CMSIS documentation however this seems only to provide an "enable_irq" macro instead of a "restore_irq" macro.
Question 1: is there any library out there (for gcc) that already does this?
Because ARM has different priority interrupts I could establish the atomicity in different ways as well. In my case the "atomic" blocks must only make sure that they are not interrupted by the systick interrupt. So actually I would not need to block everything to make my blocks "atomic enough". Searching further I found an ARM synchronization primitives article in the developer infocenter. Especially there is a hint at lockless programming. According to the article this is an advanced concept and that there are many publications on it. Searching the net I found only general explanations of the concept, e.g. here. I assume that a lockless implementation would be very cool but at this time I feel not confident enough on ARM to implement this from scratch.
Question 2: does anyone have some hints for me on lockless reads of memory blocks on ARM Cortex M3?
As I already said I only need to protect the lower priority thread from sysTicks. So another option would be to disable sysTicks briefly. Since I am implementing a timing sensitive clock algorithm this must not slow down the overall sysTick frequency in the long run. Introducing some small jitter would be OK though. At this time I would find this most attractive.
Question 3: is there any good way to block sysTick interrupts without losing any ticks?
I also found the CMSIS documentation for semaphores. However I am somewhat overwhelmed. Especially I am wondering if I should use CMSIS and how to do this on an Arduino Due.
Question 4: What would be my best option? Or where should I continue reading?
Partial Answer:
with the hint from Notlikethat I implemented
#if defined(ARDUINO_ARCH_AVR)
#include <util/atomic.h>
#define CRITICAL_SECTION ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
#elif defined(ARDUINO_ARCH_SAM)
// Workaround as suggested by Stackoverflow user "Notlikethat"
// http://stackoverflow.com/questions/27998059/atomic-block-for-reading-vs-arm-systicks
static inline int __int_disable_irq(void) {
int primask;
asm volatile("mrs %0, PRIMASK\n" : "=r"(primask));
asm volatile("cpsid i\n");
return primask & 1;
}
static inline void __int_restore_irq(int *primask) {
if (!(*primask)) {
asm volatile ("" ::: "memory");
asm volatile("cpsie i\n");
}
}
// This critical section macro borrows heavily from
// avr-libc util/atomic.h
// --> http://www.nongnu.org/avr-libc/user-manual/atomic_8h_source.html
#define CRITICAL_SECTION for (int primask_save __attribute__((__cleanup__(__int_restore_irq))) = __int_disable_irq(), __ToDo = 1; __ToDo; __ToDo = 0)
#else
#error Unsupported controller architecture
#endif
This macro does more or less what I need. However I find there is room for improvement as this blocks all interrupts although it would be sufficient to block only systicks. So Question 3 is still open.
Most of what you've referenced is about synchronising memory accesses between multiple CPUs, or pre-emptively scheduled threads on the same CPU, which seems entirely inappropriate given the stated situation. "Atomicity" in that sense refers to guaranteeing that when one observer is updating memory, any observer reading memory sees either the initial state, or the updated state, but never something part-way in between.
"Atomicity" with respect to interrupts follows the same principle - i.e. ensuring that if an interrupt occurs, a sequence of code has either not run at all, or run completely - but is a conceptually different thing1. There are only two things guaranteed to be atomic w.r.t. interrupts: a single instruction2, or a sequence of instructions executed with interrupts disabled.
The "right" way to achieve that is indeed via the CPSID/CPSIE instructions, which are wrapped in the __disable_irq()/__enable_irq() intrinsics. Note that there are two "stages" of interrupt handling in the system: the M3 core itself only has a single IRQ signal - it's the external NVIC's job to do all the routing/multiplexing/prioritisation of the system IRQs into this one line. When the CPU wants to enter a critical section, all it needs to do is mask its own IRQ input with CPSID, do what it needs, then unmask with CPSIE, at which point any pending IRQ from the NVIC will be taken immediately.
For the case of nested/re-entrant critical sections, the intrinsics provide a handy int __disable_irq(void) form which returns the previous state, so you can unmask conditionally on that.
For other compilers which don't offer such intrinsics, it's straightforward enough to roll your own, e.g.:
static inline int disable_irq(void) {
int primask;
asm volatile("mrs %0, PRIMASK\n"
"cpsid i\n" : "=r"(primask));
return primask & 1;
}
static inline void enable_irq(int primask) {
if (primask)
asm volatile("cpsie i\n");
}
[1] One confusing overlap is the latter sense is often used to achieve the former in single-CPU multitasking - if interrupts are off, no other thread can get scheduled until you've finished, thus will never see partially-updated memory.
[2] With the possible exception of load/store-multiple instructions - in the low-latency interrupt configuration, these can be interrupted, and either restarted or continued upon return.

Writing Flash on STM32

I am implementing a emulated EEPROM in flash memory on a STM32 microprocessor, mostly based on the Application Note by ST (AN2594 - EEPROM emulation in STM32F10x microcontrollers).
The basics outline there and in the respective Datasheet and Programming manual (PM0075) are quite clear. However, I am unsure regarding the implications of power-out/system reset on flash programming and page erasure operations. The AppNote considers this case, too but does not clarify what exactly happens when a programming (write) operations is interrupted:
Does the address have a arbitrary (random) value? OR
Are only part of the bits written? OR
Does it have the default erase value 0xFF?
Thanks for hints or pointers to the relevant documentation.
Arne
This is not really a software question (much less C++). It belongs on electronics.se, but there does not seem to be an option to migrate questions thereā€¦ only to sites such as superuser or webmasters.se.
The short answer is that hardware is inherently unreliable. Something can always in theory go wrong that interrupts the write process or causes the wrong bit to be written.
The long answer is that Flash circuits are usually designed for maximum reliability. A sudden power loss on write will probably not cause corruption because the driver circuit may have enough capacitance or the capability to operate under a low-voltage condition long enough to finish draining the charge as necessary. A power loss on erasure might be trickier. You really need to consult the manufacturer.
For a "soft" system reset with no power interruption, it would be pretty surprising if the hardware didn't always completely erase whatever bytes it was immediately working on. Usually the bytes are erased in a predefined order, so you can use the first or last ones to indicate whether a page is full or empty.
#include "stm32f10x.h"
#define FLASH_KEY1 ((uint32_t)0x45670123)
#define FLASH_KEY2 ((uint32_t)0xCDEF89AB)
#define Page_127 0x0801FC00
uint16_t i;
int main()
{
//FLASH_Unlock
FLASH->KEYR = FLASH_KEY1;
FLASH->KEYR = FLASH_KEY2;
//FLASH_Erase Page
while((FLASH->SR&FLASH_SR_BSY));
FLASH->CR |= FLASH_CR_PER; //Page Erase Set
FLASH->AR = Page_127; //Page Address
FLASH->CR |= FLASH_CR_STRT; //Start Page Erase
while((FLASH->SR&FLASH_SR_BSY));
FLASH->CR &= ~FLASH_CR_PER; //Page Erase Clear
//FLASH_Program HalfWord
FLASH->CR |= FLASH_CR_PG;
for(i=0; i<1024; i+=2)
{
while((FLASH->SR&FLASH_SR_BSY));
*(__IO uint16_t*)(Page_127 + i) = i;
}
FLASH->CR &= ~FLASH_CR_PG;
FLASH->CR |= FLASH_CR_LOCK;
while(1);
}
If you are using the EEProm Emulation driver, you shouldn't worry too much about the flash corruption issues as the EEProm emulation driver always keeps a shadow copy in another page. Worst come worst, you will lose the most recent values that are being written into the flash. If you look closely on the emulation driver, you will notice that it is nothing but essentially a wrapper to stm32fxx_flash.c in the standard peripheral library.
If you look at the application note, you will see the times that the emulation library take for the flash operations. Erasing a page typically takes the longest time (tens of milliseconds on M0 core - this depends on the clock frequency).
If you are using the EEProm Emulation driver, you had bettern add a function such as check the data after write finished.
For example, if you have 10 data to save, so you need write 11 bytes to flash. The last byte is checksum. And check the data after read from flash.

The Cleanest Reset for an ARM Processor

Lately, I've been cleaning up some some C code that runs on an ARM7 controller. In some situations (upgrade, fatal error, etc...) the program will perform a reset. Presently it just jumps to 0 and assumes that the start-up code will reinitialize everything correctly. It got me to thinking about what would be the best procedure a la "Leave No Trace" for an ARM reset. Here is my first crack at it:
void Reset(void)
{
/* Disable interrupts */
__disable_interrupts();
/* Reset peripherals, externals and processor */
AT91C_BASE_RSTC->RSTC_RCR = AT91C_RSTC_KEY | AT91C_RSTC_PERRST | AT91C_RSTC_EXTRST| AT91C_RSTC_PROCRST;
while(AT91C_BASE_RSTC->RSTC_RSR & AT91C_RSTC_SRCMP);
/* Jump to the reset vector */
(*(void(*)())0)();
}
This code assumes the IAR ARM compiler and the At91Lib. Anything I haven't considered?
The very best solution to accomplish a "hard reset", as opposed to simply jumping through the reset vector, is to force a watchdog timer reset -- if you have one, that is.
Since your title is "cleanest reset", that's my advice. If you simply do a "jump to reset vector", the system could be in any number of states (peripherals still active, ADC conversions in progress, etc...)
I agree with #Dan that if your system has a watchdog timer available, that should provide the cleanest whole-board reset. BUT... If your processor is an ARMv7-M architecture (e.g. Cortex-M3, etc), you may be able to do the following even if you do NOT have a watchdog timer available, depending on your particular implementation:
#define SYSRESETREQ (1<<2)
#define VECTKEY (0x05fa0000UL)
#define VECTKEY_MASK (0x0000ffffUL)
#define AIRCR (*(uint32_t*)0xe000ed0cUL) // fixed arch-defined address
#define REQUEST_EXTERNAL_RESET (AIRCR=(AIRCR&VECTKEY_MASK)|VECTKEY|SYSRESETREQ)
printf("\nRequesting an external reset...\n");
fflush(stdout);
REQUEST_EXTERNAL_RESET;
printf("\nIt doesn't seem to have worked.\n");
fflush(stdout);
See the ARMv7-M Architecture Reference Manual, search for AIRCR and SYSRESETREQ.
This may be effectively the same solution as what "Judge Maygarden" posted, but the identifiers used in his post appear to be Atmel-specific, while the AIRCR register & SYSRESETREQ bits are defined by the underlying ARMv7-M architecture, not by Atmel.
That should do the trick. I use a similar function with an Atmel SAM3U. I never bothered to poll the status register, but that's a good idea and I'm going to go add that right now!
However, you should never get to the reset vector line since the processor will have already reset. IAR has an __noreturn attribute for use in these cases to allow further compiler optimization. I also load my reset function into ram (see __ramfunc) since I use at the end of firmware updates where the microcontroller can't run from flash.
Also, you shouldn't need AT91C_RSTC_EXTRST flag unless you are controlling the reset of external devices with that line.
__noreturn void Reset(void)
{
__disable_interrupts();
AT91C_BASE_RSTC->RSTC_RCR = AT91C_RSTC_KEY |
AT91C_RSTC_PERRST |
AT91C_RSTC_EXTRST |
AT91C_RSTC_PROCRST;
while (AT91C_BASE_RSTC->RSTC_RSR & AT91C_RSTC_SRCMP);
}
These days, wouldn't using CMSIS __NVIC_SystemReset(void) be cleanest?

Resources