ARMv8-A architecture register PMUSERENR_EL0

ARMv8-A architecture register PMUSERENR_EL0 - arm

I am new to arm architecture.
In my testing program, there is an assembly code to read PMCR_EL0 register. (like below)
mrs x0, pmcr_el0
But when I execute the program, it occurs exception with "Illegal instruction".
I searched "ARM a53 technical reference manual", then I found following sentence.
This register is accessible at EL0 when PMUSERENR_EL0.EN is set to 1.
and EL1 was required to write a value to PMUSERENR_EL0.EN.
So, I wrote kernel module which set PMUSERENR_EL0.EN bit to 1 when it upload by insmod shell command.
But after the kernel moduel upload, when I execute my testing program there is still occuring exception with "Illegal instruction". So, I checked the value of PMUSERENR_EL0 register via additional assembly code mrs x0, PMUSERENR_EN0, and its value is still 0.
I'm not sure what went wrong. How can I set the PMUSERENR_EL0.EN bit to 1 and finally read the PMCR_EL0 register value in user space application?
My computing environment is Raspberrypi 3 B v1.2(ARM cortex A53) and raspabian 64 bit OS.

Related

Cortex M33 unable to get stack pointer and pc from word offset 0/1 with qemu

I tried to bringup cortex m33 with QEMU, and I used board MPS2-AN505, but QEMU always abort with HardFault error, so I checked system registers with GDB and I found that the T bit of ESPR reg is 0 after CPU boot, and the pc/sp is 0 too, but I have already write SP value and PC value into memory word offset 0/1
I also checked VTOR which defines the vector tables and it is 0, so sp /pc should be read out from word offset 0/1
But if I used GDB to load elf files again, the T bit of ESPR is 1 but pc/sp still not be loaded. so the program start from address 0.
I also tried with board type MPS3-AN547 which used corte-m55 with the same software code and vector table, and it works fine. PC / SP can be loaded correctly from word offset 0/1.
the QEMU version that I used is 6.2.0, and I used option -kernel xxx.bin to load my software image.
qemu error
as described above ...
qemu error debug info

How do I exit from a ARM fault handler?

I am using STM32F746, an ARM cortex-M7 based processor. I am trying to do something hacky, which requires me to return to the program from a MemManage Fault handler.
When entering MemManage Fault handler, the PC before the fault and everything I need is stored on the stack. So I thought I can simply recover those to return to the previous execution point.
However, I cannot properly restore the xPSR.
The previous CPSR before the fault handler is saved in the stack, so I tried restoring it using MSR instruction.
I tried both MSR, xpsr, r12 and MSR, apsr, r12
However, it would only restore the flags and not the other parts of the CPSR , such as the GE or the system mode bits.
(and my mode bits seems also weird.. my xPSR shows as: 0x61070004, but this tells me that the last 5 bits cannot be 0x04)
How can I go back to the program point before the fault handler? I also tried popping the pc but it does not work and I think the problem is CPSR not getting properly restored.

When a Cortex M7 enters an exception handler, the execution context is saved as follows and of course restored when exiting the handler (from ARM Cortex M7 Programming Manual):
As you see, the xPSR is restored after the return from exception.
Furthermore
faults are a subset of the exceptions.
You can do a simple test: dereference on purpose an unvalid pointer. It will trigger a HardFault. Modify your HardFault handler to just return and do nothing. You can check that the context is restored. I tried on STM32H753, it works fine, xPSR latest bits (ISR_NUMBER) are indeed 0 (thread mode).
Be careful though: I don't know for MemManage but HardFault returns to the very same instruction that triggered the fault (and not to the following instruction like a regular exception). It means you will execute again the same instruction after the Hardfault.

how to single-step code on-target with no jtag, breakpoints, simulator, emulator

Let's say you have a pointer to function whose source you do not have and which is "untrusted" because it might read/write to disallowed memory region.
Before it executes each assembly instruction, you want to verify that it doesn't access disallowed memory regions.
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
This is for a functionality that needs to be enabled not only during development but during normal runtime.
Ideally, it'd run something like this:
void (*fptr)(int);
fptr = &someFunction; // untrusted, don't have source
// enable interrupts for each assembly instruction
_EN_INT();
// call the function
fptr();
// everytime the PC increments, some other code runs which verifies that if any load/stores are executed, it doesn't access some disallowed memory range
// disable interrupts for each assembly instruction
_DIS_INT();
QUESTION
Is it possible to call that function and pause execution after every assembly instruction?

The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
My answer assumes that you can modify the "OS" the way you need it...
Cortex MK20DX256VLH7
This seems to be a Cortex M4 CPU.
how to single-step code on-target with no jtag, breakpoints
From the doc, it doesn't say whether you NEED an external debugger to resume execution.
If the CPU is really stopped, you'll definitely need an external signal (e.g. from a debugger).
However most CPUs support software debugging. This means that an interrupt service routine is executed whenever a breakpoint is hit. To continue execution you simply return from the interrupt service routine.
I don't know about the Cortex M4 but for the Cortex M3 you'll have to set some special registers to enable that feature. Whenever a "BKPT" instruction is hit then interrupt #12 (*) is executed.
For code in RAM you simply write an BKPT instruction (0xBExx, e.g. 0xBEBE) to the address where you want to set your breakpoint. (Before writing you read out the value to be able to restore it later on).
For code in Flash the M3 has a "Flash patching unit" which allows you to specify up to three addresses which shall be read out as 0xBExx (0xBEBE ?) even if other data is stored there. This allows you to set up to 3 breakpoints in Flash.
Interesting for you: The register controlling the debug features in the M3 (named "DEMCR") also has a bit named "MON_STEP":
If you set this bit in interrupt handler #12 then exactly one instruction is executed after returning from the interrupt handler and interrupt #12 is triggered again. The use case for this feature is - of course - single-stepping code!
To stop single-stepping you'll have to clear the MON_STEP bit again...
Important 1:
I don't know if the MK20DX256VLH7 really has all these features. However because it is a Cortex M4 chip and the M4 should have nearly all features of the M3 these features should be present...
Important 2:
Implementing single-stepping and debugging is not done quickly. Assembly language knowledge will be very helpful and you'll need a lot of time...
From the doc, ...
You will not only need the documentation for the MK20DX256VLH7 from NXP but you'll also need the Cortex M4 documentation from ARM.
(*) Offset 4*12 in the vector table is meant here (which is named "IRQ(-4)" in some ARM documents); not IRQ12.

yes the ARM emulator/interpreter sounds exactly like what I want. Is there a free one?
qemu is open-source, most of it is GPLv2. https://wiki.qemu.org/License. You'd probably need to modify it a lot, because it's designed for use as a stand-alone wrapper for a whole Unix process (qemu-user) or whole machine (qemu-system).
I googled, and there's also http://www.unicorn-engine.org/ which is designed to be used as part of a larger program (written in C with bindings for calling from various languages). It's also GPLv2 (not LGPL), so you can use it if the rest of your code is also Free software.
It's actually based on the CPU-emulation code from QEMU; they stripped out all the device / BIOS emulation stuff to make a flexible library for just emulating CPUs.
Presumably you could configure some memory protections for it and set up the starting machine state, and let it run your function (with a return address that leads to some code that hands control back to your main code?)

How to correctly use a startup-ipi to start an application processor?

My goal is to let my own kernel start an application cpu. It uses the same mechanism as the linux kernel:
Send asserting and level triggered init-IPI
Wait...
Send deasserting and level triggered init-IPI
Wait...
Send up to two startup-IPIs with vector number (0x40000 >> 12) (the entry code for the application processor lies there)
Currently I'm just interested in making it work with QEMU. Unfortunately, instead of jumping to 0x40000, the application cpu jumps to 0x0 with the cs register set to 0x4000. (I checked with gdb).
The Intel MultiProcessor Specification (B.4.2) explains that the behavior that I noticed is valid if the target processor is halted immediately after RESET or INIT. But shouldn't this also apply to the code of the linux kernel? It sends the startup-IPI after the init-IPI. Or do I misunderstand the specification?
What can I do to have the application processor jump to 0x000VV000 and not to 0x0 with the cs register set to 0xVV00? I really can't see, where linux does something that changes the behavior.

It seems that I really misunderstood the specification: Since the application cpu is started in real mode 0x000VV000 is equivalent to 0xVV00:0x0000. It is not possible to represent the address just in the 16 bit ip register. Therefore a segment offset for the code segment is required.
Additionally, debugging real mode code with gdb is comparable complicated because it does not respect the segment offset. When required to see the disassembled code of the trampoline at the current position, it is necessary to calculate the physical location:
x/20i $eip+0xVV000
This makes gdb print the next 20 instructions at 0xVV00:$eip.

ARM modes: User and System

Can you explain how the ARM mode get changed in case of a system call handling?
I heard ARM mode change can happen only in privileged mode, but in case of a system call handling while the ARM is in user mode (which is a non-privileged mode), how does the ARM mode change?
Can anybody explain the whole action flow for the user mode case, and also more generally the system call handling (especially how the ARM mode change)?
Thanks in advance.

In the case of system calls on ARM, normally the system call causes a SWI instruction to be executed. Anytime the processor executes a SWI (software interrupt) instruction, it goes into SVC mode, which is privileged, and jumps to the SWI exception handler. The SWI handler then looks at the cause of the interrupt (embedded in the instruction) and then does whatever the OS programmer decided it should do. The other exceptions - reset, undefined instruction, prefetch abort, data abort, interrupt, and fast interrupt - all also cause the processor to enter privileged modes.
How file handling works is entirely up to whoever wrote your operating system - there's nothing ARM specific about that at all.

You need to get a copy of the ARM ARM (Architectural Reference Manual).
http://infocenter.arm.com -> ARM Architecture -> Reference Manuals -> ARMv5 Architectural Reference Manual then download the pdf.
It used to be a single ARM ARM for the ARM world but there are too many cores and starting to diverge so they split off the old one as ARMv5 ARM and made new Architectural Reference Manuals for each of the major ARM processor families.
In the Programmers Model chapter it talks about the modes, it says that you can change freely among the modes other than user. ARM startup code will often go through a series of mode changes so that the stack pointers, etc can be configured. Then as needed go back to System mode or User mode.
In that same chapter look at the Exceptions section, this describes the exceptions and what mode the processor switches to for each exception.
The Software interrupt exception which happens when an SWI instruction is executed, is a way to implement system calls. The processor is put in Supervisor mode and if in thumb mode switches to arm mode.
There needs to be code to support that exception handler of course. You need to verify with the operating system, if any, you are running, what is supported and what the calling convention is, etc.
Not all ARM processors work this way. The Cortex-M (ARMv7-M) does not have the same modes and same exception table, etc. As with any time you are using an ARM (at this level) you need to get the ARM ARM for the family you are using and you need to get the TRM (Techincal Reference Manual) for the core(s) you are using, ideally the exact revision, even if ARM marks the TRM as having been replaced by a newer version the chip manufacturer has purchased and uses a specific rev of the core and there can be enough differences between revs that you will want the correct manual.

When an SVC instruction is encountered by the PC, the following behaviour takes place:
The current status (CPSR) is saved (to the supervisor SPSR)
The mode is switched to supervisor mode
Normal (IRQ) interrupts are disabled
ARM mode is entered (if not already in use)
The address of the following instruction (the return address) is saved into the link register (R14) - It's worth noting that this is
the link register that belongs to the supervisor mode
The PC is altered to jump to address 0x00000008
An exception vector (just a branch instruction) should be at the address 0x0000008, which will branch the program to another area of code used to determine which supervisor call has been made.
Determining which supervisor call has been made is usually accomplished by loading the SVC instruction into a register (by offsetting the LR by one word - since the LR is still pointing to the instruction next to the supervisor call), bit clearing the last 8 bits and using the value in the remaining 24 bits of the register to calculate an offset in a jump table, to branch to the corresponding SVC code.
When the supervisor call code wishes to return to the user application, the processor needs to context switch back into user mode and return to the address contained within the LR (which is only available in supervisor mode, since certain registers are banked for both modes). This problem is overcome using the MOVS instruction, as illustrated below:
(consider this to also be your explanation on how to change mode)
MRS R0, CPSR ; load CPSR into R0
BIC R0, R0, #&1F ; clear mode field
ORR R0, R0, #&10 ; user mode code
MSR SPSR, R0 ; store modified CPSR into SPSR
MOVS PC, LR ; context switch and branch
The MRS and MSR instructions are used to transfer content between an ARM register and the CPSR or SPSR.
The MOVS instruction is a special instruction, which operates as a standard MOV instruction, but also sets the CPSR equal to the SPSR upon branching. This allows the processor to branch back (since we're moving the LR into the PC) and change mode to the mode specified by the SPSR.

I quote from the ARM documentation available here:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471c/BABDCIEH.html
When an exception is generated, the processor performs the following
actions:
Copies the CPSR into the appropriate SPSR. This saves the current mode, interrupt mask, and condition flags.
Switches state automatically if the current state does not match the instruction set used in the exception vector table.
Changes the appropriate CPSR mode bits to:
Change to the appropriate mode, and map in the appropriate banked out registers for that mode.
Disable interrupts. IRQs are disabled when any exception occurs. FIQs are disabled when an FIQ occurs and on reset.
Sets the appropriate LR to the return address.
Sets the PC to the vector address for the exception.
where, CPSR refers to Current Program Status Register and SPSR to Saved Program Status register used to restore the state of the process that was interrupted. Thus, as seen in point 3, the processor circuitry is designed in a way that the hardware itself changes the mode when user mode executes a Supervisor call instruction.

"I heard ARM mode change can happen only in privileged mode". You are partly right here. By partly I mean the control field of the CPSR register can be manually modified (manually modified means through code) in the privileged modes only not in the unprivileged mode (i.e. user mode). When a system call happens in the user mode it happens because of SWI instruction. An SWI instruction has inbuilt mechanism to change the mode to supervisor mode.
So to conclude , there are two ways to change the mode:
1) Explicitly through code. Only allowed in a privileged mode.
2) Implicitly through IRQ, FIQ, SWI, RESET, undefined instruction encountered, data abort, prefetch abort. This is allowed in all the modes.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight