I'm implementing a hypervisor on ARM and I need to know whether there's a way to resume (ERET) into the guest and trap back after a single instruction has executed, without depending on the debug architecture (v7.1). I could use a software approach that replaces the guest's next instruction with an HVC (the equivalent of VMCALL on Intel), but I don't know how to handle instructions that branch (the ARM analogue of JMP).
On Intel I could use either the trap flag (the TF bit in RFLAGS, part of the per-OS-thread context) or the monitor trap flag (MTF, a VT-x feature).
Thank you
EDIT: Clarifications
I want to avoid disassembly/emulation (as much as possible)
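To make the branching problem concrete, here is a minimal sketch of the HVC-patching idea (an assumption-laden illustration, not a known-good recipe), for an A32 guest; the HVC #0 encoding and all names are assumptions:
#include <stdint.h>
#define HVC0_A32 0xE1400070u  /* "HVC #0" A32 encoding (assumed from the ARMv7 virtualization extensions) */
static uint32_t saved_insn;
static uint32_t *patched_addr;
void arm_single_step(uint32_t *guest_pc)
{
    patched_addr = guest_pc + 1;   /* WRONG for B/BL/BX/LDR pc,... - the next PC is not guest_pc+4 */
    saved_insn   = *patched_addr;  /* remember the original instruction */
    *patched_addr = HVC0_A32;      /* trap into the hypervisor instead */
    /* clean/invalidate the caches for the patched line, then ERET;
       the HVC handler restores saved_insn and repeats the process */
}
This is exactly where the question stands: without decoding the instruction at guest_pc, you cannot place the HVC correctly for branches.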
Related
Let's say you have a pointer to a function whose source you do not have and which is "untrusted" because it might read/write a disallowed memory region.
Before it executes each assembly instruction, you want to verify that it doesn't access disallowed memory regions.
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
This is for a functionality that needs to be enabled not only during development but during normal runtime.
Ideally, it'd run something like this:
void (*fptr)(int);
fptr = &someFunction; // untrusted, don't have source
// enable interrupts for each assembly instruction
_EN_INT();
// call the function
fptr();
// every time the PC advances, some other code runs and verifies that any loads/stores executed do not touch a disallowed memory range
// disable interrupts for each assembly instruction
_DIS_INT();
QUESTION
Is it possible to call that function and pause execution after every assembly instruction?
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
My answer assumes that you can modify the "OS" the way you need it...
Cortex MK20DX256VLH7
This seems to be a Cortex M4 CPU.
how to single-step code on-target with no jtag, breakpoints
From the doc, it doesn't say whether you NEED an external debugger to resume execution.
If the CPU is really stopped, you'll definitely need an external signal (e.g. from a debugger).
However most CPUs support software debugging. This means that an interrupt service routine is executed whenever a breakpoint is hit. To continue execution you simply return from the interrupt service routine.
I don't know about the Cortex M4, but for the Cortex M3 you'll have to set some special registers to enable that feature. Whenever a "BKPT" instruction is hit, interrupt #12 (*) is executed.
For code in RAM you simply write a BKPT instruction (0xBExx, e.g. 0xBEBE) to the address where you want to set your breakpoint. (Before writing, read out the original value so you can restore it later.)
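In C that could look roughly like this (a sketch for ARMv7-M; the cache-maintenance comment matters on cores that have caches):
#include <stdint.h>
static uint16_t saved_op;
void set_ram_breakpoint(uint16_t *addr)
{
    saved_op = *addr;   /* keep the original halfword to restore later */
    *addr    = 0xBEBE;  /* Thumb BKPT #0xBE */
    /* on cores with caches, clean D-cache and invalidate I-cache here */
}
void clear_ram_breakpoint(uint16_t *addr)
{
    *addr = saved_op;
}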
For code in Flash the M3 has a "Flash Patch and Breakpoint" (FPB) unit which allows you to specify a handful of addresses (up to six instruction comparators in the full implementation) which shall be read out as a BKPT opcode (0xBExx) even if other data is stored there. This allows you to set a few breakpoints in Flash.
Interesting for you: The register controlling the debug features in the M3 (named "DEMCR") also has a bit named "MON_STEP":
If you set this bit in interrupt handler #12 then exactly one instruction is executed after returning from the interrupt handler and interrupt #12 is triggered again. The use case for this feature is - of course - single-stepping code!
To stop single-stepping you'll have to clear the MON_STEP bit again...
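Put together, a minimal sketch for an ARMv7-M core (register address and bit positions taken from the ARMv7-M ARM; DebugMon_Handler is the CMSIS name for handler #12):
#include <stdint.h>
#define DEMCR    (*(volatile uint32_t *)0xE000EDFCu)
#define MON_EN   (1u << 16)  /* enable the debug monitor exception */
#define MON_STEP (1u << 18)  /* single-step on exception return */
void enable_debug_monitor(void)
{
    DEMCR |= MON_EN;  /* BKPT now raises the DebugMonitor exception (#12) */
}
void DebugMon_Handler(void)
{
    /* inspect the stacked PC/registers of the interrupted code here ... */
    DEMCR |= MON_STEP;  /* execute exactly one instruction after return,
                           then re-enter this handler */
    /* clear MON_STEP here instead when you want to stop single-stepping */
}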
Important 1:
I don't know if the MK20DX256VLH7 really has all these features. However, because it is a Cortex-M4 chip, and the M4 has nearly all the features of the M3, these features should be present...
Important 2:
Implementing single-stepping and debugging is not done quickly. Assembly language knowledge will be very helpful and you'll need a lot of time...
From the doc, ...
You will not only need the documentation for the MK20DX256VLH7 from NXP but you'll also need the Cortex M4 documentation from ARM.
(*) This means offset 4*12 in the vector table (which is named "IRQ(-4)" in some ARM documents), not IRQ12.
yes the ARM emulator/interpreter sounds exactly like what I want. Is there a free one?
qemu is open-source, most of it is GPLv2. https://wiki.qemu.org/License. You'd probably need to modify it a lot, because it's designed for use as a stand-alone wrapper for a whole Unix process (qemu-user) or whole machine (qemu-system).
I googled, and there's also http://www.unicorn-engine.org/ which is designed to be used as part of a larger program (written in C with bindings for calling from various languages). It's also GPLv2 (not LGPL), so you can use it if the rest of your code is also Free software.
It's actually based on the CPU-emulation code from QEMU; they stripped out all the device / BIOS emulation stuff to make a flexible library for just emulating CPUs.
Presumably you could configure some memory protections for it and set up the starting machine state, and let it run your function (with a return address that leads to some code that hands control back to your main code?)
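As a rough illustration of that flow with Unicorn's C API (the addresses, the forbidden range, and the policy check are all made up for the example):
#include <unicorn/unicorn.h>
#include <stdio.h>
#define CODE_BASE 0x10000u  /* where we map the untrusted code (arbitrary) */
#define STACK_TOP 0x30000u
/* runs on every load/store the emulated code performs */
static void hook_mem(uc_engine *uc, uc_mem_type type, uint64_t addr,
                     int size, int64_t value, void *user_data)
{
    if (addr >= 0x20000 && addr < 0x21000) {  /* hypothetical forbidden range */
        printf("blocked access to 0x%llx\n", (unsigned long long)addr);
        uc_emu_stop(uc);
    }
}
int run_untrusted(const uint8_t *code, size_t len)
{
    uc_engine *uc;
    uc_hook hh;
    uint32_t sp = STACK_TOP;
    if (uc_open(UC_ARCH_ARM, UC_MODE_THUMB, &uc) != UC_ERR_OK)
        return -1;
    uc_mem_map(uc, CODE_BASE, 0x20000, UC_PROT_ALL);  /* code + data + stack */
    uc_mem_write(uc, CODE_BASE, code, len);
    uc_reg_write(uc, UC_ARM_REG_SP, &sp);
    uc_hook_add(uc, &hh, UC_HOOK_MEM_READ | UC_HOOK_MEM_WRITE,
                hook_mem, NULL, 1, 0);                /* hook all addresses */
    uc_emu_start(uc, CODE_BASE | 1, CODE_BASE + len, 0, 0);  /* |1 = Thumb */
    uc_close(uc);
    return 0;
}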
It is my understanding (from this article) that on ARM, the hypervisor/VMM runs in HYP mode, the guest OS runs in SVC mode, and user processes on the guest run in USR mode.
When there is a context switch in the guest OS, say switching from one user process to another, does this trap all the way up to the VMM in HYP mode? And if so, what happens at each stage of the process, going from USR to SVC to HYP modes?
Short answer: it depends on the hypervisor; the architecture permits both approaches.
A context switch on ARM would be switching the Page Table and invalidating the TLB.
To switch the Page Table, you need to modify the register TTBR0 (the user-space part) or TTBR1 (the kernel-space part; for Linux it normally never changes, but some exotic OS might be different), which are accessed via the "co-processor" instructions.
To set TTBR0 you use the instruction "MCR" (MRC is the read direction, MCR the write) with CRn = 2.
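For example, as a sketch in GCC inline assembly (ARMv7-A, privileged code; the encoding p15, 0, c2, c0, 0 is from the ARMv7-A manual):
static inline void set_ttbr0(unsigned long pt_base)
{
    asm volatile("mcr p15, 0, %0, c2, c0, 0" :: "r"(pt_base));  /* write TTBR0 */
    asm volatile("isb");  /* make the new translation base visible */
}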
Such coprocessor accesses can be trapped to HYP, but not necessarily; it depends on whether you request them to be trapped. This is configured in the "Hypervisor System Trap Register" (HSTR, or HSTR_EL2 on aarch64).
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0488d/CIHJFIHA.html
TLB invalidation instructions and cache maintenance operations are also implemented as coprocessor access instructions on ARMv7 (technically also on ARMv8, but the Architecture Reference Manual suggests using the human-readable mnemonics instead). For example, "TLBIALL" lives under coprocessor register CRn = 8, so you need to set bit T8 in HSTR_EL2.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438i/CIHECHCD.html
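A hedged sketch of the hypervisor side (AArch32, PL2; the HSTR encoding p15, 4, c1, c1, 3 is assumed from the ARMv7 manual), trapping both the TTBRx (T2) and TLB-maintenance (T8) groups:
static inline void trap_guest_mmu_ops(void)
{
    unsigned long hstr;
    asm volatile("mrc p15, 4, %0, c1, c1, 3" : "=r"(hstr));  /* read HSTR */
    hstr |= (1u << 2) | (1u << 8);                           /* T2 | T8 */
    asm volatile("mcr p15, 4, %0, c1, c1, 3" :: "r"(hstr));  /* write HSTR */
}
With these bits set, each guest MCR/MRC to CRn = 2 or CRn = 8 takes a hyp trap, where the VMM can emulate the operation and then resume the guest.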
Since the ARM processor supports both the ARM and Thumb-2 states (instruction sets), how do you set which state kernel-mode code runs in, such as the kernel code for interrupts and system calls?
In the v7 architecture*, bit 30 of the SCTLR (the TE bit) determines whether exceptions are taken in ARM or Thumb state; it defaults to ARM on reset unless an external signal says otherwise. For v6 and earlier, exceptions are always taken in ARM state.
In the context of actually writing an OS, if you wanted to write exception handlers in Thumb in a backwards-compatible way, I imagine you could also simply use an ARM interworking branch from the exception vector to the handler code itself. I'm no expert, though, so I can't say for sure there are no nasty pitfalls to this.
*and the in-between oddity that is the ARM1156
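For reference, a sketch of flipping that bit from privileged ARMv7-A code (the SCTLR encoding p15, 0, c1, c0, 0 is from the ARMv7-A manual; verify against your core's TRM):
static inline void take_exceptions_in_thumb(void)
{
    unsigned long sctlr;
    asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));  /* read SCTLR */
    sctlr |= (1ul << 30);                                     /* set the TE bit */
    asm volatile("mcr p15, 0, %0, c1, c0, 0" :: "r"(sctlr));  /* write SCTLR */
    asm volatile("isb");
}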
I'm currently reading/learning about ARM architecture ... and I was wondering why there are so many modes (FIQ, User, System, Supervisor, IRQ, ...).
My question is why do we need so many modes? Wouldn't just User and System be enough?
Thanks in advance.
It's just an architectural decision. The big advantage of the multiple modes is that they have some banked registers. Those extra registers allow you to write much less complicated exception routines.
If you were to pick only two, just USR and SYS are probably as good a choice as any, but what would happen when you took an exception? The normal ARM model is to go to an exception mode, set the banked link register for that exception mode to point to the instruction you want to return to after you resolve the exception, save the processor state in the exception mode's SPSR register, and then jump to the exception vector. USR and SYS share all their registers - using this model, you'd blow away your function return address (in LR) every time you took an interrupt!
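A minimal sketch of that model on classic ARM (e.g. ARM9) shows how the banked LR_irq and SPSR_irq do the heavy lifting; handle_irq is a hypothetical C function:
void handle_irq(void);
__attribute__((naked)) void irq_handler(void)
{
    asm volatile(
        "sub  lr, lr, #4       \n\t"  /* banked LR_irq holds return addr + 4 */
        "push {r0-r3, r12, lr} \n\t"  /* caller-saved regs + return address */
        "bl   handle_irq       \n\t"
        "pop  {r0-r3, r12, lr} \n\t"
        "movs pc, lr           \n\t"  /* return; restores CPSR from SPSR_irq */
    );
}
Because IRQ mode has its own LR and SPSR, none of this clobbers the interrupted code's registers.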
The FIQ mode in particular has even more banked registers than the other exception modes. Those extra registers are in keeping with the "F" part of FIQ - it stands for "Fast". Not having to save and restore more processor context in software will speed up your interrupt handler.
Not too much to add to Carl's answer. Not sure what family / architecture of ARM processors you're talking about, so I'll just assume based on your question (FIQ, IRQ, etc.) that you're talking about ARM7/9/11. I won't enumerate every difference between every mode in every ARM architecture variant.
In addition to what Carl said, a few other advantages of having different modes for different circumstances:
for example, in the FIQ, you don't have to branch off right away, you can just keep on executing (the FIQ vector is the last entry in the vector table, so the handler code can simply be placed there). With other exceptions you have to branch right away
with different modes, you have natural support for separate stacks (see the sketch after this list). If you're multitasking (e.g., RTOS) and you don't have a separate stack when you're in an interrupt mode, you have to build in extra space on each task stack for the worst-case interrupt situation
with different modes, certain registers (e.g. CPSR, MMU regs, etc. - depends on architecture) are off-limits. Same thing with certain instructions. You don't want to let user code modify privileged registers, now do you?
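A sketch of the separate-stacks point, as classic ARM startup code (mode numbers IRQ=0x12, FIQ=0x11, SVC=0x13 with IRQ/FIQ masked; the stack symbols are hypothetical):
__attribute__((naked)) void setup_mode_stacks(void)
{
    asm volatile(
        "msr cpsr_c, #0xD2        \n\t"  /* switch to IRQ mode */
        "ldr sp, =__irq_stack_top \n\t"  /* set its banked SP */
        "msr cpsr_c, #0xD1        \n\t"  /* switch to FIQ mode */
        "ldr sp, =__fiq_stack_top \n\t"
        "msr cpsr_c, #0xD3        \n\t"  /* back to Supervisor mode */
        "bx  lr                   \n\t"
        ".ltorg                   \n\t"  /* literal pool for the =symbol loads */
    );
}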
Why should an ARM controller return to ARM state from Thumb state when an exception occurs?
One explanation might be that ARM mode is the CPU's "native" operating mode, and that it's possible to do more operations in that mode than in the limited Thumb mode. The Thumb mode, as far as I've understood, is optimized for code size, which might mean it lacks certain instructions that perhaps are necessary in exception processing.
This page mentions that exception processing is always done in ARM mode. It doesn't provide any reasons why, so maybe it's just The Way It Is, by design. It does talk about ways to exit from exception processing back to the proper (ARM or Thumb) mode though, so as long as you're not writing the exception handler yourself, you might be able to ignore this issue. That, of course, assumes that your system is set up with a "default" exception handler that does retain the execution mode.
On the other hand, this page says this, about the interrupt vectors of the Cortex-M3 ARM implementation:
The LSB of each exception vector indicates whether the exception is to be executed in the Thumb state.
So it doesn't seem to be universally true, perhaps you can make your particular exception run in Thumb mode.
Perhaps it is because the interrupt vector table really holds ARM instructions, and processing them requires being in ARM mode. This reduces the programmer's job, since you don't have to write two handlers, one for ARM mode and one for Thumb mode. There is only one entry point per exception, so you can only have one instruction type to handle it. You can certainly switch to Thumb mode once the exception is entered, no different from switching to Thumb mode after a reset exception.
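To see why, here is a sketch of what a classic (pre-Cortex-M) ARM vector table looks like; every entry is an ARM instruction rather than an address, and all the handler symbols are hypothetical:
asm(
    ".section .vectors, \"ax\" \n"
    "ldr pc, =reset_handler    \n"  /* 0x00: reset */
    "ldr pc, =undef_handler    \n"  /* 0x04: undefined instruction */
    "ldr pc, =svc_handler      \n"  /* 0x08: SVC/SWI */
    "ldr pc, =pabt_handler     \n"  /* 0x0C: prefetch abort */
    "ldr pc, =dabt_handler     \n"  /* 0x10: data abort */
    "nop                       \n"  /* 0x14: reserved */
    "ldr pc, =irq_handler      \n"  /* 0x18: IRQ */
    "ldr pc, =fiq_handler      \n"  /* 0x1C: FIQ (or start the handler here) */
    ".ltorg                    \n"  /* literal pool for the =symbol loads */
);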
The Cortex-M3 has re-defined the interrupt vector table to be more traditional (holding addresses instead of instructions). By necessity, I would assume: the Cortex-M3 is a Thumb(2)-only processor, so either they re-define the vector table to hold Thumb instructions, or they re-define the table to hold addresses, or they include just enough of an ARM core to process the load or jump that you normally see in a vector table entry.
Basically you would either need two entries per exception, one for the ARM-based handler and one for the Thumb-based handler, or you require the user to write their handler with an entry point in one specific mode.
Even with a single-mode entry point into a handler, you still have to be aware of the mode the processor was in when the exception occurred, to know what address to return to and how to inspect the instruction that caused the exception.
It depends which CPU you have, as there are two Thumb instruction sets. The original Thumb instruction set (used in ARMv4T and ARMv5TE) lacked the instructions needed to deal with interrupts; the newer Thumb-2 set (in the Cortex series) has extra instructions so you can remain in Thumb-2 mode to service an interrupt routine.
Traditional ARM systems boot into ARM mode and jump to the reset exception vector after reset. This means that all exception vectors have to be written in ARM assembly. Since the exception vectors hold ARM instructions, the CPU is naturally forced to change to ARM mode before exception handling; if it did not, fetching the vector would raise an undefined-instruction exception, which would cause another one, and so on in an infinite loop.
Early ARM systems only had ARM instructions; the Thumb instructions were added later, which might be another explanation.