Explicitly accessing banked registers on ARM

According to the ARM manual, it should be possible to access the banked registers for a specific CPU mode as, for instance, "r13_svc". When I try to do this gcc yells at me with the following error:
immediate expression requires a # prefix -- `mov r2,sp_svc'
What's wrong?
Update. The following text from the ARM Architecture Reference Manual for ARMv5 and ARMv6 led me to believe that it is possible, section A2.4.2:
Registers R13 and R14 have six banked
physical registers each. One is used
in User and System modes, and each of
the remaining five is used in one of
the five exception modes. Where it is
necessary to be specific about which
version is being referred to, you use
names of the form: R13_mode
R14_mode where mode is the
appropriate one of usr, svc (for
Supervisor mode), abt, und, irq and
fiq.

The correct syntax for this is mrs r2, sp_svc or mrs r3, sp_usr. This is a new ARMv7 extension. The code can be seen in the ARM Linux KVM source file interrupt_head.S. The gas binutils patch that adds support for this instruction is by Matthew Gretton-Dann. It requires the virtualization extensions, as far as I understand.
According to what I understand, the LPAE (large physical address extension) implies the virtualization extensions. So Cortex-A7, Cortex-A12, Cortex-A15, and Cortex-A17 may be able to use this extension. However, the Cortex-A5, Cortex-A8, and Cortex-A9 cannot.
Documentation on the instruction can be found in the ARMv7-A Architecture Reference Manual, rev C, under section B9.3.9 MRS (Banked register).
For other Cortex-A (and ARMv6) CPUs you can use the cps instruction to switch modes, transfer the banked register to an un-banked register (R0-R7), and then switch back. The obvious difficulty is with user mode, which has no simple way back to the privileged modes. The correct way to handle this is with the ^ forms: stm rN, {sp,lr}^ to read the user-mode registers, or ldm rN, {sp,lr}^ to write them.
For all older CPUs, the information given by old_timer will work. Mainly, use mrs/msr to change modes. mrs/msr works over the full class of ARM CPUs but requires multiple instructions, and hence may have race issues that require interrupt and exception masking, depending on context.
These are important instructions (or instruction sequences) for context switching, which VMs do a lot of.

I don't think that's possible with the mov instruction, at least according to the ARM Architecture Reference Manual I'm reading. What document do you have? There is a variant of ldm that can load user mode registers from a privileged mode (using ^). Your only other option is to switch to SVC mode, do mov r2, sp, and then switch back to whatever other mode you were using.
The error you're getting is because it doesn't understand sp_svc, so it thinks you're trying to do an immediate mov, which would look like:
mov r2, #0x14
So that's why it says "requires a # prefix".

You use mrs and msr to change modes by changing bits in the cpsr, then use r13 normally.
From the ARM ARM:
MRS R0,CPSR
BIC R0,R0,#0x1F
ORR R0,R0,#0x13
MSR CPSR_c,R0
then
mov sp,#0x10000000
or if you need more bits in the immediate
ldr sp,=0x12345600
or if you don't want the assembler placing your data, you can place it yourself:
ldr sp,svc_stack
b 1f
svc_stack: .word 0x12345600
1:
You will see this in typical ARM startup code where the application is going to support interrupts, aborts and other exceptions: it sets all of the stack pointers it is going to need (change mode, set sp, change mode, set sp, change mode ...).
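The BIC/ORR pair above is just a read-modify-write of the 5-bit mode field in the CPSR. As a rough sketch of that arithmetic in C (the helper names cpsr_with_mode and cpsr_mode are made up for illustration; the mode encodings are the standard ARM ones):

```c
#include <stdint.h>

/* Mode field encodings for CPSR bits [4:0]. */
enum arm_mode {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F
};

#define CPSR_MODE_MASK 0x1Fu

/* Hypothetical helper: the same arithmetic as
 *   BIC r0, r0, #0x1F
 *   ORR r0, r0, #mode
 * All other CPSR bits (flags, interrupt masks, T bit) are left alone. */
uint32_t cpsr_with_mode(uint32_t cpsr, enum arm_mode mode)
{
    return (cpsr & ~CPSR_MODE_MASK) | (uint32_t)mode;
}

/* Hypothetical helper: extract the current mode field. */
enum arm_mode cpsr_mode(uint32_t cpsr)
{
    return (enum arm_mode)(cpsr & CPSR_MODE_MASK);
}
```

For example, a CPSR of 0x600001D2 (IRQ mode, interrupts masked) becomes 0x600001D3 (SVC mode) with everything else unchanged. On real hardware the new value still has to be written back with MSR CPSR_c from a privileged mode.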

Related

Very Basic ARM Assembly Questions (add, compare)

TLDR: What exactly does bx lr do?
I have trouble understanding these two following examples:
Add Example:
I understand that the code "add r0, r0, r1" adds r1 to r0 and stores the result in r0. What I do not understand is how the code "bx lr" knows to return r0 without explicitly stating r0.
Compare Example:
Same here: I understand that the code "BGT r0_Gt" checks whether r0 > r1 and, if this is true, skips to r0_gt. However, how does bx lr know how to return the correct value?
It is defined by the ABI in use; for ARM, this is the EABI, which states in section 5.4, "Result Return":
A Fundamental Data Type that is smaller than 4 bytes is zero- or sign-extended to a word and returned in r0.
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf
bx lr doesn't return any register at all. It just passes control back to the caller (at the address held in the lr register), without modifying any register other than pc.
The caller then knows, based on the calling convention, that on return, the return value will be in the r0 register (depending on the exact type of the return value and the platform's calling convention).
BX simply means branch and exchange: it does a branch and can switch modes between ARM and Thumb if that is supported on the architecture. LR is an alias for register 14; it's that simple. bx lr branches to the address in r14.
If you look at the bl instruction, you see that r14 will be set to the address after the bl instruction, which is the return address from a function call.
The pair bl something and then later bx lr (or mov pc,lr, which also works if you don't need to change modes and are in ARM mode) is how you make function calls in ARM.
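To make the bl / bx lr pairing concrete, here is a toy model in C. It is only a sketch under simplifying assumptions: every instruction is 4 bytes (ARM state), pc holds the address of the instruction being executed, and the struct and function names are invented for illustration.

```c
#include <stdint.h>

/* Toy register file: r[14] plays the role of lr, r[15] the role of pc.
 * Assumption of this sketch: pc is the address of the current
 * instruction and every instruction is 4 bytes long. */
struct toy_cpu {
    uint32_t r[16];
};

/* bl target: remember the address of the next instruction in lr,
 * then jump. */
void toy_bl(struct toy_cpu *c, uint32_t target)
{
    c->r[14] = c->r[15] + 4u;
    c->r[15] = target;
}

/* add r0, r0, r1: only r0 changes (plus pc stepping forward). */
void toy_add_r0_r0_r1(struct toy_cpu *c)
{
    c->r[0] += c->r[1];
    c->r[15] += 4u;
}

/* bx lr: jump to the address in lr. Bit 0 selects the instruction set
 * and is not part of the address, so it is cleared here. No other
 * register is touched; r0 simply still holds whatever was put there. */
void toy_bx_lr(struct toy_cpu *c)
{
    c->r[15] = c->r[14] & ~1u;
}
```

With pc = 0x8000, r0 = 2 and r1 = 3, calling toy_bl(&c, 0x9000), then toy_add_r0_r0_r1, then toy_bx_lr leaves pc at 0x8004 and r0 at 5. The "return value" is nothing more than the caller reading r0 because the calling convention says to.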
The processor has very little concept of context (in an abstract sense). It does not know where it came from, what the registers are for, or if it is in a function call/subroutine. The higher level languages and compiler do know this, and use some common standards to make things easier.
A very small number of operations do have a special, well defined purpose. A BL instruction not only updates the 'next instruction to execute' (otherwise known as PC or R15), but also magically updates R14 (the link register).
Exceptions (in V7-A) change a few of the banked core registers around, including the register which is usually used to access the stack, and the link register. This means that exceptions can happen without losing track of everything else that was going on. Cortex M does things differently, and actually uses the stack to help with the banking (setting R14 to a 'magic value' to indicate if the most recent call was an exception or not).
Unless an instruction interacts with specific registers, CPSR specifically, it probably doesn't care about the context. Some operations (related to security) will be restricted so they can only happen in privileged states. This is ultimately used to protect an operating system from the user applications, but usually these will relate to accessing very specific control registers.

Jump between Thumb and ARM

I am interested in the ARM and Thumb2 instructions LDR PC, =ADDR and LDR.W PC, =ADDR for absolute jumps to a certain address.
For example, when I jump from ARM code to ARM, the command LDR PC, =ADDR is performed.
But what happens in the other scenarios?
from ARM to Thumb2
from Thumb2 to Thumb2
from Thumb2 to ARM
When does +1 need to be added to the address, and why?
The rule is actually quite simple:
If bit 0 of the address is 0, the CPU will execute the code as ARM code after the next branch
If bit 0 of the address is 1, the CPU will execute the code as Thumb after the next branch
Of course, if there is a mismatch, the CPU will almost certainly fault (after executing random code), because it has no way to check whether the code is ARM or Thumb.
This is what explains the +1.
Note that depending on the compiler, and depending on the label used, bit 0 of the address may be automatically set by the compiler.
You need to just read the documentation.
The following instructions write a value to the PC, treating that value as an interworking address to branch
to, with low-order bits that determine the new instruction set state:
— BLX (register), BX , and BXJ
— LDR instructions with <Rt> equal to the PC
— POP and all forms of LDM except LDM (exception return), when the register list includes the PC
— in ARM state only, ADC , ADD , ADR , AND , ASR (immediate), BIC , EOR , LSL (immediate), LSR (immediate), MOV ,
MVN , ORR , ROR (immediate), RRX , RSB , RSC , SBC , and SUB instructions with <Rd> equal to the PC and without
flag-setting specified.
Since you mentioned Thumb2, that means ARMv6 or newer (or did you say Thumb2 and generically mean Thumb?), and I believe the docs are telling us the above applies to ARMv6 and ARMv7.
Note that the bit is consumed by the instruction; the PC doesn't carry around a set lsbit in Thumb mode. The bit is just used by the instruction to indicate a mode change.
Also note that you should think in terms of OR 1, not PLUS 1. If you write your code correctly, the toolchain will supply you with the correct address with the correct lsbit, and if you add a one to that address you will break the code. If you are paranoid or have not done it right, you can OR a one into the address: if the bit is already there, no harm done; if it isn't, the OR fixes whatever prevented it from being there. I would never use a plus one with respect to switching to Thumb mode.
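The OR-versus-PLUS point is easy to check with plain integer arithmetic. A minimal sketch in C (the function names are made up for illustration; the addresses are arbitrary):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of an interworking address selects the instruction set:
 * 1 means Thumb, 0 means ARM. */
bool selects_thumb(uint32_t target)
{
    return (target & 1u) != 0;
}

/* For a Thumb target, the address actually fetched from has bit 0
 * cleared; the bit is consumed by the branch and not kept in the PC. */
uint32_t fetch_address(uint32_t target)
{
    return target & ~1u;
}

/* Safe way to mark an address as a Thumb target: OR, not ADD.
 * Applying it twice gives the same result as applying it once. */
uint32_t mark_thumb(uint32_t target)
{
    return target | 1u;
}
```

If the toolchain already handed you 0x8001, then mark_thumb(0x8001) is still 0x8001, whereas 0x8001 + 1 is 0x8002: bit 0 is now clear, so the branch would select ARM state and land 2 bytes past the intended function.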

ARM Program Counter distinguishing feature

How does the R15 of ARM differ from the general PC of a CPU?
Both of them are program counters only. What is the difference?
ARM's PC is much closer to a regular register (with some restrictions) than x86's IP is.
Taking an Intel x86 CPU as the "general" case: on x86 you can't manipulate the PC (instruction pointer) directly; it is updated implicitly by the control flow instructions provided.
In ARM's case historically Program Counter (PC), mapped as register at index 15 (16th register) can be manipulated directly via arithmetic instructions. For example you can add 16 to PC which would alter flow of instruction stream similar to a 16-byte forward jump instruction.
The ARM PC may be more of a general register than on most CPUs, but it is still very special. The traditional simple arithmetic instructions can use the PC as an input argument in many cases. Here it functions as a pointer or array base. It can also be used as the output for control transfer with these instructions. As a read-only value, it is useful for calculating return addresses in a position-independent way. It is also useful as the base for constant table look-ups in nearby code. For these cases, the PC is very much like a regular register. This is probably more common on many RISC CPUs as opposed to a CISC ISA.
However, when the PC is used as a destination (lvalue or updated and written), the behavior is often non-standard. Some examples of special cases (for some ARM architecture versions) for R15/PC are,
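One detail worth knowing when the PC is used as an input: in ARM state, reading the PC yields the address of the current instruction plus 8, not the instruction's own address. A small sketch of that arithmetic in C (function names are invented for illustration, and only ARM state is modelled; Thumb uses a different offset):

```c
#include <stdint.h>

/* Value seen when an ARM-state instruction at instr_addr reads the PC. */
uint32_t arm_pc_read(uint32_t instr_addr)
{
    return instr_addr + 8u;
}

/* Where execution continues after "add pc, pc, #imm" located at
 * instr_addr: the PC read value plus the immediate. */
uint32_t arm_add_pc_imm(uint32_t instr_addr, uint32_t imm)
{
    return arm_pc_read(instr_addr) + imm;
}
```

So add pc, pc, #16 at address 0x8000 continues at 0x8018, which is 16 bytes past the value the instruction read for the PC, and the same offset has to be accounted for in PC-relative table look-ups.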
adcs - copies SPSR to CPSR
adds - copies SPSR to CPSR
ands - copies SPSR to CPSR
bics - copies SPSR to CPSR
bx r15 - highly discouraged or not supported.
clz r15 - not supported.
mcr pXX, xx, r15,... - unpredictable
etc.
In most cases, using the PC as the destination of an instruction will have some special case. In particular, the S suffix (normally used to set condition codes) can be used to return from an exception. This might be used as some sort of veneer when returning from an exception, or just as a direct return. In some cases, the meaning of the instruction might change completely. For instance, ldm sp, {r0-r15}^ and ldm sp, {r0-r14}^ use different register banks: the first will load the registers according to the mode in the SPSR, whereas the second will load the user mode registers.
For load/store, atomics, mode manipulation, co-processor and complex arithmetic (64 bit multiplies, etc) instructions, the PC is often unsupported or has a different meaning; the different meaning is often a mechanism for handling exceptions for system level code.

For NEON coding on ARM AArch64, how do you push registers to the stack? It seems STMFD is not part of the AArch64 instruction set.

For NEON coding on ARM AArch64, how do you push registers to the stack? It seems STMFD is not part of the AArch64 instruction set. Do you just save the register pairs onto the stack one by one?
AArch64 designers deliberately removed the STM/LDM instructions, presumably to simplify instruction scheduling and fault handling.
3.5 Memory Load-Store
3.5.1 Bulk Transfers
The LDM, STM, PUSH and POP instructions do not exist in A64, however bulk transfers can be constructed using the LDP
and STP instructions which load and store a pair of independent
registers from consecutive memory locations, and which support
unaligned addresses when accessing normal memory. The LDNP and STNP
instructions additionally provide a “streaming” or “non-temporal” hint
that the data does not need to be retained in caches. The PRFM
(prefetch memory) instructions also include hints for “streaming” or
“non-temporal” accesses, and allow targeting of a prefetch to a
specific cache level.
(from ARMv8 ISA Overview)
So yes, you're supposed to use multiple STP/LDP instructions instead.
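As a rough model of what a pair-wise push and pop looks like, here is a sketch in C. It mimics stp xA, xB, [sp, #-16]! and ldp xA, xB, [sp], #16 on a small descending stack; for NEON Q registers the same pattern applies with 32 bytes per pair. The struct, the stack size and the function names are all assumptions made for illustration, and no bounds checking is done.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64 /* hypothetical stack size, in 8-byte words */

struct toy_stack {
    uint64_t mem[STACK_WORDS];
    size_t sp; /* index of the lowest used word; STACK_WORDS when empty */
};

void toy_stack_init(struct toy_stack *s)
{
    s->sp = STACK_WORDS;
}

/* Models: stp xA, xB, [sp, #-16]!
 * Pre-decrement by one pair (two 8-byte words), then store both
 * registers to consecutive locations. Moving two words at a time keeps
 * sp a multiple of 16 bytes. */
void toy_stp_push(struct toy_stack *s, uint64_t a, uint64_t b)
{
    s->sp -= 2;
    s->mem[s->sp] = a;
    s->mem[s->sp + 1] = b;
}

/* Models: ldp xA, xB, [sp], #16
 * Load both registers, then post-increment by one pair. */
void toy_ldp_pop(struct toy_stack *s, uint64_t *a, uint64_t *b)
{
    *a = s->mem[s->sp];
    *b = s->mem[s->sp + 1];
    s->sp += 2;
}
```

Pushing (1, 2) and then (3, 4) and popping twice gives back (3, 4) and then (1, 2), which is the order you would write the matching stp and ldp sequences in a function prologue and epilogue.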

Unexpected warning on GNU ARM Assembler

I am writing some bare metal code for the Raspberry Pi and am getting an unexpected warning from the ARM cross assembler on Windows. The instructions causing the warnings were:
stmdb sp!,{r0-r14}^
and
ldmia sp!,{r0-r14}^
The warning is:
Warning: writeback of base register is UNPREDICTABLE
I can sort of understand this as although the '^' modifier tells the processor to store the user mode copies of the registers, it doesn't know what mode the processor will be in when the instruction is executed and there doesn't appear to be a way to tell it. I was a little more concerned to get the same warning for:
stmdb sp!,{r0-r9,sl,fp,ip,lr}^
and:
ldmia sp!,{r0-r9,sl,fp,ip,lr}^
despite the fact that I am explicitly not storing ANY sp register.
My concern is that, although I used to do a lot of assembler coding about 15 years ago, ARM code is new to me and I may be misunderstanding something! Also, if I can safely ignore the warnings, is there any way to suppress them?
The ARM Architecture Reference Manual says that writeback is not allowed in LDM/STM of user registers. It is allowed in the exception return case, where pc is in the register list.
LDM (exception return)
LDM{<amode>}<c> <Rn>{!},<registers_with_pc>^
LDM (user registers)
LDM{<amode>}<c> <Rn>,<registers_without_pc>^
The term "writeback" refers not to the presence or absence of SP in the register list, but to the ! symbol which means the instruction is supposed to update the SP value with the end of transfer area address. The base register (SP) value will be used from the current mode, not the User mode, so you can still load or store user-mode SP value into your stack. From the ARM ARM B9.3.6 LDM (User registers):
In a PL1 mode other than System mode, Load Multiple (User registers)
loads multiple User mode registers from consecutive memory locations
using an address from a base register. The registers loaded cannot
include the PC. The processor reads the base register value normally,
using the current mode to determine the correct Banked version of the
register. This instruction cannot writeback to the base register.
The encoding diagram reflects this by specifying the bit 21 (W, writeback) as '(0)' which means that the result is unpredictable if the bit is not 0.
So the solution is just to not specify the ! and decrement or increment SP manually if necessary.
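How much to adjust SP by follows directly from the register list: each register transferred takes 4 bytes. A small sketch of that bookkeeping in C (the function names are invented for illustration; the register list is represented as the usual 16-bit mask with bit n standing for rn):

```c
#include <stdint.h>

/* Number of registers named in a 16-bit LDM/STM register list mask. */
unsigned reglist_count(uint16_t reglist)
{
    unsigned n = 0;
    while (reglist != 0) {
        n += reglist & 1u;
        reglist >>= 1;
    }
    return n;
}

/* Bytes transferred by an ARM-state LDM/STM: 4 per register. */
uint32_t reglist_bytes(uint16_t reglist)
{
    return 4u * reglist_count(reglist);
}

/* The SP value that writeback on a decrement-before store would have
 * produced, i.e. what a manual adjustment of SP has to reach. */
uint32_t sp_after_push(uint32_t sp, uint16_t reglist)
{
    return sp - reglist_bytes(reglist);
}
```

For {r0-r14} the mask is 0x7FFF, which is 15 registers or 60 bytes, so the manual adjustment is 60 in whichever direction the missing writeback would have moved SP.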

Resources