Armv7: return frome exception to Thumb Code - arm

since I have no luck with my question on arm community (see here), I also ask you the following question.
I am writing a OS targeted to ARM v7 processors, and I load executable in ELF32 format. If the LSB of the entry point address is set, it means that the entry point should be executed in Thumb mode. My question is to know whether a movs pc, lr will update the CPSR.T bit if the LSB of lr is set, or if I should first set SPSR.T bit by hand prior to performing my return from exception using movs pc, lr.
EDIT:
to be 100% clear, my question is about the very first scheduling of an application which entry point is in Thumb. There is no "old state" to return to.

The thing with initial entry to a lower exception level is that it's not actually any different from the normal case - sure, you haven't actually taken an exception, but in order to "return" you simply make things look exactly as if you had.
The LR on exception entry is a straight copy of the PC at the time (minus any preferred return address offset) thus the lsb would never be set - interworking only applies to branches, and exception entry/return is not a branch. If you'd taken an exception whilst executing Thumb code, the LR value would be the real halfword-aligned instruction address, and SPSR.T would be set, so that's the state you need to construct.

Related

Understanding Cortex-M assembly LDR with pc offset

I'm looking at the disassembly code for this piece of C code:
#define GPIO_PORTF_DATA_R (*((volatile unsigned long *)0x400253FC))
int main(void){
// Initialization code
while(1) {
SW1 = GPIO_PORTF_DATA_R&0x10; // Read PF4 into SW1
// Other code
SW2 = GPIO_PORTF_DATA_R&0x01;
}
}
The assembly for that SW1= line is (sorry can't copy code):
https://imgur.com/dnPHZrd
Here are my questions:
At the first line, PC = 0x00000A56, and PC + 92 = 0x00000AB2, which is not equal to 0x00000AB4, the number shown. Why?
I did a bit of research on SO and found out that PC actually points to the Next Next instruction to be executed.
When pc is used for reading there is an 8-byte offset in ARM mode and 4-byte offset in Thumb mode.
However 0x00000AB4 - 0x00000A56 = 0x5E = 94, neither does it match 92+8 or 92+4. Where did I get wrong?
Reference:
Strange behaviour of ldr [pc, #value]
Why does the ARM PC register point to the instruction after the next one to be executed?
LDR Rd,-Label vs LDR Rd,[PC+Offset]
From ARM documentation:
Operation
address = (PC[31:2] << 2) + (immed_8 * 4)
Rd = Memory[address, 4]
The pc is 0xA56+4 because of two instructions ahead and this is thumb so 4 bytes.
(0xA5A>>2)<<2 + (0x17*4)
or
(0x00000A5A&0xFFFFFFFC) + (0x17<<2)
0xA58+92=0xA64
This is an LDR so it is a word-based address ideally. Because the thumb instruction can be on a non-word aligned address, you start off by adding two instructions of course (thumb2 complicates this but add four for thumb). Then zero the lower two bits (LDR) the offset is in words so need to convert that to bytes, times four. This makes the encoding make more sense if you think about each part of it. In arm mode the PC is already word aligned so that step is not required (and in arm mode you have more bits for the immediate so it is byte-based not word-based), making the offset encoding between arm and thumb possibly confusing.
The various documents will show the math in different ways but it is the same math nevertheless. The PC is the only confusing part, especially for thumb. For ARM you add 8, two ahead, for thumb it is basically 4 because the execution cannot tell if there is a thumb2 coming, and it would break a great many things if they had attempted that. So add 4 for the two ahead, for thumb. Since thumb is compressed they do not use a byte offset but instead a word offset giving 4 times the range. Likewise this and/or other instructions can only look forward not back so unsigned offset. This is why you will get alignment errors when assembling things in thumb that in arm would just be unaligned (and you get what you get there depending on architecture and settings). Thumb cannot encode any address for an instruction like this.
For understanding instruction encoding, in particular pc based addressing, it is best to go back to the early ARM ARM (before the armv5 one but if not then just get the armv5 one) as well as the armv6-m and armv7-m and full sized armv7-ar. And look at the pseudo-code for each. The older one generally has the best pseudo-code, but sometimes they leave out the masking of lower bits of the address. No document is perfect, they have bugs just like everything else. Naturally the architecture tied to the core you are using is the official document for the IP the chip vendor used (even down to the specific version of the TRM as these can vary in incompatible ways from one to the next). But if that document is not perfectly clear you can sometimes get an idea from others that, upon inspection, have compatible instructions, architectural features.
You missed a key part of the rules for Thumb mode, quoted in one of the question you linked (Why does the ARM PC register point to the instruction after the next one to be executed?):
For all other instructions that use labels, the value of the PC is the address of the current instruction plus 4 bytes, with bit[1] of the result cleared to 0 to make it word-aligned.
(0xA56 + 4) & -4 = 0xA58 is the location that PC-relative things are relative to during execution of that ldr r0, [PC, #92]
((0xA56 + 4) & -4) + 92 = 0xab4, the location the disassembler calculated.
It's equivalent to do 0xA56 & -4 = 0xa54 then +4 + 92, because +4 doesn't modify bit #1; you can think of clearing it before or after adding that +4. But you can't clear the bit after adding the PC-relative offset; that can be unaligned for other instructions like ldrb. (Thumb-mode ldr encodes an offset in words to make better use of the limited number of bits, so the scaled offset and thus the final load address always have bits[1:0] clear.)
(Thanks to Raymond Chen for spotting this; I had also missed it initially!)
Also note that your debugger shows you a PC value when stopped at a breakpoint, but that's the address of the instruction you're stopped at. (Because that's how ARM exceptions work, I assume, saving the actual instruction to return to, not some offset.) During execution of the instruction, PC-relative stuff follows different rules. And the debugger doesn't "cook" this value to show what PC will be during its execution.
The rule is not "relative to the end of this / start of next instruction". Answers and comments stating that rule happen to get the right answer in this case, but would get the wrong answer in other Thumb cases like in LDR Rd,-Label vs LDR Rd,[PC+Offset] where the PC-relative load instruction happens to start at a 4-byte aligned address so bit #1 of PC is already cleared.
Your LDR is at address 0xA56 where bit #1 is set, so the rounding down has an effect. And your ldr instruction used a 2-byte encoding, not a Thumb2 32-bit instruction like you might need for a larger offset. Both of these things means round-down + 4 happens to be the address of the next instruction, rather than 2 instruction later or the middle of this instruction.
Since the program counter points to the next instruction, when it executes the LDR at address 0x00000A56, the program counter will be holding the address of the next instruction, which is 0x00000A58.
0x0A58 + 0x5C (decimal 92) == 0x00000AB4

Stacktrace on ARM cortex-M4

When I run into a fault handler on my ARM cortex-M4 (Thumb) I get a snapshot of the CPU register just before the fault occured. With this information I can find the stack pointer where it was. Now, what I want is to backtrace through all functions it passed. The only problem I see here is that I don't have a frame pointer, so I cannot really see where a certain subroutine has saved the LR, ad infinitum.
How would one tackle this problem if the frame pointer is not available in r7?
This blog post discusses this issue with reference to the MIPS architecture - the principles can be readily adapted to ARM architectures.
In short, it describes three possibilities for locating the stack frame for a given SP and PC:
Using compiler-generated debug information (not included in the executable image) to calculate it.
Using compiler-generated stack-unwinding (exception handling) information (included in the executable image) to calculate it.
Scanning the call site to locate the prologue or epilogue code that adjusts the stack pointer, and deducing the stack frame address from that.
Obviously it's very compiler- and compiler-option dependent, and not guaranteed to work in all cases.
R7 is not the frame pointer on the M4, it's R11. R7 is the FP for Cortex-M0+/M1 where only the lower registers are generally available. In anycase, when Cortex-M makes a call to a function using BL and variants, it saves the return address into LR (link register). At function entry, the LR is saved onto the stack. So in theory, to get a call trace, you would "chase" the chain of the LRs.
Unfortunately, the saved location of LR on the stack is not defined by the calling convention, and its location must be deduced from the debug info for that function entry in the DWARF records (in the .elf file). I do not know if there is an utility that would extract the LR locations from an ELF file, but it should not be too difficult.
Richard at ImageCraft is right.
More information can be found here
This works fine with C code. I had a harder applying it to C++ but it's not impossible.

Jump between Thumb and ARM

I am interested in the ARM and Thumb2 commands: LDR and LDR.W, PC, =ADDR for absolute jumping to a certain address.
For example, when I jump from ARM code to ARM, the command LDR PC, =ADDR is performed.
But what happens in the other scenarios?
from ARM to Thumb2
from Thumb2 to Thumb2
from Thumb2 to ARM
when is +1 needed to be added to the address? and why?
The rule is actually quite simple:
If bit 0 of the address is 0, the CPU will execute the code as ARM code after the next branch
If bit 0 of the address is 1, the CPU will execute the code as Thumb after the next branch
Of course if there is a mismatch, the CPU will certainly get a fault (After executing random code) because it has no way to check if the code is ARM or Thumb.
This is what explains the +1.
Note that depending on the compiler, and depending on the label used, bit 0 of the address may be automatically set by the compiler.
You need to just read the documentation.
The following instructions write a value to the PC, treating that value as an interworking address to branch
to, with low-order bits that determine the new instruction set state:
— BLX (register), BX , and BXJ
— LDR instructions with <Rt> equal to the PC
— POP and all forms of LDM except LDM (exception return), when the register list includes the PC
— in ARM state only, ADC , ADD , ADR , AND , ASR (immediate), BIC , EOR , LSL (immediate), LSR (immediate), MOV ,
MVN , ORR , ROR (immediate), RRX , RSB , RSC , SBC , and SUB instructions with <Rd> equal to the PC and without
flag-setting specified.
Since you mentioned thumb2 that means armv6 or newer. (did you say thumb2 and generically mean thumb?) and I believe the docs are telling us the above applies for armv6 and armv7.
Note that bit is consumed by the instruction, the pc doesnt carry around a set lsbit in thumb mode, it is just used by the instruction to indicate a mode change.
Also note you should think in terms of OR 1 not PLUS 1. If you write your code correctly the toolchain will supply you with the correct address with the correct lsbit, if you add a one to that address you will break the code, if you are paranoid or have not done it right you can OR a one to the address and if it has it there already no harm, if it doesnt then it fixes the problem that prevented it from being there. I would never use a plus one though with respect to switching to thumb mode.

ARM Program Counter distinguishing feature

How does the R15 of ARM differ from the general PC of a CPU?
Both of them are program counters only. What is the difference?
ARM's PC is more similar to a regular register with some restrictions than x86's IP is similar to a regular register.
Considering general PC is an Intel x86 based CPU, in x86's case you can't manipulate PC (Instruction pointer) directly but it is updated implicitly by provided control flow instructions.
In ARM's case historically Program Counter (PC), mapped as register at index 15 (16th register) can be manipulated directly via arithmetic instructions. For example you can add 16 to PC which would alter flow of instruction stream similar to a 16-byte forward jump instruction.
The ARM PC maybe more of a general register than most CPUs, but it is still very special. The traditional simple arithmetic instructions can use the PC as an input argument in many cases. Here it functions as a pointer or array base. It can also be used as the output for control transfer with these instructions. As a read-only value, it is useful for calculating return values in a PC-independent way. It is also useful to use as a constant table look-up in near-by code. For these cases, the PC is very much like a regular register. This is probably more common on many RISC CPUs as opposed to a CISC ISA.
However, when the PC is used as a destination (lvalue or updated and written), the behavior is often non-standard. Some examples of special cases (for some ARM architechure versions) for R15/PC are,
adcs - copies SPSR to CPSR
adds - copies SPSR to CPSR
ands - copies SPSR to CPSR
bics - copies SPSR to CPSR
bx r15 - highly discourage or not supported.
clz r15 - not supported.
mcr pXX, xx, r15,... - unpredictable
etc.
In most cases, using the PC as a destination of an instruction will have some special case. Especially, the use of the S (normally to set conditions codes) can be used to return from an exception. This might be used as some sort of veneer when returning from an exception or just a direct return. In some cases, the meaning of the instruction might change completely. For instance, ldm sp, {r0-r15}^ and ldm sp, {r0-r14}^ use different register banks; the first will load the registers according to the mode in the SPSR; whereas the 2nd will load the register to user mode.
For load/store, atomics, mode manipulation, co-processor and complex arithmetic (64 bit multiplies, etc) instructions, the PC is often unsupported or has a different meaning; the different meaning is often a mechanism for handling exceptions for system level code.

Unexpected warning on GNU ARM Assembler

I am writing some bare metal code for the Raspberry Pi and am getting an unexpected warning from the ARM cross assembler on Windows. The instructions causing the warnings were:
stmdb sp!,{r0-r14}^
and
ldmia sp!,{r0-r14}^
The warning is:
Warning: writeback of base register is UNPREDICTABLE
I can sort of understand this as although the '^' modifier tells the processor to store the user mode copies of the registers, it doesn't know what mode the processor will be in when the instruction is executed and there doesn't appear to be a way to tell it. I was a little more concerned to get the same warning for:
stmdb sp!,{r0-r9,sl,fp,ip,lr}^
and:
ldmia sp!,{r0-r9,sl,fp,ip,lr}^
despite the fact that I am explicitly not storing ANY sp register.
My concern is that, although I used to do a lot of assembler coding about 15 years ago, ARM code is new to me and I may be misunderstanding something! Also, if I can safely ignore the warnings, is there any way to suppress them?
The ARM Architecture Reference Manual says that writeback is not allowed in LDM/SMT of user registers. It is allowed in the exception return case, where pc is in the register list.
LDM (exception return)
LDM{<amode>}<c> <Rn>{!},<registers_with_pc>^
LDM (user registers)
LDM{<amode>}<c> <Rn>,<registers_without_pc>^
The term "writeback" refers not to the presence or absence of SP in the register list, but to the ! symbol which means the instruction is supposed to update the SP value with the end of transfer area address. The base register (SP) value will be used from the current mode, not the User mode, so you can still load or store user-mode SP value into your stack. From the ARM ARM B9.3.6 LDM (User registers):
In a PL1 mode other than System mode, Load Multiple (User registers)
loads multiple User mode registers from consecutive memory locations
using an address from a base register. The registers loaded cannot
include the PC. The processor reads the base register value normally,
using the current mode to determine the correct Banked version of the
register. This instruction cannot writeback to the base register.
The encoding diagram reflects this by specifying the bit 21 (W, writeback) as '(0)' which means that the result is unpredictable if the bit is not 0.
So the solution is just to not specify the ! and decrement or increment SP manually if necessary.

Resources