What is SP0 and SPn in AArch64? - arm

If the exception is being taken at the same Exception level, the stack pointer to be used (SP0 or SPn)
In a document of AArch64 exception vector table, an entry is selected based off the factor above. I couldn't find any useful information regarding SP0 and SPn, so I'd like to ask why there are two stack pointers and what's the difference between them? A link is also appreciated!

Disclaimer: I am not an expert on the Armv8-a architecture, I have just been writing some bare-metal code dealing with exceptions on a Cortex-A53 for the purpose of learning.
The document you are pointing to explains, although succinctly, that:
There is one stack pointer per exception level, i.e. SP_EL0 for EL0, SP_EL1 for EL1, SP_EL2 for EL2 and SP_EL3 for EL3,
When you execute code at a given exception level, the stack pointer that will be used for storing the exception context and by the exception handler to access the saved context will depend on the value of the SPSel system register at the time the exception occurred.
From the Arm documentation for the SPSel system register:
Bits [63:1] Reserved, RES0.
SP, bit [0] Stack pointer to use. Possible values of this bit are:
0b0 Use SP_EL0 at all Exception levels.
0b1 Use SP_ELx for Exception level ELx.
The reset behaviour of this field is: On a Warm reset, this field resets to 1.
Using the stack pointer dedicated to a given exception level helps isolate the code executing at, say, EL3 from the less-privileged code executing at EL2..EL0, since a different memory area can be used for the stack of each exception level.
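As a rough illustration of the SPSel rule quoted above, the following C sketch models which SP_ELn register the name SP refers to, and which of the two same-level vector groups would be selected. The function and enum names are made up for this example. The vector offsets (0x000 for the SP_EL0 group, 0x200 for the SP_ELx group, 0x80 per entry) are the usual AArch64 vector table layout and are not stated in this answer, so treat them as an assumption to check against the Arm documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns n such that "SP" means SP_ELn. */
unsigned selected_sp_el(uint64_t spsel, unsigned current_el)
{
    assert(current_el <= 3);
    if (current_el == 0)
        return 0;                       /* EL0 can only use SP_EL0 */
    return (spsel & 1u) ? current_el    /* SPSel.SP = 1: SP_ELx    */
                        : 0;            /* SPSel.SP = 0: SP_EL0    */
}

enum exc_kind { EXC_SYNC = 0, EXC_IRQ = 1, EXC_FIQ = 2, EXC_SERROR = 3 };

/* Offset from VBAR_ELx of the vector used when the exception is taken
 * from the same exception level (assumed standard table layout). */
unsigned same_el_vector_offset(uint64_t spsel, enum exc_kind kind)
{
    unsigned group = (spsel & 1u) ? 0x200u : 0x000u;  /* SPn vs SP0 group */
    return group + 0x80u * (unsigned)kind;
}
```

For instance, with SPSel.SP = 1 at EL1, an IRQ taken from EL1 would use the entry at VBAR_EL1 + 0x280 under these assumptions.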
If you are writing your own bare-metal code, the value to set in SPSel is ultimately your choice. For example, this is how to read the value in effect when a program is started from a standard Arm Trusted Firmware (code running at EL3)/U-Boot (code running at EL2) bundle on an Allwinner H6 Cortex-A53:
SPSel.s:
.global _start
.align 3
.text
_start:
mrs x0, SPSel
ret
Building:
/opt/arm/11/gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf/bin/aarch64-none-elf-gcc -nostartfiles -nostdlib --freestanding -Wl,--section-start=.text=0x40080000 -o SPSel.elf SPSel.s
/opt/arm/11/gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf/bin/aarch64-none-elf-objcopy -O srec SPSel.elf SPSel.srec
/opt/arm/11/gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf/bin/aarch64-none-elf-objdump -D -j .text SPSel.elf > SPSel.lst
Executing:
=> loads
## Ready for S-Record download ...
## First Load Addr = 0x40080000
## Last Load Addr = 0x40080007
## Total Size = 0x00000008 = 8 Bytes
## Start Addr = 0x40080000
=> go 0x40080000
## Starting application at 0x40080000 ...
## Application terminated, rc = 0x0
=>
The value of SPSel returned in x0 is 0, i.e. the SP bit of SPSel is 0, and SP_EL0 is therefore the stack-pointer register that will be used at all EL3..EL0 exception levels.

Related

Are ARM Cortex-M0 Stacking Registers Saved On $psp or $msp During Hardfault?

I have an issue where my Cortex-M0 is hard faulting, so I am trying to debug it. I am trying to print the contents of the ARM core registers that were pushed to the stack when the hard fault occurred.
Here is my basic assembly code:
__attribute__((naked)) void HardFaultVector(void) {
asm volatile(
// check LR to see whether the process stack or the main stack was being used at time of exception.
"mov r2, lr\n"
"mov r3, #0x4\n"
"tst r2, r3\n"
"beq _MSP\n"
//process stack was being used.
"_PSP:\n"
"mrs r0, psp\n"
"b _END\n"
//main stack was being used.
"_MSP:\n"
"mrs r0, msp\n"
"b _END\n"
"_END:\n"
"b fault_handler\n"
);
}
The function fault_handler will print the contents of the stack frame that was pushed to either the process stack or the main stack. Here's my question though:
When I print the contents of the stack frame that supposedly has the saved registers, here is what I see:
Stack frame at 0x20000120:
pc = 0xfffffffd; saved pc 0x55555554
called by frame at 0x20000120, caller of frame at 0x20000100
Arglist at unknown address.
Locals at unknown address, Previous frame's sp is 0x20000120
Saved registers:
r0 at 0x20000100, r1 at 0x20000104, r2 at 0x20000108, r3 at 0x2000010c, r12 at 0x20000110, lr at 0x20000114, pc at 0x20000118, xPSR at 0x2000011c
You can see the saved registers, these are the registers that are pushed by the ARM core when a hard fault occurs. You can also see the line pc = 0xfffffffd; which indicates that this is the LR's EXC_RETURN value. The value 0xfffffffd indicates to me that the process stack was being used at the time of the hard fault.
If I print the $psp value, I get the following:
gdb $ p/x $psp
$91 = 0x20000328
If I print the $msp value, I get the following:
gdb $ p/x $msp
$92 = 0x20000100
You can clearly see that the $msp is pointing to the top of the stack where supposedly the saved registers are located. Doesn't this mean that the main stack has the saved registers that the ARM core pushed to the stack?
If I print the memory contents, starting at the $msp address, I get the following:
gdb $ x/8xw 0x20000100
0x20000100 <__process_stack_base__>: 0x55555555 0x55555555 0x55555555 0x55555555
0x20000110 <__process_stack_base__+16>: 0x55555555 0x55555555 0x55555555 0x55555555
It's empty...
Now, if I print the memory contents, starting at the $psp address, I get the following:
gdb $ x/8xw 0x20000328
0x20000328 <__process_stack_base__+552>: 0x20000860 0x00000054 0x00000054 0x20000408
0x20000338 <__process_stack_base__+568>: 0x20000828 0x08001615 0x1ad10800 0x20000000
This looks more accurate. But I thought the saved registers are supposed to indicate where in flash memory they are located? So how does this make sense?
The comments by old_timer under your question are all correct. The registers will be pushed to the active stack on exception entry, whether this is PSP or MSP at the time. By default, all code uses the main stack (MSP), but if you're using anything other than complete bare metal it's likely that whatever kernel you're using has switched Thread mode to using the process stack (PSP).
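Which stack received the hardware-pushed frame can be read from the EXC_RETURN value found in LR on handler entry, which is what the bit-2 test in the question's handler does. A minimal C sketch of that decoding, assuming only the three EXC_RETURN values a Cortex-M0 produces (0xFFFFFFF1, 0xFFFFFFF9, 0xFFFFFFFD); the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the exception frame was pushed to the process stack (PSP). */
bool exc_return_used_psp(uint32_t exc_return)
{
    /* Sanity check: Cortex-M0 EXC_RETURN values all look like 0xFFFFFFFx. */
    assert((exc_return & 0xFFFFFFF0u) == 0xFFFFFFF0u);
    return (exc_return & (1u << 2)) != 0;   /* bit 2: 0 = MSP, 1 = PSP */
}

/* True if the interrupted code was running in Thread mode. */
bool exc_return_from_thread_mode(uint32_t exc_return)
{
    return (exc_return & (1u << 3)) != 0;   /* bit 3: 0 = Handler, 1 = Thread */
}
```

With the value 0xfffffffd from the question, both functions return true: Thread mode was interrupted and the frame went to the process stack.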
Most of your investigations suggest that the PSP was in use, with your memory peek around the PSP and MSP being pretty much indisputable. The only bit of evidence you have for it having been the MSP is the results of the fault_handler function, for which you have not posted the source; so my first guess would be that this function is broken in some way.
Do also remember that one common reason for entering the HardFault handler is that another exception handler has caused an exception. This can easily happen in cases of memory corruption. In these cases (assuming Thread mode uses the PSP) the CPU will first enter Handler mode in response to the original exception, pushing r0-r3,r12,lr,pc,psr to the process stack. It will start executing the original exception handler, then fault again, pushing r0-r3,r12,lr,pc,psr to the main stack while entering the HardFault handler. There's often some unravelling to do.
old_timer also mentions using real assembly language, and I agree here too. Even though the ((naked)) attribute should be removing the prologue and epilogue (between them most of the possible 'compilerisms'), your code would simply be far more readable if it was written in bare assembly language. Inline assembly language has its uses, for example if you want to do something very low-level that you can't do from C but you want to avoid a call-return overhead. But when your entire function is written in assembly language, there's no reason to use it.

Stacktrace on ARM cortex-M4

When I run into a fault handler on my ARM Cortex-M4 (Thumb) I get a snapshot of the CPU registers just before the fault occurred. With this information I can find where the stack pointer was. Now, what I want is to backtrace through all the functions it passed through. The only problem I see here is that I don't have a frame pointer, so I cannot really see where a certain subroutine has saved the LR, ad infinitum.
How would one tackle this problem if the frame pointer is not available in r7?
This blog post discusses this issue with reference to the MIPS architecture - the principles can be readily adapted to ARM architectures.
In short, it describes three possibilities for locating the stack frame for a given SP and PC:
Using compiler-generated debug information (not included in the executable image) to calculate it.
Using compiler-generated stack-unwinding (exception handling) information (included in the executable image) to calculate it.
Scanning the call site to locate the prologue or epilogue code that adjusts the stack pointer, and deducing the stack frame address from that.
Obviously it's very compiler- and compiler-option dependent, and not guaranteed to work in all cases.
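The third option (scanning the prologue) could look roughly like the C sketch below. It is deliberately limited: it only recognises the 16-bit Thumb encodings of PUSH and SUB SP, #imm, whereas real Cortex-M4 code may also use 32-bit Thumb-2 forms or adjust SP in other ways. The function name is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of stack claimed by a prologue consisting only of 16-bit Thumb
 * PUSH {...} and SUB SP, #imm instructions. Scanning stops at the first
 * instruction that is neither. *lr_saved is set if LR was pushed. */
unsigned thumb16_prologue_stack_bytes(const uint16_t *code, size_t count,
                                      int *lr_saved)
{
    assert(code != NULL && lr_saved != NULL);
    unsigned bytes = 0;
    *lr_saved = 0;

    for (size_t i = 0; i < count; i++) {
        uint16_t insn = code[i];
        if ((insn & 0xFE00u) == 0xB400u) {            /* PUSH {r0-r7[, lr]} */
            unsigned regs = insn & 0x01FFu;           /* bit 8 is LR        */
            for (; regs != 0; regs &= regs - 1)       /* 4 bytes per register */
                bytes += 4;
            if (insn & 0x0100u)
                *lr_saved = 1;
        } else if ((insn & 0xFF80u) == 0xB080u) {     /* SUB SP, SP, #imm7*4 */
            bytes += (insn & 0x007Fu) * 4u;
        } else {
            break;                                    /* end of recognised prologue */
        }
    }
    return bytes;
}
```

For a prologue of `push {r4, lr}` (0xB510) followed by `sub sp, #8` (0xB082), the function reports 16 bytes with LR saved; in that particular shape the saved LR would sit at SP + 12 inside the function body.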
R7 is not the frame pointer on the M4, it's R11. R7 is the FP for Cortex-M0+/M1 where only the lower registers are generally available. In any case, when Cortex-M makes a call to a function using BL and variants, it saves the return address into LR (link register). At function entry, the LR is saved onto the stack. So in theory, to get a call trace, you would "chase" the chain of the LRs.
Unfortunately, the saved location of LR on the stack is not defined by the calling convention, and its location must be deduced from the debug info for that function entry in the DWARF records (in the .elf file). I do not know if there is a utility that would extract the LR locations from an ELF file, but it should not be too difficult.
Richard at ImageCraft is right.
More information can be found here
This works fine with C code. I had a harder time applying it to C++ but it's not impossible.

Find which instruction caused a trap on Cortex M3

I am currently debugging a hard fault trap which turned out to be a precise data bus error on a STM32F205 Cortex-M3 processor, using Keil uVision. Due to a lengthy debugging and googling process I found the assembly instruction that caused the trap. Now I am looking for a way to avoid this lengthy process next time a trap occurs.
In the application note 209 by Keil it says:
PRECISEERR: Precise data bus error:
0 = no precise data bus error
1 = a data bus error has occurred, and the PC value stacked for the exception return points to the instruction that caused the fault. When the processor sets this bit it writes the faulting address to SCB->BFAR
and also this:
An exception saves the state of registers R0-R3, R12, PC & LR either the Main Stack or the Process Stack (depends on the stack in use when the exception occurred).
I interpret the last quote to mean that there should be 7 registers plus the respective stack. When I look up my SP address in memory, I see the address of the instruction that caused the error at an address 10 words higher than the stack pointer address.
My questions are:
Is the address of the instruction that caused the trap always saved 10 words higher than the current stack pointer? And could you please point out a document where I can read up on how and why this is?
Is there another register that would contain this address as well?
As you said, exceptions (or interrupts) on ARM Cortex-M3 will automatically stack some registers, namely:
Program Counter (PC)
Processor Status Register (xPSR)
r0-r3
r12
Link Register (LR).
For a total of 8 registers (reference : Cortex™-M3 Technical Reference Manual, Chapter 5.5.1).
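The layout of those 8 words can be written down as a C struct. This is a sketch assuming the basic frame (no floating-point state, no extra alignment word), in the order shown by the debugger output earlier on this page (r0 at the lowest address, xPSR at the highest); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame as pushed by the hardware, lowest address first. */
struct exception_frame {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code            */
    uint32_t pc;    /* return address / faulting instruction */
    uint32_t xpsr;
};

static_assert(sizeof(struct exception_frame) == 32, "frame is 8 words");

/* Read the stacked PC given a pointer to the start of the frame. */
uint32_t stacked_pc(const uint32_t *frame_base)
{
    const struct exception_frame *f =
        (const struct exception_frame *)frame_base;
    return f->pc;
}
```

The stacked PC is therefore 6 words above the start of the frame. If it shows up further away from the current SP, the difference is whatever the handler itself pushed after entry.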
Now, if you write your exception handler in a language other than assembly, the compiler may stack additional registers, and you can't be sure how many.
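One simple solution is to add a small piece of code before the real handler, to pass it the address of the auto-stacked registers:
void myRealHandler(void *stack);
/* naked: no compiler prologue, so sp still points at the hardware-stacked frame */
__attribute__((naked)) void handler(void) {
    asm volatile(
        "mov r0, sp\n"          /* first argument = address of the stacked registers */
        "b myRealHandler\n"     /* tail-jump into the C handler */
    );
}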
The register BFAR is specific to bus faults. It will contain the faulty address, not the address of the faulty instruction. For example, if there was an error reading an integer at address 0x30005632, BFAR will be set to 0x30005632.
The precise stack location of the return address depends on how much stack the interrupt handler requires. If you look at the disassembly of your HardFault_Handler, you should be able to see how much data is stored on the stack and how many registers are pushed in addition to those pushed by the hardware interrupt machinery (R0-R3, R12, PC, LR & PSR).
I found this to be a pretty good idea on how to debug Hard Faults, though it requires a bit of inline assembly.

Unexpected warning on GNU ARM Assembler

I am writing some bare metal code for the Raspberry Pi and am getting an unexpected warning from the ARM cross assembler on Windows. The instructions causing the warnings were:
stmdb sp!,{r0-r14}^
and
ldmia sp!,{r0-r14}^
The warning is:
Warning: writeback of base register is UNPREDICTABLE
I can sort of understand this as although the '^' modifier tells the processor to store the user mode copies of the registers, it doesn't know what mode the processor will be in when the instruction is executed and there doesn't appear to be a way to tell it. I was a little more concerned to get the same warning for:
stmdb sp!,{r0-r9,sl,fp,ip,lr}^
and:
ldmia sp!,{r0-r9,sl,fp,ip,lr}^
despite the fact that I am explicitly not storing ANY sp register.
My concern is that, although I used to do a lot of assembler coding about 15 years ago, ARM code is new to me and I may be misunderstanding something! Also, if I can safely ignore the warnings, is there any way to suppress them?
The ARM Architecture Reference Manual says that writeback is not allowed in LDM/STM of user registers. It is allowed in the exception return case, where pc is in the register list.
LDM (exception return)
LDM{<amode>}<c> <Rn>{!},<registers_with_pc>^
LDM (user registers)
LDM{<amode>}<c> <Rn>,<registers_without_pc>^
The term "writeback" refers not to the presence or absence of SP in the register list, but to the ! symbol which means the instruction is supposed to update the SP value with the end of transfer area address. The base register (SP) value will be used from the current mode, not the User mode, so you can still load or store user-mode SP value into your stack. From the ARM ARM B9.3.6 LDM (User registers):
In a PL1 mode other than System mode, Load Multiple (User registers)
loads multiple User mode registers from consecutive memory locations
using an address from a base register. The registers loaded cannot
include the PC. The processor reads the base register value normally,
using the current mode to determine the correct Banked version of the
register. This instruction cannot writeback to the base register.
The encoding diagram reflects this by specifying the bit 21 (W, writeback) as '(0)' which means that the result is unpredictable if the bit is not 0.
So the solution is just to not specify the ! and decrement or increment SP manually if necessary.
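The rule can also be expressed as a check on the A32 instruction word. The C sketch below flags a load/store multiple that combines the '^' suffix (S, bit 22) with writeback (W, bit 21) outside the exception-return form. The function names are hypothetical and the example encodings were assembled by hand, so it is worth comparing them with your assembler's listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an A32 LDM/STM with '^' also requests base writeback ('!')
 * in a form where the W bit is expected to be zero. */
bool user_ldm_stm_writeback_unpredictable(uint32_t insn)
{
    if ((insn >> 28) == 0xFu)            /* unconditional space: not LDM/STM */
        return false;
    if (((insn >> 25) & 0x7u) != 0x4u)   /* bits [27:25] = 0b100: block transfer */
        return false;

    bool s_bit   = ((insn >> 22) & 1u) != 0;   /* '^' suffix     */
    bool w_bit   = ((insn >> 21) & 1u) != 0;   /* '!' suffix     */
    bool is_load = ((insn >> 20) & 1u) != 0;   /* LDM vs STM     */
    bool has_pc  = ((insn >> 15) & 1u) != 0;   /* pc in the list */

    /* LDM with '^' and pc in the list is an exception return: '!' is allowed. */
    bool exception_return = is_load && has_pc;
    return s_bit && w_bit && !exception_return;
}

/* Hand-assembled encodings of instructions like those in the question. */
void writeback_examples(void)
{
    assert(user_ldm_stm_writeback_unpredictable(0xE96D7FFFu));   /* stmdb sp!, {r0-r14}^ */
    assert(!user_ldm_stm_writeback_unpredictable(0xE94D7FFFu));  /* stmdb sp, {r0-r14}^  */
    assert(!user_ldm_stm_writeback_unpredictable(0xE8FD8000u));  /* ldmia sp!, {pc}^     */
}
```

Dropping the '!' clears bit 21, which is why the second encoding passes the check while the first does not.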

Problems with simple C bootstrap/kernel

Recently I've become interested in writing my own really really basic OS.
I wrote (well, copied) some basic Assembly that establishes a stack and does some basic things and this seemed to work fine, however attempting to introduce C into the mix has screwed everything up.
I have two main project files: loader.s which is some NASM that creates the stack and calls my C function, and kernel.c which contains the basic C function.
My issue at the moment is essentially that QEMU freezes up when I run my kernel.bin file. I'm guessing there are any number of things wrong with my code -- perhaps this question isn't really appropriate for a StackOverflow format due to its extreme specificity. My project files are as follows:
loader.s:
BITS 16 ; 16 Bits
extern kmain ; Our 'proper' kernel function in C
loader:
mov ax, 07C0h ; Move the starting address [7C00h] into 'ax'
add ax, 32 ; Leave 32 16 byte blocks [200h] for the 512 code segment
mov ss, ax ; Set 'stack segment' to the start of our stack
mov sp, 4096 ; Set the stack pointer to the end of our stack [4096 bytes in size]
mov ax, 07C0h ; Use 'ax' to set 'ds'
mov ds, ax ; Set data segment to where we're loaded
mov es, ax ; Set our extra segment
call kmain ; Call the kernel proper
cli ; Clear ints
jmp $ ; Hang
; Since putting these in and booting the image without '-kernel' can't find
; a bootable device, we'll comment these out for now and run the ROM with
; the '-kernel' flag in QEMU
;times 510-($-$$) db 0 ; Pad remained of our boot sector with 0s
;dw 0xAA55 ; The standard 'magic word' boot sig
kernel.c:
#include <stdint.h>
void kmain(void)
{
unsigned char *vidmem = (unsigned char*)0xB8000; //Video memory address
vidmem[0] = 65; //The character 'A'
vidmem[1] = 0x07; //Light grey (7) on black (0)
}
I compile everything like so:
nasm -f elf -o loader.o loader.s
i386-elf-gcc -I/usr/include -o kernel.o -c kernel.c -Wall -nostdlib -fno-builtin -nostartfiles -nodefaultlibs
i386-elf-ld -T linker.ld -o kernel.bin loader.o kernel.o
And then test like so:
qemu-system-x86_64 -kernel kernel.bin
Hopefully someone can have a look over this for me -- the code snippets aren't massively long.
Thanks.
Gosh, where to begin? (rhughes, is that you?)
The code from loader.s goes into the Master Boot Record (MBR). The MBR, however, also holds the partition table of the hard drive. So, once you assembled the loader.s, you have to merge it with the MBR: The code from loader.s, the partition table from the MBR. If you just copy the loader.s code into the MBR, you killed your hard drive's partitioning. To properly do the merge, you have to know where exactly the partition table is located in the MBR...
The output from loader.s, which goes into the MBR, is called a "first stage bootloader". Due to the things described above, you only have 436 bytes in that first stage. One thing you cannot do at this point is slap some C compiler output on top of that (i.e. make your binary larger than one sector, the MBR) and copy that to the hard drive. While it might work temporarily on an old hard drive, modern ones carry yet more partitioning information in sector 1 onward, which would be destroyed by your copying.
The idea is that you compile kernel.c into a separate binary, the "second stage". The first stage, in the 436 bytes available, then uses the BIOS (or EFI) to load the second stage from a specific point on the hard drive (because you won't be able to add partition table and file system parsing to the first stage), then jump to that just-loaded code. Since the second stage isn't under the same kind of size limitation, it can then go ahead to do the proper thing, i.e. parse the partitioning information, find the "home" partition, parse its file system, then load and parse the actual kernel binary.
I hope you are aware that I am looking at all this from low-earth orbit. Bootloading is one heck of an involved process, and no-one can hope to detail it in one SO posting. Hence, there are websites dedicated to these subjects, like OSDev. But be warned: This kind of development takes experienced programmers, people capable of doing professional-grade research, asking questions the smart way, and carrying their own weight. Since these skills are on a general decline these days, OS development websites have a tendency for grumpy reactions if you approach it wrongly.(*)
(*): Or they toss uncommented source at you, like dwalter did just as I finished this post. ;-)
Edit: Of course, none of this is the actual reason why the emulator freezes. i386-elf-gcc is a compiler generating code for 32-bit protected mode, assuming a "flat" memory model, i.e. code / data segments beginning at zero. Your loader.s is 16-bit real mode code (as stated by the BITS 16 part), which does not activate protected mode, and does not initialize the segment registers to the values expected by GCC, and then proceeds to jump to the code generated by GCC under false assumptions... BAM.
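One part of that mismatch, the addressing model, can be shown with a little arithmetic. In 16-bit real mode a linear address is segment * 16 + offset, while the compiled kmain treats 0xB8000 as a flat address. The C sketch below (function names invented for the example) models the real-mode calculation; it does not cover the other half of the problem, namely that 32-bit instruction encodings are being executed in 16-bit mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linear address formed by a real-mode segment:offset pair. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Can 'linear' be reached through 'segment' with a 16-bit offset? */
bool reachable_from_segment(uint16_t segment, uint32_t linear)
{
    uint32_t base = (uint32_t)segment << 4;
    return linear >= base && linear - base <= 0xFFFFu;
}

/* The loader's segment value 07C0h corresponds to linear address 7C00h. */
static_assert((0x07C0u << 4) == 0x7C00u, "segment 0x07C0 starts at 0x7C00");
```

With ds set to 07C0h as in loader.s, video memory at 0xB8000 is out of reach of any 16-bit offset; it would need a segment such as 0xB800. The stack set up by the loader (ss = 07C0h + 32, sp = 4096) starts at linear address 0x8E00.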
