moving the vector table on an STM32F405 - c

I want to move my code in the flash memory on an STM32F405.
I changed the linker script to change the start of flash like so:
FLASH (rx) : ORIGIN = 0x08008000, LENGTH = 1024K-32K
If I am correct, the vector table will also be located at 0x08008000. I would like to create a bootloader; for a start, I would like to run my application from the new memory location. Do my bootloader and application have separate vector tables? How can I initialize the stack pointer to 0x8008000?

Yes, your bootloader will have a separate vector table from your main code. The last thing your bootloader does, or the first thing your main code does, should be to remap the vector table using the SCB->VTOR register. The vector table starts at the beginning of the image, so using your numbers, SCB->VTOR should be 0x08008000. The first 4 bytes of the table hold the value the stack pointer should be initialised with, and the next 4 bytes hold the address of the reset handler.
You don't want to initialise your stack pointer to 0x8008000: that address is in flash, and you will get a hard fault as soon as you try to push something. If that is where your application starts, then the word stored at 0x08008000 contains the address you should use as the stack pointer.
To set it I have always used an asm function which just loads SP with the value passed to the function in R0, something like the following.
SetSP PROC
EXPORT SetSP
MOV SP, R0
BX LR
ENDP
To call from a C context:
extern void SetSP(uint32_t address);
uint32_t sp = *((uint32_t *)0x08008000);
SetSP(sp);
That dereferences a pointer to 0x08008000, to get the initial stack pointer, then sets it.
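One detail worth checking when choosing the new flash origin is the alignment of the address written to SCB->VTOR. On Cortex-M3/M4 the table base has to be aligned to the table size rounded up to a power of two, with a minimum of 128 bytes. The following is a minimal host-side sketch of that check, not code from the answer: the helper names are made up for illustration, and the figure of 82 device interrupts for the STM32F405 is an assumption that should be confirmed against the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16 system exception slots plus 82 device interrupts (STM32F405). */
#define VT_SYSTEM_ENTRIES 16u
#define VT_IRQ_ENTRIES    82u

/* Hypothetical helper: alignment in bytes needed for a table of 'entries' words. */
uint32_t vtor_required_alignment(uint32_t entries)
{
    uint32_t bytes = entries * 4u;  /* each vector is one 32-bit word */
    uint32_t align = 128u;          /* minimum alignment on Cortex-M3/M4 */
    while (align < bytes) {
        align <<= 1;                /* round up to the next power of two */
    }
    return align;
}

/* Hypothetical helper: true if 'base' may be written to SCB->VTOR. */
bool vtor_base_is_valid(uint32_t base, uint32_t entries)
{
    return (base & (vtor_required_alignment(entries) - 1u)) == 0u;
}
```

With 98 entries the table occupies 392 bytes, so the required alignment is 512 bytes; an origin of 0x08008000 satisfies that, while something like 0x08008100 would not.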

Related

setting stack pointer before jumping to app from bootloader

I am coding a bootloader for Nucleo-F429ZI. I have two different STM32 projects, one for the bootloader itself and an application to jump from the bootloader.
Linker script for bootloader
MEMORY
{
CCMRAM (xrw) : ORIGIN = 0x10000000, LENGTH = 64K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 32K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 32K
}
Linker script for app
_estack = ORIGIN(RAM) + LENGTH(RAM);
MEMORY
{
CCMRAM (xrw) : ORIGIN = 0x10000000, LENGTH = 64K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 192K
FLASH (rx) : ORIGIN = 0x8008000, LENGTH = 64K
}
I did not forget to set the flash offset of the app.
system_stm32f4xx.c (in the app project)
#define VECT_TAB_BASE_ADDRESS FLASH_BASE // 0x8000000
#define VECT_TAB_OFFSET 0x00008000U
The tutorial of STMicroelectronics about bootloaders has the following code to jump
main.c (in bootloader project)
#define FLASH_APP_ADDR 0x8008000
typedef void (*pFunction)(void);
uint32_t JumpAddress;
pFunction Jump_To_Application;
void go2APP(void)
{
    JumpAddress = *(uint32_t*)(FLASH_APP_ADDR + 4);
    Jump_To_Application = (pFunction) JumpAddress;
    __set_MSP(*(uint32_t*)FLASH_APP_ADDR); // in cmsis_gcc.h
    Jump_To_Application();
}
cmsis_gcc.h (in bootloader project)
__STATIC_FORCEINLINE void __set_MSP(uint32_t topOfMainStack)
{
__ASM volatile ("MSR msp, %0" : : "r" (topOfMainStack) : );
}
As you can see, the __set_MSP function sets the main stack pointer before jumping to the address stored at FLASH_APP_ADDR + 4.
I found the target location by debugging: the address stored at FLASH_APP_ADDR + 4 is the Reset_Handler function of the app project. Let's see what it executes.
startup_stm32f429zitx.c (in the app project)
.section .text.Reset_Handler
.weak Reset_Handler
.type Reset_Handler, %function
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
/* Copy the data segment initializers from flash to SRAM */
ldr r0, =_sdata
ldr r1, =_edata
ldr r2, =_sidata
movs r3, #0
b LoopCopyDataInit
The first thing Reset_Handler does is set the stack pointer. _estack is defined in the linker script.
If Reset_Handler sets the stack pointer anyway, why do we call the __set_MSP function? I removed the call to __set_MSP and the bootloading process still works. However, I examined some other bootloader code and found exactly the same logic.
I tried what I described above and could not find an explanation.
During its boot sequence, the Cortex-M core loads the SP register with the initial value stored at FLASH_BASE+0, then jumps to the code entry point (the reset vector) whose address is stored at FLASH_BASE+4. Any bootloader code mimics this core behaviour. Note that FLASH_BASE here is not necessarily the actual flash base, but an abstract value that depends on the processor used and its settings.
The provided Reset_Handler code loads the sp register with the _estack (main stack top) value, but it doesn't have to! The bootloader cannot expect the main program to do it, so it has to perform the same boot sequence as the core does after reset. This way the main code doesn't have to know who started it: the core, a bootloader, JTAG, or something else.
I've seen startup code that doesn't load SP but disables interrupts with its first instruction, and startup code written in C, which could use the stack with its first instruction.
The real question here could be: why does this startup code load SP if it is already loaded? But perhaps that should be forwarded to the original code author.
Let's see what's happening line by line.
JumpAddress = *(uint32_t*)(FLASH_APP_ADDR + 4);
Okay, so we take FLASH_APP_ADDR, add one word to it, treat the result as a pointer to a word, and dereference it. So it's the content of 0x8008004, which is the second word of the vector table (the list of interrupt handler pointers). You can find the layout in the vector table in the reference manual for your MCU (page 375).
Next,
Jump_To_Application = (pFunction) JumpAddress;
Okay, so we treat reset handler address as a void function(void).
Eventually, you get to the stack
__set_MSP(*(uint32_t*)FLASH_APP_ADDR);
This function, as we see from its source code, simply sets the main stack pointer to its argument. The argument is: take the vector table address, treat it as a pointer to a word, and dereference it. So it's the first word of that vector table, and by definition of the vector table the first word is the main stack pointer value that is auto-loaded after power-on. You reset the stack to the cold-boot value the application expects, the first word of the application's vector table. Your bootloader has used some stack until this point, but it won't be needed anymore, and the bootloader function will never return and free that stack, so you just reset the stack to its initial value for your program. It will reuse all the stack used by the bootloader.
So right now you've reset the stack pointer and you assigned reset handler to the function you call. And then you, well, call it.
Your vector table and the program that the bootloader starts are two different entities in memory. If you don't need to remap the interrupt handlers at runtime, don't move the vector table. It will stay at the beginning of the flash and will lead to the default interrupt handlers. Just make sure the address you execute from contains executable code and you run it from the start (well, if you don't, you will hardfault).
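Since the bootloader reads both words anyway, it can sanity-check them before jumping, for instance to avoid branching into erased flash. The following is a minimal sketch of such a check, written so that it also compiles on a host. The function name is hypothetical, and the address ranges are assumptions copied from the app linker script in the question (flash at 0x8008000 with 64K, RAM at 0x20000000 with 192K).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map, taken from the app linker script in the question. */
#define APP_FLASH_START 0x08008000u
#define APP_FLASH_SIZE  (64u * 1024u)
#define RAM_START       0x20000000u
#define RAM_SIZE        (192u * 1024u)

/* Hypothetical helper: checks the first two words of the app's vector table. */
bool app_image_looks_valid(uint32_t initial_sp, uint32_t reset_vector)
{
    /* The initial SP may point one past the end of RAM (descending stack). */
    if (initial_sp <= RAM_START || initial_sp > RAM_START + RAM_SIZE) {
        return false;
    }
    if ((initial_sp & 3u) != 0u) {
        return false;               /* stack pointer must be word aligned */
    }
    if ((reset_vector & 1u) == 0u) {
        return false;               /* Thumb bit must be set on Cortex-M */
    }
    uint32_t addr = reset_vector & ~1u;
    return addr >= APP_FLASH_START && addr < APP_FLASH_START + APP_FLASH_SIZE;
}
```

On the target, the two arguments would be the words read from FLASH_APP_ADDR and FLASH_APP_ADDR + 4. Erased flash reads back as 0xFFFFFFFF, which fails the first test.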

Why does the stack frame also store instructions(besides data)? What is the precise mechanism by which instructions on stack frame get executed?

Short version:
0: 48 c7 c7 ee 4f 37 45 mov $0x45374fee, %rdi
7: 68 60 18 40 00 pushq $0x401860
c: c3 retq
How can these 3 lines of instructions (0, 7, c), saved in the stack frame, get executed? I thought the stack frame only stores data; does it also store instructions? I know data is read into registers, but how do these instructions get executed?
Long version:
I am self-studying 15-213(Computer Systems) from CMU. In the Attack lab, there is an instance (phase 2) where the stack frame gets overwritten with "attack" instructions. The attack happens by then overwriting the return address from the calling function getbuf() with the address %rsp points to, which I know is the top of the stack frame. In this case, the top of the stack frame is in turn injected with the attack code mentioned above.
Here is the question: from reading the book (CSAPP), I get the sense that the stack frame only stores data that is overflown from the registers (including the return address, extra arguments, etc.). But I don't get why it can also store instructions (attack code) that get executed. How exactly did the content in the stack frame, which %rsp points to, get executed? I also know that %rsp stores the return address of the calling function, the point being that it is an address, not an instruction. So by which mechanism does a supposed address get executed as an instruction? I am very confused.
Edit: Here is a link to the question(4.2 level 2):
http://csapp.cs.cmu.edu/3e/attacklab.pdf
This is a post that is helpful for me in understanding: https://github.com/magna25/Attack-Lab/blob/master/Phase%202.md
Thanks for your explanation!
ret instruction gets a pointer from the current position of the stack and jumps to it. If, while in a function, you modify the stack to point to another function or piece of code that could be used maliciously, the code can return to it.
The code below doesn't necessarily compile, and it is just meant to represent the concept.
For example, we have two functions: add(), and badcode():
int add(int a, int b)
{
return a + b;
}
void badcode()
{
// Some very bad code
}
Let's also assume that we have a stack such as the below when we call add()
...
0x00....18 extra arguments
0x00....10 return address
0x00....08 saved RBP
0x00....00 local variables and etc.
...
If, during the execution of add, we managed to change the return address to the address of badcode(), then on the ret instruction we will automatically start executing badcode(). I don't know if this answers your question.
Edit:
An instruction is simply an array of numbers. Where you store them is irrelevant (mostly) to their execution. A stack is essentially an abstract data structure, it is not a special place in RAM. If your OS doesn't mark the stack as non-executable, there is nothing stopping the code on the stack from being returned to by the ret.
Edit 2:
I get the sense that the stack frame only stores data that is overflown
from the registers(including return address, extra arguments, etc.)
I do not think that you know how registers, RAM, stack, and programs are incorporated. The sense that stack frame only stores data that is overflown is incorrect.
Let's start over.
Registers are pieces of memory on your CPU. They are independent of RAM. There are mainly 8 registers on a CPU. a, c, d, b, si, di, sp, and bp. a is for accumulator and it generally used for arithmetic operations, likewise b stands for base, c stands for counter, d stands for data, si stands for source, di stands for destination, sp is the stack pointer, and bp is the base pointer.
On 16 bit computers a, b, c, d, si, di, sp, and bp are 16 bits (2 byte). The a, b, c, and d are often shown as ax, bx, cx, and dx where the x stands for extension from their original 8 bit versions. They can also be referred to as eax, ecx, edx, ebx, esi, edi, esp, ebp for 32 bit (e again stands for extended) and rax, rcx, rdx, rbx, rsi, rdi, rsp, rbp for 64 bit.
Once again these are on your CPU and are independent of RAM. CPU uses these registers to do everything that it does. You wanna add two numbers? put one of them inside ax and another one inside cx and add them.
You also have RAM. RAM (standing for Random Access Memory) is a storage device that allows you to access and modify all of its values using equal computation power or time (hence the term random access). Each value that RAM holds also has an address that determines where on the RAM this value is. CPU can use numbers and treat such numbers as addresses to access memory addresses of RAM. Numbers that are used for such purposes are called pointers.
A stack is an abstract data structure. It has a FILO (first in, last out) structure, which means that to reach the first datum you stored you first have to remove everything stored after it. To manipulate the stack, the CPU provides us with sp, which holds a pointer to the current position of the stack, and bp, which holds the top of the stack. The position that bp holds is called the top of the stack because the stack usually grows downwards: if we start a stack at memory address 0x100 and store 4 bytes in it, sp will now be at memory address 0x100 - 4 = 0xFC. To do such operations automatically we have the push and pop instructions. In that sense a stack can be used to store any type of data, regardless of the data's relation to registers or programs.
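The downward growth can be made concrete with a toy model. The sketch below is purely illustrative: a 0x100-byte array stands in for RAM, and the `toy_` names are invented here. Starting from 0x100, pushing 4 bytes leaves sp at 0xFC, and popping returns values in reverse order.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x100u

/* Toy model: a small byte array as "RAM" plus a stack pointer into it. */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t sp;                    /* offset of the most recently pushed value */
} toy_stack;

void toy_init(toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = MEM_SIZE;               /* empty stack starts at the highest address */
}

void toy_push32(toy_stack *s, uint32_t value)
{
    s->sp -= 4u;                    /* grow downwards first ... */
    memcpy(&s->mem[s->sp], &value, sizeof value);  /* ... then store */
}

uint32_t toy_pop32(toy_stack *s)
{
    uint32_t value;
    memcpy(&value, &s->mem[s->sp], sizeof value);  /* load ... */
    s->sp += 4u;                    /* ... then shrink upwards */
    return value;
}
```

Nothing in this model distinguishes "data" from anything else: the pushed bytes are just bytes at an address.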
Programs are pieces of structured code that are placed on the RAM by the operating system. The operating system reads program headers and relevant information and sets up an environment for the program to run on. For each program a stack is set up, usually, some space for the heap is given, and instructions (which are the building blocks of a program) are placed in arbitrary memory locations that are either predetermined by the program itself or automatically given by the OS.
Over the years some conventions have been set to standardize CPUs. For example, on most CPUs the ret instruction reads a pointer-sized amount of data from the stack and jumps to it. Jumping means executing code at a particular RAM address. This is only a convention and has no relation to being overflown from registers. For that reason, when a function is called, the return address (the address of the instruction following the call) is first pushed onto the stack so that it can be retrieved later by ret. Local variables are also stored on the stack, along with arguments if a function has more than 6 integer arguments (in the x86-64 System V calling convention).
Does this help?
I know it is a long read but I couldn't be sure on what you know and what you don't know.
Yet Another Edit:
Lets also take a look at the code from the PDF:
void test()
{
int val;
val = getbuf();
printf("No exploit. Getbuf returned 0x%x\n", val);
}
Phase 2 involves injecting a small amount of code as part of your exploit string.
Within the file ctarget there is code for a function touch2 having the following C representation:
void touch2(unsigned val)
{
    vlevel = 2; /* Part of validation protocol */
    if (val == cookie) {
        printf("Touch2!: You called touch2(0x%.8x)\n", val);
        validate(2);
    } else {
        printf("Misfire: You called touch2(0x%.8x)\n", val);
        fail(2);
    }
    exit(0);
}
Your task is to get CTARGET to execute the code for touch2 rather than returning to test. In this case,
however, you must make it appear to touch2 as if you have passed your cookie as its argument.
Let's think about what you need to do:
You need to modify the stack of test() so that two things happen. The first thing is that you do not return to test() but you rather return to touch2. The other thing you need to do is give touch2 an argument which is your cookie. Since you are giving only one argument you don't need to modify the stack for the argument at all. The first argument is stored on rdi as a part of x86_64 calling convention.
The final code that you write has to change the return address to touch2()'s address and also execute mov rdi, cookie
Edit:
Earlier I talked about RAM being able to store data at addresses and the CPU being able to interact with them. There is a special register on your CPU that you cannot access directly from your assembly code. This register is called ip/eip/rip, which stands for instruction pointer. It holds a 16/32/64-bit pointer to an address in RAM: the address of the instruction that the CPU will execute next. With that in mind, we can say that what a ret instruction does is
pop rip
which means get the last 64 bits (8 bytes for a pointer) on the stack into this instruction pointer. Once rip is set to this value, the CPU begins executing this code. The CPU doesn't do any checks on rip whatsoever. You can technically do the following thing (excuse me, my assembly is in intel syntax):
mov rax, str ; move the RAM address of "str" into rax
push rax ; push rax into stack
ret ; return to the last pushed qword (8 bytes) on the stack
str: db "Hello, world!", 0 ; define a string
This code can call/execute a string. Your CPU will be very upset, though, because the bytes there do not form valid instructions, and the program will most likely crash.
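The "ret is pop rip" idea can be modelled in a few lines. This is a toy simulation only, with invented names; it does not execute anything, it just shows that ret blindly loads whatever value sits on top of the stack into the instruction pointer. The value 0x401860 is the one pushed in the question's snippet, while 0x401000 is a made-up stand-in for the legitimate return address.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 16u

/* Toy CPU: a downward-growing stack of 64-bit slots and an instruction pointer. */
typedef struct {
    uint64_t stack[TOY_SLOTS];
    size_t   sp;                    /* index of the top-of-stack slot */
    uint64_t ip;
} toy_cpu;

void toy_reset(toy_cpu *c)
{
    c->sp = TOY_SLOTS;              /* empty stack */
    c->ip = 0;
}

void toy_push(toy_cpu *c, uint64_t value)
{
    c->stack[--c->sp] = value;
}

/* Models a buffer overflow that clobbers the slot holding the return address. */
void toy_overwrite_top(toy_cpu *c, uint64_t value)
{
    c->stack[c->sp] = value;
}

/* ret: pop the top slot into ip, with no check of where the value came from. */
void toy_ret(toy_cpu *c)
{
    c->ip = c->stack[c->sp++];
}
```

Pushing 0x401000 (as a call would), overwriting the top slot with 0x401860, and then calling toy_ret leaves ip at 0x401860: the CPU continues wherever the overwritten slot points, whether that is touch2 or bytes that were injected onto the stack.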

Code execution exploit Cortex M4

For testing the MPU and playing around with exploits, I want to execute code from a local buffer running on my STM32F4 dev board.
int main(void)
{
uint16_t func[] = { 0x0301f103, 0x0301f103, 0x0301f103 };
MPU->CTRL = 0;
unsigned int address = (void*)&func+1;
asm volatile(
"mov r4,%0\n"
"ldr pc, [r4]\n"
:
: "r"(address)
);
while(1);
}
In main, I first turn off the MPU. My instructions are stored in func. In the asm part I load the address (0x2001ffe8 + 1 for Thumb) into the program counter register. When stepping through the code with GDB, the correct value is stored in R4 and then transferred to the PC register. But then I end up in the HardFault handler.
Edit:
The stack looks like this:
0x2001ffe8: 0x0301f103 0x0301f103 0x0301f103 0x2001ffe9
The instructions are correct in the memory. Definitive Guide to Cortex says region 0x20000000–0x3FFFFFFF is the SRAM and "this region is executable,
so you can copy program code here and execute it".
You are assigning 32 bit values to a 16 bit array.
Your instructions don't terminate: execution continues into whatever is found in RAM after them, so that will crash.
You are not loading the address to the array into the program counter you are loading the first item in the array into the program counter, this will crash, you created a level of indirection.
Look at the BX instruction for this rather than ldr pc
You did not declare the array as static, so the array can be optimized out as dead and unused, so this can cause it to crash.
The compiler should also complain that you are assigning a void* to an unsigned variable, so a typecast is wanted there.
As a habit I recommend address|=1 rather than +=1, in this case either will function.
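Two of the points above, the halfword layout and the Thumb bit, can be checked on a host. The sketch below is an illustration with hypothetical helper names: it splits each 32-bit value the way a little-endian core stores it (low halfword at the lower address), appends bx lr (0x4770) so the sequence terminates, and forms the branch target with `| 1`.

```c
#include <stddef.h>
#include <stdint.h>

#define THUMB_BX_LR 0x4770u         /* bx lr: return to the caller */

/*
 * Hypothetical helper: write 'count' 32-bit values as halfwords in
 * little-endian memory order, then terminate with bx lr.
 * 'out' must have room for 2 * count + 1 halfwords.
 */
size_t build_thumb_buffer(const uint32_t *words, size_t count, uint16_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        out[n++] = (uint16_t)(words[i] & 0xFFFFu);  /* lower address */
        out[n++] = (uint16_t)(words[i] >> 16);      /* higher address */
    }
    out[n++] = THUMB_BX_LR;
    return n;
}

/* Hypothetical helper: set the Thumb bit; safe even if it is already set. */
uintptr_t thumb_entry(uintptr_t addr)
{
    return addr | 1u;
}
```

For 0x0301f103 this yields the halfwords 0xf103, 0x0301 (add.w r3, r3, #1) followed by 0x4770. On the target, the buffer would be declared static, and the value from thumb_entry would be cast to a function pointer and called, so that the compiler emits a BLX/BX to the buffer's address rather than a load through it.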

AVR C compilers behavior. Memory management

Do AVR C compilers make program memorize the address in SRAM where function started to store its data (variables, arrays) in data stack in one of index registers in order to get absolute address of local variable by formula:
absoluteAdr = functionDataStartAdr + localShiftOfVariable.
And do they increase the data stack pointer by a variable's length when it is declared, or is the stack pointer adjusted once at the start/end of the function for the total length of all its variables?
Let's have a look at avr-gcc, which is freely available including its ABI:
Do AVR C compilers make program memorize the address in SRAM where function started to store its data (variables, arrays) in data stack in one of index registers in order to get absolute address of local variable by formula:
Yes, no, it depends:
Static Storage
For variables in static storage, i.e. variables as defined by
unsigned char func (void)
{
static unsigned char var;
return ++var;
}
the compiler generates a symbol like var.123 with appropriate size (1 byte in this case). The linker / locator will then assign the address.
func:
lds r24,var.1505
subi r24,lo8(-(1))
sts var.1505,r24
ret
.local var.1505
.comm var.1505,1,1
Automatic
Automatic variables are held in registers if possible, otherwise the compiler allocates space in the frame of the function. It may even be the case that variables are optimized out, and in that case they do not exist anywhere in the program:
int add (void)
{
int a = 1;
int b = 2;
return a + b;
}
→
add:
ldi r24,lo8(3)
ldi r25,0
ret
There are 3 types of entities that are stored in the frame of a function, all of which might be present or absent depending on the program:
Callee-saved registers that are saved (PUSH'ed) by the function prologue and restored (POP'ed) by the epilogue. This is needed when local variables are allocated to callee-saved registers.
Space for local variables that cannot be allocated to registers. This happens when the variable is too big to be held in registers, there are too many auto variables, or the address of a variable is taken (and taking the address cannot be optimized out). This is because you cannot take the address of a register [1].
void use_address (int*);
void func (void)
{
int a;
use_address (&a);
}
The space for these variables is allocated in the prologue and deallocated in the epilogue. Shrink-wrapping is not implemented:
func:
push r28
push r29
rcall .
in r28,__SP_L__
in r29,__SP_H__
/* prologue: function */
/* frame size = 2 */
/* stack size = 4 */
movw r24,r28
adiw r24,1
rcall use_address
pop __tmp_reg__
pop __tmp_reg__
pop r29
pop r28
ret
In this example, a occupies 2 bytes which are allocated by rcall . (it was compiled for a device with 16-bit program counter). Then the compiler initialized the frame-pointer Y (R29:R28) with the value of the stack pointer. This is needed because on AVR, you cannot access memory via SP; the only memory operations that involve SP are PUSH and POP. Then the address of that variable which is Y+1 is passed in R24. After the call of the function, the epilogue frees the frame and restores R28 and R29.
Arguments that have to be passed on the stack:
void xfunc (int, ...);
void call_xfunc (void)
{
xfunc (42);
}
These arguments are pushed and the callee is picking them up from the stack. These arguments are pushed / popped around the call, but can also be accumulated by means of -maccumulate-args.
call_xfunc:
push __zero_reg__
ldi r24,lo8(42)
push r24
rcall xfunc
pop __tmp_reg__
pop __tmp_reg__
ret
In this example, the argument has to be passed on the stack because the ABI says that all arguments of a varargs function have to be passed on the stack, including the named ones.
For a description of how exactly the frame is laid out and how arguments are passed, see Frame Layout and Argument Passing (https://gcc.gnu.org/wiki/avr-gcc#Frame_Layout).
[1] Some AVRs actually allow this, but you never (like in NEVER) want to pass around the address of a general purpose register!
Compilers do not manage the RAM. At compile time, the compiler calculates the required size of each section (bss, data, text, rodata, etc.) and generates a relocatable object file for each translation unit.
The linker then combines these into one object file and turns the relocatable addresses into absolute ones, mapped according to the linker configuration file (LCF).
At run time, the mechanism depends on the architecture itself. Normally, each function call has a frame on the stack where its arguments, return address and local variables live. The stack grows as variables are created, and on low-cost AVR microcontrollers there is no memory protection against the stack growing too far or overlapping another memory section (normally the heap). Even if an OS guards against tasks exceeding their allocated stack, without a memory management unit all the OS can do is assert a RESET with an illegal-memory-access reason.

lpc 1768 Secondary Boot Loader error

I am working on lpc 1768 SBL which includes the following code to jump to user application.
#define NVIC_VectTab_FLASH (0x00000000)
#define USER_FLASH_START (0x00002000)
void NVIC_SetVectorTable(DWORD NVIC_VectTab, DWORD Offset)
{
NVIC_VECT_TABLE = NVIC_VectTab | (Offset & 0x1FFFFF80);
}
void execute_user_code(void)
{
    void (*user_code_entry)(void);
    /* Change the Vector Table to the USER_FLASH_START
       in case the user application uses interrupts */
    NVIC_SetVectorTable(NVIC_VectTab_FLASH, USER_FLASH_START);
    user_code_entry = (void (*)(void))((USER_FLASH_START)+1);
    user_code_entry();
}
It was working without any errors. After adding some heap memory to the code, the machine gets stuck. I tried different values for the heap size, and some of them work. After some deep debugging, I found that the machine does not get stuck when the value at the first location of the application bin file is divisible by 64.
For example:
When I select a heap size of 0x00002E90, it generates a stack base of 0x10005240. Then stack base + stack size (0x2900) gives 0x10007B40.
I found that this value is stored at the first location of the application bin file. It is divisible by 64, and the code runs without getting stuck.
But when I select a heap size of 0x00002E88, it generates a stack base of 0x10005238. Then stack base + stack size (0x2900) gives 0x10007B38.
This value is not divisible by 64, and the code gets stuck.
The disassembly is as follows in this case.
When stepping from address 0x0000 2000 ,it goes to hard fault handler. But in the earlier case it doesn't go to hard fault. It continues and works as well.
I cannot understand the instruction DCW and why it goes to hard fault.
Can anyone tell me the reason behind this?
Executing the vector table is what you do on older ARM7/ARM9 parts (or bigger Cortex-A ones), where the vectors are instructions and the first entry is a jump to the reset handler. On Cortex-M, however, the vector table is pure data: the first entry is your initial stack pointer, and the second entry is the address of the reset handler, so trying to execute it is liable to go horribly wrong.
As it happens, in this case you can actually get away with executing most of that vector table by sheer chance, because the memory layout leads to each halfword of the flash addresses becoming fairly innocuous instructions:
2: 1000 asrs r0, r0, #32
4: 20d9 movs r0, #217 ; 0xd9
6: 0000 movs r0, r0
8: 20f5 movs r0, #245 ; 0xf5
a: 0000 movs r0, r0
...
Until you eventually bumble through all the remaining NOPs to 0x20d8 where you pick up the real entry point. However, the killer is that initial stack pointer, because thanks to the RAM being higher up, you get this:
0: 7b38 ldrb r0, [r7, #12]
The lower byte of 0x7bxx is where the base register is encoded, so by varying the address you have a crapshoot as to which register that is, and furthermore whether whatever junk value is left in there also happens to be a valid address to load from. Do you feel lucky?
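To see how the base register changes with the stack top, the low halfword of each of the two values from the question (0x7B38 and 0x7B40) can be decoded by hand or with a few lines of code. The sketch below handles only the 16-bit Thumb LDRB (immediate) encoding, `01111 imm5 Rn Rt`; the type and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 16-bit Thumb "ldrb Rt, [Rn, #imm]" instruction. */
typedef struct {
    unsigned rt;                    /* destination register */
    unsigned rn;                    /* base register */
    unsigned imm;                   /* byte offset */
} ldrb_imm;

/* Returns false if the halfword is not an LDRB (immediate) encoding. */
bool decode_ldrb_imm(uint16_t hw, ldrb_imm *out)
{
    if ((hw >> 11) != 0x0Fu) {      /* top five bits must be 01111 */
        return false;
    }
    out->imm = (hw >> 6) & 0x1Fu;   /* bits 10..6 */
    out->rn  = (hw >> 3) & 0x7u;    /* bits 5..3  */
    out->rt  = hw & 0x7u;           /* bits 2..0  */
    return true;
}
```

0x7B38 decodes to ldrb r0, [r7, #12], while 0x7B40 decodes to ldrb r0, [r0, #13]. So the "working" and "stuck" builds load from different base registers, and whether the load faults depends on what each register happens to contain at that moment.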
Anyway, in summary: Rather than call the address of the vector table directly, you need to load the second word from it, then call whatever address that contains.
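The difference between calling the table and calling through the table can be shown with a host-side stand-in. This is a sketch under assumptions, not LPC1768 code: the struct models only the first two table entries, and every name in it is hypothetical.

```c
#include <stdint.h>

typedef void (*entry_fn)(void);

/* Host-side stand-in for the first two words of a Cortex-M vector table. */
typedef struct {
    uintptr_t initial_sp;           /* word 0: data, never executed */
    entry_fn  reset;                /* word 1: address of the reset handler */
} vector_table_head;

int app_started;                    /* set by the fake handler below */

void fake_reset_handler(void)
{
    app_started = 1;
}

/* Read the handler address out of the table, then call that address. */
void start_from_table(const vector_table_head *vt)
{
    entry_fn entry = vt->reset;     /* load the second word ... */
    entry();                        /* ... and branch to what it contains */
}
```

On the target, the equivalent would be to read the 32-bit word at USER_FLASH_START + 4 and cast that value, not USER_FLASH_START itself, to the function pointer. The stored handler address already has the Thumb bit set, so no `+1` is needed.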
