setting stack pointer before jumping to app from bootloader - c

I am coding a bootloader for Nucleo-F429ZI. I have two different STM32 projects, one for the bootloader itself and an application to jump from the bootloader.
Linker script for bootloader
MEMORY
{
CCMRAM (xrw) : ORIGIN = 0x10000000, LENGTH = 64K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 32K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 32K
}
Linker script for app
_estack = ORIGIN(RAM) + LENGTH(RAM);
MEMORY
{
CCMRAM (xrw) : ORIGIN = 0x10000000, LENGTH = 64K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 192K
FLASH (rx) : ORIGIN = 0x8008000, LENGTH = 64K
}
I did not forget to set the flash offset of the app.
system_stm32f4xx.c (in the app project)
#define VECT_TAB_BASE_ADDRESS FLASH_BASE // 0x8000000
#define VECT_TAB_OFFSET 0x00008000U
The tutorial of STMicroelectronics about bootloaders has the following code to jump
main.c (in bootloader project)
#define FLASH_APP_ADDR 0x8008000
typedef void (*pFunction)(void);
uint32_t JumpAddress;
pFunction Jump_To_Application;
void go2APP(void)
{
JumpAddress = *(uint32_t*)(FLASH_APP_ADDR + 4);
Jump_To_Application = (pFunction) JumpAddress;
__set_MSP(*(uint32_t*)FLASH_APP_ADDR); // in cmsis_gcc.h
Jump_To_Application();
}
cmsis_gcc.h (in bootloader project)
__STATIC_FORCEINLINE void __set_MSP(uint32_t topOfMainStack)
{
__ASM volatile ("MSR msp, %0" : : "r" (topOfMainStack) : );
}
As you can see, __set_MSP function sets the main stack pointer before jumping to FLASH_APP_ADDR + 4.
I found the memory location of the target place by debugging. FLASH_APP_ADDR + 4 caused to run Reset_Handler function of app project. Lets see what will be executed.
startup_stm32f429zitx.c (in the app project)
.section .text.Reset_Handler
.weak Reset_Handler
.type Reset_Handler, %function
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
/* Copy the data segment initializers from flash to SRAM */
ldr r0, =_sdata
ldr r1, =_edata
ldr r2, =_sidata
movs r3, #0
b LoopCopyDataInit
First thing of what Reset_Handler does is setting the stack pointer. _estack was defined in linker script.
If Reset_Handler is setting stack pointer, why did we call the __set_MSP function? I remove the function __set_MSP and bootloding process is still working. However I examined some other bootloader codes and found the exact same logic.
I tried what i have said and could not find an explanation.

Cortex-M core the loads SP register with initial value from address FLASH_BASE+0 during boot sequence. Then jumps to the code entry point (Reset vector) from address FLASH_BASE+4. Any bootloader code mimics core behaviour. Note, that FLASH_BASE here is not necessarily actual flash base, but an abstract value, that depends on the used processor, and it's settings.
Provided Reset_Handler code loads the sp register with __estack (Main stack top) value, but it doesn't have to! Bootloader can not expect the main program to do it, but has perform the same boot sequence as the core after reset. This way the main code doesn't have to rely on knowing, who started it - core, bootloader, jtag, or something else.
I've seen startup code, that doesn't load SP, but disables interrupts with the first instruction. Or startup code, written in C, which could use stack with the first instruction.
The real question here could be: Why this startup code loads SP if it is already loaded? But perhaps it should be forwarded to the original code author.

Let's see what's happening line by line.
JumpAddress = *(uint32_t*)(FLASH_APP_ADDR + 4);
Okay, so we take FLASH_APP_ADDR, add 1 word to it, call it a pointer to a word, dereference it. So it's the content of 0x8008004 (which is the one word after start of the vector table - list of interrupt handler pointers). You can find it in the vector table in reference manual. Here is reference manual for your MCU. Page 375
Next,
Jump_To_Application = (pFunction) JumpAddress;
Okay, so we treat reset handler address as a void function(void).
Eventually, you get to the stack
__set_MSP(*(uint32_t*)FLASH_APP_ADDR);
This function, as we see from its source code, simply sets main stack pointer to its argument. The argument is take vector table address, treat it as a pointer to a word, dereference it. So it's the first word of that vector table. And the first word of the vector table is the main stack pointer auto-loaded after power on. By definition of the vector table. You reset the stack to cold boot value, same value as the first word of your Flash. Your bootloader has used some stack until this point, but it won't be needed anymore, and the bootloader function will never return and free that stack, so you just reset stack to its initial value for your program. It will reuse all stack used by the bootloader.
So right now you've reset the stack pointer and you assigned reset handler to the function you call. And then you, well, call it.
Your vector table and the program that the bootloader starts are two different entities in memory. If you don't need to remap the interrupt handlers at runtime, don't move the vector table. It will stay at the beginning of the flash and will lead to the default interrupt handlers. Just make sure the address you execute from contains executable code and you run it from the start (well, if you don't, you will hardfault).

Related

MCU crashes when "declaring" vector table in SRAM

I have a new standard c++ project on an imx rt 1024 (an nxp chip), in which I try to move my vector table to SRAM. It fails, depending on a change I apply in the linker script.
The project is a new project from scratch created by MCUxpresso. I am not looking for answers MCUxpresso related, or c/c++/startup code related. I only want to properly understand the consequences of my changed linker script I show below.
The part that works
My starting point is a small program on my evk board, using a simple FreeRTOS task to blink a led. This works fine, when I put my vector table in flash.
linker script:
/* Not relevant for this question, other than showing there is something
written to flash before my vector table, harmless I think, but didn't want to leave
out of this question
*/
.boot_hdr : ALIGN(4)
{
__boot_hdr_start__ = ABSOLUTE(.) ;
KEEP(*(.boot_hdr.conf))
. = 0x1000 ;
KEEP(*(.boot_hdr.ivt))
. = 0x1020 ;
KEEP(*(.boot_hdr.boot_data))
. = 0x1030 ;
KEEP(*(.boot_hdr.dcd_data))
__boot_hdr_end__ = ABSOLUTE(.) ;
. = 0x2000 ;
} >PROGRAM_FLASH
/*
Here I write my vector table to flash
*/
.vector : ALIGN(4)
{
__vector_table_flash_start__ = ADDR(.vector) ;
__vector_table_itc_start__ = LOADADDR(.vector) ;
KEEP(*(.isr_vector))
__vector_table_flash_end__ = ABSOLUTE(.) ;
. = ALIGN(4) ;
} >PROGRAM_FLASH
Disassembled code for vector table
Disassembled code of reset handler
Note: 0x600022e5 corresponds to 0x600022e4, this has something to do with arm .thumb. I don't exactly know how that works tbh.
When I run this app, it runs fine. If I set a breakpoint in the ResetHandler it breaks and I can step through the startup code and jump to main. When I let the program run, my led will blink every second.
The part which fails
I changed my linker script to put my vector table in SRAM as follows
.vector : ALIGN(4)
{
__vector_table_flash_start__ = ADDR(.vector) ;
__vector_table_itc_start__ = LOADADDR(.vector) ;
KEEP(*(.isr_vector))
__vector_table_flash_end__ = ABSOLUTE(.) ;
. = ALIGN(4) ;
} >SRAM_ITC AT>PROGRAM_FLASH
For reference, the memory section:
MEMORY
{
PROGRAM_FLASH (rx) : ORIGIN = 0x60000000, LENGTH = 0x400000
SRAM_DTC (rwx) : ORIGIN = 0x20000000, LENGTH = 0x10000
SRAM_ITC (rwx) : ORIGIN = 0x0, LENGTH = 0x10000
SRAM_OC (rwx) : ORIGIN = 0x20200000, LENGTH = 0x20000
}
ENTRY(ResetISR)
When I upload, my program doesn't even reach the reset vector. It goes straight into the woods, and crashes somewhere outside program code.
The questions
What EXACTLY happens when I adjust my linker script with >SRAM_ITC AT>PROGRAM_FLASH?
I am pretty sure the produced elf file still contains the entire vector table starting from address 0x60002000. The >SRAM_ITC only tells the linker where certain parts of memory will end up AFTER the startup code copied all parts to their final ram location. Right? So how on earth can the initial jump to 0x60002004 (the address which holds the location of the reset handler) fail? The nxp bootloader always expects the reset vector on that location. I didn't change that. I only told the linker that the memory on that location will finally end up in SRAM. What am I misunderstanding here?
Maybe a stupid question: If I am completely wrong with my above assumptions, is there a way to see this from disassembly? I think objdump only shows the final addresses, but my debug probe will only write to flash as far as I know. So after uploading my code to my target, I still assume that stuff got written to flash, and after reset the built in bootloader will jump to 0x60002004 and set the PC to the address located at 0x60002000. Where can I see the actual blob of bytes which is programmed to flash memory?
Copying the vector table to sram from my custom bootloader solved the problem. That way the "on chip bootloader" from nxp can jump to my custom bootloader.
Before I just to my app from my custom bootloader, I copy the vector table to sram and set SCB->VTOR to the start of sram vector table.

AVR C compilers behavior. Memory management

Do AVR C compilers make program memorize the address in SRAM where function started to store its data (variables, arrays) in data stack in one of index registers in order to get absolute address of local variable by formula:
absoluteAdr = functionDataStartAdr + localShiftOfVariable.
And do they increase data stack point when variable declared by it's length or stack pointer increased in end/start of function for all it's variables lengths.
Let's have a look at avr-gcc, which is freely available including its ABI:
Do AVR C compilers make program memorize the address in SRAM where function started to store its data (variables, arrays) in data stack in one of index registers in order to get absolute address of local variable by formula:
Yes, no, it depends:
Static Storage
For variables in static storage, i.e. variables as defined by
unsigned char func (void)
{
static unsigned char var;
return ++var;
}
the compiler generates a symbol like var.123 with appropriate size (1 byte in this case). The linker / locator will then assign the address.
func:
lds r24,var.1505
subi r24,lo8(-(1))
sts var.1505,r24
ret
.local var.1505
.comm var.1505,1,1
Automatic
Automatic variables are held in registers if possible, otherwise the compiler allocates space in the frame of the function. It may even be the case that variables are optimized out, and in that case they do not exist anywhere in the program:
int add (void)
{
int a = 1;
int b = 2;
return a + b;
}
→
add:
ldi r24,lo8(3)
ldi r25,0
ret
There are 3 types of entities that are stored in the frame of a function, all of which might be present or absent depending on the program:
Callee-saved registers that are saved (PUSH'ed) by the function prologue and restored (POP'ed) by the epilogue. This is needed when local variables are allocated to callee-saved registers.
Space for local variables that cannot be allocated to registers. This happens when the variable is too big to be held in registers, there are too many auto variables, or the address of a variable is taken (and taking the address cannot be optimized out). This is because you cannot take the address of a register1.
void use_address (int*);
void func (void)
{
int a;
use_address (&a);
}
The space for these variables is allocated in the prologue and deallocated in the epilogue. Shrink-wrapping is not implemented:
func:
push r28
push r29
rcall .
in r28,__SP_L__
in r29,__SP_H__
/* prologue: function */
/* frame size = 2 */
/* stack size = 4 */
movw r24,r28
adiw r24,1
rcall use_address
pop __tmp_reg__
pop __tmp_reg__
pop r29
pop r28
ret
In this example, a occupies 2 bytes which are allocated by rcall . (it was compiled for a device with 16-bit program counter). Then the compiler initialized the frame-pointer Y (R29:R28) with the value of the stack pointer. This is needed because on AVR, you cannot access memory via SP; the only memory operations that involve SP are PUSH and POP. Then the address of that variable which is Y+1 is passed in R24. After the call of the function, the epilogue frees the frame and restores R28 and R29.
Arguments that have to be passed on the stack:
void xfunc (int, ...);
void call_xfunc (void)
{
xfunc (42);
}
These arguments are pushed and the callee is picking them up from the stack. These arguments are pushed / popped around the call, but can also be accumulated by means of -maccumulate-args.
call_func:
push __zero_reg__
ldi r24,lo8(42)
push r24
rcall xfunc
pop __tmp_reg__
pop __tmp_reg__
ret
In this example, the argument has to be passed on the stack because the ABI says that all arguments of a varargs function have to be passed on the stack, including the named ones.
For a description on how exactly the frame is being layed out and arguments are being passed, see [Frame Layout and Argument Passing] (https://gcc.gnu.org/wiki/avr-gcc#Frame_Layout).
1 Some AVRs actually allow this, but you never (like in NEVER) want to pass around the address of a general purpose register!
Compilers are not managing the RAM, compilers at compilation time calculate the required size for each data sections like bss, data, text, rodata, .. etc and generate relocatable object file for each translation unit
The linker comes after and generate one object file and assign the relocatable addresses to absolute ones mapped according to the Linker configuration File LCF.
In run time, the mechanism depends on the architecture itself. normally, each function call has a frame in the stack where it's arguments, return address and local variables are defined. the stack extend with a creation of variables and for low cost AVR microcontrollers, there is no memory management protection regarding the stack increase or the overlapping between the stack and another memory section -normally the heap-. even if there is OS managing the protection from the tasks to exceed its allocated stack, without a memory management unit, all what OS can do is to assert a RESET with illegal memory access reason.

moving the vector table on an STM32F405

I want to move my code in the flash memory on an STM32F405.
I changed the linker script to change the start of flash like so:
FLASH (rx) : ORIGIN = 0x08008000, LENGTH = 1024K-32K
If I am correct the vector table will also be located at 0x08008000. I would like to create a bootloader for a start I would like to run my application in the new memory location. Do my bootloader and application have sepperate vector tables? How can I initialize the stack pointer to 0x8008000?
Yes, your bootloader will have a separate vector table to your main code. The last thing your bootloader, or the first thing your main code should do is remap the vector table, using the SCB->VTOR register. The vector table is 4 bytes in from the start of the image, so using your numbers, SCB->VTOR should be 0x08008004. The first 4 bytes of the image are the value the stack pointer should be initialised with.
You don't want to initalise your stack pointer to 0x8008000, that address is in flash and will cause a hard fault as soon as you try to push something, if that is where your application starts then the memory at 0x08008000 contains the address you should use as the stack pointer.
To set it I have always used an asm function which just loads SP with the value passed to the function in R0, something like the following.
SetSP PROC
EXPORT SetSP
MOV SP, R0
BX LR
ENDP
To call from a C context:
extern void SetSP(uint32_t address);
uint32_t sp = *((uint32_t *)0x08008000);
SetSP(sp);
That dereferences a pointer to 0x08008000, to get the initial stack pointer, then sets it.

Offset of User Code for Bootloader XMC4500 ARM Cortex M4

I've made a bootloader and a windows application which communicate. The windows application sends the hex file (user app with an offset starting at 0x0C008000) to the bootloader which stores it at the address in flash. The data of the hex file is stored at the right address. I've checked it by reading the flash.
The bootloader waits for 3 seconds to receive a sign. If there is no sign received then it willjump to the user app.
The Jump is not working and I dont know why. Bootloader starts at 0x0C000000.
In the linker script of the user app i mapped the memory regions like this (starting at 0x0C008000)
MEMORY
{
FLASH_1_cached(RX) : ORIGIN = 0x08008000, LENGTH = 0xf8000
FLASH_1_uncached(RX) : ORIGIN = 0x0C008000, LENGTH = 0xf8000
PSRAM_1(!RX) : ORIGIN = 0x10000000, LENGTH = 0x10000
DSRAM_1_system(!RX) : ORIGIN = 0x20000000, LENGTH = 0x10000
DSRAM_2_comm(!RX) : ORIGIN = 0x30000000, LENGTH = 0x8000
}
The code of the jump is following
void RunFlash(void)
{
PPB->VTOR = 0x0c008000; // Offset of the Vector
asm("ldr r0, =0x0c008000");
asm("ldr r1, =0xE000ED08");
asm("str r0,[r1]");
asm("ldr sp, [r0], #4");
asm("ldr r15, [R0]");
}
I always get an "No source available for "0x80082b0" error.
Can somebody please help me? Do i need to make additional changes to the linker script? Do i need to change something in the startup code?
I'm using an Infenion XMC4500 and DAVE4 IDE.

Using #defined values before RAM has been initialised

I am writing the boot-up code for an ARM CPU. There is no internal RAM, but there is 1GB of DDRAM connected to the CPU, which is not directly accessible before initialisation. The code is stored in flash, initialises RAM, then copies itself and the data segment to RAM and continue execution there. My program is:
#define REG_BASE_BOOTUP 0xD0000000
#define INTER_REGS_BASE REG_BASE_BOOTUP
#define SDRAM_FTDLL_REG_DEFAULT_LEFT 0x887000
#define DRAM_BASE 0x0
#define SDRAM_FTDLL_CONFIG_LEFT_REG (DRAM_BASE+ 0x1484)
... //a lot of registers
void sdram_init() __attribute__((section(".text_sdram_init")));
void ram_init()
{
static volatile unsigned int* const sdram_ftdll_config_left_reg = (unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
... //a lot of registers assignments
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
At the moment my program is not working correctly because the register values end up being linked to RAM, and at the moment the program tries to access them only the flash is usable.
How could I change my linker script or my program so that those values have their address in flash? Is there a way I can have those values in the text segment?
And actually are those defined values global or static data when they are declared at file scope?
Edit:
The object file is linked with the following linker script:
MEMORY
{
RAM (rw) : ORIGIN = 0x00001000, LENGTH = 12M-4K
ROM (rx) : ORIGIN = 0x007f1000, LENGTH = 60K
VECTOR (rx) : ORIGIN = 0x007f0000, LENGTH = 4K
}
SECTIONS
{
.startup :
{
KEEP((.text.vectors))
sdram_init.o(.sdram_init)
} > VECTOR
...
}
Disassembly from the register assignment:
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
7f0068: e59f3204 ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: e5932000 ldr r2, [r3]
7f0070: e59f3200 ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: e5823000 str r3, [r2]
...
7f0274: 007f2304 .word 0x007f2304
7f0278: 00887000 .word 0x00887000
To answer your question directly -- #defined values are not stored in the program anywhere (besides possibly in debug sections). Macros are expanded at compile time as if you'd typed them out in the function, something like:
*((unsigned int *) 0xd0010000) = 0x800f800f;
The values do end up in the text segment, as part of your compiled code.
What's much more likely here is that there's something else you're doing wrong. Off the top of my head, my first guess would be that your stack isn't initialized properly, or is located in a memory region that isn't available yet.
There are a few options to solve this problem.
Use PC relative data access.
Use a custom linker script.
Use assembler.
Use PC relative data access
The trouble you have with this method is you must know details of how the compiler will generate code. #define register1 (volatile unsigned int *)0xd0010000UL is that this is being stored as a static variable which is loaded from the linked SDRAM address.
7f0068: ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: ldr r2, [r3] ; !! This is a problem !!
7f0070: ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: str r3, [r2]
...
7f0274: .word 0x007f2304 ; !! This memory doesn't exist.
7f0278: .word 0x00887000
You must do this,
void ram_init()
{
/* NO 'static', you can not do that. */
/* static */ volatile unsigned int* const sdram_reg =
(unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
Or you may prefer to implement this in assembler as it is probably pretty obtuse as to what you can and can't do here. The main effect of the above C code is that every thing is calculated or PC relative. If you opt not to use a linker script, this must be the case. As Duskwuff points out, you also can have stack issues. If you have no ETB memory, etc, that you can use as a temporary stack then it probably best to code this in assembler.
Linker script
See gnu linker map... and many other question on using a linker script in this case. If you want specifics, you need to give actual addresses use by the processor. With this option you can annotate your function to specify which section it will live in. For instance,
void ram_init() __attribute__((section("FLASH")));
In this case, you would use the Gnu Linkers MEMORY statement and AT statements to put this code at the flash address where you desire it to run from.
Use assembler
Assembler gives you full control over memory use. You can garentee that no stack is used, that no non-PC relative code is generated and it will probably be faster to boot. Here is some table driven ARM assembler I have used for the case you describe, initializing an SDRAM controller.
/* Macro for table of register writes. */
.macro DCDGEN,type,addr,data
.long \type
.long \addr
.long \data
.endm
.set FTDLL_CONFIG_LEFT, 0xD0001484
sdram_init:
DCDGEN 4, FTDLL_CONFIG_LEFT, 0x887000
1:
init_sdram_bank:
adr r0,sdram_init
adr r1,1b
1:
/* Delay. */
mov r5,#0x100
2: subs r5,r5,#1
bne 2b
ldmia r0!, {r2,r3,r4} /* Load DCD entry. */
cmp r2,#1 /* byte? */
streqb r4,[r3] /* Store byte... */
strne r4,[r3] /* Store word. */
cmp r0,r1 /* table done? */
blo 1b
bx lr
/* Dump literal pool. */
.ltorg
Assembler has many benefits. You can also clear the bss section and setup the stack with simple routines. There are many on the Internet and I think you can probably code one yourself. The gnu ld script is also beneficial with assembler as you can ensure that sections like bss are aligned and a multiple of 4,8,etc. so that the clearing routine doesn't need special cases. Also, you will have to copy the code from flash to SDRAM after it is initialized. This is a fairly expensive/long running task and you can speed it up with some short assembler.

Resources