What are the appropriate steps to write a custom bootloader for the STM32L0 in IAR? The following points are not clear to me:
Do I make a new IAR Project?
If yes, do I write the bootloader like a normal project and just change my original .icf file so there is a small ROM and a small RAM region for the bootloader?
If no, what do I have to configure in the IAR project apart from the .icf file and the code?
What other things do I need to think of?
I'm having trouble starting into this.
So the .icf for the main project would be:
__region_ROM_start__ = 0x08000000;
__region_ROM_end__ = 0x08008FFF;
And the .icf for the bootloader project would be:
__region_Bootloader_ROM_start__ = 0x08009000;
__region_Bootloader_ROM_end__ = 0x08009FFF;
And the same thing for about 0xFF bytes of RAM?
You do not need to restrict the RAM - you can use all of it because when you switch to the application a new run-time environment will be established and the RAM will be reused.
The flash you reserve for the bootloader must be a whole number of flash pages starting from the reset address. The STM32L0 has very small flash pages so there should be minimal waste, but you don't want to have to change it if your bootloader grows, because then you will have to rebuild your application code for the new start address and old application images will no longer be loadable. So consider giving yourself a little headroom.
The bootloader can be built just like any other STM32L0xx project; the application code ROM configuration must start from an address above the bootloader. So for example say you have a 1Kbyte bootloader:
Boot ROM Start: 0x0800 0000
Boot ROM End: 0x0800 03FF
Application Start: 0x0800 0400
Application End: Part size dependent.
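As a minimal sketch of the sizing rule above, the application start address can be derived by rounding the bootloader size up to a whole number of flash pages. The 128-byte page size and the names FLASH_BASE_ADDR, FLASH_PAGE_SIZE and app_start_address are assumptions for illustration; check the page size in the reference manual for your part.

```c
#include <stdint.h>

#define FLASH_BASE_ADDR  0x08000000u  /* reset address of the on-chip flash */
#define FLASH_PAGE_SIZE  128u         /* assumed page size; check your part */

/* Round the bootloader size up to a whole number of flash pages and return
 * the first address that is available to the application. */
static uint32_t app_start_address(uint32_t bootloader_size)
{
    uint32_t pages = (bootloader_size + FLASH_PAGE_SIZE - 1u) / FLASH_PAGE_SIZE;
    return FLASH_BASE_ADDR + pages * FLASH_PAGE_SIZE;
}
```

Reserving a few pages more than the current bootloader size keeps the application start address stable if the bootloader grows later.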
The bootloader itself must have a means of determining that an update is available. If one is, it must read the application data and write it to the application flash memory. It must then disable any interrupts that it enabled, and it may also need to deinitialise any peripherals it used (if they remain active when the switch to the application is made, they may cause problems). Only then is the switch to the application code made.
If the bootloader and application both run from the same clock configuration, it is possible to minimise the configuration in the application and rely on the bootloader. This is a small space saving, but less flexible. If, for example, you make the bootloader run from the internal RC oscillator, it will be portable across multiple hardware designs that may have differing application speed and clocking requirements and different external oscillator frequencies.
The switch to the application is pretty simple on Cortex-M: the vector table is switched to the application's vector table, then the stack pointer and program counter are loaded. The latter requires a little assembly code. The following is for Cortex-M3; it needs adapting for the M0+, whose ARMv6-M Thumb instruction set cannot load SP or PC directly with LDR, so the two values have to be loaded into low registers first and then moved into SP and branched to with BX:
Given the following in-line assembly function:
__asm void boot_jump( uint32_t address )
{
LDR SP, [R0] ;Load new stack pointer address
LDR PC, [R0, #4] ;Load new program counter address
}
The bootloader switches to the application image thus:
// Switch off the SysTick timer before switching vector table
SysTick->CTRL = 0 ;
// Switch off any other enabled interrupts too
...
// Switch vector table
SCB->VTOR = APPLICATION_START_ADDR ;
//Jump to start address
boot_jump( APPLICATION_START_ADDR ) ;
Where APPLICATION_START_ADDR is the base address of the application area. This address is the start of the application's vector table, which begins with the initial stack pointer and the reset vector; the boot_jump() function loads these into the SP and PC registers to start the application as if it had been started at reset. The application's reset vector contains the application's execution start address.
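To make the layout concrete, here is a small host-side sketch that decodes the first two words of an application image and applies a basic plausibility check before a jump would be attempted. The struct and function names and the RAM bounds passed in are hypothetical; the check itself (SP inside RAM, Thumb bit set in the reset vector) is a common precaution, not something the answer above prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

struct vector_head {
    uint32_t initial_sp;     /* word 0: value loaded into SP */
    uint32_t reset_handler;  /* word 1: value loaded into PC */
};

/* Decode a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the two words that boot_jump() would load from the image base. */
static struct vector_head read_vector_head(const uint8_t *image)
{
    struct vector_head h;
    h.initial_sp = read_le32(image);
    h.reset_handler = read_le32(image + 4);
    return h;
}

/* ram_end is one past the last RAM byte; the initial SP often equals it.
 * Erased flash (all 0xFF) fails the SP range test. */
static bool vector_head_plausible(const struct vector_head *h,
                                  uint32_t ram_start, uint32_t ram_end)
{
    bool sp_ok = h->initial_sp > ram_start && h->initial_sp <= ram_end &&
                 (h->initial_sp & 3u) == 0u;
    bool pc_ok = (h->reset_handler & 1u) != 0u;  /* Thumb bit must be set */
    return sp_ok && pc_ok;
}
```

A bootloader could run a check like this on the application area and stay in update mode if it fails, rather than jumping into erased flash.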
Your needs may vary, but in my experience a serial bootloader (over UART) using XMODEM and decoding an image in Intel Hex format takes about 4 KB of flash. On an STM32L0 you may want to use something simpler: 1 KB is probably feasible if you simply stream the raw binary data and use hardware flow control. You need to control the data flow because erasing and programming the flash takes time and also stalls the CPU, since on STM32 you cannot write flash memory while simultaneously fetching instructions from it.
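Whichever transfer scheme is chosen, some integrity check on the received data is cheap to add. As a sketch, the CRC variant of XMODEM uses a 16-bit CRC with polynomial 0x1021 and an initial value of 0; the function name below is my own, and a bitwise loop is used because it needs no lookup table and so costs very little flash.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16 as used by XMODEM-CRC: polynomial 0x1021, initial value 0,
 * no bit reflection, no final XOR. Bitwise version, no table needed. */
static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0u;
    for (size_t i = 0; i < len; ++i) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

The same routine can be run over the whole programmed application area to decide at boot whether the image is intact.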
See also: How to jump between programs in Stellaris
I've created an application which has 2 firmware slots in its memory mapping. It works pretty fine and both slots are executed correctly based on a 32-bit sequencer number stored in FLASH.
The problem appears when I try to use FreeRTOS. By default, the firmware is compiled for the first slot, and there is no problem running from this slot. However, when the device starts the firmware saved in the second slot and the RTOS starts its first task in prvPortStartFirstTask and then jumps to vPortSVCHandler, it switches to the task in the first slot.
What am I doing wrong? I thought function addresses are relative after compilation, so there should be no difficulties running this application with 2 firmware slots.
EDIT
My flow during switching from bootloader to main application is as follows:
1. Check which firmware slot should be used.
2. Disable IRQs.
3. Copy the vector table to RAM. That part of RAM is the same for both slots. While copying, I add an offset to each address so that it matches the particular firmware slot. By default the addresses have no offset; it is removed in a post-compile stage.
4. Set the stack pointer according to the first word of the vector table in RAM. That address is not changed while copying the vector table to RAM.
5. Set SCB->VTOR.
6. Execute a Data Synchronization Barrier, DSB().
7. Jump to the Reset Handler from the vector table copied to RAM.
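The copy-with-offset step described in this list might look roughly like the sketch below. The function name and the rule of leaving zero entries untouched are assumptions on my part, not the asker's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a vector table and add slot_offset to every handler address.
 * Entry 0 (the initial stack pointer) is a RAM address and is copied
 * unchanged; zero entries (reserved vectors) are left as zero. */
static void relocate_vector_table(uint32_t *dst, const uint32_t *src,
                                  size_t count, uint32_t slot_offset)
{
    for (size_t i = 0; i < count; ++i) {
        uint32_t v = src[i];
        if (i != 0 && v != 0u) {
            v += slot_offset;
        }
        dst[i] = v;
    }
}
```

Note that this only fixes up the vector table itself. Absolute addresses that the linker baked into the code, such as function pointers and literal-pool constants, still point at the slot the image was linked for.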
EDIT 2
When I compile the application with the FLASH memory address range changed to the secondary slot, it works properly.
Is it possible to compile the code so that the application is position independent, or at least so that it works in this case?
EDIT 3
# Generate position independent code.
-fPIC
# Access bss via the GOT.
-mno-pic-data-is-text-relative
# GOT is not PC-relative; store GOT location in a register.
-msingle-pic-base
# Store GOT location in r9.
-mpic-register=r9
However, now this slot stopped working.
I think my problem is similar to that one.
Generally, firmware isn't built position independent, so I wouldn't trust that all "function addresses are relative after compilation". You compile firmware for a specific start location (either the first or the second firmware slot).
As for your main question, have you done anything to switch the interrupt handlers / interrupt vector from one firmware slot to the other? Or are you jumping to the first firmware's interrupt handlers when you call the SVC handler?
How to change the interrupt vector varies between architectures. For an stm32f429, you could perhaps look here
I have written a bootloader to jump into my app.
First I tried with a simple blinky-LED app, and I am able to jump into it from the bootloader.
Now I want to jump into my real app. The app works well on its own, but when I jump into it from my bootloader it crashes as soon as interrupts are enabled. My jump code:
__disable_irq();
SCB->VTOR = (uint32_t)0x0800BA00;
JumpAddress = *(__IO uint32_t*) (0X0800BA04);
JumpToApplication = (pFunction) JumpAddress;
__set_MSP(*(__IO uint32_t*) 0X0800BA00);
JumpToApplication();
I don't know what is wrong or why enabling interrupts crashes the app.
Thank you for your help
Before you jump to the application, you should deinitialize everything that you've initialized in the bootloader. If your bootloader uses USART with interrupts, you should disable this USART (for example using the RCC->AHBxRSTR / RCC->APBxRSTR reset registers) and disable its interrupts. You should also jump to your application with interrupts enabled. Your application should get the chip just as it would be after a normal reset.
If your application uses this crap code from ST called SPL or HAL, then make sure that this code does NOT reset SCB->VTOR back to 0 or 0x8000000, because normally it does that in SystemInit(), which is called from Reset_Handler(), before main().
BTW - are you absolutely sure about the address of your application? You usually put the application at the page boundary, while your code does not indicate that - 0x800ba00 (46.5kB) is pretty far from the closest page boundaries 32kB and 48kB...
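A quick way to sanity-check an application base address is to walk the sector map and see whether the address is the start of a sector. The sketch below assumes a layout of four 16 KiB sectors, one 64 KiB sector and then 128 KiB sectors from 0x08000000, which matches the 32 kB and 48 kB boundaries mentioned above but is still an assumption; take the real map from the reference manual of your part.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sector sizes in KiB, in address order (hypothetical layout). */
static const uint32_t sector_kib[] = { 16, 16, 16, 16, 64, 128, 128, 128 };

/* Return true if address is the first byte of one of the listed sectors. */
static bool is_sector_start(uint32_t address)
{
    uint32_t boundary = 0x08000000u;
    size_t n = sizeof sector_kib / sizeof sector_kib[0];
    for (size_t i = 0; i < n; ++i) {
        if (address == boundary) {
            return true;
        }
        boundary += sector_kib[i] * 1024u;
    }
    return false;
}
```

With this layout, 0x0800BA00 is not a sector start, while 0x08008000 and 0x0800C000 are.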
Set the clock settings of the boot code and the application code to be the same.
I have split the software into two parts: a bootloader (without RTX) and an application image with RTX.
But the bootloader cannot load the application image with RTX.
The Flash settings are:
          Start address   Size
IROM 1:   0x08000000      0x2800   Bootloader (without RTX)
IROM 2:   0x08002800      0xD000   Application image (with RTX)
I have tested 3 ways:
(1) Use another App without RTX. The bootloader could load the app successfully.
(2) Change the IROM setting of the RTX application project. I changed the IROM start address from 0x08002800 to 0x08000000 and downloaded the application image into flash at 0x08000000. The image runs from 0x08000000 successfully.
(3) The application image IROM start address is set to 0x08002800. After downloading the bootloader and the app image into flash, I debugged the app project in Keil step by step. I found an "osTimerthread stack overflow" error, and then the main thread stack also overflowed. I have tried increasing the stack size, but it doesn't help.
I found that the app gets stuck in RTX kernel switching. All threads are in the waiting state and none are running.
P.S. When debugging in Keil, test item (2) also shows stack overflow errors during kernel initialization, yet item (2) has worked fine so far. I mention it here in case it is relevant.
This is the debugging picture for item (3).
Are you actually changing the linker script to link starting at 0x08002800 when using the bootloader or just loading the application (linked at 0x08000000) at an offset of 0x2800? Double check this (look in the map file) for your linked output to ensure that all your symbols are not linked in the 0x08000000 - 0x08002800 range.
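Besides reading the map file, the same check can be done on the image itself: every handler address in the vector table should fall inside the region the application is supposed to occupy. This is only a sketch; the function name is hypothetical and the region bounds would be 0x08002800 and 0x0800F800 for the IROM 2 settings quoted in the question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if every non-zero handler address in the vector table lies
 * in [region_start, region_end). Entry 0 is the initial stack pointer
 * (a RAM address) and is skipped. */
static bool vectors_within_region(const uint32_t *vectors, size_t count,
                                  uint32_t region_start, uint32_t region_end)
{
    for (size_t i = 1; i < count; ++i) {
        if (vectors[i] == 0u) {
            continue;                        /* reserved or unused entry */
        }
        uint32_t target = vectors[i] & ~1u;  /* drop the Thumb bit */
        if (target < region_start || target >= region_end) {
            return false;
        }
    }
    return true;
}
```

An image that was linked at 0x08000000 and merely loaded at an offset fails this check immediately, because its reset vector still points below 0x08002800.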
Additionally, make sure you are using the correct entry point and stack pointer. The application's initial stack pointer value should be stored at 0x08002800, and the reset vector at 0x08002804. Your bootloader will need to set up the MSP register with that stack pointer before jumping to the application. Here is some example code from ST's USB DFU bootloader:
typedef void (*pFunction)(void);
pFunction JumpToApplication;
uint32_t JumpAddress;
/* Jump to user application */
JumpAddress = *(__IO uint32_t*) (USBD_DFU_APP_DEFAULT_ADD + 4);
JumpToApplication = (pFunction) JumpAddress;
/* Initialize user application's Stack Pointer */
__set_MSP(*(__IO uint32_t*) USBD_DFU_APP_DEFAULT_ADD);
JumpToApplication();
Additionally, depending on how much your bootloader configures before jumping to the application, you may need to 'deconfigure' certain peripherals. As an example, if you set up your clocks in the bootloader before deciding to jump to the application, you may run into problems in your application if it assumes that the clocks are in the default configuration already. Similar things can happen with the NVIC and SysTick if your bootloader is using these before jumping to the application.
Lastly, along the same lines as the previous section, the application may be making assumptions about the state of peripherals being default, but it also may be making assumptions that the peripheral defaults are correct. For example: SCB->VTOR has a default value (I believe it is always 0x00000000), and this points to the vector table. Your bootloader will be linked to have its vector table at that location. You'll need to make sure that when your application is starting up, it updates the VTOR register to point to the actual location of its vector table.
Hopefully one of these sections helps you identify the problem.
According to some tutorials, we disable the MMU and the I/D-caches at the beginning of the bootloader. If I understand correctly, the aim is to use physical addresses directly in the program; please correct me if I'm wrong. Thank you!
Secondly, we do this to disable the MMU and caches:
mrc P15, 0, R0, C1, C0, 0
bic R0, R0, #0x00002300 # clear bits 13, 9:8
bic R0, R0, #0x00000087 # clear bits 7, 2:0
orr R0, R0, #0x00000002 # set bit 2 (A) Align
orr R0, R0, #0x00001000 # set bit 12 (I) I-Cache
mcr P15, 0, R0, C1, C0, 0
The D-cache, MMU and data address alignment fault checking are disabled by clearing bits 2:0, so why do we set bit 2 again in the very next instruction? To make sure this manipulation is valid?
Last question: why is the D-cache disabled but the I-cache enabled? To speed up instruction processing?
The MMU has settings that determine which memory regions are cacheable. If the MMU is off but the data cache is on (where that is even possible), you cannot safely talk to peripherals. A read of, say, the UART status register goes through the cache just like any other data access, and whatever status was read stays in the cache for subsequent reads until that cache line is evicted and you get one more shot at the actual register.
Say you have code that polls the UART status register waiting for a character in the RX buffer. If the first read shows no character, that status goes into the cache and you stay in the loop forever, because you never reach the status register again; you simply get the cached copy. If there was a character, that status gets cached too: you read the RX register and perhaps do something with it, but when you come back, if the status has not been evicted, you get the stale status saying a character is waiting. The RX buffer read may or may not be cached as well, so you may get a stale value from the cache, whatever the peripheral returns when it is read with no new data, or a genuinely new value. What you do not get in any of these situations is proper access to the peripheral.
With the MMU on, you use it to mark the peripheral's address space as non-(data)-cacheable and the problem goes away. With the MMU off, you need the data cache off on ARM systems.
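The polling problem can be shown with a toy model on a host machine. This is purely illustrative: a single variable stands in for the UART status register and a one-entry "cache" stands in for the data cache, and none of the names correspond to real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one peripheral status register and a one-entry data cache. */
static uint32_t uart_status;    /* what the hardware really reports */
static uint32_t cached_status;  /* copy held in the cache line */
static bool     cache_valid;    /* false until the first cached read */

/* Read through the cache: only the first read reaches the register. */
static uint32_t read_status_cached(void)
{
    if (!cache_valid) {
        cached_status = uart_status;
        cache_valid = true;
    }
    return cached_status;
}

/* Uncached read: always reaches the register. */
static uint32_t read_status_uncached(void)
{
    return uart_status;
}
```

Once the cached copy holds "no character", a polling loop built on read_status_cached() never sees the register change until the entry is evicted, which is exactly the hang described above.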
Leaving the I-cache on is okay because instruction fetches only read instructions. For a bare-metal application that is fine, and it helps, for example, if you are running from a flash that is prone to read disturb (SPI or I2C flashes). The problem is that this application is a bootloader, so you must take some extra care. Suppose a program at address 0x8000 has been run at least once, and the bootloader, sitting at say 0x10000000, then loads a new program at 0x8000. The load uses data accesses, so it does not go through the instruction cache. The instruction cache may therefore still hold some or all of the code from the last time execution was in the 0x8000 area, and when you branch to the newly loaded code at 0x8000 you get either the old program from the cache or a nasty mixture of old and new, depending on which parts are cached. So if your bootloader leaves the I-cache on, you need to invalidate it before branching to bootloaded code.
Lastly, if you or anyone using this bootloader wants to use JTAG, you have the same problem but worse. Data cycles that do not go through the I-cache are used to write the new program to RAM, so when you tell the JTAG debugger to run the new program you will get one of: 1) only the new program, 2) a mixture of the new program and old program fragments from the cache, or 3) the old program from the cache.
So the D-cache is bad without an MMU because of things that are not RAM (peripherals, etc.). The I-cache is a use-at-your-own-risk kind of thing, which you can mitigate except when JTAG is used for debugging.
If you have concerns about, or have confirmed, read disturb in your (external) flash, then I recommend turning on the I-cache, using a tight loop to copy your application to RAM, branching to the RAM copy and running from there. Then turn off the I-cache (or use it at your own risk) and don't touch the flash again, certainly not with heavy read accesses to small areas. A tight UART polling loop, like you might have for a command-line parser, is a really good place to get hit by read disturb.
You did not specify which ARM you are working on. Capabilities vary from one ARM to another (there is a huge gap between an ARM9 and an ARM Cortex-A15).
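In the given code, the alignment bit is cleared and then set again (note that the mask 0x00000002 is bit 1, the A bit, even though the comment calls it bit 2), but it does not matter, as those changes are made in R0. There is no change in the ARM's behavior until the write to the CP15 register (done by the instruction mcr P15, 0, R0, C1, C0, 0).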
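The effect of the four mask instructions can be checked on paper or with a few lines of C that mirror them. This is a host-side sketch of the arithmetic only; the macro and function names are my own, and the bit positions shown (M = bit 0, A = bit 1, C = bit 2, I = bit 12) are those of the classic CP15 control register.

```c
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_A (1u << 1)   /* alignment fault checking */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Mirror the bic/bic/orr/orr sequence on a copy of the register value.
 * Nothing takes effect on the core until the final value is written back. */
static uint32_t apply_boot_masks(uint32_t sctlr)
{
    uint32_t r0 = sctlr;   /* mrc: read the control register into r0 */
    r0 &= ~0x00002300u;    /* bic: clear bits 13, 9:8 */
    r0 &= ~0x00000087u;    /* bic: clear bits 7, 2:0 */
    r0 |= 0x00000002u;     /* orr: set bit 1 (A) */
    r0 |= 0x00001000u;     /* orr: set bit 12 (I) */
    return r0;             /* mcr: the single write that matters */
}
```

Whatever the starting value, the result has M and C clear and A and I set, so the intermediate clearing of the A bit is never visible to the hardware.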
Concerning D-cache/I-cache enabling, it is only a matter of choice; there is no requirement. On the products I work on, the bootloader enables the L1 I-cache, D-cache, L2 cache, and MMU (and it disables all of that before jumping to Linux). Be sure to follow ARM's documentation about cache invalidation and memory barriers (for your actual ARM core) if you use the caches and MMU in your bootloader.
I am working on a bootloader for the Stellaris LM3S1607 chip.
I am using the Keil µVision4 C compiler.
The idea is to create 2 independent firmware images, one of which will update the other.
In firmware1 I download the firmware2 file and write it to flash at address 0x3200. Up to here it works; I have also verified that the data is written to flash correctly.
Now I have two applications in flash: one is my uIP bootloader and the second one is my main project.
I want to know how I can jump from the first program to the second program located at 0x3200.
If someone can help me with the jump, that would be great.
Thanks
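This will work on any Cortex-M3/M4 class part (Cortex-M0/M0+ cannot load SP and PC directly with LDR, so the assembly needs adapting there)...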
Create an assembler function like:
__asm void boot_jump( uint32_t address )
{
LDR SP, [R0] ;Load new stack pointer address
LDR PC, [R0, #4] ;Load new program counter address
}
In-line assembler syntax varies; this example is Keil ARM-MDK / ARM RealView.
Then at the end of your bootloader:
// Switch off the SysTick timer before switching vector table
SysTick->CTRL = 0 ;
// Switch off any other enabled interrupts too
...
// Switch vector table
SCB->VTOR = APPLICATION_START_ADDR ;
//Jump to start address
boot_jump( APPLICATION_START_ADDR ) ;
Note that APPLICATION_START_ADDR in this case is the base or location address of your linked application code (0x3200 in this case), not the entry point indicated in the link map. The application vector table is located at this address, and the start of the vector table contains the application's initial stack pointer address and program counter (the actual code entry point).
The boot_jump() function loads a stack pointer and program counter from the application's vector table, simulating what happens on reset where they are loaded from the base of Flash memory (the bootloader's vector table).
Note that you must have set the start address in your application code's linker settings to the same as that which the bootloader will copy the image. If you are using the Keil debugger, you will not be able to load and run the application in the debugger without the bootloader present (or at least without manually setting the SP and PC correctly or using a debugger script), because the debugger loads the reset vector addresses rather than the application vector addresses.
It is important that interrupts are disabled before switching the vector table, otherwise any interrupt that occurs before the application is initialised will vector to the application's handler, and that may not be ready.
Be careful of any peripherals that you use in both the application and boot code, any assumptions about reset conditions may not hold if the peripheral registers have already been set by the boot code.