Custom bootloader issue for Kinetis MKE06Z microcontrollers on IAR EWARM

First I'd like to introduce myself, as I'm new to the site. I'm an electronic engineer specialized in embedded systems design and development. I've been gathering information from this site for a long time, and I think there are a lot of people here with a great deal of knowledge. I'm hoping some of you may have run into this or a similar issue.
I've been having some trouble implementing a custom bootloader for a Kinetis MKE06Z microcontroller. The problem is not the bootloader itself but the relocation of the application code and its behavior after the jump. The application is written entirely in C.
The bootloader executes everything as expected and determines whether it should run or jump to the user application. This is the sequence that implements the jump:
__disable_interrupt();
SCB->VTOR = RELOCATION_VECTOR_ADDR & 0x3FFFFE00;
JumpToUserApplication(RELOCATION_VECTOR_ADDR);
where:
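void JumpToUserApplication(uint32_t userStartup)
{
    /* Relies on userStartup arriving in r0 (AAPCS) and on the compiler
       not touching r0 before these statements run. */
    /* set up stack pointer */
    asm("LDR r1, [r0]");
    asm("MOV r13, r1");
    /* jump to application reset vector */
    asm("ADDS r0,r0,#0x04 ");
    asm("LDR r0, [r0]");
    asm("BX r0");
}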
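Before jumping, a bootloader usually checks that an application is actually present. A minimal sketch of such a check, assuming a hypothetical `mem_map_t` filled in from the MKE06Z reference manual and the application's .icf file. It only sanity-checks the two words that `JumpToUserApplication` loads into SP and PC, which are the first two entries of the application's vector table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory-map description; fill in real values from the
 * device reference manual and the application's linker file. */
typedef struct {
    uint32_t ram_start;   /* first RAM address */
    uint32_t ram_end;     /* one past the last RAM address */
    uint32_t code_start;  /* first address of application code */
    uint32_t code_end;    /* one past the last address of application code */
} mem_map_t;

/* Returns true if the first two vector-table words look like a plausible
 * initial stack pointer and reset handler. Erased flash (0xFFFFFFFF)
 * fails both checks. */
bool app_vectors_look_valid(const uint32_t vectors[2], const mem_map_t *map)
{
    uint32_t initial_sp = vectors[0];
    uint32_t reset_handler = vectors[1];

    /* Full-descending stack: the initial SP may equal ram_end.
     * AAPCS expects 8-byte stack alignment at public interfaces. */
    if ((initial_sp & 7u) != 0u ||
        initial_sp <= map->ram_start || initial_sp > map->ram_end) {
        return false;
    }

    /* Cortex-M executes Thumb code only, so bit 0 of the address must be set. */
    if ((reset_handler & 1u) == 0u) {
        return false;
    }

    uint32_t entry = reset_handler & ~1u;
    return entry >= map->code_start && entry < map->code_end;
}
```

On the target the table would be read in place, for example `app_vectors_look_valid((const uint32_t *)RELOCATION_VECTOR_ADDR, &map)`, before disabling interrupts and jumping. A check like this catches an erased or half-programmed application; it cannot catch a well-formed image that was linked for the wrong addresses.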
as implemented in Freescale's AN4767.
So far, so good. Once the jump is executed, I trace the application's behavior in the Disassembly window and find that, after some instructions, it gets stuck at a specific address containing a jump instruction, which turns out to be an infinite loop. I then ran it step by step to determine which instruction causes this malfunction. It's very strange: it runs fine and then suddenly jumps to a RAM address, executes a couple of cycles there, and then jumps to the infinite loop. I took note of the address of the instruction causing this strange jump and of the one with the infinite loop. Looking at the core registers, I found that there is an active exception, number 0x03 (Hard Fault). I then switched to debugging the user application.
Once in the user application, I start debugging. The user application works fine when it runs on its own (without the jump from the bootloader). Looking up the relevant addresses, I discovered that the routine causing the hard fault after the jump from the bootloader is IAR's __iar_data_init3. The thing is, it's part of a precompiled library, and I'm not sure whether it's safe to remove it (by removing __iar_program_start and replacing it with a direct call to main in the startup file).
The real question is: why does the application behave like that after the jump from the bootloader but not when there is no such jump? Why does this routine jump to a RAM address (when it shouldn't)?
Of course, it may be a little too specific, but hopefully someone can help me.
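Conceptually, a startup data initializer such as `__iar_data_init3` copies the initial values of initialized variables from flash into RAM and clears zero-initialized variables, so skipping it and calling main directly leaves those variables holding whatever RAM happened to contain. The sketch below only illustrates that idea: the `copy_record_t` and `zero_record_t` descriptors and `data_init_sketch` are hypothetical, and IAR's real initialization table format is internal and different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptors, for illustration only. */
typedef struct {
    const void *src;   /* initial values stored in flash */
    void *dst;         /* RAM location of the initialized variables */
    size_t len;        /* number of bytes to copy */
} copy_record_t;

typedef struct {
    void *dst;         /* RAM location of zero-initialized variables */
    size_t len;        /* number of bytes to clear */
} zero_record_t;

/* Copy initialized data from ROM to RAM, then clear zero-initialized data. */
void data_init_sketch(const copy_record_t *copies, size_t n_copies,
                      const zero_record_t *zeros, size_t n_zeros)
{
    for (size_t i = 0; i < n_copies; i++) {
        memcpy(copies[i].dst, copies[i].src, copies[i].len);
    }
    for (size_t i = 0; i < n_zeros; i++) {
        memset(zeros[i].dst, 0, zeros[i].len);
    }
}
```

If a routine like this faults, a reasonable first check is whether the source and destination addresses it reads are valid for the way the application was linked and started.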

It seems that something IAR does with the linker configuration, which I don't fully understand, is related to this problem. I had relocated the .text section:
define symbol __ICFEDIT_intvec_start__ = 0x00001800;
define symbol __ICFEDIT_region_ROM_start__ = 0x00002000;
define symbol __ICFEDIT_region_ROM_end__ = 0x0000FFFF;
define region APP_ROM = mem:[from (__ICFEDIT_region_ROM_start__) to (__ICFEDIT_region_ROM_end__)];
place at address mem:__ICFEDIT_intvec_start__ { readonly section .intvec };
place at start of APP_ROM { readonly section .text };
It seems that the linker doesn't handle this well, and something makes the app misbehave when it is started from another app. Instead, keeping the original .icf file and editing only the .intvec_start value in the GUI solved the problem, but then the code starts right after the vector table. That is not an issue, but I wanted to place the code a little farther away.
Thanks.

Related

Forcing a function to restore all registers before making a function call

I am using an EK-LM4F120XL board, which contains a cortex-M4 processor. I also use GCC-ARM-none-eabi as toolchain.
I am building a little hobby project which is slowly becoming an operating system. An important part of this is that I need to swap out registers to switch processes. This happens inside an interrupt, and this processor makes sure that all the temporary registers (r0-r3, r12, lr) are pushed to the process stack. So, to continue, I need to write the contents of r4-r11 and the SP to memory, load r4-r11 of the new process, load its stack pointer and return. Additionally, the lr value contains some information about the process that was interrupted, so I need information from that register too.
All of this works, because I wrote it in assembly. I linked the assembly function directly to the interrupt, so I have full control over what happens to the registers. The combination of C and inline assembly did not work because the compiler usually pushes some registers to the stack, and that is fatal. But the OS is growing and the context switch is growing along with it: there are now also some global variables that need changing, etc. All of this is doable in assembly, but it's becoming a pain: assembly is hard to read and to debug. So I want a C/assembly combination. Basically, I am looking for something like this:
void contextSwitch(void){
    //Figure out what the next process will be
    //Change every variable that needs changing
    // Restore register state to the moment of interrupt. The following function will not return in the sense that it will end the interrupt.
    swapRegisters(oldProc, newProc);
}
And then write only swapRegisters in assembly. Is there a way to achieve this? Is my solution even the best solution?
There is no portable method of directly accessing CPU registers in C; you will need assembler, in-line assembler, compiler intrinsics or a kernel library (that uses assembler code).
The details of how that is done for Cortex-M are well covered elsewhere and probably too complex to repeat here. The specifics for the Cortex-M4(F) are described on the ARM Info Center site. The approach is broadly similar for the Cortex-M3 apart from the FPU considerations; an M3-specific description of context switching is provided in this Embedded.com article.
As you can never have enough explanations (different authors make some things clearer than others, or give better or more directly applicable examples), here's another one: also M3-based, but it will work on an M4 if the FPU is not used, or on M4s without an FPU. And yet another example.
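One widely used way to get the C/assembly split the question asks for is to keep only the register save/restore in assembly and have it call a C function that receives the old process's stack pointer and returns the new one. A sketch, assuming GCC, a Cortex-M4 with no FPU context to save, processes running on PSP, and the switch done in a PendSV handler; the `proc_t` table, `scheduler_switch` and the round-robin policy are hypothetical. The C part is plain C and can be tested on a host; the stub is shown as a comment because it only builds for ARM.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Target-side stub (GCC syntax), shown as a comment:
 *
 *   __attribute__((naked)) void PendSV_Handler(void)
 *   {
 *       __asm volatile(
 *           "mrs   r0, psp            \n"  // r0 = interrupted process stack
 *           "stmdb r0!, {r4-r11, lr}  \n"  // save callee-saved regs + EXC_RETURN
 *           "bl    scheduler_switch   \n"  // C picks next process, returns its SP
 *           "ldmia r0!, {r4-r11, lr}  \n"  // restore the next process's regs
 *           "msr   psp, r0            \n"
 *           "bx    lr                 \n"); // exception return
 *   }
 */

#define MAX_PROCS 4

typedef struct {
    uint32_t *saved_sp;  /* PSP after r4-r11 and EXC_RETURN were stacked */
    int ready;           /* nonzero if the process may run */
} proc_t;

proc_t procs[MAX_PROCS];
size_t current_proc = 0;

/* Called from the stub with the old process's stack pointer. Saves it,
 * picks the next ready process round-robin, and returns the stack pointer
 * the stub should restore from. */
uint32_t *scheduler_switch(uint32_t *old_sp)
{
    procs[current_proc].saved_sp = old_sp;

    size_t next = current_proc;
    for (size_t i = 1; i <= MAX_PROCS; i++) {
        size_t candidate = (current_proc + i) % MAX_PROCS;
        if (procs[candidate].ready) {
            next = candidate;
            break;
        }
    }

    /* Other global bookkeeping can be updated here in plain C. */
    current_proc = next;
    return procs[current_proc].saved_sp;
}
```

Because the stub saves r4-r11 and EXC_RETURN on the process stack itself, each new process's stack has to be pre-filled with a fake hardware frame plus those nine words before it is first scheduled.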

MPLAB/XC8 can't jump in ASM?

I have a project for the PIC18F25K50 of mixed C and Assembly; most of what I want to do I can easily manage (and must, for efficiency) in Assembly, but for some parts where I care more about ease of development I use C. I actually have a couple of these projects, and I keep encountering the same issue: I can't use ASM to jump to a label. Every single jump instruction (CALL, GOTO, BNC, and so on) fails when given a label, setting PC to some random but consistent value where there are no instructions and causing the program to hang. Using an address works fine: BC $+4 skips the next line.
An example of what does not work is this:
#asm
_waitUS:
GLOBAL _waitUS
waitLoop:
//12 cycles = 1 microsecond:
NOP
NOP
NOP
NOP
NOP
NOP
NOP
NOP
NOP
DECFSZ WREG, F, ACCESS
GOTO waitLoop
RETURN
#endasm
void main() {
//DEBUG:
waitUS(6);
}
Now, this may not work overall, and I am begging you to focus on the issue of jumping: this is still a prototype because I can't even get the function called. The program does compile without issues.
As soon as waitUS(6) is called, the PC jumps from - in my case - 0x7C96 to 0x52. Swapping the C call out for MOVLW 6; CALL _waitUS breaks in exactly the same way.
If I strictly use C for calling/jumping (as I had to in the previous project), it works fine, and figures out where it's going.
I've been searching for an answer to this for a few weeks now and still haven't seen anyone else with this problem, even though every project I make (including plain text in Notepad, compiled via the command line) has exactly the same issue. What the heck is up with this?
Edit: Having discovered the Program Memory view, I got a better idea of what it's doing. The compiler does know where the functions are, and it is trying to jump to the right location. Apparently, CALL just doesn't know where it's going.
Example:
Address 0x7C92 contains CALL 0x2044, 0. That is precisely what it ought to, that is where the desired function starts. However, upon running this instruction, PC is altered to 0x205E, missing half of the function.
Attempting to be clever, I added several NOPs at the start of the function after its label, lining the real code up with 0x205E. Unfortunately, any change seems to alter where the unpredictable jump lands, and it then landed at 0x2086 instead.
Incidentally, when it starts running at random places, it will often run across a GOTO, and that GOTO jumps to the specified location as intended. This only works within the same function: using GOTO instead of CALL ends up at the same incorrect location, despite what the compiled code specifies.
The PDF document at http://ww1.microchip.com/downloads/en/DeviceDoc/33014K.pdf has many examples of how to code for the PIC18.
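When the disassembly and the observed PC disagree, it can help to decode the raw words from the Program Memory view by hand. A small host-side sketch, assuming the standard PIC18 CALL encoding from the instruction set summary (two words, with a 20-bit word address); `pic18_decode_call` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode a PIC18 CALL instruction from its two 16-bit program-memory words.
 *   word 1: 1110 110s kkkk kkkk   (k7..k0, s = FAST bit)
 *   word 2: 1111 kkkk kkkk kkkk   (k19..k8)
 * k is a word address, so the byte address loaded into the PC is k * 2. */
bool pic18_decode_call(uint16_t word1, uint16_t word2,
                       uint32_t *target_byte_addr, int *fast)
{
    if ((word1 & 0xFE00u) != 0xEC00u || (word2 & 0xF000u) != 0xF000u) {
        return false;  /* not a CALL */
    }
    uint32_t k = ((uint32_t)(word2 & 0x0FFFu) << 8) | (word1 & 0x00FFu);
    *target_byte_addr = k << 1;
    *fast = (int)((word1 >> 8) & 1u);
    return true;
}
```

For the `CALL 0x2044, 0` above, the expected raw words are 0xEC22 followed by 0xF010; comparing them with what the Program Memory view shows at 0x7C92 tells whether the instruction is encoded the way the disassembler claims.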
Here is one such example:
RST CODE 0x0 ;The code section named RST
;is placed at program memory
;location 0x0. The next two
;instructions are placed in
;code section RST.
pagesel start ;Jumps to the location labelled
goto start ;’start’.
PGM CODE ;This is the beginning of the
;code section named PGM. It is
;a relocatable code section
;since no absolute address is
;given along with directive CODE.
start
movlw D'10'
movwf delay_value
xorlw 0x80
call delay
goto start
end

ARM assembly Jump to address

I need, in pure C, after changing the page permissions, to replace the function's code with a jump instruction and another function's address, so that another function is used instead of the current one at runtime; this is how I implement mocking.
It works fine on x86, but on ARM I ran into some issues and don't know how to solve them. Could you help me?
What is the jump instruction on ARM, and how do I write it over the current function's address using memcpy?
I think maybe the key element is 16hex ARM jump instruction
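For reference, in 32-bit ARM (A32) state an unconditional jump is the `B` instruction, whose offset is relative to the instruction's address plus 8. A minimal sketch that computes the word to write over the start of a function, assuming the patched code runs in ARM state; Thumb code has different encodings and AArch64 is different again. `arm_encode_b` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode an unconditional A32 branch "B target" placed at address pc.
 * The CPU reads PC as the instruction address + 8, and the 24-bit signed
 * immediate counts words, giving a +/-32 MB range. */
bool arm_encode_b(uint32_t pc, uint32_t target, uint32_t *insn)
{
    int64_t offset = (int64_t)target - ((int64_t)pc + 8);

    if ((offset & 3) != 0) {
        return false;  /* ARM-state targets must be word aligned */
    }
    if (offset < -(INT64_C(1) << 25) || offset > (INT64_C(1) << 25) - 4) {
        return false;  /* does not fit in the signed 24-bit word offset */
    }
    /* 0xEA000000: condition AL (0xE), branch opcode 101, L = 0 (no link). */
    *insn = 0xEA000000u | (((uint32_t)offset >> 2) & 0x00FFFFFFu);
    return true;
}
```

After writing the word, the instruction cache has to be synchronized with the data cache, as the article quoted below explains; with GCC, `__builtin___clear_cache` does this for an address range. Because `B` only reaches +/-32 MB, the mock must be within that distance of the patched function.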
From blog post titled Caches and Self-Modifying Code on arm's community page:
Cached ARM architectures have a separate cache for data and
instruction accesses; these are called the D-cache and the I-cache,
respectively. ... with two interfaces to the CPU,
the core can load an instruction and some data at the same time.
... because the D-cache and I-cache are not coherent, the
newly-written instructions might be masked by the existing contents of
the I-cache, causing the processor to execute old (or possibly
invalid) instructions.
I believe the rest of the article would help you dig deeper; however, I wonder why you are not using function pointers? They would be much easier to build on.
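A sketch of the function-pointer approach suggested above: callers go through a pointer that a test can swap at runtime, so no code patching or cache maintenance is needed. The names `read_sensor_real`, `read_sensor_mock`, `set_read_sensor` and `average_of_two` are hypothetical.

```c
/* Hypothetical function under test and its mock. */
static int read_sensor_real(int channel) { return 100 + channel; /* stand-in for hardware access */ }
static int read_sensor_mock(int channel) { (void)channel; return 42; }

typedef int (*read_sensor_fn)(int channel);

/* Callers use this pointer instead of calling read_sensor_real directly. */
static read_sensor_fn read_sensor = read_sensor_real;

/* Swap the implementation at runtime; returns the previous one so it can be restored. */
static read_sensor_fn set_read_sensor(read_sensor_fn impl)
{
    read_sensor_fn previous = read_sensor;
    read_sensor = impl;
    return previous;
}

/* Code under test: always calls through the pointer. */
static int average_of_two(void)
{
    return (read_sensor(0) + read_sensor(1)) / 2;
}
```

The cost is one indirect call per use, and the code under test has to call through the pointer rather than the function directly.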

AT91 Bootstrap + Bare Metal Application

I am currently trying to understand how AT91 and a bare metal application can work together. I'll try to describe what I have:
IAR as development environment
A simple application which I can download via debugger to SRAM and which toggles some LEDs (working!)
Using SAM-BA I can write this application to SRAM and it will start correctly (LEDs are toggling)
My hardware platform is the ATSAMA5D3x-EK
Now I would like this application to first run the AT91 bootstrap to initialize all the low-level hardware (like the DDR RAM), then jump to my application and run it. I have not been able to do that successfully yet. I am able to start the pre-built U-Boot binary, though, so I assume it's not the copying or the jump that is failing but that my application is set up incorrectly.
As far as I understand, if I jump to an application (I assume this is some sort of "LDR pc, appstart_address") the operation at address appstart_address gets executed.
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt vectors, whereas the first instruction is usually some sort of "LDR pc, =main". Are these required if my application is copied to RAM and executed from there? I somehow have the feeling that after copying my application to RAM, the address pointers do not match anymore (although they should be relative - is that correct at all?)
So my questions basically boil down to:
What happens after AT91 has initialized the hardware and jumps to my application
Do I need to setup my application in some specific way? Do I need to tell the linker or any other component that it will be relocated to some other memory location (at91 bootstrap copies it to 0x2600 0000 whereas 0x2000 0000 is the start address of DDR).
Does anyone know of a good tutorial which explains exactly this step (the jump from AT91 bootstrap to my application)?
One more question which I probably can answer myself:
Is it safe to assume that I will not need to execute the instructions in board_startup.s at the beginning of my application, which enable the floating-point unit, set up the sys stack pointer and so on? I would say that the hardware itself has already been set up by AT91 Bootstrap and therefore there is no need for such setup.
After thinking about a few things it comes down to this:
Does it make sense to tell the linker that it should link main to address 0x0 (because this is where bootstrap will jump to) - how would I do that?
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt
vectors, whereas the first instruction is usually some sort of "LDR
pc, =main". Are these required if my application is copied to RAM and
executed from there? I somehow have the feeling that after copying my
application to RAM, the address pointers do not match anymore
(although they should be relative - is that correct at all?)
Yes, the first 8 words are exception entry points. One of them is reserved, so there are 7 real ones.
The reset vector should not go straight to main (C code): at that point you have not set up the stack or anything else needed to call C code. Also, the reset handler is often close enough to use a branch (b) instead of an ldr pc, but since you only have one word (one instruction) to get out of the exception table, it has to be either a branch or an ldr pc, something.
If your binary is position dependent, you build it for the position where it will run; you can then place it in non-volatile storage, copy it and run it there, and there is no problem with that. If you build it for its non-volatile address but run it at a different address, and it is not position independent, then you are right: it simply won't work.
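A common way to leave each one-word exception slot is a PC-relative load from an address table placed right after the eight vectors. This is normally written in the assembly startup file; the C sketch below, with a hypothetical `build_vector_block` helper, only shows the resulting layout (for example, for code that builds a table in RAM). `0xE59FF018` is the A32 encoding of `ldr pc, [pc, #0x18]`. Because PC reads as the slot address plus 8, each slot loads from 0x20 bytes past itself, which is the matching handler address.

```c
#include <stdint.h>

/* A32 encoding of "ldr pc, [pc, #0x18]". */
#define LDR_PC_TABLE_ENTRY 0xE59FF018u

/* Fill a 16-word block: 8 exception slots, each loading PC from the matching
 * entry of an 8-word address table that immediately follows them.
 * handlers[] order: reset, undefined instruction, SVC, prefetch abort,
 * data abort, reserved, IRQ, FIQ. */
void build_vector_block(uint32_t block[16], const uint32_t handlers[8])
{
    for (int i = 0; i < 8; i++) {
        block[i] = LDR_PC_TABLE_ENTRY;  /* slot i at byte offset 4*i */
        block[8 + i] = handlers[i];     /* read from 4*i + 8 + 0x18 = 0x20 + 4*i */
    }
}
```

The loads are PC-relative, so the eight slots work wherever the block sits, but the handler addresses stored in the table are absolute. That is why the code must still be linked for the address where it runs, or built position independent.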
What happens after AT91 has initialized the hardware and jumps to my
application
Your application runs.
Do I need to setup my application in some specific way? Do I need to
tell the linker or any other component that it will be relocated to
some other memory location (at91 bootstrap copies it to 0x2600 0000
whereas 0x2000 0000 is the start address of DDR).
Either build it position independent or link it for the address where it will run.
Does anyone know of a good tutorial which explains exactly this step
(the jump from at91 bootstrap to my application)?
I assume that when you say AT91 bootstrap (you need to use a more precise term; AT91 is a long-lived family of devices), you really mean either some Atmel part-specific code or IAR part-specific code. The answer to your question is in their examples or documentation. You need to show what you found (examples, etc.) before posting a question like that.
Is it safe to assume that I will not need to execute the instructions
in board_startup.s at the beginning of my application, which enable the
floating-point unit, set up the sys stack pointer and so on? I would
say that the hardware itself has already been set up by AT91 Bootstrap
and therefore there is no need for such setup.
If you are relying on someone else's code to, for example, set up DDR, then it is probably a safe bet that they set up the stack. The FPU is another story. But if that file name is specific to their project and is something they call or use, then, well, they called it or used it. Again, this is specific to this magic AT91 Bootstrap thing, which you have not shown that you looked at, looked through or read about. Please do some more research on the topic and show what you tried. For example, it should be quite trivial, after this bootstrap code has run, to read the registers that enable the FPU, or just use the FPU, and see what happens; that is an easy way to tell whether that code has been run. Alternatively, insert an infinite loop in that code and rebuild; if the code hangs at the infinite loop, then they are running it. (Be careful not to brick your board with such a move; in theory SAM-BA will let you reload.)
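The answer suggests reading the registers that enable the FPU right after the bootstrap hands over. A sketch, assuming an ARMv7-A core such as the Cortex-A5 on the SAMA5D3, where coprocessor access is granted in CPACR (cp10 and cp11 fields) and the unit is switched on by FPEXC.EN (bit 30). `vfp_enabled` is a hypothetical helper that interprets the raw values; the target-side reads are shown as a comment in GCC syntax.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * On the target (GCC syntax) the raw values could be read with:
 *   uint32_t cpacr; __asm volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr));
 *   uint32_t fpexc; __asm volatile("vmrs %0, fpexc" : "=r"(fpexc));
 * Read FPEXC only after CPACR grants cp10/cp11 access; otherwise VMRS
 * raises an Undefined Instruction exception.
 */

/* CPACR access field values: 0b00 no access, 0b01 privileged only, 0b11 full. */
static bool cp_access_enabled(uint32_t field)
{
    return field == 1u || field == 3u;
}

bool vfp_enabled(uint32_t cpacr, uint32_t fpexc)
{
    uint32_t cp10 = (cpacr >> 20) & 3u;
    uint32_t cp11 = (cpacr >> 22) & 3u;
    bool en = ((fpexc >> 30) & 1u) != 0u;  /* FPEXC.EN */
    return cp_access_enabled(cp10) && cp_access_enabled(cp11) && en;
}
```

If this reports true right after the bootstrap hands over, something that ran before the application already enabled the FPU.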
Does it make sense to tell the linker that it should link main to
address 0x0 (because this is where bootstrap will jump to) - how would
I do that?
The exception table for this processor is at a well-known location (possibly one of two, depending on strapping), and the exception handlers need to be in the right place for the processor to boot properly. Generally it is the linker that does the final arrangement of code, and how you tell the linker where to put things is linker specific. So the answer is in the linker documentation, and the project specifies this information somewhere (linker script, makefile, etc.); otherwise a default is used, either a global default or one that some variable or command-line option points one of the tools to. So the way to do it is to read the docs and do what they say.

call stack unwinding in ARM cortex m3

I would like to create a debugging tool which will help me debug my application better.
I'm working bare-metal (without an OS), using IAR Embedded Workbench on Atmel's SAM3.
I have a watchdog timer which calls a specific IRQ in case of a timeout (this will be replaced with a software reset in the release build).
In the IRQ handler, I want to print out (over UART) the stack trace of where exactly the watchdog timeout occurred.
I looked on the web and didn't find any implementation of that functionality.
Does anyone have an idea how to approach this kind of thing?
EDIT: OK, I managed to grab the return address from the stack, so I know exactly where the WDT timeout occurred.
Unwinding the whole stack is not as simple as it first appears, because each function pushes a different amount of local variables onto the stack.
The code I ended up with is this (for others who may find it useful):
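void WDT_IrqHandler( void )
{
    uint32_t * WDT_Address;
    Wdt *pWdt = WDT ;
    volatile uint32_t dummy ;

    /* The +16 was found empirically: it is the 6-word offset of the stacked
       PC within the hardware exception frame plus whatever this handler's
       prologue pushed, so it depends on the compiler and optimization level.
       It also assumes the interrupted code was running on MSP. */
    WDT_Address = (uint32_t *) __get_MSP() + 16 ;
    LogFatal ("Watchdog Timer timeout, the return address is %#X", *WDT_Address);
    /* Clear status bit to acknowledge interrupt */
    dummy = pWdt->WDT_SR ;
}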
ARM defines a pair of sections, .ARM.exidx and .ARM.extab, that contain enough information to unwind the stack without debug symbols. These sections exist for exception handling, but you can use them to perform a backtrace as well. Add -funwind-tables to force GCC to include these sections.
To do this with ARM, you will need to tell your compiler to generate stack frames. For instance with gcc, check the option -mapcs-frame. It may not be the one you need, but this will be a start.
If you do not have this, it will be nearly impossible to "unroll" the stack, because you will need for each function the exact stack usage depending on parameters and local variables.
If you are looking for some example code, you can check dump_stack() in the Linux kernel sources and find the related code executed for ARM.
It should be pretty straightforward to follow execution, just not programmatically in your ISR...
We know from the ARM ARM (the ARMv7-M Architecture Reference Manual) that on exception entry a Cortex-M3 pushes xPSR, the return address, LR (R14), R12, R3, R2, R1 and R0 onto the stack, loads LR with a special EXC_RETURN value so it can detect a return from the interrupt, and then calls the entry point listed in the vector table. If you implement your ISR in asm to control the stack, you can have a simple loop that disables the interrupt source (turns off the WDT, or whatever; this is going to take some time) and then goes into a loop to dump a portion of the stack.
From that dump you will see the LR/return address and the function/instruction that was interrupted. From a disassembly of your program you can then see what the compiler has placed on the stack for each function, subtract that off at each stage, and go as far back as you like (or as far back as you have printed the stack contents).
You could also make a copy of the stack in ram and dissect it later rather than doing such things in an isr (the copy still takes too much time but is less intrusive than waiting on the uart).
If all you are after is the address of the instruction that was interrupted, that is the most trivial task, just read that from the stack, it will be at a known place, and print it out.
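Rather than a fixed offset from MSP (which depends on what the compiler pushed in the handler prologue), a commonly used Cortex-M pattern is a small assembly stub that passes the address of the exception frame to C. A host-testable sketch of the C side, with the stub shown as a comment in GCC syntax (IAR's inline-assembler syntax differs); `WDT_HandlerC`, `frame_on_psp` and `interrupted_pc` are hypothetical names, and `LogFatal` is the logging function from the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Frame stacked by a Cortex-M3 on exception entry, lowest address first. */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3, FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR };

/* EXC_RETURN bit 2 tells which stack the frame was pushed to: 1 = PSP, 0 = MSP. */
bool frame_on_psp(uint32_t exc_return)
{
    return (exc_return & 4u) != 0u;
}

/* Address of the instruction that was interrupted, read from a stacked frame. */
uint32_t interrupted_pc(const uint32_t *frame)
{
    return frame[FRAME_PC];
}

/*
 * Target-side entry stub (GCC syntax):
 *
 *   __attribute__((naked)) void WDT_IrqHandler(void)
 *   {
 *       __asm volatile(
 *           "tst   lr, #4        \n"   // EXC_RETURN bit 2: which stack?
 *           "ite   eq            \n"
 *           "mrseq r0, msp       \n"
 *           "mrsne r0, psp       \n"
 *           "b     WDT_HandlerC  \n");  // r0 = frame address, first C argument
 *   }
 *
 *   void WDT_HandlerC(uint32_t *frame)
 *   {
 *       LogFatal("Watchdog timeout at %#lx", (unsigned long)interrupted_pc(frame));
 *   }
 */
```

Because the stub branches rather than calls, LR still holds EXC_RETURN when `WDT_HandlerC` returns, so the exception returns normally.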
Did I hear my name? :)
You will probably need a tiny bit of inline assembly. Just figure out the format of the stack frames and which register holds the ordinary[1] stack pointer, and transfer the relevant values into C variables from which you can format strings for output to the UART.
It shouldn't be too tricky, but of course (being rather low-level) you need to pay attention to the details.
[1] As in "non-exception"; not sure if the ARM has different stacks for ordinary code and exceptions, actually.
Your watchdog timer can fire at any point, even when the stack does not contain enough information to unwind (e.g. stack space has been allocated for register spill, but the registers not copied yet).
For properly optimized code, you need debug info, period. All you can do from a watchdog timer is a register and stack dump in a format that is machine readable enough to allow conversion into a core dump for gdb.
