Bootloader. ARM CORTEX M0+ relocating Interrupt Table assembly ERROR

I am developing my own bootloader for an Atmel SAM R21.
The idea is to run the bootloader first, so that it can decide whether to perform an update or jump straight to the application. The main problem is that the interrupt vector table is located at address 0x0000_0000, so it has to be relocated to sit right before the application code. If the bootloader is given 8 KB in the linker file, with the BOOTPROT fuse set to match (this fuse is supposed to protect the selected amount of memory), the application's vector table should start at address 0x0000_2000.
To relocate the vector table I intend to use the VTOR register, which holds an offset applied to the original table address (0x0000_0000).
The assembly code is the following:
asm(" LDR R0,=0xE000ED08 "); //VTOR ADDRESS
asm("LDR R1,=0x00002000"); //OFFSET
asm(" STR R1, [R0]");
asm(" LDR R0,[R1] ");
asm(" MOV SP, R0");
asm(" LDR R0,[R1, #4]");
asm(" BX R0");
The LDR instruction gives me the following error:
Error[Og006]: Syntax error in inline assembly: "Error[401]: Operand syntax error"
What am I doing wrong? Maybe I am trying to use an ARM instruction instead of a Thumb one?
I would greatly appreciate any advice.
I also have a doubt: once the interrupt vector table is relocated, do I need to take the initial MSP value into account as well? That is, if the table starts at address 0x0000_2000 after being relocated, I should add 4 bytes * the number of vector entries to find the initial application address, shouldn't I? If someone knows something about this, it would be nice to hear. I know I am close (or I think so), but I need to clarify those points.
Edited 27/06/16 at 13:04.
This instruction works: LDR R0,[R1]
So I guess the problem is related to loading the 32-bit address into the register, but I don't understand why the assembler complains about it.
SOLUTION:
As an answer to my question, someone posted that not all assembler pseudo-instructions can be used in inline assembly, so I needed to create a separate assembler file, my_file.s
This file should define a function that can be called from outside, something like this:
#define _PORT_ASM_ARM_SRC
#define __ASSEMBLY__
;/****************************************************************************
;** **
;** ASSEMBLY FUNCTIONS **
;** **
;****************************************************************************/
NAME start_app
RSEG CODE:CODE(2)
THUMB
PUBLIC jump_to_app
;/***************************************************************************/
;/***************************************************************************/
;/* jump_to_app()
; * Jump to application function.
; */
jump_to_app:
LDR R0,=0xE000ED08 ; Set R0 to VTOR address
LDR R1,=0x00010000 ; User’s flash memory based address
STR R1, [R0] ; Define beginning of user’s flash memory as vector table
LDR R0,[R1] ; Load initial MSP value
MOV SP, R0 ; Set SP value (assume MSP is selected)
LDR R0,[R1, #4] ; Load reset vector
BX R0 ; Branch to reset handler in user’s flash
END
After doing this, the function prototype should be added to a .h file of your project, like any normal function, using something like this:
void jump_to_app(void);
Best regards,
Iván.

There is nothing syntactically wrong with the assembly code above. If you put the assembly code into an asm file and assemble it, it will build (as to whether it does what you intended I have not checked).
For some reason the inline assembler does not like LDR Rd, =expr.
See the following quote from the IAR Embedded Workbench Help:
The pseudo-instruction LDR Rd, =expr is not available from inline assembler
Also from ARM:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472j/chr1359124248868.html
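On the second part of the question (where the application starts): in a Cortex-M vector table, word 0 is the initial MSP value and word 1 is the reset vector, so the entry address is read from the table rather than computed by counting entries. A minimal C sketch of that lookup follows. The names app_entry_t and read_app_entry are hypothetical, and the erased-flash check is an assumed sanity heuristic, not something taken from the posts above.

```c
#include <stdbool.h>
#include <stdint.h>

/* First two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_msp;   /* word 0: initial main stack pointer */
    uint32_t reset_handler; /* word 1: address of the reset handler */
} app_entry_t;

/*
 * Read the stack pointer and reset vector from a vector table image.
 * Returns false if the image looks unprogrammed or if the reset vector
 * does not have the Thumb bit (bit 0) set.
 */
bool read_app_entry(const uint32_t *vector_table, app_entry_t *out)
{
    out->initial_msp = vector_table[0];
    out->reset_handler = vector_table[1];

    if (out->reset_handler == 0xFFFFFFFFu) {
        return false; /* assumed heuristic: erased flash reads as all ones */
    }
    return (out->reset_handler & 1u) != 0u;
}
```

On the target, vector_table would be (const uint32_t *)0x00002000 for the layout described in the question; the two values returned are exactly what the assembly routine loads with LDR R0,[R1] and LDR R0,[R1, #4].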

Related

Recover from Hard Fault on Cortex M0+

Until now I had a Hard fault handler in C that I defined in the vector table:
.sect ".intvecs"
.word _top_of_main_stack
.word _c_int00
.word NMI
.word Hard_Fault
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
....
....
....
One of our tests triggers a hard fault (on purpose) by writing to a non-existent address. Once the test is done, the handler returns to the calling function and the Cortex recovers from the fault. It is worth mentioning that the handler does not take any arguments.
Now I'm at the stage of writing a real handler.
I created a struct for the stack frame so that we can print PC, LR, and xPSR in case of a fault:
typedef struct
{
int R0 ;
int R1 ;
int R2 ;
int R3 ;
int R12 ;
int LR ;
int ReturnAddress ;
int xPSR ;
} InterruptStackFrame_t ;
My hard fault handler in C is defined:
void Hard_Fault(InterruptStackFrame_t* p_stack_frame)
{
// Write to external memory that I can read from outside
/* prints a message containing information about stack frame:
* p_stack_frame->LR, p_stack_frame->ReturnAddress (PC), p_stack_frame->xPSR,
* (uint32_t)p_stack_frame (SP)
*/
}
I created an assembly function:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP ; store pointer to stack frame
BL Hard_Fault ; go to C function handler
POP {R0-R7} ; pop out all stack frame
MOV PC, R5 ; jump to LR that was in the stack frame (the calling function before the fault)
.endasmfunc
This is a good time to mention that I don't have an OS, so I do not have to check bit[2] of LR: I know for certain that I use MSP and not PSP.
The program compiles and runs properly, and I used JTAG to verify that all registers are restored to the expected values.
When the last instruction (MOV PC, R5) executes, the PC returns to the correct address, but at some point the debugger indicates that the M0 is locked up in a hard fault and cannot recover.
I do not understand the difference between using a C function as the handler and using an assembly function that calls a C function.
Does anyone know what the problem is?
Eventually I will use an assert function that halts the processor, but I want that to be optional and under my control.

To explain "old_timer"'s comment:
When an exception or interrupt handler is entered on the Cortex, the LR register holds a special value.
Normally you return from the exception handler by simply jumping to that value (that is, by writing it to the PC register).
The Cortex CPU will then automatically pop all the stacked registers and reset the interrupt logic.
If you jump directly to the PC stored on the stack instead, you destroy some registers and the interrupt logic is not reset.
Therefore this is not a good idea.
Instead I'd do something like this:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP
B Hard_Fault
EDIT
Using the B instruction may not work because the "distance" allowed for the B instruction is more limited than for the BL instruction.
However there are two possibilities you could use (unfortunately I'm not sure if these will definitely work).
The first one will return to the address that had been passed in the LR register when entering the assembler handler:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP
PUSH {LR}
BL Hard_Fault
POP {PC}
The second one will indirectly do the jump:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP
LDR R1, =Hard_Fault
MOV PC, R1
EDIT 2
You cannot use LR because it holds EXC_RETURN value. ... You have to read the LR from stack and you must clean the stack from the stack frame, because the interrupted program doesn't know that a frame was stored.
According to the Cortex M3 manual you must exit from an exception handler by writing one of the three EXC_RETURN values to the PC register.
If you simply jump to the LR value stored in the stack frame, you remain in the exception handler!
If anything then goes wrong in the program, the CPU will assume that a fault happened inside the exception handler, and it locks up.
I assume that the Cortex M0 works the same way as the M3 in this point.
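For reference, on ARMv6-M (Cortex-M0/M0+) and on ARMv7-M without a floating-point extension, the three EXC_RETURN values are 0xFFFFFFF1, 0xFFFFFFF9 and 0xFFFFFFFD. The sketch below decodes them; the type and function names are hypothetical and only illustrate what each value means.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool valid;          /* true if the value is one of the three codes */
    bool to_thread_mode; /* false: return to Handler mode */
    bool uses_psp;       /* false: the frame is unstacked from MSP */
} exc_return_info_t;

/* Decode the value found in LR on entry to an exception handler. */
exc_return_info_t decode_exc_return(uint32_t lr_value)
{
    exc_return_info_t info = { false, false, false };

    switch (lr_value) {
    case 0xFFFFFFF1u: /* return to Handler mode, unstack from MSP */
        info.valid = true;
        break;
    case 0xFFFFFFF9u: /* return to Thread mode, unstack from MSP */
        info.valid = true;
        info.to_thread_mode = true;
        break;
    case 0xFFFFFFFDu: /* return to Thread mode, unstack from PSP */
        info.valid = true;
        info.to_thread_mode = true;
        info.uses_psp = true;
        break;
    default:          /* an ordinary return address, not EXC_RETURN */
        break;
    }
    return info;
}
```

In the bare-metal setup from the question (no OS, MSP only), a fault raised from normal code would be expected to enter the handler with LR equal to 0xFFFFFFF9.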
If you want to modify some CPU register during the exception handler you can modify the stack frame. The CPU will automatically pop all registers from the stack frame when you write the EXC_RETURN value to the PC register.
If you want to modify one of the registers not present in the stack frame (such as R5) you can directly modify it in the exception handler.
And this shows another problem of your interrupt handler:
The instruction POP {R0-R7} will set registers R4 to R7 to values that do not match the program that has been interrupted. R12 will also be destroyed depending on the C code. This means that in the program being interrupted these four registers suddenly change while the program is not prepared for that!
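Modifying the stack frame, as described above, is also one way to resume after a deliberate fault: advance the stacked return address past the faulting instruction and then return normally through EXC_RETURN. The sketch below assumes that the stacked return address points at the faulting instruction and that the caller has read that instruction's first halfword; the names are hypothetical. The size rule (a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction) is the standard Thumb encoding rule.

```c
#include <stdint.h>

/* Registers stacked by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, return_address, xpsr;
} stack_frame_t;

/* Size in bytes of the Thumb instruction that starts with this halfword. */
uint32_t thumb_insn_size(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Make the exception return resume at the instruction after the fault. */
void skip_faulting_insn(stack_frame_t *frame, uint16_t first_halfword)
{
    frame->return_address += thumb_insn_size(first_halfword);
}
```

The C handler would call skip_faulting_insn() on the frame pointer it receives in R0 and then return, leaving the wrapper to exit by writing the saved EXC_RETURN value to PC.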

ARM M3: Using 'extra' space in GPIO peripheral memory map? Can you do this?

I'm trying to understand someone's code, and it reads from an address in the GPIO region (0x4002 0000 - 0x4002 03FF) that is higher than the GPIO registers (they only go up to 0x24).
Can you use all the extra space above 0x4002 0024 and below 0x4002 03FF? What would happen if this space is read from?
EDIT:
I totally forgot I could just post the code. I bolded the line that causes me headaches:
R0 = 0x15
PUSH {R3,LR} ;
ADD.W R0, R0, R0,LSL#1 ;
MOV GPIO_Port_A_Address, #0x40020000
LSLS R0, R0, #2 ;
ADDS R2, GPIO_Port_A_Address, R0 ;
LDRB R2, [R2,#4] ;
MOVS R1, #1 ;
LSL.W R1, R1, R2 ;
LDR R0, [GPIO_Port_A_Address,R0] ;
UXTH R1, R1 ;
BL sub_8001ED8 ;
MOVS R0, #0 ;
POP {R3,PC} ;

To start with, there are other GPIO ports at every multiple of 0x400 from 0x40020000 to 0x400223FF, and beyond that there are the CRC peripheral, the RCC, and the flash controller. The relevant memory map is on page 50 of RM0033 (Rev 3, an old version, so the page number is probably wrong).
0x40023C00 - 0x40023FFF Flash interface register
0x40023800 - 0x40023BFF RCC
0x40023000 - 0x400233FF CRC
0x40022000 - 0x400223FF GPIOI
0x40021C00 - 0x40021FFF GPIOH
0x40021800 - 0x40021BFF GPIOG
0x40021400 - 0x400217FF GPIOF
0x40021000 - 0x400213FF GPIOE
0x40020C00 - 0x40020FFF GPIOD
0x40020800 - 0x40020BFF GPIOC
0x40020400 - 0x400207FF GPIOB
0x40020000 - 0x400203FF GPIOA
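As a quick way to check which block an address falls into, the table above can be turned into a lookup. This is only a sketch: the region list is copied from the table, while the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t start;
    uint32_t end; /* inclusive */
    const char *name;
} region_t;

/* Address ranges as listed in the memory map above. */
static const region_t k_regions[] = {
    { 0x40020000u, 0x400203FFu, "GPIOA" },
    { 0x40020400u, 0x400207FFu, "GPIOB" },
    { 0x40020800u, 0x40020BFFu, "GPIOC" },
    { 0x40020C00u, 0x40020FFFu, "GPIOD" },
    { 0x40021000u, 0x400213FFu, "GPIOE" },
    { 0x40021400u, 0x400217FFu, "GPIOF" },
    { 0x40021800u, 0x40021BFFu, "GPIOG" },
    { 0x40021C00u, 0x40021FFFu, "GPIOH" },
    { 0x40022000u, 0x400223FFu, "GPIOI" },
    { 0x40023000u, 0x400233FFu, "CRC" },
    { 0x40023800u, 0x40023BFFu, "RCC" },
    { 0x40023C00u, 0x40023FFFu, "Flash interface register" },
};

/* Return the name of the block containing addr, or NULL if none is listed. */
const char *peripheral_for_address(uint32_t addr)
{
    for (size_t i = 0; i < sizeof k_regions / sizeof k_regions[0]; ++i) {
        if (addr >= k_regions[i].start && addr <= k_regions[i].end) {
            return k_regions[i].name;
        }
    }
    return NULL;
}
```

Note that the lookup only says which 1 KB block an address belongs to; it says nothing about whether a register is actually implemented at that offset.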
The code you have posted, as best I have been able to calculate, does access some unimplemented addresses (0x40020100, 0x400200FC), so I'm not sure what's going on there, or if I have miscalculated. In testing on an STM32F207, I can confirm that you can read and write to this without getting a fault, but the registers are unimplemented and always read as zero.
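The two addresses can be reproduced by following the arithmetic in the listing: ADD.W R0, R0, R0,LSL#1 multiplies the index by 3, LSLS R0, R0, #2 multiplies by 4, and the result is added to the port base. The sketch below assumes R0 = 0x15 on entry, as shown on the first line of the listing; the function names are hypothetical.

```c
#include <stdint.h>

#define GPIOA_BASE 0x40020000u

/* ADD.W R0, R0, R0,LSL#1 followed by LSLS R0, R0, #2: index * 3 * 4. */
uint32_t table_offset(uint32_t index)
{
    uint32_t r0 = index + (index << 1); /* index * 3 */
    return r0 << 2;                     /* * 4, i.e. 12 bytes per entry */
}

/* Address read by LDR R0, [GPIO_Port_A_Address, R0]. */
uint32_t word_load_address(uint32_t index)
{
    return GPIOA_BASE + table_offset(index);
}

/* Address read by LDRB R2, [R2, #4], where R2 = base + offset. */
uint32_t byte_load_address(uint32_t index)
{
    return GPIOA_BASE + table_offset(index) + 4u;
}
```

For index 0x15 the offset is 0xFC, which gives 0x400200FC for the word load and 0x40020100 for the byte load. The 12-byte stride suggests a table of 12-byte records indexed by R0, although that is only an inference from the arithmetic.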
It would be a really bad idea to use peripheral registers as general purpose memory. Not every bit will be R/W, not all addresses may be implemented, and that's not even getting into the fact that you'll be configuring hardware based on application data and not correct register values. The range you've specified includes the flash controller and RCC, both of which are vital to the operation of the microcontroller.
If you are out of memory, there are some memory spaces that you may be able to use as general purpose if they are not already used for another purpose. The STM32F2's have a 4 kB backup SRAM that can be used, though there is some setup required to make it R/W. The USB peripheral(s) also has some RAM built in for endpoint buffers. If you aren't using USB, you could abuse some of this memory, and you could configure the USB peripheral so there aren't any bad side effects.

How the dynamic linker determines which routine to call on Linux?

I have a question about dynamic linking on Linux. Consider the following disassembly of an ARM binary.
8300 <printf@plt-0x40>:
....
8320: e28fc600 add ip, pc, #0, 12
8324: e28cca08 add ip, ip, #8, 20 ; 0x8000
8328: e5bcf344 ldr pc, [ip, #836]! ; 0x344
....
83fc <main>:
...
8424: ebffffbd bl 8320 <_init+0x2c>
The main function calls printf at 8424: bl 8320. 8320 is an address in the .plt shown above. The code in the .plt then calls the dynamic linker to invoke the printf routine. My question is: how can the dynamic linker tell that this is a call to printf?

TL;DR: The PLT calls the dynamic linker, passing:
the address of the GOT entry in IP (&PLTGOT[n+3]);
&PLTGOT[2] in LR.
Moreover, PLTGOT[1] identifies the shared object/executable.
The dynamic linker uses this to find the relocation entry (plt_relocation_table[n]) and thus the symbol (printf).
Explanation of the PLT entry code
This is explained (to some extent) in section A.3 of ELF for ARM:
8320: e28fc600 add ip, pc, #0, 12
8324: e28cca08 add ip, ip, #8, 20 ; 0x8000
8328: e5bcf344 ldr pc, [ip, #836]! ; 0x344
Which are explained by:
ADD ip, pc, #-8:PC_OFFSET_27_20:__PLTGOT(X)
; R_ARM_ALU_PC_G0_NC(__PLTGOT(X))
ADD ip, ip, #-4:PC_OFFSET_19_12: __PLTGOT(X)
;R_ARM_ALU_PC_G1_NC(__PLTGOT(X))
LDR pc, [ip, #0:PC_OFFSET_11_0:__PLTGOT(X)]!
; R_ARM_LDR_PC_G2(__PLTGOT(X))
Those instructions do two things:
they compute the address of the GOT entry as an offset from PC and store it in the IP register;
they jump to this GOT entry.
The spec notes that:
The write-back on the final LDR ensures that ip contains
the address of the PLTGOT entry. This is critical to
incremental dynamic linking.
The "write-back" is the "!" in the last instruction: it updates the IP register with the final offset (#836). This way, IP contains the address of the GOT entry at the end of the PLT entry.
The dynamic linker has the address of the GOT entry in IP:
it can find the shared-object or executable;
it can find the correct relocation entry.
This relocation entry references the symbol of target function (printf in your case):
Offset Info Type Sym. Value Sym. Name
0001066c 00000116 R_ARM_JUMP_SLOT 00000000 printf
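The relocation offset 0001066c can be checked against the three PLT instructions by redoing their arithmetic. The sketch below decodes the immediates from the instruction words shown in the disassembly; the function names are hypothetical, and the decoder assumes the stub has exactly this ADD/ADD/LDR shape with a positive LDR offset.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by amount bits. */
static uint32_t ror32(uint32_t value, unsigned amount)
{
    amount &= 31u;
    return amount == 0u ? value
                        : (value >> amount) | (value << (32u - amount));
}

/* ARM "modified immediate": imm8 rotated right by twice the 4-bit field. */
uint32_t arm_modified_immediate(uint32_t insn)
{
    uint32_t imm8 = insn & 0xFFu;
    unsigned rot = (unsigned)((insn >> 8) & 0xFu) * 2u;
    return ror32(imm8, rot);
}

/* Value left in ip by: add ip, pc, #a ; add ip, ip, #b ; ldr pc, [ip, #c]! */
uint32_t plt_got_entry_address(uint32_t stub_address, uint32_t add0,
                               uint32_t add1, uint32_t ldr)
{
    uint32_t ip = stub_address + 8u;    /* pc reads as insn address + 8 */
    ip += arm_modified_immediate(add0); /* #0, 12  -> 0 */
    ip += arm_modified_immediate(add1); /* #8, 20  -> 0x8000 */
    ip += ldr & 0xFFFu;                 /* 12-bit offset, assumed positive */
    return ip;
}
```

With the stub at 0x8320 and the words e28fc600, e28cca08 and e5bcf344, this gives 0x8328 + 0 + 0x8000 + 0x344 = 0x1066c, which is the Offset of the R_ARM_JUMP_SLOT entry for printf.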
The Base Platform ABI for the ARM architecture notes that:
When the platform supports lazy function binding (as ARM Linux does)
this ABI requires ip to address the corresponding
PLTGOT entry at the point where the PLT calls through it.
(The PLT is required to behave as if it ended with LDR pc, [ip]).
Finding the relocation entry from the GOT
How the relocation entry is found from the GOT address is not obvious. A binary search could be used, but it would not be convenient. The GNU ld.so does it like this (glibc/sysdeps/arm/dl-trampoline.S):
dl_runtime_resolve:
cfi_adjust_cfa_offset (4)
cfi_rel_offset (lr, 0)
# we get called with
# stack[0] contains the return address from this call
# ip contains &GOT[n+3] (pointer to function)
# lr points to &GOT[2]
# Save arguments. We save r4 to realign the stack.
push {r0-r4}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r0, 0)
cfi_rel_offset (r1, 4)
cfi_rel_offset (r2, 8)
cfi_rel_offset (r3, 12)
# get pointer to linker struct
ldr r0, [lr, #-4]
# prepare to call _dl_fixup()
# change &GOT[n+3] into 8*n NOTE: reloc are 8 bytes each
sub r1, ip, lr
sub r1, r1, #4
add r1, r1, r1
[...]
The address of GOT[2] is in LR. I guess this is done by .PLT0:
00015b84 :
15b84: e52de004 push {lr} ; (str lr, [sp, #-4]!)
15b88: e59fe004 ldr lr, [pc, #4] ; 15b94
15b8c: e08fe00e add lr, pc, lr
15b90: e5bef008 ldr pc, [lr, #8]!
15b94: 0012f46c andseq pc, r2, ip, ror #8
From those two GOT addresses, the dynamic linker can find the GOT offset and the offset in the PLT relocation table.
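The three instructions sub r1, ip, lr / sub r1, r1, #4 / add r1, r1, r1 in the trampoline can be written out in C to make the arithmetic explicit. This is a sketch of the calculation only, with hypothetical function names; it relies on the facts stated in the quoted comments (ip = &GOT[n+3], lr = &GOT[2], 4-byte GOT entries, 8-byte relocation entries).

```c
#include <stdint.h>

/* Byte offset of the relocation entry in the PLT relocation table. */
uint32_t rel_plt_offset(uint32_t ip, uint32_t lr)
{
    uint32_t r1 = ip - lr; /* &GOT[n+3] - &GOT[2] = 4 * (n + 1) */
    r1 -= 4u;              /* 4 * n */
    r1 += r1;              /* 8 * n: each relocation entry is 8 bytes */
    return r1;
}

/* Index n of the PLT slot, derived from the same two pointers. */
uint32_t plt_slot_index(uint32_t ip, uint32_t lr)
{
    return rel_plt_offset(ip, lr) / 8u;
}
```

For example, if the GOT started at 0x10660 (an assumed value for illustration), lr would be 0x10668 and a call through the first slot would arrive with ip = 0x1066c, giving offset 0 and index 0.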
From &GOT[2], the dynamic linker can find the second entry of the PLTGOT (GOT[1]), which contains the address of the linker struct (a reference used by the dynamic linker to recognise this shared object/executable).
I don't know where this is specified: it does not seem to be part of the base ARM ABI spec.

The PLT relocation section (.rel.plt on ARM, .rela.plt on some other architectures) holds the entries that tell the dynamic linker which symbol, here printf, each PLT slot refers to.
See this link for details; it is very easy to digest: https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html. The article first explains how variables in shared libraries are accessed, and then functions.

The process of dynamic linking is described in great detail here.
TL;DR: at static link time, ld creates a set of tables in special sections such as .rel.dyn, .rel.plt, etc., which tell the runtime loader what to do at runtime.
You can examine these tables with nm -D, readelf -Wr, objdump -R, etc.
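When reading readelf -r output, the Info column packs two fields: the symbol table index in the upper 24 bits and the relocation type in the low 8 bits. The helpers below mirror the ELF32_R_SYM and ELF32_R_TYPE macros from elf.h; the lowercase function names are hypothetical stand-ins so that the block does not depend on that header.

```c
#include <stdint.h>

#define R_ARM_JUMP_SLOT 22u /* relocation type used for PLT slots on ARM */

/* Symbol table index: upper 24 bits of r_info (ELF32_R_SYM). */
uint32_t elf32_r_sym(uint32_t info)
{
    return info >> 8;
}

/* Relocation type: low 8 bits of r_info (ELF32_R_TYPE). */
uint32_t elf32_r_type(uint32_t info)
{
    return info & 0xFFu;
}
```

For the entry quoted earlier in this thread (Info 00000116), the type is 0x16 = 22, which is R_ARM_JUMP_SLOT, and the symbol index is 1, the dynamic symbol table entry for printf.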

ARM Cortex-M4: issues met when calling printf in assembly

I am trying to call printf from ARM Cortex-M4 assembly and have run into a problem. The goal is to dump the contents of R1. The code looks like this:
.data
.balign 4
output_string:
dcb "content in R1 is 0x%x\n", 0
....
.text
....
push {r0, r1}
mov r1, r0
ldr r0, =output_string
bl printf
pop {r0, r1}
The problem is that when the address of "output_string" is loaded into R0, the value has an extra 1 added. For example, if the symbol "output_string" has the value 0x2000, R0 gets the value 0x2001.
I suspect this has something to do with Thumb/ARM mode. But I have declared "output_string" in the data section, so why does the assembler still treat it as an instruction address?
Or is there a more formal way to call a function from assembly?

I think you should use:
ldr r0, =output_string
The = prefix is an assembler shorthand to make it load an arbitrary 32-bit constant. See this ARM Information Center page.

Passing arguments from asm to C in on ARM

I have read a lot of topics on this forum and found many answers on this subject. I managed to pass 5 arguments to a C function from my assembly code. To do this, I used the instructions below:
mov r0, #0
mov r1, #1
mov r2, #2
mov r3, #3
mov r4, #4
STR r4, [sp, #-4]!
BL displayRegistersValue
But today I'm trying to pass all the registers to a C function in order to save them in a C structure. I tried this instruction:
STMDB sp!, {registers that i want to save}
My C function :
displayRegistersValue(int registers[number_of_registers])
char printable = registers[0] + (int)'0'; // Convert in a printable character
print_uart0(&printable);
But the output is wrong. So, how can I access the registers in C code?

I'm pretty sure the ARM standard only allows R0-R3 to be used for passing arguments by value, so 4 at most. If you need more values, push them onto the stack and access them that way, like the compiler does. Or make a struct and pass its address.
OK, I double-checked and I was right; here is a link to the ARM calling conventions - down the page a bit.
To do what you want, pass the address of some memory location (an array) into your assembly routine. Once you have that address, probably in r0, you can store all your register values into that location with a store-multiple instruction, and that memory will be visible at the C level.
Beware, this probably isn't going to do what you think it will. Those values are allowed to change quite a bit under the calling convention linked above. If this is for debugging, you are better off using a debugger and watching the registers that way.
OK, you are still not understanding. Here:
{
int registerValues[14];
myAsmRoutine(registerValues);
print_uart0(& registerValues);
}
myAsmRoutine:
stmia r0!, {r1-r14}
bx lr
I skipped R0 and PC, but you get the idea. Also, you will need to do something a bit more complex to change the values into a printable format - sprintf or itoa or something like that.
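If sprintf is too heavy for the target, the conversion to a printable format can be done by hand. The sketch below formats one 32-bit register value as a fixed-width hex string; the function name u32_to_hex is hypothetical, and it is only an assumption that print_uart0 accepts a NUL-terminated string.

```c
#include <stdint.h>

/* Write value as "0x" plus 8 uppercase hex digits and a NUL (11 bytes). */
void u32_to_hex(uint32_t value, char out[11])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 8; ++i) {
        /* Most significant nibble first. */
        out[2 + i] = digits[(value >> (28 - 4 * i)) & 0xFu];
    }
    out[10] = '\0';
}
```

Calling u32_to_hex(registerValues[i], buf) in a loop and passing buf to the UART routine would then print each saved register on its own.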

displayRegistersValue(int registers[number_of_registers])
This is an array, not a structure, and it is passed as a pointer to something, not as a long list of items. The same goes for structures, by the way.
It is usually easiest to write a C function that does what you want to do in asm, see what the compiler produces, and go from there (use the ABI document to confirm, etc.).
#define NUMREGS 13
void displayRegistersValue(unsigned int registers[NUMREGS]);
void outer ( void )
{
unsigned int regs[NUMREGS];
displayRegistersValue(regs);
}
> arm-none-linux-gnueabi-gcc -O2 -c fun.c -o fun.o
> arm-none-linux-gnueabi-objdump -D fun.o
fun.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <outer>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e24dd03c sub sp, sp, #60 ; 0x3c
8: e28d0004 add r0, sp, #4
c: ebfffffe bl 0 <displayRegistersValue>
10: e28dd03c add sp, sp, #60 ; 0x3c
14: e49df004 pop {pc} ; (ldr pc, [sp], #4)
You will need to do something similar: save lr so you don't trash it with the branch-and-link, make room on the stack by moving the stack pointer, copy your registers to that memory (the stack), point r0 to the beginning of the memory/array you want to pass, then call the function (r0 being the first and only parameter you are passing to the function).
push {lr}
stmdb sp!,{r0-r12}
mov r0,sp
bl displayRegistersValue
add sp,sp,#52
pop {lr}

An array is passed as a pointer in a single register. If you want 5 registers then you need to have 5 parameters (int i1, int i2 etc.).

To quote from the ARM APCS document:
"The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls)."
So if you want to pass more than 4 values to a C function, you need to pass the rest of the values on the stack. A better idea would be to put the register values in a memory region that has been statically allocated and pass the address of the memory (pointer) to the C function. The pointer can be de-referenced by the function to get to the register values.