"Synchronous Abort" handler, esr 0x96000004 - u-boot

I am trying to boot an ELF file using u-boot on the orangepi-lite2 board, which is based on an allwinnertech-H6 (Cortex-A53 CPU).
The execution of my program is triggering an exception:
elr: 000000004a004d60 lr : 000000004a004dac (reloc)
elr: 000000007ff59d60 lr : 000000007ff59dac
x0 : 0000000000000221 x1 : 00000000400800c0
x2 : 0000000000000040 x3 : 000000000000003f
x4 : 0000000040080090 x5 : 000000007ff7435c
x6 : 000000007bf2e588 x7 : 0000000000000008
x8 : 0000000000000010 x9 : 0000000000000008
x10: 0000000000000044 x11: 000000000000000a
x12: 0000000000000000 x13: 0000000000000200
x14: 0000000040080000 x15: 0000000000000020
x16: 000000007ff59bf8 x17: 0000000000000000
x18: 000000007bf34df0 x19: 0000000040080000
x20: b900002098080381 x21: 0000000000000000
x22: 000000007bf64a40 x23: 0000000000000000
x24: 0000000000000002 x25: 0000000000000000
x26: 0000000000000000 x27: 0000000000000000
x28: 000000007bf65ec0 x29: 000000007bf2e590
Code: 17ffffd9 f9401674 8b140274 8b181a94 (f9400680)
Resetting CPU ...
Here are the files/commands I am using:
boot.txt:
···
setenv bootargs console=ttyS0,115200 root=/dev/mmcblk0p1 rootwait panic=10
load mmc 0:1 0x40080000 boot/app.elf
# bootm 0x42000000 - 0x43000000
bootelf 0x40080000 - 0x40280000`
start.S:
#define PACFG1 0x07022000
#define PADAT 0x07022010
.global _start
.align 3
_start:
main_loop :
bl led_on
bl delay
bl led_off
bl delay
b main_loop
led_on :
ldr w0, = 0x11111111
ldr x1, = PACFG1
str w0, [x1]
ldr w0, = 0xFFFFFFFF
ldr x1, = PADAT
str w0, [x1]
ret
led_off :
ldr w0, = 0x11111111
ldr x1, = PACFG1
str w0, [x1]
ldr w0, = 0x0
ldr x1, = PADAT
str w0, [x1]
ret
delay:
mov x2, #0x40000
mov x3, #0
delay_loop:
sub x2, x2, #1
cmp x2, x3
bne delay_loop
ret
gboot.lds:
OUTPUT_FORMAT("elf64-littleaarch64", "elf64-littleaarch64", "elf64-littleaarch64")
OUTPUT_ARCH(aarch64)
ENTRY(_start)
SECTIONS
{
. = 0x40080000;
. = ALIGN(8);
.text :
{
start.o(.text*)
*(.text*)
}
. = ALIGN(8);
.data :
{
*(.data*)
}
_end = .;
. = ALIGN(8);
.bss_start : {
KEEP(*(.__bss_start));
}
. = ALIGN(8);
.bss : {
*(.bss*)
. = ALIGN(8);
}
. = ALIGN(8);
.bss_end : {
KEEP(*(.__bss_end));
}
}
Makefile:
ARCH = arm64
CROSS_COMPILE = aarch64-none-linux-gnu
CC = $(CROSS_COMPILE)-gcc
LD = $(CROSS_COMPILE)-ld
OBJCOPY = $(CROSS_COMPILE)-objcopy
all : start.o
$(LD) -Tgboot.lds -o gboot.elf $^
$(OBJCOPY) -O binary gboot.elf gboot.bin
../tools/mksunxiboot gboot.bin gboot.bins
%.o : %.S
$(CC) -g -c $^
%.o : %.c
$(CC) -g -c $^
clean:
rm -rf gboot.elf gboot.bin *.o gboot.bins
I am just using the 'gboot.elf' file, which I renamed to 'app.elf'. Its path on the SD card is: boot/app.elf.
I need a god to save me.

For this kind of typical bare-metal program, attempting to load an ELF file looks a little bit overkill to me. I was able to test your program on an Orangepi Plus One (Allwinner H6), with a slightly different procedure, and it was able to light a couple of the on-board LED on without triggering any exceptions, even though the wiring of the LEDs may be different on my SBC.
I would add that mksunxiboot is 12 years old and was written for a Cortex-A7 A33 (32 bit) Soc. I would therefore stay away from this tool for now.
Just copy gboot.bin on your SD card, and use the following commands:
loadb mmc 0:1 0x40080000 boot/gboot.bin
go 0x40080000
Note that the gocommand is very convenient in your case because simple programs can just return to u-boot by executing ret - the value ix x0 will be returned to u-boot as the program exit code.
Example:
.global _start
.align 3
.text
_start:
mov x0, xzr
add x0, x0, #10
ret
Now execute the program:
=> go 0x40080000
## Starting application at 0x40080000 ...
## Application terminated, rc = 0xA
Application exited with 10d/0x0a as its exit status, as expected.
Please note that if you load successive programs without rebooting, you may have to flush the caches in order to the second program to be executed instead of the cached first one:
=> dcache flush
=> icache flush
=> go 0x40080000
## Starting application at 0x40080000 ...
## Application terminated, rc = 0xA
This caused me a load of headaches when I started experimenting with sample programs from u-boot.

Thanks for Frant's help, the led is flashing. according to your solution, I modify the code.
load mmc 0:1 0x40080000 boot/gboot.bin
go 0x40080000
another point, I modify the delay time:
delay:
mov x2, #0x40000
mov x3, #0
delay_loop:
delay:
mov x2, #0x10000000
mov x3, #0
delay_loop:
and i got something in this example.
the load address should be equal to entry point in linker script, if the app do not have an image header which added by mkimage
the bootelf command do not support aarch64 program.
thanks again.

Related

Can't run a no-op function in qemu

I am using xpack qemu arm which is a fork of qemu with support for STM32 boards.
I am trying to run a simple program to get myself started.
I have my linker script
ENTRY(Reset_Handler)
MEMORY
{
FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 0x08000000
RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 0x20000000
}
SECTIONS
{
. = ORIGIN(FLASH);
.text :
{
LONG(ORIGIN(RAM) + LENGTH(RAM)) /* set the SP initial value */
LONG(Reset_Handler) /* set the PC initial value */
*(.text)
}
}
my assembly file
.section .text
.global Reset_Handler
Reset_Handler:
BL main
BL .
and a c function, main
void main () {
return;
}
When I assemble, compile, and link, the generated memory contents are
00000000 <main-0x8>:
0: 40000000 .word 0x40000000
4: 00000020 .word 0x00000020
00000008 <main>:
void main () {
8: e52db004 push {fp} ; (str fp, [sp, #-4]!)
c: e28db000 add fp, sp, #0
return;
10: e1a00000 nop ; (mov r0, r0)
14: e24bd000 sub sp, fp, #0
18: e49db004 pop {fp} ; (ldr fp, [sp], #4)
1c: e12fff1e bx lr
00000020 <Reset_Handler>:
.section .text
.global Reset_Handler
Reset_Handler:
BL main
20: ebfffff8 bl 8 <main>
BL .
24: ebfffffe bl 24 <Reset_Handler+0x4>
I am using a STM32F407VG MCU, the docs state that
After this startup delay is over, the CPU fetches the top-of-stack value from address
0x0000 0000, then starts code execution from the boot memory starting from 0x0000 0004.
Thus, I store the initial value of the stack pointer 0x40000000 in memory location 0x00000000 and the initial value of the program counter in memory location 0x00000004
I start qemu like so
qemu-system-gnuarmeclipse -mcu STM32F407VG -machine STM32F4-Discovery -image myfile.elf -nographic --verbose --verbose -no-reboot -S
And I can see that the SP and PC registers (R13 and R15, respectively) are set to the expected values:
R00=00000000 R01=00000000 R02=00000000 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=40000000 R14=00000000 R15=00000020
PSR=40000153 -Z-- A svc32
FPSCR: 00000000
So, following the memory mapping output, the program should flow like so:
PC is set to 0x20, which runs BL 8 <main>
This branches to memory location 0x8, which is the start of the main function, it also saves the return address in LR
This function should perform a no-op, with pushing and popping FP to/from the stack
the function should return to the address of LR (which was previously saved)
The next instruction should loop forever (24: ebfffffe bl 24 <Reset_Handler+0x4>)
However, I run this, and I get the following error:
(qemu) Bad ram pointer 0x4
I am a little lost on what this error means. Am I missing something in my setup?
ORIGIN = 0x00000000
The memory is aliased to 0 by the hardware but the real address is not zero,
You linker script has to the use corrent FLASH address not boot time alias.
0x8000000
I would suggest to use stm provided linker scripts as you not exactly understand the documentation of the chip.

How to implement SVC handler on ARM926EJ-S?

I'm writing an amateur operating system for ARM-based devices and currently trying to make it working in QEMU's versatilepb (ARM926EJ-S).
The problem arrives when I try to implement syscalls to my kernel. The idea is pretty simple: to implement system calls via SVC (SWI) instruction. So applications work in user mode, and to call a kernel function, they do SVC <code> instruction, so ARM processor switches to supervisor mode and calls the appropriate SVC handler.
But the problem is that when I call __asm__("SVC #0x08");, the device just resets and calls RESET_HANDLER, so it looks like the emulator just reboots.
I spent a few hours already to figure out what is the problem, but still got no idea.
Here is the code of ivt.s (the initial code with handlers):
.global __RESET
__RESET:
B RESET_HANDLER /* Reset */
B . /* Undefined */
B SWI_HANDLER /* SWI */
B . /* Prefetch Abort */
B . /* Data Abort */
B . /* reserved */
B . /* IRQ */
B . /* FIQ */
RESET_HANDLER:
MSR CPSR_c, 0x13 /* Supervisor mode */
LDR SP, =stack_top
MSR CPSR_c, 0x10 /* User mode */
LDR SP, =usr_stack_top
BL usermode_function
B .
SWI_HANDLER:
PUSH {LR}
BL syscall
POP {LR}
MOVS PC, LR
This is how I make the syscall:
void usermode_function() {
__asm__("SVC #0x00"); // Make syscall
}
And syscall implementation:
void syscall() {
// NEVER CALLED
__asm__("PUSH {r0-r7}");
__asm__("POP {r0-r7}");
}
But the code under SWI_HANDLER even never invoked.
I really even don't know how to ask the question, since it looks like I'm missing some very basic information in my mind.
So what could be the problem? Which information I should provide to make you able to help me?
Here is also the linker script:
ENTRY(__RESET)
SECTIONS
{
. = 0x10000;
.ivt . : { ivt.o(.text) }
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss COMMON) }
. = ALIGN(8);
. = . + 0x1000; /* 4KB of stack memory */
stack_top = .;
. = . + 0x100;
usr_stack_top = .;
}
Many thanks to #Jester and #old_timer, the problem is solved.
The problem was not with code, but with linker script. I have put my vector table at 0x10000, as you can see in the linker script, but it should be placed at 0x0. So SVC was not handled properly because the handler was placed in a wrong place.
When I changed the base address in my ld script and tried to load the firmware as ELF, everything starts to work perfectly.
You solved it one way but I'll still write my answer.
Very bare bare metal example...
strap.s
.globl _start
_start:
b reset
b hang
b swi_handler
b hang
reset:
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstrings ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x20);
}
static void hexstring ( unsigned int d )
{
hexstrings(d);
uart_send(0x0D);
uart_send(0x0A);
}
int notmain ( void )
{
unsigned int ra;
hexstring(0x12345678);
hexstring(GETPC());
for(ra=0;ra<0x20;ra+=4)
{
hexstrings(ra);
hexstring(GET32(ra));
}
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
Build
arm-linux-gnueabi-as --warn --fatal-warnings -march=armv5t strap.s -o strap.o
arm-linux-gnueabi-gcc -c -Wall -O2 -nostdlib -nostartfiles -ffreestanding -march=armv5t notmain.c -o notmain.o
arm-linux-gnueabi-ld strap.o notmain.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list
arm-linux-gnueabi-objcopy notmain.elf -O binary notmain.bin
Execute
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.bin
Output
12345678
0001003C
00000000 E3A00000
00000004 E59F1004
00000008 E59F2004
0000000C E59FF004
00000010 00000183
00000014 00000100
00000018 00010000
0000001C 00000000
Examine, assemble disassemble
.word 0xE3A00000
.word 0xE59F1004
.word 0xE59F2004
.word 0xE59FF004
.word 0x00000183
.word 0x00000100
.word 0x00010000
.word 0x00000000
0: e3a00000 mov r0, #0
4: e59f1004 ldr r1, [pc, #4] ; 10 <.text+0x10>
8: e59f2004 ldr r2, [pc, #4] ; 14 <.text+0x14>
c: e59ff004 ldr pc, [pc, #4] ; 18 <.text+0x18>
10: 00000183 andeq r0, r0, r3, lsl #3
14: 00000100 andeq r0, r0, r0, lsl #2
18: 00010000 andeq r0, r1, r0
1c: 00000000 andeq r0, r0, r0
So you can see that they are basically launching a Linux kernel the ATAGS/dtb is in ram at 0x100 perhaps. And they jump to 0x10000. 0001003C being the pc shown by the program as loaded with that command line using the -O binary version was loaded at 0x10000 and executed there. If you were to have an swi event then you would execute starting with the ldr r2 instruction and land on the rest handler in your code.
(Note incidentally that qemu doesn't properly model uarts, at least so far as I have found so you don't have to initialize them you don't have to wait for the tx buffer to be empty you just jam bytes into the tx buffer and they come out).
If you run the elf without changing the linker script
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.elf
12345678
0001003C
00000000 00000000
00000004 00000000
00000008 00000000
0000000C 00000000
00000010 00000000
00000014 00000000
00000018 00000000
0000001C 00000000
Interesting it loads and runs at 0x10000 which is what it was linked for but doesn't bother to setup for coming out of reset at 0x00000000 and/or this is that linker issue that makes for bad elf files and it padded with zeros which is
1c: 00000000 andeq r0, r0, r0
So it could have executed from 0x00000000 to 0x10000 and run into our code.
If we change the linker script
ram : ORIGIN = 0x00000000, LENGTH = 32K
Run the elf not the bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.elf
12345678
0000003C
00000000 EA000002
00000004 EA000006
00000008 EA000006
0000000C EA000004
00000010 E321F013
00000014 E3A0D801
00000018 E321F010
0000001C E3A0DA09
as expected.
Now for the swi.
strap.s
.globl _start
_start:
b reset
b hang
b swi_handler
b hang
reset:
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
bl handler
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl do_swi
do_swi:
svc #0x08
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
void do_swi ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstring ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x0D);
uart_send(0x0A);
}
void handler ( void )
{
hexstring(0x11223344);
}
int notmain ( void )
{
hexstring(0x12345678);
do_swi();
hexstring(0x12345678);
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
Run the elf, output is
12345678
11223344
12345678
as desired. But you could have also done this
strap.s
.globl _start
_start:
ldr pc,reset_addr
ldr pc,hang_addr
ldr pc,swi_handler_addr
ldr pc,hang_addr
reset_addr: .word reset
hang_addr: .word hang
swi_handler_addr: .word swi_handler
reset:
mov r0,#0x10000
mov r1,#0x00000
ldmia r0!,{r2,r3,r4,r5}
stmia r1!,{r2,r3,r4,r5}
ldmia r0!,{r2,r3,r4,r5}
stmia r1!,{r2,r3,r4,r5}
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
bl handler
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl do_swi
do_swi:
svc #0x08
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
void do_swi ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstring ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x0D);
uart_send(0x0A);
}
void handler ( void )
{
hexstring(0x11223344);
}
int notmain ( void )
{
unsigned int ra;
hexstring(0x12345678);
for(ra=0x10000;ra<0x10020;ra+=4) hexstring(GET32(ra));
for(ra=0x00000;ra<0x00020;ra+=4) hexstring(GET32(ra));
do_swi();
hexstring(0x12345678);
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
And now both the elf and the binary image versions work. I let the toolchain do the work for me:
00010010 <reset_addr>:
10010: 0001001c
00010014 <hang_addr>:
10014: 00010048
00010018 <swi_handler_addr>:
10018: 0001004c
The ldr pc, is position independent. I copy the four entries plus the four (well three) addresses so that 0x00000 matches 0x10000 and now the exception table (it is not a vector table btw) works.
With newer arm processors you could instead set VTOR to 0x10000 and it would use the one built into the binary, no copying necessary. Or as you solved just build and run your program from 0x00000 and there you go. I wanted to show the alternatives as well as how to figure out (by cheating, you have to love uarts in qemu) what qemu is doing and where it is loading without having to use a debugger.

Keil stm32, using assembly, scatter file and c. How to export c code entry point to assembly?

In order to combine .c and assembly, I want to pass start address of my .c code, and program microcontroller to know that its program starts at that address. As I am writing my startup file in assembly, I need to pass .c code starting address to assembly, and then to write this address to the specific memory region of microcontroller ( so the microcontroller can start execution on this address after RESET)
Trying to create a project for stm32f103 in Keil with this structure:
Some .c file, for example main.c (for the main part of the program).
Startup file in assembly language. Which gets the adress of entry to the function written in some .c file, to be passed to Reset_Handler
Scatter file, written in this way:
LR_IROM1 0x08000000 0x00010000 { ; load region size_region
ER_IROM1 0x08000000 0x00010000 { ; load address = execution address
*.o (RESET, +First) ; RESET is code section with I.V.T.
* (InRoot$$Sections)
.ANY (+RO)
.ANY (+XO)
}
RW_IRAM1 0x20000000 0x00005000 { ; RW data
.ANY (+RW +ZI)
}
}
The problem is passing the entry point to the .c function. Reset_Handler, which needs .c entry point(starting adress) passed by __main, looks like this:
Reset_Handler PROC
EXPORT Reset_Handler [WEAK]
IMPORT __main
LDR R0, =__main
BX R0
ENDP
bout entry point __main, as a answer for one assembly raleted question was written:
__main() is the compiler supplied entry point for your C code. It is not the main() function you write, but performs initialisation for the
standard library, static data, the heap before calling your `main()'
function.
So, how to get this entry point in my assembly file?
Edit>> If somebody is interested in solution for KEIL, here it is, its all that simple!
Simple assembly startup.s file:
AREA STACK, NOINIT, READWRITE
SPACE 0x400
Stack_top
AREA RESET, DATA, READONLY
dcd Stack_top
dcd Reset_Handler
EXPORT _InitMC
IMPORT notmain
AREA PROGRAM, CODE, READONLY
Reset_Handler PROC
bl notmain
ENDP
_InitMC PROC ;start of the assembly procedure
Loop
b Loop ;infinite loop
ENDP
END
Simple c file:
extern int _InitMC();
int notmain(void) {
_InitMC();
return 0;
}
Linker is the same as the one mentioned above.
Project build was successful.
Using the gnu toolchain for example:
Bootstrap:
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word loop
.word loop
.word loop
.thumb_func
reset:
bl notmain
b loop
.thumb_func
loop: b .
.align
.thumb_func
.globl fun
fun:
bx lr
.end
C entry point (function name is not relevant, sometimes using main() adds garbage, depends on the compiler/toolchain)
void fun ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) fun(ra);
return(0);
}
Linker script
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Build
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m0 -march=armv6-m -c so.c -o so.thumb.o
arm-none-eabi-ld -o so.thumb.elf -T flash.ld flash.o so.thumb.o
arm-none-eabi-objdump -D so.thumb.elf > so.thumb.list
arm-none-eabi-objcopy so.thumb.elf so.thumb.bin -O binary
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c so.c -o so.thumb2.o
arm-none-eabi-ld -o so.thumb2.elf -T flash.ld flash.o so.thumb2.o
arm-none-eabi-objdump -D so.thumb2.elf > so.thumb2.list
arm-none-eabi-objcopy so.thumb2.elf so.thumb2.bin -O binary
Result (all thumb versions)
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000
8000004: 08000015
8000008: 0800001b
800000c: 0800001b
8000010: 0800001b
08000014 <reset>:
8000014: f000 f804 bl 8000020 <notmain>
8000018: e7ff b.n 800001a <loop>
0800001a <loop>:
800001a: e7fe b.n 800001a <loop>
0800001c <fun>:
800001c: 4770 bx lr
800001e: 46c0 nop ; (mov r8, r8)
08000020 <notmain>:
8000020: b570 push {r4, r5, r6, lr}
8000022: 25fa movs r5, #250 ; 0xfa
8000024: 2400 movs r4, #0
8000026: 00ad lsls r5, r5, #2
8000028: 0020 movs r0, r4
800002a: 3401 adds r4, #1
800002c: f7ff fff6 bl 800001c <fun>
8000030: 42ac cmp r4, r5
8000032: d1f9 bne.n 8000028 <notmain+0x8>
8000034: 2000 movs r0, #0
8000036: bd70 pop {r4, r5, r6, pc}
Of course this has to be placed in flash at the right place with some tool.
The vector table is mapped by logic to 0x00000000 in the stm32 family.
08000000 <_start>:
8000000: 20001000
8000004: 08000015 <---- reset ORR 1
And in this minimal code the reset handler calls the C code the C code messes around and returns. Technically a fully functional program for most stm32s (change the stack init to a smaller value for those with less ram say 0x20000400 and it should work anywhere by using -mthumb by itself (armv4t) or adding the cortex-m0. well okay not the armv8ms they can technically not support all of armv6m but the one in the field I know about does.
I don't have Kiel so don't know how to translate to that, but it shouldn't be much of a stretch, just syntax.

C function from main is not pushing on stack in arm

I am executing C code for arm cortex-m3 for stm32l152C-discovery board but I observed that the function call from main is not getting pushed into the stack. I have analyzed the asm code of this source but I find it is OK. To understand better, please look the asm code generated for C code here:
main.elf: file format elf32-littlearm
*SYMBOL TABLE:
00000010 l d .text 00000000 .text
00000000 l d .debug_info 00000000 .debug_info
00000000 l d .debug_abbrev 00000000 .debug_abbrev
00000000 l d .debug_aranges 00000000 .debug_aranges
00000000 l d .debug_line 00000000 .debug_line
00000000 l d .debug_str 00000000 .debug_str
00000000 l d .comment 00000000 .comment
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 l d .debug_frame 00000000 .debug_frame
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000 clock.c
20004ffc g .text 00000000 _STACKTOP
**00000028 g F .text 000000e0 SystemClock_Config**
20000000 g .text 00000000 _DATA_BEGIN
20000000 g .text 00000000 _HEAP
**00000010 g F .text 00000016 main**
20000000 g .text 00000000 _BSS_END
00000108 g .text 00000000 _DATAI_BEGIN
20000000 g .text 00000000 _BSS_BEGIN
00000108 g .text 00000000 _DATAI_END
20000000 g .text 00000000 _DATA_END
Disassembly of section .text:
00000010 <main>:
#define LL_GPIO_MODE_OUTPUT 1
void SystemInit() ;
int main()
{
10: b580 push {r7, lr}
12: b082 sub sp, #8
14: af00 add r7, sp, #0
int i = 0;
16: 2300 movs r3, #0
18: 607b str r3, [r7, #4]
SystemClock_Config();
**1a: f000 f805 bl 28 <SystemClock_Config>
for(;;)
i++;
1e: 687b ldr r3, [r7, #4]
20: 3301 adds r3, #1**
22: 607b str r3, [r7, #4]
24: e7fb b.n 1e <main+0xe>
}
00000028 <SystemClock_Config>:
* PLLDIV = 3
* Flash Latency(WS) = 1
* #retval None
*/
void SystemClock_Config(void)
{
28: b480 push {r7}
2a: af00 add r7, sp, #0
SET_BIT(FLASH->ACR, FLASH_ACR_ACC64);
2c: 4a33 ldr r2, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
2e: 4b33 ldr r3, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
30: 681b ldr r3, [r3, #0]
32: f043 0304 orr.w r3, r3, #4
36: 6013 str r3, [r2, #0]
MODIFY_REG(FLASH->ACR, FLASH_ACR_LATENCY, LL_FLASH_LATENCY_1);
38: 4a30 ldr r2, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3a: 4b30 ldr r3, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3c: 681b ldr r3, [r3, #0]
3e: f043 0301 orr.w r3, r3, #1
42: 6013 str r3, [r2, #0]*
}
the execution loops around 0x1a, 0x1c, 0x1e, 0x20 in PC register.
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
It should jump to 0x28 (SystemClock_Config) at 0x1a.
A very simple completely working example:
vectors.s
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
bl centry
done: b done
so.c
unsigned int fun ( unsigned int );
unsigned int centry ( void )
{
return(fun(5)+1);
}
fun.c
unsigned int fun ( unsigned int x )
{
return(x+1);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c so.c -o so.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c fun.c -o fun.o
arm-none-eabi-ld -o so.elf -T flash.ld vectors.o so.o fun.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
the whole program
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000009 andeq r0, r0, r9
00000008 <reset>:
8: f000 f802 bl 10 <centry>
0000000c <done>:
c: e7fe b.n c <done>
...
00000010 <centry>:
10: b510 push {r4, lr}
12: 2005 movs r0, #5
14: f000 f802 bl 1c <fun>
18: 3001 adds r0, #1
1a: bd10 pop {r4, pc}
0000001c <fun>:
1c: 3001 adds r0, #1
1e: 4770 bx lr
a simulation of the program:
read32(0x00000000)=0x20001000
read32(0x00000004)=0x00000009
--- 0x00000008: 0xF000
--- 0x0000000A: 0xF802 bl 0x0000000F
--- 0x00000010: 0xB510 push {r4,lr}
write32(0x20000FF8,0x00000000)
write32(0x20000FFC,0x0000000D)
--- 0x00000012: 0x2005 movs r0,#0x05
--- 0x00000014: 0xF000
--- 0x00000016: 0xF802 bl 0x0000001B
--- 0x0000001C: 0x3001 adds r0,#0x01
--- 0x0000001E: 0x4770 bx r14
--- 0x00000018: 0x3001 adds r0,#0x01
--- 0x0000001A: 0xBD10 pop {r4,pc}
read32(0x20000FF8)=0x00000000
read32(0x20000FFC)=0x0000000D
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
sure it is a somewhat useless program but it demonstrates booting and calling functions (the function address does not show up on the stack, when you do a call (bl) the r14 gets the return address and r15 gets the address to branch to. if you have nested functions like centry (the C entry point main() is not an important function name you can call your entry point whatever you want so long as your bootstrap matches) calling fun, then you need to preserve the return address however you choose, typically save it on the stack. r4 is being pushed just to keep the stack aligned on a 64 bit boundary per the abi.
for your system you would set the linker script for 0x08000000 normally (stm32).
What we are missing from you is the beginning of your binary, can you do a hexdump of the memory image/binary showing the handfuls of byte before main including the first few instructions of main?
If a bare metal program doesnt do the simplest boot steps right, the first thing you do is to examine the binary where the entry point or vector table is depending on the architecture and see that you built it right.
In this case in my example this is a cortex-m so the stack pointer initialization value (if you choose to use it) is at 0x00000000, you can put anything there and then simply write over the sp if you want, your choice...then address 0x00000004 is the reset vector which is the address of the code to handle the reset with the lsbit set to indicate thumb mode.
so 0x00000008|1 = 0x00000009.
If you dont have
0x2000xxxx
0x00000011
then your processor is not going to boot right. I am so much in the habit of using 0x08000000 that I dont remember if 0x00000000 works for an stm, it in theory should...but depends on how you are loading the flash and what mode/state the chip is in at that time.
you might need to link for 0x08000000 and at a minimum if nothing else changed
0x2000xxxx
0x08000011
as the first two word in your binary/memory image.
EDIT
note you can make a single binary that can be entered both with a vector or a bootloader
.thumb
.thumb_func
.global _start
_start:
bl reset
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
placing a branch (well bl to fill the space) in the stack address spot then loading the stack pointer later.
Or use a branch
.thumb
.thumb_func
.global _start
_start:
b reset
nop
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
Your application is missing an interrupt table. As a result, the processor is reading instructions as interrupt vectors, and faulting repeatedly as those instructions cannot be interpreted as invalid addresses.
Use the support files from the STM32L1xx standard peripheral library to generate an appropriate linker script and interrupt table.

freeRTOS linking process: multiple definition of `_start'

I am trying to compile freeRTOS for raspberry pi 2. Those are the commands I tried so far:
arm-none-eabi-gcc -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4
-mfloat-abi=hard test.c -o test.o
arm-none-eabi-as -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4
-mfloat-abi=hard startup.s -o startup.o
arm-none-eabi-ld test.o startup.o -static -Map kernel7.map -o
target.elf -T raspberrypi.ld
The two upper ones do work fine. However the last one doesn't, it gives me the following error:
startup.o: In function _start':
(.init+0x0): multiple definition of_start'
test.o::(.text+0x6c): first defined here
startup.o: In function swi_handler':
(.init+0x28): undefined reference tovPortYieldProcessor'
startup.o: In function irq_handler':
(.init+0x38): undefined reference tovFreeRTOS_ISR'
startup.o: In function zero_loop':
(.init+0xcc): undefined reference torpi_cpu_irq_disable'
This is the corresponding code:
test.c:
#include <stdio.h>
void exit(int code)
{
while(1)
;
}
int main(void)
{
return 0;
}
startup.s:
.extern system_init
.extern __bss_start
.extern __bss_end
.extern vFreeRTOS_ISR
.extern vPortYieldProcessor
.extern rpi_cpu_irq_disable
.extern main
.section .init
.globl _start
;;
_start:
;# All the following instruction should be read as:
;# Load the address at symbol into the program counter.
ldr pc,reset_handler ;# Processor Reset handler -- we will have to force this on the raspi!
;# Because this is the first instruction executed, of cause it causes an immediate branch into reset!
ldr pc,undefined_handler ;# Undefined instruction handler -- processors that don't have thumb can emulate thumb!
ldr pc,swi_handler ;# Software interrupt / TRAP (SVC) -- system SVC handler for switching to kernel mode.
ldr pc,prefetch_handler ;# Prefetch/abort handler.
ldr pc,data_handler ;# Data abort handler/
ldr pc,unused_handler ;# -- Historical from 26-bit addressing ARMs -- was invalid address handler.
ldr pc,irq_handler ;# IRQ handler
ldr pc,fiq_handler ;# Fast interrupt handler.
;# Here we create an exception address table! This means that reset/hang/irq can be absolute addresses
reset_handler: .word reset
undefined_handler: .word undefined_instruction
swi_handler: .word vPortYieldProcessor
prefetch_handler: .word prefetch_abort
data_handler: .word data_abort
unused_handler: .word unused
irq_handler: .word vFreeRTOS_ISR
fiq_handler: .word fiq
reset:
/* Disable IRQ & FIQ */
cpsid if
/* Check for HYP mode */
mrs r0, cpsr_all
and r0, r0, #0x1F
mov r8, #0x1A
cmp r0, r8
beq overHyped
b continueBoot
overHyped: /* Get out of HYP mode */
ldr r1, =continueBoot
msr ELR_hyp, r1
mrs r1, cpsr_all
and r1, r1, #0x1f ;# CPSR_MODE_MASK
orr r1, r1, #0x13 ;# CPSR_MODE_SUPERVISOR
msr SPSR_hyp, r1
eret
continueBoot:
;# In the reset handler, we need to copy our interrupt vector table to 0x0000, its currently at 0x8000
mov r0,#0x8000 ;# Store the source pointer
mov r1,#0x0000 ;# Store the destination pointer.
;# Here we copy the branching instructions
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load multiple values from indexed address. ; Auto-increment R0
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store multiple values from the indexed address. ; Auto-increment R1
;# So the branches get the correct address we also need to copy our vector table!
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load from 4*n of regs (8) as R0 is now incremented.
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store this extra set of data.
;# Set up the various STACK pointers for different CPU modes
;# (PSR_IRQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD2
msr cpsr_c,r0
mov sp,#0x8000
;# (PSR_FIQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD1
msr cpsr_c,r0
mov sp,#0x4000
;# (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD3
msr cpsr_c,r0
mov sp,#0x8000000
ldr r0, =__bss_start
ldr r1, =__bss_end
mov r2, #0
zero_loop:
cmp r0,r1
it lt
strlt r2,[r0], #4
blt zero_loop
bl rpi_cpu_irq_disable
;# mov sp,#0x1000000
b main ;# We're ready?? Lets start main execution!
.section .text
undefined_instruction:
b undefined_instruction
prefetch_abort:
b prefetch_abort
data_abort:
b data_abort
unused:
b unused
fiq:
b fiq
hang:
b hang
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl dummy
dummy:
bx lr
raspberrypi.ld:
/**
* BlueThunder Linker Script for the raspberry Pi!
*
*
*
**/
MEMORY
{
RESERVED (r) : ORIGIN = 0x00000000, LENGTH = 32K
INIT_RAM (rwx) : ORIGIN = 0x00008000, LENGTH = 32K
RAM (rwx) : ORIGIN = 0x00010000, LENGTH = 128M
}
ENTRY(_start)
SECTIONS {
/*
* Our init section allows us to place the bootstrap code at address 0x8000
*
* This is where the Graphics processor forces the ARM to start execution.
* However the interrupt vector code remains at 0x0000, and so we must copy the correct
* branch instructions to 0x0000 - 0x001C in order to get the processor to handle interrupts.
*
*/
.init : {
KEEP(*(.init))
} > INIT_RAM = 0
.module_entries : {
__module_entries_start = .;
KEEP(*(.module_entries))
KEEP(*(.module_entries.*))
__module_entries_end = .;
__module_entries_size = SIZEOF(.module_entries);
} > INIT_RAM
/**
* This is the main code section, it is essentially of unlimited size. (128Mb).
*
**/
.text : {
*(.text)
} > RAM
/*
* Next we put the data.
*/
.data : {
*(.data)
} > RAM
.bss :
{
__bss_start = .;
*(.bss)
*(.bss.*)
__bss_end = .;
} > RAM
/*
__exidx_start = .;
.ARM.exidx :
{
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
} > RAM
__exidx_end = .;
*/
/**
* Place HEAP here???
**/
PROVIDE(__HEAP_START = __bss_end );
/**
* Stack starts at the top of the RAM, and moves down!
**/
_estack = ORIGIN(RAM) + LENGTH(RAM);
}
As you can see test.c doesn't contain an entry point called _start, neither does it have one in its assembly compiled form. Only startup.s does.
Any idea's about how I could solve my current issue?
EDIT: all the code if needed used can be found here:https://github.com/jameswalmsley/RaspberryPi-FreeRTOS

Resources