Fatal error when calling ARM assembly function from C - c

I'm trying to implement an example from a book of calling an assembly function from C. But I keep getting a fatal error where the PC = fffffffe and therefore executing code outside of RAM or ROM.
Here is my C file, t.c:
int g; // uninitialized global
int main()
{
int a, b, c, d, e, f; // local variables
a = b = c = d = e = f = 1; // values do not matter
g = sum(a,b,c,d,e,f); // call sum(), passing a,b,c,d,e,f
}
My assembly file, ts.s:
.global start, sum
start: ldr sp, =stack_top
bl main // call main() in c
stop: b stop
sum: // int sum(int a,b,c,d,e,f) { return a+b+c+d+e+f; }
// upon entry, stack top contains e, f, passed by main() in C
// Establish stack frame
stmfd sp!, {fp, lr} // push fp, lr
add fp, sp, #4 // fp -> saved lr on stack
// Compute sum of all (6) parameters
add r0, r0, r1 // first 4 parameters are in r0-r1
add r0, r0, r2
add r0, r0, r3
ldr r3, [fp, #4] // load e into r3
add r0, r0, r3 // add to sum in r0
ldr r3, [fp, #8] // load f into r3
add r0, r0, r3 // add to sum in r0
// Return to caller
sub sp, fp, #4 // sp=fp-4 (point at saved FP)
ldmfd sp!, {fp, pc} // return to caller
Here is the linker script, t.ld:
ENTRY(start) /* define start as the entry address */
SECTIONS /* program sections */
{
. = 0x10000; /* loading address, required by QEMU */
.text : { *(.text) } /* all text in .text section */
.data : { *(.data) } /* all data in .data section */
.bss : { *(.bss) } /* all bss in .bss section */
. =ALIGN(8);
. =. + 0x1000; /* 4 KB stack space */
stack_top =.; /* stack_top is a symbol exported by linker */
}
I assemble the files using the following:
arm-none-eabi-as -o ts.o ts.s
arm-none-eabi-gcc -c t.c
arm-none-eabi-ld -T t.ld -o t.elf t.o ts.o
arm-none-eabi-objcopy -O binary t.elf t.bin
Then execute with:
qemu-system-arm -M versatilepb -kernel t.bin -nographic -serial /dev/null
Here is the output:
QEMU 2.5.0 monitor - type 'help' for more information
(qemu) pulseaudio: set_sink_input_volume() failed
pulseaudio: Reason: Invalid argument
pulseaudio: set_sink_input_mute() failed
pulseaudio: Reason: Invalid argument
qemu: fatal: Trying to execute code outside RAM or ROM at 0xfffffffe
R00=fffffffc R01=ffffffff R02=00000000 R03=ffffffff
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=ffffffff
R12=00000000 R13=42fffff0 R14=00010060 R15=fffffffe
PSR=400001f3 -Z-- T svc32
s00=00000000 s01=00000000 d00=0000000000000000
s02=00000000 s03=00000000 d01=0000000000000000
s04=00000000 s05=00000000 d02=0000000000000000
s06=00000000 s07=00000000 d03=0000000000000000
s08=00000000 s09=00000000 d04=0000000000000000
s10=00000000 s11=00000000 d05=0000000000000000
s12=00000000 s13=00000000 d06=0000000000000000
s14=00000000 s15=00000000 d07=0000000000000000
s16=00000000 s17=00000000 d08=0000000000000000
s18=00000000 s19=00000000 d09=0000000000000000
s20=00000000 s21=00000000 d10=0000000000000000
s22=00000000 s23=00000000 d11=0000000000000000
s24=00000000 s25=00000000 d12=0000000000000000
s26=00000000 s27=00000000 d13=0000000000000000
s28=00000000 s29=00000000 d14=0000000000000000
s30=00000000 s31=00000000 d15=0000000000000000
FPSCR: 00000000
Aborted (core dumped)
I am new to ARM assembly, so I'm not sure how the PC gets all the way to fffffffe. Any help would be appreciated, thank you!
So I've simplified both the C and asm file but still keep getting a fatal error.
Here is the updated C file:
int g; // uninitialized global
int main()
{
g = sum();
}
The ASM file:
.global start, sum
start: ldr sp, =stack_top
bl main // call main() in c
stop: b stop
sum:
push {lr}
pop {pc}
The linking script is the same as before. I still get the following error:
QEMU 2.5.0 monitor - type 'help' for more information
(qemu) pulseaudio: set_sink_input_volume() failed
pulseaudio: Reason: Invalid argument
pulseaudio: set_sink_input_mute() failed
pulseaudio: Reason: Invalid argument
qemu: fatal: Trying to execute code outside RAM or ROM at 0xfffffffe
R00=00000000 R01=00000183 R02=00000100 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=43000004
R12=00000000 R13=43000000 R14=0001000c R15=fffffffe
PSR=400001f3 -Z-- T svc32
s00=00000000 s01=00000000 d00=0000000000000000
s02=00000000 s03=00000000 d01=0000000000000000
s04=00000000 s05=00000000 d02=0000000000000000
s06=00000000 s07=00000000 d03=0000000000000000
s08=00000000 s09=00000000 d04=0000000000000000
s10=00000000 s11=00000000 d05=0000000000000000
s12=00000000 s13=00000000 d06=0000000000000000
s14=00000000 s15=00000000 d07=0000000000000000
s16=00000000 s17=00000000 d08=0000000000000000
s18=00000000 s19=00000000 d09=0000000000000000
s20=00000000 s21=00000000 d10=0000000000000000
s22=00000000 s23=00000000 d11=0000000000000000
s24=00000000 s25=00000000 d12=0000000000000000
s26=00000000 s27=00000000 d13=0000000000000000
s28=00000000 s29=00000000 d14=0000000000000000
s30=00000000 s31=00000000 d15=0000000000000000
FPSCR: 00000000
Aborted (core dumped)
It has something to do with PUSH and POP. Replacing those two with MOV PC,LR the program runs and get no error.

strap.s
.globl _start
_start:
mov sp,#0x10000
bl main
hang:
b hang
.globl sum
sum:
stmfd sp!, {fp, lr} // push fp, lr
add fp, sp, #4 // fp -> saved lr on stack
// Compute sum of all (6) parameters
add r0, r0, r1 // first 4 parameters are in r0-r1
add r0, r0, r2
add r0, r0, r3
ldr r3, [fp, #4] // load e into r3
add r0, r0, r3 // add to sum in r0
ldr r3, [fp, #8] // load f into r3
add r0, r0, r3 // add to sum in r0
// Return to caller
sub sp, fp, #4 // sp=fp-4 (point at saved FP)
ldmfd sp!, {fp, pc} // return to caller
notmain.c
int sum ( int, int, int, int, int, int );
int g; // uninitialized global
int main()
{
int a, b, c, d, e, f; // local variables
a = b = c = d = e = f = 1; // values do not matter
g = sum(a,b,c,d,e,f); // call sum(), passing a,b,c,d,e,f
}
/* memmap */
MEMORY
{
ram : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
arm-linux-gnueabi-as --warn --fatal-warnings -march=armv5t strap.s -o strap.o
arm-linux-gnueabi-gcc -c -Wall -O2 -nostdlib -nostartfiles -ffreestanding -march=armv5t notmain.c -o notmain.o
notmain.c: In function ‘main’:
notmain.c:11:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
arm-linux-gnueabi-ld strap.o notmain.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list
arm-linux-gnueabi-objcopy notmain.elf -O binary notmain.bin
00010000 <_start>:
10000: e3a0d801 mov sp, #65536 ; 0x10000
10004: eb00000b bl 10038 <main>
00010008 <hang>:
10008: eafffffe b 10008 <hang>
0001000c <sum>:
1000c: e92d4800 push {fp, lr}
10010: e28db004 add fp, sp, #4
10014: e0800001 add r0, r0, r1
10018: e0800002 add r0, r0, r2
1001c: e0800003 add r0, r0, r3
10020: e59b3004 ldr r3, [fp, #4]
10024: e0800003 add r0, r0, r3
10028: e59b3008 ldr r3, [fp, #8]
1002c: e0800003 add r0, r0, r3
10030: e24bd004 sub sp, fp, #4
10034: e8bd8800 pop {fp, pc}
00010038 <main>:
10038: e3a03001 mov r3, #1
1003c: e52de004 push {lr} ; (str lr, [sp, #-4]!)
10040: e24dd00c sub sp, sp, #12
10044: e1a02003 mov r2, r3
10048: e58d3004 str r3, [sp, #4]
1004c: e58d3000 str r3, [sp]
10050: e1a01003 mov r1, r3
10054: e1a00003 mov r0, r3
10058: ebffffeb bl 1000c <sum>
1005c: e59f200c ldr r2, [pc, #12] ; 10070 <main+0x38>
10060: e5820000 str r0, [r2]
10064: e1a00003 mov r0, r3
10068: e28dd00c add sp, sp, #12
1006c: e49df004 pop {pc} ; (ldr pc, [sp], #4)
10070: 00010074 andeq r0, r1, r4, ror r0
Disassembly of section .bss:
00010074 <g>:
10074: 00000000 andeq r0, r0, r0
run it. ctrl-a then x to exit
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.bin
pulseaudio: pa_context_connect() failed
pulseaudio: Reason: Connection refused
pulseaudio: Failed to initialize PA contextaudio: Could not init `pa' audio driver
QEMU: Terminated
no faults.
arm-none-eabi-as --warn --fatal-warnings -march=armv5t strap.s -o strap.o
arm-none-eabi-gcc -c -Wall -O2 -nostdlib -nostartfiles -ffreestanding -march=armv5t notmain.c -o notmain.o
notmain.c: In function ‘main’:
notmain.c:11:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
arm-none-eabi-ld strap.o notmain.o -T memmap -o notmain.elf
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf -O binary notmain.bin
again no faults.
What are the missing parts to your example that you didnt provide?
you can simplify your assembly a little bit if you care to, I can understand why to use the stack frame if you feel the need.
.globl sum
sum:
push {r3,lr}
add r0, r0, r1
add r0, r0, r2
add r0, r0, r3
ldr r3, [fp, #0x08]
add r0, r0, r3
ldr r3, [fp, #0x0C]
add r0, r0, r3
pop {r3,pc}

Related

Fault when operating on double/float on STM32F4

I am working with STM32F407 microcontroller.
void mya_usart_init(USART_Type* USARTx, uint32_t baudRate, uint8_t UsartClockSpeedMHz) {
double a = (double)72/7;
double b = (double)UsartClockSpeedMHz;
// usart_div = (double)UsartClockSpeedMHz; // * 1000000 / (8 * (2) * baudRate);
//
// USARTx->BRR |= ((uint32_t)usart_div << 4) | (uint32_t) round(((usart_div - (uint32_t)usart_div) * 16)) ; // seperate integer part and decimal part
//
// // USARTx->CR1 &= ~USART_CR1_M; // this is reset value already
//
// USARTx->CR1 |= USART_CR1_RE | USART_CR1_TE | USART_CR1_UE; // Rx, Tx, and USART enable
return;
}
I have been dealing with USART, then I figured out that I cannot make this conversion from uint8_t to double. As seen from below, variable a is successfully gets its value, however, the variable b cannot be written with the desired value. I have added the assembly code, everything goes well but at this stage;
08000ac4: bl 0x800046c <__floatunsidf>
The MCU enters a fault with the following error in IDE(Eclipse based STM32CubeIDE).
No source available for "__floatunsidf() at 0x800046c"
I also noticed that, this assembly line above does not appear for (double)72/7 but appears for (double)UsartClockSpeedMHz. Why is that so?
So, to me, while it can convert 72/7 to double, it cannot, for some reason, convert UsartClockSpeedMHz, which is 84, given as an argument to the function, to double. What is the reason for this?
Thanks.
6 void mya_usart_init(USART_Type* USARTx, uint32_t baudRate, uint8_t UsartClockSpeedMHz) {
mya_usart_init:
08000aa8: push {r4, r7, lr}
08000aaa: sub sp, #36 ; 0x24
08000aac: add r7, sp, #0
08000aae: str r0, [r7, #12]
08000ab0: str r1, [r7, #8]
08000ab2: mov r3, r2
08000ab4: strb r3, [r7, #7]
10 double a = (double)72/7;
08000ab6: add r4, pc, #32 ; (adr r4, 0x8000ad8 <mya_usart_init+48>)
08000ab8: ldrd r3, r4, [r4]
08000abc: strd r3, r4, [r7, #24]
11 double b = (double)UsartClockSpeedMHz;
08000ac0: ldrb r3, [r7, #7]
08000ac2: mov r0, r3
08000ac4: bl 0x800046c <__floatunsidf>
08000ac8: mov r3, r0
08000aca: mov r4, r1
08000acc: strd r3, r4, [r7, #16]
21 return;
08000ad0: nop
22 }
update:
The build command is:
18:14:21 **** Incremental Build of configuration Debug for project MYA-STM32F4-DRIVERS ****
make -j8 all
arm-none-eabi-gcc "../Src/mya_usart.c" -mcpu=cortex-m4 -std=gnu11 -g3 -DSTM32 -DSTM32F407G_DISC1 -DSTM32F4 -DSTM32F407VGTx -DDEBUG -c -I../Inc -O0 -ffunction-sections -fdata-sections -Wall -fstack-usage -MMD -MP -MF"Src/mya_usart.d" -MT"Src/mya_usart.o" --specs=nano.specs -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb -o "Src/mya_usart.o"
../Src/mya_usart.c: In function 'mya_usart_init':
../Src/mya_usart.c:11:12: warning: unused variable 'b' [-Wunused-variable]
double b = (float)UsartClockSpeedMHz;
^
../Src/mya_usart.c:10:12: warning: unused variable 'a' [-Wunused-variable]
double a = (float)72/7;
^
arm-none-eabi-gcc -o "MYA-STM32F4-DRIVERS.elf" #"objects.list" -mcpu=cortex-m4 -T"/home/muyustan/Documents/STM32CubeIDE/workspace_1.3.0/MYA-STM32F4-DRIVERS/STM32F407VGTX_FLASH.ld" --specs=nosys.specs -Wl,-Map="MYA-STM32F4-DRIVERS.map" -Wl,--gc-sections -static --specs=nano.specs -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb -Wl,--start-group -lc -lm -Wl,--end-group
Finished building target: MYA-STM32F4-DRIVERS.elf
arm-none-eabi-objdump -h -S MYA-STM32F4-DRIVERS.elf > "MYA-STM32F4-DRIVERS.list"
arm-none-eabi-objcopy -O binary MYA-STM32F4-DRIVERS.elf "MYA-STM32F4-DRIVERS.bin"
arm-none-eabi-size MYA-STM32F4-DRIVERS.elf
text data bss dec hex filename
2804 8 1568 4380 111c MYA-STM32F4-DRIVERS.elf
Finished building: default.size.stdout
Finished building: MYA-STM32F4-DRIVERS.bin
Finished building: MYA-STM32F4-DRIVERS.list
I cannot even perform this in main(),
for (i=0 ; i<15; i++){
f*=10.01;
}
I get No source available for "__extendsfdf2() at 0x80004b0" for the above one.
If I change it to, (notice the 10.01f)
for (i=0 ; i<15; i++){
f*=10.01f;
}
This time, assembly code changes:
83 f*=10.01f;
080002fc: vldr s15, [r7]
08000300: vldr s14, [pc, #44] ; 0x8000330 <main+288>
08000304: vmul.f32 s15, s15, s14
08000308: vstr s15, [r7]
0800030c: ldr r3, [r7, #4]
0800030e: adds r3, #1
08000310: str r3, [r7, #4]
08000312: ldr r3, [r7, #4]
08000314: cmp r3, #14
08000316: ble.n 0x80002fc <main+236>
Executing the very first step, vldr s15, [r7] routes me to the following screen:

Why -marm option still generates thumb instruction?

(I am new to the ARM world. Excuse me if this is a dumb question.)
I am using below command line to generate assembly code for a C file.
The cpu is arm926ej-s, which is ARMv5 architecture.
arm-none-eabi-gcc -mcpu=arm926ej-s -mthumb -S t.c -o t_thumb.S
arm-none-eabi-gcc -mcpu=arm926ej-s -marm -S t.c -o t_arm.S
I am expecting the -marm and -mthumb options would generate different function prologues. But they give similar results:
for -marm:
# args = 0, pretend = 0, frame = 72
# frame_needed = 1, uses_anonymous_args = 0
push {fp, lr} #<========== push is used instead of stmfd
add fp, sp, #4
sub sp, sp, #72
bl uart_init
for -mthumb:
# args = 0, pretend = 0, frame = 72
# frame_needed = 1, uses_anonymous_args = 0
push {r7, lr} #<========== push is used as expected
sub sp, sp, #72
add r7, sp, #0
bl uart_init
So they both use the push instruction. But as I checked the ARMv5 arch spec, the push instruction only belongs to the Thumb instruction set. I was expecting stmfd for the -marm option.
Why is push chosen instead?
How can I generate pure ARM instructions?
ADD 1 - 5:21 PM 12/18/2019
Below is the disassembly of the .o file:
arm-none-eabi-gcc -mcpu=arm926ej-s -marm -g -c t.c -o build/t_arm.o
arm-none-eabi-objdump.exe -d build/t_arm.o > t_arm.dism
The disassembly:
000002a0 <main>:
2a0: e92d4800 push {fp, lr} <=============== push is used!
2a4: e28db004 add fp, sp, #4
2a8: e24dd048 sub sp, sp, #72 ; 0x48
2ac: ebfffffe bl 0 <uart_init>
2b0: e59f3168 ldr r3, [pc, #360] ; 420 <main+0x180>
2b4: e50b300c str r3, [fp, #-12]
2b8: e59f1164 ldr r1, [pc, #356] ; 424 <main+0x184>
2bc: e51b000c ldr r0, [fp, #-12]
ADD 2 - 5:34 PM 12/18/2019
Thanks to #Erlkoenig.
I just tried to disassemble a -mthumb binary:
arm-none-eabi-gcc -mcpu=arm926ej-s -mthumb -g -c t.c -o build/t_thumb.o
arm-none-eabi-objdump.exe -d build/t_thumb.o > t_thumb.dism
A totally different thumb disassembly is shown:
00000170 <main>:
170: b580 push {r7, lr} <====== though still push is shown, but the encoding is different.
172: b092 sub sp, #72 ; 0x48
174: af00 add r7, sp, #0
176: f7ff fffe bl 0 <uart_init>
17a: 4b3c ldr r3, [pc, #240] ; (26c <main+0xfc>)
17c: 643b str r3, [r7, #64] ; 0x40
17e: 4a3c ldr r2, [pc, #240] ; (270 <main+0x100>)
180: 6c3b ldr r3, [r7, #64] ; 0x40
The hex encoding of the raw instruction as shown by objdump -d indicates that this is a 32bit ARM ("A32") instruction (0xe92d4800). The .S file generated by the -S flag to GCC, and the objdump output just use the ARM UAL (Unified Assembly Syntax), which uses push as an alias for stmfd, while the ARMv5T Architecture Reference Manual uses the old syntax, which has no push on A32. The instruction encoding matches the encoding of stmdb, for which stmfd is an alias. The encoding is shown on p. 339 in the ARMv5T Reference Manual.
A32 ("ARM") code can be easily recognized as all instructions are 4-byte wide and the first 4 bits are often hex E (which means that the condition code is AL, i.e. the instructions are always executed unconditionally):
[e]92d4800
[e]28db004
[e]24dd048
[e]bfffffe
This is helpful when viewing raw binaries in a hex editor. Thumb ("T32") code has many 16bit instructions, some 32bit, and no "stacks" of Es:
b580
b092
af00
f7ff fffe
Of course, for a raw binary, it is not directly clear which 2- and 4-byte groups belong together as instructions.

How to to build a binary into a image at a fixed address 0x80000?

In our c project, we need to build a binary firmware into a image at a fixed file offset 0x80000.
Then when the image is loaded to memory. We can load firmware from offset 0x80000 to a specified address.
Meanwhile, as the firmware is placed at file offset 0x80000, we can upgrade the firmare independently.
So I'm trying to use GNU linker script to implement that.
What I do now is use incbin to include my binary file in a asm file.
And in linker script, my code is:
.fw_image_start : {
*(.__fw_image_start)
}
.fw_image : {
KEEP(*(.fw_image))
}
.fw_image_end : {
*(.__fw_image_end)
}
Then I can use fw_image_start to load firmware in image code.
But I still can't find a way to put the firmware binary to file offset 0x80000 in the final image.
Could you help me on this?
Thank you in advance!
What did you find when you looked at the documentation and examples from GNU? Some of it is admittedly confusing or misleading, but some is pretty easy. This should give a hit of at least one way to do it (there are multiple ways to solve your problem).
novectors.s
.global _start
_start:
bl notmain
b .
.globl bounce
bounce:
bx lr
.section .hello_world
.word 1,2,3,4
notmain.c
void bounce ( unsigned int );
unsigned int mybss[8];
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) bounce(ra);
return(0);
}
memmap.ld
MEMORY
{
ram : ORIGIN = 0x80000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.hello_world : { *(.hello_world) } > ram
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings novectors.s -o novectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c notmain.c -o notmain.o
arm-none-eabi-ld -o notmain.elf -T memmap.ld novectors.o notmain.o
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf notmain.bin -O binary
examine results
Disassembly of section .text:
00080000 <_start>:
80000: eb000001 bl 8000c <notmain>
80004: eafffffe b 80004 <_start+0x4>
00080008 <bounce>:
80008: e12fff1e bx lr
0008000c <notmain>:
8000c: e92d4010 push {r4, lr}
80010: e3a04000 mov r4, #0
80014: e1a00004 mov r0, r4
80018: e2844001 add r4, r4, #1
8001c: ebfffff9 bl 80008 <bounce>
80020: e3540ffa cmp r4, #1000 ; 0x3e8
80024: 1afffffa bne 80014 <notmain+0x8>
80028: e3a00000 mov r0, #0
8002c: e8bd4010 pop {r4, lr}
80030: e12fff1e bx lr
Disassembly of section .hello_world:
00080034 <.hello_world>:
80034: 00000001 andeq r0, r0, r1
80038: 00000002 andeq r0, r0, r2
8003c: 00000003 andeq r0, r0, r3
80040: 00000004 andeq r0, r0, r4
Disassembly of section .bss:
00080034 <mybss>:
...
Naturally:
MEMORY
{
bob : ORIGIN = 0x80000, LENGTH = 0x1000
ted : ORIGIN = 0xB0000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > bob
.rodata : { *(.rodata*) } > bob
.hello_world : { *(.hello_world) } > ted
.bss : { *(.bss*) } > ted
}
gives
Disassembly of section .text:
00080000 <_start>:
80000: eb000001 bl 8000c <notmain>
80004: eafffffe b 80004 <_start+0x4>
00080008 <bounce>:
80008: e12fff1e bx lr
0008000c <notmain>:
8000c: e92d4010 push {r4, lr}
80010: e3a04000 mov r4, #0
80014: e1a00004 mov r0, r4
80018: e2844001 add r4, r4, #1
8001c: ebfffff9 bl 80008 <bounce>
80020: e3540ffa cmp r4, #1000 ; 0x3e8
80024: 1afffffa bne 80014 <notmain+0x8>
80028: e3a00000 mov r0, #0
8002c: e8bd4010 pop {r4, lr}
80030: e12fff1e bx lr
Disassembly of section .hello_world:
000b0000 <.hello_world>:
b0000: 00000001 andeq r0, r0, r1
b0004: 00000002 andeq r0, r0, r2
b0008: 00000003 andeq r0, r0, r3
b000c: 00000004 andeq r0, r0, r4
Disassembly of section .bss:
000b0000 <mybss>:
...
and as demonstrated the name ram, rom, etc are not special can call it other things like bob, ted, alice...I assume there are some reserved words you cant use.
Again there are numerous solutions, see the GNU documentation, I like this method as it reads better for me, but you will see solutions that skip the MEMORY part.
(no this wasnt intended to be completely correct code, but demonstrates the assembly language bootstrap, the C code and the linker script).

What am I doing wrong when compiling C code to bare metal (raspberry pi)?

I have spent multiple days trying to figure this out and I just can't. I have some C code. I have made the assembly code for this C program, copy pasted the assembly to someone else's project (that only contains a single assembly file) and assembled that. In these case things work. But if I try to compile from C directly to generate the binaries, it doesn't work. Even though everything else should be identical. This is my C code:
#include <stdint.h>
#define REGISTERS_BASE 0x3F000000
#define MAIL_BASE 0xB880 // Base address for the mailbox registers
// This bit is set in the status register if there is no space to write into the mailbox
#define MAIL_FULL 0x80000000
// This bit is set in the status register if there is nothing to read from the mailbox
#define MAIL_EMPTY 0x40000000
struct Message
{
uint32_t messageSize;
uint32_t requestCode;
uint32_t tagID;
uint32_t bufferSize;
uint32_t requestSize;
uint32_t pinNum;
uint32_t on_off_switch;
uint32_t end;
};
struct Message m =
{
.messageSize = sizeof(struct Message),
.requestCode =0,
.tagID = 0x00038041,
.bufferSize = 8,
.requestSize =0,
.pinNum = 130,
.on_off_switch = 1,
.end = 0,
};
/** Main function - we'll never return from here */
int _start(void)
{
uint32_t mailbox = MAIL_BASE + REGISTERS_BASE + 0x18;
volatile uint32_t status;
do
{
status = *(volatile uint32_t *)(mailbox);
}
while((status & 0x80000000));
*(volatile uint32_t *)(MAIL_BASE + REGISTERS_BASE + 0x20) = ((uint32_t)(&m) & 0xfffffff0) | (uint32_t)(8);
while(1);
}
This is a linker file I copied from the successful method:
/*
* Very simple linker script, combing the text and data sections
* and putting them starting at address 0x800.
*/
SECTIONS {
/* Put the code at 0x80000, leaving room for ARM and
* the stack. It also conforms to the standard expecations.
*/
.init 0x8000 : {
*(.init)
}
.text : {
*(.text)
}
/* Put the data after the code */
.data : {
*(.data)
}
}
And these is how I am compiling and linking everything:
arm-none-eabi-gcc -O0 -march=armv8-a PiTest.c -nostartfiles -o kernel.o
arm-none-eabi-ld kernel.o -o kernel.elf -T kernel.ld
arm-none-eabi-objcopy kernel.elf -O binary kernel.img
My target architecture is armv8 since that's what the pi model 3 uses.
I have no idea how the generated assembly works, but the C code directly does not. Please help I am on the verge of madness.
EDIT: The expected behaviour is for the pi's light to turn on. which it does with the first method I described. With the second method the light remains off.
EDIT4: Made some changes to files, deleted previous edits with outdated info to reduce post size
kernel.elf: file format elf32-littlearm
Disassembly of section .init:
00008000 <_start>:
8000: e3a0dd7d mov sp, #8000 ; 0x1f40
8004: eaffffff b 8008 <kernel_main>
Disassembly of section .text:
00008008 <kernel_main>:
8008: e52db004 push {fp} ; (str fp, [sp, #-4]!)
800c: e28db000 add fp, sp, #0
8010: e24dd00c sub sp, sp, #12
8014: e30b3898 movw r3, #47256 ; 0xb898
8018: e3433f00 movt r3, #16128 ; 0x3f00
801c: e50b3008 str r3, [fp, #-8]
8020: e51b3008 ldr r3, [fp, #-8]
8024: e5933000 ldr r3, [r3]
8028: e50b300c str r3, [fp, #-12]
802c: e51b300c ldr r3, [fp, #-12]
8030: e3530000 cmp r3, #0
8034: bafffff9 blt 8020 <kernel_main+0x18>
8038: e30b38a0 movw r3, #47264 ; 0xb8a0
803c: e3433f00 movt r3, #16128 ; 0x3f00
8040: e3082050 movw r2, #32848 ; 0x8050
8044: e3402001 movt r2, #1
8048: e3c2200f bic r2, r2, #15
804c: e3822008 orr r2, r2, #8
8050: e5832000 str r2, [r3]
8054: eafffffe b 8054 <kernel_main+0x4c>
Disassembly of section .data:
00008058 <__data_start>:
8058: 00000020 andeq r0, r0, r0, lsr #32
805c: 00000000 andeq r0, r0, r0
8060: 00038041 andeq r8, r3, r1, asr #32
8064: 00000008 andeq r0, r0, r8
8068: 00000000 andeq r0, r0, r0
806c: 00000082 andeq r0, r0, r2, lsl #1
8070: 00000001 andeq r0, r0, r1
8074: 00000000 andeq r0, r0, r0
Disassembly of section .ARM.attributes:
00000000 <_stack-0x80021>:
0: 00002e41 andeq r2, r0, r1, asr #28
4: 61656100 cmnvs r5, r0, lsl #2
8: 01006962 tsteq r0, r2, ror #18
c: 00000024 andeq r0, r0, r4, lsr #32
10: 412d3805 ; <UNDEFINED> instruction: 0x412d3805
14: 070e0600 streq r0, [lr, -r0, lsl #12]
18: 09010841 stmdbeq r1, {r0, r6, fp}
1c: 14041202 strne r1, [r4], #-514 ; 0xfffffdfe
20: 17011501 strne r1, [r1, -r1, lsl #10]
24: 1a011803 bne 46038 <__bss_end__+0x3dfc0>
28: 2a012201 bcs 48834 <__bss_end__+0x407bc>
2c: Address 0x000000000000002c is out of bounds.
Disassembly of section .comment:
00000000 <.comment>:
0: 3a434347 bcc 10d0d24 <_stack+0x1050d03>
4: 35312820 ldrcc r2, [r1, #-2080]! ; 0xfffff7e0
8: 392e343a stmdbcc lr!, {r1, r3, r4, r5, sl, ip, sp}
c: 732b332e ; <UNDEFINED> instruction: 0x732b332e
10: 33326e76 teqcc r2, #1888 ; 0x760
14: 37373131 ; <UNDEFINED> instruction: 0x37373131
18: 2029312d eorcs r3, r9, sp, lsr #2
1c: 2e392e34 mrccs 14, 1, r2, cr9, cr4, {1}
20: 30322033 eorscc r2, r2, r3, lsr r0
24: 35303531 ldrcc r3, [r0, #-1329]! ; 0xfffffacf
28: 28203932 stmdacs r0!, {r1, r4, r5, r8, fp, ip, sp}
2c: 72657270 rsbvc r7, r5, #112, 4
30: 61656c65 cmnvs r5, r5, ror #24
34: 00296573 eoreq r6, r9, r3, ror r5
kernel8.img
12345678
00000800
00080264
00000000
12345678
kernel8-32.img
12345678
00008320
00008224
200001DA
12345678
kernel7.img
12345678
00000700
00008224
200001DA
12345678
kernel.img
12345678
00000000
00008224
200001DA
12345678
when I wrote and posted this code this is what I got so if you name your file kernel.img then 0x8000 is your entry point the answer I gave in your other SO question is a complete raspberry pi starting point. You can simply add your mailbox stuff, although if you are struggling with this I thing the mailbox and video are not where you should start IMO.
if you name the file kernel8.img then the entry point is 0x80000 change the linker script to match.
I have a serial port based bootloader you can use to save on the sd card dance, can get a long way with that then simply use the binary version of what you are creating to write to the flash once your application is working.
EDIT
Okay this is incredibly disgusting and by posting it here maybe that means you cant use it in your classwork...you should really do this right and not use inline assembly for your bootstrap...
so.c
asm(
".globl _start\n"
"_start:\n"
"mov sp,#0x8000\n"
"bl centry\n"
"b .\n"
);
unsigned int centry ( void )
{
return(5);
}
build
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-ld -Ttext=0x8000 so.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf -O binary kernel.img
examine
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
0000800c <centry>:
800c: e3a00005 mov r0, #5
8010: e12fff1e bx lr
A complete raspberry pi C with bootstrap example that will work on any of the flavors of pi (so far as I know they might have changed the GPU bootloader in the last few months but assume the didnt).
There are a couple of things I see wrong here. The most obvious ones are:
You aren't leaving anything at address 0, so the CPU is left executing blank memory at startup. You need to put something (like a branch instruction!) at 0x0.
On ARM Cortex-A, the stack pointer is not initialized at startup. You have to initialize it yourself in _start -- which means you will need to write that function in assembly.
First, cudos to old timer for his patience helping me.
The mistakes were:
Wrong entry point for the program, fixed by creating an assembly file with the label _start to set the stack pointer and using the linker to put the init section at address 0x8000
The compilation line itself was also wrong, it was missing a -c argument

How to make bare metal ARM programs and run them on QEMU?

I am trying to get this tutorial to work as intended without success (Something fails after the bl main instruction).
According to the tutorial the command
(qemu) xp /1dw 0xa0000018
should result in the print 33 (But i get 0x00 instead)
a0000018: 33
This is the content of the registers after the main call (see startup.s)
(qemu) info registers
R00=a000001c R01=a000001c R02=00000006 R03=00000000
R04=00000000 R05=00000005 R06=00000006 R07=00000007
R08=00000008 R09=00000009 R10=00000000 R11=a3fffffc
R12=00000000 R13=00000000 R14=0000003c R15=00000004
PSR=800001db N--- A und32
FPSCR: 00000000
I have the following files
main.c
startup.s
lscript.ld
Makefile
And I am using the following toolchain
arm-2013.11-24-arm-none-eabi-i686-pc-linux-gnu
Makefile:
SRCS := main.c startup.s
LINKER_NAME := lscript.ld
ELF_NAME := program.elf
BIN_NAME := program.bin
FLASH_NAME := flash.bin
CC := arm-none-eabi
CFLAGS := -nostdlib
OBJFLAGS ?= -DS
QEMUFLAGS := -M connex -pflash $(FLASH_NAME) -nographic -serial /dev/null
# Allocate 16MB to use as a virtual flash for th qemu
# bs = blocksize -> 4KB
# count = number of block -> 4096
# totalsize = 16MB
setup:
dd if=/dev/zero of=$(FLASH_NAME) bs=4096 count=4096
# Compile srcs and write to virtual flash
all: clean setup
$(CC)-gcc $(CFLAGS) -o $(ELF_NAME) -T $(LINKER_NAME) $(SRCS)
$(CC)-objcopy -O binary $(ELF_NAME) $(BIN_NAME)
dd if=$(BIN_NAME) of=$(FLASH_NAME) bs=4096 conv=notrunc
objdump:
$(CC)-objdump $(OBJFLAGS) $(ELF_NAME)
mem-placement:
$(CC)-nm -n $(ELF_NAME)
qemu:
qemu-system-arm $(QEMUFLAGS)
clean:
rm -rf *.bin
rm -rf *.elf
main.c:
static int arr[] = { 1, 10, 4, 5, 6, 7 };
static int sum;
static const int n = sizeof(arr) / sizeof(arr[0]);
int main()
{
int i;
for (i = 0; i < n; i++){
sum += arr[i];
}
return 0;
}
startup.s:
.section "vectors"
reset: b _start
undef: b undef
swi: b swi
pabt: b pabt
dabt: b dabt
nop
irq: b irq
fiq: b fiq
.text
_start:
init:
## Copy data to RAM.
ldr r0, =flash_sdata
ldr r1, =ram_sdata
ldr r2, =data_size
## Handle data_size == 0
cmp r2, #0
beq init_bss
copy:
ldrb r4, [r0], #1
strb r4, [r1], #1
subs r2, r2, #1
bne copy
init_bss:
## Initialize .bss
ldr r0, =sbss
ldr r1, =ebss
ldr r2, =bss_size
## Handle bss_size == 0
cmp r2, #0
beq init_stack
mov r4, #0
zero:
strb r4, [r0], #1
subs r2, r2, #1
bne zero
init_stack:
## Initialize the stack pointer
ldr sp, =0xA4000000
## **this call dosent work as expected.. (r13/sp contains 0xA4000000)**
bl main
## Dosent return from main
## r0 should now contain 33
stop:
b stop
lscript.ld:
/*
* Linker for testing purposes
* (using 16 MB virtual flash = 0x0100_0000)
*/
MEMORY {
rom (rx) : ORIGIN = 0x00000000, LENGTH = 0x01000000
ram (rwx) : ORIGIN = 0xA0000000, LENGTH = 0x04000000
}
SECTIONS {
.text : {
* (vectors);
* (.text);
} > rom
.rodata : {
* (.rodata);
} > rom
flash_sdata = .;
ram_sdata = ORIGIN(ram);
.data : AT (flash_sdata) {
* (.data);
} > ram
ram_edata = .;
data_size = ram_edata - ram_sdata;
sbss = .;
.bss : {
* (.bss);
} > ram
ebss = .;
bss_size = ebss - sbss;
/DISCARD/ : {
*(.note*)
*(.comment)
*(.ARM*)
/*
*(.debug*)
*/
}
}
Disassembly of the executable (objdump):
program.elf: file format elf32-littlearm
Disassembly of section .text:
00000000 <reset>:
0: ea000023 b 94 <_start>
00000004 <undef>:
4: eafffffe b 4 <undef>
00000008 <swi>:
8: eafffffe b 8 <swi>
0000000c <pabt>:
c: eafffffe b c <pabt>
00000010 <dabt>:
10: eafffffe b 10 <dabt>
14: e320f000 nop {0}
00000018 <irq>:
18: eafffffe b 18 <irq>
0000001c <fiq>:
1c: eafffffe b 1c <fiq>
00000020 <main>:
20: e52db004 push {fp} ; (str fp, [sp, #-4]!)
24: e28db000 add fp, sp, #0
28: e24dd00c sub sp, sp, #12
2c: e3a03000 mov r3, #0
30: e50b3008 str r3, [fp, #-8]
34: ea00000d b 70 <main+0x50>
38: e3003000 movw r3, #0
3c: e34a3000 movt r3, #40960 ; 0xa000
40: e51b2008 ldr r2, [fp, #-8]
44: e7932102 ldr r2, [r3, r2, lsl #2]
48: e3003018 movw r3, #24
4c: e34a3000 movt r3, #40960 ; 0xa000
50: e5933000 ldr r3, [r3]
54: e0822003 add r2, r2, r3
58: e3003018 movw r3, #24
5c: e34a3000 movt r3, #40960 ; 0xa000
60: e5832000 str r2, [r3]
64: e51b3008 ldr r3, [fp, #-8]
68: e2833001 add r3, r3, #1
6c: e50b3008 str r3, [fp, #-8]
70: e3a02006 mov r2, #6
74: e51b3008 ldr r3, [fp, #-8]
78: e1530002 cmp r3, r2
7c: baffffed blt 38 <main+0x18>
80: e3a03000 mov r3, #0
84: e1a00003 mov r0, r3
88: e24bd000 sub sp, fp, #0
8c: e49db004 pop {fp} ; (ldr fp, [sp], #4)
90: e12fff1e bx lr
00000094 <_start>:
94: e59f004c ldr r0, [pc, #76] ; e8 <stop+0x4>
98: e59f104c ldr r1, [pc, #76] ; ec <stop+0x8>
9c: e59f204c ldr r2, [pc, #76] ; f0 <stop+0xc>
a0: e3520000 cmp r2, #0
a4: 0a000003 beq b8 <init_bss>
000000a8 <copy>:
a8: e4d04001 ldrb r4, [r0], #1
ac: e4c14001 strb r4, [r1], #1
b0: e2522001 subs r2, r2, #1
b4: 1afffffb bne a8 <copy>
000000b8 <init_bss>:
b8: e59f0034 ldr r0, [pc, #52] ; f4 <stop+0x10>
bc: e59f1034 ldr r1, [pc, #52] ; f8 <stop+0x14>
c0: e59f2034 ldr r2, [pc, #52] ; fc <stop+0x18>
c4: e3520000 cmp r2, #0
c8: 0a000003 beq dc <init_stack>
cc: e3a04000 mov r4, #0
000000d0 <zero>:
d0: e4c04001 strb r4, [r0], #1
d4: e2522001 subs r2, r2, #1
d8: 1afffffc bne d0 <zero>
000000dc <init_stack>:
dc: e3a0d329 mov sp, #-1543503872 ; 0xa4000000
e0: ebffffce bl 20 <main>
000000e4 <stop>:
e4: eafffffe b e4 <stop>
e8: 00000104 andeq r0, r0, r4, lsl #2
ec: a0000000 andge r0, r0, r0
f0: 00000018 andeq r0, r0, r8, lsl r0
f4: a0000018 andge r0, r0, r8, lsl r0
f8: a000001c andge r0, r0, ip, lsl r0
fc: 00000004 andeq r0, r0, r4
Disassembly of section .rodata:
00000100 <n>:
100: 00000006 andeq r0, r0, r6
Disassembly of section .data:
a0000000 <arr>:
a0000000: 00000001 andeq r0, r0, r1
a0000004: 0000000a andeq r0, r0, sl
a0000008: 00000004 andeq r0, r0, r4
a000000c: 00000005 andeq r0, r0, r5
a0000010: 00000006 andeq r0, r0, r6
a0000014: 00000007 andeq r0, r0, r7
Disassembly of section .bss:
a0000018 <sum>:
a0000018: 00000000 andeq r0, r0, r0
Can someone point me in the right direction to why this isn't working according to my expectations?
Thanks Henrik
Minimal examples that just work
https://github.com/cirosantilli/linux-kernel-module-cheat/tree/54e15e04338c0fecc0be139a0da2d0d972c21419#baremetal-setup-getting-started
The prompt.c example takes input from your host terminal and gives back output all through the simulated UART:
enter a character
got: a
new alloc of 1 bytes at address 0x0x4000a1c0
enter a character
got: b
new alloc of 2 bytes at address 0x0x4000a1c0
enter a character
It uses Newlib to expose a subset of the C standard library. This allows you to run existing programs written in C if the only use that restricted subset of the C standard library.
More details about Newlib at: https://electronics.stackexchange.com/questions/223929/c-standard-libraries-on-bare-metal/400077#400077
https://github.com/freedomtan/aarch64-bare-metal-qemu/tree/2ae937a2b106b43bfca49eec49359b3e30eac1b1 for -M virt, just the hello world on the repo. Compile with:
sudo apt-get install gcc-aarch64-linux-gnu
make CROSS_PREFIX=aarch64-linux-gnu-
Here is the example minimized to printing a single character from assembly: How to run a bare metal ELF file on QEMU?
https://github.com/bztsrc/raspi3-tutorial for -M raspi3. Quick getting started at: https://raspberrypi.stackexchange.com/questions/34733/how-to-do-qemu-emulation-for-bare-metal-raspberry-pi-images/85135#85135 Several other examples on the repo going to more advanced subjects.
Also does display output on 09_framebuffer.
Both write a hello world to the UART.
Tested in Ubuntu 18.04, gcc-aarch64-linux-gnu version 4:7.3.0-3ubuntu2.
Debugging!
First, look at the PC and PSR: You're in Undef mode, in the undefined instruction handler.
OK, in an exception mode, the LR tells you where you took the exception. There are some slightly complicated rules between the PC offset and the preferred return address determining exactly what it points at, but just eyeballing it it's clearly in the vicinity of the movw/movt pair.
The movw instruction effectively only exists in the ARMv7 ISA onwards. A brief investigation tells me the machine you're emulating is some old PXA255 thing, whose CPU only implements the ARMv5 ISA. Thus it's not surprising it faults on an instruction that it predates by many years.
Your compiler is apparently configured to target ARMv7 by default (which is not uncommon), so you need to add at least -march=armv5te to your CFLAGS to target the appropriate architecture version. The 'advanced' challenge would be to switch to a different, newer, machine, but that's going to involve adapting the linker script to a new memory map and rewriting any hardware-touching code for new peripherals, so I'd save that idea for the longer term, once you're comfortable with the basics of bare-metal code and slogging through hardware reference manuals.
for the same code on my ubuntu i got
arm-none-eabi-gcc -nostdlib -o sum.elf sum.lds startup.s -w
/usr/lib/gcc/arm-none-eabi/4.9.3/../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000
/tmp/ccBthV7t.o: In function init_stack':
(.text+0x4c): undefined reference tomain'
collect2: error: ld returned 1 exit status

Resources