C function called from main is not pushed onto the stack on ARM

I am running C code on an ARM Cortex-M3 (STM32L152C-Discovery board), but I observed that the function call from main is not getting pushed onto the stack. I have analyzed the assembly generated from this source and it looks OK to me. For reference, here is the disassembly generated for the C code:
main.elf: file format elf32-littlearm
SYMBOL TABLE:
00000010 l d .text 00000000 .text
00000000 l d .debug_info 00000000 .debug_info
00000000 l d .debug_abbrev 00000000 .debug_abbrev
00000000 l d .debug_aranges 00000000 .debug_aranges
00000000 l d .debug_line 00000000 .debug_line
00000000 l d .debug_str 00000000 .debug_str
00000000 l d .comment 00000000 .comment
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 l d .debug_frame 00000000 .debug_frame
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000 clock.c
20004ffc g .text 00000000 _STACKTOP
00000028 g F .text 000000e0 SystemClock_Config
20000000 g .text 00000000 _DATA_BEGIN
20000000 g .text 00000000 _HEAP
00000010 g F .text 00000016 main
20000000 g .text 00000000 _BSS_END
00000108 g .text 00000000 _DATAI_BEGIN
20000000 g .text 00000000 _BSS_BEGIN
00000108 g .text 00000000 _DATAI_END
20000000 g .text 00000000 _DATA_END
Disassembly of section .text:
00000010 <main>:
#define LL_GPIO_MODE_OUTPUT 1
void SystemInit() ;
int main()
{
10: b580 push {r7, lr}
12: b082 sub sp, #8
14: af00 add r7, sp, #0
int i = 0;
16: 2300 movs r3, #0
18: 607b str r3, [r7, #4]
SystemClock_Config();
1a: f000 f805 bl 28 <SystemClock_Config>
for(;;)
i++;
1e: 687b ldr r3, [r7, #4]
20: 3301 adds r3, #1
22: 607b str r3, [r7, #4]
24: e7fb b.n 1e <main+0xe>
}
00000028 <SystemClock_Config>:
* PLLDIV = 3
* Flash Latency(WS) = 1
* @retval None
*/
void SystemClock_Config(void)
{
28: b480 push {r7}
2a: af00 add r7, sp, #0
SET_BIT(FLASH->ACR, FLASH_ACR_ACC64);
2c: 4a33 ldr r2, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
2e: 4b33 ldr r3, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
30: 681b ldr r3, [r3, #0]
32: f043 0304 orr.w r3, r3, #4
36: 6013 str r3, [r2, #0]
MODIFY_REG(FLASH->ACR, FLASH_ACR_LATENCY, LL_FLASH_LATENCY_1);
38: 4a30 ldr r2, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3a: 4b30 ldr r3, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3c: 681b ldr r3, [r3, #0]
3e: f043 0301 orr.w r3, r3, #1
42: 6013 str r3, [r2, #0]
}
When I single-step, the PC register loops through 0x1a, 0x1c, 0x1e, 0x20:
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
It should jump to 0x28 (SystemClock_Config) at 0x1a.

A very simple completely working example:
vectors.s
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
bl centry
done: b done
so.c
unsigned int fun ( unsigned int );
unsigned int centry ( void )
{
return(fun(5)+1);
}
fun.c
unsigned int fun ( unsigned int x )
{
return(x+1);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c so.c -o so.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c fun.c -o fun.o
arm-none-eabi-ld -o so.elf -T flash.ld vectors.o so.o fun.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
the whole program
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000009 andeq r0, r0, r9
00000008 <reset>:
8: f000 f802 bl 10 <centry>
0000000c <done>:
c: e7fe b.n c <done>
...
00000010 <centry>:
10: b510 push {r4, lr}
12: 2005 movs r0, #5
14: f000 f802 bl 1c <fun>
18: 3001 adds r0, #1
1a: bd10 pop {r4, pc}
0000001c <fun>:
1c: 3001 adds r0, #1
1e: 4770 bx lr
a simulation of the program:
read32(0x00000000)=0x20001000
read32(0x00000004)=0x00000009
--- 0x00000008: 0xF000
--- 0x0000000A: 0xF802 bl 0x0000000F
--- 0x00000010: 0xB510 push {r4,lr}
write32(0x20000FF8,0x00000000)
write32(0x20000FFC,0x0000000D)
--- 0x00000012: 0x2005 movs r0,#0x05
--- 0x00000014: 0xF000
--- 0x00000016: 0xF802 bl 0x0000001B
--- 0x0000001C: 0x3001 adds r0,#0x01
--- 0x0000001E: 0x4770 bx r14
--- 0x00000018: 0x3001 adds r0,#0x01
--- 0x0000001A: 0xBD10 pop {r4,pc}
read32(0x20000FF8)=0x00000000
read32(0x20000FFC)=0x0000000D
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
Sure, it is a somewhat useless program, but it demonstrates booting and calling functions. The function address does not show up on the stack: when you do a call (bl), r14 gets the return address and r15 gets the address to branch to. If you have nested functions, like centry calling fun, then you need to preserve the return address however you choose, typically by saving it on the stack. (main() is not an important name for the C entry point; you can call your entry point whatever you want so long as your bootstrap matches.) r4 is being pushed just to keep the stack aligned on a 64-bit boundary per the ABI.
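The bl behaviour can be checked by hand against the listing. The following is a minimal host-side sketch, assuming the standard two-halfword Thumb bl encoding (T1); the helper names thumb_bl_target and thumb_bl_return are made up for this illustration and are not part of any toolchain.

```c
#include <stdint.h>

/* Hypothetical helper: branch target (what r15 gets) of a Thumb BL
   located at address addr, given its two halfwords hw1 and hw2. */
uint32_t thumb_bl_target(uint32_t addr, uint16_t hw1, uint16_t hw2)
{
    uint32_t s     = (hw1 >> 10) & 1;   /* sign bit */
    uint32_t imm10 = hw1 & 0x3FF;
    uint32_t j1    = (hw2 >> 13) & 1;
    uint32_t j2    = (hw2 >> 11) & 1;
    uint32_t imm11 = hw2 & 0x7FF;
    uint32_t i1    = !(j1 ^ s);
    uint32_t i2    = !(j2 ^ s);
    uint32_t imm   = (s << 24) | (i1 << 23) | (i2 << 22) |
                     (imm10 << 12) | (imm11 << 1);

    if (s)
        imm |= 0xFE000000u;             /* sign-extend from bit 24 */

    return addr + 4 + imm;              /* pc reads as instruction address + 4 */
}

/* Hypothetical helper: value placed in r14 by that BL, which is the
   address of the next instruction with the lsbit set for thumb. */
uint32_t thumb_bl_return(uint32_t addr)
{
    return (addr + 4) | 1;
}
```

With the values from the listing, thumb_bl_target(0x08, 0xF000, 0xF802) gives 0x10 (centry), and thumb_bl_return(0x08) gives 0x0D, which is the value the simulation shows being written to 0x20000FFC by the push in centry.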
For your system you would normally set the linker script for 0x08000000 (stm32).
What we are missing from you is the beginning of your binary. Can you do a hexdump of the memory image/binary showing the handful of bytes before main, including the first few instructions of main?
If a bare metal program doesn't do the simplest boot steps right, the first thing to do is examine the binary where the entry point or vector table is (depending on the architecture) and see that you built it right.
In my example this is a cortex-m, so the stack pointer initialization value (if you choose to use it) is at 0x00000000; you can put anything there and then simply write over the sp if you want, your choice. Address 0x00000004 is then the reset vector, which is the address of the code that handles reset, with the lsbit set to indicate thumb mode.
So 0x00000008|1 = 0x00000009.
If you don't have
0x2000xxxx
0x00000011
then your processor is not going to boot right. I am so much in the habit of using 0x08000000 that I don't remember if 0x00000000 works for an stm; in theory it should, but it depends on how you are loading the flash and what mode/state the chip is in at that time.
You might need to link for 0x08000000 and, at a minimum, if nothing else changed, have
0x2000xxxx
0x08000011
as the first two words in your binary/memory image.
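The check on the first two words can be written down as a small host-side test. This is only a sketch under stated assumptions: the function name vectors_look_sane is made up, SRAM is assumed to start at 0x20000000 (as in the 0x2000xxxx values above), and the flash and SRAM sizes are parameters you would have to fill in for your part.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the first two words of an image look
   like the start of a Cortex-M vector table, 0 otherwise.
   code_base/code_size describe where the image is linked (assumed values
   supplied by the caller); sram_size is the amount of SRAM on the part. */
int vectors_look_sane(uint32_t initial_sp, uint32_t reset_vector,
                      uint32_t code_base, uint32_t code_size,
                      uint32_t sram_size)
{
    const uint32_t sram_base = 0x20000000u;   /* assumption: SRAM base */
    uint32_t handler;

    /* Word 0: initial stack pointer should point into (or just past) SRAM. */
    if (initial_sp <= sram_base || initial_sp > sram_base + sram_size)
        return 0;

    /* Word 1: reset vector must have the lsbit set (thumb). */
    if ((reset_vector & 1u) == 0)
        return 0;

    /* The handler itself must live inside the image, after the two words. */
    handler = reset_vector & ~1u;
    if (handler < code_base + 8 || handler >= code_base + code_size)
        return 0;

    return 1;
}
```

For the example above, the words 0x20001000 and 0x00000009 pass with code_base 0 and code_size 0x1000, while an even reset vector such as 0x00000008 is rejected.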
EDIT
note you can make a single binary that can be entered both with a vector or a bootloader
.thumb
.thumb_func
.global _start
_start:
bl reset
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
placing a branch (well bl to fill the space) in the stack address spot then loading the stack pointer later.
Or use a branch
.thumb
.thumb_func
.global _start
_start:
b reset
nop
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000

Your application is missing an interrupt table. As a result, the processor is reading instructions as interrupt vectors, and faulting repeatedly because those instructions cannot be interpreted as valid addresses.
Use the support files from the STM32L1xx standard peripheral library to generate an appropriate linker script and interrupt table.

Related

How to implement SVC handler on ARM926EJ-S?

I'm writing an amateur operating system for ARM-based devices and am currently trying to make it work in QEMU's versatilepb (ARM926EJ-S).
The problem arises when I try to implement syscalls to my kernel. The idea is pretty simple: implement system calls via the SVC (SWI) instruction. Applications run in user mode, and to call a kernel function they execute an SVC <code> instruction, so the ARM processor switches to supervisor mode and calls the appropriate SVC handler.
But the problem is that when I call __asm__("SVC #0x08");, the device just resets and calls RESET_HANDLER, so it looks like the emulator just reboots.
I have already spent a few hours trying to figure out what the problem is, but I still have no idea.
Here is the code of ivt.s (the initial code with handlers):
.global __RESET
__RESET:
B RESET_HANDLER /* Reset */
B . /* Undefined */
B SWI_HANDLER /* SWI */
B . /* Prefetch Abort */
B . /* Data Abort */
B . /* reserved */
B . /* IRQ */
B . /* FIQ */
RESET_HANDLER:
MSR CPSR_c, 0x13 /* Supervisor mode */
LDR SP, =stack_top
MSR CPSR_c, 0x10 /* User mode */
LDR SP, =usr_stack_top
BL usermode_function
B .
SWI_HANDLER:
PUSH {LR}
BL syscall
POP {LR}
MOVS PC, LR
This is how I make the syscall:
void usermode_function() {
__asm__("SVC #0x00"); // Make syscall
}
And syscall implementation:
void syscall() {
// NEVER CALLED
__asm__("PUSH {r0-r7}");
__asm__("POP {r0-r7}");
}
But the code under SWI_HANDLER is never even invoked.
I don't really know how to ask the question, since it looks like I'm missing some very basic information.
So what could be the problem? What information should I provide so that you can help me?
Here is also the linker script:
ENTRY(__RESET)
SECTIONS
{
. = 0x10000;
.ivt . : { ivt.o(.text) }
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss COMMON) }
. = ALIGN(8);
. = . + 0x1000; /* 4KB of stack memory */
stack_top = .;
. = . + 0x100;
usr_stack_top = .;
}
Many thanks to @Jester and @old_timer, the problem is solved.
The problem was not with code, but with linker script. I have put my vector table at 0x10000, as you can see in the linker script, but it should be placed at 0x0. So SVC was not handled properly because the handler was placed in a wrong place.
When I changed the base address in my ld script and tried to load the firmware as ELF, everything starts to work perfectly.
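The mismatch can be made concrete with a little arithmetic. This is a minimal sketch, assuming the classic ARM exception table layout of eight 4-byte entries (the same order as the branches in ivt.s above) and the default low-vector configuration where the core fetches from address 0; the enum and function names are made up for illustration.

```c
#include <stdint.h>

/* Entry order of the classic ARM exception table, one word per entry. */
enum arm_exception {
    EXC_RESET = 0,
    EXC_UNDEFINED,
    EXC_SWI,             /* SVC/SWI */
    EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT,
    EXC_RESERVED,
    EXC_IRQ,
    EXC_FIQ
};

/* Hypothetical helper: address of the table entry for exception e when
   the table starts at table_base. */
uint32_t exception_entry(uint32_t table_base, enum arm_exception e)
{
    return table_base + 4u * (uint32_t)e;
}
```

With the table linked at 0x10000, exception_entry(0x10000, EXC_SWI) is 0x10008, but on an SVC the core goes to exception_entry(0, EXC_SWI), which is 0x00000008, so the B SWI_HANDLER entry is never reached.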
You solved it one way but I'll still write my answer.
Very bare bare metal example...
strap.s
.globl _start
_start:
b reset
b hang
b swi_handler
b hang
reset:
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstrings ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x20);
}
static void hexstring ( unsigned int d )
{
hexstrings(d);
uart_send(0x0D);
uart_send(0x0A);
}
int notmain ( void )
{
unsigned int ra;
hexstring(0x12345678);
hexstring(GETPC());
for(ra=0;ra<0x20;ra+=4)
{
hexstrings(ra);
hexstring(GET32(ra));
}
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
Build
arm-linux-gnueabi-as --warn --fatal-warnings -march=armv5t strap.s -o strap.o
arm-linux-gnueabi-gcc -c -Wall -O2 -nostdlib -nostartfiles -ffreestanding -march=armv5t notmain.c -o notmain.o
arm-linux-gnueabi-ld strap.o notmain.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list
arm-linux-gnueabi-objcopy notmain.elf -O binary notmain.bin
Execute
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.bin
Output
12345678
0001003C
00000000 E3A00000
00000004 E59F1004
00000008 E59F2004
0000000C E59FF004
00000010 00000183
00000014 00000100
00000018 00010000
0000001C 00000000
Examine, assemble disassemble
.word 0xE3A00000
.word 0xE59F1004
.word 0xE59F2004
.word 0xE59FF004
.word 0x00000183
.word 0x00000100
.word 0x00010000
.word 0x00000000
0: e3a00000 mov r0, #0
4: e59f1004 ldr r1, [pc, #4] ; 10 <.text+0x10>
8: e59f2004 ldr r2, [pc, #4] ; 14 <.text+0x14>
c: e59ff004 ldr pc, [pc, #4] ; 18 <.text+0x18>
10: 00000183 andeq r0, r0, r3, lsl #3
14: 00000100 andeq r0, r0, r0, lsl #2
18: 00010000 andeq r0, r1, r0
1c: 00000000 andeq r0, r0, r0
So you can see that they are basically launching a Linux kernel; the ATAGS/dtb is in ram at 0x100 perhaps, and they jump to 0x10000. The 0001003C is the pc shown by the program: loaded with that command line, the -O binary version was loaded at 0x10000 and executed there. If you were to have an swi event, you would start executing at the ldr r2 instruction (address 0x08) and land on the reset handler in your code.
(Note incidentally that qemu doesn't properly model uarts, at least so far as I have found so you don't have to initialize them you don't have to wait for the tx buffer to be empty you just jam bytes into the tx buffer and they come out).
If you run the elf without changing the linker script
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.elf
12345678
0001003C
00000000 00000000
00000004 00000000
00000008 00000000
0000000C 00000000
00000010 00000000
00000014 00000000
00000018 00000000
0000001C 00000000
Interesting: it loads and runs at 0x10000, which is what it was linked for, but doesn't bother to set up for coming out of reset at 0x00000000, and/or this is that linker issue that makes for bad elf files and it padded with zeros, which disassemble as
1c: 00000000 andeq r0, r0, r0
So it could have executed from 0x00000000 to 0x10000 and run into our code.
If we change the linker script
ram : ORIGIN = 0x00000000, LENGTH = 32K
Run the elf not the bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.elf
12345678
0000003C
00000000 EA000002
00000004 EA000006
00000008 EA000006
0000000C EA000004
00000010 E321F013
00000014 E3A0D801
00000018 E321F010
0000001C E3A0DA09
as expected.
Now for the swi.
strap.s
.globl _start
_start:
b reset
b hang
b swi_handler
b hang
reset:
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
bl handler
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl do_swi
do_swi:
svc #0x08
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
void do_swi ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstring ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x0D);
uart_send(0x0A);
}
void handler ( void )
{
hexstring(0x11223344);
}
int notmain ( void )
{
hexstring(0x12345678);
do_swi();
hexstring(0x12345678);
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
Run the elf, output is
12345678
11223344
12345678
as desired. But you could have also done this
strap.s
.globl _start
_start:
ldr pc,reset_addr
ldr pc,hang_addr
ldr pc,swi_handler_addr
ldr pc,hang_addr
reset_addr: .word reset
hang_addr: .word hang
swi_handler_addr: .word swi_handler
reset:
mov r0,#0x10000
mov r1,#0x00000
ldmia r0!,{r2,r3,r4,r5}
stmia r1!,{r2,r3,r4,r5}
ldmia r0!,{r2,r3,r4,r5}
stmia r1!,{r2,r3,r4,r5}
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
bl handler
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl do_swi
do_swi:
svc #0x08
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
void do_swi ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstring ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x0D);
uart_send(0x0A);
}
void handler ( void )
{
hexstring(0x11223344);
}
int notmain ( void )
{
unsigned int ra;
hexstring(0x12345678);
for(ra=0x10000;ra<0x10020;ra+=4) hexstring(GET32(ra));
for(ra=0x00000;ra<0x00020;ra+=4) hexstring(GET32(ra));
do_swi();
hexstring(0x12345678);
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
And now both the elf and the binary image versions work. I let the toolchain do the work for me:
00010010 <reset_addr>:
10010: 0001001c
00010014 <hang_addr>:
10014: 00010048
00010018 <swi_handler_addr>:
10018: 0001004c
The ldr pc, is position independent. I copy the four entries plus the four (well three) addresses so that 0x00000 matches 0x10000 and now the exception table (it is not a vector table btw) works.
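To see why the copied table has to use ldr pc rather than b, it helps to decode both kinds of entry. The following is a minimal host-side sketch, assuming the standard ARM-state encodings of b (24-bit signed word offset) and ldr pc,[pc,#imm12]; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: target of an ARM-state B at address addr.
   The offset is relative to addr + 8, so the target moves with the code. */
uint32_t arm_b_target(uint32_t addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    uint32_t off   = imm24 << 2;

    if (imm24 & 0x00800000u)
        off |= 0xFC000000u;     /* sign-extend the 26-bit byte offset */

    return addr + 8 + off;
}

/* Hypothetical helper: address of the literal word read by an ARM-state
   ldr rX,[pc,#imm12] at address addr. The word at that address holds an
   absolute address, so copying entry plus word keeps the same target. */
uint32_t arm_ldr_literal_addr(uint32_t addr, uint32_t insn)
{
    uint32_t imm12 = insn & 0xFFFu;
    int up = (insn >> 23) & 1;  /* U bit: add or subtract the offset */

    return up ? addr + 8 + imm12 : addr + 8 - imm12;
}
```

For the b table shown earlier, arm_b_target(0x0, 0xEA000002) is 0x10, but the same word placed at 0x10000 gives 0x10010, so a copied b entry would branch relative to its new home. For the qemu stub, arm_ldr_literal_addr(0xC, 0xE59FF004) is 0x18, and the dump shows that word holds 0x00010000.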
With newer arm processors you could instead set VTOR to 0x10000 and it would use the one built into the binary, no copying necessary. Or as you solved just build and run your program from 0x00000 and there you go. I wanted to show the alternatives as well as how to figure out (by cheating, you have to love uarts in qemu) what qemu is doing and where it is loading without having to use a debugger.

How to build a binary into an image at a fixed address 0x80000?

In our C project, we need to build a binary firmware into an image at a fixed file offset 0x80000.
Then, when the image is loaded to memory, we can load the firmware from offset 0x80000 to a specified address.
Meanwhile, as the firmware is placed at file offset 0x80000, we can upgrade the firmware independently.
So I'm trying to use a GNU linker script to implement that.
What I do now is use incbin to include my binary file in an asm file.
And in linker script, my code is:
.fw_image_start : {
*(.__fw_image_start)
}
.fw_image : {
KEEP(*(.fw_image))
}
.fw_image_end : {
*(.__fw_image_end)
}
Then I can use fw_image_start to load firmware in image code.
But I still can't find a way to put the firmware binary to file offset 0x80000 in the final image.
Could you help me on this?
Thank you in advance!
What did you find when you looked at the documentation and examples from GNU? Some of it is admittedly confusing or misleading, but some is pretty easy. This should give a hint of at least one way to do it (there are multiple ways to solve your problem).
novectors.s
.global _start
_start:
bl notmain
b .
.globl bounce
bounce:
bx lr
.section .hello_world
.word 1,2,3,4
notmain.c
void bounce ( unsigned int );
unsigned int mybss[8];
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) bounce(ra);
return(0);
}
memmap.ld
MEMORY
{
ram : ORIGIN = 0x80000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.hello_world : { *(.hello_world) } > ram
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings novectors.s -o novectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c notmain.c -o notmain.o
arm-none-eabi-ld -o notmain.elf -T memmap.ld novectors.o notmain.o
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf notmain.bin -O binary
examine results
Disassembly of section .text:
00080000 <_start>:
80000: eb000001 bl 8000c <notmain>
80004: eafffffe b 80004 <_start+0x4>
00080008 <bounce>:
80008: e12fff1e bx lr
0008000c <notmain>:
8000c: e92d4010 push {r4, lr}
80010: e3a04000 mov r4, #0
80014: e1a00004 mov r0, r4
80018: e2844001 add r4, r4, #1
8001c: ebfffff9 bl 80008 <bounce>
80020: e3540ffa cmp r4, #1000 ; 0x3e8
80024: 1afffffa bne 80014 <notmain+0x8>
80028: e3a00000 mov r0, #0
8002c: e8bd4010 pop {r4, lr}
80030: e12fff1e bx lr
Disassembly of section .hello_world:
00080034 <.hello_world>:
80034: 00000001 andeq r0, r0, r1
80038: 00000002 andeq r0, r0, r2
8003c: 00000003 andeq r0, r0, r3
80040: 00000004 andeq r0, r0, r4
Disassembly of section .bss:
00080034 <mybss>:
...
Naturally:
MEMORY
{
bob : ORIGIN = 0x80000, LENGTH = 0x1000
ted : ORIGIN = 0xB0000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > bob
.rodata : { *(.rodata*) } > bob
.hello_world : { *(.hello_world) } > ted
.bss : { *(.bss*) } > ted
}
gives
Disassembly of section .text:
00080000 <_start>:
80000: eb000001 bl 8000c <notmain>
80004: eafffffe b 80004 <_start+0x4>
00080008 <bounce>:
80008: e12fff1e bx lr
0008000c <notmain>:
8000c: e92d4010 push {r4, lr}
80010: e3a04000 mov r4, #0
80014: e1a00004 mov r0, r4
80018: e2844001 add r4, r4, #1
8001c: ebfffff9 bl 80008 <bounce>
80020: e3540ffa cmp r4, #1000 ; 0x3e8
80024: 1afffffa bne 80014 <notmain+0x8>
80028: e3a00000 mov r0, #0
8002c: e8bd4010 pop {r4, lr}
80030: e12fff1e bx lr
Disassembly of section .hello_world:
000b0000 <.hello_world>:
b0000: 00000001 andeq r0, r0, r1
b0004: 00000002 andeq r0, r0, r2
b0008: 00000003 andeq r0, r0, r3
b000c: 00000004 andeq r0, r0, r4
Disassembly of section .bss:
000b0000 <mybss>:
...
And as demonstrated, the names ram, rom, etc. are not special; you can call them other things like bob, ted, alice... I assume there are some reserved words you can't use.
Again, there are numerous solutions; see the GNU documentation. I like this method as it reads better for me, but you will see solutions that skip the MEMORY part.
(No, this wasn't intended to be completely correct code, but it demonstrates the assembly language bootstrap, the C code and the linker script.)

Keil stm32, using assembly, scatter file and c. How to export c code entry point to assembly?

In order to combine .c and assembly, I want to pass the start address of my .c code to the assembly and program the microcontroller so that it knows its program starts at that address. As I am writing my startup file in assembly, I need to pass the .c code starting address to assembly, and then write this address to the specific memory region of the microcontroller (so the microcontroller can start execution at this address after RESET).
Trying to create a project for stm32f103 in Keil with this structure:
Some .c file, for example main.c (for the main part of the program).
Startup file in assembly language, which gets the address of the entry function written in some .c file, to be passed to Reset_Handler
Scatter file, written in this way:
LR_IROM1 0x08000000 0x00010000 { ; load region size_region
ER_IROM1 0x08000000 0x00010000 { ; load address = execution address
*.o (RESET, +First) ; RESET is code section with I.V.T.
* (InRoot$$Sections)
.ANY (+RO)
.ANY (+XO)
}
RW_IRAM1 0x20000000 0x00005000 { ; RW data
.ANY (+RW +ZI)
}
}
The problem is passing the entry point to the .c function. Reset_Handler, which needs the .c entry point (starting address) passed by __main, looks like this:
Reset_Handler PROC
EXPORT Reset_Handler [WEAK]
IMPORT __main
LDR R0, =__main
BX R0
ENDP
About the entry point __main, an answer to a related assembly question says:
__main() is the compiler supplied entry point for your C code. It is not the main() function you write, but performs initialisation for the standard library, static data, the heap before calling your main() function.
So, how to get this entry point in my assembly file?
Edit>> If somebody is interested in solution for KEIL, here it is, its all that simple!
Simple assembly startup.s file:
AREA STACK, NOINIT, READWRITE
SPACE 0x400
Stack_top
AREA RESET, DATA, READONLY
dcd Stack_top
dcd Reset_Handler
EXPORT _InitMC
IMPORT notmain
AREA PROGRAM, CODE, READONLY
Reset_Handler PROC
bl notmain
ENDP
_InitMC PROC ;start of the assembly procedure
Loop
b Loop ;infinite loop
ENDP
END
Simple c file:
extern int _InitMC();
int notmain(void) {
_InitMC();
return 0;
}
Linker is the same as the one mentioned above.
Project build was successful.
Using the gnu toolchain for example:
Bootstrap:
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word loop
.word loop
.word loop
.thumb_func
reset:
bl notmain
b loop
.thumb_func
loop: b .
.align
.thumb_func
.globl fun
fun:
bx lr
.end
C entry point (function name is not relevant, sometimes using main() adds garbage, depends on the compiler/toolchain)
void fun ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) fun(ra);
return(0);
}
Linker script
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Build
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m0 -march=armv6-m -c so.c -o so.thumb.o
arm-none-eabi-ld -o so.thumb.elf -T flash.ld flash.o so.thumb.o
arm-none-eabi-objdump -D so.thumb.elf > so.thumb.list
arm-none-eabi-objcopy so.thumb.elf so.thumb.bin -O binary
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c so.c -o so.thumb2.o
arm-none-eabi-ld -o so.thumb2.elf -T flash.ld flash.o so.thumb2.o
arm-none-eabi-objdump -D so.thumb2.elf > so.thumb2.list
arm-none-eabi-objcopy so.thumb2.elf so.thumb2.bin -O binary
Result (all thumb versions)
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000
8000004: 08000015
8000008: 0800001b
800000c: 0800001b
8000010: 0800001b
08000014 <reset>:
8000014: f000 f804 bl 8000020 <notmain>
8000018: e7ff b.n 800001a <loop>
0800001a <loop>:
800001a: e7fe b.n 800001a <loop>
0800001c <fun>:
800001c: 4770 bx lr
800001e: 46c0 nop ; (mov r8, r8)
08000020 <notmain>:
8000020: b570 push {r4, r5, r6, lr}
8000022: 25fa movs r5, #250 ; 0xfa
8000024: 2400 movs r4, #0
8000026: 00ad lsls r5, r5, #2
8000028: 0020 movs r0, r4
800002a: 3401 adds r4, #1
800002c: f7ff fff6 bl 800001c <fun>
8000030: 42ac cmp r4, r5
8000032: d1f9 bne.n 8000028 <notmain+0x8>
8000034: 2000 movs r0, #0
8000036: bd70 pop {r4, r5, r6, pc}
Of course this has to be placed in flash at the right place with some tool.
The vector table is mapped by logic to 0x00000000 in the stm32 family.
08000000 <_start>:
8000000: 20001000
8000004: 08000015 <---- reset ORR 1
And in this minimal code the reset handler calls the C code, and the C code messes around and returns. Technically this is a fully functional program for most stm32s (change the stack init to a smaller value for those with less ram, say 0x20000400), and it should work anywhere by using -mthumb by itself (armv4t) or adding the cortex-m0. Well, okay, not the armv8-ms: they can technically not support all of armv6-m, but the one in the field I know about does.
I don't have Keil so I don't know how to translate to that, but it shouldn't be much of a stretch, just syntax.

How do I convert a binary firmware dump to an .elf for assembly language debugging?

I have a binary firmware image for ARM Cortex M that I know should be loaded at 0x20000000. I would like to convert it to a format that I can use for assembly level debugging with gdb, which I assume means converting to an .elf. But I have not been able to figure out how to add enough metadata to the .elf for this to happen. Here is what I've tried so far.
arm-none-eabi-objcopy -I binary -O elf32-littlearm --set-section-flags \
.data=alloc,contents,load,readonly \
--change-section-address .data=0x20000000 efr32.bin efr32.elf
efr32.elf: file format elf32-little
efr32.elf
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x00000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00000168 20000000 20000000 00000034 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
SYMBOL TABLE:
20000000 l d .data 00000000 .data
20000000 g .data 00000000 _binary_efr32_bin_start
20000168 g .data 00000000 _binary_efr32_bin_end
00000168 g *ABS* 00000000 _binary_efr32_bin_size
Do I need to start by converting the binary to .o and write a simple linker script? Should I add an architecture option to the objcopy command?
A little experiment...
58: 480a ldr r0, [pc, #40] ; (84 <spi_write_byte+0x38>)
5a: bf08 it eq
5c: 4809 ldreq r0, [pc, #36] ; (84 <spi_write_byte+0x38>)
5e: f04f 01ff mov.w r1, #255 ; 0xff
You don't have that, of course, but you can read the binary and do this with it:
.thumb
.globl _start
_start:
.inst.n 0x480a
.inst.n 0xbf08
.inst.n 0x4809
.inst.n 0xf04f
.inst.n 0x01ff
then see what happens.
arm-none-eabi-as test.s -o test.o
arm-none-eabi-ld -Ttext=0x58 test.o -o test.elf
arm-none-eabi-objdump -D test.elf
test.elf: file format elf32-littlearm
Disassembly of section .text:
00000058 <_start>:
58: 480a ldr r0, [pc, #40] ; (84 <_start+0x2c>)
5a: bf08 it eq
5c: 4809 ldreq r0, [pc, #36] ; (84 <_start+0x2c>)
5e: f04f 01ff mov.w r1, #255 ; 0xff
But the reality is it won't work: if this binary has any thumb2 extensions it isn't going to work, because you can't disassemble variable length instructions linearly. You have to deal with them in execution order. So to do this correctly you have to write a disassembler that walks through the code in execution order, determining the instructions you can figure out and marking them as instructions...
80: d1e8 bne.n 54 <spi_write_byte+0x8>
82: bd70 pop {r4, r5, r6, pc}
84: 40005200
88: F7FF4000
8c: e92d 41f0 stmdb sp!, {r4, r5, r6, r7, r8, lr}
90: 4887 ldr r0, [pc, #540] ; (2b0 <notmain+0x224>)
.thumb
.globl _start
_start:
.inst.n 0xd1e8
.inst.n 0xbd70
.inst.n 0x5200
.inst.n 0x4000
.inst.n 0x4000
.inst.n 0xF7FF
.inst.n 0xe92d
.inst.n 0x41f0
.inst.n 0x4887
80: d1e8 bne.n 54 <_start-0x2c>
82: bd70 pop {r4, r5, r6, pc}
84: 5200 strh r0, [r0, r0]
86: 4000 ands r0, r0
88: 4000 ands r0, r0
8a: f7ff e92d ; <UNDEFINED> instruction: 0xf7ffe92d
8e: 41f0 rors r0, r6
90: 4887 ldr r0, [pc, #540] ; (2b0 <_start+0x230>)
it will recover, and break and recover, etc...
Instead you have to write a disassembler that walks through the code (it doesn't necessarily have to disassemble to assembly language, but enough to walk the code and recurse down all possible branch paths). Mark all data not determined to be instructions as data:
.thumb
.globl _start
_start:
.inst.n 0xd1e8
.inst.n 0xbd70
.word 0x40005200
.word 0xF7FF4000
.inst.n 0xe92d
.inst.n 0x41f0
.inst.n 0x4887
00000080 <_start>:
80: d1e8 bne.n 54 <_start-0x2c>
82: bd70 pop {r4, r5, r6, pc}
84: 40005200 andmi r5, r0, r0, lsl #4
88: f7ff4000 ; <UNDEFINED> instruction: 0xf7ff4000
8c: e92d 41f0 stmdb sp!, {r4, r5, r6, r7, r8, lr}
90: 4887 ldr r0, [pc, #540] ; (2b0 <_start+0x230>)
and our stmdb instruction is now correct.
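The reason the linear pass lost the stmdb can be reproduced with a few lines of host-side C. This is a minimal sketch, assuming the standard Thumb rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction; the function names are made up for illustration, and this is a naive linear sweep, not the execution-order walk described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if hw is the first halfword of a 32-bit Thumb instruction. */
int thumb_is_32bit_first_half(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Hypothetical helper: naive linear sweep over n halfwords. Writes the
   index of each halfword that gets treated as an instruction start into
   starts[] (which must have room for n entries) and returns the count. */
size_t thumb_linear_sweep(const uint16_t *hw, size_t n, size_t *starts)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        starts[count++] = i;
        /* A 32-bit first half swallows the following halfword. */
        i += thumb_is_32bit_first_half(hw[i]) ? 2 : 1;
    }
    return count;
}
```

Fed the nine halfwords from the .inst.n listing (0xd1e8, 0xbd70, 0x5200, 0x4000, 0x4000, 0xF7FF, 0xe92d, 0x41f0, 0x4887), the sweep treats the data halfword 0xF7FF as the start of a 32-bit instruction and skips 0xe92d, which is the real start of the stmdb. That matches the broken listing, where 8a is decoded as 0xf7ffe92d.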
good luck.

Third party C static library: Add -ffunction-sections -fdata-sections

I have a C static library (compiled with arm-gcc) that is provided by a third party. I have no way to recompile the library, or to have the third party recompile it.
When investigating the library contents, I found that the gcc options -ffunction-sections and -fdata-sections were not used when compiling the library. They would be very helpful for reducing the binary size of the project.
Compilation is done with: (GNU Tools for ARM Embedded Processors) 4.8.4 20140526 (release) [ARM/embedded-4_8-branch revision 211358].
Is there any way to put every data and every function into their own separate section to enable function-level-linking for this library, without needing to recompile code?
I thought of this possible approach:
Split library into its object files.
For each object file:
Write code to move the symbols into own sections
Put new object files back together into archive file
Could this work, or do you have other suggestions, which ideally only use the tools provided by arm-gcc?
I'm aware this is old, but I came across this problem as well, and figured I'd provide my findings.
TL;DR: It's possible, but incredibly difficult. You can't simply move symbols into their own sections. Relocations will bite you.
When the compiler generates machine code, it will generate slightly different instructions if the -ffunction-sections and -fdata-sections flags are, or are not, provided. This is due to assumptions the compiler is able to make about where symbols will be located. These assumptions change depending on the flags provided.
This is best illustrated by example. Take the following very simple code snippet:
int a, b;
int getAPlusB()
{
return a + b;
}
The following is the result of arm-none-eabi-objdump -xdr test.o:
arm-none-eabi-gcc -c -Os -mthumb -mcpu=cortex-m3 -mlittle-endian -o test.o test.c:
SYMBOL TABLE:
00000000 g F .text 0000000c getAPlusB
00000004 g O .bss 00000004 b
00000000 g O .bss 00000004 a
Disassembly of section .text:
00000024 <getAPlusB>:
24: 4b01 ldr r3, [pc, #4] ; (2c <getAPlusB+0x8>)
26: cb09 ldmia r3, {r0, r3}
28: 4418 add r0, r3
2a: 4770 bx lr
2c: 00000000 .word 0x00000000
2c: R_ARM_ABS32 .bss
arm-none-eabi-gcc -c -Os -ffunction-sections -fdata-sections \
-mthumb -mcpu=cortex-m3 -mlittle-endian -o test.o test.c:
SYMBOL TABLE:
00000000 g F .text.getAPlusB 00000014 getAPlusB
00000000 g O .bss.b 00000004 b
00000000 g O .bss.a 00000004 a
Disassembly of section .text.getAPlusB:
00000000 <getAPlusB>:
0: 4b02 ldr r3, [pc, #8] ; (c <getAPlusB+0xc>)
2: 6818 ldr r0, [r3, #0]
4: 4b02 ldr r3, [pc, #8] ; (10 <getAPlusB+0x10>)
6: 681b ldr r3, [r3, #0]
8: 4418 add r0, r3
a: 4770 bx lr
...
c: R_ARM_ABS32 .bss.a
10: R_ARM_ABS32 .bss.b
The difference is subtle, but important. The flag enabled code performs two separate loads, while the disabled code performs a single "load multiple." The disabled code does this because it knows both symbols are contained in the same section, in a certain sequence. With the enabled code, this is not the case. The symbols are in two separate sections, and while it is likely they will keep their order and proximity, it is not guaranteed. What's more, if both sections are not referenced, the linker may decide one section is not used, and remove it.
Another example:
int a, b;
int getB()
{
return b;
}
And the generated code. First without the flags:
SYMBOL TABLE:
00000000 g F .text 0000000c getB
00000004 g O .bss 00000004 b
00000000 g O .bss 00000004 a
Disassembly of section .text:
00000018 <getB>:
18: 4b01 ldr r3, [pc, #4] ; (20 <getB+0x8>)
1a: 6858 ldr r0, [r3, #4]
1c: 4770 bx lr
1e: bf00 nop
20: 00000000 .word 0x00000000
20: R_ARM_ABS32 .bss
And with the flags:
SYMBOL TABLE:
00000000 g F .text.getB 00000014 getB
00000000 g O .bss.b 00000004 b
00000000 g O .bss.a 00000004 a
Disassembly of section .text.getB:
00000000 <getB>:
0: 4b01 ldr r3, [pc, #4] ; (8 <getB+0x8>)
2: 6818 ldr r0, [r3, #0]
4: 4770 bx lr
6: bf00 nop
8: 00000000 .word 0x00000000
8: R_ARM_ABS32 .bss.b
In this case, the difference is even more subtle. The enabled code loads with an offset of 0, while the disabled code uses 4. Since the disabled code references the beginning of the section, it needs to offset to the location of b. However the enabled code references the section which contains solely b, and therefore does not need an offset. If we were to split this and only change the relocation, the new code would contain a reference to the section a was in, but not b. This, again, could cause the linker to garbage collect the wrong section.
These were just two scenarios that I came across when looking at this problem, there may be more.
Producing valid object files functionally equivalent to code compiled with the -ffunction-sections and -fdata-sections flags would require parsing the machine instructions looking for these and any other relocation issues that could come up. This is not an easy task to accomplish.

Resources