I am in the situation to have a c static library (compiled with arm-gcc), which is provided by a third party. I have no possibility to (let the third party) re-compile the library.
When investigating the library contents, i found that the gcc options -ffunction-sections and -fdata-sections have not been used for compiling the library. But this would be very helpful for reducing the binary size of the project.
Compilation is done with: (GNU Tools for ARM Embedded Processors) 4.8.4 20140526 (release) [ARM/embedded-4_8-branch revision 211358].
Is there any way to put every data and every function into their own separate section to enable function-level-linking for this library, without needing to recompile code?
I thought of this possible approach:
Split library into its object files.
For each object file:
Write code to move the symbols into own sections
Put new object files back together into archive file
Could this work, or do you have other suggestions, which ideally only use the tools provided by arm-gcc?
I'm aware this is old, but I came across this problem as well, and figured I'd provide my findings.
TL;DR: It's possible, but incredibly difficult. You can't simply move symbols into their own sections. Relocations will bite you.
When the compiler generates machine code, it will generate slightly different instructions if the -ffunction-sections and -fdata-sections flags are, or are not, provided. This is due to assumptions the compiler is able to make about where symbols will be located. These assumptions change depending on the flags provided.
This is best illustrated by example. Take the following very simple code snippet:
int a, b;
int getAPlusB()
{
return a + b;
}
The following is the result of arm-none-eabi-objdump -xdr test.o:
arm-none-eabi-gcc -c -Os -mthumb -mcpu=cortexm3 -mlittle-endian -o test.o test.c:
SYMBOL TABLE:
00000000 g F .text 0000000c getAPlusB
00000004 g O .bss 00000004 b
00000000 g O .bss 00000004 a
Disassembly of section .text:
00000024 <getAPlusB>:
24: 4b01 ldr r3, [pc, #4] ; (2c <getAPlusB+0x8>)
26: cb09 ldmia r3, {r0, r3}
28: 4418 add r0, r3
2a: 4770 bx lr
2c: 00000000 .word 0x00000000
2c: R_ARM_ABS32 .bss
arm-none-eabi-gcc -c -Os -ffunction-sections -fdata-sections \
-mthumb -mcpu=cortexm3 -mlittle-endian -o test.o test.c:
SYMBOL TABLE:
00000000 g F .text.getAPlusB 00000014 getAPlusB
00000000 g O .bss.b 00000004 b
00000000 g O .bss.a 00000004 a
Disassembly of section .text.getAPlusB:
00000000 <getAPlusB>:
0: 4b02 ldr r3, [pc, #8] ; (c <getAPlusB+0xc>)
2: 6818 ldr r0, [r3, #0]
4: 4b02 ldr r3, [pc, #8] ; (10 <getAPlusB+0x10>)
6: 681b ldr r3, [r3, #0]
8: 4418 add r0, r3
a: 4770 bx lr
...
c: R_ARM_ABS32 .bss.a
10: R_ARM_ABS32 .bss.b
The difference is subtle, but important. The flag enabled code performs two separate loads, while the disabled code performs a single "load multiple." The enabled code does this because it knows both symbols are contained in the same section, in a certain sequence. With the enabled code, this is not the case. The symbols are in two separate sections, and while it is likely they will keep their order and proximity, it is not guaranteed. What's more, if both sections are not referenced, the linker may decide one section is not used, and remove it.
Another example:
int a, b;
int getB()
{
return b;
}
And the generated code. First without the flags:
SYMBOL TABLE:
00000000 g F .text 0000000c getB
00000004 g O .bss 00000004 b
00000000 g O .bss 00000004 a
Disassembly of section .text:
00000018 <getB>:
18: 4b01 ldr r3, [pc, #4] ; (20 <getB+0x8>)
1a: 6858 ldr r0, [r3, #4]
1c: 4770 bx lr
1e: bf00 nop
20: 00000000 .word 0x00000000
20: R_ARM_ABS32 .bss
And with the flags:
SYMBOL TABLE:
00000000 g F .text.getB 00000014 getB
00000000 g O .bss.b 00000004 b
00000000 g O .bss.a 00000004 a
Disassembly of section .text.getB:
00000000 <getB>:
0: 4b01 ldr r3, [pc, #4] ; (8 <getB+0x8>)
2: 6818 ldr r0, [r3, #0]
4: 4770 bx lr
6: bf00 nop
8: 00000000 .word 0x00000000
8: R_ARM_ABS32 .bss.b
In this case, the difference is even more subtle. The enabled code loads with an offset of 0, while the disabled code uses 4. Since the disabled code references the beginning of the section, it needs to offset to the location of b. However the enabled code references the section which contains solely b, and therefore does not need an offset. If we were to split this and only change the relocation, the new code would contain a reference to the section a was in, but not b. This, again, could cause the linker to garbage collect the wrong section.
These were just two scenarios that I came across when looking at this problem, there may be more.
Producing valid object files functionally equivalent to code compiled with the -ffunction-sections and -fdata-sections flags would require parsing the machine instructions looking for these and any other relocation issues that could come up. This is not an easy task to accomplish.
Related
I want to enable thumb mode in stm32f401re board. the code i had written for it is in embedded c. How do we enable thumb mode in embedded c language. Do we use -mthumb command for it, do we have to add any library prior using that command. Or is there any totally different method.
I searched and found the method only in assembly language. But i want it in embedded c. I used even the -mthumb command but it showed an error.
unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0x12345678));
}
$ arm-none-eabi-gcc -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <fun+0x14>
8: ebfffffe bl 0 <more_fun>
c: e8bd4010 pop {r4, lr}
10: e12fff1e bx lr
14: 12345678 .word 0x12345678
That is defaulting to arm, looks like armv4, so that should work on non-cortex-ms from armv4 to armv7 (couple of decades).
To get all thumb variants, which will work on your cortex-m4
$ arm-none-eabi-gcc -mthumb -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 4803 ldr r0, [pc, #12] ; (10 <fun+0x10>)
4: f7ff fffe bl 0 <more_fun>
8: bc10 pop {r4}
a: bc02 pop {r1}
c: 4708 bx r1
e: 46c0 nop ; (mov r8, r8)
10: 12345678 .word 0x12345678
add -mthumb, but you are using armv4t, it still works
a: bc02 pop {r1}
c: 4708 bx r1
Now you can move up to cortex-m0 which will work on all cortex-ms
$ arm-none-eabi-gcc -mcpu=cortex-m0 -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 4802 ldr r0, [pc, #8] ; (c <fun+0xc>)
4: f7ff fffe bl 0 <more_fun>
8: bd10 pop {r4, pc}
a: 46c0 nop ; (mov r8, r8)
c: 12345678 .word 0x12345678
the mthumb was not needed but we see it is not arv4t level it is newer
8: bd10 pop {r4, pc}
Note we did not need -mthumb, but always check just in case
And then you can go up to what you have if you wish
$ arm-none-eabi-gcc -mcpu=cortex-m4 -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: 4801 ldr r0, [pc, #4] ; (8 <fun+0x8>)
2: f7ff bffe b.w 0 <more_fun>
6: bf00 nop
8: 12345678 .word 0x12345678
okay that is a big disturbing, but I guess because of the additional thumb2 extensions that arm7-m has that armv6-m does not they chose this, they could have done the tail optimization with cortex-m0 or -mthumb as well.
I was hoping for this instead
unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0x00001234)+1);
}
Disassembly of section .text:
00000000 <fun>:
0: b508 push {r3, lr}
2: f241 2034 movw r0, #4660 ; 0x1234
6: f7ff fffe bl 0 <more_fun>
a: 3001 adds r0, #1
with the 16 bit immediates in two instructions but this one did an ldr, same number of bytes, slower, but whatever both work, I got it to generate one movw...
And then you link these objects together along with your bootstrap and then figure out how to get it on the flash in your mcu.
If all you wanted to know is how to make the compiler generate thumb instructions from C that is easy. If you have C code from some other mcu, then the instruction set is trivial and you may have a significant amount of work as a fair amount of the code has nothing to do with the instruction set but instead the chip which is likely completely incompatible with any other mcu that is not already cortex-m based (and even if cortex-m based if it is not same vendor same family you are still doing a re-write)
I want to call a C function, say:
int foo(int a, int b) {return 2;}
inside an assembly (ARM) code. I read that I need to mention
import foo
in my assembly code, for assembler to search for foo in C file. But, I am stuck at passing arguments a and b from assembly and retrieving an integer (here 2) again back in assembly. Could someone could explain me how to do this, with a mini example?
You have already written the minimal example.
int foo(int a, int b) {return 2;}
compile and disassemble
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
Anything to do with a and b are dead code so optimized out. While using C to learn asm is good/okay to get started you really want to do it with optimizations on which mean you have to work harder on crafting the experimental code.
int foo(int a, int b) {return 2;}
int bar ( void )
{
return(foo(5,4));
}
and we learn nothing new.
Disassembly of section .text:
00000000 <foo>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
00000008 <bar>:
8: e3a00002 mov r0, #2
c: e12fff1e bx lr
need to do this for the call:
int foo(int a, int b);
int bar ( void )
{
return(foo(5,4));
}
and now we see
00000000 <bar>:
0: e92d4010 push {r4, lr}
4: e3a01004 mov r1, #4
8: e3a00005 mov r0, #5
c: ebfffffe bl 0 <foo>
10: e8bd4010 pop {r4, lr}
14: e12fff1e bx lr
(yes this is built for the this compilers default target armv4t, should be obvious to some others have no clue how I/we know)(can also tell how new/old the compiler is from this example as well (there was an abi change years ago that is visible here)(the newer versions of gcc are worse than older so older is still good to use for some use cases))
per this compilers convention (now while this compiler does use the arm convention of some version of some document for some version of this compiler, always remember it is the compiler authors choice, they are under no obligation to conform to anyones written standard, they choose)
So we see that the first parameter goes in r0, the second in r1. You can craft functions with more operands or more types of operands to see what nuances there are. How many are in registers and when they start using the stack instead. For example try a 64 bit variable then a 32 in that order as operands then try it in reverse.
To see what is going on on the callee side.
int foo(int a, int b)
{
return((a<<1)+b+0x123);
}
We see that r0 and r1 are the first two operands, the compiler would be grossly broken otherwise.
00000000 <foo>:
0: e0810080 add r0, r1, r0, lsl #1
4: e2800e12 add r0, r0, #288 ; 0x120
8: e2800003 add r0, r0, #3
c: e12fff1e bx lr
What we did not see explicitly in the caller example is that r0 is where the return is stored (at least for this variable type).
The ABI documention is not an easy read, but if you first "just try it" then if you wish refer to the documentation it should help with the documentation. At the end of the day you have a compiler you are going to use, it has a convention and is probably part of a toolchain so you must conform to that compilers convention not some third party document (even if that third party is arm) AND you should probably use that toolchain's assembler which means you should use that assembly language (many incompatible assembly languages for arm, the tool defines the language not the target).
You can see how simple it is to figure this out on your own.
And...so this gets painful but you can look at the assembly output of the compiler, at least some will let you. With gcc you can use -save-temps or -S
int foo(int a, int b)
{
return 2;
}
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global foo
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type foo, %function
foo:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #2
bx lr
.size foo, .-foo
.ident "GCC: (15:9-2019-q4-0ubuntu1) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
Almost none of this do you "need".
The minimum looks like this
.globl foo
foo:
mov r0,#2
bx lr
.global or .globl are equivalent, somewhat reflects the age or how/when you learned gnu assembler.
Now this will break if you are mixing arm and thumb instructions, this defaults to arm.
arm-none-eabi-as x.s -o x.o
arm-none-eabi-objdump -d x.o
x.o: file format elf32-littlearm
Disassembly of section .text:
00000000 :
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
If we want thumb then we have to tell it
.thumb
.globl foo
foo:
mov r0,#2
bx lr
and we get thumb.
00000000 <foo>:
0: 2002 movs r0, #2
2: 4770 bx lr
With ARM and with the gnu toolchain at least you can mix arm and thumb and the linker will take care of the transition
int foo ( int, int );
int fun ( void )
{
return(foo(1,2));
}
we do not need a bootstrap nor other things to get the linker to link so we can see how that part of it works.
arm-none-eabi-ld so.o x.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <fun>:
8000: e92d4010 push {r4, lr}
8004: e3a01002 mov r1, #2
8008: e3a00001 mov r0, #1
800c: eb000001 bl 8018 <foo>
8010: e8bd4010 pop {r4, lr}
8014: e12fff1e bx lr
00008018 <foo>:
8018: 2002 movs r0, #2
801a: 4770 bx lr
Now this is broken not just because we have no bootstrap, etc, but there is a bl to foo but foo is thumb and the caller is arm. So for gnu assembler for arm you can take this shortcut which I think I learned from an older gcc, but whatever
.thumb
.thumb_func
.globl foo
foo:
mov r0,#2
bx lr
.thumb_func says the next label you find is considered a function label not just an address.
00008000 <fun>:
8000: e92d4010 push {r4, lr}
8004: e3a01002 mov r1, #2
8008: e3a00001 mov r0, #1
800c: eb000003 bl 8020 <__foo_from_arm>
8010: e8bd4010 pop {r4, lr}
8014: e12fff1e bx lr
00008018 <foo>:
8018: 2002 movs r0, #2
801a: 4770 bx lr
801c: 0000 movs r0, r0
...
00008020 <__foo_from_arm>:
8020: e59fc000 ldr ip, [pc] ; 8028 <__foo_from_arm+0x8>
8024: e12fff1c bx ip
8028: 00008019 .word 0x00008019
802c: 00000000 .word 0x00000000
The linker adds a trampoline as I call it, I think others call it a vaneer. Either way the toolchain took care of is so long as we write the code right.
Remember and in particular this syntax for the assembler is very much assembler specific other assemblers may have other syntax to make this work. From the gcc generated code we see the generic solution which is more typing but probably a better habit.
.thumb
.type foo, %function
.global foo
foo:
mov r0,#2
bx lr
the .type foo, %function works for both arm and thumb in gnu assembler for arm. And it does not have to be positioned just before the labe (just like .globl or .global does not either. We get the same result from the toolchain with this assembly language.
Just for demonstration...
arm-none-eabi-as x.s -o x.o
arm-none-eabi-gcc -O2 -mthumb -c so.c -o so.o
arm-none-eabi-ld so.o x.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <fun>:
8000: b510 push {r4, lr}
8002: 2102 movs r1, #2
8004: 2001 movs r0, #1
8006: f000 f807 bl 8018 <__foo_from_thumb>
800a: bc10 pop {r4}
800c: bc02 pop {r1}
800e: 4708 bx r1
00008010 <foo>:
8010: e3a00002 mov r0, #2
8014: e12fff1e bx lr
00008018 <__foo_from_thumb>:
8018: 4778 bx pc
801a: e7fd b.n 8018 <__foo_from_thumb>
801c: eafffffb b 8010 <foo>
And you can see it works both ways thumb to arm arm to thumb if we write the asm write it does the rest of the work for us.
Now I personally hate the unified syntax, it is one of the major mistakes arm has made along with CMSIS. But, you want to do this for a living you find that you pretty much hate most corporate decisions and worse, have to work/operate with them. Often the time unified syntax generates the wrong instruction and have to fiddle with the syntax to get it to work, but if I have to get a specific instruction then I have to fiddle about to get it to generate the specific instruction I am after. Other than a bootstrap and some other exceptions you do not often write assembly language anyway, usually compile something then take the compiler generated code and tune it or replace it.
I started with the arm gnu tools before unified syntax so I am used to
.thumb
.globl hello
hello:
sub r0,#1
bne hello
instead of
.thumb
.globl hello
hello:
subs r0,#1
bne hello
And fine with bouncing between the two syntaxes (unified and not, yes two assembly languages within one tool).
All of the above is with the 32 bit arm, if you are interested in 64 bit arm, AND using gnu tools, then a percentage of this still applies, you just need to use the aarch64 tools not the arm tools from gnu. ARM's aarch64 is a completely different, and incompatible, instruction set from aarch32. But gnu syntax like .global and .type...function are often used across all gnu supported targets. There are exceptions for some directives, but if you take the same approach of having the tools themselves tell you how they work...by using them...You can figure this out.
so.elf: file format elf64-littleaarch64
Disassembly of section .text:
0000000000400000 <fun>:
400000: 52800041 mov w1, #0x2 // #2
400004: 52800020 mov w0, #0x1 // #1
400008: 14000001 b 40000c <foo>
000000000040000c <foo>:
40000c: 52800040 mov w0, #0x2 // #2
400010: d65f03c0 ret
What you need to do is place the arguments in the correct registers (or on the stack) as required. All the details on how to do this are what is known as the calling convention and forms a very important part of the Application Binary Interface(ABI).
Details on the ARM (Armv7) calling convention can be found at: https://developer.arm.com/documentation/den0013/d/Application-Binary-Interfaces/Procedure-Call-Standard
In order to combine .c and assembly, I want to pass start address of my .c code, and program microcontroller to know that its program starts at that address. As I am writing my startup file in assembly, I need to pass .c code starting address to assembly, and then to write this address to the specific memory region of microcontroller ( so the microcontroller can start execution on this address after RESET)
Trying to create a project for stm32f103 in Keil with this structure:
Some .c file, for example main.c (for the main part of the program).
Startup file in assembly language. Which gets the adress of entry to the function written in some .c file, to be passed to Reset_Handler
Scatter file, written in this way:
LR_IROM1 0x08000000 0x00010000 { ; load region size_region
ER_IROM1 0x08000000 0x00010000 { ; load address = execution address
*.o (RESET, +First) ; RESET is code section with I.V.T.
* (InRoot$$Sections)
.ANY (+RO)
.ANY (+XO)
}
RW_IRAM1 0x20000000 0x00005000 { ; RW data
.ANY (+RW +ZI)
}
}
The problem is passing the entry point to the .c function. Reset_Handler, which needs .c entry point(starting adress) passed by __main, looks like this:
Reset_Handler PROC
EXPORT Reset_Handler [WEAK]
IMPORT __main
LDR R0, =__main
BX R0
ENDP
bout entry point __main, as a answer for one assembly raleted question was written:
__main() is the compiler supplied entry point for your C code. It is not the main() function you write, but performs initialisation for the
standard library, static data, the heap before calling your `main()'
function.
So, how to get this entry point in my assembly file?
Edit>> If somebody is interested in solution for KEIL, here it is, its all that simple!
Simple assembly startup.s file:
AREA STACK, NOINIT, READWRITE
SPACE 0x400
Stack_top
AREA RESET, DATA, READONLY
dcd Stack_top
dcd Reset_Handler
EXPORT _InitMC
IMPORT notmain
AREA PROGRAM, CODE, READONLY
Reset_Handler PROC
bl notmain
ENDP
_InitMC PROC ;start of the assembly procedure
Loop
b Loop ;infinite loop
ENDP
END
Simple c file:
extern int _InitMC();
int notmain(void) {
_InitMC();
return 0;
}
Linker is the same as the one mentioned above.
Project build was successful.
Using the gnu toolchain for example:
Bootstrap:
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word loop
.word loop
.word loop
.thumb_func
reset:
bl notmain
b loop
.thumb_func
loop: b .
.align
.thumb_func
.globl fun
fun:
bx lr
.end
C entry point (function name is not relevant, sometimes using main() adds garbage, depends on the compiler/toolchain)
void fun ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) fun(ra);
return(0);
}
Linker script
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Build
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m0 -march=armv6-m -c so.c -o so.thumb.o
arm-none-eabi-ld -o so.thumb.elf -T flash.ld flash.o so.thumb.o
arm-none-eabi-objdump -D so.thumb.elf > so.thumb.list
arm-none-eabi-objcopy so.thumb.elf so.thumb.bin -O binary
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c so.c -o so.thumb2.o
arm-none-eabi-ld -o so.thumb2.elf -T flash.ld flash.o so.thumb2.o
arm-none-eabi-objdump -D so.thumb2.elf > so.thumb2.list
arm-none-eabi-objcopy so.thumb2.elf so.thumb2.bin -O binary
Result (all thumb versions)
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000
8000004: 08000015
8000008: 0800001b
800000c: 0800001b
8000010: 0800001b
08000014 <reset>:
8000014: f000 f804 bl 8000020 <notmain>
8000018: e7ff b.n 800001a <loop>
0800001a <loop>:
800001a: e7fe b.n 800001a <loop>
0800001c <fun>:
800001c: 4770 bx lr
800001e: 46c0 nop ; (mov r8, r8)
08000020 <notmain>:
8000020: b570 push {r4, r5, r6, lr}
8000022: 25fa movs r5, #250 ; 0xfa
8000024: 2400 movs r4, #0
8000026: 00ad lsls r5, r5, #2
8000028: 0020 movs r0, r4
800002a: 3401 adds r4, #1
800002c: f7ff fff6 bl 800001c <fun>
8000030: 42ac cmp r4, r5
8000032: d1f9 bne.n 8000028 <notmain+0x8>
8000034: 2000 movs r0, #0
8000036: bd70 pop {r4, r5, r6, pc}
Of course this has to be placed in flash at the right place with some tool.
The vector table is mapped by logic to 0x00000000 in the stm32 family.
08000000 <_start>:
8000000: 20001000
8000004: 08000015 <---- reset ORR 1
And in this minimal code the reset handler calls the C code the C code messes around and returns. Technically a fully functional program for most stm32s (change the stack init to a smaller value for those with less ram say 0x20000400 and it should work anywhere by using -mthumb by itself (armv4t) or adding the cortex-m0. well okay not the armv8ms they can technically not support all of armv6m but the one in the field I know about does.
I don't have Kiel so don't know how to translate to that, but it shouldn't be much of a stretch, just syntax.
I am executing C code for arm cortex-m3 for stm32l152C-discovery board but I observed that the function call from main is not getting pushed into the stack. I have analyzed the asm code of this source but I find it is OK. To understand better, please look the asm code generated for C code here:
main.elf: file format elf32-littlearm
*SYMBOL TABLE:
00000010 l d .text 00000000 .text
00000000 l d .debug_info 00000000 .debug_info
00000000 l d .debug_abbrev 00000000 .debug_abbrev
00000000 l d .debug_aranges 00000000 .debug_aranges
00000000 l d .debug_line 00000000 .debug_line
00000000 l d .debug_str 00000000 .debug_str
00000000 l d .comment 00000000 .comment
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 l d .debug_frame 00000000 .debug_frame
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000 clock.c
20004ffc g .text 00000000 _STACKTOP
**00000028 g F .text 000000e0 SystemClock_Config**
20000000 g .text 00000000 _DATA_BEGIN
20000000 g .text 00000000 _HEAP
**00000010 g F .text 00000016 main**
20000000 g .text 00000000 _BSS_END
00000108 g .text 00000000 _DATAI_BEGIN
20000000 g .text 00000000 _BSS_BEGIN
00000108 g .text 00000000 _DATAI_END
20000000 g .text 00000000 _DATA_END
Disassembly of section .text:
00000010 <main>:
#define LL_GPIO_MODE_OUTPUT 1
void SystemInit() ;
int main()
{
10: b580 push {r7, lr}
12: b082 sub sp, #8
14: af00 add r7, sp, #0
int i = 0;
16: 2300 movs r3, #0
18: 607b str r3, [r7, #4]
SystemClock_Config();
**1a: f000 f805 bl 28 <SystemClock_Config>
for(;;)
i++;
1e: 687b ldr r3, [r7, #4]
20: 3301 adds r3, #1**
22: 607b str r3, [r7, #4]
24: e7fb b.n 1e <main+0xe>
}
00000028 <SystemClock_Config>:
* PLLDIV = 3
* Flash Latency(WS) = 1
* #retval None
*/
void SystemClock_Config(void)
{
28: b480 push {r7}
2a: af00 add r7, sp, #0
SET_BIT(FLASH->ACR, FLASH_ACR_ACC64);
2c: 4a33 ldr r2, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
2e: 4b33 ldr r3, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
30: 681b ldr r3, [r3, #0]
32: f043 0304 orr.w r3, r3, #4
36: 6013 str r3, [r2, #0]
MODIFY_REG(FLASH->ACR, FLASH_ACR_LATENCY, LL_FLASH_LATENCY_1);
38: 4a30 ldr r2, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3a: 4b30 ldr r3, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3c: 681b ldr r3, [r3, #0]
3e: f043 0301 orr.w r3, r3, #1
42: 6013 str r3, [r2, #0]*
}
the execution loops around 0x1a, 0x1c, 0x1e, 0x20 in PC register.
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
It should jump to 0x28 (SystemClock_Config) at 0x1a.
A very simple completely working example:
vectors.s
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
bl centry
done: b done
so.c
unsigned int fun ( unsigned int );
unsigned int centry ( void )
{
return(fun(5)+1);
}
fun.c
unsigned int fun ( unsigned int x )
{
return(x+1);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c so.c -o so.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c fun.c -o fun.o
arm-none-eabi-ld -o so.elf -T flash.ld vectors.o so.o fun.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
the whole program
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000009 andeq r0, r0, r9
00000008 <reset>:
8: f000 f802 bl 10 <centry>
0000000c <done>:
c: e7fe b.n c <done>
...
00000010 <centry>:
10: b510 push {r4, lr}
12: 2005 movs r0, #5
14: f000 f802 bl 1c <fun>
18: 3001 adds r0, #1
1a: bd10 pop {r4, pc}
0000001c <fun>:
1c: 3001 adds r0, #1
1e: 4770 bx lr
a simulation of the program:
read32(0x00000000)=0x20001000
read32(0x00000004)=0x00000009
--- 0x00000008: 0xF000
--- 0x0000000A: 0xF802 bl 0x0000000F
--- 0x00000010: 0xB510 push {r4,lr}
write32(0x20000FF8,0x00000000)
write32(0x20000FFC,0x0000000D)
--- 0x00000012: 0x2005 movs r0,#0x05
--- 0x00000014: 0xF000
--- 0x00000016: 0xF802 bl 0x0000001B
--- 0x0000001C: 0x3001 adds r0,#0x01
--- 0x0000001E: 0x4770 bx r14
--- 0x00000018: 0x3001 adds r0,#0x01
--- 0x0000001A: 0xBD10 pop {r4,pc}
read32(0x20000FF8)=0x00000000
read32(0x20000FFC)=0x0000000D
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
sure it is a somewhat useless program but it demonstrates booting and calling functions (the function address does not show up on the stack, when you do a call (bl) the r14 gets the return address and r15 gets the address to branch to. if you have nested functions like centry (the C entry point main() is not an important function name you can call your entry point whatever you want so long as your bootstrap matches) calling fun, then you need to preserve the return address however you choose, typically save it on the stack. r4 is being pushed just to keep the stack aligned on a 64 bit boundary per the abi.
for your system you would set the linker script for 0x08000000 normally (stm32).
What we are missing from you is the beginning of your binary, can you do a hexdump of the memory image/binary showing the handfuls of byte before main including the first few instructions of main?
If a bare metal program doesnt do the simplest boot steps right, the first thing you do is to examine the binary where the entry point or vector table is depending on the architecture and see that you built it right.
In this case in my example this is a cortex-m so the stack pointer initialization value (if you choose to use it) is at 0x00000000, you can put anything there and then simply write over the sp if you want, your choice...then address 0x00000004 is the reset vector which is the address of the code to handle the reset with the lsbit set to indicate thumb mode.
so 0x00000008|1 = 0x00000009.
If you dont have
0x2000xxxx
0x00000011
then your processor is not going to boot right. I am so much in the habit of using 0x08000000 that I dont remember if 0x00000000 works for an stm, it in theory should...but depends on how you are loading the flash and what mode/state the chip is in at that time.
you might need to link for 0x08000000 and at a minimum if nothing else changed
0x2000xxxx
0x08000011
as the first two word in your binary/memory image.
EDIT
note you can make a single binary that can be entered both with a vector or a bootloader
.thumb
.thumb_func
.global _start
_start:
bl reset
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
placing a branch (well bl to fill the space) in the stack address spot then loading the stack pointer later.
Or use a branch
.thumb
.thumb_func
.global _start
_start:
b reset
nop
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
Your application is missing an interrupt table. As a result, the processor is reading instructions as interrupt vectors, and faulting repeatedly as those instructions cannot be interpreted as invalid addresses.
Use the support files from the STM32L1xx standard peripheral library to generate an appropriate linker script and interrupt table.
I've been programming in C and C++ for quite a long time now, so I'm familiar with the linking process as a user: the preprocessor expands all prototypes and macros in each .c file which is then compiled separately into its own object file, and all object files together with static libraries are linked into an executable.
However I'd like to know more about this process: how does the linker link the object files (what do they contain anyway?)? Matching declared but undefined functions with their definitions in other files (how?)? Translating into the exact content of the program memory (context: microcontrollers)?
Application example
Ideally, I'm looking for a detailed step-by-step description of what the process is doing, based on the following simplistic example. Since it doesn't appear to be said anywhere, fame and glory to whoever answers in this way.
main.c
#include "otherfile.h"
int main(void) {
otherfile_print("Foo");
return 0;
}
otherfile.h
void otherfile_print(char const *);
otherfile.c
#include "otherfile.h"
#include <stdio.h>
void otherfile_print(char const *str) {
printf(str);
}
printf is insanely complicated, very bad for a microcontroller hello world example, blinking leds are better but that gets specific to the microcontroller. this will suffice for linking.
two.c
unsigned int glob;
unsigned int two ( unsigned int a, unsigned int b )
{
glob=5;
return(a+b+7);
}
one.c
extern unsigned int glob;
unsigned int two ( unsigned int, unsigned int );
unsigned int one ( void )
{
return(two(5,6)+glob);
}
start.s
.globl _start
_start:
bl one
b .
build everything.
% arm-none-eabi-gcc -O2 -c one.c -o one.o
% arm-none-eabi-gcc -O2 -c two.c -o two.o
% touch start.s
% arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c one.c -o one.o
% arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c two.c -o two.o
% arm-none-eabi-as start.s -o start.o
% arm-none-eabi-ld -Ttext=0x10000000 start.o one.o two.o -o onetwo.elf
now lets look...
arm-none-eabi-objdump -D start.o
...
00000000 <_start>:
0: ebfffffe bl 0 <one>
4: eafffffe b 4 <_start+0x4>
it not is the compiler/assemblers job to deal with external references so the branch link to one is left incomplete, they chose to make it a bl of 0 but they could have simply left it totally unencoded, it is up to the authors of the toolchain as to how to communicate between the compiler, assembler, and linker via object files.
Same here
00000000 <one>:
0: e92d4008 push {r3, lr}
4: e3a00005 mov r0, #5
8: e3a01006 mov r1, #6
c: ebfffffe bl 0 <two>
10: e59f300c ldr r3, [pc, #12] ; 24 <one+0x24>
14: e5933000 ldr r3, [r3]
18: e0800003 add r0, r0, r3
1c: e8bd4008 pop {r3, lr}
20: e12fff1e bx lr
24: 00000000 andeq r0, r0, r0
both the function two and the address for the global variable glob are unknown. Note that for the unknown variable the compiler generates code that requires the explicit address of the global so that the linker simply needs to fill in the address, also glob is .data not .text.
00000000 <two>:
0: e59f3010 ldr r3, [pc, #16] ; 18 <two+0x18>
4: e2811007 add r1, r1, #7
8: e3a02005 mov r2, #5
c: e0810000 add r0, r1, r0
10: e5832000 str r2, [r3]
14: e12fff1e bx lr
18: 00000000 andeq r0, r0, r0
here too the global is in .data not here, so the linker will have to place .data and the things in it and then fill in the addresses.
so here we have linked it all together, the gnu linker requires an entry point label defined _start (main is an extern address required by the standard bootstrap, which I am not using so we dont get a main not found error). Because I am not using a linker script the gnu linker places items in the binary in the order they were defined on the command line, as desired i need start first for a microcontroller since I am controlling the boot. I used a non-zero here for demonstration purposes as well...
10000000 <_start>:
10000000: eb000000 bl 10000008 <one>
10000004: eafffffe b 10000004 <_start+0x4>
10000008 <one>:
10000008: e92d4008 push {r3, lr}
1000000c: e3a00005 mov r0, #5
10000010: e3a01006 mov r1, #6
10000014: eb000005 bl 10000030 <two>
10000018: e59f300c ldr r3, [pc, #12] ; 1000002c <one+0x24>
1000001c: e5933000 ldr r3, [r3]
10000020: e0800003 add r0, r0, r3
10000024: e8bd4008 pop {r3, lr}
10000028: e12fff1e bx lr
1000002c: 1000804c andne r8, r0, ip, asr #32
10000030 <two>:
10000030: e59f3010 ldr r3, [pc, #16] ; 10000048 <two+0x18>
10000034: e2811007 add r1, r1, #7
10000038: e3a02005 mov r2, #5
1000003c: e0810000 add r0, r1, r0
10000040: e5832000 str r2, [r3]
10000044: e12fff1e bx lr
10000048: 1000804c andne r8, r0, ip, asr #32
Disassembly of section .bss:
1000804c <__bss_start>:
1000804c: 00000000 andeq r0, r0, r0
so the linker starts to place the first item start.o, it roughly figures out how big that needs to be by just putting what was there. those two instructions. they take 8 bytes so in theory the second item one.o goes next at 0x10000008. That means the encoding for the bl one in start.s can be completed to use the correct relative address (_start + 8 which is the value of the pc when executing so the offset is zero, pc+0 is the encoding)
the linker has roughly placed one.o into the binary it is building and it has to resolve the address to two and the global so it has to place two.o and then figure out where the end of that is to place in this case .bss not .data since I didnt pre-init the variable.
the label for two is at 0x10000030 so it encodes the bl two in one() for that relative offset, it has also placed glob at 1000804c for some reason (I didnt complete define where ram was so the gnu linker will do things like this). Despite the reason, that is where the linker defined the home for that global variable and where the address to glob is needed is filled in by the linker, both one() and two() needed those filled in.
So the compiler (assembler) and linker have to in the end result in a usable binary, the compiler (assembler) tend to worry about making position independent machine code and leave enough information for the linker so that it has the machine code and a list of unresolved externs that it has to fill in. compilers have improved over time, a simple model would be to have an address location like they did above for the global variables address, where the linker computes the absolute address and just fills it in, clearly above they did not encode the function call in a way that it can use an absolute address to one and two. instead it uses pc relative addressing. This means that the linker has to know the machine code encoding of the bl instruction. the current generation of gnu linker knows quite a bit more and can do some cool things resolving arm to thumb and back, stuff it didnt used to know (you dont need to compile for thumb interwork anymore the linker takes care of it).
So the linker takes binary blobs including data and...links them together into one binary. It first needs to know the actual addresses for the various things in the binary. How you tell the linker this is linker specific and not a global thing for all C/C++ toolchains. Gnu linker scripts are a programming language in and of themselves. These are not necessarily physical nor virtual addresses it is simply the address space of the code in whatever mode it is in (virtual or physical). Once the linker knows the addresses it, based on linker rules (again linker specific) it starts placing these various binary blobs into those address spaces. then it goes through and resolves the external/global addresses. It was not above but can be an iterative process. If for example the function two() was at an address in memory that cannot be accessed with a single pc relative instruction (say we put one near zero and two near 0xF0000000) then those that wrote the linker have two choices, the simple choice is to simply state that it cannot encode/implement that far of a branch and bail out and gnu linker did or still does do that. Or the other solution is the linker fixes the problem. the linker could add a few words of data within the range of the pc relative branch link and those few words of data are a trampoline for example an absolute address that is loaded into a register then a register based branch or perhaps of clever a pc relative branch if the trampoline is within range (in the case of 0x10000000 to 0xF0000000 that wouldnt work). If the linker has to add these few words then that may mean that some of the binary blobs have to move to make room for those few words and now all of the addresses in those binary blobs now have to move as well. So you have to make another pass across all the binary blobs, resolving all of the new addresses filling in the answers and for pc relative determining if you can still reach everything. Adding those few words might have made something that was reachable with a pc-relative now unreachable and now that requires a solution (error or patch).
The assembler itself for a single source file has to go through even more of these gyrations esp for a variable length instruction set like x86 where the addressing is a big vague. I recommend trying for yourself to make a simple assembler that only supports a few instructions but some of those branches. and parse and encode the instructions and compare that to an existing debugged assembler like gnu assembler.
test.s
ldr r1,locdat
nop
nop
nop
nop
nop
b over
locdat: .word 0x12345678
top:
nop
nop
nop
nop
nop
nop
over:
b top
the right answer is
00000000 <locdat-0x1c>:
0: e59f1014 ldr r1, [pc, #20] ; 1c <locdat>
4: e1a00000 nop ; (mov r0, r0)
8: e1a00000 nop ; (mov r0, r0)
c: e1a00000 nop ; (mov r0, r0)
10: e1a00000 nop ; (mov r0, r0)
14: e1a00000 nop ; (mov r0, r0)
18: ea000006 b 38 <over>
0000001c <locdat>:
1c: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
00000020 <top>:
20: e1a00000 nop ; (mov r0, r0)
24: e1a00000 nop ; (mov r0, r0)
28: e1a00000 nop ; (mov r0, r0)
2c: e1a00000 nop ; (mov r0, r0)
30: e1a00000 nop ; (mov r0, r0)
34: e1a00000 nop ; (mov r0, r0)
00000038 <over>:
38: eafffff8 b 20 <top>
there are parallels to that activity and the job of a linker. also you could fashion a simple linker based on the above files or something similar, extract the binary blobs and other info and start placing them in whatever address space you want.
Either one are fairly simple programming tasks, yet fairly educational. Having an existing toolchain that can produce the answer you can figure out where you are going wrong or how to get at the right answer.