How do we enable thumb mode on the stm32f series when we use embedded C to drive it? Can you explain it in detail, as I am a beginner? - arm

I want to enable thumb mode on an stm32f401re board. The code I have written for it is in embedded C. How do we enable thumb mode in embedded C? Do we use the -mthumb option for it, and do we have to add any library before using that option? Or is there a totally different method?
I searched and found the method only in assembly language, but I want it in embedded C. I even used the -mthumb option, but it showed an error.

unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0x12345678));
}
$ arm-none-eabi-gcc -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <fun+0x14>
8: ebfffffe bl 0 <more_fun>
c: e8bd4010 pop {r4, lr}
10: e12fff1e bx lr
14: 12345678 .word 0x12345678
That defaults to arm instructions, and it looks like armv4t, so it should work on non-cortex-ms from armv4t to armv7 (a couple of decades' worth).
To get the baseline thumb that all variants support, and which will work on your cortex-m4:
$ arm-none-eabi-gcc -mthumb -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 4803 ldr r0, [pc, #12] ; (10 <fun+0x10>)
4: f7ff fffe bl 0 <more_fun>
8: bc10 pop {r4}
a: bc02 pop {r1}
c: 4708 bx r1
e: 46c0 nop ; (mov r8, r8)
10: 12345678 .word 0x12345678
Adding -mthumb gives thumb instructions, but the target is still armv4t; it still works, and the return sequence shows it:
a: bc02 pop {r1}
c: 4708 bx r1
Now you can move up to cortex-m0 which will work on all cortex-ms
$ arm-none-eabi-gcc -mcpu=cortex-m0 -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 4802 ldr r0, [pc, #8] ; (c <fun+0xc>)
4: f7ff fffe bl 0 <more_fun>
8: bd10 pop {r4, pc}
a: 46c0 nop ; (mov r8, r8)
c: 12345678 .word 0x12345678
-mthumb was not needed, and we can see this is not armv4t level code, it is newer:
8: bd10 pop {r4, pc}
Note we did not need -mthumb, but always check just in case
And then you can go up to what you have if you wish
$ arm-none-eabi-gcc -mcpu=cortex-m4 -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: 4801 ldr r0, [pc, #4] ; (8 <fun+0x8>)
2: f7ff bffe b.w 0 <more_fun>
6: bf00 nop
8: 12345678 .word 0x12345678
Okay, that is a bit disturbing, but I guess they chose this because of the additional thumb2 extensions that armv7-m has and armv6-m does not; they could have done the tail call optimization with cortex-m0 or -mthumb as well.
I was hoping for this instead
unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0x00001234)+1);
}
Disassembly of section .text:
00000000 <fun>:
0: b508 push {r3, lr}
2: f241 2034 movw r0, #4660 ; 0x1234
6: f7ff fffe bl 0 <more_fun>
a: 3001 adds r0, #1
I was hoping for the 16 bit immediates in two instructions (movw/movt), but for the first example it did an ldr: same number of bytes, slower, but whatever, both work. With a constant that fits in 16 bits I got it to generate one movw.
And then you link these objects together along with your bootstrap and then figure out how to get it on the flash in your mcu.
If all you wanted to know is how to make the compiler generate thumb instructions from C, that is easy. If you have C code from some other mcu, then the instruction set is the trivial part and you may still have a significant amount of work: a fair amount of the code has nothing to do with the instruction set but with the chip, which is likely completely incompatible with any other mcu that is not already cortex-m based (and even if it is cortex-m based, if it is not the same vendor and same family you are still doing a rewrite).
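One way to follow the "always check just in case" advice from inside the C source, without reading a disassembly, is to look at the compiler's predefined macros. A minimal sketch, assuming a gcc-compatible compiler (which defines __arm__, __thumb__ and __thumb2__ as appropriate for the target); the function name target_isa is made up for this illustration:

```c
/* Report which instruction set this translation unit was compiled for.
 * Relies on gcc-style predefined macros; target_isa is a hypothetical
 * helper name, not part of any library. */
const char *target_isa(void)
{
#if defined(__thumb2__)
    return "thumb2";   /* e.g. -mcpu=cortex-m4 */
#elif defined(__thumb__)
    return "thumb";    /* e.g. -mthumb on armv4t, or -mcpu=cortex-m0 */
#elif defined(__arm__)
    return "arm";      /* 32 bit arm instructions */
#else
    return "not a 32 bit arm target";
#endif
}
```

The same macros can be listed without writing any code, for example with arm-none-eabi-gcc -mcpu=cortex-m4 -dM -E - < /dev/null and a search for "thumb" in the output.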

Related

Call C function from Assembly, passing args and getting the return value in the ARM calling convention

I want to call a C function, say:
int foo(int a, int b) {return 2;}
inside an assembly (ARM) code. I read that I need to mention
import foo
in my assembly code, for assembler to search for foo in C file. But, I am stuck at passing arguments a and b from assembly and retrieving an integer (here 2) again back in assembly. Could someone could explain me how to do this, with a mini example?
You have already written the minimal example.
int foo(int a, int b) {return 2;}
compile and disassemble
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
Anything to do with a and b is dead code, so it is optimized out. While using C to learn asm is good/okay to get started, you really want to do it with optimizations on, which means you have to work harder on crafting the experimental code.
int foo(int a, int b) {return 2;}
int bar ( void )
{
return(foo(5,4));
}
and we learn nothing new.
Disassembly of section .text:
00000000 <foo>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
00000008 <bar>:
8: e3a00002 mov r0, #2
c: e12fff1e bx lr
We need to do this to see the call (declare foo without defining it, so the compiler cannot see inside it):
int foo(int a, int b);
int bar ( void )
{
return(foo(5,4));
}
and now we see
00000000 <bar>:
0: e92d4010 push {r4, lr}
4: e3a01004 mov r1, #4
8: e3a00005 mov r0, #5
c: ebfffffe bl 0 <foo>
10: e8bd4010 pop {r4, lr}
14: e12fff1e bx lr
(Yes, this is built for this compiler's default target, armv4t; that should be obvious to some, while others will have no clue how I/we know. You can also tell how new or old the compiler is from this example, as there was an abi change years ago that is visible here. In my opinion the newer versions of gcc are worse than older ones, so an older one is still good to use for some use cases.)
A note on this compiler's convention: while this compiler does use the arm convention from some version of some document, always remember it is the compiler authors' choice. They are under no obligation to conform to anyone's written standard; they choose.
So we see that the first parameter goes in r0 and the second in r1. You can craft functions with more operands or more types of operands to see what nuances there are: how many are passed in registers, and when they start using the stack instead. For example, try a 64 bit variable then a 32 bit one, in that order, as operands, then try it in reverse.
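The experiment suggested above might look like the following sketch. The function names are invented for illustration; compile it with the same -O2 -c and objdump -d steps as before and compare where each operand arrives. Under the arm procedure call standard I would expect the 64 bit operand to take an even/odd register pair (so narrow_then_wide leaves r1 unused) and the fifth operand of five_args to come from the stack, but the point is to let the tool show you rather than to trust that.

```c
#include <stdint.h>

/* 64 bit operand first, then a 32 bit operand. */
uint32_t wide_then_narrow(uint64_t a, uint32_t b)
{
    return (uint32_t)(a >> 32) + b;
}

/* Same operand types in the reverse order. */
uint32_t narrow_then_wide(uint32_t a, uint64_t b)
{
    return a + (uint32_t)(b >> 32);
}

/* More operands than there are argument registers. */
uint32_t five_args(uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t e)
{
    return a + b + c + d + e;
}
```

Each body uses every operand so the optimizer cannot discard any of them, which keeps the register and stack traffic visible in the disassembly.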
To see what is going on on the callee side.
int foo(int a, int b)
{
return((a<<1)+b+0x123);
}
We see that r0 and r1 are the first two operands, the compiler would be grossly broken otherwise.
00000000 <foo>:
0: e0810080 add r0, r1, r0, lsl #1
4: e2800e12 add r0, r0, #288 ; 0x120
8: e2800003 add r0, r0, #3
c: e12fff1e bx lr
What we did not see explicitly in the caller example is that r0 is where the return is stored (at least for this variable type).
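The reason 0x123 shows up as two adds (0x120 then 3) is that an arm data processing instruction can only hold an 8 bit value rotated right by an even number of bits. A minimal sketch of that rule, with made-up helper names, which can be used to predict when the compiler has to split a constant or load it from memory instead:

```c
#include <stdint.h>

/* Rotate a 32 bit value left by n bits (n is reduced to 0..31). */
static uint32_t rotate_left32(uint32_t value, unsigned n)
{
    n &= 31u;
    return n ? (value << n) | (value >> (32u - n)) : value;
}

/* Return 1 if value can be encoded as an 8 bit constant rotated right
 * by an even amount (0, 2, ..., 30), otherwise 0. */
int is_arm_immediate(uint32_t value)
{
    unsigned rot;
    for (rot = 0; rot < 32u; rot += 2u) {
        /* Undo a rotate right by rot; if the result fits in 8 bits, it encodes. */
        if (rotate_left32(value, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}
```

Here is_arm_immediate(0x123) is 0 while is_arm_immediate(0x120) and is_arm_immediate(3) are both 1, which matches the two add instructions in the listing. A constant such as 0x12345678 does not fit either, so a compiler would typically load it from a literal pool.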
The ABI documentation is not an easy read, but if you first "just try it" and then, if you wish, refer to the documentation, the experiment should help you make sense of it. At the end of the day you have a compiler you are going to use; it has a convention and is probably part of a toolchain, so you must conform to that compiler's convention, not some third party document (even if that third party is arm), AND you should probably use that toolchain's assembler, which means you should use that assembly language (there are many incompatible assembly languages for arm; the tool defines the language, not the target).
You can see how simple it is to figure this out on your own.
And...so this gets painful but you can look at the assembly output of the compiler, at least some will let you. With gcc you can use -save-temps or -S
int foo(int a, int b)
{
return 2;
}
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global foo
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type foo, %function
foo:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #2
bx lr
.size foo, .-foo
.ident "GCC: (15:9-2019-q4-0ubuntu1) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
Almost none of this do you "need".
The minimum looks like this
.globl foo
foo:
mov r0,#2
bx lr
.global and .globl are equivalent; which one you use somewhat reflects your age, or how/when you learned gnu assembler.
Now this will break if you are mixing arm and thumb instructions, this defaults to arm.
arm-none-eabi-as x.s -o x.o
arm-none-eabi-objdump -d x.o
x.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
If we want thumb then we have to tell it
.thumb
.globl foo
foo:
mov r0,#2
bx lr
and we get thumb.
00000000 <foo>:
0: 2002 movs r0, #2
2: 4770 bx lr
With ARM and with the gnu toolchain at least you can mix arm and thumb and the linker will take care of the transition
int foo ( int, int );
int fun ( void )
{
return(foo(1,2));
}
we do not need a bootstrap nor other things to get the linker to link so we can see how that part of it works.
arm-none-eabi-ld so.o x.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <fun>:
8000: e92d4010 push {r4, lr}
8004: e3a01002 mov r1, #2
8008: e3a00001 mov r0, #1
800c: eb000001 bl 8018 <foo>
8010: e8bd4010 pop {r4, lr}
8014: e12fff1e bx lr
00008018 <foo>:
8018: 2002 movs r0, #2
801a: 4770 bx lr
Now this is broken, not just because we have no bootstrap, etc., but because there is a bl to foo while foo is thumb and the caller is arm. For gnu assembler for arm you can take this shortcut, which I think I learned from an older gcc, but whatever:
.thumb
.thumb_func
.globl foo
foo:
mov r0,#2
bx lr
.thumb_func says the next label you find is considered a function label not just an address.
00008000 <fun>:
8000: e92d4010 push {r4, lr}
8004: e3a01002 mov r1, #2
8008: e3a00001 mov r0, #1
800c: eb000003 bl 8020 <__foo_from_arm>
8010: e8bd4010 pop {r4, lr}
8014: e12fff1e bx lr
00008018 <foo>:
8018: 2002 movs r0, #2
801a: 4770 bx lr
801c: 0000 movs r0, r0
...
00008020 <__foo_from_arm>:
8020: e59fc000 ldr ip, [pc] ; 8028 <__foo_from_arm+0x8>
8024: e12fff1c bx ip
8028: 00008019 .word 0x00008019
802c: 00000000 .word 0x00000000
The linker adds a trampoline, as I call it; I think others call it a veneer. Either way the toolchain took care of it, so long as we write the code right.
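The .word 0x00008019 in the trampoline is the address of foo (0x8018) with bit 0 set. bx uses that low bit to decide whether to switch to thumb state, and marking foo as a thumb function is what lets the toolchain know to set it. A small sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Value a bx register operand needs in order to enter thumb state
 * at the given code address. */
uint32_t thumb_branch_target(uint32_t code_address)
{
    return code_address | 1u;
}

/* Nonzero if a bx to this value would select thumb state. */
int selects_thumb(uint32_t branch_target)
{
    return (int)(branch_target & 1u);
}

/* The address instructions are actually fetched from after the branch. */
uint32_t fetch_address(uint32_t branch_target)
{
    return branch_target & ~1u;
}
```

With the numbers from the listing, thumb_branch_target(0x8018) gives 0x8019, and fetch_address(0x8019) gives back 0x8018.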
Remember that this syntax in particular is very much assembler specific; other assemblers may have other syntax to make this work. From the gcc generated code we see the generic solution, which is more typing but probably a better habit.
.thumb
.type foo, %function
.global foo
foo:
mov r0,#2
bx lr
The .type foo, %function directive works for both arm and thumb in gnu assembler for arm, and it does not have to be positioned just before the label (just as .globl or .global does not either). We get the same result from the toolchain with this assembly language.
Just for demonstration...
arm-none-eabi-as x.s -o x.o
arm-none-eabi-gcc -O2 -mthumb -c so.c -o so.o
arm-none-eabi-ld so.o x.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <fun>:
8000: b510 push {r4, lr}
8002: 2102 movs r1, #2
8004: 2001 movs r0, #1
8006: f000 f807 bl 8018 <__foo_from_thumb>
800a: bc10 pop {r4}
800c: bc02 pop {r1}
800e: 4708 bx r1
00008010 <foo>:
8010: e3a00002 mov r0, #2
8014: e12fff1e bx lr
00008018 <__foo_from_thumb>:
8018: 4778 bx pc
801a: e7fd b.n 8018 <__foo_from_thumb>
801c: eafffffb b 8010 <foo>
And you can see it works both ways, thumb to arm and arm to thumb; if we write the asm right, the toolchain does the rest of the work for us.
Now I personally hate the unified syntax; it is one of the major mistakes arm has made, along with CMSIS. But if you want to do this for a living you find that you pretty much hate most corporate decisions and, worse, have to work with them. Often unified syntax generates the wrong instruction, and I have to fiddle with the syntax to get the specific instruction I am after. Other than a bootstrap and some other exceptions you do not often write assembly language anyway; usually you compile something, then take the compiler generated code and tune it or replace it.
I started with the arm gnu tools before unified syntax so I am used to
.thumb
.globl hello
hello:
sub r0,#1
bne hello
instead of
.thumb
.globl hello
hello:
subs r0,#1
bne hello
And I am fine with bouncing between the two syntaxes (unified and not; yes, two assembly languages within one tool).
All of the above is with the 32 bit arm, if you are interested in 64 bit arm, AND using gnu tools, then a percentage of this still applies, you just need to use the aarch64 tools not the arm tools from gnu. ARM's aarch64 is a completely different, and incompatible, instruction set from aarch32. But gnu syntax like .global and .type...function are often used across all gnu supported targets. There are exceptions for some directives, but if you take the same approach of having the tools themselves tell you how they work...by using them...You can figure this out.
so.elf: file format elf64-littleaarch64
Disassembly of section .text:
0000000000400000 <fun>:
400000: 52800041 mov w1, #0x2 // #2
400004: 52800020 mov w0, #0x1 // #1
400008: 14000001 b 40000c <foo>
000000000040000c <foo>:
40000c: 52800040 mov w0, #0x2 // #2
400010: d65f03c0 ret
What you need to do is place the arguments in the correct registers (or on the stack) as required. The details of how to do this are known as the calling convention, which forms a very important part of the Application Binary Interface (ABI).
Details on the ARM (Armv7) calling convention can be found at: https://developer.arm.com/documentation/den0013/d/Application-Binary-Interfaces/Procedure-Call-Standard

Why -marm option still generates thumb instruction?

(I am new to the ARM world. Excuse me if this is a dumb question.)
I am using below command line to generate assembly code for a C file.
The cpu is arm926ej-s, which is ARMv5 architecture.
arm-none-eabi-gcc -mcpu=arm926ej-s -mthumb -S t.c -o t_thumb.S
arm-none-eabi-gcc -mcpu=arm926ej-s -marm -S t.c -o t_arm.S
I am expecting the -marm and -mthumb options would generate different function prologues. But they give similar results:
for -marm:
# args = 0, pretend = 0, frame = 72
# frame_needed = 1, uses_anonymous_args = 0
push {fp, lr} #<========== push is used instead of stmfd
add fp, sp, #4
sub sp, sp, #72
bl uart_init
for -mthumb:
# args = 0, pretend = 0, frame = 72
# frame_needed = 1, uses_anonymous_args = 0
push {r7, lr} #<========== push is used as expected
sub sp, sp, #72
add r7, sp, #0
bl uart_init
So they both use the push instruction. But as I checked the ARMv5 arch spec, the push instruction only belongs to the Thumb instruction set. I was expecting stmfd for the -marm option.
Why is push chosen instead?
How can I generate pure ARM instructions?
ADD 1 - 5:21 PM 12/18/2019
Below is the disassembly of the .o file:
arm-none-eabi-gcc -mcpu=arm926ej-s -marm -g -c t.c -o build/t_arm.o
arm-none-eabi-objdump.exe -d build/t_arm.o > t_arm.dism
The disassembly:
000002a0 <main>:
2a0: e92d4800 push {fp, lr} <=============== push is used!
2a4: e28db004 add fp, sp, #4
2a8: e24dd048 sub sp, sp, #72 ; 0x48
2ac: ebfffffe bl 0 <uart_init>
2b0: e59f3168 ldr r3, [pc, #360] ; 420 <main+0x180>
2b4: e50b300c str r3, [fp, #-12]
2b8: e59f1164 ldr r1, [pc, #356] ; 424 <main+0x184>
2bc: e51b000c ldr r0, [fp, #-12]
ADD 2 - 5:34 PM 12/18/2019
Thanks to #Erlkoenig.
I just tried to disassemble a -mthumb binary:
arm-none-eabi-gcc -mcpu=arm926ej-s -mthumb -g -c t.c -o build/t_thumb.o
arm-none-eabi-objdump.exe -d build/t_thumb.o > t_thumb.dism
A totally different thumb disassembly is shown:
00000170 <main>:
170: b580 push {r7, lr} <====== though still push is shown, but the encoding is different.
172: b092 sub sp, #72 ; 0x48
174: af00 add r7, sp, #0
176: f7ff fffe bl 0 <uart_init>
17a: 4b3c ldr r3, [pc, #240] ; (26c <main+0xfc>)
17c: 643b str r3, [r7, #64] ; 0x40
17e: 4a3c ldr r2, [pc, #240] ; (270 <main+0x100>)
180: 6c3b ldr r3, [r7, #64] ; 0x40
The hex encoding of the raw instruction as shown by objdump -d indicates that this is a 32bit ARM ("A32") instruction (0xe92d4800). The .S file generated by the -S flag to GCC and the objdump output both use ARM UAL (Unified Assembler Language) syntax, which uses push as an alias for stmfd, while the ARMv5T Architecture Reference Manual uses the old syntax, which has no push on A32. The instruction encoding matches the encoding of stmdb, for which stmfd is an alias. The encoding is shown on p. 339 in the ARMv5T Reference Manual.
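To check this by hand, the fields of the word can be picked apart. A sketch, assuming the A32 block data transfer layout (condition in bits 31-28, base register in bits 19-16, register list in bits 15-0); the helper names are invented for this example:

```c
#include <stdint.h>

/* Nonzero if insn is "stmdb sp!, {...}", which UAL disassemblers print as push.
 * Bits 27-20 must be 1001 0010 (block transfer, pre-decrement, writeback, store)
 * and bits 19-16 must be 1101 (sp). The condition field is ignored. */
int is_a32_push(uint32_t insn)
{
    return (insn & 0x0FFF0000u) == 0x092D0000u;
}

/* Nonzero if register number reg (0..15) is in the instruction's register list. */
int a32_list_has_reg(uint32_t insn, unsigned reg)
{
    return (int)((insn >> reg) & 1u);
}
```

For 0xe92d4800 this reports a push whose list holds r11 (fp) and r14 (lr), which is the push {fp, lr} in the listing, while the matching pop encoding (for example 0xe8bd4010) is rejected.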
A32 ("ARM") code can be easily recognized as all instructions are 4-byte wide and the first 4 bits are often hex E (which means that the condition code is AL, i.e. the instructions are always executed unconditionally):
[e]92d4800
[e]28db004
[e]24dd048
[e]bfffffe
This is helpful when viewing raw binaries in a hex editor. Thumb ("T32") code has many 16bit instructions, some 32bit, and no "stacks" of Es:
b580
b092
af00
f7ff fffe
Of course, for a raw binary, it is not directly clear which 2- and 4-byte groups belong together as instructions.
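The "stack of Es" observation can be turned into a rough heuristic for raw binaries. This is only a sketch: it assumes the data has already been read as little-endian 32 bit words, and the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many 32 bit words carry condition field 0xE (AL) in their top nibble. */
size_t count_always_words(const uint32_t *words, size_t count)
{
    size_t hits = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        if ((words[i] >> 28) == 0xEu)
            hits++;
    }
    return hits;
}
```

Fed the four A32 words above it counts 4 of 4. The thumb halfwords above, paired into little-endian words (0xb092b580, 0xf7ffaf00), give 0 of 2. A high ratio suggests A32 code, but data and conditional instructions will lower it, so treat the result as a hint rather than a proof.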

arm-linux-gcc -mthumb seems to generate regular arm code

I'm trying to compile a C program to the ARM Thumb-2 instruction set through GCC 6.2, using arm-linux-gnueabihf-gcc from the 6.2.0-5ubuntu12 package on Ubuntu GNU/Linux.
The problem is I'm getting the same binary as when I'm not using the "-mthumb" option:
$ arm-linux-gnueabihf-gcc -c hello.c
$ arm-linux-gnueabihf-objdump -S hello.o
hello.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <main>:
0: b580 push {r7, lr}
2: af00 add r7, sp, #0
4: f240 0000 movw r0, #0
8: f2c0 0000 movt r0, #0
c: f7ff fffe bl 0 <puts>
10: 2300 movs r3, #0
12: 4618 mov r0, r3
14: bd80 pop {r7, pc}
And:
$ arm-linux-gnueabihf-gcc -mthumb -c hello.c
$ arm-linux-gnueabihf-objdump -S hello.o
hello.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <main>:
0: b580 push {r7, lr}
2: af00 add r7, sp, #0
4: f240 0000 movw r0, #0
8: f2c0 0000 movt r0, #0
c: f7ff fffe bl 0 <puts>
10: 2300 movs r3, #0
12: 4618 mov r0, r3
14: bd80 pop {r7, pc}
I'm getting the same code. I also tried with more complex code (computing the decimals of Pi), and still getting the same code.
I'm getting the same behavior with gcc-4.7-arm-linux-gnueabihf, so the issue is probably in the way I'm using the compiler...
It's hard to believe I'm the only one facing this issue. There must be a way to get GCC to do what it's supposed to.

Reduce clang-generated code size for ARM

I am comparing code generated by clang with code generated by gcc for arm.
Unfortunately, gcc's code more often has fewer instructions.
I am just curious: is it possible to reduce the code generated by clang?
Maybe I should use some options to do so...
Please consider this very simple example:
> cat test.c
int to_upper(int c)
{
if(c < 'a' || c > 'z') return c;
else return c - ('a' - 'A');
}
> clang -target arm-none-eabi -Oz -c -mthumb -mcpu=cortex-m0 -msoft-float ./test.c -o ./clang_test.o
> /usr/bin/arm-none-eabi-gcc -Os -c -mthumb -mcpu=cortex-m0 -msoft-float ./test.c -o ./gcc_test.o
> /usr/bin/arm-none-eabi-objdump -d ./clang_test.o
./clang_test.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <to_upper>:
0: 4602 mov r2, r0
2: 3a61 subs r2, #97 ; 0x61
4: 4601 mov r1, r0
6: 3920 subs r1, #32
8: 2a19 cmp r2, #25
a: d800 bhi.n e <to_upper+0xe>
c: 4608 mov r0, r1
e: 4770 bx lr
> /usr/bin/arm-none-eabi-objdump -d ./gcc_test.o
./gcc_test.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <to_upper>:
0: 1c03 adds r3, r0, #0
2: 3b61 subs r3, #97 ; 0x61
4: 2b19 cmp r3, #25
6: d800 bhi.n a <to_upper+0xa>
8: 3820 subs r0, #32
a: 4770 bx lr
Why is there so much difference in such simple code?
Can clang generate less code in this case, at least as little as gcc?
Note: if we change the cpu to -mcpu=cortex-a5 (other options remain the same), then clang and gcc produce
absolutely identical code:
00000000 <to_upper>:
0: f1a0 0361 sub.w r3, r0, #97 ; 0x61
4: 2b19 cmp r3, #25
6: bf98 it ls
8: 3820 subls r0, #32
a: 4770 bx lr
OS: Ubuntu 14.04.3
clang version 3.7.1 (tags/RELEASE_371/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
arm-none-eabi-gcc (4.8.2-14ubuntu1+6) 4.8.2
No, clang cannot generate less code in this case, nor in many others.
Historically, very few code size optimizations have been implemented in LLVM. When optimizing for code size, GCC typically outperforms LLVM significantly.
Here is a presentation that takes a closer look at comparing GCC and Clang in terms of code size optimization:
Presentation video
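Both compilers turn the two comparisons into a single unsigned range check, which is the subs #97 / cmp #25 / bhi sequence visible in each listing; the difference is only in how the result is produced around it. A sketch of that check written out in C (to_upper_range is a name invented here, not something either compiler emits):

```c
/* The question's function, unchanged. */
int to_upper(int c)
{
    if (c < 'a' || c > 'z') return c;
    else return c - ('a' - 'A');
}

/* Same result with one unsigned comparison: values below 'a' wrap around
 * to large unsigned numbers, so a single "<= 25" test covers both bounds. */
int to_upper_range(int c)
{
    unsigned offset = (unsigned)c - (unsigned)'a';
    return (offset <= (unsigned)('z' - 'a')) ? c - ('a' - 'A') : c;
}
```

Writing the source this way is not guaranteed to change what clang emits; it only makes explicit the comparison that both listings already share.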

Would Thumb-2 ARM-Core Micros From Different Manufacturers Have Same Codesize?

Comparing two Thumb-2 micros from two different manufacturers. One's a Cortex M3, one's an A5. Are they guaranteed to compile a particular piece of code to the same codesize?
so here goes
fun.c
unsigned int fun ( unsigned int x )
{
return(x);
}
addimm.c
extern unsigned int fun ( unsigned int );
unsigned int addimm ( unsigned int x )
{
return(fun(x)+0x123);
}
for demonstration purposes building for bare metal, not really a functional program but it compiles clean and demonstrates what I intend to demonstrate.
arm instructions
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-a5 -march=armv7-a -c addimm.c -o addimma.o
disassembly of the object, not linked
00000000 <addimm>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <fun>
8: e2800e12 add r0, r0, #288 ; 0x120
c: e2800003 add r0, r0, #3
10: e8bd8008 pop {r3, pc}
thumb generic (armv4 or v5 whatever the default was for this compiler build)
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -c addimm.c -o addimmt.o
00000000 <addimm>:
0: b508 push {r3, lr}
2: f7ff fffe bl 0 <fun>
6: 3024 adds r0, #36 ; 0x24
8: 30ff adds r0, #255 ; 0xff
a: bc08 pop {r3}
c: bc02 pop {r1}
e: 4708 bx r1
cortex-a5 specific
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-a5 -march=armv7-a -c addimm.c -o addimma5.o
00000000 <addimm>:
0: b508 push {r3, lr}
2: f7ff fffe bl 0 <fun>
6: f200 1023 addw r0, r0, #291 ; 0x123
a: bd08 pop {r3, pc}
cortex-a5 is armv7-a, which supports thumb-2. As far as the add immediate itself goes, and as related to binary size, there is no optimization here: 32 bits for thumb and 32 bits for thumb2. But this is just one example; there will perhaps be times when thumb2 produces smaller binaries than thumb.
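The size comparison for the add can be worked out by hand. A rough sketch, assuming the constant is added to a low register with 16 bit thumb adds rd, #imm8 instructions (0 to 255 each) versus one 32 bit thumb2 addw (0 to 4095); the helper names are hypothetical, and a real compiler has other options, such as a literal pool load, for larger constants:

```c
/* Bytes of code if k is added using only 16 bit "adds rd, #imm8" instructions. */
unsigned thumb_adds_bytes(unsigned k)
{
    unsigned instructions = (k + 254u) / 255u;  /* ceil(k / 255) */
    return 2u * instructions;
}

/* Bytes of code for one thumb2 "addw rd, rn, #imm12", or 0 if k does not fit. */
unsigned thumb2_addw_bytes(unsigned k)
{
    return (k <= 4095u) ? 4u : 0u;
}
```

For 0x123 both come out at 4 bytes (36 + 255 in the thumb listing, one addw in the thumb2 listing), matching the observation above.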
cortex-m3
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c addimm.c -o addimmm3.o
00000000 <addimm>:
0: b508 push {r3, lr}
2: f7ff fffe bl 0 <fun>
6: f200 1023 addw r0, r0, #291 ; 0x123
a: bd08 pop {r3, pc}
produced the same result as cortex-a5. for this simple example the machine code for this object is the same, same size, when built for cortex-a5 and cortex-m3
Now if I add a bootstrap, a main, and call this function and fill in the function it calls to create a complete, linked, program
00000000 <_start>:
0: f000 f802 bl 8 <notmain>
4: e7fe b.n 4 <_start+0x4>
...
00000008 <notmain>:
8: 2005 movs r0, #5
a: f000 b801 b.w 10 <addimm>
e: bf00 nop
00000010 <addimm>:
10: b508 push {r3, lr}
12: f000 f803 bl 1c <fun>
16: f200 1023 addw r0, r0, #291 ; 0x123
1a: bd08 pop {r3, pc}
0000001c <fun>:
1c: 4770 bx lr
1e: 46c0 nop ; (mov r8, r8)
We get a result. The addimm function itself did not change in size. with a cortex-a5 you have to have some arm code that then switches to thumb, and likely when linking with libraries, etc you may get a mixture of arm and thumb, so
00000000 <_start>:
0: eb000000 bl 8 <notmain>
4: eafffffe b 4 <_start+0x4>
00000008 <notmain>:
8: e92d4008 push {r3, lr}
c: e3a00005 mov r0, #5
10: fa000001 blx 1c <addimm>
14: e8bd4008 pop {r3, lr}
18: e12fff1e bx lr
0000001c <addimm>:
1c: b508 push {r3, lr}
1e: f000 e804 blx 28 <fun>
22: f200 1023 addw r0, r0, #291 ; 0x123
26: bd08 pop {r3, pc}
00000028 <fun>:
28: e12fff1e bx lr
overall larger binary, the addimm part itself did not change in size though.
as far as linking changing the size of the object, look at this example
bootstrap.s
.thumb
.thumb_func
.globl _start
_start:
bl notmain
hang: b hang
.thumb_func
.globl dummy
dummy:
bx lr
.code 32
.globl bounce
bounce:
bx lr
hello.c
void dummy ( void );
void bounce ( void );
void notmain ( void )
{
dummy();
bounce();
}
looking at an arm build of notmain by itself, the object:
00000000 <notmain>:
0: e92d4800 push {fp, lr}
4: e28db004 add fp, sp, #4
8: ebfffffe bl 0 <dummy>
c: ebfffffe bl 0 <bounce>
10: e24bd004 sub sp, fp, #4
14: e8bd4800 pop {fp, lr}
18: e12fff1e bx lr
depending on what is calling it and what it calls, the linker may have to add more code to deal with items that are defined outside the object, from global variables to external functions
00008000 <_start>:
8000: f000 f818 bl 8034 <__notmain_from_thumb>
00008004 <hang>:
8004: e7fe b.n 8004 <hang>
00008006 <dummy>:
8006: 4770 bx lr
00008008 <bounce>:
8008: e12fff1e bx lr
0000800c <notmain>:
800c: e92d4800 push {fp, lr}
8010: e28db004 add fp, sp, #4
8014: eb000003 bl 8028 <__dummy_from_arm>
8018: ebfffffa bl 8008 <bounce>
801c: e24bd004 sub sp, fp, #4
8020: e8bd4800 pop {fp, lr}
8024: e12fff1e bx lr
00008028 <__dummy_from_arm>:
8028: e59fc000 ldr ip, [pc] ; 8030 <__dummy_from_arm+0x8>
802c: e12fff1c bx ip
8030: 00008007 andeq r8, r0, r7
00008034 <__notmain_from_thumb>:
8034: 4778 bx pc
8036: 46c0 nop ; (mov r8, r8)
8038: eafffff3 b 800c <notmain>
803c: 00000000 andeq r0, r0, r0
dummy_from_arm and notmain_from_thumb were both added, an increase in the size of the binary. Each object did not change in size but the overall binary did. bounce() was an arm to arm call, no patching; dummy() was arm to thumb and notmain() thumb to arm.
So you might have a cortex-m3 object and a cortex-a5 object that, as far as the code in that object goes, are identical. But depending on what you link them with (and eventually something is different between a cortex-m3 system and a cortex-a5 system), you may see more or less code added by the linker to account for the system differences, libraries, operating system specifics, etc. Even where in the binary you put the object matters: if a call has to reach further than it can with a single instruction, the linker will add even more code.
This is all gcc specific stuff; each toolchain is going to deal with each of these problems in its own way. It is the nature of the beast when you use an object and linker model, a very good model, but the compiler, assembler, and linker have to work together to make sure that global resources can be properly accessed when linked. This has nothing to do with ARM; this problem exists with many/most processor architectures, and the toolchains deal with those problems per toolchain, per version, per target architecture. When I said change the size of the object, what I really meant was that the linker may add more code to the final binary in order to deal with that object and how it interacts with others.

Resources