I want to enable Thumb mode on an STM32F401RE board. My code is written in embedded C. How do I enable Thumb mode from C? Do I use the -mthumb option, and do I need to add any library before using it? Or is there a totally different method?
I searched and found the method only for assembly language, but I want it in embedded C. I even tried the -mthumb option, but it showed an error.
unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0x12345678));
}
$ arm-none-eabi-gcc -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <fun+0x14>
8: ebfffffe bl 0 <more_fun>
c: e8bd4010 pop {r4, lr}
10: e12fff1e bx lr
14: 12345678 .word 0x12345678
That defaults to arm instructions, looks like armv4, so it should work on non-cortex-ms from armv4 to armv7 (a couple of decades' worth).
To get thumb code that works on all thumb variants, and so on your cortex-m4:
$ arm-none-eabi-gcc -mthumb -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 4803 ldr r0, [pc, #12] ; (10 <fun+0x10>)
4: f7ff fffe bl 0 <more_fun>
8: bc10 pop {r4}
a: bc02 pop {r1}
c: 4708 bx r1
e: 46c0 nop ; (mov r8, r8)
10: 12345678 .word 0x12345678
Adding -mthumb gives thumb code, but this is still building for armv4t, which you can tell from this return sequence. It still works:
a: bc02 pop {r1}
c: 4708 bx r1
Now you can move up to cortex-m0 which will work on all cortex-ms
$ arm-none-eabi-gcc -mcpu=cortex-m0 -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 4802 ldr r0, [pc, #8] ; (c <fun+0xc>)
4: f7ff fffe bl 0 <more_fun>
8: bd10 pop {r4, pc}
a: 46c0 nop ; (mov r8, r8)
c: 12345678 .word 0x12345678
The -mthumb was not needed, and we can see this is not armv4t level, it is newer:
8: bd10 pop {r4, pc}
Note we did not need -mthumb, but always check just in case
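Besides reading the disassembly, one way to check is to look at the macros the compiler predefines. This is only a sketch, assuming a GCC-style compiler that defines __arm__, __thumb__ and __thumb2__ the usual way; the function name target_isa is made up for illustration.

```c
/* Hypothetical helper: reports which instruction set this file is being
 * compiled to, based on macros a GCC-style ARM compiler predefines. */
const char *target_isa(void)
{
#if defined(__thumb2__)
    return "thumb2";   /* thumb with thumb2 extensions, e.g. -mcpu=cortex-m4 */
#elif defined(__thumb__)
    return "thumb";    /* 16-bit thumb, e.g. -mthumb or -mcpu=cortex-m0 */
#elif defined(__arm__)
    return "arm";      /* 32-bit arm instructions, the default seen above */
#else
    return "not-arm";  /* built with a non-arm (host) compiler */
#endif
}
```

The same tests can be turned into a build-time guard, for example an #error when __arm__ is defined but __thumb__ is not, so that a missing -mthumb or -mcpu fails the build instead of producing arm code for a cortex-m.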
And then you can go up to what you have if you wish
$ arm-none-eabi-gcc -mcpu=cortex-m4 -O2 -c so.c -o so.o
$ arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: 4801 ldr r0, [pc, #4] ; (8 <fun+0x8>)
2: f7ff bffe b.w 0 <more_fun>
6: bf00 nop
8: 12345678 .word 0x12345678
Okay, that is a bit disturbing, but I guess they chose this because of the additional thumb2 extensions that armv7-m has and armv6-m does not. They could have done the tail-call optimization with cortex-m0 or -mthumb as well.
I was hoping for this instead
unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0x00001234)+1);
}
Disassembly of section .text:
00000000 <fun>:
0: b508 push {r3, lr}
2: f241 2034 movw r0, #4660 ; 0x1234
6: f7ff fffe bl 0 <more_fun>
a: 3001 adds r0, #1
That is, I was hoping for the 16 bit immediates loaded in two instructions, but the build above did an ldr instead: same number of bytes, slower, but whatever, both work. With the smaller constant I got it to generate one movw.
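The arithmetic behind that choice can be sketched in C. This assumes the armv7-m movw/movt pair, where each instruction carries a 16-bit immediate; the helper names are hypothetical and the code only models how the constant is split, not the instruction encodings.

```c
#include <stdint.h>

/* Low 16 bits: the immediate a movw would load (upper half cleared). */
uint16_t movw_imm(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* High 16 bits: the immediate a movt would write into the upper half. */
uint16_t movt_imm(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Instructions needed with this scheme: movw alone if the upper half is zero. */
unsigned mov_count(uint32_t value)
{
    return movt_imm(value) != 0 ? 2u : 1u;
}
```

For 0x12345678 that is movw #0x5678 plus movt #0x1234, two 32-bit instructions or 8 bytes. The ldr version in the cortex-m4 listing is a 2-byte ldr, a 2-byte nop for alignment and a 4-byte literal, also 8 bytes. For 0x00001234 a single movw is enough.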
And then you link these objects together along with your bootstrap and then figure out how to get it on the flash in your mcu.
If all you wanted to know is how to make the compiler generate thumb instructions from C, that is easy. If you have C code from some other mcu, then the instruction set is the trivial part, and you may have a significant amount of work ahead: a fair amount of the code has nothing to do with the instruction set but with the chip, which is likely completely incompatible with any other mcu that is not already cortex-m based (and even if it is cortex-m based, if it is not the same vendor and same family you are still doing a re-write).
(I am new to the ARM world. Excuse me if this is a dumb question.)
I am using below command line to generate assembly code for a C file.
The cpu is arm926ej-s, which is ARMv5 architecture.
arm-none-eabi-gcc -mcpu=arm926ej-s -mthumb -S t.c -o t_thumb.S
arm-none-eabi-gcc -mcpu=arm926ej-s -marm -S t.c -o t_arm.S
I am expecting the -marm and -mthumb options would generate different function prologues. But they give similar results:
for -marm:
# args = 0, pretend = 0, frame = 72
# frame_needed = 1, uses_anonymous_args = 0
push {fp, lr} #<========== push is used instead of stmfd
add fp, sp, #4
sub sp, sp, #72
bl uart_init
for -mthumb:
# args = 0, pretend = 0, frame = 72
# frame_needed = 1, uses_anonymous_args = 0
push {r7, lr} #<========== push is used as expected
sub sp, sp, #72
add r7, sp, #0
bl uart_init
So they both use the push instruction. But as I checked the ARMv5 arch spec, the push instruction only belongs to the Thumb instruction set. I was expecting stmfd for the -marm option.
Why is push chosen instead?
How can I generate pure ARM instructions?
ADD 1 - 5:21 PM 12/18/2019
Below is the disassembly of the .o file:
arm-none-eabi-gcc -mcpu=arm926ej-s -marm -g -c t.c -o build/t_arm.o
arm-none-eabi-objdump.exe -d build/t_arm.o > t_arm.dism
The disassembly:
000002a0 <main>:
2a0: e92d4800 push {fp, lr} <=============== push is used!
2a4: e28db004 add fp, sp, #4
2a8: e24dd048 sub sp, sp, #72 ; 0x48
2ac: ebfffffe bl 0 <uart_init>
2b0: e59f3168 ldr r3, [pc, #360] ; 420 <main+0x180>
2b4: e50b300c str r3, [fp, #-12]
2b8: e59f1164 ldr r1, [pc, #356] ; 424 <main+0x184>
2bc: e51b000c ldr r0, [fp, #-12]
ADD 2 - 5:34 PM 12/18/2019
Thanks to @Erlkoenig.
I just tried to disassemble a -mthumb binary:
arm-none-eabi-gcc -mcpu=arm926ej-s -mthumb -g -c t.c -o build/t_thumb.o
arm-none-eabi-objdump.exe -d build/t_thumb.o > t_thumb.dism
A totally different thumb disassembly is shown:
00000170 <main>:
170: b580 push {r7, lr} <====== though still push is shown, but the encoding is different.
172: b092 sub sp, #72 ; 0x48
174: af00 add r7, sp, #0
176: f7ff fffe bl 0 <uart_init>
17a: 4b3c ldr r3, [pc, #240] ; (26c <main+0xfc>)
17c: 643b str r3, [r7, #64] ; 0x40
17e: 4a3c ldr r2, [pc, #240] ; (270 <main+0x100>)
180: 6c3b ldr r3, [r7, #64] ; 0x40
The hex encoding of the raw instruction as shown by objdump -d indicates that this is a 32bit ARM ("A32") instruction (0xe92d4800). The .S file generated by the -S flag to GCC, and the objdump output, just use the ARM UAL (Unified Assembler Language) syntax, which uses push as an alias for stmfd with sp as the base register, while the ARMv5T Architecture Reference Manual uses the old syntax, which has no push on A32. The instruction encoding matches the encoding of stmdb, for which stmfd is an alias. The encoding is shown on p. 339 in the ARMv5T Reference Manual.
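To make the encoding argument concrete, here is a small sketch that pulls the load/store-multiple fields out of an A32 word and tests for the "stmdb sp!, {...}" pattern that UAL prints as push. The struct and function names are made up for illustration, and the bit positions are the classic A32 load/store-multiple layout as I understand it (cond in bits 31..28, P/U/S/W/L in bits 24..20, base register in bits 19..16, register list in bits 15..0).

```c
#include <stdint.h>

/* Fields of an A32 load/store-multiple instruction (bits 27..25 == 0b100). */
struct ldm_stm {
    unsigned cond;          /* bits 31..28, 0xE means AL (always) */
    unsigned p, u, s, w, l; /* pre-index, up, S bit, write-back, load */
    unsigned rn;            /* base register number */
    uint16_t reglist;       /* one bit per register r0..r15 */
};

/* Returns 1 and fills *out if word is a load/store-multiple, else 0. */
int decode_ldm_stm(uint32_t word, struct ldm_stm *out)
{
    if (((word >> 25) & 0x7u) != 0x4u)
        return 0;
    out->cond = (word >> 28) & 0xFu;
    out->p = (word >> 24) & 1u;
    out->u = (word >> 23) & 1u;
    out->s = (word >> 22) & 1u;
    out->w = (word >> 21) & 1u;
    out->l = (word >> 20) & 1u;
    out->rn = (word >> 16) & 0xFu;
    out->reglist = (uint16_t)(word & 0xFFFFu);
    return 1;
}

/* push {...} is stmdb sp!, {...}: P=1, U=0, S=0, W=1, L=0, Rn=13 (sp). */
int is_a32_push(uint32_t word)
{
    struct ldm_stm f;
    return decode_ldm_stm(word, &f) &&
           f.p == 1 && f.u == 0 && f.s == 0 && f.w == 1 && f.l == 0 &&
           f.rn == 13;
}
```

For 0xe92d4800 this reports a push with register list 0x4800, i.e. bits 11 and 14, which are fp and lr. A single-register push such as 0xe52db004 is encoded as a str with write-back instead, so this check does not match it.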
A32 ("ARM") code can be easily recognized as all instructions are 4-byte wide and the first 4 bits are often hex E (which means that the condition code is AL, i.e. the instructions are always executed unconditionally):
[e]92d4800
[e]28db004
[e]24dd048
[e]bfffffe
This is helpful when viewing raw binaries in a hex editor. Thumb ("T32") code has many 16bit instructions, some 32bit, and no "stacks" of Es:
b580
b092
af00
f7ff fffe
Of course, for a raw binary, it is not directly clear which 2- and 4-byte groups belong together as instructions.
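The two rules of thumb above can be written down as a short C sketch. It assumes the Thumb-2 rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction, and it assumes the halfword stream starts on an instruction boundary, which is exactly what a raw binary does not tell you. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Condition field of an A32 instruction: the top 4 bits (0xE = AL). */
unsigned a32_cond(uint32_t word)
{
    return (unsigned)(word >> 28);
}

/* 1 if this halfword starts a 32-bit Thumb instruction, else 0. */
int thumb_is_32bit_prefix(uint16_t hw)
{
    return (hw >> 11) >= 0x1Du;
}

/* Counts instructions in n halfwords, starting at an instruction boundary. */
size_t thumb_count_instructions(const uint16_t *hw, size_t n)
{
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        i += thumb_is_32bit_prefix(hw[i]) ? 2 : 1;
        count++;
    }
    return count;
}
```

Fed the halfwords b580, b092, af00, f7ff, fffe from the listing, this groups them as three 16-bit instructions followed by one 32-bit bl.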
I have spent multiple days trying to figure this out and I just can't. I have some C code. I generated the assembly code for this C program, copy-pasted the assembly into someone else's project (which only contains a single assembly file) and assembled that. In this case things work. But if I try to compile from C directly to generate the binaries, it doesn't work, even though everything else should be identical. This is my C code:
#include <stdint.h>
#define REGISTERS_BASE 0x3F000000
#define MAIL_BASE 0xB880 // Base address for the mailbox registers
// This bit is set in the status register if there is no space to write into the mailbox
#define MAIL_FULL 0x80000000
// This bit is set in the status register if there is nothing to read from the mailbox
#define MAIL_EMPTY 0x40000000
struct Message
{
uint32_t messageSize;
uint32_t requestCode;
uint32_t tagID;
uint32_t bufferSize;
uint32_t requestSize;
uint32_t pinNum;
uint32_t on_off_switch;
uint32_t end;
};
struct Message m =
{
.messageSize = sizeof(struct Message),
.requestCode =0,
.tagID = 0x00038041,
.bufferSize = 8,
.requestSize =0,
.pinNum = 130,
.on_off_switch = 1,
.end = 0,
};
/** Main function - we'll never return from here */
int _start(void)
{
uint32_t mailbox = MAIL_BASE + REGISTERS_BASE + 0x18;
volatile uint32_t status;
do
{
status = *(volatile uint32_t *)(mailbox);
}
while((status & 0x80000000));
*(volatile uint32_t *)(MAIL_BASE + REGISTERS_BASE + 0x20) = ((uint32_t)(&m) & 0xfffffff0) | (uint32_t)(8);
while(1);
}
This is a linker file I copied from the successful method:
/*
* Very simple linker script, combining the text and data sections
* and putting them starting at address 0x8000.
*/
SECTIONS {
/* Put the code at 0x8000, leaving room for ARM and
* the stack. It also conforms to the standard expectations.
*/
.init 0x8000 : {
*(.init)
}
.text : {
*(.text)
}
/* Put the data after the code */
.data : {
*(.data)
}
}
And this is how I am compiling and linking everything:
arm-none-eabi-gcc -O0 -march=armv8-a PiTest.c -nostartfiles -o kernel.o
arm-none-eabi-ld kernel.o -o kernel.elf -T kernel.ld
arm-none-eabi-objcopy kernel.elf -O binary kernel.img
My target architecture is armv8 since that's what the pi model 3 uses.
I have no idea how the generated assembly works, but the C code directly does not. Please help I am on the verge of madness.
EDIT: The expected behaviour is for the pi's light to turn on. which it does with the first method I described. With the second method the light remains off.
EDIT4: Made some changes to files, deleted previous edits with outdated info to reduce post size
kernel.elf: file format elf32-littlearm
Disassembly of section .init:
00008000 <_start>:
8000: e3a0dd7d mov sp, #8000 ; 0x1f40
8004: eaffffff b 8008 <kernel_main>
Disassembly of section .text:
00008008 <kernel_main>:
8008: e52db004 push {fp} ; (str fp, [sp, #-4]!)
800c: e28db000 add fp, sp, #0
8010: e24dd00c sub sp, sp, #12
8014: e30b3898 movw r3, #47256 ; 0xb898
8018: e3433f00 movt r3, #16128 ; 0x3f00
801c: e50b3008 str r3, [fp, #-8]
8020: e51b3008 ldr r3, [fp, #-8]
8024: e5933000 ldr r3, [r3]
8028: e50b300c str r3, [fp, #-12]
802c: e51b300c ldr r3, [fp, #-12]
8030: e3530000 cmp r3, #0
8034: bafffff9 blt 8020 <kernel_main+0x18>
8038: e30b38a0 movw r3, #47264 ; 0xb8a0
803c: e3433f00 movt r3, #16128 ; 0x3f00
8040: e3082050 movw r2, #32848 ; 0x8050
8044: e3402001 movt r2, #1
8048: e3c2200f bic r2, r2, #15
804c: e3822008 orr r2, r2, #8
8050: e5832000 str r2, [r3]
8054: eafffffe b 8054 <kernel_main+0x4c>
Disassembly of section .data:
00008058 <__data_start>:
8058: 00000020 andeq r0, r0, r0, lsr #32
805c: 00000000 andeq r0, r0, r0
8060: 00038041 andeq r8, r3, r1, asr #32
8064: 00000008 andeq r0, r0, r8
8068: 00000000 andeq r0, r0, r0
806c: 00000082 andeq r0, r0, r2, lsl #1
8070: 00000001 andeq r0, r0, r1
8074: 00000000 andeq r0, r0, r0
Disassembly of section .ARM.attributes:
00000000 <_stack-0x80021>:
0: 00002e41 andeq r2, r0, r1, asr #28
4: 61656100 cmnvs r5, r0, lsl #2
8: 01006962 tsteq r0, r2, ror #18
c: 00000024 andeq r0, r0, r4, lsr #32
10: 412d3805 ; <UNDEFINED> instruction: 0x412d3805
14: 070e0600 streq r0, [lr, -r0, lsl #12]
18: 09010841 stmdbeq r1, {r0, r6, fp}
1c: 14041202 strne r1, [r4], #-514 ; 0xfffffdfe
20: 17011501 strne r1, [r1, -r1, lsl #10]
24: 1a011803 bne 46038 <__bss_end__+0x3dfc0>
28: 2a012201 bcs 48834 <__bss_end__+0x407bc>
2c: Address 0x000000000000002c is out of bounds.
Disassembly of section .comment:
00000000 <.comment>:
0: 3a434347 bcc 10d0d24 <_stack+0x1050d03>
4: 35312820 ldrcc r2, [r1, #-2080]! ; 0xfffff7e0
8: 392e343a stmdbcc lr!, {r1, r3, r4, r5, sl, ip, sp}
c: 732b332e ; <UNDEFINED> instruction: 0x732b332e
10: 33326e76 teqcc r2, #1888 ; 0x760
14: 37373131 ; <UNDEFINED> instruction: 0x37373131
18: 2029312d eorcs r3, r9, sp, lsr #2
1c: 2e392e34 mrccs 14, 1, r2, cr9, cr4, {1}
20: 30322033 eorscc r2, r2, r3, lsr r0
24: 35303531 ldrcc r3, [r0, #-1329]! ; 0xfffffacf
28: 28203932 stmdacs r0!, {r1, r4, r5, r8, fp, ip, sp}
2c: 72657270 rsbvc r7, r5, #112, 4
30: 61656c65 cmnvs r5, r5, ror #24
34: 00296573 eoreq r6, r9, r3, ror r5
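As a cross-check between the C source and the listing, the register arithmetic can be pulled out into small helpers. This is only a sketch: the macro values are the ones from the C code above, while the helper names and the MAIL_STATUS/MAIL_WRITE macro names are made up here.

```c
#include <stdint.h>

#define REGISTERS_BASE 0x3F000000u
#define MAIL_BASE      0xB880u
#define MAIL_STATUS    0x18u        /* offset read in the polling loop */
#define MAIL_WRITE     0x20u        /* offset written after the loop */
#define MAIL_FULL      0x80000000u

/* Absolute address of a mailbox register. */
uint32_t mail_reg(uint32_t offset)
{
    return REGISTERS_BASE + MAIL_BASE + offset;
}

/* Status test used by the polling loop. */
int mail_is_full(uint32_t status)
{
    return (status & MAIL_FULL) != 0;
}

/* Value written to the mailbox: buffer address with the low 4 bits
 * replaced by the channel number. */
uint32_t mail_word(uint32_t buffer_addr, uint32_t channel)
{
    return (buffer_addr & 0xFFFFFFF0u) | (channel & 0xFu);
}

/* The mask above only preserves the address if it is 16-byte aligned. */
int buffer_is_16_aligned(uint32_t buffer_addr)
{
    return (buffer_addr & 0xFu) == 0;
}
```

mail_reg(MAIL_STATUS) is 0x3F00B898 and mail_reg(MAIL_WRITE) is 0x3F00B8A0, which match the movw/movt pairs (0xb898/0x3f00 and 0xb8a0/0x3f00) in the listing. The "cmp r3, #0" followed by "blt" is the compiler's way of testing bit 31, the same thing mail_is_full does. Note that the mask drops the low four bits, so an address like 0x8058 would come out as 0x8050 before the channel is added.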
kernel8.img
12345678
00000800
00080264
00000000
12345678
kernel8-32.img
12345678
00008320
00008224
200001DA
12345678
kernel7.img
12345678
00000700
00008224
200001DA
12345678
kernel.img
12345678
00000000
00008224
200001DA
12345678
When I wrote and posted this code, this is what I got. So if you name your file kernel.img then 0x8000 is your entry point. The answer I gave in your other SO question is a complete raspberry pi starting point. You can simply add your mailbox stuff, although if you are struggling with this I think the mailbox and video are not where you should start, IMO.
If you name the file kernel8.img then the entry point is 0x80000; change the linker script to match.
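As a compact reminder of those two cases, here is a sketch of the mapping in C. Only the two file names discussed above are covered; the function name is hypothetical and 0 is just this sketch's way of saying "not covered here".

```c
#include <stdint.h>
#include <string.h>

/* Entry point the linker script should use for a given image file name. */
uint32_t pi_entry_point(const char *filename)
{
    if (strcmp(filename, "kernel8.img") == 0)
        return 0x80000u;
    if (strcmp(filename, "kernel.img") == 0)
        return 0x8000u;
    return 0u; /* any other name: not covered by this sketch */
}
```

Whatever value applies is the one that has to appear in the linker script (or in -Ttext) so the first instruction of _start lands where the GPU bootloader jumps to.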
I have a serial port based bootloader you can use to save on the sd card dance. You can get a long way with that, then simply use the binary version of what you are creating to write to the flash once your application is working.
EDIT
Okay, this is incredibly disgusting, and by posting it here maybe that means you can't use it in your classwork... You should really do this right and not use inline assembly for your bootstrap.
so.c
asm(
".globl _start\n"
"_start:\n"
"mov sp,#0x8000\n"
"bl centry\n"
"b .\n"
);
unsigned int centry ( void )
{
return(5);
}
build
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-ld -Ttext=0x8000 so.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf -O binary kernel.img
examine
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
0000800c <centry>:
800c: e3a00005 mov r0, #5
8010: e12fff1e bx lr
A complete raspberry pi C with bootstrap example that will work on any of the flavors of pi (so far as I know; they might have changed the GPU bootloader in the last few months, but assume they didn't).
There are a couple of things I see wrong here. The most obvious ones are:
You aren't leaving anything at address 0, so the CPU is left executing blank memory at startup. You need to put something (like a branch instruction!) at 0x0.
On ARM Cortex-A, the stack pointer is not initialized at startup. You have to initialize it yourself in _start -- which means you will need to write that function in assembly.
First, kudos to old timer for his patience helping me.
The mistakes were:
Wrong entry point for the program, fixed by creating an assembly file with the label _start to set the stack pointer and using the linker to put the init section at address 0x8000
The compilation line itself was also wrong, it was missing a -c argument
I'm trying to construct a call stack on a Cortex-M3 processor (ARMv7-M architecture), no OS (bare metal). However, there is no frame pointer register for this ABI. Therefore I'm struggling to generate the call stack when I have no frame pointer.
Regardless of using -mapcs-frame, -fno-omit-frame-pointer and -O0 options with GCC, no frame pointer is kept. I'm wondering if there's a different ABI I can force GCC to use so I have a frame pointer/stack frame? If not, is there some other reliable method of generating a call stack?
Thanks in advance.
BTW wrt the comment above, the ARM calling standard is the same as Thumb, see the AAPCS (Arm calling standard). The instruction sets are different BUT the CPU register set is not.
I would prefer to ask questions in the comments, but as yet I do not have enough points.
With that in mind....
Do you have a binary that builds and executes successfully, and you are trying to dump some kind of call trace? My confusion is the 'no frame pointer register' statement: r13 is the stack pointer. I think you are referring to storing the frame pointer, though.
It has been a while but I think these are the options I used
arm-none-eabi-gcc -nostdlib -ggdb -mthumb -mcpu=cortex-m3
-mtpcs-frame -mtpcs-leaf-frame myfile.c
This was on gcc-arm-none downloaded from linaro.
gdb was able to do a backtrace with those options on an Atmel SAM3X.
The Thumb ABI was the same as the ARM EABI, or at least appears to be looking at the assembler via objdump -D .
The previous frame pointer gets stored in r7 when -fno-omit-frame-pointer is specified (or implied)
void test2(int i) {}
void main() { test2(0); }
Compiled with -fomit-frame-pointer
00008000 <test2>:
8000: b082 sub sp, #8
8002: 9001 str r0, [sp, #4]
8004: b002 add sp, #8
8006: 4770 bx lr
00008008 <main>:
8008: b508 push {r3, lr}
800a: f04f 0000 mov.w r0, #0
800e: f7ff fff7 bl 8000 <test2>
8012: bd08 pop {r3, pc}
Compiled with -fno-omit-frame-pointer
00008000 <test2>:
8000: b480 push {r7}
8002: b083 sub sp, #12
8004: af00 add r7, sp, #0
8006: 6078 str r0, [r7, #4]
8008: f107 070c add.w r7, r7, #12
800c: 46bd mov sp, r7
800e: bc80 pop {r7}
8010: 4770 bx lr
8012: bf00 nop
00008014 <main>:
8014: b580 push {r7, lr}
8016: af00 add r7, sp, #0
8018: f04f 0000 mov.w r0, #0
801c: f7ff fff0 bl 8000 <test2>
8020: bd80 pop {r7, pc}
8022: bf00 nop
So use r7 to get to the previous stack frame, then get the next r7 from that location and so on.
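A rough sketch of that walk in C follows. It assumes every function uses the "push {r7, lr}; add r7, sp, #0" prologue seen in main above, so that r7 points at the saved r7 with the saved lr right after it. That does not hold for test2 in the same listing (it pushes only r7 and then moves sp down before setting r7), so a real unwinder would need per-function knowledge or debug info. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed frame record: what r7 points at after "push {r7, lr}; add r7, sp, #0". */
struct frame_record {
    const struct frame_record *prev_fp; /* caller's r7 */
    uintptr_t return_addr;              /* saved lr */
};

/* Follows the chain of saved frame pointers, storing up to max return
 * addresses in out[]. Stops at a NULL frame pointer. Returns the count. */
size_t walk_frames(const struct frame_record *fp, uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_addr;
        fp = fp->prev_fp;
    }
    return n;
}
```

On real hardware the starting value would be the current r7, and each pointer should be checked against the stack bounds before it is dereferenced, since the outermost frame's saved r7 is not guaranteed to be zero.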
Comparing two Thumb-2 micros from two different manufacturers. One's a Cortex M3, one's an A5. Are they guaranteed to compile a particular piece of code to the same codesize?
so here goes
fun.c
unsigned int fun ( unsigned int x )
{
return(x);
}
addimm.c
extern unsigned int fun ( unsigned int );
unsigned int addimm ( unsigned int x )
{
return(fun(x)+0x123);
}
for demonstration purposes building for bare metal, not really a functional program but it compiles clean and demonstrates what I intend to demonstrate.
arm instructions
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-a5 -march=armv7-a -c addimm.c -o addimma.o
disassembly of the object, not linked
00000000 <addimm>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <fun>
8: e2800e12 add r0, r0, #288 ; 0x120
c: e2800003 add r0, r0, #3
10: e8bd8008 pop {r3, pc}
thumb generic (armv4 or v5 whatever the default was for this compiler build)
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -c addimm.c -o addimmt.o
00000000 <addimm>:
0: b508 push {r3, lr}
2: f7ff fffe bl 0 <fun>
6: 3024 adds r0, #36 ; 0x24
8: 30ff adds r0, #255 ; 0xff
a: bc08 pop {r3}
c: bc02 pop {r1}
e: 4708 bx r1
cortex-a5 specific
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-a5 -march=armv7-a -c addimm.c -o addimma5.o
00000000 <addimm>:
0: b508 push {r3, lr}
2: f7ff fffe bl 0 <fun>
6: f200 1023 addw r0, r0, #291 ; 0x123
a: bd08 pop {r3, pc}
cortex-a5 is armv7-a, which supports thumb-2. As far as the add immediate itself goes, and related to binary size, there is no optimization here: 32 bits for thumb (two 16-bit adds) and 32 bits for thumb2 (one addw). But this is only one example; there will perhaps be times when thumb2 produces smaller binaries than thumb.
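Why 0x123 comes out as one or two instructions depends on which immediates each encoding can hold. A sketch of the three checks, assuming the usual rules as I understand them: an arm data-processing immediate is an 8-bit value rotated right by an even amount, the 16-bit thumb adds takes an 8-bit immediate, and the thumb2 addw takes a 12-bit immediate. The function names are made up.

```c
#include <stdint.h>

/* Rotate left by n bits (n in 0..31). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* 1 if v can be written as an 8-bit value rotated right by an even amount. */
int arm_imm_encodable(uint32_t v)
{
    unsigned rot;
    for (rot = 0; rot < 32; rot += 2) {
        if (rol32(v, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* 1 if v fits the 8-bit immediate of the 16-bit thumb adds. */
int fits_thumb_adds(uint32_t v)
{
    return v <= 0xFFu;
}

/* 1 if v fits the 12-bit immediate of the thumb2 addw. */
int fits_thumb2_addw(uint32_t v)
{
    return v <= 0xFFFu;
}
```

0x123 fails the arm check, so the arm build adds 0x120 and then 3; it fails the 8-bit thumb check, so the thumb build adds 0x24 and then 0xff; it passes the addw check, so thumb2 needs a single 32-bit instruction.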
cortex-m3
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c addimm.c -o addimmm3.o
00000000 <addimm>:
0: b508 push {r3, lr}
2: f7ff fffe bl 0 <fun>
6: f200 1023 addw r0, r0, #291 ; 0x123
a: bd08 pop {r3, pc}
produced the same result as cortex-a5. for this simple example the machine code for this object is the same, same size, when built for cortex-a5 and cortex-m3
Now if I add a bootstrap, a main, and call this function and fill in the function it calls to create a complete, linked, program
00000000 <_start>:
0: f000 f802 bl 8 <notmain>
4: e7fe b.n 4 <_start+0x4>
...
00000008 <notmain>:
8: 2005 movs r0, #5
a: f000 b801 b.w 10 <addimm>
e: bf00 nop
00000010 <addimm>:
10: b508 push {r3, lr}
12: f000 f803 bl 1c <fun>
16: f200 1023 addw r0, r0, #291 ; 0x123
1a: bd08 pop {r3, pc}
0000001c <fun>:
1c: 4770 bx lr
1e: 46c0 nop ; (mov r8, r8)
We get a result. The addimm function itself did not change in size. with a cortex-a5 you have to have some arm code that then switches to thumb, and likely when linking with libraries, etc you may get a mixture of arm and thumb, so
00000000 <_start>:
0: eb000000 bl 8 <notmain>
4: eafffffe b 4 <_start+0x4>
00000008 <notmain>:
8: e92d4008 push {r3, lr}
c: e3a00005 mov r0, #5
10: fa000001 blx 1c <addimm>
14: e8bd4008 pop {r3, lr}
18: e12fff1e bx lr
0000001c <addimm>:
1c: b508 push {r3, lr}
1e: f000 e804 blx 28 <fun>
22: f200 1023 addw r0, r0, #291 ; 0x123
26: bd08 pop {r3, pc}
00000028 <fun>:
28: e12fff1e bx lr
overall larger binary, the addimm part itself did not change in size though.
as far as linking changing the size of the object, look at this example
bootstrap.s
.thumb
.thumb_func
.globl _start
_start:
bl notmain
hang: b hang
.thumb_func
.globl dummy
dummy:
bx lr
.code 32
.globl bounce
bounce:
bx lr
hello.c
void dummy ( void );
void bounce ( void );
void notmain ( void )
{
dummy();
bounce();
}
looking at an arm build of notmain by itself, the object:
00000000 <notmain>:
0: e92d4800 push {fp, lr}
4: e28db004 add fp, sp, #4
8: ebfffffe bl 0 <dummy>
c: ebfffffe bl 0 <bounce>
10: e24bd004 sub sp, fp, #4
14: e8bd4800 pop {fp, lr}
18: e12fff1e bx lr
depending on what is calling it and what it calls, the linker may have to add more code to deal with items that are defined outside the object, from global variables to external functions
00008000 <_start>:
8000: f000 f818 bl 8034 <__notmain_from_thumb>
00008004 <hang>:
8004: e7fe b.n 8004 <hang>
00008006 <dummy>:
8006: 4770 bx lr
00008008 <bounce>:
8008: e12fff1e bx lr
0000800c <notmain>:
800c: e92d4800 push {fp, lr}
8010: e28db004 add fp, sp, #4
8014: eb000003 bl 8028 <__dummy_from_arm>
8018: ebfffffa bl 8008 <bounce>
801c: e24bd004 sub sp, fp, #4
8020: e8bd4800 pop {fp, lr}
8024: e12fff1e bx lr
00008028 <__dummy_from_arm>:
8028: e59fc000 ldr ip, [pc] ; 8030 <__dummy_from_arm+0x8>
802c: e12fff1c bx ip
8030: 00008007 andeq r8, r0, r7
00008034 <__notmain_from_thumb>:
8034: 4778 bx pc
8036: 46c0 nop ; (mov r8, r8)
8038: eafffff3 b 800c <notmain>
803c: 00000000 andeq r0, r0, r0
__dummy_from_arm and __notmain_from_thumb were both added, an increase in the size of the binary. Each object did not change in size but the overall binary did. bounce() was an arm to arm call, no patching; dummy() was arm to thumb and notmain() was thumb to arm.
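The literal 0x00008007 in the __dummy_from_arm veneer is worth a second look: dummy lives at 0x8006, and the extra 1 in bit 0 is what tells bx to switch to thumb state. A tiny sketch of that convention, with made-up function names:

```c
#include <stdint.h>

/* Bit 0 of a bx/blx target selects the instruction set: 1 = thumb, 0 = arm. */
int target_is_thumb(uint32_t target)
{
    return (int)(target & 1u);
}

/* The actual code address is the target with bit 0 cleared. */
uint32_t target_address(uint32_t target)
{
    return target & ~1u;
}
```

So 0x8007 means "thumb code at 0x8006", while a target of 0x8008 (bounce) stays in arm state.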
So you might have a cortex-m3 object and a cortex-a5 object that, as far as the code in that object goes, are both identical. But depending on what you link them with, and eventually something is different between a cortex-m3 system and a cortex-a5 system, you may see more or less code added by the linker to account for the system differences: libraries, operating system specifics, etc. It can even depend on where in the binary you put the object: if a call has to reach further than it can with a single instruction, then the linker will add even more code.
This is all gcc specific stuff; each toolchain is going to deal with each of these problems in its own way. It is the nature of the beast when you use an object and linker model, a very good model, but the compiler, assembler, and linker have to work together to make sure that global resources can be properly accessed when linked. It has nothing to do with ARM specifically: this problem exists with many/most processor architectures, and the toolchains deal with those problems per toolchain, per version, per target architecture. When I said change the size of the object, what I really meant was the linker may add more code to the final binary in order to deal with that object and how it interacts with others.