"__floatsidf" undefined warning when compiling kernel module - c

I was writing a loadable kernel module and when trying to compile it, the linker fails with the following message:
*** Warning: "__floatsidf" [/testing/Something.ko] undefined!
I am NOT using floating point variables, so that's not it. What is the cause of such errors?
Note: I'm using Linux ubuntu kernel v. 3.5.0-23-generic

__floatsidf is a runtime routine to convert a 32-bit signed integer into a double precision floating point number. Somewhere in your project is a line that looks like:
double foo = bar;
Or something similar, where bar is a 32-bit integer. It could also be that you're calling one of the libm functions (or any other function, really) that expects a double with an integer parameter:
foo = pow(bar, baz);
where either bar or baz (or both) is an integer.
Without showing some code, there's not much more we can do to help.
To narrow it down, check the object files your compiler generates (before you link them) and look in the disassembly for a reference to that symbol - that should tell you what function it's happening in.
Here's an example of what I mean. First up - source code:
#include <math.h>
int function(int x, int y)
{
return pow(x, y);
}
Pretty straightforward, right? Now, I'm going to compile it for ARM and disassemble:
$ clang -arch arm -O2 -c -o example.o example.c
$ otool -tV example.o
example.o:
(__TEXT,__text) section
_function:
00000000 e92d40f0 push {r4, r5, r6, r7, lr}
00000004 e28d700c add r7, sp, #12
00000008 e1a04001 mov r4, r1
0000000c ebfffffb bl ___floatsidf
00000010 e1a05000 mov r5, r0
00000014 e1a00004 mov r0, r4
00000018 e1a06001 mov r6, r1
0000001c ebfffff7 bl ___floatsidf
00000020 e1a02000 mov r2, r0
00000024 e1a03001 mov r3, r1
00000028 e1a00005 mov r0, r5
0000002c e1a01006 mov r1, r6
00000030 ebfffff2 bl _pow
00000034 ebfffff1 bl ___fixdfsi
00000038 e8bd40f0 pop {r4, r5, r6, r7, lr}
0000003c e12fff1e bx lr
Look at that - two calls to __floatsidf and one to __fixdfsi, matching the two conversions of x and y to double and then the conversion of the return type back to int.

You are using floating point somewhere - your module includes a conversion from int to double. It might be as simple as calling a function that takes a double parameter and passing an int.
You could try searching your code for "double".
You could try compiling your module to assembly code and looking at that to find which function uses __floatsidf.
Remember that the use of double might be in a header file, possibly one written by someone else.

Related

Call C function from Assembly, passing args and getting the return value in the ARM calling convention

I want to call a C function, say:
int foo(int a, int b) {return 2;}
inside an assembly (ARM) code. I read that I need to mention
import foo
in my assembly code, for assembler to search for foo in C file. But, I am stuck at passing arguments a and b from assembly and retrieving an integer (here 2) again back in assembly. Could someone could explain me how to do this, with a mini example?
You have already written the minimal example.
int foo(int a, int b) {return 2;}
compile and disassemble
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
Anything to do with a and b are dead code so optimized out. While using C to learn asm is good/okay to get started you really want to do it with optimizations on which mean you have to work harder on crafting the experimental code.
int foo(int a, int b) {return 2;}
int bar ( void )
{
return(foo(5,4));
}
and we learn nothing new.
Disassembly of section .text:
00000000 <foo>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
00000008 <bar>:
8: e3a00002 mov r0, #2
c: e12fff1e bx lr
need to do this for the call:
int foo(int a, int b);
int bar ( void )
{
return(foo(5,4));
}
and now we see
00000000 <bar>:
0: e92d4010 push {r4, lr}
4: e3a01004 mov r1, #4
8: e3a00005 mov r0, #5
c: ebfffffe bl 0 <foo>
10: e8bd4010 pop {r4, lr}
14: e12fff1e bx lr
(yes this is built for the this compilers default target armv4t, should be obvious to some others have no clue how I/we know)(can also tell how new/old the compiler is from this example as well (there was an abi change years ago that is visible here)(the newer versions of gcc are worse than older so older is still good to use for some use cases))
per this compilers convention (now while this compiler does use the arm convention of some version of some document for some version of this compiler, always remember it is the compiler authors choice, they are under no obligation to conform to anyones written standard, they choose)
So we see that the first parameter goes in r0, the second in r1. You can craft functions with more operands or more types of operands to see what nuances there are. How many are in registers and when they start using the stack instead. For example try a 64 bit variable then a 32 in that order as operands then try it in reverse.
To see what is going on on the callee side.
int foo(int a, int b)
{
return((a<<1)+b+0x123);
}
We see that r0 and r1 are the first two operands, the compiler would be grossly broken otherwise.
00000000 <foo>:
0: e0810080 add r0, r1, r0, lsl #1
4: e2800e12 add r0, r0, #288 ; 0x120
8: e2800003 add r0, r0, #3
c: e12fff1e bx lr
What we did not see explicitly in the caller example is that r0 is where the return is stored (at least for this variable type).
The ABI documention is not an easy read, but if you first "just try it" then if you wish refer to the documentation it should help with the documentation. At the end of the day you have a compiler you are going to use, it has a convention and is probably part of a toolchain so you must conform to that compilers convention not some third party document (even if that third party is arm) AND you should probably use that toolchain's assembler which means you should use that assembly language (many incompatible assembly languages for arm, the tool defines the language not the target).
You can see how simple it is to figure this out on your own.
And...so this gets painful but you can look at the assembly output of the compiler, at least some will let you. With gcc you can use -save-temps or -S
int foo(int a, int b)
{
return 2;
}
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global foo
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type foo, %function
foo:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #2
bx lr
.size foo, .-foo
.ident "GCC: (15:9-2019-q4-0ubuntu1) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
Almost none of this do you "need".
The minimum looks like this
.globl foo
foo:
mov r0,#2
bx lr
.global or .globl are equivalent, somewhat reflects the age or how/when you learned gnu assembler.
Now this will break if you are mixing arm and thumb instructions, this defaults to arm.
arm-none-eabi-as x.s -o x.o
arm-none-eabi-objdump -d x.o
x.o: file format elf32-littlearm
Disassembly of section .text:
00000000 :
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
If we want thumb then we have to tell it
.thumb
.globl foo
foo:
mov r0,#2
bx lr
and we get thumb.
00000000 <foo>:
0: 2002 movs r0, #2
2: 4770 bx lr
With ARM and with the gnu toolchain at least you can mix arm and thumb and the linker will take care of the transition
int foo ( int, int );
int fun ( void )
{
return(foo(1,2));
}
we do not need a bootstrap nor other things to get the linker to link so we can see how that part of it works.
arm-none-eabi-ld so.o x.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <fun>:
8000: e92d4010 push {r4, lr}
8004: e3a01002 mov r1, #2
8008: e3a00001 mov r0, #1
800c: eb000001 bl 8018 <foo>
8010: e8bd4010 pop {r4, lr}
8014: e12fff1e bx lr
00008018 <foo>:
8018: 2002 movs r0, #2
801a: 4770 bx lr
Now this is broken not just because we have no bootstrap, etc, but there is a bl to foo but foo is thumb and the caller is arm. So for gnu assembler for arm you can take this shortcut which I think I learned from an older gcc, but whatever
.thumb
.thumb_func
.globl foo
foo:
mov r0,#2
bx lr
.thumb_func says the next label you find is considered a function label not just an address.
00008000 <fun>:
8000: e92d4010 push {r4, lr}
8004: e3a01002 mov r1, #2
8008: e3a00001 mov r0, #1
800c: eb000003 bl 8020 <__foo_from_arm>
8010: e8bd4010 pop {r4, lr}
8014: e12fff1e bx lr
00008018 <foo>:
8018: 2002 movs r0, #2
801a: 4770 bx lr
801c: 0000 movs r0, r0
...
00008020 <__foo_from_arm>:
8020: e59fc000 ldr ip, [pc] ; 8028 <__foo_from_arm+0x8>
8024: e12fff1c bx ip
8028: 00008019 .word 0x00008019
802c: 00000000 .word 0x00000000
The linker adds a trampoline as I call it, I think others call it a vaneer. Either way the toolchain took care of is so long as we write the code right.
Remember and in particular this syntax for the assembler is very much assembler specific other assemblers may have other syntax to make this work. From the gcc generated code we see the generic solution which is more typing but probably a better habit.
.thumb
.type foo, %function
.global foo
foo:
mov r0,#2
bx lr
the .type foo, %function works for both arm and thumb in gnu assembler for arm. And it does not have to be positioned just before the labe (just like .globl or .global does not either. We get the same result from the toolchain with this assembly language.
Just for demonstration...
arm-none-eabi-as x.s -o x.o
arm-none-eabi-gcc -O2 -mthumb -c so.c -o so.o
arm-none-eabi-ld so.o x.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <fun>:
8000: b510 push {r4, lr}
8002: 2102 movs r1, #2
8004: 2001 movs r0, #1
8006: f000 f807 bl 8018 <__foo_from_thumb>
800a: bc10 pop {r4}
800c: bc02 pop {r1}
800e: 4708 bx r1
00008010 <foo>:
8010: e3a00002 mov r0, #2
8014: e12fff1e bx lr
00008018 <__foo_from_thumb>:
8018: 4778 bx pc
801a: e7fd b.n 8018 <__foo_from_thumb>
801c: eafffffb b 8010 <foo>
And you can see it works both ways thumb to arm arm to thumb if we write the asm write it does the rest of the work for us.
Now I personally hate the unified syntax, it is one of the major mistakes arm has made along with CMSIS. But, you want to do this for a living you find that you pretty much hate most corporate decisions and worse, have to work/operate with them. Often the time unified syntax generates the wrong instruction and have to fiddle with the syntax to get it to work, but if I have to get a specific instruction then I have to fiddle about to get it to generate the specific instruction I am after. Other than a bootstrap and some other exceptions you do not often write assembly language anyway, usually compile something then take the compiler generated code and tune it or replace it.
I started with the arm gnu tools before unified syntax so I am used to
.thumb
.globl hello
hello:
sub r0,#1
bne hello
instead of
.thumb
.globl hello
hello:
subs r0,#1
bne hello
And fine with bouncing between the two syntaxes (unified and not, yes two assembly languages within one tool).
All of the above is with the 32 bit arm, if you are interested in 64 bit arm, AND using gnu tools, then a percentage of this still applies, you just need to use the aarch64 tools not the arm tools from gnu. ARM's aarch64 is a completely different, and incompatible, instruction set from aarch32. But gnu syntax like .global and .type...function are often used across all gnu supported targets. There are exceptions for some directives, but if you take the same approach of having the tools themselves tell you how they work...by using them...You can figure this out.
so.elf: file format elf64-littleaarch64
Disassembly of section .text:
0000000000400000 <fun>:
400000: 52800041 mov w1, #0x2 // #2
400004: 52800020 mov w0, #0x1 // #1
400008: 14000001 b 40000c <foo>
000000000040000c <foo>:
40000c: 52800040 mov w0, #0x2 // #2
400010: d65f03c0 ret
What you need to do is place the arguments in the correct registers (or on the stack) as required. All the details on how to do this are what is known as the calling convention and forms a very important part of the Application Binary Interface(ABI).
Details on the ARM (Armv7) calling convention can be found at: https://developer.arm.com/documentation/den0013/d/Application-Binary-Interfaces/Procedure-Call-Standard

What is 'veneer' that arm linker uses in function call?

I just read https://www.keil.com/support/man/docs/armlink/armlink_pge1406301797482.htm. but can't understand what a veneer is that arm linker inserts between function calls.
In "Procedure Call Standard for the ARM Architecture" document, it says,
5.3.1.1 Use of IP by the linker Both the ARM- and Thumb-state BL instructions are unable to address the full 32-bit address space, so
it may be necessary for the linker to insert a veneer between the
calling routine and the called subroutine. Veneers may also be needed
to support ARM-Thumb inter-working or dynamic linking. Any veneer
inserted must preserve the contents of all registers except IP (r12)
and the condition code flags; a conforming program must assume that a
veneer that alters IP may be inserted at any branch instruction that
is exposed to a relocation that supports inter-working or long
branches. Note R_ARM_CALL, R_ARM_JUMP24, R_ARM_PC24, R_ARM_THM_CALL,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19 are examples of the ELF
relocation types with this property. See [AAELF] for full details
Here is what I guess, is it something like this ? : when function A calls function B, and when those two functions are too far apart for the bl command to express, the linker inserts function C between function A and B in such a way function C is close to function B. Now function A uses b instruction to go to function C(copying all the registers between the function call), and function C uses bl instruction(copying all the registers too). Of course the r12 register is used to keep the remaining long jump address bits. Is this what veneer means? (I don't know why arm doesn't explain what veneer is but only what veneer provides..)
It is just a trampoline. Interworking is the easier one to demonstrate, using gnu here, but the implication is that Kiel has a solution as well.
.globl even_more
.type eve_more,%function
even_more:
bx lr
.thumb
.globl more_fun
.thumb_func
more_fun:
bx lr
extern unsigned int more_fun ( unsigned int x );
extern unsigned int even_more ( unsigned int x );
unsigned int fun ( unsigned int a )
{
return(more_fun(a)+even_more(a));
}
Unlinked object:
Disassembly of section .text:
00000000 <fun>:
0: e92d4070 push {r4, r5, r6, lr}
4: e1a05000 mov r5, r0
8: ebfffffe bl 0 <more_fun>
c: e1a04000 mov r4, r0
10: e1a00005 mov r0, r5
14: ebfffffe bl 0 <even_more>
18: e0840000 add r0, r4, r0
1c: e8bd4070 pop {r4, r5, r6, lr}
20: e12fff1e bx lr
Linked binary (yes completely unusable, but demonstrates what the tool does)
Disassembly of section .text:
00001000 <fun>:
1000: e92d4070 push {r4, r5, r6, lr}
1004: e1a05000 mov r5, r0
1008: eb000008 bl 1030 <__more_fun_from_arm>
100c: e1a04000 mov r4, r0
1010: e1a00005 mov r0, r5
1014: eb000002 bl 1024 <even_more>
1018: e0840000 add r0, r4, r0
101c: e8bd4070 pop {r4, r5, r6, lr}
1020: e12fff1e bx lr
00001024 <even_more>:
1024: e12fff1e bx lr
00001028 <more_fun>:
1028: 4770 bx lr
102a: 46c0 nop ; (mov r8, r8)
102c: 0000 movs r0, r0
...
00001030 <__more_fun_from_arm>:
1030: e59fc000 ldr r12, [pc] ; 1038 <__more_fun_from_arm+0x8>
1034: e12fff1c bx r12
1038: 00001029 .word 0x00001029
103c: 00000000 .word 0x00000000
You cannot use bl to switch modes between arm and thumb so the linker has added a trampoline as I call it or have heard it called that you hop on and off to get to the destination. In this case essentially converting the branch part of bl into a bx, the link part they take advantage of just using the bl. You can see this done for thumb to arm or arm to thumb.
The even_more function is in the same mode (ARM) so no need for the trampoline/veneer.
For the distance limit of bl lemme see. Wow, that was easy, and gnu called it a veneer as well:
.globl more_fun
.type more_fun,%function
more_fun:
bx lr
extern unsigned int more_fun ( unsigned int x );
unsigned int fun ( unsigned int a )
{
return(more_fun(a)+1);
}
MEMORY
{
bob : ORIGIN = 0x00000000, LENGTH = 0x1000
ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.some : { so.o(.text*) } > bob
.more : { more.o(.text*) } > ted
}
Disassembly of section .some:
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: eb000003 bl 18 <__more_fun_veneer>
8: e8bd4010 pop {r4, lr}
c: e2800001 add r0, r0, #1
10: e12fff1e bx lr
14: 00000000 andeq r0, r0, r0
00000018 <__more_fun_veneer>:
18: e51ff004 ldr pc, [pc, #-4] ; 1c <__more_fun_veneer+0x4>
1c: 20000000 .word 0x20000000
Disassembly of section .more:
20000000 <more_fun>:
20000000: e12fff1e bx lr
Staying in the same mode it did not need the bx.
The alternative is that you replace every bl instruction at compile time with a more complicated solution just in case you need to do a far call. Or since the bl offset/immediate is computed at link time you can, at link time, put the trampoline/veneer in to change modes or cover the distance.
You should be able to repeat this yourself with Kiel tools, all you needed to do was either switch modes on an external function call or exceed the reach of the bl instruction.
Edit
Understand that toolchains vary and even within a toolchain, gcc 3.x.x was the first to support thumb and I do not know that I saw this back then. Note the linker is part of binutils which is as separate development from gcc. You mention "arm linker", well arm has its own toolchain, then they bought Kiel and perhaps replaced Kiel's with their own or not. Then there is gnu and clang/llvm and others. So it is not a case of "arm linker" doing this or that, it is a case of the toolchains linker doing this or that and each toolchain is first free to use whatever calling convention they want there is no mandate that they have to use ARM's recommendations, second they can choose to implement this or not or simply give you a warning and you have to deal with it (likely in assembly language or through function pointers).
ARM does not need to explain it, or let us say, it is clearly explained in the Architectural Reference Manual (look at the bl instruction, the bx instruction look for the words interworking, etc. All quite clearly explained) for a particular architecture. So there is no reason to explain it again. Especially for a generic statement where the reach of bl varies and each architecture has different interworking features, it would be a long set of paragraphs or a short chapter to explain something that is already clearly documented.
Anyone implementing a compiler and linker would be well versed in the instruction set before hand and understand the bl and conditional branch and other limitations of the instruction set. Some instruction sets offer near and far jumps and some of those the assembly language for the near and far may be the same mnemonic so the assembler will often decide if it does not see the label in the same file to implement a far jump/call rather than a near one so that the objects can be linked.
In any case before linking you have to compile and assembly and the toolchain folks will have fully understood the rules of the architecture. ARM is not special here.
This is Raymond Chen's comment :
The veneer has to be close to A because B is too far away. A does a bl
to the veneer, and the veneer sets r12 to the final destination(B) and
does a bx r12. bx can reach the entire address space.
This answers to my question enough, but he doesn't want to write a full answer (maybe for lack of time..) I put it here as an answer and select it. If someone posts a better, more detailed answer, I'll switch to it.

ARM Parameter Passing

I am trying to write an ARM program that takes three numbers and calculates the discriminant. It has two source files, driver.s & prog3.s. I understand how to find the discriminate, but how do I pass the values A, B, & C into the discrim function from the main function? I have included the code I typed thus far....
MAIN() driver.s
avalue .reg r0
bvalue .req r1
cvalue .req r2
final .req r3
loopcount .req r4
readA:
.ascii “%d”
readB:
.ascii “%d”
readC:
.ascii “%d”
addressReadA: .word readA
addressReadB: .word readB
addressReadC: .word readC
main:
ldr avalue, addressReadA # load in avalue
ldr bvalue, addressReadB # load in bvalue
ldr cvalue, addressReadC # load in cvalue
DISCRIM() prog3.s
avalue .reg r0
bvalue .req r1
cvalue .req r2
final .req r3
discrim:
mul bvalue, bvalue, bvalue # square bvalue
mul avalue, avalue, #4 # multiply avalue by 4
mul cvalue, avalue, cvalue # multiply avalue by cvalue
add final, bvalue, cvalue # calculated discriminant
Going with the calling convention that C compilers use is not a bad idea, esp since if you go from pure assembly programs to C and asm mixed, you already have that experience. And/or you may see the simplicity and wisdom in the calling conventions used.
How do you know what the calling convention for a compiler is? 1) read the manual/documentation and google. 2) just try it. Prototype a function that is similar in the number of operands the type of operands and return value and feed it real-ish numbers and see what it produces.
Compiling to asm sometimes works but with pseudo instructions and other things done by the assembler I prefer to dissemble than to compile to asm YMMV.
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c );
unsigned int test ( void )
{
return(fun(1,2,3));
}
which with gnu currently results in
00000000 <test>:
0: e92d4010 push {r4, lr}
4: e3a02003 mov r2, #3
8: e3a01002 mov r1, #2
c: e3a00001 mov r0, #1
10: ebfffffe bl 0 <fun>
14: e8bd4010 pop {r4, lr}
18: e12fff1e bx lr
Each combination of compiler and target may have a different calling convention, there is no reason to assume that different compilers or versions of the same compiler use the same convention. ARM, MIPS, and no doubt others try to help/encourage/suggest a calling convention to use and some compilers simply follow that, why not.
There are lots of exceptions to the rule in the convention, but for ARM for the first up to four registers worth of parameters, in this case for up to four signed or unsigned integers or up to four less than or equal to 32 bit quantities (float can create exceptions) the first four general purposes regisers are used r0 for the first parameter r1 for the second and so on. And currently the standard keeps the stack aligned on 64 bit boundaries.
So we see that the first parameter is indeed placed in r0 the second in r1 and third in r2, obviously you dont have to arrange those three instructions in that order, doesnt matter.
because this function is calling another function it has to preserve its return value in lr so that goes on the stack, because the standard says to keep the stack aligned on 64 bit boundaries they are pushing another register on the stack r4 is arbitrary it could be any register, this is the one the tool chose.
because the standard says to return in r0, code that implements one of these functions.
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c )
{
return(a+b^c);
}
00000000 <fun>:
0: e0800001 add r0, r0, r1
4: e0200002 eor r0, r0, r2
8: e12fff1e bx lr
it is very interesting now that I see this that the compiler did not do a tail optimization on the call, it could have not saved lr and did a branch to fun, since the return value in r0 is what test() was also returning in the same register. really kind of baffled that that didnt happen.
but you can see that indeed the return value is left in r0, and per the convention we can trash r0-r3 we dont have to preserve them, and these functions are not.
if you change test to this
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c );
unsigned int test ( void )
{
return(fun(1,2,3)+7);
}
then it cant tail optimize and also shows the return register so you dont have to create a fun() function to see it.
00000000 <test>:
0: e92d4010 push {r4, lr}
4: e3a02003 mov r2, #3
8: e3a01002 mov r1, #2
c: e3a00001 mov r0, #1
10: ebfffffe bl 0 <fun>
14: e8bd4010 pop {r4, lr}
18: e2800007 add r0, r0, #7
1c: e12fff1e bx lr
you can do this kind of thing with other targets or other compilers, and there is no reason to assume that one target has the same convention as another.
Disassembly of section .text:
00000000 <fun>:
0: 0f 5e add r14, r15
2: 0f ed xor r13, r15
4: 30 41 ret
0000000000000000 <fun>:
0: 8d 04 37 lea (%rdi,%rsi,1),%eax
3: 31 d0 xor %edx,%eax
5: c3 retq
and this one is stack based instead of register based
Disassembly of section .text:
00000000 <_fun>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 1d41 0004 mov 4(r5), r1
8: 6d41 0006 add 6(r5), r1
c: 1d40 0008 mov 10(r5), r0
10: 7840 xor r1, r0
12: 1585 mov (sp)+, r5
14: 0087 rts pc
But if this is just a pure assembly project and you dont have to interface with compiled output, do whatever you want, part of designing the project is not just each individual function but how they interact, no different than C or Python or some other language you have to still define the interface for yourself between functions. Assembly doesnt make that special or different, just another language.

Why ARM gcc push register r3 and lr into stack at the beginning of a function? [duplicate]

This question already has answers here:
ARM: Why do I need to push/pop two registers at function calls?
(3 answers)
Closed 12 months ago.
I tried to write a simple test code like this(main.c):
main.c
void test(){
}
void main(){
test();
}
Then I used arm-non-eabi-gcc to compile and objdump to get the assembly code:
arm-none-eabi-gcc -g -fno-defer-pop -fomit-frame-pointer -c main.c
arm-none-eabi-objdump -S main.o > output
The assembly code will push r3 and lr registers, even the function did nothing.
main.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <test>:
void test(){
}
0: e12fff1e bx lr
00000004 <main>:
void main(){
4: e92d4008 push {r3, lr}
test();
8: ebfffffe bl 0 <test>
}
c: e8bd4008 pop {r3, lr}
10: e12fff1e bx lr
My question is why arm gcc choose to push r3 into stack, even test() function never use it? Does gcc just random choose 1 register to push?
If it's for the stack aligned(8 bytes for ARM) requirement, why not just subtract the sp? Thanks.
==================Update==========================
#KemyLand For your answer, I have another example:
The source code is:
void test1(){
}
void test(int i){
test1();
}
void main(){
test(1);
}
I use the same compile command above, then get the following assembly:
main.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <test1>:
void test1(){
}
0: e12fff1e bx lr
00000004 <test>:
void test(int i){
4: e52de004 push {lr} ; (str lr, [sp, #-4]!)
8: e24dd00c sub sp, sp, #12
c: e58d0004 str r0, [sp, #4]
test1();
10: ebfffffe bl 0 <test1>
}
14: e28dd00c add sp, sp, #12
18: e49de004 pop {lr} ; (ldr lr, [sp], #4)
1c: e12fff1e bx lr
00000020 <main>:
void main(){
20: e92d4008 push {r3, lr}
test(1);
24: e3a00001 mov r0, #1
28: ebfffffe bl 4 <test>
}
2c: e8bd4008 pop {r3, lr}
30: e12fff1e bx lr
If push {r3, lr} in first example is for use less instructions, why in this function test(), the compiler didn't just using one instruction?
push {r0, lr}
It use 3 instructions instead of 1.
push {lr}
sub sp, sp #12
str r0, [sp, #4]
By the way, why it sub sp with 12, the stack is 8-bytes aligned, it can just sub it with 4 right?
According to the Standard ARM Embedded ABI, r0 through r3 are used to pass the arguments to a function, and the return value thereof, meanwhile lr (a.k.a: r14) is the link register, whose purpose is to hold the return address for a function.
It's obvious that lr must be saved, as otherwise main() would have no way to return to its caller.
It's now notorious to mention that every single ARM instruction takes 32 bits, and as you mentioned, ARM has a call stack alignment requirement of 8 bytes. And, as a bonus, we're using the Embedded ARM ABI, so code size shall be optimized. Thus, it's more efficient to have a single 32-bit instruction both saving lr and aligning the stack by pushing an unused register (r3 is not needed, because test() does not take arguments nor it returns anything), and then pop in a single 32-bit instruction, rather than adding more instructions (and thus, wasting precious memory!) to manipulate the stack pointer.
After all, it's pretty logical to conclude this is just an optimization from GCC.

How does a linker work exactly (microcontroller context)?

I've been programming in C and C++ for quite a long time now, so I'm familiar with the linking process as a user: the preprocessor expands all prototypes and macros in each .c file which is then compiled separately into its own object file, and all object files together with static libraries are linked into an executable.
However I'd like to know more about this process: how does the linker link the object files (what do they contain anyway?)? Matching declared but undefined functions with their definitions in other files (how?)? Translating into the exact content of the program memory (context: microcontrollers)?
Application example
Ideally, I'm looking for a detailed step-by-step description of what the process is doing, based on the following simplistic example. Since it doesn't appear to be said anywhere, fame and glory to whoever answers in this way.
main.c
#include "otherfile.h"
int main(void) {
otherfile_print("Foo");
return 0;
}
otherfile.h
void otherfile_print(char const *);
otherfile.c
#include "otherfile.h"
#include <stdio.h>
void otherfile_print(char const *str) {
printf(str);
}
printf is insanely complicated, very bad for a microcontroller hello world example, blinking leds are better but that gets specific to the microcontroller. this will suffice for linking.
two.c
unsigned int glob;
unsigned int two ( unsigned int a, unsigned int b )
{
glob=5;
return(a+b+7);
}
one.c
extern unsigned int glob;
unsigned int two ( unsigned int, unsigned int );
unsigned int one ( void )
{
return(two(5,6)+glob);
}
start.s
.globl _start
_start:
bl one
b .
build everything.
% arm-none-eabi-gcc -O2 -c one.c -o one.o
% arm-none-eabi-gcc -O2 -c two.c -o two.o
% touch start.s
% arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c one.c -o one.o
% arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c two.c -o two.o
% arm-none-eabi-as start.s -o start.o
% arm-none-eabi-ld -Ttext=0x10000000 start.o one.o two.o -o onetwo.elf
now lets look...
arm-none-eabi-objdump -D start.o
...
00000000 <_start>:
0: ebfffffe bl 0 <one>
4: eafffffe b 4 <_start+0x4>
it not is the compiler/assemblers job to deal with external references so the branch link to one is left incomplete, they chose to make it a bl of 0 but they could have simply left it totally unencoded, it is up to the authors of the toolchain as to how to communicate between the compiler, assembler, and linker via object files.
Same here
00000000 <one>:
0: e92d4008 push {r3, lr}
4: e3a00005 mov r0, #5
8: e3a01006 mov r1, #6
c: ebfffffe bl 0 <two>
10: e59f300c ldr r3, [pc, #12] ; 24 <one+0x24>
14: e5933000 ldr r3, [r3]
18: e0800003 add r0, r0, r3
1c: e8bd4008 pop {r3, lr}
20: e12fff1e bx lr
24: 00000000 andeq r0, r0, r0
both the function two and the address for the global variable glob are unknown. Note that for the unknown variable the compiler generates code that requires the explicit address of the global so that the linker simply needs to fill in the address, also glob is .data not .text.
00000000 <two>:
0: e59f3010 ldr r3, [pc, #16] ; 18 <two+0x18>
4: e2811007 add r1, r1, #7
8: e3a02005 mov r2, #5
c: e0810000 add r0, r1, r0
10: e5832000 str r2, [r3]
14: e12fff1e bx lr
18: 00000000 andeq r0, r0, r0
here too the global is in .data not here, so the linker will have to place .data and the things in it and then fill in the addresses.
so here we have linked it all together, the gnu linker requires an entry point label defined _start (main is an extern address required by the standard bootstrap, which I am not using so we dont get a main not found error). Because I am not using a linker script the gnu linker places items in the binary in the order they were defined on the command line, as desired i need start first for a microcontroller since I am controlling the boot. I used a non-zero here for demonstration purposes as well...
10000000 <_start>:
10000000: eb000000 bl 10000008 <one>
10000004: eafffffe b 10000004 <_start+0x4>
10000008 <one>:
10000008: e92d4008 push {r3, lr}
1000000c: e3a00005 mov r0, #5
10000010: e3a01006 mov r1, #6
10000014: eb000005 bl 10000030 <two>
10000018: e59f300c ldr r3, [pc, #12] ; 1000002c <one+0x24>
1000001c: e5933000 ldr r3, [r3]
10000020: e0800003 add r0, r0, r3
10000024: e8bd4008 pop {r3, lr}
10000028: e12fff1e bx lr
1000002c: 1000804c andne r8, r0, ip, asr #32
10000030 <two>:
10000030: e59f3010 ldr r3, [pc, #16] ; 10000048 <two+0x18>
10000034: e2811007 add r1, r1, #7
10000038: e3a02005 mov r2, #5
1000003c: e0810000 add r0, r1, r0
10000040: e5832000 str r2, [r3]
10000044: e12fff1e bx lr
10000048: 1000804c andne r8, r0, ip, asr #32
Disassembly of section .bss:
1000804c <__bss_start>:
1000804c: 00000000 andeq r0, r0, r0
so the linker starts to place the first item start.o, it roughly figures out how big that needs to be by just putting what was there. those two instructions. they take 8 bytes so in theory the second item one.o goes next at 0x10000008. That means the encoding for the bl one in start.s can be completed to use the correct relative address (_start + 8 which is the value of the pc when executing so the offset is zero, pc+0 is the encoding)
the linker has roughly placed one.o into the binary it is building and it has to resolve the address to two and the global so it has to place two.o and then figure out where the end of that is to place in this case .bss not .data since I didnt pre-init the variable.
the label for two is at 0x10000030 so it encodes the bl two in one() for that relative offset, it has also placed glob at 1000804c for some reason (I didnt complete define where ram was so the gnu linker will do things like this). Despite the reason, that is where the linker defined the home for that global variable and where the address to glob is needed is filled in by the linker, both one() and two() needed those filled in.
So the compiler (assembler) and linker have to in the end result in a usable binary, the compiler (assembler) tend to worry about making position independent machine code and leave enough information for the linker so that it has the machine code and a list of unresolved externs that it has to fill in. compilers have improved over time, a simple model would be to have an address location like they did above for the global variables address, where the linker computes the absolute address and just fills it in, clearly above they did not encode the function call in a way that it can use an absolute address to one and two. instead it uses pc relative addressing. This means that the linker has to know the machine code encoding of the bl instruction. the current generation of gnu linker knows quite a bit more and can do some cool things resolving arm to thumb and back, stuff it didnt used to know (you dont need to compile for thumb interwork anymore the linker takes care of it).
So the linker takes binary blobs including data and...links them together into one binary. It first needs to know the actual addresses for the various things in the binary. How you tell the linker this is linker specific and not a global thing for all C/C++ toolchains. Gnu linker scripts are a programming language in and of themselves. These are not necessarily physical nor virtual addresses it is simply the address space of the code in whatever mode it is in (virtual or physical). Once the linker knows the addresses it, based on linker rules (again linker specific) it starts placing these various binary blobs into those address spaces. then it goes through and resolves the external/global addresses. It was not above but can be an iterative process. If for example the function two() was at an address in memory that cannot be accessed with a single pc relative instruction (say we put one near zero and two near 0xF0000000) then those that wrote the linker have two choices, the simple choice is to simply state that it cannot encode/implement that far of a branch and bail out and gnu linker did or still does do that. Or the other solution is the linker fixes the problem. the linker could add a few words of data within the range of the pc relative branch link and those few words of data are a trampoline for example an absolute address that is loaded into a register then a register based branch or perhaps of clever a pc relative branch if the trampoline is within range (in the case of 0x10000000 to 0xF0000000 that wouldnt work). If the linker has to add these few words then that may mean that some of the binary blobs have to move to make room for those few words and now all of the addresses in those binary blobs now have to move as well. So you have to make another pass across all the binary blobs, resolving all of the new addresses filling in the answers and for pc relative determining if you can still reach everything. Adding those few words might have made something that was reachable with a pc-relative now unreachable and now that requires a solution (error or patch).
The assembler itself for a single source file has to go through even more of these gyrations esp for a variable length instruction set like x86 where the addressing is a big vague. I recommend trying for yourself to make a simple assembler that only supports a few instructions but some of those branches. and parse and encode the instructions and compare that to an existing debugged assembler like gnu assembler.
test.s
ldr r1,locdat
nop
nop
nop
nop
nop
b over
locdat: .word 0x12345678
top:
nop
nop
nop
nop
nop
nop
over:
b top
the right answer is
00000000 <locdat-0x1c>:
0: e59f1014 ldr r1, [pc, #20] ; 1c <locdat>
4: e1a00000 nop ; (mov r0, r0)
8: e1a00000 nop ; (mov r0, r0)
c: e1a00000 nop ; (mov r0, r0)
10: e1a00000 nop ; (mov r0, r0)
14: e1a00000 nop ; (mov r0, r0)
18: ea000006 b 38 <over>
0000001c <locdat>:
1c: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
00000020 <top>:
20: e1a00000 nop ; (mov r0, r0)
24: e1a00000 nop ; (mov r0, r0)
28: e1a00000 nop ; (mov r0, r0)
2c: e1a00000 nop ; (mov r0, r0)
30: e1a00000 nop ; (mov r0, r0)
34: e1a00000 nop ; (mov r0, r0)
00000038 <over>:
38: eafffff8 b 20 <top>
there are parallels to that activity and the job of a linker. also you could fashion a simple linker based on the above files or something similar, extract the binary blobs and other info and start placing them in whatever address space you want.
Either one are fairly simple programming tasks, yet fairly educational. Having an existing toolchain that can produce the answer you can figure out where you are going wrong or how to get at the right answer.

Resources