Problems with static local variables with relocatable code - arm

I am building a project which has relocatable code on bare metal. It is a Cortex M3 embedded application. I do not have a dynamic linker and have implemented all the relocations in my startup code.
Mostly it is working but my local static variables appear to be incorrectly located. Their address is offset by the amount that my executable is offset in memory - ie I compile my code as if it is loaded at memory location 0 but I actually load it in memory located at 0x8000. The static local variable have their memory address offset by 0x8000 which is not good.
My global variables are located properly by the GOT but the static local variables are not in the GOT at all (at least they don't appear when I run readelf -r). I am compiling my code with -fpic and the linker has -fpic and -pie specified. I think I must be missing a compile and/or link option to either instruct gcc to use the GOT for the static local variables or to instruct it to use absolute addressing for them.
It seems that currently the code adds the PC to the location of the static local variables.

I think I have repeated what you are seeing:
statloc.c
unsigned int glob;
unsigned int fun ( unsigned int a )
{
static unsigned int loc;
if(a==0) loc=7;
return(a+glob+loc);
}
arm-none-linux-gnueabi-gcc -mcpu=cortex-m3 -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -fpic -pie -S statloc.c
Which gives:
.cpu cortex-m3
.fpu softvfp
.thumb
.text
.align 2
.global fun
.thumb
.thumb_func
fun:
ldr r3, .L6
.LPIC2:
add r3, pc
cbnz r0, .L5
ldr r1, .L6+4
movs r2, #7
.LPIC1:
add r1, pc
ldr ip, .L6+8
str r2, [r1, #0]
ldr r1, [r3, ip]
ldr r3, [r1, #0]
adds r0, r0, r3
adds r0, r0, r2
bx lr
.L5:
ldr ip, .L6+8
ldr r2, .L6+12
ldr r1, [r3, ip]
.LPIC0:
add r2, pc
ldr r2, [r2]
ldr r3, [r1, #0]
adds r0, r0, r3
adds r0, r0, r2
bx lr
.L7:
.align 2
.L6:
.word _GLOBAL_OFFSET_TABLE_-(.LPIC2+4)
.word .LANCHOR0-(.LPIC1+4)
.word glob(GOT)
.word .LANCHOR0-(.LPIC0+4)
.size fun, .-fun
.comm glob,4,4
.bss
.align 2
.LANCHOR0 = . + 0
.type loc.823, %object
.size loc.823, 4
loc.823:
.space 4
I also added startup code and compiled a binary and disassembled to further understand/verify what is going on.
this is the offset from the pc to the .got
ldr r3, .L6
add pc so r3 holds a position independent offset to the .got
add r3, pc
offset in the got for the address of the glob
ldr ip, .L6+8
read the absolute address for the global variable from the got
ldr r1, [r3, ip]
finally read the global variable into r3
ldr r3, [r1, #0]
this is the offset from the pc to the static local in .bss
ldr r2, .L6+12
add pc so that r2 holds a position independent offset to the static local
in .bss
add r2, pc
read the static local in .bss
ldr r2, [r2]
So if you were to change where .text is loaded and were to change where both .got and .bss are loaded relative to .text and that is it then the contents of .got would be wrong and the global variable would be loaded from the wrong place.
if you were to change where .text is loaded, leave .bss where the linker put it and move .got relative to .text. then the global would be pulled from the right place and the local would not
if you were to change where .text is loaded, change where both .got and .bss are loaded relative to .text and modify .got contents to reflect where .text is loaded, then both the local and global variable would be accessed from the right place.
So the loader and gcc/ld need to all be in sync. My immediate recommendation is to not use a static local and just use a global. That or dont worry about position independent code, it is a cortex-m3 after all and somewhat resource limited, just define the memory map up front. I assume the question is how do I make gcc use the .got for the local global, and that one I dont know the answer to, but taking a simple example like the one above you can work through the many command line options until you find one that changes the output.

This problem is also discussed here: https://answers.launchpad.net/gcc-arm-embedded/+question/236744
Starting with gcc 4.8 (ARM), there's a command line switch called -mpic-data-is-text-relative which causes static variables being addressed through the GOT as well.

Related

Static library code uses wrong address for extern struct residing in application code

really strange error on my side. I am programming a firmware for a Cortex-M4f running Nucleus RTOS. For my application I have some prebuilt static libraries (e.g. libexternal.a) which expect an global struct which must be provided by the application (declared extern inside the static library).
From within my application code I can access the struct just fine, but from within the code of the static library I always get an HardFault interrupt.
Through debugging I found that the processor tries to access the struct on the wrong address. Here is some example pseudo-code:
appconfig.c
struct AppConfiguration g_appconfig = {
/* initialize everything statically */
};
main.c
include <staticlib.h>
void main(){
g_appconfig.somemember = 1 /* this works */
staticlib_lib_init();
}
Everything below is part of the static library and prebuilt befor compiling the application.
appconfig.h
struct AppConfiguration{
uint32_t somemember;
};
staticlib.h
#include "appconfig.h"
extern struct AppConfiguration g_appconfig;
void static_lib_init();
staticlib.c
#include "staticlib.h"
void static_lib_init(){
g_appconfig.somemember = 1 /* causes a HardFault */
}
The static library is compiled with flags:
-ffunction-sections
-fdata-sections
-fno-builtin-memcpy
-DARM_MATH_CM4=1
-mthumb
-mcpu=cortex-m4
-mfloat-abi=soft
-fPIC
-Wno-unused-but-set-variable
and the application with flags:
-ffunction-sections
-fdata-sections
-fno-builtin-memcpy
-mthumb
-mcpu=cortex-m4
-g
-mfloat-abi=soft
Through debugging I discovered that the static library tries to access g_appconfig at the wrong Memory address. E.g.
g_appconfig starts at address 0x2000061c and static_lib_init() tries to access it at address 0xff7f4342. My SRAM is 320k large starting at 0x2000000 and my ROM is 2MB large starting at 0x8000000. So it makes no sense to access memory at 0xff...... (propably some hardware register or anything else).
Dissassembly of static_lib_init() looks like following:
00000000 <static_lib_init>:
0: b5f0 push {r4, r5, r6, r7, lr}
2: b08f sub sp, #60 ; 0x3c
4: af06 add r7, sp, #24
6: 6178 str r0, [r7, #20]
8: 6139 str r1, [r7, #16]
a: 60fa str r2, [r7, #12]
c: 60bb str r3, [r7, #8]
e: 4cf9 ldr r4, [pc, #996] ; (3f4 <static_lib_init+0x3f4>)
10: 447c add r4, pc
12: 4bf9 ldr r3, [pc, #996] ; (3f8 <static_lib_init+0x3f8>)
14: 58e3 ldr r3, [r4, r3]
16: 461a mov r2, r3
18: 2301 movs r3, #1
1a: f8c2 30d0 str.w r3, [r2, #208] ; 0xd0
r2 should contain the starting address of g_appconfig (#208 is the offset of the element I want to access in my realworld code), but while debugging contains 0xff7f4342.
Any idea how this could happen? Shouldn't the linker replace the address of g_appconfig in the resulting static library while linking?
I can't give you a sure-bet smoking gun answer without being able to test it myself, but my guess is the problem is that you're building your library with -fPIC, but your main application is not.
Keep in mind that when you ask the compiler to generate position independent code, you're also asking it to generate position independent data. In your case the global variable g_appconfig is going to be at a fixed location determined by the linker.
Try taking out the -fPIC option on your library build. Since you're only linking statically you don't really need it, and it's more likely to hurt your performance than help.

Where do macros such as global integers get stored in memory?

I am trying to understand where things are stored in memory such as global and static variables (.data, if not initialized to zero,) etc.
What I am trying to find/considering is a macro such as shown below:
#define thisInteger 100
Can this be found using objdump?
Additionally, if I were to then assign this to a new variable such as below, where would this be found (guessing in .data):
#define THIS_INTEGER 100
int newVariable = THIS_INTEGER;
Macros are not variables, thus they are not stored anywhere. When you do #define thisInteger 100, C preprocessor runs through the source code and replaces thisInteger with the integer literal 100. Asking where thisInteger is stored is the same as asking where 100 is stored. To verify this, try something like &thisInteger. It won't compile because &100 is illegal and makes no sense.
Can this be found using objdump?
No. Preprocessing is a copy-paste processing done before compilation.
Additionally, if I were to then assign this to a new variable such as below, where would this be found
Depends on where you define the variable.
macros are only compile time (they are preprocessed before compilation)
If you use gcc compiler you can see preprocessed C file by using -E gcc option. This preprocessed file will be used in the actual compilation.
Your preprocessed example
if the newVariable has static or thread storage duration it is initialized to this value before the main function is called
if the newVariable has an automatic storage duration it is initialized to this value when the function is called.
The compiler will source the value 100 wherever the macro is used. It is most likely found in various machine code instructions, using immediate mode addressing, e.g. when used within expression statements, like a = a + 100, or f(100).
The compiler will most likely embed small constants like this on demand within instructions involved in computing expressions like the above, so if we do a = a + thisInteger; and f(thisInteger), there will probably be two different machine code instructions that embed the constant 100 as an immediate, one for each such use. Global data takes work to address, more so than embedding small immediates, so the compiler will not attempt to share the 100 between the two uses as global or static data.
So, yes, you can see the 100 in objdump, but for many usages you probably need to look at the code (.text) section to find instructions that use #100 as an immediate operand (or #64h if printed in hex). In disassembly, you're looking for instructions like add [rbp+24], #100, or move rdi, #100.
You're right that if you declare a mutable global variable int x = thisInteger; you could find the 100 in the data (.data) section with objdump. But local variable of the same declaration would be initialized at runtime using machine code instructions, so something like mov ??, #100.
try it yourself and see
Starting point: so.c
#define THIS_INTEGER 100
int newVariable = THIS_INTEGER;
void fun0 ( void )
{
static int hello;
hello = 100;
}
int fun1 ( void )
{
int hello;
hello = 100;
return(hello);
}
the pre-processor does the search and replace for the defines
arm-none-eabi-gcc -save-temps -O2 -c so.c -o so.o
so.i
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
int newVariable = 100;
void fun0 ( void )
{
static int hello;
hello = 100;
}
int fun1 ( void )
{
int hello;
hello = 100;
return(hello);
}
You can see that THIS_INTEGER no longer exists it was just a macro/define its purpose is to keep tract of a constant in this case so that if you want to change it you can change all the relevant instances of it. But the compiler needs something it can actually compile.
The preprocessor output so.i is then fed to the actual compiler and that produces assembly: so.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun0
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type fun0, %function
fun0:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
bx lr
.size fun0, .-fun0
.align 2
.global fun1
.syntax unified
.arm
.fpu softvfp
.type fun1, %function
fun1:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #100
bx lr
.size fun1, .-fun1
.global newVariable
.data
.align 2
.type newVariable, %object
.size newVariable, 4
newVariable:
.word 100
.ident "GCC: (GNU) 9.2.0"
That is fed to the assembler and then if you disassemble that you get:
Disassembly of section .text:
00000000 <fun0>:
0: e12fff1e bx lr
00000004 <fun1>:
4: e3a00064 mov r0, #100 ; 0x64
8: e12fff1e bx lr
Disassembly of section .data:
00000000 <newVariable>:
0: 00000064
Ehh I had hoped the static would keep it there. For the global variable being initialized that makes it .data if it werent it would be .bss. Then in .data you
can see the 100 (0x64). but it has nothing to do with the macro/define the macro/define simply put the actual value 100 in the actual compiled code.
For the other case, with optimization here, there is no variable on the stack or anything like that the value is placed in the return register, so in this case it lives in a register briefly.
Had the static worked as desired which in hindsight it makes sense it didnt. I was hoping for what I call a local global. Its a local variable but adding static puts it in .bss or .data not the stack and then was hoping to see code generated to then put 100 in a variable then put that in that .data/.bss area which works unoptimized of course but that is harder to read:
Disassembly of section .text:
00000000 <fun0>:
0: e52db004 push {r11} ; (str r11, [sp, #-4]!)
4: e28db000 add r11, sp, #0
8: e59f3018 ldr r3, [pc, #24] ; 28 <fun0+0x28>
c: e3a02064 mov r2, #100 ; 0x64
10: e5832000 str r2, [r3]
14: e1a00000 nop ; (mov r0, r0)
18: e1a00003 mov r0, r3
1c: e28bd000 add sp, r11, #0
20: e49db004 pop {r11} ; (ldr r11, [sp], #4)
24: e12fff1e bx lr
28: 00000000 andeq r0, r0, r0
0000002c <fun1>:
2c: e52db004 push {r11} ; (str r11, [sp, #-4]!)
30: e28db000 add r11, sp, #0
34: e24dd00c sub sp, sp, #12
38: e3a03064 mov r3, #100 ; 0x64
3c: e50b3008 str r3, [r11, #-8]
40: e51b3008 ldr r3, [r11, #-8]
44: e1a00003 mov r0, r3
48: e28bd000 add sp, r11, #0
4c: e49db004 pop {r11} ; (ldr r11, [sp], #4)
50: e12fff1e bx lr
Disassembly of section .data:
00000000 <newVariable>:
0: 00000064 andeq r0, r0, r4, rrx
Disassembly of section .bss:
00000000 <hello.4142>:
0: 00000000 andeq r0, r0, r0
Specifically:
c: e3a02064 mov r2, #100 ; 0x64
10: e5832000 str r2, [r3]
The 100 is put in a register, then that register value is written to memory where the local global hello from fun0 lives in .bss.
macros/defines simply search and replace, the preprocessor is going to iterate as many times as needed for the various levels/layers of macros until they are all replaced, none of them exist as written in the pre-processed code. Then that is sent to the compiler.
The VALUE 100 in this case is visible in the final output but it depends on how you used it as to how it is represented or where it is stored.

Process sections: does a declaration add also something to .text? If yes, what does it add?

I have a C code like this one, that will be possibly compiled in an ELF file for ARM:
int a;
int b=1;
int foo(int x) {
int c=2;
static float d=1.5;
// .......
}
I know that all the executable code goes into the .text section, while .data , .bss and .rodata will contain the various variables/constants.
My question is: does a line like int b=1; here add also something to the .text section, or does it only tell the compiler to place a new variable initialized to 1 in .data (then probably mapped in RAM memory when deployed on the final hardware)?
Moreover, trying to decompile a similar code, I noticed that a line such as int c=2;, inside the function foo(), was adding something to the stack, but also some lines of .text where the value '2' was actually memorized there.
So, in general, does a declaration always imply also something added to .text at an assembly level? If yes, does it depends on the context (i.e. if the variable is inside a function, if it is a local global variable, ...) and what is actually added?
Thanks a lot in advance.
does a line like int b=1; here add also something to the .text section, or does it only tell the compiler to place a new variable initialized to 1 in .data (then probably mapped in RAM memory when deployed on the final hardware)?
You understand that this is likely to be implementation specific, but the likelihood is that that you will just get initialised data in the data section. Were it a constant, it might, instead go into the text section.
Moreover, trying to decompile a similar code, I noticed that a line such as int c=2;, inside the function foo(), was adding something to the stack, but also some lines of .text where the value '2' was actually memorized there.
Automatic variables that are initialised, have to be initialised each time the function's scope is entered. The space for c is reserved on the stack (or in a register, depending on the ABI) but the program has to remember the constant from which it is initialised and this is best placed somewhere in the text segment, either as a constant value or as a "move immediate" instruction.
So, in general, does a declaration always imply also something added to .text at an assembly level?
No. If a static variable is initialised to zero or null or not initialised at all, it is often just enough to reserve space in bss. If a static non constant variable is initialised to a non zero value, it will just be put in the data segment.
As #goodvibration correctly stated, only global or static variables go to the segments. This is because their lifetime is the whole execution time of the program.
Local variables have a different lifetime. They exist only during the execution of the block (e.g. function) they are defined within. If a function is called, all parameters that does not fit into registers a pushed to the stack and the return address is written to the link register.* The function saves possibly the link register and other registers at the stack and adds some space at the stack for local variables (this is the code you have observed). At the end of the function, the saved registers are poped and the the stackpointer is readjusted. In this way, you get an automatic garbage collection for local variables.
*: Please note, that this is true for (some calling conventions of) ARM only. It's different e.g. for Intel processors.
this is one of those just try it things.
int a;
int b=1;
int foo(int x) {
int c=2;
static float d=1.5;
int e;
e=x+2;
return(e);
}
first thing without optimization.
arm-none-eabi-gcc -c so.c -o so.o
arm-none-eabi-objdump -D so.o
arm-none-eabi-ld -Ttext=0x1000 -Tdata=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf > so.list
do worry about the warning, needed to link to see that everything found a home
Disassembly of section .text:
00001000 <foo>:
1000: e52db004 push {r11} ; (str r11, [sp, #-4]!)
1004: e28db000 add r11, sp, #0
1008: e24dd014 sub sp, sp, #20
100c: e50b0010 str r0, [r11, #-16]
1010: e3a03002 mov r3, #2
1014: e50b3008 str r3, [r11, #-8]
1018: e51b3010 ldr r3, [r11, #-16]
101c: e2833002 add r3, r3, #2
1020: e50b300c str r3, [r11, #-12]
1024: e51b300c ldr r3, [r11, #-12]
1028: e1a00003 mov r0, r3
102c: e28bd000 add sp, r11, #0
1030: e49db004 pop {r11} ; (ldr r11, [sp], #4)
1034: e12fff1e bx lr
Disassembly of section .data:
00002000 <b>:
2000: 00000001 andeq r0, r0, r1
00002004 <d.4102>:
2004: 3fc00000 svccc 0x00c00000
Disassembly of section .bss:
00002008 <a>:
2008: 00000000 andeq r0, r0, r0
as a disassembly it tries to disassemble data so ignore that (the andeq next to 0x2008 for example).
The a variable is global and uninitialized so it lands in .bss (typically...a compiler can choose to do whatever it wants so long as it implements the language correctly, doesnt have to have something called .bss for example, but gnu and many others do).
b is global and initialized so it lands in .data, had it been declared as const it might land in .rodata depending on the compiler and what it offers.
c is a local non-static variable that is initialized, because C offers recursion this needs to be on the stack (or managed with registers or other volatile resources), and initialized each run. We needed to compile without optimization to see this
1010: e3a03002 mov r3, #2
1014: e50b3008 str r3, [r11, #-8]
d is what I call a local global, it is a static local so it lives outside the function, not on the stack, alongside the globals but with local access only.
I added e to your example, this is a local not initialized, but then used. Had I not used it and not optimized there probably would have been space allocated for it but no initialization.
save x on the stack (per this calling convention x enters in r0)
100c: e50b0010 str r0, [r11, #-16]
then load x from the stack, add two, save as e on the stack. read e from
the stack and place in the return location for this calling convention which is r0.
1018: e51b3010 ldr r3, [r11, #-16]
101c: e2833002 add r3, r3, #2
1020: e50b300c str r3, [r11, #-12]
1024: e51b300c ldr r3, [r11, #-12]
1028: e1a00003 mov r0, r3
For all architectures, unoptimized this is somewhat typical, always read variables from the stack and put them back quickly. Other architectures have different calling conventions with respect to where the incoming parameters and outgoing return value live.
If I optmize (-O2 on the gcc line)
Disassembly of section .text:
00001000 <foo>:
1000: e2800002 add r0, r0, #2
1004: e12fff1e bx lr
Disassembly of section .data:
00002000 <b>:
2000: 00000001 andeq r0, r0, r1
Disassembly of section .bss:
00002004 <a>:
2004: 00000000 andeq r0, r0, r0
b is a global, so at the object level a global space has to be reserved for it, it is .data, optimization doesnt change that.
a is also global and still .bss, because at the object level it was declared such so allocated in case another object needs it. The linker doesnt remove these.
Now c and d are dead code they dont do anything they need no storage so
c is no longer allocated space on the stack nor is d allocated any .data
space.
We have plenty of registers for this architecture for this calling convention for this code, so e does not need any memory allocated on the
stack, it comes in in r0 the math can be done with r0 and then it is returned in r0.
I know I didnt tell the linker where to put .bss by telling it .data it put .bss in the same space without complaint. I could have put -Tbss=0x3000 for example to give it its own space or just done a linker script. Linker scripts can play havoc with the typical results, so beware.
Typical, but there might be a compiler with exceptions:
non-constant globals go in .data or .bss depending on whether they are initialized during the declaration or not.
If const then perhaps .rodata or .text depending (or .data or .bss would technically work)
non-static locals go in general purpose registers or on the stack as needed (if not completely optimized away).
static locals (if not optimized away) live with globals but are not globally accessible they just get allocated space in .data or .bss like the globals do.
parameters are governed completely by the calling convention used by that compiler for that target. Just because arm or mips or other may have written down a convention doesnt mean a compiler has to use it, only if they claim to support some convention or standard should they then attempt to comply. For a compiler to be useful it needs a convention and stick to it whatever it is, so that both caller and callee of a function know where to get parameters and to return a value. Architectures with enough registers will often have a convention where some few number of registers are used for the first so many parameters (not necessarily one to one) and then the stack is used for all other parameters. likewise a register may be used if possible for a return value. Some architectures due to lack of gprs or other, use the stack in both directions. or the stack in one and a register in the other. You are welcome to seek out the conventions and try to read them, but at the end of the day the compiler you are using, if not broken follows a convention and by setting up experiments like the one above you can see the convention in action.
Plus in this case optimizations.
void more_fun ( unsigned long long );
unsigned fun ( unsigned int x, unsigned long long y )
{
more_fun(y);
return(x+1);
}
If I told you that arm conventions typically use r0-r3 for the first few parameters you might assume that x is in r0 and r1 and r2 are used for y and we could have another small parameter before needing the stack, well
perhaps older arm, but now it wants the 64 bit variable to use an even then an odd.
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e1a04000 mov r4, r0
8: e1a01003 mov r1, r3
c: e1a00002 mov r0, r2
10: ebfffffe bl 0 <more_fun>
14: e2840001 add r0, r4, #1
18: e8bd4010 pop {r4, lr}
1c: e12fff1e bx lr
so r0 contains x, r2/r3 contain y and r1 was passed over.
the test was crafted to not have y as dead code and to pass it to another function we can see where y was stored on the way into fun and way out to more_fun. r2/r3 on the way in, needs to be in r0/r1 to call more fun.
we need to preserve x for the return from fun. one might expect that x would land on the stack, which unoptimized it would, but instead save a register that the convention has stated will be preserved by functions (r4) and use r4 throughout the function or at least in this function to store x. A performance optimization, if x needed to be touched more than once memory cycles going to the stack cost more than register accesses.
then it computes the return and cleans up the stack, registers.
IMO it is important to see this, the calling convention comes into play for some variables and others can vary based on optimization, no optimization they are what most folks are going to state off hand, .bss, .data (.text/.rodata), with optimization then it depends if if the variable survives at all.

_start definition in U-boot source

I am understanding U-boot(v2014.07).
In the start.S(at arch/arm/cpu/armv7/) file it is loading vector base address using following instructions.
ldr r0, =_start
mcr p15, 0, r0, c12, c0, 0 #Set VBAR
Can you please guide to understand where "_start" is defined. I checked in start.S and lowlevel_init.S, but I couldn't find.
Can you please guide to understand where "_start" is defined
For the ARM architecture, _start is defined as a global in arch/arm/lib/vectors.S
When disassembly the start.o file, the "ldr r0, =_start" instruction is updated as "ldr r0, [pc, #104] ; 9c " .
That should correspond to the first entry in the 32-byte ARM Exception vector, i.e.
ldr pc, _reset

How does a linker work exactly (microcontroller context)?

I've been programming in C and C++ for quite a long time now, so I'm familiar with the linking process as a user: the preprocessor expands all prototypes and macros in each .c file which is then compiled separately into its own object file, and all object files together with static libraries are linked into an executable.
However I'd like to know more about this process: how does the linker link the object files (what do they contain anyway?)? Matching declared but undefined functions with their definitions in other files (how?)? Translating into the exact content of the program memory (context: microcontrollers)?
Application example
Ideally, I'm looking for a detailed step-by-step description of what the process is doing, based on the following simplistic example. Since it doesn't appear to be said anywhere, fame and glory to whoever answers in this way.
main.c
#include "otherfile.h"
int main(void) {
otherfile_print("Foo");
return 0;
}
otherfile.h
void otherfile_print(char const *);
otherfile.c
#include "otherfile.h"
#include <stdio.h>
void otherfile_print(char const *str) {
printf(str);
}
printf is insanely complicated, very bad for a microcontroller hello world example, blinking leds are better but that gets specific to the microcontroller. this will suffice for linking.
two.c
unsigned int glob;
unsigned int two ( unsigned int a, unsigned int b )
{
glob=5;
return(a+b+7);
}
one.c
extern unsigned int glob;
unsigned int two ( unsigned int, unsigned int );
unsigned int one ( void )
{
return(two(5,6)+glob);
}
start.s
.globl _start
_start:
bl one
b .
build everything.
% arm-none-eabi-gcc -O2 -c one.c -o one.o
% arm-none-eabi-gcc -O2 -c two.c -o two.o
% touch start.s
% arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c one.c -o one.o
% arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c two.c -o two.o
% arm-none-eabi-as start.s -o start.o
% arm-none-eabi-ld -Ttext=0x10000000 start.o one.o two.o -o onetwo.elf
now lets look...
arm-none-eabi-objdump -D start.o
...
00000000 <_start>:
0: ebfffffe bl 0 <one>
4: eafffffe b 4 <_start+0x4>
it not is the compiler/assemblers job to deal with external references so the branch link to one is left incomplete, they chose to make it a bl of 0 but they could have simply left it totally unencoded, it is up to the authors of the toolchain as to how to communicate between the compiler, assembler, and linker via object files.
Same here
00000000 <one>:
0: e92d4008 push {r3, lr}
4: e3a00005 mov r0, #5
8: e3a01006 mov r1, #6
c: ebfffffe bl 0 <two>
10: e59f300c ldr r3, [pc, #12] ; 24 <one+0x24>
14: e5933000 ldr r3, [r3]
18: e0800003 add r0, r0, r3
1c: e8bd4008 pop {r3, lr}
20: e12fff1e bx lr
24: 00000000 andeq r0, r0, r0
both the function two and the address for the global variable glob are unknown. Note that for the unknown variable the compiler generates code that requires the explicit address of the global so that the linker simply needs to fill in the address, also glob is .data not .text.
00000000 <two>:
0: e59f3010 ldr r3, [pc, #16] ; 18 <two+0x18>
4: e2811007 add r1, r1, #7
8: e3a02005 mov r2, #5
c: e0810000 add r0, r1, r0
10: e5832000 str r2, [r3]
14: e12fff1e bx lr
18: 00000000 andeq r0, r0, r0
here too the global is in .data not here, so the linker will have to place .data and the things in it and then fill in the addresses.
so here we have linked it all together, the gnu linker requires an entry point label defined _start (main is an extern address required by the standard bootstrap, which I am not using so we dont get a main not found error). Because I am not using a linker script the gnu linker places items in the binary in the order they were defined on the command line, as desired i need start first for a microcontroller since I am controlling the boot. I used a non-zero here for demonstration purposes as well...
10000000 <_start>:
10000000: eb000000 bl 10000008 <one>
10000004: eafffffe b 10000004 <_start+0x4>
10000008 <one>:
10000008: e92d4008 push {r3, lr}
1000000c: e3a00005 mov r0, #5
10000010: e3a01006 mov r1, #6
10000014: eb000005 bl 10000030 <two>
10000018: e59f300c ldr r3, [pc, #12] ; 1000002c <one+0x24>
1000001c: e5933000 ldr r3, [r3]
10000020: e0800003 add r0, r0, r3
10000024: e8bd4008 pop {r3, lr}
10000028: e12fff1e bx lr
1000002c: 1000804c andne r8, r0, ip, asr #32
10000030 <two>:
10000030: e59f3010 ldr r3, [pc, #16] ; 10000048 <two+0x18>
10000034: e2811007 add r1, r1, #7
10000038: e3a02005 mov r2, #5
1000003c: e0810000 add r0, r1, r0
10000040: e5832000 str r2, [r3]
10000044: e12fff1e bx lr
10000048: 1000804c andne r8, r0, ip, asr #32
Disassembly of section .bss:
1000804c <__bss_start>:
1000804c: 00000000 andeq r0, r0, r0
so the linker starts to place the first item start.o, it roughly figures out how big that needs to be by just putting what was there. those two instructions. they take 8 bytes so in theory the second item one.o goes next at 0x10000008. That means the encoding for the bl one in start.s can be completed to use the correct relative address (_start + 8 which is the value of the pc when executing so the offset is zero, pc+0 is the encoding)
the linker has roughly placed one.o into the binary it is building and it has to resolve the address to two and the global so it has to place two.o and then figure out where the end of that is to place in this case .bss not .data since I didnt pre-init the variable.
the label for two is at 0x10000030 so it encodes the bl two in one() for that relative offset, it has also placed glob at 1000804c for some reason (I didnt complete define where ram was so the gnu linker will do things like this). Despite the reason, that is where the linker defined the home for that global variable and where the address to glob is needed is filled in by the linker, both one() and two() needed those filled in.
So the compiler (assembler) and linker have to in the end result in a usable binary, the compiler (assembler) tend to worry about making position independent machine code and leave enough information for the linker so that it has the machine code and a list of unresolved externs that it has to fill in. compilers have improved over time, a simple model would be to have an address location like they did above for the global variables address, where the linker computes the absolute address and just fills it in, clearly above they did not encode the function call in a way that it can use an absolute address to one and two. instead it uses pc relative addressing. This means that the linker has to know the machine code encoding of the bl instruction. the current generation of gnu linker knows quite a bit more and can do some cool things resolving arm to thumb and back, stuff it didnt used to know (you dont need to compile for thumb interwork anymore the linker takes care of it).
So the linker takes binary blobs including data and...links them together into one binary. It first needs to know the actual addresses for the various things in the binary. How you tell the linker this is linker specific and not a global thing for all C/C++ toolchains. Gnu linker scripts are a programming language in and of themselves. These are not necessarily physical nor virtual addresses it is simply the address space of the code in whatever mode it is in (virtual or physical). Once the linker knows the addresses it, based on linker rules (again linker specific) it starts placing these various binary blobs into those address spaces. then it goes through and resolves the external/global addresses. It was not above but can be an iterative process. If for example the function two() was at an address in memory that cannot be accessed with a single pc relative instruction (say we put one near zero and two near 0xF0000000) then those that wrote the linker have two choices, the simple choice is to simply state that it cannot encode/implement that far of a branch and bail out and gnu linker did or still does do that. Or the other solution is the linker fixes the problem. the linker could add a few words of data within the range of the pc relative branch link and those few words of data are a trampoline for example an absolute address that is loaded into a register then a register based branch or perhaps of clever a pc relative branch if the trampoline is within range (in the case of 0x10000000 to 0xF0000000 that wouldnt work). If the linker has to add these few words then that may mean that some of the binary blobs have to move to make room for those few words and now all of the addresses in those binary blobs now have to move as well. So you have to make another pass across all the binary blobs, resolving all of the new addresses filling in the answers and for pc relative determining if you can still reach everything. Adding those few words might have made something that was reachable with a pc-relative now unreachable and now that requires a solution (error or patch).
The assembler itself for a single source file has to go through even more of these gyrations esp for a variable length instruction set like x86 where the addressing is a big vague. I recommend trying for yourself to make a simple assembler that only supports a few instructions but some of those branches. and parse and encode the instructions and compare that to an existing debugged assembler like gnu assembler.
test.s
ldr r1,locdat
nop
nop
nop
nop
nop
b over
locdat: .word 0x12345678
top:
nop
nop
nop
nop
nop
nop
over:
b top
the right answer is
00000000 <locdat-0x1c>:
0: e59f1014 ldr r1, [pc, #20] ; 1c <locdat>
4: e1a00000 nop ; (mov r0, r0)
8: e1a00000 nop ; (mov r0, r0)
c: e1a00000 nop ; (mov r0, r0)
10: e1a00000 nop ; (mov r0, r0)
14: e1a00000 nop ; (mov r0, r0)
18: ea000006 b 38 <over>
0000001c <locdat>:
1c: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
00000020 <top>:
20: e1a00000 nop ; (mov r0, r0)
24: e1a00000 nop ; (mov r0, r0)
28: e1a00000 nop ; (mov r0, r0)
2c: e1a00000 nop ; (mov r0, r0)
30: e1a00000 nop ; (mov r0, r0)
34: e1a00000 nop ; (mov r0, r0)
00000038 <over>:
38: eafffff8 b 20 <top>
there are parallels to that activity and the job of a linker. also you could fashion a simple linker based on the above files or something similar, extract the binary blobs and other info and start placing them in whatever address space you want.
Either one are fairly simple programming tasks, yet fairly educational. Having an existing toolchain that can produce the answer you can figure out where you are going wrong or how to get at the right answer.

Resources