I know that there are many questions like this, but this question is not about what static and volatile mean from the C standard's point of view. I'm interested in what happens a bit lower, at the assembly level.
The static keyword gives a variable static storage duration, like a global variable. To make that real, does the compiler place those variables in the .bss section or somewhere else? Also, static prevents the variable/function from being used outside the file: does that happen only during compilation, or are there runtime checks as well?
The volatile keyword forces a variable to be read from memory, so that if something else (like a peripheral device) modifies it, the program sees exactly the value in that memory. What exactly does "read from memory" mean here? Which memory location is used: .bss, .data, or something else?
The static keyword has two meanings: (a) it conveys static storage class and (b) it conveys internal linkage. These two meanings have to be strictly distinguished.
An object having static storage class means that it is allocated at the start of the program and lives until the end of the program. This is usually achieved by placing the object into the data segment (for initialised objects) or into the bss segment (for uninitialised objects). Details may vary depending on the toolchain in question.
An identifier having internal linkage means that each identifier in the same translation unit with the same name and some linkage (i.e. the linkage is not “none”) refers to the same object as that identifier. This is usually realised by not making the symbol corresponding to the identifier a global symbol. The linker will then not recognise references of the same symbol from different translation units as referring to the same symbol.
The volatile keyword indicates that all operations performed on the volatile-qualified object in the abstract machine must be performed in the code generated. The compiler is not permitted to perform any optimisations that would discard any such operations performed on the volatile-qualified object as it usually would for non-volatile-qualified objects.
This keyword is purely an instruction to the compiler to suppress certain optimisations. It does not affect the storage class of the objects so qualified. See also my previous answer on this topic.
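As a rough illustration (a minimal sketch; the exact sections, and whether a symbol is emitted at all, depend on the toolchain and the optimisation level):

static int counter;        /* static storage class + internal linkage:
                              typically placed in .bss, symbol not exported */
int shared = 42;           /* static storage class, external linkage:
                              typically placed in .data, global symbol */
volatile int status_flag;  /* every read/write in the source must appear in
                              the generated code; storage is unaffected */

void wait_for_flag ( void )
{
    while (status_flag == 0)   /* without volatile the compiler could hoist the
                                  load out of the loop and spin forever */
        ;
}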
You can also try it and see.
C code:
static unsigned int x;
unsigned int y;
unsigned int z = 1;
static volatile unsigned int j;
static volatile const unsigned int k = 11;
void fun ( void )
{
x = 5;
y = 7;
z ++ ;
j+=2;
}
Assembler:
mov ip, #7
ldr r3, .L3
ldr r0, .L3+4
ldr r2, [r3, #4]
ldr r1, [r0]
add r2, r2, #2
add r1, r1, #1
str r1, [r0]
str r2, [r3, #4]
str ip, [r3]
bx lr
.global z
.global y
.data
.align 2
.set .LANCHOR1,. + 0
.type z, %object
.size z, 4
z:
.word 1
.type k, %object
.size k, 4
k:
.word 11
.bss
.align 2
.set .LANCHOR0,. + 0
.type y, %object
.size y, 4
y:
.space 4
.type j, %object
.size j, 4
j:
.space 4
x was not expected to survive in an example like this; if it did land anywhere, it would be in .bss, since I did not give it an initial value.
y is .bss as expected
z is .data as expected
volatile prevents j from being optimized out despite it being dead code/variable.
k could have ended up in .rodata but looks like .data here.
You guys are using fancy words, but static in C just means the scope is limited: to that function or to that file. Global vs. local, initialized or not, const or not, can all affect whether it ends up in .data, .bss, or .rodata (it could even land in .text instead of .rodata if you play the alphabet game with the (rwx) attributes in the linker script; suggestion: never use those).
volatile is taken to mean some flavor of "do not optimize out this variable/operation, do it in this order, do not move it outside the loop", etc. You can find discussions about how it is not what you think it is, and we have seen on this site that llvm/clang and gnu/gcc have different opinions on what volatile actually means, in particular when used on a pointer intended to access a control or status register in a peripheral, which (by some arguments) is what volatile was invented for, not for sharing variables between interrupts and foreground code.
Like static, volatile does not imply what segment the object lands in. It can even be used with inline assembly, asm volatile (stuff);, to tell the compiler "I do not want you to move this code around, I want it to happen right here, in this order" (which is one aspect of using it on a variable, or so we believe).
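A sketch of the peripheral-register case (0x40000000 is a made-up address for illustration, not any particular chip's register map):

void wait_ready ( void )
{
    volatile unsigned int *status = (volatile unsigned int *)0x40000000; /* hypothetical status register */
    /* volatile forces a fresh load of the register on every iteration;
       without it the compiler could read once and spin forever */
    while (((*status) & 1) == 0) continue;
}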
static unsigned int x;
void fun ( void )
{
x = 5;
}
Disassembly of section .text:
00000000 <fun>:
0: e12fff1e bx lr
No .rodata, .data, nor .bss: x was optimized away.
but
static unsigned int x;
void fun ( void )
{
x += 5;
}
Disassembly of section .text:
00000000 <fun>:
0: e59f200c ldr r2, [pc, #12] ; 14 <fun+0x14>
4: e5923000 ldr r3, [r2]
8: e2833005 add r3, r3, #5
c: e5823000 str r3, [r2]
10: e12fff1e bx lr
14: 00000000 andeq r0, r0, r0
Disassembly of section .bss:
00000000 <x>:
0: 00000000 andeq r0, r0, r0
How fun is that, ewww... let's not optimize out the dead code, let's put it in there. It is not global, nobody else can see it...
fun.c
static unsigned int x;
void fun ( void )
{
x += 5;
}
so.c
static unsigned int x;
void more_fun ( void )
{
x += 3;
}
linked
Disassembly of section .text:
00008000 <more_fun>:
8000: e59f200c ldr r2, [pc, #12] ; 8014 <more_fun+0x14>
8004: e5923000 ldr r3, [r2]
8008: e2833003 add r3, r3, #3
800c: e5823000 str r3, [r2]
8010: e12fff1e bx lr
8014: 00018030 andeq r8, r1, r0, lsr r0
00008018 <fun>:
8018: e59f200c ldr r2, [pc, #12] ; 802c <fun+0x14>
801c: e5923000 ldr r3, [r2]
8020: e2833005 add r3, r3, #5
8024: e5823000 str r3, [r2]
8028: e12fff1e bx lr
802c: 00018034 andeq r8, r1, r4, lsr r0
Disassembly of section .bss:
00018030 <x>:
18030: 00000000 andeq r0, r0, r0
00018034 <x>:
18034: 00000000 andeq r0, r0, r0
Each x is static, so as expected there are two of them... well, the expectation was that they would be optimized out, but...
and they are .bss as expected since I did not initialize them.
and on that note
static unsigned int x=3;
void fun ( void )
{
x += 5;
}
Disassembly of section .text:
00000000 <fun>:
0: e59f200c ldr r2, [pc, #12] ; 14 <fun+0x14>
4: e5923000 ldr r3, [r2]
8: e2833005 add r3, r3, #5
c: e5823000 str r3, [r2]
10: e12fff1e bx lr
14: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00000000 <x>:
0: 00000003 andeq r0, r0, r3
static const unsigned int x=3;
unsigned int fun ( void )
{
return(x);
}
Disassembly of section .text:
00000000 <fun>:
0: e3a00003 mov r0, #3
4: e12fff1e bx lr
static const unsigned int x=3;
const unsigned int y=5;
unsigned int fun ( void )
{
return(x+y);
}
Disassembly of section .text:
00000000 <fun>:
0: e3a00008 mov r0, #8
4: e12fff1e bx lr
Disassembly of section .rodata:
00000000 <y>:
0: 00000005 andeq r0, r0, r5
Okay I finally got a .rodata.
static const unsigned int x=3;
volatile const unsigned int y=5;
unsigned int fun ( void )
{
return(x+y);
}
Disassembly of section .text:
00000000 <fun>:
0: e59f3008 ldr r3, [pc, #8] ; 10 <fun+0x10>
4: e5930000 ldr r0, [r3]
8: e2800003 add r0, r0, #3
c: e12fff1e bx lr
10: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00000000 <y>:
0: 00000005 andeq r0, r0, r5
There is only so much you can do with words and their (perceived) definitions; the topic as I understand it is C vs. (generated) asm. At some point you should actually try it, and you can see how trivial that was; you do not need to write elaborate code, just gcc, objdump and sometimes ld. Hmm, I just noticed y moved from .rodata to .data in that case... That is interesting.
And this "just try it" tests the compiler and other tool authors' interpretation: things like what does register mean, what does volatile mean, etc. (and you find that they are subject to different interpretations, like so much of the C language that is implementation defined). It is important sometimes to know what your favorite/specific compiler's interpretation of the language is, but be very mindful of actual implementation-defined things (bitfields, unions, how structs are constructed; packing them causes as many problems as it solves; and so on)...
Go to the spec, read whatever definition, then go to your compiler and see how they interpreted it, then go back to the spec and see if you can figure it out.
As far as static goes, it essentially means scope: the name stays within the function or the file (well, the compile domain for a single compile operation). And volatile means "please do this in this order and please do not optimize out this item and/or its operations". In both cases it is what you used them on that determines where things end up: .text, .data, .bss, .rodata, etc.
Related
I just read https://www.keil.com/support/man/docs/armlink/armlink_pge1406301797482.htm, but I can't understand what the veneer is that the ARM linker inserts between function calls.
In "Procedure Call Standard for the ARM Architecture" document, it says,
5.3.1.1 Use of IP by the linker

Both the ARM- and Thumb-state BL instructions are unable to address the full 32-bit address space, so it may be necessary for the linker to insert a veneer between the calling routine and the called subroutine. Veneers may also be needed to support ARM-Thumb inter-working or dynamic linking. Any veneer inserted must preserve the contents of all registers except IP (r12) and the condition code flags; a conforming program must assume that a veneer that alters IP may be inserted at any branch instruction that is exposed to a relocation that supports inter-working or long branches.

Note: R_ARM_CALL, R_ARM_JUMP24, R_ARM_PC24, R_ARM_THM_CALL, R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19 are examples of the ELF relocation types with this property. See [AAELF] for full details.
Here is my guess; is it something like this? When function A calls function B, and those two functions are too far apart for the bl instruction to express, the linker inserts a function C between A and B, in such a way that C is close to B. Now function A uses a b instruction to go to function C (preserving all the registers across the call), and function C uses a bl instruction (preserving all the registers too). Of course the r12 register is used to keep the remaining long-jump address bits. Is this what a veneer is? (I don't know why ARM doesn't explain what a veneer is, only what a veneer provides.)
It is just a trampoline. Interworking is the easier one to demonstrate (using gnu tools here, but the implication is that Keil has a solution as well).
.globl even_more
.type even_more,%function
even_more:
bx lr
.thumb
.globl more_fun
.thumb_func
more_fun:
bx lr
extern unsigned int more_fun ( unsigned int x );
extern unsigned int even_more ( unsigned int x );
unsigned int fun ( unsigned int a )
{
return(more_fun(a)+even_more(a));
}
Unlinked object:
Disassembly of section .text:
00000000 <fun>:
0: e92d4070 push {r4, r5, r6, lr}
4: e1a05000 mov r5, r0
8: ebfffffe bl 0 <more_fun>
c: e1a04000 mov r4, r0
10: e1a00005 mov r0, r5
14: ebfffffe bl 0 <even_more>
18: e0840000 add r0, r4, r0
1c: e8bd4070 pop {r4, r5, r6, lr}
20: e12fff1e bx lr
Linked binary (yes completely unusable, but demonstrates what the tool does)
Disassembly of section .text:
00001000 <fun>:
1000: e92d4070 push {r4, r5, r6, lr}
1004: e1a05000 mov r5, r0
1008: eb000008 bl 1030 <__more_fun_from_arm>
100c: e1a04000 mov r4, r0
1010: e1a00005 mov r0, r5
1014: eb000002 bl 1024 <even_more>
1018: e0840000 add r0, r4, r0
101c: e8bd4070 pop {r4, r5, r6, lr}
1020: e12fff1e bx lr
00001024 <even_more>:
1024: e12fff1e bx lr
00001028 <more_fun>:
1028: 4770 bx lr
102a: 46c0 nop ; (mov r8, r8)
102c: 0000 movs r0, r0
...
00001030 <__more_fun_from_arm>:
1030: e59fc000 ldr r12, [pc] ; 1038 <__more_fun_from_arm+0x8>
1034: e12fff1c bx r12
1038: 00001029 .word 0x00001029
103c: 00000000 .word 0x00000000
You cannot use bl to switch modes between arm and thumb, so the linker has added a trampoline (as I call it, or have heard it called) that you hop on and off of to get to the destination. In this case it essentially converts the branch part of the bl into a bx; for the link part they take advantage of the bl itself. You can see this done for thumb to arm or arm to thumb.
The even_more function is in the same mode (ARM) so no need for the trampoline/veneer.
For the distance limit of bl lemme see. Wow, that was easy, and gnu called it a veneer as well:
.globl more_fun
.type more_fun,%function
more_fun:
bx lr
extern unsigned int more_fun ( unsigned int x );
unsigned int fun ( unsigned int a )
{
return(more_fun(a)+1);
}
MEMORY
{
bob : ORIGIN = 0x00000000, LENGTH = 0x1000
ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.some : { so.o(.text*) } > bob
.more : { more.o(.text*) } > ted
}
Disassembly of section .some:
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: eb000003 bl 18 <__more_fun_veneer>
8: e8bd4010 pop {r4, lr}
c: e2800001 add r0, r0, #1
10: e12fff1e bx lr
14: 00000000 andeq r0, r0, r0
00000018 <__more_fun_veneer>:
18: e51ff004 ldr pc, [pc, #-4] ; 1c <__more_fun_veneer+0x4>
1c: 20000000 .word 0x20000000
Disassembly of section .more:
20000000 <more_fun>:
20000000: e12fff1e bx lr
Staying in the same mode it did not need the bx.
The alternative is that you replace every bl instruction at compile time with a more complicated solution just in case you need to do a far call. Or since the bl offset/immediate is computed at link time you can, at link time, put the trampoline/veneer in to change modes or cover the distance.
You should be able to repeat this yourself with the Keil tools; all you need to do is either switch modes on an external function call or exceed the reach of the bl instruction.
Edit
Understand that toolchains vary, and even within a toolchain: gcc 3.x.x was the first to support thumb, and I do not know that I saw this back then. Note the linker is part of binutils, which is a separate development from gcc. You mention the "arm linker"; well, ARM has its own toolchain, then they bought Keil and perhaps replaced Keil's with their own, or not. Then there is gnu, and clang/llvm, and others. So it is not a case of "the arm linker" doing this or that, it is a case of each toolchain's linker doing this or that, and each toolchain is first free to use whatever calling convention it wants (there is no mandate that they have to use ARM's recommendations), and second it can choose to implement this or not, or simply give you a warning and you have to deal with it (likely in assembly language or through function pointers).
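For example, if a toolchain did not generate veneers for you, one way to deal with it in C is an indirect call, since a call through a function pointer becomes a register branch (blx/bx) that can reach the whole address space and handle interworking. A sketch, reusing the external more_fun from the earlier examples (fun_far is just an illustrative name):

extern unsigned int more_fun ( unsigned int x );

unsigned int fun_far ( unsigned int a )
{
    unsigned int (*fp) ( unsigned int ) = more_fun;  /* address resolved by the linker */
    return(fp(a));   /* indirect call: no range-limited bl immediate to fix up */
}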
ARM does not need to explain it, or let us say, it is clearly explained in the Architectural Reference Manual for a particular architecture (look at the bl instruction and the bx instruction, look for the word interworking, etc.; it is all quite clearly explained). So there is no reason to explain it again, especially in a generic statement where the reach of bl varies and each architecture has different interworking features; it would take a long set of paragraphs or a short chapter to explain something that is already clearly documented.
Anyone implementing a compiler and linker would be well versed in the instruction set beforehand and understand bl, the conditional branches, and the other limitations of the instruction set. Some instruction sets offer near and far jumps, and on some of those the assembly language for near and far uses the same mnemonic, so if the assembler does not see the label in the same file it will often implement a far jump/call rather than a near one so that the objects can be linked.
In any case, before linking you have to compile and assemble, and the toolchain folks will have fully understood the rules of the architecture. ARM is not special here.
This is Raymond Chen's comment:
The veneer has to be close to A because B is too far away. A does a bl to the veneer, and the veneer sets r12 to the final destination (B) and does a bx r12. bx can reach the entire address space.
This answers my question well enough, but he doesn't want to write a full answer (maybe for lack of time), so I put it here as an answer and accepted it. If someone posts a better, more detailed answer, I'll switch to it.
In C, we can use the following two examples to show the difference between a static and non-static variable:
for (int i = 0; i < 5; ++i) {
static int n = 0;
printf("%d ", ++n); // prints 1 2 3 4 5 - the value persists
}
And:
for (int i = 0; i < 5; ++i) {
int n = 0;
printf("%d ", ++n); // prints 1 1 1 1 1 - the previous value is lost
}
Source: this answer.
What would be the most basic example in assembly to show the difference between how a static or non-static variable is created? (Or does this concept not exist in assembly?)
To implement a static object in assembly, you define it in a data section (of which there are various types, involving options for initialization and modification).
To implement an automatic object in assembly, you include space for it in the stack frame of a routine.
Examples, not necessarily syntactically correct in a particular assembly language, might be:
.data
foo: .word 34 // Static object named "foo".
.text
…
lr r3, foo // Load value of foo.
and:
.text
bar: // Start of routine named "bar".
foo = 16 // Define a symbol for convenience.
add sp, sp, -CalculatedSize // Allocate stack space for local data.
…
li r3, 34 // Load immediate value into register.
sr r3, foo(sp) // Store value into space reserved for foo on stack.
…
add sp, sp, +CalculatedSize // Automatic objects are released here.
ret
These are very simplified examples (as requested). Many modern schemes for using the hardware stack include frame pointers, which are not included above.
In the second example, CalculatedSize represents some amount that includes space for registers to be saved, space for the foo object, space for arguments for subroutine calls, and whatever other stack space is needed by the routine. The offset of 16 provided for foo is part of those calculations; the author of the routine would arrange their stack frame largely as they desire.
Just try it
void more_fun ( int );
void fun0 ( void )
{
for (int i = 0; i < 500; ++i) {
static int n = 0;
more_fun(++n);
}
}
void fun1 ( void )
{
for (int i = 0; i < 500; ++i) {
int n = 0;
more_fun( ++n);
}
}
Disassembly of section .text:
00000000 <fun0>:
0: e92d4070 push {r4, r5, r6, lr}
4: e3a04f7d mov r4, #500 ; 0x1f4
8: e59f501c ldr r5, [pc, #28] ; 2c <fun0+0x2c>
c: e5953000 ldr r3, [r5]
10: e2833001 add r3, r3, #1
14: e1a00003 mov r0, r3
18: e5853000 str r3, [r5]
1c: ebfffffe bl 0 <more_fun>
20: e2544001 subs r4, r4, #1
24: 1afffff8 bne c <fun0+0xc>
28: e8bd8070 pop {r4, r5, r6, pc}
2c: 00000000
00000030 <fun1>:
30: e92d4010 push {r4, lr}
34: e3a04f7d mov r4, #500 ; 0x1f4
38: e3a00001 mov r0, #1
3c: ebfffffe bl 0 <more_fun>
40: e2544001 subs r4, r4, #1
44: 1afffffb bne 38 <fun1+0x8>
48: e8bd8010 pop {r4, pc}
Disassembly of section .bss:
00000000 <n.4158>:
0: 00000000 andeq r0, r0, r0
I like to think of static locals as local globals. They sit in .bss or .data just like globals. But from a C perspective they can only be accessed within the function/context that they were created in.
A local variable has no need for long-term storage, so it is "created" and destroyed within that function. If we do not optimize, you can see that some stack space is allocated.
00000064 <fun1>:
64: e92d4800 push {fp, lr}
68: e28db004 add fp, sp, #4
6c: e24dd008 sub sp, sp, #8
70: e3a03000 mov r3, #0
74: e50b300c str r3, [fp, #-12]
78: ea000009 b a4 <fun1+0x40>
7c: e3a03000 mov r3, #0
80: e50b3008 str r3, [fp, #-8]
84: e51b3008 ldr r3, [fp, #-8]
88: e2833001 add r3, r3, #1
8c: e50b3008 str r3, [fp, #-8]
90: e51b0008 ldr r0, [fp, #-8]
94: ebfffffe bl 0 <more_fun>
98: e51b300c ldr r3, [fp, #-12]
9c: e2833001 add r3, r3, #1
a0: e50b300c str r3, [fp, #-12]
a4: e51b300c ldr r3, [fp, #-12]
a8: e3530f7d cmp r3, #500 ; 0x1f4
ac: bafffff2 blt 7c <fun1+0x18>
b0: e1a00000 nop ; (mov r0, r0)
b4: e24bd004 sub sp, fp, #4
b8: e8bd8800 pop {fp, pc}
But optimized, for fun1 the local variable is kept in a register, which is faster than keeping it on the stack. In this solution they save the upstream value held in r4 so that r4 can be used to hold n within this function; when the function returns there is no more need for n, per the rules of the language.
For the static local, per the rules of the language, that value persists outside the function and can be accessed within it. Because it is initialized to 0 it lives in .bss, not .data (gcc, and many others). In the code above the linker will fill this value
2c: 00000000
in with the address to this
00000000 <n.4158>:
0: 00000000 andeq r0, r0, r0
IMO one could argue the implementation didn't need to treat it like a volatile and sample and save n every loop; it could have basically implemented it like the second function, but sampled it up front from memory and saved it at the end. Either way you can see the difference in an implementation of the high-level code. The non-static local only lives within the function, and then its storage and contents are essentially gone.
When a local variable is declared static, its initialization is performed only once and its lifetime is extended to the whole run of the program.
If you don't add static, the variable is re-created and re-initialized on each pass through the loop.
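A rough mental model (just a sketch of what the compiler effectively does, not literal compiler output):

void count ( void )
{
    static int n = 0;   /* initialized once, before the function ever runs; lives in .bss */
    n++;                /* the value persists across calls */
}

/* behaves roughly like a file-scope variable whose name is only visible inside count() */
static int n_hidden = 0;
void count_equiv ( void )
{
    n_hidden++;
}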
I am trying to understand where things are stored in memory, such as global and static variables (.data if not initialized to zero, etc.).
What I am trying to find (or at least considering) is a macro such as the one shown below:
#define thisInteger 100
Can this be found using objdump?
Additionally, if I were to then assign this to a new variable such as below, where would this be found (guessing in .data):
#define THIS_INTEGER 100
int newVariable = THIS_INTEGER;
Macros are not variables, thus they are not stored anywhere. When you write #define thisInteger 100, the C preprocessor runs through the source code and replaces thisInteger with the integer literal 100. Asking where thisInteger is stored is the same as asking where 100 is stored. To verify this, try something like &thisInteger. It won't compile, because &100 is illegal and makes no sense.
Can this be found using objdump?
No. Preprocessing is a copy-paste processing done before compilation.
Additionally, if I were to then assign this to a new variable such as below, where would this be found
Depends on where you define the variable.
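A quick way to check this yourself (a small sketch reusing the names from the question; the commented-out line is the one that fails to compile):

#define thisInteger 100

int newVariable = thisInteger;   /* file scope, nonzero initializer: typically ends up in .data */

void fun ( void )
{
    int local = thisInteger;     /* typically a register or a stack slot, written at run time */
    /* int *p = &thisInteger; */ /* does not compile: it expands to &100 */
    (void)local;
}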
macros are only compile time (they are preprocessed before compilation)
If you use the gcc compiler you can see the preprocessed C file by using the -E gcc option. This preprocessed file is what is used in the actual compilation. Your preprocessed example:
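Presumably (the original listing is not reproduced here) it reduces to just the literal, with the macro gone:

int newVariable = 100;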
If newVariable has static or thread storage duration, it is initialized to this value before the main function is called.
If newVariable has automatic storage duration, it is initialized to this value each time the function is called.
The compiler will emit the value 100 wherever the macro is used. It will most likely be found in various machine code instructions, using immediate-mode addressing, e.g. when the macro is used within expression statements like a = a + 100 or f(100).
The compiler will most likely embed small constants like this on demand within instructions involved in computing expressions like the above, so if we do a = a + thisInteger; and f(thisInteger), there will probably be two different machine code instructions that embed the constant 100 as an immediate, one for each such use. Global data takes work to address, more so than embedding small immediates, so the compiler will not attempt to share the 100 between the two uses as global or static data.
So, yes, you can see the 100 in objdump, but for many usages you will probably need to look at the code (.text) section to find instructions that use 100 as an immediate operand (or 0x64 if printed in hex). In a disassembly, you're looking for instructions like add [rbp+24], 100 or mov rdi, 100.
You're right that if you declare a mutable global variable int x = thisInteger; you could find the 100 in the data (.data) section with objdump. But a local variable with the same declaration would be initialized at runtime using machine code instructions, so something like mov ??, #100.
try it yourself and see
Starting point: so.c
#define THIS_INTEGER 100
int newVariable = THIS_INTEGER;
void fun0 ( void )
{
static int hello;
hello = 100;
}
int fun1 ( void )
{
int hello;
hello = 100;
return(hello);
}
the pre-processor does the search and replace for the defines
arm-none-eabi-gcc -save-temps -O2 -c so.c -o so.o
so.i
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
int newVariable = 100;
void fun0 ( void )
{
static int hello;
hello = 100;
}
int fun1 ( void )
{
int hello;
hello = 100;
return(hello);
}
You can see that THIS_INTEGER no longer exists; it was just a macro/define. Its purpose is to keep track of a constant, in this case so that if you want to change it you can change all the relevant instances at once. But the compiler needs something it can actually compile.
The preprocessor output so.i is then fed to the actual compiler and that produces assembly: so.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun0
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type fun0, %function
fun0:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
bx lr
.size fun0, .-fun0
.align 2
.global fun1
.syntax unified
.arm
.fpu softvfp
.type fun1, %function
fun1:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #100
bx lr
.size fun1, .-fun1
.global newVariable
.data
.align 2
.type newVariable, %object
.size newVariable, 4
newVariable:
.word 100
.ident "GCC: (GNU) 9.2.0"
That is fed to the assembler and then if you disassemble that you get:
Disassembly of section .text:
00000000 <fun0>:
0: e12fff1e bx lr
00000004 <fun1>:
4: e3a00064 mov r0, #100 ; 0x64
8: e12fff1e bx lr
Disassembly of section .data:
00000000 <newVariable>:
0: 00000064
Ehh, I had hoped the static would keep it there. The global variable being initialized is what makes it .data; if it weren't, it would be .bss. Then in .data you can see the 100 (0x64), but it has nothing to do with the macro/define; the macro/define simply put the actual value 100 into the code handed to the compiler.
For the other case, with optimization here, there is no variable on the stack or anything like that; the value is placed in the return register, so in this case it lives in a register briefly.
Had the static worked as desired... which in hindsight makes sense that it didn't. I was hoping for what I call a local global: it is a local variable, but adding static puts it in .bss or .data, not on the stack. I was then hoping to see code generated to put 100 into that variable in the .data/.bss area, which works unoptimized of course, but that is harder to read:
Disassembly of section .text:
00000000 <fun0>:
0: e52db004 push {r11} ; (str r11, [sp, #-4]!)
4: e28db000 add r11, sp, #0
8: e59f3018 ldr r3, [pc, #24] ; 28 <fun0+0x28>
c: e3a02064 mov r2, #100 ; 0x64
10: e5832000 str r2, [r3]
14: e1a00000 nop ; (mov r0, r0)
18: e1a00003 mov r0, r3
1c: e28bd000 add sp, r11, #0
20: e49db004 pop {r11} ; (ldr r11, [sp], #4)
24: e12fff1e bx lr
28: 00000000 andeq r0, r0, r0
0000002c <fun1>:
2c: e52db004 push {r11} ; (str r11, [sp, #-4]!)
30: e28db000 add r11, sp, #0
34: e24dd00c sub sp, sp, #12
38: e3a03064 mov r3, #100 ; 0x64
3c: e50b3008 str r3, [r11, #-8]
40: e51b3008 ldr r3, [r11, #-8]
44: e1a00003 mov r0, r3
48: e28bd000 add sp, r11, #0
4c: e49db004 pop {r11} ; (ldr r11, [sp], #4)
50: e12fff1e bx lr
Disassembly of section .data:
00000000 <newVariable>:
0: 00000064 andeq r0, r0, r4, rrx
Disassembly of section .bss:
00000000 <hello.4142>:
0: 00000000 andeq r0, r0, r0
Specifically:
c: e3a02064 mov r2, #100 ; 0x64
10: e5832000 str r2, [r3]
The 100 is put in a register, then that register value is written to memory where the local global hello from fun0 lives in .bss.
Macros/defines are simply search and replace; the preprocessor will iterate as many times as needed for the various levels/layers of macros until they are all replaced, and none of them exist as written in the preprocessed code. Then that is sent to the compiler.
The value 100 in this case is visible in the final output, but how it is represented and where it is stored depends on how you used it.
Just a question out of curiosity. In C we can initialize a variable directly or assign to it after defining it, like:
char* pStr = NULL;
or
char* pStr;
pStr = NULL;
Functionality-wise they are similar, but is there any difference after compilation? Is an extra instruction cycle required for the latter, or are modern compilers intelligent enough to optimize it away?
N.B.: I am reviewing an old codebase where the second style is used extensively. That's why I am curious whether I could gain anything real by changing the code in all those places.
The first snippet initializes the variable with a value. The second leaves it uninitialized (which, for a pointer with automatic storage duration, means its value is indeterminate) and then assigns a value.
For a non-const pointer with automatic storage duration, there should be no difference except that you may unintentionally use it before it is initialized, which would be UB.
Other things, like references (in C++) or constants, require the first style.
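A quick sketch to try with your own compiler (the function names are just for illustration; with optimization enabled, both versions typically produce identical code, e.g. mov r0, #0 / bx lr on ARM):

#include <stddef.h>

char *init_directly ( void )
{
    char *pStr = NULL;   /* initialized at the point of definition */
    return(pStr);
}

char *assign_later ( void )
{
    char *pStr;          /* indeterminate here; reading it now would be UB */
    pStr = NULL;         /* assigned before first use */
    return(pStr);
}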
depends on whether it is local or global
int hello;
int world=6;
void fun ( void )
{
int foo;
int bar=5;
foo=4;
hello=2;
}
For globals, hello would land in .bss (which may require bootstrap code to set it to zero), and code would be generated that executes at runtime to set that "variable" to 2. world would land in .data and have the initial value of 6 set at compile time; the allocated memory/data space would have that value, but there may be bootstrap code required to place that data before use.
foo and bar are ideally on the stack, which is a runtime "allocation", so in either case code is required to make room for them as well as to set them to a value at runtime. If you make them static, basically "local globals", they fall into the same category as globals, landing in .bss or .data: bar is initialized to 5 the one time, but foo is still set at runtime by the generated .text code.
Simple examples, compiled and disassembled, will show how all of this works; granted it is not trivial, since the optimizer may eliminate some of what you are looking for depending on the rest of the code. (The code above sets hello to 2; foo and bar are dead code and would be optimized out.)
00000000 <fun>:
0: e3a02002 mov r2, #2
4: e59f3004 ldr r3, [pc, #4] ; 10 <fun+0x10>
8: e5832000 str r2, [r3]
c: e12fff1e bx lr
10: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00000000 <world>:
0: 00000006 andeq r0, r0, r6
If I do a very crude link without startup code, etc to see the rest of the picture:
00001000 <fun>:
1000: e3a02002 mov r2, #2
1004: e59f3004 ldr r3, [pc, #4] ; 1010 <fun+0x10>
1008: e5832000 str r2, [r3]
100c: e12fff1e bx lr
1010: 00011018 andeq r1, r1, r8, lsl r0
Disassembly of section .data:
00011014 <__data_start>:
11014: 00000006 andeq r0, r0, r6
Disassembly of section .bss:
00011018 <__bss_start>:
11018: 00000000 andeq r0, r0, r0
we see both hello and world but foo and bar are optimized out.
For a homework assignment I have been given some C files, and compiled them using arm-linux-gcc (we will eventually be targeting Gumstix boards, but for these exercises we have been working with QEMU and ema).
One of the questions confuses me a bit-- we are told to:
Use arm-linux-objdump to find the location of variables declared in main() in the executable binary.
However, these variables are local and thus shouldn't have addresses until runtime, correct?
I'm thinking that maybe what I need to find is the offset in the stack frame, which can in fact be found using objdump (not that I know how).
Anyways, any insight into the matter would be greatly appreciated, and I would be happy to post the source code if necessary.
unsigned int one ( unsigned int, unsigned int );
unsigned int two ( unsigned int, unsigned int );
unsigned int myfun ( unsigned int x, unsigned int y, unsigned int z )
{
unsigned int a,b;
a=one(x,y);
b=two(a,z);
return(a+b);
}
compile and disassemble
arm-none-eabi-gcc -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
code created by compiler
00000000 <myfun>:
0: e92d4800 push {fp, lr}
4: e28db004 add fp, sp, #4
8: e24dd018 sub sp, sp, #24
c: e50b0010 str r0, [fp, #-16]
10: e50b1014 str r1, [fp, #-20]
14: e50b2018 str r2, [fp, #-24]
18: e51b0010 ldr r0, [fp, #-16]
1c: e51b1014 ldr r1, [fp, #-20]
20: ebfffffe bl 0 <one>
24: e50b0008 str r0, [fp, #-8]
28: e51b0008 ldr r0, [fp, #-8]
2c: e51b1018 ldr r1, [fp, #-24]
30: ebfffffe bl 0 <two>
34: e50b000c str r0, [fp, #-12]
38: e51b2008 ldr r2, [fp, #-8]
3c: e51b300c ldr r3, [fp, #-12]
40: e0823003 add r3, r2, r3
44: e1a00003 mov r0, r3
48: e24bd004 sub sp, fp, #4
4c: e8bd4800 pop {fp, lr}
50: e12fff1e bx lr
The short answer is that the memory is "allocated" both at compile time and at run time: at compile time in the sense that the compiler determines the size of the stack frame and what goes where, and at run time in the sense that the memory itself is on the stack, which is a dynamic thing. The stack frame is taken from stack memory at run time, almost like a malloc() and free().
It helps to know the calling convention: x enters in r0, y in r1, z in r2. Then x has its home at fp-16, y at fp-20, and z at fp-24. The call to one() needs x and y, so it pulls those from the stack. The result of one() goes into a, which is saved at fp-8, so that is the home for a. And so on.
The function one is not really at address 0; this is a disassembly of an object file, not a linked binary. Once the object is linked in with the rest of the objects and libraries, the missing parts, like the addresses of external functions, are patched in by the linker, and the calls to one() and two() get real addresses (and the program will likely not start at address 0).
I cheated here a little; I knew that, even with no optimizations enabled on the compiler, for a relatively simple function like this there is really no reason for a stack frame:
compile with just a little optimization
arm-none-eabi-gcc -O1 -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
and the stack frame is gone, the local variables remain in registers.
00000000 <myfun>:
0: e92d4038 push {r3, r4, r5, lr}
4: e1a05002 mov r5, r2
8: ebfffffe bl 0 <one>
c: e1a04000 mov r4, r0
10: e1a01005 mov r1, r5
14: ebfffffe bl 0 <two>
18: e0800004 add r0, r0, r4
1c: e8bd4038 pop {r3, r4, r5, lr}
20: e12fff1e bx lr
what the compiler decided to do instead is give itself more registers to work with by saving them on the stack. Why it saved r3 is a mystery, but that is another topic...
Entering the function, r0 = x, r1 = y and r2 = z per the calling convention. We can leave r0 and r1 alone (try again with one(y,x) and see what happens) since they drop right into the call to one() and are never used again. The calling convention says that r0-r3 can be destroyed by a called function, so we need to preserve z for later; we save it in r5. The result of one() comes back in r0 per the calling convention; since two() can destroy r0-r3 we need to save a for later (and we need r0 for the call to two() anyway), so r4 now holds a. We saved z in r5 (it was in r2, moved to r5) before the call to one(); we need the result of one() as the first parameter to two(), and it is already there, and we need z as the second, so we move r5, where we had saved z, into r1, then we call two(). The result of two() comes back in r0 per the calling convention. Since b + a = a + b from basic math properties, the final add before returning is r0 + r4, which is b + a, and the result goes in r0, which is the register used to return something from a function, per the convention. Clean up the stack, restore the modified registers, done.
Since myfun() makes calls to other functions using bl, and bl modifies the link register (r14), in order to be able to return from myfun() we need the value in the link register preserved from the entry into the function to the final return (bx lr), so lr is pushed on the stack. The convention states that we can destroy r0-r3 in our function but not the other registers, so r4 and r5 are pushed on the stack because we used them. Pushing r3 is not necessary from a calling-convention perspective; I wonder if it was done in anticipation of a 64-bit memory system, where making two full 64-bit writes is cheaper than one 64-bit write and one 32-bit write, but you would need to know the alignment of the stack going in, so that is just a theory. There is no reason to preserve r3 in this code.
Now take this knowledge and disassemble the code you were assigned (arm-...-objdump -D something.something) and do the same kind of analysis. In particular, for functions named main() vs. functions not named main() (I did not use main() on purpose), the stack frame can be a size that doesn't make sense, or makes less sense than for other functions. In the non-optimized case above we needed to store 6 things total: x, y, z, a, b and the link register; 6*4 = 24 bytes, which resulted in the sub sp, sp, #24. I need to think about the stack pointer vs. frame pointer thing for a bit. I think there is a command line argument to tell the compiler not to use a frame pointer, -fomit-frame-pointer, and it saves a couple of instructions:
00000000 <myfun>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e24dd01c sub sp, sp, #28
8: e58d000c str r0, [sp, #12]
c: e58d1008 str r1, [sp, #8]
10: e58d2004 str r2, [sp, #4]
14: e59d000c ldr r0, [sp, #12]
18: e59d1008 ldr r1, [sp, #8]
1c: ebfffffe bl 0 <one>
20: e58d0014 str r0, [sp, #20]
24: e59d0014 ldr r0, [sp, #20]
28: e59d1004 ldr r1, [sp, #4]
2c: ebfffffe bl 0 <two>
30: e58d0010 str r0, [sp, #16]
34: e59d2014 ldr r2, [sp, #20]
38: e59d3010 ldr r3, [sp, #16]
3c: e0823003 add r3, r2, r3
40: e1a00003 mov r0, r3
44: e28dd01c add sp, sp, #28
48: e49de004 pop {lr} ; (ldr lr, [sp], #4)
4c: e12fff1e bx lr
optimizing saves a whole lot more though...
It's going to depend on the program and how exactly they want the location of the variables. Does the question want to know which section they're stored in (.const, .bss, etc.)? Does it want specific addresses? Either way, a good start is using the objdump -S flag:
objdump -S myprogram > dump.txt
This is nice because it will print out an intermixing of your source code and the assembly with addresses. From here just do a search for your int main and that should get you started.