Labels as Values in Inline Assembly Macros - c

I'm wondering how I can load a register with a the value of a label in inline assembly.
I tried naively doing it in this fashion:
#define inline_macro(a) \
asm volatile("1: movw %[rega], #:lower16:1b\n" \
" movt %[rega], #:upper16:1b\n" \
: [rega] "+r" (a) \
: \
: \
);
int main(void){
register int i asm("r2");
register void* value asm("r1");
i++;
i += 2;
inline_macro(value);
return i;
}
However, the object dump shows this assembly code being generated:
00000000 <main>:
0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
4: e28db000 add fp, sp, #0
8: e1a03002 mov r3, r2
c: e2833001 add r3, r3, #1
10: e1a02003 mov r2, r3
14: e1a03002 mov r3, r2
18: e2833002 add r3, r3, #2
1c: e1a02003 mov r2, r3
00000020 <.L11>:
20: e3001000 movw r1, #0
24: e3401000 movt r1, #0
28: e1a03002 mov r3, r2
2c: e1a00003 mov r0, r3
30: e24bd000 sub sp, fp, #0
34: e49db004 pop {fp} ; (ldr fp, [sp], #4)
38: e12fff1e bx lr
What I want is for the value 0x20 to be loaded into register r1.

There is actually a whole instruction for getting the address of a label into a register: ADR.
Since it works by arithmetic on the PC, it's completely position-independent and doesn't have unlimited range, but neither seems an issue here - taking advantage of the assembler's "current instruction" pseudo-label ., the example in the question becomes:
#define inline_macro(a) \
asm volatile("adr %[rega], . \n" \
: [rega] "=r" (a) \
: \
: \
);

I don't know arm asm, but you're trying to get the address of the movw instruction? Does ARM really require two separate 16bit mov instructions for that?
It looks like you're disassembling the .o, where the linker hasn't replaced symbol references with their actual values, and you're seeing 0 as a placeholder.
Since I think your code doesn't depend on the old value of the register, you should use it as a write-only output operand (=r), not a read-write operand (+r).

Related

Example of a static vs automatic variable in assembly

In C, we can use the following two examples to show the difference between a static and non-static variable:
for (int i = 0; i < 5; ++i) {
static int n = 0;
printf("%d ", ++n); // prints 1 2 3 4 5 - the value persists
}
And:
for (int i = 0; i < 5; ++i) {
int n = 0;
printf("%d ", ++n); // prints 1 1 1 1 1 - the previous value is lost
}
Source: this answer.
What would be the most basic example in assembly to show the difference between how a static or non-static variable is created? (Or does this concept not exist in assembly?)
To implement a static object in assembly, you define it in a data section (of which there are various types, involving options for initialization and modification).
To implement an automatic object in assembly, you include space for it in the stack frame of a routine.
Examples, not necessarily syntactically correct in a particular assembly language, might be:
.data
foo: .word 34 // Static object named "foo".
.text
…
lr r3, foo // Load value of foo.
and:
.text
bar: // Start of routine named "bar".
foo = 16 // Define a symbol for convenience.
add sp, sp, -CalculatedSize // Allocate stack space for local data.
…
li r3, 34 // Load immediate value into register.
sr r3, foo(sp) // Store value into space reserved for foo on stack.
…
add sp, sp, +CalculatedSize // Automatic objects are released here.
ret
These are very simplified examples (as requested). Many modern schemes for using the hardware stack include frame pointers, which are not included above.
In the second example, CalculatedSize represents some amount that includes space for registers to be saved, space for the foo object, space for arguments for subroutine calls, and whatever other stack space is needed by the routine. The offset of 16 provided for foo is part of those calculations; the author of the routine would arrange their stack frame largely as they desire.
Just try it
void more_fun ( int );
void fun0 ( void )
{
for (int i = 0; i < 500; ++i) {
static int n = 0;
more_fun(++n);
}
}
void fun1 ( void )
{
for (int i = 0; i < 500; ++i) {
int n = 0;
more_fun( ++n);
}
}
Disassembly of section .text:
00000000 <fun0>:
0: e92d4070 push {r4, r5, r6, lr}
4: e3a04f7d mov r4, #500 ; 0x1f4
8: e59f501c ldr r5, [pc, #28] ; 2c <fun0+0x2c>
c: e5953000 ldr r3, [r5]
10: e2833001 add r3, r3, #1
14: e1a00003 mov r0, r3
18: e5853000 str r3, [r5]
1c: ebfffffe bl 0 <more_fun>
20: e2544001 subs r4, r4, #1
24: 1afffff8 bne c <fun0+0xc>
28: e8bd8070 pop {r4, r5, r6, pc}
2c: 00000000
00000030 <fun1>:
30: e92d4010 push {r4, lr}
34: e3a04f7d mov r4, #500 ; 0x1f4
38: e3a00001 mov r0, #1
3c: ebfffffe bl 0 <more_fun>
40: e2544001 subs r4, r4, #1
44: 1afffffb bne 38 <fun1+0x8>
48: e8bd8010 pop {r4, pc}
Disassembly of section .bss:
00000000 <n.4158>:
0: 00000000 andeq r0, r0, r0
I like to think of static locals as local globals. They sit in .bss or .data just like globals. But from a C perspective they can only be accessed within the function/context that they were created in.
I local variable has no need for long term storage, so it is "created" and destroyed within that fuction. If we were to not optimize you would see that
some stack space is allocated.
00000064 <fun1>:
64: e92d4800 push {fp, lr}
68: e28db004 add fp, sp, #4
6c: e24dd008 sub sp, sp, #8
70: e3a03000 mov r3, #0
74: e50b300c str r3, [fp, #-12]
78: ea000009 b a4 <fun1+0x40>
7c: e3a03000 mov r3, #0
80: e50b3008 str r3, [fp, #-8]
84: e51b3008 ldr r3, [fp, #-8]
88: e2833001 add r3, r3, #1
8c: e50b3008 str r3, [fp, #-8]
90: e51b0008 ldr r0, [fp, #-8]
94: ebfffffe bl 0 <more_fun>
98: e51b300c ldr r3, [fp, #-12]
9c: e2833001 add r3, r3, #1
a0: e50b300c str r3, [fp, #-12]
a4: e51b300c ldr r3, [fp, #-12]
a8: e3530f7d cmp r3, #500 ; 0x1f4
ac: bafffff2 blt 7c <fun1+0x18>
b0: e1a00000 nop ; (mov r0, r0)
b4: e24bd004 sub sp, fp, #4
b8: e8bd8800 pop {fp, pc}
But optimized for fun1 the local variable is kept in a register, faster than keeping on the stack, in this solution they save the upstream value held in r4 so
that r4 can be used to hold n within this function, when the function returns there is no more need for n per the rules of the language.
For the static local, per the rules of the language that value remains static
outside the function and can be accessed within. Because it is initialized to 0 it lives in .bss not .data (gcc, and many others). In the code above the linker
will fill this value
2c: 00000000
in with the address to this
00000000 <n.4158>:
0: 00000000 andeq r0, r0, r0
IMO one could argue the implementation didnt need to treat it like a volatile and
sample and save n every loop. Could have basically implemented it like the second
function, but sampled up front from memory and saved it in the end. Either way
you can see the difference in an implementation of the high level code. The non-static local only lives within the function and then its storage anc contents
are essentially gone.
When modifying a variable, the static local variable modified by static is executed only once, and the life cycle of the local variable is extended until the program is run.
If you don't add static, each loop reallocates the value.

Why do I get different result using log() function in C?

Here is a simple example of log() function test:
#include <stdio.h>
#include <math.h>
int main(void)
{
int a = 2;
printf("int a = %d, log((double)a) = %g, log(2.0) = %g\n", a, log((double)a), log(2.0));
return 0;
}
I get difference on Raspberry Pi 3 and Ubuntu16.04:
arm-linux-gnueabi-gcc
$ arm-linux-gnueabi-gcc -mfloat-abi=soft -march=armv7-a foo.c -o foo -lm
$ ./foo
int a = 2, log((double)a) = 5.23028e-314, log(2.0) = 0.693147
arm-linux-gnueabihf-gcc
$ arm-linux-gnueabihf-gcc -march=armv7-a foo.c -o foo -lm
$ ./foo
int a = 2, log((double)a) = 0.693147, log(2.0) = 0.693147
gcc
$ gcc foo.c -o foo -lm
$ ./foo
int a = 2, log((double)a) = 0.693147, log(2.0) = 0.693147
The standard distribution of Raspbian uses the hardware floating point support of the Raspberry Pi (Raspbian FAQ) which is not fully compatible with the different approach of using a software library to emulate floating point computation using integers only.
You can tell the type of your Raspbian distribution by looking for the directory /lib/arm-linux-gnueabihf for the hard-float version and /lib/arm-linux-gnueabi (How can I tell...) for the soft-float one.
As Pascal Cuoq noted in one of the comments to this question, it might be of interest to know that the reason for the correct result of log(2.0) in all examples is called constant folding. The compiler is allowed to compute every result at compile time—if possible—for optimization purposes. This might be an unwanted behaviour if you have for example different rounding modes in your code. GCC has -frounding-math to switch of constant folding (among other things), although it might not catch everything, so be careful here.
Not able to repeat the issue. Where is your disassembly to show the value fed to printf?
#include <math.h>
double fun1 ( void )
{
return(log(2));
}
double fun2 ( void )
{
return(log(2.0));
}
00000000 <fun1>:
0: e30309ef movw r0, #14831 ; 0x39ef
4: e3021e42 movw r1, #11842 ; 0x2e42
8: e34f0efa movt r0, #65274 ; 0xfefa
c: e3431fe6 movt r1, #16358 ; 0x3fe6
10: e12fff1e bx lr
00000014 <fun2>:
14: e30309ef movw r0, #14831 ; 0x39ef
18: e3021e42 movw r1, #11842 ; 0x2e42
1c: e34f0efa movt r0, #65274 ; 0xfefa
20: e3431fe6 movt r1, #16358 ; 0x3fe6
24: e12fff1e bx lr
00000000 <fun1>:
0: ed9f 0b01 vldr d0, [pc, #4] ; 8 <fun1+0x8>
4: 4770 bx lr
6: bf00
8: fefa39ef
c: 3fe62e42
00000010 <fun2>:
10: ed9f 0b01 vldr d0, [pc, #4] ; 18 <fun2+0x8>
14: 4770 bx lr
16: bf00
18: fefa39ef
1c: 3fe62e42
0000000000000000 <fun1>:
0: f2 0f 10 05 00 00 00 movsd 0x0(%rip),%xmm0 # 8 <fun1+0x8>
7: 00
8: c3 retq
0000000000000010 <fun2>:
10: f2 0f 10 05 00 00 00 movsd 0x0(%rip),%xmm0 # 18 <fun2+0x8>
17: 00
18: c3 retq
0000000000000000 <.LC0>:
0: ef
1: 39 fa
3: fe 42 2e
6: e6 3f
Now causing an int to float conversion vs building in the float version (2) vs (2.0) as well as adding in (2.0F). Compile time or runtime can cause differences.
Start by eliminating the printf, divide this problem in half, am I seeing some printf thing or not printf thing. then is this a compile time thing or is this a runtime thing, is this a hard float thing or a soft float thing. Is this a c library thing or not a C library thing.
What if anything have you done so far to debug this?
Eventually someone is going to link the "whatever programmer should know about floating point" whether it applies or not...
EDIT
#include <math.h>
double fun ( void )
{
return(log(2.0));
}
00000000 <fun>:
0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
4: e28db000 add fp, sp, #0
8: e30329ef movw r2, #14831 ; 0x39ef
c: e34f2efa movt r2, #65274 ; 0xfefa
10: e3023e42 movw r3, #11842 ; 0x2e42
14: e3433fe6 movt r3, #16358 ; 0x3fe6
18: ec432b17 vmov d7, r2, r3
1c: eeb00b47 vmov.f64 d0, d7
20: e24bd000 sub sp, fp, #0
24: e49db004 pop {fp} ; (ldr fp, [sp], #4)
28: e12fff1e bx lr
00000000 <fun>:
0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
4: e28db000 add fp, sp, #0
8: e30329ef movw r2, #14831 ; 0x39ef
c: e34f2efa movt r2, #65274 ; 0xfefa
10: e3023e42 movw r3, #11842 ; 0x2e42
14: e3433fe6 movt r3, #16358 ; 0x3fe6
18: e1a00002 mov r0, r2
1c: e1a01003 mov r1, r3
20: e24bd000 sub sp, fp, #0
24: e49db004 pop {fp} ; (ldr fp, [sp], #4)
28: e12fff1e bx lr
well there goes the notion of constant folding explaining why to calls to log() give vastly different results. (arguably a different version of the toolchain (or different command line arguments) you could just get lucky, so far we dont know what version of the toolchains, build options, etc were used to be able to repeat this).
EDIT 2
#include <math.h>
double fun ( void )
{
return(log(2));
}
00000000 <fun>:
0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
4: e28db000 add fp, sp, #0
8: e30329ef movw r2, #14831 ; 0x39ef
c: e34f2efa movt r2, #65274 ; 0xfefa
10: e3023e42 movw r3, #11842 ; 0x2e42
14: e3433fe6 movt r3, #16358 ; 0x3fe6
18: ec432b17 vmov d7, r2, r3
1c: eeb00b47 vmov.f64 d0, d7
20: e24bd000 sub sp, fp, #0
24: e49db004 pop {fp} ; (ldr fp, [sp], #4)
28: e12fff1e bx lr
00000000 <fun>:
0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
4: e28db000 add fp, sp, #0
8: e30329ef movw r2, #14831 ; 0x39ef
c: e34f2efa movt r2, #65274 ; 0xfefa
10: e3023e42 movw r3, #11842 ; 0x2e42
14: e3433fe6 movt r3, #16358 ; 0x3fe6
18: e1a00002 mov r0, r2
1c: e1a01003 mov r1, r3
20: e24bd000 sub sp, fp, #0
24: e49db004 pop {fp} ; (ldr fp, [sp], #4)
28: e12fff1e bx lr
around 60 seconds worth of work to contemplate constant folding maybe being a factor, so far it doesnt apply, but there is potential dumb luck there, but the same dumb luck could/would apply to both calls to log
A few seconds of work by the OP to disassemble that program would quickly cover this side topic.

This assembly code makes nothing

I'm working in a C file, which is something like this:
#define v2 0x560000a0
int main(void)
{
long int v1;
v1 = ioread32(v2);
return 0;
}
and I've extracted this part to write it in assembly:
int main ()
{
extern v1,v2;
v1=ioread32(v2);
return 0;
}
I'm trying to write the value of v2 in v1 using assembly code for armv4. Using
arm-linux-gnueabi-gcc -S -march=armv4 assembly_file.c
I get this code:
.arch armv4
.eabi_attribute 27, 3
.fpu vfpv3-d16
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "txind-rsi.c"
.text
.align 2
.global main
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
stmfd sp!, {fp, lr}
add fp, sp, #4
ldr r3, .L2
ldr r3, [r3, #0]
mov r0, r3
bl ioread32
mov r2, r0
ldr r3, .L2+4
str r2, [r3, #0]
mov r3, #0
mov r0, r3
sub sp, fp, #4
ldmfd sp!, {fp, lr}
bx lr
.L3:
.align 2
.L2:
.word v1
.word v2
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",%progbits
I use that code to put it back inside the C file this way:
asm volatile(
"stmfd sp!, {fp, lr}\n"
"add fp, sp, #4\n"
"ldr r3, =v1\n"
"ldr r3, [r3, #0]\n"
"mov r0, r2\n"
"ldr r3, =v2\n"
"str r2, [r3, #0]\n"
"sub sp, fp, #4\n"
"ldmfd sp!, {fp, lr}\n"
"bx lr"
);
The code doesn't do anything.
In fact, it stops the target working.
Does anyones know why?
EDITED:
After reading your answers I have another question:
How would I put a constant value in a register?. The code in C would be this:
#define v2 0x560000a0
int main(void)
{
long int value = 0x0000ffff;
long int v1;
v1 = ioread32(v2);
iowrite32(v2,value);
return 0;
}
I've tried this:
asm volatile("mov r3, #value");
and I get an assembler message: "value symbol is in a different section";
I've also tried
asm volatile("mov r3, #0x0000ffff);
and the assembler message is:
"invalid constant(ffff) after fixup". And after reading this: Invalid constant after fixup?
I don't know how I can put that value into r3, as it seems I can't do it with mov.I'm using armv4, not armv7. And this solution proposed in the link, doesn't work for me:
asm volatile("ldr r3, =#0000ffff\n");
There are no problems with your code, bx lr is the proper way to terminate main, no issue there. Your code is most likely crashing due to the address you are accessing, it is probably not an address you are allowed to access...
#define v2 0x560000a0
int main(void)
{
long int v1;
v1 = ioread32(v2);
return 0;
}
if you optimize on the C compile step you can see a cleaner, simpler version
00000000 <main>:
0: e92d4008 push {r3, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <main+0x14>
8: ebfffffe bl 0 <ioread32>
c: e3a00000 mov r0, #0
10: e8bd8008 pop {r3, pc}
14: 560000a0 strpl r0, [r0], -r0, lsr #1
which is not hard to implement in asm.
.globl main
main:
push {r3,lr}
ldr r0,=0x560000A0
bl ioread32
mov r0,#0
pop {r3,pc}
assembled and disassembled
00000000 <main>:
0: e92d4008 push {r3, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <main+0x14>
8: ebfffffe bl 0 <ioread32>
c: e3a00000 mov r0, #0
10: e8bd8008 pop {r3, pc}
14: 560000a0 strpl r0, [r0], -r0, lsr #1
you had simplified the code to the point that v1 becomes dead code, but the function call cannot be optimized out, so the return is discarded.
if you dont use main but create a separate function that returns
#define v2 0x560000a0
long int fun(void)
{
long int v1;
v1 = ioread32(v2);
return v1;
}
...damn...tail optimization:
00000000 <fun>:
0: e59f0000 ldr r0, [pc] ; 8 <fun+0x8>
4: eafffffe b 0 <ioread32>
8: 560000a0 strpl r0, [r0], -r0, lsr #1
Oh well. You are on the right path, see what the C compiler generates then mimic or modify that. I suspect it is your ioread that is the problem and not the outer structure (why are you doing a 64 bit thing with a 32 bit read, maybe that is the problem long it is most likely going to be implemented as 64 bit).
The bx lr at the end would try to return from the current function. You probably don't want that. I guess that is also the reason for the crash, given that you don't let the compiler undo whatever setup it has done in the function prologue.
Also, in your asm block you don't use any locals so you don't need to adjust the stack pointer, and you don't need a new stack frame either. The mov r0, r2 is also broken because you have loaded the value into r3. You can delete the mov if you just load directly into whatever register you want. Furthermore, in gcc inline asm you should tell the compiler what registers you modify. Not doing so can also lead to a crash because you might overwrite values the compiler relies upon.
Not sure what's the point in doing this in assembly. If you insist on that, the following should work somewhat better:
asm volatile(
"ldr r3, =v1\n"
"ldr r2, [r3, #0]\n"
"ldr r3, =v2\n"
"str r2, [r3, #0]\n" ::: "r2", "r3"
);

arm-none-eabi-gcc C Pointers

So working with C in the arm-none-eabi-gcc. I have been having an issue with pointers, they don't seem to exists. Perhaps I'm passing the wrong cmds to the compiler.
Here is an example.
unsigned int * gpuPointer = GetGPU_Pointer(framebufferAddress);
unsigned int color = 16;
int y = 768;
int x = 1024;
while(y >= 0)
{
while(x >= 0)
{
*gpuPointer = color;
color = color + 2;
x--;
}
color++;
y--;
x = 1024;
}
and the output from the disassembler.
81c8: ebffffc3 bl 80dc <GetGPU_Pointer>
81cc: e3a0c010 mov ip, #16 ; 0x10
81d0: e28c3b02 add r3, ip, #2048 ; 0x800
81d4: e2833002 add r3, r3, #2 ; 0x2
81d8: e1a03803 lsl r3, r3, #16
81dc: e1a01823 lsr r1, r3, #16
81e0: e1a0300c mov r3, ip
81e4: e1a02003 mov r2, r3
81e8: e2833002 add r3, r3, #2 ; 0x2
81ec: e1a03803 lsl r3, r3, #16
81f0: e1a03823 lsr r3, r3, #16
81f4: e1530001 cmp r3, r1
81f8: 1afffff9 bne 81e4 <setup_framebuffer+0x5c>
Shouldn't there be a str cmd around 81e4? To add further the GetGPU_Pointer is coming from an assembler file but there is a declaration as so.
extern unsigned int * GetGPU_Pointer(unsigned int framebufferAddress);
My gut feeling is its something absurdly simple but I'm missing it.
You never change the value of gpuPointer and you haven't declared it to point to a volatile. So from the compiler's perspective you are overwriting a single memory location (*gpuPointer) 768*1024 times, but since you never use the value you are writing into it, the compiler is entitled to optimize by doing a single write at the end of the loop.
Adding to rici's answer (upvote rici not me)...
It gets even better, taking what you offered and wrapping it
extern unsigned int * GetGPU_Pointer ( unsigned int );
void fun ( unsigned int framebufferAddress )
{
unsigned int * gpuPointer = GetGPU_Pointer(framebufferAddress);
unsigned int color = 16;
int y = 768;
int x = 1024;
while(y >= 0)
{
while(x >= 0)
{
*gpuPointer = color;
color = color + 2;
x--;
}
color++;
y--;
x = 1024;
}
}
Optimizes to
00000000 <fun>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <GetGPU_Pointer>
8: e59f3008 ldr r3, [pc, #8] ; 18 <fun+0x18>
c: e5803000 str r3, [r0]
10: e8bd4008 pop {r3, lr}
14: e12fff1e bx lr
18: 00181110 andseq r1, r8, r0, lsl r1
because the code really doesnt do anything but that one store.
Now if you were to modify the pointer
while(x >= 0)
{
*gpuPointer = color;
gpuPointer++;
color = color + 2;
x--;
}
then you get the store you were looking for
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: ebfffffe bl 0 <GetGPU_Pointer>
8: e59f403c ldr r4, [pc, #60] ; 4c <fun+0x4c>
c: e1a02000 mov r2, r0
10: e3a0c010 mov ip, #16
14: e2820a01 add r0, r2, #4096 ; 0x1000
18: e2801004 add r1, r0, #4
1c: e1a0300c mov r3, ip
20: e4823004 str r3, [r2], #4
24: e1520001 cmp r2, r1
28: e2833002 add r3, r3, #2
2c: 1afffffb bne 20 <fun+0x20>
30: e28ccb02 add ip, ip, #2048 ; 0x800
34: e28cc003 add ip, ip, #3
38: e15c0004 cmp ip, r4
3c: e2802004 add r2, r0, #4
40: 1afffff3 bne 14 <fun+0x14>
44: e8bd4010 pop {r4, lr}
48: e12fff1e bx lr
4c: 00181113 andseq r1, r8, r3, lsl r1
or if you make it volatile (and then dont have to modify it)
volatile unsigned int * gpuPointer = GetGPU_Pointer(framebufferAddress);
then
00000000 <fun>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <GetGPU_Pointer>
8: e59fc02c ldr ip, [pc, #44] ; 3c <fun+0x3c>
c: e3a03010 mov r3, #16
10: e2831b02 add r1, r3, #2048 ; 0x800
14: e2812002 add r2, r1, #2
18: e5803000 str r3, [r0]
1c: e2833002 add r3, r3, #2
20: e1530002 cmp r3, r2
24: 1afffffb bne 18 <fun+0x18>
28: e2813003 add r3, r1, #3
2c: e153000c cmp r3, ip
30: 1afffff6 bne 10 <fun+0x10>
34: e8bd4008 pop {r3, lr}
38: e12fff1e bx lr
3c: 00181113 andseq r1, r8, r3, lsl r1
then you get your store
arm-none-eabi-gcc -O2 -c a.c -o a.o
arm-none-eabi-objdump -D a.o
arm-none-eabi-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The problem is, as written, you didnt tell the compiler to update the pointer more than the one time. So as in my first example it has no reason to even implement the loop, it can pre-compute the answer and write it one time. In order to force the compiler to implement the loop and write to the pointer more than one time, you either need to make it volatile and/or modify it, depends on what you were really needing to do.

Local variable location in memory

For a homework assignment I have been given some c files, and compiled them using arm-linux-gcc (we will eventually be targeting gumstix boards, but for these exercises we have been working with qemu and ema).
One of the questions confuses me a bit-- we are told to:
Use arm-linux-objdump to find the location of variables declared in main() in the executable binary.
However, these variables are local and thus shouldn't have addresses until runtime, correct?
I'm thinking that maybe what I need to find is the offset in the stack frame, which can in fact be found using objdump (not that I know how).
Anyways, any insight into the matter would be greatly appreciated, and I would be happy to post the source code if necessary.
unsigned int one ( unsigned int, unsigned int );
unsigned int two ( unsigned int, unsigned int );
unsigned int myfun ( unsigned int x, unsigned int y, unsigned int z )
{
unsigned int a,b;
a=one(x,y);
b=two(a,z);
return(a+b);
}
compile and disassemble
arm-none-eabi-gcc -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
code created by compiler
00000000 <myfun>:
0: e92d4800 push {fp, lr}
4: e28db004 add fp, sp, #4
8: e24dd018 sub sp, sp, #24
c: e50b0010 str r0, [fp, #-16]
10: e50b1014 str r1, [fp, #-20]
14: e50b2018 str r2, [fp, #-24]
18: e51b0010 ldr r0, [fp, #-16]
1c: e51b1014 ldr r1, [fp, #-20]
20: ebfffffe bl 0 <one>
24: e50b0008 str r0, [fp, #-8]
28: e51b0008 ldr r0, [fp, #-8]
2c: e51b1018 ldr r1, [fp, #-24]
30: ebfffffe bl 0 <two>
34: e50b000c str r0, [fp, #-12]
38: e51b2008 ldr r2, [fp, #-8]
3c: e51b300c ldr r3, [fp, #-12]
40: e0823003 add r3, r2, r3
44: e1a00003 mov r0, r3
48: e24bd004 sub sp, fp, #4
4c: e8bd4800 pop {fp, lr}
50: e12fff1e bx lr
Short answer is the memory is "allocated" both at compile time and at run time. At compile time in the sense that the compiler at compile time determines the size of the stack frame and who goes where. Run time in the sense that the memory itself is on the stack which is a dynamic thing. The stack frame is taken from stack memory at run time, almost like a malloc() and free().
It helps to know the calling convention, x enters in r0, y in r1, z in r2. then x has its home at fp-16, y at fp-20, and z at fp-24. then the call to one() needs x and y so it pulls those from the stack (x and y). the result of one() goes into a which is saved at fp-8 so that is the home for a. and so on.
the function one is not really at address 0, this is a disassembly of an object file not a linked binary. once an object is linked in with the rest of the objects and libraries, the missing parts, like where external functions are, are patched in by the linker and the calls to one() and two() will get real addresses. (and the program will likely not start at address 0).
I cheated here a little, I knew that with no optimizations enabled on the compiler and a relatively simple function like this there really is no reason for a stack frame:
compile with just a little optimization
arm-none-eabi-gcc -O1 -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
and the stack frame is gone, the local variables remain in registers.
00000000 :
0: e92d4038 push {r3, r4, r5, lr}
4: e1a05002 mov r5, r2
8: ebfffffe bl 0
c: e1a04000 mov r4, r0
10: e1a01005 mov r1, r5
14: ebfffffe bl 0
18: e0800004 add r0, r0, r4
1c: e8bd4038 pop {r3, r4, r5, lr}
20: e12fff1e bx lr
what the compiler decided to do instead is give itself more registers to work with by saving them on the stack. Why it saved r3 is a mystery, but that is another topic...
entering the function r0 = x, r1 = y and r2 = z per the calling convention, we can leave r0 and r1 alone (try again with one(y,x) and see what happens) since they drop right into one() and are never used again. The calling convention says that r0-r3 can be destroyed by a function, so we need to preserve z for later so we save it in r5. The result of one() is r0 per the calling convention, since two() can destroy r0-r3 we need to save a for later, after the call to two() also we need r0 for the call to two anyway, so r4 now holds a. We saved z in r5 (was in r2 moved to r5) before the call to one, we need the result of one() as the first parameter to two(), and it is already there, we need z as the second so we move r5 where we had saved z to r1, then we call two(). the result of two() per the calling convention. Since b + a = a + b from basic math properties the final add before returning is r0 + r4 which is b + a, and the result goes in r0 which is the register used to return something from a function, per the convention. clean up the stack and restore the modified registers, done.
Since myfun() made calls to other functions using bl, bl modifies the link register (r14), in order to be able to return from myfun() we need the value in the link register to be preserved from the entry into the function to the final return (bx lr), so lr is pushed on the stack. The convention states that we can destroy r0-r3 in our function but not other registers so r4 and r5 are pushed on the stack because we used them. why r3 is pushed on the stack is not necessary from a calling convention perspective, I wonder if it was done in anticipation of a 64 bit memory system, making two full 64 bit writes is cheaper than one 64 bit write and one 32 bit right. but you would need to know the alignment of the stack going in so that is just a theory. There is no reason to preserve r3 in this code.
Now take this knowledge and disassemble the code assigned (arm-...-objdump -D something.something) and do the same kind of analysis. particularly with functions named main() vs functions not named main (I did not use main() on purpose) the stack frame can be a size that doesnt make sense, or less sense than other functions. In the non optimized case above we needed to store 6 things total, x,y,z,a,b and the link register 6*4 = 24 bytes which resulted in sub sp, sp, #24, I need to think about the stack pointer vs frame pointer
thing for a bit. I think there is a command line argument to tell the compiler not to use a frame pointer. -fomit-frame-pointer and it saves a couple of instructions
00000000 <myfun>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e24dd01c sub sp, sp, #28
8: e58d000c str r0, [sp, #12]
c: e58d1008 str r1, [sp, #8]
10: e58d2004 str r2, [sp, #4]
14: e59d000c ldr r0, [sp, #12]
18: e59d1008 ldr r1, [sp, #8]
1c: ebfffffe bl 0 <one>
20: e58d0014 str r0, [sp, #20]
24: e59d0014 ldr r0, [sp, #20]
28: e59d1004 ldr r1, [sp, #4]
2c: ebfffffe bl 0 <two>
30: e58d0010 str r0, [sp, #16]
34: e59d2014 ldr r2, [sp, #20]
38: e59d3010 ldr r3, [sp, #16]
3c: e0823003 add r3, r2, r3
40: e1a00003 mov r0, r3
44: e28dd01c add sp, sp, #28
48: e49de004 pop {lr} ; (ldr lr, [sp], #4)
4c: e12fff1e bx lr
optimizing saves a whole lot more though...
It's going to depend on the program and how exactly they want the location of the variables. Does the question want what code section they're stored in? .const .bss etc? Does it want specific addresses? Either way a good start is using objdump -S flag
objdump -S myprogram > dump.txt
This is nice because it will print out an intermixing of your source code and the assembly with addresses. From here just do a search for your int main and that should get you started.

Resources