Related
I am trying to understand where things are stored in memory such as global and static variables (.data, if not initialized to zero,) etc.
What I am trying to find/considering is a macro such as shown below:
#define thisInteger 100
Can this be found using objdump?
Additionally, if I were to then assign this to a new variable such as below, where would this be found (guessing in .data):
#define THIS_INTEGER 100
int newVariable = THIS_INTEGER;
Macros are not variables, thus they are not stored anywhere. When you do #define thisInteger 100, C preprocessor runs through the source code and replaces thisInteger with the integer literal 100. Asking where thisInteger is stored is the same as asking where 100 is stored. To verify this, try something like &thisInteger. It won't compile because &100 is illegal and makes no sense.
Can this be found using objdump?
No. Preprocessing is a copy-paste processing done before compilation.
Additionally, if I were to then assign this to a new variable such as below, where would this be found
Depends on where you define the variable.
macros are only compile time (they are preprocessed before compilation)
If you use gcc compiler you can see preprocessed C file by using -E gcc option. This preprocessed file will be used in the actual compilation.
Your preprocessed example
if the newVariable has static or thread storage duration it is initialized to this value before the main function is called
if the newVariable has an automatic storage duration it is initialized to this value when the function is called.
The compiler will source the value 100 wherever the macro is used. It is most likely found in various machine code instructions, using immediate mode addressing, e.g. when used within expression statements, like a = a + 100, or f(100).
The compiler will most likely embed small constants like this on demand within instructions involved in computing expressions like the above, so if we do a = a + thisInteger; and f(thisInteger), there will probably be two different machine code instructions that embed the constant 100 as an immediate, one for each such use. Global data takes work to address, more so than embedding small immediates, so the compiler will not attempt to share the 100 between the two uses as global or static data.
So, yes, you can see the 100 in objdump, but for many usages you probably need to look at the code (.text) section to find instructions that use #100 as an immediate operand (or #64h if printed in hex). In disassembly, you're looking for instructions like add [rbp+24], #100, or move rdi, #100.
You're right that if you declare a mutable global variable int x = thisInteger; you could find the 100 in the data (.data) section with objdump. But local variable of the same declaration would be initialized at runtime using machine code instructions, so something like mov ??, #100.
try it yourself and see
Starting point: so.c
#define THIS_INTEGER 100
int newVariable = THIS_INTEGER;
void fun0 ( void )
{
static int hello;
hello = 100;
}
int fun1 ( void )
{
int hello;
hello = 100;
return(hello);
}
the pre-processor does the search and replace for the defines
arm-none-eabi-gcc -save-temps -O2 -c so.c -o so.o
so.i
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
int newVariable = 100;
void fun0 ( void )
{
static int hello;
hello = 100;
}
int fun1 ( void )
{
int hello;
hello = 100;
return(hello);
}
You can see that THIS_INTEGER no longer exists it was just a macro/define its purpose is to keep tract of a constant in this case so that if you want to change it you can change all the relevant instances of it. But the compiler needs something it can actually compile.
The preprocessor output so.i is then fed to the actual compiler and that produces assembly: so.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun0
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type fun0, %function
fun0:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
bx lr
.size fun0, .-fun0
.align 2
.global fun1
.syntax unified
.arm
.fpu softvfp
.type fun1, %function
fun1:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #100
bx lr
.size fun1, .-fun1
.global newVariable
.data
.align 2
.type newVariable, %object
.size newVariable, 4
newVariable:
.word 100
.ident "GCC: (GNU) 9.2.0"
That is fed to the assembler and then if you disassemble that you get:
Disassembly of section .text:
00000000 <fun0>:
0: e12fff1e bx lr
00000004 <fun1>:
4: e3a00064 mov r0, #100 ; 0x64
8: e12fff1e bx lr
Disassembly of section .data:
00000000 <newVariable>:
0: 00000064
Ehh I had hoped the static would keep it there. For the global variable being initialized that makes it .data if it werent it would be .bss. Then in .data you
can see the 100 (0x64). but it has nothing to do with the macro/define the macro/define simply put the actual value 100 in the actual compiled code.
For the other case, with optimization here, there is no variable on the stack or anything like that the value is placed in the return register, so in this case it lives in a register briefly.
Had the static worked as desired which in hindsight it makes sense it didnt. I was hoping for what I call a local global. Its a local variable but adding static puts it in .bss or .data not the stack and then was hoping to see code generated to then put 100 in a variable then put that in that .data/.bss area which works unoptimized of course but that is harder to read:
Disassembly of section .text:
00000000 <fun0>:
0: e52db004 push {r11} ; (str r11, [sp, #-4]!)
4: e28db000 add r11, sp, #0
8: e59f3018 ldr r3, [pc, #24] ; 28 <fun0+0x28>
c: e3a02064 mov r2, #100 ; 0x64
10: e5832000 str r2, [r3]
14: e1a00000 nop ; (mov r0, r0)
18: e1a00003 mov r0, r3
1c: e28bd000 add sp, r11, #0
20: e49db004 pop {r11} ; (ldr r11, [sp], #4)
24: e12fff1e bx lr
28: 00000000 andeq r0, r0, r0
0000002c <fun1>:
2c: e52db004 push {r11} ; (str r11, [sp, #-4]!)
30: e28db000 add r11, sp, #0
34: e24dd00c sub sp, sp, #12
38: e3a03064 mov r3, #100 ; 0x64
3c: e50b3008 str r3, [r11, #-8]
40: e51b3008 ldr r3, [r11, #-8]
44: e1a00003 mov r0, r3
48: e28bd000 add sp, r11, #0
4c: e49db004 pop {r11} ; (ldr r11, [sp], #4)
50: e12fff1e bx lr
Disassembly of section .data:
00000000 <newVariable>:
0: 00000064 andeq r0, r0, r4, rrx
Disassembly of section .bss:
00000000 <hello.4142>:
0: 00000000 andeq r0, r0, r0
Specifically:
c: e3a02064 mov r2, #100 ; 0x64
10: e5832000 str r2, [r3]
The 100 is put in a register, then that register value is written to memory where the local global hello from fun0 lives in .bss.
macros/defines simply search and replace, the preprocessor is going to iterate as many times as needed for the various levels/layers of macros until they are all replaced, none of them exist as written in the pre-processed code. Then that is sent to the compiler.
The VALUE 100 in this case is visible in the final output but it depends on how you used it as to how it is represented or where it is stored.
when I create ARM assembly code from C code with gcc -S, I get a variant of the LDR instruction that I don't know. Specifically, I get the "ldr r3, .L5" instruction where ".L5" is a lable defined by the compiler. It is not clear to me why I don't get the pseudoinstruction "ldr r3, =.L5", which should be the only way to load an arbitrary number in a register.
More in details:
I start from this C code (file name: sum_squares_C.c):
int sum;
int main(){
sum = 0;
for(int i=1; i<=n; i++){
sum = sum + i*i;
}
}
Then on a Raspeberry PI, I compile with "gcc -O0 -S sum_squares_C.c", with compiler version gcc (Raspbian 8.3.0-6+rpi1) 8.3.0.
The output is this ARM code (the instruction "ldr r3, .L5" is in the 7th line after label "main"):
.arch armv6
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "sum_squares_C.c"
.text
.global n
.data
.align 2
.type n, %object
.size n, 4
n:
.word 1
.comm sum,4,4
.text
.align 2
.global main
.arch armv6
.syntax unified
.arm
.fpu vfp
.type main, %function
main:
# args = 0, pretend = 0, frame = 8
# frame_needed = 1, uses_anonymous_args = 0
# link register save eliminated.
str fp, [sp, #-4]!
add fp, sp, #0
sub sp, sp, #12
ldr r3, .L5
mov r2, #0
str r2, [r3]
mov r3, #1
str r3, [fp, #-8]
b .L2
.L3:
ldr r3, [fp, #-8]
ldr r2, [fp, #-8]
mul r2, r2, r3
ldr r3, .L5
ldr r3, [r3]
add r3, r2, r3
ldr r2, .L5
str r3, [r2]
ldr r3, [fp, #-8]
add r3, r3, #1
str r3, [fp, #-8]
.L2:
ldr r3, .L5+4
ldr r3, [r3]
ldr r2, [fp, #-8]
cmp r2, r3
ble .L3
mov r3, #0
mov r0, r3
add sp, fp, #0
# sp needed
ldr fp, [sp], #4
bx lr
.L6:
.align 2
.L5:
.word sum
.word n
.size main, .-main
.ident "GCC: (Raspbian 8.3.0-6+rpi1) 8.3.0"
.section .note.GNU-stack,"",%progbits
It seems to me that gcc uses the instruction "ldr r3, .L5" as equivalent to "ldr r3, =.L5". Is it correct? Where can I find the definition of this instruction syntax? Is it possible to force gcc to not use this instruction, but use "ldr r3, =.L5" (I need this for teaching reasons)?
Thanks!
Francesco
ldr r3, .L5 loads a word from the address .L5 into r3. At the label .L5 there is the address of the variable sum. So this loads the address of sum into r3.
ldr r3, =.L5 loads the address of .L5 into r3. Then the program would need to dereference it again in order to get the address of sum. There is no reason to do this.
When you use ldr r3, =.L5 the assembler stores the address of .L5 somewhere, and then loads from that address. So this:
ldr r3, =.L5
...
.L5:
.word sum
is the same as this:
ldr r3, .address_of_L5
...
.L5:
.word sum
...
.address_of_L5:
.word .L5
As you can see, the compiler has already done this for sum. Instead of writing this assembly:
ldr r3, =sum
the compiler has written:
ldr r3, .L5
...
.L5:
.word sum
which is exactly what the assembler would have done anyway. I don't know why the compiler wants to do this instead of the assembler.
It is not clear to me why I don't get the pseudoinstruction "ldr r3, =.L5", which should be the only way to load an arbitrary number in a register.
Notice this is not the only way to load an arbitrary number into a register. It's not even a real way to load an arbitrary number into a register. It's a pseudoinstruction (as you know): it's not something the CPU can actually do, it's something that the assembler can "compile" for your convenience.
To save typing and assume a risk a person might use:
ldr r3,=sum
ldr r3,[r3]
As pointed out in the other example the assembler will create in machine code the equivalent of what the human could have typed without the =address trick:
ldr r3,address_of_sum (without the =)
ldr r3,[r3]
...
address_of_sum: .word sum
And that first ldr (not pseudo as it translates directly into a known instruction, one to one) is a pc-relative load (assuming it can reach).
Both of these though are assembler specific as assembly language is defined by the assembler not the target.
The =address shortcut is not supported by all arm assemblers and should be used with care, for certain values it does not turn into a word in the pool with a pc relative load.
For questions like this first examine the disassembly, most of the time that will answer your question, even better examine the dissasembly first then in question the assembly. Compiler generated assembly is not as easy to read and follow as a disassembly, especially when linked. It is also easier to learn from optimized code than unoptimized as so much of the code is this stack (or in this case global) variable stuff.
ldr r3,=0x1000
ldr r3,=0x1234
b .
00000000 <.text>:
0: e3a03a01 mov r3, #4096 ; 0x1000
4: e51f3000 ldr r3, [pc, #-0] ; c <.text+0xc>
8: eafffffe b 8 <.text+0x8>
c: 00001234 andeq r1, r0, r4, lsr r2
In one case where it can it generates a mov, where it cant then it allocates from the pool and places the value there then does a pc relative load. Now yes when reading the output this way you need to see/understand/ignore the andeq disassembly that line we are looking at the value 0x00001234 and seeing the instruction generated.
You should not always assume the =address trick will work if you choose to try various tools, it works for gnu now if it can find a pool if it can't then you either need to just do the typing yourself or add a .pool or whatever the other pseudocode that does the same thing is to help the assembler find a place for this value as needed.
I would expect an assembler to always place this (=address) in the pool for an external reference, but it is technically possible for a toolchain to put a placeholder there and let the linker fill it in either with a mov or add a nearby item and place the value there like binutils does with a bl to an external reference.
gas:
ldr r3,=sum
b .
00000000 <.text>:
0: e51f3000 ldr r3, [pc, #-0] ; 8 <.text+0x8>
4: eafffffe b 4 <.text+0x4>
8: 00000000 andeq r0, r0, r0
The linker will fill in the address later as with your compiler output. Now the -0 disassembly is very interesting, almost amusing.
According to this thread, initializing an array with a shorter string literal pads the array with zeros.
So, is there any reason why these two functions (test1 and test2) would produce different results when compiled for ARM Cortex M4?
extern void write(char * buff);
void test1(void)
{
char buff[8] = {'t', 'e', 's', 't', 0, 0, 0, 0 };
write(buff);
}
void test2(void)
{
char buff[8] = "test";
write(buff);
}
I am getting equivalent assemblies for x86-64, but on ARM gcc I get different output:
test1:
str lr, [sp, #-4]!
sub sp, sp, #12
mov r3, sp
ldr r2, .L4
ldm r2, {r0, r1}
stm r3, {r0, r1}
mov r0, r3
bl write
add sp, sp, #12
ldr pc, [sp], #4
.L4:
.word .LANCHOR0
test2:
mov r3, #0
str lr, [sp, #-4]!
ldr r2, .L8
sub sp, sp, #12
ldm r2, {r0, r1}
str r0, [sp]
mov r0, sp
strb r1, [sp, #4]
strb r3, [sp, #5]
strb r3, [sp, #6]
strb r3, [sp, #7]
bl write
add sp, sp, #12
ldr pc, [sp], #4
.L8:
.word .LANCHOR0+8
First of, the code is equivalent in the sense that the contents of memory constituting the objects named buf will be the same.
That said, the compiler clearly produces worse code for the second function. Consequently, since there is a way to produce more optimized code that has equivalent semantics, it would be reasonable to consider this an optimization failure in the compiler and file a bug for it.
If the compiler wanted to emit the same code, it would have to recognize that the string literal representation in memory can be zero-padded without changing the semantics of the program (though the string literal itself cannot be padded, since sizeof "test" cannot be equal to sizeof "test\0\0\0").
However, since this would only be advantageous if the string literal is used to initialize an explicit-length array (usually a bad idea) that is longer than the literal and where normal string semantics are not sufficient (bytes past the null-terminator are relevant), the value of better optimization for this case seems limited.
Addendum: If you set godbolt to not remove assembler directives, you can see how the literals are created:
.section .rodata
.align 2
.set .LANCHOR0,. + 0
.byte 116
.byte 101
.byte 115
.byte 116
.byte 0
.byte 0
.byte 0
.byte 0
.ascii "test\000"
.space 3
Interestingly, the compiler does not deduplicate, and it leaves the padding after the string literal to the assembler (.space 3), rather than explicitly zeroing it.
does anyone know the code for inputting an integer in ARM? this is what I have so far
.text .global main
main:
mov r0, #0
ldr r1, =inputint
mov r2, #4
ldr R7, #3
swi 0
.data
inputint: .asciz "%d"
I'm new to assembly programing and I'm programing for ARM.
I'm making a program with two subroutines: one that appends a byte info on a byte vector in memory, and one that prints this vector. The first address of the vector contains the number of elements that follows, up to 255. As I debug it with GDB, I can see that the "appendbyte" subroutine works fine. But when it comes to the "printvector" one, there are some problems. First, the element loaded in register r1 is wrong (it loads 0, when it should be 7). Then, when I read the registers values with GDB after I use the "printf" function, a lot of register get other values that weren't supposed to change, since I didn't modify them, I just used "printf". Why is "printf" modyfing the values.
I was thinking something about the align. I'm not sure if i'm using the directive correctly.
Here is the full code:
.text
.global main
.equ num, 255 # Max number of elements
main:
push {lr}
mov r8, #7
bl appendbyte
mov r8, #5
bl appendbyte
mov r8, #8
bl appendbyte
bl imprime
pop {pc}
.text
.align
printvector:
push {lr}
ldr r3, =vet # stores the address of the start of the vector in r3
ldr r2, [r3], #1 # stores the number of elements in r2
.align
loop:
cmp r2, #0 #if there isn't elements to print
beq fimimprime #quit subroutine
ldr r0, =node #r0 receives the print format
ldr r1, [r3], #1 #stores in r1 the value of the element pointed by r3. Increments r3 after that.
sub r2, r2, #1 #decrements r2 (number of elements left to print)
bl printf #call printf
b loop #continue on the loop
.align
endprint:
pop {pc}
.align
appendbyte:
push {lr}
ldr r0, =vet #stores in r0 the beggining address of the vector
ldr r1, [r0], #1 #stores in r1 the number of elements and makes r0 point to the next address
add r3, r0, r1 #stores in r3 the address of the first available position
str r8, [r3] #put the value at the first available position
ldr r0, =vet #stores in r0 the beggining address of the vector
add r1, r1, #1 # increment the number of elements in the vector
str r1, [r0] # stores it in the vector
pop {pc}
.data # Read/write data follows
.align # Make sure data is aligned on 32-bit boundaries
vet: .byte 0
.skip num # Reserve num bytes
.align
node: .asciz "[%d]\n"
.end
The problems are in
ldr r1, [r3], #1
and
bl printf
I hope I was clear on the problem.
Thanks in advance!
The ARM ABI specifies that registers r0-r3 and r12 are to be considered volatile on function calls. Meaning that the callee does not have to restore their value. LR also changes if you use bl, because LR will then contain the return address for the called function.
More information can be found on ARMs Information Center entry for the ABI or in the APCS (ARM Procedure Call Standard) document.
printvector:
push {lr}
ldr r3, =vet # stores the address of the start of the vector in r3
ldr r2, [r3], #1 # stores the number of elements in r2
.align
loop:
cmp r2, #0 #if there isn't elements to print
beq fimimprime #quit subroutine
ldr r0, =node #r0 receives the print format
ldr r1, [r3], #1 #stores in r1 the value of the element pointed by r3. Increments r3 after that.
sub r2, r2, #1 #decrements r2 (number of elements left to print)
bl printf #call printf
b loop #continue on the loop
.align
endprint:
pop {pc}
that is definitely not how you use align. Align is there to...align the thing that follows on some boundary (specified in an optional argument, note this is an assembler directive, not an instruction) by padding the binary with zeros or whatever the padding is. So you dont want a .align in the code flow, between instructions. You have done that between the ldr r1, and the cmp r2 after loop. Now the align after b loop is not harmful as the branch is unconditional but at the same time not necessary as there is no reason to align there the assembler is generating an instruction flow so the bytes cant be unaligned. Where you would use .align is after some data declaration before instructions:
.byte 1,2,3,4,5,
.align
some_code_branch_dest:
In particular one where the assembler complains or the code crashes.