I have this test.c on my Ubuntu14.04 x86_64 system.
void foo(int a, long b, int c) {
}
int main() {
foo(0x1, 0x2, 0x3);
}
I compiled this with gcc --no-stack-protector -g test.c -o test and got the assembly code with objdump -dS test -j .text
00000000004004ed <_Z3fooili>:
void foo(int a, long b, int c) {
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: 89 7d fc mov %edi,-0x4(%rbp)
4004f4: 48 89 75 f0 mov %rsi,-0x10(%rbp)
4004f8: 89 55 f8 mov %edx,-0x8(%rbp) // !!Attention here!!
}
4004fb: 5d pop %rbp
4004fc: c3 retq
00000000004004fd <main>:
int main() {
4004fd: 55 push %rbp
4004fe: 48 89 e5 mov %rsp,%rbp
foo(0x1, 0x2, 0x3);
400501: ba 03 00 00 00 mov $0x3,%edx
400506: be 02 00 00 00 mov $0x2,%esi
40050b: bf 01 00 00 00 mov $0x1,%edi
400510: e8 d8 ff ff ff callq 4004ed <_Z3fooili>
}
400515: b8 00 00 00 00 mov $0x0,%eax
40051a: 5d pop %rbp
40051b: c3 retq
40051c: 0f 1f 40 00 nopl 0x0(%rax)
I know that the function parameters should be pushed to stack from right to left in sequence. So I was expecting this
void foo(int a, long b, int c) {
push %rbp
mov %rsp,%rbp
mov %edi,-0x4(%rbp)
mov %rsi,-0x10(%rbp)
mov %edx,-0x14(%rbp) // c should be pushed onto the stack after b, not after a
But gcc seemed clever enough to push parameter c(0x3) right after a(0x1) to save the four bytes which should be reserved for data alignment of b(0x2). Can someone please explain this and show me some documentation on why gcc did this?
The parameters are passed in registers: edi, esi and edx here (the first six integer arguments go in rdi, rsi, rdx, rcx, r8, r9, and only the rest are pushed on the stack), which is just what the System V AMD64 calling convention used on Linux mandates.
What you see in your function is just how the compiler saves them upon entry when compiling with -O0, so they're in memory where a debugger can modify them. It is free to do it in any way it wants, and it cleverly does this space optimization.
The only reason it does this is that gcc -O0 always spills/reloads all C variables between C statements to support modifying variables and jumping between lines in a function with a debugger.
All of this would be optimized out in an optimized (release) build.
I know and understand the purpose of volatile variables and optimisation in general (well, I think I do!). This question relates specifically to what happens if a variable is accessed outside the module it is declared in.
In the following scenario, if funcThatWaits was called inside bar.c, it could be optimised and not fetch the value of sTheVar each loop iteration.
However, when GetTheVar is called externally could the same optimisation apply or does the function call ensure sTheVar will always be read each loop iteration?
I am not suggesting this is good code or practice, but an example for the sake of the question.
bar.h
int GetTheVar(void);
bar.c
static /*volatile*/ int sTheVar;
int GetTheVar(void)
{
return sTheVar;
}
static void someISROrFuncCalledFromAnotherThread(void)
{
sTheVar = 1;
}
foo.c
#include "bar.h"
void funcThatWaits(void)
{
while(GetTheVar() != 1) {}
}
when GetTheVar is called externally could the same optimisation apply or does the function call ensure sTheVar will always be read each loop iteration?
The same optimization may apply. For instance, if you are using LTO (Link-Time Optimization), then the compiler knows everything about GetTheVar and will likely decide funcThatWaits is an infinite loop (which, by the way, would be UB).
Function calls are not going to be optimized away since, for all the caller knows, the function being called could depend on some exogenous state.
I compiled the following three files using gcc:
foo.c
#include "bar.h"
void funcThatWaits(void) {
while ( getTheVar() != 1 );
}
bar.c
#include "foo.h"
static int theVar;
int getTheVar(void) {
return theVar;
}
void theFunc(void) {
funcThatWaits();
}
test.c
#include "bar.h"
int main() {
theFunc();
return 0;
}
Compiling those three into a.out and running objdump -d a.out, the following comes out:
00000000000005fa <main>:
5fa: 55 push %rbp
5fb: 48 89 e5 mov %rsp,%rbp
5fe: e8 25 00 00 00 callq 628 <theFunc>
603: b8 00 00 00 00 mov $0x0,%eax
608: 5d pop %rbp
609: c3 retq
000000000000060a <funcThatWaits>:
60a: 55 push %rbp
60b: 48 89 e5 mov %rsp,%rbp
60e: 90 nop
60f: e8 08 00 00 00 callq 61c <getTheVar>
614: 83 f8 01 cmp $0x1,%eax
617: 75 f6 jne 60f <funcThatWaits+0x5>
619: 90 nop
61a: 5d pop %rbp
61b: c3 retq
000000000000061c <getTheVar>:
61c: 55 push %rbp
61d: 48 89 e5 mov %rsp,%rbp
620: 8b 05 ee 09 20 00 mov 0x2009ee(%rip),%eax # 201014 <theVar>
626: 5d pop %rbp
627: c3 retq
0000000000000628 <theFunc>:
628: 55 push %rbp
629: 48 89 e5 mov %rsp,%rbp
62c: e8 d9 ff ff ff callq 60a <funcThatWaits>
631: 90 nop
632: 5d pop %rbp
633: c3 retq
634: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
63b: 00 00 00
63e: 66 90 xchg %ax,%ax
I'm trying to change C code to assembly code.
At first, I used gcc and objdump to extract assembly code from C code.
The C code was just a simple printf program.
#include <stdio.h>
int main(){
printf("this\n");
return 0;
}
gcc -c -S -O0 test.c
objdump -dS test.o > test.txt
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
9: e8 00 00 00 00 callq e <main+0xe>
e: b8 00 00 00 00 mov $0x0,%eax
13: 5d pop %rbp
14: c3 retq
In this assembly code,
I was curious why the callq instruction's destination is e,
so I ran this code in gdb using
disas main
(gdb) disas main
Dump of assembler code for function main:
0x0000000000400526 <+0>: push %rbp
0x0000000000400527 <+1>: mov %rsp,%rbp
0x000000000040052a <+4>: mov $0x4005c4,%edi
0x000000000040052f <+9>: callq 0x400400 <puts@plt>
0x0000000000400534 <+14>: mov $0x0,%eax
0x0000000000400539 <+19>: pop %rbp
0x000000000040053a <+20>: retq
In this code, I assumed that 0x400400 is the address of the printf function.
Why do objdump's and gdb's disassemblies show different results?
How can I make the objdump output show the right callq destination?
When you run the objdump command you are not disassembling the final executable; you are disassembling the object file produced by the compiler (test.o). I followed steps similar to yours, using your code (compiling, then running objdump and disas in GDB), except that I ran objdump on the linked executable rather than on the object file (that is, I did not compile with the -c flag). The outputs are below:
objdump -dS a.out:
1140: 55 push %rbp
1141: 48 89 e5 mov %rsp,%rbp
1144: 48 83 ec 10 sub $0x10,%rsp
1148: 48 8d 3d b5 0e 00 00 lea 0xeb5(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
114f: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1156: b0 00 mov $0x0,%al
1158: e8 d3 fe ff ff callq 1030 <printf@plt>
115d: 31 c9 xor %ecx,%ecx
115f: 89 45 f8 mov %eax,-0x8(%rbp)
1162: 89 c8 mov %ecx,%eax
1164: 48 83 c4 10 add $0x10,%rsp
1168: 5d pop %rbp
1169: c3 retq
116a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
GDB:
(gdb) disas main
Dump of assembler code for function main:
0x0000000000001140 <+0>: push %rbp
0x0000000000001141 <+1>: mov %rsp,%rbp
0x0000000000001144 <+4>: sub $0x10,%rsp
0x0000000000001148 <+8>: lea 0xeb5(%rip),%rdi # 0x2004
0x000000000000114f <+15>: movl $0x0,-0x4(%rbp)
0x0000000000001156 <+22>: mov $0x0,%al
0x0000000000001158 <+24>: callq 0x1030 <printf@plt>
0x000000000000115d <+29>: xor %ecx,%ecx
0x000000000000115f <+31>: mov %eax,-0x8(%rbp)
0x0000000000001162 <+34>: mov %ecx,%eax
0x0000000000001164 <+36>: add $0x10,%rsp
0x0000000000001168 <+40>: pop %rbp
0x0000000000001169 <+41>: retq
End of assembler dump.
As you can see, the two disassemblies are the same, except for some minor syntax differences (e.g. GDB prefixes its addresses with 0x).
What you're missing with objdump by default is relocations.
Running objdump with the -r flag lets you see these. e.g.
objdump -Sr foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
5: R_X86_64_32 .rodata
9: e8 00 00 00 00 callq e <main+0xe>
a: R_X86_64_PC32 puts-0x4
e: b8 00 00 00 00 mov $0x0,%eax
13: 5d pop %rbp
14: c3 retq
This shows us that the call will use a PC-relative address, pointing to puts.
Is there a difference between declaring a variable first and then assigning a value or directly declaring and assigning a value in the compiled function? Does the compiled function do the same work? e.g, does it still read the parameters, declare variables and then assign value or is there a difference between the two examples in the compiled versions?
example:
void foo(u32 value) {
u32 extvalue = NULL;
extvalue = value;
}
compared with
void foo(u32 value) {
u32 extvalue = value;
}
I am under the impression that there is no difference between those two functions if you look at the compiled code, e.g. they will look the same and I will not be able to tell which is which.
It depends on the compiler and the optimization level, of course.
A dumb compiler/low optimization level when it sees:
u32 extvalue = NULL;
extvalue = value;
could set to NULL then to value in the next line.
Since extvalue isn't used in between, the NULL initialization is useless, and most compilers set it directly to value as an easy optimization.
Note that declaring a variable isn't really an instruction per se. The compiler just allocates auto memory to store this variable.
I've tested some simple code with and without the initial assignment, and the result is different when using gcc 6.2.1 with the -O0 (don't optimize anything) flag:
#include <stdio.h>
void foo(int value) {
int extvalue = 0;
extvalue = value;
printf("%d",extvalue);
}
disassembled:
Disassembly of section .text:
00000000 <_foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 28 sub $0x28,%esp
6: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%ebp) <=== here we see the init
d: 8b 45 08 mov 0x8(%ebp),%eax
10: 89 45 f4 mov %eax,-0xc(%ebp)
13: 8b 45 f4 mov -0xc(%ebp),%eax
16: 89 44 24 04 mov %eax,0x4(%esp)
1a: c7 04 24 00 00 00 00 movl $0x0,(%esp)
21: e8 00 00 00 00 call 26 <_foo+0x26>
26: c9 leave
27: c3 ret
now:
void foo(int value) {
int extvalue;
extvalue = value;
printf("%d",extvalue);
}
disassembled:
Disassembly of section .text:
00000000 <_foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 28 sub $0x28,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 89 45 f4 mov %eax,-0xc(%ebp)
c: 8b 45 f4 mov -0xc(%ebp),%eax
f: 89 44 24 04 mov %eax,0x4(%esp)
13: c7 04 24 00 00 00 00 movl $0x0,(%esp)
1a: e8 00 00 00 00 call 1f <_foo+0x1f>
1f: c9 leave
20: c3 ret
21: 90 nop
22: 90 nop
23: 90 nop
The 0 init has disappeared here, which confirms that at -O0 the compiler did not optimize away the initialization in the first version.
If I switch to -O2 (good optimization level) the code is then identical in both cases, compiler found that the initialization wasn't necessary (still, silent, no warnings):
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: c7 04 24 00 00 00 00 movl $0x0,(%esp)
10: 89 44 24 04 mov %eax,0x4(%esp)
14: e8 00 00 00 00 call 19 <_foo+0x19>
19: c9 leave
1a: c3 ret
I tried these functions in godbolt:
void foo(uint32_t value)
{
uint32_t extvalue = NULL;
extvalue = value;
}
void bar(uint32_t value)
{
uint32_t extvalue = value;
}
I ported to the actual type uint32_t rather than u32 which is not standard. The resulting non-optimized assembly generated by x86-64 GCC 6.3 is:
foo(unsigned int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-4], 0
mov eax, DWORD PTR [rbp-20]
mov DWORD PTR [rbp-4], eax
nop
pop rbp
ret
bar(unsigned int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov eax, DWORD PTR [rbp-20]
mov DWORD PTR [rbp-4], eax
nop
pop rbp
ret
So clearly the non-optimized code retains the (weird, as pointed out by others since it's not written to a pointer) NULL assignment, which is of course pointless.
I'd vote for the second one since it's shorter (less to hold in one's head when reading the code), and never allow/recommend the pointless setting to NULL before overwriting with the proper value. I would consider that a bug, since you're saying/doing something you don't mean.
I have a main.c file
int boyut(const char* string);
char greeting[6] = {"Helle"};
int main(){
greeting[5] = 0x00;
int a = boyut(greeting);
return 0;
}
int boyut(const char* string){
int len=0;
while(string[len]){
len++;
}
return len;
}
I compile it with GCC command gcc -Wall -m32 -nostdlib main.c -o main.o
When I check the disassembly, I see that the variable greeting is placed in the .data segment, and it is not pushed onto the stack before boyut is called. Inside boyut, the code acts as if greeting were in the stack segment, so it looks as though the variable is never actually accessed inside the function. Why is the compiler generating code like this? How can I correct it?
Disassembly of section .text:
080480f8 <main>:
80480f8: 55 push ebp
80480f9: 89 e5 mov ebp,esp
80480fb: 83 ec 18 sub esp,0x18
80480fe: c6 05 05 a0 04 08 00 mov BYTE PTR ds:0x804a005,0x0
8048105: 83 ec 0c sub esp,0xc
8048108: 68 00 a0 04 08 push 0x804a000
804810d: e8 0d 00 00 00 call 804811f <boyut>
8048112: 83 c4 10 add esp,0x10
8048115: 89 45 f4 mov DWORD PTR [ebp-0xc],eax
8048118: b8 00 00 00 00 mov eax,0x0
804811d: c9 leave
804811e: c3 ret
0804811f <boyut>:
804811f: 55 push ebp
8048120: 89 e5 mov ebp,esp
8048122: 83 ec 10 sub esp,0x10
8048125: c7 45 fc 00 00 00 00 mov DWORD PTR [ebp-0x4],0x0
804812c: eb 04 jmp 8048132 <boyut+0x13>
804812e: 83 45 fc 01 add DWORD PTR [ebp-0x4],0x1
8048132: 8b 55 fc mov edx,DWORD PTR [ebp-0x4]
8048135: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
8048138: 01 d0 add eax,edx
804813a: 0f b6 00 movzx eax,BYTE PTR [eax]
804813d: 84 c0 test al,al
804813f: 75 ed jne 804812e <boyut+0xf>
8048141: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
8048144: c9 leave
8048145: c3 ret
main.o: file format elf32-i386
Contents of section .data:
804a000 48656c6c 6500 Helle.
The function boyut is declared like this:
int boyut(const char* string);
That means: boyut takes a pointer to char and returns an int. And indeed, the compiler pushes a pointer to char on the stack (the push 0x804a000 in main). This pointer points to the beginning of greeting. This happens because in C, an array is implicitly converted to a pointer to its first element under most circumstances.
If you want to pass an array to a function so it is copied to the function, you have to wrap the array into a structure and pass that.
Here is my code
#include <stdio.h>
char func_with_ret()
{
return 1;
}
void func_1()
{
char buf[16];
func_with_ret();
}
void func_2()
{
char buf[16];
getchar();
}
int main()
{
func_1();
func_2();
}
I declare 16-byte local buffers to keep the stack pointer aligned (for x86).
I wrote two functions, "func_1" and "func_2", that look almost the same: each allocates a 16-byte local buffer and calls a function with a char return value and no parameters, but one callee is self-defined and the other is getchar().
I compiled with the gcc parameters "-fno-stack-protector" (so there's no canary on the stack) and "-O0" to avoid unexpected optimization behavior.
Here is the disassembly code by gdb for func_1 and func_2.
Dump of assembler code for function func_1:
0x08048427 <+0>: push ebp
0x08048428 <+1>: mov ebp,esp
0x0804842a <+3>: sub esp,0x10
0x0804842d <+6>: call 0x804841d <func_with_ret>
0x08048432 <+11>: leave
0x08048433 <+12>: ret
Dump of assembler code for function func_2:
0x08048434 <+0>: push ebp
0x08048435 <+1>: mov ebp,esp
0x08048437 <+3>: sub esp,0x18
0x0804843a <+6>: call 0x80482f0 <getchar@plt>
0x0804843f <+11>: leave
0x08048440 <+12>: ret
In func_1, buffer is allocated for 0x10(16) bytes,
but in func_2 , it is allocated for 0x18(24) bytes, why?
Edit:
@Attie figured out that the buffer size is actually the same for both, but there are 8 strange bytes of extra stack space in func_2 and I don't know where they come from.
I have just tried to reproduce this, see below:
Compiling for x86-64 (no joy):
$ gcc p.c -g -o p -O0 -fno-stack-protector
$ objdump -d p
p: file format elf64-x86-64
[...]
0000000000400538 <func_1>:
400538: 55 push %rbp
400539: 48 89 e5 mov %rsp,%rbp
40053c: 48 83 ec 10 sub $0x10,%rsp
400540: b8 00 00 00 00 mov $0x0,%eax
400545: e8 e3 ff ff ff callq 40052d <func_with_ret>
40054a: c9 leaveq
40054b: c3 retq
000000000040054c <func_2>:
40054c: 55 push %rbp
40054d: 48 89 e5 mov %rsp,%rbp
400550: 48 83 ec 10 sub $0x10,%rsp
400554: e8 c7 fe ff ff callq 400420 <getchar@plt>
400559: c9 leaveq
40055a: c3 retq
Compiling for i386 (success):
$ gcc p.c -g -o p -O0 -fno-stack-protector -m32
$ objdump -d p
p: file format elf32-i386
[...]
08048427 <func_1>:
8048427: 55 push %ebp
8048428: 89 e5 mov %esp,%ebp
804842a: 83 ec 10 sub $0x10,%esp
804842d: e8 eb ff ff ff call 804841d <func_with_ret>
8048432: c9 leave
8048433: c3 ret
08048434 <func_2>:
8048434: 55 push %ebp
8048435: 89 e5 mov %esp,%ebp
8048437: 83 ec 18 sub $0x18,%esp
804843a: e8 b1 fe ff ff call 80482f0 <getchar@plt>
804843f: c9 leave
8048440: c3 ret
This doesn't appear to be related to any of the following:
The fact that getchar() is a library function, and thus we are calling into the PLT (Procedure Linkage Table).
The return type of the function
The order of the calls in main()
The order of the functions in the binary
Dynamic / static compilation
If you however increase the size of your buffer by one, to 17, then the stack usage increases to 32 and 40 bytes (from 16 and 24 bytes) respectively. The difference is 16 bytes, which is used for alignment, as answered here.
I can't answer why the stack alignment appears to be off by 8-bytes upon entering func_2() though.
If you update func_1() and func_2() to have a 15-byte buffer, and a single byte variable, and write data to them, then you can see where these items are in the stack frame:
void func_1(void) {
char buf[15];
char x;
buf[0] = 0xaa;
x = 0x55;
func_with_ret();
}
void func_2(void) {
char buf[15];
char x;
buf[0] = 0xaa;
x = 0x55;
getchar();
}
08048434 <func_1>:
8048434: 55 push %ebp
8048435: 89 e5 mov %esp,%ebp
8048437: 83 ec 10 sub $0x10,%esp
804843a: c6 45 f0 aa movb $0xaa,-0x10(%ebp)
804843e: c6 45 ff 55 movb $0x55,-0x1(%ebp)
8048442: e8 d6 ff ff ff call 804841d <func_with_ret>
8048447: c9 leave
8048448: c3 ret
08048449 <func_2>:
8048449: 55 push %ebp
804844a: 89 e5 mov %esp,%ebp
804844c: 83 ec 18 sub $0x18,%esp
804844f: c6 45 e8 aa movb $0xaa,-0x18(%ebp)
8048453: c6 45 f7 55 movb $0x55,-0x9(%ebp)
8048457: e8 94 fe ff ff call 80482f0 <getchar@plt>
804845c: c9 leave
804845d: c3 ret