MOVABS opcode in the assembly code - disassembly

While debugging one of the assembly code examples, I found the following piece of information:
(gdb) x /10i 0x4005c4
0x4005c4: push %rbp
0x4005c5: mov %rsp,%rbp
0x4005c8: sub $0xa0,%rsp
0x4005cf: mov %fs:0x28,%rax
0x4005d8: mov %rax,-0x8(%rbp)
0x4005dc: xor %eax,%eax
0x4005de: movabs $0x6673646c6a6b3432,%rax
0x4005e8: mov %rax,-0x40(%rbp)
0x4005ec: movl $0x323339,-0x38(%rbp)
0x4005f3: movl $0x553059,-0x90(%rbp)
As per my understanding, movabs should not be used; it seems like it was introduced intentionally. Am I right in my understanding?
What would be the equivalent MOV instruction to replace it?

Copied directly from this question: https://reverseengineering.stackexchange.com/questions/2627/what-is-the-meaning-of-movabs-in-gas-x86-att-syntax
[...] The movabs instruction to load arbitrary 64-bit
constant into register and to load/store integer register from/to
arbitrary constant 64-bit address is available.
http://www.ucw.cz/~hubicka/papers/amd64/node1.html
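It does exactly what you'd expect from it: it puts the 64-bit immediate into the register. In AT&T syntax, movabs is simply GAS's name for the form of mov that takes a full 64-bit immediate (REX.W B8+r, a 10-byte instruction, which matches the 0x4005de to 0x4005e8 span above), so there is nothing to replace. GAS also accepts mov $0x6673646c6a6b3432,%rax and encodes it the same way, because the constant does not fit in a sign-extended 32-bit immediate.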

Related

Difference in x86-32 and x64 Assembly stack allocation for a fixed-size buffer with unoptimized C (GCC)

I'm doing some basic disassembly and have noticed that the buffer is being given additional space for some reason, although the tutorial I am following uses the same code and allocates exactly the correct length (500 chars). Why is this?
My code:
#include <stdio.h>
#include <string.h>
int main (int argc, char** argv){
char buffer[500];
strcpy(buffer, argv[1]);
return 0;
}
Compiled with GCC, the disassembled code is:
0x0000000000001139 <+0>: push %rbp
0x000000000000113a <+1>: mov %rsp,%rbp
0x000000000000113d <+4>: sub $0x210,%rsp
0x0000000000001144 <+11>: mov %edi,-0x204(%rbp)
0x000000000000114a <+17>: mov %rsi,-0x210(%rbp)
0x0000000000001151 <+24>: mov -0x210(%rbp),%rax
0x0000000000001158 <+31>: add $0x8,%rax
0x000000000000115c <+35>: mov (%rax),%rdx
0x000000000000115f <+38>: lea -0x200(%rbp),%rax
0x0000000000001166 <+45>: mov %rdx,%rsi
0x0000000000001169 <+48>: mov %rax,%rdi
0x000000000000116c <+51>: call 0x1030 <strcpy@plt>
0x0000000000001171 <+56>: mov $0x0,%eax
0x0000000000001176 <+61>: leave
0x0000000000001177 <+62>: ret
However, in this video https://www.youtube.com/watch?v=1S0aBV-Waeo only 500 bytes are clearly assigned.
Why is this the case? The only difference I can see is that one is 32-bit and the other (mine) is x86-64.
500 is not a multiple of 16.
The x86-64 ABI (application binary interface) requires the stack pointer to be a multiple of 16 whenever a call instruction is about to happen. (Since call pushes an 8-byte return address, this means the stack pointer is always congruent to 8, mod 16, when control reaches the first instruction of a called function.) For the code shown, it is convenient for the compiler to achieve this requirement by increasing the value it uses in the sub instruction, making it be a multiple of 16.
The original x86-32 ABI did not make this requirement (it only required 4-byte alignment), so there was no reason for the compiler used in the video to increase the size of the stack frame.
Note that you appear to have compiled your code without optimization. I get this at -O2:
0x0000000000000000 <+0>: sub $0x208,%rsp
0x0000000000000007 <+7>: mov 0x8(%rsi),%rsi
0x000000000000000b <+11>: mov %rsp,%rdi
0x000000000000000e <+14>: call <strcpy@PLT>
0x0000000000000013 <+19>: xor %eax,%eax
0x0000000000000015 <+21>: add $0x208,%rsp
0x000000000000001c <+28>: ret
The stack adjustment is still somewhat larger than the size of the array, but not as big as what you had, and no longer a multiple of 16; the difference is that with optimization on, the frame pointer is eliminated, so %rbp does not need to be saved and restored, and so the stack pointer is not a multiple of 16 at the point of the sub instruction.
(Incidentally, there is no requirement anywhere for a stack frame to be as small as possible. "Quality of implementation" dictates that it should be as small as possible, but for various reasons it's quite common for the compiler to miss that target. In my optimized code dump, I don't see any reason why the immediate operand to sub and add couldn't have been 0x1f8 (504).)

Understanding x86-64 assembly for simple program in C with a function call

I have a simple C program that produces this x86-64 assembly for the function func:
#include <stdio.h>
#include <string.h>
void func(char *name)
{
char buf[90];
strcpy(buf, name);
printf("Welcome %s\n", buf);
}
int main(int argc, char *argv[])
{
func(argv[1]);
return 0;
}
So I think this
0x000000000000118d <+4>: push %rbp
pushes the base pointer, like a placed argument, which is char *name.
Then 0x000000000000118e <+5>: mov %rsp,%rbp sets the stack pointer to what is at the base pointer. I believe the instruction above and this one make the stack pointer point to char *name at this point.
then
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
I am a little unsure about this. Why is 0xffffffffffffff80 added to rsp? What is the point of this instruction? Can anyone please explain?
Then, in the next instruction, 0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp),
it's just setting -128 decimal to rdi. But I still can't see the buffer char buf[90]. Where is my buffer in the following assembly? Can anyone please tell me?
Also, what does this line do? 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
Dump of assembler code for function func:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp)
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
0x00000000000011a6 <+29>: xor %eax,%eax
0x00000000000011a8 <+31>: mov -0x78(%rbp),%rdx
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy@plt>
0x00000000000011bb <+50>: lea -0x70(%rbp),%rax
0x00000000000011bf <+54>: mov %rax,%rsi
0x00000000000011c2 <+57>: lea 0xe3b(%rip),%rax # 0x2004
0x00000000000011c9 <+64>: mov %rax,%rdi
0x00000000000011cc <+67>: mov $0x0,%eax
0x00000000000011d1 <+72>: call 0x1090 <printf@plt>
0x00000000000011d6 <+77>: nop
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail@plt>
0x00000000000011eb <+98>: leave
0x00000000000011ec <+99>: ret
End of assembler dump.
Also, what is the fs register used for in the assembly above? What is this instruction actually doing? 0x0000000000001199 <+16>: mov %fs:0x28,%rax
As already mentioned in comments, your buffer is on the stack.
At the beginning of the function, rsp is decreased to make room (the stack grows towards lower addresses, so rsp decreases as the stack grows). This space is generally used for local variables, arguments passed to the function, and also for other purposes (we will get back to this below).
In your case, you can trace where your buffer buf is by looking at the arguments passed to strcpy: the first argument is passed in the rdi register, the second in rsi.
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy@plt>
In the snippet above you can see that the pointer to buf (the first argument to strcpy) was in rax before being copied into rdi. And rax got its value from this instruction:
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
which means "load the effective address rbp - 0x70 into rax": lea computes a pointer without reading memory, so rax ends up pointing at offset -0x70 from the address in rbp. rbp is the function's frame pointer: it points to where the stack pointer was right after push %rbp saved the caller's frame pointer.
So it answers where the compiler has put your buffer.
Now for other questions:
Then, in the next instruction, 0x0000000000001195 <+12>:
mov %rdi,-0x78(%rbp), it's just setting -128 decimal to rdi.
As we said, rdi holds the first argument to a function. Here it holds the first argument to func(), the pointer name. This instruction does not set rdi to anything; it stores rdi on the stack at offset -0x78 from rbp, i.e. in the 8 bytes right below the space reserved for your buffer buf.
And the last two questions are related:
Also, what does this line do? 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
and
Also, what is the fs register used for in the assembly above? What is this instruction actually doing? 0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
...
...
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail@plt>
0x00000000000011eb <+98>: leave
There is some value at %fs:0x28, i.e. at offset 0x28 from the base of the fs segment. On x86-64 Linux, fs points at the current thread's thread-local storage, and glibc keeps the stack-protector value there, a random value chosen at program startup. This value is copied (via rax) onto the stack, into the highest 8 bytes of the space allocated for your function, just below the saved rbp. There it stays, hopefully untouched, until the function is about to return. At that point the code checks whether the value on the stack has changed: sub %fs:0x28,%rax yields zero if the two are equal. If the value is unchanged, the jump (je) takes you to leave and the function returns. If the value on the stack was changed, your code has overflowed a stack buffer (aha!), and __stack_chk_fail is called, which with glibc prints a "stack smashing detected" message and aborts the program. So the value at %fs:0x28 is a kind of magic value, usually called a stack canary.
One last thing: why add $0xffffffffffffff80,%rsp was used to allocate space on the stack rather than sub. Other compilers do use sub; for example, GCC 8.5.0 (20210514) generated:
sub $0x70,%rsp
It allocated less, and one of the reasons is that this compiler did not reserve space for the stack-protector canary.
As to "why use an add %rsp rather than a sub %rsp instruction":
On x86_64 there are actually two versions of these add/sub immediate with rsp instructions
a 4 byte version with a 1 byte immediate
a 7 byte version with a 4 byte immediate
For both versions, the immediate will be sign-extended to 64 bits and then added to (or subtracted from) %rsp. Now because of that sign extension, a 1-byte immediate can be any value from -128 (-0x80) up to 127 (0x7f). So the instruction
add $-0x80, %rsp
can use the 4-byte encoding, while the instruction
sub $0x80, %rsp
would require the 7 byte encoding. All else being equal (as it never is), the shorter encoding is better as it occupies less memory/cache.

Difficulty understanding logic in disassembled binary bomb phase 3

I have the following assembly program from the binary-bomb lab. The goal is to determine the keyword needed to run the binary without triggering the explode_bomb function. I commented my analysis of the assembly for this program but I am having trouble piecing everything together.
I believe I have all the information I need, but I still am unable to see the actual underlying logic and thus I am stuck. I would greatly appreciate any help!
The following is the disassembled program itself:
0x08048c3c <+0>: push %edi
0x08048c3d <+1>: push %esi
0x08048c3e <+2>: sub $0x14,%esp
0x08048c41 <+5>: movl $0x804a388,(%esp)
0x08048c48 <+12>: call 0x80490ab <string_length>
0x08048c4d <+17>: add $0x1,%eax
0x08048c50 <+20>: mov %eax,(%esp)
0x08048c53 <+23>: call 0x8048800 <malloc@plt>
0x08048c58 <+28>: mov $0x804a388,%esi
0x08048c5d <+33>: mov $0x13,%ecx
0x08048c62 <+38>: mov %eax,%edi
0x08048c64 <+40>: rep movsl %ds:(%esi),%es:(%edi)
0x08048c66 <+42>: movzwl (%esi),%edx
0x08048c69 <+45>: mov %dx,(%edi)
0x08048c6c <+48>: movzbl 0x11(%eax),%edx
0x08048c70 <+52>: mov %dl,0x10(%eax)
0x08048c73 <+55>: mov %eax,0x4(%esp)
0x08048c77 <+59>: mov 0x20(%esp),%eax
0x08048c7b <+63>: mov %eax,(%esp)
0x08048c7e <+66>: call 0x80490ca <strings_not_equal>
0x08048c83 <+71>: test %eax,%eax
0x08048c85 <+73>: je 0x8048c8c <phase_3+80>
0x08048c87 <+75>: call 0x8049363 <explode_bomb>
0x08048c8c <+80>: add $0x14,%esp
0x08048c8f <+83>: pop %esi
0x08048c90 <+84>: pop %edi
0x08048c91 <+85>: ret
The following block contains my analysis:
<phase_3>
0x08048c3c <+0>: push %edi // push value in edi to stack
0x08048c3d <+1>: push %esi // push value of esi to stack
0x08048c3e <+2>: sub $0x14,%esp // grow stack by 0x14 (move stack ptr -0x14 bytes)
0x08048c41 <+5>: movl $0x804a388,(%esp) // put 0x804a388 into loc esp points to
0x08048c48 <+12>: call 0x80490ab <string_length> // check string length, store in eax
0x08048c4d <+17>: add $0x1,%eax // increment val in eax by 0x1 (str len + 1)
// at this point, eax = str_len + 1 = 77 + 1 = 78
0x08048c50 <+20>: mov %eax,(%esp) // get val in eax and put in loc on stack
//**** at this point, 0x804a388 should have a value of 78? ****
0x08048c53 <+23>: call 0x8048800 <malloc@plt> // malloc --> base ptr in eax
0x08048c58 <+28>: mov $0x804a388,%esi // 0x804a388 in esi
0x08048c5d <+33>: mov $0x13,%ecx // put 0x13 in ecx (counter register)
0x08048c62 <+38>: mov %eax,%edi // put val in eax into edi
0x08048c64 <+40>: rep movsl %ds:(%esi),%es:(%edi) // repeat 0x13 (19) times
// **** populate malloced memory with first 19 (edit: 76) chars of string at 0x804a388 (this string is 77 characters long)? ****
0x08048c66 <+42>: movzwl (%esi),%edx // put val in loc esi points to into edx
// ***** at this point, edx should contain the string at 0x804a388?
0x08048c69 <+45>: mov %dx,(%edi) // put val in dx to loc edi points to
// ***** not sure what effect this has or what is in edi at this point
0x08048c6c <+48>: movzbl 0x11(%eax),%edx // edx = [eax + 0x11]
0x08048c70 <+52>: mov %dl,0x10(%eax) // [eax + 0x10] = dl
0x08048c73 <+55>: mov %eax,0x4(%esp) // [esp + 0x4] = eax
0x08048c77 <+59>: mov 0x20(%esp),%eax // eax = [esp + 0x20]
0x08048c7b <+63>: mov %eax,(%esp) // put val in eax into loc esp points to
// ***** not sure what effect these movs have
// edi --> first arg
// esi --> second arg
// compare value in esi to edi
0x08048c7e <+66>: call 0x80490ca <strings_not_equal> // store result in eax
0x08048c83 <+71>: test %eax,%eax
0x08048c85 <+73>: je 0x8048c8c <phase_3+80>
0x08048c87 <+75>: call 0x8049363 <explode_bomb>
0x08048c8c <+80>: add $0x14,%esp
0x08048c8f <+83>: pop %esi
0x08048c90 <+84>: pop %edi
0x08048c91 <+85>: ret
Update:
Upon inspecting the registers before strings_not_equal is called, I get the following:
eax 0x804d8aa 134535338
ecx 0x0 0
edx 0x76 118
ebx 0xffffd354 -11436
esp 0xffffd280 0xffffd280
ebp 0xffffd2b8 0xffffd2b8
esi 0x804a3d4 134521812
edi 0x804f744 134543172
eip 0x8048c7b 0x8048c7b <phase_3+63>
eflags 0x282 [ SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
and I get the following disassembled pseudocode using Hopper:
I even tried using both the number found in eax and the string seen earlier as my keyword, but neither of them worked.
The function makes a modified copy of a string from static storage, into a malloced buffer.
This looks weird. The malloc size is dependent on strlen+1, but the memcpy size is a compile-time constant? Your decompilation apparently shows that the address was a string literal, so it seems that's fine.
Probably that missed optimization happened because of a custom string_length() function that was maybe only defined in another .c (and the bomb was compiled without link-time optimization for cross-file inlining). So size_t len = string_length("some string literal"); is not a compile-time constant and the compiler emitted a call to it instead of being able to use the known constant length of the string.
But probably they used strcpy in the source and the compiler did inline that as a rep movs. Since it's apparently copying from a string literal, the length is a compile-time constant and it can optimize away that part of the work that strcpy normally has to do. Normally if you've already calculated the length it's better to use memcpy instead of making strcpy calculate it again on the fly, but in this case it actually helped the compiler make better code for that part than if they'd passed the return value of string_length to a memcpy, again because string_length couldn't be inlined and optimized away.
<+0>: push %edi // push value in edi to stack
<+1>: push %esi // push value of esi to stack
<+2>: sub $0x14,%esp // grow stack by 0x14 (move stack ptr -0x14 bytes)
Comments like that are redundant; the instruction itself already says that. This is saving two call-preserved registers so the function can use them internally and restore them later.
Your comment on the sub is better; yes, "grow the stack" is the higher-level semantic meaning here. This function reserves some space for locals (and for function args to be stored with mov instead of pushed).
The rep movsd copies 0x13 * 4 bytes, incrementing ESI and EDI to point past the end of the copied region. So another movsd instruction would copy another 4 bytes contiguous with the previous copy.
The code actually copies another 2, but instead of using movsw, it uses a movzw word load and a mov store. This makes a total of 78 bytes copied.
...
# at this point EAX = malloc return value which I'll call buf
<+28>: mov $0x804a388,%esi # copy src = a string literal in .rodata?
<+33>: mov $0x13,%ecx
<+38>: mov %eax,%edi # copy dst = buf
<+40>: rep movsl %ds:(%esi),%es:(%edi) # memcpy 76 bytes and advance ESI, EDI
<+42>: movzwl (%esi),%edx
<+45>: mov %dx,(%edi) # copy another 2 bytes (not moving ESI or EDI)
# final effect: 78-byte memcpy
On some (but not all) CPUs it would have been efficient to just use rep movsb or rep movsw with appropriate counts, but that's not what the compiler chose in this case. movzx aka AT&T movz is a good way to do narrow loads without partial-register penalties. That's why compilers do it, so they can write a full register even though they're only going to read the low 8 or 16 bits of that reg with a store instruction.
After that copy of a string literal into buf, we have a byte load/store that copies a character within buf. Remember, at this point EAX is still pointing at buf, the malloc return value. So it's making a modified copy of the string literal.
<+48>: movzbl 0x11(%eax),%edx
<+52>: mov %dl,0x10(%eax) # buf[16] = buf[17]
Perhaps if the source hadn't defeated constant-propagation, with high enough optimization level the compiler might have just put the final string into .rodata where you could find it, trivializing this bomb phase. :P
Then it stores pointers as stack args for string compare.
<+55>: mov %eax,0x4(%esp) # 2nd arg slot = EAX = buf
<+59>: mov 0x20(%esp),%eax # function arg = user input?
<+63>: mov %eax,(%esp) # first arg slot = our incoming stack arg
<+66>: call 0x80490ca <strings_not_equal>
How to "cheat": looking at the runtime result with GDB
Some bomb labs only let you run the bomb online, on a test server, which records explosions. You couldn't run it under GDB, only use static disassembly (like objdump -drwC -Mintel), so the test server could record how many failed attempts you had: e.g. CS 3330 at cs.virginia.edu, which I found with Google, where full credit requires fewer than 20 explosions.
Using GDB to examine memory / registers part way through a function makes this vastly easier than only working from static analysis, in fact trivializing this function where the single input is only checked at the very end. e.g. just look at what other arg is being passed to strings_not_equal. (Especially if you use GDB's jump or set $pc = ... commands to skip past the bomb explosion checks.)
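Set a breakpoint or single-step to just before the call to strings_not_equal. At <+55>, EAX still holds the address of buf (the malloc return value; you can see it being stored to the stack there), so p (char*)$eax treats EAX as a char* and shows you the (0-terminated) C string starting at that address. By the time you reach the call, <+59> has overwritten EAX with the incoming argument (the user input), but the buf pointer is still in the second stack-arg slot, so p *(char**)($esp+4) shows the same string.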
Copy/paste that string result and you're done.
Other phases with multiple numeric inputs typically aren't this easy to cheese with a debugger and do require at least some math, but linked-list phases that require you to enter a sequence of numbers in the right order for list traversal also become trivial if you know how to use a debugger to set registers to make compares succeed as you get to them.
rep movsl copies 32-bit longwords from address %esi to address %edi, incrementing both by 4 each time (the direction flag is clear, as the ABI guarantees on function entry), a number of times equal to %ecx. Think of it as memcpy(edi, esi, ecx*4).
See https://felixcloutier.com/x86/movs:movsb:movsw:movsd:movsq (it's movsd in Intel notation).
So this is copying 19*4=76 bytes.

Trouble understanding this assembly code

I have an exam coming up, and I'm struggling with assembly. I have written some simple C code, gotten its assembly code, and am now trying to comment on the assembly code as practice. The C code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[])
{
int x = 10;
char const* y = argv[1];
printf("%s\n",y );
return 0;
}
Its assembly code:
0x00000000000006a0 <+0>: push %rbp # Creating stack
0x00000000000006a1 <+1>: mov %rsp,%rbp # Saving base of stack into base pointer register
0x00000000000006a4 <+4>: sub $0x20,%rsp # Allocate 32 bytes of space on the stack
0x00000000000006a8 <+8>: mov %edi,-0x14(%rbp) # First argument stored in stackframe
0x00000000000006ab <+11>: mov %rsi,-0x20(%rbp) # Second argument stored in stackframe
0x00000000000006af <+15>: movl $0xa,-0xc(%rbp) # Value 10 stored in x's address in the stackframe
0x00000000000006b6 <+22>: mov -0x20(%rbp),%rax # Second argument stored in return value register
0x00000000000006ba <+26>: mov 0x8(%rax),%rax # ??
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # ??
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # ??
0x00000000000006c6 <+38>: mov %rax,%rdi # Return value copied to 1st argument register - why??
0x00000000000006c9 <+41>: callq 0x560 # printf??
0x00000000000006ce <+46>: mov $0x0,%eax # Value 0 is copied to return register
0x00000000000006d3 <+51>: leaveq # Destroying stackframe
0x00000000000006d4 <+52>: retq # Popping return address, and setting instruction pointer equal to it
Can a friendly soul help me out wherever I have "??" (meaning I don't understand what is happening or I'm unsure)?
0x00000000000006ba <+26>: mov 0x8(%rax),%rax # get argv[1] to rax
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # move argv[1] to local variable
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # move local variable to rax (for move to rdi)
0x00000000000006c6 <+38>: mov %rax,%rdi # now rdi has argv[1]
0x00000000000006c9 <+41>: callq 0x560 # it is puts (optimized)
I will try to make a guess:
mov -0x20(%rbp),%rax # retrieve argv itself (the char ** pointer, not argv[0])
mov 0x8(%rax),%rax # store argv[1] into rax
mov %rax,-0x8(%rbp) # store argv[1] (which now is in rax) into y
mov -0x8(%rbp),%rax # put y back into rax (which might look dumb, but possibly it has its reasons)
mov %rax,%rdi # copy y to rdi, possibly to prepare the context for the printf
When you deal with assembler, please specify which architecture you are using. An Intel processor uses a different instruction set from an ARM one; even similar instructions might behave differently or rely on different assumptions. As you might know, optimizations change the sequence of assembler instructions generated by the compiler, so you might want to specify whether you are using them as well (it looks like not?) and which compiler you are using, as each one has its own policy for generating assembler.
Maybe we will never know why the compiler prepares the context for printf by copying from rax; it could be the compiler's choice or an obligation imposed by the specific architecture. For all those annoying reasons, most people prefer to use a "high-level language" such as C, so that the set of instructions is always right, although it might look very dumb to a human (as we know, computers are dumb by design) and is not always the most efficient choice; that's why there are still many compilers around.
I can give you two more tips:
your IDE probably has a way to interleave assembler instructions with C code, and to single-step through the assembler. Try to find it and explore it yourself
the IDE should also have a function to explore the memory of your program. If you find it, try entering the address 0x560 and see where it leads you. It is very likely to be the entry point of the function being called (printf, or puts if the compiler substituted it)
I hope that my answer will help you work it out, good luck

Why gcc disassembler allocating extra space for local variable?

I have written a simple function in C:
void GetInput()
{
char buffer[8];
gets(buffer);
puts(buffer);
}
When I disassemble it with gdb's disassembler, it gives the following disassembly:
0x08048464 <+0>: push %ebp
0x08048465 <+1>: mov %esp,%ebp
0x08048467 <+3>: sub $0x10,%esp
0x0804846a <+6>: mov %gs:0x14,%eax
0x08048470 <+12>: mov %eax,-0x4(%ebp)
0x08048473 <+15>: xor %eax,%eax
=> 0x08048475 <+17>: lea -0xc(%ebp),%eax
0x08048478 <+20>: mov %eax,(%esp)
0x0804847b <+23>: call 0x8048360 <gets@plt>
0x08048480 <+28>: lea -0xc(%ebp),%eax
0x08048483 <+31>: mov %eax,(%esp)
0x08048486 <+34>: call 0x8048380 <puts@plt>
0x0804848b <+39>: mov -0x4(%ebp),%eax
0x0804848e <+42>: xor %gs:0x14,%eax
0x08048495 <+49>: je 0x804849c <GetInput+56>
0x08048497 <+51>: call 0x8048370 <__stack_chk_fail@plt>
0x0804849c <+56>: leave
0x0804849d <+57>: ret
Now please look at line number three, 0x08048467 <+3>: sub $0x10,%esp. I have only 8 bytes allocated as a local variable, so why is the compiler allocating 16 bytes (0x10)?
Secondly, what is the meaning of xor %gs:0x14,%eax?
Edit: If it is an optimization, is there any way to stop it?
Thanks.
Two things:
The compiler may reserve space for intermediate expressions to which you did not give names in the source code (or conversely not allocate space for local variables that can live entirely in registers). The list of stack slots in the binary does not have to match the list of local variables in the source code.
On some platforms, the compiler has to keep the stack pointer aligned. For the particular example in your question, it is likely that the compiler is striving to keep the stack pointer aligned to a boundary of 16 bytes.
Regarding your other question, which you should have asked separately: xor %gs:0x14,%eax is part of a stack protection mechanism (the canary check), which many distributions enable by default in their GCC builds. If you are using GCC, turn it off with -fno-stack-protector.
Besides the other answers already given, GCC will prefer to keep the stack 16-byte aligned for storing SSE values on the stack, since SSE instructions such as movaps require their memory operand to be 16-byte aligned.
This builds more on Pascal's answer, but in this case, it's probably because of the stack protection mechanism.
You allocate 8 bytes, which is fair enough and accounted for in the stack pointer adjustment. In addition, the stack-protector canary value from %gs:0x14 is saved at -0x4(%ebp), just below the saved %ebp at the top of the current stack frame, on the following lines:
0x0804846a <+6>: mov %gs:0x14,%eax
0x08048470 <+12>: mov %eax,-0x4(%ebp)
This takes four bytes. The remaining four bytes, at the bottom of the frame at (%esp), are the outgoing argument slot: the following lines store the address of buffer there before calling gets (and the same is done again before puts):
=> 0x08048475 <+17>: lea -0xc(%ebp),%eax
0x08048478 <+20>: mov %eax,(%esp)
