Trouble understanding this assembly code - c

I have an exam comming up, and I'm strugling with assembly. I have written some simple C code, gotten its assembly code, and then trying to comment on the assembly code as practice. The C code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[])
{
int x = 10;
char const* y = argv[1];
printf("%s\n",y );
return 0;
}
Its assembly code:
0x00000000000006a0 <+0>: push %rbp # Creating stack
0x00000000000006a1 <+1>: mov %rsp,%rbp # Saving base of stack into base pointer register
0x00000000000006a4 <+4>: sub $0x20,%rsp # Allocate 32 bytes of space on the stack
0x00000000000006a8 <+8>: mov %edi,-0x14(%rbp) # First argument stored in stackframe
0x00000000000006ab <+11>: mov %rsi,-0x20(%rbp) # Second argument stored in stackframe
0x00000000000006af <+15>: movl $0xa,-0xc(%rbp) # Value 10 stored in x's address in the stackframe
0x00000000000006b6 <+22>: mov -0x20(%rbp),%rax # Second argument stored in return value register
0x00000000000006ba <+26>: mov 0x8(%rax),%rax # ??
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # ??
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # ??
0x00000000000006c6 <+38>: mov %rax,%rdi # Return value copied to 1st argument register - why??
0x00000000000006c9 <+41>: callq 0x560 # printf??
0x00000000000006ce <+46>: mov $0x0,%eax # Value 0 is copied to return register
0x00000000000006d3 <+51>: leaveq # Destroying stackframe
0x00000000000006d4 <+52>: retq # Popping return address, and setting instruction pointer equal to it
Can a friendly soul help me out wherever I have "??" (meaning I don't understand what is happening or I'm unsure)?

0x00000000000006ba <+26>: mov 0x8(%rax),%rax # get argv[1] to rax
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # move argv[1] to local variable
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # move local variable to rax (for move to rdi)
0x00000000000006c6 <+38>: mov %rax,%rdi # now rdi has argv[1]
0x00000000000006c9 <+41>: callq 0x560 # it is puts (optimized)

I will try to make a guess:
mov -0x20(%rbp),%rax # retrieve argv[0]
mov 0x8(%rax),%rax # store argv[1] into rax
mov %rax,-0x8(%rbp) # store argv[1] (which now is in rax) into y
mov -0x8(%rbp),%rax # put y back into rax (which might look dumb, but possibly it has its reasons)
mov %rax,%rdi # copy y to rdi, possibly to prepare the context for the printf
When you deal with assembler, please specify which architecture you are using. An Intel processor might use a different set of instructions from an ARM one, the same instructions might be different or they might rely on different assumptions. As you might know, optimisations change the sequence of assembler instructions generated by the compiler, you might want to specify whether you are using that as well (looks like not?) and which compiler you are using as everyone has its own policy for generating assembler.
Maybe we will never know why the compiler must prepare the context for printf by copying from rax, it could be a compiler's choice or an obligation imposed by the specific architecture. For all those annoying reasons, most of people prefer to use a "high level language" such as C, so that the set of instructions is always right although it might look very dumb for a human (as we know computers are dumb by design) and not always the most choice, that's why there are still many compilers around.
I can give you two more tips:
you IDE must have a way to interleave assembler instructions with C code, and to single step within the assembler. Try to find it out and explore it yourself
the IDE should also have a function to explore the memory of your program. If you find that try to enter the 0x560 address and look were it will lead you. It is very likely that that will be the entry point of your printf
I hope that my answer will help you work it out, good luck

Related

Difference in x86-32 and x64 Assembly stack allocation for a fixed-size buffer with unoptimized C (GCC)

Doing some basic disassembly and have noticed that the buffer is being given additional buffer space for some reason although what i am looking at in a tutorial uses the same code but is only given the correct (500) chars in length. Why is this?
My code:
#include <stdio.h>
#include <string.h>
int main (int argc, char** argv){
char buffer[500];
strcpy(buffer, argv[1]);
return 0;
}
compiled with GCC, the dissembled code is:
0x0000000000001139 <+0>: push %rbp
0x000000000000113a <+1>: mov %rsp,%rbp
0x000000000000113d <+4>: sub $0x210,%rsp
0x0000000000001144 <+11>: mov %edi,-0x204(%rbp)
0x000000000000114a <+17>: mov %rsi,-0x210(%rbp)
0x0000000000001151 <+24>: mov -0x210(%rbp),%rax
0x0000000000001158 <+31>: add $0x8,%rax
0x000000000000115c <+35>: mov (%rax),%rdx
0x000000000000115f <+38>: lea -0x200(%rbp),%rax
0x0000000000001166 <+45>: mov %rdx,%rsi
0x0000000000001169 <+48>: mov %rax,%rdi
0x000000000000116c <+51>: call 0x1030 <strcpy#plt>
0x0000000000001171 <+56>: mov $0x0,%eax
0x0000000000001176 <+61>: leave
0x0000000000001177 <+62>: ret
However, this video https://www.youtube.com/watch?v=1S0aBV-Waeo clearly only has 500 bytes assigned
Why is this this the case as the only difference I can see here is one is 32-bit and another (mine) is on x86-64.
500 is not a multiple of 16.
The x86-64 ABI (application binary interface) requires the stack pointer to be a multiple of 16 whenever a call instruction is about to happen. (Since call pushes an 8-byte return address, this means the stack pointer is always congruent to 8, mod 16, when control reaches the first instruction of a called function.) For the code shown, it is convenient for the compiler to achieve this requirement by increasing the value it uses in the sub instruction, making it be a multiple of 16.
The x86-32 ABI did not make this requirement, so there was no reason for the compiler used in the video to increase the size of the stack frame.
Note that you appear to have compiled your code without optimization. I get this at -O2:
0x0000000000000000 <+0>: sub $0x208,%rsp
0x0000000000000007 <+7>: mov 0x8(%rsi),%rsi
0x000000000000000b <+11>: mov %rsp,%rdi
0x000000000000000e <+14>: call <strcpy#PLT>
0x0000000000000013 <+19>: xor %eax,%eax
0x0000000000000015 <+21>: add $0x208,%rsp
0x000000000000001c <+28>: ret
The stack adjustment is still somewhat larger than the size of the array, but not as big as what you had, and no longer a multiple of 16; the difference is that with optimization on, the frame pointer is eliminated, so %rbp does not need to be saved and restored, and so the stack pointer is not a multiple of 16 at the point of the sub instruction.
(Incidentally, there is no requirement anywhere for a stack frame to be as small as possible. "Quality of implementation" dictates that it should be as small as possible, but for various reasons it's quite common for the compiler to miss that target. In my optimized code dump, I don't see any reason why the immediate operand to sub and add couldn't have been 0x1f8 (504).

Understanding x86-64 assembly for simple program in C with a function call

I have simple C program that produces this x86-64 assembly for function func
#include <stdio.h>
#include <string.h>
void func(char *name)
{
char buf[90];
strcpy(buf, name);
printf("Welcome %s\n", buf);
}
int main(int argc, char *argv[])
{
func(argv[1]);
return 0;
}
So I think this
0x000000000000118d <+4>: push %rbp
pushes the base pointer like placed argument which is char *name
then 0x000000000000118e <+5>: mov %rsp,%rbp set stack pointer to what at base pointer I belive that above and this makes stack point points to char *name at this point
then
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
I am little unsure about this. Why is 0xffffffffffffff80 added to rsp? What is the point of this instruction. Can any one please tell.
then in next instruction 0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp)
its just setting -128 decimal to rdi. But still no buffer char buf[90] can be seen, where is my buffer? in following assmebly, can anyone please tell?
also what this line 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
Dump of assembler code for function func:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp)
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
0x00000000000011a6 <+29>: xor %eax,%eax
0x00000000000011a8 <+31>: mov -0x78(%rbp),%rdx
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy#plt>
0x00000000000011bb <+50>: lea -0x70(%rbp),%rax
0x00000000000011bf <+54>: mov %rax,%rsi
0x00000000000011c2 <+57>: lea 0xe3b(%rip),%rax # 0x2004
0x00000000000011c9 <+64>: mov %rax,%rdi
0x00000000000011cc <+67>: mov $0x0,%eax
0x00000000000011d1 <+72>: call 0x1090 <printf#plt>
0x00000000000011d6 <+77>: nop
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail#plt>
0x00000000000011eb <+98>: leave
0x00000000000011ec <+99>: ret
End of assembler dump.
also what in above assembly the use of fs register what this instruction actually doing 0x0000000000001199 <+16>: mov %fs:0x28,%rax
As already mentioned in comments, your buffer is on the stack.
In the beginning of the function the rsp is decreased to allow more space (stack grows towards lower addresses, thus rsp is decreased as stack grows). This space is generally used for local variables, arguments passed to the function, and also for other purposes (will get back to it below).
In your case, you may trace back where your buffer buf is by looking at what arguments are passed to the strcpy - the first argument is passed in rdi register, the second - in rsi.
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy#plt>
In the snippet above you can see that the pointer to buf (first argument to strcpy) was in rax prior to being put to rdi. And rax got its value from this instruction:
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
which means "load effective address (i.e. a pointer) that resides at offset -0x70 from the address rbp is pointing to". rbp points to where the stack pointer was in the beginning of the function (function frame pointer).
So it answers where the compiler has put your buffer.
Now for other questions:
then in next instruction 0x0000000000001195 <+12>:
mov %rdi,-0x78(%rbp) its just setting -128 decimal to rdi.
As we said, rdi holds the first argument to a function. Here it holds a first argument to func(), which is a pointer to name. This instruction puts this argument onto a stack at an offset of -0x78 from rbp - 8 bytes right before the space reserved for your buffer buf.
And the last two questions are related:
also what this line 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
and
also what in above assembly the use of fs register what this instruction actually doing 0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
...
...
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail#plt>
0x00000000000011eb <+98>: leave
There is some value at %fs:0x28 (which denotes an offset of 0x28 in an fs segment). And this value is being placed (via rax) to the stack. To the very first 8 bytes in the space allocated for your function. And there it stays, hopefully untouched, until the function is about to return. There, it checks whether the value on the stack was changed. If it remained unchanged, the jump (je) will take you to the leave and the function will return. If, by any chance, the value on the stack got changed - your code has caused a stack overflow (aha!) and a call to __stack_chk_fail will be triggered, which perhaps will warn you about the overflow, and perhaps dump some debug information. So the value at %fs:0x28 is a kind of a unique magic/canary value.
And one last thing - about why add $0xffffffffffffff80,%rsp was used to allocate space on the stack, and not sub - other compilers do use sub as did GCC (version 8.5.0 20210514):
sub $0x70,%rsp
It allocated less, and one of the reasons is that the compiler did not reserve space for the stack overflow check.
As to "why use an add %rsp rather than a sub %rsp instruction":
On x86_64 there are actually two versions of these add/sub immediate with rsp instructions
a 4 byte version with a 1 byte immediate
a 7 byte version with a 4 byte immediate
For both versions, the immediate will be sign-extended to 64 bits and then added to (or subtracted from) %rsp. Now because of that sign extension, a 1-byte immediate can be any value from -128 (-0x80) up to 127 (0x7f). So the instruction
add $-0x80, %rsp
can use the 4-byte encoding, while the instruction
sub $0x80, %rsp
would require the 7 byte encoding. All else being equal (as it never is), the shorter encoding is better as it occupies less memory/cache.

Why is the assembler using FS:40 [duplicate]

My hello & regards to all. I have a C program, basically wrote for testing Buffer overflow.
#include<stdio.h>
void display()
{
char buff[8];
gets(buff);
puts(buff);
}
main()
{
display();
return(0);
}
Now i disassemble display and main sections of it using GDB. The code:-
Dump of assembler code for function main:
0x080484ae <+0>: push %ebp # saving ebp to stack
0x080484af <+1>: mov %esp,%ebp # saving esp in ebp
0x080484b1 <+3>: call 0x8048474 <display> # calling display function
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
0x080484bb <+13>: pop %ebp # remove ebp from stack
0x080484bc <+14>: ret # return
End of assembler dump.
Dump of assembler code for function display:
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes
lower placed value ( WHY???? )
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
0x0804848b <+23>: call 0x8048374 <gets#plt> # calling get, what is "#plt" ????
0x08048490 <+28>: lea -0xc(%ebp),%eax # LEA of 12 bytes lower to eax
0x08048493 <+31>: mov %eax,(%esp) # make esp point to eax contained address
0x08048496 <+34>: call 0x80483a4 <puts#plt> # again what is "#plt" ????
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail#plt> # not known to me
0x080484ac <+56>: leave # a new instruction, not known to me
0x080484ad <+57>: ret # return to MAIN's next instruction
End of assembler dump.
So folks, you should consider my homework. Rest all of the code is known to me, except few lines. I have included a big "WHY ????" and some more questions in the comments ahead of each line. The first hurdle for me is "mov %gs:0x14,%eax" instruction, I cant make flow chart after this instruction. Somebody plz explain me, what these few instructions are meant for and doing what in the program? Thanks...
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
Don't you have this?:
return(0);
They are probably related. :)
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
It means reading 4 bytes into eax from memory at address gs:0x14. gs is a segment register. Most likely thread-local storage (AKA TLS) is referenced through this register.
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Don't know. Could be optimization-related.
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes
lower placed value ( WHY???? )
It makes eax point to a local variable that lives on the stack. sub $0x10,%esp allocated some space for them.
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
Wrong. It writes eax to the stack, to the stack top. It will be passed as an on-stack argument to the called function:
0x0804848b <+23>: call 0x8048374 <gets#plt> # calling get, what is "#plt" ????
I don't know. Could be some name mangling.
By now you should've guessed what local variable that was. buff, what else could it be?
0x080484ac <+56>: leave # a new instruction, not known to me
Why don't you look it up in the CPU manual?
Now, I can probably explain you the gs/TLS thing...
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
...
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail#plt> # not known to me
0x080484ac <+56>
So, this code takes a value from the TLS (at gs:0x14) and stores it right below the saved ebp value (at ebp-4). Then there's your stuff with get() and put(). Then this code checks whether the copy of the value from the TLS is unchanged. xor %gs:0x14,%eax does the compare.
If XORed values are the same, the result of the XOR is 0 and flags.zf is 1. Else, the result isn't 0 and flags.zf is 0.
je 0x80484ac <display+56> checks flags.zf and skips call 0x8048394 <__stack_chk_fail#plt> if flags.zf = 1. IOW, this call is skipped if the copy of the value from the TLS is unchanged.
What is that all about? That's a way to try to catch a buffer overflow. If you write beyond the end of the buffer, you will overwrite that value copied from the TLS to the stack.
Why do we take this value from the TLS, why not just a constant, hard-coded value? We probably want to use different, non-hard-coded values to catch overflows more often (and so the value in the TLS will change from a run to another run of your program and it will be different in different threads of your program). That also lowers chances of successfully exploiting the buffer overflow by an attacker if the value is chosen randomly each time your program runs.
Finally, if the copy of the value is found to have been overwritten due to a buffer overflow, call 0x8048394 <__stack_chk_fail#plt> will call a special function dedicated to doing whatever's necessary, e.g. reporting a problem and terminating the program.
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail#plt> # not known to me
0x080484ac <+56>: leave # a new instruction, not known to me
0x080484ad <+57>: ret # return to MAIN's next instruction
The gs segment can be used for thread local storage. E.g. it's used for errno, so that each thread in a multi-threaded program effectively has its own errno variable.
The function name above is a big clue. This must be a stack canary.
(leave is some CISC instruction that does everything you need to do before the actual ret. I don't know the details).
Others already explained the GS thing (has to do with threads)..
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Explaining this requires some history of the X86 architecture:
the xor eax, eax instruction clears out all bits in register eax (loads a zero), but as you've already found it this seems to be unnecessary because the register gets loaded with a new value in the next instruction.
However, xor eax, eax does something else on the x86 as well. You probably know that you are able to access parts of the register eax by using al, ah and ax. It has been that way since the 386, and it was okay back then when eax really was a single register.
However, this is no more. The registers that you see and use in your code are just placeholders. Inside the CPU is working with much more internal registers and a completely different instruction set. Instructions that you write are translated into this internal instruction set.
If you use AL, AH and EAX for example you are using three different registers from the CPU point of view.
Now if you access EAX after you have used AL or AH, the CPU has to merge back these different registers to build a valid EAX value.
The line:
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Does not only clear out register eax. It also tells the CPU that all renamed sub-registers: AL, AH and AX can now considered to be invalidated (set to zero) and the CPU does not have to do any sub-register merging.
Why is the compiler emitting this instruction?
Because the compiler does not know in which context display() will get called. You may call it from a piece of code that does lots of byte arithmetic using AL and AH. If it would not clear out the EAX register via XOR, the CPU would have to do the costly register merging which takes a lot of cycles.
So doing this extra work at the function start improves performance. It is unnecessary in your case, but since the compiler can't know that emits the instruction to be sure.
The stack_check_fail is part of gcc buffer overflow check. It uses libssp (stack-smash-protection), and your move at the beginning sets up a guard for the stack, and the xor %gs:0x14... is a check if the guard is still ok. When it is ok, it jumps to the leave (check assembler doc for it, its an helper instruction for stack handling) and skips the jump to the stack_chk_fail, which would abort the program and emit an error message.
You can disable the emitting of this overflow check with the gcc option -fno-stack-protector.
And as already mentioned in the comments, the xor x,x is just a quick command to clear x, and the final mov 0, %eax is for the return value of your main.

How does GDB determine the address to break at when you do "break function-name"?

A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf#plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?
gdb uses a few methods to decide this information.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb will assume that the function f's prologue is complete at 0xfff0010.
I think this is the mode used by gcc when not optimizing.
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
On Binutils 2.25 the skip algorithm on seems to be at: symtab.c:skip_prologue_sal, which breakpoints.c:break_command, the command definition, calls indirectly.
The prologue is a common "boilerplate" used at the start of function calls.
The prologues of foo2 is longer than that of foo1 by two instructions because:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations, in particular it must reduce the rsp before another call to save room for the local state.
Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1 however is a leaf function.
lea 0x400644,%rax
For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.
We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.
foo1 does not have local strings constants however.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required on leaf functions, but clang does it anyways. It makes register allocation easier.
On ELF platforms like linux, debug information is stored in a separate (non-executable) section in the executable. In this separate section there is all the information that is needed by the debugger. Check the DWARF2 specification for the specifics.

what does this instruction do?:- mov %gs:0x14,%eax

My hello & regards to all. I have a C program, basically wrote for testing Buffer overflow.
#include<stdio.h>
void display()
{
char buff[8];
gets(buff);
puts(buff);
}
main()
{
display();
return(0);
}
Now i disassemble display and main sections of it using GDB. The code:-
Dump of assembler code for function main:
0x080484ae <+0>: push %ebp # saving ebp to stack
0x080484af <+1>: mov %esp,%ebp # saving esp in ebp
0x080484b1 <+3>: call 0x8048474 <display> # calling display function
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
0x080484bb <+13>: pop %ebp # remove ebp from stack
0x080484bc <+14>: ret # return
End of assembler dump.
Dump of assembler code for function display:
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes
lower placed value ( WHY???? )
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
0x0804848b <+23>: call 0x8048374 <gets#plt> # calling get, what is "#plt" ????
0x08048490 <+28>: lea -0xc(%ebp),%eax # LEA of 12 bytes lower to eax
0x08048493 <+31>: mov %eax,(%esp) # make esp point to eax contained address
0x08048496 <+34>: call 0x80483a4 <puts#plt> # again what is "#plt" ????
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail#plt> # not known to me
0x080484ac <+56>: leave # a new instruction, not known to me
0x080484ad <+57>: ret # return to MAIN's next instruction
End of assembler dump.
So folks, you should consider my homework. Rest all of the code is known to me, except few lines. I have included a big "WHY ????" and some more questions in the comments ahead of each line. The first hurdle for me is "mov %gs:0x14,%eax" instruction, I cant make flow chart after this instruction. Somebody plz explain me, what these few instructions are meant for and doing what in the program? Thanks...
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
Don't you have this?:
return(0);
They are probably related. :)
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
It means reading 4 bytes into eax from memory at address gs:0x14. gs is a segment register. Most likely thread-local storage (AKA TLS) is referenced through this register.
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Don't know. Could be optimization-related.
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes
lower placed value ( WHY???? )
It makes eax point to a local variable that lives on the stack. sub $0x10,%esp allocated some space for them.
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
Wrong. It writes eax to the stack, to the stack top. It will be passed as an on-stack argument to the called function:
0x0804848b <+23>: call 0x8048374 <gets#plt> # calling get, what is "#plt" ????
I don't know. Could be some name mangling.
By now you should've guessed what local variable that was. buff, what else could it be?
0x080484ac <+56>: leave # a new instruction, not known to me
Why don't you look it up in the CPU manual?
Now, I can probably explain you the gs/TLS thing...
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
...
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail#plt> # not known to me
0x080484ac <+56>
So, this code takes a value from the TLS (at gs:0x14) and stores it right below the saved ebp value (at ebp-4). Then there's your stuff with get() and put(). Then this code checks whether the copy of the value from the TLS is unchanged. xor %gs:0x14,%eax does the compare.
If XORed values are the same, the result of the XOR is 0 and flags.zf is 1. Else, the result isn't 0 and flags.zf is 0.
je 0x80484ac <display+56> checks flags.zf and skips call 0x8048394 <__stack_chk_fail#plt> if flags.zf = 1. IOW, this call is skipped if the copy of the value from the TLS is unchanged.
What is that all about? That's a way to try to catch a buffer overflow. If you write beyond the end of the buffer, you will overwrite that value copied from the TLS to the stack.
Why do we take this value from the TLS, why not just a constant, hard-coded value? We probably want to use different, non-hard-coded values to catch overflows more often (and so the value in the TLS will change from a run to another run of your program and it will be different in different threads of your program). That also lowers chances of successfully exploiting the buffer overflow by an attacker if the value is chosen randomly each time your program runs.
Finally, if the copy of the value is found to have been overwritten due to a buffer overflow, call 0x8048394 <__stack_chk_fail#plt> will call a special function dedicated to doing whatever's necessary, e.g. reporting a problem and terminating the program.
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail#plt> # not known to me
0x080484ac <+56>: leave # a new instruction, not known to me
0x080484ad <+57>: ret # return to MAIN's next instruction
The gs segment can be used for thread local storage. E.g. it's used for errno, so that each thread in a multi-threaded program effectively has its own errno variable.
The function name above is a big clue. This must be a stack canary.
(leave is some CISC instruction that does everything you need to do before the actual ret. I don't know the details).
Others already explained the GS thing (has to do with threads)..
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Explaining this requires some history of the X86 architecture:
the xor eax, eax instruction clears out all bits in register eax (loads a zero), but as you've already found it this seems to be unnecessary because the register gets loaded with a new value in the next instruction.
However, xor eax, eax does something else on the x86 as well. You probably know that you are able to access parts of the register eax by using al, ah and ax. It has been that way since the 386, and it was okay back then when eax really was a single register.
However, this is no more. The registers that you see and use in your code are just placeholders. Inside the CPU is working with much more internal registers and a completely different instruction set. Instructions that you write are translated into this internal instruction set.
If you use AL, AH and EAX for example you are using three different registers from the CPU point of view.
Now if you access EAX after you have used AL or AH, the CPU has to merge back these different registers to build a valid EAX value.
The line:
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Does not only clear out register eax. It also tells the CPU that all renamed sub-registers: AL, AH and AX can now considered to be invalidated (set to zero) and the CPU does not have to do any sub-register merging.
Why is the compiler emitting this instruction?
Because the compiler does not know in which context display() will get called. You may call it from a piece of code that does lots of byte arithmetic using AL and AH. If it would not clear out the EAX register via XOR, the CPU would have to do the costly register merging which takes a lot of cycles.
So doing this extra work at the function start improves performance. It is unnecessary in your case, but since the compiler can't know that emits the instruction to be sure.
The stack_check_fail is part of gcc buffer overflow check. It uses libssp (stack-smash-protection), and your move at the beginning sets up a guard for the stack, and the xor %gs:0x14... is a check if the guard is still ok. When it is ok, it jumps to the leave (check assembler doc for it, its an helper instruction for stack handling) and skips the jump to the stack_chk_fail, which would abort the program and emit an error message.
You can disable the emitting of this overflow check with the gcc option -fno-stack-protector.
And as already mentioned in the comments, the xor x,x is just a quick command to clear x, and the final mov 0, %eax is for the return value of your main.

Resources