Disassembly of a C function - c

I was trying to understand disassembled code of the following function.
void func(char *string) {
printf("the string is %s\n",string);
}
The disassembled code is given below.
1) 0x080483e4 <+0>: push %ebp
2) 0x080483e5 <+1>: mov %esp,%ebp
3) 0x080483e7 <+3>: sub $0x18,%esp
4) 0x080483ea <+6>: mov $0x80484f0,%eax
5) 0x080483ef <+11>: mov 0x8(%ebp),%edx
6) 0x080483f2 <+14>: mov %edx,0x4(%esp)
7) 0x080483f6 <+18>: mov %eax,(%esp)
8) 0x080483f9 <+21>: call 0x8048300 <printf#plt>
Could anyone tell me what lines 4-7 means (not the literal explanation). Also why 24 bytes are allocated on stack on line 3?

Basically what happens here:
4) 0x080483ea <+6> : mov $0x80484f0,%eax
Load the address of "the string is %s\n" into eax.
5) 0x080483ef <+11>: mov 0x8(%ebp),%edx
Move the argument string into edx.
6) 0x080483f2 <+14>: mov %edx,0x4(%esp)
Push the value of edx or string into the stack, second argument of printf
7) 0x080483f6 <+18>: mov %eax,(%esp)
Push the value of eax or "the string is %s\n" into the stack, first argument of printf, and then it will call printf.
sub $0x18,%esp is not necessary since the function has no local variables, gcc seems to like making extra space but honestly I don't know why.

The stack is a continuous region of memory that starts at a higher address and ends at esp. Whenever you need your stack to grow, you subtract from esp. Every function can have a frame on the stack. It is the part of the stack that the function owns and is responsible for cleaning after it is done. It means, when the function starts it decreases esp to create its frame. When it ends it increases it back. ebp usually points to the beginning of your frame.
Initially this function pushs ebp to tha stack so that it can be stored when the function ends, sets esp = ebp to mark the begin of its frame and allocate 28 bytes. Why 28? For alignment. It already allocated 4 bytes for the ebp. 4 + 28 = 32.
The lines 4-7 will prepare the call to printf. It expects its arguments to be on the frame of the caller. When we read mov 0x8(%ebp), %edx, we are taking our argument char* string from the caller's frame. printf will do the same.
Note that your assembly is missing a leave and a ret instructions to clear the stack and return to the caller.

Related

Understanding x86-64 assembly for simple program in C with a function call

I have simple C program that produces this x86-64 assembly for function func
#include <stdio.h>
#include <string.h>
void func(char *name)
{
char buf[90];
strcpy(buf, name);
printf("Welcome %s\n", buf);
}
int main(int argc, char *argv[])
{
func(argv[1]);
return 0;
}
So I think this
0x000000000000118d <+4>: push %rbp
pushes the base pointer like placed argument which is char *name
then 0x000000000000118e <+5>: mov %rsp,%rbp set stack pointer to what at base pointer I belive that above and this makes stack point points to char *name at this point
then
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
I am little unsure about this. Why is 0xffffffffffffff80 added to rsp? What is the point of this instruction. Can any one please tell.
then in next instruction 0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp)
its just setting -128 decimal to rdi. But still no buffer char buf[90] can be seen, where is my buffer? in following assmebly, can anyone please tell?
also what this line 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
Dump of assembler code for function func:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp)
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
0x00000000000011a6 <+29>: xor %eax,%eax
0x00000000000011a8 <+31>: mov -0x78(%rbp),%rdx
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy#plt>
0x00000000000011bb <+50>: lea -0x70(%rbp),%rax
0x00000000000011bf <+54>: mov %rax,%rsi
0x00000000000011c2 <+57>: lea 0xe3b(%rip),%rax # 0x2004
0x00000000000011c9 <+64>: mov %rax,%rdi
0x00000000000011cc <+67>: mov $0x0,%eax
0x00000000000011d1 <+72>: call 0x1090 <printf#plt>
0x00000000000011d6 <+77>: nop
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail#plt>
0x00000000000011eb <+98>: leave
0x00000000000011ec <+99>: ret
End of assembler dump.
also what in above assembly the use of fs register what this instruction actually doing 0x0000000000001199 <+16>: mov %fs:0x28,%rax
As already mentioned in comments, your buffer is on the stack.
In the beginning of the function the rsp is decreased to allow more space (stack grows towards lower addresses, thus rsp is decreased as stack grows). This space is generally used for local variables, arguments passed to the function, and also for other purposes (will get back to it below).
In your case, you may trace back where your buffer buf is by looking at what arguments are passed to the strcpy - the first argument is passed in rdi register, the second - in rsi.
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy#plt>
In the snippet above you can see that the pointer to buf (first argument to strcpy) was in rax prior to being put to rdi. And rax got its value from this instruction:
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
which means "load effective address (i.e. a pointer) that resides at offset -0x70 from the address rbp is pointing to". rbp points to where the stack pointer was in the beginning of the function (function frame pointer).
So it answers where the compiler has put your buffer.
Now for other questions:
then in next instruction 0x0000000000001195 <+12>:
mov %rdi,-0x78(%rbp) its just setting -128 decimal to rdi.
As we said, rdi holds the first argument to a function. Here it holds a first argument to func(), which is a pointer to name. This instruction puts this argument onto a stack at an offset of -0x78 from rbp - 8 bytes right before the space reserved for your buffer buf.
And the last two questions are related:
also what this line 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
and
also what in above assembly the use of fs register what this instruction actually doing 0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
...
...
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail#plt>
0x00000000000011eb <+98>: leave
There is some value at %fs:0x28 (which denotes an offset of 0x28 in an fs segment). And this value is being placed (via rax) to the stack. To the very first 8 bytes in the space allocated for your function. And there it stays, hopefully untouched, until the function is about to return. There, it checks whether the value on the stack was changed. If it remained unchanged, the jump (je) will take you to the leave and the function will return. If, by any chance, the value on the stack got changed - your code has caused a stack overflow (aha!) and a call to __stack_chk_fail will be triggered, which perhaps will warn you about the overflow, and perhaps dump some debug information. So the value at %fs:0x28 is a kind of a unique magic/canary value.
And one last thing - about why add $0xffffffffffffff80,%rsp was used to allocate space on the stack, and not sub - other compilers do use sub as did GCC (version 8.5.0 20210514):
sub $0x70,%rsp
It allocated less, and one of the reasons is that the compiler did not reserve space for the stack overflow check.
As to "why use an add %rsp rather than a sub %rsp instruction":
On x86_64 there are actually two versions of these add/sub immediate with rsp instructions
a 4 byte version with a 1 byte immediate
a 7 byte version with a 4 byte immediate
For both versions, the immediate will be sign-extended to 64 bits and then added to (or subtracted from) %rsp. Now because of that sign extension, a 1-byte immediate can be any value from -128 (-0x80) up to 127 (0x7f). So the instruction
add $-0x80, %rsp
can use the 4-byte encoding, while the instruction
sub $0x80, %rsp
would require the 7 byte encoding. All else being equal (as it never is), the shorter encoding is better as it occupies less memory/cache.

Difficulty understanding logic in disassembled binary bomb phase 3

I have the following assembly program from the binary-bomb lab. The goal is to determine the keyword needed to run the binary without triggering the explode_bomb function. I commented my analysis of the assembly for this program but I am having trouble piecing everything together.
I believe I have all the information I need, but I still am unable to see the actual underlying logic and thus I am stuck. I would greatly appreciate any help!
The following is the disassembled program itself:
0x08048c3c <+0>: push %edi
0x08048c3d <+1>: push %esi
0x08048c3e <+2>: sub $0x14,%esp
0x08048c41 <+5>: movl $0x804a388,(%esp)
0x08048c48 <+12>: call 0x80490ab <string_length>
0x08048c4d <+17>: add $0x1,%eax
0x08048c50 <+20>: mov %eax,(%esp)
0x08048c53 <+23>: call 0x8048800 <malloc#plt>
0x08048c58 <+28>: mov $0x804a388,%esi
0x08048c5d <+33>: mov $0x13,%ecx
0x08048c62 <+38>: mov %eax,%edi
0x08048c64 <+40>: rep movsl %ds:(%esi),%es:(%edi)
0x08048c66 <+42>: movzwl (%esi),%edx
0x08048c69 <+45>: mov %dx,(%edi)
0x08048c6c <+48>: movzbl 0x11(%eax),%edx
0x08048c70 <+52>: mov %dl,0x10(%eax)
0x08048c73 <+55>: mov %eax,0x4(%esp)
0x08048c77 <+59>: mov 0x20(%esp),%eax
0x08048c7b <+63>: mov %eax,(%esp)
0x08048c7e <+66>: call 0x80490ca <strings_not_equal>
0x08048c83 <+71>: test %eax,%eax
0x08048c85 <+73>: je 0x8048c8c <phase_3+80>
0x08048c87 <+75>: call 0x8049363 <explode_bomb>
0x08048c8c <+80>: add $0x14,%esp
0x08048c8f <+83>: pop %esi
0x08048c90 <+84>: pop %edi
0x08048c91 <+85>: ret
The following block contains my analysis
5 <phase_3>
6 0x08048c3c <+0>: push %edi // push value in edi to stack
7 0x08048c3d <+1>: push %esi // push value of esi to stack
8 0x08048c3e <+2>: sub $0x14,%esp // grow stack by 0x14 (move stack ptr -0x14 bytes)
9
10 0x08048c41 <+5>: movl $0x804a388,(%esp) // put 0x804a388 into loc esp points to
11
12 0x08048c48 <+12>: call 0x80490ab <string_length> // check string length, store in eax
13 0x08048c4d <+17>: add $0x1,%eax // increment val in eax by 0x1 (str len + 1)
14 // at this point, eax = str_len + 1 = 77 + 1 = 78
15
16 0x08048c50 <+20>: mov %eax,(%esp) // get val in eax and put in loc on stack
17 //**** at this point, 0x804a388 should have a value of 78? ****
18
19 0x08048c53 <+23>: call 0x8048800 <malloc#plt> // malloc --> base ptr in eax
20
21 0x08048c58 <+28>: mov $0x804a388,%esi // 0x804a388 in esi
22 0x08048c5d <+33>: mov $0x13,%ecx // put 0x13 in ecx (counter register)
23 0x08048c62 <+38>: mov %eax,%edi // put val in eax into edi
24 0x08048c64 <+40>: rep movsl %ds:(%esi),%es:(%edi) // repeat 0x13 (19) times
25 // **** populate malloced memory with first 19 (edit: 76) chars of string at 0x804a388 (this string is 77 characters long)? ****
26
27 0x08048c66 <+42>: movzwl (%esi),%edx // put val in loc esi points to into edx
***** // at this point, edx should contain the string at 0x804a388?
28
29 0x08048c69 <+45>: mov %dx,(%edi) // put val in dx to loc edi points to
***** // not sure what effect this has or what is in edi at this point
30 0x08048c6c <+48>: movzbl 0x11(%eax),%edx // edx = [eax + 0x11]
31 0x08048c70 <+52>: mov %dl,0x10(%eax) // [eax + 0x10] = dl
32 0x08048c73 <+55>: mov %eax,0x4(%esp) // [esp + 0x4] = eax
33 0x08048c77 <+59>: mov 0x20(%esp),%eax // eax = [esp + 0x20]
34 0x08048c7b <+63>: mov %eax,(%esp) // put val in eax into loc esp points to
***** // not sure what effect these movs have
35
36 // edi --> first arg
37 // esi --> second arg
38 // compare value in esi to edi
39 0x08048c7e <+66>: call 0x80490ca <strings_not_equal> // store result in eax
40 0x08048c83 <+71>: test %eax,%eax
41 0x08048c85 <+73>: je 0x8048c8c <phase_3+80>
42 0x08048c87 <+75>: call 0x8049363 <explode_bomb>
43 0x08048c8c <+80>: add $0x14,%esp
44 0x08048c8f <+83>: pop %esi
45 0x08048c90 <+84>: pop %edi
46 0x08048c91 <+85>: ret
Update:
Upon inspecting the registers before strings_not_equal is called, I get the following:
eax 0x804d8aa 134535338
ecx 0x0 0
edx 0x76 118
ebx 0xffffd354 -11436
esp 0xffffd280 0xffffd280
ebp 0xffffd2b8 0xffffd2b8
esi 0x804a3d4 134521812
edi 0x804f744 134543172
eip 0x8048c7b 0x8048c7b <phase_3+63>
eflags 0x282 [ SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
and I get the following disassembled pseudocode using Hopper:
I even tried using both the number found in eax and the string seen earlier as my keyword but neither of them worked.
The function makes a modified copy of a string from static storage, into a malloced buffer.
This looks weird. The malloc size is dependent on strlen+1, but the memcpy size is a compile-time constant? Your decompilation apparently shows that address was a string literal so it seems that's fine.
Probably that missed optimization happened because of a custom string_length() function that was maybe only defined in another .c (and the bomb was compiled without link-time optimization for cross-file inlining). So size_t len = string_length("some string literal"); is not a compile-time constant and the compiler emitted a call to it instead of being able to use the known constant length of the string.
But probably they used strcpy in the source and the compiler did inline that as a rep movs. Since it's apparently copying from a string literal, the length is a compile-time constant and it can optimize away that part of the work that strcpy normally has to do. Normally if you've already calculated the length it's better to use memcpy instead of making strcpy calculate it again on the fly, but in this case it actually helped the compiler make better code for that part than if they'd passed the return value of string_length to a memcpy, again because string_length couldn't inline and optimize away.
<+0>: push %edi // push value in edi to stack
<+1>: push %esi // push value of esi to stack
<+2>: sub $0x14,%esp // grow stack by 0x14 (move stack ptr -0x14 bytes)
Comments like that are redundant; the instruction itself already says that. This is saving two call-preserved registers so the function can use them internally and restore them later.
Your comment on the sub is better; yes, grow the stack is the higher level semantic meaning here. This function reserves some space for locals (and for function args to be stored with mov instead of pushed).
The rep movsd copies 0x13 * 4 bytes, incrementing ESI and EDI to point past the end of the copied region. So another movsd instruction would copy another 4 bytes contiguous with the previous copy.
The code actually copies another 2, but instead of using movsw, it uses a movzw word load and a mov store. This makes a total of 78 bytes copied.
...
# at this point EAX = malloc return value which I'll call buf
<+28>: mov $0x804a388,%esi # copy src = a string literal in .rodata?
<+33>: mov $0x13,%ecx
<+38>: mov %eax,%edi # copy dst = buf
<+40>: rep movsl %ds:(%esi),%es:(%edi) # memcpy 76 bytes and advance ESI, EDI
<+42>: movzwl (%esi),%edx
<+45>: mov %dx,(%edi) # copy another 2 bytes (not moving ESI or EDI)
# final effect: 78-byte memcpy
On some (but not all) CPUs it would have been efficient to just use rep movsb or rep movsw with appropriate counts, but that's not what the compiler chose in this case. movzx aka AT&T movz is a good way to do narrow loads without partial-register penalties. That's why compilers do it, so they can write a full register even though they're only going to read the low 8 or 16 bits of that reg with a store instruction.
After that copy of a string literal into buf, we have a byte load/store that copies a character with buf. Remember at this point EAX is still pointing at buf, the malloc return value. So it's making a modified copy of the string literal.
<+48>: movzbl 0x11(%eax),%edx
<+52>: mov %dl,0x10(%eax) # buf[16] = buf[17]
Perhaps if the source hadn't defeated constant-propagation, with high enough optimization level the compiler might have just put the final string into .rodata where you could find it, trivializing this bomb phase. :P
Then it stores pointers as stack args for string compare.
<+55>: mov %eax,0x4(%esp) # 2nd arg slot = EAX = buf
<+59>: mov 0x20(%esp),%eax # function arg = user input?
<+63>: mov %eax,(%esp) # first arg slot = our incoming stack arg
<+66>: call 0x80490ca <strings_not_equal>
How to "cheat": looking at the runtime result with GDB
Some bomb labs only let you run the bomb online, on a test server, which would record explosions. You couldn't run it under GDB, only use static disassembly (like objdump -drwC -Mintel). So the test server could record how many failed attempts you had. e.g. like CS 3330 at cs.virginia.edu that I found with google, where full credit requires less than 20 explosions.
Using GDB to examine memory / registers part way through a function makes this vastly easier than only working from static analysis, in fact trivializing this function where the single input is only checked at the very end. e.g. just look at what other arg is being passed to strings_not_equal. (Especially if you use GDB's jump or set $pc = ... commands to skip past the bomb explosion checks.)
Set a breakpoint or single-step to just before the call to strings_not_equal. Use p (char*)$eax to treat EAX as a char* and show you the (0-terminated) C string starting at that address. At that point EAX holds the address of the buffer, as you can see from the store to the stack.
Copy/paste that string result and you're done.
Other phases with multiple numeric inputs typically aren't this easy to cheese with a debugger and do require at least some math, but linked-list phases that requires you to have a sequence of numbers in the right order for list traversal also become trivial if you know how to use a debugger to set registers to make compares succeed as you get to them.
rep movsl copies 32-bit longwords from address %esi to address %edi, incrementing both by 4 each time, a number of times equal to %ecx. Think of it as memcpy(edi, esi, ecx*4).
See https://felixcloutier.com/x86/movs:movsb:movsw:movsd:movsq (it's movsd in Intel notation).
So this is copying 19*4=76 bytes.

How an assembly language works?

I am learning assembly and I have this assembly code and having much trouble understanding it can someone clarify it?
Dump of assembler code for function main:
0x080483ed <+0>: push ebp
0x080483ee <+1>: mov ebp,esp
0x080483f0 <+3>: sub esp,0x10
0x080483f3 <+6>: mov DWORD PTR [ebp-0x8],0x0
0x080483fa <+13>: mov eax,DWORD PTR [ebp-0x8]
0x080483fd <+16>: add eax,0x1
0x08048400 <+19>: mov DWORD PTR [ebp-0x4],eax
0x08048403 <+22>: leave
0x08048404 <+23>: ret
Until now, my understood knowledge is the following:
Push something (don't know what) in ebp register. then move content of esp register into ebp (I think the data of ebp should be overwritten), then subtract 10 from the esp and store it in the esp (The function will take 10 byte, This reg is never used again, so no point of doing this operation). Now assign value 0 to the address pointed by 8 bytes less than ebp.
Now store that address into register eax. Now add 1 to the value pointed by eax (the previous value is lost). Now store the eax value on [ebp-0x4], then leave to the return address of main.
Here is my C code for the above program:
int main(){
int x=0;
int y = x+1;
}
Now, can someone figure out if I am wrong at anything,and I also don't understand the mov at <+13> it adds 1 to the addrs ebp-0x8, but that is the address of int x so, x no longer contain 0. Where am I wrong?
first of all, push ebp and then mov ebp, esp are two instructions that are common at the beggining of a procedure. ESP register is an indicator for the top of the stack - so it changes constantly as the stack grows or shrinks. EBP is a helping register here. First we push content of ebp on stack. then we copy ESP (current stack top adress) to ebp - that is why when we refer to other items on the stack, we use constant value of ebp (and not changing one of esp).
sub esp, 0x10 ; means we reserve 16 bytes on the stack (0x10 is 16 in hex)
now for the real fun:
mov DWORD PTR [ebp-0x8],0x0 ; remember ebp was showing on the stack
; top BEFORE reserving 16 bytes.
; DWORD PTR means Double-word property which is 32 bits.
; so the whole instruction means
; "move 0 to the 32 bits of the stack in a place which
; starts with the adress ebp-8.
; this is our`int x = 0`
mov eax,DWORD PTR [ebp-0x8] ; send x to EAX register.
add eax,0x1` ; add 1 to the eax register
mov DWORD PTR [ebp-0x4],eax ; send the result (which is in eax) to the stack adress
; [ebp-4]
leave ; Cleanup stack (reverse the "mov ebp, esp" from above).
ret ; let's say this instruction returns to the program, (it's slightly more
; complicated than that)
Hope this helps! :)
0x080483ed <+0>: push ebp
0x080483ee <+1>: mov ebp,esp
Setting up the stack frame. Save the old base pointer and set the top of the stack as the new base pointer. This allows local variables and arguments within this function to be referenced relative to ebp (base pointer). The advantage of this is that its value is stable unlike esp which is affected by pushes and pops.
0x080483f0 <+3>: sub esp,0x10
On the x86 platform the stack 'grows' downwards. Generally speaking this means esp has a lower value (address in memory) than ebp. When ebp == esp the stack has not reserved any memory for local variables. This does not mean it is 'empty' - a common usage of a stack is [ebp+0x8] for instance. In this case the code is looking for something on the stack which was previously pushed on prior to the call (this could be arguments in the stdcall convention).
In this case the stack is extended by 16 bytes. In this case more space is reserved than necessary for alignment purposes:
0x080483f3 <+6>: mov DWORD PTR [ebp-0x8],0x0
The 4 bytes at [ebp-0x8] are initialised to the value 0. This is your x local variable.
0x080483fa <+13>: mov eax,DWORD PTR [ebp-0x8]
The 4 bytes at [ebp-0x8] are moved to a register. Arithmetic opcodes can not operate with two memory operands. Data needs to be moved to a register first before the arithmetic is performed. eax now holds the value of your x variable.
0x080483fd <+16>: add eax,0x1
The value of eax is increased so it now holds the value x + 1.
0x08048400 <+19>: mov DWORD PTR [ebp-0x4],eax
Stores the calculated value back on the stack. Notice the local variable is now [ebp-0x4] - this is your y variable.
0x08048403 <+22>: leave
Destroys the stack frame. Essentially maps to pop ebp and restores the old stack base pointer.
0x08048404 <+23>: ret
Pops the top of the stack treating the value as a return address and sets the program pointer (eip) to this value. The return address typically holds the address of the instruction directly after the call instruction that brought execution into this function.

Debugging C program (int declaration)

I'm still learning assembly and C, but now, I'm trying to understand how the compiler works. I have here a simple code:
int sub()
{
return 0xBEEF;
}
main()
{
int a=10;
sub();
}
Now I know already how the CPU works, jumping into the frames and subroutines etc.
What i don't understand is where the program "store" their local variables. In this case in the main's frame?
Here is the main frame on debugger:
0x080483f6 <+0>: push %ebp
0x080483f7 <+1>: mov %esp,%ebp
0x080483f9 <+3>: sub $0x10,%esp
=> 0x080483fc <+6>: movl $0xa,-0x4(%ebp)
0x08048403 <+13>: call 0x80483ec <sub>
0x08048408 <+18>: leave
0x08048409 <+19>: ret
I have in "int a=10;" a break point that's why the the offset 6 have that arrow.
So, the main's function starts like the others pushing the ebp bla bla bla, and then i don't understand this:
0x080483f9 <+3>: sub $0x10,%esp
=> 0x080483fc <+6>: movl $0xa,-0x4(%ebp)
why is doing sub in esp? is the variable 'a' on the stack with the offset -0x4 of the stack pointer?
just to clear the ideas here :D
Thanks in advance!
0x080483f9 <+3>: sub $0x10,%esp
You will find such an instruction in every function. Its purpose is to create a stack frame of the appropriate size so that the function can store its locals (remember that the stack grows backward!).
The stack frame is a little too big in this case. This is because gcc (starting from 2.96) pads stack frames to 16 bytes boundaries by default to account for SSEx instructions which require packed 128-bit vectors to be aligned to 16 bytes. (reference here).
=> 0x080483fc <+6>: movl $0xa,-0x4(%ebp)
This line is initializing a to the correct value (0xa = 10d). Locals are always referred with an offset relative to ebp, which marks the beginning of the stack frame (which is therefore included between ebp and esp).

Finding the Starting Address of an array

I've been working on the bufbomb lab from CSAPPS and I've gotten stuck on one of the phases.
I won't get into the gore-y details of the project since I just need a nudge in the right direction. I'm having a hard time finding the starting address of the array called "buf" in the given assembly.
We're given a function called getbuf:
#define NORMAL_BUFFER_SIZE 32
int getbuf()
{
char buf[NORMAL_BUFFER_SIZE];
Gets(buf);
return 1;
}
And the assembly dumps:
Dump of assembler code for function getbuf:
0x08048d92 <+0>: sub $0x3c,%esp
0x08048d95 <+3>: lea 0x10(%esp),%eax
0x08048d99 <+7>: mov %eax,(%esp)
0x08048d9c <+10>: call 0x8048c66 <Gets>
0x08048da1 <+15>: mov $0x1,%eax
0x08048da6 <+20>: add $0x3c,%esp
0x08048da9 <+23>: ret
End of assembler dump.
Dump of assembler code for function Gets:
0x08048c66 <+0>: push %ebp
0x08048c67 <+1>: push %edi
0x08048c68 <+2>: push %esi
0x08048c69 <+3>: push %ebx
0x08048c6a <+4>: sub $0x1c,%esp
0x08048c6d <+7>: mov 0x30(%esp),%esi
0x08048c71 <+11>: movl $0x0,0x804e100
0x08048c7b <+21>: mov %esi,%ebx
0x08048c7d <+23>: jmp 0x8048ccf <Gets+105>
0x08048c7f <+25>: mov %eax,%ebp
0x08048c81 <+27>: mov %al,(%ebx)
0x08048c83 <+29>: add $0x1,%ebx
0x08048c86 <+32>: mov 0x804e100,%eax
0x08048c8b <+37>: cmp $0x3ff,%eax
0x08048c90 <+42>: jg 0x8048ccf <Gets+105>
0x08048c92 <+44>: lea (%eax,%eax,2),%edx
0x08048c95 <+47>: mov %ebp,%ecx
0x08048c97 <+49>: sar $0x4,%cl
0x08048c9a <+52>: mov %ecx,%edi
0x08048c9c <+54>: and $0xf,%edi
0x08048c9f <+57>: movzbl 0x804a478(%edi),%edi
0x08048ca6 <+64>: mov %edi,%ecx
---Type <return> to continue, or q <return> to quit---
0x08048ca8 <+66>: mov %cl,0x804e140(%edx)
0x08048cae <+72>: mov %ebp,%ecx
0x08048cb0 <+74>: and $0xf,%ecx
0x08048cb3 <+77>: movzbl 0x804a478(%ecx),%ecx
0x08048cba <+84>: mov %cl,0x804e141(%edx)
0x08048cc0 <+90>: movb $0x20,0x804e142(%edx)
0x08048cc7 <+97>: add $0x1,%eax
0x08048cca <+100>: mov %eax,0x804e100
0x08048ccf <+105>: mov 0x804e110,%eax
0x08048cd4 <+110>: mov %eax,(%esp)
0x08048cd7 <+113>: call 0x8048820 <_IO_getc#plt>
0x08048cdc <+118>: cmp $0xffffffff,%eax
0x08048cdf <+121>: je 0x8048ce6 <Gets+128>
0x08048ce1 <+123>: cmp $0xa,%eax
0x08048ce4 <+126>: jne 0x8048c7f <Gets+25>
0x08048ce6 <+128>: movb $0x0,(%ebx)
0x08048ce9 <+131>: mov 0x804e100,%eax
0x08048cee <+136>: movb $0x0,0x804e140(%eax,%eax,2)
0x08048cf6 <+144>: mov %esi,%eax
0x08048cf8 <+146>: add $0x1c,%esp
0x08048cfb <+149>: pop %ebx
0x08048cfc <+150>: pop %esi
0x08048cfd <+151>: pop %edi
---Type <return> to continue, or q <return> to quit---
0x08048cfe <+152>: pop %ebp
0x08048cff <+153>: ret
End of assembler dump.
I'm having a difficult time locating where the starting address of buf is (or where buf is at all in this mess!). If someone could point that out to me, I'd greatly appreciate it.
Attempt at a solution
Reading symbols from /home/user/CS247/buflab/buflab-handout/bufbomb...(no debugging symbols found)...done.
(gdb) break getbuf
Breakpoint 1 at 0x8048d92
(gdb) run -u user < firecracker-exploit.bin
Starting program: /home/user/CS247/buflab/buflab-handout/bufbomb -u user < firecracker-exploit.bin
Userid: ...
Cookie: ...
Breakpoint 1, 0x08048d92 in getbuf ()
(gdb) print buf
No symbol table is loaded. Use the "file" command.
(gdb)
As has been pointed out by some other people, buf is allocated on the stack at run time. See these lines in the getbuf() function:
0x08048d92 <+0>: sub $0x3c,%esp
0x08048d95 <+3>: lea 0x10(%esp),%eax
0x08048d99 <+7>: mov %eax,(%esp)
The first line subtracts 0x3c (60) bytes from the stack pointer, effectively allocating that much space. The extra bytes beyond 32 are probably for parameters for Gets (Its hard to tell what the calling convention is for Gets is precisely, so its hard to say) The second line gets the address of the 16 bytes up. This leaves 44 bytes above it that are unallocated. The third line puts that address onto the stack for probably for the gets function call. (remember the stack grows down, so the stack pointer will be pointing at the last item on the stack). I am not sure why the compiler generated such strange offsets (60 bytes and then 44) but there is probably a good reason. If I figure it out I will update here.
Inside the gets function we have the following lines:
0x08048c66 <+0>: push %ebp
0x08048c67 <+1>: push %edi
0x08048c68 <+2>: push %esi
0x08048c69 <+3>: push %ebx
0x08048c6a <+4>: sub $0x1c,%esp
0x08048c6d <+7>: mov 0x30(%esp),%esi
Here we see that we save the state of some of the registers, which add up to 16-bytes, and then Gets reserves 28 (0x1c) bytes on the stack. The last line is key: It grabs the value at 0x30 bytes up the stack and loads it into %esi. This value is the address of buf put on the stack by getbuf. Why? 4 for the return addres plus 16 for the registers+28 reserved = 48. 0x30 = 48, so it is grabbing the last item placed on the stack by getbuf() before calling gets.
To get the address of buf you have to actually run the program in the debugger because the address will probably be different everytime you run the program, or even call the function for that matter. You can set a break point at any of these lines above and either dump the %eax register when the it contains the address to be placed on the stack on the second line of getbuf, or dump the %esi register when it is pulled off of the stack. This will be the pointer to your buffer.
to be able to see debugging info while using gdb,you must use the -g3 switch with gcc when you compile.see man gcc for more details on the -g switch.
Only then, gcc will add debugging info (symbol table) into the executable.
0x08048cd4 <+110>: mov %eax,(%esp)
0x08048cd7 <+113>: **call 0x8048820 <_IO_getc#plt>**
0x08048cdc <+118>: cmp $0xffffffff,%eax
0x0848cdf <+121>: je 0x8048ce6 <Gets+128>
0x08048ce1 <+123>: cmp $0xa,%eax
0x08048ce4 <+126>: jne 0x8048c7f <Gets+25>
0x08048ce6 <+128>: movb $0x0,(%ebx)
0x08048ce9 <+131>: mov 0x804e100,%eax
0x08048cee <+136>: movb $0x0,0x804e140(%eax,%eax,2)
0x08048cf6 <+144>: mov %esi,%eax
0x08048cf8 <+146>: add $0x1c,%esp
0x08048cfb <+149>: **pop %ebx**
0x08048cfc <+150>: **pop %esi**
0x08048cfd <+151>: **pop %edi**
---Type <return> to continue, or q <return> to quit---
0x08048cfe <+152>: **pop %ebp**
0x08048cff <+153>: ret
End of assembler dump.
I Don't know your flavour of asm but there's a call in there which may use the start address
The end of the program pops various pointers
That's where I'd start looking
If you can tweak the asm for these functions you can input your own routines to dump data as the function runs and before those pointers get popped
buf is allocated on the stack. Therefore, you will not be able to spot its address from an assembly listing. In other words, buf is allocated (and its address therefore known) only when you enter the function getbuf() at runtime.
If you must know the address, one option would be to use gbd (but make sure you compile with the -g flag to enable debugging support) and then:
gdb a.out # I'm assuming your binary is a.out
break getbuf # Set a breakpoint where you want gdb to stop
run # Run the program. Supply args if you need to
# WAIT FOR your program to reach getbuf and stop
print buf
If you want to go this route, a good gdb tutorial (example) is essential.
You could also place a printf inside getbuf and debug that way - it depends on what you are trying to do.
One other point leaps out from your code. Upon return from getbuf, the result of Gets will be trashed. This is because Gets is presumably writing its results into the stack-allocated buf. When you return from getbuf, your stack is blown and you cannot reliably access buf.

Resources