Where are the arguments in gdb and where is the ret? - c

I have the following program:
void test_function(int a,int b, int c, int d){
int flag;
char buffer[10];
flag = 31337;
buffer[0]='A';
}
int main(){
test_function(1,2,3,4);
}
I compiled it with the gcc -g option.
I set two breakpoints: one just before the test_function call inside main and one right after.
(gdb) list
1 void test_function(int a,int b, int c, int d){
2 int flag;
3 char buffer[10];
4
5 flag = 31337;
6 buffer[0]='A';
7 }
8
9 int main(){
10 test_function(1,2,3,4);
(gdb) break 10
Breakpoint 1 at 0x804843c: file stackexample.c, line 10.
(gdb) break test_function
Breakpoint 2 at 0x804840a: file stackexample.c, line 1.
(gdb) run
Starting program: /root/tests/c-tests/./stackexample
Breakpoint 1, main () at stackexample.c:10
10 test_function(1,2,3,4);
(gdb) i r esp ebp eip
esp 0xbffff4d0 0xbffff4d0
ebp 0xbffff4e8 0xbffff4e8
eip 0x804843c 0x804843c <main+9>
As far as I know, the address 0xbffff4d0 is the current bottom of the stack (top, highest address), and it will be used as the reference for creating the new stack frame after the call to test_function.
(gdb) x/5i $eip
=> 0x804843c <main+9>: mov DWORD PTR [esp+0xc],0x4
0x8048444 <main+17>: mov DWORD PTR [esp+0x8],0x3
0x804844c <main+25>: mov DWORD PTR [esp+0x4],0x2
0x8048454 <main+33>: mov DWORD PTR [esp],0x1
0x804845b <main+40>: call 0x8048404 <test_function>
Before the test_function call the arguments are stored with these mov instructions.
(gdb) info frame
Stack level 0, frame at 0xbffff4f0:
eip = 0x804843c in main (stackexample.c:10); saved eip 0xb7e8bbd6
source language c.
Arglist at 0xbffff4e8, args:
Locals at 0xbffff4e8, Previous frame's sp is 0xbffff4f0
Saved registers:
ebp at 0xbffff4e8, eip at 0xbffff4ec
(gdb) cont
Continuing.
Breakpoint 2, test_function (a=1, b=2, c=3, d=4) at stackexample.c:1
1 void test_function(int a,int b, int c, int d){
(gdb) info frame
Stack level 0, frame at 0xbffff4d0:
eip = 0x804840a in test_function (stackexample.c:1); saved eip 0x8048460
called by frame at 0xbffff4f0
source language c.
Arglist at 0xbffff4c8, args: a=1, b=2, c=3, d=4
Locals at 0xbffff4c8, Previous frame's sp is 0xbffff4d0
Saved registers:
ebp at 0xbffff4c8, eip at 0xbffff4cc
(gdb) i r esp ebp eip
esp 0xbffff4a0 0xbffff4a0
ebp 0xbffff4c8 0xbffff4c8
eip 0x804840a 0x804840a <test_function+6>
So here it's obvious that the first frame's esp became the starting address of this frame. What I don't get is which stack frame the arguments are in, because...
(gdb) info locals
flag = 134513420
buffer = "\377\267\364\237\004\b\350\364\377\277"
Here we cannot see the args. If we ..
(gdb) info args
a = 1
b = 2
c = 3
d = 4
(gdb) print &a
$4 = (int *) 0xbffff4d0
(gdb) print &b
$5 = (int *) 0xbffff4d4
(gdb) print &c
$6 = (int *) 0xbffff4d8
(gdb) print &d
$7 = (int *) 0xbffff4dc
So here we see that the arguments start at the first address of the current stack frame, which is 0xbffff4d0.
The other question is the following, according to this output:
(gdb) x/16xw $esp
0xbffff4a0: 0xb7fc9ff4 0x08049ff4 0xbffff4b8 0x0804830c
0xbffff4b0: 0xb7ff1080 0x08049ff4 0xbffff4e8 0x08048499
0xbffff4c0: 0xb7fca324 0xb7fc9ff4 0xbffff4e8 0x08048460
0xbffff4d0: 0x00000001 0x00000002 0x00000003 0x00000004
Address 0x08048460 is the saved eip shown by info frame (eip = 0x804840a in test_function (stackexample.c:1); saved eip 0x8048460), and it also appears in the backtrace output as #1 0x08048460 in main () at stackexample.c:10.
How come the RET to main is on top of (at a lower address than) the arguments? Shouldn't the return address be at the start of the new stack frame? Sorry, but I am trying to understand how the stack works and I am kind of confused :S Another thing that I don't understand is that local variables are referenced through $esp+(offset). Does the value of esp always depend on the "current" stack frame that execution is in?

Your disassembled program looks like this on my system:
gcc -m32 -c -o stackexample.o stackexample.c
objdump -d -M intel stackexample.o
test_function:
push ebp
mov ebp,esp
sub esp,0x10
mov DWORD PTR [ebp-0x4],0x7a69
mov BYTE PTR [ebp-0xe],0x41
leave
ret
main:
push ebp
mov ebp,esp
sub esp,0x10
mov DWORD PTR [esp+0xc],0x4
mov DWORD PTR [esp+0x8],0x3
mov DWORD PTR [esp+0x4],0x2
mov DWORD PTR [esp],0x1
call test_function
leave
ret
Let's start from the beginning.
The stack grows downward in memory, from higher addresses toward lower ones, so the top of the stack has the lowest address.
esp is the Stack Pointer. It always points to the top of the stack.
ebp is the Base Pointer. It points to the bottom of current stack frame. It is used for referencing current function's arguments and local variables.
These instructions
push ebp
mov ebp,esp
appear at the top of every function that is compiled with a frame pointer, as in this unoptimized build. They do the following:
save the caller's Base Pointer
set up the current function's Base Pointer by copying the Stack Pointer into it. At this point the Stack Pointer points to the bottom of the current stack frame, so after the copy the Base Pointer marks that bottom. The Stack Pointer can increase or decrease during the function's execution, so the Base Pointer is used to reference the variables. The Base Pointer also preserves the Stack Pointer value that leave later restores.
leave is equivalent to
mov esp, ebp
pop ebp
which is the exact opposite of the instructions above:
restore caller's Stack Pointer
restore caller's Base Pointer
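To make the bookkeeping concrete, here is a small C sketch that models the prologue and leave on a toy machine. The Machine struct, the helper names, and the addresses used with it are made up for illustration; it is only the same arithmetic written out, not how the CPU or gdb is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit x86 stack (hypothetical, for illustration only). */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] models the 4-byte word at address 4*i */
    uint32_t esp;              /* byte address of the top of the stack */
    uint32_t ebp;              /* byte address of the current frame base */
} Machine;

static void push(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;                 /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = value;
}

static uint32_t pop(Machine *m) {
    assert(m->esp / 4 < STACK_WORDS);
    uint32_t value = m->mem[m->esp / 4];
    m->esp += 4;
    return value;
}

/* push ebp; mov ebp, esp; sub esp, locals */
void prologue(Machine *m, uint32_t locals) {
    push(m, m->ebp);   /* save the caller's Base Pointer */
    m->ebp = m->esp;   /* the new frame base is the current top of the stack */
    m->esp -= locals;  /* reserve room for local variables */
}

/* leave: mov esp, ebp; pop ebp */
void leave(Machine *m) {
    m->esp = m->ebp;   /* drop the locals */
    m->ebp = pop(m);   /* restore the caller's Base Pointer */
}
```

Starting from esp = 0x80 and ebp = 0xf0, prologue(&m, 0x10) leaves ebp = 0x7c and esp = 0x6c, and leave(&m) brings both registers back to their starting values.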
Now to answer your questions
which stack frame are the arguments in?
Arguments are stored in the caller's stack frame. However, you can use the Base Pointer to access them. info locals does not display function arguments; that is how gdb specifies the command:
http://visualgdb.com/gdbreference/commands/info_locals
How come the RET to main is on top of (at a lower address than) the arguments? Shouldn't the return address be at the start of the new stack frame?
That's because the arguments are stored in the caller's frame. When test_function is called, the stack already holds the arguments, so the return address is stored higher on the stack (that is, at a lower address) than the arguments.
local variables are referenced through $esp+(offset).
As far as I know, local variables can be referenced through either the Base Pointer or the Stack Pointer, whichever is more convenient for your compiler (not really sure).
Does the value of esp always depend on the "current" stack frame that execution is in?
Yes. Stack Pointer is THE most important stack register. It points to the top of the stack.
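As a rough sketch of the layout described above, the following C helpers compute where things sit relative to ebp in a 32-bit frame that uses the push ebp / mov ebp, esp prologue. The function names are mine, and the sketch assumes every argument occupies one 4-byte slot, as the four int arguments here do.

```c
#include <stdint.h>

enum { WORD = 4 }; /* size of one stack slot on 32-bit x86 */

/* Slot holding the return address pushed by call: [ebp+4].
   [ebp] itself holds the caller's saved ebp. */
uint32_t return_address_slot(uint32_t ebp) {
    return ebp + WORD;
}

/* Slot holding argument number `index` (0-based): [ebp+8+4*index].
   Assumes each argument takes exactly one 4-byte slot. */
uint32_t argument_slot(uint32_t ebp, unsigned index) {
    return ebp + 2 * WORD + index * WORD;
}
```

With the ebp from the question (0xbffff4c8), return_address_slot gives 0xbffff4cc, and argument_slot gives 0xbffff4d0 for a and 0xbffff4dc for d, which matches what info frame and print &a ... print &d reported.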

Related

How is the stack unwound and stack frames identified

Let's say I have this simple program in C.
int my_func(int a, int b, int c) //0x4000
{
int d = 0;
int e = 0;
return e+d;
}
int main()
{
my_func(1,2,3); // 0x5000
return 0;
}
Ignore the fact that it is essentially all dead code that can be completely optimized away. We'll say that my_func() lives at address 0x4000 and that it is being called at address 0x5000.
From my understanding, a c compiler (I understand they can operate differently by vendor) may:
push c to the stack
push b to the stack
push a to the stack
push 0x5000 to the stack (return address)
call 0x4000
Then I'm assuming to access a it uses sp (stack pointer) + 1. b is sp+2 and c is sp+3.
Since d and e are on the stack, I'm guessing our stack would now look like this?
c
b
a
0x5000
d
e
When we get to the end of the function.
Does it then pop e and d off the stack?
Then... push e+d? Or save it to a register to be used after return?
Return to 0x5000 because it's the top of the stack?
Then pop the return address (0x5000) and a, b and c?
I'm guessing this is why old c required all the variables to be declared at the top of a function so that the compiler could count the number of pops it needed to perform at the end of the function?
I understand that it could have stored 0x5000 in a register, but a C program is able to go multiple levels deep into many functions and there are only so many registers...
Thanks!
In the default calling convention for C, the caller frees the function arguments after the function returns, but the function itself manages its own variables on the stack. For example, here is your code in assembly without any optimization:
my_func:
push ebp // +
mov ebp, esp // These 2 lines prepare function stack
sub esp, 16 // reserve memory for local variables
mov DWORD PTR [ebp-4], 0
mov DWORD PTR [ebp-8], 0
mov edx, DWORD PTR [ebp-8]
mov eax, DWORD PTR [ebp-4]
add eax, edx // <--return value in eax
leave // return esp to what it was at start of function
ret // return to caller
main:
push ebp
mov ebp, esp
push 3
push 2
push 1
call my_func
add esp, 12 // <- return esp to what it was before pushing arguments
mov eax, 0
leave
ret
As you see, there is an add esp, 12 in main that returns esp to what it was before the arguments were pushed. In my_func there is a pair like this:
push ebp
mov ebp, esp
sub esp, 16 // <--- size of stack
...
leave
ret
This pair is used for reserving some memory on the stack. leave reverses the effect of push ebp / mov ebp, esp and also releases the space reserved by sub esp, 16. The function uses ebp for accessing its arguments and stack-allocated variables. The return value is in eax (for integer and pointer results).
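The way esp moves during one cdecl call can be written out as plain arithmetic. This is only a sketch under the assumption that every argument is 4 bytes wide; the function names and the starting esp value in the usage note are made up.

```c
#include <stdint.h>

/* Bytes the caller removes after a cdecl call with `nargs` 4-byte arguments
   (the operand of `add esp, N` in main). */
uint32_t cdecl_cleanup_bytes(unsigned nargs) {
    return nargs * 4u;
}

/* esp as the callee sees it on entry, before its own prologue runs. */
uint32_t esp_at_callee_entry(uint32_t esp, unsigned nargs) {
    esp -= nargs * 4u; /* caller: one push per argument */
    esp -= 4u;         /* call: push the return address */
    return esp;
}

/* esp after the whole call sequence, including the caller's cleanup. */
uint32_t esp_after_cdecl_call(uint32_t esp, unsigned nargs) {
    esp = esp_at_callee_entry(esp, nargs);
    /* callee: the prologue and leave cancel out, so esp is unchanged here */
    esp += 4u;                         /* ret: pop the return address */
    esp += cdecl_cleanup_bytes(nargs); /* caller: add esp, 4*nargs */
    return esp;
}
```

For my_func(1, 2, 3), cdecl_cleanup_bytes(3) is 12, the same 12 that appears in add esp, 12, and esp_after_cdecl_call returns whatever esp it was given, which is the point of the cleanup.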
A quick allocated stack size note:
As you see, the function contains a sub esp, 16 instruction even though it keeps only 2 variables of type int on the stack, for a total of 8 bytes. This is because the stack size is aligned to specific boundaries (at least with default compile options). If you add 2 more int variables to my_func, this instruction is still sub esp, 16, because the total still fits the 16-byte alignment. But if you add a 3rd int variable, this instruction becomes sub esp, 32. This alignment can be configured with the -mpreferred-stack-boundary option in GCC.
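The rounding described here can be sketched in a few lines of C. This only reproduces the numbers quoted above, assuming 4-byte ints and a 16-byte boundary; the real frame size GCC picks can also depend on padding, spilled registers, and outgoing arguments, so treat the helper names and the rule as an approximation.

```c
#include <assert.h>
#include <stddef.h>

/* Round `bytes` up to the next multiple of `boundary` (a power of two). */
size_t round_up(size_t bytes, size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (bytes + boundary - 1) & ~(boundary - 1);
}

/* Approximate `sub esp, N` operand for `count` local ints,
   assuming 4-byte ints and 16-byte stack alignment. */
size_t frame_size_for_ints(size_t count) {
    return round_up(count * 4, 16);
}
```

Two ints (8 bytes) and four ints (16 bytes) both round to 16, while five ints (20 bytes) round to 32.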
By the way, all of this applies to 32-bit compilation. In contrast, in 64-bit code you normally do not pass arguments by pushing them on the stack; you pass them in registers. As mentioned in a comment, in the Microsoft x64 calling convention arguments are passed on the stack only starting with the 5th argument; the System V ABI used on Linux passes the first six integer arguments in registers.
Update:
By default calling convention I mean cdecl, which is normally used when you compile your code for x86 without any compiler options or specific function attributes. If you change the function's calling convention to stdcall, for example, all of this will change.

What exactly happens in this minimalistic C code on assembly level?

I am currently trying to understand Writing buffer overflow exploits - a tutorial for beginners.
The C code, compiled with cc -ggdb exploitable.c -o exploitable
#include <stdio.h>
void exploitableFunction (void) {
char small[30];
gets (small);
printf("%s\n", small);
}
main() {
exploitableFunction();
return 0;
}
seems to have the assembly code
0x000000000040063b <+0>: push %rbp
0x000000000040063c <+1>: mov %rsp,%rbp
0x000000000040063f <+4>: callq 0x4005f6 <exploitableFunction>
0x0000000000400644 <+9>: mov $0x0,%eax
0x0000000000400649 <+14>: pop %rbp
0x000000000040064a <+15>: retq
I think it does the following, but I'm really not sure about it and I would like to hear from somebody who is experienced with assembly code if I'm right / what is right.
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
400644: Copy the value from the address $0x0 to the EAX register
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
You want to save the base pointer because it is probably used by the calling function.
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
This gives you a fixed position into the stack, which might contain parameters for the function. It can also be used as a base address for any local variables.
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
"call" means pushing the return address (address of the next instruction) onto the stack, and then jumping to the start of the called function.
400644: Copy the value from the address $0x0 to the EAX register
It is actually the value 0 from the return statement.
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
This restores the base pointer we saved at the top. The calling function might assume that we do.
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
It was the constant from return 0. Using EAX for a small return value is a common convention.
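What call and retq do to the stack and the instruction pointer can be modelled in a few lines. This is a toy sketch: the Cpu struct and function names are invented, and the call instruction's length (5 bytes) is read off the listing as 0x400644 - 0x40063f.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Hypothetical toy CPU: just a small stack and an instruction pointer. */
typedef struct {
    uint64_t stack[STACK_SLOTS]; /* each entry models one 8-byte stack slot */
    int top;                     /* index of the top slot; STACK_SLOTS means empty */
    uint64_t rip;                /* address of the next instruction to execute */
} Cpu;

/* call: push the address of the following instruction, then jump. */
void do_call(Cpu *cpu, uint64_t call_addr, uint64_t call_len, uint64_t target) {
    assert(cpu->top > 0);
    cpu->stack[--cpu->top] = call_addr + call_len; /* return address */
    cpu->rip = target;
}

/* ret: pop the return address into the instruction pointer. */
void do_ret(Cpu *cpu) {
    assert(cpu->top < STACK_SLOTS);
    cpu->rip = cpu->stack[cpu->top++];
}
```

Calling do_call(&cpu, 0x40063f, 5, 0x4005f6) pushes 0x400644 and sets rip to 0x4005f6; a later do_ret(&cpu) sets rip back to 0x400644, the mov $0x0,%eax that follows the call in main.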
I found a link which has similar code to yours, with a full explanation.
40063b: push the old base pointer onto the stack to save it for later. It is pushed because main is not the only function in the program; some other function called it.
40063c: copy the value of the stack pointer to the base pointer. After this, %rbp points to the base of main’s stack frame.
40063f: call the function at address 0x4005f6. This pushes the program counter onto the stack and loads address 0x4005f6 into the program counter. When the function returns, a pop operation puts the saved address from the stack back into the program counter, which is 0x400644 here.
400644: This instruction copies 0 into %eax, The x86 calling convention dictates that a function’s return value is stored in %eax
400649: We pop the old base pointer off the stack and store it back in %rbp
40064a: jumps back to the return address, which is also stored in the stack frame. This marks the end of the program.
Also, you didn't include the assembly code for exploitableFunction; this is only the main function.
The function entry saves ebp and moves esp into ebp. All parameters of the function will now be addressed using ebp. This is the standard cdecl convention (in Intel syntax):
; int example(char *s, int i)
push ebp ; save the caller's value of ebp
mov ebp, esp ; set up our base pointer to the stack frame
sub esp, 16 ; room for automatic variables
mov eax, dword ptr [ebp+8] ; eax has s
mov edx, dword ptr [ebp+12] ; edx has i
... ; do your thing
mov eax, dword ptr [result] ; function return value in eax
mov esp, ebp ; release the automatic variables
pop ebp ; restore caller's base pointer
ret
When calling this function, the compiler pushes the parameters onto the stack (last parameter first) and then calls the function. Upon return, it cleans up the stack:
; i = example(myString, k);
mov eax, [ebp+16] ; this gets a parameter of the current function
push eax ; this will be parameter i
mov eax, [ebp-16] ; this gets a local variable
push eax ; this is parameter s
call example
add esp, 8 ; remove the pushed parameters from the stack
mov dword ptr [i], eax ; save return value, which is in eax
Different compilers can use different conventions about passing parameters in registers, but I think the above is the basics of calls in C (using cdecl).

Why is there no "sub rsp" instruction in this function prologue and why are function parameters stored at negative rbp offsets?

That's what I understood by reading some memory segmentation documents: when a function is called, there are a few instructions (called function prologue) that save the frame pointer on the stack, copy the value of the stack pointer into the base pointer and save some memory for local variables.
Here's a trivial code I am trying to debug using GDB:
void test_function(int a, int b, int c, int d) {
int flag;
char buffer[10];
flag = 31337;
buffer[0] = 'A';
}
int main() {
test_function(1, 2, 3, 4);
}
The purpose of debugging this code was to understand what happens in the stack when a function is called: so I had to examine the memory at various step of the execution of the program (before calling the function and during its execution). Although I managed to see things like the return address and the saved frame pointer by examining the base pointer, I really can't understand what I'm going to write after the disassembled code.
Disassembling:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400509 <+0>: push rbp
0x000000000040050a <+1>: mov rbp,rsp
0x000000000040050d <+4>: mov ecx,0x4
0x0000000000400512 <+9>: mov edx,0x3
0x0000000000400517 <+14>: mov esi,0x2
0x000000000040051c <+19>: mov edi,0x1
0x0000000000400521 <+24>: call 0x4004ec <test_function>
0x0000000000400526 <+29>: pop rbp
0x0000000000400527 <+30>: ret
End of assembler dump.
(gdb) disassemble test_function
Dump of assembler code for function test_function:
0x00000000004004ec <+0>: push rbp
0x00000000004004ed <+1>: mov rbp,rsp
0x00000000004004f0 <+4>: mov DWORD PTR [rbp-0x14],edi
0x00000000004004f3 <+7>: mov DWORD PTR [rbp-0x18],esi
0x00000000004004f6 <+10>: mov DWORD PTR [rbp-0x1c],edx
0x00000000004004f9 <+13>: mov DWORD PTR [rbp-0x20],ecx
0x00000000004004fc <+16>: mov DWORD PTR [rbp-0x4],0x7a69
0x0000000000400503 <+23>: mov BYTE PTR [rbp-0x10],0x41
0x0000000000400507 <+27>: pop rbp
0x0000000000400508 <+28>: ret
End of assembler dump.
I understand that "saving the frame pointer on the stack" is done by " push rbp", "copying the value of the stack pointer into the base pointer" is done by "mov rbp, rsp" but what is getting me confused is the lack of a "sub rsp $n_bytes" for "saving some memory for local variables". I've seen that in a lot of exhibits (even in some topics here on stackoverflow).
I also read that arguments should have a positive offset from the base pointer (after it's filled with the stack pointer value), since if they are located in the caller function and the stack grows toward lower addresses it makes perfect sense that when the base pointer is updated with the stack pointer value the compiler goes back in the stack by adding some positive numbers. But my code seems to store them in a negative offset, just like local variables.. I also can't understand why they are put in those registers (in the main).. shouldn't they be saved directly in the rsp "offsetted"?
Maybe these differences are due to the fact that I'm using a 64-bit system, but my research didn't lead me to anything that would explain what I am seeing.
The System V ABI for x86-64 specifies a red zone of 128 bytes below %rsp. These 128 bytes belong to the function as long as it doesn't call any other function (it is a leaf function).
Signal handlers (and functions called by a debugger) need to respect the red zone, since they are effectively involuntary function calls. All of the local variables of your test_function, which is a leaf function, fit into the red zone, thus no adjustment of %rsp is needed. (Also, the function has no visible side-effects and would be optimized out on any reasonable optimization setting).
You can compile with -mno-red-zone to stop the compiler from using space below the stack pointer. Kernel code has to do this because hardware interrupts don't implement a red-zone.
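The red-zone rule can be stated as a tiny predicate. This is a deliberately simplified sketch: the function name is made up, and a real compiler weighs more than these two inputs when it lays out a frame.

```c
#include <stdbool.h>
#include <stddef.h>

enum { RED_ZONE_BYTES = 128 }; /* red zone size in the System V x86-64 ABI */

/* Simplified rule: a leaf function can leave rsp alone when everything it
   stores below rsp fits inside the red zone. */
bool can_skip_sub_rsp(size_t deepest_offset_below_rsp, bool is_leaf) {
    return is_leaf && deepest_offset_below_rsp <= RED_ZONE_BYTES;
}
```

In test_function, rbp equals rsp after the prologue and the deepest store is at [rbp-0x20], so can_skip_sub_rsp(0x20, true) holds. The same function with a call inside it, or with more than 128 bytes of locals, would not qualify under this rule.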
But my code seems to store them in a negative offset, just like local variables
The first x86_64 arguments are passed in registers, not on the stack. So when rbp is set to rsp, they are not on the stack, and cannot be at a positive offset.
They are stored to the stack only to:
save register state for a second function call.
In this case, this is not required since it is a leaf function.
make register allocation easier.
But an optimized allocator could do a better job without memory spill here.
The situation would be different if you had:
x86_64 function with lots of arguments. Those that don't fit in registers go on the stack.
IA-32, where every argument goes on the stack.
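For integer and pointer arguments, the register order in the System V x86-64 convention (the one used on Linux) is rdi, rsi, rdx, rcx, r8, r9, and anything beyond that goes on the stack. A minimal lookup sketch follows; the enum and function names are mine, and floating-point or struct arguments follow other rules that are not modelled.

```c
/* Location of an integer or pointer argument under System V x86-64. */
typedef enum {
    ARG_RDI, ARG_RSI, ARG_RDX, ARG_RCX, ARG_R8, ARG_R9, ARG_STACK
} ArgLocation;

/* Where the index-th (0-based) integer argument is passed. */
ArgLocation sysv_int_arg_location(unsigned index) {
    return index < 6 ? (ArgLocation)index : ARG_STACK;
}

/* Printable name for a location, e.g. for logging. */
const char *arg_location_name(ArgLocation loc) {
    static const char *const names[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9", "stack"
    };
    return names[loc];
}
```

For test_function(1, 2, 3, 4) this gives rdi, rsi, rdx, rcx, which lines up with the mov edi / esi / edx / ecx instructions in main (the 32-bit halves of those registers, since the arguments are ints).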
the lack of a "sub rsp $n_bytes" for "saving some memory for local variables".
The part of the question about the missing sub rsp (the red zone of a leaf function) has already been asked at: Why does the x86-64 GCC function prologue allocate less stack than the local variables?

How do I calculate the address of the stored EIP

As the title says, I am trying to obtain the address of the stored EIP in the frame.
For this simple program:
func1(int a, int b)
{
int x = 1;
}
int main(void)
{
func1(1,2);
}
My gdb disassembly is:
(gdb) disassemble main
Dump of assembler code for function main:
0x08048430 <main+0>: push %ebp
0x08048431 <main+1>: mov %esp,%ebp
0x08048433 <main+3>: sub $0x8,%esp
0x08048436 <main+6>: add $0xfffffff8,%esp
0x08048439 <main+9>: push $0x2
0x0804843b <main+11>: push $0x1
0x0804843d <main+13>: call 0x8048410 <func1>
0x08048442 <main+18>: add $0x10,%esp
0x08048445 <main+21>: mov %ebp,%esp
0x08048447 <main+23>: pop %ebp
0x08048448 <main+24>: ret
End of assembler dump.
Stack frame printed from GDB:
(gdb) info frame
Stack level 0, frame at 0xffbfdda0:
eip = 0x8048416 in func1 (t.c:3); saved eip 0x8048442
called by frame at 0xffbfddc0
source language c.
Arglist at 0xffbfdd98, args: a=1, b=2
Locals at 0xffbfdd98, Previous frame's sp is 0xffbfdda0
Saved registers:
ebp at 0xffbfdd98, eip at 0xffbfdd9c
info frame doesn't provide the address of the saved eip, it just shows the value of the save eip.
I set a breakpoint on func1, then printed the frame information. The saved EIP has a value of 0x8048442, which corresponds to <main+18> in the disassembly. I am confused: how do I calculate the address where the saved EIP (0x8048442) is located?
I have examined the address 0x8048412 (0x8048416 - 4), but it doesn't contain the saved EIP address.
You need to examine the area before the arg list. info frame tells you that: eip at 0xffbfdd9c.
This address is 4 bytes before the arg list at 0xffbfdd98. Remember that the stack grows down, so "4 bytes before x" means "x+4".
The saved eip 0x8048442 value is where the saved EIP points, which is in the text section, not on the stack.
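The address arithmetic can be written down as a sketch in C. The helper names are invented; the only inputs are the "frame at" and "ebp at" values that info frame prints, and the sketch assumes a 32-bit frame with the usual push ebp prologue.

```c
#include <stdint.h>

/* gdb's "frame at" value is esp as it was just before call pushed the
   return address, so the saved eip sits one 4-byte slot below it. */
uint32_t saved_eip_slot_from_frame(uint32_t frame_at) {
    return frame_at - 4u;
}

/* Equivalently, the saved eip sits one slot above the saved ebp. */
uint32_t saved_eip_slot_from_ebp(uint32_t saved_ebp_slot) {
    return saved_ebp_slot + 4u;
}

/* Decode the 4 bytes stored in that slot (x86 is little-endian). */
uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Both routes land on 0xffbfdd9c for this frame (0xffbfdda0 - 4 and 0xffbfdd98 + 4), and the bytes 42 84 04 08 stored there decode to 0x08048442.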

How to write a buffer-overflow exploit in GCC,windows XP,x86?

void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = buffer1 + 12;
(*ret) += 8;//why is it 8??
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
The above demo is from here:
http://insecure.org/stf/smashstack.html
But it's not working here:
D:\test>gcc -Wall -Wextra hw.cpp && a.exe
hw.cpp: In function `void function(int, int, int)':
hw.cpp:6: warning: unused variable 'buffer2'
hw.cpp: At global scope:
hw.cpp:4: warning: unused parameter 'a'
hw.cpp:4: warning: unused parameter 'b'
hw.cpp:4: warning: unused parameter 'c'
1
And I don't understand why it's 8 though the author thinks:
A little math tells us the distance is
8 bytes.
My gdb dump as called:
Dump of assembler code for function main:
0x004012ee <main+0>: push %ebp
0x004012ef <main+1>: mov %esp,%ebp
0x004012f1 <main+3>: sub $0x18,%esp
0x004012f4 <main+6>: and $0xfffffff0,%esp
0x004012f7 <main+9>: mov $0x0,%eax
0x004012fc <main+14>: add $0xf,%eax
0x004012ff <main+17>: add $0xf,%eax
0x00401302 <main+20>: shr $0x4,%eax
0x00401305 <main+23>: shl $0x4,%eax
0x00401308 <main+26>: mov %eax,0xfffffff8(%ebp)
0x0040130b <main+29>: mov 0xfffffff8(%ebp),%eax
0x0040130e <main+32>: call 0x401b00 <_alloca>
0x00401313 <main+37>: call 0x4017b0 <__main>
0x00401318 <main+42>: movl $0x0,0xfffffffc(%ebp)
0x0040131f <main+49>: movl $0x3,0x8(%esp)
0x00401327 <main+57>: movl $0x2,0x4(%esp)
0x0040132f <main+65>: movl $0x1,(%esp)
0x00401336 <main+72>: call 0x4012d0 <function>
0x0040133b <main+77>: movl $0x1,0xfffffffc(%ebp)
0x00401342 <main+84>: mov 0xfffffffc(%ebp),%eax
0x00401345 <main+87>: mov %eax,0x4(%esp)
0x00401349 <main+91>: movl $0x403000,(%esp)
0x00401350 <main+98>: call 0x401b60 <printf>
0x00401355 <main+103>: leave
0x00401356 <main+104>: ret
0x00401357 <main+105>: nop
0x00401358 <main+106>: add %al,(%eax)
0x0040135a <main+108>: add %al,(%eax)
0x0040135c <main+110>: add %al,(%eax)
0x0040135e <main+112>: add %al,(%eax)
End of assembler dump.
Dump of assembler code for function function:
0x004012d0 <function+0>: push %ebp
0x004012d1 <function+1>: mov %esp,%ebp
0x004012d3 <function+3>: sub $0x38,%esp
0x004012d6 <function+6>: lea 0xffffffe8(%ebp),%eax
0x004012d9 <function+9>: add $0xc,%eax
0x004012dc <function+12>: mov %eax,0xffffffd4(%ebp)
0x004012df <function+15>: mov 0xffffffd4(%ebp),%edx
0x004012e2 <function+18>: mov 0xffffffd4(%ebp),%eax
0x004012e5 <function+21>: movzbl (%eax),%eax
0x004012e8 <function+24>: add $0x5,%al
0x004012ea <function+26>: mov %al,(%edx)
0x004012ec <function+28>: leave
0x004012ed <function+29>: ret
In my case the distance should be - = 5, right? But it doesn't seem to work.
Why function needs 56 bytes for local variables?( sub $0x38,%esp )
As joveha pointed out, the value of EIP saved on the stack (return address) by the call instruction needs to be incremented by 7 bytes (0x00401342 - 0x0040133b = 7) in order to skip the x = 1; instruction (movl $0x1,0xfffffffc(%ebp)).
You are correct that 56 bytes are being reserved for local variables (sub $0x38,%esp), so the missing piece is how many bytes past buffer1 on the stack is the saved EIP.
A bit of test code and inline assembly tells me that the magic value is 28 for my test. I cannot provide a definitive answer as to why it is 28, but I would assume the compiler is adding padding and/or stack canaries.
The following code was compiled using GCC 3.4.5 (MinGW) and tested on Windows XP SP3 (x86).
unsigned long get_ebp() {
__asm__("pop %ebp\n\t"
"movl %ebp,%eax\n\t"
"push %ebp\n\t");
}
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
/* distance in bytes from buffer1 to return address on the stack */
printf("test %d\n", ((get_ebp() + 4) - (unsigned long)&buffer1));
ret = (int *)(buffer1 + 28);
(*ret) += 7;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
I could have just as easily used gdb to determine this value.
(compiled w/ -g to include debug symbols)
(gdb) break function
...
(gdb) run
...
(gdb) p $ebp
$1 = (void *) 0x22ff28
(gdb) p &buffer1
$2 = (char (*)[5]) 0x22ff10
(gdb) quit
(0x22ff28 + 4) - 0x22ff10 = 28
(ebp value + size of word) - address of buffer1 = number of bytes
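Here is the same calculation as a sketch in C, together with a simulation of the overwrite on a plain byte array instead of a live stack, so nothing undefined happens. The frame array, its size, and the function names are made up for the illustration; the numbers come from the gdb session above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_BYTES = 32 }; /* size of the simulated frame, chosen arbitrarily */

/* (ebp + size of word) - address of buffer1 */
uint32_t distance_to_return_address(uint32_t ebp, uint32_t buffer_addr) {
    return (ebp + 4u) - buffer_addr;
}

/* Add `skip` to the 32-bit return address stored `offset` bytes into a
   simulated frame, and return the new value. */
uint32_t bump_return_address(uint8_t frame[FRAME_BYTES], uint32_t offset, uint32_t skip) {
    uint32_t ret;
    assert(offset + sizeof ret <= FRAME_BYTES);
    memcpy(&ret, frame + offset, sizeof ret);
    ret += skip;
    memcpy(frame + offset, &ret, sizeof ret);
    return ret;
}
```

With ebp = 0x22ff28 and buffer1 at 0x22ff10 the distance is 28. If the simulated frame holds 0x0040133b at offset 28, bumping it by 7 yields 0x00401342, the instruction after x = 1.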
In addition to Smashing The Stack For Fun And Profit, I would suggest reading some of the articles I mentioned in my answer to a previous question of yours and/or other material on the subject. Having a good understanding of exactly how this type of exploit works should help you write more secure code.
It's hard to predict what buffer1 + 12 really points to. Your compiler can put buffer1 and buffer2 in any location on the stack it desires, even going as far as to not save space for buffer2 at all. The only way to really know where buffer1 goes is to look at the assembler output of your compiler, and there's a good chance it would jump around with different optimization settings or different versions of the same compiler.
I have not tested the code on my own machine yet, but have you taken memory alignment into consideration?
Try disassembling the code with gcc. I think the assembly code may give you a better understanding of the code. :-)
This code prints out 1 as well on OpenBSD and FreeBSD, and gives a segmentation fault on Linux.
This kind of exploit is heavily dependent on both the instruction set of the particular machine, and the calling conventions of the compiler and operating system. Everything about the layout of the stack is defined by the implementation, not the C language. The article assumes Linux on x86, but it looks like you're using Windows, and your system could be 64-bit, although you can switch gcc to 32-bit with -m32.
The parameters you'll have to tweak are 12, which is the offset from the tip of the stack to the return address, and 8, which is how many bytes of main you want to jump over. As the article says, you can use gdb to inspect the disassembly of the function to see (a) how far the stack gets pushed when you call function, and (b) the byte offsets of the instructions in main.
The +8 bytes part is how much he wants the saved EIP to be incremented by. The EIP was saved so the program could return to the last assignment after the function is done; now he wants to skip over it by adding 8 bytes to the saved EIP.
So all he is trying to do is "skip" the
x = 1;
In your case the saved EIP will point to 0x0040133b, the first instruction after function returns. To skip the assignment you need to make the saved EIP point to 0x00401342. That's 7 bytes.
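If you copy the instruction start addresses out of the disassembly, finding the skip distance is a table lookup. A sketch follows, with a hypothetical helper name; the table in the usage note is just four consecutive addresses from your dump of main.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the start addresses of consecutive instructions, return how many
   bytes must be added to `ret_addr` to land on the instruction after it.
   Returns 0 if ret_addr is not in the table or is its last entry. */
uint32_t bytes_to_next_instruction(const uint32_t *starts, size_t count, uint32_t ret_addr) {
    for (size_t i = 0; i + 1 < count; i++) {
        if (starts[i] == ret_addr) {
            return starts[i + 1] - starts[i];
        }
    }
    return 0;
}
```

With the table { 0x00401336, 0x0040133b, 0x00401342, 0x00401345 } and the return address 0x0040133b, the result is 7.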
It's really a "mess with RET EIP" rather than a buffer overflow example.
And as far as the 56 bytes for local variables goes, that could be anything your compiler comes up with like padding, stack canaries, etc.
Edit:
This shows how difficult it is to make buffer overflow examples in C. The offset of 12 from buffer1 assumes a certain padding style and compile options. GCC will happily insert stack canaries nowadays (which become a local variable that "protects" the saved EIP) unless you tell it not to. Also, the new address he wants to jump to (the start instruction for the printf call) really has to be resolved manually from the assembly. In his case, on his machine, with his OS, with his compiler, on that day.... it was 8.
You're compiling a C program with the C++ compiler. Rename hw.cpp to hw.c and you'll find it will compile.

Resources