I'm reading a book that explains how the ebp and eip registers work when a function is called. The following figure is provided:
Here, array is a local variable of the function, and the function arguments are a and b. This is what the actual C code looks like:
#include <stdio.h>
void function(int a, int b)
{
int array[8];
}
int main()
{
function(1,2);
return 0;
}
I compile with gcc -m32 -g function.c and run the program in gdb. The command disas main shows (some lines skipped):
0x08048474 : push $0x2
0x08048476 : push $0x1
0x08048478 : call 0x804843b
0x0804847d : add $0x10,%esp
The first and last few instructions of function() are:
0x0804843b : push %ebp
0x0804843c : mov %esp,%ebp
0x0804843e : sub $0x38,%esp
0x08048441 : mov %gs:0x14,%eax
0x08048447 : mov %eax,-0xc(%ebp)
0x0804844a : xor %eax,%eax
0x0804844c : nop
...
0x0804845e : leave
0x0804845f : ret
And when I examine the memory that ebp points to:
(gdb) x/4xw $ebp
0xffffcd48: 0xffffcd68 0x0804847d 0x00000001 0x00000002
I understand that on the stack, the word at ebp should be followed by the return address 0x0804847d and the function arguments 0x00000001 and 0x00000002. However, I don't know what 0xffffcd68 is. Is it the address of ebp?
It is the value of ebp at the beginning of the function.
It's a consequence of push %ebp and the fact that the x86 stack is Full Descending.
It's the caller's frame pointer.
Beware that compilers change the way they handle the stack much more frequently than book authors update their books.
Particularly: alignment, frame-pointer omission, RVO, implicit parameters and so on may throw you off.
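To make the layout concrete, here is a small C simulation (it does not touch the real stack) of what main's call sequence and function()'s prologue do to a full-descending stack. The starting esp, 0xffffcd58, is an assumption inferred from the dump (0xffffcd48 + 16); the other values are copied from the disassembly and the gdb output above. simulate_call() and dump4() are hypothetical helpers: the first replays push $0x2, push $0x1, call and push %ebp; mov %esp,%ebp, the second mimics x/4xw $ebp.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model of the i386 stack, which is "full descending": esp points
   at the last word pushed, and push moves esp down by 4 *before* it
   stores. STACK_TOP is inferred from the gdb dump (0xffffcd48 + 16). */
#define STACK_WORDS 16u
#define STACK_TOP   0xffffcd58u   /* esp just before "push $0x2" (assumed) */
#define STACK_BASE  (STACK_TOP - 4u * STACK_WORDS)

static uint32_t mem[STACK_WORDS];
static uint32_t esp = STACK_TOP;
static uint32_t ebp = 0xffffcd68u; /* main's frame pointer, from the dump */

/* Map a simulated stack address to its slot in mem[]. */
static uint32_t *slot(uint32_t addr) {
    assert(addr >= STACK_BASE && addr < STACK_TOP && addr % 4u == 0u);
    return &mem[(addr - STACK_BASE) / 4u];
}

static void push(uint32_t value) {
    esp -= 4u;            /* full descending: move esp first...  */
    *slot(esp) = value;   /* ...then store at the new top        */
}

/* Replay main's call sequence and function()'s prologue; returns the
   ebp value that function() ends up with. */
static uint32_t simulate_call(void) {
    push(0x2u);           /* push $0x2                               */
    push(0x1u);           /* push $0x1                               */
    push(0x0804847du);    /* call 0x804843b pushes the return address */
    push(ebp);            /* push %ebp saves main's frame pointer     */
    ebp = esp;            /* mov %esp,%ebp                            */
    return ebp;
}

/* Rough equivalent of gdb's "x/4xw $ebp". */
static void dump4(uint32_t addr) {
    printf("0x%08x:", (unsigned)addr);
    for (uint32_t i = 0; i < 4u; i++)
        printf(" 0x%08x", (unsigned)*slot(addr + 4u * i));
    printf("\n");
}
```

Calling dump4(simulate_call()) prints the same four words gdb showed: main's saved ebp (0xffffcd68), the return address, then the two arguments, in that order because each push lands below the previous one.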
Related
I want to try a buffer overflow on a C program. I compiled it with gcc like this: gcc -fno-stack-protector -m32 buggy_program.c. If I run this program in gdb and overflow the buffer, it should say 0x41414141, because I sent A's. But it's saying 0x565561f5. Sorry for my bad English. Can somebody help me?
This is the source code:
#include <stdio.h>
int main(int argc, char **argv)
{
char buffer[64];
printf("Type in something: ");
gets(buffer);
}
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x565561f5 in main ()
I want to see this:
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in main ()
Looking at the address at which the process segfaulted shows the relevant line in the disassembled code:
gdb a.out <<EOF
set logging on
r < inp
disassemble main
x/i $eip
p/x $esp
EOF
This produces the following output:
(gdb) Starting program: .../a.out < in
Program received signal SIGSEGV, Segmentation fault.
0x08048482 in main (argc=, argv=) at tmp.c:10
10 }
(gdb) Dump of assembler code for function main:
0x08048436 <+0>: lea 0x4(%esp),%ecx
0x0804843a <+4>: and $0xfffffff0,%esp
0x0804843d <+7>: pushl -0x4(%ecx)
0x08048440 <+10>: push %ebp
0x08048441 <+11>: mov %esp,%ebp
0x08048443 <+13>: push %ebx
0x08048444 <+14>: push %ecx
0x08048445 <+15>: sub $0x40,%esp
0x08048448 <+18>: call 0x8048370 <__x86.get_pc_thunk.bx>
0x0804844d <+23>: add $0x1bb3,%ebx
0x08048453 <+29>: sub $0xc,%esp
0x08048456 <+32>: lea -0x1af0(%ebx),%eax
0x0804845c <+38>: push %eax
0x0804845d <+39>: call 0x8048300
0x08048462 <+44>: add $0x10,%esp
0x08048465 <+47>: sub $0xc,%esp
0x08048468 <+50>: lea -0x48(%ebp),%eax
0x0804846b <+53>: push %eax
0x0804846c <+54>: call 0x8048310
0x08048471 <+59>: add $0x10,%esp
0x08048474 <+62>: mov $0x0,%eax
0x08048479 <+67>: lea -0x8(%ebp),%esp
0x0804847c <+70>: pop %ecx
0x0804847d <+71>: pop %ebx
0x0804847e <+72>: pop %ebp
0x0804847f <+73>: lea -0x4(%ecx),%esp
=> 0x08048482 <+76>: ret
End of assembler dump.
(gdb) => 0x8048482 : ret
(gdb) $1 = 0x4141413d
(gdb) quit
The failing instruction is the ret at the end of main: the program faults when ret tries to load the return address from the top of the stack. Before aligning esp to a 16-byte boundary (and $0xfffffff0,%esp), the executable keeps the original stack pointer plus 4 in ecx (lea 0x4(%esp),%ecx) and later pushes it, so it ends up at -0x8(%ebp). When main completes, it pops ecx and restores the stack pointer with lea -0x4(%ecx),%esp before reading the return address. However, the whole top of the stack, including the saved ecx, has been overwritten, so the new value of the stack pointer is garbage ($1 = 0x4141413d, i.e. 0x41414141 - 4). When ret is executed, it attempts to read a word from address 0x4141413d, which isn't mapped, and the process segfaults.
Notes
The above disassembly was produced from the code in the question using the following compiler-options:
-m32 -fno-stack-protector -g -O0
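The offsets in main's disassembly above explain the exact value. buffer starts at -0x48(%ebp) and the saved ecx sits at -0x8(%ebp), so bytes 64..67 of the input land on the saved ecx. The following C sketch models only that part of the frame (it does not run the vulnerable program). esp_before_ret() is a hypothetical helper returning the value that lea -0x4(%ecx),%esp would compute for a given gets() input; the original ecx value passed to it is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame offsets (below ebp) read from main's disassembly above. */
#define BUFFER_BELOW_EBP    0x48u  /* lea -0x48(%ebp),%eax  -> &buffer  */
#define SAVED_ECX_BELOW_EBP 0x08u  /* push %ecx stores it at -0x8(%ebp) */

/* Given what gets() reads and the ecx value main saved in its
   prologue, return the esp produced by
       pop %ecx
       lea -0x4(%ecx),%esp
   i.e. the address ret will try to load the return address from. */
static uint32_t esp_before_ret(const char *input, uint32_t saved_ecx) {
    size_t ecx_pos = BUFFER_BELOW_EBP - SAVED_ECX_BELOW_EBP; /* 64 */
    size_t n = strlen(input);       /* gets() stores n bytes, then '\0' */
    uint32_t ecx = 0;
    assert(ecx_pos == 64);
    for (size_t i = 0; i < 4; i++) {          /* i386 is little-endian */
        size_t pos = ecx_pos + i;
        uint32_t b = (saved_ecx >> (8u * i)) & 0xffu;  /* untouched byte */
        if (pos < n)
            b = (uint8_t)input[pos];   /* overwritten by the input      */
        else if (pos == n)
            b = 0;                     /* overwritten by the terminator */
        ecx |= b << (8u * i);
    }
    return ecx - 4u;
}
```

With 68 or more 'A's it returns 0x4141413d, matching $1 above. Even exactly 64 characters already corrupt the stack pointer, because gets() writes its terminating '\0' over the low byte of the saved ecx.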
So guys, I found a solution:
just compile it with gcc 3.3.4:
gcc -m32 buggy_program.c
Modern operating systems use address space layout randomization (ASLR) to make this stuff not work quite so easily.
I remember the controversy when it was first introduced. ASLR was kind of a bad idea for 32-bit processes due to the number of other constraints it imposed on the system and its dubious security benefit. On the other hand, it works great on 64-bit processes, and almost everybody uses it now.
You don't know where the code is. You don't know where the heap is. You don't know where the stack is. Writing exploits is hard now.
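A quick way to see this is to print one address from each region and run the program twice. This is a sketch; sample_layout() and print_layout() are hypothetical helpers. With randomization enabled (see /proc/sys/kernel/randomize_va_space), the heap and stack addresses, and the image address if the executable is a PIE, usually change between runs; inside gdb, which disables randomization by default, they normally stay the same.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int in_image;   /* lives in the executable's own .bss */

struct layout {
    uintptr_t image;   /* moves only if the executable is a PIE */
    uintptr_t heap;
    uintptr_t stack;
};

/* Sample one address from each region of the address space. */
static struct layout sample_layout(void) {
    int on_stack = 0;
    void *on_heap = malloc(16);
    assert(on_heap != NULL);
    struct layout l = { (uintptr_t)&in_image, (uintptr_t)on_heap,
                        (uintptr_t)&on_stack };
    free(on_heap);
    return l;
}

static void print_layout(void) {
    struct layout l = sample_layout();
    printf("image: 0x%" PRIxPTR "\n", l.image);
    printf("heap : 0x%" PRIxPTR "\n", l.heap);
    printf("stack: 0x%" PRIxPTR "\n", l.stack);
}
```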
Also, you tried to use 32-bit shellcode and documentation on a 64-bit process.
On reading the updated question: your code is compiled with frame pointers (which is the default). This causes the ret instruction itself to fault, because esp is trashed. ASLR also appears to still be in play, but most likely it doesn't matter here.
I was reading Hacking: The Art of Exploitation by Jon Erickson and followed the example in the book on my Kali Linux system (64-bit).
I wrote a simple C program:
#include<stdio.h>
int main()
{
int i;
for(i=0;i<10;i++)
{
printf("Hello");
}
}
After using objdump and gdb to examine the executable, I found something strange.
As the picture shows, the main function was at 0x000000000000063a.
But according to the breakpoint info after the gdb "run" command, the program stopped at 63e rather than 63a.
Another peculiar thing is that the value in the instruction pointer (rip) was 0x55555555463e.
Shouldn't it be 0x000000000000063a?
Where do those 5s come from?
If you don't use an asterisk, GDB sets the breakpoint on the first "useful" instruction of a function: it skips the function's setup code (the prologue). To see this, try debugging the following code:
#include <stdio.h>
int main()
{
int i=10;
i++;
return 0;
}
Gdb session:
(gdb) b main
Breakpoint 1 at 0x80483e1
(gdb) b *main
Breakpoint 2 at 0x80483db
(gdb) r
Starting program: /home/src/main
Breakpoint 2, 0x080483db in main ()
(gdb) disas
Dump of assembler code for function main:
=> 0x080483db <+0>: push ebp
0x080483dc <+1>: mov ebp,esp
0x080483de <+3>: sub esp,0x10
0x080483e1 <+6>: mov DWORD PTR [ebp-0x4],0xa
0x080483e8 <+13>: add DWORD PTR [ebp-0x4],0x1
0x080483ec <+17>: mov eax,0x0
0x080483f1 <+22>: leave
0x080483f2 <+23>: ret
End of assembler dump.
(gdb) c
Continuing.
Breakpoint 1, 0x080483e1 in main ()
(gdb) disas
Dump of assembler code for function main:
0x080483db <+0>: push ebp
0x080483dc <+1>: mov ebp,esp
0x080483de <+3>: sub esp,0x10
=> 0x080483e1 <+6>: mov DWORD PTR [ebp-0x4],0xa
0x080483e8 <+13>: add DWORD PTR [ebp-0x4],0x1
0x080483ec <+17>: mov eax,0x0
0x080483f1 <+22>: leave
0x080483f2 <+23>: ret
End of assembler dump.
In this case, the setup (prologue) that runs before the useful code of the function is:
0x080483db <+0>: push ebp
0x080483dc <+1>: mov ebp,esp
0x080483de <+3>: sub esp,0x10
The first statement in main,
int i=10;
is compiled into:
mov DWORD PTR [ebp-0x4],0xa
GDB sets the breakpoint on this instruction when we give the command b main.
But if we use the command with an asterisk (pointer), b *main, we set a breakpoint on the actual address of the function (the first instruction of the prologue).
In the OP's case, if we set the breakpoint with break *main and then run, the instruction pointer register (rip) will have the value 0x55555555463a.
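As for the 5s: objdump prints addresses relative to the start of a position-independent executable (PIE), and at run time every address is shifted by wherever the image was mapped. On x86-64 Linux, gdb (which disables ASLR by default) typically maps a PIE at 0x555555554000; treat that base as an assumption and check it with info proc mappings. The arithmetic, as a C sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed load base: where gdb typically maps an x86-64 PIE when it
   runs with randomization disabled. Verify with "info proc mappings". */
#define PIE_BASE_UNDER_GDB UINT64_C(0x555555554000)

/* objdump prints addresses relative to the start of the PIE image;
   at run time each one is shifted by the base the loader picked. */
static uint64_t runtime_address(uint64_t load_base, uint64_t objdump_addr) {
    assert(load_base % 0x1000 == 0);   /* images are mapped page-aligned */
    return load_base + objdump_addr;
}
```

runtime_address(PIE_BASE_UNDER_GDB, 0x63e) gives 0x55555555463e, the rip value from the question. The 4-byte gap between 0x63a and 0x63e matches push %rbp (1 byte) plus mov %rsp,%rbp (3 bytes), assuming main begins with the usual frame-pointer prologue.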
A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf@plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?
gdb uses a few methods to determine this.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb will assume that the prologue of function f is complete at 0xffff0010.
I think this is the mode used by gcc when not optimizing.
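The repeated-line convention can be sketched in C over a simplified line table. struct line_row and prologue_end() are hypothetical, not gdb's code, and the fallback to the first row with a different line number is an added assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a simplified DWARF line table: address -> source line. */
struct line_row {
    uint64_t address;
    unsigned line;
};

/* rows[] holds the rows of ONE function in address order; rows[0] is
   its entry. If the first line number appears again, the address of
   that second instance is taken as the end of the prologue. Falling
   back to the first row with a different line is an extra assumption. */
static uint64_t prologue_end(const struct line_row *rows, size_t n) {
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (rows[i].line == rows[0].line)
            return rows[i].address;
    for (size_t i = 1; i < n; i++)
        if (rows[i].line != rows[0].line)
            return rows[i].address;
    return rows[0].address;   /* nothing better known: do not skip */
}
```

For the two line-23 rows in the example above (plus any later rows), it returns 0xffff0010.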
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
In Binutils 2.25, the skip algorithm seems to be in symtab.c:skip_prologue_sal, which breakpoint.c:break_command (the command definition) calls indirectly.
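Without debug info, a prologue decoder of this kind boils down to pattern-matching instruction bytes. The sketch below is a much simplified, hypothetical i386 matcher, not gdb's actual analyzer: it recognizes push %ebp, mov %esp,%ebp and an optional sub $imm,%esp, and returns how many bytes to skip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recognized sequence:
       55              push %ebp
       89 e5 | 8b ec   mov  %esp,%ebp   (two possible encodings)
       83 ec ib        sub  $imm8,%esp  (optional)
       81 ec id        sub  $imm32,%esp (optional)
   Returns the number of bytes to skip, or 0 if there is no frame setup. */
static size_t skip_i386_prologue(const uint8_t *code, size_t len) {
    assert(code != NULL || len == 0);
    if (len < 3 || code[0] != 0x55)
        return 0;
    if (!((code[1] == 0x89 && code[2] == 0xe5) ||
          (code[1] == 0x8b && code[2] == 0xec)))
        return 0;
    size_t pc = 3;
    if (pc + 3 <= len && code[pc] == 0x83 && code[pc + 1] == 0xec)
        pc += 3;                  /* sub $imm8,%esp  */
    else if (pc + 6 <= len && code[pc] == 0x81 && code[pc + 1] == 0xec)
        pc += 6;                  /* sub $imm32,%esp */
    return pc;
}
```

Fed the bytes of main from the earlier gdb session (55 89 e5 83 ec 10 ..., assuming gcc's usual encodings), it returns 6, which matches the b main breakpoint at 0x080483e1 = 0x080483db + 6. Real compilers emit many more prologue variants (register saves, stack realignment, x86-64 forms), which is one reason gdb prefers debug information when it is available.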
The prologue is common "boilerplate" code at the start of a function.
The prologue of foo2 is longer than that of foo1 by two instructions:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations; in particular, it must decrement rsp before calling another function to reserve room for its local state.
Leaf functions don't need that because of the 128-byte red zone in the ABI; see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1, however, is a leaf function.
lea 0x400644,%rax
For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.
We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.
foo1, however, has no local string constants.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required in leaf functions, but clang does it anyway. It makes register allocation easier.
On ELF platforms like Linux, debug information is stored in separate (non-executable) sections of the executable. These sections contain all the information the debugger needs. Check the DWARF2 specification for the specifics.
As the title says, I am trying to obtain the address of the stored EIP in the frame.
For this simple program:
void func1(int a, int b)
{
int x = 1;
}
int main(void)
{
func1(1,2);
}
My gdb disassembly is:
(gdb) disassemble main
Dump of assembler code for function main:
0x08048430 <main+0>: push %ebp
0x08048431 <main+1>: mov %esp,%ebp
0x08048433 <main+3>: sub $0x8,%esp
0x08048436 <main+6>: add $0xfffffff8,%esp
0x08048439 <main+9>: push $0x2
0x0804843b <main+11>: push $0x1
0x0804843d <main+13>: call 0x8048410 <func1>
0x08048442 <main+18>: add $0x10,%esp
0x08048445 <main+21>: mov %ebp,%esp
0x08048447 <main+23>: pop %ebp
0x08048448 <main+24>: ret
End of assembler dump.
Stack frame printed from GDB:
(gdb) info frame
Stack level 0, frame at 0xffbfdda0:
eip = 0x8048416 in func1 (t.c:3); saved eip 0x8048442
called by frame at 0xffbfddc0
source language c.
Arglist at 0xffbfdd98, args: a=1, b=2
Locals at 0xffbfdd98, Previous frame's sp is 0xffbfdda0
Saved registers:
ebp at 0xffbfdd98, eip at 0xffbfdd9c
info frame doesn't provide the address of the saved eip; it just shows the value of the saved eip.
I set up a breakpoint on func1, then printed the frame information. The saved EIP has a value of 0x8048442, which corresponds to main+18 in the disassembly. I am confused: how do I calculate the address where the saved EIP (0x8048442) is stored?
I have examined the address 0x8048412 (0x8048416 - 4), but it doesn't contain the saved EIP.
You need to examine the area before the arg list. It tells you that: eip at 0xffbfdd9c.
This address is 4 bytes before the arg list (0xffbfdd98). Remember that the stack grows down, so "4 bytes before x" means "x+4".
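Once the frame is set up, gdb can read it directly with x/xw $ebp+4. The layout behind the info frame output, as a small C sketch (the helper names are hypothetical) that reproduces the addresses above from ebp = 0xffbfdd98:

```c
#include <assert.h>
#include <stdint.h>

/* i386 frame with a frame pointer, after "push %ebp; mov %esp,%ebp":
       [ebp + 0]  caller's saved ebp
       [ebp + 4]  saved eip (the return address)
       [ebp + 8]  first argument; also the caller's esp before the call */
static uint32_t saved_ebp_addr(uint32_t ebp) { return ebp; }
static uint32_t saved_eip_addr(uint32_t ebp) { return ebp + 4u; }
static uint32_t first_arg_addr(uint32_t ebp) { return ebp + 8u; }
```

saved_eip_addr(0xffbfdd98) gives 0xffbfdd9c, and first_arg_addr(0xffbfdd98) gives 0xffbfdda0, which is also the "frame at" and "Previous frame's sp" value gdb printed.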
The "saved eip 0x8048442" part of the output is the value itself: where EIP will point after the return, which is in the text section, not on the stack.
I am trying to make the buffer overflow example (example3.c from http://insecure.org/stf/smashstack.html) work on Debian Lenny 2.6. I know the gcc and OS versions differ from the ones used by Aleph One. I have disabled the stack protection mechanisms using -fno-stack-protector and sysctl -w kernel.randomize_va_space=0. To account for the differences between my setup and Aleph One's, I introduced two parameters: offset1, the offset from the buffer1 variable to the return address, and offset2, the number of bytes to jump to skip a statement. I tried to work out these parameters by analyzing the assembly code but was not successful. So I wrote a shell script that runs the buffer overflow program with every combination of offset1 and offset2 from 1 to 60. Much to my surprise, I am still not able to break this program. It would be great if someone could guide me. I have attached the code and the assembly output. Sorry for the really long post :)
Thanks.
// Modified example3.c from Aleph One paper - Smashing the stack
#include <stdio.h>   // printf
#include <stdlib.h>  // atoi
void function(int a, int b, int c, int offset1, int offset2) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = (int *)buffer1 + offset1;// how far is return address from buffer ?
(*ret) += offset2; // modify the value of return address
}
int main(int argc, char* argv[]) {
int x;
x = 0;
int offset1 = atoi(argv[1]);
int offset2 = atoi(argv[2]);
function(1,2,3, offset1, offset2);
x = 1; // Goal is to skip this statement using buffer overflow
printf("X : %d\n",x);
return 0;
}
-----------------
#!/bin/bash
# Execute the buffer overflow program with varying offsets
for ((i=1; i<=60; i++))
do
for ((j=1; j<=60; j++))
do
echo "$i $j: $(./test "$i" "$j")"
done
done
-- Assembler output
(gdb) disassemble main
Dump of assembler code for function main:
0x080483c2 <main+0>: lea 0x4(%esp),%ecx
0x080483c6 <main+4>: and $0xfffffff0,%esp
0x080483c9 <main+7>: pushl -0x4(%ecx)
0x080483cc <main+10>: push %ebp
0x080483cd <main+11>: mov %esp,%ebp
0x080483cf <main+13>: push %ecx
0x080483d0 <main+14>: sub $0x24,%esp
0x080483d3 <main+17>: movl $0x0,-0x8(%ebp)
0x080483da <main+24>: movl $0x3,0x8(%esp)
0x080483e2 <main+32>: movl $0x2,0x4(%esp)
0x080483ea <main+40>: movl $0x1,(%esp)
0x080483f1 <main+47>: call 0x80483a4 <function>
0x080483f6 <main+52>: movl $0x1,-0x8(%ebp)
0x080483fd <main+59>: mov -0x8(%ebp),%eax
0x08048400 <main+62>: mov %eax,0x4(%esp)
0x08048404 <main+66>: movl $0x80484e0,(%esp)
0x0804840b <main+73>: call 0x80482d8 <printf@plt>
0x08048410 <main+78>: mov $0x0,%eax
0x08048415 <main+83>: add $0x24,%esp
0x08048418 <main+86>: pop %ecx
0x08048419 <main+87>: pop %ebp
0x0804841a <main+88>: lea -0x4(%ecx),%esp
0x0804841d <main+91>: ret
End of assembler dump.
(gdb) disassemble function
Dump of assembler code for function function:
0x080483a4 <function+0>: push %ebp
0x080483a5 <function+1>: mov %esp,%ebp
0x080483a7 <function+3>: sub $0x20,%esp
0x080483aa <function+6>: lea -0x9(%ebp),%eax
0x080483ad <function+9>: add $0x30,%eax
0x080483b0 <function+12>: mov %eax,-0x4(%ebp)
0x080483b3 <function+15>: mov -0x4(%ebp),%eax
0x080483b6 <function+18>: mov (%eax),%eax
0x080483b8 <function+20>: lea 0x7(%eax),%edx
0x080483bb <function+23>: mov -0x4(%ebp),%eax
0x080483be <function+26>: mov %edx,(%eax)
0x080483c0 <function+28>: leave
0x080483c1 <function+29>: ret
End of assembler dump.
The disassembly of function you provided seems to use hardcoded values of offset1 and offset2 (add $0x30,%eax and lea 0x7(%eax),%edx), contrary to your C code.
The address for ret should be calculated using byte (char) offsets: ret = (int *)(buffer1 + offset1). Otherwise pointer arithmetic multiplies offset1 by sizeof(int), which hurts especially here, because buffer1 is not at a nicely aligned offset from the return address.
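The difference between the two expressions is easy to check in isolation. In this sketch, storage is a hypothetical int array standing in for the stack, so all the pointer arithmetic stays inside one object:

```c
#include <assert.h>
#include <stddef.h>

/* Int-aligned storage standing in for the stack (hypothetical), so the
   pointer arithmetic below stays inside a single object. */
static int storage[16];

/* Bytes advanced by  (int *)buffer1 + offset1 : scaled by sizeof(int). */
static ptrdiff_t scaled_offset(size_t offset1) {
    char *buffer1 = (char *)storage;
    assert(offset1 <= sizeof storage / sizeof storage[0]);
    int *ret = (int *)buffer1 + offset1;
    return (char *)ret - buffer1;
}

/* Bytes advanced by  buffer1 + offset1 : plain byte arithmetic. */
static ptrdiff_t byte_offset(size_t offset1) {
    char *buffer1 = (char *)storage;
    assert(offset1 <= sizeof storage);
    return (buffer1 + offset1) - buffer1;
}
```

With a 4-byte int, scaled_offset(12) is 48 (0x30, the constant in add $0x30,%eax), and the 0x9 + 0x4 = 13 bytes needed here can only be reached with byte arithmetic, since 13 is not a multiple of sizeof(int). Note that (int *)(buffer1 + offset1) may then be a misaligned int pointer; x86 tolerates that, but strictly it is undefined behavior in ISO C.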
offset1 should be equal to 0x9 + 0x4 (the offset used in lea + 4 bytes for the push %ebp). However, this can change unpredictably each time you compile - the stack layout might be different, the compiler might create some additional stack alignment, etc.
offset2 should be equal to 7 (the length of the instruction you're trying to skip).
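Both numbers can be read straight off the disassembly above. A tiny C sketch of the arithmetic (the macro and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Values read off the disassembly above. */
#define BUFFER1_BELOW_EBP 0x9u         /* lea -0x9(%ebp),%eax             */
#define SAVED_EBP_SIZE    0x4u         /* push %ebp: saved ebp at [ebp+0] */
#define RETURN_ADDR       0x080483f6u  /* main+52: movl $0x1,-0x8(%ebp)   */
#define NEXT_INSN_ADDR    0x080483fdu  /* main+59: mov -0x8(%ebp),%eax    */

/* Bytes from buffer1 (ebp-9) up to the saved return address (ebp+4). */
static uint32_t offset1_bytes(void) {
    return BUFFER1_BELOW_EBP + SAVED_EBP_SIZE;
}

/* Length of the "x = 1" instruction the new return address must skip. */
static uint32_t offset2_bytes(void) {
    uint32_t len = NEXT_INSN_ADDR - RETURN_ADDR;
    assert(len > 0);
    return len;
}
```

This gives offset1 = 13 bytes (used as ret = (int *)(buffer1 + 13)) and offset2 = 7 (0x080483fd - 0x080483f6). Both will change whenever the compiler lays out the frame differently.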
Note that you're getting a little lucky here - the function uses the cdecl calling convention, which means the caller is responsible for removing arguments off the stack after returning from the function, which normally looks like this:
push arg3
push arg2
push arg1
call func
add esp, 0Ch ; remove as many bytes as were used by the pushed arguments
Your compiler chose to combine this correction with the one after printf, but it could also decide to do this after your function call. In this case the add esp, <number> instruction would be present between your return address and the instruction you want to skip - you can probably imagine that this would not end well.