Buffer Overflow - why some ascii's work and not others - c

I'm sorry if this question is stupid or has been asked, but I couldn't find it.
I have a program on which I am attempting a buffer overflow. It is a simple program that uses getchar() to read the input from the user. The buffer has size 12. I can get the program to crash by typing more than 12 x's or more than 12 \x78's, but it won't segfault if I type in hundreds of A's or \x41's.
Any help or pointing in the right direction would be greatly appreciated.

0x41414141 may be a valid address within a text page of the process. Look at the segment map of the process for details.

To eliminate guessing, look at the assembly code and then at the machine instructions of your program. Run it in a debugger and see what happens in memory. You can see at what addresses on the stack the local variables are placed, and at what addresses the registers, especially the instruction pointer, are saved on a function call.
Have you looked at examples like the stack overflow article on Wikipedia?

Related

Running own code with a buffer overflow exploit

I am trying to understand the buffer overflow exploit and, more specifically, how it can be used to run my own code, e.g. by starting my own malicious application or anything similar.
While I do understand the idea of the buffer overflow exploit using the gets() function (overwriting the return address with a long enough string and then jumping to the said address), there are a few things I am struggling to understand in real application, those being:
Do I put my own code into the string just behind the return address? If so, how do I know the address to jump to? And if not, where do I jump and where is the actual code located?
Is the actual payload that runs the code my own software that's running and the other program just jumps into it or are all the instructions provided in the payload? Or more specifically, what does the buffer overflow exploit implementation actually look like?
What can I do when the address (or any instruction) contains 0? gets() function stops reading when it reads 0 so how is it possible to get around this problem?
As a homework, I am trying to exploit a very simple program that just asks for an input with gets() (ASLR turned off) and then prints it. While I can find the memory address of the function which calls it and the return, I just can't figure out how to actually implement the exploit.
You understand how changing the return address lets you jump to an arbitrary location.
But, as you have correctly identified, you don't know where the code you want to execute has been loaded. You just copied it into a local buffer (which is most likely somewhere on the stack).
But there is something that always points into this stack: the stack pointer register. (Let's assume x64, where it is %rsp.)
Assume your custom code is at the top of the stack. (It could be at an offset, but that can be handled similarly.)
Now we need an instruction that
1. Allows us to jump to the address held in the stack pointer
2. Is located at a fixed address.
Most binaries use some kind of shared library. On Windows you have kernel32.dll. In every program that loads this library, it is mapped at the same address (as long as ASLR is not in effect). So you know the exact location of every instruction in this library.
All you have to do is disassemble one such library and find an instruction like
jmp *%rsp // or a sequence of instructions that lets you jump to an offset
Then the address of this instruction is what you will place where the return address is supposed to be.
The function will then return and jump to the stack (of course, you need an executable stack for this). Then it will execute your arbitrary code.
Hope that clears some confusion on how to get the exploit running.
To answer your other questions -
Yes, you can place your code in the buffer directly. Or, if you can find the exact code you want to execute (again in a shared library), you can simply jump to that.
gets stops at \n (and at end of input). It does not actually stop at a 0 byte, but string functions such as strcpy do, so 0 bytes are often a problem as well. Usually you can get away with changing your instructions a bit so that the code doesn't contain these bytes at all.
You try different instructions and check the assembled bytes.

Understanding stack and heap using a C/C++/Java program

Can someone suggest an exercise or some code to help me understand heap and stack memory better? I have read about this in textbooks a lot, and just want to see some code to understand it better. Maybe a stack overflow example or something like that. I assume C/C++ would be an ideal candidate.
Both the stack and the heap reside in memory. Where exactly isn't necessarily important but you should seek to understand their data structure.
Heap:
For the Heap you should read this example implementation of malloc and it should give you an idea of how it works (code included).
Stack (theory):
The stack (the basic data structure) is a rather simple concept to understand and is explained well enough here. That link also briefly covers how it is used in your program. I haven't found something that I think describes it in a nice way so I'll include the following as additional references to follow up:
There is a simple power point about it available here,
an interesting (and quite involved) video here
and a simplified explanation here.
If these don't quite explain it right, you should try and find something that explains the call stack and the stack frame in a way you will understand.
Stack (code):
All the links I pointed out explain the theory and don't give you the code. My understanding of the stack was significantly improved after I learned about computer architecture and assembly language.
So if you want a code example you should try and implement a simple assembly function and link it to a C program. Here is an example (with a brief intro into assembly).
Java and C++:
Java and C++ hide this stuff from you so if this is a learning exercise I recommend C and assembly. I'm not sure if it is even possible to do it in Java. You can in C++ but it gets complicated...
My explanation of the Call Stack:
I will try and give a short explanation of the call stack and the stack frame with respect to the C programming language and Assembly.
To do that, let's define a simple machine so that we can begin on the same page.
Our processor has a finite number of registers, let's say 8. Operations like add, subtract, compare and others can ONLY be done on registers. There are only two instructions that you can execute on memory and they are load and store. These instructions copy from memory to a register, or from a register to memory, respectively.
Now, if we want to add 2 numbers we have plenty of registers. We will load one number into register 1 and the other into register 2, add them and put the result into register 3. Now, if we do another operation on registers 4 and 5, store the result in register 6 and then want to add registers 7 and 8, where do we put the result? We have run out of registers...
Clearly we need to use memory and save information there when we are not processing it. To do that we need to know where in memory each variable lives.
You could remember the address of each variable and manually type them in every time, possibly writing it down next to a sensible name on a piece of paper on your desk. But that would be tedious and would pose a serious limitation on functions.
Functions have local variables. If these variables' addresses were hardwired, they would always be allocated to the function, which would pose a big problem for recursion. If a function were to call itself, it would overwrite the values that its caller (itself) was using.
The simplest way to solve this is to use a stack. However, to do so we need to keep the stack pointer address somewhere. So let's just agree that from now on we will use register 8 as our stack pointer.
Also, let's say that when we push something onto the stack we subtract the number of bytes we want from the stack pointer, and when we pop we add the number of bytes to the stack pointer.
Every time we call a function we will put its local variables on the stack. Since we know what variables we want, we know how much memory we need.
But to access each local variable we need to know its address... Let us set up an example.
Let's say that our function has 2 variables, each 2 bytes in size, and that the stack pointer is 18.
As per our agreement, let's subtract 4 bytes from the stack pointer, making the stack pointer equal 14.
Now to access the first variable we will simply use the stack pointer. To access the second we will add 2 to the stack pointer, which gives address 16, still inside the 4 bytes we just reserved.
As long as our function is running this will be true, but before our function exits we will free our local variables from the stack by adding 4 to the stack pointer again, making it 18 (like it was before we started).

Curious thing when finding environment variable address in gdb

Recently I have been doing a return-to-libc attack experiment, based on the paper Bypassing non-executable-stack during exploitation using return-to-libc, on Ubuntu 11.10.
Before my experiment I disabled ASLR.
According to the paper, I can find the address of the environment variable SHELL="/bin/bash" in gdb (using gdb to debug the program I want to attack):
But I found that this address is wrong when I try to use it in the return-to-libc experiment.
I then wrote a simple program to get the environment variable address:
When I run this program in the Terminal, I get the right address:
With this address I can do the attack.
I also found a related question about this, but the answers don't really make sense (the second one may be better).
Please tell me some details about this.
From your screenshots, I'll assume you're running on a 32-bit Intel platform. I haven't spent the time to fully research an answer to this, but these are points worth noting:
I'll bet that your entire environment is in about the same place, and is packed together tightly as C-style strings. (Try x/100s **(char***)&environ.)
When I tried this on my x86-64 installation, the only things I saw after the environment were my command line and some empty strings.
At 0xBffff47A, you're very close to the top of user address space (which ends at 0xC0000000).
So, my guess is that what's going on here is that:
The environment block and command line parameters are, at some point during startup, shoved in a packed form right at the end of user address space.
The contents of your environment are different when you run your program in GDB or in the terminal. For example, I notice "_=/usr/bin/gdb" when running under GDB, and I'll just bet that's only there when running under GDB.
The result is that, while your fixed pointer tends to land somewhere in the middle of the environment block, it doesn't land in the same place every time, since the environment itself is changing between runs.

attacking the stack

There is a binary file I am working on. It has a function whose address starts at 123. I need to get my code to execute this function.
The binary accepts a byte array of size 'n' and does not check bounds. The entire task is to overflow the buffer and cause bad things to happen.
Again, the job is to call address 123 and get it to execute. I was under the impression that if the buffer size is, say, 4, and I pass 9 characters, then 5 characters will be placed on the stack and executed. (Is that true?)
Additionally, in order to get the address executed, I'd like to say "call 123". From what I understand, "call" is "e8", no?
This problem is a bit confusing for me. If someone could help me understand it better, I would very much appreciate it.
(Yes, this is a homework question)
I strongly recommend that you read Smashing the Stack for Fun and Profit. It describes in great detail exactly the steps you need to do to do this.
The stack does not contain code, but it does contain the return address for the function. Typical stack structure is:
<stack data> <old frame pointer> <return address>
<old frame pointer> is sometimes omitted, and I think it would have to be for this, so all you have to provide is the data to fill the array followed by 123.
A common strategy is to find a call ESP opcode in a fixed memory address, and overwrite return address with this address. In this way the execution will continue on the stack. It will work if there is no DEP activated (or supported).
In your case you could also look for an abs_jump 123 in memory.
EDIT:
@Loren: It would work only if the stack can be executed. call esp has only a two-byte opcode, so there is a high probability of finding it somewhere in memory. The second approach does not execute code from the stack, but requires a 5-byte opcode, which is very unlikely to be found.

How to debug a stackoverflow problem on targets

I want to know how to go about debugging a stack overflow issue on embedded targets.
I mean, what steps should we follow to reach a conclusion?
Put a memory write watchpoint for one word past the end of your stack space. Then the debugger will break in when that spot gets written to, and you can see what's at fault.
All stacks can be filled at startup with a known hex value (for example 0xAAAAAAAA). Then, using a special routine, you can periodically monitor each stack's maximum usage by counting the known values (0xAA..) from the end of the stack until the first difference is found.
Run it through a debugger such as gdb. The backtrace at the time of the stack overflow will tell you exactly which function or functions are repeating indefinitely. From there, figure out which input(s) to those functions are not changing, and not moving the function (if it's recursive) towards a base-case that will end the recursion.

Resources