Running own code with a buffer overflow exploit - c

I am trying to understand the buffer overflow exploit and more specifically, how it can be used to run own code - e.g. by starting our own malicious application or anything similar.
While I do understand the idea of the buffer overflow exploit using the gets() function (overwriting the return address with a long enough string and then jumping to the said address), there are a few things I am struggling to understand in real application, those being:
Do I put my own code into the string just behind the return address? If so, how do I know the address to jump to? And if not, where do I jump and where is the actual code located?
Is the actual payload that runs the code my own software that's running and the other program just jumps into it or are all the instructions provided in the payload? Or more specifically, what does the buffer overflow exploit implementation actually look like?
What can I do when the address (or any instruction) contains 0? gets() function stops reading when it reads 0 so how is it possible to get around this problem?
As a homework, I am trying to exploit a very simple program that just asks for an input with gets() (ASLR turned off) and then prints it. While I can find the memory address of the function which calls it and the return, I just can't figure out how to actually implement the exploit.

You understand how changing the return address lets you jump to an arbitrary location.
But as you have correctly identified you don't know where you have loaded the code you want to execute. You just copied it into a local buffer(which was mostly some where on the stack).
But there is something that always points to this stack and it is the stack pointer register. (Lets assume x64 and it would be %rsp).
Assuming your custom code is on the top of the stack. (It could be at an offset but that too can be managed similarly).
Now we need an instruction that
1. Allows us to jump to the esp
2. Is located at a fixed address.
So most binaries use some kind of shared libraries. On windows you have kernel32.dll. In all the programs this library is loaded, it is always mapped at the same address. So you know the exact location of every instruction in this library.
All you have to do is disassemble one such library and find an instruction like
jmp *%rsp // or a sequence of instructions that lets you jump to an offset
Then the address of this instruction is what you will place where the return address is supposed to be.
The function will return then and then jump to the stack (ofcourse you need an executable stack for this). Then it will execute your arbitrary code.
Hope that clears some confusion on how to get the exploit running.
To answer your other questions -
Yes you can place your code in the buffer directly. Or if you can find the exact code you want to execute (again in a shared library), you can simply jump to that.
Yes, gets would stop at \n and 0. But usually you can get away by changing your instructions a bit to write code that doesn't use these bytes at all.
You try different instructions and check the assembled bytes.

Related

GDB: Find stack memory address where return address of a function is stored?

I'm working on producing a buffer overflow on my Raspberry Pi (ASLR disabled).
I have a program, which has a main function, a vulnerable function and a function which should not be called, the evil function.
My main function calls the vulnerable function at some point, but the evil function obviously never gets called. I need to make sure it does, using a buffer overflow.
So what I have got so far is the return address of the vulnerable function in the main function, which I want to overwrite with the starting address of the evil function. I think this is correct approach.
However I wasn't able to figure out how I examine the memory in gdb in such a way so that I find at what stack address the return address is stored. There is an example available, which inputs a string of characters through gdb while the program is running, then they look up the memory around the stackpointer and somehow that is where the return address is stored. This seems rather weird to me, since how could they know that their input gets stored just a couple addresses away from the desperately wanted return address.
My question is if I can 'search' the stack for my return address using gdb.
The Raspberry Pi is running an ARM microcontroller, so you should read more about the ARM architecture and calling convention.
You should read more about ARM registers, especially the stack pointer (abbreviated SP) as well as the link register (abbreviated LR: this is where the return address of a function is stored). See for instance this question for a good explanation.
To visually inspect the values of those registers with gdb, you can use the instruction info registers, (also work if you type i r). See the doc for more details.

buffer overflow exploit: Why does "jmp esp" need to be located in a DLL?

I am trying to understand classical buffer overflow exploits where an input buffer overwrites the stack, the function return address that is saved on the stack and upper memory regions (where you usually place the shell code).
There are many examples of this on the internet and I think I have understood this pretty well:
You put more data in some input buffer that the developer has made fixed size
Your input overwrites the function arguments and the return address to the calling function on the stack
That address is loaded into EIP when the OS tries to return from the function the overflow happened in and that is what allows you to get data you control into the EIP register (when not reading carefully some articles you get the impression you can overwrite CPU registers. Of course, this is not the case, you can only overwrite the stack but the CPU will load addresses from the stack into its registers)
If the exploit is well designed the value loaded into EIP will make the program jump to the beginning of the shell code (see point 5)
The next thing I think I have understood well is the "JMP ESP" mechanism. When replicating the crash in your lab environment you look for a place in memory that contains the "JMP ESP" instruction and you overwrite the EIP (now my wording is not precise, I know...) with that address. That address needs to be identical no matter how often you run this, at what time, which thread is executing your stuff etc., at least for the same OS version and the address must not contain any forbidden bytes. The code will jump to that address (at the same time the stack is reduced by 4 bytes) so "jmp esp" will jump to the next address in my overflow buffer after the place where I had put the value to overwrite EIP with and that is usually the place where the shellcode goes, maybe prepended with NOPs.
Now comes the question.
All articles I've read so far are looking for the address of the "JMP ESP" instructions in a DLL (which must not be relocatable, not compiled with ASLR, etc.). Why not looking in exe itself for a "jmp esp"? Why does it need to be in a DLL?
I have run the "!mona modules" command in Immunity Debugger and the only module displayed that satisfies all these conditions is the exe itself. When I look in popular exploit databases the address is always in loaded DLLs.
I cannot see any obvious reason for this. The exe can also be located at the same address in memory the same way a DLL can. What is the difference?
Found another resource on this:
As I wrote in a comment before, the addresses of the exe usually contain a zero:
http://resources.infosecinstitute.com/in-depth-seh-exploit-writing-tutorial-using-ollydbg/#commands
A module you can consider using is the main executable itself, which
is quite often not subject to any compiler based exploit protections,
especially when the application has been written by a third party
developer and not Microsoft. Using the main executable has one major
flaw however, in that it almost always starts with a zero byte. This
is a problem because the zero byte is a string terminator in C/C++,
and using zero bytes as part of strings used to overflow a buffer will
often result in the string being terminated at that point, potentially
preventing the buffer from being appropriately overflowed and breaking
the exploit.
In position independent code jump addresses are relative to the program-counter, while in non-relocatable code they are absolute. DLLs in Windows do not generally use position independent code. Exploits that rely on knowing the offset of the executable code in the binary require non-relocatable code.
short answer: the address DOES NOT need to be in a DLL.
long answer:
ANY mapped unprotected executable memory in your process will be executed if the Instruction register is set to its address.
MAPPED: means the memory is mapped to your process by the operating system, some addresses may not be mapped and any and all access will cause operating system memory fault signals to be raised.
EXECUTABLE: mapped memory often has permissions set to it, sometimes it is enough for the memory to be readable yet with newer processors with NX-bits it may be required to be mapped executable
UNPROTECTED: means that the memory is not mapped as a guard page. memory pages that are protected by guard pages will raise processor interrupt. the handling of these depends on your operating system and may be used to implement non-executable pages on processors that do not implement NX-bits.
if the memory in your executable file fulfils these demands it CAN be used for your exploit. whether you can FIND this address during runtime is another question and can be very difficult for real world exploits. sticking to non relocatable DLLs and EXEs is a good idea for beginners.
as for your comment:
I am looking again at my example and I think the answer is a lot easier than I originally thought. The exe is loaded at base address 0x004.... and goes up to 0x009.... This simply means that the every address will contain a 0x00 which is probably a show stopper for each C like program...
trying to overrun previous addresses with new addresses that contain '\0' characters may cause problems for exploits that use an overrun in string based functions (eg. strcpy or gets) to overflow the buffer because those stop on '\0' characters.
if you can overflow your buffer with logic that is not limited by '\0' characters (eg memcpy) you may use those addresses as well.

Allocating a new call stack

(I think there's a high chance of this question either being a duplicate or otherwise answered here already, but searching for the answer is hard thanks to interference from "stack allocation" and related terms.)
I have a toy compiler I've been working on for a scripting language. In order to be able to pause the execution of a script while it's in progress and return to the host program, it has its own stack: a simple block of memory with a "stack pointer" variable that gets incremented using the normal C code operations for that sort of thing and so on and so forth. Not interesting so far.
At the moment I compile to C. But I'm interested in investigating compiling to machine code as well - while keeping the secondary stack and the ability to return to the host program at predefined control points.
So... I figure it's not likely to be a problem to use the conventional stack registers within my own code, I assume what happens to registers there is my own business as long as everything is restored when it's done (do correct me if I'm wrong on this point). But... if I want the script code to call out to some other library code, is it safe to leave the program using this "virtual stack", or is it essential that it be given back the original stack for this purpose?
Answers like this one and this one indicate that the stack isn't a conventional block of memory, but that it relies on special, system specific behaviour to do with page faults and whatnot.
So:
is it safe to move the stack pointers into some other area of memory? Stack memory isn't "special"? I figure threading libraries must do something like this, as they create more stacks...
assuming any area of memory is safe to manipulate using the stack registers and instructions, I can think of no reason why it would be a problem to call any functions with a known call depth (i.e. no recursion, no function pointers) as long as that amount is available on the virtual stack. Right?
stack overflow is obviously a problem in normal code anyway, but would there be any extra-disastrous consequences to an overflow in such a system?
This is obviously not actually necessary, since simply returning the pointers to the real stack would be perfectly serviceable, or for that matter not abusing them in the first place and just putting up with fewer registers, and I probably shouldn't try to do it at all (not least due to being obviously out of my depth). But I'm still curious either way. Want to know how these sorts of things work.
EDIT: Sorry of course, should have said. I'm working on x86 (32-bit for my own machine), Windows and Ubuntu. Nothing exotic.
All of these answer are based on "common processor architectures", and since it involves generating assembler code, it has to be "target specific" - if you decide to do this on processor X, which has some weird handling of stack, below is obviously not worth the screensurface it's written on [substitute for paper]. For x86 in general, the below holds unless otherwise stated.
is it safe to move the stack pointers into some other area of memory?
Stack memory isn't "special"? I figure threading libraries
must do something like this, as they create more stacks...
The memory as such is not special. This does however assume that it's not on an x86 architecture where the stack segment is used to limit the stack usage. Whilst that is possible, it's rather rare to see in an implementation. I know that some years ago Nokia had a special operating system using segments in 32-bit mode. As far as I can think of right now, that's the only one I've got any contact with that uses the stack segment for as x86-segmentation mode describes.
Assuming any area of memory is safe to manipulate using the stack
registers and instructions, I can think of no reason why it would be a
problem to call any functions with a known call depth (i.e. no
recursion, no function pointers) as long as that amount is available
on the virtual stack. Right?
Correct. Just as long as you don't expect to be able to get back to some other function without switching back to the original stack. Limited level of recursion would also be acceptable, as long as the stack is deep enough [there are certain types of problems that are definitely hard to solve without recursion - binary tree search for example].
stack overflow is obviously a problem in normal code anyway,
but would there be any extra-disastrous consequences to an overflow in
such a system?
Indeed, it would be a tough bug to crack if you are a little unlucky.
I would suggest that you use a call to VirtualProtect() (Windows) or mprotect() (Linux etc) to mark the "end of the stack" as unreadable and unwriteable so that if your code accidentally walks off the stack, it crashes properly rather than some other more subtle undefined behaviour [because it's not guaranteed that the memory just below (lower address) is unavailable, so you could overwrite some other useful things if it does go off the stack, and that would cause some very hard to debug bugs].
Adding a bit of code that occassionally checks the stack depth (you know where your stack starts and ends, so it shouldn't be hard to check if a particular stack value is "outside the range" [if you give yourself some "extra buffer space" between the top of the stack and the "we're dead" zone that you protected - a "crumble zone" as they would call it if it was a car in a crash]. You can also fill the entire stack with a recognisable pattern, and check how much of that is "untouched".
Typically, on x86, you can use the existing stack without any problems so long as:
you don't overflow it
you don't increment the stack pointer register (with pop or add esp, positive_value / sub esp, negative_value) beyond what your code starts with (if you do, interrupts or asynchronous callbacks (signals) or any other activity using the stack will trash its contents)
you don't cause any CPU exception (if you do, the exception handling code might not be able to unwind the stack to the nearest point where the exception can be handled)
The same applies to using a different block of memory for a temporary stack and pointing esp to its end.
The problem with exception handling and stack unwinding has to do with the fact that your compiled C and C++ code contains some exception-handling-related data structures like the ranges of eip with the links to their respective exception handlers (this tells where the closest exception handler is for every piece of code) and there's also some information related to identification of the calling function (i.e. where the return address is on the stack, etc), so you can bubble up exceptions. If you just plug in raw machine code into this "framework", you won't properly extend these exception-handling data structures to cover it, and if things go wrong, they'll likely go very wrong (the entire process may crash or become damaged, despite you having exception handlers around the generated code).
So, yeah, if you're careful, you can play with stacks.
You can use any region you like for the processor's stack (modulo the memory protections).
Essentially, you simply load the ESP register ("MOV ESP, ...") with a pointer to the new area, however you managed to allocate it.
You have to have enough for your program, and whatever it might call (e.g., a Windows OS API), and whatever funny behaviours the OS has. You might be able to figure out how much space your code needs; a good compiler can easily do that. Figuring how much is needed by Windows is harder; you can always allocate "way too much" which is what Windows programs tend to do.
If you decide to manage this space tightly, you'll probably have to switch stacks to call Windows functions. That won't be enough; you'll likely get burned by various Windows surprises. I describe one of them here Windows: avoid pushing full x86 context on stack. I have mediocre solutions, but not good solutions for this.

Example of a buffer overflow leading to a security leak

I read many articles about unsafe functions like strcpy, memcpy, etc. which may lead to security problems when processing external data, like the content of a file or data coming from sockets. This may sound stupid, but I wrote a vulnerable program but I did not manage to "hack" it.
I understand the problem of buffer overflow. Take this example code:
int main() {
char buffer[1];
int var = 0;
scan("%s", &buffer);
printf("var = 0x%x\n", var);
return 0;
}
When I execute the program and type "abcde", the program outputs 0x65646362 which is "edcb" in hexadecimal + little-endian. However I read that you could modify the eip value that was pushed on the stack in order to make the program execute some unwanted code (eg. right before a call to the system() function).
However the function's assembly starts like this:
push %ebp
mov %ebp, %esp
and $0xfffffff0, %esp
sub $0x20, %esp
Since the value of %esp is random at the start of the function and because of this "and", there seems to be no reliable way to write a precise value into the pushed eip value.
Moreover, I read that it was possible to execute the code you wrote in the buffer (here the buffer is only 1 byte long, but in reality it would be large enough to store some code) but what value would you give to eip in order to do so (considering the location of the buffer is random)?
So why are developpers so worried about security problems (except that the program could crash) ? Do you have an example of a vulnerable program and how to "hack" it to execute unwanted code? I tried this on linux, is Windows less safe?
Read the excellent article by Aleph One: Smashing the Stack for Fun and Profit.
Well for one thing, don't under estimate the hazards associated with being able to unreliably place a value inside EIP. If an exploit works one in 16 times, and the service it is attacking automatically restarts, like many web applications, then an attacker that fails when trying to get access can always try, try again.
Also in a lot of cases the value of ESP is less random than you think. For starters on a 32-bit system it is nearly always a multiple of four. That means that the extra padding offered by the and $0xfffffff0, %esp instruction will be either 0, 4, 8 or 12 bytes. That means that it is possible to just repeat the value that is to be written into the return EIP four times to cover all possible offsets to the address of return EIP.
There are actually much more aggressive stack protection / buffer overflow detection mechanisms around. However, there are ways and means around even these.
Also, for an example of where this sort of thing can be dangerous, consider if the value of var was important to you logic as in the following toy example.
int main() {
char buffer[1];
int var = 0;
var = SecurityCheck();
scan("%s", &buffer);
if (var != 0)
GrantAccess();
else
DenyAccess()
}
Further you don't have to overwrite EIP with a pointer to something in your string. For example you could overwrite it with a pointer to system() and overwrite the next word with a pointer to /bin/sh at a fixed location in the program image.
Edit: Note that system uses the PATH (actually it runs the command via a shell), so "sh" would be just as good; thus, any English word ending in "sh" at the end of a string provides the argument you need.
A classic example of an actual exploit based on buffer overruns is the Morris Worm of 1988.
As mentioned in other answers, absolute reliability is not always essential for the attack to succeed. Applications that restart automatically are an example. Locally exploitable buffer overflows on suid programs would be another. And there's the NOP sled technique to increase chances of successful exploitation, put a lot of NOPs before your shellcode so you have a much better chance to correctly guess the "start" of your shellcode.
There are many more techniques for increasing the reliability of attacks. On Windows, back in the day, many exploits overwrote the return address with the address of a "jmp %esp" located somewhere in the program (trampoline).
"Insecure programming by example" had a nice trick for Linux. Clean your environment and put your shellcode in an environment variable. Back in the day, this would lead to a predictable address near the top of the stack.
And there are also variants like return-into-libc and return-oriented programming.
There was even an article on Phrack on how to exploit 1-byte stack overflows (meaning the buffer was overrun by only one byte) (btw, 1-byte heap overflows are also exploitable in the vast majority of cases, barring protections).
To sum up, it's not that developers are paranoid, there are lots of ways to exploit even the strangest cases, and remember:
A program is of good quality when it does what it is supposed to do.
A program is secure when it does what it is supposed to do, and nothing more.
Here's a windows version and tutorial:
http://www.codeproject.com/KB/winsdk/CodeInject.aspx
The general case I was always warned about was:
printf( string );
Because the user can provide a "%n" in there, which allows you to insert anything you want into memory. All you need to do is find the memory offset for a system call, pass a few "%n"'s and junk characters, and thus insert the memory address onto the stack where the return vector would normally be. Voila -- insert any code you like.

Why compilers creates one variable "twice"?

I know this is more "heavy" question, but I think its interesting too. It was part of my previous questions about compiler functions, but back than I explained it very badly, and many answered just my first question, so ther it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and secure that each task has propriate place in memory. So, every process gets its own place starting from 0.
When multitasking goes into effect, Kernel has to save all important registers to the task´s stack i believe than save the current stack pointer, change page entry to switch to another proces´s physical adress space, load new process stack pointer, pop saved registers and continue by call to poped instruction pointer adress.
Becouse of this nice feature (paging) every process thinks it has nice flat memory within reach. So, there is no far jumps, far pointers, memory segment or data segment. All is nice and linear.
But, when there is no more segmentation for the process, why does still compilers create variables on the stack, or when global directly in other memory space, than directly in program code?
Let me give an example, I have a C code:int a=10;
which gets translated into (Intel syntax):mov [position of a],#10
But than, you actually ocupy more bytes in RAM than needed. Becouse, first few bytes takes the actuall instruction, and after that instruction is done, there is new byte containing the value 10.
Why, instead of this, when there is no need to switch any segment (thus slowing the process speed) isn´t just a value of 10 coded directly into program like this:
xor eax,eax //just some instruction
10 //the value iserted to the program
call end //just some instruction
Becouse compiler know the exact position of every instruction, when operating with that variable, it would just use it´s adress.
I know, that const variables do this, but they are not really variables, when you cannot change them.
I hope I eplained my question well, but I am still learning English, so forgive my sytactical and even semantical errors.
EDIT:
I have read your answers, and it seems that based on those I can modify my question:
So, someone told here that global variable is actually that piece of values attached directly into program, I mean, when variable is global, is it atached to the end of program, or just created like the local one at the time of execution, but instead of on stack on heap directly?
If the first case - attached to the program itself, why is there even existence of local variables? I know, you will tell me becouse of recursion, but that is not the case. When you call function, you can push any memory space on stack, so there is no program there.
I hope you do understand me, there always is ineficient use of memory, when some value (even 0) is created on stack from some instruction, becouse you need space in program for that instruction and than for the actual var. Like so: push #5 //instruction that says to create local variable with integer 5
And than this instruction just makes number 5 to be on stack. Please help me, I really want to know why its this way. Thanks.
Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive decent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap as non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if windows does this, however)
Your proposal doesn't allow for either of these cases.
So, there is no far jumps, far pointers, memory segment or data segment. All is nice and linear.
Yes and no. Different program segments have different purposes - despite the fact that they reside within flat virtual memory. E.g. data segment is readable and writable, but you can't execute data. Code segment is readable and executable, but you can't write into it.
why does still compilers create variables on the stack, [...] than directly in program code?
Simple.
Code segment isn't writable. For safety reasons first. Second,
most CPUs do not like to have code segment being written into as it
breaks many existing optimization used to accelerate execution.
State of the function has to be private to the function due to
things like recursion and multi-threading.
isn´t just a value of 10 coded directly into program like this
Modern CPUs prefetch instructions to allow things like parallel execution and out-of-order execution. Putting the garbage (to CPU that is the garbage) into the code segment would simply diminish (or flat out cancel) the effect of the techniques. And they are responsible for the lion share of the performance gains CPUs had showed in the past decade.
when there is no need to switch any segment
So if there is no overhead of switching segment, why then put that into the code segment? There are no problems to keep it in data segment.
Especially in case of read-only data segment, it makes sense to put all read-only data of the program into one place - since it can be shared by all instances of the running application, saving physical RAM.
Becouse compiler know the exact position of every instruction, when operating with that variable, it would just use it´s adress.
No, not really. Most of the code is relocatable or position independent. The code is patched with real memory addresses when OS loads it into the memory. Actually special techniques are used to actually avoid patching the code so that the code segment too could be shared by all running application instances.
The ABI is responsible for defining how and what compiler and linker supposed to do for program to be executable by the complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.
What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using VisualStudio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)
Why isn´t just a value of 10 coded directly into program like this:
xor eax,eax //just some instruction
10 //the value iserted to the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and still must have code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
not quite sure what your confusion is?
int a = 10; means make a spot in memory, and put the value 10 at the memory address
if you want a to be 10
#define a 10
though more typically
#define TEN 10
Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a=10 or even const int a=10, the compiler cannot convert code which references 'a' to use the constant 10 directly, because it has no way of knowing whether 'a' may be changed behind its back (even const variables can be changed). For example, one way 'a' can be changed without the compiler knowing is, if you have a pointer which points 'a'. Pointers are not fixed at runtime, so the compiler cannot determine at compile time whether there will be a pointer which will point to and modify 'a'.

Resources