Understanding Format String Exploits - c

I'm learning various exploits and I can't quite get my head around Format String exploits. I've got a fairly simple program set up in an environment that allows the exploit.
int woah(char *arg){
    char buf[200];
    snprintf(buf, sizeof buf, arg);
    return 0;
}
I'm able to control the arg passed into the function, which is how the attack will happen; the end result should be the program running my shellcode and giving me root. Making the program crash is easy: just feed it "%s%s" and it segfaults. We want to do more than that, so we feed it something like "AAAA%x%x%x%x%x%x%x". Looking at the buffer in gdb right after the snprintf, we can see:
"AAAA849541414141353934....blah blah blah"
That's good! We can see the A's on the stack as well as the 41s, which is A in hex. But what comes next? I get that the general idea here is to overwrite the saved instruction pointer with four bytes by putting its address at the start of the string we feed in, and then somewhere along the line have it point to our shellcode.
How would I find the address of the saved EIP (the return address) to overwrite?

When a function such as snprintf() is called, a stack frame (a region of stack memory) is created for it. Before that happens, the call instruction pushes the return address, the point in the caller where execution should resume, onto the stack. When the function finishes its work and its stack frame unwinds, execution goes back to that address so the program can continue to run. What you are trying to do is overwrite this saved address with the address of your shellcode. To find where it is stored, stop inside the function in gdb and run info frame, which on x86 reports the saved eip and its location. Research more on stack frames, ESP, EBP and EIP to get the idea.

Related

Function call after buffer overflow

I've seen a video: https://www.youtube.com/watch?v=AXQefYKWjz4
I don't understand two things:
I can't see the function call, but it happens.
How he got the specific number that he wrote to the file.
He is trying to write a specific value (perhaps the address of a function) to some position on the stack. Why is this possible? How can I repeat it?
First of all, what happens here is that he stores the hardcoded address of the function foo() in the file, which the program reads into the variable 'x'. He stored it as '134513853', which in hexadecimal is 0x80484bd; that must be the address of the function foo().
So, in order of execution,
the program reads the address of foo() from the file and copies it into x. Then it fills the buffer with this address, so that once the buffer overflows, the return address is overwritten with it as well.
For example:
If this is what the function stack looks like,
Buffer----------------->
EBP ----------------->
Return address --------> some 0x value <--- EIP
Post overflow it will look like this:
Buffer-----------------> 0x80484bd
EBP--------------------> 0x80484bd
Return Address---------> 0x80484bd <----EIP
Let's not bother with little-endian for now. So, when the function main() ends, execution will resume from the address stored at 'Return address', thereby diverting execution to the function foo() and printing the string "Welcome to my...".
As for your second question, I think the author of the video has disabled ASLR and stack cookies.
ASLR, or Address Space Layout Randomization, randomizes the location of key parts of the process so that a function or buffer sits at a different address on every new run.
A stack cookie (or canary) is a random value generated at runtime and placed between the local variables and the return address, so that any overflow has to overwrite the cookie first. The cookie is checked before the function returns; if there is a mismatch, the program aborts instead of returning, so execution is never diverted to the attacker-controlled return address.
To repeat this, you will have to disable ASLR on your system. On Ubuntu this can be done by typing the following in a terminal:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Then, you will have to compile your program without the stack cookie (the -z execstack option additionally marks the stack as executable):
gcc -fno-stack-protector -z execstack -o test test.c
For more information:
ASLR: http://en.wikipedia.org/wiki/Address_space_layout_randomization
Stack canaries: http://en.wikipedia.org/wiki/Buffer_overflow_protection#Canaries
Hope this helps.
A buffer overflow is a bug that lets a program write past the end of a buffer, potentially over its own stack memory.
This can do two things:
overwrite the saved return address (the value the instruction pointer takes when the function returns) with an arbitrary value
(over)write data on the stack that is meant to be used as function parameters.
When the current function exits, the instruction pointer is set to the address saved on the stack. If that address is that of malicious code, the code will be executed as if it were the program's.
This could allow one to execute code with, for example, root privileges, if done inside a program that runs with such privileges.

The Notesearch Exploit anomalies (Hacking: Art of Exploitation)

This question is about the exploit for the program notesearch on pg 121 of the book Hacking: Art of Exploitation 2nd Edition.
There is something I do not understand in the exploit:
When system() executes ./notesearch 'xyz....', the argument 'xyz...' overflows the string buffer in the child program, thereby overwriting the return address. That much is clear.
The assumption here is that the notesearch program's stack frame comes on top of the calling exploit's stack frame. This holds true when the compiled versions exist on the same system.
My first question is 1. Will this work even as a remote hack?
My second question is
2. Since the buffer has been used to overwrite all variables including and beyond the return address, how does the notesearch program work as intended?
Variables like "printing" etc which sit in this stackframe and decide whether messages are printed or not all seem to work fine.
Even though the calling functions sit on top of the relevant stack frame, where the string buffer being flooded sits, there are certain key variables which would have been overwritten.
Question no. 3.
Given that the string buffer is part of a new stack frame pushed after execution of notesearch starts, the buffer overwrites all the variables in that notesearch program. The buffer is also the value of the search string. By the program logic, since the search string does not match any message, the program should not output details of the user's messages. In this case, the messages appear. I want to know why.
(For reference: the book is http://www.tinker.tv/download/hacking2_sample.pdf and the code is downloadable for free from http://www.nostarch.com/hacking2.htm.)
Keep reading the book; another example is given on page 122, and then there's plenty of explanatory text that tells all about the exploits.
Here's the relevant part of notesearch's code:
int main(int argc, char *argv[]) {
    int userid, printing=1, fd; // file descriptor
    char searchstring[100];
    if(argc > 1) // If there is an arg
        strcpy(searchstring, argv[1]); // that is the search string
    else // otherwise
        searchstring[0] = 0; // search string is empty
    userid = getuid();
    fd = open(FILENAME, O_RDONLY); // open the file for read-only access
You wrote:
The assumption here is that the notesearch program's stack frame comes on top of the calling exploit's stack frame.
No, that's wrong. There is only one stack frame that's relevant here: the stack frame of the main() function in notesearch. The fact that we invoke ./notesearch xyz... via a system() call inside exploit_notesearch is irrelevant; we could just as well invoke ./notesearch xyz... directly on the bash command line, or trick some other process (such as, you know, bash) into executing it on our behalf.
Will this work even as a remote hack?
Of course.
Since the buffer has been used to overwrite all variables including and beyond the return address, how does the notesearch program work as intended?
Well, it doesn't really work as intended. Look at the output again:
reader@hacking:~/booksrc $ gcc exploit_notesearch.c
reader@hacking:~/booksrc $ ./a.out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3.2#
Giving you a shell clearly doesn't count as "working as intended". But even before that, the program claims to find notes for userid 999 in /var/notes, which might indicate that it's a little bit confused. In our role as the malicious hacker, we don't care about this garbage output from the notesearch program; all we care about is that it eventually reaches the end of main() and returns to our shellcode, giving us access to the shell.
But, if you're wondering how we managed to overwrite the return address without overwriting local variables userid, printing, and fd, there are at least three obvious possibilities:
A. Maybe those variables are allocated below searchstring on the stack.
B. Maybe those variables are allocated in registers instead of on the stack.
C. Overwhelmingly likely, those variables are being overwritten, but their initial values simply don't matter to the program. For example, userid can get any value at all, because that garbage value will immediately be overwritten with getuid() on the next line. The only variable whose initial value matters is printing. And even printing changes the behavior of the program only if it happens to get the value 0 — and it can't get the value 0, because the data we're copying in consists entirely of non-zero bytes, by design.
I think you don't really understand what a buffer overflow is. The searchstring variable is allocated 100 bytes on the stack. The program copies an arbitrarily long argument into searchstring without checking its length, so the data overflows into other parts of the stack frame of notesearch's main function. The return address is overwritten as well. That's how it works.
I think that the most important assumption here is that the notesearch stack is similar to that of exploit_notesearch. That is why he uses an exploit_notesearch local variable (unsigned int i) to calculate ret. He assumes (knowing the source code of notesearch, of course) that when both programs are loaded in memory they will have similar frame addresses (around 0xffff7..)
Of course, the two programs do not share memory; they are different processes.

Format String exploit - opening root shell

I'm running this in an environment with stack randomization disabled, and using the gcc version compatible with AlephOne's buffer overflow - that works great!
I'm trying to overwrite the instruction pointer register (eip) with the address of the array containing my shellcode. I always end up with segmentation fault though. The following is a snippet I'm trying to exploit:
//Prints version
static
void print_version(char* cmd)
{
    char txt[640+1];
    snprintf(txt, 640, "Submission program version 0.1 (%s)\n", cmd);
    printf(txt);
}
I'm calling this function through execve(). The format string is the argv[0] here, which is successfully passed to the function above.
I'm having this format string:
\x0c\xdc\xbf\xffjunk\x0d\xdc\xbf\xffjunk\x0e\xdc\xbf\xffjunk\x0f\xdc\xbf\xff%8x%8x%120x%n%5x%n%230x%n%64x%n
followed by 200 NOPs, shellcode and remaining array with the NOPs as well.
The shellcode is Aleph One's code:
static char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
Going back to the format string, I'm overwriting 4 addresses 0xffbfdc0c to 0xffbfdc0f. 0xffbfdc0c is the saved eip address I found by setting a break point on print_version() function mentioned at the top.
I'm trying to replace it with the address 0xffbfdcd4, which is 150 bytes above the base address for txt[] array in print_version (counting the initial characters of the txt[] in the function and format string after that, hoping it may land somewhere in the NOPs preceding the shellcode).
I just end up with a segmentation fault. I'm not sure how to proceed further or how I should debug to see if it's actually overwriting the value at the intended address.
EDIT: Am I using the correct format string?
Could anyone also tell me how to check the address using gdb after my program has segfaulted, or right before the segfault but after the address has changed?
Thanks.
A few things.. you state that your format string is argv[0]. Are you sure you didn't mean argv[1]? argv[0] is typically reserved for the program name, argv[1] being the first argument to the program.
You can examine the address around the write by setting one breakpoint just before the write occurs and one immediately after, and then using x/x 0xffbfdc0c to see if you're writing to the correct location. The GDB manual (available online) may be helpful to you as well.
The neat thing about format string writes is that you're not limited in where you can write, so anything is fair game (hint hint). You might also want to make sure that -D_FORTIFY_SOURCE isn't being set when you compile your program.

buffer overflow exploits - Why is the shellcode put before the return address

The code I'm referring to is found here: Link to code
I read that the buffer overflow exploit uses a buffer that looks something like this:
| NOP SLED | SHELLCODE | REPEATED RETURN ADDRESS |
From what I understand the exploit happens when the buffer is put onto the stack as a function parameter and overwrites the function's return address.
I also understand that the repeated return address points to the NOP sled in the same buffer on the stack.
What I don't understand are the following:
Why does the return address have to point to the shellcode in the same buffer? Why not have the repeated return addresses point to another part of memory where the NOP sled and the shellcode can be found?
How is the return address in the buffer perfectly aligned with the original one, so that the "ret" instruction reads the correct address and doesn't, for example, read it from the middle?
The return address doesn't have to point to code in the same buffer; it is just often easier to do it this way. If you can put the shellcode and the return address into the same buffer, that is the simplest option. If the buffer that can be overflowed is too small to fit the shellcode, it is feasible to put the shellcode into another buffer and then jump to that when the vulnerable buffer is overflowed.
Also, protections such as Data Execution Prevention (DEP, also known as NX) prevent code from being executed from the stack. In this case, techniques such as Return-Oriented Programming can be used to circumvent DEP. This technique involves chaining legitimate, executable code segments to run the code the attacker wants.
Getting the alignment right can be tricky and may require some fiddling around with the payload. Usually the start of a buffer is at an address that is word aligned. In this case, ensuring the return address is correctly aligned means making the part of the payload that precedes it a multiple of the CPU word size (4 bytes for 32-bit machines, 8 bytes for 64-bit). If the buffer is not word aligned, an attacker may just experiment by adding or removing one byte at a time until the alignment is right.
The reason it is simpler to do everything in one buffer is that not much will change between the injection of the shellcode and the jump to the newly modified return address. At the point of attack, it is very unlikely that an attacker can reference the memory of another process, so we must look at buffers within the process.
Putting shellcode into a different buffer requires the attacker to understand how long that buffer will stay in place. Do different function calls cause one of the buffers to be deallocated? Is one of the buffers on the heap instead of the stack? So long as your single buffer is large enough, it is much simpler to put your NOP sled and shellcode near the start and then fill the rest with the return address than to find one buffer to populate with shellcode and another to populate with the address of the first. Some shellcode may also reference the stack pointer, which means it needs to be set correctly.

Example of a buffer overflow leading to a security leak

I read many articles about unsafe functions like strcpy, memcpy, etc. which may lead to security problems when processing external data, like the content of a file or data coming from sockets. This may sound stupid, but I wrote a vulnerable program but I did not manage to "hack" it.
I understand the problem of buffer overflow. Take this example code:
#include <stdio.h>

int main() {
    char buffer[1];
    int var = 0;
    scanf("%s", buffer);   /* deliberately unsafe: no length limit */
    printf("var = 0x%x\n", var);
    return 0;
}
When I execute the program and type "abcde", the program outputs 0x65646362 which is "edcb" in hexadecimal + little-endian. However I read that you could modify the eip value that was pushed on the stack in order to make the program execute some unwanted code (eg. right before a call to the system() function).
However the function's assembly starts like this:
push %ebp
mov %esp, %ebp
and $0xfffffff0, %esp
sub $0x20, %esp
Since the value of %esp is random at the start of the function and because of this "and", there seems to be no reliable way to write a precise value into the pushed eip value.
Moreover, I read that it was possible to execute the code you wrote in the buffer (here the buffer is only 1 byte long, but in reality it would be large enough to store some code) but what value would you give to eip in order to do so (considering the location of the buffer is random)?
So why are developers so worried about security problems (apart from the fact that the program could crash)? Do you have an example of a vulnerable program and how to "hack" it to execute unwanted code? I tried this on Linux; is Windows less safe?
Read the excellent article by Aleph One: Smashing the Stack for Fun and Profit.
Well, for one thing, don't underestimate the hazards associated with being able to unreliably place a value inside EIP. If an exploit works one time in 16, and the service it is attacking automatically restarts, like many web applications, then an attacker who fails to get access can always try, try again.
Also in a lot of cases the value of ESP is less random than you think. For starters on a 32-bit system it is nearly always a multiple of four. That means that the extra padding offered by the and $0xfffffff0, %esp instruction will be either 0, 4, 8 or 12 bytes. That means that it is possible to just repeat the value that is to be written into the return EIP four times to cover all possible offsets to the address of return EIP.
There are actually much more aggressive stack protection / buffer overflow detection mechanisms around. However, there are ways and means around even these.
Also, for an example of where this sort of thing can be dangerous, consider what happens if the value of var is important to your logic, as in the following toy example.
int main() {
    char buffer[1];
    int var = 0;
    var = SecurityCheck();
    scanf("%s", buffer);   /* unbounded read: can overwrite var */
    if (var != 0)
        GrantAccess();
    else
        DenyAccess();
}
Further you don't have to overwrite EIP with a pointer to something in your string. For example you could overwrite it with a pointer to system() and overwrite the next word with a pointer to /bin/sh at a fixed location in the program image.
Edit: Note that system uses the PATH (actually it runs the command via a shell), so "sh" would be just as good; thus, any English word ending in "sh" at the end of a string provides the argument you need.
A classic example of an actual exploit based on buffer overruns is the Morris Worm of 1988.
As mentioned in other answers, absolute reliability is not always essential for the attack to succeed. Applications that restart automatically are one example. Locally exploitable buffer overflows in suid programs are another. And there's the NOP sled technique to increase the chances of successful exploitation: put a lot of NOPs before your shellcode so you have a much better chance of correctly guessing the "start" of your shellcode.
There are many more techniques for increasing the reliability of attacks. On Windows, back in the day, many exploits overwrote the return address with the address of a "jmp %esp" located somewhere in the program (trampoline).
"Insecure programming by example" had a nice trick for Linux. Clean your environment and put your shellcode in an environment variable. Back in the day, this would lead to a predictable address near the top of the stack.
And there are also variants like return-into-libc and return-oriented programming.
There was even an article on Phrack on how to exploit 1-byte stack overflows (meaning the buffer was overrun by only one byte) (btw, 1-byte heap overflows are also exploitable in the vast majority of cases, barring protections).
To sum up, it's not that developers are paranoid, there are lots of ways to exploit even the strangest cases, and remember:
A program is of good quality when it does what it is supposed to do.
A program is secure when it does what it is supposed to do, and nothing more.
Here's a windows version and tutorial:
http://www.codeproject.com/KB/winsdk/CodeInject.aspx
The general case I was always warned about was:
printf( string );
Because the user can provide a "%n" in there, which makes printf write the number of characters output so far to an address it takes from the stack. By placing a chosen address in the string and padding the output with junk characters and a few "%n"s, an attacker can write chosen values into memory, for example over the place where the return address is stored. Voila: execution goes wherever they like.

Resources