The Notesearch Exploit Anomalies (Hacking: The Art of Exploitation)

This question is about the exploit for the program notesearch on page 121 of Hacking: The Art of Exploitation, 2nd Edition.
There is something I do not understand in the exploit.
When the system executes ./notesearch 'xyz....', the argument 'xyz...' overflows the string buffer in the child program, thereby overwriting the return address. That much is clear.
The assumption here is that the notesearch program's stack frame sits on top of the calling exploit's stack frame. This holds true when the compiled versions exist on the same system.
My first question is:
1. Will this work even as a remote hack?
My second question is:
2. Since the buffer has been used to overwrite all variables up to and beyond the return address, how does the notesearch program still work as intended?
Variables like printing, which sit in this stack frame and decide whether messages are printed, all seem to work fine.
Even though the calling functions' frames sit on top of the relevant stack frame (the one holding the flooded string buffer), certain key variables would have been overwritten.
Question no. 3:
Given that the string buffer is part of a new stack frame pushed after notesearch starts executing, the buffer overwrites all the other variables in the notesearch program. The buffer is also the value of the search string. By the program logic, since the search string does not match any message, the program should not output details of the user's messages. Yet the messages appear. Why?

(For reference: the book is http://www.tinker.tv/download/hacking2_sample.pdf and the code is downloadable for free from http://www.nostarch.com/hacking2.htm.)
Keep reading the book; another example is given on page 122, and then there's plenty of explanatory text that tells all about the exploits.
Here's the relevant part of notesearch's code:
int main(int argc, char *argv[]) {
    int userid, printing=1, fd; // file descriptor
    char searchstring[100];

    if(argc > 1)                        // If there is an arg,
        strcpy(searchstring, argv[1]);  // that is the search string;
    else                                // otherwise,
        searchstring[0] = 0;            // search string is empty

    userid = getuid();
    fd = open(FILENAME, O_RDONLY); // open the file for read-only access
    // ... (rest of main() omitted)
You wrote:
The assumption here is that the notesearch program's stack frame comes ontop of the calling exploit's Stack frame.
No, that's wrong. There is only one stack frame that's relevant here: the stack frame of the main() function in notesearch. The fact that we invoke ./notesearch xyz... via a system() call inside exploit_notesearch is irrelevant; we could just as well invoke ./notesearch xyz... directly on the bash command line, or trick some other process (such as, you know, bash) into executing it on our behalf.
Will this work even as a remote hack?
Of course.
Since the buffer has been used to overwrite all variables including and beyond the return address, how does the notesearch program work as intended?
Well, it doesn't really work as intended. Look at the output again:
reader@hacking:~/booksrc $ gcc exploit_notesearch.c
reader@hacking:~/booksrc $ ./a.out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3.2#
Giving you a shell clearly doesn't count as "working as intended". But even before that, the program claims to find notes for userid 999 in /var/notes, which might indicate that it's a little bit confused. In our role as the malicious hacker, we don't care about this garbage output from the notesearch program; all we care about is that it eventually reaches the end of main() and returns to our shellcode, giving us access to the shell.
But, if you're wondering how we managed to overwrite the return address without overwriting local variables userid, printing, and fd, there are at least three obvious possibilities:
A. Maybe those variables are allocated below searchstring on the stack.
B. Maybe those variables are allocated in registers instead of on the stack.
C. Overwhelmingly likely, those variables are being overwritten, but their initial values simply don't matter to the program. For example, userid can get any value at all, because that garbage value will immediately be overwritten with getuid() on the next line. The only variable whose initial value matters is printing. And even printing changes the behavior of the program only if it happens to get the value 0 — and it can't get the value 0, because the data we're copying in consists entirely of non-zero bytes, by design.

I think you don't really understand what a buffer overflow is. The searchstring variable occupies 100 bytes on the stack. Now you are copying a larger chunk of data into searchstring without checking its length, so the data overflows into other parts of the stack frame of notesearch's main() function, including the return address. That's how it works.

I think the most important assumption here is that the notesearch stack is similar to that of exploit_notesearch. That is why the author uses an exploit_notesearch local variable (unsigned int i) to calculate ret. He assumes (knowing the source code of notesearch, of course) that when both programs are loaded in memory, their stack frames will be at similar addresses (around 0xffff7..).
Of course, the two programs do not share memory; they are different processes.

Related

Function call after buffer overflow

I've seen a video: https://www.youtube.com/watch?v=AXQefYKWjz4
I don't understand two things:
I can't see the function call, but it happens.
How he got the specific number that he wrote to the file.
He is trying to write a specific value (perhaps the address of a function) to some position in the stack. Why is that possible? How can I repeat this?
First of all, what happens here is that he stores the hard-coded address of the function foo() in the file that he reads into the variable x. He stored it as 134513853, which in hexadecimal is 0x80484bd; that must be the address of foo().
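The decimal-to-hex conversion can be checked by expanding the hex digits of 0x080484BD:

```latex
\begin{aligned}
\mathtt{0x080484BD} &= 8\cdot 16^{6} + 4\cdot 16^{4} + 8\cdot 16^{3} + 4\cdot 16^{2} + 11\cdot 16 + 13 \\
&= 134217728 + 262144 + 32768 + 1024 + 176 + 13 \\
&= 134513853
\end{aligned}
```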
So, in order of execution,
the program reads the address of foo() from the file and copies it into x. It then fills the buffer with copies of this address, so that the write runs past the end of the buffer and overwrites the saved return address.
For example:
If this is what the function stack looks like,
Buffer----------------->
EBP ----------------->
Return address --------> some 0x value <--- EIP
Post overflow it will look like this:
Buffer-----------------> 0x80484bd
EBP--------------------> 0x80484bd
Return Address---------> 0x80484bd <----EIP
Let's not bother with little-endian byte order for now. When main() ends, execution resumes at the address stored in the return-address slot, which diverts execution to foo() and prints the string "Welcome to my...".
As for your second question, I think the person who made the video disabled ASLR and stack cookies.
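ASLR (Address Space Layout Randomization) randomizes where key parts of the process are loaded (the stack, the heap, shared libraries and, for position-independent executables, the program itself), so the same function or buffer ends up at a different address in each new run.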
A stack cookie (canary) is a random value generated at runtime and placed between the local variables and the saved return address, so that any overflow must overwrite the cookie before it reaches the return address. The cookie is checked just before the function returns; on a mismatch the program is aborted instead of returning, so execution is never diverted to the attacker-controlled return address.
To repeat this, you will have to disable ASLR on your system. On Ubuntu this can be done by typing the following in a terminal such as Bash (writing 2 instead of 0 restores the default setting):
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Then compile your program without the stack cookie (-fno-stack-protector) and with an executable stack (-z execstack):
gcc -fno-stack-protector -z execstack -o test test.c
For more information:
ASLR: http://en.wikipedia.org/wiki/Address_space_layout_randomization
Stack canaries: http://en.wikipedia.org/wiki/Buffer_overflow_protection#Canaries
Hope this helps.
A buffer overflow is a bug that lets a program write past the end of a buffer, for example over its own stack.
This can do two things:
overwrite the saved return address (the value loaded into the instruction pointer, IP, when the function returns) with an arbitrary value
overwrite other data on the stack, such as function parameters.
When the current function exits, the instruction pointer is loaded with the address from the stack. If that address points to malicious code, that code is executed as if it were part of the program.
This could allow someone to execute code with, for example, root privileges, if the vulnerable program runs with such privileges.

Understanding Format String Exploits

I'm learning various exploits and I can't quite get my head around Format String exploits. I've got a fairly simple program set up in an environment that allows the exploit.
int woah(char *arg){
char buf[200];
snprintf(buf, sizeof buf, arg);
return 0;
}
I'm able to control the arg passed into the function, which is how the attack will happen, with the end goal of the program running my shellcode and giving me root. Making the program crash is easy: just feed it "%s%s" and it segfaults. We want to do more than that, so we feed it something like "AAAA%x%x%x%x%x%x%x". Looking at the program in gdb, we examine the buffer right after the snprintf and see:
"AAAA849541414141353934....blah blah blah"
That's good! We can see the A's, as well as the 41s (A in hex) read back off the stack. But what comes next? I get that the general idea is to overwrite the saved instruction pointer with four bytes by putting the target address at the start of the string we feed in, and then somewhere along the line have it point to our shellcode.
How would I find the address of the saved EIP (the return address) to overwrite?
When snprintf() is called, a stack frame (a region of memory) is created for it to execute its statements. Before that happens, the call instruction pushes the return address, the address of the instruction following the call, onto the stack. When the frame unwinds, that is, when the function finishes its work, execution goes back to that address so the program can continue to run. What you are trying to do is overwrite this address with the address of your shellcode. Research stack frames, ESP, EBP and EIP to get the idea.

buffer overflow exploits - Why is the shellcode put before the return address

The code I'm referring to is found here: Link to code
I read that the buffer overflow exploit uses a buffer that looks something like this:
| NOP SLED | SHELLCODE | REPEATED RETURN ADDRESS |
From what I understand the exploit happens when the buffer is put onto the stack as a function parameter and overwrites the function's return address.
I also understand that the repeated return address points to the NOP sled in the same buffer on the stack.
What I don't understand are the following:
Why does the return address have to point to the shellcode in the same buffer? Why not have the repeated return addresses point to another part of memory where the NOP sled and the shellcode can be found?
How is the return address in the buffer perfectly aligned with the original one, so that the ret instruction reads the correct address rather than reading it from the middle, for example?
The return address doesn't have to point to code in the same buffer, it is just often easier to do it this way. If you can put the shell code in and return address into the same buffer then this is the simplest. If the buffer that can be overflowed is too small to fit the shell code, it is feasible to put the shell code into another buffer and then jump to that when the vulnerable buffer is overflowed.
Also, protections such as Data Execution Prevention (DEP, also called NX) prevent code from being executed on the stack. In that case, techniques such as Return-Oriented Programming can be used to circumvent DEP. This technique chains together legitimate, executable code fragments already present in the program to run the code the attacker wants.
This can be tricky and may require some fiddling with the payload. Usually the start of a buffer is at a word-aligned address. In that case, ensuring the return address is correctly aligned means writing a buffer whose length is a multiple of the CPU word (4 bytes on 32-bit machines, 8 bytes on 64-bit). If the buffer is not word aligned, an attacker may simply experiment, adding or removing a byte at a time until the alignment is right.
The reason it is simpler to do everything in one buffer is that not much changes between injecting the shellcode and jumping to the newly modified return address. At the point of attack, it is very unlikely that an attacker can reference the memory of another process, so we must look at buffers in the same process.
Putting shellcode into a different buffer requires the attacker to understand how long that buffer will stay in place. Do different function calls cause one of the buffers to be deallocated? Is one of the buffers on the heap instead of the stack? So long as your single buffer is large enough, it is much simpler to put your NOP sled and shellcode near the start and then fill the rest with the return address, rather than finding one buffer to populate with shellcode and another to populate with the address of the first. Some shellcode may also reference the stack pointer, which means it needs to be set correctly.

Example of a buffer overflow leading to a security leak

I have read many articles about unsafe functions like strcpy, memcpy, etc., which may lead to security problems when processing external data, such as the contents of a file or data coming from sockets. This may sound stupid, but I wrote a vulnerable program and did not manage to "hack" it.
I understand the problem of buffer overflow. Take this example code:
#include <stdio.h>

int main() {
    char buffer[1];
    int var = 0;
    scanf("%s", buffer);   // no length limit: deliberately overflowable
    printf("var = 0x%x\n", var);
    return 0;
}
When I execute the program and type "abcde", it outputs 0x65646362: the bytes of "bcde" that spilled past buffer, read as a little-endian integer (hence "edcb" when the hex is read left to right). However, I read that you can modify the saved eip value pushed on the stack in order to make the program execute some unwanted code (e.g. jump to right before a call to the system() function).
However the function's assembly starts like this:
push %ebp
mov %esp, %ebp
and $0xfffffff0, %esp
sub $0x20, %esp
Since the value of %esp is random at the start of the function and because of this "and", there seems to be no reliable way to write a precise value into the pushed eip value.
Moreover, I read that it was possible to execute the code you wrote in the buffer (here the buffer is only 1 byte long, but in reality it would be large enough to store some code) but what value would you give to eip in order to do so (considering the location of the buffer is random)?
So why are developers so worried about security problems (apart from the program crashing)? Do you have an example of a vulnerable program and how to "hack" it to execute unwanted code? I tried this on Linux; is Windows less safe?
Read the excellent article by Aleph One: Smashing the Stack for Fun and Profit.
Well, for one thing, don't underestimate the hazards of being able to place a value in EIP even unreliably. If an exploit works one time in 16 and the service it attacks restarts automatically, like many web applications, then an attacker who fails to get access can simply try again.
Also in a lot of cases the value of ESP is less random than you think. For starters on a 32-bit system it is nearly always a multiple of four. That means that the extra padding offered by the and $0xfffffff0, %esp instruction will be either 0, 4, 8 or 12 bytes. That means that it is possible to just repeat the value that is to be written into the return EIP four times to cover all possible offsets to the address of return EIP.
There are actually much more aggressive stack protection / buffer overflow detection mechanisms around. However, there are ways and means around even these.
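Also, as an example of where this sort of thing can be dangerous, consider a case where the value of var matters to your logic, as in the following toy example.
#include <stdio.h>

int SecurityCheck(void);   /* hypothetical helpers, declared only for this sketch */
void GrantAccess(void);
void DenyAccess(void);

int main() {
    char buffer[1];
    int var = 0;
    var = SecurityCheck();
    scanf("%s", buffer);   // an overflow here can change var after the check
    if (var != 0)
        GrantAccess();
    else
        DenyAccess();
    return 0;
}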
Further you don't have to overwrite EIP with a pointer to something in your string. For example you could overwrite it with a pointer to system() and overwrite the next word with a pointer to /bin/sh at a fixed location in the program image.
Edit: Note that system() runs the command via a shell, which searches the PATH, so "sh" would be just as good; thus any string in the program image that ends in "sh" (for example an English word ending in "sh") provides the argument you need.
A classic example of an actual exploit based on buffer overruns is the Morris Worm of 1988.
As mentioned in other answers, absolute reliability is not always essential for the attack to succeed. Applications that restart automatically are one example; locally exploitable buffer overflows in suid programs are another. And there's the NOP sled technique to increase the chances of successful exploitation: put a lot of NOPs before your shellcode so that you have a much better chance of guessing an address that slides into the "start" of your shellcode.
There are many more techniques for increasing the reliability of attacks. On Windows, back in the day, many exploits overwrote the return address with the address of a "jmp esp" instruction located somewhere in the program (a trampoline).
"Insecure programming by example" had a nice trick for Linux. Clean your environment and put your shellcode in an environment variable. Back in the day, this would lead to a predictable address near the top of the stack.
And there are also variants like return-into-libc and return-oriented programming.
There was even an article on Phrack on how to exploit 1-byte stack overflows (meaning the buffer was overrun by only one byte) (btw, 1-byte heap overflows are also exploitable in the vast majority of cases, barring protections).
To sum up, it's not that developers are paranoid, there are lots of ways to exploit even the strangest cases, and remember:
A program is of good quality when it does what it is supposed to do.
A program is secure when it does what it is supposed to do, and nothing more.
Here's a Windows version and tutorial:
http://www.codeproject.com/KB/winsdk/CodeInject.aspx
The general case I was always warned about was:
printf( string );
Because the user can put a "%n" in there, and %n writes the number of characters printed so far to an address taken from the stack, an attacker who controls the string can write chosen values to chosen memory locations. All you need to do is find the right memory offsets, pass a few "%n"s and padding characters, and thus write the address of your code into the slot where the return address is stored. Voila: you can run any code you like.

about buffer overflow

I am new to ethical hacking, and one of the most important topics is stack overflow. Anyway, I coded a vulnerable C program which has a char name[400] declaration, and when I run the program with 401 'A's it doesn't overflow. But the book I am following says it must overflow, and logic says so too, so what's wrong?
If you've defined a buffer:
char buf[400];
And wrote 401 bytes into it, the buffer has overflowed. The rest, however, depends on the structure of your code:
How is the buffer allocated (statically, dynamically, on the stack)
What comes before and after it in memory
Your architecture's calling convention and ABI (in case of a stack buffer)
and more.
Things are more complex than they seem. To quote Wikipedia:
In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a process stores data in a buffer outside the memory the programmer set aside for it. The extra data overwrites adjacent memory, which may contain other data, including program variables and program flow control data. This may result in erratic program behavior, including memory access errors, incorrect results, program termination (a crash), or a breach of system security.
Note the multiple instances of the word may in this quote. All of this may happen, and it may not. Again, this depends on other factors.
C doesn't check for buffer overflows (writing past the end of a buffer is undefined behavior). Usually the system will just let you (and the attacker) write beyond the buffer, and that is why buffer overflows are a vulnerability.
For example if the code is
char name[400];
char secret_password[400];
...
The memory may be laid out as
[John ][12345 ]
name secret_password
Now if you write 401 A's followed by a NUL into name, the extra "A\0" will be written to secret_password, which changes the password from your luggage combination to just "A":
[AAAAAAAAA...AAAAA][A␀345 ]
name secret_password
Stack overflow and buffer overflow are different concepts.
Stack overflow:
A program's stack has a fixed maximum size that it cannot grow past at runtime. Since it is not possible to know in advance how much stack a program will need, a reasonably large block of memory is reserved. However, some programs exceed it, typically by calling a recursive function.
A function call reserves as much stack space as it needs to store its local variables and releases it when it returns. A recursive function reserves new memory each time it is entered and releases it only when it exits. If the recursion never ends because of a programming error, more and more stack memory is reserved until the stack is full.
Trying to reserve memory on a full stack causes an error: the stack overflow.
Example code:
volatile bool args = false;
int myoverflow(int i){
int a[500];
if(args)
return a[i%500];
else
return myoverflow(i+1);
}
This should overflow the stack. It will reserve 500 * sizeof(int) every time it enters the function.
Buffer overflow:
You have two variables, an array a and an array b. a can hold 4 elements and b can hold 2.
Now if you write 5 elements into a, the 5th element lands in b.
Example:
#include <iostream>

int main()
{
    int a[4];
    int b[2] = {0, 0};
    a[4] = 22;                  // 5th element: one past the end of a (undefined behavior)
    std::cout << b[0] << '\n';
}
This should print 22: it writes outside of a, into the memory used by b.
Note: None of my example functions are guaranteed to work, the compiler is free to optimize function calls and to arrange the memory used on the stack as it wants. It may even print a compile error on accessing memory out of bounds for array a.
Here's a good example in C showing how a buffer overflow can be used to execute arbitrary code. Its objective is to find an input string that will overwrite a return address causing a target function to be executed.
For a very good explanation of buffer overflows I would recommend chapter 5 of Writing Secure Code 2nd Edition.
Other good info on buffer overflows:
Secure programmer: Countering buffer overflows by David Wheeler
Smashing the Stack for Fun and Profit: the classic article from Phrack Magazine
