Overwrite value in memory by taking in user input - c

This is related to a stack smash attack.
Basically, I am trying to smash the stack by giving a program a particular input. The program takes in a user input like this, using getchar:
for (i = 0; (c = getchar()) != '\n'; i++) buf[i] = c;
I want to overwrite a value in memory so that it becomes 0x000000a1. Unfortunately, 0xa1 is not an ASCII character, so I cannot just type something like ¡ (inverted exclamation mark), because that ends up giving 0x0000a1c2 in memory. How can I overwrite the value to be just 0x000000a1 without changing how the user input is processed in the program?

You can use bash to inject arbitrary characters:
echo -e '\xA1' | /path/to/program
You can add additional input, put the echo in a loop, etc.
echo -e 'Something\xA1\xA1\xA1' | /path/to/program

Your system's details are not given, but standard input is usually just a byte stream. That means you can send an arbitrary sequence of bytes, not just valid characters.
For example, if your victim program is ./a.out, you can write a program that emits the payload
#include <stdio.h>
int main(void) {
putchar(0xa1);
putchar('\n'); /* to have the victim finish reading input */
return 0;
}
and compile to, for example, ./b.out and execute using a pipe
$ ./b.out | ./a.out
($ is your terminal's prompt)

Related

Strange behavior with read()

I discovered the function read(), but I don't understand everything.
Here is my code:
#include <unistd.h>
#include <stdio.h>
int main(void)
{
char array[10];
int ret;
printf("read : ");
fflush(stdout);
array[sizeof(array) - 1] = '\0';
ret = read(STDIN_FILENO, array, sizeof(array) - 1);
printf("array = %s\n", array);
printf("characters read = %d\n", ret);
//getchar();
return (0);
}
Here is an example of the running program :
$> ./a.out
read : hi guys how are you
array = hi guys h
characters read = 9
$> ow are you
zsh: command not found: ow
$>
Why is it launching a shell command after the end of the program?
I noticed that if I uncomment the getchar() line, this strange behavior disappears. I'd like to understand what is going on, if someone has an idea :)
Your call to read is reading in the first 9 characters of what you've typed. Anything else is left in the input buffer, so when your program exits, your shell reads it instead.
You should check the return value of read: it tells you how much was actually read, which is not guaranteed to be the amount you asked for, and it is also used to indicate an error.
The string read in won't be NUL-terminated either, so you should also use the return value (if positive) to put the NUL character in so that your string is valid.
If you want to read in the whole line, you'll need to read in a loop and detect the end-of-line character (most likely '\n').
You typed about 20 characters, but you only read 9 characters with read(). Everything after that was left in the terminal driver's input buffer. So when the shell called read() after the program exited, it got the rest of the line, and tried to execute it as a command.
To prevent this, you should keep reading until you get to the end of the line.

how to use a GDB input file for multiple input

EDIT: GDB was not the issue. Bugs in my code created the behaviour.
I am wondering how GDB's input works.
For example I created the following small c program:
#include <stdlib.h>
#include <stdio.h>
int main(){
setbuf(stdout,NULL);
printf("first:\n");
char *inp;
size_t k = 0;
getline(&inp, &k, stdin);
printf("%s",inp);
free(inp);
// read buffer overflow
printf("second:\n");
char buf[0x101];
read(fileno(stdin),buf,0x100);
printf("%s",buf);
printf("finished\n");
}
It reads a string from stdin twice and echoes each one.
To automate this, I created the following Python code:
python3 -c 'import sys,time; l1 = b"aaaa\n"; l2 = b"bbbb\n"; sys.stdout.buffer.write(l1); sys.stdout.buffer.flush(); time.sleep(1); sys.stdout.buffer.write(l2); sys.stdout.buffer.flush();'
Running the C program on its own works fine. Running the C program with the Python input works fine, too:
python-snippet-above | ./c-program
Running gdb without an input file, typing the strings when requested, also seems fine.
But when it comes to using an input file in gdb, I am afraid I am using the debugger wrongly.
From tutorials and Stack Overflow posts I know that gdb can take input from a file.
So I tried:
& python-snippet > in
& gdb ./c-program
run < in
I expected gdb to feed the first line of the file in to the first read and the second line of in to the second read.
in looks like (due to the python code):
aaaa
bbbb
But instead gdb prints:
(gdb) r < in
Starting program: /home/user/tmp/stackoverflow/test < in
first:
aaaa
second:
finished
[Inferior 1 (process 24635) exited with code 011]
Observing the variable buf after read(fileno(stdin),buf,0x100) shows me:
(gdb) print buf
$1 = 0x0
So I assume that my second input (bbbb) gets lost. How can I use multiple inputs inside gdb?
Thanks for reading :)
I am wondering how GDB's input works.
Your problem doesn't appear to have anything to do with GDB, and everything to do with bugs in your program itself.
First, if you run the program outside of GDB in the same way, namely:
./a.out < in
you should see the same behavior that you see in GDB. Here is what I see:
./a.out < in
first:
aaaa
second:
p ��finished
So what are the bugs?
The first one: from "man getline"
getline() reads an entire line from stream, storing the address
of the buffer containing the text into *lineptr.
If *lineptr is NULL, then getline() will allocate a buffer
for storing the line, which should be freed by the user program.
You did not set inp to NULL, nor to an allocated buffer. If inp didn't happen to be NULL, you would have gotten heap corruption.
Second bug: you don't check return value from read. If you did, you'd discover that it returns 0, and therefore your printf("%s",buf); prints uninitialized values (which are visible in my terminal as ��).
Third bug: you are expecting read to return the second line. But you used getline on stdin before, and when reading from a file, stdin will use full buffering. Since your input is small, the first getline tries to read BUFSIZ worth of data, and reads (buffers) all of it. A subsequent read (naturally) returns 0 since you've already reached end of file.
You have setbuf(stdout,NULL);. Did you mean to disable buffering on stdin instead?
Fourth bug: read does not NUL-terminate the string, you have to do that yourself, before you can call printf("%s", ...) on it.
With the bugs corrected, I get the expected output:
first:
aaaa
second:
bbbb
finished

Two distinct characters sharing same ASCII value in C

I am using Linux x86_64 with gcc 4.8.1.
Code:
#include <stdio.h>
int main(int argc, char *argv[])
{
int ch;
do
{
printf("ch : ");
ch = getchar(); //Q Why CTRL+M = 10 and not 13?
getchar();
printf("ch = %d\n\n", ch);
}while(ch != 'z');
return 0;
}
Output:
ch : ^N
ch = 14
ch :
ch = 10
ch :
ch = 10
ch : z
ch = 122
Question:
In the above program, when I enter Ctrl+J (the linefeed character) it prints 10, which is indeed the ASCII value of \n. But when I enter Ctrl+M (the carriage-return character) it also prints 10 instead of 13 (the ASCII value of \r).
What's going on? Do \n and \r share the same ASCII value? If so, which character represents ASCII 13?
EDIT:
$ uname -a
Linux Titanic 3.11.0-26-generic #45-Ubuntu SMP Tue Jul 15 04:02:06 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
The problem is that ICRNL is enabled in the terminal driver. Here's a snippet from the man page for tcsetattr(3), which is used from C to set terminal attributes:
ICRNL Translate carriage return to newline on input (unless IGNCR is set).
To disable ICRNL, you can run the following command before your program (or use tcsetattr() directly):
$ stty -icrnl
stty -a lets you view the current terminal settings.
Note that the above will prevent your Enter key from working normally too (since it generates a carriage return, while the terminal driver is waiting for a linefeed to terminate the line before sending it to your program). You will have to use Ctrl-J instead. :)
Below is a tangent on why Return still works in the shell with ICRNL disabled (in Bash at least), in case you're interested:
Bash uses the readline library to read commands. Before reading a command, readline puts the terminal into noncanonical mode, where input is unbuffered (can be read a character at a time, as soon as a character is typed). readline therefore sees the carriage return character as soon as it is typed, and happens to accept it as a line terminator.
Noncanonical mode is needed to implement fancy line editing like being able to move the cursor with the cursor keys and insert text in the middle of a command. Text UI libraries like ncurses also use this mode.
While your C program runs, the terminal is in canonical mode instead, where the terminal driver does line buffering (sends the input to the process a line at a time). This mode only has rudimentary line editing (e.g., erasing is supported) and does not interpret the cursor keys, which is why you get strange character sequences on the screen when you press them. (Those characters are the terminal escape sequences generated by the cursor keys, which become visible in this mode. A handy command to experiment with is a plain cat with no arguments.)
Canonical mode is enabled/disabled through ICANON, which is an option just like IGNCR. Experimenting with it from the shell might be a bit tricky though, since the shell sets and resets it as programs (like stty) are run.
I don't know the ^J keyboard shortcut, but I'm willing to bet that if you feed your code fixed characters (not read from the terminal), '\r' and '\n', you'll get the proper ASCII values. This means that either your terminal setup is wrong, as @alk said, or ^J doesn't do what you think it does...

Forcing a program to call a function in C with an input string

So I'm doing an exercise where I want to call the function void not_called() just by supplying input to a buffer. Basically, what I want to do is use a buffer overflow to call not_called(). My approach is to write an exploit string in hex and run it through a program called hex2raw (it takes hex-formatted text and converts it to the corresponding raw bytes). I'm then going to put that exploit string into a .txt file, then use a series of pipes in the Unix terminal to call not_called() like so:
cat exploit.txt | ./hex2raw | ./nameofpgrm
So what I'm struggling with is finding that binary exploit string. I think what I need to do is find the location of not_called in memory using objdump, but I'm not sure. Any help on what I can do? I know I'm going to have to use gdb to find it. I just don't really know where to look.
#include <stdlib.h>
#include <stdio.h>
void echo();
/* Main program */
int main() {
while (1)
echo();
return(0); // never called
} // main
/* My gets -- just like gets - Get a string from stdin */
char *mygets(char *dest) {
int c = getchar();
char *p = dest;
while (c != EOF && c != '\n') {
*p++ = c;
c = getchar();
}
*p = '\0';
return dest;
} // mygets
/* Echo Line */
void echo() {
char buf[4]; /* Way too small */
mygets(buf);
puts(buf);
} // echo
void not_called() {
printf("This routine is never called\n");
printf("If you see this message, something bad has happened\n");
exit(0);
} // not_called
You want to overwrite the return address of the function echo with bytes read from stdin so that it now points to the entry point of not_called.
Let's use for example Mac OS/X 10.10 aka Yosemite. I simplified the code and added an extra printf to get the actual address of the function not_called:
#include <stdlib.h>
#include <stdio.h>
void echo(void) {
char buf[4]; /* Way too small */
gets(buf);
puts(buf);
}
void not_called(void) {
printf("This routine is never called\n");
printf("If you see this message, something bad has happened\n");
exit(0);
}
int main(void) {
printf("not_called is at address %p\n", not_called);
echo();
}
Let's compile and execute this code using clang:
chqrlie> clang t20.c && ./a.out
The output is quite clear:
not_called is at address 0x106dade50
warning: this program uses gets(), which is unsafe.
Using a hex editor, let's coin the input and paste it into the console. The short buffer buf is aligned on 64 bits and sits 8 bytes below the saved copy of the stack frame pointer rbp, which is itself followed by the return address we want to overwrite. The input in hex is for example:
0000 3031 3233 3435 3637-3839 3031 3233 3435 0123456789012345
0010 50de da06 0100 0000- P��.....
Let's paste these 24 bytes to the console and hit enter:
0123456789012345P��^F^A^@^@^@
0123456789012345P��^F^A
This routine is never called
If you see this message, something bad has happened
Segmentation fault: 11
Function echo uses gets to read stdin. The 24 bytes are stored starting at buf and run past its end, overwriting the saved frame pointer rbp, the return address, and one extra byte with the terminating 0. echo then calls puts to output the string in buf. Output stops at the first '\0' as expected. rbp is then restored from the stack and gets a corrupt value, and control is transferred to the return address. The return address was overwritten with that of function not_called, so that's what gets executed next. Indeed we see the message from function not_called, and for some reason exit crashes instead of exiting the process gracefully.
I used gets on purpose so readers understand how easy it is to cause buffer overflows with this function. No matter how big the buffer, input can be coined to crash the program or make it do interesting things.
Another interesting find is how Mac OS/X tries to prevent attackers from using this trick too easily: the address printed by the program varies from one execution to the next:
chqrlie > ./a.out < /dev/null
not_called is at address 0x101db8e50
warning: this program uses gets(), which is unsafe.
chqrlie > ./a.out < /dev/null
not_called is at address 0x10af4ae50
warning: this program uses gets(), which is unsafe.
chqrlie > ./a.out < /dev/null
not_called is at address 0x102a46e50
warning: this program uses gets(), which is unsafe.
The code is loaded at a different address each time, chosen randomly.
The input required to make function echo return to not_called is different each time. Try your own OS and check if it uses this trick. Try coining the appropriate input to get the job done (it depends on your compiler and your system). Have fun!

EOF in Windows command prompt doesn't terminate input stream

Code:
#include <stdio.h>
#define NEWLINE '\n'
#define SPACE ' '
int main(void)
{
int ch;
int count = 0;
while((ch = getchar()) != EOF)
{
if(ch != NEWLINE && ch != SPACE)
count++;
}
printf("There are %d characters input\n" , count);
return 0;
}
Question:
Everything works just fine: the program ignores spaces and newlines and prints the number of characters entered when I send the simulated EOF, which is ^z. (In this program I count commas, exclamation marks, digits, and any printable special symbol such as an ampersand as characters too.)
But something goes wrong when I enter a line like abcdefg^z, that is, some characters followed by ^z on the same line. Instead of terminating and printing the total number of characters, the program keeps asking for input.
The EOF only works when I enter ^z on a line by itself or at the start of a line, like this: ^zabvcjdjsjsj. Why is this happening?
This is true in almost every terminal driver. You'll get the same behavior using Linux.
Your program isn't actually executing the loop until you enter \n or ^z at the end of a line. The terminal driver buffers the input and does not send it to your process until that occurs.
At the end of a line, hitting ^z (or ^d on Linux) does not cause the terminal driver to send EOF. It only makes it flush the buffer to your process (with no \n).
Hitting ^z (or ^d on Linux) at the start of a line is interpreted by the terminal as "I want to signal EOF".
You can observe this behavior if you add the following inside your loop:
printf("%d\n",ch);
Run your program:
$ ./test
abc <- type "abc" and hit "enter"
97
98
99
10
abc97 <- type "abc" and hit "^z"
98
99
To better understand this, you have to realize that EOF is not a character. ^z is a user command for the terminal itself. Because the terminal is responsible for taking user input and passing it to processes, this gets tricky and thus the confusion.
A way to see this is by hitting ^v then hitting ^z as input to your program.
^v is another terminal command that tells the terminal, "Hey, the next thing I type - don't interpret that as a terminal command; pass it to the process' input instead".
^Z is only translated by the console to an EOF signal to the program when it is typed at the start of a line. That's just the way that the Windows console works. There is no "workaround" to this behaviour that I know of.

Resources