format string vulnerability - printf - c

Why does this print the value of the memory address at 0x08480110? I'm not sure why there are 5 %08x arguments - where does that take you up the stack?
address = 0x08480110
address (encoded as 32 bit le string): "\x10\x01\x48\x08"
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
This example is taken from page 11 of this paper http://crypto.stanford.edu/cs155/papers/formatstring-1.2.pdf

I think that the paper provides its printf() examples in a somewhat confusing way because the examples use string literals for format strings, and those don't generally permit the type of vulnerability being described. The format string vulnerability as described here depends on the format string being provided by user input.
So the example:
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
Might better be presented as:
/*
* in a real program, some user input source would be copied
* into the `outstring` buffer
*/
char outstring[80] = "\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|";
printf(outstring);
Since the outstring array is an automatic, the compiler will likely put it on the stack. After copying the user input to the outstring array, it'll look like the following as 'words' on the stack (assuming little endian):
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from "\x10\x01\x48\x08"
The compiler will put other items on the stack as it sees fit (other local variables, saved registers, whatever).
When the printf() call is about to be made, the stack might look like:
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from "\x10\x01\x48\x08"
var1
var2
var3
saved ECX
saved EDI
Note that I'm completely making those entries up - each compiler will use the stack in different ways (so a format string vulnerability has to be custom crafted for a particular, exact scenario). In other words, you won't always use 5 dummy format specifiers like in this example - as the attacker you'd need to figure out how many dummies the particular vulnerability would need.
Now to call printf(), the argument (the address of outstring) is pushed on to the stack and printf() is called, so the argument area of the stack looks like:
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from "\x10\x01\x48\x08"
var1
var2
var3
saved ECX
saved EDI
&outstring // the one real argument to `printf()`
However, printf doesn't really know anything about how many arguments have been placed on the stack for it - it goes by the format specifiers it finds in the format string (the one argument it's 'sure' to get). So printf() gets the format string argument and starts processing it. The 1st "%08x" will consume the 'saved EDI' in my example, the next "%08x" will print the 'saved ECX', and so on. So the "%08x" format specifiers are just eating up data on the stack until printf() gets back to the string the attacker was able to input. Determining how many of those are needed is something an attacker would do by a kind of trial and error (probably by a test run that has a whole slew of "%08x" formats until he can 'see' where the format string starts).
Anyway, when printf() gets to processing the "%s" format specifier, it has consumed all the stack entries up to where the outstring buffer resides. The "%s" specifier treats its stack entry as a pointer, and the string that the user has put into that buffer has been carefully crafted to have a binary representation of 0x08480110, so printf() will print out whatever is at that address as an ASCIIZ string.

You have 6 format specifiers (5 lots of %08x and one of %s), but you do not provide values for those format specifiers. You immediately fall into the realm of undefined behaviour - anything could happen and there is no wrong answer.
However, in the normal course of events, the values passed to printf() would have been stored on the stack, so the code in printf() reads values off the stack as if the extra values had been passed. The function return address is on the stack, too. There is no guarantee that I can see that the value 0x08480110 will actually be produced. This sort of attack very much depends on the specific program and faulty function call, and you might well get a very different value. The example code is most likely written assuming a 32-bit Intel (little-endian) CPU - rather than a 64-bit or big-endian CPU.
Adapting the code fragment, compiling it into a complete program, ignoring the compilation warnings, using a 32-bit compilation on MacOS X 10.6.7 with GCC 4.2.1 (XCode 3), the following code:
#include <stdio.h>
static void somefunc(void)
{
printf("AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.|%s|\n");
}
int main(void)
{
char buffer[160] =
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz01234";
somefunc();
return 0;
}
produces the following result:
AAAAAAAAAAAAAAAA.0x000000A0.0xBFFFF11C.0x00001EC4.0x00000000.0x00001E22.0xBFFFF1C8.0x00001E5A.|abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz01234|
As you can see, I eventually 'found' the string in the main program from the printf() statement. When I compiled it in 64-bit mode, I got a core dump instead. Both results are perfectly correct; the program invokes undefined behaviour, so anything the program does is valid. If you're curious, search for 'nasal demons' for more information on undefined behaviour.
And get used to experimenting with these sorts of issues.
Another variation
#include <stdio.h>
static void somefunc(void)
{
char format[] =
"AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n";
printf(format, 1);
}
int main(void)
{
char buffer[160] =
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz01234";
somefunc();
return 0;
}
This produces:
AAAAAAAAAAAAAAAA.0x00000001.0x00000099.0x8FE467B4.0x41000024.0x41414141
.0x41414141.0x41414141.0x2E414141.0x30257830.0x302E5838.0x38302578.0x78302E58
.0x58383025.0x2578302E.0x2E583830.0x30257830.0x2E0A5838.0x30257830.0x302E5838
You might recognize the format string in the hex output - 0x41 is capital A, for example.
The 64-bit output from that code is both similar and different:
AAAAAAAAAAAAAAAA.0x00000001.0x00000000.0x00000000.0xFFE0082C.0x00000000
.0x41414141.0x41414141.0x2578302E.0x30257830.0x38302578.0x58383025.0x0A583830
.0x2E583830.0x302E5838.0x78302E58.0x2578302E.0x30257830.0x38302578.0x38302578

You misunderstood the paper.
The text you linked is assuming that the current position on the stack is 0x08480110 (look at the surrounding text). The printf() will dump data from wherever on the stack you happen to be.
The \x10\x01\x48\x08 at the beginning of the format string is merely to print the (assumed) address to stdout in front of the dumped data. In no way do these numbers modify the address from which the data is dumped.

You're correct about "take you up the stack", but only barely; it relies on the assumption that arguments are passed on the stack, rather than in registers. (Which, for a variadic function is probably a safe assumption, but still an assumption about implementation details.)
Each %08x asks for the 'next unsigned int argument' to be printed in hex; what actually occurs in that 'next argument' location is both architecture and compiler dependent. If you compare the values you get with /proc/self/maps for the process, you might be able to narrow down what some of the numbers mean.

Related

Exploits in use of Externally-Controlled Format String

First, please look through this website (https://cwe.mitre.org/data/definitions/134.html). Some of the code there has a vulnerability, but I don't really understand which code they are saying is vulnerable.
It has 3 code snippets with a vulnerability, namely PrintWrapper, snprintf and %1$d, presented on this website.
@CassieJade you need to look at the documentation of these functions online.
printf and snprintf are pretty common functions. And by the way, this platform is not for school assignments. You are most welcome if you have tried something and want to follow on from there.
http://www.cplusplus.com/reference/cstdio/printf/
http://www.cplusplus.com/reference/cstdio/snprintf/
The following explains beautifully about your concern of $.
(GCC) Dollar sign in printf format string
Notation %2$d means the same as %d (output signed integer), except it formats the parameter with given 1-based number (in your case it's a second parameter, b).
int a = 3, b = 2;
printf("%2$d %1$d", a, b);
Here you would expect 3 2 to be printed, but it will print 2 3, because the parameter a becomes param#1, and b becomes param#2, and %2$d is printed first so 2 is printed first followed by %1$d which is 3
You may want to look at the man page of printf; it's a bit complex for newbies, but it's the final source of truth.
The following is your print wrapper.
char buf[5012];
memcpy(buf, argv[1], 5012);
printWrapper(argv[1]);
return (0);
Your website says: When an attacker can modify an
externally-controlled format string, this can lead to buffer
overflows, denial of service, or data representation problems.
Now, if argv[1] can be provided by someone who is not trusted, they can pass any junk argument, and it will go straight to printf. The goal of your task is to never feed printf() a format string that is externally controlled.
E.g. argv[1] can be a very large string (the maximum allowed).
Or, for example, if I am the one invoking your program and I pass argv[1] as "%d Hello World", your printWrapper will end up printing some junk like "-446798072 Hello World", because no integer is passed as an argument in printf(argv[1]).
Also, memcpy reads a fixed number of bytes (5012) from argv[1], which may hold a shorter string; in that case it is an invalid read (a read past the bounds of the source).
snprintf(buf,128,argv[1]);
The exploit here is clear: argv[1] can contain conversion specifiers. %n stores the number of characters written so far into the int that the corresponding argument points to; since no such argument was passed, the write goes through whatever value happens to sit in that argument slot. With %X in argv[1], an attacker can read values off the stack, including addresses of variables, which can be exploited further. All of this is possible because an external, untrusted source supplies the format string used by your printf, snprintf, or sprintf call.
For example, suppose the attacker passes "%200d" in argv[1]. snprintf(buf, 128, argv[1]);
will format a junk integer right-aligned in a field 200 characters wide, which was not intended at all. Because snprintf is a bounded function, only the first 127 characters (all padding spaces) plus the terminating null are written to buf.
I hope it is clear now.

I tried this piece of code in every possible way, but I can't find out why?

Here is the program in C and its output
#include <stdio.h>
#include <conio.h>
void main()
{
int i, t[4], s[4];
for(i=0;i<=3;i++)
{
printf("\n%d",&s[i]);
printf(" %d",&t[i]);
}
for(i=0;i<=3;i++)
{
printf("\n%d %d",&s[i],&t[i]);
}
}
output:
8600 8608
8602 8610
8604 8612
8606 8614
8600 8641
8602 8641
8604 8641
8606 8641
I want to know what exactly happened in second for loop statement that making different from first for loop.
The only obvious problem in your program is that you are passing pointer arguments corresponding to printf's %d format. This is undefined behavior. It can happen to work for some compilation platforms, but you shouldn't count on it.
The most likely explanation is that the ABI for passing pointer arguments to a variadic function such as printf is, on your platform, different from the ABI for passing int arguments. For all we know, on your platform, pointers are not even the same width as int.
Use the %p format to print a pointer. Or better, use printf("%p", (void*)…);, which is even more portable, in case not all pointer types have the same representation.
The problem is that you are using the wrong format code for printing a pointer. As @PascalCuoq says, you should use %p, not %d.
The reason is that pointers and integers are clearly not the same size, on your system.
When you pass the two pointers to different printf calls %d will print the first part of the pointer value.
When you pass the two pointers to the same printf call, getting the lengths wrong will mean that it will print two different values that do not line up with either pointer.
Your printf statements are printing an integer, but you're passing a pointer (&t[i] means the address of the i-th element of the t array).
An integer and a pointer are not necessarily the same number of bytes, and most implementations of printf take a fixed number of bytes from the stack for each % field. Also, the endianness of the machine determines whether the least or most significant bits of the address are used as an integer when printf takes its field data from the stack. It looks like you are running on a 16 bit machine with 24 bit addresses and LSB ordering - some kind of micro-controller, I'd guess.
Your arrays are at these memory addresses (converted to hex from your output):
s : 0xC12198
t : 0xC121A0
(24 bit addresses, I think.)
The first loop handles each array separately in different printf statements, hence you can see the least significant bits of each array's address incrementing with each iteration.
The second loop tries to handle both arrays in one printf. So you get the incrementing part of one of the addresses, followed by the most significant part of that same address, which does not increment; the second array's address is not output at all.

Why output length is coming 6?

I have written a simple program to calculate length of string in this way.
I know that there are other ways too. But I just want to know why this program is giving this output.
#include <stdio.h>
int main()
{
char str[1];
printf( "%d", printf("%s", gets(str)));
return 0;
}
OUTPUT :
(null)6
Unless you always pass empty strings from the standard input, you are invoking undefined behavior, so the output could be pretty much anything, and it could crash as well. str cannot be a well-formed C string of more than zero characters.
char str[1] allocates storage room for one single character, but that character needs to be the NUL character to satisfy C string constraints. You need to create a character array large enough to hold the string that you're writing with gets.
"(null)6" as the output could mean that gets returned NULL because it failed for some reason or that the stack was corrupted in such a way that the return value was overwritten with zeroes (per the undefined behavior explanation). 6 following "(null)" is expected, as the return value of printf is the number of characters that were printed, and "(null)" is six characters long.
There are several issues with your program.
First off, you're defining a char buffer that is far too short: a 1-char buffer can hold only one string, the empty one. This is because you need a null at the end of the string to terminate it.
Next, you're using the gets function, which is very unsafe (as your compiler almost certainly warned you), as it just blindly takes input and copies it into a buffer. As your buffer is 0+terminator characters long, you're going to be writing past the end of your string into other areas of memory which could, and probably do, contain important information, such as the saved return address of the function. This is the classic method of smashing the stack.
Third, you're passing the output of a printf function to another printf. printf isn't designed for formatting strings and returning them; there are other functions for that. Generally the one you will want to use is sprintf, passing it a destination string.
Please read the documentation on this sort of thing, and if you're unsure about any specific thing read up on it before just trying to program it in. You seem confused on the basic usage of many important C functions.
It invokes undefined behavior. In this case you may get anything. str needs to be at least 2 bytes if you are not passing an empty string.
When you declare a variable, some space is reserved to store its value. The reserved space may have been used previously by some other code and still hold its values. When a variable goes out of scope or is freed, the value is not erased (or it may be; anything goes); only the program's access to that variable is revoked. When you read from an uninitialised location you can get anything. This is undefined behaviour, and that is what you are doing.
Output on gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 is 0
In the above program the inner printf prints "(null)", so you get "(null)6". Here "6" is the return value of the inner printf (the number of characters successfully printed).

Format String Vulnerability troubles

So I have this function:
void print_usage(char* arg)
{
char buffer[640];
sprintf(buffer, "Usage: %s [options]\n"
"Randomly generates a password, optionally writes it to /etc/shadow\n"
"\n"
"Options:\n"
"-s, --salt <salt> Specify custom salt, default is random\n"
"-e, --seed [file] Specify custom seed from file, default is from stdin\n"
"-t, --type <type> Specify different encryption method\n"
"-v, --version Show version\n"
"-h, --help Show this usage message\n"
"\n"
"Encryption types:\n"
" 0 - DES (default)\n"
" 1 - MD5\n"
" 2 - Blowfish\n"
" 3 - SHA-256\n"
" 4 - SHA-512\n", arg);
printf(buffer);
}
I wish to utilize a format string vulnerability attack (my assignment). Here is my attempt:
I have an exploit program which fills a buffer with noops and shell code (I have used this program to buffer overflow the same function, so I know its good). Now, I did an object dump of the file to find the .dtors_list address and I got 0x0804a20c, adding 4 bytes to get the end I get 0x804a210.
Next I used gdb to find at what address my noops begin while running my program. Using this I got 0xffbfdbb8.
So up to this point I feel like I'm correct, now I know I want to use format string to copy the noop address into my .dtors_end address. Here is the string I came up with (this is the string I'm providing as user input to the function):
"\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08%%.168u%%1$n%%.51u%%2$n%%.228u%%3$n%%.64u%%4$n"
This doesn't work for me. The program runs normally and the %s is replaced with the string I input (minus the little endian memory address at the front, and the two percent signs are now one percent sign for some reason).
Anyways, I'm kind of stumped here, any help would be appreciated.
Disclaimer: I'm no expert.
You're passing "\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08%%.168u%%1$n%%.51u%%2$n%%.228u%%3$n%%.64u%%4$n" as the value of arg? That means that buffer will contain
"Usage:\x20\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08%.168u%1$n%.51u%2$n%.228u%3$n%.64u%4$n [options]\x0aRandomly..."
Now let's further assume that you're on an x86-32 target (if you're on x86-64, this won't work), and that you're compiling with an optimization level that doesn't put anything in print_usage's stack frame except for the 640-byte buffer array.
Then printf(buffer) will do the following things, in order:
Push the 4-byte address &buffer.
Push a 4-byte return address.
Invoke printf...
Print out "Usage:\x20\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08" (a sequence of 23 bytes).
%.168u: Interpret the next argument to printf as an unsigned int and print it in a field of width 168. Since printf has no next argument, this is actually going to print the next thing on the stack; that is, the first four bytes of buffer; that is, "Usag" (0x67617355).
%1$n: Interpret the second argument to printf as a pointer to int and store 23+168 at that location. This stores 0x000000bf in location 0x67617355. So this is your main problem: You should have used %2$n instead of %1$n and added one junk byte to the front of your arg. (Incidentally, notice that GNU says "If any of the formats has a specification for the parameter position all of them in the format string shall have one. Otherwise the behavior is undefined." So you should go through and add 1$s to all your %us just to be on the safe side.)
%.51u: Print another 51 bytes of garbage.
%2$n: Interpret the third argument to printf as a pointer to int and store 0x000000f2 in that garbage location. As above, this should have been %3$n.
... etc. etc. ...
So, your major bug here is that you forgot to account for the "Usage: " prefix.
I assume you were trying to store the four bytes 0xffbfdbb8 into address 0x804a210. Let's say you'd gotten that to work. But then what would your next step be? How do you get the program to treat the four-byte quantity at 0x804a210 as a function pointer and jump through it?
The traditional way to exploit this code would be to exploit the buffer overflow in sprintf, rather than the more complicated "%n" vulnerability in printf. You just need to make your arg roughly 640 characters long and make sure that the 4 bytes of it that correspond to print_usage's return address contain the address of your NOP sled.
Even that part is tricky, though. You might conceivably be running into something related to ASLR: just because your sled exists at address 0xffbfdbb8 in one run doesn't mean it'll exist at that same address in the next run.
Does this help?

What is the difference between printf() and puts() in C?

I know you can print with printf() and puts(). I can also see that printf() allows you to interpolate variables and do formatting.
Is puts() merely a primitive version of printf(). Should it be used for every possible printf() without string interpolation?
puts is simpler than printf but be aware that the former automatically appends a newline. If that's not what you want, you can fputs your string to stdout or use printf.
(This is pointed out in a comment by Zan Lynx, but I think it deserves an answer - given that the accepted answer doesn't mention it).
The essential difference between puts(mystr); and printf(mystr); is that in the latter the argument is interpreted as a format string. The result will often be the same (except for the added newline) if the string doesn't contain any conversion specifications (%), but if you cannot rely on that (if mystr is a variable instead of a literal), you should not use it.
So, it's generally dangerous - and conceptually wrong - to pass a dynamic string as single argument of printf:
char * myMessage;
// ... myMessage gets assigned at runtime, unpredictable content
printf(myMessage); // <--- WRONG! (what if myMessage contains a '%' char?)
puts(myMessage); // ok
printf("%s\n",myMessage); // ok, equivalent to the previous, perhaps less efficient
The same applies to fputs vs fprintf (but fputs doesn't add the newline).
Besides formatting, puts returns a nonnegative integer if successful or EOF if unsuccessful; while printf returns the number of characters printed (not including the trailing null).
In simple cases (a format string with no conversions that ends in a newline), the compiler converts calls to printf() to calls to puts().
For example, the following code will be compiled to the assembly code I show next.
#include <stdio.h>
int main(void) {
printf("Hello world!\n");
return 0;
}
push rbp
mov rbp,rsp
mov edi,str.Helloworld!
call dword imp.puts
mov eax,0x0
pop rbp
ret
In this example, I used GCC version 4.7.2 and compiled the source with gcc -o hello hello.c.
In my experience, printf() hauls in more code than puts() regardless of the format string.
If I don't need the formatting, I don't use printf. However, fwrite to stdout works a lot faster than puts.
static const char my_text[] = "Using fwrite.\n";
fwrite(my_text, 1, sizeof(my_text) - 1, stdout);
Note: per comments, an earlier version subtracted sizeof('\0'), but in C '\0' is an integer constant, so that expression is sizeof(int) rather than 1. Subtracting sizeof(char), which is 1, drops just the terminating null.
int puts(const char *s);
puts() writes the string s and a trailing newline to stdout.
int printf(const char *format, ...);
The function printf() writes output to stdout, under the control of a format string that specifies how subsequent arguments are converted for output.
I'll use this opportunity to ask you to read the documentation.
Right, printf could be thought of as a more powerful version of puts. printf provides the ability to format variables for output using format specifiers such as %s, %d, %lf, etc...
The printf() function is used to print both strings and variables to the screen, while the puts() function only lets you print a string.
puts is the simple choice and adds a newline at the end, and printf writes the output from a formatted string.
See the documentation for puts
and for printf.
I would recommend using only printf, as this is more consistent than switching methods; i.e. if you are debugging, it is less painful to search for all printfs than for both puts and printf. Most times you want to output a variable in your printouts as well, so puts is mostly used in example code.
When comparing puts() and printf(), even though their memory consumption is almost the same, puts() takes more time compared to printf().

Resources