Exploits in use of Externally-Controlled Format String

First, please look through this page (https://cwe.mitre.org/data/definitions/134.html). It shows some code that contains a vulnerability, but I don't really understand which part of the code is the vulnerable one.
The page presents three vulnerable code snippets: printWrapper, snprintf, and %1$d.

@CassieJade, you need to look at the documentation of these functions online.
printf and snprintf are pretty common functions. By the way, this platform is not for school assignments; you are most welcome if you have tried something and want to continue from there.
http://www.cplusplus.com/reference/cstdio/printf/
http://www.cplusplus.com/reference/cstdio/snprintf/
The following explains the $ notation you asked about very well.
(GCC) Dollar sign in printf format string
The notation %2$d means the same as %d (output a signed integer), except that it formats the parameter with the given 1-based number (in your case it's the second parameter, b).
int a = 3, b = 2;
printf("%2$d %1$d", a, b);
Here you might expect 3 2 to be printed, but it prints 2 3: a is parameter 1 and b is parameter 2, and %2$d comes first, so 2 is printed first, followed by %1$d, which is 3.
You may want to look at the man page for printf; it's a bit complex for newbies, but it's the final source of truth.
The following is your print wrapper.
char buf[5012];
memcpy(buf, argv[1], 5012);
printWrapper(argv[1]);
return (0);
Your website says: When an attacker can modify an
externally-controlled format string, this can lead to buffer
overflows, denial of service, or data representation problems.
Now, if argv[1] can be provided by someone who is not trusted, they can supply any junk argument, and it goes straight to printf. The goal of your task is to never pass printf() a format string that is externally controlled.
E.g. argv[1] can be a very large string (up to the maximum allowed).
Or, for example, if I invoke your program and pass argv[1] as "%d Hello World", your printWrapper will end up printing some junk like "-446798072 Hello World", because no integer is passed as an argument in printf(argv[1]).
Also, memcpy always reads a fixed 5012 bytes from argv[1], which may hold a much shorter string; in that case it is an invalid read (a read past the end of the source).
snprintf(buf,128,argv[1]);
The exploit here is very clear: argv[1] can contain conversion specifiers such as %n, which stores the number of characters written so far into the address taken from the corresponding argument, giving the attacker a write to memory that was never intended. By using %x in argv[1], an attacker can print values from the stack, including addresses of variables, which can be exploited further. All of this is possible because an external, untrusted source supplies the format string that is used by your printf, snprintf, or sprintf call.
For example, suppose the attacker passes "%200d" in argv[1]. snprintf(buf, 128, argv[1]);
will format a junk integer right-aligned in a field 200 characters wide, which is probably not intended at all. Since snprintf is a bounded function, it writes at most 127 of those characters plus the terminating null, so buf ends up holding only the padding spaces.
I hope it is clear now.
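To make the fix concrete, here is a minimal sketch of how both problems above (the attacker-controlled format string and the fixed-size memcpy) are usually avoided. The helper names print_untrusted and copy_untrusted are made up for this illustration and do not come from the CWE page.

```c
#include <stdio.h>

/* Hypothetical safe counterpart of printf(user):
 * the user text is written as data and never parsed as a format. */
int print_untrusted(FILE *out, const char *user)
{
    return fputs(user, out);   /* equivalent: fprintf(out, "%s", user) */
}

/* Hypothetical safe counterpart of memcpy(buf, user, sizeof buf) and of
 * snprintf(buf, n, user): the format is the constant "%s", so reading stops
 * at the end of user and writing stops at the end of dst.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_untrusted(char *dst, size_t n, const char *user)
{
    int needed = snprintf(dst, n, "%s", user);
    return needed >= 0 && (size_t)needed < n;
}
```

With these, an argument such as "%d %n" is simply copied or printed character for character, because it is only ever passed as an argument to a fixed format.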

Related

Why does C's printf format string have both %c and %s?

I know that %c represents a single character and %s represents a null-terminated string of characters, but wouldn't the string representation alone be enough?

Probably to distinguish between a null-terminated string and a single character. If there were only %s, then every single character would also have to be null terminated.
char c = 'a';
In the above case, c would have to be null terminated. This is my assumption though :)

%s prints out chars until it reaches a 0 (or '\0', same thing).
If you just have a char x;, printing it with printf("%s", &x); (you'd have to provide the address, since %s expects a char*) would yield unexpected results, as the byte at &x + 1 might not be 0.
So you couldn't just print a single character unless it was null-terminated (very inefficient).
EDIT: As others have pointed out, the two expect different things in the varargs parameters: one a pointer, the other a single char. But that difference is somewhat clear.

The issue mentioned by others, that a single character would have to be null terminated, isn't a real one. It could be dealt with by giving the format a precision: %.1s would do the trick.
What is more important, in my view, is that for %s in any of its forms you have to provide a pointer to one or several characters. That means you couldn't print rvalues (computed expressions, function returns, etc.) or register variables.
Edit: I am really pissed off by the reaction to this answer, so I will probably delete it; this is really not worth it. It seems that people react to this without even having read the question or knowing how to appreciate its technicality.
To make that clear: I am not saying that you should prefer %.1s over %c. I am only saying that the reasons why %c cannot be replaced by it are different from what the other answers claim. Those answers are just technically wrong. Null termination is not an issue with %s.

The printf function is a variadic function, meaning that it takes a variable number of arguments. On a typical stack-based calling convention, the arguments are pushed on the stack before the function (printf) is called. In order for printf to use them, it needs to know what is there; the format string is used for that purpose.
e.g.
printf( "%c", ch ); tells the function that the argument 'ch' is to be interpreted as a character (it is passed as an int after the default argument promotions),
whereas
printf( "%s", s ); tells the function that the argument 's' is a pointer to a null-terminated string, of size sizeof(char*).
Inside printf it is not otherwise possible to determine what the arguments are, e.g. to distinguish between 'ch' and 's', because C has no type checking at runtime.
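The point about the format string driving the argument types can be sketched with a toy formatter. This is not how any real printf is implemented; mini_format is a hypothetical function for this illustration that understands only %c, %s and %%, and it shows that the conversion letter alone decides which type is fetched with va_arg.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical mini formatter: understands only %c, %s and %%.
 * Stores at most n-1 characters plus a terminating null in out and
 * returns the number of characters stored. Unknown conversions are skipped. */
size_t mini_format(char *out, size_t n, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    if (n == 0)
        return 0;

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            if (len + 1 < n)
                out[len++] = *fmt;
            continue;
        }
        fmt++;                        /* look at the conversion letter */
        if (*fmt == '\0') {
            break;                    /* lone '%' at the end of the format */
        } else if (*fmt == 'c') {
            /* A char argument arrives promoted to int. */
            int ch = va_arg(ap, int);
            if (len + 1 < n)
                out[len++] = (char)ch;
        } else if (*fmt == 's') {
            /* The very same slot is read as a pointer instead. */
            const char *s = va_arg(ap, const char *);
            while (*s != '\0' && len + 1 < n)
                out[len++] = *s++;
        } else if (*fmt == '%') {
            if (len + 1 < n)
                out[len++] = '%';
        }
    }
    va_end(ap);
    out[len] = '\0';
    return len;
}
```

For example, mini_format(buf, sizeof buf, "%c-%s", 'x', "yz") stores "x-yz". Passing a char where the format says %s would make the function dereference a small integer as a pointer, which is exactly the mismatch described above.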

%s says print all the characters until you find a null (treat the variable as a pointer).
%c says print just one character (treat the variable as a character code)
Using %s for a character doesn't work because the character is going to be treated like a pointer; printf will then try to print all the characters following that place in memory until it finds a null.
Stealing from the other answers to explain it in a different way:
If you wanted to print a character using %s, you could use the following to properly pass the address of a char and keep printf from writing garbage to the screen until it finds a null.
char c = 'c';
printf("%.1s", &c);

For %s, we need to provide the address of the string, not its value.
For %c, we provide the value of characters.
If we used the %s instead of %c, how would we provide a '\0' after the characters?

I'd like to add another perspective to this fun question.
Really this comes down to data typing. I have seen answers on here stating that you could provide a pointer to the char together with a "%.1s" format.
This could indeed work. But the answer lies in the C designers trying to provide flexibility to the programmer, and indeed an (albeit small) way of decreasing the footprint of your application.
Sometimes a programmer might like to run a series of if-else statements or a switch-case where the need is simply to output a character based on the state. For this, hard-coding the characters could indeed take less space in memory, as a single character is 8 bits versus a pointer, which is 32 or 64 bits (on 64-bit computers).
If you would like to decrease the size by using actual chars instead of pointers to chars, then the question is how to do this within printf-style functions. One idea would be to key off the .1s, but how is the routine supposed to know for certain that you are truly providing a char rather than a pointer to a char or a pointer to a string (array of chars)? This is why they went with "%c", as it is different.
Fun Question :-)

C has the %c and %s format specifiers because they handle different types.
A char and a string are about as different as night and 1.

%c expects a char, which is an integer value and prints it according to encoding rules.
%s expects a pointer to a location of memory that contains char values, and prints the characters in that location according to encoding rules until it finds a 0 (null) character.
So you see, under the hood the two cases look alike but do not have much in common, as one works with values and the other with pointers. One is an instruction for interpreting a specific integer value as an ASCII char, and the other iterates over the contents of a memory location char by char, interpreting them until a zero value is encountered.

I ran an experiment comparing printf("%.1s", &c) with printf("%c", c).
I used the code below to test, and bash's time utility to get the running time.
#include <stdio.h>
int main(void)
{
    char c = 'a';
    int i;
    for (i = 0; i < 40000000; i++) {
        /* Uncomment one line at a time: */
        /* printf("%.1s", &c);    gives a result of 4.3s  */
        /* printf("%c", c);       gives a result of 0.67s */
    }
    return 0;
}
The result says that using %c is about 6 times faster than %.1s (4.3s versus 0.67s). So, although %s can do the job of %c, %c is still needed for performance.

Since no one has provided an answer with ANY reference whatsoever, here is the printf specification from pubs.opengroup.org, which is similar to the format definition from IBM:
%c
The int argument shall be converted to an unsigned char, and the resulting byte shall be written.
%s
The argument shall be a pointer to an array of char. Bytes from the array shall be written up to (but not including) any terminating null byte. If the precision is specified, no more than that many bytes shall be written. If the precision is not specified or is greater than the size of the array, the application shall ensure that the array contains a null byte.
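The precision rule quoted above is what makes "%.1s" (and more generally "%.*s") usable on char arrays that have no terminating null. A small sketch, where format_bytes is a hypothetical helper written for this illustration:

```c
#include <stdio.h>

/* Hypothetical helper: format exactly len bytes of a possibly
 * unterminated char array into out (capacity n), wrapped in brackets.
 * Returns what snprintf returns: the length the full output needs. */
int format_bytes(char *out, size_t n, const char *data, int len)
{
    /* "%.*s" takes the precision from the int argument that precedes
     * the pointer, so no more than len bytes of data are read. */
    return snprintf(out, n, "[%.*s]", len, data);
}
```

Called with a three-byte array holding 'a', 'b', 'c' and len set to 3, this produces "[abc]" without ever looking for a null byte past the array.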

Format String Vulnerability troubles

So I have this function:
void print_usage(char* arg)
{
    char buffer[640];
    sprintf(buffer, "Usage: %s [options]\n"
            "Randomly generates a password, optionally writes it to /etc/shadow\n"
            "\n"
            "Options:\n"
            "-s, --salt <salt> Specify custom salt, default is random\n"
            "-e, --seed [file] Specify custom seed from file, default is from stdin\n"
            "-t, --type <type> Specify different encryption method\n"
            "-v, --version Show version\n"
            "-h, --help Show this usage message\n"
            "\n"
            "Encryption types:\n"
            " 0 - DES (default)\n"
            " 1 - MD5\n"
            " 2 - Blowfish\n"
            " 3 - SHA-256\n"
            " 4 - SHA-512\n", arg);
    printf(buffer);
}
I wish to utilize a format string vulnerability attack (my assignment). Here is my attempt:
I have an exploit program which fills a buffer with NOPs and shellcode (I have used this program to overflow the buffer in the same function, so I know it's good). I did an object dump of the file to find the .dtors_list address and got 0x0804a20c; adding 4 bytes to get the end gives 0x804a210.
Next I used gdb to find the address at which my NOPs begin while running my program. Using this I got 0xffbfdbb8.
So up to this point I feel like I'm correct. Now I want to use a format string to write the NOP address into my .dtors_end address. Here is the string I came up with (this is the string I'm providing as user input to the function):
"\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08%%.168u%%1$n%%.51u%%2$n%%.228u%%3$n%%.64u%%4$n"
This doesn't work for me. The program runs normally and the %s is replaced with the string I input (minus the little-endian memory addresses at the front, and each pair of percent signs is now a single percent sign for some reason).
Anyways, I'm kind of stumped here, any help would be appreciated.

Disclaimer: I'm no expert.
You're passing "\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08%%.168u%%1$n%%.51u%%2$n%%.228u%%3$n%%.64u%%4$n" as the value of arg? That means that buffer will contain
"Usage:\x20\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08%.168u%1$n%.51u%2$n%.228u%3$n%.64u%4$n [options]\x0aRandomly..."
Now let's further assume that you're on an x86-32 target (if you're on x86-64, this won't work), and that you're compiling with an optimization level that doesn't put anything in print_usage's stack frame except for the 640-byte buffer array.
Then printf(buffer) will do the following things, in order:
Push the 4-byte address &buffer.
Push a 4-byte return address.
Invoke printf...
Print out "Usage:\x20\x10\xa2\x04\x08\x11\xa2\x04\x08\x12\xa2\x04\x08\x13\xa2\x04\x08" (a sequence of 23 bytes).
%.168u: Interpret the next argument to printf as an unsigned int and print it with a minimum of 168 digits (the .168 is a precision, so the value is zero-padded). Since printf has no next argument, this is actually going to print the next thing on the stack; that is, the first four bytes of buffer; that is, "Usag" (0x67617355).
%1$n: Interpret the second argument to printf as a pointer to int and store 23+168 at that location. This stores 0x000000bf in location 0x67617355. So this is your main problem: You should have used %2$n instead of %1$n and added one junk byte to the front of your arg. (Incidentally, notice that GNU says "If any of the formats has a specification for the parameter position all of them in the format string shall have one. Otherwise the behavior is undefined." So you should go through and add 1$s to all your %us just to be on the safe side.)
%.51u: Print another 51 bytes of garbage.
%2$n: Interpret the third argument to printf as a pointer to int and store 0x000000f2 in that garbage location. As above, this should have been %3$n.
... etc. etc. ...
So, your major bug here is that you forgot to account for the "Usage: " prefix.
I assume you were trying to store the four bytes 0xffbfdbb8 into address 0x804a210. Let's say you'd gotten that to work. But then what would your next step be? How do you get the program to treat the four-byte quantity at 0x804a210 as a function pointer and jump through it?
The traditional way to exploit this code would be to exploit the buffer overflow in sprintf, rather than the more complicated "%n" vulnerability in printf. You just need to make your arg roughly 640 characters long and make sure that the 4 bytes of it that correspond to print_usage's return address contain the address of your NOP sled.
Even that part is tricky, though. You might conceivably be running into something related to ASLR: just because your sled exists at address 0xffbfdbb8 in one run doesn't mean it'll exist at that same address in the next run.
Does this help?
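One more detail may be worth checking against the symptom in the question (the doubled percent signs coming out as single ones). sprintf's %s copies arg verbatim, so if arg really contains "%%", both characters survive into buffer, and it is the later printf(buffer) that turns each "%%" into one literal "%" instead of starting a conversion. The sketch below reproduces just that two-step behaviour; build_usage and render_usage are names invented here, and snprintf is used in place of sprintf/printf only so that the result can be inspected.

```c
#include <stdio.h>

/* Step 1, as in print_usage: arg is inserted through "%s", i.e. verbatim. */
int build_usage(char *buffer, size_t n, const char *arg)
{
    return snprintf(buffer, n, "Usage: %s [options]\n", arg);
}

/* Step 2, as in printf(buffer): buffer is now used as the format itself.
 * Deliberately unsafe; only call it with input you fully control. */
int render_usage(char *out, size_t n, const char *buffer)
{
    return snprintf(out, n, buffer);
}
```

With arg set to the five characters 100%% the intermediate buffer still holds both percent signs, and only the rendered output shows a single one. A compiler may warn about the non-literal format in render_usage; that warning is the point of the example.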

Why would one add 1 or 2 to the second argument of snprintf?

What is the role of the 1 and 2 in these snprintf calls? Could anyone please explain?
snprintf(argv[arg++], strlen(pbase) + 2 + strlen("ivlpp"), "%s%ccivlpp", pbase, sep);
snprintf(argv[arg++], strlen(defines_path) + 1, "-F\"%s\"", defines_path);

The role of the +2 is to allow for a terminating null and the embedded character from the %c format, so that there would be exactly the right amount of space for formatting the first string. But (as 6502 points out), the size provided is actually one byte shorter than needed, because strlen("ivlpp") doesn't match the civlpp in the format itself. This means that the last character (the second 'p') will be truncated in the output.
The role of the +1 is also to cause snprintf() to truncate the formatted data. The format string contains 4 literal characters, and you need to allow for the terminating null, so the code should allow strlen(defines_path) + 5. As it is, snprintf() truncates the data, leaving off 4 characters.
I'm dubious about whether the code really works reliably...the memory allocation is not shown, but will have to be quite complex - or it will have to over-allocate to ensure that there is no danger of buffer overflow.
Since a comment from the OP says:
I don't know the use of snprintf()
int snprintf(char *restrict s, size_t n, const char *restrict format, ...);
The snprintf() function formats data like printf(), but it writes it to a string (the s in the name) instead of to a file. The first n in the name indicates that the function is told exactly how long the string is, and snprintf() therefore ensures that the output data is null terminated (unless the length is 0). It reports how long the string should have been; if the reported value is longer than the value provided, you know the data got truncated.
So, overall, snprintf() is a relatively safe way of formatting strings, provided you use it correctly. The examples in the question do not demonstrate 'using it correctly'.
One gotcha: if you work on MS Windows, be aware that the MSVC implementation of snprintf() does not exactly follow the C99 standard (and it looks a bit as though MS no longer provides snprintf() at all; only various alternatives such as _snprintf()). I forget the exact deviation, but I think it means that the string is not properly null-terminated in all circumstances when it should be longer than the space provided.
With locally defined arrays, you normally use:
nbytes = snprintf(buffer, sizeof(buffer), "format...", ...);
With dynamically allocated memory, you normally use:
nbytes = snprintf(dynbuffer, dynbuffsize, "format...", ...);
In both cases, you check whether nbytes contains a non-negative value less than the size argument; if it does, your data is OK; if the value is equal to or larger, then your data got chopped (and you know how much space you needed to allocate).
The C99 standard says:
The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
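The return-value check described above can be sketched as follows. format_define_flag is a hypothetical helper modelled on the second call in the question; it is not taken from the original program.

```c
#include <stdio.h>

/* Hypothetical helper: formats -F"<path>" into buf (capacity size).
 * Returns 0 if the whole string was written, -1 if it was truncated
 * or an encoding error occurred. */
int format_define_flag(char *buf, size_t size, const char *path)
{
    int nbytes = snprintf(buf, size, "-F\"%s\"", path);

    /* Complete only if the result is non-negative and less than size. */
    if (nbytes < 0 || (size_t)nbytes >= size)
        return -1;
    return 0;
}
```

With a path of "abcd", a 9-byte buffer (strlen + 5) succeeds, while a 5-byte buffer (strlen + 1, as in the question) reports truncation and holds only the first four characters of the flag.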

The programmer whose code you are reading doesn't know how to use snprintf properly. The second argument is the buffer size, so it should almost always look like this:
snprintf(buf, sizeof buf, "..." ...);
The above is for situations where buf is an array, not a pointer. In the latter case you have to pass the buffer size along:
snprintf(buf, bufsize, "...", ...);
Computing the buffer size is unneeded.
By the way, since you tagged the question as qt-related. There is a very nice QString class that you should use instead.

At a first look both seem incorrect.
In the first case the correct computation would be path + sep + name + NUL, so 2 would seem OK, but for the name the strlen call uses ivlpp while the format string uses civlpp, which is one char longer.
In the second case the number of chars added is 4 (-F"") so the number to add should be 5 because of the ending NUL.
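One way to avoid counting characters by hand is to let snprintf compute the length first and allocate from that. This is only a sketch: make_ivlpp_path is a hypothetical helper, and it assumes the intended program name is ivlpp (the format in the question has civlpp, which may be a typo).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd "<pbase><sep>ivlpp" string,
 * or NULL on failure. The caller frees the result. */
char *make_ivlpp_path(const char *pbase, char sep)
{
    /* With a size of 0 nothing is written; the return value is the length
     * the output needs, not counting the terminating null. */
    int len = snprintf(NULL, 0, "%s%civlpp", pbase, sep);
    if (len < 0)
        return NULL;

    char *path = malloc((size_t)len + 1);
    if (path == NULL)
        return NULL;

    /* Same format and arguments, now with the exact buffer size. */
    snprintf(path, (size_t)len + 1, "%s%civlpp", pbase, sep);
    return path;
}
```

Because the size comes from the same format string that does the writing, the two can no longer drift apart the way strlen("ivlpp") and the format did in the question.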

format string vulnerability - printf

Why does this print the value of the memory address at 0x08480110? I'm not sure why there are 5 %08x arguments - where does that take you up the stack?
address = 0x08480110
address (encoded as 32 bit le string): "\x10\x01\x48\x08"
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
This example is taken from page 11 of this paper http://crypto.stanford.edu/cs155/papers/formatstring-1.2.pdf

I think that the paper provides its printf() examples in a somewhat confusing way because the examples use string literals for format strings, and those don't generally permit the type of vulnerability being described. The format string vulnerability as described here depends on the format string being provided by user input.
So the example:
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
Might better be presented as:
/*
* in a real program, some user input source would be copied
* into the `outstring` buffer
*/
char outstring[80] = "\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|";
printf(outstring);
Since the outstring array is an automatic, the compiler will likely put it on the stack. After copying the user input to the outstring array, it'll look like the following as 'words' on the stack (assuming little endian):
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from the "\x10\x01\x48\x08"
The compiler will put other items on the stack as it sees fit (other local variables, saved registers, whatever).
When the printf() call is about to be made, the stack might look like:
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from the "\x10\x01\x48\x08"
var1
var2
var3
saved ECX
saved EDI
Note that I'm completely making those entries up; each compiler will use the stack in different ways, so a format string vulnerability has to be custom crafted for a particular exact scenario. In other words, you won't always use 5 dummy format specifiers like in this example; as the attacker you'd need to figure out how many dummies the particular vulnerability would need.
Now to call printf(), the argument (the address of outstring) is pushed on to the stack and printf() is called, so the argument area of the stack looks like:
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from the "\x10\x01\x48\x08"
var1
var2
var3
saved ECX
saved EDI
&outstring // the one real argument to `printf()`
However, printf doesn't really know anything about how many arguments have been placed on the stack for it; it goes by the format specifiers it finds in the format string (the one argument it's 'sure' to get). So printf() gets the format string argument and starts processing it. When it gets to the 1st "%08x", that will correspond to the 'saved EDI' in my example, the next "%08x" will print the 'saved ECX', and so on. So the "%08x" format specifiers are just eating up data on the stack until printf() gets back to the string the attacker was able to input. Determining how many of those are needed is something an attacker would do by a kind of trial and error (probably by a test run that has a whole slew of "%08x" formats until he can 'see' where the format string starts).
Anyway, when printf() gets to processing the "%s" format specifier, it has consumed all the stack entries up to where the outstring buffer resides. The "%s" specifier treats its stack entry as a pointer, and the string that the user has put into that buffer has been carefully crafted to have a binary representation of 0x08480110, so printf() will print out whatever is at that address as an ASCIIZ string.
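The word values in the stack listings above follow directly from little-endian byte order, which can be checked with a few lines of code. le32 is a hypothetical helper written for this illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes the way a little-endian CPU
 * does when it loads them as one 32-bit word (first byte is the lowest). */
uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Feeding it the first four bytes of the format string, "\x10\x01\x48\x08", gives 0x08480110, and the next two groups "_%08" and "x.%0" give 0x3830255f and 0x30252e78, matching the listing.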

You have 6 format specifiers (5 lots of %08x and one of %s), but you do not provide values for those format specifiers. You immediately fall into the realm of undefined behaviour - anything could happen and there is no wrong answer.
However, in the normal course of events, the values passed to printf() would have been stored on the stack, so the code in printf() reads values off the stack as if the extra values had been passed. The function return address is on the stack, too. There is no guarantee that I can see that the value 0x08480110 will actually be produced. This sort of attack very much depends on the specific program and faulty function call, and you might well get a very different value. The example code is most likely written assuming a 32-bit Intel (little-endian) CPU, rather than a 64-bit or big-endian CPU.
Adapting the code fragment, compiling it into a complete program, ignoring the compilation warnings, using a 32-bit compilation on MacOS X 10.6.7 with GCC 4.2.1 (XCode 3), the following code:
#include <stdio.h>
static void somefunc(void)
{
    printf("AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.|%s|\n");
}
int main(void)
{
    char buffer[160] =
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz01234";
    somefunc();
    return 0;
}
produces the following result:
AAAAAAAAAAAAAAAA.0x000000A0.0xBFFFF11C.0x00001EC4.0x00000000.0x00001E22.0xBFFFF1C8.0x00001E5A.|abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz01234|
As you can see, I eventually 'found' the string in the main program from the printf() statement. When I compiled it in 64-bit mode, I got a core dump instead. Both results are perfectly correct; the program invokes undefined behaviour, so anything the program does is valid. If you're curious, search for 'nasal demons' for more information on undefined behaviour.
And get used to experimenting with these sorts of issues.
Another variation
#include <stdio.h>
static void somefunc(void)
{
    char format[] =
        "AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
        ".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
        ".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n";
    printf(format, 1);
}
int main(void)
{
    char buffer[160] =
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz01234";
    somefunc();
    return 0;
}
This produces:
AAAAAAAAAAAAAAAA.0x00000001.0x00000099.0x8FE467B4.0x41000024.0x41414141
.0x41414141.0x41414141.0x2E414141.0x30257830.0x302E5838.0x38302578.0x78302E58
.0x58383025.0x2578302E.0x2E583830.0x30257830.0x2E0A5838.0x30257830.0x302E5838
You might recognize the format string in the hex output - 0x41 is capital A, for example.
The 64-bit output from that code is both similar and different:
AAAAAAAAAAAAAAAA.0x00000001.0x00000000.0x00000000.0xFFE0082C.0x00000000
.0x41414141.0x41414141.0x2578302E.0x30257830.0x38302578.0x58383025.0x0A583830
.0x2E583830.0x302E5838.0x78302E58.0x2578302E.0x30257830.0x38302578.0x38302578

You misunderstood the paper.
The text you linked is assuming that the current position on the stack is 0x08480110 (look at the surrounding text). The printf() will dump data from wherever on the stack you happen to be.
The \x10\x01\x48\x08 at the beginning of the format string is merely to print the (assumed) address to stdout in front of the dumped data. In no way do these numbers modify the address from which the data is dumped.

You're correct about "take you up the stack", but only barely; it relies on the assumption that arguments are passed on the stack, rather than in registers. (Which, for a variadic function is probably a safe assumption, but still an assumption about implementation details.)
Each %08x asks for the 'next unsigned int argument' to be printed in hex; what actually occurs in that 'next argument' location is both architecture and compiler dependent. If you compare the values you get with /proc/self/maps for the process, you might be able to narrow down what some of the numbers mean.

Difference between printf and gets

I am a beginner at programming. I have consulted books on C programming, but I am confused.
1.) What's the difference between printf and gets?
I believe gets is simpler and doesn't have any formats?

printf
The printf function writes a formatted string to the standard output. A formatted string is the result of replacing placeholders with their values. This sounds a little complicated but it will become very clear with an example:
printf("Hello, my name is %s and I am %d years old.", "Andreas", 22);
Here %s and %d are the placeholders, that are substituted with the first and second argument. You should read on the man page (linked above) the list of placeholders and their options, but the ones you'll run into most often are %d (a number) and %s (a string).
Making sure that the placeholder arguments match their type is extremely important. For example, the following code will result in undefined behavior (meaning that anything can happen: the program may crash, it may work, it may corrupt data, etc):
printf("Hello, I'm %s years old.", 22);
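Unfortunately, the C language itself does nothing to prevent these relatively common mistakes, although compilers such as GCC and Clang can warn about mismatched arguments when the format is a string literal (-Wformat, which is part of -Wall).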
gets
The gets function is used for a completely different purpose: it reads a string from the standard input.
For example:
char name[512];
printf("What's your name? ");
gets(name);
This simple program will ask the user for a name and save what he or she types into name.
However, gets() should NEVER be used. It will open your application and the system it runs on to security vulnerabilities.
Quoting from the man page:
Never use gets(). Because it is
impossible to tell without knowing the
data in advance how many characters
gets() will read, and because gets()
will continue to store characters past
the end of the buffer, it is extremely
dangerous to use. It has been used to
break computer security. Use fgets()
instead.
Explained more simply, the problem is that if the variable you give gets (name in this case) is not big enough to hold what the user types, a buffer overflow will occur; that is, gets will write past the end of the variable. This is undefined behavior, and on some systems it will allow execution of arbitrary code by the attacker.
Since the variable must have a finite, static size and you can't set a limit of the amount of characters the user can type as the input, gets() is never secure and should never be used. It exists only for historical reasons.
As the manual suggested, you should use fgets instead. It has the same purpose as gets but has a size argument that specifies the size of the variable:
char *fgets(char *s, int size, FILE *stream);
So, the program above would become:
char name[512];
printf("What's your name? ");
fgets(name, sizeof(name) /* 512 */, stdin /* The standard input */);
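One difference to be aware of when switching: unlike gets, fgets keeps the newline in the buffer when the whole line fits. A minimal sketch of a wrapper that removes it; read_line is a hypothetical helper name, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from in into buf (capacity size)
 * and drop the trailing newline that fgets keeps.
 * Returns 1 on success, 0 on end of file or error. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return 0;

    /* strcspn gives the index of the first '\n', or the string length
     * if there is none, so this is safe in both cases. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

In the program above it would be called as read_line(stdin, name, sizeof name). Input longer than the buffer is cut off rather than written past the end; the rest of the line stays in the stream for the next read.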

They fundamentally perform different tasks.
printf: prints out text to a console.
gets: reads in input from the keyboard.

printf: lets you format a string from components (e.g. taking values from variables) and writes it to stdout. It does not append a newline character; you have to do this yourself by putting '\n' in the format string.
puts: only outputs a string to stdout, but does append a newline afterward.
scanf: scans the input, one character at a time, and converts the fields according to the given format.
gets: simply reads a string from stdin, with no format consideration; the newline character is replaced by the string terminator '\0'.
http://en.wikipedia.org/wiki/Printf
http://en.wikipedia.org/wiki/Gets
