I know you can print with printf() and puts(). I can also see that printf() allows you to interpolate variables and do formatting.
Is puts() merely a primitive version of printf()? Should it be used for every printf() call that needs no string interpolation?
puts is simpler than printf but be aware that the former automatically appends a newline. If that's not what you want, you can fputs your string to stdout or use printf.
(This is pointed out in a comment by Zan Lynx, but I think it deserves an answer - given that the accepted answer doesn't mention it).
The essential difference between puts(mystr); and printf(mystr); is that in the latter the argument is interpreted as a format string. The result will often be the same (except for the added newline) if the string contains no conversion specifications (anything introduced by %), but if you cannot rely on that (if mystr is a variable instead of a literal), you should not use it.
So it is generally dangerous, and conceptually wrong, to pass a dynamic string as the single argument of printf:
char * myMessage;
// ... myMessage gets assigned at runtime, unpredictable content
printf(myMessage); // <--- WRONG! (what if myMessage contains a '%' char?)
puts(myMessage); // ok
printf("%s\n",myMessage); // ok, equivalent to the previous, perhaps less efficient
The same applies to fputs vs fprintf (but fputs doesn't add the newline).
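To see why the "%s" form is safe, here is a minimal sketch that uses snprintf so the result can be inspected. The helper name copy_untrusted_text is made up for illustration; the point is only that the dynamic string is passed as data, never as the format.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copies untrusted text into out.
   The text is passed as data for "%s", never as the format itself,
   so any '%' it contains is copied verbatim. */
int copy_untrusted_text(char *out, size_t size, const char *text)
{
    return snprintf(out, size, "%s", text);
}
```

Calling it with the text "100%s %d" stores exactly those eight characters; nothing in the text is treated as a conversion.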
Besides formatting, puts returns a nonnegative integer if successful or EOF if unsuccessful; while printf returns the number of characters printed (not including the trailing null).
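A small sketch of that return-value difference, assuming nothing beyond standard C. Both helper names are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: prints s with printf and returns printf's result,
   i.e. the number of characters written (or a negative value on error). */
int print_with_printf(const char *s)
{
    return printf("%s\n", s);
}

/* Hypothetical helper: prints s with puts and reports only success (0)
   or failure (-1), because puts promises no more than "nonnegative" or EOF. */
int print_with_puts(const char *s)
{
    return puts(s) == EOF ? -1 : 0;
}
```

print_with_printf("abc") returns 4 (three letters plus the newline), whereas the only portable thing to do with the result of puts is to compare it against EOF.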
In simple cases, the compiler converts calls to printf() to calls to puts().
For example, the following code will be compiled to the assembly code I show next.
#include <stdio.h>
int main(void) {
    printf("Hello world!\n");
    return 0;
}
push rbp
mov rbp,rsp
mov edi,str.Helloworld!
call dword imp.puts
mov eax,0x0
pop rbp
ret
In this example, I used GCC version 4.7.2 and compiled the source with gcc -o hello hello.c.
In my experience, printf() hauls in more code than puts() regardless of the format string.
If I don't need the formatting, I don't use printf. However, fwrite to stdout works a lot faster than puts.
static const char my_text[] = "Using fwrite.\n";
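fwrite(my_text, 1, sizeof(my_text) - 1, stdout);
Note: the length is sizeof(my_text) - 1 (equivalently, minus sizeof(char)) so that the terminating null is not written. An earlier version subtracted sizeof('\0'), but as the comments pointed out, '\0' is an integer constant of type int in C, so that would subtract sizeof(int).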
int puts(const char *s);
puts() writes the string s and a trailing newline to stdout.
int printf(const char *format, ...);
The function printf() writes output to stdout, under the control of a format string that specifies how subsequent arguments are converted for output.
I'll use this opportunity to ask you to read the documentation.
Right, printf could be thought of as a more powerful version of puts. printf provides the ability to format variables for output using format specifiers such as %s, %d, %lf, etc...
The printf() function can print both strings and formatted variables to the screen, while the puts() function only lets you print a string.
puts is the simple choice and adds a newline at the end, while printf writes its output from a format string.
See the documentation for puts and for printf.
I would recommend using only printf, as this is more consistent than switching between functions; e.g. if you are debugging, it is less painful to search for all printfs than for both puts and printf. Most of the time you want to output a variable in your printouts as well, so puts is mostly used in example code.
When comparing puts() and printf(), even though their memory consumption is almost the same, puts() takes more time compared to printf().
What is the behavior of printf() when we supply multiple arguments to it without a format specifier?
Example:
int main()
{
printf("hello", "hi");
return 0;
}
Why does the compiler produce a warning on compilation of the above program? :
warning: too many arguments for format [-Wformat-extra-args]
If we compile the similar program below:
int main()
{
char *s1 = "hello";
char *s2 = "hi";
printf(s1, s2);
}
No warnings are produced. What is the reason for this?
Also, why do both programs output hello only, and don't also print hi?
The C 2018 standard specifies the behavior of printf in clause 7.21.6.3, in which paragraph 2 says “The printf function is equivalent to fprintf with the argument stdout interposed before the arguments to printf.”
The standard specifies the behavior of fprintf in 7.21.6.1, which tells us the second argument (the first argument of printf) is a format string and that it may contain various conversion specifications introduced by the character “%”. Thus, in printf("hello", "hi"), "hello" is a format string with no conversion specifications. In this case, paragraph 2 tells us what happens:
If the format is exhausted [fully processed] while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored.
Thus, in printf("hello", "hi"), "hi" is ignored, and "hello" is a format string that contains only ordinary characters, which are copied to the output stream per paragraph 3.
The compiler warns about printf("hello", "hi") because it is able to see that this call contains an excess argument because the format string does not contain a conversion specification for it.
Your compiler does not warn about printf(s1,s2); because it does not analyze what s1 will contain during this call. This sort of analysis is not impossible in this situation, but situations like this are rare: When a programmer uses a pointer to a string as the format string for printf, it is usually a string or pointer that is computed, constructed, or selected during program execution, and the manner of this computation is often beyond the ability of a compiler to analyze. Situations where the pointer is clearly a pointer to a fixed string are rare, since they are not frequently useful, so presumably compiler implementors have not found it valuable to implement the code necessary for the compiler to handle these situations.
tl;dr: Extra arguments to printf() are ignored.
The official C language standard (the link is to a draft of the C11 version) says the following:
§ 7.21.6.1 The fprintf function
...
... If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored. The fprintf function returns when the end of the format string is encountered.
... and printf() is simply fprintf() targeted at the standard output file.
About your two code snippets:
The compiler is giving you a hint, for the first snippet, that the number of arguments doesn't match the number of specifiers in the format string. It's just a courtesy - it's not required to notice this. This also explains why the compiler does not notice it for the second snippet. It could, but it's too much effort to chase your pointers and check what they point at.
In both cases, your format string is your first argument to printf(), i.e. "hello". That string has no format specifiers, so printf() looks at "hello", understands that it only needs to print that, and doesn't need to process any other arguments. That's why it ignores "hi".
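The "excess arguments are ignored" rule can be checked with a short sketch. It uses snprintf (which follows the same formatting rules as fprintf) so the output lands in a buffer; the helper name format_with_spare_argument is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats into out using a caller-supplied format
   and always passes one extra string argument, "hi".
   If fmt has no conversion specification, "hi" is evaluated but ignored. */
int format_with_spare_argument(char *out, size_t size, const char *fmt)
{
    return snprintf(out, size, fmt, "hi");
}
```

With the format "hello" the buffer holds just hello; with "hello %s" it holds hello hi. Because the format is a parameter, the compiler cannot see its contents here, which is also why no extra-argument warning would be expected for this call.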
The first parameter of printf is the format string, because printf is about printing formatted data. To specify how to format the data, printf uses the first argument. This is different from other languages and libraries where all the parameters (like Python's print) are used in the same way and formatting is done through other means.
The first and second examples you provide are both "incorrect" although technically valid because you are passing a format string that does not need any extra argument, so "hi" is unused.
What you may want to do instead is:
printf("%s %s", "hello", "hi");
Many compilers know the printf family of functions well and parse the format string at compile time to check the arguments. For printf("hello", s2); the compiler sees that there is no %... in the format string, so it expects no further arguments, and it issues a warning.
If you call printf(s1, s2); the compiler does not know the contents of s1, so it cannot parse the format string, and no warning is issued.
Many compilers have a special extension that tells them your function is printf-like and that you want the format string checked. For gcc:
extern int
my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));
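For completeness, a sketch of what a definition behind such a declaration might look like. The function log_to and the macro PRINTF_LIKE are invented for this example; the attribute itself is a GCC/Clang extension, so it is guarded and expands to nothing on other compilers.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang extension: marks a function as printf-like so that calls get
   the same format-string checking as printf. Empty on other compilers. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Hypothetical helper: the format is argument 2, its values start at 3. */
int log_to(FILE *out, const char *fmt, ...) PRINTF_LIKE(2, 3);

int log_to(FILE *out, const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vfprintf(out, fmt, args);  /* forward the variadic arguments */
    va_end(args);
    return written;
}
```

A call such as log_to(stdout, "%s=%d\n", "x", 42) is then checked against its format string at compile time, and a mismatched call like log_to(stdout, "%d", "x") should draw a -Wformat warning from gcc.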
I stumbled upon this C code in a college test and when I tested it on Dev-C++ 5.11 compiler, it printed random characters. I can't understand how or why this code works. Can someone enlighten me?
int main() {
char a[10] = "%s" ;
printf( a ) ;
}
This question has two parts: the missing quotes and the random characters.
printf() is just a function. You can pass strings and other values to functions as arguments. You don't have to use a literal. You can use both char *a = "something"; printf(a) (passing a variable as an argument) and printf("something") (passing a string literal as an argument).
printf() is also a variadic function. This means it can accept any number of arguments. You can use printf("hello world"), printf("%s", "hello world") and even printf("%s %s", "hello", "world"). Some older compilers don't verify you actually passed the right number of arguments based on the first argument which is the format string. This is why your code compiles even though it's missing an argument. When the program runs the code goes over the format string, sees "%s" and looks for the second argument to print it as a string. Since there is no second argument it basically reads random memory and you get garbage characters.
printf function signature is:
int printf(const char *format, ...);
It expects a format string as the first argument, followed by a variable number of arguments that are converted and printed according to the format specifiers in the format string. The variable a in your question supplies the format string. The random characters appear because the argument for the %s format specifier is missing. The following will correctly print a string:
printf( a, "Hello World!" );
A list of format specifiers can be seen here https://en.wikipedia.org/wiki/Printf_format_string
Why does it compile?
Because variadic arguments accepted by printf are processed at run time. Not all compilers do compile time checks for validating arguments against the format string. Even if they do they would at most throw a warning, but still compile the program.
It's using the string "%s" as a format string, and using uninitialized memory as the "data".
The only reason it does "something" is because the compiler was apparently not smart enough to recognize that the format string required one parameter and zero parameters were supplied. Or because compiler warnings were ignored and/or errors were turned off.
Just an FYI for anybody who bumps into this: "Always leave all warnings and errors enabled and fix your code until they're gone." This doesn't guarantee correct behaviour, but it does make "mysterious" problems less likely.
I am reading the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie (2nd edition, published by PHI). In section 1.1, Getting Started, of the first chapter, A Tutorial Introduction, on page 7, they say that one must use \n in the printf() argument, otherwise the C compiler will produce an error message. But when I compiled the program without \n in printf(), it went fine. I did not see any error message. I am using Dev-C portable with the "MinGW GCC 4.6.2 32-bit" compiler.
Why do I not get the error message?
Here is the passage in question, from page 7 of the second edition of K&R:
You must use \n to include a newline character in the printf argument; if you try something like
printf("hello, world
");
the C compiler will produce an error message.
This means that you can't embed a literal newline in a quoted string.
Either one of the lines below, however, is fine:
printf("hello, world"); /* does not print a newline */
printf("hello, world\n"); /* prints a newline */
All the text above is saying is that you can't have a quoted string that spans multiple lines in the source code.
You can also escape a newline with a backslash. The C preprocessor will remove the backslash and newline, so the following two statements are equivalent:
printf("hello, world\
");
printf("hello, world");
And if you have a lot of text, you can put multiple quoted strings next to each other, or separated by whitespace, and the compiler will join them for you:
printf("hello, world\n"
"this is a second line of text\n"
"but you still need to include backslash-n to break each line\n");
You don't get a compile-time error message because there is no error.
In the first article they say that one must use \n in the printf() argument, otherwise the C compiler will produce an error message.
Can you cite (by section and/or page number) where that statement appears? I seriously do not believe that K&R (you're using the second edition, right?) says that. If it did say that, it would be an error in the book.
Update: What the book says, quite correctly, is that a newline in a string literal is represented by the two-character sequence \n, not by an actual newline character. A string literal must be on a single logical source line; something like
printf("hello
world");
is a syntax error. This applies to all string literals, whether they're printf format strings or not.
An actual newline in a string literal is an error. A \n sequence that represents a newline is optional; its lack is not an error, but a printf format string should usually end with a \n.
There is no requirement for a printf call to include the \n character, and I've never seen a compiler complain about a printf that lacks a \n.
There is an issue here, but it's not a compile-time error.
Some examples:
printf("No newline");
This is a perfectly legal call. It prints the specified string on standard output without a newline character.
printf("hello%c", '\n');
There's no \n in the format string, but it prints hello followed by a newline. Again, this is perfectly legal.
The actual issue is that you should (almost) always print a newline at the very end of your output. This complete program:
#include <stdio.h>
int main(void) {
printf("hello");
return 0;
}
is legal, but its behavior may be undefined in some implementations. The relevant rule is in the standard, section 7.21.2 paragraph 2 (the quote is from the N1570 draft):
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined.
Whether that terminating newline character is required or not, it's (almost always) a very good idea to end your output with a newline. If I run it on my system, I get the string hello immediately followed by my shell prompt on the same line. It's not illegal, but it's inconvenient and ugly.
But that applies only at the very end of the program's output. This program is perfectly valid and has well defined behavior:
#include <stdio.h>
int main(void) {
printf("hello");
putchar('\n');
return 0;
}
Still, the easiest and most reliable way to produce clean output is for each printf call to print exactly one line, which ends with exactly one '\n' character. This isn't a universal rule; sometimes it's convenient to print a line a piece at a time, or to print two or more lines in a single printf.
Very often, if you don't end your printf format string with a \n, some of the output stays in the stdout buffer, and you need to call fflush to get all the output shown.
This means that if you don't get all the expected output you should add fflush at appropriate places (e.g. before calls to fork).
But you won't get a compiler message in such a case, because it is not an error (though it is a mistake many beginners make). If you really wanted to, you could customize your compiler (e.g. with MELT if using a recent GCC compiler) to get the warning. I believe it is not worth the effort (because there are legitimate calls to printf without any \n).
An example of legitimate printf calls without newlines would be if you coded a (recursive) function to output an expression from its AST; you certainly should not emit a newline after each token.
See documentation of printf(3), fflush(3), stdio(3), setvbuf(3) etc...
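A typical case where the missing newline matters is a prompt. The sketch below is one way to handle it, assuming ordinary buffered stdout; show_prompt is a made-up name.

```c
#include <stdio.h>

/* Hypothetical helper: shows a prompt that does not end in '\n' and
   flushes stdout so the text appears before the program waits for input.
   Returns 0 on success, EOF if writing or flushing failed. */
int show_prompt(const char *text)
{
    if (fputs(text, stdout) == EOF)
        return EOF;
    return fflush(stdout);
}
```

Without the fflush call, the prompt text could sit in the stdout buffer until a later newline or until the program exits.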
I am learning C and came up with this example, in which I print some text through a pointer.
#include <stdio.h>
main ()
{
char *quotes = "One good thing about music, when it hits you, you feel no pain. \"Bob Marley\"\n";
printf(quotes);
}
I get a warning from the compiler, "format not a string literal and no format arguments", but when I execute the program it runs successfully. I read some other questions here about the same warning, but I didn't find an answer that fits my case. I understand the reason why I get this message:
This warning is gcc's way of telling you that it cannot verify the format string argument to the printf style function (printf, fprintf... etc). This warning is generated when the compiler can't manually peek into the string and ensure that everything will go as you intend during runtime...
Case 3. Now this is somewhat your case. You are taking a string generated at runtime and trying to print it. The warning you are getting is the compiler warning you that there could be a format specifier in the string. Say for eg "bad%sdata". In this case, the runtime will try to access a non-existent argument to match the %s. Even worse, this could be a user trying to exploit your program (causing it to read data that is not safe to read).
(See the answer)
But what do I have to add in my case in order to get no warnings from the compiler?
Change it to printf("%s", quotes); which adds the specifier that quotes is a 'string', or array of char.
You need to tell printf what it is that you are printing. The %s specifier tells printf that you are printing a string.
The general form is printf("specifiers describing the data you are printing", variable holding the data);
The specifier for strings is %s, for characters %c, and for int %d.
Change printf to:
printf("%s",quotes);
You have to specify a format string. In its simplest form:
char *quotes = "One good thing about music(...)\n";
printf("%s", quotes);
or, you can use format string to decorate output:
char *quotes = "One good thing about music(...)"; // no newline
printf("%s\n", quotes); // newline added here
or, if you don't want to mess with format strings:
char *quotes = "One good thing about music(...)"; // no newline
puts(quotes); // puts() adds newline
or
char *quotes = "One good thing about music(...)\n";
fputs(quotes,stdout);
This warning is gcc's way of telling you that it cannot verify the format string argument to the printf style function (printf, fprintf... etc). This warning is generated when the compiler can't manually peek into the string and ensure that everything will go as you intend during runtime. Let's look at a couple of examples.
So, as others suggested, explicitly use a format specifier, i.e.
printf("%s",quotes);
You are getting the warning because printing a string this way is dangerous when it contains '%'. Your quote has no percent signs, so nothing goes wrong there, but consider a string like this:
int main ()
{
int percent = 10;
char *s = "%discount: %d\n";
printf(s, percent);
return 0;
}
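Here printf reads the "%d" at the start of "%discount" as a conversion and consumes percent for it; the second %d then has no argument left, so printf reads whatever happens to be in the next argument slot. That is undefined behaviour, and the program may print garbage or crash.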
When you want to print a percent sign use: "%%discount:"
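A minimal sketch of the corrected format string, written with snprintf so the result can be checked. The helper name format_discount is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes e.g. "%discount: 10\n" into out.
   "%%" produces one literal percent sign; "%d" consumes percent. */
int format_discount(char *out, size_t size, int percent)
{
    return snprintf(out, size, "%%discount: %d\n", percent);
}
```

For percent equal to 10 the buffer holds the text %discount: 10 followed by a newline, and every conversion in the format has a matching argument.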
Try this:
#include <stdio.h>
int main(void)
{
char *quotes = "One good thing about music, when it hits you, you feel no pain. \"Bob Marley\"\n";
puts(quotes); //Either
printf("%s",quotes);//or
return 0;
}
Why does this print the value of the memory address at 0x08480110? I'm not sure why there are 5 %08x arguments - where does that take you up the stack?
address = 0x08480110
address (encoded as 32 bit le string): "\x10\x01\x48\x08"
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
This example is taken from page 11 of this paper http://crypto.stanford.edu/cs155/papers/formatstring-1.2.pdf
I think that the paper provides its printf() examples in a somewhat confusing way because the examples use string literals for format strings, and those don't generally permit the type of vulnerability being described. The format string vulnerability as described here depends on the format string being provided by user input.
So the example:
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
Might better be presented as:
/*
* in a real program, some user input source would be copied
* into the `outstring` buffer
*/
char outstring[80] = "\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|";
printf(outstring);
Since the outstring array is an automatic, the compiler will likely put it on the stack. After copying the user input to the outstring array, it'll look like the following as 'words' on the stack (assuming little endian):
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from the "\x10\x01\x48\x08"
The compiler will put other items on the stack as it sees fit (other local variables, saved registers, whatever).
When the printf() call is about to be made, the stack might look like:
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from the "\x10\x01\x48\x08"
var1
var2
saved ECX
saved EDI
Note that I'm completely making those entries up; each compiler will use the stack in different ways (so a format string vulnerability has to be custom crafted for a particular exact scenario). In other words, you won't always use 5 dummy format specifiers like in this example; as the attacker you'd need to figure out how many dummies the particular vulnerability would need.
Now to call printf(), the argument (the address of outstring) is pushed on to the stack and printf() is called, so the argument area of the stack looks like:
outstring[0c] // etc...
outstring[08] 0x30252e78 // from "x.%0"
outstring[04] 0x3830255f // from "_%08"
outstring[00] 0x08480110 // from the "\x10\x01\x48\x08"
var1
var2
var3
saved ECX
saved EDI
&outstring // the one real argument to `printf()`
However, printf doesn't really know anything about how many arguments have been placed on the stack for it; it goes by the format specifiers it finds in the format string (the one argument it's 'sure' to get). So printf() gets the format string argument and starts processing it. The 1st "%08x" will correspond to the 'saved EDI' in my example, the next "%08x" will print the 'saved ECX', and so on. So the "%08x" format specifiers are just eating up data on the stack until printf gets back to the string the attacker was able to input. Determining how many of those are needed is something an attacker would do by a kind of trial and error (probably by a test run that has a whole slew of "%08x" formats until he can 'see' where the format string starts).
Anyway, when printf() gets to processing the "%s" format specifier, it has consumed all the stack entries up to where the outstring buffer resides. The "%s" specifier treats its stack entry as a pointer, and the string that the user has put into that buffer has been carefully crafted to have a binary representation of 0x08480110, so printf() will print out whatever is at that address as an ASCIIZ string.
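As a small check of the byte order assumed in the stack diagrams above, the sketch below decodes four bytes as a little-endian 32-bit word. load_le32 is a made-up helper, and a 32-bit little-endian target is assumed, as in the example.

```c
#include <stdint.h>

/* Hypothetical helper: interprets four bytes as a little-endian 32-bit
   value, the way a 32-bit x86 machine would read them as one word. */
uint32_t load_le32(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

The bytes 0x10, 0x01, 0x48, 0x08 decode to 0x08480110, and the characters of "_%08" decode to 0x3830255f, which matches the words shown for outstring[00] and outstring[04].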
You have 6 format specifiers (5 lots of %08x and one of %s), but you do not provide values for those format specifiers. You immediately fall into the realm of undefined behaviour - anything could happen and there is no wrong answer.
However, in the normal course of events, the values passed to printf() would have been stored on the stack, so the code in printf() reads values off the stack as if the extra values had been passed. The function return address is on the stack, too. There is no guarantee that I can see that the value 0x08480110 will actually be produced. This sort of attack very much depends on the specific program and faulty function call, and you might well get a very different value. The example code is most likely written assuming a 32-bit Intel (little-endian) CPU, rather than a 64-bit or big-endian CPU.
Adapting the code fragment, compiling it into a complete program, ignoring the compilation warnings, using a 32-bit compilation on MacOS X 10.6.7 with GCC 4.2.1 (XCode 3), the following code:
#include <stdio.h>
static void somefunc(void)
{
printf("AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.|%s|\n");
}
int main(void)
{
char buffer[160] =
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz01234";
somefunc();
return 0;
}
produces the following result:
AAAAAAAAAAAAAAAA.0x000000A0.0xBFFFF11C.0x00001EC4.0x00000000.0x00001E22.0xBFFFF1C8.0x00001E5A.|abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz01234|
As you can see, I eventually 'found' the string in the main program from the printf() statement. When I compiled it in 64-bit mode, I got a core dump instead. Both results are perfectly correct; the program invokes undefined behaviour, so anything the program does is valid. If you're curious, search for 'nasal demons' for more information on undefined behaviour.
And get used to experimenting with these sorts of issues.
Another variation
#include <stdio.h>
static void somefunc(void)
{
char format[] =
"AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n";
printf(format, 1);
}
int main(void)
{
char buffer[160] =
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz012345"
"abcdefghijklmnopqrstuvwxyz01234";
somefunc();
return 0;
}
This produces:
AAAAAAAAAAAAAAAA.0x00000001.0x00000099.0x8FE467B4.0x41000024.0x41414141
.0x41414141.0x41414141.0x2E414141.0x30257830.0x302E5838.0x38302578.0x78302E58
.0x58383025.0x2578302E.0x2E583830.0x30257830.0x2E0A5838.0x30257830.0x302E5838
You might recognize the format string in the hex output - 0x41 is capital A, for example.
The 64-bit output from that code is both similar and different:
AAAAAAAAAAAAAAAA.0x00000001.0x00000000.0x00000000.0xFFE0082C.0x00000000
.0x41414141.0x41414141.0x2578302E.0x30257830.0x38302578.0x58383025.0x0A583830
.0x2E583830.0x302E5838.0x78302E58.0x2578302E.0x30257830.0x38302578.0x38302578
You misunderstood the paper.
The text you linked is assuming that the current position on the stack is 0x08480110 (look at the surrounding text). The printf() will dump data from wherever on the stack you happen to be.
The \x10\x01\x48\x08 at the beginning of the format string is merely to print the (assumed) address to stdout in front of the dumped data. In no way do these numbers modify the address from which the data is dumped.
You're correct about "take you up the stack", but only barely; it relies on the assumption that arguments are passed on the stack, rather than in registers. (Which, for a variadic function is probably a safe assumption, but still an assumption about implementation details.)
Each %08x asks for the 'next unsigned int argument' to be printed in hex; what actually occurs in that 'next argument' location is both architecture and compiler dependent. If you compare the values you get with /proc/self/maps for the process, you might be able to narrow down what some of the numbers mean.
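For reference, this is what a single well-defined %08x conversion does when its argument really is supplied. The sketch uses snprintf, and format_hex_word is a name invented for the example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats one unsigned int the way a single "%08x"
   conversion does: lowercase hex, zero-padded to at least 8 digits. */
int format_hex_word(char *out, size_t size, unsigned int value)
{
    return snprintf(out, size, "%08x", value);
}
```

Passing 0x1234 gives 00001234. In the exploit string the same conversion is used, but with no argument behind it, so what gets printed is whatever the calling convention leaves in that slot.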