What happens if the format string in a printf call in C has more conversion specifiers than arguments, like:
printf("%d, %d, %d",a, b);
What does the third %d print in the output?
I tried it, but I can't make sense of the output.
The third %d results in undefined behavior because there is no corresponding argument.
From the C Standard (7.21.6.1 The fprintf function)
9 If a conversion specification is invalid, the behavior is
undefined. If any argument is not the correct type for the
corresponding conversion specification, the behavior is undefined.
Note that the standard function is named printf, not Printf.
In C, functions that take variadic arguments (i.e. the ... parameter) have no way of knowing beforehand the number or type of the arguments. They must be kept track of in some way. A separate length parameter is one way, but the printf-family functions use the number of format specifiers in the format string to keep track.
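To illustrate the "separate length parameter" approach mentioned above, here is a minimal sketch. The function name sum_ints is made up for this example; the point is only that the callee has to trust whatever count it is given.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: the caller states how many int arguments follow. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i) {
        /* Each va_arg call trusts that another int was really passed. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

Calling sum_ints(3, 1, 2, 3) is fine, but sum_ints(3, 1, 2) would be undefined behavior for the same reason as the printf call in the question: the function asks for an argument that was never passed.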
If you tell the function "hey, there's a third parameter" when you don't pass one, this is undefined behavior. Anything can happen. It may appear to not print anything. It may read a garbage value from the memory location or the register where it expects to find the value. It may crash.
Reasoning about what might happen when your code invokes undefined behavior is a waste of time. Just make sure your code is free of it.
You've told printf to expect 3 additional int arguments, but only passed 2. It's going to look for that third int argument somewhere, and depending on how printf is implemented on your system, you may get a runtime error, or you may see garbage output, or you may see no output, or something entirely different may happen.
Officially, the behavior is left undefined - neither the compiler nor the runtime environment the program is executing in is required to handle the situation in any particular way. The result isn't guaranteed to be predictable or repeatable.
There is no "should" here - any result is "correct" as far as the language definition is concerned.
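If you want to see the mismatch for yourself, a rough sketch like the following counts how many arguments a format string will ask for. The name count_conversions is invented here, and it is deliberately simplified (it ignores '*' widths and precisions, which also consume arguments); in real code, compiler warnings are the better tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts conversions that consume one argument.
 * Simplification: "%%" is skipped, and '*' widths or precisions
 * (which consume extra arguments) are not handled. */
size_t count_conversions(const char *fmt)
{
    assert(fmt != NULL);

    size_t n = 0;
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%')
            continue;
        ++p;                /* look at the character after '%' */
        if (*p == '\0')
            break;          /* stray '%' at the end of the string */
        if (*p != '%')
            ++n;            /* "%%" is a literal percent sign, not counted */
    }
    return n;
}
```

For the format in the question, count_conversions("%d, %d, %d") gives 3, while only two arguments (a and b) were passed.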
Related
A. printf("Values: X=%s Y=%s\n", x,y,z);
B. printf("Values: x=%s, Y=%s\n", x);
Both of the above printf() statements are incorrect: one has extra parameters, other has fewer parameters. I would like to choose between the lesser evil with an explanation. Can a modern C compiler help catch such problems? If yes, how does printf() implementor need to assist the compiler?
Both of the above printf() statements are incorrect: one has extra parameters, other has fewer parameters.
The first one is not incorrect according to the C standard. The rules for function calls in general, in C 2018 6.5.2.2, do not make it an error to pass unused arguments for a ... in the function prototype. For printf specifically, C 2018 7.21.6.1 2 (about fprintf, which the specification for printf refers to) says extra arguments are harmless:
… If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored…
Certainly if a programmer writes printf("Values: X=%s. Y=%s.\n", x, y, z);, they might have made a mistake, and a compiler would be reasonable in pointing out this possibility. However, consider code such as:
printf(ComputedFormat, x, y, z);
Here it is reasonable that we wish to print different numbers of values in different circumstances, and the ComputedFormat reflects this. It would be tedious to write code for each case and dispatch to them with a switch statement. It is simpler to write one call and let the computed format determine how many values are printed. So it is not always an error to have more arguments than the conversion specifications use.
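A minimal sketch of that idea, assuming a made-up helper format_coords that prints one, two, or three coordinates. It always passes all three values and lets the chosen format decide how many are used, which the standard explicitly permits.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats the first `dims` coordinates (1 to 3). */
int format_coords(char *buf, size_t cap, int dims, int x, int y, int z)
{
    static const char *const formats[] = { "%d", "%d, %d", "%d, %d, %d" };

    assert(buf != NULL && dims >= 1 && dims <= 3);

    /* All three values are always passed; any that the selected format
     * does not use are evaluated and then ignored. */
    return snprintf(buf, cap, formats[dims - 1], x, y, z);
}
```

With dims equal to 1 the buffer holds just "7" for x = 7, even though y and z were passed as well.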
I would like to choose between the lesser evil with an explanation.
The behavior of the latter code is not defined by the C standard. C 2018 7.21.6.1 2 also says:
… If there are insufficient arguments for the format, the behavior is undefined…
Thus, no behavior may be relied on from the latter code, unless there is some guarantee from the C implementation.
Can a modern C compiler help catch such problems?
Good modern C compilers have information about the specification of printf and, when the format argument is a string literal, they compare the number and types of the arguments to the conversion specifications in the string.
If yes, how does printf() implementor need to assist the compiler?
The implementor of printf does not need to do anything except conform to the specification of printf in the C standard. The aid described above is performed by the C compiler with reference to the C standard; it does not rely on features of the particular printf implementation.
On some platforms, information about the number of arguments passed is provided to the called routine. On such platforms, a printf implementor could check whether too few arguments are provided and signal an error in some way.
Eric Postpischil has already made a great answer that uses the most reliable source (the C standard), but I just want to post my own answer about why printf may behave as it does in both cases.
printf is a variadic function which can take a variable number of arguments. The way it knows how many you have passed is solely through the format string; every time it finds a format specifier, it takes the next argument out of the list (and assumes its type from which specifier has been used). Nothing really happens to any extra arguments: since there is no specifier for them, the function will not even try to take them and they will not be printed. So you may be warned about the extra arguments by the compiler, but the behavior in the first example is well-defined.
The second, on the other hand, is definitely undefined behavior. Since there are not enough arguments to match the number of format specifiers in the string, eventually when it finds the second %s, it will try to take the next variadic argument, but the issue is that you haven't passed any. When this happens for me, it prints some garbage value in place of the format specifier that doesn't look too nice. Anything could happen in undefined behavior though. In this case, the function seems to try to take the next variadic argument from a CPU register / the stack (memory) and fetches some garbage value that happened to be there (though again, anything could happen with undefined behavior).
So in short:
printf("%s\n", "Hello", "World");
        |         |     ^^^^^^^ Ignored
        -----------
and
printf("%s\n");   ?
        |         |
        -----------
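To make the "one specifier, one argument" walk concrete, here is a toy formatter. It is only a sketch: mini_format is an invented name, it understands nothing beyond %d, %s and %%, and a real printf is far more involved.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical mini formatter: understands only %d, %s and %%.
 * Returns the number of characters stored in out, excluding the '\0'. */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    assert(out != NULL && fmt != NULL && cap > 0);

    va_list ap;
    size_t len = 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && len + 1 < cap; ++p) {
        if (*p != '%') {            /* ordinary character: copy it */
            out[len++] = *p;
            continue;
        }
        ++p;                        /* character after '%' */
        if (*p == '%') {            /* "%%" consumes no argument */
            out[len++] = '%';
            continue;
        }

        int n;
        if (*p == 'd')              /* specifier found: fetch the next int */
            n = snprintf(out + len, cap - len, "%d", va_arg(ap, int));
        else if (*p == 's')         /* specifier found: fetch the next string */
            n = snprintf(out + len, cap - len, "%s", va_arg(ap, char *));
        else
            break;                  /* unsupported or missing specifier */

        if (n < 0)
            break;                  /* encoding error: stop here */
        if ((size_t)n >= cap - len) {
            len = cap - 1;          /* output was truncated to fit */
            break;
        }
        len += (size_t)n;
    }
    va_end(ap);

    out[len] = '\0';
    return (int)len;
}
```

A call such as mini_format(buf, sizeof buf, "%s=%d%%", "x", 42, "unused") never touches the last argument, because the loop runs out of specifiers first. If the format had one specifier too many, the extra va_arg call would be the undefined part.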
According to my knowledge and some threads like this, if you want to print strings in C you have to do something like this:
printf("%s some text", value);
And the value will be displayed instead of %s.
I wrote this code:
char password[] = "default";
printf("Enter name: \n");
scanf("%s", password);
printf("%s is your password", password); // All good - the print is as expected
But I noticed that I can do the exact same thing without the value part and it will still work:
printf("%s is your password");
So my question is why does the %s placeholder get a value without me giving it one, and how does it know what value to give it?
This is undefined behavior: anything can happen, including something that looks correct. But it is incorrect.
Your compiler can probably point out the problem if you enable the right warning options.
The standard says (emphasis mine):
7.21.6.1 The fprintf function
The fprintf function writes output to the stream pointed to by stream,
under control of the string pointed to by format that specifies how
subsequent arguments are converted for output. If there are
insufficient arguments for the format, the behavior is undefined. If
the format is exhausted while arguments remain, the excess arguments
are evaluated (as always) but are otherwise ignored. The fprintf
function returns when the end of the format string is encountered.
The printf() function uses a C language feature that lets you pass a variable number of arguments to a function. (Technically called 'variadic functions' - https://en.cppreference.com/w/c/variadic - I'll just say 'varargs' for short.)
When a function is called in C, the arguments to the function are pushed onto the stack(*) - but the design of the varargs feature provides no way for the called function to know how many parameters were passed in.
When the printf() function executes, it scans the format string, and the %s tells it to look for a string in the next position in the variable argument list. Since there are no more arguments in the list, the code 'walks off the end of the array' and grabs the next thing it sees in memory. I suspect what's happening is that the next location in memory still has the address of password from your prior call to scanf, and since that address points to a string, and you told printf to print a string, you got lucky, and it worked.
Try putting another function call (for example: printf("%s %s %s\n","X","Y","Z")) in between your call to scanf("%s", password); and printf("%s is your password"); and you will almost certainly see different behavior.
Free Advice: C has a lot of sharp corners and undefined bits, but a good compiler (and static analysis or 'lint' tool) can warn you about a lot of common errors. If you are going to work in C, learn how to crank your compiler warnings to the max, learn what all the errors and warnings mean (as they happen, not all at once!) and force yourself to write C code that compiles without any warnings. It will save you a lot of unnecessary hassle.
(*) generalizing here for simplicity - sometimes arguments can be passed in registers, sometimes things are inlined, blah blah blah.
So, there are a lot of posts telling that you shouldn't do printf("%s is your password");, and that you were just lucky. I guess from your question that you somewhat knew that. But few are telling you the probable reason for why you were lucky.
To understand what probably happened, we have to understand how function parameters are passed. The caller of a function must put the parameters on an agreed upon place for the function to find the parameters. So for parameters 1...N we call these places r1 ... rN. (This kind of agreement is part of something we call a "Function Calling Convention")
That means that this code:
scanf("%s", password);
printf("%s is your password",password);
may be turned into this pseudo-code by the compiler
r1="%s";
r2=password;
call scanf;
r1="%s is your password";
r2=password;
call printf;
If you now remove the second parameter from the printf call, your pseudo-code will look like this:
r1="%s";
r2=password;
call scanf;
r1="%s is your password";
call printf;
Be aware that after call scanf;, r2 might be unmodified and still be set to password, therefore call printf; "works"
You might think that you have discovered a new way to optimize code, by eliminating one of the r2=password; assignments. This might be true for old "dumb" compilers, but not for modern ones.
Modern compilers will already do this when it is safe. And it is not always safe. Reasons why it might not be safe include that scanf and printf may have different calling conventions, r2 might have been modified behind your back, etc.
To get a better feeling for what the compiler is doing, I recommend looking at the assembly output from your compiler at different optimization levels.
And please, always compile with -Wall. The compiler is often good at telling you when you are doing dumb stuff.
The question is plain and simple, s is a string, I suddenly got the idea to try to use printf(s) to see if it would work and I got a warning in one case and none in the other.
char* s = "abcdefghij\n";
printf(s);
// Warning raised with gcc -std=c11:
// format not a string literal and no format arguments [-Wformat-security]
// On the other hand, if I use
char* s = "abc %d efg\n";
printf(s, 99);
// I get no warning whatsoever, why is that?
// Update, I've tested this:
char* s = "random %d string\n";
printf(s, 99, 50);
// Results: no warning, output "random 99 string".
So what's the underlying difference between printf(s) and printf("%s", s) and why do I get a warning in just one case?
In the first case, the non-literal format string could perhaps come from user code or user-supplied (run-time) data, in which case it might contain %s or other conversion specifications, for which you've not passed the data. This can lead to all sorts of reading problems (and writing problems if the string includes %n — see printf() or your C library's manual pages).
In the second case, the format string controls the output and it doesn't matter whether any string to be printed contains conversion specifications or not (though the code shown prints an integer, not a string). The compiler (GCC or Clang is used in the question) assumes that because there are arguments after the (non-literal) format string, the programmer knows what they're up to.
The first is a 'format string' vulnerability. You can search for more information on the topic.
GCC knows that most times the single argument printf() with a non-literal format string is an invitation to trouble. You could use puts() or fputs() instead. It is sufficiently dangerous that GCC generates the warnings with the minimum of provocation.
The more general problem of a non-literal format string can also be problematic if you are not careful — but extremely useful assuming you are careful. You have to work harder to get GCC to complain: it requires both -Wformat and -Wformat-nonliteral to get the complaint.
From the comments:
So ignoring the warning, as if I really know what I am doing and there will be no errors, is one or another more efficient to use or are they the same? Considering both space and time.
Of your three printf() statements, given the tight context that the variable s is as assigned immediately above the call, there is no actual problem. But you could use puts(s) if you omitted the newline from the string or fputs(s, stdout) as it is and get the same result, without the overhead of printf() parsing the entire string to find out that it is all simple characters to be printed.
The second printf() statement is also safe as written; the format string matches the data passed. There is no significant difference between that and simply passing the format string as a literal — except that the compiler can do more checking if the format string is a literal. The run-time result is the same.
The third printf() passes more data arguments than the format string needs, but that is benign. It isn't ideal, though. Again, the compiler can check better if the format string is a literal, but the run-time effect is practically the same.
From the printf() specification linked to at the top:
Each of these functions converts, formats, and prints its arguments under control of the format. The format is a character string, beginning and ending in its initial shift state, if any. The format is composed of zero or more directives: ordinary characters, which are simply copied to the output stream, and conversion specifications, each of which shall result in the fetching of zero or more arguments. The results are undefined if there are insufficient arguments for the format. If the format is exhausted while arguments remain, the excess arguments shall be evaluated but are otherwise ignored.
In all these cases, there is no strong indication of why the format string is not a literal. However, one reason for wanting a non-literal format string might be that sometimes you print the floating point numbers in %f notation and sometimes in %e notation, and you need to choose which at run-time. (If it is simply based on value, %g might be appropriate, but there are times when you want the explicit control — always %e or always %f.)
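As a small sketch of that last use case, the following picks between %f and %e at run time. The helper name format_double and its scientific flag are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats value in %e notation when `scientific`
 * is nonzero and in %f notation otherwise. */
int format_double(char *buf, size_t cap, double value, int scientific)
{
    assert(buf != NULL);

    /* Both candidate formats consume exactly one double, so the argument
     * list matches whichever one is selected. */
    const char *fmt = scientific ? "%e" : "%f";
    return snprintf(buf, cap, fmt, value);
}
```

The format is not a literal at the call site, so the compiler cannot check it there; keeping the candidate formats next to the call and making sure they all take the same arguments is what keeps this safe.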
The warning says it all.
First, to address the issue itself: as per the signature, the first parameter to printf() is a format string, which can contain format specifiers (conversion specifiers). If the string contains a format specifier and the corresponding argument is not supplied, it invokes undefined behavior.
So, a cleaner (or safer) approach (for printing a string which needs no format specification) would be puts(s); over printf(s); (the former does not process s for any conversion specifiers, removing the reason for the possible UB in the latter case). You can choose fputs(), if you're worried about the ending newline that automatically gets added in puts().
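A minimal sketch of the fputs() route, wrapped in a made-up helper called print_plain. Since fputs() does no format processing, a '%' in the string is just another character.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes s verbatim to stream.
 * Returns 0 on success and -1 if fputs reports an error. */
int print_plain(FILE *stream, const char *s)
{
    assert(stream != NULL && s != NULL);

    /* fputs does not interpret '%', and unlike puts it adds no newline. */
    return fputs(s, stream) == EOF ? -1 : 0;
}
```

Passing a string such as "100%s done\n" to this helper is harmless, whereas passing it directly as the format to printf() would make printf() look for an argument that isn't there.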
That said, regarding the warning option, -Wformat-security from the online gcc manual
At present, this warns about calls to printf and scanf functions where the format string is not a string literal and there are no format arguments, as in printf (foo);. This may be a security hole if the format string came from untrusted input and contains %n.
In your first case, there's only one argument supplied to printf(), and it is not a string literal but a variable, which may well be generated or populated at run time; if it contains unexpected format specifiers, it may invoke UB. The compiler has no way to check for the presence of any format specifier in it. That is the security problem there.
In the second case, the accompanying argument is supplied, so the format string is not the only argument passed to printf() and the compiler does not try to verify the first argument. Hence there is no warning.
Update:
Regarding the third one, with more arguments than the supplied format string requires:
printf(s, 99, 50);
quoting from C11, chapter §7.21.6.1
[...] If the format is exhausted while arguments remain, the excess arguments are
evaluated (as always) but are otherwise ignored. [...]
So, passing excess arguments is not a problem (from the compiler's perspective) at all, and it is well defined. There is no reason for a warning there.
There are two things in play in your question.
The first is covered succinctly by Jonathan Leffler - the warning you're getting is because the format string isn't a literal and no format arguments follow it.
The other is the mystery of why the compiler doesn't issue a warning that your number of arguments doesn't match the number of specifiers. The short answer is "because it doesn't," but more specifically, printf is a variadic function. It takes any number of arguments after the initial format specification - from 0 on up. When the format is a variable rather than a string literal, as here, the compiler generally doesn't check whether you gave the right number; printf simply trusts the format string at run time, which leads to the undefined behavior that Joachim mentioned in comments.
EDIT:
I'm going to give further answer to your question, as a means of getting on a small soapbox.
What's the difference between printf(s) and printf("%s", s)? Simple - in the latter, you're using printf as it's declared. "%s" is a const char *, and it will subsequently not generate the warning message.
In your comments to other answers, you mentioned "Ignoring the warning...". Don't do this. Warnings exist for a reason, and should be resolved (otherwise they're just noise, and you'll miss warnings that actually matter among the cruft of all the ones that don't.)
Your issue can be resolved in several ways.
const char* s = "abcdefghij\n";
printf(s);
will resolve the warning, because you're now using a const pointer, and there are none of the dangers that Jonathan mentioned. (You could also declare it as const char* const s, but don't have to. The first const is important, because it then matches the declaration of printf, and because const char* s means that characters pointed to by s can't change, i.e. the string is a literal.)
Or, even simpler, just do:
printf("abcdefghij\n");
This is implicitly a const pointer, and also not a problem.
So what's the underlying difference between printf(s) and printf("%s", s)
"printf(s)" will treat s as a format string. If s contains format specifiers then printf will interpret them and go looking for varargs. Since no varargs actually exist this will likely trigger undefined behaviour.
If an attacker controls "s" then this is likely to be a security hole.
printf("%s",s) will just print what is in the string.
and why do I get a warning in just one case?
Warnings are a balance between catching dangerous stupidity and not creating too much noise.
C programmers are in the habit of using printf and various printf-like functions* as generic print functions even when they don't actually need formatting. In this environment it's easy for someone to make the mistake of writing printf(s) without thinking about where s came from. Since formatting is pretty useless without any data to format, printf(s) has little legitimate use.
printf(s,format,arguments) on the other hand indicates that the programmer deliberately intended formatting to take place.
Afaict this warning is not turned on by default in upstream gcc, but some distros are turning it on as part of their efforts to reduce security holes.
* Both standard C functions like sprintf and fprintf and functions in third party libraries.
The underlying reason: printf is declared like:
int printf(const char *fmt, ...) __attribute__ ((format(printf, 1, 2)));
This tells gcc that printf is a function with a printf-style interface where the format string comes first. IMHO it must be literal; I don't think there's a way to tell the good compiler that s is actually a pointer to a literal string it had seen before.
Read more about __attribute__ here.
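The same attribute can be put on your own printf-style functions so that GCC and Clang check literal formats passed to them. This is a sketch only: format_message and the PRINTF_LIKE macro are names made up for the example, and the attribute is a compiler extension, not standard C.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The format attribute is a GCC/Clang extension; expand to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical wrapper: the format is parameter 3 and the variadic
 * arguments start at parameter 4. */
int format_message(char *buf, size_t cap, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_message(char *buf, size_t cap, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, cap, fmt, ap);   /* forward the argument list */
    va_end(ap);
    return n;
}
```

With the attribute in place, a call like format_message(buf, sizeof buf, "%d-%s", 7) should draw the same kind of -Wformat diagnostic that a mismatched printf call does.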
I took the age variable out of the printf() call just to see what happens, then compiled with make. It only gives warnings about more % conversions than data arguments and about the unused age variable, but no compile error. When I run the executable, it does run, but every time I run it, it prints a different random integer. I'm wondering what causes this behavior?
#include <stdio.h>
int main(int argc, char *arg[]) {
int age = 10;
int height = 72;
printf("I'm %d years old\n");
printf("I'm %d inches tall\n", height);
return 0;
}
As per the printf() specification, if there are insufficient arguments for the format specifiers, the behavior is undefined.
So, your code
printf("I'm %d years old\n");
which is missing the required argument for %d, invokes UB and is not guaranteed to produce any valid result.
Cross reference, C11 standard, chapter §7.21.6.1
[..] If there are insufficient arguments for the format, the behavior is
undefined. [..]
According to the C Standard (7.21.6.1 The fprintf function - the same is valid for printf)
...If there are insufficient arguments for the format, the behavior is undefined. If the format is exhausted while arguments
remain, the excess arguments are evaluated (as always) but are
otherwise ignored.
On platforms where printf uses the cdecl calling convention, arguments are passed on the stack. If the format string tells the function to expect an argument, it will read one from the runtime stack, and if you didn't put your number there, that location will probably contain some garbage data. So the value that gets printed is arbitrary data.
With only one exception I know of, the C Standard imposes no requirements with regard to any action which in some plausible implementations might be usefully trapped. It is not hard to imagine a C compiler passing a variadic function like printf an indication of what arguments it has passed, nor would it be hard to imagine an implementer thinking that it could be useful to have the compiler trigger a trap if code tries to retrieve a variadic parameter of some type when the corresponding argument is of some other type or doesn't exist at all. Because it could be useful to have compilers trap in such cases, and because the behavior of such a trap would be outside the jurisdiction of the Standard, the Standard imposes no requirements about what may or may not happen when a variadic function tries to receive arguments which weren't passed to it.
In practice, rather than letting variadic functions know how many arguments they've received, most compilers simply have conventions which describe a relationship between the location of the non-variadic argument and the locations of subsequent variadic arguments. The generated code won't know whether a function has received e.g. two arguments of type int, but it will know that each such argument, if it exists, will be stored in a certain place. On such a compiler, using excess format specifiers will generally result in the generated code looking at the places where additional arguments would have been stored had they existed. In many cases, this location will have been used for some other purpose and then abandoned, and may hold the last value stored there for that purpose, but there is generally no reason to expect anything in particular about the contents of abandoned memory.
What will be the output of printf("%d"); or printf("%p"); statement?
Of course I know that I should pass argument as printf is expecting one but assuming that I will leave this empty what will happen?
I know that this will print some value read from stack (from the place where function argument should be placed). Assuming that I am running Linux machine can I expect that this will be some valid value (e.g. function return address)?
This is simply undefined behaviour. Anything could happen. It's impossible to give a more accurate answer.
The details depend on how printf is implemented by the library, and how variable arguments are implemented by your compiler. Look at the source of the library and/or the generated assembly to find out what's happening on your platform.
This invokes undefined behavior. By its very nature, this means you can't assume anything about what will happen.
This provokes undefined behaviour. You might get a random integer printed, or a crash, or ...
Undefined behavior. Which means your program can for example crash.
(C99, 7.19.6.1p2) "If there are insufficient arguments for the format, the behavior is
undefined."
can I expect that this will be some valid value? : NO.