Any C programmer who's been working for more than a week has encountered crashes that result from calling printf with more format specifiers than actual arguments, e.g.:
printf("Gonna %s and %s, %s!", "crash", "burn");
However, are there any similar bad things that can happen when you pass too many arguments to printf?
printf("Gonna %s and %s!", "crash", "burn", "dude");
My knowledge of x86/x64 assembly leads me to believe that this is harmless, though I'm not convinced that there's not some edge condition I'm missing, and I have no idea about other architectures. Is this condition guaranteed to be harmless, or is there a potentially crash-inducing pitfall here, too?
Online C Draft Standard (n1256), section 7.19.6.1, paragraph 2:
The fprintf function writes output to the stream pointed to by stream, under control of the string pointed to by format that specifies how subsequent arguments are
converted for output. If there are insufficient arguments for the format, the behavior is
undefined. If the format is exhausted while arguments remain, the excess arguments are
evaluated (as always) but are otherwise ignored. The fprintf function returns when
the end of the format string is encountered.
Behavior for all the other *printf() functions is the same with respect to excess arguments, except for the v*printf() variants, which take a va_list instead of a variable argument list.
You probably know the prototype for the printf function as something like this
int printf(const char *format, ...);
A more complete version of that would actually be
int __cdecl printf(const char *format, ...);
The __cdecl keyword (a compiler-specific extension, familiar from 32-bit x86 Windows compilers) selects the "calling convention" which, along with other things, describes how arguments are handled. In this case it means that arguments are pushed onto the stack and that the stack is cleaned up by the function making the call.
One alternative to __cdecl is __stdcall; there are others. With __stdcall the convention is that arguments are pushed onto the stack and cleaned up by the function that is called. However, as far as I know, it isn't possible for a __stdcall function to accept a variable number of arguments. That makes sense, since the called function wouldn't know how much stack to clean up.
The long and the short of it is that in the case of __cdecl functions it's safe to pass however many arguments you want, since the cleanup is performed by the code making the call. If you were to somehow pass too many arguments to a __stdcall function, it would result in corruption of the stack. One example of where this could happen is if you had the wrong prototype.
More information on calling conventions can be found on Wikipedia.
All the arguments are pushed onto the stack by the caller and removed again when the caller cleans up after the call. This behaviour is independent of the specific processor. (I only remember one mainframe, designed in the 70s, that had no stack.) So yes, the second example won't fail.
printf is designed to accept any number of arguments. printf then reads the format specifier (first argument), and pulls arguments from the argument list as needed. This is why too few arguments crash: the code simply starts using non-existent arguments, accessing memory that doesn't exist, or some other bad thing. But with too many arguments, the extra arguments will simply be ignored. The format specifier will use fewer arguments than have been passed in.
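A minimal sketch of the "too many arguments" case, for anyone who wants to see it run. The helper name format_pair is made up for this illustration. The format is taken as a parameter so that the compiler has no literal to check and the run-time behaviour can be observed; snprintf follows the same rule as fprintf for excess arguments.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: always passes three string arguments, however
   many conversions `format` actually contains. Arguments left over once
   the format is exhausted are evaluated but otherwise ignored. */
int format_pair(char *out, size_t size, const char *format)
{
    return snprintf(out, size, format, "crash", "burn", "dude");
}
```

Calling format_pair(buf, sizeof buf, "Gonna %s and %s!") fills buf with "Gonna crash and burn!" and never reads "dude". A format with four %s conversions would be the undefined "too few arguments" case.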
Comment: both gcc and clang produce warnings:
$ clang main.c
main.c:4:29: warning: more '%' conversions than data arguments [-Wformat]
printf("Gonna %s and %s, %s!", "crash", "burn");
~^
main.c:5:47: warning: data argument not used by format string
[-Wformat-extra-args]
printf("Gonna %s and %s!", "crash", "burn", "dude");
~~~~~~~~~~~~~~~~~~ ^
2 warnings generated.
Related
I'm writing a compiler and I want to implement the type check in printf:
printf("%f", i);
warning: format specifies type 'double' but the argument has type 'int' [-Wformat]
printf("%f", 1);
~~ ^~
%d
As you can see, gcc knows what %f means and tries to warn me about the type mismatch.
So how can I implement this?
P.S. Is it possible that there is no mystery, just a special rule that gcc implements for printf?
It's a combination of two special rules, described in the documentation of -Wformat.
The printf function is built in, which allows GCC both to optimize it and to warn about its misuse. About the specific case of printf, there's a note in the documentation:
In addition, when a function is recognized as a built-in function, GCC may use information about that function to warn about problems with calls to that function, or to generate more efficient code, even if the resulting code still contains calls to that function. For example, warnings are given with -Wformat for bad calls to printf when printf is built in and strlen is known not to modify global memory.
In addition, you can declare your own functions as printf-like for warning purposes with the format attribute.
__attribute__((__format__(__printf__, 2, 3))) /* printf-like function: the format string is parameter 2, the first argument to check is parameter 3 */
int myprintf(int stuff, const char *format, ...);
With a built-in function, GCC tries to replace the function call with something more efficient. For printf, this mostly applies when the format argument is a simple string literal. For example, it's likely to compile printf("%s\n", s); as if the program contained puts(s);, and a printf of a single character as a putchar call. The format checking does not depend on that optimization: GCC treats the built-in printf as if it had been declared with the format attribute, so calls with a literal format string are checked against the argument types either way. If the function isn't built in but has a format attribute, you just get the warnings.
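A rough sketch of how the format attribute described above might be attached to a wrapper of your own. The names PRINTF_LIKE and format_tagged are invented for this example, and the attribute itself is a GCC/Clang extension, so it is hidden behind a macro that expands to nothing elsewhere.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* The attribute is a GCC/Clang extension, so hide it behind a macro. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt, first) __attribute__((format(printf, fmt, first)))
#else
#define PRINTF_LIKE(fmt, first)
#endif

/* Hypothetical wrapper: parameter 4 is the format string and the
   arguments to check against it start at parameter 5. */
int format_tagged(char *out, size_t size, const char *tag,
                  const char *format, ...) PRINTF_LIKE(4, 5);

int format_tagged(char *out, size_t size, const char *tag,
                  const char *format, ...)
{
    va_list args;
    int prefix, body;

    /* Write the "[tag] " prefix first. */
    prefix = snprintf(out, size, "[%s] ", tag);
    if (prefix < 0 || (size_t)prefix >= size)
        return prefix;              /* error, or no room left for the body */

    /* Forward the variable arguments to the va_list variant. */
    va_start(args, format);
    body = vsnprintf(out + prefix, size - (size_t)prefix, format, args);
    va_end(args);

    return body < 0 ? body : prefix + body;
}
```

With this declaration in scope, a call such as format_tagged(buf, sizeof buf, "net", "%d retries", "three") should draw the same kind of -Wformat warning as the equivalent printf call.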
I have read the post sprintf format specifier replace by nothing, and others related, but have not seen this addressed specifically.
Until today, I have never seen sprintf used with only 2 arguments.
The prototype my system uses for sprintf() is:
int sprintf (char Target_String[], const char Format_String[], ...);
While working with some legacy code I ran across this: (simplified for illustration)
char toStr[30];
char fromStr[]={"this is the in string"};
sprintf(toStr, fromStr);
My interpretation of the prototype is that the second argument should be comprised of a const char[], and accepting standard ansi C format specifiers.
But the above example seems to work just fine with the string fromStr as the 2nd argument.
Is it purely by undefined behavior that this works, or is this usage perfectly legal?
I am working on Windows 7, using a C99 compiler.
Perfectly legal. The variadic arguments are optional.
In this case sprintf serves as strcpy, but it still parses the format string for % specifiers.
I'd write sprintf(toStr,"%s",fromStr); so it doesn't have to parse that long string.
The behavior you are observing is correct: a format string is not required to have any conversion specifiers. In this case the variable-length argument list, represented by ..., has a length of zero. This is perfectly legal, although it's definitely less efficient than its equivalent
strcpy(toStr, fromStr);
It's perfectly legal code, but
If you just want to copy a string, use strcpy() instead.
If you are working with user input, you could be making yourself vulnerable to a format string attack.
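To make the format string concern concrete, here is a small sketch. The helper name copy_as_data is hypothetical. It keeps the format fixed at "%s" so that the text, which might come from a user, is only ever treated as data.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies `text` into `out` as data. Any '%' in
   `text` is copied literally because the format is the fixed "%s".
   snprintf also bounds the write to `size` bytes. */
int copy_as_data(char *out, size_t size, const char *text)
{
    return snprintf(out, size, "%s", text);
}
```

Passing "100%s sure %n" through copy_as_data copies those 13 characters unchanged. Passing the same string as the format of sprintf would make the function look for arguments that were never supplied, which is undefined behavior.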
Synopsis for sprintf is:
int sprintf(char *str, const char *format, ...);
That means calling it with 2 arguments is a legal option.
It works because the format contains no conversion specifications (no %), so there are no further arguments to print.
It's no different from calling printf with only a format string:
int printf ( const char * format, ... );
It also works if you pass nothing after the format:
printf(fromStr);
the second argument should be comprised of a const char[]
A const qualifier on a pointer parameter is a promise that the function does not modify the data the pointer refers to (which it otherwise could, because arrays are passed to functions by address). It does not require the argument used in the actual call to be const.
The code you posted does not use a const string as the second argument to sprintf(), but the conversion from non-const to const is implicit; there is no need to worry there.
accepting standard ansi C format specifiers
"accepting" does not mean "requiring". The format string you specified does not contain any format specifiers. Accordingly, the function is called with only 2 arguments (no values to format). A third argument would be ignored by sprintf() anyway, and many modern compilers would issue a warning about it.
Update: I don't want to start a debate about which compilers are modern and which are not.
It happens that I'm using the default compiler on OSX 10.11 and this is what it outputs:
axiac: ~/test$ cc -v
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.3.0
Thread model: posix
axiac: ~/test$ cc -o 1 1.c
1.c:8:25: warning: data argument not used by format string [-Wformat-extra-args]
sprintf(x, "abc\n", n);
~~~~~~~ ^
I have used the gets() function in my program to read a string from the user.
When I tried calling gets() with multiple arguments, I was shocked.
gets() seems to accept any number of arguments, but I don't know how many arguments it actually takes, or what all these arguments are used for.
void main()
{
char str[10];
printf("Enter the String...:");
gets(str,5,5,5,5,5);
puts(str);
}
The code compiles without error, and it simply displays the same string that was given as input.
input String : This is a Tesing.
output String : This is a Tesing.
gets() takes only one argument.
Probably what happens is that, because you didn't include <stdio.h>, the compiler had no idea what the prototype of gets() is, so it could not report an error, and the call happened to work.
With the header included, the whole program looks like this (I'm deliberately still calling gets() with the extra arguments):
#include <stdio.h>
int main() {
char str[10];
printf("Enter the String...:");
gets(str,5,5,5,5,5);
puts(str);
}
When I tested under GCC, it pops an error:
error: too many arguments to function 'gets'
And don't use gets(), it's dangerous and has been removed in C11. Use fgets() instead:
fgets(str, sizeof(str), stdin);
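A hedged sketch of what the fgets() replacement might look like once wrapped up. The helper name read_line is invented here. Unlike gets(), fgets() keeps the newline in the buffer, so the sketch strips it.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size - 1 characters of one line
   from `in` into `buf` and strips the trailing newline, if any.
   Returns 1 on success, 0 on end of file, error, or unusable size. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || size > INT_MAX)
        return 0;
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the first newline */
    return 1;
}
```

With char str[10], read_line(stdin, str, sizeof str) stores at most 9 characters plus the terminating null; the rest of a longer line stays in the stream for the next read instead of overflowing the array.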
EDIT: thanks for #abelenky's answer and #chux's comment, I confirmed my guess.
In C11 6.5.2.2 Function calls, paragraph 2 (in Constraints):
If the expression that denotes the called function has a type that includes a prototype, the number of arguments shall agree with the number of parameters. Each argument shall
have a type such that its value may be assigned to an object with the unqualified version
of the type of its corresponding parameter.
In paragraph 6 (in Semantics):
If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the
behavior is undefined. ...
So what happened is that, without the header stdio.h, the compiler doesn't know the prototype of gets(), and the behavior is undefined according to paragraph 6 above.
With the header, the compiler knows the prototype and, according to paragraph 2 above, it is required to generate a diagnostic message because a constraint is violated.
In the C language, a function declared with a variable argument list can be passed as many arguments as you want.
(as an example, see printf, which can take an arbitrary number of arguments)
That does not mean that the function you call will use those arguments at all.
Each function will only process the arguments it is documented to process.
For a variadic function, extra arguments are ignored.
gets is not variadic, though. The extra arguments only compile when no prototype is in scope, and then the behavior is undefined, even if it often appears to work.
On common implementations gets will still only use the first argument, as it is documented to, but that is not something to rely on.
As you can see from the code snippet below, I have declared one char array and one int variable. When the code is compiled, the compiler must identify the data types of the variables str and i.
Why do I need to tell again during scanning my variable that it's a string or integer variable by specifying %s or %d to scanf? Isn't the compiler mature enough to identify that when I declared my variables?
#include <stdio.h>
int main ()
{
char str [80];
int i;
printf ("Enter your family name: ");
scanf ("%s",str);
printf ("Enter your age: ");
scanf ("%d",&i);
return 0;
}
Because there's no portable way for variable argument functions like scanf and printf to know the types of the variable arguments, or even how many arguments were passed.
See C FAQ: How can I discover how many arguments a function was actually called with?
This is the reason there must be at least one fixed argument to determine the number, and maybe the types, of the variable arguments. This argument (the standard calls it parmN, see C11 (ISO/IEC 9899:201x) §7.16 Variable arguments) plays this special role and is passed to the macro va_start. In other words, before C23 you can't have a function with a prototype like this in standard C:
void foo(...);
The reason the compiler cannot provide the necessary information is simply that the compiler is not involved here. The prototypes of these functions don't specify the types, because the functions accept arguments of varying types. So the actual data types are not determined at compile time, but at runtime.
The function then takes one argument after another from the stack. These values don't have any type information associated with them, so the only way the function knows how to interpret the data is by using the information provided by the caller, which is the format string.
The functions themselves don't know which data types are passed in, nor do they know the number of arguments passed, so there is no way that printf can decide this on its own.
In C++ you can use operator overloading, but that is an entirely different mechanism, because there the compiler chooses the appropriate function based on the data types and the available overloads.
To illustrate this, a call to printf, when compiled for 32-bit x86 with the cdecl convention, looks roughly like this (arguments are pushed right to left):
push valueN
...
push value1
push format_string
call _printf
And the prototype of printf is this:
int printf ( const char * format, ... );
So there is no type information carried over, except what is provided in the format string.
printf is not an intrinsic function. It's not part of the C language per se. All the compiler does is generate code to call printf, passing whatever parameters. Now, because C does not provide reflection as a mechanism to figure out type information at run time, the programmer has to explicitly provide the needed info.
The compiler may be smart, but the functions printf and scanf are stupid: they do not know the types of the parameters you pass in each call. This is why you need to pass %s or %d every time.
The first parameter is a format string. If you're printing a decimal number, it may look like:
"%d" (decimal number)
"%5d" (decimal number padded to width 5 with spaces)
"%05d" (decimal number padded to width 5 with zeros)
"%+d" (decimal number, always with a sign)
"Value: %d\n" (some content before/after the number)
etc, see for example Format placeholders on Wikipedia to have an idea what format strings can contain.
Also there can be more than one parameter here:
"%s - %d" (a string, then some content, then a number)
Isn't the compiler matured enough to identify that when I declared my
variable?
No.
You're using a language specified decades ago. Don't expect modern design aesthetics from C, because it's not a modern language. Modern languages will tend to trade a small amount of efficiency in compilation, interpretation or execution for an improvement in usability or clarity. C hails from a time when computer processing time was expensive and in highly limited supply, and its design reflects this.
It's also why C and C++ remain the languages of choice when you really, really care about being fast, efficient or close to the metal.
The prototype of scanf is int scanf ( const char * format, ... );. It reads data according to the format parameter and stores it into the locations pointed to by the additional arguments.
This has nothing to do with the compiler; it is all about the interface defined for scanf. The format parameter is required to tell scanf how to parse the input and what kind of object each additional argument points to.
GCC (and possibly other C compilers) keeps track of argument types, at least in some situations. But the language is not designed that way.
The printf function is an ordinary function which accepts variable arguments. Variable arguments require some kind of run-time-type identification scheme, but in the C language, values do not carry any run time type information. (Of course, C programmers can create run-time-typing schemes using structures or bit manipulation tricks, but these are not integrated into the language.)
When we develop a function like this:
void foo(int a, int b, ...);
we can pass "any" number of additional arguments after the second one, and it is up to us to determine how many there are and what are their types using some sort of protocol which is outside of the function passing mechanism.
For instance if we call this function like this:
foo(1, 2, 3.0);
foo(1, 2, "abc");
there is no way that the callee can distinguish the cases. There are just some bits in a parameter passing area, and we have no idea whether they represent a pointer to character data or a floating point number.
The possibilities for communicating this type of information are numerous. For example in POSIX, the exec family of functions use variable arguments which have all the same type, char *, and a null pointer is used to indicate the end of the list:
#include <stdarg.h>
void my_exec(char *progname, ...)
{
va_list variable_args;
va_start (variable_args, progname);
for (;;) {
char *arg = va_arg(variable_args, char *);
if (arg == 0)
break;
/* process arg */
}
va_end(variable_args);
/*...*/
}
If the caller forgets to pass a null pointer terminator, the behavior will be undefined because the function will keep invoking va_arg after it has consumed all of the arguments. Our my_exec function has to be called like this:
my_exec("foo", "bar", "xyzzy", (char *) 0);
The cast on the 0 is required because there is no context for it to be interpreted as a null pointer constant: the compiler has no idea that the intended type for that argument is a pointer type. Furthermore (void *) 0 isn't correct because it will simply be passed as the void * type and not char *, though the two are almost certainly compatible at the binary level so it will work in practice. A common mistake with that type of exec function is this:
my_exec("foo", "bar", "xyzzy", NULL);
where the compiler's NULL happens to be defined as 0 without any (void *) cast.
Another possible scheme is to require the caller to pass down a number which indicates how many arguments there are. Of course, that number could be incorrect.
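As a minimal sketch of the count-based scheme (the function name sum_ints is made up for illustration), the fixed first parameter tells the callee how many int arguments to read:

```c
#include <stdarg.h>

/* Hypothetical example: `count` says how many int arguments follow.
   The callee has no way to verify that the caller told the truth. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}
```

sum_ints(3, 10, 20, 12) reads exactly three values. A count that is too small silently ignores the extra arguments, while a count that is too large makes va_arg read arguments that were never passed, which is undefined behavior.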
In the case of printf, the format string describes the argument list. The function parses it and extracts the arguments accordingly.
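The same idea can be sketched with a toy "format string". This is not how any real printf is implemented; the function name sum_by_types and the one-letter codes 'i' and 'd' are assumptions made up for this example.

```c
#include <stdarg.h>

/* Hypothetical mini-protocol: each character of `types` describes one
   argument, 'i' for int and 'd' for double. Returns the sum of all
   described arguments. */
double sum_by_types(const char *types, ...)
{
    va_list args;
    double total = 0.0;

    va_start(args, types);
    for (; *types != '\0'; types++) {
        if (*types == 'i')
            total += va_arg(args, int);
        else if (*types == 'd')
            total += va_arg(args, double);
        else
            break;                  /* unknown code: stop reading arguments */
    }
    va_end(args);

    return total;
}
```

In sum_by_types("idi", 1, 2.5, 3) the string is the only thing telling the callee that the second argument is a double. Writing "iii" for the same arguments would make va_arg read the double as an int, which is undefined behavior, exactly like a mismatched printf conversion.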
As mentioned at the outset, some compilers, notably the GNU C Compiler, can parse format strings at compile time and perform static type checking against the number and types of arguments.
However, note that a format string can be other than a literal, and may be computed at run
time, which is impervious to such type checking schemes. Fictitious example:
char *fmt_string = message_lookup(current_language, message_code);
/* no type checking from gcc in this case: fmt_string could have
four conversion specifiers, or ones not matching the types of
arg1, arg2, arg3, without generating any diagnostic. */
snprintf(buffer, sizeof buffer, fmt_string, arg1, arg2, arg3);
This is the only way to tell functions like printf and scanf which type of value you are passing. For example:
int main()
{
int i=22;
printf("%c",i);
return 0;
}
This code will print a character, not the integer 22, because you have told the printf function to treat the value as a char.
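A small sketch of the same point with a printable value. The helper name show_both is hypothetical, and the result described below assumes an ASCII-compatible character set.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the same int twice, once as a decimal
   number (%d) and once as a character (%c). */
int show_both(char *out, size_t size, int value)
{
    return snprintf(out, size, "%d %c", value, value);
}
```

With value 65 the buffer ends up holding "65 A": the argument is the same int both times, and only the conversion specifier decides how it is rendered.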
printf and scanf are I/O functions that are designed and defined to receive a control string and a list of arguments.
The functions do not know the types of the parameters passed to them, and the compiler can't pass this information to them either.
Because in the printf you're not specifying data type, you're specifying data format. This is an important distinction in any language, and it's doubly important in C.
When you scan in a string with %s, you're not saying "parse a string input for my string variable." You can't say that in C because C doesn't have a string type. The closest thing C has to a string variable is a fixed-size character array that happens to contain characters representing a string, with the end of the string indicated by a null character. So what you're really saying is "here's an array to hold the string, I promise it's big enough for the string input I want you to parse."
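One way to keep that promise is to put a maximum field width in the conversion. A minimal sketch, with the helper name parse_person invented for illustration and sscanf used instead of scanf so the input is a plain string:

```c
#include <stdio.h>

/* Hypothetical helper: parses "<name> <age>" from `line`. `name` must
   have room for 80 chars; the width 79 in "%79s" leaves one char for
   the terminating null. Returns the number of fields converted (0-2). */
int parse_person(const char *line, char name[80], int *age)
{
    int fields = sscanf(line, "%79s %d", name, age);
    return fields < 0 ? 0 : fields;   /* map EOF to "nothing converted" */
}
```

The width is part of the format, not of the type: the function still has no idea how big the array really is, so the number in the format and the array size have to be kept in sync by hand.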
Primitive? Of course. C was invented over 40 years ago, when a typical machine had at most 64K of RAM. In such an environment, conserving RAM had a higher priority than sophisticated string manipulation.
Still, the %s scanner persists in more advanced programming environments, where there are string data types. Because it's about scanning, not typing.
I want to know how the scanf function is implemented. (Just for fun, of course.) The number of arguments is variable, so it's certainly implemented with the va_list and va_arg macros.
It also produces some warnings when the number of arguments does not match the format string. This could be done by parsing the format string and comparing it with the number of arguments. No magic.
The only thing I can't see how to implement is type checking. When the type of an argument (a pointer to data) does not match the corresponding conversion in the format literal, scanf produces a warning. How can one check the type of the data that a pointer points to?
Example:
#include<stdio.h>
int main()
{
char buffer1[32], buffer2[32];
int n;
double x;
scanf("%s %s %d",buffer1, buffer2, &x); // warning
scanf("%s %s %d",buffer1, buffer2, &n); // ok
}
Output:
warning: format ‘%d’ expects argument of type ‘int *’,
but argument 4 has type ‘double *’ [-Wformat]
AFAIK the C library is not a part of C language/Compiler, so there is nothing language-related in <stdio.h>. I'm assuming the warning is produced by implementation of scanf, not by compiler [?]. (Maybe using #warning)
If I want to do something similar in some code, how do I know which data type a pointer is pointing to?
Note: I have downloaded source code of GNU C library and looked at scanf.c. I can't find my way through the very complicated code. There is a lot of #ifndef s and calls to other functions with strange names and structure...
This check is handled by the gcc compiler, specifically for scanf/printf functions.
It's such a common error that it's worth adding special case code to the compiler for these functions.
See the GCC -Wformat flag here: http://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Warning-Options.html
-Wformat : Check calls to printf and scanf, etc., to make sure that the arguments supplied have types appropriate to the format string
specified, and that the conversions specified in the format string
make sense. This includes standard functions, and others specified by
format attributes (see Function Attributes), in the printf, scanf,
strftime and strfmon (an X/Open extension, not in the C standard)
families.
These checks are not implemented by all compilers, so it's certainly not something to rely on.
With GCC you can use the function attributes 'format' and 'format_arg' to tell the compiler to apply the same checks to your own functions.
format (archetype, string-index, first-to-check) The format attribute
specifies that a function takes printf, scanf, strftime or strfmon
style arguments which should be type-checked against a format string.
For example, the declaration:
extern int my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));
...causes the compiler to check the arguments in calls to my_printf for consistency with the printf style format string argument
my_format.
The warning is generated by the compiler. You declared x as a double, so it knows &x is a double*. Then it scans the format string and sees that the format requires an int* there, hence it warns.