Does sprintf() require format specifiers to work properly? - c

I have read the post sprintf format specifier replace by nothing, and others related, but have not seen this addressed specifically.
Until today, I have never seen sprintf used with only 2 arguments.
The prototype my system uses for sprintf() is:
int sprintf (char Target_String[], const char Format_String[], ...);
While working with some legacy code I ran across this: (simplified for illustration)
char toStr[30];
char fromStr[]={"this is the in string"};
sprintf(toStr, fromStr);
My interpretation of the prototype is that the second argument should be comprised of a const char[], and accepting standard ansi C format specifiers such as these.
But the above example seems to work just fine with the string fromStr as the 2nd argument.
Is it purely by undefined behavior that this works?, or is this usage perfectly legal?
I a working on Windows 7, using a C99 compiler.

Perfectly legal. The variadic arguments are optional.
In this case the printf serves as strcpy but parses the fmt string for % specifiers.
I'd write sprintf(toStr,"%s",fromStr); so it doesn't have to parse that long string.

The behavior you are observing is correct, a format string is not required to have any conversion specifiers. In this case the variable-length argument list, represented by ..., has length of zero. This is perfectly legal, although it's definitely less efficient than its equivalent
strcpy(toStr, fromStr);

It's perfectly legal code, but
If you just want to copy a string, use strcpy() instead.
If you are working with user input, you could be making yourself vulnerable to a format string attack.

Synopsis for sprintf is:
int sprintf(char *str, const char *format, ...);
That means 2 arguments are legal option.

It works because you have no further parameter (ie no control format %) to print.
It's no difference than printf without second parameter:
int printf ( const char * format, ... );
It also works if you don't have any second parameter:
printf(fromStr);

the second argument should be comprised of a const char[]
A const specifier of a function argument guarantees that the function does not change the value of that argument (given it can change it which is the case on arrays because they are passed by address to the function). It does not require that a const value to be used on the actual call.
The code you posted do not use a const string as the second argument to sprintf() but the conversion from non-const to const is implicit; there is no need to worry there.
accepting standard ansi C format specifiers
"accepting" does not mean "requiring". The format string you specified does not contain any format specifier. Accordingly, the function is called with only 2 arguments (no values to format). A third argument would be ignored by sprinf() anyway and many modern compilers would issue an warning about it.
Update: I don't want to start a debate about which compilers are modern and which are not.
It happens that I'm using the default compiler on OSX 10.11 and this what it outputs:
axiac: ~/test$ cc -v
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.3.0
Thread model: posix
axiac: ~/test$ cc -o 1 1.c
1.c:8:25: warning: data argument not used by format string [-Wformat-extra-args]
sprintf(x, "abc\n", n);
~~~~~~~ ^

Related

How do I define a function that accepts a formatted input string in C?

I built a custom logging function that takes in a "log level" and string. A user would specify the log level associated with the message (i.e. error, warning, trace, etc.). The logging function would only print the message to the console depending on the current log level configured.
int global_log_level = INFO;
void logger(int msg_log_level, char * msg)
{
if(msg_log_level >= global_log_level )
{
printf("Log Level = %u: %s", msg_log_level, msg);
}
}
Ideally, I want to feed formatted strings to this function.
logger(LOG_LEVEL_ERROR, "Error of %u occurred\n", error_code);
However, by adding this "wrapping" logic, I'm unable to input formatted message. Instead, I have to write the message to a temporary string and then feed that into the function.
char temp[512];
sprintf("Error of %u occurred\n", error_code);
logger(LOG_LEVEL_ERROR, temp);
Is there a way to implement the logger function such that I don't need to have the user create a temporary string themselves?
This is question 15.5 on the C FAQ list.
You want vprintf, which lets you write your own printf-like function logger, but where you can pass the actual format string and arguments off to vprintf to do the work. The trick is that you need to construct a special va_arg type to "point to" the variable-length argument list. It looks like this:
#include <stdarg.h>
int global_log_level = INFO;
void logger(int msg_log_level, char * msg, ...)
{
va_list argp;
va_start(argp, msg);
if(msg_log_level >= global_log_level )
{
printf("Log Level = %u: ", msg_log_level);
vprintf(msg, argp);
}
va_end(argp);
}
As an addition to the #SteveSummit answer many compilers have special pragmas or attributes to indicate that the function is "printf-like". It allows the compiler to check parameters compile-time same as it is done when using printf
Example gcc :
format (archetype, string-index, first-to-check)
The format attribute specifies that a function takes printf, scanf,
strftime or strfmon style arguments which should be type-checked
against a format string. For example, the declaration:
extern int
my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));
causes the compiler to check the arguments in calls to my_printf for consistency with the printf style format string argument
my_format.
The parameter archetype determines how the format string is interpreted, and should be printf, scanf, strftime or strfmon. (You
can also use __printf__, __scanf__, __strftime__ or __strfmon__.) The
parameter string-index specifies which argument is the format string
argument (starting from 1), while first-to-check is the number of the
first argument to check against the format string. For functions where
the arguments are not available to be checked (such as vprintf),
specify the third parameter as zero. In this case the compiler only
checks the format string for consistency. For strftime formats, the
third parameter is required to be zero.
In the example above, the format string (my_format) is the second argument of the function my_print, and the arguments to check start
with the third argument, so the correct parameters for the format
attribute are 2 and 3.
The format attribute allows you to identify your own functions which take format strings as arguments, so that GCC can check the
calls to these functions for errors. The compiler always (unless
-ffreestanding is used) checks formats for the standard library functions printf, fprintf, sprintf, scanf, fscanf, sscanf, strftime,
vprintf, vfprintf and vsprintf whenever such warnings are requested
(using -Wformat), so there is no need to modify the header file
stdio.h. In C99 mode, the functions snprintf, vsnprintf, vscanf,
vfscanf and vsscanf are also checked. Except in strictly conforming C
standard modes, the X/Open function strfmon is also checked as are
printf_unlocked and fprintf_unlocked. See Options Controlling C
Dialect.
Source: https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Function-Attributes.html

Why do I have to specify data type each time in C to printf() and scanf()?

As you can see from the code snippet below, I have declared one char variable and one int variable. When the code gets compiled, it must identify the data types of variables str and i.
Why do I need to tell again during scanning my variable that it's a string or integer variable by specifying %s or %d to scanf? Isn't the compiler mature enough to identify that when I declared my variables?
#include <stdio.h>
int main ()
{
char str [80];
int i;
printf ("Enter your family name: ");
scanf ("%s",str);
printf ("Enter your age: ");
scanf ("%d",&i);
return 0;
}
Because there's no portable way for a variable argument functions like scanf and printf to know the types of the variable arguments, not even how many arguments are passed.
See C FAQ: How can I discover how many arguments a function was actually called with?
This is the reason there must be at least one fixed argument to determine the number, and maybe the types, of the variable arguments. And this argument (the standard calls it parmN, see C11(ISO/IEC 9899:201x) §7.16 Variable arguments ) plays this special role, and will be passed to the macro va_start. In another word, you can't have a function with a prototype like this in standard C:
void foo(...);
The reason why the compiler can not provide the necessary information is simply, because the compiler is not involved here. The prototype of the functions doesn't specify the types, because these functions have variable types. So the actual data types are not determined at compile time, but at runtime.
The function then takes one argument from the stack, after the other. These values don't have any type information associated with it, so the only way, the function knows how to interpret the data is, by using the caller provided information, which is the format string.
The functions themselves don't know which data types are passed in, nor do they know the number of arguments passed, so there is no way that printf can decide this on it's own.
In C++ you can use operator overloading, but this is an entire different mechanism. Because here the compiler chooses the appropriate function based on the datatypes and available overloaded function.
To illustrate this, printf, when compiled looks like this:
push value1
...
push valueN
push format_string
call _printf
And the prototype of printf is this:
int printf ( const char * format, ... );
So there is no type information carried over, except what is provided in the format string.
printf is not an intrinsic function. It's not part of the C language per se. All the compiler does is generate code to call printf, passing whatever parameters. Now, because C does not provide reflection as a mechanism to figure out type information at run time, the programmer has to explicitly provide the needed info.
Compiler may be smart, but functions printf or scanf are stupid - they do not know what is the type of the parameter do you pass for every call. This is why you need to pass %s or %d every time.
The first parameter is a format string. If you're printing a decimal number, it may look like:
"%d" (decimal number)
"%5d" (decimal number padded to width 5 with spaces)
"%05d" (decimal number padded to width 5 with zeros)
"%+d" (decimal number, always with a sign)
"Value: %d\n" (some content before/after the number)
etc, see for example Format placeholders on Wikipedia to have an idea what format strings can contain.
Also there can be more than one parameter here:
"%s - %d" (a string, then some content, then a number)
Isn't the compiler matured enough to identify that when I declared my
variable?
No.
You're using a language specified decades ago. Don't expect modern design aesthetics from C, because it's not a modern language. Modern languages will tend to trade a small amount of efficiency in compilation, interpretation or execution for an improvement in usability or clarity. C hails from a time when computer processing time was expensive and in highly limited supply, and its design reflects this.
It's also why C and C++ remain the languages of choice when you really, really care about being fast, efficient or close to the metal.
scanf as prototype int scanf ( const char * format, ... ); says stores given data according to the parameter format into the locations pointed by the additional arguments.
It is not related with compiler, it is all about syntax defined for scanf.Parameter format is required to let scanf know about the size to reserve for data to be entered.
GCC (and possibly other C compilers) keep track of argument types, at least in some situations. But the language is not designed that way.
The printf function is an ordinary function which accepts variable arguments. Variable arguments require some kind of run-time-type identification scheme, but in the C language, values do not carry any run time type information. (Of course, C programmers can create run-time-typing schemes using structures or bit manipulation tricks, but these are not integrated into the language.)
When we develop a function like this:
void foo(int a, int b, ...);
we can pass "any" number of additional arguments after the second one, and it is up to us to determine how many there are and what are their types using some sort of protocol which is outside of the function passing mechanism.
For instance if we call this function like this:
foo(1, 2, 3.0);
foo(1, 2, "abc");
there is no way that the callee can distinguish the cases. There are just some bits in a parameter passing area, and we have no idea whether they represent a pointer to character data or a floating point number.
The possibilities for communicating this type of information are numerous. For example in POSIX, the exec family of functions use variable arguments which have all the same type, char *, and a null pointer is used to indicate the end of the list:
#include <stdarg.h>
void my_exec(char *progname, ...)
{
va_list variable_args;
va_start (variable_args, progname);
for (;;) {
char *arg = va_arg(variable_args, char *);
if (arg == 0)
break;
/* process arg */
}
va_end(variable_args);
/*...*/
}
If the caller forgets to pass a null pointer terminator, the behavior will be undefined because the function will keep invoking va_arg after it has consumed all of the arguments. Our my_exec function has to be called like this:
my_exec("foo", "bar", "xyzzy", (char *) 0);
The cast on the 0 is required because there is no context for it to be interpreted as a null pointer constant: the compiler has no idea that the intended type for that argument is a pointer type. Furthermore (void *) 0 isn't correct because it will simply be passed as the void * type and not char *, though the two are almost certainly compatible at the binary level so it will work in practice. A common mistake with that type of exec function is this:
my_exec("foo", "bar", "xyzzy", NULL);
where the compiler's NULL happens to be defined as 0 without any (void *) cast.
Another possible scheme is to require the caller to pass down a number which indicates how many arguments there are. Of course, that number could be incorrect.
In the case of printf, the format string describes the argument list. The function parses it and extracts the arguments accordingly.
As mentioned at the outset, some compilers, notably the GNU C Compiler, can parse format strings at compile time and perform static type checking against the number and types of arguments.
However, note that a format string can be other than a literal, and may be computed at run
time, which is impervious to such type checking schemes. Fictitious example:
char *fmt_string = message_lookup(current_language, message_code);
/* no type checking from gcc in this case: fmt_string could have
four conversion specifiers, or ones not matching the types of
arg1, arg2, arg3, without generating any diagnostic. */
snprintf(buffer, sizeof buffer, fmt_string, arg1, arg2, arg3);
It is because this is the only way to tell the functions (like printf scanf) that which type of value you are passing. for example-
int main()
{
int i=22;
printf("%c",i);
return 0;
}
this code will print character not integer 22. because you have told the printf function to treat the variable as char.
printf and scanf are I/O functions that are designed and defined in a way to receive a control string and a list of arguments.
The functions does not know the type of parameter passed to it , and Compiler also cant pass this information to it.
Because in the printf you're not specifying data type, you're specifying data format. This is an important distinction in any language, and it's doubly important in C.
When you scan in a string with with %s, you're not saying "parse a string input for my string variable." You can't say that in C because C doesn't have a string type. The closest thing C has to a string variable is a fixed-size character array that happens to contain a characters representing a string, with the end of string indicated by a null character. So what you're really saying is "here's an array to hold the string, I promise it's big enough for the string input I want you to parse."
Primitive? Of course. C was invented over 40 years ago, when a typical machine had at most 64K of RAM. In such an environment, conserving RAM had a higher priority than sophisticated string manipulation.
Still, the %s scanner persists in more advanced programming environments, where there are string data types. Because it's about scanning, not typing.

Runtime typechecking for pointers

I want to know how scanf function is implemented. (Just for fun of course) Number of arguments is variable, so it's certainly implemented by va_list, va_arg macros.
It also throws some warnings, when number of arguments does not match with format string. This could be done by parsing format string and comparing it with number of arguments. No magic.
The only thing that I can't see how implemented, is type checking. When type of an argument (pointer to a data) does not match corresponding declaration in format literal, scanf produces a warning. How can one check type of data that a pointer points to?
Example:
#include<stdio.h>
int main()
{
char buffer1[32], buffer2[32];
int n;
double x;
scanf("%s %s %d",buffer1, buffer2, &x); // warning
scanf("%s %s %d",buffer1, buffer2, &n); // ok
}
Output:
warning: format ‘%d’ expects argument of type ‘int *’,
but argument 4 has type ‘double *’ [-Wformat]
AFAIK the C library is not a part of C language/Compiler, so there is nothing language-related in <stdio.h>. I'm assuming the warning is produced by implementation of scanf, not by compiler [?]. (Maybe using #warning)
If I want to do something similar in some code, how do I know which data type a pointer is pointing to?
Note: I have downloaded source code of GNU C library and looked at scanf.c. I can't find my way through the very complicated code. There is a lot of #ifndef s and calls to other functions with strange names and structure...
This check is handled by the gcc compiler, specifically for scanf/printf functions.
It's such a common error that it's worth adding special case code to the compiler for these functions.
see the GCC -WFormat flag here: http://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Warning-Options.html
-Wformat : Check calls to printf and scanf, etc., to make sure that the arguments supplied have types appropriate to the format string
specified, and that the conversions specified in the format string
make sense. This includes standard functions, and others specified by
format attributes (see Function Attributes), in the printf, scanf,
strftime and strfmon (an X/Open extension, not in the C standard)
families.
These checks are not implemented by all compilers, so it's certainly not something to rely on.
With GCC you can use Function Attributes 'format' and 'format-arg' to tell the compiler to apply the same checks to your functions.
format (archetype, string-index, first-to-check) The format attribute
specifies that a function takes printf, scanf, strftime or strfmon
style arguments which should be type-checked against a format string.
For example, the declaration:
extern int my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));
...causes the compiler to check the arguments in calls to my_printf for consistency with the printf style format string argument
my_format.
The warning is generated by the compiler. You declared x as a double, so it knows &x is a double*. Then it scans the format string and sees that the format requires an int* there, hence it warns.

C programming language, array, pointer

int main()
{
int j=97;
char arr[4]="Abc";
printf(arr,j);
getch();
return 0;
}
this code gives me a stack overflow error why?
But if instead of printf(arr,j) we use printf(arr) then it prints Abc.
please tell me how printf works , means 1st argument is const char* type so how arr is
treated by compiler.
sorry! above code is right it doesn't give any error,I write this by mistake. but below code give stack overflow error.
#include <stdio.h>
int main()
{
int i, a[i];
getch();
return 0;
}
since variable i take any garbage value so that will be the size of the array
so why this code give this error when i use DEV C++ and if I use TURBO C++ 3.0 then
error:constant expression required displayed. if size of array can't be variable then when
we take size of array through user input.no error is displayed. but why in this case.
please tell me how printf works
First of all, pass only non-user supplied or validated strings to the first argument of printf()!
printf() accepts a variable number of arguments after the required const char* argument (because printf() is what's called a variadic function). The first const char* argument is where you pass a format string so that printf() knows how to display the rest of your arguments.
If the arr character array contains user-inputted values, then it may cause a segfault if the string happens to contain those formatting placeholders, so the format string should always be a hard-coded constant (or validated) string. Your code sample is simple enough to see that it's really a constant, but it's still good practice to get used to printf("%s", arr) to display strings instead of passing them directly to the first argument (unless you absolutely have to of course).
That being said, you use the formatting placeholders (those that start with %) to format the output. If you want to display:
Abc 97
Then your call to printf() should be:
printf("%s %d", arr, j);
The %s tells printf() that the second argument should be interpreted as a pointer to a null-terminated string. The %d tells printf() that the third argument should be interpreted as a signed decimal.
this code gives me a stack overflow error why?
See AndreyT's answer.
I see that now the OP changed the description of the behavior to something totally different, so my explanation no longer applies to his code. Nevertheless, the points I made about variadic functions still stand.
This code results in stack invalidation (or something similar) because you failed to declare function printf. printf is a so called variadic function, it takes variable number of arguments. In C language it has [almost] always been mandatory to declare variadic functions before calling them. The practical reason for this requirement is that variadic functions might (and often will) require some special approach for argument passing. It is often called a calling convention. If you forget to declare a variadic function before calling it, a pre-C99 compiler will assume that it is an ordinary non-variadic function and call it as an ordinary function. I.e. it will use a wrong calling convention, which in turn will lead to stack invalidation. This all depends on the implementation: some might even appear to "work" fine, some will crash. But in any case you absolutely have to declare variadic functions before calling them.
In this case you should include <stdio.h> before calling printf. Header file <stdio.h> is a standard header that contains the declaration of printf. You forgot to do it; hence the error (most likely). There's no way to be 100% sure, since it depends on the implementation.
Otherwise, your code is valid. The code is weird, since you are passing j to printf without supplying a format specifier for it, but it is not an error - printf simply ignores extra variadic arguments. Your code should print Abc in any case. Add #include <stdio.h> at the beginning of your code, and it should work fine, assuming it does what you wanted it to do.
Again, this code
#include <stdio.h>
int main()
{
int j=97;
char arr[4]="Abc";
printf(arr,j);
return 0;
}
is a strange, but perfectly valid C program with a perfectly defined output (adding \n at the end of the output would be a good idea though).
In your line int i, a[i]; in the corrected sample of broken code, a is a variable-length array of i elements, but i is uninitialized. Thus your program has undefined behavior.
You see strings in C language are treated as char* and printf function can print a string directly. For printing strings using this function you should use such code:
printf("%s", arr);
%s tells the function that the first variable will be char*.
If you want to print both arr and j you should define the format first:
printf("%s%d", arr, j);
%d tells the function that the second variable will be int
I suspect the printf() issue is a red herring, since with a null-terminated "Abc" will ignore other arguments.
Have you debugged your program? If not can you be sure the fault isn't in getch()?
I cannot duplicate your issue but then I commented out the getch() for simplicity.
BTW, why did you not use fgetc() or getchar()? Are you intending to use curses in a larger program?
===== Added after your edit =====
Okay, not a red herring, just a mistake by the OP.
C++ does allow allocating an array with the size specified by a variable; you've essentially done this with random (garbage) size and overflowed the stack, as you deduced. When you compile with C++ you are typically no longer compiling C and the rules change (depending on the particular compiler).
That said, I don't understand your question - you need to be a lot more clear with "when we take size of array through user input" ...

Passing too many arguments to printf

Any C programmer who's been working for more than a week has encountered crashes that result from calling printf with more format specifiers than actual arguments, e.g.:
printf("Gonna %s and %s, %s!", "crash", "burn");
However, are there any similar bad things that can happen when you pass too many arguments to printf?
printf("Gonna %s and %s!", "crash", "burn", "dude");
My knowledge of x86/x64 assembly leads me to believe that this is harmless, though I'm not convinced that there's not some edge condition I'm missing, and I have no idea about other architectures. Is this condition guaranteed to be harmless, or is there a potentially crash-inducing pitfall here, too?
Online C Draft Standard (n1256), section 7.19.6.1, paragraph 2:
The fprintf function writes output to the stream pointed to by stream, under control of the string pointed to by format that specifies how subsequent arguments are
converted for output. If there are insufficient arguments for the format, the behavior is
undefined. If the format is exhausted while arguments remain, the excess arguments are
evaluated (as always) but are otherwise ignored. The fprintf function returns when
the end of the format string is encountered.
Behavior for all the other *printf() functions is the same wrt excess arguments except for vprintf() (obviously).
You probably know the prototype for the printf function as something like this
int printf(const char *format, ...);
A more complete version of that would actually be
int __cdecl printf(const char *format, ...);
The __cdecl defines the "calling convention" which, along with other things, describes how arguments are handled. In the this case it means that args are pushed onto the stack and that the stack is cleaned by the function making the call.
One alternative to _cdecl is __stdcall, there are others. With __stdcall the convention is that arguments are pushed onto the stack and cleaned by the function that is called. However, as far as I know, it isn't possible for a __stdcall function to accept a variable number of arguments. That makes sense since it wouldn't know how much stack to clean.
The long and the short of it is that in the case of __cdecl functions its safe to pass however many args you want, since the cleanup is performed in the code makeing the call. If you were to somehow pass too many arguments to a __stdcall function it result in a corruption of the stack. One example of where this could happen is if you had the wrong prototype.
More information on calling conventions can be found on Wikipedia here.
All the arguments will be pushed on the stack and removed if the stack frame is removed. this behaviour is independend from a specific processor. (I only remember a mainframe which had no stack, designed in 70s) So, yes the second example wont't fail.
printf is designed to accept any number of arguments. printf then reads the format specifier (first argument), and pulls arguments from the argument list as needed. This is why too few arguments crash: the code simply starts using non-existent arguments, accessing memory that doesn't exist, or some other bad thing. But with too many arguments, the extra arguments will simply be ignored. The format specifier will use fewer arguments than have been passed in.
Comment: both gcc and clang produce warnings:
$ clang main.c
main.c:4:29: warning: more '%' conversions than data arguments [-Wformat]
printf("Gonna %s and %s, %s!", "crash", "burn");
~^
main.c:5:47: warning: data argument not used by format string
[-Wformat-extra-args]
printf("Gonna %s and %s!", "crash", "burn", "dude");
~~~~~~~~~~~~~~~~~~ ^
2 warnings generated.

Resources