How to wisely interpret this compiler warning?

When I executed the code of this question, I got this warning:
warning: format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Wformat=]
printf("P-Q: %d, P: %d, Q: %d", (p - q), p, q);
~^ ~~~~~~~
%ld
As a reflex fix, I used %ld to print the subtraction of two pointers. And the compiler agreed.
Fortunately, I saw a comment from another user mentioning that %td should be used, since the result type of the subtraction is ptrdiff_t. This answer confirms this claim.
Now, from GCC's stddef.h header, I can see that these types are equivalent in this case:
typedef __PTRDIFF_TYPE__ ptrdiff_t;
#define __PTRDIFF_TYPE__ long int
However, I was about to suggest a (more or less) wrong fix to the OP: %ld instead of %td.
Is there a way I could have understood that the compiler warning alone was not enough? Or could I have interpreted the warning itself more wisely, instead of just reacting to it?

I don't think you can tell. It depends on the intent/caution/smartness of the compiler writer.
Maybe he decided he would always support %ld where %td is expected, or maybe he was just unaware/unable/unwilling to give a more detailed/proper message. In case of doubt, your last resort is the standard.
This doesn't seem to be a portable construct, and for "orthodoxy" you should prefer %td, the specifier the standard defines for ptrdiff_t.

The key here is: don't do any form of arithmetic inside printf in the first place. Separate algorithm from GUI.
Code such as printf("%d", p - q) is very dangerous, not just because you might get the types wrong logically, but also since C might "do you a favour" and silently change the types through implicit type promotion. See the sketch below for the safer pattern.
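For example, a minimal sketch of that separation (the array and variable names are illustrative): do the subtraction into a ptrdiff_t first, then print the already-typed value with %td:
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int arr[10];
    int *p = &arr[7];
    int *q = &arr[2];

    /* Do the arithmetic first, giving the result its proper type... */
    ptrdiff_t diff = p - q;

    /* ...then printing is a separate step with nothing left to guess. */
    printf("P-Q: %td\n", diff);
    return 0;
}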
In addition, many compilers don't warn about wrong format specifiers at all. Such warnings are a relatively new thing in the history of C, since compilers aren't required to issue a diagnostic message here. It is just a bonus feature of gcc.
How to avoid bugs? These functions are inherently dangerous - that's just how it is and everyone knows it. The printf & scanf family is probably the most harmful set of functions ever written in the history of programming, in terms of total bug cost caused to mankind. So here is what you should do:
Avoid stdio.h if possible and keep it away from production-quality code. Portability is not always more important than robust code - sometimes it is preferable to use the raw console API. Avoid variable argument list functions in general.
If it's not possible to avoid, wrap the "GUI" part of stdio.h inside a separate file, which you should be doing anyway. Don't mix printing/input with algorithms. Make an interface that uses pointers.
It is 2018, not 1970: don't write console interfaces in the first place. Ye ye I know... there's lots of old crap still floating around which needs to be maintained. But nowadays, console functions should be used mostly for debugging purposes and for newbies learning C, in which case type safety might not be such a big issue.

Related

What is the point of format specifier in C?

What is the point of format specifier in C if we have already set the type of variable before printf?
For example:
#include <stdio.h>
int main(void)
{
    int a = 7;
    printf("%d", a);
}
Like, it's already stated what a is; it's an integer (int). So what is the point of adding %d to specify that it's an integer?
The answer to this question really only makes sense in the context of C's history.
C is, by now, a pretty old language. Though undoubtedly a "high level language", it is famously low-level as high-level languages go. And its earliest compiler was deliberately and self-consciously small and simple.
In its first incarnation, C did not enforce type safety during function calls. For example, if you called sqrt(144), you got the wrong answer, because sqrt expects an argument of type double, but 144 is an int. It was the programmer's responsibility to call a function with arguments of the correct types: the compiler did not know (did not even attempt to keep track of) the arguments expected by each function, so it did not and could not perform automatic conversions. (A separate program, lint, could check that functions were called with the correct arguments.)
C++ corrected this deficiency by introducing the function prototype; C inherited prototypes in the first ANSI C standard in 1989. However, a prototype only works for a function that expects a single, fixed argument list, meaning that it can't help for functions that accept a variable number of arguments, the premier example being printf.
The other thing to remember is that, in C, printf is a more or less ordinary function. ("Ordinary" other than accepting a variable number of arguments, that is.) So the compiler has no direct mechanism to notice the types of the arguments and make that list of types available to printf. printf has no way of knowing, at run time, what types were passed during any given call; it can only rely (it must rely) on the clues provided in the format string. (This is by contrast to languages, many of them, where the print statement is an explicit part of the language parsed by the compiler, meaning that the compiler can do whatever it needs to do in order to treat each argument properly according to its known type.)
So, by the rules of the language (which are constrained by backwards compatibility and the history of the language), the compiler can't do anything special with the arguments in a printf call, other than performing what is called the default argument promotions. So the compiler can't fix things (can't perform the "correct" implicit conversion) if you write something like
int a = 7;
printf("%f", a);
This is, admittedly, an uncomfortable situation. These days, programmers are used to the protections and the implicit promotions provided for by function prototypes. If, today, you can call
int x = sqrt(144);
and have the right thing happen, why can't you similarly call
printf("%f\n", 144);
Well, you can't, although a good, modern compiler will try to help you out anyway. Although the compiler doesn't have to inspect the format string (because that's printf's job to do, at run time), and the compiler isn't allowed to insert any implicit conversions (other than the default promotions, which don't help here), a compiler can duplicate printf's logic, inspect the format string, and issue strong warnings if the programmer makes a mistake. For example, given
printf("%f\n", 144);
gcc prints "warning: format ‘%f’ expects argument of type ‘double’, but argument 2 has type ‘int’", and clang prints "warning: format specifies type 'double' but the argument has type 'int'".
In my opinion, this is a fine compromise, balancing C's legacy behavior with modern expectations.
what is the point of adding %d to specify that it's an integer?
printf() is a function which receives a variable number of arguments of various types after the format argument. It does not directly know the number or the types of the arguments it receives.
The caller knows the argument count and the types it gives to printf().
To pass that information along, the caller uses the format argument to encode the argument count and types; printf() decodes the format to learn them. It is very important that the format and the following arguments are consistent.
printf() accepts a variable number of arguments. To process them, it (via va_start()) needs to know which argument is the last fixed one. It (via va_arg()) also needs to know the type of each argument so it can figure out how much data to read, as the sketch below illustrates.
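To make that concrete, here is a minimal sketch of a hypothetical printf-like function (mini_printf is an illustrative name, not a real API) showing how va_start()/va_arg() rely entirely on the format string:
#include <stdarg.h>
#include <stdio.h>

/* The format string is the callee's only source of information about
   how many arguments follow and what their types are. Only %d and %s
   are handled here. */
static void mini_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);                 /* fmt is the last fixed argument */
    for (; *fmt != '\0'; fmt++) {
        if (fmt[0] == '%' && fmt[1] == 'd') {
            printf("%d", va_arg(ap, int));           /* read an int next */
            fmt++;
        } else if (fmt[0] == '%' && fmt[1] == 's') {
            fputs(va_arg(ap, const char *), stdout); /* read a char * next */
            fmt++;
        } else {
            putchar(*fmt);
        }
    }
    va_end(ap);
}

int main(void)
{
    /* The two %d's are the only reason mini_printf knows to fetch two ints. */
    mini_printf("a is %d and b is %d\n", 7, 42);
    return 0;
}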
The format string is also a compact template (a small DSL) for expressing how text and variables should be formatted, including field width, alignment, precision, and conversion.
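For instance, two specifiers showing that compactness (output shown in the comments):
#include <stdio.h>

int main(void)
{
    /* flags, field width, precision, length modifier, conversion */
    printf("[%-10.3f]\n", 3.14159); /* left-aligned, width 10, 3 decimals: [3.142     ] */
    printf("[%08ld]\n", 42L);       /* zero-padded to width 8, long:       [00000042]  */
    return 0;
}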

Is it OK to pass the address of an int for scanf("%x", ...)?

Does the following code have defined behavior:
#include <stdio.h>
int main() {
    int x;
    if (scanf("%x", &x) == 1) {
        printf("decimal: %d\n", x);
    }
    return 0;
}
clang compiles it without any warnings even with all warnings enabled, including -pedantic. The C Standard seems unambiguous about this:
C17 7.21.6.2 The fscanf function
...
... the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
...
The conversion specifiers and their meanings are:
...
x Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
On two's complement architectures, converting -1 with %x seems to work, but it would not on ancient sign-magnitude or ones' complement systems.
Is there any provision to make this behavior defined or at least implementation defined?
This falls in the category of behaviors which quality implementations should support unless they document a good reason for doing otherwise, but which the Standard does not mandate. The authors of the Standard seem to have refrained from trying to list all such behaviors, and there are at least three good reasons for that:
Doing so would have made the Standard longer, and spending ink describing obvious behaviors that readers would expect anyway would distract from the places where the Standard needed to call readers' attention to things that they might not otherwise expect.
The authors of the Standard may not have wanted to preclude the possibility that an implementation might have a good reason for doing something unusual. I don't know whether that was a consideration in your particular case, but it could have been.
Consider, for example, a (likely theoretical) environment whose calling convention requires passing information about argument types fed to variadic functions, and that supplies a scanf function that validates those argument types and squawks if int* is passed to a %X argument. The authors of the Standard were almost certainly not aware of any such environment [I doubt any ever existed], and thus would be in no position to weigh the benefits of using the environment's scanf routine versus the benefits of supporting the common behavior. Thus, it would make sense to leave such judgment up to people who would be in a better position to assess the costs and benefits of each approach.
It would be extremely difficult for the authors of the Standard to ensure that they exhaustively enumerated all such cases without missing any, and the more exhaustively they were to attempt to enumerate such cases, the more likely it would be that accidental omissions would be misconstrued as deliberate.
In practice, some compiler writers seem to regard most situations where the Standard fails to mandate the behavior of some action as an invitation to assume code will never attempt it, even if all implementations prior to the Standard had behaved consistently and it's unlikely there would ever be any good reason for an implementation to do otherwise. Consequently, using %X to read an int falls in the category of behaviors that will be reliable on implementations that make any effort to be compatible with common idioms, but could fail on implementations whose designers place a higher value on being able to process useless programs more efficiently, or on implementations that are designed to squawk when given programs that could be undermined by such implementations.
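For completeness, a minimal sketch of a strictly conforming rewrite, reading into an unsigned int as the Standard's wording requires and converting afterwards:
#include <stdio.h>

int main(void)
{
    unsigned int u;          /* %x wants a pointer to unsigned int */
    if (scanf("%x", &u) == 1) {
        /* Converting back to int is implementation-defined, not
           undefined, when the value exceeds INT_MAX (C17 6.3.1.3). */
        int x = (int)u;
        printf("decimal: %d\n", x);
    }
    return 0;
}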

Deduce format specifier from data type?

Is it possible to deduce the format specifier programmatically for a data type? For instance, if the value being printed is a long, it would automatically do something like:
printf("Value of var is <fmt_spec> ", var);
I also feel it would reduce some errors on the part of the developer, since something like
printf("Name is %s",int_val); //Oops, int_val would be treated as an address
printf("Name is %s, DOB is",name,dob); // missed %d for dob
printf("Name is %s DOB is %d", name);//Missed printing DOB
I understand that the latter two do produce warnings, but wouldn't it be better if errors were thrown, since in most cases it is going to be problematic? Or am I missing something, or are there constructs already in place to do so?
Deduce format specifier from data type?
No.
As melpomene stated:
"Format specifiers aren't just for types. E.g. %o and %x both take unsigned int; %e, %f, %g all take double; %d, %i, %c all take int. That's why you can't (in general) deduce them from the arguments."
The point is that if such functionality existed, would it deduce unsigned int to %o or to %x, for example? And so on...
As for whether some cases should issue a warning or an error, you should think about how casting works in C, and when it makes sense to allow something or not. In GCC, you could of course treat warning(s) as error(s):
-Werror
Make all warnings into errors.
-Werror=
Make the specified warning into an error. The specifier for a warning is appended; for example -Werror=switch turns the warnings controlled by -Wswitch into errors. This switch takes a negative form, to be used to negate -Werror for specific warnings; for example -Wno-error=switch makes -Wswitch warnings not be errors, even when -Werror is in effect.
The warning message for each controllable warning includes the option that controls the warning. That option can then be used with -Werror= and -Wno-error= as described above. (Printing of the option in the warning message can be disabled using the -fno-diagnostics-show-option flag.)
Note that specifying -Werror=foo automatically implies -Wfoo. However, -Wno-error=foo does not imply anything.
as you can read in the GCC documentation.
Is it possible to deduce the format specifier programmatically for a data type?
Not easily nor directly with printf(), yet...
Yes, with limitations to a select set of types, by using _Generic.
This could be done in various ways and used with *printf() only with great difficulty, yet I found a similar approach to printing data without specifying individual format specifiers in this example:
Formatted print without the need to specify type matching specifiers using _Generic
Note: This code has a coding hole concerning pointer math, that I have since patched - though not posted.
GPrintf("Name is ", GP(name), " is ", GP(dob), GP_eol);
The key was to use _Generic(parameter) to steer the selection of the code used to convert the type to text by having the macro GP(x) expand to 2 parts: a string and x. Then GPrintf() interprets the arguments.
This is akin to @Michaël Roy's comment, yet staying in C rather than C++.
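To sketch what such a deduction looks like in practice (FMT and PRINT are hypothetical names, and the specifier chosen per type is an arbitrary pick, which is exactly the ambiguity melpomene pointed out):
#include <stdio.h>

/* C11 _Generic selects a format string based on the argument's type.
   Note the forced choices: unsigned int could just as well map to %o or %x. */
#define FMT(x) _Generic((x), \
    int:          "%d",      \
    unsigned int: "%u",      \
    long:         "%ld",     \
    double:       "%f",      \
    char *:       "%s")

#define PRINT(x) printf(FMT(x), (x))

int main(void)
{
    long var = 42L;
    PRINT(var);      /* FMT(var) selects "%ld" at compile time */
    putchar('\n');
    return 0;
}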

How can printf issue a compiler warning?

I was wondering how can a function issue a compile-time warning?
This came to my mind because when we supply the wrong format specifier in the first argument of printf (or scanf) for the variable matched with that specifier, and compile with gcc with the -Wall option on, the compiler issues a warning.
Now, printf and scanf are regularly implemented variadic functions as I understand, and I don't know of any way to check the value of a string at compile time, let alone issue a warning if something doesn't match.
Can someone explain me how I get compiler warning then?
Warnings are implementation (i.e. compiler & C standard library) specific. You could have a compiler giving very few warnings (look into tinycc...), or even none...
I'm focusing on a recent GCC (e.g. 4.9 or 10...) on Linux.
You are getting such warnings because printf is declared with the appropriate __attribute__ (see GCC function attributes).
(With GCC you can likewise declare your own printf-like functions with the format attribute...)
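For example, a minimal sketch of such a declaration (log_msg is an illustrative name): the format attribute tells GCC that argument 1 is a printf-style format and that checking starts at argument 2, so -Wall diagnoses mismatches exactly as it does for printf itself.
#include <stdarg.h>
#include <stdio.h>

/* Declared printf-like: gcc/clang check calls against the format string. */
__attribute__((format(printf, 1, 2)))
void log_msg(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

int main(void)
{
    log_msg("%s\n", "hello");  /* fine */
    /* log_msg("%d\n", "oops"); -- gcc -Wall would warn: '%d' vs 'char *' */
    return 0;
}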
BTW, a standard conforming compiler is free to implement very specially the <stdio.h> header. So it could process #include <stdio.h> without reading any header file but by changing its internal state.
And you could even add your own function attributes, e.g. by customizing GCC with your own GCC plugin.
How can printf issue a compiler warning?
Some compilers analyze the format and the types of the other arguments to printf() and scanf() at compile time.
printf("%ld", 123); // type mis-match `long` vs. `int`
int x;
printf("%ld", &x); // type mis-match 'long *` vs. `int *`
Yet if the format is computed, then that check does not happen as it is a run-time issue.
const char *format = foo();
printf(format, 123); // mis-match? unknowable.
You're absolutely right that it's unusual for a compiler to warn about specific functions.
Warnings about printf (and scanf, and related) format specifiers are quite unusual -- but then, these functions are quite unusual in the first place.
As other answers have explained, it's at least possible for a compiler to "know" about certain functions and to perform special, extra, compile-time checks like this -- and given that printf and scanf and friends are simultaneously very unusual and very popular, it's quite appropriate for compilers to be doing this extra checking, unusual though it is.
Once upon a time (I'm talking about the pre-ANSI, K&R days here), C programmers knew they had to be careful about calling functions with the correct number and type of arguments. (In those days, the only way to automatically check that was to use lint, which some programmers did but many programmers didn't.) And if you were used to being careful, it was easy to be careful about printf and friends, also.
Today, though, it's a different story. ANSI C function prototypes have been in use for a generation. Most programmers today implicitly expect a compiler to automatically convert the types of function arguments, and to complain about incompatible mismatches. (As an example of the way things have changed: in the old days, calling sqrt(144) was an error that quietly gave mysterious results, but today it's fine.)
So today, I have a great deal of sympathy for programmers who are learning C, and are baffled by printf. If you're completely used to the protections afforded to you by function prototypes, it's a pretty great mystery why
int i = 3;
float f = 4.5;
printf("i as a float is %f, f as an int is %d\n", i, f);
doesn't work. Unlike the old days, I suspect, it is very hard to remember that, when you call printf (but pretty much only when you call printf), it's your job to get all the types right, because the compiler won't insert any implicit conversions.
The bottom line is that, today, not only is it possible for a compiler to warn about mismatches in calls to printf and the like, I believe it's pretty much a moral imperative. When we introduced function prototypes, we promised programmers type safety for function arguments, so it's really not fair to quietly withdraw that promise when it comes to printf.
[P.S. Yes, of course I know why function prototypes can't promise complete type safety for varargs functions like printf. But that's got nothing to do with my argument here. Also, yeah, I know, life isn't fair, so call me an old softie with my highfalutin talk of "moral imperatives". :-) ]

Why weren't new (bit width specific) printf() format option strings adopted as part of C99?

While researching how to do cross-platform printf() format strings in C (that is, taking into account the number of bits I expect each integer argument to printf() to have) I ran across this section of the Wikipedia article on printf(). The article discusses non-standard options that can be passed to printf() format strings, such as (what seems to be a Microsoft-specific extension):
printf("%I32d\n", my32bitInt);
It goes on to state that:
ISO C99 includes the inttypes.h header file that includes a number of macros for use in platform-independent printf coding.
... and then lists a set of macros that can be found in said header. Looking at the header file, to use them I would have to write:
printf("%"PRId32"\n", my32bitInt);
My question is: am I missing something? Is this really the standard C99 way to do it? If so, why? (Though I'm not surprised that I have never seen code that uses the format strings this way, since it seems so cumbersome...)
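For reference, a complete, compilable version of that usage (my32bitInt is the question's own illustrative name):
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t my32bitInt = 42;

    /* PRId32 expands to the correct specifier string for int32_t on this
       platform (e.g. "d" or "ld"); adjacent string literals concatenate. */
    printf("%" PRId32 "\n", my32bitInt);
    return 0;
}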
The C Rationale seems to imply that <inttypes.h> is standardizing existing practice:
<inttypes.h> was derived from the header of the same name found on several existing 64-bit systems.
but the remainder of the text doesn't say anything more about those macros, and I don't remember them being existing practice at the time.
What follows is just speculation, but educated by experience of how standardization committees work.
One advantage of the C99 macros over standardizing additional format specifiers for printf (note that C99 also did add some) is that providing <inttypes.h> and <stdint.h>, when you already have an implementation supporting the required features in an implementation-specific way, is just a matter of writing two files with adequate typedefs and macros. That reduces the cost of making an existing implementation conformant, reduces the risk of breaking existing programs which made use of the implementation-specific features (the standard way doesn't interfere), and facilitates porting conformant programs to implementations which don't have these headers (they can be provided with the program). Additionally, if the implementation-specific ways already varied at the time, it doesn't favor one implementation over another.
Correct, this is how the C99 standard says you should use them. If you want truly portable code that is 100% standards-conformant to the letter, you should always print an int using "%d" and an int32_t using "%"PRId32.
Most people won't bother, though, since there are very few cases where failure to do so would matter. Unless you're porting your code to Win16 or DOS, you can assume that sizeof(int32_t) <= sizeof(int), so it's harmless to accidentally printf an int32_t as an int. Likewise, a long long is pretty much universally 64 bits (although it is not guaranteed to be so), so printing an int64_t as a long long (e.g. with a %llx specifier) is safe as well.
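If you do rely on that assumption, one way to make it explicit is a compile-time check; a minimal sketch using C11's static_assert:
#include <assert.h>   /* static_assert macro (C11) */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Make the "int32_t fits in int" assumption explicit, so the build
       breaks on any exotic platform where it does not hold. */
    static_assert(sizeof(int32_t) <= sizeof(int),
                  "int32_t does not fit in int");
    int32_t v = 42;
    printf("%d\n", (int)v);   /* the cast makes %d exactly right */
    return 0;
}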
The types int_fast32_t, int_least32_t, et al are hardly ever used, so you can imagine that their corresponding format specifiers are used even more rarely.
You can always cast upwards and use %jd which is the intmax_t format specifier.
printf("%jd\n", (intmax_t)(-2));
I used intmax_t to show that any intXX_t can be handled this way, but for the int32_t case simply casting to long and using %ld is much better.
I can only speculate about why. I like AProgrammer's answer above, but there's one aspect overlooked: what are you going to add to printf as a format modifier? There are already two different ways that numbers are used in a printf format string (width and precision). Adding a third kind of number to say how many bits of precision are in the argument would be great, but where are you going to put it without confusing people? Unfortunately, one of the flaws in C is that printf was not designed to be extensible.
The macros are awful, but when you have to write code that is portable across 32-bit and 64-bit platforms, they are a godsend. Definitely saved my bacon.
I think the answer to your question why is either
Nobody could think of a better way to do it, or
The standards committee couldn't agree on anything they felt was clearly better.
Another possibility: backward compatibility. If you add more format specifiers to printf, or additional options, it is possible that a specifier in some pre-C99 code would have a format string interpreted differently.
With the C99 change, you're not changing the functionality of printf.
