Promotions and conversions of variables in printf() - c

In this case,
#include <stdio.h>
int main()
{
unsigned char a = 1;
printf("%hhu", -a);
return 0;
}
In the argument -a, the operand a is promoted to int by the integer promotions performed for the unary minus operator; the int result then undergoes the default argument promotions (which leave it unchanged); and finally it is converted to unsigned char by the format specifier.
So -a => -(int)a (by unary -) => no conversion by the function call => (unsigned char)-(int)a (by %hhu). Is my reasoning right?

You are correct that a is promoted to int in -a, and that printf("%hhu", -a); passes an int to printf. The notional conversion performed with %hhu is not clear.
Note that if a is not zero, then -a produces a value (in an int) that is not an unsigned char value. Further, with two’s complement eight-bit signed char, if a is greater than 128, then -a produces a value that is not a signed char value.
To understand %hhu, we look at the specification for u in C 2018 7.21.6.1 8:
The unsigned int argument is converted to unsigned octal (o), unsigned decimal (u),…
and for hh in 7.21.6.1 7:
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
First we have to resolve this issue of “signed char or unsigned char”. Does this say we can pass either a signed char or an unsigned char for %hhu? I think not; I think the authors have just put together the language for %hhd (intended to convert a signed char) and %hhu (intended to convert an unsigned char). So I believe the intent is that a promoted unsigned char should be passed for the %hhu conversion specification.
Apple Clang 11.0.0 seems to agree. When passing -a (but not a), it warns: “warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]”
As noted above, passing -a may pass a value that cannot result from passing a promoted unsigned char. It may even pass a value that cannot result from passing a promoted signed char or unsigned char. In this case, it can be argued we have violated the requirement to pass an unsigned char, and therefore the C standard does not specify the resulting behavior. Even though it says the passed value shall be converted to an unsigned char, I believe that is a notional conversion, not a specific requirement on the library implementation, and that it also falls under the “as if” rule: it does not actually have to be performed if the resulting defined behavior of programs is the same. But, since passing an improper value may not be defined, we do not have defined behavior.
That may be a strict reading of the rules, but it would not surprise me greatly if printf printed “4294967295” instead of “255” when a were 1.
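A minimal sketch of the well-defined alternative, assuming a C11 compiler and an implementation where INT_MAX >= UCHAR_MAX and CHAR_BIT is 8 (the helper names are hypothetical): _Generic confirms that -a has type int, and an explicit cast back to unsigned char makes the argument a genuine promoted unsigned char, so %hhu gets exactly what it is specified for.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: 1 if -a has type int. The integer promotions turn the
   unsigned char operand into int before unary minus is applied.
   The operand of _Generic is not evaluated. */
static int negation_has_type_int(unsigned char a)
{
    return _Generic(-a, int: 1, default: 0);
}

/* Hypothetical helper: writes -a as an unsigned char value into buf.
   The cast reduces the int value modulo UCHAR_MAX + 1 in the caller, so
   printf receives a promoted unsigned char and %hhu matches it. */
static int format_negated_uchar(char *buf, size_t size, unsigned char a)
{
    return snprintf(buf, size, "%hhu", (unsigned char)-a);
}
```

With a == 1, format_negated_uchar writes "255". The narrowing happens in the caller, so nothing is left for printf to guess about.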

printf is a variadic function. The types of the arguments passed through the ... parameter are not known inside the function. As such, any variadic function must rely on other mechanisms to interpret the types of its variadic (va_arg) arguments. printf and family use a const char* format string to "tell them" what kinds of arguments were passed. Passing a type different than the type expected by its format specifier results in undefined behavior.
For instance:
printf("%f", 24)
is undefined behavior. There is no conversion from int to double anywhere, because the arguments are passed as they are (after promotion), and inside printf the function incorrectly treats the argument after the format string as a double (which is what %f expects). printf does not know, and can't know, that the real type of the argument is int.
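To illustrate the mechanism, here is a hypothetical printf-like function (sum_described is not a library function) that, like printf, trusts a descriptor string to tell it how to read each variadic argument:

```c
#include <stdarg.h>

/* Hypothetical printf-like function: the descriptor string is the only thing
   that tells it how to read each variadic argument.
   'i' means the caller passed an int; 'f' means the caller passed a double
   (a float argument would have been promoted to double anyway). */
static double sum_described(const char *types, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, types);
    for (const char *t = types; *t != '\0'; ++t) {
        if (*t == 'i')
            total += va_arg(ap, int);    /* must really be an int */
        else if (*t == 'f')
            total += va_arg(ap, double); /* must really be a double */
    }
    va_end(ap);
    return total;
}
```

sum_described("if", 24, 0.5) returns 24.5. Calling sum_described("f", 24) would be the same mistake as printf("%f", 24): the function reads a double where an int was passed, and nothing converts it. Passing 24.0 (or using "%d" with printf) fixes it.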
Variadic arguments undergo some promotions of their own. Of interest for your question, unsigned char is promoted to int if int can represent all unsigned char values (the usual case), and to unsigned int otherwise. As such, there is no way for a variadic argument to actually have type unsigned char. So while hhu is indeed the specifier for unsigned char, it will actually receive an int (or unsigned int), which is what you pass to it.
So afaik the code is safe because of the two integer promotions caused by the unary minus and by passing variadic arguments. I am not 100% sure though. Integer promotions are weird and complicated.
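One way to see which promotion applies on a given implementation, assuming a C11 compiler: unary + performs the integer promotions (the same ones applied to integer arguments passed through ...), and _Generic reports the resulting type. PROMOTED_TYPE and promoted_uchar are hypothetical names.

```c
/* Names the type of an expression after the integer promotions.
   Unary + applies exactly those promotions. */
#define PROMOTED_TYPE(x) _Generic(+(x),     \
    int: "int",                             \
    unsigned int: "unsigned int",           \
    long: "long",                           \
    unsigned long: "unsigned long",         \
    default: "other")

/* Hypothetical helper: reports what an unsigned char promotes to here. */
static const char *promoted_uchar(unsigned char uc)
{
    return PROMOTED_TYPE(uc);
}
```

On common implementations promoted_uchar returns "int"; "unsigned int" would appear only where INT_MAX < UCHAR_MAX. Unary + does not model the float-to-double part of the default argument promotions.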

Related

Which format strings are allowed as a fallback?

Let's take the following five examples:
// OK, most correct
printf("%10.4hd XXX", (short) 2);
// OK, no warning
printf("%10.4d XXX", (short) 2);
// Error: [-Werror,-Wformat]
printf("%10.4hhd XXX", (short) 2);
// Error: [-Werror,-Wformat]
printf("%10.4ld XXX", (short) 2);
// Error: [-Werror,-Wformat]
printf("%10.4f XXX", (short) 2);
Why, for example, does the second one work fine, but the third, fourth, and fifth ones do not?
In C, the expression (short)2 is subject to the integer promotion rules before it's passed to printf. That means it becomes an int.
printf provides the h length specifier, but the value it consumes will be an int, not a short because in C there's no way to pass a short value directly to a var-arg function.
From ISO/IEC 9899:TC3 section 7.19.6.1, paragraph 7, on the h length specifier:
Specifies that a following d, i, o, u, x, or X conversion specifier
applies to a short int or unsigned short int argument (the argument
will have been promoted according to the integer promotions, but its
value shall be converted to short int or unsigned short int before
printing); or that a following n conversion specifier applies to a
pointer to a short int argument.
The %hhd format string should also work for similar reasons. I guess the reason that clang warns for this one and not for "%d", (short) is that the latter is technically correct and there's a lot of C code that uses %d for printing shorts.
The %ld is always undefined behavior, but may or may not work in practice depending on whether int and long have the same size and representation (typically this would be if int and long are both 32 bits, as on Windows and most 32-bit platforms).
The %f is always undefined behavior. %f expects a double argument (not a float, since floats are always promoted to double when passed to a var-args function), and you've given it an int.
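A sketch of the five calls rewritten so each argument matches its conversion specification after promotion (format_short_examples is a hypothetical helper; snprintf is used so the results can be inspected):

```c
#include <stdio.h>

/* Each conversion specification gets an argument of the type it expects
   after the default argument promotions. */
static void format_short_examples(char out[5][32], short s)
{
    snprintf(out[0], 32, "%10.4hd XXX", s);              /* short, promoted to int */
    snprintf(out[1], 32, "%10.4d XXX", s);               /* int is what arrives anyway */
    snprintf(out[2], 32, "%10.4hhd XXX", (signed char)s); /* narrowed in the caller */
    snprintf(out[3], 32, "%10.4ld XXX", (long)s);        /* %ld needs a long */
    snprintf(out[4], 32, "%10.4f XXX", (double)s);       /* %f needs a double */
}
```

For s == 2, the four integer variants produce "      0002 XXX" and the %f variant produces "    2.0000 XXX".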
From this site:
When a function with a variable-length argument list is called, the
variable arguments are passed using C's old "default argument
promotions." These say that types char and short int are
automatically promoted to int, and type float is automatically
promoted to double. Therefore, varargs functions will never receive
arguments of type char, short int, or float.
printf is a function with a variable-length argument list. So when you pass in a short, it gets promoted to an int.
Now, even though printf will never receive a short, you can tell it that you started with one. So %...hd tells printf that you passed in a short and it has been promoted to an int, and printf will try to do the right thing with it (convert it back to a short internally).
%...hhd works the same way but you use it when you pass in a char and it gets promoted to an int. I guess that the compiler doesn't allow %...hhd in your case because it is smart enough to notice that you didn't pass in a char (you passed in a short).
%...d works because printf sees an int (and the compiler has decided that it doesn't mind that you didn't use hd).
%...ld and %...f don't work because you didn't pass in a long or a double (%f expects a double), and a short doesn't get promoted to either of those types.
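The default argument promotions can be observed from the receiving side. In this sketch (both functions are hypothetical), the caller passes float and char/short values, and the callee has to read them back as double and int. va_arg(ap, float) would be undefined behavior, because a float argument never arrives as a float.

```c
#include <stdarg.h>

/* Hypothetical: sums count arguments that the caller wrote as float.
   Because of the default argument promotions they arrive as double. */
static double sum_promoted_floats(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}

/* Hypothetical: sums count arguments that the caller wrote as char or short.
   They arrive as int on implementations where int can represent all their
   values (the usual case). */
static int sum_promoted_chars(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```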

Is it UB to give a char argument to printf where printf expects a int?

Do I understand the standard correctly that this program causes UB:
#include <stdio.h>
int main(void)
{
char a = 'A';
printf("%c\n", a);
return 0;
}
When it is executed on a system where sizeof(int)==1 && CHAR_MIN==0?
Because if a is unsigned and has the same size (1) as an int, it will be promoted to an unsigned int [1] (2), and not to an int, since an int cannot represent all values of a char. The format specifier "%c" expects an int [2], and using the wrong signedness in printf() causes UB [3].
Relevant quotes from ISO/IEC 9899 for C99
[1] Promotion to int according to C99 6.3.1.1:2:
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions. All other types are
unchanged by the integer promotions.
[2] The format specifier "%c" expects an int argument, C99 7.19.6.1:8 c:
If no l length modifier is present, the int argument is converted to
an unsigned char, and the resulting character is written.
[3] Using the wrong type in fprintf() (3), including wrong signedness, causes UB according to C99 7.19.6.1:9:
... If any argument is not the correct type for the corresponding
conversion specification, the behavior is undefined.
The exception for same type with different signedness is given for the va_arg macro but not for printf() and there is no requirement that printf() uses va_arg (4).
Footnotes (marked with (n) in the text):
(1) This implies INT_MAX==SCHAR_MAX, because char has no padding.
(2) See also this question: Is unsigned char always promoted to int?
(3) The same rules are applied to printf(), see C99 7.19.6.3:2
(4) See also this question: Does printf("%x",1) invoke undefined behavior?
A program can have undefined behavior or not depending on the characteristics of the implementation.
For example, a program that executes
int x = 32767;
x++;
(and is otherwise well defined) has well defined behavior on an implementation with INT_MAX > 32767, and undefined behavior otherwise.
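A sketch of making that dependence explicit, assuming a hypothetical helper checked_increment: compare against the implementation's INT_MAX from limits.h instead of assuming a width.

```c
#include <limits.h>

/* Hypothetical helper: whether x + 1 is defined depends on INT_MAX, so test
   against the implementation's limit. Returns 0 (and leaves *out untouched)
   when the increment would overflow, 1 otherwise. */
static int checked_increment(int x, int *out)
{
    if (x == INT_MAX)
        return 0;
    *out = x + 1;
    return 1;
}
```

checked_increment(32767, &y) succeeds where INT_MAX > 32767 and reports failure on an implementation with a 16-bit int.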
Your program:
#include <stdio.h>
int main(void)
{
char a='A';
printf("%c\n",a);
return 0;
}
has well defined behavior for any hosted implementation with INT_MAX >= CHAR_MAX. On any such implementation, the value of 'A' is promoted to int, which is what %c expects.
If INT_MAX < CHAR_MAX (which implies that plain char is unsigned and that CHAR_BIT >= 16), the value of a is promoted to unsigned int. N1570 7.21.6.1p9:
If any argument is not the correct type for the corresponding
conversion specification, the behavior is undefined.
implies that this has undefined behavior.
In practice, (a) such implementations are rare, likely nonexistent (the only existing C implementations I've heard of with CHAR_BIT > 8 are for DSPs and such implementations are likely to be freestanding), and (b) any such implementation would probably be designed to handle such cases gracefully.
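If a program wants to rule out the exotic case entirely, one option (an assumption on my part, not something the standard requires) is a C11 static assertion that char promotes to int, next to the ordinary %c usage. format_char is a hypothetical helper.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Refuse to compile where plain char would promote to unsigned int
   instead of the int that %c expects. */
_Static_assert(INT_MAX >= CHAR_MAX, "char does not promote to int here");

/* Hypothetical helper: writes a character and a newline, as
   printf("%c\n", a) would print them. a is promoted to int. */
static int format_char(char *buf, size_t size, char a)
{
    return snprintf(buf, size, "%c\n", a);
}
```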
TL;DR there is no UB (in my interpretation at any rate).
6.2.5 types
6. For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
9. The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same 41)
41) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
Furthermore
7.16.1.1 The va_arg macro
2 The va_arg macro expands to an expression that has the specified type and the value of the next argument in the call. [...] If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
7.21.6.8 The vfprintf function
288) [...] functions vfprintf, vfscanf, vprintf, vscanf, vsnprintf, vsprintf, and vsscanf invoke the va_arg macro [...]
Thus, it stands to reason that an unsigned type is not "an incorrect type for the corresponding (signed) conversion specification", as long as the value is within the range.
This is corroborated by the fact that major compilers do not warn about signed/unsigned format specification mismatch, even though they do warn about other mismatches, even when the corresponding types have the same representation on a given platform (e.g. long and long long).
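The va_arg exception quoted above can be illustrated with a hypothetical helper that reads an int while the caller passed an unsigned int whose value is representable in both types. This shows only the va_arg rule; printf itself is not required to be implemented with va_arg.

```c
#include <stdarg.h>

/* Hypothetical helper: reads its variadic argument (if any) as an int.
   Per the va_arg exception, this is still defined when the caller passed
   an unsigned int whose value is representable in both types. */
static int read_as_int(int count, ...)
{
    va_list ap;
    int value = 0;

    va_start(ap, count);
    if (count > 0)
        value = va_arg(ap, int);
    va_end(ap);
    return value;
}
```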
Do I understand the standard correctly that this program causes UB:
#include <stdio.h>
int main(void)
{
char a='A';
printf("%c\n",a);
return 0;
}
When it is executed on a system where sizeof(int)==1 && CHAR_MIN==0?
That would be a plausible interpretation of the standard. However, in the event that an implementation with such a combination of type characteristics were produced for genuine use, I have full confidence that it would provide appropriate support for the %c directive -- as an extension, if one wants to interpret it that way. The example program would then have well-defined behavior with respect to that implementation, whether or not the C standard is interpreted to define that behavior, too. I suppose I account that quality-of-implementation issue as being rolled up in "for genuine use".

Can integer promotion happen in the reverse order (e.g. long int to int) in variadic functions like printf()?

#include<stdio.h>
int main() {
long a = 9;
printf("a = %d",a);//output is 9 but with a warning 'expecting long int'
}
Why can't long here be converted to int?
Variadic functions in general, and the printf family in particular, are odd special cases. They are notorious for their non-existent type safety, so if you pass the wrong type or use the wrong format string, you invoke undefined behavior and anything can happen.
In your case, most likely int and long happen to have the same representation so the program works despite the warning.
In the case of a regular function with a prototype, though, there is a kind of "demotion" taking place if you pass a larger integer type to a function expecting a smaller one. When this happens, the argument is converted from the larger type to the smaller one, as if by assignment. That conversion is well-defined when the value fits in the smaller type. If it does not fit, the result is still well-defined (reduced modulo) for an unsigned target type, but implementation-defined for a signed one.
Compilers tend to warn against such implicit conversions, so it is better to do the conversion explicitly with a cast.
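A sketch of the non-variadic case (take_int and long_to_int are hypothetical): with a prototype in scope, the long argument is converted to int automatically, and a range check makes the explicit conversion safe for any value.

```c
#include <limits.h>

/* With this prototype in scope, a long argument is converted to int at the
   call, as if by assignment. */
static int take_int(int v)
{
    return v;
}

/* Hypothetical helper: converts explicitly, and only when the value fits,
   so no implementation-defined result is possible. Returns 0 on failure. */
static int long_to_int(long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return 0;
    *out = (int)value;
    return 1;
}
```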
Because that's the way variadic functions behave in C language. printf is just a function from the standard library and has no special processing. It is declared as
int printf(const char * restrict format, ...);
And the standard (n1256 draft for C99) says (emphasis mine):
6.5.2.2 Function calls...
6 If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions...
7 ... The ellipsis notation in a function prototype declarator causes
argument type conversion to stop after the last declared parameter. The default argument
promotions are performed on trailing arguments.
That means that for all the trailing arguments to printf (everything after the format string), float arguments are converted to double and the integer promotions are performed on integer arguments.
And in 6.3.1.1 Arithmetic operands / Boolean, characters, and integers
2 The following may be used in an expression wherever an int or unsigned int may
be used:
— An object or expression with an integer type whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the integer
promotions.48) All other types are unchanged by the integer promotions.
So as long has a rank greater than int, it is left unchanged by an integer promotion, and the format shall be adapted to accept a long:
long a = 9;
printf("a = %ld",a);
The following passes a long, yet printf() expects an int due to "%d". Result: undefined behavior (UB). If int and long are the same size, that UB might look like everything is OK, or it may fail. It is UB.
long a = 9;
printf("a = %d",a); // UB
Why can't long here be converted to int?
long can be converted, yet the code did not request that conversion, as the code below does.
printf("a = %d", (int) a); // OK
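Both well-defined forms, sketched with snprintf so the output can be checked (the helper names are hypothetical):

```c
#include <stddef.h>
#include <stdio.h>

/* Match the format to the type: %ld for a long. */
static int format_long_ld(char *buf, size_t size, long a)
{
    return snprintf(buf, size, "a = %ld", a);
}

/* Convert the value to the type the format expects. Only safe when a is
   within INT_MIN..INT_MAX; otherwise the conversion is implementation-defined. */
static int format_long_as_int(char *buf, size_t size, long a)
{
    return snprintf(buf, size, "a = %d", (int)a);
}
```

Each helper writes "a = 9" for a == 9.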
Can integer promotion happen in the reverse order ... in variadic functions like printf()?
Integer promotions do not happen in the reverse order unless the code explicitly down-casts or assigns to a narrower type.
There are cases where demotion will appear to be true with printf().
The below promotes sc to int as it is passed to printf(). printf() will take that int and due to "%hhd" will convert it to signed char and then print that numeric value.
signed char sc = 1;
printf("a = %hhd", sc); // prints 1
The below passes i to printf() as an int. printf() will take that int and due to "%hhd" will convert it to signed char and then print that numeric value. So in this case it looks like i was demoted.
int i = 0x101;
printf("a = %hhd", i); // prints 1
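The same effect can be made explicit, so the code does not rely on printf to do the narrowing. This is a sketch with a hypothetical helper. Converting 0x101 to signed char is implementation-defined, and the result 1 assumes the usual 8-bit, two's complement signed char.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the int is converted to signed char before the call,
   then promoted back to int by the variadic call. That is exactly the
   argument %hhd is specified for. */
static int format_as_schar(char *buf, size_t size, int i)
{
    return snprintf(buf, size, "a = %hhd", (signed char)i);
}
```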

Is it illegal to use the h or hh length modifiers when the corresponding argument to printf was not a short / char?

The printf family of functions provide a series of length modifiers, two of them being hh (denoting a signed char or unsigned char argument promoted to int) and h (denoting a signed short or unsigned short argument promoted to int). Historically, these length modifiers have only been introduced to create symmetry with the length modifiers of scanf and are rarely used for printf.
Here is an excerpt of ISO 9899:2011 §7.21.6.1 “The fprintf function” ¶7:
7 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
h Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
...
Ignoring the case of the n conversion specifier, what do these almost identical paragraphs say about the behaviour of h and hh?
In this answer, it is claimed that passing an argument that is outside the range of a signed char, signed short, unsigned char, or unsigned short resp. for a conversion specification with an h or hh length modifier resp. is undefined behaviour, as the argument wasn't converted from type char, short, etc. resp. before.
I claim that the function operates in a well-defined manner for every value of type int and that printf behaves as if the parameter was converted to char, short, etc. resp. before conversion.
One could also claim that invoking the function with an argument that was not of the corresponding type before default argument promotion is undefined behaviour, but this seems abstruse.
Which of these three interpretations of §7.21.6.1¶7 (if at all) is correct?
The standard specifies:
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
[C2011 7.21.6.1/9]
What is meant by "the correct type", is conceivably open to interpretation, but the most plausible interpretation to me is the type that the conversion specification "applies to" as specified earlier in the same section, and as quoted, in part, in the question. I take the parenthetical comments about argument promotion to be acknowledging the ordinary argument-passing rules, and avoiding any implication of these functions being special cases. I do not take the parenthetic comments as relevant to determining the "correct type" of the argument.
What actually happens if you pass an argument of wider type than is correct for the conversion specification is a different question. I am inclined to believe that the C system is unlikely to be implemented by anybody such that it makes a difference whether a printf() argument is actually a char, or whether it is an int whose value is in the range of char. I assert, however, that it is valid behavior for the compiler to check argument type correspondence with the format, and to reject the program if there is a mismatch (because the required behavior in such a case is explicitly undefined).
On the other hand, I could certainly imagine printf() implementations that actually misbehave (print garbage, corrupt memory, eat your lunch) if the value of an argument is outside the range implied by the corresponding conversion specifier. This also is permissible on account of the behavior being undefined.
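Whichever interpretation is right, converting the value to the narrow type in the caller keeps the call on safe ground. This sketch assumes a 16-bit unsigned short (format_low_bits is a hypothetical name). Conversion to an unsigned type is always well-defined, so %hx then receives a genuine promoted unsigned short.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the conversion to unsigned short is reduced modulo
   USHRT_MAX + 1, so printf sees a promoted unsigned short under any reading
   of 7.21.6.1. */
static int format_low_bits(char *buf, size_t size, unsigned int value)
{
    return snprintf(buf, size, "%hx", (unsigned short)value);
}
```

For 0x12345 this writes "2345".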

Format specifier for unsigned char

Say I want to print unsigned char:
unsigned char x = 12;
Which is correct, this:
printf("%d",x);
or this:
printf("%u",x);
?
The thing is elsewhere on SO I encountered such discussion:
-Even with ch changed to unsigned char, the behavior of the code is not defined by the C standard. This is because the unsigned char is promoted to an int (in normal C implementations), so an int is passed to printf for the specifier %u. However, %u expects an unsigned int, so the types do not match, and the C standard does not define the behavior
-Your comment is incorrect. The C11 standard states that the conversion specifier must be of the same type as the function argument itself, not the promoted type. This point is also specifically addressed in the description of the hh length modifier: "the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing"
So which is correct? Is there any reliable source on this matter? (In that sense, should we also print unsigned short int with %d because it can be promoted to int?)
The correct one is*:
printf("%d",x);
This is because of default argument promotions as printf() is variadic function. This means that unsigned char value is always promoted to int.
From N1570 (C11 draft) 6.5.2.2/6 Function calls (emphasis mine going forward):
If the expression that denotes the called function has a type that
does not include a prototype, the integer promotions are performed on
each argument, and arguments that have type float are promoted to
double. These are called the default argument promotions.
and 6.5.2.2/7 subclause tells:
The ellipsis notation in a function prototype declarator causes
argument type conversion to stop after the last declared parameter.
The default argument promotions are performed on trailing arguments.
These integer promotions are defined in 6.3.1.1/2 Boolean, characters, and integers:
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions.
This quote answers your second question of unsigned short (see comment below).
* except on implementations where int cannot represent every unsigned char value (e.g. where unsigned char and int are both 16 bits wide); see chux's answer.
Correct format specifier for unsigned char x = 12 depends on a number of things:
If INT_MAX >= UCHAR_MAX, which is often the case, use "%d". In this case an unsigned char is promoted to int.
printf("%d",x);
Otherwise use "%u" (or "%x", "%o"). In this case an unsigned char is promoted to unsigned.
printf("%u",x);
Up-to-date compilers support the "hh" length modifier, which compensates for this ambiguity. Should x get promoted to int or unsigned due to the standard promotions of variadic parameters, printf() converts it to unsigned char before printing.
printf("%hhu",x);
If dealing with an old compiler without "hh" or seeking highly portable code, use explicit casting
printf("%u", (unsigned) x);
The same issue and answer apply to unsigned short, except that the test is INT_MAX >= USHRT_MAX and the length modifier is "h" instead of "hh".
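Here are the options from this answer side by side, as a sketch (format_uchar_options is a hypothetical helper). The preprocessor test picks "%d" or "%u" depending on how unsigned char promotes:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: each buffer receives x formatted one of the ways
   described above. */
static void format_uchar_options(char out[3][8], unsigned char x)
{
#if INT_MAX >= UCHAR_MAX
    snprintf(out[0], 8, "%d", x);           /* x is promoted to int */
#else
    snprintf(out[0], 8, "%u", x);           /* x is promoted to unsigned int */
#endif
    snprintf(out[1], 8, "%hhu", x);         /* converted back to unsigned char */
    snprintf(out[2], 8, "%u", (unsigned)x); /* explicit cast, no hh needed */
}
```

For x == 12 all three buffers contain "12".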
For cross platform development, I typically bypass the promoting issue by using inttypes.h
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/inttypes.h.html
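This header (which is in the C99 standard) defines printf format macros for the fixed-width integer types from stdint.h. So if you want to print a uint8_t (a syntax which I highly suggest using instead of unsigned char) I would use
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
int main(void) {
uint8_t x = 12; /* initialized: printing an indeterminate value is not meaningful */
printf("%" PRIu8 "\n", x);
}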
Both unsigned char and unsigned short can always safely be printed with %u. Default argument promotions convert them either to int or to unsigned int. If they are promoted to the latter, everything is fine (the format specifier and the type passed match), otherwise C11 (n1570) 6.5.2.2 p6, first bullet, applies:
one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
The standard is quite clear that default argument promotions apply to the variadic arguments of printf, e.g. it's mentioned again for the (mostly useless) h and hh length modifiers (ibid. 7.21.6.1 p7, emph. mine):
hh -- Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); [...]
