Here's what libc has to say about variadic functions:
Since the prototype doesn’t specify types for optional arguments, in a call to a variadic function the default argument promotions are performed on the optional argument values. This means the objects of type char or short int (whether signed or not) are promoted to either int or unsigned int, as appropriate; and that objects of type float are promoted to type double. So, if the caller passes a char as an optional argument, it is promoted to an int
Then why would anyone use "%c" or "%hd" in printf? They should just use "%d".
I also see that there's no format specifier for float. float has to live with %f which is for double since due to promotions, it's not possible to receive a float as a variadic argument.
I know for scanf, the arguments are pointers and no promotion happens.
Is there any reason I am missing why and when "%c" must exist for printfs?
Then why would anyone use "%c" or "%hd" in printf? They should just use "%d".
One would use %c to interpret the integer as its character code (i.e. print 'A' instead of 65). One would use %hd to instruct printf to drop the upper portion of the short that may have been added as part of sign-extending the short value passed in. Both formats offer an alternative interpretation of an int.
I also see that there's no format specifier for float.
That's correct: since the value has already been promoted to double, there is no need for a separate specifier.
I'm trying to understand implicit datatype conversions in C. I thought that I had understood this topic, but yet the following code example is still confusing me.
Specifically, I have read about Usual Arithmetic Conversions and Integer Promotion previously from drafts of the C Standard.
unsigned short int a = 0;
printf("\n%lld", (signed int)a - 1);
I am compiling using GCC.
unsigned short int is 2 bytes.
int is 4 bytes.
When I run this code, I get the following result: 4294967295
I expected the result -1.
This is what I expected to happen:
Typecast takes precedence, and LHS of - becomes signed int.
- operation is carried out. No integer promotion or implicit conversions occur here, as LHS and RHS are already both signed int. The result of the operation is -1 with datatype signed int.
Within printf statement, value -1 is retained within the conversion to long long int, and -1 is displayed as the result.
Can someone please explain where the flaw in my logic is?
It's undefined behaviour because %lld is the wrong format specifier for an argument of type int.
Yes indeed (signed int)a - 1 is an int type with value -1, but the printf call is the undefined part. There's nothing in the C standard to suggest that a conversion to long long occurs.
Within printf statement, value -1 is retained within the conversion to long long int
There's no such conversion taking place. printf (family of functions) is dumb and needs a format string that corresponds to the types of the argument list.
printf does not work like an ordinary function void f (long long int x), which would have forced an implicit conversion of the argument to the type of the parameter ("as if by assignment"). This would have given you the expected "sign extension".
Notably, there's another kind of specialized implicit conversion going on here, called the default argument promotions, which only applies to variable argument functions and functions with no prototype.
C17 6.5.2.2/6
If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
C17 6.5.2.2/7 regarding variable argument functions:
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
In practice this means:
float passed to printf gets implicitly converted to double during function call.
Small integer types passed to printf get implicitly converted during function call as per integer promotions, most likely ending up as int.
Other types passed to printf do not get implicitly promoted during the function call.
And then the passed and potentially converted argument gets treated internally as if it was the type specified by the conversion specifier. If that one doesn't match the actual type, the code has undefined behavior.
In your case you pass an int, it doesn't get implicitly promoted, but as printf treats it as a long long, you get undefined behavior.
Here you can consider yourself lucky. a is an unsigned short int that would be promoted to int in a - 1 anyway, even without the cast, so
unsigned short int a = 0;
printf("\n%d", (signed int)a - 1);
and
unsigned short int a = 0;
printf("\n%d", a - 1);
would have the same behaviour, if all values of unsigned short are representable in int (as they are in your case). The result of the conversion is an int. Now, for the variable arguments, the default argument promotions are applied: any integer type smaller than int is converted to int if all of its values are representable, otherwise to unsigned int. But %lld expects a long long int, which is 8 bytes wide here. The default argument promotions do not promote int to long long int.
Now comes the luck part - you did get a wrong value. See, since the behaviour is undefined you could have gotten the value that you're expecting, this time - after all it is completely feasible on a 64-bit processor!
Assuming the following:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?
what is the correct format for a 1 byte signed number?
%hh and the integer conversion specifier of your choice (for example, %02hhX). See the C11 standard, §7.21.6.1p5:
hh
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers think that that means that it is unnecessary to use h and hh qualifiers. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.
However, char may well be signed, and the integer promotion will preserve its value, which will make it into a signed integer. Printing the signed integer out with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.
In case that wasn't clear, a simple (but controversial) example:
/* Read the comments thread to this post; I'll remove
this note when I edit the outcome of the discussion into
the answer
*/
#include <stdio.h>
int main(void) {
char* s = "\u00d1"; /* Ñ */
for (char* p = s; *p; ++p) printf("%02X (%02hhX)\n", *p, *p);
return 0;
}
Output:
$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)
In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."
However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char (or a plain char that is unsigned) to an unsigned int will only happen on an implementation where sizeof(int) == 1.)
So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.
Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above, emphasis added).
Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)
So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.
For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.1/p8), emphasis added:
the int argument is converted to an unsigned char, and the resulting character is written.
If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:
void f(char c) {
printf("This is a '%c'.\n", c);
}
was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.
The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call" which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.
I don't think C is actually different, at least in intent. It uses wording like "the arguments ... are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".
I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)
I saw these two specifiers, %f and %lf, in a C example in a C book, but the author didn't elaborate on what the difference between the two is. I know that %f specifies that a float should take its place. I tried looking this up but had a hard time searching for these symbols. What about %lf?
The short answer is that it has no impact on printf, and denotes use of float or double in scanf.
For printf, arguments of type float are promoted to double so both %f and %lf are used for double. For scanf, you should use %f for float and %lf for double.
More detail for the language lawyers among us below:
There is no difference between %f and %lf in the printf family. The ISO C standard (all references within are from C11), section 7.21.6.1 The fprintf function, paragraph /7 states, for the l modifier (my emphasis):
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.
The reason it doesn't need to modify the f specifier is because that specifier already denotes a double, from paragraph /8 of that same section where it lists the type for the %f specifier:
A double argument representing a floating-point number is converted to decimal notation
That has to do with the fact that arguments following the ellipsis in the function prototype are subject to default argument promotions as per section 6.5.2.2 Function calls, paragraph /7:
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
Since printf (and the entire family of printf-like functions) is declared as int printf(const char * restrict format, ...); with the ellipsis notation, that rule applies here. The default argument promotions are covered in section 6.5.2.2 Function calls, paragraph /6:
If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
For the scanf family, the l modifier mandates a pointer to double rather than a pointer to float. Section 7.21.6.2 The fscanf function /11:
Specifies that a following d, i, o, u, x, X, or n conversion specifier applies to an argument with type pointer to long int or unsigned long int; that a following a, A, e, E, f, F, g, or G conversion specifier applies to an argument with type pointer to double; or that a following c, s, or [ conversion specifier applies to an argument with type pointer to wchar_t.
This modifies the /12 paragraph of that section that states, for %f:
Matches an optionally signed floating-point number, infinity, or NaN, whose format is the same as expected for the subject sequence of the strtod function. The corresponding argument shall be a pointer to floating.
For scanf, %f reads into a float, and %lf reads into a double.
For printf: In C99 and later, they both are identical, and they print either a float or a double. In C89, %lf caused undefined behaviour although it was a common extension to treat it as %f.
The reason that one specifier can be used for two different types in printf is because of the default argument promotions; arguments of type float are promoted to double when used to call a function and not matching a parameter in a function prototype. So printf just sees a double in either case.
The length modifier in %lf is gracefully ignored by printf(). Or, to be more accurate, %f takes a double - varargs will always promote float arguments to double.
For output using the printf family of functions, the %f and %lf specifiers mean the same thing; the l is ignored. Both require a corresponding argument of type double — but an argument of type float is promoted to double, which is why there’s no separate specifier for type float. (This promotion applies only to variadic functions like printf and to functions declared without a prototype, not to function calls in general.) For type long double, the correct format specifier is %Lf.
For input using the scanf family of functions, the floating-point format specifiers are %f, %lf, and %Lf. These require pointers to objects of type float, double, and long double, respectively. (There’s no float-to-double promotion because the arguments are pointers. A float value can be promoted to double, but a float* pointer can’t be promoted to a double* because the pointer has to point to an actual float object.)
But be careful using the scanf functions with numeric input. There is no defined overflow checking, and if the input is outside the range of the type your program’s behavior is undefined. For safety, read input into a string and then use something like strtod to convert it to a numeric value. (See the documentation to find out how to detect errors.)
scanf needs %lf for doubles and printf is okay with just %f
So, why are printf and scanf both okay with %d?
This is what I think the reason is:
A float (%f) typically uses 32 bits whereas a double (%lf) typically uses 64. scanf only receives a pointer and doesn't know how large the object behind it is, so we use %lf to tell scanf that it is writing to a double.
Okay... but then why do we use %d for both scanf and printf? Why not %ld and %d? %ld does exist, but it is for long int. %d is for int, a signed type of at least 16 bits. An int is not changed by the default argument promotions, so printf receives an int and scanf receives a pointer to int, and the same specifier works for both.
Please do correct me if I am wrong or inform me what I can do to make this answer better. I'm pretty sure it is not a perfect answer.
You can think of scanf as converting the input stream into the variables defined in your code. Thus, scanf needs to know the exact size of each variable. In general, the 32-bit and 64-bit IEEE 754 binary floating-point formats are used for float and double in C. So, for scanf, %f means 32-bit and %lf means 64-bit.
Besides, %ld exists and it means long int (at least 32 bits). %lld also exists, which means long long int (at least 64 bits). The above wiki site explains all C data types very well.
See §6.5.2.2/6-7 in the C99 standard.
§6.5.2.2/6 defines the default argument promotions: (emphasis added)
the integer promotions are performed on each argument, and arguments that have type float are promoted to double.
and specifies that these promotions are performed on arguments to a function declared with no prototype (that is, with an empty parameter list () instead of (void), where the latter is used to indicate no arguments).
Paragraph 7 additionally specifies that if the prototype of the function has a trailing ellipsis (...):
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
The integer promotions are defined in §6.3.1.1/2; they apply to
objects or expressions of an integer type whose "integer conversion rank is less than or equal to the rank of int and unsigned int": roughly speaking, any smaller integer type, such as boolean or character types;
bit-fields of type _Bool, int, signed int or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
All other types are unchanged by the integer promotions.
In short, if you have a varargs function, such as printf or scanf:
Integer arguments which are no larger than int are converted to (possibly unsigned) int. (This does not include long.)
Floating point arguments which are no larger than double are converted to double. (This includes float.)
Pointers are unaltered.
Other non-pointer types are unaltered.
So printf doesn't need to distinguish between float and double, because it will never be passed a float. It does need to distinguish between int and long.
But scanf does need to know whether an argument is a pointer to a float or a pointer to a double, because pointers are unchanged by the default argument promotions.
I don't understand why the output of an unsigned int is negative for the following code.
Just like a signed int.
uint32_t yyy=1<<31;
printf("%d\n",yyy);
The output is:
-2147483648
which is -2^31.
The %d format specifier expects an int, not an unsigned int, so the code has undefined behaviour. From the C99 standard section 7.19.6.1 The fprintf function:
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Use %u for unsigned int:
uint32_t yyy=1u<<31;
printf("%u\n",yyy);
Output:
2147483648
It's because %d tells printf to interpret your number as an int.
Use %u instead.
Use %u to output unsigned numbers:
printf("%u\n", yyy);
As many have said, use the %u specifier.
The reason for this is that printf has no way of telling what type any of the extra parameters are (they are read through a va_list), so you, the programmer, have to provide that information using the format string. When you then provide %d, printf will do something like this:
int val;
val = va_arg(ap, int); /* ap is printf's internal va_list */
and read your unsigned int as if it were a signed int.
Because you are printing it as signed. Use %u instead.
printf takes a variable number of arguments. When you call it the compiler will dutifully put them all on the stack. Because it's C there's no reflection — printf can't subsequently infer the types of things it has received. At the bit level you can't prima facie tell a signed integer from an unsigned integer or a float, a suitably small structure, part of a larger struct, etc.
That's why you also have to supply a format string. It tells printf what types to read from the stack and in what order. It depends entirely on that format string, having no ability to verify it.
Hence, as per the one-line answers already posted, if you tell it to interpret a field as a signed quantity then it'll be printed as a signed quantity.
Integers are stored in Two's Complement format. This means there is no way to tell if the number is signed or unsigned just by looking at the value. You must tell the machine which representation you want it to use and keep track of it yourself.
In your example you tell the machine that yyy is unsigned (for type checking) but then ask printf() to treat it as signed by using %d in the format string (it can't get at the type information). If you want to print an unsigned int use %u instead.
You need to use the unsigned int format specifier:
printf("%u\n",yyy);
        ^^
Using the wrong format specifier for printf is undefined behavior. This is covered in the C99 draft standard, section 7.19.6.1 The fprintf function, whose rules on format specifiers also apply to printf. It says:
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
The cppreference page for printf has a nice table specifying the format specifiers available.