Consider the following C program:
#include <stdio.h>
int main(){
int a =-1;
unsigned b=-1;
if(a==b)
printf("%d %d",a,b);
else
printf("Unequal");
return 0;
}
In the line printf("%d %d",a,b);, "%d" is used to print an unsigned type. Does this invoke undefined behavior and why?
Although you are explicitly allowed to use the va_arg macro from <stdarg.h> to retrieve a parameter that was passed as an unsigned as an int (7.15.1.1/2), in the documentation for fprintf (7.19.6.1/9) which also applies to printf, it explicitly states that if any argument is not the correct type for the format specifier - for an unmodified %d, that is int - then the behaviour is not defined.
As @bdonlan notes in a comment, if the value of b (in this case 2^N - 1 for some N) is not representable in an int then it would be undefined behavior to attempt to access the value as an int using va_arg in any case. This would only work on platforms where the representation of an unsigned used at least one padding bit where the corresponding int representation had a sign bit.
Even in the case where the value of (unsigned)-1 can be represented in an int, I still read this as being technically undefined behavior. An implementation would seem to be allowed to use built-in magic instead of va_arg to access the parameters to printf, and if you pass something as an unsigned where an int is required then you have technically violated the contract for printf.
The standard isn't 100% clear on this point. On one hand, you get the specification for va_arg, which says (§7.15.1.1/2):
If there is no actual next argument, or if
type is not compatible with the type of the actual next argument (as promoted according
to the default argument promotions), the behavior is undefined, except for the following
cases:
one type is a signed integer type, the other type is the corresponding unsigned integer
type, and the value is representable in both types;
one type is pointer to void and the other is a pointer to a character type.
On the other hand, you get the specification of printf (§7.19.6.1/9):
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Given that it's pretty much a given that printf will retrieve arguments with va_arg, I'd say you're pretty safe with values that can be represented in the target type, but not otherwise. Since you've converted -1 to an unsigned before you pass it, the value will be out of the range that can be represented in a signed int, so the behavior will be undefined.
Yes, the if will always evaluate to true and the printf will attempt to print an unsigned as a signed. Since the signed type may have trap representations, this may be UB if the sign representation is one's complement.
I'm trying to understand implicit datatype conversions in C. I thought that I had understood this topic, but yet the following code example is still confusing me.
Specifically, I have read about Usual Arithmetic Conversions and Integer Promotion previously from drafts of the C Standard.
unsigned short int a = 0;
printf("\n%lld", (signed int)a - 1);
I am compiling using GCC.
unsigned short int is 2 bytes.
int is 4 bytes.
When I run this code, I get the following result: 4294967295
I expected the result -1.
This is what I expected to happen:
Typecast takes precedence, and LHS of - becomes signed int.
- operation is carried out. No integer promotion or implicit conversions occur here, as LHS and RHS are already both signed int. The result of the operation is -1 with datatype signed int.
Within printf statement, value -1 is retained within the conversion to long long int, and -1 is displayed as the result.
Can someone please explain where the flaw in my logic is?
It's undefined behaviour due to %lld being the inappropriate format specifier for an int type.
Yes indeed (signed int)a - 1 is an int type with value -1, but the printf call is the undefined part. There's nothing in the C standard to suggest that a conversion to long long occurs.
Within printf statement, value -1 is retained within the conversion to long long int
There's no such conversion taking place. printf (family of functions) is dumb and needs a format string that corresponds to the types of the argument list.
printf does not work like an ordinary function void f (long long int x), which would have forced an implicit conversion to the type of the parameter ("as per assignment"/"lvalue conversion"). This would have given you the expected "sign extension".
Notably, there's another kind of specialized implicit conversion going on here called the default argument promotions, which only applies to variable argument functions and functions with no prototype.
C17 6.5.2.2/6
If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions.
C17 6.5.2.2/7 regarding variable argument functions:
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument
promotions are performed on trailing arguments.
In practice this means:
float passed to printf gets implicitly converted to double during function call.
Small integer types passed to printf get implicitly converted during function call as per integer promotions, most likely ending up as int.
Other types passed to printf do not get implicitly promoted during the function call.
And then the passed and potentially converted argument gets treated internally as if it was the type specified by the conversion specifier. If that one doesn't match the actual type, the code has undefined behavior.
In your case you pass an int, it doesn't get implicitly promoted, but as printf treats it as a long long, you get undefined behavior.
Here you can consider yourself lucky. a is an unsigned short int that undergoes the integer promotions to signed int even without the cast, so
unsigned short int a = 0;
printf("\n%d", (signed int)a - 1);
and
unsigned short int a = 0;
printf("\n%d", a - 1);
would have the same behaviour, if all values of unsigned short are representable in int (as they are in your case). The result of the conversion is an int. Now, for the variable arguments, the default argument promotions are applied and any integer type narrower than int is converted to int if representable, otherwise unsigned int. But %lld expects a signed long long int, which is 8 bytes wide here. Default argument promotions do not promote int implicitly to long long int.
Now comes the luck part - you did get a wrong value. See, since the behaviour is undefined you could have gotten the value that you're expecting, this time - after all it is completely feasible on a 64-bit processor!
Aside from %hn and %hhn (where the h or hh specifies the size of the pointed-to object), what is the point of the h and hh modifiers for printf format specifiers?
Due to default promotions which are required by the standard to be applied for variadic functions, it is impossible to pass arguments of type char or short (or any signed/unsigned variants thereof) to printf.
According to 7.19.6.1(7), the h modifier:
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
short int or unsigned short int argument (the argument will
have been promoted according to the integer promotions, but its value shall
be converted to short int or unsigned short int before printing);
or that a following n conversion specifier applies to a pointer to a short
int argument.
If the argument was actually of type short or unsigned short, then promotion to int followed by a conversion back to short or unsigned short will yield the same value as promotion to int without any conversion back. Thus, for arguments of type short or unsigned short, %d, %u, etc. should give identical results to %hd, %hu, etc. (and likewise for char types and hh).
As far as I can tell, the only situation where the h or hh modifier could possibly be useful is when the argument passed is an int outside the range of short or unsigned short, e.g.
printf("%hu", 0x10000);
but my understanding is that passing the wrong type like this results in undefined behavior anyway, so that you could not expect it to print 0.
One real world case I've seen is code like this:
char c = 0xf0;
printf("%hhx", c);
where the author expects it to print f0 despite the implementation having a plain char type that's signed (in which case, printf("%x", c) would print fffffff0 or similar). But is this expectation warranted?
(Note: What's going on is that the original type was char, which gets promoted to int and converted back to unsigned char instead of char, thus changing the value that gets printed. But does the standard specify this behavior, or is it an implementation detail that broken software might be relying on?)
One possible reason: for symmetry with the use of those modifiers in the formatted input functions? I know it wouldn't be strictly necessary, but maybe there was value seen for that?
Although they don't mention the importance of symmetry for the "h" and "hh" modifiers in the C99 Rationale document, the committee does mention it as a consideration for why the "%p" conversion specifier is supported for fscanf() (even though that wasn't new for C99 - "%p" support is in C90):
Input pointer conversion with %p was added to C89, although it is obviously risky, for symmetry with fprintf.
In the section on fprintf(), the C99 rationale document does discuss that "hh" was added, but merely refers the reader to the fscanf() section:
The %hh and %ll length modifiers were added in C99 (see §7.19.6.2).
I know it's a tenuous thread, but I'm speculating anyway, so I figured I'd give whatever argument there might be.
Also, for completeness, the "h" modifier was in the original C89 standard - presumably it would be there even if it wasn't strictly necessary because of widespread existing use, even if there might not have been a technical requirement to use the modifier.
In %...x mode, all values are interpreted as unsigned. Negative numbers are therefore printed as their unsigned conversions. In 2's complement arithmetic, which most processors use, there is no difference in bit patterns between a signed negative number and its positive unsigned equivalent, which is defined by modular arithmetic (adding the maximum value for the field plus one to the negative number, according to the C99 standard). Lots of software, especially the debugging code most likely to use %x, makes the silent assumption that the bit representation of a signed negative value and its unsigned cast is the same, which is only true on a 2's complement machine.
The mechanics of this cast are such that hexadecimal representations of a value always imply, possibly inaccurately, that a number has been rendered in 2's complement, as long as it didn't hit an edge condition where the different integer representations have different ranges. This even holds true for arithmetic representations where the value 0 is not represented with the binary pattern of all 0s.
A negative short displayed as an unsigned long in hexadecimal will therefore, on any machine, be padded with f, due to implicit sign extension in the promotion, which printf will print. The value is the same, but it is truly visually misleading as to the size of the field, implying a significant amount of range that simply isn't present.
%hx truncates the displayed representation to avoid this padding, exactly as you concluded from your real-world use case.
The behavior of printf is undefined when passed an int outside the range of short that should be printed as a short, but the easiest implementation by far simply discards the high bits by a raw downcast, so while the spec doesn't require any specific behavior, pretty much any sane implementation is going to just perform the truncation. There are generally better ways to do that, though.
If printf isn't padding values or displaying unsigned representations of signed values, %h isn't very useful.
The only use I can think of is for passing an unsigned short or unsigned char and using the %x conversion specifier. You cannot simply use a bare %x - the value may be promoted to int rather than unsigned int, and then you have undefined behaviour.
Your alternatives are either to explicitly cast the argument to unsigned; or to use %hx / %hhx with a bare argument.
The variadic arguments to printf() et al are automatically promoted using the default conversions, so any short or char values are promoted to int when passed to the function.
In the absence of the h or hh modifiers, you would have to mask the values passed to get the correct behaviour reliably. With the modifiers, you no longer have to mask the values; the printf() implementation does the job properly.
Specifically, for the format %hx, the code inside printf() can do something like:
va_list args;
va_start(args, format);
/* ... */
int i = va_arg(args, int);
unsigned short s = (unsigned short)i;
/* ... print s correctly, as 4 hex digits maximum,
   even on a machine with 64-bit int! */
I'm blithely assuming that short is a 16-bit quantity; the standard does not actually guarantee that, of course.
I found it useful to avoid casting when formatting unsigned chars to hex:
sprintf_s(tmpBuf, 3, "%2.2hhx", *(CEKey + i));
It's a minor coding convenience, and looks cleaner than multiple casts (IMO).
Another place it's handy is the snprintf size check.
GCC 7 added a size check when using snprintf,
so this will be flagged:
char arr[4];
char x='r';
snprintf(arr,sizeof(arr),"%d",x);
So it forces you to use a bigger char array when formatting a char with %d.
Here is a commit that shows those fixes: instead of increasing the char array size, they changed %d to %h. This also gives a more accurate description of the argument.
https://github.com/Mellanox/libvma/commit/b5cb1e34a04b40427d195b14763e462a0a705d23#diff-6258d0a11a435aa372068037fe161d24
I agree with you that it is not strictly necessary, and so by that reason alone is no good in a C library function :)
It might be "nice" for the symmetry of the different flags, but it is mostly counter-productive because it hides the "conversion to int" rule.
int8_t is an 8-bit signed integer. Therefore, its value is anywhere in the range [-128...127].
int8_t num = -1;
printf("%u",num);
Output:
4294967295
Could someone give me a hint?
Your program behaviour is not defined.
%u cannot be used as a format specifier for int8_t since it's a signed type and %u is for unsigned types.
Use %d instead, and rely on the C standard guaranteed automatic promotion of num to an int type.
As others have mentioned, using the incorrect format specifier for printf is undefined behavior. The behavior you experienced cannot be depended on to be consistent between different compilers or even different builds of the same compiler.
That being said, here is what most likely happened.
Any argument to printf after the first is of an unspecified type. So when num is passed to it, it can't do an exact type check. What ends up happening is that the value of num is promoted to type int.
From section 6.3.1.1 of the C standard:
2 The following may be used in an expression wherever an int or unsigned int may be used:
— An object or expression with an integer type
(other than int or unsigned int) whose integer conversion rank is
less than or equal to the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int
can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an
int; otherwise, it is converted to an unsigned int. These are
called the integer promotions. All other types are unchanged by
the integer promotions.
Because num was being used in a context where an int could be used, its value in the function call was promoted to int.
Assuming a 32-bit int and 2's complement representation of negative numbers, the original binary representation 11111111 is converted to 11111111 11111111 11111111 11111111. If printed with the %u format specifier, printf assumes this representation is unsigned, so it prints 4294967295.
Had you used the %d format specifier, which expects a signed value, it would have printed -1.
To reiterate however, what you are seeing is undefined behavior. Other machines / compilers / optimization settings might yield different results.
To print an int8_t, the C standard provides the format specifier macro PRIi8 from <inttypes.h>.
printf("%" PRIi8, num);
That said, %d is also fine for printing an int8_t because of C's default argument promotions, and it works for every signed type that promotes to int. You can see the format specifiers for other fixed width types in POSIX documentation as well.
Assuming the following:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?
what is the correct format for a 1 byte signed number?
%hh and the integer conversion specifier of your choice (for example, %02hhX). See the C11 standard, §7.21.6.1p5:
hh
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers think that that means that it is unnecessary to use h and hh qualifiers. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.
However, char may well be signed, and the integer promotion will preserve its value, which will make it into a signed integer. Printing the signed integer out with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.
In case that wasn't clear, a simple (but controversial) example:
/* Read the comments thread to this post; I'll remove
this note when I edit the outcome of the discussion into
the answer
*/
#include <stdio.h>
int main(void) {
char* s = "\u00d1"; /* Ñ */
for (char* p = s; *p; ++p) printf("%02X (%02hhX)\n", *p, *p);
return 0;
}
Output:
$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)
In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."
However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char (or an unsigned short) to an unsigned int will only happen on an implementation where int cannot represent every value of that type, such as one where sizeof(int) == 1.)
So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.
Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above, emphasis added).
Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)
So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.
For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.1/p8), emphasis added:
the int argument is converted to an unsigned char, and the resulting character is written.
If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:
void f(char c) {
printf("This is a '%c'.\n", c);
}
was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.
The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call" which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.
I don't think C is actually different, at least in intent. It uses wording like "the arguments… are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".
I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)
I wouldn't expect the value that gets printed to be the initial negative value. Is there something I'm missing for type casting?
#include<stdint.h>
int main() {
int32_t color = -2451337;
uint32_t color2 = (uint32_t)color;
printf("%d", (uint32_t)color2);
return 0;
}
int32_t color = -2451337;
uint32_t color2 = (uint32_t)color;
The cast is unnecessary; if you omit it, exactly the same conversion will be done implicitly.
For any conversion between two numeric types, if the value is representable in both types, the conversion preserves the value. But since color is negative, that's not the case here.
For conversion from a signed integer type to an unsigned integer type, the result is well defined (C11 §6.3.1.3/2): the value is reduced modulo one more than the maximum value of the unsigned type. For uint32_t that means adding 2^32, so the result is -2451337 + 4294967296 = 4292515959.
This is the same result you get by copying or reinterpreting the bits of a two's-complement representation, which is what most compilers do for conversions between integer types of the same size. The standard requires int32_t to use two's-complement representation.
(It is the opposite direction, converting an out-of-range value to a signed integer type, whose result is implementation-defined or raises an implementation-defined signal, though I don't know of any compiler that does the latter. The standard permits one's-complement and sign-and-magnitude representations for signed integer types, but specifically requires int32_t to use two's-complement; a C compiler for a one's-complement CPU probably just wouldn't define int32_t.)
printf("%d", (uint32_t)color2);
Again, the cast is unnecessary, since color2 is already of type uint32_t. But the "%d" format requires an argument of type int, which is a signed type (that may be as narrow as 16 bits). In this case, the uint32_t value isn't converted to int. Most likely the representation of color2 will be treated as if it were an int object, but the behavior is undefined, so as far as the C standard is concerned quite literally anything could happen.
To print a uint32_t value, you can use the PRIu32 macro defined in <inttypes.h>:
printf("%" PRIu32, color2);
Or, perhaps more simply, you can convert it to the widest unsigned integer type and use "%ju":
printf("%ju", (uintmax_t)color2);
This will print the value of color2, which is 4292515959.
And you should add a newline \n to the format string.
More quibbles:
You're missing #include <stdio.h>, which is required if you call printf.
int main() is ok, but int main(void) is preferred.
You took a bunch of bits (stored in signed value). You then told the CPU to interpret that bunch of bits as unsigned. You then told the cpu to render the same bunch of bits as signed again (%d). You would therefore see the same as you first entered.
C just deals in bunches of bits. If the value you had chosen was near the representational limit of the type(s) involved (read up on twos-complement representation), then we might see some funky effects, but the value you happened to choose wasn't. So you got back what you put in.