unsigned short int i = 0;
printf("%u\n",~i);
Why does this code return a 32 bit number in the console? It should be 16 bit, since short is 2 bytes.
The output is 4,294,967,295 and it should be 65,535.
%u expects an unsigned int; if you want to print an unsigned short int, use %hu.
EDIT
Lundin is correct that ~i will be converted to type int before being passed to printf. i is also converted to int by virtue of being passed to a variadic function. However, printf will convert the argument back to unsigned short before printing if the %hu conversion specifier is used:
7.21.6.1 The fprintf function
...
3 The format shall be a multibyte character sequence, beginning and ending in its initial
shift state. The format is composed of zero or more directives: ordinary multibyte
characters (not %), which are copied unchanged to the output stream; and conversion
specifications, each of which results in fetching zero or more subsequent arguments,
converting them, if applicable, according to the corresponding conversion specifier, and
then writing the result to the output stream.
...
7 The length modifiers and their meanings are:
...
h Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
short int or unsigned short int argument (the argument will
have been promoted according to the integer promotions, but its value shall
be converted to short int or unsigned short int before printing);
or that a following n conversion specifier applies to a pointer to a short
int argument.
Emphasis mine.
So, the behavior is not undefined; it would only be undefined if either i or ~i were not integral types.
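As a minimal sketch of the %hu route, assuming unsigned short is 16 bits and narrower than int: the helper name format_complement_hu is made up for illustration, and snprintf is used instead of printf only so the result can be inspected. The explicit cast makes the narrowing visible instead of relying on printf to do it.

```c
#include <stdio.h>

/* Hypothetical helper: formats the complement of i with %hu.
   Assumes unsigned short is narrower than int, so ~i has type int. */
int format_complement_hu(unsigned short i, char *out, size_t n)
{
    /* The cast narrows the int result of ~i back to unsigned short.
       The value is promoted to int again for the variadic call, but it
       now lies in the range 0..USHRT_MAX. */
    return snprintf(out, n, "%hu", (unsigned short)~i);
}
```

With i equal to 0 and a 16-bit unsigned short, the buffer should end up holding 65535 rather than 4294967295.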
When you pass an argument to printf and that argument is of integer type shorter than int, it is implicitly promoted to int as per K&R argument promotion rules. Thus your printf-call actually behaves like:
printf("%u\n", (int)~i);
Notice that this is undefined behavior since you told printf that the argument has an unsigned type whereas int is actually a signed type. Convert ~i to unsigned short and then to unsigned to resolve the undefined behavior and your problem:
printf("%u\n", (unsigned)(unsigned short)~i);
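The double cast can be wrapped in a small function to see what value reaches printf. This is only a sketch; complement_as_unsigned is a hypothetical name, and the numbers in the note below assume a 16-bit unsigned short.

```c
/* Hypothetical helper: the value that the double cast hands to %u. */
unsigned complement_as_unsigned(unsigned short i)
{
    /* ~i is an int; the inner cast reduces it modulo USHRT_MAX + 1,
       the outer cast widens it to unsigned int without changing the value. */
    return (unsigned)(unsigned short)~i;
}
```

Under that assumption, complement_as_unsigned(0) yields 65535, and the argument type now matches %u exactly.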
N1570 6.5.3.3 Unary arithmetic operators p4:
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. ...
Integer types smaller than int are promoted to int. If sizeof(unsigned short) == 2 and sizeof(int) == 4, then the resulting type is int.
And what's more, the printf conversion specifier %u expects unsigned int, so the representation of the int is interpreted as unsigned int. You are basically lying to the compiler, and this is undefined behaviour.
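If you want to check the promoted type yourself, a C11 _Generic selection can name it at compile time. This is a rough sketch; TYPE_NAME and type_of_complement are invented for illustration, and the result assumes int is wider than unsigned short.

```c
/* Names the type of an expression at compile time (requires C11). */
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    unsigned short: "unsigned short", \
    default: "other")

/* Hypothetical helper: reports the type of ~i for an unsigned short i. */
const char *type_of_complement(void)
{
    unsigned short i = 0;
    return TYPE_NAME(~i);   /* the operand is promoted before ~ is applied */
}
```

On such a platform the function returns the string "int", which is why %u does not match the argument.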
It's because the arguments to printf() are put into the stack in words, as there is no way inside printf to know that the argument is short. Also by using %u format you are merely stating that you are passing an unsigned number.
I call recv() to receive data from a socket and print the last byte of the buffer contents in hex
char rbuff[B_BUF];
while ((r_n=recv(sfd,rbuff,B_BUF,MSG_EOF))>-1)
{
printf("r_n:%d eob_p:%x\n",r_n,rbuff[r_n-1]);
if (r_n==0)
{
break;
}
memset(rbuff,0,B_BUF);
}
the result is
r_n:1674 eob_p:3c
r_n:1228 eob_p:76
r_n:2456 eob_p:ffffff81
r_n:1228 eob_p:4b
r_n:1228 eob_p:49
r_n:2456 eob_p:57
r_n:1417 eob_p:ffffff82
I am confused about why the result is 4 bytes.
I wrote another program to print the file that was saved from the buffer
int main ()
{
char buff[11686];
memset(buff,0,11686);
FILE *in =fopen("web/www.sse.com.cn.html","r");
fread(buff,11686,1,in);
for (int i = 0; i < 11686 ; i++)
{
printf("%x\n",buff[i]);
}
}
the result is
....
buff[11684]:60
buff[11685]:ffffff82
Why is each element of the char buffer printed as 4 bytes, e.g. buff[11685]:ffffff82?
Diagnosis
In the second example, buff is a char buffer and plain char is a signed type on your machine, and you're storing values which are negative in buff, so when they're converted to int in the call to printf(), they are negative integers (of small magnitude), printed in hex.
ISO/IEC 9899:2018
Actually, the links are to an online draft of C11, not C18, in HTML which allows links to the relevant paragraphs in the standard. AFAIK, these details have not changed between C90, C99, C11 and C18 anyway.
The standard says that the plain char type is equivalent to either signed char or unsigned char.
§6.2.5 Types ¶15:
The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.45)
45) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
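The footnote's CHAR_MIN test is easy to turn into a check. A minimal sketch follows; the function name plain_char_is_signed is hypothetical.

```c
#include <limits.h>

/* Returns 1 if plain char has the same range as signed char here,
   0 if it has the same range as unsigned char. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;   /* CHAR_MIN is either 0 or SCHAR_MIN */
}
```

On typical x86 compilers this returns 1 unless an option such as GCC's -funsigned-char is in effect.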
§6.3.1.1 Boolean, characters and integers ¶2,3:
2 The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.
3 The integer promotions preserve value including sign. As discussed earlier, whether a "plain" char is treated as signed is implementation-defined.
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclasses.
§6.5.2.2 Function calls ¶6,7:
6 If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined. If the function is defined with a type that does not include a prototype, and the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined, except for the following cases:
one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
both types are pointers to qualified or unqualified versions of a character type or void.
7 If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
Exegesis
Note the last two sentences of §6.5.2.2 ¶7: when the char values are promoted by the 'integer promotions', they are promoted to a (signed) int, and the negative values remain negative. Since an int has 4 bytes, and all the machines you're likely to have available use two's-complement arithmetic, the most significant 3 bytes of the value will be 0xFF each.
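To make the sign extension concrete, here is a small sketch of what happens to a negative character value on its way to %x. The name promoted_bits is made up for illustration, and the ffffff82 figure assumes a 32-bit int.

```c
/* Hypothetical helper: the unsigned value corresponding to a character
   after it has been promoted to int. */
unsigned promoted_bits(signed char c)
{
    int promoted = c;           /* integer promotion preserves the value */
    return (unsigned)promoted;  /* well-defined: reduced modulo UINT_MAX + 1 */
}
```

For the byte 0x82 stored in a signed char (value -126), this gives 0xFFFFFF82 with a 32-bit int, matching the ffffff82 in the question's output; only the low byte, 0x82, came from the buffer.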
Prescription
To always print 2-digit hex for the characters, use %.2X (or %.2x if you prefer; you can also use either %02X or %02x) and pass either (unsigned char)rbuff[r_n-1] or rbuff[r_n-1] & 0xFF as the argument (using the variables from the first example). Or, using the variables from the second example:
printf("%.2X\n", (unsigned char)buff[i]);
printf("%.2X\n", buff[i] & 0xFF);
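The same fix, sketched as a helper that writes into a buffer so the result can be checked. format_byte_hex is a hypothetical name, and the extra cast to unsigned is only there to match the type %X expects exactly.

```c
#include <stdio.h>

/* Hypothetical helper: formats one byte as exactly two hex digits,
   regardless of whether plain char is signed. */
int format_byte_hex(char c, char *out, size_t n)
{
    /* (unsigned char) discards the sign; (unsigned) matches %X. */
    return snprintf(out, n, "%.2X", (unsigned)(unsigned char)c);
}
```

Assuming 8-bit bytes, a buffer element holding 0x82 comes out as 82 instead of FFFFFF82, and small values such as 0x0A are zero-padded to 0A.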
I'm trying to understand implicit datatype conversions in C. I thought that I had understood this topic, but yet the following code example is still confusing me.
Specifically, I have read about Usual Arithmetic Conversions and Integer Promotion previously from drafts of the C Standard.
unsigned short int a = 0;
printf("\n%lld", (signed int)a - 1);
I am compiling using GCC.
unsigned short int is 2 bytes.
int is 4 bytes.
When I run this code, I get the following result: 4294967295
I expected the result -1.
This is what I expected to happen:
Typecast takes precedence, and LHS of - becomes signed int.
- operation is carried out. No integer promotion or implicit conversions occur here, as LHS and RHS are already both signed int. The result of the operation is -1 with datatype signed int.
Within printf statement, value -1 is retained within the conversion to long long int, and -1 is displayed as the result.
Can someone please explain where the flaw in my logic is?
It's undefined behaviour due to %lld being the inappropriate format specifier for an int type.
Yes indeed (signed int)a - 1 is an int type with value -1, but the printf call is the undefined part. There's nothing in the C standard to suggest that a conversion to long long occurs.
Within printf statement, value -1 is retained within the conversion to long long int
There's no such conversion taking place. printf (family of functions) is dumb and needs a format string that corresponds to the types of the argument list.
printf does not work like an ordinary function void f (long long int x), which would have forced an implicit conversion to the type of the parameter ("as per assignment"/"lvalue conversion"). This would have given you the expected "sign extension".
Notably, there's another kind of specialized implicit conversion going on here called the default argument promotions, that only applies to variable argument functions and functions with no prototype.
C17 6.5.2.2/6
If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions.
C17 6.5.2.2/7 regarding variable argument functions:
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument
promotions are performed on trailing arguments.
In practice this means:
float passed to printf gets implicitly converted to double during function call.
Small integer types passed to printf get implicitly converted during function call as per integer promotions, most likely ending up as int.
Other types passed to printf do not get implicitly promoted during the function call.
And then the passed and potentially converted argument gets treated internally as if it was the type specified by the conversion specifier. If that one doesn't match the actual type, the code has undefined behavior.
In your case you pass an int, it doesn't get implicitly promoted, but as printf treats it as a long long, you get undefined behavior.
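The default argument promotions can be observed from inside a variadic function of your own. This is a sketch only; sum_promoted is a hypothetical function, not how printf is implemented. It reads every trailing argument as int, which is correct for short and char arguments precisely because they were promoted at the call site.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` trailing integer arguments.
   Each one must have a promoted type of int. */
long sum_promoted(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int k = 0; k < count; ++k)
        total += va_arg(ap, int);   /* short and char arrive as int */
    va_end(ap);
    return total;
}
```

Calling sum_promoted(3, (short)-2, (signed char)5, 10) gives 13. Passing a long long here would be the same kind of mismatch as %lld with an int, just in the opposite direction.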
Here you can consider yourself lucky. a is an unsigned short int that would undergo integer promotion to signed int even without the cast, so
unsigned short int a = 0;
printf("\n%d", (signed int)a - 1);
and
unsigned short int a = 0;
printf("\n%d", a - 1);
would have the same behaviour, if all values of unsigned short are representable in int (as they are in your case). The result of the conversion is an int. Now, for the variable arguments, the default argument promotions are applied, and any integer type narrower than int is converted to int if representable, otherwise to unsigned int. But %lld expects a signed long long int, which is 8 bytes wide. The default argument promotions do not promote int implicitly to long long int.
Now comes the luck part - you did get a wrong value. See, since the behaviour is undefined you could have gotten the value that you're expecting, this time - after all it is completely feasible on a 64-bit processor!
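If you really do want %lld, the conversion has to be written explicitly. A minimal sketch follows, with the hypothetical helper name format_a_minus_one and snprintf standing in for printf so the text can be checked.

```c
#include <stdio.h>

/* Hypothetical helper: formats (signed int)a - 1 with %lld. */
int format_a_minus_one(unsigned short a, char *out, size_t n)
{
    int diff = (signed int)a - 1;   /* int arithmetic, value -1 for a == 0 */

    /* The cast supplies the long long that %lld expects;
       the value is preserved by the widening conversion. */
    return snprintf(out, n, "%lld", (long long)diff);
}
```

With a equal to 0 this produces -1, as the question expected. Using %d with the uncast int would work just as well.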
Let's say we have this line of code:
printf("%hi", 6);
Let's assume sizeof(short) == 2, and sizeof(int) == 4.
printf expects a short, but is given an int, which is wider. Is this undefined behaviour?
The same with %hhi.
printf() doesn't actually expect the argument to be a short when you use %hi. When you call a variadic function, all the arguments undergo default argument promotion. In the case of integer arguments, this means integer promotions, which means that all integer types smaller than int are converted to int or unsigned int.
If the corresponding argument is a literal, all that's required is that its value fits in a short; you don't actually have to cast it to short.
The standard (§7.21.6.1 ¶7) explains it this way:
the argument will
have been promoted according to the integer promotions, but its value shall
be converted to short int or unsigned short int before printing
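A small sketch of the case from the question, passing a plain int to %hi. The helper name format_hi is hypothetical, and some compilers may still emit a format diagnostic for the int argument even though the call is fine for values that fit in a short.

```c
#include <stdio.h>

/* Hypothetical helper: formats an int with %hi.
   The result matches `value` only when value fits in a short. */
int format_hi(int value, char *out, size_t n)
{
    /* A short argument would have been promoted to int anyway,
       so printf receives an int in both cases. */
    return snprintf(out, n, "%hi", value);
}
```

For the value 6 the buffer holds 6, exactly as with %i.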
int8_t is an 8-bit signed integer. Therefore, its value is anywhere in the range [-128...127].
int8_t num = -1;
printf("%u",num);
Output:
4294967295
Could someone give me a hint?
Your program behaviour is not defined.
%u cannot be used as a format specifier for int8_t since it's a signed type and %u is for unsigned types.
Use %d instead, and rely on the C standard guaranteed automatic promotion of num to an int type.
As others have mentioned, using the incorrect format specifier for printf is undefined behavior. The behavior you experienced cannot be depended on to be consistent between different compilers or even different builds of the same compiler.
That being said, here is what most likely happened.
Any argument to printf after the first is of an unspecified type. So when num is passed to it, it can't do an exact type check. What ends up happening is that the value of num is promoted to type int.
From section 6.3.1.1 of the C standard:
2 The following may be used in an expression wherever an int or unsigned int may be used:
— An object or expression with an integer type
(other than int or unsigned int) whose integer conversion rank is
less than or equal to the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int
can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an
int; otherwise, it is converted to an unsigned int. These are
called the integer promotions. All other types are unchanged by
the integer promotions.
Because num was being used in a context where an int could be used, its value in the function call was promoted to int.
Assuming a 32-bit int and two's complement representation of negative numbers, the original binary representation 11111111 is converted to 11111111 11111111 11111111 11111111. If printed with the %u format specifier, it assumes this representation is unsigned so it prints 4294967295.
Had you used the %d format specifier, which expects a signed value, it would have printed -1.
To reiterate however, what you are seeing is undefined behavior. Other machines / compilers / optimization settings might yield different results.
To print an int8_t, the C standard provides the format specifier macro PRIi8 from <inttypes.h>.
printf("%" PRIi8, num);
That said, %d is fine for printing an int8_t because of C's default argument promotions, and you can simply use it for any signed type no wider than int. The format specifiers for the other fixed-width types are listed in the POSIX documentation as well.
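Put together as a sketch (format_int8 is a made-up helper name, and snprintf replaces printf so the output can be checked):

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: formats an int8_t with its matching macro. */
int format_int8(int8_t num, char *out, size_t n)
{
    /* PRIi8 expands to a string literal, so it is joined to "%"
       by ordinary string-literal concatenation. */
    return snprintf(out, n, "%" PRIi8, num);
}
```

For num equal to -1 this yields -1 rather than 4294967295.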
Assuming the following:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?
what is the correct format for a 1 byte signed number?
%hh and the integer conversion specifier of your choice (for example, %02hhX). See the C11 standard, §7.21.6.1p7:
hh
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers think that that means that it is unnecessary to use h and hh qualifiers. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.
However, char may well be signed, and the integer promotion will preserve its value, which will make it into a signed integer. Printing the signed integer out with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.
In case that wasn't clear, a simple (but controversial) example:
/* Read the comments thread to this post; I'll remove
this note when I edit the outcome of the discussion into
the answer
*/
#include <stdio.h>
int main(void) {
char* s = "\u00d1"; /* Ñ */
for (char* p = s; *p; ++p) printf("%02X (%02hhX)\n", *p, *p);
return 0;
}
Output:
$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)
In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."
However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char to an unsigned int will only happen on an implementation where sizeof(int) == 1.)
So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.
Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p7, quoted above, emphasis added).
Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)
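That conversion rule is simple enough to try directly. A minimal sketch, assuming 8-bit bytes; to_unsigned_char is a hypothetical name.

```c
/* Hypothetical helper: the conversion described in §6.3.1.3/p2,
   applied to a promoted character value. */
unsigned char to_unsigned_char(int value)
{
    /* For out-of-range values the result is value modulo UCHAR_MAX + 1. */
    return (unsigned char)value;
}
```

The two bytes of the example string are -61 and -111 as signed chars; converting them this way gives 0xC3 and 0x91, the values shown in parentheses in the output above.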
So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.
For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.1/p8), emphasis added:
the int argument is converted to an unsigned char, and the resulting character is written.
If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:
void f(char c) {
printf("This is a '%c'.\n", c);
}
was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.
The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call" which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.
I don't think C is actually different, at least in intent. It uses wording like "the arguments… are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".
I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)