I recently noticed a (weird) behavior when performing shift operations with >> and <<!
To explain it, let me write this small runnable program that does two operations which are supposed to be identical (in my understanding), but I'm surprised to get different results!
#include <stdio.h>
int main(void) {
unsigned char a=0x05, b=0x05;
// first operation
a = ((a<<7)>>7);
// second operation
b <<= 7;
b >>= 7;
printf("a=%X b=%X\n", a, b);
return 0;
}
When run, a = 5 and b = 1. I expect them both to be equal to 1! Can someone kindly explain why I got such a result?
P.S.: In my environment, the size of unsigned char is 1 byte.
In the first example:
a is converted to an int, shifted left, then right, and then converted back to unsigned char.
This obviously results in a=5.
In the second example:
b is converted to int, shifted left, then converted back to unsigned char.
b is converted to int, shifted right, then converted back to unsigned char.
The difference is that in the second example you lose information during the intermediate conversion to unsigned char.
Detailed explanation of the things going on between the lines:
Case a:
In the expression a = ((a<<7)>>7);, a<<7 is evaluated first.
The C standard states that each operand of the shift operators is implicitly integer promoted, meaning that if they are of types bool, char, short etc (collectively the "small integer types"), they get promoted to an int.
This is standard practice for almost every operator in C. What makes the shift operators different from other operators is that they don't use the other kind of common, implicit promotion called "balancing" (the usual arithmetic conversions). Instead, the result of a shift always has the type of the promoted left operand, in this case int.
So a gets promoted to type int, still containing the value 0x05. The 7 literal was already of type int so it doesn't get promoted.
When you left shift this int by 7, you get 0x0280. The result of the operation is of type int.
Note that int is a signed type, so had you kept shifting data further, into the sign bit, you would have invoked undefined behavior. Similarly, a negative right operand is undefined behavior for either shift, and so is left-shifting a negative value; right-shifting a negative value is implementation-defined.
You now have the expression a = 0x280 >> 7;. The integer promotions for this second shift change nothing, since both operands are already int.
The result is 5 and of the type int. You then convert this int to an unsigned char, which is fine, since the result is small enough to fit.
Case b:
b <<= 7; is equivalent to b = b << 7;.
As before, b gets promoted to an int. The result will again be 0x0280.
You then attempt to store this result in an unsigned char. It will not fit, so it will get truncated to only contain the least significant byte 0x80.
On the next line, b again gets promoted to an int, containing 0x80.
And then you shift 0x80 right by 7, getting the result 1. This is of type int, but the value fits in an unsigned char, so it is stored in b unchanged.
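To make the two cases concrete, here is a minimal sketch that wraps each variant in a function returning the final value. The function names are hypothetical, and an 8-bit unsigned char is assumed:

```c
/* Case a: the whole expression is evaluated in int; only the final
   result is converted back to unsigned char. */
unsigned char one_expression(unsigned char a)
{
    /* a << 7 is the int 0x280; 0x280 >> 7 is the int 0x05 */
    return (unsigned char)((a << 7) >> 7);
}

/* Case b: the intermediate result is stored in an unsigned char,
   so everything above the low 8 bits is discarded in between. */
unsigned char two_statements(unsigned char b)
{
    b <<= 7;   /* int 0x280 is stored as unsigned char 0x80 */
    b >>= 7;   /* int 0x80 >> 7 is 0x01 */
    return b;
}
```

Called with 0x05, the first should return 0x05 and the second 0x01, matching the a and b values from the question.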
Good advice:
Never ever use bitwise operators on signed integer types. This doesn't make sense in 99% of the cases and can lead to various bugs and poorly defined behavior.
When using bitwise operators, use the types in stdint.h rather than the primitive default types in C.
When using bitwise operators, use explicit casts to the intended type, to prevent bugs and unintended type changes, but also to make it clear that you actually understand how implicit type promotions work, and that you didn't just get the code working by accident.
A better, safer way to write your program would have been:
#include <stdio.h>
#include <stdint.h>
int main(void) {
uint8_t a=0x05;
uint8_t b=0x05;
uint32_t tmp;
// first operation
tmp = (uint32_t)a << 7;
tmp = tmp >> 7;
a = (uint8_t)tmp;
// second operation
tmp = (uint32_t)b << 7;
tmp = tmp >> 7;
b = (uint8_t)tmp;
printf("a=%X b=%X\n", a, b);
return 0;
}
The shift operators perform the integer promotions on their operands, and in your code the resulting int is converted back to unsigned char like this:
// first operation
a = ((a<<7)>>7); // a = (unsigned char)((a<<7)>>7);
// second operation
b <<= 7; // b = (unsigned char)(b << 7);
b >>= 7; // b = (unsigned char)(b >> 7);
Quote from the N1570 draft (the final draft of what became the C11 standard):
6.5.7 Bitwise shift operators:
Each of the operands shall have integer type.
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
I believe C99 and C90 contain similar statements.
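One way to check the "type of the promoted left operand" rule yourself is C11's _Generic, which selects on the type of an expression without evaluating it. This is a small sketch with hypothetical function names, assuming a C11 compiler and an int wider than unsigned char. The multiplication is included only for contrast, since * uses the usual arithmetic conversions instead:

```c
/* Returns 1 if a << n has type int. The right operand is unsigned long,
   but only the promoted LEFT operand decides the result type. */
int shift_type_is_int(void)
{
    unsigned char a = 0x05;
    unsigned long n = 7;
    return _Generic(a << n, int: 1, default: 0);
}

/* For contrast: * applies the usual arithmetic conversions, so the
   promoted int is converted further, to unsigned long. */
int multiply_type_is_unsigned_long(void)
{
    unsigned char a = 0x05;
    unsigned long n = 7;
    return _Generic(a * n, unsigned long: 1, default: 0);
}
```

On such a platform both functions should return 1.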
In Go (the language I'm most familiar with), the result of a mathematical operation is always the same data type as the operands, meaning if the operation overflows, the result will be incorrect. For example:
package main
import "fmt"
func main() {
var a byte = 100
var b byte = 9
var r byte = (a << b) >> b
fmt.Println(r)
}
This prints 0, as all the bits are shifted out of the bounds of a byte during the initial << 9 operation, then zeroes are shifted back in during the >> 9 operation.
However, this isn't the case in C:
#include <stdio.h>
int main() {
unsigned char a = 100;
unsigned char b = 9;
unsigned char r = (a << b) >> b;
printf("%d\n", r);
return 0;
}
This code prints 100. Although this yields the "correct" result, this is unexpected to me, as I'd only expect promotion if one of the operands were larger than a byte, but in this case all operands are bytes. It's as though the temporary variable holding the result of the << 9 operation is larger than the resulting variable, and is only downcast back to a byte after the full RHS is evaluated, and thus after the >> 9 operation restores the bits.
Obviously, if you explicitly store the result of the << 9 in a byte before continuing, you get the same result as in Go:
#include <stdio.h>
int main() {
unsigned char a = 100;
unsigned char b = 9;
unsigned char c = a << b;
unsigned char r = c >> b;
printf("%d\n", r);
return 0;
}
This isn't merely the case with bitwise operators. I've tested with multiplication/division too, and it demonstrates the same behaviour.
My question is: is this behaviour of C defined? If so, where? Does it actually use a specific data type for the interim values of a complex expression? Or is this actually undefined behaviour, like an incidental result of the operations being performed in a 32/64 bit CPU register before being saved back to memory?
C 2018 6.5.7 discusses the shift operators. Paragraph 3 says:
The integer promotions are performed on each of the operands…
6.3.1.1 2 specifies the integer promotions:
… If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
Thus in a << b where a and b are unsigned char, a is promoted to int, which is at least 16 bits. (A C implementation may define unsigned char to be more than eight bits. It could be the same width as int. In this case, the integer promotions would not convert a or b.)
Note that if the integer promotions were not applied, the behavior of evaluating a << b with b equal to 9 would not be defined by the C standard, as the behavior of the shift operators is not defined for shift amounts greater than or equal to the width of the left operand.
6.5.5 specifies the multiplicative operators. Paragraph 3 says:
The usual arithmetic conversions are performed on the operands.
6.3.1.8 specifies the usual arithmetic conversions:
… First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain [complex or real], to a type whose corresponding real type is long double.
Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.
Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
Rank has a technical definition that largely corresponds to width (number of bits in an integer type).
Thus, in a * b where a and b are unsigned char, they are both promoted to int (with the caveat above about wide unsigned char) and no further conversions are necessary. If one operand were wider than int, say long long int, while the other is unsigned char, then both operands would be converted to that wider type.
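The multiplication/division case from the question behaves the same way. Here is a minimal sketch, assuming an 8-bit unsigned char and using hypothetical helper names, that contrasts a single expression with an intermediate unsigned char:

```c
/* Evaluated as int * int and then int / int: 200 * 2 == 400 fits in an
   int, so dividing by 2 gives back 200. b must be nonzero. */
unsigned char mul_div_one_expression(unsigned char a, unsigned char b)
{
    return (unsigned char)((a * b) / b);
}

/* Storing the product first truncates it: 400 becomes 400 - 256 == 144,
   and 144 / 2 == 72. */
unsigned char mul_div_two_statements(unsigned char a, unsigned char b)
{
    unsigned char product = (unsigned char)(a * b);
    return (unsigned char)(product / b);
}
```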
Welcome to integer promotions! One behavior of the C language (an often criticized one, I'd add) is that types like char and short are promoted to int before doing any arithmetic operation with them, and the result is also int. What does this mean?
#include <stdio.h>
unsigned char foo(unsigned char x) {
return (x << 4) >> 4;
}
int main(void) {
if (foo(0xFF) == 0x0F) {
printf("Yay!\n");
}
else {
printf("... hey, wait a minute!\n");
}
return 0;
}
Needless to say, the above code prints ... hey, wait a minute!. Let's discover why:
// this line of code:
return (x << 4) >> 4;
// is converted to this (because of integer promotion):
return ((int) x << 4) >> 4;
Therefore, this is what happens:
x is unsigned char (8-bit) and its value is 0xFF,
x << 4 needs to be executed, but first x is converted to int (32-bit),
x << 4 becomes 0x000000FF << 4, and the result 0x00000FF0 is also int,
0x00000FF0 >> 4 is executed, yielding 0x000000FF,
finally, 0x000000FF is converted to unsigned char (because that's the return value of foo()), so it becomes 0xFF,
and that's why foo(0xFF) yields 0xFF instead of 0x0F.
How to prevent this? Simple: convert the result of x << 4 to unsigned char. In the previous example, 0x00000FF0 would have become 0xF0.
unsigned char foo(unsigned char x) {
return ((unsigned char) (x << 4)) >> 4;
}
Now foo(0xFF) == 0x0F.
NOTE: the previous examples assume that unsigned char is 8 bits and int is 32 bits, but they work for basically any situation in which CHAR_BIT == 8 (because C17 requires int to be at least 16 bits wide).
P.S.: this answer is not as exhaustive as the C official standard document, of course. But you can find all the (valid and defined) behavior of C described in the latest draft of the ISO/IEC 9899:2018 standard (a.k.a. C17/C18).
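Both versions can be compared side by side. foo() is copied from the answer, foo_fixed() is a hypothetical name for the cast version, and the same 8-bit unsigned char assumption applies:

```c
/* Original: x << 4 is an int, so the high nibble survives the round trip. */
unsigned char foo(unsigned char x)
{
    return (x << 4) >> 4;
}

/* Fixed: truncate to unsigned char before shifting back. */
unsigned char foo_fixed(unsigned char x)
{
    return ((unsigned char)(x << 4)) >> 4;
}
```

For example, foo_fixed(0xAB) should give 0x0B, while foo(0xAB) gives back 0xAB.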
I've got a function which is supposed to read a big-endian short from a char array. This is what it looks like:
unsigned short getShort(char* arr, int index)
{
unsigned short n = 0;
int i;
for (i = 0; i <= 1; i++)
{
n <<= 8;
n |= arr[index + i];
}
return n;
}
Instead of working as it should, however, everything but the least significant byte (AKA the most significant byte in this case) gets transformed into 0xFF. If I insert printf("%x\n", arr[index + i]); into the beginning of the for loop (and a separator after), I get this output:
ffffffaa
ffffff88
---
0
8
---
0
0
---
0
0
---
...
---
ffffffb9
ffffffe8
---
0
e
---
0
e
---
...
Some bytes are just padded with 0xFF, bringing them up to 32 bits. The first two bytes are supposed to be 0xAA and 0x88, and those second strange ones 0xB9 and 0xE8, but apparently they don't turn out that way. In fact, examining n every step of the way, it definitely gets |ed with the 32-bit number instead of the 8-bit char.
The weirdest part is sizeof(arr[index + i]) still returns 1, and switching out n |= arr[index + i]; for n |= (char) arr[index + i]; has the same result. What does get me the correct values is switching it for n |= arr[index + i] & 0xFF;, but... it should already be 8 bits, right?
So what the heck is happening here?
Plain char can be a signed or unsigned type; on your machine, it appears to be signed. When a signed value with the high bit set is converted to an int, it is converted to a negative int. That's why you get the result you see.
When the value arr[index + i] is passed to printf(), it is converted to an int because that is how small types are handled when passed to variadic functions like printf() — char and short are converted to int, and float is converted to double.
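Here is a small sketch of that promotion. It assumes an 8-bit, two's complement signed char and uses signed char explicitly, so the result does not depend on whether plain char is signed. The function names are hypothetical. The middle one approximates what printf("%x") displays for such a char:

```c
#include <limits.h>

/* A signed char holding -86 has the bit pattern 0xAA. */
int promoted_value(void)
{
    signed char c = -86;
    return c;                     /* promotion preserves the value: -86 */
}

/* Roughly what "%x" shows: -86 converted to unsigned int, which is
   UINT_MAX - 85 (0xffffffaa with a 32-bit int). */
unsigned int as_hex_bits(void)
{
    signed char c = -86;
    return (unsigned int)c;
}

/* Converting to unsigned char first keeps only the low 8 bits: 0xAA. */
unsigned int masked_value(void)
{
    signed char c = -86;
    return (unsigned char)c;
}
```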
There are also problems in the function. You should use one of:
unsigned short getShort(char* arr, int index)
{
unsigned short n = 0;
int i;
for (i = 0; i <= 1; i++)
{
n <<= 8;
n |= (unsigned char)arr[index + i];
}
return n;
}
or:
unsigned short getShort(char* arr, int index)
{
unsigned short n = 0;
int i;
for (i = 0; i <= 1; i++)
{
n <<= 8;
n |= arr[index + i] & 0xFF;
}
return n;
}
Though frankly, the loop is a bit of overkill; you could use:
unsigned short getShort(char* arr, int index)
{
return ((arr[index + 0] & 0xFF) << 8) | (arr[index + 1] & 0xFF);
}
and if you have a C99 compiler, you could even add the inline function specifier which might give you the benefits of macro-like behaviour with the safety of a true function:
static inline unsigned short getShort(char* arr, int index)
{
return ((arr[index + 0] & 0xFF) << 8) | (arr[index + 1] & 0xFF);
}
There's a moderate chance that the compiler's optimizer would produce code more or less equivalent to the functions with just a return statement even if you left the code written as a loop. If you need to have similar functions for 4-byte and 8-byte integers, keeping the loop might be better for consistency.
Note that I am making assumptions such as sizeof(short) == 2 and CHAR_BIT == 8. These are not guaranteed by the C standard, but they are the commonest configuration on desktop and server machines.
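If you do keep the loop for wider integers, the same fix carries over. This is a sketch of a hypothetical 4-byte analogue, getUint32(), assuming CHAR_BIT == 8 and using uint32_t from stdint.h so the accumulator never needs to be narrowed:

```c
#include <stdint.h>

/* Hypothetical: reads a big-endian uint32_t starting at arr[index]. */
uint32_t getUint32(const char *arr, int index)
{
    uint32_t n = 0;
    int i;
    for (i = 0; i < 4; i++)
    {
        n <<= 8;                              /* make room for the next byte */
        n |= (unsigned char)arr[index + i];   /* no sign extension */
    }
    return n;
}
```

With the bytes from the question, {0xAA, 0x88, 0xB9, 0xE8}, it should return 0xAA88B9E8.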
But…
obskyr asks:
This makes … no sense. It isn't converted to a negative int, it's implicitly converted to an int with all the 24 high bits set to 1. Which isn't the same number. And it's not only in the printf, but in n |= arr[index + i] too. Why is this, and why does it not convert to the actual number?
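There are several misconceptions here. First, I said 'a negative int'; I did not say 'the unsigned value, negated'. For example, a signed char with the bit pattern 0xFF has the value -1, whereas 0xFF as an unsigned number is 255; negating that would give -255, not -1. Likewise, 0xAA as a signed char is -86, and an int with the 24 high bits set, 0xFFFFFFAA, is exactly how -86 is represented as a 32-bit two's complement int. So the conversion does preserve the actual number; that number is simply negative.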
The 'why' is because the C standard says that's what should happen. I've omitted a section which describes 'ranks', but generally speaking, shorter types have a lower rank than longer types.
ISO/IEC 9899:2011
The C standard current at the time of writing, C11, says what follows, but the earlier version said very much the same thing in much the same words:
§6.3 Conversions
§6.3.1 Arithmetic conversions
§6.3.1.1 Boolean, characters and integers
…
¶2 The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.
¶3 The integer promotions preserve value including sign. As discussed earlier, whether a "plain" char is treated as signed is implementation-defined.
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.
6.3.1.8 Usual arithmetic conversions
¶1 Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result. For the specified operands, each operand is converted, without change of type domain, to a type whose corresponding real type is the common real type. Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result, whose type domain is the type domain of the operands if they are the same, and complex otherwise. This pattern is called the usual arithmetic conversions:
and this material is followed by a list of rules, moving on to:
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
So, in the context of the expression:
n |= arr[index + i];
This is equivalent to:
n = n | arr[index + i];
And in this context, the value n on the RHS is promoted to int, and the value of arr[index + i] is promoted to int, and the | operation works on two int values, and the result is then converted to unsigned short, which is the type of n.
§6.5.12 Bitwise inclusive OR operator
Constraints
2 Each of the operands shall have integer type.
Semantics
¶3 The usual arithmetic conversions are performed on the operands.
¶4 The result of the | operator is the bitwise inclusive OR of the operands (that is, each bit in the result is set if and only if at least one of the corresponding bits in the converted operands is set).
(Note that 'integer type' is not the same as 'type int'.)
And in the context of a function call to a function with variadic arguments:
§6.5.2.2 Function calls
¶6 If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
¶7 If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
The value is being sign-extended in the printf. Apparently your compiler's plain char is signed, with a range of -128 .. 127. The char itself is not using 32 bits, only 8.
When a signed char is promoted to an int, it performs sign extension to 32 bits (in your case). Such conversions are common in C.
Could someone explain to me why:
x = x << 1;
x = x >> 1;
and:
x = (x << 1) >> 1;
produce different answers in C? x is a uint8_t (an unsigned 1-byte integer). For example, when I pass it 128 (10000000), the first case returns 0 (as expected, the most significant bit falls out), but the second case returns the original 128. Why is that? I'd expect these expressions to be equivalent.
This is due to integer promotions, both operands of the bit-wise shifts will be promoted to int in both cases. In the second case:
x = (x << 1) >> 1;
the result of x << 1 will be an int and therefore the shifted bit will be preserved and available to the next step as an int which will shift it back again. In the first case:
x = x << 1;
x = x >> 1;
when you assign back to x you will lose the extra bits. From the draft C99 standard section 6.5.7 Bit-wise shift operators it says:
The integer promotions are performed on each of the operands.
The integer promotions are covered in section 6.3.1.1 Boolean, characters, and integers paragraph 2 which says:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.48)
The last piece of the puzzle: why does the conversion from the int value 256 to uint8_t give us 0? The conversion is covered in section 6.3.1.3 Signed and unsigned integers, which is under the Conversions section and says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49)
So we have 256 - (255+1) which is 0.
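Here is a minimal sketch of both forms using the value from the question, 128. The function names are hypothetical:

```c
#include <stdint.h>

/* Two statements: the int 256 is converted to uint8_t (256 - 256 == 0)
   before the right shift, so the top bit is gone. */
uint8_t shift_two_statements(uint8_t x)
{
    x = x << 1;
    x = x >> 1;
    return x;
}

/* One expression: the int 256 is shifted back to 128 before the
   conversion to uint8_t. */
uint8_t shift_one_expression(uint8_t x)
{
    x = (x << 1) >> 1;
    return x;
}
```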
When you shift, the operands are promoted to int, so the result is an int. In the first example you convert that int back to uint8_t every time and lose the intermediate data, but in the second example you keep the int result when you shift back.
Please look at my test code:
#include <stdlib.h>
#include <stdio.h>
#define PRINT_COMPARE_RESULT(a, b) \
if (a > b) { \
printf( #a " > " #b "\n"); \
} \
else if (a < b) { \
printf( #a " < " #b "\n"); \
} \
else { \
printf( #a " = " #b "\n" ); \
}
int main()
{
signed int a = -1;
unsigned int b = 2;
signed short c = -1;
unsigned short d = 2;
PRINT_COMPARE_RESULT(a,b);
PRINT_COMPARE_RESULT(c,d);
return 0;
}
The result is the following:
a > b
c < d
My platform is Linux, and my gcc version is 4.4.2.
I am surprised by the second line of output.
The first line of output is caused by integer promotion. But why is the result of the second line different?
The following rules are from C99 standard:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
I think both comparisons should fall under the same rule, the second case of integer promotion.
When you use an arithmetic operator, the operands go through two conversions.
Integer promotions: If int can represent all values of the type, then the operand is promoted to int. This applies to both short and unsigned short on most platforms. The conversion performed on this stage is done on each operand individually, without regard for the other operand. (There are more rules, but this is the one that applies.)
Usual arithmetic conversions: If you compare an unsigned int against a signed int, since neither includes the entire range of the other, and both have the same rank, then both are converted to the unsigned type. This conversion is done after examining the type of both operands.
Obviously, the "usual arithmetic conversions" don't always apply, if there are not two operands. This is why there are two sets of rules. One gotcha, for example, is that shift operators << and >> don't do usual arithmetic conversions, since the type of the result should only depend on the left operand (so if you see someone type x << 5U, then the U stands for "unnecessary").
Breakdown: Let's assume a typical system with 32-bit int and 16-bit short.
int a = -1; // "signed" is implied
unsigned b = 2; // "int" is implied
if (a < b)
puts("a < b"); // not printed
else
puts("a >= b"); // printed
First the two operands are promoted. Since both are int or unsigned int, no promotions are done.
Next, the two operands are converted to the same type. Since int can't represent all possible values of unsigned, and unsigned can't represent all possible values of int, there is no obvious choice. In this case, both are converted to unsigned.
When converting from signed to unsigned, 2^32 (one more than the maximum value of a 32-bit unsigned) is repeatedly added to the signed value until it is in the range of the unsigned type; for -1 this happens once, giving 4294967295. On a two's complement processor this is actually a no-op.
So the comparison becomes if (4294967295u < 2u), which is false.
Now let's try it with short:
short c = -1; // "signed" is implied
unsigned short d = 2;
if (c < d)
puts("c < d"); // printed
else
puts("c >= d"); // not printed
First, the two operands are promoted. Since both can be represented faithfully by int, both are promoted to int.
Next, they are converted to the same type. But they already are the same type, int, so nothing is done.
So the comparison becomes if (-1 < 2), which is true.
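Both comparisons can be reproduced as functions returning the result of <. This is a sketch with hypothetical names, assuming 32-bit int and 16-bit short as in the breakdown:

```c
/* int vs unsigned int: same rank, so -1 is converted to UINT_MAX
   and the comparison is false (0). */
int int_vs_unsigned(void)
{
    int a = -1;
    unsigned int b = 2;
    return a < b;
}

/* short vs unsigned short: both are promoted to int first, so this
   compares -1 < 2 as int and is true (1). */
int short_vs_unsigned_short(void)
{
    short c = -1;
    unsigned short d = 2;
    return c < d;
}
```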
Writing good code: There's an easy way to catch these "gotchas" in your code. Just always compile with warnings turned on, and fix the warnings. I tend to write code like this:
int x = ...;
unsigned y = ...;
if (x < 0 || (unsigned) x < y)
...;
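The pattern above can be wrapped in a small helper. int_less_than_unsigned() is a hypothetical name, and this is a sketch rather than a library function:

```c
/* "Is the signed x less than the unsigned y?" Negative x is handled
   first, so the cast only ever sees non-negative values. */
int int_less_than_unsigned(int x, unsigned int y)
{
    return x < 0 || (unsigned int)x < y;
}
```

For example, int_less_than_unsigned(-1, 2u) is true, whereas the plain comparison -1 < 2u is false.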
You have to watch out that any code you do write doesn't run into the other signed vs. unsigned gotcha: signed overflow. For example, the following code:
int x = ..., y = ...;
if (x + 100 < y + 100)
...;
unsigned a = ..., b = ...;
if (a + 100 < b + 100)
...;
Some popular compilers will optimize (x + 100 < y + 100) to (x < y), but that is a story for another day. Just don't overflow your signed numbers.
Footnote: Note that while signed is implied for int, short, long, and long long, it is NOT implied for char. Instead, it depends on the platform.
Taken from the C++ standard:
4.5 Integral promotions [conv.prom]
1 An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int.
In practice it means, that all operations (on the types in the list) are actually evaluated on the type int if it can cover the whole value set you are dealing with, otherwise it is carried out on unsigned int.
In the first case the values are compared as unsigned int because one of them was unsigned int, and this is why -1 is "greater" than 2. In the second case the values are compared as signed integers, as int covers the whole domain of both short and unsigned short, and so -1 is smaller than 2.
(Background: the net effect of these rather complex rules is that the compiler can largely ignore the declared type (!) :) and care only about the data size.)
The conversion process for C++ is described as the usual arithmetic conversions. However, I think the most relevant rule is at the sub-referenced section conv.prom: Integral promotions 4.6.1:
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank ([conv.rank]) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
The funny thing there is the use of the word "can", which I think suggests that this promotion is performed at the discretion of the compiler.
I also found this C-spec snippet that hints at the omission of promotion:
11 EXAMPLE 2 In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the "integer promotions" require that the abstract machine promote the value of each variable to int size and then add the two ints and truncate the sum. Provided the addition of two chars can be done without overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only produce the same result, possibly omitting the promotions.
There is also the definition of "rank" to be considered. The list of rules is pretty long, but as it applies to this question "rank" is straightforward:
The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type.
This question was first inspired by the (unexpected) results of this code:
uint16_t t16 = 0;
uint8_t t8 = 0x80;
uint8_t t8_res;
t16 = (t8 << 1);
t8_res = (t8 << 1);
printf("t16: %x\n", t16); // Expect 0, get 0x100
printf(" t8: %x\n", t8_res); // Expect 0, get 0
But it turns out this makes sense:
6.5.7 Bitwise shift operators
Constraints
2 Each of the operands shall have integer type
Thus the originally confused line is equivalent to:
t16 = (uint16_t) (((int) t8) << 1);
A little non-intuitive IMHO, but at least well-defined.
Ok, great, but then we do:
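{
// PRIx64 comes from <inttypes.h>; %lx is only correct where uint64_t is unsigned long
uint64_t t64 = 1;
t64 <<= 31;
printf("t64: %" PRIx64 "\n", t64); // Expect 0x80000000, get 0x80000000
t64 <<= 31;
printf("t64: %" PRIx64 "\n", t64); // Expect 0x0, get 0x4000000000000000
}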
// edit: following the same literal argument as above, the following should be equivalent:
t64 = (uint64_t) (((int) t64) << 31);
// hence my confusion / expectation [end_edit]
Now, we get the intuitive result, but not what would be derived from my (literal) reading of the standard. When / how does this "further automatic type promotion" take place? Or is there a limitation elsewhere that a type can never be demoted (that would make sense?), in that case, how do the promotion rules apply for:
uint32_t << uint64_t
Since the standard does say both arguments are promoted to int; should both arguments be promoted to the same type here?
// edit:
More specifically, what should the result of:
uint32_t t32 = 1;
uint64_t t64_one = 1;
uint64_t t64_res;
t64_res = t32 << t64_one;
// end edit
The above question is resolved once we recognize that the spec does not demand a promotion to int specifically, but rather to an integer type, which uint64_t qualifies as.
// CLARIFICATION EDIT:
Ok, but now I am confused again. Specifically, if uint8_t is an integer type, then why is it being promoted to int at all? It does not seem to be related to the constant int 1, as the following exercise demonstrates:
{
uint16_t t16 = 0;
uint8_t t8 = 0x80;
uint8_t t8_one = 1;
uint8_t t8_res;
t16 = (t8 << t8_one);
t8_res = (t8 << t8_one);
printf("t16: %x\n", t16);
printf(" t8: %x\n", t8_res);
}
t16: 100
t8: 0
Why is the (t8 << t8_one) expression being promoted if uint8_t is an integer type?
--
For reference, I'm working from ISO/IEC 9899:TC2, WG14/N1124, May 6, 2005. If that's out of date and someone could also provide a link to a more recent copy, that'd be appreciated as well.
I think the source of your confusion might be that the following two statements are not equivalent:
Each of the operands shall have integer type
Each of the operands shall have int type
uint64_t is an integer type.
The constraint in §6.5.7 that "Each of the operands shall have integer type." is a constraint that means you cannot use the bitwise shift operators on non-integer types like floating point values or pointers. It does not cause the effect you are noting.
The part that does cause the effect is in the next paragraph:
3. The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand.
The integer promotions are described in §6.3.1.1:
2. The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
uint8_t has a lesser rank than int, so the value is converted to an int (since we know that an int must be able to represent all the values of uint8_t, given the requirements on the ranges of those two types).
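Here is a sketch illustrating this with the types from the question. It assumes an int wider than 8 bits and a uint64_t whose rank is at least that of int, which is true on common platforms. The function names are hypothetical:

```c
#include <stdint.h>

/* uint8_t << uint8_t: the left operand is promoted to int, so the
   expression has type int. */
int u8_shift_is_int(void)
{
    uint8_t t8 = 0x80, t8_one = 1;
    return _Generic(t8 << t8_one, int: 1, default: 0);
}

/* The int 0x100 survives when assigned to a uint16_t ... */
uint16_t u8_shift_into_u16(uint8_t t8)
{
    return t8 << 1;
}

/* ... but is truncated to 0 when assigned to a uint8_t. */
uint8_t u8_shift_into_u8(uint8_t t8)
{
    return t8 << 1;
}

/* uint64_t is left alone by the integer promotions, so the shifts
   happen in 64 bits: the result is 1 << 62. */
uint64_t u64_shift(void)
{
    uint64_t t64 = 1;
    t64 <<= 31;
    t64 <<= 31;
    return t64;
}
```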
The ranking rules are complex, but they guarantee that a type with a higher rank cannot have a lesser precision. This means, in effect, that types cannot be "demoted" to a type with lesser precision by the integer promotions (it is possible for uint64_t to be promoted to int or unsigned int, but only if the range of the type is at least that of uint64_t).
In the case of uint32_t << uint64_t, the rule that kicks in is "The type of the result is that of the promoted left operand". So we have a few possibilities:
If int is at least 33 bits, then uint32_t will be promoted to int and the result will be int;
If int is less than 33 bits and unsigned int is at least 32 bits, then uint32_t will be promoted to unsigned int and the result will be unsigned int;
If unsigned int is less than 32 bits then uint32_t will be unchanged and the result will be uint32_t.
On today's common desktop and server implementations, int and unsigned int are usually 32 bits, and so the second possibility will occur (uint32_t is promoted to unsigned int). In the past it was common for int / unsigned int to be 16 bits, and the third possibility would occur (uint32_t left unpromoted).
The result of your example:
uint32_t t32 = 1;
uint64_t t64_one = 1;
uint64_t t64_res;
t64_res = t32 << t64_one;
Will be the value 2 stored into t64_res. Note, though, that this example is not affected by the fact that the result of the expression is not uint64_t; an example of an expression that would be affected is:
uint32_t t32 = 0xFF000;
uint64_t t64_shift = 16;
uint64_t t64_res;
t64_res = t32 << t64_shift;
The result here is 0xf0000000.
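The usual fix is to widen the left operand before shifting. Here is a sketch with hypothetical function names, assuming 32-bit int and unsigned int (so uint32_t is not promoted to int):

```c
#include <stdint.h>

/* The result type follows the promoted left operand (uint32_t), so the
   top bits are lost before the value is widened for the return. */
uint64_t shift_in_32_bits(void)
{
    uint32_t t32 = 0xFF000;
    uint64_t t64_shift = 16;
    return t32 << t64_shift;
}

/* Converting the left operand first makes the shift happen in 64 bits. */
uint64_t shift_in_64_bits(void)
{
    uint32_t t32 = 0xFF000;
    uint64_t t64_shift = 16;
    return (uint64_t)t32 << t64_shift;
}
```

The first should yield 0xF0000000, as stated above, and the second the full 0xFF0000000.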
Note that although the details are fairly intricate, you can boil it all down to a fairly simple rule that you should keep in mind:
In C, arithmetic is never done in types narrower than int /
unsigned int.
You found the wrong rule in the standard :( The relevant one is something like "the usual integer promotions apply". This is what hits you in the first example. If an integer type like uint8_t has a rank that is smaller than int, it is promoted to int. uint64_t does not have a rank smaller than int or unsigned int, so no promotion is performed and the << operator is applied to the uint64_t value.
Edit: All integer types smaller than int are promoted for arithmetic. This is just a fact of life :) Whether or not uint32_t is promoted depends on the platform, because it might have the same rank or higher than int (not promoted) or a smaller rank (promoted).
Concerning the << operator, the type of the right operand is not really important; what determines the number of bits is the left operand (with the above rules). What matters for the right operand is its value: it must not be negative, nor greater than or equal to the width of the (promoted) left operand.