To implement a logical right shift in C, I searched the web and found the following C code:
int a, b, c;
int x = -100;
a = (unsigned) x >> 2;
b = (0xffffffff & x) >> 2;
c = (0x0 | x ) >> 2;
Both a and b hold the logical right shift result (1073741799), but c still holds the arithmetic shift result (-25). Could somebody explain why? Thanks.
b = (0xffffffff & x) >> 2;
Assuming that your ints are 32 bits, the type of the literal constant 0xffffffff is unsigned int, because it is too large to fit in a plain int. The &, then, is between an unsigned int and an int, in which case the usual arithmetic conversions convert the int operand to unsigned int. The shift therefore happens on an unsigned value, so it shifts in 0 bits from the left.
c = (0x0 | x ) >> 2;
The type of 0x0 defaults to int because it is small enough to fit, so the bitwise or happens on ints, and so does the following shift. What happens when you shift a negative signed integer right is implementation-defined, but most compilers produce an arithmetic shift that sign-extends.
(unsigned) x is of type unsigned int, so it gets a logical shift.
0xffffffff (assuming 32-bit int) is of type unsigned int, so (0xffffffff & x) is also of type unsigned int and gets a logical shift.
0x0 is of type int, so (0x0 | x) is of type int and gets an arithmetic shift (strictly, this is implementation-defined).
It's all about the type of the left operand of >>. If it is signed and the value is negative, the right shift typically sets the MSB to 1 (the exact result is implementation-defined). If the operand is unsigned, the vacated MSBs are always zero after the right shift.
In your first expression the operand is cast explicitly to unsigned.
In the second expression, (0xffffffff & x) is unsigned, because 0xffffffff is an unsigned integer constant (its value does not fit in a signed int).
OTOH, in the third example 0x0 is signed (this is the default for integer constants that fit in int). Hence the whole operand (0x0 | x) is signed.
In Go (the language I'm most familiar with), the result of a mathematical operation is always the same data type as the operands, meaning if the operation overflows, the result will be incorrect. For example:
package main

import "fmt"

func main() {
	var a byte = 100
	var b byte = 9
	var r byte = (a << b) >> b
	fmt.Println(r)
}
This prints 0, as all the bits are shifted out of the bounds of a byte during the initial << 9 operation, then zeroes are shifted back in during the >> 9 operation.
However, this isn't the case in C:
#include <stdio.h>

int main(void) {
    unsigned char a = 100;
    unsigned char b = 9;
    unsigned char r = (a << b) >> b;
    printf("%d\n", r);
    return 0;
}
This code prints 100. Although this yields the "correct" result, this is unexpected to me, as I'd only expect promotion if one of the operands were larger than a byte, but in this case all operands are bytes. It's as though the temporary variable holding the result of the << 9 operation is larger than the resulting variable, and is only downcast back to a byte after the full RHS is evaluated, and thus after the >> 9 operation restores the bits.
Obviously, if you explicitly store the result of the << 9 in a byte before continuing, you get the same result as in Go:
#include <stdio.h>

int main(void) {
    unsigned char a = 100;
    unsigned char b = 9;
    unsigned char c = a << b;
    unsigned char r = c >> b;
    printf("%d\n", r);
    return 0;
}
This isn't merely the case with bitwise operators. I've tested with multiplication/division too, and it demonstrates the same behaviour.
My question is: is this behaviour of C defined? If so, where? Does it actually use a specific data type for the interim values of a complex expression? Or is this actually undefined behaviour, like an incidental result of the operations being performed in a 32/64 bit CPU register before being saved back to memory?
C 2018 6.5.7 discusses the shift operators. Paragraph 3 says:
The integer promotions are performed on each of the operands…
6.3.1.1 2 specifies the integer promotions:
… If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
Thus in a << b where a and b are unsigned char, a is promoted to int, which is at least 16 bits. (A C implementation may define unsigned char to be more than eight bits. It could be the same width as int. In this case, the integer promotions would not convert a or b.)
Note that if the integer promotions were not applied, the behavior of evaluating a << b with b equal to 9 would not be defined by the C standard, as the behavior of the shift operators is not defined for shift amounts greater than or equal to the width of the promoted left operand.
6.5.5 specifies the multiplicative operators. Paragraph 3 says:
The usual arithmetic conversions are performed on the operands.
6.3.1.8 specifies the usual arithmetic conversions:
… First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain [complex or real], to a type whose corresponding real type is long double.
Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.
Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
Rank has a technical definition that largely corresponds to width (number of bits in an integer type).
Thus, in a * b where a and b are unsigned char, they are both promoted to int (with the caveat above about wide unsigned char) and no further conversions are necessary. If one operand were wider than int, say long long int, while the other is unsigned char then both operands would be converted to that wider type.
Welcome to integer promotions! One behavior of the C language (an often criticized one, I'd add) is that types like char and short are promoted to int before doing any arithmetic operation with them, and the result is also int. What does this mean?
#include <stdio.h>

unsigned char foo(unsigned char x) {
    return (x << 4) >> 4;
}

int main(void) {
    if (foo(0xFF) == 0x0F) {
        printf("Yay!\n");
    }
    else {
        printf("... hey, wait a minute!\n");
    }
    return 0;
}
Needless to say, the above code prints ... hey, wait a minute!. Let's discover why:
// this line of code:
return (x << 4) >> 4;
// is converted to this (because of integer promotion):
return ((int) x << 4) >> 4;
Therefore, this is what happens:
x is unsigned char (8-bit) and its value is 0xFF,
x << 4 needs to be executed, but first x is converted to int (32-bit),
x << 4 becomes 0x000000FF << 4, and the result 0x00000FF0 is also int,
0x00000FF0 >> 4 is executed, yielding 0x000000FF,
finally, 0x000000FF is converted to unsigned char (because that's the return value of foo()), so it becomes 0xFF,
and that's why foo(0xFF) yields 0xFF instead of 0x0F.
How to prevent this? Simple: convert the result of x << 4 to unsigned char. In the previous example, 0x00000FF0 would have become 0xF0.
unsigned char foo(unsigned char x) {
return ((unsigned char) (x << 4)) >> 4;
}
foo(0xFF) == 0x0F
NOTE: in the previous examples, it is assumed that unsigned char is 8 bits and int is 32 bits, but the examples work for basically any situation in which CHAR_BIT == 8 (because C17 requires that sizeof(int) * CHAR_BIT >= 16).
P.S.: this answer is not as exhaustive as the C official standard document, of course. But you can find all the (valid and defined) behavior of C described in the latest draft of the ISO/IEC 9899:2018 standard (a.k.a. C17/C18).
int main(){
signed int a = 0b00000000001111111111111111111111;
signed int b = (a << 10) >> 10;
// b is: 0b11111111111111111111111111111111
signed short c = 0b0000000000111111;
signed short d = (c << 10) >> 10;
// d is: 0b111111
return 0;
}
Assuming int is 32 bits and short is 16 bits,
Why would b get sign extended but d does not get sign extended?
I have tested this with gdb on x64, compiled with gcc.
In order to get short sign extended, I had to use two separate variables like this:
signed short f = c << 10;
signed short g = f >> 10;
// g is: 0b1111111111111111
In the case of signed short, when an integer type smaller than int is used in an expression it is (in most cases) promoted to type int. This is spelled out in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
And this promotion specifically happens in the case of bitwise shift operators as specified in section 6.5.7p3:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
So the short value 0x003f is promoted to the int value 0x0000003f and the left shift is applied. This results in 0x0000fc00, and the right shift gives a result of 0x0000003f.
The signed int case is a bit more interesting. In this case you're left-shifting a bit with the value 1 into the sign bit. This triggers undefined behavior as per 6.5.7p4:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
So while the output you get for the signed int case is what you might expect it to be, it's actually undefined behavior and so you can't depend on that result.
short is automatically converted to int by the integer promotions, per C 2018 6.5.7 3:
The integer promotions are performed on each of the operands…
So (c << 10) shifts an int 0b111111 left 10 bits, yielding (in your C implementation) the 32-bit int 0b00000000000000001111110000000000. The sign bit in that is zero; it is a positive number.
When you do signed short f = c << 10;, the result of c << 10 is too big to fit in a signed short. It is 64,512, which is above the largest value your signed short can represent, 32,767. In an assignment, the value is converted to the type of the left operand. Per C 2018 6.3.1.3 3, the conversion is implementation-defined. GCC defines this conversion to wrap modulo 65,536 (two to the power of the number of bits in the type). So converting 64,512 yields 64,512 − 65,536 = −1024. So f is set to −1024.
Then, in f >> 10, you are shifting a negative value. As signed short, f is still promoted to int, but this conversion keeps the value, resulting in an int value of −1024. This is then shifted. This shift is implementation-defined, and GCC defines it to shift with sign extension. So the result of -1024 >> 10 is −1.
For starters according to the C Standard (6.5.7 Bitwise shift operators)
3 The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand.
Thus this value
signed short c = 0b0000000000111111;
in the expression used in this declaration
signed short d = (c << 10) >> 10;
is promoted to the type int. As the value is positive, the promoted value is also positive.
Thus this operation
c << 10
does not touch the sign bit.
On the other hand this code snippet
signed int a = 0b00000000001111111111111111111111;
signed int b = (a << 10) >> 10;
has undefined behavior because, according to the same section of the C Standard:
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
I thought I'd found something similar in this answer but in that case they weren't assigning the result of the expression to the variable. In my case I am assigning it but the bitshift part of the expression has no effect.
unsigned leftmost1 = ((~0)>>20);
printf("leftmost1 %u\n", leftmost1);
Returns
leftmost1 4294967295
Whereas
unsigned leftmost1 = ~0;
leftmost1 = leftmost1 >> 20;
printf("leftmost1 %u\n", leftmost1);
Gives me
leftmost1 4095
I would expect that separating the logic into two lines would have no impact. Why are the results different?
In the first case, you are doing a signed right shift, because ~0 results in a signed value. The exact behavior of signed right shifts is implementation-defined, but most platforms, including yours, extend the sign bit, so the shift is a no-op for your input of "all ones".
In the second case, you are doing an unsigned right shift, since leftmost1 is an unsigned value. So you shift in zeros from the left.
If you want to do an unsigned shift without the intermediate assignment, you can write:
(~0u) >> 20
Where the u suffix indicates an unsigned literal.
~0 is an int. So your first piece of code isn't equivalent to the second, it's equivalent to
int tmp = ~0;
tmp = tmp >> 20;
unsigned leftmost1 = tmp;
You're seeing the results of sign extension when you right-shift a negative number.
0 has type int. ~0 is -1 on a typical two's complement machine. Right-shifting a negative number has implementation-defined results, but a common choice is to shift in 1 bits, which for -1 leaves the number unchanged (i.e. -1 >> anything is -1 again).
You can fix this by writing 0u (which is a literal of type unsigned int). This forces the operations to be done in unsigned int, as in your second example:
unsigned leftmost1 = ~0;
This line is equivalent to unsigned leftmost1 = -1, which implicitly converts -1 (a signed int) to UINT_MAX. The following operation (leftmost1 >> 20) then uses unsigned arithmetic.
Try casting like this. ~0 has type int, which is signed, so the sign bit is carried along when you shift:
unsigned leftmost1 = ((unsigned)(~0)>>20);
printf("leftmost1 %u\n", leftmost1);
Could someone explain to me why:
x = x << 1;
x = x >> 1;
and:
x = (x << 1) >> 1;
produce different answers in C? x is of type uint8_t (an unsigned 1-byte integer). For example, when I pass it 128 (10000000), the first case returns 0 (as expected, the most significant bit falls out), but the second case returns the original 128. Why is that? I'd expect these expressions to be equivalent.
This is due to the integer promotions: both operands of the bitwise shifts are promoted to int in both cases. In the second case:
x = (x << 1) >> 1;
the result of x << 1 will be an int and therefore the shifted bit will be preserved and available to the next step as an int which will shift it back again. In the first case:
x = x << 1;
x = x >> 1;
when you assign back to x you will lose the extra bits. From the draft C99 standard section 6.5.7 Bit-wise shift operators it says:
The integer promotions are performed on each of the operands.
The integer promotions are covered in section 6.3.1.1 Boolean, characters, and integers paragraph 2 which says:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
The last piece of this is why the conversion from the int value 256 to uint8_t gives us 0. The conversion is covered in section 6.3.1.3 Signed and unsigned integers, which is under the Conversions section and says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
So we have 256 - (255+1) which is 0.
When you bit-shift, the operands are promoted to int, so the result is an int. In the first example you convert the int back to uint8_t every time and lose the intermediate data. In the second example you keep the int result when you shift back.
The following code converts a byte buffer back to an unsigned long int:
unsigned long int anotherLongInt;
anotherLongInt = ( (byteArray[0] << 24)
+ (byteArray[1] << 16)
+ (byteArray[2] << 8)
+ (byteArray[3] ) );
where byteArray is declared as unsigned char byteArray[4];
Question:
I thought byteArray[1] would be just one unsigned char (8 bit). When left-shifting by 16, wouldn't that shift all the meaningful bits out and fill the entire byte with 0? Apparently it is not 8 bit. Perhaps it's shifting the entire byteArray which is a consecutive 4 byte? But I don't see how that works.
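In that arithmetic context byteArray[0] is promoted to either int or unsigned int, so the shift operates on a value at least as wide as int rather than on an 8-bit byte. One caveat: with a 32-bit int, byteArray[0] << 24 is undefined when the byte's top bit is set, because the result is not representable in int. That is why I like to deal only with unsigned types when doing bitwise stuff.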
6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand.
And integer promotions:
6.3.1.1
If an int can represent all values of the original type the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the integer promotions.
The unsigned chars are implicitly converted when shifting: the integer promotions turn them into int (or unsigned int on a platform where int cannot represent every unsigned char value). To get what you intend, it is safer to cast the bytes explicitly; that also makes the code more portable, and the reader immediately sees what you intend to do:
unsigned long int anotherLongInt;
anotherLongInt = ( ((unsigned long)byteArray[0] << 24)
+ ((unsigned long)byteArray[1] << 16)
+ ((unsigned long)byteArray[2] << 8)
+ ((unsigned long)byteArray[3] ) );