Reverse precedence of shift and bitwise complement in an expression

In this code:
unsigned short int i = 3;
unsigned short int x = 30;
unsigned short int z = (~x) >> i;
On the third row it seems that it first does the shift and THEN the complement (~) even when I use parentheses.
However, the strange result doesn't occur if I replace short with long.
It happens both in Windows and in Unix. Why is that?

It performs the operations exactly in the order you prescribed.
However, the operands are not unsigned short ints. Integral promotion turns x and i into good old regular signed integers before performing the operation. To quote the C standard on this:
6.3.1 Arithmetic operands / paragraph 2
The following may be used in an expression wherever an int or unsigned
int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to
the rank of int and unsigned int.
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int.
And unsigned shorts can fit snugly in a signed integer on the machines you tried.
Furthermore, right shifting a signed integer has implementation defined results for negative values:
6.5.7 Bitwise shift operators / paragraph 5
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1
has an unsigned type or if E1 has a signed type and a nonnegative
value, the value of the result is the integral part of the quotient of
E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
And ~x is some negative integer (which one precisely depends on the value representation of signed integers).
All of the above more than likely accounts for you not getting the expected result when converting it back to an unsigned short integer.
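The effect can be sketched as follows, assuming a 32-bit int and using uint16_t in place of unsigned short so the widths are explicit. The function names are illustrative only. Converting the complement back to the 16-bit type before shifting drops the promoted high bits, so the shift operates on a nonnegative value.

```c
#include <stdint.h>

/* ~x is computed on the promoted int, so with a 32-bit int the
   result is negative for every uint16_t input. */
int complement_is_negative(uint16_t x)
{
    return ~x < 0;
}

/* Convert the complement back to 16 bits before shifting. The left
   operand of >> is promoted again, but this time to a nonnegative
   int, so the shift is fully defined. Requires i < 16 to be useful. */
uint16_t complement_then_shift(uint16_t x, unsigned i)
{
    return (uint16_t)((uint16_t)~x >> i);
}
```

With x = 30 and i = 3, complement_then_shift returns 0x1FFC, which is the 16-bit complement 0xFFE1 shifted right by three.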

Related

Inconsistencies in sign extension when shifting signed int vs short

int main(){
signed int a = 0b00000000001111111111111111111111;
signed int b = (a << 10) >> 10;
// b is: 0b11111111111111111111111111111111
signed short c = 0b0000000000111111;
signed short d = (c << 10) >> 10;
// d is: 0b111111
return 0;
}
Assuming int is 32 bits and short is 16 bits,
Why would b get sign extended but d does not get sign extended?
I have tested this with gdb on x64, compiled with gcc.
In order to get short sign extended, I had to use two separate variables like this:
signed short f = c << 10;
signed short g = f >> 10;
// g is: 0b1111111111111111

In the case of signed short, when an integer type smaller than int is used in an expression it is (in most cases) promoted to type int. This is spelled out in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or
unsigned int may be used
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an
unsigned int. These are called the integer promotions. All
other types are unchanged by the integer promotions.
And this promotion specifically happens in the case of bitwise shift operators as specified in section 6.5.7p3:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
So the short value 0x003f is promoted to the int value 0x0000003f and the left shift is applied. This results in 0x0000fc00, and the right shift gives a result of 0x0000003f.
The signed int case is a bit more interesting. In this case you're left-shifting a bit with the value 1 into the sign bit. This triggers undefined behavior as per 6.5.7p4:
The result of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are filled with zeros. If E1 has an unsigned
type, the value of the result is E1 × 2^E2, reduced
modulo one more than the maximum value representable in the
result type. If E1 has a signed type and nonnegative value,
and E1 × 2^E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is
undefined.
So while the output you get for the signed int case is what you might expect it to be, it's actually undefined behavior and so you can't depend on that result.
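A minimal sketch of both cases, assuming a 32-bit int and a 16-bit short (the function names are made up for illustration). The first function reproduces the short case from the question. The second performs the same round trip on an unsigned 32-bit value, which avoids the undefined left shift but also yields no sign extension.

```c
#include <stdint.h>

/* c is promoted to int before the shifts, so for small nonnegative
   values nothing reaches the sign bit and the value comes back
   unchanged. */
int short_round_trip(short c)
{
    return (c << 10) >> 10;
}

/* On an unsigned 32-bit value the round trip is fully defined:
   the top 10 bits are discarded and zeros are shifted back in. */
uint32_t unsigned_round_trip(uint32_t a)
{
    return (a << 10) >> 10;
}
```

Here short_round_trip(0x3f) returns 0x3f, and unsigned_round_trip(0x003FFFFF) returns 0x003FFFFF rather than a value with the upper bits set.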

short is automatically converted to int by the integer promotions, per C 2018 6.5.7 3:
The integer promotions are performed on each of the operands…
So (c << 10) shifts an int 0b111111 left 10 bits, yielding (in your C implementation) the 32-bit int 0b00000000000000001111110000000000. The sign bit in that is zero; it is a positive number.
When you do signed short f = c << 10;, the result of c << 10 is too big to fit in a signed short. It is 64,512, which is above the largest value your signed short can represent, 32,767. In an assignment, the value is converted to the type of the left operand. Per C 2018 6.3.1.3 3, the conversion is implementation-defined. GCC defines this conversion to wrap modulo 65,536 (two to the power of the number of bits in the type). So converting 64,512 yields 64,512 − 65,536 = −1024. So f is set to −1024.
Then, in f >> 10, you are shifting a negative value. As signed short, f is still promoted to int, but this conversion keeps the value, resulting in an int value of −1024. This is then shifted. This shift is implementation-defined, and GCC defines it to shift with sign extension. So the result of -1024 >> 10 is −1.
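If the goal is to sign-extend the low bits of a value without relying on implementation-defined conversions or on shifting negative numbers, one common alternative is the XOR-and-subtract idiom. This is not what the code in the question does. It is only a sketch, and sign_extend is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low `bits` bits of `value` as a two's complement
   number and return it as an int32_t. Supports 1 <= bits <= 31. */
int32_t sign_extend(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);

    /* Keep only the field; it is below 2^31, so the conversion to
       int32_t preserves its value. */
    uint32_t field = value & ((UINT32_C(1) << bits) - 1);

    /* m has only the field's sign bit set. */
    int32_t m = (int32_t)(UINT32_C(1) << (bits - 1));

    /* Flipping the sign bit and subtracting it maps fields with the
       sign bit set to negative values and leaves the rest unchanged. */
    return ((int32_t)field ^ m) - m;
}
```

For the 6-bit field in the question, sign_extend(0x3f, 6) gives −1, which matches the value GCC produced through the two-variable workaround.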

For starters, according to the C Standard (6.5.7 Bitwise shift operators)
3 The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand.
Thus this value
signed short c = 0b0000000000111111;
in the expression used in this declaration
signed short d = (c << 10) >> 10;
is promoted to the integer type int. As the value is positive, the promoted value is also positive.
Thus this operation
c << 10
does not touch the sign bit.
On the other hand this code snippet
signed int a = 0b00000000001111111111111111111111;
signed int b = (a << 10) >> 10;
has undefined behavior because, according to the same section of the C Standard,
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of
the result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1 × 2^E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is
undefined.

Integer promotion and right shifting

Considering that, for example, an unsigned char will always be promoted to an int, can I assume that if I don't cast to unsigned char before shifting, the result will always be implementation-defined?
For example:
unsigned char c = 0x0F;
unsigned int a = c >> 2;
Here c will be promoted to an int before shifting to the right. So the shifting will be implementation defined, depending on the compiler.
The right way would be:
unsigned int a = (unsigned char)c >> 2;
My question being, is this statement true:
Doing any shifting on any datatype smaller than int will be implementation defined if not also cast to unsigned type?

The result will always be well defined.
A right shift of a signed type is only implementation defined if the value is negative. This is specified in section 6.5.7p5 of the C standard regarding Bitwise shift operators:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a nonnegative value,
the value of the result is the integral part of the quotient of E1 /
2^E2. If E1 has a signed type and a negative value, the resulting value
is implementation-defined.
The value also cannot be negative because integer promotions preserve the sign of the value being promoted. From section 6.3.1.1p3 regarding conversion of integers:
The integer promotions preserve value including sign. As discussed
earlier, whether a "plain" char is treated as signed is
implementation-defined.
So because the value is guaranteed to be nonnegative, the right shift operation is well defined.
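A small sketch of this, assuming the usual case where int can represent every unsigned char value (the function names are illustrative). Both functions shift the same promoted, nonnegative int, so the cast in the second one changes nothing.

```c
/* c is promoted to int with its value preserved (0..UCHAR_MAX),
   so the left operand of >> is never negative. */
int shift_promoted_uchar(unsigned char c, unsigned n)
{
    return c >> n;
}

/* The cast yields an unsigned char, which is promoted to int again
   before the shift, so the result is identical. */
int shift_with_cast(unsigned char c, unsigned n)
{
    return (unsigned char)c >> n;
}
```

For c = 0x0F and n = 2, both return 3.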

6.5.7p3 on Bitwise Shift Operations says:
The integer promotions are performed on each of the operands.
If you have a sub-int type on the left side (or right side, but there it doesn't really matter) of a bitshift operation, it'll be promoted to int.
#include <stdio.h>
int main()
{
// http://port70.net/~nsz/c/c11/n1570.html#6.5.7p3
#define TPSTR(X) _Generic(X,unsigned char: "uchar", int: "int")
unsigned char c = 0x0F;
printf("%s\n", TPSTR(c>>2));
printf("%s\n", TPSTR((unsigned char)c>>2));
printf("%s\n", TPSTR((unsigned char)c>>(unsigned char)2));
}
compiles and prints int int int.
The result of a right shift on a signed integer type (int) is implementation-defined only if the left operand has a negative value:
6.5.7p5:
If E1 has a signed type and a negative value, the resulting value is
implementation-defined.

How to explain the bit right shift two different results?

I get two different results, I'm confused, the code is :
#include <stdio.h>
int main ()
{
int i = 0xcffffff3;
printf("%x\n", 0xcffffff3>>2);
printf("%x\n", i>>2);
return 0;
}
the result is :
33fffffc
f3fffffc

It all comes down to 0xcffffff3. That is a hexadecimal integer constant. The type of the constant depends on its magnitude. Let's refer to C11 § 6.4.4.1 ¶ 5:
The type of an integer constant is the first of the corresponding list in which its value can be represented.
Octal or Hexadecimal Constant - int, unsigned int, long int, unsigned long int, long long int, unsigned long long int
So, assuming a 32-bit int on your system, the type of 0xcffffff3 is unsigned int.
Now, when you do int i = 0xcffffff3; the unsigned constant is converted to a signed integer. This conversion yields a negative value.
Finally, when right shifting, it has the semantics defined by C11 §6.5.7 ¶5:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Shifting the unsigned constant yields 0xcffffff3/4 and shifting i yields an implementation defined value (a negative integer in this case).
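The type of the constant can be checked at compile time with C11 _Generic. This is only a sketch: the macro and function names are invented for illustration, and the results stated in the comments assume a 32-bit int.

```c
/* Each macro evaluates to 1 when the expression has the named type,
   and to 0 otherwise. */
#define IS_UNSIGNED_INT(X) _Generic((X), unsigned int: 1, default: 0)
#define IS_INT(X)          _Generic((X), int: 1, default: 0)

/* 0xcffffff3 does not fit in a 32-bit int, so the next type in the
   list for hexadecimal constants, unsigned int, is chosen. */
int large_hex_constant_is_unsigned(void)
{
    return IS_UNSIGNED_INT(0xcffffff3);
}

/* 0x7fffffff still fits in a 32-bit int, so it stays int. */
int small_hex_constant_is_int(void)
{
    return IS_INT(0x7fffffff);
}
```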

In 0xcffffff3 >> 2, the left operand has type unsigned int, while in i >> 2, i is signed.
As a result, the arithmetic shift is prepending 1 bits in the signed case because the number is negative before shifting.
Use (unsigned)i >> 2 or define unsigned i = 0xcffffff3; if you want the same result as with the constants.
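A minimal sketch of the suggested cast, using fixed-width types so the 32-bit assumption is explicit. The name shift_as_unsigned is illustrative and does not come from the question.

```c
#include <stdint.h>

/* Converting a negative int32_t to uint32_t is well defined (2^32 is
   added to the value), and shifting the unsigned result fills the
   vacated bits with zeros. Requires n < 32. */
uint32_t shift_as_unsigned(int32_t i, unsigned n)
{
    return (uint32_t)i >> n;
}
```

The value −805306381 has the bit pattern 0xcffffff3 in 32-bit two's complement, and shift_as_unsigned(-805306381, 2) returns 0x33fffffc, the same result as shifting the constant.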

overflow with left shift

I have a simple program where I have a few macros for bit manipulation.
One of the macro translates to following
unsigned long val = 1 << 0x1f;
Here, I am getting the output as:
val = 0xffffffff80000000;
I understand it is some kind of int overflow. I am confused about the part about the final result. Why is the result coming out to be the one I am getting?
(I do understand here for some reason 1 is treated as int and when left shift is making it a negative int. But the part I am confused about the type casting.)

1 is a signed integer, so in case of overflow it is undefined behavior.
According to the standard (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.
Instead, try unsigned long value 1ul:
unsigned long val = 1ul << 0x1f;
(I do understand here for some reason 1 is treated as int and when left shift is making it a negative int. But the part I am confused about the type casting.)
It could happen in practice indeed, if the sign bit gets set to 1. (But, once again, according to the standard it is UB.)
Nevertheless, let's consider a legal case (I used short and int since int and long have the same size on my system):
// on my system
short int s = SHRT_MIN; // 0x8000
unsigned int i = s; // 0xffff8000
The following part of the standard clarifies it:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
(If I interpret it right) since we cannot represent a negative value in unsigned type, we add UINT_MAX + 1 to SHRT_MIN (in pure mathematical terms, without considering overflow):
UINT_MAX + 1 + SHRT_MIN // 0xffff8000
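The same rule would account for the 0xffffffff80000000 in the question, if we suppose that 1 << 0x1f happened to produce INT_MIN on that implementation (the shift itself is still undefined behavior) and that unsigned long is 64 bits wide. A sketch with fixed-width types, where widen_to_u64 is an illustrative name:

```c
#include <stdint.h>

/* Converting a negative value to a 64-bit unsigned type adds 2^64,
   which sets all of the upper bits. */
uint64_t widen_to_u64(int32_t v)
{
    return (uint64_t)v;
}
```

Here widen_to_u64(INT32_MIN) yields 0xffffffff80000000, while nonnegative inputs keep their value.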

Extension on shifting or arithmetic operations in standard C

Sorry for bad English.
uint16_t a, c;
uint8_t b = 0xff;
a = b<<8;
c = b*10;
What is value of a and c we get? What is situation with arbitrary integer types?

uint16_t a, c;
uint8_t b = 0xff;
a = b<<8;
First, the integer promotions are performed on the arguments of <<. The constant 8 is an int and thus is not converted. Since the conversion rank of uint8_t is smaller than that of int, and all values of uint8_t are representable as ints, b is converted - preserving its value - to int. The resulting int value is then shifted left by eight bits.
If int is only 16 bits wide, the value 0xff * 2^8 is not representable as an int, and then the shift invokes undefined behaviour - 6.5.7 (4) in n1570 and C99:
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Otherwise, the result is 255*256 = 65280 = 0xFF00. Since that value is representable in the type of a, the conversion of the int result of the shift to uint16_t preserves the value; if the result were out of range (e.g. if the shift distance were 9 [and int wide enough]), it would be reduced modulo 2^16 to obtain a value in the range 0 to 2^16 - 1 of uint16_t.
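One way to sidestep the 16-bit int problem is to convert the operand to an unsigned type that is wide enough before shifting. The sketch below does that with uint32_t; the function name is illustrative. The final conversion to uint16_t reduces the value modulo 2^16, as described above.

```c
#include <stdint.h>

/* Shift in an unsigned 32-bit type so no sign bit is involved, then
   let the conversion to uint16_t reduce the result modulo 2^16.
   Requires n < 32. */
uint16_t shift_and_truncate(uint8_t b, unsigned n)
{
    return (uint16_t)((uint32_t)b << n);
}
```

For b = 0xff this gives 0xFF00 with a shift distance of 8, and 0xFE00 with a distance of 9 (130560 reduced modulo 65536).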
c = b*10;
The usual arithmetic conversions are performed on the operands of *. Both operands have an integer type, thus first the integer promotions are performed. Since 10 is an int and all values of b's type are representable as an int, the integer promotions give both operands the same type, int, and the usual arithmetic conversions don't require any further conversions. The multiplication is done at type int, its result, 2550, is again representable in the type of c, so the conversion to uint16_t that is done before storing the value in c preserves the value.
What is situation with arbitrary integer types?
For <<:
1. Integer promotions: values/expressions of an integer type whose conversion rank is less than or equal to that of int (and unsigned int) [integer types with width <= that of (unsigned) int], and bitfields of type _Bool, int, signed int and unsigned int are converted to int or unsigned int (int if that can represent all values of the original type, unsigned int otherwise).
2. If the (promoted) right operand (shift distance) is negative or greater than or equal to the width (number of value bits plus sign bits; there is either one sign bit or none) of the (promoted) left operand, the behaviour is undefined. If the value of the (promoted) left operand is negative, the behaviour is undefined. If the type of the (promoted) left operand is unsigned, the result is value * 2^distance, reduced modulo 2^width. If the type of the (promoted) left operand is signed and the value nonnegative, the result is value * 2^distance if that is representable in the type, the behaviour is undefined otherwise.
3. If no undefined behaviour occurred in 2., the result is converted to the type of the variable it is stored in.
   - If the target type is _Bool (or an alias thereof), a nonzero result is converted to 1, a zero result to 0, otherwise
   - If the result can be represented in the target type, its value is preserved, otherwise
   - If the target type is unsigned, the result is reduced modulo 2^width, otherwise
   - the result is converted in an implementation-defined manner or an implementation-defined signal is raised.
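The conditions for the shift itself can be turned into a guard that refuses to shift whenever the behaviour would be undefined. This is a sketch only: checked_shl is a hypothetical helper, and it assumes int has no padding bits so that its width equals sizeof(int) * CHAR_BIT.

```c
#include <limits.h>
#include <stdbool.h>

/* Store value << distance in *out and return true when the shift is
   defined for int operands; otherwise return false and leave *out
   untouched. */
bool checked_shl(int value, unsigned distance, int *out)
{
    /* Width of int, assuming it has no padding bits. */
    const unsigned width = (unsigned)(sizeof(int) * CHAR_BIT);

    if (distance >= width)
        return false;   /* shift distance out of range */
    if (value < 0)
        return false;   /* negative left operand */
    if (value > (INT_MAX >> distance))
        return false;   /* value * 2^distance not representable */

    *out = value << distance;
    return true;
}
```

With a 32-bit int, checked_shl(0xff, 8, &r) succeeds and stores 0xff00, while shifting 1 into the sign bit or shifting a negative value is rejected.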
For *:
1. The usual arithmetic conversions are performed, so that both (converted) operands have the same type.
2. The multiplication is performed at the resulting type; if that is a signed integer type and the multiplication overflows, the behaviour is undefined.
3. The result is converted to the target type in the same manner as above.
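Signed overflow in the multiplication can be detected before it happens by computing the product in a wider type. The sketch below uses a hypothetical checked_mul helper and assumes long long is at least twice as wide as int, which holds on common platforms but is not guaranteed by the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Store a * b in *out and return true when the product fits in an
   int; otherwise return false and leave *out untouched. */
bool checked_mul(int a, int b, int *out)
{
    /* Assumption: long long is at least twice as wide as int, so
       this product cannot itself overflow. */
    long long wide = (long long)a * (long long)b;

    if (wide < INT_MIN || wide > INT_MAX)
        return false;

    *out = (int)wide;
    return true;
}
```

For the example in the question, checked_mul(255, 10, &r) succeeds and stores 2550.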
That's how the abstract machine is defined; if the implementation can achieve the same results (where the behaviour is defined) in another manner, it can do as it pleases under the as-if rule.