overflow with left shift - c

I have a simple program where I have a few macros for bit manipulation.
One of the macros expands to the following:
unsigned long val = 1 << 0x1f;
Here, I am getting the output:
val = 0xffffffff80000000;
I understand it is some kind of int overflow. I am confused about the part about the final result. Why is the result coming out to be the one I am getting?
(I do understand that here, for some reason, 1 is treated as an int and the left shift turns it into a negative int. What confuses me is the type conversion.)

1 is a signed integer, so in case of overflow it is undefined behavior.
According to the standard (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.
Instead, try unsigned long value 1ul:
unsigned long val = 1ul << 0x1f;
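As a minimal sketch of that fix, here is a helper that builds a single-bit mask entirely in unsigned long. The name bit_ul is made up for illustration; the only assumption is the standard's guarantee that unsigned long is at least 32 bits wide.

```c
/* Hypothetical helper: returns an unsigned long with only bit n set.
   The left operand is 1UL, so the shift is performed in unsigned long
   and no signed overflow can occur. Portable for n < 32. */
unsigned long bit_ul(unsigned int n)
{
    return 1UL << n;
}
```

Calling bit_ul(0x1f) yields 0x80000000 regardless of whether unsigned long is 32 or 64 bits wide.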
(I do understand that here, for some reason, 1 is treated as an int and the left shift turns it into a negative int. What confuses me is the type conversion.)
It could happen in practice indeed, if the sign bit gets set to 1. (But, once again, according to the standard it is UB.)
Nevertheless, let's consider a legal case (I used short and int since int and long have the same size on my system):
// on my system
short int s = SHRT_MIN; // 0x8000
unsigned int i = s; // 0xffff8000
The following part of the standard clarifies it:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type
(If I interpret it right) since we cannot represent a negative value in unsigned type, we add UINT_MAX + 1 to SHRT_MIN (in pure mathematical terms, without considering overflow):
UINT_MAX + 1 + SHRT_MIN // 0xffff8000
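To make that arithmetic concrete, here is a small sketch that performs the conversion twice: once with a plain cast, and once by applying the rule from 6.3.1.3 by hand. The function names are invented for illustration, and the manual version assumes long long is wider than unsigned int so that UINT_MAX + 1 can be represented.

```c
#include <limits.h>

/* What the language does: convert a (possibly negative) short to unsigned int. */
unsigned int convert_builtin(short s)
{
    return (unsigned int)s;
}

/* The same rule spelled out with wider arithmetic.
   Assumption: long long is wider than unsigned int, so UINT_MAX + 1 fits. */
unsigned int convert_by_rule(short s)
{
    long long value = s;
    if (value < 0)
        value += (long long)UINT_MAX + 1;   /* add "one more than the maximum" once */
    return (unsigned int)value;             /* now in range, so the value is unchanged */
}
```

Both functions return the same result for every short. With a 16-bit short and a 32-bit unsigned int, passing SHRT_MIN gives 0xffff8000.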

Related

Inconsistencies in sign extension when shifting signed int vs short

int main(){
signed int a = 0b00000000001111111111111111111111;
signed int b = (a << 10) >> 10;
// b is: 0b11111111111111111111111111111111
signed short c = 0b0000000000111111;
signed short d = (c << 10) >> 10;
// d is: 0b111111
return 0;
}
Assuming int is 32 bits and short is 16 bits,
Why would b get sign extended but d does not get sign extended?
I have tested this with gdb on x64, compiled with gcc.
In order to get short sign extended, I had to use two separate variables like this:
signed short f = c << 10;
signed short g = f >> 10;
// g is: 0b1111111111111111
In the case of signed short, when an integer type smaller than int is used in an expression it is (in most cases) promoted to type int. This is spelled out in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or
unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an
unsigned int. These are called the integer promotions. All
other types are unchanged by the integer promotions.
And this promotion specifically happens in the case of bitwise shift operators as specified in section 6.5.7p3:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
So the short value 0x003f is promoted to the int value 0x0000003f and the left shift is applied. This results in 0x0000fc00, and the right shift gives a result of 0x0000003f.
The signed int case is a bit more interesting. In this case you're left-shifting a bit with the value 1 into the sign bit. This triggers undefined behavior as per 6.5.7p4:
The result of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are filled with zeros. If E1 has an unsigned
type, the value of the result is E1 × 2^E2, reduced
modulo one more than the maximum value representable in the
result type. If E1 has a signed type and nonnegative value,
and E1 × 2^E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is
undefined.
So while the output you get for the signed int case is what you might expect it to be, it's actually undefined behavior and so you can't depend on that result.
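If you need to know in advance whether a signed left shift is defined, the conditions in 6.5.7p4 can be checked before shifting. The following is only a sketch: checked_shl_int is a made-up name, and it assumes int has no padding bits so that its width is sizeof(int) * CHAR_BIT.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: performs value << n only when the standard defines the result.
   Assumption: int has no padding bits, so its width is sizeof(int) * CHAR_BIT. */
bool checked_shl_int(int value, unsigned int n, int *out)
{
    unsigned int width = (unsigned int)(sizeof(int) * CHAR_BIT);
    if (value < 0 || n >= width)
        return false;               /* negative operand or oversized shift count */
    if (value > (INT_MAX >> n))
        return false;               /* value * 2^n would not fit in int */
    *out = value << n;
    return true;
}
```

With a 32-bit int, checked_shl_int(0x003FFFFF, 10, &r) returns false, which matches the a << 10 case from the question.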
short is automatically converted to int by the integer promotions, per C 2018 6.5.7 3:
The integer promotions are performed on each of the operands…
So (c << 10) shifts an int 0b111111 left 10 bits, yielding (in your C implementation) the 32-bit int 0b00000000000000001111110000000000. The sign bit in that is zero; it is a positive number.
When you do signed short f = c << 10;, the result of c << 10 is too big to fit in a signed short. It is 64,512, which is above the largest value your signed short can represent, 32,767. In an assignment, the value is converted to the type of the left operand. Per C 2018 6.3.1.3 3, the conversion is implementation-defined. GCC defines this conversion to wrap modulo 65,536 (two to the power of the number of bits in the type). So converting 64,512 yields 64,512 − 65,536 = −1024. So f is set to −1024.
Then, in f >> 10, you are shifting a negative value. As signed short, f is still promoted to int, but this conversion keeps the value, resulting in an int value of −1024. This is then shifted. This shift is implementation-defined, and GCC defines it to shift with sign extension. So the result of -1024 >> 10 is −1.
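If the goal was to sign-extend the low 6 bits, it can be done without relying on either implementation-defined step. This is a sketch under stated assumptions, not the only way to do it: sign_extend_low_bits is a hypothetical helper, and it requires 1 <= bits <= 15 so that every intermediate value fits in an int of any conforming width.

```c
/* Hypothetical helper: interpret the low `bits` bits of v as a two's-complement
   number. Requires 1 <= bits <= 15 so that all intermediates fit in int. */
int sign_extend_low_bits(unsigned int v, unsigned int bits)
{
    unsigned int sign = 1u << (bits - 1u);   /* sign bit of the field */
    unsigned int mask = (sign << 1) - 1u;    /* low `bits` bits set */
    v &= mask;
    if (v & sign)
        return (int)v - (int)mask - 1;       /* v - 2^bits, computed in int */
    return (int)v;
}
```

For the value in the question, sign_extend_low_bits(0x3f, 6) returns -1, the same result the two-variable version produced with GCC.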
For starters according to the C Standard (6.5.7 Bitwise shift operators)
3 The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand.
Thus this value
signed short c = 0b0000000000111111;
in the expression used in this declaration
signed short d = (c << 10) >> 10;
is promoted to the integer type int. As the value is positive, the promoted value is also positive.
Thus this operation
c << 10
does not touch the sign bit.
On the other hand this code snippet
signed int a = 0b00000000001111111111111111111111;
signed int b = (a << 10) >> 10;
has undefined behavior because according to the same section of the C Standard
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of
the result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1 × 2^E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is
undefined.

Bit-shifting unsigned longs in C

I found a bug in a piece of code I wrote, and have fixed it, but still can't explain what was happening. It boils down to this:
unsigned i = 1<<31; // gives 2147483648 as expected
unsigned long l = 1<<31; // gives 18446744071562067968 as not expected
I'm aware of a question here: Unsigned long and bit shifting wherein the exact same number shows up as an unexpected value, but there he was using a signed char which I believe led to a sign extension. I really can't for the life of me see why I'm getting an incorrect value here.
I'm using CLion on Ubuntu 18.04, and on my system an unsigned is 32 bits and a long is 64 bits.
In this expression:
1<<31
The value 1 has type int. Assuming an int is 32 bits wide, that means you're shifting a bit into the sign bit. Doing so is undefined behavior.
This is documented in section 6.5.7p4 of the C standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the
value of the result is E1 × 2^E2, reduced modulo one more than the
maximum value representable in the result type. If E1 has a
signed type and nonnegative value, and E1 × 2^E2 is representable in
the result type, then that is the resulting value; otherwise, the
behavior is undefined.
However, since you're on Ubuntu, which uses GCC, the behavior is actually implementation defined. The gcc documentation states:
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed >> acts on
negative numbers by sign extension.
As an extension to the C language, GCC does not use the latitude given
in C99 and C11 only to treat certain aspects of signed << as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
So gcc in this case works directly on the representation of the values. This means that 1<<31 has type int and the representation 0x80000000. The value of this representation in decimal is -2147483648.
When this value is assigned to an unsigned int, it is converted via the rules in section 6.3.1.3p2:
Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than the
maximum value that can be represented in the new type until the
value is in the range of the new type.
Since "one more than the maximum value" is 4294967296 for a 32-bit unsigned int, this results in the int value -2147483648 being converted to the unsigned int value 4294967296 - 2147483648 == 2147483648.
When 1<<31 is assigned to an unsigned long int which is 64 bit, "one more than the maximum value" is 18446744073709551616 so the result of the conversion is 18446744073709551616 -2147483648 == 18446744071562067968, which is the value you're getting.
To get the correct value, use the UL suffix to make the value unsigned long:
1UL<<31

Integer promotion and right shifting

Considering that, for example, an unsigned char will always be promoted to an int, can I assume that if I don't cast to unsigned char before shifting, the result will always be implementation defined?
For example:
unsigned char c = 0x0F;
unsigned int a = c >> 2;
Here c will be promoted to an int before shifting to the right. So the shifting will be implementation defined, depending on the compiler.
The right way would be:
unsigned int a = (unsigned char)c >> 2;
My question being, is this statement true:
Doing any shifting on any datatype smaller than int will be implementation defined if not also cast to unsigned type?
The result will always be well defined.
A right shift of a signed type is only implementation defined if the value is negative. This is specified in section 6.5.7p5 of the C standard regarding Bitwise shift operators:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a nonnegative value,
the value of the result is the integral part of the quotient of E1 /
2^E2. If E1 has a signed type and a negative value, the resulting value
is implementation-defined.
The value also cannot be negative because integer promotions preserve the sign of the value being promoted. From section 6.3.1.1p3 regarding conversion of integers:
The integer promotions preserve value including sign. As discussed
earlier, whether a "plain" char is treated as signed is
implementation-defined.
So because the value is guaranteed to be positive, the right shift operation is well defined.
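A short sketch of that reasoning, with a made-up function name: the operand is promoted, but since the promoted value can never be negative, the shift is just a division by a power of two.

```c
/* c is promoted before the shift, but the promoted value is never negative,
   so the result is simply c / 2^n. Requires n to be less than the width of int. */
unsigned int shift_right_uchar(unsigned char c, unsigned int n)
{
    return (unsigned int)(c >> n);
}
```

For the example in the question, shift_right_uchar(0x0F, 2) returns 3 on every conforming implementation, with no cast needed on the operand.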
6.5.7p3 on Bitwise Shift Operations says:
The integer promotions are performed on each of the operands.
If you have a sub-int type on the left side (or right side, but there it doesn't really matter) of a bit-shift operation, it'll be promoted to int.
#include <stdio.h>

/* See http://port70.net/~nsz/c/c11/n1570.html#6.5.7p3 */
/* Maps the type of an expression to a printable name. */
#define TPSTR(X) _Generic((X), unsigned char: "uchar", int: "int")

int main(void)
{
    unsigned char c = 0x0F;
    /* Each shift expression has type int because of the integer promotions. */
    printf("%s\n", TPSTR(c >> 2));
    printf("%s\n", TPSTR((unsigned char)c >> 2));
    printf("%s\n", TPSTR((unsigned char)c >> (unsigned char)2));
    return 0;
}
compiles and prints int int int.
The result of a right shift on a signed integer type (int) is implementation defined only if the left operand has a negative value:
6.5.7p5:
If E1 has a signed type and a negative value, the resulting value is
implementation-defined.

Reverse precedence of shift and bitwise complement in an expression

In this code:
unsigned short int i = 3;
unsigned short int x = 30;
unsigned short int z = (~x) >> i;
On the third row it seems that it first does the shift and THEN the complement (~) even when I use parentheses.
However, the strange result doesn't occur if I replace short with long.
It happens both in Windows and in Unix. Why is that?
It performs the operations exactly in the order you prescribed.
However, the operands are not unsigned short ints. Integral promotion turns x and i into good old regular signed integers before performing the operation. To quote the C standard on this:
6.3.1 Arithmetic operands / paragraph 2
The following may be used in an expression wherever an int or unsigned
int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to
the rank of int and unsigned int.
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int.
And unsigned shorts can fit snugly in a signed integer on the machines you tried.
Furthermore, right shifting a signed integer has implementation defined results for negative values:
6.5.7 Bitwise shift operators / paragraph 5
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1
has an unsigned type or if E1 has a signed type and a nonnegative
value, the value of the result is the integral part of the quotient of
E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
And ~x is some negative integer (which one precisely depends on the value representation of signed integers).
All of the above more than likely accounts for you not getting the expected result when converting it back to an unsigned short integer.

How to explain the bit right shift two different results?

I get two different results, I'm confused, the code is :
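#include <stdio.h>

int main(void)
{
    int i = 0xcffffff3;   /* out-of-range conversion to int: implementation-defined */
    printf("%x\n", 0xcffffff3 >> 2);       /* shifts an unsigned constant */
    printf("%x\n", (unsigned)(i >> 2));    /* shifts a negative int */
    return 0;
}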
the result is :
33fffffc
f3fffffc
It all comes down to 0xcffffff3. That is a hexadecimal integer constant. The type of the constant depends on its magnitude. Let's refer to C11 § 6.4.4.1 ¶ 5:
The type of an integer constant is the first of the corresponding list in which its value can be represented.
Octal or Hexadecimal Constant - int, unsigned int, long int, unsigned long int, long long int, unsigned long long int
So, assuming a 32-bit int on your system, the type of 0xcffffff3 is unsigned int.
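You can ask the compiler which type it picked with a C11 _Generic selection. The TYPE_NAME macro below is a hypothetical helper written for this illustration; it only covers the six types in the list quoted above.

```c
/* Hypothetical helper: maps an expression to the name of its type.
   Only the types from the integer-constant list are covered. */
#define TYPE_NAME(x) _Generic((x),                 \
    int: "int",                                    \
    unsigned int: "unsigned int",                  \
    long: "long",                                  \
    unsigned long: "unsigned long",                \
    long long: "long long",                        \
    unsigned long long: "unsigned long long",      \
    default: "other")
```

On a system with a 32-bit int, TYPE_NAME(0xcffffff3) evaluates to "unsigned int", while a constant that fits in int, such as TYPE_NAME(0x7fff), evaluates to "int".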
Now, when you do int i = 0xcffffff3; the unsigned constant is converted to a signed integer. This conversion yields a negative value.
Finally, when right shifting, it has the semantics defined by C11 §6.5.7 ¶5:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Shifting the unsigned constant yields 0xcffffff3/4 and shifting i yields an implementation defined value (a negative integer in this case).
In 0xcffffff3 >> 2, the left operand is unsigned, while in i >> 2, i is signed.
As a result, the arithmetic shift is prepending 1 bits in the signed case because the number is negative before shifting.
Use (unsigned)i >> 2 or define unsigned i = 0xcffffff3; if you want the same result as with the constants.

Resources