Edit: As pointed out below, I missed the first part of the ANSI C standard:
"If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." The errors (or rather lack of errors / difference in errors) are due to the particular compiler I was using.
I've come across something a bit strange, and I hope that someone can shed some light on my ignorance here. The necessary sample code is as follows:
#include <stdio.h>

int main(void)
{
    unsigned a, b;
    int w, x, y;

    a = 0x00000001;
    b = 0x00000020;
    w = 31;
    x = 32;
    y = 33;

    a << w;     /* No error */
    a << x;     /* No error */
    a << y;     /* No error */
    a << 31;    /* No error */
    a << 32;    /* Error */
    a << 33;    /* Error */
    a << 31U;   /* No error */
    a << 32U;   /* Error */
    a << 33U;   /* Error */
    a << w + 1; /* No error */
    a << b;     /* No error */
    return 0;
}
My question is this: why is it that an error is returned for a raw number, but not for any of the variables? They, I think, should be treated the same. According to the C11 standard
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2 , reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.
Since the left operand has unsigned type, the result should be E1 × 2^E2 reduced modulo one more than the maximum value representable in the result type. That sentence isn't entirely clear to me, but in practice the result seems to be E1 << (E2 % 32), despite that 32 is not the maximum value representable in the result type. Regardless, it does not appear to be undefined under the C11 standard, yet the error
left shift count >= width of type [enabled by default]
shows up when trying to compile. I cannot deduce why it is that some shift counts greater than 31 work (e.g. x = 32, y = 33; a << x and a << y compile without complaint) while the equivalent literals do not.
I am using the GCC compiler on 64-bit Fedora.
Thanks in advance.
-Will
My question is this: why is it that an error is returned for a raw number, but not for any of the variables?
Because absence of compiler warnings is not a guarantee of good program behavior. The compiler would be right to emit a warning for a << x, but it does not have to.
They, I think, should be treated the same
The compiler is doing you a favor when it warns for a << 33. It is not doing you any favor when it doesn't warn for a << y, but the compiler does not have to do you any favor.
If you want to be certain that your program does not contain undefined behavior, you cannot rely on the absence of compiler warnings, but you can use a sound static analyzer. If a sound static analyzer for undefined behavior does not detect any in your program, then you can conclude that it does not produce any (modulo the conditions of use that would be documented for the analyzer in question). For instance:
$ frama-c -val t.c
...
t.c:13:[kernel] warning: invalid RHS operand for shift. assert 0 ≤ x < 32;
in practice it seems that it is E1 << (E2%32)
The reason you are seeing this is that this is the behavior implemented by the shift instructions in x86_64's instruction set.
However, shifting by a negative number or by a number larger than the width of the type is undefined behavior. It works differently on other architectures, and even some compiler for your architecture may compute it at compile-time (as part of the constant propagation phase) with rules that differ from the one you have noticed. Do not rely on the result being E1 << (E2%32) any more than you would rely on memory still containing the correct results after being free()d.
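If the x86-style wraparound is actually the behavior you want, the portable approach is to write the mask yourself, so the program's meaning comes from the C standard rather than from the instruction set. A minimal sketch (the helper name shl_mod32 is mine, and it assumes unsigned int is 32 bits wide):

```c
#include <assert.h>

/* Left shift whose count is explicitly reduced mod 32, so the
   shift count seen by << is always in 0..31 and the behavior is
   fully defined by the standard (assuming 32-bit unsigned int). */
unsigned shl_mod32(unsigned value, unsigned count)
{
    return value << (count & 31u);  /* count is now always 0..31 */
}
```

This makes the wraparound explicit in the source instead of relying on the x86 shift instructions to mask the count for you.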
The right side, since the left is unsigned type, should be 2^E2 reduced modulo one more than the maximum value representable in the result type.... That sentence isn't entirely clear to me, but in practice it seems that it is E1 << (E2%32) - despite that 32 is not the maximum representable in the result type.
That's not the correct interpretation. It's the result that is modulo 2^32, not E2. That sentence is describing how bits shifted off the left side are discarded. As a result, any E2 greater than or equal to the number of bits in an int would be zero, if it were allowed. Since shifts greater than or equal to that number of bits are undefined behavior, the compiler is doing you the favor of producing an error at compile-time, rather than leaving it until runtime for strange and incorrect things to happen.
For n bits of data, shifting is only defined for shift counts x in the range 0 <= x <= n-1, where x is the number of bits to shift.
Here, unsigned occupies 32 bits on your platform, so the only valid shift counts range from 0 to 31. You are trying to shift by counts at or beyond the width of the type, which is why the compiler complains.
modulo one more than the maximum value representable in the result type....
means that the value E1 * 2^E2 is reduced mod (UINT_MAX+1) for unsigned int. This has nothing at all to do with your hypothesis about E2.
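A small illustration of the distinction (the function name is mine; it assumes a 32-bit unsigned int): it is the result that is reduced modulo UINT_MAX + 1, i.e. bits shifted past the top are simply discarded, and the shift count E2 is not reduced at all.

```c
#include <assert.h>

/* The top four 1-bits fall off the end; the result is reduced
   modulo 2^32, not the shift count (assumes 32-bit unsigned int). */
unsigned shifted_out(void)
{
    return 0xFFFFFFFFu << 4;  /* 0xFFFFFFF0 */
}
```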
Regardless, it is not undefined for the C11 standard,
You forgot to read the paragraph before the one you quoted:
If the value of the right operand is negative or is
greater than or equal to the width of the promoted left operand, the behavior is undefined.
All the shifts of 32 or more cause undefined behaviour. The compiler is not required to issue a warning about this, but it's being nice to you in some of the cases.
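One defensive option is to validate the count at run time rather than rely on a compiler warning that may never come. A sketch, assuming a 32-bit unsigned int (the helper name is mine):

```c
#include <assert.h>

/* Reject the counts the standard calls undefined (negative, or
   >= the width of the type) before performing the shift. */
unsigned shl_checked(unsigned value, int count)
{
    assert(count >= 0 && count < 32);  /* assumes 32-bit unsigned */
    return value << count;
}
```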
Related
I want to make sure I understand exactly under what circumstances the << and >> operators in C produce Undefined Behavior. This is my current understanding:
Let...:
x_t be the type of x after integer promotion
N be the bitwidth of x after integer promotion
M be the number of 0s to the left of the most-significant 1 bit in the representation of x after integer promotion
x << y is UB if any of the following:
x < 0 (even if y == 0)
y < 0
y >= N
x_t is a signed type and y >= M
x >> y is UB if any of the following:
y < 0
y >= N
...and is implementation defined if:
x < 0
If I have this understanding correct, it would imply the following:
unsigned short x = 1;
x << 31;
This would be undefined behavior in the case where int is 32 bits and short is 16 (because x would be promoted to int, and the left shift by 31 would put the 1 bit into position 31), but it would be defined behavior in the case where int and short are both 32 bits (because x would be promoted to an unsigned int and 31 < 32).
Yes.
I find your definition of M a little weak. Specifically, it wasn't clear to me if you were including the sign bit.
But yes, the interpretation is correct.
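The usual way to sidestep the promotion trap described above is to convert to an unsigned 32-bit type before shifting, so the operand is never a (signed) int regardless of the relative sizes of short and int. A sketch (the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Convert before shifting: the left operand is unsigned, so
   shifting a 1 into bit 31 is well defined for n in 0..31. */
uint32_t shl_u16(unsigned short x, unsigned n)
{
    return (uint32_t)x << n;
}
```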
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
y < 0 ⇒ UB
y >= N ⇒ UB
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum
value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
This paragraph is poorly worded.
There's no doubt that the behaviour of E1 << E2 isn't defined when E1 × 2^E2 isn't representable.
For x << y, x_t is a signed type and y >= M ⇒ UB[1]
But what about when "E1 has a signed type and nonnegative value" is false? 3 << 2 is clearly not undefined behaviour, so that means neither the "then" nor the "otherwise" clauses apply when this is false, so that means the spec is silent on the behaviour of -3 << 2. It's literally behaviour that's not defined by the spec. So,
For x << y, x < 0 ⇒ UB
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1/2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
For x >> y, x < 0 ⇒ Implementation defined
We need to consider not just two's-complement, but ones' complement and sign-magnitude in assessing the validity of this interpretation.
Do I have correct understanding of Undefined Behavior for shift operators in C?
Yes, though the M part is a little vague.
While the any of the following: lists name the main cases, they do not exhaust the whole space of behaviors. INT_MAX << 1 is also UB. The rule is x * 2**y <= X_T_MAX, where X_T_MAX is the maximum value representable in the type x_t. That is a 2D region of allowed (x, y) pairs.
What you have written here is far more cumbersome to comprehend than the actual standard... the TL;DR of C17 6.5.7 can be summarized as:
UB: Don't left shift variables containing negative values.
UB: Don't left shift data into the sign bit of a signed operand (the type obtained after promotion).
UB: Don't shift by a negative or too large shift count.
Right-shifting variables containing negative values gives implementation-defined behavior in the form of either logical or arithmetic shift. Non-portable.
Done, that's it. No need to make things more complicated.
The golden rule is: never perform bitwise arithmetic on signed types, ever. Abide by it and you'll avoid a number of well-known bugs.
Under C89, the behavior of left-shifting an N-bit integer by 0..N-1 bits was unambiguously defined for all possible signed or unsigned values of the integer, except on platforms where signed and unsigned types had padding bits in different places. On non-two's-complement platforms, however, behaving in the mandated fashion may have been less useful than e.g. processing << as a "multiply by power of two" operator, and more expensive than allowing compilers to select in arbitrary fashion from among platform-specific interpretations (e.g. sometimes processing x<<1 as (x+x) and sometimes processing it by actually shifting x left by one bit).
Because there was no reason to imagine that implementations for two's-complement platforms would deviate from the C89 behavior, even if allowed to do so, and because people who worked with other platforms were better placed than the Committee to weigh the pros and cons of handling the construct in precisely-predictable or somewhat-unpredictable fashion, the C99 Committee opted to waive jurisdiction over the behavior of left-shifting negative numbers. The Committee classified left-shifts of negative numbers as Undefined Behavior because it never imagined that its characterization of actions as "non-portable or erroneous" would be twisted to imply that the Committee judged such actions "non-portable, and therefore erroneous".
I just checked the C++ standard. It seems the following code should NOT be undefined behavior:
unsigned int val = 0x0FFFFFFF;
unsigned int res = val >> 34; // res should be 0 by C++ standard,
// but GCC gives warning and res is 67108863
And from the standard:
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1
has an unsigned type or if E1 has a signed type and a non-negative
value, the value of the result is the integral part of the quotient of
E1/2^E2. If E1 has a signed type and a negative value, the resulting
value is implementation-defined.
According to the standard, since 34 is NOT a negative number, the variable res will be 0.
GCC gives the following warning for the code snippet, and res is 67108863:
warning: right shift count >= width of type
I also checked the assembly code emitted by GCC. It just emits a SHRL instruction, and according to the Intel instruction documentation for SHR, res is not zero.
So does that mean GCC doesn't implement the standard behavior on Intel platform?
The draft C++ standard in section 5.8 Shift operators in paragraph 1 says(emphasis mine):
The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
So if unsigned int is 32 bits or less then this is undefined which is exactly the warning that gcc is giving you.
To explain exactly what happens:
The compiler will load 34 into a register, and then your constant in another register, and perform a right shift operation with those two registers. The x86 processor performs a "shiftcount % bits" on the shift value, meaning that you get a right-shift by 2.
And since 0x0FFFFFFF (268435455 decimal) divided by 4 = 67108863, that's the result you see.
If you had a different processor, for example a PowerPC (I think), it may well give you zero.
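A portable way around this architecture dependence is to guard the count explicitly rather than rely on how a particular CPU masks it (x86 masks the count to 5 bits; other architectures differ). A sketch (the helper name is mine; it assumes a 32-bit unsigned int):

```c
#include <assert.h>

/* For counts >= the width of the type, every bit would be shifted
   out, so return the mathematically expected 0 explicitly instead
   of invoking undefined behavior. */
unsigned shr_checked(unsigned value, unsigned count)
{
    return count < 32u ? value >> count : 0u;
}
```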
According to the answer to this questions:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Which seems to imply that 1 << 31 is undefined.
However GCC doesn't issue a warning if I use 1 << 31.
It does issue one for 1 << 32.
So which is it? Am I misunderstanding the standard?
Does GCC have its own interpretation?
No: 1 << 31 has undefined behavior if the type int has only 31 value bits.
1U << 31 is OK and evaluates to 0x80000000 if type unsigned int has 32 value bits.
On a system where bytes have 8 bits, sizeof(int) == 4 means int has at most 31 value bits, so shifting 1 by 31 places is undefined. Conversely, on a system where CHAR_BIT > 8, it may be OK to write 1 << 31.
gcc might issue a warning if you raise the warning level. try gcc -Wall -Wextra -W -Werror. clang does issue a warning with the same options.
To address Michaël Roy's comments, 1 << 31 does not evaluate to INT_MIN reliably. It might give this value on your system, but the Standard does not guarantee it, in fact the Standard describes this as undefined behavior, so not only can you not rely on it, you should avoid it to avoid spurious bugs. The optimizers routinely take advantage of potential undefined behavior to remove code and break the programmers' assumptions.
For example, the following code might compile to a simple return 1;:
int check_shift(int i) {
if ((1 << i) > 0)
return 1;
else
return 0;
}
None of the compilers supported by Godbolt's compiler explorer do, but doing so would not break conformity.
The reason GCC doesn't warn about this is because 1 << 31 was valid (but implementation-defined) in C90, and is valid (but implementation-defined) even in modern C++. C90 defines << as a bit shift, and follows that by saying that for unsigned types, its result was that of a multiplication, but did no such thing for signed types, which implicitly made it valid and left it covered by the general wording that bitwise operators have implementation-defined aspects for signed types. C++ nowadays defines << as multiplying in the corresponding unsigned type, with the result converted back to the signed type, which is implementation-defined as well.
C99 and C11 did make this invalid (saying the behaviour is undefined), but compilers are permitted to accept it as an extension. For compatibility with existing code, and to share code between the C and C++ frontends, GCC continues to do so, with one exception: you can use -fsanitize=undefined to get detected undefined behaviour to abort your program at run-time, and this one does handle 1 << 31, but only when compiling as C99 or C11.
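For comparison, the unsigned form mentioned earlier in this answer is fully defined. A minimal sketch (the function name is mine; it assumes unsigned int has 32 value bits):

```c
#include <assert.h>

/* 1 << 31 is UB when int has only 31 value bits, but the unsigned
   version is well defined: the 1 lands in the top value bit. */
unsigned top_bit(void)
{
    return 1u << 31;
}
```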
It does invoke undefined behaviour, as explained by the other answers/comments. However, as to why GCC doesn't emit a diagnostic:
There are actually two things that can lead to undefined behaviour for a left-shift (both from [6.5.7]):
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Evidently GCC detects the first one (because it's trivial to do so), but not the latter.
I have the following function in C:
#include <stdio.h>

int lrot32(int a, int n)
{
    printf("%X shr %d = %X\n", a, 32 - n, (a >> (32 - n)));
    return ((a << n) | (a >> (32 - n)));
}
When I pass as arguments lrot32(0x8F5AEB9C, 0xB) I get the following:
8F5AEB9C shr 21 = FFFFFC7A
However, the result should be 47A. What am I doing wrong?
Thank you for your time
int is a signed integer type. C11 6.5.7p4-5 says the following:
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Thus in the case of <<, if the shifted value is negative, or the positive value after shift is not representable in the result type (here: int), the behaviour is undefined; in the case of >>, if the value is negative the result is implementation defined.
Thus in either case, you'd get results that at least depend on the implementation, and in the case of left-shift, worse, possibly on the optimization level and such. A strictly conforming program cannot rely on any particular behaviour.
If however you want to target a particular compiler, then check its manuals on what the behaviour - if any is specified - would be. For example GCC says:
The results of some bitwise operations on signed integers (C90 6.3,
C99 and C11 6.5).
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed ‘>>’ acts on
negative numbers by sign extension. [*]
As an extension to the C language, GCC does not use the latitude given
in C99 and C11 only to treat certain aspects of signed ‘<<’ as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
[*] Sign extension here means that the sign bit, which is 1 for negative integers, is replicated by the shift amount when a right shift is executed; this is why you see those Fs in the result.
Furthermore GCC always requires 2's complement representation, so if you would always use GCC, no matter which architecture you're targeting, this is the behaviour you'd see. Also, in the future someone might use another compiler for your code, thus causing other behaviour there.
Perhaps you'd want to use unsigned integers - unsigned int or rather, if a certain width is expected, then for example uint32_t, as the shifts are always well-defined for it, and would seem to match your expectations.
Another thing to note is that not all shift amounts are allowed. C11 6.5.7 p3:
[...]If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
Thus if you ever shift an unsigned integer having width of 32 bits by 32 - left or right, the behaviour is undefined. This should be kept in mind. Even if the compiler wouldn't do anything wacky, some processor architectures do act as if shift by 32 would then shift all bits away - others behave as if the shift amount was 0.
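Putting the advice above together, the usual fully-defined rotate idiom operates on uint32_t and masks both shift counts so they stay in 0..31, including the n == 0 case that makes the question's (32 - n) shift undefined. A sketch (the function name lrot32u is mine, not from the question):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left on uint32_t, well defined for any n: masking keeps
   both shift counts in 0..31, so neither shift is ever by 32. */
uint32_t lrot32u(uint32_t a, unsigned n)
{
    n &= 31u;
    return (a << n) | (a >> ((32u - n) & 31u));
}
```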
I came across this piece of C code:
typedef int gint;
// ...
gint a, b;
// ...
a = (b << 16) >> 16;
For ease of notation let's assume that b = 0x11223344 at this point. As far as I can see it does the following:
b << 16 will give 0x33440000
>> 16 will give 0x00003344
So, the 16 highest bits are discarded.
Why would anyone write (b << 16) >> 16 if b & 0x0000ffff would work as well? Isn't the latter form more understandable? Is there any reason to use bitshifts in a case like this? Is there any edge-case where the two could not be the same?
Assuming that the size of int is 32 bits, then there is no need to use shifts. Indeed, bitwise & with a mask would be more readable, more portable and safer.
It should be noted that left-shifting on negative signed integers invokes undefined behavior, and that left-shifting things into the sign bits of a signed integer could also invoke undefined behavior. C11 6.5.7 (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of
the result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1 × 2^E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is
undefined.
(The only possible rationale I can think of, is some pre-mature optimization for a 16-bit CPU that comes with a poor compiler. Then the code would be more efficient if you broke up the arithmetic in 16 bit chunks. But on such a system, int would most likely be 16 bits, so the code wouldn't make any sense then.)
As a side note, it doesn't make any sense to use the signed int type either. The most correct and safe type for this code would have been uint32_t.
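The mask form suggested above, on the unsigned type this answer recommends, involves no shifts, no sign bits, and nothing implementation-defined. A sketch (the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Extract the low 16 bits with a plain mask on an unsigned type. */
uint32_t low16(uint32_t b)
{
    return b & 0xFFFFu;
}
```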
So, the 16 highest bits are discarded.
They are not. Though it is formally implementation-defined how the right-shift operation behaves on signed types, most compilers implement it so as to replicate the sign bit.
Thus, the 16 highest bits of the result of this expression are filled with the replicated value of bit 15.
For an unsigned integral type (eg, the uint32_t we first thought was being used),
(b << 16) >> 16
is identical to b & ((1 << 16) - 1) (note the parentheses: 1 << 16 - 1 would parse as 1 << 15 due to operator precedence).
For a signed integral type though,
(b << 16)
could become negative (ie, the low int16_t would have been considered negative when taken on its own), in which case
(b << 16) >> 16
will (probably) still be negative due to sign extension. In that case, it isn't the same as the & mask, because the top bits will be set instead of zero.
Either this behaviour is deliberate (in which case the commented-out typedef is misleading), or it's a bug. I can't tell without reading the code.
Oh, and the shift behaviour in both directions is how I'd expect gcc to behave on x86, but I can't comment on how portable it is outside that. The left-shift may be UB as Lundin points out, and sign extension on the right-shift is implementation defined.
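If the deliberate purpose of (b << 16) >> 16 is to sign-extend the low 16 bits, that can be written portably without any signed shifts. A sketch (the helper name sext16 is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of b without signed shifts:
   subtract 2^16 when bit 15 (the notional sign bit) is set. */
int32_t sext16(uint32_t b)
{
    uint32_t low = b & 0xFFFFu;
    return low & 0x8000u ? (int32_t)low - 0x10000 : (int32_t)low;
}
```

This gives the sign-extension result explicitly, instead of relying on the implementation-defined behaviour of >> on negative values.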