I came across this piece of C code:
typedef int gint;
// ...
gint a, b;
// ...
a = (b << 16) >> 16;
For ease of notation let's assume that b = 0x11223344 at this point. As far as I can see it does the following:
b << 16 will give 0x33440000
>> 16 will give 0x00003344
So, the 16 highest bits are discarded.
Why would anyone write (b << 16) >> 16 if b & 0x0000ffff would work as well? Isn't the latter form more understandable? Is there any reason to use bitshifts in a case like this? Is there any edge-case where the two could not be the same?
Assuming that the size of int is 32 bits, there is no need to use shifts. Indeed, a bitwise & with a mask would be more readable, more portable and safer.
It should be noted that left-shifting a negative signed integer invokes undefined behavior, and that left-shifting a bit into the sign bit of a signed integer can also invoke undefined behavior. C11 6.5.7 (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
(The only possible rationale I can think of is some premature optimization for a 16-bit CPU with a poor compiler. The code would then be more efficient if the arithmetic were broken up into 16-bit chunks. But on such a system, int would most likely be 16 bits, so the code wouldn't make sense there either.)
As a side note, it doesn't make any sense to use the signed int type either. The most correct and safe type for this code would have been uint32_t.
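For illustration, here is a minimal sketch of the mask-based version recommended above (the value is the one from the question; the variable name low is invented for the example):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t b = 0x11223344u;

    /* Keep only the low 16 bits; well-defined for any value of b. */
    uint32_t low = b & 0xFFFFu;

    printf("0x%08X -> 0x%08X\n", (unsigned)b, (unsigned)low);  /* 0x11223344 -> 0x00003344 */
    return 0;
}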
So, the 16 highest bits are discarded.
They are not. Though it is formally implementation-defined how a right shift behaves on signed types, most compilers replicate the sign bit.
Thus, as the result of this expression, the 16 highest bits are filled with copies of bit 15 (the sign bit of the low 16-bit half).
For an unsigned integral type (e.g. the uint32_t we first thought was being used),
(b << 16) >> 16
is identical to b & ((1 << 16) - 1), i.e. b & 0xFFFF.
For a signed integral type though,
(b << 16)
could become negative (ie, the low int16_t would have been considered negative when taken on its own), in which case
(b << 16) >> 16
will (probably) still be negative due to sign extension. In that case, it isn't the same as the & mask, because the top bits will be set instead of zero.
Either this behaviour is deliberate (in which case the commented-out typedef is misleading), or it's a bug. I can't tell without reading the code.
Oh, and the shift behaviour in both directions is how I'd expect gcc to behave on x86, but I can't comment on how portable it is outside that. The left-shift may be UB as Lundin points out, and sign extension on the right-shift is implementation defined.
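To make the difference concrete, here is a small sketch (the value 0x1122CDEF and the variable names are invented for the example). It contrasts the zero-extending mask with a fully portable way to sign-extend the low 16 bits, which is presumably what the shift pair was trying to achieve, without any of the undefined or implementation-defined shifts discussed above:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t b = 0x1122CDEF;               /* low 16 bits have bit 15 set */

    /* Zero-extend: the top 16 bits become 0. */
    int32_t masked = b & 0xFFFF;          /* 0x0000CDEF */

    /* Portable sign extension of the low 16 bits: (x ^ 0x8000) - 0x8000. */
    int32_t sext = ((b & 0xFFFF) ^ 0x8000) - 0x8000;   /* -12817, bit pattern 0xFFFFCDEF */

    printf("masked = 0x%08X, sign-extended = %d (0x%08X)\n",
           (unsigned)masked, sext, (unsigned)sext);
    return 0;
}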
I am going through 'The C Programming Language' by K&R. Right now I am on the bitwise section. I am having a hard time understanding the following code.
int mask = ~0 >> n;
I was planning to use this to mask off the n leftmost bits of another binary number, like this:
0000 1111
1010 0101 // random number
My problem is that when I print the variable mask, it is still -1. Assuming n is 4, I thought shifting ~0, which is -1, would give 15 (0000 1111).
thanks for the answers
Performing a right shift on a negative value yields an implementation-defined value. Most hosted implementations will shift in 1 bits on the left, as you've seen in your case, but that doesn't have to be the case.
Unsigned types as well as positive values of signed types always shift in 0 bits on the left when shifting right. So you can get the desired behavior by using unsigned values:
unsigned int mask = ~0u >> n;
This behavior is documented in section 6.5.7 of the C standard:
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
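As a quick self-contained sketch (it assumes a 32-bit unsigned int; the variable names are invented for the example):

#include <stdio.h>

int main(void)
{
    int n = 4;

    unsigned int mask = ~0u >> n;   /* zeros shifted in from the left: 0x0FFFFFFF on 32-bit */
    int signed_mask = ~0 >> n;      /* implementation-defined; commonly still -1 */

    printf("mask = 0x%X, signed_mask = %d\n", mask, signed_mask);
    return 0;
}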
Right-shifting negative signed integers is implementation-defined behavior, which usually (but not always) fills the vacated bits on the left with ones instead of zeros. That's why no matter how many bits you've shifted, the result is always -1: the left is always filled with ones.
When you shift unsigned integers, the left will always be filled by zeros. So you can do this:
unsigned int mask = ~0U >> n;
                      ^
You should also note that int is typically 2 or 4 bytes, meaning that if you want to get 15, you need to right-shift 12 or 28 bits instead of only 4. You can use an unsigned char instead:
unsigned char mask = ~0U;   /* truncated to 0xFF */
mask >>= 4;                 /* 0xFF >> 4 == 0x0F == 15 */
In C, and many other languages, >> is (usually) an arithmetic right shift when performed on signed variables (like int). This means that the new bit shifted in from the left is a copy of the previous most-significant bit (MSB). This has the effect of preserving the sign of a two's complement negative number (and, in this case, the value: -1 stays -1).
This is in contrast to a logical right shift, where the MSB is always replaced with a zero bit. This is applied when your variable is unsigned (e.g. unsigned int).
From Wikipedia:
The >> operator in C and C++ is not necessarily an arithmetic shift. Usually it is only an arithmetic shift if used with a signed integer type on its left-hand side. If it is used on an unsigned integer type instead, it will be a logical shift.
In your case, if you plan to be working at a bit level (i.e. using masks, etc.) I would strongly recommend two things:
Use unsigned values.
Use types with specific sizes from <stdint.h> like uint32_t
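A small sketch of the difference; the result for the signed case is implementation-defined, but the values shown in the comments are what you will typically see with gcc or clang on common platforms:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t  s = -16;                /* bit pattern 0xFFFFFFF0 */
    uint32_t u = 0xFFFFFFF0u;

    /* Arithmetic shift (typical for signed): the sign bit is replicated. */
    printf("signed:   -16 >> 2        = %d\n", (int)(s >> 2));           /* usually -4 */

    /* Logical shift (always, for unsigned): zeros come in from the left. */
    printf("unsigned: 0xFFFFFFF0 >> 2 = 0x%X\n", (unsigned)(u >> 2));    /* 0x3FFFFFFC */

    return 0;
}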
I have the following function in C:
int lrot32(int a, int n)
{
    printf("%X SHR %d = %X\n", a, 32 - n, (a >> (32 - n)));
    return ((a << n) | (a >> (32 - n)));
}
When I call lrot32(0x8F5AEB9C, 0xB), I get the following:
8F5AEB9C SHR 21 = FFFFFC7A
However, the result should be 47A. What am I doing wrong?
Thank you for your time
int is a signed integer type. C11 6.5.7p4-5 says the following:
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Thus in the case of <<, if the shifted value is negative, or the positive value after shift is not representable in the result type (here: int), the behaviour is undefined; in the case of >>, if the value is negative the result is implementation defined.
Thus in either case, you'd get results that at least depend on the implementation, and in the case of left-shift, worse, possibly on the optimization level and such. A strictly conforming program cannot rely on any particular behaviour.
If however you want to target a particular compiler, then check its manuals on what the behaviour - if any is specified - would be. For example GCC says:
The results of some bitwise operations on signed integers (C90 6.3,
C99 and C11 6.5).
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed ‘>>’ acts on
negative numbers by sign extension. [*]
As an extension to the C language, GCC does not use the latitude given
in C99 and C11 only to treat certain aspects of signed ‘<<’ as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
[*] Sign extension here means that the sign bit, which is 1 for negative integers, is repeated for the shift amount when the right shift is executed; this is why you see those Fs in the result.
Furthermore, GCC always requires 2's complement representation, so if you always use GCC, this is the behaviour you'd see no matter which architecture you're targeting. However, someone might later compile your code with another compiler, which could then behave differently.
Perhaps you'd want to use unsigned integers instead: unsigned int, or rather, if a certain width is expected, for example uint32_t. The shifts are well-defined for them and would seem to match your expectations.
Another thing to note is that not all shift amounts are allowed. C11 6.5.7 p3:
[...] If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
Thus if you ever shift an unsigned integer having width of 32 bits by 32 - left or right, the behaviour is undefined. This should be kept in mind. Even if the compiler wouldn't do anything wacky, some processor architectures do act as if shift by 32 would then shift all bits away - others behave as if the shift amount was 0.
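Putting those points together, one way to rewrite the questioner's lrot32 is the sketch below: it uses uint32_t as suggested, and reduces the shift counts modulo 32 explicitly so that the shift-by-32 case can never arise (this is just one common formulation, not the only one):

#include <stdint.h>
#include <stdio.h>

/* Rotate a 32-bit value left by n bits; defined for any n. */
static uint32_t lrot32(uint32_t a, unsigned n)
{
    n &= 31u;                                    /* reduce the count explicitly */
    return (a << n) | (a >> ((32u - n) & 31u));  /* the & 31u avoids a shift by 32 when n == 0 */
}

int main(void)
{
    printf("%X\n", (unsigned)lrot32(0x8F5AEB9Cu, 0xB));   /* prints D75CE47A */
    return 0;
}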
The C99 spec states:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
I'm curious to know, which implementations/compilers will not treat a signed E1 >> 31 as a bunch of 11111....?
Most embedded compilers for microcontrollers tend to favour logical shift (shift in zeroes) instead of arithmetic shift (shift in the sign bit).
This is probably because signed numbers are somewhat rare in embedded systems, since such programming is much closer to the hardware and further away from users than, for example, desktop programming with screens.
Signed numbers are nothing but user presentation, after all. If you have no need to print numbers to a user, then you don't need signed numbers very often at all.
And of course, it doesn't really make much sense to use shifts on signed numbers to begin with. I have never in my programming career encountered a scenario where I needed to do that. Meaning that in most cases, such shifts are just accidental bugs.
You can simulate a signed, 2's complement arithmetic right shift using unsigned types without the use of if statements. For example:
#include <limits.h>

/* Arithmetic shift right emulated with unsigned operations: the second
   term is a mask of 1s covering the top shift+1 bits when the sign bit
   of x is set, and 0 otherwise. */
unsigned int asr(unsigned int x, unsigned int shift)
{
    return (x >> shift) | -((x & ~(UINT_MAX >> 1)) >> shift);
}
You might need to use a different unsigned type and its associated max value in your code.
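A quick usage sketch, paired with the asr function above and assuming a 32-bit unsigned int:

#include <stdio.h>

int main(void)
{
    /* 0xFFFFFFF8 is the bit pattern of -8 on a 32-bit two's complement
       machine; an arithmetic shift right by 1 should give -4, i.e. 0xFFFFFFFC. */
    printf("0x%X\n", asr(0xFFFFFFF8u, 1));   /* prints 0xFFFFFFFC */
    printf("0x%X\n", asr(0x00000010u, 2));   /* positive input: prints 0x4 */
    return 0;
}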
Edit: As pointed out below I missed the first part of the ANSI C standard:
"If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." The errors (or rather lack of errors / difference in errors) are due to the particular compiler I was using.
I've come across something a bit strange, and I hope that someone can shed some light on my ignorance here. The necessary sample code is as follows:
#include <stdio.h>

int main(void)
{
    unsigned a, b;
    int w, x, y;

    a = 0x00000001;
    b = 0x00000020;
    w = 31;
    x = 32;
    y = 33;

    a << w;      /*No error*/
    a << x;      /*No error*/
    a << y;      /*No error*/
    a << 31;     /*No error*/
    a << 32;     /*Error*/
    a << 33;     /*Error*/
    a << 31U;    /*No error*/
    a << 32U;    /*Error*/
    a << 33U;    /*Error*/
    a << w + 1;  /*No error*/
    a << b;      /*No error*/

    return 0;
}
My question is this: why is it that an error is returned for a raw number, but not for any of the variables? They, I think, should be treated the same. According to the C11 standard
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The right side, since the left is an unsigned type, should be 2^E2 reduced modulo one more than the maximum value representable in the result type... That sentence isn't entirely clear to me, but in practice it seems that it is E1 << (E2%32), despite the fact that 32 is not the maximum representable in the result type. Regardless, it is not undefined according to the C11 standard, yet the error
left shift count >= width of type [enabled by default]
shows up when trying to compile. I cannot deduce why some shifts by variables holding values greater than 31 compile without complaint (e.g. y = 33; a << y;).
I am using the GCC compiler on 64-Bit Fedora.
Thanks in advance.
-Will
My question is this: why is it that an error is returned for a raw number, but not for any of the variables?
Because absence of compiler warnings is not a guarantee of good program behavior. The compiler would be right to emit a warning for a << x, but it does not have to.
They, I think, should be treated the same
The compiler is doing you a favor when it warns for a << 33. It is not doing you any favor when it doesn't warn for a << y, but the compiler does not have to do you any favor.
If you want to be certain that your program does not contain undefined behavior, you cannot rely on the absence of compiler warnings, but you can use a sound static analyzer. If a sound static analyzer for undefined behavior does not detect any in your program, then you can conclude that it does not produce any (modulo the conditions of use that would be documented for the analyzer in question). For instance:
$ frama-c -val t.c
...
t.c:13:[kernel] warning: invalid RHS operand for shift. assert 0 ≤ x < 32;
in practice it seems that it is E1 << (E2%32)
The reason you are seeing this is that this is the behavior implemented by the shift instructions in x86_64's instruction set.
However, shifting by a negative number or by a number larger than the width of the type is undefined behavior. It works differently on other architectures, and even some compiler for your architecture may compute it at compile-time (as part of the constant propagation phase) with rules that differ from the one you have noticed. Do not rely on the result being E1 << (E2%32) any more than you would rely on memory still containing the correct results after being free()d.
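If you do want the modulo-32 behaviour, a safer route is to write the reduction out explicitly, so the program no longer depends on what the hardware happens to do. A minimal sketch, assuming a 32-bit unsigned type (the function name shl_mod32 is invented for the example):

#include <stdint.h>

/* Left shift with the count reduced modulo 32 explicitly; every shift the
   compiler sees has a count in the range 0..31, so the behaviour is defined. */
static uint32_t shl_mod32(uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

With this, shl_mod32(1u, 33) yields 2 by construction rather than by accident.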
The right side, since the left is unsigned type, should be 2^E2 reduced modulo one more than the maximum value representable in the result type.... That sentence isn't entirely clear to me, but in practice it seems that it is E1 << (E2%32) - despite that 32 is not the maximum representable in the result type.
That's not the correct interpretation. It's the result that is modulo 2^32, not E2. That sentence is describing how bits shifted off the left side are discarded. As a result, any E2 greater than or equal to the number of bits in an int would be zero, if it were allowed. Since shifts greater than or equal to that number of bits are undefined behavior, the compiler is doing you the favor of producing an error at compile-time, rather than leaving it until runtime for strange and incorrect things to happen.
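For instance, the "reduced modulo" wording describes what happens to the result of an unsigned left shift, not to the shift count. A small sketch, assuming a 32-bit unsigned int:

#include <stdio.h>

int main(void)
{
    unsigned int x = 0xFFFFFFF0u;

    /* The mathematical value 0xFFFFFFF0 * 2^4 does not fit in 32 bits;
       the result is reduced modulo 2^32, so the high bits simply fall off. */
    printf("0x%X\n", x << 4);    /* prints 0xFFFFFF00 */
    return 0;
}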
For an n-bit type, shifting is only defined for shift counts x in the range 0 <= x <= n-1.
Here unsigned int is 32 bits wide, so the only valid shift counts are 0 through 31. You are trying to shift by at least the width of the type, which is why the compiler complains.
modulo one more than the maximum value representable in the result type....
means that the value E1 * 2^E2 is reduced mod (UINT_MAX+1) for unsigned int. This has nothing at all to do with your hypothesis about E2.
Regardless, it is not undefined for the C11 standard,
You forgot to read the paragraph before the one you quoted:
If the value of the right operand is negative or is
greater than or equal to the width of the promoted left operand, the behavior is undefined.
All the shifts of 32 or more cause undefined behaviour. The compiler is not required to issue a warning about this, but it's being nice to you in some of the cases.
The book "C The Complete Reference" by Herbert Schildt says that "(In the case of a signed, negative integer, a right shift will cause a 1 to be brought in so that the sign bit is preserved.)"
What's the point of preserving the sign bit?
Moreover, I think that the book is referring to the case where negative numbers are represented using sign-and-magnitude rather than two's complement. But even in that case the reasoning doesn't seem to make sense.
The Schildt book is widely acknowledged to be exceptionally poor.
In fact, C doesn't guarantee that a 1 will be shifted in when you right-shift a negative signed number; the result of right-shifting a negative value is implementation-defined.
However, if right-shift of a negative number is defined to shift in 1s to the highest bit positions, then on a 2s complement representation it will behave as an arithmetic shift - the result of right-shifting by N will be the same as dividing by 2^N, rounding toward negative infinity.
The statement is sweeping and inaccurate, like many a statement by Mr Schildt. Many people recommend throwing his books away. (Amongst other places, see The Annotated Annotated C Standard, and ACCU Reviews — do an author search on Schildt; see also the Definitive List of C Books on Stack Overflow).
It is implementation-defined whether right-shifting a negative (necessarily signed) integer shifts zeros or ones into the high-order bits. The underlying CPUs (for instance, ARM) often have two different instructions: ASR, or arithmetic shift right, and LSR, or logical shift right, of which ASR preserves the sign bit and LSR does not. The compiler writer is allowed to choose either, and may do so for reasons of compatibility, speed or whimsy.
ISO/IEC 9899:2011 §6.5.7 Bitwise shift operators
¶5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
The point is that the C >> (right shift) operator preserves[1] the sign for a (signed) int.
For example:
#include <stdio.h>

int main() {
    int a;
    unsigned int b;

    a = -8;
    printf("%d (0x%X) >> 1 = %d (0x%X)\n", a, a, a >> 1, a >> 1);

    /* note: %d reinterprets the unsigned bit pattern as signed for display */
    b = 0xFFEEDDCC;
    printf("%d (0x%X) >> 1 = %d (0x%X)\n", b, b, b >> 1, b >> 1);

    return 0;
}
Output:
-8 (0xFFFFFFF8) >> 1 = -4 (0xFFFFFFFC) [sign preserved, LSB=1]
-1122868 (0xFFEEDDCC) >> 1 = 2146922214 (0x7FF76EE6) [MSB = 0]
If it didn't preserve the sign, the result would make absolutely no sense. You would take a small negative number, and by shifting right one (dividing by two), you would end up with a large positive number instead.
[1] This is implementation-defined, but from my experience, most compilers choose an arithmetic (sign-preserving) shift instruction.
In the case of a signed, negative integer, a right shift will cause a 1 to be brought in so that the sign bit is preserved
Not necessarily. See the C standard C11 6.5.7:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
This means that the compiler is free to shift in whatever it likes (0 or 1), as long as it documents it.