I am trying to figure out how exactly arithmetic bit-shift operators work in C, and how it will affect signed 32-bit integers.
To make things simple, let's say we work within one byte (8 bits):
x = 1101.0101
MSB[ 1101.0101 ]LSB
Reading other posts on Stack Overflow and some websites, I found that:
<< will shift toward MSB (to the left, in my case), and fill "empty" LSB bits with 0s.
And >> will shift toward LSB (to the right, in my case) and fill "empty" bits with MS bit
So, x = x << 7 will result in moving LSB to MSB, and setting everything to 0s.
1000.0000
Now, let's say I would >> 7, last result. This would result in [0000.0010]? Am I right?
Am I right about my assumptions about shift operators?
I just tested on my machine, **
int x = 1; //000000000......01
x = x << 31; //100000000......00
x = x >> 31; //111111111......11 (Everything is filled with 1s !!!!!)
Why?
Right shift of a negative signed number has implementation-defined behaviour.
If your 8 bits are meant to represent a signed 8 bit value (as you're talking about a "signed 32 bit integer" before switching to 8 bit examples) then you have a negative number. Shifting it right may fill "empty" bits with the original MSB (i.e. perform sign extension) or it may shift in zeroes, depending on platform and/or compiler.
(Implementation-defined behaviour means that the compiler will do something sensible, but in a platform-dependent manner; the compiler documentation is supposed to tell you what.)
A left shift, if the number either starts out negative, or the shift operation would shift a 1 either to or beyond the sign bit, has undefined behaviour (as do most operations on signed values which cause an overflow).
(Undefined behaviour means that anything at all could happen.)
The same operations on unsigned values are well-defined in both cases: the "empty" bits will be filled with 0.
Bitwise shift operations are not defined for negative values
for '<<'
6.5.7/4 [...] If E1 has a signed type and nonnegative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
and for '>>'
6.5.7/5 [...] If E1 has a signed type and a negative value, the resulting value is implementation- defined.
It's a waste of time to study the behaviour of these operations on signed numbers on a specific implementation, because you have no guarantee it will work the same way on any other implementation (an implementation is, for example, you compiler on your computer with your specific commad-line parameters).
It might not even work for an older or a newer version of the very same compiler. The compiler might even define those bits as random or undefined. This would mean that the very same code sequence could produce totally different results when used across your sources or even depend on things like assembly optimisation or other register usage. If encapsulated in a function it might not even produce the same result in those bits on two consecutive calls with the same arguments.
Considering only non-negative values, the effect of left shifting by 1 (expression << 1) is the same as multpliying the expression by 2 (provided expression * 2 does not overflow) and the effect of right shifting by 1 (expression >> 1) is the same as dividing by 2.
As of c++20 the bitwise shift operators for signed integers are well defined.
The left shift a<<b is equivalent to a*2^b modulus 2^N where N is the number of bits in the resulting type. In particular 1<<31 is in fact the smallest int value.
The right shift a>>b is equivalent to a/2^b, rounded down (ie. towards negative infinity). So e.g. -1>>10 == -1.
For some more details see https://en.cppreference.com/w/cpp/language/operator_arithmetic .
(for the older standards see the answer by Matthew Slattery)
As others said shift of negative value is implementation-defined.
Most of implementations treat signed right shift as floor(x/2N) by filling shifted in bits using sign bit. It is very convenient in practice, as this operation is so common. On the other hand if you will shift right unsigned integer, shifted in bits will be zeroed.
Looking from machine side, most implementations have two types of shift-right instructions:
An 'arithmetic' shift right (often having mnemonic ASR or SRA) which works as me explained.
A 'logic' shift right (oftem having mnemonic LSR or SRL or SR) which works as you expect.
Most of compilers utilize first for signed types and second for unsigned ones. Just for convenience.
In the 32 bit compiler
x = x >> 31;
here x is the signed integer so 32nd bit is sign bit.
final x value is 100000...000. and 32nd bit indicate -ive value.
here x value implement to 1's compliment.
then final x is -32768
On my i7:
uint64_t:
0xffffffffffffffff >> 0 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 1 is 0b0111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 2 is 0b0011111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 3 is 0b0001111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 4 is 0b0000111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 62 is 0b0000000000000000000000000000000000000000000000000000000000000011
0xffffffffffffffff >> 63 is 0b0000000000000000000000000000000000000000000000000000000000000001
0xffffffffffffffff >> 64 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 65 is 0b0111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 66 is 0b0011111111111111111111111111111111111111111111111111111111111111
int64_t -1
0xffffffffffffffff >> 0 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 1 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 2 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 3 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 4 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 62 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 63 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 64 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 65 is 0b1111111111111111111111111111111111111111111111111111111111111111
0xffffffffffffffff >> 66 is 0b1111111111111111111111111111111111111111111111111111111111111111
int64_t 2^63-1
0x7fffffffffffffff >> 0 is 0b0111111111111111111111111111111111111111111111111111111111111111
0x7fffffffffffffff >> 1 is 0b0011111111111111111111111111111111111111111111111111111111111111
0x7fffffffffffffff >> 2 is 0b0001111111111111111111111111111111111111111111111111111111111111
0x7fffffffffffffff >> 3 is 0b0000111111111111111111111111111111111111111111111111111111111111
0x7fffffffffffffff >> 4 is 0b0000011111111111111111111111111111111111111111111111111111111111
0x7fffffffffffffff >> 62 is 0b0000000000000000000000000000000000000000000000000000000000000001
0x7fffffffffffffff >> 63 is 0b0000000000000000000000000000000000000000000000000000000000000000
0x7fffffffffffffff >> 64 is 0b0111111111111111111111111111111111111111111111111111111111111111
0x7fffffffffffffff >> 65 is 0b0011111111111111111111111111111111111111111111111111111111111111
0x7fffffffffffffff >> 66 is 0b0001111111111111111111111111111111111111111111111111111111111111
Related
I am going through 'The C language by K&R'. Right now I am doing the bitwise section. I am having a hard time in understanding the following code.
int mask = ~0 >> n;
I was playing on using this to mask n left side of another binary like this.
0000 1111
1010 0101 // random number
My problem is that when I print var mask it still negative -1. Assuming n is 4. I thought shifting ~0 which is -1 will be 15 (0000 1111).
thanks for the answers
Performing a right shift on a negative value yields an implementation defined value. Most hosted implementations will shift in 1 bits on the left, as you've seen in your case, however that doesn't necessarily have to be the case.
Unsigned types as well as positive values of signed types always shift in 0 bits on the left when shifting right. So you can get the desired behavior by using unsigned values:
unsigned int mask = ~0u >> n;
This behavior is documented in section 6.5.7 of the C standard:
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative
value, the value of the result is the integral part of the quotient
of E1 / 2E2 .If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
Right-shifting negative signed integers is an implementation-defined behavior, which is usually (but not always) filling the left with ones instead of zeros. That's why no matter how many bits you've shifted, it's always -1, as the left is always filled by ones.
When you shift unsigned integers, the left will always be filled by zeros. So you can do this:
unsigned int mask = ~0U >> n;
^
You should also note that int is typically 2 or 4 bytes, meaning if you want to get 15, you need to right-shift 12 or 28 bits instead of only 4. You can use a char instead:
unsigned char mask = ~0U;
mask >>= 4;
In C, and many other languages, >> is (usually) an arithmetic right shift when performed on signed variables (like int). This means that the new bit shifted in from the left is a copy of the previous most-significant bit (MSB). This has the effect of preserving the sign of a two's compliment negative number (and in this case the value).
This is in contrast to a logical right shift, where the MSB is always replaced with a zero bit. This is applied when your variable is unsigned (e.g. unsigned int).
From Wikipeda:
The >> operator in C and C++ is not necessarily an arithmetic shift. Usually it is only an arithmetic shift if used with a signed integer type on its left-hand side. If it is used on an unsigned integer type instead, it will be a logical shift.
In your case, if you plan to be working at a bit level (i.e. using masks, etc.) I would strongly recommend two things:
Use unsigned values.
Use types with specific sizes from <stdint.h> like uint32_t
I came across this piece of C code:
typedef int gint
// ...
gint a, b;
// ...
a = (b << 16) >> 16;
For ease of notation let's assume that b = 0x11223344 at this point. As far as I can see it does the following:
b << 16 will give 0x33440000
>> 16 will give 0x00003344
So, the 16 highest bits are discarded.
Why would anyone write (b << 16) >> 16 if b & 0x0000ffff would work as well? Isn't the latter form more understandable? Is there any reason to use bitshifts in a case like this? Is there any edge-case where the two could not be the same?
Assuming that the size of int is 32 bits, then there is no need to use shifts. Indeed, bitwise & with a mask would be more readable, more portable and safer.
It should be noted that left-shifting on negative signed integers invokes undefined behavior, and that left-shifting things into the sign bits of a signed integer could also invoke undefined behavior. C11 6.5.7 (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of
the result is E1 × 2E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1 × 2E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is
undefined.
(The only possible rationale I can think of, is some pre-mature optimization for a 16-bit CPU that comes with a poor compiler. Then the code would be more efficient if you broke up the arithmetic in 16 bit chunks. But on such a system, int would most likely be 16 bits, so the code wouldn't make any sense then.)
As a side note, it doesn't make any sense to use the signed int type either. The most correct and safe type for this code would have been uint32_t.
So, the 16 highest bits are discarded.
They are not. Though it is formally implementation-defined how right-shift operation is performed on signed types, most compilers do it so as to replicate the sign bit.
Thus, the 16 highest bits are filled by the replicated value of the 15th bit as the result of this expression.
For an unsigned integral type (eg, the uint32_t we first thought was being used),
(b << 16) >> 16
is identical to b & (1<<16 - 1).
For a signed integral type though,
(b << 16)
could become negative (ie, the low int16_t would have been considered negative when taken on its own), in which case
(b << 16) >> 16
will (probably) still be negative due to sign extension. In that case, it isn't the same as the & mask, because the top bits will be set instead of zero.
Either this behaviour is deliberate (in which case the commented-out typedef is misleading), or it's a bug. I can't tell without reading the code.
Oh, and the shift behaviour in both directions is how I'd expect gcc to behave on x86, but I can't comment on how portable it is outside that. The left-shift may be UB as Lundin points out, and sign extension on the right-shift is implementation defined.
I am confused by some behavior of size_t that I noticed:
size_t zero = 0x1 << 32;
size_t big = 0x1 << 31;
size_t not_as_big = 0x1 << 30;
printf("0x1<<32: %zx\n0x1<<31: %zx\n0x1<<30: %zx\n", zero, big, not_as_big);
Results in:
0x1<<32: 0
0x1<<31: ffffffff80000000
0x1<<30: 40000000
Now, I understand that size_t is only guaranteed to be at minimum a 16 bit unsigned integer, but I don't understand why 0x1<<31 ends up the value it did - trying to allocate 18 exabytes did a number on my program.
I'm using LLVM on x86_64.
Shifting a signed integer so that 1's get into the sign bit position or even further is undefined in C, so the compiler is free to do the following:
0x1 << 32
Here the compiler sees a 32-bit int (0x1) which is shifted by 32 bits. Since the compiler is free to interpret it in a way that is consistent with more a correct shift, it interprets it as 0x1_0000_0000 and tries to convert it to 32-bit int, resulting in 0x0000_0000, and then sees that you later assign the result to a size_t, which is usually 64-bit: 0x0000_0000_0000_0000
0x1 << 31
As before, the compiler is free to do whatever it seems right with it, because the 1 bit invades the sign bit position. So the result is 0x8000_0000, which is a negative number - INT_MIN to be precise. Then, it sees you convert that negative number to 64 bits, so it extends it with ones, as with all negative numbers. The result is 0xffff_ffff_8000_0000, the smallest 32-bit signed integer stored as a signed 64-bit integer.
The correct and portable between all 64-bit platforms way to do it is:
((size_t)1) << 32
((size_t)1) << 31
0x1 is of type int and in an implementation with 32-bit int, both expressions evaluations:
0x1 << 32
and
0x1 << 31
invoke undefined behavior.
To fix the issue (but assuming you don't want to keep zero object evaluating to 0), do as is suggested in KarolS answer
(size_t) 1 << 32
and
(size_t) 1 << 31
This assumes a size_t type wider than 32-bit which is the case with clang implementation on x64.
I'm not so good with bitwise operators so please excuse the question but how would I clear the lower 16 bits of a 32-bit integer in C/C++?
For example I have an integer: 0x12345678 and I want to make that: 0x12340000
To clear any particular set of bits, you can use bitwise AND with the complement of a number that has 1s in those places. In your case, since the number 0xFFFF has its lower 16 bits set, you can AND with its complement:
b &= ~0xFFFF; // Clear lower 16 bits.
If you wanted to set those bits, you could instead use a bitwise OR with a number that has those bits set:
b |= 0xFFFF; // Set lower 16 bits.
And, if you wanted to flip those bits, you could use a bitwise XOR with a number that has those bits set:
b ^= 0xFFFF; // Flip lower 16 bits.
Hope this helps!
To take another path you can try
x = ((x >> 16) << 16);
One way would be to bitwise AND it with 0xFFFF0000 e.g. value = value & 0xFFFF0000
Use an and (&) with a mask that is made of the top 16 bit all ones (that will leave the top bits as they are) and the bottom bits all zeros (that will kill the bottom bits of the number).
So it'll be
0x12345678 & 0xffff0000
If the size of the type isn't known and you want to mask out only the lower 16 bits you can also build the mask in another way: use a mask that would let pass only the lower 16 bits
0xffff
and invert it with the bitwise not (~), so it will become a mask that kills only the lower 16 bits:
0x12345678 & ~0xffff
int x = 0x12345678;
int mask = 0xffff0000;
x &= mask;
Assuming the value you want to clear bits from has an unsigned type not of "small rank", this is the safest, most portable way to clear the lower 16 bits:
b &= -0x10000;
The value -0x10000 will be promoted to the type of b (an unsigned type) by modular arithmetic, resulting in all high bits being set and the low 16 bits being zero.
Edit: Actually James' answer is the safest (broadest use cases) of all, but the way his answer and mine generalize to other similar problems is a bit different and mine may be more applicable in related problems.
Why does (-1 >> 1) result in -1? I'm working in C, though I don't think that should matter.
I can not figure out what I'm missing...
Here is an example of a C program that does the calc:
#include <stdio.h>
int main()
{
int num1 = -1;
int num2 = (num1 >> 1);
printf( "num1=%d", num1 );
printf( "\nnum2=%d", num2 );
return 0;
}
Because signed integers are represented in two's complement notation.
-1 will be 11111111 (if it was an 8 bit number).
-1 >> 1 evidently sign extends so that it remains 11111111. This behaviour depends on the compiler, but for Microsoft, when shifting a signed number right (>>) the sign bit is copied, while shifting an unsigned number right causes a 0 to be put in the leftmost bit.
An Arithmetic right shift will preserve the sign when shifting a signed number:
11111111 (-1) will stay 11111111 (-1)
In contrast, a Logical right shift won't preserve the sign:
11111111 (-1) will become 01111111 (127)
Your code clearly does an arithmetic shift, so the sign bit (MSB) is repeated. What the operator (>>) does depends on the implementation details of the platform you're using. In most cases, it's an arithmetic shift.
Also, note that 11111111 can have two different meanings depending on the representation. This also affects they way they'll be shifted.
If unsigned, 11111111 represents 255. Shifting it to the right won't preserve the sign since the MSB is not a sign bit.
If signed, 11111111 represents -1. Arithmetically shifting it to the right will preserve the sign.
Bit shifting a negative number is implementation-behavior in C. The results will depend on your platform, and theoretically could be completely nonsensical. From the C99 standard (6.5.7.5):
The result of E1 >> E2 is E1
right-shifted E2 bit positions. If E1
has an unsigned type or if E1 has a
signed type and a nonnegative value,
the value of the result is the
integral part of the quotient of E1 /
2^E2 . If E1 has a signed type and a
negative value, the resulting value is
implementation-defined.
The reason this happens is most likely because your compiler uses the x86 SAR (Shift Arithmetic Right) instruction to implement >>. This means sign extension will occur - the most significant bit will be replicated into the new MSB once the value is shifted over. From the intel manuals:
The shift arithmetic right (SAR) and
shift logical right (SHR) instructions
shift the bits of the destination
operand to the right (toward less
significant bit locations). For each
shift count, the least significant bit
of the destination operand is shifted
into the CF flag, and the most
significant bit is either set or
cleared depending on the instruction
type. The SHR instruction clears the
most significant bit (see Figure 7-8
in the Intel® 64 and IA-32
Architectures Software Developer’s
Manual, Volume 1); the SAR instruction
sets or clears the most significant
bit to correspond to the sign (most
significant bit) of the original value
in the destination operand. In effect,
the SAR instruction fills the empty
bit position’s shifted value with the
sign of the unshifted value (see
Figure 7-9 in the Intel® 64 and IA-32
Architectures Software Developer’s
Manual, Volume 1).
When you right shift and the leftmost bit is a 1, some platforms/compilers will bring in a 0 and some will preserve the 1 and make the new leftmost bit a 1. This preserves the sign of the number so a negative number stays negative and is called sign extension.
You'll see the difference if you try ((unsigned) -1) >> 1, which will do an unsigned right shift and so will always shift in a 0 bit.
sign extension.