Assume I have the variable x initialized to 425. In binary, that is 110101001.
Bit-shifting it to the right by 2, as follows: int a = x >> 2;, the answer is 106. In binary that is 1101010. This makes sense, as the two right-most bits are dropped and two zeros are added on the left side.
Bit-shifting it to the left by 2, as follows: int a = x << 2;, the answer is 1700. In binary this is 11010100100. I don't understand how this works. Why are the two left-most bits preserved? How can I drop them?
Thank you,
This is because int is probably 32-bits on your system. (Assuming x is type int.)
So your 425 is actually:
0000 0000 0000 0000 0000 0001 1010 1001
When left-shifted by 2, you get:
0000 0000 0000 0000 0000 0110 1010 0100
Nothing gets shifted off until you go all the way past 32 bits. (Strictly speaking, signed-integer overflow is undefined behavior in C/C++.)
To drop the bits that move past the original width of your number, bitwise AND against a mask that is the original length of your number:
int a = (425 << 2) & 0x1ff; // 0x1ff keeps the low 9 bits, the original length of the number.
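For example, a minimal sketch (assuming a 32-bit int) of shifting and then masking back down to the original 9 bits:
#include <stdio.h>

int main(void)
{
    int x = 425;                    /* 1 1010 1001 in binary (9 significant bits) */
    int shifted = x << 2;           /* 1700: the bits move left, nothing is lost yet */
    int masked  = shifted & 0x1ff;  /* keep only the low 9 bits: 164 */

    printf("%d %d\n", shifted, masked);  /* prints: 1700 164 */
    return 0;
}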
First off, don't shift signed integers. The bitwise operations are only universally unambiguous for unsigned integral types.
Second, why shift if you can use * 4 and / 4?
Third, you only drop bits on the left when you exceed the size of the type. If you want to "truncate on the left" mathematically, perform a modulo operation:
(x * 4) % 256
The bitwise equivalent is AND with a bit pattern: (x << 2) & 0xFF
(That is, the fundamental unsigned integral types in C are always implicitly "modulo 2^n", where n is the number of bits of the type.)
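As a small sketch (assuming an unsigned x, and using 8 bits as the target width purely for illustration), the modulo form and the mask form give the same result:
unsigned x = 425;
unsigned a = (x * 4) % 256;    /* arithmetic view: multiply, then reduce modulo 2^8 */
unsigned b = (x << 2) & 0xFF;  /* bitwise view: shift, then mask to the low 8 bits */
/* a == b == 164 */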
Why would you expect them to be dropped? Your int (probably) consumes 4 bytes. You're shifting them into a space that it rightfully occupies.
The entire 4-byte space is used during evaluation. You'd need to shift entirely out of that space to "drop" them.
I'm struggling with determining the size limits of each variable type. I can't understand the following problem.
To get the maximum value of char, for example, I use: ~0 >> 1
Which should work like this:
convert 0 to binary: 0000 0000 (I assume that char is stored on 8 bits)
negate it: 1111 1111 (now I'm out of char max size)
move one place right: 0111 1111 (I get 127 which seems to be correct)
Now I want to present this result using printf function.
Why exactly do I have to use a cast like this:
printf("%d\n", (unsigned char)(~0) >> 1)?
I just don't get it. I assume that it has something to do with point 2 when I get out of char range, but I'm not sure.
I would be grateful if you could present a more complete explanation of this problem.
Please don't use these kinds of tricks. They might work on ordinary machines, but they are possibly unportable and hard to understand. Instead, use the symbolic constants from the header file limits.h, which contains the size limits for each of the basic types. For instance, CHAR_MAX is the upper bound for a char and CHAR_MIN is the lower bound. Further limits for the numeric types declared in stddef.h and stdint.h can be found in stdint.h.
Now for your question: Arithmetic is done on values of type int by default, unless you cause the operands involved to have a different type. This happens for various reasons, like one of the variables involved having a different type, or you using a literal of a different type (like 1.0 or 1L or 1U). Even more importantly, the type of an arithmetic expression promotes from the inside to the outside. Thus, in the statement
char c = 1 + 2 + 3;
the expression 1 + 2 + 3 is evaluated with type int and only converted to char immediately before the assignment. Even more important is that in the C language, you can't do arithmetic on types smaller than int. For instance, in the expression c + 1 where c is of type char, the compiler inserts an implicit conversion from char to int before adding one to c. Thus, a statement like
c = c + 1;
actually behaves like this in C:
c = (char)((int)c + 1);
Thus, ~0 >> 1 actually evaluates to 0xffffffff (-1) on a usual 32-bit architecture, because the type int usually has 32 bits and right-shifting of signed types usually replicates the sign bit, so the most significant bit stays a one. Casting to unsigned char causes truncation, with the result being 0xff (255). All arguments but the first to printf are part of a variable argument list, which is a bit complicated but basically means that all types smaller than int are converted to int, float is converted to double, and all other types are left unchanged.
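A small sketch (assuming 32-bit int and 8-bit char) that shows the difference the cast makes:
#include <stdio.h>

int main(void)
{
    printf("%d\n", ~0 >> 1);                   /* typically -1: the sign bit is replicated */
    printf("%d\n", (unsigned char)(~0) >> 1);  /* 127: truncate to 0xff first, then shift */
    return 0;
}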
Now, how can we get this right? On an ordinary machine with two's complement and no padding bits one could use expressions like these to compute the largest and smallest char, assuming sizeof (char) < sizeof (int):
(1 << CHAR_BIT - 1) - 1; /* largest char */
-(1 << CHAR_BIT - 1); /* smallest char */
For other types, this is going to be slightly more difficult since we need to avoid overflow. Here is an expression that works for all signed integer types on an ordinary machine, where type is the type you want to have the limits of:
(type)(((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) - 1) /* largest */
(type)-((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) /* smallest */
For an unsigned type type, you could use this to get the maximum:
~(type)0
Please notice that all these tricks should not appear in portable code.
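Putting the expressions above together, here is a minimal sketch (again assuming an ordinary two's-complement machine with no padding bits and a signed char), compared against the portable macros:
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* char limits computed from CHAR_BIT (assumes char is signed) */
    printf("%d %d\n", (1 << CHAR_BIT - 1) - 1, -(1 << CHAR_BIT - 1));

    /* int limits computed via uintmax_t to avoid overflow during the shift */
    printf("%d %d\n",
           (int)(((uintmax_t)1 << sizeof (int) * CHAR_BIT - 1) - 1),
           (int)-((uintmax_t)1 << sizeof (int) * CHAR_BIT - 1));

    /* the portable way */
    printf("%d %d\n", INT_MAX, INT_MIN);
    return 0;
}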
The exact effect of your actions is different from what you assumed.
0 is not 0000 0000. 0 has type int, which means that it is most likely 0000 0000 0000 0000 0000 0000 0000 0000, depending on how many bits int has on your platform. (I will assume 32-bit int.)
Now, ~0 is, expectedly, 1111 1111 1111 1111 1111 1111 1111 1111, which still has type int and is a negative value.
When you shift it to the right, the result is implementation-defined. Right-shifting negative signed integer values in C does not guarantee that you will obtain 0 in the sign bit. Quite the opposite, most platforms will actually replicate the sign bit when right-shifting. Which means that ~0 >> 1 will still give you 1111 1111 1111 1111 1111 1111 1111 1111.
Note that even if you do this on a platform that shifts a 0 into the sign bit when right-shifting negative values, you will still obtain 0111 1111 1111 1111 1111 1111 1111 1111, which is in the general case not the maximum value of char you were trying to obtain.
If you want to make sure that a right-shift operation shifts-in 0 bits from the left, you have to either 1) shift an unsigned bit-pattern or 2) shift a signed, but positive bit-pattern. With negative bit patterns you risk running into the sign-extending behavior, meaning that for negative values 1 bits would be shifted-in from the left instead of 0 bits.
Since the C language does not have shifts that work in the domain of the [unsigned/signed] char type (the operand is promoted to int anyway before the shift), what you can do is make sure that you are shifting a positive int value and make sure that your initial bit mask has the correct number of 1s in it. That is exactly what you achieve by using (unsigned char) ~0 as the initial mask. (unsigned char) ~0 will participate in the shift as a value of type int equal to 0000 0000 0000 0000 0000 0000 1111 1111 (assuming 8-bit char). After the shift you will obtain 0000 0000 0000 0000 0000 0000 0111 1111, which is exactly what you were trying to obtain.
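To see the intermediate value, a minimal sketch (assuming 8-bit char and 32-bit int):
#include <stdio.h>

int main(void)
{
    int mask = (unsigned char)~0;  /* 255, i.e. 0x000000ff after promotion to int */
    printf("%d\n", mask);          /* 255 */
    printf("%d\n", mask >> 1);     /* 127: mask is a positive int, so zeros shift in */
    return 0;
}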
That only works with unsigned integers. For signed integers, right-shifting a negative number and the behaviour of bitwise inversion are implementation-defined. Not only does it depend on the representation of negative values, but also on which CPU instruction the compiler uses to perform the right shift (some CPUs do not have an arithmetic right shift, for instance).
So, unless you place additional constraints on your implementation, it is not possible to determine the limits of signed integers this way. This implies there is no completely portable way (for signed integers).
Note that whether char is signed or unsigned is also implementation-defined, and that (unsigned char)(~0) >> 1 is subject to integer promotions, so it will not yield a character result, but an int (which makes the format specifier correct, although presumably unintended).
Use limits.h to get macros for your implementation's integer limits. This file has to be provided by any standard-compliant C compiler.
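For instance, a minimal sketch of the portable approach:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("char:  %d .. %d\n", CHAR_MIN, CHAR_MAX);
    printf("short: %d .. %d\n", SHRT_MIN, SHRT_MAX);
    printf("int:   %d .. %d\n", INT_MIN, INT_MAX);
    return 0;
}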
I have the following code:
unsigned char chr = 234; // 1110 1010
unsigned long result = 0;
result = chr << 24;
And now result will equal 18446744073340452864, which is 1111 1111 1111 1111 1111 1111 1111 1111 1110 1010 0000 0000 0000 0000 0000 0000 in binary.
Why is there sign extension being done, when chr is unsigned?
Also if I change the shift from 24 to 8 then result is 59904, which is 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 1010 0000 0000 in binary. Why is there no extension done here? (Any shift of 23 or less doesn't have sign extension done to it.)
Also on my current platform sizeof(long) is 8.
What are the rules for automatically casting to larger size types when shifting? It seems to me that if the shift is 23 or less then chr gets cast to an unsigned type, and if it's 24 or more it gets cast to a signed type? (And why is sign extension even being done at all with a left shift?)
With chr = 234, the expression chr << 24 is evaluated in isolation: chr is promoted to (a 32-bit signed) int and shifted left 24 bits, yielding a negative int value. When you assign to a 64-bit unsigned long, the sign-bit is propagated through the most significant 32 bits of the 64-bit value. Note that the method of calculating chr << 24 is not itself affected by what the value is assigned to.
When the shift is just 8 bits, the result is a positive (32-bit signed) integer, and that sign bit (0) is propagated through the most significant 32-bits of the unsigned long.
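A quick sketch (assuming 32-bit int and 64-bit unsigned long) that reproduces the two cases; note that the 24-bit shift formally overflows int, which is undefined behaviour, so the first value is merely what typical systems produce:
#include <stdio.h>

int main(void)
{
    unsigned char chr = 234;
    unsigned long r24 = chr << 24;  /* promoted to int, becomes negative, then sign-extends */
    unsigned long r8  = chr << 8;   /* promoted to int, stays positive */

    printf("%lu\n%lu\n", r24, r8);  /* typically 18446744073340452864 and 59904 */
    return 0;
}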
To understand this it's easiest to think in terms of values.
Each integral type has a fixed range of representable values. For example, unsigned char usually ranges from 0 to 255; other ranges are possible, and you can find your compiler's choice by checking UCHAR_MAX in limits.h.
When converting between integral types, if the value is representable in the destination type, then the result of the conversion is that value (possibly with a different bit-pattern, e.g. sign extension).
If the value is not representable in the destination type then:
for signed destinations, the behaviour is implementation-defined (which may include raising a signal).
for unsigned destinations, the value is adjusted modulo the maximum value representable in the type, plus one.
Modern systems handle the signed out-of-range assignment by truncating the excess high bits; if it is still out of range, the bit-pattern is kept as-is, but the value changes to whatever that bit-pattern represents in the destination type.
Moving onto your actual example.
In C, there is something called the integral promotions. With <<, this happens to the left-hand operand; with the arithmetic operators it happens to all operands. The effect of integral promotions is that any value of a type smaller than int is converted to the same value with type int.
Further, the definition of << 24 is multiplication by 2^24 (where this has the type of the promoted left operand), with undefined behaviour if this overflows. (Informally: shifting into the sign bit causes UB).
So, putting all the conversions explicitly, your code is
result = (unsigned long) ( ((int)chr) * 16777216 );
Now, the result of this calculation is 3925868544, which, if you are on a typical system with 32-bit int, is greater than INT_MAX (2147483647), so the behaviour is undefined.
If we want to explore results of this undefined behaviour on typical systems: what may happen is the same procedure I outlined earlier for out-of-range assignment. The bit-pattern of 3925868544 is of course 1110 1010 0000 0000 0000 0000 0000 0000. Treating this as the pattern of an int using 2's complement gives the int -369098752.
Finally we have the conversion of this value to unsigned long. -369098752 is out of range for unsigned long; and the rule for an unsigned destination is to adjust the value modulo ULONG_MAX+1. So the value you are seeing is 18446744073709551615 + 1 - 369098752.
If your intent was to do the calculation in unsigned long precision, you need to make one of the operands unsigned long; e.g. do ((unsigned long)chr) << 24. (Note: 24ul won't work, the type of the right-hand operand of << or >> does not affect the left-hand operand).
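A hedged sketch of that fix, so the whole calculation happens in unsigned long precision:
#include <stdio.h>

int main(void)
{
    unsigned char chr = 234;
    unsigned long result = (unsigned long)chr << 24;  /* shift performed in unsigned long */
    printf("%lu\n", result);                          /* 3925868544, no sign extension */
    return 0;
}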
I'm right-shifting -109 by 5 bits, and I expect -3, because
-109 = -1101101 (binary)
shift right by 5 bits
-1101101 >>5 = -11 (binary) = -3
But, I am getting -4 instead.
Could someone explain what's wrong?
Code I used:
int16_t a = -109;
int16_t b = a >> 5;
printf("%d %d\n", a,b);
I used GCC on linux, and clang on osx, same result.
The thing is you are not considering the representation of negative numbers correctly. With right shifting, the kind of shift (arithmetic or logical) depends on the type of the value being shifted. If you cast your value to an unsigned 16-bit value first, you get a logical shift instead, though that still won't give you the -3 you expected (see below):
int16_t b = ((uint16_t)a) >> 5;
You are using -109 (16 bits) in your example. 109 in bits is:
00000000 01101101
If you take 109's two's complement you get:
11111111 10010011
Then, you are right shifting by 5 the number 11111111 10010011:
int16_t a = -109;
int16_t b = a >> 5;             // arithmetic shifting
int16_t c = ((uint16_t)a) >> 5; // logical shifting
printf("%d %d %d\n", a, b, c);
Will yield:
-109 -4 2044
The result of right shifting a negative value is implementation defined behavior, from the C99 draft standard section 6.5.7 Bitwise shift operators paragraph 5 which says (emphasis mine going forward):
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type
or if E1 has a signed type and a nonnegative value, the value of the result is the integral
part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
If we look at gcc C Implementation-defined behavior documents under the Integers section it says:
The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).
Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed ‘>>’ acts on negative numbers by sign extension.
That makes it pretty clear what's happening: when representing signed integers, the most significant (leftmost) bit is the sign bit, and right-shifting a negative number sign-extends it.
So, 1000 ... 0000 is the most negative number that you can represent with 32 bits.
Because of this, when you have a negative number and you shift right, sign extension happens: the most significant bit is replicated into the vacated positions. In simple terms, for a number like -109, this is what happens:
Before shifting you have (16bit):
1111 1111 1001 0011
Then you shift 5 bits right (after the pipe are the discarded bits):
1XXX X111 1111 1100 | 1 0011
The X's are the new positions that appear in your integer's bit representation and that, due to the sign extension, are filled with 1's, which gives you:
1111 1111 1111 1100 | 1 0011
So by shifting: -109 >> 5, you get -4 (1111 .... 1100) and not -3.
Confirming the results with two's complement:
+3 = 0... 0000 0011
-3 = ~(0... 0000 0011) + 1 = 1... 1111 1100 + 1 = 1... 1111 1101
+4 = 0... 0000 0100
-4 = ~(0... 0000 0100) + 1 = 1... 1111 1011 + 1 = 1... 1111 1100
Note: Remember that the two's complement is obtained by negating the bits of the positive number (the one's complement) and then adding 1.
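One more way to see why -4 and not -3 (a minimal sketch, assuming the common arithmetic-shift behaviour for >> on negative values): division truncates towards zero, while an arithmetic right shift rounds towards minus infinity.
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t a = -109;
    printf("%d\n", a / 32);  /* -3: integer division truncates towards zero */
    printf("%d\n", a >> 5);  /* typically -4: arithmetic shift rounds towards minus infinity */
    return 0;
}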
Pablo's answer is essentially correct, but there are two small bits (no pun intended!) that may help you see what's going on.
C (like pretty much every other language) uses what's called two's complement, which is simply a different way of representing negative numbers (it's used to avoid the problems that come up with other ways of handling negative numbers in binary with a fixed number of digits). The conversion that turns a positive number into its two's-complement negative (a positive number looks just like any other number in binary, except that the leftmost bit must be 0; it's basically the sign place-holder) is reasonably simple computationally:
Take your number
00000000 01101101 (It has 0s padding it to the left because it's 16 bits. If it was long, it'd be padded with more zeros, etc.)
Flip the bits
11111111 10010010
Add one.
11111111 10010011.
This is the two's complement number that Pablo was referring to. It's how C holds -109, bitwise.
When you logically shift it to the right by five bits you would APPEAR to get
00000111 11111100.
This number is most definitely not -4. (It doesn't have a 1 in the first bit, so it's not negative, and it's way too large to be 4 in magnitude.) Why is C giving you negative 4 then?
The reason is basically that the ISO C standard doesn't specify how a given compiler needs to treat right-shifting of negative numbers. GCC does what's called sign extension: the idea is to basically pad the left bits with 1s (if the initial number was negative before shifting), or 0s (if the initial number was positive before shifting).
So instead of the 5 zeros that happened in the above bit-shift, you instead get:
11111111 11111100. That number is in fact negative 4! (Which is what you were consistently getting as a result.)
To see that that is in fact -4, you can just convert it back to a positive number using the two's complement method again:
00000000 00000011 (bits flipped)
00000000 00000100 (add one).
That's four alright, so your original number (11111111 11111100) was -4.
My head is starting to hurt... I've been looking at this way too long.
I'm trying to mask the most significant nibble of an int, regardless of the int bit length and the endianness of the machine. Let's say x = 8425 = 0010 0000 1110 1001 = 0x20E9. I know that to get the least significant nibble, 9, I just need to do something like x & 0xF to get back 9. But how about the most significant nibble, 2?
I apologize if my logic from here on out falls apart, my brain is completely fried, but here I go:
My book tells me that the bit length w of the data type int can be computed with w = sizeof(int)<<3. If I knew that the machine were big-endian, I could do 0xF << w-4 to have 1111 for the most significant nibble and 0000 for the rest, i.e. 1111 0000 0000 0000. If I knew that the machine were little-endian, I could do 0xF >> w-8 to have 0000 0000 0000 1111. Fortunately, this works even though we are told to assume that right shifts are done arithmetically just because 0xF always gives me the first bit of 0000. But this is not a proper solution. We are not allowed to test for endianness and then proceed from there, so what do I do?
Bit shifting operators operate at a level of abstraction above endianness. "Left" shifts always shift towards the most significant bit, and "right" shifts always shift towards the least significant bit.
You should be able to right shift by the (number of bits) - 4 regardless of endianness.
Since you already know how to compute the number of bits, it should suffice to just subtract 4 and shift by that number, and then (for safety), mask with 0xF.
See this question for discussion about endianness.
Q: But how about the most significant nibble, 2?
A: (x >> (sizeof(int)*8 - 4)) & 0xF
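A small sketch of that answer. Note that on a 32-bit int, the most significant nibble of 8425 (0x000020E9) is 0, not 2; the 2 only appears if the value is viewed as 16 bits wide. The cast to unsigned is an extra precaution I've added so the right shift doesn't rely on arithmetic-shift behaviour for negative values:
#include <stdio.h>

int main(void)
{
    unsigned w = sizeof(int) * 8;  /* bit width of int, e.g. 32 */

    int x = 8425;                  /* 0x000020E9 on a 32-bit int */
    printf("%X\n", ((unsigned)x >> (w - 4)) & 0xF);  /* 0: top nibble of the full width */

    int y = 0x20E90000;            /* the same nibbles shifted up to fill 32 bits */
    printf("%X\n", ((unsigned)y >> (w - 4)) & 0xF);  /* 2 */
    return 0;
}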
Quick question on left shifts in assembly using the "sall" instruction.
From what I understand, "sall rightop, leftop" would translate to "leftop = leftop << rightop", so taking an integer and shifting the bits 4 spaces to the left would result in a multiplication by 2^4.
But what happens when the integer is unsigned, 32-bits, and is something like:
1111 1111 1111 1111 1111 0000 0010 0010
Would a left shift in this case become 1111 1111 1111 1111 0000 0010 0010 0000 ?
Obviously this is not a multiplication by 2^4.
Thanks!!
It is a multiplication by 2^4, modulo 2^32:
n = (n * 2^4) % (2 ^ 32)
You can capture the bits that will get "shifted out" by first shifting them down to the bottom and masking, in this case
dropped = (n >> (32-4)) & ((1 << 4) - 1);
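In C terms, a small sketch (assuming 32-bit unsigned arithmetic) of capturing the dropped bits and performing the shift:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t n = 0xFFFFF022u;  /* 1111 1111 1111 1111 1111 0000 0010 0010 */
    uint32_t dropped = (n >> (32 - 4)) & ((1u << 4) - 1);  /* the top 4 bits: 0xF */
    uint32_t shifted = n << 4;  /* multiplication by 16, modulo 2^32 */

    printf("%X %X\n", (unsigned)dropped, (unsigned)shifted);  /* F FFFF0220 */
    return 0;
}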
Left shifts (SAL, SHL) simply lose the bits on the left. The bits on the right get filled with 0. If any of the lost bits is 1, you have an overflow and the wrong result in terms of multiplication. You use S*L for both non-negative and negative values.
The regular right shift (SHR) works in exactly the same manner, but the direction is reversed: the bits on the left get filled with 0 and you lose the bits on the right. The result is effectively rounded/truncated towards 0. You use SHR for non-negative values, because it does not preserve the sign of the value (a 0 gets written into the sign bit).
The arithmetic shift right (SAR) is slightly different. The most significant bit (=leftmost bit, sign bit) doesn't get filled with 0. It gets filled with its previous value, thus preserving the sign of the value. Another notable difference is that if the value is negative, the lost bits on the right result in rounding towards minus infinity instead of 0.