I want to get the values of INT_MIN and INT_MAX. I've tried ~0 and ~0 >> 1, since the leftmost bit is a sign bit, but I got -1 for both of them.
I'm confused: why doesn't ~0 turn out to be 0xffffffff, and ~0 >> 1 to be 0x7fffffff?
Use:
~0U >> 1
The 'U' suffix makes the operand unsigned, so the shift is a logical (unsigned) shift.
So, why doesn't ~0 turn out to be 0xffffffff?
First, look at what 0 is in a four-byte representation:
BIT NUMBER 31 0
▼ ▼
number bits 0000 0000 0000 0000 0000 0000 0000 0000
▲ ▲
MSB LSB
LSB - Least Significant Bit (numbered 0)
MSB - Most Significant Bit (numbered 31)
Now ~ is the bitwise NOT operator; it flips all the bits in 0:
BIT NUMBER 31 0
▼ ▼
number bits 1111 1111 1111 1111 1111 1111 1111 1111
▲ ▲
MSB LSB
Because MSB = 1, this representation is treated as a negative number, and its magnitude is found using 2's complement math: it is -1.
How?
What is 1? It is:
number bits 0000 0000 0000 0000 0000 0000 0000 0001
▲ ▲
MSB LSB
The 1's complement of 1:
number bits 1111 1111 1111 1111 1111 1111 1111 1110
▲ ▲
MSB LSB
The 2's complement? Add 1 to the 1's complement, which gives:
number bits 1111 1111 1111 1111 1111 1111 1111 1111
▲ ▲
MSB LSB
This is the same pattern you get from ~0; that is why you are getting -1 as output.
Now, what about the >> shift operator?
In most implementations of C, >> on a signed type is an arithmetic right shift, which preserves the sign bit (MSB). So ~0 >> 1 is nothing but -1; it stays the same.
6.5.7 [Bitwise shift operators]
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
What you want is an unsigned (logical) right shift, and you get that behavior by using an unsigned operand; that is why I added the U suffix, as in 0U.
How to print INT_MIN and INT_MAX?
Printing INT_MIN and INT_MAX is a bit tricky (because of undefined and implementation-defined behavior around setting the MSB and around overflow), so I have written the following code:
#include <stdio.h>
#include <limits.h> /* for CHAR_BIT */

int main(void)
{
    int my_int_min = 1U << ((sizeof(int) * CHAR_BIT) - 1);
    int my_int_max = ~0U >> 1;
    printf("INT_MIN = %d\n", my_int_min);
    printf("INT_MAX = %d\n", my_int_max);
    return 0;
}
See it executing on codepad; the output is:
INT_MIN = -2147483648
INT_MAX = 2147483647
How does this code work?
Note that for a 32-bit int the range is [-2147483648, 2147483647], which equals [-2^31, 2^31 - 1].
INT_MIN: -2^31 == -2147483648 is:
1000 0000 0000 0000 0000 0000 0000 0000
▲ ▲
MSB LSB
In the expression 1U << ((sizeof(int) * CHAR_BIT) - 1), I shift the first bit, the LSB (which is 1), all the way to the leftmost position, the MSB. And because in C, shifting into the sign bit is undefined behavior when the operand has a signed type, I used the unsigned constant 1U.
6.5.7 [Bitwise shift operators]
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Another point to note is that I used CHAR_BIT, a standard macro defined in limits.h that tells the number of bits in one char in a C implementation (remember: a char is always one byte, but the number of bits in one byte can differ between systems; it is not always guaranteed to be 8).
INT_MAX: 2^31 - 1 == 2147483647
0111 1111 1111 1111 1111 1111 1111 1111
▲ ▲
MSB LSB
0 is of type int, so ~0 and ~0 >> 1 are also of type int.
~0 has all 1s in its bit pattern, and that is -1 in 2's complement, which is the default representation on most modern implementations.
Right shift of a negative value in C is implementation-defined, but most implementations define >> as an arithmetic shift when the type is signed and a logical shift when the type is unsigned.
Since ~0 is an int, which is a signed type, ~0 >> 1 will be an arithmetic shift right. Hence the value is sign-extended, causing the result to be all 1s again.
You need to do (unsigned)~0 >> 1, or ~0U >> 1.
There is no portable way to compute INT_MIN and INT_MAX, because C allows three different signed-integer representations, besides trap representations and padding bits. That's why the standard library always defines INT_MIN and INT_MAX directly with their values.
~0 is -1. Every C implementation that you're likely to run into uses two's complement for signed integers, so 0xffffffff is -1 (assuming 32-bit integers). ~0 >> 1 is an arithmetic shift of -1 on such implementations; since the sign bit is copied into the vacated position, the result is still -1.
The value of an all bit set int depends on the sign representation that your platform has for int. This is why the macros INT_MIN and INT_MAX were invented, there is no way of computing these values in a portable way.
Based on the Wikipedia article, C normally implements an arithmetic shift for signed types. That means that when you right-shift the quantity 0xffffffff, the leftmost bit (the sign bit, 1) will be preserved, as you observe.
However, Wikipedia also mentions the following, so you will get a logical shift (result of 0x7fffffff) if you use the unsigned type.
The >> operator in C and C++ is not necessarily an arithmetic shift.
Usually it is only an arithmetic shift if used with a signed integer
type on its left-hand side. If it is used on an unsigned integer type
instead, it will be a logical shift.
Numbers are stored in 2's complement, so ~0 is 0xFFFFFFFF, which is -1.
With a signed (arithmetic) shift, 0xFFFFFFFF >> 1 refills the vacated MSB with the sign bit, giving 0xFFFFFFFF = -1 again.
When shifting an unsigned value, the >> operator in C is a logical shift. When shifting a signed value, the >> operator is (typically) an arithmetic shift.
So ~0U >> 1 refills the vacated MSB with 0, giving 0x7FFFFFFF.
On a system with 32-bit int, 0 is 0x00000000. ~ is the bitwise NOT operator, which turns every 0 bit into 1 and vice versa. Therefore, ~0 (~0x00000000) gives 0xffffffff.
This in turn is interpreted as -1 in Two's complement.
Related
I'm trying to figure out why the following code:
{
unsigned int a = 10;
a = ~a;
printf("%d\n", a);
}
a will be 00001010 to begin with, and after the NOT operation will transform into 11110101.
What exactly happens when one tries to print a as a signed integer that makes the printed result -11?
I thought I would maybe end up seeing -5 (going by the binary representation), but not -11.
I'll be glad to get a clarification on the matter.
2's complement notation is used to store negative numbers.
The number 10 is 0000 0000 0000 0000 0000 0000 0000 1010 in 4 byte binary.
a=~a makes the content of a as 1111 1111 1111 1111 1111 1111 1111 0101.
When this number is treated as a signed int, the 1 in the MSB marks it as negative.
To find its magnitude, take the 2's complement of the pattern:
~(1111 1111 1111 1111 1111 1111 1111 0101) + 1 = 0000 0000 0000 0000 0000 0000 0000 1011
which is 11. So, interpreted as a decimal integer, the value is -11.
When you write a = ~a; you flip each and every bit in a, which is also called the one's complement.
The representation of negative numbers is implementation-dependent, meaning that different architectures may represent -10 or -11 differently.
Assuming a 32-bit architecture on a common processor that uses two's complement to represent negative numbers, -1 is represented as FFFFFFFF (hexadecimal), i.e. all 32 bits set to 1.
~a will be represented as FFFFFFF5, or in binary 1...1 0101, which is the representation of -11.
Note: the first part is always the same and is not implementation-dependent; ~a is FFFFFFF5 on any 32-bit architecture. It is only the second part (-11 == FFFFFFF5) that is implementation-dependent. By the way, it would be -10 on an architecture that used one's complement to represent negative numbers.
I have the following code:
unsigned char chr = 234; // 1110 1010
unsigned long result = 0;
result = chr << 24;
And now result will equal 18446744073340452864, which is 1111 1111 1111 1111 1111 1111 1111 1111 1110 1010 0000 0000 0000 0000 0000 0000 in binary.
Why is there sign extension being done, when chr is unsigned?
Also if I change the shift from 24 to 8 then result is 59904 which is 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 1010 0000 0000 in binary. Why here is there no extension done here? (Any shift 23 or less doesn't have sign extension done to it)
Also on my current platform sizeof(long) is 8.
What are the rules for automatic conversion to larger types when shifting? It seems to me that if the shift is 23 or less, chr gets converted to an unsigned type, and if it's 24 or more it gets converted to a signed type? (And why is sign extension even being done at all with a left shift?)
With chr = 234, the expression chr << 24 is evaluated in isolation: chr is promoted to (a 32-bit signed) int and shifted left 24 bits, yielding a negative int value. When you assign to a 64-bit unsigned long, the sign-bit is propagated through the most significant 32 bits of the 64-bit value. Note that the method of calculating chr << 24 is not itself affected by what the value is assigned to.
When the shift is just 8 bits, the result is a positive (32-bit signed) integer, and that sign bit (0) is propagated through the most significant 32-bits of the unsigned long.
To understand this it's easiest to think in terms of values.
Each integral type has a fixed range of representable values. For example, unsigned char usually ranges from 0 to 255; other ranges are possible, and you can find your compiler's choice by checking UCHAR_MAX in limits.h.
When doing a conversion between integral types; if the value is representable in the destination type, then the result of the conversion is that value. (This may be a different bit-pattern, e.g. sign extension).
If the value is not representable in the destination type then:
for signed destinations, the behaviour is implementation-defined (which may include raising a signal).
for unsigned destinations, the value is adjusted modulo the maximum value representable in the type, plus one.
Modern systems handle the signed out-of-range assignment by left-truncating excessive bits; and if it is still out-of-range then it retains the same bit-pattern, but the value changes to whatever value that bit-pattern represents in the destination type.
Moving onto your actual example.
In C, there is something called the integral promotions. With <<, this happens to the left-hand operand; with the arithmetic operators it happens to all operands. The effect of integral promotions is that any value of a type smaller than int is converted to the same value with type int.
Further, the definition of << 24 is multiplication by 2^24 (where this has the type of the promoted left operand), with undefined behaviour if this overflows. (Informally: shifting into the sign bit causes UB).
So, putting all the conversions explicitly, your code is
result = (unsigned long) ( ((int)chr) * 16777216 )
Now, the result of this calculation is 3925868544, which, if you are on a typical system with 32-bit int, is greater than INT_MAX (2147483647), so the behaviour is undefined.
If we want to explore results of this undefined behaviour on typical systems: what may happen is the same procedure I outlined earlier for out-of-range assignment. The bit-pattern of 3925868544 is of course 1110 1010 0000 0000 0000 0000 0000 0000. Treating this as the pattern of an int using 2's complement gives the int -369098752.
Finally we have the conversion of this value to unsigned long. -369098752 is out of range for unsigned long; and the rule for an unsigned destination is to adjust the value modulo ULONG_MAX+1. So the value you are seeing is 18446744073709551615 + 1 - 369098752.
If your intent was to do the calculation in unsigned long precision, you need to make one of the operands unsigned long; e.g. do ((unsigned long)chr) << 24. (Note: 24ul won't work, the type of the right-hand operand of << or >> does not affect the left-hand operand).
I'm right-shifting -109 by 5 bits, and I expect -3, because
-109 = -1101101 (binary)
shift right by 5 bits
-1101101 >>5 = -11 (binary) = -3
But, I am getting -4 instead.
Could someone explain what's wrong?
Code I used:
int16_t a = -109;
int16_t b = a >> 5;
printf("%d %d\n", a,b);
I used GCC on linux, and clang on osx, same result.
The thing is that you are not considering the representation of negative numbers correctly. With right shifting, the kind of shift (arithmetic or logical) depends on the type of the value being shifted. If you cast your value to an unsigned type, you get a logical shift instead:
int16_t b = ((uint16_t)a) >> 5;
Note that neither shift gives the -3 you expected: an arithmetic shift rounds toward negative infinity (so -109 >> 5 gives -4), whereas integer division in C truncates toward zero (so -109 / 32 gives -3).
You are using -109 (16 bits) in your example. 109 in bits is:
00000000 01101101
If you take 109's 2's complement you get:
11111111 10010011
Then, you are right shifting by 5 the number 11111111 10010011:
int16_t a = -109;
int16_t b = a >> 5;              /* arithmetic shift */
int16_t c = ((uint16_t)a) >> 5;  /* logical shift */
printf("%d %d %d\n", a, b, c);
Will yield:
-109 -4 2044
The result of right shifting a negative value is implementation defined behavior, from the C99 draft standard section 6.5.7 Bitwise shift operators paragraph 5 which says (emphasis mine going forward):
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type
or if E1 has a signed type and a nonnegative value, the value of the result is the integral
part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
If we look at gcc C Implementation-defined behavior documents under the Integers section it says:
The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).
Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed ‘>>’ acts on negative numbers by sign extension.
That makes it pretty clear what's happening: in the signed representation, the leftmost bit is the sign bit, and negative numbers get sign-extended when shifted right.
So 1000 ... 0000 (32 bits) is the smallest (most negative) number that you can represent with 32 bits.
Because of this, when you have a negative number and you shift right, a thing called sign extension happens, which means that the left most significant bit is extended, in simple terms it means that, for a number like -109 this is what happens:
Before shifting you have (16bit):
1111 1111 1001 0011
Then you shift 5 bits right (after the pipe are the discarded bits):
XXXX X111 1111 1100 | 1 0011
The X's are the vacated positions in your integer's bit representation, which, due to sign extension, are filled with 1's, giving you:
1111 1111 1111 1100 | 1 0011
So by shifting: -109 >> 5, you get -4 (1111 .... 1100) and not -3.
Confirming the results with the 2's complement:
+3 = 0... 0000 0011
-3 = ~(0... 0000 0011) + 1 = 1... 1111 1100 + 1 = 1... 1111 1101
+4 = 0... 0000 0100
-4 = ~(0... 0000 0100) + 1 = 1... 1111 1011 + 1 = 1... 1111 1100
Note: remember that the 2's complement is just the 1's complement plus 1; that is why you first negate the bits of the positive number and only then add 1.
Pablo's answer is essentially correct, but there are two small bits (no pun intended!) that may help you see what's going on.
C (like pretty much every other language) uses what's called two's complement, which is simply a particular way of representing negative numbers (it's used to avoid the problems that come up with other ways of handling negative numbers in binary with a fixed number of digits). A positive number in two's complement looks just like any other binary number, except that the leftmost bit must be 0; it's basically the sign placeholder. The conversion of a positive number to its negative counterpart is reasonably simple computationally:
Take your number
00000000 01101101 (It has 0s padding it to the left because it's 16 bits. If it was long, it'd be padded with more zeros, etc.)
Flip the bits
11111111 10010010
Add one.
11111111 10010011.
This is the two's complement number that Pablo was referring to. It's how C holds -109, bitwise.
When you logically shift it to the right by five bits you would APPEAR to get
00000111 11111100.
This number is most definitely not -4. (It doesn't have a 1 in the first bit, so it's not negative, and it's way too large to be 4 in magnitude.) Why is C giving you negative 4 then?
The reason is basically that the ISO C standard doesn't specify how a given compiler must treat right-shifting of negative numbers (the result is implementation-defined). GCC does what's called sign extension: the idea is to pad the vacated left bits with 1s if the initial number was negative before shifting, or with 0s if it was positive.
So instead of the 5 zeros that happened in the above bit-shift, you instead get:
11111111 11111100. That number is in fact negative 4! (Which is what you were consistently getting as a result.)
To see that that is in fact -4, you can just convert it back to a positive number using the two's complement method again:
00000000 00000011 (bits flipped)
00000000 00000100 (add one).
That's four alright, so your original number (11111111 11111100) was -4.
In the following code,
#include <stdio.h>
#include <conio.h>

main()
{
    int y = -01;
    y <<= 3;
    printf("%d\n", y);
    y >>= 3;
    printf("%d", y);
    getch();
}
I read in a book that while using the right shift operator on y, the signed bit may or may not be preserved depending on my compiler. Why is that? What is the concept behind this? Please elaborate.
Some processors have only an "unsigned" right shift (which fills the vacated bits with 0's) or a signed right shift (which fills the vacated bits with copies of the sign bit). Since some processors have only one or the other, but not both, the standard doesn't try to mandate one behavior or the other.
For what it's worth, many (most?) current processors have instructions to do both. For example, the x86 includes both shr (logical shift right -- fills with 0's) and sar (arithmetic shift right -- fills with copies of the sign bit).
The following quotes from the C11 standard (6.5.7 Bitwise shift operators) might be helpful:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
This is true, because there are four types of shifting available:
Arithmetic Shift
Logical Shift
Rotate No carry Shift
Rotate Through Carry Shift
In the first two types, the vacated bit positions are filled in a defined way while shifting to the right or left: an arithmetic right shift refills them with copies of the sign bit, a logical shift refills them with zeros. C (and derivatives of C) generally uses the arithmetic and logical shifts: left shifts and shifts of unsigned values are logical, while a right shift of a signed value is usually arithmetic (the standard leaves it implementation-defined). In Java, >> is an arithmetic shift and >>> is a logical shift.
In the last two types, the bits are not discarded but wrap around. Rotate No Carry is the circular shift, which is mostly used in cryptography to retain all the bits.
C, although it implements only the logical & arithmetic shifts, can also express circular shifting in the following way:
unsigned int x;
unsigned int y;
/* This Code is Just for reference from Wikipedia */
y = (x << shift) | (x >> (sizeof(x)*CHAR_BIT - shift));
If this information is not sufficient, you can go through the main Wikipedia articles, which are linked from the titles of the shifts above.
Let's see an example.
Consider the 16-bit pattern 1111 1111 1111 1111.
If your compiler's int is 16 bits, this pattern is -1, and a left shift by one
makes the value 1111 1111 1111 1110 (-2).
If your compiler's int is 32 bits, the same pattern is the value 65535, and the same left shift
makes the value 0000 0000 0000 0001 1111 1111 1111 1110 (131070).
You shifted the same bit pattern, but you get a different result depending on your compiler.
Use %x to print the hexadecimal notation of a decimal value.
See this code:

#include <stdio.h>

int main()
{
    int y = -01;
    printf("%d\n", y);
    printf("%x\n", y);
    y <<= 3;
    printf("%d\n", y);
    printf("%x\n", y);
    y >>= 3;
    printf("%d\n", y);
    printf("%x\n", y);

    y = 127;
    printf("%d\n", y);
    printf("%x\n", y);
    y <<= 3;
    printf("%d\n", y);
    printf("%x\n", y);
    y >>= 3;
    printf("%d\n", y);
    printf("%x\n", y);

    y = 32767;
    printf("%d\n", y);
    printf("%x\n", y);
    y <<= 3;
    printf("%d\n", y);
    printf("%x\n", y);
    y >>= 3;
    printf("%d\n", y);
    printf("%x\n", y);

    getchar();
}
OUTPUT ON 64-bit machine :
-1
ffffffff
-8
fffffff8
-1
ffffffff
127
7f
1016
3f8
127
7f
32767
7fff
262136
3fff8
32767
7fff
OUTPUT ON 32-bit machine :
-1
ffffffff
-8
fffffff8
-1
ffffffff
127
7f
1016
3f8
127
7f
32767
7fff
-8
fffffff8
-1
ffffffff
When the value approaches the maximum of the signed integer type, you can see the compiler/platform dependency in the shift results.
Here, 8-bit signed integers are explained with 1's and 2's complement representations:
Signed number representations
I am trying to understand why my program
#include <stdio.h>

void main()
{
    printf("%x", -1 << 4);
}
prints fffffff0.
What is this program doing, and what does the << operator do?
The << operator is the left shift operator; the result of a << b is a shifted to the left by b bits.
The problem with your code is that you are left-shifting a negative integer, which results in undefined behavior (although your compiler may make some guarantees about this operation); also, %x is used to print an unsigned integer in hexadecimal, and you are feeding it a signed integer - again undefined behavior.
As for why you are seeing what you are seeing: on 2's complement architectures -1 is represented as "all ones"; so, on a computer with 32-bit int you'll have:
11111111111111111111111111111111 = -1 (if interpreted as a signed integer)
now, if you shift this to the left by 4 positions, you get:
11111111111111111111111111110000
The %x specifier makes printf interprets this stuff as an unsigned integer, which, in hexadecimal notation, is 0xfffffff0. This is easy to understand, since 4 binary digits equal a single hexadecimal digit; the 1111 groups in binary become f in hex, the last 0000 in binary is that last 0 in hex.
Again, all this behavior hereby explained is just the way your specific compiler works, as far as the C standard is concerned this is all UB. This is very intentional: historically different platforms had different ways to represent negative numbers, and the shift instructions of various processors have different subtleties, so the "defined behavior" we get for shift operators is more or less the "safe subset" common to most "normal" architectures.
It means: take the bit representation of -1 and shift it to the left by 4 bits.
This means take
11111111 11111111 11111111 11111111 = ffffffff
And shift:
11111111 11111111 11111111 11110000 = fffffff0
The "%x" format specifier means print it out in hexadecimal notation.
It's the left shift binary operator. You're left shifting -1 by 4 bits:
-1 == 1111 1111 1111 1111 1111 1111 1111 1111(2) == FFFFFFFF
-1 << 4 == 1111 1111 1111 1111 1111 1111 1111 0000(2) == FFFFFFF0
"%x" means your integer will be displayed in hexadecimal value.
-1 << 4 means that the binary value of "-1" will be shifted of 4 bits
"<<" is left shift operator. -1<<4 means left shifting -1 by 4. Since -1 is 0xffffffff, you will get 0xfffffff0
More can be found in wiki http://en.wikipedia.org/wiki/Bitwise_operation