negative integer number >> 31 = -1 not 1? [duplicate]

This question already has answers here:
Arithmetic bit-shift on a signed integer
(6 answers)
Closed 9 years ago.
so, let's say I have a signed integer (a couple of examples):
-1101363339 = 10111110 01011010 10000111 01110101 in binary.
-2147463094 = 10000000 00000000 01010000 01001010 in binary.
-20552 = 11111111 11111111 10101111 10111000 in binary.
now: -1101363339 >> 31, for example, should equal 1, right? But on my computer I am getting -1. Regardless of what negative integer I pick, if x is a negative number, x >> 31 = -1. Why? Clearly in binary it should be 1.

Per C99 6.5.7 Bitwise shift operators:
If E1 has a signed type and a negative value, the resulting value is implementation-defined.
where E1 is the left-hand side of the shift expression. So it depends on your compiler what you'll get.

In most implementations, right-shifting a signed value does an arithmetic shift, meaning it preserves the most significant bit. Therefore in your case every bit of the result is 1, which is -1 in decimal. If you use an unsigned int you will get the result you are looking for.
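For example, a minimal sketch (assuming a 32-bit int, using the questioner's own value):

#include <stdio.h>

int main(void) {
    int x = -1101363339;
    /* Casting to unsigned first makes the shift logical: zeroes come in from the left. */
    printf("%u\n", (unsigned int)x >> 31);  /* prints 1 */
    /* The plain signed shift is implementation-defined; with an arithmetic shift it prints -1. */
    printf("%d\n", x >> 31);
    return 0;
}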
Per C 2011 6.5.7 Bitwise shift operators:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type
or if E1 has a signed type and a nonnegative value, the value of the result is the integral
part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
Basically, the right shift of a negative signed integer is implementation-defined, but most implementations choose to do it as an arithmetic shift.

The behavior you are seeing is called an arithmetic shift, which is when right shifting extends the sign bit. This means that the MSBs will carry the same value as the original sign bit. In other words, a negative number will always be negative after a right shift operation.
Note that this behavior is implementation defined and cannot be guaranteed with a different compiler.

What you are seeing is an arithmetic shift, in contrast to the bitwise shift you were expecting; i.e., the compiler, instead of "brutally" shifting the bits, is propagating the sign bit, thus dividing by 2^N.
When talking about unsigned ints and positive ints, a right shift is a very simple operation - the bits are shifted to the right by N places (inserting zeroes on the left), regardless of their meaning. In such cases, the operation is equivalent to dividing by 2^N (and the C standard actually defines it like that).
The distinction comes up when talking about negative numbers. Several representations of negative numbers exist, although currently two's complement is most commonly used for integers.
The problem with a "brutal" bitwise shift here is, for starters, that one of the bits is used in some way to express the sign; thus, shifting the binary digits without regard to the representation of negative integers can give unexpected results.
For example, in 2's complement representation the most significant bit is 1 for negative numbers and 0 for positive numbers; applying a bitwise shift (with zeroes inserted to the left) to a negative number would (among other things) make it positive, not resulting in the (usually expected) division by 2^N.
So, arithmetic shift is introduced; negative numbers represented in 2's complement have an interesting property: the division-by-2^N behavior of the shift is preserved if, instead of inserting zeroes from the left, you insert bits that have the same value as the original sign bit.
In this way, signed divisions by 2^N can be performed with just a bit of extra logic in the shift, without having to resort to a fully-fledged division routine.
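As a hedged illustration (assuming a 32-bit int, two's complement, and an implementation that shifts negative values arithmetically), the sign-bit replication is visible in hex:

#include <stdio.h>

int main(void) {
    int x = -20;  /* 0xFFFFFFEC in 32-bit two's complement */
    /* An arithmetic shift copies the sign bit into the vacated positions. */
    printf("0x%08X (%d)\n", (unsigned int)(x >> 2), x >> 2);
    /* typically prints 0xFFFFFFFB (-5), i.e. -20 divided by 4 */
    return 0;
}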
Now, is arithmetic shift guaranteed for signed integers? In some languages, yes (Java, for one); but in C it's not like that - the behavior of the shift operators when dealing with negative integers is left as an implementation-defined detail.
As often happens, this is due to different hardware support for the operation; C is used on vastly different platforms, and, especially in the past, there was quite a difference in the "cost" of operations depending on the platform.
For example, if the processor does not provide an arithmetic right shift instruction, the compiler would be mandated to emit a much slower DIV instruction of some kind, which could be a problem in an inner loop on slower processors. For these reasons, the C standard leaves it up to the implementor to do the most appropriate thing for the current platform.
In your case, your implementation probably chose arithmetic shift because you are running on an x86 processor, which uses 2's complement arithmetic and provides both bitwise and arithmetic shifts as single CPU instructions.
Actually, languages like Java even have separate arithmetic (>>) and logical (>>>) shift operators - this is mainly due to the fact that they do not have unsigned types in which to store, e.g., bit fields.

Related

Why is -(-2147483648) = -2147483648 in a 32-bit machine?

I think the question is self-explanatory; I guess it probably has something to do with overflow, but still I do not quite get it. What is happening, bitwise, under the hood?
Why does -(-2147483648) = -2147483648 (at least while compiling in C)?
Negating an (unsuffixed) integer constant:
The expression -(-2147483648) is perfectly defined in C, however it may be not obvious why it is this way.
When you write -2147483648, it is formed as the unary minus operator applied to an integer constant. If 2147483648 can't be expressed as int, then it is represented as long or long long* (whichever fits first), where the latter type is guaranteed by the C Standard to cover that value†.
To confirm that, you could examine it by:
printf("%zu\n", sizeof(-2147483648));
which yields 8 on my machine.
The next step is to apply the second - operator, in which case the final value is 2147483648L (assuming that it was eventually represented as long). If you try to assign it to an int object, as follows:
int n = -(-2147483648);
then the actual behavior is implementation-defined. Referring to the Standard:
C11 §6.3.1.3/3 Signed and unsigned integers
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
The most common way is to simply cut-off the higher bits. For instance, GCC documents it as:
For conversion to a type of width N, the value is reduced modulo 2^N
to be within range of the type; no signal is raised.
Conceptually, the conversion to a type of width 32 can be illustrated by a bitwise AND operation:
value & (2^32 - 1) // preserve the 32 least significant bits
In accordance with two's complement arithmetic, the value of n is formed of all zeros with the MSB (sign) bit set, which represents the value -2^31, that is -2147483648.
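A small sketch tying the steps together (the printed values assume an implementation, like GCC on common platforms, that reduces the converted value modulo 2^32):

#include <stdio.h>
#include <limits.h>

int main(void) {
    long long v = -(-2147483648);  /* +2147483648; the inner constant is wider than int */
    int n = (int)v;                /* implementation-defined conversion */
    printf("%d\n", n);             /* typically prints -2147483648 */
    printf("%d\n", n == INT_MIN);  /* typically prints 1 */
    return 0;
}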
Negating an int object:
If you try to negate an int object that holds the value -2147483648, then, assuming a two's complement machine, the program will exhibit undefined behavior:
n = -n; // UB if n == INT_MIN and INT_MAX == 2147483647
C11 §6.5/5 Expressions
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
Additional references:
INT32-C. Ensure that operations on signed integers do not result in overflow
*) In the withdrawn C90 Standard, there was no long long type and the rules were different. Specifically, the sequence for an unsuffixed decimal constant was int, long int, unsigned long int (C90 §6.1.3.2 Integer constants).
†) This is due to LLONG_MAX, which must be at least +9223372036854775807 (C11 §5.2.4.2.1/1).
Note: this answer does not apply as such on the obsolete ISO C90 standard that is still used by many compilers
First of all, on C99, C11, the expression -(-2147483648) == -2147483648 is in fact false:
int is_it_true = (-(-2147483648) == -2147483648);
printf("%d\n", is_it_true);
prints
0
So how is it possible that this evaluates to true?
The machine is using 32-bit two's complement integers. 2147483648 is an integer constant that doesn't quite fit in 32 bits, thus it will be either long int or long long int, depending on whichever is the first where it fits. This negated will result in -2147483648 - and again, even though the number -2147483648 can fit in a 32-bit integer, the expression -2147483648 consists of a >32-bit positive integer preceded by unary -!
You can try the following program:
#include <stdio.h>
int main(void) {
    printf("%zu\n", sizeof(2147483647));
    printf("%zu\n", sizeof(2147483648));
    printf("%zu\n", sizeof(-2147483648));
}
The output on such machine most probably would be 4, 8 and 8.
Now, -2147483648 negated will again result in +2147483648, which is still of type long int or long long int, and everything is fine.
In C99, C11, the integer constant expression -(-2147483648) is well-defined on all conforming implementations.
Now, when this value is assigned to a variable of type int, with 32 bits and two's complement representation, the value is not representable in it - the values on 32-bit 2's complement would range from -2147483648 to 2147483647.
The C11 standard 6.3.1.3p3 says the following of integer conversions:
[When] the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
That is, the C standard doesn't actually define what the value would be in this case, nor does it preclude the possibility that the execution of the program stops due to a signal being raised; it leaves it to the implementations (i.e. compilers) to decide how to handle it (C11 3.4.1):
implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
and (3.19.1):
implementation-defined value
unspecified value where each implementation documents how the choice is made
In your case, the implementation-defined behaviour is that the value is the 32 lowest-order bits [*]. Due to the 2's complement, the (long) long int value 0x80000000 has the bit 31 set and all other bits cleared. In 32-bit two's complement integers the bit 31 is the sign bit - meaning that the number is negative; all value bits zeroed means that the value is the minimum representable number, i.e. INT_MIN.
[*] GCC documents its implementation-defined behaviour in this case as follows:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
This is not a C question, for on a C implementation featuring 32-bit two's complement representation for type int, the effect of applying the unary negation operator to an int having the value -2147483648 is undefined. That is, the C language specifically disavows designating the result of evaluating such an operation.
Consider more generally, however, how the unary - operator is defined in two's complement arithmetic: the inverse of a positive number x is formed by flipping all the bits of its binary representation and adding 1. This same definition serves as well for any negative number that has at least one bit other than its sign bit set.
Minor problems arise, however, for the two numbers that have no value bits set: 0, which has no bits set at all, and the number that has only its sign bit set (-2147483648 in 32-bit representation). When you flip all the bits of either of these, you end up with all value bits set. Therefore, when you subsequently add 1, the result overflows the value bits. If you imagine performing the addition as if the number were unsigned, treating the sign bit as a value bit, then you get
-2147483648 (decimal representation)
--> 0x80000000 (convert to hex)
--> 0x7fffffff (flip bits)
--> 0x80000000 (add one)
--> -2147483648 (convert to decimal)
The same applies to negating zero, but in that case the overflow upon adding 1 overflows the erstwhile sign bit, too. If the overflow is ignored, the resulting 32 low-order bits are all zero, hence -0 == 0.
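To watch this happen without invoking signed overflow, the flip-and-add-one can be carried out in unsigned arithmetic, where wraparound is well defined (a sketch assuming a 32-bit unsigned int):

#include <stdio.h>

int main(void) {
    unsigned int x = 0x80000000u;  /* the bit pattern of -2147483648 */
    unsigned int neg = ~x + 1u;    /* flip all the bits, then add one */
    printf("0x%08X\n", neg);       /* prints 0x80000000: the same pattern */
    printf("0x%08X\n", ~0u + 1u);  /* prints 0x00000000: hence -0 == 0 */
    return 0;
}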
I'm gonna use a 4-bit number, just to keep the maths simple, but the idea is the same.
In a 4-bit number, the possible values are between 0000 and 1111. That would be 0 to 15, but if you want to represent negative numbers, the first bit is used to indicate the sign (0 for positive and 1 for negative).
So 1111 is not 15. As the first bit is 1, it's a negative number. To know its value, we use the two's complement method already described in previous answers: "invert the bits and add 1":
inverting the bits: 0000
adding 1: 0001
0001 in binary is 1 in decimal, so 1111 is -1.
The two's complement method goes both ways, so if you use it with any number, it will give you the binary representation of that number with the inverted sign.
Now let's look at 1000. The first bit is 1, so it's a negative number. Using the two's complement method:
invert the bits : 0111
add 1: 1000 (8 in decimal)
So 1000 is -8. If we do -(-8), in binary it means -(1000), which actually means applying the two's complement method to 1000. As we saw above, the result is also 1000.
So, in a 4-bit number, -(-8) equals -8.
In a 32-bit number, -2147483648 in binary is a 1 followed by 31 zeroes, but if you use the two's complement method, you'll end up with the same value (the result is the same number).
That's why, in a 32-bit number, -(-2147483648) equals -2147483648.
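The 4-bit walkthrough can be simulated in C by masking to the low 4 bits (a sketch; the & 0xF mask stands in for the 4-bit register):

#include <stdio.h>

int main(void) {
    unsigned int x = 0x8u;                /* pattern 1000, i.e. -8 in 4 bits */
    unsigned int neg = (~x + 1u) & 0xFu;  /* invert the bits, add 1, keep 4 bits */
    printf("0x%X\n", neg);                /* prints 0x8: -(-8) is -8 in 4 bits */
    return 0;
}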
It depends on the version of C, the specifics of the implementation, and whether we are talking about variables or literal values.
The first thing to understand is that there are no negative integer literals in C: "-2147483648" is a unary minus operation applied to a positive integer literal.
Let's assume that we are running on a typical 32-bit platform where int and long are both 32 bits and long long is 64 bits, and consider the expression
(-(-2147483648) == -2147483648 )
The compiler needs to find a type that can hold 2147483648; a conforming C99 compiler will use type long long, but a C90 compiler can use type unsigned long.
If the compiler uses type long long, then nothing overflows and the comparison is false. If the compiler uses unsigned long, then the unsigned wraparound rules come into play and the comparison is true.
For the same reason that winding a tape deck counter 500 steps forward from 000 (through 001 002 003 ...) will show 500, and winding it 500 steps backward from 000 (through 999 998 997 ...) will also show 500.
This is two's complement notation. Of course, since 2's complement sign convention is to consider the topmost bit the sign bit, the result overflows the representable range, just like 2000000000+2000000000 overflows the representable range.
As a result, the processor's "overflow" bit will be set (seeing this requires access to the machine's arithmetic flags, generally not the case in most programming languages outside of assembler). This is the only value which will set the "overflow" bit when negating a 2's complement number: any other value's negation lies in the range representable by 2's complement.
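Since the overflow flag isn't visible from C, defensive code on a two's complement machine simply compares against the one problematic value first; a sketch (the helper name safe_negate is mine):

#include <limits.h>

/* Returns 0 and stores the negation in *out, or -1 if negating would overflow. */
static int safe_negate(int v, int *out) {
    if (v == INT_MIN)  /* the single int whose negation is not representable */
        return -1;
    *out = -v;
    return 0;
}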

Any reason to assign -0?

I am going through some old C code using Lint which stumbled upon this line:
int16_t max = -0;
The Lint message is that the "Constant expression evaluates to 0 in operation '-'".
Is there any reason why someone would use -0?
In the C specification (6.2.6.2 Integer types), it states the following (emphasis mine):
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value -(2^M) (two's complement);
— the sign bit has the value -(2^M - 1) (one's complement).
Which of these applies is implementation-defined, as is whether the
value with sign bit 1 and all value bits zero (for the first two), or
with sign bit and all value bits 1 (for one’s complement), is a trap
representation or a normal value. In the case of sign and magnitude
and one’s complement, if this representation is a normal value it is
called a negative zero.
In other words, C supports three different representations for signed integers and two of them have the concept of signed zero, which makes a distinction between a positive and a negative zero.
So, my explanation is that perhaps the author of your code snippet was trying to produce a negative zero value. But, as pointed out in Jens Gustedt's answer, this expression cannot actually produce a negative zero, which means the author may have made a wrong assumption there.
No, I can't see any reason for this. Others have mentioned that it is possible to have platforms with "negative zero", but such a negative zero can never be produced by this expression, so this is useless.
The corresponding paragraph in the C standard is 6.2.6.2 p3, emphasis is mine:
If the implementation supports negative zeros, they shall be generated only by:
— the &, |, ^, ~, <<, and >> operators with operands that produce such a value;
— the +, -, *, /, and % operators where one operand is a negative zero and the result is zero;
— compound assignment operators based on the above cases.
To produce a negative zero on such a platform you could use ~INT_MAX, for example, but that would not be a zero for other representations, so the code wouldn't be very portable.
That is for an architecture whose CPU uses one's complement numbers.
One's complement is a way of representing negative numbers that has a -0 (negative zero is still in use in floating point).
For unsigned numbers, the maximal value is indeed the all-1s pattern, which is one's complement -0.
We are accustomed to two's complement numbers, which have one more negative number. Though one's complement has some troubles around zero, two's complement has the same around INT_MIN: -INT_MIN == INT_MIN.
The code above probably intended unsigned numbers:
uint16_t max = (uint16_t) -1;
No reason for it with C99 or later.
int16_t is an exact-width integer type.
The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits. C11dr §7.20.1.1 1
There is no signed zero with two’s complement. So code is equivalent to
int16_t max = 0;
IMO, the hoped-for result of int16_t max = -0; on a non-2's-complement platform was to initialize max to -0, to flag an array of length 0 or one that only contained elements with the value -0.

Bitwise operations on negative numbers

I was reading a book on C, where in some section it says: " The bitwise operations are typically used with unsigned types.".
Question: why?
Simply because it is not immediately clear what bit operations on the sign bit of a signed number should mean.
Unsigned types have no special bits; everything works straightforwardly.
Signed types have a special sign bit and can interpret it with three different encodings to represent negative values (one's and two's complement, or sign and magnitude).
From the programming language's point of view, unsigned does not merely mean that the value cannot be negative. It means that the first bit of the number is not used for determining whether the value is negative or positive.
So, for an 8-bit value the bitwise operators consider all 8 bits, thus working on the range 0..255 (while the arithmetic operators can treat the first bit as the sign indicator, thus working on a range of -128 to +127).
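A short sketch of the same 8-bit pattern seen through both lenses (the signed result assumes the common two's complement conversion behavior):

#include <stdio.h>

int main(void) {
    unsigned char u = 0xFF;             /* all 8 bits are value bits: 255 */
    signed char s = (signed char)0xFF;  /* implementation-defined; -1 on two's complement */
    printf("%d %d\n", u, s);            /* typically prints: 255 -1 */
    return 0;
}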
Unsigned operands do not have any special bits for sign representation.
For signed operands, the special bit for sign representation may interfere with the operations we want to perform.
Bitwise operations are typically used with unsigned types or signed types with non-negative values.
The C Standard supports 3 different representations for negative values: applying bitwise operators to such values has at best implementation-defined behavior.
For example -1 is represented:
as |1000|0000|0000|0001| on a 16-bit system with sign magnitude.
as |1111|1111|1111|1110| on a 16-bit system with one's complement.
and more commonly as |1111|1111|1111|1111| on a 16-bit system with two's complement.
As a consequence, -1 & 3 can have 3 different values:
1 on sign-magnitude systems,
2 on one's complement systems,
3 on two's complement systems.
Other operations may behave in even more unexpected ways on negative numbers, such as << and >>, especially on the DS9K.
Conversely, the bitwise operations are fully defined on unsigned types, which are mandated to have a pure binary representation (with some restrictions on the right operand of the shift operators).
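For instance, a tiny sketch contrasting the two (the signed result matches the two's complement row above):

#include <stdio.h>

int main(void) {
    printf("%d\n", -1 & 3);    /* prints 3 on two's complement systems */
    printf("%u\n", -1u & 3u);  /* prints 3 everywhere: unsigned is fully defined */
    return 0;
}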

Why does right shifting negative numbers in C bring 1 on the left-most bits? [duplicate]

This question already has answers here:
Are the shift operators (<<, >>) arithmetic or logical in C?
(11 answers)
Closed 7 years ago.
The book "C The Complete Reference" by Herbert Schildt says that "(In the case of a signed, negative integer, a right shift will cause a 1 to be brought in so that the sign bit is preserved.)"
What's the point of preserving the sign bit?
Moreover, I think that the book is referring to the case when negative numbers are represented using a sign bit and not using two's complement. But still even in that case the reasoning doesn't seem to make any sense.
The Schildt book is widely acknowledged to be exceptionally poor.
In fact, C doesn't guarantee that a 1 will be shifted in when you right-shift a negative signed number; the result of right-shifting a negative value is implementation-defined.
However, if a right shift of a negative number is defined to shift 1s into the highest bit positions, then on a two's complement representation it will behave as an arithmetic shift - the result of right-shifting by N will be the same as dividing by 2^N, rounding toward negative infinity.
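A quick sketch of that rounding difference (assuming an implementation whose signed right shift is arithmetic):

#include <stdio.h>

int main(void) {
    printf("%d\n", -7 >> 1);  /* typically -4: rounds toward negative infinity */
    printf("%d\n", -7 / 2);   /* always -3: C99/C11 division truncates toward zero */
    return 0;
}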
The statement is sweeping and inaccurate, like many a statement by Mr Schildt. Many people recommend throwing his books away. (Amongst other places, see The Annotated Annotated C Standard, and ACCU Reviews — do an author search on Schildt; see also the Definitive List of C Books on Stack Overflow).
It is implementation-defined whether right-shifting a negative (necessarily signed) integer shifts zeros or ones into the high-order bits. The underlying CPUs (for instance, ARM) often have two different instructions — ASR, or arithmetic shift right, and LSR, or logical shift right — of which ASR preserves the sign bit and LSR does not. The compiler writer is allowed to choose either, and may do so for reasons of compatibility, speed or whimsy.
ISO/IEC 9899:2011 §6.5.7 Bitwise shift operators
¶5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type
or if E1 has a signed type and a nonnegative value, the value of the result is the integral
part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
The point is that the C >> (right shift) operator preserves[1] the sign for a (signed) int.
For example:
#include <stdio.h>

int main(void) {
    int a;
    unsigned int b;

    a = -8;
    printf("%d (0x%X) >> 1 = %d (0x%X)\n", a, a, a >> 1, a >> 1);

    b = 0xFFEEDDCC;
    printf("%d (0x%X) >> 1 = %d (0x%X)\n", b, b, b >> 1, b >> 1);

    return 0;
}
Output:
-8 (0xFFFFFFF8) >> 1 = -4 (0xFFFFFFFC) [sign preserved, LSB=1]
-1122868 (0xFFEEDDCC) >> 1 = 2146922214 (0x7FF76EE6) [MSB = 0]
If it didn't preserve the sign, the result would make absolutely no sense. You would take a small negative number, and by shifting right one (dividing by two), you would end up with a large positive number instead.
[1] This is implementation-defined, but in my experience most compilers choose an arithmetic (sign-preserving) shift instruction.
In the case of a signed, negative integer, a right shift will cause a 1 to be brought in so that the sign bit is preserved
Not necessarily. See the C standard C11 6.5.7:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a nonnegative value,
the value of the result is the integral part of the quotient of E1 /
2^E2. If E1 has a signed type and a negative value, the resulting value
is implementation-defined.
This means that the compiler is free to shift in whatever it likes (0 or 1), as long as it documents it.

Bits representation of negative numbers

This is a doubt regarding the representation of bits of signed integers. For example, when you want to represent -1, it is equivalent to the 2's complement of (+1). So -1 is represented as 0xFFFFFFFF. Now when I right-shift my number by 31 and print the result, it comes back as -1.
signed int a = -1;
printf("The number is %d\n", (a >> 31)); // this prints -1
So can anyone please explain to me how the bits are represented for negative numbers?
Thanks.
When the top bit is zero, the number is positive. When it's 1, the number is negative.
Negative numbers shifted right keep shifting a "1" in as the topmost bit to keep the number negative. That's why you're getting that answer.
For more about two's complement, see this Stackoverflow question.
@Stobor points out that some C implementations could shift 0 into the high bit instead of 1. [Verified in Wikipedia.] In Java it's dependably an arithmetic shift.
But the output given by the questioner shows that his compiler is doing an arithmetic shift.
The C standard leaves it implementation-defined whether the right shift of a negative (necessarily signed) integer shifts zeroes (logical shift right) or sign bits (arithmetic shift right) into the most significant bit. It is up to the implementation to choose.
Consequently, portable code ensures that it does not perform right shifts on negative numbers. Either it converts the value to the corresponding unsigned value before shifting (which is guaranteed to use a logical shift right, putting zeroes into the vacated bits), or it ensures that the value is positive, or it tolerates the variation in the output.
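As a sketch of the conversion approach, an arithmetic right shift can be built portably on top of unsigned shifts (the helper name asr is mine; n must be less than the width of int):

/* Arithmetic right shift that rounds toward negative infinity, without
   relying on the implementation-defined behavior of x >> n for x < 0. */
static int asr(int x, unsigned n) {
    if (x >= 0)
        return x >> n;  /* well defined for nonnegative values */
    /* For x < 0: ~(unsigned)x equals -x - 1, which fits in int after the shift. */
    return -(int)(~(unsigned)x >> n) - 1;
}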
This is an arithmetic shift operation, which preserves the sign bit and shifts the value bits of a signed number.
cheers
Basically there are two types of right shift: an unsigned (logical) right shift and a signed (arithmetic) right shift. An unsigned right shift shifts the bits to the right, causing the least significant bit to be lost and the most significant bit to be replaced with a 0. With a signed right shift, the bits are shifted to the right, causing the least significant bit to be lost and the most significant bit to be preserved. A signed right shift divides the number by a power of two (corresponding to the number of places shifted), rounding toward negative infinity, whereas an unsigned shift is a logical shifting operation.
The >> operator performs an unsigned right shift when the data type on which it operates is unsigned, and typically a signed right shift when the data type is signed (for negative signed values this is implementation-defined). So, what you need to do is cast the object to an unsigned integer type before performing the bit manipulation to get the desired result.
Have a look at two's complement description. It should help.
EDIT: When the below was written, the code in the question was written as:
unsigned int a = -1;
printf("The number is %d\n", (a >> 31)); // this prints as -1
If unsigned int is at least 32 bits wide, then your compiler isn't really allowed to produce -1 as the output of that (with the small caveat that you should be casting the unsigned value to int before you pass it to printf).
Because a is an unsigned int, assigning -1 to it must give it the value UINT_MAX (as the smallest non-negative value congruent to -1 modulo UINT_MAX+1). As long as unsigned int has at least 32 bits on your platform, the result of shifting that unsigned quantity right by 31 will be UINT_MAX divided by 2^31, which has to fit within int. (If unsigned int is narrower than 32 bits, shifting by 31 is undefined, so the program could produce anything.)
