Extracting the sign bit with shift - c

Is it always defined behavior to extract the sign of a 32 bit integer this way:
#include <stdint.h>
int get_sign(int32_t x) {
return (x & 0x80000000) >> 31;
}
Do I always get a result of 0 or 1?

No, it is incorrect to do this because right shifting a signed integer with a negative value is implementation-defined, as specified in the C Standard:
6.5.7 Bitwise shift operators
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
You should cast x as (uint32_t) before masking and shifting.
EDIT: Wrong answer! I shall keep this answer here as an example of good looking, intuitive but incorrect reasoning. As explained in the other answers, there is not right shifting of a negative value in the code posted. The type of x & 0x80000000 is one of the signed integer or unsigned integer types depending on the implementation characteristics, but its value is always positive, either 0 or 2147483648. Right shifting this value is not implementation-defined, the result is always either 0 or 1. Whether the result is the value of the sign bit is less obvious: it is the value of the sign bit except for some very contorted corner cases, hybrid architectures quite unlikely to exist and probably non standard conforming anyway.

Since the answer assumes that fixed width types are available, therefore a negative zero doesn't exists1, the only correct way of extracting the sign bit is to simply check if the value is negative:
_Bool Sign( const int32_t a )
{
return a < 0 ;
}
1 Fixed width types require two's complement representation, which doesn't have a negative zero.

Yes it is correct on 1s and 2s complement architectures, but for subtile reasons:
for the overwhelmingly common hardware where int is the same type as int32_t and unsigned the same as uint32_t, the constant literal 0x80000000 has type unsigned int. The left operand of the & operation is converted to unsigned int and the result of the & has the same type. The right shift is applied to an unsigned int, the value is either 0 or 1, no implementation-defined behavior.
On other platforms, 0x80000000 may have a different type and the behavior might be implementation defined:
0x80000000 can be of type int, if the int type has more than 31 value bits. In this case, x is promoted to int, and its value is unchanged.
If int uses 1s complement or 2s complement representation, the sign bit is replicated into the more significant bits. The mask operation evaluates to an int with value 0 or 0x80000000. Right shifting it by 31 positions evaluates to 0 and 1 respectively, no implementation-defined behavior either.
Conversely, if int uses sign/magnitude representation, preserving the value of x will effectively reset its 31st bit, moving the sign bit beyond the value bits. The mask operation will evaluate to 0 and the result will be incorrect.
0x80000000 can be of type long, if the int type has fewer than 31 value bits or if INT_MIN == -INT_MAX and long has more that 31 value bits. In this case, x is converted to long, and its value is unchanged, with the same consequences as for the int case. For 1s or 2s complement representation of long, the mask operation evaluates to a positive long value of either 0 or 0x80000000 and right shifting it by 31 places is defined and gives either 0 or 1, for sign/magnitude, the result should be 0 in all cases.
0x80000000 can be of type unsigned long, if the int type has fewer than 31 value bits and long has 31 value bits and uses 2s complement representation. In this case, x is converted to unsigned long keeping the sign bit intact. The mask operation evaluates to an unsigned long value of either 0 or 0x80000000 and right shifting it by 31 places is defined and gives either 0 or 1.
lastly, 0x80000000 can be of type long long, if both the int type has fewer than 31 value bits or INT_MIN == -INT_MAX and long has 31 value bits but does not use 2s complement representation. In this case, x is converted to long long, keeping its value, with the same consequences as for the int case if long long representation is sign/magnitude.
This question was purposely contrived. The answer is you get the correct result so long as the platform does not use sign/magnitude representation. But the C Standard insists on supporting integer representations other than 2s complement, with very subtile consequences.
EDIT: Careful reading of section 6.2.6.2 Integer types of the C Standard seems to exclude the possibility for different representations of signed integer types to coexist in the same implementation. This makes the code fully defined as posted, since the very presence of type int32_t implies 2s complement representation for all signed integer types.

Do I always get a result of 0 or 1?
Yes.
Simple answer:
0x80000000 >> 31 is always 1.
0x00000000 >> 31 is always 0.
See below.
[Edit]
Is it always defined behavior to extract the sign of a 32 bit integer this way
Yes, except for a corner case.
Should 0x80000000 implement as a int/long (this implies the type > 32 bit) and that signed integer type is signed-magnitude (or maybe one's complement) on a novel machine, then the conversion of int32_t x to that int/long would move the sign bit to a new bit location, rendering the & 0x80000000 moot.
The question is open if C supports int32_t (which must be 2's complement) and any of int/long/long long as non-2's complement.
0x80000000 is a hexadecimal constant. "The type of an integer constant is the first of the corresponding list in which its value can be represented" C11 §6.4.4.1 5: Octal or Hexadecimal Constant: int, unsigned, long or unsigned long.... Regardless of its type, it will have a value of +2,147,483,648.
The type of x & 0x80000000 will be the wider of the types of int32_t and the type of 0x80000000. If the 2 types are the same width and differ in sign-ness, it will be the unsigned one. INT32_MAX is +2,147,483,647 and less than +2,147,483,648, thus 0x80000000 must be a wider type (or same and unsigned) than int32_t. So regardless of what type 0x80000000, x & 0x80000000 will be the same type.
It makes no difference how int nor long are implemented as 2's complement or not.
The & operation does not change the sign of the value of 0x80000000 as either it is an unsigned integer type or the sign bit is in a more significant position. x & 0x80000000 then has the value of +2,147,483,648 or 0.
Right shift of a positive number is well defined regardless of integer type. Right shift of negative values are implementation defined. See C11 §6.5.7 5. x & 0x80000000 is never a negative number.
Thus (x & 0x80000000) >> 31 is well defined and either 0 or 1.
return x < 0; (which does not "Extracting the sign bit with shift" per post title) is understandable and is certainly the preferred code for most instances I can think of. Either approach may not make any executable code difference.

Whether this expression has precisely defined semantics or not, it is not the most readable way to get the sign bit. Here is simpler alternative:
int get_sign(int32_t x) {
return x < 0;
}
As correctly pointed out by 2501, int32_t is defined to have 2s complement representation, so comparing to 0 has the same semantics as extracting the most significant bit.
Incidentally, both functions compile to the same exact code with gcc 5.3:
get_sign(int):
movl %edi, %eax
shrl $31, %eax
ret

Related

Bit-shifting unsigned longs in C

I found a bug in a piece of code I wrote, and have fixed it, but still can't explain what was happening. It boils down to this:
unsigned i = 1<<31; // gives 21476483648 as expected
unsigned long l = 1<<31; // gives 18446744071562067968 as not expected
I'm aware of a question here: Unsigned long and bit shifting wherein the exact same number shows up as an unexpected value, but there he was using a signed char which I believe led to a sign extension. I really can't for the life of me see why I'm getting an incorrect value here.
I'm using CLion on Ubuntu 18.04, and on my system an unsigned is 32 bits and a long is 64 bits.
In this expression:
1<<31
The value 1 has type int. Assuming an int is 32 bits wide, that means you're shifting a bit into the sign bit. Doing so is undefined behavior.
This is documented in section 6.5.7p4 of the C standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the
value of the result is E1×2E2, reduced modulo one more than the
maximum value representable in the result type. If E1 has a
signed type and nonnegative value, and E1×2E2 is representable in
the result type, then that is the resulting value; otherwise, the
behavior is undefined.
However, since you're on Ubuntu, which used GCC, the behavior is actually implementation defined. The gcc documentation states:
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed >> acts on
negative numbers by sign extension.
As an extension to the C language, GCC does not use the latitude given
in C99 and C11 only to treat certain aspects of signed << as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
So gcc in this case works directly on the representation of the values. This means that 1<<31 has type int and the representation 0x80000000. The value of this representation in decimal is ‭-2147483648‬.
When this value is assigned to an unsigned int, it is converted via the rules in section 6.3.1.3p2:
Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than the
maximum value that can be represented in the new type until the
value is in the range of the new type.
Since "one more than the maximum value" is ‭42949672956 for a 32 bit unsigned int This results in the int value -2147483648‬ being converted to the unsigned int value ‭42949672956 -2147483648 == 2147483648‬.
When 1<<31 is assigned to an unsigned long int which is 64 bit, "one more than the maximum value" is 18446744073709551616 so the result of the conversion is 18446744073709551616 -2147483648 == 18446744071562067968, which is the value you're getting.
To get the correct value, use the UL suffix to make the value unsigned long:
1UL<<31

Casting signed to unsigned and vise versa while widening the byte count

uint32_t a = -1; // 11111111111111111111111111111111
int64_t b = (int64_t) a; // 0000000000000000000000000000000011111111111111111111111111111111
int32_t c = -1; // 11111111111111111111111111111111
int64_t d = (int64_t) c; // 1111111111111111111111111111111111111111111111111111111111111111
From the observation above, it appears that only the original value's sign matters.
I.e if the original 32 bit number is unsigned, casting it to a 64 bit value will add 0's to its left regardless of the destination value being signed or unsigned and;
if the original 32 bit number is signed and negative, casting it to a 64 bit value will add 1's to its left regardless of the destination value being signed or unsigned.
Is the above statement correct?
Correct, it's the source operand that dictates this.
uint32_t a = -1;
int64_t b = (int64_t) a;
No sign extension happens here because the source value is an unsigned uint32_t. The basic idea of sign extension is to ensure the wider variable has the same value (including sign). Coming from an unsigned integer type, the value is positive, always. This is covered by the standards snippet /1 below.
Negative sign extension (in the sense that the top 1-bit in a two's complement value is copied to all the higher bits in the wider type(a)) only happens when a signed type is extended in width, since only signed types can be negative.
If the original 32 bit number is signed and negative, casting it to a 64 bit value will add 1's to its left regardless of the destination value being signed or unsigned.
This is covered by the standards snippet /2 below. You still have to maintain the sign of the value when extending the bits but pushing a negative value (assuming the source was negative) into an unsigned variable will simply mathematically add the MAX_VAL + 1 to the value until it is within the range of the target type (in reality, for two's complement, no adding is done, it just interprets the same bit pattern in a different way).
Both these scenarios are covered in the standard, in this case C11 6.3.1.3 Signed and unsigned integers /1 and /2:
1/ When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2/ Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3/ Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Note that your widening conversions are covered by the first two points above. I've included the third point for completion as it covers things like conversion from uint32_t to int32_t, or unsigned int to long where they have the same width (they both have a minimum range but there's no requirement that unsigned int be "thinner" than long).
(a) This may be different in ones' complement or sign-magnitude representations but, since they're in the process of being removed, nobody really cares that much.
See:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html (WG21, C++); and
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm (WG14, C)
for more detail.
In any case, the fixed width types are two's complement so you don't have to worry about this aspect for your example code.

can't shift negative numbers to the right in c

I am going through 'The C language by K&R'. Right now I am doing the bitwise section. I am having a hard time in understanding the following code.
int mask = ~0 >> n;
I was playing on using this to mask n left side of another binary like this.
0000 1111
1010 0101 // random number
My problem is that when I print var mask it still negative -1. Assuming n is 4. I thought shifting ~0 which is -1 will be 15 (0000 1111).
thanks for the answers
Performing a right shift on a negative value yields an implementation defined value. Most hosted implementations will shift in 1 bits on the left, as you've seen in your case, however that doesn't necessarily have to be the case.
Unsigned types as well as positive values of signed types always shift in 0 bits on the left when shifting right. So you can get the desired behavior by using unsigned values:
unsigned int mask = ~0u >> n;
This behavior is documented in section 6.5.7 of the C standard:
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative
value, the value of the result is the integral part of the quotient
of E1 / 2E2 .If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
Right-shifting negative signed integers is an implementation-defined behavior, which is usually (but not always) filling the left with ones instead of zeros. That's why no matter how many bits you've shifted, it's always -1, as the left is always filled by ones.
When you shift unsigned integers, the left will always be filled by zeros. So you can do this:
unsigned int mask = ~0U >> n;
^
You should also note that int is typically 2 or 4 bytes, meaning if you want to get 15, you need to right-shift 12 or 28 bits instead of only 4. You can use a char instead:
unsigned char mask = ~0U;
mask >>= 4;
In C, and many other languages, >> is (usually) an arithmetic right shift when performed on signed variables (like int). This means that the new bit shifted in from the left is a copy of the previous most-significant bit (MSB). This has the effect of preserving the sign of a two's compliment negative number (and in this case the value).
This is in contrast to a logical right shift, where the MSB is always replaced with a zero bit. This is applied when your variable is unsigned (e.g. unsigned int).
From Wikipeda:
The >> operator in C and C++ is not necessarily an arithmetic shift. Usually it is only an arithmetic shift if used with a signed integer type on its left-hand side. If it is used on an unsigned integer type instead, it will be a logical shift.
In your case, if you plan to be working at a bit level (i.e. using masks, etc.) I would strongly recommend two things:
Use unsigned values.
Use types with specific sizes from <stdint.h> like uint32_t

short int Integer wrap around / short int inversion in c not understood, difference between assignment and prints

The following code fragment
short int k = -32768;
printf("%d \n", -k);
k=-k;
printf("%d \n", k);
prints
32768
-32768
I would assume that both prints are equal.
Can somebody explain what the difference is and why the assignment k=-k causes a wrap around? It was hard to find a explanation online, as i don't really know what to google.
Well, to print a short, you need to use a length modifier.
Change the format string to %hd.
Because SHRT_MIN = -32768 and SHRT_MAX = +32767. Check out how 2's complement work.
Edit
Actually, according to the international standard of the C programming language (i.e. Committee Draft — May 6, 2005, pp. 50-51) regarding signed integers:
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit.
There need not be any padding bits; there shall be exactly one sign bit.
Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M <= N).
If the sign bit is zero, it shall not affect the resulting value.
If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value −(2N ) (two's complement);
the sign bit has the value −(2N − 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
So, in other words, in your case it's seems that 2's complement is being used, but this should not be assumed to be in all platforms and compilers.
-32768 is 8000 in hexadecimal notation. 32768 can not be represented as a signed 16 bit number, therefore the compiler promotes -k to an int with the hex value 00008000. When this number is assigned to the 16-bit variable, the high bits are truncated, resulting in 8000 or -32768 again.

Why is -(-2147483648) = - 2147483648 in a 32-bit machine?

I think the question is self explanatory, I guess it probably has something to do with overflow but still I do not quite get it. What is happening, bitwise, under the hood?
Why does -(-2147483648) = -2147483648 (at least while compiling in C)?
Negating an (unsuffixed) integer constant:
The expression -(-2147483648) is perfectly defined in C, however it may be not obvious why it is this way.
When you write -2147483648, it is formed as unary minus operator applied to integer constant. If 2147483648 can't be expressed as int, then it s is represented as long or long long* (whichever fits first), where the latter type is guaranteed by the C Standard to cover that value†.
To confirm that, you could examine it by:
printf("%zu\n", sizeof(-2147483648));
which yields 8 on my machine.
The next step is to apply second - operator, in which case the final value is 2147483648L (assuming that it was eventually represented as long). If you try to assign it to int object, as follows:
int n = -(-2147483648);
then the actual behavior is implementation-defined. Referring to the Standard:
C11 §6.3.1.3/3 Signed and unsigned integers
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
The most common way is to simply cut-off the higher bits. For instance, GCC documents it as:
For conversion to a type of width N, the value is reduced modulo 2^N
to be within range of the type; no signal is raised.
Conceptually, the conversion to type of width 32 can be illustrated by bitwise AND operation:
value & (2^32 - 1) // preserve 32 least significant bits
In accordance with two's complement arithmetic, the value of n is formed with all zeros and MSB (sign) bit set, which represents value of -2^31, that is -2147483648.
Negating an int object:
If you try to negate int object, that holds value of -2147483648, then assuming two's complement machine, the program will exhibit undefined behavior:
n = -n; // UB if n == INT_MIN and INT_MAX == 2147483647
C11 §6.5/5 Expressions
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
Additional references:
INT32-C. Ensure that operations on signed integers do not result in overflow
*) In withdrawed C90 Standard, there was no long long type and the rules were different. Specifically, sequence for unsuffixed decimal was int, long int, unsigned long int (C90 §6.1.3.2 Integer constants).
†) This is due to LLONG_MAX, which must be at least +9223372036854775807 (C11 §5.2.4.2.1/1).
Note: this answer does not apply as such on the obsolete ISO C90 standard that is still used by many compilers
First of all, on C99, C11, the expression -(-2147483648) == -2147483648 is in fact false:
int is_it_true = (-(-2147483648) == -2147483648);
printf("%d\n", is_it_true);
prints
0
So how it is possible that this evaluates to true?
The machine is using 32-bit two's complement integers. The 2147483648 is an integer constant that quite doesn't fit in 32 bits, thus it will be either long int or long long int depending on whichever is the first where it fits. This negated will result in -2147483648 - and again, even though the number -2147483648 can fit in a 32-bit integer, the expression -2147483648 consists of a >32-bit positive integer preceded with unary -!
You can try the following program:
#include <stdio.h>
int main() {
printf("%zu\n", sizeof(2147483647));
printf("%zu\n", sizeof(2147483648));
printf("%zu\n", sizeof(-2147483648));
}
The output on such machine most probably would be 4, 8 and 8.
Now, -2147483648 negated will again result in +214783648, which is still of type long int or long long int, and everything is fine.
In C99, C11, the integer constant expression -(-2147483648) is well-defined on all conforming implementations.
Now, when this value is assigned to a variable of type int, with 32 bits and two's complement representation, the value is not representable in it - the values on 32-bit 2's complement would range from -2147483648 to 2147483647.
The C11 standard 6.3.1.3p3 says the following of integer conversions:
[When] the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
That is, the C standard doesn't actually define what the value in this case would be, or doesn't preclude the possibility that the execution of the program stops due to a signal being raised, but leaves it to the implementations (i.e. compilers) to decide how to handle it (C11 3.4.1):
implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
and (3.19.1):
implementation-defined value
unspecified value where each implementation documents how the choice is made
In your case, the implementation-defined behaviour is that the value is the 32 lowest-order bits [*]. Due to the 2's complement, the (long) long int value 0x80000000 has the bit 31 set and all other bits cleared. In 32-bit two's complement integers the bit 31 is the sign bit - meaning that the number is negative; all value bits zeroed means that the value is the minimum representable number, i.e. INT_MIN.
[*] GCC documents its implementation-defined behaviour in this case as follows:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
This is not a C question, for on a C implementation featuring 32-bit two's complement representation for type int, the effect of applying the unary negation operator to an int having the value -2147483648 is undefined. That is, the C language specifically disavows designating the result of evaluating such an operation.
Consider more generally, however, how the unary - operator is defined in two's complement arithmetic: the inverse of a positive number x is formed by flipping all the bits of its binary representation and adding 1. This same definition serves as well for any negative number that has at least one bit other than its sign bit set.
Minor problems arise, however, for the two numbers that have no value bits set: 0, which has no bits set at all, and the number that has only its sign bit set (-2147483648 in 32-bit representation). When you flip all the bits of either of these, you end up with all value bits set. Therefore, when you subsequently add 1, the result overflows the value bits. If you imagine performing the addition as if the number were unsigned, treating the sign bit as a value bit, then you get
-2147483648 (decimal representation)
--> 0x80000000 (convert to hex)
--> 0x7fffffff (flip bits)
--> 0x80000000 (add one)
--> -2147483648 (convert to decimal)
Similar applies to inverting zero, but in that case the overflow upon adding 1 overflows the erstwhile sign bit, too. If the overflow is ignored, the resulting 32 low-order bits are all zero, hence -0 == 0.
I'm gonna use a 4-bit number, just to make maths simple, but the idea is the same.
In a 4-bit number, the possible values are between 0000 and 1111. That would be 0 to 15, but if you wanna represent negative numbers, the first bit is used to indicate the sign (0 for positive and 1 for negative).
So 1111 is not 15. As the first bit is 1, it's a negative number. To know its value, we use the two-complement method as already described in previous answers: "invert the bits and add 1":
inverting the bits: 0000
adding 1: 0001
0001 in binary is 1 in decimal, so 1111 is -1.
The two-complement method goes both ways, so if you use it with any number, it will give you the binary representation of that number with the inverted sign.
Now let's see 1000. The first bit is 1, so it's a negative number. Using the two-complement method:
invert the bits : 0111
add 1: 1000 (8 in decimal)
So 1000 is -8. If we do -(-8), in binary it means -(1000), which actually means using the two-complement method in 1000. As we saw above, the result is also 1000.
So, in a 4-bit number, -(-8) is equals -8.
In a 32-bit number, -2147483648 in binary is 1000..(31 zeroes), but if you use the two-complement method, you'll end up with the same value (the result is the same number).
That's why in 32-bit number -(-2147483648) is equals -2147483648
It depends on the version of C, the specifics of the implementation and whether we are talking about variables or literals values.
The first thing to understand is that there are no negative integer literals in C "-2147483648" is a unary minus operation followed by a positive integer literal.
Lets assume that we are running on a typical 32-bit platform where int and long are both 32 bits and long long is 64 bits and consider the expression.
(-(-2147483648) == -2147483648 )
The compiler needs to find a type that can hold 2147483648, on a comforming C99 compiler it will use type "long long" but a C90 compiler can use type "unsigned long".
If the compiler uses type long long then nothing overflows and the comparision is false. If the compiler uses unsigned long then the unsigned wraparound rules come into play and the comparision is true.
For the same reason that winding a tape deck counter 500 steps forward from 000 (through 001 002 003 ...) will show 500, and winding it backward 500 steps backward from 000 (through 999 998 997 ...) will also show 500.
This is two's complement notation. Of course, since 2's complement sign convention is to consider the topmost bit the sign bit, the result overflows the representable range, just like 2000000000+2000000000 overflows the representable range.
As a result, the processor's "overflow" bit will be set (seeing this requires access to the machine's arithmetic flags, generally not the case in most programming languages outside of assembler). This is the only value which will set the "overflow" bit when negating a 2's complement number: any other value's negation lies in the range representable by 2's complement.

Resources