I found a bug in a piece of code I wrote, and have fixed it, but still can't explain what was happening. It boils down to this:
unsigned i = 1<<31; // gives 2147483648 as expected
unsigned long l = 1<<31; // gives 18446744071562067968 as not expected
I'm aware of a question here: Unsigned long and bit shifting wherein the exact same number shows up as an unexpected value, but there he was using a signed char which I believe led to a sign extension. I really can't for the life of me see why I'm getting an incorrect value here.
I'm using CLion on Ubuntu 18.04, and on my system an unsigned is 32 bits and a long is 64 bits.
In this expression:
1<<31
The value 1 has type int. Assuming an int is 32 bits wide, that means you're shifting a bit into the sign bit. Doing so is undefined behavior.
This is documented in section 6.5.7p4 of the C standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the
value of the result is E1 × 2^E2, reduced modulo one more than the
maximum value representable in the result type. If E1 has a
signed type and nonnegative value, and E1 × 2^E2 is representable in
the result type, then that is the resulting value; otherwise, the
behavior is undefined.
However, since you're on Ubuntu, which uses GCC, the behavior is actually implementation-defined. The gcc documentation states:
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed >> acts on
negative numbers by sign extension.
As an extension to the C language, GCC does not use the latitude given
in C99 and C11 only to treat certain aspects of signed << as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
So gcc in this case works directly on the representation of the values. This means that 1<<31 has type int and the representation 0x80000000. The value of this representation in decimal is -2147483648.
When this value is assigned to an unsigned int, it is converted via the rules in section 6.3.1.3p2:
Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than the
maximum value that can be represented in the new type until the
value is in the range of the new type.
Since "one more than the maximum value" is 42949672956 for a 32 bit unsigned int This results in the int value -2147483648 being converted to the unsigned int value 42949672956 -2147483648 == 2147483648.
When 1<<31 is assigned to an unsigned long int which is 64 bit, "one more than the maximum value" is 18446744073709551616 so the result of the conversion is 18446744073709551616 -2147483648 == 18446744071562067968, which is the value you're getting.
To get the correct value, use the UL suffix to make the value unsigned long:
1UL<<31
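For reference, here is a minimal sketch (assuming 32-bit int and 64-bit unsigned long, as on the asker's system, and relying on GCC's documented treatment of signed left shift for the third line) that contrasts the cases:
#include <stdio.h>

int main(void) {
    unsigned int  i   = 1U << 31;  /* shift performed in unsigned int:  2147483648 */
    unsigned long l   = 1UL << 31; /* shift performed in unsigned long: 2147483648 */
    unsigned long bad = 1 << 31;   /* int shifted into its sign bit, then a
                                      value-changing conversion: 18446744071562067968
                                      with GCC on this platform */
    printf("%u\n%lu\n%lu\n", i, l, bad);
    return 0;
}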
Related
uint32_t a = -1; // 11111111111111111111111111111111
int64_t b = (int64_t) a; // 0000000000000000000000000000000011111111111111111111111111111111
int32_t c = -1; // 11111111111111111111111111111111
int64_t d = (int64_t) c; // 1111111111111111111111111111111111111111111111111111111111111111
From the observation above, it appears that only the original value's sign matters.
I.e if the original 32 bit number is unsigned, casting it to a 64 bit value will add 0's to its left regardless of the destination value being signed or unsigned and;
if the original 32 bit number is signed and negative, casting it to a 64 bit value will add 1's to its left regardless of the destination value being signed or unsigned.
Is the above statement correct?
Correct, it's the source operand that dictates this.
uint32_t a = -1;
int64_t b = (int64_t) a;
No sign extension happens here because the source value is an unsigned uint32_t. The basic idea of sign extension is to ensure the wider variable has the same value (including sign). Coming from an unsigned integer type, the value is positive, always. This is covered by the standards snippet /1 below.
Negative sign extension (in the sense that the top 1-bit in a two's complement value is copied to all the higher bits in the wider type(a)) only happens when a signed type is extended in width, since only signed types can be negative.
If the original 32 bit number is signed and negative, casting it to a 64 bit value will add 1's to its left regardless of the destination value being signed or unsigned.
This is covered by the standards snippet /2 below. You still have to maintain the sign of the value when extending the bits, but pushing a negative value (assuming the source was negative) into an unsigned variable simply adds MAX_VAL + 1 to the value, mathematically, until it is within the range of the target type (in reality, for two's complement, no adding is done; the same bit pattern is just interpreted in a different way).
Both these scenarios are covered in the standard, in this case C11 6.3.1.3 Signed and unsigned integers /1 and /2:
1/ When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2/ Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3/ Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Note that your widening conversions are covered by the first two points above. I've included the third point for completeness as it covers things like conversion from uint32_t to int32_t, or unsigned int to long where they have the same width (they both have a minimum range but there's no requirement that unsigned int be "thinner" than long).
(a) This may be different in ones' complement or sign-magnitude representations but, since they're in the process of being removed, nobody really cares that much.
See:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html (WG21, C++); and
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm (WG14, C)
for more detail.
In any case, the fixed width types are two's complement so you don't have to worry about this aspect for your example code.
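A small sketch (assuming the usual 32/64-bit fixed-width types and using the PRIx64 macro from <inttypes.h>) that reproduces the cases above:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t a = UINT32_MAX;          /* same bit pattern as (uint32_t)-1 */
    int32_t  c = -1;

    int64_t  b = (int64_t)a;          /* source unsigned: zero-extended */
    int64_t  d = (int64_t)c;          /* source signed and negative: sign-extended */
    uint64_t e = (uint64_t)c;         /* sign-extended, then reinterpreted as unsigned */

    printf("%" PRIx64 "\n", (uint64_t)b);  /* prints ffffffff */
    printf("%" PRIx64 "\n", (uint64_t)d);  /* prints ffffffffffffffff */
    printf("%" PRIx64 "\n", e);            /* prints ffffffffffffffff */
    return 0;
}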
GCC version 5.4.0
Ubuntu 16.04
I have noticed some weird behavior with the right shift in C when I store a value in variable or not.
This code snippet is printing 0xf0000000, the expected behavior
#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("%x", x >> 3);
}
These following two code snippets are printing 0x10000000, which is very weird in my opinion, it is performing logical shifts on a negative number
1.
#include <stdio.h>

int main() {
    int x = 0x80000000 >> 3;
    printf("%x", x);
}
2.
#include <stdio.h>

int main() {
    printf("%x", (0x80000000 >> 3));
}
Any insight would be really appreciated. I do not know if it a specific issue with my personal computer, in which case it can't be replicated, or if it is just a behavior in C.
Quoting from https://en.cppreference.com/w/c/language/integer_constant, for a hexadecimal integer constant without any suffix:
The type of the integer constant is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used.
int
unsigned int
long int
unsigned long int
long long int (since C99)
unsigned long long int (since C99)
Also, later
There are no negative integer constants. Expressions such as -1 apply the unary minus operator to the value represented by the constant, which may involve implicit type conversions.
So, if an int has 32 bits on your machine, 0x80000000 has the type unsigned int, as it can't fit in an int and can't be negative.
The statement
int x = 0x80000000;
converts the unsigned int to an int in an implementation-defined way, but the statement
int x = 0x80000000 >> 3;
performs the right shift on the unsigned int before converting the result to an int, so the results you see are different.
EDIT
Also, as M.M noted, the format specifier %x requires an unsigned integer argument and passing an int instead causes undefined behavior.
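To make the difference concrete, here is a hedged sketch (assuming 32-bit int; the shift of the negative int is implementation-defined, so the second result is what GCC documents rather than something the standard guarantees):
#include <stdio.h>

int main(void) {
    unsigned int u = 0x80000000;   /* the constant itself has type unsigned int */
    int x = 0x80000000;            /* implementation-defined conversion to int */

    printf("%x\n", u >> 3);                  /* logical shift of unsigned: 10000000 */
    printf("%x\n", (unsigned int)(x >> 3));  /* arithmetic shift of a negative int:
                                                f0000000 with GCC */
    return 0;
}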
Right shift of a negative integer has implementation-defined behavior, so when right-shifting a negative number you can't "expect" anything.
So it is just as it is in your implementation. It is not weird.
6.5.7/5 [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined.
It may also invoke the UB
6.5.7/4 [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
As noted by #P__J__, the right shift is implementation-dependent, so you should not rely on it to be consistent on different platforms.
As for your specific test, which is on a single platform (possibly 32-bit Intel or another platform that uses two's complement 32-bit representation of integers), but still shows a different behavior:
GCC performs operations on literal constants using the highest precision available (usually 64-bit, but may be even more). Now, the statement x = 0x80000000 >> 3 will not be compiled into code that does right-shift at run time, instead the compiler figures out both operands are constant and folds them into x = 0x10000000. For GCC, the literal 0x80000000 is NOT a negative number. It is the positive integer 2^31.
On the other hand, x = 0x80000000 will store the value 2^31 into x, but the 32-bit storage cannot represent that as the positive integer 2^31 that you gave as an integer literal - the value is beyond the range representable by a 32-bit two's complement signed integer. The high-order bit ends up in the sign bit - so this is technically an overflow, though you don't get a warning or error. Then, when you use x >> 3, the operation is now performed at run-time (not by the compiler), with the 32-bit arithmetic - and it sees that as a negative number.
Is it always defined behavior to extract the sign of a 32 bit integer this way:
#include <stdint.h>
int get_sign(int32_t x) {
return (x & 0x80000000) >> 31;
}
Do I always get a result of 0 or 1?
No, it is incorrect to do this because right shifting a signed integer with a negative value is implementation-defined, as specified in the C Standard:
6.5.7 Bitwise shift operators
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
You should cast x as (uint32_t) before masking and shifting.
EDIT: Wrong answer! I shall keep this answer here as an example of good-looking, intuitive but incorrect reasoning. As explained in the other answers, there is no right shifting of a negative value in the code posted. The type of x & 0x80000000 is one of the signed or unsigned integer types, depending on the implementation characteristics, but its value is always positive, either 0 or 2147483648. Right shifting this value is not implementation-defined; the result is always either 0 or 1. Whether the result is the value of the sign bit is less obvious: it is the value of the sign bit except for some very contorted corner cases, hybrid architectures quite unlikely to exist and probably not standard-conforming anyway.
Since the answer assumes that fixed width types are available, and therefore that a negative zero doesn't exist¹, the only correct way of extracting the sign bit is to simply check if the value is negative:
_Bool Sign( const int32_t a )
{
return a < 0 ;
}
¹ Fixed width types require two's complement representation, which doesn't have a negative zero.
Yes, it is correct on 1s and 2s complement architectures, but for subtle reasons:
for the overwhelmingly common hardware where int is the same type as int32_t and unsigned the same as uint32_t, the constant literal 0x80000000 has type unsigned int. The left operand of the & operation is converted to unsigned int and the result of the & has the same type. The right shift is applied to an unsigned int, the value is either 0 or 1, no implementation-defined behavior.
On other platforms, 0x80000000 may have a different type and the behavior might be implementation defined:
0x80000000 can be of type int, if the int type has more than 31 value bits. In this case, x is promoted to int, and its value is unchanged.
If int uses 1s complement or 2s complement representation, the sign bit is replicated into the more significant bits. The mask operation evaluates to an int with value 0 or 0x80000000. Right shifting it by 31 positions evaluates to 0 and 1 respectively, no implementation-defined behavior either.
Conversely, if int uses sign/magnitude representation, preserving the value of x will effectively reset its 31st bit, moving the sign bit beyond the value bits. The mask operation will evaluate to 0 and the result will be incorrect.
0x80000000 can be of type long, if the int type has fewer than 31 value bits or if INT_MIN == -INT_MAX and long has more than 31 value bits. In this case, x is converted to long, and its value is unchanged, with the same consequences as for the int case. For 1s or 2s complement representation of long, the mask operation evaluates to a positive long value of either 0 or 0x80000000 and right shifting it by 31 places is defined and gives either 0 or 1; for sign/magnitude, the result should be 0 in all cases.
0x80000000 can be of type unsigned long, if the int type has fewer than 31 value bits and long has 31 value bits and uses 2s complement representation. In this case, x is converted to unsigned long keeping the sign bit intact. The mask operation evaluates to an unsigned long value of either 0 or 0x80000000 and right shifting it by 31 places is defined and gives either 0 or 1.
lastly, 0x80000000 can be of type long long, if both the int type has fewer than 31 value bits or INT_MIN == -INT_MAX and long has 31 value bits but does not use 2s complement representation. In this case, x is converted to long long, keeping its value, with the same consequences as for the int case if long long representation is sign/magnitude.
This question was purposely contrived. The answer is that you get the correct result so long as the platform does not use sign/magnitude representation. But the C Standard insists on supporting integer representations other than 2s complement, with very subtle consequences.
EDIT: Careful reading of section 6.2.6.2 Integer types of the C Standard seems to exclude the possibility for different representations of signed integer types to coexist in the same implementation. This makes the code fully defined as posted, since the very presence of type int32_t implies 2s complement representation for all signed integer types.
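For completeness, a small test harness (just a sketch exercising the function from the question) could look like:
#include <stdint.h>
#include <stdio.h>

int get_sign(int32_t x) {
    return (x & 0x80000000) >> 31;
}

int main(void) {
    printf("%d %d %d %d\n",
           get_sign(0), get_sign(42), get_sign(-1), get_sign(INT32_MIN));
    /* expected output: 0 0 1 1 */
    return 0;
}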
Do I always get a result of 0 or 1?
Yes.
Simple answer:
0x80000000 >> 31 is always 1.
0x00000000 >> 31 is always 0.
See below.
[Edit]
Is it always defined behavior to extract the sign of a 32 bit integer this way
Yes, except for a corner case.
Should 0x80000000 end up as an int/long (which implies that type is wider than 32 bits) and that signed integer type use sign-magnitude (or maybe ones' complement) on a novel machine, then the conversion of int32_t x to that int/long would move the sign bit to a new bit location, rendering the & 0x80000000 moot.
It is an open question whether C even allows int32_t (which must be 2's complement) to coexist with an int/long/long long that is not 2's complement.
0x80000000 is a hexadecimal constant. "The type of an integer constant is the first of the corresponding list in which its value can be represented" C11 §6.4.4.1 5: Octal or Hexadecimal Constant: int, unsigned, long or unsigned long.... Regardless of its type, it will have a value of +2,147,483,648.
The type of x & 0x80000000 will be the wider of the types of int32_t and the type of 0x80000000. If the 2 types are the same width and differ in sign-ness, it will be the unsigned one. INT32_MAX is +2,147,483,647 and less than +2,147,483,648, thus 0x80000000 must be a wider type (or same and unsigned) than int32_t. So regardless of what type 0x80000000, x & 0x80000000 will be the same type.
It makes no difference whether int or long is implemented as 2's complement or not.
The & operation does not change the sign of the value of 0x80000000 as either it is an unsigned integer type or the sign bit is in a more significant position. x & 0x80000000 then has the value of +2,147,483,648 or 0.
Right shift of a positive number is well defined regardless of integer type. Right shift of a negative value is implementation-defined; see C11 §6.5.7 5. x & 0x80000000 is never a negative number.
Thus (x & 0x80000000) >> 31 is well defined and either 0 or 1.
return x < 0; (which does not extract the sign bit with a shift, as the post title asks) is understandable and is certainly the preferred code for most instances I can think of. Either approach may well produce the same executable code.
Whether or not this expression has precisely defined semantics, it is not the most readable way to get the sign bit. Here is a simpler alternative:
int get_sign(int32_t x) {
return x < 0;
}
As correctly pointed out by 2501, int32_t is defined to have 2s complement representation, so comparing to 0 has the same semantics as extracting the most significant bit.
Incidentally, both functions compile to the same exact code with gcc 5.3:
get_sign(int):
movl %edi, %eax
shrl $31, %eax
ret
I'm a C beginner, and I'm confused by the following example found in the C answer book.
One way to find the size of unsigned long long on your system is to type:
printf("%llu", (unsigned long long) ~0);
I have no idea why this syntax works?
On my system, int is 32 bits and long long is 64 bits.
What I expected was that, since 0 is a constant of type int, ~0 computes the bitwise negation of a 32-bit integer, which is then converted to an unsigned long long by the cast operator. This should give 2^32 - 1 as a result.
Somehow, it looks like the ~ operator already knows that it should act on 64 bits?
Does the compiler interpret this instruction as printf("%llu", ~(unsigned long long)0)? That doesn't sound right, since the cast and ~ have the same precedence.
Somehow, it looks like the ~ operator already knows that it should act on 64 bits?
It's not the ~ operator, it's the cast. Here is how the integer conversion is done according to the standard:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The value of signed int ~0 corresponds to -1 on systems with two's complement representation of negative values. It cannot be represented by an unsigned long long, so the first bullet point does not apply.
The second bullet point does apply: the new type is unsigned, so one more than the maximum of unsigned long long (that is, 2^64) is added to -1 once to bring the result into the range of unsigned long long. This has the same effect as sign-extending -1 to 64 bits.
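A short sketch (assuming a 64-bit unsigned long long and two's complement int) showing the effect of that conversion:
#include <stdio.h>

int main(void) {
    int n = ~0;                                    /* -1 in two's complement */
    unsigned long long v = (unsigned long long)n;  /* -1 + 2^64 == 2^64 - 1 */
    printf("%llu\n", v);                           /* 18446744073709551615 */
    return 0;
}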
0 is of type int, not unsigned int. ~0 will therefore (on machines that use two's complement integer representation, which is all that are in use today) be -1, not 2^32 - 1.
Assuming a 64-bit unsigned long long, (unsigned long long) -1 is -1 modulo 2^64, which is 2^64 - 1.
0 is an int
~0 is still an int, namely the value -1.
Casting an int to unsigned long long is there merely to match the type that printf expects with the conversion llu.
However, note that -1 converted to a 64-bit unsigned long long is 0xffffffffffffffff regardless of whether int is 4 or 8 bytes, since the conversion is defined in terms of the value, not the width of the source type.
According to N1570 Committee Draft:
6.5.3.3 Unary arithmetic operators
The result of the ~ operator is the bitwise complement of its
(promoted) operand (that is, each bit in the result is set if and only
if the corresponding bit in the converted operand is not set). The
integer promotions are performed on the operand, and the result has
the promoted type. If the promoted type is an unsigned type, the
expression ~E is equivalent to the maximum value representable in that
type minus E.
§6.2.6.2 Language 45:
[...] (ones’ complement). Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.
Hence, the behavior of code:
printf("%llu", (unsigned long long) ~0);
on some machines is implementation-defined (and can even be undefined), and may not be what you expect, because it depends on the machine's internal representation of integers.
And according to section 6.5.3.3, the recommended way to write the code would be:
printf("%llu", (unsigned long long) ~0u);
Further, the type of ~0u is unsigned int, whereas you are casting it to unsigned long long int, for which the format specifier is %llu. To print ~0u by itself, use the format specifier %u.
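If the goal is simply to print the all-ones value of each unsigned type with a matching format specifier, a sketch along these lines avoids any signed operand entirely:
#include <stdio.h>

int main(void) {
    printf("%u\n", ~0u);      /* all bits of unsigned int:       4294967295 */
    printf("%llu\n", ~0ull);  /* all bits of unsigned long long: 18446744073709551615 */
    return 0;
}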
To learn basic concept of type casting you may like to read: What exactly is a type cast in C/C++?
I was reading John Regehr's blog on how he gives his students an assignment about saturating arithmetic. The interesting part is that the code has to compile as-is while using typedefs to specify different integer types, see the following excerpt of the full header:
typedef signed int mysint;
//typedef signed long int mysint;
mysint sat_signed_add (mysint, mysint);
mysint sat_signed_sub (mysint, mysint);
The corresponding unsigned version is simple to implement (although I'm actually not sure if padding bits wouldn't make that problematic too), but I actually don't see how I can get the maximum (or minimum) value of an unknown signed type in C, without using macros for MAX_ and MIN_ or causing undefined behavior.
Am I missing something here or is the assignment just flawed (or more likely I'm missing some crucial information he gave his students)?
I don't see any way to do this without making assumptions or invoking implementation-defined (not necessarily undefined) behavior. If you assume that there are no padding bits in the representation of mysint or of uintmax_t, however, then you can compute the maximum value like this:
mysint mysint_max = (mysint)
((~(uintmax_t)0) >> (1 + CHAR_BIT * (sizeof(uintmax_t) - sizeof(mysint))));
The minimum value is then either -mysint_max (sign/magnitude or ones' complement) or -mysint_max - 1 (two's complement), but it is a bit tricky to determine which. You don't know a priori which bit is the sign bit, and there are possible trap representations that differ for different representations styles. You also must be careful about evaluating expressions, because of the possibility of "the usual arithmetic conversions" converting values to a type whose representation has different properties than those of the one you are trying to probe.
Nevertheless, you can distinguish the type of negative-value representation by computing the bitwise negation of the mysint representation of -1. For two's complement the mysint value of the result is 0, for ones' complement it is 1, and for sign/magnitude it is mysint_max - 1.
If you add the assumption that all signed integer types have the same kind of negative-value representation then you can simply perform such a test using an ordinary expression on default int literals. You don't need to make that assumption, however. Instead, you can perform the operation directly on the type representation's bit pattern, via a union:
union mysint_bits {
    mysint i;
    unsigned char bits[sizeof(mysint)];
} msib;

int counter = 0;
/* flip every byte of the representation of (mysint)-1 */
for (msib.i = -1; counter < sizeof(mysint); counter += 1) {
    msib.bits[counter] = ~msib.bits[counter];
}
As long as the initial assumption holds (that there are no padding bits in the representation of type mysint) msib.i must then be a valid representation of the desired result.
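Putting the pieces together, a sketch under the same no-padding-bits assumption (the variable names are just illustrative) that classifies the representation and derives the minimum:
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

typedef signed int mysint;

int main(void) {
    mysint mysint_max = (mysint)
        ((~(uintmax_t)0) >> (1 + CHAR_BIT * (sizeof(uintmax_t) - sizeof(mysint))));

    /* Bitwise-negate the representation of (mysint)-1, as described above. */
    union { mysint i; unsigned char bits[sizeof(mysint)]; } msib = { .i = -1 };
    for (size_t k = 0; k < sizeof(mysint); k++)
        msib.bits[k] = (unsigned char)~msib.bits[k];

    /* 0 => two's complement, 1 => ones' complement, mysint_max - 1 => sign/magnitude */
    mysint mysint_min = (msib.i == 0) ? -mysint_max - 1 : -mysint_max;

    printf("max = %jd, min = %jd\n", (intmax_t)mysint_max, (intmax_t)mysint_min);
    return 0;
}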
I don't see a way to determine the largest and smallest representable values for an unknown signed integer type in C, without knowing something more. (In C++, you have std::numeric_limits available, so it is trivial.)
The largest representable value for an unsigned integer type is (myuint)(-1). That is guaranteed to work independent of padding bits, because (§ 6.3.1.3/1-2):
When a value with integer type is converted to another integer type… if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
So to convert -1 to an unsigned type, you add one more than the maximum representable value to it, and that result must be the maximum representable value. (The standard makes it clear that the meaning of "repeatedly adding or subtracting" is mathematical.)
Now, if you knew that the number of padding bits in the signed type was the same as the number of padding bits in the unsigned type [but see below], you could compute the largest representable signed value from the largest representable unsigned value:
(mysint)( (myuint)(-1) / (myuint)2 )
Unfortunately, that's not enough to compute the minimum representable signed value, because the standard permits the minimum to be either one less than the negative of the maximum (2's-complement representation) or exactly the negative of the maximum (1's-complement or sign/magnitude representations).
Moreover, the standard does not actually guarantee that the number of padding bits in the signed type is the same as the number of padding bits in the unsigned type. All it guarantees is that the number of value bits in the signed type be no greater than the number of value bits in the unsigned type. In particular, it would be legal for the unsigned type to have one more padding bit than the corresponding signed type, in which case they would have the same number of value bits and the maximum representable values would be the same. [Note: a value bit is neither a padding bit nor the sign bit.]
In short, if you knew (for example by being told) that the architecture were 2's-complement and that corresponding signed and unsigned types had the same number of padding bits, then you could certainly compute both signed min and max:
myuint max_myuint = (myuint)(-1);
mysint max_mysint = (mysint)(max_myuint / (myuint)2);
mysint min_mysint = (-max_mysint) - (mysint)1;
Finally, casting an out-of-range unsigned integer to a signed integer is not undefined behaviour, although most other signed overflows are. The conversion, as indicated by §6.3.1.3/3, is implementation-defined behaviour:
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
Implementation-defined behaviour is required to be documented by the implementation. So, suppose we knew that the implementation was gcc. Then we could examine the gcc documentation, where we would read the following, in the section "C Implementation-defined behaviour":
Whether signed integer types are represented using sign and
magnitude, two's complement, or one's complement, and whether the
extraordinary value is a trap representation or an ordinary value
(C99 6.2.6.2).
GCC supports only two's complement integer types, and all bit
patterns are ordinary values.
The result of, or the signal raised by, converting an integer to a
signed integer type when the value cannot be represented in an
object of that type (C90 6.2.1.2, C99 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo
2^N to be within range of the type; no signal is raised.
Knowing that signed integers are 2s-complement and that unsigned to signed conversions will not trap, but will produce the expected pattern of low-order bits, we can find the maximum and minimum values for any signed type starting with the maximum representable value for the widest unsigned type, uintmax_t:
uintmax_t umax = (uintmax_t)(-1);
while ( (mysint)(umax) < 0 ) umax >>= 1;
mysint max_mysint = (mysint)(umax);
mysint min_mysint = (-max_mysint) - (mysint)1;
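Wrapped into a complete program (a sketch that relies on GCC's documented modulo-2^N conversion quoted above; change the typedef to probe another type):
#include <stdint.h>
#include <stdio.h>

typedef signed long int mysint;

int main(void) {
    uintmax_t umax = (uintmax_t)(-1);
    while ((mysint)(umax) < 0)   /* implementation-defined conversion, documented by GCC */
        umax >>= 1;

    mysint max_mysint = (mysint)umax;
    mysint min_mysint = (-max_mysint) - (mysint)1;

    printf("max = %jd, min = %jd\n", (intmax_t)max_mysint, (intmax_t)min_mysint);
    return 0;
}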
This is a suggestion for getting the MAX value of a specific type set with a typedef, without using any library headers:
typedef signed int mysint;

mysint size = sizeof(mysint) * (mysint)8 - (mysint)1; // number of bits minus the sign bit
mysint max = 1;                                       // start with the lowest bit set

while (--size) {
    max = (max << (mysint)1) | (mysint)1;             // shift in another 1 bit
}
// max now contains the max value of the type mysint
If you assume eight-bit chars and a two's complement representation (both reasonable on all modern hardware, with the exception of some embedded DSP stuff), then you just need to form an unsigned integer (use uintmax_t to make sure it's big enough) with sizeof(mysint)*8 - 1 1's in the bottom bits, then cast it to mysint. For the minimum value, negate the maximum value and subtract one.
If you don't want to assume those things, then it's still possible, but you'll need to do some more digging through limits.h to compensate for the size of chars and the sign representation.
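Under those assumptions (8-bit chars, two's complement, no padding bits), a sketch of that construction could be:
#include <stdint.h>
#include <stdio.h>

typedef signed int mysint;

int main(void) {
    /* sizeof(mysint)*8 - 1 low-order one bits, assembled in uintmax_t and then cast */
    uintmax_t ones = ((uintmax_t)1 << (sizeof(mysint) * 8 - 1)) - 1;

    mysint max = (mysint)ones;   /* e.g. INT_MAX for a 32-bit two's complement int */
    mysint min = -max - 1;       /* two's complement minimum */

    printf("max = %jd, min = %jd\n", (intmax_t)max, (intmax_t)min);
    return 0;
}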
I guess this should work irrespective of the negative number representation:
// MSB set and all other bits zero is the minimum number in both 2's and
// 1's complement representations.
mysint min = (mysint)(1ull << (sizeof(mysint) * 8 - 1)); // shift done in an unsigned type to avoid
                                                         // shifting into the sign bit; the conversion
                                                         // back to mysint is implementation-defined
mysint max = ~min;