GCC version 5.4.0
Ubuntu 16.04
I have noticed some weird behavior with the right shift in C, depending on whether or not I store the value in a variable first.
This code snippet prints 0xf0000000, which is the behavior I expected:
#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("%x", x >> 3);
}
The following two code snippets print 0x10000000, which is very weird in my opinion: it appears to perform a logical shift on a negative number.
1.
#include <stdio.h>

int main() {
    int x = 0x80000000 >> 3;
    printf("%x", x);
}
2.
#include <stdio.h>

int main() {
    printf("%x", (0x80000000 >> 3));
}
Any insight would be really appreciated. I do not know if it is a specific issue with my personal computer, in which case it can't be replicated, or if it is just standard C behavior.
Quoting from https://en.cppreference.com/w/c/language/integer_constant, for a hexadecimal integer constant without any suffix:
The type of the integer constant is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used.
int
unsigned int
long int
unsigned long int
long long int(since C99)
unsigned long long int(since C99)
Also, later
There are no negative integer constants. Expressions such as -1 apply the unary minus operator to the value represented by the constant, which may involve implicit type conversions.
So, if an int has 32 bits on your machine, 0x80000000 has the type unsigned int, as it can't fit in an int and integer constants can't be negative.
The statement
int x = 0x80000000;
converts the unsigned int to an int in an implementation-defined way, but the statement
int x = 0x80000000 >> 3;
performs the right shift on the unsigned int before converting it to an int, so the results you see are different.
EDIT
Also, as M.M noted, the format specifier %x requires an unsigned integer argument and passing an int instead causes undefined behavior.
Right shift of a negative integer has implementation-defined behavior, so when right-shifting a negative number you can't "expect" anything.
The result is simply whatever your implementation defines. It is not weird.
6.5.7/5 [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined.
It may also invoke UB:
6.5.7/4 [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
As noted by @P__J__, right shift of a negative value is implementation-defined, so you should not rely on it being consistent across platforms.
As for your specific test on a single platform (possibly 32-bit Intel, or another platform using a two's complement 32-bit representation of integers), which still shows different behavior:
GCC performs operations on literal constants using the highest precision available (usually 64-bit, but may be even more). Now, the statement x = 0x80000000 >> 3 will not be compiled into code that does right-shift at run time, instead the compiler figures out both operands are constant and folds them into x = 0x10000000. For GCC, the literal 0x80000000 is NOT a negative number. It is the positive integer 2^31.
On the other hand, x = 0x80000000 will store the value 2^31 into x, but the 32-bit storage cannot represent that as the positive integer 2^31 that you gave as an integer literal - the value is beyond the range representable by a 32-bit two's complement signed integer. The high-order bit ends up in the sign bit - so this is technically an overflow, though you don't get a warning or error. Then, when you use x >> 3, the operation is now performed at run-time (not by the compiler), with the 32-bit arithmetic - and it sees that as a negative number.
Related
I found a bug in a piece of code I wrote, and have fixed it, but still can't explain what was happening. It boils down to this:
unsigned i = 1<<31; // gives 2147483648 as expected
unsigned long l = 1<<31; // gives 18446744071562067968 as not expected
I'm aware of a question here: Unsigned long and bit shifting wherein the exact same number shows up as an unexpected value, but there he was using a signed char which I believe led to a sign extension. I really can't for the life of me see why I'm getting an incorrect value here.
I'm using CLion on Ubuntu 18.04, and on my system an unsigned is 32 bits and a long is 64 bits.
In this expression:
1<<31
The value 1 has type int. Assuming an int is 32 bits wide, that means you're shifting a bit into the sign bit. Doing so is undefined behavior.
This is documented in section 6.5.7p4 of the C standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the
value of the result is E1 × 2^E2, reduced modulo one more than the
maximum value representable in the result type. If E1 has a
signed type and nonnegative value, and E1 × 2^E2 is representable in
the result type, then that is the resulting value; otherwise, the
behavior is undefined.
However, since you're on Ubuntu, which uses GCC, the behavior is actually implementation-defined. The GCC documentation states:
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed >> acts on
negative numbers by sign extension.
As an extension to the C language, GCC does not use the latitude given
in C99 and C11 only to treat certain aspects of signed << as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
So gcc in this case works directly on the representation of the values. This means that 1<<31 has type int and the representation 0x80000000. The value of this representation in decimal is -2147483648.
When this value is assigned to an unsigned int, it is converted via the rules in section 6.3.1.3p2:
Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than the
maximum value that can be represented in the new type until the
value is in the range of the new type.
Since "one more than the maximum value" is 4294967296 for a 32-bit unsigned int, this results in the int value -2147483648 being converted to the unsigned int value 4294967296 - 2147483648 == 2147483648.
When 1<<31 is assigned to an unsigned long int, which is 64 bits, "one more than the maximum value" is 18446744073709551616, so the result of the conversion is 18446744073709551616 - 2147483648 == 18446744071562067968, which is the value you're getting.
To get the correct value, use the UL suffix to make the value unsigned long:
1UL<<31
I was trying to print the minimum of int, char, short, long without using the header file <limit.h>. So bitwise operation will be a good choice. But something strange happened.
The statement
printf("The minimum of short: %d\n", ~(((unsigned short)~0) >> 1));
gives me
The minimum of short: -32768
But the statement
printf("The minimum of short: %d\n", ~((~(unsigned short)0) >> 1));
gives me
The minimum of short: 0
This phenomenon also occurs with char, but it does not occur with long or int. Why does this happen?
It is also worth mentioning that I use VS Code as my editor. When I moved my cursor over unsigned char in the statement
printf("The minimum of char: %d\n", (short)~((~(unsigned char)0) >> 1));
It gives me a hint (int) 0 instead of (unsigned char)0, which I expected. Why does this happen?
First of all, none of your code is really reliable and won't do what you expect.
printf and all other variable argument length functions have a dysfunctional "feature" called the default argument promotions. This means that the actual type of the parameters passed undergo silent promotion. Small integer types (such as char and short) get promoted to int which is signed. (And float gets promoted to double.) Tl;dr: printf is a nuts function.
Therefore you can cast between various small integer types all you want; there will still be a promotion to int in the end. This is no problem if you use the correct format specifier for the intended type, but you don't: you use %d, which is for int.
In addition, the ~ operator, like most operators in C, performs implicit integer promotion of its operand. See Implicit type promotion rules.
That being said, this line ~((~(unsigned short)0) >> 1) does the following:
Take the literal 0 which is of type int and convert to unsigned short.
Implicitly promote that unsigned short back to int through implicit integer promotion.
Calculate the bitwise complement of the int value 0. This is 0xFF...FF hex, -1 dec, assuming 2's complement.
Right shift this int by 1. Here you invoke implementation-defined behavior upon shifting a negative integer. C allows this to either result in a logical shift = shift in zeroes, or arithmetic shift = shift in a sign bit. Different result from compiler to compiler and non-portable.
You get either 0x7F...FF in case of logical shift or 0xFF...FF in case of arithmetic shift. In this case it seems to be the latter, meaning you still have decimal -1 after shift.
You do bitwise complement of the 0xFF...FF = -1 and get 0.
You cast this to short. Still 0.
Default argument promotion converts it to int. Still 0.
%d expects an int and therefore prints accordingly. unsigned short is printed with %hu and short with %hd. Using the correct format specifier should undo the effect of default argument promotion.
Advice: study implicit type promotion and avoid using bitwise operators on operands that have signed type.
To simply display the lowest 2's complement value of various signed types, you have to do some trickery with unsigned types, since bitwise operations on their signed version are unreliable. Example:
int shift = sizeof(short)*8 - 1; // 15 bits on sane systems
short s = (short) (1u << shift);
printf("%hd\n", s);
This shifts an unsigned int 1u 15 bits, then converts the result of that to short, in some "implementation-defined way", meaning on two's complement systems you'll end up converting 0x8000 to -32768.
Then give printf the right format specifier and you'll get the expected result from there.
I am learning in C and I got a question regarding this conversion.
short int x = -0x52ea;
printf ( "%x", x );
output:
ffffad16
I would like to know how this conversion works because it's supposed to be on a test and we won't be able to use any compilers. Thank you
I would like to know how this conversion works
It is undefined behavior (UB)
short int x = -0x52ea;
0x52ea is a hexadecimal constant. It has the value 52EA in hex, or 21,226 in decimal. It has type int, as it fits in an int even if int were only 16 bits. OP's int is evidently 32-bit.
- negates the value to -21,226.
The value is assigned to a short int which can encode -21,226, so no special issues with assigning this int to a short int.
printf("%x", x );
short int x is passed to a ... function, so it goes through the default argument
promotions and becomes an int. So an int with the value -21,226 is passed.
"%x" used with printf() expects an unsigned argument. Since the type passed is not unsigned (and not an int with a non-negative value - see the exception in C11dr §6.5.2.2 6), the result is undefined behavior (UB). Apparently the UB on your machine was to print the hex pattern of the 32-bit 2's complement representation of -21,226, which is 0xFFFFAD16.
If the exam result is anything but UB, just smile and nod and realize the curriculum needs updating.
The point here is that when a number is negative, it's structured in a completely different way.
1 in 16-bit hexadecimal is 0001; -1 is ffff. The most significant bit (8000) indicates that the number is negative (assuming it's a signed integer), and that's why it can only go as high as 32767 (7fff) and as low as -32768 (8000).
Basically, to transform from positive to negative you invert all the bits and add 1: 0001 inverted is fffe; +1 gives ffff.
This convention is called two's complement, and it's used because it makes arithmetic via bitwise operations quite trivial.
Consider following example:
#include <stdio.h>
int main(void)
{
unsigned char a = 15; /* one byte */
unsigned short b = 15; /* two bytes */
unsigned int c = 15; /* four bytes */
long x = -a; /* eight bytes */
printf("%ld\n", x);
x = -b;
printf("%ld\n", x);
x = -c;
printf("%ld\n", x);
return 0;
}
To compile I am using GCC 4.4.7 (and it gave me no warnings):
gcc -g -std=c99 -pedantic-errors -Wall -W check.c
My result is:
-15
-15
4294967281
The question is why both unsigned char and unsigned short values are "propagated" correctly to (signed) long, while unsigned int is not ? Is there any reference or rule on this ?
Here are results from gdb (words are in little-endian order) accordingly:
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 11111111111111111111111111111111
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 00000000000000000000000000000000
This is due to how the integer promotions applied to the operand and the requirement that the result of unary minus have the same type. This is covered in section 6.5.3.3 Unary arithmetic operators and says (emphasis mine going forward):
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
and integer promotion which is covered in the draft c99 standard section 6.3 Conversions and says:
if an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.48) All other types are unchanged by the integer promotions.
In the first two cases, the promotion will be to int and the result will be int. In the case of unsigned int no promotion is required but the result will require a conversion back to unsigned int.
The -15 is converted to unsigned int using the rules set out in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49)
So we end up with -15 + (UMAX + 1), which results in UMAX - 14, a large unsigned value. This is why you will sometimes see code convert -1 to an unsigned value to obtain the maximum unsigned value of a type, since it will always end up being -1 + UMAX + 1, which is UMAX.
int is special. Everything smaller than int gets promoted to int in arithmetic operations.
Thus -a and -b are applications of unary minus to int values of 15, which just work and produce -15. This value is then converted to long.
-c is different. c is not promoted to an int as it is not smaller than int. The result of unary minus applied to an unsigned int value k is again an unsigned int, computed as 2^N - k (where N is the number of bits).
Now this unsigned int value is converted to long normally.
This behavior is correct. Quotes are from C 9899:TC2.
6.5.3.3/3:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
6.2.5/9:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
6.3.1.1/2:
The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
So for long x = -a;, since the operand a, an unsigned char, has conversion rank less than the rank of int and unsigned int, and all unsigned char values can be represented as int (on your platform), we first promote to type int. The negative of that is simple: the int with value -15.
Same logic for unsigned short (on your platform).
The unsigned int c is not changed by promotion. So the value of -c is calculated using modular arithmetic, giving the result UINT_MAX-14.
C's integer promotion rules are what they are because standards-writers wanted to allow a wide variety of existing implementations that did different things, in some cases because they were created before there were "standards", to keep on doing what they were doing, while defining rules for new implementations that were more specific than "do whatever you feel like". Unfortunately, the rules as written make it extremely difficult to write code which doesn't depend upon a compiler's integer size. Even if future processors would be able to perform 64-bit operations faster than 32-bit ones, the rules dictated by the standards would cause a lot of code to break if int ever grew beyond 32 bits.
It would probably in retrospect have been better to have handled "weird" compilers by explicitly recognizing the existence of multiple dialects of C, and recommending that compilers implement a dialect that handles various things in consistent ways, but providing that they may also implement dialects which do them differently. Such an approach may end up ultimately being the only way that int can grow beyond 32 bits, but I've not heard of anyone even considering such a thing.
I think the root of the problem with unsigned integer types stems from the fact that they are sometimes used to represent numerical quantities, and are sometimes used to represent members of a wrapping abstract algebraic ring. Unsigned types behave in a manner consistent with an abstract algebraic ring in circumstances which do not involve type promotion. Applying a unary minus to a member of a ring should (and does) yield a member of that same ring which, when added to the original, will yield zero [i.e. the additive inverse]. There is exactly one way to map integer quantities to ring elements, but multiple ways exist to map ring elements back to integer quantities. Thus, adding a ring element to an integer quantity should yield an element of the same ring regardless of the size of the integer, and conversion from rings to integer quantities should require that code specify how the conversion should be performed. Unfortunately, C implicitly converts rings to integers in cases where either the size of the ring is smaller than the default integer type, or when an operation uses a ring member with an integer of a larger type.
The proper solution to solve this problem would be to allow code to specify that certain variables, return values, etc. should be regarded as ring types rather than numbers; an expression like -(ring16_t)2 should yield 65534 regardless of the size of int, rather than yielding 65534 on systems where int is 16 bits, and -2 on systems where it's larger. Likewise, (ring32)0xC0000001 * (ring32)0xC0000001 should yield (ring32)0x80000001 even if int happens to be 64 bits [note that if int is 64 bits, the compiler could legally do anything it likes if code tries to multiply two unsigned 32-bit values which equal 0xC0000001, since the result would be too large to represent in a 64-bit signed integer].
Negatives are tricky. Especially when it comes to unsigned values. If you look at the C documentation, you'll notice that (contrary to what you'd expect) unsigned chars and shorts are promoted to signed ints for computing, while an unsigned int will be computed as an unsigned int.
When you compute -c, c is treated as an int; it becomes -15, then is stored in x (which still believes it is an UNSIGNED int) and is stored as such.
For clarification - no ACTUAL promotion is done when negating an unsigned. When you assign a negative to any type of int (or take a negative), the 2's complement of the number is used instead. Since the only practical difference between unsigned and signed values is that the MSB acts as a sign flag, it is taken as a very large positive number instead of a negative one.
My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value is past the max int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
An unsigned int can be converted to signed (or vice versa) with a simple expression, as shown below:
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not targeted at the question, you might like to read the following links:
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0,INT_MAX] is implementation defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the probably most portable way is IMHO the way shown in the following code snippet:
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else if (u >= (unsigned int)INT_MIN)
i = -(int)~u - 1; /*(2)*/
else
i = INT_MIN; /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it
is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow
by taking the one's complement of the value by bit-wise NOT, casts it
to 'int' (which cannot overflow now), negates the value and subtracts
one, which can also not overflow here.
Branch (3) provides the poison we have to take on one's complement or
sign/magnitude targets, because the signed integer representation
range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else
i = -(int)~u - 1; /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
And yes, some of the casts to 'unsigned int' are redundant, but
they might help the casual reader, and
some compilers issue warnings on signed/unsigned compares, because the implicit cast causes some non-intuitive behavior by language design.
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
Some explain from C++Primer 5th Page 35
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the long will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book is very helpful.