I am trying to convert 65529 from an unsigned int to a signed int. I tried doing a cast like this:
unsigned int x = 65529;
int y = (int) x;
But y is still returning 65529 when it should return -7. Why is that?
It seems like you are expecting int and unsigned int to be a 16-bit integer. That's apparently not the case. Most likely, it's a 32-bit integer - which is large enough to avoid the wrap-around that you're expecting.
Note that there is no fully C-compliant way to do this because casting between signed/unsigned for values out of range is implementation-defined. But this will still work in most cases:
unsigned int x = 65529;
int y = (short) x; // If short is a 16-bit integer.
or alternatively:
unsigned int x = 65529;
int y = (int16_t) x; // This is defined in <stdint.h>
I know it's an old question, but it's a good one, so how about this?
unsigned short int x = 65529U;
short int y = *(short int*)&x;
printf("%d\n", y);
This works because we are casting the address of x to the signed version of it's type, that's permitted by the C standard. Not all type punning like this (most in fact) is legal. The standard says this.
An object shall have its stored value accessed only by an lvalue that has one of the following types:
the declared type of the object,
a qualified version of the declared type of the object,
a type that is the signed or unsigned type corresponding to the declared type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a character type.
So, alas, since we are accessing the bits of x as if they were a signed (via the pointer), the actual conversion operation is replaced by reading what appears to be just a negative signed short, and conversion takes place without issue. However, it's possible for this to screw up on a one's complement machine, but those are so, so rare, and so, so obsolete, I wouldn't even bother with looking out for them.
#Mysticial got it. A short is usually 16-bit and will illustrate the answer:
int main()
{
unsigned int x = 65529;
int y = (int) x;
printf("%d\n", y);
unsigned short z = 65529;
short zz = (short)z;
printf("%d\n", zz);
}
65529
-7
Press any key to continue . . .
A little more detail. It's all about how signed numbers are stored in memory. Do a search for twos-complement notation for more detail, but here are the basics.
So let's look at 65529 decimal. It can be represented as FFF9h in hexadecimal. We can also represent that in binary as:
11111111 11111001
When we declare short zz = 65529;, the compiler interprets 65529 as a signed value. In twos-complement notation, the top bit signifies whether a signed value is positive or negative. In this case, you can see the top bit is a 1, so it is treated as a negative number. That's why it prints out -7.
For an unsigned short, we don't care about sign since it's unsigned. So when we print it out using %d, we use all 16 bits, so it's interpreted as 65529.
To understand why, you need to know that the CPU represents signed numbers using the two's complement (maybe not all, but many).
byte n = 1; //0000 0001 = 1
n = ~n + 1; //1111 1110 + 0000 0001 = 1111 1111 = -1
And also, that the type int and unsigned int can be of different sized depending on your CPU. When doing specific stuff like this:
#include <stdint.h>
int8_t ibyte;
uint8_t ubyte;
int16_t iword;
//......
The representation of the values 65529u and -7 are identical for 16-bit ints. Only the interpretation of the bits is different.
For larger ints and these values, you need to sign extend; one way is with logical operations
int y = (int )(x | 0xffff0000u); // assumes 16 to 32 extension, x is > 32767
If speed is not an issue, or divide is fast on your processor,
int y = ((int ) (x * 65536u)) / 65536;
The multiply shifts left 16 bits (again, assuming 16 to 32 extension), and the divide shifts right maintaining the sign.
You are expecting that your int type is 16 bit wide, in which case you'd indeed get a negative value. But most likely it's 32 bits wide, so a signed int can represent 65529 just fine. You can check this by printing sizeof(int).
To answer the question posted in the comment above - try something like this:
unsigned short int x = 65529U;
short int y = (short int)x;
printf("%d\n", y);
or
unsigned short int x = 65529U;
short int y = 0;
memcpy(&y, &x, sizeof(short int);
printf("%d\n", y);
Since converting unsigned values use to represent positive numbers converting it can be done by setting the most significant bit to 0. Therefore a program will not interpret that as a Two`s complement value. One caveat is that this will lose information for numbers that near max of the unsigned type.
template <typename TUnsigned, typename TSinged>
TSinged UnsignedToSigned(TUnsigned val)
{
return val & ~(1 << ((sizeof(TUnsigned) * 8) - 1));
}
I know this is an old question, but I think the responders may have misinterpreted it. I think what was intended was to convert a 16-digit bit sequence received as an unsigned integer (technically, an unsigned short) into a signed integer. This might happen (it recently did to me) when you need to convert something received from a network from network byte order to host byte order. In that case, use a union:
unsigned short value_from_network;
unsigned short host_val = ntohs(value_from_network);
// Now suppose host_val is 65529.
union SignedUnsigned {
short s_int;
unsigned short us_int;
};
SignedUnsigned su;
su.us_int = host_val;
short minus_seven = su.s_int;
And now minus_seven has the value -7.
Related
I have the following code:
unsigned char x = 255;
printf("%x\n", x); // ff
unsigned char tmp = x << 7;
unsigned char y = tmp >> 7;
printf("%x\n", y); // 1
unsigned char z = (x << 7) >> 7;
printf("%x\n", z); // ff
I would have expected y and z to be the same. But they differ depending on whether a intermediary variable is used. It would be interesting to know why this is the case.
This little test is actually more subtle than it looks as the behavior is implementation defined:
unsigned char x = 255; no ambiguity here, x is an unsigned char with value 255, type unsigned char is guaranteed to have enough range to store 255.
printf("%x\n", x); This produces ff on standard output but it would be cleaner to write printf("%hhx\n", x); as printf expects an unsigned int for conversion %x, which x is not. Passing x might actually pass an int or an unsigned int argument.
unsigned char tmp = x << 7; To evaluate the expression x << 7, x being an unsigned char first undergoes the integer promotions defined in the C Standard 6.3.3.1: If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
So if the number of value bits in unsigned char is smaller or equal to that of int (the most common case currently being 8 vs 31), x is first promoted to an int with the same value, which is then shifted left by 7 positions. The result, 0x7f80, is guaranteed to fit in the int type, so the behavior is well defined and converting this value to type unsigned char will effectively truncate the high order bits of the value. If type unsigned char has 8 bits, the value will be 128 (0x80), but if type unsigned char has more bits, the value in tmp can be 0x180, 0x380, 0x780, 0xf80, 0x1f80, 0x3f80 or even 0x7f80.
If type unsigned char is larger than int, which can occur on rare systems where sizeof(int) == 1, x is promoted to unsigned int and the left shift is performed on this type. The value is 0x7f80U, which is guaranteed to fit in type unsigned int and storing that to tmp does not actually lose any information since type unsigned char has the same size as unsigned int. So tmp would have the value 0x7f80 in this case.
unsigned char y = tmp >> 7; The evaluation proceeds the same as above, tmp is promoted to int or unsigned int depending on the system, which preserves its value, and this value is shifted right by 7 positions, which is fully defined because 7 is less than the width of the type (int or unsigned int) and the value is positive. Depending on the number of bits of type unsigned char, the value stored in y can be 1, 3, 7, 15, 31, 63, 127 or 255, the most common architecture will have y == 1.
printf("%x\n", y); again, it would be better t write printf("%hhx\n", y); and the output may be 1 (most common case) or 3, 7, f, 1f, 3f, 7f or ff depending on the number of value bits in type unsigned char.
unsigned char z = (x << 7) >> 7; The integer promotion is performed on x as described above, the value (255) is then shifted left 7 bits as an int or an unsigned int, always producing 0x7f80 and then right shifted by 7 positions, with a final value of 0xff. This behavior is fully defined.
printf("%x\n", z); Once more, the format string should be printf("%hhx\n", z); and the output would always be ff.
Systems where bytes have more than 8 bits are becoming rare these days, but some embedded processors, such as specialized DSPs still do that. It would take a perverse system to fail when passed an unsigned char for a %x conversion specifier, but it is cleaner to either use %hhx or more portably write printf("%x\n", (unsigned)z);
Shifting by 8 instead of 7 in this example would be even more contrived. It would have undefined behavior on systems with 16-bit int and 8-bit char.
The 'intermediate' values in your last case are (full) integers, so the bits that are shifted 'out of range' of the original unsigned char type are retained, and thus they are still set when the result is converted back to a single byte.
From this C11 Draft Standard:
6.5.7 Bitwise shift operators ... 3 The integer promotions are performed on each of the operands. The type of the
result is that of the promoted left operand ...
However, in your first case, unsigned char tmp = x << 7;, the tmp loses the six 'high' bits when the resultant 'full' integer is converted (i.e. truncated) back to a single byte, giving a value of 0x80; when this is then right-shifted in unsigned char y = tmp >> 7;, the result is (as expected) 0x01.
The shift operator is not defined for the char types. The value of any char operand is converted to int and the result of the expression is converted the char type.
So, when you put the left and right shift operators in the same expression the calculation will be performed as type int (without loosing any bit), and the result will be converted to char.
When coding in C, I have accidently found that as for non-Ascii characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are supplemented by 1 rather than 0. (As for Ascii characters, the extra bits are supplemented by 0.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing right shift operations >> on a non-Ascii character, the extra bits on the left end are also supplemented by 1 rather than 0, even though the char variable is explicitly converted to unsigned int (for as for signed int, the extra bits are supplemented by 1 in my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused with these results... Are they concerned with the data structures of char, int and unsigned int, or related to my operating system (ubuntu 14.04), or related to the ANSI C requirements? I have tried to compile this program with both gcc(4.8.4) and clang(3.4), but there is no difference.
Thank you so much!
It is implementation-defined whether char is signed or unsigned. On x86 computers, char is customarily a signed integer type; and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to unsigned integer will use the modulo arithmetic to wrap the signed value into the range of the unsigned type as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in range 0..255.
Likewise, if you want to store signed integers in range -127..127/128, use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
Likewise, for the assignment
unsigned int u_c; u_c = (uint8_t)c[0];,
Since -0x3c or -60 is not in the range of uint16_t, then the actual value is the value (mod UINT16_MAX + 1) that falls in the range of uint16_t; iow, we add or subtract UINT16_MAX + 1 (notice that the integer promotions could trick here so you might need casts if in C code) until the value is in the range. UINT16_MAX is naturally always 0xFFFFF; add 1 to it to get 0x10000. 0x10000 - 0x3C is 0xFFC4 that you saw. And then the uint16_t value is zero-extended to the uint32_t value.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
BTW in i = i >> 1;, i is a signed integer with a negative value; C11 says that the value is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.
So I'm trying to interpret the following output:
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("v = %d, uv = %u\n", v, uv);
Output:
v = -12345
uv = 53191
So the question is: why is this exact output generated when this program is run on a two's complement machine?
What operations lead to this result when casting the value to unsigned short?
My answer assumes 16-bit two's complement arithmetic.
To find the value of -12345, take 12345, complement it, and add 1.
12345 is 0x3039 is 0011000000111001.
Complementing means changing all the 1's to 0's and all the 0's to 1's:
1100111111000110 is 0xcfc6 is 53190.
Add one: 53191.
So internally, -12345 is represented by 0xcfc7 = 53191.
But if you interpret it as an unsigned number, it's obviously just 53191. (And when you assign a signed value to an unsigned integer of the same size, what typically ends up happening is that you assign the exact bit pattern, without converting anything. Later, however, you will typically interpret that value differently, such as when you print it with %u.)
Another, perhaps easier way to think about this is that 16-bit arithmetic "wraps around" at 216 = 65536. So you can think of 65536 as being another name for 0 (just like 0:00 and 24:00 are both names for midnight). So -12345 is 65536 - 12345 = 53191.
The conversion rules, when converting signed integer to an unsigned integer, defined by C standard requires by repeatedly adding the TYPE_MAX + 1 to the value.
From 6.3.1.3 Signed and unsigned integers:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
If USHRT_MAX is 65535 and then adding 65535 + 1 + -12345 is 53191.
The output seen does not depend on 2's complement nor 16 or 32- bit int. The output seen is entirely defined and would be the same on a rare 1's complement or sign-magnitude machine. The result does depend on 16-bit unsigned short
-12345 is within the minimum range of a short, so no issues with that assignment. When a short is passed as a ... argument, is goes thought the usual promotion to an int with no change in value as all short are in the range of int. "%d" expects an int, so the output is "-12345"
short int v = -12345;
printf("%d\n", v); // output "-12345\n"
Assigning a negative number to a unsigned type is well defined. With a 16-bit unsigned short, the value of uv is -12345 plus the minimum multiples of USHRT_MAX+1 (65536 in this case) to a final value of 53191.
Passing an unsigned short as an ... argument, the value is converted to int or unsigned, whichever type contains the entire range of unsigned short. IAC, the values does not change. "%u" matches an unsigned. It also matches an int whose values are expressible as either an int or unsigned.
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("%u\n", v); // output "53191\n"
What operations lead to this result when casting the value to unsigned short?
The casting did not affect the final outcome. The same result would have occurred without the cast. The cast may be useful to quiet warnings.
I am trying to convert 65529 from an unsigned int to a signed int. I tried doing a cast like this:
unsigned int x = 65529;
int y = (int) x;
But y is still returning 65529 when it should return -7. Why is that?
It seems like you are expecting int and unsigned int to be a 16-bit integer. That's apparently not the case. Most likely, it's a 32-bit integer - which is large enough to avoid the wrap-around that you're expecting.
Note that there is no fully C-compliant way to do this because casting between signed/unsigned for values out of range is implementation-defined. But this will still work in most cases:
unsigned int x = 65529;
int y = (short) x; // If short is a 16-bit integer.
or alternatively:
unsigned int x = 65529;
int y = (int16_t) x; // This is defined in <stdint.h>
I know it's an old question, but it's a good one, so how about this?
unsigned short int x = 65529U;
short int y = *(short int*)&x;
printf("%d\n", y);
This works because we are casting the address of x to the signed version of it's type, that's permitted by the C standard. Not all type punning like this (most in fact) is legal. The standard says this.
An object shall have its stored value accessed only by an lvalue that has one of the following types:
the declared type of the object,
a qualified version of the declared type of the object,
a type that is the signed or unsigned type corresponding to the declared type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a character type.
So, alas, since we are accessing the bits of x as if they were a signed (via the pointer), the actual conversion operation is replaced by reading what appears to be just a negative signed short, and conversion takes place without issue. However, it's possible for this to screw up on a one's complement machine, but those are so, so rare, and so, so obsolete, I wouldn't even bother with looking out for them.
#Mysticial got it. A short is usually 16-bit and will illustrate the answer:
int main()
{
unsigned int x = 65529;
int y = (int) x;
printf("%d\n", y);
unsigned short z = 65529;
short zz = (short)z;
printf("%d\n", zz);
}
65529
-7
Press any key to continue . . .
A little more detail. It's all about how signed numbers are stored in memory. Do a search for twos-complement notation for more detail, but here are the basics.
So let's look at 65529 decimal. It can be represented as FFF9h in hexadecimal. We can also represent that in binary as:
11111111 11111001
When we declare short zz = 65529;, the compiler interprets 65529 as a signed value. In twos-complement notation, the top bit signifies whether a signed value is positive or negative. In this case, you can see the top bit is a 1, so it is treated as a negative number. That's why it prints out -7.
For an unsigned short, we don't care about sign since it's unsigned. So when we print it out using %d, we use all 16 bits, so it's interpreted as 65529.
To understand why, you need to know that the CPU represents signed numbers using the two's complement (maybe not all, but many).
byte n = 1; //0000 0001 = 1
n = ~n + 1; //1111 1110 + 0000 0001 = 1111 1111 = -1
And also, that the type int and unsigned int can be of different sized depending on your CPU. When doing specific stuff like this:
#include <stdint.h>
int8_t ibyte;
uint8_t ubyte;
int16_t iword;
//......
The representation of the values 65529u and -7 are identical for 16-bit ints. Only the interpretation of the bits is different.
For larger ints and these values, you need to sign extend; one way is with logical operations
int y = (int )(x | 0xffff0000u); // assumes 16 to 32 extension, x is > 32767
If speed is not an issue, or divide is fast on your processor,
int y = ((int ) (x * 65536u)) / 65536;
The multiply shifts left 16 bits (again, assuming 16 to 32 extension), and the divide shifts right maintaining the sign.
You are expecting that your int type is 16 bit wide, in which case you'd indeed get a negative value. But most likely it's 32 bits wide, so a signed int can represent 65529 just fine. You can check this by printing sizeof(int).
To answer the question posted in the comment above - try something like this:
unsigned short int x = 65529U;
short int y = (short int)x;
printf("%d\n", y);
or
unsigned short int x = 65529U;
short int y = 0;
memcpy(&y, &x, sizeof(short int);
printf("%d\n", y);
Since converting unsigned values use to represent positive numbers converting it can be done by setting the most significant bit to 0. Therefore a program will not interpret that as a Two`s complement value. One caveat is that this will lose information for numbers that near max of the unsigned type.
template <typename TUnsigned, typename TSinged>
TSinged UnsignedToSigned(TUnsigned val)
{
return val & ~(1 << ((sizeof(TUnsigned) * 8) - 1));
}
I know this is an old question, but I think the responders may have misinterpreted it. I think what was intended was to convert a 16-digit bit sequence received as an unsigned integer (technically, an unsigned short) into a signed integer. This might happen (it recently did to me) when you need to convert something received from a network from network byte order to host byte order. In that case, use a union:
unsigned short value_from_network;
unsigned short host_val = ntohs(value_from_network);
// Now suppose host_val is 65529.
union SignedUnsigned {
short s_int;
unsigned short us_int;
};
SignedUnsigned su;
su.us_int = host_val;
short minus_seven = su.s_int;
And now minus_seven has the value -7.
#include "stdio.h"
int main()
{
int x = -13701;
unsigned int y = 3;
signed short z = x / y;
printf("z = %d\n", z);
return 0;
}
I would expect the answer to be -4567. I am getting "z = 17278".
Why does a promotion of these numbers result in 17278?
I executed this in Code Pad.
The hidden type conversions are:
signed short z = (signed short) (((unsigned int) x) / y);
When you mix signed and unsigned types the unsigned ones win. x is converted to unsigned int, divided by 3, and then that result is down-converted to (signed) short. With 32-bit integers:
(unsigned) -13701 == (unsigned) 0xFFFFCA7B // Bit pattern
(unsigned) 0xFFFFCA7B == (unsigned) 4294953595 // Re-interpret as unsigned
(unsigned) 4294953595 / 3 == (unsigned) 1431651198 // Divide by 3
(unsigned) 1431651198 == (unsigned) 0x5555437E // Bit pattern of that result
(short) 0x5555437E == (short) 0x437E // Strip high 16 bits
(short) 0x437E == (short) 17278 // Re-interpret as short
By the way, the signed keyword is unnecessary. signed short is a longer way of saying short. The only type that needs an explicit signed is char. char can be signed or unsigned depending on the platform; all other types are always signed by default.
Short answer: the division first promotes x to unsigned. Only then the result is cast back to a signed short.
Long answer: read this SO thread.
The problems comes from the unsigned int y. Indeed, x/y becomes unsigned. It works with :
#include "stdio.h"
int main()
{
int x = -13701;
signed int y = 3;
signed short z = x / y;
printf("z = %d\n", z);
return 0;
}
Every time you mix "large" signed and unsigned values in additive and multiplicative arithmetic operations, unsigned type "wins" and the evaluation is performed in the domain of the unsigned type ("large" means int and larger). If your original signed value was negative, it first will be converted to positive unsigned value in accordance with the rules of signed-to-unsigned conversions. In your case -13701 will turn into UINT_MAX + 1 - 13701 and the result will be used as the dividend.
Note that the result of signed-to-unsigned conversion on a typical 32-bit int platform will result in unsigned value 4294953595. After division by 3 you'll get 1431651198. This value is too large to be forced into a short object on a platform with 16-bit short type. An attempt to do that results in implementation-defined behavior. So, if the properties of your platform are the same as in my assumptions, then your code produces implementation-defined behavior. Formally speaking, the "meaningless" 17278 value you are getting is nothing more than a specific manifestation of that implementation-defined behavior. It is possible, that if you compiled your code with overflow checking enabled (if your compiler supports them), it would trap on the assignment.