I am trying to convert 65529 from an unsigned int to a signed int. I tried doing a cast like this:
unsigned int x = 65529;
int y = (int) x;
But y is still returning 65529 when it should return -7. Why is that?
It seems like you are expecting int and unsigned int to be 16-bit integers. That's apparently not the case on your platform. Most likely, they are 32-bit integers - which is large enough to avoid the wrap-around you're expecting.
Note that there is no fully C-compliant way to do this because casting between signed/unsigned for values out of range is implementation-defined. But this will still work in most cases:
unsigned int x = 65529;
int y = (short) x; // If short is a 16-bit integer.
or alternatively:
unsigned int x = 65529;
int y = (int16_t) x; // This is defined in <stdint.h>
I know it's an old question, but it's a good one, so how about this?
unsigned short int x = 65529U;
short int y = *(short int*)&x;
printf("%d\n", y);
This works because we are casting the address of x to the signed version of its type, which is permitted by the C standard. Not all type punning like this is legal (most, in fact, is not), but this particular case is. The standard says this:
An object shall have its stored value accessed only by an lvalue that has one of the following types:
the declared type of the object,
a qualified version of the declared type of the object,
a type that is the signed or unsigned type corresponding to the declared type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a character type.
So, since we are accessing the bits of x as if they were signed (via the pointer), no actual conversion operation takes place; we simply read what is, bit for bit, a negative signed short. It is possible for this to go wrong on a one's complement machine, but those are so rare and so obsolete that I wouldn't bother looking out for them.
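For completeness, here is a minimal, self-contained version of that sketch, assuming a 16-bit short and a two's complement machine:

#include <stdio.h>

int main(void)
{
    unsigned short x = 65529U;   /* bit pattern 0xFFF9, assuming 16-bit short */
    short y = *(short *)&x;      /* legal: short is the signed type corresponding to unsigned short */
    printf("%d\n", y);           /* prints -7 on a two's complement machine */
    return 0;
}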
@Mysticial got it. A short is usually 16-bit and will illustrate the answer:
#include <stdio.h>

int main()
{
    unsigned int x = 65529;
    int y = (int) x;
    printf("%d\n", y);

    unsigned short z = 65529;
    short zz = (short)z;
    printf("%d\n", zz);
    return 0;
}
65529
-7
Press any key to continue . . .
A little more detail. It's all about how signed numbers are stored in memory. Do a search for two's complement notation for more detail, but here are the basics.
So let's look at 65529 decimal. It can be represented as FFF9h in hexadecimal. We can also represent that in binary as:
11111111 11111001
When we assign short zz = (short)z;, the compiler interprets that bit pattern as a signed value. In two's complement notation, the top bit signifies whether a signed value is positive or negative. In this case, you can see the top bit is 1, so it is treated as a negative number. That's why it prints out -7.
For an unsigned short, we don't care about sign since it's unsigned. So when we print it out using %d, we use all 16 bits, so it's interpreted as 65529.
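If it helps to see the raw bits directly, a quick sketch (the output values assume a 16-bit short):

#include <stdio.h>

int main(void)
{
    unsigned short z = 65529;
    short zz = (short)z;
    printf("%04hX\n", z);   /* FFF9 - the raw bit pattern */
    printf("%d\n", zz);     /* -7   - the same bits read as signed */
    return 0;
}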
To understand why, you need to know that the CPU represents signed numbers using two's complement (most do, though maybe not all).
unsigned char n = 1; // 0000 0001 = 1
n = ~n + 1;          // 1111 1110 + 0000 0001 = 1111 1111 = -1 when read as signed
Also, the types int and unsigned int can be of different sizes depending on your CPU. When you need specific sizes, use the fixed-width types:
#include <stdint.h>
int8_t ibyte;
uint8_t ubyte;
int16_t iword;
//......
The representation of the values 65529u and -7 are identical for 16-bit ints. Only the interpretation of the bits is different.
For larger ints and these values, you need to sign extend; one way is with logical operations
int y = (int)(x | 0xffff0000u); // assumes 16 to 32 extension, x is > 32767
If speed is not an issue, or divide is fast on your processor,
int y = ((int)(x * 65536u)) / 65536;
The multiply shifts left 16 bits (again, assuming 16 to 32 extension), and the divide shifts right maintaining the sign.
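Putting both techniques together in a runnable sketch (assuming 32-bit int, two's complement, and a 16-bit value in x with the top bit set):

#include <stdio.h>

int main(void)
{
    unsigned int x = 65529;                 /* 0xFFF9: top bit of the 16-bit value is set */
    int y1 = (int)(x | 0xffff0000u);        /* OR in the high bits by hand */
    int y2 = ((int)(x * 65536u)) / 65536;   /* shift left 16, then arithmetic shift right 16 */
    printf("%d %d\n", y1, y2);              /* -7 -7 on a typical 32-bit machine */
    return 0;
}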
You are expecting that your int type is 16 bits wide, in which case you'd indeed get a negative value. But most likely it's 32 bits wide, so a signed int can represent 65529 just fine. You can check this by printing sizeof(int).
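For example (a quick check; %zu is the C99 conversion for size_t):

#include <stdio.h>

int main(void)
{
    printf("%zu\n", sizeof(int));   /* 4 on most modern platforms, i.e. 32 bits */
    return 0;
}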
To answer the question posted in the comment above - try something like this:
unsigned short int x = 65529U;
short int y = (short int)x;
printf("%d\n", y);
or
unsigned short int x = 65529U;
short int y = 0;
memcpy(&y, &x, sizeof(short int)); // requires <string.h>
printf("%d\n", y);
Since unsigned values are used to represent positive numbers, the conversion can be done by setting the most significant bit to 0, so that a program will not interpret the value as two's complement. One caveat is that this loses information for numbers near the maximum of the unsigned type.
template <typename TUnsigned, typename TSigned>
TSigned UnsignedToSigned(TUnsigned val)
{
    return val & ~(TUnsigned(1) << (sizeof(TUnsigned) * 8 - 1));
}
I know this is an old question, but I think the responders may have misinterpreted it. I think what was intended was to convert a 16-bit sequence received as an unsigned integer (technically, an unsigned short) into a signed integer. This might happen (it recently did to me) when you need to convert something received from a network from network byte order to host byte order. In that case, use a union:
unsigned short value_from_network;
unsigned short host_val = ntohs(value_from_network);
// Now suppose host_val is 65529.
union SignedUnsigned {
short s_int;
unsigned short us_int;
};
union SignedUnsigned su;
su.us_int = host_val;
short minus_seven = su.s_int;
And now minus_seven has the value -7.
When coding in C, I accidentally found that for non-ASCII characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1s rather than 0s. (For ASCII characters, the extra bits are filled with 0s.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing right shift operations (>>) on a non-ASCII character, the extra bits on the left end are also filled with 1s rather than 0s, even though the char variable is explicitly converted to unsigned int (whereas for signed int, the extra bits are filled with 1s on my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused by these results... Are they caused by the data structures of char, int and unsigned int, by my operating system (Ubuntu 14.04), or by the ANSI C requirements? I have tried compiling this program with both gcc (4.8.4) and clang (3.4), and there is no difference.
Thank you so much!
It is implementation-defined whether char is signed or unsigned. On x86 computers, char is customarily a signed integer type; and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to unsigned integer will use the modulo arithmetic to wrap the signed value into the range of the unsigned type as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in range 0..255.
Likewise, if you want to store signed integers in range -127..127/128, use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
Likewise, for the assignment
unsigned int u_c; u_c = (uint8_t)c[0];
the cast to uint8_t yields the zero-extended value 0xC4.
As another example, since -0x3c (i.e. -60) is not in the range of uint16_t, the actual value is the value (mod UINT16_MAX + 1) that falls in the range of uint16_t; in other words, we add or subtract UINT16_MAX + 1 (notice that the integer promotions can play a trick here, so you might need casts in C code) until the value is in range. UINT16_MAX is naturally always 0xFFFF; add 1 to it to get 0x10000. 0x10000 - 0x3C is the 0xFFC4 that you saw. The uint16_t value is then zero-extended to the uint32_t value.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
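For instance, with the same string as in the question, casting through unsigned char gives the zero-extended value (a sketch; the second line assumes char is signed on your platform):

#include <stdio.h>

int main(void)
{
    char c[] = "ā";                        /* UTF-8 encoding: 0xC4 0x81 */
    printf("%x\n", (unsigned char)c[0]);   /* c4 - zero-extended */
    printf("%x\n", (unsigned int)c[0]);    /* ffffffc4 if char is signed */
    return 0;
}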
BTW, in i = i >> 1;, i is a signed integer with a negative value; C11 says that the result is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.
So I'm trying to interpret the following output:
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("v = %d, uv = %u\n", v, uv);
Output:
v = -12345, uv = 53191
So the question is: why is this exact output generated when this program is run on a two's complement machine?
What operations lead to this result when casting the value to unsigned short?
My answer assumes 16-bit two's complement arithmetic.
To find the value of -12345, take 12345, complement it, and add 1.
12345 is 0x3039 is 0011000000111001.
Complementing means changing all the 1's to 0's and all the 0's to 1's:
1100111111000110 is 0xcfc6 is 53190.
Add one: 53191.
So internally, -12345 is represented by 0xcfc7 = 53191.
But if you interpret it as an unsigned number, it's obviously just 53191. (And when you assign a signed value to an unsigned integer of the same size, what typically ends up happening is that you assign the exact bit pattern, without converting anything. Later, however, you will typically interpret that value differently, such as when you print it with %u.)
Another, perhaps easier way to think about this is that 16-bit arithmetic "wraps around" at 2^16 = 65536. So you can think of 65536 as being another name for 0 (just like 0:00 and 24:00 are both names for midnight). So -12345 is 65536 - 12345 = 53191.
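You can check both views of the same bits directly (a sketch assuming 16-bit short):

#include <stdio.h>

int main(void)
{
    short v = -12345;
    unsigned short uv = (unsigned short)v;
    printf("%u %x\n", uv, uv);       /* 53191 cfc7 - same bit pattern */
    printf("%d\n", 65536 - 12345);   /* 53191 - the wrap-around view */
    return 0;
}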
The conversion rules defined by the C standard, when converting a signed integer to an unsigned integer, require repeatedly adding TYPE_MAX + 1 to the value until it is in range.
From 6.3.1.3 Signed and unsigned integers:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
With USHRT_MAX being 65535, adding 65535 + 1 to -12345 gives 53191.
The output seen does not depend on two's complement nor on 16- or 32-bit int. The output seen is entirely defined and would be the same on a rare one's complement or sign-magnitude machine. The result does depend on unsigned short being 16-bit.
-12345 is within the range of a short, so there are no issues with that assignment. When a short is passed as a ... argument, it goes through the usual promotion to an int with no change in value, as all short values are in the range of int. "%d" expects an int, so the output is "-12345".
short int v = -12345;
printf("%d\n", v); // output "-12345\n"
Assigning a negative number to an unsigned type is well defined. With a 16-bit unsigned short, the value of uv is -12345 plus the minimum multiple of USHRT_MAX+1 (65536 in this case) needed to bring it into range, for a final value of 53191.
When an unsigned short is passed as a ... argument, the value is converted to int or unsigned, whichever type contains the entire range of unsigned short. In any case, the value does not change. "%u" matches an unsigned. It also matches an int whose value is expressible as either an int or unsigned.
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("%u\n", v); // output "53191\n"
What operations lead to this result when casting the value to unsigned short?
The casting did not affect the final outcome. The same result would have occurred without the cast. The cast may be useful to quiet warnings.
I have an unsigned int that actually stores a signed int, and the signed int ranges from -128 to 127.
I would like to store this value back in the unsigned int so that I can simply
apply a mask 0xFF and get the signed char.
How do I do the conversion ?
i.e.
unsigned int foo = -100;
foo = (char)foo;
char bar = foo & 0xFF;
assert(bar == -100);
The & 0xFF operation will produce a value in the range 0 to 255. It's not possible to get a negative number this way. So, even if you use & 0xFF somewhere, you will still need to apply a conversion later to get to the range -128 to 127.
In your code:
char bar = foo & 0xFF;
there is an implicit conversion to char. This relies on implementation-defined behaviour but this will work on all but the most esoteric of systems. The most common implementation definition is the inverse of the conversion that applies when converting unsigned char to char.
(Your previous line foo = (char)foo; should be removed).
However,
char bar = foo;
would produce exactly the same effect (again, except for on those esoteric systems).
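A minimal sketch of that round trip (it assumes char is signed and relies on the common implementation-defined conversion):

#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned int foo = (unsigned int)(char)-100;   /* stores UINT_MAX + 1 - 100 */
    char bar = foo;                                /* implementation-defined, usually -100 */
    assert(bar == -100);
    printf("%d\n", bar);                           /* -100 */
    return 0;
}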
Since the value stored in unsigned int foo originated from a char in the range -128 to 127, the implicit conversion works in this case. But if unsigned int foo held a bigger value, you would lose bytes when storing it in a char variable and would get unexpected results in your program.
Answering for C,
If you have an unsigned int whose value was set by assignment of a value of type char (where char happens to be a signed type) or of type signed char, where the assigned value was negative, then the stored value is the arithmetic sum of the assigned negative value and one more than UINT_MAX. This will be far beyond the range of values representable by (signed) char on any C system I have ever encountered. If you convert that value back to (signed) char, whether implicitly or via a cast, "either the result is implementation-defined, or an implementation-defined signal is raised" (C2011, 6.3.1.3/3).
Converting back to the original char value in a way that avoids implementation-defined behavior is a bit tricky (but relying on implementation-defined behavior may afford much easier approaches). Certainly, masking off all but the 8 lowest-order value bits does not do the trick, as it always gives you a positive value. Also, it assumes that char is 8 bits wide, which, in principle, is not guaranteed. It does not necessarily even give you the correct bit pattern, as C permits negative integers to be represented in any of three different ways.
Here's an approach that will work on any conforming C system:
unsigned int foo = SOME_SIGNED_CHAR_VALUE;
signed char bar;
/* ... */
if (foo <= SCHAR_MAX) {
    /* foo's value is representable as a signed char */
    bar = foo;
} else {
    /* mask off the highest-order value bits to yield a value that fits in an int */
    int foo2 = foo & INT_MAX;
    /* reverse the conversion to unsigned int, as if unsigned int had the same
       number of value bits as int; the other bits are already accounted for */
    bar = (foo2 - INT_MAX) - 1;
}
That relies only on characteristics of integer representation and conversion that C itself defines.
Don't do it.
Casting to a smaller size may truncate the value. Casting from signed to unsigned or the other way around may result in a wrong value (e.g. 255 -> -1).
If you have to make calculations with different data types, pick one common type, preferably signed and long int (32-bit), and check boundaries before casting down (to a smaller size), as in the sketch below.
Signed types help you detect underflow (e.g. when a result goes below 0); long int (or simply int, which means the natural word length) suits 32-bit and 64-bit machines and is big enough for most purposes.
Also try to avoid mixing types in formulas, especially when they contain division (/).
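One way to do that boundary check before narrowing (a sketch using the limits from <limits.h>):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    long value = 300;   /* computed in a wide signed type */
    signed char narrow;

    if (value >= SCHAR_MIN && value <= SCHAR_MAX) {
        narrow = (signed char)value;
        printf("ok: %d\n", narrow);
    } else {
        printf("out of range, not converting\n");
    }
    return 0;
}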
#include "stdio.h"
int main()
{
int x = -13701;
unsigned int y = 3;
signed short z = x / y;
printf("z = %d\n", z);
return 0;
}
I would expect the answer to be -4567. I am getting "z = 17278".
Why does a promotion of these numbers result in 17278?
I executed this in Code Pad.
The hidden type conversions are:
signed short z = (signed short) (((unsigned int) x) / y);
When you mix signed and unsigned types the unsigned ones win. x is converted to unsigned int, divided by 3, and then that result is down-converted to (signed) short. With 32-bit integers:
(unsigned) -13701 == (unsigned) 0xFFFFCA7B // Bit pattern
(unsigned) 0xFFFFCA7B == (unsigned) 4294953595 // Re-interpret as unsigned
(unsigned) 4294953595 / 3 == (unsigned) 1431651198 // Divide by 3
(unsigned) 1431651198 == (unsigned) 0x5555437E // Bit pattern of that result
(short) 0x5555437E == (short) 0x437E // Strip high 16 bits
(short) 0x437E == (short) 17278 // Re-interpret as short
By the way, the signed keyword is unnecessary. signed short is a longer way of saying short. The only type that needs an explicit signed is char. char can be signed or unsigned depending on the platform; all other types are always signed by default.
Short answer: the division first promotes x to unsigned. Only then the result is cast back to a signed short.
Long answer: read this SO thread.
The problem comes from the unsigned int y. Indeed, x/y becomes unsigned. It works with:
#include "stdio.h"
int main()
{
int x = -13701;
signed int y = 3;
signed short z = x / y;
printf("z = %d\n", z);
return 0;
}
Every time you mix "large" signed and unsigned values in additive and multiplicative arithmetic operations, unsigned type "wins" and the evaluation is performed in the domain of the unsigned type ("large" means int and larger). If your original signed value was negative, it first will be converted to positive unsigned value in accordance with the rules of signed-to-unsigned conversions. In your case -13701 will turn into UINT_MAX + 1 - 13701 and the result will be used as the dividend.
Note that the result of the signed-to-unsigned conversion on a typical 32-bit int platform will be the unsigned value 4294953595. After division by 3 you'll get 1431651198. This value is too large to be forced into a short object on a platform with a 16-bit short type. An attempt to do that results in implementation-defined behavior. So, if the properties of your platform match these assumptions, then your code produces implementation-defined behavior. Formally speaking, the "meaningless" 17278 value you are getting is nothing more than a specific manifestation of that implementation-defined behavior. It is possible that if you compiled your code with overflow checks enabled (if your compiler supports them), it would trap on the assignment.
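On a platform matching those assumptions (32-bit int, 16-bit short), a sketch confirming the intermediate values:

#include <stdio.h>

int main(void)
{
    int x = -13701;
    unsigned int y = 3;
    unsigned int q = x / y;            /* x converts to 4294953595u first */
    printf("%u\n", (unsigned int)x);   /* 4294953595 */
    printf("%u\n", q);                 /* 1431651198 */
    printf("%d\n", (short)q);          /* implementation-defined; commonly 17278 */
    return 0;
}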