Edit: I updated the example to be C. I am concerned specifically with C and not C++ (sorry for the confusion, see situation below).
I am looking for a safe way to convert a signed integer to an unsigned integer while always maintaining the exact same bit pattern between conversions. As I understand it, simply casting has undefined or implementation dependent behavior so it is not safe to rely on (case A below). But what about bit-wise operators like OR (case B below)? Can bit-wise OR be used to safely convert signed to unsigned? What about the reverse?
Example:
#include <stdio.h>
int main() {
// NOTE: assuming 32bit ints
// example bit pattern: 11111111110001110001001111011010
// signed int value: -3730470
// unsigned int value: 4291236826
// example 1
// signed -> unsigned
int s1 = -3730470;
unsigned int u1a = (unsigned int)s1;
unsigned int u1b = (unsigned int)0 | s1;
printf("%u\n%u\n", u1a, u1b);
// example 2
// unsigned -> signed
unsigned int u2 = 4291236826;
int s2a = (int)u2;
int s2b = (int)0 | u2;
printf("%i\n%i\n", s2a, s2b);
}
Situation: I am writing a PostgreSQL C-Language function/extension to add popcount functionality (my first attempt code here). PostgreSQL does not support unsigned types (ref). All the efficient methods of calculating popcount I found require unsigned data types to work correctly. Therefore, I must be able to convert the signed data types to an unsigned data type without changing the bit pattern.
Off topic: I do realize that an alternate solution would be to use PostgreSQL bit string bit and varbit data types instead of the integer data types, but for my purposes the integer data types are much easier to use and manage.
a safe way to convert a signed integer to an unsigned integer while always maintaining the exact same bit pattern between conversions
A union will work as below even if int uses a rare non-2's complement representation. Only on very exceptional platforms (ticking away in a silicon graveyard) where INT_MAX == UINT_MAX will this be a problem.
union {
int i;
unsigned u;
} x = { some_int };
printf("%d\n", some_int);
printf("%u\n", x.u);
Yet if one can limit oneself to common 2's complement int, the below is sufficient.
unsigned u = (unsigned) some_int;
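For the popcount use case in the question, a minimal sketch might look like the following. It assumes a two's complement int, so that the cast keeps the bit pattern, and it uses a hypothetical helper name, popcount_int. The loop is the well-known "clear the lowest set bit" method and is not specific to PostgreSQL.

```c
/* Hypothetical helper: count the set bits of an int by converting it
 * to unsigned first. On two's complement machines the conversion
 * leaves the bit pattern unchanged. */
int popcount_int(int value)
{
    unsigned int bits = (unsigned int)value;
    int count = 0;

    while (bits != 0) {
        bits &= bits - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

With 32-bit ints, the question's example value -3730470 has 21 bits set.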
But what about bit-wise operators like OR (case B below)?
Can bit-wise OR be used to safely convert signed to unsigned?
The following | is like a hidden cast due to integer promotions:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. C11dr §6.3.1.1 3
int s1 = -3730470;
unsigned int u1b = (unsigned int)0 | s1;
// just like
= (unsigned int)0 | (unsigned int)s1;
= (unsigned int)s1;
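A small sketch of that equivalence, with function names made up for illustration. Both functions should return the same value for every int, because the int operand is implicitly converted to unsigned int before the OR is evaluated.

```c
/* Case A: explicit cast. */
unsigned int convert_by_cast(int s)
{
    return (unsigned int)s;
}

/* Case B: OR with an unsigned zero. s is implicitly converted to
 * unsigned int before the operation, so this is the same conversion. */
unsigned int convert_by_or(int s)
{
    return 0u | s;
}
```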
What about the reverse?
Converting an unsigned int to a signed int is well defined if the value is representable in both types, that is, within [0...INT_MAX]. Converting an out-of-int-range unsigned to int is ...
either the result is implementation-defined or an implementation-defined signal is raised. §6.3.1.3 3
Best to use unsigned types for bit manipulations.
The below code may often work as hoped, but should not be used for robust coding.
// NOTE: assuming 32bit ints, etc.
unsigned int u2 = 4291236826;
int s2a = (int)u2; // avoid this
Alternative
int s2a;
if (u2 > INT_MAX) {
// Handle with some other code
} else {
s2a = (int) u2; // OK
}
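If the out-of-range case should wrap rather than be rejected, one possible sketch avoids the implementation-defined cast by doing the arithmetic by hand. It assumes INT_MIN == -INT_MAX - 1 and UINT_MAX == 2 * INT_MAX + 1 (the usual two's complement layout), and uint_to_int_wrap is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical helper: map an unsigned int onto the int that has the
 * same two's complement bit pattern, without an out-of-range cast.
 * Assumes INT_MIN == -INT_MAX - 1 and UINT_MAX == 2 * INT_MAX + 1. */
int uint_to_int_wrap(unsigned int u)
{
    if (u <= (unsigned int)INT_MAX) {
        return (int)u;                  /* value fits: well defined */
    }
    /* Here UINT_MAX - u is at most INT_MAX, so the cast is in range. */
    return -(int)(UINT_MAX - u) - 1;
}
```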
BTW: better to append u to unsigned constants like 4291236826 to convey to the compiler that indeed an unsigned constant is intended and not a long long like 4291236826.
unsigned int u2 = 4291236826u;
What about ...
int s1 = -3730470;
unsigned int u1 = *(unsigned int*)&s1;
unsigned int u2 = 4291236826;
int s2a = *(int*)&u2;
I would like to convert int to byte in C.
How could i get the value?
in Java
int num = 167;
byte b = num.toByte(); // -89
in C
int num = 167;
???
C has no built-in type named Byte. If you don't want to pull in any extra headers, you can define one yourself:
typedef unsigned char Byte;
You can then declare variables of that type:
int bar = 15;
Byte foo = (Byte)bar;
You can simply cast to a byte:
unsigned char b=(unsigned char)num;
Note that if num is greater than 255 or less than 0, C won't crash. The value is silently reduced modulo 256 (for an 8-bit unsigned char), which is probably not the result you wanted.
In computer science, the term byte is well-defined as an 8 bit chunk of raw data. Apparently Java uses a different definition than computer science...
-89 is not the value 167 "converted to a byte". 167 already fits in a byte, so no conversion is necessary.
-89 is the value 167 converted to signed 2's complement with 8 bits representation.
The most correct type to use for signed 2's complement 8 bit integers in C is int8_t from stdint.h.
Converting from int to int8_t is done implicitly in C upon assignment. There is no need for a cast.
int num = 167;
int8_t b = num;
In Java, byte is a signed integer type with a range of -128 to 127.
The corresponding type in C is int8_t defined in <stdint.h> for architectures with 8-bit bytes. It is an alias for signed char.
You can write:
#include <stdint.h>
void f() {
int num = 167;
int8_t b = num; // or signed char b = num;
...
If your compiler emits a warning about the implicit conversion to a smaller type, you can add an explicit cast:
int8_t b = (int8_t)num; // or signed char b = (signed char)num;
Note, however, that it is much more common to think of 8-bit bytes as unsigned quantities in the range 0 to 255, for which one would use type uint8_t or unsigned char. The reason Java's byte is a signed type might be that the language has no unsigned types, but it is quite confusing for non-native readers.
byte can also be defined as a typedef:
typedef unsigned char byte; // 0-255;
or
typedef signed char byte; // -128-127;
Do not use type char because it is implementation defined whether this type is signed or unsigned by default. Reserve type char for the characters in C strings, although many functions actually consider these to be unsigned: strcmp(), functions from <ctype.h>...
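To reproduce Java's int-to-byte narrowing without depending on an implementation-defined conversion, a sketch along these lines may help. The name to_java_byte is hypothetical, and the code assumes int8_t is available on the platform.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 8 bits of num and interpret them
 * as a signed two's complement value, like Java's (byte) cast. */
int8_t to_java_byte(int num)
{
    unsigned int low = (unsigned int)num & 0xFFu;   /* 0..255 */

    /* Bit patterns 128..255 stand for -128..-1. */
    return (int8_t)(low >= 128u ? (int)low - 256 : (int)low);
}
```

For the question's example, to_java_byte(167) gives -89.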
I am trying to convert 65529 from an unsigned int to a signed int. I tried doing a cast like this:
unsigned int x = 65529;
int y = (int) x;
But y is still returning 65529 when it should return -7. Why is that?
It seems like you are expecting int and unsigned int to be a 16-bit integer. That's apparently not the case. Most likely, it's a 32-bit integer - which is large enough to avoid the wrap-around that you're expecting.
Note that there is no fully C-compliant way to do this because casting between signed/unsigned for values out of range is implementation-defined. But this will still work in most cases:
unsigned int x = 65529;
int y = (short) x; // If short is a 16-bit integer.
or alternatively:
unsigned int x = 65529;
int y = (int16_t) x; // This is defined in <stdint.h>
I know it's an old question, but it's a good one, so how about this?
unsigned short int x = 65529U;
short int y = *(short int*)&x;
printf("%d\n", y);
This works because we are casting the address of x to a pointer to the signed version of its type, which is permitted by the C standard. Most type punning like this is not legal. The standard says:
An object shall have its stored value accessed only by an lvalue that has one of the following types:
the declared type of the object,
a qualified version of the declared type of the object,
a type that is the signed or unsigned type corresponding to the declared type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a character type.
So, since we are accessing the bits of x as if they were a signed short (via the pointer), no conversion operation takes place at all: we simply read what appears to be a negative signed short. This can go wrong on a one's complement machine, but those are so rare and so obsolete that I wouldn't bother looking out for them.
@Mysticial got it. A short is usually 16-bit and will illustrate the answer:
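#include <stdio.h>

int main(void)
{
    unsigned int x = 65529;
    int y = (int) x;
    printf("%d\n", y);      // int is 32 bits here, so the value fits
    unsigned short z = 65529;
    short zz = (short)z;
    printf("%d\n", zz);     // short is 16 bits here, so the value wraps
    return 0;
}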
65529
-7
Press any key to continue . . .
A little more detail. It's all about how signed numbers are stored in memory. Do a search for twos-complement notation for more detail, but here are the basics.
So let's look at 65529 decimal. It can be represented as FFF9h in hexadecimal. We can also represent that in binary as:
11111111 11111001
When we declare short zz = 65529;, the compiler interprets 65529 as a signed value. In twos-complement notation, the top bit signifies whether a signed value is positive or negative. In this case, you can see the top bit is a 1, so it is treated as a negative number. That's why it prints out -7.
For an unsigned short, we don't care about sign since it's unsigned. So when we print it out using %d, we use all 16 bits, so it's interpreted as 65529.
To understand why, you need to know that the CPU represents signed numbers using the two's complement (maybe not all, but many).
byte n = 1; //0000 0001 = 1
n = ~n + 1; //1111 1110 + 0000 0001 = 1111 1111 = -1
Also, the types int and unsigned int can have different sizes depending on your CPU. When doing size-specific work like this, the fixed-width types help:
#include <stdint.h>
int8_t ibyte;
uint8_t ubyte;
int16_t iword;
//......
The representations of the values 65529u and -7 are identical for 16-bit ints. Only the interpretation of the bits is different.
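That claim can be checked directly by comparing the stored bytes of the two objects. This is a sketch with a hypothetical function name. It relies on int16_t and uint16_t, which, where they exist, have no padding bits, and int16_t is two's complement.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when u and s are stored with identical bytes, else 0. */
int same_bits16(uint16_t u, int16_t s)
{
    return memcmp(&u, &s, sizeof u) == 0;
}
```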
For larger ints and these values, you need to sign extend; one way is with logical operations
int y = (int )(x | 0xffff0000u); // assumes 16 to 32 extension, x is > 32767
If speed is not an issue, or divide is fast on your processor,
int y = ((int ) (x * 65536u)) / 65536;
The multiply shifts left 16 bits (again, assuming 16 to 32 extension), and the divide shifts right maintaining the sign.
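A variant that works for any 16-bit input, not only x > 32767, and that avoids an out-of-range cast: flip the sign bit, then subtract its weight. This is a sketch, with sign_extend16 as a made-up name.

```c
/* Hypothetical helper: interpret the low 16 bits of x as a signed
 * two's complement number and return it as an int. */
int sign_extend16(unsigned int x)
{
    /* Flipping bit 15 maps 0x8000..0xFFFF to 0..32767 and
     * 0x0000..0x7FFF to 32768..65535. */
    long flipped = (long)((x & 0xFFFFu) ^ 0x8000u);

    return (int)(flipped - 32768L);   /* result is in -32768..32767 */
}
```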
You are expecting that your int type is 16 bit wide, in which case you'd indeed get a negative value. But most likely it's 32 bits wide, so a signed int can represent 65529 just fine. You can check this by printing sizeof(int).
To answer the question posted in the comment above - try something like this:
unsigned short int x = 65529U;
short int y = (short int)x;
printf("%d\n", y);
or
unsigned short int x = 65529U;
short int y = 0;
memcpy(&y, &x, sizeof(short int)); // needs #include <string.h>
printf("%d\n", y);
Since unsigned values are used to represent non-negative numbers, the conversion can be done by setting the most significant bit to 0, so that a program will not interpret the result as a negative two's complement value. One caveat is that this loses information for numbers in the upper half of the unsigned type's range.
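template <typename TUnsigned, typename TSigned>
TSigned UnsignedToSigned(TUnsigned val)
{
    // Clear the most significant bit. The mask is built from TUnsigned(1)
    // so the shift is valid for types wider than int as well.
    return static_cast<TSigned>(val & ~(TUnsigned(1) << ((sizeof(TUnsigned) * 8) - 1)));
}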
I know this is an old question, but I think the responders may have misinterpreted it. I think what was intended was to convert a 16-bit sequence received as an unsigned integer (technically, an unsigned short) into a signed integer. This might happen (it recently did to me) when you need to convert something received from a network from network byte order to host byte order. In that case, use a union:
unsigned short value_from_network;
unsigned short host_val = ntohs(value_from_network);
// Now suppose host_val is 65529.
union SignedUnsigned {
short s_int;
unsigned short us_int;
};
union SignedUnsigned su; // C requires the union keyword here
su.us_int = host_val;
short minus_seven = su.s_int;
And now minus_seven has the value -7.
#include "stdio.h"
int main()
{
int x = -13701;
unsigned int y = 3;
signed short z = x / y;
printf("z = %d\n", z);
return 0;
}
I would expect the answer to be -4567. I am getting "z = 17278".
Why does a promotion of these numbers result in 17278?
I executed this in Code Pad.
The hidden type conversions are:
signed short z = (signed short) (((unsigned int) x) / y);
When you mix signed and unsigned types the unsigned ones win. x is converted to unsigned int, divided by 3, and then that result is down-converted to (signed) short. With 32-bit integers:
(unsigned) -13701 == (unsigned) 0xFFFFCA7B // Bit pattern
(unsigned) 0xFFFFCA7B == (unsigned) 4294953595 // Re-interpret as unsigned
(unsigned) 4294953595 / 3 == (unsigned) 1431651198 // Divide by 3
(unsigned) 1431651198 == (unsigned) 0x5555437E // Bit pattern of that result
(short) 0x5555437E == (short) 0x437E // Strip high 16 bits
(short) 0x437E == (short) 17278 // Re-interpret as short
By the way, the signed keyword is unnecessary. signed short is a longer way of saying short. The only type that needs an explicit signed is char. char can be signed or unsigned depending on the platform; all other types are always signed by default.
Short answer: the division first converts x to unsigned. Only then is the result converted back to a signed short.
Long answer: read this SO thread.
The problem comes from the unsigned int y: x / y becomes unsigned. It works with:
#include "stdio.h"
int main()
{
int x = -13701;
signed int y = 3;
signed short z = x / y;
printf("z = %d\n", z);
return 0;
}
Every time you mix "large" signed and unsigned values in additive and multiplicative arithmetic operations, unsigned type "wins" and the evaluation is performed in the domain of the unsigned type ("large" means int and larger). If your original signed value was negative, it first will be converted to positive unsigned value in accordance with the rules of signed-to-unsigned conversions. In your case -13701 will turn into UINT_MAX + 1 - 13701 and the result will be used as the dividend.
Note that the signed-to-unsigned conversion on a typical 32-bit int platform yields the unsigned value 4294953595. After division by 3 you'll get 1431651198. This value is too large to fit into a short object on a platform with a 16-bit short type. An attempt to do that results in implementation-defined behavior. So, if the properties of your platform are the same as in my assumptions, then your code produces implementation-defined behavior. Formally speaking, the "meaningless" 17278 value you are getting is nothing more than a specific manifestation of that implementation-defined behavior. It is possible that if you compiled your code with overflow checking enabled (if your compiler supports it), it would trap on the assignment.
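The steps described above can be reproduced in a short sketch. The function names are hypothetical, and the unsigned quotient quoted in the note assumes a 32-bit int.

```c
/* y is unsigned, so x is converted to unsigned int before dividing. */
unsigned int divide_mixed(int x, unsigned int y)
{
    return x / y;
}

/* Converting y to int first keeps the division signed.
 * Only valid when the value of y fits in an int. */
int divide_signed(int x, unsigned int y)
{
    return x / (int)y;
}
```

With 32-bit ints, divide_mixed(-13701, 3u) gives 1431651198, whose low 16 bits are 0x437E (17278), while divide_signed(-13701, 3u) gives the expected -4567.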
I'm trying to figure out if the C Standard (C90, though I'm working off Derek Jones' annotated C99 book) guarantees that I will not lose precision multiplying two unsigned 8-bit values and storing to a 16-bit result. An example statement is as follows:
unsigned char foo;
unsigned int foo_u16 = foo * 10;
Our Keil 8051 compiler (v7.50 at present) will generate a MUL AB instruction which stores the MSB in the B register and the LSB in the accumulator. If I cast foo to an unsigned int first:
unsigned int foo_u16 = (unsigned int)foo * 10;
then the compiler correctly decides I want an unsigned int there and generates an expensive call to a 16x16 bit integer multiply routine. I would like to argue beyond reasonable doubt that this defensive measure is not necessary. As I read the integer promotions described in 6.3.1.1, the effect of the first line shall be as if foo and 10 were promoted to unsigned int, the multiplication performed, and the result stored as unsigned int in foo_u16. If the compiler knows an instruction that does 8x8->16 bit multiplications without loss of precision, so much the better; but the precision is guaranteed. Am I reading this correctly?
Best regards,
Craig Blome
The promotion is guaranteed, but the promotion is made to signed int type if the range of unsigned char fits into the range of signed int. So (assuming it fits) from the language point of view your
unsigned int foo_u16 = foo * 10;
is equivalent to
unsigned int foo_u16 = (signed) foo * 10;
while what you apparently want is
unsigned int foo_u16 = (unsigned) foo * 10;
The result of the multiplication can be different if it (the result) doesn't fit into the signed int range.
If your compiler interprets it differently, it could be a bug in the compiler (again, under the assumption that range of unsigned char fits into the range of signed int).
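To illustrate where the difference shows up, here is a sketch with a hypothetical function name. With a 16-bit int, 255 * 255 = 65025 does not fit in a signed int, so the promoted signed multiplication would overflow, while the unsigned version below is well defined. The question's foo * 10 is at most 2550 for an 8-bit unsigned char, which fits either way.

```c
/* Multiply two bytes in unsigned arithmetic. Converting one operand
 * to unsigned int makes the other follow through the usual arithmetic
 * conversions, so no signed overflow can occur. */
unsigned int mul_bytes_unsigned(unsigned char a, unsigned char b)
{
    return (unsigned int)a * b;
}
```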