Suppose I have the following C code.
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
What implicit conversions are going on here, and is this code safe for all values of u and i? (Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
Short Answer
Your i will be converted to an unsigned integer by adding UINT_MAX + 1, then the addition will be carried out with the unsigned values, resulting in a large result (depending on the values of u and i).
Long Answer
According to the C99 Standard:
6.3.1.8 Usual arithmetic conversions
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
In your case, we have one unsigned int (u) and signed int (i). Referring to (3) above, since both operands have the same rank, your i will need to be converted to an unsigned integer.
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Now we need to refer to (2) above. Your i will be converted to an unsigned value by adding UINT_MAX + 1. So the result will depend on how UINT_MAX is defined on your implementation. It will be large, but it will not overflow, because:
6.2.5 (9)
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Bonus: Arithmetic Conversion Semi-WTF
#include <stdio.h>
int main(void)
{
unsigned int plus_one = 1;
int minus_one = -1;
if(plus_one < minus_one)
printf("1 < -1");
else
printf("boring");
return 0;
}
You can use this link to try this online: https://repl.it/repls/QuickWhimsicalBytes
Bonus: Arithmetic Conversion Side Effect
Arithmetic conversion rules can be used to get the value of UINT_MAX by initializing an unsigned value to -1, ie:
unsigned int umax = -1; // umax set to UINT_MAX
This is guaranteed to be portable regardless of the signed number representation of the system because of the conversion rules described above. See this SO question for more information: Is it safe to use -1 to set all bits to true?
Conversion from signed to unsigned does not necessarily just copy or reinterpret the representation of the signed value. Quoting the C standard (C99 6.3.1.3):
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
For the two's complement representation that's nearly universal these days, the rules do correspond to reinterpreting the bits. But for other representations (sign-and-magnitude or ones' complement), the C implementation must still arrange for the same result, which means that the conversion can't just copy the bits. For example, (unsigned)-1 == UINT_MAX, regardless of the representation.
In general, conversions in C are defined to operate on values, not on representations.
To answer the original question:
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
The value of i is converted to unsigned int, yielding UINT_MAX + 1 - 5678. This value is then added to the unsigned value 1234, yielding UINT_MAX + 1 - 4444.
(Unlike unsigned overflow, signed overflow invokes undefined behavior. Wraparound is common, but is not guaranteed by the C standard -- and compiler optimizations can wreak havoc on code that makes unwarranted assumptions.)
Referring to The C Programming Language, Second Edition (ISBN 0131103628),
Your addition operation causes the int to be converted to an unsigned int.
Assuming two's complement representation and equally sized types, the bit pattern does not change.
Conversion from unsigned int to signed int is implementation dependent. (But it probably works the way you expect on most platforms these days.)
The rules are a little more complicated in the case of combining signed and unsigned of differing sizes.
When converting from signed to unsigned there are two possibilities. Numbers that were originally positive remain (or are interpreted as) the same value. Number that were originally negative will now be interpreted as larger positive numbers.
When one unsigned and one signed variable are added (or any binary operation) both are implicitly converted to unsigned, which would in this case result in a huge result.
So it is safe in the sense of that the result might be huge and wrong, but it will never crash.
As was previously answered, you can cast back and forth between signed and unsigned without a problem. The border case for signed integers is -1 (0xFFFFFFFF). Try adding and subtracting from that and you'll find that you can cast back and have it be correct.
However, if you are going to be casting back and forth, I would strongly advise naming your variables such that it is clear what type they are, eg:
int iValue, iResult;
unsigned int uValue, uResult;
It is far too easy to get distracted by more important issues and forget which variable is what type if they are named without a hint. You don't want to cast to an unsigned and then use that as an array index.
What implicit conversions are going on here,
i will be converted to an unsigned integer.
and is this code safe for all values of u and i?
Safe in the sense of being well-defined yes (see https://stackoverflow.com/a/50632/5083516 ).
The rules are written in typically hard to read standards-speak but essentially whatever representation was used in the signed integer the unsigned integer will contain a 2's complement representation of the number.
Addition, subtraction and multiplication will work correctly on these numbers resulting in another unsigned integer containing a twos complement number representing the "real result".
division and casting to larger unsigned integer types will have well-defined results but those results will not be 2's complement representations of the "real result".
(Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
While conversions from signed to unsigned are defined by the standard the reverse is implementation-defined both gcc and msvc define the conversion such that you will get the "real result" when converting a 2's complement number stored in an unsigned integer back to a signed integer. I expect you will only find any other behaviour on obscure systems that don't use 2's complement for signed integers.
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
https://msdn.microsoft.com/en-us/library/0eex498h.aspx
Horrible Answers Galore
Ozgur Ozcitak
When you cast from signed to unsigned
(and vice versa) the internal
representation of the number does not
change. What changes is how the
compiler interprets the sign bit.
This is completely wrong.
Mats Fredriksson
When one unsigned and one signed
variable are added (or any binary
operation) both are implicitly
converted to unsigned, which would in
this case result in a huge result.
This is also wrong. Unsigned ints may be promoted to ints should they have equal precision due to padding bits in the unsigned type.
smh
Your addition operation causes the int
to be converted to an unsigned int.
Wrong. Maybe it does and maybe it doesn't.
Conversion from unsigned int to signed
int is implementation dependent. (But
it probably works the way you expect
on most platforms these days.)
Wrong. It is either undefined behavior if it causes overflow or the value is preserved.
Anonymous
The value of i is converted to
unsigned int ...
Wrong. Depends on the precision of an int relative to an unsigned int.
Taylor Price
As was previously answered, you can
cast back and forth between signed and
unsigned without a problem.
Wrong. Trying to store a value outside the range of a signed integer results in undefined behavior.
Now I can finally answer the question.
Should the precision of int be equal to unsigned int, u will be promoted to a signed int and you will get the value -4444 from the expression (u+i). Now, should u and i have other values, you may get overflow and undefined behavior but with those exact numbers you will get -4444 [1]. This value will have type int. But you are trying to store that value into an unsigned int so that will then be cast to an unsigned int and the value that result will end up having would be (UINT_MAX+1) - 4444.
Should the precision of unsigned int be greater than that of an int, the signed int will be promoted to an unsigned int yielding the value (UINT_MAX+1) - 5678 which will be added to the other unsigned int 1234. Should u and i have other values, which make the expression fall outside the range {0..UINT_MAX} the value (UINT_MAX+1) will either be added or subtracted until the result DOES fall inside the range {0..UINT_MAX) and no undefined behavior will occur.
What is precision?
Integers have padding bits, sign bits, and value bits. Unsigned integers do not have a sign bit obviously. Unsigned char is further guaranteed to not have padding bits. The number of values bits an integer has is how much precision it has.
[Gotchas]
The macro sizeof macro alone cannot be used to determine precision of an integer if padding bits are present. And the size of a byte does not have to be an octet (eight bits) as defined by C99.
[1] The overflow may occur at one of two points. Either before the addition (during promotion) - when you have an unsigned int which is too large to fit inside an int. The overflow may also occur after the addition even if the unsigned int was within the range of an int, after the addition the result may still overflow.
Related
Suppose I have the following C code.
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
What implicit conversions are going on here, and is this code safe for all values of u and i? (Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
Short Answer
Your i will be converted to an unsigned integer by adding UINT_MAX + 1, then the addition will be carried out with the unsigned values, resulting in a large result (depending on the values of u and i).
Long Answer
According to the C99 Standard:
6.3.1.8 Usual arithmetic conversions
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
In your case, we have one unsigned int (u) and signed int (i). Referring to (3) above, since both operands have the same rank, your i will need to be converted to an unsigned integer.
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Now we need to refer to (2) above. Your i will be converted to an unsigned value by adding UINT_MAX + 1. So the result will depend on how UINT_MAX is defined on your implementation. It will be large, but it will not overflow, because:
6.2.5 (9)
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Bonus: Arithmetic Conversion Semi-WTF
#include <stdio.h>
int main(void)
{
unsigned int plus_one = 1;
int minus_one = -1;
if(plus_one < minus_one)
printf("1 < -1");
else
printf("boring");
return 0;
}
You can use this link to try this online: https://repl.it/repls/QuickWhimsicalBytes
Bonus: Arithmetic Conversion Side Effect
Arithmetic conversion rules can be used to get the value of UINT_MAX by initializing an unsigned value to -1, ie:
unsigned int umax = -1; // umax set to UINT_MAX
This is guaranteed to be portable regardless of the signed number representation of the system because of the conversion rules described above. See this SO question for more information: Is it safe to use -1 to set all bits to true?
Conversion from signed to unsigned does not necessarily just copy or reinterpret the representation of the signed value. Quoting the C standard (C99 6.3.1.3):
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
For the two's complement representation that's nearly universal these days, the rules do correspond to reinterpreting the bits. But for other representations (sign-and-magnitude or ones' complement), the C implementation must still arrange for the same result, which means that the conversion can't just copy the bits. For example, (unsigned)-1 == UINT_MAX, regardless of the representation.
In general, conversions in C are defined to operate on values, not on representations.
To answer the original question:
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
The value of i is converted to unsigned int, yielding UINT_MAX + 1 - 5678. This value is then added to the unsigned value 1234, yielding UINT_MAX + 1 - 4444.
(Unlike unsigned overflow, signed overflow invokes undefined behavior. Wraparound is common, but is not guaranteed by the C standard -- and compiler optimizations can wreak havoc on code that makes unwarranted assumptions.)
Referring to The C Programming Language, Second Edition (ISBN 0131103628),
Your addition operation causes the int to be converted to an unsigned int.
Assuming two's complement representation and equally sized types, the bit pattern does not change.
Conversion from unsigned int to signed int is implementation dependent. (But it probably works the way you expect on most platforms these days.)
The rules are a little more complicated in the case of combining signed and unsigned of differing sizes.
When converting from signed to unsigned there are two possibilities. Numbers that were originally positive remain (or are interpreted as) the same value. Number that were originally negative will now be interpreted as larger positive numbers.
When one unsigned and one signed variable are added (or any binary operation) both are implicitly converted to unsigned, which would in this case result in a huge result.
So it is safe in the sense of that the result might be huge and wrong, but it will never crash.
As was previously answered, you can cast back and forth between signed and unsigned without a problem. The border case for signed integers is -1 (0xFFFFFFFF). Try adding and subtracting from that and you'll find that you can cast back and have it be correct.
However, if you are going to be casting back and forth, I would strongly advise naming your variables such that it is clear what type they are, eg:
int iValue, iResult;
unsigned int uValue, uResult;
It is far too easy to get distracted by more important issues and forget which variable is what type if they are named without a hint. You don't want to cast to an unsigned and then use that as an array index.
What implicit conversions are going on here,
i will be converted to an unsigned integer.
and is this code safe for all values of u and i?
Safe in the sense of being well-defined yes (see https://stackoverflow.com/a/50632/5083516 ).
The rules are written in typically hard to read standards-speak but essentially whatever representation was used in the signed integer the unsigned integer will contain a 2's complement representation of the number.
Addition, subtraction and multiplication will work correctly on these numbers resulting in another unsigned integer containing a twos complement number representing the "real result".
division and casting to larger unsigned integer types will have well-defined results but those results will not be 2's complement representations of the "real result".
(Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
While conversions from signed to unsigned are defined by the standard the reverse is implementation-defined both gcc and msvc define the conversion such that you will get the "real result" when converting a 2's complement number stored in an unsigned integer back to a signed integer. I expect you will only find any other behaviour on obscure systems that don't use 2's complement for signed integers.
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
https://msdn.microsoft.com/en-us/library/0eex498h.aspx
Horrible Answers Galore
Ozgur Ozcitak
When you cast from signed to unsigned
(and vice versa) the internal
representation of the number does not
change. What changes is how the
compiler interprets the sign bit.
This is completely wrong.
Mats Fredriksson
When one unsigned and one signed
variable are added (or any binary
operation) both are implicitly
converted to unsigned, which would in
this case result in a huge result.
This is also wrong. Unsigned ints may be promoted to ints should they have equal precision due to padding bits in the unsigned type.
smh
Your addition operation causes the int
to be converted to an unsigned int.
Wrong. Maybe it does and maybe it doesn't.
Conversion from unsigned int to signed
int is implementation dependent. (But
it probably works the way you expect
on most platforms these days.)
Wrong. It is either undefined behavior if it causes overflow or the value is preserved.
Anonymous
The value of i is converted to
unsigned int ...
Wrong. Depends on the precision of an int relative to an unsigned int.
Taylor Price
As was previously answered, you can
cast back and forth between signed and
unsigned without a problem.
Wrong. Trying to store a value outside the range of a signed integer results in undefined behavior.
Now I can finally answer the question.
Should the precision of int be equal to unsigned int, u will be promoted to a signed int and you will get the value -4444 from the expression (u+i). Now, should u and i have other values, you may get overflow and undefined behavior but with those exact numbers you will get -4444 [1]. This value will have type int. But you are trying to store that value into an unsigned int so that will then be cast to an unsigned int and the value that result will end up having would be (UINT_MAX+1) - 4444.
Should the precision of unsigned int be greater than that of an int, the signed int will be promoted to an unsigned int yielding the value (UINT_MAX+1) - 5678 which will be added to the other unsigned int 1234. Should u and i have other values, which make the expression fall outside the range {0..UINT_MAX} the value (UINT_MAX+1) will either be added or subtracted until the result DOES fall inside the range {0..UINT_MAX) and no undefined behavior will occur.
What is precision?
Integers have padding bits, sign bits, and value bits. Unsigned integers do not have a sign bit obviously. Unsigned char is further guaranteed to not have padding bits. The number of values bits an integer has is how much precision it has.
[Gotchas]
The macro sizeof macro alone cannot be used to determine precision of an integer if padding bits are present. And the size of a byte does not have to be an octet (eight bits) as defined by C99.
[1] The overflow may occur at one of two points. Either before the addition (during promotion) - when you have an unsigned int which is too large to fit inside an int. The overflow may also occur after the addition even if the unsigned int was within the range of an int, after the addition the result may still overflow.
Consider following example:
#include <stdio.h>
int main(void)
{
unsigned char a = 15; /* one byte */
unsigned short b = 15; /* two bytes */
unsigned int c = 15; /* four bytes */
long x = -a; /* eight bytes */
printf("%ld\n", x);
x = -b;
printf("%ld\n", x);
x = -c;
printf("%ld\n", x);
return 0;
}
To compile I am using GCC 4.4.7 (and it gave me no warnings):
gcc -g -std=c99 -pedantic-errors -Wall -W check.c
My result is:
-15
-15
4294967281
The question is why both unsigned char and unsigned short values are "propagated" correctly to (signed) long, while unsigned int is not ? Is there any reference or rule on this ?
Here are results from gdb (words are in little-endian order) accordingly:
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 11111111111111111111111111111111
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 00000000000000000000000000000000
This is due to how the integer promotions applied to the operand and the requirement that the result of unary minus have the same type. This is covered in section 6.5.3.3 Unary arithmetic operators and says (emphasis mine going forward):
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
and integer promotion which is covered in the draft c99 standard section 6.3 Conversions and says:
if an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.48) All other types are unchanged by the integer promotions.
In the first two cases, the promotion will be to int and the result will be int. In the case of unsigned int no promotion is required but the result will require a conversion back to unsigned int.
The -15 is converted to unsigned int using the rules set out in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49)
So we end up with -15 + (UMAX + 1) which results in UMAX - 14 which results in a large unsigned value. This is sometimes why you will see code use -1 converted to to an unsigned value to obtain the max unsigned value of a type since it will always end up being -1 + UMAX + 1 which is UMAX.
int is special. Everything smaller than int gets promoted to int in arithmetic operations.
Thus -a and -b are applications of unary minus to int values of 15, which just work and produce -15. This value is then converted to long.
-c is different. c is not promoted to an int as it is not smaller than int. The result of unary minus applied to an unsigned int value of k is again an unsigned int, computed as 2N-k (N is the number of bits).
Now this unsigned int value is converted to long normally.
This behavior is correct. Quotes are from C 9899:TC2.
6.5.3.3/3:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
6.2.5/9:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
6.3.1.1/2:
The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
So for long x = -a;, since the operand a, an unsigned char, has conversion rank less than the rank of int and unsigned int, and all unsigned char values can be represented as int (on your platform), we first promote to type int. The negative of that is simple: the int with value -15.
Same logic for unsigned short (on your platform).
The unsigned int c is not changed by promotion. So the value of -c is calculated using modular arithmetic, giving the result UINT_MAX-14.
C's integer promotion rules are what they are because standards-writers wanted to allow a wide variety of existing implementations that did different things, in some cases because they were created before there were "standards", to keep on doing what they were doing, while defining rules for new implementations that were more specific than "do whatever you feel like". Unfortunately, the rules as written make it extremely difficult to write code which doesn't depend upon a compiler's integer size. Even if future processors would be able to perform 64-bit operations faster than 32-bit ones, the rules dictated by the standards would cause a lot of code to break if int ever grew beyond 32 bits.
It would probably in retrospect have been better to have handled "weird" compilers by explicitly recognizing the existence of multiple dialects of C, and recommending that compilers implement a dialect that handles various things in consistent ways, but providing that they may also implement dialects which do them differently. Such an approach may end up ultimately being the only way that int can grow beyond 32 bits, but I've not heard of anyone even considering such a thing.
I think the root of the problem with unsigned integer types stems from the fact that they are sometimes used to represent numerical quantities, and are sometimes used to represent members of a wrapping abstract algebraic ring. Unsigned types behave in a manner consistent with an abstract algebraic ring in circumstances which do not involve type promotion. Applying a unary minus to a member of a ring should (and does) yield a member of that same ring which, when added to the original, will yield zero [i.e. the additive inverse]. There is exactly one way to map integer quantities to ring elements, but multiple ways exist to map ring elements back to integer quantities. Thus, adding a ring element to an integer quantity should yield an element of the same ring regardless of the size of the integer, and conversion from rings to integer quantities should require that code specify how the conversion should be performed. Unfortunately, C implicitly converts rings to integers in cases where either the size of the ring is smaller than the default integer type, or when an operation uses a ring member with an integer of a larger type.
The proper solution to solve this problem would be to allow code to specify that certain variables, return values, etc. should be regarded as ring types rather than numbers; an expression like -(ring16_t)2 should yield 65534 regardless of the size of int, rather than yielding 65534 on systems where int is 16 bits, and -2 on systems where it's larger. Likewise, (ring32)0xC0000001 * (ring32)0xC0000001 should yield (ring32)0x80000001 even if int happens to be 64 bits [note that if int is 64 bits, the compiler could legally do anything it likes if code tries to multiply two unsigned 32-bit values which equal 0xC0000001, since the result would be too large to represent in a 64-bit signed integer.
Negatives are tricky. Especially when it comes to unsigned values. If you look at the c-documentation, you'll notice that (contrary to what you'd expect) unsigned chars and shorts are promoted to signed ints for computing, while an unsigned int will be computed as an unsigned int.
When you compute the -c, the c is treated as an int, it becomes -15, then is stored in x, (which still believes it is an UNSIGNED int) and is stored as such.
For clarification - No ACTUAL promotion is done when "negativeing" an unsigned. When you assign a negative to any type of int (or take a negative) the 2's compliment of the number is instead used. Since the only practical difference between unsigned and signed values is that the MSB acts as a sign flag, it is taken as a very large positive number instead of a negative one.
Suppose I have the following C code.
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
What implicit conversions are going on here, and is this code safe for all values of u and i? (Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
Short Answer
Your i will be converted to an unsigned integer by adding UINT_MAX + 1, then the addition will be carried out with the unsigned values, resulting in a large result (depending on the values of u and i).
Long Answer
According to the C99 Standard:
6.3.1.8 Usual arithmetic conversions
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
In your case, we have one unsigned int (u) and signed int (i). Referring to (3) above, since both operands have the same rank, your i will need to be converted to an unsigned integer.
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Now we need to refer to (2) above. Your i will be converted to an unsigned value by adding UINT_MAX + 1. So the result will depend on how UINT_MAX is defined on your implementation. It will be large, but it will not overflow, because:
6.2.5 (9)
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Bonus: Arithmetic Conversion Semi-WTF
#include <stdio.h>
int main(void)
{
unsigned int plus_one = 1;
int minus_one = -1;
if(plus_one < minus_one)
printf("1 < -1");
else
printf("boring");
return 0;
}
You can use this link to try this online: https://repl.it/repls/QuickWhimsicalBytes
Bonus: Arithmetic Conversion Side Effect
Arithmetic conversion rules can be used to get the value of UINT_MAX by initializing an unsigned value to -1, ie:
unsigned int umax = -1; // umax set to UINT_MAX
This is guaranteed to be portable regardless of the signed number representation of the system because of the conversion rules described above. See this SO question for more information: Is it safe to use -1 to set all bits to true?
Conversion from signed to unsigned does not necessarily just copy or reinterpret the representation of the signed value. Quoting the C standard (C99 6.3.1.3):
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
For the two's complement representation that's nearly universal these days, the rules do correspond to reinterpreting the bits. But for other representations (sign-and-magnitude or ones' complement), the C implementation must still arrange for the same result, which means that the conversion can't just copy the bits. For example, (unsigned)-1 == UINT_MAX, regardless of the representation.
In general, conversions in C are defined to operate on values, not on representations.
To answer the original question:
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
The value of i is converted to unsigned int, yielding UINT_MAX + 1 - 5678. This value is then added to the unsigned value 1234, yielding UINT_MAX + 1 - 4444.
(Unlike unsigned overflow, signed overflow invokes undefined behavior. Wraparound is common, but is not guaranteed by the C standard -- and compiler optimizations can wreak havoc on code that makes unwarranted assumptions.)
Referring to The C Programming Language, Second Edition (ISBN 0131103628),
Your addition operation causes the int to be converted to an unsigned int.
Assuming two's complement representation and equally sized types, the bit pattern does not change.
Conversion from unsigned int to signed int is implementation dependent. (But it probably works the way you expect on most platforms these days.)
The rules are a little more complicated in the case of combining signed and unsigned of differing sizes.
When converting from signed to unsigned there are two possibilities. Numbers that were originally positive remain (or are interpreted as) the same value. Number that were originally negative will now be interpreted as larger positive numbers.
When one unsigned and one signed variable are added (or any binary operation) both are implicitly converted to unsigned, which would in this case result in a huge result.
So it is safe in the sense of that the result might be huge and wrong, but it will never crash.
As was previously answered, you can cast back and forth between signed and unsigned without a problem. The border case for signed integers is -1 (0xFFFFFFFF). Try adding and subtracting from that and you'll find that you can cast back and have it be correct.
However, if you are going to be casting back and forth, I would strongly advise naming your variables such that it is clear what type they are, eg:
int iValue, iResult;
unsigned int uValue, uResult;
It is far too easy to get distracted by more important issues and forget which variable is what type if they are named without a hint. You don't want to cast to an unsigned and then use that as an array index.
What implicit conversions are going on here,
i will be converted to an unsigned integer.
and is this code safe for all values of u and i?
Safe in the sense of being well-defined yes (see https://stackoverflow.com/a/50632/5083516 ).
The rules are written in typically hard to read standards-speak but essentially whatever representation was used in the signed integer the unsigned integer will contain a 2's complement representation of the number.
Addition, subtraction and multiplication will work correctly on these numbers resulting in another unsigned integer containing a twos complement number representing the "real result".
division and casting to larger unsigned integer types will have well-defined results but those results will not be 2's complement representations of the "real result".
(Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
While conversions from signed to unsigned are defined by the standard the reverse is implementation-defined both gcc and msvc define the conversion such that you will get the "real result" when converting a 2's complement number stored in an unsigned integer back to a signed integer. I expect you will only find any other behaviour on obscure systems that don't use 2's complement for signed integers.
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
https://msdn.microsoft.com/en-us/library/0eex498h.aspx
Horrible Answers Galore
Ozgur Ozcitak
When you cast from signed to unsigned
(and vice versa) the internal
representation of the number does not
change. What changes is how the
compiler interprets the sign bit.
This is completely wrong.
Mats Fredriksson
When one unsigned and one signed
variable are added (or any binary
operation) both are implicitly
converted to unsigned, which would in
this case result in a huge result.
This is also wrong. Unsigned ints may be promoted to ints should they have equal precision due to padding bits in the unsigned type.
smh
Your addition operation causes the int
to be converted to an unsigned int.
Wrong. Maybe it does and maybe it doesn't.
Conversion from unsigned int to signed
int is implementation dependent. (But
it probably works the way you expect
on most platforms these days.)
Wrong. It is either undefined behavior if it causes overflow or the value is preserved.
Anonymous
The value of i is converted to
unsigned int ...
Wrong. Depends on the precision of an int relative to an unsigned int.
Taylor Price
As was previously answered, you can
cast back and forth between signed and
unsigned without a problem.
Wrong. Trying to store a value outside the range of a signed integer results in undefined behavior.
Now I can finally answer the question.
Should the precision of int be equal to unsigned int, u will be promoted to a signed int and you will get the value -4444 from the expression (u+i). Now, should u and i have other values, you may get overflow and undefined behavior but with those exact numbers you will get -4444 [1]. This value will have type int. But you are trying to store that value into an unsigned int so that will then be cast to an unsigned int and the value that result will end up having would be (UINT_MAX+1) - 4444.
Should the precision of unsigned int be greater than that of an int, the signed int will be promoted to an unsigned int yielding the value (UINT_MAX+1) - 5678 which will be added to the other unsigned int 1234. Should u and i have other values, which make the expression fall outside the range {0..UINT_MAX} the value (UINT_MAX+1) will either be added or subtracted until the result DOES fall inside the range {0..UINT_MAX) and no undefined behavior will occur.
What is precision?
Integers have padding bits, sign bits, and value bits. Unsigned integers do not have a sign bit obviously. Unsigned char is further guaranteed to not have padding bits. The number of values bits an integer has is how much precision it has.
[Gotchas]
The macro sizeof macro alone cannot be used to determine precision of an integer if padding bits are present. And the size of a byte does not have to be an octet (eight bits) as defined by C99.
[1] The overflow may occur at one of two points. Either before the addition (during promotion) - when you have an unsigned int which is too large to fit inside an int. The overflow may also occur after the addition even if the unsigned int was within the range of an int, after the addition the result may still overflow.
It was pretty difficult for me to find information on this subject in manner that I could easily understand, so I'm asking for a review of what I have found.It's all about conversion and conversion only.
In examples I will be referring to:
(signed/unsigned) int bigger;
(signed/unsigned) char smaller;
Truncating integers. (bigger->smaller)
first truncate bigger on MSB side to match smaller size.
second, convert truncated result to signed/unsigned depending on smaller type.
If bigger value is too big to fit in smaller type, it results in undefined behaviour (correct me on that). However my rule should be working on all machines (correct me on that, too) and results should be predictable.
Widening integers (smaller->bigger)
a) signed char -> signed int
prepend smaller with MSB (1 or 0) to match bigger size
convert to signed
b) signed char -> unsigned int
prepend smaller with MSB (1 or 0) to match bigger size.
convert to unsigned
c) unsigned char -> signed int
prepend with 0's to match bigger size
convert to signed
d) unsigned char -> unsigned int
prepend with 0's to match bigger size
convert to unsigned
Where are undefined/unspecified behaviours that I didn' mention that could pop up?
An integral conversion never produces undefined behaviour (it can produce implementation-defined behaviour).
A conversion to a type that can represent the value being converted is always well-defined: the value simply stays unchanged.
A conversion to an unsigned type is always well-defined: the value is taken modulo UINT_MAX+1 (or whatever maximum value the target type admits).
A conversion to a signed type that cannot represent the value being converted results in either an implementation-defined value, or an implementation-defined signal.
Note that the above rules are defined in terms of integer values and not in terms of sequences of bits.
From C standard document (p.50 draft version 201x I believe and not exact quote):
No two signed integer shall have the same rank
The rank of signed integer shall be greater than the rank of any signed integer with less precision.
long long int is greater than long int which is greater than int which is greater than short int which is greater than signed char.
signed and unsigned of same precision have same rank (ex: signed int is same rank as unsigned int)
The rank of any standard integer type shall be greater than the rank of any extended integer type of same width.
The rank of char is equal to unsigned char is equal to signed char.
(I'm leaving out bool because you excluded them from your question)
The rank of any extended signed integer relative to another extended signed integer is implementation-defined but still subject to other rules of integer conversion rank.
for all integer types T1 T2 and T3, is T1 has greater rank than T2 and T2 has greater rank than T3, than T1 has greater rank than T3.
An object with an integer type (other than int and signed int) whose integer rank is LESS than or EQUAL to the rank of int and unsigned int, a bit field of type _Bool, int, signed int or unsigned int; if an int can represent all values of the original type, the value is converted to an int. Otherwise to an unsigned int. All other types are changed by the integer promotion.
In plain terms:
Any type "smaller" than int or unsigned int get promoted to int when converted to other type of greater rank. This is the compiler's job to ensure that a C code compiled for a given machine (architecture) is ISO-C compliant in that regard. char is implementation defined (being signed or unsigned). All other types (promotion or "demotion") are implementation-defined.
What is implementation-defined? It means that a given compiler will systematically behave the same on a given machine. In other words all "implementation-defined" behavior depends BOTH on the compiler AND the target machine.
To make portable code:
always promote values to greater rank standard C types.
Never "demote" values to lesser types.
Avoid all "implementation-defined" implementation in your code.
Why that implementation-defined madness exists if it ruins programmers' effort??? System programming basically requires these implementation-defined behavior.
So more specifically toward your question:
truncation will most likely not be protable. Or will require much more effort in maintenance, bug tracking etc, than simply maintaining the code by using higher rank types.
If your implementation runs values bigger than the types involved, your design is wrong (unless you are involved in system programming).
As a rule of thumb, going from unsigned to signed preserves values but not the other way around. So when an unsigned value goes toe to toe against a signed one, promote the unsigned to signed instead of the other way around.
If using as small integer types as possible is memory-critical in your application, you should probably revisit the whole program's achitecture.
My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value is past the max int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
Unsigned int can be converted to signed (or vice-versa) by simple expression as shown below :
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not targeted to the question, you would like to read following links :
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0,INT_MAX] is implementation defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the probably most portable way is IMHO the way shown in the following code snippet:
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else if (u >= (unsigned int)INT_MIN)
i = -(int)~u - 1; /*(2)*/
else
i = INT_MIN; /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it
is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow
by taking the one's complement of the value by bit-wise NOT, casts it
to 'int' (which cannot overflow now), negates the value and subtracts
one, which can also not overflow here.
Branch (3) provides the poison we have to take on one's complement or
sign/magnitude targets, because the signed integer representation
range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else
i = -(int)~u - 1; /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
And yes, some of the casts to 'unsigned int' are redundant, but
they might help the casual reader
some compilers issue warnings on signed/unsigned compares, because the implicit cast causes some non-intuitive behavior by language design
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
Some explain from C++Primer 5th Page 35
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the length will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book is very helpful.