Related
starting with a pseudo-code snippet:
char a = 0x80;
unsigned short b;
b = (unsigned short)a;
printf ("0x%04x\r\n", b); // => 0xff80
to my current understanding "char" is by definition neither a signed char nor an unsigned char but sort of a third type of signedness.
how does it come that it happens that 'a' is first sign extended from (maybe platform dependent) an 8 bits storage to (a maybe again platform specific) 16 bits of a signed short and then converted to an unsigned short?
is there a c standard that determines the order of expansion?
does this standard guide in any way on how to deal with those third type of signedness that a "pure" char (i called it once an X-char, x for undetermined signedness) so that results are at least deterministic?
PS: if inserting an "(unsigned char)" statement in front of the 'a' in the assignment line, then the result in the printing line is indeed changed to 0x0080. thus only two type casts in a row will provide what might be the intended result for certain intentions.
The type char is not a "third" signedness. It is either signed char or unsigned char, and which one it is is implementation defined.
This is dictated by section 6.2.5p15 of the C standard:
The three types char , signed char , and unsigned char are
collectively called the character types. The implementation
shall define char to have the same range, representation, and
behavior as either signed char or unsigned char.
It appears that on your implementation, char is the same as signed char, so because the value is negative and because the destination type is unsigned it must be converted.
Section 6.3.1.3 dictates how conversion between integer types occur:
1 When a value with integer type is converted to another integer type
other than
_Bool ,if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type
until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or
an implementation-defined signal is raised.
Since the value 0x80 == -128 cannot be represented in an unsigned short the conversion in paragraph 2 occurs.
char has implementation-defined signedness. It is either signed or unsigned, depending on compiler. It is true, in a way, that char is a third character type, see this. char has an indeterministic (non-portable) signedness and therefore should never be used for storing raw numbers.
But that doesn't matter in this case.
On your compiler, char is signed.
char a = 0x80; forces a conversion from the type of 0x80, which is int, to char, in a compiler-specific manner. Normally on 2's complement systems, that will mean that the char gets the value -128, as seems to be the case here.
b = (unsigned short)a; forces a conversion from char to unsigned short 1). C17 6.3.1.3 Signed and unsigned integers then says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
One more than the maximum value would be 65536. So you can think of this as -128 + 65536 = 65408.
The unsigned hex representation of 65408 is 0xFF80. No sign extension takes place anywhere!
1) The cast is not needed. When both operands of = are arithmetic types, as in this case, the right operand is implicitly converted to the type of the right operand (C17 6.5.16.1 §2).
Suppose I have the following C code.
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
What implicit conversions are going on here, and is this code safe for all values of u and i? (Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
Short Answer
Your i will be converted to an unsigned integer by adding UINT_MAX + 1, then the addition will be carried out with the unsigned values, resulting in a large result (depending on the values of u and i).
Long Answer
According to the C99 Standard:
6.3.1.8 Usual arithmetic conversions
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
In your case, we have one unsigned int (u) and signed int (i). Referring to (3) above, since both operands have the same rank, your i will need to be converted to an unsigned integer.
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Now we need to refer to (2) above. Your i will be converted to an unsigned value by adding UINT_MAX + 1. So the result will depend on how UINT_MAX is defined on your implementation. It will be large, but it will not overflow, because:
6.2.5 (9)
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Bonus: Arithmetic Conversion Semi-WTF
#include <stdio.h>
int main(void)
{
unsigned int plus_one = 1;
int minus_one = -1;
if(plus_one < minus_one)
printf("1 < -1");
else
printf("boring");
return 0;
}
You can use this link to try this online: https://repl.it/repls/QuickWhimsicalBytes
Bonus: Arithmetic Conversion Side Effect
Arithmetic conversion rules can be used to get the value of UINT_MAX by initializing an unsigned value to -1, ie:
unsigned int umax = -1; // umax set to UINT_MAX
This is guaranteed to be portable regardless of the signed number representation of the system because of the conversion rules described above. See this SO question for more information: Is it safe to use -1 to set all bits to true?
Conversion from signed to unsigned does not necessarily just copy or reinterpret the representation of the signed value. Quoting the C standard (C99 6.3.1.3):
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
For the two's complement representation that's nearly universal these days, the rules do correspond to reinterpreting the bits. But for other representations (sign-and-magnitude or ones' complement), the C implementation must still arrange for the same result, which means that the conversion can't just copy the bits. For example, (unsigned)-1 == UINT_MAX, regardless of the representation.
In general, conversions in C are defined to operate on values, not on representations.
To answer the original question:
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
The value of i is converted to unsigned int, yielding UINT_MAX + 1 - 5678. This value is then added to the unsigned value 1234, yielding UINT_MAX + 1 - 4444.
(Unlike unsigned overflow, signed overflow invokes undefined behavior. Wraparound is common, but is not guaranteed by the C standard -- and compiler optimizations can wreak havoc on code that makes unwarranted assumptions.)
Referring to The C Programming Language, Second Edition (ISBN 0131103628),
Your addition operation causes the int to be converted to an unsigned int.
Assuming two's complement representation and equally sized types, the bit pattern does not change.
Conversion from unsigned int to signed int is implementation dependent. (But it probably works the way you expect on most platforms these days.)
The rules are a little more complicated in the case of combining signed and unsigned of differing sizes.
When converting from signed to unsigned there are two possibilities. Numbers that were originally positive remain (or are interpreted as) the same value. Number that were originally negative will now be interpreted as larger positive numbers.
When one unsigned and one signed variable are added (or any binary operation) both are implicitly converted to unsigned, which would in this case result in a huge result.
So it is safe in the sense of that the result might be huge and wrong, but it will never crash.
As was previously answered, you can cast back and forth between signed and unsigned without a problem. The border case for signed integers is -1 (0xFFFFFFFF). Try adding and subtracting from that and you'll find that you can cast back and have it be correct.
However, if you are going to be casting back and forth, I would strongly advise naming your variables such that it is clear what type they are, eg:
int iValue, iResult;
unsigned int uValue, uResult;
It is far too easy to get distracted by more important issues and forget which variable is what type if they are named without a hint. You don't want to cast to an unsigned and then use that as an array index.
What implicit conversions are going on here,
i will be converted to an unsigned integer.
and is this code safe for all values of u and i?
Safe in the sense of being well-defined yes (see https://stackoverflow.com/a/50632/5083516 ).
The rules are written in typically hard to read standards-speak but essentially whatever representation was used in the signed integer the unsigned integer will contain a 2's complement representation of the number.
Addition, subtraction and multiplication will work correctly on these numbers resulting in another unsigned integer containing a twos complement number representing the "real result".
division and casting to larger unsigned integer types will have well-defined results but those results will not be 2's complement representations of the "real result".
(Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
While conversions from signed to unsigned are defined by the standard the reverse is implementation-defined both gcc and msvc define the conversion such that you will get the "real result" when converting a 2's complement number stored in an unsigned integer back to a signed integer. I expect you will only find any other behaviour on obscure systems that don't use 2's complement for signed integers.
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
https://msdn.microsoft.com/en-us/library/0eex498h.aspx
Horrible Answers Galore
Ozgur Ozcitak
When you cast from signed to unsigned
(and vice versa) the internal
representation of the number does not
change. What changes is how the
compiler interprets the sign bit.
This is completely wrong.
Mats Fredriksson
When one unsigned and one signed
variable are added (or any binary
operation) both are implicitly
converted to unsigned, which would in
this case result in a huge result.
This is also wrong. Unsigned ints may be promoted to ints should they have equal precision due to padding bits in the unsigned type.
smh
Your addition operation causes the int
to be converted to an unsigned int.
Wrong. Maybe it does and maybe it doesn't.
Conversion from unsigned int to signed
int is implementation dependent. (But
it probably works the way you expect
on most platforms these days.)
Wrong. It is either undefined behavior if it causes overflow or the value is preserved.
Anonymous
The value of i is converted to
unsigned int ...
Wrong. Depends on the precision of an int relative to an unsigned int.
Taylor Price
As was previously answered, you can
cast back and forth between signed and
unsigned without a problem.
Wrong. Trying to store a value outside the range of a signed integer results in undefined behavior.
Now I can finally answer the question.
Should the precision of int be equal to unsigned int, u will be promoted to a signed int and you will get the value -4444 from the expression (u+i). Now, should u and i have other values, you may get overflow and undefined behavior but with those exact numbers you will get -4444 [1]. This value will have type int. But you are trying to store that value into an unsigned int so that will then be cast to an unsigned int and the value that result will end up having would be (UINT_MAX+1) - 4444.
Should the precision of unsigned int be greater than that of an int, the signed int will be promoted to an unsigned int yielding the value (UINT_MAX+1) - 5678 which will be added to the other unsigned int 1234. Should u and i have other values, which make the expression fall outside the range {0..UINT_MAX} the value (UINT_MAX+1) will either be added or subtracted until the result DOES fall inside the range {0..UINT_MAX) and no undefined behavior will occur.
What is precision?
Integers have padding bits, sign bits, and value bits. Unsigned integers do not have a sign bit obviously. Unsigned char is further guaranteed to not have padding bits. The number of values bits an integer has is how much precision it has.
[Gotchas]
The macro sizeof macro alone cannot be used to determine precision of an integer if padding bits are present. And the size of a byte does not have to be an octet (eight bits) as defined by C99.
[1] The overflow may occur at one of two points. Either before the addition (during promotion) - when you have an unsigned int which is too large to fit inside an int. The overflow may also occur after the addition even if the unsigned int was within the range of an int, after the addition the result may still overflow.
Suppose I have the following C code.
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
What implicit conversions are going on here, and is this code safe for all values of u and i? (Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
Short Answer
Your i will be converted to an unsigned integer by adding UINT_MAX + 1, then the addition will be carried out with the unsigned values, resulting in a large result (depending on the values of u and i).
Long Answer
According to the C99 Standard:
6.3.1.8 Usual arithmetic conversions
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
In your case, we have one unsigned int (u) and signed int (i). Referring to (3) above, since both operands have the same rank, your i will need to be converted to an unsigned integer.
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Now we need to refer to (2) above. Your i will be converted to an unsigned value by adding UINT_MAX + 1. So the result will depend on how UINT_MAX is defined on your implementation. It will be large, but it will not overflow, because:
6.2.5 (9)
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Bonus: Arithmetic Conversion Semi-WTF
#include <stdio.h>
int main(void)
{
unsigned int plus_one = 1;
int minus_one = -1;
if(plus_one < minus_one)
printf("1 < -1");
else
printf("boring");
return 0;
}
You can use this link to try this online: https://repl.it/repls/QuickWhimsicalBytes
Bonus: Arithmetic Conversion Side Effect
Arithmetic conversion rules can be used to get the value of UINT_MAX by initializing an unsigned value to -1, ie:
unsigned int umax = -1; // umax set to UINT_MAX
This is guaranteed to be portable regardless of the signed number representation of the system because of the conversion rules described above. See this SO question for more information: Is it safe to use -1 to set all bits to true?
Conversion from signed to unsigned does not necessarily just copy or reinterpret the representation of the signed value. Quoting the C standard (C99 6.3.1.3):
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
For the two's complement representation that's nearly universal these days, the rules do correspond to reinterpreting the bits. But for other representations (sign-and-magnitude or ones' complement), the C implementation must still arrange for the same result, which means that the conversion can't just copy the bits. For example, (unsigned)-1 == UINT_MAX, regardless of the representation.
In general, conversions in C are defined to operate on values, not on representations.
To answer the original question:
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
The value of i is converted to unsigned int, yielding UINT_MAX + 1 - 5678. This value is then added to the unsigned value 1234, yielding UINT_MAX + 1 - 4444.
(Unlike unsigned overflow, signed overflow invokes undefined behavior. Wraparound is common, but is not guaranteed by the C standard -- and compiler optimizations can wreak havoc on code that makes unwarranted assumptions.)
Referring to The C Programming Language, Second Edition (ISBN 0131103628),
Your addition operation causes the int to be converted to an unsigned int.
Assuming two's complement representation and equally sized types, the bit pattern does not change.
Conversion from unsigned int to signed int is implementation dependent. (But it probably works the way you expect on most platforms these days.)
The rules are a little more complicated in the case of combining signed and unsigned of differing sizes.
When converting from signed to unsigned there are two possibilities. Numbers that were originally positive remain (or are interpreted as) the same value. Number that were originally negative will now be interpreted as larger positive numbers.
When one unsigned and one signed variable are added (or any binary operation) both are implicitly converted to unsigned, which would in this case result in a huge result.
So it is safe in the sense of that the result might be huge and wrong, but it will never crash.
As was previously answered, you can cast back and forth between signed and unsigned without a problem. The border case for signed integers is -1 (0xFFFFFFFF). Try adding and subtracting from that and you'll find that you can cast back and have it be correct.
However, if you are going to be casting back and forth, I would strongly advise naming your variables such that it is clear what type they are, eg:
int iValue, iResult;
unsigned int uValue, uResult;
It is far too easy to get distracted by more important issues and forget which variable is what type if they are named without a hint. You don't want to cast to an unsigned and then use that as an array index.
What implicit conversions are going on here,
i will be converted to an unsigned integer.
and is this code safe for all values of u and i?
Safe in the sense of being well-defined yes (see https://stackoverflow.com/a/50632/5083516 ).
The rules are written in typically hard to read standards-speak but essentially whatever representation was used in the signed integer the unsigned integer will contain a 2's complement representation of the number.
Addition, subtraction and multiplication will work correctly on these numbers resulting in another unsigned integer containing a twos complement number representing the "real result".
division and casting to larger unsigned integer types will have well-defined results but those results will not be 2's complement representations of the "real result".
(Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
While conversions from signed to unsigned are defined by the standard the reverse is implementation-defined both gcc and msvc define the conversion such that you will get the "real result" when converting a 2's complement number stored in an unsigned integer back to a signed integer. I expect you will only find any other behaviour on obscure systems that don't use 2's complement for signed integers.
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
https://msdn.microsoft.com/en-us/library/0eex498h.aspx
Horrible Answers Galore
Ozgur Ozcitak
When you cast from signed to unsigned
(and vice versa) the internal
representation of the number does not
change. What changes is how the
compiler interprets the sign bit.
This is completely wrong.
Mats Fredriksson
When one unsigned and one signed
variable are added (or any binary
operation) both are implicitly
converted to unsigned, which would in
this case result in a huge result.
This is also wrong. Unsigned ints may be promoted to ints should they have equal precision due to padding bits in the unsigned type.
smh
Your addition operation causes the int
to be converted to an unsigned int.
Wrong. Maybe it does and maybe it doesn't.
Conversion from unsigned int to signed
int is implementation dependent. (But
it probably works the way you expect
on most platforms these days.)
Wrong. It is either undefined behavior if it causes overflow or the value is preserved.
Anonymous
The value of i is converted to
unsigned int ...
Wrong. Depends on the precision of an int relative to an unsigned int.
Taylor Price
As was previously answered, you can
cast back and forth between signed and
unsigned without a problem.
Wrong. Trying to store a value outside the range of a signed integer results in undefined behavior.
Now I can finally answer the question.
Should the precision of int be equal to unsigned int, u will be promoted to a signed int and you will get the value -4444 from the expression (u+i). Now, should u and i have other values, you may get overflow and undefined behavior but with those exact numbers you will get -4444 [1]. This value will have type int. But you are trying to store that value into an unsigned int so that will then be cast to an unsigned int and the value that result will end up having would be (UINT_MAX+1) - 4444.
Should the precision of unsigned int be greater than that of an int, the signed int will be promoted to an unsigned int yielding the value (UINT_MAX+1) - 5678 which will be added to the other unsigned int 1234. Should u and i have other values, which make the expression fall outside the range {0..UINT_MAX} the value (UINT_MAX+1) will either be added or subtracted until the result DOES fall inside the range {0..UINT_MAX) and no undefined behavior will occur.
What is precision?
Integers have padding bits, sign bits, and value bits. Unsigned integers do not have a sign bit obviously. Unsigned char is further guaranteed to not have padding bits. The number of values bits an integer has is how much precision it has.
[Gotchas]
The macro sizeof macro alone cannot be used to determine precision of an integer if padding bits are present. And the size of a byte does not have to be an octet (eight bits) as defined by C99.
[1] The overflow may occur at one of two points. Either before the addition (during promotion) - when you have an unsigned int which is too large to fit inside an int. The overflow may also occur after the addition even if the unsigned int was within the range of an int, after the addition the result may still overflow.
I have a char array holding several characters. I want to compare one of these characters with an unsigned char variable. For example:
char myarr = { 20, 14, 5, 6, 42 };
const unsigned char foobar = 133;
myarr[2] = foobar;
if(myarr[2] == foobar){
printf("You win a shmoo!\n");
}
Is this comparison type safe?
I know from the C99 standard that char, signed char, and unsigned char are three different types (section 6.2.5 paragraph 14).
Nevertheless, can I safely convert between unsigned char and char, and back, without losing precision and without risking undefined (or implementation-defined) behavior?
In section 6.2.5 paragraph 15:
The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char.
In section 6.3.1.3 paragraph 3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
I'm afraid that if char is defined as a signed char, then myarr[2] = foobar could result in an implementation-defined value that will not be converted correctly back to the original unsigned char value; for example, an implementation may always result in the value 42 regardless of the unsigned value involved.
Does this mean that it is not safe to store an unsigned value in a signed variable of the same type?
Also what is an implementation-defined signal; does this mean an implementation could simply end the program in this case?
In section 6.3.1.1 paragraph 1:
-- The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
-- The rank of any unsigned integer type shall equal the rank of the corresponding
signed integer type, if any.
In section 6.2.5 paragraph 8:
For any two integer types with the same signedness and different integer conversion rank
(see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a
subrange of the values of the other type.
In section 6.3.1 paragraph 2:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
In section 6.3.1.8 paragraph 1:
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
The range of char is guaranteed to be the same range as that of signed char or unsigned char, which are both subranges of int and unsigned int respectively as a result of their smaller integer conversion rank.
Since, integer promotions rules dictate that char, signed char, and unsigned char be promoted to at least int before being evaluated, does this mean that char could maintain its "signedness" throughout the comparision?
For example:
signed char foo = -1;
unsigned char bar = 255;
if(foo == bar){
printf("same\n");
}
Does foo == bar evaluate to a false value, even if -1 is equivalent to 255 when an explicit (unsigned char) cast is used?
UPDATE:
In section J.3.5 paragraph 1 regarding which cases result in implementation-defined values and behavior:
-- The result of, or the signal raised by, converting an integer to a signed integer type
when the value cannot be represented in an object of that type (6.3.1.3).
Does this mean that not even an explicit conversion is safe?
For example, could the following code result in implementation-defined behavior since char could be defined as a signed integer type:
char blah = (char)255;
My original post is rather broad and consists of many specific questions of which I should have given each its own page. However, I address and answer each question here so future visitors can grok the answers more easily.
Answer 1
Question:
Is this comparison type safe?
The comparison between myarr[2] and foobar in this particular case is safe since both variables hold unsigned values. In general, however, this is not true.
For example, suppose an implementation defines char to have the same behavior as signed char, and int is able to represent all values representable by unsigned char and signed char.
char foo = -25;
unsigned char bar = foo;
if(foo == bar){
printf("This line of text will not be printed.\n");
}
Although bar is set equal to foo, and the C99 standard guarantees that there is no loss of precision when converting from signed char to unsigned char (see Answer 2), the foo == bar conditional expression will evaluate false.
This is due to the nature of integer promotion as required by section 6.3.1 paragraph 2 of the C99 standard:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
Since in this implementation int can represent all values of both signed char and unsigned char, the values of both foo and bar are converted to type int before being evaluated. Thus the resulting conditional expression is -25 == 231 which evaluates to false.
Answer 2
Question:
Nevertheless, can I safely convert between unsigned char and char, and back, without losing precision and without risking undefined (or implementation-defined) behavior?
You can safely convert from char to unsigned char without losing precision (nor width nor information), but converting in the other direction -- unsigned char to char -- can lead to implementation-defined behavior.
The C99 standard makes certain guarantees which enable us to convert safely from char to unsigned char.
In section 6.2.5 paragraph 15:
The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char.
Here, we are guaranteed that char will have the same range, representation, and behavior as signed char or unsigned char. If the implementation chooses the unsigned char option, then the conversion from char to unsigned char is essentially that of unsigned char to unsigned char -- thus no width nor information is lost and there are no issues.
The conversion for the signed char option is not as intuitive, but is implicitly guaranteed to preserve precision.
In section 6.2.5 paragraph 6:
For each of the signed integer types, there is a corresponding (but different) unsigned
integer type (designated with the keyword unsigned) that uses the same amount of
storage (including sign information) and has the same alignment requirements.
In 6.2.6.1 paragraph 3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be
represented using a pure binary notation.
In section 6.2.6.2 paragraph 2:
For signed integer types, the bits of the object representation shall be divided into three
groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as
the same bit in the object representation of the corresponding unsigned type (if there are
M value bits in the signed type and N in the unsigned type, then M <= N).
First, signed char is guaranteed to occupy the same amount of storage as an unsigned char, as are all signed integers in respect to their unsigned counterparts.
Second, unsigned char is guaranteed to have a pure binary representation (i.e. no padding bits and no sign bit).
signed char is required to have exactly one sign bit, and no more than the same number of value bits as unsigned char.
Given these three facts, we can prove via pigeonhole principle that the signed char type has at most one less than the number of value bits as the unsigned char type. Similarly, signed char can safely be converted to unsigned char with not only no loss of precision, but no loss of width or information as well:
unsigned char has storage size of N bits.
signed char must have the same storage size of N bits.
unsigned char has no padding or sign bits and therefore has N value bits
signed char can have at most N non-padding bits, and must allocate exactly one bit as the sign bit.
signed char can have at most N-1 value bits and exactly one sign bit
All signed char bits therefore match up one-to-one to the respective unsigned char value bits; in other words, for any given signed char value, there is a unique unsigned char representation.
/* binary representation prefix: 0b */
(signed char)(-25) = 0b11100111
(unsigned char)(231) = 0b11100111
Unfortunately, converting from unsigned char to char can lead to implementation-defined behavior. For example, if char is defined by the implementation to behave as signed char, then an unsigned char variable may hold a value that is outside the range of values representable by a signed char. In such cases, either the result is implementation-defined or an implementation-defined signal is raised.
In section 6.3.1.3 paragraph 3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Answer 3
Question:
Does this mean that it is not safe to store an unsigned value in a signed variable of the same type?
Trying to convert an unsigned type value to a signed type value can result in implementation-defined behavior if the unsigned type value cannot be represented in the new signed type.
unsigned foo = UINT_MAX;
signed bar = foo; /* possible implementation-defined behavior */
In section 6.3.1.3 paragraph 3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
An implementation-defined result would be any value returned within the range of values representable by the new signed type. An implementation could theoretically return the same value consistently (e.g. 42) for these cases and thus loss information occurs -- i.e. there is no guarantee that converting from unsigned to signed to back to unsigned will result in the same original unsigned value.
An implementation-defined signal is that which conforms to the rules laid out in section 7.14 of the C99 standard; an implementation is permitted to define additional conforming signals which are not explicitly enumerated by the C99 standard.
In this particular case, an implementation could theoretically raise the SIGTERM signal which requests the termination of the program. Thus, attempting to convert an unsigned type value to signed type could result in a program termination.
Answer 4
Question:
Does foo == bar evaluate to a false value, even if -1 is equivalent to 255 when an explicit (unsigned char) cast is used?
Consider the following code:
signed char foo = -1;
unsigned char bar = 255;
if((unsigned char)foo == bar){
printf("same\n");
}
Although signed char and unsigned char values are promoted to at least int before the evaluation of a conditional expression, the explicit unsigned char cast will convert the signed char value to unsigned char before the integer promotions occur. Furthermore, converting to an unsigned value is well-defined in the C99 standard and does not lead to implementation-defined behavior.
In section 6.3.1.3 paragraph 2:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
This the conditional expression essentially becomes 255 = 255 which evaluates to true.
until the value is in the range of the new type.
Answer 5
Questions:
Does this mean that not even an explicit conversion is safe?
In general, an explicit cast to char for a value outside the range of values representable by signed char can lead to implementation-defined behavior (see Answer 3). A conversion need not be implicit for section 6.3.1.3 paragraph 3 of the C99 standard to apply.
"does this mean that char could maintain its 'signedness' throughout the comparison?" yes; -1 as a signed char will be promoted to a signed int, which will retain its -1 value. As for the unsigned char, it will also keep its 255 value when being promoted, so yes, the comparison will be false. If you want it to evaluate to true, you will need an explicit cast.
It has to do with how the memory for the char's are stored, in an unsigned char, all 8 bits are used to represent the value of the char while a signed char uses only 7 bits for the number and the 8'th bit to represent the sign.
For an example, lets take a simpler 3 bit value (I will call this new value type tinychar):
bits unsigned signed
000 0 0
001 1 1
010 2 2
011 3 3
100 4 -4
101 5 -3
110 6 -2
111 7 -1
By looking at this chart, you can see the difference in value between a signed and an unsigned tinychar based on how the bits are arranged. Up until you start getting into the negative range, the values are identical for both types. However, once you reach the point where the left-most bit changes to 1, the value suddenly becomes a negative for the signed. The way this works is if you reach the maximum positive value (3) and then add one more you end up with the maximum negative value (-4) and if you subtract one from 0 you will underflow and cause the signed tinychar to become -1 while an unsigned tinychar would become 7. You can also see the equivalence (==) between an unsigned 7 and the signed -1 tinychar because the bits are the same (111) for both.
Now if you expand this to have a total of 8 bits, you should see similar results.
I've tested your code and it doesn't compare (signed char)-1 and (unsigned char)255 the same.
You should convert signed char into unsigned char first, because it doesn't use the MSB sign bit in operations.
I have bad experience with using signed char type for buffer operations. Things like your problem then happen. Then be sure you have turned on all warnings during compilation and try to fix them.
Suppose I have the following C code.
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
What implicit conversions are going on here, and is this code safe for all values of u and i? (Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
Short Answer
Your i will be converted to an unsigned integer by adding UINT_MAX + 1, then the addition will be carried out with the unsigned values, resulting in a large result (depending on the values of u and i).
Long Answer
According to the C99 Standard:
6.3.1.8 Usual arithmetic conversions
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
In your case, we have one unsigned int (u) and signed int (i). Referring to (3) above, since both operands have the same rank, your i will need to be converted to an unsigned integer.
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Now we need to refer to (2) above. Your i will be converted to an unsigned value by adding UINT_MAX + 1. So the result will depend on how UINT_MAX is defined on your implementation. It will be large, but it will not overflow, because:
6.2.5 (9)
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Bonus: Arithmetic Conversion Semi-WTF
#include <stdio.h>
int main(void)
{
unsigned int plus_one = 1;
int minus_one = -1;
if(plus_one < minus_one)
printf("1 < -1");
else
printf("boring");
return 0;
}
You can use this link to try this online: https://repl.it/repls/QuickWhimsicalBytes
Bonus: Arithmetic Conversion Side Effect
Arithmetic conversion rules can be used to get the value of UINT_MAX by initializing an unsigned value to -1, ie:
unsigned int umax = -1; // umax set to UINT_MAX
This is guaranteed to be portable regardless of the signed number representation of the system because of the conversion rules described above. See this SO question for more information: Is it safe to use -1 to set all bits to true?
Conversion from signed to unsigned does not necessarily just copy or reinterpret the representation of the signed value. Quoting the C standard (C99 6.3.1.3):
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
For the two's complement representation that's nearly universal these days, the rules do correspond to reinterpreting the bits. But for other representations (sign-and-magnitude or ones' complement), the C implementation must still arrange for the same result, which means that the conversion can't just copy the bits. For example, (unsigned)-1 == UINT_MAX, regardless of the representation.
In general, conversions in C are defined to operate on values, not on representations.
To answer the original question:
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;
The value of i is converted to unsigned int, yielding UINT_MAX + 1 - 5678. This value is then added to the unsigned value 1234, yielding UINT_MAX + 1 - 4444.
(Unlike unsigned overflow, signed overflow invokes undefined behavior. Wraparound is common, but is not guaranteed by the C standard -- and compiler optimizations can wreak havoc on code that makes unwarranted assumptions.)
Referring to The C Programming Language, Second Edition (ISBN 0131103628),
Your addition operation causes the int to be converted to an unsigned int.
Assuming two's complement representation and equally sized types, the bit pattern does not change.
Conversion from unsigned int to signed int is implementation dependent. (But it probably works the way you expect on most platforms these days.)
The rules are a little more complicated in the case of combining signed and unsigned of differing sizes.
When converting from signed to unsigned there are two possibilities. Numbers that were originally positive remain (or are interpreted as) the same value. Number that were originally negative will now be interpreted as larger positive numbers.
When one unsigned and one signed variable are added (or any binary operation) both are implicitly converted to unsigned, which would in this case result in a huge result.
So it is safe in the sense of that the result might be huge and wrong, but it will never crash.
As was previously answered, you can cast back and forth between signed and unsigned without a problem. The border case for signed integers is -1 (0xFFFFFFFF). Try adding and subtracting from that and you'll find that you can cast back and have it be correct.
However, if you are going to be casting back and forth, I would strongly advise naming your variables such that it is clear what type they are, eg:
int iValue, iResult;
unsigned int uValue, uResult;
It is far too easy to get distracted by more important issues and forget which variable is what type if they are named without a hint. You don't want to cast to an unsigned and then use that as an array index.
What implicit conversions are going on here,
i will be converted to an unsigned integer.
and is this code safe for all values of u and i?
Safe in the sense of being well-defined yes (see https://stackoverflow.com/a/50632/5083516 ).
The rules are written in typically hard to read standards-speak but essentially whatever representation was used in the signed integer the unsigned integer will contain a 2's complement representation of the number.
Addition, subtraction and multiplication will work correctly on these numbers resulting in another unsigned integer containing a twos complement number representing the "real result".
division and casting to larger unsigned integer types will have well-defined results but those results will not be 2's complement representations of the "real result".
(Safe, in the sense that even though result in this example will overflow to some huge positive number, I could cast it back to an int and get the real result.)
While conversions from signed to unsigned are defined by the standard the reverse is implementation-defined both gcc and msvc define the conversion such that you will get the "real result" when converting a 2's complement number stored in an unsigned integer back to a signed integer. I expect you will only find any other behaviour on obscure systems that don't use 2's complement for signed integers.
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
https://msdn.microsoft.com/en-us/library/0eex498h.aspx
Horrible Answers Galore
Ozgur Ozcitak
When you cast from signed to unsigned
(and vice versa) the internal
representation of the number does not
change. What changes is how the
compiler interprets the sign bit.
This is completely wrong.
Mats Fredriksson
When one unsigned and one signed
variable are added (or any binary
operation) both are implicitly
converted to unsigned, which would in
this case result in a huge result.
This is also wrong. Unsigned ints may be promoted to ints should they have equal precision due to padding bits in the unsigned type.
smh
Your addition operation causes the int
to be converted to an unsigned int.
Wrong. Maybe it does and maybe it doesn't.
Conversion from unsigned int to signed
int is implementation dependent. (But
it probably works the way you expect
on most platforms these days.)
Wrong. It is either undefined behavior if it causes overflow or the value is preserved.
Anonymous
The value of i is converted to
unsigned int ...
Wrong. Depends on the precision of an int relative to an unsigned int.
Taylor Price
As was previously answered, you can
cast back and forth between signed and
unsigned without a problem.
Wrong. Trying to store a value outside the range of a signed integer results in undefined behavior.
Now I can finally answer the question.
Should the precision of int be equal to unsigned int, u will be promoted to a signed int and you will get the value -4444 from the expression (u+i). Now, should u and i have other values, you may get overflow and undefined behavior but with those exact numbers you will get -4444 [1]. This value will have type int. But you are trying to store that value into an unsigned int so that will then be cast to an unsigned int and the value that result will end up having would be (UINT_MAX+1) - 4444.
Should the precision of unsigned int be greater than that of an int, the signed int will be promoted to an unsigned int yielding the value (UINT_MAX+1) - 5678 which will be added to the other unsigned int 1234. Should u and i have other values, which make the expression fall outside the range {0..UINT_MAX} the value (UINT_MAX+1) will either be added or subtracted until the result DOES fall inside the range {0..UINT_MAX) and no undefined behavior will occur.
What is precision?
Integers have padding bits, sign bits, and value bits. Unsigned integers do not have a sign bit obviously. Unsigned char is further guaranteed to not have padding bits. The number of values bits an integer has is how much precision it has.
[Gotchas]
The macro sizeof macro alone cannot be used to determine precision of an integer if padding bits are present. And the size of a byte does not have to be an octet (eight bits) as defined by C99.
[1] The overflow may occur at one of two points. Either before the addition (during promotion) - when you have an unsigned int which is too large to fit inside an int. The overflow may also occur after the addition even if the unsigned int was within the range of an int, after the addition the result may still overflow.