Unsigned char overflow with subtraction

Unsigned char overflow with subtraction - c

I am trying to work out how unsigned overflow works with subtraction, so I wrote the following test to try it out:
#include<stdio.h>
#include<stdlib.h>
unsigned char minWrap(unsigned char a, unsigned char b) {
return a > b ? a - b : a + (0xff - b) + 1;
}
int main(int argc, char *argv[]) {
unsigned char a = 0x01, b = 0xff;
unsigned char c = a - b;
printf("0x%02x 0x%02x 0x%02x\n", a-b, c, minWrap(a,b));
return EXIT_SUCCESS;
}
Which gave as output:
0xffffff02 0x02 0x02
I would have expected the output to be the same three times. My question is: is it always safe to add/subtract unsigned chars and expect them to wrap around at 0xff?
Or more general, is it safe to compute with uintN_t and expect the result to be modulo 2^N?

is it always safe to add/subtract unsigned chars and expect them to wrap around at 0xff?
No. In C, objects of type char go through the usual integer promotions. So if the range of char fits in int (usual), it is converts to int, else unsigned.
a - b --> 0x01 - 0xFF --> 1 - 255 --> -254.
The below is undefined behavior as %x does not match an int and the value of -254 is not in the unsigned range (See #EOF comment). A typical behavior is a conversion to unsigned
printf("0x%02x\n", a-b);
// 0xffffff02
it safe to compute with uintN_t and expect the result to be modulo 2^N?
Yes. But be sure to make the result of type uintN_t and avoid unexpected usual integer promotions.
#include <inttypes.h>
uint8_t a = 0x01, b = 0xff;
uint8_t diff = a - b;
printf("0x%02x\n", (unsigned) diff);
printf("0x%02" PRTx8 "\n", diff);

a-b in the printf line is evaluated after a and b are promoted to int. Also, the value is being treated as an unsigned int due to the use of %x in the format specifier by your run time environment.
It's equivalent to:
int a1 = a;
int b1 = b;
int x = a1 - b1;
printf("0x%02x 0x%02x 0x%02x\n", x, c, minWrap(a,b));
Section 6.3.1.8 Usual arithmetic conversions of the C99 standard has more details.
In theory use of an int when an unsigned int is expected in printf is cause for undefined behavior. A lenient run time environment, like you have, treats the int as an unsigned int and proceeds to print the value.

From 6.3.1.8/1 of the standard concerning integer conversions:
The integer promotions are performed on both operands. Then the following rules are applied to the promoted operands
If both operands have the same type, then no further conversion is
needed.
Otherwise, if both operands have signed integer types or both have
unsigned integer types, the operand with the type of lesser integer
conversion rank is converted to the type of the operand with greater
rank.
Otherwise, if the operand that has unsigned integer type has rank
greater or equal to the rank of the type of the other operand, then
the operand with signed integer type is converted to the type of the
operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can
represent all of the values of the type of the operand with unsigned
integer type, then the operand with unsigned integer type is converted
to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
In this case, the wrap-around is well defined. In the expression a-b, because both operands are of type unsigned char, they are first promoted to int and the operation is performed. If this value was assigned to an unsigned char, it would be properly truncated. However, you're passing this value to printf with a %x format specifier which expects an unsigned int. To display it correctly, use %hhx which expects an unsigned char.

Related

Why is a number sign-extended when it is cast to an unsigned type?

I was studying how C stores data in memory by bit patterns.
However I have confronted some issues when it comes to printf formatting.
I have saved a variable as -10 (I do understand two's complement)
and another variable as 246. Those two variables have bit patterns of 11110110 (which is 0xF6).
I was trying to print out a value using the unsigned int hexadecimal format in printf.
char a = -10;
unsigned char b = 246;
printf("a : %x , b : %x\n" , (unsigned int) a, (unsigned int) b);
//a : fffffff6 , b : f6
Both integers have the same bit pattern of 0xF6. However, if I do perform type casting into unsigned int, the result varies. The result is 0xFFFFFFF6 for a, while b remains the same.
For a signed character, it seems to me that type casting process made the unsigned char into an integer and filled all the empty bits with 1.
Is this due to their signedness? Or is this just Undefined Behavior?

In this expression
(unsigned int) a
the integer promotions are applied to the object a.
From the C Standard (6.3.1.1 Boolean, characters, and integers)
2 The following may be used in an expression wherever an int or
unsigned int may be used:
— An object or expression with an integer type (other than int or
unsigned int) whose integer conversion rank is less than or equal to
the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions. 58) All other types are unchanged by the integer
promotions.
and
3 The integer promotions preserve value including sign. As discussed
earlier, whether a ‘‘plain’’ char is treated as signed is
implementation-defined.
If you want that in the result object after casting the character object would be represented as having the type unsigned char then you have to write
(unsigned char) a
As the value of the promoted expression (unsigned char) a can be represented in the type unsigned int then the second cast (to unsigned int) is not required. The C Standard allows to use arguments pf the type int instead of the type unsigned int if the value is represented in the both types. You could just write
printf("a : %x , b : %x\n" , (unsigned char) a, (unsigned int) b);

Unsigned integer and unsigned char holding same value yet behaving differently why?

Why is it so that
unsigned char k=-1
if(k==-1)
is false
unsigned int k=-1
if(k==-1)
is true

For the purpose of demonstration let's assume 8-bit chars and 32-bit ints.
unsigned char k=-1;
k is assigned the value 255.
if(k==-1)
The left-hand side of the == operator is an unsigned char. The right-hand side is an int. Since all possible values of an unsigned char can fit inside an int, the left-hand side is converted to an int (this is performed due the the integer promotions, quoted below). This results in the comparison (255 == -1), which is false.
unsigned int k=-1
k is assigned the value 4294967295
if(k==-1)
This time, the left-hand side (an unsigned int) cannot fit within an int. The standard says that in this case, both values are converted to an unsigned int. So this results in the comparison (4294967295 == 4294967295), which is true.
The relevant quotes from the standard:
Integer promotions: (C99, 6.3.1.1p2)
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
Usual arithmetic conversions: (6.3.1.8).
[For integral operands, ] the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
- If both operands have the same type, then no further conversion is needed.
...
- Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
...

§6.3.1.1p2 of the C11 standard draft (n1570.pdf):
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions.
In your second case, an int can't represent unsigned int k because that's out of range. Both operands end up being converted to unsigned int and compare equal.

As there is no sign extension in unsigned promotion the result is defferent.
unsigned char k is promoted from unsigned char (value=255) to int (value=255).
In the case of unsigned int k, -1 is promoted from int (value=-1) to unsigned int (value=2^32-1).

unsigned char k = -1;
-1 is an integer literal of type int, which is equivalent to signed int. When you assign a large signed integer to a smaller unsigned type, the result will get truncated in undefined ways, the C standard does not guarantee what will happen.
In the real world outside the C standard, this is what's most likely (assuming 32-bit CPU, two's complement):
-1 is 0xFFFFFFFF. The least significant byte of 0xFFFFFFFF will get assigned to k. k == 255. Try to print it using printf("%u") and see for yourself.
In the case of unsigned int k=-1, -1 is still a signed int. But it gets implicitly (silently) promoted to an unsigned int when it is stored in k. When you later compare k == -1, the right side -1 will once more get promoted to an unsigned type, and will compare equal with the data stored in k.

Why are the results of integer promotion different?

Please look at my test code:
#include <stdlib.h>
#include <stdio.h>
#define PRINT_COMPARE_RESULT(a, b) \
if (a > b) { \
printf( #a " > " #b "\n"); \
} \
else if (a < b) { \
printf( #a " < " #b "\n"); \
} \
else { \
printf( #a " = " #b "\n" ); \
}
int main()
{
signed int a = -1;
unsigned int b = 2;
signed short c = -1;
unsigned short d = 2;
PRINT_COMPARE_RESULT(a,b);
PRINT_COMPARE_RESULT(c,d);
return 0;
}
The result is the following:
a > b
c < d
My platform is Linux, and my gcc version is 4.4.2.
I am surprised by the second line of output.
The first line of output is caused by integer promotion. But why is the result of the second line different?
The following rules are from C99 standard:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned
integer types, the operand with the type of lesser integer conversion rank is
converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
I think both of the two comparisons should belong to the same case, the second case of integer promotion.

When you use an arithmetic operator, the operands go through two conversions.
Integer promotions: If int can represent all values of the type, then the operand is promoted to int. This applies to both short and unsigned short on most platforms. The conversion performed on this stage is done on each operand individually, without regard for the other operand. (There are more rules, but this is the one that applies.)
Usual arithmetic conversions: If you compare an unsigned int against a signed int, since neither includes the entire range of the other, and both have the same rank, then both are converted to the unsigned type. This conversion is done after examining the type of both operands.
Obviously, the "usual arithmetic conversions" don't always apply, if there are not two operands. This is why there are two sets of rules. One gotcha, for example, is that shift operators << and >> don't do usual arithmetic conversions, since the type of the result should only depend on the left operand (so if you see someone type x << 5U, then the U stands for "unnecessary").
Breakdown: Let's assume a typical system with 32-bit int and 16-bit short.
int a = -1; // "signed" is implied
unsigned b = 2; // "int" is implied
if (a < b)
puts("a < b"); // not printed
else
puts("a >= b"); // printed
First the two operands are promoted. Since both are int or unsigned int, no promotions are done.
Next, the two operands are converted to the same type. Since int can't represent all possible values of unsigned, and unsigned can't represent all possible values of int, there is no obvious choice. In this case, both are converted to unsigned.
When converting from signed to unsigned, 232 is repeatedly added to the signed value until it is in the range of the unsigned value. This is actually a noop as far as the processor is concerned.
So the comparison becomes if (4294967295u < 2u), which is false.
Now let's try it with short:
short c = -1; // "signed" is implied
unsigned short d = 2;
if (c < d)
puts("c < d"); // printed
else
puts("c >= d"); // not printed
First, the two operands are promoted. Since both can be represented faithfully by int, both are promoted to int.
Next, they are converted to the same type. But they already are the same type, int, so nothing is done.
So the comparison becomes if (-1 < 2), which is true.
Writing good code: There's an easy way to catch these "gotchas" in your code. Just always compile with warnings turned on, and fix the warnings. I tend to write code like this:
int x = ...;
unsigned y = ...;
if (x < 0 || (unsigned) x < y)
...;
You have to watch out that any code you do write doesn't run into the other signed vs. unsigned gotcha: signed overflow. For example, the following code:
int x = ..., y = ...;
if (x + 100 < y + 100)
...;
unsigned a = ..., b = ...;
if (a + 100 < b + 100)
...;
Some popular compilers will optimize (x + 100 < y + 100) to (x < y), but that is a story for another day. Just don't overflow your signed numbers.
Footnote: Note that while signed is implied for int, short, long, and long long, it is NOT implied for char. Instead, it depends on the platform.

Taken from the C++ standard:
4.5 Integral promotions [conv.prom] 1 An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be
converted to an rvalue of type int if int can represent all the values of the
source type; otherwise, the source rvalue can be converted to an
rvalue of type unsigned int.
In practice it means, that all operations (on the types in the list) are actually evaluated on the type int if it can cover the whole value set you are dealing with, otherwise it is carried out on unsigned int.
In the first case the values are compared as unsigned int because one of them was unsigned int and this is why -1 is "greater" than 2. In the second case the values a compared as signed integers, as int covers the whole domain of both short and unsigned short and so -1 is smaller than 2.
(Background story: Actually, all this complex definition about covering all the cases in this way is resulting that the compilers can actually ignore the actual type behind (!) :) and just care about the data size.)

The conversion process for C++ is described as the usual arithmetic conversions. However, I think the most relevant rule is at the sub-referenced section conv.prom: Integral promotions 4.6.1:
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank ([conv.rank]) is less than the
rank of int can be converted to a prvalue of type int if int can
represent all the values of the source type; otherwise, the source
prvalue can be converted to a prvalue of type unsigned int.
The funny thing there is the use of the word "can", which I think suggests that this promotion is performed at the discretion of the compiler.
I also found this C-spec snippet that hints at the omission of promotion:
11 EXAMPLE 2 In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the ``integer promotions'' require that the abstract machine promote the value of each variable to int size
and then add the two ints and truncate the sum. Provided the addition of two chars can be done without
overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only
produce the same result, possibly omitting the promotions.
There is also the definition of "rank" to be considered. The list of rules is pretty long, but as it applies to this question "rank" is straightforward:
The rank of any unsigned integer type shall equal the rank of the
corresponding signed integer type.

C usual arithmetic conversions

I was reading in the C99 standard about the usual arithmetic conversions.
If both operands have the same type, then no further conversion is
needed.
Otherwise, if both operands have signed integer types or both have
unsigned integer types, the operand with the type of lesser integer
conversion rank is converted to the type of the operand with greater
rank.
Otherwise, if the operand that has unsigned integer type has rank
greater or equal to the rank of the type of the other operand, then
the operand with signed integer type is converted to the type of the
operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can
represent all of the values of the type of the operand with unsigned
integer type, then the operand with unsigned integer type is converted
to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
So let's say I have the following code:
#include <stdio.h>
int main()
{
unsigned int a = 10;
signed int b = -5;
printf("%d\n", a + b); /* 5 */
printf("%u\n", a + b); /* 5 */
return 0;
}
I thought the bolded paragraph applies (since unsigned int and signed int have the same rank. Why isn't b converted to unsigned ? Or perhaps it is converted to unsigned but there is something I don't understand ?
Thank you for your time :-)

Indeed b is converted to unsigned. However what you observed is that b converted to unsigned and then added to 10 gives as value 5.
On x86 32bit this is what happens
b, coverted to unsigned, becomes 4294967291 (i.e. 2**32 - 5)
adding 10 becomes 5 because of wrap-around at 2**32 (2**32 - 5 + 10 = 2**32 + 5 = 5)

0x0000000a plus 0xfffffffb will always be 0x00000005 regardless of whether you are dealing with signed or unsigned types, as long as only 32 bits are used.

Repeating the relevant portion of the code from the question:
unsigned int a = 10;
signed int b = -5;
printf("%d\n", a + b); /* 5 */
printf("%u\n", a + b); /* 5 */
In a + b, b is converted to unsigned int, (yielding UINT_MAX + 1 - 5 by the rule for unsigned-to-signed conversion). The result of adding 10 to this value is 5, by the rules of unsigned arithmetic, and its type is unsigned int. In most cases, the type of a C expression is independent of the context in which it appears. (Note that none of this depends on the representation; conversion and arithmetic are defined purely in terms of numeric values.)
For the second printf call, the result is straightforward: "%u" expects an argument of type unsigned int, and you've given it one. It prints "5\n".
The first printf is a little more complicated. "%d" expects an argument of type int, but you're giving it an argument of type unsigned int. In most cases, a type mismatch like this results in undefined behavior, but there's a special-case rule that corresponding signed and unsigned types are interchangeable as function arguments -- as long as the value is representable in both types (as it is here). So the first printf also prints "5\n".
Again, all this behavior is defined in terms of values, not representations (except for the requirement that a given value has the same representation in corresponding signed and unsigned types). You'd get the same result on a system where signed int and unsigned int are both 37 bits, signed int has 7 padding bits, unsigned int has 11 padding bits, and signed int uses a 1s'-complement or sign-and-magnitude representation. (No such system exists in real life, as far as I know.)

It is converted to unsigned, the unsigned arithmetic just happens to give the result you see.
The result of unsigned arithmetic is equivalent to doing signed arithmetic with two's complement and no out of range exception.

Why is an unsigned int 1 lower than a char y -1?

main() {
unsigned x = 1;
char y = -1;
if (x > y)
printf("x>y");
else
printf("x<=y");
}
I expected x>y,
but I had to change unsigned int to signed int to get the expected result.

If char is equivalent to signed char:
char is promoted to int (Integer Promotions, ISO C99 §6.3.1.1 ¶2)
Since int and unsigned int have the same rank, int is converted to unsigned int (Arithmetic Conversions, ISO C99 §6.3.1.8)
If char is equivalent to unsigned char:
char may be promoted to either int or unsigned int:
If int can represent all unsigned char values (typically because sizeof(int) > sizeof(char)), char is converted to int.
Otherwise (typically because sizeof(char)==sizeof(int)), char is converted to unsigned.
Now we have one operand that is either int or unsigned int, and another that is unsigned int. The first operand is converted to unsigned int.
Integer promotions:
An expression of a type of lower rank that int is converted to int if int can hold all of the values of the original type, to unsigned int otherwise.
Arithmetic conversions:
Try to convert to the larger type. When there is conflict between signed and unsigned, if the larger (including the case where the two types have the same rank) type is unsigned, go with unsigned. Otherwise, go with signed only in the case it can represent all the values of both types.
Conversions to integer types(ISO C99 §6.3.1.3):
Conversion of an out-of-range value to an unsigned integer type is done via wrap-around (modular arithmetic).
Conversion of an out-of-range value to a signed integer type is implementation defined, and can raise a signal (such as SIGFPE).

When using signed and unsigned in single operation the signed got promoted to unsigned by C's automatic type conversion. If the bit patter of -1 is considered an unsigned number then it is a very very high value. So x > y is false.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight