Implicit conversion of unsigned char (arithmetic conversion vs. assignment) - c

I am working on something and came across code similar to the following:
#define MODULUS(a,b) ((a) >= 0 ? (a)%(b) : (b)-(-a)%(b))
unsigned char w;
unsigned char x;
unsigned char y;
char z;
/* Code that assigns values to w,x and y. All values assigned
could have been represented by a signed char. */
z = MODULUS((x - y), w);
It is my understanding that the arithmetic (x - y) will be performed prior to any type conversion, so the macro will always evaluate to (a)%(b), since the result will be an unsigned char, which is always greater than or equal to zero. However, the code functions as intended, so I think my understanding is flawed. So...
My questions are these:
Does an implicit type conversion to signed char occur before the expression is evaluated?
Is there a situation (for example, if the unsigned values were large enough that they could not be represented by a signed value) where the above code would not work?

Does an implicit type conversion to signed char occur before the expression is evaluated?
No, a conversion to int occurs before the expression x - y is evaluated¹. Thus the result can be negative.
Is there a situation (for example, if the unsigned values were large enough that they could not be represented by a signed value) where the above code would not work?
If sizeof(int) == 1, the integer promotions would promote the unsigned chars to unsigned int, and that could produce wrong results, because before the modulus by w a reduction modulo UINT_MAX + 1 is performed due to unsigned arithmetic.
¹ The integer promotions.
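To illustrate the first point, here is a minimal sketch of that promotion (assuming an ordinary hosted implementation where int is wider than char; the values are arbitrary):

#include <stdio.h>

int main(void)
{
    unsigned char x = 3;
    unsigned char y = 5;

    /* Both operands are promoted to int before the subtraction, so the
       result is the int value -2, not a wrapped-around unsigned char. */
    printf("%d\n", x - y);         /* prints -2 */
    printf("%d\n", (x - y) >= 0);  /* prints 0 */
    return 0;
}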

16-bit unsigned variable casting to generate correct result [duplicate]

I'm trying to subtract two unsigned ints and compare the result to a signed int (or a literal). When using unsigned int types the behavior is as expected. When using uint16_t (from stdint.h) types the behavior is not what I would expect. The comparison was done using gcc 4.5.
Given the following code:
unsigned int a;
unsigned int b;
a = 5;
b = 20;
printf("%u\n", (a-b) < 10);
The output is 0, which is what I expected. Both a and b are unsigned, and b is larger than a, so the result is a large unsigned number which is greater than 10. Now if I change a and b to type uint16_t:
uint16_t a;
uint16_t b;
a = 5;
b = 20;
printf("%u\n", (a-b) < 10);
The output is 1. Why is this? Is the result of subtraction between two uint16_t types stored in an int in gcc? If I change the 10 to 10U the output is again 0, which seems to support this (if the subtraction result is stored as an int and the comparison is made against an unsigned int, then the subtraction result will be converted to an unsigned int).
Because calculations are not done in types narrower than int / unsigned int (char, short, unsigned short, etc.; but not long, unsigned long, etc.): such operands are first promoted to int or unsigned int. uint16_t is probably unsigned short on your implementation, which is promoted to int there. So the result of that calculation is -15, which is smaller than 10.
On older implementations that calculate with 16-bit int, int may not be able to represent all values of unsigned short because both have the same width. Such implementations must promote unsigned short to unsigned int, and on such implementations your comparison results in 0.
Before both the - and < operations are performed, a set of conversions called the usual arithmetic conversions are applied to convert the operands to a common type. As part of this process, the integer promotions are applied, which promote types narrower than int or unsigned int to one of those two types.
In the first case, the types of a and b are unsigned int, so no change of types occurs due to the - operator - the result is an unsigned int with the large positive value UINT_MAX - 14. Then, because int and unsigned int have the same rank, the value 10 with type int is converted to unsigned int, and the comparison is then performed resulting in the value 0.
In the second case, it is apparent that on your implementation the type int can hold all the values of the type uint16_t. This means that when the integer promotions are applied, the values of a and b are promoted to type int. The subtraction is performed, resulting in the value -15 with type int. Both operands to the < are already int, so no conversions are performed; the result of the < is 1.
When you use a 10U in the latter case, the result of a - b is still -15 with type int. Now, however, the usual arithmetic conversions cause this value to be converted to unsigned int (just as the 10 was in the first example), which results in the value UINT_MAX - 14; the result of the < is 0.
[...] Otherwise, the integer promotions are performed on both operands. (6.3.1.8)
When uint16_t is a sub-range of int, (a - b) < 10 is performed using int math.
Use an unsigned constant to gently nudge the left side to unsigned math.
// printf("%u\n", (a-b) < 10);
printf("%d\n", (0u + a - b) < 10); // Using %d as the result of a compare is int.
// or to quiet some picky warnings
printf("%d\n", (0u + a - b) < 10u);
(a - b) < 10u also works with this simple code, yet the idea I am suggesting is to perform both sides as unsigned math, as that may be needed with more complex code.
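For reference, a minimal sketch putting the three variants side by side (assuming a typical platform where int is 32 bits and uint16_t is unsigned short):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t a = 5, b = 20;

    /* a and b are promoted to int, so a - b is the int value -15. */
    printf("%d\n", (a - b) < 10);        /* prints 1 */

    /* 10u makes the comparison (but not the subtraction) unsigned. */
    printf("%d\n", (a - b) < 10u);       /* prints 0: -15 converts to UINT_MAX - 14 */

    /* 0u + a makes the arithmetic itself unsigned from the start. */
    printf("%d\n", (0u + a - b) < 10u);  /* prints 0 */
    return 0;
}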

Safe low 32 bits masking of uint64_t

Assume the following code:
uint64_t g_global_var;
....
....
void foo(void)
{
    uint64_t local_32bit_low = g_global_var & 0xFFFFFFFF;
    ....
}
With the current toolchain, this code works as expected, local_32bit_low indeed contains the low 32 bits of g_global_var.
I wonder whether it is guaranteed by the C standard that this code will always work as expected.
My concern is that the compiler may treat 0xFFFFFFFF as integer value of -1 and when promoting to uint64_t it would become 0xFFFFFFFFFFFFFFFF.
P.S.
I know that to be on the safe side it is better to use 0xFFFFFFFFULL in this case. The point is that I saw it in legacy code and I wonder whether it is worth fixing.
There is no problem. The integer constant 0xFFFFFFFF has a type that can store the value as is.
According to the C Standard (6.4.4.1 Integer constants)
5 The type of an integer constant is the first of the corresponding
list in which its value can be represented
So this value is stored as a positive value.
If the type unsigned int is a 32-bit integer type then the constant will have the type unsigned int.
Otherwise it will have one of the types that can store the value.
long int
unsigned long int
long long int
unsigned long long int
Due to the usual arithmetic conversions in the expression
g_global_var & 0xFFFFFFFF;
the constant is converted to uint64_t, preserving its value:
0x00000000FFFFFFFF
Pay attention to the fact that in C there are no negative integer constants. For example, an expression like
-10
consists of two sub-expressions: the primary expression 10 and the sub-expression with the unary operator -, i.e. -10, which coincides with the full expression.
0xffffffff is not -1, ever. It may convert to -1 if you cast or coerce (e.g. by assignment) it to a signed 32-bit type, but integer literals in C always have their mathematical value unless they overflow.
For decimal literals, the type is the narrowest signed type that can represent the value. For hex literals, unsigned types are used before going up to the next wider signed type. So, in the common case where int is 32-bit, 0xffffffff would have type unsigned int. If you wrote it as decimal, it would have type long (if long is 64-bit) or long long (if long is only 32-bit).
The type of an unsuffixed hexadecimal or octal constant is the first of the following list in which its value can be represented:
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
(For unsuffixed decimal constants, remove the unsigned types from the above list.)
The hexadecimal constant 0xFFFFFFFF can definitely be represented by unsigned long int, so its type will be the first of int, unsigned int, long int or unsigned long int that can represent its value.
Note that although 0xFFFFFFFF > 0 always evaluates to 1 (true), it is possible for 0xFFFFFFFF > -1 to evaluate to either 0 (false) or 1 (true) on different implementations. So you need to be careful when comparing integer constants with each other or with other objects of integer type.
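As a small illustration of that pitfall (assuming a typical implementation with 32-bit int, so 0xFFFFFFFF has type unsigned int):

#include <stdio.h>

int main(void)
{
    /* 0xFFFFFFFF is unsigned int here, so the int operand of each comparison
       is converted to unsigned int first. */
    printf("%d\n", 0xFFFFFFFF > 0);    /* prints 1 on any implementation */
    printf("%d\n", 0xFFFFFFFF > -1);   /* prints 0 here; would print 1 where int is wider than 32 bits */
    return 0;
}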
Others have answered the question; just a recommendation: next time (if you are using C11 or later) you can check the type of the expression yourself using _Generic
#include <stdio.h>
#include <stdint.h>
#define print_type(x) _Generic((x),  \
    int64_t:  puts("int64_t"),       \
    uint64_t: puts("uint64_t"),      \
    default:  puts("unknown")        \
)

uint64_t g_global_var;

int main(void)
{
    print_type(g_global_var & 0xFFFFFFFF);
    return 0;
}
The output is
uint64_t

int promotion: Is the following well-defined?

Suppose that on a C implementation (e.g. on a x86 C compiler) USHRT_MAX = 65535 and INT_MAX = 2147483647. Is, then, the following statement well-defined?
unsigned short product = USHRT_MAX * USHRT_MAX;
According to the following in the C99 standard both operands are promoted to int (since int can represent all possible values of unsigned short) and, therefore, the result is not well-defined, since an overflow will occur (65535 × 65535 = 4294836225 > 2147483647), which means that the value of product is not well-defined:
6.3.1.1-1
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions.(48) All other types are
unchanged by the integer promotions.
48) The integer promotions are applied only: as part of the usual
arithmetic conversions, to certain argument expressions, to the
operands of the unary +, -, and ~ operators, and to both operands of
the shift operators, as specified by their respective subclauses.
However, according to the following, the result is well-defined, since computations involving unsigned operands do not overflow:
6.2.5-9
The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same.(31) A computation involving
unsigned operands can never overflow, because a result that cannot be
represented by the resulting unsigned integer type is reduced modulo
the number that is one greater than the largest value that can be
represented by the resulting type.
Does the variable product in the aforementioned statement have a well-defined value?
EDIT: What should happen in the following case?
unsigned short lhs = USHRT_MAX;
unsigned short rhs = USHRT_MAX;
unsigned short product = lhs * rhs;
The promotion wins.
Section 5.2.4.2.1 says this about the constants USHRT_MAX etc.:
The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions.
So the multiplication is on ints and involves no unsigned operands; unambiguously, there is no conforming way to implement USHRT_MAX that would yield an operation involving unsigned operands when USHRT_MAX < INT_MAX. Thus you have overflow, and undefined behaviour.
Regarding the added question
EDIT: What should happen in the following case?
unsigned short lhs = USHRT_MAX;
unsigned short rhs = USHRT_MAX;
unsigned short product = lhs * rhs;
That is exactly the same situation. The operands of * are subject to the integer promotions, all values of type unsigned short can be represented as ints by the assumption on the values of USHRT_MAX and INT_MAX, so the multiplication is on ints, and with the specified values overflows.
You need to convert at least one operand to an unsigned type that is not promoted to int in order to have the multiplication be performed on unsigned operands.
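A minimal sketch of that fix for the variable version (the cast to unsigned int is one possible choice; the printed result assumes 16-bit unsigned short and 32-bit unsigned int):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned short lhs = USHRT_MAX;
    unsigned short rhs = USHRT_MAX;

    /* The cast makes one operand unsigned int; the other (promoted to int)
       is then converted to unsigned int too, so the multiplication wraps
       modulo UINT_MAX + 1 instead of overflowing a signed int. */
    unsigned short product = (unsigned int)lhs * rhs;

    printf("%u\n", (unsigned int)product);  /* prints 1 with the assumptions above */
    return 0;
}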
You get UB since by the time the multiplication operator is applied, its operands are already signed integers (because of the promotions to int occurring first).
You can work-around that with this:
unsigned short product = USHRT_MAX * (unsigned)USHRT_MAX;
Proof that (unsigned)some_integer stays unsigned:
#include <stdio.h>
int main(void)
{
printf("1u * (-1) = %f\n", (((unsigned)1) * (-1)) + 0.0);
printf("1 * (-1) = %f\n", (1 * (-1)) + 0.0);
return 0;
}
Output (ideone):
1u * (-1) = 4294967295.000000
1 * (-1) = -1.000000
Good catch, btw.

Why are the results of integer promotion different?

Please look at my test code:
#include <stdlib.h>
#include <stdio.h>
#define PRINT_COMPARE_RESULT(a, b)     \
    if (a > b) {                       \
        printf( #a " > " #b "\n");     \
    }                                  \
    else if (a < b) {                  \
        printf( #a " < " #b "\n");     \
    }                                  \
    else {                             \
        printf( #a " = " #b "\n" );    \
    }

int main()
{
    signed int a = -1;
    unsigned int b = 2;
    signed short c = -1;
    unsigned short d = 2;

    PRINT_COMPARE_RESULT(a,b);
    PRINT_COMPARE_RESULT(c,d);
    return 0;
}
The result is the following:
a > b
c < d
My platform is Linux, and my gcc version is 4.4.2.
I am surprised by the second line of output.
The first line of output is caused by integer promotion. But why is the result of the second line different?
The following rules are from C99 standard:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned
integer types, the operand with the type of lesser integer conversion rank is
converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
I think both of the two comparisons should belong to the same case, the second case of integer promotion.
When you use an arithmetic operator, the operands go through two conversions.
Integer promotions: If int can represent all values of the type, then the operand is promoted to int. This applies to both short and unsigned short on most platforms. The conversion performed at this stage is done on each operand individually, without regard for the other operand. (There are more rules, but this is the one that applies.)
Usual arithmetic conversions: If you compare an unsigned int against a signed int, since neither includes the entire range of the other, and both have the same rank, then both are converted to the unsigned type. This conversion is done after examining the type of both operands.
Obviously, the "usual arithmetic conversions" don't always apply, if there are not two operands. This is why there are two sets of rules. One gotcha, for example, is that shift operators << and >> don't do usual arithmetic conversions, since the type of the result should only depend on the left operand (so if you see someone type x << 5U, then the U stands for "unnecessary").
Breakdown: Let's assume a typical system with 32-bit int and 16-bit short.
int a = -1;          // "signed" is implied
unsigned b = 2;      // "int" is implied

if (a < b)
    puts("a < b");   // not printed
else
    puts("a >= b");  // printed
First the two operands are promoted. Since both are int or unsigned int, no promotions are done.
Next, the two operands are converted to the same type. Since int can't represent all possible values of unsigned, and unsigned can't represent all possible values of int, there is no obvious choice. In this case, both are converted to unsigned.
When converting from signed to unsigned, 2³² is repeatedly added to the signed value until it is in the range of the unsigned type. This is actually a no-op as far as the processor is concerned.
So the comparison becomes if (4294967295u < 2u), which is false.
Now let's try it with short:
short c = -1;        // "signed" is implied
unsigned short d = 2;

if (c < d)
    puts("c < d");   // printed
else
    puts("c >= d");  // not printed
First, the two operands are promoted. Since both can be represented faithfully by int, both are promoted to int.
Next, they are converted to the same type. But they already are the same type, int, so nothing is done.
So the comparison becomes if (-1 < 2), which is true.
Writing good code: There's an easy way to catch these "gotchas" in your code. Just always compile with warnings turned on, and fix the warnings. I tend to write code like this:
int x = ...;
unsigned y = ...;
if (x < 0 || (unsigned) x < y)
...;
You have to watch out that any code you do write doesn't run into the other signed vs. unsigned gotcha: signed overflow. For example, the following code:
int x = ..., y = ...;
if (x + 100 < y + 100)
...;
unsigned a = ..., b = ...;
if (a + 100 < b + 100)
...;
Some popular compilers will optimize (x + 100 < y + 100) to (x < y), but that is a story for another day. Just don't overflow your signed numbers.
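The signed variant cannot be demonstrated portably, since the overflow is undefined, but a sketch of the well-defined unsigned counterpart shows why x + 100 < y + 100 is not the same test as x < y near the ends of the range (values chosen purely for illustration):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned a = UINT_MAX - 50;  /* near the top of the unsigned range */
    unsigned b = 100;

    printf("%d\n", a < b);              /* prints 0 */
    printf("%d\n", a + 100 < b + 100);  /* prints 1: a + 100 wraps around to 49 */
    return 0;
}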
Footnote: Note that while signed is implied for int, short, long, and long long, it is NOT implied for char. Instead, it depends on the platform.
Taken from the C++ standard:
4.5 Integral promotions [conv.prom]
1 An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int.
In practice it means that all operations (on the types in the list) are actually evaluated in the type int if it can cover the whole value set you are dealing with; otherwise they are carried out in unsigned int.
In the first case the values are compared as unsigned int, because one of them was unsigned int, and this is why -1 is "greater" than 2. In the second case the values are compared as signed integers, as int covers the whole domain of both short and unsigned short, and so -1 is smaller than 2.
(Background story: actually, the point of this elaborate definition covering all the cases is that compilers can effectively ignore the actual type behind the operand (!) :) and care only about the data size.)
The conversion process for C++ is described as the usual arithmetic conversions. However, I think the most relevant rule is at the sub-referenced section conv.prom: Integral promotions 4.6.1:
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank ([conv.rank]) is less than the
rank of int can be converted to a prvalue of type int if int can
represent all the values of the source type; otherwise, the source
prvalue can be converted to a prvalue of type unsigned int.
The funny thing there is the use of the word "can", which I think suggests that this promotion is performed at the discretion of the compiler.
I also found this C-spec snippet that hints at the omission of promotion:
11 EXAMPLE 2 In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the ``integer promotions'' require that the abstract machine promote the value of each variable to int size
and then add the two ints and truncate the sum. Provided the addition of two chars can be done without
overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only
produce the same result, possibly omitting the promotions.
There is also the definition of "rank" to be considered. The list of rules is pretty long, but as it applies to this question "rank" is straightforward:
The rank of any unsigned integer type shall equal the rank of the
corresponding signed integer type.

Why did these two program behave differently in ANSI C? [duplicate]

1.
main()
{
    if (-1 < (unsigned char)1)
        printf("-1 is less than (unsigned char)1:ANSI semantics");
    else
        printf("-1 NOT less than (unsigned char)1:K&R semantics");
}
2.
int array[] = {23,41,12,24,52,11};
#define TOTAL_ELEMENTS (sizeof(array)/sizeof(array[0]))

main()
{
    int d = -1, x;
    if (d <= TOTAL_ELEMENTS - 2)
        x = array[d+1];
}
The first program converts the unsigned char 1 to a signed int in ANSI C, while the second program converts d to an unsigned int, which makes the condition expression evaluate to false in ANSI C.
Why did they behave differently?
For the first one the right-hand side is an unsigned char, and all unsigned char values fit into a signed int, so it is converted to signed int.
For the second one the right-hand side is an unsigned int, so the left-hand side is converted from signed int to unsigned int.
See also this CERT document on integer conversions.
starblue explained the first part of your question. I'll take the second part. Because TOTAL_ELEMENTS is a size_t, which is unsigned, the int is converted to that unsigned type. Your size_t is such that int cannot represent all of its values, so the int is converted to size_t rather than the size_t to int.
Conversion of negative numbers to unsigned is perfectly defined: the value wraps around. If you convert -1 to an unsigned int, it ends up at UINT_MAX. That is true whether or not you use two's complement to represent negative numbers.
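A minimal sketch of that wrap-around (nothing implementation-specific is assumed beyond the usual printf formatting):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int d = -1;

    /* Converting -1 to unsigned int yields UINT_MAX, regardless of how
       negative numbers are represented internally. */
    printf("%u\n", (unsigned int)d);             /* prints the value of UINT_MAX */
    printf("%d\n", (unsigned int)d == UINT_MAX); /* prints 1 */
    return 0;
}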
The rationale for C document has more information about that value preserving conversion.
Here's the way I remember how automatic conversions are applied:
if the operand sizes are different, the conversion is applied to the smaller operand to make it the same type as the larger operand (with sign extension if the smaller operand is signed)
if the operands are the same size, but one is signed and the other unsigned, then the signed operand is converted to unsigned
While the above may not be true for all implementations, I believe it is correct for all two's-complement implementations.
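A minimal sketch of those two rules of thumb (assuming 32-bit int and 64-bit long, as on a typical 64-bit Linux target):

#include <stdio.h>

int main(void)
{
    int si = -1;
    unsigned int ui = 1;
    long sl = -1;

    /* Same size, mixed signedness: the signed operand converts to unsigned,
       so -1 becomes UINT_MAX and compares greater than 1. */
    printf("%d\n", si < ui);  /* prints 0 */

    /* Different sizes: the smaller unsigned operand converts to the larger
       signed type with its value preserved, so the comparison behaves as expected. */
    printf("%d\n", sl < ui);  /* prints 1 */
    return 0;
}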
