Does x ^= x & -x; where x is an unsigned integer invoke UB? - c

Does this function invoke undefined behavior due to the - operator being applied to x which is unsigned? I searched the standard and couldn't find an explanation.
unsigned foo(unsigned x)
{
return x ^= x & -x;
}
IMO yes.
edit
#include <limits.h>
#include <stdio.h>

void func(unsigned x)
{
printf("%x", -x);
}
int main(void)
{
func(INT_MIN);
}
IMO the only explanation is that it was promoted to a larger signed integer type and then converted to unsigned.
If it is promoted to a larger integer type, what will happen if there is no larger signed integer type?

The behavior of this expression is well defined.
Constructs similar to x = x + 1 are allowed because x isn't assigned a value until all other subexpressions are evaluated. The same applies in this case.
There is also no problem with -x because the expression has unsigned type and thus has well defined wraparound behavior as opposed to overflowing.
Section 6.5.3.3p3 of the C standard regarding the unary - operator states:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
So since no promotion occurs the type remains unsigned throughout the expression. Though not explicitly stated in the standard, -x is effectively the same as 0 - x.
For the specific case of INT_MIN being passed to this function: INT_MIN has type int and is outside the range of unsigned, so it is converted when passed to the function. Assuming 32-bit int and unsigned, the signed value -2,147,483,648 is converted to the unsigned value 2,147,483,648 (on a two's complement machine the two happen to have the same representation, 0x80000000). Then, when -x is evaluated, it wraps around, resulting in 2,147,483,648 again.
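A minimal sketch to check these claims, assuming a 32-bit unsigned int and a two's complement int. foo is the function from the question; negate_by_subtraction and check_foo_examples are hypothetical names added for illustration. It shows that -x on an unsigned operand equals 0u - x, and that x ^= x & -x clears the lowest set bit of x, with every step well defined.

```c
#include <assert.h>
#include <limits.h>

/* The function from the question. x & -x isolates the lowest set bit of x,
   and x ^= (that bit) clears it. All arithmetic is unsigned, so -x wraps
   modulo UINT_MAX + 1 and nothing here is undefined. */
unsigned foo(unsigned x)
{
    return x ^= x & -x;
}

/* Hypothetical helper: unary minus on an unsigned operand gives the same
   value as 0u - x. */
unsigned negate_by_subtraction(unsigned x)
{
    return 0u - x;
}

/* Usage checks, assuming 32-bit unsigned int and two's complement int. */
void check_foo_examples(void)
{
    unsigned m = (unsigned)INT_MIN;  /* 2147483648 with 32-bit types */
    assert(-m == negate_by_subtraction(m));
    assert(-m == m);                 /* 2^32 - 2^31 == 2^31 */
    assert(foo(12u) == 8u);          /* 1100b -> 1000b */
    assert(foo(0u) == 0u);           /* no set bit to clear */
    assert(foo(m) == 0u);            /* the only set bit is cleared */
}
```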

6.2.5 Types
...
9 The range of nonnegative values of a signed integer type is a subrange of the
corresponding unsigned integer type, and the representation of the same value in each
type is the same.41) A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
41) The same representation and alignment requirements are meant to imply interchangeability as
arguments to functions, return values from functions, and members of unions.
...
6.3 Conversions
...
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
60) The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
...
6.3.1.8 Usual arithmetic conversions
...
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
In short: -x does not lead to undefined behavior. The result of the expression is still unsigned; it just maps to a well-defined, non-negative value.
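To make 6.3.1.3p2 concrete, the sketch below applies the rule literally to a mathematical value held in a long long, so its result can be compared against an ordinary cast. It assumes long long can hold UINT_MAX + 1 (true for a 32-bit unsigned int); convert_by_rule is a hypothetical name, and the loops are for illustration rather than efficiency.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper that applies 6.3.1.3p2 literally to the mathematical
   value v: add or subtract UINT_MAX + 1 until the value is in range.
   Assumes long long can hold UINT_MAX + 1. */
unsigned convert_by_rule(long long v)
{
    assert(UINT_MAX < LLONG_MAX);
    const long long modulus = (long long)UINT_MAX + 1;
    while (v < 0)
        v += modulus;                 /* add one more than the maximum */
    while (v > (long long)UINT_MAX)
        v -= modulus;                 /* or subtract it */
    return (unsigned)v;               /* now representable: unchanged (p1) */
}
```

With a 32-bit unsigned int, both convert_by_rule(-5) and (unsigned)-5 give 4294967291.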

Related

integer promotion and unsigned interpretation

int8_t int8 = ~0;
uint16_t uInt16 = (uint16_t) int8;
Regarding the cast above: where in the C standard can I find a reference for the following behaviour?
- sign extension to the larger type before the unsigned interpretation (uInt16=0xFFFF) rather than unsigned interpretation followed by 0 extension to the larger type (uInt16=0xFF).
From C99 6.3.1.8
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
The above statement is clear about which operand needs to be converted, but it is not very clear about how the conversion should actually be performed, hence my question asking for a reference from the standard.
Thanks
As per the standard:
6.3.1.3 Signed and unsigned integers
......
2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
And the footnote to avoid the confusion when interpreting the above:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
I.e. if your int8 has the value -1 (which is what ~0 gives when negative numbers are represented in two's complement, as in your example), then when it is converted to uint16_t, the value 0xFFFF + 1 (one more than the maximum value representable by uint16_t) is added to it, which yields 0xFFFF + 1 - 1 = 0xFFFF.
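A small sketch of the two candidate behaviours from the question, using the standard int8_t and uint16_t from <stdint.h>. widen_direct, widen_via_uint8 and check_widening_examples are hypothetical names; the ~0 == -1 step assumes two's complement, as in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Converts the mathematical value (6.3.1.3p2): -1 becomes -1 + 65536. */
uint16_t widen_direct(int8_t v)
{
    return (uint16_t)v;
}

/* Goes through uint8_t first: -1 becomes 255, which then fits in uint16_t
   and is kept unchanged (6.3.1.3p1). This is the "zero extension" result. */
uint16_t widen_via_uint8(int8_t v)
{
    return (uint16_t)(uint8_t)v;
}

void check_widening_examples(void)
{
    int8_t int8 = ~0;                        /* -1 on two's complement */
    assert(widen_direct(int8) == 0xFFFF);    /* what the standard requires */
    assert(widen_via_uint8(int8) == 0x00FF); /* only with an explicit uint8_t step */
}
```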
The answer, I believe, is actually part of 6.3.1.8 as well:
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
....
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand.....
meaning that the integer promotions are performed first, before the conversion to unsigned using the rule in 6.3.1.3.

Extension on shifting or arithmetic operations in standard C

Sorry for bad English.
uint16_t a, c;
uint8_t b = 0xff;
a = b<<8;
c = b*10;
What values do a and c get? What is the situation with arbitrary integer types?
uint16_t a, c;
uint8_t b = 0xff;
a = b<<8;
First, the integer promotions are performed on the arguments of <<. The constant 8 is an int and thus is not converted. Since the conversion rank of uint8_t is smaller than that of int, and all values of uint8_t are representable as ints, b is converted - preserving its value - to int. The resulting int value is then shifted left by eight bits.
If int is only 16 bits wide, the value 0xff * 2^8 is not representable as an int, and then the shift invokes undefined behaviour - 6.5.7 (4) in n1570 and C99:
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Otherwise, the result is 255*256 = 65280 = 0xFF00. Since that value is representable in the type of a, the conversion of the int result of the shift to uint16_t preserves the value; if the result were out of range (e.g. if the shift distance were 9 [and int wide enough]), it would be reduced modulo 2^16 to obtain a value in the range 0 to 2^16 - 1 of uint16_t.
c = b*10;
The usual arithmetic conversions are performed on the operands of *. Both operands have an integer type, thus first the integer promotions are performed. Since 10 is an int and all values of b's type are representable as an int, the integer promotions give both operands the same type, int, and the usual arithmetic conversions don't require any further conversions. The multiplication is done at type int, its result, 2550, is again representable in the type of c, so the conversion to uint16_t that is done before storing the value in c preserves the value.
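The two statements from the question can be wrapped in hypothetical functions to check the values described above. This sketch assumes a 32-bit int; with a 16-bit int, b << 8 would be undefined, as explained above.

```c
#include <assert.h>
#include <stdint.h>

/* b is promoted to int, so b << 8 is an int shift. With a 32-bit int the
   result 65280 fits, and storing it in uint16_t keeps the value. */
uint16_t shift_case(uint8_t b)
{
    uint16_t a = b << 8;
    return a;
}

/* Both operands are int after promotion; 255 * 10 == 2550 fits in
   uint16_t, so the store keeps the value. */
uint16_t mul_case(uint8_t b)
{
    uint16_t c = b * 10;
    return c;
}

void check_question_values(void)
{
    assert(shift_case(0xff) == 0xFF00);
    assert(mul_case(0xff) == 2550);
}
```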
What is situation with arbitrary integer types?
For <<:
1. Integer promotions: values/expressions of an integer type whose conversion rank is less than or equal to that of int (and unsigned int) [integer types with width <= that of (unsigned) int], and bit-fields of type _Bool, int, signed int and unsigned int, are converted to int or unsigned int (int if that can represent all values of the original type, unsigned int otherwise).
2. If the (promoted) right operand (shift distance) is negative or greater than or equal to the width (number of value bits plus sign bits; there is either one sign bit or none) of the (promoted) left operand, the behaviour is undefined. If the value of the (promoted) left operand is negative, the behaviour is undefined. If the type of the (promoted) left operand is unsigned, the result is value * 2^distance, reduced modulo 2^width. If the type of the (promoted) left operand is signed and the value nonnegative, the result is value * 2^distance if that is representable in the type, the behaviour is undefined otherwise.
3. If no undefined behaviour occurred in 2., the result is converted to the type of the variable it is stored in:
   - if the target type is _Bool (or an alias thereof), a nonzero result is converted to 1, a zero result to 0; otherwise,
   - if the result can be represented in the target type, its value is preserved; otherwise,
   - if the target type is unsigned, the result is reduced modulo 2^width; otherwise,
   - the result is converted in an implementation-defined manner or an implementation-defined signal is raised.
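A sketch of the << rules, assuming a 32-bit int and that uint32_t is unsigned int. The function names are hypothetical; the asserts encode the preconditions that keep each shift defined.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned (promoted) left operand: the result is value * 2^distance
   reduced modulo 2^32, so bits shifted past the top are discarded. */
uint32_t shl_u32(uint32_t v, unsigned distance)
{
    assert(distance < 32);            /* distance >= width is undefined */
    return v << distance;
}

/* b is promoted to int. With a 32-bit int, 255 << 23 still fits, so the
   shift itself is defined; storing into uint8_t then reduces the result
   modulo 2^8 (step 3). */
uint8_t shl_then_store_u8(uint8_t b, unsigned distance)
{
    assert(distance < 24);
    return (uint8_t)(b << distance);
}
```

For example, shl_then_store_u8(0xff, 4) computes the int 4080 and stores 4080 mod 256 = 0xF0.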
For *:
1. The usual arithmetic conversions are performed, so that both (converted) operands have the same type.
2. The multiplication is performed at the resulting type; if that is a signed integer type and the multiplication overflows, the behaviour is undefined.
3. The result is converted to the target type in the same manner as above.
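A sketch of the multiplication pitfall these rules imply for narrow unsigned types, assuming a 32-bit int: two uint16_t operands are promoted to int, so their product can overflow a signed type. The function names are hypothetical, and converting one operand to uint32_t is just one common way to keep the multiplication unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* With a 32-bit int, x * y on two uint16_t values is an int multiplication,
   and 0xFFFF * 0xFFFF == 4294836225 would overflow int (undefined).
   Converting one operand to uint32_t makes the usual arithmetic conversions
   choose an unsigned type, where the product is well defined. */
uint32_t mul_u16_wide(uint16_t x, uint16_t y)
{
    return (uint32_t)x * y;
}

void check_mul_examples(void)
{
    assert(mul_u16_wide(0xFFFF, 0xFFFF) == 4294836225u);
    assert(mul_u16_wide(3, 4) == 12u);
}
```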
That's how the abstract machine is defined, if the implementation can achieve the same results (where the behaviour is defined) in another manner, it can do as it pleases under the as-if rule.

unsigned and int gotcha [duplicate]

Possible Duplicate:
Arithmetic operations on unsigned and signed integers
unsigned int b=2;
int a=-2;
if(a>b)
printf("a>b");
else
printf("b>a");
OUTPUT: a>b
int b=2;
int a=-2;
if(a>b)
printf("a>b");
else
printf("b>a");
OUTPUT: b>a
Please, can someone explain the output?
In the first case both operands are converted to unsigned int. The converted a will be UINT_MAX - 1, which is much larger than b, hence the output.
Don't compare signed and unsigned integers unless you understand the semantics of the arithmetic conversions; the results might surprise you.
When signed and unsigned values are compared, and when the unsigned values can't all be represented in the signed type, then the signed operand is converted to unsigned. This is done with a formula that amounts to a reinterpretation of the two's complement bit pattern.
Well, negative numbers have lots of high bits set...
Since your operands are all of the same rank, it's just a matter of unsigned bit patterns being compared.
And so -2 is represented with 111111..110, one less than the largest possible, and it easily beats 2 when interpreted as unsigned.
The following is taken from The C Programming Language by Kernighan and Ritchie - 2.7 Type Conversions - page 44; the second half of the page explains the same scenario in detail. A small portion is below for your reference.
Conversion rules are complicated when unsigned operands are involved. The problem is that comparisons between signed and unsigned values are machine dependent, because they depend on the sizes of the various integer types. For example, suppose that int is 16 bits long and long is 32 bits. Then -1L < 1U, because 1U, which is an unsigned int, is promoted to a signed long. But -1L > 1UL, because -1L is promoted to unsigned long and thus appears to be a larger positive number.
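The K&R example can be checked on the machine at hand. This sketch uses hypothetical function names; the preprocessor test selects the expected result of -1L < 1U from the actual sizes of long and unsigned int, while -1L > 1UL is expected everywhere.

```c
#include <assert.h>
#include <limits.h>

/* long and unsigned long have the same rank, so -1L is converted to
   unsigned long (ULONG_MAX): true on every implementation. */
int minus_one_long_gt_one_ulong(void)
{
    return -1L > 1UL;
}

/* Depends on the sizes: if long can represent every unsigned int value,
   1U becomes the long 1 and this is true; if it cannot (e.g. both are
   32 bits), both operands become unsigned long and this is false. */
int minus_one_long_lt_one_uint(void)
{
    return -1L < 1U;
}

void check_kr_examples(void)
{
    assert(minus_one_long_gt_one_ulong());
#if LONG_MAX >= UINT_MAX
    assert(minus_one_long_lt_one_uint());
#else
    assert(!minus_one_long_lt_one_uint());
#endif
}
```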
You need to learn the operation of the operators in C and the C promotion and conversion rules. They are explained in the C standard. Some excerpts from it plus my comments:
6.5.8 Relational operators
Syntax
1 relational-expression:
shift-expression
relational-expression < shift-expression
relational-expression > shift-expression
relational-expression <= shift-expression
relational-expression >= shift-expression
Semantics
3 If both of the operands have arithmetic type, the usual arithmetic conversions are
performed.
Most operators include this "usual arithmetic conversions" step before the actual operation (addition, multiplication, comparison, etc.). - Alex
6.3.1.8 Usual arithmetic conversions
1 Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is the type domain of the operands if they are the same,
and complex otherwise. This pattern is called the usual arithmetic conversions:
First, if the corresponding real type of either operand is long double, the other
operand is converted, without change of type domain, to a type whose
corresponding real type is long double.
Otherwise, if the corresponding real type of either operand is double, the other
operand is converted, without change of type domain, to a type whose
corresponding real type is double.
Otherwise, if the corresponding real type of either operand is float, the other
operand is converted, without change of type domain, to a type whose
corresponding real type is float.
Otherwise, the integer promotions are performed on both operands. Then the
following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned
integer types, the operand with the type of lesser integer conversion rank is
converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type. (The rules describe arithmetic on the mathematical value, not the value of a given type of expression.)
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
So, in your a>b (with a being an int and b being an unsigned int), per the above rules a is converted to unsigned int before the comparison. Since a is negative (-2), the unsigned value becomes UINT_MAX+1+a (this is the "repeatedly adding or subtracting one more than the maximum value" bit). And UINT_MAX+1+a in your case is UINT_MAX+1-2 = UINT_MAX-1, which is a huge positive number compared to the value of b (2). And so a>b yields "true".
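A sketch of the first snippet, with a hypothetical alternative that tests the sign before comparing. The names are illustrative, and the checks hold for any width of unsigned int.

```c
#include <assert.h>
#include <limits.h>

/* The comparison from the first snippet: a is converted to unsigned int. */
int gt_mixed(int a, unsigned b)
{
    return a > b;
}

/* One possible way to get the mathematical answer: handle negative a first,
   then compare two non-negative values. */
int gt_mathematical(int a, unsigned b)
{
    return a >= 0 && (unsigned)a > b;
}

void check_gotcha_examples(void)
{
    assert((unsigned)-2 == UINT_MAX - 1);   /* UINT_MAX + 1 - 2 */
    assert(gt_mixed(-2, 2u));               /* why "a>b" is printed */
    assert(!gt_mathematical(-2, 2u));
}
```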
Forget the math you learned at school. Learn how C does it.
In the first case, the signed a is converted to unsigned int. Then these two are compared.
Since C99, a signed integer type and its corresponding unsigned type have the same conversion rank. When the operands have such corresponding types, as int and unsigned int do here, the signed operand is converted to the unsigned type (6.3.1.8).
Here is a summary of the rules.

Is conversion from unsigned to signed undefined?

void fun(){
signed int a=-5;
unsigned int b=-5;
printf("the value of b is %u\n",b);
if(a==b)
printf("same\n");
else
printf("diff");
}
It is printing :
4294967291
same
In the 2nd line the signed value is converted to an unsigned value. So b has the value UINT_MAX + 1 - 5 = 4294967291 (with a 32-bit unsigned int).
My question is what is happening in the comparison operation.
1) Is a again converted to unsigned and compared with b ?
2) Will b (i.e. the unsigned operand) ever be converted to a signed value and compared automatically?
3) Is conversion from unsigned to signed undefined due to int overflow ?
I have read other posts on the topic. I just want clarification on questions 2 and 3 .
1) Is a again converted to unsigned and compared with b ?
Yes. In the expression (a==b), the implicit type conversion called "balancing" takes place (the formal name is "the usual arithmetic conversions"). The balancing rules specify that if a signed and an unsigned operand of the same rank are compared, the signed operand is converted to unsigned.
2) Will b (i.e. the unsigned operand) ever be converted to a signed value and compared automatically?
No, it will never be converted to signed in your example.
3) Is conversion from unsigned to signed undefined due to int overflow ?
This is what the standard says: (C11)
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged.
2 Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than the
maximum value that can be represented in the new type until the value
is in the range of the new type.
3 Otherwise, the new type is
signed and the value cannot be represented in it; either the result is
implementation-defined or an implementation-defined signal is raised.
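In other words, if the value can be represented in the new type (paragraph 1) or the new type is unsigned (paragraph 2), the behavior is well-defined. If the new type is signed and the value cannot be represented in it (paragraph 3), the result depends on the compiler implementation.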
It is not undefined behavior.
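A sketch covering the question's code and a hypothetical checked conversion that stays within 6.3.1.3p1, so the result never depends on the implementation. The function names are illustrative only.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: converts only when the value is representable, so
   the result is the one guaranteed by 6.3.1.3p1; other values are reported
   instead of relying on the implementation-defined result of 6.3.1.3p3. */
int unsigned_to_int_checked(unsigned u, int *out)
{
    if (u > INT_MAX)
        return 0;
    *out = (int)u;
    return 1;
}

/* The code from the question: b gets UINT_MAX + 1 - 5, and in a == b the
   int operand is converted to unsigned int as well, so they compare equal. */
void check_question_examples(void)
{
    signed int a = -5;
    unsigned int b = -5;
    assert(b == UINT_MAX - 4);
    assert(a == b);
}
```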
Answers:
1) a is converted to unsigned int.
2) If a had a wider range than the signed counterpart of b (I can imagine long long a would do), b would be converted to a signed type.
3) If an unsigned value can't be correctly represented after conversion to a signed type, you'll have implementation-defined behavior. If it can, no problem.
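A sketch of answer 2), assuming long long can represent every unsigned int value (true when unsigned int is 32 bits). The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* long long can represent every unsigned int value here, so b is converted
   to long long and the comparison is mathematically correct. */
int ll_less_than_unsigned(long long a, unsigned b)
{
    return a < b;
}

/* With int instead, a is converted to unsigned int. */
int int_less_than_unsigned(int a, unsigned b)
{
    return a < b;
}

void check_rank_examples(void)
{
    assert(LLONG_MAX >= UINT_MAX);              /* the stated assumption */
    assert(ll_less_than_unsigned(-5, 5u));      /* -5 < 5 */
    assert(!int_less_than_unsigned(-5, 5u));    /* UINT_MAX - 4 < 5 is false */
}
```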
b = -5;
That counts as overflow and is undefined behavior. Comparing signed with unsigned automatically promotes the operands to unsigned. In case of overflow, you'll find yourself again in a case of undefined behavior. (Plain wrong, see [edit] below.)
See also http://c-faq.com/expr/preservingrules.html.
[edit] Correction: the standard does define the conversion of a negative signed value to unsigned. Per 6.3.1.3p2 the value is reduced modulo UINT_MAX + 1, which on a two's complement machine amounts to reinterpreting the bit pattern, so neither b = -5 nor the comparison is undefined behavior.

unary - operator: implementation defined or undefined

unsigned u = 1;
int i = -u;
Does the 2nd assignment come under 6.5p5: If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Or does it come under 6.3.1.3:
1 When a value with integer type is converted to another integer type other than _Bool, ...
...
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
I wrote this question because the following (thanks to R.. for clarifications) generates undefined behaviour under 6.5p5:
int i = INT_MIN;
i = -i;
The problem with the above is that the expression -i has type int, and on a two's complement platform -INT_MIN is INT_MAX + 1, which is larger than INT_MAX. In that case, it generates undefined behaviour.
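A sketch of how the INT_MIN case can be avoided by checking before negating. checked_negate is a hypothetical helper; the comparison against -INT_MAX is true only for INT_MIN on a two's complement platform, and never true otherwise.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: negates i only when -i is representable. Only
   INT_MIN can fail this test (when INT_MIN < -INT_MAX), and for it the
   negation would be undefined behavior. */
int checked_negate(int i, int *out)
{
    assert(out != NULL);
    if (i < -INT_MAX)
        return 0;
    *out = -i;
    return 1;
}
```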
On the other hand, for:
unsigned u = 1;
int i = -u;
-u is of type unsigned. As explained in Is unsigned integer subtraction defined behavior? although the range of unsigned is nominally from 0 to UINT_MAX, there is really no such thing as an out of range unsigned value. So 6.5p5 does not apply to -u. But we still have the assignment expression i=-u, to which 6.3.1.3 applies.
Or, to put it another way, if I could reword 6.5p5, it would be:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), if the expression type is not one of the standard or extended unsigned types, the behavior is undefined. If the expression type is one of the standard or extended unsigned types, and the result is less than 0 or greater than the maximum representable value, the result shall be adjusted as per 6.3.1.3/2.
It comes under 6.3.1.3. There's nothing exceptional about the expression -u. It's equal to UINT_MAX. Assigning the result into a signed type in which the value UINT_MAX cannot be represented then results in an implementation-defined conversion or signal.
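A sketch of a portable alternative to the implementation-defined conversion in the initialization int i = -u. unsigned_to_int_wrapping is a hypothetical helper that computes the int congruent to u modulo UINT_MAX + 1 using only defined operations, assuming UINT_MAX == 2 * INT_MAX + 1 (two's complement, no padding bits).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns the int congruent to u modulo UINT_MAX + 1
   without using the implementation-defined conversion of 6.3.1.3p3. */
int unsigned_to_int_wrapping(unsigned u)
{
    assert(UINT_MAX / 2 == INT_MAX);     /* the stated assumption */
    if (u <= INT_MAX)
        return (int)u;                   /* representable: unchanged */
    /* u - (UINT_MAX + 1) rewritten as -(UINT_MAX - u) - 1; here
       UINT_MAX - u <= INT_MAX, so no step overflows. */
    return -(int)(UINT_MAX - u) - 1;
}
```

For u = 1, -u is UINT_MAX, and unsigned_to_int_wrapping(-u) yields -1.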
