unsigned u = 1;
int i = -u;
Does the 2nd assignment come under 6.5.5: If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Or does it come under 6.3.1.3:
1 When a value with integer type is converted to another integer type other than _Bool, ...
...
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
I wrote this question because the following (thanks to R.. for clarifications) generates undefined behaviour under 6.5.5:
int i = INT_MIN;
i = -i;
The problem with the above is that the expression -i has type int, and on a two's complement platform -INT_MIN is larger than INT_MAX, so the result is not representable in int. In that context, evaluating it is undefined behaviour.
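One way to avoid that exceptional condition is to test the operand before negating it. The following is only a sketch; checked_negate is a hypothetical helper name, not something taken from the Standard or from a library:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: negate v only when the result fits in int.
   Returns false and leaves *out untouched when -v is not representable,
   which can only happen when INT_MIN == -INT_MAX - 1 and v == INT_MIN. */
bool checked_negate(int v, int *out)
{
    if (v < -INT_MAX)
        return false;   /* -v would exceed INT_MAX */
    *out = -v;          /* safe: result is within [-INT_MAX, INT_MAX] */
    return true;
}
```

On a two's complement platform checked_negate(INT_MIN, &r) reports failure instead of evaluating -INT_MIN.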
On the other hand, for:
unsigned u = 1;
int i = -u;
-u is of type unsigned. As explained in Is unsigned integer subtraction defined behavior? although the range of unsigned is nominally from 0 to UINT_MAX, there is really no such thing as an out of range unsigned value. So 6.5.5 does not apply for -u. But we still have the assignment expression i=-u in which case 6.3.1.3 applies.
Or to put it another way, if I can reword 6.5.5, it would be:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), if the expression type is not one of the standard or extended unsigned types, the behavior is undefined. If the expression type is one of the standard or extended unsigned types, and the result is less than 0 or greater than the maximum representable value, the result shall be adjusted as per 6.3.1.3/2.
It comes under 6.3.1.3. There's nothing exceptional about the expression -u. It's equal to UINT_MAX. Assigning the result into a signed type in which the value UINT_MAX cannot be represented then results in an implementation-defined conversion or signal.
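A small sketch of the first half of that reasoning (the function names are made up for illustration): unary minus on an unsigned operand is ordinary modular arithmetic, so -u for u == 1 is exactly UINT_MAX.

```c
#include <limits.h>

/* Unary minus on an unsigned operand: the result is reduced modulo
   UINT_MAX + 1, so no exceptional condition can occur. */
unsigned negate_unsigned(unsigned u)
{
    return -u;
}

/* The value that -1u must have on every conforming implementation. */
unsigned expected_minus_one(void)
{
    return UINT_MAX;
}
```

Only the later step, storing that value in an int, goes through 6.3.1.3/3. gcc documents that conversion as reduction modulo 2^N, so int i = negate_unsigned(1u); gives -1 there; another implementation may choose differently.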
Related
#include <limits.h>
int main(){
int a = UINT_MAX;
return 0;
}
Is this UB or implementation-defined?
Links saying it's UB
https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Integer-Overflow-Basics
Allowing signed integer overflows in C/C++
Links saying it's implementation-defined
http://www.enseignement.polytechnique.fr/informatique/INF478/docs/Cpp/en/c/language/signed_and_unsigned_integers.html
Conversion rule says:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Aren't we converting a max unsigned value into a signed value?
The way I have seen it, gcc just truncates the result.
Both references are correct, but they do not address the same issue.
int a = UINT_MAX; is not an instance of signed integer overflow. This definition involves a conversion from unsigned int to int with a value that exceeds the range of type int. As quoted from the École polytechnique's site, the C Standard defines the behavior as implementation-defined.
#include <limits.h>
int main(){
int a = UINT_MAX; // implementation defined behavior
int b = INT_MAX + 1; // undefined behavior
return 0;
}
Here is the text from the C Standard:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
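Code that wants to stay inside the first paragraph (value representable, therefore unchanged) can test the range before converting. This is a sketch only, and uint_to_int_checked is a hypothetical name:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: convert v to int only when the value is
   representable, so the implementation-defined case is never reached. */
bool uint_to_int_checked(unsigned v, int *out)
{
    if (v > (unsigned)INT_MAX)
        return false;   /* conversion would be implementation-defined */
    *out = (int)v;      /* representable: the value is unchanged */
    return true;
}
```

With this, uint_to_int_checked(UINT_MAX, &a) fails on any platform where UINT_MAX is larger than INT_MAX, rather than relying on what the implementation defines.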
Some compilers have a command line option to change the behavior of signed arithmetic overflow from undefined behavior to implementation-defined: gcc and clang support -fwrapv to force integer computations to be performed modulo 2^32 or 2^64, depending on the signed type. This prevents some useful optimisations, but also prevents some counterintuitive optimisations that may break innocent looking code. See this question for some examples: What does -fwrapv do?
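Without such an option, portable code has to detect the overflow before performing the addition. A minimal sketch (add_overflows is a hypothetical helper, not a compiler builtin):

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true when a + b would not be representable in int.
   Every subexpression here stays within the range of int. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    if (b < 0)
        return a < INT_MIN - b;   /* a + b would fall below INT_MIN */
    return false;                 /* adding zero never overflows */
}
```

A caller would write if (!add_overflows(x, y)) sum = x + y; so that the undefined case is never evaluated.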
int a = UINT_MAX; does not overflow because no exceptional condition occurs while evaluating this declaration or the expression within it. This code is defined to convert UINT_MAX to the type int for the initialization of a, and the conversion is defined by the rules in C 2018 6.3.1.3.
Briefly, the rules that apply are:
6.7.9 11 says initialization behaves similarly to simple assignment: “… The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply,…”
6.5.16.1 2 says simple assignment performs a conversion: “In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.”
6.3.1.3 3, which covers conversion to a signed integer type when the operand value cannot be represented in the type, says: “either the result is implementation-defined or an implementation-defined signal is raised.”
So, the behavior is defined.
There is a general rule in 2018 6.5 5 about exceptional conditions that occur while evaluating expressions:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
However, this rule never applies in the chain above. While doing the evaluations, including the implied assignment of the initialization, we never get a result out of range of its type. The input to the conversion is out of range of the destination type, int, but the result of the conversion is in range, so there is no out-of-range result to trigger an exceptional condition.
(A possible exception to this is that the C implementation could, I suppose, define the result of the conversion to be out of range of int. I am not aware of any that do, and this is likely not what was intended by 6.3.1.3 3.)
This is not signed integer overflow:
int a = UINT_MAX;
It is a conversion from an unsigned to a signed integer type and is implementation defined. This is covered in section 6.3.1.3 of the C standard regarding conversion of signed and unsigned integer types:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
An example of signed integer overflow would be:
int x = INT_MAX;
x = x + 1;
And this is undefined. In fact, section 3.4.3 of the C standard, which defines undefined behavior, states in paragraph 3:
An example of undefined behavior is the behavior on integer overflow
And integer overflow only applies to signed types as per 6.2.5p9:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
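The modulo reduction can be observed directly. In this sketch (function names are illustrative only) the sum of two unsigned values wraps, and the wrap can be detected after the fact without any undefined behavior:

```c
#include <limits.h>
#include <stdbool.h>

/* UINT_MAX + 1 is reduced modulo UINT_MAX + 1, so this is always 0. */
unsigned max_plus_one(void)
{
    return UINT_MAX + 1u;
}

/* True when a + b wrapped around: the reduced sum is then smaller
   than either operand. */
bool add_wrapped(unsigned a, unsigned b)
{
    return a + b < a;
}
```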
In the pre-existing "language" (family of dialects) the C Standard was written to describe, implementations would generally either process signed integer overflow by doing whatever the underlying platform did, truncating values to the length of the underlying type (which is what most platforms did) even on platforms which would otherwise do something else, or triggering some form of signal or diagnostic.
In K&R's book "The C Programming Language", the behavior is described as "machine-dependent".
Although the authors of the Standard identified, in the published Rationale document, some cases where they expected that implementations for commonplace platforms would behave in commonplace fashion, they didn't want to say that certain actions would have defined behavior on some platforms but not others. Further, characterizing the behavior as "implementation-defined" would have created a problem. Consider something like:
int f1(void);
int f2(int a, int b, int c);
void test(int x, int y)
{
    int product = x*y; /* may overflow; only used if f1() returns non-zero */
    if (f1())
        f2(product, x, y);
}
If the behavior of integer overflow were "Implementation Defined", then any implementation where it could raise a signal or have other observable side effects would be required to perform the multiplication before calling f1(), even though the result of the multiply would be ignored unless f1() returns a non-zero value. Classifying it as "Undefined Behavior" avoids such issues.
Unfortunately, gcc interprets the classification as "Undefined Behavior" as an invitation to treat integer overflow in ways that aren't bound by ordinary laws of causality. Given a function like:
unsigned mul_mod_32768(unsigned short x, unsigned short y)
{
return (x*y) & 0x7FFFu;
}
an attempt to call it with x greater than INT_MAX/y may arbitrarily disrupt the behavior of surrounding code, even if the result of the function would not otherwise have been used in any observable fashion.
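The overflow arises because both unsigned short operands are promoted to int before the multiplication. A common way to avoid it, sketched here under the name mul_mod_32768_safe (a hypothetical variant, not part of the original answer), is to force the arithmetic into unsigned:

```c
/* Same computation, but 1u * x converts x to unsigned before the
   multiply, so the product is computed in unsigned arithmetic and
   wraps modulo UINT_MAX + 1 instead of overflowing int. */
unsigned mul_mod_32768_safe(unsigned short x, unsigned short y)
{
    return (1u * x * y) & 0x7FFFu;
}
```

With a 32-bit int, mul_mod_32768_safe(65535, 65535) is well defined, whereas the original version multiplies two int values whose product exceeds INT_MAX.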
Does this function invoke undefined behavior due to the - operator being applied to x which is unsigned? I searched the standard and couldn't find an explanation.
unsigned foo(unsigned x)
{
return x ^= x & -x;
}
IMO yes.
edit
#include <limits.h>
#include <stdio.h>
void func(unsigned x)
{
    printf("%x", -x);
}
int main(void)
{
    func(INT_MIN);
}
IMO The only explanation is that it was promoted to larger signed integer size then converted to unsigned.
If it is promoted to larger integer size, what will happen if there is no larger signed integer type?
The behavior of this expression is well defined.
Constructs similar to x = x + 1 are allowed because x isn't assigned a value until all other subexpressions are evaluated. The same applies in this case.
There is also no problem with -x because the expression has unsigned type and thus has well defined wraparound behavior as opposed to overflowing.
Section 6.5.3.3p3 of the C standard regarding the unary - operator states:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
So since no promotion occurs the type remains unsigned throughout the expression. Though not explicitly stated in the standard, -x is effectively the same as 0 - x.
For the specific case of INT_MIN being passed to this function, it has type int and is outside of the range of unsigned, so it is converted when passed to the function. Assuming a 32-bit int, this results in the signed value -2,147,483,648 being converted to the unsigned value 2,147,483,648 (which in two's complement happens to have the same representation, i.e. 0x80000000). Then when -x is evaluated, it wraps around, resulting in 2,147,483,648.
6.2.5 Types
...
9 The range of nonnegative values of a signed integer type is a subrange of the
corresponding unsigned integer type, and the representation of the same value in each
type is the same.41) A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
41) The same representation and alignment requirements are meant to imply interchangeability as
arguments to functions, return values from functions, and members of unions.
...
6.3 Conversions
...
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
60) The rules describe arithmetic on the mathematical value, not the value of a given type of expression
...
6.3.1.8 Usual arithmetic conversions
...
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
In short - the -x does not lead to undefined behavior. The result of the expression is still unsigned, it just maps to a well-defined, non-negative value.
Is the following code undefined behavior according to GCC in C99 mode:
signed char c = CHAR_MAX; // assume CHAR_MAX < INT_MAX
c = c + 1;
printf("%d", c);
signed char overflow does cause undefined behavior, but that is not what happens in the posted code.
With c = c + 1, the integer promotions are performed before the addition, so c is promoted to int in the expression on the right. Since 128 is less than INT_MAX, this addition occurs without incident. Note that char is typically narrower than int, but on rare systems char and int may be the same width. In either case a char is promoted to int in arithmetic expressions.
When the assignment to c is then made, if plain char is unsigned on the system in question, the result of the addition is less than UCHAR_MAX (which must be at least 255) and this value remains unchanged in the conversion and assignment to c.
If instead plain char is signed, the result of the addition is converted to a signed char value before assignment. Here, if the result of the addition can't be represented in a signed char the conversion "is implementation-defined, or an implementation-defined signal is raised," according to §6.3.1.3/3 of the Standard. SCHAR_MAX must be at least 127, and if this is the case then the behavior is implementation-defined for the values in the posted code when plain char is signed.
The behavior is not undefined for the code in question, but is implementation-defined.
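The two steps can be separated to see where the implementation-defined part lies. In this sketch (increment_promoted is a hypothetical name) only the addition is performed, and it happens in int:

```c
#include <limits.h>

/* The question's assumption, checked at compile time (C11). */
_Static_assert(SCHAR_MAX < INT_MAX, "signed char must be narrower than int");

/* c is promoted to int before the addition, so c + 1 cannot overflow
   under the assumption above. No narrowing conversion happens here. */
int increment_promoted(signed char c)
{
    return c + 1;
}
```

increment_promoted(SCHAR_MAX) is simply SCHAR_MAX + 1 as an int. The implementation-defined step only occurs if that result is then stored back into a signed char.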
No, it has implementation-defined behavior, either storing an implementation-defined result or possibly raising a signal.
Firstly, the usual arithmetic conversions are applied to the operands. This converts the operands to type int and so the computation is performed in type int. The result value 128 is guaranteed to be representable in int, since INT_MAX is guaranteed to be at least 32767 (5.2.4.2.1 Sizes of integer types), so next a value 128 in type int must be converted to type char to be stored in c. If char is unsigned, CHAR_MAX is guaranteed to be at least 255; otherwise, if SCHAR_MAX takes its minimal value of 127:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type, [if] the new type is signed and the value cannot be represented in it[,] either the
result is implementation-defined or an implementation-defined signal is raised.
In particular, gcc can be configured to treat char as either signed or unsigned (-f[un]signed-char); by default it will pick the appropriate configuration for the target platform ABI, if any. If a signed char is selected, all current gcc target platforms that I am aware of have an 8-bit byte (some obsolete targets such as AT&T DSP1600 had a 16-bit byte), so it will have range [-128, 127] (8-bit, two's complement) and gcc will apply modulo arithmetic yielding -128 as the result:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
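The documented reduction can be reproduced with ordinary int arithmetic, without relying on the implementation-defined conversion itself. This sketch assumes an 8-bit signed char with range [-128, 127]; wrap_to_int8 is a hypothetical helper:

```c
/* Reduce v modulo 2^8 into the range [-128, 127], mirroring what gcc
   documents for conversion to an 8-bit signed type. All arithmetic is
   done in int, so nothing here is implementation-defined. */
int wrap_to_int8(int v)
{
    int m = v % 256;    /* remainder in (-256, 256) */
    if (m < 0)
        m += 256;       /* now in [0, 255] */
    return m >= 128 ? m - 256 : m;
}
```

For the value 128 from the question, wrap_to_int8(128) is -128, matching the result described for gcc.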
int_8 int8 = ~0;
uint_16 uInt16 = (uint_16) int8;
Regarding the typecast above; where in C standard can I find reference to an indication for the following behaviour?
- sign extension to the larger type before the unsigned interpretation (uInt16=0xFFFF) rather than unsigned interpretation followed by 0 extension to the larger type (uInt16=0xFF).
From C99 6.3.1.8
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
The above statement is clear about which variable needs to be converted; however, it is not very clear about how the conversion should actually be performed, hence my question asking for a reference from the standard.
Thanks
As per the standard:
6.3.1.3 Signed and unsigned integers
......
2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
And the footnote to avoid the confusion when interpreting the above:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
I.e. if your int8 has a value of -1 (assuming negative values are represented in 2's complement, which they are in your example), when converted to uint16_t, the value (0xFFFF + 1) will be added to it (which is one more than the max value that can be represented by uint16_t), which yields the result of 0xFFFF + 1 - 1 = 0xFFFF.
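The two interpretations from the question can be compared side by side. This sketch uses int8_t, uint8_t and uint16_t from <stdint.h> as stand-ins for the question's int_8 and uint_16, which is an assumption about what those types are:

```c
#include <stdint.h>

/* Direct conversion: 6.3.1.3/2 works on the mathematical value, so
   65536 is added to a negative input (-1 becomes 0xFFFF). */
uint16_t convert_to_u16(int8_t v)
{
    return (uint16_t)v;
}

/* For contrast: converting to uint8_t first adds 256 instead, and the
   result is then widened unchanged (-1 becomes 0x00FF). */
uint16_t reinterpret_then_widen(int8_t v)
{
    return (uint16_t)(uint8_t)v;
}
```

The first function shows the behavior the question observed. The second is what has to be written explicitly to get the zero-extended result.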
I believe the answer is actually part of 6.3.1.8 as well:
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
....
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand.....
meaning that integer promotions are performed first before the conversion to unsigned using the rule 6.3.1.3.
Where in the C99 standard does it say that signed integer overflow is undefined behavior?
I see the comment about unsigned integer overflow being well-defined (see Why is unsigned integer overflow defined behavior but signed integer overflow isn't?) in section 6.2.5:
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
but I'm looking in Appendix J on undefined behaviors, and I only see these similar items in the list:
An expression having signed promoted type is left-shifted and either the value of the
expression is negative or the result of shifting would be not be representable in the
promoted type
and
The value of the result of an integer arithmetic or conversion function cannot be
represented
(note this refers to "an integer arithmetic function", not integer arithmetic itself).
I don't have a copy of C99, but in the C11 standard this text appears in Section 6.5, paragraph 5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Which would seem to be a catch-all for any overflow; the text about unsigned integers then becomes a special-case above 6.5 ¶ 5.
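Integer division gives one example of each kind of exceptional condition named in that paragraph. A sketch of a guard against both (checked_div is a hypothetical helper):

```c
#include <limits.h>
#include <stdbool.h>

/* Performs a / b only when the result is mathematically defined and
   representable in int; otherwise returns false and leaves *quotient
   untouched. */
bool checked_div(int a, int b, int *quotient)
{
    if (b == 0)
        return false;   /* result is not mathematically defined */
    if (b == -1 && a < -INT_MAX)
        return false;   /* -a is not in the range of int */
    *quotient = a / b;
    return true;
}
```

On a two's complement platform the second test rejects INT_MIN / -1, the one signed division whose result is out of range.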