#include <limits.h>
int main(){
int a = UINT_MAX;
return 0;
}
Is this UB or implementation defined?
Links saying it's UB:
https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Integer-Overflow-Basics
Allowing signed integer overflows in C/C++
Links saying it's implementation defined:
http://www.enseignement.polytechnique.fr/informatique/INF478/docs/Cpp/en/c/language/signed_and_unsigned_integers.html
Conversion rule says:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Aren't we converting a max unsigned value into a signed value?
The way I have seen it, gcc just truncates the result.
Both references are correct, but they do not address the same issue.
int a = UINT_MAX; is not an instance of signed integer overflow, this definition involves a conversion from unsigned int to int with a value that exceeds the range of type int. As quoted from the École polytechnique's site, the C Standard defines the behavior as implementation-defined.
#include <limits.h>
int main(){
int a = UINT_MAX; // implementation defined behavior
int b = INT_MAX + 1; // undefined behavior
return 0;
}
Here is the text from the C Standard:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Some compilers have a command line option to change the behavior of signed arithmetic overflow from undefined behavior to implementation-defined: gcc and clang support -fwrapv to force integer computations to be performed modulo 2^32 or 2^64, depending on the width of the signed type. This prevents some useful optimisations, but also prevents some counterintuitive optimisations that may break innocent looking code. See this question for some examples: What does -fwrapv do?
int a = UINT_MAX; does not overflow because no exceptional condition occurs while evaluating this declaration or the expression within it. This code is defined to convert UINT_MAX to the type int for the initialization of a, and the conversion is defined by the rules in C 2018 6.3.1.3.
Briefly, the rules that apply are:
6.7.9 11 says initialization behaves similarly to simple assignment: “… The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply,…”
6.5.16.1 2 says simple assignment performs a conversion: “In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.”
6.3.1.3 3, which covers conversion to a signed integer type when the operand value cannot be represented in the type, says: “either the result is implementation-defined or an implementation-defined signal is raised.”
So, the behavior is defined.
There is a general rule in 2018 6.5 5 about exceptional conditions that occur while evaluating expressions:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
However, this rule never applies in the chain above. While doing the evaluations, including the implied assignment of the initialization, we never get a result out of range of its type. The input to the conversion is out of range of the destination type, int, but the result of the conversion is in range, so there is no out-of-range result to trigger an exceptional condition.
(A possible exception to this is that the C implementation could, I suppose, define the result of the conversion to be out of range of int. I am not aware of any that do, and this is likely not what was intended by 6.3.1.3 3.)
This is not signed integer overflow:
int a = UINT_MAX;
It is a conversion from an unsigned to a signed integer type and is implementation defined. This is covered in section 6.3.1.3 of the C standard regarding conversion of signed and unsigned integer types:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
An example of signed integer overflow would be:
int x = INT_MAX;
x = x + 1;
And this is undefined. In fact section 3.4.3 of the C standard which defines undefined behavior states in paragraph 4:
An example of undefined behavior is the behavior on integer overflow
And integer overflow only applies to signed types as per 6.2.5p9:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type
In the pre-existing "language" (family of dialects) the C Standard was written to describe, implementations would generally process signed integer overflow in one of three ways: doing whatever the underlying platform did; truncating values to the width of the underlying type (which is what most platforms did), even on platforms that would otherwise do something else; or raising some form of signal or diagnostic.
In K&R's book "The C Programming Language", the behavior is described as "machine-dependent".
Although the authors of the Standard identified in the published Rationale document some cases where they expected implementations for commonplace platforms to behave in commonplace fashion, they didn't want to say that certain actions would have defined behavior on some platforms but not others. Further, characterizing the behavior as "implementation-defined" would have created a problem. Consider something like:
int f1(void);
int f2(int a, int b, int c);

void test(int x, int y)
{
    int product = x * y;
    if (f1())
        f2(product, x, y);
}
If the behavior of integer overflow were "Implementation Defined", then any implementation where it could raise a signal or have other observable side effects would be required to perform the multiplication before calling f1(), even though the result of the multiply would be ignored unless f1() returns a non-zero value. Classifying it as "Undefined Behavior" avoids such issues.
Unfortunately, gcc interprets the classification as "Undefined Behavior" as an invitation to treat integer overflow in ways that aren't bound by ordinary laws of causality. Given a function like:
unsigned mul_mod_32768(unsigned short x, unsigned short y)
{
return (x*y) & 0x7FFFu;
}
an attempt to call it with x greater than INT_MAX/y may arbitrarily disrupt the behavior of surrounding code, even if the result of the function would not otherwise have been used in any observable fashion.
Related
Simple question on type conversion in C, assume this line of code:
signed char a = 133;
As the max value of a signed char is 127, does the above code have implementation defined behaviour according to the third rule of casting?
if the value cannot be represented by the new type and it's not unsigned, then the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
First of all, 133 is not unsigned. Since it will always fit in an int, it will be of type int, and signed (furthermore in C99+, all unsuffixed decimal constants are signed! To get unsigned numbers you must add U/u at the end).
Second, this isn't a cast but a conversion. A cast in C is explicit conversion (or non-conversion) to a certain type, marked with construct (type)expression. In this case you could write the initialization to use an explicit cast with
signed char a = (signed char)133;
In this case it would not change the behaviour of the initialization.
Third, this is indeed an initialization, not an assignment, so it has different rules for what is an acceptable expression. If this initializer is for an object with static storage duration, then the initializer must be a certain kind of compile-time constant. But for this particular case, both assignment and initialization would do the conversion the same way.
Now we get to the point whether the 3rd integer conversion rule applies - for that you need to know what the 2 first ones are:
the target type is an integer type (not _Bool) with the value representable in it (does not apply in this case since as you well know 133 is not representable if SCHAR_MAX is 127)
the target type is unsigned (well it isn't)
so therefore we get to C11 6.3.1.3p3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The question is whether it has implementation-defined behaviour - yes, the implementation must document what will happen: either how it calculates the result, or which signal it will raise on that occasion.
For GCC 10.2 the manuals state it:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
The Clang "documentation" is a "little" less accessible: you just have to read the source code...
This is implicit type conversion at assignment. On a typical implementation, the bit pattern of 133 is stored unchanged into the variable a. 133 in binary is 10000101, which, interpreted as a signed char, has its leading bit set and therefore represents a negative number. Using the 2's complement interpretation, the actual value of a comes out to -123. (The result depends on how negative numbers are represented for signed types; the conversion is implementation-defined.)
int tx = INT_MAX + 1; // 2147483648
printf("tx = %d\n", tx);
prints tx = -2147483648.
I was wondering how to explain the result based on 6.3 Conversions in C11 standard?
when evaluating INT_MAX +1, are both operands int? Is the result 2147483648 long int? Which rule in 6.3 determines the type of the result?
when evaluating tx = ..., are the higher bits of the bit representation of the right-hand side truncated so that its size changes from long int size to int size, and is the truncated result then interpreted as int? Which rules in 6.3 determine how the conversion in this step is done?
Both INT_MAX and 1 have type int, so the result will have type int. Performing this operation causes signed integer overflow which is undefined behavior.
Section 3.4.3p3 Gives this as an example of undefined behavior:
EXAMPLE An example of undefined behavior is the behavior on integer overflow.
The relevant part here is 6.5/5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
This happens because both INT_MAX and the integer constant 1 have types int. So you simply can't do INT_MAX + 1. And there are no implicit promotions/conversions present to save the day, so 6.3 does not apply. It's a bug, anything can happen.
What you could do is to force a conversion by changing the code to int tx = INT_MAX + 1u;. Here one operand, 1u, is of unsigned int type. Therefore the usual arithmetic conversions convert INT_MAX to type unsigned int (See Implicit type promotion rules). The result is a well-defined 2147483648 and of type unsigned int.
Then there's an attempt to store this inside int tx, conversion to the left operand of assignment applies and then the conversion rules of 6.3 kick in. Specifically 6.3.1.3/3:
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
So by changing the type to 1u we changed the code from undefined to implementation-defined behavior. Still not ideal, but at least now the code has deterministic behavior on the given compiler. In theory, the result could be a SIGFPE signal, but in practice all real-world 2's complement 32/64 bit compilers are likely to give you the result -2147483648.
Ironically, all real-world 2's complement CPUs I've ever heard of perform signed overflow in a deterministic way. So the undefined behavior part of C is just an artificial construct by the C standard, caused by the useless language feature that allows exotic 1's complement and signed magnitude formats. In such exotic formats, signed overflow could lead to a trap representation and so C must claim that integer overflow is undefined behavior, even though it is not on the real-world 2's complement CPU that the C program is executing on.
Is the following code undefined behavior according to GCC in C99 mode:
char c = CHAR_MAX; // assume CHAR_MAX < INT_MAX
c = c + 1;
printf("%d", c);
signed char overflow does cause undefined behavior, but that is not what happens in the posted code.
With c = c + 1, the integer promotions are performed before the addition, so c is promoted to int in the expression on the right. Since 128 is less than INT_MAX, this addition occurs without incident. Note that char is typically narrower than int, but on rare systems char and int may be the same width. In either case a char is promoted to int in arithmetic expressions.
When the assignment to c is then made, if plain char is unsigned on the system in question, the result of the addition is less than UCHAR_MAX (which must be at least 255) and this value remains unchanged in the conversion and assignment to c.
If instead plain char is signed, the result of the addition is converted to a signed char value before assignment. Here, if the result of the addition can't be represented in a signed char the conversion "is implementation-defined, or an implementation-defined signal is raised," according to §6.3.1.3/3 of the Standard. SCHAR_MAX must be at least 127, and if this is the case then the behavior is implementation-defined for the values in the posted code when plain char is signed.
The behavior is not undefined for the code in question, but is implementation-defined.
No, it has implementation-defined behavior, either storing an implementation-defined result or possibly raising a signal.
Firstly, the usual arithmetic conversions are applied to the operands. This converts the operands to type int and so the computation is performed in type int. The result value 128 is guaranteed to be representable in int, since INT_MAX is guaranteed to be at least 32767 (5.2.4.2.1 Sizes of integer types), so next a value 128 in type int must be converted to type char to be stored in c. If char is unsigned, CHAR_MAX is guaranteed to be at least 255; otherwise, if SCHAR_MAX takes its minimal value of 127:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type, [if] the new type is signed and the value cannot be represented in it[,] either the
result is implementation-defined or an implementation-defined signal is raised.
In particular, gcc can be configured to treat char as either signed or unsigned (-fsigned-char / -funsigned-char); by default it will pick the appropriate configuration for the target platform ABI, if any. If a signed char is selected, all current gcc target platforms that I am aware of have an 8-bit byte (some obsolete targets such as AT&T DSP1600 had a 16-bit byte), so it will have range [-128, 127] (8-bit, two's complement) and gcc will apply modulo arithmetic yielding -128 as the result:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
Where in the C99 standard does it say that signed integer overflow is undefined behavior?
I see the comment about unsigned integer overflow being well-defined (see Why is unsigned integer overflow defined behavior but signed integer overflow isn't?) in section 6.2.5:
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
but I'm looking in Appendix J on undefined behaviors, and I only see these similar items in the list:
An expression having signed promoted type is left-shifted and either the value of the expression is negative or the result of shifting would not be representable in the promoted type
and
The value of the result of an integer arithmetic or conversion function cannot be
represented
(note this refers to "an integer arithmetic function", not integer arithmetic itself)
I don't have a copy of C99, but in the C11 standard this text appears in Section 6.5, paragraph 5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Which would seem to be a catch-all for any overflow; the text about unsigned integers then acts as a special case that takes precedence over 6.5 ¶ 5.
unsigned u = 1;
int i = -u;
Does the 2nd assignment come under 6.5 ¶5: If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Or does it come under 6.3.1.3:
1 When a value with integer type is converted to another integer type other than _Bool, ...
...
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
I wrote this question because the following (thanks to R.. for clarifications) generates undefined behaviour under 6.5 ¶5:
int i = INT_MIN;
i = -i;
The problem with the above is that the expression -i has type int, and on a 2's complement platform -INT_MIN is larger than INT_MAX. In that context, it generates undefined behaviour.
On the other hand, for:
unsigned u = 1;
int i = -u;
-u is of type unsigned. As explained in Is unsigned integer subtraction defined behavior?, although the range of unsigned is nominally from 0 to UINT_MAX, there is really no such thing as an out-of-range unsigned value. So 6.5 ¶5 does not apply to -u. But we still have the assignment expression i = -u, in which case 6.3.1.3 applies.
Or to put it another way, if I could reword 6.5 ¶5, it would be:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), and the expression's type is not one of the standard or extended unsigned types, the behavior is undefined. If the expression's type is one of the standard or extended unsigned types, and the result is less than 0 or greater than the maximum representable value, the result shall be adjusted as per 6.3.1.3/2.
It comes under 6.3.1.3. There's nothing exceptional about the expression -u. It's equal to UINT_MAX. Assigning the result into a signed type in which the value UINT_MAX cannot be represented then results in an implementation-defined conversion or signal.