Is negating the integer -2^(31) undefined behavior according to the C standard, or does it simply yield -2^(31) again? When I try it, the latter holds, but it would be interesting to know how the C standard specifies it.
The standard (n2176 draft) says explicitly at 6.5 Expressions § 5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not
mathematically defined or not in the range of representable values for its type), the behavior is
undefined.
We are exactly there: the result is not in the range of representable values for the type, so it is explicitly UB.
That being said, most implementations use two's complement for negative values and process operations on signed types as the operation on the unsigned type values having the same representation, which is perfectly defined.
So getting -2^(31) again can be expected on common implementations. But as the standard says that it is UB, it cannot be relied on.
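Since the result cannot be relied on, one option is to test for the problematic value before negating. The following is only a minimal sketch; `checked_negate` is a hypothetical helper name, not something taken from the standard or a library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores -x in *out and returns true when -x is
   representable as int; returns false and leaves *out untouched otherwise. */
bool checked_negate(int x, int *out)
{
    /* On two's complement, INT_MIN is the only value below -INT_MAX.
       Comparing against -INT_MAX also stays correct on representations
       where INT_MIN == -INT_MAX. */
    if (x < -INT_MAX)
        return false;
    *out = -x;
    return true;
}
```

The check happens before the negation, so the expression `-x` is never evaluated for a value whose negation is out of range.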
Related
Follow-up question for:
Type casting: double to char: multiple questions
Assigning an unsigned value to a signed char
Context: ISO/IEC 9899:202x (E) working draft — February 5, 2020 C17..C2x N2479 (emphasis added):
J.3 Implementation-defined behavior, J.3.5 Integers
— The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (6.3.1.3).
6.3.1.4 Real floating and integer
When a finite value of standard floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
Question: Why does converting an out-of-range integer to an integer type lead to IB, while converting an out-of-range floating-point value to an integer type leads to UB? I.e., why is the behavior not consistent (e.g. IB in both cases)?
UPD. Answer from user P.P. in duplicated question:
I doubt it's reasonably answerable. It's mainly because of history, and based on the implementations, behaviours of hardware, etc when C was standardized. So "consistency" wasn't possible/practical (it's not like the committee decided to arbitrarily classify certain behaviours as IB, UB, or unspecified).
From the point of view of the Standard, the question of whether to classify something as Implementation-Defined Behavior or Undefined Behavior depends on whether all implementations should be required to document a behavior generally consistent with the semantics of the language, regardless of cost or usefulness. There was no need to mandate that implementations process actions in ways their customers would find useful, because it was expected that implementations allowed to behave in such fashion would do so with or without a mandate. Consequently, it was seen as better to characterize as Undefined Behavior useful actions which implementations might process 100% consistently, than to characterize as Implementation-Defined actions which might sometimes be impractical to implement consistently.
Note that for an implementation to treat an action as having documented behavior could sometimes have costs that might not be obvious. Consider, for example:
int f1(int x, int y);
int f2(int x, int y, int z);
void test(int x, unsigned char y)
{
    short temp = x/(y+1);
    if (f1(x,y))
        f2(x,y,temp);
}
On platforms where the conversion to short would always execute without side effects, or on implementations that were allowed to treat out-of-range conversions as Undefined Behavior, the computation of x/(y+1) and conversion to short could be deferred until after the call to f1, and skipped altogether if f1 returns zero. Such transformation could affect the behavior of a signal raised by the conversion, however, and would thus not appear to be allowable under the Standard on implementations where the conversion could raise a signal.
On the other hand, while it may be useful to have implementations raise a signal in case of an out-of-bounds conversion, such signals would mainly be useful in situations where quality of diagnostics was viewed as more important than performance. Implementations where performance was more important would be free to make optimizations like the above if they processed the conversion as having no side effects, and it seemed likely that the latter course of action would be practical on all platforms.
There were platforms where the fastest way of converting a float to an int will trap; as noted, the possibility that an action may trap would make classification as Implementation-Defined behavior expensive. While it is unlikely that there would have been any platforms where it would have been impractical to process a conversion from e.g. float to short as a conversion from float to int, followed by a conversion from int to short, there are platforms where that may not be the most useful behavior (e.g. if a platform can at no extra cost peg the result of such a conversion to the range of the target type, that may be more useful than a conversion to int and then the target type). Even if the authors of the Standard would have expected and intended that conversions from floating-point types to small integer types never yield unsequenced traps for any values which are within range of int, the Standard classifies as UB general actions which might behave unpredictably in some cases but in a predictable implementation-specific fashion in others, without any effort to identify specific cases where they should behave predictably.
The latter principle is perhaps best illustrated by examining the way left shift was described in C89 and C99. There is no reason why x << 0 shouldn't yield x for all integer values of x, and the way C89 specified the behavior would do precisely that. The C89 spec, however, specified behavior in some cases where it may be useful to allow some implementations to behave in a different, and not necessarily predictable, fashion. C99 makes no effort to identify situations where all implementations should treat left shifts of negative numbers the same way as C89 did, because the authors expected that all implementations would treat such cases in C89 fashion with or without a mandate.
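Because an out-of-range floating-point to integer conversion is undefined, portable code has to range-check before converting. The sketch below is one way to do that; `checked_double_to_int` is a hypothetical name, and the comparison bounds assume that every `int` value (and `INT_MAX + 1`) is exactly representable in `double`, which holds for a 32-bit `int` with IEEE 754 doubles but is an assumption, not a guarantee.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: converts d to int only when the truncated value fits.
   Assumes (double)INT_MIN - 1.0 and (double)INT_MAX + 1.0 are exact. */
bool checked_double_to_int(double d, int *out)
{
    /* The fractional part is discarded, so any d strictly between
       INT_MIN - 1 and INT_MAX + 1 truncates to a representable int.
       A NaN fails both comparisons and is rejected. */
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return false;
    *out = (int)d;
    return true;
}
```

A saturating variant that pegs the result to the range of the target type would replace the `return false` with an assignment of `INT_MIN` or `INT_MAX`; which of the two is more useful depends on the application.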
I'll quote from N1570, but the C11 standard has similar wording:
The fpclassify macro classifies its argument value as NaN, infinite, normal,
subnormal, zero, or into another implementation-defined category. First, an argument
represented in a format wider than its semantic type is converted to its semantic type.
Then classification is based on the type of the argument.
(my emphasis)
And a footnote:
Since an expression can be evaluated with more range and precision than its type has, it is important to
know the type that classification is based on. For example, a normal long double value might
become subnormal when converted to double, and zero when converted to float.
What does it mean for the argument to be "converted to its semantic type"? There is no definition of "semantic type" anywhere evident.
My understanding is that any excess precision is removed, as if storing the expression's value to a variable of float, double or long double, resulting in a value of the precision the programmer expected. In which case, using fpclassify() and friends on an lvalue would result in no conversion necessary for a non-optimising compiler. Am I correct, or are these functions much less useful than advertised to be?
(This question arises from comments to a Code Review answer)
The semantic type is simply the type of the expression as described elsewhere in the C standard, disregarding the fact that the value is permitted to be represented with excess precision and range. Equivalently, the semantic type is the type of the expression if clause 5.2.4.2.2 paragraph 9 (which says that floating-point values may be evaluated with excess range and precision) were not in the standard.
Converting an argument to its semantic type means discarding the excess precision and range (by rounding the value to the semantic type using whatever rounding rule is in effect for the operation).
Regarding your hypothesis that applying fpclassify to an lvalue does not require any conversion (because the value stored in an object designated by an lvalue must have already been converted to its semantic type when it was assigned), I am not sure that holds formally. Certainly when the object’s value is updated by assignment, 5.2.4.2.2 9 requires that excess range and precision be removed. But consider alternate ways of modifying the value, such as the postfix increment operator. Does that count as an assignment? Its specification in 6.5.2.4 2 says to see the discussion of compound assignment for information on its conversions and effects. That is a bit vague. One would have to consider all possible ways of modifying an object and evaluate what the C standard says about them.
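The effect described in the footnote can be observed by classifying the same value at two different semantic types. This is a small sketch, not a definitive test: the helper names are hypothetical, and the specific classifications assume IEEE 754 binary32 `float` and binary64 `double` with subnormals enabled (no flush-to-zero mode).

```c
#include <math.h>

/* Hypothetical helpers: classify a value at a chosen semantic type.
   The cast to float discards any extra range and precision, so the
   classification is based on float rather than double. */
int classify_as_double(double x) { return fpclassify(x); }
int classify_as_float(double x)  { return fpclassify((float)x); }
```

Under those assumptions, `1e-40` is a normal `double` but a subnormal `float`, and `1e-50` is a normal `double` that becomes zero when converted to `float`.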
In the following C snippet that checks if the first two bits of a 16-bit sequence are set:
bool is_pointer(unsigned short int sequence) {
    return (sequence >> 14) == 3;
}
CLion's Clang-Tidy is giving me a "Use of a signed integer operand with a binary bitwise operator" warning, and I can't understand why. Is unsigned short not unsigned enough?
The code for this warning checks if either operand to the bitwise operator is signed. It is not sequence causing the warning, but 14, and you can alleviate the problem by making 14 unsigned by appending a u to the end.
(sequence >> 14u)
This warning is bad. As Roland's answer describes, CLion is fixing this.
There is a check in clang-tidy that is called hicpp-signed-bitwise. This check follows the wording of the HIC++ standard. That standard is freely available and says:
5.6.1. Do not use bitwise operators with signed operands
Use of signed operands with bitwise operators is in some cases subject to undefined or implementation defined behavior. Therefore, bitwise operators should only be used with operands of unsigned integral types.
The authors of the HIC++ coding standard misinterpreted the intention of the C and C++ standards and either accidentally or intentionally focused on the type of the operands instead of the value of the operands.
The check in clang-tidy implements exactly this wording, in order to conform to that standard. That check is not intended to be generally useful; its only purpose is to help the poor souls whose programs have to conform to that one stupid rule from the HIC++ standard.
The crucial point is that by definition integer literals without any suffix are of type int, and that type is defined as being a signed type. HIC++ now wrongly concludes that positive integer literals might be negative and thus could invoke undefined behavior.
For comparison, the C11 standard says:
6.5.7 Bitwise shift operators
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
This wording is carefully chosen and emphasises that the value of the right operand is important, not its type. It also covers the case of a too large value, while the HIC++ standard simply forgot that case. Therefore, saying 1u << 1000u is ok in HIC++, while 1 << 3 isn't.
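A check that follows the C11 wording looks at the value of the shift count, not at its type. The following is a minimal sketch under the assumption that `unsigned` has no padding bits, so that its width equals `sizeof(unsigned) * CHAR_BIT`; `checked_shl` is a hypothetical helper name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: performs value << amount only when the shift count
   is non-negative and smaller than the width of the left operand. */
bool checked_shl(unsigned value, int amount, unsigned *out)
{
    /* Assumption: unsigned has no padding bits. */
    const int width = (int)(sizeof(unsigned) * CHAR_BIT);

    if (amount < 0 || amount >= width)
        return false;
    *out = value << amount;
    return true;
}
```

Note that the signed `int` shift count is perfectly fine here as long as its value is in range, while a count of 1000 is rejected even though it could be written with an unsigned type.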
The best strategy is to explicitly disable this single check. There are several bug reports for CLion mentioning this, and it is getting fixed there.
Update 2019-12-16: I asked Perforce what the motivation behind this exact wording was and whether the wording was intentional. Here is their response:
Our C++ team who were involved in creating the HIC++ standard have taken a look at the Stack Overflow question you mentioned.
In short, referring to the object type in the HIC++ rule instead of the value is an intentional choice to allow easier automated checking of the code. The type of an object is always known, while the value is not.
HIC++ rules in general aim to be "decidable". Enforcing against the type ensures that a decidable check is always possible, ie. directly where the operator is used or where a signed type is converted to unsigned.
The rationale explicitly refers to "possible" undefined behavior, therefore a sensible implementation can exclude:
constants unless there is definitely an issue and,
unsigned types that are promoted to signed types.
The best operation is therefore for CLion to limit the checking to non-constant types before promotion.
I think integer promotion causes the warning here. Operands narrower than an int are widened to int for the arithmetic expression, and int is signed. So your code is effectively return ( (int)sequence >> 14)==3; which leads to the warning. Try return ( (unsigned)sequence >> 14)==3; or return (sequence & 0xC000)==0xC000;.
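Putting the suggestions from these answers together, both operands of the bitwise operator can be made unsigned explicitly. This is only a sketch: the function names are hypothetical, and it assumes that `sequence` carries a 16-bit value even if `unsigned short` happens to be wider on a given platform.

```c
#include <stdbool.h>

/* Variant 1: cast the promoted operand to unsigned and use an unsigned
   shift count, so no operand of >> has a signed type. */
bool is_pointer_shift(unsigned short sequence)
{
    return ((unsigned)sequence >> 14u) == 3u;
}

/* Variant 2: test the two top bits of a 16-bit value with a mask. */
bool is_pointer_mask(unsigned short sequence)
{
    return ((unsigned)sequence & 0xC000u) == 0xC000u;
}
```

For 16-bit inputs the two variants agree; the original version with a plain `14` computes the same result, since the value of the shift count is what matters to the C standard.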
In C99, the term arithmetic operation appears 16 times, but I don't see a definition for it.
The term arithmetic operator only appears twice in the text (again without definition) but it does appear in the Index:
arithmetic operators
additive, 6.5.6, G.5.2
bitwise, 6.5.10, 6.5.11, 6.5.12
increment and decrement, 6.5.2.4, 6.5.3.1
multiplicative 6.5.5, G.5.1
shift, 6.5.7
unary, 6.5.3.3
Then we have + - | &(binary) ++ -- *(binary) / % << >> ~ as arithmetic operators, if the Index is considered normative!
Perhaps we should identify arithmetic operation as being the use of an arithmetic operator. But F.9.4.5 says that the sqrt() function is also an arithmetic operation, and refers to IEC 60559 (a.k.a. IEEE 754) for details. So there must be arithmetic operations that are not just the use of arithmetic operators.
Since we don't have a formal definition, let's see if we can piece together a rational interpretation of what an arithmetic operation should be. This will be speculative, but I cannot find any obvious defect reports or open issues that cover this.
I guess I would start with what are considered arithmetic types, which are covered in section 6.2.5 Types; paragraph 18 says (emphasis mine going forward):
Integer and floating types are collectively called arithmetic types.
Each arithmetic type belongs to one type domain: the real type domain
comprises the real types, the complex type domain comprises the
complex types.
ok, so we know that an arithmetic operation has to operate on either an integer or a floating point type. So what is an operation? It seems like we have a good go at defining that from section 5.1.2.3 Program execution paragraph 2 which says:
Accessing a volatile object, modifying an object, modifying a file, or
calling a function that does any of those operations are all side
effects,11) which are changes in the state of the execution
environment. [...]
So modifying an object, or calling a function that does so, is an operation. What is an object? Section 3.14 says:
region of data storage in the execution environment, the contents of
which can represent values
Although the standard seems to use the term operation more loosely to mean an evaluation, for example in section 7.12.1 Treatment of error conditions it says:
The behavior of each of the functions in <math.h> is specified for all
representable values of its input arguments, except where stated
otherwise. Each function shall execute as if it were a single
operation without generating any externally visible exceptional
conditions.
and in section 6.5 Expressions paragraph 8 which says:
A floating expression may be contracted, that is, evaluated as though
it were an atomic operation [...]
So this would seem to imply that an evaluation is an operation.
So it would seem from these sections that pretty much all the arithmetic operators and any math function would fall under a common sense definition of arithmetic operation.
The most convincing bit I could find to be an implicit definition lies in 7.14 Signal Handling, paragraph 3, in the definition of the SIGFPE signal:
SIGFPE - an erroneous arithmetic operation, such as a zero divide or an operation resulting in overflow
One might then draw a conclusion that any operation that may cause SIGFPE to be raised can be considered an arithmetic operation; only arithmetic operations can result in the SIGFPE signal being raised.
That covers pretty much anything in <math.h> and the arithmetic operators, and <complex.h> if implemented. While a signal may not be raised for integral types, signed overflow and other "exceptional" conditions are allowed to generate trap representations, which means no other operations may be carried out reliably until a valid value is obtained — something that can only be done via assignment. In other words, the definition can apply equally to operations on an integral value.
As a result, pretty much any operation other than getting the size of an object/type, dereferencing a pointer, and taking the address of an object may be considered an arithmetic operation. Note that a[n] is *((a) + (n)), so even using an array can be considered an arithmetic operation.
An arithmetic operation involves the manipulation of numbers. sqrt also manipulates numbers, and that could be the reason the standard calls it an arithmetic operation.
Edited to include proper standard reference thanks to Carl Norum.
The C standard states
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Are there compiler switches that guarantee certain behaviors on integer overflow? I'd like to avoid nasal demons. In particular, I'd like to force the compiler to wrap on overflow.
For the sake of uniqueness, let's take the standard to be C99 and the compiler to be gcc. But I would be interested in answers for other compilers (icc, cl) and other standards (C1x, C89). In fact, just to annoy the C/C++ crowd, I'd even appreciate answers for C++0x, C++03, and C++98.
Note: International standard ISO/IEC 10967-1 may be relevant here, but as far as I could tell it was mentioned only in the informative annex.
Take a look at -ftrapv and -fwrapv:
-ftrapv
This option generates traps for signed overflow on addition, subtraction, multiplication operations.
-fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others. This option is enabled by default for the Java front-end, as required by the Java language specification.
For your C99 answer, I think 6.5 Expressions, paragraph 5 is what you're looking for:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
That means if you get an overflow, you're out of luck - no behaviour of any kind guaranteed. Unsigned types are a special case, and never overflow (6.2.5 Types, paragraph 9):
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
C++ has the same statements, worded a bit differently:
5 Expressions, paragraph 4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note]
3.9.1 Fundamental types, paragraph 4:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
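The modulo rule for unsigned types quoted above can be seen directly in a few lines. This is a small illustration with hypothetical wrapper names; the expected values follow from reduction modulo `UINT_MAX + 1`.

```c
#include <limits.h>

/* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so these functions
   are well defined for every pair of arguments. */
unsigned wrapping_add(unsigned a, unsigned b) { return a + b; }
unsigned wrapping_sub(unsigned a, unsigned b) { return a - b; }
unsigned wrapping_mul(unsigned a, unsigned b) { return a * b; }
```

For example, `wrapping_add(UINT_MAX, 1u)` yields `0u` and `wrapping_sub(0u, 1u)` yields `UINT_MAX`, on every conforming implementation.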
In C99 the general behavior is described in 6.5/5:
If an exceptional condition occurs
during the evaluation of an expression
(that is, if the result is not
mathematically defined or not in the
range of representable values for its
type), the behavior is undefined.
The behavior of unsigned types is described in 6.2.5/9, which basically states that operations on unsigned types never lead to an exceptional condition:
A computation involving unsigned
operands can never overflow, because a
result that cannot be represented by
the resulting unsigned integer type is
reduced modulo the number that is one
greater than the largest value that
can be represented by the resulting
type.
GCC has a special option, -ftrapv, which is intended to trap signed integer overflow at run time.
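Independently of any compiler switch, signed overflow can be avoided by testing the operands before the addition is performed. The following is a minimal sketch in standard C; `checked_add` is a hypothetical helper name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the sum
   is representable as int; returns false without adding otherwise. */
bool checked_add(int a, int b, int *out)
{
    /* The comparisons are arranged so that INT_MAX - b and INT_MIN - b
       cannot overflow themselves. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

The test runs before the addition, so the overflowing expression is never evaluated and no undefined behavior occurs.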
For completeness, I'd like to add that Clang now has "checked arithmetic builtins" as a language extension. Here is an example using checked unsigned multiplication:
unsigned x, y, result;
...
if (__builtin_umul_overflow(x, y, &result)) {
    /* overflow occurred */
    ...
}
...
http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
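A complete, compilable version of that fragment might look as follows. This is a sketch: `checked_umul` is a hypothetical wrapper name, and the builtin itself is a compiler extension (Clang, and also GCC since version 5), so the code will not compile on compilers that lack it.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrapper around the builtin: returns true and stores x * y in
   *result when the product fits in unsigned; returns false on overflow. */
bool checked_umul(unsigned x, unsigned y, unsigned *result)
{
    /* The builtin returns a nonzero value when the multiplication overflowed. */
    return !__builtin_umul_overflow(x, y, result);
}
```

Since unsigned multiplication wraps anyway, the builtin does not avoid undefined behavior here; it only reports that the mathematical result did not fit.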
6.2.5 paragraph 9 is what you're looking for:
The range of nonnegative values of a signed integer type is a subrange of the
corresponding unsigned integer type, and the representation of the same value in each
type is the same.31) A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
The previous postings all commented on the C99 standard, but in fact this guarantee was already available earlier.
The 5th paragraph of Section 6.1.2.5 Types of the C89 standard states
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned
integer type is reduced modulo the number that is one greater than
the largest value that can be represented by the resulting unsigned integer type.
Note that this allows C programmers to replace an unsigned division by a constant with a multiplication by the inverse element of that constant in the ring formed by C's modulo 2^N arithmetic. Such an inverse exists only for odd constants, and the product equals the quotient only when the division is exact (i.e. leaves no remainder).
In those cases this can be done without the "correction" that would be necessary when approximating the division by a fixed-point multiplication with the reciprocal value.
Instead, the Extended Euclidean Algorithm can be used to find the inverse element and use it as the multiplier. (Of course, for the sake of staying portable, bitwise AND operations should also be applied in order to ensure the results have the same bit widths.)
It may be worthwhile to comment that most C compilers already implement this as an optimization. However, such optimizations are not guaranteed, and therefore it might still be interesting for programmers to perform such optimizations manually in situations where speed matters, but the capabilities of the C optimizer are either unknown or particularly weak.
And as a final remark, the reason for why trying to do so at all: The machine-level instructions for multiplication are typically much faster than those for division, especially on high-performance CPUs.
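The idea can be sketched for 32-bit values as follows. Note the swapped-in technique: instead of the Extended Euclidean Algorithm mentioned above, this sketch finds the inverse with Newton's iteration, which is shorter to write. The function names are hypothetical, and the code assumes that `uint32_t` is not narrower than `int`, so that the multiplications are carried out in unsigned arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: multiplicative inverse of an odd d modulo 2^32,
   computed with Newton's iteration (not the extended Euclidean algorithm). */
uint32_t inverse_mod_2_32(uint32_t d)
{
    uint32_t inv = d;            /* d * d == 1 (mod 8): 3 correct low bits */
    for (int i = 0; i < 4; ++i)  /* correct bits double: 6, 12, 24, 48 */
        inv *= 2u - d * inv;
    return inv;
}

/* Computes n / d by multiplication. Valid only when d is odd and n is an
   exact multiple of d; otherwise the result is not the quotient. */
uint32_t exact_div(uint32_t n, uint32_t d)
{
    return n * inverse_mod_2_32(d);
}
```

For instance, the inverse of 3 modulo 2^32 is 0xAAAAAAAB, and `exact_div(21u, 3u)` yields 7. In real code the inverse of a constant divisor would be precomputed once rather than on every call.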
I'm not sure if there are any compiler switches you can use to enforce uniform behavior for overflows in C/C++. Another option is to use the SafeInt<T> template. It's a cross platform C++ template that provides definitive overflow / underflow checks for all types of integer operations.
http://safeint.codeplex.com/