Edited to include proper standard reference thanks to Carl Norum.
The C standard states
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Are there compiler switches that guarantee certain behaviors on integer overflow? I'd like to avoid nasal demons. In particular, I'd like to force the compiler to wrap on overflow.
For the sake of concreteness, let's take the standard to be C99 and the compiler to be gcc. But I would be interested in answers for other compilers (icc, cl) and other standards (C1x, C89). In fact, just to annoy the C/C++ crowd, I'd even appreciate answers for C++0x, C++03, and C++98.
Note: International standard ISO/IEC 10967-1 may be relevant here, but as far as I could tell it was mentioned only in the informative annex.
Take a look at -ftrapv and -fwrapv:
-ftrapv
This option generates traps for signed overflow on addition, subtraction, multiplication operations.
-fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others. This option is enabled by default for the Java front-end, as required by the Java language specification.
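For illustration, here is a minimal sketch of what -fwrapv buys you (the file name and values are mine; without the flag, the addition below is undefined behavior, and with -ftrapv the program would trap instead of wrapping):

/* wrap.c -- compile e.g. with: gcc -std=c99 -fwrapv wrap.c */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    x = x + 1;              /* UB by default; guaranteed to wrap under -fwrapv */
    printf("%d\n", x);      /* prints INT_MIN, e.g. -2147483648 with 32-bit int */
    return 0;
}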
For your C99 answer, I think 6.5 Expressions, paragraph 5 is what you're looking for:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
That means if you get an overflow, you're out of luck - no behaviour of any kind guaranteed. Unsigned types are a special case, and never overflow (6.2.5 Types, paragraph 9):
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
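A quick illustration of that unsigned rule (a sketch; the concrete value printed depends on the width of unsigned int):

#include <stdio.h>

int main(void)
{
    unsigned int u = 0u;
    u = u - 1u;             /* well-defined: reduced modulo UINT_MAX + 1 */
    printf("%u\n", u);      /* prints UINT_MAX, e.g. 4294967295 */
    return 0;
}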
C++ has the same statements, worded a bit differently:
5 Expressions, paragraph 4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —endnote]
3.9.1 Fundamental types, paragraph 4:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
In C99 the general behavior is described in 6.5/5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
The behavior of unsigned types is described in 6.2.5/9, which basically states that operations on unsigned types never lead to an exceptional condition:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
The GCC compiler has a special option, -ftrapv, which is intended to catch run-time overflow of signed integer operations.
For completeness, I'd like to add that Clang now has "checked arithmetic builtins" as a language extension. Here is an example using checked unsigned multiplication:
unsigned x, y, result;
...
if (__builtin_umul_overflow(x, y, &result)) {
    /* overflow occurred */
    ...
}
...
http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
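A self-contained variant of the fragment above, in case it helps (a sketch; it assumes a compiler that provides these builtins, such as recent Clang or GCC 5+, and a 32-bit unsigned int so the overflow actually occurs):

#include <stdio.h>

int main(void)
{
    unsigned x = 0x10000u, y = 0x10000u, result;

    if (__builtin_umul_overflow(x, y, &result)) {
        puts("overflow occurred");          /* taken when unsigned int is 32 bits */
    } else {
        printf("product: %u\n", result);
    }
    return 0;
}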
6.2.5 paragraph 9 is what you're looking for:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
The previous postings all commented on the C99 standard, but in fact this guarantee was already available earlier.
The 5th paragraph of Section 6.1.2.5 Types of the C89 standard states:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Note that this allows C programmers to replace any unsigned division by a constant with a multiplication by the inverse element of the ring formed by C's modulo-2^N integer arithmetic.
And this can be done without any "correction" as would be necessary when approximating the division by a fixed-point multiplication with the reciprocal value.
Instead, the Extended Euclidean Algorithm can be used to find the inverse element and use it as the multiplier. (Of course, for the sake of staying portable, bitwise AND operations should also be applied in order to ensure the results have the same bit widths.)
It may be worthwhile to comment that most C compilers already implement this as an optimization. However, such optimizations are not guaranteed, and therefore it might still be interesting for programmers to perform such optimizations manually in situations where speed matters, but the capabilities of the C optimizer are either unknown or particularly weak.
And as a final remark, the reason for trying to do this at all: the machine-level instructions for multiplication are typically much faster than those for division, especially on high-performance CPUs.
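As a sketch of the idea (using Newton's iteration to find the inverse instead of the extended Euclidean algorithm mentioned above, purely because it is shorter; note that an inverse modulo 2^N exists only for odd divisors, and the multiplication recovers the quotient only when the dividend is an exact multiple of the divisor):

#include <stdint.h>
#include <stdio.h>

/* Multiplicative inverse of an odd value modulo 2^32; each Newton step
   doubles the number of correct low-order bits. */
static uint32_t mod_inverse32(uint32_t d)
{
    uint32_t inv = d;                 /* correct to 3 bits for any odd d */
    for (int i = 0; i < 5; i++)
        inv *= 2u - d * inv;          /* inv <- inv * (2 - d*inv), mod 2^32 */
    return inv;
}

int main(void)
{
    const uint32_t d   = 7u;                 /* divisor: must be odd */
    const uint32_t inv = mod_inverse32(d);   /* d * inv == 1 (mod 2^32) */
    const uint32_t x   = 7u * 123456u;       /* dividend: exact multiple of d */

    printf("%lu\n", (unsigned long)(x * inv));   /* prints 123456, i.e. x / d */
    return 0;
}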
I'm not sure if there are any compiler switches you can use to enforce uniform behavior for overflows in C/C++. Another option is to use the SafeInt<T> template. It's a cross platform C++ template that provides definitive overflow / underflow checks for all types of integer operations.
http://safeint.codeplex.com/
Follow-up question for:
Type casting: double to char: multiple questions
Assigning an unsigned value to a signed char
Context: ISO/IEC 9899:202x (E) working draft — February 5, 2020 C17..C2x N2479 (emphasis added):
J.3 Implementation-defined behavior, J.3.5 Integers
— The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (6.3.1.3).
6.3.1.4 Real floating and integer
When a finite value of standard floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
Question: Why does converting an 'out of range integer to integer' lead to IB, but converting an 'out of range floating-point to integer' lead to UB? I.e., why is the behavior not consistent (e.g., IB in both cases)?
UPD. Answer from user P.P. in the duplicate question:
I doubt it's reasonably answerable. It's mainly because of history, and based on the implementations, behaviours of hardware, etc when C was standardized. So "consistency" wasn't possible/practical (it's not like the committee decided to arbitrarily classify certain behaviours as IB, UB, or unspecified).
From the point of view of the Standard, the question of whether to classify something as Implementation-Defined Behavior or Undefined Behavior depends on whether all implementations should be required to document a behavior generally consistent with the semantics of the language, regardless of cost or usefulness. There was no need to mandate that implementations process actions in ways their customers would find useful, because it was expected that implementations allowed to behave in such fashion would do so with or without a mandate. Consequently, it was seen as better to characterize as Undefined Behavior useful actions which implementations might process 100% consistently, than to characterize as Implementation-Defined actions which might sometimes be impractical to implement consistently.
Note that for an implementation to treat an action as having documented behavior could sometimes have costs that might not be obvious. Consider, for example:
int f1(int x, int y);
int f2(int x, int y, int z);

void test(int x, unsigned char y)
{
    short temp = x / (y + 1);   /* conversion to short; temp is only used if f1 returns nonzero */
    if (f1(x, y))
        f2(x, y, temp);
}
On platforms where the conversion to short would always execute without side effects, or on implementations that were allowed to treat out-of-range conversions as Undefined Behavior, the computation of x/(y+1) and conversion to short could be deferred until after the call to f1, and skipped altogether if f1 returns zero. Such transformation could affect the behavior of a signal raised by the conversion, however, and would thus not appear to be allowable under the Standard on implementations where the conversion could raise a signal.
On the other hand, while it may be useful to have implementations raise a signal in case of an out-of-bounds conversion, such signals would mainly be useful in situations where quality of diagnostics was viewed as more important than performance. Implementations where performance was more important would be free to make optimizations like the above if they processed the conversion as having no side effects, and it seemed likely that the latter course of action would be practical on all platforms.
There were platforms where the fastest way of converting a float to an int would trap; as noted, the possibility that an action may trap would make classification as Implementation-Defined behavior expensive. While it is unlikely that there would have been any platforms where it would have been impractical to process a conversion from e.g. float to short as a conversion from float to int, followed by a conversion from int to short, there are platforms where that may not be the most useful behavior (e.g. if a platform can at no extra cost peg the result of such a conversion to the range of the target type, that may be more useful than a conversion to int and then the target type). Even if the authors of the Standard had expected and intended that conversions from floating-point types to small integer types never yield unsequenced traps for any values which are within range of int, the Standard classifies as UB general actions which might behave unpredictably in some cases but in a predictable implementation-specific fashion in others, without any effort to identify specific cases where they should behave predictably.
The latter principle is perhaps best illustrated by examining the way left shift was described in C89 and C99. There is no reason why x << 0 shouldn't yield x for all integer values of x, and the way C89 specified the behavior would do precisely that. The C89 spec, however, specified behavior in some cases where it may be useful to allow some implementations to behave in a different, and not necessarily predictable, fashion. C99 makes no effort to identify situations where all implementations should treat left shifts of negative numbers the same way as C89 did, because the authors expected that all implementations would treat such cases in C89 fashion with or without a mandate.
In C, the bitwise left shift operation invokes Undefined Behaviour when the left-side operand has a negative value.
Relevant quote from ISO C99 (6.5.7/4):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
But in C++ the behaviour is well defined.
ISO C++-03 (5.8/2)
The value of E1 << E2 is E1 (interpreted as a bit pattern) left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise.
[Note: the constants ULONG_MAX and UINT_MAX are defined in the header <climits>. ]
That means
int a = -1, b = 2, c;
c = a << b;
invokes Undefined Behaviour in C but the behaviour is well defined in C++.
What forced the ISO C++ committee to consider that behaviour well defined as opposed to the behaviour in C?
On the other hand the behaviour is implementation defined for bitwise right shift operation when the left operand is negative, right?
My question is why does left shift operation invoke Undefined Behaviour in C and why does right shift operator invoke just Implementation defined behaviour?
P.S : Please don't give answers like "It is undefined behaviour because the Standard says so". :P
The paragraph you copied is talking about unsigned types. The behavior is undefined in C++. From the last C++0x draft:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
EDIT: I took a look at the C++98 paper. It just doesn't mention signed types at all. So it's still undefined behavior.
Right-shifting a negative value is implementation-defined, right. Why? In my opinion: it's easy to make implementation-defined because there are no truncation-from-the-left issues. When you shift left, you must say not only what is shifted in from the right but also what happens to the rest of the bits, e.g. with two's-complement representation, which is another story.
In C bitwise left shift operation invokes Undefined Behaviour when the left side operand has negative value. [...] But in C++ the behaviour is well defined. [...] why [...]
The easy answer is: because the standards say so.
A longer answer is: it probably has something to do with the fact that C and C++ both allow other representations for negative numbers besides two's complement. Giving fewer guarantees on what's going to happen makes it possible to use the languages on other hardware, including obscure and/or old machines.
For some reason, the C++ standardization committee felt like adding a little guarantee about how the bit representation changes. But since negative numbers may still be represented via one's complement or sign+magnitude, the resulting value possibilities still vary.
Assuming 16 bit ints, we'll have
-1 = 1111111111111111 // 2's complement
-1 = 1111111111111110 // 1's complement
-1 = 1000000000000001 // sign+magnitude
Shifted to the left by 3, we'll get
-8 = 1111111111111000 // 2's complement
-15 = 1111111111110000 // 1's complement
8 = 0000000000001000 // sign+magnitude
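To reproduce the two's-complement row above without the formally undefined signed shift, a sketch is to do the shift on an unsigned copy of the bit pattern:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t bits = (uint16_t)-1;        /* 0xFFFF: the 16-bit pattern of -1 */
    bits = (uint16_t)(bits << 3);        /* shifting the unsigned copy is well-defined */
    printf("0x%04X\n", (unsigned)bits);  /* prints 0xFFF8, i.e. -8 in two's complement */
    return 0;
}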
What forced the ISO C++ committee to consider that behaviour well defined as opposed to the behaviour in C?
I guess they made this guarantee so that you can use << appropriately when you know what you're doing (i.e. when you're sure your machine uses two's complement).
On the other hand the behaviour is implementation defined for bitwise right shift operation when the left operand is negative, right?
I'd have to check the standard. But you may be right. A right shift without sign extension on a two's-complement machine isn't particularly useful. So, the current state is definitely better than requiring vacated bits to be zero-filled, because it leaves room for machines that do sign extension -- even though it is not guaranteed.
To answer your real question as stated in the title: as for any operation on a signed type, this has undefined behavior if the result of the mathematical operation doesn't fit in the target type (under- or overflow). Signed integer types are designed like that.
For the left shift operation, if the value is positive or 0, the definition of the operator as a multiplication by a power of 2 makes sense, so everything is OK unless the result overflows; nothing surprising.
If the value is negative, you could have the same interpretation of multiplication with a power of 2, but if you just think in terms of bit shift, this would be perhaps surprising. Obviously the standards committee wanted to avoid such ambiguity.
My conclusion:
- if you want to do real bit pattern operations, use unsigned types;
- if you want to multiply a value (signed or not) by a power of two, do just that, something like i * (1u << k) (a small illustration follows below).
Your compiler will transform this into decent assembler in any case.
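A small illustration of that advice (a sketch with made-up values):

#include <stdio.h>

int main(void)
{
    int i = 3;
    unsigned k = 4;
    printf("%u\n", i * (1u << k));   /* 48: multiply by 2^k, done in unsigned arithmetic */

    unsigned bits = 0x0Fu;
    printf("0x%X\n", bits << k);     /* 0xF0: genuine bit-pattern work on an unsigned type */
    return 0;
}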
A lot of these kinds of things are a balance between what common CPUs can actually support in a single instruction and what's useful enough to expect compiler-writers to guarantee even if it takes extra instructions. Generally, a programmer using bit-shifting operators expects them to map to single instructions on CPUs with such instructions, so that's why there's undefined or implementation-defined behaviour where CPUs had various handling of "edge" conditions, rather than mandating a behaviour and having the operation be unexpectedly slow. Keep in mind that the additional pre/post or handling instructions may be needed even for the simpler use cases. Undefined behaviour may have been necessary where some CPUs generated traps/exceptions/interrupts (as distinct from C++ try/catch type exceptions) or generally useless/inexplicable results, while if the set of CPUs considered by the Standards Committee at the time all provided at least some defined behaviour, then they could make the behaviour implementation defined.
My question is why does left shift operation invoke Undefined Behaviour in C and why does right shift operator invoke just Implementation defined behaviour?
The folks at LLVM speculate the shift operator has constraints because of the way the instruction is implemented on various platforms. From What Every C Programmer Should Know About Undefined Behavior #1/3:
... My guess is that this originated because the underlying shift operations on various CPUs do different things with this: for example, X86 truncates 32-bit shift amount to 5 bits (so a shift by 32-bits is the same as a shift by 0-bits), but PowerPC truncates 32-bit shift amounts to 6 bits (so a shift by 32 produces zero). Because of these hardware differences, the behavior is completely undefined by C...
Note that the discussion was about shifting an amount greater than the register size. But it's the closest I've found to explaining the shift constraints from an authority.
I think a second reason is the potential sign change on a two's complement machine. But I've never read it anywhere (no offense to @sellibitze, and I happen to agree with him).
In C89, the behavior of left-shifting negative values was unambiguously defined on two's-complement platforms which did not use padding bits on signed and unsigned integer types. The value bits that signed and unsigned types had in common had to be in the same places, and the only place the sign bit of a signed type could go was in the same place as the upper value bit of the corresponding unsigned type, which in turn had to be to the left of everything else.
The C89 mandated behaviors were useful and sensible for two's-complement platforms without padding, at least in cases where treating them as multiplication would not cause overflow. The behavior may not have been optimal on other platforms, or on implementations that seek to reliably trap signed integer overflow. The authors of C99 probably wanted to allow implementations flexibility in cases where the C89 mandated behavior would have been less than ideal, but nothing in the rationale suggests an intention that quality implementations shouldn't continue to behave in the old fashion in cases where there was no compelling reason to do otherwise.
Unfortunately, even though there have never been any implementations of C99 that don't use two's-complement math, the authors of C11 declined to define the common-case (non-overflow) behavior; IIRC, the claim was that doing so would impede "optimization". Having the left-shift operator invoke Undefined Behavior when the left-hand operand is negative allows compilers to assume that the shift will only be reachable when the left-hand operand is non-negative.
I'm dubious as to how often such optimizations are genuinely useful, but the rarity of such usefulness actually weighs in favor of leaving the behavior undefined. If the only situations where two's-complement implementations wouldn't behave in commonplace fashion are those where the optimization would actually be useful, and if no such situations actually exist, then implementations would behave in commonplace fashion with or without a mandate, and there's no need to mandate the behavior.
The behavior in C++03 is the same as in C++11 and C99, you just need to look beyond the rule for left-shift.
Section 5p5 of the Standard says that:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined
The left-shift expressions which are specifically called out in C99 and C++11 as being undefined behavior, are the same ones that evaluate to a result outside the range of representable values.
In fact, the sentence about unsigned types using modular arithmetic is there specifically to avoid generating values outside the representable range, which would automatically be undefined behavior.
The result of shifting depends upon the numeric representation. Shifting behaves like multiplication only when numbers are represented as two's complement. But the problem is not exclusive to negative numbers. Consider a 4-bit signed number represented in excess-8 (aka offset binary). The number 1 is represented as 1+8 or
1001
If we left shift this as bits, we get
0010
which is the representation for -6. Similarly, -1 is represented as -1+8
0111
which becomes
1110
when left-shifted, the representation for +6. The bitwise behavior is well-defined, but the numeric behavior is highly dependent on the system of representation.
In the following C snippet that checks if the first two bits of a 16-bit sequence are set:
bool is_pointer(unsigned short int sequence) {
    return (sequence >> 14) == 3;
}
CLion's Clang-Tidy is giving me a "Use of a signed integer operand with a binary bitwise operator" warning, and I can't understand why. Is unsigned short not unsigned enough?
The code for this warning checks if either operand to the bitwise operator is signed. It is not sequence causing the warning, but 14, and you can alleviate the problem by making 14 unsigned by appending a u to the end.
(sequence >> 14u)
This warning is bad. As Roland's answer describes, CLion is fixing this.
There is a check in clang-tidy that is called hicpp-signed-bitwise. This check follows the wording of the HIC++ standard. That standard is freely available and says:
5.6.1. Do not use bitwise operators with signed operands
Use of signed operands with bitwise operators is in some cases subject to undefined or implementation defined behavior. Therefore, bitwise operators should only be used with operands of unsigned integral types.
The authors of the HIC++ coding standard misinterpreted the intention of the C and C++ standards and either accidentally or intentionally focused on the type of the operands instead of the value of the operands.
The check in clang-tidy implements exactly this wording, in order to conform to that standard. That check is not intended to be generally useful, its only purpose is to help the poor souls whose programs have to conform to that one stupid rule from the HIC++ standard.
The crucial point is that by definition integer literals without any suffix are of type int, and that type is defined as being a signed type. HIC++ now wrongly concludes that positive integer literals might be negative and thus could invoke undefined behavior.
For comparison, the C11 standard says:
6.5.7 Bitwise shift operators
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
This wording is carefully chosen and emphasises that the value of the right operand is important, not its type. It also covers the case of a too large value, while the HIC++ standard simply forgot that case. Therefore, saying 1u << 1000u is ok in HIC++, while 1 << 3 isn't.
The best strategy is to explicitly disable this single check. There are several bug reports for CLion mentioning this, and it is getting fixed there.
Update 2019-12-16: I asked Perforce what the motivation behind this exact wording was and whether the wording was intentional. Here is their response:
Our C++ team who were involved in creating the HIC++ standard have taken a look at the Stack Overflow question you mentioned.
In short, referring to the object type in the HIC++ rule instead of the value is an intentional choice to allow easier automated checking of the code. The type of an object is always known, while the value is not.
HIC++ rules in general aim to be "decidable". Enforcing against the type ensures that a decidable check is always possible, i.e. directly where the operator is used or where a signed type is converted to unsigned.
The rationale explicitly refers to "possible" undefined behavior, therefore a sensible implementation can exclude:
constants unless there is definitely an issue and,
unsigned types that are promoted to signed types.
The best option is therefore for CLion to limit the checking to non-constant types before promotion.
I think the integer promotion causes the warning here. Operands smaller than an int are widened to int for the arithmetic expression, and int is signed. So your code is effectively return ((int)sequence >> 14) == 3;, which leads to the warning. Try return ((unsigned)sequence >> 14) == 3; or return (sequence & 0xC000) == 0xC000; (see the sketch below).
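For reference, a sketch of the two suggested rewrites as complete functions (the function names are mine; whether a particular clang-tidy version is satisfied by each variant may still vary):

#include <stdbool.h>

bool is_pointer_shift(unsigned short sequence) {
    return ((unsigned)sequence >> 14) == 3;     /* shift done on an unsigned operand */
}

bool is_pointer_mask(unsigned short sequence) {
    return (sequence & 0xC000) == 0xC000;       /* mask the top two bits instead of shifting */
}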
I'm just wondering what guarantees, if any, either C11 or C99 provides in this regard.
Empirically, it seems that when I convert a floating-point value (regardless of its precision) to a signed integer, I get "nice" saturation whenever the floating-point value isn't representable in that signed integer range, even in the event that the floating-point value is plus or minus infinity (but I don't know or care about the NaN case).
There's a subtle issue here, which is that differences in rounding behavior could cause saturation in some cases but not in others, particularly when we're right on the edge of a saturation boundary. I'm not concerned about that. My question is whether, once the floating-point machinery has decided upon the integer that it needs to output (which is platform-dependent), saturation is guaranteed by the spec in the event that said integer lies outside the target signed integer range (which is platform-independent).
My default understanding is that what I'm seeing is merely a convenience of the underlying hardware, and that such behavior is not guaranteed because signed overflow is undefined. I hope I'm wrong, because I hate signed overflow and am trying to avoid it. So yes I'm also interested in the case of conversion to unsigned integers.
While I'm at it, what about negative 0? Is this value guaranteed to convert to integer zero, even though in some sense you could think of it as negative epsilon, which conventionally would round to -1?
6.3.1.4 Real floating and integer
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
phresnel has already done a nice job answering the main thrust of your question. Some other details to keep in mind:
So yes I'm also interested in the case of conversion to unsigned integers.
The situation for unsigned isn't any nicer. Footnote 61 in C11 (the same footnote is present in C99):
The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX+1)
Fortunately, this is easily remedied for both signed and unsigned conversions; simply clamp your input before converting if you need saturation.
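A minimal sketch of that clamping approach for int32_t (my own helper; NaN handling is deliberately left out, as in the question, and it relies on INT32_MAX and INT32_MIN being exactly representable as double):

#include <stdint.h>

static int32_t saturating_to_int32(double x)
{
    if (x >= 2147483647.0)          /* x >= INT32_MAX, including +infinity */
        return INT32_MAX;
    if (x <= -2147483648.0)         /* x <= INT32_MIN, including -infinity */
        return INT32_MIN;
    return (int32_t)x;              /* now in range: truncates toward zero */
}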
While I'm at it, what about negative 0? Is this value guaranteed to convert to integer zero, even though in some sense you could think of it as negative epsilon, which conventionally would round to -1?
Yes, it is guaranteed to convert to integer zero. First, the value of -0 is exactly zero, not negative epsilon (contrary to the rumors you read on the internet). Second, conversions from floating-point to integer truncate the value, so even if the value were "negative epsilon" (whatever that means), the result would be zero because "negative epsilon" lies in the interval (-1, 1).
While I'm at it, what about negative 0? Is this value guaranteed to convert to integer zero, even though in some sense you could think of it as negative epsilon, which conventionally would round to -1?
It is truncated, so it goes towards zero - meaning anything less than 1.0 and greater than -1.0 becomes 0. Negative zero is turned into zero as far as "typical" platforms are concerned. I'm not entirely sure if this is guaranteed by the standard, but I believe in practice you can rely on it, even if the standard doesn't define it [unless you plan for your code to run on extremely "strange" equipment, such as DSPs or GPUs].