Is it legal to zero the memory of an array of doubles (using memset(…, 0, …)) or struct containing doubles?
The question implies two different things:
From the point of view of the C standard: is this undefined behavior or not? (On any particular platform, I presume, this cannot be undefined behavior, since it just depends on the in-memory representation of floating-point numbers, and that's all.)
From a practical point of view: is it OK on the Intel platform? (Regardless of what the standard says.)
The C99 standard Annex F says:
This annex specifies C language support for the IEC 60559 floating-point standard. The
IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for
microprocessor systems, second edition (IEC 60559:1989), previously designated
IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point
Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove
dependencies on radix and word length. IEC 60559 generally refers to the floating-point
standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that
defines __STDC_IEC_559__ shall conform to the specifications in this annex. Where
a binding between the C language and IEC 60559 is indicated, the IEC 60559-specified
behavior is adopted by reference, unless stated otherwise.
And, immediately after:
The C floating types match the IEC 60559 formats as follows:
The float type matches the IEC 60559 single format.
The double type matches the IEC 60559 double format.
Thus, since IEC 60559 is basically IEEE 754-1985, and since that standard specifies that 8 zero bytes mean 0.0 (as David Heffernan said), it means that if you find __STDC_IEC_559__ defined, you can safely do a 0.0 initialization with memset.
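For example, here is a minimal sketch of how such a guard might look (the function name and the fallback loop are mine, not from the standard):

#include <string.h>

void zero_doubles(double *a, size_t n)
{
#if defined(__STDC_IEC_559__)
    /* IEC 60559 (IEEE 754): all-bits-zero is +0.0, so memset is safe. */
    memset(a, 0, n * sizeof *a);
#else
    /* Portable fallback: plain assignment works with any representation. */
    for (size_t i = 0; i < n; i++)
        a[i] = 0.0;
#endif
}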
If you are talking about IEEE 754, then the standard defines +0.0 in double precision as 8 zero bytes. If you know that you are backed by IEEE 754 floating point, then this is well-defined.
As for Intel, I can't think of a compiler that doesn't use IEEE 754 on Intel x86/x64.
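As a quick illustration (a minimal sketch, assuming a 64-bit double; it only checks for an all-zero pattern, so it does not depend on endianness), you can inspect the bits of +0.0 directly:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double d = +0.0;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* well-defined way to view the object representation */
    printf("+0.0 bits: 0x%016llx\n", (unsigned long long)bits);  /* 0x0000000000000000 on IEEE 754 */
    return 0;
}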
David Heffernan has given a good answer for part (2) of your question. For part (1):
The C99 standard makes no guarantees about the representation of floating-point values in the general case. §6.2.6.1 says:
The representations of all types are unspecified except as stated in this subclause.
...and that subclause makes no further mention of floating point.
You said:
(on a fixed platform, how can this be UB ... it just depends on the floating-point representation, that's all ...)
Indeed, there is a difference between "undefined behaviour", "unspecified behaviour" and "implementation-defined behaviour":
"undefined behaviour" means that anything could happen (including a runtime crash);
"unspecified behaviour" means that the compiler is free to implement something sensible in any way it likes, but there is no requirement for the implementation choice to be documented;
"implementation-defined behaviour" means that the compiler is free to implement something sensible in any way it likes, and is supposed to document that choice (for example, see here for the implementation choices documented by the most recent release of GCC);
and so, as floating-point representation is unspecified, it can vary in an undocumented manner from platform to platform (where "platform" here means "the combination of hardware and compiler" rather than just "hardware").
(I'm not sure how useful the guarantee that a double is represented such that all-bits-zero is +0.0 if __STDC_IEC_559__ is defined, as described in Matteo Italia's answer, actually is in practice. For example, GCC never defines this, even though it uses IEEE 754 / IEC 60559 on many hardware platforms.)
Even though it is unlikely that you will encounter a machine where this causes problems, you can also avoid the issue relatively easily. If you really are talking about arrays, as the question title indicates, and these arrays have a length known at compile time (that is, they are not VLAs), then just initializing them is probably even more convenient:
double A[133] = { 0 };
should always work. If you have to zero such an array again later, and your compiler conforms to modern C (C99), you can do this with a compound literal:
memcpy(A, (double const[133]){ 0 }, 133*sizeof(double));
On any modern compiler this should be as efficient as memset, but it has the advantage of not relying on a particular encoding of double.
As Matteo Italia says, that’s legal according to the standard, but I wouldn’t use it. Something like
double *p = V, *last = V + N;  /* N is the element count */
while (p != last) *p++ = 0.0;
is at least twice as fast.
It’s “legal” to use memset. The issue is whether it produces a bit pattern where array[x] == 0.0 is true. While the basic C standard doesn’t require that to be true, I’d be interested in hearing examples where it isn’t!
It appears that setting to zero via memset is equivalent to assigning 0.0 on IBM-AIX, HP-UX (PARISC), HP-UX (IA-64), Linux (IA-64, I think).
Here is some trivial test code:
#include <stdio.h>
#include <string.h>

int main(void) {
    double dFloat1 = 0.0;
    double dFloat2 = 111111.1111111;
    /* Zero out the second double byte by byte. */
    memset(&dFloat2, 0, sizeof(dFloat2));
    if (dFloat1 == dFloat2) {
        fprintf(stdout, "memset appears to be equivalent to = 0.0\n");
    } else {
        fprintf(stdout, "memset is NOT equivalent to = 0.0\n");
    }
    return 0;
}
Well, I think the zeroing is "legal" (after all, it's zeroing a regular buffer), but I have no idea whether the standard lets you assume anything about the resulting logical value. My guess would be that the C standard leaves it undefined.
Related
I am writing code that depends on halfway ties rounding to even in C (specifically C11). When using rint with the rounding mode set to FE_TONEAREST, I have not found a guarantee in the C standard that states how ties are handled with FE_TONEAREST.
Page 509 of the ISO C standard states that
The fegetround and fesetround functions in <fenv.h> provide the facility
to select among the IEC 60559 directed rounding modes represented by the rounding
direction macros in <fenv.h> (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD,
FE_TOWARDZERO) and the values 0, 1, 2, and 3 of FLT_ROUNDS are the
IEC 60559 directed rounding modes.
However, I cannot find any documentation of the rounding modes in the IEC 60559 standard. On my test machine the behavior is that, in FE_TONEAREST, ties are rounded to even, but I want to be sure that this is enforced by the C11 standard and is not implementation-defined.
C11 Annex F says, in §F.1:
The IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for microprocessor systems, second edition (IEC 60559:1989) [...]
and then later, in §F.3, paragraph 1 (as you already quoted in the question):
The fegetround and fesetround functions in <fenv.h> provide the facility to select among the IEC 60559 directed rounding modes represented by the rounding direction macros in <fenv.h> (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO) and the values 0, 1, 2, and 3 of FLT_ROUNDS are the IEC 60559 directed rounding modes.
(Note: to be precise, I'm looking at the publicly available N1570 final draft for the C11 standard, but my understanding is that it's essentially identical to the final standard.)
So the reference to IEC 60559 here is actually a reference to the (now twice superseded) IEC 60559:1989 standard. I don't have access to that precise standard, but I do have a copy of IEEE 754-1985, and I believe that the content of those two standards (IEC 60559:1989 and IEEE 754-1985) is supposed to be essentially identical, though I observe that there are at least differences in capitalisation in the tables of contents of the respective standards. (Thanks to Michael Burr for confirming in a comment that the standards are identical in substance, if not word-for-word identical.)
IEEE 754-1985, in section 4, defines four rounding modes, which it terms "round to nearest", "round toward +∞", "round toward -∞", and "round toward zero". The latter three are described as "directed rounding modes". For "round to nearest" we have in §4.1 the text:
if the two nearest representable values are equally near, the one with its least significant bit zero shall be delivered
In other words, it's describing round-ties-to-even. (Later versions of the IEEE 754 standard introduce the names "roundTiesToEven", "roundTowardPositive", "roundTowardNegative" and "roundTowardZero" for the above rounding modes (now termed "attributes" rather than "modes", I believe because "mode" suggests some kind of persistent environmental setting), and define a fifth rounding attribute "roundTiesToAway". But C11 is explicit that it's referring to this earlier version of the standard.)
Now since C11 doesn't use the exact same terms as IEEE 754-1985, it's left to us to infer that the four rounding modes above correspond to "FE_TONEAREST", "FE_UPWARD", "FE_DOWNWARD" and "FE_TOWARDZERO", in that order, but there doesn't seem to be any reason to doubt that that's the intended matching. So assuming __STDC_IEC_559__ is defined, FE_TONEAREST should indeed correspond to "roundTiesToEven". Nate Eldredge's comment about C2x further reinforces that this is the intended matching.
So all in all, it's clear (to me at least) that the intent is that when __STDC_IEC_559__ is defined, the rounding mode FE_TONEAREST should correspond to "round to nearest", named in later versions of the IEEE 754 standard as "roundTiesToEven". The degree to which implementations of C honour that intent is, of course, a separate question (but I'd expect the vast majority of them to do so).
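To see round-ties-to-even in action, here is a minimal sketch (assuming an implementation that honours Annex F; the #pragma may be unsupported or unnecessary on some compilers):

#include <fenv.h>
#include <math.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void) {
    fesetround(FE_TONEAREST);
    /* 0.5, 1.5 and 2.5 are exactly representable halfway cases. */
    printf("rint(0.5) = %g\n", rint(0.5));  /* 0: ties to even */
    printf("rint(1.5) = %g\n", rint(1.5));  /* 2: ties to even */
    printf("rint(2.5) = %g\n", rint(2.5));  /* 2: ties to even, not 3 */
    return 0;
}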
If a C compiler's floating point is not based on IEEE 754, is such a compiler still C standard compliant?
If the implementation says it conforms to IEEE 754/IEC 60559 (__STDC_IEC_559__ is defined), it must do so.
But the C standard does not require that. C11 footnote 356:
Implementations that do not define __STDC_IEC_559__ are not required to conform to these specifications.
C does not require, for example, binary floating point; the minimal requirements include that any floating-point type must have a range of at least 1e-37 to 1e+37.
As far as the C standard is concerned, the representation used for floating point numbers is unspecified.
Section 6.2.6 covers Representation of Types. In particular, 6.2.6.1p1 states:
The representations of all types are unspecified except as stated in this subclause.
And nowhere in section 6.2.6 are floating point types mentioned. So an implementation may use any representation it chooses, and it is not required to document that decision.
I'm looking for a way to detect whether a C compiler uses the IEEE-754 floating point representation at compile time, preferably in the preprocessor, but a constant expression is fine too.
Note that the __STDC_IEC_559__ macro does not fit this purpose, as an implementation may use the correct representation while not fully supporting Annex F.
Not an absolute 100% solution, but this will get you practically close.
Check if the characteristics of floating type double match binary64:
#include <float.h>
#define BINARY64_LIKE ( \
    (FLT_RADIX == 2) && \
    (DBL_MANT_DIG == 53) && \
    (DBL_DECIMAL_DIG == 17) && \
    (DBL_DIG == 15) && \
    (DBL_MIN_EXP == -1021) && \
    (DBL_HAS_SUBNORM == 1) && \
    (DBL_MIN_10_EXP == -307) && \
    (DBL_MAX_EXP == +1024) && \
    (DBL_MAX_10_EXP == +308))
BINARY64_LIKE is usable at compile time. It needs additional work, though, for older compilers that do not define all of these macros (DBL_HAS_SUBNORM, for example, only exists since C11).
Likewise for float.
Since C11, code could use _Static_assert() to detect some attributes.
#include <limits.h>
_Static_assert(sizeof(double)*CHAR_BIT == 64, "double unexpected size");
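Combining the two ideas, a hedged usage sketch (this uses the BINARY64_LIKE macro from above, and simply fails the build if double does not look like binary64):

_Static_assert(BINARY64_LIKE, "double does not appear to be IEC 60559 binary64");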
See also Are there any commonly used floating point formats besides IEEE754?.
The last non-IEEE 754 FP format I used was CCSI, about 5 years ago.
Caution: it is unclear why the OP wants this test. If code is doing bit manipulation of a floating-point value, then even with __STDC_IEC_559__ defined there remains at least one hole: the endianness of floating point and integers may differ. That is uncommon, but such systems are out there.
Other potential holes: support for -0.0, the sign of NaN, the encoding of infinity, signalling NaN, quiet NaN, NaN payloads: the usual suspects.
As of July 2020, this would still be compiler-specific... though C2x intends to change that with the __STDC_IEC_60559_BFP__ macro; see Annex F, section F.2.
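A hedged sketch of the kind of compile-time check this enables (the C2x macro name is taken from the draft; whether your compiler defines either macro is implementation-specific):

#if defined(__STDC_IEC_60559_BFP__) || defined(__STDC_IEC_559__)
/* The implementation declares IEC 60559 binary floating-point support. */
#else
#error "IEC 60559 binary floating-point support not declared by this implementation"
#endif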
It might be noted that:
The compiler usually doesn't choose the binary representation. The compiler usually follows the target system's architecture (the instruction set design of the CPU / GPU, etc.).
The use of non-conforming binary representations for floating-point is pretty much a thing of the past. If you're using a modern (or even a moderately modern) system from the past 10 years, you are almost certainly using a conforming binary representation.
In C you can test whether a double is NaN using isnan(x). However, many places online, including for example this SO answer, say that you can simply use x != x instead.
Is x != x specified anywhere in the C standard as a method that is guaranteed to test whether x is NaN? I can't find it myself, and I would like my code to work with different compilers.
NaN as the only value x with the property x!=x is an IEEE 754 guarantee. Whether it is a faithful test to recognize NaN in C boils down to how closely the representation of variables and the operations are mapped to IEEE 754 formats and operations in the compiler(s) you intend to use.
You should in particular worry about “excess precision” and the way compilers deal with it. Excess precision is what happens when the FPU only conveniently supports computations in a wider format than the compiler would like to use for float and double types. In this case computations can be made at the wider precision, and rounded to the type's precision when the compiler feels like it in an unpredictable way.
The C99 standard defined a way to handle this excess precision that preserved the property that only NaN was different from itself, but for a long time after 1999 (and even nowadays, when a compiler's authors do not care), in the presence of excess precision, x != x could be true for any variable x that contains the finite result of a computation, if the compiler chooses to round the excess-precision result of the computation in between the evaluation of the first x and the second x.
This report describes the dark times of compilers that made no effort to implement C99 (either because it wasn't 1999 yet or because they didn't care enough).
This 2008 post describes how GCC started to implement the C99 standard for excess precision in 2008. Before that, GCC could provide one with all the surprises described in the aforementioned report.
Of course, if the target platform does not implement IEEE 754 at all, a NaN value may not even exist, or exist and have different properties than specified by IEEE 754. The common cases are a compiler that implements IEEE 754 quite faithfully with FLT_EVAL_METHOD set to 0, 1 or 2 (all of which guarantee that x != x iff x is NaN), or a compiler with a non-standard implementation of excess precision, where x != x is not a reliable test for NaN.
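For illustration, here is a minimal sketch comparing the two tests (assuming an IEEE 754 implementation with a sane FLT_EVAL_METHOD; nan() requires C99):

#include <math.h>
#include <stdio.h>

int main(void) {
    double x = nan("");  /* a quiet NaN */
    double y = 1.0;
    printf("isnan(x): %d, x != x: %d\n", isnan(x) != 0, x != x);  /* 1, 1 */
    printf("isnan(y): %d, y != y: %d\n", isnan(y) != 0, y != y);  /* 0, 0 */
    return 0;
}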
Please refer to the normative section Annex F: IEC 60559 floating-point arithmetic of the C standard:
F.1 Introduction
An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex.
Implementations that do not define __STDC_IEC_559__ are not required to conform to these specifications.
F.9.3 Relational operators
The expression x ≠ x is true if x is a NaN.
The expression x = x is false if x is a NaN.
F.3 Operators and functions
The isnan macro in <math.h> provides the isnan function recommended in the Appendix to IEC 60559.
For this rule you have to go to ISO/IEC 9899:1990 Appendix G and study each case of implementation-defined behavior in order to document them.
It's a difficult task to determine which manual checks need to be done in the code.
Is there some kind of list of the manual checks this rule requires?
MISRA-C is primarily concerned with avoiding unpredictable behavior in the C language, those "traps and pitfalls" (such as undefined and unspecified behavior) all C developers should be aware of that a compiler will not always warn you about. This includes implementation-defined behavior, where the C standard allows the behavior of certain constructs to vary between implementations. These tend to be less critical from a safety point of view, provided the compiler documentation describes its intended behavior as required by the standard.
That is, for each specific compiler the behavior is well-defined, but the concern is to assure the developers have verified this, including documenting language extensions, known bugs in the compiler (and build chain) and workarounds.
Although it is possible to check C code fully for MISRA-C compliance by hand, it is not recommended. The guidelines were developed with static analysis tools in mind. Not all guidelines can be fully checked by tools, but the better MISRA-C tools (be careful in your evaluations, there are not many "good" ones) will at least assist by automatically identifying where code relies on implementation-specific behavior. This includes all the checks required by Rule 3.1; where implementation-defined behavior cannot be completely checked by a tool, a manual review will be required.
Also, if you are starting a new MISRA-C project, I highly recommend referring to MISRA-C:2012, even if you are required to be MISRA-C:2004 compliant. Having MISRA-C:2012 around helps, because it has clarified many of the guidelines, including additional rationale, explanations and examples. The standard (which can be obtained at misra-c.com) lists the C90 and C99 implementation-defined behaviors that are considered to have the potential to cause unintended behavior. This may or may not overlap with the guidelines that address implementation-defined behaviors MISRA-C is specifically concerned about.
First of all, the standard's definition of implementation-defined behavior is, roughly, behavior that the compiler must document. So you can always refer to the compiler documentation whenever there is a need to document how a certain implementation-defined behavior is implemented.
What's left to you to do then is to document where the code relies on implementation-defined behavior. This is preferably done in source code comments.
Off the top of my head, here are the most important things to look for in the code. The list does not include cases that are already covered by other MISRA rules (for example, the signedness of char). A sketch of how such reliance might be documented in the source follows the list.
The size of all the integer types. The size of int being most important, as it determines which type that is given to integer literals, C "boolean" expressions, implicitly promoted integers etc etc.
Obscure integer formats that aren't standard two's complement.
Any reliance on endianness.
The enum type format.
The floating point format.
Pointer to integer conversions, in case they are obscure on the given system.
Behavior of function inlining and the register keyword, if these are used.
Alignment issues including struct padding. Reliance on the size of a struct/union.
#include paths, in case they are obscure. Particularly if they are absolute and not relative.
Bitwise operators mixed with signed types (in most cases this is a bug or design mistake).
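As promised above, here is a hedged sketch of documenting such reliance directly in the source (the specific properties asserted are assumptions chosen for the example, not requirements from MISRA-C):

#include <limits.h>

/*
 * Implementation-defined behavior this module relies on (see compiler manual):
 *  - int is 32 bits wide, two's complement
 *  - right shift of a negative signed int is arithmetic
 */
_Static_assert(sizeof(int) * CHAR_BIT == 32, "expected 32-bit int");
_Static_assert(INT_MIN == -INT_MAX - 1, "expected two's complement");
_Static_assert((-1 >> 1) == -1, "expected arithmetic right shift");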