Is C floating-point non-deterministic? - c

I have read somewhere that there is a source of non-determinism in C double-precision floating point as follows:
The C standard says that 64-bit floats (doubles) are required to produce only about 64-bit accuracy.
Hardware may do floating point in 80-bit registers.
Because of (1), the C compiler is not required to clear the low-order bits of floating-point registers before stuffing a double into the high-order bits.
This means YMMV, i.e. small differences in results can happen.
Is there any now-common combination of hardware and software where this really happens? I see in other threads that .net has this problem, but is C doubles via gcc OK? (e.g. I am testing for convergence of successive approximations based on exact equality)

The behavior on implementations with excess precision, which seems to be the issue you're concerned about, is specified strictly by the standard in most if not all cases. Combined with IEEE 754 (assuming your C implementation follows Annex F) this does not leave room for the kinds of non-determinism you seem to be asking about. In particular, things like x == x (which Mehrdad mentioned in a comment) failing are forbidden since there are rules for when excess precision is kept in an expression and when it is discarded. Explicit casts and assignment to an object are among the operations that drop excess precision and ensure that you're working with the nominal type.
Note however that there are still a lot of broken compilers out there that don't conform to the standards. GCC intentionally disregards them unless you use -std=c99 or -std=c11 (i.e. the "gnu99" and "gnu11" options are intentionally broken in this regard). And prior to GCC 4.5, correct handling of excess precision was not even supported.

This may happen on Intel x86 code that uses the x87 floating-point unit (except probably 3., which seems bogus. LSB bits will be set to zero.). So the hardware platform is very common, but on the software side use of x87 is dying out in favor of SSE.
Basically whether a number is represented in 80 or 64 bits is at the whim of the compiler and may change at any point in the code. With for example the consequence that a number which just tested non-zero is now zero. m)
See "The pitfalls of verifying floating-point computations", page 8ff.

Testing for exact convergence (or equality) in floating point is usually a bad idea, even with in a totally deterministic environment. FP is an approximate representation to begin with. It is much safer to test for convergence to within a specified epsilon.

Related

Is it safe to assume floating point is represented using IEEE754 floats in C?

Floating point is implementation defined in the C. So there isn't any guarantees.
Our code needs to be portable, we are discussing whether or not acceptable to use IEEE754 floats in our protocol. For performance reasons it would be nice if we don't have to convert back and forth between a fixed point format when sending or receiving data.
While I know that there can be differences between platforms and architectures regarding the size of long or wchar_t. But I can't seem to find any specific about the float and double.
What I found so far that the byte order maybe reversed on big endian platforms. While there are platforms without floating point support where a code containing float and double wouldn't even link. Otherwise platforms seem to stick to IEEE754 single and double precision.
So is it safe to assume that floating point is in IEEE754 when available?
EDIT: In response to a comment:
What is your definition of "safe"?
By safe I mean, the bit pattern on one system means the same on the another (after the byte rotation to deal with endianness).
Essentially all architectures in current non-punch-card use, including embedded architectures and exotic signal processing architectures, offer one of two floating point systems:
IEEE-754.
IEEE-754 except for blah. That is, they mostly implement 754, but cheap out on some of the more expensive and/or fiddly bits.
The most common cheap-outs:
Flushing denormals to zero. This invalidates certain sometimes-useful theorems (in particular, the theorem that a-b can be exactly represented if a and b are within a factor of 2), but in practice it's generally not going to be an issue.
Failure to recognize inf and NaN as special. These architectures will fail to follow the rules regarding inf and NaN as operands, and may not saturate to inf, instead producing numbers that are larger than FLT_MAX, which will generally be recognized by other architectures as NaN.
Proper rounding of division and square root. It's a whole lot easier to guarantee that the result is within 1-3 ulps of the exact result than within 1/2 ulp. A particularly common case is for division to be implemented as reciprocal+multiplication, which loses you one bit of precision.
Fewer or no guard digits. This is an unusual cheap-out, but means that other operations can be 1-2 ulps off.
BUUUUT... even those except for blah architectures still use IEEE-754's representation of numbers. Other than byte ordering issues, the bits describing a float or double on architecture A are essentially guaranteed to have the same meaning on architecture B.
So as long as all you care about is the representation of values, you're totally fine. If you care about cross-platform consistency of operations, you may need to do some extra work.
EDIT: As Chux mentions in the comments, a common extra source of inconsistency between platforms is the use of extended precision, such as the x87's 80-bit internal representation. That's the opposite of a cheap-out, and (with proper treatment) fully conforms to both IEEE-754 and the C standard, but it will likewise cause results to differ between architectures, and even between compiler versions and following apparently minor and unrelated code changes. However: a particular x86/x64 executable will NOT produce different results on different processors due to extended precision.
There is a macro to check (since C99):
C11 §6.10.8.3 Conditional feature macros
__STDC_IEC_559__ The integer constant 1, intended to indicate conformance to the specifications in annex F (IEC 60559 floating-point arithmetic).
IEC 60559 (short for ISO/IEC/IEEE 60559) is another name for IEEE-754.
Annex F then establishes the mapping between C floating types and IEEE-754 types:
The C floating types match the IEC 60559 formats as follows:
The float type matches the IEC 60559 single format.
The double type matches the IEC 60559 double format.
The long double type matches an IEC 60559 extended format, 357) else a
non-IEC 60559 extended format, else the IEC 60559 double format.
I suggest you need to look more carefully at your definition of portable.
I would also suggest your definition of "safe" is insufficient. Even if the binary representation (allowing for endianness) is okay, the operations on variables may behave differently. After all, there are few applications of floating point that don't involve operations on variables.
If you want to support all host architectures that have ever been created then assuming IEEE floating point format is inherently unsafe. You will have to deal with systems that support different formats, systems that don't support floating point at all, systems for which compilers have switches to select floating point behaviours (with some behaviours being associated with non-IEEE formats), CPUs that have an optional co-processor (so floating point support depends on whether an additional chip is installed, but otherwise variants of the CPU are identical), systems that emulate floating point operations in software (some such software emulators are configurable at run time), and systems with buggy or incomplete implementation of floating point (which may or may not be IEEE based).
If you are willing to limit yourself to hardware of post 2000 vintage, then your risk is lower but non-zero. Virtually all CPUs of that vintage support IEEE in some form. However you still (as with older CPUs too) need to consider what floating point operations you wish to have supported, and the trade-offs you are willing to accept to have them. Different CPUs (or software emulation) have less complete implementation of floating point than others, and some are configured by default to not support some features - so it is necessary to change settings to enable some features, which can impact on performance or correctness of your code.
If you need to share floating point values between applications (which may be on different hosts with different features, built with different compilers, etc) then you will need to define a protocol. That protocol might involve IEEE format, but all your applications will need to be able to handle conversion between the protocol and their native representations.
Almost all common architectures now use IEEE-754, this is not required by the standard. There used to be old non IEE-754 architectures, and some could still be around.
If the only requirement is for exchange of network data, my advice is:
if __STDC_IEC_559__ is defined, only use network order for the bytes and assume you do have standard IEE-754 for float and double.
if __STDC_IEC_559__ is not defined, use a special interchange format, that could be IEE-754 - one single protocol - or anything else - need a protocol indication.
Like others have mentioned, there's the __STDC_IEC_559__ macro, but it isn't very useful because it's only set by compilers that completely implement the respective annex in the C standard. There are compilers that implement only a subset but still have (mostly) usable IEEE floating point support.
If you're only concerned with the binary representation, you should write a feature test that checks the bit patterns of certain floating numbers. Something like:
#include <stdint.h>
#include <stdio.h>
typedef union {
double d;
uint64_t i;
} double_bits;
int main() {
double_bits b;
b.d = 2.5;
if (b.i != UINT64_C(0x4004000000000000)) {
fprintf(stderr, "Not an IEEE-754 double\n");
return 1;
}
return 0;
}
Check a couple of numbers with different exponents, mantissae, and signs, and you should be on the safe side. Since these tests aren't expensive, you could even run them once at runtime.
Strictly speaking, it's not safe to assume floating-point support; generally speaking, the vast majority of platforms will support it. Notable exceptions include (now deprecated) VMS systems running on Alpha chips
If you have the luxury of runtime checking, consider paranoia, a floating-point vetting tool written by William Kahan.
Edit: sounds like your application is more concerned with binary formats as they pertain to storage and/or serialization. I would suggest narrowing your scope to choosing a third-party library that supports this. You could do worse than Google Protocol Buffers.

How floating point conversion was handled before the invention of FPU and SSE?

I am trying to understand how floating point conversion is handled at the low level. So based on my understanding, this is implemented in hardware. So, for example, SSE provides the instruction cvttss2si which converts a float to an int.
But my question is: was the floating point conversion always handled this way? What about before the invention of FPU and SSE, was the calculation done manually using Assembly code?
It depends on the processor, and there have been a huge number of different processors over the years.
FPU stands for "floating-point unit". It's a more or less generic term that can refer to a floating-point hardware unit for any computer system. Some systems might have floating-point operations built into the CPU. Others might have a separate chip. Yet others might not have hardware floating-point support at all. If you specify a floating-point conversion in your code, the compiler will generate whatever CPU instructions are needed to perform the necessary computation. On some systems, that might be a call to a subroutine that does whatever bit manipulations are needed.
SSE stands for "Streaming SIMD Extensions", and is specific to the x86 family of CPUs. For non-x86 CPUs, there's no "before" or "after" SSE; SSE simply doesn't apply.
The conversion from floating-point to integer is considered a basic enough operation that the 387 instruction set already had such an instruction, FIST—although not useful for compiling the (int)f construct of C programs, as that instruction used the current rounding mode.
Some RISC instruction sets have always considered that a dedicated conversion instruction from floating-point to integer was an unnecessary luxury, and that this could be done with several instructions accessing the IEEE 754 floating-point representation. One basic scheme might look like this blog post, although the blog post is about rounding a float to a
float representing the nearest integer.
Prior to the standardization of IEEE 754 arithmetic, there were many competing vendor-specific ways of doing floating-point arithmetic. These had different ranges, precision, and different behavior with respect to overflow, underflow, signed zeroes, and undefined results such as 0/0 or sqrt(-1).
However, you can divide floating point implementations into two basic groups: hardware and software. In hardware, you would typically see an opcode which performs the conversion, although coprocessor FPUs can complicate things. In software, the conversion would be done by a function.
Today, there are still soft FPUs around, mostly on embedded systems. Not too long ago, this was common for mobile devices, but soft FPUs are still the norm on smaller systems.
Indeed, the floating point operations are a challenge for hardware engineers, as they require much hardware (leading to higher costs of the final product) and consume much power. There are some architectures that do not contain a floating point unit. There are also architectures that do not provide instructions even for basic operations like integer division. The ARM architecture is an example of this, where you have to implement division in software. Also, the floating point unit comes as an optional coprocessor in this architecture. It is worth thinking about this, considering the fact that ARM is the main architecture used in embedded systems.
IEEE 754 (the floating point standard used today in most of the applications) is not the only way of representing real numbers. You can also represent them using a fixed point format. For example, if you have a 32 bit machine, you can assume you have a decimal point between bit 15 and 16 and perform operations keeping this in mind. This is a simple way of representing floating numbers and it can be handled in software easily.
It depends on the implementation of the compiler. You can implement floating point math in just about any language (an example in C: http://www.jhauser.us/arithmetic/SoftFloat.html), and so usually the compiler's runtime library will include a software implementation of things like floating point math (or possibly the target hardware has always supported native instructions for this - again, depends on the hardware) and instructions which target the FPU or use SSE are offered as an optimization.
Before Floating Point Units doesn't really apply, since some of the earliest computers made back in the 1940's supported floating point numbers: wiki - first electro mechanical computers.
On processors without floating point hardware, the floating point operations are implemented in software, or on some computers, in microcode as opposed to being fully hardware implemented: wiki - microcode , or the operations could be handled by separate hardware components such as the Intel x87 series: wiki - x87 .
But my question is: was the floating point conversion always handled this way?
No, there's no x87 or SSE on architectures other than x86 so no cvttss2si either
Everything you can do with software, you can also do in hardware and vice versa.
The same to float conversion. If you don't have the hardware support, just do some bit hacking. There's nothing low level here so you can do it in C or any other languages easily. There is already a lot of solutions on SO
Converting Int to Float/Float to Int using Bitwise
Casting float to int (bitwise) in C
Converting float to an int (float2int) using only bitwise manipulation
...
Yes. The exponent was changed to 0 by shifting the mantissa, denormalizing the number. If the result was too large for an int an exception was generated. Otherwise the denormalized number (minus the factional part and optionally rounded) is the integer equivalent.

Does any floating point-intensive code produce bit-exact results in any x86-based architecture?

I would like to know if any code in C or C++ using floating point arithmetic would produce bit exact results in any x86 based architecture, regardless of the complexity of the code.
To my knowledge, any x86 architecture since the Intel 8087 uses a FPU unit prepared to handle IEEE-754 floating point numbers, and I cannot see any reason why the result would be different in different architectures. However, if they were different (namely due to different compiler or different optimization level), would there be some way to produce bit-exact results by just configuring the compiler?
Table of contents:
C/C++
asm
Creating real-life software that achieves this.
In C or C++:
No, a fully ISO C11 and IEEE-conforming C implementation does not guarantee bit-identical results to other C implementations, even other implementations on the same hardware.
(And first of all, I'm going to assume we're talking about normal C implementations where double is the IEEE-754 binary64 format, etc., even though it would be legal for a C implementation on x86 to use some other format for double and implement FP math with software emulation, and define the limits in float.h. That might have been plausible when not all x86 CPUs included with an FPU, but in 2016 that's Deathstation 9000 territory.)
related: Bruce Dawson's Floating-Point Determinism blog post is an answer to this question. His opening paragraph is amusing (and is followed by a lot of interesting stuff):
Is IEEE floating-point math deterministic? Will you always get the same results from the same inputs? The answer is an unequivocal “yes”. Unfortunately the answer is also an unequivocal “no”. I’m afraid you will need to clarify your question.
If you're pondering this question, then you will definitely want to have a look at the index to Bruce's series of articles about floating point math, as implemented by C compilers on x86, and also asm, and IEEE FP in general.
First problem: Only "basic operations": + - * / and sqrt are required to return "correctly rounded" results, i.e. <= 0.5ulp of error, correctly-rounded out to the last bit of the mantissa, so the results is the closest representable value to the exact result.
Other math library functions like pow(), log(), and sin() allow implementers to make a tradeoff between speed and accuracy. For example, glibc generally favours accuracy, and is slower than Apple's OS X math libraries for some functions, IIRC. See also glibc's documentation of the error bounds for every libm function across different architectures.
But wait, it gets worse. Even code that only uses the correctly-rounded basic operations doesn't guarantee the same results.
C rules also allow some flexibility in keeping higher precision temporaries. The implementation defines FLT_EVAL_METHOD so code can detect how it works, but you don't get a choice if you don't like what the implementation does. You do get a choice (with #pragma STDC FP_CONTRACT off) to forbid the compiler from e.g. turning a*b + c into an FMA with no rounding of the a*b temporary before the add.
On x86, compilers targeting 32-bit non-SSE code (i.e. using obsolete x87 instructions) typically keep FP temporaries in x87 registers between operations. This produces the FLT_EVAL_METHOD = 2 behaviour of 80-bit precision. (The standard specifies that rounding still happens on every assignment, but real compilers like gcc don't actually do extra store/reloads for rounding unless you use -ffloat-store. See https://gcc.gnu.org/wiki/FloatingPointMath. That part of the standard seems to have been written assuming non-optimizing compilers, or hardware that efficiently provides rounding to the type width like non-x86, or like x87 with precision set to round to 64-bit double instead of 80-bit long double. Storing after every statement is exactly what gcc -O0 and most other compilers do, and the standard allows extra precision within evaluation of one expression.)
So when targeting x87, the compiler is allowed to evaluate the sum of three floats with two x87 FADD instructions, without rounding off the sum of the first two to a 32-bit float. In that case, the temporary has 80-bit precision... Or does it? Not always, because the C implementation's startup code (or a Direct3D library!!!) may have changed the precision setting in the x87 control word, so values in x87 registers are rounded to 53 or 24 bit mantissa. (This makes FDIV and FSQRT run a bit faster.) All of this from Bruce Dawson's article about intermediate FP precision).
In assembly:
With rounding mode and precision set the same, I think every x86 CPU should give bit-identical results for the same inputs, even for complex x87 instructions like FSIN.
Intel's manuals don't define exactly what those results are for every case, but I think Intel aims for bit-exact backwards compatibility. I doubt they'll ever add extended-precision range-reduction for FSIN, for example. It uses the 80-bit pi constant you get with fldpi (correctly-rounded 64-bit mantissa, actually 66-bit because the next 2 bits of the exact value are zero). Intel's documentation of the worst-case-error was off by a factor of 1.3 quintillion until they updated it after Bruce Dawson noticed how bad the worst-case actually was. But this can only be fixed with extended-precision range reduction, so it wouldn't be cheap in hardware.
I don't know if AMD implements their FSIN and other micro-coded instructions to always give bit-identical results to Intel, but I wouldn't be surprised. Some software does rely on it, I think.
Since SSE only provides instructions for add/sub/mul/div/sqrt, there's nothing too interesting to say. They implement the IEEE operation exactly, so there's no chance that any x86 implementation will ever give you anything different (unless the rounding mode is set differently, or denormals-are-zero and/or flush-to-zero are different and you have any denormals).
SSE rsqrt (fast approximate reciprocal square root) is not exactly specified, and I think it's possible you might get a different result even after a Newton iteration, but other than that SSE/SSE2 is always bit exact in asm, assuming the MXCSR isn't set weird. So the only question is getting the compiler go generate the same code, or just using the same binaries.
In real life:
So, if you statically link a libm that uses SSE/SSE2 and distribute those binaries, they will run the same everywhere. Unless that library uses run-time CPU detection to choose alternate implementations...
As #Yan Zhou points out, you pretty much need to control every bit of the implementation down to the asm to get bit-exact results.
However, some games really do depend on this for multi-player, but often with detection/correction for clients that get out of sync. Instead of sending the entire game state over the network every frame, every client computes what happens next. If the game engine is carefully implemented to be deterministic, they stay in sync.
In the Spring RTS, clients checksum their gamestate to detect desync. I haven't played it for a while, but I do remember reading something at least 5 years ago about them trying to achieve sync by making sure all their x86 builds used SSE math, even the 32-bit builds.
One possible reason for some games not allowing multi-player between PC and non-x86 console systems is that the engine gives the same results on all PCs, but different results on the different-architecture console with a different compiler.
Further reading: GAFFER ON GAMES: Floating Point Determinism. Some techniques that real game engines use to get deterministic results. e.g. wrap sin/cos/tan in non-optimized function calls to force the compiler to leave them at single-precision.
If the compiler and architecture is compliant to IEEE standards, yes.
For instance, gcc is IEEE compliant if configured properly. If you use the -ffast-math flag, it will not be IEEE compliant.
See http://www.validlab.com/goldberg/paper.pdf page 25.
If you want to know exactly what exactness you can rely on when using a IEEE 754-1985 hardware/compiler pair, you need to purchase the standard paper on IEEE site. Unfortunately, this is not publicly available
Link

Fortran/C Interlanguage problems: results differ in the 14th digit

I have to use C and Fortran together to do some simulations. In their course I use the same memory in both programming language parts, by defining a pointer in C to access memory allocated by Fortran.
The datatype of the problematic variable is
real(kind=8)
for Fortran, and
double
for C. The results of the same calculations now differ in the respective programming languages, and I need to directly compare them and get a zero. All calculations are done only with the above accuracies. The difference is always in the 13-14th digit.
What would be a good way to resolve this? Any compiler-flags? Just cut-off after some digits?
Many thanks!
Floating point is not perfectly accurate. Ever. Even cos(x) == cos(y) can be false if x == y.
So when doing your comparisons, take this into account, and allow the values to differ by some small epsilon value.
This is a problem with the inaccuracy with floating point numbers - they will be inaccurate and a certain place. You usually compare them either by rounding them to a digit that you know will be in the accurate area, or by providing an epsilon of appropiate value (small enough to not impact further calculations, and big enough to take care of the inaccuracy while comparing).
One thing you might check is to be sure that the FPU control word is the same in both cases. If it is set to 53-bit precision in one case and 64-bit in the other, it would likely produce different results. You can use the instructions fstcw and fldcw to read and load the control word value. Nonetheless, as others have mentioned, you should not depend on the accuracy being identical even if you can make it work in one situation.
Perfect portability is very difficult to achieve in floating point operations. Changing the order of the machine instructions might change the rounding. One compiler might keep values in registers, while another copy it to memory, which can change the precision. Currently the Fortran and C languages allow a certain amount of latitude. The IEEE module of Fortran 2008, when implemented, will allow requiring more specific and therefore more portable floating point computations.
Since you are compiling for an x86 architecture, it's likely that one of the compilers is maintaining intermediate values in floating point registers, which are 80 bits as opposed to the 64 bits of a C double.
For GCC, you can supply the -ffloat-store option to inhibit this optimisation. You may also need to change the code to explicitly store some intermediate results in double variables. Some experimentation is likely in order.

Floating-point precision when moving from i386 to x86_64

I have an application that was developed for Linux x86 32 bits. There are lots of floating-point operations and a lot of tests depending on the results. Now we are porting it to x86_64, but the test results are different in this architecture. We don't want to keep a separate set of results for each architecture.
According to the article An Introduction to GCC - for the GNU compilers gcc and g++ the problem is that GCC in X86_64 assumes fpmath=sse while x86 assumes fpmath=387. The 387 FPU uses 80 bit internal precision for all operations and only convert the result to a given floating-point type (float, double or long double) while SSE uses the type of the operands to determine its internal precision.
I can force -mfpmath=387 when compiling my own code and all my operations work correctly, but whenever I call some library function (sin, cos, atan2, etc.) the results are wrong again. I assume it's because libm was compiled without the fpmath override.
I tried to build libm myself (glibc) using 387 emulation, but it caused a lot of crashes all around (don't know if I did something wrong).
Is there a way to force all code in a process to use the 387 emulation in x86_64? Or maybe some library that returns the same values as libm does on both architectures? Any suggestions?
Regarding the question of "Do you need the 80 bit precision", I have to say that this is not a problem for an individual operation. In this simple case the difference is really small and makes no difference. When compounding a lot of operations, though, the error propagates and the difference in the final result is not so small any more and makes a difference. So I guess I need the 80 bit precision.
I'd say you need to fix your tests. You're generally setting yourself up for disappointment if you assume floating point math to be accurate. Instead of testing for exact equality, test whether it's close enough to the expected result. What you've found isn't a bug, after all, so if your tests report errors, the tests are wrong. ;)
As you've found out, every library you rely on is going to assume SSE floating point, so unless you plan to compile everything manually, now and forever, just so you can set the FP mode to x87, you're better off dealing with the problem now, and just accepting that FP math is not 100% accurate, and will not in general yield the same result on two different platforms. (I believe AMD CPU's yield slightly different results in x87 math as well).
Do you absolutely need 80-bit precision? (If so, there obviously aren't many alternatives, other than to compile everything yourself to use 80-bit FP.)
Otherwise, adjust your tests to perform comparisons and equality tests within some small epsilon. If the difference is smaller than that epsilon, the values are considered equal.
80 bit precision is actually dangerous. The problem is that it is actually preserved as long as the variable is stored in the CPU register. Whenever it is forced out to RAM, it is truncated to the type precision. So you can have a variable actually change its value even though nothing happened to it in the code.
If you want long double precision, use long double for all of your floating point variables, rather than expecting float or double to have extra magic precision. This is really a no-brainer.
SSE floating point and 387 floating point use entirely different instructions, and so there's no way to convince SSE fp instructions to use the 387. Probably the best way to deal with this is resign your test suite to getting slightly different results, and not depend on results being the same to the last bit.

Resources