Comparing floating point numbers in C - c

I've got a double that prints as 0.000000 and I'm trying to compare it to 0.0f, unsuccessfully. Why is there a difference here? What's the most reliable way to determine if your double is zero?

To determine whether it's close enough to zero that it will print as 0.000000 to six decimal places, something like:
fabs(d) < 0.0000005
Dealing with small inaccuracies in floating-point calculations can get quite complicated in general, though.
If you want a better idea what value you've got, try printing with %g instead of %f.
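As a rough sketch of the two suggestions above (the helper names prints_as_zero and show_value are made up for illustration, and the threshold is the one from this answer, not a universal constant):

```c
#include <math.h>
#include <stdio.h>

/* Returns 1 when d is small enough that "%f" would show it as
   0.000000 (or -0.000000), i.e. it rounds to zero at six decimals. */
int prints_as_zero(double d)
{
    return fabs(d) < 0.0000005;
}

/* Prints the same value with "%f" and "%g". "%g" switches to exponent
   notation for small magnitudes, so the hidden digits become visible. */
void show_value(double d)
{
    printf("%%f: %f   %%g: %g\n", d, d);
}
```

A value such as 1e-9 passes prints_as_zero even though it is not equal to 0.0, which is exactly the situation described in the question.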

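You can test against a range, like -0.00001 <= x && x <= 0.00001 (in C the two comparisons must be joined with &&; a chained -0.00001 <= x <= 0.00001 does not do what it looks like).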

This is a fundamental problem with floating point arithmetic on modern computers. Floating point values are by nature imprecise, and cannot be reliably compared for equality. For example, the language ML explicitly disallows equality comparison on real types because it was considered too unsafe. See also the excellent (if a bit long and mathematically oriented) paper by David Goldberg on this topic.
Edit: tl;dr: you might be doing it wrong.

Also, one often overlooked feature of floating point numbers is the denormalized (subnormal) numbers.
Those are numbers which have the minimal exponent, yet whose significand doesn't fit in the normalized 0.5-1 range.
They are smaller in magnitude than FLT_MIN for float, and DBL_MIN for double.
A common mistake when using a threshold is to compare two values directly, or to use FLT_MIN/DBL_MIN as the limit.
For example, this leads to a surprising result (if you don't know about denormals):
bool areDifferent(float a, float b) {
if (a == b) return false; // Or also: if ((a - b) == FLT_MIN)
return true;
}
// What is the output of areDifferent(val, val + FLT_MIN * 0.5f) for a val at or near zero (e.g. 0.0f)?
// true, not false, even though only half the "minimum value" was added.
Denormals also usually imply a performance loss in computation.
Yet you cannot simply disable them, otherwise code like the following could still produce a DIVIDE BY ZERO floating point exception (if enabled):
float getInverse(float a, float b) {
if (a != b)
return 1.0f / (a-b); // With denormals disabled, a != b can be true while (a - b) is a denormal; it is then rounded to 0 and the division raises the exception
return FLT_MAX;
}
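To see a denormal directly, here is a small sketch using fpclassify from math.h. The helper names is_subnormal and half_flt_min are invented for this example, and it assumes the platform has not enabled flush-to-zero (which is the usual default):

```c
#include <float.h>
#include <math.h>

/* Returns 1 when x is a subnormal (denormalized) float:
   nonzero, but smaller in magnitude than FLT_MIN. */
int is_subnormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Half of the smallest normal float. With subnormals enabled
   this is a nonzero value below FLT_MIN. */
float half_flt_min(void)
{
    volatile float m = FLT_MIN;   /* volatile: stop compile-time folding */
    return m * 0.5f;
}
```

Under those assumptions half_flt_min() is neither zero nor at least FLT_MIN, so a threshold of FLT_MIN does not mean "smallest possible difference".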

Related

How to round 8.475 to 8.48 in C (rounding function that takes into account representation issues)? Reducing probability of issue

I am trying to round 8.475 to 8.48 (to two decimal places in C). The problem is that 8.475 internally is represented as 8.47499999999999964473:
double input_test =8.475;
printf("input tests: %.20f, %.20f \n", input_test, *&input_test);
gives:
input tests: 8.47499999999999964473, 8.47499999999999964473
So, if I had an ideal round function then it would round 8.475=8.4749999... to 8.47. So, the built-in round function is not appropriate for me. I see that the rounding problem arises in cases of "underflow" and therefore I am trying to use the following algorithm:
double MyRound2(double *value) {
    double ad;
    long long mzr;
    double resval;
    if (*value < 0.000000001)
        ad = -0.501;
    else
        ad = 0.501;
    mzr = (long long)(*value);      /* integer part */
    resval = *value - mzr;          /* fractional part */
    /* round the fraction to two places; 100.0 avoids integer division */
    resval = (long long)(resval * 100 + ad) / 100.0;
    return mzr + resval;
}
This solves the "underflow" issue and it works well for "overflow" issues as well. The problem is that there are valid values x.xxx99 for which this function incorrectly gives a bigger value (because of the 0.001 in 0.501). How can I solve this issue? How can I devise an algorithm that detects the floating point representation issue and rounds taking it into account? Maybe C already has such a clever rounding function? Maybe I can select a different value for the constant ad, such that the probability of such rounding errors goes to zero (I mostly work with money values with up to 4 decimal digits).
I have read all the popular articles about floating point representation and I know that there are tricky and unsolvable issues, but my client does not accept such an explanation, because the client can clearly demonstrate that Excel handles (reproduces, rounds and so on) floating point numbers without representation issues.
(The C and C++ standards are intentionally flexible when it comes to the specification of the double type; quite often it is IEEE754 64 bit type. So your observed result is platform-dependent).
You are observing one of the pitfalls of using floating point types.
Sadly there isn't an "out-of-the-box" fix for this. (Adding a small constant pre-rounding just pushes the problem to other numbers).
Moral of the story: don't use floating point types for money.
Use a special currency type instead, or work in "pence" using an integral type.
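A minimal sketch of the integral approach, assuming amounts are stored as a count of thousandths of the currency unit (so 8.475 is the integer 8475). The function name round_mills_to_cents is hypothetical, and rounding halves away from zero is just one possible policy:

```c
/* Rounds an amount given in thousandths of a currency unit to
   hundredths, rounding halves away from zero.
   Example: 8475 (meaning 8.475) -> 848 (meaning 8.48). */
long long round_mills_to_cents(long long mills)
{
    if (mills >= 0)
        return (mills + 5) / 10;
    return -((-mills + 5) / 10);
}
```

Because everything is an integer, 8475 is stored exactly and there is no 8.4749999... to worry about.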
By the way, Excel does use an IEEE754 double precision floating point for its number type, but it also has some clever tricks up its sleeve. Essentially it tracks the joke digits carefully and also is clever with its formatting. This is how it can evaluate 1/3 + 1/3 + 1/3 exactly. But even it will get money calculations wrong sometimes.
For financial calculations, it is better to work in base 10 to avoid representation issues when going to/from binary. In many countries, financial software is even legally required to do so. Here is one library for IEEE 754R Decimal Floating-Point Arithmetic (I have not tried it myself):
http://www.netlib.org/misc/intel/
Also note that working in decimal floating-point instead of a fixed-point representation allows clever algorithms like the Kahan summation algorithm to avoid accumulation of rounding errors. A noteworthy difference from normal floating point is that numbers with few significant digits are not normalized, so you can have e.g. both 1*10^2 and .1*10^3.
An implementation note is that one representation in the standard uses a binary significand, to allow software implementations using a standard binary ALU.
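For reference, a minimal sketch of Kahan summation, written here for ordinary binary doubles rather than a decimal type (the function names are made up for illustration). It assumes the compiler is not allowed to reassociate floating-point operations (no -ffast-math), otherwise the compensation can be optimised away:

```c
#include <stddef.h>

/* Plain left-to-right summation, for comparison. */
double naive_sum(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Kahan (compensated) summation: c carries the low-order part that
   was lost when the previous term was added to sum. */
double kahan_sum(const double *v, size_t n)
{
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = v[i] - c;   /* apply the stored correction */
        double t = sum + y;    /* low bits of y may be lost here */
        c = (t - sum) - y;     /* what was actually added, minus y */
        sum = t;
    }
    return sum;
}
```

With one large term followed by many tiny ones, the naive loop can drop every tiny term while the compensated loop keeps their combined contribution.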
How about this one: define some threshold. This threshold is the distance to the next multiple of 0.005 below which you assume that the distance could be due to imprecision. Detect whether the value is within that distance of such a multiple. Round as usual and at the end, if you detected that it was, add 0.01.
That said, this is only a workaround and somewhat of a code smell. If you don't need too much speed, go for some other type than float, such as your own type that works like
class myDecimal{ int digits; int exponent_of_ten; } with value = digits * 10^exponent_of_ten
I am not trying to argue that using floating point numbers to represent money is advisable - it is not! But sometimes you have no choice... We do kind of work with money (life insurance calculations) and are forced to use floating point numbers for everything, including values representing money.
Now there are quite some different rounding behaviours out there: round up, round down, round half up, round half down, round half even, maybe more. It looks like you were after round half up method.
Our round-half-up function - here translated from Java - looks like this:
#include <iostream>
#include <cmath>
#include <cfloat>
using namespace std;
int main()
{
double value = 8.47499999999999964473;
double result = value * pow(10, 2);
result = nextafter(result + (result > 0.0 ? 1e-8 : -1e-8), DBL_MAX);
double integral = floor(result);
double fraction = result - integral;
if (fraction >= 0.5) {
result = ceil(result);
} else {
result = integral;
}
result /= pow(10, 2);
cout << result << endl;
return 0;
}
where nextafter is a function returning the next representable floating point value after the given value, in the direction of its second argument. This code has been verified to work using C++11 (AFAIK nextafter is also available in Boost); the result written to standard output is 8.48.

How IF() function works while comparing float numbers? [duplicate]

int main()
{
float a = 0.7;
float b = 0.5;
if (a < 0.7)
{
if (b < 0.5) printf("2 are right");
else printf("1 is right");
}
else printf("0 are right");
}
I would have expected the output of this code to be 0 are right.
But to my dismay the output is 1 is right. Why?
int main()
{
float a = 0.7, b = 0.5; // These are FLOATS
if(a < .7) // This is a DOUBLE
{
if(b < .5) // This is a DOUBLE
printf("2 are right");
else
printf("1 is right");
}
else
printf("0 are right");
}
Floats get promoted to doubles during comparison, and since floats are less precise than doubles, 0.7 as float is not the same as 0.7 as double. In this case, 0.7 as float is less than 0.7 as double once it is promoted. And as Christian said, 0.5, being a power of 2, is always represented exactly, so the test works as expected: 0.5 < 0.5 is false.
So either:
Change float to double, or:
Change .7 and .5 to .7f and .5f,
and you will get the expected behavior.
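The three cases can be checked in isolation with a sketch like this (function names are invented for the example; it assumes IEEE 754 float and double, as on most current platforms):

```c
/* 0.7 stored as float, compared against the double constant 0.7.
   The float is converted to double for the comparison. */
int float_less_than_double_literal(void)
{
    float a = 0.7f;
    return a < 0.7;      /* the float nearest 0.7 lies below the double nearest 0.7 */
}

/* Same value compared against a float constant of the same type. */
int float_less_than_float_literal(void)
{
    float a = 0.7f;
    return a < 0.7f;     /* same type, same rounding: not less */
}

/* 0.5 is a power of two, so it is exact in both types. */
int half_less_than_double_literal(void)
{
    float b = 0.5f;
    return b < 0.5;
}
```

Only the first function returns 1, which matches the "1 is right" output from the question.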
The issue is that the constants you are comparing to are double, not float. Also, changing your constants to something that is exactly representable, such as a power of 2 (for example 0.25), will make it say 0 are right. For example,
int main(void)
{
float a=0.25,b=0.5;
if(a<.25)
{
if(b<.5)
printf("2 are right");
else
printf("1 is right");
}
else
printf("0 are right");
}
Output:
0 are right
This SO question on Most Effective Way for float and double comparison covers this topic.
Also, this article at cygnus on floating point number comparison gives us some tips:
The IEEE float and double formats were designed so that the numbers
are “lexicographically ordered”, which – in the words of IEEE
architect William Kahan means “if two floating-point numbers in the
same format are ordered ( say x < y ), then they are ordered the same
way when their bits are reinterpreted as Sign-Magnitude integers.”
This means that if we take two floats in memory, interpret their bit
pattern as integers, and compare them, we can tell which is larger,
without doing a floating point comparison. In the C/C++ language this
comparison looks like this:
if (*(int*)&f1 < *(int*)&f2)
This charming syntax means take the address of f1, treat it as an
integer pointer, and dereference it. All those pointer operations look
expensive, but they basically all cancel out and just mean ‘treat f1
as an integer’. Since we apply the same syntax to f2 the whole line
means ‘compare f1 and f2, using their in-memory representations
interpreted as integers instead of floats’.
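A caution on the quoted cast: reading a float through an int pointer breaks the aliasing rules of standard C, and the integer ordering only matches for non-negative values. A hedged sketch that copies the bits with memcpy instead (float_bits and less_by_bits are made-up names, and a 32-bit IEEE 754 float is assumed):

```c
#include <stdint.h>
#include <string.h>

/* Copies the bits of a float into a 32-bit integer. memcpy avoids the
   undefined behaviour of dereferencing a float through an int pointer. */
int32_t float_bits(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* For two non-negative, non-NaN floats, the integer order of the bit
   patterns matches the floating-point order. */
int less_by_bits(float f1, float f2)
{
    return float_bits(f1) < float_bits(f2);
}
```

Negative inputs would need extra handling, because the float format is sign-magnitude while int32_t is two's complement.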
It's due to rounding when the constant is stored as a float and later compared with a double.
Generally comparing equality with floats is a dangerous business (which is effectively what you're doing, as you're comparing right on the boundary of <). Remember that in decimal certain fractions (like 1/3) cannot be expressed exactly; the same can be said of binary:
0.5 (decimal) = 0.1 (binary), which will be the same in float or double.
0.7 (decimal) = 0.10110011001100... (binary) and so on forever; 0.7 cannot be exactly represented in binary, so you get rounding errors, and the stored value may be (very, very slightly) different between float and double.
Note that floats and doubles cut off the expansion after a different number of binary places, hence your inconsistent results.
Also, btw, you have an error in your logic for 0 are right. You don't check b when you output 0 are right. But the whole thing is a little mysterious as to what you are really trying to accomplish. Floating point comparisons between floats and doubles will have minute variations, so you should compare with a delta that is an 'acceptable' variation for your situation. I've always done this via inline functions that just perform the work (did it once with a macro, but that's too messy). Anyhow, yes, rounding issues abound with this type of example. Read the floating point stuff, and know that .7 is different than .7f, and that assigning .7 to a float will convert a double into a float, thus changing the exact value. But the programming assumption about b being wrong since you checked a jumped out at me, and I had to note that :)

sqrt, perfect squares and floating point errors

In the sqrt function of most languages (though here I'm mostly interested in C and Haskell), are there any guarantees that the square root of a perfect square will be returned exactly? For example, if I do sqrt(81.0) == 9.0, is that safe or is there a chance that sqrt will return 8.999999998 or 9.00000003?
If numerical precision is not guaranteed, what would be the preferred way to check that a number is a perfect square? Take the square root, get the floor and the ceiling and make sure they square back to the original number?
Thank you!
In IEEE 754 floating-point, if the double-precision value x is the square of a nonnegative representable number y (i.e. y*y == x and the computation of y*y does not involve any rounding, overflow, or underflow), then sqrt(x) will return y.
This is all because sqrt is required to be correctly-rounded by the IEEE 754 standard. That is, sqrt(x), for any x, will be the closest double to the actual square root of x. That sqrt works for perfect squares is a simple corollary of this fact.
If you want to check whether a floating-point number is a perfect square, here's the simplest code I can think of:
int issquare(double d) {
if (signbit(d)) return 0;
feclearexcept(FE_INEXACT);
double dd = sqrt(d);
asm volatile("" : "+x"(dd));
return !fetestexcept(FE_INEXACT);
}
I need the empty asm volatile block that depends on dd because otherwise your compiler might be clever and "optimise" away the calculation of dd.
I used a couple of weird functions from fenv.h, namely feclearexcept and fetestexcept. It's probably a good idea to look at their man pages.
Another strategy that you might be able to make work is to compute the square root, check whether it has set bits in the low 26 bits of the mantissa, and complain if it does. I try this approach below.
And I needed the signbit check because otherwise the function can return true for -0.0.
EDIT: Eric Postpischil suggested that hacking around with the mantissa might be better. Given that the above issquare doesn't work in another popular compiler, clang, I tend to agree. I think the following code works:
int _issquare2(double d) {
if (signbit(d)) return 0;
int foo;
double s = sqrt(d);
double a = frexp(s, &foo);
frexp(d, &foo);
if (foo & 1) {
return (a + 33554432.0) - 33554432.0 == a && s*s == d;
} else {
return (a + 67108864.0) - 67108864.0 == a;
}
}
Adding and subtracting 67108864.0 from a has the effect of wiping the low 26 bits of the mantissa. We will get a back exactly when those bits were clear in the first place.
According to this paper, which discusses proving the correctness of IEEE floating-point square root:
The IEEE-754 Standard for Binary Floating-Point
Arithmetic [1] requires that the result of a divide or square
root operation be calculated as if in infinite precision, and
then rounded to one of the two nearest floating-point
numbers of the specified precision that surround the
infinitely precise result
Since a perfect square that can be represented exactly in floating-point is an integer and its square root is an integer that can be precisely represented, the square root of a perfect square should always be exactly correct.
Of course, there's no guarantee that your code will execute with a conforming IEEE floating-point library.
@tmyklebu perfectly answered the question. As a complement, let's see a possibly less efficient alternative for testing perfect squares of fractions without the asm directive.
Let's suppose we have an IEEE 754 compliant sqrt which rounds the result correctly.
Let's suppose exceptional values (Inf/Nan) and zeros (+/-) are already handled.
Let's decompose sqrt(x) into I*2^m where I is an odd integer.
And where I spans n bits: 1+2^(n-1) <= I < 2^n.
If n > 1+floor(p/2) where p is floating point precision (e.g. p=53 and n>27 in double precision)
Then 2^(2n-2) < I^2 < 2^2n.
As I is odd, I^2 is odd too and thus spans over > p bits.
Thus I is not the exact square root of any representable floating point with this precision.
But given I^2<2^p, could we say that x was a perfect square?
The answer is obviously no. A taylor expansion would give
sqrt(I^2+e)=I*(1+e/2I - e^2/4I^2 + O(e^3/I^3))
Thus, for e=ulp(I^2) up to sqrt(ulp(I^2)) the square root is correctly rounded to rsqrt(I^2+e)=I... (round to nearest even or truncate or floor mode).
Thus we would have to assert that sqrt(x)*sqrt(x) == x.
But above test is not sufficient, for example, assuming IEEE 754 double precision, sqrt(1.0e200)*sqrt(1.0e200)=1.0e200, where 1.0e200 is exactly 99999999999999996973312221251036165947450327545502362648241750950346848435554075534196338404706251868027512415973882408182135734368278484639385041047239877871023591066789981811181813306167128854888448 whose first prime factor is 2^613, hardly a perfect square of any fraction...
So we can combine both tests:
#include <float.h>
#include <math.h>
#include <stdbool.h>
bool squared_significand_fits_in_precision(double x);
bool is_perfect_square(double x) {
    return sqrt(x)*sqrt(x) == x
        && squared_significand_fits_in_precision(sqrt(x));
}
bool squared_significand_fits_in_precision(double x) {
    /* scalbn is the standard C99 counterpart of the obsolete scalb */
    double scaled = scalbn(x, DBL_MANT_DIG/2 - ilogb(x));
    return scaled == floor(scaled)
        && (scalbn(scaled, -1) == floor(scalbn(scaled, -1)) /* scaled is even */
            || scaled < scalbn(sqrt((double) FLT_RADIX), DBL_MANT_DIG/2 + 1));
}
EDIT:
If we want to restrict to the case of integers, we can also check that floor(sqrt(x))==sqrt(x) or use dirty bit hacks in squared_significand_fits_in_precision...
Instead of doing sqrt(81.0) == 9.0, try 9.0*9.0 == 81.0. This will always work as long as the square is within the limits of the floating point magnitude.
Edit: I was probably unclear about what I meant by "floating point magnitude". What I mean is to keep the number within the range of integer values that can be held without precision loss, less than 2**53 for an IEEE double. I also expected that there would be a separate operation to make sure the square root was an integer.
double root = floor(sqrt(x) + 0.5); /* rounded result to nearest integer */
if (root*root == x && x < 9007199254740992.0)
/* it's a perfect square */

C : Strange error when using float and double [duplicate]

Possible Duplicate:
strange output in comparision of float with float literal
Comparison of float and double variables
I have a test with double and float in C, but I cannot explain the result.
float x = 3.4F;
if(x==3.4)
printf("true\n");
else printf("false\n");
double y = 3.4;
if (y==3.4)
printf("true\n");
else printf("false\n");
The result is false and true. Please explain this to me.
3.4 cannot be exactly represented as a double for the same reason that one third cannot be exactly represented as a base-10 decimal number using a finite number of digits -- the representation recurs.
So, the double literal 3.4 is actually the double value closest to 3.4. 3.4F is the float value closest to 3.4, but that's different from the closest double value.
When you compare a float with a double, the float is converted to double, which doesn't change its value.
Hence, 3.4F != 3.4, just as 0.3333 != 0.33333333
x == 3.4 should be x == 3.4F, otherwise the 3.4 is a double (by default). Always compare like with like, not apples and oranges.
Edit:
Whether the result of the comparison between types of different precision is true or false depends on the floating point representation of the compiler.
Floating point numbers, whether single precision (float) or double, are an approximation.
No guarantee that the result will be false and true, but the basic idea is pretty simple: 3.4 has type double. When you assign it to a float, it'll get rounded. When you compare, that rounded number will be promoted back to a double, not necessarily the same double as 3.4.
In the second case, everything's double throughout.
Comparing for equality
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. If you do a calculation and then compare the results against some expected value it is highly unlikely that you will get exactly the result you intended.
In other words, if you do a calculation and then do this comparison:
if (result == expectedResult)
then it is unlikely that the comparison will be true. If the comparison is true then it is probably unstable – tiny changes in the input values, compiler, or CPU may change the result and make the comparison be false.
As a tip, comparing floating point numbers (float or double) with '==' is in most cases a bad programming practice, as you might never get the statement to be true. The two numbers might differ in their least significant bits.
It is better to use something like:
fabs(x - y) < EQUALITY_MARGIN
With EQUALITY_MARGIN being an adequately small number.

The Google Calculator Glitch, could float vs. double be a possible reason?

I did this just for kicks (so, not exactly a question; I can see the downmodding happening already), but in light of Google's newfound inability to do math correctly (check it! According to Google, 500,000,000,000,002 - 500,000,000,000,001 = 0), I figured I'd try the following in C to test a little theory.
int main()
{
char* a = "399999999999999";
char* b = "399999999999998";
float da = atof(a);
float db = atof(b);
printf("%s - %s = %f\n", a, b, da-db);
a = "500000000000002";
b = "500000000000001";
da = atof(a);
db = atof(b);
printf("%s - %s = %f\n", a, b, da-db);
}
When you run this program, you get the following
399999999999999 - 399999999999998 = 0.000000
500000000000002 - 500000000000001 = 0.000000
It would seem like Google is using simple 32 bit floating precision (the error here), if you switch float for double in the above code, you fix the issue! Could this be it?
For more of this kind of silliness see this nice article pertaining to Windows calculator.
When you change the insides, nobody notices
The innards of Calc - the arithmetic
engine - was completely thrown away
and rewritten from scratch. The
standard IEEE floating point library
was replaced with an
arbitrary-precision arithmetic
library. This was done after people
kept writing ha-ha articles about how
Calc couldn't do decimal arithmetic
correctly, that for example computing
10.21 - 10.2 resulted in 0.0100000000000016.
In C#, try (double.MaxValue == (double.MaxValue - 100)); you'll get true ...
but that's what it is supposed to be:
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
thinking about it, you have 64 bit representing a number greater than 2^64 (double.maxvalue), so inaccuracy is expected.
It would seem like Google is using simple 32 bit floating precision (the error here), if you switch float for double in the above code, you fix the issue! Could this be it?
No, you just defer the issue. doubles still exhibit the same issue, just with larger numbers.
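A small sketch of that point (helper names are made up; IEEE 754 float and double are assumed). The same subtraction that fails in float works in double, but double fails in turn once the operands exceed 2^53:

```c
/* Stores both operands as float before subtracting. */
float sub_as_float(double a, double b)
{
    float fa = (float)a;
    float fb = (float)b;
    return fa - fb;
}

/* Subtracts the operands as doubles. */
double sub_as_double(double a, double b)
{
    return a - b;
}
```

sub_as_double(500000000000002.0, 500000000000001.0) gives 1.0, but sub_as_double(9007199254740993.0, 9007199254740992.0) gives 0.0, because 2^53 + 1 is not representable as a double.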
@ebel
thinking about it, you have 64 bit representing a number greater than 2^64 (double.maxvalue), so inaccuracy is expected.
2^64 is not the maximum value of a double. 2^64 is the number of unique values that a double (or any other 64-bit type) can hold. Double.MaxValue is equal to 1.79769313486232e308.
Inaccuracy with floating point numbers doesn't come from representing values larger than Double.MaxValue (which is impossible, excluding Double.PositiveInfinity). It comes from the fact that the desired range of values is simply too large to fit into the datatype. So we give up precision in exchange for a larger effective range. In essence, we are dropping significant digits in return for a larger exponent range.
@DrPizza
Not even; the IEEE encodings use multiple encodings for the same values. Specifically, NaN is represented by an exponent of all-bits-1, and then any non-zero value for the mantissa. As such, there are 2^52 NaNs for doubles, 2^23 NaNs for singles.
True. I didn't account for duplicate encodings. There are actually 2^52-1 NaNs for doubles and 2^23-1 NaNs for singles, though. :p
2^64 is not the maximum value of a double. 2^64 is the number of unique values that a double (or any other 64-bit type) can hold. Double.MaxValue is equal to 1.79769313486232e308.
Not even; the IEEE encodings use multiple encodings for the same values. Specifically, NaN is represented by an exponent of all-bits-1, and then any non-zero value for the mantissa. As such, there are 2^52 NaNs for doubles, 2^23 NaNs for singles.
True. I didn't account for duplicate encodings. There are actually 2^52-1 NaNs for doubles and 2^23-1 NaNs for singles, though. :p
Doh, forgot to subtract the infinities.
The rough estimate version of this issue that I learned is that 32-bit floats give you about 6 to 7 digits of precision and 64-bit floats give you about 15 digits of precision. This will of course vary depending on how the floats are encoded, but it's a pretty good starting point.
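In C the guaranteed figures are available from float.h as FLT_DIG and DBL_DIG (typically 6 and 15 on IEEE 754 platforms). A short sketch, with made-up helper names, that prints them and checks whether a value survives being stored in a float:

```c
#include <float.h>
#include <stdio.h>

/* Prints how many decimal digits each type is guaranteed to preserve
   through a text -> floating point -> text round trip. */
void print_decimal_digits(void)
{
    printf("float:  %d digits\n", FLT_DIG);
    printf("double: %d digits\n", DBL_DIG);
}

/* Returns 1 when x is unchanged after being stored in a float. */
int survives_float(double x)
{
    volatile float f = (float)x;   /* volatile: force the narrowing */
    return (double)f == x;
}
```

With IEEE 754 single precision, a six-digit integer such as 123456 survives the trip, while the nine-digit 123456789 does not.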

Resources