float vs double comparison [duplicate] - c

#include <stdio.h>
int main(void)
{
    float me = 1.1;
    double you = 1.1;
    if (me == you) {
        printf("I love U");
    } else {
        printf("I hate U");
    }
}
This prints "I hate U". Why?

Floats use binary fractions. Converting 1.1 to float produces a binary representation.
Each bit to the right of the binary point has half the weight of the bit before it, just as each decimal digit to the right of the decimal point has a tenth of the weight. Each bit to the left of the point doubles the weight (decimal digits multiply it by ten).
in decimal: ... 0*2 + 1*1 + 0*0.5 + 0*0.25 + 0*0.125 + 1*0.0625 + ...
binary: 0 1 . 0 0 0 1 ...
2's exp: 1 0 -1 -2 -3 -4
(exponent to the power of 2)
The problem is that 1.1 cannot be converted exactly to a binary representation. A double, however, has more significant digits than a float.
When you compare the values, the float is first converted to double. But as the computer does not know the original decimal value, it simply fills the trailing digits of the new double with zeros, while the double value holds a more precise approximation. So the two do not compare equal.
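To see the widening step in isolation, here is a minimal sketch. It assumes IEEE 754 binary32 and binary64 for float and double; the helper names widened_equals and show_widening are made up for illustration.

```c
#include <stdio.h>

/* Returns 1 when the float, widened to double the same way == widens it,
   matches the double exactly. */
int widened_equals(float f, double d)
{
    double widened = (double)f;   /* exact conversion: the value does not change */
    return widened == d;
}

/* Prints both operands as doubles with enough digits to show where they diverge. */
void show_widening(float f, double d)
{
    printf("float widened to double: %.20f\n", (double)f);
    printf("double:                  %.20f\n", d);
    printf("equal: %s\n", widened_equals(f, d) ? "yes" : "no");
}
```

Calling show_widening(1.1f, 1.1) prints two different digit strings, while a value such as 0.5, which is exact in binary, compares equal in both types.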
This is a common pitfall when using floats. For this and other reasons (e.g. rounding errors), you should not use exact comparison for equal/unequal, but a ranged compare with a small tolerance such as the machine epsilon (the gap between 1.0 and the next representable value):
#include <float.h>
#include <math.h>
...
// check for "almost equal"
if ( fabs(fval - dval) <= FLT_EPSILON )
...
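Note the use of FLT_EPSILON, the machine epsilon for single-precision float (the gap between 1.0 and the next representable float). Also note the <= rather than <: two distinct floats in the range [1, 2) differ by at least FLT_EPSILON, so < would effectively require an exact match.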
If you compare two doubles, you might use DBL_EPSILON, but be careful with that.
Depending on intermediate calculations, the tolerance has to be increased (you cannot reduce it further than epsilon), as rounding errors, etc. will sum up. Floats in general are not forgiving with wrong assumptions about precision, conversion and rounding.
Edit:
As suggested by @chux, this might not work as expected for larger values, as you have to scale EPSILON according to the exponents. This confirms what I stated: float comparison is not as simple as integer comparison. Think before comparing.
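One way to scale the tolerance with the magnitude of the operands is a relative comparison. This is only a sketch: the name nearly_equal and its rel_tol/abs_tol parameters are hypothetical, and suitable tolerances depend on the calculations that produced the values.

```c
#include <assert.h>
#include <math.h>

/* Returns 1 when a and b differ by at most rel_tol times the larger
   magnitude, or by at most abs_tol (useful when both are near zero). */
int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);

    double diff = fabs(a - b);
    double largest = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
    double scaled = rel_tol * largest;          /* tolerance grows with the operands */

    return diff <= (scaled > abs_tol ? scaled : abs_tol);
}
```

For a float compared against a double, a rel_tol of a few times FLT_EPSILON is a plausible starting point; the abs_tol argument only matters when both values are close to zero.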

In short, you should not use == to compare floating-point values.
For example:
float i = 1.1; // or double
float j = 1.1; // or double
This condition
(i == j) == true // is not always valid
For a correct comparison, use an epsilon (a very small number) together with fabs from <math.h>, since abs works on integers in C:
(fabs(i - j) < epsilon) == true // this condition is valid

The question simplifies to why do me and you have different values?
Usually, C floating point is based on a binary representation. Many compilers & hardware follow IEEE 754 binary32 and binary64. Rare machines use a decimal, base-16 or other floating point representation.
OP's machine certainly does not represent 1.1 exactly as 1.1, but to the nearest representable floating point number.
Consider the below which prints out me and you to high precision. The previous representable floating point numbers are also shown. It is easy to see me != you.
#include <math.h>
#include <stdio.h>
int main(void) {
    float me = 1.1;
    double you = 1.1;
    printf("%.50f\n", nextafterf(me,0)); // previous float value
    printf("%.50f\n", me);
    printf("%.50f\n", nextafter(you,0)); // previous double value
    printf("%.50f\n", you);
    return 0;
}
Output:
1.09999990463256835937500000000000000000000000000000
1.10000002384185791015625000000000000000000000000000
1.09999999999999986677323704498121514916420000000000
1.10000000000000008881784197001252323389053300000000
But it is more complicated: C allows code to use higher precision for intermediate calculations depending on FLT_EVAL_METHOD. So on another machine, where FLT_EVAL_METHOD==1 (evaluate all FP to double), the compare test may pass.
Comparing for exact equality is rarely used in floating point code, aside from comparison to 0.0. More often code uses an ordered compare a < b. Comparing for approximate equality involves another parameter to control how near. @R.. has a good answer on that.

Because you are comparing two floating-point values of different precision!
Floating-point comparison is not exact because of rounding errors. Simple decimal values like 0.1 or 1.1 cannot be represented precisely as binary floating-point numbers, and the limited precision of floating-point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. For example:
double a = 0.15 + 0.15;
double b = 0.1 + 0.2;
if (a == b) // can be false!
if (a >= b) // can also be false!
Even
if (fabs(a - b) < 0.0001) // wrong - don't do this
This is a bad way to do it because a fixed epsilon (0.0001), chosen because it “looks small”, could actually be far too large when the numbers being compared are very small as well.
I personally use the following method (written in C++); maybe it will help you:
#include <cmath> // std::abs
#include <algorithm> // std::min
#define MIN_NORMAL 1.17549435E-38f // smallest normal float (FLT_MIN)
#define MAX_VALUE 3.4028235E38f // largest finite float (FLT_MAX)
bool nearlyEqual(float a, float b, float epsilon) {
    float absA = std::abs(a);
    float absB = std::abs(b);
    float diff = std::abs(a - b);
    if (a == b) {
        // shortcut, also handles infinities
        return true;
    } else if (a == 0 || b == 0 || diff < MIN_NORMAL) {
        // a or b is zero or both are extremely close to it;
        // relative error is less meaningful here
        return diff < (epsilon * MIN_NORMAL);
    } else {
        // use relative error
        return diff / std::min(absA + absB, MAX_VALUE) < epsilon;
    }
}
This method passes tests for many important special cases, for different a, b and epsilon.
And don't forget to read What Every Computer Scientist Should Know About Floating-Point Arithmetic!

Related

How to flip the exponent of a double (e.g. 1e300->1e-300)?

I am interested in writing a fast C program that flips the exponent of a double. For instance, this program should convert 1e300 to 1e-300. I guess the best way would be some bit operations, but I lack enough knowledge to fulfill that. Any good idea?
Assuming you mean to negate the decimal exponent, the power of ten exponent in scientific notation:
#include <math.h>
double negate_decimal_exponent(const double value)
{
    if (value != 0.0) {
        const double p = pow(10.0, -floor(log10(fabs(value))));
        return (value * p) * p;
    } else
        return value;
}
Above, floor(log10(fabs(value))) is the base 10 logarithm of the absolute value of value, rounded down. Essentially, it is the power of ten exponent in value using the scientific notation. If we negate it, and raise ten to that power, we have the inverse of that power of ten.
We can't calculate the square of p, because it might underflow for very large values of value in magnitude, or overflow for very small values of value in magnitude. Instead, we multiply value by p, so that the product is near unity in magnitude (that is, decimal exponent is zero); then multiply that with p, to essentially negate the decimal exponent.
The base-ten logarithm of zero is undefined, so we need to handle zero separately. (I initially missed this corner case; thanks to chux for pointing it out.)
Here is an example program to demonstrate:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
double negate_decimal_exponent(const double value)
{
    if (value != 0.0) {
        const double p = pow(10.0, -floor(log10(fabs(value))));
        return (value * p) * p;
    } else
        return value;
}
#define TEST(val) printf("negate_decimal_exponent(%.16g) = %.16g\n", val, negate_decimal_exponent(val))
int main(void)
{
    TEST(1.0e300);
    TEST(1.1e300);
    TEST(-1.0e300);
    TEST(-0.8e150);
    TEST(0.35e-25);
    TEST(9.83e-200);
    TEST(23.4728395e-220);
    TEST(0.0);
    TEST(-0.0);
    return EXIT_SUCCESS;
}
which, when compiled (remember to link with the math library, -lm) and run, outputs (on my machine; should output the same on all machines using IEEE-754 Binary64 for doubles):
negate_decimal_exponent(1e+300) = 1e-300
negate_decimal_exponent(1.1e+300) = 1.1e-300
negate_decimal_exponent(-1e+300) = -1e-300
negate_decimal_exponent(-8e+149) = -8e-149
negate_decimal_exponent(3.5e-26) = 3.5e+26
negate_decimal_exponent(9.83e-200) = 9.83e+200
negate_decimal_exponent(2.34728395e-219) = 2.34728395e+219
negate_decimal_exponent(0) = 0
negate_decimal_exponent(-0) = -0
Are there faster methods to do this?
Sure. Construct a look-up table of powers of ten, and use a binary search to find the largest entry that is smaller than value in magnitude. A second look-up table holds the two multipliers that, when multiplied with value, negate the decimal power of ten. Two factors are needed, because a single one does not have the necessary range and precision. (However, the two values are symmetric with respect to the base-ten logarithm.) For a look-up table with a thousand exponents (which covers IEEE-754 doubles, but one should check at compile time that it does cover DBL_MAX), that would be about ten comparisons and two multiplications (using floating-point values), so it'd be quite fast.
A portable program could calculate the tables necessary at run-time, too.

why if(a==2.3) evaluates false when float a=2.3 [duplicate]

#include <stdio.h>
int main(void)
{
    float a = 2.3;
    if (a == 2.3) {
        printf("hello");
    }
    else {
        printf("hi");
    }
}
It prints "hi", which means the if condition evaluates to false.
#include <stdio.h>
int main(void)
{
    float a = 2.5;
    if (a == 2.5)
        printf("Hello");
    else
        printf("Hi");
}
It prints "Hello".
The variable a is a float that holds some value close to the mathematical value 2.3.
The literal 2.3 is a double that also holds some value close to the mathematical value 2.3, but because double has greater precision than float, this may be a different value from the value of a. Both float and double can only represent a finite number of values, so there are necessarily mathematical real numbers that cannot be represented exactly by either of those two types.
In the comparison a == 2.3, the left operand is promoted from float to double. This promotion is exact and preserves the value (as all promotions do), but as discussed above, that value may be a different one from that of the 2.3 literal.
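Whether a particular literal survives the trip through float can be checked directly. The following is a small sketch, assuming IEEE 754 binary32 and binary64; the function name exact_in_float is hypothetical.

```c
/* Returns 1 if d keeps its value when narrowed to float and widened back,
   i.e. if a float can hold that double exactly. */
int exact_in_float(double d)
{
    float narrowed = (float)d;      /* may round: float has fewer significand bits */
    return (double)narrowed == d;   /* widening back is always exact */
}
```

Under that assumption, exact_in_float(2.5) is true, which is why the 2.5 version of the program takes the equal branch, while exact_in_float(2.3) is false.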
To make a comparison between floats, you can use an appropriate float literal:
assert(a == 2.3f);
// ^
As a 32-bit float, 2.3 has the binary representation 01000000000100110011001100110011,
so you cannot set a float to exactly 2.3.
Printed with double precision, the stored float value is approximately 2.299999952316284.
You converted a double to a float when you wrote:
float a = 2.3;
The if then checks whether the float a (about 2.299999952316284) is equal to the double 2.3, which is a closer approximation of 2.3, so the comparison fails.
you should write:
float a = 2.3f;
and you can check:
if (a == 2.3f) {
...
}
I would rather test with:
if (fabs(a - 2.3f) < 0.00001) {
...
}
2.5, by contrast, is exactly representable; its bits are: 01000000001000000000000000000000
EDIT: fabs is declared in <math.h> (or <cmath> in C++).
Read this: article
Comparing floating point values is not as easy as it might seem; have a look at Most effective way for float and double comparison.
It all boils down to the fact that floating point numbers are not exact (well, most are not). Usually you compare two floats by allowing a small error window (epsilon):
if( fabs(a - 2.3f) < epsilon) { ... }
where epsilon is small enough for your calculation, but not too small (bigger than the machine epsilon).

Strange output when using float instead of double

Strange output when I use float instead of double
#include <stdio.h>
int main(void)
{
    double p,p1,cost,cost1=30;
    for (p = 0.1; p < 10;p=p+0.1)
    {
        cost = 30-6*p+p*p;
        if (cost<cost1)
        {
            cost1=cost;
            p1=p;
        }
        else
        {
            break;
        }
        printf("%lf\t%lf\n",p,cost);
    }
    printf("%lf\t%lf\n",p1,cost1);
    return 0;
}
Gives output as expected at p = 3;
But when I use float the output is a little weird.
#include <stdio.h>
int main(void)
{
    float p,p1,cost,cost1=40;
    for (p = 0.1; p < 10;p=p+0.1)
    {
        cost = 30-6*p+p*p;
        if (cost<cost1)
        {
            cost1=cost;
            p1=p;
        }
        else
        {
            break;
        }
        printf("%f\t%f\n",p,cost);
    }
    printf("%f\t%f\n",p1,cost1);
    return 0;
}
Why is the increment of p in the second case going weird after 2.7?
This is happening because the float and double data types store numbers in base 2. Most base-10 numbers can’t be stored exactly. Rounding errors add up much more quickly when using floats. Outside of embedded applications with limited memory, it’s generally better, or at least easier, to use doubles for this reason.
To see this happening for double types, consider the output of this code:
#include <stdio.h>
int main(void)
{
double d = 0.0;
for (int i = 0; i < 100000000; i++)
d += 0.1;
printf("%f\n", d);
return 0;
}
On my computer, it outputs 9999999.981129. So after 100 million iterations, rounding error made a difference of 0.018871 in the result.
For more information about how floating-point data types work, read What Every Computer Scientist Should Know About Floating-Point Arithmetic. Or, as akira mentioned in a comment, see the Floating-Point Guide.
Your program can work fine with float. You don't need double to compute a table of 100 values to a few significant digits. You can use double, and if you do, it has a chance of working even if you use binary floating-point at cross-purposes. The IEEE 754 double-precision format used for double by most C compilers is so precise that it makes many misuses of floating-point unnoticeable (but not all of them).
Values that are simple in decimal may not be simple in binary
A consequence is that a value that is simple in decimal may not be represented exactly in binary.
This is the case for 0.1: it is not simple in binary, and it is not represented exactly as either double or float, but the double representation has more digits and as a result, is closer to the intended value 1/10.
Floating-point operations are not exact in general
Binary floating-point operations in a format such as float or double have to produce a result in the intended format. This leads to some digits having to be dropped from the result each time an operation is computed. When using binary floating-point in an advanced manner, the programmer sometimes knows that the result will have few enough digits for all the digits to be represented in the format (in other words, sometimes a floating-point operation can be exact, and advanced programmers can predict and take advantage of the conditions in which this happens). But here, you are adding 0.1, which is not simple and (in binary) uses all the available digits, so most of the time this addition is not exact.
How to print a small table of values using only float
In for (p = 0.1; p < 10;p=p+0.1), the value of p, being a float, will be rounded at each iteration. Each iteration will be computed from a previous iteration that was already rounded, so the rounding errors will accumulate and make the end result drift away from the intended, mathematical value.
Here is a list of improvements over what you wrote, in increasing order of exactness:
for (i = 1, p = 0.1f; i < 100; i++, p = i * 0.1f)
In the above version, 0.1f is not exactly 1/10, but the computation of p involves only one multiplication and one rounding, instead of up to 100. That version gives a more precise approximation of i/10.
for (i = 1, p = 0.1f; i < 100; i++, p = i * 0.1)
In the very slightly different version above, i is multiplied by the double value 0.1, which more closely approximates 1/10. The result is always the closest float to i/10, but this solution is cheating a bit, since it uses a double multiplication. I said a solution existed with only float!
for (i = 1, p = 0.1f; i < 100; i++, p = i / 10.0f)
In this last solution, p is computed as the division of i, represented exactly as a float because it is a small integer, by 10.0f, which is also exact for the same reason. The only approximation is that of a single operation, and the arguments are exactly what we wanted them to be, so this is the best solution. It produces the closest float to i/10 for all values of i between 1 and 99.
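The difference between accumulating and recomputing can be checked with a short sketch. It assumes IEEE 754 binary32 arithmetic with no excess intermediate precision (FLT_EVAL_METHOD == 0), and the two function names are made up for the comparison; note that it adds the float constant 0.1f rather than the double 0.1 used in the question.

```c
/* Adds 0.1f to a running float total n times; every addition rounds,
   and each step starts from an already rounded value. */
float tenths_by_addition(int n)
{
    float p = 0.0f;
    for (int i = 0; i < n; i++)
        p += 0.1f;
    return p;
}

/* Computes n/10 with a single correctly rounded float division. */
float tenths_by_division(int n)
{
    return (float)n / 10.0f;
}
```

Under those assumptions, tenths_by_division(10) is exactly 1.0f, whereas tenths_by_addition(10) ends up one float step above 1.0f, and the gap tends to widen as more additions are chained.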

Moving decimal place to right in c

I'm new to C and when I run the code below, the value that is put out is 12098 instead of 12099.
I'm aware that working with decimals always involves a degree of inaccuracy, but is there a way to accurately move the decimal point to the right two places every time?
#include <stdio.h>
int main(void)
{
int i;
float f = 120.99;
i = f * 100;
printf("%d", i);
}
Use the round function (declared in <math.h>):
float f = 120.99;
int i = round( f * 100.0 );
Be aware, however, that a float typically has only 6 or 7 significant digits of precision, so there's a maximum value where this will work. The smallest float value that won't convert properly is the number 131072.01. If you multiply by 100 and round, the result will be 13107202.
You can extend the range of your numbers by using double values, but even a double has limited precision. (A double has about 15 to 17 significant digits.) For example, the following code will print 10000000000000098:
double d = 100000000000000.99;
uint64_t j = round( d * 100.0 );
printf( "%llu\n", (unsigned long long)j );
That's just an example; finding the smallest number that exceeds the precision of a double is left as an exercise for the reader.
Use fixed-point arithmetic on integers:
#include <stdio.h>
#define abs(x) ((x)<0 ? -(x) : (x))
int main(void)
{
int d = 12099;
int i = d * 100;
printf("%d.%02d\n", d/100, abs(d)%100);
printf("%d.%02d\n", i/100, abs(i)%100);
}
Your problem is that floats are represented internally using IEEE-754, which is base 2, not base 10. 0.25 has an exact representation, but 0.1 does not, nor does 120.99.
What really happens is that, due to floating-point inaccuracy, the IEEE-754 float closest to the decimal value 120.99, multiplied by 100, is slightly below 12099, so it is truncated to 12098. Your compiler should have warned you about the implicit conversion from float to int (mine did).
A simple way to get what you expect is to add 0.5 to the float before the truncation to int:
i = (f * 100) + 0.5;
But beware: floating-point values are inherently inaccurate when processing decimal values.
Edit:
Of course, for negative numbers it should be i = (f * 100) - 0.5;
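Both signs can be handled in one helper. This is a sketch under assumptions: the names truncate_hundredths and round_hundredths are hypothetical, the product is computed in double so that the result does not depend on how the platform evaluates float expressions, and the scaled value is assumed to fit in an int.

```c
/* Scales by 100 and truncates toward zero, as the plain conversion to int does. */
int truncate_hundredths(float f)
{
    return (int)((double)f * 100.0);
}

/* Scales by 100 and rounds half away from zero by nudging the value
   0.5 toward the nearer integer before the truncation. */
int round_hundredths(float f)
{
    double scaled = (double)f * 100.0;
    return (int)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
}
```

With IEEE 754 floats, truncate_hundredths(120.99f) gives 12098, while round_hundredths(120.99f) gives 12099 and round_hundredths(-120.99f) gives -12099.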
If you'd like to continue operating on the number as a floating point number, then the answer is more or less no. There are various things you can do for small numbers, but as your numbers get larger, you'll have issues.
If you'd like to only print the number, then my recommendation would be to convert the number to a string, and then move the decimal point there. This can be slightly complicated depending on how you represent the number in the string (exponential and what not).
If you'd like this to work and you don't mind not using floating point, then I'd recommend researching any number of fixed decimal libraries.
You can use
float f = 120.99f;
or
double f = 120.99;
By default, C treats floating-point literals as double, so storing one in a float variable involves an implicit conversion, which is best avoided.
I think this works.

How do I compute maximum/minimum of 8 different float values

I need to find the maximum and minimum of 8 float values I get. I did as follows, but the float comparisons are going awry, as warned by any good C book!
How do I compute the max and min in an accurate way?
#include <stdio.h>
float mymax(float a,float b);
float mymin(float a,float b);
int main(void)
{
    float mx,mx1,mx2,mx3,mx4,mn,mn1,mn2,mn3,mn4,tm1,tm2;
    mx1 = mymax(2.1,2.01); //this returns 2.09999 instead of 2.1 because a is passed as 2.09999.
    mx2 = mymax(-3.5,7.000001);
    mx3 = mymax(7,5);
    mx4 = mymax(7.0000011,0); //this returns incorrectly- 7.000001
    tm1 = mymax(mx1,mx2);
    tm2 = mymax(mx3,mx4);
    mx = mymax(tm1,tm2);
    mn1 = mymin(2.1,2.01);
    mn2 = mymin(-3.5,7.000001);
    mn3 = mymin(7,5);
    mn4 = mymin(7.0000011,0);
    tm1 = mymin(mn1,mn2); // combine the minima, not the maxima
    tm2 = mymin(mn3,mn4);
    mn = mymin(tm1,tm2);
    printf("Max is %f, Min is %f \n",mx,mn);
    return 0;
}
float mymax(float a,float b)
{
    if(a >= b)
    {
        return a;
    }
    else
    {
        return b;
    }
}
float mymin(float a,float b)
{
    if(a <= b)
    {
        return a;
    }
    else
    {
        return b;
    }
}
How can I do exact comparisons of these floats? This is all C code.
thank you.
-AD.
You are doing exact comparison of these floats. The problem (with your example code at least) is that float simply does not have enough digits of precision to represent the values of your literals sufficiently. 7.000001 and 7.0000011 simply are so close together that the mantissa of a 32 bit float cannot represent them differently.
But the example seems artificial. What is the real problem you're trying to solve? What values will you actually be working with? Or is this just an academic exercise?
The best solution depends on the answer to that. If your actual values just require somewhat more precision than float can provide, use double. If you need exact representation of decimal digits, use a decimal type library. If you want to improve your understanding of how floating point values work, read The Floating-Point Guide.
You can do exact comparison of floats, either directly as floats or by copying their bits into an integer of the same size.
#include <stdint.h>
#include <string.h>
float a = 1.0f;
float b = 2.0f;
int32_t ia, ib;
memcpy(&ia, &a, sizeof ia); /* copy the bits without breaking aliasing rules */
memcpy(&ib, &b, sizeof ib);
/* You can compare a and b, or ia and ib. For non-negative, non-NaN floats
the results are the same, because the bit patterns of such floats are
ordered the same way when read as integers (provided that float is a
32-bit IEEE-754 type). Negative floats use sign-magnitude, so their
integer order is reversed.
*/
But you will never be able to represent exactly 2.1 as a float.
Your problem is not a problem of comparison, it is a problem of representation of a value.
I'd claim that these comparisons are actually exact, since no value is altered.
The problem is that many float literals can't be represented exactly by IEEE-754 floating point numbers, for example 2.1.
If you need an exact representation of base-10 fractional numbers you could, for example, write your own fixed-point BCD arithmetic.
Concerning finding min and max at the same time:
A way that needs fewer comparisons is to first compare the two elements of each index pair (2*i, 2*i+1), which yields a minimum and a maximum per pair (n/2 comparisons).
Then find the minimum of the minima (n/2 - 1 comparisons) and the maximum of the maxima (n/2 - 1 comparisons).
So, for even n, we get 3*n/2 - 2 comparisons instead of the 2*n - 2 needed when finding the minimum and maximum separately.
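The pairing idea can be sketched as follows. The struct and function names are made up for this example, the input is assumed to contain no NaN values, and the pair results are folded into the running minimum and maximum immediately instead of being stored first.

```c
#include <assert.h>
#include <stddef.h>

struct min_max {
    float min;
    float max;
};

/* Finds the minimum and maximum of v[0..n-1] by comparing elements in
   pairs: one comparison orders the pair, then the smaller element is
   checked against the minimum and the larger against the maximum. */
struct min_max min_max_pairwise(const float *v, size_t n)
{
    assert(v != NULL && n > 0);

    struct min_max r;
    size_t i;

    if (n % 2 == 0) {               /* even count: seed from the first pair */
        if (v[0] < v[1]) { r.min = v[0]; r.max = v[1]; }
        else             { r.min = v[1]; r.max = v[0]; }
        i = 2;
    } else {                        /* odd count: seed from the first element */
        r.min = r.max = v[0];
        i = 1;
    }

    for (; i + 1 < n; i += 2) {     /* the remaining count is always even */
        float lo = v[i], hi = v[i + 1];
        if (hi < lo) { float t = lo; lo = hi; hi = t; }
        if (lo < r.min) r.min = lo;
        if (hi > r.max) r.max = hi;
    }
    return r;
}
```

Each pair costs three comparisons instead of four, which is where the saving over two separate passes comes from. The comparisons themselves are exact; only the stored float values are approximations of the decimal literals.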
The < and > comparisons always work correctly with floats or doubles. Only the == comparison has problems, which is why you are advised to use an epsilon.
So your method of calculating min and max has no issue. Note that if you use float, you should write float literals as 2.1f instead of 2.1.
