Comparing float and double


#include <stdio.h>
int main(void){
float a = 1.1;
double b = 1.1;
if(a == b){
printf("if block");
}
else{
printf("else block");
}
return 0;
}
Prints: else block
#include <stdio.h>
int main(void){
float a = 1.5;
double b = 1.5;
if(a == b){
printf("if block");
}
else{
printf("else block");
}
return 0;
}
Prints: if block
What is the logic behind this?
Compiler used: gcc-4.3.4

This is because 1.1 is not exactly representable in binary floating-point, but 1.5 is.
As a result, the float and double representations hold slightly different approximations of 1.1.
Here is the difference when the two significands are written out in binary:
(float) 1.1 = (1.00011001100110011001101)₂
(double)1.1 = (1.0001100110011001100110011001100110011001100110011010)₂
Thus, when you compare them (and the float version gets promoted to double), they will not be equal.

Must read: What Every Computer Scientist Should Know About Floating-Point Arithmetic

The exact value of decimal 1.1 in binary is the non-terminating fraction 1.00011001100110011001100(1100).... The double constant 1.1 is an approximation of that value with a 53-bit mantissa. When it is converted to float, the mantissa is cut down to just 24 bits.
When the float is converted back to double, the mantissa is 53 bits wide again, but all memory of the digits beyond the first 24 is lost: the value is zero-extended, and now you're comparing (for example, depending on the rounding behaviour)
1.0001100110011001100110011001100110011001100110011001
and
1.0001100110011001100110000000000000000000000000000000
Now, suppose you used 1.5 instead of 1.1.
1.5 decimal is exactly 1.1 in binary. It can be represented exactly with just a 2-bit mantissa, so even the 24 bits of float are more than enough. What you have is
1.1000000000000000000000000000000000000000000000000000
and
1.10000000000000000000000
The latter, zero extended to a double would be
1.1000000000000000000000000000000000000000000000000000
which clearly is the same number.

Related

Similar codes output different results

The output of the following code is 0.000000:
#include <stdio.h>
int main() {
float x;
x = (float)3.3 == 3.3;
printf("%f", x);
return 0;
}
Whereas this code outputs 1.000000:
#include <stdio.h>
int main() {
float x;
x = (float)3.5 == 3.5;
printf("%f", x);
return 0;
}
The only difference between the two programs is the value in the comparison, yet the results differ. Why is this?

This line:
x=(float)3.3==3.3;
Compares (float)3.3 and 3.3 for equality and assigns the result to x. The left side of the comparison has type float because of the cast, while the right side has type double which is the default type for floating point constants.
The value 3.3 cannot be represented exactly in binary floating point, so the actual value stored is an approximation. This approximated value will be different for types float and double due to their differing precisions, so the equality evaluates to false, i.e. 0. This is the value that gets assigned to x.
Regarding your comment on why x is 1 when the number you're checking is 3.5, that is because 3.5 can be represented exactly in binary, and both types have the precision to store that value, so they compare equal.

The constant 3.3 has type double, but you are comparing it with a float (produced by the cast, which loses precision).
The value of 3.3 as a double is 3.299999999999999822, whereas the same value as a float is 3.299999952, and the two are clearly unequal. The result will be true (i.e. 1.000000) if you cast the other 3.3 to float as well.
Rather than:
x = (float) 3.3 == 3.3; // float != double precision (precision loss)
If you do this:
x = (float) 3.3 == (float) 3.3; // converting both to make precision equal
Or,
x = (double) 3.3 == (double) 3.3; // converting . . . (same)
In other words, the comparison will be equal if you convert both expressions to the same type.
Also, notice that 3.5 is exactly 3.50000000... in both float and double, so no precision is lost in the conversion and you get 1.000000. That is not the case with 3.3.

float has less precision than double. The constant 3.3 defaults to double and has the value 3.299999999999999822; the same constant converted to float is 3.29999995231628418.
3.299999999999999822 == 3.29999995231628418 is false, i.e. 0.
Given the precedence rules, the expression amounts to x = ((float)3.5 == 3.5);: the comparison is evaluated first and the result is assigned to x.
When there is no cast, both constants default to double, so naturally the comparison between them is true, i.e. 1.
As for the comparison between 3.5 as double and as float being true, it has to do with the binary conversion. 3.3 is subject to approximation because its binary expansion goes on indefinitely, so the exact value cannot be represented in either double or float, whereas 3.5 is exactly representable in both.

To see loss of precision with float:
#include <stdio.h>
int main(){
float x = 3.3;
double d = 3.3;
printf("%12.9f %12.9lf\n",x, d);
return 0;
}
Output:
3.299999952 3.300000000

3.3 in binary is
11.0100110011001100110011001100110011001100110011
When converted to float some bits of precision are truncated
11.0100110011001100110011001100110
When converted back to double, computer just uses 0's
11.0100110011001100110011001100110000000000000000
So
3.3                  11.0100110011001100110011001100110011001100110011
(float)3.3           11.0100110011001100110011001100110
(double)((float)3.3) 11.0100110011001100110011001100110000000000000000
3.5 for comparison
3.5                  11.1000000000000000000000000000000000000000000000
(float)3.5           11.1000000000000000000000000000000
(double)((float)3.5) 11.1000000000000000000000000000000000000000000000

Values like 3.3 cannot be represented exactly in a finite number of bits, just like the result of 1/3 cannot be represented exactly in a finite number of decimal digits, so you wind up storing an approximation of the value.
The float approximation is different from the double approximation, so the comparison (float) 3.3 == 3.3 fails.
By contrast, 3.5 can be represented exactly in both float and double types, so the comparison (float) 3.5 == 3.5 succeeds.
Just like with integer types, the significand of a floating-point type is a sum of powers of 2: the value 3.5 is represented as 1.75 * 2^1, and the binary representation of the significand 1.75 is (1.11)₂, i.e. 1 * 2^0 + 1 * 2^-1 + 1 * 2^-2, or 1 + 0.5 + 0.25.

The issue is that you lose precision when converting the value 3.3, which is a literal of type double, to a float value with the cast (float)3.3. This loss of precision is irreversible, even though the comparison operator == promotes the left operand back to type double.
So the issue is that
(double)((float)3.3) == (double)3.3
will be false since the cast of 3.3 to float loses precision. For 3.5, in contrast, the result will be true, because 3.5 can be represented exactly in float, just as it can in double.
Actually, the situation can be compared to casting a value from a higher rank to a lower and then back like in the following snippet:
unsigned int x = 257;
unsigned char y = x;
unsigned int x2 = y;
printf("%d\n", x==x2); // 0
x = 255;
y = x;
x2 = y;
printf("%d\n", x==x2); // 1

Can I not provide an equality condition in a while loop?

I wrote this code and only 'Hello' got printed.
float x=1.1;
printf("Hello\n");
while(x-1.1==0)
{
printf("%f\n",x);
x=x-1;
}

When you are dealing with floating point operations, you don't always get the results you would expect.
What you are seeing is:
1.1 is stored in x as a float, which rounds it to float precision.
In the while statement, 1.1 is of type double, not float. Hence, x is promoted to a double before the subtraction and comparison are made.
The promotion does not restore the precision lost in the first step. Hence x-1.1 does not evaluate to 0.0.
You can see the expected results if you use appropriate floating point constants.
#include <stdio.h>
void test1()
{
printf("In test1...\n");
float x=1.1;
// Use a literal of type float, not double.
if (x-1.1f == 0)
{
printf("true\n");
}
else
{
printf("false\n");
}
}
void test2()
{
printf("In test2...\n");
// Use a variable of type double, not float.
double x=1.1;
if (x-1.1 == 0)
{
printf("true\n");
}
else
{
printf("false\n");
}
}
int main()
{
test1();
test2();
return 0;
}
Output:
In test1...
true
In test2...
true

This is because x is a single-precision floating-point number, but you subtract the constant 1.1 from it, which is double-precision. So your single-precision 1.1 is converted to double-precision, and the subtraction is performed, but the result is non-zero (since 1.1 cannot be exactly represented, but the double-precision value is closer than the single-precision value). Try the following:
#include <stdio.h>
int main()
{
float x = 1.1;
double y = 1.1;
printf("%.20g\n", x - 1.1);
printf("%.20g\n", y - 1.1);
return 0;
}
On my computer, the result is:
2.384185782133840803e-08
0

Compare floats with a tolerance instead, like -0.00001 < x - 1.1 && x - 1.1 < 0.00001

You need to read up on floating point precision.
TL;DR: x-1.1 is not exactly 0.
For example, in my debugger, x equals 1.10000002. This is due to the nature of floating point precision.
Relevant read:
http://floating-point-gui.de/

behaviour of float in C

#include<stdio.h>
int main()
{
float f = 0.1;
double d = 0.1;
printf("%zu %zu %zu %zu\n", sizeof(f), sizeof(0.1f), sizeof(0.1), sizeof(d));
return 0;
}
Output
$ ./a.out
4 4 8 8
As the above code shows, sizeof(0.1) and sizeof(0.1f) are not the same.
sizeof(0.1) is 8 bytes, while sizeof(0.1f) is 4 bytes.
But when the value is assigned to the float variable f, it is automatically truncated to 4 bytes.
In the code below, however, the comparison with float x does not truncate: the 4 bytes of the float are compared with the 8 bytes of 0.1. The value of float x matches 0.1f, as both are 4 bytes.
#include<stdio.h>
int main()
{
float x = 0.1;
if (x == 0.1)
printf("IF");
else if (x == 0.1f)
printf("ELSE IF");
else
printf("ELSE");
}
Output
$ ./a.out
ELSE IF
Why and how is it truncated on assignment but not on comparison?

A floating point literal without a suffix is of type double. Suffixing it with an f makes a literal of type float.
When assigning to a variable, the right operand to = is converted to the type of the left operand, thus you observe truncation.
When comparing, the operands to == are converted to the wider of the two operand types, so x == 0.1 is like (double)x == 0.1, which is false since (double)(float)0.1 is not equal to 0.1 due to rounding. In x == 0.1f, both operands have type float, which results in equality on your machine.
Floating point math is tricky; read the standard for more details.

A floating point constant like 0.1 is a double unless specified as a float like 0.1f. The line
float f = 0.1;
means: create a double with value 0.1, convert it to float, and lose precision in the process. The lines
float x = 0.1;
if (x == 0.1)
will cause x to be implicitly converted to double, but it will have a slightly different value than, e.g., double x = 0.1;

0.1f (the "f" after the number) marks the constant as a float; that is how your compiler knows to store it as a float and not as a double.
So the float 0.1 is not equal to 0.1; it is equal to 0.1f.

When you write 0.1, it is treated as a double by default. The suffix f explicitly makes it a float.
As for the second question: floats are stored in IEEE format, so execution reaches the else if branch because 0.1f converted to double is not the same value as 0.1.
https://en.wikipedia.org/wiki/Floating_point

0.1 is a double value whereas 0.1f is a float value.
The reason we can write float x=0.1 as well as double x=0.1 is implicit conversion.
But by using the suffix f you make the constant a float.
In this -
if(x == 0.1)
is false because 0.1 cannot be stored exactly in binary; the stored value differs from 0.1 at some places after the decimal point. The comparison also converts x to the higher type, i.e. double.
Converting to float and then to double loses information, because double has higher precision than float, so the two values differ.

float vs double comparison [duplicate]

#include <stdio.h>
int main(void)
{
float me = 1.1;
double you = 1.1;
if ( me == you ) {
printf("I love U");
} else {
printf("I hate U");
}
}
This prints "I hate U". Why?

Floats use binary fractions. If you convert 1.1 to float, the result is a binary representation.
Each bit to the right of the binary point halves the weight of the digit, just as each decimal place divides by ten. Bits to the left of the point double the weight (times ten for decimal).
in decimal: ... 0*2 + 1*1 + 0*0.5 + 0*0.25 + 0*0.125 + 1*0.0625 + ...
binary:         0     1  .  0       0        0         1     ...
2's exp:        1     0    -1      -2       -3        -4
(exponent to the power of 2)
The problem is that 1.1 cannot be converted exactly to a binary representation. A double, however, has more significant digits than a float.
If you compare the values, the float is first converted to double. But as the computer does not know the original decimal value, it simply fills the trailing digits of the new double with 0s, while the double value is more precise. So the two compare unequal.
This is a common pitfall when using floats. For this and other reasons (e.g. rounding errors), you should not use exact comparison for equal/unequal, but a ranged compare using a small tolerance such as the machine epsilon (the gap between 1.0 and the next representable value):
#include <float.h>
#include <math.h>
...
// check for "almost equal"
if ( fabs(fval - dval) <= FLT_EPSILON )
...
Note the usage of FLT_EPSILON, which is the aforementioned tolerance for single precision float values. Also note the <=, not <, as the latter would effectively require an exact match for values near 1.
If you compare two doubles, you might use DBL_EPSILON, but be careful with that.
Depending on intermediate calculations, the tolerance has to be increased (you cannot reduce it further than epsilon), as rounding errors, etc. will sum up. Floats in general are not forgiving with wrong assumptions about precision, conversion and rounding.
Edit:
As suggested by @chux, this might not work as expected for larger values, as you have to scale EPSILON according to the exponents. This is in line with what I stated: float comparison is not as simple as integer comparison. Think before comparing.

In short, you should NOT use == to compare floating point values.
For example
float i = 1.1; // or double
float j = 1.1; // or double
This assertion
(i==j) == true // is not always valid
For a correct comparison you should use an epsilon (a very small number):
(fabs(i-j)<epsilon)== true // this assertion is valid

The question simplifies to why do me and you have different values?
Usually, C floating point is based on a binary representation. Many compilers & hardware follow IEEE 754 binary32 and binary64. Rare machines use a decimal, base-16 or other floating point representation.
OP's machine certainly does not represent 1.1 exactly as 1.1, but to the nearest representable floating point number.
Consider the below which prints out me and you to high precision. The previous representable floating point numbers are also shown. It is easy to see me != you.
#include <math.h>
#include <stdio.h>
int main(void) {
float me = 1.1;
double you = 1.1;
printf("%.50f\n", nextafterf(me,0)); // previous float value
printf("%.50f\n", me);
printf("%.50f\n", nextafter(you,0)); // previous double value
printf("%.50f\n", you);
return 0;
}
Output:
1.09999990463256835937500000000000000000000000000000
1.10000002384185791015625000000000000000000000000000
1.09999999999999986677323704498121514916420000000000
1.10000000000000008881784197001252323389053300000000
But it is more complicated: C allows code to use higher precision for intermediate calculations depending on FLT_EVAL_METHOD. So on another machine, where FLT_EVAL_METHOD==1 (evaluate all FP to double), the compare test may pass.
Comparing for exact equality is rarely used in floating point code, aside from comparison to 0.0. More often code uses an ordered compare a < b. Comparing for approximate equality involves another parameter to control how near. @R.. has a good answer on that.

Because you are comparing two floating point values!
Floating point comparison is not exact because of rounding errors. Simple values like 1.1 or 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. For example:
double a = 0.15 + 0.15;
double b = 0.1 + 0.2;
if(a == b) // can be false!
if(a >= b) // can also be false!
Even
if(fabs(a-b) < 0.0001) // wrong - don't do this
This is a bad way to do it because a fixed epsilon (0.0001), chosen because it “looks small”, could actually be way too large when the numbers being compared are very small as well.
I personally use the following method (in C++); maybe it will help you:
#include <iostream> // std::cout
#include <cmath> // std::abs
#include <algorithm> // std::min
using namespace std;
#define MIN_NORMAL 1.17549435E-38f
#define MAX_VALUE 3.4028235E38f
bool nearlyEqual(float a, float b, float epsilon) {
float absA = std::abs(a);
float absB = std::abs(b);
float diff = std::abs(a - b);
if (a == b) {
return true;
} else if (a == 0 || b == 0 || diff < MIN_NORMAL) {
return diff < (epsilon * MIN_NORMAL);
} else {
return diff / std::min(absA + absB, MAX_VALUE) < epsilon;
}
}
This method passes tests for many important special cases, for different a, b and epsilon.
And don't forget to read What Every Computer Scientist Should Know About Floating-Point Arithmetic!

Float doesn't change when i add 0.1 to it

I am quite a newbie to C. While writing a small game demo, I ran into a really strange problem.
void testC()
{
float a = 825300160;
float b = a + 0.1;
assert(a != b);
}
The above assert statement fails. Very strange.
My environment is mac os ml. gcc 4.2.1

The fractional portion of a float consists of 23 bits. You need 30 bits to represent 825300160, so the less significant portion of the number is dropped. Adding .1 does not make a difference - you need to add roughly 32 for the number to change:
float a = 825300160;
float b = a + 31.5;
assert(a == b); // No change is detected
float c = a + 32;
assert(a != c); // Change is detected

There's not enough precision in the float type. If you really need to distinguish a 0.1 addition to a number as large as 825300160, use double.

As this site shows, both a and b would be represented as
0 10011100 10001001100010001010011
in the IEEE standard for floats, where the first bit is the sign, the next 8 are the exponent, and the remaining 23 the mantissa. There's just not enough space in those 23 bits to represent the difference, because the exponent is so large.
