This question already has answers here:
Is floating point math broken?
(31 answers)
Floating point comparison [duplicate]
(5 answers)
Closed 5 years ago.
Consider the following code:
float x = 0.1;
if ( x> 0.1){
printf("if");
}
Now when I check equality, it is explained that x is being converted to double by padding 0s at the end, but 0.1 on RHS is stored in double, hence the inequality.
But if I go through this logic, the "if" in the above code should have given false, but istead it is true.
Why?
(Restricting the answer to IEEE754 although all floating point schemes supported by C mandate that the double set is a superset of the float set.)
0.1 is 0.1000000000000000055511151231257827021181583404541015625
0.1f is 0.100000001490116119384765625.
0.100000001490116119384765625 is promoted to a double, but it's the same number as all floats can be represented as doubles exactly.
So (double)(0.1f) is bigger than 0.1, accounting for the result.
The key issue is that 0.1 can't be represented exactly in binary. Converting this to base 2, you get a number with a repetend in the fraction part, 1001 is repeating forever (just like in decimal, 1/3 ends up with 3 repeating forever).
When storing 0.1 in a float, it ends up getting rounded up in the last digit *), therefore the float 0.1 is larger than the double 0.1
So, yes, converting the float to double just appends (binary) zeros, but the float was already a bit too large in the first place.
*) this actually depends on the representation of floating point numbers your implementation uses. With IEEE 754 floats, the representation would end with
... 10011001100
and because the next digit would be a 1, rounding is done upwards, so the final result ends with
... 10011001101
Related
This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 3 years ago.
I'm getting confused with floating point number in c programming
#include <stdio.h>
int main()
{
float a = 255.167715;
printf("%f", a);
return 0;
} // it print value of 255.167709
Why it produce value like so? To be honest can you tell how it actually works in C programming?
In binary, 255.167715 is approximately 11111111.001010101110111101011110110010000000110001110…2. In your C implementation, most likely, the source code 255.167715 is converted to 11111111.0010101011101111010111101100100000001100011102, which is 255.16771499999998695784597657620906829833984375, because 255.167715 is a double constant, and that is the closest value representable in your implementation’s double type to the decimal number 255.167715, because the double type has only 53-bit significands. (A significand is the fraction portion of a floating-point number. There is also a sign and an exponent portion.)
Then, for float a = 255.167715;, this double value is converted to float. Since the float type has only 24-bit significands, the result is 11111111.00101010111011112, which is 255.1677093505859375 in decimal.
When you print this with the default formatting of %f, six digits after the decimal place are used, so it prints “255.167709”.
The short answer is that there is no such single-precision floating-point number as 255.167715. This is true for any computer using IEEE 754 floating-point formats; the situation is not unique to C. As a single-precision floating-point number, the closest value is 255.167709.
Single-precision floating point gives you the equivalent of 7 or so decimal digits of precision. And as you can see, the input you gave and the output you got agreed to seven places, namely 255.1677. Any digits past that are essentially meaningless.
For much more about the sometimes-surprising aspects of floating-point math, see "Is floating point math broken?".
This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 6 years ago.
float f=2.2;
if (f==2.2)
printf("abc");
else
printf("xyz");
This code prints xyz,while if we give 2.5 instead of 2.2 the output is abc.
2.2 is a double that is not exactly representable in binary floating point. In effect, you are comparing the double 2.2 to the result of converting it to float, with rounding, and then back to double.
Assuming double is IEEE 754 64-bit binary floating point, and float is IEEE 754 32-bit binary floating point, the closest double to decimal 2.2 has exact value 2.20000000000000017763568394002504646778106689453125, and the result of converting it to float is 2.2000000476837158203125. Converting the float back to double does not change its value. 2.20000000000000017763568394002504646778106689453125 is not equal to 2.2000000476837158203125.
2.5 is exactly representable in both float and double in binary floating point systems, so neither the conversion to float nor the conversion back to double changes the value. 2.5 is equal to 2.5.
Floating-point numbers are likely to contain round-off error
Ref: http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)FloatingPoint.html
Going through the documentation,
I see the following. (Point 6 in the URL above)
In general, floating-point numbers are not exact: they are likely to contain round-off error because of the truncation of the mantissa to a fixed number of bits.
The easiest way to avoid accumulating error is to use high-precision floating-point numbers (this means using double instead of float). On modern CPUs there is little or no time penalty for doing so....
One consequence of round-off error is that it is very difficult to test floating-point numbers for equality, unless you are sure you have an exact value as described above. It is generally not the case, for example, that (0.1+0.1+0.1) == 0.3 in C. This can produce odd results if you try writing something like for(f = 0.0; f <= 0.3; f += 0.1): it will be hard to predict in advance whether the loop body will be executed with f = 0.3 or not. (Even more hilarity ensues if you write for(f = 0.0; f != 0.3; f += 0.1), which after not quite hitting 0.3 exactly keeps looping for much longer than I am willing to wait to see it stop, but which I suspect will eventually converge to some constant value of f large enough that adding 0.1 to it has no effect.)
This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
today happened to me a strange thing, when I try to compile and execute the output of this code isn't what I expected. Here is the code that simply add floating values to an array of float and then print it out.
The simple code:
int main(){
float r[10];
int z;
int i=34;
for(z=0;z<10;z++){
i=z*z*z;
r[z]=i;
r[z]=r[z]+0.634;
printf("%f\n",r[z]);
}
}
the output:
0.634000
1.634000
8.634000
27.634001
64.634003
125.634003
216.634003
343.634003
512.633972
729.633972
note that from the 27 appears numbers after the .634 that should not be there. Anyone know why this happened? It's an event caused by floating point approximation?..
P.S I have a linux debian system, 64 bit
thanks all
A number maybe represented in the following form:
[sign] [mantissa] * 2[exponent]
So there will be rounding or relative errors when the space is less in memory.
From wiki:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives from 6 to 9 significant decimal digits precision (if a
decimal string with at most 6 significant decimal is converted to IEEE
754 single precision and then converted back to the same number of
significant decimal, then the final string should match the original;
and if an IEEE 754 single precision is converted to a decimal string
with at least 9 significant decimal and then converted back to single,
then the final number must match the original [4]).
Edit (Edward's comment): Larger (more bits) floating point representations allow for greater precision.
Yes, this is a floating point approximation error or Round-off error. Floating point numbers representation uses quantization to represent a large range of numbers, so it only represent steps and round all the in-between numbers to the nearest step. This cause error if the wanted number is not one of these steps.
In addition to the other useful answers, it can be illustrative to print more digits than the default:
int main(){
float r[10];
int z;
int i=34;
for(z=0;z<10;z++){
i=z*z*z;
r[z]=i;
r[z]=r[z]+0.634;
printf("%.30f\n",r[z]);
}
}
gives
0.634000003337860107421875000000
1.633999943733215332031250000000
8.633999824523925781250000000000
27.634000778198242187500000000000
64.634002685546875000000000000000
125.634002685546875000000000000000
216.634002685546875000000000000000
343.634002685546875000000000000000
512.633972167968750000000000000000
729.633972167968750000000000000000
In particular, note that 0.634 isn't actually "0.634", but instead is the closest number representable by a float.
"float" has only about six digit precision, so it isn't unexpected that you get errors that large.
If you used "double", you would have about 15 digits precision. You would have an error, but you would get for example 125.634000000000003 and not 125.634003.
So you will always get rounding errors and your results will not be quite what you expect, but by using double the effect will be minimal. Warning: If you do things like adding 125 + 0.634 and then subtract 125, the result will (most likely) not be 0.634. No matter whether you use float or double. But with double, the result will be very, very close to 0.634.
In principle, given the choice of float and double, you should never use float, unless you have a very, very good reason.
This question already has answers here:
Is floating point math broken?
(31 answers)
Identical float values comparing as inequal [duplicate]
(4 answers)
Closed 3 years ago.
The program below outputs This No. is not same. Why does it do this when both numbers are the same?
void main() {
float f = 2.7;
if(f == 2.7) {
printf("This No. is same");
} else {
printf("This No. is not same");
}
}
Why does it do this when both numbers are the same?
The numbers are not the same value.
double can represent exactly typically about 264 different values.
float can represent exactly typically about 232 different values.
2.7 is not one of the values - some approximations are made given the binary nature of floating-point encoding versus the decimal text 2.7
The compiler converts 2.7 to the nearest representable double or
2.70000000000000017763568394002504646778106689453125
given the typical binary64 representation of double.
The next best double is
2.699999999999999733546474089962430298328399658203125.
Knowing the exact value beyond 17 significant digits has reduced usefulness.
When the value is assigned to a float, it becomes the nearest representable float or
2.7000000476837158203125.
2.70000000000000017763568394002504646778106689453125 does not equal 2.7000000476837158203125.
Should a compiler use a float/double which represents 2.7 exactly like with decimal32/decimal64, code would work as OP expected. double representation using an underlying decimal format is rare. Far more often the underlying double representation is base 2 and these conversion artifacts need to be considered when programming.
Had the code been float f = 2.5;, the float and double value, using a binary or decimal underlying format, would have made if (f == 2.5) true. The same value as a high precision double is representable exactly as a low precision float.
(Assuming binary32/binary64 floating point)
double has 53 bits of significance and float has 24. The key is that if the number as a double has its least significant (53-24) bits set to 0, when converted to float, it will have the same numeric value as a float or double. Numbers like 1, 2.5 and 2.7000000476837158203125 fulfill that. (range, sub-normal, and NaN issues ignored here.)
This is a reason why exact floating point comparisons are typically only done in select situations.
Check this program:
#include<stdio.h>
int main()
{
float x = 0.1;
printf("%zu %zu %zu\n", sizeof(x), sizeof(0.1), sizeof(0.1f));
return 0;
}
Output is 4 8 4.
The values used in an expression are considered as double (double precision floating point format) unless a f is specified at the end. So the expression “x==0.1″ has a double on right side and float which are stored in a single precision floating point format on left side.
In such situations float is promoted to double .
The double precision format uses uses more bits for precision than single precision format.
In your case add 2.7f to get expected result.
The literal 2.7 in f == 2.7 is converted to double that's why 2.7 is not equal to f.
void main()
{
float a = 0.7;
if (a < 0.7)
printf("c");
else
printf("c++");
}
In the above question for 0.7, "c" will be printed, but for 0.8, "c++" wil be printed. Why?
And how is any float represented in binary form?
At some places, it is mentioned that internally 0.7 will be stored as 0.699997, but 0.8 as 0.8000011. Why so?
basically with float you get 32 bits that encode
VALUE = SIGN * MANTISSA * 2 ^ (128 - EXPONENT)
32-bits = 1-bit 23-bits 8-bits
and that is stored as
MSB LSB
[SIGN][EXPONENT][MANTISSA]
since you only get 23 bits, that's the amount of "precision" you can store. If you are trying to represent a fraction that is irrational (or repeating) in base 2, the sequence of bits will be "rounded off" at the 23rd bit.
0.7 base 10 is 7 / 10 which in binary is 0b111 / 0b1010 you get:
0.1011001100110011001100110011001100110011001100110011... etc
Since this repeats, in fixed precision there is no way to exactly represent it. The
same goes for 0.8 which in binary is:
0.1100110011001100110011001100110011001100110011001101... etc
To see what the fixed precision value of these numbers is you have to "cut them off" at the number of bits you and do the math. The only trick is you the leading 1 is implied and not stored so you technically get an extra bit of precision. Because of rounding, the last bit will be a 1 or a 0 depending on the value of the truncated bit.
So the value of 0.7 is effectively 11,744,051 / 2^24 (no rounding effect) = 0.699999988 and the value of 0.8 is effectively 13,421,773 / 2^24 (rounded up) = 0.800000012.
That's all there is to it :)
A good reference for this is What Every Computer Scientist Should Know About Floating-Point Arithmetic. You can use higher precision types (e.g. double) or a Binary Coded Decimal (BCD) library to achieve better floating point precision if you need it.
The internal representation is IEE754.
You can also use this calculator to convert decimal to float, I hope this helps to understand the format.
floats will be stored as described in IEEE 754: 1 bit for sign, 8 for a biased exponent, and the rest storing the fractional part.
Think of numbers representable as floats as points on the number line, some distance apart; frequently, decimal fractions will fall in between these points, and the nearest representation will be used; this leads to the counterintuitive results you describe.
"What every computer scientist should know about floating point arithmetic" should answer all your questions in detail.
If you want to know how float/double is presented in C(and almost all languages), please refert to Standard for Floating-Point Arithmetic (IEEE 754) http://en.wikipedia.org/wiki/IEEE_754-2008
Using single-precision floats as an example, here is the bit layout:
seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm meaning
31 0 bit #
s = sign bit, e = exponent, m = mantissa
Another good resource to see how floating point numbers are stored as binary in computers is Wikipedia's page on IEEE-754.
Floating point numbers in C/C++ are represented in IEEE-754 standard format. There are many articles on the internet, that describe in much better detail than I can here, how exactly a floating point is represented in binary. A simple search for IEEE-754 should illuminate the mystery.
0.7 is a numeric literal; it's value is the mathematical real number 0.7, rounded to the nearest double value.
After initialising float a = 0.7, the value of a is 0.7 rounded to float, that is the real number 0.7, rounded to the nearest double value, rounded to the nearest float value. Except by a huge coincidence, you wouldn't expect a to be equal to 0.7.
"if (a < 0.7)" compares 0.7 rounded to double then to float with the number 0.7 rounded to double. It seems that in the case of 0.7, the rounding produced a smaller number. And in the same experiment with 0.8, rounding 0.8 to float will produce a larger number than 0.8.
Floating point comparisons are not reliable, whatever you do. You should use threshold tolerant comparison/ epsilon comparison of floating points.
Try IEEE-754 Floating-Point Conversion and see what you get. :)