This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 3 years ago.
I'm getting confused with floating-point numbers in C programming.
#include <stdio.h>

int main()
{
    float a = 255.167715;
    printf("%f", a);
    return 0;
}   // it prints 255.167709
Why does it produce a value like this? And honestly, can you tell me how this actually works in C programming?
In binary, 255.167715 is approximately 11111111.001010101110111101011110110010000000110001110…₂. In your C implementation, most likely, the source code 255.167715 is converted to 11111111.001010101110111101011110110010000000110001110₂, which is 255.16771499999998695784597657620906829833984375: 255.167715 is a double constant, that value is the closest one representable in your implementation's double type to the decimal number 255.167715, and the double type has only 53-bit significands. (A significand is the fraction portion of a floating-point number. There is also a sign and an exponent portion.)
Then, for float a = 255.167715;, this double value is converted to float. Since the float type has only 24-bit significands, the result is 11111111.0010101011101111₂, which is 255.1677093505859375 in decimal.
When you print this with the default formatting of %f, six digits after the decimal place are used, so it prints “255.167709”.
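To see both conversions described above, you can print the values with more digits than %f shows by default. This is just an illustrative sketch; the exact trailing digits may vary by platform, but on a typical IEEE 754 implementation the output matches the values quoted above.

#include <stdio.h>

int main(void)
{
    double d = 255.167715;   /* nearest double to the decimal constant */
    float  a = 255.167715f;  /* nearest float: fewer significand bits */

    printf("double: %.30f\n", d);  /* ~255.167714999999986957845976576209 */
    printf("float : %.30f\n", a);  /* ~255.167709350585937500000000000000 */
    return 0;
}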
The short answer is that there is no such single-precision floating-point number as 255.167715. This is true for any computer using IEEE 754 floating-point formats; the situation is not unique to C. As a single-precision floating-point number, the closest value is 255.167709.
Single-precision floating point gives you the equivalent of 7 or so decimal digits of precision. And as you can see, the input you gave and the output you got agreed to seven places, namely 255.1677. Any digits past that are essentially meaningless.
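The "7 or so decimal digits" figure can be checked directly from <float.h>. A minimal sketch, assuming a typical IEEE 754 implementation:

#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("FLT_DIG = %d\n", FLT_DIG);  /* usually 6 for binary32 floats */
    printf("DBL_DIG = %d\n", DBL_DIG);  /* usually 15 for binary64 doubles */
    return 0;
}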
For much more about the sometimes-surprising aspects of floating-point math, see "Is floating point math broken?".
This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Closed 3 years ago.
double a = 1E100;
printf("%f", a);
It seems to print a value other than 1 followed by 100 zeros:
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104
What might be the problem here?
The double type, assuming it uses the IEEE 754 double-precision floating-point representation, holds only about 16 decimal digits of precision.
So only roughly the first 16 digits of the printed number are meaningful; what is stored (and printed) is simply the closest value to 1E100 that a double can represent.
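One way to see this, assuming IEEE 754 binary64: print the same value with %f (all 101 digits, most of them representation artifacts) and with %.17g (just the digits that are actually meaningful). A sketch, not part of the original question:

#include <stdio.h>

int main(void)
{
    double a = 1E100;
    printf("%f\n", a);     /* the closest double to 1E100, written out in full */
    printf("%.17g\n", a);  /* 1e+100: only the meaningful digits */
    return 0;
}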
This question already has answers here:
Is floating point math broken?
(31 answers)
Floating point comparison [duplicate]
(5 answers)
Closed 5 years ago.
Consider the following code:
float x = 0.1;
if (x > 0.1) {
    printf("if");
}
When I check for equality, the usual explanation is that x is converted to double by padding zeros at the end, while the 0.1 on the right-hand side is already stored as a double, hence the inequality.
But by that logic, the "if" in the code above should be false, yet instead it is true.
Why?
(Restricting the answer to IEEE 754, although all floating-point schemes supported by C make the set of double values a superset of the set of float values.)
0.1 is 0.1000000000000000055511151231257827021181583404541015625
0.1f is 0.100000001490116119384765625.
0.100000001490116119384765625 is promoted to a double, but it is the same number: every float can be represented exactly as a double.
So (double)(0.1f) is bigger than 0.1, accounting for the result.
The key issue is that 0.1 can't be represented exactly in binary. Converting it to base 2 gives a number with a repetend in the fraction part: 0.000110011001100…₂, where the group 1001 repeats forever (just as, in decimal, 1/3 ends up with 3 repeating forever).
When storing 0.1 in a float, it ends up getting rounded up in the last digit *), therefore the float 0.1 is larger than the double 0.1
So, yes, converting the float to double just appends (binary) zeros, but the float was already a bit too large in the first place.
*) this actually depends on the representation of floating point numbers your implementation uses. With IEEE 754 floats, the representation would end with
... 10011001100
and because the next digit would be a 1, rounding is done upwards, so the final result ends with
... 10011001101
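A minimal check of this explanation, assuming IEEE 754: print both constants with extra digits and evaluate the comparison.

#include <stdio.h>

int main(void)
{
    float x = 0.1f;

    printf("float  0.1f = %.25f\n", x);    /* rounded up: slightly above 0.1 */
    printf("double 0.1  = %.25f\n", 0.1);  /* also above 0.1, but by far less */
    printf("x > 0.1 is %s\n", x > 0.1 ? "true" : "false");
    return 0;
}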
This question already has answers here:
Why is printf round floating point numbers up?
(3 answers)
Closed 6 years ago.
#include <stdio.h>

void main()
{
    float f = 2.5367;
    printf("%3.3f", f);
}
Output is: 2.537
But how? I think the output should be 2.536, but it is 2.537.
Because 2.5367 is closer to 2.537: the delta from 2.536 is 0.0007, while the delta from 2.537 is just 0.0003.
What you expected amounts to simply deleting the last character of the string representation, which is mathematically wrong.
In addition to existing answers, note that many C compilers try to follow IEEE 754 for floating-point matters. IEEE 754 recommends rounding according to the current rounding mode for conversions from binary floating-point to decimal. The default rounding mode is “round to nearest and ties to even”. Some compilation platforms do not take the rounding mode into account and always round according to the default nearest-even mode in conversions from floating-point to decimal.
Since float f = 2.5367 needs rounding up and there should be 3 digits after the decimal point, the value will be 2.537.
The documentation says:
The precision value specifies the number of digits after the decimal point. If a decimal point appears, at least one digit appears before it. The value is rounded to the appropriate number of digits.
That said, please use another book or resource for learning; whoever taught you that it's okay to use void main() is plainly wrong and should not be teaching C.
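To see the round-to-nearest behaviour in isolation, here is a small sketch (assuming the usual rounding mode); 2.5367 is closer to 2.537 than to 2.536, while a value such as 2.5363 rounds the other way:

#include <stdio.h>

int main(void)
{
    printf("%.3f\n", 2.5367);  /* prints 2.537: rounded to nearest, not truncated */
    printf("%.3f\n", 2.5363);  /* prints 2.536: the nearest value is below */
    return 0;
}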
This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
Today something strange happened to me: when I compile and execute this code, the output isn't what I expected. Here is the code, which simply adds floating-point values to an array of float and then prints them out.
The simple code:
#include <stdio.h>

int main() {
    float r[10];
    int z;
    int i = 34;
    for (z = 0; z < 10; z++) {
        i = z * z * z;
        r[z] = i;
        r[z] = r[z] + 0.634;
        printf("%f\n", r[z]);
    }
}
the output:
0.634000
1.634000
8.634000
27.634001
64.634003
125.634003
216.634003
343.634003
512.633972
729.633972
Note that from 27 onwards, digits appear after the .634 that should not be there. Does anyone know why this happens? Is it caused by floating-point approximation?
P.S. I have a 64-bit Debian Linux system.
Thanks all.
A number may be represented in the following form:
[sign] [mantissa] * 2^[exponent]
So there will be rounding (i.e. relative) errors whenever there is not enough space in memory to store the value exactly.
From wiki:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives from 6 to 9 significant decimal digits of precision (if a decimal string with at most 6 significant digits is converted to IEEE 754 single precision and then converted back to the same number of significant digits, the final string should match the original; and if an IEEE 754 single-precision value is converted to a decimal string with at least 9 significant digits and then converted back to single precision, the final number must match the original).
Edit (Edward's comment): Larger (more bits) floating point representations allow for greater precision.
Yes, this is a floating-point approximation error, or round-off error. Floating-point representation quantizes a large range of numbers: it can represent only discrete steps and rounds every in-between number to the nearest step. This causes an error whenever the desired number is not exactly one of those steps.
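To tie both points together, here is a hedged sketch (assuming IEEE 754 binary32 and C99 <math.h>; link with -lm on some systems): frexpf splits a float into its mantissa and power-of-two exponent, and nextafterf shows the size of the "steps" between neighbouring representable floats near 729.

#include <math.h>
#include <stdio.h>

int main(void)
{
    float v = 729.634f;
    int e;
    float mant = frexpf(v, &e);           /* v == mant * 2^e, with 0.5 <= mant < 1 */
    float next = nextafterf(v, 1000.0f);  /* the next representable float above v */

    printf("v    = %.10f = %.10f * 2^%d\n", v, mant, e);
    printf("step = %.10f\n", next - v);   /* ~0.000061: the quantization step here */
    return 0;
}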
In addition to the other useful answers, it can be illustrative to print more digits than the default:
#include <stdio.h>

int main() {
    float r[10];
    int z;
    int i = 34;
    for (z = 0; z < 10; z++) {
        i = z * z * z;
        r[z] = i;
        r[z] = r[z] + 0.634;
        printf("%.30f\n", r[z]);
    }
}
gives
0.634000003337860107421875000000
1.633999943733215332031250000000
8.633999824523925781250000000000
27.634000778198242187500000000000
64.634002685546875000000000000000
125.634002685546875000000000000000
216.634002685546875000000000000000
343.634002685546875000000000000000
512.633972167968750000000000000000
729.633972167968750000000000000000
In particular, note that 0.634 isn't actually "0.634", but instead is the closest number representable by a float.
"float" has only about six digit precision, so it isn't unexpected that you get errors that large.
If you used "double", you would have about 15 digits precision. You would have an error, but you would get for example 125.634000000000003 and not 125.634003.
So you will always get rounding errors and your results will not be quite what you expect, but by using double the effect will be minimal. Warning: If you do things like adding 125 + 0.634 and then subtract 125, the result will (most likely) not be 0.634. No matter whether you use float or double. But with double, the result will be very, very close to 0.634.
In principle, given the choice of float and double, you should never use float, unless you have a very, very good reason.
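A minimal illustration of the 125 + 0.634 example mentioned above, assuming IEEE 754 float and double; neither result is exactly 0.634, but the double error is far smaller:

#include <stdio.h>

int main(void)
{
    float  f = (125.0f + 0.634f) - 125.0f;
    double d = (125.0 + 0.634) - 125.0;

    printf("float : %.17f\n", f);  /* noticeably different from 0.634 */
    printf("double: %.17f\n", d);  /* very close to 0.634, though possibly not bit-exact */
    return 0;
}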
This question already has answers here:
Is floating point math broken?
(31 answers)
Identical float values comparing as inequal [duplicate]
(4 answers)
Closed 3 years ago.
The program below outputs This No. is not same. Why does it do this when both numbers are the same?
#include <stdio.h>

void main() {
    float f = 2.7;
    if (f == 2.7) {
        printf("This No. is same");
    } else {
        printf("This No. is not same");
    }
}
Why does it do this when both numbers are the same?
The numbers are not the same value.
double can typically represent exactly about 2^64 different values.
float can typically represent exactly about 2^32 different values.
2.7 is not one of those values: some approximation is made, given the binary nature of floating-point encoding versus the decimal text 2.7.
The compiler converts 2.7 to the nearest representable double or
2.70000000000000017763568394002504646778106689453125
given the typical binary64 representation of double.
The next best double is
2.699999999999999733546474089962430298328399658203125.
Knowing the exact value beyond 17 significant digits has reduced usefulness.
When the value is assigned to a float, it becomes the nearest representable float or
2.7000000476837158203125.
2.70000000000000017763568394002504646778106689453125 does not equal 2.7000000476837158203125.
Should a compiler use a float/double that represents 2.7 exactly, as with decimal32/decimal64, the code would work as OP expected. A double representation using an underlying decimal format is rare, though. Far more often the underlying double representation is base 2, and these conversion artifacts need to be considered when programming.
Had the code been float f = 2.5;, then with either a binary or a decimal underlying format the float and double values would have made if (f == 2.5) true: the same value is representable exactly both as a high-precision double and as a low-precision float.
(Assuming binary32/binary64 floating point)
double has 53 significand bits and float has 24. The key point is that if a number, as a double, has its least significant (53 - 24 = 29) significand bits set to 0, then it has the same numeric value when stored as a float or as a double. Numbers like 1, 2.5 and 2.7000000476837158203125 fulfill that. (Range, subnormal, and NaN issues are ignored here.)
This is a reason why exact floating point comparisons are typically only done in select situations.
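A short sketch that reproduces the values quoted above (assuming binary32/binary64): print the double constant 2.7 and the float 2.7f with extra digits, then evaluate the comparison.

#include <stdio.h>

int main(void)
{
    float f = 2.7f;

    printf("double 2.7 : %.25f\n", 2.7);  /* the nearest double: slightly above 2.7 */
    printf("float  2.7f: %.25f\n", f);    /* the nearest float: farther above 2.7 */
    printf("f == 2.7 is %s\n", f == 2.7 ? "true" : "false");
    return 0;
}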
Check this program:
#include <stdio.h>

int main()
{
    float x = 0.1;
    printf("%zu %zu %zu\n", sizeof(x), sizeof(0.1), sizeof(0.1f));
    return 0;
}
Output is 4 8 4.
Floating-point constants in an expression are treated as double (double-precision floating-point format) unless an f suffix is appended. So an expression like x == 0.1 has a double on the right side and a float, stored in the single-precision floating-point format, on the left side.
In such situations the float is promoted to double.
The double-precision format uses more bits for precision than the single-precision format.
In your case, add the f suffix (2.7f) to get the expected result.
The literal 2.7 in f == 2.7 is a double constant; that's why it is not equal to f.
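As a hedged sketch of the two usual ways to get the behaviour the question expected: compare against a float constant (2.7f), or compare within a small tolerance rather than for exact equality (the tolerance 1e-6 here is just an illustrative choice).

#include <math.h>
#include <stdio.h>

int main(void)
{
    float f = 2.7f;

    if (f == 2.7f)              /* same float constant, so the comparison is exact */
        printf("equal to 2.7f\n");

    if (fabs(f - 2.7) < 1e-6)   /* tolerance-based comparison */
        printf("within 1e-6 of 2.7\n");

    return 0;
}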