I am trying to understand what is the difference between the following:
printf("%f",4.567f);
printf("%f",4.567);
How does using the f suffix change/influence the output?
How using the 'f' changes/influences the output?
The f at the end of a floating point constant determines the type and can affect the value.
4.567 is floating point constant of type and precision of double. A double can represent exactly typical about 264 different values. 4.567 is not one on them*1. The closest alternative typically is exactly
4.56700000000000017053025658242404460906982421875 // best
4.56699999999999928235183688229881227016448974609375 // next best double
4.567f is floating point constant of type and precision of float. A float can represent exactly typical about 232 different values. 4.567 is not one on them. The closest alternative typically is exactly
4.566999912261962890625 // best
4.56700038909912109375 // next best float
When passed to printf() as part of the ... augments, a float is converted to double with the same value.
So the question becomes what is the expected difference in printing?
printf("%f",4.56700000000000017053025658242404460906982421875);
printf("%f",4.566999912261962890625);
Since the default number of digits after the decimal point to print for "%f" is 6, the output for both rounds to:
4.567000
To see a difference, print with more precision or try 4.567e10, 4.567e10f.
45670000000.000000 // double
45669998592.000000 // float
Your output may slightly differ to to quality of implementation issues.
*1 C supports many floating point encodings. A common one is binary64. Thus typical floating-point values are encoded as an sign * binary fraction * 2exponent. Even simple decimal values like 0.1 can not be represented exactly as such.
Related
My Matlab script reads a string value "0.001044397222448" from a file, and after parsing the file, this value printed in the console shows as double precision:
value_double =
0.001044397222448
After I convert this number to singe using value_float = single(value_double), the value shows as:
value_float =
0.0010444
What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?
My problem is that later on, after I compare this with analogous C code, I get differences. In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000. So the C code keeps good precision. But, does Matlab?
The number 0.001044397222448 (like the vast majority of decimal fractions) cannot be exactly represented in binary floating point.
As a single-precision float, it's most closely represented as (hex) 0x0.88e428 × 2-9, which in decimal is 0.001044397242367267608642578125.
In double precision, it's most closely represented as 0x0.88e427d4327300 × 2-9, which in decimal is 0.001044397222447999984407118745366460643708705902099609375.
Those are what the numbers are, internally, in both C and Matlab.
Everything else you see is an artifact of how the numbers are printed back out, possibly rounded and/or truncated.
When I said that the single-precision representation "in decimal is 0.001044397242367267608642578125", that's mildly misleading, because it makes it look like there are 28 or more digits' worth of precision. Most of those digits, however, are an artifact of the conversion from base 2 back to base 10. As other answers have noted, single-precision floating point actually gives you only about 7 decimal digits of precision, as you can see if you notice where the single- and double-precision equivalents start to diverge:
0.001044397242367267608642578125
0.001044397222447999984407118745366460643708705902099609375
^
difference
Similarly, double precision gives you roughly 16 decimal digits worth of precision, as you can see if you compare the results of converting a few previous and next mantissa values:
0x0.88e427d43272f8 0.00104439722244799976756668424826557384221814572811126708984375
0x0.88e427d4327300 0.001044397222447999984407118745366460643708705902099609375
0x0.88e427d4327308 0.00104439722244800020124755324246734744519926607608795166015625
0x0.88e427d4327310 0.0010443972224480004180879877395682342466898262500762939453125
^
changes
This also demonstrates why you can never exactly represent your original value 0.001044397222448 in binary. If you're using double, you can have 0.00104439722244799998, or you can have 0.0010443972224480002, but you can't have anything in between. (You'd get a little less close with float, and you could get considerably closer with long double, but you'll never get your exact value.)
In C, and whether you're using float or double, you can ask for as little or as much precision as you want when printing things with %f, and under a high-quality implementation you'll always get properly-rounded results. (Of course the results you get will always be the result of rounding the actual, internal value, not necessarily the decimal value you started with.) For example, if I run this code:
printf("%.5f\n", 0.001044397222448);
printf("%.10f\n", 0.001044397222448);
printf("%.15f\n", 0.001044397222448);
printf("%.20f\n", 0.001044397222448);
printf("%.30f\n", 0.001044397222448);
printf("%.40f\n", 0.001044397222448);
printf("%.50f\n", 0.001044397222448);
printf("%.60f\n", 0.001044397222448);
printf("%.70f\n", 0.001044397222448);
I see these results, which as you can see match the analysis above.
(Note that this particular example is using double, not float.)
0.00104
0.0010443972
0.001044397222448
0.00104439722244799998
0.001044397222447999984407118745
0.0010443972224479999844071187453664606437
0.00104439722244799998440711874536646064370870590210
0.001044397222447999984407118745366460643708705902099609375000
0.0010443972224479999844071187453664606437087059020996093750000000000000
I'm not sure how Matlab prints things.
In answer to your specific questions:
What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?
As a float, it is really "truncated" to a number which, converted back to decimal, is exactly 0.001044397242367267608642578125. But as we've seen, most of those digits are essentially meaningless, and the result can more properly thought of as being about 0.0010443972.
In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000
So C got the same answer I did -- but, again, most of those digits are not meaningful.
So the C code keeps good precision. But, does Matlab?
I'd be willing to bet that Matlab keeps the same internal precision for ordinary floats and doubles.
MATLAB uses IEEE-754 binary64 for its double-precision type and binary32 for single-precision. When 0.001044397222448 is rounded to the nearest value representable in binary64, the result is 4816432068447840•2−62 = 0.001044397222447999984407118745366460643708705902099609375.
When that is rounded to the nearest value representable in binary32, the result is 8971304•2−33 = 0.001044397242367267608642578125.
Various software (C, Matlab, others) displays floating-point numbers in diverse ways, with more or fewer digits. The above values are the exact numbers represented by the floating-point data, per the IEEE 754 specification, and they are the values the data has when used in arithmetic operations.
All single precisions should be the same
So here is the thing. According to documentation, both matlab and C comply with the IEEE 754 standard. Which means that there should not be any difference between what is actually stored in memory.
You could compute the binary representation by hand but according to this(thanks #Danijel) handy website, the representation of 0.001044397222448 should be 0x3a88e428.
The question is how precise is your representation? It is a bit tricky with floating point but the short answer is your number is accurate up to the 9th decimal and has decimal represented up to the 33rd decimal. If you want the long answer see the tow paragraphs at the end of this post.
A display issue
The fact that you are not seeing the same thing when you print does not mean that you don't have the same bits in memory (and you should have the exact same bytes in memory in C and MATLAB). The only reason you see a difference on your display is because the print functions truncate your number. If you print the 33 decimals in each language you should not have any difference.
To do so in matlab use: fprintf('%.33f', value_float);
To do so in c use printf('%.33f\n', gf);
About floating point precision
Now in more details, the question was: how precise is this representation? Well the tricky thing with floating point is that the precision of the representation depends on what number you are representing. The representation is over 32 bits and is divide with 1 bit for the sign, 8 for the exponent and 23 for the fraction.
The number can be computed as sign * 2^(exponent-127) * 1.fraction. This basically means that the maximal error/precision (depending on how you want to call it) is basically 2^(exponent-127-23), the 23 is here to represent the 23 bytes of the fraction. (There are a few edge cases, I won't elaborate on it). In our case the exponent is 117, which means your precision is 2^(117-127-23) = 1.16415321826934814453125e-10. That means that your single precision float should represent your number accurately up to the 9th decimal, after that it is up to luck.
Further details
I know this is a rather short explanation. For more details, this post explains the floating point imprecision more precisely and this website gives you some useful info and allows you to play visually with the representation.
I want to be able print out a double precision number s.t it is identical to the original decimal number in C language.
char buf[1000];
double d = ______;
For a double, I would do
snprintf(buf, sizeof(buf), "%.17g", .1);
But it will print out on doing : printf("%s", buf);
0.10000000000000001
There is a last 1 as a rounding error
It prints out the closest decimal representation double precision value if you try to represent .1 . Casting .1 to double, you get a rounding error. Casting back to decimal, you will get another rounding error.
Is there a way to fix this?
The "standard" solution for that isn't pretty: use hexadecimal notation for all your floating point constants instead of decimal. They look like 0x1.7aP-13 something that is hard to digest for humans, unfortunately. But with them you have the guarantee that writing out and reading back in gives you exactly the same value.
The tools from the standard C library strtod, printf (with format %a) and similar support this if you have a C library that implements at least C99. One minor point that you must be careful about when using that is that the decimal point of printf is locale dependent. So if your locale has for example a , for that you must be sure to write and read with the same locale.
In general what you are attempting to do is impossible. Floating point values use a binary representation and many values with terminating decimal representation do not have exact binary representations.
Fundamentally, if you wish to preserve the decimal representation then you should use a data type that represents that value as a decimal.
It might be worth pointing out the method that Python adopted for displaying floating point values. Python displays the shortest decimal representation whose nearest floating point value is the value being displayed.
Using 0.1 as an example, that value cannot be represented exactly in binary floating point. So the closest floating point value is used. When you ask for a decimal representation of that binary value, the Python runtime determines that 0.1 is the shortest representation such that float('0.1') == value.
If that approach would be of use to you then you should study this: https://bugs.python.org/issue1580
I have been asked a very simple question in the book to write the output of the following program -
#include<stdio.h>
int main()
{
float i=1.1;
while(i==1.1)
{
printf("%f\n",i);
i=i-0.1;
}
return 0;
}
Now I already read that I can use floating point numbers as loop counters but are not advisable which I learned. Now when I run this program inside the gcc, I get no output even though the logic is completely correct and according to which the value of I should be printed once. I tried printing the value of i and it gave me a result of 1.100000 . So I do not understand why the value is not being printed?
In most C implementations, using IEEE-754 binary floating-point, what happens in your program is:
The source text 1.1 is converted to a double. Since binary floating-point does not represent this value exactly, the result is the nearest representable value, 1.100000000000000088817841970012523233890533447265625.
The definition float i=1.1; converts the value to float. Since float has less precision than double, the result is 1.10000002384185791015625.
In the comparison i==1.1, the float 1.10000002384185791015625 is converted to double (which does not change its value) and compared to 1.100000000000000088817841970012523233890533447265625. Since they are unequal, the result is false.
The quantity 11/10 cannot be represented exactly in binary floating-point, and it has different approximations as double and as float.
The constant 1.1 in the source code is the double approximation of 11/10. Since i is of type float, it ends up containing the float approximation of 1.1.
Write while (i==1.1f) or declare i as double and your program will work.
Comparing floating point numbers:1
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precision, so results will differ depending on the details of your environment. If you do a calculation and then compare the results against some expected value it is highly unlikely that you will get exactly the result you intended.
In other words, if you do a calculation and then do this comparison:
if (result == expectedResult)
then it is unlikely that the comparison will be true. If the comparison is true then it is probably unstable – tiny changes in the input values, compiler, or CPU may change the result and make the comparison be false.
In short:
1.1 can't be represented exactly in binary floating pint number. This is like the decimal representation of 10/3 in decimal which is 3.333333333..........
I would suggest you to Read the article What Every Computer Scientist Should Know About Floating-Point Arithmetic.
1. For the experts who are encouraging beginner programmers to use == in floating point comparision
It is because i is not quite exactly 1.1.
If you are going to test a floating point, you should do something along the lines of while(i-1.1 < SOME_DELTA) where delta is the threshold where equality is good enough.
Read: https://softwareengineering.stackexchange.com/questions/101163/what-causes-floating-point-rounding-errors
This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 9 years ago.
float a;
a=8.3;
if(a==8.3)
printf("1");
else
printf("2");
giving a as 8.3 and 8.4 respectively and comparing with 8.3 and 8.4 correspondingly , output becomes 2 but when comparing with 8.5 output is 1. I found that it is related to concept of recurring binary which takes 8 bytes. I want to know how to find which number is recurring binary. kindly give some input.
Recurring numbers are not representable, hence floating point comparison will not work.
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Also as in the 2nd comment - floating point literals 8.3 has type double and a has type float.
Comparing with epsilon – absolute error
Since floating point calculations involve a bit of uncertainty we can try to allow for this by seeing if two numbers are ‘close’ to each other. If you decide – based on error analysis, testing, or a wild guess – that the result should always be within 0.00001 of the expected result then you can change your comparison to this:
if (fabs(result - expectedResult) < 0.00001)
For example, 3/7 is a repeating binary fraction, its computed value in double precision is different from its stored value in single precision. Thus the comparison 3/7 with its stored computed value fails.
For more please read - What Every Computer Scientist Should Know About Floating-Point Arithmetic
You should not compare floating point numbers for equality using ==. Because of how floating point numbers are actually stored in memory it will give inaccurate results.
Use something like this to determine if your number a is close enough to the desired value:
if(fabs(a-8.3) < 0.0000005))
There are two problems here.
First is that floating point literals like 8.3 have type double, while a has type float. Doubles and floats store values to different precisions, and for values that don't have an exact floating point representation (such as 8.3), the stored values are slightly different. Thus, the comparison fails.
You could fix this by writing the comparison as a==8.3f; the f suffix forces the literal to be a float instead of a double.
However, it's bad juju to compare floating point values directly; again, most values cannot be represented exactly, but only to an approximation. If a were the result of an expression involving multiple floating-point calcuations, it may not be equivalent to 8.3f. Ideally, you should look at the difference between the two values, and if it's less than some threshold, then they are effectively equivalent:
if ( fabs( a - 8.3f) < EPSILON )
{
// a is "equal enough" to 8.3
}
The exact value of EPSILON depends on a number of factors, not least of which is the magnitude of the values being compared. You only have so many digits of precision, so if the values you're trying to compare are greater than 999999.0, then you can't test for differences within 0.000001 of each other.
#include <stdio.h>
int main( )
{
float a=1.0;
long i;
for(i=0; i<100; i++)
{
a = a - 0.01;
}
printf("%e\n",a);
}
Result is: 6.59e-07
It's a binary floating point number, not a decimal one - therefore you need to expect rounding errors. See the Basic section in this article:
What Every Programmer Should Know About Floating-Point Arithmetic
For example, the value 0.01 does not have a precise represenation in binary floating point type. To get a "correct" result in your sample you would have to either round or use a a decimal floating point type (see Wikipedia):
Binary fixed-point types are most commonly used, because the rescaling operations can be implemented as fast bit shifts. Binary fixed-point numbers can represent fractional powers of two exactly, but, like binary floating-point numbers, cannot exactly represent fractional powers of ten. If exact fractional powers of ten are desired, then a decimal format should be used. For example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately by binary fixed-point or binary floating-point representations, while they can be represented exactly in decimal fixed-point or decimal floating-point representations. These representations may be encoded in many ways, including BCD.
There are two questions here. If you're asking, why is my printf statement displaying the result as 6.59e-07 instead of 0.000000659, it's because you've used the format specifier for Scientific Notation: %e. You want %f for the floating point a.
printf("%f\n",a);
If you're asking why the result is not exactly zero rather than 0.000000659, the answer is (as others have pointed out) that with floating point arithmetic using binary numbers you need to expect rounding.
You have to specify %f for printing the float number then it will print 0 for variable a.
That's floating point numbers rounding errors on the scene. Each time you subtract a fraction you get approximately the result you'd normally expect from a number on paper and so the final result is very close to zero, but not necessarily precise zero.
The precision with floating numbers isn't accurate, that's why you find this result.
Cordially