Printing out double precision numbers identical to original decimal number


I want to print a double-precision number in C so that the output is identical to the original decimal number.
char buf[1000];
double d = ______;
For a double, I would do
snprintf(buf, sizeof(buf), "%.17g", .1);
But printing the buffer with printf("%s", buf); gives
0.10000000000000001
The trailing 1 is a rounding error.
.1 cannot be represented exactly as a double, so converting it to double introduces a rounding error. "%.17g" then prints the closest 17-digit decimal representation of that double, and converting back to decimal rounds again.
Is there a way to fix this?

The "standard" solution for that isn't pretty: use hexadecimal notation for all your floating point constants instead of decimal. They look like 0x1.7aP-13, which is unfortunately hard for humans to digest. But with them you have the guarantee that writing out and reading back in gives you exactly the same value.
The standard C library tools strtod, printf (with format %a) and similar support this if your C library implements at least C99. One minor point to be careful about: the decimal point character used by printf is locale dependent. So if your locale uses, for example, a comma for it, you must be sure to write and read with the same locale.

In general what you are attempting to do is impossible. Floating point values use a binary representation and many values with terminating decimal representation do not have exact binary representations.
Fundamentally, if you wish to preserve the decimal representation then you should use a data type that represents that value as a decimal.
It might be worth pointing out the method that Python adopted for displaying floating point values. Python displays the shortest decimal representation whose nearest floating point value is the value being displayed.
Using 0.1 as an example, that value cannot be represented exactly in binary floating point. So the closest floating point value is used. When you ask for a decimal representation of that binary value, the Python runtime determines that 0.1 is the shortest representation such that float('0.1') == value.
If that approach would be of use to you then you should study this: https://bugs.python.org/issue1580

Related

Matlab "single" precision vs C floating point?

My Matlab script reads a string value "0.001044397222448" from a file, and after parsing the file, this value printed in the console shows as double precision:
value_double =
0.001044397222448
After I convert this number to single using value_float = single(value_double), the value shows as:
value_float =
0.0010444
What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?
My problem is that later on, after I compare this with analogous C code, I get differences. In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000. So the C code keeps good precision. But, does Matlab?

The number 0.001044397222448 (like the vast majority of decimal fractions) cannot be exactly represented in binary floating point.
As a single-precision float, it's most closely represented as (hex) 0x0.88e428 × 2^-9, which in decimal is 0.001044397242367267608642578125.
In double precision, it's most closely represented as 0x0.88e427d4327300 × 2^-9, which in decimal is 0.001044397222447999984407118745366460643708705902099609375.
Those are what the numbers are, internally, in both C and Matlab.
Everything else you see is an artifact of how the numbers are printed back out, possibly rounded and/or truncated.
When I said that the single-precision representation "in decimal is 0.001044397242367267608642578125", that's mildly misleading, because it makes it look like there are 28 or more digits' worth of precision. Most of those digits, however, are an artifact of the conversion from base 2 back to base 10. As other answers have noted, single-precision floating point actually gives you only about 7 decimal digits of precision, as you can see if you notice where the single- and double-precision equivalents start to diverge:
0.001044397242367267608642578125
0.001044397222447999984407118745366460643708705902099609375
            ^
            difference
Similarly, double precision gives you roughly 16 decimal digits worth of precision, as you can see if you compare the results of converting a few previous and next mantissa values:
0x0.88e427d43272f8 0.00104439722244799976756668424826557384221814572811126708984375
0x0.88e427d4327300 0.001044397222447999984407118745366460643708705902099609375
0x0.88e427d4327308 0.00104439722244800020124755324246734744519926607608795166015625
0x0.88e427d4327310 0.0010443972224480004180879877395682342466898262500762939453125
^
changes
This also demonstrates why you can never exactly represent your original value 0.001044397222448 in binary. If you're using double, you can have 0.00104439722244799998, or you can have 0.0010443972224480002, but you can't have anything in between. (You'd get a little less close with float, and you could get considerably closer with long double, but you'll never get your exact value.)
In C, and whether you're using float or double, you can ask for as little or as much precision as you want when printing things with %f, and under a high-quality implementation you'll always get properly-rounded results. (Of course the results you get will always be the result of rounding the actual, internal value, not necessarily the decimal value you started with.) For example, if I run this code:
printf("%.5f\n", 0.001044397222448);
printf("%.10f\n", 0.001044397222448);
printf("%.15f\n", 0.001044397222448);
printf("%.20f\n", 0.001044397222448);
printf("%.30f\n", 0.001044397222448);
printf("%.40f\n", 0.001044397222448);
printf("%.50f\n", 0.001044397222448);
printf("%.60f\n", 0.001044397222448);
printf("%.70f\n", 0.001044397222448);
I see these results, which as you can see match the analysis above.
(Note that this particular example is using double, not float.)
0.00104
0.0010443972
0.001044397222448
0.00104439722244799998
0.001044397222447999984407118745
0.0010443972224479999844071187453664606437
0.00104439722244799998440711874536646064370870590210
0.001044397222447999984407118745366460643708705902099609375000
0.0010443972224479999844071187453664606437087059020996093750000000000000
I'm not sure how Matlab prints things.
In answer to your specific questions:
What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?
As a float, it is really "truncated" to a number which, converted back to decimal, is exactly 0.001044397242367267608642578125. But as we've seen, most of those digits are essentially meaningless, and the result can more properly be thought of as being about 0.0010443972.
In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000
So C got the same answer I did -- but, again, most of those digits are not meaningful.
So the C code keeps good precision. But, does Matlab?
I'd be willing to bet that Matlab keeps the same internal precision for ordinary floats and doubles.

MATLAB uses IEEE-754 binary64 for its double-precision type and binary32 for single-precision. When 0.001044397222448 is rounded to the nearest value representable in binary64, the result is 4816432068447840 × 2^-62 = 0.001044397222447999984407118745366460643708705902099609375.
When that is rounded to the nearest value representable in binary32, the result is 8971304 × 2^-33 = 0.001044397242367267608642578125.
Various software (C, Matlab, others) displays floating-point numbers in diverse ways, with more or fewer digits. The above values are the exact numbers represented by the floating-point data, per the IEEE 754 specification, and they are the values the data has when used in arithmetic operations.

All single precisions should be the same
So here is the thing. According to documentation, both matlab and C comply with the IEEE 754 standard. Which means that there should not be any difference between what is actually stored in memory.
You could compute the binary representation by hand, but according to this handy website (thanks #Danijel), the single-precision representation of 0.001044397222448 should be 0x3a88e428.
The question is how precise your representation is. It is a bit tricky with floating point, but the short answer is that your number is accurate up to the 9th decimal place and its exact decimal expansion ends within 33 decimal places. If you want the long answer, see the two paragraphs at the end of this post.
A display issue
The fact that you are not seeing the same thing when you print does not mean that you don't have the same bits in memory (and you should have the exact same bytes in memory in C and MATLAB). The only reason you see a difference on your display is because the print functions truncate your number. If you print the 33 decimals in each language you should not have any difference.
To do so in matlab use: fprintf('%.33f', value_float);
To do so in C use: printf("%.33f\n", gf);
About floating point precision
Now in more detail, the question was: how precise is this representation? The tricky thing with floating point is that the precision of the representation depends on what number you are representing. The representation is 32 bits wide and is divided into 1 bit for the sign, 8 for the exponent and 23 for the fraction.
The number can be computed as sign * 2^(exponent-127) * 1.fraction. This means that the maximal error/precision (depending on what you want to call it) is 2^(exponent-127-23), where the 23 stands for the 23 bits of the fraction. (There are a few edge cases, which I won't elaborate on.) In our case the exponent is 117, which means your precision is 2^(117-127-23) = 1.16415321826934814453125e-10. That means that your single-precision float should represent your number accurately up to the 9th decimal place; after that it is up to luck.
Further details
I know this is a rather short explanation. For more details, this post explains the floating point imprecision more precisely and this website gives you some useful info and allows you to play visually with the representation.

printf behaviour in C

I am trying to understand what is the difference between the following:
printf("%f",4.567f);
printf("%f",4.567);
How does using the f suffix change/influence the output?

How using the 'f' changes/influences the output?
The f at the end of a floating point constant determines the type and can affect the value.
4.567 is a floating point constant with the type and precision of double. A double can typically represent about 2^64 different values exactly. 4.567 is not one of them*1. The closest alternatives are typically exactly
4.56700000000000017053025658242404460906982421875 // best
4.56699999999999928235183688229881227016448974609375 // next best double
4.567f is a floating point constant with the type and precision of float. A float can typically represent about 2^32 different values exactly. 4.567 is not one of them. The closest alternatives are typically exactly
4.566999912261962890625 // best
4.56700038909912109375 // next best float
When passed to printf() as one of the ... arguments, a float is converted to a double with the same value.
So the question becomes what is the expected difference in printing?
printf("%f",4.56700000000000017053025658242404460906982421875);
printf("%f",4.566999912261962890625);
Since the default number of digits after the decimal point to print for "%f" is 6, the output for both rounds to:
4.567000
To see a difference, print with more precision or try 4.567e10, 4.567e10f.
45670000000.000000 // double
45669998592.000000 // float
Your output may differ slightly due to quality of implementation issues.
*1 C supports many floating point encodings. A common one is binary64. Thus typical floating-point values are encoded as sign * binary fraction * 2^exponent. Even simple decimal values like 0.1 cannot be represented exactly as such.

No Output Coming In Simple C Program

I have been asked a very simple question in the book to write the output of the following program -
#include<stdio.h>
int main()
{
float i=1.1;
while(i==1.1)
{
printf("%f\n",i);
i=i-0.1;
}
return 0;
}
I have already read that floating point numbers can be used as loop counters but that doing so is not advisable. When I compile this program with gcc and run it, I get no output, even though the logic looks completely correct and the value of i should be printed once. I tried printing the value of i and it gave me 1.100000. So I do not understand why the value is not being printed.

In most C implementations, using IEEE-754 binary floating-point, what happens in your program is:
The source text 1.1 is converted to a double. Since binary floating-point does not represent this value exactly, the result is the nearest representable value, 1.100000000000000088817841970012523233890533447265625.
The definition float i=1.1; converts the value to float. Since float has less precision than double, the result is 1.10000002384185791015625.
In the comparison i==1.1, the float 1.10000002384185791015625 is converted to double (which does not change its value) and compared to 1.100000000000000088817841970012523233890533447265625. Since they are unequal, the result is false.

The quantity 11/10 cannot be represented exactly in binary floating-point, and it has different approximations as double and as float.
The constant 1.1 in the source code is the double approximation of 11/10. Since i is of type float, it ends up containing the float approximation of 1.1.
Write while (i==1.1f) or declare i as double and your program will work.

Comparing floating point numbers:1
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precision, so results will differ depending on the details of your environment. If you do a calculation and then compare the results against some expected value it is highly unlikely that you will get exactly the result you intended.
In other words, if you do a calculation and then do this comparison:
if (result == expectedResult)
then it is unlikely that the comparison will be true. If the comparison is true then it is probably unstable – tiny changes in the input values, compiler, or CPU may change the result and make the comparison be false.
In short:
1.1 can't be represented exactly as a binary floating point number. This is like trying to write 10/3 in decimal, which gives 3.333333333...
I would suggest reading the article What Every Computer Scientist Should Know About Floating-Point Arithmetic.
1. For the experts who are encouraging beginner programmers to use == in floating point comparison

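It is because i is not quite exactly 1.1.
If you are going to test a floating point value, you should do something along the lines of while (fabs(i - 1.1) < SOME_DELTA), where SOME_DELTA is the threshold below which the difference counts as equal. Without the absolute value, the test would also pass for any i much smaller than 1.1.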
Read: https://softwareengineering.stackexchange.com/questions/101163/what-causes-floating-point-rounding-errors

Why is not a==0 in the following code?

#include <stdio.h>
int main( )
{
float a=1.0;
long i;
for(i=0; i<100; i++)
{
a = a - 0.01;
}
printf("%e\n",a);
}
Result is: 6.59e-07

It's a binary floating point number, not a decimal one - therefore you need to expect rounding errors. See the Basic section in this article:
What Every Programmer Should Know About Floating-Point Arithmetic
For example, the value 0.01 does not have a precise representation in a binary floating point type. To get a "correct" result in your sample you would have to either round or use a decimal floating point type (see Wikipedia):
Binary fixed-point types are most commonly used, because the rescaling operations can be implemented as fast bit shifts. Binary fixed-point numbers can represent fractional powers of two exactly, but, like binary floating-point numbers, cannot exactly represent fractional powers of ten. If exact fractional powers of ten are desired, then a decimal format should be used. For example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately by binary fixed-point or binary floating-point representations, while they can be represented exactly in decimal fixed-point or decimal floating-point representations. These representations may be encoded in many ways, including BCD.

There are two questions here. If you're asking why your printf statement displays the result as 6.59e-07 instead of 0.000000659, it's because you've used the format specifier for scientific notation, %e. Use %f instead, with enough digits after the decimal point (the default of 6 would round this value to 0.000001):
printf("%.9f\n",a);
If you're asking why the result is not exactly zero rather than 0.000000659, the answer is (as others have pointed out) that with floating point arithmetic using binary numbers you need to expect rounding.

If you print a with %f instead, the default six digits after the decimal point round the value to 0.000001. The stored value is still not exactly zero.

This is floating point rounding error at work. Each time you subtract a fraction you get only approximately the result you would expect on paper, so the final result is very close to zero, but not necessarily exactly zero.

Floating point numbers have limited precision, which is why you get this result.

How do printf and scanf handle floating point precision formats?

Consider the following snippet of code:
float val1 = 214.20;
double val2 = 214.20;
printf("float : %f, %4.6f, %4.2f \n", val1, val1, val1);
printf("double: %f, %4.6f, %4.2f \n", val2, val2, val2);
Which outputs:
float : 214.199997, 214.199997, 214.20 | <- the correct value I wanted
double: 214.200000, 214.200000, 214.20 |
I understand that 214.20 has an infinite binary representation. The first two elements of the first line show an approximation of the intended value, but the last one seems to have no approximation at all, and this led me to the following question:
How do the scanf, fscanf, printf, fprintf (etc.) functions treat the precision formats?
With no precision provided, printf printed out an approximated value, but with %4.2f it gave the correct result. Can you explain the algorithm these functions use to handle precision?

The thing is, 214.20 cannot be expressed exactly with binary representation. Few decimal numbers can. So an approximation is stored. Now when you use printf, the binary representation is turned into a decimal representation, but it again cannot be expressed exactly and is only approximated.
As you noticed, you can give a precision to printf to tell it how to round the decimal approximation. And if you don't give it a precision then a precision of 6 is assumed (see the man page for details).
If you use %.40f for the float and %.40lf for the double in your example above, you will get these results:
214.1999969482421875000000000000000000000000
214.1999999999999886313162278383970260620117
They are different because with double, there are more bits to better approximate 214.20. But as you can see, they are still very odd when represented in decimal.
I recommend to read the Wikipedia article on floating point numbers for more insights about how floating point numbers work. An excellent read is also What Every Computer Scientist Should Know About Floating-Point Arithmetic

Since you asked about scanf, one thing you should note is that POSIX requires printf and a subsequent scanf (or strtod) to reconstruct the original value exactly as long as sufficiently significant digits (at least DECIMAL_DIG, I believe) were printed. Plain C of course makes no such requirement; the C standard pretty much allows floating point operations to give whatever result the implementor likes as long as they document it. However, if your intent is to store floating point numbers in text files to be read back later, you might be better off using the C99 %a specifier to print them in hex. This way they'll be exact and there's no confusion about whether the serialize/deserialize process loses precision.
Keep in mind that when I said "reconstruct the original value", I meant the actual value held in the variable/expression passed to printf, not the original decimal you wrote in the source file, which the compiler rounded to the best binary representation.

scanf will round the input value to the nearest exactly representable floating point value. As DarkDust's answer illustrates, for 214.20 the closest exactly representable value is slightly below the exact value in both single and double precision; the double is simply much closer.
