Consider the following snippet of code:
float val1 = 214.20;
double val2 = 214.20;
printf("float : %f, %4.6f, %4.2f \n", val1, val1, val1);
printf("double: %f, %4.6f, %4.2f \n", val2, val2, val2);
Which outputs:
float : 214.199997, 214.199997, 214.20 | <- the correct value I wanted
double: 214.200000, 214.200000, 214.20 |
I understand that 214.20 has an infinite binary representation. The first two elements of the first line show an approximation of the intended value, but the last one seems to have no approximation at all, and this led me to the following question:
How do the scanf, fscanf, printf, fprintf (etc.) functions treat the precision formats?
With no precision provided, printf printed out an approximated value, but with %4.2f it gave the correct result. Can you explain the algorithm these functions use to handle precision?
The thing is, 214.20 cannot be expressed exactly in binary; few decimal numbers can, so an approximation is stored. Now when you use printf, that binary approximation is converted back to decimal and rounded to the requested number of digits, so what you see is again only an approximation of the stored value.
As you noticed, you can give a precision to printf to tell it how to round the decimal approximation. And if you don't give it a precision then a precision of 6 is assumed (see the man page for details).
If you use %.40f for the float and %.40lf for the double in your example above, you will get these results:
214.1999969482421875000000000000000000000000
214.1999999999999886313162278383970260620117
They are different because with double, there are more bits to better approximate 214.20. But as you can see, they are still very odd when represented in decimal.
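For reference, a minimal sketch that reproduces those two lines, reusing val1 and val2 from the question (%.40lf behaves the same as %.40f here, since a float argument is promoted to double anyway):

#include <stdio.h>

int main(void)
{
    float val1 = 214.20;
    double val2 = 214.20;
    printf("%.40f\n", val1);    /* 214.1999969482421875000... */
    printf("%.40lf\n", val2);   /* 214.1999999999999886313... */
    return 0;
}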
I recommend reading the Wikipedia article on floating point numbers for more insight into how they work. What Every Computer Scientist Should Know About Floating-Point Arithmetic is also an excellent read.
Since you asked about scanf, one thing you should note is that POSIX requires printf and a subsequent scanf (or strtod) to reconstruct the original value exactly as long as sufficiently significant digits (at least DECIMAL_DIG, I believe) were printed. Plain C of course makes no such requirement; the C standard pretty much allows floating point operations to give whatever result the implementor likes as long as they document it. However, if your intent is to store floating point numbers in text files to be read back later, you might be better off using the C99 %a specifier to print them in hex. This way they'll be exact and there's no confusion about whether the serialize/deserialize process loses precision.
Keep in mind that when I said "reconstruct the original value", I mean the actual value which was held in the variable/expression passed to printf, not the original decimal you wrote in the source file, which got rounded to the best binary representation by the compiler.
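A minimal sketch of that round trip, assuming a correctly rounded library (DECIMAL_DIG comes from <float.h>):

#include <float.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double original = 214.20;   /* already rounded to the nearest double by the compiler */
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", DECIMAL_DIG, original);   /* enough digits to round-trip */
    double readback = strtod(buf, NULL);
    puts(readback == original ? "round trip exact" : "round trip lost precision");
    return 0;
}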
scanf will round the input value to the nearest exactly representable floating point value. As DarkDust's answer illustrates, in single precision the closest exactly representable value is below the exact value, and in double precision the closest exactly representable value is above the exact value.
Related
My Matlab script reads a string value "0.001044397222448" from a file, and after parsing the file, this value printed in the console shows as double precision:
value_double =
0.001044397222448
After I convert this number to single using value_float = single(value_double), the value shows as:
value_float =
0.0010444
What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?
My problem is that later on, after I compare this with analogous C code, I get differences. In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000. So the C code keeps good precision. But, does Matlab?
The number 0.001044397222448 (like the vast majority of decimal fractions) cannot be exactly represented in binary floating point.
As a single-precision float, it's most closely represented as (hex) 0x0.88e428 × 2^-9, which in decimal is 0.001044397242367267608642578125.
In double precision, it's most closely represented as 0x0.88e427d4327300 × 2^-9, which in decimal is 0.001044397222447999984407118745366460643708705902099609375.
Those are what the numbers are, internally, in both C and Matlab.
Everything else you see is an artifact of how the numbers are printed back out, possibly rounded and/or truncated.
When I said that the single-precision representation "in decimal is 0.001044397242367267608642578125", that's mildly misleading, because it makes it look like there are 28 or more digits' worth of precision. Most of those digits, however, are an artifact of the conversion from base 2 back to base 10. As other answers have noted, single-precision floating point actually gives you only about 7 decimal digits of precision, as you can see if you notice where the single- and double-precision equivalents start to diverge:
0.001044397242367267608642578125
0.001044397222447999984407118745366460643708705902099609375
^
difference
Similarly, double precision gives you roughly 16 decimal digits worth of precision, as you can see if you compare the results of converting a few previous and next mantissa values:
0x0.88e427d43272f8 0.00104439722244799976756668424826557384221814572811126708984375
0x0.88e427d4327300 0.001044397222447999984407118745366460643708705902099609375
0x0.88e427d4327308 0.00104439722244800020124755324246734744519926607608795166015625
0x0.88e427d4327310 0.0010443972224480004180879877395682342466898262500762939453125
^
changes
This also demonstrates why you can never exactly represent your original value 0.001044397222448 in binary. If you're using double, you can have 0.00104439722244799998, or you can have 0.0010443972224480002, but you can't have anything in between. (You'd get a little less close with float, and you could get considerably closer with long double, but you'll never get your exact value.)
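You can see those two neighbors directly with nextafter from <math.h> (a small sketch, assuming IEEE-754 double):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double d = 0.001044397222448;           /* rounded to the nearest double */
    printf("%.25f\n", nextafter(d, 0.0));   /* the representable value just below */
    printf("%.25f\n", d);                   /* the stored value itself */
    printf("%.25f\n", nextafter(d, 1.0));   /* the representable value just above */
    return 0;
}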
In C, and whether you're using float or double, you can ask for as little or as much precision as you want when printing things with %f, and under a high-quality implementation you'll always get properly-rounded results. (Of course the results you get will always be the result of rounding the actual, internal value, not necessarily the decimal value you started with.) For example, if I run this code:
printf("%.5f\n", 0.001044397222448);
printf("%.10f\n", 0.001044397222448);
printf("%.15f\n", 0.001044397222448);
printf("%.20f\n", 0.001044397222448);
printf("%.30f\n", 0.001044397222448);
printf("%.40f\n", 0.001044397222448);
printf("%.50f\n", 0.001044397222448);
printf("%.60f\n", 0.001044397222448);
printf("%.70f\n", 0.001044397222448);
I see these results, which as you can see match the analysis above.
(Note that this particular example is using double, not float.)
0.00104
0.0010443972
0.001044397222448
0.00104439722244799998
0.001044397222447999984407118745
0.0010443972224479999844071187453664606437
0.00104439722244799998440711874536646064370870590210
0.001044397222447999984407118745366460643708705902099609375000
0.0010443972224479999844071187453664606437087059020996093750000000000000
I'm not sure how Matlab prints things.
In answer to your specific questions:
What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?
As a float, it is really "truncated" to a number which, converted back to decimal, is exactly 0.001044397242367267608642578125. But as we've seen, most of those digits are essentially meaningless, and the result can more properly be thought of as being about 0.0010443972.
In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000
So C got the same answer I did -- but, again, most of those digits are not meaningful.
So the C code keeps good precision. But, does Matlab?
I'd be willing to bet that Matlab keeps the same internal precision for ordinary floats and doubles.
MATLAB uses IEEE-754 binary64 for its double-precision type and binary32 for single-precision. When 0.001044397222448 is rounded to the nearest value representable in binary64, the result is 4816432068447840 • 2^-62 = 0.001044397222447999984407118745366460643708705902099609375.
When that is rounded to the nearest value representable in binary32, the result is 8971304 • 2^-33 = 0.001044397242367267608642578125.
Various software (C, Matlab, others) displays floating-point numbers in diverse ways, with more or fewer digits. The above values are the exact numbers represented by the floating-point data, per the IEEE 754 specification, and they are the values the data has when used in arithmetic operations.
All single precisions should be the same
So here is the thing: according to the documentation, both MATLAB and C comply with the IEEE 754 standard, which means there should not be any difference in what is actually stored in memory.
You could compute the binary representation by hand, but according to this handy website (thanks #Danijel), the representation of 0.001044397222448 should be 0x3a88e428.
The question is: how precise is your representation? It is a bit tricky with floating point, but the short answer is that your number is accurate up to the 9th decimal place and has digits represented out to the 33rd decimal place. If you want the long answer, see the two paragraphs at the end of this post.
A display issue
The fact that you are not seeing the same thing when you print does not mean that you don't have the same bits in memory (and you should have the exact same bytes in memory in C and MATLAB). The only reason you see a difference on your display is that the print functions truncate your number. If you print all 33 decimals in each language, you should not see any difference.
To do so in MATLAB, use: fprintf('%.33f', value_float);
To do so in C, use: printf("%.33f\n", gf);
About floating point precision
Now in more detail, the question was: how precise is this representation? Well, the tricky thing with floating point is that the precision of the representation depends on what number you are representing. The representation is 32 bits wide, divided into 1 bit for the sign, 8 for the exponent, and 23 for the fraction.
The number can be computed as (-1)^sign * 2^(exponent-127) * 1.fraction. This basically means that the maximal error/precision (depending on what you want to call it) is basically 2^(exponent-127-23); the 23 is there to account for the 23 bits of the fraction. (There are a few edge cases, I won't elaborate on them.) In our case the exponent is 117, which means your precision is 2^(117-127-23) = 2^-33 = 1.16415321826934814453125e-10. That means your single-precision float should represent your number accurately up to the 9th decimal place; after that it is up to luck.
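If you want to verify those numbers on your own machine, here is a small sketch (assuming IEEE-754 single precision, with the float's bit pattern copied into a uint32_t):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 0.001044397222448f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);                  /* reinterpret the float's bit pattern */
    int exponent = (bits >> 23) & 0xFF;              /* the 8-bit biased exponent field */
    printf("bits     = 0x%08x\n", (unsigned)bits);   /* expect 0x3a88e428 */
    printf("exponent = %d\n", exponent);             /* expect 117 */
    printf("ulp      = %g\n", ldexpf(1.0f, exponent - 127 - 23));   /* 2^(117-127-23) */
    return 0;
}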
Further details
I know this is a rather short explanation. For more details, this post explains the floating point imprecision more precisely and this website gives you some useful info and allows you to play visually with the representation.
So I have been trying to make my own printf and now I'm stuck at %f.
The problem is that I don't know what printf does in the background. When I give it a float number like f = 1.4769996, it prints 1.477000,
but when I give it f = 1.4759995, it prints the value 1.475999:
float f = 1.4769996;
printf("%f\n", f); // 1.477000
f = 1.4759995;
printf("%f\n", f); // 1.475999
What I thought was that printf sees the trailing 5 and rounds up by one, but that doesn't work for the second example.
What is the logic behind this floating-point behavior?
Your C implementation likely uses the IEEE-754 binary32 and binary64 formats for float and double. Given this, float f = 1.4769996; results in setting f to 1.47699964046478271484375, and f = 1.4759995; results in setting f to 1.47599947452545166015625.
Then it is easy to see that rounding 1.47699964046478271484375 to six digits after the decimal point results in 1.477000 (because the next digit is 6, so we round up), and rounding 1.47599947452545166015625 to six digits after the decimal point results in 1.475999 (because the next digit is 4, so we round down).
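You can see those stored values directly by asking printf for more digits, for example:

#include <stdio.h>

int main(void)
{
    float f = 1.4769996f;
    printf("%.25f\n", f);   /* 1.4769996404647827148437500 */
    f = 1.4759995f;
    printf("%.25f\n", f);   /* 1.4759994745254516601562500 */
    return 0;
}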
When working with floating-point numbers, it is important to understand that each floating-point value represents one number exactly (unless it is a Not a Number [NaN] encoding). When you write 1.4769996 in source code, it is converted to a value representable in double. When you assign it to a float, it is converted to a value representable in float. Operations on the floating-point object behave as if the object has exactly the value it represents, not as if its value is the numeral you wrote in source code.
To provide some further details, the C standard requires (in C 2018 7.21.6.1 13) that formatting with f be correctly rounded if the number of digits requested is at most DECIMAL_DIG. DECIMAL_DIG is the number of decimal digits in the widest floating-point format the implementation supports such that converting any number in that format to a numeral with DECIMAL_DIG significant decimal digits and back to the floating-point format yields the original value (5.2.4.2.2 12). DECIMAL_DIG must be at least 10. If more than DECIMAL_DIG digits are requested, the C standard allows some leeway in rounding. However, high-quality C implementations will round correctly as specified by IEEE-754 (to the nearest number with the requested number of digits, with ties favoring an even low digit).
If you are trying to write your own printf, and if you are stuck on %f, there are three or four things you need to know:
When a "varargs" function like printf is called, arguments of type float are always implicitly promoted to type double. So when you've seen %f in the format string, and you're using va_arg() to pluck the next argument from the list, you'll want to pluck an argument of type double, not float. (This also means that you have just one case to handle, not two. Inside printf, you don't have to worry about handling type float at all.)
Printing the whole-number part of a double is easy; it's more or less the same problem as printing an int, which I'm guessing you've already figured out, if you've got %d working. And to do a straightforward, simpleminded job of printing the fractional part, it usually works pretty well to just repeatedly multiply by 10. That is, if you're trying to print 123.456, and you've already got the 123 part taken care of, you can then proceed to print the rest by taking the fractional part 0.456, multiplying by 10 to get 4.56, then truncating to get 4, then taking the new fractional part 0.56 and repeating. (There's a sketch of this approach at the end of this answer.)
There is no such number as 1.4769996. (There's no such number as the 123.456 I was just using, either.) When we write numbers like 1.4769996 and 123.456 we're thinking about decimal fractions, but most computers (including the one you're using) use binary fractions internally, and you can't represent decimal fractions like 1.4769996 and 123.456 exactly in binary, so the actual numbers are always a little bit different than you expect, which is why you often get slight "roundoff error", or extra 999's at the end when you expected 000.
Doing a proper job on this stuff is really, really hard. If you're trying to write your own printf, and you've gotten to %f, and if you can get it working pretty well most of the time, consider yourself lucky, and call it a day. Don't get bogged down on the last digit -- or if you're bound and determined to get the last digit right in every case (which is certainly a noble goal), do some research and set aside some time, because you're going to be working at it for a while.
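To make the multiply-by-10 idea concrete, here is a deliberately simple-minded sketch (print_f is a made-up helper, not a standard function; it does no rounding of the last digit and no handling of NaN, infinity, or very large values):

#include <stdio.h>

/* Naive %f-style printing: print the whole part, then repeatedly multiply
   the fraction by 10 and peel off one decimal digit at a time. */
static void print_f(double x, int prec)
{
    if (x < 0) {
        putchar('-');
        x = -x;
    }
    long whole = (long)x;              /* whole-number part (assumes it fits in a long) */
    double frac = x - (double)whole;   /* fractional part, 0 <= frac < 1 */
    printf("%ld", whole);
    if (prec > 0) {
        putchar('.');
        for (int i = 0; i < prec; i++) {
            frac *= 10.0;              /* shift the next decimal digit into the ones place */
            int digit = (int)frac;     /* truncate to get that digit */
            putchar('0' + digit);
            frac -= digit;             /* keep only what is left of the fraction */
        }
    }
    putchar('\n');
}

int main(void)
{
    print_f(123.456, 6);     /* compare with printf("%f\n", 123.456); */
    print_f(1.4769996, 6);   /* the last digit may differ from printf, since there is no rounding */
    return 0;
}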
I want to be able to print out a double-precision number such that it is identical to the original decimal number, in C.
char buf[1000];
double d = ______;
For a double, I would do
snprintf(buf, sizeof(buf), "%.17g", .1);
But on doing printf("%s", buf); it prints out:
0.10000000000000001
The trailing 1 is a rounding error.
It prints out the closest decimal representation of the double-precision value you get when you try to represent .1. Converting .1 to double introduces a rounding error, and converting back to decimal introduces another.
Is there a way to fix this?
The "standard" solution for that isn't pretty: use hexadecimal notation for all your floating point constants instead of decimal. They look like 0x1.7aP-13 something that is hard to digest for humans, unfortunately. But with them you have the guarantee that writing out and reading back in gives you exactly the same value.
The tools from the standard C library strtod, printf (with format %a) and similar support this if you have a C library that implements at least C99. One minor point that you must be careful about when using that is that the decimal point of printf is locale dependent. So if your locale has for example a , for that you must be sure to write and read with the same locale.
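A minimal sketch of that exact round trip, assuming a C99 library (strtod accepts hexadecimal floating-point input):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double d = 0.1;
    char buf[64];
    snprintf(buf, sizeof buf, "%a", d);   /* something like 0x1.999999999999ap-4 */
    double back = strtod(buf, NULL);      /* parse the hexadecimal form back */
    printf("%s -> %s\n", buf, back == d ? "identical" : "different");
    return 0;
}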
In general what you are attempting to do is impossible. Floating point values use a binary representation and many values with terminating decimal representation do not have exact binary representations.
Fundamentally, if you wish to preserve the decimal representation then you should use a data type that represents that value as a decimal.
It might be worth pointing out the method that Python adopted for displaying floating point values. Python displays the shortest decimal representation whose nearest floating point value is the value being displayed.
Using 0.1 as an example, that value cannot be represented exactly in binary floating point. So the closest floating point value is used. When you ask for a decimal representation of that binary value, the Python runtime determines that 0.1 is the shortest representation such that float('0.1') == value.
If that approach would be of use to you then you should study this: https://bugs.python.org/issue1580
I need to convert a floating-point number with system precision to one with a specified precision (e.g. 3 decimal places) for the printed output. The fprintf function will not suffice for this as it will not correctly round some numbers. All the other solutions I've tried fail in that they all reintroduce undesired precision when I convert back to a float. For example:
float xf_round1_f(float input, int prec) {
    printf("%f\t", input);
    int trunc = round(input * pow(10, prec));
    printf("%f\t", (float)trunc);
    input = (float)trunc / pow(10, prec);
    printf("%f\n", input);
    return (input);
}
This function prints the input, the truncated integer and the output to each line, and the result looks like this for some numbers supposed to be truncated to 3 decimal places:
49.975002 49975.000000 49.974998
49.980000 49980.000000 49.980000
49.985001 49985.000000 49.985001
49.990002 49990.000000 49.990002
49.995003 49995.000000 49.994999
50.000000 50000.000000 50.000000
You can see that the second step works as intended - even when "trunc" is cast to float for printing - but as soon as I convert it back to a float the precision returns. The 1st and 6th rows illustrate problem cases.
Surely there must be a way of resolving this - even if the 1st row result remained 49.975002 a formatted print would give the desired effect, but in this case there is a real problem.
Any solutions?
Binary floating-point cannot represent most decimal numerals exactly. Each binary floating-point number is formed by multiplying an integer by a power of two. For the common implementation of float, IEEE-754 32-bit binary floating-point, that integer must be in (-2^24, 2^24). There is no integer x and integer y such that x • 2^y exactly equals 49.975. Therefore, when you divide 49975 by 1000, the result must be an approximation.
If you merely need to format a number for output, you can do this with the usual fprintf format specifiers. If you need to compute exactly with such numbers, you may be able to do it by scaling them to representable values and doing the arithmetic either in floating-point or in integer arithmetic, depending on your needs.
Edit: it appears you may only care about the printed results. printf is generally smart enough to do proper rounding to the number of digits you specify. If you give a format of "%.3f" you will probably get what you need.
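For example, a quick sketch using two of the values from the question:

#include <stdio.h>

int main(void)
{
    printf("%.3f\n", 49.975002f);   /* prints 49.975 */
    printf("%.3f\n", 49.995003f);   /* prints 49.995 */
    return 0;
}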
If your only problem is with the cases that land below the desired number, you can easily fix it by nudging everything above the desired number instead. Unfortunately this increases the absolute error of the answer; even a result that was exact before, such as 50.000, is now off.
Simply add this line to the end of the function:
input=nextafterf(input, input*1.0001);
See it in action at http://ideone.com/iHNTzs
49.975002 49975.000000 49.974998 49.975002
49.980000 49980.000000 49.980000 49.980003
49.985001 49985.000000 49.985001 49.985004
49.990002 49990.000000 49.990002 49.990005
49.995003 49995.000000 49.994999 49.995003
50.000000 50000.000000 50.000000 50.000004
If you require exact representation of all decimal fractions with three digits after the decimal point, you can work in thousandths. Use an integer data type to represent one thousand times the actual number for all intermediate results.
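A tiny sketch of that thousandths idea (assuming the values fit in a long and are non-negative):

#include <stdio.h>

int main(void)
{
    long a = 49975;     /* represents 49.975 exactly */
    long b = 49995;     /* represents 49.995 exactly */
    long sum = a + b;   /* exact arithmetic in thousandths */
    printf("%ld.%03ld\n", sum / 1000, sum % 1000);   /* prints 99.970 */
    return 0;
}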
Fixed-point numbers. That is where you keep the actual numbers in a wide integer format, for example long or long long, and you also keep the number of decimal places. You will then also need methods to scale the fixed-point number by the decimal places, and some way to convert to/from strings.
The reason you are having trouble is that 1/10 is not representable exactly as a finite sum of fractional powers of 2 (1/2, 1/4, 1/8, etc.). This is the same reason that 1/3 is a repeating decimal in base 10 (0.33333...).
#include <stdio.h>

int main( )
{
    float a = 1.0;
    long i;
    for (i = 0; i < 100; i++)
    {
        a = a - 0.01;
    }
    printf("%e\n", a);
}
Result is: 6.59e-07
It's a binary floating point number, not a decimal one - therefore you need to expect rounding errors. See the Basic section in this article:
What Every Programmer Should Know About Floating-Point Arithmetic
For example, the value 0.01 does not have a precise representation in a binary floating point type. To get a "correct" result in your sample you would have to either round or use a decimal floating point type (see Wikipedia):
Binary fixed-point types are most commonly used, because the rescaling operations can be implemented as fast bit shifts. Binary fixed-point numbers can represent fractional powers of two exactly, but, like binary floating-point numbers, cannot exactly represent fractional powers of ten. If exact fractional powers of ten are desired, then a decimal format should be used. For example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately by binary fixed-point or binary floating-point representations, while they can be represented exactly in decimal fixed-point or decimal floating-point representations. These representations may be encoded in many ways, including BCD.
There are two questions here. If you're asking, why is my printf statement displaying the result as 6.59e-07 instead of 0.000000659, it's because you've used the format specifier for Scientific Notation: %e. You want %f for the floating point a.
printf("%f\n",a);
If you're asking why the result is not exactly zero rather than 0.000000659, the answer is (as others have pointed out) that with floating point arithmetic using binary numbers you need to expect rounding.
You have to specify %f for printing the float number; it will then display 0.000001 for variable a (the residual 6.59e-07 rounded to six decimal places).
That's floating-point rounding error at work. Each time you subtract a fraction you get approximately the result you'd expect on paper, so the final result is very close to zero, but not necessarily exactly zero.
Floating-point numbers are not exact, which is why you see this result.