C float and double comparisons

I'm comparing simple floats and doubles in C, specifically the value 8.7 for both of them. Now I assign 8.7 to each variable, and when I print, I get a result of 8.7000 for both values. Why has the compiler added these zeros? And the main question I wanted to ask was: are there any further numbers that I'm not seeing, hidden after the trailing zeros? I read that I shouldn't do comparisons like this with float because of a lack of precision, but I thought that with such a small value, surely it can store 8.7 with the degree of accuracy needed to compare itself with another 8.7 value?
My only worry is that it's actually being represented somewhere in memory as, e.g., 8.70000003758 or something, which is throwing my comparisons off? I tried printf with %.20f to see any further numbers that might be hiding, but I think that just created numbers that were otherwise not there, as the whole accuracy of the number changed to 8.6918734634834929 or something similar.

I'm comparing simple floats and doubles in C, specifically the value 8.7 for both of them.
Bad choice, since 8.7 has no exact binary representation.
Now I assign 8.7 to each variable, and when I print, I get a result of 8.7000 for both values. Why has the compiler added these zeros?
It hasn't; your print routine has.
And the main question I wanted to ask was: are there any further numbers that I'm not seeing, hidden after the trailing zeros?
Definitely, since 8.7 has no exact binary representation. (Try to write it out as the sum of integer powers of 2; you can't do it.)
I read that I shouldn't do comparisons like this with float because of a lack of precision, but I thought that with such a small value, surely it can store 8.7 with the degree of accuracy needed to compare itself with another 8.7 value?
You thought wrong. 1/3 is small but has no exact decimal representation with a finite number of digits. Whether a value is big or small has nothing to do with whether it can be represented exactly with a finite number of digits in a particular base.
My only worry is that it's actually being represented somewhere in memory as, e.g., 8.70000003758 or something, which is throwing my comparisons off?
Exactly, just as representing 1/3 as 0.333333333 would do.
I tried printf with %.20f to see any further numbers that might be hiding, but I think that just created numbers that were otherwise not there, as the whole accuracy of the number changed to 8.6918734634834929 or something similar.
That's probably just a bug. Show us that code. Perhaps the format specifier didn't match the argument you passed; with printf, a mismatch like that can produce digits that have nothing to do with the stored value.
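For anyone who wants to see the hidden digits directly, here is a minimal sketch (the commented outputs assume IEEE 754 single and double precision, so treat them as typical rather than guaranteed):

#include <stdio.h>

int main(void)
{
    float  f = 8.7f;  /* nearest float  to 8.7 */
    double d = 8.7;   /* nearest double to 8.7 */

    /* Printing well past each type's precision exposes the stored
       binary approximation; printf is not inventing digits here. */
    printf("float : %.20f\n", f);  /* 8.69999980926513671875 */
    printf("double: %.20f\n", d);  /* 8.69999999999999928946 */

    /* f is promoted to double for the comparison, but the two
       approximations differ, so this prints "not equal". */
    printf("%s\n", f == d ? "equal" : "not equal");

    return 0;
}

The float and the double land on different approximations of 8.7, which is exactly why comparing a float against a double holding "the same" constant fails.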

Related

What should printf("%.15e", 1e23); print?

I am experimenting with optimizing double->text conversion (trying to beat Grisu, Ryū, etc.).
While doing so, I am comparing my results with sprintf outputs. Now I have encountered the above interesting case.
printf("%.15e", 1e23);
(e.g. glibc) prints
9.999999999999999e+22
while my routine prints
1.000000000000000e+23
Now both numbers are the same distance from the "true value", and converting both values back (e.g. with atof) yields the same double.
However, I believe my result satisfies the "round to even" rule (which is the reason it went this way).
Which result is more correct?
1e23 is typically not exactly represented as a double.
The 2 closest choices are:
// %a v %f
0x1.52d02c7e14af6p+76 99999999999999991611392.000000
0x1.52d02c7e14af7p+76 100000000000000008388608.000000
They are both 8388608.0 away from 100000000000000000000000.0, one above, one below.
Typically, the one selected in tie cases is the even one (see the last bit in the hex format).
99999999999999991611392.000000 is the winner, and so an output of "9.999999999999999e+22" is expected.
When printing with more than DBL_DIG (15) significant digits ("%.15e" prints 16 significant digits), this problem is possible, as the code is effectively doing a text-double-text round trip and exceeding what double can round-trip.
I am experimenting with optimizing double->text conversion
I recommend also using "%a" to gain a deeper understanding.
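For example, printing the same constant both ways shows which neighbor was selected; a minimal sketch (the commented outputs are what glibc produces, and another C library could legitimately differ):

#include <stdio.h>

int main(void)
{
    double x = 1e23;

    /* The hex form shows the stored bits exactly: with round-to-even,
       1e23 becomes 0x1.52d02c7e14af6p+76, the lower neighbor. */
    printf("%a\n", x);     /* 0x1.52d02c7e14af6p+76 */
    printf("%.15e\n", x);  /* 9.999999999999999e+22 */

    return 0;
}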

Data lost from float assignment to float print

This came up during testing where I have to compare values between actual output and expected output.
Code:
float nf = 584227.4649743827f;
printf("Output: \t %.9f \n", nf);
Output:
Output:   584227.437500000
I clearly have some gaps in my knowledge of C, so could someone explain this situation to me:
Why is there this deviation (0.027474382659420f) in the print?
Is this only a limitation of the print, or is it a limitation of the float data type?
Which value is actually stored in the variable nf?
How should I work with values like this so I don't lose information, like having a deviation of 0.027474382659420f during assignment?
Any other suggestion related to this kind of problem in testing would also be much appreciated.
Why is there this deviation (0.027474382659420f) in the print?
Because float has an accuracy of about 7 digits, and your deviation starts at the seventh digit.
Is this only a limitation of the print, or is it a limitation of the float data type?
It's a limitation of floating point numbers, so it also includes double (although double has higher precision). It also has to do with the conversion between binary and decimal numbers: for instance, 0.2 is a repeating decimal (well, binary) in binary representation, so it is subject to rounding errors too and might actually turn into something like 0.200000000000000011.
Which value is actually stored in the variable nf?
The one that you see printed. The 584227.4649743827f that you specified most likely won't even exist in the binary of your compiled program; it is "translated" to the actually used value during compilation.
How should I work with values like this so I don't lose information, like having a deviation of 0.027474382659420f during assignment?
Use double, which has an accuracy of about 15-17 digits. Also, you need to remove the f from 584227.4649743827f, turning it into a double constant instead. If that is not accurate enough, you may have to use an external library for arbitrary-precision numbers, such as GMP.
Your floating point numbers most likely adhere to the IEEE 754 standard, but there's no guarantee.
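A minimal sketch contrasting the two types (the commented outputs assume IEEE 754; at this magnitude adjacent floats are 0.0625 apart, so the constant snaps to 584227.4375 before the program even runs):

#include <stdio.h>

int main(void)
{
    float  nf = 584227.4649743827f;  /* rounded to the nearest float  */
    double nd = 584227.4649743827;   /* rounded to the nearest double */

    printf("float : %.9f\n", nf);    /* 584227.437500000 */
    printf("double: %.9f\n", nd);    /* 584227.464974383 */

    return 0;
}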

Why do float calculation results differ in C and on my calculator?

I am working on a problem, and the results returned by my C program are not as good as those returned by a simple calculator; not equally precise, to be precise.
On my calculator, when I divide 2000008 by 3, I get 666669.333333
But in my C program, I am getting 666669.312500
This is what I'm doing-
printf("%f\n",2000008/(float)3);
Why are the results different? What should I do to get the same result as the calculator? I tried double, but then it returns the result in a different format. Do I need to go through a conversion? Please help.
See http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html for an in-depth explanation.
In short, floating point numbers are approximations of the real numbers, and there is a limit to the number of digits they can hold. With float, this limit is quite small; with double, it's larger, but still not perfect.
Try
printf("%20.12lf\n",(double)2000008/(double)3);
and you'll see a better, but still not perfect, result. What it boils down to is: you should never assume floating point numbers to be precise. They aren't.
Floating point numbers take a fixed amount of memory and therefore have limited precision. Limited precision means you can't represent all possible real numbers, and that in turn means that some calculations introduce rounding errors. Use double instead of float to gain extra precision, but mind that even a double can't represent everything, even if it's enough for most practical purposes.
Gunthram summarizes it very well in his answer:
What it boils down to is: you should never assume floating point numbers to be precise. They aren't.
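Putting the float and double versions side by side makes both points visible; a minimal sketch (the commented outputs assume IEEE 754, and the trailing digits on the last line are the double's rounding error made visible):

#include <stdio.h>

int main(void)
{
    /* float keeps ~7 significant digits: the quotient is rounded to
       the nearest representable float, 666669.3125. */
    printf("%f\n", 2000008 / (float)3);        /* 666669.312500 */

    /* double keeps ~15-17 significant digits: still not exact, but
       the error is now invisible at %f's six decimal places. */
    printf("%f\n", 2000008 / (double)3);       /* 666669.333333 */
    printf("%20.12f\n", 2000008 / (double)3);  /* 666669.333333333372 */

    return 0;
}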

C: Adding Exponentials

What I thought was a trivial addition in standard C code compiled by GCC has confused me somewhat.
If I have a double called A and also a double called B, where A is a very small value, say 1e-20, and B is a larger value, for example 1e-5, why does my double C, which equals the summation A+B, take on the dominant value B? I was hoping that when I specified printing to 25 decimal places I would get 1.00000000000000100000e-5.
Instead what I get is just 1.00000000000000000000e-5. Do I have to use long double or something else?
Very confused, and an easy question for most to answer I'm sure! Thanks for any guidance in advance.
Yes: there is not enough precision in the double mantissa. 2^53 (the precision of the double mantissa) is only slightly larger than 10^15 (the ratio between 1e-5 and 1e-20), so binary expansion and round-off can easily squash small bits at the end.
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Google is your friend etc.
Floating point variables can hold a bigger range of values than fixed point ones, but their precision in significant digits has limits.
You can represent very big or very small numbers, but the precision depends on the number of significant digits.
If you try to perform an operation between numbers that are very far apart in terms of the exponent used to express them, the ability to work with them depends on the ability to represent them with the same exponent.
In your case, when you try to sum the two numbers, the smaller number is matched in exponent with the bigger one, resulting in a 0, because its significant digits fall out of range.
You can learn more, for example, on Wikipedia.
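To make the exponent matching concrete, here is a minimal sketch. Note that the question's pair (ratio 1e15) sits right at the edge of double's roughly 16 significant digits, so whether a trace of A survives depends on the exact values; the second pair (ratio 1e20, far beyond 2^53) is absorbed completely:

#include <stdio.h>

int main(void)
{
    double a = 1e-20, b = 1e-5;

    /* Ratio 1e15: right at the edge of double precision; inspect
       the result on your platform. */
    printf("%.20e\n", a + b);

    /* Ratio 1e20: a is smaller than half an ulp of 1.0, so it is
       rounded away entirely and the sum compares equal to 1.0. */
    printf("%.20e\n", 1e-20 + 1.0);      /* 1.00000000000000000000e+00 */
    printf("%d\n", 1e-20 + 1.0 == 1.0);  /* 1 (true) */

    return 0;
}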

Trouble with double truncation and math in C

I'm making a function that fits balls into boxes. The code that computes the number of balls that can fit on each side of the box is below. Assume that the balls fit together as if they were cubes. I know this is not the optimal way, but just go with it.
The problem for me is that although I get numbers like 4.0000000*4.0000000*2.000000, the product is 31 instead of 32. What's going on?
Two additional things. This error only happens when the optimal side length is reached; for example, when the side length is 12.2, the box thickness is .1, and the ball radius is 1.5, exactly 4 balls fit on that side. If I DON'T cast to an int, it works out, but if I do cast to an int, I get the aforementioned error (31 instead of 32). Also, the print line runs once if the side length is optimal but twice if it's not. I don't know what that means.
double ballsFit(double r, double l, double w, double h, double boxthick)
{
    double ballsInL, ballsInW, ballsInH;
    int ballsinbox;

    ballsInL = (int)((l - (2 * boxthick)) / (r * 2));
    ballsInW = (int)((w - (2 * boxthick)) / (r * 2));
    ballsInH = (int)((h - (2 * boxthick)) / (r * 2));
    ballsinbox = (ballsInL * ballsInW * ballsInH);
    printf("LENGTH=%f\nWidth=%f\nHight=%f\nBALLS=%d\n", ballsInL, ballsInW, ballsInH, ballsinbox);
    return ballsinbox;
}
The fundamental problem is that floating-point math is inexact.
For example, the number 0.1 -- that you mention as the value of thickness in the problematic example -- cannot be represented exactly as a double. When you assign 0.1 to a variable, what gets stored is an approximation of 0.1.
I recommend that you read What Every Computer Scientist Should Know About Floating-Point Arithmetic.
although I get numbers like 4.0000000*4.0000000*2.000000, the product is 31 instead of 32. What's going on?
It is almost certainly the case that the multiplicands (at least some of them) are not what they look like. If they were exactly 4.0, 4.0 and 2.0, their product would be exactly 32.0. If you printed out all the digits that the doubles are capable of representing, I am pretty sure you'd see lots of 9s, as in 3.99999999999... etc. As a consequence, the product is a tiny bit less than 32. The double-to-int conversion simply chops off the fractional part, so you end up with 31.
Of course, you don't always get numbers that are less than what they would be if the computation were exact; you can also get numbers that are greater than what you might expect.
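The gap between what %f shows and what the cast sees is easy to reproduce without the box dimensions. A minimal sketch using the classic pair 0.3/0.1 (when stored as doubles, 0.3 rounds down and 0.1 rounds up, so the quotient lands just under 3):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double n = 0.3 / 0.1;

    printf("%f\n", n);       /* 3.000000 : %f rounds to 6 decimals   */
    printf("%.17g\n", n);    /* 2.9999999999999996 : the real value  */
    printf("%d\n", (int)n);  /* 2 : the cast truncates, never rounds */

    /* If the result should be a whole count up to rounding noise,
       round to nearest instead of truncating. */
    printf("%d\n", (int)round(n));  /* 3 */

    return 0;
}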
Fixed-precision floating-point numbers, such as the IEEE 754 numbers commonly used in modern computers, cannot represent all decimal numbers accurately, much like 1/3 cannot be represented accurately in decimal.
For example 0.1 can be something along the lines of 0.100000000000000004... when converted to binary and back. The difference is small, but significant.
I have occasionally managed to (partially) deal with such issues by using extended or arbitrary precision arithmetic to maintain a degree of precision while computing and then down-converting to double for the final results. There is usually a noticeable drop in performance, but IMHO correctness is infinitely more important.
I recently used algorithms from the high-precision arithmetic libraries listed here with good results on both the precision and performance fronts.
