Trouble with double truncation and math in C

I'm writing a function that fits balls into boxes. The code that computes the number of balls that can fit along each side of the box is below. Assume that the balls pack together as if they were cubes. I know this is not the optimal way, but just go with it.
The problem is that although I get numbers like 4.0000000*4.0000000*2.000000, the product is 31 instead of 32. What's going on?
Two additional things. First, this error only happens when the side length is optimal (an exact fit); for example, when the side length is 12.2, the box thickness is 0.1 and the ball radius is 1.5, exactly 4 balls fit along that side. If I don't cast to int, it works out, but if I do cast to int, I get the aforementioned error (31 instead of 32). Second, the print line runs once if the side length is optimal but twice if it's not. I don't know what that means.
double ballsFit(double r, double l, double w, double h, double boxthick)
{
    double ballsInL, ballsInW, ballsInH;
    int ballsinbox;
    ballsInL = (int)((l - (2 * boxthick)) / (r * 2));
    ballsInW = (int)((w - (2 * boxthick)) / (r * 2));
    ballsInH = (int)((h - (2 * boxthick)) / (r * 2));
    ballsinbox = (ballsInL * ballsInW * ballsInH);
    printf("LENGTH=%f\nWidth=%f\nHight=%f\nBALLS=%d\n", ballsInL, ballsInW, ballsInH, ballsinbox);
    return ballsinbox;
}

The fundamental problem is that floating-point math is inexact.
For example, the number 0.1 -- that you mention as the value of thickness in the problematic example -- cannot be represented exactly as a double. When you assign 0.1 to a variable, what gets stored is an approximation of 0.1.
I recommend that you read What Every Computer Scientist Should Know About Floating-Point Arithmetic.
"although I get numbers like 4.0000000*4.0000000*2.000000 the product is 31 instead of 32. whats going on??"
It is almost certainly the case that the multiplicands (at least some of them) are not what they look like. If they were exactly 4.0, 4.0 and 2.0, their product would be exactly 32.0. If you printed out all the digits that the doubles are capable of representing, I am pretty sure you'd see lots of 9s, as in 3.99999999999... etc. As a consequence, the product is a tiny bit less than 32. The double-to-int conversion simply chops off the fractional part, so you end up with 31.
Of course, you don't always get numbers that are less than what they would be if the computation were exact; you can also get numbers that are greater than what you might expect.
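As a rough illustration of the point above, one common workaround is to nudge the quotient by a small tolerance before truncating, so that a value like 3.9999999999999996 still counts as 4. This is only a sketch: the helper names `balls_along_side` and `balls_fit_int` and the tolerance `1e-9` are assumptions, not part of the original code, and a suitable tolerance depends on the sizes involved.

```c
/* Hypothetical helper (not in the original code): number of balls of
   radius r that fit along one side of length `side`, after removing
   both walls of thickness `wall`. Assumes side >= 2 * wall and r > 0. */
int balls_along_side(double side, double wall, double r)
{
    const double eps = 1e-9;  /* assumed tolerance; tune for your data */
    double ratio = (side - 2.0 * wall) / (2.0 * r);

    /* Nudge up before truncating so 3.9999999999999996 becomes 4. */
    return (int)(ratio + eps);
}

/* Multiply integer counts, so the product itself is exact. */
int balls_fit_int(double r, double l, double w, double h, double wall)
{
    return balls_along_side(l, wall, r)
         * balls_along_side(w, wall, r)
         * balls_along_side(h, wall, r);
}
```

With the values from the question, `balls_along_side(12.2, 0.1, 1.5)` gives 4, and a 12.2 x 12.2 x 6.2 box with the same walls and radius gives 32.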

Fixed-precision floating-point numbers, such as the IEEE 754 numbers commonly used in modern computers, cannot represent all decimal numbers exactly, much like 1/3 cannot be represented exactly in decimal.
For example, 0.1 becomes something along the lines of 0.1000000000000000055... when converted to binary and back. The difference is small, but significant.
I have occasionally managed to (partially) deal with such issues by using extended or arbitrary precision arithmetic to maintain a degree of precision while computing and then down-converting to double for the final results. There is usually a noticeable drop in performance, but IMHO correctness is infinitely more important.
I recently used algorithms from the high-precision arithmetic libraries listed here with good results on both the precision and performance fronts.
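To see the stored approximation directly, you can print a double with 17 significant digits, which is enough to tell any two different IEEE 754 doubles apart. A small sketch (the helper name `format_full` is made up for illustration):

```c
#include <stdio.h>

/* Hypothetical helper: writes x into buf with 17 significant digits
   and returns the number of characters snprintf produced. */
int format_full(double x, char *buf, size_t size)
{
    return snprintf(buf, size, "%.17g", x);
}
```

Calling `format_full(0.1, buf, sizeof buf)` leaves `0.10000000000000001` in `buf`, while 0.5, which is exactly representable in binary, comes out as plain `0.5`.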

Related

floating point inaccuracies in c

I know floating point values are limited in the numbers they can express accurately, and I have found many sites that describe why this happens. But I have not found any information on how to deal with this problem efficiently. I'm sure NASA isn't OK with 0.2/0.1 = 1.999998. Example:
#include <stdio.h>
#include <float.h>
#include <math.h>
int main(void)
{
    float number = 4.20;
    float denominator = 0.25;
    printf("number = %f\n", number);
    printf("denominator = %f\n", denominator);
    printf("quotient as a float = %f should be 16.8\n", number/denominator);
    printf("the remainder of 4.20 / 0.25 = %f\n", number - ((int) number/denominator)*denominator);
    printf("now if i divide 0.20000 by 0.1 i get %f not 2\n", ( number - ((int) number/denominator)*denominator)/0.1);
}
output:
number = 4.200000
denominator = 0.250000
quotient as a float = 16.799999 should be 16.8
the remainder of 4.20 / 0.25 = 0.200000
now if i divide 0.20000 by 0.1 i get 1.999998 not 2
So how do I do arithmetic with floats (or decimals or doubles) and get accurate results? I hope I haven't just missed something super obvious. Any help would be awesome! Thanks.
The solution is to not use floats for applications where you can't accept roundoff errors. Use an extended precision library (a.k.a. arbitrary precision library) like GNU MP Bignum. See this Wikipedia page for a nice list of arbitrary-precision libraries. See also the Wikipedia article on rational data types and this thread for more info.
If you are going to use floating point representations (float, double, etc.) then write code using accepted methods for dealing with roundoff errors (e.g., avoiding ==). There's lots of on-line literature about how to do this and the methods vary widely depending on the application and algorithms involved.
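One of the accepted methods hinted at above is to compare with a tolerance instead of `==`. The sketch below combines a relative and an absolute tolerance; the function name `nearly_equal` and both tolerance parameters are illustrative choices, and the right values depend on the application.

```c
/* Absolute value without pulling in <math.h>. */
static double abs_d(double x)
{
    return x < 0 ? -x : x;
}

/* Hypothetical helper: returns 1 if a and b differ by at most abs_tol,
   or by at most rel_tol times the larger of their magnitudes. */
int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff = abs_d(a - b);
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);

    return diff <= abs_tol || diff <= rel_tol * scale;
}
```

For instance, `nearly_equal(4.20f / 0.25f, 16.8, 1e-6, 0.0)` is true even though the float quotient prints as 16.799999. The absolute tolerance matters mainly when both values are expected to be near zero, where a relative test alone becomes too strict.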
Floating point is pretty fine, most of the time. Here are the key things I try to keep in mind:
There's really a big difference between float and double. double gives you enough precision for most things, most of the time; float surprisingly often gives you not enough. Unless you know what you're doing and have a really good reason, just always use double.
There are some things that floating point is not good for. Although C doesn't support it natively, fixed point is often a good alternative. You're essentially using fixed point if you do your financial calculations in cents rather than dollars -- that is, if you use an int or a long int representing pennies, and remember to put a decimal point two places from the right when it's time to print out as dollars.
The algorithm you use can really matter. Naïve or "obvious" algorithms can easily end up magnifying the effects of roundoff error, while more sophisticated algorithms minimize them. One simple example is that the order you add up floating-point numbers can matter.
Never worry about 16.8 versus 16.799999. That sort of thing always happens, but it's not a problem, unless you make it a problem. If you want one place past the decimal, just print it using %.1f, and printf will round it for you. (Also don't try to compare floating-point numbers for exact equality, but I assume you've heard that by now.)
Related to the above, remember that 0.1 is not representable exactly in binary (just as 1/3 is not representable exactly in decimal). This is just one of many reasons that you'll always get what look like tiny roundoff "errors", even though they're perfectly normal and needn't cause problems.
Occasionally you need a multiple precision (MP or "bignum") library, which can represent numbers to arbitrary precision, but these are (relatively) slow and (relatively) cumbersome to use, and fortunately you usually don't need them. But it's good to know they exist, and if you're a math nerd they can be a lot of fun to use.
Occasionally a library for representing rational numbers is useful. Such a library represents, for example, the number 1/3 as the pair of numbers (1, 3), so it doesn't have the inaccuracies inherent in trying to represent that number as 0.333333333.
Others have recommended the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, which is very good, and the standard reference, although it's long and fairly technical. An easier and shorter read I can recommend is this handout from a class I used to teach: https://www.eskimo.com/~scs/cclass/handouts/sciprog.html#precision . This is a little dated by now, but it should get you started on the basics.
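On the point above about algorithms and the order of addition: one well-known example of a more careful algorithm is compensated (Kahan) summation, which carries the rounding error of each addition into the next one. The answer does not name this technique; it is shown here only as an illustration, and the function names are made up. Note that aggressive options such as `-ffast-math` can optimize the correction away.

```c
#include <stddef.h>

/* Straightforward left-to-right summation. */
double naive_sum(const double *xs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum;
}

/* Compensated (Kahan) summation: `c` tracks the low-order bits lost
   by each addition and feeds them back into the next one. */
double kahan_sum(const double *xs, size_t n)
{
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = xs[i] - c;  /* apply the correction from the last step */
        double t = sum + y;    /* low-order bits of y may be lost here */
        c = (t - sum) - y;     /* what was actually added, minus y */
        sum = t;
    }
    return sum;
}
```

On a typical IEEE 754 platform, for the inputs 1.0, 1e-16, 1e-16 the naive loop returns exactly 1.0 because each tiny term is rounded away, while the compensated version returns a value slightly above 1.0.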
There isn't a good answer, and it's often a problem.
If data is integral, e.g. amounts of money in cents, then store it as integers, which can mean a double that is constrained to hold an integer number of cents rather than a rational number of dollars. But that only helps in a few circumstances.
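A minimal sketch of the cents idea, assuming non-negative amounts that fit in a `long long`. The type name `cents_t` and both function names are invented for this example.

```c
#include <stdio.h>

/* Hypothetical money type: an exact count of cents. */
typedef long long cents_t;

/* Integer addition is exact, so ten dimes are exactly one dollar. */
cents_t sum_cents(const cents_t *items, size_t n)
{
    cents_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += items[i];
    return total;
}

/* Formats a non-negative amount as dollars, e.g. 12345 -> "123.45".
   Returns the number of characters snprintf produced. */
int format_cents(cents_t amount, char *buf, size_t size)
{
    return snprintf(buf, size, "%lld.%02lld", amount / 100, amount % 100);
}
```

The decimal point only appears at print time; all arithmetic stays in whole cents.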
As a general rule, you get inaccuracies when trying to divide by numbers that are close to zero. So you just have to write the algorithms to avoid or suppress such operations. There are lots of discussions of "numerically stable" versus "unstable" algorithms and it's too big a subject to do justice to it here. And then, usually, it's best to treat floating point numbers as though they have small random errors. If they ultimately represent measurements of analogue values in the real world, there must be a certain tolerance or inaccuracy in them anyway.
If you are doing maths rather than processing data, simply don't use C or C++. Use a symbolic algebra package such as Maple, which stores values such as sqrt(2) as an expression rather than a floating point number, so sqrt(2) * sqrt(2) will always give exactly 2, rather than a number very close to 2.

Why do float calculation results differ in C and on my calculator?

I am working on a problem, and the results returned by my C program are less precise than those returned by a simple calculator.
On my calculator, when I divide 2000008 by 3, I get 666669.333333
But in my C program, I am getting 666669.312500
This is what I'm doing-
printf("%f\n",2000008/(float)3);
Why are the results different? What should I do to get the same result as the calculator? I tried double, but then it returns the result in a different format. Do I need to go through some conversion? Please help.
See http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html for an in-depth explanation.
In short, floating point numbers are approximations to the real numbers, and they have a limit on digits they can hold. With float, this limit is quite small, with doubles, it's more, but still not perfect.
Try
printf("%20.12lf\n",(double)2000008/(double)3);
and you'll see a better, but still not perfect result. What it boils down to is, you should never assume floating point numbers to be precise. They aren't.
Floating point numbers take a fixed amount of memory and therefore have a limited precision. Limited precision means you can't represent all possible real numbers, and that in turn means that some calculations result in rounding errors. Use double instead of float to gain extra precision, but mind you that even a double can't represent everything even if it's enough for most practical purposes.
Gunthram summarizes it very well in his answer:
What it boils down to is, you should never assume floating point numbers to be precise. They aren't.
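To make the float versus double difference concrete, here is a small sketch of the same division done in both types. The function names are illustrative only.

```c
/* The division from the question, carried out in single precision. */
float divide_as_float(int a, int b)
{
    return (float)a / (float)b;
}

/* The same division carried out in double precision. */
double divide_as_double(int a, int b)
{
    return (double)a / (double)b;
}
```

With IEEE 754 types, `divide_as_float(2000008, 3)` is exactly 666669.3125, the value seen in the question, because floats near 666669 are spaced 0.0625 apart. `divide_as_double(2000008, 3)` agrees with the calculator to six decimal places, and the plain `%f` format works for it too, since a float argument to printf is promoted to double anyway.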

C: Adding Exponentials

What I thought was a trivial addition in standard C code compiled by GCC has confused me somewhat.
If I have a double called A and a double called B, where A is a very small value, say 1e-20, and B is a larger value, for example 1e-5, why does my double C, which equals the sum A+B, take on the dominant value B? I was hoping that when I print to 25 decimal places I would get 1.00000000000000100000e-5.
Instead what I get is just 1.00000000000000000000e-5. Do I have to use long double or something else?
Very confused, and an easy question for most to answer I'm sure! Thanks for any guidance in advance.
Yes, the precision of the double mantissa is the limit here. 2^53 (the precision of the double mantissa) is about 9 x 10^15, less than ten times 10^15 (the ratio between 10^-5 and 10^-20), so only the last few bits of the sum are left to hold the smaller term, and binary expansion and round-off can easily squash them.
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Google is your friend etc.
Floating-point variables can hold a wider range of values than fixed-point ones, but their precision is limited to a fixed number of significant digits.
You can represent very big or very small numbers, but the precision depends on the number of significant digits.
If you operate on numbers whose exponents are very far apart, the ability to work with them depends on being able to represent them with the same exponent.
In your case, when you add the two numbers, the smaller number is aligned to the exponent of the bigger one, and it becomes 0 because its significant digits fall out of range.
You can learn more on Wikipedia, for example.
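A quick way to check whether a small addend survives at all is to compare the sum against the larger operand. This is a sketch with a made-up function name, assuming IEEE 754 doubles.

```c
/* Hypothetical helper: returns 1 if adding `tiny` to `big` leaves
   `big` unchanged, i.e. the smaller term was rounded away entirely. */
int is_absorbed(double big, double tiny)
{
    /* volatile forces the sum to be rounded to double before comparing */
    volatile double sum = big + tiny;
    return sum == big;
}
```

For example, `is_absorbed(1.0, 1e-20)` is true, because 1e-20 is far below the spacing between doubles near 1.0 (about 2.2e-16). When the operands are only 15 decimal orders of magnitude apart, as with 1e-5 and 1e-20, the small term still changes the last few bits of the sum, but most of its digits are lost.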

Why can't I multiply a float? [duplicate]

Possible duplicate: Dealing with accuracy problems in floating-point numbers
I was quite surprised when I tried to multiply a float in C (with GCC 3.2) and it did not do what I expected. As a sample:
#include <stdio.h>

int main(void) {
    float nb = 3.11f;
    nb *= 10;
    printf("%f\n", nb);
    return 0;
}
Displays: 31.099998
I am curious regarding the way floats are implemented and why it produces this unexpected behavior?
First off, you can multiply floats. The problem you have is not the multiplication itself, but the original number you've used. Multiplication can lose some precision, but here the original number you've multiplied started with lost precision.
This is actually expected behavior. floats are implemented using a binary representation, which means they can't represent most decimal fractions exactly.
See MSDN for more information.
You can also see in the description of float that it has 6-7 significant digits accuracy. In your example if you round 31.099998 to 7 significant digits you will get 31.1 so it still works as expected here.
The double type would of course be more accurate, but it still has rounding error due to its binary representation, while the number you wrote is decimal.
If you want complete accuracy for decimal numbers, you should use a decimal type. This type exists in languages like C#. http://msdn.microsoft.com/en-us/library/system.decimal.aspx
You can also use a rational-number representation: two integers give you complete accuracy as long as the number can be expressed as a quotient of two integers.
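A minimal sketch of such a rational type follows. The struct and function names are invented for this example; it assumes a nonzero denominator and ignores integer overflow, which a real library would have to handle.

```c
/* Hypothetical rational type: num / den, stored in lowest terms. */
typedef struct {
    long long num;
    long long den;  /* kept positive by rational_make */
} rational;

/* Greatest common divisor of |a| and |b| (Euclid's algorithm). */
static long long gcd_ll(long long a, long long b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Builds a reduced fraction; assumes den != 0. */
rational rational_make(long long num, long long den)
{
    long long g = gcd_ll(num, den);  /* nonzero because den != 0 */
    rational r;
    if (den < 0) {
        num = -num;
        den = -den;
    }
    r.num = num / g;
    r.den = den / g;
    return r;
}

/* Exact addition: a/b + c/d = (a*d + c*b) / (b*d). */
rational rational_add(rational a, rational b)
{
    return rational_make(a.num * b.den + b.num * a.den, a.den * b.den);
}
```

Adding `rational_make(1, 3)` to itself three times yields exactly 1/1, with no rounding at any step.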
This is working as expected. Computers have finite precision, because they represent floating-point values with a fixed number of bits. This leads to floating-point inaccuracies.
The Floating point wikipedia page goes into far more detail on the representation and resulting accuracy problems than I could here :)
Interesting real-world side-note: this is partly why a lot of money calculations are done using integers (cents) - don't let the computer lose money with lack of precision! I want my $0.00001!
The number 3.11 cannot be represented exactly in binary with a finite number of bits. The closest you can get with 24 significant bits is 11.0001110000101000111101, which works out to 3.1099998950958251953125 in decimal.
If your number 3.11 is supposed to represent a monetary amount, then you need to use a decimal representation.
In the Python communities we often see people surprised at this, so there are well-tested-and-debugged FAQs and tutorial sections on the issue (of course they're phrased in terms of Python, not C, but since Python delegates float arithmetic to the underlying C and hardware anyway, all the descriptions of float's mechanics still apply).
It's not the multiplication's fault, of course -- remove the statement where you multiply nb and you'll see similar issues anyway.
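A short sketch of that observation, assuming IEEE 754 single precision for float: the value is already off as soon as the literal is stored, before any multiplication. The function names are illustrative.

```c
/* Returns the value actually stored for the literal 3.11f, widened to
   double so it can be printed or compared without further rounding. */
double stored_value(void)
{
    float nb = 3.11f;
    return (double)nb;
}

/* The multiplication from the question, rounded back to float. */
float times_ten(float x)
{
    return x * 10;
}
```

Here `stored_value()` compares equal to 3.1099998950958251953125, the closest 24-bit value quoted above, and `times_ten(3.11f)` lands just below 31.1, matching the 31.099998 that was printed.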
From the Wikipedia article:
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
Floating-point values are not precise because they use base 2 (binary: each digit is either 0 or 1) instead of base 10. As many have stated before, converting between base 10 and base 2 causes rounding issues.

How can I introduce a small number with a lot of significant figures into a C program?

I'm not particularly knowledgeable about programming and I'm trying to figure out how to get a precise value calculated in a C program. I need a constant to the power of negative 7, with 5 significant figures. Any suggestions (keeping in mind I know very little, have never programmed in anything but C and only during required courses that I took years ago at school)?
Thanks!
You can get high-precision math from specialized libraries, but if all you need is 5 significant digits then the built-in float and double types will do fine. Let's go with double for maximum precision.
The negative 7th power is just 1 over your number to the 7th power, so...
double k = 1.2345678; // your constant, whatever it is
double ktominus7 = 1.0 / (k * k * k * k * k * k * k);
...and that's it!
If you want to print out the value, you can do something like
printf("My number is: %9.5g\n", ktominus7);
For a constant value, the required calculation is going to be constant too. So, I recommend you calculate the value using your [desktop calculator / MATLAB / other] then hard-code it in your C code.
In the realm of computer floating-point formats, five significant digits is not a lot. The 32-bit IEEE-754 floating-point type used for float in most implementations of C has 24 bits of precision, which is about 7.2 decimal digits. So you can just use floating-point with no fear. double usually has 53 bits of precision (almost 16 decimal digits). Carl Smotricz's answer is fine, but there's also a pow function in C that you can pass -7.0 to.
There are times when you have to be careful about numerical analysis of your algorithm to ensure you aren't losing precision with intermediate results, but this doesn't sound like one of them.
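For completeness, here is a small sketch that generalizes the repeated multiplication to any non-negative integer exponent. The helper name `inverse_power` is made up; the standard `pow` function from `<math.h>` mentioned above does the same job for arbitrary exponents.

```c
/* Hypothetical helper: computes base^(-n) as 1 / (base * ... * base).
   Assumes base != 0. */
double inverse_power(double base, unsigned n)
{
    double p = 1.0;
    for (unsigned i = 0; i < n; i++)
        p *= base;
    return 1.0 / p;
}
```

For example, `inverse_power(2.0, 7)` is exactly 0.0078125 (1/128), and the result can be printed to five significant digits with the `%.5g` format.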
long double offers the best precision of the built-in types in most cases, and can be statically allocated and re-used to keep waste to a minimum. See also quadruple precision (IEEE 754 binary128). The size and precision of both vary from platform to platform. In quadruple precision, the leftmost bit is still the sign bit, and the next 15 bits hold the exponent. If the links provided aren't enough, they all lead back to long double :)
Simple shifting should take care of the rest, if I understand you correctly?
You can use a log transform to turn small numbers into larger ones and do your math on the log-transformed values. It's kind of tricky, but it will work most of the time. You can also switch to Python, which makes this easier to work around with its arbitrary-precision integers and its decimal and fractions modules (its float is still a C double).

Resources