Why can't I multiply a float? [duplicate] - c

Possible Duplicate:
Dealing with accuracy problems in floating-point numbers
I was quite surprised when I tried to multiply a float in C (with GCC 3.2) and it did not do what I expected. As a sample:
#include <stdio.h>

int main(void) {
    float nb = 3.11f;
    nb *= 10;
    printf("%f\n", nb);
    return 0;
}
Displays: 31.099998
I am curious about how floats are implemented, and why this produces such unexpected behavior.

First off, you can multiply floats. The problem is not the multiplication itself, but the original number you used: it had already lost precision the moment it was stored.
This is actually expected behavior. floats are implemented using a binary representation, which means they can't represent most decimal values exactly.
See MSDN for more information.
You can also see in the description of float that it has 6 to 7 significant digits of accuracy. In your example, if you round 31.099998 to 7 significant digits you get 31.1, so it still works as expected here.
The double type would of course be more accurate, but it still has rounding error due to its binary representation, while the number you wrote is decimal.
If you want complete accuracy for decimal numbers, you should use a decimal type. This type exists in languages like C#. http://msdn.microsoft.com/en-us/library/system.decimal.aspx
You can also use a rational-number representation: a pair of integers gives you complete accuracy, as long as the number can be expressed as a ratio of two integers.
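A minimal sketch of that idea in C (hypothetical type and function names; no overflow handling or fraction reduction):

#include <stdio.h>

/* A rational number stored as a pair of integers: num/den. */
typedef struct {
    long num;
    long den;
} Rational;

/* Exact multiplication: (a/b) * (c/d) = (a*c)/(b*d). */
static Rational rat_mul(Rational a, Rational b) {
    return (Rational){ a.num * b.num, a.den * b.den };
}

int main(void) {
    Rational nb = { 311, 100 };  /* exactly 3.11 */
    Rational r = rat_mul(nb, (Rational){ 10, 1 });
    printf("%ld/%ld = %f\n", r.num, r.den, (double)r.num / r.den);  /* 3110/100 = 31.100000 */
    return 0;
}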

This is working as expected. Computers have finite precision: a float stores only a fixed number of binary digits, so most decimal values can only be approximated. This leads to floating point inaccuracies.
The Floating point wikipedia page goes into far more detail on the representation and resulting accuracy problems than I could here :)
Interesting real-world side-note: this is partly why a lot of money calculations are done using integers (cents) - don't let the computer lose money with lack of precision! I want my $0.00001!

The number 3.11 cannot be represented exactly in binary. The closest you can get with 24 significant bits is 11.0001110000101000111101, which works out to 3.1099998950958251953125 in decimal.
If your number 3.11 is supposed to represent a monetary amount, then you need to use a decimal representation.
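You can check the closest-value claim yourself by asking printf for more digits than its default six:

#include <stdio.h>

int main(void) {
    float nb = 3.11f;
    printf("%.25f\n", nb);       /* 3.1099998950958251953125000 */
    printf("%.25f\n", nb * 10);  /* the error was there before the multiplication */
    return 0;
}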

In the Python communities we often see people surprised at this, so there are well-tested-and-debugged FAQs and tutorial sections on the issue (of course they're phrased in terms of Python, not C, but since Python delegates float arithmetic to the underlying C and hardware anyway, all the descriptions of float's mechanics still apply).
It's not the multiplication's fault, of course -- remove the statement where you multiply nb and you'll see similar issues anyway.

From the Wikipedia article:
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.

Floating-point numbers are not exact because they use base 2 (binary: every digit is either 0 or 1) instead of base 10, and converting between base 2 and base 10, as many have stated before, causes rounding issues.

Related

Is there a C equivalent of C#'s decimal type?

In relation to: Convert Decimal to Double
Now, I've come across many questions about C#'s floating-point type called decimal, and I've seen how it differs from both float and double, which got me thinking: is there an equivalent to this type in C?
In the question specified, I got an answer I want to convert to C:
double trans = trackBar1.Value / 5000.0;
double trans = trackBar1.Value / 5000d;
Of course, the only change is that the second line is gone, but given what I've read about the decimal type, I want to know its C equivalent.
Question: What is the C equivalent of C#'s decimal?
C2X (since published as C23) standardizes decimal floating point as _DecimalN (_Decimal32, _Decimal64, _Decimal128): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2573.pdf
In addition, GCC implements decimal floating point as an extension; it currently supports 32-bit, 64-bit, and 128-bit decimal floats.
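A sketch of what that looks like with a GCC build that enables decimal floating point (e.g. on x86 GNU/Linux); note that printf has no portable conversion specifier for these types yet, so the value is converted to double just for display:

#include <stdio.h>

int main(void) {
    _Decimal64 nb = 3.11DD;  /* the DD suffix marks a _Decimal64 literal */
    nb *= 10;                /* exact in decimal arithmetic: 31.1 */
    printf("%f\n", (double)nb);  /* prints 31.100000 */
    return 0;
}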
Edit: much of what I said below is just plain wrong, as pointed out by phuclv in the comments. Even then, I think there's valuable information to be gained in reading that answer, so I'll leave it unedited below.
So in short: yes, there is support for Decimal floating-point values and arithmetic in the standard C language. Just check out phuclv's comment and S.S. Anne's answer.
In the C programming language, as others have commented, there's no such thing as a Decimal type, nor are there types implemented like it. The simplest type that comes close is double, which is most commonly implemented as an IEEE-754 compliant 64-bit floating-point type: a 1-bit sign, an 11-bit exponent and a 52-bit mantissa/fraction (Wikipedia has a diagram of this layout).
So you have the following format:
value = (-1)^sign * 1.fraction * 2^(exponent - 1023)
A more detailed explanation can be read here, but you can see that the exponent part is a power of two, which means there will be imprecision when dealing with division and multiplication by ten. A simple explanation is that dividing by anything that isn't a power of two is sure to repeat digits indefinitely in base 2. Example: 1/10 = 0.1 (in base 10) = 0.00011001100110011... (in base 2). And, because computers can't store an unlimited number of digits, your operations will have to be truncated/approximated.
In the case of C#'s Decimal, from the documentation:
The binary representation of a Decimal number consists of a 1-bit sign, a 96-bit integer number, and a scaling factor used to divide the integer number and specify what portion of it is a decimal fraction.
This last part is important, because instead of a multiplication by a power of two, it is a division by a power of ten. So you have the following format:
value = (-1)^sign * integer / 10^scale, where scale ranges from 0 to 28
Which, as you can clearly see, is a completely different implementation from above!
For instance, if you wanted to divide by a power of 10, you could do that exactly, because that just involves increasing the scale. You have to be aware of the limits of what Decimal can represent, though: its maximum is a measly 7.922816251426434e+28, whereas double can go up to about 1.79769e+308.
Given that there are no equivalents (yet) in C to Decimal, you may wonder "what do I do?". Well, it depends. First off, is it really important for you to use a Decimal type? Can't you use a double? To answer that question, it's helpful to know why that type was created in the first place. Again, from Microsoft's documentation:
The Decimal value type is appropriate for financial calculations that require large numbers of significant integral and fractional digits and no round-off errors
And in the very next sentence:
The Decimal type does not eliminate the need for rounding. Rather, it minimizes errors due to rounding
So you shouldn't think of Decimal as having "infinite precision", just as being a more appropriate type for calculations that generally need to be made in the decimal system(such as financial ones, as stated above).
If you still want a Decimal data type in C, you'd have to develop a library supporting addition, subtraction, multiplication, and so on, called through ordinary functions, because C doesn't support operator overloading. It also wouldn't have hardware support (e.g. from the x64 instruction set), so all of your operations would be slower than those on double, for example. Finally, if you want something that supports a Decimal in other languages, you may look into the Decimal TR for C++.
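To give a feel for what such a library involves, here is a minimal sketch of the coefficient-plus-scale idea (hypothetical names; no overflow checking or normalization):

#include <stdint.h>
#include <stdio.h>

/* value = coeff / 10^scale, like C#'s Decimal but only 64 bits wide */
typedef struct {
    int64_t coeff;
    int scale;
} Dec;

static int64_t pow10i(int n) {
    int64_t p = 1;
    while (n-- > 0) p *= 10;
    return p;
}

/* Align the scales, then add the scaled integers exactly. */
static Dec dec_add(Dec a, Dec b) {
    if (a.scale < b.scale) { a.coeff *= pow10i(b.scale - a.scale); a.scale = b.scale; }
    if (b.scale < a.scale) { b.coeff *= pow10i(a.scale - b.scale); b.scale = a.scale; }
    return (Dec){ a.coeff + b.coeff, a.scale };
}

int main(void) {
    Dec x = { 311, 2 };                 /* 3.11 */
    Dec y = dec_add(x, (Dec){ 1, 1 });  /* + 0.1 = 3.21, exactly */
    printf("%lld.%02lld\n", (long long)(y.coeff / 100), (long long)(y.coeff % 100));
    return 0;
}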
As others have pointed out, there's nothing in the C standard(s) like .NET's decimal, but if you're working on Windows and have the Windows SDK, it's defined:
DECIMAL structure (wtypes.h)
Represents a decimal data type that provides a sign and scale for a
number (as in coordinates.)
Decimal variables are stored as 96-bit (12-byte) unsigned integers
scaled by a variable power of 10. The power of 10 scaling factor
specifies the number of digits to the right of the decimal point, and
ranges from 0 to 28.
typedef struct tagDEC {
    USHORT wReserved;
    union {
        struct {
            BYTE scale;
            BYTE sign;
        } DUMMYSTRUCTNAME;
        USHORT signscale;
    } DUMMYUNIONNAME;
    ULONG Hi32;
    union {
        struct {
            ULONG Lo32;
            ULONG Mid32;
        } DUMMYSTRUCTNAME2;
        ULONGLONG Lo64;
    } DUMMYUNIONNAME2;
} DECIMAL;
DECIMAL is used to represent an exact numeric value with a fixed precision and fixed scale.
The origin of this type is Windows' COM/OLE automation (introduced for VB/VBA/Macros, etc. so, it predates .NET, which has very good COM automation support), documented here officially: [MS-OAUT]: OLE Automation Protocol, 2.2.26 DECIMAL
It's also one of the VARIANT types (VT_DECIMAL). On the x86 architecture, its size fits exactly in a VARIANT (16 bytes).
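If you go this route, oleaut32 also ships the arithmetic so you don't have to write it yourself (VarDecAdd, VarDecMul, VarDecFromStr, VarBstrFromDec and friends). A rough Windows-only sketch (link against oleaut32; assumes '.' is the decimal separator in the user locale):

#include <windows.h>
#include <oleauto.h>
#include <wchar.h>

int main(void)
{
    DECIMAL a, b, sum;
    BSTR out;

    /* Parse exact decimal values from strings. */
    VarDecFromStr(L"3.11", LOCALE_USER_DEFAULT, 0, &a);
    VarDecFromStr(L"0.1", LOCALE_USER_DEFAULT, 0, &b);

    VarDecAdd(&a, &b, &sum);  /* 3.21, exactly */

    VarBstrFromDec(&sum, LOCALE_USER_DEFAULT, 0, &out);
    wprintf(L"%ls\n", out);
    SysFreeString(out);
    return 0;
}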
The Decimal type in C# has a precision of 28-29 significant digits and a size of 16 bytes. There is not even a close equivalent in C. In Java, the BigDecimal data type comes closest to C#'s decimal. C# decimal gives you numbers of the form:
+/- someInteger / 10 ^ someExponent
where someInteger is a 96 bit unsigned integer and someExponent is an integer between 0 and 28.
Is Java's BigDecimal the closest data type corresponding to C#'s Decimal?

floating point inaccuracies in c

I know floating point values are limited in the numbers they can express accurately, and I have found many sites that describe why this happens. But I have not found any information on how to deal with this problem efficiently. I'm sure NASA isn't OK with 0.2/0.1 = 0.199999. Example:
#include <stdio.h>
#include <float.h>
#include <math.h>
int main(void)
{
    float number = 4.20;
    float denominator = 0.25;

    printf("number = %f\n", number);
    printf("denominator = %f\n", denominator);
    printf("quotient as a float = %f should be 16.8\n", number / denominator);
    printf("the remainder of 4.20 / 0.25 = %f\n", number - ((int)number / denominator) * denominator);
    printf("now if i divide 0.20000 by 0.1 i get %f not 2\n", (number - ((int)number / denominator) * denominator) / 0.1);
}
output:
number = 4.200000
denominator = 0.250000
quotient as a float = 16.799999 should be 16.8
the remainder of 4.20 / 0.25 = 0.200000
now if i divide 0.20000 by 0.1 i get 1.999998 not 2
So how do I do arithmetic with floats (or decimals or doubles) and get accurate results? I hope I haven't just missed something super obvious. Any help would be awesome! Thanks.
The solution is to not use floats for applications where you can't accept roundoff errors. Use an extended precision library (a.k.a. arbitrary precision library) like GNU MP Bignum. See this Wikipedia page for a nice list of arbitrary-precision libraries. See also the Wikipedia article on rational data types and this thread for more info.
If you are going to use floating point representations (float, double, etc.) then write code using accepted methods for dealing with roundoff errors (e.g., avoiding ==). There's lots of on-line literature about how to do this and the methods vary widely depending on the application and algorithms involved.
Floating point is pretty fine, most of the time. Here are the key things I try to keep in mind:
There's really a big difference between float and double. double gives you enough precision for most things, most of the time; float surprisingly often gives you not enough. Unless you know what you're doing and have a really good reason, just always use double.
There are some things that floating point is not good for. Although C doesn't support it natively, fixed point is often a good alternative. You're essentially using fixed point if you do your financial calculations in cents rather than dollars -- that is, if you use an int or a long int representing pennies, and remember to put a decimal point two places from the right when it's time to print out as dollars.
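A sketch of the cents idea with made-up amounts:

#include <stdio.h>

int main(void)
{
    long cents = 1999;  /* $19.99, stored exactly */
    cents += 1;         /* add one cent: exact */
    cents *= 3;         /* triple it: exact */
    printf("$%ld.%02ld\n", cents / 100, cents % 100);  /* $60.00 */
    return 0;
}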
The algorithm you use can really matter. Naïve or "obvious" algorithms can easily end up magnifying the effects of roundoff error, while more sophisticated algorithms minimize them. One simple example is that the order you add up floating-point numbers can matter.
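A tiny demonstration of that last point (the exact outputs assume IEEE-754 single precision):

#include <stdio.h>

int main(void)
{
    float big = 1.0e8f, small = 3.0f;

    /* Same three numbers, different grouping. */
    float a = (big + small) + small;  /* each 3 is absorbed: 100000000.0 */
    float b = big + (small + small);  /* the 6 survives rounding: 100000008.0 */

    printf("%.1f\n%.1f\n", a, b);     /* the exact answer would be 100000006 */
    return 0;
}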
Never worry about 16.8 versus 16.799999. That sort of thing always happens, but it's not a problem, unless you make it a problem. If you want one place past the decimal, just print it using %.1f, and printf will round it for you. (Also don't try to compare floating-point numbers for exact equality, but I assume you've heard that by now.)
Related to the above, remember that 0.1 is not representable exactly in binary (just as 1/3 is not representable exactly in decimal). This is just one of many reasons that you'll always get what look like tiny roundoff "errors", even though they're perfectly normal and needn't cause problems.
Occasionally you need a multiple precision (MP or "bignum") library, which can represent numbers to arbitrary precision, but these are (relatively) slow and (relatively) cumbersome to use, and fortunately you usually don't need them. But it's good to know they exist, and if you're a math nerd they can be a lot of fun to use.
Occasionally a library for representing rational numbers is useful. Such a library represents, for example, the number 1/3 as the pair of numbers (1, 3), so it doesn't have the inaccuracies inherent in trying to represent that number as 0.333333333.
Others have recommended the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, which is very good, and the standard reference, although it's long and fairly technical. An easier and shorter read I can recommend is this handout from a class I used to teach: https://www.eskimo.com/~scs/cclass/handouts/sciprog.html#precision . This is a little dated by now, but it should get you started on the basics.
There isn't a good answer, and it's often a problem.
If data is integral, e.g. amounts of money in cents, then store it as integers, which can mean a double that is constrained to hold an integer number of cents rather than a rational number of dollars. But that only helps in a few circumstances.
As a general rule, you get inaccuracies when trying to divide by numbers that are close to zero. So you just have to write the algorithms to avoid or suppress such operations. There are lots of discussions of "numerically stable" versus "unstable" algorithms and it's too big a subject to do justice to it here. And then, usually, it's best to treat floating point numbers as though they have small random errors. If they ultimately represent measurements of analogue values in the real world, there must be a certain tolerance or inaccuracy in them anyway.
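One classic numerically stable technique is Kahan (compensated) summation; here is a sketch (the quoted outputs assume IEEE-754 float arithmetic with no extended-precision evaluation and no -ffast-math, which would optimize the compensation away):

#include <stdio.h>

float kahan_sum(const float *x, int n)
{
    float sum = 0.0f, c = 0.0f;  /* c carries the lost low-order bits */
    for (int i = 0; i < n; i++) {
        float y = x[i] - c;      /* apply the previous correction */
        float t = sum + y;       /* big + small: low bits of y are lost */
        c = (t - sum) - y;       /* recover what was just lost */
        sum = t;
    }
    return sum;
}

int main(void)
{
    float xs[4] = { 1.0e8f, 4.0f, 4.0f, -1.0e8f };
    float naive = ((xs[0] + xs[1]) + xs[2]) + xs[3];
    printf("naive: %.1f  kahan: %.1f\n", naive, kahan_sum(xs, 4));  /* 0.0 vs 8.0 */
    return 0;
}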
If you are doing maths rather than processing data, simply don't use C or C++. Use a symbolic algebra package such as Maple, which stores values such as sqrt(2) as an expression rather than a floating point number, so sqrt(2) * sqrt(2) will always give exactly 2, rather than a number very close to 2.

Why do float calculation results differ in C and on my calculator?

I am working on a problem where the results returned by my C program are not as good as those returned by a simple calculator: not equally precise, to be precise.
On my calculator, when I divide 2000008 by 3, I get 666669.333333
But in my C program, I am getting 666669.312500
This is what I'm doing-
printf("%f\n",2000008/(float)3);
Why are the results different? What should I do to get the same result as the calculator? I tried double, but then it returns the result in a different format. Do I need to go through a conversion of some kind? Please help.
See http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html for an in-depth explanation.
In short, floating point numbers are approximations to the real numbers, and they have a limit on digits they can hold. With float, this limit is quite small, with doubles, it's more, but still not perfect.
Try
printf("%20.12lf\n",(double)2000008/(double)3);
and you'll see a better, but still not perfect result. What it boils down to is, you should never assume floating point numbers to be precise. They aren't.
Floating point numbers take a fixed amount of memory and therefore have limited precision. Limited precision means you can't represent all possible real numbers, and that in turn means that some calculations result in rounding errors. Use double instead of float to gain extra precision, but mind you that even a double can't represent everything, even if it's enough for most practical purposes.
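A side-by-side sketch of the same expression in both precisions:

#include <stdio.h>

int main(void)
{
    printf("float:  %f\n", 2000008 / (float)3);   /* 666669.312500 */
    printf("double: %f\n", 2000008 / (double)3);  /* 666669.333333 */
    return 0;
}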
Gunthram summarizes it very well in his answer:
What it boils down to is, you should never assume floating point numbers to be precise. They aren't.

trouble with double truncation and math in C

I'm making a function that fits balls into boxes. The code that computes the number of balls that can fit on each side of the box is below. Assume that the balls fit together as if they were cubes. I know this is not the optimal way, but just go with it.
The problem for me is that although I get numbers like 4.000000 * 4.000000 * 2.000000, the product is 31 instead of 32. What's going on?
Two additional things: this error only happens when the optimal side length is reached. For example, if the side length is 12.2, the box thickness is .1 and the ball radius is 1.5, exactly 4 balls fit on that side. If I DON'T cast to int, it works out, but if I do cast to int, I get the aforementioned error (31 instead of 32). Also, the print line runs once if the side length is optimal but twice if it's not. I don't know what that means.
#include <stdio.h>

double ballsFit(double r, double l, double w, double h, double boxthick)
{
    double ballsInL, ballsInW, ballsInH;
    int ballsinbox;

    ballsInL = (int)((l - (2 * boxthick)) / (r * 2));
    ballsInW = (int)((w - (2 * boxthick)) / (r * 2));
    ballsInH = (int)((h - (2 * boxthick)) / (r * 2));
    ballsinbox = (ballsInL * ballsInW * ballsInH);
    printf("LENGTH=%f\nWidth=%f\nHight=%f\nBALLS=%d\n", ballsInL, ballsInW, ballsInH, ballsinbox);
    return ballsinbox;
}
The fundamental problem is that floating-point math is inexact.
For example, the number 0.1 -- that you mention as the value of thickness in the problematic example -- cannot be represented exactly as a double. When you assign 0.1 to a variable, what gets stored is an approximation of 0.1.
I recommend that you read What Every Computer Scientist Should Know About Floating-Point Arithmetic.
although I get numbers like 4.000000 * 4.000000 * 2.000000 the product is 31 instead of 32. What's going on?
It is almost certainly the case that the multiplicands (at least some of them) are not what they look like. If they were exactly 4.0, 4.0 and 2.0, their product would be exactly 32.0. If you printed out all the digits that the doubles are capable of representing, I am pretty sure you'd see lots of 9s, as in 3.99999999999... etc. As a consequence, the product is a tiny bit less than 32. The double-to-int conversion simply chops off the fractional part, so you end up with 31.
Of course, you don't always get numbers that are less than what they would be if the computation were exact; you can also get numbers that are greater than what you might expect.
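Here is a sketch of the same failure mode with different numbers, plus the usual workaround of rounding to the nearest integer instead of truncating:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double n = (1.2 - 0.9) / 0.1;  /* "obviously" 3, but stored as 2.999999... */

    printf("%f\n", n);             /* 3.000000: %f rounds for display */
    printf("%.17g\n", n);          /* reveals the stored value */
    printf("truncated: %d\n", (int)n);    /* 2: the fraction is chopped */
    printf("rounded: %ld\n", lround(n));  /* 3: round to nearest instead */
    return 0;
}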
Fixed precision floating point numbers, such as the IEEE-754 numbers commonly used in modern computers cannot represent all decimal numbers accurately - much like 1/3 cannot be represented accurately in decimal.
For example 0.1 can be something along the lines of 0.100000000000000004... when converted to binary and back. The difference is small, but significant.
I have occasionally managed to (partially) deal with such issues by using extended or arbitrary precision arithmetic to maintain a degree of precision while computing and then down-converting to double for the final results. There is usually a noticeable drop in performance, but IMHO correctness is infinitely more important.
I recently used algorithms from the high-precision arithmetic libraries listed here with good results on both the precision and performance fronts.

Why does the value of this float change from what it was set to?

Why is this C program giving the "wrong" output?
#include <stdio.h>
#include <conio.h>  /* for getch() */

int main(void)
{
    float f = 12345.054321;
    printf("%f", f);
    getch();
    return 0;
}
Output:
12345.054688
But the output should be 12345.054321.
I am using VC++ in VS2008.
It's giving the "wrong" answer simply because not all real values are representable by floats (or doubles, for that matter). What you'll get is an approximation based on the underlying encoding.
In order to represent every real value, even between 1.0x10^-100 and 1.1x10^-100 (a truly minuscule range), you would still require an infinite number of bits.
Single-precision IEEE754 values have only 32 bits available (some of which are tasked to other things, such as the exponent and NaN/Inf representations) and therefore cannot give you infinite precision. They actually have 23 bits available, giving a precision of about 2^24 (there's an extra implicit bit), or just over 7 decimal digits (log10(2^24) is roughly 7.2).
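You can query these limits portably from <float.h>:

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("float:  %d significand bits, %d guaranteed decimal digits\n",
           FLT_MANT_DIG, FLT_DIG);  /* typically 24 and 6 */
    printf("double: %d significand bits, %d guaranteed decimal digits\n",
           DBL_MANT_DIG, DBL_DIG);  /* typically 53 and 15 */
    return 0;
}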
I enclose the word "wrong" in quotes because it's not actually wrong. What's wrong is your understanding about how computers represent numbers (don't be offended though, you're not alone in this misapprehension).
Head on over to http://www.h-schmidt.net/FloatApplet/IEEE754.html and type your number into the "Decimal representation" box to see this in action.
If you want a more accurate number, use doubles instead of floats - these have double the number of bits available for representing values (assuming your C implementation is using IEEE754 single and double precision data types for float and double respectively).
If you want arbitrary precision, you'll need to use a "bignum" library like GMP although that's somewhat slower than native types so make sure you understand the trade-offs.
The decimal number 12345.054321 cannot be represented accurately as a float on your platform. The result that you are seeing is a decimal approximation to the closest number that can be represented as a float.
floats are about convenience and speed, and use a binary representation - if you care about precision use a decimal type.
To understand the problem, read What Every Computer Scientist Should Know About Floating-Point Arithmetic:
http://docs.sun.com/source/806-3568/ncg_goldberg.html
For a solution, see the Decimal Arithmetic FAQ:
http://speleotrove.com/decimal/decifaq.html
It's all to do with precision. Your number cannot be stored accurately in a float.
Single-precision floating point values can only represent about seven significant (decimal) digits. Beyond that point, you're seeing quantization error.
