I have the following code,
float a = 0.7;
if(0.7 > a)
printf("Hi\n");
else
printf("Hello\n"); //Line1
and
float a = 0.98;
if(0.98 > a)
printf("Hi\n");
else
printf("Hello\n"); //Line2
Here Line1 outputs Hi but Line2 outputs Hello. I assumed there would be a fixed rule for comparing a double constant with a float, i.e. that one of them would always evaluate as the larger. But these two snippets show that sometimes the double constant is larger and other times the float is. Is there a rounding issue behind this? If so, please explain it to me; I badly need to get this clear.
Thanks in advance.
What you have is called representation error.
To see what is going on you might find it easier to first consider the decimal representations of 1/3, 1/2 and 2/3 stored with different precision (3 decimal places or 6 decimal places):
a = 0.333
b = 0.333333
a < b
a = 0.500
b = 0.500000
a == b
a = 0.667
b = 0.666667
a > b
Increasing the precision can make the number slightly larger, slightly smaller, or have the same value.
The same logic applies to binary floating point numbers.
float a = 0.7;
Now a is the closest single-precision floating point value to 0.7. For the comparison 0.7 > a, a is promoted to double, since the type of the constant 0.7 is double, and the constant's value is the closest double-precision floating point value to 0.7. These two values are different, since 0.7 isn't exactly representable in binary, so one value is larger than the other.
The same applies to 0.98. Sometimes, the closest single-precision value is larger than the decimal fraction and the closest double-precision number smaller, sometimes the other way round.
This is part of What Every Computer Scientist Should Know About Floating-Point Arithmetic.
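The direction of the rounding can be checked directly by converting a double to float and back. This is a minimal sketch, assuming IEEE 754 float and double; the helper name double_vs_float is made up for illustration.

```c
/* Hypothetical helper: compares a double with the same value rounded to float.
   Returns +1 if the double is larger, -1 if the float is larger, 0 if equal. */
int double_vs_float(double x)
{
    float f = (float)x;     /* round to single precision */
    double widened = f;     /* converting back to double is exact */
    return (x > widened) - (x < widened);
}
```

With the values from the question, double_vs_float(0.7) gives +1 (so 0.7 > a is true) and double_vs_float(0.98) gives -1. For 0.5, which is exact in both types, it gives 0.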
This is simply one of the issues with floating point precision.
While there are infinitely many real numbers, a floating point format has only finitely many representable values because of its fixed number of bits. So there will be rounding errors when using floats in this manner.
Which way a particular value rounds follows no simple rule that you can read off its decimal digits: with IEEE 754 round-to-nearest, it depends on which representable value happens to be closest. The C standard does not require IEEE 754, so the details can also depend on the implementation.
See here: http://en.wikipedia.org/wiki/Floating_point, and http://en.wikipedia.org/wiki/IEEE_754 for more details.
float f=2.2;
if (f==2.2)
printf("abc");
else
printf("xyz");
This code prints xyz, while if we use 2.5 instead of 2.2 the output is abc.
2.2 is a double that is not exactly representable in binary floating point. In effect, you are comparing the double 2.2 to the result of converting it to float, with rounding, and then back to double.
Assuming double is IEEE 754 64-bit binary floating point, and float is IEEE 754 32-bit binary floating point, the closest double to decimal 2.2 has exact value 2.20000000000000017763568394002504646778106689453125, and the result of converting it to float is 2.2000000476837158203125. Converting the float back to double does not change its value. 2.20000000000000017763568394002504646778106689453125 is not equal to 2.2000000476837158203125.
2.5 is exactly representable in both float and double in binary floating point systems, so neither the conversion to float nor the conversion back to double changes the value. 2.5 is equal to 2.5.
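One way to see that only the literal's type matters is to compare the same float against 2.2 and against 2.2f. The sketch below assumes IEEE 754 types and a compiler that evaluates float expressions in float precision; the function names are illustrative only.

```c
/* Both functions store 2.2 in a float; only the literal in the comparison differs. */
int equals_double_literal(void)
{
    float f = 2.2;
    return f == 2.2;    /* f is widened to double and compared with the double 2.2 */
}

int equals_float_literal(void)
{
    float f = 2.2;
    return f == 2.2f;   /* both sides are the same float value */
}
```

Under those assumptions the first function returns 0 and the second returns 1, which matches the xyz output described in the question.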
Floating-point numbers are likely to contain round-off error
Ref: http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)FloatingPoint.html
Going through the documentation,
I see the following. (Point 6 in the URL above)
In general, floating-point numbers are not exact: they are likely to contain round-off error because of the truncation of the mantissa to a fixed number of bits.
The easiest way to avoid accumulating error is to use high-precision floating-point numbers (this means using double instead of float). On modern CPUs there is little or no time penalty for doing so....
One consequence of round-off error is that it is very difficult to test floating-point numbers for equality, unless you are sure you have an exact value as described above. It is generally not the case, for example, that (0.1+0.1+0.1) == 0.3 in C. This can produce odd results if you try writing something like for(f = 0.0; f <= 0.3; f += 0.1): it will be hard to predict in advance whether the loop body will be executed with f = 0.3 or not. (Even more hilarity ensues if you write for(f = 0.0; f != 0.3; f += 0.1), which after not quite hitting 0.3 exactly keeps looping for much longer than I am willing to wait to see it stop, but which I suspect will eventually converge to some constant value of f large enough that adding 0.1 to it has no effect.)
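A common way around the unpredictable loop is to count with an integer and derive the floating-point value from the counter. The sketch below contrasts the two styles; the function names are hypothetical, and the behaviour of the accumulating version is described for typical IEEE 754 double arithmetic only.

```c
/* Accumulating 0.1 lets the rounding error drift: counts how many times
   the body runs for f = 0.0, 0.1, 0.2, ... while f <= 0.3. */
int count_by_accumulating(void)
{
    int runs = 0;
    for (double f = 0.0; f <= 0.3; f += 0.1)
        runs++;
    return runs;
}

/* Driving the loop with an integer makes the iteration count exact;
   the floating-point value is derived from the counter each time. */
int count_by_integer_steps(void)
{
    int runs = 0;
    for (int i = 0; i <= 3; i++) {
        double f = i * 0.1;   /* f is still approximate, but the count is not */
        (void)f;
        runs++;
    }
    return runs;
}
```

With IEEE 754 doubles the accumulating loop runs 3 times, because the third addition lands slightly above 0.3, while the integer-driven loop always runs 4 times.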
I am calculating the volume of a room and I got a number with 6 decimal places. I was wondering if I can reduce the value to only 2 decimal places. The resulting number for the volume is from 5 different variables, which I do not know if it matters in this situation.
@Rashmi's solution provides a nicely rounded display of a floating point value.
It does not change the value of the original number.
If one wants to round a floating point value to the nearest 0.01 use round()
#include <math.h>
double d = 1.2345;
d = round(d * 100.0)/100.0;
Notes:
Due to FP limitations, the rounded value may not be exactly a multiple of 0.01, but will be the closest FP number a given platform allows.
When d is very close to x.xx5, (x is various digits 0-9) d * 100.0 introduces a rounding in the product before the round() call. Code may round the wrong way.
You can use printf("%.2f", 20.233232)
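The same format works with snprintf if the rounded text is needed in a buffer rather than on the screen. A minimal sketch, with a helper name (format_two_places) invented for this example:

```c
#include <stdio.h>

/* Hypothetical helper: writes v with two decimal places into buf.
   Returns the number of characters snprintf produced, not counting the terminator. */
int format_two_places(double v, char *buf, size_t size)
{
    return snprintf(buf, size, "%.2f", v);
}
```

Calling it with 20.233232 and a large enough buffer leaves the text 20.23 in the buffer. Only the text is rounded; the double passed in keeps its full value.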
There might be a round() function floating around (ha ha) somewhere in some math library (I don't have a C ref at hand). If not, a quick and dirty method would be to multiply the number by 100 (shift decimal point right by 2), add 0.5, truncate to integer, and divide by 100 (shift decimal point left by 2).
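The quick and dirty method could look roughly like this. It is a sketch, not a robust implementation: it assumes the scaled value fits in a long long, and it handles negative values by subtracting 0.5 instead of adding it, which is an addition to the recipe above.

```c
/* Rounds to two decimal places by scaling, adding 0.5, and truncating.
   Assumes |x| * 100 fits in a long long. */
double round_two_places(double x)
{
    double scaled = x * 100.0;
    /* The cast truncates toward zero, so negative values need -0.5. */
    long long nearest = (long long)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
    return (double)nearest / 100.0;
}
```

For example, round_two_places(1.2345) yields the double nearest to 1.23. Inputs that sit very close to a x.xx5 boundary can still round in an unexpected direction, because x * 100.0 is itself rounded.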
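The decimal value 0.01 cannot be represented exactly in IEEE 754 binary floating point, so the stored result is only the nearest representable value and can still show more decimals than you asked for when printed in full.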
Better Way: just don't display the extra decimals in your program. I doubt you are "getting" 6 decimals; it could be the default value for a plain
printf ("too much accuracy in %f!", yourFloat);
If so, use %.2f to display.
Slightly Worse Way, depending on the numerical range and the sort of calculations you are going to do: multiply the floats by 100, round, and store as integer. 100.00% guaranteed you'll only get two digits of accuracy. Watch out when dividing (you'll lose 2 digits if not done carefully) and multiplying (you'll gain 2).
I am trying to multiply two floats as follows:
float number1 = 321.12;
float number2 = 345.34;
float result = number1 * number2;
The result I want to see is 110895.582, but when I run the code it just gives me 110896. Most of the time I'm having this issue. Any calculator gives me the exact result with all decimals. How can I achieve that result?
edit : It's C code. I'm using XCode iOS simulator.
There's a lot of rounding going on.
float a = 321.12; // this number will be rounded
float b = 345.34; // this number will also be rounded
float r = a * b; // and this number will be rounded too
printf("%.15f\n", r);
I get 110895.578125000000000 after the three separate roundings.
If you want more than 6 decimal digits' worth of precision, you will have to use double and not float. (Note that I said "decimal digits' worth", because you don't get decimal digits, you get binary.) As it stands, 1/2 ULP of error (a worst-case bound for a perfectly rounded result) is about 0.004.
If you want exactly rounded decimal numbers, you will have to use a specialized decimal library for such a task. A double has more than enough precision for scientists, but if you work with money everything has to be 100% exact. No floating point numbers for money.
Unlike integers, floating point numbers take some real work before you can get accustomed to their pitfalls. See "What Every Computer Scientist Should Know About Floating-Point Arithmetic", which is the classic introduction to the topic.
Edit: Actually, I'm not sure that the code rounds three times. It might round five times, since the constants for a and b might be rounded first to double-precision and then to single-precision when they are stored. But I don't know the rules of this part of C very well.
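To compare the two precisions side by side, the multiplication can be wrapped in two small functions. This is only a sketch with made-up function names, assuming IEEE 754 float and double.

```c
/* Multiplies in single precision; the result is rounded to float. */
float product_as_float(float a, float b)
{
    return a * b;
}

/* Multiplies in double precision. */
double product_as_double(double a, double b)
{
    return a * b;
}
```

product_as_float(321.12f, 345.34f) gives the 110895.578125 shown above, while product_as_double(321.12, 345.34) prints as 110895.5808 with "%.4f". The double result is still not exact, just much closer.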
You will never get the exact result that way.
First of all, number1 ≠ 321.12 because that value cannot be represented exactly in a base-2 system. You'll need an infinite number of bits for it.
The same holds for number2 ≠ 345.34.
So you start with inexact values.
Then the product will get rounded because multiplication gives you double the number of significant digits but the product has to be stored in float again if you multiply floats.
You probably want to use a base-10 system for your numbers. Or, in case your numbers only have 2 decimal digits in the fractional part, you can use integers (32-bit integers are sufficient in this case, but you may end up needing 64-bit):
32112 * 34534 = 1108955808.
That represents 321.12 * 345.34 = 110895.5808.
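In C the integer approach might be sketched as follows. The function name and the choice of int64_t for the result are assumptions made for this example; the wider result type simply leaves headroom for larger inputs.

```c
#include <stdint.h>

/* Multiplies two amounts given in hundredths (e.g. 32112 means 321.12).
   The result is in ten-thousandths, since the two scale factors multiply. */
int64_t multiply_hundredths(int32_t a_hundredths, int32_t b_hundredths)
{
    return (int64_t)a_hundredths * b_hundredths;
}
```

multiply_hundredths(32112, 34534) returns 1108955808, which reads as 110895.5808 once the decimal point is placed four digits from the right.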
Since you are using C you could easily set the precision by using "%.xf" where x is the wanted precision.
For example:
float n1 = 321.12;
float n2 = 345.34;
float result = n1 * n2;
printf("%.20f", result);
Output:
110895.57812500000000000000
However, note that float only gives six digits of precision. For better precision use double.
Floating point variables are only an approximate representation, not a precise one. Not every number can "fit" into a float variable. For example, there is no way to put 1/10 (0.1) into a binary variable, just like it's not possible to put 1/3 into a decimal one (you can only approximate it with an endless 0.33333).
When outputting such variables, it's usual for several rounding options to be applied. Unless you set them all, you can never be sure which of them are in effect. This is especially true for << operators, as the stream can be told how to round BEFORE <<.
Printf also does some rounding. Consider http://codepad.org/LLweoeHp:
float t = 0.1f;
printf("result: %f\n", t);
--
result: 0.100000
Well, it looks fine. Why? Because printf defaulted to a precision of 6 digits and rounded the output. Let's dial in 50 places after the decimal point: http://codepad.org/frUPOvcI
float t = 0.1f;
printf("result: %.50f\n", t);
--
result: 0.10000000149011611938476562500000000000000000000000
That's different, isn't it? After 625 the float ran out of capacity to hold more data, that's why we see zeroes.
A double can hold more digits, but 0.1 in binary is not finite. Double has to give up, eventually: http://codepad.org/RAd7Yu2r
double t = 0.1;
printf("result: %.70f\n", t);
--
result: 0.1000000000000000055511151231257827021181583404541015625000000000000000
In your example, 321.12 alone is enough to cause trouble: http://codepad.org/cgw3vUKn
float t = 321.12f;
printf("and the result is: %.50f\n", t);
--
and the result is: 321.11999511718750000000000000000000000000000000000000
This is why one has to round floating point values before presenting them to humans.
Calculator programs typically don't use floats or doubles for this. They implement a decimal number format, e.g.:
struct decimal
{
int mantissa; // meaningful digits
int exponent; // number of decimal zeroes
};
Of course, that requires reinventing all operations: addition, subtraction, multiplication and division. Or just look for a decimal library.
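As a rough idea of what "reinventing" one operation involves, here is a sketch of exact decimal addition. It is a variation on the struct above, not the author's code: the mantissa is widened to long long, the exponent is treated as a power of ten that may be negative, and there is no overflow checking.

```c
struct decimal {
    long long mantissa;   /* significant digits as an integer */
    int exponent;         /* power of ten: value = mantissa * 10^exponent */
};

/* Adds two decimals exactly by rescaling both to the smaller exponent.
   No overflow checking; a real library would need it. */
struct decimal decimal_add(struct decimal a, struct decimal b)
{
    while (a.exponent > b.exponent) { a.mantissa *= 10; a.exponent--; }
    while (b.exponent > a.exponent) { b.mantissa *= 10; b.exponent--; }
    struct decimal sum = { a.mantissa + b.mantissa, a.exponent };
    return sum;
}
```

Adding {1, -1} and {2, -1} (that is, 0.1 and 0.2) yields {3, -1}, exactly 0.3, which binary floating point cannot do.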
myInt = int( 5 * myRandom() )
myRandom() is a randomly generated float, which should be 0.2.
So this statement should evaluate to be 1.
My question: is it possible that due to a floating point error it will NOT evaluate to 1?
For example if due to a floating point error something which should be 0.2 could that be LESS than that?
IE, for instance consider the following 3 possibilities:
int(5 * 0.2 ) = 1 //case 1 normal
int(5 * 0.2000000000000001 ) = 1 //case 2 slightly larger, its OK
int(5 * 0.1999999999999999 ) = 0 //case 3 negative, is NOT OK, as int() floors it
Is case 3 even possible, with 0.1999999999999999 being the result of a floating point error? I have never actually seen a negative epsilon so far, only case 2, where the value is slightly larger, and that's OK, because casting to int() 'floors' it to the correct result. However, with a negative epsilon the 'flooring' effect will make the resulting 0.9999999999999996 evaluate to 0.
It is impossible for myRandom to return .2 because .2 is not representable as a float or a double, assuming your target system is using the IEEE 754 binary floating-point standard, which is overwhelmingly the default.
If myRandom() returns the representable number nearest .2, then myInt will be 1, because the number nearest .2 representable as a float is slightly greater than .2 (it is 0.20000000298023223876953125), and so is the nearest representable double (0.200000000000000011102230246251565404236316680908203125).
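In other cases, this will not be true. E.g., the nearest double to .6 is 0.59999999999999997779553950749686919152736663818359375, which is below .6, so the exact product with 5 is slightly less than 3. Whether myInt then comes out as 3 or 2 depends on how the product is rounded: in plain IEEE 754 double arithmetic 5 * 0.6 rounds back up to exactly 3.0, but if the product is kept in extended precision, or the input carries a slightly larger error, the truncation yields 2.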
Yes, it's possible, at least as far as the C standard is concerned.
The value 0.2 cannot be represented exactly in a binary floating-point format. The value returned by myRandom() will therefore be either slightly below, or slightly above, the mathematical value 0.2. The C standard permits either result.
Now it may well be that IEEE semantics only permit the result to be slightly greater than 0.2 -- but the C standard doesn't require IEEE semantics. And that's assuming that the result is derived as exactly as possible from the value 0.2. If the value is generated from a series of floating-point operations, each of which can introduce a small error, it could easily be either less than or greater than 0.2.
It's not a floating point error, it's the way floating point works. Any fraction whose denominator (in lowest terms) isn't a power of 2 can't be exactly represented, and will be rounded either up or down to the nearest representable number.
You can fix your code by multiplying by a factor slightly greater than one before converting to integer.
myInt = int( 5 * myRandom() * 1.000000000000001 )
See What Every Computer Scientist Should Know About Floating-Point Arithmetic.
It's possible, depending on the number you choose.
To check a specific number you can always print them with a lot of precision: printf("%1.50f", 0.2)
why not multiply your float by 5.0 and then use the round function to properly round it?
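The difference between truncating and rounding the scaled value can be sketched like this. The function names are hypothetical, the add-0.5 trick is only valid for non-negative products, and the 0.29 * 100 case is an illustrative input that is not from the question.

```c
/* Truncating conversion, as in int(n * x): any shortfall below the
   whole number drops the result by one. */
int scale_and_truncate(int n, double x)
{
    return (int)(n * x);
}

/* Round-to-nearest for non-negative products: add 0.5, then truncate.
   lround() from <math.h> is the standard-library alternative. */
int scale_and_round(int n, double x)
{
    return (int)(n * x + 0.5);
}
```

Both functions return 1 for (5, 0.2). For (100, 0.29), with IEEE 754 doubles the product is slightly below 29, so truncation gives 28 while rounding gives 29.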
#include <stdio.h>

int main(void)
{
float a = 0.7;
if (a < 0.7)
printf("c");
else
printf("c++");
}
In the above code, for 0.7, "c" will be printed, but for 0.8, "c++" will be printed. Why?
And how is any float represented in binary form?
At some places, it is mentioned that internally 0.7 will be stored as 0.699997, but 0.8 as 0.8000011. Why so?
Basically, with float you get 32 bits that encode (for normal numbers)
VALUE = (-1)^SIGN * 1.MANTISSA * 2 ^ (EXPONENT - 127)
32 bits = 1 bit (SIGN) + 8 bits (EXPONENT) + 23 bits (MANTISSA)
and that is stored as
MSB                      LSB
[SIGN][EXPONENT][MANTISSA]
Since you only get 23 stored mantissa bits, that's the amount of "precision" you can store. If you are trying to represent a fraction whose expansion in base 2 repeats (or never ends), the sequence of bits will be "rounded off" at the last stored bit.
0.7 base 10 is 7 / 10 which in binary is 0b111 / 0b1010 you get:
0.1011001100110011001100110011001100110011001100110011... etc
Since this repeats, in fixed precision there is no way to exactly represent it. The
same goes for 0.8 which in binary is:
0.1100110011001100110011001100110011001100110011001100... etc
To see what the fixed precision value of these numbers is, you have to "cut them off" at the number of bits you have and do the math. The only trick is that the leading 1 is implied and not stored, so you technically get an extra bit of precision. Because of rounding, the last bit will be a 1 or a 0 depending on the value of the truncated bits.
So the value of 0.7 is effectively 11,744,051 / 2^24 (no rounding effect) = 0.699999988 and the value of 0.8 is effectively 13,421,773 / 2^24 (rounded up) = 0.800000012.
That's all there is to it :)
A good reference for this is What Every Computer Scientist Should Know About Floating-Point Arithmetic. You can use higher precision types (e.g. double) or a Binary Coded Decimal (BCD) library to achieve better floating point precision if you need it.
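The fields described above can be inspected from C by copying a float's bytes into an integer. This is a sketch that assumes float is a 32-bit IEEE 754 binary32 value, which is typical but not guaranteed by the C standard; the struct and function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    uint32_t sign;       /* 1 bit */
    uint32_t exponent;   /* 8 bits, biased by 127 */
    uint32_t fraction;   /* 23 stored bits; the leading 1 is implied */
};

/* Splits a float into its IEEE 754 binary32 fields.
   Assumes float is 32 bits wide. */
struct float_fields decode_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the bit pattern */
    struct float_fields out;
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

For 0.7f this gives a biased exponent of 126 and, once the implied leading 1 is put back with fraction | 0x800000, the integer 11744051 mentioned above. For 0.8f the same steps give 13421773.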
The internal representation is IEEE 754.
You can also use this calculator to convert decimal to float, I hope this helps to understand the format.
floats will be stored as described in IEEE 754: 1 bit for sign, 8 for a biased exponent, and the rest storing the fractional part.
Think of numbers representable as floats as points on the number line, some distance apart; frequently, decimal fractions will fall in between these points, and the nearest representation will be used; this leads to the counterintuitive results you describe.
"What every computer scientist should know about floating point arithmetic" should answer all your questions in detail.
If you want to know how float/double is represented in C (and almost all languages), please refer to the Standard for Floating-Point Arithmetic (IEEE 754): http://en.wikipedia.org/wiki/IEEE_754-2008
Using single-precision floats as an example, here is the bit layout:
seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm    meaning
31                             0    bit #
s = sign bit, e = exponent, m = mantissa
Another good resource to see how floating point numbers are stored as binary in computers is Wikipedia's page on IEEE-754.
Floating point numbers in C/C++ are represented in IEEE-754 standard format. There are many articles on the internet, that describe in much better detail than I can here, how exactly a floating point is represented in binary. A simple search for IEEE-754 should illuminate the mystery.
0.7 is a numeric literal; its value is the mathematical real number 0.7, rounded to the nearest double value.
After initialising float a = 0.7, the value of a is 0.7 rounded to float, that is the real number 0.7, rounded to the nearest double value, rounded to the nearest float value. Except by a huge coincidence, you wouldn't expect a to be equal to 0.7.
"if (a < 0.7)" compares 0.7 rounded to double then to float with the number 0.7 rounded to double. It seems that in the case of 0.7, the rounding produced a smaller number. And in the same experiment with 0.8, rounding 0.8 to float will produce a larger number than 0.8.
Exact comparisons between floating point values that have been rounded differently are not reliable. Where approximate equality is what you want, use a tolerance-based (epsilon) comparison.
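A tolerance-based comparison might be sketched as below. The function name and the purely relative tolerance are assumptions for illustration; the right tolerance depends on the application, and a relative test like this is a poor fit for values near zero.

```c
/* Returns nonzero if a and b differ by at most rel_tol times the larger magnitude.
   Absolute values are computed by hand here; fabs() from <math.h> would also work. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff = a > b ? a - b : b - a;
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;
    return diff <= rel_tol * scale;
}
```

With float a = 0.7, the test a == 0.7 is false, but nearly_equal(a, 0.7, 1e-6) is true, while nearly_equal(0.7, 0.8, 1e-6) is still false.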
Try IEEE-754 Floating-Point Conversion and see what you get. :)