Casting to int and floating point errors? - c

myInt = int( 5 * myRandom() )
myRandom() is a randomly generated float, which should be 0.2.
So this statement should evaluate to be 1.
My question: is it possible that due to a floating point error it will NOT evaluate to 1?
For example, could a value that should be 0.2 come out slightly LESS than 0.2 because of a floating point error?
IE, for instance consider the following 3 possibilities:
int(5 * 0.2 ) = 1 //case 1 normal
int(5 * 0.2000000000000001 ) = 1 //case 2 slightly larger, its OK
int(5 * 0.1999999999999999 ) = 0 //case 3 negative, is NOT OK, as int() floors it
Is case 3 even possible, with 0.1999999999999999 being the result of a floating point error? I have never actually seen a negative epsilon so far, only case 2, where the value is slightly larger, and that's OK, because casting to int() 'floors' it to the correct result. With a negative epsilon, however, the 'flooring' effect will make the resulting 0.9999999999999996 evaluate to 0.

It is impossible for myRandom to return .2 because .2 is not representable as a float or a double, assuming your target system is using the IEEE 754 binary floating-point standard, which is overwhelmingly the default.
If myRandom() returns the representable number nearest .2, then myInt will be 1, because the number nearest .2 representable as a float is slightly greater than .2 (it is 0.20000000298023223876953125), and so is the nearest representable double (0.200000000000000011102230246251565404236316680908203125).
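In other cases, this will not be true. E.g., the nearest double to .6 is 0.59999999999999997779553950749686919152736663818359375, which is slightly less than .6, so the exact product with 5 is slightly below 3 and truncation can give 2, not 3. Whether it actually does also depends on how the multiplication itself rounds: with IEEE 754 doubles in the default round-to-nearest mode, 5 * 0.6 rounds to exactly 3.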

Yes, it's possible, at least as far as the C standard is concerned.
The value 0.2 cannot be represented exactly in a binary floating-point format. The value returned by myRandom() will therefore be either slightly below, or slightly above, the mathematical value 0.2. The C standard permits either result.
Now it may well be that IEEE semantics only permit the result to be slightly greater than 0.2 -- but the C standard doesn't require IEEE semantics. And that's assuming that the result is derived as exactly as possible from the value 0.2. If the value is generated from a series of floating-point operations, each of which can introduce a small error, it could easily be either less than or greater than 0.2.

It's not a floating point error, it's the way floating point works. Any fraction whose denominator (in lowest terms) is not a power of 2 can't be exactly represented, and will be rounded either up or down to the nearest representable number.
You can fix your code by multiplying by a factor slightly greater than one before converting to integer.
myInt = int( 5 * myRandom() * 1.000000000000001 )
See What Every Computer Scientist Should Know About Floating-Point Arithmetic.

It's possible, depending on the number you choose.
To check a specific number you can always print them with a lot of precision: printf("%1.50f", 0.2)

why not multiply your float by 5.0 and then use the round function to properly round it?

Related

What happens if we keep dividing float 1.0 by 2 until it reaches zero?

float f = 1.0;
while (f != 0.0) f = f / 2.0;
This loop runs 150 times using 32-bit precision. Why is that so? Is it getting rounded to zero?
In common C implementations, the IEEE-754 binary32 format is used for float. It is also called “single precision.” It is a binary-based format where finite numbers are represented as ±f•2^e, where f is a 24-bit binary numeral in [1, 2) and e is an integer in [−126, 127].
In this format, 1 is represented as +1.00000000000000000000000₂•2^0. Dividing that by 2 yields ½, which is represented as +1.00000000000000000000000₂•2^−1. Dividing that by 2 yields +1.00000000000000000000000₂•2^−2, then +1.00000000000000000000000₂•2^−3, and so on until we reach +1.00000000000000000000000₂•2^−126.
When that is divided by two, the mathematical result is +1.00000000000000000000000₂•2^−127, but −127 is below the normal exponent range, [−126, 127]. Instead, the significand becomes denormalized; 2^−127 is represented by +0.10000000000000000000000₂•2^−126. Dividing that by 2 yields +0.01000000000000000000000₂•2^−126, then +0.00100000000000000000000₂•2^−126, +0.00010000000000000000000₂•2^−126, and so on until we get to +0.00000000000000000000001₂•2^−126.
At this point, we have done 149 divisions by 2; +0.00000000000000000000001₂•2^−126 is 2^−149.
When the next division is performed, the result would be 2^−150, but that is not representable in this format. Even with the lowest non-zero significand, 0.00000000000000000000001₂, and the lowest exponent, −126, we cannot get to 2^−150. The next lower representable number is +0.00000000000000000000000₂•2^−126, which equals 0.
So, the real-number-arithmetic result of the division would be 2^−150, but we cannot represent that in this format. The two nearest representable numbers are +0.00000000000000000000001₂•2^−126 just above it and +0.00000000000000000000000₂•2^−126 just below it. They are equally near 2^−150. The default rounding method is to take the nearest representable number and, in case of ties, to take the number with the even low digit. So +0.00000000000000000000000₂•2^−126 wins the tie, and that is produced as the result for the 150th division.
What happens is simply that your system has only a limited number of bits available for a variable, and hence limited precision; even though, mathematically, you can halve a number (!= 0) indefinitely without ever reaching zero, in a computer implementation that has a limited precision for a float variable, that variable will inevitably, at some stage, become indistinguishable from zero. The more bits your system uses, the more precision it has and the later this will happen, but at some stage it will.
Since I suppose this is meant to be C, I just implemented it in C (with a counter counting each iteration), and indeed it ran for 150 rounds until the loop ended. I also implemented it with a double, where it ran for 1075 iterations. Keep in mind, however, that the C standard does not define the exact precision of a float variable. In most implementations it's 32 bits for a float and 64 for a double. With a long double, I get 16,446 iterations.

How to multiply floating point in ANSI C?

The following code:
float numberForFactorial;
float floatingPart = 1.400000;
int integralPart = 1;
numberForFactorial = ((floatingPart) - (float)integralPart) * 10;
printf("%d", (int)numberForFactorial);
Returns 3 instead of 4. Can you explain me why?
The float closest to 1.400000 is slightly less than 1.4. You can verify that by doing
printf("%.8f\n", floatingPart);
The result from ideone is 1.39999998. This means that 10 times the fractional part is slightly less than 4, so truncating it gives 3.
To avoid this issue, use rounding instead of truncation. One easy way to round a non-negative value is by adding half before truncation:
printf("%d", (int)(numberForFactorial + 0.5f));
will print 4 as you were expecting. You can also use round, rint, or lround to get the same result: https://www.gnu.org/software/libc/manual/html_node/Rounding-Functions.html. Rounding is a complex topic, so choose the method whose constraints match your situation best.
This is due to the binary representation of floating-point values. More specifically, 0.4, or 2/5, cannot be expressed as a finite sum of powers of two such as 1/2 + 1/4 + 1/8 + ..., which is all the mantissa can store.
The literal 1.400000 is stored as something closer to 1.399999976158142 in its binary representation. The cast to int truncates non-integer part, giving three as the final result.
To be pedantic, the C standard does not require a binary-based representation for floating-point data types; however, IEEE 754 is the de facto standard in today's computing.

I cant understand this program [duplicate]

I can't understand this code. How can two same numbers be compared?
#include <stdio.h>
int main()
{
    float a = 0.8;
    if (0.8 > a) // how can we compare same numbers?
        printf("c");
    else
        printf("c++");
    return 0;
}
How can this problem be solved?
I do not understand why you ask whether the same two numbers can be compared. Why would you not expect a comparison such as 3 > 3 to work, even though 3 is the same as 3? The > operator returns true if the left side is greater than the right side, and it returns false otherwise. In this case, .8 is not greater than a, so the result is false.
Other people seem to have assumed that you are asking about some floating-point rounding issue. However, even with exact mathematics, .8 > .8 is false, and that is the result you get with the code you showed; the else branch is taken. So there is no unexpected behavior here to explain.
What is your real question?
In case you are asking about floating-point effects, some information is below.
In C source, the text “.8” stands for one of the numbers that is representable in double-precision that is nearest .8. (A good implementation uses the nearest number, but the C standard allows some slack.) In the most common floating-point format, the nearest double-precision value to .8 is (as a hexadecimal floating-point numeral) 0x1.999999999999ap-1, which is (in decimal) 0.8000000000000000444089209850062616169452667236328125.
The reason for this is that binary floating-point represents numbers only with bits that stand for powers of two. (Which powers of two depends on the exponent of the floating-point value, but, regardless of the exponent, every bit in the fraction portion represents some power of two, such as .5, .25, .125, .0625, and so on.) Since .8 is not an exact multiple of any power of two, then, when the available bits in the fraction portion are all used, the resulting value is only close to .8; it is not exact.
The initialization float a = .8; converts the double-precision value to single-precision, so that it can be assigned to a. In single-precision, the representable number closest to .8 is 0x1.99999ap-1 (in decimal, 0.800000011920928955078125).
Thus, when you compare “.8” to a, you find that .8 > a is false.
For some other values, such as .7, the nearest representable numbers work out differently, and the relational operator returns true. For example, .7 > .7f is true. (The text “.7f” in source code stands for a single-precision floating-point value near .7.)
0.8 is a double. When a is set to it, it is converted into a float and at this point loses precision. The comparison takes the float and promotes it back to a double, so the two values differ.
EDIT: I can prove my point. I just compiled and ran a program
float a = 0.8;
int b = a == 0.8 ? 1 : 0;
int c = a < 0.8 ? 1 : 0;
int d = a > 0.8 ? 1 : 0;
printf("b=%d, c=%d, d=%d, a=%.12f 0.8=%.12f \n", b, c, d, a, 0.8);
b=0, c=0, d=1, a=0.800000011921 0.8=0.800000000000
Notice how a now has a very small extra fractional part. It was introduced when 0.8 was rounded to float, and the promotion back to double preserves it.

Can anyone explain me this feature simply?

I have the following code,
float a = 0.7;
if (0.7 > a)
    printf("Hi\n");
else
    printf("Hello\n"); //Line1
and
float a = 0.98;
if (0.98 > a)
    printf("Hi\n");
else
    printf("Hello\n"); //Line2
Here the first snippet (Line1) outputs Hi, but the second (Line2) outputs Hello. I assumed there would be a fixed rule for a double constant versus a float, i.e., that one of them would always be the larger on evaluation. But these two snippets show that sometimes the double constant is larger and sometimes the float is. Is there a rounding issue behind this? If so, please explain it; I badly need to understand this.
Thanks in advance.
What you have is called representation error.
To see what is going on you might find it easier to first consider the decimal representations of 1/3, 1/2 and 2/3 stored with different precision (3 decimal places or 6 decimal places):
a = 0.333
b = 0.333333
a < b
a = 0.500
b = 0.500000
a == b
a = 0.667
b = 0.666667
a > b
Increasing the precision can make the number slightly larger, slightly smaller, or have the same value.
The same logic applies to binary floating point numbers.
float a = 0.7;
Now a is the closest single-precision floating point value to 0.7. For the comparison 0.7 > a, a is promoted to double, since the type of the constant 0.7 is double, and the constant's value is the closest double-precision floating point value to 0.7. These two values are different, since 0.7 isn't exactly representable, so one value is larger than the other.
The same applies to 0.98. Sometimes, the closest single-precision value is larger than the decimal fraction and the closest double-precision number smaller, sometimes the other way round.
This is part of What Every Computer Scientist Should Know About Floating-Point Arithmetic.
This is simply one of the issues with floating point precision.
While there are infinitely many real numbers, there are only finitely many floating point representations, due to the limited number of bits. So there will be rounding errors when using floats in this manner.
There is no single criterion for whether a given decimal rounds up or down: it depends on which representable value is nearest, and the details may depend on the language implementation or compiler.
See here: http://en.wikipedia.org/wiki/Floating_point, and http://en.wikipedia.org/wiki/IEEE_754 for more details.

exact representation of floating points in c

#include <stdio.h>
int main(void)
{
    float a = 0.7;
    if (a < 0.7)
        printf("c");
    else
        printf("c++");
    return 0;
}
In the above code, "c" will be printed for 0.7, but "c++" will be printed for 0.8. Why?
And how is any float represented in binary form?
At some places, it is mentioned that internally 0.7 will be stored as 0.699997, but 0.8 as 0.8000011. Why so?
Basically, with float you get 32 bits that encode
VALUE = SIGN * MANTISSA * 2 ^ (EXPONENT - 127)
(for normal numbers, where MANTISSA is read as 1.xxx... in binary and EXPONENT is the stored 8-bit value)
32 bits = 1 bit (SIGN) + 23 bits (MANTISSA) + 8 bits (EXPONENT)
and that is stored as
MSB                    LSB
[SIGN][EXPONENT][MANTISSA]
Since you only get 23 bits for the mantissa, that's the amount of "precision" you can store. If you are trying to represent a value that is irrational, or a fraction whose expansion repeats in base 2, the sequence of bits will be "rounded off" at the 23rd bit.
0.7 base 10 is 7 / 10 which in binary is 0b111 / 0b1010 you get:
0.1011001100110011001100110011001100110011001100110011... etc
Since this repeats, in fixed precision there is no way to exactly represent it. The
same goes for 0.8 which in binary is:
0.1100110011001100110011001100110011001100110011001101... etc
To see what the fixed precision value of these numbers is, you have to "cut them off" at the number of bits you have and do the math. The only trick is that the leading 1 is implied and not stored, so you technically get an extra bit of precision. Because of rounding, the last bit will be a 1 or a 0 depending on the value of the truncated bits.
So the value of 0.7 is effectively 11,744,051 / 2^24 (no rounding effect) = 0.699999988 and the value of 0.8 is effectively 13,421,773 / 2^24 (rounded up) = 0.800000012.
That's all there is to it :)
A good reference for this is What Every Computer Scientist Should Know About Floating-Point Arithmetic. You can use higher precision types (e.g. double) or a Binary Coded Decimal (BCD) library to achieve better floating point precision if you need it.
The internal representation is IEEE 754.
You can also use this calculator to convert decimal to float, I hope this helps to understand the format.
floats will be stored as described in IEEE 754: 1 bit for sign, 8 for a biased exponent, and the rest storing the fractional part.
Think of numbers representable as floats as points on the number line, some distance apart; frequently, decimal fractions will fall in between these points, and the nearest representation will be used; this leads to the counterintuitive results you describe.
"What every computer scientist should know about floating point arithmetic" should answer all your questions in detail.
If you want to know how float/double is represented in C (and almost all languages), please refer to the Standard for Floating-Point Arithmetic (IEEE 754): http://en.wikipedia.org/wiki/IEEE_754-2008
Using single-precision floats as an example, here is the bit layout:
seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm    meaning
31                             0    bit #
s = sign bit, e = exponent, m = mantissa
Another good resource to see how floating point numbers are stored as binary in computers is Wikipedia's page on IEEE-754.
Floating point numbers in C/C++ are represented in IEEE-754 standard format. There are many articles on the internet, that describe in much better detail than I can here, how exactly a floating point is represented in binary. A simple search for IEEE-754 should illuminate the mystery.
0.7 is a numeric literal; its value is the mathematical real number 0.7, rounded to the nearest double value.
After initialising float a = 0.7, the value of a is 0.7 rounded to float, that is the real number 0.7, rounded to the nearest double value, rounded to the nearest float value. Except by a huge coincidence, you wouldn't expect a to be equal to 0.7.
"if (a < 0.7)" compares 0.7 rounded to double then to float with the number 0.7 rounded to double. It seems that in the case of 0.7, the rounding produced a smaller number. And in the same experiment with 0.8, rounding 0.8 to float will produce a larger number than 0.8.
Exact comparisons of floating point values that have been rounded or converted are not reliable. You should use a threshold-tolerant (epsilon) comparison instead.
Try IEEE-754 Floating-Point Conversion and see what you get. :)

Resources