Comparison of one float variable to its contained value - c

#include<stdio.h>
int main()
{
float a,b;
a=4.375;
b=4.385;
if(a==4.375)
printf("YES\n");
else
printf("NO\n");
if(b==4.385)
printf("YES\n");
else
printf("NO\n");
return 0;
}
Answer of this code :
YES
NO
I always thought if i compare a float with double value. it will never match to it. unless value is pure integer. but here float "a" has 4.375 is exact in it but "b" doesn't
printf("%0.20f\n",a);
printf("%0.20f\n",b);
This prints :
4.37500000000000000000
4.38500022888183593750
but if i print
printf("%0.20f\n",4.475);
It prints 4.47499990463256835938
How is this rounding effect is showing in some and not in others.
Can anyone explain this. how should "WE" judge when value in float variable will match to that contained in it and when it doesn't ?

The conversion from decimal fraction to a binary fraction is precise only if the decimal fraction can be summed up by binary fractions like 0.5, 0.25, ..., etc.
For example in your case
0.375 = 0.25 + 0.125 = 2-2 + 2-3
So it can be represented exactly by using binary fractions.
Where as the number 0.385 can not be represented by using binary fractions precisely. So numbers like 0.5, 0.25, 0.125, ..., etc. or a combination of these numbers can be represented exactly as floating point numbers. Others like 0.385 will give incorrect results when the comparison or equality operations are performed on them.

Floating points aren't magic.
They contain an exact value and if you compare it with that they will compare equal. The two problems are
1) Some operations are not always entirely exact due to precision issues. If you add one to a float and then subtract one then adding that one might have causes some loss of precision in the least significant value bits and when you subtract it you don't get back to quite the same value you expect.
2) It is not possible to exactly represent every decimal value in the floating point binary format. For example it is not possible to store the exact value of 0.1 in a floating point binary number in exactly the same way that you can't write the value of 1/3.0 as a decimal value exactly no matter how many digits you use.
But in your case if you store a value and compare it with that same value they SHOULD compare equal as they'll both have the same issues in the same way.
Your issue though is that you are not comparing like with like.
4.375 and 4.385 are not floats they are doubles and get converted to be stored so when you compare them later it's possible that the converted value is not quite identical. If you write 4.385f and 4.385f to use float values you should get YES both times.

Related

float variable doesn't meet the conditions (C)

I'm trying to get the user to input a number between 1.00000 to 0.00001 while edges not included into a float variable. I can assume that the user isn't typing more than 5 numbers after the dot.
now, here is what I have written:
printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
scanf("%f", &gap);
while ((gap < 0.00002) || (gap > 0.99999))
{
printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
scanf("%f", &gap);
}
now, when I'm typing the smallest number possible: 0.00002 in getting stuck in the while loop.
when I run the debugger I saw that 0.00002 is stored with this value in the float variable: 1.99999995e-005
anybody can clarify for me what am I doing wrong? why isn't 0.00002 meeting the conditions? what is this "1.99999995e-005" thing.
The problem here is that you are using a float variable (gap), but you are comparing it with a double constant (0.00002). The constant is double because floating-point constants in C are double unless otherwise specified.
An underlying issue is that the number 0.00002 is not representable in either float or double. (It's not representable at all in binary floating point because it's binary expansion is infinitely long, like the decimal expansion of &frac13;.) So when you write 0.00002 in a program, the C compiler substitutes it with a double value which is very close to 0.00002. Similarly, when scanf reads the number 0.00002 into a float variable, it substitutes a float value which is very close to 0.00002. Since double numbers have more bits than floats, the double value is closer to 0.00002 than the float value.
When you compare two floating point values with different precision, the compiler converts the value with less precision into exactly the same value with more precision. (The set of values representable as double is a superset of the set of values representable as float, so it is always possible to find a double whose value is the same as the value of a float.) And that's what happens when gap < 0.00002 is executed: gap is converted to the double of the same value, and that is compared with the double (close to) 0.00002. Since both of these values are actually slightly less than 0.00002, and the double is closer, the float is less than the double.
You can solve this problem in a couple of ways. First, you can avoid the conversion, either by making gap a double and changing the scanf format to %lf, or by comparing gap to a float:
while (gap < 0.00002F || gap > 0.99999F) {
But that's not really correct, for a couple of reasons. First, there is actually no guarantee that the floating point conversion done by the C compiler is the same as the conversion done by the standard library (scanf), and the standard allows the compiler to use "either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner." (It doesn't specify in detail which value scanf produces either, but recommends that it be the nearest representable value.) As it happens, gcc and glibc (the C compiler and standard library used on Linux) both produce the nearest representable value, but other implementations don't.
Anyway, according to your error message, you want the value to be between 0.00001 and 1.00000. So your test should be precisely that:
while (gap <= 0.00001F || gap >= 1.0000F) { ...
(assuming you keep gap as a float.)
Any of the above solutions will work. Personally, I'd make gap a double in order to make the comparison more intuitive, and also change the comparison to compare against 0.00001 and 1.0000.
By the way, the E-05 suffix means "times ten to the power of -5" (the E stands for Exponent). You'll see that a lot; it's a standard way of writing floating point constants.
floats are not capable of storing exact values for every possible number (infinite numbers between 0-1 therefore impossible). Assigning 0.00002 to a float will have a different but really close number due to the implementation which is what you are experiencing. Precision decreases as the number grows.
So you can't directly compare two close floats and have healthy results.
More information on floating points can be found on this Wikipedia page.
What you could do is emulate fixed point math. Have an int n = 100000; to represent 1.00000 internally (1000 -> 0.001 and such) and do calculations accordingly or use a fixed point math library.
Fraction part of single precision floating numbers can represent numbers from -2 to 2-2^-23 and have a fraction part with smallest quantization step of 2^-23. So if some value cannot be represented with a such step then it represented with a nearest value according to IEEE 754 rounding rules:
0.00002*32768 = 0.655360043 // floating point exponent is chosen.
0.655360043/(2^-23) = 5497558.5 // is not an integer multiplier
// of quantization step, so the
5497558*(2^-23) = 0.655359983 // nearest value is chosen
5497559*(2^-23) = 0.655360103 // from these two variants
First one variant equals to 1.999969797×10⁻⁵ in decimal format and the second one equals to 1.999999948×10⁻⁵ (just to compare - if we choose 5497560 we get 2.000000677×10⁻⁵). So the second variant can be choosen as a result and its value is not equal to 0.00002.
The total precision of floating point number depends on exponent value as well (takes values from -128 to 127): it can be computed by multiplication of fraction part quantization step and exponent value. In case of 0.00002 total precision is (2^-23)×(2^-15) = 3.6×(10^-12). It means if we add to 0.00002 a value which is smaller than a half of this value than 0.00002 remains the same. In general it means that numbers of floating point number which is meaningful are from 1×exponent to 2×(10^-23)×exponent.
That is why a very popular approach is to compare two floating numbers using some epsilon value which is greater than quantization step.
Like some of the comments said, due to how floating point numbers are represented, you will see errors like this.
A solution to this is convert it to
gap + 1e-8 < 0.0002
This gives you a small window of tolerance enough to let most cases you want to pass and most you dont want to fail

recurring binary for decimal number [duplicate]

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 9 years ago.
float a;
a=8.3;
if(a==8.3)
printf("1");
else
printf("2");
giving a as 8.3 and 8.4 respectively and comparing with 8.3 and 8.4 correspondingly , output becomes 2 but when comparing with 8.5 output is 1. I found that it is related to concept of recurring binary which takes 8 bytes. I want to know how to find which number is recurring binary. kindly give some input.
Recurring numbers are not representable, hence floating point comparison will not work.
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Also as in the 2nd comment - floating point literals 8.3 has type double and a has type float.
Comparing with epsilon – absolute error
Since floating point calculations involve a bit of uncertainty we can try to allow for this by seeing if two numbers are ‘close’ to each other. If you decide – based on error analysis, testing, or a wild guess – that the result should always be within 0.00001 of the expected result then you can change your comparison to this:
if (fabs(result - expectedResult) < 0.00001)
For example, 3/7 is a repeating binary fraction, its computed value in double precision is different from its stored value in single precision. Thus the comparison 3/7 with its stored computed value fails.
For more please read - What Every Computer Scientist Should Know About Floating-Point Arithmetic
You should not compare floating point numbers for equality using ==. Because of how floating point numbers are actually stored in memory it will give inaccurate results.
Use something like this to determine if your number a is close enough to the desired value:
if(fabs(a-8.3) < 0.0000005))
There are two problems here.
First is that floating point literals like 8.3 have type double, while a has type float. Doubles and floats store values to different precisions, and for values that don't have an exact floating point representation (such as 8.3), the stored values are slightly different. Thus, the comparison fails.
You could fix this by writing the comparison as a==8.3f; the f suffix forces the literal to be a float instead of a double.
However, it's bad juju to compare floating point values directly; again, most values cannot be represented exactly, but only to an approximation. If a were the result of an expression involving multiple floating-point calcuations, it may not be equivalent to 8.3f. Ideally, you should look at the difference between the two values, and if it's less than some threshold, then they are effectively equivalent:
if ( fabs( a - 8.3f) < EPSILON )
{
// a is "equal enough" to 8.3
}
The exact value of EPSILON depends on a number of factors, not least of which is the magnitude of the values being compared. You only have so many digits of precision, so if the values you're trying to compare are greater than 999999.0, then you can't test for differences within 0.000001 of each other.

How can I round a float to a given decimal precision

I need to convert a floating-point number with system precision to one with a specified precision (e.g. 3 decimal places) for the printed output. The fprintf function will not suffice for this as it will not correctly round some numbers. All the other solutions I've tried fail in that they all reintroduce undesired precision when I convert back to a float. For example:
float xf_round1_f(float input, int prec) {
printf("%f\t",input);
int trunc = round(input * pow(10, prec));
printf("%f\t",(float)trunc);
input=(float)trunc / pow(10, prec);
printf("%f\n",input);
return (input);
}
This function prints the input, the truncated integer and the output to each line, and the result looks like this for some numbers supposed to be truncated to 3 decimal places:
49.975002 49975.000000 49.974998
49.980000 49980.000000 49.980000
49.985001 49985.000000 49.985001
49.990002 49990.000000 49.990002
49.995003 49995.000000 49.994999
50.000000 50000.000000 50.000000
You can see that the second step works as intended - even when "trunc" is cast to float for printing - but as soon as I convert it back to a float the precision returns. The 1st and 6th rows illustrate problem cases.
Surely there must be a way of resolving this - even if the 1st row result remained 49.975002 a formatted print would give the desired effect, but in this case there is a real problem.
Any solutions?
Binary floating-point cannot represent most decimal numerals exactly. Each binary floating-point number is formed by multiplying an integer by a power of two. For the common implementation of float, IEEE-754 32-bit binary floating-point, that integer must be in (–224, 224). There is no integer x and integer y such that x•2y exactly equals 49.975. Therefore, when you divide 49975 by 1000, the result must be an approximation.
If you merely need to format a number for output, you can do this with the usual fprintf format specifiers. If you need to compute exactly with such numbers, you may be able to do it by scaling them to representable values and doing the arithmetic either in floating-point or in integer arithmetic, depending on your needs.
Edit: it appears you may only care about the printed results. printf is generally smart enough to do proper rounding to the number of digits you specify. If you give a format of "%.3f" you will probably get what you need.
If your only problem is with the cases that are below the desired number, you can easily fix it by making everything higher than the desired number instead. Unfortunately this increases the absolute error of the answer; even a result that was exact before, such as 50.000 is now off.
Simply add this line to the end of the function:
input=nextafterf(input, input*1.0001);
See it in action at http://ideone.com/iHNTzs
49.975002 49975.000000 49.974998 49.975002
49.980000 49980.000000 49.980000 49.980003
49.985001 49985.000000 49.985001 49.985004
49.990002 49990.000000 49.990002 49.990005
49.995003 49995.000000 49.994999 49.995003
50.000000 50000.000000 50.000000 50.000004
If you require exact representation of all decimal fractions with three digits after the decimal point, you can work in thousandths. Use an integer data type to represent one thousand times the actual number for all intermediate results.
Fixed point numbers. That is where you keep the actual numbers in a wide precision integer format, for example long or long long. And you also keep the number of decimal places. And then you will also need methods to scale the fixed point number by the decimal places. And some way to convert to/from strings.
The reason why you are having trouble that 1/10 is not representable exactly as a fractional power of 2 (1/2, 1/4, 1/8, etc). This is the same reason that 1/3 is a repeating decimal in base 10 (0.33333...).

Can anyone explain me this feature simply?

I have the following code,
float a = 0.7;
if(0.7 > a)
printf("Hi\n");
else
printf("Hello\n"); //Line1
and
float a = 0.98;
if(0.98 > a)
printf("Hi\n");
else
printf("Hello\n"); //Line2
here line1 outputs Hi but Line2 outputs Hello. I assume there would be a certain criteria about double constant and float, i.e any one of them would become larger on evaluation. But this two codes clarify me that situation can be come when double constant get larger and some other times float get larger. Is there any rounding off issue behind this? If it is, please explain me. I am badly in need of this clear..
thanks advance
What you have is called representation error.
To see what is going on you might find it easier to first consider the decimal representations of 1/3, 1/2 and 2/3 stored with different precision (3 decimal places or 6 decimal places):
a = 0.333
b = 0.333333
a < b
a = 0.500
b = 0.500000
a == b
a = 0.667
b = 0.666667
a > b
Increasing the precision can make the number slightly larger, slightly smaller, or have the same value.
The same logic applies to binary floating point numbers.
float a = 0.7;
Now a is the closest single-precision floating point value to 0.7. For the comparison 0.7 > a that is promoted to double, since the type of the constant 0.7 is double, and its value is the closest double-precision floating point value to 0.7. These two values are different, since 0.7 isn't exactly representable, so one value is larger than the other.
The same applies to 0.98. Sometimes, the closest single-precision value is larger than the decimal fraction and the closest double-precision number smaller, sometimes the other way round.
This is part of What Every Computer Scientist Should Know About Floating-Point Arithmetic.
This is simply one of the issues with floating point precision.
While there are an infinite number of floating point numbers, there are not an infinite number of floating point representations due to the bit-constraints. So there will be rounding errors when using floats in this manner.
There is no criteria for where it decides to round up or down, that would probably be language -implementation or compiler dependent.
See here: http://en.wikipedia.org/wiki/Floating_point, and http://en.wikipedia.org/wiki/IEEE_754 for more details.

Why is not a==0 in the following code?

#include <stdio.h>
int main( )
{
float a=1.0;
long i;
for(i=0; i<100; i++)
{
a = a - 0.01;
}
printf("%e\n",a);
}
Result is: 6.59e-07
It's a binary floating point number, not a decimal one - therefore you need to expect rounding errors. See the Basic section in this article:
What Every Programmer Should Know About Floating-Point Arithmetic
For example, the value 0.01 does not have a precise represenation in binary floating point type. To get a "correct" result in your sample you would have to either round or use a a decimal floating point type (see Wikipedia):
Binary fixed-point types are most commonly used, because the rescaling operations can be implemented as fast bit shifts. Binary fixed-point numbers can represent fractional powers of two exactly, but, like binary floating-point numbers, cannot exactly represent fractional powers of ten. If exact fractional powers of ten are desired, then a decimal format should be used. For example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately by binary fixed-point or binary floating-point representations, while they can be represented exactly in decimal fixed-point or decimal floating-point representations. These representations may be encoded in many ways, including BCD.
There are two questions here. If you're asking, why is my printf statement displaying the result as 6.59e-07 instead of 0.000000659, it's because you've used the format specifier for Scientific Notation: %e. You want %f for the floating point a.
printf("%f\n",a);
If you're asking why the result is not exactly zero rather than 0.000000659, the answer is (as others have pointed out) that with floating point arithmetic using binary numbers you need to expect rounding.
You have to specify %f for printing the float number then it will print 0 for variable a.
That's floating point numbers rounding errors on the scene. Each time you subtract a fraction you get approximately the result you'd normally expect from a number on paper and so the final result is very close to zero, but not necessarily precise zero.
The precision with floating numbers isn't accurate, that's why you find this result.
Cordially

Resources