float variable doesn't meet the conditions (C)

I'm trying to get the user to input a number between 0.00001 and 1.00000, edges not included, into a float variable. I can assume that the user isn't typing more than 5 digits after the dot.
Now, here is what I have written:
printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
scanf("%f", &gap);
while ((gap < 0.00002) || (gap > 0.99999))
{
    printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
    scanf("%f", &gap);
}
Now, when I'm typing the smallest number possible, 0.00002, I'm getting stuck in the while loop.
When I run the debugger I see that 0.00002 is stored in the float variable as 1.99999995e-005.
Can anybody clarify what I am doing wrong? Why isn't 0.00002 meeting the conditions? What is this "1.99999995e-005" thing?

The problem here is that you are using a float variable (gap), but you are comparing it with a double constant (0.00002). The constant is double because floating-point constants in C are double unless otherwise specified.
An underlying issue is that the number 0.00002 is not representable in either float or double. (It's not representable at all in binary floating point because its binary expansion is infinitely long, like the decimal expansion of 1/3.) So when you write 0.00002 in a program, the C compiler substitutes it with a double value which is very close to 0.00002. Similarly, when scanf reads the number 0.00002 into a float variable, it substitutes a float value which is very close to 0.00002. Since double numbers have more bits than floats, the double value is closer to 0.00002 than the float value.
When you compare two floating point values with different precision, the compiler converts the value with less precision into exactly the same value with more precision. (The set of values representable as double is a superset of the set of values representable as float, so it is always possible to find a double whose value is the same as the value of a float.) And that's what happens when gap < 0.00002 is executed: gap is converted to the double of the same value, and that is compared with the double (close to) 0.00002. Since both of these values are actually slightly less than 0.00002, and the double is closer, the float is less than the double.
You can solve this problem in a couple of ways. First, you can avoid the conversion, either by making gap a double and changing the scanf format to %lf, or by comparing gap to a float:
while (gap < 0.00002F || gap > 0.99999F) {
But that's not really correct, for a couple of reasons. First, there is actually no guarantee that the floating point conversion done by the C compiler is the same as the conversion done by the standard library (scanf), and the standard allows the compiler to use "either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner." (It doesn't specify in detail which value scanf produces either, but recommends that it be the nearest representable value.) As it happens, gcc and glibc (the C compiler and standard library used on Linux) both produce the nearest representable value, but other implementations don't.
Anyway, according to your error message, you want the value to be between 0.00001 and 1.00000. So your test should be precisely that:
while (gap <= 0.00001F || gap >= 1.0000F) { ...
(assuming you keep gap as a float.)
Any of the above solutions will work. Personally, I'd make gap a double in order to make the comparison more intuitive, and also change the comparison to compare against 0.00001 and 1.0000.
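Put together, that version could look like this (a minimal sketch; it assumes gap is declared here and, as the question states, the input is well-formed):

#include <stdio.h>

int main(void)
{
    double gap;   /* double, so the literals in the comparison have the same type */

    printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
    scanf("%lf", &gap);                        /* %lf reads a double */
    while (gap <= 0.00001 || gap >= 1.00000)   /* edges excluded, as the prompt says */
    {
        printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
        scanf("%lf", &gap);
    }
    printf("accepted: %f\n", gap);
    return 0;
}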
By the way, the E-05 suffix means "times ten to the power of -5" (the E stands for Exponent). You'll see that a lot; it's a standard way of writing floating point constants.

Floats are not capable of storing exact values for every possible number (there are infinitely many numbers between 0 and 1, so that is impossible). Assigning 0.00002 to a float stores a different but very close number due to the representation, which is what you are experiencing. Precision decreases as the number grows.
So you can't directly compare two close floats and expect healthy results.
More information on floating point can be found on the Wikipedia page for floating-point arithmetic.
What you could do is emulate fixed point math. Have an int n = 100000; to represent 1.00000 internally (1000 -> 0.001 and such) and do calculations accordingly or use a fixed point math library.
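A minimal sketch of that idea (the parsing and the 100000 scale factor are my own illustration, relying on the question's guarantee of at most five digits after the dot):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[32];
    long scaled = 0;           /* value in hundred-thousandths: 100000 represents 1.00000 */

    printf("Enter required Leibniz gap (between 0.00001 and 1.00000):\n");
    if (scanf("%31s", buf) == 1) {
        const char *dot = strchr(buf, '.');
        int frac_digits = 0;
        for (const char *p = buf; *p; ++p) {
            if (*p == '.')
                continue;
            scaled = scaled * 10 + (*p - '0');
            if (dot && p > dot)
                frac_digits++;
        }
        while (frac_digits++ < 5)     /* pad to exactly five decimal places */
            scaled *= 10;
    }

    if (scaled > 1 && scaled < 100000)
        printf("accepted: %ld.%05ld\n", scaled / 100000, scaled % 100000);
    else
        printf("out of range\n");
    return 0;
}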

The fraction (significand) part of a single-precision floating-point number can represent values from -2 to 2-2^-23, with a smallest quantization step of 2^-23. So if some value cannot be represented with such a step, it is represented with the nearest value according to IEEE 754 rounding rules:
0.00002 * 32768       = 0.655360043   // floating-point exponent is chosen
0.655360043 / (2^-23) = 5497558.5     // not an integer multiple of the quantization step,
5497558 * (2^-23)     = 0.655359983   // so the nearest value is chosen
5497559 * (2^-23)     = 0.655360103   // from these two variants
The first variant equals 1.999969797×10⁻⁵ in decimal format and the second equals 1.999999948×10⁻⁵ (for comparison, choosing 5497560 would give 2.000000677×10⁻⁵). So the second variant is chosen as the result, and its value is not equal to 0.00002.
The total precision of a floating-point number also depends on the exponent (roughly -126 to 127 for normalized values): it can be computed by multiplying the quantization step of the fraction part by the exponent scale. In the case of 0.00002 the total precision is (2^-23)×(2^-15) ≈ 3.6×10^-12. This means that if we add to 0.00002 a value smaller than half of that, 0.00002 remains the same. In general, the meaningful part of a floating-point number spans from 1×2^exponent down to (2^-23)×2^exponent.
That is why a very popular approach is to compare two floating-point numbers using some epsilon value which is greater than the quantization step.
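In C that usually looks like the following (a minimal sketch; the epsilon of 1e-9 is an illustrative choice, not a universal constant):

#include <math.h>

/* treat a and b as equal when they differ by less than eps */
int nearly_equal(double a, double b, double eps)
{
    return fabs(a - b) < eps;
}

/* usage, assuming gap was read as in the question: */
/* if (nearly_equal(gap, 0.00002, 1e-9)) { ... treat gap as 0.00002 ... } */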

Like some of the comments said, due to how floating point numbers are represented, you will see errors like this.
A solution to this is to change the comparison to
gap + 1e-8 < 0.00002
This gives you a small tolerance window, enough to let most cases you want pass and most cases you don't want fail.

Related

What happens if we keep dividing float 1.0 by 2 until it reaches zero?

float f = 1.0;
while (f != 0.0) f = f / 2.0;
This loop runs 150 times using 32-bit precision. Why is that so? Is it getting rounded to zero?
In common C implementations, the IEEE-754 binary32 format is used for float. It is also called “single precision.” It is a binary based format where finite numbers are represented as ±f•2^e, where f is a 24-bit binary numeral in [1, 2) and e is an integer in [−126, 127].
In this format, 1 is represented as +1.00000000000000000000000₂•2^0. Dividing that by 2 yields ½, which is represented as +1.00000000000000000000000₂•2^−1. Dividing that by 2 yields +1.00000000000000000000000₂•2^−2, then +1.00000000000000000000000₂•2^−3, and so on until we reach +1.00000000000000000000000₂•2^−126.
When that is divided by two, the mathematical result is +1.00000000000000000000000₂•2^−127, but −127 is below the normal exponent range, [−126, 127]. Instead, the significand becomes denormalized; 2^−127 is represented by +0.10000000000000000000000₂•2^−126. Dividing that by 2 yields +0.01000000000000000000000₂•2^−126, then +0.00100000000000000000000₂•2^−126, +0.00010000000000000000000₂•2^−126, and so on until we get to +0.00000000000000000000001₂•2^−126.
At this point, we have done 149 divisions by 2; +0.00000000000000000000001₂•2^−126 is 2^−149.
When the next division is performed, the result would be 2^−150, but that is not representable in this format. Even with the lowest non-zero significand, 0.00000000000000000000001₂, and the lowest exponent, −126, we cannot get to 2^−150. The next lower representable number is +0.00000000000000000000000₂•2^−126, which equals 0.
So, the real-number-arithmetic result of the division would be 2^−150, but we cannot represent that in this format. The two nearest representable numbers are +0.00000000000000000000001₂•2^−126 just above it and +0.00000000000000000000000₂•2^−126 just below it. They are equally near 2^−150. The default rounding method is to take the nearest representable number and, in case of ties, to take the number with the even low digit. So +0.00000000000000000000000₂•2^−126 wins the tie, and that is produced as the result for the 150th division.
What happens is simply that your system has only a limited number of bits available for a variable, and hence limited precision; even though, mathematically, you can halve a number (!= 0) indefinitely without ever reaching zero, in a computer implementation that has a limited precision for a float variable, that variable will inevitably, at some stage, become indistinguishable from zero. The more bits your system uses, the more precision it has and the later this will happen, but at some stage it will.
Since I suppose this is meant to be C, I just implemented it in C (with a counter counting each iteration), and indeed it ran for 150 rounds until the loop ended. I also implemented it with a double, where it ran for 1075 iterations. Keep in mind, however, that the C standard does not define the exact precision of a float variable. In most implementations it's 32 bits for a float and 64 for a double. With a long double, I get 16,446 iterations.
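For reference, here is the experiment with a counter added (a small sketch; the count of 150 assumes IEEE-754 binary32 and single-precision evaluation):

#include <stdio.h>

int main(void)
{
    float f = 1.0f;
    int count = 0;

    while (f != 0.0f) {
        f = f / 2.0f;
        count++;
    }
    printf("reached zero after %d divisions\n", count);
    return 0;
}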

printf behaviour in C

I am trying to understand what is the difference between the following:
printf("%f",4.567f);
printf("%f",4.567);
How does using the f suffix change/influence the output?
How does using the 'f' change/influence the output?
The f at the end of a floating point constant determines the type and can affect the value.
4.567 is a floating-point constant with the type and precision of double. A double can typically represent exactly about 2^64 different values. 4.567 is not one of them.*1 The closest alternatives typically are exactly
4.56700000000000017053025658242404460906982421875 // best
4.56699999999999928235183688229881227016448974609375 // next best double
4.567f is a floating-point constant with the type and precision of float. A float can typically represent exactly about 2^32 different values. 4.567 is not one of them. The closest alternatives typically are exactly
4.566999912261962890625 // best
4.56700038909912109375 // next best float
When passed to printf() as part of the ... arguments, a float is converted to a double with the same value.
So the question becomes what is the expected difference in printing?
printf("%f",4.56700000000000017053025658242404460906982421875);
printf("%f",4.566999912261962890625);
Since the default number of digits after the decimal point to print for "%f" is 6, the output for both rounds to:
4.567000
To see a difference, print with more precision or try 4.567e10, 4.567e10f.
45670000000.000000 // double
45669998592.000000 // float
Your output may differ slightly due to quality-of-implementation issues.
*1 C supports many floating-point encodings. A common one is binary64. Thus typical floating-point values are encoded as a sign * binary fraction * 2^exponent. Even simple decimal values like 0.1 cannot be represented exactly as such.
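To make the difference visible, you can ask printf for more digits than the default six (a quick sketch; the exact trailing digits depend on the implementation):

#include <stdio.h>

int main(void)
{
    printf("%.25f\n", 4.567);    /* double constant */
    printf("%.25f\n", 4.567f);   /* float constant, promoted to double for printf */
    printf("%f\n", 4.567e10);    /* double: 45670000000.000000 */
    printf("%f\n", 4.567e10f);   /* float:  45669998592.000000 (typical) */
    return 0;
}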

recurring binary for decimal number [duplicate]

float a;
a = 8.3;
if (a == 8.3)
    printf("1");
else
    printf("2");
When a is given as 8.3 or 8.4 and compared with 8.3 or 8.4 correspondingly, the output is 2, but when comparing with 8.5 the output is 1. I found that it is related to the concept of a recurring binary representation (the literal takes 8 bytes as a double). I want to know how to find which numbers have a recurring binary representation. Kindly give some input.
Recurring numbers are not representable, hence floating point comparison will not work.
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Also, as noted in the second comment, the floating-point literal 8.3 has type double while a has type float.
Comparing with epsilon – absolute error
Since floating point calculations involve a bit of uncertainty we can try to allow for this by seeing if two numbers are ‘close’ to each other. If you decide – based on error analysis, testing, or a wild guess – that the result should always be within 0.00001 of the expected result then you can change your comparison to this:
if (fabs(result - expectedResult) < 0.00001)
For example, 3/7 is a repeating binary fraction; its computed value in double precision differs from its value stored in single precision. Thus the comparison of 3/7 with its stored computed value fails.
For more please read - What Every Computer Scientist Should Know About Floating-Point Arithmetic
You should not compare floating point numbers for equality using ==. Because of how floating point numbers are actually stored in memory it will give inaccurate results.
Use something like this to determine if your number a is close enough to the desired value:
if (fabs(a - 8.3) < 0.0000005)
There are two problems here.
First is that floating point literals like 8.3 have type double, while a has type float. Doubles and floats store values to different precisions, and for values that don't have an exact floating point representation (such as 8.3), the stored values are slightly different. Thus, the comparison fails.
You could fix this by writing the comparison as a==8.3f; the f suffix forces the literal to be a float instead of a double.
However, it's bad juju to compare floating point values directly; again, most values cannot be represented exactly, but only to an approximation. If a were the result of an expression involving multiple floating-point calculations, it may not be equivalent to 8.3f. Ideally, you should look at the difference between the two values, and if it's less than some threshold, then they are effectively equivalent:
if ( fabs( a - 8.3f) < EPSILON )
{
// a is "equal enough" to 8.3
}
The exact value of EPSILON depends on a number of factors, not least of which is the magnitude of the values being compared. You only have so many digits of precision, so if the values you're trying to compare are greater than 999999.0, then you can't test for differences within 0.000001 of each other.
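One common way to account for magnitude is to scale the tolerance by the values being compared; a minimal sketch (the 1e-5f relative tolerance and 1e-12f absolute floor are illustrative choices, not universal constants):

#include <math.h>

/* relative comparison: the tolerance grows with the magnitude of the inputs,
   with a small absolute floor so values near zero can still match */
int almost_equal(float a, float b)
{
    float diff   = fabsf(a - b);
    float larger = fmaxf(fabsf(a), fabsf(b));
    return diff <= larger * 1e-5f || diff < 1e-12f;
}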

How can I round a float to a given decimal precision

I need to convert a floating-point number with system precision to one with a specified precision (e.g. 3 decimal places) for the printed output. The fprintf function will not suffice for this as it will not correctly round some numbers. All the other solutions I've tried fail in that they all reintroduce undesired precision when I convert back to a float. For example:
float xf_round1_f(float input, int prec) {
    printf("%f\t", input);
    int trunc = round(input * pow(10, prec));
    printf("%f\t", (float)trunc);
    input = (float)trunc / pow(10, prec);
    printf("%f\n", input);
    return (input);
}
This function prints the input, the truncated integer and the output to each line, and the result looks like this for some numbers supposed to be truncated to 3 decimal places:
49.975002 49975.000000 49.974998
49.980000 49980.000000 49.980000
49.985001 49985.000000 49.985001
49.990002 49990.000000 49.990002
49.995003 49995.000000 49.994999
50.000000 50000.000000 50.000000
You can see that the second step works as intended - even when "trunc" is cast to float for printing - but as soon as I convert it back to a float the precision returns. The 1st and 6th rows illustrate problem cases.
Surely there must be a way of resolving this - even if the 1st row result remained 49.975002 a formatted print would give the desired effect, but in this case there is a real problem.
Any solutions?
Binary floating-point cannot represent most decimal numerals exactly. Each binary floating-point number is formed by multiplying an integer by a power of two. For the common implementation of float, IEEE-754 32-bit binary floating-point, that integer must be in (−2^24, 2^24). There is no integer x and integer y such that x•2^y exactly equals 49.975. Therefore, when you divide 49975 by 1000, the result must be an approximation.
If you merely need to format a number for output, you can do this with the usual fprintf format specifiers. If you need to compute exactly with such numbers, you may be able to do it by scaling them to representable values and doing the arithmetic either in floating-point or in integer arithmetic, depending on your needs.
Edit: it appears you may only care about the printed results. printf is generally smart enough to do proper rounding to the number of digits you specify. If you give a format of "%.3f" you will probably get what you need.
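For example, with the problem values from the question (a quick check; the output assumes a typical IEEE-754 float):

printf("%.3f\n", 49.975002f);   /* prints 49.975 */
printf("%.3f\n", 49.995003f);   /* prints 49.995 */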
If your only problem is with the cases that are below the desired number, you can easily fix it by making everything higher than the desired number instead. Unfortunately this increases the absolute error of the answer; even a result that was exact before, such as 50.000, is now off.
Simply add this line to the end of the function:
input=nextafterf(input, input*1.0001);
See it in action at http://ideone.com/iHNTzs
49.975002 49975.000000 49.974998 49.975002
49.980000 49980.000000 49.980000 49.980003
49.985001 49985.000000 49.985001 49.985004
49.990002 49990.000000 49.990002 49.990005
49.995003 49995.000000 49.994999 49.995003
50.000000 50000.000000 50.000000 50.000004
If you require exact representation of all decimal fractions with three digits after the decimal point, you can work in thousandths. Use an integer data type to represent one thousand times the actual number for all intermediate results.
Fixed-point numbers. That is where you keep the actual numbers in a wide-precision integer format, for example long or long long, and you also keep the number of decimal places. You will then need methods to scale the fixed-point number by the decimal places, and some way to convert to/from strings.
The reason you are having trouble is that 1/10 is not representable exactly as a finite sum of fractional powers of 2 (1/2, 1/4, 1/8, etc.). This is the same reason that 1/3 is a repeating decimal in base 10 (0.33333...).
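A minimal sketch of working in thousandths (the variable names and printing format are mine):

#include <stdio.h>

int main(void)
{
    long milli = 49975;      /* 49975 thousandths represents 49.975 exactly */

    milli += 25;             /* arithmetic stays exact: now 50.000 */
    printf("%ld.%03ld\n", milli / 1000, milli % 1000);   /* prints 50.000 */
    return 0;
}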

Why does this floating-point loop terminate at 1,000,000?

The answer to this sample homework problem is "1,000,000", but I do not understand why:
What is the output of the following code?
int main(void) {
    float k = 1;
    while (k != k + 1) {
        k = k + 1;
    }
    printf("%g", k); // %g means output a floating point variable in decimal
}
If the program runs indefinitely but produces no output, write INFINITE LOOP as the answer to the question. All of the programs compile and run. They may or may not contain serious errors, however. You should assume that int is four bytes. You should assume that float has the equivalent of six decimal digits of precision. You may round your answer off to the nearest power of 10 (e.g., you can say 1,000 instead of 2^10 (i.e., 1024)).
I do not understand why the loop would ever terminate.
It doesn't run forever for the simple reason that floating point numbers are not perfect.
At some point, k will become big enough so that adding 1 to it will have no effect.
At that point, k will be equal to k+1 and your loop will exit.
Floating point numbers can be differentiated by a single unit only when they're in a certain range.
As an example, let's say you have an integer type with 3 decimal digits of precision for a positive integer and a single-decimal-digit exponent.
With this, you can represent the numbers 0 through 999 perfectly as 000x10^0 through 999x10^0 (since 10^0 is 1).
What happens when you want to represent 1000? You need to use 100x10^1. This is still represented perfectly.
However, there is no accurate way to represent 1001 with this scheme; the next number you can represent is 101x10^1, which is 1010.
So, when you add 1 to 1000, you'll get the closest match which is 1000.
The code is using a float variable.
As specified in the question, float has 6 digits of precision, meaning that any digits after the sixth will be inaccurate. Therefore, once you pass a million, the final digit will be inaccurate, so that incrementing it can have no effect.
The output of this program is not specified by the C standard, since the semantics of the float type are not specified. One likely result (what you will get on a platform for which float arithmetic is evaluated in IEEE-754 single precision) is 2^24.
All integers smaller than 2^24 are exactly representable in single precision, so the computation will not stop before that point. The next representable single precision number after 2^24, however, is 2^24 + 2. Since 2^24 + 1 is exactly halfway between that number and 2^24, in the default IEEE-754 rounding mode it rounds to the one whose trailing bit is zero, which is 2^24.
Other likely answers include 2^53 and 2^64. Still other answers are possible. Infinity (the floating-point value) could result on a platform for which the default rounding mode is round up, for example. As others have noted, an infinite loop is also possible on platforms that evaluate floating-point expressions in a wider type (which is the source of all sorts of programmer confusion, but allowed by the C standard).
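You can observe the 2^24 threshold directly with a couple of lines (a sketch assuming IEEE-754 single precision; storing into a float variable forces rounding even on platforms that evaluate expressions in a wider type):

#include <stdio.h>

int main(void)
{
    float k = 16777216.0f;   /* 2^24 */
    float k1 = k + 1.0f;     /* result rounds back to 2^24 (ties-to-even) */

    printf("%.1f\n", k1);    /* 16777216.0 */
    printf("%d\n", k1 == k); /* 1: adding 1 had no effect */
    return 0;
}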
Actually, on most C compilers, this will run forever (infinite loop), though the precise behavior is implementation defined.
The reason that most compilers will give an infinite loop is that they evaluate all floating point expressions at double precision and only round values back to float (single) precision when storing into a variable. So when the value of k gets to about 2^24, k == k + 1 will still evaluate as false (as a double can hold the value k+1 without rounding), but the k = k + 1 assignment will be a no-op, as k+1 needs to be rounded to fit into a float.
edit
gcc on x86 gets this infinite loop behavior. Interestingly, on x64 it does not, as it uses SSE instructions, which do the comparison in float precision.
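Whether your platform evaluates float expressions in a wider type can be checked with the FLT_EVAL_METHOD macro from <float.h> (C99); a quick sketch:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* 0: each type evaluated in its own precision (typical with SSE)
       1: float and double evaluated as double
       2: everything evaluated as long double (classic x87) */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}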

Resources