C - Summation with Double - Precision - c

I have a problem with the precision of the double format.
Sample example:
double K=0, L=0, M=0;
scanf("%lf %lf %lf", &K, &L, &M);
if((K+L) <= M) printf("Incorrect input");
else printf("Right, K=%f, L=%f, M=%f", K, L, M);
My test input:
K = 0.1, L = 0.2, M = 0.3 -> the condition should be true, but execution goes to the 'else' branch.
How can I correct this difference? Is there another way to do the summation?

In IEEE 754 double-precision binary floating point (the format used on Intel and most other processors), 0.1 + 0.2 == 0.30000000000000004 :-) And 0.30000000000000004 != 0.3. Note that in the marvelous world of doubles, 0.1, 0.2 and 0.3 don't exist as "exact" quantities. There are doubles that are very near them, but if you printed them with full precision, they wouldn't be 0.1, 0.2 and 0.3.
To laugh a little, try this: http://pages.cs.wisc.edu/~rkennedy/exact-float
Insert a decimal number and look at the second and third rows; they show how the number is really represented in memory. The page is written for Delphi, but Double and Single are the same in Delphi and in probably all C compilers for Intel processors (where they are called double and float).
And if you want to try for yourself, look at this http://ideone.com/WEL7h
#include <stdio.h>
int main()
{
double d1 = (0.1 + 0.2);
double d2 = 0.3;
printf("%.20e\n%.20e", d1, d2);
return 0;
}
output:
3.00000000000000044409e-01
2.99999999999999988898e-01
(Be aware that the output is compiler dependent. Depending on the options, 0.1 + 0.2 could be evaluated at compile time and rounded to 0.3.)

Unlike integer values, floating-point values are not always stored exactly the way you assign them. Let's consider the following code:
int i = 1; // this is and always will be 1
float j = 0.03; // this gets stored at least on my machine as something like 0.029999999
Why is this so? Well, how many real numbers exist in the interval between 0.1 and 0.2?
An infinite number! A float has only a finite number of bits, so some values will get stored exactly as you intended, but a hell of a lot of values will be stored with a small error.
This is the reason why comparing floating point values for equality is not a good idea. Try something like this instead:
float a = 0.3f;
float b = 0.301f;
float threshold = 1e-6f;
if( fabsf(a-b) < threshold ) // fabsf is declared in <math.h>; abs() works on int and would truncate the difference
return true;
else
return false;

There are infinitely many real numbers between any two distinct real numbers. If we were to represent every one of those, we would need infinite memory. Since we only have finite memory, floating-point numbers have to be stored with finite precision. Up to that finite precision, it might not be true that 0.1 + 0.2 <= 0.3.
Now, you really should go read what's at the other end of the excellent link provided by Paul R.

Related

While loop condition with double and float

I have a simple task that says: 'Write the value of y with the following formula for the range between xmin and xmax with the difference of dx.'
The only problem I have is that when using while with float, as in the code I am going to provide, I get one less output of y than I should.
For the following code
#include <stdio.h>
int main() {
float x,xmin,xmax,dx,y;
printf("Input the values of xmin xmax and dx");
scanf("%f%f%f",&xmin,&xmax,&dx);
x=xmin;
while(x<=xmax) {
y=(x*x-2*x-2)/(x*x+1);
printf("%.3f %.3f\n",x,y);
x=x+dx;
}
}
For the input (-2 2 0.2) I get output only up to 1.8 (that's 20 outputs) and not up to 2.
But when I use double instead of float, everything works just fine (21 outputs).
Is there something connected to the while condition that I am not aware of?
That makes sense. A float or double is an approximation rather than an exact representation of a rational number a/b (a, b integers, b != 0). The absolute error grows with the magnitude of the value, but even near 1.000... it is still an approximation.
Among the rational numbers guaranteed to be represented exactly by a float are the powers of two 2^k, with integer k in the range -126 <= k <= 127. E.g. const float dx = 0.25f; (that is, 1/(2^2)) would have worked fine.
0.2 is not represented as 0.2 rather as: 0.20000000298023223876953125
The next closest approximation to 0.2 is: 0.199999988079071044921875
https://www.h-schmidt.net/FloatConverter/IEEE754.html
An alternative way to loop floats might be:
#include <stdio.h>
int main() {
float x,xmin,xmax,dx,y;
printf("Input the values of xmin xmax and dx");
scanf("%f%f%f",&xmin,&xmax,&dx);
x=xmin;
//expected cumulative error
const float e = 0.7 * dx;
do
{
y=(x*x-2*x-2)/(x*x+1);
printf("%.3f %.3f\n",x,y);
x=x+dx;
}
while(!(x > (xmax + e)));
}
The solution above appears to work as expected, but it would only do so for a small number of iterations.

Round off error in C (Forward and Backward sum)

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
int main() {
/* Enter your code here. Read input from STDIN. Print output to STDOUT */
float sum=0.0;
for(int i=1;i<=1000000;i++)
sum+=(1.0)/i;
printf("Forward sum is %f ",sum);
sum=0.0;
for(int i=1000000;i>=1;i--)
sum+=(1.0)/i;
printf("Backward sum is %f ",sum);
return 0;
}
Output:-
Forward sum is :- 14.357358
Backward sum is :- 14.392652.
Why is there a difference between the two sums? I think some precision error is causing the difference, but I cannot get a clear picture of why this is happening.
This is one of the surprising aspects of floating-point arithmetic: it actually matters what order you do things like addition in. (Formally, we say that floating-point addition is not associative.)
It's pretty easy to see why this is the case, with a simpler, slightly artificial example. Let's say you have this addition problem:
1000000. + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
But let's say that you're using a single-precision floating-point format that has only 7 digits of precision. So even though you might think that 1000000.0 + 0.1 would be 1000000.1, actually it would be rounded off to 1000000.. So 1000000.0 + 0.1 + 0.1 would also be 1000000., and adding in all 10 copies of 0.1 would still result in just 1000000., also.
But if instead you tried this:
0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 1000000.
Now, when you add 0.1 + 0.1, there's no problem with precision, so you get 0.2. So if you add 0.1 ten times, you get 1.0. So if you do the whole problem in that order, you'll get 1000001..
You can see this yourself. Try this program:
#include <stdio.h>
int main()
{
float f1 = 100000.0, f2 = 0.0;
int i;
for(i = 0; i < 10; i++) {
f1 += 0.1;
f2 += 0.1;
}
f2 += 100000.0;
printf("%.1f %.1f\n", f1, f2);
}
On my computer, this prints 100001.0 100001.0, as expected. But if I change the two big numbers to 10000000.0, then it prints 10000000.0 10000001.0. The two numbers are clearly unequal.
The first loop starts by adding relatively large parts to the sum, and adds increasingly smaller parts as the sum gets larger. So while more bits are needed to represent the sum, fewer bits are available for the small parts.
In the second loop, small parts are added to the sum first, and increasingly larger parts are added as the sum gets larger. So fewer bits are required to store the newly added part relative to the current value of sum.
(Not a very scientific explanation, but I hope this verbal attempt makes the principle clear.)
N.B.: it also means the second result is more accurate.
In an attempt to be more precise: in order to add two floating-point numbers, they first need to be scaled to the same exponent. When the sum gets larger, the item to be added is shifted to match the exponent of the sum, so as not to lose significance of the sum. As a result, the least significant bits of the item to be added are shifted out of the register before the addition. For example (hypothetical), adding 0.00000001 to 1,000,000,000 will result in adding zero to this large number.

Looping over a range of floats in C

I am trying to write a program in C to accomplish the following task.
Input: Three double-precision numbers, a, b, and c.
Output: All the numbers from b to a, that can be reached by decrements of c.
Here is a simple program (filename: range.c).
#include <stdlib.h>
#include <stdio.h>
int main()
{
double high, low, step, var;
printf("Enter the <lower limit> <upperlimit> <step>\n>>");
scanf("%lf %lf %lf", &low, &high, &step);
printf("Number in the requested range\n");
for (var = high; var >= low; var -= step)
printf("%g\n", var);
return 0;
}
However, the for loop behaves rather bizarrely for some inputs. For instance, the following.
10-236-49-81:stackoverflow pavithran$ ./range.o
Enter the <lower limit> <upperlimit> <step>
>>0.1 0.9 0.2
Number in the requested range
0.9
0.7
0.5
0.3
10-236-49-81:stackoverflow pavithran$
I cannot figure out why the loop quits at var = 0.1. While for another input, it behaves as expected.
10-236-49-81:stackoverflow pavithran$ ./range.o
Enter the <lower limit> <upperlimit> <step>
>>0.1 0.5 0.1
Number in the requested range
0.5
0.4
0.3
0.2
0.1
10-236-49-81:stackoverflow pavithran$
Has the weird behaviour in the first situation got something to do with numeric precision?
How can I ensure that the range will always contain floor((high - low)/step) + 1 numbers?
I have tried an alternate method of looping over floats, where I scale the loop variables to integers, and print the result of the loop variable divided by the scaling used. But there's perhaps a better way...
Using a double as a counter in a for loop requires very careful consideration. In many instances it's best avoided.
I'm sure you know that not all numbers that are exact in decimal are also exact in binary floating point. In fact, for IEEE754 floating point, only dyadic rationals are. So 0.5 is, but 0.4, 0.3, 0.2, and 0.1 are not.
The closest IEEE754 floating point double to 0.2 is actually the slightly larger 0.200000000000000011102230246251565404236316680908203125.
In your case, repeated subtraction of this from 0.9 eventually produces a value just below the double closest to 0.1 (about 0.09999999999999992), so the condition var >= low fails and the loop exits before printing 0.1: your bug then manifests itself.
The simple remedy is to work in integers, decrement by 1 each time, and scale your output using step.

Strange output when using float instead of double

Strange output when I use float instead of double
#include <stdio.h>
void main()
{
double p,p1,cost,cost1=30;
for (p = 0.1; p < 10;p=p+0.1)
{
cost = 30-6*p+p*p;
if (cost<cost1)
{
cost1=cost;
p1=p;
}
else
{
break;
}
printf("%lf\t%lf\n",p,cost);
}
printf("%lf\t%lf\n",p1,cost1);
}
Gives output as expected at p = 3;
But when I use float the output is a little weird.
#include <stdio.h>
void main()
{
float p,p1,cost,cost1=40;
for (p = 0.1; p < 10;p=p+0.1)
{
cost = 30-6*p+p*p;
if (cost<cost1)
{
cost1=cost;
p1=p;
}
else
{
break;
}
printf("%f\t%f\n",p,cost);
}
printf("%f\t%f\n",p1,cost1);
}
Why is the increment of p in the second case going weird after 2.7?
This is happening because the float and double data types store numbers in base 2. Most decimal fractions can't be stored exactly. Rounding errors add up much more quickly when using floats. Outside of embedded applications with limited memory, it's generally better, or at least easier, to use doubles for this reason.
To see this happening for double types, consider the output of this code:
#include <stdio.h>
int main(void)
{
double d = 0.0;
for (int i = 0; i < 100000000; i++)
d += 0.1;
printf("%f\n", d);
return 0;
}
On my computer, it outputs 9999999.981129. So after 100 million iterations, rounding error made a difference of 0.018871 in the result.
For more information about how floating-point data types work, read What Every Computer Scientist Should Know About Floating-Point Arithmetic. Or, as akira mentioned in a comment, see the Floating-Point Guide.
Your program can work fine with float. You don't need double to compute a table of 100 values to a few significant digits. You can use double, and if you do, it will have chances to work even if you use binary floating-point at cross-purposes. The IEEE 754 double-precision format used for double by most C compilers is so precise that it makes many misuses of floating-point unnoticeable (but not all of them).
Values that are simple in decimal may not be simple in binary
A consequence is that a value that is simple in decimal may not be represented exactly in binary.
This is the case for 0.1: it is not simple in binary, and it is not represented exactly as either double or float, but the double representation has more digits and as a result, is closer to the intended value 1/10.
Floating-point operations are not exact in general
Binary floating-point operations in a format such as float or double have to produce a result in the intended format. This leads to some digits having to be dropped from the result each time an operation is computed. When using binary floating-point in an advanced manner, the programmer sometimes knows that the result will have few enough digits for all the digits to be represented in the format (in other words, sometimes a floating-point operation can be exact, and advanced programmers can predict and take advantage of the conditions in which this happens). But here, you are adding 0.1, which is not simple and (in binary) uses all the available digits, so most of the time this addition is not exact.
How to print a small table of values using only float
In for (p = 0.1; p < 10;p=p+0.1), the value of p, being a float, will be rounded at each iteration. Each iteration will be computed from a previous iteration that was already rounded, so the rounding errors will accumulate and make the end result drift away from the intended, mathematical value.
Here is a list of improvements over what you wrote, in increasing order of exactness:
for (i = 1, p = 0.1f; i < 100; i++, p = i * 0.1f)
In the above version, 0.1f is not exactly 1/10, but the computation of p involves only one multiplication and one rounding, instead of up to 100. That version gives a more precise approximation of i/10.
for (i = 1, p = 0.1f; i < 100; i++, p = i * 0.1)
In the very slightly different version above, i is multiplied by the double value 0.1, which more closely approximates 1/10. The result is always the closest float to i/10, but this solution is cheating a bit, since it uses a double multiplication. I said a solution existed with only float!
for (i = 1, p = 0.1f; i < 100; i++, p = i / 10.0f)
In this last solution, p is computed as the division of i, represented exactly as a float because it is a small integer, by 10.0f, which is also exact for the same reason. The only approximation is that of a single operation, and the arguments are exactly what we wanted them to be, so this is the best solution. It produces the closest float to i/10 for all values of i between 1 and 99.

What is going wrong with this loop condition? [duplicate]

This question already has answers here:
How dangerous is it to compare floating point values?
Look at the output of this link (scroll down to see the output) to find out what I'm trying to accomplish.
The problem is with the for loop on lines 9-11:
for(i=0; i<=0.9; i+=0.1){
printf("%6.1f ",i);
}
I expected this to print values from 0.0 to 0.9, but it stops after printing 0.8. Any idea why?
Using float here is the source of the problem. Instead, do it with an int:
int i;
for(i = 0; i <= 10; i++)
printf("%6.1f ", (float)(i / 10.0));
Output:
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Ideally, floating point should not be used for iteration, but if you want to know why it fails, change your code like this (the snippet is Java) and see what happens:
for(float i=0; i<=0.9f; ){
i+=0.1f;
System.out.println(i);
}
Here is the result.
0.1
0.2
0.3
0.4
0.5
0.6
0.70000005
0.8000001
0.9000001
Your 9th value exceeds 0.9.
Floating point arithmetic is inexact in computing. This is because of the way that a computer represents floating point values. Here's an excerpt from an MSDN article on the subject:
Every decimal integer can be exactly represented by a binary integer; however, this is not true for fractional numbers. In fact, every number that is irrational in base 10 will also be irrational in any system with a base smaller than 10.
For binary, in particular, only fractional numbers that can be represented in the form p/q, where q is an integer power of 2, can be expressed exactly, with a finite number of bits.
Even common decimal fractions, such as decimal 0.0001, cannot be represented exactly in binary. (0.0001 is a repeating binary fraction with a period of 104 bits!)
Link to the full article: https://support.microsoft.com/kb/42980
Floating-point numbers cannot precisely represent most decimal fractions, so rounding errors accumulate:
#include <iostream>
#include <iomanip>
using namespace std;
int main() {
float literal = 0.9;
float sum = 0;
for(int i = 0; i < 9; i++)
sum += 0.1;
cout << setprecision(10) << literal << ", " << sum << endl;
return 0;
}
Output:
0.8999999762, 0.9000000954
Your loop is right, but float comparison in loops is not safe.
The problem is that a binary floating-point number cannot exactly represent 0.1.
This would work.
for(i=0.0; i<=0.9001; i+=0.1){
printf("%6.1f ",i);
}

Resources