Round off error in C (Forward and Backward sum) - c

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
int main() {
/* Enter your code here. Read input from STDIN. Print output to STDOUT */
float sum=0.0;
for(int i=1;i<=1000000;i++)
sum+=(1.0)/i;
printf("Forward sum is %f ",sum);
sum=0.0;
for(int i=1000000;i>=1;i--)
sum+=(1.0)/i;
printf("Backward sum is %f ",sum);
return 0;
}
Output:-
Forward sum is :- 14.357358
Backward sum is :- 14.392652.
Why is there a difference in both sums ? I think that there is some precision error which is causing the difference in both the sums but I am not able to get a clear picture of why is this happening.

This is one of the surprising aspects of floating-point arithmetic: it actually matters what order you do things like addition in. (Formally, we say that floating-point addition is not commutative.)
It's pretty easy to see why this is the case, with a simpler, slightly artificial example. Let's say you have this addition problem:
1000000. + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
But let's say that you're using a single-precision floating-point format that has only 7 digits of precision. So even though you might think that 1000000.0 + 0.1 would be 1000000.1, actually it would be rounded off to 1000000.. So 1000000.0 + 0.1 + 0.1 would also be 1000000., and adding in all 10 copies of 0.1 would still result in just 1000000., also.
But if instead you tried this:
0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 1000000.
Now, when you add 0.1 + 0.1, there's no problem with precision, so you get 0.2. So it you add 0.1 ten times, you get 1.0. So if you do the whole problem in that order, you'll get 1000001..
You can see this yourself. Try this program:
#include <stdio.h>
int main()
{
float f1 = 100000.0, f2 = 0.0;
int i;
for(i = 0; i < 10; i++) {
f1 += 0.1;
f2 += 0.1;
}
f2 += 100000.0;
printf("%.1f %.1f\n", f1, f2);
}
On my computer, this prints 100001.0 100001.0, as expected. But if I change the two big numbers to 10000000.0, then it prints 10000000.0 10000001.0. The two numbers are clearly unequal.

The first loop starts with adding relatively large parts to the sum, and decreasingly smaller parts when the sum gets larger. So while more bits are needed to represent the sum, less bits are available for the small parts.
In the second loop, small parts are added to the sum, and increasingly larger parts are added when the sum gets larger. So less bits are required to store the the newly added part relative to the current value of sum.
(Not a very scientific explanation, but i hope this verbal attempt makes the principle clear)
N.b.: it also means the second result is more accurate.
In an attempt to be more precise: in order to add two floating point numbers they need to be scaled to have the same number of bits for mantissa and exponent. When the sum gets larger, the item added to will be scaled so as not to loose significance of this sum. As a result, the least significant bits of the part to be added will be scaled out of the register before the addition. For example (hypothetical) adding 0.00000001 to 1,000,000,000 will result in adding zero to this large number.

Related

Determining the number of decimal digits in a floating number

I am trying to write a program that outputs the number of the digits in the decimal portion of a given number (0.128).
I made the following program:
#include <stdio.h>
#include <math.h>
int main(){
float result = 0;
int count = 0;
int exp = 0;
for(exp = 0; int(1+result) % 10 != 0; exp++)
{
result = 0.128 * pow(10, exp);
count++;
}
printf("%d \n", count);
printf("%f \n", result);
return 0;
}
What I had in mind was that exp keeps being incremented until int(1+result) % 10 outputs 0. So for example when result = 0.128 * pow(10,4) = 1280, result mod 10 (int(1+result) % 10) will output 0 and the loop will stop.
I know that on a bigger scale this method is still inefficient since if result was a given input like 1.1208 the program would basically stop at one digit short of the desired value; however, I am trying to first find out the reason why I'm facing the current issue.
My Issue: The loop won't just stop at 1280; it keeps looping until its value reaches 128000000.000000.
Here is the output when I run the program:
10
128000000.000000
Apologies if my description is vague, any given help is very much appreciated.
I am trying to write a program that outputs the number of the digits in the decimal portion of a given number (0.128).
This task is basically impossible, because on a conventional (binary) machine the goal is not meaningful.
If I write
float f = 0.128;
printf("%f\n", f);
I see
0.128000
and I might conclude that 0.128 has three digits. (Never mind about the three 0's.)
But if I then write
printf("%.15f\n", f);
I see
0.128000006079674
Wait a minute! What's going on? Now how many digits does it have?
It's customary to say that floating-point numbers are "not accurate" or that they suffer from "roundoff error". But in fact, floating-point numbers are, in their own way, perfectly accurate — it's just that they're accurate in base two, not the base 10 we're used to thinking about.
The surprising fact is that most decimal (base 10) fractions do not exist as finite binary fractions. This is similar to the way that the number 1/3 does not even exist as a finite decimal fraction. You can approximate 1/3 as 0.333 or 0.3333333333 or 0.33333333333333333333, but without an infinite number of 3's it's only an approximation. Similarly, you can approximate 1/10 in base 2 as 0b0.00011 or 0b0.000110011 or 0b0.000110011001100110011001100110011, but without an infinite number of 0011's it, too, is only an approximation. (That last rendition, with 33 bits past the binary point, works out to about 0.0999999999767.)
And it's the same with most decimal fractions you can think of, including 0.128. So when I wrote
float f = 0.128;
what I actually got in f was the binary number 0b0.00100000110001001001101111, which in decimal is exactly 0.12800000607967376708984375.
Once a number has been stored as a float (or a double, for that matter) it is what it is: there is no way to rediscover that it was initially initialized from a "nice, round" decimal fraction like 0.128. And if you try to "count the number of decimal digits", and if your code does a really precise job, you're liable to get an answer of 26 (that is, corresponding to the digits "12800000607967376708984375"), not 3.
P.S. If you were working with computer hardware that implemented decimal floating point, this problem's goal would be meaningful, possible, and tractable. And implementations of decimal floating point do exist. But the ordinary float and double values any of is likely to use on any of today's common, mass-market computers are invariably going to be binary (specifically, conforming to IEEE-754).
P.P.S. Above I wrote, "what I actually got in f was the binary number 0b0.00100000110001001001101111". And if you count the number of significant bits there — 100000110001001001101111 — you get 24, which is no coincidence at all. You can read at single precision floating-point format that the significand portion of a float has 24 bits (with 23 explicitly stored), and here, you're seeing that in action.
float vs. code
A binary float cannot encode 0.128 exactly as it is not a dyadic rational.
Instead, it takes on a nearby value: 0.12800000607967376708984375. 26 digits.
Rounding errors
OP's approach incurs rounding errors in result = 0.128 * pow(10, exp);.
Extended math needed
The goal is difficult. Example: FLT_TRUE_MIN takes about 149 digits.
We could use double or long double to get us somewhat there.
Simply multiply the fraction by 10.0 in each step.
d *= 10.0; still incurs rounding errors, but less so than OP's approach.
#include <stdio.h>
#include <math.h> int main(){
int count = 0;
float f = 0.128f;
double d = f - trunc(f);
printf("%.30f\n", d);
while (d) {
d *= 10.0;
double ipart = trunc(d);
printf("%.0f", ipart);
d -= ipart;
count++;
}
printf("\n");
printf("%d \n", count);
return 0;
}
Output
0.128000006079673767089843750000
12800000607967376708984375
26
Usefulness
Typically, past FLT_DECMAL_DIG (9) or so significant decimal places, OP’s goal is usually not that useful.
As others have said, the number of decimal digits is meaningless when using binary floating-point.
But you also have a flawed termination condition. The loop test is (int)(1+result) % 10 != 0 meaning that it will stop whenever we reach an integer whose last digit is 9.
That means that 0.9, 0.99 and 0.9999 all give a result of 2.
We also lose precision by truncating the double value we start with by storing into a float.
The most useful thing we could do is terminate when the remaining fractional part is less than the precision of the type used.
Suggested working code:
#include <math.h>
#include <float.h>
#include <stdio.h>
int main(void)
{
double val = 0.128;
double prec = DBL_EPSILON;
double result;
int count = 0;
while (fabs(modf(val, &result)) > prec) {
++count;
val *= 10;
prec *= 10;
}
printf("%d digit(s): %0*.0f\n", count, count, result);
}
Results:
3 digit(s): 128

While loop condition with double and float

i have a simple task that says 'Write the value of y with the following formula for the range between xmin and xmax with the difference of dx.
The only problem i have is that when using while with float, such as in code i am going to provide, i am getting one less output of y than i should have.
For the following code
#include <stdio.h>
int main() {
float x,xmin,xmax,dx,y;
printf("Input the values of xmin xmax i dx");
scanf("%f%f%f",&xmin,&xmax,&dx);
x=xmin;
while(x<=xmax) {
y=(x*x-2*x-2)/(x*x+1);
printf("%.3f %.3f\n",x,y);
x=x+dx;
}
}
for the input of (-2 2 0.2) i get output only up to 1.8 (that's 20 outputs) and not up to 2.
But when i use double instead of float everything works just fine (Has 21 outputs).
Is there something connected to the while condition that i am not aware of?
That makes sense. Float or double are an approximation rather an exact representation of Rational Numbers a/b:integers, b!=0. The closer you are to 1.000... the better the approximation but still an approximation.
A subset of rational numbers guaranteed to be exactly represented by floating point representation are rationals: 2^k, with k:integer [-126<= x <= 127 . Eg. const float dx = 0.25f; ~ 1/(2^2) would have worked fine.
0.2 is not represented as 0.2 rather as: 0.20000000298023223876953125
The next closest approximation to 0.2 is: 0.199999988079071044921875
https://www.h-schmidt.net/FloatConverter/IEEE754.html
An alternative way to loop floats might be:
#include <stdio.h>
int main() {
float x,xmin,xmax,dx,y;
printf("Input the values of xmin xmax i dx");
scanf("%f%f%f",&xmin,&xmax,&dx);
x=xmin;
//expected cummulative error
const float e = 0.7 * dx;
do
{
y=(x*x-2*x-2)/(x*x+1);
printf("%.3f %.3f\n",x,y);
x=x+dx;
}
while(!(x > (xmax + e)));
}
The solution above appears to be working as expected but it would only do so for small number of iterations.

Looping over a range of floats in C

I am trying to write a program in C to accomplish the following task.
Input: Three double-precision numbers, a, b, and c.
Output: All the numbers from b to a, that can be reached by decrements of c.
Here is a simple program (filename: range.c).
#include <stdlib.h>
#include <stdio.h>
int main()
{
double high, low, step, var;
printf("Enter the <lower limit> <upperlimit> <step>\n>>");
scanf("%lf %lf %lf", &low, &high, &step);
printf("Number in the requested range\n");
for (var = high; var >= low; var -= step)
printf("%g\n", var);
return 0;
}
However, the for loop behaves rather bizarrely for some inputs. For instance, the following.
10-236-49-81:stackoverflow pavithran$ ./range.o
Enter the <lower limit> <upperlimit> <step>
>>0.1 0.9 0.2
Number in the requested range
0.9
0.7
0.5
0.3
10-236-49-81:stackoverflow pavithran$
I cannot figure out why the loop quits at var = 0.1. While for another input, it behaves as expected.
10-236-49-81:stackoverflow pavithran$ ./range.o
Enter the <lower limit> <upperlimit> <step>
>>0.1 0.5 0.1
Number in the requested range
0.5
0.4
0.3
0.2
0.1
10-236-49-81:stackoverflow pavithran$
Had the weird behaviour in the first situation got something to do with numeric precision?
How can I ensure that the range will always contain floor((high - low)/step) + 1 numbers?
I have tried an alternate method of looping over floats, where I scale the loop variables to integers, and print the result of the loop variable divided by the scaling used. But there's perhaps a better way...
Using a double as a counter in a for loop requires very careful consideration. In many instances it's best avoided.
I'm sure you know that not all numbers that are exact in decimal are also exact in binary floating point. In fact, for IEEE754 floating point, only dyadic rationals are. So 0.5 is, but 0.4, 0.3, 0.2, and 0.1 are not.
The closest IEEE754 floating point double to 0.2 is actually the slightly larger 0.200000000000000011102230246251565404236316680908203125.
In your case a repeated subtraction of this from 0.9 eventually causes a number whose first significant figure is a to become a number whose first significant figure is a - 3: your bug then manifests itself.
The simple remedy is to work in integers, decement by 1 each time, and scale your output using step.

Round positive value half-up to 2 decimal places in C

Typically, Rounding to 2 decimal places is very easy with
printf("%.2lf",<variable>);
However, the rounding system will usually rounds to the nearest even. For example,
2.554 -> 2.55
2.555 -> 2.56
2.565 -> 2.56
2.566 -> 2.57
And what I want to achieve is that
2.555 -> 2.56
2.565 -> 2.57
In fact, rounding half-up is doable in C, but for Integer only;
int a = (int)(b+0.5)
So, I'm asking for how to do the same thing as above with 2 decimal places on positive values instead of Integer to achieve what I said earlier for printing.
It is not clear whether you actually want to "round half-up", or rather "round half away from zero", which requires different treatment for negative values.
Single precision binary float is precise to at least 6 decimal places, and 20 for double, so nudging a FP value by DBL_EPSILON (defined in float.h) will cause a round-up to the next 100th by printf( "%.2lf", x ) for n.nn5 values. without affecting the displayed value for values not n.nn5
double x2 = x * (1 + DBL_EPSILON) ; // round half-away from zero
printf( "%.2lf", x2 ) ;
For different rounding behaviours:
double x2 = x * (1 - DBL_EPSILON) ; // round half-toward zero
double x2 = x + DBL_EPSILON ; // round half-up
double x2 = x - DBL_EPSILON ; // round half-down
Following is precise code to round a double to the nearest 0.01 double.
The code functions like x = round(100.0*x)/100.0; except it handles uses manipulations to insure scaling by 100.0 is done exactly without precision loss.
Likely this is more code than OP is interested, but it does work.
It works for the entire double range -DBL_MAX to DBL_MAX. (still should do more unit testing).
It depends on FLT_RADIX == 2, which is common.
#include <float.h>
#include <math.h>
void r100_best(const char *s) {
double x;
sscanf(s, "%lf", &x);
// Break x into whole number and fractional parts.
// Code only needs to round the fractional part.
// This preserves the entire `double` range.
double xi, xf;
xf = modf(x, &xi);
// Multiply the fractional part by N (256).
// Break into whole and fractional parts.
// This provides the needed extended precision.
// N should be >= 100 and a power of 2.
// The multiplication by a power of 2 will not introduce any rounding.
double xfi, xff;
xff = modf(xf * 256, &xfi);
// Multiply both parts by 100.
// *100 incurs 7 more bits of precision of which the preceding code
// insures the 8 LSbit of xfi, xff are zero.
int xfi100, xff100;
xfi100 = (int) (xfi * 100.0);
xff100 = (int) (xff * 100.0); // Cast here will truncate (towards 0)
// sum the 2 parts.
// sum is the exact truncate-toward-0 version of xf*256*100
int sum = xfi100 + xff100;
// add in half N
if (sum < 0)
sum -= 128;
else
sum += 128;
xf = sum / 256;
xf /= 100;
double y = xi + xf;
printf("%6s %25.22f ", "x", x);
printf("%6s %25.22f %.2f\n", "y", y, y);
}
int main(void) {
r100_best("1.105");
r100_best("1.115");
r100_best("1.125");
r100_best("1.135");
r100_best("1.145");
r100_best("1.155");
r100_best("1.165");
return 0;
}
[Edit] OP clarified that only the printed value needs rounding to 2 decimal places.
OP's observation that rounding of numbers "half-way" per a "round to even" or "round away from zero" is misleading. Of 100 "half-way" numbers like 0.005, 0.015, 0.025, ... 0.995, only 4 are typically exactly "half-way": 0.125, 0.375, 0.625, 0.875. This is because floating-point number format use base-2 and numbers like 2.565 cannot be exactly represented.
Instead, sample numbers like 2.565 have as the closest double value of 2.564999999999999947... assuming binary64. Rounding that number to nearest 0.01 should be 2.56 rather than 2.57 as desired by OP.
Thus only numbers ending with 0.125 and 0.625 area exactly half-way and round down rather than up as desired by OP. Suggest to accept that and use:
printf("%.2lf",variable); // This should be sufficient
To get close to OP's goal, numbers could be A) tested against ending with 0.125 or 0.625 or B) increased slightly. The smallest increase would be
#include <math.h>
printf("%.2f", nextafter(x, 2*x));
Another nudge method is found with #Clifford.
[Former answer that rounds a double to the nearest double multiple of 0.01]
Typical floating-point uses formats like binary64 which employs base-2. "Rounding to nearest mathmatical 0.01 and ties away from 0.0" is challenging.
As #Pascal Cuoq mentions, floating point numbers like 2.555 typically are only near 2.555 and have a more precise value like 2.555000000000000159872... which is not half way.
#BLUEPIXY solution below is best and practical.
x = round(100.0*x)/100.0;
"The round functions round their argument to the nearest integer value in floating-point
format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.6.
The ((int)(100 * (x + 0.005)) / 100.0) approach has 2 problems: it may round in the wrong direction for negative numbers (OP did not specify) and integers typically have a much smaller range (INT_MIN to INT_MAX) that double.
There are still some cases when like when double x = atof("1.115"); which end up near 1.12 when it really should be 1.11 because 1.115, as a double is really closer to 1.11 and not "half-way".
string x rounded x
1.115 1.1149999999999999911182e+00 1.1200000000000001065814e+00
OP has not specified rounding of negative numbers, assuming y = -f(-x).

C - Summation with Double - Precision

I have problem with precision of double format.
Sample example:
double K=0, L=0, M=0;
scanf("%lf %lf %lf", &K, &L, &M);
if((K+L) <= M) printf("Incorrect input");
else printf("Right, K=%f, L=%f, M=%f", K, L, M);
My test input:
K = 0.1, L = 0.2, M = 0.3 -> Condition but goes to 'else' statement.
How I can correct this difference? Is there any other method to summation?
In the world of Double Precision IEEE 754 binary floating-point format (the ones used on Intel and other processors) 0.1 + 0.2 == 0.30000000000000004 :-) And 0.30000000000000004 != 0.3 (and note that in the marvelous world of doubles, 0.1, 0.2 and 0.3 don't exist as "exact" quantities. There are some double numbers that are very near them, but if you printed them with full precision, they wouldn't be 0.1, 0.2 and 0.3)
To laugh a little, try this: http://pages.cs.wisc.edu/~rkennedy/exact-float
Insert a decimal number and look at the second and third row, it shows how the number is really represented in memory. It's for Delphi, but Double and Single are the same for Delphi and for probably all the C compilers for Intel processors (they are called double and float in C)
And if you want to try for yourself, look at this http://ideone.com/WEL7h
#include <stdio.h>
int main()
{
double d1 = (0.1 + 0.2);
double d2 = 0.3;
printf("%.20e\n%.20e", d1, d2);
return 0;
}
output:
3.00000000000000044409e-01
2.99999999999999988898e-01
(be aware that the output is compiler dependant. Depending on the options, 0.1 + 0.2 could be compiled and rounded to 0.3)
Unlike integer values floating point values are not stored exactly the way you assign values to them. Lets consider the following code:
int i = 1; // this is and always will be 1
float j = 0.03 // this gets stored at least on my machine as something like 0.029999999
Why is this so? Well how many floating point number exist in the interval between 0.1 and 0.2?
An infinite number! So there are values which will get stored as you intended but a hell of a lot of values which will be stored with a small error.
This is the reason why comparing floating point values for equality is not a good idea. Try something like this instead:
float a = 0.3f;
float b = 0.301f;
float threshold = 1e-6;
if( abs(a-b) < threshold )
return true;
else
return false;
There are infinitely many real numbers between any two distinct real numbers. If we were to be able to represent every one of those, we would need infinite memory. Since we only have finite memory, floating point numbers need to be stored with only finite precision. Up to that finite precision, it might be not be true that 0.1 + 0.2 <= 0.3.
Now, you really should go read what's at the other end of the excellent link provided by Paul R.

Resources