Adding two doubles gives weird rounding result in C [duplicate]

This question already has answers here:
Why adding these two double does not give correct answer? [duplicate]
(2 answers)
Closed 8 years ago.
I'm a bit of C newbie but this problem is really confusing me.
I have a double variable equal to 436553940.0000000000 (it was cast from an int) and another double variable equal to 0.095832496.
My result should be 436553940.095832496, but instead I get 436553940.095832467 (note the last two digits).
Why does this happen and how can I prevent it from happening?

The number you expect is simply not representable by a double. The value you receive is instead the closest representable approximation, as these interactive-session experiments show:
In [9]: 436553940.095832496
Out[9]: 436553940.09583247
In [18]: 436553940.095832496+2e-8
Out[18]: 436553940.09583247
In [19]: 436553940.095832496+3e-8
Out[19]: 436553940.0958325
In [20]: 436553940.095832496-2e-8
Out[20]: 436553940.09583247
In [21]: 436553940.095832496-3e-8
Out[21]: 436553940.0958324
You've just run out of significand bits.
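For instance, here is a minimal C sketch of the same computation; the printed value assumes a typical IEEE-754 double, and %.17g shows enough digits to round-trip a double:

#include <stdio.h>

int main(void)
{
    double x = 436553940.0;     /* the integer part, stored exactly */
    double y = 0.095832496;     /* already an approximation of the decimal value */
    /* the exact decimal sum is not representable; the nearest double is printed */
    printf("%.17g\n", x + y);   /* typically prints 436553940.09583247 */
    return 0;
}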

Doubles cannot represent every number. We can demonstrate this with some C++ code (C and C++ represent doubles the same way).
#include <cstdio>
#include <cmath>

int main() {
    double x = 436553940;
    double y = 0.095832496;
    double sum = x + y;
    printf("prev: %50.50lf\n", std::nextafter(sum, 0));
    printf("sum: %50.50lf\n", sum);
    printf("next: %50.50lf\n", std::nextafter(sum, 500000000));
}
This code computes the sum of the two numbers you are talking about and stores it as sum. We then compute the nearest representable double below that number and the nearest one above it.
Here's the output:
[11:43am][wlynch#watermelon /tmp] ./foo
prev: 436553940.09583240747451782226562500000000000000000000000000
sum: 436553940.09583246707916259765625000000000000000000000000000
next: 436553940.09583252668380737304687500000000000000000000000000
So we cannot have the calculation equal 436553940.095832496, because that number is not a valid double. The IEEE-754 standard (and your compiler) therefore defines rules for rounding the result to the nearest representable double.

Related

Issue with asin() while calculating pi

I am trying to teach myself C (C99 I think? gcc 8.1.0), coming from Python/Java. One of the practice problems I am working on is how to calculate pi to a given decimal.
I am currently using the following equation: 2 * (Arcsin(sqrt(1 - 0.5^2)) + abs(Arcsin(0.5))).
#include <stdio.h>
#include <math.h>

float pi_find(float nth)
{
    float x, y, z;
    /* Equation = 2 * (Arcsin(sqrt(1 - x^2)) + abs(Arcsin(x))) [x | -1 <= x <= 1, x in R] */
    x = sqrt(1 - pow(nth, 2)); /* Caret (^) notation does not work, use pow() */
    y = fabs(asin(nth));       /* abs is apparently int only, use fabs for floats */
    z = x + y;
    printf("x: %f\ny: %f\nsum: %f\n", x, y, (x + y));
    printf("%f\n", asin(z));
    return 2 * asin(z); /* <- Error happens here */
}

int main()
{
    float nth = 0.5f;
    double pi = pi_find(nth);
    printf("Pi: %f\n", pi);
    return 0;
}
Results:
x: 0.866025
y: 0.523599
sum: 1.389624
z: -1.#IND00
Pi: -1.#IND00
I know the issue lies in the addition of x + y which sums out to 1.389... and asin() can only handle values between -1 and +1 inclusive.
HOWEVER!
I am using Wolfram Alpha alongside Python to check the calculation is correct at every step, and it can calculate asin(1.389...). [1]
I don't understand imaginary mathematics; it is far beyond my capabilities as a mathematician, but below is what Wolfram is doing. [2]
1.570796 -0.8563436 i
Interpreting as: 0.8563436 i
Assuming multiplication | Use a list instead
Assuming i is the imaginary unit | Use i as a variable instead
While writing this I found out about the _Imaginary Datatype added in C99, but I don't really understand if it's doing the same thing as what Wolfram does.
Also looked up how imaginary numbers worked, but I don't really understand how 'The square roots of a negative number cannot be distinguished until one of the two is defined as the imaginary unit' works. [3]
Can someone nudge me in the direction to fix this please?
It is obviously a knowledge issue and not a mathematical or language limitation.
P.S. Yes, I know it's trash code; I am using a weird way of debugging before I rewrite it properly.
[1]:Wolfram_Alpha Calculation
[2]:Wolfram_Alpha Assumption
[3]:Imaginary Numbers
The problem is you're grouping the expression incorrectly. The desired expression is:
2 * (Arcsin(sqrt(1 - 0.5^2)) + abs(Arcsin(0.5)))
With nth substituted for 0.5, this becomes:
2 * (Arcsin(sqrt(1 - nth^2)) + abs(Arcsin(nth))).
In particular, the argument to the first Arcsin is sqrt(1 - nth^2), and the argument to the second Arcsin is nth.
You're also better off using nth * nth rather than pow(nth, 2). It's both faster and more accurate.
So what you want is:
x = asin(sqrt(1 - nth*nth));
y = fabs(asin(nth));
r = 2*(x + y);
Notice that the argument to asin can never have magnitude greater than 1 (as long as nth itself has magnitude at most 1).
Also, as I mentioned earlier in a comment, you should change all your float variables to double. You're using the double-precision math library functions anyway, so there's no reason to discard half of the precision by storing the results in float variables.
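Putting those pieces together, a corrected version might look like the sketch below. It keeps the pi_find name from the question and switches to double; it is one possible fix, not the only one:

#include <stdio.h>
#include <math.h>

/* pi = 2 * (asin(sqrt(1 - x^2)) + |asin(x)|) for any x with |x| <= 1 */
double pi_find(double nth)
{
    double x = asin(sqrt(1.0 - nth * nth)); /* first arcsine gets sqrt(1 - nth^2) */
    double y = fabs(asin(nth));             /* second arcsine gets nth itself */
    return 2.0 * (x + y);
}

int main(void)
{
    printf("Pi: %.15f\n", pi_find(0.5));    /* about 3.141592653589793 */
    return 0;
}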
In C, the float and double types model "real" numbers, which I'll assume you have a handle on.
In mathematics, "complex" numbers are an extension of the real numbers. Every real number counts as a complex number, but so do "imaginary numbers", which you can get by multiplying the real numbers by the "imaginary unit" (labeled i in mathematical notation, and conventionally described as "the square root of -1").
Mathematically speaking, the basic arithmetic operations (+, -, *, /) are defined on complex numbers. It turns out that you can extend functions like arcsine to operate on complex numbers as well.
Without getting any further into the details, Wolfram Alpha is almost certainly giving you values from a complex version of arcsine.
However, the standard C function asin() is the un-extended version: it takes a double as an argument, and returns a double as a result. Since double only models real numbers, asin() makes no sense for input values outside [-1,1].
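If you do want the extended, complex version that Wolfram Alpha uses, C99 provides casin in <complex.h>. Here is a small sketch; the value 1.389624 is taken from the question's output, and you may need to link with -lm:

#include <stdio.h>
#include <complex.h>

int main(void)
{
    /* the real asin is undefined for 1.389624, but the complex one is not */
    double complex z = casin(1.389624 + 0.0 * I);
    printf("casin(1.389624) = %f %+f*i\n", creal(z), cimag(z));
    /* prints roughly 1.570796 -0.856344*i, matching the Wolfram Alpha result */
    return 0;
}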

What is wrong with this expression in the code? [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 2 years ago.
I have written this piece of code on my computer and the result is 7 instead of 8 (the correct result... I think).
I don't know why... Can somebody help me?
#include <stdio.h>

int main() {
    int num;
    num = (68/10.0 - 68/10)*10;
    printf("the result %d", num);
    return 0;
}
A double typically represents about 2^64 different numbers exactly, and 68/10.0 is not one of them.
As a binary64, 68/10.0 is about
6.7999999999999998223643161..., the closest representable double to 6.8. (credit: @AntonH)
68/10 is an integer division with a quotient of 6.
(68/10.0 - 68/10)*10 is thus about 7.9999999999999982236431606...
Assigning that to an int yields 7, not 8, because the fractional part is discarded, even though the value is very close to 8.
When converting a floating point value to an integer, consider rounding to the closest integer rather than truncating:
num = lround((68/10.0 - 68/10)*10);
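A complete sketch of that fix (lround is declared in <math.h>; depending on your toolchain you may need to link with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* 68/10.0 is slightly below 6.8, so truncating the difference*10 yields 7 */
    int  truncated = (68/10.0 - 68/10) * 10;
    long rounded   = lround((68/10.0 - 68/10) * 10);
    printf("truncated: %d, rounded: %ld\n", truncated, rounded);  /* 7 and 8 */
    return 0;
}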

adding numbers in a for loop [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
I tried to sum some numbers in a for loop, but it didn't go as I expected:
float sum = 0;
int i;

printf("0.1+0.1=%f\n", 0.1 + 0.1);

for (i = 0; i < 1000000; i++)
{
    sum = sum + 0.1;
}

printf("the sum need to be 100000 \n");
printf("the real sum is:\n %f\n", sum);

system("PAUSE");
this program prints:
0.1+0.1=0.200000
the sum need to be 100000
the real sum is:
100958.343750
Press any key to continue . . .
Can you please explain this strange result?
The international standard for floating point numbers (IEEE 754) does not have an exact representation for some decimal numbers.
http://en.wikipedia.org/wiki/IEEE_754
It is due to the way they are stored in memory, the way the mantissa and exponent are stored.
https://en.wikipedia.org/wiki/Floating_point
This is also the reason why you should never compare two float numbers even if they look "the same".
I still remember how surprised I was the first time a simple piece of code comparing two float numbers didn't work :) That alone would open up a dedicated universe of discussions. It is well worth reading about anyway:
http://floating-point-gui.de/errors/comparison/
Floating point numbers are stored in memory as x*2^y, where x is a fraction held with limited precision and y is an integer, so most numbers are not represented exactly; they are represented by something "close enough".
When you do this addition many times, the accumulated error just becomes more visible.
You can use the double type for better accuracy.
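To see the difference concretely, here is a small sketch that accumulates the same sum in both a float and a double; neither is exact, but the double error is far smaller (the exact values depend on your platform):

#include <stdio.h>

int main(void)
{
    float  fsum = 0.0f;
    double dsum = 0.0;
    int    i;

    for (i = 0; i < 1000000; i++)
    {
        fsum = fsum + 0.1;   /* rounded to float each step: ends up near 100958 */
        dsum = dsum + 0.1;   /* still inexact, but ends up near 100000.000001 */
    }
    printf("float  sum: %f\n", fsum);
    printf("double sum: %f\n", dsum);
    return 0;
}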

Does pow() work for int data type in C? [duplicate]

This question already has answers here:
Strange behaviour of the pow function
(5 answers)
Closed 7 years ago.
I was simply writing a program to calculate the power of an integer. But the output was not as expected. It worked for all the integer numbers except for the power of 5.
My code is:
#include <stdio.h>
#include <math.h>

int main(void)
{
    int a, b;
    printf("Enter the number.");
    scanf("\n%d", &a);
    b = pow(a, 2);
    printf("\n%d", b);
}
The output is something like this:
"Enter the number. 2
4
"Enter the number. 5
24
"Enter the number. 4
16
"Enter the number. 10
99
Can't we use the pow() function for the int data type?
Floating point precision is at work here. pow is typically implemented using logarithms:
pow(a, 2) ==> exp(log(a) * 2)
Look at what the math.h header says:
<math.h>
/* Excess precision when using a 64-bit mantissa for FPU math ops can
cause unexpected results with some of the MSVCRT math functions. For
example, unless the function return value is stored (truncating to
53-bit mantissa), calls to pow with both x and y as integral values
sometimes produce a non-integral result. ... */
Just add 0.5 to the return value of pow and then convert it to int.
b = (int)(pow(a,2) + 0.5);
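As a complete sketch, assuming an implementation where pow(5, 2) comes back as something like 24.999999... rather than exactly 25:

#include <stdio.h>
#include <math.h>

int main(void)
{
    int a = 5;
    int truncated = (int)pow(a, 2);          /* may become 24 if pow returns 24.999... */
    int rounded   = (int)(pow(a, 2) + 0.5);  /* 25 either way */
    printf("truncated: %d, rounded: %d\n", truncated, rounded);
    return 0;
}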
So, the answer to your question
Does pow() work for int data type in C?
Not always. For integer exponentiation you could implement your own function (this one works for non-negative exponents only):
unsigned uint_pow(unsigned base, unsigned exp)
{
    unsigned result = 1;
    while (exp)
    {
        if (exp % 2)
            result *= base;
        exp /= 2;
        base *= base;
    }
    return result;
}
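For example, a quick usage sketch (assuming the uint_pow definition above is in the same file, before main):

#include <stdio.h>

int main(void)
{
    printf("%u\n", uint_pow(5, 2));   /* 25, computed entirely in integer arithmetic */
    printf("%u\n", uint_pow(10, 9));  /* 1000000000, still fits in a 32-bit unsigned */
    return 0;
}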
There is no int-based pow in the standard library; what you are suffering from is floating point truncation.
An int-based pow would be quite constrained anyway (the range of inputs quickly overflows an int), and in many cases, such as yours where the exponent is 2, integer powers can be computed efficiently in other ways.
Try printf("%a", pow(10, 2)) and see what you get; I expect you'll find you don't quite get 100. Call lround if you want to round instead of truncating.
The C library function is declared as double pow(double x, double y).
It takes and returns the double type, not int.

Equality test and accuracy of machine [duplicate]

This question already has answers here:
Floating point inaccuracy examples
(7 answers)
C++ floating point precision [duplicate]
(5 answers)
Closed 8 years ago.
I found this code snippet on page 174 of A Book on C by Al Kelley and Ira Pohl.
#include <stdio.h>

int main()
{
    int cnt = 0;
    double sum = 0.0, x;

    for (x = 0.0; x != 9.9; x += 0.1)
    {
        sum = sum + x;
        printf("cnt = %5d\n", cnt++);
    }
    return 0;
}
and it became an infinite loop, as the book said it would. The book didn't mention the precise reason, beyond saying that it had to do with the accuracy of the machine.
I modified the code to check whether
x == 9.9
would ever become true, i.e. whether x ever reached exactly 9.9, by adding the following lines:
diff=x-9.9;
printf("cnt =10%d \a x =%10.10lf dif=%10.10lf \n",++cnt,x,diff);
and I got the following lines among the output:
cnt =1098 x =9.7000000000 dif=-0.2000000000
cnt =1099 x =9.8000000000 dif=-0.1000000000
cnt =10100 x =9.9000000000 dif=-0.0000000000
cnt =10101 x =10.0000000000 dif=0.1000000000
cnt =10102 x =10.1000000000 dif=0.2000000000
If x is reaching the value 9.9 exactly, why is it still an infinite loop?
You are simply printing the number with too little precision to notice that it isn't exact. Try something like this:
#include <stdio.h>

int main()
{
    double d = 9.9;

    if (d == 9.9)
    {
        printf("Equal!");
    }
    else
    {
        printf("Not equal! %.20f", d);
    }
}
Output on my machine:
Not equal! 9.90000000000000035527
The book is likely trying to teach you to never use == or != operators to compare floating point variables. Also for the same reason, never use floats as loop iterators.
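One common alternative (imperfect, but often good enough) is to compare against a small tolerance instead of using ==. A rough sketch tied to the 9.9 example from the question:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 0.0;
    int i;

    /* add 0.1 ninety-nine times; x lands near, but usually not exactly on, 9.9 */
    for (i = 0; i < 99; i++)
        x += 0.1;

    printf("x == 9.9         : %s\n", x == 9.9 ? "true" : "false");
    printf("|x - 9.9| < 1e-9 : %s\n", fabs(x - 9.9) < 1e-9 ? "true" : "false");
    return 0;
}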
The problem is that most floating point implementations are based on IEEE 754. See http://en.wikipedia.org/wiki/IEEE_floating_point
The issue is that numbers are built in base 2 (binary formats).
The number 9.9 can never be represented exactly in base 2.
The "Numerical Computation Guide" by David Goldberg gives a precise statement about it:
Several different representations of real numbers have been proposed,
but by far the most widely used is the floating-point representation.
Floating-point representations have a base b (which is always assumed to
be even) and a precision p. If b = 10 and p = 3, then the number 0.1 is
represented as 1.00 × 10^-1. If b = 2 and p = 24, then the decimal
number 0.1 cannot be represented exactly, but is approximately
1.10011001100110011001101 × 2^-4.
You can safely assume two floating point numbers are never equal 'exactly' (unless one is a copy of the other).
Computers work in binary, in other words in base 2. Just like base 10, base 2 has numbers that it cannot write out exactly. For example, try to write the fraction 10/3 in base 10: you end up with infinitely many 3s. In binary you cannot even write 0.1 (decimal) exactly; you get the recurring pattern 0.0001100110011... (binary).
This video explains it well: http://www.youtube.com/watch?v=PZRI1IfStY0
