e^-x MacLaurin Series Expansion - c

I am trying to compute the MacLaurin series for e^-x = 1 - x + (x^2 / 2!) - (x^3 / 3!) + ...
My values seem to work up to a certain point and then deviate completely. Is there something wrong with rounding or am I using the wrong type of variable for such a question?
int i;
double sum=0;
double x = 8.3;
for(i=0; i<26; i++)
{
sum = sum+ (((pow(-1,i)) * (pow(x,i)))/factorial(i));
printf("Sum = %.12f\n\n\n",sum);
}
return 0;
I don't understand why, but up to the 12th term, the values are correct but after that, it begins to completely differ.

Presumably your factorial function, which you're not showing, is performing integer arithmetic. After 12! you're going to overflow a 32-bit integer. Switch to using double in the factorial function too.
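A minimal sketch of that fix, assuming the unseen factorial function currently returns an int. The version below returns double instead. The second function, exp_neg_series, is a hypothetical helper (its name and its terms parameter are not from the question) that builds each term from the previous one, so neither pow nor factorial is needed inside the loop.

```c
/* Factorial computed in double: exact up to 22! and free of the
   32-bit integer overflow that starts at 13!. */
double factorial(int n)
{
    double result = 1.0;
    for (int k = 2; k <= n; k++)
        result *= k;
    return result;
}

/* Sums the first `terms` terms of 1 - x + x^2/2! - x^3/3! + ...
   Each term is the previous one multiplied by -x / (i + 1). */
double exp_neg_series(double x, int terms)
{
    double term = 1.0;   /* term for i = 0 */
    double sum = 0.0;
    for (int i = 0; i < terms; i++) {
        sum += term;
        term *= -x / (i + 1);
    }
    return sum;
}
```

For x = 8.3 the 26 terms used in the question are probably too few in any case: the first omitted term is still roughly 0.002, which is larger than e^-8.3 itself (about 0.00025), so more terms are needed before the sum settles.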

Related

Simple integration that depends on floating point equality

I have the following very-crude integration calculator:
// definite integrate on one variable
// using basic trapezoid approach
float integrate(float start, float end, float step, float (*func)(float x))
{
if (start >= (end-step))
return 0;
else {
float x = start; // make it a bit more math-like
float segment = step * (func(x) + func(x+step))/2;
return segment + integrate(x+step, end, step, func);
}
}
And an example usage:
static float square(float x) {return x*x;}
int main(void)
{
// Integral x^2 from 0->2 should be ~ 2.6
float start=0.0, end=2.0, step=0.01;
float answer = integrate(start, end, step, square);
printf("The integral from %.2f to %.2f for X^2 = %.2f\n", start, end, answer );
}
$ run
The integral from 0.00 to 2.00 for X^2 = 2.67
What happens if the equality check at start >= (end-step) doesn't work? For example, if it evaluates something to 2.99997 instead of 3 and so does another loop (or one less loop). Is there a way to prevent that, or do most math-type calculators just work in decimals or some extension to the 'normal' floating points?
If you are given step, one way to write a loop (and you should use a loop for this, not recursion) is:
float x;
for (float i = 0; (x = start + i*step) < end - step/2; ++i)
…
Some points about this:
We keep an integer count with i. As long as there are a reasonable number of steps, there will be no floating-point rounding error in this. (We could make i an int, but float can count integer values perfectly well, and using float avoids an int-to-float conversion in i*step.)
Instead of incrementing x (or start as it is passed by recursion) repeatedly, we recalculate it each time as start + i*step. This has only two possible rounding errors, in the multiplication and in the addition, so it avoids accumulating errors over repeated additions.
We use end - step/2 as the threshold. This allows us to catch the desired endpoint even if the calculated x drifts as far away from end as end - step/2. And that is about the best we can do, because if it is drifting farther than half a step away from the ideally spaced points, we cannot tell if it has drifted +step/2 from end-step or -step/2 from end.
This presumes that step is an integer division of end-start, or pretty close to it, so that there are a whole number of steps in the loop. If it is not, the loop should be redesigned a bit to stop one step earlier and then calculate a step of partial width at the end.
At the beginning, I mentioned being given step. An alternative is you might be given a number of steps to use, and then the step width would be calculated from that. In that case, we would use an integer number of steps to control the loop. The loop termination condition would not involve floating-point rounding at all. We could calculate x as (float) i / NumberOfSteps * (end-start) + start.
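Put together, the step-based loop described above might look like the following sketch. It assumes that end - start is close to a whole number of steps, as the answer requires. The name integrate_step is hypothetical, and square is the function from the question's usage example.

```c
/* Integrand from the question's usage example. */
float square(float x) { return x * x; }

/* Trapezoid rule over [start, end] with a fixed step width.
   Assumes (end - start) is close to a whole number of steps. */
float integrate_step(float start, float end, float step, float (*func)(float))
{
    float sum = 0.0f;
    float x;
    /* i counts steps; x is recomputed from i rather than accumulated. */
    for (float i = 0; (x = start + i * step) < end - step / 2; ++i)
        sum += step * (func(x) + func(x + step)) / 2;
    return sum;
}
```

Calling integrate_step(0.0f, 2.0f, 0.01f, square) should give a value close to 8/3, matching the 2.67 printed in the question.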
Two improvements can be made easily.
Using recursion is a bad idea. Each additional call creates a new stack frame. For a sufficiently large number of steps, you will trigger a Stack Overflow. Use a loop instead.
Normally, you would avoid the rounding problem by using start, end and n, the number of steps. The location of the kth interval would be at start + k * (end - start) / n;
So you could rewrite your function as
float integrate(float start, float end, int n, float (*func)(float x))
{
    float next = start;
    float sum = 0.0f;
    for(int k = 0; k < n; k++) {
        float x = next; // left edge of interval k
        next = start + (k + 1) * (end - start) / n; // right edge of interval k
        sum += 0.5f * (next - x) * (func(x) + func(next));
    }
    return sum;
}

Why do the results of my program to find the coordinates of an arc or circle in C differ from the 'correct' results?

I am a real beginner here and I'm really not sure about this. It is a homework assignment.
We have to find 10 (x,y) coordinates of an arc using the radius, a starting angle and an end angle. The program works, but the results it gives differ very slightly from the 'correct' results required by the automatic checking system. Here's the code, along with both my results and the system's, based on r=100, angle1=1, anglef=30!
Thanks in advance!
FILE *pf1;
int n=0;
double angulo1, angulof, angulo, radio, x, y;
printf ("\nIntroduce radio : ");
scanf ("%lf", &radio);
printf ("\nIntroduce angulo inicial : ");
scanf ("%lf", &angulo1);
printf ("\nIntroduce angulo final : ");
scanf ("%lf", &angulof);
angulo = ((angulof-angulo1)/9);
pf1 = fopen("salida.txt", "w");
for (n=0; n<=9; n++)
{
x=radio*cos(angulo1+(angulo*n));
y=radio*sin(angulo1+(angulo*n));
fprintf (pf1, "%lf,%lf\n", x,y);
}
return 0;
ERROR
■■■ MY FILE:
54.030231,84.147098
-41.614684,90.929743
-98.999250,14.112001
-65.364362,-75.680250
28.366219,-95.892427
96.017029,-27.941550
75.390225,65.698660
-14.550003,98.935825
-91.113026,41.211849
-83.907153,-54.402111
■■■ CORRECT FILE:
54.030231,84.147095
-41.614685,90.929741
-98.999252,14.112000
-65.364365,-75.680252
28.366219,-95.892426
96.017029,-27.941549
75.390228,65.698662
-14.550003,98.935822
-91.113029,41.211849
-83.907150,-54.402111
The post is very confusing:
Calculating angles from 1 to 30 radians in 10 steps doesn't make much sense. Sensible values for radians are from -2π to +2π; outside that range, trigonometric functions lose precision. Also, taking steps much larger than a few degrees is very unusual. To get from 1 to 30 radians in 10 steps, steps of roughly 180° are taken.
Some testing of the output reveals that the steps are smaller: from 1 to 10 radians in 10 steps. This is still 57° per step, and goes about one and a half times around the circle.
Reverse engineering the output with atan2(y,x) reveals that the desired output is less precise than the calculation with doubles. So the calculations probably used 32-bit floats. To test this, one has to be very careful. Internally, floats can get passed as doubles, and the processor may work with 80 bits of precision for arithmetic calculations. (Note that with some compilers, including Microsoft's, long double has the same precision as double.)
Now, if you call sin on a float, the compiler often calls the double version of sin. To force the float versions, call them explicitly; they have an f appended to the function name: sinf and cosf.
Testing the following with Microsoft Visual C, 2017 Community Edition:
#include <math.h>
#include <stdio.h>
void test_sinf()
{
float radio = 100;
float angulo1 = 1;
float angulof = 10;
float angulo = (angulof - angulo1) / 9;
float x, y;
int n;
for (n = 0; n < 10; ++n) {
x = radio * cosf(angulo1 + (angulo*n));
y = radio * sinf(angulo1 + (angulo*n));
printf("%.6lf,%.6lf\n", x, y);
}
}
outputs:
54.030228, 84.147095
-41.614685, 90.929741
-98.999252, 14.112000
-65.364365, -75.680252
28.366220, -95.892426
96.017029, -27.941549
75.390228, 65.698662
-14.550003, 98.935822
-91.113022, 41.211849
-83.907150, -54.402115
This leaves only 4 of the numbers differing, and only in the last digit, which suggests that a slightly different library or compiler was used. As the angles and the radius are all integers that can be represented exactly as floats, they are an improbable cause of the differences.
edit: Testing out the suggestion of @gnasher729, it seems he's right. Running the code with the double-precision sin and cos, and converting the result to float before printing, gives exactly the "desired" numbers. This probably gives the same results on most compilers for this test case. (32-bit floats are an IEEE standard, and 64-bit trigonometric functions have enough precision to make implementation details disappear after rounding.)
#include <math.h>
#include <stdio.h>
void test_sin_converted_to_float()
{
float radio = 100;
float angulo1 = 1;
float angulof = 10;
float angulo = (angulof - angulo1) / 9;
float x, y;
for (int n = 0; n <= 9; ++n) {
x = radio * cos(angulo1 + (angulo*n));
y = radio * sin(angulo1 + (angulo*n));
printf("%.6lf, %.6lf\n", x, y);
}
}
Checking your data with a spreadsheet, it contains exactly what it ought to contain. The data in the second file, however, look as if sine and cosine were calculated with single precision (float).
Which would mean that whoever created this "automatic checking" should be very, very, ashamed of themselves.

Sum of array of floats returns different results [duplicate]

Here I have a function sum() of type float that takes in a pointer t of type float and an integer size. It returns the sum of all the elements in the array. Then I create two arrays using that function. One that has the BIG value at the first index and one that has it at the last index. When I return the sums of each of those arrays I get different results. This is my code:
#include <stdlib.h>
#include <stdio.h>
#define N 1024
#define SMALL 1.0
#define BIG 100000000.0
float sum(float* t, int size) { // here I define the function sum()
float s = 0.0;
for (int i = 0; i < size; i++) {
s += t[i];
}
return s;
}
int main() {
float tab[N];
for (int i = 0; i < N; i++) {
tab[i] = SMALL;
}
tab[0] = BIG;
float sum1 = sum(tab, N); // initialize sum1 with the big value at index 0
printf("sum1 = %f\n", sum1);
tab[0] = SMALL;
tab[N-1] = BIG;
float sum2 = sum(tab, N); // initialize sum2 with the big value at last index
printf("sum2 = %f\n", sum2);
return 0;
}
After compiling the code and running it I get the following output:
sum1 = 100000000.000000
sum2 = 100001024.000000
Why do I get different results even though the arrays have the same elements (but at different indexes)?
What you're experiencing is floating point imprecision. Here's a simple demonstration.
int main() {
float big = 100000000.0;
float small = 1.0;
printf("%f\n", big + small);
printf("%f\n", big + (19 *small));
return 0;
}
You'd expect 100000001.0 and 100000019.0.
$ ./test
100000000.000000
100000016.000000
Why'd that happen? Because computers don't store numbers like we do, floating point numbers doubly so. A float has a size of just 32 bits, but can store numbers up to about 3.4 × 10^38, rather than just the roughly 2^31 that a 32-bit integer can. And it can store decimal places. How? They cheat. What it really stores is the sign, an exponent, and a mantissa.
sign * 2^exponent * mantissa
The mantissa is what determines accuracy, and there are only 24 bits of it in a float. So large numbers lose precision.
You can read about exactly how and play around with the representation.
To solve this either use a double which has greater precision, or use an accurate, but slower, arbitrary precision library such as GMP.
Why do I get different results even though the arrays have the same elements
In floating-point math, 100000000.0 + 1.0 equals 100000000.0 and not 100000001.0, but 100000000.0 + 1024.0 does equal 100001024.0. Given the value 100000000.0, the value 1.0 is too small to show up in the available bits used to represent 100000000.0.
So when you put 100000000.0 first, all the later + 1.0 operations have no effect.
When you put 100000000.0 last, though, all the previous 1000+ 1.0 + 1.0 + ... do add up to 1024.0, and 1024.0 is "big enough" to make a difference given the available precision of floating point math.
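One way to make the result independent of the order, following the suggestion to use double, is to keep the array as float but accumulate in a double. This is a minimal sketch. The name sum_wide is hypothetical, and it assumes the total fits comfortably within the 53 significand bits of a typical double.

```c
/* Sums an array of floats using a double accumulator.
   A double has 53 significand bits, so 100000000 + 1 is exact. */
double sum_wide(const float *t, int size)
{
    double s = 0.0;
    for (int i = 0; i < size; i++)
        s += t[i];
    return s;
}
```

With the arrays from the question (one value of 100000000 and 1023 values of 1), both orderings then give 100001023. Storing that result back into a float would still round it to 100001024, since neighbouring floats at that magnitude are 8 apart.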

Losing precision when multiplying Doubles in C

I am trying to multiply the decimal part of a double by 2 about 500 times. The number starts to lose precision as the iterations go on. Is there any trick to keep the repeated multiplication accurate?
double x = 0.3;
double binary = 2.0;
for (i=0; i<500; i++){
x = x * binary;
printf("x equals to : %f",x);
if(x>=1.0)
x = x - 1;
}
OK, after reading some of what you posted, I am wondering how I could remove this unwanted error from my number to keep the multiplication stable. For instance, in my example, the decimal parts should change like this: 0.3, 0.6, 0.2, 0.4, 0.8, ... Can we cut off the rest to keep these numbers?
With the typical floating-point format (binary64), double x = 0.3; gives x a value more like 0.29999999999999998890..., so the code is off from the very beginning.
Scale x by 10 to stay with exact math - or use a decimal64 double
#include <stdio.h>
int main(void) {
double x = 3.0;
double binary = 2.0;
printf("x equals to : %.20f\n",x);
for (int i=0; i<500; i++){
x = x * binary;
printf("x equals to : %.20f\n",x/10);
if(x>=10.0)
x = x - 10;
}
return 0;
}
In general, floating point math is not completely precise, as shown in the other answers and in many online resources. The problem is that certain numbers cannot be represented exactly in binary. 0.3 is such a number, but natural numbers (up to 2^53 for a typical double) can be. So you could change your program to this:
double x = 3.0;
double binary = 2.0;
for (i=0; i<500; i++){
x = x * binary;
printf("x equals to : %f",x/10.0);
if(x>=10.0)
x = x - 10.0;
}
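The same scaling idea can be taken one step further by keeping the scaled value in an integer. This is a swapped-in variant, not what the answers above do. It is a minimal sketch that assumes the starting value has exactly one decimal digit. The name double_tenths is hypothetical.

```c
/* One doubling step on a value stored as whole tenths (3 means 0.3).
   Returns the fractional part of 2 * (tenths / 10), again in tenths. */
int double_tenths(int tenths)
{
    return (2 * tenths) % 10;
}
```

Starting from 3, repeated calls give 6, 2, 4, 8, 6, ..., which is the sequence 0.6, 0.2, 0.4, 0.8, ... from the question without any rounding.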
Although your program is doing some very unusual things, the main answer to your question is that that is how floating point numbers work. They are imprecise.
http://floating-point-gui.de/basic/

Approximation of arcsin in C

I've got a program that calculates the approximation of an arcsin value based on Taylor's series.
My friend and I have come up with an algorithm which has been able to return the almost "right" values, but I don't think we've done it very crisply. Take a look:
double my_asin(double x)
{
double a = 0;
int i = 0;
double sum = 0;
a = x;
for(i = 1; i < 23500; i++)
{
sum += a;
a = next(a, x, i);
}
return sum;
}
double next(double a, double x, int i)
{
return a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2)));
}
I checked if my_pow works correctly so there's no need for me to post it here as well. Basically I want the loop to end once the difference between the current and next term is more or equal to my EPSILON (0.00001), which is the precision I'm using when calculating a square root.
This is how I would like it to work:
while(my_abs(prev_term - next_term) >= EPSILON)
But the function double next is dependent on i, so I guess I'd have to increment it in the while statement too. Any ideas how I should go about doing this?
Example output for -1:
$ -1.5675516116e+00
Instead of:
$ -1.5707963268e+00
Thanks so much guys.
Issues with your code and question include:
Your image file showing the Taylor series for arcsin has two errors: There is a minus sign on the x^5 term instead of a plus sign, and the power of x is shown as x^n but should be x^(2n+1).
The x factor in the terms of the Taylor series for arcsin increases by x^2 in each term, but your formula a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2))) divides by x^2 in each term. This does not matter for the particular value -1 you ask about, but it will produce wrong results for other values, except 1.
You ask how to end the loop once the difference in terms is “more or equal to” your epsilon, but, for most values of x, you actually want less than (or, conversely, you want to continue, not end, while the difference is greater than or equal to, as you show in code).
The Taylor series is a poor way to evaluate functions because its error increases as you get farther from the point around which the series is centered. Most math library implementations of functions like this use a minimax series or something related to it.
Evaluating the series from low-order terms to high-order terms causes you to add larger values first, then smaller values later. Due to the nature of floating-point arithmetic, this means that accuracy from the smaller terms is lost, because it is “pushed out” of the width of the floating-point format by the larger values. This effect will limit how accurate any result can be.
Finally, to get directly to your question, the way you have structured the code, you directly update a, so you never have both the previous term and the next term at the same time. Instead, create another double b so that you have an object b for a previous term and an object a for the current term, as shown below.
Example:
double a = x, b, sum = a;
int i = 0;
do
{
b = a;
a = next(a, x, ++i);
sum += a;
} while (fabs(b-a) > threshold);
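For reference, here is a self-contained sketch that combines the points in this answer: each term is multiplied (not divided) by x^2, and the loop has a stopping test. It stops on the size of the latest term rather than on the difference between consecutive terms, which is a deliberate substitution. The name asin_series and the parameters threshold and max_terms are hypothetical.

```c
#include <math.h>

/* Sums the Maclaurin series of arcsin for |x| < 1:
   x + x^3/6 + 3x^5/40 + ...
   Stops when the newest term is no larger than threshold
   or after max_terms terms. */
double asin_series(double x, double threshold, int max_terms)
{
    double a = x;      /* current term */
    double sum = a;
    for (int i = 1; i < max_terms && fabs(a) > threshold; i++) {
        /* term_i = term_(i-1) * x^2 * (2i-1)^2 / ((2i)(2i+1)) */
        a *= x * x * (2.0 * i - 1) * (2.0 * i - 1) / ((2.0 * i) * (2.0 * i + 1));
        sum += a;
    }
    return sum;
}
```

For moderate inputs such as 0.5 this agrees closely with the library asin. Close to -1 or 1 the terms shrink very slowly, so many terms are needed and the accuracy limits described above still apply.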
Using the Taylor series for arcsin is extremely imprecise, as it converges very badly, and there will be relatively big differences from the true value for any finite number of terms. Also, using pow with integer exponents is neither very precise nor efficient.
However using arctan for this is OK
arcsin(x) = arctan(x/sqrt(1-(x*x)));
as its Taylor series converges OK on the <0.0,0.8> range, and all the other parts of the range can be computed through it (using trigonometric identities). So here is my C++ implementation (from my arithmetic template):
T atan (const T &x) // = atan(x)
{
bool _shift=false;
bool _invert=false;
bool _negative=false;
T z,dz,x1,x2,a,b; int i;
x1=x; if (x1<0.0) { _negative=true; x1=-x1; }
if (x1>1.0) { _invert=true; x1=1.0/x1; }
if (x1>0.7) { _shift=true; b=::sqrt(3.0)/3.0; x1=(x1-b)/(1.0+(x1*b)); }
x2=x1*x1;
for (z=x1,a=x1,b=1,i=1;i<1000;i++) // if x1>0.8 convergence is slow
{
a*=x2; b+=2; dz=a/b; z-=dz;
a*=x2; b+=2; dz=a/b; z+=dz;
if (::abs(dz)<zero) break;
}
if (_shift) z+=pi/6.0;
if (_invert) z=0.5*pi-z;
if (_negative) z=-z;
return z;
}
T asin (const T &x) // = asin(x)
{
if (x<=-1.0) return -0.5*pi;
if (x>=+1.0) return +0.5*pi;
return ::atan(x/::sqrt(1.0-(x*x)));
}
Where T is any floating point type (float,double,...). As you can see you need sqrt(x), pi=3.141592653589793238462643383279502884197169399375105, zero=1e-20 and +,-,*,/ operations implemented. The zero constant is the target precision.
So just replace T with float/double and ignore the :: ...
so I guess I'd have to increment it in the while statement too
Yes, this might be a way. And what stops you?
int i=0;
while(condition){
//do something
i++;
}
Another way would be using the for condition:
for(i = 1; i < 23500 && my_abs(prev_term - next_term) >= EPSILON; i++)
Your formula is wrong. Here is the correct formula: http://scipp.ucsc.edu/~haber/ph116A/taylor11.pdf.
P.S. Also note that your formula and your series do not correspond to each other.
You can use while like this:
do
{
    sum_prev = sum;
    sum += a;
    a = next(a, x, i++);
} while( std::abs(sum_prev - sum) >= 1e-15 ); // keep going until the sum stops changing
