Round-off error when calculating a geometric mean

I need to compute the geometric mean of a large set of numbers, whose values are not a priori limited. The naive way would be
#include <cmath>
#include <vector>
double geometric_mean(std::vector<double> const& data) // fails: the product can overflow or underflow
{
    auto product = 1.0;
    for (auto x : data) product *= x;
    return std::pow(product, 1.0 / data.size());
}
However, this may well fail because of underflow or overflow in the accumulated product (note: long double doesn't really avoid this problem). So, the next option is to sum the logarithms:
double geometric_mean(std::vector<double> const& data)
{
    auto sum_log = 0.0;
    for (auto x : data) sum_log += std::log(x);
    return std::exp(sum_log / data.size());
}
This works, but calls std::log() for every element, which is potentially slow. Can I avoid that? For example by keeping track of (the equivalent of) the exponent and the mantissa of the accumulated product separately?

The "split exponent and mantissa" solution:
double geometric_mean(std::vector<double> const & data)
{
double m = 1.0;
long long ex = 0;
double invN = 1.0 / data.size();
for (double x : data)
{
int i;
double f1 = std::frexp(x,&i);
m*=f1;
ex+=i;
}
return std::pow( std::numeric_limits<double>::radix,ex * invN) * std::pow(m,invN);
}
If you are concerned that ex might overflow you can define it as a double instead of a long long, and multiply by invN at every step, but you might lose a lot of precision with this approach.
EDIT For large inputs, we can split the computation in several buckets:
double geometric_mean(std::vector<double> const & data)
{
long long ex = 0;
auto do_bucket = [&data,&ex](int first,int last) -> double
{
double ans = 1.0;
for ( ;first != last;++first)
{
int i;
ans *= std::frexp(data[first],&i);
ex+=i;
}
return ans;
};
const int bucket_size = -std::log2( std::numeric_limits<double>::min() );
std::size_t buckets = data.size() / bucket_size;
double invN = 1.0 / data.size();
double m = 1.0;
for (std::size_t i = 0;i < buckets;++i)
m *= std::pow( do_bucket(i * bucket_size,(i+1) * bucket_size),invN );
m*= std::pow( do_bucket( buckets * bucket_size, data.size() ),invN );
return std::pow( std::numeric_limits<double>::radix,ex * invN ) * m;
}
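One possible variation on the bucket idea (a sketch, not the answer's code): renormalise the mantissa product with std::frexp after every multiplication, so it stays in [0.5, 1) and cannot underflow however long the input is. The name geometric_mean_renorm is hypothetical, and the sketch assumes every input is finite and strictly positive.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: geometric mean with per-step renormalisation.
// Assumes all inputs are finite and strictly positive.
double geometric_mean_renorm(std::vector<double> const& data)
{
    if (data.empty()) return std::numeric_limits<double>::quiet_NaN();
    double m = 1.0;    // running mantissa product, in [0.5, 1) after each step
    long long ex = 0;  // running sum of binary exponents
    for (double x : data)
    {
        int e;
        m *= std::frexp(x, &e);  // x == f * 2^e with f in [0.5, 1)
        ex += e;
        m = std::frexp(m, &e);   // pull m back into [0.5, 1)
        ex += e;
    }
    // product == m * 2^ex, so log2(mean) == (ex + log2(m)) / N
    return std::exp2((static_cast<double>(ex) + std::log2(m)) / data.size());
}
```

This costs a second frexp per element, so whether it beats the bucketed version is something to measure rather than assume.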

I think I figured out a way to do it: it combines the two routines in the question, similar to Peter's idea. Here is some example code.
double geometric_mean(std::vector<double> const&data)
{
const double too_large = 1.e64;
const double too_small = 1.e-64;
double sum_log = 0.0;
double product = 1.0;
for(auto x:data) {
product *= x;
if(product > too_large || product < too_small) {
sum_log+= std::log(product);
product = 1;
}
}
return std::exp((sum_log + std::log(product))/data.size());
}
The bad news is: this comes with a branch. The good news: the branch predictor is likely to get this almost always right (the branch should only rarely be triggered).
The branch could be avoided using Peter's idea of a constant number of terms in the product. The problem with that is that overflow/underflow may still occur within only a few terms, depending on the values.

You may be able to accelerate this by multiplying numbers as in your original solution and only converting to logarithms every certain number of multiplications (depending on the size of your initial numbers).

A different approach which would give better accuracy and performance than the logarithm method would be to compensate out-of-range exponents by a fixed amount, maintaining an exact logarithm of the cancelled excess. Like so:
const int EXP = 64; // maximal/minimal exponent
const double BIG = pow(2, EXP); // overflow threshold
const double SMALL = pow(2, -EXP); // underflow threshold
double product = 1;
int excess = 0; // number of times BIG has been divided out of product
for(int i=0; i<n; i++)
{
product *= A[i];
while(product > BIG)
{
product *= SMALL;
excess++;
}
while(product < SMALL)
{
product *= BIG;
excess--;
}
}
double mean = pow(product, 1.0/n) * pow(BIG, double(excess)/n);
All multiplications by BIG and SMALL are exact, and there are no calls to log (a transcendental, and therefore particularly imprecise, function).

There is a simple idea to reduce the computation and also to prevent overflow: group the numbers together, say at least two at a time, take the log of each group's product, and then sum the results. With K denoting the geometric mean of five numbers a, b, c, d, e:
log(abcde) = 5*log(K)
log(ab) + log(cde) = 5*log(K)

Summing logs to compute products stably is perfectly fine, and rather efficient (if this is not enough: there are ways to get vectorized logarithms with a few SSE operations -- there are also Intel MKL's vector operations).
To avoid overflow, a common technique is to divide every number by the maximum or minimum magnitude entry beforehand (or sum log differences to the log max or log min). You can also use buckets if the numbers vary a lot (e.g. sum the logs of small numbers and large numbers separately). Note that typically neither of these is needed except for very large sets, since the log of a double is never huge (between, say, -700 and 700).
Also, you need to keep track of the signs separately.
Computing log x typically keeps the same number of significant digits as x, except when x is close to 1: you want to use std::log1p if you need to compute prod(1 + x_n) with small x_n.
Finally, if you have roundoff error problems when summing, you can use Kahan summation or variants.
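As an illustration of the last point, here is a minimal sketch of the log-sum approach with Kahan (compensated) summation. The function name geometric_mean_kahan is hypothetical, and the sketch assumes strictly positive inputs (signs would have to be tracked separately, as noted above).

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean via a Kahan-compensated sum of logs.
// Assumes a non-empty vector of strictly positive values.
double geometric_mean_kahan(std::vector<double> const& data)
{
    double sum = 0.0;   // running sum of logarithms
    double comp = 0.0;  // running compensation for lost low-order bits
    for (double x : data)
    {
        const double y = std::log(x) - comp;
        const double t = sum + y;
        comp = (t - sum) - y;  // what was rounded away in sum + y
        sum = t;
    }
    return std::exp(sum / data.size());
}
```

Aggressive floating-point optimisation flags (for example -ffast-math on GCC and Clang) may simplify the compensation term away, so this is best compiled without them.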

Instead of using logarithms, which are very expensive, you can directly scale the results by powers of two.
double geometric_mean(std::vector<double> const&data) {
double huge = scalbn(1,512);
double tiny = scalbn(1,-512);
int scale = 0;
double product = 1.0;
for(auto x:data) {
if (x >= huge) {
x = scalbn(x, -512);
scale++;
} else if (x <= tiny) {
x = scalbn(x, 512);
scale--;
}
product *= x;
if (product >= huge) {
product = scalbn(product, -512);
scale++;
} else if (product <= tiny) {
product = scalbn(product, 512);
scale--;
}
}
return exp2((512.0*scale + log2(product)) / data.size());
}

Related

How to calculate the log2 of integer in C as precisely as possible with bitwise operations

I need to calculate the entropy and due to the limitations of my system I need to use restricted C features (no loops, no floating point support) and I need as much precision as possible. From here I figured out how to estimate the floor log2 of an integer using bitwise operations. Nevertheless, I need to increase the precision of the results. Since no floating point operations are allowed, is there any way to calculate log2(x/y) with x < y so that the result would be something like log2(x/y)*10000, aiming at getting the precision I need through integer arithmetic?
You will base an algorithm on the formula
log2(x/y) = K*(-log(x/y));
where
K = -1.0/log(2.0); // you can precompute this constant before run-time
a = (y-x)/y;
-log(x/y) = a + a^2/2 + a^3/3 + a^4/4 + a^5/5 + ...
If you write the loop correctly—or, if you prefer, unroll the loop to code the same sequence of operations looplessly—then you can handle everything in integer operations:
(y^N*(1*2*3*4*5*...*N)) * (-log(x/y))
= y^(N-1)*(2*3*4*5*...*N)*(y-x) + y^(N-2)*(1*3*4*5*...*N)*(y-x)^2 + ...
Of course, ^, the power operator, binding tighter than *, is not a C operator, but you can implement that efficiently in the context of your (perhaps unrolled) loop as a running product.
The N is an integer large enough to afford desired precision but not so large that it overruns the number of bits you have available. If unsure, then try N = 6 for instance. Regarding K, you might object that that is a floating-point number, but this is not a problem for you because you are going to precompute K, storing it as a ratio of integers.
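To get a feel for how many terms N are needed, the truncated series can first be evaluated in floating point. The sketch below is for exploration on a development machine only (the target system has no floating point), and the function name neg_log_series is hypothetical.

```cpp
#include <cmath>

// Truncated series a + a^2/2 + ... + a^N/N for -log(x/y), with a = (y-x)/y.
// Floating point is used here only to illustrate convergence; the integer
// version scales every term by y^N * N! instead.
double neg_log_series(long long x, long long y, int N)
{
    const double a = static_cast<double>(y - x) / static_cast<double>(y);
    double power = 1.0;  // running a^k
    double sum = 0.0;
    for (int k = 1; k <= N; ++k)
    {
        power *= a;
        sum += power / k;
    }
    return sum;
}
```

For x/y = 5/7 the six-term sum differs from -log(5/7) by roughly 3e-5, and twelve terms bring the difference below 1e-7; the closer x/y is to 1, the faster the series converges.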
SAMPLE CODE
This is a toy code but it works for small values of x and y such as 5 and 7, thus sufficing to prove the concept. In the toy code, larger values can silently overflow the default 64-bit registers. More work would be needed to make the code robust.
#include <stddef.h>
#include <stdlib.h>
// Your program will not need the below headers, which are here
// included only for comparison and demonstration.
#include <math.h>
#include <stdio.h>
const size_t N = 6;
const long long Ky = 1 << 10; // denominator of K
// Your code should define a precomputed value for Kx here.
int main(const int argc, const char *const *const argv)
{
// Your program won't include the following library calls but this
// does not matter. You can instead precompute the value of Kx and
// hard-code its value above with Ky.
const long long Kx = lrintl((-1.0/log(2.0))*Ky); // numerator of K
printf("K == %lld/%lld\n", Kx, Ky);
if (argc != 3) exit(1);
// Read x and y from the command line.
const long long x0 = atoll(argv[1]);
const long long y = atoll(argv[2]);
printf("x/y == %lld/%lld\n", x0, y);
if (x0 <= 0 || y <= 0 || x0 > y) exit(1);
// If 2*x <= y, then, to improve accuracy, double x repeatedly
// until 2*x > y. Each doubling offsets the log2 by 1. The offset
// is to be recovered later.
long long x = x0;
int integral_part_of_log2 = 0;
while (1) {
const long long trial_x = x << 1;
if (trial_x > y) break;
x = trial_x;
--integral_part_of_log2;
}
printf("integral_part_of_log2 == %d\n", integral_part_of_log2);
// Calculate the denominator of -log(x/y).
long long yy = 1;
for (size_t j = N; j; --j) yy *= j*y;
// Calculate the numerator of -log(x/y).
long long xx = 0;
{
const long long y_minus_x = y - x;
for (size_t i = N; i; --i) {
long long term = 1;
size_t j = N;
for (; j > i; --j) {
term *= j*y;
}
term *= y_minus_x;
--j;
for (; j; --j) {
term *= j*y_minus_x;
}
xx += term;
}
}
// Convert log to log2.
xx *= Kx;
yy *= Ky;
// Restore the aforementioned offset.
for (; integral_part_of_log2; ++integral_part_of_log2) xx -= yy;
printf("log2(%lld/%lld) == %lld/%lld\n", x0, y, xx, yy);
printf("in floating point, this ratio of integers works out to %g\n",
(1.0*xx)/(1.0*yy));
printf("the CPU's floating-point unit computes the log2 to be %g\n",
log2((1.0*x0)/(1.0*y)));
return 0;
}
Running this on my machine with command-line arguments of 5 7, it outputs:
K == -1477/1024
x/y == 5/7
integral_part_of_log2 == 0
log2(5/7) == -42093223872/86740254720
in floating point, this ratio of integers works out to -0.485279
the CPU's floating-point unit computes the log2 to be -0.485427
Accuracy would be substantially improved by N = 12 and Ky = 1 << 20, but for that you need either thriftier code or more than 64 bits.
THRIFTIER CODE
Thriftier code, wanting more effort to write, might represent numerator and denominator in prime factors. For example, it might represent 500 as [2 0 3], meaning (2^2)(3^0)(5^3).
Yet further improvements might occur to your imagination.
AN ALTERNATE APPROACH
For an alternate approach, though it might not meet your requirements precisely as you have stated them, @phuclv has given the suggestion I would be inclined to follow if your program were mine: work the problem in reverse, guessing a value c/d for the logarithm and then computing 2^(c/d), presumably via a Newton-Raphson iteration. Personally, I like the Newton-Raphson approach better. See sect. 4.8 here (my original).
MATHEMATICAL BACKGROUND
Several sources including mine already linked explain the Taylor series underlying the first approach and the Newton-Raphson iteration of the second approach. The mathematics unfortunately is nontrivial, but there you have it. Good luck.

Optimizing sqrt(n) - sqrt(n-1)

Here is a function that I call many times per second:
static inline double calculate_scale(double n) { //n may be int or double
return sqrt(n) - sqrt(n-1);
}
Called in loop like:
for(double i = 0; i < x; i++) {
double scale = calculate_scale(i);
...
}
And it's so slow. What is the best way to optimize this function to get as accurate output as possible?
Parameter n: Starting from 1 up, practically not limited, but mainly used with small numbers in range 1-10. It's integer (whole number), but it may be both int or double, depending on what performs better.
You can rewrite it using the following identity:
sqrt(n) - sqrt(n-1) ==
(sqrt(n) - sqrt(n-1)) * (sqrt(n) + sqrt(n-1)) / (sqrt(n) + sqrt(n-1)) ==
(n - (n - 1)) / (sqrt(n) + sqrt(n-1)) ==
1 / (sqrt(n) + sqrt(n-1))
For large enough n, the last expression is pretty close to 1 / (2 * sqrt(n)), so you only have to call sqrt once. It's also worth noting that even without the approximation, the last expression is more numerically stable in terms of relative error for larger n.
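The two forms can be put side by side as a small sketch; the function names scale_direct and scale_rationalised are hypothetical.

```cpp
#include <cmath>

// Direct form: subtracts two nearly equal numbers when n is large.
double scale_direct(double n)
{
    return std::sqrt(n) - std::sqrt(n - 1.0);
}

// Rationalised form: same value mathematically, but without the subtraction
// of nearly equal values, so the relative error stays small for large n.
double scale_rationalised(double n)
{
    return 1.0 / (std::sqrt(n) + std::sqrt(n - 1.0));
}
```

Both versions still call sqrt twice; the rationalised one trades the subtraction for a division, so it is an accuracy improvement rather than a speed-up.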
First of all, thanks for all suggestions. I've done some research and found some interesting implementations and facts.
1. In Loop or Using Precomputed table
(thanks @Ulysse BN)
You can optimize the loop by simply saving the previous sqrt(n) value.
The following example demonstrates this optimization, used to set up a precomputed table.
/**
* Init variables
* i counter
* x number of cycles (size of table)
* sqrtI1 previous square root = sqrt(i-1)
* ptr Pointer for next value
*/
double i, x = sizeof(precomputed_table) / sizeof(double);
double sqrtI1 = 0;
double* ptr = (double*) precomputed_table;
/**
* Optimized calculation
* In short:
* scale = sqrt(i) - sqrt(i-1)
*/
for(i = 1; i <= x; i++) {
double sqrtI = sqrt(i);
double scale = sqrtI - sqrtI1;
*ptr++ = scale;
sqrtI1 = sqrtI;
}
Using a precomputed table is probably the fastest method, but its drawback may be that its size is limited.
static inline double calculate_scale(int n) {
return precomputed_table[n-1];
}
2. Approximation For BIG numbers using Inverse Square Root
Requires an inverse (reciprocal) square root function rsqrt.
This method is most accurate for big numbers. For small numbers the error, (sqrt(n) - sqrt(n-1)) - 0.5/sqrt(n-0.5), is:
n:     1     2      3       10        100      1000
error: 0.29  0.006  0.0016  0.000056  1.58e-7  4.95e-10
Here is JS code that I used to calculate results above:
function sqrt(x) { return Math.sqrt(x); } function d(x) { return (sqrt(x)-sqrt(x-1))-(0.5/sqrt(x-0.5));} console.log(d(1), d(2), d(3), d(10), d(100), d(1000));
You can also see accuracy compared with two-sqrt version in single graph: https://www.google.com/search?q=(sqrt(x)-sqrt(x-1))-(0.5%2Fsqrt(x-0.5))
Usage:
static inline double calculate_scale(double n) {
//Same as: 0.5 / sqrt(n-0.5)
//but lot faster
return 0.5 * rsqrt(n-0.5);
}
On some older cpus (with slow or no hardware square root) you may go even faster using floats and Fast inverse square root from Quake:
#include <stdint.h>
#include <string.h>
float Q_rsqrt( float number );
static inline float calculate_scale(float n) {
    return 0.5f * Q_rsqrt(n - 0.5f);
}
float Q_rsqrt( float number )
{
    int32_t i; // the original used long, which is 64 bits wide on LP64 platforms
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    y = number;
    memcpy(&i, &y, sizeof i); // evil floating point bit level hacking (memcpy avoids the aliasing violation of the original pointer cast)
    i = 0x5f3759df - ( i >> 1 ); // what the fuck?
    memcpy(&y, &i, sizeof y);
    y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
    // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    return y;
}
For more info about implementation, see https://en.wikipedia.org/wiki/Fast_inverse_square_root and http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf . Not recommended to use on modern cpus with hardware reciprocal square root.
Not always a solution: 0.5 / sqrt(n-0.5)
Please note that on some processors (e.g. ARM Cortex A9, Intel Core2) division takes nearly the same time as a hardware square root, so it's best to use the original function with 2 square roots, sqrt(n) - sqrt(n-1), or, if one is available, a reciprocal square root with a multiply instead: 0.5 * rsqrt(n-0.5).
3. Using Precomputed table with fallback
This method is good compromise between first 2 solutions.
It has both good accuracy and performance.
static inline double calculate_scale(double n) {
    if(n <= sizeof_precomputed_table) {
        int nIndex = (int) n;
        return precomputed_table[nIndex-1];
    }
    //Multiply + Inverse Square root
    return 0.5 * rsqrt(n-0.5);
    //OR, as an alternative to the line above:
    //return sqrt(n) - sqrt(n-1);
}
In my case I need really accurate numbers, so my precomputed table size is 2048.
Any feedback is welcomed.
You stated that n is mainly a number smaller than 10. You could possibly use a precomputed table for numbers smaller than 10, or even more since it's cheap, and fallback to real calculations in case of larger numbers.
The code would look something like:
static inline double calculate_scale(double n) { //n may be int or double
    if (n <= 10.0 && n == floor(n)) {
        return precomputed[(int) n];
    }
    return sqrt(n) - sqrt(n-1);
}

Flooring floating-point modulo

I want to create a modulo-like function which can work with double-precision floats rather than ints. Another important factor is that the function must round towards negative infinity, rather than zero.
I have a couple of methods which work, but I believe them to be slow for a function which will be called many times in loops:
// A suggested method
double reduce_range(double x, const double max) {
x /= max; // Normalize to [0,1)
x -= (int) x;
x += 1.0;
x -= (int) x;
return x * max; // Denormalize
}
// My own simple implementation
double reduce_range(const double x, const double max) {
return x - floor(x / max) * max;
}
Both seem to work, but the second uses floor (which seems to be a bit of a bottleneck for these sorts of things) and the first repeatedly casts to int and subtracts. Is there not some faster way to do this (or to allow the compiler to take care of it)?
Alternatively, how about this:
double reduce_range(double x, const double max) {
x = fmod(x, max);
if(x < 0) x += max;
return x;
}
Is it going to be greatly slowed down by the branching if?
Edit: some example inputs and outputs:
(5.0, 7.0) >> 5.0
(8.5, 7.0) >> 1.5
(-2.3, 7.0) >> 4.7
If you are worried about the branch, then possibly this might be better, if it's cheaper to load an integer into the fpu:
x += max * (x < 0);
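Put together with the fmod version from the question, the multiply-by-comparison trick might look like the sketch below. It assumes max > 0, and whether the compiler actually emits branch-free code for it is something to check in the generated assembly.

```cpp
#include <cmath>

// Floored modulo for max > 0: the result lies in [0, max), up to rounding.
double reduce_range(double x, double max)
{
    x = std::fmod(x, max);  // result has the sign of x and magnitude below max
    x += max * (x < 0);     // (x < 0) converts to 0 or 1
    return x;
}
```

With the example inputs from the question, (5.0, 7.0) gives 5.0, (8.5, 7.0) gives 1.5, and (-2.3, 7.0) gives approximately 4.7. One caveat: for a negative x of very small magnitude, x + max can round to exactly max.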

Calculate maclaurin series for sin using C

I wrote code for calculating sin using its Maclaurin series and it works, but when I try to calculate it for large x values and compensate by giving a large order N (the length of the sum), it eventually overflows and doesn't give me correct results. This is the code, and I would like to know whether there is an additional way to optimize it so that it works for large x values too (it already works great for small x values and really big N values).
Here is the code:
long double calcMaclaurinPolynom(double x, int N){
long double result = 0;
long double atzeretCounter = 2;
int sign = 1;
long double fraction = x;
for (int i = 0; i <= N; i++)
{
result += sign*fraction;
sign = sign*(-1);
fraction = fraction*((x*x) / ((atzeretCounter)*(atzeretCounter + 1)));
atzeretCounter += 2;
}
return result;
}
The major issue is using the series outside its range where it well converges.
Since the OP's comment "converted x to radX = (x*PI)/180" indicates the OP is starting with degrees rather than radians, the OP is in luck. The first step in finding my_sin(x) is range reduction. When starting with degrees, the reduction is exact. So reduce the range before converting to radians.
long double calcMaclaurinPolynom(double x /* degrees */, int N){
// Reduce to range -360 to 360
// This reduction is exact, no round-off error
x = fmod(x, 360);
// Reduce to range -180 to 180
if (x >= 180) {
x -= 180;
x = -x;
} else if (x <= -180) {
x += 180;
x = -x;
}
// Reduce to range -90 to 90
if (x >= 90) {
x = 180 - x;
} else if (x <= -90) {
x = -180 - x;
}
//now convert to radians.
x = x*PI/180;
// continue with regular code
Alternatively, if using C99 or later, use remquo(). Search SO for sample code.
As @user3386109 commented above, there is no need to "convert back to degrees".
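A minimal sketch of the remquo() route follows. The function name sin_degrees is hypothetical, and std::sin and std::cos stand in for the Maclaurin polynomial to keep the example short.

```cpp
#include <cmath>

// Sine of an angle given in degrees, with range reduction by std::remquo.
// std::sin/std::cos stand in here for the series evaluation.
double sin_degrees(double x_degrees)
{
    const double pi = 3.14159265358979323846;
    int quo;
    // r lies in [-45, 45] and is exact; quo carries at least the low three
    // bits of the rounded quotient x/90, with the sign of the quotient.
    const double r = std::remquo(x_degrees, 90.0, &quo);
    const double rad = r * (pi / 180.0);
    switch (((quo % 4) + 4) % 4)  // quadrant 0..3, valid for negative quo too
    {
        case 0:  return  std::sin(rad);
        case 1:  return  std::cos(rad);
        case 2:  return -std::sin(rad);
        default: return -std::cos(rad);
    }
}
```

Because the reduced argument never exceeds 45 degrees in magnitude, a series evaluated in place of std::sin and std::cos would need only a handful of terms.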
[Edit]
With typical summation series, summing the least significant terms first improves the precision of the answer. With OP's code this can be done with
for (int i = N; i >= 0; i--)
Alternatively, rather than iterating a fixed number of times, loop until the term has no significance to the sum. The following uses recursion to sum the least significant terms first. With range reduction in the -90 to 90 range, the number of iterations is not excessive.
static double sin_d_helper(double term, double xx, unsigned i) {
if (1.0 + term == 1.0)
return term;
return term - sin_d_helper(term * xx / ((i + 1) * (i + 2)), xx, i + 2);
}
#include <math.h>
double sin_d(double x_degrees) {
// range reduction and d --> r conversion from above
double x_radians = ...
return x_radians * sin_d_helper(1.0, x_radians * x_radians, 1);
}
You can avoid the sign variable by incorporating it into the fraction update as in (-x*x).
With your algorithm you do not have problems with integer overflow in the factorials.
As soon as x*x < (2*k)*(2*k+1) the error - assuming exact evaluation - is bounded by abs(fraction), i.e., the size of the next term in the series.
For large x the biggest source of errors is truncation and floating-point errors that are magnified via cancellation of the terms of the alternating series. For k about x/2 the terms around the k-th term have the biggest size and have to be offset by other big terms.
Halving-and-Squaring
One easy method to deal with large x without using the value of pi is to employ the trigonometric theorems where
sin(2*x)=2*sin(x)*cos(x)
cos(2*x)=2*cos(x)^2-1=cos(x)^2-sin(x)^2
and first reduce x by halving, simultaneously evaluating the Maclaurin series for sin(x/2^n) and cos(x/2^n) and then employ trigonometric squaring (literal squaring as complex numbers cos(x)+i*sin(x)) to recover the values for the original argument.
cos(x/2^(n-1)) = cos(x/2^n)^2-sin(x/2^n)^2
sin(x/2^(n-1)) = 2*sin(x/2^n)*cos(x/2^n)
then
cos(x/2^(n-2)) = cos(x/2^(n-1))^2-sin(x/2^(n-1))^2
sin(x/2^(n-2)) = 2*sin(x/2^(n-1))*cos(x/2^(n-1))
etc.
See https://stackoverflow.com/a/22791396/3088138 for the simultaneous computation of sin and cos values, then encapsulate it with
def CosSinForLargerX(x,n):
    k=0
    while abs(x)>1:
        k+=1; x/=2
    c,s = getCosSin(x,n)
    r2=1  # stays 1 when no halving was needed, so the final division is safe
    for i in range(k):
        s2=s*s; c2=c*c; r2=s2+c2
        s = 2*c*s
        c = c2-s2
    return c/r2,s/r2
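For readers following along in C++, the same halving-and-squaring idea can be sketched self-contained as below. The names cos_sin_small and cos_sin_large are hypothetical; cos_sin_small is a plain Maclaurin evaluation standing in for getCosSin from the linked answer, and unlike the Python version this sketch renormalises after every doubling step.

```cpp
#include <cmath>
#include <utility>

// Maclaurin partial sums of cos(x) and sin(x); intended for |x| <= 1.
// 'terms' is the number of terms taken from each series.
std::pair<double, double> cos_sin_small(double x, int terms)
{
    double c = 0.0, s = 0.0;
    double term = 1.0;  // x^k / k!
    for (int k = 0; k < 2 * terms; ++k)
    {
        const double signed_term = ((k / 2) % 2 == 0) ? term : -term;
        if (k % 2 == 0) c += signed_term;  // even powers belong to cos
        else            s += signed_term;  // odd powers belong to sin
        term *= x / (k + 1);
    }
    return std::make_pair(c, s);
}

// Halve x until |x| <= 1, evaluate the series, then double the angle back.
std::pair<double, double> cos_sin_large(double x, int terms)
{
    int halvings = 0;
    while (std::fabs(x) > 1.0) { x *= 0.5; ++halvings; }
    std::pair<double, double> cs = cos_sin_small(x, terms);
    double c = cs.first, s = cs.second;
    for (int i = 0; i < halvings; ++i)
    {
        const double c2 = c * c, s2 = s * s;
        s = 2.0 * c * s;  // sin(2t) = 2 sin(t) cos(t)
        c = c2 - s2;      // cos(2t) = cos(t)^2 - sin(t)^2
        const double r = std::hypot(c, s);  // renormalise onto the unit circle
        c /= r;
        s /= r;
    }
    return std::make_pair(c, s);
}
```

Each doubling also doubles any error in the angle, so the accuracy degrades gradually as the number of halvings grows.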

Approximation of arcsin in C

I've got a program that calculates the approximation of an arcsin value based on Taylor's series.
My friend and I have come up with an algorithm which has been able to return the almost "right" values, but I don't think we've done it very crisply. Take a look:
double my_asin(double x)
{
double a = 0;
int i = 0;
double sum = 0;
a = x;
for(i = 1; i < 23500; i++)
{
sum += a;
a = next(a, x, i);
}
}
double next(double a, double x, int i)
{
return a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2)));
}
I checked if my_pow works correctly so there's no need for me to post it here as well. Basically I want the loop to end once the difference between the current and next term is more or equal to my EPSILON (0.00001), which is the precision I'm using when calculating a square root.
This is how I would like it to work:
while(my_abs(prev_term - next_term) >= EPSILON)
But the function double next is dependent on i, so I guess I'd have to increment it in the while statement too. Any ideas how I should go about doing this?
Example output for -1:
$ -1.5675516116e+00
Instead of:
$ -1.5707963268e+00
Thanks so much guys.
Issues with your code and question include:
Your image file showing the Taylor series for arcsin has two errors: There is a minus sign on the x^5 term instead of a plus sign, and the power of x is shown as x^n but should be x^(2n+1).
The x factor in the terms of the Taylor series for arcsin increases by x^2 in each term, but your formula a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2))) divides by x^2 in each term. This does not matter for the particular value -1 you ask about, but it will produce wrong results for other values, except 1.
You ask how to end the loop once the difference in terms is “more or equal to” your epsilon, but, for most values of x, you actually want less than (or, conversely, you want to continue, not end, while the difference is greater than or equal to, as you show in code).
The Taylor series is a poor way to evaluate functions because its error increases as you get farther from the point around which the series is centered. Most math library implementations of functions like this use a minimax series or something related to it.
Evaluating the series from low-order terms to high-order terms causes you to add larger values first, then smaller values later. Due to the nature of floating-point arithmetic, this means that accuracy from the smaller terms is lost, because it is “pushed out” of the width of the floating-point format by the larger values. This effect will limit how accurate any result can be.
Finally, to get directly to your question, the way you have structured the code, you directly update a, so you never have both the previous term and the next term at the same time. Instead, create another double b so that you have an object b for a previous term and an object a for the current term, as shown below.
Example:
double a = x, b, sum = a;
int i = 0;
do
{
    b = a;
    a = next(a, x, ++i);
    sum += a;
} while (fabs(b-a) > threshold); // fabs from <math.h>; abs would truncate to int in C
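Combining the points above, a corrected series evaluation might look like the following sketch. The name asin_series is hypothetical; the term update multiplies by x^2 rather than dividing by it, and, as a simplification, the loop stops on the size of the latest term instead of on the difference between two successive terms.

```cpp
#include <cmath>

// Taylor series for arcsin(x), intended for |x| < 1.
// term_0 = x, term_n = term_(n-1) * x^2 * (2n-1)^2 / ((2n)(2n+1)).
// Stops once the latest term is smaller than epsilon (or after a fixed cap,
// since convergence becomes very slow as |x| approaches 1).
double asin_series(double x, double epsilon)
{
    double term = x;
    double sum = x;
    for (int n = 1; n < 100000 && std::fabs(term) >= epsilon; ++n)
    {
        const double odd = 2.0 * n - 1.0;
        term *= x * x * odd * odd / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}
```

For x = 0.5 the terms shrink by roughly a factor of four per step, so only a few dozen iterations are needed; close to 1 the iteration cap is what ends the loop, which is the slow convergence described in the answer.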
Using the Taylor series for arcsin is extremely imprecise, as it converges very badly and there will be relatively big differences from the true value for a finite number of terms. Also, using pow with integer exponents is neither very precise nor efficient.
However using arctan for this is OK
arcsin(x) = arctan(x/sqrt(1-(x*x)));
as its Taylor series converges OK on the <0.0,0.8> range all the other parts of the range can be computed through it (using trigonometric identities). So here my C++ implementation (from my arithmetics template):
T atan (const T &x) // = atan(x)
{
bool _shift=false;
bool _invert=false;
bool _negative=false;
T z,dz,x1,x2,a,b; int i;
x1=x; if (x1<0.0) { _negative=true; x1=-x1; }
if (x1>1.0) { _invert=true; x1=1.0/x1; }
if (x1>0.7) { _shift=true; b=::sqrt(3.0)/3.0; x1=(x1-b)/(1.0+(x1*b)); }
x2=x1*x1;
for (z=x1,a=x1,b=1,i=1;i<1000;i++) // if x1>0.8 convergence is slow
{
a*=x2; b+=2; dz=a/b; z-=dz;
a*=x2; b+=2; dz=a/b; z+=dz;
if (::abs(dz)<zero) break;
}
if (_shift) z+=pi/6.0;
if (_invert) z=0.5*pi-z;
if (_negative) z=-z;
return z;
}
T asin (const T &x) // = asin(x)
{
if (x<=-1.0) return -0.5*pi;
if (x>=+1.0) return +0.5*pi;
return ::atan(x/::sqrt(1.0-(x*x)));
}
Where T is any floating point type (float,double,...). As you can see you need sqrt(x), pi=3.141592653589793238462643383279502884197169399375105, zero=1e-20 and +,-,*,/ operations implemented. The zero constant is the target precision.
So just replace T with float/double and ignore the :: ...
"so I guess I'd have to increment it in the while statement too"
Yes, this might be a way. And what stops you?
int i=0;
while(condition){
//do something
i++;
}
Another way would be using the for condition:
for(i = 1; i < 23500 && my_abs(prev_term - next_term) >= EPSILON; i++)
Your formula is wrong. Here is the correct formula: http://scipp.ucsc.edu/~haber/ph116A/taylor11.pdf.
P.S. Also note that your formula and your series do not correspond to each other.
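You can use a do-while loop like this, continuing for as long as the sum still changes by at least the tolerance:
int i = 1;
double sum_prev;
do
{
    sum_prev = sum;
    sum += a;
    a = next(a, x, i++);
} while( std::abs(sum - sum_prev) >= 1e-15 );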

Resources