I was calculating e^x using a Taylor series and noticed that for negative x the absolute error is large. Is it because we don't have enough precision to calculate it?
(I know that to prevent it we can use e^(-x)=1/e^x)
#include <stdio.h>
#include <math.h>
double Exp(double x);
int main(void)
{
double x;
printf("x=");
scanf("%le", &x);
printf("%le", Exp(x));
return 0;
}
double Exp(double x)
{
double h, eps = 1.e-16, Sum = 1.0;
int i = 2;
h = x;
do
{
Sum += h;
h *= x / i;
i++;
} while (fabs(h) > eps);
return Sum ;
}
For example:
x=-40: the true value is 4.24835e-18, but the program gives 3.116952e-01. The absolute error is ~0.311.
x=-50: the true value is 1.92875e-22, but the program gives 2.041833e+03. The absolute error is ~2041.833.
The problem is caused by rounding errors at the middle phase of the algorithm.
The h is growing quickly as 40/2 * 40/3 * 40 / 4 * ... and oscillating in sign. The values for i, h and Sum for x=-40 for consecutive iterations can be found below (some data points omitted for brevity):
x=-40
i=2 h=800 Sum=-39
i=3 h=-10666.7 Sum=761
i=4 h=106667 Sum=-9905.67
i=5 h=-853333 Sum=96761
i=6 h=5.68889e+06 Sum=-756572
...
i=37 h=-1.37241e+16 Sum=6.63949e+15
i=38 h=1.44464e+16 Sum=-7.08457e+15
i=39 h=-1.48168e+16 Sum=7.36181e+15
i=40 h=1.48168e+16 Sum=-7.45499e+15
i=41 h=-1.44554e+16 Sum=7.36181e+15
i=42 h=1.37671e+16 Sum=-7.09361e+15
i=43 h=-1.28066e+16 Sum=6.67346e+15
i=44 h=1.16423e+16 Sum=-6.13311e+15
i=45 h=-1.03487e+16 Sum=5.50923e+15
i=46 h=8.99891e+15 Sum=-4.83952e+15
...
i=97 h=-2610.22 Sum=1852.36
i=98 h=1065.4 Sum=-757.861
i=99 h=-430.463 Sum=307.534
...
i=138 h=1.75514e-16 Sum=0.311695
i=139 h=-5.05076e-17 Sum=0.311695
3.116952e-01
The peak magnitude of Sum is about 7e15, and this is where the precision is lost. A double carries a relative precision of about 1e-16, so a partial sum of that size has an absolute rounding error of about 0.1 to 1.
Since the expected result (the value of exp(-40)) is close to zero, the final absolute error is close to the largest absolute error among the partial sums.
For x=-50 the peak value of Sum is about 1.5e20, which gives an absolute error due to the finite precision of double of about 1e3 to 1e4, close to the observed one.
Not much can be fixed without significant changes to the algorithm that avoid forming those large partial sums. Alternatively, compute exp(-x) as 1/exp(x).
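The estimate used in this answer (peak partial sum times the relative precision of double) can be checked numerically. The following is a minimal sketch, not part of the original answer: the name naive_exp_error_estimate is hypothetical, and DBL_EPSILON stands in for the "about 1e-16" figure.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: runs the same naive summation as Exp() and returns
   (largest |partial sum|) * DBL_EPSILON as a rough absolute-error estimate. */
double naive_exp_error_estimate(double x)
{
    double term = 1.0, sum = 1.0, peak = 1.0;

    /* The iteration cap is only a safety net for this sketch. */
    for (int i = 1; fabs(term) > 1e-16 && i < 10000; i++) {
        term *= x / i;          /* x^i / i! */
        sum += term;
        if (fabs(sum) > peak)
            peak = fabs(sum);   /* remember the largest partial sum */
    }
    return peak * DBL_EPSILON;
}
```

This is a rough figure rather than a rigorous bound, so it should only be expected to match the observed error to within an order of magnitude or so.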
For negative x, adding the alternating +/- terms creates computational problems even in the first sum, 1.0 + x: the error of the final sum can be expected to be at least as large as the least significant bit of 1.0, or about 1 part in 10^16. This implies that x_min, where Exp(x_min) == 1.0e-16, is the smallest useful argument (x about -36).
A simple solution is to form a good Exp(positive_x) and for negative values ...
double Exp(double x) {
if (x < 0) {
return 1.0 / Exp(-x);
}
...
A good (and simple) Exp(positive_x) keeps adding terms until term + 1.0 == 1.0, since smaller terms no longer change the sum significantly. This works well for all x (very small error), although it could be improved for results that should be subnormal.
double my_exp(double x) {
    if (x < 0) {
        return 1.0 / my_exp(-x);
    }
    double sum = 1.0;
    unsigned n = 1;
    double term = 1.0;
    do {
        term *= x / n++;
        sum += term;
        if (!isfinite(term)) {
            return term;
        }
    } while (1.0 != term + 1.0);
    return sum;
}
I'm trying to calculate sin(x) using the Taylor series for sin x, with an accuracy of 0.00001 (meaning until the sum goes lower than the precision of 0.00001).
(x is given in radians.)
The problem is that my function to calculate sin (using the Taylor series formula) prints the same value it was given (for example, if 7 is given it prints 7.00000 instead of 0.656987).
I tried to debug my code using gdb and couldn't figure out why it stops after the first iteration.
Here's my C code to calculate sin(x) using the Taylor series.
double my_sin(double x) {
int i=3,sign=1; // sign variable is meant to be used for - and + operator inside loop.
// i variable will be used for power and factorial division
double sum=x, accuracy=0.000001; // sum is set for first x.
for(i=3;fabs(sum) < accuracy ;i+=2){ // starting from power of 3.
sign*=-1; // sign will change each iteration from - to +.
sum+=sign*(pow(x,i)/factorial(i)); // the formula itself (factorial simple function for division)
}
return (sum);
}
Any help would be appreciated.
Thanks
"tried to debug my code using gdb and couldn't figure out why it stops after first iteration."
Well, let's do it again, step by step.
sum = x (input is 7.0, so sum == 7.0).
for(i=3; fabs(sum) < accuracy; i+=2) { ...
Since sum is 7.0, it is not less than accuracy, so the loop body never executes.
return sum; -- sum is still 7.0, so that's what your function returns.
Your program does exactly what you asked it to do.
P.S. Here is the code you probably intended to write (DBL_MAX needs <float.h>; factorial() is your own helper):
double my_sin(double x) {
    double sum = x, accuracy = 0.000001;
    double delta = DBL_MAX;
    for(int i = 3, sign = -1; accuracy < fabs(delta); i += 2, sign = -sign) {
        delta = sign * pow(x, i) / factorial(i);
        sum += delta;
    }
    return sum;
}
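As a variation on the corrected loop, each term can be derived from the previous one, which avoids both pow() and the factorial() helper (and the overflow risk of an integer factorial). This is a sketch under that assumption; the name my_sin_incremental and the iteration cap are not from the original answer.

```c
#include <math.h>

/* Sketch: builds each term from the previous one, so neither pow() nor a
   factorial() helper is needed. Assumes a finite x. */
double my_sin_incremental(double x)
{
    const double accuracy = 0.000001;
    double term = x;   /* current term x^i / i!, starting with i = 1 */
    double sum = x;

    /* The cap on i is only a safety net for this sketch. */
    for (int i = 3; fabs(term) > accuracy && i < 10000; i += 2) {
        term *= -x * x / ((double)(i - 1) * i);   /* next odd power, sign flipped */
        sum += term;
    }
    return sum;
}
```

With an input of 7 this should reproduce the value of about 0.656987 that the question expected.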
I'm trying to calculate the Taylor series of cos(x) with an error of at most 10^-3 for all x ∈ [-pi/4, pi/4], which means my error needs to be less than 0.001. I can modify the x += step in the for loop to get different results. I tried several numbers, but the error never drops below 0.001.
#include <stdio.h>
#include <math.h>
float cosine(float x, int j)
{
float val = 1;
for (int k = j - 1; k >= 0; --k)
val = 1 - x*x/(2*k+2)/(2*k+1)*val;
return val;
}
int main( void )
{
for( double x = 0; x <= PI/4; x += 0.9999 )
{
if(cosine(x, 2) <= 0.001)
{
printf("cos(x) : %10g %10g %10g\n", x, cos(x), cosine(x, 2));
}
printf("cos(x) : %10g %10g %10g\n", x, cos(x), cosine(x, 2));
}
return 0;
}
I'm also doing this for e^x. For this part, x must be in [-2, 2].
float exponential(int n, float x)
{
float sum = 1.0f; // initialize sum of series
for (int i = n - 1; i > 0; --i )
sum = 1 + x * sum / i;
return sum;
}
int main( void )
{
// change the number of x in for loop so you can have different range
for( float x = -2.0f; x <= 2.0f; x += 1.587 )
{
// change the first parameter to have different n value
if(exponential(5, x) <= 0.001)
{
printf("e^x = %f\n", exponential(5, x));
}
printf("e^x = %f\n", exponential(5, x));
}
return 0;
}
But whenever I change the number of terms in the for loop, the error is always greater than 1. How am I supposed to change it to get an error less than 10^-3?
Thanks!
My understanding is that to increase precision, you need to consider more terms in the Taylor series. For example, consider what happens when you attempt to calculate e(1) by a Taylor series.
$e(x) = \sum\limits_{n=0}^{\infty} \frac{x^n}{n!}$
We can consider the first few terms in the expansion of e(1):
n    value of nth term     sum
0    x^0/0! = 1            1
1    x^1/1! = 1            2
2    x^2/2! = 0.5          2.5
3    x^3/3! = 0.16667      2.66667
4    x^4/4! = 0.04167      2.70834
You should notice two things: first, as we add more terms we get closer to the exact value of e(1); second, the difference between consecutive sums gets smaller.
So, an implementation of e(x) could be written as:
#include <stdbool.h>
#include <stdio.h>
#include <math.h>
typedef float (*term)(int, int);
float evalSum(int, int, int, term);
float expTerm(int, int);
int fact(int);
int mypow(int, int);
bool sgn(float);
const int maxTerm = 10; // number of terms to evaluate in series
const float epsilon = 0.001; // the accepted error
int main(void)
{
// change these values to modify the range and increment
float start = -2;
float end = 2;
float inc = 1;
for(int x = start; x <= end; x += inc)
{
float value = 0;
float prev = 0;
for(int ndx = 0; ndx < maxTerm; ndx++)
{
value = evalSum(0, ndx, x, expTerm);
float diff = fabs(value-prev);
if((sgn(value) && sgn(prev)) && (diff < epsilon))
break;
else
prev = value;
}
printf("the approximate value of exp(%d) is %f\n", x, value);
}
return 0;
}
I've guessed that we will not need more than ten terms in the expansion to reach the desired precision, so the inner for loop runs over values of n from 0 through 9.
Several lines are dedicated to checking whether we have reached the required precision. First I calculate the absolute value of the difference between the current evaluation and the previous one. Checking whether that difference is less than our epsilon value (1E-3) is one of the criteria for exiting the loop early. I also needed to check that the current and previous values have the same sign, because of some fluctuation in calculating the value of e(-1); that is what the first clause in the conditional does.
float evalSum(int start, int end, int val, term fnct)
{
float sum = 0;
for(int n = start; n <= end; n++)
{
sum += fnct(n, val);
}
return sum;
}
This is a utility function that I wrote to evaluate the first n terms of a series. start is the starting index (which in this code is always 0), end is the ending index, and val is the value of x at which the series is evaluated. The final parameter is a pointer to a function that calculates a given term. In this code, fnct can be a pointer to any function that takes two integer parameters and returns a float.
float expTerm(int n, int x)
{
return (float)mypow(x,n)/(float)fact(n);
}
Buried in this one-line function is where most of the work happens. This function represents the closed form of a term in the Taylor expansion of e(x). Looking carefully at the above, you should be able to see that we are calculating $\frac{x^n}{n!}$ for given values of x and n. As a hint, for the cosine part you would need to create a function that evaluates the closed form of a term in the Taylor expansion of cos. This is given by $(-1)^n\frac{x^{2n}}{(2n)!}$.
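To make the cosine hint concrete, here is one possible sketch of such a term function. It is not the answer's own code: the names cos_term and cos_series are hypothetical, and it uses double arguments and an incremental product instead of the integer mypow()/fact() helpers, so that non-integer x values such as pi/4 work and no factorial can overflow.

```c
#include <math.h>

/* n-th term of the cosine series: (-1)^n * x^(2n) / (2n)!.
   Built as a running product, so no explicit factorial is formed. */
double cos_term(int n, double x)
{
    double term = 1.0;
    for (int k = 1; k <= n; ++k)
        term *= -(x * x) / ((2.0 * k - 1.0) * (2.0 * k));
    return term;
}

/* Adds terms until one is smaller in magnitude than tol (at most 50 terms). */
double cos_series(double x, double tol)
{
    double sum = 0.0;
    for (int n = 0; n < 50; ++n) {
        double t = cos_term(n, x);
        sum += t;
        if (fabs(t) < tol)
            break;
    }
    return sum;
}
```

Because the cosine series alternates with shrinking terms on [-pi/4, pi/4], stopping once a term falls below the tolerance keeps the truncation error below that tolerance on this interval.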
int fact(int n)
{
if(0 == n)
return 1; // by definition
else if(1 == n)
return 1;
else
return n*fact(n-1);
}
This is just a standard implementation of the factorial function. Nothing special to see here.
int mypow(int base, int exp)
{
int result = 1;
while(exp)
{
if(exp&1) // b&1 quick check for odd power
{
result *= base;
}
exp >>=1; // exp >>= 1 quick division by 2
base *= base;
}
return result;
}
A custom function for doing exponentiation. We certainly could have used the version from <math.h>, but because I knew we would only be doing integer powers we could write an optimized version. Hint: in doing cosine you probably will need to use the version from <math.h> to work with floating point bases.
bool sgn(float x)
{
if(x < 0) return false;
else return true;
}
An incredibly simple function to determine the sign of a floating point value, returning true if it is non-negative and false otherwise.
This code was compiled on my Ubuntu-14.04 using gcc version 4.8.4:
******#crossbow:~/personal/projects$ gcc -std=c99 -pedantic -Wall series.c -o series
******#crossbow:~/personal/projects$ ./series
the approximate value of exp(-2) is 0.135097
the approximate value of exp(-1) is 0.367857
the approximate value of exp(0) is 1.000000
the approximate value of exp(1) is 2.718254
the approximate value of exp(2) is 7.388713
The expected values, as given by using bc are:
******#crossbow:~$ bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
e(-2)
.13533528323661269189
e(-1)
.36787944117144232159
e(0)
1.00000000000000000000
e(1)
2.71828182845904523536
e(2)
7.38905609893065022723
As you can see, the values are well within the tolerance that you requested. I leave the cosine part as an exercise.
Hope this helps,
-T
exp and cos have power series that converge everywhere on the real line. For any bounded interval, e.g. [-pi/4, pi/4] or [-2, 2], the power series converge not just pointwise, but uniformly to exp and cos.
Pointwise convergence means that for any x in the region, and any epsilon > 0, you can pick a large enough N so that the approximation you get from the first N terms of the taylor series is within epsilon of the true value. However, with pointwise convergence, the N may be small for some x's and large for others, and since there are infinitely many x's there may be no finite N that accommodates them all. For some functions that really is what happens sometimes.
Uniform convergence means that for any epsilon > 0, you can pick a large enough N so that the approximation is within epsilon for EVERY x in the region. That's the kind of approximation that you are looking for, and you are guaranteed that that's the kind of convergence that you have.
In principle you could look at one of the proofs that exp, cos are uniformly convergent on any finite domain, sit down and say "what if we take epsilon = .001, and the regions to be ...", and compute some finite bound on N using a pen and paper. However most of these proofs will use at some steps some estimates that aren't sharp, so the value of N that you compute will be larger than necessary -- maybe a lot larger. It would be simpler to just implement it for N being a variable, then check the values using a for-loop like you did in your code, and see how large you have to make it so that the error is less than .001 everywhere.
So, I can't tell what the right value of N you need to pick is, but the math guarantees that if you keep trying larger values eventually you will find one that works.
I wrote a code for calculating sin using its maclaurin series and it works but when I try to calculate it for large x values and try to offset it by giving a large order N (the length of the sum) - eventually it overflows and doesn't give me correct results. This is the code and I would like to know is there an additional way to optimize it so it works for large x values too (it already works great for small x values and really big N values).
Here is the code:
long double calcMaclaurinPolynom(double x, int N){
long double result = 0;
long double atzeretCounter = 2;
int sign = 1;
long double fraction = x;
for (int i = 0; i <= N; i++)
{
result += sign*fraction;
sign = sign*(-1);
fraction = fraction*((x*x) / ((atzeretCounter)*(atzeretCounter + 1)));
atzeretCounter += 2;
}
return result;
}
The major issue is using the series outside the range where it converges well.
OP's comment "converted x to radX = (x*PI)/180" indicates that OP is starting with degrees rather than radians, so OP is in luck. The first step in finding my_sin(x) is range reduction. When starting with degrees, the reduction is exact. So reduce the range before converting to radians.
long double calcMaclaurinPolynom(double x /* degrees */, int N){
// Reduce to range -360 to 360
// This reduction is exact, no round-off error
x = fmod(x, 360);
// Reduce to range -180 to 180
if (x >= 180) {
x -= 180;
x = -x;
} else if (x <= -180) {
x += 180;
x = -x;
}
// Reduce to range -90 to 90
if (x >= 90) {
x = 180 - x;
} else if (x <= -90) {
x = -180 - x;
}
//now convert to radians.
x = x*PI/180;
// continue with regular code
Alternatively, if using C99 or later, use remquo(). Search SO for sample code.
As @user3386109 commented above, no need to "convert back to degrees".
[Edit]
With typical summation series, summing the least significant terms first improves the precision of the answer. With OP's code this can be done with
for (int i = N; i >= 0; i--)
Alternatively, rather than iterating a fixed number of times, loop until the term has no significance to the sum. The following uses recursion to sum the least significant terms first. With range reduction in the -90 to 90 range, the number of iterations is not excessive.
static double sin_d_helper(double term, double xx, unsigned i) {
if (1.0 + term == 1.0)
return term;
return term - sin_d_helper(term * xx / ((i + 1) * (i + 2)), xx, i + 2);
}
#include <math.h>
double sin_d(double x_degrees) {
// range reduction and d --> r conversion from above
double x_radians = ...
return x_radians * sin_d_helper(1.0, x_radians * x_radians, 1);
}
You can avoid the sign variable by incorporating it into the fraction update as in (-x*x).
With your algorithm you do not have problems with integer overflow in the factorials.
As soon as x*x < (2*k)*(2*k+1) the error - assuming exact evaluation - is bounded by abs(fraction), i.e., the size of the next term in the series.
For large x the biggest source of error is truncation and floating point rounding, magnified by cancellation between the terms of the alternating series. The terms near k = x/2 are the largest in magnitude and have to be offset by other large terms.
Halving-and-Squaring
One easy method to deal with large x without using the value of pi is to use the double-angle identities
sin(2*x)=2*sin(x)*cos(x)
cos(2*x)=2*cos(x)^2-1=cos(x)^2-sin(x)^2
and first reduce x by halving, simultaneously evaluating the Maclaurin series for sin(x/2^n) and cos(x/2^n) and then employ trigonometric squaring (literal squaring as complex numbers cos(x)+i*sin(x)) to recover the values for the original argument.
cos(x/2^(n-1)) = cos(x/2^n)^2-sin(x/2^n)^2
sin(x/2^(n-1)) = 2*sin(x/2^n)*cos(x/2^n)
then
cos(x/2^(n-2)) = cos(x/2^(n-1))^2-sin(x/2^(n-1))^2
sin(x/2^(n-2)) = 2*sin(x/2^(n-1))*cos(x/2^(n-1))
etc.
See https://stackoverflow.com/a/22791396/3088138 for the simultaneous computation of sin and cos values, then encapsulate it with
def CosSinForLargerX(x,n):
    k=0
    while abs(x)>1:
        k+=1; x/=2
    c,s = getCosSin(x,n)
    r2=1.0  # stays 1 when no doubling step runs (k == 0), avoiding a division by zero
    for i in range(k):
        s2=s*s; c2=c*c; r2=s2+c2
        s = 2*c*s
        c = c2-s2
    return c/r2,s/r2
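For readers following the C code in the question, the same halving-and-squaring idea can be sketched in C. This is an illustration only: the name cos_sin_large and the fixed count of 12 series terms are assumptions, the Maclaurin sums are written inline rather than taken from the linked getCosSin, and the final renormalization by r2 is omitted.

```c
#include <math.h>

/* Sketch: halve x until |x| <= 1, sum short Maclaurin series for cos and
   sin there, then apply the double-angle identities once per halving. */
void cos_sin_large(double x, double *c_out, double *s_out)
{
    int k = 0;
    while (fabs(x) > 1.0) {
        x /= 2.0;
        ++k;
    }

    /* 12 further terms are far more than needed for |x| <= 1. */
    double c = 1.0, s = x, tc = 1.0, ts = x;
    for (int n = 1; n <= 12; ++n) {
        tc *= -x * x / ((2.0 * n - 1.0) * (2.0 * n));   /* (-1)^n x^(2n) / (2n)! */
        ts *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));   /* (-1)^n x^(2n+1) / (2n+1)! */
        c += tc;
        s += ts;
    }

    for (int i = 0; i < k; ++i) {
        double c2 = c * c, s2 = s * s;
        s = 2.0 * c * s;   /* sin(2t) = 2 sin(t) cos(t), using the old c */
        c = c2 - s2;       /* cos(2t) = cos(t)^2 - sin(t)^2 */
    }
    *c_out = c;
    *s_out = s;
}
```

Each doubling step roughly doubles the accumulated rounding error, so the result loses about one bit of accuracy per halving; for very large x an exact range reduction is likely the better choice.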
So, I'm trying to create a program that calculates cos(x) by using a Taylor approximation.
The program is really simple: The user inputs a parameter x (x being an angle in radians) and a float ε, which is the precision of the value of cos(x).
Basically, the only thing the program has to do is to calculate this sum:
x^0/0! - x^2/2! + x^4/4! - x^6/6! + x^8/8! - ..., until the terms are smaller than ε, that is, until the value for cos(x) is within our range of precision.
So, here's the code:
#include <stdio.h>
/* Calculates cos(x) by using a Taylor approximation:
cos(x) = x^0/(0!) - x^2/(2!) + x^4/(4!) - x^6/(6!) + x^8/(8!) - ... */
int main(void)
{
int k; // dummy variable k
float x, // parameter of cos(x), in radians
epsilon, // precision of cos(x) (cos = sum ± epsilon)
sum, // sum of the terms of the polynomial series
term; // variable that stores each term of the summation
scanf("%f %f", &x, &epsilon);
sum = term = 1, k = 0;
while (term >= epsilon && -term <= epsilon)
// while abs(term) is smaller than epsilon
{
k += 2;
term *= -(x*x)/(k*(k-1));
sum += term;
}
printf("cos(%f) = %f\n", x, sum);
return 0;
}
At first, I tried to solve it by calculating the factorials in a separate variable "fact", though that caused an overflow even with reasonably large values for ε.
To solve this, I noticed that I could just multiply the previous term by -x² / (k(k - 1)), increasing k by 2 in every iteration, to get the next term. I thought that would solve my problem, but then again, it is not working.
The program compiles fine, but for example, if I input:
3.141593 0.001
The output is:
cos(3.141593) = -3.934803
...and that is obviously wrong. Can someone help me?
The bug lies in the condition of your while loop:
while (term >= epsilon && -term <= epsilon)
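It's not the correct condition: it is true only while term >= epsilon, so the loop stops as soon as the first negative term appears. The logic could be fixed directly:
while (term >= epsilon || -term >= epsilon)
but you should just use the standard floating point absolute value function, fabs (from <math.h>), as it makes the intent of your code more obvious: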
while (fabs(term) >= epsilon)
After applying that change and compiling your program I used it to compute cos(3.141593) = -1.000004, which is correct.
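Put together as a function, the corrected loop might look like the sketch below. The name cos_taylor and the cap on k are additions of this sketch (the cap only guards against a non-positive epsilon); fabsf is the float variant of fabs.

```c
#include <math.h>

/* Sketch of the corrected loop: keep adding terms while the last term is
   still at least epsilon in magnitude. */
float cos_taylor(float x, float epsilon)
{
    float sum = 1.0f, term = 1.0f;
    int k = 0;

    while (fabsf(term) >= epsilon && k < 100) {
        k += 2;
        term *= -(x * x) / (float)(k * (k - 1));   /* next term from the previous one */
        sum += term;
    }
    return sum;
}
```

For the input from the question (3.141593 and 0.001) this should land within the requested precision of -1.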
Just adding to Charliehorse55's answer.
Usually one does an argument reduction using simple trigonometry
cos(x + y) = cos(x)cos(y) - sin(x)sin(y)
sin(x + y) = cos(x)sin(y) + sin(x)cos(y)
to reduce the argument to [0..SmallAngle] range and only then calculate the Taylor expansion.
I'm looking for implementation of log() and exp() functions provided in C library <math.h>. I'm working with 8 bit microcontrollers (OKI 411 and 431). I need to calculate Mean Kinetic Temperature. The requirement is that we should be able to calculate MKT as fast as possible and with as little code memory as possible. The compiler comes with log() and exp() functions in <math.h>. But calling either function and linking with the library causes the code size to increase by 5 Kilobytes, which will not fit in one of the micro we work with (OKI 411), because our code already consumed ~12K of available ~15K code memory.
The implementation I'm looking for should not use any other C library functions (like pow(), sqrt() etc). This is because all library functions are packed in one library and even if one function is called, the linker will bring whole 5K library to code memory.
EDIT
The algorithm should be correct up to 3 decimal places.
Using a Taylor series is neither the simplest nor the fastest way of doing this. Most professional implementations use approximating polynomials. I'll show you how to generate one in Maple (a computer algebra program), using the Remez algorithm.
For 3 digits of accuracy execute the following commands in Maple:
with(numapprox):
Digits := 8
minimax(ln(x), x = 1 .. 2, 4, 1, 'maxerror')
maxerror
Its response is the following polynomial:
-1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x
With the maximal error of: 0.000061011436
We generated a polynomial which approximates ln(x), but only inside the [1..2] interval. Widening the interval is not wise, because that would increase the maximal error even more. Instead, use the following decomposition: write the input as y = 2^n * x with x in [1, 2), so that ln(y) = n*ln(2) + ln(x).
So first find the highest power of 2 which is not larger than the number (see: What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?). Its exponent n is the integer part of the base-2 logarithm. Divide by that power of 2, and the result falls into the 1..2 interval. At the end we have to add n*ln(2) to get the final result.
An example implementation for numbers >= 1:
float ln(float y) {
int log2;
float divisor, x, result;
log2 = msb((int)y); // See: https://stackoverflow.com/a/4970859/6630230
divisor = (float)(1 << log2);
x = y / divisor; // normalized value between [1.0, 2.0]
result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
result += ((float)log2) * 0.69314718; // ln(2) = 0.69314718
return result;
}
Although if you plan to use it only in the [1.0, 2.0] interval, then the function is like:
float ln(float x) {
return -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
}
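The msb() helper used in the first implementation is only referenced by link. A minimal self-contained sketch follows, assuming a simple shift loop is acceptable for that helper; the names msb_index and ln_poly are hypothetical, and the polynomial coefficients are the ones generated above.

```c
/* Hypothetical stand-in for msb(): index of the highest set bit of v
   (0 for v == 1). v must be non-zero. */
static int msb_index(unsigned long v)
{
    int index = 0;
    while (v >>= 1)
        ++index;
    return index;
}

/* ln(y) for 1 <= y < 2^31, using the minimax polynomial on [1, 2]. */
float ln_poly(float y)
{
    int n = msb_index((unsigned long)y);      /* integer part of log2(y) */
    float x = y / (float)(1UL << n);          /* normalized to [1.0, 2.0) */
    float result = -1.7417939f + (2.8212026f + (-1.4699568f
                   + (0.44717955f - 0.056570851f * x) * x) * x) * x;
    return result + (float)n * 0.69314718f;   /* add n * ln(2) */
}
```

No library function is called, which matters for the code-size constraint in the question; the shift loop could be swapped for a faster bit trick if cycles matter more than bytes.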
The Taylor series for e^x converges extremely quickly, and you can tune your implementation to the precision that you need. (http://en.wikipedia.org/wiki/Taylor_series)
The Taylor series for log is not as nice...
If you don't need floating-point math for anything else, you may compute an approximate fractional base-2 log pretty easily. Start by shifting your value left until it's 32768 or higher and store the number of times you did that in count. Then, repeat some number of times (depending upon your desired scale factor):
n = (mult(n,n) + 32768u) >> 16; // If a function is available for 16x16->32 multiply
count<<=1;
if (n < 32768) n*=2; else count+=1;
If the above loop is repeated 8 times, then the log base 2 of the number will be count/256. If ten times, count/1024. If eleven, count/2048. Effectively, this function works by computing the integer power-of-two logarithm of n**(2^reps), but with intermediate values scaled to avoid overflow.
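One way to put these steps together is sketched below. The text leaves the initial value of count open to interpretation, so this sketch assumes it starts as the integer part of the logarithm (15 minus the number of normalizing shifts); the name log2_fixed is hypothetical, and a plain 32-bit multiply stands in for the mult() function.

```c
#include <stdint.h>

/* Sketch of the fixed-point base-2 logarithm described above.
   Returns log2(value) scaled by 2^reps; value must be non-zero. */
uint32_t log2_fixed(uint16_t value, unsigned reps)
{
    uint32_t n = value;
    uint32_t count = 15;   /* assumption: integer part, reduced per shift */

    while (n < 32768u) {   /* normalize so that 32768 <= n < 65536 */
        n <<= 1;
        --count;
    }
    for (unsigned r = 0; r < reps; ++r) {
        n = (n * n + 32768u) >> 16;   /* square the mantissa, rescale */
        count <<= 1;
        if (n < 32768u)
            n *= 2;                   /* square was below 2: next bit is 0 */
        else
            count += 1;               /* square was 2 or more: next bit is 1 */
    }
    return count;
}
```

With reps set to 8, dividing the result by 256 gives the base-2 logarithm; multiplying by ln(2) in the same fixed-point format would then give the natural logarithm.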
Would a basic table with interpolation between values work? If the range of values is limited (which is likely in your case; I doubt temperature readings have a huge range) and high precision is not required, it may work. It should be easy to test on a normal machine.
Here is one of many topics on table representation of functions: Calculating vs. lookup tables for sine value performance?
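As an illustration of the table idea, here is a small sketch for ln(x) on [1, 2] with linear interpolation. The table size, the range, and the names LN_TABLE and ln_table are all assumptions chosen to keep the example short.

```c
/* Sketch: ln(x) on [1, 2] from a 5-entry table with linear interpolation.
   Inputs outside [1, 2] are clamped to the nearest table end. */
static const float LN_TABLE[5] = {
    0.000000f,   /* ln(1.00) */
    0.223144f,   /* ln(1.25) */
    0.405465f,   /* ln(1.50) */
    0.559616f,   /* ln(1.75) */
    0.693147f    /* ln(2.00) */
};

float ln_table(float x)
{
    if (x <= 1.0f)
        return LN_TABLE[0];
    if (x >= 2.0f)
        return LN_TABLE[4];

    float pos = (x - 1.0f) * 4.0f;   /* position in units of table steps */
    int i = (int)pos;                /* index of the entry at or below x */
    float frac = pos - (float)i;     /* fractional distance to the next entry */
    return LN_TABLE[i] + frac * (LN_TABLE[i + 1] - LN_TABLE[i]);
}
```

With a spacing of 0.25 the interpolation error can reach several thousandths, so three correct decimal places would need a finer table; by the usual linear-interpolation error bound (step squared over 8, times the largest second derivative), a spacing of about 1/16 should be enough on this interval.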
Necromancing.
I had to implement logarithms on rational numbers.
This is how I did it:
According to Wikipedia, there is the Halley-Newton approximation method,
which can be used for very high precision.
Using Newton's method, the iteration simplifies to y_{n+1} = y_n + 2 * (x - exp(y_n)) / (x + exp(y_n)) (implemented below), which has cubic convergence to ln(x), which is way better than what the Taylor series offers.
// Using Newton's method, the iteration simplifies to (implementation)
// which has cubic convergence to ln(x).
public static double ln(double x, double epsilon)
{
double yn = x - 1.0d; // using the first term of the taylor series as initial-value
double yn1 = yn;
do
{
yn = yn1;
yn1 = yn + 2 * (x - System.Math.Exp(yn)) / (x + System.Math.Exp(yn));
} while (System.Math.Abs(yn - yn1) > epsilon);
return yn1;
}
This is C#, not C, but I'm sure anybody able to program in C can derive the C code from it.
Furthermore, since
logn(x) = ln(x)/ln(n).
You have therefore just implemented logN as well.
public static double log(double x, double n, double epsilon)
{
return ln(x, epsilon) / ln(n, epsilon);
}
where epsilon is the accepted error (the iteration stops once successive estimates differ by less than epsilon).
Now as to speed, you're probably better off using the hardware ln, but as I said, I used this as a base to implement logarithms in a rational-number class working with arbitrary precision.
Arbitrary precision might be more important than speed, under certain circumstances.
Then, use the logarithmic identities for rational numbers:
logB(x/y) = logB(x) - logB(y)
In addition to Crouching Kitten's answer which gave me inspiration, you can build a pseudo-recursive (at most 1 self-call) logarithm to avoid using polynomials. In pseudo code
ln(x) :=
    If (x <= 0)
        return NaN
    Else if (!(1 <= x < 2))
        return LN2 * b + ln(a)
    Else
        return taylor_expansion(x - 1)
This is pretty efficient and precise since on [1; 2) the taylor series converges A LOT faster, and we get such a number 1 <= a < 2 with the first call to ln if our input is positive but not in this range.
You can find 'b' as your unbiased exponent from the data held in the float x, and 'a' from the mantissa of the float x (a is exactly the same float as x, but now with exponent biased_0 rather than exponent biased_b). LN2 should be kept as a macro in hexadecimal floating point notation IMO. You can also use http://man7.org/linux/man-pages/man3/frexp.3.html for this.
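A sketch of this decomposition using frexp() is shown below. Two things differ from the pseudocode and are assumptions of this sketch: the function is not recursive, and instead of the Taylor expansion in (x - 1) it sums the series ln(a) = 2 * (s + s^3/3 + s^5/5 + ...) with s = (a - 1) / (a + 1), swapped in because it converges quickly over all of [1, 2). The name ln_frexp is hypothetical.

```c
#include <math.h>

/* Sketch: ln(x) = b * ln(2) + ln(a), with 1 <= a < 2 obtained from frexp(). */
double ln_frexp(double x)
{
    if (x <= 0.0)
        return NAN;             /* same convention as the pseudocode */

    int e;
    double m = frexp(x, &e);    /* x = m * 2^e with 0.5 <= m < 1 */
    double a = 2.0 * m;         /* 1 <= a < 2 */
    int b = e - 1;              /* so that x = a * 2^b */

    double s = (a - 1.0) / (a + 1.0);   /* 0 <= s < 1/3 */
    double s2 = s * s, power = s, sum = 0.0;
    for (int k = 1; k < 60; k += 2) {   /* odd powers s^k / k */
        sum += power / k;
        power *= s2;
    }
    return b * 0.69314718055994530942 + 2.0 * sum;
}
```

The fixed loop bound is generous; since s stays below 1/3, the terms fall below double precision well before the loop ends.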
Also, the trick
unsigned long tmp = *(ulong*)(&d);
for "memory-casting" a double to an unsigned long, rather than "value-casting" it, is very useful to know when working with the bit pattern of a float, because bitwise operators cannot be applied directly to floating-point operands.
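The pointer cast above can run into strict-aliasing problems, and unsigned long is only 32 bits wide on some platforms. A commonly used alternative is to copy the bytes with memcpy, sketched here under the assumption that double is an IEEE 754 binary64 value; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: obtain the bit pattern of a double by copying its bytes. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Unbiased binary exponent of a normal (not zero, subnormal, inf, or NaN)
   double, i.e. the 'b' of the pseudocode. */
int unbiased_exponent(double d)
{
    return (int)((double_bits(d) >> 52) & 0x7FF) - 1023;
}
```

Compilers typically turn the memcpy into a single register move, so this form usually costs nothing compared with the cast.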
Possible computation of ln(x) and expo(x) in C without <math.h> :
static double expo(double n) {
int a = 0, b = n > 0;
double c = 1, d = 1, e = 1;
for (b || (n = -n); e + .00001 < (e += (d *= n) / (c *= ++a)););
// approximately 15 iterations
return b ? e : 1 / e;
}
static double native_log_computation(const double n) {
// Basic logarithm computation.
static const double euler = 2.7182818284590452354 ;
unsigned a = 0, d;
double b, c, e, f;
if (n > 0) {
for (c = n < 1 ? 1 / n : n; (c /= euler) > 1; ++a);
c = 1 / (c * euler - 1), c = c + c + 1, f = c * c, b = 0;
for (d = 1, c /= 2; e = b, b += 1 / (d * c), b - e/* > 0.0000001 */;)
d += 2, c *= f;
} else b = (n == 0) / 0.;
return n < 1 ? -(a + b) : a + b;
}
static inline double native_ln(const double n) {
// Returns the natural logarithm (base e) of N.
return native_log_computation(n) ;
}
static inline double native_log_base(const double n, const double base) {
// Returns the logarithm (base b) of N.
return native_log_computation(n) / native_log_computation(base) ;
}
Building off @Crouching Kitten's great natural log answer above, if you need it to be accurate for inputs < 1 you can add a simple scaling factor. Below is an example in C++ that I've used on microcontrollers. It has a scaling factor of 256, and it's accurate for inputs down to 1/256 = ~0.004 and up to 2^32/256 = 16777216 (due to overflow of a uint32 variable).
It's interesting to note that even on an STM32F103 ARM Cortex-M3 with no FPU, the float implementation below is significantly faster (e.g. 3x or better) than the 16-bit fixed-point implementation in libfixmath (that said, this float implementation still takes a few thousand cycles, so it's still not ~fast~).
#include <float.h>
#include <stdint.h>
float TempSensor::Ln(float y)
{
// Algo from: https://stackoverflow.com/a/18454010
// Accurate between (1 / scaling factor) < y < (2^32 / scaling factor). Read comments below for more info on how to extend this range
float divisor, x, result;
const float LN_2 = 0.69314718; //pre calculated constant used in calculations
uint32_t log2 = 0;
//handle if input is less than or equal to zero
if (y <= 0)
{
return -FLT_MAX;
}
//scaling factor. The polynomial below is accurate when the input y>1, therefore using a scaling factor of 256 (aka 2^8) extends this down to 1/256 or ~0.004. Given the use of uint32_t, the input y must stay below 2^24 or 16777216 (aka 2^(32-8)), otherwise uint_y used below will overflow. Increasing the scaling factor will lower the lower accuracy bound and also lower the upper overflow bound. If you need the range to be wider, consider changing uint_y to a uint64_t
const uint32_t SCALING_FACTOR = 256;
const float LN_SCALING_FACTOR = 5.545177444; //this is the natural log of the scaling factor and needs to be precalculated
y = y * SCALING_FACTOR;
uint32_t uint_y = (uint32_t)y;
while (uint_y >>= 1) // Convert the number to an integer and then find the location of the MSB. This is the integer portion of Log2(y). See: https://stackoverflow.com/a/4970859/6630230
{
log2++;
}
divisor = (float)(1UL << log2); // unsigned shift: log2 can reach 31, which would overflow a signed int
x = y / divisor; // Find the remainder value between [1.0, 2.0], then calculate the natural log of this remainder using a polynomial approximation
result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x; //This polynomial approximates ln(x) between [1,2]
result = result + ((float)log2) * LN_2 - LN_SCALING_FACTOR; // Using the log product rule Log(A) + Log(B) = Log(AB) and the log base change rule log_x(A) = log_y(A)/Log_y(x), calculate all the components in base e and then sum them: = Ln(x_remainder) + (log_2(x_integer) * ln(2)) - ln(SCALING_FACTOR)
return result;
}