Optimizing sqrt(n) - sqrt(n-1)

Here is a function that I call many times per second:
static inline double calculate_scale(double n) { // n may be int or double
    return sqrt(n) - sqrt(n-1);
}
It is called in a loop like:
for (double i = 0; i < x; i++) {
    double scale = calculate_scale(i);
    ...
}
And it's so slow. What is the best way to optimize this function while keeping the output as accurate as possible?
Parameter n: starts from 1 and is practically unlimited, but it is mainly used with small numbers in the range 1-10. It is always a whole number, but it may be passed as either int or double, depending on which performs better.

You can try to replace it with the following algebraic rewrite:
sqrt(n) - sqrt(n-1) ==
(sqrt(n) - sqrt(n-1)) * (sqrt(n) + sqrt(n-1)) / (sqrt(n) + sqrt(n-1)) ==
(n - (n-1)) / (sqrt(n) + sqrt(n-1)) ==
1 / (sqrt(n) + sqrt(n-1))
For large enough n, the last equation is pretty close to 1 / (2 * sqrt(n)). So you only have to call sqrt once. It's also worth noting that even without the approximation, the last expression is more numerically stable in terms of relative error for larger n.
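For illustration, a minimal sketch of both forms (the function names are mine, not from the original code):

#include <math.h>

// Exact rewrite: same value as sqrt(n) - sqrt(n-1), but numerically stabler.
static inline double calculate_scale_exact(double n) {
    return 1.0 / (sqrt(n) + sqrt(n - 1));
}

// Large-n approximation: only one sqrt call.
static inline double calculate_scale_approx(double n) {
    return 1.0 / (2.0 * sqrt(n));
}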

First of all, thanks for all suggestions. I've done some research and found some interesting implementations and facts.
1. In-Loop Reuse or Using a Precomputed Table
(thanks #Ulysse BN)
You can optimize the loop by simply saving the previous sqrt(i) value.
The following example demonstrates this optimization, used here to set up a precomputed table.
/**
 * Init variables
 * i       counter
 * x       number of cycles (size of table)
 * sqrtI1  previous square root = sqrt(i-1)
 * ptr     pointer to the next value
 */
double i, x = sizeof(precomputed_table) / sizeof(double);
double sqrtI1 = 0;
double* ptr = (double*) precomputed_table;
/**
 * Optimized calculation
 * In short:
 * scale = sqrt(i) - sqrt(i-1)
 */
for (i = 1; i <= x; i++) {
    double sqrtI = sqrt(i);
    double scale = sqrtI - sqrtI1;
    *ptr++ = scale;
    sqrtI1 = sqrtI;
}
Using a precomputed table is probably the fastest method, but its drawback is that the table size is limited.
static inline double calculate_scale(int n) {
    return precomputed_table[n-1];
}
2. Approximation for Big Numbers Using the Inverse Square Root
Requires an inverse (reciprocal) square root function rsqrt.
This method gives the most accurate results with big numbers. With small numbers there are errors:

n      1     2      3       10        100      1000
error  0.29  0.006  0.0016  0.000056  1.58e-7  4.95e-10

Here is the JS code that I used to calculate the results above:
function sqrt(x) { return Math.sqrt(x); }
function d(x) { return (sqrt(x) - sqrt(x-1)) - (0.5 / sqrt(x - 0.5)); }
console.log(d(1), d(2), d(3), d(10), d(100), d(1000));
You can also see the accuracy compared with the two-sqrt version in a single graph: https://www.google.com/search?q=(sqrt(x)-sqrt(x-1))-(0.5%2Fsqrt(x-0.5))
Usage:
static inline double calculate_scale(double n) {
    // Same as: 0.5 / sqrt(n-0.5)
    // but a lot faster
    return 0.5 * rsqrt(n - 0.5);
}
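Note that rsqrt is not a standard <math.h> function; it stands here for whatever reciprocal square root intrinsic the platform provides. A portable fallback sketch, if no intrinsic exists:

#include <math.h>

// Portable stand-in for a hardware reciprocal square root.
static inline double rsqrt(double x) {
    return 1.0 / sqrt(x);
}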
On some older CPUs (with slow or no hardware square root) you may go even faster using floats and the fast inverse square root from Quake:
static inline float calculate_scale(float n) {
    return 0.5f * Q_rsqrt(n - 0.5f);
}

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
For more info about the implementation, see https://en.wikipedia.org/wiki/Fast_inverse_square_root and http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf . It is not recommended on modern CPUs with a hardware reciprocal square root.
Not always a solution: 0.5 / sqrt(n-0.5)
Please note that on some processors (e.g. ARM Cortex-A9, Intel Core 2) division takes nearly the same time as a hardware square root, so there it is best to use the original function with two square roots, sqrt(n) - sqrt(n-1), OR the reciprocal square root with a multiply, 0.5 * rsqrt(n-0.5), if one exists.
3. Using a Precomputed Table with Fallback
This method is a good compromise between the first two solutions: it has both good accuracy and good performance.
static inline double calculate_scale(double n) {
    if (n <= sizeof_precomputed_table) {
        int nIndex = (int) n;
        return precomputed_table[nIndex - 1];
    }
    // Multiply + inverse square root
    return 0.5 * rsqrt(n - 0.5);
    // OR
    return sqrt(n) - sqrt(n-1);
}
In my case I need really accurate numbers, so my precomputed table size is 2048.
Any feedback is welcome.

You stated that n is mainly a number smaller than 10. You could use a precomputed table for numbers smaller than 10, or even more since it's cheap, and fall back to the real calculation for larger numbers.
The code would look something like:
static inline double calculate_scale(double n) { // n may be int or double
    if (n <= 10.0 && n == floor(n)) {
        return precomputed[(int) n];
    }
    return sqrt(n) - sqrt(n-1);
}
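As a sketch, the table could be filled once at startup (sizing it so indices 1..10 are valid is my assumption, not from the answer):

#include <math.h>

static double precomputed[11]; // index 0 unused; valid for n = 1..10

static void init_precomputed(void) {
    for (int n = 1; n <= 10; n++)
        precomputed[n] = sqrt((double)n) - sqrt((double)n - 1);
}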

Related

Need help fixing an algorithm that approximates pi

I'm trying to write C code for an algorithm that approximates pi. It's supposed to compute the volume of a cube and the volume of a sphere inside that cube (the sphere's radius is 1/2 of the cube's side). Then I am supposed to divide the sphere's volume by the cube's and multiply by 6 to get pi.
It's working, but it's doing something weird in the part that is supposed to compute the volumes. I figure it's something to do with the delta I chose for the approximations.
With a cube of side 4, instead of giving me a volume of 64 it gives me 6400. With the sphere, instead of ~33 it gives me 3334.something.
Can someone figure it out? Here is the code (I commented the relevant parts):
#include <stdio.h>

int in_esfera(double x, double y, double z, double r_esfera){
    double dist = (x-r_esfera)*(x-r_esfera) + (y-r_esfera)*(y-r_esfera) + (z-r_esfera)*(z-r_esfera);
    return dist <= (r_esfera)*(r_esfera) ? 1 : 0;
}

double get_pi(double l_cubo){
    double r_esfera = l_cubo/2;
    double total = 0;
    double esfera = 0;
    //this is delta, for the precision. If I set it to 1E anything less than -1 the program continues endlessly. Is this normal?
    double delta = (1E-1);
    for(double x = 0; x < l_cubo; x+=delta){
        printf("x => %f; delta => %.6f\n",x,delta);
        for(double y = 0; y < l_cubo; y+=delta){
            printf("y => %f; delta => %.6f\n",y,delta);
            for(double z = 0; z < l_cubo; z+=delta){
                printf("z => %f; delta => %.6f\n",z,delta);
                total+=delta;
                if(in_esfera(x,y,z,r_esfera))
                    esfera+=delta;
            }
        }
    }
    //attempt at fixing this
    //esfera/=delta;
    //total/=delta;
    //
    //This printf displays the volumes. Notice how the place of the point is off. If delta isn't a power of 10 the values are completely wrong.
    printf("v_sphere = %.8f; v_cube = %.8f\n",esfera,total);
    return (esfera)/(total)*6;
}

void teste_pi(){
    double l_cubo = 4;
    double pi = get_pi(l_cubo);
    printf("%.8f\n",pi);
}

int main(){
    teste_pi();
}
total+=delta;
if(in_esfera(x,y,z,r_esfera))
    esfera+=delta;

total and esfera are three-dimensional volumes whereas delta is a one-dimensional length. If you were tracking units you'd have m^3 on the left and m on the right. The units are incompatible.
To fix it, cube delta so that you're conceptually accumulating tiny cubes instead of tiny lines.

total+=delta*delta*delta;
if(in_esfera(x,y,z,r_esfera))
    esfera+=delta*delta*delta;
Doing that fixes the output, and also works for any value of delta:
v_sphere = 33.37400000; v_cube = 64.00000000
3.12881250
Note that this algorithm "works" for arbitrary delta values, but it has severe accuracy issues. It's incredibly prone to rounding problems. It works best when delta is a power of two: 1/64.0 is better than 1/100.0, for example:
v_sphere = 33.50365448; v_cube = 64.00000000
3.14096761
Also, if you want your program to run faster get rid of all those printouts! Or at least the ones in the inner loops...
The thing is that multiplication over integers like a * b * c is the same as adding 1 + 1 + 1 + 1 + ... + 1, a * b * c times, right?
You're adding delta + delta + ..., (x / delta) * (y / delta) * (z / delta) times. Or, in other words, (x * y * z) / (delta ** 3) times.
Now, that sum of deltas is the same as this:
delta * (1 + 1 + 1 + 1 + ...)
        ^^^^^^^^^^^^^^^^^^^^^ (x * y * z) / (delta**3) times
So, if delta is a power of 10, (x * y * z) / (delta**3) will be an integer, and it'll be equal to the sum of 1's in parentheses (because each of x / delta, y / delta and z / delta is then itself an integer; see the very first sentence of this answer). Thus, your result will be the following:
delta * ( (x * y * z) / (delta ** 3) ) == (x * y * z) / (delta**2)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the sum of ones
That's how you ended up calculating the product divided by delta squared.
To solve this, multiply all volumes by delta * delta.
However, I don't think it's possible to use this logic for deltas that aren't a power of 10. And indeed, the code goes all kinds of haywire for delta == 0.21 and l_cubo == 2, for example: you'll get 9.261000000000061 instead of 8.

Comparing the ratio of two values to 1

I'm working through a basic 'Programming in C' book.
I have written the following code based off of it in order to calculate the square root of a number:

#include <stdio.h>

float absoluteValue (float x)
{
    if (x < 0)
        x = -x;
    return (x);
}

float squareRoot (float x, float epsilon)
{
    float guess = 1.0;
    while (absoluteValue(guess * guess - x) >= epsilon)
    {
        guess = (x/guess + guess) / 2.0;
    }
    return guess;
}

int main (void)
{
    printf("SquareRoot(2.0) = %f\n", squareRoot(2.0, .00001));
    printf("SquareRoot(144.0) = %f\n", squareRoot(144.0, .00001));
    printf("SquareRoot(17.5) = %f\n", squareRoot(17.5, .00001));
    return 0;
}
An exercise in the book has said that the current criteria used for termination of the loop in squareRoot() is not suitable for use when computing the square root of a very large or a very small number.
Instead of comparing the difference between the value of x and the value of guess^2, the program should compare the ratio of the two values to 1. The closer this ratio gets to 1, the more accurate the approximation of the square root.
If the ratio is just guess^2/x, shouldn't my code inside of the while loop:
guess = (x/guess + guess) / 2.0;
be replaced by:
guess = ((guess * guess) / x ) / 1 ; ?
This compiles but nothing is printed out into the terminal. Surely I'm doing exactly what the exercise is asking?
To calculate the ratio just do (guess * guess / x); that could be either higher or lower than 1 depending on your implementation. Similarly, your margin of error (in percent) would be absoluteValue((guess * guess / x) - 1) * 100.
All they want you to check is how close the square root is. By squaring the number you get and dividing it by the number you took the square root of, you are just checking how close you were to the original number.
Example:
sqrt(4) = 2
2 * 2 / 4 = 1 (this is exact, so we get 1)
margin of error = (1 - 1) * 100 = 0% margin of error
Another example:
sqrt(4) = 1.999 (let's just say you got this)
1.999 * 1.999 = 3.996
3.996 / 4 = .999 (so we are close but not exact)
To check the margin of error:
.999 - 1 = -.001
absoluteValue(-.001) = .001
.001 * 100 = .1% margin of error
How about applying a little algebra? Your current criterion is:
|guess^2 - x| >= epsilon
You are elsewhere assuming that guess is nonzero, so it is algebraically safe to convert that to
|1 - x / guess^2| >= epsilon / guess^2
epsilon is just a parameter governing how close the match needs to be, and the above reformulation shows that it must be expressed in terms of the floating-point spacing near guess^2 to yield equivalent precision for all evaluations. But of course that's not possible because epsilon is a constant. This is, in fact, exactly why the original criterion gets less effective as x diverges from 1.
Let us instead write the alternative expression
|1 - x / guess^2| >= delta
Here, delta expresses the desired precision in terms of the spacing of floating point values in the vicinity of 1, which is related to a fixed quantity sometimes called the "machine epsilon". You can directly select the required precision via your choice of delta, and you will get the same precision for all x, provided that no arithmetic operations overflow.
Now just convert that back into code.
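A minimal sketch of that criterion, assuming the same Newton update as the original code:

#include <math.h>

float squareRoot(float x, float delta)
{
    float guess = 1.0f;
    // Terminate when guess^2 / x is within delta of 1.
    while (fabsf(1.0f - x / (guess * guess)) >= delta)
    {
        guess = (x / guess + guess) / 2.0f;
    }
    return guess;
}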
Suggest a different point of view.
With this method, guess_next = (x/guess + guess) / 2.0;, once the initial approximation is in the neighborhood, the number of bits of accuracy doubles with each iteration. Example: log2(FLT_EPSILON) is about -23, so 6 iterations are needed. (Think 23, 12, 6, 3, 2, 1.)
The trouble with using guess * guess is that it may vanish, becoming 0.0 or infinity, for a non-zero x.
To form a quality initial guess:

assert(x > 0.0f);
int expo;
float signif = frexpf(x, &expo);
float guess = ldexpf(signif, expo/2);

Now iterate N times (e.g. 6; N based on FLT_EPSILON, FLT_DECIMAL_DIG or FLT_DIG):

for (i = 0; i < N; i++) {
    guess = (x/guess + guess) / 2.0f;
}

The cost of perhaps an extra iteration is saved by avoiding an expensive termination-condition calculation.
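Putting the pieces together, a sketch (N = 6 chosen from log2(FLT_EPSILON) as described above):

#include <assert.h>
#include <math.h>

float squareRootFixedIter(float x)
{
    assert(x > 0.0f);
    int expo;
    float signif = frexpf(x, &expo);
    float guess = ldexpf(signif, expo / 2); // quality initial guess
    for (int i = 0; i < 6; i++)             // remaining error bits: 23, 12, 6, 3, 2, 1
        guess = (x / guess + guess) / 2.0f;
    return guess;
}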
If code wants to compare a/b nearest to 1.0f, simply use some epsilon factor like 1 or 2.

float a = guess;
float b = x/guess;
assert(b);
float q = a/b;
#define FACTOR (1.0f /* some value 1.0f to maybe 2, 3 or 4 */)
if (q >= 1.0f - FLT_EPSILON*FACTOR && q <= 1.0f + FLT_EPSILON*FACTOR) {
    close_enough();
}
First lesson in numerical analysis: for floating point numbers x+y has the potential for large relative errors, especially when the sum is near zero, but x*y has very limited relative errors.

How to compute a sine wave with accuracy over time

The use case is generating a sine wave for digital synthesis, so we need to compute all values of sin(d t) where:
t is an integer, representing the sample number. This is variable. The range is from 0 to 158,760,000 for one hour of sound at CD quality.
d is a double, representing the delta of the angle. This is constant, and its range is: greater than 0, less than pi.
The goal is to achieve high accuracy with traditional int and double data types. Performance is not important.
The naive implementation is:

double next()
{
    t++;
    return sin( ((double) t) * d );
}

But the problem is that as t increases, accuracy gets reduced, because big numbers are provided to the sin function.
An improved version is the following:

double next()
{
    d_sum += d;
    if (d_sum >= (M_PI*2)) d_sum -= (M_PI*2);
    return sin(d_sum);
}

Here, I make sure to provide numbers in the range from 0 to 2*pi to the sin function.
But now the problem is that when d is small, there are many small additions, which decreases the accuracy every time.
The question here is how to improve the accuracy.
Appendix 1
"accuracy gets reduced because big numbers provided to "sin" function":
#include <stdio.h>
#include <math.h>

#define TEST (300000006.7846112)
#define TEST_MOD (0.0463259891528704262050786960234519968548937998410258872449766)
#define SIN_TEST (0.0463094209176730795999323058165987662490610492247070175523420)

int main()
{
    double a = sin(TEST);
    double b = sin(TEST_MOD);
    printf("a=%0.20f \n" , a);
    printf("diff=%0.20f \n" , a - SIN_TEST);
    printf("b=%0.20f \n" , b);
    printf("diff=%0.20f \n" , b - SIN_TEST);
    return 0;
}
Output:
a=0.04630944601888796475
diff=0.00000002510121488442
b=0.04630942091767308033
diff=0.00000000000000000000
You can try an approach that is used in some implementations of the fast Fourier transform: values of the trigonometric function are calculated from previous values and the delta.

Sin(A + d) = Sin(A) * Cos(d) + Cos(A) * Sin(d)

Here we have to store and update the cosine value too, and store the constant (for a given delta) factors Cos(d) and Sin(d).
Now about precision: cos(d) for small d is very close to 1, so there is a risk of precision loss (there are only a few significant digits in numbers like 0.99999987). To overcome this issue, we can store the constant factors as

dc = Cos(d) - 1 = -2 * Sin(d/2)^2
ds = Sin(d)

using other formulas to update the current value
(here sa = Sin(A) for the current value, ca = Cos(A) for the current value):

ts = sa  // remember last values
tc = ca
sa = sa * dc + ca * ds
ca = ca * dc - ts * ds
sa = sa + ts
ca = ca + tc

P.S. Some FFT implementations periodically (every K steps) renew the sa and ca values through the trig functions to avoid error accumulation.
Example result. Calculations in doubles.
d = 0.000125
800000000 iterations
finish angle 100000 radians

                  cos                 sin
described method  -0.99936080743598   0.03574879796994
Cos,Sin(100000)   -0.99936080743821   0.03574879797202
windows Calc      -0.9993608074382124518911354141448   0.03574879797201650931647050069581
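A minimal C sketch of this recurrence (the names and the file-scope state are mine, for brevity):

#include <math.h>

static double dc, ds; // constant factors: Cos(d) - 1 and Sin(d)
static double sa, ca; // running Sin(A) and Cos(A)

void osc_init(double d) {
    double s_half = sin(d / 2);
    dc = -2.0 * s_half * s_half; // Cos(d) - 1, computed without cancellation
    ds = sin(d);
    sa = 0.0;                    // Sin(0)
    ca = 1.0;                    // Cos(0)
}

double osc_next(void) {          // advances A by d, returns Sin(A)
    double ts = sa, tc = ca;
    sa = ts * dc + tc * ds + ts; // Sin(A)*Cos(d) + Cos(A)*Sin(d)
    ca = tc * dc - ts * ds + tc; // Cos(A)*Cos(d) - Sin(A)*Sin(d)
    return sa;
}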
sin(x) = sin(x + 2N∙π), so the problem can be boiled down to accurately finding a small number which is equal to a large number x modulo 2π.
For example, –1.61059759 ≅ 256 mod 2π, and you can calculate sin(-1.61059759) with more precision than sin(256).
So let's choose some integer to work with, say 256. First find small numbers which are equal to powers of 256, modulo 2π:

// to be calculated once for a given frequency
// approximate hard-coded numbers for d = 1 below:
double modB = -1.61059759; // = 256 mod (2π / d)
double modC = 2.37724612;  // = 256² mod (2π / d)
double modD = -0.89396887; // = 256³ mod (2π / d)

and then split your index as a number in base 256:

// split into a base-256 representation
// (digits named b0..b3 so the last one doesn't shadow the phase delta d)
int b0 = i & 0xff;
int b1 = (i >> 8) & 0xff;
int b2 = (i >> 16) & 0xff;
int b3 = (i >> 24) & 0xff;

You can now find a much smaller number x which is equal to i modulo 2π/d:

// use our smaller constants instead of the powers of 256
double x = b0 + modB * b1 + modC * b2 + modD * b3;
double the_answer = sin(d * x);

For different values of d you'll have to calculate different values modB, modC and modD, which are equal to those powers of 256, but modulo (2π / d). You could use a high-precision library for these couple of calculations.
Scale up the period to 2^64, and do the multiplication using integer arithmetic:

// constants:
double uint64Max = pow(2.0, 64.0);
double sinFactor = 2 * M_PI / (uint64Max);

// scale the period of the waveform up to 2^64
uint64_t multiplier = (uint64_t) floor(0.5 + uint64Max * d / (2.0 * M_PI));

// multiplication with index (implicitly modulo 2^64)
uint64_t x = i * multiplier;

// scale 2^64 down to 2π
double value = sin((double)x * sinFactor);

As long as your period is not billions of samples, the precision of multiplier will be good enough.
The following code keeps the input to the sin() function within a small range, while somewhat reducing the number of small additions or subtractions due to a potentially very tiny phase increment.
double next() {
    t0 += 1.0;
    d_sum = t0 * d;
    if ( d_sum > 2.0 * M_PI ) {
        t0 -= (( 2.0 * M_PI ) / d );
    }
    return (sin(d_sum));
}
For hyper accuracy, OP has 2 problems:
1. Multiplying d by n and maintaining more precision than double. That is answered in the first part below.
2. Performing a mod of the period. The simple solution is to use degrees and then mod 360, which is easy to do exactly. Doing mod 2*π of large angles is tricky, as it needs a value of 2*π with about 27 more bits of accuracy than (double) 2.0 * M_PI.
Use 2 doubles to represent d.
Let us assume 32-bit int and binary64 double. So double has 53 bits of precision.
0 <= n <= 158,760,000, which is about 2^27.2. Since double can handle 53-bit unsigned integers continuously and exactly, and 53 - 28 --> 25, any double with only 25 significant bits can be multiplied by n and still be exact.
Segment d into 2 doubles, dmsb and dlsb: the 25 most-significant bits and the 28 least-significant bits.
int exp;
double dmsb = frexp(d, &exp);    // exact result
dmsb = floor(dmsb * POW2_25);    // exact result
dmsb /= POW2_25;                 // exact result
dmsb *= pow(2, exp);             // exact result
double dlsb = d - dmsb;          // exact result

Then each multiplication (or successive addition) of dmsb*n will be exact. (This is the important part.) dlsb*n will only err in its last few bits.

double next()
{
    d_sum_msb += dmsb; // exact
    d_sum_lsb += dlsb;
    double angle = fmod(d_sum_msb, M_PI*2); // exact
    angle += fmod(d_sum_lsb, M_PI*2);
    return sin(angle);
}
Note: fmod(x,y) results are expected to be exact given exact x,y.
#include <stdio.h>
#include <math.h>

#define AS_n 158760000
double AS_d = 300000006.7846112 / AS_n;
double AS_d_sum_msb = 0.0;
double AS_d_sum_lsb = 0.0;
double AS_dmsb = 0.0;
double AS_dlsb = 0.0;

double next() {
    AS_d_sum_msb += AS_dmsb; // exact
    AS_d_sum_lsb += AS_dlsb;
    double angle = fmod(AS_d_sum_msb, M_PI * 2); // exact
    angle += fmod(AS_d_sum_lsb, M_PI * 2);
    return sin(angle);
}

#define POW2_25 (1U << 25)

int main(void) {
    int exp;
    AS_dmsb = frexp(AS_d, &exp);        // exact result
    AS_dmsb = floor(AS_dmsb * POW2_25); // exact result
    AS_dmsb /= POW2_25;                 // exact result
    AS_dmsb *= pow(2, exp);             // exact result
    AS_dlsb = AS_d - AS_dmsb;           // exact result

    double y;
    for (long i = 0; i < AS_n; i++)
        y = next();
    printf("%.20f\n", y);
}
Output
0.04630942695385031893
Use degrees
Recommend using degrees, as 360 degrees is the exact period whereas M_PI*2 radians is an approximation. C cannot represent π exactly.
If OP still wants to use radians, for further insight on performing the mod of π, see Good to the Last Bit.

Calculate Maclaurin series for sin using C

I wrote code for calculating sin using its Maclaurin series, and it works, but when I try to calculate it for large x values and try to offset that by giving a large order N (the length of the sum), it eventually overflows and doesn't give me correct results. This is the code, and I would like to know whether there is an additional way to optimize it so it works for large x values too (it already works great for small x values and really big N values).
Here is the code:
long double calcMaclaurinPolynom(double x, int N){
    long double result = 0;
    long double atzeretCounter = 2;
    int sign = 1;
    long double fraction = x;
    for (int i = 0; i <= N; i++)
    {
        result += sign*fraction;
        sign = sign*(-1);
        fraction = fraction*((x*x) / ((atzeretCounter)*(atzeretCounter + 1)));
        atzeretCounter += 2;
    }
    return result;
}
The major issue is using the series outside the range where it converges well.
Since OP said they "converted x to radX = (x*PI)/180", the OP is starting with degrees rather than radians, and is in luck. The first step in finding my_sin(x) is range reduction. When starting with degrees, the reduction is exact. So reduce the range before converting to radians.
long double calcMaclaurinPolynom(double x /* degrees */, int N){
    // Reduce to range -360 to 360
    // This reduction is exact, no round-off error
    x = fmod(x, 360);

    // Reduce to range -180 to 180
    if (x >= 180) {
        x -= 180;
        x = -x;
    } else if (x <= -180) {
        x += 180;
        x = -x;
    }

    // Reduce to range -90 to 90
    if (x >= 90) {
        x = 180 - x;
    } else if (x <= -90) {
        x = -180 - x;
    }

    // now convert to radians
    x = x*PI/180;

    // continue with regular code
Alternative, if using C11, use remquo(). Search SO for sample code.
As #user3386109 commented above, no need to "convert back to degrees".
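For illustration, a sketch (mine, not from the answer) of an equivalent -90..90 degree reduction using remquo(): x = r + n*180 with r in [-90, +90], and sin(r + n*180 degrees) = (n even) ? sin(r) : -sin(r).

#include <math.h>

double reduce_degrees(double x_degrees, int *negate)
{
    int n;
    double r = remquo(x_degrees, 180.0, &n); // r in [-90, +90]
    *negate = n & 1;                         // odd multiples of 180 flip the sign
    return r;                                // still in degrees; convert to radians next
}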
[Edit]
With typical summation series, summing the least significant terms first improves the precision of the answer. With OP's code this can be done with

for (int i = N; i >= 0; i--)

Alternatively, rather than iterating a fixed number of times, loop until the term has no significance to the sum. The following uses recursion to sum the least significant terms first. With range reduction into the -90 to 90 range, the number of iterations is not excessive.

static double sin_d_helper(double term, double xx, unsigned i) {
    if (1.0 + term == 1.0)
        return term;
    return term - sin_d_helper(term * xx / ((i + 1) * (i + 2)), xx, i + 2);
}

#include <math.h>
double sin_d(double x_degrees) {
    // range reduction and d --> r conversion from above
    double x_radians = ...
    return x_radians * sin_d_helper(1.0, x_radians * x_radians, 1);
}
You can avoid the sign variable by incorporating it into the fraction update, as in (-x*x).
With your algorithm you do not have problems with integer overflow in the factorials.
As soon as x*x < (2*k)*(2*k+1), the error (assuming exact evaluation) is bounded by abs(fraction), i.e., the size of the next term in the series.
For large x the biggest source of error is truncation resp. floating-point errors that are magnified via cancellation of the terms of the alternating series. For k about x/2, the terms around the k-th term have the biggest size and have to be offset by other big terms.
Halving-and-Squaring
One easy method to deal with large x without using the value of pi is to employ the trigonometric identities

sin(2*x) = 2*sin(x)*cos(x)
cos(2*x) = 2*cos(x)^2 - 1 = cos(x)^2 - sin(x)^2

First reduce x by halving, simultaneously evaluating the Maclaurin series for sin(x/2^n) and cos(x/2^n), and then employ trigonometric squaring (literal squaring as complex numbers cos(x)+i*sin(x)) to recover the values for the original argument:

cos(x/2^(n-1)) = cos(x/2^n)^2 - sin(x/2^n)^2
sin(x/2^(n-1)) = 2*sin(x/2^n)*cos(x/2^n)

then

cos(x/2^(n-2)) = cos(x/2^(n-1))^2 - sin(x/2^(n-1))^2
sin(x/2^(n-2)) = 2*sin(x/2^(n-1))*cos(x/2^(n-1))

etc.
See https://stackoverflow.com/a/22791396/3088138 for the simultaneous computation of sin and cos values, then encapsulate it with

def CosSinForLargerX(x, n):
    k = 0
    while abs(x) > 1:
        k += 1; x /= 2
    c, s = getCosSin(x, n)
    r2 = 1  # start at 1 so k == 0 returns c, s unchanged
    for i in range(k):
        s2 = s*s; c2 = c*c; r2 = s2 + c2
        s = 2*c*s
        c = c2 - s2
    return c/r2, s/r2

Efficient implementation of natural logarithm (ln) and exponentiation

I'm looking for implementations of the log() and exp() functions provided in the C library <math.h>. I'm working with 8-bit microcontrollers (OKI 411 and 431). I need to calculate the Mean Kinetic Temperature. The requirement is that we should be able to calculate MKT as fast as possible and with as little code memory as possible. The compiler comes with log() and exp() functions in <math.h>. But calling either function and linking with the library causes the code size to increase by 5 kilobytes, which will not fit in one of the micros we work with (OKI 411), because our code has already consumed ~12K of the available ~15K code memory.
The implementation I'm looking for should not use any other C library functions (like pow(), sqrt(), etc.). This is because all library functions are packed in one library, and even if one function is called, the linker will bring the whole 5K library into code memory.
EDIT
The algorithm should be correct up to 3 decimal places.
Using a Taylor series is neither the simplest nor the fastest way of doing this. Most professional implementations use approximating polynomials. I'll show you how to generate one in Maple (a computer algebra program), using the Remez algorithm.
For 3 digits of accuracy execute the following commands in Maple:

with(numapprox):
Digits := 8
minimax(ln(x), x = 1 .. 2, 4, 1, 'maxerror')
maxerror

Its response is the following polynomial:

-1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x

With a maximal error of: 0.000061011436

We generated a polynomial which approximates ln(x), but only inside the [1..2] interval. Increasing the interval is not wise, because that would increase the maximal error even more. Instead of that, do the following decomposition: write x = 2^n * m with 1 <= m < 2, so that ln(x) = n*ln(2) + ln(m).
So first find the highest power of 2 which is still smaller than the number (see: What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?). That number is actually the base-2 logarithm. Divide by that value, then the result gets into the 1..2 interval. At the end we will have to add n*ln(2) to get the final result.
An example implementation for numbers >= 1:

float ln(float y) {
    int log2;
    float divisor, x, result;

    log2 = msb((int)y); // See: https://stackoverflow.com/a/4970859/6630230
    divisor = (float)(1 << log2);
    x = y / divisor;    // normalized value between [1.0, 2.0]

    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
    result += ((float)log2) * 0.69314718; // ln(2) = 0.69314718

    return result;
}

Although if you plan to use it only in the [1.0, 2.0] interval, then the function is just:

float ln(float x) {
    return -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
}
The Taylor series for e^x converges extremely quickly, and you can tune your implementation to the precision that you need. (http://en.wikipedia.org/wiki/Taylor_series)
The Taylor series for log is not as nice...
If you don't need floating-point math for anything else, you may compute an approximate fractional base-2 log pretty easily. Start by shifting your value left until it's 32768 or higher, and store the number of times you did that in count. Then, repeat some number of times (depending upon your desired scale factor):

n = (mult(n,n) + 32768u) >> 16; // If a function is available for 16x16->32 multiply
count <<= 1;
if (n < 32768) n *= 2; else count += 1;

If the above loop is repeated 8 times, then the log base 2 of the number will be count/256. If ten times, count/1024. If eleven, count/2048. Effectively, this function works by computing the integer power-of-two logarithm of n**(2^reps), but with intermediate values scaled to avoid overflow.
Would a basic table with interpolation between values work? If the range of values is limited (which is likely for your case; I doubt temperature readings have a huge range) and high precision is not required, it may work. It should be easy to test on a normal machine.
Here is one of many topics on table representation of functions: Calculating vs. lookup tables for sine value performance?
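A minimal sketch of such a table with linear interpolation (range, size, and names are my assumptions; the entries would be precomputed offline and stored as constants):

#define TABLE_N 65
static const double x_min = 1.0, x_max = 2.0;
static const double f_table[TABLE_N] = {0}; /* fill with precomputed f(x_min + i*step) values */

double f_lookup(double v)
{
    const double step = (x_max - x_min) / (TABLE_N - 1);
    double pos = (v - x_min) / step;
    int i = (int)pos;                                  // lower table index
    if (i < 0) return f_table[0];                      // clamp below range
    if (i >= TABLE_N - 1) return f_table[TABLE_N - 1]; // clamp above range
    double frac = pos - i;                             // fractional position in [0, 1)
    return f_table[i] + frac * (f_table[i + 1] - f_table[i]);
}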
Necromancing.
I had to implement logarithms on rational numbers.
This is how I did it:
According to Wikipedia, there is the Halley-Newton approximation method, which can be used for very high precision.
Using Newton's method, the iteration simplifies to the implementation below, which has cubic convergence to ln(x), which is way better than what the Taylor series offers.
// Using Newton's method, the iteration simplifies to (implementation)
// which has cubic convergence to ln(x).
public static double ln(double x, double epsilon)
{
    double yn = x - 1.0d; // using the first term of the Taylor series as initial value
    double yn1 = yn;

    do
    {
        yn = yn1;
        yn1 = yn + 2 * (x - System.Math.Exp(yn)) / (x + System.Math.Exp(yn));
    } while (System.Math.Abs(yn - yn1) > epsilon);

    return yn1;
}
This is not C but C#, but I'm sure anybody capable of programming in C will be able to deduce the C code from it.
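For reference, a direct C translation (my own, using exp() and fabs() from <math.h>):

#include <math.h>

double ln_halley(double x, double epsilon)
{
    double yn = x - 1.0; // first term of the Taylor series as initial value
    double yn1 = yn;
    do {
        yn = yn1;
        double ey = exp(yn);
        yn1 = yn + 2.0 * (x - ey) / (x + ey);
    } while (fabs(yn - yn1) > epsilon);
    return yn1;
}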
Furthermore, since
logn(x) = ln(x) / ln(n),
you have therefore just implemented logN as well:

public static double log(double x, double n, double epsilon)
{
    return ln(x, epsilon) / ln(n, epsilon);
}

where epsilon (error) is the minimum precision.
Now as to speed, you're probably better off using the ln cast in hardware, but as I said, I used this as a base to implement logarithms on a rational-numbers class working with arbitrary precision.
Arbitrary precision might be more important than speed, under certain circumstances.
Then, use the logarithmic identities for rational numbers:
logB(x/y) = logB(x) - logB(y)
In addition to Crouching Kitten's answer, which gave me inspiration, you can build a pseudo-recursive (at most 1 self-call) logarithm to avoid using polynomials. In pseudocode:

ln(x) :=
    if (x <= 0)
        return NaN
    else if (!(1 <= x < 2))
        return LN2 * b + ln(a)
    else
        return taylor_expansion(x - 1)

This is pretty efficient and precise, since on [1; 2) the Taylor series converges A LOT faster, and we get such a number 1 <= a < 2 with the first call to ln if our input is positive but not in this range.
You can find 'b' as your unbiased exponent from the data held in the float x, and 'a' from the mantissa of the float x (a is exactly the same float as x, but now with exponent biased_0 rather than exponent biased_b). LN2 should be kept as a macro in hexadecimal floating-point notation IMO. You can also use http://man7.org/linux/man-pages/man3/frexp.3.html for this.
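A sketch of extracting a and b with frexp() (names are mine; frexp returns the mantissa in [0.5, 1), so it is rescaled into [1, 2)):

#include <math.h>

// Splits x = a * 2^b with 1 <= a < 2, so ln(x) = ln(a) + b * LN2.
double split_for_ln(double x, int *b)
{
    double a = frexp(x, b); // x = a * 2^(*b), 0.5 <= a < 1
    a *= 2.0;
    *b -= 1;
    return a;
}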
Also, the trick

unsigned long tmp = *(ulong*)(&d);

for "memory-casting" a double to an unsigned long, rather than "value-casting", is very useful to know when dealing with floats memory-wise, as bitwise operators will cause warnings or errors depending on the compiler.
Possible computation of ln(x) and expo(x) in C without <math.h>:

static double expo(double n) {
    int a = 0, b = n > 0;
    double c = 1, d = 1, e = 1;
    for (b || (n = -n); e + .00001 < (e += (d *= n) / (c *= ++a)););
    // approximately 15 iterations
    return b ? e : 1 / e;
}

static double native_log_computation(const double n) {
    // Basic logarithm computation.
    static const double euler = 2.7182818284590452354;
    unsigned a = 0, d;
    double b, c, e, f;
    if (n > 0) {
        for (c = n < 1 ? 1 / n : n; (c /= euler) > 1; ++a);
        c = 1 / (c * euler - 1), c = c + c + 1, f = c * c, b = 0;
        for (d = 1, c /= 2; e = b, b += 1 / (d * c), b - e/* > 0.0000001 */;)
            d += 2, c *= f;
    } else b = (n == 0) / 0.;
    return n < 1 ? -(a + b) : a + b;
}

static inline double native_ln(const double n) {
    // Returns the natural logarithm (base e) of N.
    return native_log_computation(n);
}

static inline double native_log_base(const double n, const double base) {
    // Returns the logarithm (base b) of N.
    return native_log_computation(n) / native_log_computation(base);
}

Try it Online
Building off #Crouching Kitten's great natural log answer above, if you need it to be accurate for inputs < 1 you can add a simple scaling factor. Below is an example in C++ that I've used on microcontrollers. It has a scaling factor of 256, and it's accurate for inputs down to 1/256 = ~0.004, and up to 2^32/256 = 16777216 (due to overflow of a uint32 variable).
It's interesting to note that even on an STM32F103 Arm M3 with no FPU, the float implementation below is significantly faster (e.g. 3x or better) than the 16-bit fixed-point implementation in libfixmath (that being said, this float implementation still takes a few thousand cycles, so it's still not ~fast~).
#include <float.h>
#include <stdint.h>

float TempSensor::Ln(float y)
{
    // Algo from: https://stackoverflow.com/a/18454010
    // Accurate between (1 / scaling factor) < y < (2^32 / scaling factor).
    // Read the comments below for more info on how to extend this range.
    float divisor, x, result;
    const float LN_2 = 0.69314718; // pre-calculated constant used in calculations
    uint32_t log2 = 0;

    // handle if input is less than or equal to zero
    if (y <= 0)
    {
        return -FLT_MAX;
    }

    // Scaling factor. The polynomial below is accurate when the input y > 1;
    // therefore using a scaling factor of 256 (aka 2^8) extends this down to 1/256 or ~0.004.
    // Given the use of uint32_t, the input y must stay below 2^24 or 16777216 (aka 2^(32-8)),
    // otherwise uint_y used below will overflow. Increasing the scaling factor will reduce
    // the lower accuracy bound and also reduce the upper overflow bound. If you need the
    // range to be wider, consider changing uint_y to a uint64_t.
    const uint32_t SCALING_FACTOR = 256;
    const float LN_SCALING_FACTOR = 5.545177444; // the natural log of the scaling factor, precalculated
    y = y * SCALING_FACTOR;

    // Convert the number to an integer and then find the location of the MSB.
    // This is the integer portion of Log2(y). See: https://stackoverflow.com/a/4970859/6630230
    uint32_t uint_y = (uint32_t)y;
    while (uint_y >>= 1)
    {
        log2++;
    }

    // Find the remainder value between [1.0, 2.0], then calculate the natural log
    // of this remainder using a polynomial approximation.
    divisor = (float)(1 << log2);
    x = y / divisor;
    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x; // approximates ln(x) on [1,2]

    // Using the log product rule Log(A) + Log(B) = Log(AB) and the base-change rule
    // log_x(A) = log_y(A)/log_y(x), calculate all the components in base e and sum them:
    // Ln(x_remainder) + (log_2(y_integer) * ln(2)) - ln(SCALING_FACTOR)
    result = result + ((float)log2) * LN_2 - LN_SCALING_FACTOR;

    return result;
}
