Change the exponent of a double in C

Let's say I have double a = 0.3;. How would I be able to change the exponent of the variable, without using math functions like pow() or multiplying it manually?
I am guessing I would have to access the memory address of the variable using pointers, find the exponent and change it manually. But how would I accomplish this?
Note that this is on an 8-bit system, and I am trying to find a faster way to multiply the number by 10^12, 10^9, 10^6 or 10^3.
Best regards!

Note that a*10^3 = a*1000 = a*1024 - a*16 - a*8 = a*2^10 - a*2^4 - a*2^3.
So you can calculate a*10^3 as follows:
Read the 11 exponent bits into int exp
Read the 52 fraction bits into double frac
Calculate double x with exp+10 as the exponent and frac as the fraction
Calculate double y with exp+4 as the exponent and frac as the fraction
Calculate double z with exp+3 as the exponent and frac as the fraction
Calculate the output as x-y-z, and don't forget to add the sign bit if a < 0
You can use a similar method for the other options (a*10^6, a*10^9 and a*10^12)...
Here is how you can do the whole thing in a "clean" manner:
double MulBy1000(double a)
{
    double x = a;
    double y = a;
    double z = a;
    unsigned long long* px = (unsigned long long*)&x;
    unsigned long long* py = (unsigned long long*)&y;
    unsigned long long* pz = (unsigned long long*)&z;
    *px += 10ULL << 52; // exponent + 10: x = a * 1024
    *py += 4ULL << 52;  // exponent + 4:  y = a * 16
    *pz += 3ULL << 52;  // exponent + 3:  z = a * 8
    return x - y - z;   // a * (1024 - 16 - 8) == a * 1000
}
Please note that I'm not sure whether or not this code breaks strict-aliasing rules.
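If that is a concern, the same computation can be done with memcpy, which sidesteps aliasing entirely. A minimal sketch, assuming IEEE-754 binary64, a positive normal input, and no exponent overflow (the function name is mine):
#include <string.h>

double MulBy1000_memcpy(double a)
{
    unsigned long long bx, by, bz;
    memcpy(&bx, &a, sizeof a);
    by = bx;
    bz = bx;
    bx += 10ULL << 52; /* exponent + 10 -> a * 1024 */
    by += 4ULL << 52;  /* exponent + 4  -> a * 16   */
    bz += 3ULL << 52;  /* exponent + 3  -> a * 8    */
    double x, y, z;
    memcpy(&x, &bx, sizeof x);
    memcpy(&y, &by, sizeof y);
    memcpy(&z, &bz, sizeof z);
    return x - y - z; /* a*1024 - a*16 - a*8 == a*1000 */
}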

Multiplying a number by 10 is the equivalent of
a) Multiplying the original number by 2
b) Multiplying the original number by 8
c) Adding the results of (a) and (b).
This works because ten is binary 1010.
One approach would therefore be to increment the exponent (for (a)), add 3 to the exponent (for (b)), then add the results.
To multiply by 10^n, repeat the above n times. Alternatively, work out the binary representation of 1,000, 1,000,000, etc., and add the relevant 1s. You may make things easier by noting that 1000, for instance, is 1024 - 16 - 8, i.e.
a) Add 10 to the exponent of the original to multiply by 1024
b) Add 4 to the exponent of the original to multiply by 16
c) Add 3 to the exponent of the original to multiply by 8
d) From (a) subtract (b) and (c) to get the answer.
Again, you can do that multiple times for 10^6, 10^9 etc.
For a quick approximation with powers 10^n where n is a multiple of 3, just add 10n/3 to the exponent (as 1024 ~= 1000).
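A sketch of that approximation (the function name and memcpy bit-copy are mine; assumes IEEE-754 binary64, a positive normal input, and no exponent overflow):
#include <string.h>

double ApproxScalePow10(double a, unsigned n) /* n in {3, 6, 9, 12} */
{
    unsigned long long bits;
    memcpy(&bits, &a, sizeof a);
    bits += (unsigned long long)(10 * n / 3) << 52; /* exponent += 10n/3 */
    memcpy(&a, &bits, sizeof a);
    return a; /* about 2.4% high per factor of 1000, since 1024 > 1000 */
}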

For fun, a simple recursive solution:
double ScalePower10(double x, unsigned power) {
    if (power <= 1) {
        if (power == 0) return x;
        return x * 10.0;
    }
    // Build 10^power by square-and-multiply, then scale x once;
    // squaring the recursive result directly would square x as well.
    double p = ScalePower10(1.0, power / 2);
    p = p * p;
    if (power % 2) p *= 10.0;
    return x * p;
}


How to compute sine wave with accuracy over the time

Use case is to generate a sine wave for digital synthesis, so we need to compute all values of sin(d t) where:
t is an integer number, representing the sample number. This is variable. Range is from 0 to 158,760,000 for one hour of CD-quality sound.
d is a double, representing the delta of the angle. This is constant. The range is: greater than 0, less than pi.
Goal is to achieve high accuracy with traditional int and double data types. Performance is not important.
Naive implementation is:
double next()
{
    t++;
    return sin(((double) t) * d);
}
But the problem is that as t increases, accuracy is reduced because large numbers are provided to the sin function.
An improved version is the following:
double next()
{
    d_sum += d;
    if (d_sum >= (M_PI * 2)) d_sum -= (M_PI * 2);
    return sin(d_sum);
}
Here, I make sure to provide numbers in the range from 0 to 2*pi to the sin function.
But now the problem is that when d is small, there are many small additions, which decrease the accuracy every time.
The question here is how to improve the accuracy.
Appendix 1
"accuracy gets reduced because big numbers provided to "sin" function":
#include <stdio.h>
#include <math.h>

#define TEST     (300000006.7846112)
#define TEST_MOD (0.0463259891528704262050786960234519968548937998410258872449766)
#define SIN_TEST (0.0463094209176730795999323058165987662490610492247070175523420)

int main()
{
    double a = sin(TEST);
    double b = sin(TEST_MOD);
    printf("a=%0.20f \n", a);
    printf("diff=%0.20f \n", a - SIN_TEST);
    printf("b=%0.20f \n", b);
    printf("diff=%0.20f \n", b - SIN_TEST);
    return 0;
}
Output:
a=0.04630944601888796475
diff=0.00000002510121488442
b=0.04630942091767308033
diff=0.00000000000000000000
You can try an approach that is used in some implementations of the fast Fourier transform: values of trigonometric functions are calculated based on previous values and the delta.
Sin(A + d) = Sin(A) * Cos(d) + Cos(A) * Sin(d)
Here we have to store and update cosine value too and store constant (for given delta) factors Cos(d) and Sin(d).
Now about precision: cosine(d) for small d is very close to 1, so there is a risk of precision loss (there are only a few significant digits in numbers like 0.99999987). To overcome this issue, we can store the constant factors as
dc = Cos(d) - 1 = - 2 * Sin(d/2)^2
ds = Sin(d)
using other formulas to update the current value
(here sa = Sin(A) for the current value, ca = Cos(A) for the current value):
ts = sa //remember last values
tc = ca
sa = sa * dc + ca * ds
ca = ca * dc - ts * ds
sa = sa + ts
ca = ca + tc
P.S. Some FFT implementations periodically (every K steps) renew the sa and ca values through trig functions to avoid error accumulation.
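A minimal C sketch of this scheme (names are mine; the periodic renewal from the P.S. is omitted for brevity):
#include <math.h>

static double sa = 0.0, ca = 1.0; /* sin(A), cos(A); start at A = 0 */
static double dc, ds;             /* constant factors for a given delta d */

void osc_init(double d)
{
    double s = sin(d / 2);
    dc = -2.0 * s * s; /* Cos(d) - 1, computed without cancellation */
    ds = sin(d);
}

double osc_next(void)
{
    double ts = sa, tc = ca; /* remember last values */
    sa = sa * dc + ca * ds;
    ca = ca * dc - ts * ds;
    sa += ts; /* sa is now Sin(A + d) */
    ca += tc; /* ca is now Cos(A + d) */
    return sa;
}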
Example result (calculations in double):
d = 0.000125, 800,000,000 iterations, finish angle 100,000 radians
                     cos                 sin
described method    -0.99936080743598    0.03574879796994
cos,sin(100000)     -0.99936080743821    0.03574879797202
Windows Calc        -0.9993608074382124518911354141448
                     0.03574879797201650931647050069581
sin(x) = sin(x + 2N∙π), so the problem can be boiled down to accurately finding a small number which is equal to a large number x modulo 2π.
For example, –1.61059759 ≅ 256 mod 2π, and you can calculate sin(-1.61059759) with more precision than sin(256)
So let's choose some integer number to work with, 256. First find small numbers which are equal to powers of 256, modulo 2π:
// to be calculated once for a given frequency
// approximate hard-coded numbers for d = 1 below:
double modB = -1.61059759; // = 256 mod (2π / d)
double modC = 2.37724612; // = 256² mod (2π / d)
double modD = -0.89396887; // = 256³ mod (2π / d)
and then split your index as a number in base 256:
// split into a base-256 representation (bytes renamed to avoid clashing with d)
int i0 = i & 0xff;
int i1 = (i >> 8) & 0xff;
int i2 = (i >> 16) & 0xff;
int i3 = (i >> 24) & 0xff;
You can now find a much smaller number x which is equal to i modulo 2π/d:
// use our smaller constants instead of the powers of 256
double x = i0 + modB * i1 + modC * i2 + modD * i3;
double the_answer = sin(d * x);
For different values of d you'll have to calculate different values modB, modC and modD, which are equal to those powers of 256, but modulo (2π / d). You could use a high precision library for these couple of calculations.
Scale up the period to 2^64, and do the multiplication using integer arithmetic:
// constants:
double uint64Max = pow(2.0, 64.0);
double sinFactor = 2 * M_PI / (uint64Max);
// scale the period of the waveform up to 2^64
uint64_t multiplier = (uint64_t) floor(0.5 + uint64Max * d / (2.0 * M_PI));
// multiplication with index (implicitly modulo 2^64)
uint64_t x = i * multiplier;
// scale 2^64 down to 2π
double value = sin((double)x * sinFactor);
As long as your period is not billions of samples, the precision of multiplier will be good enough.
The following code keeps the input to the sin() function within a small range, while somewhat reducing the number of small additions or subtractions due to a potentially very tiny phase increment.
double next() {
    t0 += 1.0;
    d_sum = t0 * d;
    if (d_sum > 2.0 * M_PI) {
        t0 -= ((2.0 * M_PI) / d);
    }
    return sin(d_sum);
}
For hyper accuracy, OP has 2 problems:
multiplying d by n and maintaining more precision than double. That is answered in the first part below.
Performing a mod of the period. The simple solution is to use degrees and then mod 360, which is easy to do exactly. Doing mod 2*π of large angles is tricky, as it needs a value of 2*π with about 27 more bits of accuracy than (double) 2.0 * M_PI.
Use 2 doubles to represent d.
Let us assume 32-bit int and binary64 double. So double has 53-bits of accuracy.
0 <= n <= 158,760,000, which is about 2^27.2. Since double can handle 53-bit unsigned integers continuously and exactly, and 53 - 28 --> 25, any double with only 25 significant bits can be multiplied by n and still be exact.
Segment d into 2 doubles dmsb, dlsb: the 25 most significant bits and the 28 least significant.
int exp;
double dmsb = frexp(d, &exp); // exact result
dmsb = floor(dmsb * POW2_25); // exact result
dmsb /= POW2_25; // exact result
dmsb *= pow(2, exp); // exact result
double dlsb = d - dmsb; // exact result
Then each multiplication (or successive addition) of dmsb*n will be exact. (this is the important part.) dlsb*n will only error in its least few bits.
double next()
{
    d_sum_msb += dmsb; // exact
    d_sum_lsb += dlsb;
    double angle = fmod(d_sum_msb, M_PI*2); // exact
    angle += fmod(d_sum_lsb, M_PI*2);
    return sin(angle);
}
Note: fmod(x,y) results are expected to be exact given exact x,y.
#include <stdio.h>
#include <math.h>

#define AS_n 158760000
double AS_d = 300000006.7846112 / AS_n;
double AS_d_sum_msb = 0.0;
double AS_d_sum_lsb = 0.0;
double AS_dmsb = 0.0;
double AS_dlsb = 0.0;

double next() {
    AS_d_sum_msb += AS_dmsb; // exact
    AS_d_sum_lsb += AS_dlsb;
    double angle = fmod(AS_d_sum_msb, M_PI * 2); // exact
    angle += fmod(AS_d_sum_lsb, M_PI * 2);
    return sin(angle);
}

#define POW2_25 (1U << 25)

int main(void) {
    int exp;
    AS_dmsb = frexp(AS_d, &exp);        // exact result
    AS_dmsb = floor(AS_dmsb * POW2_25); // exact result
    AS_dmsb /= POW2_25;                 // exact result
    AS_dmsb *= pow(2, exp);             // exact result
    AS_dlsb = AS_d - AS_dmsb;           // exact result

    double y;
    for (long i = 0; i < AS_n; i++)
        y = next();
    printf("%.20f\n", y);
}
Output
0.04630942695385031893
Use degrees
Recommend using degrees, as 360 degrees is the exact period, whereas M_PI*2 radians is an approximation: C cannot represent π exactly.
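A minimal sketch of the degrees approach (names are mine): the phase accumulates in degrees, where the period reduction is exact, and radians appear only in the final conversion.
#include <math.h>

static double deg = 0.0; /* current phase in degrees */

double next_degrees(double d_deg) /* d_deg: step per sample, in degrees */
{
    deg = fmod(deg + d_deg, 360.0);   /* 360 is exactly representable */
    return sin(deg * (M_PI / 180.0)); /* convert only a small angle */
}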
If OP still wants to use radians, for further insight on performing the mod of π, see Good to the Last Bit

C program to get average of 2 numbers in a 16 bit processor

How do we write a program in C which can calculate the average of two 16-bit signed numbers on a 16-bit processor?
int getAverage(int x, int y)
{
    int result = 0;
    result = ((x + y) / 2);
    return result;
}
The above works for most cases, except when x + y overflows a 16-bit int (for example, when both x and y are at their maximum values).
In the case where both x and y are positive or both negative, I would divide the difference between the numbers by 2 and add that result to the number that is subtracted. Mathematically, this is equivalent to what you currently have:
(y - x)/2 + x = y/2 - x/2 + x = y/2 + x/2 = (x + y)/2
If x is positive and y is negative or vice versa, the original method of calculation that you have should be used.
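A minimal sketch of that same-sign case (the helper name is hypothetical); y - x cannot overflow when x and y share a sign:
int getAverageSameSign(int x, int y)
{
    return x + (y - x) / 2;
}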
Simplest possible solution with some crude integer rounding:
int32_t getAverage(int16_t x, int16_t y)
{
    int32_t sum = (int32_t)x + (int32_t)y;
    return sum/2 + sum%2;
}
This will work just fine, since your 16-bit compiler will have software routines to handle 32-bit integers.

How can you easily calculate the square root of an unsigned long long in C?

I was looking at another question (here) where someone was looking for a way to get the square root of a 64-bit integer in x86 assembly.
This turns out to be very simple. The solution is to convert to a floating point number, calculate the sqrt and then convert back.
I need to do something very similar in C; however, when I look into equivalents I'm getting a little stuck. I can only find a sqrt function which takes in doubles. Doubles do not have the precision to store large 64-bit integers without introducing significant rounding error.
Is there a common math library that I can use which has a long double sqrt function?
There is no need for long double; the square root can be calculated with double (if it is IEEE-754 64-bit binary). The rounding error in converting a 64-bit integer to double is nearly irrelevant in this problem.
The rounding error is at most one part in 2^53. This causes an error in the square root of at most one part in 2^54. The sqrt itself has a rounding error of less than one part in 2^53, due to rounding the mathematical result to the double format. The sum of these errors is tiny; the largest possible square root of a 64-bit integer (rounded to 53 bits) is 2^32, so an error of three parts in 2^54 is less than .00000072.
For a uint64_t x, consider sqrt(x). We know this value is within .00000072 of the exact square root of x, but we do not know its direction. If we adjust it to sqrt(x) - 0x1p-20, then we know we have a value that is less than, but very close to, the square root of x.
Then this code calculates the square root of x, truncated to an integer, provided the operations conform to IEEE 754:
uint64_t y = sqrt(x) - 0x1p-20;
if (2*y < x - y*y)
    ++y;
(2*y < x - y*y is equivalent to (y+1)*(y+1) <= x except that it avoids wrapping the 64-bit integer if y+1 is 2^32.)
Function sqrtl(), taking a long double, is part of C99.
Note that your compilation platform does not have to implement long double as 80-bit extended precision. It is only required to be as wide as double, and Visual Studio implements it as a plain double. GCC and Clang do compile long double to 80-bit extended precision on Intel processors.
Yes, the standard library has sqrtl() (since C99).
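A usage sketch (the function name is mine): with an 80-bit long double (64-bit significand, as with GCC/Clang on x86), every uint64_t converts exactly; a cheap guard still protects against the correctly-rounded sqrtl landing one past an integer boundary near the top of the range.
#include <math.h>
#include <stdint.h>

uint64_t isqrt_sqrtl(uint64_t x)
{
    uint64_t r = (uint64_t)sqrtl((long double)x);
    if (r > 0 && x / r < r) r--;   /* r was one too high */
    if (x / (r + 1) >= r + 1) r++; /* r was one too low */
    return r;
}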
If you only want to calculate sqrt for integers, using divide and conquer should find the result in max 32 iterations:
uint64_t mysqrt (uint64_t a)
{
    uint64_t min = 0;
    //uint64_t max = 1 << 32; // bug: shifts a 32-bit int
    uint64_t max = ((uint64_t) 1) << 32; // chux' bugfix
    while (1)
    {
        if (max <= 1 + min)
            return min;
        uint64_t sqt = min + (max - min) / 2;
        uint64_t sq = sqt * sqt;
        if (sq == a)
            return sqt;
        if (sq > a)
            max = sqt;
        else
            min = sqt;
    }
}
Debugging is left as an exercise for the reader.
Here we collect several observations in order to arrive at a solution:
In standard C >= 1999, it is guaranteed that non-negative integers have a representation in bits as one would expect for any base-2 number.
----> Hence, we can trust in bit manipulation of this type of numbers.
If x is an unsigned integer type, then x >> 1 == x / 2 and x << 1 == x * 2.
(!) But: it is very probable that bit operations are faster than their arithmetical counterparts.
sqrt(x) is mathematically equivalent to exp(log(x)/2.0).
If we consider truncated logarithms and base-2 exponentials for integers, we can obtain a fair estimate: IntExp2( IntLog2(x) / 2) "==" IntSqrtDn(x), where "==" is informal notation meaning approximately equal to (in the sense of a good approximation).
If we write IntExp2( IntLog2(x) / 2 + 1) "==" IntSqrtUp(x), we obtain an "above" approximation for the integer square root.
The approximations obtained in (4.) and (5.) are a little rough (they enclose the true value of sqrt(x) between two consecutive powers of 2), but they can be a very good starting point for any algorithm that searches for the square root of x.
The Newton algorithm for square roots can work well for integers, if we have a good first approximation to the real solution.
http://en.wikipedia.org/wiki/Integer_square_root
The final algorithm needs some mathematical verification to be fully sure that it always works properly, but I will not do that right now... I will show you the final program, instead:
#include <stdio.h>  /* For printf()... */
#include <stdint.h> /* For uintmax_t... */
#include <math.h>   /* For sqrt() .... */

int IntLog2(uintmax_t n) {
    if (n == 0) return -1; /* Error */
    int L;
    for (L = 0; n >>= 1; L++)
        ;
    return L; /* It takes < 64 steps for long long */
}

uintmax_t IntExp2(int n) {
    if (n < 0)
        return 0; /* Error */
    uintmax_t E;
    for (E = 1; n-- > 0; E <<= 1)
        ;
    return E; /* It takes < 64 steps for long long */
}

uintmax_t IntSqrtDn(uintmax_t n) { return IntExp2(IntLog2(n) / 2); }
uintmax_t IntSqrtUp(uintmax_t n) { return IntExp2(IntLog2(n) / 2 + 1); }

int main(void) {
    uintmax_t N = 947612934; /* Try here your number! */
    uintmax_t sqrtn = IntSqrtDn(N),  /* 1st approx. to sqrt(N) from below */
              sqrtn0 = IntSqrtUp(N); /* 1st approx. to sqrt(N) from above */
    /* The following means while( abs(sqrtn - sqrtn0) > 1 ) { stuff... } */
    /* However, we take care of subtractions on unsigned arithmetic, just in case... */
    while ((sqrtn > sqrtn0 + 1) || (sqrtn0 > sqrtn + 1))
        sqrtn0 = sqrtn, sqrtn = (sqrtn0 + N / sqrtn0) / 2; /* Newton iteration */
    printf("N==%ju, sqrt(N)==%g, IntSqrtDn(N)==%ju, IntSqrtUp(N)==%ju, sqrtn==%ju, sqrtn*sqrtn==%ju\n\n",
           N, sqrt(N), IntSqrtDn(N), IntSqrtUp(N), sqrtn, sqrtn * sqrtn);
    return 0;
}
The last value stored in sqrtn is the integer square root of N.
The last line of the program just shows all the values, for verification purposes.
So, you can try different values of N and see what happens.
If we add a counter inside the while-loop, we'll see that no more than a few iterations happen.
Remark: It is necessary to verify that the condition abs(sqrtn - sqrtn0) <= 1 is always achieved when working in the integer-number setting. If not, we shall have to fix the algorithm.
Remark 2: In the initialization sentences, observe that sqrtn0 == sqrtn * 2 == sqrtn << 1. This saves us some calculations.
// sqrt_i64 returns the integer square root of v.
int64_t sqrt_i64(int64_t v) {
    uint64_t q = 0, b = 1, r = v;
    for (b <<= 62; b > 0 && b > r; b >>= 2)
        ;
    while (b > 0) {
        uint64_t t = q + b;
        q >>= 1;
        if (r >= t) {
            r -= t;
            q += b;
        }
        b >>= 2;
    }
    return q;
}
The for loop may be optimized by using the clz machine code instruction.
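A sketch of that optimization (assumes GCC/Clang's __builtin_clzll): it computes the same starting value of b as the for loop, i.e. the largest power of four not exceeding r, without iterating.
#include <stdint.h>

static uint64_t initial_b(uint64_t r)
{
    if (r == 0) return 0;              /* the loop would also end at b == 0 */
    int msb = 63 - __builtin_clzll(r); /* index of the highest set bit */
    return 1ULL << (msb & ~1);         /* round the index down to even */
}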

bitwise division by multiples of 2

I found many posts about bitwise division and I completely understand most bitwise usage, but I can't think of a specific division. I want to divide a given number (let's say 100) by all the possible multiples of 2 (ATTENTION: I don't want to divide only by powers of 2, but by all multiples of 2!)
For example: 100/2, 100/4, 100/6, 100/8, 100/10 ... 100/100
Also, I know that because of using unsigned int the answers will be truncated (for example, 100/52 gives 1, not 1.92), but it doesn't really matter, because I can both skip those answers or print them, no problem. My concern is mostly how I can divide by 6 or 10, etc. (multiples of 2). It needs to be done in C, but I can manage to transform any code you give me from Java to C.
Following the math shown for the accepted solution to the division by 3 question, you can derive a recurrence for the division algorithm:
To compute (int)(X / Y):
Let k be such that 2^k >= Y and 2^(k-1) < Y
(note, 2^k = (1 << k))
Let d = 2^k - Y
Then, if A = (int)(X / 2^k) and B = X % 2^k,
X = (1 << k) * A + B
  = (1 << k) * A - Y * A + Y * A + B
  = d * A + Y * A + B
  = Y * A + (d * A + B)
Thus,
X/Y = A + (d * A + B)/Y
In other words,
If S(X, Y) := X/Y, then S(X, Y) = A + S(d * A + B, Y).
This recurrence can be implemented with a simple loop. The stopping condition for the loop is when the numerator falls below 2^k. The function divu implements the recurrence, using only bitwise operators and unsigned types. Helper functions for the math operations are left unimplemented, but shouldn't be too hard (the linked answer provides a full add implementation already). The rs() function is for "right-shift", which does sign extension on the unsigned input. The function div is the actual API for int, and checks for divide by zero and negative y before delegating to divu. negate does 2's complement negation.
static unsigned divu (unsigned x, unsigned y) {
    unsigned k = 0;
    unsigned pow2 = 0;
    unsigned mask = 0;
    unsigned diff = 0;
    unsigned sum = 0;
    while ((1 << k) < y) k = add(k, 1);
    pow2 = (1 << k);
    mask = sub(pow2, 1);
    diff = sub(pow2, y);
    while (x >= pow2) {
        sum = add(sum, rs(x, k));
        x = add(mul(diff, rs(x, k)), (x & mask));
    }
    if (x >= y) sum = add(sum, 1);
    return sum;
}

int div (int x, int y) {
    assert(y);
    if (y > 0) return divu(x, y);
    return negate(divu(x, negate(y)));
}
This implementation depends on signed int using 2's complement. For maximal portability, div should convert negative arguments to 2's complement before calling divu. Then, it should convert the result from divu back from 2's complement to the native signed representation.
The following code works for positive numbers. When the dividend or the divisor or both are negative, have flags to change the sign of the answer appropriately.
int divi(long long m, long long n)
{
    if (m == 0 || n == 0 || m < n)
        return 0;
    long long a, b;
    int f = 0;
    a = n; b = 1;
    while (a <= m)
    {
        b = b << 1;
        a = a << 1;
        f = 1;
    }
    if (f)
    {
        b = b >> 1;
        a = a >> 1;
    }
    b = b + divi(m - a, n);
    return b;
}
Use the operator / for integer division as much as you can.
For instance, when you want to divide 100 by 6 or 10 you should write 100/6 or 100/10.
When you mention bitwise division, do you (1) mean an implementation of operator /, or (2) refer to division by a power-of-two number?
For (1) a processor should have an integer division unit. If not the compiler should provide a good implementation.
For (2) you can use 100>>2 instead of 100/4. If the divisor is known at compile time then a good compiler should automatically use the shift instruction.

Efficient implementation of natural logarithm (ln) and exponentiation

I'm looking for implementations of the log() and exp() functions provided in the C library <math.h>. I'm working with 8-bit microcontrollers (OKI 411 and 431). I need to calculate the Mean Kinetic Temperature. The requirement is that we should be able to calculate MKT as fast as possible and with as little code memory as possible. The compiler comes with log() and exp() functions in <math.h>. But calling either function and linking with the library causes the code size to increase by 5 kilobytes, which will not fit in one of the micros we work with (OKI 411), because our code has already consumed ~12K of the available ~15K code memory.
The implementation I'm looking for should not use any other C library functions (like pow(), sqrt() etc.). This is because all library functions are packed in one library, and even if one function is called, the linker will bring the whole 5K library into code memory.
EDIT
The algorithm should be correct up to 3 decimal places.
Using a Taylor series is neither the simplest nor the fastest way of doing this. Most professional implementations use approximating polynomials. I'll show you how to generate one in Maple (a computer algebra program), using the Remez algorithm.
For 3 digits of accuracy execute the following commands in Maple:
with(numapprox):
Digits := 8
minimax(ln(x), x = 1 .. 2, 4, 1, 'maxerror')
maxerror
Its response is the following polynomial:
-1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x
with a maximal error of 0.000061011436.
We generated a polynomial which approximates ln(x), but only inside the [1..2] interval. Increasing the interval is not wise, because that would increase the maximal error even more. Instead of that, do the following decomposition:
So first find the highest power of 2 which is still smaller than the number (see: What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?). The exponent of that power is the integer part of the base-2 logarithm. Divide by that power; the result falls into the 1..2 interval. At the end we will have to add n*ln(2) to get the final result.
An example implementation for numbers >= 1:
float ln(float y) {
    int log2;
    float divisor, x, result;
    log2 = msb((int)y); // See: https://stackoverflow.com/a/4970859/6630230
    divisor = (float)(1 << log2);
    x = y / divisor; // normalized value between [1.0, 2.0]
    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
    result += ((float)log2) * 0.69314718; // ln(2) = 0.69314718
    return result;
}
Although if you plan to use it only in the [1.0, 2.0] interval, then the function is like:
float ln(float x) {
    return -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
}
The Taylor series for e^x converges extremely quickly, and you can tune your implementation to the precision that you need. (http://en.wikipedia.org/wiki/Taylor_series)
The Taylor series for log is not as nice...
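For illustration, a minimal sketch of summing the e^x series directly (the name and cutoffs are mine; adequate for modest |x| and ~3 decimal places):
double exp_taylor(double x)
{
    double term = 1.0, sum = 1.0;
    for (int n = 1; n < 30; n++) {
        term *= x / n; /* builds x^n / n! incrementally */
        sum += term;
        if (term < 1e-7 && term > -1e-7)
            break; /* further terms no longer matter */
    }
    return sum;
}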
If you don't need floating-point math for anything else, you may compute an approximate fractional base-2 log pretty easily. Start by shifting your value left until it's 32768 or higher and store the number of times you did that in count. Then, repeat some number of times (depending upon your desired scale factor):
n = (mult(n, n) + 32768u) >> 16; // if a function is available for 16x16->32 multiply
count <<= 1;
if (n < 32768) n *= 2; else count += 1;
If the above loop is repeated 8 times, then the log base 2 of the number will be count/256. If ten times, count/1024. If eleven, count/2048. Effectively, this function works by computing the integer power-of-two logarithm of n**(2^reps), but with intermediate values scaled to avoid overflow.
Would a basic table with interpolation between values work? If the range of values is limited (which is likely for your case; I doubt temperature readings have a huge range) and high precision is not required, it may work. It should be easy to test on a normal machine.
Here is one of many topics on table representation of functions: Calculating vs. lookup tables for sine value performance?
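As a sketch of the idea (table and names are mine): ln(x) on [1, 2] from 17 precomputed entries with linear interpolation, which a range-reduction step like the one in the Remez answer above can extend to arbitrary positive inputs.
static const float ln_table[17] = {
    0.000000f, 0.060625f, 0.117783f, 0.171850f, 0.223144f,
    0.271934f, 0.318454f, 0.362905f, 0.405465f, 0.446287f,
    0.485508f, 0.523248f, 0.559616f, 0.594707f, 0.628609f,
    0.661398f, 0.693147f
}; /* ln(1 + k/16) for k = 0..16 */

float ln_lut(float x) /* valid for 1.0 <= x <= 2.0 */
{
    float pos = (x - 1.0f) * 16.0f;
    int i = (int)pos;
    if (i >= 16) return ln_table[16];
    float frac = pos - (float)i;
    return ln_table[i] + frac * (ln_table[i + 1] - ln_table[i]);
}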
Necromancing.
I had to implement logarithms on rational numbers.
This is how I did it:
According to Wikipedia, there is the Halley-Newton approximation method,
which can be used for very high precision.
Using Newton's method, the iteration simplifies to (implementation), which has cubic convergence to ln(x), which is way better than what the Taylor series offers.
// Using Newton's method, the iteration simplifies to (implementation)
// which has cubic convergence to ln(x).
public static double ln(double x, double epsilon)
{
    double yn = x - 1.0d; // using the first term of the Taylor series as initial value
    double yn1 = yn;
    do
    {
        yn = yn1;
        yn1 = yn + 2 * (x - System.Math.Exp(yn)) / (x + System.Math.Exp(yn));
    } while (System.Math.Abs(yn - yn1) > epsilon);
    return yn1;
}
This is not C but C#; however, I'm sure anybody capable of programming in C will be able to deduce the C code from it.
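For convenience, a direct C rendering of the above (my translation; exp and fabs come from <math.h>):
#include <math.h>

double ln_newton(double x, double epsilon)
{
    double yn = x - 1.0; /* first term of the Taylor series as initial value */
    double yn1 = yn;
    do {
        yn = yn1;
        double e = exp(yn);
        yn1 = yn + 2.0 * (x - e) / (x + e);
    } while (fabs(yn - yn1) > epsilon);
    return yn1;
}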
Furthermore, since log_n(x) = ln(x)/ln(n), you have therefore just implemented log_n as well:
public static double log(double x, double n, double epsilon)
{
    return ln(x, epsilon) / ln(n, epsilon);
}
where epsilon (error) is the minimum precision.
Now as to speed: you're probably better off using the ln cast in hardware, but as I said, I used this as a base to implement logarithms on a rational-numbers class working with arbitrary precision.
Arbitrary precision might be more important than speed, under certain circumstances.
Then, use the logarithmic identities for rational numbers:
logB(x/y) = logB(x) - logB(y)
In addition to Crouching Kitten's answer, which gave me inspiration, you can build a pseudo-recursive (at most 1 self-call) logarithm to avoid using polynomials. In pseudocode:
ln(x) :=
    If (x <= 0)
        return NaN
    Else if (!(1 <= x < 2))
        return LN2 * b + ln(a)
    Else
        return taylor_expansion(x - 1)
This is pretty efficient and precise, since on [1; 2) the Taylor series converges A LOT faster, and we get such a number 1 <= a < 2 with the first call to ln if our input is positive but not in this range.
You can find b as the unbiased exponent from the data held in the float x, and a from the mantissa of the float x (a is exactly the same float as x, but with biased exponent 0 rather than b). LN2 should be kept as a macro in hexadecimal floating-point notation, IMO. You can also use http://man7.org/linux/man-pages/man3/frexp.3.html for this.
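A minimal sketch of this scheme (function names are mine; frexp supplies a and b, and the series cutoff is arbitrary, with convergence degrading as a approaches 2):
#include <math.h>

#define LN2 0x1.62e42fefa39efp-1 /* ln(2) as a hex-float macro, as suggested */

static double ln_taylor_1to2(double a) /* ln(1+z) = z - z^2/2 + z^3/3 - ... */
{
    double z = a - 1.0, term = z, sum = 0.0;
    for (int n = 1; n <= 40; n++) {
        sum += (n % 2 ? term : -term) / n;
        term *= z;
    }
    return sum;
}

double ln_recursive(double x)
{
    if (x <= 0) return NAN;
    if (x >= 1.0 && x < 2.0) return ln_taylor_1to2(x);
    int b;
    double a = frexp(x, &b); /* x = a * 2^b, a in [0.5, 1) */
    a *= 2.0; b -= 1;        /* renormalize so a lands in [1, 2) */
    return LN2 * b + ln_recursive(a); /* recurses exactly once */
}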
Also, the trick
unsigned long tmp = *(ulong*)(&d);
for "memory-casting" a double to an unsigned long, rather than value-casting, is very useful to know when dealing with floats memory-wise, though this cast can trigger strict-aliasing warnings or errors depending on the compiler (memcpy is the strictly portable way to do the same thing).
Possible computation of ln(x) and expo(x) in C without <math.h> :
static double expo(double n) {
    int a = 0, b = n > 0;
    double c = 1, d = 1, e = 1;
    for (b || (n = -n); e + .00001 < (e += (d *= n) / (c *= ++a)););
    // approximately 15 iterations
    return b ? e : 1 / e;
}

static double native_log_computation(const double n) {
    // Basic logarithm computation.
    static const double euler = 2.7182818284590452354;
    unsigned a = 0, d;
    double b, c, e, f;
    if (n > 0) {
        for (c = n < 1 ? 1 / n : n; (c /= euler) > 1; ++a);
        c = 1 / (c * euler - 1), c = c + c + 1, f = c * c, b = 0;
        for (d = 1, c /= 2; e = b, b += 1 / (d * c), b - e/* > 0.0000001 */;)
            d += 2, c *= f;
    } else b = (n == 0) / 0.;
    return n < 1 ? -(a + b) : a + b;
}

static inline double native_ln(const double n) {
    // Returns the natural logarithm (base e) of N.
    return native_log_computation(n);
}

static inline double native_log_base(const double n, const double base) {
    // Returns the logarithm (base b) of N.
    return native_log_computation(n) / native_log_computation(base);
}
Building off Crouching Kitten's great natural-log answer above: if you need it to be accurate for inputs < 1, you can add a simple scaling factor. Below is an example in C++ that I've used in microcontrollers. It has a scaling factor of 256, making it accurate for inputs down to 1/256 ≈ 0.004 and up to 2^32/256 = 16777216 (due to overflow of a uint32 variable).
It's interesting to note that even on an STM32F103 (Arm Cortex-M3 with no FPU), the float implementation below is significantly faster (e.g. 3x or better) than the 16-bit fixed-point implementation in libfixmath (that being said, this float implementation still takes a few thousand cycles, so it's still not ~fast~).
#include <float.h>
#include <stdint.h>

float TempSensor::Ln(float y)
{
    // Algo from: https://stackoverflow.com/a/18454010
    // Accurate between (1 / scaling factor) < y < (2^32 / scaling factor).
    // Read the comments below for more info on how to extend this range.
    float divisor, x, result;
    const float LN_2 = 0.69314718; // pre-calculated constant used in calculations
    uint32_t log2 = 0;

    // handle if input is less than or equal to zero
    if (y <= 0)
    {
        return -FLT_MAX;
    }

    // Scaling factor. The polynomial below is accurate when the input y > 1;
    // using a scaling factor of 256 (aka 2^8) therefore extends this down to
    // 1/256 or ~0.004. Given the use of uint32_t, the scaled input must stay
    // below 2^24 or 16777216 (aka 2^(32-8)), otherwise uint_y below will
    // overflow. Increasing the scaling factor lowers the accuracy bound and
    // also lowers the overflow bound. If you need a wider range, consider
    // changing uint_y to a uint64_t.
    const uint32_t SCALING_FACTOR = 256;
    const float LN_SCALING_FACTOR = 5.545177444; // natural log of the scaling factor, pre-calculated
    y = y * SCALING_FACTOR;

    uint32_t uint_y = (uint32_t)y;
    while (uint_y >>= 1) // Convert the number to an integer, then find the location of the MSB. This is the integer portion of Log2(y). See: https://stackoverflow.com/a/4970859/6630230
    {
        log2++;
    }

    divisor = (float)(1 << log2);
    x = y / divisor; // Find the remainder value between [1.0, 2.0], then calculate the natural log of this remainder using a polynomial approximation
    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x; // This polynomial approximates ln(x) between [1,2]
    result = result + ((float)log2) * LN_2 - LN_SCALING_FACTOR; // Using the log product rule Log(A) + Log(B) = Log(AB): Ln(x_remainder) + (log_2(x_integer) * ln(2)) - ln(SCALING_FACTOR)

    return result;
}
