ieee754 floating point 1/x * x > 1.0 - c

I want to know whether the program defined below can return 1 assuming:
IEEE754 floating point arithmetic
no overflow (neither in max/x nor in f*x)
no NaN or infinity (obviously)
0 < x and 0 < n < 32
no unsafe math optimization
int canfail(int n, double x) {
    double max = 1ULL << n; // 2^n
    double f = max / x;
    return f * x > max;
}
In my opinion, it should sometimes return 1, as roundToNearest(max / x) can in general be greater than max/x.
I'm able to find numbers for the opposite case, where f * x < max, but I have no examples of input that show f * x > max, and I have no idea how to find one. Can somebody help?
EDIT:
I know the value of x is in a range between 10^(-6) and 10^6 (that still leaves far too many possible double values to test exhaustively), but I know I will not have to deal with overflow, underflow or sub-normal numbers!
In addition, I just realized that because max is a power of two and we don't deal with overflow, the solution will be the same if we fix max = 1, as it is exactly the same computation, only shifted.
Therefore, the problem corresponds to finding a positive, normal double value x such that (1/x) * x > 1.0!
I made a little program to try to find a solution:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>
#include <omp.h>
int main( void ) {
    #pragma omp parallel
    {
        unsigned short int xsubi[3] = {
            omp_get_thread_num(),
            omp_get_thread_num(),
            omp_get_thread_num()
        };
        #pragma omp for
        for(int64_t i=0; i<INT64_MAX; i++) {
            double x = fmod(nrand48(xsubi), 1048576.0);
            if(x<0.000001)
                continue;
            double f = 1.0 / x;
            if(f * x > 1.0) {
                printf("found !!! x=%.30f\n", x);
                fflush(stdout);
            }
        }
    }
    return 1;
}
If you change the sign of the comparison, you will find some value quickly. However, it seems to run forever with f * x > 1.0

In the absence of underflow or overflow, the exponents are irrelevant; if M/x*x > M, then (M/p) / (x/q) * (x/q) > (M/p) for any powers of two p and q. So let's consider 2^52 ≤ x < 2^53 and M = 2^105. We can eliminate x = 2^52 since this yields exact floating-point arithmetic, so 2^52 < x < 2^53.
Division of 2^105 by x yields integer quotient q and integer remainder r, with 2^52 ≤ q < 2^53, 0 < r < x, and 2^105 = q*x + r.
In order for M/x*x to exceed M, both the division and the multiplication must round up. Since the division rounds up, x/2 ≤ r.
With rounding up, the result of floating-point division of 2^105 by x yields q+1. Then the exact (not rounded) multiplication yields (q+1)*x = q*x + x = q*x + r + x − r = 2^105 + x − r. Since x/2 ≤ r, we have x − r ≤ x/2 < 2^52, which is at most half an ulp of the result (the ulp in [2^105, 2^106) is 2^53), so rounding this exact result rounds down, yielding 2^105. (The "<" case always rounds down, and the "=" case rounds down because 2^105 has the low even bit.)
Therefore, for powers of two M and all arithmetic within exponent bounds, M/x*x > M never occurs with round-to-nearest-ties-to-even.

Multiplication by a power of two is just a scaling of the exponent; it does not change the problem, so it's the same as finding x such that (1/x) * x > 1.
One solution is a brute force search.
For the same reasons, we can limit the search for such an x to the interval [1.0, 2.0).
A better approach is to analyze error bounds without brute force.
Let's denote by ix the floating-point number nearest to 1/x.
Considering x and ix as exact fractions, we can write the integer division: 1 = ix * x + r, where r is the remainder
(these are all fractions with denominators that are powers of 2, so we have to multiply the whole equation by an appropriate power of 2 to really have an integer division).
In other words, ix = 1/x - r/x, where -r/x is the rounding error of the inversion.
When we multiply the inverse approximation by x, the exact value is ix*x = 1 - r.
We know that the floating point result will be rounded to the nearest float to that exact value.
So, assuming the default rounding mode (to nearest, ties to even), the question asked is whether -r can exceed 0.5 ulp.
The short answer is never!
Suppose |r| > 0.5 ulp; then the rounding error -r/x would exceed half an ulp of the exact result 1/x.
This is not a rigorous argument, because the exact result is not a floating-point number and does not have an ulp, but you get the idea...
I might come back with a correct proof if I have time, but my bet is that you can find it already done, possibly on SO.
EDIT
Why can you find (1/x) * x < 1?
Simply because 1.0 is at a binade limit, so below 1 we would have to prove that r < 0.25 ulp, which we cannot...
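To see the < case in practice, here is a quick illustration (my own addition): scan small integer values of x and report those where the rounded inverse, multiplied back, falls below 1.0. Which values appear depends on the platform, but with IEEE-754 binary64 and round-to-nearest several show up quickly.
#include <stdio.h>

int main(void)
{
    for (int i = 1; i <= 100; i++) {
        double x = i;
        /* (1.0 / x) rounds the inverse; multiplying back need not give 1.0 */
        if ((1.0 / x) * x < 1.0)
            printf("x = %d: (1/x)*x = %.17g\n", i, (1.0 / x) * x);
    }
    return 0;
}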

canfail(1, pow(2, 1023) * (2 - pow(2, -51))) will return 1.

Related

How to find remainder of a double in C? Modulo only works for integers

This is what I've found so far online,
#include <stdio.h>

int main(void)
{
    long a = 12345;
    int b = 10;
    int remain = a - (a / b) * b;
    printf("%i\n", remain);
}
First I wonder how the formula works. Maybe I can't do math, but the priority of operations here seems a bit odd. If I run this code the expected answer of 5 is printed. But I don't get how (a / b) * b doesn't cancel out to 'a', leading to a - a = 0.
Now, this only works for int and long; as soon as doubles are involved it doesn't work anymore. Can anyone tell me why? Is there an alternative to modulo that works for double?
Also, I'm not sure I understand up to what value a long can go. I found online that the upper limit was 2147483647, but when I input bigger numbers such as the one in 'a' the code runs without any issue up to a certain point...
Thanks for your help I'm new to coding and trying to learn!
Given two finite double numbers x and y, with y not equal to zero, fmod(x, y) produces the remainder of x when divided by y. Specifically, it returns x − ny, where n is chosen so that x − ny has the same sign as x and is smaller in magnitude than y. (So, if x is positive, 0 ≤ fmod(x, y) < |y|, and, if x is negative, −|y| < fmod(x, y) ≤ 0.)
fmod is declared in <math.h>.
A properly implemented fmod returns an exact result; there is no floating-point error, since the specified result is always representable.
The C standard also specifies remquo, which returns the remainder along with some low bits (at least three) of the quotient n, and remainder, which uses a variation on the definition of the remainder (n is chosen as the integer nearest the exact quotient x/y). It also specifies variants of these functions for float and long double.
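For reference, a short usage example (the numbers are arbitrary, chosen so the results are exact):
#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("%f\n", fmod(12345.75, 10.0));  /* 5.750000 */
    printf("%f\n", fmod(-12345.75, 10.0)); /* -5.750000, same sign as x */
    printf("%f\n", fmod(5.5, 2.25));       /* 1.000000 (5.5 - 2*2.25) */
    return 0;
}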
Naive implementation: limited range, and it adds additional floating-point imprecision (as it does some arithmetic).
#include <stdio.h>

double naivemod(double x)
{
    return x - (long long)x;
}

int main(void)
{
    printf("%.50f\n", naivemod(345345.567567756));
    printf("%.50f\n", naivemod(.0));
    printf("%.50f\n", naivemod(10.5));
    printf("%.50f\n", naivemod(-10.0/3));
}

Comparing the ratio of two values to 1

I'm working via a basic 'Programming in C' book.
I have written the following code based off of it in order to calculate the square root of a number:
#include <stdio.h>

float absoluteValue (float x)
{
    if (x < 0)
        x = -x;
    return (x);
}

float squareRoot (float x, float epsilon)
{
    float guess = 1.0;
    while (absoluteValue(guess * guess - x) >= epsilon)
    {
        guess = (x/guess + guess) / 2.0;
    }
    return guess;
}

int main (void)
{
    printf("SquareRoot(2.0) = %f\n", squareRoot(2.0, .00001));
    printf("SquareRoot(144.0) = %f\n", squareRoot(144.0, .00001));
    printf("SquareRoot(17.5) = %f\n", squareRoot(17.5, .00001));
    return 0;
}
An exercise in the book has said that the current criteria used for termination of the loop in squareRoot() is not suitable for use when computing the square root of a very large or a very small number.
Instead of comparing the difference between the value of x and the value of guess^2, the program should compare the ratio of the two values to 1. The closer this ratio gets to 1, the more accurate the approximation of the square root.
If the ratio is just guess^2/x, shouldn't my code inside of the while loop:
guess = (x/guess + guess) / 2.0;
be replaced by:
guess = ((guess * guess) / x ) / 1 ; ?
This compiles but nothing is printed out into the terminal. Surely I'm doing exactly what the exercise is asking?
To calculate the ratio just do (guess * guess / x); that could be either higher or lower than 1 depending on your implementation. Similarly, your margin of error (in percent) would be absoluteValue((guess * guess / x) - 1) * 100.
All they want you to check is how close the square root is. By squaring the number you get and dividing it by the number you took the square root of you are just checking how close you were to the original number.
Example:
sqrt(4) = 2
2 * 2 / 4 = 1 (this is exact, so we get 1, since 2 * 2 = 4)
margin of error = (1 - 1) * 100 = 0% margin of error
Another example:
sqrt(4) = 1.999 (let's just say you got this)
1.999 * 1.999 = 3.996
3.996/4 = .999 (so we are close but not exact)
To check margin of error:
.999 - 1 = -.001
absoluteValue(-.001) = .001
.001 * 100 = .1% margin of error
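Put as a small helper (hypothetical name, reusing absoluteValue from the question), the two formulas above become:
/* Percentage margin of error of `guess` as a square root of `x`,
   following the ratio formula described above. */
float marginOfErrorPercent(float guess, float x)
{
    return absoluteValue(guess * guess / x - 1.0f) * 100.0f;
}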
How about applying a little algebra? Your current criterion is:
|guess^2 - x| >= epsilon
You are elsewhere assuming that guess is nonzero, so it is algebraically safe to convert that to
|1 - x / guess^2| >= epsilon / guess^2
epsilon is just a parameter governing how close the match needs to be, and the above reformulation shows that it would have to be expressed in terms of the floating-point spacing near guess^2 to yield equivalent precision for all evaluations. But of course that's not possible because epsilon is a constant. This is, in fact, exactly why the original criterion gets less effective as x diverges from 1.
Let us instead write the alternative expression
|1 - x / guess^2| >= delta
Here, delta expresses the desired precision in terms of the spacing of floating point values in the vicinity of 1, which is related to a fixed quantity sometimes called the "machine epsilon". You can directly select the required precision via your choice of delta, and you will get the same precision for all x, provided that no arithmetic operations overflow.
Now just convert that back into code.
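For instance, a minimal sketch of that conversion, keeping the structure from the question (delta is the relative tolerance discussed above):
float squareRoot (float x, float delta)
{
    float guess = 1.0;
    /* stop when guess*guess is within a relative tolerance delta of x */
    while (absoluteValue(1.0 - x / (guess * guess)) >= delta)
    {
        guess = (x/guess + guess) / 2.0;
    }
    return guess;
}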
Suggest a different point of view.
With this method, guess_next = (x/guess + guess) / 2.0;, once the initial approximation is in the neighborhood, the number of bits of accuracy doubles each iteration. For example, log2(FLT_EPSILON) is about -23, so 6 iterations are needed. (Think 23, 12, 6, 3, 2, 1.)
The trouble with using guess * guess is that it may underflow to 0.0 or overflow to infinity for a non-zero x.
To form a quality initial guess:
assert(x > 0.0f);
int expo;
float signif = frexpf(x, &expo);
float guess = ldexpf(signif, expo/2);
Now iterate N times (e.g. 6), with N based on FLT_EPSILON, FLT_DECIMAL_DIG or FLT_DIG:
for (int i = 0; i < N; i++) {
    guess = (x/guess + guess) / 2.0f;
}
The cost of perhaps an extra iteration is saved by avoiding an expensive termination condition calculation.
If code wants to compare a/b against 1.0f, simply use some epsilon factor like 1 or 2.
float a = guess;
float b = x/guess;
assert(b);
float q = a/b;
#define FACTOR (1.0f /* some value 1.0f to maybe 2,3 or 4 */)
if (q >= 1.0f - FLT_EPSILON*FACTOR && q <= 1.0f + FLT_EPSILON*FACTOR) {
    close_enough();
}
First lesson in numerical analysis: for floating point numbers x+y has the potential for large relative errors, especially when the sum is near zero, but x*y has very limited relative errors.
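A two-line illustration of that lesson (my example, assuming IEEE-754 binary64):
#include <stdio.h>

int main(void)
{
    printf("%.17g\n", 0.1 + 0.2 - 0.3); /* exact answer is 0; prints ~5.5e-17, a huge relative error */
    printf("%.17g\n", 0.1 * 0.2);       /* prints 0.020000000000000004, within a fraction of an ulp */
    return 0;
}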

Round positive value half-up to 2 decimal places in C

Typically, Rounding to 2 decimal places is very easy with
printf("%.2lf",<variable>);
However, the rounding will usually round to the nearest even. For example,
2.554 -> 2.55
2.555 -> 2.56
2.565 -> 2.56
2.566 -> 2.57
And what I want to achieve is that
2.555 -> 2.56
2.565 -> 2.57
In fact, rounding half-up is doable in C, but for integers only:
int a = (int)(b+0.5)
So, I'm asking for how to do the same thing as above with 2 decimal places on positive values instead of Integer to achieve what I said earlier for printing.
It is not clear whether you actually want to "round half-up", or rather "round half away from zero", which requires different treatment for negative values.
Single precision binary float is precise to at least 6 decimal places, and double to at least 15, so nudging an FP value by DBL_EPSILON (defined in float.h) will cause printf( "%.2lf", x ) to round up to the next 100th for n.nn5 values, without affecting the displayed value for values that are not n.nn5.
double x2 = x * (1 + DBL_EPSILON) ; // round half-away from zero
printf( "%.2lf", x2 ) ;
For different rounding behaviours:
double x2 = x * (1 - DBL_EPSILON) ; // round half-toward zero
double x2 = x + DBL_EPSILON ; // round half-up
double x2 = x - DBL_EPSILON ; // round half-down
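A quick demonstration on one of the values from the question (my example; the outputs noted in the comments assume IEEE-754 binary64 and the usual round-to-nearest printf behaviour):
#include <float.h>
#include <stdio.h>

int main(void)
{
    double x = 2.565;                  /* stored as 2.56499999999999994... */
    double x2 = x * (1 + DBL_EPSILON); /* nudge half-away from zero */
    printf("%.2lf\n", x);              /* typically prints 2.56 */
    printf("%.2lf\n", x2);             /* typically prints 2.57 */
    return 0;
}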
Following is precise code to round a double to the nearest 0.01 double.
The code functions like x = round(100.0*x)/100.0; except that it uses manipulations to ensure the scaling by 100.0 is done exactly, without precision loss.
Likely this is more code than OP is interested in, but it does work.
It works for the entire double range -DBL_MAX to DBL_MAX (still should do more unit testing).
It depends on FLT_RADIX == 2, which is common.
#include <float.h>
#include <math.h>
#include <stdio.h>

void r100_best(const char *s) {
    double x;
    sscanf(s, "%lf", &x);

    // Break x into whole number and fractional parts.
    // Code only needs to round the fractional part.
    // This preserves the entire `double` range.
    double xi, xf;
    xf = modf(x, &xi);

    // Multiply the fractional part by N (256).
    // Break into whole and fractional parts.
    // This provides the needed extended precision.
    // N should be >= 100 and a power of 2.
    // The multiplication by a power of 2 will not introduce any rounding.
    double xfi, xff;
    xff = modf(xf * 256, &xfi);

    // Multiply both parts by 100.
    // *100 incurs 7 more bits of precision, of which the preceding code
    // ensures the 8 LSbits of xfi, xff are zero.
    int xfi100, xff100;
    xfi100 = (int) (xfi * 100.0);
    xff100 = (int) (xff * 100.0); // Cast here will truncate (towards 0)

    // Sum the 2 parts.
    // sum is the exact truncate-toward-0 version of xf*256*100
    int sum = xfi100 + xff100;

    // Add in half N.
    if (sum < 0)
        sum -= 128;
    else
        sum += 128;

    xf = sum / 256;  // Integer division truncates toward 0, completing the rounding.
    xf /= 100;
    double y = xi + xf;

    printf("%6s %25.22f ", "x", x);
    printf("%6s %25.22f %.2f\n", "y", y, y);
}

int main(void) {
    r100_best("1.105");
    r100_best("1.115");
    r100_best("1.125");
    r100_best("1.135");
    r100_best("1.145");
    r100_best("1.155");
    r100_best("1.165");
    return 0;
}
[Edit] OP clarified that only the printed value needs rounding to 2 decimal places.
OP's observation about the rounding of "half-way" numbers per "round to even" or "round away from zero" is misleading. Of the 100 "half-way" numbers like 0.005, 0.015, 0.025, ..., 0.995, only 4 are typically exactly "half-way": 0.125, 0.375, 0.625, 0.875. This is because the floating-point number format uses base 2, and numbers like 2.565 cannot be exactly represented.
Instead, sample numbers like 2.565 have 2.564999999999999947... as their closest double value, assuming binary64. Rounding that number to the nearest 0.01 should give 2.56 rather than the 2.57 desired by OP.
Thus only numbers ending with 0.125 and 0.625 are exactly half-way and round down rather than up as desired by OP. Suggest accepting that and using:
printf("%.2lf",variable); // This should be sufficient
To get close to OP's goal, numbers could be A) tested for ending in 0.125 or 0.625, or B) increased slightly. The smallest increase would be
#include <math.h>
printf("%.2f", nextafter(x, 2*x));
Another nudge method is shown in @Clifford's answer.
[Former answer that rounds a double to the nearest double multiple of 0.01]
Typical floating-point uses formats like binary64, which employs base 2. "Rounding to the nearest mathematical 0.01 with ties away from 0.0" is challenging.
As @Pascal Cuoq mentions, floating-point numbers like 2.555 typically are only near 2.555 and have a more precise value like 2.555000000000000159872... which is not half-way.
@BLUEPIXY's solution below is best and practical.
x = round(100.0*x)/100.0;
"The round functions round their argument to the nearest integer value in floating-point
format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.6.
The ((int)(100 * (x + 0.005)) / 100.0) approach has 2 problems: it may round in the wrong direction for negative numbers (OP did not specify), and integers typically have a much smaller range (INT_MIN to INT_MAX) than double.
There are still some cases, like double x = atof("1.115");, which end up near 1.12 when the result really should be 1.11, because 1.115 as a double is really closer to 1.11 and not "half-way".
string   x                              rounded x
1.115    1.1149999999999999911182e+00   1.1200000000000001065814e+00
OP has not specified rounding of negative numbers, assuming y = -f(-x).

Implementing single-precision division as double-precision multiplication

Question
For a C99 compiler implementing exact IEEE 754 arithmetic, do values of f, divisor of type float exist such that f / divisor != (float)(f * (1.0 / divisor))?
EDIT: By “implementing exact IEEE 754 arithmetic” I mean a compiler that rightfully defines FLT_EVAL_METHOD as 0.
Context
A C compiler that provides IEEE 754-compliant floating-point can only replace a single-precision division by a constant by a single-precision multiplication by the inverse if said inverse is itself representable exactly as a float.
In practice, this only happens for powers of two. So a programmer, Alex, may be confident that f / 2.0f will be compiled as if it had been f * 0.5f, but if it is acceptable for Alex to multiply by 0.10f instead of dividing by 10, Alex should express it by writing the multiplication in the program, or by using a compiler option such as GCC's -ffast-math.
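A small check of that claim (my illustration, not part of the original question): division by 2.0f should always match multiplication by 0.5f, while division by 10.0f need not match multiplication by 0.10f, because 0.10f is not the exact reciprocal of 10.
#include <math.h>
#include <stdio.h>

int main(void)
{
    for (float f = 1.0f; f != 2.0f; f = nextafterf(f, 2.0f)) {
        if (f / 2.0f != f * 0.5f)
            printf("power-of-two mismatch at f=%a\n", f); /* both sides are exact, so this is not expected to fire */
        if (f / 10.0f != f * 0.10f) {
            printf("first mismatch with 0.10f at f=%a\n", f);
            break;
        }
    }
    return 0;
}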
This question is about transforming a single-precision division into a double-precision multiplication. Does it always produce the correctly rounded result? Is there a chance that it could be cheaper, and thus be an optimization that compilers might make (even without -ffast-math)?
I have compared (float)(f * 0.10) and f / 10.0f for all single-precision values of f between 1 and 2, without finding any counter-example. This should cover all divisions of normal floats producing a normal result.
Then I generalized the test to all divisors with the program below:
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void){
    for (float divisor = 1.0; divisor != 2.0; divisor = nextafterf(divisor, 2.0))
    {
        double factor = 1.0 / divisor; // double-precision inverse
        for (float f = 1.0; f != 2.0; f = nextafterf(f, 2.0))
        {
            float cr = f / divisor;
            float opt = f * factor; // double-precision multiplication
            if (cr != opt)
                printf("For divisor=%a, f=%a, f/divisor=%a but (float)(f*factor)=%a\n",
                       divisor, f, cr, opt);
        }
    }
}
The search space is just large enough to make this interesting (2^46). The program is currently running. Can someone tell me whether it will print something, perhaps with an explanation why or why not, before it has finished?
Your program won't print anything, assuming round-ties-to-even rounding mode. The essence of the argument is as follows:
We're assuming that both f and divisor are between 1.0 and 2.0. So f = a / 2^23 and divisor = b / 2^23 for some integers a and b in the range [2^23, 2^24). The case divisor = 1.0 isn't interesting, so we can further assume that b > 2^23.
The only way that (float)(f * (1.0 / divisor)) could give the wrong result would be for the exact value f / divisor to be so close to a halfway case (i.e., a number exactly halfway between two single-precision floats) that the accumulated errors in the expression f * (1.0 / divisor) push us to the other side of that halfway case from the true value.
But that can't happen. For simplicity, let's first assume that f >= divisor, so that the exact quotient is in [1.0, 2.0). Now any halfway case for single precision in the interval [1.0, 2.0) has the form c / 2^24 for some odd integer c with 2^24 < c < 2^25. The exact value of f / divisor is a / b, so the absolute value of the difference f / divisor - c / 2^24 is bounded below by 1 / (2^24 b), so is at least 1 / 2^48 (since b < 2^24). So we're more than 16 double-precision ulps away from any halfway case, and it should be easy to show that the error in the double precision computation can never exceed 16 ulps. (I haven't done the arithmetic, but I'd guess it's easy to show an upper bound of 3 ulps on the error.)
So f / divisor can't be close enough to a halfway case to create problems. Note that f / divisor can't be an exact halfway case, either: since c is odd, c and 2^24 are relatively prime, so the only way we could have c / 2^24 = a / b is if b is a multiple of 2^24. But b is in the range (2^23, 2^24), so that's not possible.
The case where f < divisor is similar: the halfway cases then have the form c / 2^25 and the analogous argument shows that abs(f / divisor - c / 2^25) is greater than 1 / 2^49, which again gives us a margin of 16 double-precision ulps to play with.
It's certainly not possible if non-default rounding modes are possible. For example, in replacing 3.0f / 3.0f with 3.0f * C, a value of C less than the exact reciprocal would yield the wrong result in downward or toward-zero rounding modes, whereas a value of C greater than the exact reciprocal would yield the wrong result for upward rounding mode.
It's less clear to me whether what you're looking for is possible if you restrict to default rounding mode. I'll think about it and revise this answer if I come up with anything.
Random search resulted in an example.
Looks like when the result is a "denormal/subnormal" number, the inequality is possible. But then, maybe my platform is not IEEE 754 compliant?
f 0x1.7cbff8p-25
divisor -0x1.839p+116
q -0x1.f8p-142
q2 -0x1.f6p-142
#include <stdio.h>
#include <stdlib.h>

int MyIsFinite(float f) {
    union {
        float f;
        unsigned char uc[sizeof (float)];
        unsigned long ul;
    } x;
    x.f = f;
    return (x.ul & 0x7F800000L) != 0x7F800000L;
}

float floatRandom() {
    union {
        float f;
        unsigned char uc[sizeof (float)];
    } x;
    do {
        size_t i;
        for (i = 0; i < sizeof(x.uc); i++) x.uc[i] = rand();
    } while (!MyIsFinite(x.f));
    return x.f;
}

void testPC() {
    for (;;) {
        volatile float f, divisor, q, qd;
        do {
            f = floatRandom();
            divisor = floatRandom();
            q = f / divisor;
        } while (!MyIsFinite(q));
        qd = (float) (f * (1.0 / divisor));
        if (qd != q) {
            printf("%a %a %a %a\n", f, divisor, q, qd);
            return;
        }
    }
}
Eclipse PC Version: Juno Service Release 2
Build id: 20130225-0426

Efficient implementation of natural logarithm (ln) and exponentiation

I'm looking for implementations of the log() and exp() functions provided in the C library <math.h>. I'm working with 8 bit microcontrollers (OKI 411 and 431). I need to calculate Mean Kinetic Temperature. The requirement is that we should be able to calculate MKT as fast as possible and with as little code memory as possible. The compiler comes with log() and exp() functions in <math.h>. But calling either function and linking with the library causes the code size to increase by 5 kilobytes, which will not fit in one of the micros we work with (OKI 411), because our code has already consumed ~12K of the available ~15K code memory.
The implementation I'm looking for should not use any other C library functions (like pow(), sqrt() etc). This is because all library functions are packed in one library, and even if one function is called, the linker will bring the whole 5K library into code memory.
EDIT
The algorithm should be correct up to 3 decimal places.
Using a Taylor series is neither the simplest nor the fastest way of doing this. Most professional implementations use approximating polynomials. I'll show you how to generate one in Maple (a computer algebra program), using the Remez algorithm.
For 3 digits of accuracy execute the following commands in Maple:
with(numapprox):
Digits := 8
minimax(ln(x), x = 1 .. 2, 4, 1, 'maxerror')
maxerror
Its response is the following polynomial:
-1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x
With the maximal error of: 0.000061011436
We generated a polynomial which approximates ln(x), but only inside the [1..2] interval. Increasing the interval is not wise, because that would increase the maximal error even more. Instead of that, do the following decomposition:
So first find the highest power of 2 which is still smaller than the number (see: What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?). The exponent of that power is the integer part of the base-2 logarithm. Divide by that power of two; the result then falls into the [1..2] interval. At the end we will have to add n*ln(2) (where n is that exponent) to get the final result.
An example implementation for numbers >= 1:
float ln(float y) {
    int log2;
    float divisor, x, result;

    log2 = msb((int)y); // See: https://stackoverflow.com/a/4970859/6630230
    divisor = (float)(1 << log2);
    x = y / divisor;    // normalized value between [1.0, 2.0]

    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
    result += ((float)log2) * 0.69314718; // ln(2) = 0.69314718

    return result;
}
Although if you plan to use it only in the [1.0, 2.0] interval, then the function is like:
float ln(float x) {
    return -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
}
The Taylor series for e^x converges extremely quickly, and you can tune your implementation to the precision that you need. (http://en.wikipedia.org/wiki/Taylor_series)
The Taylor series for log is not as nice...
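For illustration, a direct transcription of that series (my own sketch; the term count is the tuning knob mentioned above):
/* e^x as 1 + x + x^2/2! + x^3/3! + ...
   For |x| up to a few units, ~20 terms is already far beyond the
   3-decimal-place requirement stated in the question. */
double exp_taylor(double x, int terms)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < terms; k++) {
        term *= x / k;   /* next term x^k / k! built incrementally */
        sum += term;
    }
    return sum;
}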
If you don't need floating-point math for anything else, you may compute an approximate fractional base-2 log pretty easily. Start by shifting your value left until it's 32768 or higher and store the number of times you did that in count. Then, repeat some number of times (depending upon your desired scale factor):
n = (mult(n,n) + 32768u) >> 16; // If a function is available for 16x16->32 multiply
count<<=1;
if (n < 32768) n*=2; else count+=1;
If the above loop is repeated 8 times, then the log base 2 of the number will be count/256. If ten times, count/1024. If eleven, count/2048. Effectively, this function works by computing the integer power-of-two logarithm of n**(2^reps), but with intermediate values scaled to avoid overflow.
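To make that loop concrete, here is a self-contained sketch (my wrapper, not the original snippet): it computes only the fractional bits of log2, assuming the value has already been normalized so that 32768 represents 1.0 (Q1.15 fixed point).
#include <stdint.h>
#include <stdio.h>

/* n is in [32768, 65536), representing m = n/32768 in [1, 2).
   Each pass squares m, renormalizes into [1, 2), and shifts the produced
   bit of log2(m) into count. After reps passes, count / 2^reps
   approximates the fractional part of log2 of the original m. */
static uint32_t log2_frac(uint32_t n, int reps)
{
    uint32_t count = 0;
    for (int i = 0; i < reps; i++) {
        n = (n * n + 32768u) >> 16; /* square, keep the Q1.15 scaling (rounded) */
        count <<= 1;
        if (n < 32768u)
            n *= 2;                 /* m^2 < 2: next log2 bit is 0 */
        else
            count += 1;             /* m^2 >= 2: next log2 bit is 1 */
    }
    return count;
}

int main(void)
{
    /* 49152 represents 1.5; log2(1.5) is about 0.58496 */
    printf("%f\n", log2_frac(49152u, 8) / 256.0);
    return 0;
}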
Would a basic table with interpolation between values work? If the range of values is limited (which is likely for your case - I doubt temperature readings have a huge range) and high precision is not required, it may work. Should be easy to test on a normal machine.
Here is one of many topics on table representation of functions: Calculating vs. lookup tables for sine value performance?
Necromancing.
I had to implement logarithms on rational numbers.
This is how I did it:
According to Wikipedia, there is the Halley-Newton approximation method,
which can be used for very high precision.
Using Newton's method, the iteration simplifies to the implementation below, which has cubic convergence to ln(x) - way better than what the Taylor series offers.
// Using Newton's method, the iteration simplifies to (implementation)
// which has cubic convergence to ln(x).
public static double ln(double x, double epsilon)
{
    double yn = x - 1.0d; // using the first term of the taylor series as initial-value
    double yn1 = yn;

    do
    {
        yn = yn1;
        yn1 = yn + 2 * (x - System.Math.Exp(yn)) / (x + System.Math.Exp(yn));
    } while (System.Math.Abs(yn - yn1) > epsilon);

    return yn1;
}
This is not C, but C#; however, I'm sure anybody capable of programming in C will be able to deduce the C code from it.
Furthermore, since
logn(x) = ln(x)/ln(n),
you have therefore just implemented logN as well.
public static double log(double x, double n, double epsilon)
{
    return ln(x, epsilon) / ln(n, epsilon);
}
where epsilon (error) is the minimum precision.
Now as to speed, you're probably better off using the ln implemented in hardware or in the library, but as I said, I used this as a base to implement logarithms on a rational-numbers class working with arbitrary precision.
Arbitrary precision might be more important than speed, under certain circumstances.
Then, use the logarithmic identities for rational numbers:
logB(x/y) = logB(x) - logB(y)
In addition to Crouching Kitten's answer, which gave me inspiration, you can build a pseudo-recursive (at most 1 self-call) logarithm to avoid using polynomials. In pseudo code:
ln(x) :=
  If (x <= 0)
    return NaN
  Else if (!(1 <= x < 2))
    return LN2 * b + ln(a)
  Else
    return taylor_expansion(x - 1)
This is pretty efficient and precise since on [1; 2) the taylor series converges A LOT faster, and we get such a number 1 <= a < 2 with the first call to ln if our input is positive but not in this range.
You can find 'b' as your unbiased exponent from the data held in the float x, and 'a' from the mantissa of the float x (a is exactly the same float as x, but now with exponent biased_0 rather than exponent biased_b). LN2 should be kept as a macro in hexadecimal floating point notation IMO. You can also use http://man7.org/linux/man-pages/man3/frexp.3.html for this.
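A sketch in C of that scheme (my own transcription, not the answerer's code): it uses frexp() as suggested to obtain a and b, and the atanh form of the series on [1, 2) because it converges faster there than the plain expansion around 1.
#include <math.h>   /* frexp(), nan() */

#define LN2 0.69314718055994530942

double ln_sketch(double x)
{
    if (x <= 0.0)
        return nan("");                /* NaN for non-positive input */

    int b;
    double a = frexp(x, &b);           /* x = a * 2^b with 0.5 <= a < 1 */
    a *= 2.0;                          /* bring a into [1, 2) */
    b -= 1;

    /* ln(a) = 2*atanh(z) with z = (a-1)/(a+1); here 0 <= z < 1/3,
       so a handful of odd terms is far more accurate than 3 decimals. */
    double z = (a - 1.0) / (a + 1.0);
    double z2 = z * z, term = z, sum = 0.0;
    for (int k = 1; k <= 15; k += 2) {
        sum += term / k;
        term *= z2;
    }
    return b * LN2 + 2.0 * sum;
}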
Also, the trick
unsigned long tmp = *(ulong*)(&d);
for "memory-casting" double to unsigned long, rather than "value-casting", is very useful to know when dealing with floats memory-wise, as bitwise operators will cause warnings or errors depending on the compiler.
Possible computation of ln(x) and expo(x) in C without <math.h>:
static double expo(double n) {
    int a = 0, b = n > 0;
    double c = 1, d = 1, e = 1;
    for (b || (n = -n); e + .00001 < (e += (d *= n) / (c *= ++a)););
    // approximately 15 iterations
    return b ? e : 1 / e;
}

static double native_log_computation(const double n) {
    // Basic logarithm computation.
    static const double euler = 2.7182818284590452354 ;
    unsigned a = 0, d;
    double b, c, e, f;
    if (n > 0) {
        for (c = n < 1 ? 1 / n : n; (c /= euler) > 1; ++a);
        c = 1 / (c * euler - 1), c = c + c + 1, f = c * c, b = 0;
        for (d = 1, c /= 2; e = b, b += 1 / (d * c), b - e/* > 0.0000001 */;)
            d += 2, c *= f;
    } else b = (n == 0) / 0.;
    return n < 1 ? -(a + b) : a + b;
}

static inline double native_ln(const double n) {
    // Returns the natural logarithm (base e) of N.
    return native_log_computation(n) ;
}

static inline double native_log_base(const double n, const double base) {
    // Returns the logarithm (base b) of N.
    return native_log_computation(n) / native_log_computation(base) ;
}
Building off @Crouching Kitten's great natural log answer above, if you need it to be accurate for inputs < 1 you can add a simple scaling factor. Below is an example in C++ that I've used on microcontrollers. It has a scaling factor of 256, and it's accurate for inputs down to 1/256 = ~0.004 and up to 2^32/256 = 16777216 (due to overflow of a uint32 variable).
It's interesting to note that even on an STM32F103 Arm M3 with no FPU, the float implementation below is significantly faster (e.g. 3x or better) than the 16-bit fixed-point implementation in libfixmath (that being said, this float implementation still takes a few thousand cycles, so it's still not ~fast~).
#include <float.h>
#include <stdint.h>

float TempSensor::Ln(float y)
{
    // Algo from: https://stackoverflow.com/a/18454010
    // Accurate between (1 / scaling factor) < y < (2^32 / scaling factor).
    // Read the comments below for more info on how to extend this range.
    float divisor, x, result;
    const float LN_2 = 0.69314718; // pre-calculated constant used in calculations
    uint32_t log2 = 0;

    // Handle input that is zero or negative.
    if (y <= 0)
    {
        return -FLT_MAX;
    }

    // Scaling factor. The polynomial below is accurate when the input y > 1, so
    // using a scaling factor of 256 (aka 2^8) extends this down to 1/256 or ~0.004.
    // Given the use of uint32_t, the input y must stay below 2^24 or 16777216
    // (aka 2^(32-8)), otherwise uint_y used below will overflow. Increasing the
    // scaling factor will reduce the lower accuracy bound and also reduce the upper
    // overflow bound. If you need the range to be wider, consider changing uint_y
    // to a uint64_t.
    const uint32_t SCALING_FACTOR = 256;
    const float LN_SCALING_FACTOR = 5.545177444; // natural log of the scaling factor, precalculated
    y = y * SCALING_FACTOR;

    uint32_t uint_y = (uint32_t)y;
    while (uint_y >>= 1) // Convert the number to an integer and then find the location of the MSB. This is the integer portion of Log2(y). See: https://stackoverflow.com/a/4970859/6630230
    {
        log2++;
    }

    divisor = (float)(1 << log2);
    x = y / divisor; // Find the remainder value between [1.0, 2.0], then calculate the natural log of this remainder using a polynomial approximation
    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x; // This polynomial approximates ln(x) between [1,2]
    result = result + ((float)log2) * LN_2 - LN_SCALING_FACTOR; // Using the log product rule Log(A) + Log(B) = Log(AB) and the log base change rule log_x(A) = log_y(A)/log_y(x), calculate all the components in base e and then sum them: = Ln(x_remainder) + (log_2(x_integer) * ln(2)) - ln(SCALING_FACTOR)

    return result;
}
