I am looking for a simple portable implementation of log1p. I have come across two implementations.
The first one appears as Theorem 4 here:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
An implementation of the above:
double log1p(double p)
{
    volatile double y = p;
    return ((1 + y) == 1) ? y : y * (log(1 + y) / ((1 + y) - 1));
}
The second one is in GSL http://fossies.org/dox/gsl-1.16/log1p_8c_source.html
double gsl_log1p (const double x)
{
    volatile double y, z;
    y = 1 + x;
    z = y - 1;
    return log(y) - (z - x) / y;  /* cancels errors with IEEE arithmetic */
}
Is there a reason to prefer one over the other?
I have tested these two approaches using a log() implementation with a maximum error of < 0.51 ulps, comparing to a multi-precision arithmetic library. Using that log() implementation as a building block for the two log1p() variants, I found the maximum error of Goldberg's version to be < 2.5 ulps, while the maximum error in the GSL variant was < 1.5 ulps. This indicates that the latter is significantly more accurate.
In terms of special case handling, the Goldberg variant showed one mismatch, in that it returns a NaN for an input of +infinity, whereas the correct result is +infinity. There were three mismatches for special cases with the GSL implementation: Inputs of -1 and +infinity delivered a NaN, while the correct results should be -infinity and +infinity, respectively. Also, for an input of -0 this code returned +0, whereas the correct result is -0.
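If those special cases matter for your use, a thin wrapper around the GSL variant can patch them up. The following is only a sketch of that idea (the name log1p_fixed and its structure are mine, not from GSL), using the standard isinf classification from math.h:
#include <math.h>

/* hypothetical wrapper, not part of GSL: fixes the -1, +infinity and -0 cases */
double log1p_fixed(double x)
{
    if (x == 0.0)
        return x;                       /* preserves the sign of -0 */
    if (x == -1.0)
        return -INFINITY;               /* log1p(-1) = -infinity */
    if (isinf(x) && x > 0.0)
        return INFINITY;                /* log1p(+infinity) = +infinity */
    volatile double y = 1 + x;
    volatile double z = y - 1;
    return log(y) - (z - x) / y;        /* cancels errors with IEEE arithmetic */
}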
It is difficult to assess performance without knowledge of the distribution of the inputs. As others have pointed out in comments, Goldberg's version is potentially faster when many arguments are close to zero, as it skips the expensive call to log() for such arguments.
There is no sure answer between the two. The GNU Scientific Library is quite robust, well used, and actively supported across all recent versions of gcc. You are not likely to run into many surprises with its use. As for any other code you scrape up, there is absolutely no reason not to use it after you have validated its logic and are satisfied with its level and manner of error checking. The small downside to GSL is that it is another library you must carry around, and, depending on how widespread the use of your code will be, it can provide more of a challenge for other users on other platforms. That is about the size of it.
The best piece of code is the one that most closely meets the requirements of your project.
Related
I have to raise 10 to the power of a double a lot of times.
Is there a more efficient way to do this than with the math library pow(10,double)? If it matters, my doubles are always negative between -5 and -11.
I assume pow(double,double) uses a more general algorithm than is required for pow(10,double) and might therefore not be the fastest method. Given some of the answers below, that might have been an incorrect assumption.
As for the why, it is for logarithmic interpolation.
I have a table of x and y values.
My object has a known x value (which is almost always a double).
double Dbeta(struct Data *diffusion, double per){
    int i = 1;  /* index of the first tabulated x above 'per' */
    double frac;
    /* find the bracketing pair of x values */
    while (per > diffusion->x[i]) {
        i++;
    }
    frac = (per - diffusion->x[i-1]) / (diffusion->x[i] - diffusion->x[i-1]);
    /* log10DB[] holds log10 of the tabulated y values; interpolate in log space, then convert back */
    return pow(10, log10DB[i-1] + frac * (log10DB[i] - log10DB[i-1]));
}
This function is called a lot of times.
I have been told to look into profiling, so that is what I will do first.
I have just been told I could have used natural logarithms instead of base 10, which is obviously right. (my stupidity sometimes amazes even myself.)
After replacing everything with natural logarithms, everything runs a bit faster. With profiling (which is a new word I learned today) I found out 39% of my run time is spent in the exp function, so for those who wondered if it was in fact this part that was bottlenecking my code, it was.
For pow(10.0, n) it should be faster to set c = log(10.0), which you can compute once, then use exp(c*n), which should be significantly faster than pow(10.0, n) (which is basically doing that same thing internally, except it would be calculating log(10.0) over and over instead of just once). Beyond that, there probably isn't much else you can do.
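As a minimal sketch of that suggestion (the constant and the helper name are just for illustration):
#include <math.h>

static const double LN10 = 2.302585092994046;   /* log(10.0), computed once */

/* hypothetical helper: 10^n via exp, avoiding the general pow() path */
static inline double exp10_fast(double n)
{
    return exp(LN10 * n);
}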
Yes, the pow function is slow (roughly 50x the cost of a multiply, for those asking for benchmarks).
By some logarithm/exponential trickery, we can express 10^x as
10^x = exp(log(10^x)) = exp(x * log(10)).
So you can implement 10^x with exp(x * M_LN10), which should be more efficient than pow.
If double accuracy isn't critical, use the float version of the function expf (or powf), which should be more efficient than the double version.
If rough accuracy is OK, precompute a table over the [-11, -5] range and do a quick look-up with linear interpolation (see the sketch after the benchmarks below).
Some benchmarks (using glibc 2.31):
Benchmark Time
---------------------------------
pow(10, x) 15.54 ns
powf(10, x) 7.18 ns
expf(x * (float)M_LN10) 3.45 ns
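For completeness, here is a rough sketch of the table-plus-linear-interpolation idea from above, assuming inputs stay within [-11, -5]; the table size and the names are arbitrary choices, and since 10^x spans several orders of magnitude over this range, the accuracy should be checked against the real function before relying on it:
#include <math.h>

#define TBL_MIN  (-11.0)
#define TBL_MAX  (-5.0)
#define TBL_SIZE 1024                     /* arbitrary; more entries, more accuracy */

static double tbl[TBL_SIZE + 1];
static const double tbl_step = (TBL_MAX - TBL_MIN) / TBL_SIZE;

static void init_exp10_table(void)        /* call once at startup */
{
    for (int i = 0; i <= TBL_SIZE; i++)
        tbl[i] = pow(10.0, TBL_MIN + i * tbl_step);
}

static double exp10_lookup(double x)      /* x must lie in [TBL_MIN, TBL_MAX] */
{
    double t = (x - TBL_MIN) / tbl_step;
    int i = (int)t;
    if (i >= TBL_SIZE)
        i = TBL_SIZE - 1;                 /* keep i+1 in range at the top edge */
    double frac = t - i;
    return tbl[i] + frac * (tbl[i + 1] - tbl[i]);
}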
I was implementing a hashmap in C as part of a project I'm working on and using random inserts to test it. I noticed that rand() on Linux seems to repeat numbers far more often than on Mac. RAND_MAX is 2147483647/0x7FFFFFFF on both platforms. I've reduced it to this test program that makes a byte array RAND_MAX+1-long, generates RAND_MAX random numbers, notes if each is a duplicate, and checks it off the list as seen.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int main() {
    size_t size = ((size_t)RAND_MAX) + 1;
    char *randoms = calloc(size, sizeof(char));
    int dups = 0;
    srand(time(0));
    for (int i = 0; i < RAND_MAX; i++) {
        int r = rand();
        if (randoms[r]) {
            // printf("duplicate at %d\n", r);
            dups++;
        }
        randoms[r] = 1;
    }
    printf("duplicates: %d\n", dups);
}
Linux consistently generates around 790 million duplicates. Mac consistently only generates one, so it loops through every random number that it can generate almost without repeating. Can anyone please explain to me how this works? I can't tell anything different from the man pages, can't tell which RNG each is using, and can't find anything online. Thanks!
While at first it may sound like the macOS rand() is somehow better for not repeating any numbers, one should note that with this amount of numbers generated it is expected to see plenty of duplicates (in fact, around 790 million, or (2^31 - 1)/e). Likewise iterating through the numbers in sequence would also produce no duplicates, but wouldn't be considered very random. So the Linux rand() implementation is in this test indistinguishable from a true random source, whereas the macOS rand() is not.
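(A quick back-of-the-envelope check on that figure: after n = 2^31 - 1 draws with replacement from n equally likely values, the expected fraction of values never hit is (1 - 1/n)^n ≈ 1/e, so about n/e ≈ 2147483647/2.71828 ≈ 790 million of the draws must have landed on an already-seen value.)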
Another thing that appears surprising at first glance is how the macOS rand() can manage to avoid duplicates so well. Looking at its source code, we find the implementation to be as follows:
/*
* Compute x = (7^5 * x) mod (2^31 - 1)
* without overflowing 31 bits:
* (2^31 - 1) = 127773 * (7^5) + 2836
* From "Random number generators: good ones are hard to find",
* Park and Miller, Communications of the ACM, vol. 31, no. 10,
* October 1988, p. 1195.
*/
long hi, lo, x;

/* Can't be initialized with 0, so use another value. */
if (*ctx == 0)
    *ctx = 123459876;
hi = *ctx / 127773;
lo = *ctx % 127773;
x = 16807 * lo - 2836 * hi;
if (x < 0)
    x += 0x7fffffff;
return ((*ctx = x) % ((unsigned long) RAND_MAX + 1));
This does indeed result in every number between 1 and RAND_MAX - 1, inclusive, appearing exactly once before the sequence repeats. Since the next state is based on multiplication, the state can never be zero (or all future states would also be zero). Thus the repeated number you see is the first one, and zero is the one value that is never returned.
Apple has been promoting the use of better random number generators in their documentation and examples for at least as long as macOS (or OS X) has existed, so the quality of rand() is probably not deemed important, and they've just stuck with one of the simplest pseudorandom generators available. (As you noted, their rand() is even commented with a recommendation to use arc4random() instead.)
On a related note, the simplest pseudorandom number generator I could find that produces decent results in this (and many other) tests for randomness is xorshift*:
#include <stdint.h>

uint64_t xorshift64star(uint64_t *ctx)   /* state *ctx must be seeded non-zero */
{
    uint64_t x = *ctx;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *ctx = x;
    return (x * 0x2545F4914F6CDD1DULL) >> 33;
}
This implementation results in almost exactly 790 million duplicates in your test.
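To drop it into the test program from the question, something along these lines would work, calling the xorshift* routine above (the wrapper name and seed value are arbitrary):
#include <stdint.h>

static uint64_t rng_state = 88172645463325252ULL;   /* any non-zero seed */

static int my_rand(void)   /* returns a value in [0, RAND_MAX], like rand() */
{
    return (int) xorshift64star(&rng_state);
}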
MacOS provides an undocumented rand() function in stdlib. If you leave it unseeded, then the first values it outputs are 16807, 282475249, 1622650073, 984943658 and 1144108930. A quick search will show that this sequence corresponds to a very basic LCG random number generator that iterates the following formula:
x_(n+1) = 7^5 * x_n (mod 2^31 - 1)
Since the state of this RNG is described entirely by the value of a single 32-bit integer, its period is not very long. To be precise, it repeats itself every 2^31 - 2 iterations, outputting every value from 1 to 2^31 - 2.
I don't think there's a standard implementation of rand() for all versions of Linux, but there is a glibc rand() function that is often used. Instead of a single 32-bit state variable, this uses a pool of over 1000 bits, which to all intents and purposes will never produce a fully repeating sequence. Again, you can probably find out what version you have by printing the first few outputs from this RNG without seeding it first. (The glibc rand() function produces the numbers 1804289383, 846930886, 1681692777, 1714636915 and 1957747793.)
So the reason you're getting more collisions in Linux (and hardly any in MacOS) is that the Linux version of rand() is basically more random.
rand() is defined by the C standard, and the C standard does not specify which algorithm to use. Obviously, Apple is using an inferior algorithm to your GNU/Linux implementation: The Linux one is indistinguishable from a true random source in your test, while the Apple implementation just shuffles the numbers around.
If you want random numbers of any quality, either use a better PRNG that gives at least some guarantees on the quality of the numbers it returns, or simply read from /dev/urandom or similar. The latter gives you cryptographic-quality numbers, but it is slow. Even if it is too slow by itself, /dev/urandom can provide some excellent seeds to some other, faster PRNG.
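For example, a 64-bit seed can be read from /dev/urandom and then fed to whatever faster PRNG you settle on; this is just a sketch and skips the error handling a real program would want:
#include <stdio.h>
#include <stdint.h>

/* read a 64-bit seed from /dev/urandom (Linux/macOS); returns 0 on failure */
static int get_urandom_seed(uint64_t *seed)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(seed, sizeof *seed, 1, f);
    fclose(f);
    return n == 1;
}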
In general, the rand/srand pair has been considered sort of deprecated for a long time due to low-order bits displaying less randomness than high-order bits in the results. This may or may not have anything to do with your results, but I think this is still a good opportunity to remember that even though some rand/srand implementations are now more up to date, older implementations persist and it's better to use random(3). On my Arch Linux box, the following note is still in the man page for rand(3):
The versions of rand() and srand() in the Linux C Library use the same
random number generator as random(3) and srandom(3), so the lower-order
bits should be as random as the higher-order bits. However, on older
rand() implementations, and on current implementations on different
systems, the lower-order bits are much less random than the higher-order
bits. Do not use this function in applications intended to be portable
when good randomness is needed. (Use random(3) instead.)
Just below that, the man page actually gives very short, very simple example implementations of rand and srand that are about the simplest LC RNGs you've ever seen, and they have a small RAND_MAX. I don't think they match what's in the C standard library, if they ever did. Or at least I hope not.
In general, if you're going to use something from the standard library, use random if you can (the man page lists it as POSIX standard back to POSIX.1-2001, whereas rand goes back to before C was even standardized). Or better yet, crack open Numerical Recipes (or look for it online) or Knuth and implement one. They're really easy, and you only really need to do it once to have a general-purpose RNG with the attributes you most often need and which is of known quality.
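As a minimal example of the POSIX route (the seeding here is simplistic, just to show the calls):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    srandom((unsigned) time(NULL));    /* seed once */
    for (int i = 0; i < 5; i++)
        printf("%ld\n", random());     /* values in [0, 2^31 - 1] */
    return 0;
}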
I was writing a function in C used to take roots of numbers, and I stumbled upon a problem: it works really well with cube roots of positive numbers and with square roots, but when I try to take the cube root of a negative number, it returns: -1.#IND00
I tried researching and it turns out the returned number is too big, but I can't understand why...
('rooter' is the function, x is the radicand and ind is the degree.)
I also tried to put in '0.66' instead of 1/ind but the same result happens.
#include <math.h>

extern int errore;   /* error flag, assumed to be defined elsewhere in the program */

float rooter(int x, int ind)
{
    if (ind % 2 == 0)
    {
        if (x >= 0)
            return pow(x, 1.0 / ind);
        else
            errore = 1;   /* even root of a negative number is not real */
        return -1;
    }
    else
    {
        return pow(x, (float)1.0 / ind);   /* returns NaN for negative x: pow rejects a negative base */
    }
}
pow does not accept a negative base with a non-integer exponent. (Probably because making the special cases where it is traditionally defined work is too burdensome, especially when it is expected to be implemented using logarithms.)
This answer is a little more philosophical than the other answers so far and attempts to address the underlying question "why doesn't pow allow a negative base with a float exponent?".
Consider how floating point numbers work and how powers to rational numbers are defined. Now given a negative base, ask yourself for what rational exponents is the result a real number?
Using the usual mathematical definition (-2)^(1/2) isn't a real number but you could find something arbitrarily close to 1/2 for which it is. For example (-2)^(49999/99999) is real. What this means is that if the implementation tried to determine what is and isn't real then any floating point precision error may actually swap your expression from real to imaginary, or vice versa, which would be unstable from a programmers perspective.
Another issue with this type of definition is that it requires us to represent the rational exponent in its most reduced form to determine if the expression is real or not. This isn't generally a trivial representation to determine. Note that (-2)^(2/4) is imaginary, even though the fourth root of -2 squared is real.
As stated by AProgrammer, the pow function does not accept negative x in your circumstances. To get round this, you can 'remember' the sign of x, pass its positive value (magnitude) to pow, then re-apply the sign (as you've already checked that ind is an odd number in this case):
else {
float sign = x < 0.0 ? -1.0 : +1.0;
// return sign * ( pow(fabs(x), (float)1.0/ind) ); // MNC - see comments
return sign * ( pow(fabs(x), 1.0/ind) ); // BPC - maybe?
}
Feel free to ask for further clarification and/or explanation.
pow would have to return complex numbers in order to handle negative bases without a much more complex API. Their approach is fast, easy to use and accessible. There is but one simple rule: the base must be zero or positive ^^
Plus, why the cast to float?
Simply change the code to
else
{
return pow(fabs(x), 1.0 / ind) * (x < 0 ? -1 : 1);
}
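Putting the pieces together, a corrected version of the whole function might look like this; it is just one way to apply the sign trick described above, and the errore flag is kept as in the question:
#include <math.h>

extern int errore;   /* error flag, assumed to be defined elsewhere as in the question */

float rooter(int x, int ind)
{
    if (ind % 2 == 0)
    {
        if (x >= 0)
            return pow(x, 1.0 / ind);
        errore = 1;                    /* even root of a negative number is not real */
        return -1;
    }
    /* odd degree: take the root of the magnitude and restore the sign */
    float sign = (x < 0) ? -1.0f : 1.0f;
    return sign * pow(fabs((double) x), 1.0 / ind);
}

For example, rooter(-8, 3) now returns (approximately) -2 instead of a NaN.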
I am currently tightening floating-point numerics for an estimate of a value. (It's p(k,t), for those who are interested.) Essentially, the utility must never yield an under-estimate of this value: the security of probable prime generation depends on a numerically robust implementation. While output results agree with the published values, I have used the DBL_EPSILON value to ensure that division, in particular, yields a result that is never less than the true value:
Consider: double x, y; /* assigned some values... */
The evaluation: r = x / y; occurs frequently, but these (finite precision) results may truncate significant digits from the true result - a possibly infinite precision rational expansion. I currently try to mitigate this by applying a bias to the numerator, i.e.,
r = ((1.0 + DBL_EPSILON) * x) / y;
If you know anything about this subject, p(k,t) is typically much smaller than most estimates - but it's simply not good enough to dismiss the issue with this "observation". I can of course state:
(((1.0 + DBL_EPSILON) * x) / y) >= (x / y)
Of course, I need to ensure that the 'biased' result is greater than, or equal to, the 'exact' value. While I am certain it has to do with manipulating or scaling DBL_EPSILON, I obviously want the 'biased' result to exceed the 'exact' result by a minimum - demonstrable under IEEE-754 arithmetic assumptions.
Yes, I've looked though Goldberg's paper, and I've searched for a robust solution. Please don't suggest manipulation of rounding modes. Ideally, I'm after an answer by someone with a very good grasp on floating-point theorems, or knows of a very well illustrated example.
EDIT: To clarify, (((1.0 + DBL_EPSILON) * x) / y) or a form (((1.0 + c) * x) / y), is not a prerequisite. This was simply an approach I was using as 'probably good enough', without having provided a solid basis for it. I can state that the numerator and denominator will not be special values: NaNs, Infs, etc., nor will the denominator be zero.
First: I know that you don't want to set the rounding mode, but it really should be said that
in terms of precision, as others have noted, setting the rounding mode will produce as good of an answer as possible. Specifically, assuming that x and y are both positive (which seems to be the case, but hasn't been explicitly stated in your question), the following is a standard C snippet with the desired effect[1]:
#include <math.h>
#pragma STDC FENV_ACCESS on
int OldRoundingMode = fegetround();
fesetround(FE_UPWARD);
r = x/y;
fesetround(OldRoundingMode);
Now, that aside, there are legitimate reasons not to want to change the rounding mode (some platforms don't support round-to-plus-infinity, on some platforms changing the rounding mode introduces a large serializing stall, etc etc), and your desire not to do so shouldn't be brushed aside so casually. So, respecting your question, what else can we do?
If your platform supports fused multiply-add, there's a very elegant solution available to you:
#include <math.h>
r = x/y;
if (fma(r,y,-x) < 0) r = nextafter(r, INFINITY);
On platforms with hardware fma support, this is very efficient. Even if fma( ) is implemented in software, it may be acceptable. This approach has the virtue that it will deliver the same result as would changing the rounding mode; that is, the tightest bound possible.
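Wrapped up as a helper, that looks something like the following sketch (assuming positive, finite, normal x and y, as elsewhere in this answer):
#include <math.h>

/* quotient rounded toward +infinity, without touching the rounding mode */
static double div_round_up(double x, double y)
{
    double r = x / y;
    /* fma(r, y, -x) computes r*y - x with a single rounding; a negative
       result means the rounded quotient fell below the exact x/y */
    if (fma(r, y, -x) < 0.0)
        r = nextafter(r, INFINITY);
    return r;
}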
If your platform's C library is antediluvian and does not provide fma, there is still hope. Your claimed statement is correct (assuming no denormal values, at least -- I would need to think more about what happens for denormals); (1.0+DBL_EPSILON)*x/y really is always greater than or equal to the infinitely precise x/y. It will sometimes be one ulp larger than the smallest value with this property, but that's a very small and probably acceptable margin. The proof of these claims is pretty fussy, and probably not suitable for StackOverflow, but I'll give a quick sketch:
Ignoring denormals, it suffices to restrict ourselves to x, y in [1.0, 2.0).
(1.0 + eps)*x >= x + eps > x. To see this, observe:
(1.0 + eps)*x = x + x*eps >= x + eps > x.
Let P be the mathematically precise x/y. We have:
(1.0 + eps)*x/y >= (x + eps)/y = x/y + eps/y = P + eps/y
Now, y is bounded above by 2, so this gives us:
(1.0 + eps)*x/y > P + eps/2
which is sufficient to guarantee that the result rounds to a value >= P. This also shows us the way to a tighter bound. We could instead use nextafter(x,INFINITY)/y to get the desired effect with a tighter bound in many cases. (nextafter(x,INFINITY) is always x + ulp, whereas (1.0 + eps)*x will be x + 2ulp half of the time. If you want to avoid calling the nextafter library function, you can use (x + (0.75*DBL_EPSILON)*x) instead to get the same result, under the working assumption of positive normal values).
In order to be really pedantically correct, this would become significantly more complicated. No one really writes code like this, but it would be along these lines:
#include <math.h>
#pragma STDC FENV_ACCESS on
#if defined FE_UPWARD
int OldRoundingMode = fegetround();
if (OldRoundingMode < 0) goto Error;
if (fesetround(FE_UPWARD)) goto Error;
r = x/y;
if (fesetround(OldRoundingMode)) goto TrulyHosed;
return r;
TrulyHosed:
// we established the desired rounding mode and did our computation,
// but now we can't set it back to the original mode. I have no idea
// how you handle this gracefully.
Error:
#else
// we can't establish the desired rounding mode, so fall back on
// something else.
I have the following C function, used to determine whether one number is a multiple of another to an arbitrary tolerance:
#include <math.h>
#define TOLERANCE 0.0001

int IsMultipleOf(double x, double mod)
{
    return (fabs(fmod(x, mod)) < TOLERANCE);
}
It works fine, but profiling shows it to be very slow, to the extent that it has become a candidate for optimization. About 75% of the time is spent in modulo and the remaining in fabs. I'm trying to figure a way of speeding things up, using something like a look-up table. The parameter x changes regularly, whereas mod changes infrequently. The number of possible values of x is small enough that the space for a look-up would not be an issue, typically it will be one of a few hundred possible values. I can get rid of the fabs easily enough, but can't figure out a reasonable alternative to the modulo. Any ideas on how to optimize the above?
Edit The code will be running on a wide range of Windows desktop and mobile devices, hence processors could include Intel, AMD on desktop, and ARM or SH4 on mobile devices. VisualStudio 2008 is the compiler.
Do you really have to use modulo for this?
Wouldn't it be possible to just compute result = x / mod and then check if the decimal part of result is close to 0? For instance:
11 / 5.4999 = 2.000036 ==> 0.000036 < TOLERANCE
Or something like that.
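A rough sketch of that idea in code (the function name is just for illustration; note that the tolerance now applies to the fractional part of the quotient rather than to the remainder, so it is not an exact drop-in for the fmod() version):
#include <math.h>
#define TOLERANCE 0.0001

int IsNearIntegerMultiple(double x, double mod)
{
    double q = x / mod;
    double frac = fabs(q - round(q));   /* distance to the nearest integer */
    return frac < TOLERANCE;
}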
Division (floating point or not, fmod in your case) is often an operation where the execution time varies a lot depending on the cpu and compiler:
gcc has a builtin replacement for that if you give it the right compile flags or if you use __builtin_fmod explicitly. This then might map the operation to a small number of assembler instructions.
there may be special units like SSE on intel processors where this operation is implemented more efficiently
By such tricks, depending on your environment (you didn't say which), the time may vary from a few clock cycles to a few hundred. I think it is best to look into the documentation of your compiler and cpu for that particular operation.
The following is probably overkill, and sub-optimal. But for what it is worth here is one way on how to do it.
We know the format of the double ...
1 bit for the sign
11 bits for the biased exponent
52 fraction bits
Let ...
value = x / mod;
exp = exponent bits of value - BIAS;
lsb = least sig bit of value's fraction bits;
Once you have that ...
/*
* If applying the exponent would eliminate the fraction bits
* then for double precision resolution it is a multiple.
* Note: lsb may require some massaging.
*/
if (exp > lsb)
    return (true);
if (exp < 0)
    return (false);
The only case remaining is the tolerance case. Build your double so that you are getting rid of all the digits to the left of the decimal.
sign bit is zero (positive)
exponent is the BIAS (1023 I think ... look it up to be sure)
shift the fraction bits as appropriate
Now compare it against your tolerance.
I think you need to inspect the bowels of your C RTL fmod() function: x86 FPUs have FPREM/FPREM1 instructions which compute remainders by repeated subtraction.
While floating point division is a single instruction, it seems you may need to call FPREM repeatedly to get the right answer for modulus, so your RTL may not use it.
I have not tested this at all, but from the way I understand fmod, the following should be an equivalent inlined version, which might let the compiler optimize it better, though I would have thought that the compiler's math library (or builtins) would work just as well. (Also, I don't even know for sure that this is correct.)
#include <math.h>
int IsMultipleOf(double x, double mod) {
    long n = x / mod;                 // You should probably test for /0 or NAN result here
    double new_x = mod * n;
    double delta = x - new_x;
    return fabs(delta) < TOLERANCE;   // and for NAN result from fabs
}
Maybe you can get away with long long instead of double if you have comparable scale of data. For example long long would be enough for over 60 astronomical units in micrometer resolution.
Does it need to be double precision? Depending on how good your math library is, this ought to be faster:
#include <math.h>
#include <stdbool.h>
#define TOLERANCE 0.0001f

bool IsMultipleOf(float x, float mod)
{
    return (fabsf(fmodf(x, mod)) < TOLERANCE);
}
I presume modulo looks a little like this on the inside:
mod(x, m) {
    while (x > m) {
        x = x - m
    }
    return x
}
I think that through some sort of search it could be optimised, e.g.:
fastmod(x, m) {
    q = 1
    while (m * q < x) {
        q = q * 2
    }
    return mod((x - (q / 2) * m), m)
}
You might even choose to replace the final call to mod with another call to fastmod, adding the condition that if x < m then return x.