I have to calculate in C the binomial coefficients of the expression
(x+y)^n, with n very large (on the order of 500-1000). The first algorithm for binomial coefficients that came to my mind was the multiplicative formula, so I coded it into my program as
long double binomial(int k, int m)
{
    int i, j;
    long double num = 1, den = 1;
    j = m < (k - m) ? m : (k - m);
    for (i = 1; i <= j; i++)
    {
        num *= (k + 1 - i);
        den *= i;
    }
    return num / den;
}
This code is really fast on a single thread compared, for example, to the recursive formula, although the latter is less subject to rounding errors since it involves only sums and no divisions.
So I wanted to test these algorithms on large values and tried to evaluate 500 choose 250 (on the order of 10^149). I found that the relative error is less than 10^(-19), so the two results are basically the same number, even though in absolute terms they still differ by an astronomically large amount.
So I'm wondering: is there a way to estimate the order of the error of the calculation? And is there some fast way to calculate binomial coefficients which is more precise than the multiplicative formula? Since I don't know the precision of my algorithm, I don't know where to truncate Stirling's series to get better results.
I've googled for tables of binomial coefficients so I could copy from those, but the best one I've found stops at n = 100.
If you're just computing individual binomial coefficients C(n,k) with n fairly large but no larger than about 1750, then your best bet with a decent C library is to use the tgammal standard library function:
tgammal(n+1) / (tgammal(n-k+1) * tgammal(k+1))
Tested with the GNU implementation of libm, that consistently produced results within a few ULPs of the precise value, and generally better than solutions based on multiplying and dividing.
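For concreteness, here is a minimal sketch of that approach (the function name is mine); it is only valid while tgammal(n + 1) stays within long double range, i.e. n up to about 1754:

#include <math.h>

long double binomial_tgammal(unsigned n, unsigned k)
{
    return tgammal((long double)n + 1) /
           (tgammal((long double)(n - k) + 1) * tgammal((long double)k + 1));
}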
If k is small (or large) enough that the binomial coefficient does not overflow 64 bits of precision, then you can get a precise result by alternately multiplying and dividing.
If n is so large that tgammal(n+1) exceeds the range of a long double (more than 1754) but not so large that the numerator overflows, then a multiplicative solution is the best you can get without a bignum library. However, you could also use
expl(lgammal(n+1) - lgammal(n-k+1) - lgammal(k+1))
which is less precise but easier to code. (Also, if the logarithm of the coefficient is useful to you, the above formula will work over quite a large range of n and k. Not having to use expl will improve the accuracy.)
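A sketch of the log-gamma variant (again, the name is mine): it returns ln C(n, k), which stays representable for far larger n and k; call expl on it only when the coefficient itself is needed and in range:

#include <math.h>

long double log_binomial(unsigned n, unsigned k)
{
    return lgammal((long double)n + 1)
         - lgammal((long double)(n - k) + 1)
         - lgammal((long double)k + 1);
}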
If you need a range of binomial coefficients with the same value of n, then your best bet is iterative addition:
void binoms(unsigned n, long double* res) {
    // res must have (n+3)/2 elements
    res[0] = 1;
    for (unsigned i = 2, half = 0; i <= n; ++i) {
        res[half + 1] = res[half] * 2;
        for (int k = half; k > 0; --k)
            res[k] += res[k - 1];
        if (i % 2 == 0)
            ++half;
    }
}
The above produces only the coefficients with k from 0 to n/2. It has a slightly larger round-off error than the multiplicative algorithm (at least when k is getting close to n/2), but it's a lot quicker if you need all the coefficients and it has a larger range of acceptable inputs.
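As a usage sketch for the question's n = 500 case (variable names are mine), res needs (n + 3)/2 = 251 elements, and C(500, 250) then sits at index 250:

long double res[(500 + 3) / 2];    // 251 elements
binoms(500, res);
long double c_500_250 = res[250];  // C(500, 250)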
To get exact integer results for small k and m, a better solution might be this slight variation of your code:
unsigned long binomial(int k, int m)
{
    int i, j;
    unsigned long num = 1;
    j = m < (k - m) ? m : (k - m);
    for (i = 1; i <= j; i++)
    {
        num *= (k + 1 - i);
        num /= i;
    }
    return num;
}
Each time the division num /= i completes, num is again a binomial coefficient, so the division is always exact and nothing gets truncated. To get approximate results for bigger k and m, your solution might be good. But beware that long double multiplication is already much slower than multiplication and division of integers (unsigned long or size_t). If you want exact values for bigger numbers, you will probably have to write a big-integer class or include one from a library. You can also search for fast factorial algorithms for extremely big integer n; those may help with combinatorics too. Stirling's formula is a good approximation for ln(n!) when n is large. It all depends on how accurate you want to be.
If you really want to use the multiplicative formula, I would recommend an exception based approach.
Implement the formula with large integers (long long, for example)
Attempt division operations as soon as possible (as suggested by Zhuoran)
Add code to check the correctness of every division and multiplication
Resolve incorrect divisions or multiplications, e.g.
try the in-loop division proposed by Zhuoran, but if it fails, fall back to the initial algorithm (accumulating the product of the divisors in den)
store the unresolved multipliers and divisors in additional long integers and try to resolve them in later loop iterations
If you really work with large numbers then your result might not fit in a long integer. In that case you can switch to long double or use your own LongInteger storage.
This is a skeleton code, to give you an idea:
long long binomial_l(int k, int m)
{
    int i, j;
    long long num = 1, den = 1;
    j = m < (k - m) ? m : (k - m);
    for (i = 1; i <= j; i++)
    {
        int multiplier = (k + 1 - i);
        int divisor = i;
        long long candidate_num = num * multiplier;
        // check the multiplication
        if ((candidate_num / multiplier) != num)
        {
            // resolve exception...
        }
        else
        {
            num = candidate_num;
        }
        candidate_num = num / divisor;
        // check the division
        if ((candidate_num * divisor) == num)
        {
            num = candidate_num;
        }
        else
        {
            // resolve exception
            den *= divisor;
            // this multiplication should also be checked...
        }
    }
    long long candidate_result = num / den;
    if ((candidate_result * den) == num)
    {
        return candidate_result;
    }
    // you should not get here if all exceptions are resolved
    return 0;
}
This may not be what the OP is looking for, but one can analytically approximate nCr for large n with the binary entropy function. It is mentioned in:
Page 10 of http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/5.16.pdf
https://math.stackexchange.com/questions/835017/using-binary-entropy-function-to-approximate-logn-choose-k
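For illustration, a minimal sketch of that approximation (the function name is mine): log2 C(n, k) ≈ n·H2(k/n), where H2 is the binary entropy function and the absolute error grows only like O(log n):

#include <math.h>

double log2_binomial_approx(unsigned n, unsigned k)
{
    double p = (double)k / n;
    if (p == 0.0 || p == 1.0)
        return 0.0;  // C(n,0) = C(n,n) = 1
    // binary entropy H2(p) = -p*log2(p) - (1-p)*log2(1-p)
    return n * (-p * log2(p) - (1 - p) * log2(1 - p));
}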
Related
I am doing a contest problem. Attaching an excerpt here -
Find the maximum profit Chef can make if he sells his cars in an optimal way. Since this number may be large, compute it modulo 1,000,000,007 (10^9+7).
Does this simply mean I have to find the remainder of the final profit on dividing by 1000000007? Pardon me for this simple question, the language is not clear.
In the context of this remark, 10^9+7 is meant to be read as (10^9) + 7, which is 1000000007.
Very large numbers will exceed the range of integer types, so you are asked to compute the result modulo 1000000007 (i.e., compute the remainder of the division by 1000000007). This can be achieved by reducing the intermediary results modulo 1000000007 whenever they exceed or equal this value, as long as the final result is obtained only by additions and multiplications. Modular arithmetic has many more interesting properties: pupils are taught a simple way to check arithmetic operations by computing modulo 9, adding up the digits.
For example, you can compute factorials modulo 1000000007 this way:
long factorial_mod(int n) {
    long res = 1;
    for (int i = 2; i <= n; i++) {
        res = res * (long long)i % 1000000007;
    }
    return res;
}
I have this code that calculates a guess for sine and compares it to the standard C library's (glibc's in my case) result:
#include <stdio.h>
#include <math.h>

double double_sin(double a)
{
    a -= (a * a * a) / 6;
    return a;
}

int main(void)
{
    double clib_sin = sin(.13),
           my_sin = double_sin(.13);
    printf("%.16f\n%.16f\n%.16f\n", clib_sin, my_sin, clib_sin - my_sin);
    return 0;
}
The accuracy for double_sin is poor (about 5-6 digits). Here's my output:
0.1296341426196949
0.1296338333333333
0.0000003092863615
As you can see, after .12963, the results differ.
Some notes:
I don't think the Taylor series will work for this specific situation; the factorials required for greater accuracy cannot be stored in an unsigned long long.
Lookup tables are not an option, they take up too much space and generally don't provide any information on how to calculate the result.
If you use magic numbers, please explain them (although I would prefer if they were not used).
I would greatly prefer an algorithm that is easily understandable and can be used as a reference over one that is not.
The result does not have to be perfectly accurate. A minimum would be the requirements of IEEE 754, C, and/or POSIX.
I'm using the IEEE-754 double format, which can be relied on.
The range supported needs to be at least from -2*M_PI to 2*M_PI. It would be nice if range reduction were included.
What is a more accurate algorithm I can use to calculate the sine of a number?
I had an idea about something similar to Newton-Raphson, but for calculating sine instead. However, I couldn't find anything on it and am ruling this possibility out.
You can actually get pretty close with the Taylor series. The trick is not to calculate the full factorial on each iteration.
The Taylor series looks like this:
sin(x) = x^1/1! - x^3/3! + x^5/5! - x^7/7! + ...
Looking at the terms, you calculate the next term by multiplying the numerator by x^2, multiplying the denominator by the next two numbers in the factorial, and switching the sign. Then you stop when adding the next term doesn't change the result.
So you could code it like this:
double double_sin(double x)
{
    double result = 0;
    double factor = x;
    int i;
    for (i = 2; result + factor != result; i += 2) {
        result += factor;
        factor *= -(x * x) / (i * (i + 1));
    }
    return result;
}
My output:
0.1296341426196949
0.1296341426196949
-0.0000000000000000
EDIT:
The accuracy can be increased further if the terms are added in reverse order; however, this means computing a fixed number of terms:
#define FACTORS 30

double double_sin(double x)
{
    double result = 0;
    double factor = x;
    int i, j;
    double factors[FACTORS];

    for (i = 2, j = 0; j < FACTORS; i += 2, j++) {
        factors[j] = factor;
        factor *= -(x * x) / (i * (i + 1));
    }
    for (j = FACTORS - 1; j >= 0; j--) {
        result += factors[j];
    }
    return result;
}
This implementation loses accuracy if x falls outside the range of 0 to 2*PI. This can be fixed by calling x = fmod(x, 2*M_PI); at the start of the function to normalize the value.
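For completeness, a sketch of the first version with that normalization folded in (assuming M_PI is available from math.h):

#include <math.h>

double double_sin_reduced(double x)
{
    x = fmod(x, 2 * M_PI);  // normalize into (-2*PI, 2*PI)
    double result = 0;
    double factor = x;
    for (int i = 2; result + factor != result; i += 2) {
        result += factor;
        factor *= -(x * x) / (i * (i + 1));
    }
    return result;
}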
I have to implement an algorithm to evaluate a sum of Poisson terms, each one weighted by a constant:

S = sum_{k=0}^{cut} C(k) * lambda^k * exp(-lambda) / k!

where the C(k) are positive constants < 1, cut is a cutoff (in principle the sum runs over infinitely many k), and lambda is a number that in my case may vary from 20 to 100. I've tried a straightforward implementation in my code:
#include <quadmath.h>
... // Definitions of lambda and c[k] ...

long double sum = 0;
for (int k = 0; k < cutoff; ++k)
{
    sum = sum + c[k] * powq(lambda, k) / tgammaq(k + 1) * (1.0 / expq(lambda));
}
But I am not quite satisfied. I've searched Numerical Recipes for a good approach to evaluating a Poisson distribution, but I didn't find anything about it.
Are there better ways to do this?
Edit: to be clear, I'm looking for the most precise way to approximate the probability of large events, given a Poisson distribution, without computing awkward lambda^k / k! factors.
Well, a simple improvement is to compute lambda^k and k! incrementally and cache them, since their values from the previous iteration give the respective current ones with an O(1) calculation.
Also, since 1.0/exp(lambda) is a constant, you should calculate it once in advance:
#include <quadmath.h>
... // Definitions of lambda and c[k] ...

const long double e_coeff = 1.0 / expq(lambda);
long double inv_k_factorial = 1.0L;
long double lambda_pow_k = 1.0L;
long double sum = 0.0L;
for (int k = 0; k < cutoff; ++k)
{
    // term k is c[k] * lambda^k / k!; update the cached power and factorial afterwards
    sum += c[k] * lambda_pow_k * inv_k_factorial;
    lambda_pow_k *= lambda;
    inv_k_factorial /= (k + 1);
}
sum *= e_coeff;
So now the three function calls and their respective overhead are completely gone from your loop.
Now, I've kept the same data types you used in your question. Since your comment indicates that lambda is greater than 1.0, lambda_pow_k grows rather than vanishing, so (I believe) there is no relative-error growth from a quickly diminishing factor; any significance lost here depends on the limits of long double, which may or may not be acceptable depending on your concrete needs.
Compilers are clever nowadays, so the original loop might get optimized like this anyway, but it's best not to rely on less obvious optimizations: this version shouldn't suffer in performance even when handed to a non-optimizing compiler.
Since the Poisson probabilities obey the simple recurrence
P(k,lam) = lam/k * P(k-1,lam)
one possibility would be to use something like Horner's rule for polynomials. That is:
Sum_k C[k]*P(k,lam) = exp(-lam) * (C[0] + (lam/1)*(C[1] + (lam/2)*(C[2] + ...)))
or
P = c[cut]
for k = cut-1 .. 0
    P = P*lam/(k+1) + C[k]
P *= exp(-lam)
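A minimal C rendering of that pseudocode (names are mine; it assumes c[] holds the constants and cut is the index of the last term included in the sum):

#include <math.h>

long double poisson_weighted_sum(const long double *c, int cut, long double lam)
{
    long double p = c[cut];
    for (int k = cut - 1; k >= 0; --k)
        p = p * lam / (k + 1) + c[k];  // Horner step: P = P*lam/(k+1) + C[k]
    return p * expl(-lam);
}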
I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory.
Example:
A 640 bit number has 640 digits in base 2, but only ten digits in base 2^64, so I will have to allocate ten 64 bit integers to hold the result.
The function I am currently using is:
int get_num_digits_in_different_base(int n_digits, double src_base, double dst_base) {
    return ceil(n_digits * log(src_base) / log(dst_base));
}
Where src_base is in {2, ..., 10 + 26} and dst_base is in {2^8, 2^16, 2^32, 2^64}.
I am not sure if the result will always be correctly rounded though. log2 would be easier to reason about, but I read that older versions of Microsoft Visual C++ do not support that function. It could be emulated like log2(x) = log(x)/log(2) but now I am back where I started.
GMP probably implements a function to do base conversion, but I may not read the source (or else I risk GPL contamination), so I cannot do that.
I imagine speed is of some concern, or else you could just try the floating point-based estimate and adjust if it turned out to be too small. In that case, one can sacrifice tightness of the estimate for speed.
In the following, let dst_base be 2^w, src_base be b, and n_digits be n.
Let k(b,w) = max {j | b^j < 2^w}. This represents the largest power of b that is guaranteed to fit within a w-bit binary (non-negative) integer. Because of the relatively small number of source and destination bases, these values can be precomputed and looked up in a table, but mathematically k(b,w) = [w log 2 / log b] (where [.] denotes the integer part).
For a given n let m=ceil( n / k(b,w) ). Then the maximum number of dst_base digits required to hold a number less than b^n is:
ceil(log(b^n - 1) / log(2^w)) ≤ ceil(log(b^n) / log(2^w))
                              ≤ ceil(m · log(b^k(b,w)) / log(2^w)) ≤ m.
In short, if you precalculate the k(b,w) values, you can quickly get an upper bound (which is not tight!) by dividing n by k, rounding up.
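A sketch of that scheme for w = 64, with names of my choosing: k(b, 64) is computed once per base using pure integer arithmetic, and the bound is then a single rounded-up division:

#include <stdint.h>

// largest j such that b^j < 2^64 (strict, so slightly conservative for powers of 2)
static unsigned k_for_base_u64(unsigned b)
{
    unsigned k = 0;
    uint64_t p = 1;
    while (p <= UINT64_MAX / b) {  // next power still fits below 2^64
        p *= b;
        ++k;
    }
    return k;
}

// upper bound on the number of 64-bit words for an n-digit base-b number
unsigned words_needed(unsigned n_digits, unsigned src_base)
{
    unsigned k = k_for_base_u64(src_base);
    return (n_digits + k - 1) / k;  // ceil(n_digits / k)
}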
I'm not sure about floating-point rounding in this case, but it is relatively easy to implement this using only integers, as ceil(log2) is a classic bit-manipulation pattern and integer division can easily be rounded up. The following code is equivalent to yours, but uses integers:
// Returns ceil(log2(x)) using bit manipulation (not the most efficient way);
// named ceil_log2 here to avoid clashing with log2 from <math.h>
unsigned int ceil_log2(unsigned int x)
{
    unsigned int y = 0;
    --x;
    while (x) {
        y++;
        x >>= 1;
    }
    return y;
}

// Returns ceil(a/b) using integer division
unsigned int roundup(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

unsigned int get_num_digits_in_different_base(unsigned int n_digits, unsigned int src_base, unsigned int log2_dst_base)
{
    return roundup(n_digits * ceil_log2(src_base), log2_dst_base);
}
Please note that:
This function returns different results compared to yours! However, in every case I looked at, both were still correct (the smaller value was more accurate, but your requirement is just an upper bound).
The integer version I wrote receives log2_dst_base instead of dst_base to avoid overflow for 2^64.
ceil_log2 can be made more efficient using lookup tables.
I've used unsigned int instead of int.
For the classic interview question "How do you perform integer multiplication without the multiplication operator?", the easiest answer is, of course, the following linear-time algorithm in C:
int mult(int multiplicand, int multiplier)
{
    int result = 0;
    for (int i = 0; i < multiplier; i++)
    {
        result += multiplicand;
    }
    return result;
}
Of course, there is a faster algorithm. If we take advantage of the property that bit shifting to the left is equivalent to multiplying by 2 to the power of the number of bits shifted, we can bit-shift up to the nearest power of 2, and use our previous algorithm to add up from there. So, our code would now look something like this:
#include <math.h>

// floor(log2(n)); named ilog2 to avoid clashing with log2 from <math.h>
int ilog2(double n)
{
    return (int)(log(n) / log(2));
}

int mult(int multiplicand, int multiplier)
{
    int shift = ilog2(multiplier);
    int nearest_power = 1 << shift;      // 2^shift; note '^' in C is XOR, not exponentiation
    int result = multiplicand << shift;  // multiplicand * nearest_power
    for (int i = nearest_power; i < multiplier; i++)
    {
        result += multiplicand;
    }
    return result;
}
I'm having trouble determining what the time complexity of this algorithm is. I don't believe that O(n - 2^(floor(log2(n)))) is the correct way to express this, although (I think?) it's technically correct. Can anyone provide some insight on this?
multiplier - nearest_power can be as large as half of multiplier, and as it tends towards infinity the constant 0.5 there doesn't matter (not to mention we get rid of constants in Big O). The loop is therefore O(multiplier). I'm not sure about the bit-shifting.
Edit: I took more of a look around on the bit-shifting. As gbulmer says, it can be O(n), where n is the number of bits shifted. However, it can also be O(1) on certain architectures. See: Is bit shifting O(1) or O(n)?
However, it doesn't matter in this case! n > log2(n) for all valid n. So we have O(n) + O(multiplier) which is a subset of O(2*multiplier) due to the aforementioned relationship, and thus the whole algorithm is O(multiplier).
The point of finding the nearest power is so that your function's runtime can get close to O(1). This happens when nearest_power is very close to multiplier.
Behind the scenes the whole "to the power of 2" is done with bit shifting.
So, to answer your question, the second version of your code is still worst-case linear time: O(multiplier).
Your answer, O(n - 2^(floor(log2(n)))), is also not incorrect; it's just very precise and might be hard to do in your head quickly to find the bounds.
Edit
Let's look at the second posted algorithm, starting with:
int nearest_power = 1 << ilog2(multiplier);
I believe calculating log2 is, rather pleasingly, O(log2(multiplier)).
Then nearest_power lands in the interval [multiplier/2, multiplier]; the width of that interval is multiplier/2. Finding it is the same as finding the highest set bit of a positive number.
So the for loop is O(multiplier/2), the constant of 1/2 comes out, so it is O(n)
On average, it is half the interval away, which would be O(multiplier/4). But that is just the constant 1/4 * n, so it is still O(n), the constant is smaller but it is still O(n).
A faster algorithm.
Our intuition is that we can multiply by an n-digit number in n steps.
In binary this means using a 1-bit shift, a 1-bit test and a binary add per digit to construct the whole answer. Each of those operations is O(1). This is long multiplication, one digit at a time.
If we use O(1) operations per bit of n, an x-bit number, the algorithm is O(log2(n)), i.e. O(x), where x is the number of bits in the number.
This is an O(log2(n)) algorithm:
int mult(int multiplicand, int multiplier) {
    int product = 0;
    while (multiplier) {
        if (multiplier & 1) product += multiplicand;
        multiplicand <<= 1;
        multiplier >>= 1;
    }
    return product;
}
It is essentially how we do long multiplication.
Of course, the wise thing to do is use the smaller number as the multiplier. (I'll leave that as an exercise for the reader :-)
This only works for positive values, but by testing and remembering the signs of the input, operating on positive values, and then adjusting the sign, it works for all numbers.
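A sketch of that sign handling around the shift-and-add mult() above (the wrapper name is mine; note abs(INT_MIN) is still undefined, so this is purely illustrative):

#include <stdlib.h>

int mult_signed(int a, int b)
{
    int negative = (a < 0) != (b < 0);  // result is negative iff signs differ
    int p = mult(abs(a), abs(b));
    return negative ? -p : p;
}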