I am doing a contest problem. Attaching an excerpt here -
Find the maximum profit Chef can make if he sells his cars in an optimal way. Since this number may be large, compute it modulo 1,000,000,007 (10^9+7).
Does this simply mean I have to find the remainder of the final profit on dividing by 1000000007? Pardon me for this simple question, the language is not clear.
In the context of this remark, 10^9+7 is meant to be read as 10 to the power 9, plus 7, which is 1000000007.
Very large numbers will exceed the range of integer types, so you are asked to compute the result modulo 1000000007 (i.e. the remainder of the division by 1000000007). This can be achieved by reducing the intermediate results modulo 1000000007 whenever they reach or exceed this value, as long as the final result is obtained by additions and multiplications. Modular arithmetic has many more interesting properties; for instance, pupils are taught a simple way to check arithmetic operations by computing modulo 9, which amounts to adding the digits.
For example, you can compute factorials modulo 1000000007 this way:
long factorial_mod(int n) {
long res = 1;
for (int i = 2; i <= n; i++) {
res = res * (long long)i % 1000000007;
}
return res;
}
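The same reduction works for any mix of additions and multiplications. Below is a minimal sketch, assuming non-negative inputs; the helper names `add_mod`, `mul_mod` and `sum_of_products_mod` are made up for this illustration and do not come from the contest statement.

```c
#include <stdint.h>
#include <stddef.h>

#define MOD 1000000007ULL

/* (a + b) mod MOD, for a and b already reduced below MOD */
static uint64_t add_mod(uint64_t a, uint64_t b) {
    return (a + b) % MOD;
}

/* (a * b) mod MOD; both operands are reduced first so the product fits in 64 bits */
static uint64_t mul_mod(uint64_t a, uint64_t b) {
    return (a % MOD) * (b % MOD) % MOD;
}

/* sum of price[i] * count[i], reduced after every step */
static uint64_t sum_of_products_mod(const uint64_t *price, const uint64_t *count, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total = add_mod(total, mul_mod(price[i], count[i]));
    return total;
}
```

Because every intermediate value stays below 1000000007, the largest product formed is below 2^60 and never overflows a 64-bit integer.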
I have to calculate in C the binomial coefficients of the expression
(x+y)**n, with n very large (order of 500-1000). The first algorithm to calculate binomial coefficients that came to my mind was the multiplicative formula. So I coded it into my program as
long double binomial(int k, int m)
{
int i,j;
long double num=1, den=1;
j=m<(k-m)?m:(k-m);
for(i=1;i<=j;i++)
{
num*=(k+1-i);
den*=i;
}
return num/den;
}
This code is really fast on a single core thread, compared for example to the recursive formula, although the latter is less subject to rounding errors since it involves only sums and no divisions.
So I wanted to test these algorithms for large values and tried to evaluate 500 choose 250 (order 10^149). I have found that the "relative error" is less than 10^(-19), so basically they are the same number, although they differ by something like 10^130.
So I'm wondering: is there a way to estimate the order of the error of the calculation? And is there some fast way to calculate binomial coefficients which is more precise than the multiplicative formula? Since I don't know the precision of my algorithm, I don't know where to truncate Stirling's series to get better results.
I've googled for some tables of binomial coefficients so I could copy from those, but the best one I've found stops at n=100...
If you're just computing individual binomial coefficients C(n,k) with n fairly large but no larger than about 1750, then your best bet with a decent C library is to use the tgammal standard library function:
tgammal(n+1) / (tgammal(n-k+1) * tgammal(k+1))
Tested with the Gnu implementation of libm, that consistently produced results within a few ULP of the precise value, and generally better than solutions based on multiplying and dividing.
If k is small (or large) enough that the binomial coefficient does not overflow 64 bits of precision, then you can get a precise result by alternately multiplying and dividing.
If n is so large that tgammal(n+1) exceeds the range of a long double (more than 1754) but not so large that the numerator overflows, then a multiplicative solution is the best you can get without a bignum library. However, you could also use
expl(lgammal(n+1) - lgammal(n-k+1) - lgammal(k+1))
which is less precise but easier to code. (Also, if the logarithm of the coefficient is useful to you, the above formula will work over quite a large range of n and k. Not having to use expl will improve the accuracy.)
If you need a range of binomial coefficients with the same value of n, then your best bet is iterative addition:
void binoms(unsigned n, long double* res) {
// res must have (n+3)/2 elements
res[0] = 1;
for (unsigned i = 2, half = 0; i <= n; ++i) {
res[half + 1] = res[half] * 2;
for (int k = half; k > 0; --k)
res[k] += res[k-1];
if (i % 2 == 0)
++half;
}
}
The above produces only the coefficients with k from 0 to n/2. It has a slightly larger round-off error than the multiplicative algorithm (at least when k is getting close to n/2), but it's a lot quicker if you need all the coefficients and it has a larger range of acceptable inputs.
To get exact integer results for small k and m, a better solution might be (a slight variation of your code) :
unsigned long binomial(int k, int m)
{
int i,j; unsigned long num=1;
j=m<(k-m)?m:(k-m);
for(i=1;i<=j;i++)
{
num*=(k+1-i);
num/=i;
}
return num;
}
After each division num/=i, num is itself a binomial coefficient (k choose i), so the division is always exact and nothing gets truncated. To get approximate results for bigger k and m, your solution might be good. But beware that long double multiplication is already much slower than the multiplication and division of integers (unsigned long or size_t). If you want exact results for bigger numbers, you will probably need a big integer class, either coded yourself or included from a library. You can also search for fast factorial algorithms for n! with extremely large n; that may help with combinatorics, too. Stirling's formula is a good approximation for ln(n!) when n is large. It all depends on how accurate you want to be.
If you really want to use the multiplicative formula, I would recommend an exception based approach.
1. Implement the formula with large integers (long long for example).
2. Attempt division operations as soon as possible (as suggested by Zhuoran).
3. Add code to check the correctness of every division and multiplication.
4. Resolve incorrect divisions or multiplications, e.g.
   - try the division in the loop proposed by Zhuoran, but if it fails fall back to the initial algorithm (accumulating the product of the divisors in den)
   - store the unresolved multipliers and divisors in additional long integers and try to resolve them in later loop iterations
If you really use large numbers then your result might not fit in a long integer. In that case you can switch to long double or use your own LongInteger storage.
This is a skeleton code, to give you an idea:
long long binomial_l(int k, int m)
{
int i,j;
long long num=1, den=1;
j=m<(k-m)?m:(k-m);
for(i=1;i<=j;i++)
{
int multiplier=(k+1-i);
int divisor=i;
long long candidate_num=num*multiplier;
//check multiplication
if((candidate_num/multiplier)!=num)
{
//resolve exception...
}
else
{
num=candidate_num;
}
candidate_num=num/divisor;
//check division
if((candidate_num*divisor)==num)
{
num=candidate_num;
}
else
{
//resolve exception
den*=divisor;
//this multiplication should also be checked...
}
}
long long candidate_result= num/den;
if((candidate_result*den)==num)
{
return candidate_result;
}
// you should not get here if all exceptions are resolved
return 0;
}
This may not be what OP is looking for, but one can analytically approximate nCr for large n with binary entropy function. It is mentioned in
Page 10 of http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/5.16.pdf
https://math.stackexchange.com/questions/835017/using-binary-entropy-function-to-approximate-logn-choose-k
I am running a bunch of physical simulations in which I need random numbers. I'm using the standard rand() function in C++.
So it works like this: first I precalculate a bunch of probabilities of the form 1/(1+exp(a)), for a set of different values of a. They're of type double, as returned by the exp function in the math library. Then things must happen with those probabilities (there are only two of them), so I generate a random number uniformly distributed between 0 and 1 and compare it with those precalculated probabilities. To do that, I used:
double p = double(rand()%101)/100.0;
so I'm given random values between 0 and 1, both included. This didn't yield correct physical results. I tried this:
double p = double(rand()%1000001)/1000000.0;
And this worked. I don't really understand why so I would like some criteria about how to do it. My intuition tells that if I do
double p = double(rand()%(N+1))/double(N);
with N big enough that the smallest step (1/N) is much smaller than the smallest probability 1/(1+exp(a)), then I will be getting realistic random numbers.
I would like to understand why, though.
rand() returns a random number between 0 and RAND_MAX.
Therefore you need this:
double p = double(rand()) / double(RAND_MAX);
Also run this snippet and you will understand:
int i;
for (i = 1; i < 30; i++)
{
int rnd = rand();
double p0 = double(rnd % 101) / 100.0;
double p1 = double(rnd % 1000001) / 1000000.0;
printf ("%d\t%f\t%f\n", rnd, p0, p1);
}
for (i = 1; i < 30; i++)
{
int rnd = rand();
double p0 = double(rnd) / double(RAND_MAX);
printf ("%d\t%f\n", rnd, p0);
}
You have multiple problems.
1. rand() isn't very random at all. On almost all operating systems it returns badly distributed, horribly biased numbers. It's actually quite hard to find a good random number generator, but I can guarantee you that rand() will be among the worst you can find.
2. rand() % N gives a biased distribution. Think about the pigeonhole principle. Let's simplify it: assume that rand returns numbers in [0,8) and your N is 6. 0 to 5 map to 0 to 5, 6 maps to 0 and 7 maps to 1, meaning that 0 and 1 are twice as likely to come out.
3. Converting the numbers to double before division does not remove the bias from 2, it just makes it less visible. The pigeonhole principle applies regardless of the conversions you do.
4. Converting a well-distributed random number from integer to float/double is harder than it looks. Simple division ignores the problems of how floating point math works.
I can't help you much with 1, you need to do research. Look around the net for random number libraries. If you want something very random and unpredictable you need to look for cryptographic random libraries. If you want a repeatable but good random number Mersenne Twister should probably be good enough. But you need to do the research here.
For 2 and 3 there are standard solutions. You are mapping a set of M elements onto N elements, and rand % N is only unbiased if N divides M. Since on most systems M will be a power of two, it means that N also has to be a power of two. So, assuming that M is a power of two, the algorithm is: find the nearest power of 2 higher than or equal to N, let's call it P. Generate randomness_source() % P. If the number is greater than or equal to N, throw it away and try again. This is the only safe way to do this. Cleverer people than you and me have spent years on this problem, there's no better way to remove the bias.
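The rejection step can be sketched as follows. This is only an illustration: it uses rand() as the randomness source for brevity (so problem 1 still applies), assumes 0 < n <= RAND_MAX and that RAND_MAX + 1 is a power of two, and the name `uniform_below` is made up.

```c
#include <stdlib.h>

/* Unbiased integer in [0, n), assuming 0 < n <= RAND_MAX
   and that RAND_MAX + 1 is a power of two. */
static unsigned uniform_below(unsigned n) {
    unsigned p = 1;                 /* smallest power of two >= n */
    while (p < n)
        p <<= 1;
    unsigned r;
    do {
        r = (unsigned)rand() % p;   /* p divides RAND_MAX + 1, so this draw is unbiased */
    } while (r >= n);               /* reject values outside [0, n) and retry */
    return r;
}
```

Since p is less than twice n, each draw is accepted with probability above one half, so the loop needs fewer than two iterations on average.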
For 4, you can probably ignore the problem and just divide, in an absolute majority of cases this should be good enough. If you really want to study the problem, I've done some work on it and published the code on github. There I go through some basic principles of how floating point numbers work and how it relates to generating random numbers.
// produces pseudorandom bits. These are NOT crypto quality bits. Has the same underlying unpredictability as uncooked
// rand() output. It buffers rand() bits to produce a more convenient zero-to-the-argument range including negative
// arguments, corrects for the toward-zero bias of the modular construction I'd be using otherwise, eliminates the
// RAND_MAX range limitation, (use INT64_MAX instead) and effectively obscures biases and sequence telltales due to
// annoyingly bad rand libraries. It does not correct these biases; anyone tracking the arguments and outputs has
// enough information to reconstruct the rand() output and detect them. But it makes the relationships drastically more complicated.
// needs stdint, stdlib.
int64_t privaterandom(int64_t range, int reset){
static uint64_t state = 0;
int64_t retval;
if (reset != 0){
srand((unsigned int)range);
state = (uint64_t)range;
}
if (range == 0) return (0);
if (range < 0) return -privaterandom(-range, 0);
if (range > UINT64_MAX/0xFFFFFFFF){
retval = (privaterandom(range/0xFFFFFFFF, 0) * 0xFFFFFFFF); // order of operations matters
return (retval + privaterandom(0xFFFFFFFF, 0));
}
while (state < UINT64_MAX / 0xFF){
state *= RAND_MAX;
state += rand();
}
retval = (state % range);
// makes "pigeonhole" bias alternate unpredictably between toward-even and toward-odd
if ((state/range > (state - (retval) )/ range) && state % 2 == 0) retval++;
state /= range;
return retval;
}
int64_t Random(int64_t range){ return (privaterandom(range, 0));}
int64_t Random_Init(int64_t seed){return (privaterandom(seed, 1));}
I have a loop like this:
for(uint64_t i=0; i*i<n; i++) {
This requires doing a multiplication every iteration. If I could calculate the sqrt before the loop then I could avoid this.
unsigned cut = sqrt(n);
for(uint64_t i=0; i<cut; i++) {
In my case it's okay if the sqrt function rounds up to the next integer but it's not okay if it rounds down.
My question is: is the sqrt function accurate enough to do this for all cases?
Edit: Let me list some cases. If n is a perfect square so that n = y^2 my question would be - is cut=sqrt(n)>=y for all n? If cut=y-1 then there is a problem. E.g. if n = 120 and cut = 10 it's okay but if n=121 (11^2) and cut is still 10 then it won't work.
My first concern was the fractional part of float only has 23 bits and double 52 so they can't store all the digits of some 32-bit or 64-bit integers. However, I don't think this is a problem. Let's assume we want the sqrt of some number y but we can't store all the digits of y. If we let the fraction of y we can store be x we can write y = x + dx then we want to make sure that whatever dx we choose does not move us to the next integer.
sqrt(x+dx) < sqrt(x) + 1 //solve
dx < 2*sqrt(x) + 1
// e.g for x = 100 dx < 21
// sqrt(100+20) < sqrt(100) + 1
Float can store 23 bits so we let y = 2^23 + 2^9. This is more than sufficient since 2^9 < 2*sqrt(2^23) + 1. It's easy to show this for double as well with 64-bit integers. So although they can't store all the digits, as long as the sqrt of what they can store is accurate then the sqrt(fraction) should be sufficient. Now let's look at what happens for integers close to UINT_MAX and the sqrt:
unsigned xi = -1-1;
printf("%u %u\n", xi, (unsigned)(float)xi); //4294967294 4294967295
printf("%u %u\n", (unsigned)sqrt(xi), (unsigned)sqrtf(xi)); //65535 65536
Since float can't store all the digits of 2^32-2 and double can, they get different results for the sqrt. But the float version of the sqrt is one integer larger. This is what I want. For 64-bit integers, as long as the sqrt of the double always rounds up it's okay.
First, integer multiplication is really quite cheap. So long as you have more than a few cycles of work per loop iteration and one spare execute slot, it should be entirely hidden by reorder on most non-tiny processors.
If you did have a processor with dramatically slow integer multiply, a truly clever compiler might transform your loop to:
for (uint64_t i = 0, j = 0; j < n; j += 2*i+1, i++)
replacing the multiply with an lea or a shift and two adds.
Those notes aside, let’s look at your question as stated. No, you can’t just use i < sqrt(n). Counter-example: n = 0x20000000000000. Assuming adherence to IEEE-754, you will have cut = 0x5a82799, and cut*cut is 0x1ffffff8eff971.
However, a basic floating-point error analysis shows that the error in computing sqrt(n) (before conversion to integer) is bounded by 3/4 of an ULP. So you can safely use:
uint32_t cut = sqrt(n) + 1;
and you’ll perform at most one extra loop iteration, which is probably acceptable. If you want to be totally precise, instead use:
uint32_t cut = sqrt(n);
cut += (uint64_t)cut*cut < n;
Edit: z boson clarifies that for his purposes, this only matters when n is an exact square (otherwise, getting a value of cut that is “too small by one” is acceptable). In that case, there is no need for the adjustment and one can safely just use:
uint32_t cut = sqrt(n);
Why is this true? It’s pretty simple to see, actually. Converting n to double introduces a perturbation:
double_n = n*(1 + e)
which satisfies |e| < 2^-53. The mathematical square root of this value can be expanded as follows:
square_root(double_n) = square_root(n)*square_root(1+e)
Now, since n is assumed to be a perfect square with at most 64 bits, square_root(n) is an exact integer with at most 32 bits, and is the mathematically precise value that we hope to compute. To analyze the square_root(1+e) term, use a Taylor series about 1:
square_root(1+e) = 1 + e/2 + O(e^2)
= 1 + d with |d| <~ 2^-54
Thus, the mathematically exact value square_root(double_n) is less than half an ULP away from[1] the desired exact answer, and necessarily rounds to that value.
[1] I’m being fast and loose here in my abuse of relative error estimates, where the relative size of an ULP actually varies across a binade — I’m trying to give a bit of the flavor of the proof without getting too bogged down in details. This can all be made perfectly rigorous, it just gets to be a bit wordy for Stack Overflow.
My whole answer is useless if you have access to IEEE 754 double precision floating point, since Stephen Canon demonstrated both
- a simple way to avoid imul in the loop
- a simple way to compute the ceiling sqrt
Otherwise, if for some reason you have a non IEEE 754 compliant platform, or only single precision, you could get the integer part of square root with a simple Newton-Raphson loop. For example in Squeak Smalltalk we have this method in Integer:
sqrtFloor
"Return the integer part of the square root of self"
| guess delta |
guess := 1 bitShift: (self highBit + 1) // 2.
[
delta := (guess squared - self) // (guess + guess).
delta = 0 ] whileFalse: [
guess := guess - delta ].
^guess - 1
Where // is operator for quotient of integer division.
Final guard guess*guess <= self ifTrue: [^guess]. can be avoided if initial guess is fed in excess of exact solution as is the case here.
Initializing with an approximate floating-point sqrt was not an option there, because the integers are arbitrarily large and might overflow.
But here, you could seed the initial guess with floating point sqrt approximation, and my bet is that the exact solution will be found in very few loops. In C that would be:
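uint32_t sqrtFloor(uint64_t n)
{
    int64_t diff;
    int64_t delta;
    uint64_t guess=sqrt(n); /* implicit conversions here... */
    if( guess == 0 ) return 0; /* n == 0: avoid dividing by zero below */
    /* divide as signed values, otherwise a negative diff wraps to a huge unsigned number */
    while( (delta = (diff=(int64_t)(guess*guess-n)) / (int64_t)(guess+guess)) != 0 )
        guess -= delta;
    return guess-(diff>0);
}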
That's a few integer multiplications and divisions, but outside the main loop.
What you are looking for is a way to calculate a rational upper bound of the square root of a natural number. A continued fraction is what you need; see Wikipedia.
For x>0, there is
sqrt(x) = 1 + (x-1)/(2 + (x-1)/(2 + (x-1)/(2 + ...))).
To make the notation more compact, rewriting the above formula as
Truncate the continued fraction by removing the tail term (x-1)/2's at each recursion depth, one gets a sequence of approximations of sqrt(x) as below:
Upper bounds appear at lines with odd line numbers, and get tighter. When the distance between an upper bound and its neighboring lower bound is less than 1, that approximation is what you need. Using that value as cut (which must then be a floating-point number) solves the problem.
For very large numbers, rational numbers should be used, so that no precision is lost in conversions between integers and floating-point numbers.
For the classic interview question "How do you perform integer multiplication without the multiplication operator?", the easiest answer is, of course, the following linear-time algorithm in C:
int mult(int multiplicand, int multiplier)
{
    int product = 0;
    /* add multiplicand to the running total, multiplier times */
    for (int i = 0; i < multiplier; i++)
    {
        product += multiplicand;
    }
    return product;
}
Of course, there is a faster algorithm. If we take advantage of the property that bit shifting to the left is equivalent to multiplying by 2 to the power of the number of bits shifted, we can bit-shift up to the nearest power of 2, and use our previous algorithm to add up from there. So, our code would now look something like this:
#include <math.h>
int mult(int multiplicand, int multiplier)
{
    /* nearest_power = 2 ^ (floor(log2(multiplier))); in C, ^ is XOR, so use a shift.
       Assumes multiplier >= 1. */
    int shift = (int)floor(log2(multiplier));
    int nearest_power = 1 << shift;
    int product = multiplicand << shift; /* multiplicand * nearest_power */
    for (int i = nearest_power; i < multiplier; i++)
    {
        product += multiplicand;
    }
    return product;
}
I'm having trouble determining what the time complexity of this algorithm is. I don't believe that O(n - 2^(floor(log2(n)))) is the correct way to express this, although (I think?) it's technically correct. Can anyone provide some insight on this?
multiplier - nearest_power can be as large as half of multiplier, and as it tends towards infinity the constant 0.5 there doesn't matter (not to mention we get rid of constants in Big O). The loop is therefore O(multiplier). I'm not sure about the bit-shifting.
Edit: I took more of a look around on the bit-shifting. As gbulmer says, it can be O(n), where n is the number of bits shifted. However, it can also be O(1) on certain architectures. See: Is bit shifting O(1) or O(n)?
However, it doesn't matter in this case! n > log2(n) for all valid n. So we have O(n) + O(multiplier) which is a subset of O(2*multiplier) due to the aforementioned relationship, and thus the whole algorithm is O(multiplier).
The point of finding the nearest power is so that your function runtime could get close to runtime O(1). This happens when 2^nearest_power is very close to the result of your addition.
Behind the scenes the whole "to the power of 2" is done with bit shifting.
So, to answer your question, the second version of your code is still worst-case linear time: O(multiplier).
Your answer, O(n - 2^(floor(log2(n)))), is also not incorrect; it's just very precise and might be hard to do in your head quickly to find the bounds.
Edit
Let's look at the second posted algorithm, starting with:
int nearest_power = 2 ^ (floor(log2(multiplier)));
I believe calculating log2, is, rather pleasingly, O(log2(multiplier))
then nearest_power gets to the interval [multiplier/2 to multiplier], the magnitude of this is multiplier/2. This is the same as finding the highest set-bit for a positive number.
So the for loop is O(multiplier/2), the constant of 1/2 comes out, so it is O(n)
On average, it is half the interval away, which would be O(multiplier/4). But that is just the constant 1/4 * n, so it is still O(n), the constant is smaller but it is still O(n).
A faster algorithm.
Our intuition is that we can multiply by an n-digit number in n steps.
In binary this is using 1-bit shift, 1-bit test and binary add to construct the whole answer. Each of those operations is O(1). This is long-multiplication, one digit at a time.
If we use O(1) operations for n, an x bit number, it is O(log2(n)) or O(x), where x is the number of bits in the number
This is an O(log2(n)) algorithm:
int mult(int multiplicand, int multiplier) {
int product = 0;
while (multiplier) {
if (multiplier & 1) product += multiplicand;
multiplicand <<= 1;
multiplier >>= 1;
}
return product;
}
It is essentially how we do long multiplication.
Of course, the wise thing to do is use the smaller number as the multiplier. (I'll leave that as an exercise for the reader :-)
This only works for positive values, but by testing and remembering the signs of the input, operating on positive values, and then adjusting the sign, it works for all numbers.
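The sign handling described above could be sketched like this. The names `mult_unsigned` and `mult_signed` are invented for the example, and the sketch assumes the true product fits in a 32-bit signed integer.

```c
#include <stdint.h>

/* shift-and-add product of two unsigned values (wraps modulo 2^32 on overflow) */
static uint32_t mult_unsigned(uint32_t multiplicand, uint32_t multiplier) {
    uint32_t product = 0;
    while (multiplier) {
        if (multiplier & 1) product += multiplicand;
        multiplicand <<= 1;
        multiplier >>= 1;
    }
    return product;
}

/* signed wrapper: remember the sign, multiply the magnitudes, then restore the sign */
static int32_t mult_signed(int32_t a, int32_t b) {
    int negative = (a < 0) != (b < 0);
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t magnitude = mult_unsigned(ua, ub);
    return negative ? -(int32_t)magnitude : (int32_t)magnitude;
}
```

Working on unsigned magnitudes keeps the shifts well defined, since left-shifting a negative signed value is undefined behaviour in C.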
Background:
Given n balls such that:
'a' balls are of colour GREEN
'b' balls are of colour BLUE
'c' balls are of colour RED
...
(of course a + b + c + ... = n)
The number of permutations in which these balls can be arranged is given by:
perm = n! / (a! b! c! ..)
Question 1:
How can I 'elegantly' calculate perm so as to avoid an integer overflow as long as possible, and be sure that when I am done calculating, I either have the correct value of perm, or I know that the final result will overflow?
Basically, I want to avoid using something like GNU GMP.
Optionally, Question 2:
Is this a really bad idea, and should I just go ahead and use GMP?
These are known as the multinomial coefficients, which I shall denote by m(a,b,...).
And you can efficiently calculate them avoiding overflow by exploiting this identity (which should be fairly simple to prove):
m(a,b,c,...) = m(a-1,b,c,...) + m(a,b-1,c,...) + m(a,b,c-1,...) + ...
m(0,0,0,...) = 1 // base case
m(anything negative) = 0 // base case 2
Then it's a simple matter of using recursion to calculate the coefficients. Note that to avoid an exponential running time, you need to either cache your results (to avoid recomputation) or use dynamic programming.
To check for overflow, just make sure the sums won't overflow.
And yes, it's a very bad idea to use arbitrary precision libraries to do this simple task.
If you have globs of cpu time, you can make lists out of all the factorials, then find the prime factorization of all the numbers in the lists, then cancel all the numbers on the top with those on the bottom, until the numbers are completely reduced.
The overflow-safest way is the way Dave suggested. You find the exponent with which the prime p divides n! by the summation
m = n;
e = 0;
do{
m /= p;
e += m;
}while(m > 0);
Subtract the exponents of p in the factorisations of a! etc. Do that for all primes <= n, and you have the factorisation of the multinomial coefficient. That calculation overflows if and only if the final result overflows. But the multinomial coefficients grow rather fast, so you will have overflow already for fairly small n. For substantial calculations, you will need a bignum library (if you don't need exact results, you can get by a bit longer using doubles).
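A sketch of the prime-exponent approach in C, under the assumption that a 64-bit unsigned result is wanted. The names `prime_exponent_in_factorial`, `is_prime` and `multinomial` are hypothetical, and trial division is used for the primality test only to keep the example short.

```c
#include <stdint.h>

/* exponent of the prime p in n!, by repeated division */
static unsigned prime_exponent_in_factorial(unsigned n, unsigned p) {
    unsigned e = 0;
    while (n > 0) {
        n /= p;
        e += n;
    }
    return e;
}

static int is_prime(unsigned p) {
    if (p < 2) return 0;
    for (unsigned d = 2; d * d <= p; d++)
        if (p % d == 0) return 0;
    return 1;
}

/* n! / (parts[0]! * parts[1]! * ...), where n is the sum of the parts.
   Returns 1 and stores the value in *out, or returns 0 if it does not fit in 64 bits. */
static int multinomial(const unsigned *parts, unsigned count, uint64_t *out) {
    unsigned n = 0;
    for (unsigned i = 0; i < count; i++) n += parts[i];
    uint64_t result = 1;
    for (unsigned p = 2; p <= n; p++) {
        if (!is_prime(p)) continue;
        unsigned e = prime_exponent_in_factorial(n, p);
        for (unsigned i = 0; i < count; i++)
            e -= prime_exponent_in_factorial(parts[i], p);
        for (; e > 0; e--) {
            if (result > UINT64_MAX / p) return 0;   /* the final value would overflow */
            result *= p;
        }
    }
    *out = result;
    return 1;
}
```

The running product only ever contains factors of the final value, so it never exceeds that value, and the overflow check fires exactly when the final result does not fit.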
Even if you use a bignum library, it is worthwhile to keep intermediate results from getting too large, so instead of calculating the factorials and dividing huge numbers, it is better to calculate the parts in sequence,
n!/(a! * b! * c! * ...) = n! / (a! * (n-a)!) * (n-a)! / (b! * (n-a-b)!) * ...
and to compute each of these factors, let's take the second for illustration,
(n-a)! / (b! * (n-a-b)!) = \prod_{i = 1}^b (n-a+1-i)/i
is calculated with
prod = 1
for i = 1 to b
prod = prod * (n-a+1-i)
prod = prod / i
Finally multiply the parts. This requires n divisions and n + number_of_parts - 1 multiplications, keeping the intermediate results moderately small.
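The product-of-binomials scheme above could be written as the following sketch for 64-bit unsigned results. The function names are made up, and the overflow test is conservative: it checks the intermediate product before the division, so it can report overflow for a few values that would still fit.

```c
#include <stdint.h>

/* C(n, k) by alternating multiply and divide, assuming k <= n;
   returns 0 if an intermediate product would overflow */
static int binomial_checked(unsigned n, unsigned k, uint64_t *out) {
    uint64_t prod = 1;
    for (unsigned i = 1; i <= k; i++) {
        uint64_t factor = n + 1 - i;
        if (prod > UINT64_MAX / factor) return 0;
        prod = prod * factor / i;   /* exact: prod is C(n, i) after this step */
    }
    *out = prod;
    return 1;
}

/* n! / (parts[0]! * parts[1]! * ...) as a product of binomials, n being the sum of the parts */
static int multinomial_checked(const unsigned *parts, unsigned count, uint64_t *out) {
    uint64_t result = 1;
    unsigned remaining = 0;
    for (unsigned i = 0; i < count; i++) remaining += parts[i];
    for (unsigned i = 0; i < count; i++) {
        uint64_t b;
        if (!binomial_checked(remaining, parts[i], &b)) return 0;
        if (result > UINT64_MAX / b) return 0;
        result *= b;
        remaining -= parts[i];
    }
    *out = result;
    return 1;
}
```

For the parts 3, 2, 2 this multiplies C(7,3) = 35, C(4,2) = 6 and C(2,2) = 1.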
According to this link, you can calculate multinomial coefficients as a product of several binomial coefficients, controlling integer overflow on the way.
This reduces original problem to overflow-controlled computation of a binomial coefficient.
Notation: n! = prod(1,n), where prod(i,j) denotes the product of the integers from i to j.
It's very easy, but first you must know that for any two positive integers i and n, the following expression is a positive integer:
prod(i,i+n-1)/prod(1,n)
Thus the idea is to slice the computation of n! into small chunks and to divide as soon as possible.
First with a, then with b, and so on.
perm = (a!/a!) * (prod(a+1, a+b)/b!) * ... * (prod(a+b+c+...y+1,n)/z!)
Each of these factors is an integer, so if perm does not overflow, none of its factors will either.
However, while computing such a factor, the numerator or the denominator could overflow; that is avoidable by alternating one multiplication in the numerator with one division:
prod(a+1, a+b)/b! = (a+1)(a+2)/2*(a+3)/3*..*(a+b)/b
In that way every division will yield an integer.