I have written this code in C, where a, b, cc, ma, mb, mcc, N and k are all int. According to the problem specification, N and k can be as big as 10^9. 10^9 fits in an int on my machine, but the intermediate and final values of a, b, cc, ma, mb and mcc become much bigger for large N and k, too big even for an unsigned long long int.
Now, I want to print the value of mcc % 1000000007, as you can see in the code. I know that some clever modular arithmetic in the body of the for loop can produce the correct output without any overflow and can also make the program time efficient. Being new to modular arithmetic, I failed to work this out. Can someone point out those steps?
ma=1;mb=0;mcc=0;
for(i=1; i<=N; ++i){
a=ma;b=mb;cc=mcc;
ma = k*a + 1;
mb = k*b + k*(k-1)*a*a;
mcc = k*cc + k*(k-1)*a*(3*b+(k-2)*a*a);
}
printf("%d\n",mcc%1000000007);
My attempt:
I declared a, b, cc, ma, mb and mcc as long long and did the following. Can it be optimized further?
ma=1;mb=0;cc=0;
ok = ((long long)k*(k-1))%MOD; /* reduced once, so that as*ok and ok*temp2 below stay within long long */
for(i=1; i<=N; ++i){
a=ma;b=mb;
as = (a*a)%MOD;
ma = (k*a + 1)%MOD;
temp1 = (k*b)%MOD;
temp2 = (as*ok)%MOD;
mb = (temp1+temp2)%MOD;
temp1 = (k*cc)%MOD;
temp2 = (as*(k-2))%MOD;
temp3 = (3*b)%MOD;
temp2 = (temp2+temp3)%MOD;
temp2 = (temp2*a)%MOD;
temp2 = (ok*temp2)%MOD;
cc = (temp1 + temp2)%MOD;
}
printf("%lld\n",cc);
Let's look at a small example:
mb = (k*b + k*(k-1)*a*a)%MOD;
Here, k*b, k*(k-1)*a*a can overflow, so can the sum, taking into account
(x + y) mod m = (x mod m + y mod m) mod m
we can rewrite this (x= k*b, y=k*(k-1)*a*a and m=MOD)
mb = ((k*b) % MOD + (k*(k-1)*a*a) %MOD) % MOD
Now, we could go one step further. Since
(x * y) mod m = ((x mod m) * (y mod m)) mod m
we can also rewrite the multiplication k*(k-1)*a*a % MOD with x=k*(k-1) and y=a*a to
(((k*(k-1)) %MOD) * ((a*a) %MOD)) % MOD
I'm sure you can do the rest. While you can sprinkle % MOD all over the place, you should carefully consider whether you need it or not, taking John's hint into account:
Adding two n-digit numbers produces a number of up to n+1 digits, and
multiplying an n-digit number by an m-digit number produces a result
with up to n + m digits.
As such, there are places where you will need to use the modulus properties, and there are some where you surely don't need them, but this is your part of the work ;).
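Applying both identities to every statement of the loop gives something like the sketch below. The function name mcc_mod and the parameter types are assumptions made for illustration. It also assumes k >= 1 and reduces k, k-1 and k-2 once up front, so that every product has two factors below MOD and fits in a signed 64-bit long long.

```c
#include <assert.h>

#define MOD 1000000007LL

/* Hypothetical helper: runs the recurrence from the question for N steps
   and returns mcc mod MOD. Assumes N >= 0 and k >= 1. */
long long mcc_mod(long long N, long long k)
{
    assert(N >= 0 && k >= 1);

    long long km  = k % MOD;                      /* k     mod MOD */
    long long k1  = (k - 1) % MOD;                /* k - 1 mod MOD */
    long long k2  = ((k - 2) % MOD + MOD) % MOD;  /* k - 2 may be -1: lift into [0, MOD) */
    long long kk1 = km * k1 % MOD;                /* k*(k-1) mod MOD */

    long long ma = 1, mb = 0, mcc = 0;
    for (long long i = 1; i <= N; ++i) {
        long long a = ma, b = mb, cc = mcc;
        long long a2 = a * a % MOD;               /* a*a mod MOD */

        ma = (km * a + 1) % MOD;
        mb = (km * b % MOD + kk1 * a2 % MOD) % MOD;

        /* 3*b + (k-2)*a*a, reduced; 3*b < 3*MOD, so the sum cannot overflow */
        long long inner = (3 * b + k2 * a2 % MOD) % MOD;
        mcc = (km * cc % MOD + kk1 * a % MOD * inner % MOD) % MOD;
    }
    return mcc;
}
```

The reductions only remove the overflow; the loop itself still runs N times.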
That's a good exercise to build a template class along these lines:
template <int N>
class modulo_int_t
{
public:
modulo_int_t(int value) : value_(value % N) {}
modulo_int_t<N> operator+(const modulo_int_t<N> &rhs) const
{
return modulo_int_t<N>(value_ + rhs.value_) ;
}
// fill in the other operations
private:
int value_ ;
} ;
Then write the operations using modulo_int_t<1000000007> objects instead of int.
Disclaimer: make use of long long where appropriate and take care of negative differences...
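For the plain C code in the question, the same idea can be sketched with three small helper functions instead of a class. The names mod_add, mod_sub and mod_mul are made up for this example. They assume 0 <= x, y < m and m < 2^31, so that the product of two residues fits in a long long.

```c
#include <assert.h>

/* All helpers assume 0 <= x, y < m and m < 2^31. */

long long mod_add(long long x, long long y, long long m)
{
    assert(0 <= x && x < m && 0 <= y && y < m);
    long long s = x + y;            /* at most 2m - 2, no overflow */
    return s >= m ? s - m : s;
}

long long mod_sub(long long x, long long y, long long m)
{
    assert(0 <= x && x < m && 0 <= y && y < m);
    long long d = x - y;            /* may be negative ... */
    return d < 0 ? d + m : d;       /* ... so lift it back into [0, m) */
}

long long mod_mul(long long x, long long y, long long m)
{
    assert(0 <= x && x < m && 0 <= y && y < m);
    return x * y % m;               /* product < 2^62 fits in long long */
}
```

mod_sub is the place where the "negative differences" from the disclaimer are handled.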
For an assignment we are required to write a division algorithm in order to complete a certain question using just addition and recursion. I found that, without using tail recursion, the naive repeated subtraction implementation can easily result in a stack overflow. So doing a quick analysis of this method, and correct me if I'm wrong, shows that if you divide A by B, with n and m binary digits respectively, it should be exponential in n-m. I actually get
O( (n-m)*2^(n-m) )
since you need to subtract an m binary digit number from an n binary digit number 2^(n-m) times in order to drop the n digit number to an n-1 digit number, and you need to do this n-m times to get a number with at most m digits in the repeated subtraction division, so the runtime should be as mentioned. Again, I very well may be wrong so someone please correct me if I am. This is assuming O(1) addition since I'm working with fixed size integers. I suppose with fixed size integers one could argue the algorithm is O(1).
Back to my main question. I developed a different method to perform integer division which works much better, even when using it recursively, based on the idea that for
P = 2^(K_i) + ... + 2^(K_0)
we have
A/B = (A - B*P)/B + P
The algorithm goes as follows to calculate A/B:
input:
A, B
i) Set Q = 0
ii) Find the largest K such that B * 2^K <= A < B * 2^(K + 1)
iii) Q -> Q + 2^K
iv) A -> A - B * 2^K
v) Repeat steps ii) through iv) until A < B
vi) Return Q (and A if you want the remainder)
With the restriction of using only addition, I simply add B to itself on each recursive call; however, here is my code without recursion, using shifts instead of addition.
int div( unsigned int m, unsigned int n )
{
// q is a temporary n, sum is the quotient
unsigned int q, sum = 0;
int i;
while( m >= n ) // >= rather than >, otherwise a final m == n (e.g. 6/2) is not counted
{
i = 0;
q = n;
// double q until it's larger than m and record the exponent
while( q <= m )
{
q <<= 1;
++i;
}
i--;
q >>= 1; // q is one factor of 2 too large
sum += (1<<i); // add one bit of the quotient
m -= q; // new numerator
}
return sum;
}
I feel that sum |= (1<<i) may be more appropriate, to emphasize that I'm dealing with a binary representation, but it didn't give any performance boost and may make the code harder to understand. So, if M and N are the number of bits in m and n respectively, an analysis suggests that the inner loop runs at most M - N times per pass and that m loses at least one bit each time the outer loop completes, so the outer loop also runs at most M - N times before its condition fails. That gives O( (M - N)^2 ).
So after all of that, I am asking if I am correct about the runtime of the algorithm and whether it can be improved upon?
Your algorithm is pretty good and your analysis of the running time is correct, but you don't need to do the inner loop every time:
unsigned div(unsigned num, unsigned den)
{
//TODO check for divide by zero
unsigned place=1;
unsigned ret=0;
while((num>>1) >= den) //overflow-safe check
{
place<<=1;
den<<=1;
}
for( ;place>0; place>>=1,den>>=1)
{
if (num>=den)
{
num-=den;
ret+=place;
}
}
return ret;
}
That makes it O(M-N)
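Since the assignment only allows addition and recursion, the same doubling idea can also be written recursively. The sketch below is not the code from the question or from this answer: div_rec is a made-up name, it doubles the divisor with den + den instead of a shift, and it still uses comparison and subtraction for the remainder. The recursion depth is about M - N, so the stack stays shallow for fixed-size integers.

```c
#include <assert.h>

/* Hypothetical recursive variant: returns num / den and stores num % den
   in *rem. The divisor is doubled by addition only (den + den). */
unsigned div_rec(unsigned num, unsigned den, unsigned *rem)
{
    assert(den != 0 && rem != 0);

    if (num < den) {                /* nothing to subtract */
        *rem = num;
        return 0;
    }
    if (den > num - den) {          /* 2*den > num, tested without overflow */
        *rem = num - den;
        return 1;
    }

    /* Divide by 2*den first: num = q * (2*den) + *rem with *rem < 2*den */
    unsigned q = div_rec(num, den + den, rem);
    q = q + q;                      /* quotient for den, lowest bit still open */
    if (*rem >= den) {              /* decide the lowest bit */
        *rem -= den;
        q = q + 1;
    }
    return q;
}
```

For example, div_rec(100, 7, &r) recurses with divisors 14, 28 and 56 before unwinding to quotient 14 and remainder 2.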
I'm stuck there trying to figure out how to convert the last two "if" statements of the following code to a branchless state.
int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;
u = rand() % 4;
if ( y > x) u = 5;
if (-y > x) u = 4;
Or, in case the above turns out to be too difficult, you can consider them as:
if (x > 0) u = 5;
if (y > 0) u = 4;
I think that what gets me is the fact that those don't have an else catcher. If it was the case I could have probably adapted a variation of a branchless abs (or max/min) function.
The rand() functions you see aren't part of the real code. I added them like this just to hint at the expected ranges that the variables x, y and u can possibly have at the time the two branches happen.
Assembly machine code is allowed for the purpose.
EDIT:
After a bit of braingrinding I managed to put together a working branchless version:
int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;
u = rand() % 4;
u += (4-u)*((unsigned int)(x+y) >> 31);
u += (5-u)*((unsigned int)(x-y) >> 31);
Unfortunately, due to the integer arithmetic involved, the original version with if statements turns out to be about 30% faster.
Compiler knows where the party is at.
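One detail worth checking: the two if statements set 5 first and 4 last, so 4 wins when both conditions hold, while the branchless lines above apply them in the opposite order. A sketch that keeps the original order and replaces the multiply with a mask is shown below. The function names are invented for this example, and whether the compiler really emits branch-free code for the comparisons is an assumption that depends on the target.

```c
#include <assert.h>

/* Reference: the two if statements from the question. */
int select_branchy(int x, int y, int u)
{
    if ( y > x) u = 5;
    if (-y > x) u = 4;
    return u;
}

/* Mask-based variant: a comparison yields 0 or 1; negating it gives an
   all-zeros or all-ones mask that selects between the old and new value. */
int select_masked(int x, int y, int u)
{
    int m5 = -(y > x);              /* -1 if  y > x, else 0 */
    int m4 = -(-y > x);             /* -1 if -y > x, else 0 */
    u = (u & ~m5) | (5 & m5);       /* same order as the if statements */
    u = (u & ~m4) | (4 & m4);
    return u;
}

/* Compares both versions over the value ranges given in the question. */
void check_equivalence(void)
{
    for (int x = -50; x < 50; ++x)
        for (int y = -50; y < 50; ++y)
            for (int u = 0; u < 4; ++u)
                assert(select_branchy(x, y, u) == select_masked(x, y, u));
}
```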
[All: this answer was written with the assumption that the calls on rand() were part of the problem. I offer improvement below under that assumption.
OP belatedly clarifies he only used rand to tell us ranges (and presumably distribution) of the values of x and y. Unclear if he meant for the value for u, too. Anyway, enjoy my improved answer to the problem he didn't really pose].
I think you'd be better off recoding this as:
int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;
if ( y > x) u = 5;
else if (-y > x) u = 4;
else u = rand() % 4;
This calls the last rand only 1/4 as often as OP's original code.
Since I assume rand (and the divides) are much more expensive
than compare-and-branch, this would be a significant savings.
If your rand generator produces a lot of truly random bits (e.g. 16) on each call as it should, you can call it just once (I've assumed rand is more expensive than divide, YMMV):
int u, x, y, t;
t = rand() ;
u = t % 4;
t = t >> 2;
x = t % 100 - 50;
y = ( t / 100 ) %100 - 50;
if ( y > x) u = 5;
else if (-y > x) u = 4;
I think that the rand function in the MS C library is not good enough for this if you want really random values. I had to code my own; turned out faster anyway.
You might also get rid of the divide, by using multiplication by a reciprocal (untested):
int u, x, y;
unsigned int t;
unsigned long long t2; // needs 64 bits to hold the full product
t = rand() ;
u = t % 4;
{ // Compute (t / 400) * 2^32 in a 64-bit integer by multiplying with a fixed-point reciprocal.
// The (unsigned int) term below should be folded into a single constant at compile time.
// The remaining multiply can be done by one machine instruction
// (typically 32bits * 32bits --> 64bits) widely found in processors.
// The "4" has the same effect as the t = t >> 2 in the previous version
t2 = (unsigned long long)t * (unsigned int)((1ULL<<32) / (4*100));
}
x = (int)(t2>>32)-50; // take the upper word (if compiler won't, do this in assembler)
{ // compute y from the fractional remainder of the above multiply,
// which is sitting in the lower 32 bits of the t2 product
y = (int)(((t2 & 0xFFFFFFFFULL) * 100) >> 32) - 50;
}
if ( y > x) u = 5;
else if (-y > x) u = 4;
If your compiler won't produce the "right" instructions, it should be straightforward to write assembly code to do this.
Some tricks using array indices; they may be quite fast if the compiler/CPU has one-step instructions to convert comparison results to 0-1 values (e.g. x86's "sete" and similar).
int ycpx[3];
/* ... */
ycpx[0] = 4;
ycpx[1] = u;
ycpx[2] = 5;
u = ycpx[(1 + (y > x)) * (-y <= x)]; /* index 0 when -y > x, otherwise 1 or 2 */
Alternate form
int v1[2];
int v2[2];
/* ... */
v1[0] = u;
v1[1] = 5;
v2[1] = 4;
v2[0] = v1[y > x];
u = v2[-y > x];
Almost unreadable...
NOTE: In both cases the initialization of array elements containing 4 and 5 may be included in declaration and arrays may be made static if reentrancy is not a problem for you.
Forgive me if I am being a bit silly, but I have only very recently started programming, and am maybe a little out of my depth doing Problem 160 on Project Euler. I have made some attempts at solving it but it seems that going through 1tn numbers will take too long on any personal computer, so I guess I should be looking into the mathematics to find some short-cuts.
Project Euler Problem 160:
For any N, let f(N) be the last five digits before the trailing zeroes
in N!. For example,
9! = 362880 so f(9)=36288
10! = 3628800 so f(10)=36288
20! = 2432902008176640000 so f(20)=17664
Find f(1,000,000,000,000)
New attempt:
#include <stdio.h>
int main(void)
{
    //I have used long long ints everywhere to avoid possible multiplication errors
    long long f; //f is f(1,000,000,000,000)
    f = 1;
    for (long long i = 1; i <= 1000000000000; ++i){
        long long p;
        for (p = i; (p % 10) == 0; p = p / 10) //p is i without trailing zeros
            ;
        p = (p % 1000000); //p is last six nontrivial digits of i
        for (f = f * p; (f % 10) == 0; f = f / 10)
            ;
        f = (f % 1000000);
    }
    f = (f % 100000);
    printf("f(1,000,000,000,000) = %lld\n", f);
    return 0;
}
Old attempt:
#include <stdio.h>
int main(void)
{
    //This part of the programme removes the zeros in factorials by dividing by 10 for each factor of 5, and finds f(1,000,000,000,000) inductively
    long long int f, m; //f is f(n), m is 10^k for each multiple of 5
    short k; //Stores multiplicity of 5 for each multiple of 5
    f = 1;
    for (long long i = 1; i <= 1000000000000; ++i){
        if ((i % 5) == 0){
            k = 1;
            for ((m = i / 5); (m % 5) == 0; m = m / 5) //Computes multiplicity of 5 in factorisation of i
                ++k;
            m = 1;
            for (short j = 1; j <= k; ++j) //Computes 10^k
                m = 10 * m;
            f = (((f * i) / m) % 100000);
        }
        else f = ((f * i) % 100000);
    }
    printf("f(1,000,000,000,000) = %lld\n", f);
    return 0;
}
The problem is:
For any N, let f(N) be the last five digits before the trailing zeroes in N!. Find f(1,000,000,000,000)
Let's rephrase the question:
For any N, let g(N) be the last five digits before the trailing zeroes in N. For any N, let f(N) be g(N!). Find f(1,000,000,000,000).
Now, before you write the code, prove this assertion mathematically:
For any N > 1, f(N) is equal to g(f(N-1) * g(N))
Note that I have not proved this myself; I might be making a mistake here. (UPDATE: It appears to be wrong! We'll have to give this more thought.) Prove it to your satisfaction. You might want to start by proving some intermediate results, like:
g(x * y) = g(g(x) * g(y))
And so on.
Once you have obtained a proof of this result, now you have a recurrence relation that you can use to find any f(N), and the numbers you have to deal with don't ever get much larger than N.
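The assertion can be tested numerically before attempting a proof, since every factorial up to 20! fits in an unsigned 64-bit integer. The sketch below (function names are made up here) compares g(N!), computed exactly, against g(f(N-1) * g(N)) and reports the first N where the two differ. A non-zero return value means the recurrence loses information when only five digits are kept.

```c
#include <assert.h>
#include <stdint.h>

/* g(x): the last five digits of x before its trailing zeros (x > 0). */
uint64_t g(uint64_t x)
{
    assert(x > 0);
    while (x % 10 == 0)
        x /= 10;
    return x % 100000;
}

/* Smallest N in [2, 20] with g(N!) != g(f(N-1) * g(N)), or 0 if there is none.
   20! is the largest factorial that fits in an unsigned 64-bit integer. */
int first_counterexample(void)
{
    uint64_t fact = 1;                      /* exact value of (N-1)! */
    for (int n = 2; n <= 20; ++n) {
        uint64_t f_prev = g(fact);          /* f(N-1) */
        fact *= (uint64_t)n;                /* now N! */
        if (g(fact) != g(f_prev * g((uint64_t)n)))
            return n;
    }
    return 0;
}
```

Worked by hand, the first failure is N = 15: 14! = 87178291200 gives f(14) = 82912, and 82912 * 15 = 1243680 yields 24368, whereas 15! = 1307674368000 gives f(15) = 74368. The lost digits come from the division by 10, which pulls a digit that was already truncated back into the last five.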
Prod(i=0..k-1) (i*a + c) mod a = c^k mod a
For example
prod[ 3, 1000003, 2000003,... , 999999000003 ] mod 1000000
equals
3^(1,000,000,000,000/1,000,000) mod 1000000
And the number of trailing zeros in N! equals the number of factors of 5 in the factorisation of N!
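The count of factors of 5 follows from Legendre's formula: the exponent of 5 in N! is the sum of floor(N / 5^i) over i = 1, 2, ... (factors of 2 are always more plentiful, so the fives decide the number of zeros). A small sketch, with a function name made up for this example:

```c
/* Number of trailing zeros of n!, i.e. the exponent of 5 in n!
   (Legendre's formula: n/5 + n/25 + n/125 + ...). */
unsigned long long trailing_zeros_of_factorial(unsigned long long n)
{
    unsigned long long count = 0;
    while (n >= 5) {
        n /= 5;         /* how many multiples of the next power of 5 */
        count += n;
    }
    return count;
}
```

For 10! and 20! this gives 2 and 4, matching the examples in the problem statement.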
I would compute the whole factorial and then take the first nonzero digits from the least significant end ...
but for you I think this is better:
1. Use a bigger base
Any number can be rewritten as a sum of multiples of powers of the same number (the base).
For example, 1234560004587786542 can be rewritten in base b=1000 000 000 like this:
1*b^2 + 234560004*b^1 + 587786542*b^0
2. When you multiply, the lowest digit depends only on the lowest digits of the multiplied numbers
A*B = (a0*b^0+a1*b^1+...)*(b0*b^0+b1*b^1+...)
= (a0*b0*b^0)+ (...*b^1) + (...*b^2)+ ...
3. Put it together
for (f=1,i=1;i<=N;i++)
{
j=i%base;
// here remove ending zeroes from j
f*=j;
// here remove ending zeroes from f
f%=base;
}
Do not forget that the variable f has to be big enough for base^2,
and base has to be at least 2 digits bigger than 100000 to cover the 5 digits plus the overflow into zeros.
base must be a power of 10 to preserve decimal digits.
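The loop above can be sketched in plain C with 64-bit integers. The base 10^9 is an assumption chosen so that f * j stays below 10^18 and fits in a uint64_t; the function name is made up. The sketch was only checked against the small values from the problem statement, and being a brute-force loop it is far too slow for 10^12.

```c
#include <stdint.h>

/* Sketch of the loop above with base = 10^9, so f * j < 10^18 fits in 64 bits.
   Returns the last five digits before the trailing zeros of n!. */
uint32_t f_bruteforce(uint64_t n)
{
    const uint64_t base = 1000000000ULL;    /* power of 10, well above 100000 */
    uint64_t f = 1;
    for (uint64_t i = 1; i <= n; ++i) {
        uint64_t j = i;
        while (j % 10 == 0) j /= 10;        /* remove ending zeroes from j */
        j %= base;
        f *= j;
        while (f % 10 == 0) f /= 10;        /* remove ending zeroes from f */
        f %= base;
    }
    return (uint32_t)(f % 100000);
}
```

Stripping the zeros of i before reducing it modulo base keeps j non-zero, so neither while loop can spin forever.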
[edit1] implementation
uint<2> f,i,j,n,base; // mine 64bit unsigned ints (i use 32bit compiler/app)
base="10000000000"; // base >= 100000^2 ... must be as string to avoid 32bit trunc
n="20"; // f(n) ... must be as string to avoid 32bit trunc
for (f=1,i=1;i<=n;i++)
{
j=i%base;
for (;(j)&&((j%10).iszero());j/=10);
f*=j;
for (;(f)&&((f%10).iszero());f/=10);
f%=base;
}
f%=100000;
int s=f.a[1]; // export low 32bit part of 64bit uint (s is the result)
It is too slow :(
f(1000000)=12544 [17769.414 ms]
f( 20)=17664 [ 0.122 ms]
f( 10)=36288 [ 0.045 ms]
For more speed, use any fast factorial implementation.
[edit2] just few more 32bit n! factorials for testing
this statement is not valid :(
//You could attempt to exploit that
//f(n) = ( f(n%base) * (f(base)^floor(n/base)) )%base
//do not forget that this is true only if base fulfill the conditions above
Luckily this one seems to be true :) but only if a is much, much bigger than b and a%base=0
g((a+b)!)=g(g(a!)*g(b!))
// g mod base without last zeroes...
// this can speed up things a lot
f( 1)=00001
f( 10)=36288
f( 100)=16864
f( 1,000)=53472
f( 10,000)=79008
f( 100,000)=56096
f( 1,000,000)=12544
f( 10,000,000)=28125
f( 1,000,100)=42016
f( 1,000,100)=g(??????12544*??????16864)=g(??????42016)->42016
The closer a is to b, the fewer valid digits there are!
That is why f(1001000) will not work ...
I'm not an expert project Euler solver, but some general advice for all Euler problems.
1 - Start by solving the problem in the most obvious way first. This may lead to insights for later attempts
2 - Work the problem for a smaller range. Euler usually give an answer for the smaller range that you can use to check your algorithm
3 - Scale up the problem and work out how the problem will scale, time-wise, as the problem gets bigger
4 - If the solution is going to take longer than a few minutes, it's time to check the algorithm and come up with a better way
5 - Remember that Euler problems always have an answer and rely on a combination of clever programming and clever mathematics
6 - A problem that has been solved by many people cannot be wrong, it's you that's wrong!
I recently solved the phidigital number problem (Euler's site is down, can't look up the number, it's quite recent at time of posting) using exactly these steps. My initial brute-force algorithm was going to take 60 hours, I took a look at the patterns solving to 1,000,000 showed and got the insight to find a solution that took 1.25s.
It might be an idea to deal with numbers ending in 2, 4, 5, 6, 8, 0 separately. Numbers ending in 1, 3, 7, 9 cannot contribute to a trailing zero. Let
A(n) = 1 * 3 * 7 * 9 * 11 * 13 * 17 * 19 * ... * (n-1).
B(n) = 2 * 4 * 5 * 6 * 8 * 10 * 12 * 14 * 15 * 16 * 18 * 20 * ... * n.
The factorial of n is A(n)*B(n). We can find the last five digits of A(n) quite easily. First find A(100,000) MOD 100,000; we can make this easier by doing all the multiplications mod 100,000. Note that A(200,000) MOD 100,000 is just A(100,000)*A(100,000) MOD 100,000, as 100,001 = 1 MOD 100,000 etc. So A(1,000,000,000,000) MOD 100,000 is just A(100,000)^10,000,000 MOD 100,000.
More care is needed with 2, 4, 5, 6, 8, 0: you'll need to track when these add a trailing zero. Whenever a factor of 2 meets a factor of 5 we end up with a zero, and there are cases where you get two zeros at once, e.g. 25*4 = 100.
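The periodicity claim for A(n) is easy to check numerically. A sketch, assuming a made-up helper a_mod that multiplies all i <= n ending in 1, 3, 7 or 9 modulo 100,000:

```c
#include <stdint.h>

/* A(n) mod 100000: product of all i <= n whose last digit is 1, 3, 7 or 9. */
uint32_t a_mod(uint64_t n)
{
    uint64_t prod = 1;
    for (uint64_t i = 1; i <= n; ++i) {
        unsigned d = (unsigned)(i % 10);
        if (d == 1 || d == 3 || d == 7 || d == 9)
            prod = prod * (i % 100000) % 100000;   /* both factors stay below 10^5 */
    }
    return (uint32_t)prod;
}
```

Comparing a_mod(200000) with the square of a_mod(100000) modulo 100,000 confirms the identity used above. Raising A(100,000) to the 10,000,000th power then only needs a modular exponentiation rather than a loop over 10^12 numbers.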
I'm working on a cryptographic exercise, and I'm trying to calculate (2^n - 1) mod p, where p is a prime number.
What would be the best approach to do this? I'm working in C, so 2^n - 1 becomes too large to hold when n is large.
I came across the equation (a*b) mod p = (a*(b mod p)) mod p, but I'm not sure this applies in this case, as 2^n - 1 may be prime (or I'm not sure how to factorise it).
Help much appreciated.
A couple tips to help you come up with a better way:
Don't use (a*b) mod p = (a*(b mod p)) mod p to compute 2^n - 1 mod p; use it to compute 2^n mod p and then subtract 1 afterward.
Fermat's little theorem can be useful here. That way, the exponent you actually have to deal with won't exceed p.
You mention in the comments that n and p are 9 or 10 digits, or something. If you restrict them to 32 bit (unsigned long) values, you can find 2^n mod p with a simple (binary) modular exponentiation:
unsigned long long u = 1, w = 2;
while (n != 0)
{
if ((n & 0x1) != 0)
u = (u * w) % p; /* (mul-rdx) */
if ((n >>= 1) != 0)
w = (w * w) % p; /* (sqr-rdx) */
}
r = (unsigned long) u;
And, since (2^n - 1) mod p = r - 1 mod p :
r = (r == 0) ? (p - 1) : (r - 1);
If 2^n mod p = 0 - which doesn't actually occur if p > 2 is prime - but we might as well consider the general case - then (2^n - 1) mod p = -1 mod p.
Since the 'common residue' or 'remainder' (mod p) is in [0, p - 1], we add a multiple of p so that it is in this range.
Otherwise, the result of 2^n mod p was in [1, p - 1], and subtracting 1 will be in this range already. It's probably better expressed as:
if (r == 0)
r = p - 1; /* -1 mod p */
else
r = r - 1;
To take the modulus directly you would need the value 2^n - 1 itself; otherwise you move to a different family of algorithms, interesting but separate. So I recommend the big-integer approach, as it is easy: make a structure that stores one big value in several small values, e.g.
struct bigint{
int lowerbits;
int upperbits;
};
The expression can also be decomposed, e.g. 2^n - 1 = (2^(n-4) * 2^4) - 1, with each factor reduced mod p separately; that route is more algorithmic.
To compute 2^n - 1 mod p, you can use exponentiation by squaring after first removing any multiple of (p - 1) from n (since a^{p-1} = 1 mod p). In pseudo-code:
n = n % (p - 1)
result = 1
pow = 2
while n {
if n % 2 {
result = (result * pow) % p
}
pow = (pow * pow) % p
n /= 2
}
result = (result + p - 1) % p
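A direct C translation of the pseudo-code could look like the sketch below. The function name is made up, and it assumes p is an odd prime below 2^32, so that Fermat's little theorem applies to base 2 and every product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* (2^n - 1) mod p for an odd prime p < 2^32. */
uint64_t pow2_minus1_mod(uint64_t n, uint64_t p)
{
    assert(p > 2 && p < (1ULL << 32));

    n %= p - 1;                     /* Fermat: 2^(p-1) = 1 (mod p) */
    uint64_t result = 1, pw = 2 % p;
    while (n) {
        if (n & 1)
            result = result * pw % p;
        pw = pw * pw % p;
        n >>= 1;
    }
    return (result + p - 1) % p;    /* subtract 1 without going negative */
}
```

For example, with n = 10 and p = 7 the exponent reduces to 4, 2^4 mod 7 is 2, and the result is 1, which agrees with 1023 mod 7.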
I came across the answer that I am posting here, when solving one of the mathematical problems on HackerRank, and it has worked for all the given test cases given there.
If you restrict n and p to 64 bit (unsigned long) values, then here is the mathematical approach :
2^n - 1 can be written as 1*[ (2^n - 1)/(2 - 1) ]
If you look at this carefully, this is the sum of the GP 1 + 2 + 4 + .. + 2^(n-1)
And voila, we know that (a+b)%m = ( (a%m) + (b%m) )%m
If you have a confusion whether the above relation is true or not for addition, you can google for it or you can check this link : http://www.inf.ed.ac.uk/teaching/courses/dmmr/slides/13-14/Ch4.pdf
So, now we can apply the above mentioned relation to our GP, and you would have your answer!!
That is,
(2^n - 1)%p is equivalent to ( 1 + 2 + 4 + .. + 2^(n-1) )%p and now apply the given relation.
First, focus on 2^n mod p, because you can always subtract one at the end.
Consider the powers of two. This is a sequence of numbers produced by repeatedly multiplying by two.
Consider the modulo operation. If the number is written in base p, you're just grabbing the last digit. Higher digits can be thrown away.
So at some point(s) in the sequence, you get a two-digit number (a 1 in the p's place), and your task is really just to get rid of the first digit (subtract p) when that happens.
Stopping here conceptually, the brute-force approach would be something like this:
uint64_t exp2modp( uint64_t n, uint64_t p ) {
uint64_t ret = 1;
n %= p - 1; // Apply Fermat's Little Theorem (p an odd prime): 2^(p-1) = 1 mod p.
while ( n -- ) {
ret *= 2; // ret < p before doubling, so ret < 2p here
if ( ret >= p ) {
ret -= p;
}
}
return ret;
}
Unfortunately, this still takes forever for large n and p, and I can't think of any better number theory offhand.
If you have a multiplication facility which can compute (p-1)^2 without overflow, then you can use an analogous algorithm using repeated squaring with a modulo after each square operation, and then take the product of the series of square residuals, again with a modulo after each multiplication.
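Step 1. x = 1 shifted left n times, minus 1 (that is, (1 << n) - 1).
Step 2. result = bitwise AND of x and p (note that a bitwise AND computes x mod p only in the form x & (p - 1) with p a power of two).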
Optimized way to handle the value of n^n (1 ≤ n ≤ 10^9)
I used long long int but it's not good enough as the value might be (1000^1000)
Searched and found the GMP library http://gmplib.org/ and BigInt class but don't wanna use them. I am looking for some numerical method to handle this.
I need to print the first and last k (1 ≤ k ≤ 9) digits of n^n
For the first k digits I am getting it like shown below (it's bit ugly way of doing it)
num = pow(n,n);
while(num){
arr[i++] = num%10;
num /= 10;
digit++;
}
while(digit > 0){
j=digit;
j--;
if(count<k){
printf("%lld",arr[j]);
count++;
}
digit--;
}
and for last k digits am using num % 10^k like below.
findk=pow(10,k);
lastDigits = num % findk;
The maximum value of k is 9, so I need only 18 digits at most.
I am thinking of getting those 18 digits without actually evaluating the complete n^n expression.
Any ideas or suggestions?
// note: Scope of use is limited.
#include <stdio.h>
long long powerMod(long long a, long long d, long long n){
// a ^ d mod n
long long result = 1;
while(d > 0){
if(d & 1)
result = result * a % n;
a = (a * a) % n;
d >>=1;
}
return result;
}
int main(void){
long long result = powerMod(999, 999, 1000000000);//999^999 mod 10^9
printf("%lld\n", result);//499998999
return 0;
}
Finding the least significant digits (last k digits) is easy because of a property of modular arithmetic, which says: (n*n)%m == ((n%m) * (n%m))%m, so the code shown by BLUEPIXY, which follows the exponentiation by squaring method, will work well for finding the k LSDs.
Now, Most Significant Digits (1st k digits) of N^N can be found in this way:
We know,
N^N = 10^(N log10 N)
So if you calculate N log10(N) you will get a number of the form xxxx.yyyy. Using this number as a power of 10, the integer part xxxx only contributes a factor 10^xxxx, i.e. it shifts the decimal point, which is not important for you. That means that if you calculate 10^0.yyyy, you will get the significant digits you are looking for.
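A small worked example with N = 4 (values rounded) shows the idea:

```latex
\begin{align*}
N \log_{10} N &= 4 \times 0.60206\ldots = 2.40824\ldots \\
N^N &= 10^{2.40824\ldots} = 10^{2} \times 10^{0.40824\ldots} \\
10^{0.40824\ldots} &= 2.56
\end{align*}
```

The leading digits 2, 5, 6 agree with 4^4 = 256; the integer part 2 only fixes the position of the decimal point.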
So the solution will be something like this:
double R = N * log10 (N);
R = R - (long long) R; //so taking only the fractional part
double V = pow(10, R);
int powerK = 1;
for (int i=1; i<k; i++) powerK *=10; // 10^(k-1), because 10^R already has one digit before the decimal point
V *= powerK;
//Now print the first K digits: the integer part of V
Why don't you want to use bigint libraries?
Bignum arithmetic is very hard to do right and efficiently. You could still get a PhD by working on that subject.
First, bigint arithmetic involves non-trivial algorithms.
Then, bigint implementations usually need some machine instructions (like add with carry) which are not easily accessible in plain C.
For your specific problem (first and last few digits of N^N) you had better also reason on paper (using arithmetic theorems) to lower the complexity. I am not an expert, but I guess that it still remains intractable, perhaps with a complexity worse than O(N).