How to calculate inverse modular exponentiation in C? - c

I want to take the modular inverse of an integer raised to a power k (k ≥ 1) and then multiply the result by another integer, as in the following expression:
result = (x^(-k) * y) mod z
How can I implement this expression for k ≥ 1?

You need to define four functions:
uint64_t modular_exponentiation(uint64_t x, uint64_t y, uint64_t z)
{
uint64_t res = 1;
x = x % z;
while (y > 0)
{
if (y & 1)
res = (res*x) % z;
y = y>>1; // y = y/2
x = (x*x) % z;
}
return res;
}
uint64_t moduloMultiplication(uint64_t a, uint64_t b,uint64_t z)
{
uint64_t res = 0;
a %= z;
while (b)
{
if (b & 1)
res = (res + a) % z;
a = (2 * a) % z;
b >>= 1; // b = b / 2
}
return res;
}
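/* gcd and Bezout coefficients filled in by extendedEuclid(); signed because x and y can become negative */
static int64_t d, x, y;
void extendedEuclid(int64_t A, int64_t B)
{
int64_t temp;
if(B == 0)
{
d = A;
x = 1;
y = 0;
}
else
{
extendedEuclid(B,A%B);
temp = x;
x = y;
y = temp - (A/B)*y;
}
}
uint64_t modInverse(uint64_t A, uint64_t M)
{
/* assumes A and M fit in int64_t and gcd(A, M) == 1 */
extendedEuclid((int64_t)A, (int64_t)M);
if (x < 0)
x += (int64_t)M;
return (uint64_t)x;
}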
In main():
uint64_t result=0x00;
result=modular_exponentiation(x,k,z); // (x^k) mod z
result=modInverse(result,z); // ((x^k)^-1) mod z == x^(-k) mod z
result=moduloMultiplication(result,y,z);// x^(-k) * y mod z

You will need the extended Euclidean algorithm (extended greatest common divisor) to compute the inverse of x modulo z. When x and z are relatively prime, there are integers a and b with a * x + b * z = 1 = gcd(x, z). Thus a * x = 1 - b * z, that is a * x = 1 mod z, and a is the inverse of x modulo z.
Now you can compute the result with x^-1 = a mod z:
result = power(a, k) * y % z
with ordinary integer arithmetic in C, where power() is ordinary integer exponentiation (reducing modulo z after each multiplication keeps the intermediate values small).
Since the values in such calculations can become very large very quickly, it is better to use a ready-made library (e.g. GMP).
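The steps described above can be sketched in one self-contained piece of C. This is only an illustration under a stated assumption: the modulus is below 2^32, so every product fits in a uint64_t. The names mul_mod, pow_mod, inv_mod and inv_pow_times are hypothetical helpers, not part of any answer or library.

```c
#include <stdint.h>

/* (a * b) % m; free of overflow only because we assume m < 2^32 */
static uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m)
{
    return (a % m) * (b % m) % m;
}

/* base^exp mod m by square-and-multiply */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t res = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            res = mul_mod(res, base, m);
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    return res;
}

/* Iterative extended Euclid: stores a^-1 mod m in *inv,
   returns 0 if gcd(a, m) != 1 (no inverse exists) */
static int inv_mod(uint64_t a, uint64_t m, uint64_t *inv)
{
    int64_t r0 = (int64_t)m, r1 = (int64_t)(a % m);
    int64_t t0 = 0, t1 = 1;
    while (r1 != 0) {
        int64_t q = r0 / r1, tmp;
        tmp = r0 - q * r1; r0 = r1; r1 = tmp;
        tmp = t0 - q * t1; t0 = t1; t1 = tmp;
    }
    if (r0 != 1)
        return 0;
    *inv = (uint64_t)(t0 < 0 ? t0 + (int64_t)m : t0);
    return 1;
}

/* Hypothetical helper: *out = x^(-k) * y mod z, for 0 < z < 2^32.
   Returns 1 on success, 0 if z is out of range or x has no inverse. */
int inv_pow_times(uint64_t x, uint64_t k, uint64_t y, uint64_t z, uint64_t *out)
{
    uint64_t inv;
    if (z == 0 || z > UINT32_MAX || !inv_mod(x, z, &inv))
        return 0;
    *out = mul_mod(pow_mod(inv, k, z), y, z);
    return 1;
}
```

For example, with x = 3, k = 2, y = 5, z = 7 the inverse of 3 is 5 (3 * 5 = 15 = 2 * 7 + 1), so the helper stores 5^2 * 5 mod 7 = 6 in *out. For moduli of 2^32 or more, mul_mod would need a wider intermediate type or a shift-and-add multiplication.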

You can try this mod_inv C function:
// return a modular multiplicative inverse of n with respect to the modulus.
// return 0 if the linear congruence has no solutions.
unsigned mod_inv(unsigned ra, unsigned rb) {
unsigned rc, sa = 1, sb = 0, sc, i = 0;
if (rb > 1) do {
rc = ra % rb;
sc = sa - (ra / rb) * sb;
sa = sb, sb = sc;
ra = rb, rb = rc;
} while (++i, rc);
sa *= (i *= ra == 1) != 0;
sa += (i & 1) * sb;
return sa;
}
This is basically the standard algorithm. Note that for n = 1 and mod = 0 the output is 0, not 1, but there are few computations to perform modulo 0 anyway.
The modular multiplicative inverse of an integer N modulo m is an integer n such that (N * n) % m == 1. If a modular inverse exists, it is unique modulo m. To calculate it, use the extended Euclidean algorithm, which finds solutions to Bézout's identity.
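As a concrete instance of Bézout's identity, here is the arithmetic for the pair 52 and 107 (a worked check added for illustration):

```latex
35 \cdot 52 - 17 \cdot 107 = 1820 - 1819 = 1
\quad\Longrightarrow\quad
35 \cdot 52 \equiv 1 \pmod{107}
```

So 35 is the inverse of 52 modulo 107, and the coefficient of the modulus (here -17) is simply discarded.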
Example of usage:
#include <assert.h>
int main(void) {
unsigned n, mod, res;
n = 52, mod = 107;
res = mod_inv(n, mod);
assert(res == 35); // 35 is a solution of the linear congruence.
n = 66, mod = 123;
res = mod_inv(n, mod);
assert(res == 0); // 66 does not have an inverse modulo 123.
}
/*
n = 7 and mod = 45 then res = 13 so 1 == ( 13 * 7 ) % 45
n = 52 and mod = 107 then res = 35 so 1 == ( 35 * 52 ) % 107
n = 213 and mod = 155 then res = 147 so 1 == ( 147 * 213 ) % 155
n = 392 and mod = 45 then res = 38 so 1 == ( 38 * 392 ) % 45
n = 687 and mod = 662 then res = 53 so 1 == ( 53 * 687 ) % 662
n = 451 and mod = 799 then res = 512 so 1 == ( 512 * 451 ) % 799
n = 1630 and mod = 259 then res = 167 so 1 == ( 167 * 1630 ) % 259
n = 4277 and mod = 4722 then res = 191 so 1 == ( 191 * 4277 ) % 4722
*/

Related

Question about address operations in NEON intrinsics

I'm rearranging an array in my project on ARMv7. I now have the elements' addresses d[] in the order I expect. To make the code more efficient, I want to use NEON intrinsics in C++. My problem is that I can load the address array d[] with vld1q_s32(), but I do not know how to read the elements of this vector as addresses. The instructions I know of can only duplicate one vector.
This problem has been confusing me for several days. Or is this something NEON cannot do?
Thanks for your answers.
Here is my code:
void InputRearrange(int8_t* din, int8_t* dout, const int x, const int y){
int8_t* dout_array[16];
int out = 0;
dout_array[0] = din;
for(int n = 1; n < 16; n++) {//get the address of the first line in z-axis
dout_array[n] = dout_array[n - 1] + x*y;
}
for(int y_count = 0; y_count < y; y_count++) {
for(int x_count = 0; x_count < x; x_count++) {
for(int z_count = 0; z_count < 16; z_count++) {
dout[out++] = *(dout_array[z_count]++);// incrementing dout_array[z_count] moves it along the x-axis; this is the loop I want to change into NEON intrinsics.
}
}
}
}
din[] is the original array; it is like a 3-D cube stored as a 1-D array. The cube has three axes: x, y and z (= 16). The original array din[] stores the elements along the x-axis first, then the y-axis, and the z-axis last. My code changes the order to z-axis first, then x-axis, and y-axis last. I would like to use NEON intrinsics in the innermost for loop, but it seems that this cannot be done.
Your code rearranges a three-dimensional input array int8_t (&input)[16][y][x] into the output int8_t (&output)[y][x][16], which is equivalent to transposing a 2D array int8_t (&in)[16][x*y] into int8_t (&out)[x*y][16].
This can definitely benefit from ARM NEON intrinsics that interleave/deinterleave either registers (vzip, vuzp) or memory contents (vldN, vstN).
// planarizes next 128 bytes to 16 planes
void planarize(int8_t *in, int8_t *out, int xy) {
int8_t * o_1 = out + 4*xy;
int8_t * o_2 = out + 8*xy;
int8_t * o_3 = out + 12*xy;
int8x16x4_t a = vld4q_s8(in); in+=64;
int8x16x4_t b = vld4q_s8(in); in+=64;
int8x16x2_t c = vuzpq_s8(a.val[0], b.val[0]);
int8x16x2_t d = vuzpq_s8(a.val[1], b.val[1]);
int8x16x2_t e = vuzpq_s8(a.val[2], b.val[2]);
int8x16x2_t f = vuzpq_s8(a.val[3], b.val[3]);
c = vuzpq_s8(c.val[0], c.val[1]);
d = vuzpq_s8(d.val[0], d.val[1]);
e = vuzpq_s8(e.val[0], e.val[1]);
f = vuzpq_s8(f.val[0], f.val[1]);
// now c = 0 16 32 48 64 80 96 112 4 20 36 52 68 84 100 116
// 8 24 40 56 72 88 104 120 12 28 44 60 76 92 108 124
// d = c + 1, e = d + 1, f = e + 1
// low halves hold planes 0..3 and 8..11, high halves hold planes 4..7 and 12..15
vst1_s8(out + 0 * xy, vget_low_s8(c.val[0]));
vst1_s8(out + 1 * xy, vget_low_s8(d.val[0]));
vst1_s8(out + 2 * xy, vget_low_s8(e.val[0]));
vst1_s8(out + 3 * xy, vget_low_s8(f.val[0]));
vst1_s8(o_1 + 0 * xy, vget_high_s8(c.val[0]));
vst1_s8(o_1 + 1 * xy, vget_high_s8(d.val[0]));
vst1_s8(o_1 + 2 * xy, vget_high_s8(e.val[0]));
vst1_s8(o_1 + 3 * xy, vget_high_s8(f.val[0]));
vst1_s8(o_2 + 0 * xy, vget_low_s8(c.val[1]));
vst1_s8(o_2 + 1 * xy, vget_low_s8(d.val[1]));
vst1_s8(o_2 + 2 * xy, vget_low_s8(e.val[1]));
vst1_s8(o_2 + 3 * xy, vget_low_s8(f.val[1]));
vst1_s8(o_3 + 0 * xy, vget_high_s8(c.val[1]));
vst1_s8(o_3 + 1 * xy, vget_high_s8(d.val[1]));
vst1_s8(o_3 + 2 * xy, vget_high_s8(e.val[1]));
vst1_s8(o_3 + 3 * xy, vget_high_s8(f.val[1]));
}
The opposite would interleave from 16 independent planes:
int8x16x2_t load4(int8_t *in, int xy) {
int8x8_t a0 = vld1_s8(in);
int8x8_t a1 = vld1_s8(in + xy);
int8x8_t a2 = vld1_s8(in + 2 * xy);
int8x8_t a3 = vld1_s8(in + 3 * xy);
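// zip planes 0 with 2 and 1 with 3 first, so that the final byte zip yields the plane order 0 1 2 3
auto a = vzipq_s8(vcombine_s8(a0, a0), vcombine_s8(a2, a2)).val[0];
auto b = vzipq_s8(vcombine_s8(a1, a1), vcombine_s8(a3, a3)).val[0];
return vzipq_s8(a,b);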
}
int8_t *store4(int8x16x2_t a, int8x16x2_t b, int8x16x2_t c, int8x16x2_t d, int8_t *out) {
int32x4x4_t A{
vreinterpretq_s32_s8(a.val[0]),
vreinterpretq_s32_s8(b.val[0]),
vreinterpretq_s32_s8(c.val[0]),
vreinterpretq_s32_s8(d.val[0])};
int32x4x4_t B{
vreinterpretq_s32_s8(a.val[1]),
vreinterpretq_s32_s8(b.val[1]),
vreinterpretq_s32_s8(c.val[1]),
vreinterpretq_s32_s8(d.val[1])};
vst4q_s32((int32_t*)out, A); out += 64;
vst4q_s32((int32_t*)out, B); out += 64;
return out;
}
void interleave(int8_t *in, int8_t *out, int xy) {
int w = xy;
while (w >= 8) {
auto a = load4(in, xy);
auto b = load4(in + 4*xy, xy);
auto c = load4(in + 8*xy, xy);
auto d = load4(in + 12*xy, xy);
in += 8;
out = store4(a,b,c,d, out);
w -= 8;
}
}
Handling the excess ((xy & 7) != 0) can be done by processing one additional full block that overlaps the previous one, starting at in_ptr + xy - 8 and out_ptr + xy * 16 - 8*16.
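For checking the NEON routines, or for handling a short tail element by element, a plain scalar version of the interleave can serve as a reference. This is only a sketch: interleave_ref is a hypothetical name, and it assumes the layout discussed in this answer, with 16 planes of xy bytes as input and xy groups of 16 bytes as output.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: in holds 16 planes of xy bytes each,
   out receives xy groups of 16 bytes (one byte from every plane). */
void interleave_ref(const int8_t *in, int8_t *out, size_t xy)
{
    for (size_t i = 0; i < xy; i++)        /* position inside a plane */
        for (size_t z = 0; z < 16; z++)    /* plane index */
            out[i * 16 + z] = in[z * xy + i];
}
```

Comparing its output byte for byte with the vectorized routine on a few sizes, including ones where xy is not a multiple of 8, is a cheap way to catch ordering mistakes in the zip/unzip steps.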

Modular multiplicative inverse

I am calculating ((A^B)/C)%M, but my code does not work when A, B, C and M are large. It gives the right answer when A, B, C and M are small ints.
What is wrong here?
Here C and M are co-prime.
Sample input
2 3 4 5
Sample output
2
The code fails for this input:
969109092 60139073 122541116 75884463
C program
#include <stdio.h>
int d,x,y;
Modular exponential (A^B)%M
int power(int A, int B, int M)
{
long long int result=1;
while(B>0)
{
if(B % 2 ==1)
{
result=(result * A)%M;
}
A=(A*A)%M;
B=B/2;
}
return result;
}
Modular multiplicative inverse
void extendedEuclid(int A, int B)
{
if(B == 0)
{
d = A;
x = 1;
y = 0;
}
else
{
extendedEuclid(B,A%B);
int temp = x;
x = y;
y = temp - (A/B)*y;
}
}
int modInv(int A, int M)
{
extendedEuclid(A,M);
return (x%M+M)%M;
}
main()
int main()
{
int A,B,C,M;
scanf("%d %d %d %d",&A,&B,&C,&M);
int inv = modInv(C,M)%M;
printf("%d\n",inv);
long long int p = (power(A,B,M))%M;
printf("%d\n",p);
long long int ans = (p * inv)%M;
//printf("%d",((modInv(C,M)*(power(A,B,M))))%M);
printf("%lld",ans);
return 0;
}
Code has at least the following issues:
int overflow in A*A. Code needs to calculate the product A*A using wider math. That is why code works with small values, but not large.
// A=(A*A)%M;
A = ((long long)A*A) % M;
// or
A = (1LL*A*A) % M;
Wrong print specifier. This implies compiler warnings are not fully enabled. Save time: enable them all.
long long int p = (power(A,B,M))%M;
// printf("%d\n",p);
printf("%lld\n",p);
Code is amiss with negative values. Rather than patch that int hole, use unsigned types.
unsigned power(unsigned A, unsigned B, unsigned M) {
unsigned long long result = 1;
...
Failed corner case in power(A,0,1). result should be 0 when M==1.
// long long int result=1;
long long int result=1%M;
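Taken together, the points above might look like the following sketch. It is an illustration, not the answerer's code: power_mod is a hypothetical name, and it assumes 32-bit operands so that every product fits in 64 bits.

```c
#include <stdint.h>

/* (a^b) % m with unsigned types, a 64-bit intermediate product,
   and the m == 1 corner case handled by starting from 1 % m. */
uint32_t power_mod(uint32_t a, uint32_t b, uint32_t m)
{
    uint64_t result = 1 % m;   /* 0 when m == 1 */
    uint64_t s = a % m;        /* running square, always below m */
    while (b > 0) {
        if (b & 1)
            result = result * s % m;
        s = s * s % m;         /* both factors are below 2^32, so no overflow */
        b >>= 1;
    }
    return (uint32_t)result;
}
```

A modulus close to 2^32 exercises the overflow fix: 4294967295 is congruent to 4 modulo 4294967291, so squaring it should give 16.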
Test version with fixes noted in comments:
#include <stdio.h>
int d,x,y;
int power(int A, int B, int M)
{
long long int result=1;
long long int S = A; /* fix */
while(B>0)
{
if(B % 2 ==1)
{
result=(result * S)%M; /* fix */
}
S=(S*S)%M; /* fix */
B=B/2;
}
return (int)result;
}
void extendedEuclid(int A, int B)
{
int temp; /* C */
if(B == 0)
{
d = A;
x = 1;
y = 0;
}
else
{
extendedEuclid(B,A%B);
temp = x;
x = y;
y = temp - (A/B)*y;
}
}
int modInv(int A, int M)
{
extendedEuclid(A,M);
/* x = x%M; ** not needed */
if (x < 0) /* fix */
x += M; /* fix */
return (x); /* fix */
}
int main()
{
int A,B,C,M; /* C */
int inv, p, ans; /* C */
A = 969109092; /* 2^2 × 3^2 ×7 × 1249 × 3079 */
B = 60139073; /* 60139073 */
C = 122541116; /* 2^2 × 1621 × 18899 */
M = 75884463; /* 3^2 × 8431607 */
inv = modInv(C,M)%M; /* 15543920 */
printf("%d\n",inv);
p = power(A,B,M)%M; /* 6704397 */
printf("%d\n",p);
ans = (unsigned)(((unsigned long long)p * inv)%M); /* fix 22271562 */
printf("%d\n",ans);
return 0;
}
The range of int is probably too small for the intermediate products; try a wider integer type such as long long.
Be careful, because power() returns an int, not a long long int.
You can try this mod_inv C function:
// return a modular multiplicative inverse of n with respect to the modulus.
// return 0 if the linear congruence has no solutions.
unsigned mod_inv(unsigned ra, unsigned rb) {
unsigned rc, sa = 1, sb = 0, sc, i = 0;
if (rb > 1) do {
rc = ra % rb;
sc = sa - (ra / rb) * sb;
sa = sb, sb = sc;
ra = rb, rb = rc;
} while (++i, rc);
sa *= (i *= ra == 1) != 0;
sa += (i & 1) * sb;
return sa;
}
This is basically the standard algorithm. To avoid overflow it works on unsigned values and keeps track of the sign separately (here through the parity of the iteration counter i; you could also use a struct for this). Also, for n = 1 and mod = 0 the output is 0, not 1, but there are few computations to perform modulo 0 anyway.
The modular multiplicative inverse of an integer N modulo m is an integer n such that (N * n) % m == 1. If a modular inverse exists, it is unique modulo m. To calculate it, use the extended Euclidean algorithm, which finds solutions to Bézout's identity.
Example of usage:
#include <assert.h>
int main(void) {
unsigned n, mod, res;
n = 52, mod = 107;
res = mod_inv(n, mod);
assert(res == 35); // 35 is a solution of the linear congruence.
n = 66, mod = 123;
res = mod_inv(n, mod);
assert(res == 0); // 66 does not have an inverse modulo 123.
}
/*
n = 7 and mod = 45 then res = 13 so 1 == ( 13 * 7 ) % 45
n = 52 and mod = 107 then res = 35 so 1 == ( 35 * 52 ) % 107
n = 213 and mod = 155 then res = 147 so 1 == ( 147 * 213 ) % 155
n = 392 and mod = 45 then res = 38 so 1 == ( 38 * 392 ) % 45
n = 687 and mod = 662 then res = 53 so 1 == ( 53 * 687 ) % 662
n = 451 and mod = 799 then res = 512 so 1 == ( 512 * 451 ) % 799
n = 1630 and mod = 259 then res = 167 so 1 == ( 167 * 1630 ) % 259
n = 4277 and mod = 4722 then res = 191 so 1 == ( 191 * 4277 ) % 4722
*/

Compute logarithmic expression without floating point arithmetics or log

I need to compute the mathematical expression floor(ln(u)/ln(1-p)) for 0 < u < 1 and 0 < p < 1 in C on an embedded processor with no floating-point arithmetic and no ln function. The result is a positive integer. I know about the limit cases (p=0); I'll deal with them later.
I imagine that the solution involves having u and p range over 0..UINT16_MAX and appealing to a lookup table for the logarithm, but I cannot figure out how exactly: what does the lookup table map to?
The result need not be 100% exact; approximations are OK.
Thanks!
Since the logarithm is used in both dividend and divisor, there is no need to use log(); we can use log2() instead. Due to the restrictions on the inputs u and p the logarithms are known to be both negative, so we can restrict ourselves to compute the positive quantity -log2().
We can use fixed-point arithmetic to compute the logarithm. We do so by multiplying the original input by a sequence of factors of decreasing magnitude that approach 1. Considering each of the factors in sequence, we multiply the input only by those factors that result in a product closer to 1, but without exceeding it. While doing so, we sum the log2() of the factors that "fit". At the end of this procedure we wind up with a number very close to 1 as our final product, and a sum that represents the binary logarithm.
This process is known in the literature as multiplicative normalization or pseudo division, and some early publications describing it are the works by De Lugish and Meggitt. The latter indicates that the origin is basically Henry Briggs's method for computing common logarithms.
B. de Lugish. "A Class of Algorithms for Automatic Evaluation of Functions and Computations in a Digital Computer". PhD thesis, Dept. of Computer Science, University of Illinois, Urbana, 1970.
J. E. Meggitt. "Pseudo division and pseudo multiplication processes". IBM Journal of Research and Development, Vol. 6, No. 2, April 1962, pp. 210-226
As the chosen set of factors comprises 2^i and (1+2^(-i)), the necessary multiplications can be performed without the need for a multiplication instruction: the products can be computed by either shift or shift plus add.
Since the inputs u and p are purely fractional numbers with 16 bits, we may want to choose a 5.16 fixed-point result for the logarithm. By simply dividing the two logarithm values, we remove the fixed-point scale factor, and apply a floor() operation at the same time, because for positive numbers, floor(x) is identical to trunc(x) and integer division is truncating.
Note that the fixed-point computation of the logarithm results in large relative error for inputs near 1. This in turn means the entire function computed using fixed-point arithmetic may deliver results significantly different from the reference if p is small. An example of this is the following test case: u=55af p=0052 res=848 ref=874.
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
/* input x is a 0.16 fixed-point number in [0,1)
function returns -log2(x) as a 5.16 fixed-point number in (0, 16]
*/
uint32_t nlog2_16 (uint16_t x)
{
uint32_t r = 0;
uint32_t t, a = x;
/* try factors 2**i with i = 8, 4, 2, 1 */
if ((t = a << 8 ) < 0x10000) { a = t; r += 0x80000; }
if ((t = a << 4 ) < 0x10000) { a = t; r += 0x40000; }
if ((t = a << 2 ) < 0x10000) { a = t; r += 0x20000; }
if ((t = a << 1 ) < 0x10000) { a = t; r += 0x10000; }
/* try factors (1+2**(-i)) with i = 1, .., 16 */
if ((t = a + (a >> 1)) < 0x10000) { a = t; r += 0x095c0; }
if ((t = a + (a >> 2)) < 0x10000) { a = t; r += 0x0526a; }
if ((t = a + (a >> 3)) < 0x10000) { a = t; r += 0x02b80; }
if ((t = a + (a >> 4)) < 0x10000) { a = t; r += 0x01664; }
if ((t = a + (a >> 5)) < 0x10000) { a = t; r += 0x00b5d; }
if ((t = a + (a >> 6)) < 0x10000) { a = t; r += 0x005ba; }
if ((t = a + (a >> 7)) < 0x10000) { a = t; r += 0x002e0; }
if ((t = a + (a >> 8)) < 0x10000) { a = t; r += 0x00171; }
if ((t = a + (a >> 9)) < 0x10000) { a = t; r += 0x000b8; }
if ((t = a + (a >> 10)) < 0x10000) { a = t; r += 0x0005c; }
if ((t = a + (a >> 11)) < 0x10000) { a = t; r += 0x0002e; }
if ((t = a + (a >> 12)) < 0x10000) { a = t; r += 0x00017; }
if ((t = a + (a >> 13)) < 0x10000) { a = t; r += 0x0000c; }
if ((t = a + (a >> 14)) < 0x10000) { a = t; r += 0x00006; }
if ((t = a + (a >> 15)) < 0x10000) { a = t; r += 0x00003; }
if ((t = a + (a >> 16)) < 0x10000) { a = t; r += 0x00001; }
return r;
}
/* Compute floor(log(u)/log(1-p)) for 0 < u < 1 and 0 < p < 1,
where 'u' and 'p' are represented as 0.16 fixed-point numbers
Result is an integer in range [0, 1048576]
*/
uint32_t func (uint16_t u, uint16_t p)
{
uint16_t one_minus_p = 0x10000 - p; // 1.0 - p
uint32_t log_u = nlog2_16 (u);
uint32_t log_p = nlog2_16 (one_minus_p);
uint32_t res = log_u / log_p; // divide and floor in one go
return res;
}
The maximum value of this function basically depends on the precision limit; that is, how arbitrarily close to the limits (u -> 0) or (1 - p -> 1) the fixed point values can be.
If we assume (k) fractional bits, e.g., with the limits: u = (2^-k) and 1 - p = 1 - (2^-k),
then the maximum value is: k / (k - log2(2^k - 1))
(As the ratio of natural logarithms, we are free to use any base e.g., lb(x) or log2)
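To get a feel for the magnitude, the bound can be approximated with the first-order expansion -ln(1 - ε) ≈ ε for ε = 2^-k. This approximation is added here for illustration and is not part of the derivation above:

```latex
\frac{k}{k - \log_2\left(2^{k} - 1\right)}
  = \frac{k}{-\log_2\left(1 - 2^{-k}\right)}
  \approx k \, 2^{k} \ln 2
```

For k = 10 this is roughly 7.1e3, and for k = 16 roughly 7.3e5.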
Unlike njuffa's answer, I went with a lookup table approach, settling on k = 10 fractional bits to represent 0 < frac(u) < 1024 and 0 < frac(p) < 1024. This requires a log table with 2^k entries. Using 32-bit table values, we're only looking at a 4KiB table.
Any more than that, and you are using enough memory that you could seriously consider using the relevant parts of a 'soft-float' library. e.g., k = 16 would yield a 256KiB LUT.
We're computing the values - log2(i / 1024.0) for 0 < i < 1024. Since these values lie in the interval (0, k], with k = 10, we only need 4 binary digits to store the integral part. So we store the precomputed LUT in 32-bit [4.28] fixed-point format:
uint32_t lut[1024]; /* never use lut[0] */
for (uint32_t i = 1; i < 1024; i++)
lut[i] = (uint32_t) (-log2(i / 1024.0) * 268435456.0);
Given: u, p represented by [0.10] fixed-point values in [1, 1023] :
uint32_t func (uint16_t u, uint16_t p)
{
/* assert: 0 < u, p < 1024 */
return lut[u] / lut[1024 - p];
}
We can easily test all valid (u, p) pairs against the 'naive' floating-point evaluation:
floor(log(u / 1024.0) / log(1.0 - p / 1024.0))
and only get a mismatch (+1 too high) on the following cases:
u = 193, p = 1 : 1708 vs 1707 (1.7079978488147417e+03)
u = 250, p = 384 : 3 vs 2 (2.9999999999999996e+00)
u = 413, p = 4 : 232 vs 231 (2.3199989016957960e+02)
u = 603, p = 1 : 542 vs 541 (5.4199909906444600e+02)
u = 680, p = 1 : 419 vs 418 (4.1899938077226307e+02)
Finally, it turns out that using the natural logarithm in a [3.29] fixed-point format gives us even higher precision, where:
lut[i] = (uint32_t) (-log(i / 1024.0) * 536870912.0);
only yields a single 'mismatch', though 'bignum' precision suggests it's correct:
u = 250, p = 384 : 3 vs 2 (2.9999999999999996e+00)

Find the sum of digits in 100 factorial in C without using Array

To solve Project Euler problem 20 (find the sum of the digits of 100!), I am running the following program. It works for factorials of small numbers but not for 100. Which data type should I use, or is it necessary to use an array to store the digits?
int rec(int);
void main()
{
int f=1,i=1,z,s=0,r,n;
while(i<=100)
{
f=f*i;
f=rec(f);
i++;
}
n=f;
while(n!=0)
{
r=n%10;
n=n/10;
s=s+r;
}
printf("\n%d",s);
}
int rec(int t)
{
if(t%10==0)
{
t=t/10;
rec(t);
}
return t;
}
An approximate value of 100! can be calculated using the double type. You can also use Stirling's formula, which states that
n! ≈ sqrt(2*M_PI*n) * pow(n/exp(1),n)
If you plug in n = 100, you'll get n! ≈ 9*10^157. That means your type needs to be able to hold 158 decimal digits or, equivalently, ~log2(9*10^157) = 525 bits or 66 8-bit bytes.
No fundamental numeric type in C is big enough. The largest you are guaranteed to get is 64 bits (if you use unsigned long long).
So, if you want to calculate n! in C, you either need to construct long arithmetic multiplication by hand or use a special library that can do that for you.
For this relatively simple task you can actually implement long multiplication and use it to get the factorial value by repeated multiplication.
In the following program I've used an in-place multiplication algorithm, which modifies one of the multiplicands in the process and eventually replaces it with the product. The algorithm can be derived directly from the long multiplication known from school.
This program calculates factorials of integers from 1 up to and including 100:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <limits.h>
typedef unsigned char uint8;
typedef unsigned short uint16;
#if UINT_MAX >= 0xFFFFFFFF
typedef unsigned uint32;
#else
typedef unsigned long uint32;
#endif
typedef unsigned uint;
void MulInPlace(uint8* dst/* n bytes */,
const uint8* src/* n bytes */,
uint n)
{
uint c1, c2;
if (n >= 0xFFFF) abort();
for (c1 = n - 1; c1 != ~0u; c1--)
{
uint16 s = 0;
uint32 p = 0; // p must be able to store ceil(log2(n))+2*8 bits
for (c2 = c1; c2 != ~0u; c2--)
{
p += dst[c2] * src[c1 - c2];
}
dst[c1] = (uint8)(p & 0xFF);
for (c2 = c1 + 1; c2 < n; c2++)
{
p >>= 8;
s += dst[c2] + (uint8)(p & 0xFF);
dst[c2] = (uint8)(s & 0xFF);
s >>= 8;
}
}
}
int ByteDivInPlace(uint8* dst/* n bytes */,
uint n,
uint8 divisor,
uint8* remainder)
{
uint rem = 0;
int nonzero = 0;
while (n)
{
rem += dst[n - 1];
nonzero |= (dst[n - 1] = rem / divisor);
rem = (rem % divisor) << 8;
n--;
}
if (remainder != NULL)
*remainder = (uint8)(rem >> 8);
return nonzero; // 1 if the quotient is non-zero, 0 otherwise
}
void IncInPlace(uint8* dst/* n bytes */,
uint n)
{
uint c = 1;
while (n-- && c)
{
c += *dst;
*dst++ = c & 0xFF;
c >>= 8;
}
}
void DestroyingDecimalPrint(uint8* dst, uint n)
{
uint8 r;
if (ByteDivInPlace(dst, n, 10, &r))
DestroyingDecimalPrint(dst, n);
printf("%d", r);
}
int main(void)
{
int i;
uint8 factorial[66];
uint8 factor[sizeof(factorial)];
uint8 tmp[sizeof(factorial)];
// factor = 1
memset(factor, 0, sizeof(factor));
factor[0] = 1;
// factorial = 1
memcpy(factorial, factor, sizeof(factorial));
for (i = 1; i <= 100; i++)
{
// factorial *= factor
MulInPlace(factorial, factor, sizeof(factorial));
// tmp = factorial
memcpy(tmp, factorial, sizeof(factorial));
// print i and tmp
printf("%i! = ", i);
DestroyingDecimalPrint(tmp, sizeof(tmp));
printf("\n");
// factor += 1
IncInPlace(factor, sizeof(factor));
}
return 0;
}
Output (ideone):
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
7! = 5040
8! = 40320
9! = 362880
10! = 3628800
11! = 39916800
12! = 479001600
13! = 6227020800
14! = 87178291200
15! = 1307674368000
16! = 20922789888000
17! = 355687428096000
18! = 6402373705728000
19! = 121645100408832000
20! = 2432902008176640000
21! = 51090942171709440000
22! = 1124000727777607680000
23! = 25852016738884976640000
24! = 620448401733239439360000
25! = 15511210043330985984000000
26! = 403291461126605635584000000
27! = 10888869450418352160768000000
28! = 304888344611713860501504000000
29! = 8841761993739701954543616000000
30! = 265252859812191058636308480000000
31! = 8222838654177922817725562880000000
32! = 263130836933693530167218012160000000
33! = 8683317618811886495518194401280000000
34! = 295232799039604140847618609643520000000
35! = 10333147966386144929666651337523200000000
36! = 371993326789901217467999448150835200000000
37! = 13763753091226345046315979581580902400000000
38! = 523022617466601111760007224100074291200000000
39! = 20397882081197443358640281739902897356800000000
40! = 815915283247897734345611269596115894272000000000
41! = 33452526613163807108170062053440751665152000000000
42! = 1405006117752879898543142606244511569936384000000000
43! = 60415263063373835637355132068513997507264512000000000
44! = 2658271574788448768043625811014615890319638528000000000
45! = 119622220865480194561963161495657715064383733760000000000
46! = 5502622159812088949850305428800254892961651752960000000000
47! = 258623241511168180642964355153611979969197632389120000000000
48! = 12413915592536072670862289047373375038521486354677760000000000
49! = 608281864034267560872252163321295376887552831379210240000000000
50! = 30414093201713378043612608166064768844377641568960512000000000000
51! = 1551118753287382280224243016469303211063259720016986112000000000000
52! = 80658175170943878571660636856403766975289505440883277824000000000000
53! = 4274883284060025564298013753389399649690343788366813724672000000000000
54! = 230843697339241380472092742683027581083278564571807941132288000000000000
55! = 12696403353658275925965100847566516959580321051449436762275840000000000000
56! = 710998587804863451854045647463724949736497978881168458687447040000000000000
57! = 40526919504877216755680601905432322134980384796226602145184481280000000000000
58! = 2350561331282878571829474910515074683828862318181142924420699914240000000000000
59! = 138683118545689835737939019720389406345902876772687432540821294940160000000000000
60! = 8320987112741390144276341183223364380754172606361245952449277696409600000000000000
61! = 507580213877224798800856812176625227226004528988036003099405939480985600000000000000
62! = 31469973260387937525653122354950764088012280797258232192163168247821107200000000000000
63! = 1982608315404440064116146708361898137544773690227268628106279599612729753600000000000000
64! = 126886932185884164103433389335161480802865516174545192198801894375214704230400000000000000
65! = 8247650592082470666723170306785496252186258551345437492922123134388955774976000000000000000
66! = 544344939077443064003729240247842752644293064388798874532860126869671081148416000000000000000
67! = 36471110918188685288249859096605464427167635314049524593701628500267962436943872000000000000000
68! = 2480035542436830599600990418569171581047399201355367672371710738018221445712183296000000000000000
69! = 171122452428141311372468338881272839092270544893520369393648040923257279754140647424000000000000000
70! = 11978571669969891796072783721689098736458938142546425857555362864628009582789845319680000000000000000
71! = 850478588567862317521167644239926010288584608120796235886430763388588680378079017697280000000000000000
72! = 61234458376886086861524070385274672740778091784697328983823014963978384987221689274204160000000000000000
73! = 4470115461512684340891257138125051110076800700282905015819080092370422104067183317016903680000000000000000
74! = 330788544151938641225953028221253782145683251820934971170611926835411235700971565459250872320000000000000000
75! = 24809140811395398091946477116594033660926243886570122837795894512655842677572867409443815424000000000000000000
76! = 1885494701666050254987932260861146558230394535379329335672487982961844043495537923117729972224000000000000000000
77! = 145183092028285869634070784086308284983740379224208358846781574688061991349156420080065207861248000000000000000000
78! = 11324281178206297831457521158732046228731749579488251990048962825668835325234200766245086213177344000000000000000000
79! = 894618213078297528685144171539831652069808216779571907213868063227837990693501860533361810841010176000000000000000000
80! = 71569457046263802294811533723186532165584657342365752577109445058227039255480148842668944867280814080000000000000000000
81! = 5797126020747367985879734231578109105412357244731625958745865049716390179693892056256184534249745940480000000000000000000
82! = 475364333701284174842138206989404946643813294067993328617160934076743994734899148613007131808479167119360000000000000000000
83! = 39455239697206586511897471180120610571436503407643446275224357528369751562996629334879591940103770870906880000000000000000000
84! = 3314240134565353266999387579130131288000666286242049487118846032383059131291716864129885722968716753156177920000000000000000000
85! = 281710411438055027694947944226061159480056634330574206405101912752560026159795933451040286452340924018275123200000000000000000000
86! = 24227095383672732381765523203441259715284870552429381750838764496720162249742450276789464634901319465571660595200000000000000000000
87! = 2107757298379527717213600518699389595229783738061356212322972511214654115727593174080683423236414793504734471782400000000000000000000
88! = 185482642257398439114796845645546284380220968949399346684421580986889562184028199319100141244804501828416633516851200000000000000000000
89! = 16507955160908461081216919262453619309839666236496541854913520707833171034378509739399912570787600662729080382999756800000000000000000000
90! = 1485715964481761497309522733620825737885569961284688766942216863704985393094065876545992131370884059645617234469978112000000000000000000000
91! = 135200152767840296255166568759495142147586866476906677791741734597153670771559994765685283954750449427751168336768008192000000000000000000000
92! = 12438414054641307255475324325873553077577991715875414356840239582938137710983519518443046123837041347353107486982656753664000000000000000000000
93! = 1156772507081641574759205162306240436214753229576413535186142281213246807121467315215203289516844845303838996289387078090752000000000000000000000
94! = 108736615665674308027365285256786601004186803580182872307497374434045199869417927630229109214583415458560865651202385340530688000000000000000000000
95! = 10329978488239059262599702099394727095397746340117372869212250571234293987594703124871765375385424468563282236864226607350415360000000000000000000000
96! = 991677934870949689209571401541893801158183648651267795444376054838492222809091499987689476037000748982075094738965754305639874560000000000000000000000
97! = 96192759682482119853328425949563698712343813919172976158104477319333745612481875498805879175589072651261284189679678167647067832320000000000000000000000
98! = 9426890448883247745626185743057242473809693764078951663494238777294707070023223798882976159207729119823605850588608460429412647567360000000000000000000000
99! = 933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639761565182862536979208272237582511852109168640000000000000000000000
100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
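If only the digit sum is needed, a shorter sketch is to keep the number as an array of decimal digits and multiply it by each factor in turn. This swaps the base-256 representation used above for base 10, which makes summing the digits trivial. The name factorial_digit_sum and the fixed buffer size are assumptions made for this illustration; 200 digits is enough for n up to 100, since 100! has 158 digits.

```c
#include <stddef.h>

/* Sum of the decimal digits of n!, computed with schoolbook
   multiplication on a little-endian array of base-10 digits. */
unsigned factorial_digit_sum(unsigned n)
{
    unsigned char digits[200] = {1};   /* the value 1; sized for n <= 100 */
    size_t len = 1;
    for (unsigned f = 2; f <= n; f++) {
        unsigned carry = 0;
        for (size_t i = 0; i < len; i++) {
            unsigned v = digits[i] * f + carry;
            digits[i] = (unsigned char)(v % 10);
            carry = v / 10;
        }
        /* append the remaining carry as new high-order digits */
        while (carry > 0 && len < sizeof digits) {
            digits[len++] = (unsigned char)(carry % 10);
            carry /= 10;
        }
    }
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += digits[i];
    return sum;
}
```

For n = 10 this gives 27 (3 + 6 + 2 + 8 + 8 + 0 + 0), and for n = 100 it gives 648.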
You should check for overflow: print the value after each iteration.
Note that rec(t); doesn't do anything, as it doesn't use the returned value. You want t = rec(t);.
int is definitely too small. Try long long; if that still overflows, you need another data structure, e.g. the GMP library.
Note: using some "proper" language for the job might give you some insight to the range you have to support... e.g. with python:
>>> import math
>>> math.factorial(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000L
private static void problem20()
{
    // Build 100! as a decimal string, multiplying by 99, 98, ..., 2.
    string muliplent = "100";
    for (int i = 99; i > 1; i--)
    {
        muliplent = getproduct(muliplent, i);
    }
    // Sum the decimal digits of the result.
    int sum = 0;
    char[] result = muliplent.ToCharArray();
    int count = muliplent.ToCharArray().Count();
    for (int j = 0; j < count; j++)
    {
        sum = sum + (result[j] - '0');
    }
    Console.WriteLine("sum is {0}", sum);
    Console.ReadLine();
}
// Schoolbook multiplication of a decimal string by a small integer.
private static string getproduct(string multiplent, int multiplier)
{
    StringBuilder str = new StringBuilder();
    int product = 0;
    int remainder = 0;
    int dividend = 0; // carry into the next (more significant) digit
    char[] c = multiplent.ToCharArray();
    for (int i = c.Count() - 1; i >= 0; i--)
    {
        product = (((c[i] - '0') * multiplier) + dividend);
        remainder = product % 10;
        dividend = product / 10;
        if (i != 0)
        {
            str.Insert(0, remainder);
        }
    }
    // The leading digit keeps its full product, carry included.
    str.Insert(0, product);
    return str.ToString();
}
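If you would rather stay in C without GMP, the same schoolbook multiplication can be sketched with an array of decimal digits. The names mul_small and factorial_digit_sum are made up for this example, and the fixed 200-digit buffer is an assumption that is only guaranteed here for n up to 100 (100! has 158 digits).

```c
#include <stddef.h>

/* Multiplies the little-endian decimal digit array `digits` (length *len)
 * by the small integer `m` in place and appends any remaining carry. */
static void mul_small(unsigned char *digits, size_t *len, unsigned m)
{
    unsigned carry = 0;
    for (size_t i = 0; i < *len; ++i) {
        unsigned p = digits[i] * m + carry;
        digits[i] = (unsigned char)(p % 10);
        carry = p / 10;
    }
    while (carry > 0) {
        digits[(*len)++] = (unsigned char)(carry % 10);
        carry /= 10;
    }
}

/* Returns the sum of the decimal digits of n!.
 * Assumption: n <= 100, so the result fits in the 200-digit buffer. */
unsigned factorial_digit_sum(unsigned n)
{
    unsigned char digits[200] = {1};  /* starts as the number 1 */
    size_t len = 1;
    for (unsigned k = 2; k <= n; ++k)
        mul_small(digits, &len, k);

    unsigned sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += digits[i];
    return sum;
}
```

factorial_digit_sum(100) should give 648, the digit sum of 100!.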

Pollard Rho factorization method implementation in C

Can anyone help me with my Pollard rho implementation? I have implemented it in C. It works fine for numbers of up to 10 digits, but it cannot handle larger numbers.
Please help me improve it so that it can factor numbers of up to 18 digits. My code is this:
#include<stdio.h>
#include<math.h>
int gcd(int a, int b)
{
if(b==0) return a ;
else
return(gcd(b,a%b)) ;
}
long long int mod(long long int a , long long int b , long long int n )
{
long long int x=1 , y=a ;
while(b>0)
{
if(b%2==1) x = ((x%n)*(y%n))%n ;
y = ((y%n)*(y%n))%n ;
b/=2 ;
}
return x%n ;
}
int isprimes(long long int u)
{
if(u==3)
return 1 ;
int a = 2 , i ;
long long int k , t = 0 , r , p ;
k = u-1 ;
while(k%2==0)
{ k/=2 ; t++ ; }
while(a<=3) /*der are no strong pseudoprimes common in base 2 and base 3*/
{
r = mod(a,k,u) ;
for(i = 1 ; i<=t ; i++)
{
p = ((r%u)*(r%u))%u ;
if((p==1)&&(r!=1)&&(r!=(u-1)))
{ return 0 ; }
r = p ;
}
if(p!=1)
return 0 ;
else
a++ ;
}
if(a==4)
return 1 ;
}
long long int pol(long long int u)
{
long long int x = 2 , k , i , a , y , c , s;
int d = 1 ;
k = 2 ;
i = 1 ;
y = x ;
a = u ;
if(isprimes(u)==1)
{
return 1;
}
c=-1 ;
s = 2 ;
while(1)
{
i++;
x=((x%u)*(x%u)-1)% u ;
d = gcd(abs(y-x),u) ;
if(d!=1&&d!=u)
{ printf("%d ",d);
while(a%d==0) { a=a/d; }
x = 2 ;
k = 2 ;
i = 1 ;
y = x ;
if(a==1)
{ return 0 ; }
if(isprimes(a)!=0)
{ return a ; }
u=a ;
}
if(i==k)
{y = x ; k*=2 ; c = x ;} /*floyd cycle detection*/
if(c==x)
{ x = ++s ; }
}
return ;
}
int main()
{
long long int t ;
long long int i , n , j , k , a , b , u ;
while(scanf("%lld",&n)&&n!=0)
{ u = n ; k = 0 ;
while(u%2==0)
{ u/=2 ; k = 1 ; }
if(k==1) printf("2 ") ;
if(u!=1)
t = pol(u) ;
if(u!=1)
{
if(t==1)
{ printf("%lld",u) ; }
else
if(t!=0)
{ printf("%lld",t) ; }
}
printf("\n");
}
return 0;
}
Sorry for the long code; I am a new coder.
When you multiply two numbers modulo m, the intermediate product can become nearly m^2. So if you use a 64-bit unsigned integer type, the largest modulus it can handle safely is 2^32; if the modulus is larger, overflow may happen. Overflow will be rare when the modulus is only slightly larger, but that only makes the bug less obvious: you cannot rely on being lucky if the modulus allows the possibility of overflow.
You can extend the range by a factor of two if you choose a representative of each residue class modulo m with absolute value at most m/2, or something equivalent:
// Assumes x and y are already reduced, i.e. x < m and y < m.
uint64_t mod_mul(uint64_t x, uint64_t y, uint64_t m)
{
    int neg = 0;
    // if x is too large, choose m-x and note that we need one negation for that at the end
    if (x > m/2) {
        x = m - x;
        neg = !neg;
    }
    // if y is too large, choose m-y and note that we need one negation for that at the end
    if (y > m/2) {
        y = m - y;
        neg = !neg;
    }
    uint64_t prod = (x * y) % m;
    // if we had negated _one_ factor, and the product isn't 0 (mod m), negate
    if (neg && prod) {
        prod = m - prod;
    }
    return prod;
}
So that would allow moduli of up to 2^33 with a 64-bit unsigned type. Not a big step.
The recommended solution is to use a big-integer library. GMP, for example, is available as a distribution package on most if not all Linux distros, and is also (relatively) easy to install on Windows.
If that is not an option (really, are you sure?), you can get it to work for larger moduli (up to 2^63 for an unsigned 64-bit integer type) using Russian peasant multiplication:
x * y = 2 * (x * (y/2)) + (x * (y % 2))
so for the calculation, you only need that 2*(m-1) doesn't overflow.
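uint64_t mod_mult(uint64_t x, uint64_t y, uint64_t m)
{
    // x * y = 2 * (x * (y/2)) + x * (y % 2), reduced mod m at every step
    if (y == 0) return 0;
    if (y == 1) return x % m;
    uint64_t temp = mod_mult(x, y/2, m);
    temp = (2*temp) % m;
    if (y % 2 == 1) {
        // reduce x first so that temp + x cannot overflow when x >= m
        temp = (temp + x % m) % m;
    }
    return temp;
}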
Note, however, that this algorithm needs O(log y) steps, so it is rather slow in practice. For smaller m you can speed it up: if 2^k*(m-1) doesn't overflow, you can proceed in steps of k bits instead of single bits (x*y = ((x * (y >> k)) << k) + (x * (y & ((1 << k)-1)))), which is a good improvement if your moduli are never larger than 48 or 56 bits, say.
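A rough sketch of the k-bit variant, assuming y is consumed in chunks of k bits from the top as in the formula above. The name mod_mult_k and the parameter k are invented for this example, and the precondition m <= 2^(64-k) is stated but not checked.

```c
#include <stdint.h>

/* Computes (x * y) % m, consuming y in chunks of k bits, highest chunk first.
 * Preconditions (not checked): 1 <= k <= 32, m > 0 and m <= 2^(64-k),
 * so that (m - 1) << k and (m - 1) * (2^k - 1) both fit in 64 bits. */
uint64_t mod_mult_k(uint64_t x, uint64_t y, uint64_t m, unsigned k)
{
    const uint64_t mask = ((uint64_t)1 << k) - 1;
    uint64_t result = 0;
    x %= m;
    /* Start at the highest multiple of k that is still below 64. */
    for (int shift = (int)((63 / k) * k); shift >= 0; shift -= (int)k) {
        uint64_t chunk = (y >> shift) & mask;      /* next k bits of y */
        result = (result << k) % m;                /* result * 2^k mod m */
        result = (result + (x * chunk) % m) % m;   /* add x * chunk mod m */
    }
    return result;
}
```

With k = 16 this takes 4 loop iterations instead of up to 64, at the cost of limiting m to 48 bits.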
Using that variant of modular multiplication, your algorithm will work for larger numbers (but it will be significantly slower). You can also test the size of the modulus and/or the factors to determine which method to use: if m < 2^32 or x < (2^64-1)/y, the simple (x * y) % m will do.
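The size test could be wrapped up roughly like this. mod_mul_auto and mod_mult_slow are hypothetical names; mod_mult_slow is an iterative rewrite of the add-and-double idea, assumed valid for m up to 2^63. The single check x <= UINT64_MAX / y also covers the m < 2^32 case whenever x and y are already reduced modulo m.

```c
#include <stdint.h>

/* Add-and-double multiplication, iterative; assumes 0 < m <= 2^63. */
static uint64_t mod_mult_slow(uint64_t x, uint64_t y, uint64_t m)
{
    uint64_t result = 0;
    x %= m;
    while (y > 0) {
        if (y & 1)
            result = (result + x) % m;  /* both terms are below m */
        x = (2 * x) % m;
        y >>= 1;
    }
    return result;
}

/* Uses the plain product when x * y provably fits in 64 bits,
 * and falls back to the slow method otherwise. */
uint64_t mod_mul_auto(uint64_t x, uint64_t y, uint64_t m)
{
    if (y == 0 || x <= UINT64_MAX / y)
        return (x * y) % m;
    return mod_mult_slow(x, y, m);
}
```

The division in the check is cheap compared with the 64-step fallback, so small inputs stay on the fast path.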
You can try this C implementation of Pollard Rho:
unsigned long long pollard_rho(const unsigned long long N) {
    // Require : a composite number N, not a square.
    // Ensure : res is a non-trivial factor of N.
    // Option : define a timeout, define a rand function.
    static const int timeout = 18;
    static unsigned long long rand_val = 2994439072U;
    rand_val = (rand_val * 1025416097U + 286824428U) % 4294967291LLU;
    unsigned long long res = 1, a, b, c, i = 0, j = 1, x = 1, y = 1 + rand_val % (N - 1);
    for (; res == 1; ++i) {
        if (i == j) {
            if (j >> timeout)
                break;
            j <<= 1;
            x = y;
        }
        a = y, b = y;
        for (y = 0; a; a & 1 ? b >= N - y ? y -= N : 0, y += b : 0, a >>= 1, (c = b) >= N - b ? c -= N : 0, b += c);
        y = (1 + y) % N;
        for (a = N, b = y > x ? y - x : x - y; (a %= b) && (b %= a););
        res = a | b;
    }
    return res;
}
Otherwise, there is a pure C quadratic sieve that factors numbers of up to 300 bits.
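For the 18-digit range from the question, a more spelled-out sketch might look like the following. It is not the code above: it uses Floyd cycle detection with f(v) = v^2 + c, an add-and-double multiplication to avoid overflow, and invented names (mulmod, gcd_u64, rho_factor). It assumes n is composite and below 2^63; for a prime n it would run for a very long time before giving up.

```c
#include <stdint.h>

/* Overflow-safe (x * y) % m by add-and-double; assumes 0 < m <= 2^63. */
static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t m)
{
    uint64_t result = 0;
    x %= m;
    while (y > 0) {
        if (y & 1)
            result = (result + x) % m;
        x = (2 * x) % m;
        y >>= 1;
    }
    return result;
}

/* Iterative Euclid on 64-bit values. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Pollard rho with Floyd cycle detection and f(v) = v^2 + c (mod n).
 * Returns a non-trivial factor of n, or 0 if every tried c failed.
 * Assumptions: n is composite and n < 2^63. */
uint64_t rho_factor(uint64_t n)
{
    if (n % 2 == 0)
        return 2;
    for (uint64_t c = 1; c < 20; ++c) {
        uint64_t x = 2, y = 2, d = 1;
        while (d == 1) {
            x = (mulmod(x, x, n) + c) % n;   /* tortoise: one step */
            y = (mulmod(y, y, n) + c) % n;   /* hare: two steps */
            y = (mulmod(y, y, n) + c) % n;
            d = gcd_u64(x > y ? x - y : y - x, n);
        }
        if (d != n)
            return d;   /* non-trivial factor found */
        /* d == n: the cycle closed without splitting n, try the next c */
    }
    return 0;
}
```

To factor a number completely you would call this repeatedly on the cofactors, with a primality test (such as the Miller-Rabin check in the question) deciding when to stop.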
