Here, N can be up to 10^18 and M is 10^9 + 7. Since this loop takes O(N) time to execute, I get TLE (time limit exceeded) in my code. Is there any way to reduce the time complexity?

The question is basically:
(count*a^b)%mod = ((count%mod)*((a^b)%mod))%mod
a = 10, b = 10^18
You can find ((a^b)%mod) using:
long long power(long long x, long long y, long long p)
{
    long long res = 1; // Initialize result
    x = x % p; // Update x if it is more than or
               // equal to p
    while (y > 0)
    {
        // If y is odd, multiply x with result
        if (y & 1)
            res = (res*x) % p;
        // y must be even now
        y = y>>1; // y = y/2
        x = (x*x) % p;
    }
    return res;
}
Time Complexity of the power function is O(log y).
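To tie this back to the original expression, here is a minimal sketch of how the pieces could be combined. The names power_mod and scaled_power are illustrative and not from the question, and the code assumes the modulus fits in 32 bits (true for 10^9 + 7) so that every product fits in a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Modular exponentiation by squaring: (x^y) mod p.
   Assumption: p <= 2^32 - 1, so res*x and x*x never overflow 64 bits. */
static uint64_t power_mod(uint64_t x, uint64_t y, uint64_t p)
{
    assert(p > 0 && p <= UINT32_MAX);
    uint64_t res = 1 % p;          /* 1 % p handles the corner case p == 1 */
    x %= p;
    while (y > 0) {
        if (y & 1)                 /* current exponent bit is set */
            res = (res * x) % p;
        y >>= 1;
        x = (x * x) % p;           /* square the base for the next bit */
    }
    return res;
}

/* (count * 10^n) mod m, the expression from the question. */
static uint64_t scaled_power(uint64_t count, uint64_t n, uint64_t m)
{
    return ((count % m) * power_mod(10, n, m)) % m;
}
```

With n around 10^18 the loop in power_mod runs about 60 times, instead of the 10^18 iterations of the original loop.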
In your case count is a 1-digit number, so we can simply multiply (count%mod) by ((a^b)%mod) and take the mod of the result. If count is a big number too, so that the product could overflow, we can multiply using:
long long mulmod(long long a, long long b, long long mod)
{
    long long res = 0; // Initialize result
    a = a % mod;
    while (b > 0)
    {
        // If b is odd, add 'a' to result
        if (b % 2 == 1)
            res = (res + a) % mod;
        // Multiply 'a' with 2
        a = (a * 2) % mod;
        // Divide b by 2
        b /= 2;
    }
    // Return result
    return res % mod;
}


Is it possible to increment the modulo operator in later loop iterations?

I am trying to construct a simple program which adds together the digits of a long number. I attempted to do this by using a loop employing the modulo operator and some basic arithmetic. I want to increment the modulo operator by multiplying it by ten on each iteration of the loop in order to reach the next digit. I want to check if my code is correct, however, I receive errors pertaining to the lines involving the modulo operations and I'm not quite sure why.
This was my attempted construction:
long i = 0;
long b;
int m = 1;
long number = get_long("Number?\n");
do
{
    long a = number % m;
    b = number - a;
    long c = b % m x 10;
    long d = c / m;
    i = i + d;
    m = m x 10
}
while (b > 0);
printf("%ld\n", i);
I made the basic error of writing "x" instead of "*". However, having fixed this, I no longer receive errors, but the program simply returns "0". Any diagnosis would be appreciated.
int main(void)
{
    long i = 0;
    long b;
    int m = 10;
    long number = get_long("Number?\n");
    do
    {
        long a = number % m;
        b = number - a;
        long c = b % m * 10;
        long d = c / m;
        i = i + d;
        m = m * 10;
    }
    while (b > 0);
    printf("%ld\n", i);
}
For your revised code:
long c = b % m * 10;
this line will evaluate (b % m) and then multiply it by 10 because of the order of operations.
I presume what you actually want is:
long c = b % (m * 10);
Secondly, the following line determines which digit you start at:
int m = 10;
and this line determines how many digits between the ones you include in your total:
m = m * 10;
So for this configuration, it will start at the 2nd digit from the right and add every digit.
So for the number 1234, you'd get 3 + 2 + 1 = 6.
If you want to add every digit, you could set:
int m = 1;
and you'd get 4 + 3 + 2 + 1 = 10.
Alternatively, if you had used:
m = m * 100;
you'd have 3 + 1 = 4.
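The effect of the starting value and the step can be sketched with a small helper. The function sum_selected_digits and its parameters are hypothetical names introduced only for illustration: start plays the role of the initial m, and stride the factor m is multiplied by on each pass.

```c
#include <assert.h>

/* Hypothetical helper: adds the digit at place value `start`, then every
   digit `stride` place values further to the left. */
static long sum_selected_digits(long number, long start, long stride)
{
    assert(number >= 0 && start >= 1 && stride >= 10);
    long sum = 0;
    for (long m = start; m <= number; m *= stride) {
        sum += (number / m) % 10;      /* digit at place value m */
        if (m > number / stride)       /* next m would exceed number (and might overflow) */
            break;
    }
    return sum;
}
```

For 1234, a start of 10 with stride 10 gives 3 + 2 + 1 = 6, a start of 1 with stride 10 gives 10, and a start of 10 with stride 100 gives 3 + 1 = 4, matching the three cases described above.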
First, you're likely getting errors due to these lines:
long c = b % m x 10;
m = m x 10
This is because x is not a valid operator.
The multiplication operator is *:
long c = b % m * 10;
m = m * 10;
As for your approach, I would suggest, instead of changing the modulo operand, you simply divide the original number by 10 to shift it one digit each operation.
For example:
#include <stdio.h>
int main()
{
    int sumofdigits = 0;
    int num = 12345;
    while(num > 0) {
        sumofdigits += num % 10;
        num /= 10;
    }
    printf("%d", sumofdigits);
    return 0;
}
The reduced sum of the digits of a number (its digital root) is the same as that number modulo 9, except that a nonzero multiple of 9 has reduced sum 9 rather than 0.
#include <stdio.h>
int main(void) {
    int number = 57283;
    printf("%d \n", number%9);
    // 5 + 7 + 2 + 8 + 3 == 25 ==> 2 + 5 == 7
    // 57283 % 9 == 7
    return 0;
}
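One case the plain number%9 shortcut does not cover is a multiple of 9: 18 reduces to 1 + 8 = 9, but 18 % 9 is 0. A minimal sketch of an adjusted formula, assuming non-negative input (digital_root is an illustrative name):

```c
#include <assert.h>

/* Reduced digit sum (digital root) of a non-negative integer.
   Shifting by one before and after the modulo maps multiples of 9 to 9
   instead of 0, while 0 itself stays 0. */
static int digital_root(int n)
{
    assert(n >= 0);
    return n == 0 ? 0 : 1 + (n - 1) % 9;
}
```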
If you want to use loops to get the reduced sum:
int sum_of_digits(int num)
{
    int sum;
    do {
        sum = 0;
        while (num > 0) {
            sum += num%10;
            num /= 10;
        }
        num = sum;
    } while (sum >9);
    return sum;
}
But if you only want the simple sum of digits (one pass only):
int sum_of_digits(int num)
{
    int sum = 0;
    while (num > 0) {
        sum += num%10;
        num /= 10;
    }
    return sum;
}
To find the sum of the digits of a variable of type long, use the two operators modulo (%) and division (/). Take the number modulo 10 to get its last digit, add this digit to the sum, then divide the number by 10 to discard the digit just summed. Repeat until the number is equal to 0, like this:
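int main() {
    long number=0,m=0;
    printf("Give a number :");
    scanf("%ld",&number);
    long s=0,temp=number;
    while(number != 0) {
        m = number % 10; // last digit
        s += m;
        number /= 10; // drop the digit just summed
    }
    printf("\nThe sum of the digits of the Number %ld is : %ld\n",temp,s);
}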

Modular exponentiation function generates incorrect result for big input in C

I tried two functions for modular exponentiation; for a big base both return wrong results.
One of the functions is:
uint64_t modular_exponentiation(uint64_t x, uint64_t y, uint64_t p)
{
    uint64_t res = 1; // Initialize result
    x = x % p; // Update x if it is more than or
               // equal to p
    while (y > 0)
    {
        // If y is odd, multiply x with result
        if (y & 1)
            res = (res*x) % p;
        // y must be even now
        y = y>>1; // y = y/2
        x = (x*x) % p;
    }
    return res;
}
For input x = 1103362698, y = 137911680, p = 1217409241131113809,
it returns 749298230523009574 for x^y mod p, which is incorrect.
The correct value is 152166603192600961.
The other function I tried gave the same result. What is wrong with these functions?
The other one is:
long int exponentMod(long int A, long int B, long int C)
{
    // Base cases
    if (A == 0)
        return 0;
    if (B == 0)
        return 1;
    // If B is even
    long int y;
    if (B % 2 == 0) {
        y = exponentMod(A, B / 2, C);
        y = (y * y) % C;
    }
    // If B is odd
    else {
        y = A % C;
        y = (y * exponentMod(A, B - 1, C) % C) % C;
    }
    return (long int)((y + C) % C);
}
With p = 1217409241131113809, this value as well as the intermediate values of res and x can be larger than 32 bits. Multiplying two such numbers can produce a value larger than 64 bits, which overflows the datatype you're using.
If you restrict the parameters to 32 bit datatypes and use 64 bit datatypes for intermediate values then the function will work. Otherwise you'll need to use a big number library to get correct output.
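If a big number library is not available, one possible workaround is to replace each multiplication with a shift-and-add modular multiplication, so that no intermediate value exceeds twice the modulus. The sketch below is an assumption-laden illustration, not part of the original answer: the names mul_mod and pow_mod are invented here, and it only works for moduli below 2^63 (which covers the p in the question).

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) mod m by repeated doubling. Every intermediate value is at most
   2*(m-1), so nothing overflows as long as m < 2^63. */
static uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t res = 0;
    a %= m;
    while (b > 0) {
        if (b & 1)
            res = (res + a) % m;   /* add the current multiple of a */
        a = (a * 2) % m;           /* double a for the next bit of b */
        b >>= 1;
    }
    return res;
}

/* (x^y) mod p by squaring, with every product routed through mul_mod. */
static uint64_t pow_mod(uint64_t x, uint64_t y, uint64_t p)
{
    assert(p > 0 && p < (UINT64_C(1) << 63));
    uint64_t res = 1 % p;
    x %= p;
    while (y > 0) {
        if (y & 1)
            res = mul_mod(res, x, p);
        x = mul_mod(x, x, p);
        y >>= 1;
    }
    return res;
}
```

The price is an extra factor of up to 64 per multiplication, so this is noticeably slower than the plain version whenever the plain version is safe to use.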

Efficient algorithm to calculate the sum of number of base2 digits (number of bits) over an interval of positive integers

Let's say I've been given two integers a, b where a is a positive integer and is smaller than b. I have to find an efficient algorithm that's going to give me the sum of number of base2 digits (number of bits) over the interval [a, b]. For example, in the interval [0, 4] the sum of digits is equal to 9 because 0 = 1 digit, 1 = 1 digit, 2 = 2 digits, 3 = 2 digits and 4 = 3 digits.
My program is capable of calculating this number by using a loop but I'm looking for something more efficient for large numbers. Here are the snippets of my code just to give you an idea:
int numberOfBits(int i) {
    if(i == 0) {
        return 1;
    }
    else {
        return (int) log2(i) + 1;
    }
}
The function above is for calculating the number of digits of one number in the interval.
The code below shows you how I use it in my main function.
for(i = a; i <= b; i++) {
    l = l + numberOfBits(i);
}
printf("Digits: %d\n", l);
Ideally I should be able to get the number of digits by using the two values of my interval and using some special algorithm to do that.
Try this code; I think it gives you the number of binary digits you need:
int bit(int x)
{
    if(!x) return 1;
    int i;
    for(i = 0; x; i++, x >>= 1);
    return i;
}
The main thing to understand here is that the number of digits used to represent a number in binary increases by one with each power of two:
| number range | binary digits |
| 0 - 1 | 1 |
| 2 - 3 | 2 |
| 4 - 7 | 3 |
| 8 - 15 | 4 |
| 16 - 31 | 5 |
| 32 - 63 | 6 |
| ... | ... |
A trivial improvement over your brute force algorithm would then be to figure out how many times this number of digits has increased between the two numbers passed in (given by the base two logarithm) and add up the digits by multiplying the count of numbers that can be represented by the given number of digits (given by the power of two) with the number of digits.
A naive implementation of this algorithm is:
int digits_sum_seq(int a, int b)
{
    int sum = 0;
    int i = 0;
    int log2b = b <= 0 ? 1 : floor(log2(b));
    int log2a = a <= 0 ? 1 : floor(log2(a)) + 1;
    sum += (pow(2, log2a) - a) * (log2a);
    for (i = log2b; i > log2a; i--)
        sum += pow(2, i - 1) * i;
    sum += (b - pow(2, log2b) + 1) * (log2b + 1);
    return sum;
}
It can then be improved by the more efficient versions of the log and pow functions seen in the other answers.
First, we can improve the speed of log2, but that only gives us a fixed factor speed-up and doesn't change the scaling.
Faster log2 adapted from:
The lookup table method takes only about 7 operations to find the log
of a 32-bit value. If extended for 64-bit quantities, it would take
roughly 9 operations. Another operation can be trimmed off by using
four tables, with the possible additions incorporated into each. Using
int table elements may be faster, depending on your architecture.
Second, we must re-think the algorithm. If you know that numbers between N and M have the same number of digits, would you add them up one by one or would you rather do (M-N+1)*numDigits?
But if we have a range where multiple numbers appear what do we do? Let's just find the intervals of same digits, and add sums of those intervals. Implemented below. I think that my findEndLimit could be further optimized with a lookup table.
#include <stdio.h>
#include <limits.h>
#include <time.h>

unsigned int fastLog2(unsigned int v)
{
    static const char LogTable256[256] =
    {
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
        -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
        LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
        LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
    };
    register unsigned int t, tt; // temporaries
    if (tt = v >> 16)
    {
        return (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
    }
    return (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}

unsigned int numberOfBits(unsigned int i)
{
    if (i == 0) {
        return 1;
    }
    else {
        return fastLog2(i) + 1;
    }
}

unsigned int findEndLimit(unsigned int sx, unsigned int ex)
{
    unsigned int sy = numberOfBits(sx);
    unsigned int ey = numberOfBits(ex);
    unsigned int mx;
    unsigned int my;
    if (sy == ey) // this also means sx == ex
        return ex;
    // assumes sy < ey
    mx = (ex - sx) / 2 + sx; // will eq. sx for sx + 1 == ex
    my = numberOfBits(mx);
    while (ex - sx != 1) {
        mx = (ex - sx) / 2 + sx; // will eq. sx for sx + 1 == ex
        my = numberOfBits(mx);
        if (my == ey) {
            ex = mx;
            ey = numberOfBits(ex);
        }
        else {
            sx = mx;
            sy = numberOfBits(sx);
        }
    }
    return sx+1;
}

int main(void)
{
    unsigned int a, b, m;
    unsigned long l;
    clock_t start, end;
    l = 0;
    a = 0;
    b = UINT_MAX; // upper bound, matching the "From 0 to UINT_MAX" run below
    start = clock();
    unsigned int i;
    for (i = a; i < b; ++i) {
        l += numberOfBits(i);
    }
    if (i == b) {
        l += numberOfBits(i);
    }
    end = clock();
    printf("Digits: %lu; Time: %fs\n",l, ((double)(end-start))/CLOCKS_PER_SEC);
    l = 0; // reset the total before the second measurement
    start = clock();
    do {
        m = findEndLimit(a, b);
        l += (b-m + 1) * (unsigned long)numberOfBits(b);
        b = m-1;
    } while (b > a);
    l += (b-a+1) * (unsigned long)numberOfBits(b);
    end = clock();
    printf("Binary search\n");
    printf("Digits: %lu; Time: %fs\n",l, ((double)(end-start))/CLOCKS_PER_SEC);
    return 0;
}
From 0 to UINT_MAX
$ ./main
Digits: 133143986178; Time: 25.722492s
Binary search
Digits: 133143986178; Time: 0.000025s
My findEndLimit can take long time in some edge cases:
From UINT_MAX/16+1 to UINT_MAX/8
$ ./main
Digits: 7784628224; Time: 1.651067s
Binary search
Digits: 7784628224; Time: 4.921520s
Conceptually, you would need to split the task to two subproblems -
1) find the sum of digits from 0..M, and from 0..N, then subtract.
2) find floor(log2(x)), because e.g. for the number 77 the numbers 64, 65, ..., 77 all have 7 digits, the 32 numbers below them have 6 digits, the 16 below those have 5 digits and so on, which makes a geometric progression.
int digits(int a) {
    if (a == 0) return 1; // should digits(0) be 0 or 1 ?
    int b=(int)floor(log2(a)); // use any all-integer calculation hack
    int sum = 1 + (b+1) * (a- (1<<b) +1); // added 1, due to digits(0)==1
    while (b-- > 0) // the last pass (b == 0) adds the single digit of the number 1
        sum += (b + 1) << b; // shortcut for (b + 1) * (1 << b);
    return sum;
}
int digits_range(int a, int b) {
    if (a <= 0 || b <= 0) return -1; // formulas work for strictly positive numbers
    return digits(b)-digits(a-1);
}
As efficiency depends on the tools available, one approach would be doing it "analog":
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
unsigned long long pow2sum_min(unsigned long long n, long long unsigned m)
if (m >= n)
return 1;
return (2ULL << n) + pow2sum_min(n, m);
#define LN(x) (log2(x)/log2(M_E))
int main(int argc, char** argv)
if (2 >= argc)
fprintf(stderr, "%s a b\n", argv[0]);
long a = atol(argv[1]), b = atol(argv[2]);
if (0L >= a || 0L >= b || b < a)
puts("Na ...!");
/* Expand interval to cover full dimensions: */
unsigned long long a_c = pow(2, floor(log2(a)));
unsigned long long b_c = pow(2, floor(log2(b+1)) + 1);
double log2_a_c = log2(a_c);
double log2_b_c = log2(b_c);
unsigned long p2s = pow2sum_min(log2_b_c, log2_a_c) - 1;
/* Integral log2(x) between a_c and b_c: */
double A = ((b_c * (LN(b_c) - 1))
- (a_c * (LN(a_c) - 1)))/LN(2)
+ (b+1 - a);
/* "Integer"-integral - integral of log2(x)'s inverse function (2**x) between log(a_c) and log(b_c): */
double D = p2s - (b_c - a_c)/LN(2);
/* Corrective from a_c/b_c to a/b : */
double C = (log2_b_c - 1)*(b_c - (b+1)) + log2_a_c*(a - a_c);
printf("Total used digits: %lld\n", (long long) ((A - D - C) +.5));
The main thing here is the number and kind of iterations done.
Number is
log(floor(b_c)) - log(floor(a_c))
doing one
n - 1 /* Integer decrement */
2**n + s /* One bit-shift and one integer addition */
for each iteration.
Here's an entirely look-up based approach. You don't even need the log2 :)
First we precompute the interval limits at which the number of bits changes and store them in a lookup table. In other words, we create an array limits[n], where limits[i] gives us the biggest integer that can be represented with (i+1) bits. Our array is then {1, 3, 7, ..., 2^n-1}.
Then, when we want to determine the sum of bits for our range, we must first match our range limits a and b with the smallest index for which a <= limits[i] and b <= limits[j] holds, which will then tell us that we need (i+1) bits to represent a, and (j+1) bits to represent b.
If the indexes are the same, then the result is simply (b-a+1)*(i+1), otherwise we must separately get the number of bits from our value to the edge of same number of bits interval, and add up total number of bits for each interval between as well. In any case, simple arithmetic.
#include <stdio.h>
#include <limits.h>
#include <time.h>

unsigned long bitsnumsum(unsigned int a, unsigned int b)
{
    // generate lookup table
    // limits[i] is the max. number we can represent with (i+1) bits
    static const unsigned int limits[32] =
    {
#define LTN(n) n*2u-1, n*4u-1, n*8u-1, n*16u-1, n*32u-1, n*64u-1, n*128u-1, n*256u-1
        LTN(1),
        LTN(256),
        LTN(65536),
        LTN(16777216)
    };
    // make it work for any order of arguments
    if (b < a) {
        unsigned int c = a;
        a = b;
        b = c;
    }
    // find interval of a
    unsigned int i = 0;
    while (a > limits[i]) {
        ++i;
    }
    // find interval of b
    unsigned int j = i;
    while (b > limits[j]) {
        ++j;
    }
    // add it all up
    unsigned long sum = 0;
    if (i == j) {
        // a and b in the same range
        // conveniently, this also deals with j == 0
        // so no danger to do [j-1] below
        return (i+1) * (unsigned long)(b - a + 1);
    }
    else {
        // add sum of digits in range [a, limits[i]]
        sum += (i+1) * (unsigned long)(limits[i] - a + 1);
        // add sum of digits in range (limits[j-1], b]
        sum += (j+1) * (unsigned long)(b - limits[j-1]);
        // add sum of digits for the full intervals in between
        for (++i; i<j; ++i) {
            sum += (i+1) * (unsigned long)(limits[i] - limits[i-1]);
        }
        return sum;
    }
}

int main(void)
{
    clock_t start, end;
    unsigned int a=0, b=UINT_MAX;
    start = clock();
    printf("Sum of binary digits for numbers in range "
           "[%u, %u]: %lu\n", a, b, bitsnumsum(a, b));
    end = clock();
    printf("Time: %fs\n", ((double)(end-start))/CLOCKS_PER_SEC);
    return 0;
}
$ ./lookup
Sum of binary digits for numbers in range [0, 4294967295]: 133143986178
Time: 0.000282s
The main idea is to find n2 = log2(x) rounded down, so that x has n2 + 1 digits. Let pow2 = 1 << n2. Then (n2 + 1) * (x - pow2 + 1) is the number of digits in the values [pow2...x]. Now add the sum of digits for the full power-of-2 ranges below pow2.
I am certain various simplifications can be made.
Untested code. Will review later.
// Let us use unsigned for everything.
#include <assert.h>
unsigned ulog2(unsigned value) {
    unsigned result = 0;
    if (0xFFFF0000u & value) {
        value >>= 16; result += 16;
    }
    if (0xFF00u & value) {
        value >>= 8; result += 8;
    }
    if (0xF0u & value) {
        value >>= 4; result += 4;
    }
    if (0xCu & value) {
        value >>= 2; result += 2;
    }
    if (0x2 & value) {
        value >>= 1; result += 1;
    }
    return result;
}
// Sum of digits over [0, x], counting 0 as one digit.
unsigned bit_count_helper(unsigned x) {
    if (x == 0) {
        return 1;
    }
    unsigned n2 = ulog2(x);
    unsigned pow2 = 1u << n2;
    unsigned sum = (n2 + 1u) * (x - pow2 + 1u); // values from pow2 to x
    while (n2 > 0) {
        // ... + 5*16 + 4*8 + 3*4 + 2*2 + 1*1
        pow2 /= 2;
        sum += n2 * pow2;
        n2--;
    }
    return sum + 1u; // + 1 for the value 0
}
// Sum of digits over [a, b]; the unsigned result wraps for very large ranges.
unsigned bit_count(unsigned a, unsigned b) {
    assert(a <= b);
    return a == 0 ? bit_count_helper(b)
                  : bit_count_helper(b) - bit_count_helper(a - 1);
}
For this problem your solution is the simplest one, the so-called "naive" approach, where you visit every element of the sequence (in your case, the interval) to check something or execute an operation.
Naive Algorithm
Assume that a and b are positive integers with b greater than a, and call the size of the interval [a,b] n = (b-a).
With n elements, and using big-O notation, the worst-case cost is O(n*(numberOfBits_cost)).
From this we can see that we can speed up the algorithm either by using a faster way of computing numberOfBits(), or by finding a way to avoid looking at every element of the interval, which costs us n operations.
Now looking at a possible interval [6,14], you can see that 6 and 7 need 3 digits, while 8, 9, 10, 11, 12, 13 and 14 need 4. The loop calls numberOfBits() for every number, even when they all use the same number of digits, while the following multiplications would be faster:
((14-8)+1)*4 = 28
((7-6)+1)*3 = 6
So we reduced a loop over 9 elements, with 9 operations, to only 2.
Writing a function that uses this intuition gives us an algorithm that is more efficient in time, though not necessarily in memory. Using your numberOfBits() function I have created this solution:
int intuitionSol(int a, int b){
    int digitsForA = numberOfBits(a);
    int digitsForB = numberOfBits(b);
    if(digitsForA != digitsForB){
        // a or b may not be the first or last element of the interval that a
        // specific number of digits can represent, so some correction is
        // needed on a and b first
        int tmp = pow(2,digitsForA) - a;
        int result = tmp*digitsForA; // will contain the final result that is returned
        int i;
        for(i = digitsForA + 1; i < digitsForB; i++){
            int interval_elements = pow(2,i) - pow(2,i-1);
            result = result + ((interval_elements) * i);
            //printf("NumOfElem: %i for %i digits; sum:= %i\n", interval_elements, i, result);
        }
        int tmp1 = ((b + 1) - pow(2,digitsForB-1));
        result = result + tmp1*digitsForB;
        return result;
    }
    else {
        int elements = (b - a) + 1;
        return elements * digitsForA; // or digitsForB
    }
}
Let's look at the cost: it is the cost of the correction operations on a and b plus the most expensive part, the for-loop. In my solution, however, I'm not looping over all elements but only over numberOfBits(b)-numberOfBits(a) sub-intervals, which in the worst case, for [0,n], becomes log(n)-1, equivalent to O(log n).
To sum up, we went from a linear cost, O(n), to a logarithmic one, O(log n), in the worst case.
When I talk about an interval or sub-interval, I refer to the interval of elements that use the same number of digits to represent the number in binary.
Below are some outputs of my tests; the last one shows the difference:
Considered interval is [0,4]
YourSol: 9 in time: 0.000015s
IntuitionSol: 9 in time: 0.000007s
Considered interval is [0,0]
YourSol: 1 in time: 0.000005s
IntuitionSol: 1 in time: 0.000005s
Considered interval is [4,7]
YourSol: 12 in time: 0.000016s
IntuitionSol: 12 in time: 0.000005s
Considered interval is [2,123456]
YourSol: 1967697 in time: 0.005010s
IntuitionSol: 1967697 in time: 0.000015s
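The idea shared by the answers above, multiplying the size of each same-length block by its digit count, can also be written with integer arithmetic only. The following is a minimal sketch under a few assumptions: 32-bit inputs, a 64-bit total, and 0 counted as one digit as in the question. The name bits_in_range is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Total number of binary digits over [a, b], counting 0 as one digit.
   For each digit count k, intersect [a, b] with the block of values that
   need exactly k digits and multiply the overlap size by k. */
static uint64_t bits_in_range(uint32_t a, uint32_t b)
{
    assert(a <= b);
    uint64_t total = 0;
    for (unsigned k = 1; k <= 32; k++) {
        /* Values with exactly k digits are [2^(k-1), 2^k - 1];
           0 is placed in the 1-digit block. */
        uint64_t lo = (k == 1) ? 0 : (UINT64_C(1) << (k - 1));
        uint64_t hi = (UINT64_C(1) << k) - 1;
        uint64_t from = a > lo ? a : lo;
        uint64_t to = b < hi ? b : hi;
        if (from <= to)
            total += (to - from + 1) * k;
    }
    return total;
}
```

The loop always runs 32 times regardless of the interval, so there is no slow edge case when a and b have the same digit count.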

Algorithm to find nth root of a number

I am looking for an efficient algorithm to find the nth root of a number. The answer must be an integer. I have found that Newton's method and the bisection method are popular. Are there any efficient and simple methods for integer output?
#include <math.h>
static inline int root(int input, int n)
{
    return round(pow(input, 1./n));
}
This works for pretty much the whole integer range (as IEEE754 8-byte doubles can represent the whole 32-bit int range exactly, which are the representations and sizes that are used on pretty much every system). And I doubt any integer based algorithm is faster on non-ancient hardware. Including ARM. Embedded controllers (the microwave washing machine kind) might not have floating point hardware though. But that part of the question was underspecified.
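For comparison with the floating-point one-liner, here is a minimal integer-only sketch of the bisection method the question mentions. It returns the floor of the root rather than the rounded value, and it assumes n >= 1 and a 32-bit unsigned input. The names pow_sat and iroot_bisect are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* r^n for n >= 1, saturating at UINT64_MAX instead of overflowing. */
static uint64_t pow_sat(uint64_t r, unsigned n)
{
    if (r <= 1)
        return r;                      /* 0^n == 0 and 1^n == 1 for n >= 1 */
    uint64_t result = 1;
    while (n-- > 0) {
        if (result > UINT64_MAX / r)   /* next multiplication would overflow */
            return UINT64_MAX;
        result *= r;
    }
    return result;
}

/* floor(x^(1/n)) by bisection on the answer. */
static uint32_t iroot_bisect(uint32_t x, unsigned n)
{
    assert(n >= 1);
    uint64_t lo = 0, hi = x;           /* invariant: lo^n <= x, answer in [lo, hi] */
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;   /* upper middle, so lo always advances */
        if (pow_sat(mid, n) <= x)
            lo = mid;
        else
            hi = mid - 1;
    }
    return (uint32_t)lo;
}
```

Because every comparison is exact, the result does not depend on how pow rounds, at the cost of up to 32 bisection steps.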
I know this thread is probably dead, but I don't see any answers I like and that bugs me...
int iPow(int a, int e); // defined below

int root(int a, int n) {
    int v = 1, bit, tp, t;
    if (n == 0) return 0; //error: zeroth root is indeterminate!
    if (n == 1) return a;
    tp = iPow(v,n);
    while (tp < a) { // first power of two such that v**n >= a
        v <<= 1;
        tp = iPow(v,n);
    }
    if (tp == a) return v; // answer is a power of two
    v >>= 1;
    bit = v >> 1;
    tp = iPow(v, n); // v is highest power of two such that v**n < a
    while (a > tp) {
        v += bit; // add bit to value
        t = iPow(v, n);
        if (t > a) v -= bit; // did we add too much?
        else tp = t;
        if ( (bit >>= 1) == 0) break;
    }
    return v; // closest integer such that v**n <= a
}

// used by root function...
int iPow(int a, int e) {
    int r = 1;
    if (e == 0) return r;
    while (e != 0) {
        if ((e & 1) == 1) r *= a;
        e >>= 1;
        a *= a;
    }
    return r;
}
This method will also work with arbitrary precision fixed point math in case you want to compute something like sqrt(2) to 100 decimal places...
I question your use of "algorithm" when speaking of C programs. Programs and algorithms are not the same (an algorithm is mathematical; a C program is expected to be implementing some algorithm).
But on current processors (like in recent x86-64 laptops or desktops) the FPU is doing fairly well. I guess (but did not benchmark) that a fast way of computing the n-th root could be,
inline unsigned root(unsigned x, unsigned n) {
    switch (n) {
    case 0: return 1;
    case 1: return x;
    case 2: return (unsigned)sqrt((double)x);
    case 3: return (unsigned)cbrt((double)x);
    default: return (unsigned) pow (x, 1.0/n);
    }
}
(I made a switch because many processors have hardware to compute sqrt and some have hardware to compute cbrt ..., so you should prefer these when relevant...).
I am not sure that n-th root of a negative number makes sense in general. So my root function takes some unsigned x and returns some unsigned number.  
Here is an efficient general implementation in C, using a simplified version of the "shifting nth root algorithm" to compute the floor of the nth root of x:
uint64_t iroot(const uint64_t x, const unsigned n)
{
    if ((x == 0) || (n == 0)) return 0;
    if (n == 1) return x;
    uint64_t r = 1;
    for (int s = ((ilog2(x) / n) * n) - n; s >= 0; s -= n)
    {
        r <<= 1;
        r |= (ipow(r|1, n) <= (x >> s));
    }
    return r;
}
It needs this function to compute the nth power of x (using the method of exponentiation by squaring):
uint64_t ipow(uint64_t x, unsigned n)
{
    if (x <= 1) return x;
    uint64_t y = 1;
    for (; n != 0; n >>= 1, x *= x)
        if (n & 1)
            y *= x;
    return y;
}
and this function to compute the floor of base-2 logarithm of x:
int ilog2(uint64_t x)
{
#if __has_builtin(__builtin_clzll)
    return 63 - ((x != 0) * (int)__builtin_clzll(x)) - ((x == 0) * 64);
#else
    int y = -(x == 0);
    for (unsigned k = 64 / 2; k != 0; k /= 2)
        if ((x >> k) != 0)
            { x >>= k; y += k; }
    return y;
#endif
}
Note: This assumes that your compiler understands GCC's __has_builtin test and that your compiler's uint64_t type is the same size as an unsigned long long.
You can try this C function to get the nth_root of an unsigned integer :
unsigned initial_guess_nth_root(unsigned n, unsigned nth){
    unsigned res = 1;
    for(; n >>= 1; ++res);
    return nth ? 1 << (res + nth - 1) / nth : 0 ;
}
// return a number that, when multiplied by itself nth times, makes N.
unsigned nth_root(const unsigned n, const unsigned nth) {
    unsigned a = initial_guess_nth_root(n , nth), b, c, r = nth ? a + (n > 0) : n == 1 ;
    for (; a < r; b = a + (nth - 1) * r, a = b / nth)
        for (r = a, a = n, c = nth - 1; c && (a /= r); --c);
    return r;
}
Example of output :
24 == (int) pow(15625, 1.0/3)
25 == nth_root(15625, 3)
0 == nth_root(0, 0)
1 == nth_root(1, 0)
4 == nth_root(4096, 6)
13 == nth_root(18446744073709551614, 17) // 64-bit 20 digits
11 == nth_root(340282366920938463463374607431768211454, 37) // 128-bit 39 digits
Here is the github source.

Pollard Rho factorization method implementation in C

Can anyone help me out with the Pollard rho implementation? I have implemented this in C. It's working fine for numbers up to 10 digits, but it's not able to handle greater numbers.
Please help me improve it so that it can factor numbers up to 18 digits. My code is this:
int gcd(int a, int b)
if(b==0) return a ;
return(gcd(b,a%b)) ;
long long int mod(long long int a , long long int b , long long int n )
long long int x=1 , y=a ;
if(b%2==1) x = ((x%n)*(y%n))%n ;
y = ((y%n)*(y%n))%n ;
b/=2 ;
return x%n ;
int isprimes(long long int u)
return 1 ;
int a = 2 , i ;
long long int k , t = 0 , r , p ;
k = u-1 ;
{ k/=2 ; t++ ; }
while(a<=3) /*der are no strong pseudoprimes common in base 2 and base 3*/
r = mod(a,k,u) ;
for(i = 1 ; i<=t ; i++)
p = ((r%u)*(r%u))%u ;
{ return 0 ; }
r = p ;
return 0 ;
a++ ;
return 1 ;
long long int pol(long long int u)
long long int x = 2 , k , i , a , y , c , s;
int d = 1 ;
k = 2 ;
i = 1 ;
y = x ;
a = u ;
return 1;
c=-1 ;
s = 2 ;
x=((x%u)*(x%u)-1)% u ;
d = gcd(abs(y-x),u) ;
{ printf("%d ",d);
while(a%d==0) { a=a/d; }
x = 2 ;
k = 2 ;
i = 1 ;
y = x ;
{ return 0 ; }
{ return a ; }
u=a ;
{y = x ; k*=2 ; c = x ;} /*floyd cycle detection*/
{ x = ++s ; }
return ;
int main()
long long int t ;
long long int i , n , j , k , a , b , u ;
{ u = n ; k = 0 ;
{ u/=2 ; k = 1 ; }
if(k==1) printf("2 ") ;
t = pol(u) ;
{ printf("%lld",u) ; }
{ printf("%lld",t) ; }
return 0;
sorry for the long code ..... I am a new coder.
When you're multiplying two numbers modulo m, the intermediate product can become nearly m^2. So if you use a 64-bit unsigned integer type, the maximal modulus it can handle is 2^32, if the modulus is larger, overflow may happen. It will be rare when the modulus is only slightly larger, but that makes it only less obvious, you cannot rely on being lucky if the modulus allows the possibility of overflow.
You can gain a larger range by a factor of two if you choose a representative of the residue class modulo m of absolute value at most m/2 or something equivalent:
uint64_t mod_mul(uint64_t x, uint64_t y, uint64_t m)
{
    int neg = 0;
    // if x is too large, choose m-x and note that we need one negation for that at the end
    if (x > m/2) {
        x = m - x;
        neg = !neg;
    }
    // if y is too large, choose m-y and note that we need one negation for that at the end
    if (y > m/2) {
        y = m - y;
        neg = !neg;
    }
    uint64_t prod = (x * y) % m;
    // if we had negated _one_ factor, and the product isn't 0 (mod m), negate
    if (neg && prod) {
        prod = m - prod;
    }
    return prod;
}
So that would allow moduli of up to 2^33 with a 64-bit unsigned type. Not a big step.
The recommended solution to the problem is the use of a big-integer library, for example GMP is available as a distribution package on most if not all Linux distros, and also (relatively) easily installable on Windows.
If that is not an option (really, are you sure?), you can get it to work for larger moduli (up to 2^63 for an unsigned 64-bit integer type) using Russian peasant multiplication:
x * y = 2 * (x * (y/2)) + (x * (y % 2))
so for the calculation, you only need that 2*(m-1) doesn't overflow.
uint64_t mod_mult(uint64_t x, uint64_t y, uint64_t m)
{
    if (y == 0) return 0;
    if (y == 1) return x % m;
    uint64_t temp = mod_mult(x,y/2,m);
    temp = (2*temp) % m;
    if (y % 2 == 1) {
        temp = (temp + x) % m;
    }
    return temp;
}
Note however that this algorithm needs O(log y) steps, so it's rather slow in practice. For smaller m you can speed it up, if 2^k*(m-1) doesn't overflow, you can proceed in steps of k bits instead of single bits (x*y = ((x * (y >> k)) << k) + (x * (y & ((1 << k)-1)))), which is a good improvement if your moduli are never larger than 48 or 56 bits, say.
Using that variant of modular multiplication, your algorithm will work for larger numbers (but it will be significantly slower). You can also try test for the size of the modulus and/or the factors to determine which method to use, if m < 2^32 or x < (2^64-1)/y, the simple (x * y) % m will do.
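The "steps of k bits" speed-up mentioned above can be sketched for k = 16. This is an illustration under stated assumptions rather than a drop-in replacement: it requires m <= 2^48 so that a residue shifted left by 16 bits, or multiplied by a 16-bit chunk, still fits in 64 bits. The name mod_mult16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (x * y) mod m, consuming y in four 16-bit chunks from the top down.
   Assumption: m <= 2^48, so result << 16 and x * chunk both fit in 64 bits. */
static uint64_t mod_mult16(uint64_t x, uint64_t y, uint64_t m)
{
    assert(m > 0 && m <= (UINT64_C(1) << 48));
    uint64_t result = 0;
    x %= m;
    for (int shift = 48; shift >= 0; shift -= 16) {
        uint64_t chunk = (y >> shift) & 0xFFFFu;   /* next 16 bits of y */
        result = (result << 16) % m;               /* make room for this chunk */
        result = (result + x * chunk) % m;
    }
    return result;
}
```

This needs 4 iterations instead of up to 64, at the price of the smaller range of supported moduli.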
You can try this C implementation of Pollard Rho :
unsigned long long pollard_rho(const unsigned long long N) {
    // Require : a composite number N, not a square.
    // Ensure : res is a non-trivial factor of N.
    // Option : define a timeout, define a rand function.
    static const int timeout = 18;
    static unsigned long long rand_val = 2994439072U;
    rand_val = (rand_val * 1025416097U + 286824428U) % 4294967291LLU;
    unsigned long long res = 1, a, b, c, i = 0, j = 1, x = 1, y = 1 + rand_val % (N - 1);
    for (; res == 1; ++i) {
        if (i == j) {
            if (j >> timeout)
                break; // give up once the cycle length exceeds the timeout
            j <<= 1;
            x = y;
        }
        a = y, b = y;
        // y = (y * y) % N by shift-and-add, avoiding 64-bit overflow
        for (y = 0; a; a & 1 ? b >= N - y ? y -= N : 0, y += b : 0, a >>= 1, (c = b) >= N - b ? c -= N : 0, b += c);
        y = (1 + y) % N;
        // gcd(N, |y - x|) by repeated remainders
        for (a = N, b = y > x ? y - x : x - y; (a %= b) && (b %= a););
        res = a | b;
    }
    return res;
}
Otherwise there is a pure C quadratic sieve which factors numbers from 0 to 300-bit.
