Recursion that branches running out of memory - c

I have a programming assignment that goes like this:
You are given three numbers a, b, and c. (1 ≤ a, b, c ≤ 10^18)
Each time you have two choices: either add b to a (a+=b), or add a to b (b+=a). Write a program that will print out YES or NO depending on whether you can get to c by adding a and b to each other.
I've tried solving this problem using recursion that branches to two branches every time where one branch stores a+b, b and the other branch stores a, b+a. In every recursive call, the function checks the values of a and b, and if they are equal to c the search stops and the function prints YES. The recursion stops when either a or b have a value greater than c.
Here's how the branching works:
And here's the code in C:
#include <stdio.h>
#include <stdlib.h>
void tree(long long int a, long long int b, long long int c){
if(a==c || b==c){
printf("YES");
exit(0);
}
else if(a<c && b<c){
tree(a, b+a, c);
tree(a+b, b, c);
}
}
int main(){
long long int a, b, c;
scanf("%I64d", &a);
scanf("%I64d", &b);
scanf("%I64d", &c);
tree(a, b, c);
printf("NO");
return 0;
}
Now, this program works for small numbers, but since a, b and c can be any 64-bit number, the tree can branch a few billion times, and the program runs out of memory and crashes.
My question is: is there any way I can improve my code, or any other way (other than recursion) to solve this?

OK, I'll have to admit that this turned out to be a fascinating question. I really thought that there should be a quick way of finding the answer, but the more I looked at the problem, the more complex it became. For example, if you zigzag down the tree, alternating a+=b with b+=a, you are essentially creating the Fibonacci sequence (imagine a=2 and b=3 to start with). Which means that if you could find the answer quickly, you could somehow use a similar program to answer "is c a Fibonacci number?"
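As a small illustration of that zigzag, here is a sketch in C. The function name zigzag_value is made up for this example and is not part of the original programs; it alternates a += b and b += a and returns the value produced by the last step.

```c
#include <stdint.h>

/* Hypothetical helper: alternate a += b and b += a for `steps` steps and
   return the value written by the final step (or b if steps is 0). */
uint64_t zigzag_value(uint64_t a, uint64_t b, int steps)
{
    uint64_t last = b;
    for (int i = 0; i < steps; ++i) {
        if (i % 2 == 0) {
            a += b;      /* even steps update a */
            last = a;
        } else {
            b += a;      /* odd steps update b */
            last = b;
        }
    }
    return last;
}
```

Starting from a=2 and b=3, successive step counts give 5, 8, 13, 21, 34, which are consecutive Fibonacci numbers.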
So I never came up with anything better than searching the binary tree. But I did come up with a way to search the binary tree without running out of memory. The key trick in my algorithm is that at every node you need to search two child nodes. But you don't need to recurse down both paths. You only need to recurse down one path, and if that fails, you can iterate to the other child. When recursing, you should always pick the path where the smaller number changes. This guarantees that every recursion level at least doubles one of the two numbers. Since each number can double only about 64 times before it exceeds 2^64, the recursion depth is bounded by roughly 128 levels.
So I wrote the program and ran it, and it worked just fine. That is until I entered a very large number for c. At that point, it didn't finish. I found from testing that the algorithm appears to have an O(N^2) running time, where N = c. Here are some sample running times (all on a desktop running 64-bit Windows).
Inputs Time in minutes
------ ---------------
a=2 b=3 c=10000000000 (10^10): 0:20
a=2 b=3 c=100000000000 (10^11): 13:42
a=2 b=3 c=100000000001 : 2:21 (randomly found the answer quickly)
a=2 b=3 c=100000000002 : 16:36
a=150 b=207 c=10000000 (10^7) : 0:08 (no solution)
a=150 b=207 c=20000000 : 0:31 (no solution)
a=150 b=207 c=40000000 : 2:05 (no solution)
a=150 b=207 c=100000000 (10^8) : 12:48 (no solution)
Here is my code:
// Given three numbers: a, b, c.
//
// At each step, either do: a += b, or b += a.
// Can you make either a or b equal to c?
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
static int solve(uint64_t a, uint64_t b, uint64_t c);
int main(int argc, char *argv[])
{
uint64_t a = 0, b = 0, c = 0;
if (argc < 4) {
printf("Usage: %s a b c\n", argv[0]);
exit(0);
}
a = strtoull(argv[1], NULL, 0);
b = strtoull(argv[2], NULL, 0);
c = strtoull(argv[3], NULL, 0);
// Note, by checking to see if a or b are solutions here, solve() can
// be made simpler by only checking a + b == c. That speeds up solve().
if (a == c || b == c || solve(a, b, c))
printf("There is a solution\n");
else
printf("There is NO solution\n");
return 0;
}
int solve(uint64_t a, uint64_t b, uint64_t c)
{
do {
uint64_t sum = a + b;
// Check for past solution.
if (sum > c)
return 0;
// Check for solution.
if (sum == c)
return 1;
// The algorithm is to search both branches (a += b and b += a).
// But first we search the branch where we add the higher number to the
// lower number, because that branch is guaranteed to at least double the
// lower number. Each number can only double about 64 times, so the
// recursion depth stays bounded. Then if that doesn't work out, we
// iterate to the other branch.
if (a < b) {
// Try a += b recursively.
if (solve(sum, b, c))
return 1;
// Failing that, try b += a.
b = sum;
} else {
// Try b += a recursively.
if (solve(a, sum, c))
return 1;
// Failing that, try a += b.
a = sum;
}
} while(1);
}
Edit: I optimized the above program by removing recursion, reordering the arguments so that a is always less than b at every step, and some more tricks. It runs about 50% faster than before. You can find the optimized program here.

Based on a comment from @Oliver Charlesworth, this is an iterative, not recursive, solution, so it won't solve the homework. But it's pretty simple: I step through b because it is larger than a (although that is not entirely clear from the OP), hence fewer iterations are required.
#include <stdio.h>
int main(){
unsigned long long int a, b, c, bb;
scanf("%I64u", &a);
scanf("%I64u", &b);
scanf("%I64u", &c);
if (a >= 1 && a < b && b < c) {
for (bb=b; bb<c; bb+=b) {
if ((c - bb) % a == 0) {
printf ("YES\n");
return 0;
}
}
}
printf("NO\n");
return 0;
}

Related

While using recursive function calls when to use 'return'?

I am confused when to use 'return' while using recursive function calls.
I am trying to find the "GCD (Greatest common divisor)" of two numbers. What I actually thought would work is:
#include <stdio.h>
int gcd (int a, int b);
int main ()
{
int a, b;
printf ("Enter two numbers \n");
scanf ("%d %d", &a, &b);
printf ("GCD for numbers %d and %d is %d\n", a, b, gcd(a,b));
return (0);
}
int gcd (int a, int b)
{
while (a!=b)
{
if (a > b)
gcd(a-b,b);
else if (b > a)
gcd(a,b-a);
}
return (a);
}
But the above code continuously accepts numbers from terminal and fails to run the code.
However, when I replace the function definition as follows the code works as expected returning the right values.
int gcd (int a, int b)
{
while (a!=b)
{
if (a > b)
return gcd(a-b,b);
else if (b > a)
return gcd(a,b-a);
}
return (a);
}
As you see the only change is addition of 'return' before the recursive function call. Why is return required there considering in both the cases I am calling the gcd(arg1, arg2) function?
Why is return required there considering in both the cases I am calling the gcd(arg1, arg2) function?
For the same reason that it is required any other time you call a function and wish to return the value that was returned by that function call; because calling it only calls it, and does nothing else with the resulting value.
I am confused when to use 'return' while using recursive function calls.
Use return for a recursive call, whenever you would use return for any other function call - i.e.: when, and because, that call returns the value you wish to return this time around.
Imagine that we have
#include "magic.h" /* defines gcd2(), which computes GCD in some mysterious way */
And then instead of making recursive calls, we delegate some of the work to that:
/* Of course this solution also works, but is not interesting
int gcd(int a, int b)
{
return gcd2(a, b);
} */
/* So, let's do something that actually shows off the recurrence relation */
int gcd(int a, int b)
{
if (a > b)
return gcd2(a-b, b);
else if (b > a)
return gcd2(a, b-a);
else
return a;
}
(I also removed the while loop, because it is not relevant to the algorithm; of course, a return is reached in every circumstance, and this breaks the loop.)
I assume I don't need to go over the mathematical theory; and also I assume it is clear why return is needed for the gcd2 results.
But it doesn't actually matter how the work is delegated; if gcd is a function that correctly computes GCDs, and gcd2 is also such, then a call to gcd2 may be replaced by a call to gcd. This is the secret - calling a function recursively is not actually different from calling one normally. It's just that considering the possibility, requires a clearer understanding of how calling a function works and what it actually does.
Of course, it is also possible to make good use of the original while loop - by subtracting out as much as possible before doing the recursion. That might look like:
int gcd(int a, int b)
{
    if (a > b) {
        while (a > b)
            a -= b; /* prepare a value for the recursion. */
        return gcd(a, b); /* and then recurse with that value. */
    } else if (b > a) {
        while (b > a)
            b -= a; /* similarly. */
        return gcd(a, b);
    } else { /* a == b */
        return a;
    }
}
But then we might as well go all the way and convert to an iterative approach:
int gcd(int a, int b)
{
while (a != b)
/* whichever is greater, adjust it downward, leaving an (a, b)
pair that has the same GCD. Eventually we reach an equal pair,
for which the result is known. */
if (a > b)
a -= b;
else
b -= a;
return a; /* since the loop exited, they are equal now. */
}
(And we also could do modulo arithmetic to accomplish multiple subtractions at once; this is left as an exercise.)
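For readers who want to check their attempt at that exercise, here is one possible sketch. The name gcd_mod is chosen for this example only, and it assumes non-negative inputs. Each a % b step does the work of many repeated subtractions at once.

```c
/* Sketch of the modulo variant, assuming a >= 0 and b >= 0.
   The pair (a, b) is replaced by (b, a % b), which has the same GCD,
   until the second value reaches zero. */
int gcd_mod(int a, int b)
{
    while (b != 0) {
        int r = a % b;  /* remainder after removing every whole copy of b */
        a = b;
        b = r;
    }
    return a;
}
```

Unlike the subtraction versions, this one also terminates when one argument is zero, in which case it returns the other argument.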
The first gcd function "tries" to compute the gcd but in the end always returns a unchanged.
The second gcd function computes the gcd recursively and each invocation returns a gcd.

seemingly ignored assignment to variables in head of for-loop

void main()
{
int a, b, r;
//Find GCD by Euclidean algorithm
scanf("%d %d", &a, &b);
for( ; b == 0; (a = b), (b = r)){
r = a % b;
printf("GCD is %d", a);
}
printf("GCD is %d", a);
}
Somehow this doesn't work.
I assign a to b and b to r, but this doesn't seem to change the value of a or b.
This for(;b==0;(a=b),(b=r)) sets up the for-loop like this
do nothing to init anything
as long as b equals 0 do the loop body
between loop body executions first copy value of b to a,
then copy value of r to b
Note that the loop will never execute if b starts off non-zero.
Otherwise the loop will stop executing as soon as b becomes non-zero, from being updated with value of r.
(This is somewhat compiling an answer from comments, in order to get this out of the list of unanswered questions. Credits to dddJewelsbbb. I offer to delete this if they make an answer and ask me to.)
Find the corrected working code below, which changes the loop condition:
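#include <stdio.h>
int main (void)
{
    int a, b, r;
    // Find GCD by the Euclidean algorithm
    scanf ("%d %d", &a, &b);
    // Loop while the divisor b is non-zero; r is always assigned before it is read.
    for (; b != 0; a = b, b = r)
    {
        r = a % b;
    }
    printf ("GCD is %d", a);
    return 0;
}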
Test by giving inputs: 16, 12
16
12
GCD is 4
Code explanation:
The code is the direct implementation of Euclid's algorithm to find GCD. Here the divisor (one of the number) is used to obtain a remainder (after dividing the second number as dividend). Next, the remainder becomes the divisor and the earlier divisor becomes the dividend and this process continues till remainder is zero.
There is plenty of scope to make the code more elegant and to handle corner cases. I changed the original code as little as possible to pinpoint the exact error (which was in the for loop condition).

The difference between n++ and ++n at the end of a while loop? (ANSI C)

This is probably a dumb question, but I just can't figure it out. It has to do with the differences between n++ and ++n (which I thought I understood but apparently not).
#include <stdio.h>
#include <math.h>
long algorithmA(int n);
long algorithmB(int n);
int main(){
long A, B;
A = B = 0;
int n = 1;
while(A >= B){
A = algorithmA(n);
B = algorithmB(n);
n++;
}
printf("At n = %d, Algorithm A performs in %ld seconds & "
"Algorithm B performs in %ld seconds.", n, A, B);
}
long algorithmA(int n){
return pow(n,4) * 86400 * 4;
}
long algorithmB(int n){
return pow(3,n);
}
Here you can probably tell I'm trying to see at what point Algorithm A outperforms Algorithm B. The functions and units of time were given to me in a homework problem.
Anyways, I always thought that the order of "++" would not matter at the end of a while loop. But if I put ++n instead of n++, I get the wrong answer. Can somebody explain why?
Edit: Well it WAS showing 24 with ++n and 25 with n++, but it must have been for another reason. Because I just checked now and there is no difference. Thanks for your patience and time guys, I just wish I knew what I did!
If you increment without using the result of the expression, there is no difference. However, in the following circumstances, there is:
int n = 1;
int x = n++; // x will be 1 and n will be 2
In this example, x receives the value that n had before the increment.
int n = 1;
int x = ++n; // both x and n will be 2
However, in this example, n is incremented first and x receives the new value.
Operator precedence can help you out.
The only difference between n++ and ++n is that n++ yields the original value of n, and ++n yields the value of n after it's been incremented. Both have the side effect of modifying the value of n by incrementing it.
If the result is discarded, as it is in your code, there is no effective difference.
If your program is behaving differently depending on whether you write
n++;
or
++n;
it must be for some other reason.
In fact, when I compile and execute your program on my system, I get exactly the same output in both cases. Adding newlines to the output format, I get:
At n = 25, Algorithm A performs in 114661785600 seconds &
Algorithm B performs in 282429536481 seconds.
You haven't told us what output you're getting. Please update your question to show the output in both cases.
The prefix version (++n) alters the variable and then passes along its value.
The postfix version (n++) passes along the current value and then alters the variable.
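The two cases discussed above can be put side by side in a short sketch. The function names are invented for this illustration. The first pair uses the value of the increment expression, so the results differ; the second pair discards it, as the loop in the question does, so the results match.

```c
/* The value of n++ is the old n; the value of ++n is the new n. */
int value_of_postfix(int n)
{
    int x = n++;   /* x gets the value before the increment */
    return x;
}

int value_of_prefix(int n)
{
    int x = ++n;   /* x gets the value after the increment */
    return x;
}

/* When the expression value is discarded, both forms behave the same. */
int count_with_postfix(int limit)
{
    int n = 0;
    while (n < limit)
        n++;
    return n;
}

int count_with_prefix(int limit)
{
    int n = 0;
    while (n < limit)
        ++n;
    return n;
}
```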

C Program Runs Surprisingly Slow

A simple program I wrote in C takes upwards of half an hour to run. I am surprised that C would take so long to run, because from what I can find on the internet C ( aside from C++ or Java ) is one of the faster languages.
// this is a program to find the first triangular number that is divisible by 500 factors
int main()
{
int a; // for triangular num loop
int b = 1; // limit for triangular num (1+2+3+......+b)
int c; // factor counter
int d; // divisor
int e = 1; // ends loop
long long int t = 0; // triangular number in use
while( e != 0 )
{
c = 0;
// create triangular number t
t = t + b;
b++;
// printf("%lld\n", t); // in case you want to see where it's at
// counts factors
for( d = 1 ; d != t ; d++ )
{
if( t % d == 0 )
{
c++;
}
}
// test to see if condition is met
if( c > 500 )
{
break;
}
}
printf("%lld is the first triangular number with more than 500 factors\n", t);
getchar();
return 0;
}
Granted the program runs through a lot of data, but none of it is ever saved, just tested and passed over.
I am using the Tiny C Compiler on Windows 8.
Is there a reason this runs so slowly? What would be a faster way of achieving the same result?
Thank you!
You're iterating over a ton of numbers you don't need to. By definition, a positive factor is any whole number that can be multiplied by another to obtain the desired product.
Ex: 12 = 1*12, 2*6, and 3*4
The order of multiplication is NOT considered when deciding factors. In other words,
Ex: 12 = 2*6 = 6*2
The order doesn't matter. 2 and 6 are factors once.
The square root is the one singleton that will come out of a factoring of a product that stands alone. All others are in pairs, and I hope that is clear. Given that, you can significantly speed up your code by doing the following:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// this is a program to find the first triangular number with more than 500 factors
int main()
{
int c = 0; // factor counter
long long int b = 0; // limit for triangular num (1+2+3+......+b)
long long int d; // divisor
long long int t = 0; // triangular number in use
long long int r = 0; // root of current test number
while (c <= 500)
{
c = 0;
// next triangular number
t += ++b;
// get closest root.
r = floor(sqrt(t));
// counts factors
for( d = 1 ; d < r; ++d )
{
if( t % d == 0 )
c += 2; // add the factor *pair* (there are two)
}
if (t % r == 0) // r pairs with t / r; count it only once when r * r == t.
c += (r * r == t) ? 1 : 2;
}
printf("%lld is the first triangular number with more than 500 factors\n", t);
return 0;
}
Running this on IDEOne.com takes less than two seconds to come up with the following:
Output
76576500 is the first triangular number with more than 500 factors
I hope this helps. (and I think that is the correct answer). There are certainly more efficient ways of doing this (see here for some spoilers if you're interested), but going with your code idea and seeing how far we could take it was the goal of this answer.
Finally, this finds the first number with MORE than 500 factors (i.e. 501 or more) as per your output message. Your comment at the top of the file indicates you're looking for the first number with 500-or-more, which does not match up with your output message.
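The factor-pair idea can also be written without floating-point sqrt, by looping while d * d <= n. The following is only a sketch under that assumption. The function names count_divisors and first_triangular_with_more_divisors are made up for this example, and n is assumed small enough that d * d does not overflow 64 bits.

```c
#include <stdint.h>

/* Counts the positive divisors of n (n >= 1). Every divisor d with
   d * d < n is paired with n / d; an exact square root counts once. */
int count_divisors(uint64_t n)
{
    int count = 0;
    for (uint64_t d = 1; d * d <= n; ++d) {
        if (n % d == 0)
            count += (d * d == n) ? 1 : 2;
    }
    return count;
}

/* Returns the first triangular number with MORE than `limit` divisors. */
uint64_t first_triangular_with_more_divisors(int limit)
{
    uint64_t t = 0;
    for (uint64_t b = 1; ; ++b) {
        t += b;                          /* next triangular number */
        if (count_divisors(t) > limit)
            return t;
    }
}
```

Calling first_triangular_with_more_divisors(5) gives 28, whose divisors are 1, 2, 4, 7, 14 and 28. With a limit of 500 it should arrive at the same 76576500 reported above.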
Without any math analysis:
...
do
{
c = 0;
t += b;
b++;
for (d = 1; d < t; ++d)
{
if (!(t % d))
{
c++;
}
}
} while (c <= 500);
...
You are implementing an O(n^2) algorithm. It would be surprising if the code took less than a half an hour.
Refer to your computer science textbook for a better method compared to this brute force method of: check 1, 1 + 2, 1 + 2 + 3, etc.
You might be able to shorten the inner for loop. Does it really need to check all the way up to t for factors that divide the triangular number? For example, can 10 be evenly divided by any number greater than 5? Or 100 by any number greater than 50?
Thus, given a number N, what is the largest number that can evenly divide N?
Keep reading/researching this problem.
Also, as other people have mentioned, the outer loop could be simply coded as:
while (1)
{
// etc.
}
So there is no need to declare e or a. Note that this doesn't affect the running time, but your coding style indicates you are still learning, and thus a reviewer would question everything your code does!
You are doing some unnecessary operations that can be replaced with something simpler.
First:
while(e!=0)
Since you declared e=1 and never update its value anywhere, writing while(1) works just as well.
Change that and check whether it works fine or not.
One of the beautiful things about triangle numbers, is that if you have a triangle number, with a simple addition operation, you can have the next one.

Not getting proper output from Pollard's rho algorithm implementation

I don't know what I am doing wrong in trying to calculate prime factorizations using Pollard's rho algorithm.
#include<stdio.h>
#define f(x) x*x-1
int pollard( int );
int gcd( int, int);
int main( void ) {
int n;
scanf( "%d",&n );
pollard( n );
return 0;
}
int pollard( int n ) {
int i=1,x,y,k=2,d;
x = rand()%n;
y = x;
while(1) {
i++;
x = f( x ) % n;
d = gcd( y-x, n);
if(d!=1 && d!=n)
printf( "%d\n", d);
if(i == k) {
y = x;
k = 2 * k;
}
}
}
int gcd( int a, int b ) {
if( b == 0)
return a;
else
return gcd( b, a % b);
}
One immediate problem is, as Peter de Rivaz suspected, the macro
#define f(x) x*x-1
Thus the line
x = f(x)%n;
becomes
x = x*x-1%n;
and the precedence of % is higher than that of -, hence the expression is implicitly parenthesised as
x = (x*x) - (1%n);
which is equivalent to x = x*x - 1; (I assume n > 1, anyway it's x = x*x - constant;) and if you start with a value x >= 2, you have overflow before you had a realistic chance of finding a factor:
2 -> 2*2-1 = 3 -> 3*3 - 1 = 8 -> 8*8 - 1 = 63 -> 3968 -> 15745023 -> overflow if int is 32 bits
That doesn't immediately make it impossible that gcd(y-x,n) is a factor, though. It just makes it likely that at a stage where theoretically, you would have found a factor, the overflow destroys the common factor that mathematically would exist - more likely than a common factor introduced by overflow.
Overflow of signed integers is undefined behaviour, so there are no guarantees how the programme behaves, but usually it behaves consistently so the iteration of f still produces a well-defined sequence for which the algorithm in principle works.
Another problem is that y-x will frequently be negative, and then the computed gcd can also be negative - often -1. In that case, you print -1.
And then, it is a not too rare occurrence that iterating f from a starting value doesn't detect a common factor because the cycles modulo both prime factors (for the example of n a product of two distinct primes) have equal length and are entered at the same time. You make no attempt at detecting such a case; whenever gcd(|y-x|, n) == n, any further work in that sequence is pointless, so you should break out of the loop when d == n.
Also, you never check whether n is a prime, in which case trying to find a factor is a futile undertaking from the start.
Furthermore, after fixing f(x) so that the % n applies to the complete result of f(x), you have the problem that x*x still overflows for relatively small x (with the standard signed 32-bit ints, for x >= 46341), so factoring larger n may fail due to overflow. At least, you should use unsigned long long for the computations, so that overflow is avoided for n < 2^32. However, factorising such small numbers is typically done more efficiently with trial division. Pollard's Rho method and other advanced factoring algorithms are meant for larger numbers, where trial division is no longer efficient or even feasible.
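To make the macro problem and the overflow remark concrete, here is a minimal sketch. The names F_UNSAFE, step_unsafe and rho_step are invented for this illustration. It assumes n < 2^32 so that the product cannot overflow 64 bits, and it adds n before subtracting 1 so the unsigned value never wraps below zero.

```c
#include <stdint.h>

/* The macro as written in the question: no parentheses at all. */
#define F_UNSAFE(x) x*x-1

/* Expands to x*x - 1 % n, i.e. x*x - (1 % n): the modulus never
   applies to the square. */
uint64_t step_unsafe(uint64_t x, uint64_t n)
{
    return F_UNSAFE(x) % n;
}

/* Intended step: (x*x - 1) mod n, assuming 1 <= n < 2^32.
   Reducing x first keeps the product below 2^64. */
uint64_t rho_step(uint64_t x, uint64_t n)
{
    uint64_t r = x % n;
    return (r * r + n - 1) % n;
}
```

For x = 8 and n = 35, step_unsafe returns 63 while rho_step returns 28, which is 63 reduced modulo 35.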
I'm just a novice at C++, and I am new to Stack Overflow, so some of what I have written is going to look sloppy, but this should get you going in the right direction. The program posted here should generally find and return one non-trivial factor of the number you enter at the prompt, or it will apologize if it cannot find such a factor.
I tested it with a few semiprime numbers, and it worked for me. For 371156167103, it finds 607619 without any detectable delay after I hit the enter key. I didn't check it with larger numbers than this. I used unsigned long long variables, but if possible, you should get and use a library that provides even larger integer types.
Editing to add, the single call to the method f for X and 2 such calls for Y is intentional and is in accordance with the way the algorithm works. I thought to nest the call for Y inside another such call to keep it on one line, but I decided to do it this way so it's easier to follow.
#include "stdafx.h"
#include <stdio.h>
#include <iostream>
typedef unsigned long long ULL;
ULL pollard(ULL numberToFactor);
ULL gcd(ULL differenceBetweenCongruentFunctions, ULL numberToFactor);
ULL f(ULL x, ULL numberToFactor);
int main(void)
{
ULL factor;
ULL n;
std::cout<<"Enter the number for which you want a prime factor: ";
std::cin>>n;
factor = pollard(n);
if (factor == 0) std::cout<<"No factor found. Your number may be prime, but it is not certain.\n\n";
else std::cout<<"One factor is: "<<factor<<"\n\n";
}
ULL pollard(ULL n)
{
ULL x = 2ULL;
ULL y = 2ULL;
ULL d = 1ULL;
while(d==1||d==n)
{
x = f(x,n);
y = f(y,n);
y = f(y,n);
if (y>x)
{
d = gcd(y-x, n);
}
else
{
d = gcd(x-y, n);
}
}
return d;
}
ULL gcd(ULL a, ULL b)
{
if (a==b||a==0)
return 0; // If x==y or if the absolute value of (x-y) == the number to be factored, then we have failed to find
// a factor. I think this is not proof of primality, so the process could be repeated with a new function.
// For example, by replacing x*x+1 with x*x+2, and so on. If many such functions fail, primality is likely.
ULL currentGCD = 1;
while (currentGCD!=0) // This while loop is based on Euclid's algorithm
{
currentGCD = b % a;
b=a;
a=currentGCD;
}
return b;
}
ULL f(ULL x, ULL n)
{
return (x * x + 1) % n;
}
Sorry for the long delay getting back to this. As I mentioned in my first answer, I am a novice at C++, which will be evident in my excessive use of global variables, excessive use of BigIntegers and BigUnsigned where other types might be better, lack of error checking, and other programming habits on display which a more skilled person might not exhibit. That being said, let me explain what I did, then will post the code.
I am doing this in a second answer because the first answer is useful as a very simple demo of how easy Pollard's Rho algorithm is to implement once you understand what it does. What it does is first take 2 variables, call them x and y, and assign them the starting value of 2. Then it runs x through a function, usually (x^2+1)%n, where n is the number you want to factor. And it runs y through the same function twice each cycle. Then the difference between x and y is calculated, and finally the greatest common divisor is found for this difference and n. If that number is 1, then you run x and y through the function again.
Continue this process until the GCD is not 1 or until x and y are equal again. If the GCD is found which is not 1, then that GCD is a non-trivial factor of n. If x and y become equal, then the (x^2+1)%n function has failed. In that case, you should try again with another function, maybe (x^2+2)%n, and so on.
Here is an example. Take 35, for which we know the prime factors are 5 and 7. I'll walk through Pollard Rho and show you how it finds a non-trivial factor.
Cycle #1: X starts at 2. Then using the function (x^2+1)%n, (2^2+1)%35, we get 5 for x. Y starts at 2 also, and after one run through the function, it also has a value of 5. But y always goes through the function twice, so the second run is (5^2+1)%35, or 26. The difference between x and y is 21. The GCD of 21 (the difference) and 35 (n) is 7. We have already found a prime factor of 35! Note that the GCD for any 2 numbers, even extremely large exponents, can be found very quickly by formula using Euclid's algorithm, and that's what the program I will post here does.
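The walkthrough for 35 can be checked with a small C sketch of the same procedure. This is not the BigUnsigned program described in this answer: it uses plain 64-bit integers, assumes n < 2^32 so that x * x cannot overflow, and the names gcd_u64, f_step, rho_with_constant and find_factor are made up for the example. The retry over c = 1, 2, ... up to 10 is an arbitrary cut-off chosen here.

```c
#include <stdint.h>

/* Euclid's algorithm; gcd_u64(0, n) returns n. */
uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* f(x) = (x*x + c) mod n, assuming x < n < 2^32 and a small c. */
uint64_t f_step(uint64_t x, uint64_t c, uint64_t n)
{
    return (x * x + c) % n;
}

/* One run with a fixed c: x moves one step per cycle, y moves two.
   Returns a non-trivial factor of n, or 0 if x and y meet first. */
uint64_t rho_with_constant(uint64_t n, uint64_t c)
{
    if (n < 4)
        return 0;                 /* 0..3 have no non-trivial factor */
    uint64_t x = 2, y = 2, d = 1;
    while (d == 1) {
        x = f_step(x, c, n);
        y = f_step(f_step(y, c, n), c, n);
        d = gcd_u64(x > y ? x - y : y - x, n);
    }
    return d == n ? 0 : d;
}

/* Try (x^2+1)%n, then (x^2+2)%n, and so on, up to an arbitrary limit. */
uint64_t find_factor(uint64_t n)
{
    for (uint64_t c = 1; c <= 10; ++c) {
        uint64_t d = rho_with_constant(n, c);
        if (d != 0)
            return d;
    }
    return 0;
}
```

For n = 35 and c = 1 the first cycle gives x = 5 and y = 26, so the function returns gcd(21, 35) = 7, matching the walkthrough. A result of 0 only means these particular functions failed; it does not prove that n is prime.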
On the subject of the GCD function, I am using one library I downloaded for this program, a library that allows me to use BigIntegers and BigUnsigned. That library also has a GCD function built in, and I could have used it. But I decided to stay with the hand-written GCD function for instructional purposes. If you want to improve the program's execution time, it might be a good idea to use the library's GCD function because there are faster methods than Euclid, and the library may be written to use one of those faster methods.
Another side note. The .Net 4.5 library supports the use of BigIntegers and BigUnsigned also. I decided not to use that for this program because I wanted to write the whole thing in C++, not C++/CLI. You could get better performance from the .Net library, or you might not. I don't know, but I wanted to share that that is also an option.
I am jumping around a bit here, so let me start now by explaining in broad strokes what the program does, and lastly I will explain how to set it up on your computer if you use Visual Studio 11 (also called Visual Studio 2012).
The program allocates 3 arrays for storing the factors of any number you give it to process. These arrays are 1000 elements wide, which is excessive, maybe, but it ensures any number with 1000 prime factors or less will fit.
When you enter the number at the prompt, it assumes the number is composite and puts it in the first element of the compositeFactors array. Then it goes through some admittedly inefficient while loops, which use Miller-Rabin to check if the number is composite. Note this test can either say a number is composite with 100% confidence, or it can say the number is prime with extremely high (but not 100%) confidence. The confidence is adjustable by a variable confidenceFactor in the program. The program will make one check for every value between 2 and confidenceFactor, inclusive, so one less total check than the value of confidenceFactor itself.
The setting I have for confidenceFactor is 101, which does 100 checks. If it says a number is prime, the odds that it is really composite are 1 in 4^100, or the same as the odds of correctly calling the flip of a fair coin 200 consecutive times. In short, if it says the number is prime, it probably is, but the confidenceFactor number can be increased to get greater confidence at the cost of speed.
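As a rough sketch of what such a test looks like, here is a Miller-Rabin check in plain C that tries every base from 2 up to a confidence_factor parameter, mirroring the description above. It is not the code of this program: it assumes n < 2^32 so that 64-bit products cannot overflow (the real program uses BigUnsigned for larger values), and the names pow_mod and is_probable_prime are hypothetical.

```c
#include <stdint.h>

/* (base^exp) mod mod by square-and-multiply, assuming mod < 2^32. */
uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

/* Returns 0 if n is certainly composite, 1 if n passed every base
   from 2 to confidence_factor (so it is probably prime). */
int is_probable_prime(uint64_t n, unsigned confidence_factor)
{
    if (n < 2) return 0;
    if (n < 4) return 1;              /* 2 and 3 */
    if (n % 2 == 0) return 0;

    uint64_t d = n - 1;               /* write n - 1 as d * 2^s, d odd */
    unsigned s = 0;
    while (d % 2 == 0) { d /= 2; ++s; }

    for (uint64_t a = 2; a <= confidence_factor; ++a) {
        if (a % n == 0)
            continue;                 /* base is a multiple of n: no information */
        uint64_t x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1)
            continue;                 /* this base does not witness compositeness */
        int is_witness = 1;
        for (unsigned r = 1; r < s; ++r) {
            x = x * x % n;
            if (x == n - 1) { is_witness = 0; break; }
        }
        if (is_witness)
            return 0;                 /* definitely composite */
    }
    return 1;
}
```

With a confidence_factor of 101 this performs the 100 checks described above. The Carmichael number 561, which fools the simpler Fermat test for every base coprime to it, is already rejected at base 2.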
Here might be as good a place as any to mention that, while Pollard's Rho algorithm can be pretty effective factoring smaller numbers of type long long, the Miller-Rabin test to see if a number is composite would be more or less useless without the BigInteger and BigUnsigned types. A BigInteger library is pretty much a requirement to be able to reliably factor large numbers all the way to their prime factors like this.
When Miller Rabin says the factor is composite, it is factored, the factor stored in a temp array, and the original factor in the composites array divided by the same factor. When numbers are identified as likely prime, they are moved into the prime factors array and output to screen. This process continues until there are no composite factors left. The factors tend to be found in ascending order, but this is coincidental. The program makes no effort to list them in ascending order, but only lists them as they are found.
Note that I could not find any function (x^2+c)%n which will factor the number 4, no matter what value I gave c. Pollard Rho seems to have a very hard time with all perfect squares, but 4 is the only composite number I found which is totally impervious to it using functions in the format described. Therefore I added a check for an n of 4 inside the pollard method, returning 2 instantly if so.
So to set this program up, here is what you should do. Go to https://mattmccutchen.net/bigint/ and download bigint-2010.04.30.zip. Unzip this and put all of the .hh files and all of the C++ source files in your ~\Program Files\Microsoft Visual Studio 11.0\VC\include directory, excluding the Sample and C++ Testsuite source files. Then in Visual Studio, create an empty project. In the solution explorer, right click on the resource files folder and select Add...existing item. Add all of the C++ source files in the directory I just mentioned. Then also in solution explorer, right click the Source Files folder and add a new item, select C++ file, name it, and paste the below source code into it, and it should work for you.
Not to flatter overly much, but there are folks here on Stack Overflow who know a great deal more about C++ than I do, and if they modify my code below to make it better, that's fantastic. But even if not, the code is functional as-is, and it should help illustrate the principles involved in programmatically finding prime factors of medium sized numbers. It will not threaten the general number field sieve, but it can factor numbers with 12 - 14 digit prime factors in a reasonably short time, even on an old Core2 Duo computer like the one I am using.
The code follows. Good luck.
#include <string>
#include <stdio.h>
#include <iostream>
#include "BigIntegerLibrary.hh"
typedef BigInteger BI;
typedef BigUnsigned BU;
using std::string;
using std::cin;
using std::cout;
BU pollard(BU numberToFactor);
BU gcda(BU differenceBetweenCongruentFunctions, BU numberToFactor);
BU f(BU x, BU numberToFactor, int increment);
void initializeArrays();
BU getNumberToFactor ();
void factorComposites();
bool testForComposite (BU num);
BU primeFactors[1000];
BU compositeFactors[1000];
BU tempFactors [1000];
int primeIndex;
int compositeIndex;
int tempIndex;
int numberOfCompositeFactors;
bool allJTestsShowComposite;
int main ()
{
while(1)
{
primeIndex=0;
compositeIndex=0;
tempIndex=0;
initializeArrays();
compositeFactors[0] = getNumberToFactor();
cout<<"\n\n";
if (compositeFactors[0] == 0) return 0;
numberOfCompositeFactors = 1;
factorComposites();
}
}
void initializeArrays()
{
for (int i = 0; i<1000;i++)
{
primeFactors[i] = 0;
compositeFactors[i]=0;
tempFactors[i]=0;
}
}
BU getNumberToFactor ()
{
std::string s;
std::cout<<"Enter the number for which you want a prime factor, or 0 to quit: ";
std::cin>>s;
return stringToBigUnsigned(s);
}
void factorComposites()
{
while (numberOfCompositeFactors!=0)
{
compositeIndex = 0;
tempIndex = 0;
// This while loop finds non-zero values in compositeFactors.
// If they are composite, it factors them and puts one factor in tempFactors,
// then divides the element in compositeFactors by the same amount.
// If the element is prime, it moves it into tempFactors (zeros the element in compositeFactors)
while (compositeIndex < 1000)
{
if(compositeFactors[compositeIndex] == 0)
{
compositeIndex++;
continue;
}
if(testForComposite(compositeFactors[compositeIndex]) == false)
{
tempFactors[tempIndex] = compositeFactors[compositeIndex];
compositeFactors[compositeIndex] = 0;
tempIndex++;
compositeIndex++;
}
else
{
tempFactors[tempIndex] = pollard (compositeFactors[compositeIndex]);
compositeFactors[compositeIndex] /= tempFactors[tempIndex];
tempIndex++;
compositeIndex++;
}
}
compositeIndex = 0;
// This while loop moves all remaining non-zero values from compositeFactors into tempFactors
// When it is done, compositeFactors should be all 0 value elements
while (compositeIndex < 1000)
{
if (compositeFactors[compositeIndex] != 0)
{
tempFactors[tempIndex] = compositeFactors[compositeIndex];
compositeFactors[compositeIndex] = 0;
tempIndex++;
compositeIndex++;
}
else compositeIndex++;
}
compositeIndex = 0;
tempIndex = 0;
// This while loop checks all non-zero elements in tempFactors.
// Those that are prime are shown on screen and moved to primeFactors
// Those that are composite are moved to compositeFactors
// When this is done, all elements in tempFactors should be 0
while (tempIndex<1000)
{
if(tempFactors[tempIndex] == 0)
{
tempIndex++;
continue;
}
if(testForComposite(tempFactors[tempIndex]) == false)
{
primeFactors[primeIndex] = tempFactors[tempIndex];
cout<<primeFactors[primeIndex]<<"\n";
tempFactors[tempIndex]=0;
primeIndex++;
tempIndex++;
}
else
{
compositeFactors[compositeIndex] = tempFactors[tempIndex];
tempFactors[tempIndex]=0;
compositeIndex++;
tempIndex++;
}
}
compositeIndex=0;
numberOfCompositeFactors=0;
// This while loop just checks to be sure there are still one or more composite factors.
// As long as there are, the outer while loop will repeat
while(compositeIndex<1000)
{
if(compositeFactors[compositeIndex]!=0) numberOfCompositeFactors++;
compositeIndex ++;
}
}
return;
}
// The following method uses the Miller-Rabin primality test to prove with 100% confidence a given number is composite,
// or to establish with a high level of confidence -- but not 100% -- that it is prime
bool testForComposite (BU num)
{
BU confidenceFactor = 101;
if (confidenceFactor >= num) confidenceFactor = num-1;
BU a,d,s, nMinusOne;
nMinusOne=num-1;
d=nMinusOne;
s=0;
while(modexp(d,1,2)==0)
{
d /= 2;
s++;
}
for (BI i = 2 ; i<=confidenceFactor;i++)
{
allJTestsShowComposite = true; // reset for every i: assume composite until this value of i shows otherwise
if (modexp(i,d,num) == 1)
continue; // if this modulus is 1, then we cannot prove that num is composite with this value of i, so continue
if (modexp(i,d,num) == nMinusOne)
{
allJTestsShowComposite = false;
continue;
}
BU exponent(1);
for (BU j(0); j.toInt()<=s.toInt()-1;j++)
{
exponent *= 2;
if (modexp(i,exponent*d,num) == nMinusOne)
{
// this value of i cannot prove num composite, so stop checking j and move on to the next i.
allJTestsShowComposite = false;
break;
}
}
if (allJTestsShowComposite == true) return true; // proven composite with 100% certainty, no need to continue testing
}
return false;
/* Not proven composite in any test, so assume prime, with a probability of error of at most
(1/4)^(number of different values of i tested). The values of i tested are all integers from
2 through the value of confidenceFactor, so that number is confidenceFactor - 1.
Note that using fixed, predictable values of i makes this primality test cryptographically less secure than it could be.
It is theoretically possible, if difficult, for a malicious party to pass a known composite number for which all of the
lowest n integers fail to detect that it is composite. A safer way is to generate random integers in the outer "for" loop
and use those in place of the variable i. Better still if those random numbers are checked to ensure no duplicates are generated.
*/
}
BU pollard(BU n)
{
if (n == 4) return 2;
BU x = 2;
BU y = 2;
BU d = 1;
int increment = 1;
while(d==1||d==n||d==0)
{
x = f(x,n, increment);
y = f(y,n, increment);
y = f(y,n, increment);
if (y>x)
{
d = gcda(y-x, n);
}
else
{
d = gcda(x-y, n);
}
if (d==0)
{
x = 2;
y = 2;
d = 1;
increment++; // This changes the pseudorandom function we use to increment x and y
}
}
return d;
}
BU gcda(BU a, BU b)
{
if (a==b||a==0)
return 0; // If x==y or if the absolute value of (x-y) == the number to be factored, then we have failed to find
// a factor. I think this is not proof of primality, so the process could be repeated with a new function.
// For example, by replacing x*x+1 with x*x+2, and so on. If many such functions fail, primality is likely.
BU currentGCD = 1;
while (currentGCD!=0) // This while loop is based on Euclid's algorithm
{
currentGCD = b % a;
b=a;
a=currentGCD;
}
return b;
}
BU f(BU x, BU n, int increment)
{
return (x * x + increment) % n;
}
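For anyone who wants to try the rho iteration without installing the BigInt library first, here is a minimal sketch of the same idea on built-in integers. It is not a replacement for the code above: it assumes n is below 2^32 so that x*x fits in a 64-bit value, and the names gcd64, rhoAttempt and findFactor are made up for this illustration. Like pollard() above, it retries with a different constant in x*x + c when the tortoise and hare meet without exposing a factor.

```cpp
#include <cassert>
#include <cstdint>

// Euclid's algorithm on built-in integers.
static std::uint64_t gcd64(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// One rho attempt with f(x) = x*x + c (mod n), using Floyd cycle detection.
// Returns a nontrivial factor of n, or 0 if this choice of c failed.
// Assumption for this sketch: 4 <= n < 2^32, so x*x + c cannot overflow.
static std::uint64_t rhoAttempt(std::uint64_t n, std::uint64_t c) {
    assert(n >= 4 && n < (1ULL << 32));
    std::uint64_t x = 2, y = 2, d = 1;
    while (d == 1) {
        x = (x * x + c) % n;          // tortoise: one step
        y = (y * y + c) % n;          // hare: two steps
        y = (y * y + c) % n;
        d = gcd64(x > y ? x - y : y - x, n);
    }
    return d == n ? 0 : d;            // d == n means x == y: the cycle closed without a factor
}

// Tries a few constants c in turn. Returns 0 if none of them finds a factor,
// which is what happens when n is prime.
static std::uint64_t findFactor(std::uint64_t n) {
    if (n < 4) return 0;
    if (n % 2 == 0) return 2;
    for (std::uint64_t c = 1; c < 20; ++c) {
        std::uint64_t d = rhoAttempt(n, c);
        if (d != 0) return d;
    }
    return 0;
}
```

Calling findFactor(8051) returns one of the two prime factors 83 and 97, while findFactor(97) returns 0 because every attempt ends with the tortoise and hare meeting. A return value of 0 is not a proof of primality, which is why the full program pairs the rho step with a separate Miller-Rabin test.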
As far as I can see, Pollard Rho normally uses f(x) = x*x+1 (e.g. in these lecture notes).
Your choice of x*x-1 appears to be worse, as it often seems to get stuck in a loop:
x=0
f(x)=-1
f(f(x))=0
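The loop is easy to see once everything is reduced mod n, where -1 becomes n-1. Below is a small sketch, with made-up helper names fMinusOne and orbit and the assumption that n is below 2^32 so that x*x fits in 64 bits, that lists the successive values of f(x) = x*x-1 starting from a given x.

```cpp
#include <cstdint>
#include <vector>

// f(x) = x*x - 1 (mod n). Adding n before subtracting 1 keeps the
// unsigned arithmetic from wrapping around when x*x % n is 0.
// Assumption for this sketch: x < n < 2^32, so x*x fits in 64 bits.
static std::uint64_t fMinusOne(std::uint64_t x, std::uint64_t n) {
    return (x * x % n + n - 1) % n;
}

// Returns the first `steps` iterates f(x0), f(f(x0)), ...
static std::vector<std::uint64_t> orbit(std::uint64_t x0, std::uint64_t n, int steps) {
    std::vector<std::uint64_t> values;
    std::uint64_t x = x0;
    for (int i = 0; i < steps; ++i) {
        x = fMinusOne(x, n);
        values.push_back(x);
    }
    return values;
}
```

With n = 8051, orbit(0, 8051, 4) gives 8050, 0, 8050, 0. The sequence has period 2, so the tortoise and hare meet after two steps and the gcd never sees anything useful. The same happens as soon as the sequence reaches 1 or n-1, since both map to 0.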
