finding greatest prime factor using recursion in c

finding greatest prime factor using recursion in c - c

have wrote the code for what i see to be a good algorithm for finding the greatest prime factor for a large number using recursion. My program crashes with any number greater than 4 assigned to the variable huge_number though. I am not good with recursion and the assignment does not allow any sort of loop.
#include <stdio.h>
long long prime_factor(int n, long long huge_number);
int main (void)
{
int n = 2;
long long huge_number = 60085147514;
long long largest_prime = 0;
largest_prime = prime_factor(n, huge_number);
printf("%ld\n", largest_prime);
return 0;
}
long long prime_factor (int n, long long huge_number)
{
if (huge_number / n == 1)
return huge_number;
else if (huge_number % n == 0)
return prime_factor (n, huge_number / n);
else
return prime_factor (n++, huge_number);
}
any info as to why it is crashing and how i could improve it would be greatly appreciated.

Even fixing the problem of using post-increment so that the recursion continues forever, this is not a good fit for a recursive solution - see here for why, but it boils down to how fast you can reduce the search space.
While your division of huge_number whittles it down pretty fast, the vast majority of recursive calls are done by simply incrementing n. That means you're going to use a lot of stack space.
You would be better off either:
using an iterative solution where you won't blow out the stack (if you just want to solve the problem) (a); or
finding a more suitable problem for recursion if you're just trying to learn recursion.
(a) An example of such a beast, modeled on your recursive solution, is:
#include <stdio.h>
long long prime_factor_i (int n, long long huge_number) {
while (n < huge_number) {
if (huge_number % n == 0) {
huge_number /= n;
continue;
}
n++;
}
return huge_number;
}
int main (void) {
int n = 2;
long long huge_number = 60085147514LL;
long long largest_prime = 0;
largest_prime = prime_factor_i (n, huge_number);
printf ("%lld\n", largest_prime);
return 0;
}
As can be seen from the output of that iterative solution, the largest factor is 10976461. That means the final batch of recursions in your recursive solution would require a stack depth of ten million stack frames, not something most environments will contend with easily.
If you really must use a recursive solution, you can reduce the stack space to the square root of that by using the fact that you don't have to check all the way up to the number, but only up to its square root.
In addition, other than 2, every other prime number is odd, so you can further halve the search space by only checking two plus the odd numbers.
A recursive solution taking those two things into consideration would be:
long long prime_factor_r (int n, long long huge_number) {
// Debug code for level checking.
// static int i = 0;
// printf ("recursion level = %d\n", ++i);
// Only check up to square root.
if (n * n >= huge_number)
return huge_number;
// If it's a factor, reduce the number and try again.
if (huge_number % n == 0)
return prime_factor_r (n, huge_number / n);
// Select next "candidate" prime to check against, 2 -> 3,
// 2n+1 -> 2n+3 for all n >= 1.
if (n == 2)
return prime_factor_r (3, huge_number);
return prime_factor_r (n + 2, huge_number);
}
You can see I've also removed the (awkward, in my opinion) construct:
if something then
return something
else
return something else
I much prefer the less massively indented code that comes from:
if something then
return something
return something else
But that's just personal preference. In any case, that gets your recursion level down to 1662 (uncomment the debug code to verify) rather than ten million, a rather sizable reduction but still not perfect. That runs okay in my environment.

You meant n+1 instead of n++. n++ increments n after using it, so the recursive call gets the original value of n.

You are overflowing stack, because n++ post-increments the value, making a recursive call with the same values as in the current invocation.

the crash reason is stack overflow. I add a counter to your program and execute it(on ubuntu 10.04 gcc 4.4.3) the counter stop at "218287" before core dump. the better solution is using loop instead of recursion.

Related

any suggestions to improve and bypass timeout error test with this prime finder function?

I'm supposed to create a function to find the closest next prime number for a given number, I mean even though the algorithm is badly written and very slow (probably it's the slowest around), but its doing the task, the problem is the program supposed to evaluate my thing refuse it with timeout error, he had given it a bunch of numbers at once and he want them to be all resolved in 10 seconds, so the question is there any improvements you may suggest that could fast forward my poor tortue ? (for is not allowed)
int is_prime(int nb)
{
int i;
/* if negative terminate */
if (nb <= 1)
return (0);
/* start from first prime */
i = 2;
/* primes equals zero only when divisible by 1 and theme-selves */
while (nb % i != 0)
i++;
/* if i divides nb, we see if i is the nb we looking for */
if (i == nb)
return (1);
else
return (0);
}
int find_next_prime(int nb)
{
int i;
i = 0;
/*keep looking for primes one by one */
while (!is_prime(nb + i))
i++;
return (nb + i);
}

The best simple speed improvement is to check divisors up to the square root of n rather than all divisors up to n. This takes the algorithm from O(nb) to O(sqrt(nb)).
Consider is_prime(2147483647). OP's approach takes about 2147483647 iterations. Testing up to about the square root, 46341, is about 46,000 times faster.
// while (nb % i != 0) i++;
// if (i == nb) return (1);
// else return (0);
// While the divisor <= quotient (or until the square root is reached)
while (i <= nb/i) {
if (nb%i == 0) return 1;
i++;
}
return 0;
Avoid a i*i <= nb test as i*i may overflow.
Avoid sqrt(nb) as than involves a host of floating point/int issues.
Note: Good compilers see nearby nb/i; nb%i and compute them both for the time cost of one.
Lots of other improvements are possible, but wanted to focus on a simple one with a big impact. When wanting to improve speed focus on reducing order of complexity O() and not linear improvements. Is premature optimization really the root of all evil?

Your missing the two most common tricks to improve speed.
you only need to check up to square root of the number
once you have check 2, you only need to check every other number from 3 on.

How is it possible to achive O(log n) power function a^n by only using recursion?

Below, the purpose of the code is to compute power of an integer.
My friend told me that the time complexity of this algorithm is O(log n).
But, in fact the number of function calls is not equal to logn.
For example, power(2, 9) calls power functions 5 times (including the calling power(2,9)), while power(2, 8) calls power function 4 times (including the calling power(2,8).
Nevertheless the number of bits needed for 8 and 9 are same, the numbers of function calls are different.
Why does this happen? Is this really O(log n) algorithm?
#include <stdio.h>
int power(int a, int n) {
if(n == 0) {
return 1;
}
if(n == 1) {
return a;
}
if (n%2 == 0) {
return power(a*a, n/2);
}else{
return a * power(a, n - 1);
}
}
int main() {
for (int i = 0; i < 15; i++)
printf("pow(%d, %d) = %d\n", 2, i, power(2, i));
return 0;
}

Your implementation is O(logN), but it could be made slightly more efficient.
Note that hereafter, a log is a log base 2.
You have log(n) calls of power(a*a,n/2), and a call to power(a, n-1) for every bit set in n.
The number of bits set in n is at most log(n) +1.
Thus, the number of calls to power is at most log(n)+log(n)+1. For instance, when n = 15, the sequence of calls is
power(15), power(14), power(7), power(6), power(3), power(2), power(1)
log(n)+log(n)+1 = 3+3+1 = 7
Here is a more efficient implementation that has only log(n)+2 calls of power.
int power(int a, int n) {
if(n == 0) {
return 1;
}
if (n&1 == 0) {
return power(a*a, n/2);
}else{
return a * power(a*a, n/2);
}
}
In this case the sequence of calls when n = 15 is
power(15), power(7), power(3), power(1), power(0)
I removed the if (n == 1) condition because we can avoid this test that would be performed log(n) time by adding one call to power.
We then have log(n)+2 calls to power which is better than 2log(n)+1.

The reason why the algorithm remains Ο(lgN) even with the extra calls for the odd number case is because the number of extra calls is bounded by a constant. In the worst case, N/2 is odd at each iteration, but this would only double the number of extra calls (the constant is 2). That is, at worst, there will be 2lgN calls to complete the algorithm.
To more easily observe that the algorithm is Ο(lgN), you can rewrite the function to always reduce the power by half at each iteration, so that at worst case, there are only lgN calls. To leverage tail recursion, you can add a function parameter to accumulate the carried multiplier from the odd N.
int power_i (int a, unsigned N, int c) {
if (N == 0) return c;
return power_i(a*a, N/2, N%2 ? a*c : c);
}
int power (int a, unsigned N) {
return power_i(a, N, 1);
}
The advantage of tail recursion is that the optimized code will be converted into a simple loop by most modern C compilers.
Try it online!

The power function has two base cases: n = 0 and n = 1.
The power function has two recursive calls. Only one of them is made in any given call.
Let's first consider the case when n is even: In that case, the recursive call is made with n / 2.
If all calls would use this case, then you half n in each call down until you reach 1. This is indeed log(n) calls (plus 1 for the base case).
The other case, when n is odd, reduces n only by one. If all calls would end up using this recursive call then the function would be called n times; clearly not logarithmic but linear thus.
But what happens to an odd number when you subtract one from it? It becomes an even number. Thus the feared linear behaviour mentioned above cannot occur.
Worst case is: n is odd, thus use second recursive call. Now n is even, thus first recursive call. Now n is odd, this use second, ... and so on down until n is one. In that case every second call reduces n to n / 2. Therefore you need 2 * log(n) calls then (plus one for the base case).
So yes, this is in O(log(n)). This algorithm is often called binary exponentiation.

why runtime error on online judge?

I am unable to understand why i am getting runtime error with this code. Problem is every number >=6 can be represented as sum of two prime numbers.
My code is ...... Thanks in advance problem link is http://poj.org/problem?id=2262
#include "stdio.h"
#include "stdlib.h"
#define N 1000000
int main()
{
long int i,j,k;
long int *cp = malloc(1000000*sizeof(long int));
long int *isprime = malloc(1000000*sizeof(long int));
//long int *isprime;
long int num,flag;
//isprime = malloc(2*sizeof(long int));
for(i=0;i<N;i++)
{
isprime[i]=1;
}
j=0;
for(i=2;i<N;i++)
{
if(isprime[i])
{
cp[j] = i;
j++;
for(k=i*i;k<N;k+=i)
{
isprime[k] = 0;
}
}
}
//for(i=0;i<j;i++)
//{
// printf("%d ",cp[i]);
//}
//printf("\n");
while(1)
{
scanf("%ld",&num);
if(num==0) break;
flag = 0;
for(i=0;i<j&&num>cp[i];i++)
{
//printf("%d ",cp[i]);
if(isprime[num-cp[i]])
{
printf("%ld = %ld + %ld\n",num,cp[i],num-cp[i]);
flag = 1;
break;
}
}
if(flag==0)
{
printf("Goldbach's conjecture is wrong.\n");
}
}
free(cp);
free(isprime);
return 0;
}

Two possibilities immediately spring to mind. The first is that the user input may be failing if whatever test harness is being used does not provide any input. Without knowing more detail on the harness, this is a guess at best.
You could check that by hard-coding a value rather than accepting one from standard input.
The other possibility is the rather large memory allocations being done. It may be that you're in a constrained environment which doesn't allow that.
A simple test for that is to drop the value of N (and, by the way, use it rather than the multiple hardcoded 1000000 figures in your malloc calls). A better way would be to check the return value from malloc to ensure it's not NULL. That should be done anyway.
And, aside from that, you may want to check your Eratosthenes Sieve code. The first item that should be marked non-prime for the prime i is i + i rather than i * i as you have. I think it should be:
for (k = i + i; k < N; k += i)
The mathematical algorithm is actually okay since any multiple of N less than N * N will already have been marked non-prime by virtue of the fact it's a multiple of one of the primes previously checked.
Your problem lies with integer overflow. At the point where N becomes 46_349, N * N is 2_148_229_801 which, if you have a 32-bit two's complement integer (maximum value of 2_147_483_647), will wrap around to -2_146_737_495.
When that happens, the loop keeps going since that negative number is still less than your limit, but using it as an array index is, shall we say, inadvisable :-)
The reason it works with i + i is because your limit is well short of INT_MAX / 2 so no overflow happens there.
If you want to make sure that this won't be a problem if you get up near INT_MAX / 2, you can use something like:
for (k = i + i; (k < N) && (k > i); k += i)
That extra check on k should catch the wraparound event, provided your wrapping follows the "normal" behaviour - technically, I think it's undefined behaviour to wrap but most implementations simply wrap two positives back to a negative due to the two's complement nature. Be aware then that this is actually non-portable, but what that means in practice is that it will only work on 99.999% of machines out there :-)
But, if you're a stickler for portability, there are better ways to prevent overflow in the first place. I won't go into them here but to say they involve subtracting one of the terms being summed from MAX_INT and comparing it to the other term being summed.

The only way I can get this to give an error is if I enter a value greater than 1000000 or less than 1 to the scanf().
Like this:
ubuntu#amrith:/tmp$ ./x
183475666
Segmentation fault (core dumped)
ubuntu#amrith:/tmp$
But the reason for that should be obvious. Other than that, this code looks good.

Just trying to find what went wrong!
If the sizeof(long int) is 4 bytes for the OS that you are using, then it makes this problem.
In the code:
for(k=i*i;k<N;k+=i)
{
isprime[k] = 0;
}
Here, when you do k = i*i, for large values if i, the value of k goes beyond 4 bytesand get truncated which may result in negative numbers and so, the condition k<N is satisfied but with a negative number :). So you get a segmentation fault there.
It's good that you need only i+i, but if you need to increase the limit, take care of this problem.

Not getting proper output from Pollard's rho algorithm implementation

I don't know where I am doing wrong in trying to calculate prime factorizations using Pollard's rho algorithm.
#include<stdio.h>
#define f(x) x*x-1
int pollard( int );
int gcd( int, int);
int main( void ) {
int n;
scanf( "%d",&n );
pollard( n );
return 0;
}
int pollard( int n ) {
int i=1,x,y,k=2,d;
x = rand()%n;
y = x;
while(1) {
i++;
x = f( x ) % n;
d = gcd( y-x, n);
if(d!=1 && d!=n)
printf( "%d\n", d);
if(i == k) {
y = x;
k = 2 * k;
}
}
}
int gcd( int a, int b ) {
if( b == 0)
return a;
else
return gcd( b, a % b);
}

One immediate problem is, as Peter de Rivaz suspected the
#define f(x) x*x-1
Thus the line
x = f(x)%n;
becomes
x = x*x-1%n;
and the precedence of % is higher than that of -, hence the expression is implicitly parenthesised as
x = (x*x) - (1%n);
which is equivalent to x = x*x - 1; (I assume n > 1, anyway it's x = x*x - constant;) and if you start with a value x >= 2, you have overflow before you had a realistic chance of finding a factor:
2 -> 2*2-1 = 3 -> 3*3 - 1 = 8 -> 8*8 - 1 = 63 -> 3968 -> 15745023 -> overflow if int is 32 bits
That doesn't immediately make it impossible that gcd(y-x,n) is a factor, though. It just makes it likely that at a stage where theoretically, you would have found a factor, the overflow destroys the common factor that mathematically would exist - more likely than a common factor introduced by overflow.
Overflow of signed integers is undefined behaviour, so there are no guarantees how the programme behaves, but usually it behaves consistently so the iteration of f still produces a well-defined sequence for which the algorithm in principle works.
Another problem is that y-x will frequently be negative, and then the computed gcd can also be negative - often -1. In that case, you print -1.
And then, it is a not too rare occurrence that iterating f from a starting value doesn't detect a common factor because the cycles modulo both prime factors (for the example of n a product of two distinct primes) have equal length and are entered at the same time. You make no attempt at detecting such a case; whenever gcd(|y-x|, n) == n, any further work in that sequence is pointless, so you should break out of the loop when d == n.
Also, you never check whether n is a prime, in which case trying to find a factor is a futile undertaking from the start.
Furthermore, after fixing f(x) so that the % n applies to the complete result of f(x), you have the problem that x*x still overflows for relatively small x (with the standard signed 32-bit ints, for x >= 46341), so factoring larger n may fail due to overflow. At least, you should use unsigned long long for the computations, so that overflow is avoided for n < 2^32. However, factorising such small numbers is typically done more efficiently with trial division. Pollard's Rho method and other advanced factoring algorithms are meant for larger numbers, where trial division is no longer efficient or even feasible.

I'm just a novice at C++, and I am new to Stack Overflow, so some of what I have written is going to look sloppy, but this should get you going in the right direction. The program posted here should generally find and return one non-trivial factor of the number you enter at the prompt, or it will apologize if it cannot find such a factor.
I tested it with a few semiprime numbers, and it worked for me. For 371156167103, it finds 607619 without any detectable delay after I hit the enter key. I didn't check it with larger numbers than this. I used unsigned long long variables, but if possible, you should get and use a library that provides even larger integer types.
Editing to add, the single call to the method f for X and 2 such calls for Y is intentional and is in accordance with the way the algorithm works. I thought to nest the call for Y inside another such call to keep it on one line, but I decided to do it this way so it's easier to follow.
#include "stdafx.h"
#include <stdio.h>
#include <iostream>
typedef unsigned long long ULL;
ULL pollard(ULL numberToFactor);
ULL gcd(ULL differenceBetweenCongruentFunctions, ULL numberToFactor);
ULL f(ULL x, ULL numberToFactor);
int main(void)
{
ULL factor;
ULL n;
std::cout<<"Enter the number for which you want a prime factor: ";
std::cin>>n;
factor = pollard(n);
if (factor == 0) std::cout<<"No factor found. Your number may be prime, but it is not certain.\n\n";
else std::cout<<"One factor is: "<<factor<<"\n\n";
}
ULL pollard(ULL n)
{
ULL x = 2ULL;
ULL y = 2ULL;
ULL d = 1ULL;
while(d==1||d==n)
{
x = f(x,n);
y = f(y,n);
y = f(y,n);
if (y>x)
{
d = gcd(y-x, n);
}
else
{
d = gcd(x-y, n);
}
}
return d;
}
ULL gcd(ULL a, ULL b)
{
if (a==b||a==0)
return 0; // If x==y or if the absolute value of (x-y) == the number to be factored, then we have failed to find
// a factor. I think this is not proof of primality, so the process could be repeated with a new function.
// For example, by replacing x*x+1 with x*x+2, and so on. If many such functions fail, primality is likely.
ULL currentGCD = 1;
while (currentGCD!=0) // This while loop is based on Euclid's algorithm
{
currentGCD = b % a;
b=a;
a=currentGCD;
}
return b;
}
ULL f(ULL x, ULL n)
{
return (x * x + 1) % n;
}

Sorry for the long delay getting back to this. As I mentioned in my first answer, I am a novice at C++, which will be evident in my excessive use of global variables, excessive use of BigIntegers and BigUnsigned where other types might be better, lack of error checking, and other programming habits on display which a more skilled person might not exhibit. That being said, let me explain what I did, then will post the code.
I am doing this in a second answer because the first answer is useful as a very simple demo of how a Pollard's Rho algorithm is to implement once you understand what it does. And what it does is to first take 2 variables, call them x and y, assign them the starting values of 2. Then it runs x through a function, usually (x^2+1)%n, where n is the number you want to factor. And it runs y through the same function twice each cycle. Then the difference between x and y is calculated, and finally the greatest common divisor is found for this difference and n. If that number is 1, then you run x and y through the function again.
Continue this process until the GCD is not 1 or until x and y are equal again. If the GCD is found which is not 1, then that GCD is a non-trivial factor of n. If x and y become equal, then the (x^2+1)%n function has failed. In that case, you should try again with another function, maybe (x^2+2)%n, and so on.
Here is an example. Take 35, for which we know the prime factors are 5 and 7. I'll walk through Pollard Rho and show you how it finds a non-trivial factor.
Cycle #1: X starts at 2. Then using the function (x^2+1)%n, (2^2+1)%35, we get 5 for x. Y starts at 2 also, and after one run through the function, it also has a value of 5. But y always goes through the function twice, so the second run is (5^2+1)%35, or 26. The difference between x and y is 21. The GCD of 21 (the difference) and 35 (n) is 7. We have already found a prime factor of 35! Note that the GCD for any 2 numbers, even extremely large exponents, can be found very quickly by formula using Euclid's algorithm, and that's what the program I will post here does.
On the subject of the GCD function, I am using one library I downloaded for this program, a library that allows me to use BigIntegers and BigUnsigned. That library also has a GCD function built in, and I could have used it. But I decided to stay with the hand-written GCD function for instructional purposes. If you want to improve the program's execution time, it might be a good idea to use the library's GCD function because there are faster methods than Euclid, and the library may be written to use one of those faster methods.
Another side note. The .Net 4.5 library supports the use of BigIntegers and BigUnsigned also. I decided not to use that for this program because I wanted to write the whole thing in C++, not C++/CLI. You could get better performance from the .Net library, or you might not. I don't know, but I wanted to share that that is also an option.
I am jumping around a bit here, so let me start now by explaining in broad strokes what the program does, and lastly I will explain how to set it up on your computer if you use Visual Studio 11 (also called Visual Studio 2012).
The program allocates 3 arrays for storing the factors of any number you give it to process. These arrays are 1000 elements wide, which is excessive, maybe, but it ensures any number with 1000 prime factors or less will fit.
When you enter the number at the prompt, it assumes the number is composite and puts it in the first element of the compositeFactors array. Then it goes through some admittedly inefficient while loops, which use Miller-Rabin to check if the number is composite. Note this test can either say a number is composite with 100% confidence, or it can say the number is prime with extremely high (but not 100%) confidence. The confidence is adjustable by a variable confidenceFactor in the program. The program will make one check for every value between 2 and confidenceFactor, inclusive, so one less total check than the value of confidenceFactor itself.
The setting I have for confidenceFactor is 101, which does 100 checks. If it says a number is prime, the odds that it is really composite are 1 in 4^100, or the same as the odds of correctly calling the flip of a fair coin 200 consecutive times. In short, if it says the number is prime, it probably is, but the confidenceFactor number can be increased to get greater confidence at the cost of speed.
Here might be as good a place as any to mention that, while Pollard's Rho algorithm can be pretty effective factoring smaller numbers of type long long, the Miller-Rabin test to see if a number is composite would be more or less useless without the BigInteger and BigUnsigned types. A BigInteger library is pretty much a requirement to be able to reliably factor large numbers all the way to their prime factors like this.
When Miller Rabin says the factor is composite, it is factored, the factor stored in a temp array, and the original factor in the composites array divided by the same factor. When numbers are identified as likely prime, they are moved into the prime factors array and output to screen. This process continues until there are no composite factors left. The factors tend to be found in ascending order, but this is coincidental. The program makes no effort to list them in ascending order, but only lists them as they are found.
Note that I could not find any function (x^2+c)%n which will factor the number 4, no matter what value I gave c. Pollard Rho seems to have a very hard time with all perfect squares, but 4 is the only composite number I found which is totally impervious to it using functions in the format described. Therefore I added a check for an n of 4 inside the pollard method, returning 2 instantly if so.
So to set this program up, here is what you should do. Go to https://mattmccutchen.net/bigint/ and download bigint-2010.04.30.zip. Unzip this and put all of the .hh files and all of the C++ source files in your ~\Program Files\Microsoft Visual Studio 11.0\VC\include directory, excluding the Sample and C++ Testsuite source files. Then in Visual Studio, create an empty project. In the solution explorer, right click on the resource files folder and select Add...existing item. Add all of the C++ source files in the directory I just mentioned. Then also in solution expolorer, right click the Source Files folder and add a new item, select C++ file, name it, and paste the below source code into it, and it should work for you.
Not to flatter overly much, but there are folks here on Stack Overflow who know a great deal more about C++ than I do, and if they modify my code below to make it better, that's fantastic. But even if not, the code is functional as-is, and it should help illustrate the principles involved in programmatically finding prime factors of medium sized numbers. It will not threaten the general number field sieve, but it can factor numbers with 12 - 14 digit prime factors in a reasonably short time, even on an old Core2 Duo computer like the one I am using.
The code follows. Good luck.
#include <string>
#include <stdio.h>
#include <iostream>
#include "BigIntegerLibrary.hh"
typedef BigInteger BI;
typedef BigUnsigned BU;
using std::string;
using std::cin;
using std::cout;
BU pollard(BU numberToFactor);
BU gcda(BU differenceBetweenCongruentFunctions, BU numberToFactor);
BU f(BU x, BU numberToFactor, int increment);
void initializeArrays();
BU getNumberToFactor ();
void factorComposites();
bool testForComposite (BU num);
BU primeFactors[1000];
BU compositeFactors[1000];
BU tempFactors [1000];
int primeIndex;
int compositeIndex;
int tempIndex;
int numberOfCompositeFactors;
bool allJTestsShowComposite;
int main ()
{
while(1)
{
primeIndex=0;
compositeIndex=0;
tempIndex=0;
initializeArrays();
compositeFactors[0] = getNumberToFactor();
cout<<"\n\n";
if (compositeFactors[0] == 0) return 0;
numberOfCompositeFactors = 1;
factorComposites();
}
}
void initializeArrays()
{
for (int i = 0; i<1000;i++)
{
primeFactors[i] = 0;
compositeFactors[i]=0;
tempFactors[i]=0;
}
}
BU getNumberToFactor ()
{
std::string s;
std::cout<<"Enter the number for which you want a prime factor, or 0 to quit: ";
std::cin>>s;
return stringToBigUnsigned(s);
}
void factorComposites()
{
while (numberOfCompositeFactors!=0)
{
compositeIndex = 0;
tempIndex = 0;
// This while loop finds non-zero values in compositeFactors.
// If they are composite, it factors them and puts one factor in tempFactors,
// then divides the element in compositeFactors by the same amount.
// If the element is prime, it moves it into tempFactors (zeros the element in compositeFactors)
while (compositeIndex < 1000)
{
if(compositeFactors[compositeIndex] == 0)
{
compositeIndex++;
continue;
}
if(testForComposite(compositeFactors[compositeIndex]) == false)
{
tempFactors[tempIndex] = compositeFactors[compositeIndex];
compositeFactors[compositeIndex] = 0;
tempIndex++;
compositeIndex++;
}
else
{
tempFactors[tempIndex] = pollard (compositeFactors[compositeIndex]);
compositeFactors[compositeIndex] /= tempFactors[tempIndex];
tempIndex++;
compositeIndex++;
}
}
compositeIndex = 0;
// This while loop moves all remaining non-zero values from compositeFactors into tempFactors
// When it is done, compositeFactors should be all 0 value elements
while (compositeIndex < 1000)
{
if (compositeFactors[compositeIndex] != 0)
{
tempFactors[tempIndex] = compositeFactors[compositeIndex];
compositeFactors[compositeIndex] = 0;
tempIndex++;
compositeIndex++;
}
else compositeIndex++;
}
compositeIndex = 0;
tempIndex = 0;
// This while loop checks all non-zero elements in tempIndex.
// Those that are prime are shown on screen and moved to primeFactors
// Those that are composite are moved to compositeFactors
// When this is done, all elements in tempFactors should be 0
while (tempIndex<1000)
{
if(tempFactors[tempIndex] == 0)
{
tempIndex++;
continue;
}
if(testForComposite(tempFactors[tempIndex]) == false)
{
primeFactors[primeIndex] = tempFactors[tempIndex];
cout<<primeFactors[primeIndex]<<"\n";
tempFactors[tempIndex]=0;
primeIndex++;
tempIndex++;
}
else
{
compositeFactors[compositeIndex] = tempFactors[tempIndex];
tempFactors[tempIndex]=0;
compositeIndex++;
tempIndex++;
}
}
compositeIndex=0;
numberOfCompositeFactors=0;
// This while loop just checks to be sure there are still one or more composite factors.
// As long as there are, the outer while loop will repeat
while(compositeIndex<1000)
{
if(compositeFactors[compositeIndex]!=0) numberOfCompositeFactors++;
compositeIndex ++;
}
}
return;
}
// The following method uses the Miller-Rabin primality test to prove with 100% confidence a given number is composite,
// or to establish with a high level of confidence -- but not 100% -- that it is prime
bool testForComposite (BU num)
{
BU confidenceFactor = 101;
if (confidenceFactor >= num) confidenceFactor = num-1;
BU a,d,s, nMinusOne;
nMinusOne=num-1;
d=nMinusOne;
s=0;
while(modexp(d,1,2)==0)
{
d /= 2;
s++;
}
allJTestsShowComposite = true; // assume composite here until we can prove otherwise
for (BI i = 2 ; i<=confidenceFactor;i++)
{
if (modexp(i,d,num) == 1)
continue; // if this modulus is 1, then we cannot prove that num is composite with this value of i, so continue
if (modexp(i,d,num) == nMinusOne)
{
allJTestsShowComposite = false;
continue;
}
BU exponent(1);
for (BU j(0); j.toInt()<=s.toInt()-1;j++)
{
exponent *= 2;
if (modexp(i,exponent*d,num) == nMinusOne)
{
// if the modulus is not right for even a single j, then break and increment i.
allJTestsShowComposite = false;
continue;
}
}
if (allJTestsShowComposite == true) return true; // proven composite with 100% certainty, no need to continue testing
}
return false;
/* not proven composite in any test, so assume prime with a possibility of error =
(1/4)^(number of different values of i tested). This will be equal to the value of the
confidenceFactor variable, and the "witnesses" to the primality of the number being tested will be all integers from
2 through the value of confidenceFactor.
Note that this makes this primality test cryptographically less secure than it could be. It is theoretically possible,
if difficult, for a malicious party to pass a known composite number for which all of the lowest n integers fail to
detect that it is composite. A safer way is to generate random integers in the outer "for" loop and use those in place of
the variable i. Better still if those random numbers are checked to ensure no duplicates are generated.
*/
}
BU pollard(BU n)
{
if (n == 4) return 2;
BU x = 2;
BU y = 2;
BU d = 1;
int increment = 1;
while(d==1||d==n||d==0)
{
x = f(x,n, increment);
y = f(y,n, increment);
y = f(y,n, increment);
if (y>x)
{
d = gcda(y-x, n);
}
else
{
d = gcda(x-y, n);
}
if (d==0)
{
x = 2;
y = 2;
d = 1;
increment++; // This changes the pseudorandom function we use to increment x and y
}
}
return d;
}
BU gcda(BU a, BU b)
{
if (a==b||a==0)
return 0; // If x==y or if the absolute value of (x-y) == the number to be factored, then we have failed to find
// a factor. I think this is not proof of primality, so the process could be repeated with a new function.
// For example, by replacing x*x+1 with x*x+2, and so on. If many such functions fail, primality is likely.
BU currentGCD = 1;
while (currentGCD!=0) // This while loop is based on Euclid's algorithm
{
currentGCD = b % a;
b=a;
a=currentGCD;
}
return b;
}
BU f(BU x, BU n, int increment)
{
return (x * x + increment) % n;
}

As far as I can see, Pollard Rho normally uses f(x) as (x*x+1) (e.g. in these lecture notes ).
Your choice of x*x-1 appears not as good as it often seems to get stuck in a loop:
x=0
f(x)=-1
f(f(x))=0

storing known sequences in c

I'm working on Project Euler #14 in C and have figured out the basic algorithm; however, it runs insufferably slow for large numbers, e.g. 2,000,000 as wanted; I presume because it has to generate the sequence over and over again, even though there should be a way to store known sequences (e.g., once we get to a 16, we know from previous experience that the next numbers are 8, 4, 2, then 1).
I'm not exactly sure how to do this with C's fixed-length array, but there must be a good way (that's amazingly efficient, I'm sure). Thanks in advance.
Here's what I currently have, if it helps.
#include <stdio.h>
#define UPTO 2000000
int collatzlen(int n);
int main(){
int i, l=-1, li=-1, c=0;
for(i=1; i<=UPTO; i++){
if( (c=collatzlen(i)) > l) l=c, li=i;
}
printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
return 1;
}
/* n != 0 */
int collatzlen(int n){
int len = 0;
while(n>1) n = (n%2==0 ? n/2 : 3*n+1), len+=1;
return len;
}

Your original program needs 3.5 seconds on my machine. Is it insufferably slow for you?
My dirty and ugly version needs 0.3 seconds. It uses a global array to store the values already calculated. And use them in future calculations.
int collatzlen2(unsigned long n);
static unsigned long array[2000000 + 1];//to store those already calculated
int main()
{
int i, l=-1, li=-1, c=0;
int x;
for(x = 0; x < 2000000 + 1; x++) {
array[x] = -1;//use -1 to denote not-calculated yet
}
for(i=1; i<=UPTO; i++){
if( (c=collatzlen2(i)) > l) l=c, li=i;
}
printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
return 1;
}
int collatzlen2(unsigned long n){
unsigned long len = 0;
unsigned long m = n;
while(n > 1){
if(n > 2000000 || array[n] == -1){ // outside range or not-calculated yet
n = (n%2 == 0 ? n/2 : 3*n+1);
len+=1;
}
else{ // if already calculated, use the value
len += array[n];
n = 1; // to get out of the while-loop
}
}
array[m] = len;
return len;
}

Given that this is essentially a throw-away program (i.e. once you've run it and got the answer, you're not going to be supporting it for years :), I would suggest having a global variable to hold the lengths of sequences already calculated:
int lengthfrom[UPTO] = {};
If your maximum size is a few million, then we're talking megabytes of memory, which should easily fit in RAM at once.
The above will initialise the array to zeros at startup. In your program - for each iteration, check whether the array contains zero. If it does - you'll have to keep going with the computation. If not - then you know that carrying on would go on for that many more iterations, so just add that to the number you've done so far and you're done. And then store the new result in the array, of course.
Don't be tempted to use a local variable for an array of this size: that will try to allocate it on the stack, which won't be big enough and will likely crash.
Also - remember that with this sequence the values go up as well as down, so you'll need to cope with that in your program (probably by having the array longer than UPTO values, and using an assert() to guard against indices greater than the size of the array).

If I recall correctly, your problem isn't a slow algorithm: the algorithm you have now is fast enough for what PE asks you to do. The problem is overflow: you sometimes end up multiplying your number by 3 so many times that it will eventually exceed the maximum value that can be stored in a signed int. Use unsigned ints, and if that still doesn't work (but I'm pretty sure it does), use 64 bit ints (long long).
This should run very fast, but if you want to do it even faster, the other answers already addressed that.