How to calculate this factorial in C

#include <stdio.h>

int main(){
    int n, v;
    printf("Please enter a value from 39 to 59: \n");
    scanf("%d", &n);
    printf("Please enter a value from 3 to 7: \n");
    scanf("%d", &v);
}
Once I have those values from the user, how can I perform this factorial calculation:
n! / ((n-v)! * v!)
I've tried different data types but apparently none can hold the result.
For example, with n = 49 and v = 6, the result is 13,983,816, but how do I go about computing it?

Your best bet is to ditch the naive factorial implementations, usually based on recursion, and switch to one that returns the natural log of the gamma function.
The gamma function is related to factorial: gamma(n) = (n-1)!
Best of all is the natural log of gamma, because you can rewrite that expression like this:
ln(n!/(n-v)!v!) = ln(n!) - ln((n-v)!) - ln(v!)
But
(n-v)! = gamma(n-v+1)
n! = gamma(n+1)
v! = gamma(v+1)
So
ln(n!/(n-v)!v!) = lngamma(n+1) - lngamma(n-v+1) - lngamma(v+1)
You can find an implementation of lngamma in Numerical Recipes.
lngamma returns a double, so it'll fit even for larger values.
It should go without saying that you'll take exp() of both sides to get the original expression you want back.
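For instance, here is a minimal sketch of that approach in C. It assumes a C99 <math.h> with lgamma() and llround() available (rather than the Numerical Recipes version); choose is my name for the function, and the llround() call cleans up the small floating-point residue left by exp().

#include <math.h>
#include <stdio.h>

/* n choose v via log-gamma: exact for results that fit comfortably in a double */
long long choose(long long n, long long v) {
    double ln_result = lgamma(n + 1.0) - lgamma(n - v + 1.0) - lgamma(v + 1.0);
    return llround(exp(ln_result));   /* take exp() of both sides, then round */
}

int main(void) {
    printf("%lld\n", choose(49, 6));  /* prints 13983816 */
    return 0;
}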

@duffymo's idea looked like too much fun to ignore: use lgamma() from <math.h>.
Results past maybe x = 1e15 start to lose the trailing significant digits, but it's still fun to be able to get 1000000.0!.
#include <stdio.h>
#include <math.h>

/* Split x! into a significand and a base-10 exponent via lgamma(). */
void factorial_expo(double x, double *significand, double *expo) {
    double y = lgamma(x+1);
    static const double ln10 = 2.3025850929940456840179914546844;
    y /= ln10;                        /* convert ln(x!) to log10(x!) */
    double ipart;
    double fpart = modf(y, &ipart);   /* split into integer and fractional parts */
    if (significand) *significand = pow(10.0, fpart);
    if (expo) *expo = ipart;
}

void facttest(double x) {
    printf("%.1f! = ", x);
    double significand, expo;
    factorial_expo(x, &significand, &expo);
    int digits = expo > 15 ? 15 : expo;  /* don't print more digits than are meaningful */
    if (digits < 1) digits++;
    printf("%.*fe%.0f\n", digits, significand, expo);
}

int main(void) {
    facttest(0.0);
    facttest(1.0);
    facttest(2.0);
    facttest(6.0);
    facttest(10.0);
    facttest(69.0);
    facttest(1000000.0);
    return 0;
}
int main(void) {
facttest(0.0);
facttest(1.0);
facttest(2.0);
facttest(6.0);
facttest(10.0);
facttest(69.0);
facttest(1000000.0);
return 0;
}
0.0! = 1.0e0
1.0! = 1.0e0
2.0! = 2.0e0
6.0! = 7.20e2
10.0! = 3.628800e6
69.0! = 1.711224524281441e98
1000000.0! = 8.263931668544735e5565708

In a comment, you've finally said that you don't need exact results.
Just use floating-point. The largest intermediate result you'll need to handle is 59!, which is about 1.3868e80; type double is more than big enough to hold that value.
Write a function like:
double factorial(int n);
(I presume you know how to implement it) and use that.
If you're going to be doing a lot of these calculations, you might want to cache the results by storing them in an array. If you define an array like:
double fact[60];
then you can store the value of N! in fact[N] for N from 0 to 59 -- and you can fill the entire array in about the time it would take to compute 59! just once. Otherwise, you'll be doing several dozen floating-point multiplications and divisions on each calculation -- which is trivial if you do it once, but could be significant if you do it, say, thousands or millions of times.
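For example, a sketch of that table-filling approach (fill_factorials and binomial are my names, not from the answer):

double fact[60];

/* fill fact[0..59] with N! once, using 59 multiplications in total */
void fill_factorials(void) {
    fact[0] = 1.0;
    for (int n = 1; n < 60; n++)
        fact[n] = fact[n - 1] * n;
}

/* then each n! / ((n-v)! * v!) is just one multiplication and one division */
double binomial(int n, int v) {
    return fact[n] / (fact[n - v] * fact[v]);
}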
If you needed exact results, you could use an extended integer library like GNU MP, as others have suggested. Or you could use a language (like Python, for example) that has built-in support for arbitrary-length integers.
Or you could probably perform the multiplications and divisions in an order that avoids overflow; I don't know exactly how to do that, but since n! / ((n-v)! * v!) is a common formula I strongly suspect that work has already been done.

You can't work with numbers as long as 59! in a simple way.
However, you can use special C libraries that work with long numbers bigger than 8 bytes, for example GMP.


Exceeding the range of long double and big floating point numbers

Problem statement: I am working on code that calculates with big numbers. Hence, I easily get beyond the maximum range of long double. Here is an example below, where part of the code that generates the big numbers is given:
int n;
long double summ;
long double a[1761], b[1761], c[1761]; // a, b, c are 1D arrays of long double type
a[1] = 1;
b[1] = 1;
c[1] = 1;
summ = 1 + c[1];
for(n=2; n <=1760; n++){
    a[n] = n*n;
    b[n] = n;
    c[n] = c[n-1]*a[n-1]/b[n]; // Let us assume we have this kind of operation
    summ = summ + c[n];        // So basically, summ = 1+c[1]+c[2]+c[3]+...+c[1760]
}
The intermediates values of summ and c[n] are then used to evaluate the ratio c[n]/summ for every integer n. Then, just after the above loop, I do:
for(n=1; n<=1760; n++){
    c2[n] = c[n]/summ; // summ is thus here equal to 1+c[1]+c[2]+c[3]+...+c[1760]
}
Output: If we print n, c[n] and summ, we obtain inf after n=1755 because we exceed the range of long double:
n c[n] summ
1752 2.097121e+4917 2.098320e+4917
1753 3.672061e+4920 3.674159e+4920
1754 6.433452e+4923 6.437126e+4923
1755 1.127785e+4927 1.128428e+4927
1756 inf inf
1757 inf inf
1758 inf inf
1759 inf inf
1760 inf inf
Of course, if there is an overflow for c[n] and summ, I cannot evaluate the quantity of interest, which is c2[n].
Questions: Does anyone see a solution for this? How do I need to change the code to keep the numerical values finite (for arbitrary n)?
I will indeed most likely need to go to very big numbers (n can be much larger than 1760).
Proposition: I know that GNU Multiple Precision Arithmetic (GMP) might be useful, but I honestly found too many difficulties trying to use it (coming from outside the field), so if there is an easier way to solve this, I would be glad to read it. Otherwise, I will be forever grateful if someone could apply GMP or any other method to solve the above-mentioned problem.
NOTE: This does not do exactly what the OP wants. I'll leave this answer here in case someone has a similar problem.
As long as your final result and all initial values are not out of range, you can very often re-arrange your terms to avoid any overflow. In your case if you actually just want to know c2[n] = c[n]/sum[n] you can re-write this as follows:
c2[n] = c[n]/sum[n]
= c[n]/(sum[n-1] + c[n]) // def. of sum[n]
= 1.0/(sum[n-1]/c[n] + 1.0)
= 1.0/(sum[n-1]/(c[n-1] * a[n-1] / b[n]) + 1.0) // def. of c[n]
= 1.0/(sum[n-1]/c[n-1] * b[n] / a[n-1] + 1.0)
= a[n-1]/(1/c2[n-1] * b[n] + a[n-1]) // def. of c2[n-1]
= (a[n-1]*c2[n-1]) / (b[n] + a[n-1]*c2[n-1])
Now in the final expression neither argument grows out of range, and in fact c2 slowly converges towards 1. If the values in your question are the actual values of a[n] and b[n] you may even find a closed form expression for c2[n] (I did not check it).
To check that the re-arrangement works, you can compare it with your original formula (godbolt-link, only printing the last values): https://godbolt.org/z/oW8KsdKK6
Btw: Unless you later need all values of c2 again, there is actually no need to store any intermediate value inside an array.
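For illustration, here is a minimal C sketch of the re-arranged recurrence, assuming the a[n] = n*n and b[n] = n values from the question. No intermediate quantity can overflow because c2 stays between 0 and 1 throughout.

#include <stdio.h>

int main(void) {
    double c2 = 0.5;   /* c2[1] = c[1]/sum[1] = 1/(1 + c[1]) = 0.5 */
    for (int n = 2; n <= 1760; n++) {
        double a_prev = (double)(n - 1) * (n - 1);  /* a[n-1] */
        double b_n = n;                             /* b[n]   */
        c2 = (a_prev * c2) / (b_n + a_prev * c2);
    }
    printf("c2[1760] = %.17g\n", c2);
    return 0;
}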
I ain't no mathematician. This is what I wrote, with the results below. Looks to me that the exponent, at least, is keeping up with your long double results using only my feeble double...
#include <stdio.h>
#include <math.h>

int main() {
    int n;
    double la[1800], lb[1800], lc[1800];
    la[1] = lb[1] = lc[1] = 0.0; // log10 of a[1] = b[1] = c[1] = 1
    for( n = 2; n <= 1760; n++ ) {
        lb[n] = log10(n);                  // log10 of b[n] = n
        la[n] = lb[n] + lb[n];             // log10 of a[n] = n*n
        lc[n] = lc[n-1] + la[n-1] - lb[n]; // log10 of c[n] = c[n-1]*a[n-1]/b[n]
        printf( "%4d: %.16lf\n", n, lc[n] );
    }
    return 0;
}
/* omitted for brevity */
1750: 4910.8357954121602000
1751: 4914.0785853634488000
1752: 4917.3216235537839000
1753: 4920.5649098413542000
1754: 4923.8084440845114000
1755: 4927.0522261417700000 <<=== Take note, please.
1756: 4930.2962558718036000
1757: 4933.5405331334487000
1758: 4936.7850577857016000
1759: 4940.0298296877190000
1760: 4943.2748486988194000
EDIT (Butterfly edition)
Below is a pretty simple iterative function involving one single-precision and one double-precision float value. The purpose is to demonstrate that iterative calculations are exceedingly sensitive to initial conditions. While it seems obvious that the extra bits of the double will "hold on", remaining closer to the results one would get with infinite precision, the compounding discrepancy between these two versions demonstrates that "demons lurking in small places" will likely remain hidden in the fantastically tiny gaps between finite representations of what is infinite.
Just a bit of fun for a rainy day.
#include <stdio.h>
#include <math.h>

int main() {
    float fpi = 3.1415926535897932384626433832f;
    double dpi = 3.1415926535897932384626433832;
    double thresh = 10e-8;
    for( int i = 0; i < 1000; i++ ) {
        fpi = fpi * 1.03f; // both use the same float constant;
        dpi = dpi * 1.03f; // only the running values differ in precision
        double diff = fabs( dpi - fpi );
        if( diff > thresh ) { // report each time the gap grows by another decade
            printf( "%3d: %25.16lf\n", i, diff );
            thresh *= 10.0;
        }
    }
    return 0;
}
8: 0.0000001229991486
35: 0.0000010704333473
90: 0.0000100210180918
192: 0.0001092634900033
229: 0.0010121794607585
312: 0.0100316228017618
367: 0.1002719746902585
453: 1.0056506423279643
520: 10.2658853083848950
609: 103.8011477291584000
667: 1073.9984381198883000
736: 10288.9632129669190000
807: 101081.5514678955100000
886: 1001512.2135009766000000
966: 10473883.3271484370000000

Inexplicable computational error

I am writing a program that reads wavelength and intensity data from separate signal and background files (each file is comprised of a number of pairs of wavelength and intensity). As you can see, I do this by creating a structure, and then assigning the values to the proper elements in the structure using fscanf in a loop.

Once the data is read in, the program is supposed to plot it on the interval where the recorded wavelengths in each file overlap, that is, the common range of wavelengths. The wavelengths align perfectly where this overlap exists and are known to be spaced at a constant difference.

Thus, my way of discerning which elements of the structure array were applicable was to determine which of the two files' minimum wavelength was higher, and maximum wavelength was lower. Then, for the file that had the lower minimum and higher maximum, I would find the difference between this and the higher minimum/lower maximum, and then divide it by the constant step to determine how many elements to offset. This works, except when the math is done, the program returns a wrong answer that is completely inexplicable.
In the code below, I define the constant step as lambdastep by calculating the difference between wavelengths of one element and the element before it. With my sample data, it is .002, which is confirmed by printf. However, when I run the program and divide by lambdastep, I get an incorrect answer. When I run the program dividing by .002, I get the correct answer. Why is this the case? There is no explanation I can think of.
#include<stdio.h>
#include<math.h>
#include<stdlib.h>
#include "plots.h"

struct spectrum{
    double lambda;
    double intensity;
};

int main(void){
    double a=0,b=0,c=0,d=0,lambdastep,smin,smax,bmin,bmax,tmin,tmax,sintmin,bintmin,tintmin,sintmax,bintmax,tintmax,ymin,ymax;
    int ns,nb,nt,i=0,sminel,smaxel,bminel,bmaxel,tminel,tmaxel;
    double min(struct spectrum *a,int,int);
    double max(struct spectrum *a,int,int);
    FILE *Input;
    Input = fopen("sig.dat","r");
    FILE *InputII;
    InputII = fopen("bck.dat","r");
    fscanf(Input,"%d",&ns);
    fscanf(InputII,"%d",&nb);
    struct spectrum signal[ns];
    struct spectrum background[nb];
    struct spectrum *s = &signal[0];
    struct spectrum *ba = &background[0];
    s = malloc(ns*sizeof(struct spectrum));
    ba = malloc(nb*sizeof(struct spectrum));
    while( fscanf(Input,"%lf%lf",&a,&b) != EOF){
        signal[i].lambda = a;
        signal[i].intensity = b;
        i++;
    }
    i = 0;
    while( fscanf(InputII,"%lf%lf",&c,&d) != EOF){
        background[i].lambda = c;
        background[i].intensity = d;
        i++;
    }
    for (i=0; i < ns ;i++){
        printf("%.7lf %.7lf\n", signal[i].lambda,signal[i].intensity);
    }
    printf("\n");
    for (i=0; i < nb ;i++){
        printf("%.7lf %.7lf\n", background[i].lambda,background[i].intensity);
    }
    lambdastep = signal[1].lambda - signal[0].lambda; // this is where I define lambdastep as the interval between two measurements
    smin = signal[0].lambda;
    smax = signal[ns-1].lambda;
    bmin = background[0].lambda;
    bmax = background[nb-1].lambda;
    if (smin > bmin)
        tmin = smin;
    else
        tmin = bmin;
    if (smax > bmax)
        tmax = bmax;
    else
        tmax = smax;
    printf("%lf %lf %lf %lf %lf %lf %lf\n",lambdastep,smin,smax,bmin,bmax,tmin,tmax); // here is where I confirm that it is .002, which is the expected value
    sminel = (tmin-smin)/(lambdastep); // sminel should be 27, but it returns 26 when lambdastep is used. It works right when .002 is entered directly, but not with lambdastep, even though I already confirmed they are exactly the same. Why?
sminel is an integer, so (tmin-smin)/lambdastep will be converted (truncated) to an integer when the calculation concludes.
A very slight difference in lambdastep could be the difference between getting e.g. 27.00001 and 26.99999; the latter truncates down to 26 when cast to an int.
Try using floor, ceil, or round to get better control over the rounding of the returned value.
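Something along these lines (a sketch with made-up sample values; whether the raw division lands just above or just below 27 depends on the rounding of your actual data):

#include <math.h>
#include <stdio.h>

int main(void) {
    double smin = 350.0, tmin = 350.054, lambdastep = 0.002;
    int truncated = (tmin - smin) / lambdastep;              /* may yield 26 */
    int rounded   = (int)lround((tmin - smin) / lambdastep); /* yields 27    */
    printf("truncated: %d, rounded: %d\n", truncated, rounded);
    return 0;
}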
It almost certainly has to do with the inherent imprecision of floating-point calculations. Try printing out lambdastep to many significant digits -- I bet you'll find that its exact value is slightly larger than you think it is.
With my sample data, it is .002, which is confirmed by printf.
Try printing out (lambdastep == .002).

Not getting proper output from Pollard's rho algorithm implementation

I don't know where I am going wrong in trying to calculate prime factorizations using Pollard's rho algorithm.
#include<stdio.h>
#include<stdlib.h> /* for rand() */

#define f(x) x*x-1

int pollard( int );
int gcd( int, int);

int main( void ) {
    int n;
    scanf( "%d",&n );
    pollard( n );
    return 0;
}

int pollard( int n ) {
    int i=1,x,y,k=2,d;
    x = rand()%n;
    y = x;
    while(1) {
        i++;
        x = f( x ) % n;
        d = gcd( y-x, n);
        if(d!=1 && d!=n)
            printf( "%d\n", d);
        if(i == k) {
            y = x;
            k = 2 * k;
        }
    }
}

int gcd( int a, int b ) {
    if( b == 0)
        return a;
    else
        return gcd( b, a % b);
}
One immediate problem is, as Peter de Rivaz suspected, the
#define f(x) x*x-1
Thus the line
x = f(x)%n;
becomes
x = x*x-1%n;
and the precedence of % is higher than that of -, hence the expression is implicitly parenthesised as
x = (x*x) - (1%n);
which is equivalent to x = x*x - 1; (I assume n > 1, anyway it's x = x*x - constant;) and if you start with a value x >= 2, you have overflow before you had a realistic chance of finding a factor:
2 -> 2*2-1 = 3 -> 3*3 - 1 = 8 -> 8*8 - 1 = 63 -> 3968 -> 15745023 -> overflow if int is 32 bits
That doesn't immediately make it impossible that gcd(y-x,n) is a factor, though. It just makes it likely that at a stage where theoretically, you would have found a factor, the overflow destroys the common factor that mathematically would exist - more likely than a common factor introduced by overflow.
Overflow of signed integers is undefined behaviour, so there are no guarantees how the programme behaves, but usually it behaves consistently so the iteration of f still produces a well-defined sequence for which the algorithm in principle works.
Another problem is that y-x will frequently be negative, and then the computed gcd can also be negative - often -1. In that case, you print -1.
And then, it is a not too rare occurrence that iterating f from a starting value doesn't detect a common factor because the cycles modulo both prime factors (for the example of n a product of two distinct primes) have equal length and are entered at the same time. You make no attempt at detecting such a case; whenever gcd(|y-x|, n) == n, any further work in that sequence is pointless, so you should break out of the loop when d == n.
Also, you never check whether n is a prime, in which case trying to find a factor is a futile undertaking from the start.
Furthermore, after fixing f(x) so that the % n applies to the complete result of f(x), you have the problem that x*x still overflows for relatively small x (with the standard signed 32-bit ints, for x >= 46341), so factoring larger n may fail due to overflow. At least, you should use unsigned long long for the computations, so that overflow is avoided for n < 2^32. However, factorising such small numbers is typically done more efficiently with trial division. Pollard's Rho method and other advanced factoring algorithms are meant for larger numbers, where trial division is no longer efficient or even feasible.
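Folding those fixes into the original structure gives something like the following sketch: unsigned long long arithmetic (safe from overflow for n < 2^32), the absolute difference fed to gcd, and bailing out when d == n. This is my reworking, not a drop-in from the question.

#include <stdio.h>
#include <stdlib.h>

typedef unsigned long long ULL;

ULL gcd(ULL a, ULL b) { return b == 0 ? a : gcd(b, a % b); }

/* (x*x - 1) % n, written as x*x + (n-1) so the unsigned value never wraps below zero */
ULL f(ULL x, ULL n) { return (x * x + n - 1) % n; }

int pollard(ULL n) {
    ULL x = rand() % n, y = x;
    int i = 1, k = 2;
    while (1) {
        i++;
        x = f(x, n);
        ULL d = gcd(y > x ? y - x : x - y, n); /* use |y - x| */
        if (d != 1 && d != n) { printf("%llu\n", d); return 0; }
        if (d == n) return 1;  /* cycle closed without a factor: retry with another f */
        if (i == k) { y = x; k *= 2; }
    }
}

int main(void) {
    ULL n;
    if (scanf("%llu", &n) != 1 || n < 2) return 1;
    return pollard(n);
}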
I'm just a novice at C++, and I am new to Stack Overflow, so some of what I have written is going to look sloppy, but this should get you going in the right direction. The program posted here should generally find and return one non-trivial factor of the number you enter at the prompt, or it will apologize if it cannot find such a factor.
I tested it with a few semiprime numbers, and it worked for me. For 371156167103, it finds 607619 without any detectable delay after I hit the enter key. I didn't check it with larger numbers than this. I used unsigned long long variables, but if possible, you should get and use a library that provides even larger integer types.
Editing to add: the single call to the method f for X and the two such calls for Y are intentional and in accordance with the way the algorithm works. I thought to nest the call for Y inside another such call to keep it on one line, but I decided to do it this way so it's easier to follow.
#include "stdafx.h"
#include <stdio.h>
#include <iostream>
typedef unsigned long long ULL;
ULL pollard(ULL numberToFactor);
ULL gcd(ULL differenceBetweenCongruentFunctions, ULL numberToFactor);
ULL f(ULL x, ULL numberToFactor);
int main(void)
{
ULL factor;
ULL n;
std::cout<<"Enter the number for which you want a prime factor: ";
std::cin>>n;
factor = pollard(n);
if (factor == 0) std::cout<<"No factor found. Your number may be prime, but it is not certain.\n\n";
else std::cout<<"One factor is: "<<factor<<"\n\n";
}
ULL pollard(ULL n)
{
ULL x = 2ULL;
ULL y = 2ULL;
ULL d = 1ULL;
while(d==1||d==n)
{
x = f(x,n);
y = f(y,n);
y = f(y,n);
if (y>x)
{
d = gcd(y-x, n);
}
else
{
d = gcd(x-y, n);
}
}
return d;
}
ULL gcd(ULL a, ULL b)
{
if (a==b||a==0)
return 0; // If x==y or if the absolute value of (x-y) == the number to be factored, then we have failed to find
// a factor. I think this is not proof of primality, so the process could be repeated with a new function.
// For example, by replacing x*x+1 with x*x+2, and so on. If many such functions fail, primality is likely.
ULL currentGCD = 1;
while (currentGCD!=0) // This while loop is based on Euclid's algorithm
{
currentGCD = b % a;
b=a;
a=currentGCD;
}
return b;
}
ULL f(ULL x, ULL n)
{
return (x * x + 1) % n;
}
Sorry for the long delay getting back to this. As I mentioned in my first answer, I am a novice at C++, which will be evident in my excessive use of global variables, excessive use of BigIntegers and BigUnsigned where other types might be better, lack of error checking, and other programming habits on display which a more skilled person might not exhibit. That being said, let me explain what I did, then will post the code.
I am doing this in a second answer because the first answer is useful as a very simple demo of how simple a Pollard's Rho algorithm is to implement once you understand what it does. And what it does is to first take 2 variables, call them x and y, and assign them the starting values of 2. Then it runs x through a function, usually (x^2+1)%n, where n is the number you want to factor. And it runs y through the same function twice each cycle. Then the difference between x and y is calculated, and finally the greatest common divisor is found for this difference and n. If that number is 1, then you run x and y through the function again.
Continue this process until the GCD is not 1 or until x and y are equal again. If the GCD is found which is not 1, then that GCD is a non-trivial factor of n. If x and y become equal, then the (x^2+1)%n function has failed. In that case, you should try again with another function, maybe (x^2+2)%n, and so on.
Here is an example. Take 35, for which we know the prime factors are 5 and 7. I'll walk through Pollard Rho and show you how it finds a non-trivial factor.
Cycle #1: X starts at 2. Then using the function (x^2+1)%n, (2^2+1)%35, we get 5 for x. Y starts at 2 also, and after one run through the function, it also has a value of 5. But y always goes through the function twice, so the second run is (5^2+1)%35, or 26. The difference between x and y is 21. The GCD of 21 (the difference) and 35 (n) is 7. We have already found a prime factor of 35! Note that the GCD for any 2 numbers, even extremely large exponents, can be found very quickly by formula using Euclid's algorithm, and that's what the program I will post here does.
On the subject of the GCD function, I am using one library I downloaded for this program, a library that allows me to use BigIntegers and BigUnsigned. That library also has a GCD function built in, and I could have used it. But I decided to stay with the hand-written GCD function for instructional purposes. If you want to improve the program's execution time, it might be a good idea to use the library's GCD function because there are faster methods than Euclid, and the library may be written to use one of those faster methods.
Another side note. The .Net 4.5 library supports the use of BigIntegers and BigUnsigned also. I decided not to use that for this program because I wanted to write the whole thing in C++, not C++/CLI. You could get better performance from the .Net library, or you might not. I don't know, but I wanted to share that that is also an option.
I am jumping around a bit here, so let me start now by explaining in broad strokes what the program does, and lastly I will explain how to set it up on your computer if you use Visual Studio 11 (also called Visual Studio 2012).
The program allocates 3 arrays for storing the factors of any number you give it to process. These arrays are 1000 elements wide, which is excessive, maybe, but it ensures any number with 1000 prime factors or less will fit.
When you enter the number at the prompt, it assumes the number is composite and puts it in the first element of the compositeFactors array. Then it goes through some admittedly inefficient while loops, which use Miller-Rabin to check if the number is composite. Note this test can either say a number is composite with 100% confidence, or it can say the number is prime with extremely high (but not 100%) confidence. The confidence is adjustable by a variable confidenceFactor in the program. The program will make one check for every value between 2 and confidenceFactor, inclusive, so one less total check than the value of confidenceFactor itself.
The setting I have for confidenceFactor is 101, which does 100 checks. If it says a number is prime, the odds that it is really composite are 1 in 4^100, or the same as the odds of correctly calling the flip of a fair coin 200 consecutive times. In short, if it says the number is prime, it probably is, but the confidenceFactor number can be increased to get greater confidence at the cost of speed.
Here might be as good a place as any to mention that, while Pollard's Rho algorithm can be pretty effective factoring smaller numbers of type long long, the Miller-Rabin test to see if a number is composite would be more or less useless without the BigInteger and BigUnsigned types. A BigInteger library is pretty much a requirement to be able to reliably factor large numbers all the way to their prime factors like this.
When Miller Rabin says the factor is composite, it is factored, the factor stored in a temp array, and the original factor in the composites array divided by the same factor. When numbers are identified as likely prime, they are moved into the prime factors array and output to screen. This process continues until there are no composite factors left. The factors tend to be found in ascending order, but this is coincidental. The program makes no effort to list them in ascending order, but only lists them as they are found.
Note that I could not find any function (x^2+c)%n which will factor the number 4, no matter what value I gave c. Pollard Rho seems to have a very hard time with all perfect squares, but 4 is the only composite number I found which is totally impervious to it using functions in the format described. Therefore I added a check for an n of 4 inside the pollard method, returning 2 instantly if so.
So to set this program up, here is what you should do. Go to https://mattmccutchen.net/bigint/ and download bigint-2010.04.30.zip. Unzip this and put all of the .hh files and all of the C++ source files in your ~\Program Files\Microsoft Visual Studio 11.0\VC\include directory, excluding the Sample and C++ Testsuite source files. Then in Visual Studio, create an empty project. In the solution explorer, right click on the resource files folder and select Add...existing item. Add all of the C++ source files in the directory I just mentioned. Then also in solution explorer, right click the Source Files folder and add a new item, select C++ file, name it, and paste the below source code into it, and it should work for you.
Not to flatter overly much, but there are folks here on Stack Overflow who know a great deal more about C++ than I do, and if they modify my code below to make it better, that's fantastic. But even if not, the code is functional as-is, and it should help illustrate the principles involved in programmatically finding prime factors of medium sized numbers. It will not threaten the general number field sieve, but it can factor numbers with 12 - 14 digit prime factors in a reasonably short time, even on an old Core2 Duo computer like the one I am using.
The code follows. Good luck.
#include <string>
#include <stdio.h>
#include <iostream>
#include "BigIntegerLibrary.hh"

typedef BigInteger BI;
typedef BigUnsigned BU;

using std::string;
using std::cin;
using std::cout;

BU pollard(BU numberToFactor);
BU gcda(BU differenceBetweenCongruentFunctions, BU numberToFactor);
BU f(BU x, BU numberToFactor, int increment);
void initializeArrays();
BU getNumberToFactor ();
void factorComposites();
bool testForComposite (BU num);

BU primeFactors[1000];
BU compositeFactors[1000];
BU tempFactors [1000];
int primeIndex;
int compositeIndex;
int tempIndex;
int numberOfCompositeFactors;
bool allJTestsShowComposite;

int main ()
{
    while(1)
    {
        primeIndex=0;
        compositeIndex=0;
        tempIndex=0;
        initializeArrays();
        compositeFactors[0] = getNumberToFactor();
        cout<<"\n\n";
        if (compositeFactors[0] == 0) return 0;
        numberOfCompositeFactors = 1;
        factorComposites();
    }
}

void initializeArrays()
{
    for (int i = 0; i<1000;i++)
    {
        primeFactors[i] = 0;
        compositeFactors[i]=0;
        tempFactors[i]=0;
    }
}

BU getNumberToFactor ()
{
    std::string s;
    std::cout<<"Enter the number for which you want a prime factor, or 0 to quit: ";
    std::cin>>s;
    return stringToBigUnsigned(s);
}

void factorComposites()
{
    while (numberOfCompositeFactors!=0)
    {
        compositeIndex = 0;
        tempIndex = 0;

        // This while loop finds non-zero values in compositeFactors.
        // If they are composite, it factors them and puts one factor in tempFactors,
        // then divides the element in compositeFactors by the same amount.
        // If the element is prime, it moves it into tempFactors (zeros the element in compositeFactors)
        while (compositeIndex < 1000)
        {
            if(compositeFactors[compositeIndex] == 0)
            {
                compositeIndex++;
                continue;
            }
            if(testForComposite(compositeFactors[compositeIndex]) == false)
            {
                tempFactors[tempIndex] = compositeFactors[compositeIndex];
                compositeFactors[compositeIndex] = 0;
                tempIndex++;
                compositeIndex++;
            }
            else
            {
                tempFactors[tempIndex] = pollard (compositeFactors[compositeIndex]);
                compositeFactors[compositeIndex] /= tempFactors[tempIndex];
                tempIndex++;
                compositeIndex++;
            }
        }
        compositeIndex = 0;

        // This while loop moves all remaining non-zero values from compositeFactors into tempFactors
        // When it is done, compositeFactors should be all 0 value elements
        while (compositeIndex < 1000)
        {
            if (compositeFactors[compositeIndex] != 0)
            {
                tempFactors[tempIndex] = compositeFactors[compositeIndex];
                compositeFactors[compositeIndex] = 0;
                tempIndex++;
                compositeIndex++;
            }
            else compositeIndex++;
        }
        compositeIndex = 0;
        tempIndex = 0;

        // This while loop checks all non-zero elements in tempIndex.
        // Those that are prime are shown on screen and moved to primeFactors
        // Those that are composite are moved to compositeFactors
        // When this is done, all elements in tempFactors should be 0
        while (tempIndex<1000)
        {
            if(tempFactors[tempIndex] == 0)
            {
                tempIndex++;
                continue;
            }
            if(testForComposite(tempFactors[tempIndex]) == false)
            {
                primeFactors[primeIndex] = tempFactors[tempIndex];
                cout<<primeFactors[primeIndex]<<"\n";
                tempFactors[tempIndex]=0;
                primeIndex++;
                tempIndex++;
            }
            else
            {
                compositeFactors[compositeIndex] = tempFactors[tempIndex];
                tempFactors[tempIndex]=0;
                compositeIndex++;
                tempIndex++;
            }
        }
        compositeIndex=0;
        numberOfCompositeFactors=0;

        // This while loop just checks to be sure there are still one or more composite factors.
        // As long as there are, the outer while loop will repeat
        while(compositeIndex<1000)
        {
            if(compositeFactors[compositeIndex]!=0) numberOfCompositeFactors++;
            compositeIndex ++;
        }
    }
    return;
}

// The following method uses the Miller-Rabin primality test to prove with 100% confidence a given number is composite,
// or to establish with a high level of confidence -- but not 100% -- that it is prime
bool testForComposite (BU num)
{
    BU confidenceFactor = 101;
    if (confidenceFactor >= num) confidenceFactor = num-1;
    BU a,d,s, nMinusOne;
    nMinusOne=num-1;
    d=nMinusOne;
    s=0;
    while(modexp(d,1,2)==0)
    {
        d /= 2;
        s++;
    }
    allJTestsShowComposite = true; // assume composite here until we can prove otherwise
    for (BI i = 2 ; i<=confidenceFactor;i++)
    {
        if (modexp(i,d,num) == 1)
            continue; // if this modulus is 1, then we cannot prove that num is composite with this value of i, so continue
        if (modexp(i,d,num) == nMinusOne)
        {
            allJTestsShowComposite = false;
            continue;
        }
        BU exponent(1);
        for (BU j(0); j.toInt()<=s.toInt()-1;j++)
        {
            exponent *= 2;
            if (modexp(i,exponent*d,num) == nMinusOne)
            {
                // if the modulus is not right for even a single j, then break and increment i.
                allJTestsShowComposite = false;
                continue;
            }
        }
        if (allJTestsShowComposite == true) return true; // proven composite with 100% certainty, no need to continue testing
    }
    return false;
    /* not proven composite in any test, so assume prime with a possibility of error =
       (1/4)^(number of different values of i tested). This will be equal to the value of the
       confidenceFactor variable, and the "witnesses" to the primality of the number being tested will be all integers from
       2 through the value of confidenceFactor.

       Note that this makes this primality test cryptographically less secure than it could be. It is theoretically possible,
       if difficult, for a malicious party to pass a known composite number for which all of the lowest n integers fail to
       detect that it is composite. A safer way is to generate random integers in the outer "for" loop and use those in place of
       the variable i. Better still if those random numbers are checked to ensure no duplicates are generated.
    */
}

BU pollard(BU n)
{
    if (n == 4) return 2;
    BU x = 2;
    BU y = 2;
    BU d = 1;
    int increment = 1;
    while(d==1||d==n||d==0)
    {
        x = f(x,n, increment);
        y = f(y,n, increment);
        y = f(y,n, increment);
        if (y>x)
        {
            d = gcda(y-x, n);
        }
        else
        {
            d = gcda(x-y, n);
        }
        if (d==0)
        {
            x = 2;
            y = 2;
            d = 1;
            increment++; // This changes the pseudorandom function we use to increment x and y
        }
    }
    return d;
}

BU gcda(BU a, BU b)
{
    if (a==b||a==0)
        return 0; // If x==y or if the absolute value of (x-y) == the number to be factored, then we have failed to find
                  // a factor. I think this is not proof of primality, so the process could be repeated with a new function.
                  // For example, by replacing x*x+1 with x*x+2, and so on. If many such functions fail, primality is likely.
    BU currentGCD = 1;
    while (currentGCD!=0) // This while loop is based on Euclid's algorithm
    {
        currentGCD = b % a;
        b=a;
        a=currentGCD;
    }
    return b;
}

BU f(BU x, BU n, int increment)
{
    return (x * x + increment) % n;
}
As far as I can see, Pollard Rho normally uses f(x) as (x*x+1) (e.g. in these lecture notes).
Your choice of x*x-1 appears to be not as good, as it often seems to get stuck in a loop:
x=0
f(x)=-1
f(f(x))=0

Avoiding integer overflow with permutation (nPr, nCr) functions in C

I am attempting to write some statistics-related functions so I can carry out a few related procedures (i.e., statistics calculations for probabilities, generating Pascal's triangle to an arbitrary depth, etc.).
I have encountered an issue where I am likely dealing with overflow. For example, if I want to calculate nPr for (n=30, r=1), I know that I can reduce it to:
30P1 = 30! / (30 - 1)!
= 30! / (29)!
= 30! / 29!
= 30
However, when calculating using the functions below, it looks like I will always get invalid values due to integer overflow. Are there any workarounds that don't require the use of a library to support arbitrarily large numbers? I've read up a bit in other posts on the gamma functions, but couldn't find concrete examples.
int factorial(int n) {
    return (n == 1 || n == 0) ? 1 : factorial(n - 1) * n;
}

int nCr(int n, int r) {
    return (nPr(n,r) / factorial(r));
    //return factorial(n) / factorial(r) / factorial(n-r));
}

int nPr(int n, int r) {
    return (factorial(n) / factorial(n-r));
}
Here is a way to calculate without using gamma functions.
It relies on the fact that n_C_r = (n/r) * ((n-1)_C_(r-1)), and that for any positive n, n_C_0 = 1, so we can use it to write a recursive function like the one below.
long long combination(long long n, long long r) {
    if (r == 0)
        return 1;
    else {
        long long num = n * combination(n - 1, r - 1);
        return num / r; // always divides exactly: n * (n-1)C(r-1) == r * nCr
    }
}
I think you have two choices:
Use a big integer library. This way you won't lose precision (floating point might work for some cases, but is a poor substitute).
Restructure your functions, so they won't reach high intermediate values. E.g. factorial(x)/factorial(y) is the product of all numbers from y+1 to x. So just write a loop and multiply. This way, you'll only get an overflow if the final result overflows.
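As a sketch of that second choice (nCr is my name for it): multiply and divide in an order where every division is exact, so nothing overflows unless an intermediate binomial coefficient does -- far later than with raw factorials.

#include <stdio.h>

/* n choose r without computing any factorial */
unsigned long long nCr(unsigned int n, unsigned int r) {
    if (r > n - r) r = n - r;  /* use symmetry to shorten the loop */
    unsigned long long result = 1;
    for (unsigned int i = 1; i <= r; i++) {
        /* after step i, result == (n-r+i) choose i, so the division is exact */
        result = result * (n - r + i) / i;
    }
    return result;
}

int main(void) {
    printf("%llu\n", nCr(30, 1)); /* 30 */
    printf("%llu\n", nCr(49, 6)); /* 13983816 */
    return 0;
}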
If you don't have to deal with signed values (and it doesn't appear that you do), you could try using a larger integral type, e.g., unsigned long long. If that doesn't suffice, you'd need to use a non-standard library that supports arbitrarily long integers. Note that the use of the long long type requires C99 compiler support (if you use GCC, might have to compile with -std=c99).
Edit: you might be able to fit more into a long double, which is 80-bits on some systems.
I might be being dense, but it seems to me that going to doubles and the gamma function is overkill here.
Are there any workarounds that don't require the use of a library to support arbitrarily large numbers?
Sure there are. You know exactly what you're dealing with at all times - products of ranges of integers. A range of integers is a special case of a finite list of integers. I have no idea what an idiomatic way of representing a list is in C, so I'll stick to C-ish pseudocode:
make_list(from, to)
    return a list containing from, from+1, ..., to

concatenate_lists(list1, list2)
    return a list with all the elements from list1 and list2

calculate_list_product(list)
    return list[0] * list[1] * ... * list[last]

calculate_list_quotient(numerator_list, denominator_list)
    /*
     be a bit clever here: form the product of the elements of numerator_list, but
     any time the running product is divisible by an element of denominator_list,
     divide the running product by that element and remove it from consideration
    */

n_P_r(n, r)
    /* nPr is n! / (n-r)!
       which simplifies to n * (n-1) * ... * (n-r+1)
       so we can just: */
    return calculate_list_product(make_list(n-r+1, n))

n_C_r(n, r)
    /* nCr is n! / (r! (n-r)!) */
    return calculate_list_quotient(
        make_list(1, n),
        concatenate_lists(make_list(1, r), make_list(1, n-r))
    )
Note that we never actually calculate a factorial!
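In C, the n_P_r case collapses to a tiny loop over that range. A sketch (overflow still occurs if the final product itself exceeds unsigned long long):

#include <stdio.h>

/* nPr = n * (n-1) * ... * (n-r+1), i.e. the product of make_list(n-r+1, n) */
unsigned long long nPr(unsigned int n, unsigned int r) {
    unsigned long long product = 1;
    for (unsigned int k = n - r + 1; k <= n; k++)
        product *= k;
    return product;
}

int main(void) {
    printf("%llu\n", nPr(30, 1)); /* 30 */
    return 0;
}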
You look like you are on the right track, so here you go:
#include <math.h>
#include <stdio.h>
int nCr(int n, int r) {
    if(r>n) {
        printf("FATAL ERROR"); return 0;
    }
    if(n==0 || r==0 || n==r) {
        return 1;
    } else {
        return (int)lround( ((double)n/(double)(n-r)/(double)r) * exp(lgamma(n) - lgamma(n-r) - lgamma(r)));
    }
}

int nPr(int n, int r) {
    if(r>n) { printf("FATAL ERROR"); return 0; }
    if(n==0 || r==0) {
        return 1;
    } else {
        if (n==r) {
            r = n - 1;
        }
        return (int)lround( ((double)n/(double)(n-r)) * exp(lgamma(n) - lgamma(n-r)));
    }
}
To compile, do: gcc myFile.c -lm && ./a.out (the -lm must come after the source file on some linkers).
Note that the accuracy of your results is limited by the bit-depth of the double data type. You should be able to get good results with this, but be warned: replacing all the ints above with long long unsigned may not necessarily guarantee accurate results for larger values of n,r. At some point, you will still need some math library to handle arbitrarily large values, but this should help you avoid that for smaller input values.

Why is prime number check getting wrong results for large numbers?

This small C program checks whether a number is prime... Unfortunately it doesn't fully work. I am aware of the inefficiency of the program (e.g. the sqrt optimization); that is not the problem.
#include <stdio.h>

int main() {
    int n, m;
    printf("Enter an integer, that will be checked:\n"); // Set 'n' from command line
    scanf("%d", &n);
    //n = 5; // To specify 'n' inside the code.
    for (m = n-1; m >= 1; m--) {
        if (m == 1) {
            printf("The entered integer IS a prime.\n");
            break;
        }
        if (n % m == 0) {
            printf("The entered integer IS NOT a prime.\n");
            break;
        }
    }
    return 0;
}
I tested the program with a lot of numbers and it worked... Then I tried a bigger number (1231231231231236), which is clearly not prime...
BUT: the program told me it was!?
What am I missing...?
The number "1231231231231236" is too big to fit in an "int" data type. Add a printf statement to show what number your program thinks you gave it, and if that's prime, your program works fine; else, you might have a problem that merits checking. Adding support for integers of arbitary size requires considerable extra effort.
The reason you are having this problem is that intrinsic data types like int have a fixed size - probably 32 bits, or 4 bytes, for int. Given that, variables of type int can only represent 2^32 unique values - about 4 billion. Even if you were using unsigned int (you're not), the int type couldn't be used to store numbers bigger than around 4 billion. Your number is several orders of magnitude larger than that and, as such, when you try to put your input into the int variable, something happens, but I can tell you what doesn't happen: it doesn't get assigned the value 1231231231231236.
Hard to know without more details, but if your ints are 32-bit, then the value you've passed is outside the allowable range, which will no doubt be represented as something other than the value you've passed. You may want to consider using unsigned int instead.
The given number is too large for an int in C. It probably only accepted part of it. Try printing the value of n.
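To see what actually ends up in n, a quick sketch along the lines suggested above (note that scanf with %d on an out-of-range value is not well-defined, so exactly what gets stored varies by implementation):

#include <stdio.h>

int main(void) {
    int n;
    printf("Enter an integer, that will be checked:\n");
    if (scanf("%d", &n) != 1) return 1;
    printf("The program actually received: %d\n", n); /* will not be 1231231231231236 with a 32-bit int */
    return 0;
}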
