Factorization of really big numbers in C

I am writing an article about the importance of prime numbers in today's cryptography. I want to develop a small application showing how long a program written in C (a low-level language, at least to me) would take to factorize a composite number into its prime factors. I came up with a simple algorithm to do so, but I ran into a problem:
I would like the user to be able to type gigantic numbers, for example: 7777777777777777777777777772
So the computer would take some hours to process that, showing how good our prime-based cryptography is.
But in C the largest data type I could find was LONG which goes up to 2147483646.
Do you guys know how I could be able to type and process a big number in C?
Thanks in advance

Factorization of really big numbers
I would like the user to be able to type gigantic numbers, for example: 7777777777777777777777777772
That is a 93 bit number, not that gigantic, so one could simplistically brute force it.
Something like the code below works if you have access to an unsigned __int128. C does specify 64-bit types, yet beyond that, you are on your own.
This modest factorization I'd estimate could take some minutes.
https://www.dcode.fr/prime-factors-decomposition reports the answer in seconds.
Of course, many improvements can be had.
#include <stdio.h>
// Returns the smallest prime factor of x, or x itself if x is prime or x <= 3.
unsigned __int128 factor(unsigned __int128 x) {
  if (x <= 3) {
    return x;
  }
  if (x % 2 == 0) return 2;
  for (unsigned __int128 i = 3; i <= x / i; i += 2) {
    // Print a progress line every 100,000,000 candidates.
    static unsigned long n = 0;
    if (++n >= 100000000) {
      n = 0;
      printf(" %llu approx %.0f\n", (unsigned long long) i, (double) (x / i));
      fflush(stdout);
    }
    if (x % i == 0) {
      return i;
    }
  }
  return x;
}
// printf() has no conversion for unsigned __int128, so the values are
// printed approximately, via double.
void factors(unsigned __int128 x) {
  do {
    unsigned __int128 f = factor(x);
    printf("approx %0.f approx %.0f\n", (double) f, (double) x);
    fflush(stdout);
    x /= f;
  } while (x > 1);
}
Output
approx 2 approx 7777777777777778308713283584
approx 2 approx 3888888888888889154356641792
approx 487 approx 1944444444444444577178320896
approx 2687 approx 3992699064567647864619008
99996829 approx 14859790387308
199996829 approx 7429777390798
299996829 approx 4953158749339
399996829 approx 3714859245385
499996829 approx 2971882684351
...
38399996829 approx 38696146902
38499996829 approx 38595637421
approx 1485931918335559335936 approx 1485931918335559335936
The right answer though is to use more efficient algorithms and then consider the types needed.
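As a minimal sketch of the "type a gigantic number" part, assuming a compiler that provides unsigned __int128 (a GCC/Clang extension, not standard C): scanf() and printf() have no conversion for that type, so the decimal digits have to be converted by hand. The names u128, u128_from_decimal and u128_to_decimal are made up for this example.

```c
#include <stddef.h>

/* Assumption: unsigned __int128 is available (GCC/Clang extension). */
typedef unsigned __int128 u128;

/* Parse a non-empty string of decimal digits into *out.
   Returns 0 on success, -1 on an empty string, a non-digit
   character, or a value that does not fit in 128 bits. */
static int u128_from_decimal(const char *s, u128 *out)
{
    const u128 max = ~(u128)0;
    u128 v = 0;
    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;
        unsigned d = (unsigned)(*s - '0');
        if (v > (max - d) / 10)
            return -1;               /* v * 10 + d would overflow */
        v = v * 10 + d;
    }
    *out = v;
    return 0;
}

/* Write v in decimal into buf, which must hold at least 40 bytes
   (39 digits plus the terminator). Returns buf. */
static char *u128_to_decimal(u128 v, char *buf)
{
    char tmp[40];
    size_t n = 0;
    do {                             /* digits come out least significant first */
        tmp[n++] = (char)('0' + (int)(v % 10));
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)   /* reverse into the output buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

The user's input can be read as text (for example with fgets()), handed to u128_from_decimal(), and each factor can then be printed exactly with u128_to_decimal() instead of the approximate double output.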

The same way you do it on paper. You break the number into pieces and use long division, long addition, and long multiplication.
Perhaps the simplest way is to store the number as a base 10 string and write code to do all the operations you need on those strings. You would do addition with carries the same way you do it on paper. Multiplication would be done with single-digit multiplication combined with addition (which you'd have already done). And so on.
There are plenty of libraries available to do this for you such as libgmp's MPZ library and OpenSSL's BN library.
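The paper-and-pencil addition described above might look like this for base 10 strings. This is only a sketch: decimal_add is a hypothetical helper, it assumes both inputs contain digits only (no sign, no spaces), and the caller must supply a large enough output buffer.

```c
#include <string.h>

/* Adds two non-negative decimal strings, digit by digit with carries.
   `out` must have room for max(strlen(a), strlen(b)) + 2 bytes. */
static void decimal_add(const char *a, const char *b, char *out)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = (la > lb ? la : lb) + 1;    /* one extra digit for a final carry */
    int carry = 0;

    out[n] = '\0';
    for (size_t k = 0; k < n; k++) {       /* k counts digits from the right */
        int da = k < la ? a[la - 1 - k] - '0' : 0;
        int db = k < lb ? b[lb - 1 - k] - '0' : 0;
        int sum = da + db + carry;
        out[n - 1 - k] = (char)('0' + sum % 10);
        carry = sum / 10;
    }
    /* If the final carry was 0, drop the leading zero it left behind. */
    if (out[0] == '0' && n > 1)
        memmove(out, out + 1, n);          /* n bytes: the digits plus '\0' */
}
```

Subtraction, multiplication and division follow the same pattern, which is a fair amount of code; that is the work libraries such as GMP do for you.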

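You can use a struct and store the number in parts. The code below is not tested but should give you some direction.
I believe this lets you go well beyond a single int: if every part used its full range you would get somewhere around 4294967295 (UINT_MAX) to the power of x, x being the number of parts you define in the struct. With each part limited to 999, as below, the limit is 1000 to the power of x.
typedef struct big_number {
    int thousands;
    int millions;
    int billions;
} big_number;
// Then do some math
big_number add(big_number n1, big_number n2) {
    big_number sum;
    int thousands = n1.thousands + n2.thousands;
    int millions = n1.millions + n2.millions;
    int billions = n1.billions + n2.billions;
    // Note: each part of your struct has a maximum value of 999
    if (thousands > 999) {
        millions += thousands / 1000; // move the carry up
        thousands %= 1000;
    }
    if (millions > 999) {
        billions += millions / 1000;
        millions %= 1000;
    }
    // etc. for any further parts
    sum.thousands = thousands;
    sum.millions = millions;
    sum.billions = billions;
    return sum;
}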

Related

How to compute the digits of an irrational number one by one?

I want to read digit by digit the decimals of the sqrt of 5 in C.
The square root of 5 is 2,23606797749979..., so this'd be the expected output:
2
3
6
0
6
7
9
7
7
...
I've found the following code:
#include<stdio.h>
void main()
{
int number;
float temp, sqrt;
printf("Provide the number: \n");
scanf("%d", &number);
// store the half of the given number e.g from 256 => 128
sqrt = number / 2;
temp = 0;
// Iterate until sqrt is different of temp, that is updated on the loop
while(sqrt != temp){
// initially 0, is updated with the initial value of 128
// (on second iteration = 65)
// and so on
temp = sqrt;
// Then, replace values (256 / 128 + 128 ) / 2 = 65
// (on second iteration 34.46923076923077)
// and so on
sqrt = ( number/temp + temp) / 2;
}
printf("The square root of '%d' is '%f'", number, sqrt);
}
But this approach stores the result in a float variable, and I don't want to depend on the limits of the float types, as I would like to extract like 10,000 digits, for instance. I also tried to use the native sqrt() function and casting it to string number using this method, but I faced the same issue.
What you've asked about is a very hard problem, and whether it's even possible to do "one by one" (i.e. without working space requirement that scales with how far out you want to go) depends on both the particular irrational number and the base you want it represented in. For example, in 1995 when a formula for pi was discovered that allows computing the nth binary digit in O(1) space, this was a really big deal. It was not something people expected to be possible.
If you're willing to accept O(n) space, then some cases like the one you mentioned are fairly easy. For example, if you have the first n digits of the square root of a number as a decimal string, you can simply try appending each digit 0 to 9, then squaring the string with long multiplication (same as you learned in grade school), and choosing the last one that doesn't overshoot. Of course this is very slow, but it's simple. The easy way to make it a lot faster (but still asymptotically just as bad) is using an arbitrary-precision math library in place of strings. Doing significantly better requires more advanced approaches and in general may not be possible.
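A minimal sketch of the append-a-digit idea, using a 64-bit integer in place of the decimal string so that the example stays short. The name sqrt_digits is made up here, and the fixed-width type is an assumption of this sketch: it only works while n * 100^decimals fits in 64 bits (about 8 decimals for small n). To go further, the same loop would have to run on strings or on an arbitrary-precision type.

```c
#include <stdint.h>

/* Returns floor(sqrt(n) * 10^decimals), built one decimal digit at a time.
   Assumes n * 100^decimals fits in a uint64_t. */
static uint64_t sqrt_digits(uint64_t n, int decimals)
{
    uint64_t root = 0;
    uint64_t target = n;

    /* Integer part: the largest root with root * root <= n. */
    while ((root + 1) * (root + 1) <= n)
        root++;

    for (int k = 0; k < decimals; k++) {
        target *= 100;      /* shift the radicand by two decimal places */
        root *= 10;         /* make room for the next digit of the root */
        /* Try 9, 8, ..., 0 and keep the largest digit that does not
           overshoot; digit 0 always fits, so the loop stops there at worst. */
        uint64_t d = 9;
        while ((root + d) * (root + d) > target)
            d--;
        root += d;
    }
    return root;
}
```

For example, sqrt_digits(5, 8) returns 223606797, which reads as 2.23606797.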
As already noted, you need to change the algorithm into a digit-by-digit one (there are some examples in the Wikipedia page about the methods of computing square roots) and use an arbitrary precision arithmetic library to perform the calculations (for instance, GMP).
In the following snippet I implemented the aforementioned algorithm, using GMP (but not the square root function that the library provides). Instead of calculating one decimal digit at a time, this implementation uses a larger base, a large power of 10 that fits inside an unsigned long, so that it can produce 9 or 18 decimal digits at every iteration.
It also uses an adapted Newton method to find the actual "digit".
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <gmp.h>
unsigned long max_ul(unsigned long a, unsigned long b)
{
return a < b ? b : a;
}
int main(int argc, char *argv[])
{
// The GMP functions accept 'unsigned long int' values as parameters.
// The algorithm implemented here can work with bases other than 10,
// so that it can evaluate more than one decimal digit at a time.
const unsigned long base = sizeof(unsigned long) > 4
? 1000000000000000000
: 1000000000;
const unsigned long decimals_per_digit = sizeof(unsigned long) > 4 ? 18 : 9;
// Extract the number to be square rooted and the desired number of decimal
// digits from the command line arguments. Fallback to 0 in case of errors.
const unsigned long number = argc > 1 ? atoi(argv[1]) : 0;
const unsigned long n_digits = argc > 2 ? atoi(argv[2]) : 0;
// All the variables used by GMP need to be properly initialized before use.
// 'c' is basically the remainder, initially set to the original number
mpz_t c;
mpz_init_set_ui(c, number);
// At every iteration, the algorithm "moves" the remainder "to the left"
// by two "digits", so it multiplies it by base^2.
mpz_t base_squared;
mpz_init_set_ui(base_squared, base);
mpz_mul(base_squared, base_squared, base_squared);
// 'p' stores the digits of the root found so far. The others are helper variables
mpz_t p;
mpz_init_set_ui(p, 0UL);
mpz_t y;
mpz_init(y);
mpz_t yy;
mpz_init(yy);
mpz_t dy;
mpz_init(dy);
mpz_t dx;
mpz_init(dx);
mpz_t pp;
mpz_init(pp);
// Timing, for testing purposes
clock_t start = clock(), diff;
unsigned long x_max = number;
// Each "digit" correspond to some decimal digits
for (unsigned long i = 0,
last = (n_digits + decimals_per_digit) / decimals_per_digit + 1UL;
i < last; ++i)
{
// Find the greatest x such that: x * (2 * base * p + x) <= c
// where x is in [0, base), using a specialized Newton method
// pp = 2 * base * p
mpz_mul_ui(pp, p, 2UL * base);
unsigned long x = x_max;
for (;;)
{
// y = x * (pp + x)
mpz_add_ui(yy, pp, x);
mpz_mul_ui(y, yy, x);
// dy = y - c
mpz_sub(dy, y, c);
// If y <= c we have found the correct x
if ( mpz_sgn(dy) <= 0 )
break;
// Newton's step: dx = dy/y' where y' = 2 * x + pp
mpz_add_ui(yy, yy, x);
mpz_tdiv_q(dx, dy, yy);
// Update x even if dx == 0 (last iteration)
x -= max_ul(mpz_get_si(dx), 1);
}
x_max = base - 1;
// The actual format of the printed "digits" is up to you.
// Every group is printed; a line break follows every fourth group.
if (i == 0)
printf("%lu.\n", x);
else
{
printf("%0*lu", (int)decimals_per_digit, x);
if (i % 4 == 0)
putchar('\n');
}
// p = base * p + x
mpz_mul_ui(p, p, base);
mpz_add_ui(p, p, x);
// c = (c - y) * base^2
mpz_sub(c, c, y);
mpz_mul(c, c, base_squared);
}
diff = clock() - start;
long int msec = diff * 1000L / CLOCKS_PER_SEC;
printf("\n\nTime taken: %ld.%03ld s\n", msec / 1000, msec % 1000);
// Final cleanup
mpz_clear(c);
mpz_clear(base_squared);
mpz_clear(p);
mpz_clear(pp);
mpz_clear(dx);
mpz_clear(y);
mpz_clear(dy);
mpz_clear(yy);
}
You can see the outputted digits here.
Your title says:
How to compute the digits of an irrational number one by one?
Irrational numbers are not limited to most square roots. They also include numbers of the form log(x), exp(z), sin(y), etc. (transcendental numbers). However, there are some important factors that determine whether or how fast you can compute a given irrational number's digits one by one (that is, from left to right).
Not all irrational numbers are computable; that is, for some of them there is no algorithm that approximates them to any desired length (whether by a closed form expression, a series, or otherwise).
There are many ways numbers can be expressed, such as by their binary or decimal expansions, as continued fractions, as series, etc. And there are different algorithms to compute a given number's digits depending on the representation.
Some formulas compute a given number's digits in a particular base (such as base 2), not in an arbitrary base.
For example, besides the first formula to extract the digits of π without computing the previous digits, there are other formulas of this type (known as BBP-type formulas) that extract the digits of certain irrational numbers. However, these formulas only work for a particular base, not all BBP-type formulas have a formal proof, and most importantly, not all irrational numbers have a BBP-type formula (essentially, only certain log and arctan constants do, not numbers of the form exp(x) or sqrt(x)).
On the other hand, if you can express an irrational number as a continued fraction (which all real numbers have), you can extract its digits from left to right, and in any base desired, using a specific algorithm. What is more, this algorithm works for any real number constant, including square roots, exponentials (e and exp(x)), logarithms, etc., as long as you know how to express it as a continued fraction. For an implementation see "Digits of pi and Python generators". See also Code to Generate e one Digit at a Time.

C code keeps running forever*

I am trying to find the largest prime factor of a huge number in C. For small numbers like 100 or even 10000 it works fine, but it fails (by fail I mean it keeps running and running for tens of minutes on my Core 2 Duo and i5) for very big target numbers (see the code for the target number).
Is my algorithm correct?
I am new to C and really struggling with big numbers. What I want is correction or guidance, not a solution. I could do this in Python with bignum bindings and stuff (I have not tried yet, but am pretty sure), but not in C. Or I might have made some tiny mistake that I am too tired to notice. Anyway, here is the code I wrote:
#include <stdio.h>
// To find largest prime factor of target
int is_prime(unsigned long long int num);
long int main(void) {
unsigned long long int target = 600851475143;
unsigned long long int current_factor = 1;
register unsigned long long int i = 2;
while (i < target) {
if ( (target % i) == 0 && is_prime(i) && (i > current_factor) ) { //verify i as a prime factor and greater than last factor
current_factor = i;
}
i++;
}
printf("The greates is: %llu \n",current_factor);
return(0);
}
int is_prime (unsigned long long int num) { //if num is prime 1 else 0
unsigned long long int z = 2;
while (num > z && z !=num) {
if ((num % z) == 0) {return 0;}
z++;
}
return 1;
}
600 billion iterations of anything will take some non-trivial amount of time. You need to substantially reduce this.
Here's a hint: Given an arbitrary integer value x, if we discover that y is a factor, then we've implicitly discovered that x / y is also a factor. In other words, factors always come in pairs. So there's a limit to how far we need to iterate before we're doing redundant work.
What is that limit? Well, what's the crossover point where y will be greater than x / y?
Once you've applied this optimisation to the outer loop, you'll find that your code's runtime will be limited by the is_prime function. But of course, you may apply a similar technique to that too.
By iterating up to the square root of the number, we can get all of its factors (factor and N/factor, with factor <= sqrt(N)). The solution rests on this small idea: any factor less than sqrt(N) that we find has a corresponding factor larger than sqrt(N). So we only need to check up to sqrt(N), and then we can get the remaining factors.
Here you don't need to explicitly use any prime-finding algorithm. The factorization logic itself will deduce whether the target is prime or not. So all that is left is to check the pairwise factors.
unsigned long long ans = 0;
for (unsigned long long i = 2; i <= target / i; i++) {
    while (target % i == 0) {
        ans = i;
        target /= i;
    }
}
if (target > 1) ans = target; // what remains of target is itself a prime
// print ans
Edit: A point to be added (chux): i*i in the earlier code may lead to overflow, which can be avoided if we use i<=target/i.
Another choice would be to have
unsigned long long square_root = isqrt(target); // isqrt: an integer square root helper you supply (not part of standard C)
for(unsigned long long i = 2; i<=square_root; i++){
...
}
Note that using sqrt here is not a wise choice, since
mixing double math with integer operations is prone to round-off errors.
For target given the answer will be 6857.
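Wrapped into a function, the loop above might look like the following sketch. The name largest_prime_factor and the choice to return 0 for inputs below 2 are assumptions of this example, not part of the original answer.

```c
/* Returns the largest prime factor of target, or 0 if target < 2.
   Every factor found is divided out, so the loop bound shrinks as it goes. */
static unsigned long long largest_prime_factor(unsigned long long target)
{
    unsigned long long ans = 0;

    /* i <= target / i is the overflow-free form of i * i <= target. */
    for (unsigned long long i = 2; i <= target / i; i++) {
        while (target % i == 0) {   /* a factor may occur several times */
            ans = i;
            target /= i;
        }
    }
    if (target > 1)                 /* the leftover is a prime larger than ans */
        ans = target;
    return ans;
}
```

For 600851475143 the outer loop stops after 1471, because by then target has been reduced to 6857 and 1472 is larger than 6857 / 1472.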
Code has 2 major problems
The while (i < target) loop is very inefficient. Upon finding a factor, target could be reduced to target = target / i;. Further, a factor i could occur multiple times. Fix not shown.
is_prime(n) is very inefficient. Its while (num > z && z !=num) could loop num times. Here too, use the quotient to limit the iterations to about sqrt(num).
int is_prime (unsigned long long int num) {
unsigned long long int z = 2;
while (z <= num/z) {
if ((num % z) == 0) return 0;
z++;
}
return num > 1;
}
Nothing is wrong, it just needs optimization, for example:
int is_prime(unsigned long long int num) {
if (num == 2) {
return (1); /* Special case */
}
if (num % 2 == 0 || num <= 1) {
return (0);
}
unsigned long long int z = 3; /* We skipped the all even numbers */
while (z < num) { /* Do a single test instead of your redundant ones */
if ((num % z) == 0) {
return 0;
}
z += 2; /* Here we go twice as fast */
}
return 1;
}
Also, the other big problem is while (z < num), but since you don't want the solution, I will let you find how to optimize that; similarly, look at the first function yourself.
EDIT: Someone else posted the array-list of primes solution 50 seconds before me, which is the best, but I chose to give an easy solution since you are just a beginner and manipulating arrays may not be easy at first (you need to learn pointers and stuff).
is_prime has a chicken-and-egg problem in that you need to test num only against other primes. So you don't need to check against 9 because that is a multiple of 3.
is_prime could maintain an array of primes, and each time a new num is tested that is a prime, it can be added to the array. num is tested against each prime in the array, and if it is not divisible by any of the primes in the array, it is itself a prime and is added to the array. The array needs to be malloc'd and realloc'd unless there is a formula to calculate the number of primes up until your target (I believe such a formula does not exist).
EDIT: the number of primes to test for the target 600,851,475,143 will be approximately 7,500,000,000 and the table could run out of memory.
The approach can be adapted as follows:
to use unsigned int up until primes of UINT_MAX
to use unsigned long long int for primes above that
to use brute force above a certain memory consumption.
UINT_MAX is defined as 4,294,967,295 and would cover the primes up to around 100,000,000,000 and would cost 7.5*4= 30Gb
See also The Prime Pages.
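A minimal sketch of the table-of-primes idea, with a caller-supplied fixed-size array instead of malloc/realloc to keep it short. The name collect_primes and its parameters are assumptions of this example. It also assumes that trial division only needs the primes up to the square root of the target; for 600851475143 that is about 775,146, so the table needs fewer than 78,498 entries (the number of primes below one million).

```c
#include <stddef.h>

/* Fills `primes` with the primes <= limit in increasing order, testing each
   candidate only against the primes already stored. Stops early once
   `capacity` entries are stored. Returns the number of primes written. */
static size_t collect_primes(unsigned limit, unsigned *primes, size_t capacity)
{
    size_t count = 0;

    for (unsigned n = 2; n <= limit && count < capacity; n++) {
        int is_prime = 1;
        /* Only stored primes p with p * p <= n need to be tried. */
        for (size_t k = 0; k < count && primes[k] <= n / primes[k]; k++) {
            if (n % primes[k] == 0) {
                is_prime = 0;
                break;
            }
        }
        if (is_prime)
            primes[count++] = n;
        if (n == limit)
            break;          /* avoids wrap-around when limit == UINT_MAX */
    }
    return count;
}
```

The resulting array can then replace the z++ loop in is_prime: divide only by the stored primes, in order, until one exceeds the square root of the number being tested.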

Is the modulus operator only applicable to integer data types?

My algorithm calculates the arithmetic operations given below. For small values it works perfectly, but for large numbers such as 218194447 it returns a random value. I have tried to use long long int and double, but nothing works, because the modulus operator I have used can only be used with int types. Can anyone explain how to solve it, or provide links that could be useful?
#include<stdio.h>
#include<math.h>
int main()
{
long long i,j;
int t,n;
scanf("%d\n",&t);
while(t--)
{
scanf("%d",&n);
long long k;
i = (n*n);
k = (1000000007);
j = (i % k);
printf("%d\n",j);
}
return 0;
}
You could declare your variables as int64_t or long long ; then they would compute the modulus in their range (e.g. 64 bits for int64_t). And it would work correctly only if all intermediate values fit in their range.
However, you probably want or need bignums. I suggest you learn and use GMPlib for that.
BTW, don't use pow since it computes in floating point. Try i = n * n; instead of i = pow(n,2);
P.S. this is not for a beginner in C programming, using gmplib requires some fluency with C programming (and programming in general)
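For the specific computation in the question, 64-bit arithmetic looks sufficient. In the posted code, n*n is evaluated in int before it is assigned to the long long, and the long long result is printed with %d; either could explain the "random" values. A minimal sketch of a fix, where square_mod is a hypothetical helper name:

```c
/* Returns (n * n) mod 1000000007 for any int n without overflowing:
   the operand is widened to long long before the multiplication. */
static long long square_mod(int n)
{
    const long long k = 1000000007LL;
    long long r = n % k;     /* n is converted to long long; -k < r < k */
    if (r < 0)
        r += k;              /* normalise negative remainders */
    return (r * r) % k;      /* r < k, so r * r fits in a long long */
}
```

The result should be printed with a matching conversion, for example printf("%lld\n", square_mod(n));.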
The problem in your code is that intermediate values of your computation exceed the range of values that can be stored in an int. n^2 for values of n>2^30 cannot be represented as int.
Follow the link above given by R.T. for a way of doing modulo on big numbers. That won't be enough though, since you also need a class/library that can handle big integer values. With only the standard C library in place, that will otherwise be a tough task to do on your own. (OK, for 2^31, a 64-bit integer would do, but if you're going even larger, you're out of luck again.)
After accept answer
To find the modulo of a number n raised to some power p (2 in OP's case), there is no need to first calculate power(n,p). Instead calculate intermediate modulo values as n is raised to intermediate powers.
The following code works with p==2 as needed by OP, but also works quickly if p=1000000000.
The only wider integers needed are integers that are twice as wide as n.
Performing all this with unsigned integers simplifies the needed code.
The resultant code is quite small.
#include <stdint.h>
uint32_t powmod(uint32_t base, uint32_t expo, uint32_t mod) {
// `y = 1u % mod` needed only for the cases expo==0, mod<=1
// otherwise `y = 1u` would do.
uint32_t y = 1u % mod;
while (expo) {
if (expo & 1u) {
y = ((uint64_t) base * y) % mod;
}
expo >>= 1u;
base = ((uint64_t) base * base) % mod;
}
return y;
}
#include <stdio.h>
int main(void) {
    unsigned long j;
    unsigned t, n;
    scanf("%u\n", &t);
    while (t--) {
        scanf("%u", &n);
        unsigned long k;
        k = 1000000007u;
        j = powmod(n, 2, k);
        printf("%lu\n", j);
    }
    return 0;
}

How do I make this code work for large input values?

#include <stdio.h>
int main()
{
int i,j,k,t;
long int n;
int count;
int a,b;
float c;
scanf("%d",&t);
for(k=0;k<t;k++)
{
count=0;
scanf("%d",&n);
for(i=1;i<n;i++)
{
a=pow(i,2);
for(j=i;j<n;j++)
{
b=pow(j,2);
c=sqrt(a+b);
if((c-floor(c)==0)&&c<=n)
++count;
}
}
printf("%d\n",count);
}
return 0;
}
The above is C code that counts the number of Pythagorean triplets within the range 1..n.
How do I optimize it? It times out for large input.
1<=T<=100
1<=N<=10^6
Your inner two loops are O(n*n) so there's not too much that can be done without changing algorithms. Just looking at the inner loop the best I could come up with in a short time was the following:
unsigned long long int i,j,k,t;
unsigned long long int n = 30000; // Example for testing
unsigned long long int count = 0;
unsigned long long int a, b;
unsigned long long int c;
unsigned long long int n2 = n * n;
for (i = 1; i < n; i++)
{
    a = i * i;
    for (j = i; j < n; j++)
    {
        b = j * j;
        unsigned long long int sum = a + b;
        if (sum > n2) break;
        // Quick bit tests: skip sums that cannot be perfect squares
        // (a square is 0 or 1 mod 4, never 5 mod 8, and never 8 or 12 mod 16)
        if ( (sum & 2) || ((sum & 7) == 5) || ((sum & 11) == 8) ) continue;
        c = sqrt((double)sum); // needs #include <math.h>
        if (c*c == sum) ++count;
    }
}
A few comments:
For the case of n=30000 this is roughly twice as fast as your original.
If you don't mind n being limited to 65535 you can switch to unsigned int to get a x2 speed increase (or roughly x4 faster than your original).
The quick bit tests, which skip sums that cannot be perfect squares, increase the speed by a factor of two. You may be able to increase this by looking at the answers to this question.
Your original code has integer overflows when i > 65535 which is the reason I switched to 64-bit integers for everything.
I think your method of checking for a perfect square doesn't always work due to the inherent imprecision of floating point numbers. The method in my example should get around that and is slightly faster anyway.
You are still bound to the O(n*n) algorithm. On my machine the code for n=30000 runs in about 6 seconds which means the n=1000000 case will take close to 2 hours. Looking at Wikipedia shows a host of other algorithms you could explore.
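If you would rather keep floating point out of the perfect-square test entirely, an integer square root is one option. The sketch below uses Newton's iteration on integers; the names isqrt_u64 and is_perfect_square are made up for this example and are not standard C functions.

```c
#include <stdint.h>

/* Returns floor(sqrt(n)) using integer Newton iteration only. */
static uint64_t isqrt_u64(uint64_t n)
{
    if (n < 2)
        return n;
    /* Start above the root: sqrt(n) < 2^32 for every 64-bit n, and
       starting here keeps x + n / x from overflowing. */
    uint64_t x = (uint64_t)1 << 32;
    uint64_t y = (x + n / x) / 2;
    while (y < x) {          /* the estimates decrease until they reach the root */
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}

/* Returns 1 if n is a perfect square, 0 otherwise. */
static int is_perfect_square(uint64_t n)
{
    uint64_t r = isqrt_u64(n);
    return r * r == n;       /* r < 2^32, so r * r cannot overflow */
}
```

In the inner loop this would replace the sqrt((double)sum) line with a call such as if (is_perfect_square(sum)) ++count;.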
It really depends on what the benchmark is that you are expecting.
But for now, the power function could be a bottleneck in this. I think you can do either of two things:
a) Precalculate all the squared values, save them in a file, and then load them into a dictionary. Depending on the input size, that might load your memory.
b) Memoize previously calculated squared values so that when asked again, you can reuse them, thereby saving CPU time. This, again, would eventually load your memory.
You can define your indexes as (unsigned) long or even (unsigned) long long, but you may have to use bignum libraries to solve your problem for huge numbers. Using unsigned raises your maximum value but restricts you to non-negative numbers. I doubt you'll need anything bigger than long long, though.
It seems your question is about optimising your code to make it faster. If you read up on Pythagorean triplets, you will see there is a way to calculate them using integer parameters. If 3 4 5 is a triplet, then we know that 2*3 2*4 2*5 is also a triplet, and so is k*3 k*4 k*5. Your algorithm is checking all of those triplets. There are better algorithms to use, but I'm afraid you will have to search on Google to study Pythagorean triplets.
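One such parameterisation is Euclid's formula: every primitive triplet has hypotenuse c = m*m + n*n for coprime m > n > 0 of opposite parity, and every other triplet is a multiple k of a primitive one. The sketch below counts triplets that way instead of testing every pair; the names gcd_ull and count_triples are made up for this example, and limit plays the role of N in the question.

```c
/* Greatest common divisor by Euclid's algorithm. */
static unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
    while (b != 0) {
        unsigned long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Counts the triplets a <= b < c <= limit with a*a + b*b == c*c. */
static unsigned long long count_triples(unsigned long long limit)
{
    unsigned long long count = 0;

    for (unsigned long long m = 2; m * m < limit; m++) {
        for (unsigned long long n = 1; n < m; n++) {
            unsigned long long c = m * m + n * n;   /* primitive hypotenuse */
            if (c > limit)
                break;
            /* Coprime and of opposite parity: a primitive triplet. */
            if ((m - n) % 2 == 1 && gcd_ull(m, n) == 1)
                count += limit / c;   /* its multiples k*c that stay <= limit */
        }
    }
    return count;
}
```

For limit = 20 this counts 6 triplets: (3,4,5), (6,8,10), (9,12,15), (12,16,20), (5,12,13) and (8,15,17). With limit = 10^6 the outer loop runs only up to m = 999.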

Optimized way to handle extremely large number without using external library

Optimized way to handle the value of n^n (1 ≤ n ≤ 10^9)
I used long long int but it's not good enough as the value might be (1000^1000)
Searched and found the GMP library http://gmplib.org/ and BigInt class but don't wanna use them. I am looking for some numerical method to handle this.
I need to print the first and last k (1 ≤ k ≤ 9) digits of n^n
For the first k digits I am getting it like shown below (it's bit ugly way of doing it)
num = pow(n,n);
while(num){
arr[i++] = num%10;
num /= 10;
digit++;
}
while(digit > 0){
j=digit;
j--;
if(count<k){
printf("%lld",arr[j]);
count++;
}
digit--;
}
and for last k digits am using num % 10^k like below.
findk=pow(10,k);
lastDigits = num % findk;
maximum value of k is 9. so i need only 18 digits at max.
I am thinking of getting those 18 digits without really solving the complete n^n expression.
Any idea/suggestion??
// note: Scope of use is limited.
#include <stdio.h>
long long powerMod(long long a, long long d, long long n){
// a ^ d mod n
long long result = 1;
while(d > 0){
if(d & 1)
result = result * a % n;
a = (a * a) % n;
d >>=1;
}
return result;
}
int main(void){
long long result = powerMod(999, 999, 1000000000);//999^999 mod 10^9
printf("%lld\n", result);//499998999
return 0;
}
Finding the least significant digits (last k digits) is easy because of a property of modular arithmetic, which says: (n*n)%m == ((n%m) * (n%m))%m. So the code shown by BLUEPIXY, which follows the exponentiation by squaring method, will work well for finding the k LSDs.
Now, the most significant digits (first k digits) of N^N can be found in this way:
We know,
N^N = 10^(N log10 N)
So if you calculate N log10(N), you will get a number of the form xxxx.yyyy. Now we have to use this number as a power of 10. The integer part xxxx only contributes a factor of 10^xxxx, which shifts the decimal point and does not change the digits, so it is not important for you! That means, if you calculate 10^0.yyyy, you will get the significant digits you are looking for.
So the solution will be something like this:
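double R = N * log10(N);
R = R - (long long) R; // keep only the fractional part
double V = pow(10, R); // 1 <= V < 10
long long powerK = 1;
for (int i = 0; i < k - 1; i++) powerK *= 10;
V *= powerK; // V now has k digits before the decimal point
// Print the first k digits: the integer part of V
// (a double holds about 15-16 significant digits, so accuracy degrades for large N)
printf("%lld\n", (long long) V);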
Why don't you want to use bigint libraries?
Bignum arithmetic is very hard to do right and efficiently. You could still get a PhD by working on that subject.
First, bigint arithmetic involves non-trivial algorithms.
Then, bigint implementations usually need some machine instructions (like add with carry) which are not easily accessible in plain C.
For your specific problem (first and last few digits of N^N), you had better also reason on paper (using arithmetic theorems) to lower the complexity. I am not an expert, but I guess that it still remains intractable, perhaps with a complexity worse than O(N).
