How to sum large numbers?

I am trying to calculate 1 + 1 * 2 + 1 * 2 * 3 + 1 * 2 * 3 * 4 + ... + 1 * 2 * ... * n where n is the user input.
It works for values of n up to 12. I want to calculate the sum for n = 13, n = 14 and n = 15. How do I do that in C89? As far as I know, I can use unsigned long long int only in C99 or C11.
Input 13, result 2455009817, expected 6749977113
Input 14, result 3733955097, expected 93928268313
Input 15, result 1443297817, expected 1401602636313
My code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned long int n;
unsigned long int P = 1;
int i;
unsigned long int sum = 0;
scanf("%lu", &n);
for(i = 1; i <= n; i++)
{
P *= i;
sum += P;
}
printf("%lu", sum);
return 0;
}

In practice, you want some arbitrary precision arithmetic (a.k.a. bigint or bignum) library. My recommendation is GMPlib but there are other ones.
Don't try to code your own bignum library. Efficient & clever algorithms exist, but they are unintuitive and difficult to grasp (you can find entire books devoted to that question). In addition, existing libraries like GMPlib are taking advantage of specific machine instructions (e.g. ADC -add with carry) that a standard C compiler won't emit (from pure C code).
If this is homework and you are not allowed to use external code, consider for example representing a number in base (radix) 1000000000 (one billion) and coding the operations yourself in a very naive way, similar to the arithmetic you learned as a kid. But be aware that more efficient algorithms exist (and that real bignum libraries use them).
A number could be represented in base 1000000000 by having an array of unsigned, each being a "digit" of base 1000000000. So you need to manage arrays (probably heap allocated, using malloc) and their length.
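The naive "digits in an array" idea can be sketched as follows. This is only an illustration under some assumptions: the names Big, big_set, big_mul_small, big_add, big_to_string and factorial_sum are made up for this example, the capacity is fixed instead of heap allocated, and it uses base 10000 rather than one billion so that every intermediate product still fits in a 32-bit unsigned long in C89 (base one billion would need a 64-bit intermediate for the multiplication).

```c
#include <stdio.h>

#define BASE 10000UL   /* 4 decimal digits per limb (assumption, see text) */
#define MAX_LIMBS 64   /* hypothetical fixed capacity: 256 decimal digits */

typedef struct {
    unsigned long limb[MAX_LIMBS]; /* least significant limb first */
    int len;                       /* number of limbs in use, at least 1 */
} Big;

/* a = v */
static void big_set(Big *a, unsigned long v)
{
    a->len = 0;
    do {
        a->limb[a->len++] = v % BASE;
        v /= BASE;
    } while (v != 0);
}

/* a *= m, schoolbook style; m must be small enough that
   (BASE - 1) * m + carry still fits in an unsigned long */
static void big_mul_small(Big *a, unsigned long m)
{
    unsigned long carry = 0;
    int i;
    for (i = 0; i < a->len; i++) {
        unsigned long t = a->limb[i] * m + carry;
        a->limb[i] = t % BASE;
        carry = t / BASE;
    }
    while (carry != 0 && a->len < MAX_LIMBS) {
        a->limb[a->len++] = carry % BASE;
        carry /= BASE;
    }
}

/* a += b, limb by limb with carry */
static void big_add(Big *a, const Big *b)
{
    unsigned long carry = 0;
    int i;
    for (i = 0; (i < a->len || i < b->len || carry != 0) && i < MAX_LIMBS; i++) {
        unsigned long t = carry;
        if (i < a->len) t += a->limb[i];
        if (i < b->len) t += b->limb[i];
        a->limb[i] = t % BASE;
        carry = t / BASE;
        if (i >= a->len) a->len = i + 1;
    }
}

/* Writes the decimal representation of a into out (NUL-terminated);
   out needs room for 4 * MAX_LIMBS + 1 characters. */
static void big_to_string(const Big *a, char *out)
{
    int i;
    out += sprintf(out, "%lu", a->limb[a->len - 1]);
    for (i = a->len - 2; i >= 0; i--)
        out += sprintf(out, "%04lu", a->limb[i]);
}

/* sum = 1! + 2! + ... + n! */
static void factorial_sum(unsigned long n, Big *sum)
{
    Big p;
    unsigned long i;
    big_set(&p, 1UL);
    big_set(sum, 0UL);
    for (i = 1; i <= n; i++) {
        big_mul_small(&p, i);
        big_add(sum, &p);
    }
}
```

Calling factorial_sum(15, &s) and then big_to_string(&s, buf) should leave "1401602636313" in buf, the value the question expects for n = 15.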

You could use a double, especially if your platform uses IEEE754.
Such a double gives you 53 bits of precision, which means integers are exact up to the 53rd power of 2. That's good enough for this case.
If your platform doesn't use IEEE754 then consult the documentation on the floating point scheme adopted. It might be adequate.
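A minimal sketch of the double approach, assuming an IEEE 754 double; the function name factorial_sum_double is made up for this example. Every intermediate value stays an integer below 2^53 for n up to 18, so the arithmetic is exact in that range.

```c
#include <stdio.h>

/* 1! + 2! + ... + n! computed in double.
   Exact only while every intermediate value is below 2^53,
   which holds for n <= 18 with an IEEE 754 double (assumption). */
static double factorial_sum_double(unsigned n)
{
    double p = 1.0;   /* running factorial */
    double sum = 0.0;
    unsigned i;
    for (i = 1; i <= n; i++) {
        p *= i;
        sum += p;
    }
    return sum;
}
```

The result can be printed without a fractional part using printf("%.0f", factorial_sum_double(15)), which should give 1401602636313.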

A simple approach when you're just over the limit of MaxInt is to do the computation twice. First, do it in integer arithmetic modulo 10^n for a suitable n (a digit count here, not the user input); this gives the last n digits of the answer exactly. Second, do the same computation in floating point with everything divided by 10^r; this gives the leading digits of the answer, that is, the answer with its last r digits removed. The last few digits of the floating-point result will be inaccurate due to roundoff errors, so you should choose r a bit smaller than n so that the two results overlap. In this case taking n = 9 and r = 5 will work well.

Related

Buggy transfer of single long long numbers to int array

I’m trying to grab a Long Long Int and split each place number into its own spot in an array, in order of course, with array[0] being the largest number.
So for instance, if the number was 314, then array[0] = 3, array[1] =1, and array[2] = 4.
This is part of a calculator project for a microcontroller where I’m writing the graphics library (for fun) and using arrays to display each line.
The issue is, it needs to be able to deal with really large numbers (9,999,999,999+), and I’m having dramas with the large stuff. If the Long Long is < 1,000,000, it writes all the numbers perfectly, but the more numbers I add, they all start to be written wrong towards the end.
For instance, 1,234,567,890 displays as 1,234,567,966.
Here’s the snippet of code I’m using:
long long int number = 1234567890;
int answerArray[10];
int numberLength = 10;
for(writeNumber = 0; writeNumber < numberLength; writeNumber++)
{
answerArray[writeNumber] = ((int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
}
I’m fairly sure this has to do with either the “%” or the multiple data types, because any number within the Int range works perfectly.
Can you see where I’m going wrong? Is there a better way achieve my goal? Any tips for large numbers?

The signature of pow is
double pow(double x, double y);
When you call the function, the computation is implicitly done in floating point. That is why the result is no longer exact, as it would be with pure integer operations.
In addition, you have to be careful how you cast to int.
In your question, you have
((int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
The parentheses do not match, so I will assume you meant:
(int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
However, here you cast a number that may exceed the range of int before you apply the modulo 10 operation. That can result in an integer overflow. The code is doing the same as if you had written:
((int)(number / pow(10, (numberLength - 1 - writeNumber)))) % 10;
To avoid the overflow, it would be better to perform the modulo operation first. However, you are dealing implicitly with double at this point (because of pow), so it is not ideal either. It is best to stick with pure integer operations to avoid these pitfalls.

Your issue is that you're casting what is potentially a very large number to an int. Look at the iteration when writeNumber is numberLength-1. In that case, you're dividing a long long by 1 and then forcing the result into an int. Once number becomes larger than 2^31-1, you're going to run into problems.
You should remove the cast altogether as well as the call to pow. Instead, you should iteratively grab the next digit by modding out by 10 and then dividing number (or a copy of it) by 10.
E.g.,
int index = sizeof(answerArray)/sizeof(answerArray[0]);
for (long long x=number; x>0; x /= 10) {
answerArray[--index] = x%10;
}
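For completeness, here is a self-contained sketch of the same integer-only idea that also stores the digits starting at index 0 and handles zero. The function name split_digits and its capacity parameter are hypothetical, and it assumes the number is not negative.

```c
#include <stddef.h>

/* Writes the decimal digits of number (assumed >= 0) into out,
   most significant digit first, using only integer arithmetic.
   Returns the number of digits written, or 0 if out is too small. */
static size_t split_digits(long long number, int *out, size_t capacity)
{
    size_t count = 0;
    size_t i;
    long long x = number;

    /* Count the digits first; zero still has one digit. */
    do {
        count++;
        x /= 10;
    } while (x > 0);

    if (count > capacity)
        return 0;

    /* Fill from the last position backwards: x % 10 is the lowest digit. */
    x = number;
    for (i = count; i-- > 0; ) {
        out[i] = (int)(x % 10);
        x /= 10;
    }
    return count;
}
```

With number = 1234567890 this should give 10 digits, with out[0] equal to 1 and out[9] equal to 0, with no floating point involved.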

How to compute the digits of an irrational number one by one?

I want to read digit by digit the decimals of the sqrt of 5 in C.
The square root of 5 is 2,23606797749979..., so this'd be the expected output:
2
3
6
0
6
7
9
7
7
...
I've found the following code:
#include<stdio.h>
void main()
{
int number;
float temp, sqrt;
printf("Provide the number: \n");
scanf("%d", &number);
// store the half of the given number e.g from 256 => 128
sqrt = number / 2;
temp = 0;
// Iterate until sqrt is different of temp, that is updated on the loop
while(sqrt != temp){
// initially 0, is updated with the initial value of 128
// (on second iteration = 65)
// and so on
temp = sqrt;
// Then, replace values (256 / 128 + 128 ) / 2 = 65
// (on second iteration 34.46923076923077)
// and so on
sqrt = ( number/temp + temp) / 2;
}
printf("The square root of '%d' is '%f'", number, sqrt);
}
But this approach stores the result in a float variable, and I don't want to depend on the limits of the float types, as I would like to extract like 10,000 digits, for instance. I also tried to use the native sqrt() function and casting it to string number using this method, but I faced the same issue.

What you've asked about is a very hard problem, and whether it's even possible to do "one by one" (i.e. without working space requirement that scales with how far out you want to go) depends on both the particular irrational number and the base you want it represented in. For example, in 1995 when a formula for pi was discovered that allows computing the nth binary digit in O(1) space, this was a really big deal. It was not something people expected to be possible.
If you're willing to accept O(n) space, then some cases like the one you mentioned are fairly easy. For example, if you have the first n digits of the square root of a number as a decimal string, you can simply try appending each digit 0 to 9, then squaring the string with long multiplication (same as you learned in grade school), and choosing the last one that doesn't overshoot. Of course this is very slow, but it's simple. The easy way to make it a lot faster (but still asymptotically just as bad) is using an arbitrary-precision math library in place of strings. Doing significantly better requires more advanced approaches and in general may not be possible.
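The "append a digit, square, keep the last one that doesn't overshoot" idea can be sketched like this. It is a deliberately slow illustration, not an efficient method; the names square_digits, not_above_target and sqrt_digits are made up, digits are stored in int arrays instead of strings, and the input is assumed to be between 1 and 99 so that the integer part of the root is a single digit.

```c
#define MAX_DIGITS 100 /* hypothetical cap on the number of digits produced */

/* sq = a * a by long multiplication. a holds n decimal digits, most
   significant first; sq receives exactly 2 * n digits (leading zeros kept). */
static void square_digits(const int *a, int n, int *sq)
{
    int i, j;
    for (i = 0; i < 2 * n; i++)
        sq[i] = 0;
    for (i = n - 1; i >= 0; i--) {
        int carry = 0;
        for (j = n - 1; j >= 0; j--) {
            int t = sq[i + j + 1] + a[i] * a[j] + carry;
            sq[i + j + 1] = t % 10;
            carry = t / 10;
        }
        sq[i] += carry;
    }
}

/* Returns 1 if sq (len2 digits) is <= value followed by len2 - 2 zeros,
   where value is written with two digits (value in 0..99). */
static int not_above_target(const int *sq, int len2, int value)
{
    int i;
    for (i = 0; i < len2; i++) {
        int t = (i == 0) ? value / 10 : (i == 1) ? value % 10 : 0;
        if (sq[i] != t)
            return sq[i] < t;
    }
    return 1; /* exactly equal */
}

/* Computes the first count digits of sqrt(value), 1 <= value <= 99.
   root[0] is the integer part, root[1], root[2], ... are the decimals.
   Returns the number of digits written (at most MAX_DIGITS). */
static int sqrt_digits(int value, int count, int *root)
{
    int sq[2 * MAX_DIGITS];
    int len, d;
    if (count > MAX_DIGITS)
        count = MAX_DIGITS;
    for (len = 0; len < count; len++) {
        /* Try 9, 8, ..., 0 and keep the largest digit that does not
           overshoot; 0 always fits, so the loop always breaks. */
        for (d = 9; d >= 0; d--) {
            root[len] = d;
            square_digits(root, len + 1, sq);
            if (not_above_target(sq, 2 * (len + 1), value))
                break;
        }
    }
    return count;
}
```

For value = 5 the digits come out as 2, 2, 3, 6, 0, 6, 7, 9, 7, 7, ..., matching the expected output in the question. Each new digit costs up to ten full squarings, so the total work grows roughly with the cube of the number of digits.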

As already noted, you need to change the algorithm into a digit-by-digit one (there are some examples in the Wikipedia page about the methods of computing of the square roots) and use an arbitrary precision arithmetic library to perform the calculations (for instance, GMP).
In the following snippet I implemented the aforementioned algorithm using GMP (but not the square root function that the library provides). Instead of calculating one decimal digit at a time, this implementation uses a larger base, the greatest power of 10 that fits inside an unsigned long, so that it can produce 9 or 18 decimal digits at every iteration.
It also uses an adapted Newton method to find the actual "digit".
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <gmp.h>
unsigned long max_ul(unsigned long a, unsigned long b)
{
return a < b ? b : a;
}
int main(int argc, char *argv[])
{
// The GMP functions accept 'unsigned long int' values as parameters.
// The algorithm implemented here can work with bases other than 10,
// so that it can evaluate more than one decimal digit at a time.
const unsigned long base = sizeof(unsigned long) > 4
? 1000000000000000000
: 1000000000;
const unsigned long decimals_per_digit = sizeof(unsigned long) > 4 ? 18 : 9;
// Extract the number to be square rooted and the desired number of decimal
// digits from the command line arguments. Fallback to 0 in case of errors.
const unsigned long number = argc > 1 ? atoi(argv[1]) : 0;
const unsigned long n_digits = argc > 2 ? atoi(argv[2]) : 0;
// All the variables used by GMP need to be properly initialized before use.
// 'c' is basically the remainder, initially set to the original number
mpz_t c;
mpz_init_set_ui(c, number);
// At every iteration, the algorithm "moves to the left" by two "digits"
// the remainder, so it multiplies it by base^2.
mpz_t base_squared;
mpz_init_set_ui(base_squared, base);
mpz_mul(base_squared, base_squared, base_squared);
// 'p' stores the digits of the root found so far. The others are helper variables
mpz_t p;
mpz_init_set_ui(p, 0UL);
mpz_t y;
mpz_init(y);
mpz_t yy;
mpz_init(yy);
mpz_t dy;
mpz_init(dy);
mpz_t dx;
mpz_init(dx);
mpz_t pp;
mpz_init(pp);
// Timing, for testing purposes
clock_t start = clock(), diff;
unsigned long x_max = number;
// Each "digit" corresponds to some decimal digits
for (unsigned long i = 0,
last = (n_digits + decimals_per_digit) / decimals_per_digit + 1UL;
i < last; ++i)
{
// Find the greatest x such that: x * (2 * base * p + x) <= c
// where x is in [0, base), using a specialized Newton method
// pp = 2 * base * p
mpz_mul_ui(pp, p, 2UL * base);
unsigned long x = x_max;
for (;;)
{
// y = x * (pp + x)
mpz_add_ui(yy, pp, x);
mpz_mul_ui(y, yy, x);
// dy = y - c
mpz_sub(dy, y, c);
// If y <= c we have found the correct x
if ( mpz_sgn(dy) <= 0 )
break;
// Newton's step: dx = dy/y' where y' = 2 * x + pp
mpz_add_ui(yy, yy, x);
mpz_tdiv_q(dx, dy, yy);
// Update x even if dx == 0 (last iteration)
x -= max_ul(mpz_get_si(dx), 1);
}
x_max = base - 1;
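// The actual format of the printed "digits" is up to you.
// Every "digit" is printed, zero-padded to its full width, four per line
// (the original skipped the "digit" whenever i was a nonzero multiple of 4).
if (i == 0)
printf("%lu.", x);
else
printf("%0*lu", (int)decimals_per_digit, x);
if (i % 4 == 0)
putchar('\n');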
// p = base * p + x
mpz_mul_ui(p, p, base);
mpz_add_ui(p, p, x);
// c = (c - y) * base^2
mpz_sub(c, c, y);
mpz_mul(c, c, base_squared);
}
diff = clock() - start;
long int msec = diff * 1000L / CLOCKS_PER_SEC;
printf("\n\nTime taken: %ld.%03ld s\n", msec / 1000, msec % 1000);
// Final cleanup
mpz_clear(c);
mpz_clear(base_squared);
mpz_clear(p);
mpz_clear(pp);
mpz_clear(dx);
mpz_clear(y);
mpz_clear(dy);
mpz_clear(yy);
}
You can see the outputted digits here.

Your title says:
How to compute the digits of an irrational number one by one?
Irrational numbers are not limited to most square roots. They also include numbers of the form log(x), exp(z), sin(y), etc. (transcendental numbers). However, there are some important factors that determine whether or how fast you can compute a given irrational number's digits one by one (that is, from left to right).
Not all irrational numbers are computable; that is, for some of them no algorithm exists that approximates them to any desired precision (whether by a closed form expression, a series, or otherwise).
There are many ways numbers can be expressed, such as by their binary or decimal expansions, as continued fractions, as series, etc. And there are different algorithms to compute a given number's digits depending on the representation.
Some formulas compute a given number's digits in a particular base (such as base 2), not in an arbitrary base.
For example, besides the first formula to extract the digits of π without computing the previous digits, there are other formulas of this type (known as BBP-type formulas) that extract the digits of certain irrational numbers. However, these formulas only work for a particular base, not all BBP-type formulas have a formal proof, and most importantly, not all irrational numbers have a BBP-type formula (essentially, only certain log and arctan constants do, not numbers of the form exp(x) or sqrt(x)).
On the other hand, if you can express an irrational number as a continued fraction (which all real numbers have), you can extract its digits from left to right, and in any base desired, using a specific algorithm. What is more, this algorithm works for any real number constant, including square roots, exponentials (e and exp(x)), logarithms, etc., as long as you know how to express it as a continued fraction. For an implementation see "Digits of pi and Python generators". See also Code to Generate e one Digit at a Time.

Calculating sum of digits of 2^n in C

I am new to C and trying to write a program that calculates the sum of the digits of 2^n, where n<10^8.
For example, for 2^10, we'd have 1+0+2+4, which is 7.
Here's what I came up with:
#include <stdio.h>
#include <math.h>
int main()
{
int n, t, sum = 0, remainder;
printf("Enter an integer\n");
scanf("%d", &n);
t = pow(2, n);
while (t != 0)
{
remainder = t % 10;
sum = sum + remainder;
t = t / 10;
}
printf("Sum of digits of 2 to the power of %d = %d\n", n, sum);
return 0;
}
The problem is: the program works fine with numbers smaller than 30. Once I set n to a number higher than 30, the result is always -47.
I really do not understand this error and what causes it.

An interesting problem to be sure, but I think the solution is way outside the scope of a simple answer if you wish to support large values of n, such as the 10^8 you mentioned. The number 2^(10^8) requires 10^8 + 1 (100,000,001) bits, or around 12 megabytes of memory, to store in binary. In decimal it has around 30 million digits.
Your int is 32 bits wide, which is why the signed int can't store 2^31: the 32nd bit is the sign, while 2^31 is a 1 followed by 31 zeros in binary, requiring 32 bits without the sign. So it overflows and is interpreted as a negative number. (Technically signed integer overflow is undefined behaviour in C.)
You can switch to an unsigned int to get rid of the sign and the undefined behaviour, in which case your new highest supported n will be 31. You almost certainly have 64-bit integers available, and perhaps even 128-bit, but 2^127 is still way less than 2^100000000.
So either you need to find an algorithm to compute the decimal digits of a power of 2 without actually storing them (and only store the sum), or forget about trying to use any scalar types in standard C and get (or implement) an arbitrary precision math library operating on arrays (of bits, decimal digits, or binary-coded decimal digits). Alternatively, you can limit your solution to, say, uint64_t, but then you have n < 64, which is not nearly as interesting… =)
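As a rough sketch of the "array of decimal digits" route, the number can be doubled n times in a digit array and the digits summed at the end. The function name pow2_digit_sum is made up for this example. This takes time proportional to n squared, so it only illustrates the idea for moderate n and is nowhere near practical for n around 10^8.

```c
#include <stdlib.h>

/* Returns the sum of the decimal digits of 2^n,
   or -1 if the digit buffer cannot be allocated. */
static long pow2_digit_sum(unsigned n)
{
    /* 2^n has floor(n * log10(2)) + 1 digits, which is at most n / 3 + 1. */
    size_t cap = (size_t)n / 3 + 1;
    size_t len = 1;
    size_t i;
    unsigned k;
    long sum = 0;
    unsigned char *digit = malloc(cap); /* least significant digit first */

    if (digit == NULL)
        return -1;
    digit[0] = 1; /* start from 2^0 = 1 */

    for (k = 0; k < n; k++) {
        /* Double the number in place, propagating the carry. */
        unsigned carry = 0;
        for (i = 0; i < len; i++) {
            unsigned t = digit[i] * 2u + carry;
            digit[i] = (unsigned char)(t % 10);
            carry = t / 10;
        }
        if (carry != 0)
            digit[len++] = (unsigned char)carry;
    }

    for (i = 0; i < len; i++)
        sum += digit[i];
    free(digit);
    return sum;
}
```

For n = 10 this gives 7, as in the question, and for n = 31 it gives 47. That also hints at where the -47 came from: the out-of-range result ended up as -2147483648 on that platform, whose digits sum to 47 with a negative sign.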

For signed int t = pow(2,n), if n >= 31 then t > INT_MAX.
You can use unsigned long long t = pow(2,n) instead.
This will allow you to go as high as n == 63.
Also, since you're using base 2, you can use (unsigned long long)1 << n instead of pow(2,n).
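Put together, the shift-based version might look like the sketch below. The function name pow2_digit_sum_u64 is made up, and it assumes unsigned long long is at least 64 bits wide (guaranteed from C99 on).

```c
/* Sum of the decimal digits of 2^n using a 64-bit integer.
   Only valid for n <= 63; returns -1 for larger n. */
static int pow2_digit_sum_u64(unsigned n)
{
    unsigned long long t;
    int sum = 0;

    if (n > 63)
        return -1;

    t = 1ULL << n; /* exact, no floating point involved */
    while (t != 0) {
        sum += (int)(t % 10);
        t /= 10;
    }
    return sum;
}
```

For n = 10 this returns 7, and for n = 63 (2^63 = 9223372036854775808) it returns 89.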

Upper bound for number of digits of big integer in different base

I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory.
Example:
A 640 bit number has 640 digits in base 2, but only ten digits in base 2^64, so I will have to allocate ten 64 bit integers to hold the result.
The function I am currently using is:
int get_num_digits_in_different_base(int n_digits, double src_base, double dst_base){
return ceil(n_digits*log(src_base)/log(dst_base));
}
Where src_base is in {2, ..., 10 + 26} and dst_base is in {2^8, 2^16, 2^32, 2^64}.
I am not sure if the result will always be correctly rounded though. log2 would be easier to reason about, but I read that older versions of Microsoft Visual C++ do not support that function. It could be emulated like log2(x) = log(x)/log(2) but now I am back where I started.
GMP probably implements a function to do base conversion, but I may not read the source or else I might get GPL cancer so I can not do that.

I imagine speed is of some concern, or else you could just try the floating point-based estimate and adjust if it turned out to be too small. In that case, one can sacrifice tightness of the estimate for speed.
In the following, let dst_base be 2^w, src_base be b, and n_digits be n.
Let k(b,w) = max {j | b^j ≤ 2^w}. This is the largest j such that every j-digit number in base b is guaranteed to fit within a w-bit (non-negative) binary integer. Because of the relatively small number of source and destination bases, these values can be precomputed and looked up in a table, but mathematically k(b,w) = [w log 2 / log b] (where [.] denotes the integer part).
For a given n let m=ceil( n / k(b,w) ). Then the maximum number of dst_base digits required to hold a number less than b^n is:
ceil(log (b^n-1)/log (2^w)) ≤ ceil(log (b^n) / log (2^w) )
≤ ceil( m . log (b^k(b,w)) / log (2^w) ) ≤ m.
In short, if you precalculate the k(b,w) values, you can quickly get an upper bound (which is not tight!) by dividing n by k, rounding up.
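A sketch of that estimate using only integer arithmetic. The names digits_per_word and words_upper_bound are made up for this example, and it assumes unsigned long long is exactly 64 bits. Here k is computed as the largest j for which the largest j-digit base-b number, b^j - 1, still fits in w bits, which avoids having to represent 2^64 itself.

```c
#include <limits.h>

/* k(b, w): the largest j such that every j-digit number in base b
   fits in a w-bit unsigned integer. Assumes 2 <= b <= 36 and
   8 <= w <= 64. In practice these values could live in a small table. */
static unsigned digits_per_word(unsigned b, unsigned w)
{
    unsigned long long limit = (w >= 64) ? ULLONG_MAX : (1ULL << w) - 1;
    unsigned long long max_val = 0; /* largest j-digit number, b^j - 1 */
    unsigned j = 0;

    /* Extend by one digit while max_val * b + (b - 1) <= limit. */
    while (max_val <= (limit - (b - 1)) / b) {
        max_val = max_val * b + (b - 1);
        j++;
    }
    return j;
}

/* Upper bound (not tight) on the number of w-bit words needed
   to hold any n-digit number in base b: ceil(n / k(b, w)). */
static unsigned long words_upper_bound(unsigned long n, unsigned b, unsigned w)
{
    unsigned long k = digits_per_word(b, w);
    return (n + k - 1) / k;
}
```

For the example in the question, words_upper_bound(640, 2, 64) gives 10. For decimal input and 64-bit words, k is 19, so a 20-digit decimal string is allotted 2 words.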

I'm not sure about floating-point rounding in this case, but it is relatively easy to implement this using only integers, as log2 is a classic bit manipulation pattern and integer division can be easily rounded up. The following code is equivalent to yours, but using integers:
// Returns log2(x) rounded up using bit manipulation (not most efficient way).
// Named ceil_log2 because log2 is already declared by <math.h> since C99.
// x must be at least 1.
unsigned int ceil_log2(unsigned int x)
{
unsigned int y = 0;
--x;
while (x) {
y++;
x >>= 1;
}
return y;
}
// Returns ceil(a/b) using integer division
unsigned int roundup(unsigned int a, unsigned int b)
{
return (a + b - 1) / b;
}
unsigned int get_num_digits_in_different_base(unsigned int n_digits, unsigned int src_base, unsigned int log2_dst_base)
{
return roundup(n_digits * ceil_log2(src_base), log2_dst_base);
}
Please, note that:
This function returns different results compared to yours! However, in every case I looked at, both were still correct (the smaller value was more accurate, but your requirement is just an upper bound).
The integer version I wrote receives log2_dst_base instead of dst_base to avoid overflow for 2^64.
log2 can be made more efficient using lookup tables.
I've used unsigned int instead of int.

accuracy of sqrt of integers

I have a loop like this:
for(uint64_t i=0; i*i<n; i++) {
This requires doing a multiplication every iteration. If I could calculate the sqrt before the loop then I could avoid this.
unsigned cut = sqrt(n);
for(uint64_t i=0; i<cut; i++) {
In my case it's okay if the sqrt function rounds up to the next integer but it's not okay if it rounds down.
My question is: is the sqrt function accurate enough to do this for all cases?
Edit: Let me list some cases. If n is a perfect square so that n = y^2 my question would be - is cut=sqrt(n)>=y for all n? If cut=y-1 then there is a problem. E.g. if n = 120 and cut = 10 it's okay but if n=121 (11^2) and cut is still 10 then it won't work.
My first concern was the fractional part of float only has 23 bits and double 52 so they can't store all the digits of some 32-bit or 64-bit integers. However, I don't think this is a problem. Let's assume we want the sqrt of some number y but we can't store all the digits of y. If we let the fraction of y we can store be x we can write y = x + dx then we want to make sure that whatever dx we choose does not move us to the next integer.
sqrt(x+dx) < sqrt(x) + 1 //solve
dx < 2*sqrt(x) + 1
// e.g for x = 100 dx < 21
// sqrt(100+20) < sqrt(100) + 1
Float can store 23 bits so we let y = 2^23 + 2^9. This is more than sufficient since 2^9 < 2*sqrt(2^23) + 1. It's easy to show this for double as well with 64-bit integers. So although they can't store all the digits, as long as the sqrt of what they can store is accurate then the sqrt(fraction) should be sufficient. Now let's look at what happens for integers close to UINT_MAX and the sqrt:
unsigned xi = -1-1;
printf("%u %u\n", xi, (unsigned)(float)xi); //4294967294 4294967295
printf("%u %u\n", (unsigned)sqrt(xi), (unsigned)sqrtf(xi)); //65535 65536
Since float can't store all the digits of 2^32-2 and double can, they get different results for the sqrt. But the float version of the sqrt is one integer larger. This is what I want. For 64-bit integers, as long as the sqrt of the double always rounds up it's okay.

First, integer multiplication is really quite cheap. So long as you have more than a few cycles of work per loop iteration and one spare execute slot, it should be entirely hidden by reorder on most non-tiny processors.
If you did have a processor with dramatically slow integer multiply, a truly clever compiler might transform your loop to:
for (uint64_t i = 0, j = 0; j < n; j += 2*i+1, i++)
replacing the multiply with an lea or a shift and two adds.
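To make that transformation concrete, here is a small sketch that runs both loop shapes and counts their iterations; the function names are made up. The running value j always equals i*i, because (i+1)^2 = i^2 + 2*i + 1. It assumes n is small enough that the next square does not overflow 64 bits.

```c
#include <stdint.h>

/* Number of iterations of the original loop: i*i < n. */
static uint64_t count_with_multiply(uint64_t n)
{
    uint64_t i, count = 0;
    for (i = 0; i * i < n; i++)
        count++;
    return count;
}

/* Same loop with the multiply replaced by a running square:
   j holds i*i and is advanced by 2*i + 1 on each step. */
static uint64_t count_with_running_square(uint64_t n)
{
    uint64_t i, j, count = 0;
    for (i = 0, j = 0; j < n; j += 2 * i + 1, i++)
        count++;
    return count;
}
```

Both functions return 11 for n = 120 and for n = 121 (i runs from 0 to 10), and 12 for n = 122.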
Those notes aside, let’s look at your question as stated. No, you can’t just use i < sqrt(n). Counter-example: n = 0x20000000000000. Assuming adherence to IEEE-754, you will have cut = 0x5a82799, and cut*cut is 0x1ffffff8eff971.
However, a basic floating-point error analysis shows that the error in computing sqrt(n) (before conversion to integer) is bounded by 3/4 of an ULP. So you can safely use:
uint32_t cut = sqrt(n) + 1;
and you’ll perform at most one extra loop iteration, which is probably acceptable. If you want to be totally precise, instead use:
uint32_t cut = sqrt(n);
cut += (uint64_t)cut*cut < n;
Edit: z boson clarifies that for his purposes, this only matters when n is an exact square (otherwise, getting a value of cut that is “too small by one” is acceptable). In that case, there is no need for the adjustment and one can safely just use:
uint32_t cut = sqrt(n);
Why is this true? It’s pretty simple to see, actually. Converting n to double introduces a perturbation:
double_n = n*(1 + e)
which satisfies |e| < 2^-53. The mathematical square root of this value can be expanded as follows:
square_root(double_n) = square_root(n)*square_root(1+e)
Now, since n is assumed to be a perfect square with at most 64 bits, square_root(n) is an exact integer with at most 32 bits, and is the mathematically precise value that we hope to compute. To analyze the square_root(1+e) term, use a Taylor series about 1:
square_root(1+e) = 1 + e/2 + O(e^2)
= 1 + d with |d| <~ 2^-54
Thus, the mathematically exact value square_root(double_n) is less than half an ULP away from[1] the desired exact answer, and necessarily rounds to that value.
[1] I’m being fast and loose here in my abuse of relative error estimates, where the relative size of an ULP actually varies across a binade — I’m trying to give a bit of the flavor of the proof without getting too bogged down in details. This can all be made perfectly rigorous, it just gets to be a bit wordy for Stack Overflow.

All my answer is useless if you have access to IEEE 754 double precision floating point, since Stephen Canon demonstrated both
a simple way to avoid imul in loop
a simple way to compute the ceiling sqrt
Otherwise, if for some reason you have a non IEEE 754 compliant platform, or only single precision, you could get the integer part of square root with a simple Newton-Raphson loop. For example in Squeak Smalltalk we have this method in Integer:
sqrtFloor
"Return the integer part of the square root of self"
| guess delta |
guess := 1 bitShift: (self highBit + 1) // 2.
[
delta := (guess squared - self) // (guess + guess).
delta = 0 ] whileFalse: [
guess := guess - delta ].
^guess - 1
Where // is operator for quotient of integer division.
Final guard guess*guess <= self ifTrue: [^guess]. can be avoided if initial guess is fed in excess of exact solution as is the case here.
Initializing with approximate float sqrt was not an option because integers are arbitrarily large and might overflow
But here, you could seed the initial guess with floating point sqrt approximation, and my bet is that the exact solution will be found in very few loops. In C that would be:
uint32_t sqrtFloor(uint64_t n)
{
int64_t diff;
int64_t delta;
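uint64_t guess=sqrt(n); /* implicit conversions here... */
if( guess == 0 ) return 0; /* n == 0: avoid dividing by zero below */
/* divide in signed arithmetic, since diff may be negative */
while( (delta = (diff=guess*guess-n) / (int64_t)(guess+guess)) != 0 )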
guess -= delta;
return guess-(diff>0);
}
That's a few integer multiplications and divisions, but outside the main loop.
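If no trustworthy floating-point sqrt is available at all, a float-free variant can be sketched as below. Note that this is the common x = (x + n/x) / 2 form of the Newton iteration rather than a line-by-line translation of the Smalltalk method; only the initial guess (a power of two above the root) follows the same idea. The function name isqrt_u64 is made up.

```c
#include <stdint.h>

/* Integer part of the square root of n, using only integer arithmetic. */
static uint32_t isqrt_u64(uint64_t n)
{
    uint64_t x, y, t;
    unsigned bits = 0;

    if (n == 0)
        return 0;

    /* Bit length of n, then an initial guess 2^ceil(bits / 2),
       which is never below the true square root. */
    for (t = n; t != 0; t >>= 1)
        bits++;
    x = (uint64_t)1 << ((bits + 1) / 2);

    /* Newton step; starting from above, the sequence decreases
       until it reaches floor(sqrt(n)). */
    y = (x + n / x) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return (uint32_t)x;
}
```

For example, isqrt_u64(120) and isqrt_u64(121) return 10 and 11. A loop bound that must not be too small can then add one whenever the square of the result is still less than n.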

What you are looking for is a way to calculate a rational upper bound of the square root of a natural number. A continued fraction is what you need; see Wikipedia.
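For x>0, there is the continued fraction (written here in compact one-line form)
sqrt(x) = 1 + (x-1)/(2 + (x-1)/(2 + (x-1)/(2 + ...)))
Truncating the continued fraction by removing the tail term after the (x-1)/2 at each recursion depth, one gets a sequence of approximations of sqrt(x) as below:
1 + (x-1)/2
1 + (x-1)/(2 + (x-1)/2)
1 + (x-1)/(2 + (x-1)/(2 + (x-1)/2))
...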
Upper bounds appear on the odd-numbered lines and get tighter. When the distance between an upper bound and its neighboring lower bound is less than 1, that approximation is what you need. Using that value as cut (which must then be a floating-point number) solves the problem.
For very large number, rational number should be used, so no precision is lost during conversion between integer and floating point number.