Buggy transfer of single long long numbers to int array

I’m trying to grab a Long Long Int and split each digit into its own spot in an array, in order of course, with array[0] holding the most significant digit.
So for instance, if the number was 314, then array[0] = 3, array[1] = 1, and array[2] = 4.
This is part of a calculator project for a microcontroller where I’m writing the graphics library (for fun) and using arrays to display each line.
The issue is, it needs to be able to deal with really large numbers (9,999,999,999+), and I’m having dramas with the large stuff. If the Long Long is < 1,000,000, it writes all the digits perfectly, but the more digits I add, the more of them come out wrong towards the end.
For instance, 1,234,567,890 displays as 1,234,567,966.
Here’s the snippet of code I’m using:
long long int number = 1234567890;
int answerArray[10];
int numberLength = 10;
for(writeNumber = 0; writeNumber < numberLength; writeNumber++)
{
answerArray[writeNumber] = ((int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
}
I’m fairly sure this has to do with either the “%” operator or the mixing of data types, because any number within the Int range works perfectly.
Can you see where I’m going wrong? Is there a better way to achieve my goal? Any tips for large numbers?

The signature of pow is
double pow(double x, double y);
When you call the function, the computation will implicitly use floating point. That is why it is no longer exact the way pure integer operations are.
In addition, you have to be careful how you cast to int.
In your question, you have
((int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
The parentheses do not match, so I will assume you meant:
(int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
However, here you cast a number that may exceed the range of int before you apply the modulo 10 operation. That can result in an integer overflow. The code is doing the same as if you had written:
((int)(number / pow(10, (numberLength - 1 - writeNumber)))) % 10;
To avoid the overflow, it would be better to perform the modulo operation first. However, you are dealing implicitly with double at this point (because of pow), so it is not ideal either. It is best to stick with pure integer operations to avoid these pitfalls.
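As an illustration only (not this answer's code), here is the question's most-significant-digit-first loop kept intact but done in pure integer arithmetic, with the modulo applied before the narrowing cast:
#include <stdio.h>

int main(void)
{
    long long number = 1234567890;
    int answerArray[10];
    int numberLength = 10;
    int writeNumber;
    long long divisor = 1;

    for (writeNumber = 1; writeNumber < numberLength; writeNumber++)
        divisor *= 10;                 /* divisor = 10^(numberLength-1) */

    for (writeNumber = 0; writeNumber < numberLength; writeNumber++) {
        /* take the modulo before narrowing, so the value being cast is always 0..9 */
        answerArray[writeNumber] = (int)((number / divisor) % 10);
        divisor /= 10;
    }

    for (writeNumber = 0; writeNumber < numberLength; writeNumber++)
        printf("%d ", answerArray[writeNumber]);   /* 1 2 3 4 5 6 7 8 9 0 */
    printf("\n");
    return 0;
}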

Your issue is that you're casting what is potentially a very large number to an int. Look at the iteration when writeNumber is numberLength-1. In that case, you're dividing a long long by 1 and then forcing the result into an int. Once number becomes larger than 2^31-1, you're going to run into problems.
You should remove the cast altogether as well as the call to pow. Instead, you should iteratively grab the next digit by modding out by 10 and then dividing number (or a copy of it) by 10.
E.g.,
int index = sizeof(answerArray)/sizeof(answerArray[0]);
for (long long x=number; x>0; x /= 10) {
    answerArray[--index] = x%10;
}
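A complete version of this idea might look like the following (a sketch only; the leading array entries are zero-initialized so shorter numbers still display correctly, and the fixed length of 10 is taken from the question):
#include <stdio.h>

int main(void)
{
    long long number = 1234567890;
    int answerArray[10] = {0};   /* unused leading places stay 0 */
    int index = sizeof(answerArray)/sizeof(answerArray[0]);
    long long x;
    int i;

    for (x = number; x > 0 && index > 0; x /= 10)
        answerArray[--index] = (int)(x % 10);    /* peel off the last decimal digit */

    for (i = 0; i < 10; i++)
        printf("%d ", answerArray[i]);           /* 1 2 3 4 5 6 7 8 9 0 */
    printf("\n");
    return 0;
}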

How to sum large numbers?

I am trying to calculate 1 + 1 * 2 + 1 * 2 * 3 + 1 * 2 * 3 * 4 + ... + 1 * 2 * ... * n where n is the user input.
It works for values of n up to 12. I want to calculate the sum for n = 13, n = 14 and n = 15. How do I do that in C89? As far as I know, I can use unsigned long long int only in C99 or C11.
Input 13, result 2455009817, expected 6749977113
Input 14, result 3733955097, expected 93928268313
Input 15, result 1443297817, expected 1401602636313
My code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned long int n;
    unsigned long int P = 1;
    int i;
    unsigned long int sum = 0;
    scanf("%lu", &n);
    for(i = 1; i <= n; i++)
    {
        P *= i;
        sum += P;
    }
    printf("%lu", sum);
    return 0;
}
In practice, you want some arbitrary precision arithmetic (a.k.a. bigint or bignum) library. My recommendation is GMPlib but there are other ones.
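For instance, the sum from the question could be computed with GMP roughly like this (a sketch; link with -lgmp):
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t P, sum;
    unsigned long i, n = 15;

    mpz_init_set_ui(P, 1);
    mpz_init_set_ui(sum, 0);
    for (i = 1; i <= n; i++) {
        mpz_mul_ui(P, P, i);    /* P *= i, with arbitrary precision */
        mpz_add(sum, sum, P);   /* sum += P */
    }
    gmp_printf("%Zd\n", sum);   /* 1401602636313 for n = 15 */
    mpz_clear(P);
    mpz_clear(sum);
    return 0;
}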
Don't try to code your own bignum library. Efficient & clever algorithms exist, but they are unintuitive and difficult to grasp (you can find entire books devoted to that question). In addition, existing libraries like GMPlib take advantage of specific machine instructions (e.g. ADC, add with carry) that a standard C compiler won't emit from pure C code.
If this is homework and you are not allowed to use external code, consider for example representing a number in base or radix 1000000000 (one billion) and coding the operations yourself in a very naive way, similar to what you learned as a kid. But be aware that more efficient algorithms exist (and that real bignum libraries use them).
A number could be represented in base 1000000000 by an array of unsigned values, each being a "digit" in base 1000000000. So you need to manage arrays (probably heap allocated, using malloc) and their length.
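As a rough sketch of that naive approach (not this answer's code): I use base 10^8 rather than the suggested 10^9 so that a limb times a small factor still fits in a C89 unsigned long, and fixed-size arrays instead of malloc to keep it short:
#include <stdio.h>

#define BASE  100000000UL   /* 10^8 */
#define LIMBS 4             /* plenty of room for the sum of factorials up to 15! */

/* big = big * m for a small factor m (m*BASE must fit in unsigned long) */
static void big_mul_small(unsigned long big[], unsigned long m)
{
    unsigned long carry = 0;
    int i;
    for (i = 0; i < LIMBS; i++) {
        unsigned long t = big[i] * m + carry;
        big[i] = t % BASE;
        carry  = t / BASE;
    }
}

/* acc = acc + big */
static void big_add(unsigned long acc[], const unsigned long big[])
{
    unsigned long carry = 0;
    int i;
    for (i = 0; i < LIMBS; i++) {
        unsigned long t = acc[i] + big[i] + carry;
        acc[i] = t % BASE;
        carry  = t / BASE;
    }
}

static void big_print(const unsigned long big[])
{
    int i = LIMBS - 1;
    while (i > 0 && big[i] == 0) i--;          /* skip leading zero limbs */
    printf("%lu", big[i]);
    while (--i >= 0) printf("%08lu", big[i]);  /* inner limbs are zero-padded */
    printf("\n");
}

int main(void)
{
    unsigned long P[LIMBS]   = {1, 0, 0, 0};   /* running factorial */
    unsigned long sum[LIMBS] = {0, 0, 0, 0};
    unsigned long i, n = 15;
    for (i = 1; i <= n; i++) {
        big_mul_small(P, i);
        big_add(sum, P);
    }
    big_print(sum);                            /* prints 1401602636313 */
    return 0;
}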
You could use a double, especially if your platform uses IEEE754.
Such a double gives you 53 bits of precision, which means integers are exact up to 2^53. That's good enough for this case.
If your platform doesn't use IEEE754 then consult the documentation on the floating point scheme adopted. It might be adequate.
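For example, here is a sketch of the same loop carried out in double; every intermediate value stays below 2^53, so the result is exact:
#include <stdio.h>

int main(void)
{
    unsigned long n = 15;
    unsigned long i;
    double P = 1.0;
    double sum = 0.0;
    for (i = 1; i <= n; i++) {
        P *= (double)i;     /* 15! is about 1.3e12, well below 2^53 */
        sum += P;
    }
    printf("%.0f\n", sum);  /* prints 1401602636313 */
    return 0;
}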
A simple approach when you're just over the limit of MaxInt is to do the computation twice: once with integers modulo 10^n for a suitable n, and once in floating point with everything divided by 10^r. The former gives you the last n digits of the answer exactly, while the latter gives you the leading digits, i.e. the answer with its last r digits dropped. The trailing digits of the floating-point result will be inaccurate due to round-off, so you should choose r a bit smaller than n, leaving an overlap to stitch the two parts together. In this case taking n = 9 and r = 5 works well.

Upper bound for number of digits of big integer in different base

I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory.
Example:
A 640 bit number has 640 digits in base 2, but only ten digits in base 2^64, so I will have to allocate ten 64 bit integers to hold the result.
The function I am currently using is:
int get_num_digits_in_different_base(int n_digits, double src_base, double dst_base){
    return ceil(n_digits*log(src_base)/log(dst_base));
}
Where src_base is in {2, ..., 10 + 26} and dst_base is in {2^8, 2^16, 2^32, 2^64}.
I am not sure if the result will always be correctly rounded though. log2 would be easier to reason about, but I read that older versions of Microsoft Visual C++ do not support that function. It could be emulated like log2(x) = log(x)/log(2) but now I am back where I started.
GMP probably implements a function to do base conversion, but I may not read the source or else I might get GPL cancer so I can not do that.
I imagine speed is of some concern, or else you could just try the floating point-based estimate and adjust if it turned out to be too small. In that case, one can sacrifice tightness of the estimate for speed.
In the following, let dst_base be 2^w, src_base be b, and n_digits be n.
Let k(b,w) = max {j | b^j < 2^w}. This represents the largest power of b that is guaranteed to fit within a w-wide binary (non-negative) integer. Because of the relatively small number of source and destination bases, these values can be precomputed and looked up in a table; mathematically k(b,w) = [w log 2 / log b] (where [.] denotes the integer part), minus one in the corner case where w log 2 / log b is an exact integer (i.e. when b is a power of two whose exponent divides w, so that b^j can equal 2^w exactly).
For a given n let m=ceil( n / k(b,w) ). Then the maximum number of dst_base digits required to hold a number less than b^n is:
ceil(log(b^n - 1)/log(2^w)) ≤ ceil(log(b^n)/log(2^w)) ≤ ceil(m·log(b^k(b,w))/log(2^w)) ≤ m.
In short, if you precalculate the k(b,w) values, you can quickly get an upper bound (which is not tight!) by dividing n by k, rounding up.
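A sketch of this in C (the function names are mine, and k(b,w) is computed on the fly by repeated multiplication rather than looked up in a precomputed table):
#include <stdio.h>
#include <stdint.h>

// Largest k with b^k <= max_word, i.e. b^k < 2^w when max_word = 2^w - 1.
static unsigned largest_fitting_power(uint64_t b, uint64_t max_word)
{
    uint64_t p = 1;
    unsigned k = 0;
    while (p <= max_word / b) {   // p*b would still fit
        p *= b;
        k++;
    }
    return k;
}

// Upper bound on the number of base-2^w digits needed for an n-digit base-b number.
static unsigned digits_upper_bound(unsigned n, uint64_t b, uint64_t max_word)
{
    unsigned k = largest_fitting_power(b, max_word);   // k >= 1 since b <= max_word
    return (n + k - 1) / k;                            // ceil(n / k)
}

int main(void)
{
    // 640 base-2 digits into 64-bit words: k(2,64) = 63, bound = ceil(640/63) = 11.
    // The exact answer is 10, so this illustrates that the bound is not tight.
    printf("%u\n", digits_upper_bound(640, 2, UINT64_MAX));
    return 0;
}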
I'm not sure about floating point rounding in this case, but it is relatively easy to implement this using only integers, as log2 is a classic bit manipulation pattern and integer division can easily be rounded up. The following code is equivalent to yours, but using integers:
// Returns log2(x) rounded up using bit manipulation (not most efficient way)
unsigned int log2(unsigned int x)
{
    unsigned int y = 0;
    --x;
    while (x) {
        y++;
        x >>= 1;
    }
    return y;
}

// Returns ceil(a/b) using integer division
unsigned int roundup(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

unsigned int get_num_digits_in_different_base(unsigned int n_digits, unsigned int src_base, unsigned int log2_dst_base)
{
    return roundup(n_digits * log2(src_base), log2_dst_base);
}
Please, note that:
This function returns different results compared to yours! However, in every case I looked at, both were still correct (the smaller value was more accurate, but your requirement is just an upper bound).
The integer version I wrote receives log2_dst_base instead of dst_base to avoid overflow for 2^64.
log2 can be made more efficient using lookup tables.
I've used unsigned int instead of int.

accuracy of sqrt of integers

I have a loop like this:
for(uint64_t i=0; i*i<n; i++) {
This requires doing a multiplication every iteration. If I could calculate the sqrt before the loop then I could avoid this.
unsigned cut = sqrt(n);
for(uint64_t i=0; i<cut; i++) {
In my case it's okay if the sqrt function rounds up to the next integer but it's not okay if it rounds down.
My question is: is the sqrt function accurate enough to do this for all cases?
Edit: Let me list some cases. If n is a perfect square so that n = y^2 my question would be - is cut=sqrt(n)>=y for all n? If cut=y-1 then there is a problem. E.g. if n = 120 and cut = 10 it's okay but if n=121 (11^2) and cut is still 10 then it won't work.
My first concern was the fractional part of float only has 23 bits and double 52 so they can't store all the digits of some 32-bit or 64-bit integers. However, I don't think this is a problem. Let's assume we want the sqrt of some number y but we can't store all the digits of y. If we let the fraction of y we can store be x we can write y = x + dx then we want to make sure that whatever dx we choose does not move us to the next integer.
sqrt(x+dx) < sqrt(x) + 1 //solve
dx < 2*sqrt(x) + 1
// e.g for x = 100 dx < 21
// sqrt(100+20) < sqrt(100) + 1
Float can store 23 bits so we let y = 2^23 + 2^9. This is more than sufficient since 2^9 < 2*sqrt(2^23) + 1. It's easy to show this for double as well with 64-bit integers. So although they can't store all the digits, as long as the sqrt of what they can store is accurate, the sqrt(fraction) should be sufficient. Now let's look at what happens for integers close to UINT_MAX and the sqrt:
unsigned xi = -1-1;
printf("%u %u\n", xi, (unsigned)(float)xi); //4294967294 4294967295
printf("%u %u\n", (unsigned)sqrt(xi), (unsigned)sqrtf(xi)); //65535 65536
Since float can't store all the digits of 2^32-2 and double can, they get different results for the sqrt. But the float version of the sqrt is one integer larger. This is what I want. For 64-bit integers, as long as the sqrt of the double always rounds up, it's okay.
First, integer multiplication is really quite cheap. So long as you have more than a few cycles of work per loop iteration and one spare execute slot, it should be entirely hidden by reorder on most non-tiny processors.
If you did have a processor with dramatically slow integer multiply, a truly clever compiler might transform your loop to:
for (uint64_t i = 0, j = 0; j < n; j += 2*i+1, i++)
replacing the multiply with an lea or a shift and two adds.
Those notes aside, let’s look at your question as stated. No, you can’t just use i < sqrt(n). Counter-example: n = 0x20000000000000. Assuming adherence to IEEE-754, you will have cut = 0x5a82799, and cut*cut is 0x1ffffff8eff971.
However, a basic floating-point error analysis shows that the error in computing sqrt(n) (before conversion to integer) is bounded by 3/4 of an ULP. So you can safely use:
uint32_t cut = sqrt(n) + 1;
and you’ll perform at most one extra loop iteration, which is probably acceptable. If you want to be totally precise, instead use:
uint32_t cut = sqrt(n);
cut += (uint64_t)cut*cut < n;
Edit: z boson clarifies that for his purposes, this only matters when n is an exact square (otherwise, getting a value of cut that is “too small by one” is acceptable). In that case, there is no need for the adjustment and one can safely just use:
uint32_t cut = sqrt(n);
Why is this true? It’s pretty simple to see, actually. Converting n to double introduces a perturbation:
double_n = n*(1 + e)
which satisfies |e| < 2^-53. The mathematical square root of this value can be expanded as follows:
square_root(double_n) = square_root(n)*square_root(1+e)
Now, since n is assumed to be a perfect square with at most 64 bits, square_root(n) is an exact integer with at most 32 bits, and is the mathematically precise value that we hope to compute. To analyze the square_root(1+e) term, use a Taylor series about 1:
square_root(1+e) = 1 + e/2 + O(e^2)
= 1 + d with |d| <~ 2^-54
Thus, the mathematically exact value square_root(double_n) is less than half an ULP away from[1] the desired exact answer, and necessarily rounds to that value.
[1] I’m being fast and loose here in my abuse of relative error estimates, where the relative size of an ULP actually varies across a binade — I’m trying to give a bit of the flavor of the proof without getting too bogged down in details. This can all be made perfectly rigorous, it just gets to be a bit wordy for Stack Overflow.
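As a quick empirical check of the perfect-square case (my own test harness, not part of this answer), one can sample 32-bit roots and verify that the converted square root comes back exactly:
#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    uint64_t y;
    for (y = 1; y <= 0xFFFFFFFFULL; y += 997) {     /* sample the 32-bit root range */
        uint64_t n = y * y;                         /* a 64-bit perfect square */
        uint32_t cut = (uint32_t)sqrt((double)n);
        if (cut != y) {
            printf("mismatch at y = %llu\n", (unsigned long long)y);
            return 1;
        }
    }
    printf("no mismatches found\n");
    return 0;
}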
All of my answer is useless if you have access to IEEE 754 double precision floating point, since Stephen Canon demonstrated both
a simple way to avoid imul in the loop, and
a simple way to compute the ceiling sqrt.
Otherwise, if for some reason you have a non IEEE 754 compliant platform, or only single precision, you could get the integer part of square root with a simple Newton-Raphson loop. For example in Squeak Smalltalk we have this method in Integer:
sqrtFloor
    "Return the integer part of the square root of self"
    | guess delta |
    guess := 1 bitShift: (self highBit + 1) // 2.
    [
        delta := (guess squared - self) // (guess + guess).
        delta = 0 ] whileFalse: [
            guess := guess - delta ].
    ^guess - 1
Where // is the operator for the quotient of integer division.
A final guard guess*guess <= self ifTrue: [^guess]. can be avoided if the initial guess is fed in excess of the exact solution, as is the case here.
Initializing with an approximate float sqrt was not an option there, because the integers are arbitrarily large and might overflow the float range.
But here, you could seed the initial guess with the floating point sqrt approximation, and my bet is that the exact solution will be found in very few iterations. In C that would be:
uint32_t sqrtFloor(uint64_t n)
{
    int64_t diff;
    int64_t delta;
    uint64_t guess = sqrt(n); /* implicit conversions here... */
    /* cast the divisor to signed so a negative diff divides correctly */
    while ((delta = (diff = guess*guess - n) / (int64_t)(guess + guess)) != 0)
        guess -= delta;
    return guess - (diff > 0);
}
That's a few integer multiplications and divisions, but outside the main loop.
What you are looking for is a way to calculate a rational upper bound of the square root of a natural number. A continued fraction is what you need; see Wikipedia.
For x > 0, there is
sqrt(x) = 1 + (x-1)/(1 + sqrt(x))
To make the notation more compact, rewrite the above formula by expanding sqrt(x) in the denominator repeatedly, which gives the continued fraction
sqrt(x) = 1 + (x-1)/(2 + (x-1)/(2 + (x-1)/(2 + ...)))
Truncating the continued fraction, i.e. dropping the trailing (x-1)/(2 + ...) tail at some recursion depth, gives a sequence of approximations of sqrt(x):
a1 = 1 + (x-1)/2
a2 = 1 + (x-1)/(2 + (x-1)/2)
a3 = 1 + (x-1)/(2 + (x-1)/(2 + (x-1)/2))
...
Upper bounds appear at the odd positions in this sequence, and they get tighter. When the distance between an upper bound and its neighbouring lower bound is less than 1, that approximation is what you need. Using that value as the value of cut (here cut must be a float or rational number, not an integer) solves the problem.
For very large numbers, rational arithmetic should be used, so that no precision is lost in the conversion between integer and floating point.

Generating a uniform distribution of INTEGERS in C

I've written a C function that I think selects integers from a uniform distribution with range [rangeLow, rangeHigh], inclusive. This isn't homework--I'm just using this in some embedded systems tinkering that I'm doing for fun.
In my test cases, this code appears to produce an appropriate distribution. I'm not feeling fully confident that the implementation is correct, though.
Could someone do a sanity check and let me know if I've done anything wrong here?
//uniform_distribution returns an INTEGER in [rangeLow, rangeHigh], inclusive.
int uniform_distribution(int rangeLow, int rangeHigh)
{
    int myRand = (int)rand();
    int range = rangeHigh - rangeLow + 1; //+1 makes it [rangeLow, rangeHigh], inclusive.
    int myRand_scaled = (myRand % range) + rangeLow;
    return myRand_scaled;
}
//note: make sure rand() was already initialized using srand()
P.S. I searched for other questions like this. However, it was hard to filter out the small subset of questions that discuss random integers instead of random floating-point numbers.
Let's assume that rand() generates a uniformly-distributed value I in the range [0..RAND_MAX],
and you want to generate a uniformly-distributed value O in the range [L,H].
Suppose I is in the range [0..32767] and O is in the range [0..2].
According to your suggested method, O = I%3. Note that in the given range, there are 10923 numbers for which I%3=0, 10923 numbers for which I%3=1, but only 10922 numbers for which I%3=2. Hence your method will not map a value from I into O uniformly.
As another example, suppose O is in the range [0..32766].
According to your suggested method, O = I%32767. Now you'll get O=0 for both I=0 and I=32767. Hence 0 is twice as likely as any other value - your method is again nonuniform.
The suggested way to generate a uniform mapping is as follows:
1. Calculate the number of bits that are needed to store a random value in the range [L,H]:
unsigned int nRange = (unsigned int)H - (unsigned int)L + 1;
unsigned int nRangeBits = (unsigned int)ceil(log((double)nRange) / log(2.));
2. Generate nRangeBits random bits (this can easily be implemented by shifting-right the result of rand()).
3. Ensure that the generated number is not greater than H-L. If it is, repeat step 2.
4. Now you can map the generated number into O just by adding L (see the sketch below).
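Here is a minimal sketch of these steps; the helper names are mine, it masks the low-order bits of rand() instead of shifting, and it assumes rand() supplies at least nRangeBits usable random bits per call:
#include <stdio.h>
#include <stdlib.h>

// Smallest number of bits such that 2^bits >= nRange.
static unsigned int bits_needed(unsigned int nRange)
{
    unsigned int bits = 0;
    while (bits < 32 && (1u << bits) < nRange)
        bits++;
    return bits;
}

int uniform_random_in_range(int rangeLow, int rangeHigh)
{
    unsigned int nRange = (unsigned int)rangeHigh - (unsigned int)rangeLow + 1;
    unsigned int nBits = bits_needed(nRange);
    unsigned int mask = (nBits >= 32) ? 0xFFFFFFFFu : (1u << nBits) - 1;
    unsigned int x;
    do {
        x = (unsigned int)rand() & mask;   // step 2: keep nBits random bits
    } while (x >= nRange);                 // step 3: reject and retry
    return rangeLow + (int)x;              // step 4: shift into [L,H]
}

int main(void)
{
    int i;
    srand(12345);
    for (i = 0; i < 10; i++)
        printf("%d ", uniform_random_in_range(1, 6));  // e.g. die rolls
    printf("\n");
    return 0;
}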
On some implementations, rand() did not provide good randomness on its lower order bits, so the modulus operator would not provide very random results. If you find that to be the case, you could try this instead:
int uniform_distribution(int rangeLow, int rangeHigh) {
    double myRand = rand()/(1.0 + RAND_MAX);
    int range = rangeHigh - rangeLow + 1;
    int myRand_scaled = (myRand * range) + rangeLow;
    return myRand_scaled;
}
Using rand() this way will produce a bias as noted by Lior. But, the technique is fine if you can find a uniform number generator to calculate myRand. One possible candidate would be drand48(). This will greatly reduce the amount of bias to something that would be very difficult to detect.
However, if you need something cryptographically secure, you should use an algorithm outlined in Lior's answer, assuming your rand() is itself cryptographically secure (the default one is probably not, so you would need to find one). Below is a simplified implementation of what Lior described. Instead of counting bits, we assume the range falls within RAND_MAX, and compute a suitable multiple. Worst case, the algorithm ends up calling the random number generator twice on average per request for a number in the range.
int uniform_distribution_secure(int rangeLow, int rangeHigh) {
    int range = rangeHigh - rangeLow + 1;
    int secureMax = RAND_MAX - RAND_MAX % range;
    int x;
    do x = secure_rand(); while (x >= secureMax);
    return rangeLow + x % range;
}
I think it is known that rand() is not very good. It just depends on how good your "random" data needs to be.
http://www.azillionmonkeys.com/qed/random.html
http://www.linuxquestions.org/questions/programming-9/generating-random-numbers-in-c-378358/
http://forums.indiegamer.com/showthread.php?9460-Using-C-rand%28%29-isn-t-as-bad-as-previously-thought
I suppose you could write a test then calculate the chi-squared value to see how good your uniform generator is:
http://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
Depending on your use (don't use this for your online poker shuffler), you might consider a LFSR
http://en.wikipedia.org/wiki/Linear_feedback_shift_register
It may be faster, if you just want some pseudo-random output. Also, supposedly they can be uniform, although I haven't studied the math enough to back up that claim.
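For what it's worth, here is a minimal 16-bit Galois LFSR sketch (the textbook taps 0xB400 give a maximal-length sequence of 65535 states; the all-zero state must be avoided):
#include <stdio.h>
#include <stdint.h>

static uint16_t lfsr_next(uint16_t state)
{
    uint16_t lsb = state & 1u;   // output bit
    state >>= 1;
    if (lsb)
        state ^= 0xB400u;        // apply the feedback taps
    return state;
}

int main(void)
{
    uint16_t state = 0xACE1u;    // any nonzero seed
    int i;
    for (i = 0; i < 8; i++) {
        state = lfsr_next(state);
        printf("%04X ", state);
    }
    printf("\n");
    return 0;
}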
A version which corrects the distribution errors (noted by Lior), uses the high bits returned by rand(), and only uses integer math (if that's desirable):
int uniform_distribution(int rangeLow, int rangeHigh)
{
    int range = rangeHigh - rangeLow + 1; //+1 makes it [rangeLow, rangeHigh], inclusive.
    int copies = RAND_MAX/range;          // we can fit n-copies of [0...range-1] into RAND_MAX
    // Use rejection sampling to avoid distribution errors
    int limit = range*copies;
    int myRand = -1;
    while (myRand < 0 || myRand >= limit) {
        myRand = rand();
    }
    return myRand/copies + rangeLow;      // note that this involves the high-bits
}
//note: make sure rand() was already initialized using srand()
This should work well provided that range is much smaller than RAND_MAX; otherwise you'll be back to the problem that rand() isn't a good random number generator in terms of its low bits.

how do we print a number that's greater than 2^32-1 with int and float? (is it even possible?)

how do we print a number that's greater than 2^32-1 with int and float? (is it even possible?)
How does your variable contain a number that is greater than 2^32 - 1? Short answer: It'll probably be a specific data-structure and assorted functions (oh, a class?) that deal with this.
Given this data structure, how do we print it? With BigInteger_Print(BigInteger*) of course :)
Really though, there is no correct answer to this, as printing a number larger than 2^32-1 depends entirely upon how you're storing that number.
More theoretically: suppose you have a very very very large number stored somewhere somehow; if so, I suppose that you are somehow able to do math on that number, otherwise it would be quite pointless storing it.
If you can do math on it, just divide the bignumber by ten (10); store the remainder somewhere. Repeat until the result is smaller than 10. When it's smaller than ten, print it, then print the remainders, from the last to the first. Finish.
You can speed things up by dividing by the largest power of 10 that you are able to print without effort (on 32 bits, 1'000'000'000).
Edit: pseudo code:
#include <stdio.h>
#include <stdlib.h> /* for calloc */
#include <math.h>
#include <math_with_very_very_big_num.h>

int main(int argc, char **argv) {
    very_very_big_num bignum = someveryverybignum;
    very_very_big_num quot;
    int size = (int) floor(vvbn_log10(bignum)) + 1;
    char *result = calloc(size, sizeof(char));
    int i = 0;
    do {
        quot = vvbn_divide(bignum, 10);
        result[i++] = (char) vvbn_remainder(bignum, 10) + '0';
        bignum = quot;
    } while (vvbn_greater(bignum, 9));
    result[i] = (char) vvbn_to_i(bignum) + '0';
    while (i >= 0)
        printf("%c", result[i--]);
    printf("\n");
}
(I wrote this using long, then translated it to the veryverybignum stuff; it worked with long, but unluckily I cannot try this version, so please forgive me if I made translation errors...)
If you are talking about int64 types, you can try %I64u, %I64d, %I64x, %llu, %lld
On common hardware, the largest float is (2^128 - 2^104), so if it's smaller than that, you just use %f (or %g or %a) with printf( ).
For int64 types, JustJeff's answer is spot on.
The range of double (%f) extends to nearly 2^1024, which is really quite huge; on Intel hardware, when the long double (%Lf) type corresponds to 80-bit float, the range of that type goes up to 2^16384.
If you need larger numbers than that, you need to use a library (which will likely have its own print routines) or roll your own representation and provide your own printing support.
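A small demonstration of the format specifiers mentioned above (assuming a C99 printf for %llu):
#include <stdio.h>

int main(void)
{
    unsigned long long big = 18446744073709551615ULL;  /* 2^64 - 1, well beyond 2^32 - 1 */
    double d = 1e300;                                   /* far beyond 2^32 - 1 */
    long double ld = 1e300L;

    printf("%llu\n", big);
    printf("%.0f\n", d);
    printf("%.0Lf\n", ld);
    return 0;
}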
