Generating random number within Cuda kernel in a varying range - c

I am trying to generate random numbers within a CUDA kernel. I want the numbers to be integers drawn from a uniform distribution, from 1 up to 8. The random numbers should be different for each thread, and the upper end of the range also varies from one thread to another: the maximum might be as low as 2 in one thread and as high as 8 in another, but never higher. Here is an example of how I want the numbers to be generated:
In thread#1 --> maximum of the range is 2 and so the random number should be between 1 and 2
In thread#2 --> maximum of the range is 6 and so the random number should be between 1 and 6
In thread#3 --> maximum of the range is 5 and so the random number should be between 1 and 5
and so on...

EDIT: I've edited my answer to fix some of the deficiencies pointed out in the other answers (@tudorturcu) and comments.
1. Use CURAND to generate a uniform distribution between 0.0 and 1.0. Note: 1.0 is included and 0.0 is excluded.
2. Then multiply this by the desired range (largest value - smallest value + 0.999999).
3. Then add the offset (+ smallest value).
4. Then truncate to an integer.
Something like this in your device code:
int idx = threadIdx.x+blockDim.x*blockIdx.x;
// assume have already set up curand and generated state for each thread...
// assume ranges vary by thread index
float myrandf = curand_uniform(&(my_curandstate[idx]));
myrandf *= (max_rand_int[idx] - min_rand_int[idx] + 0.999999);
myrandf += min_rand_int[idx];
int myrand = (int)truncf(myrandf);
You should:
#include <math.h>
for truncf
Here's a fully worked example:
$ cat t527.cu
#include <stdio.h>
#include <curand.h>
#include <curand_kernel.h>
#include <math.h>
#include <assert.h>
#define MIN 2
#define MAX 7
#define ITER 10000000
__global__ void setup_kernel(curandState *state){
  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  curand_init(1234, idx, 0, &state[idx]);
}

__global__ void generate_kernel(curandState *my_curandstate, const unsigned int n, const unsigned *max_rand_int, const unsigned *min_rand_int, unsigned int *result){
  int idx = threadIdx.x + blockDim.x*blockIdx.x;
  int count = 0;
  while (count < n){
    float myrandf = curand_uniform(my_curandstate+idx);
    myrandf *= (max_rand_int[idx] - min_rand_int[idx]+0.999999);
    myrandf += min_rand_int[idx];
    int myrand = (int)truncf(myrandf);
    assert(myrand <= max_rand_int[idx]);
    assert(myrand >= min_rand_int[idx]);
    result[myrand-min_rand_int[idx]]++;
    count++;
  }
}
int main(){
curandState *d_state;
cudaMalloc(&d_state, sizeof(curandState));
unsigned *d_result, *h_result;
unsigned *d_max_rand_int, *h_max_rand_int, *d_min_rand_int, *h_min_rand_int;
cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned));
h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned));
cudaMalloc(&d_max_rand_int, sizeof(unsigned));
h_max_rand_int = (unsigned *)malloc(sizeof(unsigned));
cudaMalloc(&d_min_rand_int, sizeof(unsigned));
h_min_rand_int = (unsigned *)malloc(sizeof(unsigned));
cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned));
setup_kernel<<<1,1>>>(d_state);
*h_max_rand_int = MAX;
*h_min_rand_int = MIN;
cudaMemcpy(d_max_rand_int, h_max_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
cudaMemcpy(d_min_rand_int, h_min_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
generate_kernel<<<1,1>>>(d_state, ITER, d_max_rand_int, d_min_rand_int, d_result);
cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost);
printf("Bin: Count: \n");
for (int i = MIN; i <= MAX; i++)
printf("%d %d\n", i, h_result[i-MIN]);
return 0;
}
$ nvcc -arch=sm_20 -o t527 t527.cu -lcurand
$ cuda-memcheck ./t527
========= CUDA-MEMCHECK
Bin: Count:
2 1665496
3 1668130
4 1667644
5 1667435
6 1665026
7 1666269
========= ERROR SUMMARY: 0 errors
$

@Robert's example doesn't generate a perfectly uniform distribution (although all the numbers in the range are generated and all the generated numbers are in the range). Both the smallest and the largest value are only half as likely to be chosen as the other numbers in the range.
At step 2, you should multiply by the number of values in the range: (largest value - smallest value + 0.999999). *
At step 3, the offset should be (+ smallest value) instead of (+ smallest value + 0.5).
Steps 1 and 4 remain the same.
*As @Kamil Czerski noted, 1.0 is included in the distribution. Adding 1.0 instead of 0.999999 would sometimes result in a number outside of the desired range.

For a safer general purpose random integer function using curand_uniform() that can handle larger integers:
#include <math.h>
int rand = (int)(ceil((curand_uniform(&state)*(RANGE + 1))) - 1);
Multiply your float by RANGE + 1, then take the ceiling, subtract 1, and cast to an integer. Taking the ceiling produces a whole number between 1 and RANGE + 1, so when we subtract one we get an integer between 0 and RANGE.
Additional discussion:
If 0.0 were included in curand_uniform() and 1.0 were not then,
(int)((curand_uniform(&state)*(RANGE + 1)));
would produce an integer between 0 and RANGE. We are safe truncating to an integer because RANGE + 1 is not a possible result. We are also happy because the distribution includes our entire range.
Since 0.0 is excluded and 1.0 included then all possible results need to be shifted down by some amount to truncate to an integer safely. This is accomplished by adding .999999 to RANGE and multiplying.
(int)((curand_uniform(&state)*(RANGE + .999999)))
The solution is not perfect however because not all possible values between 0 and RANGE are represented (not considering 0 or RANGE). This produces a slight bias against the greatest integer in our range.
The greatest offset according to IEEE 754 Floating Point is .999999940395355224609375 as this would be the largest decimal less than one before the computer rounds up. The problem with using this value is that the computer will start rounding up for values greater than 1 when the decimal part exceeds approximately .999999. In fact, our offset must shrink in proportion to the value of our integer because the integer part takes up more space in memory. For integers greater than 10000000 you would have to amend the solution since virtually all decimal parts will round up.

Related

Given a range find the sum of number that is divisible by 3 or 5

The code below works perfectly fine for small numbers, but it takes too long for larger ones.
Please give me a suggestion.
#include<stdio.h>
int main()
{
int num;
int sum=0;
scanf("%d",&num);
for(int i=1;i<=num;i++)
{
if(i%3==0 || i%5==0)
sum += i;
}
printf("%d",sum);
}
I need more efficient code for this.
I am trying to reduce the time the code takes.
The answer can be computed with simple arithmetic without any iteration. Many Project Euler questions are intended to make you think about clever ways to find solutions without just using the raw power of computers to chug through calculations. (This was Project Euler question 1, except the Project Euler problem specifies the limit using less than instead of less than or equal to.)
Given positive integers N and F, the number of positive multiples of F that are less than or equal to N is ⌊N/F⌋. (⌊x⌋ is the greatest integer not greater than x.) For example, the number of multiples of 5 less than or equal to 999 is ⌊999/5⌋ = ⌊199.8⌋ = 199.
Let n be this number of multiples, ⌊N/F⌋.
The first multiple is F and the last multiple is n•F. For example, with 1000 and 5, the first multiple is 5 and the last multiple is 200•5 = 1000.
The multiples are evenly spaced, so the average of all of them equals the average of the first and the last, so it is (F + nF)/2.
The total of the multiples equals their average multiplied by the number of them, so the total of the multiples of F less than or equal to N is n • (F + n•F)/2.
Adding the sum of multiples of 3 and the sum of multiples of 5 includes the multiples of both 3 and 5 twice. We can correct for this by subtracting the sum of those numbers. Multiples of both 3 and 5 are multiples of 15.
Thus, we can compute the requested sum using simple arithmetic without any iteration:
#include <stdio.h>
static long SumOfMultiples(long N, long F)
{
long NumberOfMultiples = N / F;
long FirstMultiple = F;
long LastMultiple = NumberOfMultiples * F;
return NumberOfMultiples * (FirstMultiple + LastMultiple) / 2;
}
int main(void)
{
long N = 1000;
long Sum = SumOfMultiples(N, 3) + SumOfMultiples(N, 5) - SumOfMultiples(N, 3*5);
printf("%ld\n", Sum);
}
As you do other Project Euler questions, you should look for similar ideas.

How to compute the digits of an irrational number one by one?

I want to read digit by digit the decimals of the sqrt of 5 in C.
The square root of 5 is 2.23606797749979..., so this would be the expected output:
2
3
6
0
6
7
9
7
7
...
I've found the following code:
#include<stdio.h>
void main()
{
int number;
float temp, sqrt;
printf("Provide the number: \n");
scanf("%d", &number);
// store the half of the given number e.g from 256 => 128
sqrt = number / 2;
temp = 0;
// Iterate until sqrt is different of temp, that is updated on the loop
while(sqrt != temp){
// initially 0, is updated with the initial value of 128
// (on second iteration = 65)
// and so on
temp = sqrt;
// Then, replace values (256 / 128 + 128 ) / 2 = 65
// (on second iteration 34.46923076923077)
// and so on
sqrt = ( number/temp + temp) / 2;
}
printf("The square root of '%d' is '%f'", number, sqrt);
}
But this approach stores the result in a float variable, and I don't want to depend on the limits of the float types, as I would like to extract like 10,000 digits, for instance. I also tried to use the native sqrt() function and casting it to string number using this method, but I faced the same issue.
What you've asked about is a very hard problem, and whether it's even possible to do "one by one" (i.e. without working space requirement that scales with how far out you want to go) depends on both the particular irrational number and the base you want it represented in. For example, in 1995 when a formula for pi was discovered that allows computing the nth binary digit in O(1) space, this was a really big deal. It was not something people expected to be possible.
If you're willing to accept O(n) space, then some cases like the one you mentioned are fairly easy. For example, if you have the first n digits of the square root of a number as a decimal string, you can simply try appending each digit 0 to 9, then squaring the string with long multiplication (same as you learned in grade school), and choosing the last one that doesn't overshoot. Of course this is very slow, but it's simple. The easy way to make it a lot faster (but still asymptotically just as bad) is using an arbitrary-precision math library in place of strings. Doing significantly better requires more advanced approaches and in general may not be possible.
As already noted, you need to change the algorithm into a digit-by-digit one (there are some examples in the Wikipedia page about the methods of computing of the square roots) and use an arbitrary precision arithmetic library to perform the calculations (for instance, GMP).
In the following snippet I implemented the aforementioned algorithm using GMP (but not the square root function that the library provides). Instead of calculating one decimal digit at a time, this implementation uses a larger base, a power of 10 that fits inside an unsigned long (10^9 or 10^18), so that it can produce 9 or 18 decimal digits at every iteration.
It also uses an adapted Newton method to find the actual "digit".
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <gmp.h>
unsigned long max_ul(unsigned long a, unsigned long b)
{
return a < b ? b : a;
}
int main(int argc, char *argv[])
{
// The GMP functions accept 'unsigned long int' values as parameters.
// The algorithm implemented here can work with bases other than 10,
// so that it can evaluate more than one decimal digit at a time.
const unsigned long base = sizeof(unsigned long) > 4
? 1000000000000000000
: 1000000000;
const unsigned long decimals_per_digit = sizeof(unsigned long) > 4 ? 18 : 9;
// Extract the number to be square rooted and the desired number of decimal
// digits from the command line arguments. Fallback to 0 in case of errors.
const unsigned long number = argc > 1 ? atoi(argv[1]) : 0;
const unsigned long n_digits = argc > 2 ? atoi(argv[2]) : 0;
// All the variables used by GMP need to be properly initialized before use.
// 'c' is basically the remainder, initially set to the original number
mpz_t c;
mpz_init_set_ui(c, number);
// At every iteration, the algorithm "moves to the left" the remainder
// by two "digits", so it multiplies it by base^2.
mpz_t base_squared;
mpz_init_set_ui(base_squared, base);
mpz_mul(base_squared, base_squared, base_squared);
// 'p' stores the digits of the root found so far. The others are helper variables
mpz_t p;
mpz_init_set_ui(p, 0UL);
mpz_t y;
mpz_init(y);
mpz_t yy;
mpz_init(yy);
mpz_t dy;
mpz_init(dy);
mpz_t dx;
mpz_init(dx);
mpz_t pp;
mpz_init(pp);
// Timing, for testing purposes
clock_t start = clock(), diff;
unsigned long x_max = number;
// Each "digit" corresponds to some decimal digits
for (unsigned long i = 0,
last = (n_digits + decimals_per_digit) / decimals_per_digit + 1UL;
i < last; ++i)
{
// Find the greatest x such that: x * (2 * base * p + x) <= c
// where x is in [0, base), using a specialized Newton method
// pp = 2 * base * p
mpz_mul_ui(pp, p, 2UL * base);
unsigned long x = x_max;
for (;;)
{
// y = x * (pp + x)
mpz_add_ui(yy, pp, x);
mpz_mul_ui(y, yy, x);
// dy = y - c
mpz_sub(dy, y, c);
// If y <= c we have found the correct x
if ( mpz_sgn(dy) <= 0 )
break;
// Newton's step: dx = dy/y' where y' = 2 * x + pp
mpz_add_ui(yy, yy, x);
mpz_tdiv_q(dx, dy, yy);
// Update x even if dx == 0 (last iteration)
x -= max_ul(mpz_get_si(dx), 1);
}
x_max = base - 1;
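// The actual format of the printed "digits" is up to you.
// Here: the integer part on its own line, then four "digits" per line.
// Every "digit" is printed, zero-padded to decimals_per_digit places.
if (i == 0)
printf("%lu.\n", x);
else
{
printf("%0*lu", (int)decimals_per_digit, x);
if (i % 4 == 0)
putchar('\n');
}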
// p = base * p + x
mpz_mul_ui(p, p, base);
mpz_add_ui(p, p, x);
// c = (c - y) * base^2
mpz_sub(c, c, y);
mpz_mul(c, c, base_squared);
}
diff = clock() - start;
long int msec = diff * 1000L / CLOCKS_PER_SEC;
printf("\n\nTime taken: %ld.%03ld s\n", msec / 1000, msec % 1000);
// Final cleanup
mpz_clear(c);
mpz_clear(base_squared);
mpz_clear(p);
mpz_clear(pp);
mpz_clear(dx);
mpz_clear(y);
mpz_clear(dy);
mpz_clear(yy);
}
You can see the outputted digits here.
Your title says:
How to compute the digits of an irrational number one by one?
Irrational numbers are not limited to most square roots. They also include numbers of the form log(x), exp(z), sin(y), etc. (transcendental numbers). However, there are some important factors that determine whether or how fast you can compute a given irrational number's digits one by one (that is, from left to right).
Not all irrational numbers are computable; that is, for some of them no algorithm can exist that approximates them to any desired precision (whether by a closed form expression, a series, or otherwise).
There are many ways numbers can be expressed, such as by their binary or decimal expansions, as continued fractions, as series, etc. And there are different algorithms to compute a given number's digits depending on the representation.
Some formulas compute a given number's digits in a particular base (such as base 2), not in an arbitrary base.
For example, besides the first formula to extract the digits of π without computing the previous digits, there are other formulas of this type (known as BBP-type formulas) that extract the digits of certain irrational numbers. However, these formulas only work for a particular base, not all BBP-type formulas have a formal proof, and most importantly, not all irrational numbers have a BBP-type formula (essentially, only certain log and arctan constants do, not numbers of the form exp(x) or sqrt(x)).
On the other hand, if you can express an irrational number as a continued fraction (which all real numbers have), you can extract its digits from left to right, and in any base desired, using a specific algorithm. What is more, this algorithm works for any real number constant, including square roots, exponentials (e and exp(x)), logarithms, etc., as long as you know how to express it as a continued fraction. For an implementation see "Digits of pi and Python generators". See also Code to Generate e one Digit at a Time.

How to generate mxn matrix with randomly generated 0 and 1 with probability in C

I wrote a C program that defines a 2D matrix with m rows and n columns filled with random numbers (either 0 or 1). The code is as follows:
int i,j;
int original_matrix[m][n];
for (i=0; i<=m-1; i++){
for (j=0; j<=n-1; j++){
original_matrix[i][j] = rand() % 2;
}
}
It worked. For the next step, I want to fill the matrix according to a probability. For example, 1 is written into a cell with probability p and 0 is written with probability 1-p. Could you please share any ideas on how to do this?
Since rand() gives you a value between 0 and RAND_MAX, you can get a value at a particular percentage simply by choosing an appropriate threshold. For example, if RAND_MAX were 999, 42% of all values would be expected to be less than 420.
So you can use code like in the following complete program, to set up an appropriate threshold and test the distribution of your values:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
int main(int argc, char *argv[]) {
// Get threshold (defaults to ~PI%), seed random numbers.
double percent = (argc > 1) ? atof(argv[1]) : .0314159;
int threshold = round(RAND_MAX * percent);
srand(time(0));
// Work out distribution (millions of samples).
int below = 0, total = 0;
for (int i = 0 ; i < 1000000; ++i) {
++total;
if (rand() < threshold) ++below;
}
// Output stats.
printf("Using probability of %f, below was %d / %d, %f%%\n",
percent, below, total, below * 100.0 / total);
}
Some sample runs, with varying desired probabilities:
Using probability of 0.031416, below was 31276 / 1000000, 3.127600%
Using probability of 0.031416, below was 31521 / 1000000, 3.152100%
Using probability of 0.421230, below was 420936 / 1000000, 42.093600%
Using probability of 0.421230, below was 421634 / 1000000, 42.163400%
Using probability of 0.175550, below was 175441 / 1000000, 17.544100%
Using probability of 0.175550, below was 176031 / 1000000, 17.603100%
Using probability of 0.980000, below was 979851 / 1000000, 97.985100%
Using probability of 0.980000, below was 980032 / 1000000, 98.003200%
Using probability of 0.000000, below was 0 / 1000000, 0.000000%
Using probability of 1.000000, below was 1000000 / 1000000, 100.000000%
So, the bottom line is: to achieve your goal of one having a probability p (a double value) and zero having the probability 1 - p, you need the following:
srand(time(0)); // done once, seed generator.
int threshold = round(RAND_MAX * p); // done once.
int oneOrZero = (rand() < threshold) ? 1 : 0; // done for each cell.
Just keep in mind the limits of rand(): the difference between (for example) probabilities 0.0000000000 and 0.0000000001 will most likely be non-existent, unless RAND_MAX is large enough to make a difference. I doubt you'll be using probabilities that fine, but I thought I'd better mention it just in case.
rand() % 2 gives you a probability of 0.5.
p is a float, so you'll look at How to generate random float number in C to generate a random value in a real range. The top answer gives us: float x = (float)rand()/(float)(RAND_MAX/a);
We want a equal to 1 for probabilities. So, to get 1 with a probability of p (and 0 otherwise), the formula is:
int oneWithAProbabilityOfP = (float)rand()/(float)RAND_MAX <= p;
Which can also be written:
int oneWithAProbabilityOfP = rand() <= p * RAND_MAX;
ps: if available, for precision reasons, you should favor arc4random() or arc4random_buf() instead of rand():
rand() precision is 1 / 0x7FFFFFFF (on macOS)
arc4random() precision is 1 / 0xFFFFFFFF (so twice as fine)
In that case, the formula would be:
int oneWithAProbabilityOfP = arc4random() <= p * UINT32_MAX;

Find minimum element for the calculated decimal into the power of 2

Given an integer array a = {3,2,0,0,1,4,5,6,0}.
Take every element of the array as an exponent of 2 and add up the powers:
2^3 + 2^2 + 2^0 + 2^0 + 2^1 + 2^4 + 2^5 + 2^6 + 2^0 = 129
Find the minimum number of exponents whose powers of 2 add up to the same decimal value.
For 129, I need an algorithm/function that finds the smallest set of exponents such that the powers of 2 give the same decimal value that was calculated from the given array.
Please let me know such a function; I tried but was not able to figure it out.
Edited:
I want the minimum number of elements in the array of exponents that gives the same value as was calculated from the given input array.
What I want is the binary representation of the calculated decimal, as follows:
2^7 + 2^0 => 128 + 1 => 129, so a[] = {0,7}, two elements,
is the desired solution, as I want as few powers of 2 as possible.
There might be another solution like
2^6 + 2^6 + 2^0 = 129, so a[] = {0,6,6}, three elements,
but I want the minimum possible.
I raised 2 to every element of the array and calculated the decimal value, but I was not able to find the minimum set of elements that gives the same decimal value.
Just represent result in binary, get position of ones in the binary representation
129dec = 10000001bin
so result is positions of set bits {0,7}
Note that you don't need binary representation itself - just extract bits positions from number
IntResult = Sum Of Given Powers //here we get value like 129
SetBitList = {} //list/array for bit positions
i = 0
while (IntResult) do
if (IntResult & 1) //bitwise AND: true if least-significant bit is non-zero
SetBitList.Add(i)
IntResult = IntResult >> 1 //shift right
i = i + 1 //increment position
If you cannot compute the sum because it would overflow, you can use a priority queue to solve the problem.
Basically, take the two smallest numbers currently in the priority queue (0 and 0 here). Since 2^0 + 2^0 = 2^1, we merge them and push the result back into the priority queue. If the two numbers are not equal, we count the smaller one toward the answer and push the larger one back into the priority queue. We repeat this until exactly one number is left in the priority queue.
Note that my priority queue is a min heap.
#include <iostream>
#include <cstdio>
#include <algorithm>
#include <queue>
#include <vector>
using namespace std;
class priortise{
public:
bool operator()(const int &x, const int &y){
return x>y;
}
};
int main() {
// your code goes here
int a[] = {3,2,0,0,1,4,5,6,0},ans = 0;
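// min heap: the smallest exponent is always on top
// (a plain priority_queue<int> is a max heap and gives the wrong count here)
priority_queue<int, vector<int>, priortise> pq;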
for(int i=0;i<9;i++){
pq.push(a[i]);
}
while(!pq.empty()){
int u = pq.top();
pq.pop();
if(pq.empty()){
ans++;
break;
}
int v = pq.top();
pq.pop();
if(u==v){
pq.push(u+1);
}
else{
ans++;
pq.push(v);
}
}
printf("%d\n", ans);
return 0;
}
It is possible to reduce the problem to one where all exponents are distinct. In fact, if two powers of 2 have the same exponent k, then 2^k + 2^k = 2·2^k = 2^(k+1), and so on.
Then, by the uniqueness of the base-2 representation of an integer, we can prove that the base-2 representation gives the minimum number of elements (all distinct).

How to sum large numbers?

I am trying to calculate 1 + 1 * 2 + 1 * 2 * 3 + 1 * 2 * 3 * 4 + ... + 1 * 2 * ... * n where n is the user input.
It works for values of n up to 12. I want to calculate the sum for n = 13, n = 14 and n = 15. How do I do that in C89? As far as I know, I can use unsigned long long int only in C99 or C11.
Input 13, result 2455009817, expected 6749977113
Input 14, result 3733955097, expected 93928268313
Input 15, result 1443297817, expected 1401602636313
My code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned long int n;
unsigned long int P = 1;
int i;
unsigned long int sum = 0;
scanf("%lu", &n);
for(i = 1; i <= n; i++)
{
P *= i;
sum += P;
}
printf("%lu", sum);
return 0;
}
In practice, you want some arbitrary precision arithmetic (a.k.a. bigint or bignum) library. My recommendation is GMPlib but there are other ones.
Don't try to code your own bignum library. Efficient & clever algorithms exist, but they are unintuitive and difficult to grasp (you can find entire books devoted to that question). In addition, existing libraries like GMPlib are taking advantage of specific machine instructions (e.g. ADC -add with carry) that a standard C compiler won't emit (from pure C code).
If this is homework and you are not allowed to use external code, consider for example representing a number in base (or radix) 1000000000 (one billion) and coding the operations yourself in a very naive way, similar to what you learned as a kid. But be aware that more efficient algorithms exist (and that real bignum libraries use them).
A number could be represented in base 1000000000 by having an array of unsigned, each being a "digit" of base 1000000000. So you need to manage arrays (probably heap allocated, using malloc) and their length.
You could use a double, especially if your platform uses IEEE754.
Such a double gives you 53 bits of precision, which means integers are exact up to the 53rd power of 2. That's good enough for this case.
If your platform doesn't use IEEE754 then consult the documentation on the floating point scheme adopted. It might be adequate.
A simple approach when you're just over the limit of MaxInt is to do the computation modulo 10^n for a suitable n, and to do the same computation in floating point, but with everything divided by 10^r. The former result gives you the last n digits of the answer, while the latter gives you the leading digits of the answer with the last r digits removed. The last few digits of the floating-point result will be inaccurate due to roundoff errors, so you should choose r a bit smaller than n so that the two results overlap. In this case taking n = 9 and r = 5 will work well.

Resources