I adapted some Python code I found here to calculate the sqrt of a number, if it exists as an integer, using bitwise operations. Here is my code.
int ft_sqrt(int nb){
int smallcandidate;
int largecandidate;
if (nb < 0){
return (0);
}else if (nb < 2){
return (nb);
}else{
smallcandidate = ft_sqrt(nb >> 2) << 1;
largecandidate = smallcandidate + 1;
if (largecandidate * largecandidate > nb){
return (smallcandidate);
}
else{
return (largecandidate);
}
}
}
This works for every number I've tested (within the bounds of what an integer can hold), except for 3. Why is this, and how can I fix it?
Sorry, but you would do better with an iterative function; as you can see, your recursion is final (tail) recursion, which can be collapsed into a while loop. An iterative algorithm for this is:
#include <stdio.h>
unsigned isqrt(unsigned x)
{
unsigned quot = 1, mean = x; /* isqrt must be between these two */
/* we begin with extreme numbers and for each pair of (quot,mean),
* the first, below the square root, and the other above, we get
* mean value of the two (lesser than previous) and the
* quotient (above the prev. value, but still less than the
* square root, so closer to it) to get a better approach */
while (quot < mean) {
mean = (mean + quot) >> 1;
quot = x / mean;
}
/* the loop ends once quot >= mean (the two estimates have met or
 * crossed); mean is then the floor of the square root. */
return mean;
}
int main() /* main test function, eliminate to use the above. */
{
unsigned n;
while (scanf("%u", &n) == 1) {
printf("isqrt(%u) ==> %u\n", n, isqrt(n));
}
}
EDIT
This algorithm is based on the fact that the geometric mean of two positive numbers is never greater than their arithmetic mean. We start with two approximations, the source number and 1, whose geometric mean is the square root. We then calculate their arithmetic mean (a value between the two, and so closer to the geometric mean) and divide the original number by it, so that the two approximations again multiply to the original number (and their geometric mean is, again, the square root). Since in each pass the arithmetic mean moves closer to the geometric mean, so does the quotient, giving two numbers that are closer to the square root. We continue until both numbers are equal (a / sqrt(a) = sqrt(a), and (sqrt(a) + sqrt(a))/2 = sqrt(a)) or, because of rounding errors, they cross over, which is what happens with integers.
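As a minimal sketch of the loop described above (the names isqrt_floor and perfect_sqrt are hypothetical, not from the answer), the mean can be computed as quot + (mean - quot) / 2 so that the sum cannot wrap for large inputs. If the goal in the question is to report a root only when it is an exact integer, which is one reading of it and an assumption here, a final multiply-back check does that. It may also explain why 3 looks wrong: the floor of its square root is 1, but 3 is not a perfect square.

```c
/* Floor of the square root via the mean/quotient iteration. */
unsigned isqrt_floor(unsigned x)
{
    unsigned quot = 1, mean = x;
    while (quot < mean) {
        /* same value as (mean + quot) / 2, but the sum cannot wrap */
        mean = quot + (mean - quot) / 2;
        quot = x / mean;
    }
    return mean;
}

/* Hypothetical helper: the root if x is a perfect square, otherwise 0. */
unsigned perfect_sqrt(unsigned x)
{
    unsigned r = isqrt_floor(x);
    return (r * r == x) ? r : 0;
}
```

Note that 0 is itself a perfect square, so a caller that must tell "root is 0" apart from "no integer root" would need a different return convention.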
I need to calculate the entropy and due to the limitations of my system I need to use restricted C features (no loops, no floating point support) and I need as much precision as possible. From here I figured out how to estimate the floor log2 of an integer using bitwise operations. Nevertheless, I need to increase the precision of the results. Since no floating point operations are allowed, is there any way to calculate log2(x/y) with x < y so that the result would be something like log2(x/y)*10000, aiming at getting the precision I need through integer arithmetic?
You will base an algorithm on the formula
log2(x/y) = K*(-log(x/y));
where
K = -1.0/log(2.0); // you can precompute this constant before run-time
a = (y-x)/y;
-log(x/y) = a + a^2/2 + a^3/3 + a^4/4 + a^5/5 + ...
If you write the loop correctly—or, if you prefer, unroll the loop to code the same sequence of operations looplessly—then you can handle everything in integer operations:
(y^N*(1*2*3*4*5*...*N)) * (-log(x/y))
= y^(N-1)*(2*3*4*5*...*N)*(y-x) + y^(N-2)*(1*3*4*5*...*N)*(y-x)^2 + ...
Of course, ^, the power operator, binding tighter than *, is not a C operator, but you can implement that efficiently in the context of your (perhaps unrolled) loop as a running product.
The N is an integer large enough to afford desired precision but not so large that it overruns the number of bits you have available. If unsure, then try N = 6 for instance. Regarding K, you might object that that is a floating-point number, but this is not a problem for you because you are going to precompute K, storing it as a ratio of integers.
SAMPLE CODE
This is a toy code but it works for small values of x and y such as 5 and 7, thus sufficing to prove the concept. In the toy code, larger values can silently overflow the default 64-bit registers. More work would be needed to make the code robust.
#include <stddef.h>
#include <stdlib.h>
// Your program will not need the below headers, which are here
// included only for comparison and demonstration.
#include <math.h>
#include <stdio.h>
const size_t N = 6;
const long long Ky = 1 << 10; // denominator of K
// Your code should define a precomputed value for Kx here.
int main(const int argc, const char *const *const argv)
{
// Your program won't include the following library calls but this
// does not matter. You can instead precompute the value of Kx and
// hard-code its value above with Ky.
const long long Kx = lrintl((-1.0/log(2.0))*Ky); // numerator of K
printf("K == %lld/%lld\n", Kx, Ky);
if (argc != 3) exit(1);
// Read x and y from the command line.
const long long x0 = atoll(argv[1]);
const long long y = atoll(argv[2]);
printf("x/y == %lld/%lld\n", x0, y);
if (x0 <= 0 || y <= 0 || x0 > y) exit(1);
// If 2*x <= y, then, to improve accuracy, double x repeatedly
// until 2*x > y. Each doubling offsets the log2 by 1. The offset
// is to be recovered later.
long long x = x0;
int integral_part_of_log2 = 0;
while (1) {
const long long trial_x = x << 1;
if (trial_x > y) break;
x = trial_x;
--integral_part_of_log2;
}
printf("integral_part_of_log2 == %d\n", integral_part_of_log2);
// Calculate the denominator of -log(x/y).
long long yy = 1;
for (size_t j = N; j; --j) yy *= j*y;
// Calculate the numerator of -log(x/y).
long long xx = 0;
{
const long long y_minus_x = y - x;
for (size_t i = N; i; --i) {
long long term = 1;
size_t j = N;
for (; j > i; --j) {
term *= j*y;
}
term *= y_minus_x;
--j;
for (; j; --j) {
term *= j*y_minus_x;
}
xx += term;
}
}
// Convert log to log2.
xx *= Kx;
yy *= Ky;
// Restore the aforementioned offset.
for (; integral_part_of_log2; ++integral_part_of_log2) xx -= yy;
printf("log2(%lld/%lld) == %lld/%lld\n", x0, y, xx, yy);
printf("in floating point, this ratio of integers works out to %g\n",
(1.0*xx)/(1.0*yy));
printf("the CPU's floating-point unit computes the log2 to be %g\n",
log2((1.0*x0)/(1.0*y)));
return 0;
}
Running this on my machine with command-line arguments of 5 7, it outputs:
K == -1477/1024
x/y == 5/7
integral_part_of_log2 == 0
log2(5/7) == -42093223872/86740254720
in floating point, this ratio of integers works out to -0.485279
the CPU's floating-point unit computes the log2 to be -0.485427
Accuracy would be substantially improved by N = 12 and Ky = 1 << 20, but for that you need either thriftier code or more than 64 bits.
THRIFTIER CODE
Thriftier code, wanting more effort to write, might represent numerator and denominator in prime factors. For example, it might represent 500 as [2 0 3], meaning (2^2)(3^0)(5^3).
Yet further improvements might occur to your imagination.
AN ALTERNATE APPROACH
For an alternate approach, though it might not meet your requirements precisely as you have stated them, @phuclv has given the suggestion I would be inclined to follow if your program were mine: work the problem in reverse, guessing a value c/d for the logarithm and then computing 2^(c/d), presumably via a Newton-Raphson iteration. Personally, I like the Newton-Raphson approach better. See sect. 4.8 here (my original).
MATHEMATICAL BACKGROUND
Several sources including mine already linked explain the Taylor series underlying the first approach and the Newton-Raphson iteration of the second approach. The mathematics unfortunately is nontrivial, but there you have it. Good luck.
I tried using the nCm function to find all combinations but for large numbers it fails
int fact(int num)
{
if (num == 1 || num == 0)
return 1;
return num * fact(num-1);
}
int nCm(int num, int base)
{
int result;
return result = fact(num) / (fact(num - base)*fact(base));
}
where base = 3 and num can be anything so for large num it fails. I cannot use bigInteger library so please help
If you consider that division for a moment, you'll see that the (n-b)! term is common to both numerator and denominator (i.e. they cancel out).
You just need to think of n! as:
n * (n-1) * (n-2) * ... * (n-b+1) * (n-b)!
Now you only need to divide the product of the remaining b factors by b!, so you can calculate the result without large intermediate values (which could overflow), and you can also do it without recursion.
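The cancellation above can be sketched as follows. This is only an illustration under assumptions: the name choose and the use of uint64_t are not from the question, and very large results can still overflow. Multiplying and dividing one factor at a time keeps every intermediate value an exact binomial coefficient, so the divisions never truncate.

```c
#include <stdint.h>

/* Binomial coefficient C(n, k) without computing any full factorial.
 * After step i the running value equals C(n - k + i, i), so each
 * division by i is exact. Overflow is not detected in this sketch. */
uint64_t choose(uint64_t n, uint64_t k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;   /* symmetry: C(n, k) == C(n, n - k) */
    uint64_t result = 1;
    for (uint64_t i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}
```

For base = 3, as in the question, the loop runs only three times regardless of how large num is.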
I'm trying to calculate the Taylor series of cos(x) with error at most 10^-3 for all x ∈ [-pi/4, pi/4], which means my error needs to be less than 0.001. I can modify the x += in the for loop to get different results. I tried several numbers but the error never drops below 0.001.
#include <stdio.h>
#include <math.h>
float cosine(float x, int j)
{
float val = 1;
for (int k = j - 1; k >= 0; --k)
val = 1 - x*x/(2*k+2)/(2*k+1)*val;
return val;
}
int main( void )
{
for( double x = 0; x <= PI/4; x += 0.9999 )
{
if(cosine(x, 2) <= 0.001)
{
printf("cos(x) : %10g %10g %10g\n", x, cos(x), cosine(x, 2));
}
printf("cos(x) : %10g %10g %10g\n", x, cos(x), cosine(x, 2));
}
return 0;
}
I'm also doing this for e^x. For this part, x must be in [-2,2].
float exponential(int n, float x)
{
float sum = 1.0f; // initialize sum of series
for (int i = n - 1; i > 0; --i )
sum = 1 + x * sum / i;
return sum;
}
int main( void )
{
// change the number of x in for loop so you can have different range
for( float x = -2.0f; x <= 2.0f; x += 1.587 )
{
// change the frist parameter to have different n value
if(exponential(5, x) <= 0.001)
{
printf("e^x = %f\n", exponential(5, x));
}
printf("e^x = %f\n", exponential(5, x));
}
return 0;
}
But whenever I change the number of terms in the for loop, it always has an error that is greater than 1. How am I supposed to change it to get errors less than 10^-3?
Thanks!
My understanding is that to increase precision, you would need to consider more terms in the Taylor series. For example, consider what happens when you attempt to calculate e(1) by a Taylor series.
$e(x) = \sum\limits_{n=0}^{\infty} \frac{x^n}{n!}$
we can consider the first few terms in the expansion of e(1):
n    value of nth term     sum
0    x^0/0! = 1            1
1    x^1/1! = 1            2
2    x^2/2! = 0.5          2.5
3    x^3/3! = 0.16667      2.66667
4    x^4/4! = 0.04167      2.70834
You should notice two things: first, that as we add more terms we are getting closer to the exact value of e(1), and also that the difference between consecutive sums is getting smaller.
So, an implementation of e(x) could be written as:
#include <stdbool.h>
#include <stdio.h>
#include <math.h>
typedef float (*term)(int, int);
float evalSum(int, int, int, term);
float expTerm(int, int);
int fact(int);
int mypow(int, int);
bool sgn(float);
const int maxTerm = 10; // number of terms to evaluate in series
const float epsilon = 0.001; // the accepted error
int main(void)
{
// change these values to modify the range and increment
float start = -2;
float end = 2;
float inc = 1;
for(int x = start; x <= end; x += inc)
{
float value = 0;
float prev = 0;
for(int ndx = 0; ndx < maxTerm; ndx++)
{
value = evalSum(0, ndx, x, expTerm);
float diff = fabs(value-prev);
if((sgn(value) && sgn(prev)) && (diff < epsilon))
break;
else
prev = value;
}
printf("the approximate value of exp(%d) is %f\n", x, value);
}
return 0;
}
I've used as a guess that we will not need more than ten terms in the expansion to get to the desired precision, thus the inner for loop is where we loop over values of n in the range [0, 9].
Also, we have several lines dedicated to checking whether we have reached the required precision. First I calculate the absolute value of the difference between the current evaluation and the previous evaluation. Checking whether that difference is less than our epsilon value (1E-3) is one of the criteria to exit the loop early. I also needed to check that the signs of the current and the previous values were the same, due to some fluctuation in calculating the value of e(-1); that is what the first clause in the conditional is doing.
float evalSum(int start, int end, int val, term fnct)
{
float sum = 0;
for(int n = start; n <= end; n++)
{
sum += fnct(n, val);
}
return sum;
}
This is a utility function that I wrote to evaluate the first n terms of a series. start is the starting value (which in this code is always 0), and end is the ending value. The final parameter is a pointer to a function that represents how to calculate a given term. In this code, fnct can be a pointer to any function that takes two integer parameters and returns a float.
float expTerm(int n, int x)
{
return (float)mypow(x,n)/(float)fact(n);
}
Buried down in this one-line function is where most of the work happens. This function represents the closed form of a term in the Taylor expansion of e(x). Looking carefully at the above, you should be able to see that we are calculating $\frac{x^n}{n!}$ for a given value of x and n. As a hint, for doing the cosine part you would need to create a function to evaluate the closed form of a term in the Taylor expansion of cos. This is given by $(-1)^n\frac{x^{2n}}{(2n)!}$.
int fact(int n)
{
if(0 == n)
return 1; // by definition
else if(1 == n)
return 1;
else
return n*fact(n-1);
}
This is just a standard implementation of the factorial function. Nothing special to see here.
int mypow(int base, int exp)
{
int result = 1;
while(exp)
{
if(exp&1) // b&1 quick check for odd power
{
result *= base;
}
exp >>=1; // exp >>= 1 quick division by 2
base *= base;
}
return result;
}
A custom function for doing exponentiation. We certainly could have used the version from <math.h>, but because I knew we would only be doing integer powers we could write an optimized version. Hint: in doing cosine you probably will need to use the version from <math.h> to work with floating point bases.
bool sgn(float x)
{
if(x < 0) return false;
else return true;
}
An incredibly simple function to determine the sign of a floating point value, returning true if it is non-negative and false otherwise.
This code was compiled on my Ubuntu-14.04 using gcc version 4.8.4:
******@crossbow:~/personal/projects$ gcc -std=c99 -pedantic -Wall series.c -o series
******@crossbow:~/personal/projects$ ./series
the approximate value of exp(-2) is 0.135097
the approximate value of exp(-1) is 0.367857
the approximate value of exp(0) is 1.000000
the approximate value of exp(1) is 2.718254
the approximate value of exp(2) is 7.388713
The expected values, as given by using bc are:
******@crossbow:~$ bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
e(-2)
.13533528323661269189
e(-1)
.36787944117144232159
e(0)
1.00000000000000000000
e(1)
2.71828182845904523536
e(2)
7.38905609893065022723
As you can see, the values are well within the tolerances that you requested. I leave it as an exercise to do the cosine part.
Hope this helps,
-T
exp and cos have power series that converge everywhere on the real line. For any bounded interval, e.g. [-pi/4, pi/4] or [-2, 2], the power series converge not just pointwise, but uniformly to exp and cos.
Pointwise convergence means that for any x in the region, and any epsilon > 0, you can pick a large enough N so that the approximation you get from the first N terms of the Taylor series is within epsilon of the true value. However, with pointwise convergence, the N may be small for some x's and large for others, and since there are infinitely many x's there may be no finite N that accommodates them all. For some functions that really is what happens.
Uniform convergence means that for any epsilon > 0, you can pick a large enough N so that the approximation is within epsilon for EVERY x in the region. That's the kind of approximation that you are looking for, and you are guaranteed that that's the kind of convergence that you have.
In principle you could look at one of the proofs that exp, cos are uniformly convergent on any finite domain, sit down and say "what if we take epsilon = .001, and the regions to be ...", and compute some finite bound on N using a pen and paper. However most of these proofs will use at some steps some estimates that aren't sharp, so the value of N that you compute will be larger than necessary -- maybe a lot larger. It would be simpler to just implement it for N being a variable, then check the values using a for-loop like you did in your code, and see how large you have to make it so that the error is less than .001 everywhere.
So, I can't tell you what the right value of N is, but the math guarantees that if you keep trying larger values, eventually you will find one that works.
I've got a program that calculates the approximation of an arcsin value based on Taylor's series.
My friend and I have come up with an algorithm which has been able to return the almost "right" values, but I don't think we've done it very crisply. Take a look:
double my_asin(double x)
{
double a = 0;
int i = 0;
double sum = 0;
a = x;
for(i = 1; i < 23500; i++)
{
sum += a;
a = next(a, x, i);
}
}
double next(double a, double x, int i)
{
return a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2)));
}
I checked if my_pow works correctly so there's no need for me to post it here as well. Basically I want the loop to end once the difference between the current and next term is more or equal to my EPSILON (0.00001), which is the precision I'm using when calculating a square root.
This is how I would like it to work:
while(my_abs(prev_term - next_term) >= EPSILON)
But the function double next is dependent on i, so I guess I'd have to increment it in the while statement too. Any ideas how I should go about doing this?
Example output for -1:
$ -1.5675516116e+00
Instead of:
$ -1.5707963268e+00
Thanks so much guys.
Issues with your code and question include:
Your image file showing the Taylor series for arcsin has two errors: There is a minus sign on the x^5 term instead of a plus sign, and the power of x is shown as x^n but should be x^(2n+1).
The x factor in the terms of the Taylor series for arcsin increases by x^2 in each term, but your formula a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2))) divides by x^2 in each term. This does not matter for the particular value -1 you ask about, but it will produce wrong results for other values, except 1.
You ask how to end the loop once the difference in terms is “more or equal to” your epsilon, but, for most values of x, you actually want less than (or, conversely, you want to continue, not end, while the difference is greater than or equal to, as you show in code).
The Taylor series is a poor way to evaluate functions because its error increases as you get farther from the point around which the series is centered. Most math library implementations of functions like this use a minimax series or something related to it.
Evaluating the series from low-order terms to high-order terms causes you to add larger values first, then smaller values later. Due to the nature of floating-point arithmetic, this means that accuracy from the smaller terms is lost, because it is “pushed out” of the width of the floating-point format by the larger values. This effect will limit how accurate any result can be.
Finally, to get directly to your question, the way you have structured the code, you directly update a, so you never have both the previous term and the next term at the same time. Instead, create another double b so that you have an object b for a previous term and an object a for the current term, as shown below.
Example:
double a = x, b, sum = a;
int i = 0;
do
{
b = a;
a = next(a, x, ++i);
sum += a;
} while (fabs(b-a) > threshold); /* fabs from <math.h>; abs would truncate to int */
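Putting the points above together, a self-contained version of the loop might look like the sketch below. The function name asin_taylor, the threshold parameter, and the iteration cap are assumptions added for illustration; the term update multiplies by x^2 rather than dividing by it. Convergence is still slow when |x| is close to 1, so this is not a replacement for a library asin.

```c
#include <math.h>

/* Taylor series of arcsin around 0:
 *   x + (1/2) x^3/3 + (1*3)/(2*4) x^5/5 + ...
 * Each term is the previous one times (2i-1)^2 * x^2 / ((2i)(2i+1)). */
double asin_taylor(double x, double threshold)
{
    double a = x, b, sum = a;
    int i = 0;
    do {
        b = a;                                  /* previous term */
        ++i;
        a = b * (2.0 * i - 1.0) * (2.0 * i - 1.0) * x * x
              / ((2.0 * i) * (2.0 * i + 1.0));  /* current term */
        sum += a;
    } while (fabs(b - a) > threshold && i < 100000); /* cap is a safety net */
    return sum;
}
```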
Using the Taylor series for arcsin is extremely imprecise, as it converges very badly and there will be relatively big differences from the true value for any finite number of terms. Also, using pow with integer exponents is neither very precise nor efficient.
However using arctan for this is OK
arcsin(x) = arctan(x/sqrt(1-(x*x)));
as its Taylor series converges OK on the <0.0,0.8> range, and all the other parts of the range can be computed through it (using trigonometric identities). So here is my C++ implementation (from my arithmetics template):
T atan (const T &x) // = atan(x)
{
bool _shift=false;
bool _invert=false;
bool _negative=false;
T z,dz,x1,x2,a,b; int i;
x1=x; if (x1<0.0) { _negative=true; x1=-x1; }
if (x1>1.0) { _invert=true; x1=1.0/x1; }
if (x1>0.7) { _shift=true; b=::sqrt(3.0)/3.0; x1=(x1-b)/(1.0+(x1*b)); }
x2=x1*x1;
for (z=x1,a=x1,b=1,i=1;i<1000;i++) // if x1>0.8 convergence is slow
{
a*=x2; b+=2; dz=a/b; z-=dz;
a*=x2; b+=2; dz=a/b; z+=dz;
if (::abs(dz)<zero) break;
}
if (_shift) z+=pi/6.0;
if (_invert) z=0.5*pi-z;
if (_negative) z=-z;
return z;
}
T asin (const T &x) // = asin(x)
{
if (x<=-1.0) return -0.5*pi;
if (x>=+1.0) return +0.5*pi;
return ::atan(x/::sqrt(1.0-(x*x)));
}
Where T is any floating point type (float,double,...). As you can see you need sqrt(x), pi=3.141592653589793238462643383279502884197169399375105, zero=1e-20 and +,-,*,/ operations implemented. The zero constant is the target precision.
So just replace T with float/double and ignore the :: ...
so I guess I'd have to increment it in the while statement too
Yes, this might be a way. And what stops you?
int i=0;
while(condition){
//do something
i++;
}
Another way would be using the for condition:
for(i = 1; i < 23500 && my_abs(prev_term - next_term) >= EPSILON; i++)
Your formula is wrong. Here is the correct formula: http://scipp.ucsc.edu/~haber/ph116A/taylor11.pdf.
P.S. Also note that your formula and your series do not correspond to each other.
You can use while like this:
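// keep going while consecutive sums still differ by at least the tolerance;
// sum_prev must start out different from sum so the loop is entered
while( std::abs(sum_prev - sum) >= 1e-15 )
{
sum_prev = sum;
sum += a;
a = next(a, x, i++);
}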
I was looking at another question (here) where someone was looking for a way to get the square root of a 64 bit integer in x86 assembly.
This turns out to be very simple. The solution is to convert to a floating point number, calculate the sqrt and then convert back.
I need to do something very similar in C however when I look into equivalents I'm getting a little stuck. I can only find a sqrt function which takes in doubles. Doubles do not have the precision to store large 64bit integers without introducing significant rounding error.
Is there a common math library that I can use which has a long double sqrt function?
There is no need for long double; the square root can be calculated with double (if it is IEEE-754 64-bit binary). The rounding error in converting a 64-bit integer to double is nearly irrelevant in this problem.
The rounding error is at most one part in 2^53. This causes an error in the square root of at most one part in 2^54. The sqrt itself has a rounding error of less than one part in 2^53, due to rounding the mathematical result to the double format. The sum of these errors is tiny; the largest possible square root of a 64-bit integer (rounded to 53 bits) is 2^32, so an error of three parts in 2^54 is less than .00000072.
For a uint64_t x, consider sqrt(x). We know this value is within .00000072 of the exact square root of x, but we do not know its direction. If we adjust it to sqrt(x) - 0x1p-20, then we know we have a value that is less than, but very close to, the square root of x.
Then this code calculates the square root of x, truncated to an integer, provided the operations conform to IEEE 754:
uint64_t y = sqrt(x) - 0x1p-20;
if (2*y < x - y*y)
++y;
(2*y < x - y*y is equivalent to (y+1)*(y+1) <= x except that it avoids wrapping the 64-bit integer if y+1 is 2^32.)
Function sqrtl(), taking a long double, is part of C99.
Note that your compilation platform does not have to implement long double as 80-bit extended-precision. It is only required to be as wide as double, and Visual Studio implements it as a plain double. GCC and Clang do compile long double to 80-bit extended-precision on Intel processors.
Yes, the standard library has sqrtl() (since C99).
If you only want to calculate sqrt for integers, using divide and conquer should find the result in max 32 iterations:
uint64_t mysqrt (uint64_t a)
{
uint64_t min=0;
//uint64_t max=1<<32;
uint64_t max=((uint64_t) 1) << 32; //chux' bugfix
while(1)
{
if (max <= 1 + min)
return min;
uint64_t sqt = min + (max - min)/2;
uint64_t sq = sqt*sqt;
if (sq == a)
return sqt;
if (sq > a)
max = sqt;
else
min = sqt;
}
}
Debugging is left as exercise for the reader.
Here we collect several observations in order to arrive at a solution:
1. In standard C >= 1999, it is guaranteed that non-negative integers have a representation in bits as one would expect for any base-2 number.
----> Hence, we can trust bit manipulation of this type of number.
2. If x is an unsigned integer type, then x >> 1 == x / 2 and x << 1 == x * 2.
(!) But: It is very probable that bit operations are done faster than their arithmetical counterparts.
3. sqrt(x) is mathematically equivalent to exp(log(x)/2.0).
4. If we consider truncated logarithms and base-2 exponentials for integers, we can obtain a fair estimate: IntExp2( IntLog2(x) / 2) "==" IntSqrtDn(x), where "==" is informal notation meaning almost equal to (in the sense of a good approximation).
5. If we write IntExp2( IntLog2(x) / 2 + 1) "==" IntSqrtUp(x), we obtain an "above" approximation for the integer square root.
6. The approximations obtained in (4.) and (5.) are a little rough (they enclose the true value of sqrt(x) between two consecutive powers of 2), but they can be a very good starting point for any algorithm that searches for the square root of x.
7. The Newton algorithm for the square root can work well for integers, if we have a good first approximation to the real solution.
http://en.wikipedia.org/wiki/Integer_square_root
The final algorithm needs some mathematical checks to be fully sure that it always works properly, but I will not do that right now... I will show you the final program instead:
#include <stdio.h> /* For printf()... */
#include <stdint.h> /* For uintmax_t... */
#include <math.h> /* For sqrt() .... */
int IntLog2(uintmax_t n) {
if (n == 0) return -1; /* Error */
int L;
for (L = 0; n >>= 1; L++)
;
return L; /* It takes < 64 steps for long long */
}
uintmax_t IntExp2(int n) {
if (n < 0)
return 0; /* Error */
uintmax_t E;
for (E = 1; n-- > 0; E <<= 1)
;
return E; /* It takes < 64 steps for long long */
}
uintmax_t IntSqrtDn(uintmax_t n) { return IntExp2(IntLog2(n) / 2); }
uintmax_t IntSqrtUp(uintmax_t n) { return IntExp2(IntLog2(n) / 2 + 1); }
int main(void) {
uintmax_t N = 947612934; /* Try here your number! */
uintmax_t sqrtn = IntSqrtDn(N), /* 1st approx. to sqrt(N) by below */
sqrtn0 = IntSqrtUp(N); /* 1st approx. to sqrt(N) by above */
/* The following means while( abs(sqrt-sqrt0) > 1) { stuff... } */
/* However, we take care of subtractions on unsigned arithmetic, just in case... */
while ( (sqrtn > sqrtn0 + 1) || (sqrtn0 > sqrtn+1) )
sqrtn0 = sqrtn, sqrtn = (sqrtn0 + N/sqrtn0) / 2; /* Newton iteration */
printf("N==%ju, sqrt(N)==%g, IntSqrtDn(N)==%ju, IntSqrtUp(N)==%ju, sqrtn==%ju, sqrtn*sqrtn==%ju\n\n",
N, sqrt(N), IntSqrtDn(N), IntSqrtUp(N), sqrtn, sqrtn*sqrtn);
return 0;
}
The last value stored in sqrtn is the integer square root of N.
The last line of the program just shows all the values, for verification purposes.
So, you can try different values of N and see what happens.
If we add a counter inside the while-loop, we'll see that no more than a few iterations happen.
Remark: It is necessary to verify that the condition abs(sqrtn-sqrtn0)<=1 is always achieved when working in the integer-number setting. If not, we shall have to fix the algorithm.
Remark2: In the initialization statements, observe that sqrtn0 == sqrtn * 2 == sqrtn << 1. This saves us some calculations.
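Regarding the first remark, one commonly used variant of the integer Newton iteration sidesteps the termination question by starting from a power of two that is at least sqrt(n) and stopping as soon as the iterate stops decreasing. The sketch below is that variant, not the program above: the name isqrt_newton, the choice of uint64_t, and the stopping rule are assumptions made for illustration.

```c
#include <stdint.h>

/* Integer Newton iteration for floor(sqrt(n)).
 * Starts from a power of two that is >= sqrt(n); the iterates then
 * decrease until they reach the floor of the root. */
uint64_t isqrt_newton(uint64_t n)
{
    if (n < 2) return n;

    /* bits = position of the highest set bit, plus one */
    int bits = 0;
    for (uint64_t t = n; t != 0; t >>= 1)
        ++bits;

    uint64_t x0 = (uint64_t)1 << ((bits + 1) / 2);  /* x0 >= sqrt(n) */
    uint64_t x1 = (x0 + n / x0) / 2;
    while (x1 < x0) {
        x0 = x1;
        x1 = (x0 + n / x0) / 2;
    }
    return x0;
}
```

Because the starting value is never below the true root, each step either decreases the estimate or signals that the floor has been reached, so no absolute-difference test is needed.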
// sqrt_i64 returns the integer square root of v.
int64_t sqrt_i64(int64_t v) {
uint64_t q = 0, b = 1, r = v;
for( b <<= 62; b > 0 && b > r; b >>= 2);
while( b > 0 ) {
uint64_t t = q + b;
q >>= 1;
if( r >= t ) {
r -= t;
q += b;
}
b >>= 2;
}
return q;
}
The for loop may be optimized by using the clz machine code instruction.
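A sketch of that optimization follows, assuming GCC or Clang, where __builtin_clzll is available as a compiler extension (it is not standard C). The function name sqrt_u64_clz and the unsigned parameter are choices made for this illustration; zero is handled separately because the builtin is undefined for a zero argument.

```c
#include <stdint.h>

/* Same digit-by-digit method as sqrt_i64, but the starting power of four
 * comes from a count-leading-zeros builtin instead of a loop. */
uint64_t sqrt_u64_clz(uint64_t v)
{
    if (v == 0) return 0;   /* the builtin is undefined for 0 */
    uint64_t q = 0, r = v;
    int top = 63 - __builtin_clzll((unsigned long long)v); /* highest set bit */
    uint64_t b = (uint64_t)1 << (top & ~1);  /* largest power of 4 <= v */
    while (b > 0) {
        uint64_t t = q + b;
        q >>= 1;
        if (r >= t) {
            r -= t;
            q += b;
        }
        b >>= 2;
    }
    return q;
}
```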