Issue with Square Root in C Algorithm

I have a bit of code that finds a point on a unit sphere. Recall, for a unit sphere:
1 = sqrt( x^2 + y^2 + z^2 )
The algorithm picks two random points (the x and y coordinates) between zero and one. Provided their magnitude is less than one we have room to define a third coordinate by solving the above equation for z.
void pointOnSphere(double *point){
    double x, y;
    do {
        x = 2*randf() - 1;
        y = 2*randf() - 1;
    } while (x*x + y*y > 1);
    double mag = sqrt(fabs(1 - x*x - y*y));
    point[0] = 2*(x*mag);
    point[1] = 2*(y*mag);
    point[2] = 1 - 2*(mag*mag);
}
Technically, I inherited this code. The previous owner compiled using -Ofast which "Disregards strict standards compliance". TL;DR it means your code doesn't need to follow strict IEEE standards. So when I tried to compile without optimization I ran into an error.
undefined reference to `sqrt'
What are IEEE standards? Well, because computers can't store floating point numbers to infinite precision, rounding errors pop up during certain calculations if you're not careful.
After some googling I ran into this question which got me on the right track about using proper IEEE stuff. I even read this article about floating point numbers (which I recommend). Unfortunately it didn't answer my questions.
I'd like to use sqrt() in my function as opposed to something like Newton Iteration. I understand the issue in my algorithm probably comes from the fact I could potentially (even though not really) pass a negative number to the sqrt() function. I'm just not quite sure how to remedy the issue. Thanks for all the help!
Oh, and if it's relevant I'm using a Mersenne Twister number generator.
Just to clarify, I am linking libm with -lm! I have also confirmed it is pointing to the correct library.

As for the undefined reference to sqrt: you need to link with libm, usually with the -lm option or similar.
Also note that
Provided their magnitude is less than one we have room to define a third coordinate by solving the above equation for z.
is wrong. The x and y must satisfy x * x + y * y <= 1 in order for there to be a solution for z.
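For example, with GCC the library must come after the files that use it on the command line (the file name here is just illustrative):
gcc -O2 sphere.c -o sphere -lm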

I'd use spherical coordinates
theta = randf()*M_PI;
phi = randf()*2*M_PI;
r = 1.0;
x = r*sin(theta)*cos(phi);
y = r*sin(theta)*sin(phi);
z = r*cos(theta);
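For reference, here is that snippet wrapped into the same interface as the question's function; a sketch assuming the question's randf() helper:
#include <math.h> /* link with -lm; M_PI is POSIX, not strict ISO C */

void pointOnSphereSpherical(double *point){
    double theta = randf()*M_PI;    /* polar angle */
    double phi = randf()*2*M_PI;    /* azimuthal angle */
    point[0] = sin(theta)*cos(phi); /* r = 1 on the unit sphere */
    point[1] = sin(theta)*sin(phi);
    point[2] = cos(theta);
}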

To ensure the points meet a condition, test for the condition itself as part of the while loop, rather than a derivation of the condition.
// math functions like sqrt() need their declaration from <math.h> and,
// on many systems, linking with -lm. An optimizing compiler may evaluate
// or inline such calls itself, which can hide a missing -lm; compiling
// without optimization then exposes "undefined reference to `sqrt'".
#include <math.h>
void pointOnSphere(double *point){
    double x, y, z = 2.0; /* out-of-range start so a first-pass `continue` re-loops */
    do {
        x = 2*randf() - 1;
        y = 2*randf() - 1;
        double zz = 1.0 - x*x - y*y;
        if (zz < 0.) continue;     /* (x,y) fell outside the unit disk */
        z = sqrt(zz);
        if (rand()%2) z = -z;      /* Flip z half the time */
    } while (x*x + y*y + z*z > 1); /* Must meet this condition */
    point[0] = x;
    point[1] = y;
    point[2] = z;
}
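A quick sanity check on either version is to print the magnitude of a returned point; a minimal harness, compiled together with the function above (the rand()-based randf() is only a stand-in for the question's Mersenne Twister):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double randf(void) { return rand() / (double)RAND_MAX; } /* stand-in generator */
void pointOnSphere(double *point);                       /* the function above */

int main(void){
    double p[3];
    pointOnSphere(p);
    /* should print 1 up to rounding error */
    printf("|p| = %.17g\n", sqrt(p[0]*p[0] + p[1]*p[1] + p[2]*p[2]));
    return 0;
}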

Related

Underflow error in floating point arithmetic in C

I am new to C, and my task is to create a function
f(x) = sqrt[(x^2)+1]-1
that can handle very large numbers and very small numbers. I am submitting my script on an online interface that checks my answers.
For very large numbers I simplify the expression to:
f(x) = x-1
By just using the highest power. This was the correct answer.
The same logic does not work for smaller numbers. For small numbers (on the order of 1e-7), they are very quickly truncated to zero, even before they are squared. I suspect that this has to do with floating point precision in C. In my textbook, it says that the float type has smallest possible value of 1.17549e-38, with 6 digit precision. So although 1e-7 is much larger than 1.17e-38, it has a higher precision, and is therefore rounded to zero. This is my guess, correct me if I'm wrong.
As a solution, I am thinking that I should convert x to a long double when x < 1e-6. However when I do this, I still get the same error. Any ideas? Let me know if I can clarify. Code below:
#include <math.h>
#include <stdio.h>

double feval(double x) {
    /* Insert your code here */
    if (x > 1e299)
    {
        return x-1;
    }
    if (x < 1e-6)
    {
        long double g;
        g = x;
        printf("x = %Lf\n", g);
        long double a;
        a = pow(x,2);
        printf("x squared = %Lf\n", a);
        return sqrt(g*g+1.)- 1.;
    }
    else
    {
        printf("x = %f\n", x);
        printf("Used third \n");
        return sqrt(pow(x,2)+1.)-1;
    }
}

int main(void)
{
    double x;
    printf("Input: ");
    scanf("%lf", &x);
    double b;
    b = feval(x);
    printf("%f\n", b);
    return 0;
}
For small inputs, you're getting truncation error when you do 1+x^2. If x=1e-7f, x*x will happily fit into a 32-bit floating point number (with a little bit of error due to the fact that 1e-7 does not have an exact floating point representation), but x*x will be so much smaller than 1 that floating point precision will not be sufficient to represent 1+x*x.
It would be more appropriate to do a Taylor expansion of sqrt(1+x^2), which to lowest order would be
sqrt(1+x^2) = 1 + 0.5*x^2 + O(x^4)
Then, you could write your result as
sqrt(1+x^2)-1 = 0.5*x^2 + O(x^4),
avoiding the scenario where you add a very small number to 1.
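As a sketch, feval along those lines might look like this (the thresholds are illustrative choices, not tuned constants; the hypot-based answers below avoid cancellation over the whole range):
#include <math.h>

double feval(double x)
{
    double ax = fabs(x);
    if (ax < 1e-4)   /* series branch: truncation error ~ x^4/8 is negligible here */
        return 0.5*x*x;
    if (ax > 1e150)  /* x*x would overflow; sqrt(x*x+1) ~ |x| there */
        return ax - 1.0;
    return sqrt(x*x + 1.0) - 1.0;
}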
As a side note, you should not use pow for integer powers. For x^2, you should just do x*x. Arbitrary integer powers are a little trickier to do efficiently; the GNU scientific library for example has a function for efficiently computing arbitrary integer powers.
There are two issues here when implementing this in the naive way: overflow or underflow in intermediate computation when computing x * x, and subtractive cancellation during the final subtraction of 1. The second issue is an accuracy issue.
ISO C has a standard math function hypot (x, y) that performs the computation sqrt (x * x + y * y) accurately while avoiding underflow and overflow in intermediate computation. A common approach to fix issues with subtractive cancellation is to transform the computation algebraically such that it is transformed into multiplications and / or divisions.
Combining these two fixes leads to the following implementation for float argument. It has an error of less than 3 ulps across all possible inputs according to my testing.
/* Compute sqrt(x*x+1)-1 accurately and without spurious overflow or underflow */
float func (float x)
{
return (x / (1.0f + hypotf (x, 1.0f))) * x;
}
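The same rearrangement carries over directly to double (a sketch; the sub-3-ulp bound quoted above was measured for the float version only):
#include <math.h>

/* double-precision analogue of func() above */
double func_d (double x)
{
    return (x / (1.0 + hypot (x, 1.0))) * x;
}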
A trick that is often useful in these cases is based on the identity
(a+1)*(a-1) = a*a-1
In this case
sqrt(x*x+1)-1 = (sqrt(x*x+1)-1)*(sqrt(x*x+1)+1) / (sqrt(x*x+1)+1)
              = (x*x+1-1) / (sqrt(x*x+1)+1)
              = x*x / (sqrt(x*x+1)+1)
The last formula can be used as an implementation. For very small x, sqrt(x*x+1)+1 will be close to 2 (for small enough x it will be exactly 2), but we don't lose precision in evaluating it.
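As code, the last formula might look like this (a sketch; the function name is mine, and unlike the hypot version above it can still overflow in x*x for huge x):
#include <math.h>

/* f(x) = sqrt(x*x+1) - 1 rewritten as x*x / (sqrt(x*x+1) + 1) */
double feval_identity (double x)
{
    return x*x / (sqrt (x*x + 1.0) + 1.0);
}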
The problem isn't with running into the minimum value, but with the precision.
As you said yourself, float on your machine has about 7 digits of precision. So let's take x = 1e-7, so that x^2 = 1e-14. That's still well within the range of float, no problems there. But now add 1. The exact answer would be 1.00000000000001. But if we only have 7 digits of precision, this gets rounded to 1.0000000, i.e. exactly 1. So you end up computing sqrt(1.0)-1 which is exactly 0.
One approach would be to use the linear approximation of sqrt around x=1 that sqrt(x) ~ 1+0.5*(x-1). That would lead to the approximation f(x) ~ 0.5*x^2.

Why does this code fail for these weird numbers?

I wrote a function to find the cube root of a number a using the Newton-Raphson method to find the root of the function f(x) = x^3 - a.
#include <stdio.h>
#include <math.h>

double cube_root(double a)
{
    double x = a;
    double y;
    int equality = 0;
    if(x == 0)
    {
        return(x);
    }
    else
    {
        while(equality == 0)
        {
            y = (2 * x * x * x + a) / (3 * x * x);
            if(y == x)
            {
                equality = 1;
            }
            x = y;
        }
        return(x);
    }
}
f(x) for a = 20 (blue) and a = -20 (red) http://graphsketch.com/?eqn1_color=1&eqn1_eqn=x*x*x%20-%2020&eqn2_color=2&eqn2_eqn=x*x*x%20%2B%2020&eqn3_color=3&eqn3_eqn=&eqn4_color=4&eqn4_eqn=&eqn5_color=5&eqn5_eqn=&eqn6_color=6&eqn6_eqn=&x_min=-8&x_max=8&y_min=-75&y_max=75&x_tick=1&y_tick=1&x_label_freq=5&y_label_freq=5&do_grid=0&bold_labeled_lines=0&line_width=4&image_w=850&image_h=525
The code seemed to be working well; for example, it calculates the cube root of 338947578237847893823789474.324623784 just fine. But it weirdly fails for some numbers, for example 4783748237482394: the code just seems to go into an infinite loop and must be manually terminated.
Can anyone explain why the code should fail on this number? I've included the graph to show that, using the starting value of a, this method should always keep providing closer and closer estimates until the two values are equal to working precision. So I don't really get what's special about this number.
Apart from posting an incorrect formula...
You are performing floating point arithmetic, and floating point arithmetic has rounding errors. Even with the rounding errors, you will get very very close to a cube root, but you won't get exactly there (usually cube roots are irrational, and floating point numbers are rational).
Once your x is very close to the cube root, when you calculate y, you should get the same result as x, but because of rounding errors, you may get something very close to x but slightly different instead. So x != y. Then you do the same calculation starting with y, and you may get x as the result. So your result will forever switch between two values.
You can do the same thing with three numbers x, y and z and quit when either z == y or z == x. This is much more likely to stop, and with a bit of mathematics you might even be able to prove that it will always stop.
Better to calculate the change in x, and determine whether that change is small enough so that the next step will not change x except for rounding errors.
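A sketch of that last suggestion, terminating on the size of the step rather than on exact equality (the 1e-15 relative tolerance is an illustrative choice, a few ulps above double precision):
#include <math.h>

double cube_root(double a)
{
    if (a == 0) return 0;
    double x = a, y;
    for (;;)
    {
        y = (2*x*x*x + a) / (3*x*x);
        /* stop once the step is negligible relative to x */
        if (fabs(y - x) <= 1e-15 * fabs(x)) break;
        x = y;
    }
    return y;
}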
shouldn't it be:
y = x - (2 * x * x * x + a) / (3 * x * x);
?

Computing fractional exponents in C

I'm trying to evaluate a^n, where a and n are rational numbers.
I don't want to use any predefined functions like sqrt() or pow()
So I'm trying to use Newton's Method to get an approximate solution using this approach:
3^0.2 = 3^(1/5) , so if x = 3^0.2, x^5 = 3.
Probably the best way to solve that (without a calculator but still
using the basic arithmetic operations) is to use "Newton's method".
Newton's method for solving the equation f(x) = 0 is to set up a
sequence of numbers x[n], defined by taking x[0] as some initial "guess"
and then x[n+1] = x[n] - f(x[n]) / f'(x[n]), where f'(x) is the derivative of f.
Posted on physicsforums
The problem with that method is that if I want to compute 5.2^0.33333, I'll need to find the roots of the equation x^100000 - 5.2^33333 = 0. I end up with huge numbers, and get inf and nan errors most of the time.
Can someone give me advice on how to solve this problem? Or, can someone provide another algorithm to compute a^n?
It seems your task is to calculate
(xN / xD)^(aN / aD), where xN, xD, aN, aD ∈ ℤ and xD, aD ≠ 0,
using only multiplications, divisions, additions, and subtractions, with Newton's method as the suggested method to implement.
The equation we're trying to solve (for y) is
y = (xN / xD)^(aN / aD), where y ∈ ℝ
Newton's method finds a root of a function. If we want to use it to solve the above, we subtract the right side from the left side, to get a function whose zero gives us the y we want:
f(y) = y - (xN / xD)^(aN / aD) = 0
Not much help. I guess this is as far as you got? The point here is to not form that function just yet, because we don't have a way to calculate a rational power of a rational number!
First, let's decide that aD and xD are both positive. We can do that simply by negating both aN and aD if aD was negative (so the sign of aN/aD does not change), and negating both xN and xD if xD was negative. Remember, by definition neither xD nor aD is zero. Then, we can simply raise both sides to the aD'th power:
y^aD = (xN / xD)^aN = xN^aN / xD^aN
We can even eliminate the division by multiplying both sides by the last term:
y^aD × xD^aN = xN^aN
Now, this looks quite promising! The function we get from this is
f(y) = y^aD × xD^aN - xN^aN
Newton's method also requires the derivative, which is obviously
df(y) / dy = aD × y^(aD-1) × xD^aN
Newton's method itself relies on iterating
y[i+1] = y[i] - f(y[i]) / df(y[i])
If you work out the math, you'll find that the iteration is just
y[i+1] = y[i] - y[i] / aD + (y[i] × xN^aN) / (aD × y[i]^aD × xD^aN)
You don't need to keep all the y values in memory; it is enough to remember the last one, and stop iterating when their difference is small enough.
You do still have exponentiation above, but now they are integer exponentiation only, i.e.
xN^aD = xN × xN × … × xN    (aD times)
which you can do very simply, for example just by multiplying the argument by itself the desired number of times, e.g. in C,
double ipow(const double base, const int exponent)
{
    double result = 1.0;
    int i;
    for (i = 0; i < exponent; i++)
        result *= base;
    return result;
}
There are more efficient methods to do integer exponentiation, but the above function should be perfectly acceptable for this.
The final problem is to pick the initial y so that you get convergence. You cannot use 0, because (a power of) y is used as a denominator in the division; you'd get division by zero error. Personally, I'd check whether the result ought to be positive or negative, and smaller than or greater than one in magnitude; two rules overall to pick a safe initial y.
Questions?
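Putting the pieces together, one possible sketch in C (the function name, initial guess, and stopping tolerance are illustrative; it assumes xN/xD is positive and aN, aD are positive, and uses the ipow() above):
#include <math.h>

double ratpow (double xN, double xD, int aN, int aD)
{
    double t = ipow (xN / xD, aN); /* (xN/xD)^aN: integer exponent only */
    double y = 1.5;                /* crude positive initial guess */
    for (;;)
    {
        double ynext = y - y / aD + t / (aD * ipow (y, aD - 1));
        if (fabs (ynext - y) < 1e-12 * fabs (ynext))
            return ynext;
        y = ynext;
    }
}
For example, ratpow(3, 1, 1, 5) converges to about 1.24573, i.e. the 3^0.2 from the question.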
You can use the generalized binomial theorem. Substitute y=1 and x=a-1. You would want to truncate the infinite series after enough terms, based on the desired accuracy. To be able to link number of terms to accuracy, you would need to ensure that the x^r terms are decreasing in absolute value. So, depending on the value of a and n, you should apply the formula to compute one of a^n and a^(-n) and use that to get your desired result.
A solution for raising an integer number to a power is:
int poweri (int x, unsigned int y)
{
    int temp;
    if (y == 0)
        return 1;
    /* exponentiation by squaring: recurse on y/2, then square */
    temp = poweri (x, y / 2);
    if ((y % 2) == 0)
        return temp * temp;
    else
        return x * temp * temp;
}
However, the square root doesn't provide as clean of a closed solution. There is a good bit of background to be found at Wikipedia (square root) and at Wolfram MathWorld (square root algorithms). Both provide several methods that will meet your needs; you just have to choose the one that fits your purpose.
With slight modification, this routine from wikipedia (modified to return the square root and refine accuracy) returns a surprisingly accurate square root. Yes, there will be howls about the use of a union, and it is only valid where integer and float storage are equivalent, but if you are hacking your own square root, this is relatively efficient:
float sqrt_f (float x)
{
    float xhalf = 0.5f*x;
    union
    {
        float x;
        int i;
    } u;
    u.x = x;
    u.i = 0x5f3759df - (u.i >> 1);
    /* The next line can be repeated any number of times to increase accuracy */
    // u.x = u.x * (1.5f - xhalf * u.x * u.x);
    int i = 10;
    while (i--)
        u.x *= 1.5f - xhalf * u.x * u.x;
    return 1.0f / u.x; /* u.x approximates 1/sqrt(x), so invert */
}
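A quick comparison against the library routine shows how close the refined estimate gets (test harness only; sqrtf serves purely as a reference, compiled together with sqrt_f above):
#include <stdio.h>
#include <math.h>

float sqrt_f (float x); /* the routine above */

int main (void)
{
    printf ("sqrt_f(2) = %.9f\n", sqrt_f (2.0f));
    printf ("sqrtf(2)  = %.9f\n", sqrtf (2.0f)); /* library reference */
    return 0;
}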

Approximation of arcsin in C

I've got a program that calculates the approximation of an arcsin value based on Taylor's series.
My friend and I have come up with an algorithm which has been able to return the almost "right" values, but I don't think we've done it very crisply. Take a look:
double my_asin(double x)
{
    double a = 0;
    int i = 0;
    double sum = 0;
    a = x;
    for(i = 1; i < 23500; i++)
    {
        sum += a;
        a = next(a, x, i);
    }
    return sum;
}

double next(double a, double x, int i)
{
    return a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2)));
}
I checked if my_pow works correctly so there's no need for me to post it here as well. Basically I want the loop to end once the difference between the current and next term is more or equal to my EPSILON (0.00001), which is the precision I'm using when calculating a square root.
This is how I would like it to work:
while(my_abs(prev_term - next_term) >= EPSILON)
But the function double next is dependent on i, so I guess I'd have to increment it in the while statement too. Any ideas how I should go about doing this?
Example output for -1:
$ -1.5675516116e+00
Instead of:
$ -1.5707963268e+00
Thanks so much guys.
Issues with your code and question include:
Your image file showing the Taylor series for arcsin has two errors: There is a minus sign on the x^5 term instead of a plus sign, and the power of x is shown as x^n but should be x^(2n+1).
The x factor in the terms of the Taylor series for arcsin increases by x^2 in each term, but your formula a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2))) divides by x^2 in each term. This does not matter for the particular value -1 you ask about, but it will produce wrong results for other values, except 1.
You ask how to end the loop once the difference in terms is “more or equal to” your epsilon, but, for most values of x, you actually want less than (or, conversely, you want to continue, not end, while the difference is greater than or equal to, as you show in code).
The Taylor series is a poor way to evaluate functions because its error increases as you get farther from the point around which the series is centered. Most math library implementations of functions like this use a minimax series or something related to it.
Evaluating the series from low-order terms to high-order terms causes you to add larger values first, then smaller values later. Due to the nature of floating-point arithmetic, this means that accuracy from the smaller terms is lost, because it is “pushed out” of the width of the floating-point format by the larger values. This effect will limit how accurate any result can be.
Finally, to get directly to your question, the way you have structured the code, you directly update a, so you never have both the previous term and the next term at the same time. Instead, create another double b so that you have an object b for a previous term and an object a for the current term, as shown below.
Example:
double a = x, b, sum = a;
int i = 0;
do
{
    b = a;               /* previous term */
    a = next(a, x, ++i); /* current term */
    sum += a;
} while (fabs(b-a) > threshold);
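Folding the corrected term recurrence and this loop structure back into the original routine gives a sketch like the following (EPSILON is the question's 0.00001; as noted above, convergence near |x| = 1 is very slow, which limits the accuracy there):
#include <math.h>

#define EPSILON 0.00001

double my_asin(double x)
{
    /* each term is the previous one times (2i-1)^2 * x^2 / ((2i)*(2i+1)) */
    double term = x, sum = 0.0;
    int i = 1;
    do {
        sum += term;
        double n = 2.0*i - 1.0;
        term *= n*n * x*x / ((2.0*i) * (2.0*i + 1.0));
        i++;
    } while (fabs(term) >= EPSILON);
    return sum;
}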
Using the Taylor series for arcsin is extremely imprecise, as it converges very badly and there will be relatively big differences from the real value for any finite number of terms. Also, using pow with integer exponents is not very precise or efficient.
However using arctan for this is OK
arcsin(x) = arctan(x/sqrt(1-(x*x)));
as its Taylor series converges OK on the <0.0, 0.8> range; all the other parts of the range can be computed through it (using trigonometric identities). So here is my C++ implementation (from my arithmetics template):
T atan (const T &x) // = atan(x)
{
    bool _shift=false;
    bool _invert=false;
    bool _negative=false;
    T z,dz,x1,x2,a,b; int i;
    x1=x; if (x1<0.0) { _negative=true; x1=-x1; }
    if (x1>1.0) { _invert=true; x1=1.0/x1; }
    if (x1>0.7) { _shift=true; b=::sqrt(3.0)/3.0; x1=(x1-b)/(1.0+(x1*b)); }
    x2=x1*x1;
    for (z=x1,a=x1,b=1,i=1;i<1000;i++) // if x1>0.8 convergence is slow
    {
        a*=x2; b+=2; dz=a/b; z-=dz;
        a*=x2; b+=2; dz=a/b; z+=dz;
        if (::abs(dz)<zero) break;
    }
    if (_shift) z+=pi/6.0;
    if (_invert) z=0.5*pi-z;
    if (_negative) z=-z;
    return z;
}

T asin (const T &x) // = asin(x)
{
    if (x<=-1.0) return -0.5*pi;
    if (x>=+1.0) return +0.5*pi;
    return ::atan(x/::sqrt(1.0-(x*x)));
}
Where T is any floating point type (float,double,...). As you can see you need sqrt(x), pi=3.141592653589793238462643383279502884197169399375105, zero=1e-20 and +,-,*,/ operations implemented. The zero constant is the target precision.
So just replace T with float/double and ignore the :: ...
so I guess I'd have to increment it in the while statement too
Yes, this might be a way. And what stops you?
int i=0;
while(condition){
    //do something
    i++;
}
Another way would be using the for condition:
for(i = 1; i < 23500 && my_abs(prev_term - next_term) >= EPSILON; i++)
Your formula is wrong. Here is the correct formula: http://scipp.ucsc.edu/~haber/ph116A/taylor11.pdf.
P.S. Also note that your formula and your series do not correspond to each other.
You can use while like this:
while( std::abs(sum_prev - sum) > 1e-15 ) // keep looping while the sum still changes
{
    sum_prev = sum;
    sum += a;
    a = next(a, x, i++);
}

Floating-point division - bias to avoid a result less than an 'exact' value

I am currently tightening floating-point numerics for an estimate of a value. (It's: p(k,t) for those who are interested.) Essentially, the utility can never yield an under-estimate of this value: the security of probable prime generation depends on a numerically robust implementation. While output results agree with the published values, I have used the DBL_EPSILON value to ensure that division, in particular, yields a result that is never less than the true value:
Consider: double x, y; /* assigned some values... */
The evaluation: r = x / y; occurs frequently, but these (finite precision) results may truncate significant digits from the true result - a possibly infinite precision rational expansion. I currently try to mitigate this by applying a bias to the numerator, i.e.,
r = ((1.0 + DBL_EPSILON) * x) / y;
If you know anything about this subject, p(k,t) is typically much smaller than most estimates - but it's simply not good enough to dismiss the issue with this "observation". I can of course state:
(((1.0 + DBL_EPSILON) * x) / y) >= (x / y)
Of course, I need to ensure that the 'biased' result is greater than, or equal to, the 'exact' value. While I am certain it has to do with manipulating or scaling DBL_EPSILON, I obviously want the 'biased' result to exceed the 'exact' result by a minimum - demonstrable under IEEE-754 arithmetic assumptions.
Yes, I've looked though Goldberg's paper, and I've searched for a robust solution. Please don't suggest manipulation of rounding modes. Ideally, I'm after an answer by someone with a very good grasp on floating-point theorems, or knows of a very well illustrated example.
EDIT: To clarify, (((1.0 + DBL_EPSILON) * x) / y) or a form (((1.0 + c) * x) / y), is not a prerequisite. This was simply an approach I was using as 'probably good enough', without having provided a solid basis for it. I can state that the numerator and denominator will not be special values: NaNs, Infs, etc., nor will the denominator be zero.
First: I know that you don't want to set the rounding mode, but it really should be said that, in terms of precision, as others have noted, setting the rounding mode will produce as good an answer as possible. Specifically, assuming that x and y are both positive (which seems to be the case, but hasn't been explicitly stated in your question), the following is a standard C snippet with the desired effect:
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

int OldRoundingMode = fegetround();
fesetround(FE_UPWARD);
r = x/y;
fesetround(OldRoundingMode);
Now, that aside, there are legitimate reasons not to want to change the rounding mode (some platforms don't support round-to-plus-infinity, on some platforms changing the rounding mode introduces a large serializing stall, etc etc), and your desire not to do so shouldn't be brushed aside so casually. So, respecting your question, what else can we do?
If your platform supports fused multiply-add, there's a very elegant solution available to you:
#include <math.h>
r = x/y;
if (fma(r,y,-x) < 0) r = nextafter(r, INFINITY);
On platforms with hardware fma support, this is very efficient. Even if fma( ) is implemented in software, it may be acceptable. This approach has the virtue that it will deliver the same result as would changing the rounding mode; that is, the tightest bound possible.
If your platform's C library is antediluvian and does not provide fma, there is still hope. Your claimed statement is correct (assuming no denormal values, at least -- I would need to think more about what happens for denormals); (1.0+DBL_EPSILON)*x/y really is always greater than or equal to the infinitely precise x/y. It will sometimes be one ulp larger than the smallest value with this property, but that's a very small and probably acceptable margin. The proof of these claims is pretty fussy, and probably not suitable for StackOverflow, but I'll give a quick sketch:
Ignoring denormals, it suffices to restrict ourselves to x, y in [1.0, 2.0).
(1.0 + eps)*x >= x + eps > x. To see this, observe:
(1.0 + eps)*x = x + x*eps >= x + eps > x.
Let P be the mathematically precise x/y. We have:
(1.0 + eps)*x/y >= (x + eps)/y = x/y + eps/y = P + eps/y
Now, y is bounded above by 2, so this gives us:
(1.0 + eps)*x/y > P + eps/2
which is sufficient to guarantee that the result rounds to a value >= P. This also shows us the way to a tighter bound. We could instead use nextafter(x,INFINITY)/y to get the desired effect with a tighter bound in many cases. (nextafter(x,INFINITY) is always x + ulp, whereas (1.0 + eps)*x will be x + 2ulp half of the time. If you want to avoid calling the nextafter library function, you can use (x + (0.75*DBL_EPSILON)*x) instead to get the same result, under the working assumption of positive normal values).
In order to be really pedantically correct, this would become significantly more complicated. No one really writes code like this, but it would be along these lines:
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

#if defined FE_UPWARD
int OldRoundingMode = fegetround();
if (OldRoundingMode < 0) goto Error;
if (fesetround(FE_UPWARD)) goto Error;
r = x/y;
if (fesetround(OldRoundingMode)) goto TrulyHosed;
return r;
TrulyHosed:
// we established the desired rounding mode and did our computation,
// but now we can't set it back to the original mode. I have no idea
// how you handle this gracefully.
Error:
#else
// we can't establish the desired rounding mode, so fall back on
// something else.
#endif
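For completeness, a concrete sketch of the nextafter variant mentioned above (assuming positive, normal x and y):
#include <math.h>

/* quotient guaranteed >= the infinitely precise x/y: bump the
   numerator up one ulp before dividing */
double div_no_underestimate (double x, double y)
{
    return nextafter (x, INFINITY) / y;
}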
