Why does this code fail for these weird numbers? - c

I wrote a function to find the cube root of a number a using the Newton-Raphson method to find the root of the function f(x) = x^3 - a.
#include <stdio.h>
#include <math.h>

double cube_root(double a)
{
    double x = a;
    double y;
    int equality = 0;
    if (x == 0)
    {
        return (x);
    }
    else
    {
        while (equality == 0)
        {
            y = (2 * x * x * x + a) / (3 * x * x);
            if (y == x)
            {
                equality = 1;
            }
            x = y;
        }
        return (x);
    }
}
f(x) for a = 20 (blue) and a = -20 (red) http://graphsketch.com/?eqn1_color=1&eqn1_eqn=x*x*x%20-%2020&eqn2_color=2&eqn2_eqn=x*x*x%20%2B%2020&eqn3_color=3&eqn3_eqn=&eqn4_color=4&eqn4_eqn=&eqn5_color=5&eqn5_eqn=&eqn6_color=6&eqn6_eqn=&x_min=-8&x_max=8&y_min=-75&y_max=75&x_tick=1&y_tick=1&x_label_freq=5&y_label_freq=5&do_grid=0&bold_labeled_lines=0&line_width=4&image_w=850&image_h=525
The code seemed to be working well; for example, it calculates the cube root of 338947578237847893823789474.324623784 just fine, but it weirdly fails for some numbers, for example 4783748237482394. The code just seems to go into an infinite loop and must be manually terminated.
Can anyone explain why the code should fail on this number? I've included the graph to show that, using the starting value of a, this method should always keep providing closer and closer estimates until the two values are equal to working precision. So I don't really get what's special about this number.

Apart from posting an incorrect formula...
You are performing floating point arithmetic, and floating point arithmetic has rounding errors. Even with the rounding errors, you will get very very close to a cube root, but you won't get exactly there (usually cube roots are irrational, and floating point numbers are rational).
Once your x is very close to the cube root, when you calculate y, you should get the same result as x, but because of rounding errors, you may get something very close to x but slightly different instead. So x != y. Then you do the same calculation starting with y, and you may get x as the result. So your result will forever switch between two values.
You can do the same thing with three numbers x, y and z and quit when either z == y or z == x. This is much more likely to stop, and with a bit of mathematics you might even be able to prove that it will always stop.
Better to calculate the change in x, and determine whether that change is small enough so that the next step will not change x except for rounding errors.
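A minimal sketch of that idea applied to the cube_root() above, keeping the same iteration but stopping once the step is small relative to x (the 1e-15 tolerance is an illustrative choice, not a prescribed value):
#include <math.h>

double cube_root_rel(double a)
{
    if (a == 0)
        return a;
    double x = a;
    for (;;)
    {
        double y = (2 * x * x * x + a) / (3 * x * x);
        if (fabs(y - x) <= 1e-15 * fabs(x))   /* change is down at rounding-error level */
            return y;
        x = y;
    }
}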

shouldn't it be:
y = x - (2 * x * x * x + a) / (3 * x * x);
?

Related

getting the original values from the concatenated numbers in c

Referring to the below question,
How to concatenate two integers in C
unsigned concatenate(unsigned x, unsigned y) {
    unsigned pow = 10;
    while (y >= pow)
        pow *= 10;
    return x * pow + y;
}
How do I get the original values for x and y without using arrays? I need less processing overhead.
How do I get the original values for x and y without using arrays?
Information is lost. To reconstruct x, y, additional information is needed.
Example: 1234 could have been formed by
  x     y
  123   4
  12    34
  1     234
  0     1234
Aside: Improved answer on the original concatenation problem.
You pass two arguments to concatenate().
If you pass two arguments to unconcatenate() you can get your numbers back:
void unconcatenate(unsigned *first, unsigned *second, unsigned concatenated, unsigned limit) {
    unsigned pow = 10;
    while (pow < limit) pow *= 10;
    *first = concatenated / pow;
    *second = concatenated % pow;
}
unsigned x, y, foo = concatenate(1234, 56);
unconcatenate(&x, &y, foo, 100);
printf("%u unconcatenaded to %u and %u.\n", foo, x, y);
see https://ideone.com/bg7qMd
I haven't commented on it yet, but that original question makes no sense to me.
In my experience, at least, the goal is to concatenate numbers with a known, fixed number of digits. For example, if I have
int year = 2021;
int month = 10;
int day = 2;
(that is, a date next month), and I want to "concatenate" them, the desired result is of course 20211002. It would be quite wrong to output 2021102 — because nobody could tell whether it was supposed to be October 2, or maybe January 2!
At the other question, there's a comment that "100 * x + y fails when y == 0", as if concatenating 23 and 0 should yield 230, or maybe 23, but that sounds crazy to me.
Since the question here is about getting the original values back, the answer is obviously to know how many digits each original number had. For example, if you know you have two 2-digit numbers x and y, then z = 100 * x + y is absolutely the right way to concatenate them. And then, having done so, getting the original values back is equally straightforward:
x = z / 100;
y = z % 100;
The key is that we picked that magic number 100, that effectively sets y as 2 digits, in advance, and baked it into the code. We did not, as some of the answers at the other question (and the code fragment in this question here) suggest, dynamically and empirically discover a scaling factor at run time by doing successive multiplications by 10. (And by not doing those successive multiplications to compute pow every time, we'll have less processing overhead, too.)
Summary: while(y >= pow) pow *= 10; is the wrong way to concatenate numbers in C. The right way is
z = 100 * x + y;
and once you've done it that way, you get the original numbers back by doing
x = z / 100;
y = z % 100;
If y might have more digits, pick an appropriate multiplier greater than 100, and use that instead. And do be mindful that if you're not careful, your concatenated number can end up bigger than an ordinary int will hold.
P.S. I said "if you know you have two 2-digit numbers x and y", but that's not quite right. If you're using 100 * x + y, then y has to be two digits (or less), but it's okay for x to be more than 2 digits (within limits).
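To make the round trip concrete, here is a small sketch under the assumption that y has at most four digits (the helper name and the use of unsigned long long for headroom are mine, not from the original answer):
#include <stdio.h>

/* Fixed-width concatenation: y is assumed to fit in 4 digits, so the
   multiplier is 10000; unsigned long long gives headroom for the result. */
static unsigned long long concat4(unsigned x, unsigned y) {
    return (unsigned long long)x * 10000 + y;
}

int main(void) {
    unsigned long long z = concat4(2021, 1002);
    unsigned x = (unsigned)(z / 10000);
    unsigned y = (unsigned)(z % 10000);
    printf("%llu -> %u and %u\n", z, x, y);   /* 20211002 -> 2021 and 1002 */
    return 0;
}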

Underflow error in floating point arithmetic in C

I am new to C, and my task is to create a function
f(x) = sqrt[(x^2)+1]-1
that can handle very large numbers and very small numbers. I am submitting my script on an online interface that checks my answers.
For very large numbers I simplify the expression to:
f(x) = x-1
By just using the highest power. This was the correct answer.
The same logic does not work for smaller numbers. For small numbers (on the order of 1e-7), they are very quickly truncated to zero, even before they are squared. I suspect that this has to do with floating point precision in C. In my textbook, it says that the float type has smallest possible value of 1.17549e-38, with 6 digit precision. So although 1e-7 is much larger than 1.17e-38, it has a higher precision, and is therefore rounded to zero. This is my guess, correct me if I'm wrong.
As a solution, I am thinking that I should convert x to a long double when x < 1e-6. However when I do this, I still get the same error. Any ideas? Let me know if I can clarify. Code below:
#include <math.h>
#include <stdio.h>

double feval(double x) {
    /* Insert your code here */
    if (x > 1e299)
    {
        return x - 1;
    }
    if (x < 1e-6)
    {
        long double g;
        g = x;
        printf("x = %Lf\n", g);
        long double a;
        a = pow(x, 2);
        printf("x squared = %Lf\n", a);
        return sqrt(g * g + 1.) - 1.;
    }
    else
    {
        printf("x = %f\n", x);
        printf("Used third \n");
        return sqrt(pow(x, 2) + 1.) - 1;
    }
}

int main(void)
{
    double x;
    printf("Input: ");
    scanf("%lf", &x);
    double b;
    b = feval(x);
    printf("%f\n", b);
    return 0;
}
int main(void)
{
double x;
printf("Input: ");
scanf("%lf", &x);
double b;
b = feval(x);
printf("%f\n", b);
return 0;
}
For small inputs, you're getting truncation error when you do 1+x^2. If x = 1e-7f, x*x will happily fit into a 32-bit floating point number (with a little bit of error due to the fact that 1e-7 does not have an exact floating point representation), but x*x will be so much smaller than 1 that floating point precision will not be sufficient to represent 1+x*x.
It would be more appropriate to do a Taylor expansion of sqrt(1+x^2), which to lowest order would be
sqrt(1+x^2) = 1 + 0.5*x^2 + O(x^4)
Then, you could write your result as
sqrt(1+x^2)-1 = 0.5*x^2 + O(x^4),
avoiding the scenario where you add a very small number to 1.
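A minimal sketch of that idea for the small-x branch (the function name and the 1e-4 cutoff are illustrative choices of mine, and the large-x handling from the question is omitted):
#include <math.h>

double feval_small(double x)
{
    if (fabs(x) < 1e-4)
        return 0.5 * x * x;              /* sqrt(1+x^2) - 1 ~= x^2/2 for tiny x */
    return sqrt(x * x + 1.0) - 1.0;      /* direct formula is fine elsewhere */
}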
As a side note, you should not use pow for integer powers. For x^2, you should just do x*x. Arbitrary integer powers are a little trickier to do efficiently; the GNU scientific library for example has a function for efficiently computing arbitrary integer powers.
There are two issues here when implementing this in the naive way: overflow or underflow in intermediate computation when computing x * x, and subtractive cancellation during the final subtraction of 1. The second issue is an accuracy issue.
ISO C has a standard math function hypot (x, y) that performs the computation sqrt (x * x + y * y) accurately while avoiding underflow and overflow in intermediate computation. A common approach to fix issues with subtractive cancellation is to transform the computation algebraically such that it is transformed into multiplications and / or divisions.
Combining these two fixes leads to the following implementation for float argument. It has an error of less than 3 ulps across all possible inputs according to my testing.
/* Compute sqrt(x*x+1)-1 accurately and without spurious overflow or underflow */
float func (float x)
{
return (x / (1.0f + hypotf (x, 1.0f))) * x;
}
A trick that is often useful in these cases is based on the identity
(a+1)*(a-1) = a*a-1
In this case
sqrt(x*x+1)-1 = (sqrt(x*x+1)-1)*(sqrt(x*x+1)+1)
/(sqrt(x*x+1)+1)
= (x*x+1-1) / (sqrt(x*x+1)+1)
= x*x/(sqrt(x*x+1)+1)
The last formula can be used as an implementation. For very small x, sqrt(x*x+1)+1 will be close to 2 (for small enough x it will be exactly 2), but we don't lose precision in evaluating it.
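As a sketch, that last formula translates directly into C (the function name is mine; note that x*x can still overflow for huge x, which the hypot-based version above avoids):
#include <math.h>

double feval_ratio(double x)
{
    return (x * x) / (sqrt(x * x + 1.0) + 1.0);   /* equals sqrt(x*x+1) - 1, without cancellation */
}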
The problem isn't with running into the minimum value, but with the precision.
As you said yourself, float on your machine has about 7 digits of precision. So let's take x = 1e-7, so that x^2 = 1e-14. That's still well within the range of float, no problems there. But now add 1. The exact answer would be 1.00000000000001. But if we only have 7 digits of precision, this gets rounded to 1.0000000, i.e. exactly 1. So you end up computing sqrt(1.0)-1 which is exactly 0.
One approach would be to use the linear approximation of sqrt around x=1 that sqrt(x) ~ 1+0.5*(x-1). That would lead to the approximation f(x) ~ 0.5*x^2.

Issue with Square Root in C Algorithm

I have a bit of code that finds a point on a unit sphere. Recall, for a unit sphere:
1 = sqrt( x^2 + y^2 + z^2 )
The algorithm picks two random points (the x and y coordinates) between zero and one. Provided their magnitude is less than one we have room to define a third coordinate by solving the above equation for z.
void pointOnSphere(double *point){
    double x, y;
    do {
        x = 2*randf() - 1;
        y = 2*randf() - 1;
    } while (x*x + y*y > 1);
    double mag = sqrt(fabs(1 - x*x - y*y));
    point[0] = 2*(x*mag);
    point[1] = 2*(y*mag);
    point[2] = 1 - 2*(mag*mag);
}
Technically, I inherited this code. The previous owner compiled using -Ofast which "Disregards strict standards compliance". TL;DR it means your code doesn't need to follow strict IEEE standards. So when I tried to compile without optimization I ran into an error.
undefined reference to `sqrt'
What are IEEE standards? Well, because computers can't store floating point numbers to infinite precision, rounding errors pop up during certain calculations if you're not careful.
After some googling I ran into this question which got me on the right track about using proper IEEE stuff. I even read this article about floating point numbers (which I recommend). Unfortunately it didn't answer my questions.
I'd like to use sqrt() in my function as opposed to something like Newton Iteration. I understand the issue in my algorithm probably comes from the fact I could potentially (even though not really) pass a negative number to the sqrt() function. I'm just not quite sure how to remedy the issue. Thanks for all the help!
Oh, and if it's relevant I'm using a Mersenne Twister number generator.
Just to clarify, I am linking libm with -lm! I have also confirmed it is pointing to the correct library.
As for the undefined reference to sqrt, you need to link with libm, usually with the -lm option or something similar.
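For example, with GCC the link step might look like this (the source file name here is only illustrative):
gcc sphere.c -o sphere -lm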
Also note that
Provided their magnitude is less than one we have room to define a third coordinate by solving the above equation for z.
is wrong. The x and y must satisfy x * x + y * y <= 1 in order for there to be a solution for z.
I'd use spherical coordinates
theta = randf()*M_PI;
phi = randf()*2*M_PI;
r = 1.0;
x = r*sin(theta)*cos(phi);
y = r*sin(theta)*sin(phi);
z = r*cos(theta);
To ensure the points meet a condition, test for the condition itself as part of the while loop, rather than a derivation of the condition.
// Functions like `sqrt(), hypot()` benefit from a declaration before use,
// and without it may generate "undefined reference to `sqrt'".
// Some functions like `sqrt()` are understood and optimized out by a smart compiler.
// Still, best to always declare them.
#include <math.h>

void pointOnSphere(double *point){
    double x, y, z;
    do {
        x = 2*randf() - 1;
        y = 2*randf() - 1;
        double h = hypot(x, y);        // sqrt(x*x + y*y)
        double zz = 1.0 - h*h;         // what z*z must be to land on the unit sphere
        if (zz < 0.) continue;         // On rare negative values due to imprecision
        z = sqrt(zz);
        if (rand()%2) z = -z;          // Flip z half the time
    } while (x*x + y*y + z*z > 1);     // Must meet this condition
    point[0] = x;
    point[1] = y;
    point[2] = z;
}
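For a quick sanity check that the returned points land on the unit sphere, a throwaway driver along these lines could be used; the rand()-based randf() is just a stand-in for the asker's Mersenne Twister, and pointOnSphere() refers to a version defined above, so the pieces are assumed to be pasted into one file in the right order:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static double randf(void) { return rand() / (RAND_MAX + 1.0); } /* placeholder in [0,1) */
void pointOnSphere(double *point);                              /* defined above */

int main(void)
{
    double p[3];
    for (int i = 0; i < 5; i++) {
        pointOnSphere(p);
        printf("(%f, %f, %f)  radius = %f\n",
               p[0], p[1], p[2], sqrt(p[0]*p[0] + p[1]*p[1] + p[2]*p[2]));
    }
    return 0;
}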

Comparing the ratio of two values to 1

I'm working via a basic 'Programming in C' book.
I have written the following code based off of it in order to calculate the square root of a number:
#include <stdio.h>

float absoluteValue (float x)
{
    if (x < 0)
        x = -x;
    return (x);
}

float squareRoot (float x, float epsilon)
{
    float guess = 1.0;
    while (absoluteValue(guess * guess - x) >= epsilon)
    {
        guess = (x/guess + guess) / 2.0;
    }
    return guess;
}

int main (void)
{
    printf("SquareRoot(2.0) = %f\n", squareRoot(2.0, .00001));
    printf("SquareRoot(144.0) = %f\n", squareRoot(144.0, .00001));
    printf("SquareRoot(17.5) = %f\n", squareRoot(17.5, .00001));
    return 0;
}
An exercise in the book has said that the current criteria used for termination of the loop in squareRoot() is not suitable for use when computing the square root of a very large or a very small number.
Instead of comparing the difference between the value of x and the value of guess^2, the program should compare the ratio of the two values to 1. The closer this ratio gets to 1, the more accurate the approximation of the square root.
If the ratio is just guess^2/x, shouldn't my code inside of the while loop:
guess = (x/guess + guess) / 2.0;
be replaced by:
guess = ((guess * guess) / x ) / 1 ; ?
This compiles but nothing is printed out into the terminal. Surely I'm doing exactly what the exercise is asking?
To calculate the ratio, just do (guess * guess / x), which could be either higher or lower than 1 depending on your implementation. Similarly, your margin of error (in percent) would be absoluteValue((guess * guess / x) - 1) * 100.
All they want you to check is how close the square root is. By squaring the number you get and dividing it by the number you took the square root of you are just checking how close you were to the original number.
Example:
sqrt(4) = 2
2 * 2 / 4 = 1 (this is exact, so we get 1: 2 * 2 = 4, and 4 / 4 = 1)
margin of error = (1 - 1) * 100 = 0% margin of error
Another example:
sqrt(4) = 1.999 (let's just say you got this)
1.999 * 1.999 = 3.996
3.996/4 = .999 (so we are close but not exact)
To check margin of error:
.999 - 1 = -.001
absoluteValue(-.001) = .001
.001 * 100 = .1% margin of error
How about applying a little algebra? Your current criterion is:
|guess^2 - x| >= epsilon
You are elsewhere assuming that guess is nonzero, so it is algebraically safe to convert that to
|1 - x / guess^2| >= epsilon / guess^2
epsilon is just a parameter governing how close the match needs to be, and the above reformulation shows that it must be expressed in terms of the floating-point spacing near guess^2 to yield equivalent precision for all evaluations. But of course that's not possible because epsilon is a constant. This is, in fact, exactly why the original criterion gets less effective as x diverges from 1.
Let us instead write the alternative expression
|1 - x / guess^2| >= delta
Here, delta expresses the desired precision in terms of the spacing of floating point values in the vicinity of 1, which is related to a fixed quantity sometimes called the "machine epsilon". You can directly select the required precision via your choice of delta, and you will get the same precision for all x, provided that no arithmetic operations overflow.
Now just convert that back into code.
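One way that conversion could look, reusing the question's absoluteValue() and treating delta as the caller-supplied tolerance (a sketch, not a drop-in replacement):
float squareRootRatio (float x, float delta)
{
    float guess = 1.0;
    /* Continue while the ratio x / guess^2 differs from 1 by at least delta. */
    while (absoluteValue(1.0 - x / (guess * guess)) >= delta)
    {
        guess = (x/guess + guess) / 2.0;
    }
    return guess;
}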
Suggest a different point of view.
With this method, guess_next = (x/guess + guess) / 2.0;, once the initial approximation is in the neighborhood, the number of correct bits doubles with each iteration. For example, log2(FLT_EPSILON) is about -23, so 6 iterations are needed. (Think 23, 12, 6, 3, 2, 1.)
The trouble with using guess * guess is that it may vanish, become 0.0 or infinity for a non-zero x.
To form a quality initial guess:
assert(x > 0.0f);
int expo;
float signif = frexpf(x, &expo);
float guess = ldexpf(signif, expo/2);
Now iterate N times (e.g. 6), (N based on FLT_EPSILON, FLT_DECIMAL_DIG or FLT_DIG.)
for (i=0; i<N; i++) {
    guess = (x/guess + guess) / 2.0f;
}
The cost of perhaps an extra iteration is saved by avoiding an expensive termination condition calculation.
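Putting those pieces together, a rough sketch might look like this; the function name and the hard-coded 6 iterations are my own choices, with 6 following the 23, 12, 6, 3, 2, 1 halving mentioned above:
#include <assert.h>
#include <math.h>

float squareRootFixedIterations(float x)
{
    assert(x > 0.0f);
    int expo;
    float signif = frexpf(x, &expo);       /* x = signif * 2^expo, signif in [0.5, 1) */
    float guess = ldexpf(signif, expo/2);  /* first guess: keep significand, halve exponent */
    for (int i = 0; i < 6; i++)            /* 6 Newton steps reach float precision */
    {
        guess = (x/guess + guess) / 2.0f;
    }
    return guess;
}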
If code wants to compare a/b for being nearest to 1.0f, simply use some epsilon factor like 1 or 2.
float a = guess;
float b = x/guess;
assert(b);
float q = a/b;
#define FACTOR (1.0f /* some value 1.0f to maybe 2,3 or 4 */)
if (q >= 1.0f - FLT_EPSILON*FACTOR && q <= 1.0f + FLT_EPSILON*FACTOR) {
    close_enough();
}
First lesson in numerical analysis: for floating point numbers x+y has the potential for large relative errors, especially when the sum is near zero, but x*y has very limited relative errors.

Approximation of arcsin in C

I've got a program that calculates the approximation of an arcsin value based on Taylor's series.
My friend and I have come up with an algorithm which has been able to return the almost "right" values, but I don't think we've done it very crisply. Take a look:
double my_asin(double x)
{
    double a = 0;
    int i = 0;
    double sum = 0;
    a = x;
    for (i = 1; i < 23500; i++)
    {
        sum += a;
        a = next(a, x, i);
    }
    return sum;
}

double next(double a, double x, int i)
{
    return a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2)));
}
I checked if my_pow works correctly so there's no need for me to post it here as well. Basically I want the loop to end once the difference between the current and next term is more or equal to my EPSILON (0.00001), which is the precision I'm using when calculating a square root.
This is how I would like it to work:
while(my_abs(prev_term - next_term) >= EPSILON)
But the function double next is dependent on i, so I guess I'd have to increment it in the while statement too. Any ideas how I should go about doing this?
Example output for -1:
$ -1.5675516116e+00
Instead of:
$ -1.5707963268e+00
Thanks so much guys.
Issues with your code and question include:
Your image file showing the Taylor series for arcsin has two errors: There is a minus sign on the x^5 term instead of a plus sign, and the power of x is shown as x^n but should be x^(2n+1).
The x factor in the terms of the Taylor series for arcsin increases by x^2 in each term, but your formula a*((my_pow(2*i-1, 2)) / ((2*i)*(2*i+1)*my_pow(x, 2))) divides by x^2 in each term. This does not matter for the particular value -1 you ask about, but it will produce wrong results for other values, except 1.
You ask how to end the loop once the difference in terms is “more or equal to” your epsilon, but, for most values of x, you actually want less than (or, conversely, you want to continue, not end, while the difference is greater than or equal to, as you show in code).
The Taylor series is a poor way to evaluate functions because its error increases as you get farther from the point around which the series is centered. Most math library implementations of functions like this use a minimax series or something related to it.
Evaluating the series from low-order terms to high-order terms causes you to add larger values first, then smaller values later. Due to the nature of floating-point arithmetic, this means that accuracy from the smaller terms is lost, because it is “pushed out” of the width of the floating-point format by the larger values. This effect will limit how accurate any result can be.
Finally, to get directly to your question, the way you have structured the code, you directly update a, so you never have both the previous term and the next term at the same time. Instead, create another double b so that you have an object b for a previous term and an object a for the current term, as shown below.
Example:
double a = x, b, sum = a;
int i = 0;
do
{
    b = a;
    a = next(a, x, ++i);
    sum += a;
} while (fabs(b-a) > threshold);
Using the Taylor series for arcsin is extremely imprecise, as the series converges very badly and there will be relatively big differences from the real value for any finite number of terms. Also, using pow with integer exponents is not very precise or efficient.
However, using arctan for this is OK:
arcsin(x) = arctan(x/sqrt(1-(x*x)));
as its Taylor series converges OK on the <0.0,0.8> range, and all the other parts of the range can be computed through it (using trigonometric identities). So here is my C++ implementation (from my arithmetics template):
T atan (const T &x) // = atan(x)
{
    bool _shift=false;
    bool _invert=false;
    bool _negative=false;
    T z,dz,x1,x2,a,b; int i;
    x1=x; if (x1<0.0) { _negative=true; x1=-x1; }
    if (x1>1.0) { _invert=true; x1=1.0/x1; }
    if (x1>0.7) { _shift=true; b=::sqrt(3.0)/3.0; x1=(x1-b)/(1.0+(x1*b)); }
    x2=x1*x1;
    for (z=x1,a=x1,b=1,i=1;i<1000;i++) // if x1>0.8 convergence is slow
    {
        a*=x2; b+=2; dz=a/b; z-=dz;
        a*=x2; b+=2; dz=a/b; z+=dz;
        if (::abs(dz)<zero) break;
    }
    if (_shift) z+=pi/6.0;
    if (_invert) z=0.5*pi-z;
    if (_negative) z=-z;
    return z;
}

T asin (const T &x) // = asin(x)
{
    if (x<=-1.0) return -0.5*pi;
    if (x>=+1.0) return +0.5*pi;
    return ::atan(x/::sqrt(1.0-(x*x)));
}
Where T is any floating point type (float,double,...). As you can see you need sqrt(x), pi=3.141592653589793238462643383279502884197169399375105, zero=1e-20 and +,-,*,/ operations implemented. The zero constant is the target precision.
So just replace T with float/double and ignore the :: ...
so I guess I'd have to increment it in the while statement too
Yes, this might be a way. And what stops you?
int i = 0;
while (condition) {
    //do something
    i++;
}
Another way would be using the for condition:
for(i = 1; i < 23500 && my_abs(prev_term - next_term) >= EPSILON; i++)
Your formula is wrong. Here is the correct formula: http://scipp.ucsc.edu/~haber/ph116A/taylor11.pdf.
P.S. Also note that your formula and your series do not correspond to each other.
You can use while like this:
while (fabs(sum_prev - sum) > 1e-15)
{
    sum_prev = sum;
    sum += a;
    a = next(a, x, ++i);
}
