Wrong calculation when using float in C

I'm trying to do a calculation, and for some reason when I use float I get -nan(ind), but when I change the variables (x, y) to double I get the right answer. Do you have any idea why this is happening?
Thank you
#include <stdio.h>
#include <math.h>
#define pi 3.1416
#define L1 0.5
#define L2 0.5
int main(void)
{
    float q1[12], q2[12], q1_Degrees[12], q2_Degrees[12];
    float x = 0.8;
    float y = 0.6;
    q2[0] = acos((pow(x, 2) + pow(y, 2) - pow(L1, 2) - pow(L2, 2)) / (2 * L1 * L2));
    q1[0] = atan(y / x) - atan((L2 * sin(q2[0])) / (L1 + L2 * cos(q2[0])));
    q1_Degrees[0] = (q1[0] * 180) / pi;
    q2_Degrees[0] = (q2[0] * 180) / pi;
    printf_s("q1 is = %.1f q2 is = %.1f\n\n", q1_Degrees[0], q2_Degrees[0]);
    return 0;
}

2 concerns
acos()
The x in acos(x) needs to be in the range [-1...1]. Outside that, the result may be NaN.
(pow(x, 2) + pow(y, 2) - pow(L1, 2) - pow(L2, 2)) / (2 * L1*L2) is prone to slight effects of computation that result in a value just outside [-1...1] even if mathematically the result should be in range.
A quick work-around:
double z = (pow(x, 2) + pow(y, 2) - pow(L1, 2) - pow(L2, 2)) / (2 * L1*L2);
if (z < -1.0) z = -1.0;
else if (z > 1.0) z = 1.0;
q2[0] = acos(z);
The issue applies to double, float, long double. The fact it "worked" with one type is no reason to believe code is robust with other values.
Note that the code is calling the double functions like acos(), pow() and not their float counterparts acosf(), powf(). I recommend using double throughout unless you have a compelling reason otherwise.
atan
atan() provides a result in [-π/2 ... +π/2] radians (i.e. [-90...90] degrees).
A whole-circle result in [-π ... +π] radians (i.e. [-180...180] degrees) is available with atan2(y, x):
atan((L2*sin(q2[0])) / (L1 + L2 * cos(q2[0])))
// or
atan2(L2*sin(q2[0]), L1 + L2 * cos(q2[0]))
A better solution is to use a different form of trig manipulation that does not depend on the edge of acos()'s domain. That would be easiest if OP also posted the higher-level goal of the exercise.

Some basic debugging:
First, you can narrow down your code to:
#include <stdio.h>
#include <math.h>
int main(void)
{
    float x = 0.8;
    float y = 0.6;
    double q = acos((pow(x, 2) + pow(y, 2) - 0.5) * 2);
    printf("q = %lf\n", q);
    return 0;
}
Then it becomes obvious that either pow(x, 2) or pow(y, 2) yields slightly different results for float and double.
At this point, let's investigate the actual differences:
Between the value of (float)0.8 and the value of (double)0.8
Between the value of (float)0.6 and the value of (double)0.6
#include <stdio.h>
int main(void)
{
    printf("(float)0.8 = %.10f\n", (float)0.8);
    printf("(double)0.8 = %.10lf\n", (double)0.8);
    printf("(float)0.6 = %.10f\n", (float)0.6);
    printf("(double)0.6 = %.10lf\n", (double)0.6);
    return 0;
}
The printout is:
(float)0.8 = 0.8000000119
(double)0.8 = 0.8000000000
(float)0.6 = 0.6000000238
(double)0.6 = 0.6000000000
Does that answer your question?

You're getting accumulated roundoff which runs a bit past the domain of acos().
Simplifying your example to a minimum that shows the issue:
#include <stdio.h>
#include <math.h>
#define L1 0.5
#define L2 0.5
int main()
{
    float x = 0.8;
    float y = 0.6;
    float acos_param = (pow(x, 2) + pow(y, 2) - pow(L1, 2) - pow(L2, 2)) / (2 * L1 * L2);
    float q2 = acos(acos_param);
    printf("acos_param = %.9f; q2 = %.9f\n", acos_param, q2);
    return 0;
}
And running this - with floats - we see:
acos_param = 1.000000119; q2 = nan
Aha: greater than 1.0 is out of range of acos so you get NaN (not a number).
Changing all the float to double we get:
acos_param = 1.000000000; q2 = 0.000000000
which is more in line with expectations.
EDIT - Expanding on the comments: variadic functions in C always pass floating-point values as double, and the misleadingly-named format %f really means double, not float.
Even if you attempt to cast "down" to a float, the value is truncated to float precision but then promoted back to double before the call.
Try this:
#include <stdio.h>
int main()
{
    double d1 = 0.8;
    double d2 = (float)0.8;
    printf("d1=%.9f; d2=%.9f\n", d1, d2);
    return 0;
}
Which on my compiler produces:
d1=0.800000000; d2=0.800000012
Here, d1 is the full real-deal double, while d2 is the float-truncated version promoted back to double.
And in no case is the l length modifier needed: for printf, %f and %lf are the same thing.


atan2f vs fmodf vs just plain subtraction

I have a problem with a piece of code I wrote to wrap an angle around during an integration; it's part of a small simulation I'm working on. The idea is to prevent the angle from growing large by making sure it always has a sane value. I have tried three different approaches that I would expect to give the same results, and most of the time they do, but the first two give artifacts around the point where the angle wraps around. When I then generate a waveform from the angle value, I get undesirable results because of these precision errors.
So the first approach is like this (limit angle to -8PI +8PI range):
self->state.angle = atan2f(sinf(angle / 8), cosf(angle / 8)) * 8;
This creates an artifact around the point where the angle wraps.
The second approach:
self->state.angle = fmodf(angle, (float)(2.f * M_PI * 8));
creates the same artifact.
However, if I just do it like this:
float limit = (8 * 2 * M_PI);
if(angle > limit) angle -= limit;
if(angle < 0) angle += limit;
self->state.angle = angle;
then it works as expected, without any artifacts.
So what am I missing here? Why do the other two approaches create precision error? I would expect all of them to generate the same result (I know that ranges of the angle are different but when the angle is passed further into a sin function I would expect the result to be the same).
Edit: small test
// g++ -o test test.cc -lm && ./test
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>
int main(int argc, char **argv){
    float a1 = 0;
    float a2 = 0;
    float a3 = 0;
    float dt = 1.f / 7500.f;
    for(float t = -4.f * M_PI; t < (4.f * M_PI); t += dt){
        a1 += dt;
        a2 += dt;
        a3 += dt;
        float b1 = a1;
        if(b1 > 2.f * M_PI) b1 -= 2.f * M_PI;
        if(b1 < 0.f) b1 += 2.f * M_PI;
        float b2 = atan2f(sinf(a2), cosf(a2));
        float b3 = fmodf(a3, 2 * M_PI);
        float x1 = sinf(b1);
        float x2 = sinf(b2);
        float x3 = sinf(b3);
        if((x1 * x2 * x3) > 1e-9){
            printf("%f: x[%f %f %f],\tx1-x2:%f x1-x3:%f x2-x3:%f]\n", t, x1, x2, x3, (x1 - x2) * 1e9, (x1 - x3) * 1e9, (x2 - x3) * 1e9);
        }
    }
    return 0;
}
Output:
-9.421306: x[0.001565 0.001565 0.001565], x1-x2:0.000000 x1-x3:0.000000 x2-x3:0.000000]
-9.421172: x[0.001431 0.001431 0.001431], x1-x2:0.000000 x1-x3:0.000000 x2-x3:0.000000]
-9.421039: x[0.001298 0.001298 0.001298], x1-x2:0.000000 x1-x3:0.000000 x2-x3:0.000000]
-9.420905: x[0.001165 0.001165 0.001165], x1-x2:0.000000 x1-x3:0.000000 x2-x3:0.000000]
-9.420772: x[0.001032 0.001032 0.001032], x1-x2:0.000000 x1-x3:0.000000 x2-x3:0.000000]
-6.275573: x[0.001037 0.001037 0.001037], x1-x2:0.000000 x1-x3:174.855813 x2-x3:174.855813]
-6.275439: x[0.001171 0.001171 0.001171], x1-x2:0.000000 x1-x3:174.855813 x2-x3:174.855813]
-6.275306: x[0.001304 0.001304 0.001304], x1-x2:0.000000 x1-x3:174.855813 x2-x3:174.855813]
-6.275172: x[0.001438 0.001438 0.001438], x1-x2:0.000000 x1-x3:174.855813 x2-x3:174.855813]
-6.275039: x[0.001571 0.001571 0.001571], x1-x2:0.000000 x1-x3:174.855813 x2-x3:174.855813]
-6.274905: x[0.001705 0.001705 0.001705], x1-x2:0.000000 x1-x3:174.855813 x2-x3:174.855813]
-6.274772: x[0.001838 0.001838 0.001838], x1-x2:0.116415 x1-x3:174.855813 x2-x3:174.739398]
Without more information it's difficult to provide an explanation, but I'll try anyway.
The difference between using fmod and "plain subtraction" (or addition) like you're doing is that if the value is already way out of range (like 800000 * M_PI, for instance), then a single add/subtract barely changes it, and a very large (in absolute value) angle reaches your computation function - apparently without an issue, since no artifact is seen.
Using fmod (or atan2) guarantees that the value is in the range you defined, which isn't the same thing.
Note that doing:
float limit = (8 * 2 * M_PI);
while(angle > limit) angle -= limit;
while(angle < 0) angle += limit;
self->state.angle = angle;
would be roughly equivalent to fmod (but worse than fmod for big values, since the repeated additions or subtractions accumulate floating-point error).
So if feeding very big values into your computation produces the correct result, you may wonder whether it's wise to normalize your angles yourself instead of leaving that to the math library.
EDIT: the first part of this answer assumed that this super-out-of-bounds case would happen, and further question edits showed that this wasn't the case, so...
The other difference between fmod and the two tests is that there's no guarantee the value comes back unchanged from fmod even when it is already in range.
For instance, if the implementation is something like value - (int)(value/modulus) * modulus, floating-point inaccuracy may subtract a small amount from the original value.
Using atan2f combined with sinf/cosf also changes the result even when the value is already in range.
(And even when the value is slightly out of range, adding/subtracting like you're doing doesn't involve dividing/truncating/multiplying.)
Since you can bring the value into range with a single addition or subtraction, using fmodf or atan2f is overkill in your case, and you can stick to your simple add/subtract (adding an else would save a test: if you just readjusted a too-low value, there's no need to check whether it's too big).
float versus double math.
Of course the 3rd method works best. It is using double math.
Look at b1 and b3. b3 is certainly calculated with float precision due to the fmodf() call.
Note that M_PI is usually a double, so b1 -= 2.f * M_PI; is likely done with double-precision math and provides a more accurate answer. The f in 2.f does not force the product 2.f * M_PI to float - the product is double, and so is the -=.
b1 -= 2.f * M_PI;
// same as
b1 = (float)((double)b1 - (2.f * M_PI));
Further: with optimizations and FLT_EVAL_METHOD > 0, C is allowed to perform FP operations at higher precision than the type. b1 may be calculated as double even though the code says float. The higher precision, and the fact that M_PI (a rational number) is not exactly π (an irrational number), lead to a more accurate b1 than fmodf(a3, 2 * M_PI);
float b1 = a1;
if(b1 > 2.f * M_PI) b1 -= 2.f * M_PI; // double math
if(b1 < 0.f) b1 += 2.f * M_PI; // double math
float b3 = fmodf(a3, 2 * M_PI);
To ensure float results, use volatile float b1 = a1; to do a fair comparison, and use float constants like #define M_PIf ((float) M_PI)
Further, for a fair comparison, it is better to use if(b1 < -2.f * M_PIf) b1 += 2.f * M_PIf;
Recommend OP print FLT_EVAL_METHOD to aid further discussion.
#include <float.h>
printf("%d\n", FLT_EVAL_METHOD);
OP has 2 solutions:
Use wider math like double for the sensitive radian reduction.
float b3 = fmod(a3, 2 * M_PI); // not fmodf
Do not use radians, but an angle measurement like degrees or BAM, and perform exact range reduction. Angles will then need degree-to-radian conversion prior to the trig calls.
float b3 = fmodf(a3, 360.0f); // use fmodf, a3, b3 are in degrees
Note: the float b2 = atan2f(sinf(a2), cosf(a2)); method is not a reasonable contender.

Why does this code return two different values doing the same?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
    double a;
    double b;
    double q0 = 0.5 * M_PI + 0.5 * -2.1500000405000002;
    double q1 = 0.5 * M_PI + 0.5 * 0.0000000000000000;
    double w0 = 0.5 * M_PI + 0.5 * -43000.0008100000050000;
    double w1 = 0.5 * M_PI + 0.5 * -0.0000000000000000;
    double m = 1;
    double g = 43000000.81;
    double l1 = 0.1;
    double l2 = 0.1;
    double h = 0.0001;
    a = ((-g / l1) * sin(q0) + (sin(q1 - q0) * (cos(q1 - q0) * (w0 * w0 + (g / l1) * cos(q0)) + l2 * (w1 * w1 / l1))) / (m + pow(sin(q1 - q0), 2)));
    a = h * a;
    b = h * ((-g / l1) * sin(q0) + (sin(q1 - q0) * (cos(q1 - q0) * (w0 * w0 + (g / l1) * cos(q0)) + l2 * (w1 * w1 / l1))) / (m + pow(sin(q1 - q0), 2)));
    printf("%.20lf ", a);
    printf("%.20lf", b);
    return 0;
}
I do the same calculation for a and b; the only difference is that I compute a in two steps and b in one.
My code returns:
-629.47620126173774000000 -629.47620126173763000000
What is the reason for the difference in the last two decimals?
The C Standard (99 and 11) says:
The values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type.
So in an expression such as h*(X+Y) as you have in the assignment to b, the implementation is allowed to use greater precision for the intermediate result of X+Y than can possibly be stored in a double, even though the type of the subexpression is still considered to be double. But in a=X+Y; a=h*a;, the first assignment forces the value to be one that actually can be stored in a double, causing a slightly different result.
Another possibility is that the compiler has done "floating point contraction". To quote the C Standard again,
A floating expression may be contracted, that is, evaluated as though it were an atomic operation, thereby omitting rounding errors implied by the source code and the expression evaluation method.
This would most likely happen if the processor has a single instruction that can do a floating point addition and then a multiplication in one step, and the compiler decided to use it.
Assuming one or both of these is the cause, your value b is likely a more accurate representation of the computation you specified (given that all the inputs were restricted to values which can be represented in a double).
The cppreference page about macro FLT_EVAL_METHOD discusses both of these issues in a little more detail. It may be interesting to find out your value of FLT_EVAL_METHOD and play with #pragma STDC FP_CONTRACT OFF.
The answer is that in floating-point arithmetic, a = b*c*d*e computed in one expression does not have to equal the result of computing x = b*c and y = d*e first and then a = x*y.
That is because floating-point arithmetic has limited precision: each intermediate result is rounded, and different groupings round differently.

How does C float type work?

I want to create a program that uses decimal numbers, so I thought I would need to use float types, but I don't understand how these types behave. I made a test:
#include <stdio.h>
#include <float.h>
int main(void)
{
    float fl;
    fl = 5 - 100000000;
    printf("%f\n", fl);
    fl = FLT_MAX - FLT_MAX * 2;
    printf("%f\n", fl);
    fl = -100000000000000;
    printf("%f\n", fl);
    return 0;
}
Output:
-99999992.000000 // I expected it to be -99999995.000000
-inf // I expected it to be -340282346638528859811704183484516925440.000000
-100000000376832.000000 // I expected it to be -100000000000000
Why are the results different from my expectations?
EDIT: Thanks to the people who didn't just downvote my question and actually tried to help me. However, what I could learn in this thread doesn't help me understand why some float variables containing integers (ending with .000000) behave strangely.
I'm not sure, but I have some ideas.
In float numbers, any number consists of 2 parts: m and p. Any number can be shown as:
X = m * 2^p (m and p are binary, 0 < m < 1)
So, if you try to do a calculation with two numbers (X + Y), the computer first needs to represent X and Y with the same p.
Simple example (in decimal, not exactly like the program, but people could do the same):
X = 0.55, Y = 1.45
0.55 + 1.45
0.55 * 10^0 + 0.145 * 10^1
0.055 * 10^1 + 0.145 * 10^1 (*)
0.200 * 10^1
BUT, if X and Y have too different p, we get a problem on line (*): m has a limited number of digits, so:
Ex.:
X = 0.1, Y = 0.000000000000000001
X + Y = 0.100000000000(000001) <- this part didn't fit into m, so
X + Y = 0.1
Hope you got what I wanted to say.

How to pass infinity values to a function and test the result

I have this function:
#include <complex.h>
complex double f(complex double x, complex double y) {
    return x * y;
}
I would like to call it with x = inf + i inf and y = i inf and see what the result is. In particular I want to test if the result is an infinity value (as it should be) or if it is NaN + iNaN. The reason for this is to test different C compilers.
How do you do this in C?
I would add an intermediate check, too:
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>
#include <math.h>
complex double f(complex double x, complex double y) {
    return x * y;
}

int main(void){
    complex double x = INFINITY + INFINITY * I;
    complex double y = 0.0 + INFINITY * I;
    complex double ret;

    printf("x = %g + %g*I\n", creal(x), cimag(x));
    printf("y = %g + %g*I\n", creal(y), cimag(y));
    ret = f(x, y);
    printf("f = %g + %g*I\n", creal(ret), cimag(ret));
    exit(EXIT_SUCCESS);
}
Why?
Result with gcc-4.9.real (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4
x = nan + inf*I
y = nan + inf*I
f = -inf + -nan*I
Result with Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
x = nan + inf*I
y = nan + inf*I
f = nan + nan*I
A complete and utter failure from the get go.
I'm not 100% sure this is correct since I've never worked with complex numbers in C, but I've tried this snippet:
#include <stdio.h>
#include <math.h>
#include <complex.h>
double complex f(double complex x, double complex y) {
    return x * y;
}

int main(void)
{
    double complex z1 = INFINITY + INFINITY * I;
    double complex z2 = INFINITY + INFINITY * I;
    complex double result = f(z1, z2);
    printf("%f + i%f\n", creal(result), cimag(result));
}
I used both clang 3.8 (C 11) and GCC 6.1 (C 11) and the result was:
-inf + i-nan
Based on http://en.cppreference.com/w/c/numeric/math/INFINITY.
Apparently the macro INFINITY is not always supported and thus defined. Check the link above for more info.
You can specify what is basically a complex literal using multiplication with the I macro (which also has a _Complex_I alias). A minor example creating such a value is seen below:
#include <math.h>
#include <complex.h>
int main()
{
    double complex c = INFINITY * I + INFINITY;
}
The function, as written, is fine, the problem is with the way the complex numbers with special values are created and interpreted.
If you just write
double complex z1 = I * INFINITY;
double complex z2 = INFINITY + I * INFINITY;
you may discover that most of the popular compilers today do not support C99's imaginary types, so this expression actually multiplies (0,1) by (inf,0) and then adds (inf,0) to the result.
With gcc, I get z1 = (nan, inf), z2 = (nan, inf), f(z1,z2) = (-inf, -nan)
with clang, I get z1 = (nan, inf), z2 = (nan, inf), f(z1,z2) = (-inf, -nan)
with icc, I get z1 = (-nan, inf), z2 = (-nan, inf), f(z1, z2) = (-nan, -nan)
The only compiler that defines I as a pure imaginary number that I have access to is the C compiler from Oracle Studio
with oracle studio, I get z1 = (0, inf), z2 = (inf, inf), f(z1,z2) = (-inf, inf)
Now this is not actually supposed to be a problem, because in C there is only one complex infinity: every complex number with one infinite component is considered to be that infinity, even if the other component is NaN. All built-in arithmetic is supposed to honor that; in my list above, only Intel appears to have a bug here, where multiplying two complex infinities gave a complex NaN.
For the lazy compilers, C11 has a macro that saves the day: CMPLX
double complex z1 = CMPLX(0, INFINITY);
double complex z2 = CMPLX(INFINITY, INFINITY);
now,
with gcc, I get z1 = (0, inf), z2 = (inf, inf), f(z1,z2) = (-inf, inf)

How to compute a sine wave with accuracy over time

Use case is to generate a sine wave for digital synthesis, so we need to compute all values of sin(d t) where:
t is an integer representing the sample number. This is the variable; its range is from 0 to 158,760,000 for one hour of CD-quality sound.
d is a double representing the delta of the angle. This is constant, greater than 0 and less than π.
The goal is to achieve high accuracy with traditional int and double data types. Performance is not important.
Naive implementation is:
double next()
{
    t++;
    return sin(((double) t) * d);
}
But the problem is that as t increases, accuracy is reduced, because large numbers are passed to the sin function.
An improved version is the following:
double next()
{
    d_sum += d;
    if (d_sum >= (M_PI * 2)) d_sum -= (M_PI * 2);
    return sin(d_sum);
}
Here, I make sure to pass the sin function only numbers in the range from 0 to 2π.
But now the problem is that when d is small, there are many small additions, and each one loses a little accuracy.
The question here is how to improve the accuracy.
Appendix 1
Regarding "accuracy gets reduced because big numbers are provided to the sin function":
#include <stdio.h>
#include <math.h>
#define TEST (300000006.7846112)
#define TEST_MOD (0.0463259891528704262050786960234519968548937998410258872449766)
#define SIN_TEST (0.0463094209176730795999323058165987662490610492247070175523420)
int main()
{
    double a = sin(TEST);
    double b = sin(TEST_MOD);
    printf("a=%0.20f \n", a);
    printf("diff=%0.20f \n", a - SIN_TEST);
    printf("b=%0.20f \n", b);
    printf("diff=%0.20f \n", b - SIN_TEST);
    return 0;
}
Output:
a=0.04630944601888796475
diff=0.00000002510121488442
b=0.04630942091767308033
diff=0.00000000000000000000
You can try an approach that is used in some implementations of the fast Fourier transform: values of the trigonometric functions are calculated from previous values and the delta.
Sin(A + d) = Sin(A) * Cos(d) + Cos(A) * Sin(d)
Here we have to store and update the cosine value too, and store the constant (for a given delta) factors Cos(d) and Sin(d).
Now about precision: Cos(d) for small d is very close to 1, so there is a risk of precision loss (there are only a few significant digits in numbers like 0.99999987). To overcome this issue, we can store the constant factors as
dc = Cos(d) - 1 = -2 * Sin(d/2)^2
ds = Sin(d)
and use other formulas to update the current value
(here sa = Sin(A) and ca = Cos(A) are the current values):
ts = sa //remember last values
tc = ca
sa = sa * dc + ca * ds
ca = ca * dc - ts * ds
sa = sa + ts
ca = ca + tc
P.S. Some FFT implementations periodically (every K steps) renew sa and ca values through trig. functions to avoid error accumulation.
Example result. Calculations in doubles.
d=0.000125
800000000 iterations
finish angle 100000 radians
cos sin
described method -0.99936080743598 0.03574879796994
Cos,Sin(100000) -0.99936080743821 0.03574879797202
windows Calc -0.9993608074382124518911354141448
0.03574879797201650931647050069581
sin(x) = sin(x + 2N∙π), so the problem can be boiled down to accurately finding a small number which is equal to a large number x modulo 2π.
For example, –1.61059759 ≅ 256 mod 2π, and you can calculate sin(-1.61059759) with more precision than sin(256)
So let's choose some integer number to work with, 256. First find small numbers which are equal to powers of 256, modulo 2π:
// to be calculated once for a given frequency
// approximate hard-coded numbers for d = 1 below:
double modB = -1.61059759; // = 256 mod (2π / d)
double modC = 2.37724612; // = 256² mod (2π / d)
double modD = -0.89396887; // = 256³ mod (2π / d)
and then split your index as a number in base 256:
// split into a base 256 representation
int a = i & 0xff;
int b = (i >> 8) & 0xff;
int c = (i >> 16) & 0xff;
int d = (i >> 24) & 0xff;
You can now find a much smaller number x which is equal to i modulo 2π/d
// use our smaller constants instead of the powers of 256
double x = a + modB * b + modC * c + modD * d;
double the_answer = sin(d * x);
For different values of d you'll have to calculate different values modB, modC and modD, which are equal to those powers of 256, but modulo (2π / d). You could use a high precision library for these couple of calculations.
Scale up the period to 2^64, and do the multiplication using integer arithmetic:
// constants:
double uint64Max = pow(2.0, 64.0);
double sinFactor = 2 * M_PI / (uint64Max);
// scale the period of the waveform up to 2^64
uint64_t multiplier = (uint64_t) floor(0.5 + uint64Max * d / (2.0 * M_PI));
// multiplication with index (implicitly modulo 2^64)
uint64_t x = i * multiplier;
// scale 2^64 down to 2π
double value = sin((double)x * sinFactor);
As long as your period is not billions of samples, the precision of multiplier will be good enough.
The following code keeps the input to the sin() function within a small range, while somewhat reducing the number of small additions or subtractions due to a potentially very tiny phase increment.
double next() {
    t0 += 1.0;
    d_sum = t0 * d;
    if (d_sum > 2.0 * M_PI) {
        t0 -= ((2.0 * M_PI) / d);
    }
    return sin(d_sum);
}
For hyper accuracy, OP has 2 problems:
Multiplying d by n while maintaining more precision than double. That is answered in the first part below.
Performing a mod of the period. The simple solution is to use degrees and then mod 360, which is easy to do exactly. Doing mod 2*π on large angles is tricky, as it needs a value of 2*π with about 27 more bits of accuracy than (double)(2.0 * M_PI).
Use 2 doubles to represent d.
Let us assume 32-bit int and binary64 double. So double has 53-bits of accuracy.
0 <= n <= 158,760,000, which is about 2^27.2. Since double can handle 53-bit unsigned integers continuously and exactly, and 53 - 28 --> 25, any double with only 25 significant bits can be multiplied by n and still be exact.
Segment d into 2 doubles dmsb, dlsb: the 25 most significant bits and the 28 least significant bits.
int exp;
double dmsb = frexp(d, &exp); // exact result
dmsb = floor(dmsb * POW2_25); // exact result
dmsb /= POW2_25; // exact result
dmsb *= pow(2, exp); // exact result
double dlsb = d - dmsb; // exact result
Then each multiplication (or successive addition) of dmsb*n will be exact (this is the important part). dlsb*n will only be off in its last few bits.
double next()
{
    d_sum_msb += dmsb;  // exact
    d_sum_lsb += dlsb;
    double angle = fmod(d_sum_msb, M_PI * 2);  // exact
    angle += fmod(d_sum_lsb, M_PI * 2);
    return sin(angle);
}
Note: fmod(x,y) results are exact given exact x,y.
#include <stdio.h>
#include <math.h>

#define AS_n 158760000
double AS_d = 300000006.7846112 / AS_n;
double AS_d_sum_msb = 0.0;
double AS_d_sum_lsb = 0.0;
double AS_dmsb = 0.0;
double AS_dlsb = 0.0;

double next() {
    AS_d_sum_msb += AS_dmsb;  // exact
    AS_d_sum_lsb += AS_dlsb;
    double angle = fmod(AS_d_sum_msb, M_PI * 2);  // exact
    angle += fmod(AS_d_sum_lsb, M_PI * 2);
    return sin(angle);
}

#define POW2_25 (1U << 25)

int main(void) {
    int exp;
    AS_dmsb = frexp(AS_d, &exp);         // exact result
    AS_dmsb = floor(AS_dmsb * POW2_25);  // exact result
    AS_dmsb /= POW2_25;                  // exact result
    AS_dmsb *= pow(2, exp);              // exact result
    AS_dlsb = AS_d - AS_dmsb;            // exact result

    double y;
    for (long i = 0; i < AS_n; i++)
        y = next();
    printf("%.20f\n", y);
}
Output
0.04630942695385031893
Use degrees
Recommend using degrees, as 360 degrees is an exact period whereas M_PI*2 radians is only an approximation: C cannot represent π exactly.
If OP still wants to use radians, for further insight on performing the mod of π, see Good to the Last Bit
