I'm trying to write a program that calculates the cos(x) function using its Taylor series. So far I've got this:
int factorial(int a){
    if(a < 0)
        return 0;
    else if(a==0 || a==1)
        return 1;
    else
        return a*(factorial(a-1));
}
double Tserie(float angle, int repetitions){
    double series = 0.0;
    float i;
    for(i = 0.0; i < repetitions; i++){
        series += (pow(-1, i) * pow(angle, 2*i))/factorial(2*i);
        printf("%f\n", (pow(-1, i) * pow(angle, 2*i))/factorial(2*i));
    }
    return series;
}
For my example I'm using angle = 90 and repetitions = 20 to calculate cos(90), but it's useless: I just keep getting values close to infinity. Any help will be greatly appreciated.
For one thing, the angle is in radians, so for a 90 degree angle, you'd need to pass M_PI/2.
Also, you should avoid recursive functions for something as trivial as factorials; it would take a quarter of the effort to write one iteratively and it would perform a lot better. You don't actually even need it: you can keep the factorial in a temporary variable and just multiply it by 2*i*(2*i-1) at each step. Keep in mind that you'll hit a representability/precision wall really quickly at this step.
You also don't need to actually call pow for -1 to the power of i; a simple i % 2 ? -1 : 1 would suffice. This way it's faster and it won't lose precision as you increase i.
Oh, and don't make i a float; it's an integer, so make it an int. You're leaking precision a lot as it is, why make it worse?
And to top it all off, you're approximating cos around 0, but are calling it for pi/2. You'll get really high errors doing that.
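Putting those suggestions together, a corrected version might look like this (a sketch under the assumptions above, not the only way to write it):

#include <stdio.h>
#include <math.h>

/* Sketch of the advice above: int counter, sign via i % 2, and a running
   term that replaces both pow() and factorial(). Angle is in radians. */
double Tserie(double angle, int repetitions){
    double series = 0.0;
    double term = 1.0;                 /* term for i = 0: x^0 / 0! = 1 */
    for(int i = 0; i < repetitions; i++){
        series += i % 2 ? -term : term;
        /* next term: multiply by x^2, divide by (2i+1)(2i+2) */
        term *= angle * angle / ((2 * i + 1) * (2 * i + 2));
    }
    return series;
}

int main(void){
    printf("%f\n", Tserie(M_PI / 2, 20)); /* cos of 90 degrees, in radians */
    return 0;
}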
The Taylor series is for the mathematical cosine function, whose argument is in radians. So 90 probably doesn't mean what you thought it meant here.
Furthermore, the series requires more terms the farther the argument is from 0. Generally, the number of terms needs to be comparable to the size of the argument before you even begin to see the successive terms becoming smaller, and many more than that in order to get convergence. 20 is pitifully few terms to use for x = 90.
Another problem is that you compute the factorial as an int. The factorial function grows very fast: already for 13!, an ordinary C int (on a 32-bit machine) will overflow, so your terms beyond the sixth will be completely wrong anyway.
In fact the factorials and the powers of 90 quickly become too large to be represented even as doubles. If you want any chance of seeing the series converge, you must not compute each term from scratch but derive it from the previous one using a formula such as
nextTerm = - prevTerm * x * x / (2*i-1) / (2*i);
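Wrapped in a loop, that recurrence might be used like this (a sketch; the names are mine):

#include <math.h>

/* Sketch: accumulate cos(x) using the recurrence above, so neither
   x^(2i) nor (2i)! is ever formed explicitly. */
double cos_series(double x, int terms)
{
    double sum = 1.0;                                   /* term for i = 0 */
    double term = 1.0;
    for (int i = 1; i < terms; i++) {
        term = -term * x * x / (2 * i - 1) / (2 * i);   /* the recurrence */
        sum += term;
    }
    return sum;
}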
I want to develop a simple geo-fencing algorithm in C, that works without using sin, cos and tan. I am working with a small microcontroller, hence the restriction. I have no space left for <math.h>. The radius will be around 20..100m. I am not expecting super accurate results this way.
My current solution takes two coordinate sets (decimal, .00001 accuracy, but passed as values x10^5 in order to eliminate the decimal places) and a radius (in m). When the coordinates are corrected by a factor of 0.9, they can approximately be used in a Pythagorean equation which checks whether one coordinate lies within the radius of another:
static int32_t
geo_convert_coordinates(int32_t coordinate)
{
    return (coordinate * 10) / 9;
}

bool
geo_check(int32_t lat_fixed,
          int32_t lon_fixed,
          int32_t lat_var,
          int32_t lon_var,
          uint16_t radius)
{
    lat_fixed = geo_convert_coordinates(lat_fixed);
    lon_fixed = geo_convert_coordinates(lon_fixed);
    lat_var = geo_convert_coordinates(lat_var);
    lon_var = geo_convert_coordinates(lon_var);

    if (((lat_var - lat_fixed) * (lat_var - lat_fixed)
         + (lon_var - lon_fixed) * (lon_var - lon_fixed))
        <= (radius * radius))
    {
        return true;
    }

    return false;
}
This solution works quite well at the equator, but when changing the latitude it becomes increasingly inaccurate; at 70°N the deviation is around 50%. I could change the factor depending on the latitude, but I am not happy with this solution.
Is there a better way to do this calculation? Any help is very much appreciated. Best regards!
UPDATE
I used the input I got and managed to implement a decent solution. I used only signed ints, no floats.
The haversine formula could be simplified: due to the relevant radii (50-500m), the deltas of the latitude and longitude are very small (<0.02°). This means that the sine can be simplified to sin(x) = x, and likewise the arcsine to asin(x) = x. This approach is very accurate for angles <10° and even better for the small angles used here. That leaves the cosine, which I implemented according to meaning-matters's suggestion. The cosine takes an angle and returns the actual result multiplied by 100, in order to be able to use ints. The square root was implemented with an iterative loop (I cannot find the SO post anymore). The haversine calculation was done with the inputs multiplied by powers of 10 in order to preserve accuracy, and the result divided by the necessary power of 10 afterwards.
For my 8-bit system, this caused a memory usage of around 2000-2500 bytes.
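For illustration, the simplified check might look roughly like this (a sketch only; icos100() and isqrt() stand in for the lookup-table cosine and the iterative integer square root described above, and the exact scaling is my assumption, not the original code):

#include <stdint.h>

int32_t icos100(int32_t deg);    /* lookup-table cosine, returns cos * 100 */
uint32_t isqrt(uint64_t x);      /* iterative integer square root */

/* With sin(x) ~= x and asin(x) ~= x, the haversine distance collapses to
   R * sqrt(dlat^2 + cos(lat)^2 * dlon^2). Inputs are degrees * 10^5,
   result is in metres (1 unit of latitude ~= 1.11 m). */
uint32_t geo_distance_m(int32_t lat1, int32_t lon1, int32_t lat2, int32_t lon2)
{
    int64_t dlat = (int64_t)(lat2 - lat1);
    int64_t dlon = (int64_t)(lon2 - lon1) * icos100(lat1 / 100000) / 100;
    uint64_t d2 = (uint64_t)(dlat * dlat + dlon * dlon);
    return (uint32_t)((uint64_t)isqrt(d2) * 111195 / 100000);
}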
Implement the haversine function using your own trigonometric functions that use lookup tables and do interpolation.
Because you don't want very accurate results, small lookup tables, of perhaps twenty points, would be sufficient. And simple linear interpolation would also be fine.
In case you don't have much memory space: bear in mind that to implement sine and cosine, you only need one lookup table covering 90 degrees of one of the two functions. All other values can then be determined by mirroring and offsetting.
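A minimal sketch of that idea (the nineteen-point table and the x1000 scaling are my choices, not requirements):

#include <stdint.h>

/* cos(x) * 1000 for x = 0, 5, 10, ..., 90 degrees: one quadrant,
   nineteen points. */
static const int16_t cos_table[19] = {
    1000, 996, 985, 966, 940, 906, 866, 819, 766, 707,
     643, 574, 500, 423, 342, 259, 174,  87,   0
};

/* Cosine in degrees, scaled by 1000, with linear interpolation; the other
   three quadrants come from mirroring. */
int16_t icos1000(int32_t deg)
{
    deg = ((deg % 360) + 360) % 360;             /* reduce to [0, 360) */
    int neg = 0;
    if (deg > 180) deg = 360 - deg;              /* cos(360 - x) =  cos(x) */
    if (deg > 90) { deg = 180 - deg; neg = 1; }  /* cos(180 - x) = -cos(x) */
    int i = deg / 5, r = deg % 5;
    int v = cos_table[i];
    if (r)                                       /* linear interpolation */
        v += (cos_table[i + 1] - v) * r / 5;
    return (int16_t)(neg ? -v : v);
}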
I have written the following function for the Taylor series to calculate cosine.
double cosine(int x) {
    x %= 360; // make it less than 360
    double rad = x * (PI / 180);
    double cos = 0;
    int n;
    for(n = 0; n < TERMS; n++) {
        cos += pow(-1, n) * pow(rad, 2 * n) / fact(2 * n);
    }
    return cos;
}
My issue is that when I input 90 I get the answer -0.000000. (Why am I getting -0.000 instead of 0.000?)
Can anybody explain why, and how I can solve this issue?
I think it's due to the precision of double.
Here is the main() :
int main(void){
    int y;
    //scanf("%d",&y);
    y=90;
    printf("sine(%d)= %lf\n",y, sine(y));
    printf("cosine(%d)= %lf\n",y, cosine(y));
    return 0;
}
It's totally expected that you will not be able to get exact zero outputs for cosine of anything with floating point, regardless of how good your approach to computing it is. This is fundamental to how floating point works.
The mathematical zeros of cosine are odd multiples of pi/2. Because pi is irrational, it's not exactly representable as a double (or any floating point form), and the difference between the nearest neighboring values that are representable is going to be at least pi/2 times DBL_EPSILON, roughly 3e-16 (or corresponding values for other floating point types). For some odd multiples of pi/2, you might "get lucky" and find that it's really close to one of the two neighbors, but on average you're going to find it's about 1e-16 away. So your input is already wrong by 1e-16 or so.
Now, cosine has slope +1 or -1 at its zeros, so the error in the output will be roughly proportional to the error in the input. But to get an exact zero, you'd need error smaller than the smallest representable nonzero double, which is around 2e-308. That's nearly 300 orders of magnitude smaller than the error in the input.
While you could in theory "get lucky" and have some multiple of pi/2 that's really, really close to the nearest representable double, the likelihood of this, just modelling it as random, is astronomically small. I believe there are even proofs that there is no double x for which the correctly rounded value of cos(x) is an exact zero. For single precision (float) this can be determined easily by brute force; for double it's probably also doable, but a big computation.
As to why printf is printing -0.000000, it's just that the default for %f is 6 places after the decimal point, which is nowhere near enough to see the first significant digit. Using %e or %g, optionally with a large precision modifier, would show you an approximation of the result you got that actually retains some significance and give you an idea whether your result is good.
My issue is that when i input 90 i get the answer -0.000000. (why am i getting -0.000 instead of 0.000?)
cosine(90) is not precise enough to result in a value of 0.0. Use printf("cosine(%d)= %le\n",y, cosine(y)); (note the e) to see a more informative view of the result. Instead, cosine(90) is generating a negative result in the range [-0.0005 ... -0.0] and that is rounded to "-0.000" for printing.
Can anybody explain why and how i can solve this issue?
OP's cosine() lacks sufficient range reduction, which for degrees can be exact.
x %= 360; was a good first step; now perform a better range reduction to a 90°-wide interval such as [-45°...45°], [45°...135°], etc.
Also recommended: use a Taylor series with sufficient terms (e.g. 10) and a good machine PI1. Form the terms more carefully than pow(rad, 2 * n) / fact(2 * n), which injects excessive error.
Example1, example2.
Other improvements are possible, yet this should be something to get OP started.
1 #define PI 3.1415926535897932384626433832795
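A sketch of what such a reduction could look like (sin_taylor() and cos_taylor() are assumed series evaluators accurate on [-45°, 45°]; they are not from the question):

#define PI 3.1415926535897932384626433832795

double sin_taylor(double rad);   /* assumed: accurate for |rad| <= PI/4 */
double cos_taylor(double rad);   /* assumed: accurate for |rad| <= PI/4 */

/* Sketch: reduce an integer angle in degrees exactly, fold it to within
   45 degrees of a multiple of 90, and pick the matching identity. */
double cos_deg(int x)
{
    x = ((x % 360) + 360) % 360;              /* exact for integer degrees */
    int m = (x + 45) / 90;                    /* nearest multiple of 90 */
    double rad = (x - 90 * m) * (PI / 180);   /* remainder in [-45, 44] deg */
    switch (m % 4) {
    case 0:  return  cos_taylor(rad);         /* cos(0   + r) =  cos(r) */
    case 1:  return -sin_taylor(rad);         /* cos(90  + r) = -sin(r) */
    case 2:  return -cos_taylor(rad);         /* cos(180 + r) = -cos(r) */
    default: return  sin_taylor(rad);         /* cos(270 + r) =  sin(r) */
    }
}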
I have to implement an algorithm to evaluate a sum of Poisson functions, each one with a multiplying constant:

P = sum over k from 0 to cut of C(k) * lambda^k * e^(-lambda) / k!

where the C(k) are positive constants < 1, cut is a cutoff (in principle the sum would run over infinitely many k), and lambda is a number that may vary in my case from 20 to 100. I've tried a straightforward implementation in my code:
#include<quadmath.h>
... //Definitions of lambda and c[k]...
long double sum=0;
for(int k=0;k<cutoff;++k)
{
    sum = sum + c[k] * powq(lambda,k)/tgamma(k+1) * (1.0/expq(lambda));
}
But I am not quite satisfied. I've searched "Numerical Recipes" for a good approach to the evaluation of a Poisson distribution, but I didn't find anything about it.
Are there better ways to do this?
Edit: to be clear, I'm looking for the most precise way to approximate the probability of large events, given a Poisson distribution, without computing awkward lambda^k / k! factors!
Well, a simple improvement is to cache lambda^k and k! by hand, since their values in the previous iteration can be used to quickly calculate the respective ones in the current iteration, with an O(1) calculation.
Also, since 1.0/expq(lambda) is a constant, you should calculate it once in advance:
#include<quadmath.h>
... //Definitions of lambda and c[k]...

const long double e_coeff = 1.0 / expq(lambda);

long double inv_k_factorial = 1.0l;
long double lambda_pow_k = 1.0l;

long double sum = 0.0l;

for(int k=0; k < cutoff; ++k)
{
    sum += (c[k] * lambda_pow_k * inv_k_factorial);
    lambda_pow_k *= lambda;          /* lambda^k -> lambda^(k+1) */
    inv_k_factorial /= (k + 1);      /* 1/k!     -> 1/(k+1)!     */
}

sum *= e_coeff;
So now the three function calls and their respective overhead are completely gone from your loop.
Now, I've attempted to use the same data types as you did in your question. Since your comment indicates that lambda is greater than 1.0, there should be no relative error growth from a quickly diminishing lambda_pow_k, I believe. Any significance lost here would depend on the limits of long double, which is either good enough or not, depending on your concrete needs.
Compilers are clever nowadays, so it might get optimized like this anyway, but I think it's best not to rely on less obvious optimizations from the optimizer. This way your code shouldn't suffer in performance even when handed to a non-optimizing compiler.
Since the Poisson probabilities obey the simple recurrence
P(k,lam) = lam/k * P(k-1,lam)
one possibility would be to use something like Horner's rule for polynomials. That is:
Sum{k|C[k]*P(k,lam)} = exp(-lam)*(C[0]+(lam/1)*(C[1]+(lam/2)*(..))..)
or
P = C[cut]
for k = cut-1 .. 0
    P = P*lam/(k+1) + C[k]
P *= exp(-lam)
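In C, that pseudocode might translate to the following sketch (names are mine):

#include <math.h>

/* Sketch: evaluate sum_{k=0..cut} c[k] * lambda^k * e^(-lambda) / k!
   backwards, Horner-style, so no explicit power or factorial appears. */
long double poisson_weighted_sum(const long double c[], int cut, long double lambda)
{
    long double P = c[cut];
    for (int k = cut - 1; k >= 0; --k)
        P = P * lambda / (k + 1) + c[k];
    return P * expl(-lambda);
}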
Here is the question..
This is what I've done so far,
#include <stdio.h>
#include <math.h>

long int factorial(int m)
{
    if (m==0 || m==1) return (1);
    else return (m*factorial(m-1));
}

double power(double x,int n)
{
    double val=1;
    int i;
    for (i=1;i<=n;i++)
    {
        val*=x;
    }
    return val;
}

double sine(double x)
{
    int n;
    double val=0;
    for (n=0;n<8;n++)
    {
        double p = power(-1,n);
        double px = power(x,2*n+1);
        long fac = factorial(2*n+1);
        val += p * px / fac;
    }
    return val;
}

int main()
{
    double x;
    printf("Enter angles in degrees: ");
    scanf("%lf",&x);
    printf("\nValue of sine of %.2f is %.2lf\n",x,sine(x * M_PI / 180));
    printf("\nValue of sine of %.2f from library function is %.2lf\n",x,sin(x * M_PI / 180));
    return 0;
}
The problem is that the program works perfectly fine from 0 to 180 degrees, but beyond that it gives wrong results. Also, when I increase the value of n in for (n=0;n<8;n++) beyond 8, I get significant error. There is nothing wrong with the algorithm; I've tested it on my calculator, and the program seems fine as well. I think the problem is due to the range of the data type. What should I correct to get rid of this error?
Thanks..
You are correct that the error is due to the range of the data type. In sine(), you are calculating the factorial of 15, which is a huge number and does not fit in 32 bits (which is presumably what long int is implemented as on your system). To fix this, you could either:
Redefine factorial to return a double.
Rework your code to combine power and factorial into one loop, which alternately multiplies by x and divides by i (see the sketch after this list). This will be messier-looking but will avoid the possibility of overflowing a double (granted, I don't think that's a problem for your use case).
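A sketch of the second option, with the power and factorial carried across iterations as one running term:

/* Sketch: sine with power and factorial folded into one running term,
   so no large intermediate value is ever formed. */
double sine(double x)
{
    double val = 0.0;
    double term = x;                  /* term for n = 0: x^1 / 1! */
    for (int n = 0; n < 8; n++) {
        val += term;
        /* next term: multiply by -x^2, divide by (2n+2)(2n+3) */
        term *= -x * x / ((2 * n + 2) * (2 * n + 3));
    }
    return val;
}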
15! is indeed beyond the range that a 32-bit integer can hold. I'd use doubles throughout if I were you.
The Taylor series for sin(x) converges more slowly for large values of x. For x outside [-π, π], I'd add/subtract multiples of 2π to get as small an x as possible.
You need range reduction. Note that a Taylor series is best near zero, and that in the negative range it is the (negative) mirror image of its positive range. So, in short: reduce the range (modulo 2π) to wrap the argument into the range where you have the highest accuracy. The range beyond π/2 gets less accurate, so you also want to use the identity sin(π/2 + x) = sin(π/2 - x). For negative values, use sin(-x) = -sin(x). Now you only need to evaluate the interval 0 to π/2 while spanning the whole range. Of course, for VERY large values the accuracy of the reduction modulo 2π will suffer.
You may be having a problem with 15!.
I would print out the values for p, px, fac, and the value for the term for each iteration, and check them out.
You're only including 8 terms in an infinite series. If you think about it for a second in terms of a polynomial, you should see that you don't have a good enough fit for the entire curve.
The fact is that you only need to write the function for 0 <= x <= π; all other values follow from these relationships:
sin(-x) = -sin(x)
and
sin(x + π) = -sin(x)
and
sin(x + 2nπ) = sin(x)
I'd recommend that you normalize your input angle using these to make your function work for all angles as written.
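A sketch of that normalization, leaving the question's sine() unchanged underneath:

#include <math.h>

/* Sketch: fold any angle into [0, pi], where the 8-term series behaves,
   using sin(x + 2n*pi) = sin(x) and sin(x + pi) = -sin(x). */
double sine_any(double x)
{
    double sign = 1.0;
    x = fmod(x, 2 * M_PI);            /* sin(x + 2n*pi) = sin(x) */
    if (x < 0) x += 2 * M_PI;         /* now 0 <= x < 2*pi */
    if (x > M_PI) {                   /* sin(x + pi) = -sin(x) */
        x -= M_PI;
        sign = -1.0;
    }
    return sign * sine(x);            /* the question's sine() */
}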
There's a lot of inefficiency built into your code (e.g. you keep recalculating factorials that would easily fit in a table lookup; you use power() to oscillate between -1 and +1). But first make it work correctly, then make it faster.
For one of my course projects I started implementing a "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially for spam) using huge training data.
Now I have a problem implementing the algorithm because of the limitations of C's datatypes.
(The algorithm I am using is given here: http://en.wikipedia.org/wiki/Bayesian_spam_filtering)
PROBLEM STATEMENT:
The algorithm involves taking each word in a document and calculating the probability of it being a spam word. If p1, p2, p3, ..., pn are the probabilities of words 1, 2, 3, ..., n, the probability of the doc being spam is calculated using

p = (p1 * p2 * ... * pn) / (p1 * p2 * ... * pn + (1-p1) * (1-p2) * ... * (1-pn))
Here, a probability value can very easily be around 0.01, so even if I use the datatype double my calculation will go for a toss. To confirm this I wrote the sample code given below.
#define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01)
#define PROBABILITY_OF_MOSTLY_SPAM_WORD (0.99)

int main()
{
    int index;
    long double numerator = 1.0;
    long double denom1 = 1.0, denom2 = 1.0;
    long double doc_spam_prob;

    /* Simulating FEW unlikely spam words */
    for(index = 0; index < 162; index++)
    {
        numerator = numerator*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD;
        denom2 = denom2*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD;
        denom1 = denom1*(long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD);
    }
    /* Simulating lot of mostly definite spam words */
    for (index = 0; index < 1000; index++)
    {
        numerator = numerator*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD;
        denom2 = denom2*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD;
        denom1 = denom1*(long double)(1- PROBABILITY_OF_MOSTLY_SPAM_WORD);
    }
    doc_spam_prob = (numerator/(denom1+denom2));
    return 0;
}
I tried float, double, and even long double datatypes, but still the same problem.
Hence, in say a 100K-word document I am analyzing, if just 162 words have a 1% spam probability and the remaining 99838 are conspicuously spam words, my app will still call it a Not Spam doc because of precision error (as the numerator easily goes to ZERO)!
This is the first time I am hitting such an issue. So how exactly should this problem be tackled?
This happens often in machine learning. AFAIK, there's nothing you can do about the loss in precision itself. So to bypass this, we use the log function and convert divisions and multiplications into subtractions and additions, respectively.
So I decided to do the math. The original equation is:

p = (p1 * p2 * ... * pn) / (p1 * p2 * ... * pn + (1-p1) * (1-p2) * ... * (1-pn))

I slightly modify it:

1/p - 1 = ((1-p1) * (1-p2) * ... * (1-pn)) / (p1 * p2 * ... * pn)

Taking logs on both sides:

ln(1/p - 1) = ln(1-p1) - ln(p1) + ... + ln(1-pn) - ln(pn)

Let

eta = ln(1-p1) - ln(p1) + ... + ln(1-pn) - ln(pn)

Substituting,

ln(1/p - 1) = eta

Hence the alternate formula for computing the combined probability:

p = 1 / (1 + e^eta)
If you need me to expand on this, please leave a comment.
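In code, the alternate formula comes down to a few lines (a sketch; the names are mine):

#include <math.h>

/* Sketch: p = 1 / (1 + e^eta) with eta = sum of ln(1-p_i) - ln(p_i).
   Every intermediate value stays comfortably in range. */
double combined_probability(const double p[], int n)
{
    double eta = 0.0;
    for (int i = 0; i < n; i++)
        eta += log(1.0 - p[i]) - log(p[i]);
    return 1.0 / (1.0 + exp(eta));
}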
Here's a trick:
for the sake of readability, let S := p_1 * ... * p_n and H := (1-p_1) * ... * (1-p_n),
then we have:
p = S / (S + H)
p = 1 / ((S + H) / S)
p = 1 / (1 + H / S)
let's expand again:
p = 1 / (1 + ((1-p_1) * ... * (1-p_n)) / (p_1 * ... * p_n))
p = 1 / (1 + (1-p_1)/p_1 * ... * (1-p_n)/p_n)
So basically, you will obtain a product of quite large numbers (between 0 and, for p_i = 0.01, 99). The idea is not to multiply tons of small numbers with one another and obtain, well, 0, but to form quotients of small numbers. For example, if n = 1000000 and p_i = 0.5 for all i, the naive method will give you 0/(0+0), which is NaN, whereas the proposed method will give you 1/(1+1*...*1), which is 0.5.
You can get even better results, when all p_i are sorted and you pair them up in opposed order (let's assume p_1 < ... < p_n), then the following formula will get even better precision:
p = 1 / (1 + (1-p_1)/p_n * ... * (1-p_n)/p_1)
That way you divide big numerators (small p_i) by big denominators (big p_(n+1-i)), and small numerators by small denominators.
Edit: MSalters proposed a useful further optimization in his answer. Using it, the formula reads as follows:
p = 1 / (1 + (1-p_1)/p_n * (1-p_2)/p_(n-1) * ... * (1-p_(n-1))/p_2 * (1-p_n)/p_1)
Your problem is caused because you are collecting too many terms without regard for their size. One solution is to take logarithms. Another is to sort your individual terms. First, let's rewrite the equation as 1/p = 1 + ∏((1-p_i)/p_i). Now your problem is that some of the terms are small, while others are big. If you have too many small terms in a row, you'll underflow, and with too many big terms you'll overflow the intermediate result.
So, don't put too many of the same order in a row. Sort the terms (1-p_i)/p_i. As a result, the first will be the smallest term and the last the biggest. Now, if you multiplied them straight away you would still have an underflow. But the order of calculation doesn't matter. Use two iterators into your temporary collection. One starts at the beginning (i.e. (1-p_0)/p_0), the other at the end (i.e. (1-p_n)/p_n), and your intermediate result starts at 1.0. Now, when your intermediate result is >= 1.0, you take a term from the front, and when your intermediate result is < 1.0 you take a term from the back.
The result is that as you consume terms, the intermediate result will oscillate around 1.0. It will only go up or down as you run out of small or big terms. But that's OK. At that point, you've consumed the extremes on both ends, so the intermediate result will slowly approach the final result.
There's of course a real possibility of overflow. If the input is completely unlikely to be spam (p=1E-1000) then 1/p will overflow, because ∏((1-p_i)/p_i) overflows. But since the terms are sorted, we know that the intermediate result will overflow only if ∏((1-p_i)/p_i) overflows. So, if the intermediate result overflows, there's no subsequent loss of precision.
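A sketch of that two-iterator scheme, assuming terms[] already holds the sorted values of (1-p_i)/p_i:

/* Sketch: multiply from the small end while the running product is >= 1
   and from the big end while it is < 1, so it oscillates around 1.0. */
double spam_probability_balanced(const double terms[], int n)
{
    double prod = 1.0;                 /* running value of H/S */
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        if (prod >= 1.0)
            prod *= terms[lo++];       /* small term pulls it down */
        else
            prod *= terms[hi--];       /* big term pushes it up */
    }
    return 1.0 / (1.0 + prod);         /* p = 1 / (1 + H/S) */
}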
Try computing the inverse 1/p. That gives you an equation of the form 1/p = 1 + ((1-p1) * (1-p2) * ...) / (p1 * p2 * ...).
If you then count the occurrences of each probability (it looks like you have a small number of values that recur) you can use the pow() function, pow(1-p, occurrences_of_p) * pow(1-q, occurrences_of_q), and avoid the individual round-off of each multiplication.
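For example, with the distinct probabilities and their occurrence counts collected into two (hypothetical) arrays:

#include <math.h>

/* Sketch: form the product ((1-p)/p)^count per distinct probability, so
   round-off accumulates over a handful of pow() calls instead of one
   multiplication per word. */
double inverse_p_minus_one(const double probs[], const int counts[], int m)
{
    double ratio = 1.0;                /* product of ((1-p_i)/p_i)^count_i */
    for (int i = 0; i < m; i++)
        ratio *= pow((1.0 - probs[i]) / probs[i], counts[i]);
    return ratio;                      /* then p = 1 / (1 + ratio) */
}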
You can use the probability in percent or per mille:
doc_spam_prob= (numerator*100/(denom1+denom2));
or
doc_spam_prob= (numerator*1000/(denom1+denom2));
or use some other coefficient
I am not strong in math, so I cannot comment on possible simplifications to the formula that might eliminate or reduce your problem. However, I am familiar with the precision limitations of long double types and am aware of several arbitrary- and extended-precision math libraries for C. Check out:
http://www.nongnu.org/hpalib/
and
http://www.tc.umn.edu/~ringx004/mapm-main.html