C floats: how to get around them for 2D geometry (lines)

I am currently doing some 2D geometry in C, mostly computing line intersections. The lines have all kinds of slopes: from 0.001 to 1000, say (examples; I don't know the actual range).
I was using floats until now and did not have to worry about whether a value was very small (floating point stores 0.0011 with a 1e-3 exponent, so nothing is rounded away) or very large (1001 is stored with a 1e3 exponent), with little loss of precision where it matters in both cases.
But now I want to try without floats, using integers. How do I maintain precision in my calculations? I could have a flag telling me whether the slope is going to be big or small, and then store a tenth of the big slopes and ten times the small slopes, so that rounding is no problem for the small slopes and there is no overflow for the big ones. But that feels like a headache.
Basically, I still need to be able to differentiate between a slope of 0.2 and one of 0.4, and on the overflow side of things between a slope of 1000 and one of 2000 (supposing that ints overflow at 1000; this is less of a problem here).
Any other ideas?

Store the slope as a pair of integers
struct slope {
    int delta_y;
    int delta_x;
};
This allows for a wide range of slopes: 0, ±1/INT_MAX ... ±INT_MAX, even vertical. With careful coding, exact computations can be had.
Tardy credit: this is much like @Ignacio Vazquez-Abrams's comment.
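As a sketch of what "careful coding" can look like, here is an exact slope comparison; it assumes both slopes have been normalized so that delta_x is positive, and uses a 64-bit intermediate so the cross products cannot overflow:

#include <stdbool.h>

/* Exact "is slope a less than slope b?" by cross-multiplication:
   no division, no rounding. Assumes delta_x > 0 for both slopes. */
bool slope_less(struct slope a, struct slope b)
{
    /* a.dy/a.dx < b.dy/b.dx  <=>  a.dy * b.dx < b.dy * a.dx */
    return (long long)a.delta_y * b.delta_x
         < (long long)b.delta_y * a.delta_x;
}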

Generally speaking, with lines of arbitrary orientation it is not recommended to work with the slope/intercept representation y = mx + p, but with the implicit equation a x + b y + c = 0. The latter is more isotropic, supports vertical lines and gives you extra flexibility to scale the coefficients.
Meeting @chux's answer, the coefficients can be the deltas: Dy x - Dx y + c = 0 (assuming that the lines are defined by two points, Dx and Dy are likely not to overflow). Overflow is still possible on c, so you can use the variant Dy (x - x0) - Dx (y - y0) = 0.
Anyway, intermediate computations such as intersections may require larger ranges, i.e. double length integers.
The idea of flagging the large/small values is a little counterproductive: it is actually a primitive way of doing floating point, i.e. separating the scale from the mantissa. Working this way, you will end up re-designing a floating-point system, less powerful than the built-in type and costing you sweat and tears.
Unfortunately, high range arithmetic can't be avoided. Indeed, the intersection of two straight lines is given by the Cramer formulas
x = (b c' - b' c) / (a b' - a' b),
y = (c a' - c' a) / (a b' - a' b)
where the products to be evaluated need roughly twice the bit width of the initial coefficients. This is explained by the fact that quasi-parallel lines have far-away intersections.
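A sketch of that computation (not from the answer itself), assuming 32-bit coefficients so every Cramer product fits in 64 bits; note that quasi-parallel lines can still push the quotient out of 32-bit range, as the answer warns:

#include <stdbool.h>
#include <stdint.h>

/* Intersect a1*x + b1*y + c1 = 0 with a2*x + b2*y + c2 = 0.
   Returns false for parallel (or identical) lines; the result
   is truncated toward zero. */
bool intersect(int32_t a1, int32_t b1, int32_t c1,
               int32_t a2, int32_t b2, int32_t c2,
               int32_t *x, int32_t *y)
{
    int64_t det = (int64_t)a1 * b2 - (int64_t)a2 * b1;
    if (det == 0)
        return false;
    *x = (int32_t)(((int64_t)b1 * c2 - (int64_t)b2 * c1) / det);
    *y = (int32_t)(((int64_t)c1 * a2 - (int64_t)c2 * a1) / det);
    return true;
}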

Look up fixed point arithmetic if you want to use int in a general way.
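For instance, a minimal Q16.16 sketch (the type name and helpers are illustrative): 16 integer bits and 16 fraction bits cover both a slope of 0.2 and a slope of 1000 in a single int32_t.

#include <stdint.h>

/* Q16.16 fixed point: 0.2 is stored as 13107 (0.2 * 65536),
   1000 as 65536000, both in one int32_t. */
typedef int32_t fix16;

#define FIX_ONE (1 << 16)   /* 1.0 in Q16.16 */

static fix16 fix_from_int(int32_t v)
{
    return (fix16)(v << 16);
}

static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);   /* widen, multiply, rescale */
}

static fix16 fix_div(fix16 a, fix16 b)        /* caller ensures b != 0 */
{
    return (fix16)(((int64_t)a << 16) / b);
}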
You can also design your algorithms so that you do every computation in such a way that you don't need sub-integer accuracy (for example look up Bresenham's line and circle drawing algorithms).
For your particular problem, you could try keeping the numbers as explicit fractions, in other words use rational numbers. Or, to put it another way, keep delta X and delta Y as two separate integers.

Related

Check if geo location is within radius of other geolocation without using sin/cos/tan

I want to develop a simple geo-fencing algorithm in C that works without using sin, cos and tan. I am working with a small microcontroller, hence the restriction. I have no space left for <math.h>. The radius will be around 20..100 m. I am not expecting super accurate results this way.
My current solution takes two coordinate sets (decimal, 0.00001 accuracy, but passed as values multiplied by 10^5 in order to eliminate the decimal places) and a radius (in m). When the coordinates are multiplied by 0.9, they can approximately be used in a Pythagorean equation which checks whether one coordinate lies within the radius of the other:
static int32_t
geo_convert_coordinates(int32_t coordinate)
{
    return (coordinate * 10) / 9;
}

bool
geo_check(int32_t lat_fixed,
          int32_t lon_fixed,
          int32_t lat_var,
          int32_t lon_var,
          uint16_t radius)
{
    lat_fixed = geo_convert_coordinates(lat_fixed);
    lon_fixed = geo_convert_coordinates(lon_fixed);
    lat_var = geo_convert_coordinates(lat_var);
    lon_var = geo_convert_coordinates(lon_var);

    if (((lat_var - lat_fixed) * (lat_var - lat_fixed)
         + (lon_var - lon_fixed) * (lon_var - lon_fixed))
        <= (radius * radius))
    {
        return true;
    }
    return false;
}
This solution works quite well at the equator, but it becomes increasingly inaccurate as the latitude changes; at 70°N the deviation is around 50%. I could change the factor depending on the latitude, but I am not happy with that solution.
Is there a better way to do this calculation? Any help is very much appreciated. Best regards!
UPDATE
I used the input I got and managed to implement a decent solution. I used only signed ints, no floats.
The haversine formula can be simplified: because of the relevant radii (50-500 m), the deltas of the latitude and longitude are very small (<0.02°). This means the sine can be approximated as sin(x) ≈ x and the arcsine as asin(x) ≈ x. This approach is very accurate for angles <10°, and even more so for the small angles used here. That leaves the cosine, which I implemented according to @meaning-matters's suggestion. The cosine takes an angle and returns the actual result multiplied by 100, in order to be able to use ints. The square root was implemented with an iterative loop (I cannot find the SO post anymore). The haversine calculation was done with the inputs multiplied by powers of 10 to preserve accuracy, then divided by the necessary power of 10 afterwards.
For my 8bit system, this caused a memory usage of around 2000-2500 Bytes.
Implement the haversine function using your own trigonometric functions that use lookup tables and do interpolation.
Because you don't want very accurate results, small lookup tables, of perhaps twenty points, would be sufficient. And simple linear interpolation would also be fine.
In case you don't have much memory space: Bear in mind that to implement sine and cosine, you only need one lookup table for 90 degrees of either function. All values can then be determined by mirroring and offsetting.
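A sketch of what such a table and interpolation might look like, with values scaled by 1000 at 5° steps (the step size and scale factor are illustrative choices, not from the answer):

#include <stdint.h>

/* cos scaled by 1000, tabulated every 5 degrees over [0, 90] */
static const int16_t cos_table[19] = {
    1000, 996, 985, 966, 940, 906, 866, 819, 766,
     707, 643, 574, 500, 423, 342, 259, 174,  87, 0
};

/* deg in [0, 90]; returns cos(deg) * 1000 via linear interpolation */
int16_t cos_milli(int32_t deg)
{
    int32_t i = deg / 5;      /* table index     */
    int32_t r = deg % 5;      /* remainder, 0..4 */
    if (i >= 18)
        return cos_table[18];
    return (int16_t)(cos_table[i]
                     + (cos_table[i + 1] - cos_table[i]) * r / 5);
}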

Taylor Series to calculate cosine (getting output -0.000 for cosine(90))

I have written the following function for the Taylor series to calculate cosine.
double cosine(int x) {
    x %= 360; // make it less than 360
    double rad = x * (PI / 180);
    double cos = 0;
    int n;
    for (n = 0; n < TERMS; n++) {
        cos += pow(-1, n) * pow(rad, 2 * n) / fact(2 * n);
    }
    return cos;
}
My issue is that when I input 90 I get the answer -0.000000. (Why am I getting -0.000 instead of 0.000?)
Can anybody explain why, and how I can solve this issue?
I think it's due to the precision of double.
Here is the main() :
int main(void)
{
    int y;
    //scanf("%d", &y);
    y = 90;
    printf("sine(%d)= %lf\n", y, sine(y));
    printf("cosine(%d)= %lf\n", y, cosine(y));
    return 0;
}
It's totally expected that you will not be able to get exact zero outputs for cosine of anything with floating point, regardless of how good your approach to computing it is. This is fundamental to how floating point works.
The mathematical zeros of cosine are odd multiples of pi/2. Because pi is irrational, it's not exactly representable as a double (or any floating point form), and the difference between the nearest neighboring values that are representable is going to be at least pi/2 times DBL_EPSILON, roughly 3e-16 (or corresponding values for other floating point types). For some odd multiples of pi/2, you might "get lucky" and find that it's really close to one of the two neighbors, but on average you're going to find it's about 1e-16 away. So your input is already wrong by 1e-16 or so.
Now, cosine has slope +1 or -1 at its zeros, so the error in the output will be roughly proportional to the error in the input. But to get an exact zero, you'd need error smaller than the smallest representable nonzero double, which is around 2e-308. That's nearly 300 orders of magnitude smaller than the error in the input.
While you could in theory "get lucky" and have some multiple of pi/2 that's really, really close to the nearest representable double, the likelihood of this, just modelling it as random, is astronomically small. I believe there are even proofs that there is no double x for which the correctly-rounded value of cos(x) is an exact zero. For single precision (float) this can be determined easily by brute force; for double it is probably also doable, but a big computation.
As to why printf is printing -0.000000, it's just that the default for %f is 6 places after the decimal point, which is nowhere near enough to see the first significant digit. Using %e or %g, optionally with a large precision modifier, would show you an approximation of the result you got that actually retains some significance and give you an idea whether your result is good.
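For example, something like:

    /* %.17g keeps all significant digits of a double, so the tiny
       negative result is visible instead of printing as -0.000000 */
    printf("cosine(%d) = %.17g\n", y, cosine(y));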
My issue is that when I input 90 I get the answer -0.000000. (Why am I getting -0.000 instead of 0.000?)
cosine(90) is not precise enough to result in a value of 0.0. Use printf("cosine(%d)= %le\n",y, cosine(y)); (note the e) to see a more informative view of the result. Instead, cosine(90) is generating a negative result in the range [-0.0005 ... -0.0] and that is rounded to "-0.000" for printing.
Can anybody explain why, and how I can solve this issue?
OP's cosine() lacks sufficient range reduction, which for degrees can be exact.
x %= 360; was a good first step, yet perform a better range reduction to a 90° width like [-45°...45°], [45°...135°], etc.
Also recommended: use a Taylor series with sufficient terms (e.g. 10) and a good machine PI [1]. Form the terms more carefully than pow(rad, 2 * n) / fact(2 * n), which injects excessive error.
Example1, example2.
Other improvements are possible, yet this should be something to get OP started.
[1] #define PI 3.1415926535897932384626433832795
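A sketch putting these suggestions together; the folding scheme, the term recurrences and TERMS = 10 are illustrative choices, not code from the answer:

#define PI    3.1415926535897932384626433832795
#define TERMS 10

/* cos and sin on [-45deg, +45deg]; each term is built from the
   previous one, avoiding the rounding noise of pow() and the
   overflow risk of fact(). */
static double cos_core(double rad)
{
    double term = 1.0, sum = 1.0;
    int n;
    for (n = 1; n < TERMS; n++) {
        term *= -rad * rad / ((2.0 * n - 1.0) * (2.0 * n));
        sum += term;
    }
    return sum;
}

static double sin_core(double rad)
{
    double term = rad, sum = rad;
    int n;
    for (n = 1; n < TERMS; n++) {
        term *= -rad * rad / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

double cosine(int x)
{
    x %= 360;                       /* exact for integer degrees      */
    if (x < 0)
        x += 360;
    int k = (x + 45) / 90;          /* nearest multiple of 90         */
    int r = x - 90 * k;             /* residual angle in [-45, 44]    */
    if (r == 0 && k % 2 == 1)
        return 0.0;                 /* cos(90), cos(270) exactly zero */
    double rad = r * (PI / 180.0);
    switch (k % 4) {
    case 0:  return  cos_core(rad); /* cos(x) =  cos(r)               */
    case 1:  return -sin_core(rad); /* cos(90 + r)  = -sin(r)         */
    case 2:  return -cos_core(rad); /* cos(180 + r) = -cos(r)         */
    default: return  sin_core(rad); /* cos(270 + r) =  sin(r)         */
    }
}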

Splines in integer arithmetic?

Splines (the piecewise cubic polynomial form) can be written as:
s = x - x[k]
y = y[k] + a[k]*s + b[k]*s*s + c[k]*s*s*s
where x[k] < x < x[k+1], the curve passes through each (x[k], y[k]) point, and a,b,c are arrays of coefficients describing the slope and shape. This all works fine in floating point, and there are plenty of ways to calculate a,b,c for different kinds of splines. However...
How can this be approximated in integer arithmetic?
One of the tricky parts is that any approximation should, ideally, be continuous, in other words using x=x[k+1] and the coefficients from the k-th segment, the result should be y[k+1] except for rounding errors. In other words, for a straight segment, y[k+1] == y[k] + a[k]*(x[k+1] - x[k]), and curvy segments only deviate from this in the middle but not at either end. This is guaranteed by construction in the case of floating point, but even a small coefficient change from rounding can throw it off quite a bit.
Another tricky part is that, in general, the magnitude of the higher-order coefficients is much smaller - but not always, especially not at sharp "corners". It may still make sense to scale them up by the typical size of s to the power of whatever order they are, so they are not rounded off to zero as integers, but that would seem to trade off resolution in curvature for maximum possible corner sharpness.
First try at an integer version:
y = y[k] + (a[k] + (b[k] + c[k]*s)*s)*s
Then use integer multiply (intended for 16bit values, 32bit arithmetic):
#define q (1 << 16)
#define mult(x, y) (((x) * (y)) / q)
y = y[k] + mult(mult(mult(c[k], s) + b[k], s) + a[k], s)
This looks good in theory, but I'm not sure it's the best possible approach, or how to tell systematically what the best possible approach is.
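One possible refinement of the mult() above, sketched under the same Q16.16 assumption (names are illustrative): widen to 64 bits and round to nearest rather than truncate, which halves the worst-case error per multiply and keeps the segment endpoints closer to their exact values.

#include <stdint.h>

#define QBITS 16

/* Widen, multiply, add half the scale before shifting back:
   round-to-nearest instead of plain truncation. */
static int32_t qmul(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y + (1 << (QBITS - 1))) >> QBITS);
}

/* y = y0 + ((c*s + b)*s + a)*s, all values in Q16.16 */
int32_t spline_eval(int32_t y0, int32_t a, int32_t b, int32_t c, int32_t s)
{
    return y0 + qmul(qmul(qmul(c, s) + b, s) + a, s);
}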

Is there a simple C library for 3d rotations with minimal rounding error?

A naive implementation of vector rotation in 3D gives huge rounding errors, especially when multiple rotations around different axes are performed. A simple one-axis example shows the basic problem. I have code where I rotate points around the x and y axes a few times. In some cases, I get errors in the second decimal place (e.g. the length of the vector is 1 before the rotations and 0.9 after). I'd be happy with relative errors < 1e-5.
void Rotate_x(double data[3], double agl)
{
    agl *= M_PI / 180.0;
    double c = cos(agl);
    double s = sin(agl);
    double tmp_y = c * data[1] - s * data[2];
    double tmp_z = s * data[1] + c * data[2];
    data[1] = tmp_y;
    data[2] = tmp_z;
}
Can someone point me to a library or some code that rotates points around the coordinate axis with minimal error?
Everything I found were bloated linear algebra libraries that are overkill for my purposes.
Edit:
I went to long double precision and combined rotations to improve the errors. With doubles I was not fully satisfied (1e-3 relative error in the worst case). That was the easiest solution and it works okay. I still wouldn't mind a nice library that does rotations accurately in regular double precision.
better-precision variables are not enough;
you need more precise sin, cos functions to improve accuracy,
so make your own functions via Taylor series expansion
and use those ... then compare the results
and increase the polynomial order until accuracy stops rising or starts dropping again.
if you are applying many transformations to the same data,
then create a cumulative transform matrix,
then check if it is orthogonal/orthonormal,
and repair it if not (with use of the cross product).
I use this for 3D render object matrices (many cumulative transforms over time),
but in your case this can also increase error (if the wrong order of axes is chosen during the correction);
this is better suited to ensuring that the object stays the same size/shape over time ...
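A sketch of the cross-product repair described above; keeping the x axis as the trusted direction is exactly the kind of axis-order choice the answer warns about:

#include <math.h>

/* Normalize v to unit length (assumes v is not the zero vector). */
static void normalize3(double v[3])
{
    double len = sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    v[0] /= len; v[1] /= len; v[2] /= len;
}

static void cross3(const double a[3], const double b[3], double out[3])
{
    out[0] = a[1]*b[2] - a[2]*b[1];
    out[1] = a[2]*b[0] - a[0]*b[2];
    out[2] = a[0]*b[1] - a[1]*b[0];
}

/* Repair a drifting 3x3 rotation matrix m (rows = basis vectors). */
void reorthonormalize(double m[3][3])
{
    normalize3(m[0]);           /* keep x as the trusted direction        */
    cross3(m[0], m[1], m[2]);   /* z = x cross y: perpendicular to both   */
    normalize3(m[2]);
    cross3(m[2], m[0], m[1]);   /* y = z cross x: rebuilt exactly         */
}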
[edit1] test
I took your code to Borland BDS2006, compiled it as a Win32 app,
and the result is:
original: (0.0000000000000000000,1.0000000000000000000,0.0000000000000000000)
rotated: (0.0000000000000000000,0.9999999999999998890,-0.0000000000000000273)
Also, do not forget: if your sin, cos take radians (as usual for C/C++), then add this to Rotate:
agl *= M_PI / 180.0;
What compiler/platform are you using?
This is how my Rotate looks:
void Rotate(double *data, double agl)
{
    agl *= M_PI / 180.0;
    double c = cos(agl);
    double s = sin(agl);
    double tmp_y = c * data[1] - s * data[2];
    double tmp_z = s * data[1] + c * data[2];
    data[1] = tmp_y;
    data[2] = tmp_z;
}
[edit2] 32/64 bit comparison
[double] //64bit floating point
(0.0000000000000000000,1.0000000000000000000,0.0000000000000000000)
(0.0000000000000000000,0.9999999999999998890,-0.0000000000000000273)
[float] //32bit floating point
(0.0000000000000000000,1.0000000000000000000,0.0000000000000000000)
(0.0000000000000000000,0.9999999403953552246,-0.0000000146747787255)

Problem with Precision floating point operation in C

For one of my course projects I started implementing a "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially for spam) using huge training data.
Now I have a problem implementing the algorithm because of the limitations of C's datatypes.
( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering )
PROBLEM STATEMENT:
The algorithm involves taking each word in a document and calculating the probability of it being a spam word. If p_1, p_2, p_3, ..., p_n are the probabilities of word-1, 2, 3, ..., n, the probability of the doc being spam or not is calculated using
p = (p_1 p_2 ... p_n) / (p_1 p_2 ... p_n + (1-p_1) (1-p_2) ... (1-p_n))
Here, a probability value can very easily be around 0.01. So even if I use the datatype "double", my calculation will go for a toss. To confirm this I wrote the sample code given below.
#define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01)
#define PROBABILITY_OF_MOSTLY_SPAM_WORD   (0.99)

int main()
{
    int index;
    long double numerator = 1.0;
    long double denom1 = 1.0, denom2 = 1.0;
    long double doc_spam_prob;

    /* Simulating FEW unlikely spam words */
    for (index = 0; index < 162; index++)
    {
        numerator = numerator * (long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD;
        denom2 = denom2 * (long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD;
        denom1 = denom1 * (long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD);
    }

    /* Simulating lots of mostly definite spam words */
    for (index = 0; index < 1000; index++)
    {
        numerator = numerator * (long double)PROBABILITY_OF_MOSTLY_SPAM_WORD;
        denom2 = denom2 * (long double)PROBABILITY_OF_MOSTLY_SPAM_WORD;
        denom1 = denom1 * (long double)(1 - PROBABILITY_OF_MOSTLY_SPAM_WORD);
    }

    doc_spam_prob = (numerator / (denom1 + denom2));
    return 0;
}
I tried float, double and even long double datatypes, but the problem stays the same.
Hence, say in a 100K-word document I am analyzing, if just 162 words have a 1% spam probability and the remaining 99838 are conspicuously spam words, my app will still call it a Not Spam doc because of the precision error (as the numerator easily goes to ZERO)!
This is the first time I am hitting such an issue. So how exactly should this problem be tackled?
This happens often in machine learning. AFAIK, there's nothing you can do about the loss in precision. So to bypass this, we use the log function and convert divisions and multiplications to subtractions and additions, resp.
So I decided to do the math.
The original equation is:
p = (p_1 p_2 ... p_n) / (p_1 p_2 ... p_n + (1-p_1) (1-p_2) ... (1-p_n))
I slightly modify it:
1/p = 1 + ((1-p_1) (1-p_2) ... (1-p_n)) / (p_1 p_2 ... p_n)
Taking logs on both sides:
ln(1/p - 1) = sum[ ln(1-p_i) ] - sum[ ln(p_i) ]
Let,
eta = sum[ ln(1-p_i) - ln(p_i) ]
Substituting,
1/p - 1 = e^eta
Hence the alternate formula for computing the combined probability:
p = 1 / (1 + e^eta)
If you need me to expand on this, please leave a comment.
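A minimal sketch of the resulting computation (the function name is illustrative): sums of logs replace the products that underflowed.

#include <math.h>
#include <stddef.h>

/* eta = sum(ln(1 - p_i) - ln(p_i)), then p = 1 / (1 + exp(eta)) */
double combined_spam_prob(const double *p, size_t n)
{
    double eta = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        eta += log(1.0 - p[i]) - log(p[i]);
    return 1.0 / (1.0 + exp(eta));
}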
Here's a trick:
for the sake of readability, let S := p_1 * ... * p_n and H := (1-p_1) * ... * (1-p_n),
then we have:
p = S / (S + H)
p = 1 / ((S + H) / S)
p = 1 / (1 + H / S)
let's expand again:
p = 1 / (1 + ((1-p_1) * ... * (1-p_n)) / (p_1 * ... * p_n))
p = 1 / (1 + (1-p_1)/p_1 * ... * (1-p_n)/p_n)
So basically, you will obtain a product of quite large numbers (between 0 and, for p_i = 0.01, 99). The idea is, not to multiply tons of small numbers with one another, to obtain, well, 0, but to make a quotient of two small numbers. For example, if n = 1000000 and p_i = 0.5 for all i, the above method will give you 0/(0+0) which is NaN, whereas the proposed method will give you 1/(1+1*...1), which is 0.5.
You can get even better results, when all p_i are sorted and you pair them up in opposed order (let's assume p_1 < ... < p_n), then the following formula will get even better precision:
p = 1 / (1 + (1-p_1)/p_n * ... * (1-p_n)/p_1)
that way you divide big numerators (from small p_i) by big denominators (big p_(n+1-i)), and small numerators by small denominators.
edit: MSalters proposed a useful further optimization in his answer. Using it, the formula reads as follows:
p = 1 / (1 + (1-p_1)/p_n * (1-p_2)/p_(n-1) * ... * (1-p_(n-1))/p_2 * (1-p_n)/p_1)
Your problem is caused because you are collecting too many terms without regard for their size. One solution is to take logarithms. Another is to sort your individual terms. First, let's rewrite the equation as 1/p = 1 + ∏((1-p_i)/p_i). Now your problem is that some of the terms are small, while others are big. If you have too many small terms in a row, you'll underflow, and with too many big terms you'll overflow the intermediate result.
So, don't put too many of the same order in a row. Sort the terms (1-p_i)/p_i. As a result, the first will be the smallest term, the last the biggest. Now, if you multiplied them straight away you would still have an underflow. But the order of calculation doesn't matter. Use two iterators into your temporary collection. One starts at the beginning (i.e. (1-p_0)/p_0), the other at the end (i.e. (1-p_n)/p_n), and your intermediate result starts at 1.0. Now, when your intermediate result is >= 1.0, you take a term from the front, and when your intermediate result is < 1.0 you take a term from the back.
The result is that, as you take terms, the intermediate result will oscillate around 1.0. It will only go up or down as you run out of small or big terms. But that's OK: at that point, you've consumed the extremes on both ends, so the intermediate result will slowly approach the final result.
There's of course a real possibility of overflow. If the input is completely unlikely to be spam (p=1E-1000) then 1/p will overflow, because ∏((1-p_i)/p_i) overflows. But since the terms are sorted, we know that the intermediate result will overflow only if ∏((1-p_i)/p_i) overflows. So, if the intermediate result overflows, there's no subsequent loss of precision.
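A sketch of this balancing scheme (allocation and names are illustrative):

#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* p = 1 / (1 + prod((1-p_i)/p_i)), with the sorted terms consumed
   from whichever end keeps the running product near 1.0. */
double combined_prob_balanced(const double *p, size_t n)
{
    double *t = malloc(n * sizeof *t);
    double prod = 1.0;
    size_t lo = 0, hi = n, i;
    if (t == NULL)
        return -1.0;               /* allocation-failure sentinel   */
    for (i = 0; i < n; i++)
        t[i] = (1.0 - p[i]) / p[i];
    qsort(t, n, sizeof *t, cmp_double);
    while (lo < hi) {
        if (prod >= 1.0)
            prod *= t[lo++];       /* small term pulls product down */
        else
            prod *= t[--hi];       /* large term pulls product up   */
    }
    free(t);
    return 1.0 / (1.0 + prod);
}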
Try computing the inverse 1/p. That gives you an equation of the form 1/p = 1 + ((1-p_1) ... (1-p_n)) / (p_1 ... p_n).
If you then count the occurrences of each probability -- it looks like you have a small number of values that recur -- you can use the pow() function -- pow(1-p, occurrences_of_p) * pow(1-q, occurrences_of_q) -- and avoid individual roundoff with each multiplication.
You can use the probability in percent or per mille:
doc_spam_prob= (numerator*100/(denom1+denom2));
or
doc_spam_prob= (numerator*1000/(denom1+denom2));
or use some other coefficient
I am not strong in math so I cannot comment on possible simplifications to the formula that might eliminate or reduce your problem. However, I am familiar with the precision limitations of long double types and am aware of several arbitrary and extended precision math libraries for C. Check out:
http://www.nongnu.org/hpalib/
and
http://www.tc.umn.edu/~ringx004/mapm-main.html
