I'm using SQLAlchemy to define my tables and such, and here is some code I came up with:
locations = Table('locations', Base.metadata,
    Column("lat", Float(precision=64), primary_key=True),
    Column("lng", Float(precision=64), primary_key=True),
)
I read somewhere that latitude and longitude require better precision than floats, usually double precision. So I set the precision manually to 64. Is this sufficient? Overkill? Would this even help in my situation?
Nobody else here provided concrete numbers with proof for the worst-case accuracy of a floating point lat/long. I needed to know this for something I was working on, so here is my analysis in case it helps someone else.
A single-precision float offers 24 bits of precision in the significand (the coefficient in the number's binary scientific notation). As the whole-number part of the value gets larger, the number of bits left after the binary point goes down. Therefore, the worst-case accuracy for a latitude or longitude is when the magnitude is as far away from 0 as possible. Assuming you bound your latitudes to [-90, 90] and longitudes to (-180, 180], the worst case will be at the equator for longitude 180.
In binary, 180 requires 8 of the 24 bits available, leaving 16 bits after the binary point. Therefore, the spacing between consecutive representable values at this longitude is 2^-16 deg (approximately 1.526E-5). Multiplying that number (converted to radians) by the WGS-84 radius of the Earth at the equator (6,378,137 m) yields a worst-case precision of:
2^-16 deg * 6,378,137 m * PI rad / 180 deg = 1.6986 m (5.5728 ft).
The same analysis against lat/longs stored in radians yields the following:
2^-22 rad * 6,378,137 m = 1.5207 m (4.9891 ft)
And finally, if you normalize the latitudes to the range [-1, 1] and the longitudes to the range (-1, 1], then you can achieve the following worst-case precision:
2^-24 * PI rad * 6,378,137 m = 1.1943 m (3.9184 ft)
So storing lat/long in radians buys you around 7 inches of additional accuracy, and storing them in normalized form buys you around 1'8" of additional accuracy, both in the worst-case scenario.
If, when converting from double precision to single precision, you round (instead of truncate), the single-precision value will be within half of the spacing between consecutive values computed above.
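If you want to verify the degree figure empirically, the small program below (a quick sketch of my own; the constant names are mine) measures the spacing between adjacent single-precision values just below 180 degrees and converts it to meters at the equator:

#include <math.h>
#include <stdio.h>

#define WGS84_EQUATORIAL_RADIUS_M 6378137.0
#define PI 3.14159265358979323846

int main(void)
{
    // Spacing between 180.0f and the nearest representable float below it.
    float lon = 180.0f;
    float below = nextafterf(lon, 0.0f);
    double gap_deg = (double) lon - (double) below;             // 2^-16 degrees
    double gap_m = gap_deg * (PI / 180.0) * WGS84_EQUATORIAL_RADIUS_M;
    printf("gap: %.10g deg -> %.4f m\n", gap_deg, gap_m);       // about 1.6986 m
    return 0;
}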
It depends on what you are using your data for. A float will be fine if you only need roughly meter-level detail. Using the data in graphical applications will cause a jitter effect if the user zooms in too far. For more about jitter, see Precisions, Precisions. Hope this helps.
Update: Jeff's answer has a better analysis. However...
To improve upon Jeff's answer:
If you divide the actual angle in radians by π, thus encoding the angle in a scale going from 0 to ±1, then it should be possible to use all the digits of the significand (23 bits (24 - 1 sign bit)). The precision would then be:
2^-23 * 6,378,137 m = 0.7603 m (76 cm)
My Old answer:
A 32 bit floating point number can represent a number with about 7.2 decimal digits of precision. This is an approximation because the floating point number is actually in binary, and when converted to decimal, the number of significant digits might vary.
If we take it as 6 decimal digits of precision (to play on the safe side), and if we are storing latitude and longitude in degrees, then we get a precision of about 1/1000th of a degree which is a precision of about 111 meters in the worst case. In the best case, if we get 7 decimal digits of precision, the accuracy would be about 11.1 meters.
It is possible to get a better precision using radians as the unit. In the worst case we get a precision of 10 millionth of a radian which is about 63 meters. In the best case, it would be 1 millionth of a radian which is about 6 meters.
Needless to say, a 64-bit floating point number would be extremely precise (about 6 micrometers in the worst case).
TL;DR: if one-meter resolution is acceptable then a single-precision float storing degrees is acceptable.
This answer is a bit late to the party, but I needed a solid answer myself and so hacked out some code to quickly get it. There are of course more elegant ways to do this, but it seems to work. As noted by Jeff, the worst-case scenario will be at +/- 180 degrees longitude (i.e., the date line).
Per the code below, a single-precision float storing degrees is accurate to about 0.85 meters at the date line. Accuracy increases significantly (to within millimeters) close to the prime meridian.
#include <stdio.h>

// per wikipedia, earth's circumference is 40,075.017 KM
#define METERS_PER_DEG (40075017 / 360.0)

// worst case scenario is near +/-180.0 (ie, the date line)
#define LONGITUDE 180.0

int main()
{
    // subtract very small but increasingly larger values from
    // 180.0 and recast as float until it no longer equals 180.0
    double step = 1.0e-10;
    int ctr = 1;
    while ((float) LONGITUDE == (float) (LONGITUDE - (double) ctr * step)) {
        ctr++;
    }
    double delta = (double) ctr * step;
    printf("Longitude %f\n", LONGITUDE);
    printf("delta %f (%d steps)\n", delta, ctr);
    printf("meters: %f\n", delta * METERS_PER_DEG);
    return 0;
}
Output from this code is
Longitude 180.000000
delta 0.000008 (76294 steps)
meters: 0.849301
I have written the following function for the Taylor series to calculate cosine.
double cosine(int x) {
    x %= 360; // make it less than 360
    double rad = x * (PI / 180);
    double cos = 0;
    int n;
    for(n = 0; n < TERMS; n++) {
        cos += pow(-1, n) * pow(rad, 2 * n) / fact(2 * n);
    }
    return cos;
}
My issue is that when I input 90 I get the answer -0.000000. (Why am I getting -0.000 instead of 0.000?)
Can anybody explain why, and how I can solve this issue?
I think it's due to the precision of double.
Here is the main():
int main(void){
    int y;
    //scanf("%d",&y);
    y=90;
    printf("sine(%d)= %lf\n",y, sine(y));
    printf("cosine(%d)= %lf\n",y, cosine(y));
    return 0;
}
It's totally expected that you will not be able to get exact zero outputs for cosine of anything with floating point, regardless of how good your approach to computing it is. This is fundamental to how floating point works.
The mathematical zeros of cosine are odd multiples of pi/2. Because pi is irrational, it's not exactly representable as a double (or any floating point form), and the difference between the nearest neighboring values that are representable is going to be at least pi/2 times DBL_EPSILON, roughly 3e-16 (or corresponding values for other floating point types). For some odd multiples of pi/2, you might "get lucky" and find that it's really close to one of the two neighbors, but on average you're going to find it's about 1e-16 away. So your input is already wrong by 1e-16 or so.
Now, cosine has slope +1 or -1 at its zeros, so the error in the output will be roughly proportional to the error in the input. But to get an exact zero, you'd need error smaller than the smallest representable nonzero double, which is around 2e-308. That's nearly 300 orders of magnitude smaller than the error in the input.
While you could in theory "get lucky" and have some multiple of pi/2 that's really, really close to the nearest representable double, the likelihood of this, just modelling it as random, is astronomically small. I believe there are even proofs that there is no double x for which the correctly-rounded value of cos(x) is an exact zero. For single precision (float) this can be determined easily by brute force; for double it's probably also doable, but a big computation.
As to why printf is printing -0.000000, it's just that the default for %f is 6 places after the decimal point, which is nowhere near enough to see the first significant digit. Using %e or %g, optionally with a large precision modifier, would show you an approximation of the result you got that actually retains some significance and give you an idea whether your result is good.
My issue is that when I input 90 I get the answer -0.000000. (Why am I getting -0.000 instead of 0.000?)
cosine(90) is not precise enough to result in a value of 0.0. Use printf("cosine(%d)= %le\n",y, cosine(y)); (note the e) to see a more informative view of the result. Instead, cosine(90) is generating a negative result in the range [-0.0005 ... -0.0] and that is rounded to "-0.000" for printing.
Can anybody explain why, and how I can solve this issue?
OP's cosine() lacks sufficient range reduction, which for degrees can be exact.
x %= 360; was a good first step, yet a better range reduction, to a 90° width like [-45°...45°], [45°...135°], etc., will help.
Also recommended: use a Taylor series with sufficient terms (e.g. 10) and a good machine PI1. Form the terms more carefully than pow(rad, 2 * n) / fact(2 * n), which injects excessive error.
Example1, example2.
Other improvements are possible, yet this is something to get OP started.
1 #define PI 3.1415926535897932384626433832795
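For illustration, here is a minimal sketch along those lines; the helper names cos_poly/sin_poly/cosine_deg and the choice of TERMS = 10 are mine, not from the answer above:

#include <stdio.h>

#define PI 3.1415926535897932384626433832795
#define TERMS 10

// Cosine Maclaurin series, terms built incrementally instead of pow()/fact().
static double cos_poly(double rad) {
    double term = 1.0, sum = 1.0;
    for (int n = 1; n < TERMS; n++) {
        term *= -rad * rad / ((2.0 * n - 1) * (2.0 * n));
        sum += term;
    }
    return sum;
}

// Sine Maclaurin series, same idea.
static double sin_poly(double rad) {
    double term = rad, sum = rad;
    for (int n = 1; n < TERMS; n++) {
        term *= -rad * rad / ((2.0 * n) * (2.0 * n + 1));
        sum += term;
    }
    return sum;
}

// Cosine of an integer number of degrees, with exact range reduction
// to [0, 45] degrees before converting to radians.
double cosine_deg(int x) {
    x %= 360;
    if (x < 0) x += 360;                     // now 0 <= x < 360
    int sign = 1;
    if (x > 180) x = 360 - x;                // cos(x) = cos(360 - x)
    if (x > 90) { x = 180 - x; sign = -1; }  // cos(x) = -cos(180 - x)
    double result = (x <= 45)
        ? cos_poly(x * (PI / 180))
        : sin_poly((90 - x) * (PI / 180));   // cos(x) = sin(90 - x)
    return sign * result;
}

int main(void) {
    printf("cosine(90)  = %le\n", cosine_deg(90));   // exactly 0
    printf("cosine(60)  = %le\n", cosine_deg(60));   // ~0.5
    printf("cosine(180) = %le\n", cosine_deg(180));  // -1
    return 0;
}

With this reduction, cosine_deg(90) lands exactly on the sin_poly(0) case and returns an exact 0.0.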
I need to do some math to convert a 16-bit value received from a sensor to a real relative-humidity value. It's calculated with the following formula (where S is the 16-bit sensor reading):

RH = 125 * S / 65536 - 6

Given this, in floating point math that would be:
uint16_t buf = 0x7C80; // Example
float rh = ((float)buf*125 / 65536)-6;
But I want to avoid floating point math, as my platform is FPU-less.
What is the most effective way to calculate and store RH in integer math here? Since it's humidity, the actual value should be between 0 and 100%, but the approximation can sometimes leave rh slightly below 0 or above 100 (if I kept the float, I could just do something like if (rh<0) rh=0; else if (rh>100) rh=100;), and I only care about the 2 digits after the decimal point (%.2f).
Currently I've solved this like this:
int16_t rhint = ((uint32_t)buf*12500 / 65536)-600;
And working with rhint / 100; rhint % 100. But is there perhaps a more effective way?
You could avoid the large intermediate term by writing the right hand side as
-6 + (128 - 4 + 1) * S / 65536
Which becomes
-6 + S / 512 - S / 16384 + S / 65536
You might be able to drop the last term, and possibly the penultimate one too depending on how precise you want the basis point truncation to be.
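For a concrete sketch of the integer-only version (my own; the round-to-nearest offset of 32768 and the clamp to [0, 10000] are additions beyond what OP posted):

#include <stdint.h>
#include <stdio.h>

// RH in hundredths of a percent (0..10000), from the 16-bit sensor word s,
// using RH = 125*s/65536 - 6. The widest intermediate, 65535 * 12500,
// still fits comfortably in 32 bits.
int16_t rh_centi(uint16_t s) {
    int32_t rh = (int32_t)(((uint32_t)s * 12500u + 32768u) >> 16) - 600;
    if (rh < 0) rh = 0;          // clamp the slight out-of-range cases
    if (rh > 10000) rh = 10000;
    return (int16_t)rh;
}

int main(void) {
    uint16_t buf = 0x7C80;       // example value from the question
    int16_t rh = rh_centi(buf);
    printf("RH = %d.%02d %%\n", rh / 100, rh % 100);
    return 0;
}

The + 32768u before the shift rounds to the nearest hundredth; dropping it reproduces the truncating behavior of OP's original cast.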
I found Stevens Computing Services – K & R Exercise 2-1 a very thorough answer to K&R 2-1. This slice of the full code computes the maximum value of a float type in the C programming language.
Unfortunately, my theoretical comprehension of float values is quite limited. I know they are composed of a significand (mantissa) and a magnitude which is a power of 2.
#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void)
{
    float flt_a, flt_b, flt_c, flt_m;

    /* FLOAT */
    printf("\nFLOAT MAX\n");
    printf("<float.h>  %E ", FLT_MAX);

    flt_a = 2.0;
    flt_b = 1.0;
    while (flt_a != flt_b) {
        flt_m = flt_b;            /* MAX POWER OF 2 IN MANTISSA */
        flt_a = flt_b = flt_b * 2.0;
        flt_a = flt_a + 1.0;
    }
    flt_m = flt_m + (flt_m - 1);  /* MAX VALUE OF MANTISSA */

    flt_a = flt_b = flt_c = flt_m;
    while (flt_b == flt_c) {
        flt_c = flt_a;
        flt_a = flt_a * 2.0;
        flt_b = flt_a / 2.0;
    }
    printf("COMPUTED   %E\n", flt_c);
    return 0;
}
I understand that the latter part basically checks to which power of 2 it's possible to raise the significand, using a three-variable algorithm. What about the first part?
I can see that a progression of multiples of 2 should eventually determine the value of the significand, but I tried to trace a few small numbers to check how it should work and it failed to find the right values...
======================================================================
What are the concepts this program is based upon, and does this program get more precise as longer, non-integer numbers have to be found?
The first loop determines the number of bits contributing to the significand by finding the least power of 2 such that adding 1 to it (using floating-point arithmetic) fails to change its value. If that's the nth power of two, then the significand uses n bits, because with n bits you can express all the integers from 0 through 2^n - 1, but not 2^n. The floating-point representation of 2^n must therefore have an exponent large enough that the (binary) units digit is not significant.
By that same token, having found the first power of 2 whose float representation has worse than unit precision, the maximum float value that does have unit precision is one less. That value is recorded in variable flt_m.
The second loop then tests for the maximum exponent by starting with the maximum unit-precision value, and repeatedly doubling it (thereby increasing the exponent by 1) until it finds that the result cannot be converted back by halving it. The maximum float is the value before that final doubling.
Do note, by the way, that all the above supposes a base-2 floating-point representation. You are unlikely to run into anything different, but C does not actually require any specific representation.
With respect to the second part of your question,
does this program get more precise as longer, non-integer numbers have to be found?
the program takes care to avoid losing precision. It does assume a binary floating-point representation such as you described, but it will work correctly regardless of the number of bits in the significand or exponent of such a representation. No non-integers are involved, but the program already deals with numbers that have worse than unit precision, and with numbers larger than can be represented with type int.
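As a quick cross-check (my addition), the quantities those two loops discover are also available directly from <float.h> on a binary (FLT_RADIX == 2) implementation:

#include <float.h>
#include <stdio.h>

int main(void) {
    // What the first loop discovers: the number of significand bits, and
    // the largest value up to which consecutive integers are still exact.
    printf("FLT_MANT_DIG:             %d\n", FLT_MANT_DIG);
    printf("max unit-precision value: %.0f\n", (double)(1L << FLT_MANT_DIG) - 1);

    // What the second loop converges on.
    printf("FLT_MAX:                  %E\n", FLT_MAX);
    return 0;
}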
Typically, rounding to 2 decimal places is very easy with
printf("%.2lf",<variable>);
However, the rounding system will usually round to the nearest even value. For example,
2.554 -> 2.55
2.555 -> 2.56
2.565 -> 2.56
2.566 -> 2.57
And what I want to achieve is that
2.555 -> 2.56
2.565 -> 2.57
In fact, rounding half-up is doable in C, but only for integers:
int a = (int)(b + 0.5);
So I'm asking how to do the same thing as above, with 2 decimal places on positive values instead of integers, to achieve what I said earlier for printing.
It is not clear whether you actually want to "round half-up", or rather "round half away from zero", which requires different treatment for negative values.
A single-precision binary float is precise to at least 6 significant decimal digits, and a double to about 15, so nudging an FP value by DBL_EPSILON (defined in float.h) will cause printf( "%.2lf", x ) to round up to the next 100th for n.nn5 values, without affecting the displayed value for values that are not n.nn5.
double x2 = x * (1 + DBL_EPSILON) ; // round half-away from zero
printf( "%.2lf", x2 ) ;
For different rounding behaviours:
double x2 = x * (1 - DBL_EPSILON) ; // round half-toward zero
double x2 = x + DBL_EPSILON ; // round half-up
double x2 = x - DBL_EPSILON ; // round half-down
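A quick self-contained check of the half-away-from-zero nudge (my own test program; what the un-nudged line prints can vary with how your library converts and formats the 2.555 literal):

#include <stdio.h>
#include <float.h>

int main(void) {
    double x = 2.555;                    // stored as roughly 2.55499999999999972
    double x2 = x * (1 + DBL_EPSILON);   // nudge half-away from zero
    printf("plain:  %.2f\n", x);         // typically 2.55
    printf("nudged: %.2f\n", x2);        // 2.56
    return 0;
}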
Following is precise code to round a double to the nearest 0.01 double.
The code functions like x = round(100.0*x)/100.0; except it uses manipulations to ensure that scaling by 100.0 is done exactly, without precision loss.
Likely this is more code than OP is interested in, but it does work.
It works for the entire double range, -DBL_MAX to DBL_MAX (it still should get more unit testing).
It depends on FLT_RADIX == 2, which is common.
#include <float.h>
#include <math.h>
#include <stdio.h>

void r100_best(const char *s) {
    double x;
    sscanf(s, "%lf", &x);

    // Break x into whole number and fractional parts.
    // Code only needs to round the fractional part.
    // This preserves the entire `double` range.
    double xi, xf;
    xf = modf(x, &xi);

    // Multiply the fractional part by N (256).
    // Break into whole and fractional parts.
    // This provides the needed extended precision.
    // N should be >= 100 and a power of 2.
    // The multiplication by a power of 2 will not introduce any rounding.
    double xfi, xff;
    xff = modf(xf * 256, &xfi);

    // Multiply both parts by 100.
    // *100 incurs 7 more bits of precision, of which the preceding code
    // ensures the 8 LSBits of xfi, xff are zero.
    int xfi100, xff100;
    xfi100 = (int) (xfi * 100.0);
    xff100 = (int) (xff * 100.0); // Cast here will truncate (towards 0)

    // Sum the 2 parts.
    // sum is the exact truncate-toward-0 version of xf*256*100.
    int sum = xfi100 + xff100;

    // Add in half N.
    if (sum < 0)
        sum -= 128;
    else
        sum += 128;
    xf = sum / 256;   // integer division truncates toward 0, completing round-half-away-from-zero
    xf /= 100;

    double y = xi + xf;

    printf("%6s %25.22f ", "x", x);
    printf("%6s %25.22f %.2f\n", "y", y, y);
}

int main(void) {
    r100_best("1.105");
    r100_best("1.115");
    r100_best("1.125");
    r100_best("1.135");
    r100_best("1.145");
    r100_best("1.155");
    r100_best("1.165");
    return 0;
}
[Edit] OP clarified that only the printed value needs rounding to 2 decimal places.
OP's observation about how "half-way" numbers are rounded ("round to even" versus "round away from zero") is misleading. Of 100 "half-way" numbers like 0.005, 0.015, 0.025, ..., 0.995, only 4 are typically exactly half-way: 0.125, 0.375, 0.625, 0.875. This is because floating-point formats use base 2, so numbers like 2.565 cannot be represented exactly.
Instead, sample numbers like 2.565 have, as the closest double, a value of 2.564999999999999947..., assuming binary64. Rounding that number to the nearest 0.01 should give 2.56, rather than the 2.57 desired by OP.
Thus only numbers ending in 0.125 and 0.625 are exactly half-way, and those round down rather than up as OP desires. Suggest accepting that and using:
printf("%.2lf",variable); // This should be sufficient
To get close to OP's goal, numbers could be A) tested against ending with 0.125 or 0.625 or B) increased slightly. The smallest increase would be
#include <math.h>
printf("%.2f", nextafter(x, 2*x));
Another nudge method is found in @Clifford's answer.
[Former answer that rounds a double to the nearest double multiple of 0.01]
Typical floating point uses formats like binary64, which employs base 2. "Rounding to the nearest mathematical 0.01 with ties away from 0.0" is challenging.
As @Pascal Cuoq mentions, floating point numbers like 2.555 are typically only near 2.555 and have a more precise value like 2.555000000000000159872..., which is not half-way.
@BLUEPIXY's solution below is best and practical.
x = round(100.0*x)/100.0;
"The round functions round their argument to the nearest integer value in floating-point
format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.6.
The ((int)(100 * (x + 0.005)) / 100.0) approach has 2 problems: it may round in the wrong direction for negative numbers (OP did not specify), and integers typically have a much smaller range (INT_MIN to INT_MAX) than double.
There are still some cases, like double x = atof("1.115");, which end up near 1.12 when the result really should be 1.11, because 1.115 as a double is actually closer to 1.11 and not half-way.
string   x                               rounded x
1.115    1.1149999999999999911182e+00    1.1200000000000001065814e+00
OP has not specified rounding of negative numbers, assuming y = -f(-x).
I have doubles that represent latitudes and longitudes.
I can easily limit longitudes to (-180.0, 180.0] with the following function.
double limitLon(double lon)
{
    return fmod(lon - 180.0, 360.0) + 180.0;
}
This works because one end is exclusive and the other is inclusive. fmod includes 0 but not -360.0.
Can anyone think of an elegant method for latitude?
The required interval is [-90.0, 90.0]. A closed form solution would be best, i.e. no loop. I think fmod() is probably a non-starter because both ends are inclusive now.
Edit: As was pointed out, one can't go to 91 degrees latitude anyway. Technically 91 should map to 89.0. Oh boy, that changes things.
There is a much, much more efficient way to do this than using sin and arcsin. The most expensive operation is a single division. The observation that the required interval is closed is key.
Divide by 360 and take the remainder. This yields a number in the interval [0, 360), which is half-open, as observed.
Fold the interval in half. If the remainder is >=180, subtract it from 360. This maps the interval [180, 360) to the interval (0, 180]. The union of this interval with the bottom half is the closed interval [0, 180].
Subtract 90 from the result. This interval is [-90, 90], as desired.
This is, indeed, the exact same function as arcsin(sin(x)), but without the expense or any issue with numeric stability.
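One way to read those steps so that they reproduce arcsin(sin(x)) is to apply them to the latitude shifted by +90; the shift is my interpretation and is not stated explicitly above:

#include <math.h>
#include <stdio.h>

// Fold-based latitude wrap: shift by +90, reduce mod 360, fold, shift back.
double limit_latitude(double lat) {
    double r = fmod(lat + 90.0, 360.0);   // remainder, may be negative
    if (r < 0.0) r += 360.0;              // bring into [0, 360)
    if (r >= 180.0) r = 360.0 - r;        // fold [180, 360) onto (0, 180]
    return r - 90.0;                      // shift back to [-90, 90]
}

int main(void) {
    printf("%g %g %g\n", limit_latitude(91.0),    //  89
                         limit_latitude(-91.0),   // -89
                         limit_latitude(270.0));  // -90
    return 0;
}

With that reading, 91 maps to 89, as the edit to the question requires.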
Using the trig functions sin()/cos() is expensive in time and introduces a loss of precision. Much better to use the fmod() function. Note that the result has the same sign as x and a magnitude less than the magnitude of y, if y is nonzero.
OP was on the right track! The below solution is easy to adjust per the edge values of -180.0 and +180.0.
#include <math.h>

// Reduce to (-180.0, 180.0]
double Limit_Longitude(double longitude_degrees) {
    // A good implementation of `fmod()` will introduce _no_ loss of precision.
    // -360.0 < longitude_reduced < 360.0
    double longitude_reduced = fmod(longitude_degrees, 360.0);
    if (longitude_reduced > 180.0) {
        longitude_reduced -= 360.0;
    } else if (longitude_reduced <= -180.0) {
        longitude_reduced += 360.0;
    }
    return longitude_reduced;
}
Limiting latitude to [-90, +90] is trickier, as a latitude of +91 degrees goes over the North Pole and switches the longitude by +/- 180 degrees. To preserve longitude precision, adjust by 180 toward 0 degrees.
void Limit_Latitude_Longitude(double *latitude_degrees, double *longitude_degrees) {
    *latitude_degrees = Limit_Longitude(*latitude_degrees);
    int flip = 0;
    if (*latitude_degrees > 90.0) {
        *latitude_degrees = 180.0 - *latitude_degrees;
        flip = 1;
    } else if (*latitude_degrees < -90.0) {
        *latitude_degrees = -180.0 - *latitude_degrees;
        flip = 1;
    }
    if (flip) {
        *longitude_degrees += *longitude_degrees > 0 ? -180.0 : 180.0;
    }
    *longitude_degrees = Limit_Longitude(*longitude_degrees);
}
Minor: Although the goal is "limit longitudes to (-180.0, 180.0]", I'd expect ranges of [-180.0, 180.0), [-180.0, 180.0] to be more commonly needed.
How about using the sin and inverse functions?
asin(sin((lat/180.0)*3.14159265)) * (180.0/3.14159265);
Neither answer provided (D Stanley, eh9) works ... though for eh9's I might be misinterpreting something. Try them with multiple values.
The proper answers are unfortunately expensive. See the following from Microsoft Research: https://web.archive.org/web/20150109080324/http://research.microsoft.com/en-us/projects/wraplatitudelongitude/.
From there, the answers are:
latitude_new = atan(sin(latitude)/fabs(cos(latitude))) -- note the absolute value around cos(latitude)
longitude_new = atan2(sin(latitude),cos(latitude))
Note that in C you may want to use atan2f (float vs double). Also, all trig functions take radians.
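A rough, self-contained sketch of those two formulas (degrees in and out; I'm reading the atan2 line as taking the longitude angle, and the function names are mine):

#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

// wrap_latitude follows the first formula; wrap_longitude assumes the
// atan2 form is applied to the longitude angle.
double wrap_latitude(double lat_deg) {
    double lat = lat_deg * PI / 180.0;
    return atan(sin(lat) / fabs(cos(lat))) * 180.0 / PI;
}

double wrap_longitude(double lon_deg) {
    double lon = lon_deg * PI / 180.0;
    return atan2(sin(lon), cos(lon)) * 180.0 / PI;
}

int main(void) {
    printf("%f %f\n", wrap_latitude(91.0), wrap_longitude(190.0));  // ~89, ~-170
    return 0;
}

Like the formulas themselves, this treats latitude and longitude independently; it does not flip the longitude when the latitude passes over a pole, the way the fmod-based answer above does.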