Difference between geodist() and dist() for Geo-Spacial Search

Difference between geodist() and dist() for Geo-Spacial Search - solr

What is the Difference between Geodist(sfield,x,y) and dist(2,x,y,a,b) in Apache Solr for Geo-Spacial Searches ??
dist(2,x,y,0,0) :- calculates the Euclidean distance between (0,0) and (x,y) for each document. Return the Distance between two Vectors (points) in an n-dimensional space.
I was earlier using geodist() distance function for Geo-Spatial searches on my website but its response time was large. so have done a POC(proof of concept) for different distance functions and found that dist(2,x,y,0,0) distance function is relatively taking half of the time. But I want to know the reason behind this and the algorithms which both functions are using to calculate the distance.
I have to make a difference matrix for the same to convey it further.

The main difference is that geodist() is intended to work with spatial field types.
Most spatial implementation are based on Lucene's Points API, which is a BKD Index. This field type is strictly limited to coordinates in lat/lon decimal degrees. Behind the scenes, latitude and longitude are indexed as separate numbers. Four main field types are available for spatial search :
LatLonPointSpatialField
LatLonType (now deprecated) and its non-geodetic twin PointType
SpatialRecursivePrefixTreeFieldType (RPT for short), including RptWithGeometrySpatialField, a derivative
BBoxField (for areas, 4 instances of another field type referred to by numberType)
In geodist (sfield, x, y), sfield is a spatial field type that represents two points (lat,lon), so the direct equivalent using dist() would be to implement dist (2, sfieldX, sfieldY, x, y) with sfieldX and sfieldY being respectively the (lat,lon) coordinates of sfield.
Using dist (power, a, b, ...) you can't query a spatial field type. In order to perform the same spatial search, you would have to specify every point's dimension separately. It would require 2 indexed fields (or values per field at least) for 2 dimensions, 3 for 3d, and so on. That makes a huge difference because you would have to index every coordinates of each point separately.
Besides, you can also use geodist() as is with the BBoxField field type that indexes a single rectangle per document field and supports searching via a bounding box. To do the same with dist() you would have to compute the center point of the box to input each one of its coordinates as a function argument, so it would be too much hassle to yield the same result if you want to use an area as parameter.
Lastly, LatLonPointSpatialField for example does distance calculations based on Haversine formula (Great Circle), BBoxField does it a little faster because the rectangular shape is faster to compute. It's true that dist() may be even faster but remember that requires more field to be indexed, a lot of preprocess at query time to be able to yield the same calculated distance, and, as mentioned by Mats, it wouldn't take the earth' curvature into account.

An euclidean distance doesn't account for the curvature of the earth. If you're only sorting by the distance, the behavior can be OK - but only if your hits are within a small geographical area (the value of a unit compared to meters greatly change when you're getting closer to the poles).
There's an extensive and good answer that explains the difference between a Euclidean distance and a proper geographical distance (usually calculated using haversine) available at the GIS Stack Exchange.
Although at small scales any smooth surface looks like a plane, the accuracy of the Pythagorean formula depends on the coordinates used. When those coordinates are latitude and longitude on a sphere (or ellipsoid), we can expect that
Distances along lines of longitude will be reasonably accurate.
Distances along the Equator will be reasonably accurate.
All other distances will be erroneous, in rough proportion to the differences in latitude and longitude.

Related

Range search with Hilbert Curve Index

I have a hilbert curve index based on this algorithm. I take two to four values (latitude, longitude, time in unix format and an id code) and create a 1-d hilbert curve.
I'm looking for a way to use this data to create a bounding box query (i.e. "find all ids within this rectangle).
I'm looking for a way to do so without decoding the 1d Hilbert code back into its constituent parts. It seems to be easier to do this with a Morton/Z-order curve but I was wondering about the locality preservation.
My question is: if I created a 2d hilbert curve range (i.e. I converted the range of the box into a hilbert curve so x1y1-> hilbert value1 and x2y2-> hilbertvalue2) would all values of corresponding 2d hilbert values fall within their range?
E.g. If I converted (1,2) and (20,30) into Hilbert values and then searched for all values between hilbertvalue1 and hilbertvalue2, would all the values I get decode to fall within (1,2) and (20, 30), or would I have to perform additional transformations?
An additional problem is crafting a range when you have more than 2 dimensions. I have the ability to convert in and out of Hilbert curves but how can I make sure that even 4d values have latitude and longitude that falls within the same rectangle/bounding box?
Thanks.

My question is: if I created a 2d hilbert curve range (i.e. I converted the range of the box into a hilbert curve so x1y1-> hilbert value1 and x2y2-> hilbertvalue2) would all values of corresponding 2d hilbert values fall within their range?
The answer is no. This is part of the challenge of using a Hilbert index. Below is an example curve. You'll notice that you can have a point on the curve that has a higher index than the vertices of a box containing that point. The light blue box is an example where the vertices have indices 117, 122, 133, 138 yet inside (though on the border) is the value 143.
One simple approach is brute force where you visit every cell in the search region and calculate the index in those cells. Then you compile a list of index ranges that would be used in a query. You might join some ranges and filter later as a performance optimization based on benchmarks (lots of small range queries might take longer than querying fewer larger ranges followed by a filter). I'd like to see something more elegant than this but have yet to see it.
UPDATE: I've worked out something more elegant than the brute force technique and the details (and a java library) are at https://github.com/davidmoten/hilbert-curve. In short, the endpoints of the ranges that exactly cover a search box will all be on the perimeter of the region. If you sort all the hilbert curve values on the perimeter of the region and start with the smallest value you can then pair up all the ranges by doing tests on whether the next point on the curve stays on the perimeter, leaves the box or is inside the box.
An additional problem is crafting a range when you have more than 2 dimensions. I have the ability to convert in and out of Hilbert curves but how can I make sure that even 4d values have latitude and longitude that falls within the same rectangle/bounding box?
The perimeter technique described above works for any number of dimensions (but of course becomes more expensive!).

For 2 dimensions you can treat the curve as a base-4 number (quadkey) and search from left to right.

Redis GREORADIUS: include altitude?

I am looking to use Redis' GEORADIUS command.
However, this command only searches within the range of a two-dimensional circle around a given longitude and latitude. I need to also search with an altitude, thus finding results within the range of a three-dimensional sphere.
How would I achieve this in Redis?

I believe what you're actually trying to do is search within a 3D cylinder (or some kind of a cone if you take into account the Earth's spherical nature).
An altitude search isn't natively included with Redis' geospatial indices, but you could store that property in its own Sorted Set as score. Then, you can perform a range search (ZRANGEBYSCORE) on the altitude and intersect (ZINTER) the results with those from the radius query (hint: use a temporary key to STORE results).
For your reference, this is the approach I took with the xyzsets in geo.lua.

Buffer Polygon on Point in Polygon Query

I would like to buffer the warning polygon by two miles can anyone help me with
this so if ema personal are with in to miles of the warning the are listed, I've been trying to use ST Buffer (to expand the polygon coverage for the search) but cant seem to get it right? Is it in Meters (3218.69)? I'm using the latest opengeo suite.
SELECT DISTINCT ON (ema.name)
ST_X(ema.geom),ST_Y(ema.geom),ema."name", torpoly.expire
FROM ema INNER JOIN torpoly ON ST_Within(ema.geom, ST_BUFFER(torpoly.geom)
ORDER BY ema."name"

Your options are either:
Use an appropriate projected coordinate system for the region that uses linear units in metres or feet (UTM, State plane, etc.). All distance calculations on geometry types use a Cartesian coordinate system, which is quick and simple.
Use the geography type, which does distance calculations on objects with EPSG:4326 (lat/lon) with distance units in metres. If you don't want to change the data types, you can use a geom::geography cast, and maybe make an index on that cast.
And never do ST_Within(.., ST_Buffer()) for this type of analysis. It is slower and imperfect. Instead, use ST_DWithin, which finds all geometry/geography objects within a distance threshold of each other, which is just like a buffer. This function may use a spatial GiST index, if present.

Spatial search with ravenDB

I have a rather specific spatial search I need to do. Basically, have an object (lets call it obj1) with two locations, lets call them point A and point B.
I then have a collection of objects(lets call each one obj2) each with their own A and B locations.
I want to return the top 10 objects from the collection sorted by:
(distance from obj1 A to obj2A) + (the distance from obj1B to obj2B)
Any ideas?
Thanks,
Nick
Update:
Here's a little more detail on the documents and how I want to compare them.
The domain model:
Listing:
ListingId int
Title string
Price double
Origin Location
Destination Location
Location:
Post / Zipcode string
Latitude decimal
Longitude decimal
What i want to do is take a listing object (not in the database) and compare it with the collection of listings in the database. I want the query to return the top 12 (or x) number of listings sorted by the crow flies distance from the origins plus the crow flies distance from destinations.
I don't care about the distance from origin to destination - only about the distance of origin to origin plus destination to destination.
Basically Im trying to find listings where the starting and ending locations are close.
Please let me know if I can clarify more.
Thanks!

Here is how one would solve such a problem in
mysql 4.1 &
mysql 5.
The link from mysql 4.1 seems quite helpful, esp. the first example, it's pretty much what you are asking about.
But if this is not quite helpful, I guess you'd have to loop and do queries either on obj1 or obj2 against its counterpart table.

From algorithmic perspective, I'd find the center of the bounding box, then picked candidates with increasing radius while I find enough.
Also I just want to remind that crow fly distance over the globe is not Pythagoras distance and different formula must be used:
public static double GetDistance(double lat1, double lng1, double lat2, double lng2)
{
double deltaLat = DegreesToRadians(lat2 - lat1);
double deltaLong = DegreesToRadians(lng2 - lng1);
double a = Math.Pow(Math.Sin(deltaLat / 2), 2) +
Math.Cos(DegreesToRadians(lat1))
* Math.Cos(DegreesToRadians(lat2))
* Math.Pow(Math.Sin(deltaLong / 2), 2);
return earthMeanRadiusMiles * (2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a)));
}

Sounds like you're building a rideshare website. :)
The bottom line is that in order to sort your query result by surface distance, you'll need spatial indexing built into the database engine. I think your options here are MySQL with OpenGIS extensions (already mentioned) or PostgreSQL with PostGIS. It looks like it's possible in ravenDB too: http://ravendb.net/documentation/indexes/sptial
But if that's not an option, there's a few other ways. Let's simplify the problem and say you just want to sort your database records by their distance to location A, since you're just doing that twice and summing the result.
The simplest solution is to pull every record from the database and calculate the distance to location A one by one, then sort, in code. Trouble is, you end up doing a lot of redundant computations and pulling down the entire table for every query.
Let's once again simplify and pretend we only care about the Chebyshev (maximum) distance. This will work for narrowing our scope within the db before we get more accurate. We can do a "binary search" for nearby records. We must decide an approximate number of closest records to return; let's say 10. Then we query inside of a square area, let's say 1 degree latitude by 1 degree longitude (that's about 60x60 miles) around the location of interest. Let's say our location of interest is lat,lng=43.5,86.5. Then our db query is SELECT COUNT(*) FROM locations WHERE (lat > 43 AND lat < 44) AND (lng > 86 AND lng < 87). If you have indexes on the lat/lng fields, that should be a fast query.
Our goal is to get just above 10 total results inside the box. Here's where the "binary search" comes in. If we only got 5 results, we double the box area and search again. If we got 100 results, we cut the area in half and search again. If we get 3 results immediately after that, we increase the box area by 50% (instead of 100%) and try again, proceeding until we get close enough to our 10 result target.
Finally we take this manageable set of records and calculate their euclidean distance from the location of interest, and sort, in code.
Good luck!

I do not think that you find a solution directly out of the box.
It'll be much more efficient if you use a bounding sphere instead of a bounding box to specify your object.
http://en.wikipedia.org/wiki/Bounding_sphere
C = ( A + B)/2 and R = distance(A,B) /2
You do not precise how much data you want to compare. And if you want to see the closests or the farthest objects pair.
For both case, I think that you have to encode C coordinate as a path in an octtree if you are using 3D or quadtree if you are using 2D.
http://en.wikipedia.org/wiki/Quadtree
This is a first draft I can add more information if this not enough.
If you are not familiar with 3D start with 2D it easier to start with.
I show your latest add, it seems that your problem is very similar to clash detection algorithm.
I think that if you change the coordinate system of the "end-point" by polar coordinate relative to the "start-point". If you round the radial coordinate to your tolerance (x miles), and order them by this value.

Maps: Does calculating distance between 2 points factor in altitude?

Does Postgres' Spatial plugin, or any Spatial package for that manner, factor in the altitude when calculating the distance between 2 points?
I know the Spatial packages factor in the approximate curvature of the earth but if one location is at the top of a mountain and the other location is close to the sea - it seems like the calculated difference between those two points would greatly vary if the difference in altitude was not factored into account.
Also keep in mind that if I have 2 points are at the same ocean altitude but a mountain exists between the 2 points - the distance package should account for this.

Those factors are not being counted at all. Why? The software only knows about the two features (the two points you are getting the distance, the sphere/spheroid and a datum/projection factor).
For that to happen you need to probably use a developed linestring, in which you will connect your point with n vertices, each of them being Z aware.
Imagine this (loose WKT): LINESTRING((0,1,2),(0,2,3),(0,3,4),(0,10,15),(0,11,-1)).
Asking the software to calculate the distance between each vertex and summing it up, will consider the variations of terrain. But without something like that, it is impossible to map for irregularities in terrain.
All GIS softwares cannot tell, by themselves, what are those irregularities in terrain, and therefore, not take them in account.
You can create such linestrings (automatically) with softwares like ArcGIS (and others), using a line (between two points), and a surface file, such as the ones provided freely by NASA (SRTM project). These files come in a raster format, and each pixel has a X Y and Z value, in meters. Traversing the line you want, coupled with that terrain profile, you can achieve the calculation you want to achieve. If you need to have super extra precise calculations, you need a precise surface, and precise Z values in each vertex of this profile line.
That cleared up?

If the distance formula you're using does not take the altitude of the two points as parameters (in addition to the Latitudes and Longitudes of the two points), then it does not factor in altitude to the distance calculation. In any event, altitude difference does not have a very significant effect on calculated distance.
As usual with GPS, the difference in distance calculations that altitude would make is probably smaller than the error in most commercial GPS devices anyway, so in most applications altitude can be safely dispensed with (altitude measurements themselves are pretty inaccurate with commercial GPS devices, although survey data on altitudes is quite accurate).

PostgreSQL does not factor in altitude when calculating distances. It is all done in a planar surface.
Most of database spatial packages will not take this into account, altought, if your point is 3d, i.e., has a Z coordinate that might happend.
I don´t have PostgreSQL in this machine, but try this.
SELECT ST_DISTANCE(ST_POINT(0,0,10),ST_POINT(0,0,0));
It´s fairly easy to know if it is taking into account your Z value, since the return should be > 0; If that turns out to be true, just create Z aware features, and you will be successfull.
What SQL SERVER 2008, for example, takes into account when calculating distances, is the position of a Geography feature in a sphere. Geometry features in SQL SERVER will always use planar calculations.
EDIT: checked this in PostGIS manual
For Z aware points you must use the ST_MakePoint function. It takes up to 4 arguments (X Y Z and M). St_POINT only takes two (X Y)
http://postgis.refractions.net/documentation/manual-1.4/ST_Distance.html
ST_DISTANCE = 2D calculations
ST_DISTANCE_SPHERE documentation (takes in account a fixed sphere for calculations - aka not planar)
http://postgis.refractions.net/documentation/manual-1.4/ST_Distance_Sphere.html
ST_DISTANCE_SPHEROID documentation (takes into account a choosen spheroid for your calculations)
http://postgis.refractions.net/documentation/manual-1.4/ST_Distance_Spheroid.html
ST_POINT documentation
http://postgis.refractions.net/documentation/manual-1.4/ST_Point.html

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight