I am looking to use Redis' GEORADIUS command.
However, this command only searches within the range of a two-dimensional circle around a given longitude and latitude. I need to also search with an altitude, thus finding results within the range of a three-dimensional sphere.
How would I achieve this in Redis?
I believe what you're actually trying to do is search within a 3D cylinder (or some kind of a cone if you take into account the Earth's spherical nature).
An altitude search isn't natively included with Redis' geospatial indices, but you could store that property in its own Sorted Set as score. Then, you can perform a range search (ZRANGEBYSCORE) on the altitude and intersect (ZINTER) the results with those from the radius query (hint: use a temporary key to STORE results).
For your reference, this is the approach I took with the xyzsets in geo.lua.
Related
This is more a design question so please bear with me.
I have a system that stores locations consisting of the ID, Longitude and Latitude.
I need to compare the distance between my current location and the locations in the database a only choose ones that are within a certain distance.
I have the formula that calculates the distance between 2 locations based on the long/lat and that works great.
My issue is I may have 10 of thousands of locations in the database and don't want to loop through them all every time I need a list of locations close by.
Not sure what other datapoint I can store with the location to make it so I only have to compare a smaller subset.
Thanks.
As was mentioned in the comments, SQL Server has had support for geospatial since (iirc) SQL 2008. And I know that there is support within .NET for that as well so you should be able to define the data and query it from within your application.
Since the datatype is index-able, k nearest neighbor queries are pretty efficient. There's even a topic in the documentation for that use case. Doing a lift and shift from that page:
DECLARE #g geography = 'POINT(-121.626 47.8315)';
SELECT TOP(7) SpatialLocation.ToString(), City
FROM Person.Address
WHERE SpatialLocation.STDistance(#g) IS NOT NULL
ORDER BY SpatialLocation.STDistance(#g);
If you need all the points within that radius, omit the top clause and change the predicate on STDistance() to something like SpatialLocation.STDistance(#g) < 1000 (the SRID I typically use has meters as the unit of measure, so this would say 'within 1 km').
https://gis.stackexchange.com/ is a good place for in-depth advice on this topic.
A classic approach to quickly locating "nearby" values, is to "grid" the area of interest:
Associate each location with a "grid cell", where each cell is a convenient size. Pick a cell-edge-length such that most cells will hold a small number of values and/or that is similar to the distance range you typically query.
If cell edge is 1 km, and you need locations within 2 km, then get data from 5x5 cells centered at the "target" location.
This is guaranteed to include all data +- 2 km from any location within the central cell.
Apply distance formula to each returned location; some will be beyond 2 km.
I've only done this in memory, not from a DB. I think you add two columns, one for X cell number, other for Y cell number.
With indexes on both of those. So can efficiently get a range of Xs by a range of Ys.
Not sure if a combined "X,Y" index helps or not.
I have a hilbert curve index based on this algorithm. I take two to four values (latitude, longitude, time in unix format and an id code) and create a 1-d hilbert curve.
I'm looking for a way to use this data to create a bounding box query (i.e. "find all ids within this rectangle).
I'm looking for a way to do so without decoding the 1d Hilbert code back into its constituent parts. It seems to be easier to do this with a Morton/Z-order curve but I was wondering about the locality preservation.
My question is: if I created a 2d hilbert curve range (i.e. I converted the range of the box into a hilbert curve so x1y1-> hilbert value1 and x2y2-> hilbertvalue2) would all values of corresponding 2d hilbert values fall within their range?
E.g. If I converted (1,2) and (20,30) into Hilbert values and then searched for all values between hilbertvalue1 and hilbertvalue2, would all the values I get decode to fall within (1,2) and (20, 30), or would I have to perform additional transformations?
An additional problem is crafting a range when you have more than 2 dimensions. I have the ability to convert in and out of Hilbert curves but how can I make sure that even 4d values have latitude and longitude that falls within the same rectangle/bounding box?
Thanks.
My question is: if I created a 2d hilbert curve range (i.e. I converted the range of the box into a hilbert curve so x1y1-> hilbert value1 and x2y2-> hilbertvalue2) would all values of corresponding 2d hilbert values fall within their range?
The answer is no. This is part of the challenge of using a Hilbert index. Below is an example curve. You'll notice that you can have a point on the curve that has a higher index than the vertices of a box containing that point. The light blue box is an example where the vertices have indices 117, 122, 133, 138 yet inside (though on the border) is the value 143.
One simple approach is brute force where you visit every cell in the search region and calculate the index in those cells. Then you compile a list of index ranges that would be used in a query. You might join some ranges and filter later as a performance optimization based on benchmarks (lots of small range queries might take longer than querying fewer larger ranges followed by a filter). I'd like to see something more elegant than this but have yet to see it.
UPDATE: I've worked out something more elegant than the brute force technique and the details (and a java library) are at https://github.com/davidmoten/hilbert-curve. In short, the endpoints of the ranges that exactly cover a search box will all be on the perimeter of the region. If you sort all the hilbert curve values on the perimeter of the region and start with the smallest value you can then pair up all the ranges by doing tests on whether the next point on the curve stays on the perimeter, leaves the box or is inside the box.
An additional problem is crafting a range when you have more than 2 dimensions. I have the ability to convert in and out of Hilbert curves but how can I make sure that even 4d values have latitude and longitude that falls within the same rectangle/bounding box?
The perimeter technique described above works for any number of dimensions (but of course becomes more expensive!).
For 2 dimensions you can treat the curve as a base-4 number (quadkey) and search from left to right.
What is the Difference between Geodist(sfield,x,y) and dist(2,x,y,a,b) in Apache Solr for Geo-Spacial Searches ??
dist(2,x,y,0,0) :- calculates the Euclidean distance between (0,0) and (x,y) for each document. Return the Distance between two Vectors (points) in an n-dimensional space.
I was earlier using geodist() distance function for Geo-Spatial searches on my website but its response time was large. so have done a POC(proof of concept) for different distance functions and found that dist(2,x,y,0,0) distance function is relatively taking half of the time. But I want to know the reason behind this and the algorithms which both functions are using to calculate the distance.
I have to make a difference matrix for the same to convey it further.
The main difference is that geodist() is intended to work with spatial field types.
Most spatial implementation are based on Lucene's Points API, which is a BKD Index. This field type is strictly limited to coordinates in lat/lon decimal degrees. Behind the scenes, latitude and longitude are indexed as separate numbers. Four main field types are available for spatial search :
LatLonPointSpatialField
LatLonType (now deprecated) and its non-geodetic twin PointType
SpatialRecursivePrefixTreeFieldType (RPT for short), including RptWithGeometrySpatialField, a derivative
BBoxField (for areas, 4 instances of another field type referred to by numberType)
In geodist (sfield, x, y), sfield is a spatial field type that represents two points (lat,lon), so the direct equivalent using dist() would be to implement dist (2, sfieldX, sfieldY, x, y) with sfieldX and sfieldY being respectively the (lat,lon) coordinates of sfield.
Using dist (power, a, b, ...) you can't query a spatial field type. In order to perform the same spatial search, you would have to specify every point's dimension separately. It would require 2 indexed fields (or values per field at least) for 2 dimensions, 3 for 3d, and so on. That makes a huge difference because you would have to index every coordinates of each point separately.
Besides, you can also use geodist() as is with the BBoxField field type that indexes a single rectangle per document field and supports searching via a bounding box. To do the same with dist() you would have to compute the center point of the box to input each one of its coordinates as a function argument, so it would be too much hassle to yield the same result if you want to use an area as parameter.
Lastly, LatLonPointSpatialField for example does distance calculations based on Haversine formula (Great Circle), BBoxField does it a little faster because the rectangular shape is faster to compute. It's true that dist() may be even faster but remember that requires more field to be indexed, a lot of preprocess at query time to be able to yield the same calculated distance, and, as mentioned by Mats, it wouldn't take the earth' curvature into account.
An euclidean distance doesn't account for the curvature of the earth. If you're only sorting by the distance, the behavior can be OK - but only if your hits are within a small geographical area (the value of a unit compared to meters greatly change when you're getting closer to the poles).
There's an extensive and good answer that explains the difference between a Euclidean distance and a proper geographical distance (usually calculated using haversine) available at the GIS Stack Exchange.
Although at small scales any smooth surface looks like a plane, the accuracy of the Pythagorean formula depends on the coordinates used. When those coordinates are latitude and longitude on a sphere (or ellipsoid), we can expect that
Distances along lines of longitude will be reasonably accurate.
Distances along the Equator will be reasonably accurate.
All other distances will be erroneous, in rough proportion to the differences in latitude and longitude.
I would like to buffer the warning polygon by two miles can anyone help me with
this so if ema personal are with in to miles of the warning the are listed, I've been trying to use ST Buffer (to expand the polygon coverage for the search) but cant seem to get it right? Is it in Meters (3218.69)? I'm using the latest opengeo suite.
SELECT DISTINCT ON (ema.name)
ST_X(ema.geom),ST_Y(ema.geom),ema."name", torpoly.expire
FROM ema INNER JOIN torpoly ON ST_Within(ema.geom, ST_BUFFER(torpoly.geom)
ORDER BY ema."name"
Your options are either:
Use an appropriate projected coordinate system for the region that uses linear units in metres or feet (UTM, State plane, etc.). All distance calculations on geometry types use a Cartesian coordinate system, which is quick and simple.
Use the geography type, which does distance calculations on objects with EPSG:4326 (lat/lon) with distance units in metres. If you don't want to change the data types, you can use a geom::geography cast, and maybe make an index on that cast.
And never do ST_Within(.., ST_Buffer()) for this type of analysis. It is slower and imperfect. Instead, use ST_DWithin, which finds all geometry/geography objects within a distance threshold of each other, which is just like a buffer. This function may use a spatial GiST index, if present.
I have a rather specific spatial search I need to do. Basically, have an object (lets call it obj1) with two locations, lets call them point A and point B.
I then have a collection of objects(lets call each one obj2) each with their own A and B locations.
I want to return the top 10 objects from the collection sorted by:
(distance from obj1 A to obj2A) + (the distance from obj1B to obj2B)
Any ideas?
Thanks,
Nick
Update:
Here's a little more detail on the documents and how I want to compare them.
The domain model:
Listing:
ListingId int
Title string
Price double
Origin Location
Destination Location
Location:
Post / Zipcode string
Latitude decimal
Longitude decimal
What i want to do is take a listing object (not in the database) and compare it with the collection of listings in the database. I want the query to return the top 12 (or x) number of listings sorted by the crow flies distance from the origins plus the crow flies distance from destinations.
I don't care about the distance from origin to destination - only about the distance of origin to origin plus destination to destination.
Basically Im trying to find listings where the starting and ending locations are close.
Please let me know if I can clarify more.
Thanks!
Here is how one would solve such a problem in
mysql 4.1 &
mysql 5.
The link from mysql 4.1 seems quite helpful, esp. the first example, it's pretty much what you are asking about.
But if this is not quite helpful, I guess you'd have to loop and do queries either on obj1 or obj2 against its counterpart table.
From algorithmic perspective, I'd find the center of the bounding box, then picked candidates with increasing radius while I find enough.
Also I just want to remind that crow fly distance over the globe is not Pythagoras distance and different formula must be used:
public static double GetDistance(double lat1, double lng1, double lat2, double lng2)
{
double deltaLat = DegreesToRadians(lat2 - lat1);
double deltaLong = DegreesToRadians(lng2 - lng1);
double a = Math.Pow(Math.Sin(deltaLat / 2), 2) +
Math.Cos(DegreesToRadians(lat1))
* Math.Cos(DegreesToRadians(lat2))
* Math.Pow(Math.Sin(deltaLong / 2), 2);
return earthMeanRadiusMiles * (2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a)));
}
Sounds like you're building a rideshare website. :)
The bottom line is that in order to sort your query result by surface distance, you'll need spatial indexing built into the database engine. I think your options here are MySQL with OpenGIS extensions (already mentioned) or PostgreSQL with PostGIS. It looks like it's possible in ravenDB too: http://ravendb.net/documentation/indexes/sptial
But if that's not an option, there's a few other ways. Let's simplify the problem and say you just want to sort your database records by their distance to location A, since you're just doing that twice and summing the result.
The simplest solution is to pull every record from the database and calculate the distance to location A one by one, then sort, in code. Trouble is, you end up doing a lot of redundant computations and pulling down the entire table for every query.
Let's once again simplify and pretend we only care about the Chebyshev (maximum) distance. This will work for narrowing our scope within the db before we get more accurate. We can do a "binary search" for nearby records. We must decide an approximate number of closest records to return; let's say 10. Then we query inside of a square area, let's say 1 degree latitude by 1 degree longitude (that's about 60x60 miles) around the location of interest. Let's say our location of interest is lat,lng=43.5,86.5. Then our db query is SELECT COUNT(*) FROM locations WHERE (lat > 43 AND lat < 44) AND (lng > 86 AND lng < 87). If you have indexes on the lat/lng fields, that should be a fast query.
Our goal is to get just above 10 total results inside the box. Here's where the "binary search" comes in. If we only got 5 results, we double the box area and search again. If we got 100 results, we cut the area in half and search again. If we get 3 results immediately after that, we increase the box area by 50% (instead of 100%) and try again, proceeding until we get close enough to our 10 result target.
Finally we take this manageable set of records and calculate their euclidean distance from the location of interest, and sort, in code.
Good luck!
I do not think that you find a solution directly out of the box.
It'll be much more efficient if you use a bounding sphere instead of a bounding box to specify your object.
http://en.wikipedia.org/wiki/Bounding_sphere
C = ( A + B)/2 and R = distance(A,B) /2
You do not precise how much data you want to compare. And if you want to see the closests or the farthest objects pair.
For both case, I think that you have to encode C coordinate as a path in an octtree if you are using 3D or quadtree if you are using 2D.
http://en.wikipedia.org/wiki/Quadtree
This is a first draft I can add more information if this not enough.
If you are not familiar with 3D start with 2D it easier to start with.
I show your latest add, it seems that your problem is very similar to clash detection algorithm.
I think that if you change the coordinate system of the "end-point" by polar coordinate relative to the "start-point". If you round the radial coordinate to your tolerance (x miles), and order them by this value.