I can calculate the distance between two points using:
SELECT ST_Distance(
ST_GeomFromText('SRID=4326;POINT(54.5972850 -5.930119)')
, ST_GeomFromText('SRID=4326;POINT(54.516827 -5.958130)'),
false);
However, my goal is to create a rough circular zone (this can be a square, hexagon, octagon, etc.) around each point and then check if the zones overlap.
I am looking at ST_Overlaps as a possible solution but I am not sure how to convert these points into polygons to be compared. My ideal result would be something like:
SELECT ST_Overlaps(
ST_CreateCircularPolygon(geom1, 1000, 6),
ST_CreateCircularPolygon(geom2, 10000, 4)
);
Where:
ST_CreateCircularPolygon(geometry, metreRadius, numberOfRadialPoints (e.g. 6 creates a hexagonal polygon))
Any guidance would be much appreciated!
You can use the quad_segs parameter of st_buffer to specify the number of segments per quarter of a circle. That is, the total number of segments in the output will be a multiple of 4.
To produce a square:
select st_asText(st_buffer(st_geomFromText('Point(10 10)'), 1, 'quad_segs=1'));
st_astext
------------------------------------------------------
POLYGON((11 10,10 9,9 10,9.99999999999999 11,11 10))
(1 row)
Octagon:
select st_asText(st_buffer(st_geomFromText('Point(10 10)'), 1, 'quad_segs=2'));
st_astext
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
POLYGON((11 10,10.7071067811865 9.29289321881345,10 9,9.29289321881345 9.29289321881345,9 10,9.29289321881345 10.7071067811865,9.99999999999999 11,10.7071067811865 10.7071067811866,11 10))
(1 row)
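To see why quad_segs=1 gives a square and quad_segs=2 an octagon, here is a minimal Python sketch that places 4 * quad_segs vertices on a circle. The function name buffer_ring is hypothetical, and this only mimics the vertex layout seen in the output above; it is not how PostGIS implements st_buffer.

```python
import math

def buffer_ring(cx, cy, radius, quad_segs):
    """Return a closed ring of (x, y) vertices approximating a circle.

    quad_segs segments are used per quarter circle, so the ring has
    4 * quad_segs segments in total. Vertices run clockwise from the
    east point, the same order as in the example output above.
    """
    n = 4 * quad_segs
    ring = []
    for i in range(n):
        angle = -2.0 * math.pi * i / n  # negative angle = clockwise
        ring.append((cx + radius * math.cos(angle),
                     cy + radius * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

square = buffer_ring(10, 10, 1, quad_segs=1)
octagon = buffer_ring(10, 10, 1, quad_segs=2)
print(len(square) - 1, len(octagon) - 1)  # number of segments in each ring
```

Note that a hexagon (6 segments) cannot be produced this way, since 6 is not a multiple of 4.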
Since you want to work in meters but have unprojected coordinates, you can cast your geometry to geography, apply a buffer in meters and cast back to geometry. Note that st_buffer on geography will internally cast to a geometry in UTM, do the buffer, then cast back to geography (a lot of casting, but it's handy!).
That being said, a square is not a circle and it sounds very wrong to assume otherwise. The orientation of the square is not obvious: should a corner point north? Or should a side face north? Or should the square be rotated, and by how much?
You will save yourself a lot of trouble by using a real circle. In this case, don't use st_buffer at all, nor st_distance, but rather st_dwithin, which can leverage spatial indexes.
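The reason a distance test can replace the polygon comparison is that two circles intersect exactly when their centres are no further apart than the sum of their radii. Below is a rough Python sketch of that check, outside the database. It assumes the first number of each point in the question is a latitude and the second a longitude, and it uses a spherical Earth, so results will differ slightly from PostGIS's spheroid calculation. The names haversine_m and zones_overlap are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def zones_overlap(p1, r1_m, p2, r2_m):
    """True when two circular zones (centre, radius in metres) intersect."""
    return haversine_m(*p1, *p2) <= r1_m + r2_m

a = (54.597285, -5.930119)  # assumed (lat, lon)
b = (54.516827, -5.958130)  # assumed (lat, lon)
print(zones_overlap(a, 1000, b, 10000))
```

In the database, the same idea amounts to asking whether the two points are within 1000 + 10000 metres of each other.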
I would like to buffer the warning polygon by two miles. Can anyone help me with this, so that if EMA personnel are within two miles of the warning they are listed? I've been trying to use ST_Buffer (to expand the polygon coverage for the search) but can't seem to get it right. Is it in meters (3218.69)? I'm using the latest OpenGeo Suite.
SELECT DISTINCT ON (ema.name)
ST_X(ema.geom),ST_Y(ema.geom),ema."name", torpoly.expire
FROM ema INNER JOIN torpoly ON ST_Within(ema.geom, ST_Buffer(torpoly.geom, 3218.69))
ORDER BY ema."name"
Your options are either:
Use an appropriate projected coordinate system for the region that uses linear units in metres or feet (UTM, State plane, etc.). All distance calculations on geometry types use a Cartesian coordinate system, which is quick and simple.
Use the geography type, which does distance calculations on objects with EPSG:4326 (lat/lon) with distance units in metres. If you don't want to change the data types, you can use a geom::geography cast, and maybe make an index on that cast.
And never do ST_Within(.., ST_Buffer()) for this type of analysis. It is slower and imperfect. Instead, use ST_DWithin, which finds all geometry/geography objects within a distance threshold of each other, which is just like a buffer. This function may use a spatial GiST index, if present.
I've noticed that the MAX/MIN aggregate functions for SQL Server 2008 do not work as expected with negative numbers.
I was working with latitude and longitude values (many are negative #s) and I'm getting results that appear to only be looking at the absolute value.
SELECT
MAX(g.Geo_Lat) AS MaxLat, MAX(g.Geo_Long) AS MaxLong,
MIN(g.Geo_Lat) AS MinLat, MIN(g.Geo_Long) AS MinLong
FROM Geolocations g
Here are results of a query:
MaxLat MaxLong MinLat MinLong
38.3346412 -85.7667496 38.1579234 -85.5289429
Note that the results for MaxLong and MinLong are incorrect.
Is there some workaround for this (other than a special UDF)?
Data types and collation determine order.
Geographic data, stored for instance as a geography type, could sort differently than float values, but in this case it would not. Geographic types themselves are not sortable; only the latitudes and longitudes are, as you display, and those are output as float values.
What data type are you using that causes this to occur? After some testing, I eventually figured it out. MAX and MIN would work as expected for geographic data, or for any numeric type that holds negative numbers.
You are storing your latitudes and longitudes as text data, aren't you? Compared as text, '-85.7667496' sorts after '-85.5289429' because the comparison goes character by character, which is why MAX returns it.
Cast them as floats. That will fix it.
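The effect is easy to reproduce outside the database. This small Python sketch uses the two longitude values from the question. Python compares strings by code point, whereas SQL Server follows the column's collation, so treat it as an illustration of the idea rather than a model of SQL Server's rules.

```python
# The two longitudes from the question, stored as text.
longitudes_as_text = ["-85.7667496", "-85.5289429"]

# Text comparison goes character by character: '7' > '5' decides the result.
max_as_text = max(longitudes_as_text)

# Converting to float first gives the numerically correct maximum.
max_as_number = max(float(v) for v in longitudes_as_text)

print(max_as_text)    # -85.7667496 (wrong: this is the smaller number)
print(max_as_number)  # -85.5289429
```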
Max and Min aggregates have worked as expected for me in SQL Server 2008. Can you provide the column types for Geolocations table?
It does depend on the locations in your table. As you get more and more entries, it should approach
MaxLat       MaxLong           MinLat       MinLong
90           180               -90          -180
north pole   far east of       south pole   far west of
             prime meridian                 prime meridian
Prime meridian 0 degrees goes through London, England.
-180 and 180 are the same line opposite 0.
Try adding these locations to your table if you only have a few rows:
insert into Geolocations (Geo_Lat, Geo_Long) values (-90, -180)
insert into Geolocations (Geo_Lat, Geo_Long) values (0, 0)
insert into Geolocations (Geo_Lat, Geo_Long) values (90, 180)
This is more of a challenge question than something I urgently need, so don't spend all day on it guys.
I built a dating site (long gone) back in 2000 or so, and one of the challenges was calculating the distance between users so we could present your "matches" within an X mile radius. To just state the problem, given the following database schema (roughly):
USER TABLE
UserId
UserName
ZipCode
ZIPCODE TABLE
ZipCode
Latitude
Longitude
With USER and ZIPCODE being joined on USER.ZipCode = ZIPCODE.ZipCode.
What approach would you take to answer the following question: What other users live in Zip Codes that are within X miles of a given user's Zip Code.
We used the 2000 census data, which has tables for zip codes and their approximate latitude and longitude.
We also used the Haversine Formula to calculate distances between any two points on a sphere... pretty simple math really.
The question, at least for us, being the 19-year-old college students we were, really became how to efficiently calculate and/or store distances from all members to all other members. One approach (the one we used) would be to import all the data and calculate the distance FROM every zip code TO every other zip code. Then you'd store and index the results. Something like:
SELECT User.UserId
FROM ZipCode AS MyZipCode
INNER JOIN ZipDistance ON MyZipCode.ZipCode = ZipDistance.MyZipCode
INNER JOIN ZipCode AS TheirZipCode ON ZipDistance.OtherZipCode = TheirZipCode.ZipCode
INNER JOIN User AS User ON TheirZipCode.ZipCode = User.ZipCode
WHERE ( MyZipCode.ZipCode = 75044 )
AND ( ZipDistance.Distance < 50 )
The problem, of course, is that the ZipDistance table is going to have a LOT of rows in it. It isn't completely unworkable, but it is really big. Also it requires complete pre-work on the whole data set, which is also not unmanageable, but not necessarily desirable.
Anyway, I was wondering what approach some of you gurus might take on something like this. Also, I think this is a common issue programmers have to tackle from time to time, especially if you consider problems that are just algorithmically similar. I'm interested in a thorough solution that includes at least HINTS on all the pieces to do this really quickly and efficiently. Thanks!
Ok, for starters, you don't really need to use the Haversine formula here. For large distances where a less accurate formula produces a larger error, your users don't care if the match is plus or minus a few miles, and for closer distances, the error is very small. There are easier (to calculate) formulas listed on the Geographical Distance Wikipedia article.
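To give a feel for how small the error is at these distances, here is a rough Python comparison of the Haversine formula with one of the simpler flat-surface approximations (the equirectangular one). The two test points are made-up coordinates roughly 50 miles apart, and the Earth radius is the usual mean value; both are assumptions for illustration only.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def equirectangular_mi(lat1, lon1, lat2, lon2):
    """Cheaper approximation: treat the area as flat, scaling longitude
    by the cosine of the mean latitude."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_MI * math.hypot(x, y)

p = (33.0, -96.6)  # illustrative point
q = (33.5, -96.0)  # illustrative point, roughly 50 miles away
exact = haversine_mi(*p, *q)
approx = equirectangular_mi(*p, *q)
print(round(exact, 3), round(approx, 3), round(abs(exact - approx), 5))
```

The approximation needs one cosine and one square root per pair instead of several trigonometric calls, which matters when the formula is evaluated for many candidate rows.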
Since zip codes are nothing like evenly spaced, any process that partitions them evenly is going to suffer mightily in areas where they are clustered tightly (east coast near DC being a good example). If you want a visual comparison, check out http://benfry.com/zipdecode and compare the zipcode prefix 89 with 07.
A far better way to deal with indexing this space is to use a data structure like a Quadtree or an R-tree. This structure allows you to do spatial and distance searches over data which is not evenly spaced.
Here's what a Quadtree looks like:
To search over it, you drill down through each larger cell using the index of smaller cells that are within it. Wikipedia explains it more thoroughly.
Of course, since this is a fairly common thing to do, someone else has already done the hard part for you. Since you haven't specified what database you're using, the PostgreSQL extension PostGIS will serve as an example. PostGIS includes the ability to do R-tree spatial indexes which allow you to do efficient spatial querying.
Once you've imported your data and built the spatial index, querying for distance is a query like:
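SELECT zip
FROM zipcode
WHERE
-- bounding-box prefilter (can use the spatial index); 16093 m is roughly 10 miles
geom && ST_Expand(ST_Transform(ST_PointFromText('POINT(-116.768347 33.911404)', 4269), 32661), 16093)
AND
-- exact distance check in the projected units (metres)
ST_Distance(
ST_Transform(ST_PointFromText('POINT(-116.768347 33.911404)', 4269), 32661),
geom) < 16093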
I'll let you work through the rest of the tutorial yourself.
http://unserializableone.blogspot.com/2007/02/using-postgis-to-find-points-of.html
Here are some other references to get you started.
http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_tut02
http://www.manning.com/obe/PostGIS_MEAPCH01.pdf
http://postgis.refractions.net/docs/ch04.html
I'd simply just create a zip_code_distances table and pre-compute the distances between all 42K zipcodes in the US which are within a 20-25 mile radius of each other.
create table zip_code_distances
(
from_zip_code mediumint not null,
to_zip_code mediumint not null,
distance decimal(6,2) default 0.0,
primary key (from_zip_code, to_zip_code),
key (to_zip_code)
)
engine=innodb;
Only including zipcodes within a 20-25 mile radius of each other reduces the number of rows you need to store in the distance table from its maximum of 1.7 billion ((42K ^ 2) - 42K) to a much more manageable 4 million or so.
I downloaded a zipcode datafile from the web which contained the longitudes and latitudes of all the official US zipcodes in csv format:
"00601","Adjuntas","Adjuntas","Puerto Rico","PR","787","Atlantic", 18.166, -66.7236
"00602","Aguada","Aguada","Puerto Rico","PR","787","Atlantic", 18.383, -67.1866
...
"91210","Glendale","Los Angeles","California","CA","818","Pacific", 34.1419, -118.261
"91214","La Crescenta","Los Angeles","California","CA","818","Pacific", 34.2325, -118.246
"91221","Glendale","Los Angeles","California","CA","818","Pacific", 34.1653, -118.289
...
I wrote a quick and dirty C# program to read the file and compute the distances between every zipcode but only output zipcodes that fall within a 25 mile radius:
// requires: using System.IO;
using (StreamWriter sw = new StreamWriter(path))
{
    foreach (ZipCode fromZip in zips)
    {
        foreach (ZipCode toZip in zips)
        {
            // skip pairing a zip code with itself
            if (toZip.ZipArea == fromZip.ZipArea) continue;

            double dist = ZipCode.GetDistance(fromZip, toZip);

            // only keep pairs within 25 miles of each other
            if (dist > 25) continue;

            string s = string.Format("{0}|{1}|{2}", fromZip.ZipArea, toZip.ZipArea, dist);
            sw.WriteLine(s);
        }
    }
}
The resultant output file looks as follows:
from_zip_code|to_zip_code|distance
...
00601|00606|16.7042215574185
00601|00611|9.70353520976393
00601|00612|21.0815707704904
00601|00613|21.1780461311929
00601|00614|20.101431539283
...
91210|90001|11.6815708119899
91210|90002|13.3915723402714
91210|90003|12.371251171873
91210|90004|5.26634939906721
91210|90005|6.56649623829871
...
I would then just load this distance data into my zip_code_distances table using load data infile and then use it to limit the search space of my application.
For example if you have a user whose zipcode is 91210 and they want to find people who are within a 10 mile radius of them then you can now simply do the following:
select
p.*
from
people p
inner join
(
select
to_zip_code
from
zip_code_distances
where
from_zip_code = 91210 and distance <= 10
) search
on p.zip_code = search.to_zip_code
where
p.gender = 'F'....
Hope this helps
EDIT: extended radius to 100 miles which increased the number of zipcode distances to 32.5 million rows.
quick performance check for zipcode 91210 runtime 0.009 seconds.
select count(*) from zip_code_distances
count(*)
========
32589820
select
to_zip_code
from
zip_code_distances
where
from_zip_code = 91210 and distance <= 10;
0:00:00.009: Query OK
You could shortcut the calculation by just assuming a box instead of a circular radius. Then when searching you simply calculate the lower/upper bound of lat/lon for a given point+"radius", and as long as you have an index on the lat/lon columns you could pull back all records that fall within the box pretty easily.
I know that this post is very old, but while doing some research for a client I found some useful functionality in the Google Maps API that is simple to implement. You just pass the origin and destination ZIP codes in the URL, and it calculates the distance, even taking traffic into account. You can use it from any language:
origins = 90210
destinations = 93030
mode = driving
http://maps.googleapis.com/maps/api/distancematrix/json?origins=90210&destinations=93030&mode=driving&language=en-EN&sensor=false
Following the link, you can see that it returns JSON. Remember that you need an API key to use this on your own hosting.
source:
http://stanhub.com/find-distance-between-two-postcodes-zipcodes-driving-time-in-current-traffic-using-google-maps-api/
You could divide your space into regions of roughly equal size -- for instance, approximate the earth as a buckyball or icosahedron. The regions could even overlap a bit, if that's easier (e.g. make them circular). Record which region(s) each ZIP code is in. Then you can precalculate the maximum distance possible between every region pair, which has the same O(n^2) problem as calculating all the ZIP code pairs, but for smaller n.
Now, for any given ZIP code, you can get a list of regions that are definitely within your given range, and a list of regions that cross the border. For the former, just grab all the ZIP codes. For the latter, drill down into each border region and calculate against individual ZIP codes.
It's certainly more complex mathematically, and in particular the number of regions would have to be chosen for a good balance between the size of the table vs. the time spent calculating on the fly, but it reduces the size of the precalculated table by a good margin.
I would use latitude and longitude. For example, if you have a latitude of 45 and a longitude of 45 and were asked to find matches within 50 miles, then you could do it by moving 50/69ths of a degree up in latitude and 50/69ths of a degree down in latitude (1 deg latitude ~ 69 miles). Select zip codes with latitudes in this range. Longitudes are a little different, because a degree of longitude gets shorter as you move closer to the poles.
But at 45 deg, 1 deg longitude ~ 49 miles, so you could move 50/49ths of a degree left in longitude and 50/49ths of a degree right in longitude, and select all zip codes from the latitude set that fall within this longitude range. This gives you all zip codes within a square with sides of a hundred miles. If you wanted to be really precise, you could then use the Haversine formula, which you mentioned, to weed out zips in the corners of the box, to give you a circle.
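The arithmetic above can be sketched in a few lines of Python. The 69 miles per degree figure comes from the text; the function name bounding_box is hypothetical. This sketch ignores the poles and the 180 degree meridian, where a simple box like this breaks down.

```python
import math

MILES_PER_DEG_LAT = 69.0  # approximate figure used in the text above

def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees for a box
    extending radius_miles in each direction from (lat, lon)."""
    dlat = radius_miles / MILES_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    miles_per_deg_lon = MILES_PER_DEG_LAT * math.cos(math.radians(lat))
    dlon = radius_miles / miles_per_deg_lon
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)

min_lat, max_lat, min_lon, max_lon = bounding_box(45.0, 45.0, 50.0)
print(min_lat, max_lat, min_lon, max_lon)
```

The four values can then be used directly as the bounds of two range conditions on indexed latitude and longitude columns.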
Not every possible pair of zip codes is going to be used. I would build zipdistance as a 'cache' table. For each request, calculate the distance for that pair and save it in the cache. When a request for a distance pair comes in, first look in the cache, then compute the distance if it's not available.
I do not know the intricacies of distance calculations, so I would also check whether computing on the fly is cheaper than looking up (also taking into consideration how often you have to compute).
I have this running great, and pretty much everyone's answer got used. I was thinking about this in terms of the old solution instead of just "starting over." Babtek gets the nod for stating it in the simplest terms.
I'll skip the code because I'll provide references to derive the needed formulas, and there is too much to cleanly post here.
1. Consider Point A on a sphere, represented by latitude and longitude. Figure out the North, South, East, and West edges of a box 2X miles across with Point A at the center.
2. Select all points within the box from the ZipCode table. This is a simple WHERE clause with two BETWEEN conditions limiting by Lat and Long.
3. Use the haversine formula to determine the spherical distance between Point A and every point B returned in step 2.
4. Discard all points B where distance A -> B > X.
5. Select users where ZipCode is in the remaining set of points B.
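Since the original code is skipped, here is a hedged Python sketch of the box-then-haversine filtering described above, not the author's actual implementation. The zip_table contents come from the sample CSV rows quoted in an earlier answer, and the function names are hypothetical. The box uses the longitude scale at the centre latitude, a simplification that is reasonable for small radii away from the poles.

```python
import math

EARTH_RADIUS_MI = 3958.8   # mean Earth radius in miles
MILES_PER_DEG_LAT = 69.0   # approximate

def haversine_mi(lat1, lon1, lat2, lon2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def zips_within(center, radius_mi, zip_table):
    """Return the zip codes within radius_mi of center (lat, lon)."""
    lat, lon = center
    dlat = radius_mi / MILES_PER_DEG_LAT
    dlon = radius_mi / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    # Cheap filter: keep only points inside the bounding box.
    in_box = [(z, zlat, zlon) for z, (zlat, zlon) in zip_table.items()
              if lat - dlat <= zlat <= lat + dlat
              and lon - dlon <= zlon <= lon + dlon]
    # Exact filter: discard box corners that lie beyond the radius.
    return sorted(z for z, zlat, zlon in in_box
                  if haversine_mi(lat, lon, zlat, zlon) <= radius_mi)

zip_table = {
    "00601": (18.166, -66.7236),
    "91210": (34.1419, -118.261),
    "91214": (34.2325, -118.246),
    "91221": (34.1653, -118.289),
}
print(zips_within(zip_table["91210"], 5, zip_table))
print(zips_within(zip_table["91210"], 10, zip_table))
```

The resulting list of zip codes is what would then be joined against the user table.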
This is pretty fast for > 100 miles. Longest result was ~ 0.014 seconds to calculate the match, and trivial to run the select statement.
Also, as a side note, it was necessary to implement the math in a couple of functions and call them in SQL. Once I got past a certain distance the matching number of ZipCodes was too large to pass back to SQL and use as an IN statement, so I had to use a temp table and join the resulting ZipCodes to User on the ZipCode column.
I suspect that using a ZipDistance table will not provide a long-term performance gain. The number of rows just gets really big. If you calculate the distance from every zip code to every other zip code (eventually) then the resultant row count from 40,000 zip codes would be ~ 1.6B. Whoah!
Alternately, I am interested in using SQL's built in geography type to see if that will make this easier, but good old int/float types served fine for this sample.
So... final list of online resources I used, for your easy reference:
Maximum Difference, Latitude and Longitude.
The Haversine Formula.
Lengthy but complete discussion of the whole process, which I found from Googling stuff in your answers.
Does Postgres' spatial plugin, or any spatial package for that matter, factor in the altitude when calculating the distance between 2 points?
I know the spatial packages factor in the approximate curvature of the earth, but if one location is at the top of a mountain and the other location is close to the sea, it seems like the calculated distance between those two points would vary greatly if the difference in altitude was not taken into account.
Also keep in mind that if I have 2 points at the same sea-level altitude but a mountain exists between them, the distance package should account for this.
Those factors are not being counted at all. Why? The software only knows about the two features (the two points whose distance you are measuring), the sphere/spheroid and a datum/projection factor.
For that to happen you probably need to use a developed linestring, in which you connect your points with n vertices, each of them being Z aware.
Imagine this (loose WKT): LINESTRING((0,1,2),(0,2,3),(0,3,4),(0,10,15),(0,11,-1)).
Asking the software to calculate the distance between each pair of consecutive vertices and summing them up will take the variations of terrain into account. But without something like that, it is impossible to account for irregularities in terrain.
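To illustrate the summing idea, here is a small Python sketch that measures the loose linestring above segment by segment, once ignoring Z and once including it. It treats the coordinates as plain Cartesian values in the same unit, which is an assumption made for the example; real data would need a projected coordinate system with Z in matching units. math.dist requires Python 3.8 or later.

```python
import math

# Vertices of the example linestring above, as (x, y, z) tuples.
path = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 10, 15), (0, 11, -1)]

def path_length(points, use_z=True):
    """Sum the straight-line distance between consecutive vertices.

    With use_z=False only X and Y are used, which is what a purely
    2D distance calculation would see.
    """
    dims = 3 if use_z else 2
    return sum(math.dist(a[:dims], b[:dims])
               for a, b in zip(points, points[1:]))

flat = path_length(path, use_z=False)
with_terrain = path_length(path, use_z=True)
print(flat, round(with_terrain, 3))
```

The large gap between the two totals comes almost entirely from the last two segments, where the Z value jumps from 4 to 15 and then drops to -1.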
No GIS software can tell by itself what those irregularities in terrain are, and therefore none can take them into account.
You can create such linestrings (automatically) with software like ArcGIS (and others), using a line (between two points) and a surface file, such as the ones provided freely by NASA (SRTM project). These files come in a raster format, and each pixel has an X, Y and Z value, in meters. By traversing the line you want, coupled with that terrain profile, you can achieve the calculation you want. If you need extremely precise calculations, you need a precise surface and precise Z values at each vertex of this profile line.
That cleared up?
If the distance formula you're using does not take the altitude of the two points as parameters (in addition to the Latitudes and Longitudes of the two points), then it does not factor in altitude to the distance calculation. In any event, altitude difference does not have a very significant effect on calculated distance.
As usual with GPS, the difference in distance calculations that altitude would make is probably smaller than the error in most commercial GPS devices anyway, so in most applications altitude can be safely dispensed with (altitude measurements themselves are pretty inaccurate with commercial GPS devices, although survey data on altitudes is quite accurate).
PostgreSQL does not factor in altitude when calculating distances. It is all done on a planar surface.
Most database spatial packages will not take this into account, although if your point is 3D, i.e. has a Z coordinate, that might happen.
I don't have PostgreSQL on this machine, but try this.
SELECT ST_Distance(ST_MakePoint(0,0,10), ST_MakePoint(0,0,0));
It's fairly easy to know if it is taking your Z value into account, since the return value should then be > 0. If that turns out to be true, just create Z aware features, and you will be successful.
What SQL SERVER 2008, for example, takes into account when calculating distances, is the position of a Geography feature in a sphere. Geometry features in SQL SERVER will always use planar calculations.
EDIT: checked this in PostGIS manual
For Z aware points you must use the ST_MakePoint function. It takes up to 4 arguments (X, Y, Z and M). ST_Point only takes two (X, Y).
http://postgis.refractions.net/documentation/manual-1.4/ST_Distance.html
ST_DISTANCE = 2D calculations
ST_DISTANCE_SPHERE documentation (takes into account a fixed sphere for calculations, i.e. not planar)
http://postgis.refractions.net/documentation/manual-1.4/ST_Distance_Sphere.html
ST_DISTANCE_SPHEROID documentation (takes into account a chosen spheroid for your calculations)
http://postgis.refractions.net/documentation/manual-1.4/ST_Distance_Spheroid.html
ST_POINT documentation
http://postgis.refractions.net/documentation/manual-1.4/ST_Point.html