When to use ST_Transform and when to use ST_SetSRID in PostGIS?

When to use ST_Transform and when to use ST_SetSRID in PostGIS? - postgis

I'm new to writing postgis queries and I still get confused when writing queries where I have geom vs. a projected version of geom. My queries end up being a mess and I run into the error that goes something like conflicting SRID.
I feel like, mentally, I don't know when to solve it with ST_Transform and when to use ST_SetSRID.

Seems I'm not the only one confused on this. Postgis' website had this article from many years ago now:
ST_Transform and ST_SetSRID: To project or not to project?
People often get confused between the ST_Transform and ST_SetSRID functions.
ST_SetSRID doesn’t change the coordinates but adds meta data to state what spatial reference system the coordinate actually are. If you stamped your WGS 84 long lat data as a meter based projection. Guess what? Its still long lat. A spade by any other name is still a spade so don’t use ST_SetSRID and expect to magically get meter coordinates.
ST_Transform is used to change the underlying coordinates from a known spatial reference system to another known spatial reference system.
source: https://postgis.net/2013/08/30/st_transform-and-st_setsrid-to-project-or-not-to-project/

If the coordinates of the geometry already represent the correct spatial reference system, but it is not yet classified as such, use ST_SetSRID.
If you need to change the actual geometry to reflect a certain spatial reference system, use ST_Transform.
From the docs: "ST_Transform actually changes the coordinates of a geometry from one spatial reference system to another, while ST_SetSRID() simply changes the SRID identifier of the geometry." (https://postgis.net/docs/ST_Transform.html)

Related

Get projection limits from Postgis

I receive spatial queries to my API in lat/lon coordinate pairs. My spatial data is in projections that don't cover the entire globe, which follows that some queries are out of bounds.
I'd like to respond to wrong queries with helpful error message. Rather than try to find out some in GIS specifications or standards what boundaries each projection has (and getting correct lat/lon pairs from those), I'd like to know wether I could either ask the limits from Postgis or ask if specific point is within limits and in what way it's wrong. This way I could support many projections easily.
It looks like Postgis has this information because for wrong query it answers thus:
transform: couldn't project point (-77.0331 -12.1251 0):
latitude or longitude exceeded limits (-14)
I'm using Postgis through Geodjango distance query (django.contrib.gis.geos.geometry.GEOSGeometry.distance function).

PostGIS or PROJ.4 (follow this thread) don't have these bounds. Each projection's bounds are unique, and are traditionally published by the authority that designed the projection.
One of the primary sources for this data is from https://www.epsg-registry.org click "retrieve by code" and (e.g.) 27200, and view the "Area of Use" fields.
Much of the same information is repeated at (e.g.) http://epsg.io/27200 and look for "bounds".
If you need this data, I suggest you make a new table to collect it.

Should I use SqlGeometry or SqlGeography?

We're in a bit of internal conflict on this issue, and can't seem to come to a happy conclusion.
We'll only be storing latitudes and longitudes, and possibly simple polygons. All we need it for is computing distance between two points (and possibly to see if a point is within a polygon), and the entirety of the data is in such close proximity to make planar estimations acceptable.
Since our requirements are so relaxed, half of the dev team suggests using SqlGeometry types, which are apparently simpler. I'm having trouble accepting this, though, since we're storing geographic data, which seems like storing them in SqlGeography is the right thing to do. Also, I'm not finding any substantive evidence that the SqlGeometry data type is that much easier to work with than the SqlGeography type.
Does anyone have advice as to which type would be more appropriate for this relatively simple scenario?

It's not a question of comparing features, or accuracy, or simplicity - the two spatial datatypes are for working with different sorts of data.
As an analogy, suppose you were choosing the best datatype for a column that contained a unique identifier for each row. If that UID only contained integer values, you'd use int, whereas if it was a 6-character alphanumeric value you'd use char(6). And if it had variable-length unicode values, you'd use nvarchar instead, right?
The same logic goes for spatial data - you choose the appropriate datatype based on the values that that column contains; if you're working with geographic (i.e. latitude/longitude) coordinates, use the SqlGeography datatype. It's that simple.
You can use SqlGeometry to store latitude/longitude values, but it would be like using nvarchar(max) to store an integer... and I promise you it will lead to further problems down the line (when all your area calculations come out measured in degrees squared, for example)

The SqlGeography type has less methods available than SqlGeometry (especially in Sql 2008).
SqlGeography reference
SqlGeometry reference
For example, suppose you want to get the centroid of a polygon in Sql2008. You have a native method for that in geometry, but not in geography.
Also, it has the following limitations:
You can't have a geography exceeding one hemisphere
The ring-order matters when creating the polygon
Also, most API and libraries available (that I know of) handle geometries better than geographies.
That said, if the distance calculation has to be precise, you have large distances and have coordinates all over the world, geography would probably be a better fit. Otherwise, and according to your description of the problem, you would be well served with the geometry type.
Regarding your question: "is that much easier to work?". It depends. Anyway, and as a rule of thumb, for simple scenarios I typically opt for SqlGeometry.
Anyway, IMHO you shouldn't worry too much on that decision. It's relatively easy to create a new column with the other type and migrate the data if necessary.

Four years later, it has become apparent that we should have stored the data in SqlGeometry instead of SqlGeography.
Why?
We were importing information from legislative district maps, and their data was stored in SqlGeometry. When determining if a particular lat/long was within a certain legislative district boundary, we'd get inconsistent results when the point was close to two boundaries.
This required us to do additional work to identify locations that were "close" to a boundary, and manually verify that they were assigned to the proper district. Not ideal.
Moral of the story: if you're relying on any data, consider what type it's stored in to help guide your decision.

Get info from map data file by coordinate

Imagine I have a map shape file (.shp) or osm xml, I'm able to see different kind of data from different layers in GIS oriented programs, e.g. ArcGIS, QGIS etc. But how can I get this info programmatically? Is there a specific library for that?
What I'm really looking for is a some kind of method getMapData(longitude, latitude) to get landscape/terrain info (e.g. forest, river, city, highway) in specified location
Thanks in advance for your answers!

It still depends what you want to achieve whether you are better off using raster or vector data.
If your are using your grid to subdivide an area as an array of containers for geographic features, then stick with vector data. To do this, I would create a polygon grid file and intersect it with each of your data layers. You can then add an ID field that represents the cell's location in the array (and hence it's relative position to a known lat/long coordinate - let's say lower left). Alternatively you can use spatial queries to access your data by selecting a polygon in your vector grid file and then finding all the features in your other file that are contained by it.
OTOH, if you want to do some multi-feature analysis based on presence/abscence then you may be better going down the route of raster analysis. My gut feeling from what you have said is that this is what you are trying to achieve but I am still not 100% sure. You would handle this by creating a set of boolean rasters of a suitable resolution and then performing maths operations on the set (add, subtract, average etc - depending on what questions your are asking).
Let's say you are looking at animal migration. Let's say your model assumes that streams, hedges and towns are all obstacles to migration but roads only reduce the chance of an area being crossed. So you convert your obstacles to a value of '1' and NoData to '0' in each case, except roads where you decide to set the value to 0.5. You can then add all your rasters together in one big stack and predict migration routes.
Ok that's a simplistic example but perhaps you can see why we need EVEN more information on what you are wanting to do.

Shapefiles or an osm xml file are just containers that hold geometric shapes. There are plenty of software libraries out there that let you read these files and extract the data. I would recommend looking at GDAL/OGR as a starting point.
A method like getMapData(longitude, latitude) is essentially a search/query function. You need to be a little more specific too, do you want geometries that contain the point, are within a distance of a point, etc?
You could find the map data using a brute force algorithm
for shape in shapefile:
if shape.contains(query_point):
return shape
Or you can use more advanced algorithms/data structures such as RTrees, KDTrees, QuadTrees, etc. The easiest way to get start with querying map data is to load it into a spatial database. I would recommending investigating PostgreSQL+PostGIS and SpatiaLite

You may also like to look at Spatialite and/or PostGIS which are two spatial enabled databses that you could use separately or in conjunction with GDAL/OGR.
I must echo Charles' request that you explain your use-case in more detail because the actual implementation will depend greatly on exactly what you are wanting to achieve. My reading of this is that you may want to convert your data into a series of aligned rasters which you can overlay and treat as a 3 dimensional array.

distance between two points across land using sql server

I am looking to calculate the shortest distance between two points inside SQL Server 2008 taking into account land mass only.
I have used the geography data type along with STDistance() to work out point x distance to point y as the crow flies, however this sometimes crosses the sea which i am trying to avoid.
I have also created a polygon around the land mass boundary I am interested in.
I believe that I need to combine these two methods to ensure that STDistance always remains within polygon - unless there is a simpler solution.
Thanks for any advice

Use STIntersects - http://msdn.microsoft.com/en-us/library/bb933899%28v=SQL.105%29.aspx to find out what part of the line is over land.
After reading your comment your requirement makes sense. However I'm pretty sure there are no inbuilt techniques to do this in SQL Server. I'm assuming you are ignoring roads, and taking an as-the-crow-flies approach but over land only.
The only way I can think to do this would be to convert your area into a raster (grid cells) and perform a cost path analysis. You would set the area of sea to have a prohibitively high cost so the algorithm would route around the sea. See this link for description of technique:
http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=cost_path
Otherwise try implementing the algorithm below!
http://bit.ly/ckvciz
There may be other libraries that do this. Alteratively how about using the new Google Directions API between the two cities - you'd get actual road distances then.
http://code.google.com/apis/maps/documentation/directions/

Clustering Lat/Longs in a Database

I'm trying to see if anyone knows how to cluster some Lat/Long results, using a database, to reduce the number of results sent over the wire to the application.
There are a number of resources about how to cluster, either on the client side OR in the server (application) side .. but not in the database side :(
This is a similar question, asked by a fellow S.O. member. The solutions are server side based (ie. C# code behind).
Has anyone had any luck or experience with solving this, but in a database? Are there any database guru's out there who are after a hawt and sexy DB challenge?
please help :)
EDIT 1: Clarification - by clustering, i'm hoping to group x number of points into a single point, for an area. So, if i say cluster everything in a 1 mile / 1 km square, then all the results in that 'square' are GROUP'D into a single result (say ... the middle of the square).
EDIT 2: I'm using MS Sql 2008, but i'm open to hearing if there are other solutions in other DB's.

I'd probably use a modified* version of k-means clustering using the cartesian (e.g. WGS-84 ECF) coordinates for your points. It's easy to implement & converges quickly, and adapts to your data no matter what it looks like. Plus, you can pick k to suit your bandwidth requirements, and each cluster will have the same number of associated points (mod k).
I'd make a table of cluster centroids, and add a field to the original data table to indicate what cluster it belonged too. You'd obviously want to update the clustering periodically if your data is at all dynamic. I don't know if you could do that with a stored procedure & trigger, but perhaps.
*The "modification" would be to adjust the length of the computed centroid vectors so they'd be on the surface of the earth. Otherwise you'd end up with a bunch of points with negative altitude (when converted back to LLH).

If you're clustering on geographic location, and I can't imagine it being anything else :-), you could store the "cluster ID" in the database along with the lat/long co-ordinates.
What I mean by that is to divide the world map into (for example) a 100x100 matrix (10,000 clusters) and each co-ordinate gets assigned to one of those clusters.
Then, you can detect very close coordinates by selecting those in the same square and moderately close ones by selecting those in adjacent squares.
The size of your squares (and therefore the number of them) will be decided by how accurate you need the clustering to be. Obviously, if you only have a 2x2 matrix, you could get some clustering of co-ordinates that are a long way apart.
You will always have the edge cases such as two points close together but in different clusters (one northernmost in one cluster, the other southernmost in another) but you could adjust the cluster size OR post-process the results on the client side.

I did a similar thing for a geographic application where I wanted to ensure I could cache point sets easily. My geohashing code looks like this:
def compute_chunk(latitude, longitude)
(floor_lon(longitude) * 0x1000) | floor_lat(latitude)
end
def floor_lon(longitude)
((longitude + 180) * 10).to_i
end
def floor_lat(latitude)
((latitude + 90) * 10).to_i
end
Everything got really easy from there. I had some code for grabbing all of the chunks from a given point to a given radius that would translate into a single memcache multiget (and some code to backfill that when it was missing).

For movielandmarks.com I used the clustering code from Mike Purvis, one of the authors of Beginning Google Maps Applications with PHP and AJAX. It builds trees of clusters/points for different zoom levels using PHP and MySQL, storing it in the database so that recall is very fast. Some of it may be useful to you even if you are using a different database.

Why not testing multiple approaches?
translate the weka library in .NET CLI with IKVM.NET
add an assembly resulted from your code and weka.dll (use ilmerge) into your database
Make some tests, that is. No specific clustering works better than anyone else.

I believe you can use MSSQL's spatial data types. If they are similar to other spatial data types I know, they will store your points in a tree of rectangles, and then you can go to the lower-resolution rectangles to get implicit clusters.

If you end up wanting to explore Geohash's (which were invented at exactly the same time you posted this question), here's a more fleshed-out implementation of Geohash related functions for SQL Server's TSQL in which you might be interested.
QalGeohash-TSQL
I have used the Integer version of the Geohash extensively to cluster results to reduce data sent to a client for a limited viewport.