Using STCrosses() with a Spatial Index in SQL Server

Does the Microsoft STCrosses() function for geography data support a spatial index?
When I try to execute this function with a spatial index hint I get this error message:
"The query processor could not produce a query plan for a query with a spatial index hint. Reason: Spatial indexes do not support the method name supplied in the predicate. Try removing the index hints or removing SET FORCEPLAN"

No.
Indexing spatial data is nontrivial, and the geography type you are discussing can contain arbitrarily complex figures, not just simple geometric shapes. The way shapes and their indexes are implemented can make finding overlaps difficult or impossible in the general case, because the index does not capture enough of a complex geometry to answer the question. That is likely why you cannot force SQL Server to use only the index: the index simply does not hold enough data. In degenerate cases it might, but the optimizer cannot know that in advance, so the combination is disallowed.
Imagine a star shape with complex figures embedded in it. The index may only store the boundary of the outer shape, or its center, or its bounding rectangle. None of these is enough to compute the intersection of two shapes, or to decide whether the shapes actually overlap.
See http://msdn.microsoft.com/en-us/library/bb895265.aspx#geometry to confirm that it is not supported.
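By contrast, a predicate that is on the supported list, such as STIntersects(), accepts the same index hint without complaint. A minimal sketch, with made-up table and index names:

    -- Hypothetical geography table with a spatial index.
    CREATE TABLE dbo.Regions (
        Id    int IDENTITY PRIMARY KEY,
        Shape geography NOT NULL
    );
    CREATE SPATIAL INDEX SIX_Regions_Shape ON dbo.Regions (Shape);

    DECLARE @area geography = geography::STGeomFromText(
        'POLYGON((-84.5 33.6, -84.2 33.6, -84.2 33.9, -84.5 33.9, -84.5 33.6))', 4326);

    -- STIntersects() is index-supported, so the hint produces a plan:
    SELECT Id
    FROM dbo.Regions WITH (INDEX(SIX_Regions_Shape))
    WHERE Shape.STIntersects(@area) = 1;

    -- Swapping in a method outside the supported list (such as the STCrosses()
    -- call from the question) triggers the "could not produce a query plan" error.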

Related

Get projection limits from Postgis

My API receives spatial queries as lat/lon coordinate pairs. My spatial data is in projections that don't cover the entire globe, so some queries are out of bounds.
I'd like to respond to invalid queries with a helpful error message. Rather than digging through GIS specifications or standards for the boundaries of each projection (and deriving the valid lat/lon ranges from those), I'd like to know whether I can either ask PostGIS for the limits, or ask whether a specific point is within them and, if not, in what way it is wrong. That way I could support many projections easily.
It looks like PostGIS has this information, because for an out-of-bounds query it answers:
transform: couldn't project point (-77.0331 -12.1251 0):
latitude or longitude exceeded limits (-14)
I'm using PostGIS through a GeoDjango distance query (the django.contrib.gis.geos.geometry.GEOSGeometry.distance function).
Neither PostGIS nor PROJ.4 (follow this thread) stores these bounds. Each projection's bounds are unique and are traditionally published by the authority that designed the projection.
One of the primary sources for this data is https://www.epsg-registry.org: click "Retrieve by code", enter (e.g.) 27200, and view the "Area of Use" fields.
Much of the same information is repeated at (e.g.) http://epsg.io/27200 under "bounds".
If you need this data, I suggest you make a new table to collect it.
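A minimal sketch of such a lookup table, with the bounds copied by hand from the registry (the table name, columns, and sample values are only illustrative):

    -- Hand-maintained table of valid lat/lon ranges per SRID ("Area of Use").
    CREATE TABLE projection_bounds (
        srid    integer PRIMARY KEY,
        min_lon double precision,
        min_lat double precision,
        max_lon double precision,
        max_lat double precision
    );

    -- Example row; copy the real values from the EPSG registry or epsg.io.
    INSERT INTO projection_bounds VALUES (27200, 166.0, -48.0, 179.0, -34.0);

    -- Check a query point before handing it to ST_Transform:
    SELECT EXISTS (
        SELECT 1 FROM projection_bounds
        WHERE srid = 27200
          AND -77.0331 BETWEEN min_lon AND max_lon
          AND -12.1251 BETWEEN min_lat AND max_lat
    ) AS point_in_bounds;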

Find all lat/long pairs within a given distance from a lat/long pair

I have a database with millions of lat/long pairs. I would like to implement a function to gather all lat/long pairs within a specified distance from a given lat/long pair. Is there a better way to do this than by iterating over each pair in the database and computing the distance between that pair and the given pair? I'd like to avoid brute force if I can avoid doing so!
I would like to add that I will never be searching for lat/long pairs greater than 1 mile from the given lat/long pair.
Many databases support storage of spatial types directly, and include spatial queries. This will handle the distance computation correctly for you, as well as provide a far more efficient means of pulling the information.
For examples, see:
Spatial Data in SQL Server
Geometric types in PostgreSQL
MySQL Spatial Extensions
SpatiaLite
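For instance, with SQL Server's geography type the whole search collapses to a single indexable predicate. A minimal sketch, with made-up table and index names and a 1-mile (1609.344 m) radius:

    -- Hypothetical table of points stored as geography, with a spatial index.
    CREATE TABLE dbo.Places (
        Id       int IDENTITY PRIMARY KEY,
        Position geography NOT NULL
    );
    CREATE SPATIAL INDEX SIX_Places_Position ON dbo.Places (Position);

    -- All points within 1 mile of a given lat/long pair; STDistance in this
    -- form is an index-supported predicate, so no full scan is needed.
    DECLARE @center geography = geography::Point(33.756944, -84.390278, 4326);

    SELECT Id
    FROM dbo.Places
    WHERE Position.STDistance(@center) <= 1609.344;

The other engines in the list offer equivalent constructs (e.g. ST_DWithin in PostGIS).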
What you can do is cluster the database beforehand. In this case you would divide the database into, say, 3-mile clusters. Then when you do the search you only need to compare points within the same cluster (or, for points near a boundary, the neighbouring clusters as well).

Caching user-specific proximity searches

The situation and the goal
Imagine a user search system that provides a proximity search from a user’s own position, which is specified by a decimal latitude/longitude combination. An Atlanta resident’s position, for instance, would be represented by 33.756944,-84.390278, and a perimeter search by this user should yield other users within a radius of 10 mi, 50 mi, and so on.
A table-valued function calculates distances and returns the matching users, ordered by ascending distance to the user that started the search. It’s always a live query, and it’s a tough and frequent one. Now we want to build some sort of caching to reduce load.
On the way to solutions
So far, all users have been grouped by the integer portion of their lat/long. The idea is to create cache files with all users from a grid square, so accessing the relevant cache file would be easy. If a grid square contains more users than a cache file should hold, the square is quartered, or divided further into eight pieces, and so on. To make full use of a square and its cache file, multiple overlapping squares are contemplated. One deficiency of this approach is that gridding and quartering high-density metropolitan areas and spacious countryside into overlapping cache files may not be optimal.
Reading on, I stumbled upon topics like nearest neighbor searches, the Manhattan distance and tree-like space partitioning techniques such as the k-d tree, the quadtree or binary space partitioning. Also, SQL Server provides its own geographical datatypes and functions (though I’d guess the purely mathematical FLOAT approach performs adequately). And of course, the crux is making user-centric proximity searches cacheable.
Question!
I haven’t found many resources on this, but I’m sure I’m not the first one with this plan. Remember, it’s not about the search itself, but about caching.
Can I scrap my approach?
Are there ways of an advantageous partitioning of users into geographical divisions of equal size?
Is there a best practice for storing spatial user information for efficient proximity searches?
What do you think of the techniques mentioned above (quadtrees, etc.) and how would you pair them with caching?
Do you know an example of successfully caching user-specific proximity search?
Can I scrap my approach?
You can adapt your approach because, as you already noted, a quadtree uses this technique. Or you could use a geospatial extension; one is available for MySQL, too.
Are there ways of an advantageous partitioning of users into geographical divisions of equal size?
A simple fixed grid of equal size is fine when locations are equally distributed or the area is very small. Geo locations are rarely equally distributed, so usually a geospatial structure is used; see the next answer.
Is there a best practice for storing spatial user information for efficient proximity searches?
A quadtree, k-d tree, or R-tree.
What do you think of the techniques mentioned above (quadtrees, etc.) and how would you pair them with caching?
There is some work by Hanan Samet that describes quadtrees and caching.
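To make the fixed-grid variant from the question concrete, a cache key can be derived directly from the integer parts already used for grouping. A rough sketch, assuming a hypothetical Users table with plain Latitude/Longitude columns (all names are illustrative):

    -- Group users into 1-degree grid cells; each cell becomes one cache entry.
    SELECT
        FLOOR(Latitude)  AS CellLat,
        FLOOR(Longitude) AS CellLon,
        COUNT(*)         AS UsersInCell
    FROM dbo.Users
    GROUP BY FLOOR(Latitude), FLOOR(Longitude);

    -- A cached result set would then be keyed by (CellLat, CellLon). A search from
    -- 33.756944,-84.390278 reads cell (33, -85) plus neighbouring cells as needed
    -- to cover the requested radius; overly dense cells can be split quadtree-style.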

calculate or save spatial data

When working with spatial data in a database like PostGIS, is it good practice to calculate the intersection of two polygons, or the area of a polygon, on every SELECT? Or is it better for performance to do the calculations on INSERT, UPDATE or DELETE and save the results in a column of the table? What is the usual approach in large spatial databases?
Thanks for an answer.
The question is too abstract.
Of course, if you use the intersection area (ST_Intersection) you should store the resulting geometry. But in practice we often have to calculate the intersection on the fly, because the input arguments depend on dynamic parameters (e.g. the intersection of an area with temperature < 30 °C with an area of wind > 20 m/s). By the way, you can use a VIEW to simplify such a query, as sketched below.
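A sketch of that VIEW idea in PostGIS, with hypothetical tables for the two dynamic criteria:

    -- On-the-fly intersection wrapped in a view so the query stays readable
    -- (temperature_zones and wind_zones are made-up tables).
    CREATE VIEW hot_and_windy AS
    SELECT t.id AS temp_id,
           w.id AS wind_id,
           ST_Intersection(t.geom, w.geom) AS geom
    FROM temperature_zones t
    JOIN wind_zones w
      ON ST_Intersects(t.geom, w.geom)
    WHERE t.max_temp < 30
      AND w.wind_speed > 20;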
Of course, if your table contains both geometry arguments, or one of them is constant, it's better to store the intersection. In particular, you can then build a spatial index on that column.
There are no fixed rules. You should be guided by practical conditions: the size of the database, the usage pattern, etc. For example, I store the generated ellipse (confidence zone) for each lightning-strike point, but I don't store the (boolean) fact of intersection with power lines, because those intersections may be parametrized.
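When both geometries are static, the precomputed result can live in its own column and carry a spatial index, as suggested above. A rough sketch with made-up table names (it assumes at most one matching building per parcel):

    -- Precompute the intersection once and index the stored result.
    ALTER TABLE parcels ADD COLUMN building_overlap geometry(Geometry, 4326);

    UPDATE parcels p
    SET building_overlap = ST_Intersection(p.geom, b.geom)
    FROM buildings b
    WHERE ST_Intersects(p.geom, b.geom);

    CREATE INDEX parcels_building_overlap_gix
        ON parcels USING GIST (building_overlap);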

Geospatial data in SQL

I have been experimenting with the geography datatype lately and just love it. But I can't decide whether I should convert from my current schema, which stores latitude and longitude in two separate numeric(9,5) fields, to the geography type. I have calculated the size of both types: the lat/long way of representing a point is 28 bytes for a single point, whereas the geography type is 26. Not a big gain in space, but a huge improvement in performing geospatial operations (intersection, distance measurement, etc.), which are currently handled using awkward stored procedures and scalar functions. What I wonder about is the indexes. Will the geography data type require more space for indexing the data? I have a feeling that it will: even though the actual data stored in the column is smaller, I think the way geospatial indexes work will eventually result in a larger space allocation for them.
P.S. As a side note, it seems that SQL Server 2008 (not R2) does not automatically seek through spatial indexes unless explicitly told to with a WITH(INDEX()) hint.
In my opinion you should definitely use the spatial types only. The spatial types are optimized for spatial queries, and if spatial queries are what you need then I think it is an easy choice.
As a side effect, you can get rid of your geographical functions and procedures, since they are (probably) built into SQL Server 2008. One caveat, though: you might have to spend some time optimizing the spatial indexes, but this depends on your specific case.
I understand that you are trying to decide which of the two to keep, but you might want to consider keeping both. If you export your data into shapefiles, it's common practice to keep the lat/lon fields alongside the geom field.
I would keep both. It can be useful to easily query the original coordinates of a particular feature without requiring spatial operations. You have the benefit of knowing the original points as well as the ability to create a new geometry from them in case you need it in a different coordinate system (like if you have your geometry in a particular projection that will lose a lot of precision going to another).
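If you do add a geography column next to the existing numeric fields, the conversion, the index, and the explicit hint mentioned in the question might look roughly like this (all names are illustrative):

    -- Populate a geography column from the existing numeric lat/long columns.
    ALTER TABLE dbo.Locations ADD Position geography;

    UPDATE dbo.Locations
    SET Position = geography::Point(Latitude, Longitude, 4326);

    CREATE SPATIAL INDEX SIX_Locations_Position ON dbo.Locations (Position);

    -- On SQL Server 2008 (non-R2) the index may have to be hinted explicitly:
    SELECT Id
    FROM dbo.Locations WITH (INDEX(SIX_Locations_Position))
    WHERE Position.STDistance(geography::Point(33.756944, -84.390278, 4326)) <= 16093.44;  -- 10 miles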