Best postgis distance query


I'm trying to find the fastest way to get a count of properties within a radius. I've seen some of the other similar questions. Does anyone know why the third query gives such different results? It is super fast.
Slow
SELECT count(*)
FROM property
WHERE ST_Distance_Sphere(geom_pt, ST_MakePoint(-104.989879,39.736355))<=2000;
count=2665
Very Slow
SELECT count(*)
FROM property
WHERE ST_Distance(geom_pt, ST_SetSRID(ST_MakePoint(-104.989879, 39.736355),4326)::geography)<=2000;
count=2665
Very Fast
SELECT count(*)
FROM property
WHERE ST_Within(geom_pt,ST_Transform(ST_Buffer(ST_Transform(ST_SetSRID(ST_MakePoint(-104.989879, 39.736355), 4326), 3857), 2000), 4326));
count=1794

Nope, you want ST_DWithin which will use a GIST index.
SELECT count(*)
FROM property
WHERE ST_DWithin(
geom_pt, -- make sure this is also geography
ST_MakePoint(-104.989879, 39.736355)::geography,
2000 -- note distance in meters.
);
You also don't have to set srid to 4326 on geography. It's the default.
If you don't have the index,
CREATE INDEX ON property USING gist(geom_pt);
VACUUM property;
You can CLUSTER on geom_pt too.
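As for why the third query returns a different count: my reading (an explanation the answer above does not spell out) is that EPSG:3857 is Web Mercator, where map units are stretched relative to ground metres by roughly 1/cos(latitude). A 2000-unit buffer around a point near latitude 39.7 therefore covers only about 1540 m on the ground, so fewer points fall inside it. A rough Python sketch of that arithmetic, assuming a spherical approximation and a helper name I made up for illustration:

```python
import math


def mercator_ground_radius(buffer_units: float, lat_deg: float) -> float:
    """Approximate ground distance (metres) covered by a buffer whose radius
    is given in Web Mercator (EPSG:3857) map units at the given latitude.

    Hypothetical helper: it relies on the spherical rule of thumb that
    Mercator stretches distances by about 1 / cos(latitude).
    """
    return buffer_units * math.cos(math.radians(lat_deg))


# Latitude of the query point from the question.
lat = 39.736355
ground_radius = mercator_ground_radius(2000, lat)

# Ratio of the area actually covered to the area of a true 2000 m circle.
area_ratio = (ground_radius / 2000) ** 2

print(f"ground radius ~ {ground_radius:.0f} m, area ratio ~ {area_ratio:.2f}")
```

The area ratio will not match the ratio of the two counts exactly, since the properties are unlikely to be spread evenly, but it points in the same direction: the buffered query is fast partly because it is searching a smaller circle.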

Related

How do I query for geometries in a rectangle bigger than what I request?

I have this PostGIS query:
SELECT ST_AsGeoJSON(geom) AS geom_geojson
FROM tracts AS tbl
WHERE ST_MakeEnvelope(-86.84422306060793,36.14537465258748,-86.76182559967043,36.17846331773539) && ST_Transform(tbl.geom,4326);
This returns geometries in a rectangle made by those four points, but I'd like to get back geometries within a rectangle that's one kilometer bigger than that. What PostGIS query must I write?

Just use ST_DWithin on the geography type. If you have an index on tbl.geom::geography it'll use it. Or, you can store the tbl.geom as geography, and then you only have to have an index on the column.
SELECT ST_AsGeoJSON(geom) AS geom_geojson
FROM tracts AS tbl
WHERE ST_DWithin(
tbl.geom::geography,
ST_MakeEnvelope(
-86.84422306060793,
36.14537465258748,
-86.76182559967043,
36.17846331773539
),
1000
);
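If you would rather keep the `&&` bounding-box test from the question, a different approach from the ST_DWithin answer above is to pad the envelope yourself before passing it to ST_MakeEnvelope. Below is a rough Python sketch of the padding arithmetic. The function name and the 111,320 metres-per-degree constant are my own assumptions (a common approximation, not an exact geodesic value), so treat the result as a slightly fuzzy rectangle rather than an exact 1 km margin.

```python
import math

# Rough length of one degree of latitude in metres (assumed approximation).
METERS_PER_DEGREE_LAT = 111_320.0


def expand_envelope(xmin, ymin, xmax, ymax, meters):
    """Pad a lon/lat bounding box by roughly `meters` on every side."""
    dlat = meters / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with latitude, so scale by cos(latitude),
    # using the middle latitude of the box.
    mid_lat = math.radians((ymin + ymax) / 2.0)
    dlon = meters / (METERS_PER_DEGREE_LAT * math.cos(mid_lat))
    return (xmin - dlon, ymin - dlat, xmax + dlon, ymax + dlat)


# The envelope from the question, padded by about one kilometre.
padded = expand_envelope(
    -86.84422306060793, 36.14537465258748,
    -86.76182559967043, 36.17846331773539,
    1000,
)
print(padded)
```

The four padded values can then be used in place of the original ones in the ST_MakeEnvelope call.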

How to calculate the "Nearest Neighbour" for multiple sources in SQL Server?

The "Nearest Neighbour" problem is very common when working with spatial data.
There's even some nice, simple documentation about how to do it with MS Sql Server in their docs!
I'm usually seeing examples where it's using 1x source Lat/Long and it returns the 'x' number of nearest neighbour Lat/Longs. Fine...
e.g.
USE AdventureWorks2012
GO
DECLARE @g geography = 'POINT(-121.626 47.8315)';
SELECT TOP(7) SpatialLocation.ToString(), City FROM Person.Address
WHERE SpatialLocation.STDistance(@g) IS NOT NULL
ORDER BY SpatialLocation.STDistance(@g);
In my case, I have multiple Lat/Long sources ... and for each source, need to return the 'x' number of nearest neighbours.
Here's my schema
Table: SomeGeogBoundaries
LocationId INTEGER PRIMARY KEY (it's not an identity, but a PK & FK)
CentrePoint GEOGRAPHY
Index:
Spatial Index on CentrePoint column. [Geography || MEDIUM, MEDIUM, HIGH, HIGH]
Sample data:
LocationId | CP Lat/Long
1 | 10,10
2 | 11,11
3 | 20,20
..
So for each location in this table, I need to find the closest.. say 5 other locations.
Update
So far, it looks like using a CURSOR is the only way .. but I'm open to more set based solutions.

You need to find the nearest neighbors within the same set?
SELECT *
FROM SomeGeogBoundaries as b
OUTER APPLY (
SELECT TOP(5) CentrePoint
FROM SomeGeogBoundaries as t
WHERE t.CentrePoint.STIntersects(b.CentrePoint.STBuffer(100)) = 1
AND t.LocationId <> b.LocationId -- skip the source row itself
ORDER by b.CentrePoint.STDistance(t.CentrePoint)
) AS nn
Two notes.
The where clause in the outer apply is to limit the search to (in this case) points that are within 100 meters of each other (assuming that you're using an SRID whose native unit of measure is meters). That may or may not be appropriate for you. If not, just omit the where clause.
I think this is still a cursor. Don't fool yourself into thinking that just because there is nary a declare cursor statement to be seen that the db engine has much of a choice but to iterate through your table and evaluate the apply for each row.
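To make the "it is still a loop per row" point concrete, here is a small Python sketch of what the OUTER APPLY amounts to: for every source location, rank all the other locations by distance and keep the top k. It uses the sample data from the question. The haversine distance and the function names are my own stand-ins, not what SQL Server's STDistance does internally.

```python
import math

# Mean Earth radius in metres (spherical approximation).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_neighbours(points, k):
    """For every LocationId, return the ids of its k closest other locations."""
    result = {}
    for loc_id, centre in points.items():  # one pass per "outer" row
        others = [
            (haversine_m(centre, p), other_id)
            for other_id, p in points.items()
            if other_id != loc_id  # never return the source itself
        ]
        result[loc_id] = [other_id for _, other_id in sorted(others)[:k]]
    return result


# Sample data from the question: LocationId -> (lat, lon).
points = {1: (10.0, 10.0), 2: (11.0, 11.0), 3: (20.0, 20.0)}
neighbours = nearest_neighbours(points, k=2)
print(neighbours)
```

This brute-force version compares every pair, which is exactly the cost a spatial index plus a distance filter in the WHERE clause is meant to cut down.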

Is it possible to select the polygon with the maximum area with a query in GeoDjango or PostGIS?

I have lots of urban area multipolygons and I need to select the one with the largest area. At the moment I iterate through each object and calculate the area for each but this is inefficient. What is the best way to do this?

I am not sure if you can call ST_Area directly in GeoDjango, in conjunction with an aggregate query, but you could use a raw SQL query. For example, using the correct Postgres query, as posted by @MikeT,
top_area = ModelName.objects.raw('SELECT * FROM sometable ORDER BY ST_Area(geom) DESC LIMIT 1')[0]

With SQL, this query is:
SELECT *, ST_Area(geom) FROM mytable ORDER BY ST_Area(geom) DESC LIMIT 1;
This calculates area for each geometry for the whole table.
If you use ST_Area(geom) frequently, you can make an index on the expression:
CREATE INDEX mytable_geom_area_idx ON mytable (ST_Area(geom))

Computed column performance

I can't find any answer for my problem on the web.
When exactly are computed columns computed? (not persisted ones)
When I select TOP 100 from thousands of records, are they calculated for only those selected rows?
What if I add a WHERE clause for the computed column? Does this change?
The main problem is that I have a one to many relationship, but I want to have information on parent side about... let's say MAX(somecolumn) of child table.
I'm using Entity Framework. I decided to make a computed column.
Is this a good idea? Are there any others? Any help appreciated. Tnx
EDIT:
My column is defined like this:
[ComputedNextClassDate] as [dbo].[ComputeNextClassDate]([Id]),
And my function:
CREATE FUNCTION [dbo].[ComputeNextClassDate](@id INT)
RETURNS DATETIME
AS
BEGIN
DECLARE @nextDate DATETIME;
DECLARE @now DATETIME = GETUTCDATE();
SELECT @nextDate = MIN(Start) FROM [dbo].[Events] WHERE [Start] > @now AND [GroupClassId] = @id;
RETURN @nextDate;
END;

For computed columns that are not persisted, the calculation result is never stored.
When the query runs, the SQL Server engine looks for an execution plan. If your query is well written, the value will be calculated only once even if it is used in several places in your query.
In my opinion, non-persisted computed columns are best avoided; I never use them. The calculation has to be done either at insertion or at read time, and SQL Server, like other database engines, is usually inefficient at this kind of calculation.
Calling the CLR is catastrophic in terms of performance. Avoid it.
Prefer multiple tables with joins, like
SELECT p.product_name
, SUM(ISNULL(sales,0))
FROM product p
LEFT OUTER JOIN sales s ON p.product_id = s.product_id
GROUP BY p.product_name
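Applied to the question's own case (the next class date per group), the join approach could look roughly like the sketch below. It runs against an in-memory SQLite database from Python purely so that it is self-contained. SQLite is a stand-in for SQL Server here. The `GroupClasses` table name, the sample rows, and the fixed `now` value are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical parent table; the question only shows the Events side.
    CREATE TABLE GroupClasses (Id INTEGER PRIMARY KEY);
    CREATE TABLE Events (GroupClassId INTEGER, Start TEXT);

    INSERT INTO GroupClasses (Id) VALUES (1), (2);
    INSERT INTO Events (GroupClassId, Start) VALUES
        (1, '2030-01-10 09:00:00'),
        (1, '2030-01-03 09:00:00'),
        (1, '2020-01-01 09:00:00');
""")

# Fixed "current time" so the example is repeatable (GETUTCDATE() in the question).
now = "2025-01-01 00:00:00"

# One set-based query instead of a scalar function evaluated per row:
# the LEFT JOIN keeps groups with no upcoming event (their date is NULL).
rows = conn.execute(
    """
    SELECT g.Id, MIN(e.Start) AS NextClassDate
    FROM GroupClasses AS g
    LEFT JOIN Events AS e
           ON e.GroupClassId = g.Id AND e.Start > ?
    GROUP BY g.Id
    ORDER BY g.Id
    """,
    (now,),
).fetchall()

print(rows)
```

The date filter sits in the ON clause rather than the WHERE clause so that groups without a future event still show up in the result.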

SQL Server 2005 SELECT TOP 1 from VIEW returns LAST row

I have a view that may contain more than one row, looking like this:
[rate] | [vendorID]
8374 1234
6523 4321
5234 9374
In a SPROC, I need to set a param equal to the value of the first column from the first row of the view. something like this:
DECLARE @rate int;
SET @rate = (select top 1 rate from vendor_view where vendorID = 123)
SELECT @rate
But this ALWAYS returns the LAST row of the view.
In fact, if I simply run the subselect by itself, I only get the last row.
With 3 rows in the view, TOP 2 returns the FIRST and THIRD rows in order. With 4 rows, it's returning the top 3 in order. Yet still top 1 is returning the last.
DERP?!?
This works..
DECLARE @rate int;
CREATE TABLE #temp (vRate int)
INSERT INTO #temp (vRate) (select rate from vendor_view where vendorID = 123)
SET @rate = (select top 1 vRate from #temp)
SELECT @rate
DROP TABLE #temp
.. but can someone tell me why the first behaves so fudgely and how to do what I want? As explained in the comments, there is no meaningful column by which I can do an order by. Can I force the order in which rows are inserted to be the order in which they are returned?
[EDIT] I've also noticed that: select top 1 rate from ([view definition select]) also returns the correct values time and again.[/EDIT]

That is by design.
If you don't specify how the query should be sorted, the database is free to return the records in any order that is convenient. There is no natural order for a table that is used as default sort order.
What the order will actually be depends on how the query is planned, so you can't even rely on the same query giving a consistent result over time, as the database will gather statistics about the data and may change how the query is planned based on that.
To get the record that you expect, you simply have to specify how you want them sorted, for example:
select top 1 rate
from vendor_view
where vendorID = 123
order by rate

I ran into this problem on a query that had worked for years. We upgraded SQL Server and all of a sudden, an unordered select top 1 was not returning the final record in a table. We simply added an order by to the select.
My understanding is that, when no order by is provided, SQL Server will generally return results in the order of the clustered index, or of whatever index the engine picks. But this is not a guarantee of a certain order.
If you don't have something to order off of, you need to add it. Either add a date inserted column and default it to GETDATE() or add an identity column. It won't help you historically, but it addresses the issue going forward.
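Here is a small sketch of both suggestions, the explicit order by and the added identity column. It runs against an in-memory SQLite database from Python so that it is self-contained. SQLite stands in for SQL Server, which means `LIMIT 1` plays the role of `TOP 1` and `AUTOINCREMENT` the role of an identity column. The `vendor_rates` table is made up, and the three rows are given the same vendorID so that the ordering actually matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for the rows behind vendor_view.
    CREATE TABLE vendor_rates (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order
        rate     INTEGER,
        vendorID INTEGER
    );
    INSERT INTO vendor_rates (rate, vendorID) VALUES
        (8374, 123), (6523, 123), (5234, 123);
""")

# Ordering by a data column: "first" now means "lowest rate".
lowest_rate = conn.execute(
    "SELECT rate FROM vendor_rates WHERE vendorID = 123 ORDER BY rate LIMIT 1"
).fetchone()[0]

# Ordering by the identity column: "first" now means "first inserted".
first_inserted = conn.execute(
    "SELECT rate FROM vendor_rates WHERE vendorID = 123 ORDER BY id LIMIT 1"
).fetchone()[0]

print(lowest_rate, first_inserted)
```

Either way, the point is that "first" only has a meaning once the query says what it is ordered by.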

While it doesn't necessarily make sense that the results of the query should be consistent, in this particular instance they are so we decided to leave it 'as is'. Ultimately it would be best to add a column, but this was not an option. The application this belongs to is slated to be discontinued sometime soon and the database server will not be upgraded from SQL 2005. I don't necessarily like this outcome, but it is what it is: until it breaks it shall not be fixed. :-x