How to speed up a query that uses the PostGIS extension?

I have the following query that checks whether the point (T.latitude, T.longitude) is inside a POLYGON:
query = """
SELECT id
FROM T
WHERE ST_Intersects(ST_Point(T.latitude, T.longitude), 'POLYGON(({points}))')
"""
But it is slow. How can I speed it up if I have the following index:
(latitude, longitude)?
The query is slow because it must evaluate the spatial predicate for every row. That makes the Postgres server do a lot of math, and it forces it to scan through your whole location table. How can we optimize this? We can eliminate the points that are too far north, south, east, or west before doing the exact check.

1) Add a geometry column of type Geometry(Point) and fill it:
ALTER TABLE T ADD COLUMN geom geometry(Point);
UPDATE T SET geom = ST_Point(T.latitude, T.longitude);
2) Create a spatial index:
CREATE INDEX t_gix ON t USING GIST (geom);
3) Use ST_DWithin() instead of ST_Intersects():
WHERE ST_DWithin('POLYGON(({points}))', geom, 0)
You actually want to find the points that are within a polygon, so ST_DWithin() is what you need. From the documentation:
This function call will automatically include a bounding box
comparison that will make use of any indexes that are available
PS:
If for some reason you cannot do steps 1 and 2, then at least use ST_DWithin() instead of ST_Intersects():
WHERE ST_DWithin('POLYGON(({points}))', ST_Point(T.latitude, T.longitude), 0)
The last parameter is the tolerance.
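To confirm that the new index is actually used, you can inspect the plan; a quick sketch (the plan should mention t_gix rather than a sequential scan on T):
EXPLAIN
SELECT id
FROM T
WHERE ST_DWithin('POLYGON(({points}))'::geometry, geom, 0);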

You can easily speed up your spatial queries by adding a t1.geom && t2.geom condition to your scripts.
This condition:
requires spatial indexes, so your spatial columns must have spatial indexes (see the sketch below);
returns an approximate result (but combined with the exact ST_* operators it gives an exact result).
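For instance, for the two tables used in the timing example below, the required GiST indexes would look like this (index names are illustrative):
CREATE INDEX parcel_poly_gix ON parcel USING GIST (poly);
CREATE INDEX enterance_poly_gix ON enterance USING GIST (poly);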
Here is an example from my database, with query timings:
select p.id, k.id, p.poly && k.poly as intersects
from parcel p, enterance k
where st_contains(p.poly, k.poly) and p.poly && k.poly
--without && 10.4 sec
--with && 1.6 sec
select count(*) from parcel --34797
select count(*) from enterance --70715
https://postgis.net/docs/overlaps_geometry_box2df.html

Related

Delphi: code example for getting a list of overlapping polygons with SQL Server Spatial

The overlap between polygons can be evaluated using the SQL Server spatial STContains() function (https://learn.microsoft.com/en-us/sql/t-sql/spatial-geometry/stcontains-geometry-data-type?view=sql-server-ver16&viewFallbackFrom=sql-server-ver18).
Can I get a code sample that returns the list of polygons located inside a given polygon (the record index being the query parameter)?
select *
from table
where PolygonsInside (RecordIndex = 1)
How do I get to this SQL query?
Your description ("overlap between polygons") doesn't necessarily match the function you've chosen: from the same documentation you've linked to, STContains() "Returns 1 if a geometry instance completely contains another geometry instance." STIntersects() may be a better choice.
Regardless of any of that, here's a query you'd write to find any polygon from a table that overlaps given the ID of another polygon in the same table.
select overlaps.ID, overlaps.Polygon
from PolygonTable as given
join PolygonTable as other
on other.Polygon.STIntersects(given.Polygon) = 1
and other.ID <> given.ID
where given.ID = 1;
That is, if you provide an ID of 1, the query will find any polygon in the table that intersects with that one. Note that I've added a predicate in the join so as not to also return the polygon whose ID was provided; that may or may not be what you want, and it's easily removed if you do want to return that one.
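If you specifically want containment rather than any intersection, the same query shape works with STContains(); a sketch under the same assumed table and column names:
select contained.ID, contained.Polygon
from PolygonTable as given
join PolygonTable as contained
on given.Polygon.STContains(contained.Polygon) = 1
and contained.ID <> given.ID
where given.ID = 1;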

Spatial Join in SQL Server

I come from a mining and exploration background. I have a list of points (with ID and X and Y coordinates).
I have another table which contains polygons (claim numbers).
I am trying to find out which points fall within which polygons. My coordinates are in UTM Projection.
Thank you in advance!
I tried the following code; however, my query is not returning results.
SELECT
pg.claim_number,
p.DHId
FROM leasehold as pg
JOIN
(SELECT
DHId,
X,
Y,
geometry::STPointFromText('POINT(',X,' ',Y,'), 0) AS [geom]
FROM collar
WHERE X is not null AND
Y is not null AND
claim_number is null) AS p
ON pg.Shape.STIntersects(p.geom) = 1
I was expecting to get a list with the claim_number from the polygon table that each DHId from the point table intersects or falls within.
It looks like there's just a syntax/quoting issue. I cleaned it up a bit and replaced STPointFromText with Point, which is an MS-specific extension but doesn't require you to create WKT just to get a point. That's really all I changed; I'd expect your general approach to work.
SELECT
pg.claim_number,
p.DHId
FROM leasehold as pg
JOIN (
SELECT
DHId,
X,
Y,
geometry::Point(X, Y, 0) AS [geom]
FROM collar
WHERE X is not null AND
Y is not null AND
claim_number is null
) AS p
ON pg.Shape.STIntersects(p.geom) = 1;
That said, I would not expect this to be super performant. You're creating the points on the fly, so you incur that cost at run time, and there's no way to put a spatial index on data created that way. If you can, I'd suggest adding a column to your collar table that holds the geometry point and, as I implied, putting a spatial index on it. Also, if there's no index on the leasehold.Shape column, I'd put one there as well.
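A sketch of that suggestion (the column and index names are illustrative, and the BOUNDING_BOX values are placeholders; use the real UTM extent of your data):
ALTER TABLE collar ADD geom geometry;

UPDATE collar
SET geom = geometry::Point(X, Y, 0)
WHERE X IS NOT NULL AND Y IS NOT NULL;

-- placeholder extent; replace with your data's actual bounding box
CREATE SPATIAL INDEX six_collar_geom ON collar (geom)
WITH (BOUNDING_BOX = (100000, 0, 900000, 10000000));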

flask-sqlalchemy slow paginate count

I have a Postgres 10 database behind my Flask app. I'm trying to paginate filtered results on a table with millions of rows. The problem is that the paginate method counts the total number of query results in a totally inefficient way.
Here's an example with a dummy filter:
paginate = Buildings.query.filter(Buildings.height > 10).paginate(1, 10)
Under the hood it performs two queries:
SELECT * FROM buildings where height > 10
SELECT count(*) FROM (
SELECT * FROM buildings where height > 10
)
-- the count returns 200,000 rows
The problem is that the count on the raw SELECT without the subquery is quite fast (~30 ms), but the paginate method wraps it into a subquery that takes ~30 s.
The query plan on a cold database:
Is there a way to use the default paginate method from flask-sqlalchemy in a performant way?
EDIT:
To get a better understanding of my problem, here are the real filter operations used in my case, but with dummy field names:
paginate = Buildings.query.filter_by(owner_id=None).filter(Buildings.address.like('%A%')).paginate(1,10)
So the SQL the ORM produces is:
SELECT count(*) AS count_1
FROM (SELECT foo_column, [...]
FROM buildings
WHERE buildings.owner_id IS NULL AND buildings.address LIKE '%A%' ) AS anon_1
That query is already served by these indexes:
CREATE INDEX ix_trgm_buildings_address ON public.buildings USING gin (address gin_trgm_ops);
CREATE INDEX ix_buildings_owner_id ON public.buildings USING btree (owner_id)
The problem is just this count function, which is very slow.
So it looks like a disk-reading problem. The solutions would be to get faster disks, to get more RAM if it can all be cached, or, if you have enough RAM, to use pg_prewarm to get all the data into the cache ahead of need. Or try increasing effective_io_concurrency, so that the bitmap heap scan can have more than one I/O request outstanding at a time.
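For example, pg_prewarm is a standard contrib extension and effective_io_concurrency is an ordinary setting; a sketch, using the table and index names from the question:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('buildings');
SELECT pg_prewarm('ix_trgm_buildings_address');
-- allow more concurrent I/O requests during bitmap heap scans
SET effective_io_concurrency = 32;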
Your actual query seems to be more complex than the one you show, based on the Filter: entry and on the Rows Removed by Index Recheck: entry in combination with the lack of lossy blocks. There might be some other things to try, but we would need to see the real query and the index definition (which apparently is not just an ordinary btree index on "height").

SQL: using multiple simple indexes

My questions are: What indexes are used? In what order? Why? Consider the following sample.
Query:
SELECT House
FROM myTable
WHERE 1=1
and City='myCity'
and Street='myStreet'
and Color='myColor'
Indexes:
Ind1: City
Ind2: Street
Ind3: Color
Ind4: Street,Color
It depends... The server may have statistics, so it will choose the index that filters most effectively. For example, if:
City='myCity' returns 100 rows
Street='myStreet' returns 1,000 rows
Color='myColor' returns 10,000 rows
then the City index will be used. This logic is valid for composite indexes as well.
The optimizer tries to get the smallest set first; the other filters are then applied to that result.
This requires up-to-date statistics, otherwise the wrong index might be used.
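If this exact query is frequent, a single composite index covering all three predicates usually beats any of the single-column indexes; a sketch (put the most selective column first):
CREATE INDEX Ind5 ON myTable (City, Street, Color);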

lua and lsqlite3: speeding up select statement

I'm using the lsqlite3 Lua wrapper and I'm making queries against a database. My DB has ~5 million rows, and the code I'm using to retrieve rows is akin to:
db = lsqlite3.open('mydb')
local temp = {}
local sql = "SELECT A,B FROM tab where FOO=BAR ORDER BY A DESC LIMIT N"
for row in db:nrows(sql) do temp[row.A] = row.B end
As you can see, I'm trying to get the top N rows sorted in descending order by FOO (I want to sort first and then apply the LIMIT, not the other way around). I indexed column A, but it doesn't seem to make much of a difference. How can I make this faster?
You need to index the column on which you filter (i.e., the one in the WHERE clause). The reason is that ORDER BY comes into play after filtering, not the other way around.
So you should probably create an index on FOO.
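A sketch, using the table and columns from the question (the composite variant is an assumption that also lets the same index serve the ORDER BY A DESC):
CREATE INDEX idx_tab_foo ON tab(FOO);
-- or, to cover both the filter and the ordering in one pass:
CREATE INDEX idx_tab_foo_a ON tab(FOO, A DESC);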
Can you post your table schema?
UPDATE
Also you can increase the sqlite cache, e.g.:
PRAGMA cache_size=100000
You can adjust this depending on the memory available and the size of your database.
UPDATE 2
If you want a better understanding of how your query is handled by SQLite, you can ask it for the query plan:
http://www.sqlite.org/eqp.html
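For example, with the query from the question:
EXPLAIN QUERY PLAN
SELECT A, B FROM tab WHERE FOO=BAR ORDER BY A DESC LIMIT N;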
UPDATE 3
I did not understand your context properly in my initial answer. If you ORDER BY over a large data set, you probably want the index to serve that ordering, not the filtering, so you can tell SQLite not to use the index on FOO like this:
SELECT a, b FROM foo WHERE +a > 30 ORDER BY b
(The unary + in front of the column prevents SQLite from using an index on that column.)
