PostGIS - Get all unique areas from a set of overlapping polygons

I have a layer with X polygons. Some of them overlap, some of them don't:
I need to isolate each distinct area of this layer, like this:
output layer
I tried with boundary and polygonize :
SELECT ST_Polygonize(geom) AS geom FROM (
SELECT ST_Union(geom) AS geom FROM (
SELECT ST_Boundary(geom) AS geom FROM buffers
) AS lines
) AS union_lines
It works pretty well, but some parts are missing:
If I reduce the number of input polygons, there are no missing parts.
Do you have a better solution than mine to achieve my goal?
Thank you a lot for your time

Related

Spatial Join in SQL Server

I come from a mining and exploration background. I have a list of points (with ID and X and Y coordinates).
I have another table which contains polygons (claim numbers).
I am trying to find out which points fall within which polygons. My coordinates are in UTM Projection.
Thank you in advance!
I was trying the following code however my query is not returning results.
SELECT
pg.claim_number,
p.DHId
FROM leasehold as pg
JOIN
(SELECT
DHId,
X,
Y,
geometry::STPointFromText('POINT(',X,' ',Y,'), 0) AS [geom]
FROM collar
WHERE X is not null AND
Y is not null AND
claim_number is null) AS p
ON pg.Shape.STIntersects(p.geom) = 1
I was expecting to get a list with claim_number from polygon table which each DHId from point table intersects or falls within.
It looks like there's just a syntax/quoting issue. I cleaned it up a bit and replaced STPointFromText with Point, which is an MS-specific extension but doesn't require you to create WKT just to get a point. But that's really all I changed - I'd expect your general approach to work.
SELECT
pg.claim_number,
p.DHId
FROM leasehold as pg
JOIN (
SELECT
DHId,
X,
Y,
geometry::Point(X, Y, 0) AS [geom]
FROM collar
WHERE X is not null AND
Y is not null AND
claim_number is null
) AS p
ON pg.Shape.STIntersects(p.geom) = 1;
That said, I would expect this to not be super performant. You're creating the points on the fly and so will be incurring that cost at run time. As such, there's no way to put a spatial index on that data. If you can, I'd suggest adding a column to your collar table that is the geometry point and, as I implied, put a spatial index on it. Also, if there's not an index on the leasehold.Shape column, I'd put one there as well.

Using SQL Server spatial queries, how to select a shape that is produced by intersecting two other shapes and calculate its area?

Consider the following query:
SELECT b.*
FROM dbo.BndPrimarySchoolZone b
WHERE b.ncessch IN ( 250315000417, 250315000421 );
It produces two rows, returning some descriptive data along with two shapes (Shape column is of data type geometry):
In the "Spatial results" tab you can see that these two shapes overlap:
Question: how can I isolate just the overlapping area into a separate shape and calculate its area?
I have looked at STIntersects, but that seems to only be useful for determining if the shapes intersect, and not for finding the actual intersection. I understand that once I have the intersection, finding its area is trivial with STArea.
Actually, after discovering the STIntersection function, I think this works:
WITH cte_school1
AS (SELECT b.*
FROM dbo.BndPrimarySchoolZone b
WHERE b.ncessch = 250315000417),
cte_school2
AS (SELECT b.*
FROM dbo.BndPrimarySchoolZone b
WHERE b.ncessch = 250315000421)
SELECT s1.Shape.STIntersection(s2.Shape),
s1.Shape.STIntersection(s2.Shape).STArea()
FROM cte_school1 s1
JOIN cte_school2 s2
ON s1.Shape.STIntersects(s2.Shape) = 1;
If there is a better way to do it, please post in comments.

PostGIS : improving the speed of query over complex geometry

I am new to PostgreSQL and PostGIS but the question is not trivial. I am using PostgreSQL 9.5 with PostGIS 2.2.
I need to run some queries that take a horrible amount of time.
First, let me explain the problem in non-GIS terms :
Basically, I have a set of several hundred thousand points spread over a territory of about half a million square kilometres (a country).
Over this territory, I have about a dozen sets of areas coming from various databases. In each set, I have between a few hundred and a few thousand areas. I want to find which points are in any of these areas.
Now, how I am currently working out the problem in GIS terms :
Each set of areas is a PostgreSQL table with a geometry column of type MultiPolygon and, as explained before, a few hundred to a few thousand records.
All these tables are contained in a schema donnees but I am using a different schema for these operations, called traitements.
So the process is a/ merging all the geometries into a single geometry, and then b/ finding which points are contained in this geometry.
The problem is that, if step a/ took a reasonable amount of time (several minutes), step b/ takes forever.
I am currently working with only a sample of the points I must process (about 1% of them, i.e. about 7000) and it is not finished after several hours (the database connection eventually times out).
Even when I test the query with the result limited to 10 or 50 rows, it still takes about half an hour.
I am using a Linux Mint 18 machine with 4 CPUs and 8 GB of RAM, if you wonder.
I have created indexes on the geometry columns. All geometry columns use the same SRID.
Creating the tables :
CREATE TABLE traitements.sites_candidats (
pkid serial PRIMARY KEY,
statut varchar(255) NOT NULL,
geom geometry(Point, 2154)
);
CREATE UNIQUE INDEX ON traitements.sites_candidats (origine, origine_id ) ;
CREATE INDEX ON traitements.sites_candidats (statut);
CREATE INDEX sites_candidats_geométrie ON traitements.sites_candidats USING GIST ( geom );
CREATE TABLE traitements.zones_traitements (
pkid serial PRIMARY KEY,
définition varchar(255) NOT NULL,
geom geometry (MultiPolygon, 2154)
);
CREATE UNIQUE INDEX ON traitements.zones_traitements (définition) ;
CREATE INDEX zones_traitements_geométrie ON traitements.zones_traitements USING GIST ( geom );
Please note that I specified the geometry type of the geom column in these tables only because I wanted to specify the SRID, and I was not sure of the correct syntax for accepting any type of geometry. Maybe "geom geometry (Geometry, 2154)"?
Merging all the geometries of the various sets of areas :
As said before, all the tables hold geometries of the type multipolygon.
This is the code I am using to merge all the geometries from one of the tables :
INSERT INTO traitements.zones_traitements
    ( définition, geom )
VALUES
(
    'first-level merge',
    (
        SELECT ST_Multi(ST_Collect(dumpedGeometries)) AS singleMultiGeometry
        FROM
        (
            SELECT ST_Force2D((ST_Dump(geom)).geom) AS dumpedGeometries
            FROM donnees.one_table
        ) AS dumpingGeometries
    )
) ;
I found that some of the geometries in some of the records are in 3D, which is why I am using ST_Force2D.
I do this for all the tables and then merge the geometries again using :
INSERT INTO traitements.zones_traitements
( définition, geom )
VALUES
(
'second-level merge',
(
SELECT ST_Multi(ST_Collect(dumpedGeometries)) AS singleMultiGeometry
FROM
(
SELECT (ST_Dump(geom)).geom AS dumpedGeometries
FROM traitements.zones_traitements
WHERE définition != 'second-level merge'
) AS dumpingGeometries
)
) ;
As said before, these queries take several minutes but that's fine.
Now the query that takes forever:
SELECT pkid
FROM traitements.sites_candidats AS sites
JOIN (
SELECT geom FROM traitements.zones_traitements
WHERE définition = 'zones_rédhibitoires' ) AS zones
ON ST_Contains(zones.geom , sites.geom)
LIMIT 50;
Analysing the problem :
Obviously, it is the subquery selecting the points that takes a lot of time, not the update.
So I have run an EXPLAIN (ANALYZE, BUFFERS) on the query :
EXPLAIN (ANALYZE, BUFFERS)
SELECT pkid
FROM traitements.sites_candidats AS sites
JOIN (
SELECT geom FROM traitements.zones_traitements
WHERE définition = 'second_level_merge' ) AS zones
ON ST_Contains(zones.geom , sites.geom)
LIMIT 10;
---------------------------------
"Limit (cost=4.18..20.23 rows=1 width=22) (actual time=6052.069..4393634.244 rows=10 loops=1)"
" Buffers: shared hit=1 read=688784"
" -> Nested Loop (cost=4.18..20.23 rows=1 width=22) (actual time=6052.068..4391938.803 rows=10 loops=1)"
" Buffers: shared hit=1 read=688784"
" -> Seq Scan on zones_traitements (cost=0.00..1.23 rows=1 width=54939392) (actual time=0.016..0.016 rows=1 loops=1)"
" Filter: (("définition")::text = 'zones_rédhibitoires'::text)"
" Rows Removed by Filter: 17"
" Buffers: shared hit=1"
" -> Bitmap Heap Scan on sites_candidats sites (cost=4.18..19.00 rows=1 width=54) (actual time=6052.044..4391260.053 rows=10 loops=1)"
" Recheck Cond: (zones_traitements.geom ~ geom)"
" Filter: _st_contains(zones_traitements.geom, geom)"
" Heap Blocks: exact=1"
" Buffers: shared read=688784"
" -> Bitmap Index Scan on "sites_candidats_geométrie" (cost=0.00..4.18 rows=4 width=0) (actual time=23.284..23.284 rows=3720 loops=1)"
" Index Cond: (zones_traitements.geom ~ geom)"
" Buffers: shared read=51"
"Planning time: 91.967 ms"
"Execution time: 4399271.394 ms"
I am not sure how to read this output.
Nevertheless, I suspect that the query is so slow because of the geometry obtained by merging all these multipolygons into a single one.
Questions :
Would it work better to merge the others into a different type of geometry, like a GeometryCollection?
How do the indexes work in this case?
Is there something more efficient than ST_Contains()?
Let's see. First off, you should ask GIS-specific questions over at GIS Stack Exchange. But I'll try to help here:
Technically, your geometry column definition is correct, and using 'primitives' (e.g. POINT, LINESTRING, POLYGON and their MULTIs) is preferable to GEOMETRYCOLLECTIONs. However, it is almost always the better choice to run spatial relation functions on geometries that are as small as possible; for most of those functions, PostGIS has to check each and every vertex of the input geometries against each other (so in this case, it has to traverse the polygon's millions of vertices once for each point to be checked in ST_Contains). PostGIS will in fact run a bbox comparison prior to the relation checks (if a spatial index is available) to limit the possible matches, which normally speeds up the check by several orders of magnitude; this is rendered useless here, because the bbox of a single merged geometry spans most of the territory. (I would almost recommend actually dumping the MULTIs into simple POLYGONs, but not without knowing your data.)
Why are you dumping the MULTI geometries just to collect them back into MULTIs? If your source table's geometries are actually stored as MULTIPOLYGONs (and hopefully for good reason), simply copy them into the intermediate table, with ST_Force2D applied to the MULTIs and ST_IsValid in the WHERE clause (you can try ST_MakeValid on the geometries, but there's no guarantee it will work). Once you have inserted all tables into the zones_traitements table, run VACUUM ANALYZE and REINDEX to actually make use of the index!
In your 'second merge' query... are you simply adding the 'merged' geometries to the existing ones in the table? Don't, that's just wrong. It messes up table statistics and the index and is quite the unnecessary overhead. You should do these things within your query, but it's not necessary here.
Keep in mind that geometries of different types or extents created or derived by or within queries can neither have an index nor use the initial one. This applies to your 'merging' queries!
Then run
SELECT pkid
FROM traitements.sites_candidats AS sites
JOIN traitements.zones_traitements AS zones
ON ST_Intersects(zones.geom, sites.geom)
to return one pkid for every intersection with a zone, so that if one point intersects two MULTIPOLYGONs, you'll get two rows for that point. Use SELECT DISTINCT pkid ... to only get one row per pkid that intersects any zone. (Note: I used ST_Intersects because that should involve one less check on the relation. If you absolutely need ST_Contains, just replace it.)
Hope this helps. If not, say a word.
Again, thanks.
I had come to the same conclusion as your advice: instead of merging all the thousands of multipolygons into a single huge one, whose bbox is far too large, it is more efficient to decompose all the multipolygons into simple polygons using ST_Dump and insert these into a dedicated table with an appropriate index.
Nevertheless, to do this, I first had to correct the geometries: some multipolygons did indeed have invalid geometries. ST_MakeValid made about 90% of them valid as multipolygons, but the rest were transformed into either GeometryCollections or MultiLineStrings. To correct these, I used ST_Buffer with a buffer of 0.01 metre, which produced a correct multipolygon.
Once this was done, all my multipolygons were valid and I could dump them into simple polygons.
Doing this, I reduced the search time by a factor of +/- 5000 !
:D
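To illustrate why splitting the merged multipolygon helped so much, here is a minimal pure-Python sketch. It is not PostGIS code, and the zones and points are made-up data: it only counts how many (zone, point) pairs survive a bounding-box pre-filter and would therefore still need the expensive exact test.

```python
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bbox_of(vertices: List[Point]) -> BBox:
    """Axis-aligned bounding box of a list of vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_contains(box: BBox, p: Point) -> bool:
    """Cheap pre-filter: is the point inside the box?"""
    min_x, min_y, max_x, max_y = box
    return min_x <= p[0] <= max_x and min_y <= p[1] <= max_y


def count_exact_checks(boxes: List[BBox], points: List[Point]) -> int:
    """Pairs that pass the bbox filter and still need an exact geometry test."""
    return sum(1 for box in boxes for p in points if bbox_contains(box, p))


# Hypothetical data: two small square zones far apart, five candidate points.
zone_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
zone_b = [(100, 100), (101, 100), (101, 101), (100, 101)]
points = [(0.5, 0.5), (50, 50), (60, 40), (100.5, 100.5), (20, 80)]

# One bbox per zone versus a single bbox around the merged geometry.
separate_checks = count_exact_checks([bbox_of(zone_a), bbox_of(zone_b)], points)
merged_checks = count_exact_checks([bbox_of(zone_a + zone_b)], points)

print(separate_checks, merged_checks)
```

With separate zones only the two points that are actually near a zone reach the exact test, while the single merged bbox lets all five through, and each of those exact tests would then have to walk every vertex of the merged geometry.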

How to speed up a query that uses the PostGIS extension?

I have the following query that checks whether a point (T.latitude, T.longitude) is inside a POLYGON:
query = """
SELECT id
FROM T
WHERE ST_Intersects(ST_Point(T.latitude, T.longitude), 'POLYGON(({points}))')
"""
But it is slow. How can I speed it up, given that I have the following index:
(latitude, longitude)?
The query is slow because it must evaluate the spatial test for every row, so it makes the PostgreSQL server do a lot of math and forces it to scan through your whole table; the (latitude, longitude) index cannot be used by ST_Intersects on points that are built on the fly. How can we optimize this? Maybe we can eliminate the points that are too far north, south, east or west?
1) Add a geometry column of type Geometry(Point) and fill it:
ALTER TABLE T add COLUMN geom geometry(Point);
UPDATE T SET geom = ST_Point(T.latitude, T.longitude);
2) Create a spatial index:
CREATE INDEX t_gix ON t USING GIST (geom);
3) Use ST_DWithin instead of ST_Intersects:
WHERE ST_DWithin('POLYGON(({points}))', geom, 0)
What you actually want is to find the points that lie within a polygon, so ST_DWithin() is what you need. From the documentation:
This function call will automatically include a bounding box
comparison that will make use of any indexes that are available
PS:
If for some reason you cannot do steps 1 and 2, at least use ST_DWithin instead of ST_Intersects:
WHERE ST_DWithin('POLYGON(({points}))', ST_Point(T.latitude, T.longitude), 0)
The last parameter is the distance tolerance; 0 means the geometries must touch or overlap.
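Since the original query is assembled in Python, the same idea can be sketched there. This is only a sketch under assumptions: the helper names `polygon_wkt` and `build_query` are hypothetical, the `%s` placeholder style is the one used by psycopg2-like drivers, and the `geom` column is the one added in step 1. Note also that ST_Point expects x (longitude) before y (latitude), so the ring is assumed to use the same axis order as the stored points.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]

# Parameterized query (assumption: a driver using %s placeholders, e.g. psycopg2).
# The WKT is passed as a bound parameter instead of being formatted into the SQL.
QUERY = """
SELECT id
FROM T
WHERE ST_DWithin(ST_GeomFromText(%s), geom, 0)
"""


def polygon_wkt(ring: Sequence[Coord]) -> str:
    """Build a WKT polygon from (x, y) pairs, closing the ring if needed."""
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least 3 distinct vertices")
    closed: List[Coord] = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])  # WKT rings must end on their first vertex
    coords = ", ".join(f"{x} {y}" for x, y in closed)
    return f"POLYGON(({coords}))"


def build_query(ring: Sequence[Coord]) -> Tuple[str, Tuple[str]]:
    """Return the SQL text and its parameter tuple, ready for cursor.execute()."""
    return QUERY, (polygon_wkt(ring),)


sql, params = build_query([(0, 0), (1, 0), (1, 1)])
print(params[0])
```

With a real connection, the pair would be passed as `cursor.execute(sql, params)`, which avoids building the polygon literal with string formatting.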
You can easily speed up your spatial queries by adding a t1.geom && t2.geom condition to your scripts.
This condition:
requires spatial indexes, so your spatial columns must have spatial indexes
returns an approximate result (but combined with the ST_ functions it gives an exact result)
Here is an example from my database, with query timings:
select p.id,k.id, p.poly&&k.poly as intersects
from parcel p , enterance k
where st_contains(p.poly,k.poly) and p.poly&&k.poly
--without && 10.4 sec
--with && 1.6 sec
select count(*) from parcel --34797
select count(*) from enterance --70715
https://postgis.net/docs/overlaps_geometry_box2df.html

Finding polygons within a certain radius of a number of other polygons

I have a table with a bunch of polygons (or multipolygons, I'm not sure...does it matter?) of one type (A) defined in a CTE, and then another of another type (B) in another CTE. I want to filter for just type A polygons that are within a given radius of any of the polygons of type B. How do I do this?
Create a collection of your 'B' polygons using ST_Collect & then use a WHERE clause with ST_DWithin to specify your distance parameter.
For example:
WITH polys_a AS (
    SELECT geom
    FROM buildings_dc
),
polys_b AS (
    SELECT geom
    FROM buildings_va
)
SELECT polys_a.*
FROM polys_a,
    (
        SELECT ST_Collect(geom) AS geoms
        FROM polys_b
    ) AS c
WHERE ST_DWithin(polys_a.geom, c.geoms, .001);
Note that the two sets of geometries may be of different types (e.g. Polygon, Point, MultiPolygon, etc.), but they must be in the same projection/coordinate system. If you are using standard WGS84 (SRID 4326), the distance parameter is in degrees.
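To get a feel for what a radius expressed in degrees means on the ground, here is a rough Python sketch. It assumes about 111,320 m per degree of latitude, which is only an approximation, and the function names are hypothetical. East-west distances shrink with the cosine of the latitude, so a degree-based radius is not a circle in metres.

```python
import math

# Assumed approximate length of one degree of latitude, in metres.
# The true value varies slightly with latitude.
METRES_PER_DEGREE_LAT = 111_320.0


def lat_degrees_to_metres(distance_deg: float) -> float:
    """Approximate north-south length of a distance given in degrees."""
    return distance_deg * METRES_PER_DEGREE_LAT


def lon_degrees_to_metres(distance_deg: float, latitude_deg: float) -> float:
    """Approximate east-west length of a distance in degrees at a given latitude."""
    return distance_deg * METRES_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg))


# The .001 radius used in the query above, interpreted on the ground.
north_south = lat_degrees_to_metres(0.001)
east_west_at_60 = lon_degrees_to_metres(0.001, 60.0)
print(round(north_south, 2), round(east_west_at_60, 2))
```

So the .001 used in the example corresponds to roughly a hundred metres north-south and less than that east-west away from the equator.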
