ST_Union() takes too much time to process - postgis

I have this query, which is very fast to process:
SELECT ST_Clip(rast, the_geom) FROM raster, polygons
the_geom holds 50 geometries, while raster is a 5-band raster layer tiled at 400x400 (about 3 GB in size).
While the above query works fine,
SELECT ST_Union(ST_Clip(rast, the_geom)) FROM raster, polygons
takes forever to process. I created a spatial index on ST_ConvexHull(rast) while loading the raster into PostGIS. What might I have missed?
Thank you in advance.
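For reference, both queries above cross-join every raster tile with every polygon before clipping. A minimal sketch of the union written with an explicit spatial join, assuming the table and column names from the question and intended only to illustrate how the index can prune tiles, would be:

-- Sketch only: clip and union just the tile/polygon pairs that actually intersect,
-- so the spatial index on ST_ConvexHull(rast) can help skip tiles that touch no polygon.
SELECT ST_Union(ST_Clip(rast, the_geom)) AS clipped
FROM raster
JOIN polygons
  ON ST_Intersects(rast, the_geom);

Whether this helps depends on how many tiles actually overlap the 50 geometries.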

Related

geom_sf: plot multiple series

I have an sf polygon data frame with multiple series (T1, T2, T3, all on the same scale: they're observations at different time points). I can plot, say, T1 with
ggplot(map) + geom_sf(aes(fill = T1))
What I'd like to do is plot all three (T1, T2 and T3) as facets (separate maps) on the same drawing. I'm sure there's a way to do this, but I can't find it. Can anyone tell me how? Thanks!
ADDED: Two additional notes on this question.
First, the data structure described above is one that could be plotted using spplot, with the T's being the values passed to spplot's zcol argument. So in this connection, my question amounts to asking how to convert an spplot structure into something usable by geom_sf.
Second, suppose I use sf to read in a shp file for, say, 20 polygons. I also have a data frame consisting of stacked observations for these same polygons, say for 3 periods, so the data frame has 60 rows. How do I merge these so the result is usable? Can I just stack 3 copies of the sf structure and then cbind the data frame (assuming the rows match up correctly)?
At least in one sense this turns out to be very simple. Given a data structure (ds_sp) that can be plotted with spplot, you can just do the following:
ds_sf <- st_as_sf(ds_sp) # convert to sf form
plot(ds_sf[c("T1","T2")]) # plot the desired series
This isn't quite the same as using facet_wrap with ggplot, but at least it gives you something to work with.
ANOTHER LATER ADDITION: As to the longitudinal + facet_wrap issue, the following seems to work.
If necessary, create a data frame (df1) with the longitudinal data (longit), an area indicator (fips), a time indicator (date) that will be used for faceting, and anything else you may need.
If necessary, create an sf-compatible version of the spatial geometry via st_as_sf, as new_poly. This will be of classes "sf" and "data.frame" and should have a spatial indicator matching fips in df1.
Merge the two:
data_new <- dplyr::inner_join(df1, new_poly, by = "fips")
Now produce the plot
ggplot(data_new) + geom_sf(aes(fill = longit, geometry = geometry)) + facet_wrap(~date)
and make adjustments from there.

Geometry operations on latitude/longitude coordinates

My question is probably a duplicate, but none of the answers I have seen so far satisfy me or clear up my doubts.
I have a web application that uses the Google Maps API to draw and save shapes (circles and polygons) in a SQL Server DB using the geometry data type (where I save lat/long coordinates) with SRID = 4326.
My objective is to later determine whether a point is contained in the area of those circles/polygons using the STIntersects() method.
I have been told so far that my method wouldn't work because I am using geometry instead of geography. But to my surprise, after checking with a few tests, it works perfectly well with geometry, and I am not able to understand why or how.
Could somebody explain to me why the geometry type works well with operations on lat/long, when geography would supposedly be better suited?
I post this as an answer because it is too long for a comment.
geometry works well to the extent that your intersections can be approximated by planar intersections.
The difference between geometry and geography is that the former assumes it is working on a flat surface while the latter works on a spherical one. When the polygons in question cover small areas, on the order of a few thousand metres, geometry works very well: the difference between a distance measured as if the points lay on a plane and one measured on the earth's sphere is small enough to be negligible. If the points are instead a few hundred kilometres apart, the planar and spherical distances differ considerably, and so, proportionally, does the result of an intersection between such areas.
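To make the planar versus geodesic difference concrete, here is a small, purely illustrative T-SQL comparison (the two points are arbitrary coordinates a few hundred kilometres apart, roughly Paris and London, not data from the question):

-- The same two lat/long points measured as geometry (planar) and geography (ellipsoidal)
DECLARE @g1 geometry  = geometry::STPointFromText('POINT(2.3522 48.8566)', 4326);
DECLARE @g2 geometry  = geometry::STPointFromText('POINT(-0.1278 51.5074)', 4326);
DECLARE @p1 geography = geography::STPointFromText('POINT(2.3522 48.8566)', 4326);
DECLARE @p2 geography = geography::STPointFromText('POINT(-0.1278 51.5074)', 4326);
SELECT @g1.STDistance(@g2) AS planar_distance,    -- in coordinate units (degrees), no physical meaning
       @p1.STDistance(@p2) AS geodesic_distance;  -- in metres, measured on the ellipsoid

The geometry result is in the coordinate units (degrees) while the geography result is in metres, which is why the two types only agree, after scaling, over small areas.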

Slow performance on mssql spatial queries

I have a table, dbo.ashfill_pointCloud, that consists of approximately 3 million geometry points. I then draw polygons (rectangles) and run a query to find the average of the z-values of all the points that lie inside that polygon.
The performance is extremely slow and inconsistent.
The query:
SELECT @averageZ = NULLIF(AVG(NULLIF(Ogc_geometry.Z, NULL)), NULL)
FROM dbo.ashfill_pointCloud WITH (INDEX(si_geom_points))
WHERE Ogc_geometry.STWithin(@deltablock) = 1
I am using a spatial index (si_geom_points) on the dbo.ashfill_pointCloud table and it is set up as follows:
Bounding Box:
X-min: 12700
Y-min: -2940200
X-max: 13300
Y-max: -2938800
General:
Tessellation Scheme: Geometry grid
Cells per object: 16
Grids
Level1: Medium
Level2: Medium
Level3: Medium
Level4: Medium
The bounding box is specified to include all the points in the dbo.ashfill_pointCloud table. I have also tried many other index setups, for example different grid levels, more cells per object, etc., with no luck.
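For reference, the settings listed above correspond roughly to an index definition like the following; this is a sketch reconstructed from the description, not necessarily the exact DDL used:

CREATE SPATIAL INDEX si_geom_points
ON dbo.ashfill_pointCloud (Ogc_geometry)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (xmin = 12700, ymin = -2940200, xmax = 13300, ymax = -2938800),
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
    CELLS_PER_OBJECT = 16
);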
It seems like the query executes extremely fast for certain regions but extremely slow (as in hours) for others. I have also noticed that performance drops drastically if there are no points inside the polygon. I have also tried other methods, like STIntersects(), with the same performance.
Any ideas on what I could be doing wrong will be appreciated.

Check that smaller cubes fill bigger cube

Given one large cube (axis-aligned and on integer coordinates) and many smaller cubes (also axis-aligned and on integer coordinates), how can we check that the large cube is perfectly filled by the smaller cubes?
Currently we check that:
Each small cube is fully contained in the large cube.
No small cube intersects any other small cube.
The sum of the volumes of the small cubes equals the volume of the large cube.
This is OK for small numbers of cubes, but we need to support this test for cubes with dimensions greater than 2^32. Even at 2^16, the number of small cubes required to fill the large cube is large enough that step 2 takes a while (O(n^2), checking that each cube intersects no other).
Is there a better algorithm?
EDIT:
There seems to be some confusion over this. I am not trying to split a cube into smaller cubes; that's already done. Part of our program splits large OpenCL ranges (axis-aligned cubes on integer coordinates) into lots of smaller ranges that fit into a hardware job.
What I'm doing is hooking into this system and checking that the jobs it produces correctly cover the large initial range. My algorithm above works, but it's slow, and given the number of tests we have to run I'd like to keep these tests as fast as possible.
We are talking about 3D, right?
For 2D one can do a similar (but simpler) process (with, I believe, an O(n log n) running time).
The basic idea of what follows is the sweep-line algorithm.
Note that cube intersection can be done by checking whether any corner of any cube is contained in any other cube.
You can improve on (2) as follows:
Split each cube into 2 rectangles on the y-z plane (so you'd have 2 rectangles defined by the same set of 4 (y,z) coordinates, but with different x coordinates).
Define the rectangle with the smaller x-coordinate as the start of a cube and the other rectangle as the end of a cube.
Sort the rectangles by x-coordinate.
Start with an initially empty interval tree (each interval should also store a reference to the rectangle it belongs to).
For each rectangle:
Look up the y-coordinate of each corner of the rectangle in the interval tree.
For each matching interval, look up its rectangle and check whether the point is also contained within its z-coordinate range (this is all that's required, because the tree only contains rectangles whose x-range is currently active, and the interval lookup itself checks the y-coordinates).
If it is, we have an overlap.
If the rectangle is the start of a cube, insert its 2 y-coordinates as an interval into the interval tree.
Otherwise, remove the interval defined by those 2 y-coordinates from the tree.
The running time is between O(n) (best case) and O(n^2) (worst case), depending on how much overlap there is in the x- and y-coordinates (more overlap is worse).
Order your insert cubes by size.
Insert the biggest insert cube in one of the corners of your cube and split the remaining space into subcubes.
Insert the second biggest insert cube in the first of the subcubes it will fit in, and add the remaining subcubes of that subcube to the set of subcubes.
And so on.
Another go, again only addressing step 2 in the original question:
Define a space-filling curve with good spatial locality, such as a 3D Hilbert Curve.
For each cube calculate the pair of coordinates on the curve for the points at which the curve both enters and leaves the cube. The space-filling curve will enter and leave some cubes more than once, calculate more than one pair of coordinates for these cases.
You've now got I don't know how many pairs of coordinates, but I'd guess no more than 2^18. These coordinates define intervals along the space-filling curve, so sort them and look for overlaps.
Time complexity is probably dominated by the sort; space complexity is probably quite big.

Spatial query - poor performance, not using index

I'm trying to find locations within some distance of given coordinates. On a table with about 32k records the query takes about 2 seconds, which is way too slow, imo.
It is doing a clustered index scan, which is logical: it has to calculate the distance for every location. However, I still think this should be faster over a data set this small. I do have a spatial index defined; however, it's not used, and the query fails if I force it.
Most of the time (~86%) is spent on the Filter that calculates the distance, so I'm looking for ways to optimize that, and I need some help here.
The query I'm using is this:
SELECT Name
FROM Venue
WHERE (Coordinates.STDistance(geography::STPointFromText('POINT(-113.512245 51.498212)', 4326)) / 1000) <= 100
One old approach is to use a box first.
From your point, make two points on opposite corners of the box: +R/+R and -R/-R from the center.
Then you can filter: a point has to be in this box AND in the circle you describe.
The box check can run on the index and eliminates most rows.
Simple school geometry: you draw a rectangular box around the circle you describe.
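A minimal T-SQL sketch of that box-then-exact-distance idea might look like this; the box corners are rough, hand-computed offsets of about 100 km around the point in the question, so treat them as placeholders:

DECLARE @center geography = geography::STPointFromText('POINT(-113.512245 51.498212)', 4326);
-- Approximate 100 km box around the point (counter-clockwise ring; corner values are illustrative only)
DECLARE @box geography = geography::STPolyFromText(
    'POLYGON((-115.01 50.55, -112.01 50.55, -112.01 52.45, -115.01 52.45, -115.01 50.55))', 4326);
SELECT Name
FROM Venue
WHERE Coordinates.STIntersects(@box) = 1             -- coarse box filter, a form the spatial index can serve
  AND Coordinates.STDistance(@center) <= 100000;     -- exact distance check, in metres

The cheap box test removes most rows before the exact distance is evaluated for the survivors.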
Your current query cannot use the index as written: wrapping STDistance in arithmetic (the division by 1000) keeps the predicate from being in a form the spatial index can serve.
Alternatively: do not use a distance calculation at all. Draw a circle, as a polygon built from points, and test against that.
Or read https://stackoverflow.com/questions/11311363/my-application-is-about-asp-net-using-linq-and-remote-mssql-my-connection-is-be which covers the same issue you are asking about.

Resources