I have a table, dbo.ashfill_pointCloud, that consists of approximately 3 million geometry points. I then draw polygons (rectangles) and run a query to find the average of the z-values of all the points that lie inside that polygon.
The performance is extremely slow and inconsistent.
The query:
SELECT @averageZ = NULLIF(AVG(NULLIF(Ogc_geometry.Z, NULL)), NULL)
FROM dbo.ashfill_pointCloud WITH (INDEX(si_geom_points))
WHERE Ogc_geometry.STWithin(@deltablock) = 1
I am using a spatial index (si_geom_points) on the dbo.ashfill_pointCloud table and it is set up as follows:
Bounding Box:
X-min: 12700
Y-min: -2940200
X-max: 13300
Y-max: -2938800
General:
Tessellation Scheme: Geometry grid
Cells per object: 16
Grids
Level1: Medium
Level2: Medium
Level3: Medium
Level4: Medium
The bounding box is specified to include all the points in the dbo.ashfill_pointCloud table. I have also tried many other index setups, for example different grid levels and more cells per object, with no luck.
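For reference, the index as described would be created with something like this sketch (reconstructed from the settings above; assuming the indexed column is Ogc_geometry, as in the query):

-- Reconstruction of the index setup described above.
CREATE SPATIAL INDEX si_geom_points
ON dbo.ashfill_pointCloud (Ogc_geometry)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (XMIN = 12700, YMIN = -2940200, XMAX = 13300, YMAX = -2938800),
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
    CELLS_PER_OBJECT = 16
);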
It seems like the query executes extremely fast for certain regions, but extremely slowly (as in hours) for others. I have also noticed that performance drops drastically when there are no points inside the polygon. I have also tried other methods, like STIntersects(), with the same performance.
Any ideas on what I could be doing wrong would be appreciated.
This is more a design question so please bear with me.
I have a system that stores locations consisting of the ID, Longitude and Latitude.
I need to compare the distance between my current location and the locations in the database and only choose the ones that are within a certain distance.
I have the formula that calculates the distance between 2 locations based on the long/lat and that works great.
My issue is I may have tens of thousands of locations in the database and don't want to loop through them all every time I need a list of nearby locations.
I'm not sure what other data point I could store with each location so that I only have to compare a smaller subset.
Thanks.
As was mentioned in the comments, SQL Server has had support for geospatial since (iirc) SQL 2008. And I know that there is support within .NET for that as well so you should be able to define the data and query it from within your application.
Since the datatype is indexable, k-nearest-neighbor queries are pretty efficient. There's even a topic in the documentation for that use case. Doing a lift and shift from that page:
DECLARE @g geography = 'POINT(-121.626 47.8315)';
SELECT TOP(7) SpatialLocation.ToString(), City
FROM Person.Address
WHERE SpatialLocation.STDistance(@g) IS NOT NULL
ORDER BY SpatialLocation.STDistance(@g);
If you need all the points within a given radius, omit the TOP clause and change the predicate on STDistance() to something like SpatialLocation.STDistance(@g) < 1000 (the SRID I typically use has meters as its unit of measure, so this would say "within 1 km").
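A minimal sketch of that radius variant, reusing the sample above:

DECLARE @g geography = 'POINT(-121.626 47.8315)';

-- All addresses within 1 km of @g (1000 is in the SRID's linear unit, meters here).
SELECT SpatialLocation.ToString(), City
FROM Person.Address
WHERE SpatialLocation.STDistance(@g) < 1000;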
https://gis.stackexchange.com/ is a good place for in-depth advice on this topic.
A classic approach to quickly locating "nearby" values, is to "grid" the area of interest:
Associate each location with a "grid cell", where each cell is a convenient size. Pick a cell-edge-length such that most cells will hold a small number of values and/or that is similar to the distance range you typically query.
If cell edge is 1 km, and you need locations within 2 km, then get data from 5x5 cells centered at the "target" location.
This is guaranteed to include all data +- 2 km from any location within the central cell.
Apply distance formula to each returned location; some will be beyond 2 km.
I've only done this in memory, not from a DB. In a database, I think you would add two columns, one for the X cell number and one for the Y cell number, with an index on each, so you can efficiently get a range of Xs by a range of Ys. I'm not sure whether a combined (X, Y) index helps or not.
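A sketch of what that might look like in SQL, with hypothetical table and column names; the degrees-per-cell factors are rough assumptions that only hold near one latitude:

-- Hypothetical table: Locations(ID, Longitude, Latitude).
-- Cell numbers for a roughly 1 km grid: ~0.009 degrees of latitude per km;
-- the longitude factor is an assumption that depends on your latitude.
ALTER TABLE Locations ADD
    CellX AS CAST(FLOOR(Longitude / 0.0145) AS int) PERSISTED,
    CellY AS CAST(FLOOR(Latitude / 0.0090) AS int) PERSISTED;

CREATE INDEX IX_Locations_CellX ON Locations (CellX);
CREATE INDEX IX_Locations_CellY ON Locations (CellY);

-- Fetch the 5x5 block of cells centered on the target location's cell,
-- then apply the exact distance formula to each row that comes back.
DECLARE @cx int = 123, @cy int = 456;  -- the target location's cell numbers (example values)
SELECT ID, Longitude, Latitude
FROM Locations
WHERE CellX BETWEEN @cx - 2 AND @cx + 2
  AND CellY BETWEEN @cy - 2 AND @cy + 2;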
I am using react-chartjs-2. I have a Line chart with x: time and y: value, and two different datasets. The first dataset's values are very low, as low as 0.00000145; the second dataset's values run as high as millions. When I plot them in the same chart, the first dataset sits at the bottom and looks like a single straight line, since the chart scales to fit both datasets. The time axis is the same for both. What is the best way to compare two such datasets in one chart?
I think the best way is to make a single chart that shows the correlation between the two datasets: the division of the first dataset's values by the second's. I think that may be representative.
I'm trying to find locations within some distance of given coordinates. On a table with about 32k records the query takes about 2 seconds, which is way too slow, imo.
It is doing a clustered index scan, which is logical: it has to calculate the distance for every location. However, I still think this should be faster over a data set this small. I do have a spatial index defined, but it's not used, and the query fails if I force it.
Most of the time (~86%) is spent on the Filter that calculates the distance, so I'm looking for ways to optimize that, and I need some help here.
The query I'm using is this:
SELECT Name
FROM Venue
WHERE (Coordinates.STDistance(geography::STPointFromText('POINT(-113.512245 51.498212)', 4326)) / 1000) <= 100
One old approach is to use a BOX first.
From your point, make two points at opposite corners of the box: +R/+R and -R/-R from the center.
Then you can filter: a point has to be in this box AND in the circle you describe.
The box check can run on the index and eliminates most rows.
Simple school geometry: you draw a rectangular box around the circle you describe.
Your current approach cannot use the index because the index does not contain the fields.
Alternatively: draw a circle - do not use a distance calculation. Construct the circle as a polygon of points and test against that.
Or read https://stackoverflow.com/questions/11311363/my-application-is-about-asp-net-using-linq-and-remote-mssql-my-connection-is-be which addresses the same issue.
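A rough sketch of the box-first idea applied to the query above; the box corners are hand-computed assumptions (roughly +-0.91 degrees of latitude and +-1.45 degrees of longitude covers 100 km at this latitude):

DECLARE @center geography = geography::STPointFromText('POINT(-113.512245 51.498212)', 4326);

-- A box just large enough to contain the 100 km circle (corner values are approximate).
DECLARE @box geography = geography::STPolyFromText(
    'POLYGON((-114.97 50.58, -112.06 50.58, -112.06 52.41, -114.97 52.41, -114.97 50.58))', 4326);

SELECT Name
FROM Venue
WHERE Coordinates.Filter(@box) = 1                  -- coarse box check; can use the spatial index
  AND Coordinates.STDistance(@center) <= 100000;    -- exact check, in meters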
My spatial geography index in SQL Server has the following level definitions.
HIGH LOW LOW LOW
The problem is that all of my points are in a city, and thus all of my points fall in a single cell at level 1. As a result the primary filter looks at all points, which means my index efficiency is 0%. I realize that the HIGH grid setting means there are 256 cells. How do I instead use 512 or 1024 cells? 256 just isn't enough for me.
Take a look at this page for the different levels.
Does anyone know how to get a higher value than HIGH?
You need to use a bounding box (see: http://technet.microsoft.com/en-us/library/bb934196(v=sql.105).aspx for information about bounding boxes).
Without a bounding box: the issue is that SQL Server uses a sub-gridding methodology, and the 256 cells together must span the entire space! This means that your HLLL setting is restricting the number of cells you use. Think about it this way: the LLL portion creates 4096 cells for each of the initial cells, and the 256 initial cells must each be the same size. That means that your high-level cells are splitting up too large an area!
Instead, if you put in a bounding box, the total area covered will be reduced, the 4096 grids will be smaller, and splitting that into 256 cells can be sufficient.
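A minimal sketch of declaring such a bounding box when creating the index (table, column, and box values are hypothetical; note that BOUNDING_BOX is only valid with the geometry grid):

-- Hypothetical names; the box covers just the city's extent rather than the whole space.
CREATE SPATIAL INDEX SIdx_City_Points
ON dbo.CityPoints (GeomColumn)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (XMIN = -114.2, YMIN = 50.8, XMAX = -113.6, YMAX = 51.3),
    GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = LOW, LEVEL_3 = LOW, LEVEL_4 = LOW),
    CELLS_PER_OBJECT = 16
);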
I have a table that needs to record geographical points (long, lat) for the whole world. The input data is traditional longitude & latitude (-180, -90, 180, 90).
I created a geography column and want to index it. However, there are many options and MSDN doesn't indicate best practices. I have the following questions:
I assume GRIDS = ( LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = HIGH, LEVEL_4 = HIGH) is best for grids. This would create the maximum possible resolution, ≈ 611.5 m of latitude. I have seen examples with other options. What is best?
Since I am recording only points, I assume CELLS_PER_OBJECT = 1 is correct?
What is the min-to-max range for x & y of GEOGRAPHY_GRID? See #4.
With reference to #3 above, would I need to convert the traditional longitude & latitude (-180, -90, 180, 90) data to whatever range GEOGRAPHY_GRID uses in order to use the grids properly?
1.) and 2.) The important thing to bear in mind is that the same grid is used not only for tessellating the data in the column on which the index is created, but also for whatever query parameter you're using to test that data against.
Consider the following query:
SELECT * FROM Table WHERE GeomColumn.STIntersects(@MyPoly) = 1
Assuming that you've created a spatial index on GeomColumn, the same grid will be applied to @MyPoly in order to perform a primary filter of the results. So, you don't just choose a grid setting based on what's in your table, but also on the sort of query sample that you'll be running against that data. In practice, what is "best" is very subjective based on your data. I'd always recommend you start at MEDIUM, MEDIUM, MEDIUM, MEDIUM, and then try adjusting it from there to see if you get better performance based on empirical tests.
3.) and 4.) You don't set a bounding box for the geography datatype - all geography indexes are implicitly assumed to cover the entire globe. That's one of the reasons that geometry is generally a faster-performing datatype than geography, because the cells of a geometry index can provide higher resolution over a limited geographic area.
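As a starting point along those lines, a geography index sketch (names are hypothetical; note there is no BOUNDING_BOX clause, since a geography index always covers the globe):

CREATE SPATIAL INDEX SIdx_Locations_Geo
ON dbo.Locations (GeoColumn)
USING GEOGRAPHY_GRID
WITH (
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
    CELLS_PER_OBJECT = 16
);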
I found the answer to 3 and 4: the SRID 4326 range is (-180.0000, -90.0000, 180.0000, 90.0000).