Making this spatial query more efficient - sql-server

I have 2 tables:
tZipCodeNoCity with ZipCode and PointGeography
and MBLPosition with Latitude and Longitude
In this query I'm finding the closest ZipCode to each of my positions. It's "poor man's" geocoding :)
How do I write this query so I don't have to do this SELECT TOP 1 inline?
It's pretty slow even with 150 devices (about 20 seconds).
I had to add a 150-mile radius filter to the subselect to make it faster.
SELECT LastPositions.DeviceId, P.Description, P.Latitude, P.Longitude, P.Speed, P.DeviceTime,
(
SELECT TOP 1 ZC.ZipCode
FROM dbo.tZipCodeNoCity ZC
WHERE ZC.PointGeography.STDistance(geography::STPointFromText('POINT(' + CAST(P.Longitude AS VARCHAR(20)) + ' ' + CAST(P.Latitude AS VARCHAR(20)) + ')', 4326)) < 150 * 1609.344
ORDER BY ZC.PointGeography.STDistance(geography::STPointFromText('POINT(' + CAST(P.Longitude AS VARCHAR(20)) + ' ' + CAST(P.Latitude AS VARCHAR(20)) + ')', 4326))
)
FROM dbo.MBLPosition P
INNER JOIN
(
SELECT D.DeviceId, MAX(P.PositionKey) LastPositionKey
FROM dbo.MBLPosition P
INNER JOIN IDATTApplication.dbo.MBLDevice D ON P.DeviceKey = D.DeviceKey
GROUP BY D.DeviceId
) LastPositions ON P.PositionKey = LastPositions.LastPositionKey

In a project I worked on about 12 years ago, I ran a query along these lines to reduce the list of possibilities before doing the actual distance calculation:
WHERE zip.lat < my.lat + 0.5 AND zip.lat > my.lat - 0.5
AND zip.long < my.long + 0.5 AND zip.long > my.long - 0.5
From that subset, I calculate the actual distance between the two points and sort on it. You'll have to adjust the "0.5" portion as appropriate to get a big enough box to be sure you're going to get a hit.
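The box-then-sort idea can be sketched outside the database. The following is a minimal Python illustration (not T-SQL), assuming the zip codes are available as plain `(zip_code, lat, lon)` tuples; the names `haversine_miles` and `nearest_zip`, and the mean Earth radius constant, are assumptions made for this example only.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_zip(lat, lon, zips, box=0.5):
    """Return the closest zip code inside a +/- `box` degree square, or None.

    `zips` is an iterable of (zip_code, lat, lon) tuples.
    """
    # Cheap pre-filter: plain comparisons on latitude and longitude.
    candidates = [z for z in zips
                  if abs(z[1] - lat) < box and abs(z[2] - lon) < box]
    if not candidates:
        return None  # the box was too small to contain any zip code
    # Exact distance is computed only for the few surviving candidates.
    return min(candidates,
               key=lambda z: haversine_miles(lat, lon, z[1], z[2]))[0]
```

A `None` result is the signal to retry with a larger box, which is the adjustment of the "0.5" value mentioned above.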
And I would imagine that there's a better way than STPointFromText to create your point object. Could you use STPointFromWKB? Could you convert to the geography type once?
See this page for an example of creating your point via SET.
DECLARE @p geography;
-- P.Longitude / P.Latitude are the coordinates of the position being geocoded
SET @p = geography::STGeomFromText('POINT(' + CAST(P.Longitude AS VARCHAR(20)) + ' ' + CAST(P.Latitude AS VARCHAR(20)) + ')', 4326);
SELECT TOP 1 ZC.ZipCode
FROM dbo.tZipCodeNoCity ZC
WHERE ZC.PointGeography.STDistance(@p) < 150 * 1609.344
ORDER BY ZC.PointGeography.STDistance(@p)

Related

Apply STDistance to all rows of table?

I'm trying to return all the rows from [store] with distance of less than 10 miles. Table [Store] has a column of type Geography.
I understand how to find the distance between two specific points, something like this:
declare @origin geography
select @origin = geography::STPointFromText('POINT(' + CAST(-73.935242 AS VARCHAR(20)) + ' ' + CAST(40.730610 AS VARCHAR(20)) + ')', 4326)
declare @destination geography
select @destination = geography::STPointFromText('POINT(' + CAST(-93.732666 AS VARCHAR(20)) + ' ' + CAST(30.274096 AS VARCHAR(20)) + ')', 4326)
select @origin.STDistance(@destination) / 1609.344 as 'distance in miles'
I'm having trouble applying this logic to a SELECT statement. Instead of getting the distance between @origin and @destination, I would like to get the distance in miles between @origin and store.Geolocation for all rows.
The STDistance method, called on one Geography instance and passed another, returns the distance between the two points. It can be used with variables, e.g. @Origin.STDistance( @Destination ), with columns, or with a combination of the two, e.g. to find all of the stores within 10 miles of a particular @Origin:
select *
from Store
where @Origin.STDistance( Store.Geolocation ) < 1609.344 * 10.0;
Note: As BenThul pointed out, spatial index handling is a bit fickle. An STDistance compared to a constant is SARGable: @Origin.STDistance( Store.Geolocation ) < 1609.344 * 10.0, but this mathematically equivalent expression is not: @Origin.STDistance( Store.Geolocation ) / 1609.344 < 10.0. This "feature" is documented here.

Paginating a parent in SQL Server on a parent/child query

(SQL Server 2012 - Web Edition)
I have a parent/child (one to many) relationship in a query like so:
SELECT a.a, a.b, b.c
FROM tablea INNER JOIN
tableb ON b.pk = a.fk
I have a huge pagination query that encompasses this using the standard pattern (pseudo-code):
WITH C as (SELECT top(@perpage*@pagenum) DT_RowId = row_number() OVER (somefield)),
SELECT c.* FROM C (query) WHERE DT_RowId > (@pagenum-1)*@perpage
The question I have is in this scenario is it possible to paginate off the parent table (a), instead of the entire query? Can I modify my pagination query (not the sql that pulls the query itself) so that when I ask for 10 rows, it gives me 10 rows from the parent, with 'x' number of children attached?
I know I'm not giving the bigger picture here, but the bigger picture is ugly. If need be, we can go there, but it's out there. Here's a small taste of where we're going with this:
IF UPPER(LEFT(@rSQL, 6)) = 'SELECT'
BEGIN
SET @rSQL = 'SELECT * FROM (' + @rSQL + ')' + ' as rTBL';
SET @rSQL = RIGHT(@rSQL, LEN(@rSQL)-7);
IF (LEN(LTRIM(@search)) > 0)
BEGIN
SET @rPaging =
'IF (@schemaonly=1) SET FMTONLY ON;
SELECT @ttlrows = COUNT(*) FROM (SELECT ' + @rSQL + @rWhere + ') AS TBL;
WITH C as (select top(@perpage*@pagenum) DT_RowId = ROW_NUMBER() OVER (' + @rOrder + '), ';
SET @rPaging = @rPaging + @rSQL + @rWhere + ')
SELECT C.*' + @rcols + ', (@perpage-1) * @pagenum as pagenum, @ttlrows as ct, CEILING(@ttlrows / CAST(@perpage AS FLOAT)) as pages
FROM C '+ @query + ' WHERE DT_RowId > (@pagenum-1) * @perpage ';
END
ELSE
BEGIN
SET @rPaging =
'IF (@schemaonly=1) SET FMTONLY ON;
SELECT @ttlrows = COUNT(*) FROM (' + @oSQL + ') AS SUBQUERY;
WITH C as (select top(@perpage*@pagenum) DT_RowId = ROW_NUMBER() OVER (' + @rOrder + '), ';
SET @rPaging = @rPaging + @rSQL + ')
SELECT C.*' + @rcols + ',(@perpage-1) * @pagenum as pagenum, @ttlrows as ct, CEILING(@ttlrows / CAST(@perpage AS FLOAT)) as pages
FROM C '+ @query + ' WHERE DT_RowId > (@pagenum-1) * @perpage ';
END
PRINT @rPaging;
EXECUTE SP_EXECUTESQL @rPaging, @parms, @ttlrows out, @schemaonly, @perpage, @pagenum, @fksiteID, @filter1, @filter2, @filter3, @filter4, @intfilter1, @intfilter2, @intfilter3, @intfilter4, @datefilter1, @datefilter2, @search;
SET FMTONLY OFF;
END
ELSE
BEGIN
SET @rSQL = LTRIM(REPLACE(UPPER(@rSQL), 'EXEC',''));
EXECUTE SP_EXECUTESQL @rSQL, @parms, @ttlrows out, @schemaonly, @perpage, @pagenum, @fksiteID, @filter1, @filter2, @filter3, @filter4, @intfilter1, @intfilter2, @intfilter3, @intfilter4, @datefilter1, @datefilter2;
END
You could do the pagination in a CTE that only gets the parent rows, and then join the child rows in a subsequent CTE or in the main query.
Due to the dynamic way you are using this, this might have to involve building your pagination query from the same building blocks you use to build #query. Without seeing the code that builds #query I can't be much more specific than that.
You could add
,DENSE_RANK() OVER (ORDER BY table_a.primary_key)
This would indirectly provide the same result as
,ROW_NUMBER() OVER(ORDER BY table_a.primary_key)
but the former would be computed on the final result set instead of going back to table a, as the latter snippet would.
But please be aware of the disadvantage: any additional ranking function forces an additional sort operation on the result set! This might significantly affect query performance. If that is the case in your scenario, I'd recommend following Tab Alleman's solution and using a CTE.
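To make the dense-rank idea concrete, here is a small Python sketch (not T-SQL) of what the ranking achieves on an already joined parent/child result. The function name `page_by_parent` and the dictionary-shaped rows are assumptions for illustration; in the database the same numbering would come from DENSE_RANK() over the parent key.

```python
def page_by_parent(rows, parent_key, page_num, per_page):
    """Return the joined rows whose parent falls on the requested page.

    `rows` is a list of dicts from the parent/child join. Every distinct
    parent key gets one consecutive number (a dense rank), so a page holds
    `per_page` parents regardless of how many children each one has.
    """
    distinct_parents = sorted({row[parent_key] for row in rows})
    rank = {key: i for i, key in enumerate(distinct_parents, start=1)}
    low = (page_num - 1) * per_page   # exclusive lower bound
    high = page_num * per_page        # inclusive upper bound
    return [row for row in rows if low < rank[row[parent_key]] <= high]
```

With three parents that have 2, 1 and 3 children, asking for page 1 with 2 parents per page returns the first three joined rows, and page 2 returns the remaining three.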

Create a custom sequence with letter prefix

I have a sequence in SQL Server
CREATE SEQUENCE dbo.NextBusinessValue
START WITH 1
INCREMENT BY 1 ;
GO
And I'd like to use this to generate a 5 digit custom reference number that uses this sequence to create the number in the format A0000.
The rules for the reference number are that:
1-9999 would be A0001 - A9999
10000-19999 would be B0000 - B9999
20000-29999 would be C0000 - C9999 etc...
It won't ever get to the amount of data that requires going past Z.
I know I can get a letter by using:
SELECT CHAR(65)
So this would work for 1-9999:
declare @n int = 9999
SELECT CHAR(65) + right('0000' + convert(varchar(10), @n), 4)
But would fail when it reaches 10000.
What methods can be used to increment the letter each time the sequence hits the next block of 10000?
UPDATE AND WARNING
Having a primary key and a business key used for display, invoicing is very common. The business key has to be stored and indexed because business users will use it to search for records, documents etc. You shouldn't use the business key as the primary key though.
ORIGINAL
You already get the first digit with @n/10000 (integer division). Add that to 65 to get the first letter.
To get the remainder you can perform a modulo operation, @n % 10000, and format the result as a zero-padded string:
select char(65 + @n/10000) + format(@n % 10000, 'd4')
Sequences and FORMAT were both introduced in SQL Server 2012, so wherever the sequence exists FORMAT is available too.
9999 will return A9999, 19999 will return B9999 etc.
The scale can be a parameter itself, as long as the digit count in the format string matches it:
select char(65 + @n/@scale) + format(@n % @scale, 'd4')
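The arithmetic is easy to check outside SQL Server. This is a minimal Python sketch of the same rule (letter from integer division, digits from the remainder, zero-padded); the function name `reference_number` and the range check are assumptions added for the example.

```python
def reference_number(n, scale=10000):
    """Format a sequence value as letter + zero-padded remainder, e.g. B0000."""
    if not 0 <= n < 26 * scale:
        raise ValueError("value does not fit in the range A..Z")
    width = len(str(scale - 1))             # 4 digits for a scale of 10000
    letter = chr(ord("A") + n // scale)     # integer division picks the block
    return f"{letter}{n % scale:0{width}d}"  # remainder, padded with zeros
```

For example, `reference_number(10000)` gives `B0000`, and the last value that still fits is 259999, which gives `Z9999`.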
Personally I would handle this either in your display code or add it as a computed field either to the table or view.
This would work up to Z:
declare @n int = 9999
-- Gives A9999
SELECT CHAR(@n / 10000 + 65 ) + right('0000' + convert(varchar(10), @n), 4)
SET @n = 10000
-- Gives B0000
SELECT CHAR(@n / 10000 + 65 ) + right('0000' + convert(varchar(10), @n), 4)
SET @n = 10001
-- Gives B0001
SELECT CHAR(@n / 10000 + 65 ) + right('0000' + convert(varchar(10), @n), 4)
SET @n = 20001
-- Gives C0001
SELECT CHAR(@n / 10000 + 65 ) + right('0000' + convert(varchar(10), @n), 4)
SET @n = 200001
-- Gives U0001
SELECT CHAR(@n / 10000 + 65 ) + right('0000' + convert(varchar(10), @n), 4)
SET @n = 300001
-- Gives _0001
SELECT CHAR(@n / 10000 + 65 ) + right('0000' + convert(varchar(10), @n), 4)
Something like this?
DECLARE @n INT = 9999;
WHILE @n < 26000
BEGIN
SELECT CHAR(65 + CONVERT(INT, @n / 10000)) + RIGHT('0000' + CONVERT(VARCHAR(10), @n), 4);
SELECT @n = @n + 1;
END;
(edited)
You should not use this as the primary key, but rather calculate the formatted value for output on the fly. For faster searches I'd recommend using the following to calculate a persisted computed column, which you can then index.
DECLARE @mockingTbl TABLE(SomeSeqValue INT);
INSERT INTO @mockingTbl VALUES(0),(1),(999),(1000),(9999),(10000),(12345),(50000);
SELECT A.NumeralPart
,B.Rest
,C.StartLetter
,C.StartLetter+REPLACE(STR(A.NumeralPart,4),' ','0') AS YourCode
FROM @mockingTbl AS m
CROSS APPLY(SELECT m.SomeSeqValue % 10000 AS NumeralPart) AS A
-- divide by 10000 (the block size) so that 10000 maps to B, 20000 to C, ...
CROSS APPLY(SELECT (m.SomeSeqValue-A.NumeralPart)/10000 AS Rest) AS B
CROSS APPLY(SELECT CHAR(B.Rest + ASCII('A'))) AS C(StartLetter)

SQL Server: Calculate the Radius of a Lat/Long?

Say I have the latitude and longitude of a city and I need to find all the airports that are within 100 miles of this location. How would I accomplish this? My data resides in SQL Server. One table has all the city info with lat and long, and the other has the airport info with lat and long.
First ... convert city's data point
DECLARE @point geography;
-- WKT points are written as POINT(longitude latitude)
SELECT @point = geography::STPointFromText('POINT(' + CAST(@lon AS VARCHAR(20)) + ' ' +
CAST(@lat AS VARCHAR(20)) + ')', 4326);
where @lat and @lon are the latitude and longitude of the city in question.
Then you can query the table ...
SELECT [column1],[column2],[etc]
FROM [table]
WHERE @point.STBuffer(160934.4).STIntersects(geography::STPointFromText(
'POINT(' + CAST([lon] AS VARCHAR(20)) + ' ' +
CAST([lat] AS VARCHAR(20)) + ')', 4326) ) = 1;
where 160934.4 is the number of meters in 100 miles.
This will be slow, though. If you wanted to do even more spatial work, you could add a persisted computed column (because lat and lon points aren't really going to change) and then use a spatial index.
ALTER TABLE [table]
ADD geo_point AS geography::STPointFromText('POINT(' + CAST([lon] AS VARCHAR(20))
+ ' ' + CAST([lat] AS VARCHAR(20)) + ')', 4326) PERSISTED;
-- geography indexes do not take a BOUNDING_BOX; the table needs a clustered primary key
CREATE SPATIAL INDEX spix_table_geopt
ON [table](geo_point);
I used/wrote this several years ago, and it was close enough for what I needed. Part of the formula takes into account the curvature of the earth if I remember correctly, but it has been a while. I used zip codes, but you could easily adapt for cities instead - same logic.
ALTER PROCEDURE [dbo].[sp_StoresByZipArea] (@zip nvarchar(5), @Radius float) AS
DECLARE @LatRange float
DECLARE @LongRange float
DECLARE @LowLatitude float
DECLARE @HighLatitude float
DECLARE @LowLongitude float
DECLARE @HighLongitude float
DECLARE @iStartLat float
DECLARE @iStartLong float
SELECT @iStartLat = Latitude, @iStartLong = Longitude FROM ZipCodes WHERE ZipCode = @zip
-- 6076 ft per nautical mile / 5280 ft per mile * 60 nautical miles per degree; the decimal points avoid integer division
SELECT @LatRange = @Radius / ((6076. / 5280.) * 60)
SELECT @LongRange = @Radius / (((cos((@iStartLat * 3.141592653589 / 180)) * 6076.) / 5280.) * 60)
SELECT @LowLatitude = @iStartLat - @LatRange
SELECT @HighLatitude = @iStartLat + @LatRange
SELECT @LowLongitude = @iStartLong - @LongRange
SELECT @HighLongitude = @iStartLong + @LongRange
/** Now you can create a SQL statement which limits the recordset of cities in this manner: **/
SELECT * FROM ZipCodes
WHERE (Latitude <= @HighLatitude) AND (Latitude >= @LowLatitude) AND (Longitude >= @LowLongitude) AND (Longitude <= @HighLongitude)
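The range arithmetic in the procedure above can be checked in isolation. This Python sketch (not T-SQL) repeats the same formula: one degree of latitude is taken as 60 nautical miles, and a degree of longitude shrinks with the cosine of the latitude. The function name `bounding_box` is an assumption for the example, and the formula is not meaningful very close to the poles, where the cosine approaches zero.

```python
import math

# 6076 ft per nautical mile / 5280 ft per statute mile * 60 nautical miles
# per degree of latitude: roughly 69.05 statute miles per degree.
MILES_PER_DEGREE_LAT = 6076.0 / 5280.0 * 60.0


def bounding_box(lat, lon, radius_miles):
    """Return (low_lat, high_lat, low_lon, high_lon) in degrees."""
    lat_range = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude covers fewer miles the further from the equator.
    lon_range = radius_miles / (math.cos(math.radians(lat)) * MILES_PER_DEGREE_LAT)
    return (lat - lat_range, lat + lat_range, lon - lon_range, lon + lon_range)
```

Note that the result is a rectangle, not a circle, so points near its corners are slightly further away than the requested radius.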

How do I easily find IDENTITY columns in danger of overflowing?

My database is getting old, and one of my biggest INT IDENTITY columns has a value around 1.3 billion. This will overflow around 2.1 billion. I plan on increasing its size, but I don't want to do it too soon because of the number of records in the database. I may replace my database hardware before I increase the column size, which could offset any performance problems this could cause. I also want to keep an eye on all the other columns in my databases that are more than 50% full. It's a lot of tables, and checking each one manually is not practical.
This is how I am getting the value now (I know the value returned may be slightly out of date, but it's good enough for my purposes):
PRINT IDENT_CURRENT('MyDatabase.dbo.MyTable')
Can I use the INFORMATION_SCHEMA to get this information?
You can consult the sys.identity_columns system catalog view:
SELECT
name,
seed_value, increment_value, last_value
FROM sys.identity_columns
This gives you the name, seed, increment and last value for each column. The view also contains the data type, so you can easily figure out which identity columns might be running out of numbers soonish...
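Once last_value and the data type are known, the "how full is it" arithmetic is a simple ratio. The following Python sketch shows that calculation for the integer types; the function names and the dictionary of maximum values are assumptions for illustration, and it assumes identities that start at a positive seed and count upwards.

```python
# Largest value each SQL Server integer type can hold.
MAX_VALUES = {
    "tinyint": 255,
    "smallint": 32767,
    "int": 2147483647,
    "bigint": 9223372036854775807,
}


def percent_full(data_type, last_value):
    """Percentage of the positive range already used by an identity column."""
    return 100.0 * last_value / MAX_VALUES[data_type.lower()]


def remaining(data_type, last_value):
    """Number of identity values left before the column overflows."""
    return MAX_VALUES[data_type.lower()] - last_value
```

For the INT column from the question, a last value of 1.3 billion comes out at roughly 60 percent full.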
I created a stored procedure to solve this problem. It uses the INFORMATION_SCHEMA to find the IDENTITY columns, and then uses IDENT_CURRENT and the column's DATA_TYPE to calculate the percent full. Specify the database as the first parameter, and then optionally the minimum percent and data type.
EXEC master.dbo.CheckIdentityColumns 'MyDatabase' --all
EXEC master.dbo.CheckIdentityColumns 'MyDatabase', 50 --columns 50% full or greater
EXEC master.dbo.CheckIdentityColumns 'MyDatabase', 50, 'int' --only int columns
Example output:
Table                  Column    Type  Percent Full  Remaining
---------------------  --------  ----  ------------  -------------
MyDatabase.dbo.Table1  Table1ID  int   9             1,937,868,393
MyDatabase.dbo.Table2  Table2ID  int   5             2,019,944,894
MyDatabase.dbo.Table3  Table3ID  int   9             1,943,793,775
I created a reminder to check all my databases once per month, and I log this information in a spreadsheet.
CheckIdentityColumns Procedure
USE master
GO
CREATE PROCEDURE dbo.CheckIdentityColumns
(
@Database AS NVARCHAR(128),
@PercentFull AS TINYINT = 0,
@Type AS VARCHAR(8) = NULL
)
AS
--this procedure assumes you are not using negative numbers in your identity columns
DECLARE @Sql NVARCHAR(MAX)
SET @Sql =
'USE ' + @Database + '
SELECT
[Column].TABLE_CATALOG + ''.'' +
[Column].TABLE_SCHEMA + ''.'' +
[Table].TABLE_NAME AS [Table],
[Column].COLUMN_NAME AS [Column],
[Column].DATA_TYPE AS [Type],
CAST((
CASE LOWER([Column].DATA_TYPE)
WHEN ''tinyint''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 255)
WHEN ''smallint''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 32767)
WHEN ''int''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 2147483647)
WHEN ''bigint''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 9223372036854775807)
WHEN ''decimal''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / (POWER(CAST(10 AS DECIMAL(38, 0)), [Column].NUMERIC_PRECISION) - 1))
END * 100) AS INT) AS [Percent Full],
REPLACE(CONVERT(VARCHAR(19), CAST(
CASE LOWER([Column].DATA_TYPE)
WHEN ''tinyint''
THEN (255 - IDENT_CURRENT([Table].TABLE_NAME))
WHEN ''smallint''
THEN (32767 - IDENT_CURRENT([Table].TABLE_NAME))
WHEN ''int''
THEN (2147483647 - IDENT_CURRENT([Table].TABLE_NAME))
WHEN ''bigint''
THEN (9223372036854775807 - IDENT_CURRENT([Table].TABLE_NAME))
WHEN ''decimal''
THEN ((POWER(CAST(10 AS DECIMAL(38, 0)), [Column].NUMERIC_PRECISION) - 1) - IDENT_CURRENT([Table].TABLE_NAME))
END
AS MONEY) , 1), ''.00'', '''') AS Remaining
FROM
INFORMATION_SCHEMA.COLUMNS AS [Column]
INNER JOIN
INFORMATION_SCHEMA.TABLES AS [Table]
ON [Table].TABLE_NAME = [Column].TABLE_NAME
WHERE
COLUMNPROPERTY(
OBJECT_ID([Column].TABLE_NAME),
[Column].COLUMN_NAME, ''IsIdentity'') = 1 --true
AND [Table].TABLE_TYPE = ''Base Table''
AND [Table].TABLE_NAME NOT LIKE ''dt%''
AND [Table].TABLE_NAME NOT LIKE ''MS%''
AND [Table].TABLE_NAME NOT LIKE ''syncobj_%''
AND CAST(
(
CASE LOWER([Column].DATA_TYPE)
WHEN ''tinyint''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 255)
WHEN ''smallint''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 32767)
WHEN ''int''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 2147483647)
WHEN ''bigint''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / 9223372036854775807)
WHEN ''decimal''
THEN (IDENT_CURRENT([Table].TABLE_NAME) / (POWER(CAST(10 AS DECIMAL(38, 0)), [Column].NUMERIC_PRECISION) - 1))
END * 100
) AS INT) >= ' + CAST(@PercentFull AS VARCHAR(4))
IF (@Type IS NOT NULL)
SET @Sql = @Sql + ' AND LOWER([Column].DATA_TYPE) = ''' + LOWER(@Type) + ''''
SET @Sql = @Sql + '
ORDER BY
[Column].TABLE_CATALOG + ''.'' +
[Column].TABLE_SCHEMA + ''.'' +
[Table].TABLE_NAME,
[Column].COLUMN_NAME'
EXECUTE sp_executesql @Sql
GO
Keith Walton has a very comprehensive query that is very good. Here's a little simpler one that is based on the assumption that the identity columns are all integers:
SELECT sys.tables.name AS [Table Name],
last_value AS [Last Value],
MAX_LENGTH,
CAST(cast(last_value as int) / 2147483647.0 * 100.0 AS DECIMAL(5,2))
AS [Percentage of ID's Used],
2147483647 - cast(last_value as int) AS Remaining
FROM sys.identity_columns
INNER JOIN sys.tables
ON sys.identity_columns.object_id = sys.tables.object_id
ORDER BY last_value DESC
The results will look like this:
Table Name  Last Value  MAX_LENGTH  Percentage of ID's Used  Remaining
My_Table    49181800    4           2.29                     2098301847
Checking Integer Identity Columns
While crafting a solution for this problem, we found this thread both informative and interesting (we also wrote a detailed post about this and described how our tool works).
In our solution we query information_schema to get a list of all columns (the query below uses MySQL syntax, not T-SQL). A program then goes through each column and computes its maximum and minimum values (we account for both overflow and underflow).
SELECT
b.COLUMN_NAME,
b.COLUMN_TYPE,
b.DATA_TYPE,
b.signed,
a.TABLE_NAME,
a.TABLE_SCHEMA
FROM (
-- get all tables
SELECT
TABLE_NAME, TABLE_SCHEMA
FROM information_schema.tables
WHERE
TABLE_TYPE IN ('BASE TABLE', 'VIEW') AND
TABLE_SCHEMA NOT IN ('mysql', 'performance_schema')
) a
JOIN (
-- get information about columns types
SELECT
TABLE_NAME,
COLUMN_NAME,
COLUMN_TYPE,
TABLE_SCHEMA,
DATA_TYPE,
(!(LOWER(COLUMN_TYPE) REGEXP '.*unsigned.*')) AS signed
FROM information_schema.columns
) b ON a.TABLE_NAME = b.TABLE_NAME AND a.TABLE_SCHEMA = b.TABLE_SCHEMA
ORDER BY a.TABLE_SCHEMA DESC;
