Getting Largest Distance from group of GPS Coordinates - sql-server

I have a database with multiple rows of GPS coordinates. I know how to calculate the distance from a given lat/lng to any one of them, but what I want to do is look at the coordinates of a set of rows and get the two rows that are farthest apart. I'd love to do this in SQL, but doing it in my application code would work too. Here is what I am doing to calculate the distance between two points:
ROUND(( 3960 * acos( cos( radians( :lat ) ) *
cos( radians( p.latitude ) ) * cos( radians( p.longitude ) - radians( :lng ) ) +
sin( radians( :lat ) ) * sin( radians( p.latitude ) ) ) ),1) AS distance
What we are trying to do is look at the GPS data for a specific user and make sure they aren't moving wildly all over the country. All the coordinates for a user should be within a couple of miles of each other at most. Coordinates scattered all over the country are a flag that there is malicious activity in our system. So I'd like to be able to quickly run through the data for a specific user and find the maximum distance between any two of their readings.
I thought about just running a Max/Min on the lat and lng separately and set an internal threshold for what is acceptable. And maybe that is easier, but if what I asked in the first part is possible, that would be best.
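The Max/Min idea amounts to a bounding-box check per user. A minimal sketch, assuming a hypothetical table dbo.UserGPS(UserId, Lat, Lon) and an arbitrary threshold of 0.05 degrees (one degree of latitude is roughly 69 miles, so about 3.5 miles north-south; a degree of longitude covers less ground the farther you are from the equator):

```sql
-- Hypothetical table name and threshold; adjust both to the real schema.
DECLARE @MaxSpanDegrees DECIMAL(9,6) = 0.05;

SELECT
    UserId,
    MAX(Lat) - MIN(Lat) AS LatSpanDegrees,
    MAX(Lon) - MIN(Lon) AS LonSpanDegrees
FROM dbo.UserGPS
GROUP BY UserId
-- Keep only users whose readings spread beyond the threshold on either axis
HAVING MAX(Lat) - MIN(Lat) > @MaxSpanDegrees
    OR MAX(Lon) - MIN(Lon) > @MaxSpanDegrees;
```

This only flags suspicious users cheaply. It does not identify the two farthest rows, and it ignores the wrap-around at ±180 degrees longitude.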

If you have SQL Server 2008 or later then you can use GEOGRAPHY to calculate the distance, e.g.:
DECLARE @lat1 DECIMAL(19,6) = 44.968046;
DECLARE @lon1 DECIMAL(19,6) = -94.420307;
DECLARE @lat2 DECIMAL(19,6) = 44.33328;
DECLARE @lon2 DECIMAL(19,6) = -89.132008;
--With SRID 4326 the result of STDistance is in metres
SELECT GEOGRAPHY::Point(@lat1, @lon1, 4326).STDistance(GEOGRAPHY::Point(@lat2, @lon2, 4326));
This makes the problem pretty trivial?
For a set of lats/ longs for a user you would need to calculate the distance between each set and then return the highest distance. Putting this all together, you could probably do something like this:
DECLARE @UserGPS TABLE (
UserId INT, --the user
GPSId INT, --the incrementing unique id associated with this GPS reading (could link to a table with more details, e.g. time, date)
Lat DECIMAL(19,6), --latitude
Lon DECIMAL(19,6)); --longitude
INSERT INTO @UserGPS SELECT 1, 1, 44.968046, -94.420307; --User #1 goes on a very long journey
INSERT INTO @UserGPS SELECT 1, 2, 44.33328, -89.132008;
INSERT INTO @UserGPS SELECT 1, 3, 34.12345, -92.21369;
INSERT INTO @UserGPS SELECT 1, 4, 44.978046, -94.430307;
INSERT INTO @UserGPS SELECT 2, 1, 44.968046, -94.420307; --User #2 doesn't get far
INSERT INTO @UserGPS SELECT 2, 2, 44.978046, -94.430307;
--Make a working table to store the distances between each set of co-ordinates
--This isn't strictly necessary; we could change this into a common-table expression
DECLARE @WorkTable TABLE (
UserId INT, --the user
GPSIdFrom INT, --the id of the first set of co-ordinates
GPSIdTo INT, --the id of the second set of co-ordinates being compared
Distance NUMERIC(19,6)); --the distance in metres
--Get the distance between each and every combination of co-ordinates for each user
INSERT INTO
@WorkTable
SELECT
c1.UserId,
c1.GPSId,
c2.GPSId,
GEOGRAPHY::Point(c1.Lat, c1.Lon, 4326).STDistance(GEOGRAPHY::Point(c2.Lat, c2.Lon, 4326))
FROM
@UserGPS c1
INNER JOIN @UserGPS c2 ON c2.UserId = c1.UserId AND c2.GPSId > c1.GPSId;
--Note this is a self-join, but single-tailed. So we compare each set of co-ordinates to each other set of co-ordinates for a user
--This is handled by the "c2.GPSId > c1.GPSId" in the JOIN clause
--As an example, say we have three sets of co-ordinates for a user
--We would compare set #1 to set #2
--We would compare set #1 to set #3
--We would compare set #2 to set #3
--We wouldn't compare set #3 to anything (as we already did this)
--Determine the maximum distance between all the GPS co-ordinates per user
WITH MaxDistance AS (
SELECT
UserId,
MAX(Distance) AS Distance
FROM
@WorkTable
GROUP BY
UserId)
--Report the results
SELECT
w.UserId,
g1.GPSId,
g1.Lat,
g1.Lon,
g2.GPSId,
g2.Lat,
g2.Lon,
md.Distance AS MaxDistance
FROM
MaxDistance md
INNER JOIN @WorkTable w ON w.UserId = md.UserId AND w.Distance = md.Distance
INNER JOIN @UserGPS g1 ON g1.UserId = md.UserId AND g1.GPSId = w.GPSIdFrom
INNER JOIN @UserGPS g2 ON g2.UserId = md.UserId AND g2.GPSId = w.GPSIdTo;
Results are:
UserId  GPSId  Lat        Lon         GPSId  Lat        Lon         MaxDistance
1       3      34.123450  -92.213690  4      44.978046  -94.430307  1219979.460185
2       1      44.968046  -94.420307  2      44.978046  -94.430307  1362.820895
Now I made a LOT of assumptions about what data you are holding as there was no information about the detail of this in your question. You would probably need to adapt this to some degree?
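As the comment in the script mentions, the working table could become a common-table expression. A rough sketch of that variant, assuming the readings live in a hypothetical permanent table dbo.UserGPS with the same columns as the table variable above; it also converts the result from metres to miles to match the units in the question:

```sql
-- dbo.UserGPS(UserId, GPSId, Lat, Lon) is an assumed table name, not from the question.
WITH Pairs AS (
    -- One row per unordered pair of readings for the same user
    SELECT
        c1.UserId,
        c1.GPSId AS GPSIdFrom,
        c2.GPSId AS GPSIdTo,
        GEOGRAPHY::Point(c1.Lat, c1.Lon, 4326).STDistance(GEOGRAPHY::Point(c2.Lat, c2.Lon, 4326)) AS DistanceMetres
    FROM dbo.UserGPS AS c1
    INNER JOIN dbo.UserGPS AS c2
        ON c2.UserId = c1.UserId AND c2.GPSId > c1.GPSId
),
Ranked AS (
    -- Rank the pairs per user, farthest apart first
    SELECT
        UserId, GPSIdFrom, GPSIdTo, DistanceMetres,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DistanceMetres DESC) AS rn
    FROM Pairs
)
SELECT
    UserId,
    GPSIdFrom,
    GPSIdTo,
    DistanceMetres,
    DistanceMetres / 1609.344 AS DistanceMiles -- 1 mile = 1609.344 metres
FROM Ranked
WHERE rn = 1;
```

Unlike the MAX-and-join-back approach, ROW_NUMBER returns exactly one pair per user even when two pairs tie for the maximum distance. The number of pairs still grows quadratically with the readings per user.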

Related

How do I produce the expected table?

I'm trying to get the average of the min and max rates for each category in the position column. This record set is only a sample; there are many more records.
Given:
state  position         minrate  maxrate
ny     admin assistant  12.5000  14.5000
ny     office manager   20.5000  25.5000
ca     admin assistant  13.5000  15.5000
ca     office manager   21.5000  26.5000
al     admin assistant  11.5000  13.5000
al     office manager   19.5000  24.5000
Expected:
position         ny_min   ny_max   ca_min   ca_max   al_min   al_max   avg_min  avg_max
admin assistant  12.5000  14.5000  13.5000  15.5000  11.5000  13.5000  12.5000  14.5000
office manager   20.5000  25.5000  21.5000  26.5000  19.5000  24.5000  20.5000  25.5000
Code:
declare @jobs table (
[state] nvarchar(25),
[position] nvarchar(25),
[minrate] decimal(18,4),
[maxrate] decimal(18,4)
)
insert @jobs
values
('ny','admin assistant',12.5, 14.5),
('ny','office manager',20.5, 25.5),
('ca','admin assistant',13.5, 15.5),
('ca','office manager',21.5, 26.5),
('al','admin assistant',11.5, 13.5),
('al','office manager',19.5, 24.5)
select * from @jobs
In order to dynamically create field names, you will need to utilize dynamic SQL. To pair that with the additional aggregates that you need (the total avg min/max), you will need to perform an additional query across all rows and combine them.
Dynamic SQL runs in its own scope, where a table variable declared in the calling batch is not visible, so for the purpose of your example I have swapped your table variable @jobs for a global temp table ##tmpjobs. Assuming that you are actually pulling this from a database table in the real world, you can simply swap ##tmpjobs for your real table.
I accomplish this in the below example by unpivoting all state min/max values, adding unpivoted values for the total avg min/max, and then performing a single (fairly standard) PIVOT command.
/*get list of columns that we want for our pivot*/
DECLARE @ColumnList nvarchar(max) = CONCAT((
SELECT
STRING_AGG(state_list.min_max_title,N',') WITHIN GROUP (ORDER BY state_list.min_max_title DESC) AS ColumnList
FROM
(SELECT
CONCAT(j.[state],N'_min,',j.[state],N'_max') AS min_max_title
FROM
##tmpjobs AS j
GROUP BY
j.[state]) AS state_list
),N',avg_min,avg_max');
/*build pivot query*/
DECLARE @Sql nvarchar(max) = CONCAT(
N'SELECT
pvt.*
FROM
/*subquery to unpivot data for min/max values for each state and the two totals*/
(/*add in min for each state*/
SELECT
CONCAT(j.[state],N''_min'') AS ColumnName
,j.position AS position
,j.minrate AS Amount
FROM
##tmpjobs AS j
UNION ALL
/*add max for each state*/
SELECT
CONCAT(j.[state],N''_max'') AS ColumnName
,j.position AS position
,j.maxrate AS Amount
FROM
##tmpjobs AS j
UNION ALL
/*add total min/max rows*/
SELECT
CASE /*conditionally return max/min column name*/
WHEN row_mult.RowId = 1 THEN ''avg_min''
WHEN row_mult.RowId = 2 THEN ''avg_max''
END AS ColumnName
,total_avgs.position
,CASE /*conditionally return max/min value*/
WHEN row_mult.RowId = 1 THEN total_avgs.AvgMinRate
WHEN row_mult.RowId = 2 THEN total_avgs.AvgMaxRate
END
FROM
/*subquery to calculate the total for all states for each position*/
(SELECT
j.position
,AVG(j.minrate) AS AvgMinRate
,AVG(j.maxrate) AS AvgMaxRate
FROM
##tmpjobs AS j
GROUP BY
j.position) AS total_avgs
/*generate an extra row per position*/
OUTER APPLY (SELECT 1 AS RowId
UNION ALL
SELECT 2 AS RowId) AS row_mult) AS src
PIVOT
(MAX(Amount) FOR ColumnName IN (',@ColumnList,N')) AS pvt');
/*now run query*/
EXEC sys.sp_executesql @stmt = @Sql;
The only thing to really note here is that the state columns are currently in reverse alphabetical order (Z to A) to match your expected output. You can change that to A to Z by changing DESC to ASC in the WITHIN GROUP clause, or use any other order you please by changing what the @ColumnList variable outputs.
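Because the column names come from data, it may be worth wrapping each one in QUOTENAME so that a state value containing a space or a bracket cannot break the generated PIVOT. A sketch of that variation on the column-list step, assuming ##tmpjobs already exists as in the answer:

```sql
-- Assumes the global temp table ##tmpjobs from the answer already exists.
DECLARE @ColumnList nvarchar(max) = CONCAT((
    SELECT
        STRING_AGG(
            CONCAT(QUOTENAME(CONCAT(s.[state], N'_min')), N',',
                   QUOTENAME(CONCAT(s.[state], N'_max'))),
            N',') WITHIN GROUP (ORDER BY s.[state] DESC)
    FROM (SELECT DISTINCT j.[state] FROM ##tmpjobs AS j) AS s
), N',', QUOTENAME(N'avg_min'), N',', QUOTENAME(N'avg_max'));

-- Inspect the generated list before splicing it into PIVOT ... IN (...)
SELECT @ColumnList AS ColumnList;
```

The bracketed names still match the ColumnName values produced by the unpivot subquery, since the brackets are only delimiters and not part of the identifier.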

How to use recursive CTE to add resolution to a data set

I'm attempting to create a recursive CTE statement that adds blank rows in between data points, which will later be filled in by interpolation. I'm a beginner with SQL and this is my first time using CTEs, and I am having some difficulty finding the proper way to do this.
I've attempted a few slight variations on the code below after some research, but I haven't grasped it well enough to see my issue yet. The following code should simulate sparse sampling by taking an observation every 4 hours from the sample data set, and the second portion should add rows with their respective x values every 0.1 of an hour, which will later be filled with interpolated values derived from a cubic spline.
--Sample Data
create table #temperatures (hour integer, temperature double precision);
insert into #temperatures (hour, temperature) values
(0,18.5),
(1,16.9),
(2,15.3),
(3,14.1),
(4,13.8),
(5,14.7),
(6,14.7),
(7,13.5),
(8,12.2),
(9,11.4),
(10,10.9),
(11,10.5),
(12,12.3),
(13,16.4),
(14,22.3),
(15,27.2),
(16,31.1),
(17,34),
(18,35.6),
(19,33.1),
(20,25.1),
(21,21.3),
(22,22.3),
(23,20.3),
(24,18.4),
(25,16.8),
(26,15.6),
(27,15.4),
(28,14.7),
(29,14.1),
(30,14.2),
(31,14),
(32,13.9),
(33,13.9),
(34,13.6),
(35,13.1),
(36,15),
(37,18.2),
(38,21.8),
(39,24.1),
(40,25.7),
(41,29.9),
(42,28.9),
(43,31.7),
(44,29.4),
(45,30.7),
(46,29.9),
(47,27);
--1
WITH xy (x,y)
AS
(
SELECT TOP 12
CAST(hour AS double precision) AS x
,temperature AS y
FROM #temperatures
WHERE cast(hour as integer) % 4 = 0
)
Select x,y
INTO #xy
FROM xy
Select [x] As [x_input]
INTO #x_series
FROM #xy
--2
with recursive
, x_series(input_x) as (
select
min(x)
from
#xy
union all
select
input_x + 0.1
from
x_series
where
input_x + 0.1 < (select max(x) from x)
)
, x_coordinate as (
select
input_x
, max(x) over(order by input_x) as previous_x
from
x_series
left join
#xy on abs(x_series.input_x - xy.x) < 0.001
)
The first CTE works as expected and produces a list of 12 rows (a sample every 4 hours for two days), but the second produces a syntax error. The expected output would be something like
(4,13.8), (4.1,null/0), (4.2,null/0),....., (8,12.2)
I don't think you need a recursive CTE.
What about this:
SELECT DISTINCT n = number *1.0 /10 , #xy.x, #xy.y
FROM master..[spt_values] step
LEFT JOIN #xy
ON step.number*1.0 /10 = #xy.x
WHERE number BETWEEN 40 AND 480
This 480 is based on the two days you mention.
You don't even need the temp table:
SELECT DISTINCT n = number *1.0 /10 , #temperatures.temperature
FROM master..[spt_values] step
LEFT JOIN #temperatures
ON step.number *1.0 / 10 = #temperatures.hour
AND #temperatures.hour % 4 = 0
WHERE number BETWEEN 40 AND 480;
I don't think you need a recursive CTE here. I think a solution like this would be a better approach. Modify accordingly.
--Note: hour must be able to hold fractional values (the sample table declares it as integer)
DECLARE @max_value FLOAT =
(SELECT MAX(hour) FROM #temperatures) * 10
INSERT INTO #temperatures (hour, temperature)
SELECT X.N / 10, NULL
FROM (
select CAST(ROW_NUMBER() over(order by t1.number) AS FLOAT) AS N
from master..spt_values t1
cross join master..spt_values t2
) X
WHERE X.N <= @max_value
AND X.N / 10 NOT IN (SELECT hour FROM #temperatures)
Using the temp table #xy produced in step --1, the following will give you an x series. The series has more than 100 steps, so the default recursion limit of 100 has to be raised:
;with x_series(input_x)
as
(
select min(x) AS input_x
from #xy
union all
select input_x + 0.1
from x_series
where input_x + 0.1 < (select max(x) from #xy)
)
SELECT * FROM x_series
OPTION (MAXRECURSION 1000);
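To get the shape asked for in the question, (x, y) with NULL where there is no sample, the series can be joined back to #xy. The sketch below is one possible way to do that, assuming #xy from step --1 exists; it counts in integer tenths of an hour rather than repeatedly adding 0.1, so that accumulated floating-point error cannot make the join miss a sample. The variable and CTE names are illustrative only.

```sql
-- Assumes #xy(x, y) from step --1 already exists.
-- Work in integer tenths of an hour to avoid floating-point drift.
DECLARE @lo int = (SELECT CAST(MIN(x) * 10 AS int) FROM #xy);
DECLARE @hi int = (SELECT CAST(MAX(x) * 10 AS int) FROM #xy);

WITH steps(n) AS (
    SELECT @lo
    UNION ALL
    SELECT n + 1 FROM steps WHERE n < @hi
)
SELECT
    s.n / 10.0 AS input_x,
    xy.y               -- NULL wherever no sample exists at this x
FROM steps AS s
LEFT JOIN #xy AS xy
    ON CAST(xy.x * 10 AS int) = s.n
ORDER BY s.n
OPTION (MAXRECURSION 1000); -- the default limit of 100 is too low for this many steps
```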

Should I use a cursor for this?

I have a table with three fields. Group number, X-coord and Y-coord. There can be from 0 to about 10 rows within each group number.
What I want to do is calculate the maximum and minimum distance between points within each group. Obviously, this will only give you a value if there are 2 or more rows within that group.
Output should consist of fields: group number, minDistance, maxDistance.
Is a cursor a good solution for this?
(Coordinates are in WGS84 and I have a working formula for calculating distances)
My reasoning for using a cursor is that I cannot avoid doing a cross join for each group and then applying the formula for each result of the cross join.
I wouldn't use a cursor in your situation. I would prefer an inline table-valued user-defined function that takes the group number as an argument and calculates the minimum and maximum distance for that group.
Please note the calculation algorithm inside the function is much simpler than what you may have.
create table dist (groupId int, X int, Y int)
insert into dist(groupid, x, y) values (1,14,20),(1,11,20),(1,10,22),(1,12,24),(1,11,28),(1,19,78)
insert into dist(groupid, x, y) values (2,10,20),(2,11,20),(2,10,22),(2,12,24),(2,11,28),(2,17,52)
GO
create function dbo.getMinMaxDistanceForGroup (@groupId int)
returns table as return (
select MIN(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MinDistance,
MAX(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MaxDistance
from dist a cross join dist b
where a.groupId = @groupId and b.groupId = @groupId
-- skip a point paired with itself, otherwise MinDistance is always 0 (this also skips exact duplicate points)
and (a.X <> b.X or a.Y <> b.Y)
)
GO
select groupId, MinDistance, MaxDistance
from dist OUTER APPLY dbo.getMinMaxDistanceForGroup(groupId)
group by groupid, MinDistance, MaxDistance
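The same result can probably be had without a function at all. The following sketch, assuming the dist table above, numbers the rows within each group so that every unordered pair is compared exactly once; it uses the same simplified planar distance as the function, not the WGS84 formula from the question.

```sql
-- Assumes the dist(groupId, X, Y) table from the example above.
WITH numbered AS (
    -- Give each point a position within its group so pairs can be ordered
    SELECT
        groupId, X, Y,
        ROW_NUMBER() OVER (PARTITION BY groupId ORDER BY X, Y) AS rn
    FROM dist
)
SELECT
    a.groupId,
    MIN(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) AS MinDistance,
    MAX(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) AS MaxDistance
FROM numbered AS a
INNER JOIN numbered AS b
    ON b.groupId = a.groupId
   AND b.rn > a.rn          -- each pair once, never a point with itself
GROUP BY a.groupId;
```

Groups with fewer than two rows produce no pairs and therefore no output row, which matches the expectation stated in the question.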

How to do a batch STDistance using a Table Type with Lat Longs?

I am trying to pass a Table Type into a stored procedure and would like the sproc to look up each row of lat/longs and return to me the nearest point for that row.
Type:
CREATE TYPE dbo.LatLongRoadLinkType AS TABLE
(
Id INT NOT NULL,
Latitude FLOAT NOT NULL,
Longitude FLOAT NOT NULL
);
Stored Proc:
ALTER PROCEDURE [dbo].[BatchNearestRoadNodes]
@Input dbo.LatLongRoadLinkType READONLY
AS
BEGIN
-- do stuff here
-- return a table of id from input, nodeid and distance
END
It needs to do for the whole table what is done here for a single lat/long:
DECLARE @g geography = 'POINT(13.5333414077759 54.549524307251)';
DECLARE @region geography = @g.STBuffer(5000)
SELECT TOP 1 NodeID, Point.STDistance(@g) as 'Distance'
FROM Location
WHERE Point.Filter(@region) = 1
ORDER BY Point.STDistance(@g)
The Location table has the important column Point of type geography, which is spatially indexed and is what the comparisons are done against. I am sending the table of lat/longs from code into the sproc, and the code expects these columns in return:
Id (original point passed in)
NodeID (of nearest point in location table)
Distance
How should I approach this? To perhaps make it a bit easier, I could simply pass a SqlGeography from my code into the sproc instead of lat/long; however, that would kill the performance since it's very expensive to convert to that.
EDIT:
This works, but I don't know if it's the optimal solution.
ALTER PROCEDURE [dbo].[BatchNearestRoadNodes]
@Input dbo.LatLongRoadLinkType READONLY
AS
BEGIN
SELECT x.Id, x.LocationName, x.NodeID, x.Distance
FROM (SELECT I.Id,
L.LocationName,
L.NodeId,
L.Point.STDistance(geography::Point(I.Latitude, I.Longitude, 4326)) AS Distance,
ROW_NUMBER () OVER (PARTITION BY I.Id ORDER BY L.Point.STDistance(geography::Point(I.Latitude, I.Longitude, 4326)) ASC) AS Ranking
FROM @Input AS I
JOIN Location AS L
ON L.Point.STIntersects(geography::Point(I.Latitude, I.Longitude, 4326).STBuffer(5000)) = 1
) AS x WHERE Ranking = 1
END
Performance - V1 vs Jon's Edit
V1
============
original:643 found:627 in:1361 ms
original:1018 found:999 in:1700 ms
original:1801 found:1758 in:2628 ms
original:4098 found:3973 in:5271 ms
original:16388 found:15948 in:19624 ms
Jon's Edit
==========
original:643 found:627 in:1333 ms
original:1018 found:999 in:1689 ms
original:1801 found:1758 in:2559 ms
original:4098 found:3973 in:5114 ms
original:16388 found:15948 in:19054 ms
The difference is minimal. Need to get the last figure down.
Try something like this to get partial results:
WITH PreQuery AS
(
SELECT
I.Id,
GEOGRAPHY::STPointFromText(I.PointAsWKT, 4326).STBuffer(5000) AS Geog,
L.NodeId,
L.Point
FROM
@Input AS I
JOIN
Location AS L ON L.Point.STIntersects(GEOGRAPHY::STPointFromText(I.PointAsWKT, 4326).STBuffer(5000)) = 1
)
SELECT
P.Id,
P.NodeId,
P.Geog.STDistance(P.Point) AS Distance
FROM
PreQuery P
I've written it off the top of my head and without any test data, so there may be small bugs, but in the main it will give you every node and its distance (within 5000 metres) from every point. You'll still need to filter them to get only the one with the minimum distance for each id - shouldn't be too hard ;-)
Hope it helps somewhat, even if not complete.
EDIT (2nd Dec)
I already see the problem with my first solution, you can't get the distance because it's pre-buffered (to note the main thing). However, this amalgamation should be the most efficient combination of both attempts.
WITH PreQuery AS
(
SELECT
I.Id,
geography::Point(I.Latitude, I.Longitude, 4326) AS InputGeography
FROM
@Input AS I
)
SELECT x.Id, x.LocationName, x.NodeId, x.Distance
FROM
(
SELECT
PQ.Id,
L.LocationName,
L.NodeId,
L.Point.STDistance(PQ.InputGeography) AS Distance,
ROW_NUMBER() OVER (PARTITION BY PQ.Id ORDER BY L.Point.STDistance(PQ.InputGeography) ASC) AS Ranking
FROM
PreQuery AS PQ
JOIN
Location AS L
-- ON L.Point.STIntersects(PQ.InputGeography.STBuffer(5000)) = 1 -- Slower
ON L.Point.STDistance(PQ.InputGeography) <= 5000 -- Faster
) AS x WHERE Ranking = 1
This way, you pre-create the input geography only once, not three times as per your attempt. Again this is untested but should prove the most efficient.
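Another shape that might be worth measuring is CROSS APPLY with TOP (1), which asks for the single nearest node per input row instead of ranking every candidate. This is an untested sketch, assuming the dbo.LatLongRoadLinkType type and the Location table from the question; the @Input variable here merely stands in for the procedure parameter, and whether it beats the ROW_NUMBER version depends on the data and the spatial index.

```sql
-- Stand-in for the procedure parameter; the type and the Location table come from the question.
DECLARE @Input dbo.LatLongRoadLinkType;
INSERT INTO @Input (Id, Latitude, Longitude)
VALUES (1, 54.549524307251, 13.5333414077759);

SELECT I.Id, N.NodeId, N.Distance
FROM @Input AS I
-- Build the geography once per input row
CROSS APPLY (SELECT geography::Point(I.Latitude, I.Longitude, 4326) AS G) AS P
-- Pick the single nearest node within 5000 metres for that row
CROSS APPLY (
    SELECT TOP (1)
        L.NodeId,
        L.Point.STDistance(P.G) AS Distance
    FROM Location AS L
    WHERE L.Point.STDistance(P.G) <= 5000
    ORDER BY L.Point.STDistance(P.G)
) AS N;
```

Input rows with no node within 5000 metres drop out of the result, as they do in the versions above; OUTER APPLY would keep them with NULLs.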

Sql Query to calculate Distance between 2 latitudes and Longitudes

This question has been posted many times, but I have to post it again because of the incorrect result I am getting. Can anyone help me see what I am doing wrong?
What I need is the name of the nearest warehouse for a given customer in Cust_Master, and the distance between the warehouse and the customer.
I have 2 tables as below.
WH_Master
WH_Name Latitude Longitude
----------- --------- ---------
Horamavu 13.02457 77.65723
White Field 12.985278 77.729899
Hennur 13.030672 77.634034
Cust_Master
Cust_ID Latitude Longitude
------- --------- ---------
Cust-1 13.025579 77.6515
I have tried the option below and it gives me the wrong distance and location. For the customer in the example, Horamavu is the nearest warehouse and, according to Google, the distance is 1.8 km. But I am getting 0.751, which is wrong.
The query I used is below.
SELECT Top 1 WH_Name, (( 6367.45 * acos( cos( radians(13.025579) ) * cos( radians( Latitude ) ) * cos( radians( Longitude ) - radians(77.6515) ) + sin( radians(13.025579) ) * sin( radians( Latitude ) ) ) )) AS distance_KM FROM WH_Master
Unfortunately this always gives me the same WH_Name, and the distance I am getting is also wrong. Can you please let me know the correct query? I am using MS SQL Server as my database.
If you are using SQL Server 2008 or later, you should use the geography data type and the STDistance function, e.g.:
declare @t table (wh_name nvarchar(50), p geography)
--WKT points are written as POINT(longitude latitude)
insert @t values
('horamavu',geography::STGeomFromText('POINT(77.65723 13.02457)', 4326)),
('white field',geography::STGeomFromText('POINT(77.729899 12.985278)', 4326)),
('hennur', geography::STGeomFromText('POINT(77.634034 13.030672)', 4326))
select wh_name,
p.STDistance(geography::STGeomFromText('POINT(77.6515 13.025579)', 4326)) distance_m
from @t
order by distance_m
Are you sure the Google distance is the as-the-crow-flies distance, and not the road distance?
As for your original query, if you want TOP you need to specify an ORDER BY.
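Put together against the two tables from the question, the idea might look like the sketch below. It assumes WH_Master and Cust_Master store Latitude and Longitude as numeric columns, as shown in the question, and uses geography::Point, which takes latitude first.

```sql
-- Assumes WH_Master(WH_Name, Latitude, Longitude) and
-- Cust_Master(Cust_ID, Latitude, Longitude) as described in the question.
SELECT
    c.Cust_ID,
    w.WH_Name,
    w.Distance_KM
FROM Cust_Master AS c
CROSS APPLY (
    -- Nearest warehouse for this customer; the ORDER BY is what makes TOP meaningful
    SELECT TOP (1)
        m.WH_Name,
        geography::Point(m.Latitude, m.Longitude, 4326).STDistance(geography::Point(c.Latitude, c.Longitude, 4326)) / 1000.0 AS Distance_KM
    FROM WH_Master AS m
    ORDER BY Distance_KM
) AS w;
```

STDistance returns metres for SRID 4326, hence the division by 1000. The result is a straight-line distance, so it will usually be shorter than a driving distance.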
