Should I use a cursor for this? - sql-server

I have a table with three fields. Group number, X-coord and Y-coord. There can be from 0 to about 10 rows within each group number.
What I want to do is calculate the maximum and minimum distance between points within each group. Obviously, this will only give you a value if there are 2 or more rows within that group.
Output should consist of fields: group number, minDistance, maxDistance.
Is a cursor a good solution for this?
(Coordinates are in WGS84 and I have a working formula for calculating distances)
My reasoning for using a cursor is that I cannot avoid doing a cross join for each group and then applying the formula for each result of the cross join.

I wouldn't use a cursor in your situation. I would rather use a User Defined Function that takes the required group number as its argument and calculates the minimum and maximum distance for that group inside the UDF. The one below is an inline table-valued function, applied to each group with OUTER APPLY.
Please note the distance calculation inside the function is much simpler than the one you probably have.
create table dist (groupId int, X int, Y int)
insert into dist(groupid, x, y) values (1,14,20),(1,11,20),(1,10,22),(1,12,24),(1,11,28),(1,19,78)
insert into dist(groupid, x, y) values (2,10,20),(2,11,20),(2,10,22),(2,12,24),(2,11,28),(2,17,52)
go
-- Inline table-valued function: min and max distance between the points of one group
create function dbo.getMinMaxDistanceForGroup (@groupId int)
returns table as return (
select MIN(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MinDistance,
MAX(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MaxDistance
from dist a cross join dist b
where a.groupId = @groupId and b.groupId = @groupId
-- skip pairing a point with itself, otherwise MinDistance is always 0
-- (the table has no key, so this also skips exact duplicate points)
and (a.X <> b.X or a.Y <> b.Y)
)
go
select groupId, MinDistance, MaxDistance
from dist OUTER APPLY dbo.getMinMaxDistanceForGroup(groupId)
group by groupid, MinDistance, MaxDistance
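For comparison, here is a minimal sketch of the same calculation as one set-based query, without a function or a cursor. It assumes the table has a unique row identifier (the `id` identity column below is my addition, not part of the question's table), which lets each pair be visited once and keeps a point from being paired with itself. The Euclidean formula is only a stand-in for the real WGS84 distance formula.

```sql
-- Sample data in a table variable; the "id" column is an assumption for this sketch.
DECLARE @dist TABLE (id int IDENTITY(1,1) PRIMARY KEY, groupId int, X int, Y int);

INSERT INTO @dist (groupId, X, Y)
VALUES (1,14,20),(1,11,20),(1,10,22),(1,12,24),(1,11,28),(1,19,78),
       (2,10,20),(2,11,20),(2,10,22),(2,12,24),(2,11,28),(2,17,52),
       (3,5,5);  -- a group with a single row: it has no pairs

SELECT g.groupId,
       MIN(d.Distance) AS MinDistance,
       MAX(d.Distance) AS MaxDistance
FROM (SELECT DISTINCT groupId FROM @dist) AS g
LEFT JOIN (
    -- every unordered pair of rows within the same group, once each
    SELECT a.groupId,
           SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y)) AS Distance
    FROM @dist AS a
    JOIN @dist AS b
      ON b.groupId = a.groupId
     AND b.id > a.id
) AS d ON d.groupId = g.groupId
GROUP BY g.groupId
ORDER BY g.groupId;
```

The LEFT JOIN keeps groups with fewer than two rows in the output, with NULL for both distances. With at most about 10 rows per group, the pairwise join stays small.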

Related

Getting Largest Distance from group of GPS Coordinates

So I have a database with multiple rows of GPS coordinates. I know how to calculate the distance from a given lat/lng to any one of them in the database, but what I want to do basically is look at the coordinates of a set of rows and get the two rows that are farthest apart. I'd love it if I could do this in SQL, but if I have to do it in my application code that would work too. Here is what I am doing to calculate the distance between two points:
ROUND(( 3960 * acos( cos( radians( :lat ) ) *
cos( radians( p.latitude ) ) * cos( radians( p.longitude ) - radians( :lng ) ) +
sin( radians( :lat ) ) * sin( radians( p.latitude ) ) ) ),1) AS distance
What we are trying to do is look at GPS data for a specific user and make sure they aren't moving wildly all over the country. All the coordinates for a user should be within a couple of miles of each other at most. Coordinates that are all over the country are a flag for malicious activity in our system. So I'd like to be able to quickly run through the data for a specific user and know the maximum distance between any two of their points.
I thought about just running a Max/Min on the lat and lng separately and set an internal threshold for what is acceptable. And maybe that is easier, but if what I asked in the first part is possible, that would be best.
If you have SQL Server 2008 or later then you can use GEOGRAPHY to calculate the distance, e.g.:
DECLARE @lat1 DECIMAL(19,6) = 44.968046;
DECLARE @lon1 DECIMAL(19,6) = -94.420307;
DECLARE @lat2 DECIMAL(19,6) = 44.33328;
DECLARE @lon2 DECIMAL(19,6) = -89.132008;
SELECT GEOGRAPHY::Point(@lat1, @lon1, 4326).STDistance(GEOGRAPHY::Point(@lat2, @lon2, 4326));
This makes the problem fairly trivial.
For a set of lats/longs for a user you would need to calculate the distance between each pair of points and then return the highest distance. Putting this all together, you could probably do something like this:
DECLARE @UserGPS TABLE (
UserId INT, --the user
GPSId INT, --the incrementing unique id associated with this GPS reading (could link to a table with more details, e.g. time, date)
Lat DECIMAL(19,6), --latitude
Lon DECIMAL(19,6)); --longitude
INSERT INTO @UserGPS SELECT 1, 1, 44.968046, -94.420307; --User #1 goes on a very long journey
INSERT INTO @UserGPS SELECT 1, 2, 44.33328, -89.132008;
INSERT INTO @UserGPS SELECT 1, 3, 34.12345, -92.21369;
INSERT INTO @UserGPS SELECT 1, 4, 44.978046, -94.430307;
INSERT INTO @UserGPS SELECT 2, 1, 44.968046, -94.420307; --User #2 doesn't get far
INSERT INTO @UserGPS SELECT 2, 2, 44.978046, -94.430307;
--Make a working table to store the distances between each set of co-ordinates
--This isn't strictly necessary; we could change this into a common-table expression
DECLARE @WorkTable TABLE (
UserId INT, --the user
GPSIdFrom INT, --the id of the first set of co-ordinates
GPSIdTo INT, --the id of the second set of co-ordinates being compared
Distance NUMERIC(19,6)); --the distance
--Get the distance between each and every combination of co-ordinates for each user
INSERT INTO
@WorkTable
SELECT
c1.UserId,
c1.GPSId,
c2.GPSId,
GEOGRAPHY::Point(c1.Lat, c1.Lon, 4326).STDistance(GEOGRAPHY::Point(c2.Lat, c2.Lon, 4326))
FROM
@UserGPS c1
INNER JOIN @UserGPS c2 ON c2.UserId = c1.UserId AND c2.GPSId > c1.GPSId;
--Note this is a self-join, but single-tailed. So we compare each set of co-ordinates to each other set of co-ordinates for a user
--This is handled by the "c2.GPSId > c1.GPSId" in the JOIN clause
--As an example, say we have three sets of co-ordinates for a user
--We would compare set #1 to set #2
--We would compare set #1 to set #3
--We would compare set #2 to set #3
--We wouldn't compare set #3 to anything (as we already did this)
--Determine the maximum distance between all the GPS co-ordinates per user
WITH MaxDistance AS (
SELECT
UserId,
MAX(Distance) AS Distance
FROM
@WorkTable
GROUP BY
UserId)
--Report the results
SELECT
w.UserId,
g1.GPSId,
g1.Lat,
g1.Lon,
g2.GPSId,
g2.Lat,
g2.Lon,
md.Distance AS MaxDistance
FROM
MaxDistance md
INNER JOIN @WorkTable w ON w.UserId = md.UserId AND w.Distance = md.Distance
INNER JOIN @UserGPS g1 ON g1.UserId = md.UserId AND g1.GPSId = w.GPSIdFrom
INNER JOIN @UserGPS g2 ON g2.UserId = md.UserId AND g2.GPSId = w.GPSIdTo;
Results are:
UserId GPSId Lat Lon GPSId Lat Lon MaxDistance
1 3 34.123450 -92.213690 4 44.978046 -94.430307 1219979.460185
2 1 44.968046 -94.420307 2 44.978046 -94.430307 1362.820895
Now, I made a LOT of assumptions about what data you are holding, as there was no information about this in your question. You would probably need to adapt this to some degree.
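The cheaper check mentioned in the question (taking the min and max of latitude and longitude separately) could be sketched roughly as below. It builds a bounding box per user and measures its diagonal with the same GEOGRAPHY call as above. The table variable, the sample rows and the `DiagonalMetres` column name are my assumptions for illustration. Treat the diagonal as a coarse proxy rather than a strict bound: it is not exact on a sphere, and the sketch ignores boxes that cross the 180° meridian.

```sql
-- Sample data; same shape as the table variable used in the answer above.
DECLARE @UserGPS TABLE (UserId INT, GPSId INT, Lat DECIMAL(19,6), Lon DECIMAL(19,6));

INSERT INTO @UserGPS (UserId, GPSId, Lat, Lon)
VALUES (1, 1, 44.968046, -94.420307),
       (1, 2, 44.33328,  -89.132008),
       (1, 3, 34.12345,  -92.21369),
       (1, 4, 44.978046, -94.430307),
       (2, 1, 44.968046, -94.420307),
       (2, 2, 44.978046, -94.430307);

WITH Box AS (
    -- one bounding box per user: a single pass, no self-join
    SELECT UserId,
           MIN(Lat) AS MinLat, MAX(Lat) AS MaxLat,
           MIN(Lon) AS MinLon, MAX(Lon) AS MaxLon
    FROM @UserGPS
    GROUP BY UserId
)
SELECT UserId,
       -- length of the box diagonal; SRID 4326 distances are in metres
       GEOGRAPHY::Point(MinLat, MinLon, 4326)
           .STDistance(GEOGRAPHY::Point(MaxLat, MaxLon, 4326)) AS DiagonalMetres
FROM Box;
```

This reads each row once instead of comparing every pair, so it may be enough as a first filter. The pairwise query can then be run only for users whose box looks suspiciously large.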

How to do a batch STDistance using a Table Type with Lat Longs?

I am trying to pass a Table Type into a stored procedure and would like the sproc to look up each row of lat/longs and return to me the nearest point for that row.
Type:
CREATE TYPE dbo.LatLongRoadLinkType AS TABLE
(
Id INT NOT NULL,
Latitude FLOAT NOT NULL,
Longitude FLOAT NOT NULL
);
Stored Proc:
ALTER PROCEDURE [dbo].[BatchNearestRoadNodes]
@Input dbo.LatLongRoadLinkType READONLY
AS
BEGIN
-- do stuff here
-- return a table of id from input, nodeid and distance
END
It needs to do for the whole table what is done here for a single lat/long:
DECLARE @g geography = 'POINT(13.5333414077759 54.549524307251)';
DECLARE @region geography = @g.STBuffer(5000)
SELECT TOP 1 NodeID, Point.STDistance(@g) as 'Distance'
FROM Location
WHERE Point.Filter(@region) = 1
ORDER BY Point.STDistance(@g)
The Location table has the important column Point of type Geography, which is spatially indexed and is what the comparisons are done against. I am sending the table of lat/longs from code into the sproc, and the code is expecting a return of:
Id (original point passed in)
NodeID (of nearest point in location table)
Distance
How should I approach this? To perhaps make it a bit easier I could simply pass in a SqlGeography from my code into the sproc instead of Lat/Long; however, that would kill the performance since it's very expensive to convert to that.
EDIT:
This works, but I don't know if it's the most optimal solution.
ALTER PROCEDURE [dbo].[BatchNearestRoadNodes]
@Input dbo.LatLongRoadLinkType READONLY
AS
BEGIN
SELECT x.Id, x.LocationName, x.NodeID, x.Distance
FROM (SELECT I.Id,
L.LocationName,
L.NodeId,
L.Point.STDistance(geography::Point(I.Latitude, I.Longitude, 4326)) AS Distance,
ROW_NUMBER () OVER (PARTITION BY I.Id ORDER BY L.Point.STDistance(geography::Point(I.Latitude, I.Longitude, 4326)) ASC) AS Ranking
FROM @Input AS I
JOIN Location AS L
ON L.Point.STIntersects(geography::Point(I.Latitude, I.Longitude, 4326).STBuffer(5000)) = 1
) AS x WHERE Ranking = 1
END
Performance - V1 vs Jon's Edit
V1
============
original:643 found:627 in:1361 ms
original:1018 found:999 in:1700 ms
original:1801 found:1758 in:2628 ms
original:4098 found:3973 in:5271 ms
original:16388 found:15948 in:19624 ms
Jon's Edit
==========
original:643 found:627 in:1333 ms
original:1018 found:999 in:1689 ms
original:1801 found:1758 in:2559 ms
original:4098 found:3973 in:5114 ms
original:16388 found:15948 in:19054 ms
The difference is minimal. Need to get the last figure down.
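Try something like this to get partial results:
WITH PreQuery AS
(
SELECT
I.Id,
-- PointAsWKT is assumed here: a WKT string column on the input, not part of the table type in the question
GEOGRAPHY::STPointFromText(I.PointAsWKT, 4326).STBuffer(5000) AS Geog,
L.NodeId,
L.Point
FROM
@Input AS I
JOIN
Location AS L ON L.Point.STIntersects(GEOGRAPHY::STPointFromText(I.PointAsWKT, 4326).STBuffer(5000)) = 1
)
SELECT
P.Id,
P.NodeId,
P.Geog.STDistance(P.Point) AS Distance
FROM
PreQuery P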
I've written it off the top of my head and without any test data, so there may be small bugs, but in the main it will give you every node and its distance (within 5000 metres) from every point. You'll still need to filter them to get only the one with the minimum distance for each id - shouldn't be too hard ;-)
Hope it helps somewhat, even if not complete.
EDIT (2nd Dec)
I can already see the main problem with my first solution: you can't get the distance because the geography is pre-buffered. However, this amalgamation of both attempts should be the most efficient.
WITH PreQuery AS
(
SELECT
I.Id,
geography::Point(I.Latitude, I.Longitude, 4326) AS InputGeography
FROM
@Input AS I
)
SELECT x.Id, x.LocationName, x.NodeId, x.Distance
FROM
(
SELECT
PQ.Id,
L.LocationName,
L.NodeId,
L.Point.STDistance(PQ.InputGeography) AS Distance,
ROW_NUMBER() OVER (PARTITION BY PQ.Id ORDER BY L.Point.STDistance(PQ.InputGeography) ASC) AS Ranking
FROM
PreQuery AS PQ
JOIN
Location AS L
-- ON L.Point.STIntersects(PQ.InputGeography.STBuffer(5000)) = 1 -- Slower
ON L.Point.STDistance(PQ.InputGeography) <= 5000 -- Faster
) AS x WHERE Ranking = 1
This way, you pre-create the input geography only once, not three times as per your attempt. Again this is untested but should prove the most efficient.
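Another shape worth trying (an untested alternative, not something measured in this thread) is a per-row TOP (1) lookup with CROSS APPLY, which mirrors the single-point query from the question. In the sketch below the two table variables are stand-ins I made up for the `Location` table and the `@Input` parameter, with a few invented rows, so that it runs on its own. Whether it beats the ROW_NUMBER version on real data with a spatial index is an assumption you would have to test.

```sql
-- Stand-ins for the question's objects (assumptions for this sketch).
DECLARE @Location TABLE (NodeId INT PRIMARY KEY, Point GEOGRAPHY);
DECLARE @Input TABLE (Id INT PRIMARY KEY, Latitude FLOAT, Longitude FLOAT);

INSERT INTO @Location (NodeId, Point)
SELECT 1, GEOGRAPHY::Point(54.5495, 13.5333, 4326) UNION ALL
SELECT 2, GEOGRAPHY::Point(54.5600, 13.5400, 4326) UNION ALL
SELECT 3, GEOGRAPHY::Point(55.0000, 14.0000, 4326);

INSERT INTO @Input (Id, Latitude, Longitude)
VALUES (10, 54.5500, 13.5340),
       (11, 54.5590, 13.5390);

SELECT I.Id, N.NodeId, N.Distance
FROM @Input AS I
-- build the input geography once per row
CROSS APPLY (SELECT GEOGRAPHY::Point(I.Latitude, I.Longitude, 4326) AS G) AS P
-- nearest node within 5000 metres of that point
CROSS APPLY (
    SELECT TOP (1) L.NodeId, L.Point.STDistance(P.G) AS Distance
    FROM @Location AS L
    WHERE L.Point.STDistance(P.G) <= 5000
    ORDER BY L.Point.STDistance(P.G)
) AS N;
```

CROSS APPLY drops input rows that have no node within 5000 metres, which matches the "found" counts being lower than "original" above. Switch the second one to OUTER APPLY if you want those rows back with NULLs.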

Creating test data for calculation using RAND()

I attempted to populate a table with two columns of random FLOATs, but every row generated was identical.
;WITH CTE (x, y) AS (
SELECT RAND(), RAND()
UNION ALL
SELECT x, y FROM CTE
)
--INSERT INTO CalculationTestData (x, y)
SELECT TOP 5000000 x, y
FROM CTE
OPTION (MAXRECURSION 0)
I can accomplish what I need just fine without the CTE, but this has piqued my curiosity.
Is there a way to do this quickly?
I know quickly is a relative term; by it, I mean approximately as quickly as the above executes.
What do you expect other than for the CTE to repeat the rows? Your recursion is just selecting them again:
SELECT RAND(), RAND() -- SELECT 9 , 10
UNION ALL
SELECT x, y -- SELECT 9 , 10
what you want to do is more like this
SELECT RAND(), RAND()
UNION ALL
SELECT RAND(), RAND() -- but the problem is that this 'row' will be duplicated
so you need to seed and reseed for each row giving you something like
SELECT RAND(CAST(NEWID() AS VARBINARY)),
RAND(CAST(NEWID() AS VARBINARY))
UNION ALL
SELECT RAND(CAST(NEWID() AS VARBINARY)),
RAND(CAST(NEWID() AS VARBINARY))
Using NEWID() as the seed is one way; there may well be others that are more efficient.
Try this instead of rand(): it will give a random positive whole number on each entry. I had the same issue with rand() recently
ABS(Checksum(NewID()))
Float:
cast(ABS(Checksum(NewID()) ) as float)
To be Clear:
;WITH CTE (x, y) AS (
SELECT cast(ABS(Checksum(NewID()) ) as float), cast(ABS(Checksum(NewID()) ) as float)
UNION ALL
SELECT x, y FROM CTE
)
Did not give a random entry on each line?
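For what it's worth, here is a minimal non-recursive sketch that combines the two suggestions above: RAND() reseeded per row, with CHECKSUM(NEWID()) as the seed. Cross joining `sys.all_objects` with itself is only a convenient row source I picked to get enough rows. It is not from the question, and any sufficiently large table or tally table would do.

```sql
-- Each RAND() call gets its own seed, so every row and every column differs.
SELECT TOP (5000000)
       RAND(CHECKSUM(NEWID())) AS x,
       RAND(CHECKSUM(NEWID())) AS y
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b
CROSS JOIN sys.all_objects AS c;  -- row source only; its columns are never read
```

Because there is no recursion, nothing is copied from a previous row, which is what made every row identical in the original CTE. The values should be fine for test data, though I wouldn't rely on them where the quality of the randomness matters.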

Storing 'Point' column from ShapeFile

I have a Shapefile (*.shp) which I am loading into the database. I have a column called "Point" which stores the data as shapes. For example:
POLYGON ((1543297.7815 5169880.9468, 1543236.7046 5169848.3834,
1543195.0218 5169930.2767, 1543104.4989 5170101.6818,
1543056.805 5170191.9835, 1542969.1187 5170358.1396,
1542820.9656 5170638.8525, 1542820.6605 5170639.7223,
1542816.1912 5170647.8707, 1543158.2618 5170829.6437,
1543318.4126 5170915.6562, 1543559.2078 5171043.8001,
1543840.2014 5171192.4698, 1544108.917 5171336.1306,
1544271.7972 5171422.313, 1544357.0262 5171263.5454,
1544447.9779 5171091.3804, 1544468.04 5171054.3179,
1544529.7931 5170936.192, 1544583.3416 5170837.5321,
1544658.3376 5170696.5608, 1544699.0638 5170622.0859,
1543985.6169 5170245.4526, 1543618.4129 5170050.7422,
1543297.7815 5169880.9468))
The data type of the Column "Point" is nvarchar(max).
The problem is that when the polygon text exceeds a certain size, the column value is truncated and does not store all the values. I can't convert the points into Geometry as I want to convert them into lat/long from the polygon.
I'd suggest storing the whole polygon as a geometry type. If/when you need to "convert" it to geography, use the geography methods STNumPoints and STPointN to extract the individual points in sequence and convert them as appropriate.
Speaking of the conversion, what format are your data in now? I'm not seeing lat/long info there, but perhaps I'm missing something.
Edit: Here's a solution that I just coded.
use tempdb;
create table tally (i int not null);
with
a as (select 1 as [i] union select 0),
b as (select 1 as [i] from a as [a1] cross join a as [a2]),
c as (select 1 as [i] from b as [a1] cross join b as [a2]),
d as (select 1 as [i] from c as [a1] cross join c as [a2]),
e as (select 1 as [i] from d as [a1] cross join d as [a2])
insert into tally
select row_number() over (order by i) from e
create unique clustered index [CI_Tally] on tally (i)
create table ace (g geometry)
insert into ace (g)
values (geometry::STGeomFromText(<<your polygon string here>>, 0));
select i, g.STPointN(t.i), g.STPointN(t.i).STAsText()
from ace as [a]
cross join tally as [t]
where t.i <= g.STNumPoints()

SQL Filtering A Result Set To Return A Maximum Amount Of Rows At Even Intervals

I currently use SQL2008 where I have a stored procedure that fetches data from a table that then gets fed in to a line graph on the client. This procedure takes a from date and a to date as parameters to filter the data. This works fine for small datasets, but the graph gets a bit muddled when a large date range is entered and produces thousands of results.
What I'd like to do is provide a max amount of records to be returned and return records at evenly spaced intervals to give that amount. For example say I limited it to 10 records and the result set was 100 records I'd like the stored procedure to return every 10th record.
Is this possible without suffering big performance issues, and what would be the best way to achieve it? I'm struggling to find a way to do it without cursors, and if that's the case I'd rather not do it at all.
Thanks
Assuming you use at least SQL2005, you could do something like
WITH p as (
SELECT a, b,
row_number() OVER(ORDER BY time_column) as row_no,
count(*) OVER() as total_count
FROM myTable
WHERE <date is in range>
)
SELECT a, b
FROM p
-- assumes at least 20 rows in range: with fewer, total_count / 10 is 0 or 1 and the filter fails
WHERE row_no % (total_count / 10) = 1
The where condition at the bottom calculates the modulus of the row number by the total number of records divided by the required number of final records.
If you want to use the average instead of one specific value, you would extend this as follows:
WITH p as (
SELECT a, b,
row_number() OVER(ORDER BY time_column) as row_no,
count(*) OVER() as total_count
FROM myTable
WHERE <date is in range>
),
a as (
SELECT a, b, row_no, total_count,
avg(a) OVER(partition by row_no / (total_count / 10)) as avg_a
FROM p
)
SELECT a, b, avg_a
FROM a
WHERE row_no % (total_count / 10) = 1
The formula to select one of the values in the final WHERE clause is used with the % replaced by / in the partition by clause.
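A variation on the same idea, sketched here with NTILE instead of the modulus arithmetic: split the ordered rows into at most @maxRows buckets and keep the first row of each bucket (plus the bucket average, if wanted). The `@readings` table variable, the `@maxRows` variable and the generated sample rows are assumptions made up so the example runs on its own. Note that `DECLARE ... = value` needs SQL 2008 or later.

```sql
DECLARE @maxRows INT = 10;  -- hypothetical parameter: maximum number of points to return
DECLARE @readings TABLE (time_column DATETIME PRIMARY KEY, a FLOAT);

-- 100 sample rows, one per minute, with random values
INSERT INTO @readings (time_column, a)
SELECT TOP (100)
       DATEADD(MINUTE, CAST(t.n AS INT), '20200101'),
       RAND(CHECKSUM(NEWID()))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
      FROM sys.all_objects) AS t
ORDER BY t.n;

WITH bucketed AS (
    -- NTILE spreads the ordered rows over @maxRows buckets of near-equal size
    SELECT time_column, a,
           NTILE(@maxRows) OVER (ORDER BY time_column) AS bucket
    FROM @readings
),
ranked AS (
    SELECT time_column, a,
           ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY time_column) AS rn,
           AVG(a) OVER (PARTITION BY bucket) AS avg_a
    FROM bucketed
)
SELECT time_column, a, avg_a
FROM ranked
WHERE rn = 1           -- first row of each bucket
ORDER BY time_column;
```

One reason to prefer this shape is that it involves no division, so it never returns more than @maxRows rows. It also still behaves sensibly when the date range holds fewer rows than @maxRows, because each row then simply gets its own bucket.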

Resources