Distance based on Location Datatype - sql-server

I have 2 tables as below.
Cust_Master:
Cust_ID    Location                                        Distance  WHID
Cust10001  0xE6100000010CA986FD9E58172A40425A63D009685340  ???       ???
Cust10002  0xE6100000010C7BD976DA1A992F4071766B990C835340  ???       ???
WH_Master:
WH_ID   Location
WH1001  0xE6100000010C84F068E388C54240373811FDDA5B5340
WH1002  0xE6100000010C5BB1BFEC9E142A407DEA58A5F4675340
I would like to populate Distance and WHID in the Cust_Master table based on the locations in WH_Master. Can someone shed some light on this?

Create a temporary table from a cross join of customers and warehouses that holds the distance for every customer/warehouse pair. Then, depending on your requirement (closest or furthest), select from that table and use the result to update the customer table by customer ID. You will need a way to compute the distances.

Using STDistance() to compute the distance and then a ranking function should work:
UPDATE Cust_Master
SET
    Distance = OuterQuery.Distance,
    WHID = OuterQuery.WH_ID
FROM (
    SELECT
        Cust_ID,
        WH_ID,
        Distance,
        -- Rank the warehouses separately for each customer, nearest first
        RANK() OVER (PARTITION BY Cust_ID ORDER BY Distance ASC) AS RNK
    FROM (
        -- Every customer/warehouse pair with the distance between them
        SELECT Cust_ID, WH_ID, Cust_Master.Location.STDistance(WH_Master.Location) AS Distance
        FROM Cust_Master CROSS JOIN WH_Master
    ) InnerQuery
) OuterQuery
WHERE RNK = 1 AND Cust_Master.Cust_ID = OuterQuery.Cust_ID
This updates the Cust_Master table with the WH_ID of, and the Distance to, the closest warehouse in WH_Master. If you want the nth closest instead, change RNK = 1 to RNK = n. For geography values stored with SRID 4326, STDistance() returns meters.
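The ranking step can be tried out without a SQL Server instance. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in, the pair_distance table is hypothetical, and the distances are made-up numbers rather than values computed from the Location column. It shows that partitioning by customer yields one nearest warehouse per customer.

```python
import sqlite3

# Made-up distances in meters (assumption). In SQL Server these would come from
# Cust_Master.Location.STDistance(WH_Master.Location).
pairs = [
    ("Cust10001", "WH1001", 2700000.0),
    ("Cust10001", "WH1002", 1500.0),
    ("Cust10002", "WH1001", 2400000.0),
    ("Cust10002", "WH1002", 310000.0),
]

conn = sqlite3.connect(":memory:")
# Hypothetical helper table holding one row per customer/warehouse pair.
conn.execute("CREATE TABLE pair_distance (cust_id TEXT, wh_id TEXT, distance REAL)")
conn.executemany("INSERT INTO pair_distance VALUES (?, ?, ?)", pairs)

# Number the warehouses per customer, nearest first, and keep the first one.
nearest = conn.execute(
    """
    SELECT cust_id, wh_id, distance
    FROM (
        SELECT cust_id, wh_id, distance,
               ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY distance) AS rn
        FROM pair_distance
    )
    WHERE rn = 1
    ORDER BY cust_id
    """
).fetchall()

print(nearest)
```

Partitioning by the warehouse instead would answer a different question: which customer is nearest to each warehouse.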

Related

Google Maps SQL Server : calculating outlier geographic data within group

There are 100 suppliers, each with between 50 and 1000 items. Each supplier may have items close to their office or spread across an entire country or continent.
As LatLngs are input by a human, some mistakes happen. With lots of data and constant 'churn', mistakes are difficult to identify.
To improve data quality, I want to identify outliers for each supplier so that they can be fixed. If a supplier's items are mostly near New York, one in California would be an outlier.
SUPPLIERS
SupplierID int
Latitude DECIMAL(12,9)
Longitude DECIMAL(12,9)
ITEMS
ItemID int
SupplierID int
LatLng geography
I assume I need to use standard deviation for this, but putting it into T-SQL is giving me a headache.
I'd like to output a list of outliers for each supplier, based on each supplier's specific deviation.
This code outputs Items and the distance between each item and the supplier's office.
WITH cte AS
(
SELECT
ItemID,
SupplierID,
LatLng,
LatLng.STDistance(GEOGRAPHY::Point(a.Latitude, a.Longitude, 4326))/1000 As Distance
FROM
Items v
JOIN
Suppliers a ON v.SupplierID = a.SupplierID
)
SELECT
ItemID, SupplierID, Distance
FROM cte
Here's the SQL functionality for standard deviation (from a blog post):
DECLARE @StdDev DECIMAL(5,2)
DECLARE @Avg DECIMAL(5,2)
SELECT
@StdDev = STDEV(Qty),
@Avg = AVG(Qty)
FROM Sales
SELECT
*
FROM
Sales
WHERE
Qty > @Avg - @StdDev AND
Qty < @Avg + @StdDev
Steps I need to do:
1. Calculate STDEV and AVG for distance, GROUP BY SupplierID
2. Output items where the distance is greater than AVG + STDEV for the item's supplier
This is where I'm scratching my head as this is multiple steps AFTER the multiple steps I've already performed. I guess I could insert what I have into a TEMP table and go from there, but is that really the best way?
You can use window functions for this. Both AVG and STDEV are available as window functions:
WITH Distances AS
(
    SELECT
        i.ItemID,
        s.SupplierID,
        i.LatLng,
        v.SupplierLocation,
        -- STDistance returns meters for SRID 4326, so divide to get kilometers
        i.LatLng.STDistance(v.SupplierLocation) / 1000 AS Distance
    FROM
        Items i
    JOIN
        Suppliers s ON i.SupplierID = s.SupplierID
    CROSS APPLY (VALUES (
        GEOGRAPHY::Point(s.Latitude, s.Longitude, 4326)
    )) v(SupplierLocation)
),
Averages AS (
    SELECT
        ItemID,
        SupplierID,
        LatLng,
        SupplierLocation,
        Distance,
        AVG(Distance) OVER (PARTITION BY SupplierID) AS AvgDistance,
        STDEV(Distance) OVER (PARTITION BY SupplierID) AS StDevDistance
    FROM
        Distances
)
SELECT
    ItemID,
    SupplierID,
    Distance,
    AvgDistance,
    StDevDistance
FROM
    Averages
WHERE
    -- Outliers: more than one standard deviation above the supplier's average distance
    Distance > AvgDistance + StDevDistance;
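The two steps from the question (per-supplier AVG and STDEV, then a filter on AVG + STDEV) can be sketched on toy data. This is an illustration under assumptions, not the answer's exact query: SQLite via Python's sqlite3 stands in for SQL Server, the item_distance table and its numbers are invented, and because SQLite has no built-in STDEV the sketch registers a small Python aggregate that computes the sample standard deviation, then joins a grouped CTE back to the rows instead of using a window.

```python
import sqlite3
import statistics


class SampleStdev:
    """User-defined aggregate that mimics STDEV (sample standard deviation)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Like STDEV, return NULL when fewer than two values are available.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("stdev", 1, SampleStdev)

# Hypothetical table of precomputed item-to-office distances in kilometers.
conn.execute(
    "CREATE TABLE item_distance (item_id INTEGER, supplier_id INTEGER, distance_km REAL)"
)
conn.executemany(
    "INSERT INTO item_distance VALUES (?, ?, ?)",
    [
        (1, 1, 1.0), (2, 1, 2.0), (3, 1, 3.0), (4, 1, 2.0), (5, 1, 100.0),
        (6, 2, 10.0), (7, 2, 10.0), (8, 2, 10.0), (9, 2, 10.0),
    ],
)

outliers = conn.execute(
    """
    WITH stats AS (
        SELECT supplier_id,
               AVG(distance_km)   AS avg_km,
               stdev(distance_km) AS sd_km
        FROM item_distance
        GROUP BY supplier_id
    )
    SELECT d.item_id, d.supplier_id, d.distance_km
    FROM item_distance AS d
    JOIN stats AS s ON s.supplier_id = d.supplier_id
    WHERE d.distance_km > s.avg_km + s.sd_km
    ORDER BY d.item_id
    """
).fetchall()

print(outliers)
```

Only item 5 is reported: supplier 1's threshold is pulled up by the 100 km item but still sits well below it, while supplier 2 has zero spread and no item exceeds its own average.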

Update a multiple records with duplicate column value

I have a query that identifies how many times a ChassisNo was used:
Query:
SELECT
ROW_NUMBER() OVER (
PARTITION BY ChassisNo
ORDER BY datecreated ASC
) row_num,
CollateralType,
LoanID,
ClientID,
CollateralID,
PlateNo,
ChassisNo,
EngineNo,
datecreated,
PreparedBy
FROM
TestAllLoanWithCollaterals
Result:
I highlighted an example where a ChassisNo is duplicated three times; some are duplicated five times or more. The main question is: how can I update all records that share a ChassisNo with the details of the latest record for that ChassisNo?
Expected result
based on the highlighted example above:
The yellow highlight is the latest record based on the datecreated column, which is always the last row_num for each ChassisNo. The blue highlight marks the columns that should be updated.
I am thinking of using a database cursor, but I don't think it is possible.
You may use an update join involving your original table and the logic you have already defined:
WITH cte AS (
    -- rn = 1 marks the most recently created row for each ChassisNo
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ChassisNo ORDER BY datecreated DESC) rn
    FROM TestAllLoanWithCollaterals
)
UPDATE a
SET
    CollateralType = b.CollateralType,
    LoanID = b.LoanID,
    ClientID = b.ClientID,
    CollateralID = b.CollateralID,
    PlateNo = b.PlateNo,
    EngineNo = b.EngineNo,
    datecreated = b.datecreated,
    PreparedBy = b.PreparedBy
FROM TestAllLoanWithCollaterals a
INNER JOIN cte b
    ON a.ChassisNo = b.ChassisNo
WHERE
    b.rn = 1;
Note that the update above simply overwrites every field in each group of rows sharing a ChassisNo with the values from the most recently created record in that group.
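The effect of copying the latest row over its duplicates can be seen on a tiny data set. The sketch below is an assumption-laden stand-in: it uses SQLite via Python's sqlite3 rather than SQL Server, a hypothetical loans table with only a few of the columns, and a two-step variant (snapshot the latest row per chassis into a temp table, then update from it) instead of the single update join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the loans table.
conn.execute(
    "CREATE TABLE loans (loan_id INTEGER, chassis_no TEXT, plate_no TEXT, date_created TEXT)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        (1, "CH-A", "OLD-1", "2020-01-01"),
        (2, "CH-A", "OLD-2", "2020-06-01"),
        (3, "CH-A", "NEW-3", "2021-03-01"),
        (4, "CH-B", "ONLY-4", "2019-05-05"),
    ],
)

# Step 1: snapshot the most recently created row for each chassis number.
conn.execute(
    """
    CREATE TEMP TABLE latest AS
    SELECT chassis_no, plate_no, date_created
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY chassis_no ORDER BY date_created DESC) AS rn
        FROM loans
    )
    WHERE rn = 1
    """
)

# Step 2: overwrite every row with the values from its chassis' latest row.
conn.execute(
    """
    UPDATE loans
    SET plate_no = (SELECT l.plate_no FROM latest AS l
                    WHERE l.chassis_no = loans.chassis_no),
        date_created = (SELECT l.date_created FROM latest AS l
                        WHERE l.chassis_no = loans.chassis_no)
    """
)

result = conn.execute(
    "SELECT loan_id, chassis_no, plate_no, date_created FROM loans ORDER BY loan_id"
).fetchall()
print(result)
```

Taking the snapshot first keeps the update from reading rows it has already modified, which matters once datecreated itself is one of the overwritten columns.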

Finding point of interest on a square wave using sql

Good day,
I have a sql table with the following setup:
DataPoints{ DateTime timeStampUtc , bit value}
The points are on a minute interval, and store either a 1(on) or a 0(off).
I need to write a stored procedure to find the points of interest from all the data points.
I have a simplified drawing below:
I need to find the corner points only. Please note that there may be many data points between a value change. For example:
{0,0,0,0,0,0,0,1,1,1,1,0,0,0}
This is my thinking atm (high level)
Select timeStampUtc, Value
From Data Points
Where Value before or value after differs by 1 or -1
I am struggling to convert this concept to SQL, and I also have a feeling there is a more elegant mathematical solution that I am not aware of. This must be a common problem in electronics?
I have wrapped the table in a CTE and joined every row in the CTE to the row immediately before it, with the added condition that the two consecutive rows must differ in value.
This returns every row at which the value changes:
;WITH CTE AS(
SELECT ROW_NUMBER() OVER(ORDER BY TimeStampUTC) AS id, VALUE, TIMESTAMPUTC
FROM DataPoints
)
SELECT CTE.TimeStampUTC as "Time when the value changes", CTE.id, *
FROM CTE
INNER JOIN CTE as CTE2
ON CTE.id = CTE2.id + 1
AND CTE.Value != CTE2.Value
Here's a working fiddle: http://sqlfiddle.com/#!6/a0ddc/3
If I got it correct, you are looking for something like this:
with cte as (
select * from (values (1,0),(2,0),(3,1),(4,1),(5,0),(6,1),(7,0),(8,0),(9,1)) t(a,b)
)
select
min(a), b
from (
select
a, b, sum(c) over (order by a rows unbounded preceding) grp
from (
select
*, iif(b = lag(b) over (order by a), 0, 1) c
from
cte
) t
) t
group by b, grp
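To check the idea against the sequence from the question, here is a minimal sketch. It is a LAG-only variant rather than either answer's exact query, it runs on SQLite through Python's sqlite3 as a stand-in for SQL Server, and the data_points table with an integer minute_no column is a hypothetical simplification of timeStampUtc.

```python
import sqlite3

# The example sequence from the question, one value per minute.
values = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

conn = sqlite3.connect(":memory:")
# Hypothetical table: minute_no replaces the real timestamp column.
conn.execute("CREATE TABLE data_points (minute_no INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO data_points VALUES (?, ?)", list(enumerate(values, start=1))
)

# Keep only the rows whose value differs from the previous row's value.
edges = conn.execute(
    """
    SELECT minute_no, value
    FROM (
        SELECT minute_no, value,
               LAG(value) OVER (ORDER BY minute_no) AS prev_value
        FROM data_points
    )
    WHERE prev_value IS NOT NULL AND value <> prev_value
    ORDER BY minute_no
    """
).fetchall()

print(edges)
```

Each returned row is the first point after a change (minute 8 for the rising edge, minute 12 for the falling edge). The last point before each change is simply the preceding row, and LEAD() would find it the same way.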

T-SQL - Get last as-at date SUM(Quantity) was not negative

I am trying to find a way to get the last date, by location and product, on which a sum was positive. The only way I can think of is with a cursor, and if that's the case I may as well do it in code. Before I go down that route, I was hoping someone might have a better idea.
Table:
Product, Date, Location, Quantity
The scenario is: I find the quantity by location and product at a particular date; if it is negative, I need to get the sum and the date when the group was last positive.
select
Product,
Location,
SUM(Quantity) Qty,
SUM(Value) Value
from
ProductTransactions PT
where
Date <= @AsAtDate
group by
Product,
Location
I am looking for the last date where the sum of the transactions up to and including it is positive.
Based on your revised question and your comment, here is another solution that I hope answers your question.
select Product, Location, max(Date) as Date
from (
select a.Product, a.Location, a.Date from ProductTransactions as a
join ProductTransactions as b
on a.Product = b.Product and a.Location = b.Location
where b.Date <= a.Date
group by a.Product, a.Location, a.Date
having sum(b.Value) >= 0
) as T
group by Product, Location
The subquery (table T) produces the {product, location, date} rows for which the sum of the values up to and including that date is not negative. From that set, we select the last date for each {product, location} pair.
This can be done in a set based way using windowed aggregates in order to construct the running total. Depending on the number of rows in the table this could be a bit slow but you can't really limit the time range going backwards as the last positive date is an unknown quantity.
I've used a CTE for convenience to build the aggregated data set, but converting it to a temp table should be faster. (A CTE is re-evaluated each time it is referenced, whereas a temp table is populated only once.)
The basic theory is to construct the running totals for all of the previous days using the OVER clause to partition and order the SUM aggregates. This data set is then used and filtered to the expected date. When a row in that table has a quantity less than zero it is joined back to the aggregate data set for all previous days for that product and location where the quantity was greater than zero.
Since this may return multiple positive date rows the ROW_NUMBER() function is used to order the rows based on the date of the positive quantity day. This is done in descending order so that row number 1 is the most recent positive day. It isn't possible to use a simple MIN() here because the MIN([Date]) may not correspond to the MIN(Quantity).
WITH x AS (
SELECT [Date],
Product,
[Location],
SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS Quantity,
SUM([Value]) OVER(PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS [Value]
FROM ProductTransactions
WHERE [Date] <= @AsAtDate
)
SELECT [Date], Product, [Location], Quantity, [Value], Positive_date, Positive_date_quantity
FROM (
SELECT x1.[Date], x1.Product, x1.[Location], x1.Quantity, x1.[Value],
x2.[Date] AS Positive_date, x2.[Quantity] AS Positive_date_quantity,
ROW_NUMBER() OVER (PARTITION BY x1.Product, x1.[Location] ORDER BY x2.[Date] DESC) AS Positive_date_row
FROM x AS x1
LEFT JOIN x AS x2 ON x1.Product=x2.Product AND x1.[Location]=x2.[Location]
AND x2.[Date]<x1.[Date] AND x1.Quantity<0 AND x2.Quantity>0
WHERE x1.[Date] = @AsAtDate
) AS y
WHERE Positive_date_row=1
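The running-total idea can be exercised on a few rows. This sketch is a simplified variant, not the query above: it uses SQLite via Python's sqlite3 in place of SQL Server, a hypothetical product_transactions table with invented quantities, and it picks the latest date whose running quantity is not negative with a plain MAX() over the filtered running totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data; ISO date strings sort chronologically.
conn.execute(
    "CREATE TABLE product_transactions "
    "(product TEXT, location TEXT, txn_date TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO product_transactions VALUES (?, ?, ?, ?)",
    [
        ("A", "B", "2017-01-01", 10),
        ("A", "B", "2017-01-02", -4),
        ("A", "B", "2017-01-03", -8),
        ("A", "B", "2017-01-04", -1),
    ],
)

as_at = "2017-01-04"

# Running totals per product/location: 10, 6, -2, -3.
last_non_negative = conn.execute(
    """
    WITH running AS (
        SELECT product, location, txn_date,
               SUM(quantity) OVER (
                   PARTITION BY product, location ORDER BY txn_date
               ) AS running_qty
        FROM product_transactions
        WHERE txn_date <= :as_at
    )
    SELECT product, location, MAX(txn_date) AS last_ok_date
    FROM running
    WHERE running_qty >= 0
    GROUP BY product, location
    """,
    {"as_at": as_at},
).fetchall()

print(last_non_negative)
```

Here the as-at total is -3, and the last date on which the running total was still non-negative is 2017-01-02. Groups that never had a non-negative total are absent from the result.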
Do you mean that you want the date on which the running quantity for the group became positive?
For example, if you are using SQL Server 2012+:
In the following scenario, by 01/03/2017 the running total of the quantity reaches 1 (-10+5+6).
Is it possible for the quantity to go negative again on a later date?
;WITH tb(Product, Location,[Date],Quantity) AS(
SELECT 'A','B',CONVERT(DATETIME,'01/01/2017'),-10 UNION ALL
SELECT 'A','B','01/02/2017',5 UNION ALL
SELECT 'A','B','01/03/2017',6 UNION ALL
SELECT 'A','B','01/04/2017',2
)
SELECT t.Product,t.Location,SUM(t.Quantity) AS Qty,MIN(CASE WHEN t.CurrentSum>0 THEN t.Date ELSE NULL END ) AS LastPositiveDate
FROM (
SELECT *,SUM(tb.Quantity)OVER(PARTITION BY tb.Product,tb.Location ORDER BY [Date]) AS CurrentSum FROM tb
) AS t GROUP BY t.Product,t.Location
Product Location Qty LastPositiveDate
------- -------- ----------- -----------------------
A B 3 2017-01-03 00:00:00.000

How to avoid duplicate rows while inserting a set of row from flatfile in SQL SERVER by considering existing column values

I have a table containing a set of rows with the same RecordtypeCode,
then a single row or a set of rows arrives from a flat file or another source, like below,
and in the end I need a unique row in my table, eliminating the duplicate Recordtypecode and taking the max of the other fields.
Finally, my table should look like this:
What have I tried so far?
I fetch all the rows from my table, union them with the new set of records, run a stored procedure (using GROUP BY and MAX) to get the desired output into a temp table, and finally truncate my table and insert the temp table data into it.
Is there a better way that avoids performance issues? I am going to be working with millions of records here.
Difficult to answer without more details, but you could try something like this to get grouped results:
SELECT RecordTypeCode,
Max(AgeGroupFemale60_64),
Max(AgeGroupFemale65_69),
Max(AgeGroupFemale70_74)
FROM [TempTable]
GROUP BY RecordTypeCode
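A quick way to see what the grouped query produces is to run it on a handful of rows. The sketch below is illustrative only: SQLite via Python's sqlite3 stands in for SQL Server, and the staging and deduped tables, their shortened column names, and the flag values are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table holding existing rows plus newly loaded rows.
conn.execute(
    "CREATE TABLE staging "
    "(record_type_code TEXT, f60_64 INTEGER, f65_69 INTEGER, f70_74 INTEGER)"
)
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?, ?)",
    [
        ("R1", 1, 0, 0),
        ("R1", 0, 1, 0),
        ("R1", 0, 0, 0),
        ("R2", 0, 0, 1),
    ],
)

# Collapse each code to one row, keeping the maximum of every flag column.
conn.execute(
    """
    CREATE TABLE deduped AS
    SELECT record_type_code,
           MAX(f60_64) AS f60_64,
           MAX(f65_69) AS f65_69,
           MAX(f70_74) AS f70_74
    FROM staging
    GROUP BY record_type_code
    """
)

result = conn.execute(
    "SELECT * FROM deduped ORDER BY record_type_code"
).fetchall()
print(result)
```

Writing the grouped rows into a separate table is one option; the in-place UPDATE and DELETE approach is the alternative when the original table cannot be rebuilt.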
Assuming you are using SQL Server 2005+, you could use MAX() OVER to determine maximum flag values within every Recordtypecode group:
SELECT
Recordtypecode,
AgeGroupFemale60_64,
AgeGroupFemale65_69,
AgeGroupFemale70_74,
MAX(AgeGroupFemale60_64) OVER (PARTITION BY Recordtypecode),
MAX(AgeGroupFemale65_69) OVER (PARTITION BY Recordtypecode),
MAX(AgeGroupFemale70_74) OVER (PARTITION BY Recordtypecode)
FROM
dbo.TempTable
and update all the flags with those values:
WITH maximums AS (
SELECT
Recordtypecode,
AgeGroupFemale60_64,
AgeGroupFemale65_69,
AgeGroupFemale70_74,
MaxFemale60_64 = MAX(AgeGroupFemale60_64) OVER (PARTITION BY Recordtypecode),
MaxFemale65_69 = MAX(AgeGroupFemale65_69) OVER (PARTITION BY Recordtypecode),
MaxFemale70_74 = MAX(AgeGroupFemale70_74) OVER (PARTITION BY Recordtypecode)
FROM
dbo.TempTable
)
UPDATE
maximums
SET
AgeGroupFemale60_64 = MaxFemale60_64,
AgeGroupFemale65_69 = MaxFemale65_69,
AgeGroupFemale70_74 = MaxFemale70_74
;
Next, you could use ROW_NUMBER() to enumerate all the rows within the groups:
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
and delete all the rows with rn > 1:
WITH enumerated AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
)
DELETE FROM
enumerated
WHERE
rn > 1
;
Alternatively, instead of the two statements, UPDATE and DELETE, you could use one, MERGE (which now assumes SQL Server 2008+), like this:
WITH enumerated AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
),
maximums AS (
SELECT
Recordtypecode,
MaxFemale60_64 = MAX(AgeGroupFemale60_64),
MaxFemale65_69 = MAX(AgeGroupFemale65_69),
MaxFemale70_74 = MAX(AgeGroupFemale70_74),
rn = 1
FROM
dbo.TempTable
GROUP BY
Recordtypecode
)
MERGE INTO
enumerated AS tgt
USING
maximums AS src
ON
tgt.Recordtypecode = src.Recordtypecode AND tgt.rn = src.rn
WHEN MATCHED THEN
UPDATE SET
tgt.AgeGroupFemale60_64 = src.MaxFemale60_64,
tgt.AgeGroupFemale65_69 = src.MaxFemale65_69,
tgt.AgeGroupFemale70_74 = src.MaxFemale70_74
WHEN NOT MATCHED BY SOURCE THEN
DELETE
;
More information:
OVER Clause (Transact-SQL)
MERGE (Transact-SQL)
Note that there are known issues with the MERGE statement that you need to be aware of before deciding to use it. You can start with this article to learn more about them and see whether any of them apply to your situation:
Use Caution with SQL Server's MERGE Statement
