SQL Calculate (time) gap between occurrences in a log - sql-server

I have tables that record when certain items were sent or returned to a particular location, and I want to work out the intervals between each time a particular item is returned.
Sample data:
Item ReturnDate:
Item1, 20120101
Item1, 20120201
Item1, 20120301
Item2, 20120401
Item2, 20120601
So in this case, we can see that there was a one-month gap until Item1 was returned the first time, and another month before it was returned the second time. Item2 came back after 2 months.
My starting point is:
Select r1.Item, r1.ReturnDate, r2.Item, r2.ReturnDate, DateDiff(m, r1.ReturnDate, r2.ReturnDate)
from Returns r1
inner join Returns r2 on r2.Item = r1.Item
However, in this sample, each return is compared with every other return of the same item, not just the next one. So I need to limit this query so that it only compares adjacent returns.
One solution is to tag each return with a count (of the number of times that item has been returned):
Item ReturnDate, ReturnNo:
Item1, 20120101, 1
Item1, 20120201, 2
Item1, 20120301, 3
Item2, 20120401, 1
Item2, 20120601, 2
This would enable me to use the following T-SQL (or similar):
Select r1.Item, r1.ReturnDate, r2.Item, r2.ReturnDate, DateDiff(m, r1.ReturnDate, r2.ReturnDate)
from Returns r1
inner join Returns r2 on r2.Item = r1.Item
and (r1.ReturnNo + 1 = r2.ReturnNo)
My first question is whether this is a sensible/optimal approach, or whether there is a better one?
Secondly, what is the easiest/slickest means of calculating the ReturnNo?

If you are using SQL Server 2005+, use ROW_NUMBER() to do exactly what you want:
WITH RankedReturn AS
(
SELECT Item, ReturnDate,
ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate DESC) AS ReturnNo
FROM Returns
)
SELECT * FROM RankedReturn
Obviously, now that you have your CTE you can put whatever you need in the outer SELECT. I would use an OUTER APPLY for this:
WITH RankedReturn AS
(
SELECT Item, ReturnDate,
ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate DESC) AS ReturnNo
FROM Returns
)
SELECT rOuter.Item, rOuter.ReturnDate, DATEDIFF(month, prev.PrevDate, rOuter.ReturnDate) AS Months
FROM RankedReturn rOuter
OUTER APPLY
(
SELECT ReturnDate AS PrevDate
FROM RankedReturn rInner
WHERE rOuter.Item = rInner.Item AND rOuter.ReturnNo = rInner.ReturnNo - 1
) prev
Oops, and the SQL Fiddle is here.
Edited because the month difference calculation was backwards; fixed now

Easiest way of calculating the ReturnNo would be to use OVER:
SELECT [Item], [ReturnDate],
ROW_NUMBER() OVER (PARTITION BY [Item] ORDER BY [ReturnDate]) AS ReturnNumber
FROM Returns
http://sqlfiddle.com/#!3/e18ad/1/0
You could also attempt to make use of the techniques for calculating a running total to work out the difference between two rows.
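Putting the numbering together with the self-join from the question gives a complete query. The following is a minimal sketch, assuming SQL Server 2008 or later and a hypothetical table variable @Returns that holds the question's sample rows with ReturnDate stored as a date; the output aliases are illustrative only.

```sql
-- Hypothetical stand-in for the Returns table, loaded with the question's sample data.
DECLARE @Returns TABLE (Item varchar(20), ReturnDate date);
INSERT INTO @Returns (Item, ReturnDate) VALUES
    ('Item1', '20120101'), ('Item1', '20120201'), ('Item1', '20120301'),
    ('Item2', '20120401'), ('Item2', '20120601');

WITH Numbered AS
(
    -- Number each item's returns from oldest (1) to newest.
    SELECT Item, ReturnDate,
           ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate) AS ReturnNo
    FROM @Returns
)
SELECT r1.Item,
       r1.ReturnDate AS PreviousReturn,
       r2.ReturnDate AS NextReturn,
       DATEDIFF(month, r1.ReturnDate, r2.ReturnDate) AS MonthsBetween
FROM Numbered AS r1
INNER JOIN Numbered AS r2
    ON r2.Item = r1.Item
   AND r2.ReturnNo = r1.ReturnNo + 1   -- only the immediately following return
ORDER BY r1.Item, r1.ReturnDate;
```

With the sample data this should give two rows for Item1 (one month each) and one row for Item2 (two months). Because it is an inner join, the first return of each item has no row of its own; a LEFT JOIN from r2 would keep it with a NULL gap.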

This is how I would do it:
select itemNo,
dt,
DATEDIFF(day, previousDt, dt) as daysSince
from (select itemNo,
dt,
(select top 1 dt from testTable where itemNo = outerTbl.itemNo and dt < outerTbl.dt order by dt desc) as previousDt
from testTable as outerTbl
) as x
... and here's a bit of setup code for anybody else testing a solution to this
create table testTable(
itemNo nvarchar(20),
dt datetime)
go
insert into testTable values('Item1', '2012-01-01');
insert into testTable values('Item1', '2012-02-01');
insert into testTable values('Item1', '2012-03-01');
insert into testTable values('Item2', '2012-04-01');
insert into testTable values('Item2', '2012-05-01');
go
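On SQL Server 2012 or later, the correlated subquery can probably be replaced by LAG, which reads the previous row within each item directly. This is a sketch rather than the answerer's method; @testTable is a hypothetical table variable mirroring the setup code above, and the dates are written as yyyymmdd so they do not depend on the session's date format.

```sql
-- Hypothetical copy of testTable as a table variable, so the script runs on its own.
DECLARE @testTable TABLE (itemNo nvarchar(20), dt datetime);
INSERT INTO @testTable (itemNo, dt) VALUES
    ('Item1', '20120101'), ('Item1', '20120201'), ('Item1', '20120301'),
    ('Item2', '20120401'), ('Item2', '20120501');

SELECT itemNo,
       dt,
       -- LAG(dt) is the previous dt for the same item, or NULL for the first row.
       DATEDIFF(day, LAG(dt) OVER (PARTITION BY itemNo ORDER BY dt), dt) AS daysSince
FROM @testTable
ORDER BY itemNo, dt;
```

The first row of each item has no predecessor, so daysSince is NULL there, just as previousDt is NULL in the subquery version.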

Related

update using over order by row_number()

I found some answers to ways to update using over order by, but not anything that solved my issue. In SQL Server 2014, I have a column of DATES (with inconsistent intervals down to the millisecond) and a column of PRICE, and I would like to update the column of OFFSETPRICE with the value of PRICE from 50 rows hence (ordered by DATES). The solutions I found have the over order by in either the query or the subquery, but I think I need it in both. Or maybe I'm making it more complicated than it is.
In this simplified example, if the offset was 3 rows hence then I need to turn this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, null
2018-01-03, 8.52, null
2018-02-15, 3.17, null
2018-02-24, 4.67, null
2018-03-18, 2.54, null
2018-04-09, 7.37, null
into this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, 3.17
2018-01-03, 8.52, 4.67
2018-02-15, 3.17, 2.54
2018-02-24, 4.67, 7.37
2018-03-18, 2.54, null
2018-04-09, 7.37, null
This post was helpful, and so far I have this code which works as far as it goes:
select dates, price, row_number() over (order by dates asc) as row_num
from pricetable;
I haven't yet figured out how to point the update value to the future ordered row. Thanks in advance for any assistance.
LEAD is a useful window function for getting values from subsequent rows. (There is also LAG, which looks at preceding rows.) Here's a direct answer to your question:
;WITH cte AS (
SELECT dates, LEAD(price, 2) OVER (ORDER BY dates) AS offsetprice
FROM pricetable
)
UPDATE pricetable SET offsetprice = cte.offsetprice
FROM pricetable
INNER JOIN cte ON pricetable.dates = cte.dates
Since you asked about ROW_NUMBER, the following does the same thing:
;WITH cte AS (
SELECT dates, price, ROW_NUMBER() OVER (ORDER BY dates ASC) AS row_num
FROM pricetable
),
cte2 AS (
SELECT dates, price, (SELECT sq_cte.price FROM cte AS sq_cte WHERE sq_cte.row_num = cte.row_num + 2) AS offsetprice
FROM cte
)
UPDATE pricetable SET offsetprice = cte2.offsetprice
FROM pricetable
INNER JOIN cte2 ON pricetable.dates = cte2.dates
So, you could use ROW_NUMBER to sort the rows and then use that result to select a value 2 rows ahead. LEAD just does that very thing directly.
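To try the LEAD version end to end, here is a self-contained sketch using the question's six sample rows. It assumes SQL Server 2012 or later; @pricetable and @offset are hypothetical names introduced for illustration, and the offset is held in a variable so it can be switched to 50 for the real data.

```sql
-- Hypothetical table variable standing in for pricetable.
DECLARE @pricetable TABLE
(
    dates       date PRIMARY KEY,
    price       decimal(10, 2),
    offsetprice decimal(10, 2) NULL
);
INSERT INTO @pricetable (dates, price) VALUES
    ('2018-01-01', 5.01), ('2018-01-03', 8.52), ('2018-02-15', 3.17),
    ('2018-02-24', 4.67), ('2018-03-18', 2.54), ('2018-04-09', 7.37);

DECLARE @offset int = 2;   -- 2 reproduces the sample output; the real data would use 50

WITH cte AS (
    -- Price from @offset rows later in date order; NULL when there is no such row.
    SELECT dates, LEAD(price, @offset) OVER (ORDER BY dates) AS offsetprice
    FROM @pricetable
)
UPDATE p
SET offsetprice = cte.offsetprice
FROM @pricetable AS p
INNER JOIN cte ON p.dates = cte.dates;

SELECT dates, price, offsetprice
FROM @pricetable
ORDER BY dates;
```

The final SELECT should match the desired table in the question: the last two rows keep a NULL offsetprice because no row exists two positions after them. The join back on dates relies on the dates being unique.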

T-SQL - get only latest row for selected condition

I have a table of measurements with columns SERIAL_NBR, DATE_TIME, VALUE.
There is a lot of data, so when I need to get the last 48 hours for 2000 devices
Select * from MY_TABLE where [TIME] >= DATEADD(hh, -48, @TimeNow)
takes a very long time.
Is there a way not to receive all the rows for each device, but only the latest entry? Would this speed up the query execution time?
Assuming there is a column named deviceId (change as per your needs), you can use top 1 with ties with the window function row_number:
Select top 1 with ties *
from MY_TABLE
where [TIME] >= DATEADD(hh, -48, @TimeNow)
Order by row_number() over (
partition by deviceId
order by [TIME] desc
);
You can simply create a Common Table Expression that numbers the entries per device, newest first, and then pick the latest one from there.
;WITH numbered
AS ( SELECT [SERIAL_NBR], [TIME], [VALUE], row_nr = ROW_NUMBER() OVER (PARTITION BY [SERIAL_NBR] ORDER BY [TIME] DESC)
FROM MY_TABLE
WHERE [TIME] >= DATEADD(hh, -48, @TimeNow) )
SELECT [SERIAL_NBR], [TIME], [VALUE]
FROM numbered
WHERE row_nr = 1 -- we want the latest record only
Depending on the amount of data and the indexes available this might or might not be faster than Anthony Hancock's answer.
Similar to his answer you might also try the following:
(from MSSQL's point of view, the below query and Anthony's query are pretty much identical and they'll probably end up with the same query plan)
SELECT M.[SERIAL_NBR], M.[TIME], M.[VALUE]
FROM MY_TABLE AS M
JOIN (SELECT [SERIAL_NBR], max_time = MAX([TIME])
FROM MY_TABLE
GROUP BY [SERIAL_NBR]) AS L -- latest
ON L.[SERIAL_NBR] = M.[SERIAL_NBR]
AND L.max_time = M.[TIME]
WHERE M.[TIME] >= DATEADD(hh, -48, @TimeNow)
Your listed column values and your code don't quite match up so you'll probably have to change this code a little, but it sounds like for each SERIAL_NBR you want the record with the highest DATE_TIME in the last 48 hours. This should achieve that result for you.
SELECT SERIAL_NBR,DATE_TIME,VALUE
FROM MY_TABLE AS M
WHERE M.DATE_TIME >= DATEADD(hh,-48,@TimeNow)
AND M.DATE_TIME = (SELECT MAX(_M.DATE_TIME) FROM MY_TABLE AS _M WHERE M.SERIAL_NBR = _M.SERIAL_NBR)
This will get you details of the latest record per serial number:
Select t.SERIAL_NBR, q.FieldsYouWant
from MY_TABLE t
outer apply
(
select top 1 t2.FieldsYouWant
from MY_TABLE t2
where t2.SERIAL_NBR = t.SERIAL_NBR
order by t2.[TIME] desc
)q
where t.[TIME] >= DATEADD(hh, -48, @TimeNow)
Also, it is worth putting DATEADD(hh, -48, @TimeNow) into a variable rather than calculating it inline.
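As a sketch of that last suggestion combined with the ROW_NUMBER approach, the script below computes the cutoff once and then keeps only the newest row per device. The table variable @MY_TABLE, the sample rows, the fixed @TimeNow value and the name @Cutoff are all hypothetical, chosen only so the script runs on its own (SQL Server 2008 or later).

```sql
-- Hypothetical stand-in for MY_TABLE with a few illustrative rows.
DECLARE @MY_TABLE TABLE (SERIAL_NBR varchar(20), [TIME] datetime, [VALUE] decimal(10, 2));
INSERT INTO @MY_TABLE (SERIAL_NBR, [TIME], [VALUE]) VALUES
    ('A', '2024-01-01T08:00:00', 1.0),   -- older than 48 hours, filtered out
    ('A', '2024-01-02T09:00:00', 2.0),
    ('A', '2024-01-03T10:00:00', 3.0),   -- latest row for A
    ('B', '2024-01-02T20:00:00', 4.0);   -- latest row for B

DECLARE @TimeNow datetime = '2024-01-03T12:00:00';       -- assumed "now"
DECLARE @Cutoff  datetime = DATEADD(hour, -48, @TimeNow); -- computed once

WITH numbered AS
(
    SELECT SERIAL_NBR, [TIME], [VALUE],
           ROW_NUMBER() OVER (PARTITION BY SERIAL_NBR ORDER BY [TIME] DESC) AS row_nr
    FROM @MY_TABLE
    WHERE [TIME] >= @Cutoff
)
SELECT SERIAL_NBR, [TIME], [VALUE]
FROM numbered
WHERE row_nr = 1          -- newest row per device within the window
ORDER BY SERIAL_NBR;
```

With these rows the result should be one row per device: the 10:00 reading for A and the 20:00 reading for B. Whether this is faster on the real table depends on the indexes available there.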

Calculate same day start/end dates as 0 days if another occurrence already exists

I have a query where there are instances where a "phase" starts and ends on the same day - this is calculated as 1 day. If, however, another "phase" starts and ends on the same day against the same ref. no. and period no., then I'd like to calculate this as 0 days.
Example:
**Ref. Period. Phase StDt EndDt**
013 3 KAA 01/01/16 01/01/16 - This is one day
013 3 TAA 02/01/16 03/01/16 - this is 2 days
013 3 KAT 01/01/16 01/01/16 - **would like this to be counted as 0 day**
013 3 TTA 04/04/16 04/04/16 - this is one day
I would like this unique calculation to be done in the data grouped by Ref. And Period numbers. This is a tricky one....
Thanks
Try this.
I am assuming that you are using T-SQL (not sure, as you have also tagged SQL).
;WITH cte_result(ID,Ref, Period,Phase,StDt,EndDt) AS
(
SELECT 1,'013' ,3,'KAA',CAST('01/01/16'AS DATETIME),CAST('01/01/16'AS DATETIME) UNION ALL
SELECT 2,'013' ,3,'TAA','01/02/16','01/03/16' UNION ALL
SELECT 3,'013' ,3,'KAT','01/01/16','01/01/16' UNION ALL
SELECT 4,'013' ,3,'TTA','04/04/16','04/04/16')
,cte_PreResult AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY CAST(StDt AS DATE), CAST(EndDt AS DATE) ORDER BY ID) AS [Order],
Ref,
Period,
Phase,
StDt,
EndDt
FROM cte_result
)
SELECT Ref,
Period,
Phase,
StDt,
EndDt,
CASE
WHEN [Order] <> 1
THEN '0 Day(s)'
ELSE CAST(DATEDIFF(dd, StDt, EndDt) + 1 AS VARCHAR(10)) + ' Day(s)'
END AS Comment
FROM cte_PreResult
If there is no ID column, pick some other column to order by, probably Phase: replace ID with Phase in ROW_NUMBER() OVER (PARTITION BY StDt,EndDt ORDER BY ID) AS [Order]. If there is no candidate column to order by, then try this:
;WITH cte_result(ID,Ref, Period,Phase,StDt,EndDt) AS
(
SELECT 1,'013' ,3,'KAA',CAST('01/01/16'AS DATETIME),CAST('01/01/16'AS DATETIME) UNION ALL
SELECT 2,'013' ,3,'TAA','01/02/16','01/03/16' UNION ALL
SELECT 3,'013' ,3,'KAT','01/01/16','01/01/16' UNION ALL
SELECT 4,'013' ,3,'TTA','04/04/16','04/04/16')
,cte_PreResult AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY CAST(StDt AS DATE), CAST(EndDt AS DATE) ORDER BY (SELECT NULL)) AS [Order],
Ref,
Period,
Phase,
StDt,
EndDt
FROM cte_result
)
SELECT Ref,
Period,
Phase,
StDt,
EndDt,
CASE
WHEN [Order] <> 1
THEN '0 Day(s)'
ELSE CAST(DATEDIFF(dd, StDt, EndDt) + 1 AS VARCHAR(10)) + ' Day(s)'
END AS Comment
FROM cte_PreResult
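The question asks for the rule to apply per Ref and Period, and only to phases that start and end on the same day. A possible variant, offered as a sketch rather than as the answerer's query, adds Ref and Period to the partition and zeroes a row only when StDt = EndDt. The table variable @Phases and the Days alias are hypothetical, and the dates are the question's dd/mm/yy values rewritten in ISO format.

```sql
-- Hypothetical table variable holding the question's four sample rows.
DECLARE @Phases TABLE
(
    ID int, Ref varchar(10), Period int, Phase varchar(10), StDt date, EndDt date
);
INSERT INTO @Phases (ID, Ref, Period, Phase, StDt, EndDt) VALUES
    (1, '013', 3, 'KAA', '2016-01-01', '2016-01-01'),
    (2, '013', 3, 'TAA', '2016-01-02', '2016-01-03'),
    (3, '013', 3, 'KAT', '2016-01-01', '2016-01-01'),
    (4, '013', 3, 'TTA', '2016-04-04', '2016-04-04');

SELECT Ref, Period, Phase, StDt, EndDt,
       CASE
           -- A repeated same-day phase within the same Ref/Period counts as 0.
           WHEN StDt = EndDt
                AND ROW_NUMBER() OVER (PARTITION BY Ref, Period, StDt, EndDt
                                       ORDER BY ID) > 1
               THEN 0
           ELSE DATEDIFF(day, StDt, EndDt) + 1
       END AS Days
FROM @Phases
ORDER BY ID;
```

For the sample rows this should return 1, 2, 0 and 1 days for KAA, TAA, KAT and TTA respectively.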
This expression should work on the SSRS side:
=IIF(Fields!StartDate.Value=Fields!EndDate.Value AND Fields!Phase.Value <> LOOKUPSET(Fields!StartDate.Value &"_" & Fields!EndDate.Value,Fields!StartDate.Value & "_" & Fields!EndDate.Value,Fields!Phase.Value,"DatasetName").GetValue(0),0,DATEDIFF("D",Fields!StartDate.Value,Fields!EndDate.Value)+1)
It will return a value of 1 for the first phase returned by the dataset. If the phase-date range combinations are not unique within the grouping, this will not work as written, but you should be able to modify accordingly.
Also, if the rows are sorted differently between SSRS and the dataset, it may not be the first row that appears that gets the 1.
The below did the trick! Basically, I'm using a COUNT aggregate to count the number of instances where phases start and end on the same day PER Ref and period. Then, wherever there is more than 1, I just use simple CASE expressions to count the first one as 1 and any subsequent ones as 0. I'm adding the below as a subquery in the joins, as a left outer join:
LEFT OUTER JOIN
(SELECT TOP (100) PERCENT VoidPeriod, Ref,
CONVERT(date, PhaseStartDate) AS stdt,
CONVERT(date, PhaseEndDate) AS enddt,
COUNT(*) AS NoOfSameDayPhases,
MIN(PhaseSequence) AS FirstPhSeq
FROM Phases AS Phases_1
WHERE CONVERT(date, PhaseStartDate) = CONVERT(date, PhaseEndDate)
GROUP BY VoidPeriod, Ref, CONVERT(date, PhaseStartDate), CONVERT(date, PhaseEndDate)
) AS SameDayPH
ON CONVERT(date, PhaseEndDate) = SameDayPH.enddt
AND CONVERT(date, PhaseStartDate) = SameDayPH.stdt
AND VoidPhases.VoidPeriod = SameDayPH.VoidPeriod
AND SameDayPH.Ref = VoidPhases.Ref

Identify sub-set of records based on date and rules in SQL Server

I have a dataset that looks like this:
I need to identify the rows that have Linked set to 1 but ONLY where they are together when sorted by ToDate descending as in the picture.
In other words I want to be able to identify these records (EDITED):
This is a simplified dataset, in fact there will be many more records...
The logic that defines whether a record is linked is whether its FromDate is within 8 weeks of the ToDate of the preceding record... but this is test data, so it may not be perfect.
What's the best way to do that please?
You can use LAG() and LEAD() analytic functions:
SELECT * FROM (
SELECT t.*,
LAG(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_1, --Next one
LEAD(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_2, -- Last one,
LEAD(t.linked,2,0) OVER(ORDER BY t.FromDate DESC) as rnk_3 -- Last two,
FROM YourTable t) s
WHERE ((s.rnk_1 = 1 OR s.rnk_2 = 1) AND s.linked = 1) OR
(s.rnk_2 = 1 and s.rnk_3 = 1 and s.linked = 0)
ORDER BY s.FromDate DESC
This will result in records that have linked = 1 and the previous/next record is also 1.
Using LAG and LEAD functions you can examine the previous/next row values given a sort criteria.
You can achieve your required dataset using the following query:
;
WITH CTE_LagLead
AS (
SELECT FromDate,
ToDate,
NoOfDays,
Weeks,
Linked,
LAG(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLag,
LEAD(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLead
FROM #table
)
SELECT FromDate,
ToDate,
NoOfDays,
Weeks,
Linked
FROM CTE_LagLead
WHERE Linked = 1 AND
(LinkedLag = 1 OR
LinkedLead = 1)
ORDER BY ToDate DESC;
See working example
Here is the answer I came up with:
Select
*
from
#tmpAbsences
where
idcol between 1 AND (
Select TOP 1 idcol from #tmpAbsences where Linked=0 order by idcol)
this includes the row 7 in the below picture:

T-SQL CTE self-reference CROSS APPLY previous row by date with gaps

I have an updatable table holding a date-value sequence (say dbo.sequence) in SQL Server 2014. Dates are unique.
When new updates come in, I want to distribute those values into different columns in a separate table (say dbo.distributed_values) according to certain conditions, e.g. depending on whether the previous value from dbo.sequence is less/greater than the current dbo.sequence value, the value either gets inserted into a specified column of dbo.distributed_values or becomes NULL in that column.
Here is the main idea:
;WITH
CTE_tbl (date, value, val_1, val_2, val_3)
AS (
SELECT ... FROM dbo.distributed_values -- get latest values from database
UNION ALL
SELECT
SEQ.date,
SEQ.value,
CASE
WHEN ABS (SEQ.value - prev.value) >= 0.5
THEN SEQ.value
ELSE NULL
END AS val_1,
...
FROM dbo.sequence AS SEQ
CROSS APPLY (SELECT * FROM CTE_tbl WHERE date = DATEADD(DAY, -1, SEQ.date)) AS prev
)
INSERT INTO dbo.distributed_values (...)
SELECT *
FROM CTE_tbl
ORDER BY date ASC
OPTION (MAXRECURSION 1000)
Seems it works mostly, but the dbo.sequence contains gaps, so I can not use things like date = DATEADD(DAY, -1, SEQ.date) to bind on previous row properly.
2012-01-04
2012-01-05
2012-01-06
2012-01-09
2012-01-10
2012-01-11
How to bind previous value correctly in case of date gaps?
UPD:
By the way, I can not use LAG ... OVER in WHERE clause, I tried. Could it be used here somehow?
Add another CTE and use that in your recursive CTE, something like this:
;WITH
SequenceWithPrevious AS(
SELECT *
,PrevValue = LAG(value,1,NULL) OVER (ORDER BY SEQ.date)
,PrevDate = LAG(date,1,NULL) OVER (ORDER BY SEQ.date)
FROM dbo.sequence AS SEQ
),
CTE_tbl (date, value, val_1, val_2, val_3)
AS (
SELECT ... FROM dbo.distributed_values -- get latest values from database
UNION ALL
SELECT ...
FROM SequenceWithPrevious AS SEQ
CROSS APPLY (SELECT * FROM CTE_tbl WHERE date = SEQ.PrevDate) AS prev
)
INSERT INTO dbo.distributed_values (...)
SELECT *
FROM CTE_tbl
ORDER BY date ASC
OPTION (MAXRECURSION 1000)
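To see how LAG binds each row to its predecessor across gaps, here is a small standalone sketch using the dates from the question. The values are made up for illustration, @sequence is a hypothetical table variable, and only the val_1 rule from the question is shown. If each output column depends only on the previous value in dbo.sequence, as val_1 does here, the recursive step may not be needed at all; that is an assumption about the real rules, not something the question confirms.

```sql
-- Hypothetical stand-in for dbo.sequence: the question's dates, invented values.
DECLARE @sequence TABLE ([date] date PRIMARY KEY, [value] decimal(10, 2));
INSERT INTO @sequence ([date], [value]) VALUES
    ('2012-01-04', 10.0), ('2012-01-05', 10.2), ('2012-01-06', 11.0),
    ('2012-01-09', 11.1), ('2012-01-10', 12.0), ('2012-01-11', 12.3);

WITH SequenceWithPrevious AS
(
    SELECT [date], [value],
           LAG([value]) OVER (ORDER BY [date]) AS PrevValue,
           LAG([date])  OVER (ORDER BY [date]) AS PrevDate
    FROM @sequence
)
SELECT [date], [value], PrevDate,
       -- Keep the value only when it moved by at least 0.5 from the previous row.
       CASE WHEN ABS([value] - PrevValue) >= 0.5 THEN [value] ELSE NULL END AS val_1
FROM SequenceWithPrevious
ORDER BY [date];
```

The row for 2012-01-09 gets PrevDate 2012-01-06, so the weekend gap no longer breaks the link to the previous row. The first row has no predecessor, so its PrevDate and val_1 are NULL.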

Resources