Determine consecutive date count in SQL Server - sql-server

I have some data that looks like this:
id date
--------------------------------
123 2013-04-08 00:00:00.000
123 2013-04-07 00:00:00.000
123 2013-04-06 00:00:00.000
123 2013-04-04 00:00:00.000
123 2013-04-03 00:00:00.000
I need to return a count of the most recent consecutive date streak for a given ID, which in this case would be 3 for id 123. I have no idea if this can be done in SQL. Any suggestions?

The way to do this is to subtract a sequence of numbers from the dates and group by the difference. For a run of consecutive dates, that difference is constant. Here is an example to get the length of all sequences for an id:
select id, grp, count(*) as NumInSequence, min(date), max(date)
from (select t.*,
(date - row_number() over (partition by id order by date)) as grp
from data t
) t
group by id, grp
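To get the longest one, I would use row_number() again (for the most recent streak, which is what the question asks for, order by max(date) desc instead of count(*) desc):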
select t.*
from (select id, grp, count(*) as NumInSequence,
min(date) as mindate, max(date) as maxdate,
row_number() over (partition by id order by count(*) desc) as seqnum
from (select t.*,
(date - row_number() over (partition by id order by date)) as grp
from data t
) t
group by id, grp
) t
where seqnum = 1
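To make the grouping arithmetic concrete, here is a minimal Python sketch of the same idea, using the sample dates from the question. The function names are made up for illustration, and the sketch assumes one row per day (duplicate dates are collapsed first, which the SQL above does not do).

```python
from datetime import date, timedelta
from itertools import groupby


def streaks(dates):
    """Return (start, end, length) for every run of consecutive days."""
    ordered = sorted(set(dates))
    # "date minus row number" is the same value for every day in a run
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered, start=1)]
    runs = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in members]
        runs.append((days[0], days[-1], len(days)))
    return runs


def most_recent_streak(dates):
    """Length of the run that ends on the latest date, or 0 for no dates."""
    runs = streaks(dates)
    return max(runs, key=lambda run: run[1])[2] if runs else 0


sample = [date(2013, 4, 8), date(2013, 4, 7), date(2013, 4, 6),
          date(2013, 4, 4), date(2013, 4, 3)]
print(most_recent_streak(sample))  # 3
```

Picking the run with the latest end date corresponds to ordering by max(date) desc in the SQL; picking the run with the largest length corresponds to ordering by count(*) desc.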

Related

Update gaps in sequential table

I have a table that contains employee bank data
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |Null
Smith |Vacation |2023-01-03 |Null
Smith |Vacation |2023-01-04 |7.5
I would like to write a statement so that I can update 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, for each row with a null Delta, I want to use the value from the most recent non-null row whose date is not later than that row's date.
Once complete, I want the table to look like this:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |15.0
Smith |Vacation |2023-01-03 |15.0
Smith |Vacation |2023-01-04 |7.5
The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.
I currently update the table with the following, but I am wondering if there is a more efficient way to do so?
WITH cte_date
AS (SELECT dd.date_key,
db.balance_key,
feb.employee_key
FROM shared.dim_date dd
CROSS JOIN
(
SELECT DISTINCT
employee_key
FROM wfms.fact_employee_balance
) feb
CROSS JOIN wfms.dim_balance db
WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
SELECT dd.*,
t.delta
INTO wfms.test2
FROM cte_date dd
LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
AND dd.employee_key = t.employee_key
AND t.date_key = (SELECT TOP 1 tt1.date_key
FROM wfms.test1 tt1
WHERE tt1.balance_key = t.balance_key
AND tt1.employee_key = t.employee_key
AND tt1.date_key < dd.date_key);
Just for fun, I wanted to test an idea.
For the moment, let's assume the gaps are not too wide. In this example, at most 7 days.
Relative to the batch, the lag() over() approach cost 22% while the Cross Apply cost 78%.
Again, just for fun:
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta
,lag(Delta,1) over (partition by Employee,Bank order by date)
,lag(Delta,2) over (partition by Employee,Bank order by date)
,lag(Delta,3) over (partition by Employee,Bank order by date)
,lag(Delta,4) over (partition by Employee,Bank order by date)
,lag(Delta,5) over (partition by Employee,Bank order by date)
,lag(Delta,6) over (partition by Employee,Bank order by date)
,lag(Delta,7) over (partition by Employee,Bank order by date)
)
From YourTable A
Versus
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta,B.Delta)
From YourTable A
Cross Apply ( Select top 1 Delta
From YourTable
Where Employee=A.Employee
and A.Bank = Bank
and Delta is not null
and A.Date>=Date
Order By Date desc
) B
Update
Same results with 20 days
Here is another way. Use sum() as a window function to assign each row to a group "Grp" (one row with a non-null Delta followed by the subsequent rows with null Delta). Then take max(Delta) over the Grp to return the non-null value.
select Employee, Bank, [Date], max (max(Delta))
over (partition by Employee, Bank, Grp)
from
(
select *, Grp = sum (case when Delta is not null then 1 else 0 end)
over (partition by Employee,Bank
order by [Date])
from YourTable
) t
group by Employee, Bank, [Date], Grp
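As a rough illustration of what the Grp query computes, here is a small Python sketch that carries the last non-null Delta forward in one ordered pass per (Employee, Bank). The function name and the tuple layout are assumptions made for this example; the rows are the sample data from the question.

```python
def fill_forward(rows):
    """rows: (employee, bank, date, delta) tuples; delta may be None.

    Returns the rows ordered by employee, bank, date, with each None delta
    replaced by the most recent earlier non-null delta for the same
    employee and bank (it stays None if there is no earlier value).
    """
    filled = []
    last_seen = {}  # (employee, bank) -> most recent non-null delta
    for employee, bank, day, delta in sorted(rows, key=lambda r: r[:3]):
        key = (employee, bank)
        if delta is not None:
            last_seen[key] = delta  # a non-null row starts a new group
        filled.append((employee, bank, day, last_seen.get(key)))
    return filled


rows = [
    ("Smith", "Vacation", "2023-01-01", 15.0),
    ("Smith", "Vacation", "2023-01-02", None),
    ("Smith", "Vacation", "2023-01-03", None),
    ("Smith", "Vacation", "2023-01-04", 7.5),
]
for row in fill_forward(rows):
    print(row)
```

Unlike the fixed chain of lag() calls, this does not depend on a maximum gap width, which is also true of the Grp approach.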

How to transpose min/max value to columns in SQL Server using Partition?

I want to get the MIN and MAX from a certain values and put them in columns beside each other. Here's my query but I don't know how to transpose the values...
SELECT *
, MIN([CID]) OVER (PARTITION BY [TID] ORDER BY [TID]) MinID
, MAX([CID]) OVER (PARTITION BY [TID] ORDER BY [TID]) MaxID
Given:
TID CID DATE
123456789 1 01JAN
123456789 2 02JAN
123456789 3 03JAN
123456789 4 04JAN
Result:
TID CID DATE MIN MAX DATEMIN DATEMAX
123456789 1 01JAN 1 4 01JAN 04JAN
Isn't simple aggregation good enough here?
select
tid,
min(cid) min_cid,
max(cid) max_cid,
min(date) min_date,
max(date) max_date
from mytable
group by tid
Or, if cid and date do not increase together, you can use conditional aggregation:
select
tid,
max(case when rn_asc = 1 then cid end) cid_at_min_date,
max(case when rn_desc = 1 then cid end) cid_at_max_date,
min([date]) min_date,
max([date]) max_date
from (
select
t.*,
row_number() over(partition by tid order by [date] asc ) rn_asc,
row_number() over(partition by tid order by [date] desc) rn_desc
from mytable t
) t
where 1 in (rn_asc, rn_desc)
group by tid
This orders records by date, and gives you the cids that correspond to the minimum and maximum date. You can easily adapt the query if you want things the other way around (basically, switch cid and date).
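The difference between the two queries shows up only when cid and date do not rise together. A minimal Python sketch of the second behaviour follows; the function name, the placeholder year, and the second tid (999) are invented for illustration and are not part of the question's data.

```python
from datetime import date


def cid_at_date_extremes(rows):
    """rows: (tid, cid, day). Per tid, return the cid on the earliest and
    latest day together with those days."""
    result = {}
    for tid, cid, day in rows:
        entry = result.get(tid)
        if entry is None:
            result[tid] = {"cid_at_min_date": cid, "cid_at_max_date": cid,
                           "min_date": day, "max_date": day}
            continue
        if day < entry["min_date"]:
            entry["min_date"], entry["cid_at_min_date"] = day, cid
        if day > entry["max_date"]:
            entry["max_date"], entry["cid_at_max_date"] = day, cid
    return result


# The year 2000 is a placeholder: the sample only shows day and month.
rows = [
    (123456789, 1, date(2000, 1, 1)),
    (123456789, 2, date(2000, 1, 2)),
    (123456789, 3, date(2000, 1, 3)),
    (123456789, 4, date(2000, 1, 4)),
    (999, 9, date(2000, 1, 1)),  # hypothetical: highest cid on earliest day
    (999, 7, date(2000, 1, 5)),
]
summary = cid_at_date_extremes(rows)
print(summary[123456789])
print(summary[999])
```

For tid 999, a plain min(cid)/max(cid) would report 7 and 9, while the date-driven version reports 9 at the earliest date and 7 at the latest.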

Reporting Previous Records on SQL Server

I’m struggling a bit here. The data is fabricated, but the query concept is very real.
I need to select the Customer, Current Amount, Previous Amount, Sequence and Date
WHERE DATE < 1190105
AND the DATE/SEQ is the maximum date/seq prior to that date point grouping by customer.
I’ve spent quite a few days trying all sorts of things using HAVING, nested select to try and obtain the max-date/amount and min-date/amount by customer and can’t quite get my head around it. I’m sure it should be quite easy, but any help you can offer would be really appreciated.
Thanks
**SEQ DATE CUSTOMER AMOUNT**
1 1181225 Bob 400
2 1181226 Fred 300
3 1190101 Bob 100
4 1190104 Fred 500
5 1190104 George 200
6 1190105 Bob 150
7 1190106 Bob 200
8 1190110 Fred 160
9 1190110 Bob 300
10 1190112 Fred 400
Opt 1: use the row_number and lag functions
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Sec,
[Date],
Customer,
Amount as CurrentAmount,
Lag(Amount) OVER (Partition By Customer Order By [Date]) as PreviousAmount
FROM
YourTable
WHERE
[DATE] < 1190105
Opt 2: use outer apply
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Sec,
[Date],
Customer,
Amount as CurrentAmount,
Prev.Amount as PreviousAmount
FROM
YourTable T
OUTER APPLY (
SELECT TOP 1 Amount FROM YourTable
WHERE Customer = T.Customer AND [Date] < T.[Date]
ORDER BY [DATE] DESC
) Prev
WHERE
T.[Date] < 1190105
Opt 3: use a correlated subquery
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Sec,
[Date],
Customer,
Amount as CurrentAmount,
(
SELECT TOP 1 Amount FROM YourTable
WHERE Customer = T.Customer AND [Date] < T.[Date]
ORDER BY [DATE] DESC
) as PreviousAmount
FROM YourTable T
WHERE
T.[Date] < 1190105
First restrict the rows with the date filter, then search for the max by customer.
Using GROUP BY:
DECLARE @FilterDate INT = 1190105
;WITH MaxDateByCustomer AS
(
SELECT
T.CUSTOMER,
MaxSEQ = MAX(T.SEQ)
FROM
YourTable AS T
WHERE
T.Date < @FilterDate
GROUP BY
T.CUSTOMER
)
SELECT
T.*
FROM
YourTable AS T
INNER JOIN MaxDateByCustomer AS M ON
T.CUSTOMER = M.CUSTOMER AND
T.SEQ = M.MaxSEQ
Using ROW_NUMBER window function:
DECLARE @FilterDate INT = 1190105
;WITH DateRankingByCustomer AS
(
SELECT
T.*,
DateRanking = ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER ORDER BY T.SEQ DESC)
FROM
YourTable AS T
WHERE
T.Date < @FilterDate
)
SELECT
D.*
FROM
DateRankingByCustomer AS D
WHERE
D.DateRanking = 1
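For reference, here is a small Python sketch of what these queries should return on the sample data: per customer, the latest row before the cutoff, plus the amount from the row before it. The function name and the output layout are assumptions for this example only.

```python
def latest_before(rows, cutoff):
    """rows: (seq, date, customer, amount).

    Per customer, return the row with the highest seq among rows dated
    before cutoff, along with the amount of the preceding qualifying row
    (None when there is no earlier row).
    """
    history = {}
    for seq, day, customer, amount in sorted(rows):  # ordered by seq
        if day < cutoff:
            history.setdefault(customer, []).append((seq, day, amount))
    report = {}
    for customer, entries in history.items():
        seq, day, amount = entries[-1]
        previous = entries[-2][2] if len(entries) > 1 else None
        report[customer] = {"seq": seq, "date": day,
                            "current": amount, "previous": previous}
    return report


rows = [
    (1, 1181225, "Bob", 400), (2, 1181226, "Fred", 300),
    (3, 1190101, "Bob", 100), (4, 1190104, "Fred", 500),
    (5, 1190104, "George", 200), (6, 1190105, "Bob", 150),
    (7, 1190106, "Bob", 200), (8, 1190110, "Fred", 160),
    (9, 1190110, "Bob", 300), (10, 1190112, "Fred", 400),
]
for customer, info in latest_before(rows, 1190105).items():
    print(customer, info)
```

With the cutoff 1190105, Bob's latest qualifying row is SEQ 3 (amount 100, previous 400), Fred's is SEQ 4 (amount 500, previous 300), and George's is SEQ 5 (amount 200, no previous row).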

how to select rows when minute value change

I have a table named
Ship(Date datetime,name varchar(50),Type char(1)).
In the Ship table, the "Date" column is of datetime datatype. I want to select one row from the Ship table each time the minute value in the Date column changes. For this I used the following query:
;WITH x AS
(
SELECT Name, Date,Type, rn = ROW_NUMBER() OVER
(PARTITION BY Date ORDER BY Date desc)
FROM Ship
)
SELECT * FROM x WHERE rn = 1
But this does not give the desired output. The result is:
Date Name Type
2017-05-08 14:59:13.000 sumit A
2017-05-08 14:59:23.000 sumit B
2017-05-08 14:59:33.000 sumit A
2017-05-08 15:00:05.000 Ajay B
2017-05-08 15:00:13.000 Deep G
2017-05-08 15:01:03.000 Suri D
2017-05-08 15:01:13.000 Faiz E
The output above also includes rows where only the seconds value of the Date column changes. I want to select rows only when the minute value of the Date column changes. Can anyone solve this?
You could use datediff in minutes in the partition by clause (the datepart must be minute, mi or n; min is not a valid abbreviation):
;WITH x AS
(
SELECT Name, Date,Type, rn = ROW_NUMBER() OVER
(PARTITION BY datediff(minute,0,[Date]) ORDER BY Date desc)
FROM Ship
)
SELECT * FROM x WHERE rn = 1
Or, a shorter version:
SELECT TOP 1 WITH TIES
Name, Date,Type
FROM Ship
ORDER BY ROW_NUMBER() OVER (PARTITION BY datediff(minute,0,[Date]) ORDER BY [date] desc)
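You can add DATEPART to the PARTITION BY to partition on the minutes part only. Note that DATEPART(N, Date) returns the minute of the hour (0 to 59), so rows from different hours or days that share the same minute number land in the same partition.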
;WITH x AS
(
SELECT Name, Date,Type, rn = ROW_NUMBER() OVER
(PARTITION BY DATEPART(N, Date) ORDER BY Date desc)
FROM Ship
)
SELECT * FROM x WHERE rn = 1
Refer to the demo: http://rextester.com/YZW84894
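The two partition keys behave differently once the data spans more than one hour. Here is a minimal Python sketch, using the rows from the question, that buckets by the full calendar minute (the analogue of datediff(minute, 0, [Date])). The function name is made up for illustration.

```python
from datetime import datetime


def last_row_per_minute(rows):
    """rows: (timestamp, name, type). Keep the latest row of each calendar
    minute, returned in chronological order."""
    latest = {}
    for row in rows:
        # Truncate to the minute: keeps date and hour, unlike minute-of-hour
        bucket = row[0].replace(second=0, microsecond=0)
        if bucket not in latest or row[0] > latest[bucket][0]:
            latest[bucket] = row
    return [latest[bucket] for bucket in sorted(latest)]


rows = [
    (datetime(2017, 5, 8, 14, 59, 13), "sumit", "A"),
    (datetime(2017, 5, 8, 14, 59, 23), "sumit", "B"),
    (datetime(2017, 5, 8, 14, 59, 33), "sumit", "A"),
    (datetime(2017, 5, 8, 15, 0, 5), "Ajay", "B"),
    (datetime(2017, 5, 8, 15, 0, 13), "Deep", "G"),
    (datetime(2017, 5, 8, 15, 1, 3), "Suri", "D"),
    (datetime(2017, 5, 8, 15, 1, 13), "Faiz", "E"),
]
for stamp, name, kind in last_row_per_minute(rows):
    print(stamp, name, kind)
```

On this sample both keys give three rows (14:59:33, 15:00:13, 15:01:13). With a row at 15:59, a minute-of-hour key would merge it with the 14:59 rows, while the truncated timestamp keeps them apart.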

SQL Server Select where stage/sequence has been missed or is out of sequence

I have a table that has families_id, date, metric_id
For each families_id, a record is inserted with a date for each metric_id from 1 to 10.
So there should be 10 records for each families_id, and their dates should follow on from each other. For example, the date for metric_id 10 should be later than the date for metric_id 6.
En masse, how can I select the families where:
1. A metric_id has been missed
2. The dates are out of sequence, e.g. the date for metric_id 6 is before the date for metric_id 2
Use row_number to assign one ordinal by date and one by metric_id for each family. If the records are in sequence, the two ordinals match. Likewise, metric_id 1,2,3,4... should match its own calculated row_number(), which is also 1,2,3,4...
--should detect wrongly ordered metric_ids
SELECT IQ.* FROM (SELECT families_id, [date], metric_id,
ROW_NUMBER() OVER (PARTITION BY families_id ORDER BY [date]) rn_date,
ROW_NUMBER() OVER (PARTITION BY families_id ORDER BY metric_id) rn_metric FROM YourTable) IQ
WHERE IQ.rn_date != IQ.rn_metric;
--should detect a missed metric_id
SELECT IQ.* FROM (SELECT families_id, [date], metric_id,
ROW_NUMBER() OVER (PARTITION BY families_id ORDER BY [date]) rn_date,
ROW_NUMBER() OVER (PARTITION BY families_id ORDER BY metric_id) rn_metric FROM YourTable) IQ
WHERE IQ.metric_id != IQ.rn_metric;
Another possibility: detect a metric_id for which a higher metric_id has an earlier date
SELECT y1.families_id, y1.metric_id FROM yourtable y1
WHERE
EXISTS(SELECT 0 FROM yourtable y2 WHERE y1.families_id = y2.families_id
AND
y2.[date] < y1.[date]
AND
y2.metric_id > y1.metric_id)
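The two checks can also be sketched in a few lines of Python for a single family. The function name, the expected range and the sample records below are fabricated for illustration; the question gives no data.

```python
def check_family(records, expected=range(1, 11)):
    """records: (metric_id, date) pairs for one family.

    Returns (missing_ids, out_of_order), where out_of_order lists pairs of
    neighbouring metric_ids whose dates go backwards.
    """
    present = {metric_id for metric_id, _ in records}
    missing = [m for m in expected if m not in present]
    ordered = sorted(records)  # by metric_id
    # Dates are in sequence exactly when no neighbouring pair goes backwards
    out_of_order = [(a[0], b[0]) for a, b in zip(ordered, ordered[1:])
                    if b[1] < a[1]]
    return missing, out_of_order


# Hypothetical family: metric 4 was never recorded, metric 3 predates metric 2
family = [(1, "2020-01-01"), (2, "2020-01-05"),
          (3, "2020-01-03"), (5, "2020-01-09")]
missing, out_of_order = check_family(family, expected=range(1, 6))
print(missing)       # [4]
print(out_of_order)  # [(2, 3)]
```

Comparing only neighbouring metric_ids is enough to know whether a family has any inversion, but it does not list every offending pair the way the EXISTS query does.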

Resources