SQL query to get start and end date from a result set - sql-server

I am working on a requirement where the raw data is in the following format.
Requirement: StartDate should be the date when the status changed to 1, and EndDate should be the first date after the record's status changed from 1 to any other number.
Customer  Status  Date
------------------------------
A123      0       7/2/2021
A123      0       7/15/2021
A123      0       7/22/2021
A123      1       8/18/2021
A123      1       9/8/2021
A123      0       12/1/2021
A123      0       1/21/2022
A123      1       3/6/2022
A123      1       3/7/2022
A123      0       3/15/2022
B123      1       1/1/2022
B123      0       1/6/2022
C123      1       1/2/2022
C123      2       1/8/2022
C123      0       1/9/2022
Expected output:
Customer  StartDate  EndDate
------------------------------
A123      8/18/2021  12/1/2021
A123      9/8/2021   12/1/2021
A123      3/6/2022   3/15/2022
A123      3/7/2022   3/15/2022
B123      1/1/2022   1/6/2022
C123      1/2/2022   1/8/2022
The query I tried is below. It returns the expected output for customers B123 and C123, but not for A123.
Query explanation: the first part takes all records with Status = 1, the second part takes only the records where Status is not equal to 1, and the two datasets are joined on Customer and the generated row number.
SELECT A.[Customer],A.StartDate,B.EndDate
from
(
SELECT [Customer],MIN(Date) AS STARTDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] = 1
GROUP BY Customer,Date,[Status]
) A
LEFT JOIN
(
SELECT [Customer],MIN(Date) AS ENDDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] != 1
AND Date>(
SELECT MIN(Date) AS STARTDATE
FROM table1
WHERE [STATUS] = 1
)
GROUP BY Customer,Date,[Status]
) B
ON
(
A.[Customer] = B.[Customer]
AND A.RowNum = B.RowNum
)
ORDER BY A.Startdate

First select the rows where Status = 1, then use CROSS APPLY to get, for each of them, the earliest later Date for the same customer where Status is not equal to 1.
select s.[Customer],
StartDate = s.[Date],
EndDate = e.[Date]
from Table1 s
cross apply
(
select [Date] = min(e.[Date])
from Table1 e
where e.[Customer] = s.[Customer]
and e.[Date] > s.[Date]
and e.[Status] <> 1
) e
where s.[Status] = 1
order by s.[Customer], s.[Date]
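To make the per-row lookup concrete, here is a minimal Python sketch of the same idea, not the SQL Server implementation itself. For every row with status 1 it picks the earliest later date of the same customer whose status is not 1. The function name `start_end` and the tuple layout are assumptions made for illustration; the data is the B123 and C123 rows from the question.

```python
from datetime import date

# (customer, status, date) rows for B123 and C123 from the question
rows = [
    ("B123", 1, date(2022, 1, 1)),
    ("B123", 0, date(2022, 1, 6)),
    ("C123", 1, date(2022, 1, 2)),
    ("C123", 2, date(2022, 1, 8)),
    ("C123", 0, date(2022, 1, 9)),
]

def start_end(rows):
    """For each status-1 row, find the earliest later non-1 date of that customer."""
    result = []
    for customer, status, start in rows:
        if status != 1:
            continue
        later = [d for c, s, d in rows if c == customer and d > start and s != 1]
        # Like MIN() over an empty set, the end date is None when nothing follows
        result.append((customer, start, min(later) if later else None))
    return sorted(result)

for customer, start, end in start_end(rows):
    print(customer, start, end)
```

As with the query, each status-1 row produces its own output row, so two consecutive status-1 rows share the same end date.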

Here is a more efficient way to do this without a self-join.
WITH cte01only AS
( SELECT *, CASE Status WHEN 1 THEN 1 ELSE 0 END AS Status1 FROM table1 ),
cteDifference AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Date, Status1)
- ROW_NUMBER() OVER (PARTITION BY Customer, Status1 ORDER BY Date) AS StatusGroup
FROM cte01only
),
cteGroup AS
(
SELECT Customer, StatusGroup, Status1, MIN(Date) As StartDate
FROM cteDifference
GROUP BY Customer, StatusGroup, Status1
),
cteNextDate AS
(
SELECT Customer, StatusGroup, Status1, StartDate,
LEAD(StartDate, 1, NULL) OVER (PARTITION BY Customer ORDER BY StartDate) AS EndDate
FROM cteGroup
)
SELECT Customer, StartDate, EndDate
FROM cteNextDate
WHERE Status1 = 1
ORDER BY Customer, StartDate
The key trick here is the second CTE, which uses the difference of two ROW_NUMBER() functions to tag each customer's records (with the StatusGroup column) into separate partitions: contiguous runs of records whose status is either 1 or not 1. After that, the records can be grouped by that tag to get each run's start date, and the LEAD() function then returns the following group's StartDate as the current group's EndDate.
(There may be a more compact way to express this, but I like to lay out each stage as a separate CTE.)
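The row-number difference can be hard to picture, so the following Python sketch walks through it for customer A123 from the question. It is an illustration under assumptions, not a translation of the query: the function name `islands` is made up, and the groups are ordered by their start date before looking ahead to the next group, since the tag values themselves are not guaranteed to be in date order.

```python
from datetime import date

# (date, status) rows for customer A123 from the question
rows = [
    (date(2021, 7, 2), 0), (date(2021, 7, 15), 0), (date(2021, 7, 22), 0),
    (date(2021, 8, 18), 1), (date(2021, 9, 8), 1),
    (date(2021, 12, 1), 0), (date(2022, 1, 21), 0),
    (date(2022, 3, 6), 1), (date(2022, 3, 7), 1),
    (date(2022, 3, 15), 0),
]

def islands(rows):
    """Return (start, end) for each run of status 1; end is the next run's first date."""
    seen = {0: 0, 1: 0}          # per-flag row counters (second ROW_NUMBER)
    starts = {}                  # (flag, tag) -> earliest date in that run
    for overall_rn, (d, status) in enumerate(sorted(rows), start=1):
        flag = 1 if status == 1 else 0
        seen[flag] += 1
        tag = overall_rn - seen[flag]   # constant within a contiguous run
        key = (flag, tag)
        starts[key] = min(starts.get(key, d), d)
    # Order the runs chronologically, then look ahead one run (like LEAD)
    ordered = sorted((start, flag) for (flag, _tag), start in starts.items())
    result = []
    for i, (start, flag) in enumerate(ordered):
        if flag == 1:
            end = ordered[i + 1][0] if i + 1 < len(ordered) else None
            result.append((start, end))
    return result

for start, end in islands(rows):
    print(start, end)
```

For A123 the tags come out as 0, 3, 2, 5, 4 in date order, which is why the sketch sorts the runs by start date rather than by tag.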

Related

How to update rows between two different sets of criteria in SQL Server without using a loop

Issue: How to update rows between two different sets of criteria in SQL Server without using a loop (SQL Server 2014). In other words, for each row in a result set, how to update every row between the first occurrence (with one criterion) and the second occurrence (with different criteria). I think part of the issue is trying to run a TOP N query for every row in the query.
Specifically:
In the example starting table below, how can I update the last 2 columns of dates where:
Update rows between the null Category rows and the last consecutive "M" Category row if the null Category row is preceded by a "S" Category. Category can contain any order of "S", "M", or null.
Set StartDate = IDEndDate+1 day of the "S" row preceding the null row.
Set EndDate = IDEndDate of the last row with a "M" Category.
Here is a SQLFiddle.
Notes: I have done this in the past with a loop (fetch..) but I am trying to do this with a few queries instead kind of like:
step 1: Get work: select all valid null rows (beginning of range)
step 2: for each row above, select the related last "M" row (end of range) and then run a query to update the StartDate, EndDates in each range.
Starting Table:
ID IDStartDate IDEndDate Category
------------------------------------
11 2017-01-01 2017-01-31 S
11 2017-02-02 2017-02-03 null
11 2017-02-03 2017-03-31 M
11 2017-04-01 2017-04-30 M
22 2017-05-01 2017-06-15 S
22 2017-06-16 2017-06-20 null
22 2017-06-21 2017-06-25 M
22 2017-06-26 2017-06-27 null
22 2017-06-28 2017-06-29 S
22 2017-06-30 2017-07-05 M
33 2017-06-30 2017-07-14 M
33 2017-07-15 2017-07-20 S
33 2017-07-21 2017-07-25 null
44 2018-06-30 2018-07-14 S
44 2018-07-15 2018-07-20 M
44 2018-07-21 2018-07-25 null
Desired Ending Table:
ID IDStartDate IDEndDate Category StartDate EndDate
----------------------------------------------------------
11 2017-01-01 2017-01-31 S
11 2017-02-02 2017-02-03 null 2017-02-01 2017-04-30
11 2017-02-03 2017-03-31 M 2017-02-01 2017-04-30
11 2017-04-01 2017-04-30 M 2017-02-01 2017-04-30
22 2017-05-01 2017-06-15 S
22 2017-06-16 2017-06-20 null 2017-06-16 2017-06-25
22 2017-06-21 2017-06-25 M 2017-06-16 2017-06-25
22 2017-06-26 2017-06-27 null
22 2017-06-28 2017-06-29 S
22 2017-06-30 2017-07-05 M
33 2017-06-30 2017-07-14 M
33 2017-07-15 2017-07-20 S
33 2017-07-21 2017-07-25 null
44 2018-06-30 2018-07-14 S
44 2018-07-15 2018-07-20 M
44 2018-07-21 2018-07-25 null
Below is some SQL to create the table and view the query results that I have started. I tried cte, cross apply, outer apply, inner joins... with no luck.
thanks so much!
CREATE TABLE test (
ID INT,
IDStartDate date,
IDEndDate date,
Category VARCHAR (2),
StartDate date,
EndDate date
);
INSERT INTO test (ID, IDStartDate, IDEndDate, Category)
VALUES
(11, '2017-01-01', '2017-01-31', 'S')
,(11, '2017-02-02', '2017-02-03', null)
,(11, '2017-02-03', '2017-03-31', 'M')
,(11, '2017-04-01', '2017-04-30', 'M')
,(22, '2017-05-01', '2017-06-15', 'S')
,(22, '2017-06-16', '2017-06-20', null)
,(22, '2017-06-21', '2017-06-25', 'M')
,(22, '2017-06-26', '2017-06-27', null)
,(22, '2017-06-28', '2017-06-29', 'S')
,(22, '2017-06-30', '2017-07-05', 'M')
,(33, '2017-06-30', '2017-07-14', 'M')
,(33, '2017-07-15', '2017-07-20', 'S')
,(33, '2017-07-21', '2017-07-25', null)
,(44, '2018-06-30', '2018-07-14', 'S')
,(44, '2018-07-15', '2018-07-20', 'M')
,(44, '2018-07-21', '2018-07-25', null);
--**************************
--results: shows first rows of each range
--**************************
;with cte as
(
select *
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS RowNum
,LAG(IDEndDate) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastIDEndDate
,LAG(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastCategory
,LEAD(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS nextCategory
from test
)
select * --select first row of each range to update
from cte
where Category is null and lastCategory = 'S' and nextCategory = 'M'
--*******************************
--6 of 8 "new" values are correct (missing NewEndDate for first range)
--*******************************
;with cte as
(
SELECT *
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS RowNum
,LAG(IDEndDate) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastIDEndDate
,LAG(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastCategory
,LEAD(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS nextCategory
FROM test
), cte2 as
(
select * --find the first/start row of each range
,LAG(RowNum) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastRowNum
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', DateAdd(day, 1, lastIDEndDate), null) as NewStartDate
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', RowNum, null) as NewStartRowNum
from cte
)
select t1.*, t3.*
from cte2 t1
outer apply
(
select top 1 --find the last/ending row of each range
t2.lastIDEndDate as NewEndDate
,t2.lastRowNum as NewEndRowNum
from cte2 t2
where t1.ID = t2.ID
and t1.NewStartRowNum < t2.RowNum
and t2.nextCategory <> 'M'
order by t2.ID, t2.RowNum
) t3
order by t1.ID, t1.RowNum
Here's an attempt on this SQL puzzle.
Basically, it updates through a CTE.
First it calculates a cumulative sum to create a kind of ranking.
Then it calculates the dates only for ranks 2 and 3.
;WITH CTE AS
(
SELECT ID, IDStartDate, IDEndDate, Category, StartDate, EndDate,
DATEADD(day,1, FIRST_VALUE(IDEndDate) OVER (PARTITION BY ID ORDER BY IDStartDate)) AS NewStartDate,
FIRST_VALUE(IDEndDate) OVER (PARTITION BY ID ORDER BY IDStartDate DESC) AS NewEndDate
FROM
(
SELECT ID, IDStartDate, IDEndDate, Category, StartDate, EndDate,
SUM(CASE WHEN Category = 'S' THEN 2 WHEN Category IS NULL THEN 1 END) OVER (PARTITION BY ID ORDER BY IDStartDate) AS cSum
FROM test t
) q
WHERE cSum IN (2, 3)
)
UPDATE CTE
SET
StartDate = NewStartDate,
EndDate = NewEndDate
WHERE (Category IS NULL OR Category = 'M');
A test on rextester here
I answered my own question. I had two major errors:
1) A Cross Apply (or Outer Apply) is needed for the Top N query to work properly.
Using a cross apply, the Top N query will be run for each row from the inner query.
Using an inner join (or left join), all rows will be returned first from the inner query and the Top N query runs only once.
2) Filtering on "[column] <> 'M'" tripped me up because it does not return rows where the column is NULL. I had to use "[column] = 'S' or [column] is null" instead.
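The NULL behaviour in point 2 follows from SQL's three-valued logic: a comparison with NULL is neither true nor false, so the row fails the WHERE filter. The sketch below demonstrates this with Python's built-in sqlite3 module. SQLite is used only because it is easy to run; it is an assumption that this stands in for SQL Server here, although the comparison rule is standard SQL. The table `t` and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (category TEXT)")
cur.executemany("INSERT INTO t VALUES (?)", [("S",), ("M",), (None,)])

# Comparing NULL with anything yields NULL (returned to Python as None)
null_cmp = cur.execute("SELECT NULL <> 'M'").fetchone()[0]

# The NULL row is silently dropped by the <> filter
not_m = cur.execute("SELECT category FROM t WHERE category <> 'M'").fetchall()

# Spelling out the NULL case keeps it
s_or_null = cur.execute(
    "SELECT category FROM t WHERE category = 'S' OR category IS NULL"
).fetchall()

print(null_cmp, not_m, s_or_null)
conn.close()
```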
Final SQL found in rextester
Working code below:
;with cte as
(
SELECT *
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS RowNum
,LAG(IDEndDate) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastIDEndDate
,LAG(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS lastCategory
,LEAD(Category) OVER(PARTITION BY ID ORDER BY ID, IDStartDate, IDEndDate) AS nextCategory
FROM test
), cte2 as
(
select t1.ID, t1.IDStartDate, t1.IDEndDate --find the first/start row of the range
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', DateAdd(day, 1, lastIDEndDate), null) as NewStartDate
,IIF(Category is null and lastCategory = 'S' and nextCategory = 'M', RowNum, null) as NewStartRowNum
,t3.*
from cte t1
cross apply
(
select top 1 --find the last/ending row of the range
t2.IDEndDate as NewEndDate
,t2.RowNum as NewEndRowNum
from cte t2
where t1.ID = t2.ID
and t1.RowNum < t2.RowNum
and (t2.nextCategory ='S' or t2.nextCategory is null)
order by t2.RowNum
) t3
where Category is null and lastCategory = 'S' and nextCategory = 'M'
)
update t4
set StartDate = NewStartDate
,EndDate = NewEndDate
from cte t4
inner join cte2 t5
on t4.ID = t5.ID
and t4.RowNum Between NewStartRowNum and NewEndRowNum
select * from test

Select Earliest Date Within a Range of Dates Before a Break Occurs

I have been trying to find a solution for getting the most recent start date from a series of date ranges. I have found similar topics on StackOverflow as well as other websites, but none of them worked for my specific scenario.
Here are two examples of the data in my database:
Example 1
Start Date | End Date
-----------|-----------
8/26/2006 | 5/31/2016
6/1/2016 | 12/31/2017
1/1/2018 | NULL
For this example, I'm expecting the result of the query to be: 8/26/2006. This is because the start and end dates are continuous all the way back to the original start date.
Example 2
Start Date | End Date
-----------|-----------
7/6/2014 | 11/30/2014
1/1/2019 | NULL
For this example, I'm expecting the result of the query to be: 1/1/2019. This is because there is a break between 11/30/2014 and 1/1/2019.
I don't need a list of all of the dates or even the end dates returned. I just need the earliest start date before a break in the date ranges.
I'm guessing what I need is a recursive CTE to loop through the records, such as this:
WITH CTE AS
(
SELECT
T1.StartDate
,T1.EndDate
FROM
ExampleTable AS T1
LEFT JOIN
ExampleTable AS T2
ON
T1.EmployeeID = T2.EmployeeID
AND T1.StartDate - 1 = T2.EndDate
WHERE
T1.EmployeeID = @EmployeeID
UNION ALL
SELECT
C.EmployeeID
,C.StartDate
,T2.EndDate
FROM
CTE AS C
JOIN
ExampleTable AS T2
ON
C.EmployeeID = T2.EmployeeID
AND T2.StartDate - 1 = C.EndDate
)
SELECT
StartDate
,NULLIF(MAX(ISNULL(EndDate, '32121231')), '32121231') AS EndDate
FROM
CTE
GROUP BY
StartDate;
But no luck. It always returns all of the date ranges I listed in examples 1 or 2. Can anyone help please?
This seems the simplest method to get the result:
SELECT TOP 1 StartDate
FROM YourTable
ORDER BY CASE WHEN LAG(EndDate) OVER (ORDER BY StartDate) = DATEADD(DAY,-1,StartDate) THEN 1 ELSE 0 END,
StartDate DESC;
So, for your data:
WITH VTE AS(
SELECT CONVERT(date, StartDate,101) AS StartDate,
CONVERT(date, EndDate,101) AS EndDate
FROM (VALUES('7/6/2014','11/30/2014'),
('1/1/2019',NULL)) V(StartDate, EndDate))
SELECT TOP 1 StartDate
FROM VTE
ORDER BY CASE WHEN LAG(EndDate) OVER (ORDER BY StartDate) = DATEADD(DAY,-1,StartDate) THEN 1 ELSE 0 END,
StartDate DESC;
WITH VTE AS(
SELECT CONVERT(date, StartDate,101) AS StartDate,
CONVERT(date, EndDate,101) AS EndDate
FROM (VALUES('8/26/2006','5/31/2016'),
('6/1/2016 ','12/31/2017'),
('1/1/2018 ',NULL)) V(StartDate, EndDate))
SELECT TOP 1 StartDate
FROM VTE
ORDER BY CASE WHEN LAG(EndDate) OVER (ORDER BY StartDate) = DATEADD(DAY,-1,StartDate) THEN 1 ELSE 0 END,
StartDate DESC;
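The ORDER BY in these queries does two things: it puts the rows that do not continue the previous range first, and among those it takes the latest StartDate. A minimal Python sketch of that logic is below, using the two examples from the question. The function name `current_run_start` is an assumption for illustration, and the sketch expects at least one range.

```python
from datetime import date, timedelta

def current_run_start(ranges):
    """ranges: list of (start, end_or_None). Return the start of the latest unbroken run."""
    run_starts = []
    prev_end = None
    for start, end in sorted(ranges, key=lambda r: r[0]):
        # A row continues the run when the previous end is the day before its start
        continues = prev_end is not None and prev_end == start - timedelta(days=1)
        if not continues:
            run_starts.append(start)
        prev_end = end
    return max(run_starts)

example1 = [
    (date(2006, 8, 26), date(2016, 5, 31)),
    (date(2016, 6, 1), date(2017, 12, 31)),
    (date(2018, 1, 1), None),
]
example2 = [
    (date(2014, 7, 6), date(2014, 11, 30)),
    (date(2019, 1, 1), None),
]

print(current_run_start(example1))
print(current_run_start(example2))
```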

SQL frequency count

I have a table in SSMS:
Id Date Value
111 1/1/18 x
111 1/2/18 x
111 1/3/18 y
111 1/4/18 y
111 1/5/18 x
111 1/6/18 x
222 1/3/18 z
222 1/6/18 y
222 1/8/18 y
I want to count the number of days covered by the latest value for each Id. So the output will be:
Id Value Days
111 x 2 *(for 1/5/18 & 1/6/18)*
222 y 3 *(for 1/6/18 & 1/8/18; Here I assume 1/7/18 is a weekend or holiday. Even though my table skips the weekend, we still want to count days for the weekend)*
How would this be done? Many thanks!
Use lag to get the previous row's value and then a running sum to assign groups. Thereafter count the number in the first group.
select id,val,datediff(day,min(date),max(date))+1 as days
from (select t.*,sum(case when val=prev_val then 0 else 1 end) over(partition by id order by date desc) as grp
from (select t.*,lag(val) over(partition by id order by date desc) as prev_val
from tbl t
) t
) t
where grp=1
group by id,val
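As a rough illustration of what the "first group" is, the Python sketch below walks one Id's rows from newest to oldest and stops at the first value change, then measures the inclusive day span. It is a sketch of the idea, not of the query itself; the function name `latest_run_days` is made up, and the data is taken from the question.

```python
from datetime import date

def latest_run_days(rows):
    """rows: (date, value) for one Id. Inclusive day span of the most recent run."""
    newest_first = sorted(rows, reverse=True)   # like ORDER BY date DESC
    latest_date, latest_value = newest_first[0]
    run_start = latest_date
    for d, value in newest_first[1:]:
        if value != latest_value:
            break                                # the value changed: run is over
        run_start = d
    return latest_value, (latest_date - run_start).days + 1

rows_111 = [
    (date(2018, 1, 1), "x"), (date(2018, 1, 2), "x"),
    (date(2018, 1, 3), "y"), (date(2018, 1, 4), "y"),
    (date(2018, 1, 5), "x"), (date(2018, 1, 6), "x"),
]
rows_222 = [
    (date(2018, 1, 3), "z"), (date(2018, 1, 6), "y"), (date(2018, 1, 8), "y"),
]

print(latest_run_days(rows_111))
print(latest_run_days(rows_222))
```

Using the date difference rather than a row count is what makes the skipped 1/7/18 still count for Id 222.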
Try:
SELECT COUNT(*) FROM Table1 WHERE Value =
(
SELECT Value FROM Table1 WHERE Id = MAX(Id)
)
I hope you want this
select Id, count(Date) as "Days", Value from SSMS
group by ID, Value
correct me if I'm wrong
This answer should account for the weekends and holiday assumptions you have made (with another test case).
SELECT
T.Id, T.val, DATEDIFF(DD, COALESCE(T.MaxSwitch, T.MinMatch, T.MaxDate), T.MaxDate) + 1 AS [Days]
FROM (
SELECT
T.Id,
MAX(CASE WHEN T.LastValue IS NULL THEN T.val ELSE '' END) AS [val],
MAX(T.Date) AS [MaxDate],
MAX(CASE WHEN t.val <> t.LastValue THEN T.RunningDate ELSE NULL END) AS [MaxSwitch],
MIN(CASE WHEN t.val = t.LastValue THEN T.[Date] ELSE NULL END) AS [MinMatch]
FROM (SELECT *, LAG(val) OVER (PARTITION BY Id ORDER BY DATE DESC) AS LastValue,
LAG([Date]) OVER (PARTITION BY Id ORDER BY DATE DESC) AS RunningDate FROM @T) T
GROUP BY
T.Id
) T
This approach uses LAG to track previous value and date so that it can determine (1) the last value to get running match, (2) the latest date when value switched to most recent value, and (3) the earliest date with value matching final date. It then calculates the date difference to account for skipping days in table from priority of (A) latest date value switched to recent value, (B) or if no switch occurred, then earliest date with value matching final date.
For the sample data below:
DECLARE @T TABLE (
Id INT, [Date] DATE, val VARCHAR(10)
)
INSERT @T VALUES
('111', '1/1/18', 'x'),
('111', '1/2/18', 'x'),
('111', '1/3/18', 'y'),
('111', '1/4/18', 'y'),
('111', '1/5/18', 'x'),
('111', '1/6/18', 'x'),
('222', '1/2/18', 'y'),
('222', '1/3/18', 'z'),
('222', '1/6/18', 'y'),
('222', '1/8/18', 'y'),
('333', '1/9/18', 'a')
The following output is given:
Id val Days
----------- ---------- -----------
111 x 2 (from OP example)
222 y 3 (from OP example)
333 a 1 (case of single value)

SQL Server query In clause

I have a SQL Server query that is supposed to select the users who logged in to our portal in the first week and the third week, but not in the second week. My problem is that the query below takes about 15 seconds to run. Is there a faster way, or is there a problem with my query?
select
count(distinct id )
from
table_x
where
g in (319, 329)
and enable = 1
and Date between '2016-01-18' and '2016-01-24' --Third Week
and id in (select distinct id
from table_x
where g in (319, 329)
and enable = 1
and Date between '2016-01-05' and '2016-01-11' --First Week
and id not in (select distinct id
from table_x
where g in (319, 329)
and enable = 1
and Date between '2016-01-11' and '2016-01-17' --Second Week
)
)
Try using conditional aggregates (a single where clause and summing 3 case expressions) instead of multiple passes through the table.
SELECT
COUNT(*)
FROM (
SELECT
user_id
, SUM(CASE WHEN [Date] BETWEEN '2016-01-18' AND '2016-01-24' THEN 1 ELSE 0 END) [ThirdWeek]
, SUM(CASE WHEN [Date] BETWEEN '2016-01-11' AND '2016-01-17' THEN 1 ELSE 0 END) [SecondWeek]
, SUM(CASE WHEN [Date] BETWEEN '2016-01-05' AND '2016-01-11' THEN 1 ELSE 0 END) [FirstWeek]
FROM table_x x1
WHERE x1.g IN (319, 329)
AND x1.enable = 1
AND x1.[Date] BETWEEN '2016-01-05' AND '2016-01-24'
GROUP BY
user_id
) d
WHERE [FirstWeek] > 0
AND [ThirdWeek] > 0
AND [SecondWeek] = 0
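The idea of conditional aggregation is to read the rows once and keep one counter per week for each user, then filter on the counters. A minimal Python sketch of that single pass is below. The login data and the function name `count_first_and_third_only` are hypothetical; the week boundaries are copied from the question, including the overlap on 2016-01-11.

```python
from collections import defaultdict
from datetime import date

# Inclusive week ranges as written in the question (2016-01-11 is in two of them)
WEEKS = {
    "first": (date(2016, 1, 5), date(2016, 1, 11)),
    "second": (date(2016, 1, 11), date(2016, 1, 17)),
    "third": (date(2016, 1, 18), date(2016, 1, 24)),
}

def count_first_and_third_only(logins):
    """logins: (user_id, date). Count users seen in weeks 1 and 3 but not week 2."""
    counts = defaultdict(lambda: {"first": 0, "second": 0, "third": 0})
    for user_id, d in logins:                      # one pass over the data
        for week, (lo, hi) in WEEKS.items():
            if lo <= d <= hi:
                counts[user_id][week] += 1
    return sum(
        1 for c in counts.values()
        if c["first"] > 0 and c["third"] > 0 and c["second"] == 0
    )

# Hypothetical logins: only user 1 matches the pattern
logins = [
    (1, date(2016, 1, 6)), (1, date(2016, 1, 20)),
    (2, date(2016, 1, 6)), (2, date(2016, 1, 12)), (2, date(2016, 1, 20)),
    (3, date(2016, 1, 20)),
]
print(count_first_and_third_only(logins))
```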
While I would expect the conditional-aggregate query above to be a good option, using EXISTS/NOT EXISTS could also help. Note that the subqueries in the following example do NOT need DISTINCT.
SELECT
COUNT(DISTINCT user_id)
FROM table_x x1
WHERE x1.g IN (319, 329)
AND x1.enable = 1
AND x1.[Date] BETWEEN '2016-01-18' AND '2016-01-24' --Third Week
AND EXISTS (
SELECT
NULL
FROM table_x
WHERE g IN (319, 329)
AND enable = 1
AND Date BETWEEN '2016-01-05' AND '2016-01-11' --First Week
AND x1.user_id = table_x.user_id
)
AND NOT EXISTS (
SELECT
NULL
FROM table_x
WHERE g IN (319, 329)
AND enable = 1
AND Date BETWEEN '2016-01-11' AND '2016-01-17' --Second Week
AND x1.user_id = table_x.user_id
)
;

SQL SMS 2008 -Count column ids and count duplicate ids if createddate is greater than 3 months between ids

*Edit (Hopefully to be more clear)
Table below, I would like to count ids and count duplicate ids where the createddate has a gap of 3 months or more for that ID.
Query I have so far...
if object_id('tempdb..#temp') is not null
begin drop table #temp end
select
top 100
a.id, a.CreatedDate
into #temp
from tbl a
where 1=1
--and year(CreatedDate) = '2015'
if object_id('tempdb..#temp2') is not null
begin drop table #temp2 end
select t.id, count(t.id) as Total_Cnt
into #temp2
from #temp t
group by id
select distinct #temp2.Total_Cnt, #temp2.id, #temp.CreatedDate, DENSE_RANK() over (partition by #temp.id order by createddate) RK
from #temp2
inner join #temp on #temp2.id = #temp.id
where 1=1
order by Total_Cnt desc
Results:
Total_cnt id createddate rk
3 1 01-01-2015 1
3 1 03-02-2015 2
3 1 01-02-2015 3
2 2 05-01-2015 1
2 2 05-02-2015 2
1 3 06-01-2015 1
1 4 07-01-2015 1
Count ids and only count duplicate ids when the createddate from the id is greater than 3 months.
Something like this...
Total_cnt id Countwith3monthgap
3 1 2
2 2 1
1 3 1
1 4 1
You can use a CTE with ROW_NUMBER to get the order, and then self-join the CTE on that order.
WITH cte AS
( SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate) Rn
FROM
Test
)
SELECT
c1.ID,
COUNT(CASE WHEN c2.CreatedDate IS NULL THEN 1
WHEN c1.CreatedDate >= DATEADD(month,3,c2.CreatedDate) THEN 1
END)
FROM
cte c1
LEFT JOIN cte c2 ON c1.ID = c2.ID
AND c1.RN = c2.RN + 1
GROUP BY
c1.ID
You also need a conditional count that counts a row when the previous CreatedDate is null or when the current CreatedDate is >= the previous CreatedDate + 3 months.
If you happen to be using SQL Server 2012+, you can also use LAG here to get the same result:
SELECT
ID,
COUNT(*)
FROM
(SELECT
ID,
CreatedDate CurrentDate,
LAG(CreatedDate) OVER (PARTITION BY ID ORDER BY CreatedDate) PreviousDate
FROM
Test
) T
WHERE
PreviousDate IS NULL
OR CurrentDate >= DATEADD(month, 3, PreviousDate)
GROUP BY
ID
You can use a lag to get the previous date, Null for the first in the list
SELECT
id,
lag(CreatedDate,1) OVER (PARTITION BY Id ORDER BY CreatedDate) AS PreviousCreateDate,
CreatedDate
FROM #t
You can use that as a subquery and get the difference in months using DATEDIFF
SELECT sub.id,DATEDiff(month, sub.PreviousCreateDate ,sub.CreatedDate)
FROM (SELECT
id,
lag(CreatedDate,1) OVER (PARTITION BY Id ORDER BY CreatedDate) AS PreviousCreateDate,
CreatedDate
FROM #t) sub
WHERE DATEDiff(month, sub.PreviousCreateDate ,sub.CreatedDate) >=3
OR sub.PreviousCreateDate IS NULL
You can then take your totals
SELECT sub.id,COUNT(sub.id) as cnt
FROM (SELECT
id,
lag(CreatedDate,1) OVER (PARTITION BY Id ORDER BY CreatedDate) AS PreviousCreateDate,
CreatedDate
FROM #t) sub
WHERE DATEDIFF(month, sub.PreviousCreateDate ,sub.CreatedDate) >=3
OR sub.PreviousCreateDate IS NULL
GROUP BY sub.id
Note that DATEDIFF counts the month boundaries crossed, so with it the last day of January is three months before the first day of April. That appears to be the logic you were after.
You might want to define your three month gap criteria as
WHERE sub.PreviousCreateDate <= DATEADD(month, -3, sub.CreatedDate)
OR sub.PreviousCreateDate IS NULL
or
WHERE sub.CreatedDate >= DATEADD(month, +3, sub.PreviousCreateDate )
OR sub.PreviousCreateDate IS NULL
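The difference between the two definitions shows up at month ends. The Python sketch below imitates both: a boundary count, which is how DATEDIFF(month, ...) behaves, and a calendar addition that clamps to the last day of the month, which is how DATEADD(month, ...) behaves. The helper names `datediff_month` and `add_months` are assumptions written for this illustration, not SQL Server functions.

```python
import calendar
from datetime import date

def datediff_month(start, end):
    """Number of month boundaries crossed between two dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def add_months(d, months):
    """Add calendar months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

previous = date(2015, 1, 31)
current = date(2015, 4, 1)

# Boundary counting says three months have passed...
print(datediff_month(previous, current) >= 3)
# ...but three calendar months after Jan 31 is Apr 30, which has not been reached
print(current >= add_months(previous, 3))
```

Which of the two is right depends on what "a gap of 3 months" should mean for the data.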
I'm guessing that your desired definition of three-month gap doesn't coincide with datediff()'s. Most of the logic here is to look back at the previous date and decide if the gap is big enough to qualify.
When datediff() counts three months difference we still need to make sure the day of month is later than the first one (per example and ID 5). If difference is more than three months then we're good automatically.
But I'm also assuming that you would want to treat the distance from November 30th to February 28th (or 29th in a leap year) as a full three months because the end date falls on the final day of the month. By adjusting the end date by an extra day this is an easy scenario to snag as it will bump the date into the following month and increase the month difference by one as well. If that's not what you want then just remove the dateadd(day, 1, ...) portion and use only the raw CreatedDate value.
Your sample data is limited, so I'm also making the assumption that the gaps are measured between consecutive dates. If you want to find blocks of runs that don't span more than three months across the set, then that's a different problem and you should clarify with more information.
Since you've indicated that you're probably on SQL Server 2008 you'll have to do without the lag() function. Although the first query could be adjusted for that it's likely easier to go with the second approach at the end.
with diffs as (
select
ID,
row_number() over (partition by ID order by CreatedDate) as RN,
case when
datediff(
month,
lag(CreatedDate, 1) over (partition by ID order by CreatedDate),
CreatedDate
) = 3
and
datepart(
day,
lag(CreatedDate, 1) over (partition by ID order by CreatedDate)
) <= datepart(day, CreatedDate)
or
datediff(
month,
lag(CreatedDate, 1) over (partition by ID order by CreatedDate),
/* adding one day to handle gaps like Nov30 - Feb28/29 and Jan31 - Apr30 */
dateadd(day, 1, CreatedDate)
) >= 4
then 1
else 0
end as GapFlag
from <T> /* <--- your table name here */
), gaps as (
select
ID, RN,
sum(1 + GapFlag) over (partition by ID order by RN) as Counter
from diffs
)
select ID, count(distinct Counter - RN) as "Count"
from gaps
group by ID
The rest of the logic is a typical gaps and islands scenario looking for holes in the sum(1 + GapFlag) sequence, with the offset of 1 acting pretty much like row_number().
http://sqlfiddle.com/#!6/61b12/3
JamieD77's approach is also valid. I was originally thinking your problem involved more than looking at the rows in sequence. Here's how I would tweak it for the gap definition I've been running with:
with data as (
select ID, CreatedDate, row_number() over (partition by ID order by CreatedDate) as RN
from T
)
select d1.ID, count(*) as "Count"
from data d1 left outer join data d0
on d0.ID = d1.ID and d0.RN = d1.RN - 1 /* connect to the one before */
where
datediff(month, d0.CreatedDate, d1.CreatedDate) = 3
and datepart(day, d0.CreatedDate) <= datepart(day, d1.CreatedDate)
or datediff(month, d0.CreatedDate, dateadd(day, 1, d1.CreatedDate)) >= 4
or d0.ID is null
group by d1.ID
Edit: You have changed the question since yesterday.
Change this line in the first query to include the total count:
...
select count(*) as TotalCnt, ID, count(distinct Counter - RN) as GapCount
...
Second would look like:
with data as (
select ID, CreatedDate, row_number() over (partition by ID order by CreatedDate) as RN
from T
)
select
count(*) as TotalCnt, d1.ID,
count(case when
datediff(month, d0.CreatedDate, d1.CreatedDate) = 3
and datepart(day, d0.CreatedDate) <= datepart(day, d1.CreatedDate)
or datediff(month, d0.CreatedDate, dateadd(day, 1, d1.CreatedDate)) >= 4
or d0.ID is null then 1 end
) as GapCount
from data d1 left outer join data d0
on d0.ID = d1.ID and d0.RN = d1.RN - 1 /* connect to the one before */
group by d1.ID

Resources