Hoping someone has run across this issue previously and has a solution.
I am trying to find customers who lapse based off subscription periods rather than a single order date.
Lapse is defined by us as not making a purchase/renewal within 30 days of the end of their subscription. A customer can have multiple subscriptions simultaneously and subscriptions can vary in length.
I have a data set that includes customerIDs, Orders, the subscription start date, the subscription expire date, and that order’s rank in the customer’s order history, something like this:
CREATE TABLE #Subscriptions
(CustomerID INT,
Orderid INT,
SubscriptionStart DATE,
SubscriptionEnd DATE,
OrderNumber INT);
INSERT INTO #Subscriptions
VALUES(1, 111111, '2017-01-01', '2017-12-31', 1),
(1, 211111, '2018-01-01', '2019-12-31' ,2),
(1, 311121, '2018-10-01', '2018-10-02', 3),
(1, 451515, '2019-02-01', '2019-02-28', 4),
(2, 158797, '2018-07-01', '2018-07-31', 1),
(2, 287584, '2018-09-01', '2018-12-31', 2),
(2, 387452, '2019-01-01', '2019-01-31', 3),
(3, 187498, '2019-01-01', '2019-02-28', 1),
(3, 284990, '2019-02-01', '2019-02-28', 2),
(4, 184849, '2019-02-01', '2019-02-28', 1)
Within this data set, customer 2 would have lapsed on 2018-07-31. Since Customer 1 has a subscription of 2017-01-01 - 2017-12-31 and then one that starts 2018-01-01 and ends 2019-12-31 they cannot lapse within that time period even if other orders made by the customer would qualify.
I have attempt some of simple gap calculations using LEAD() and LAG(), however, I have had no success due to the variable lengths of the subscription period where a single subscription can span across multiple other orders. Eventually, we will use this to calculate monthly churn rate across approximately 5 million records.
You're overthinking this trying to use LEAD() and LAG(). All you need is a NOT EXISTS() function in the WHERE clause
In psuedocode:
SELECT...FROM...
WHERE {SubscriptionEnd is at least 30 days in the past}
AND NOT EXISTS(
{A row for the same Customer where the StartDate is 30 days or less after this EndDate}
)
This one looks to be a tricky one. You are correct about the problem with using the LEAD() and LAG() functions. It stems from customers being able to have multiple subscriptions of variable length. So we need to deal with that issue first. Let's begin with creating a single list of dates instead of having a list of SubscriptionStart and SubscriptionEnd.
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
ORDER BY
CustomerId,
ActivityDate
CustomerId OrderId Activity ActivityDate
----------- ----------- ----------- ------------
1 111111 1 2017-01-01
1 111111 -1 2017-12-31
1 211111 1 2018-01-01
1 311121 1 2018-10-01
1 311121 -1 2018-10-02
1 451515 1 2019-02-01
1 451515 -1 2019-02-28
1 211111 -1 2019-12-31
2 158797 1 2018-07-01
2 158797 -1 2018-07-31
2 287584 1 2018-09-01
2 287584 -1 2018-12-31
2 387452 1 2019-01-01
2 387452 -1 2019-01-31
3 187498 1 2019-01-01
3 284990 1 2019-02-01
3 187498 -1 2019-02-28
3 284990 -1 2019-02-28
4 184849 1 2019-02-01
4 184849 -1 2019-02-28
Notice the additional Activity field. It is 1 for the SubscriptionStart and -1 for the SubscriptionEnd.
Using this new Activity field it is possible to find places where there might be a lapse in the customer's subscriptions. At the same time use LEAD() to find the NextDate.
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
SELECT
CustomerId,
OrderId,
Activity,
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate) AS NextDate,
DATEDIFF(d, ActivityDate, LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate)) AS LapsedDays
FROM
SubscriptionList
ORDER BY
CustomerId,
ActivityDate
CustomerId OrderId Activity SubscriptionCount ActivityDate NextDate LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
1 111111 1 1 2017-01-01 2017-12-31 364
1 111111 -1 0 2017-12-31 2018-01-01 1
1 211111 1 1 2018-01-01 2018-10-01 273
1 311121 1 2 2018-10-01 2018-10-02 1
1 311121 -1 1 2018-10-02 2019-02-01 122
1 451515 1 2 2019-02-01 2019-02-28 27
1 451515 -1 1 2019-02-28 2019-12-31 306
1 211111 -1 0 2019-12-31 2019-02-28 -306
2 158797 1 1 2018-07-01 2018-07-31 30
2 158797 -1 0 2018-07-31 2018-09-01 32
2 287584 1 1 2018-09-01 2018-12-31 121
2 287584 -1 0 2018-12-31 2019-01-01 1
2 387452 1 1 2019-01-01 2019-01-31 30
2 387452 -1 0 2019-01-31 2019-02-28 28
3 187498 1 1 2019-01-01 2019-02-01 31
3 284990 1 2 2019-02-01 2019-02-28 27
3 187498 -1 1 2019-02-28 2019-02-28 0
3 284990 -1 0 2019-02-28 2019-02-28 0
4 184849 1 1 2019-02-01 2019-02-28 27
4 184849 -1 0 2019-02-28 2019-02-28 0
Adding running total on the Activity field will effectively give the number of active subscriptions. While it is greater than 0 a lapse is not possible. So focus in on the rows WHERE the SubscriptionCount is zero.
Using LEAD() get the NextDate. If there isn't a next date then default to today. If the SubscriptionCount is 0 then the NextDate has to be from a new subscription and the NextDate will be the date that the new subscription starts. Using DATEDIFF count the number of days between the SubscriptionEnd and the SubscriptionBegin if it is > 30 days then there was a lapse. Sounds like a good WHERE statement.
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
, FindLapse AS (
SELECT
CustomerId,
OrderId,
Activity,
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate) AS NextDate
FROM
SubscriptionList
)
SELECT
CustomerId,
OrderId,
Activity,
SubscriptionCount,
ActivityDate,
NextDate,
DATEDIFF(d, ActivityDate, NextDate) AS LapsedDays
FROM
FindLapse
WHERE
SubscriptionCount = 0
AND DATEDIFF(d, ActivityDate, NextDate) >= 30
CustomerId OrderId Activity SubscriptionCount ActivityDate NextDate LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
2 158797 -1 0 2018-07-31 2018-09-01 32
Looks like we have a winner!
I've got a problem in SQL Server.
"Whate'er is well conceived is clearly said, And the words to say it flow with ease", Nicolas Boileau-Despreaux
Well, I don't think I'll be able to make it clear but I'll try ! And I'd like to apologize for my bad english !
I've got this table :
id ind lvl result date
1 1 a 3 2017-01-31
2 1 a 3 2017-02-28
3 1 a 1 2017-03-31
4 1 a 1 2017-04-30
5 1 a 1 2017-05-31
6 1 b 1 2017-01-31
7 1 b 3 2017-02-28
8 1 b 3 2017-03-31
9 1 b 1 2017-04-30
10 1 b 1 2017-05-31
11 2 a 3 2017-01-31
12 2 a 1 2017-02-28
13 2 a 3 2017-03-31
14 2 a 1 2017-04-30
15 2 a 3 2017-05-31
I'd like to count the number of month the combo {ind, lvl} remain in the result 1 before re-initializing the number of month to 0 if the result is not 1.
Clearly, I need to get something like that :
id ind lvl result date BadResultRemainsFor%Months
1 1 a 3 2017-01-31 0
2 1 a 3 2017-02-28 0
3 1 a 1 2017-03-31 1
4 1 a 1 2017-04-30 2
5 1 a 1 2017-05-31 3
6 1 b 1 2017-01-31 1
7 1 b 3 2017-02-28 0
8 1 b 3 2017-03-31 0
9 1 b 1 2017-04-30 1
10 1 b 1 2017-05-31 2
11 2 a 3 2017-01-31 0
12 2 a 1 2017-02-28 1
13 2 a 3 2017-03-31 0
14 2 a 1 2017-04-30 1
15 2 a 3 2017-05-31 0
So that if I was looking for the number of months the result was 1 for the date 2017-05-31 with the id 1 and the lvl a, I know it's been 3 months.
Assume all the date the the end day of month:
;WITH tb(id,ind,lvl,result,date) AS(
select 1,1,'a',3,'2017-01-31' UNION
select 2,1,'a',3,'2017-02-28' UNION
select 3,1,'a',1,'2017-03-31' UNION
select 4,1,'a',1,'2017-04-30' UNION
select 5,1,'a',1,'2017-05-31' UNION
select 6,1,'b',1,'2017-01-31' UNION
select 7,1,'b',3,'2017-02-28' UNION
select 8,1,'b',3,'2017-03-31' UNION
select 9,1,'b',1,'2017-04-30' UNION
select 10,1,'b',1,'2017-05-31' UNION
select 11,2,'a',3,'2017-01-31' UNION
select 12,2,'a',1,'2017-02-28' UNION
select 13,2,'a',3,'2017-03-31' UNION
select 14,2,'a',1,'2017-04-30' UNION
select 15,2,'a',3,'2017-05-31'
)
SELECT t.id,t.ind,t.lvl,t.result,t.date
,CASE WHEN t.isMatched=1 THEN ROW_NUMBER()OVER(PARTITION BY t.ind,t.lvl,t.id-t.rn ORDER BY t.id) ELSE 0 END
FROM (
SELECT t1.*,c.MonthDiff,CASE WHEN c.MonthDiff=t1.result THEN 1 ELSE 0 END AS isMatched
,CASE WHEN c.MonthDiff=t1.result THEN ROW_NUMBER()OVER(PARTITION BY t1.ind,t1.lvl,CASE WHEN c.MonthDiff=t1.result THEN 1 ELSE 0 END ORDER BY t1.id) ELSE null END AS rn
FROM tb AS t1
LEFT JOIN tb AS t2 ON t1.ind=t2.ind AND t1.lvl=t2.lvl AND t2.id=t1.id-1
CROSS APPLY(VALUES(ISNULL(DATEDIFF(MONTH,t2.date,t1.date),1))) c(MonthDiff)
) AS t
ORDER BY t.id
id ind lvl result date
----------- ----------- ---- ----------- ---------- --------------------
1 1 a 3 2017-01-31 0
2 1 a 3 2017-02-28 0
3 1 a 1 2017-03-31 1
4 1 a 1 2017-04-30 2
5 1 a 1 2017-05-31 3
6 1 b 1 2017-01-31 1
7 1 b 3 2017-02-28 0
8 1 b 3 2017-03-31 0
9 1 b 1 2017-04-30 1
10 1 b 1 2017-05-31 2
11 2 a 3 2017-01-31 0
12 2 a 1 2017-02-28 1
13 2 a 3 2017-03-31 0
14 2 a 1 2017-04-30 1
15 2 a 3 2017-05-31 0
By slightly tweaking your input data and slightly tweaking how we define the requirement, it becomes quite simple to produce the expected results.
First, we tweak your date values so that the only thing that varies is the month and year - the days are all the same. I've chosen to do that my adding 1 day to each value1. The fact that this produces results which are one month advanced doesn't matter here, since all values are similarly transformed, and so the monthly relationships stay the same.
Then, we introduce a numbers table - here, I've assumed a small fixed table is adequate. If it doesn't fit your needs, you can easily locate examples online for creating a large fixed numbers table that you can use for this query.
And, finally, we recast the problem statement. Instead of trying to count months, we instead ask "what's the smallest number of months, greater of equal to zero, that I need to go back from the current row, to locate a row with a non-1 result?". And so, we produce this query:
declare #t table (id int not null,ind int not null,lvl varchar(13) not null,
result int not null,date date not null)
insert into #t(id,ind,lvl,result,date) values
(1 ,1,'a',3,'20170131'), (2 ,1,'a',3,'20170228'), (3 ,1,'a',1,'20170331'),
(4 ,1,'a',1,'20170430'), (5 ,1,'a',1,'20170531'), (6 ,1,'b',1,'20170131'),
(7 ,1,'b',3,'20170228'), (8 ,1,'b',3,'20170331'), (9 ,1,'b',1,'20170430'),
(10,1,'b',1,'20170531'), (11,2,'a',3,'20170131'), (12,2,'a',1,'20170228'),
(13,2,'a',3,'20170331'), (14,2,'a',1,'20170430'), (15,2,'a',3,'20170531')
;With Tweaked as (
select
*,
DATEADD(day,1,date) as dp1d
from
#t
), Numbers(n) as (
select 0 union all select 1 union all select 2 union all select 3 union all select 4
union all
select 5 union all select 6 union all select 7 union all select 8 union all select 9
)
select
id, ind, lvl, result, date,
COALESCE(
(select MIN(n) from Numbers n1
inner join Tweaked t2
on
t2.ind = t1.ind and
t2.lvl = t1.lvl and
t2.dp1d = DATEADD(month,-n,t1.dp1d)
where
t2.result != 1
),
1) as [BadResultRemainsFor%Months]
from
Tweaked t1
The COALESCE is just there to deal with the edge case, such as for your 1,b data, where there is no previous row with a non-1 result.
Results:
id ind lvl result date BadResultRemainsFor%Months
----------- ----------- ------------- ----------- ---------- --------------------------
1 1 a 3 2017-01-31 0
2 1 a 3 2017-02-28 0
3 1 a 1 2017-03-31 1
4 1 a 1 2017-04-30 2
5 1 a 1 2017-05-31 3
6 1 b 1 2017-01-31 1
7 1 b 3 2017-02-28 0
8 1 b 3 2017-03-31 0
9 1 b 1 2017-04-30 1
10 1 b 1 2017-05-31 2
11 2 a 3 2017-01-31 0
12 2 a 1 2017-02-28 1
13 2 a 3 2017-03-31 0
14 2 a 1 2017-04-30 1
15 2 a 3 2017-05-31 0
1An alternative way to perform the adjustment is to use a DATEADD/DATEDIFF pair to perform a "floor" operation against the dates:
DATEADD(month,DATEDIFF(month,0,date),0) as dp1d
Which resets all of the date values to be the first of their own month rather than the following month. This may fell more "natural" to you, or you may already have such values available in your original data.
Assuming the dates are continously increasing in month, you can use window function like so:
select
t.id, ind, lvl, result, dat,
case when result = 1 then row_number() over (partition by grp order by id) else 0 end x
from (
select t.*,
dense_rank() over (order by e, result) grp
from (
select
t.*,
row_number() over (order by id) - row_number() over (partition by ind, lvl, result order by id) e
from your_table t
order by id) t ) t;