SQL Server: find Cust with Continuous Enrollment

I need to solve a well-known industry problem: identify the CustIDs that have continuous activity over a given period of time, where short breaks between contracts are allowed.
I have done the first part: populating a matrix table, like the one in the snippet below, with one row per date for the whole period and a flag showing whether the customer is active on that date. I think this is the only reliable way to do it, since contracts can overlap, etc.
Now I need to return 1/0 per CustID for continuous activity, and I'm stuck on how to track this. In my example there is a 3-day break, which is OK, but I need to make sure the break days are consecutive.
Do you have any good ideas for doing this cleanly? I'd appreciate any help and leads. I saw some examples, but they were written in SAS, so they're hard for me to follow.
declare @maxBreak int = 3 -- max 3 days of break allowed for a continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
;with matrix_dd as
(
select *
from
(select 111 CustID, '2015-1-11' dd, 1 Active union
select 111 CustID, '2015-1-12' dd, 0 Active union
select 111 CustID, '2015-1-13' dd, 0 Active union
select 111 CustID, '2015-1-14' dd, 0 Active union
select 111 CustID, '2015-1-15' dd, 1 Active union
select 111 CustID, '2015-1-16' dd, 1 Active union
select 111 CustID, '2015-1-17' dd, 1 Active union
select 111 CustID, '2015-1-18' dd, 1 Active union
select 111 CustID, '2015-1-19' dd, 0 Active union
select 111 CustID, '2015-1-20' dd, 0 Active) a
)
select *
from matrix_dd

This solution calculates the active ranges and how long of a break it's been since the last interval ended:
declare @maxBreak int = 3 -- max 3 days of break allowed for a continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
with matrix_dd as
(
select * from ( values
(111, '2015-1-11', 1 ),
(111, '2015-1-12', 0 ),
(111, '2015-1-13', 0 ),
(111, '2015-1-14', 0 ),
(111, '2015-1-15', 1 ),
(111, '2015-1-16', 1 ),
(111, '2015-1-17', 1 ),
(111, '2015-1-18', 1 ),
(111, '2015-1-19', 0 ),
(111, '2015-1-20', 0 )
) as x(CustID, dd, Active)
), active_with_groups as (
select *,
row_number() over (partition by CustID order by dd) -
datediff(day, '2000-01-01', dd) as gid
from matrix_dd
where active = 1
and dd between @PeriodStart and @PeriodEnd
), islands as (
select CustId, min(dd) as islandStart, max(dd) as islandEnd
from active_with_groups
group by CustID, gid
), islands_with_gaps as (
select *,
datediff(
day,
lag(islandEnd, 1, islandStart)
over (partition by CustID order by islandStart),
islandStart
) - 1 as [break]
from islands
)
select *
from islands_with_gaps
where [break] > @maxBreak
order by islandStart
Let's break it down. In the "active_with_groups" common table expression (CTE), all I'm doing is converting the dates into integers that have the same relationship by using datediff(). Why? Integers are easier to work with for this problem. Note that I'm also using row_number() to get a contiguous sequence and then taking the difference between that and the datediff() value. The key observation is that if the dates don't go up contiguously, that difference changes from one row to the next. Likewise, if the dates do go up contiguously, the difference stays the same. Therefore, we can use this value as a group identifier for values that are in a contiguous range.
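To see why the difference works as a group identifier, it can help to look at the intermediate values. The following is only a small sketch, not part of the original query: active_days is a hypothetical stand-in holding four of the active dates from the sample data, and rn and dayNo are names introduced here for illustration.

```sql
-- Hypothetical stand-in: four active dates from the sample data.
with active_days as (
    select * from (values
        (cast('2015-01-11' as date)),
        (cast('2015-01-15' as date)),
        (cast('2015-01-16' as date)),
        (cast('2015-01-17' as date))
    ) as x(dd)
)
select dd,
       row_number() over (order by dd)  as rn,     -- contiguous sequence 1, 2, 3, ...
       datediff(day, '2000-01-01', dd)  as dayNo,  -- the date expressed as an integer
       row_number() over (order by dd)
         - datediff(day, '2000-01-01', dd) as gid  -- constant within a run of consecutive dates
from active_days
order by dd;
```

The isolated date 2015-01-11 should get one gid, while the three consecutive dates starting at 2015-01-15 should share another, because rn and dayNo both grow by one per row within a run.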
Next, we use that group identifier to group by (bet you didn't see that coming!). This gives us the start and end of each interval. Nothing very clever is going on here.
The next step is to calculate the amount of time that's passed between when the last interval ended and the current one began. For this, we use a simple call to the lag() function. The only thing to note here is that I've chosen to have the lag() function emit a default value of islandStart in the case of the first interval. It could have just as easily been no default (which would have then caused it to emit a NULL value).
Lastly, we look for intervals with a gap over the specified threshold.
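The question asks for a 1/0 flag per customer rather than a list of offending intervals. One possible way to get there is sketched below, under assumptions: the islands_with_gaps CTE here is a hard-coded stand-in for the one in the query above (its two rows are the islands that query would produce for the sample data), and isContinuous is a name introduced for illustration.

```sql
declare @maxBreak int = 3;

-- Stand-in for the islands_with_gaps CTE from the answer, with the
-- [break] values it computes for the sample data (-1 for the first island).
with islands_with_gaps as (
    select * from (values
        (111, cast('2015-01-11' as date), cast('2015-01-11' as date), -1),
        (111, cast('2015-01-15' as date), cast('2015-01-18' as date),  3)
    ) as x(CustID, islandStart, islandEnd, [break])
)
select CustID,
       -- 1 when no break between islands exceeds the allowed maximum
       case when max([break]) > @maxBreak then 0 else 1 end as isContinuous
from islands_with_gaps
group by CustID;
```

This only looks at breaks between islands. Inactive days at the very start or end of the period, and customers with no active days at all, would need a separate check against the period boundaries.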

Similar to Ben's answer. I'm assuming that all your dates are represented in the data. So really we just need to make sure there isn't a run of zeroes longer than 3.
with inactive_runs as (
select
CustID,
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where 3 >= all (
select count(*) from inactive_runs ir
where ir.CustID = m.CustID
group by grp
);
http://rextester.com/AHI22250
Using all isn't particularly common. Here's an alternative:
...
with inactive_runs as (
select
CustID, dd, /* <-- had to add dd */
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where not exists (
select 1 from inactive_runs ir
where ir.CustID = m.CustID
group by grp
having datediff(day, min(dd), max(dd)) > 2
);
I glanced at your comment above. I think it confirms my suspicion that you've got a single row for every date. If you're on SQL Server 2012 or later, you can just sum over the current row and the previous three rows. Unfortunately, the window size has to be a literal, so you can't use a variable there if the allowed break length varies:
with cust as (
select
CustID,
case when
sum(case when Active = 0 then 1 end) over (
partition by CustID
order by dd
rows between 3 preceding and current row
) = 4 then 1
end as isBrk
from matrix_dd
)
select CustID
from cust
group by CustID
having count(isBrk) = 0;
Edit:
Based on your comment with the data in a "pre-matrix" format, yes, that's a simpler query. At that point you're just looking at the previous end date and the current row's start date.
with data as (
select * from (
values (111, 1230, '2014-12-11', '2015-01-11'),
(111, 1231, '2015-01-15', '2015-01-18'),
(111, 1232, '2015-03-22', '2015-04-01')
) as t (CustID, ContractID, StartDD, EndDD)
), gaps as (
select
CustID,
datediff(day,
lag(EndDD, 1, StartDD) over (partition by CustID order by StartDD),
StartDD
) as days
from data
)
select CustID
from gaps
group by CustID
having max(days) <= 3;

Related

Is it possible to use the SQL DATEADD function but exclude dates from a table in the calculation?

Is it possible to use the DATEADD function but exclude dates from a table?
We already have a table with all dates we need to exclude. Basically, I need to add number of days to a date but exclude dates within a table.
Example: Add 5 days to 01/08/2021. Dates 03/08/2021 and 04/08/2021 exist in the exclusion table. So, resultant date should be: 08/08/2021.
Thank you
A bit of a "wonky" solution, but it works. Firstly we use a tally to create a Calendar table of dates, that exclude your dates in the table, then we get the nth row, where n is the number of days to add:
DECLARE @DaysToAdd int = 5,
@StartDate date = '20210801';
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT 0 AS I
UNION ALL
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3), --Up to 1,000
Calendar AS(
SELECT DATEADD(DAY,T.I, @StartDate) AS D,
ROW_NUMBER() OVER (ORDER BY T.I) AS I
FROM Tally T
WHERE NOT EXISTS (SELECT 1
FROM dbo.DatesTable DT
WHERE DT.YourDate = DATEADD(DAY,T.I, @StartDate)))
SELECT D
FROM Calendar
WHERE I = @DaysToAdd+1;
The best solution is probably a calendar table.
But if you're willing to traverse every date, then a recursive CTE can work. It requires tracking the total number of iterations plus another column that only advances when the traversed date is not in the exclusion table. The exit condition uses that second column.
An example dataset would be:
CREATE TABLE mytable(mydate date); INSERT INTO mytable VALUES ('20210803'), ('20210804');
And an example function, run in its own batch:
ALTER FUNCTION dbo.fn_getDays (@mydate date, @daysadd int)
RETURNS date
AS
BEGIN
DECLARE @newdate date;
WITH CTE(num, diff, mydate) AS (
SELECT 0 AS [num]
,0 AS [diff]
,DATEADD(DAY, 0, @mydate) [mydate]
UNION ALL
SELECT num + 1 AS [num]
,CTE.diff +
CASE WHEN DATEADD(DAY, num+1, @mydate) IN (SELECT mydate FROM mytable)
THEN 0 ELSE 1 END
AS [diff]
,DATEADD(DAY, num+1, @mydate) [mydate]
FROM CTE
WHERE (CTE.diff +
CASE WHEN DATEADD(DAY, num+1, @mydate) IN (SELECT mydate FROM mytable)
THEN 0 ELSE 1 END) <= @daysadd
)
SELECT @newdate = (SELECT MAX(mydate) AS [mydate] FROM CTE);
RETURN @newdate;
END
Running the function:
SELECT dbo.fn_getDays('20210801', 5)
Produces output, which is the MAX(mydate) from the function:
----------
2021-08-08
For reference the MAX(mydate) is taken from this dataset:
n diff mydate
----------- ----------- ----------
0 0 2021-08-01
1 1 2021-08-02
2 1 2021-08-03
3 1 2021-08-04
4 2 2021-08-05
5 3 2021-08-06
6 4 2021-08-07
7 5 2021-08-08
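The calendar-table idea mentioned at the start of this answer could look roughly like the sketch below. It is an illustration under assumptions, not a definitive implementation: @Calendar and @Excluded are hypothetical table variables standing in for a permanent calendar table and the exclusion table, and OFFSET ... FETCH requires SQL Server 2012 or later.

```sql
-- Hypothetical calendar: one row per date (a real one would be a permanent table).
DECLARE @Calendar TABLE (TheDate date PRIMARY KEY);
INSERT INTO @Calendar (TheDate)
SELECT DATEADD(DAY, v.n, '20210801')
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n);

-- Hypothetical exclusion table with the two dates from the question.
DECLARE @Excluded TABLE (YourDate date PRIMARY KEY);
INSERT INTO @Excluded (YourDate) VALUES ('20210803'), ('20210804');

DECLARE @StartDate date = '20210801', @DaysToAdd int = 5;

-- Skip @DaysToAdd non-excluded dates (counting the start date), then take the next one.
SELECT c.TheDate AS ResultDate
FROM @Calendar AS c
WHERE c.TheDate >= @StartDate
  AND NOT EXISTS (SELECT 1 FROM @Excluded AS e WHERE e.YourDate = c.TheDate)
ORDER BY c.TheDate
OFFSET @DaysToAdd ROWS FETCH NEXT 1 ROWS ONLY;
```

With the question's data the remaining dates are 08-01, 08-02, 08-05, 08-06, 08-07, 08-08, and so on, so skipping five rows lands on 2021-08-08. If the calendar does not extend far enough, the query returns no row rather than a wrong date.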
You can use the IN clause.
To perform the test, I used a W3Schools Test DB
SELECT DATE_ADD(BirthDate, INTERVAL 10 DAY) FROM Employees WHERE FirstName NOT IN (Select FirstName FROM Employees WHERE FirstName LIKE 'N%')
This query shows all the birth dates + 10 days except for the only employee with name starting with N (Nancy)

Choosing distinct ID with differing column values

Let's say I have this query:
SELECT id, date, amount, cancelled
FROM transactions
Which gives me the following results:
id date amount cancelled
1 01/2019 25.10 0
1 02/2019 19.55 1
1 06/2019 20.33 0
2 10/2019 11.00 0
If there are duplicate IDs, how can I get the one with the latest date? So it would look like this:
id date amount cancelled
1 06/2019 20.33 0
2 10/2019 11.00 0
One method is with ROW_NUMBER and a common table expression like this example. In a multi-statement batch, be mindful to terminate the preceding statement with a semi-colon to avoid parsing errors.
WITH data_with_date_sequence AS (
SELECT
id
, date
, amount
, cancelled
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC) AS seq
FROM dbo.SomeTable
)
SELECT
id
, date
, amount
, cancelled
FROM data_with_date_sequence
WHERE seq = 1;
One option could be to use ROW_NUMBER function, which will group rows by id and order them by date within same id.
;WITH max_dates AS (
SELECT id
, date
, amount
, cancelled
, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS Position
FROM transactions
)
SELECT * FROM max_dates WHERE Position = 1

Inserting "missing records" in a CTE with dates as a key

I have a CTE with the following structure
ITEM, DATE, QUANTITY, EXTRA
AAAA 01-07-2015 100 20
AAAA 04-07-2015 100 13
AAAA 07-07-2015 100 16
AAAA 09-07-2015 100 23
.
.
.
AAAA 31-07-2015 100 30
Basically, what I want to do is get the records out with the 'missing dates' in there and the EXTRA field containing the EXTRA from the previous record (as below)
ITEM, DATE, QUANTITY, EXTRA
AAAA 01-07-2015 100 20
AAAA 02-07-2015 0 20
AAAA 03-07-2015 0 20
AAAA 04-07-2015 100 13
AAAA 05-07-2015 0 13
AAAA 06-07-2015 0 13
AAAA 07-07-2015 100 16
AAAA 08-07-2015 0 16
AAAA 09-07-2015 100 23
.
.
.
AAAA 31-07-2015 100 30
I could just insert the records manually using a mix of LAG and a temp table containing the full list of valid dates + MERGE. However, this CTE is just used for one-time checking then disposed. Is there a better way?
Consider the following:
I made a CTE representing your data (not considering the ITEM since it is not relevant in the example).
With the following two CTEs I'm searching for the missing dates between the min and max of the available data using a recursive query.
Next, you need to join the right data to the available dates. The key is to get the row with the max date from data, as long as data.date is less than or equal to dates.date.
Lastly, my interpretation based on the example is that you want to inherit extra, but not quantity. Hence the case-statement in the select.
with data as (
select cast('2015-07-01' as date) date, 100 quantity, 20 extra union all
select cast('2015-07-04' as date) date, 100 quantity, 13 extra union all
select cast('2015-07-07' as date) date, 100 quantity, 16 extra union all
select cast('2015-07-09' as date) date, 100 quantity, 23 extra
)
, maxDate as (
select MAX(date) date
from data
)
, dates as (
select date
from data
union all
select DATEADD(day,1,date) date
from dates
where DATEADD(day,1,date) not in (select date from data)
and DATEADD(day,1,date) < (select date from maxDate)
)
select dates.date
, case dates.date when data.date then data.quantity else 0 end quantity
, data.extra
from dates
join data on data.date = (select max(date) from data where data.date <= dates.date)
order by 1
Try this;
;with tbl as (
--your data....
),
max_min_val as (
select max(date) as max_date, min(date) as min_date from tbl
),
all_date AS (
SELECT min_date as DateColumn
from max_min_val
UNION ALL
SELECT DATEADD(day,1,DateColumn)
FROM all_date, max_min_val
WHERE DATEADD(day,1,DateColumn) <= max_date
)
select
coalesce(t.item, (select top 1 tt.item from tbl tt where tt.date < x.DateColumn order by tt.date desc)) as item,
x.DateColumn as date,
coalesce(t.QUANTITY, 0) as QUANTITY,
coalesce(t.EXTRA, (select top 1 tt.EXTRA from tbl tt where tt.date < x.DateColumn order by tt.date desc)) as EXTRA
from all_date x
left join
tbl t on t.date = x.DateColumn
Here all_date holds every date between the min and max dates in tbl.
The left join from all_date keeps all of those dates, including the ones missing from tbl.
coalesce(t.QUANTITY, 0) as QUANTITY returns t.QUANTITY if it is not null, otherwise 0.
EXTRA and item fall back to the previous row's data from the CTE tbl, based on date.

Conditional counting based on comparison to previous row sql

Let's start with a sample of the data I'm working with:
Policy No | start date
1 | 2/15/2006
1 | 2/15/2009
1 | 2/15/2012
2 | 3/15/2006
3 | 3/19/2006
3 | 3/19/2012
4 | 3/31/2006
4 | 3/31/2009
I'm trying to write code in SQL Server 2008 that counts a few things. The principle is that the policyholder's earliest start date is when the policy began. Every three years an increase is offered to the client. If they agree to the increase, the start date is refreshed with the same date as the original, three years later. If they decline, nothing is added to the database at all.
I'm trying to not only count the number of times a customer accepted the offer (or increased the start date by three years), but separate it out by first offer or second offer. Taking the original start date and dividing the number of days between now and then by 1095 gets me the total number of offers, so I've gotten that far. What I really want it to do is compare each policy number to the one before it to see if it's the same (it's already ordered by policy number), then count the date change in a new "accepted" column and count the times it didn't change but could have as "declined".
Is this a case where I would need to self-join the table to itself to compare the dates? Or is there an easier way?
Are you looking for this?
Set Nocount On;
Declare @Test Table
(
PolicyNo Int
,StartDate Date
)
Declare @PolicyWithInc Table
(
RowId Int Identity(1,1) Primary Key
,PolicyNo Int
,StartDate Date
)
Insert Into @Test(PolicyNo,StartDate) Values
(1,'2/15/2006')
,(1,'2/15/2009')
,(1,'2/15/2012')
,(2,'3/15/2006')
,(3,'3/19/2006')
,(3,'3/19/2012')
,(4,'3/31/2006')
,(4,'3/31/2009')
-- Order the insert so RowId follows PolicyNo, StartDate; the self-join below relies on it
Insert Into @PolicyWithInc(PolicyNo,StartDate)
Select t.PolicyNo
,t.StartDate
From @Test As t
Order By t.PolicyNo, t.StartDate
Select pw.PolicyNo
,Sum(Case When Datediff(Year,t.StartDate, pw.StartDate) = 3 Then 1 Else 0 End) As DateArrived
,Sum(Case When Datediff(Year,t.StartDate, pw.StartDate) > 3 Then 1 Else 0 End) As DateNotArrived
,Sum(Case When Isnull(Datediff(Year,t.StartDate,pw.StartDate),0) = 3 Then 1 Else 0 End) As Years3IncrementCount
From @PolicyWithInc As pw
Left Join @PolicyWithInc As t On pw.PolicyNo = t.PolicyNo And pw.RowId = (t.RowId + 1)
Group By pw.PolicyNo
The following may help:
Set Nocount On;
Declare @Test Table
(
PolicyNo Int
,StartDate Date
)
Insert Into @Test(PolicyNo,StartDate) Values
(1,'2/15/2006')
,(1,'2/15/2009')
,(1,'2/15/2012')
,(2,'3/15/2006')
,(3,'3/19/2006')
,(3,'3/19/2012')
,(4,'3/31/2006')
,(4,'3/31/2009')
select PolicyNo, StartDate, dateadd(yy, 3, StartDate)Offer1, dateadd(yy, 6, StartDate)Offer2, dateadd(yy, 9, StartDate)Offer3 from
(select * , row_number() over (partition by PolicyNo order by StartDate) rn from @Test)A
where rn = 1
select
count(*) * 3 TotalOffersMade,
count(Data1.StartDate) FirstOfferAccepted,
count(Data2.StartDate) SecondOfferAccepted,
count(Data3.StartDate) ThirdOfferAccepted,
count(*) - count(Data1.StartDate) FirstOfferDeclined,
count(*) - count(Data2.StartDate) SecondOfferDeclined,
count(*) - count(Data3.StartDate) ThirdOfferDeclined
from
(
select PolicyNo, StartDate, dateadd(yy, 3, StartDate)Offer1, dateadd(yy, 6, StartDate)Offer2, dateadd(yy, 9, StartDate)Offer3 from
(select * , row_number() over (partition by PolicyNo order by StartDate) rn from @Test)A
where rn = 1
)Offers
LEFT JOIN
@Test Data1
on Offers.PolicyNo = Data1.PolicyNo and Offers.Offer1 = Data1.StartDate
LEFT JOIN
@Test Data2
on Offers.PolicyNo = Data2.PolicyNo and Offers.Offer2 = Data2.StartDate
LEFT JOIN
@Test Data3
on Offers.PolicyNo = Data3.PolicyNo and Offers.Offer3 = Data3.StartDate

Tsql getting depending data over multiple rows in one query

I would like to calculate an average of a value in one year. I have a historical data table that saves the changes of the value in time.
I know how to do this with a (sub)query for each individual month, but I'm hopeful that there is a simple way to do it in one query.
Example:
ID, Value, DateUntilActivity
1, 10.00, 2014-03-01
2, 5.00, 2014-05-01
3, 3.00, 2014-07-01
4, 12.00, 2014-10-01
So - the correct calculation here is:
(2x10.00 + 2x5.00 + 2x3.00 + 3x12.00 + 3x<current_value_in_a_different_table>)/12
The calculation weights each value by the number of months it was active for. For example, the first value, 10.00, was valid for 2 months: January and February.
And consider the value current_value_in_a_different_table a fixed value.
Also, it needs to work on MSSQL server 2005.
Thank you in advance!
;with cte as
(
select value, DateUntilActivity from yourtable
union
select 100 as currentvalue, '2015-1-1' from yourothertable
)
select avg(value)
from
(
select (select top 1 value from cte where DateUntilActivity>DATEADD(MONTH,number, '2014-1-1') order by DateUntilActivity ) as value
from master..spt_values
where type='p' and number <=11
) v
If my memory is wrong and you can't use a CTE, this is equivalent to
select avg(value)
from
(
select
(select top 1 value
from
(
select value, DateUntilActivity from yourtable
union
select 100 as currentvalue, '2015-1-1' from yourothertable
) v
where DateUntilActivity>DATEADD(MONTH,number, '2014-1-1') order by DateUntilActivity ) as value
from master..spt_values
where type='p' and number <=11
) v
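To check the approach against the expected calculation from the question, here is a self-contained sketch that avoids both the real tables and master..spt_values. The names history and months are hypothetical stand-ins, the current value is assumed to be 100 as in the answer, and the syntax is kept to what SQL Server 2005 accepts (no VALUES constructor, datetime instead of date).

```sql
;with history as (
    -- Sample rows from the question, plus the assumed current value (100),
    -- treated as valid until the start of the next year.
    select 10.00 as value, cast('20140301' as datetime) as DateUntilActivity union all
    select 5.00,   '20140501' union all
    select 3.00,   '20140701' union all
    select 12.00,  '20141001' union all
    select 100.00, '20150101'
), months as (
    -- Month offsets 0..11 for the twelve months of 2014.
    select 0 as n union all select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all select 7 union all
    select 8 union all select 9 union all select 10 union all select 11
)
select avg(v.value) as yearly_average
from (
    -- For each month, pick the first value whose validity ends after that month starts.
    select (select top 1 h.value
            from history h
            where h.DateUntilActivity > dateadd(month, m.n, '20140101')
            order by h.DateUntilActivity) as value
    from months m
) v;
```

With these inputs the twelve monthly values are 10, 10, 5, 5, 3, 3, 12, 12, 12, 100, 100, 100, which matches the question's formula: (2x10 + 2x5 + 2x3 + 3x12 + 3x100) / 12 = 372 / 12 = 31.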
