Chained records - counting repetition of records - database

I have records of customer calls like this:
PHONENO CALLTIME REP
======== =================== ===
01555444 10.03.2017 10:30:00 N <- first occurrence of 01555444
02888999 12.03.2017 11:40:20 N
01555444 15.03.2017 18:22:33 Y <- repeated 1st time 01555444
03666777 18.03.2017 20:36:44 N
01555444 19.03.2017 08:15:47 Y <- repeated 2nd time 01555444
01555444 30.03.2017 22:18:30 N <- first occurrence of 01555444 (gap more than 10 days)
If a call occurs within 10 days of the previous call from the same phone number, it is considered a repeated call (assigned 'Y' in column REP).
Now I want to have the table like this with number of repetitions:
PHONENO CALLTIME REP REPNO
======== =================== === =====
01555444 10.03.2017 10:30:00 N 0
02888999 12.03.2017 11:40:20 N 0
01555444 15.03.2017 18:22:33 Y 1
03666777 18.03.2017 20:36:44 N 0
01555444 19.03.2017 08:15:47 Y 2
01555444 30.03.2017 22:18:30 N 0
REPNO is the position of a call within a chain of repeated calls (each within 10 days of the previous one).
How can I calculate this?

Here's a way of doing it that uses the tabibitosan method to identify the groups of repeated rows:
WITH cust_calls AS (SELECT '01555444' phoneno, to_date('10/03/2017 10:30:00', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '02888999' phoneno, to_date('12/03/2017 11:40:20', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '01555444' phoneno, to_date('15/03/2017 18:22:33', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '03666777' phoneno, to_date('18/03/2017 20:36:44', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '01555444' phoneno, to_date('19/03/2017 08:15:47', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '01555444' phoneno, to_date('30/03/2017 22:18:30', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '01555444' phoneno, to_date('30/04/2017 23:42:31', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '01555444' phoneno, to_date('05/05/2017 16:35:41', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '01555444' phoneno, to_date('20/05/2017 21:20:52', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
SELECT '02888999' phoneno, to_date('12/03/2017 11:45:20', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual),
-- end of mimicking a table with your sample data in it. You do not need the above subquery, since you already have the table.
initial_info AS (SELECT phoneno,
calltime,
CASE WHEN calltime - LAG(calltime) OVER (PARTITION BY phoneno ORDER BY calltime) <= 10 THEN 'Y' ELSE 'N' END rep_row
FROM cust_calls),
middle_info AS (SELECT phoneno,
calltime,
rep_row rep,
CASE WHEN rep_row = 'Y' THEN
row_number() OVER (PARTITION BY phoneno ORDER BY calltime)
- row_number() OVER (PARTITION BY phoneno, rep_row ORDER BY calltime)
END rep_grp
FROM initial_info)
SELECT phoneno,
calltime,
rep,
CASE WHEN rep_grp is not NULL THEN
row_number() OVER (PARTITION BY phoneno, rep_grp ORDER BY calltime)
END repno
FROM middle_info
ORDER BY phoneno, calltime;
PHONENO CALLTIME REP REPNO
-------- ------------------- --- ----------
01555444 10/03/2017 10:30:00 N
01555444 15/03/2017 18:22:33 Y 1
01555444 19/03/2017 08:15:47 Y 2
01555444 30/03/2017 22:18:30 N
01555444 30/04/2017 23:42:31 N
01555444 05/05/2017 16:35:41 Y 1
01555444 20/05/2017 21:20:52 N
02888999 12/03/2017 11:40:20 N
02888999 12/03/2017 11:45:20 Y 1
03666777 18/03/2017 20:36:44 N
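This works by first identifying the repeated rows: each row's calltime is compared with the previous row's calltime for the same phoneno, and the row is flagged 'Y' if the difference is 10 days or less. If you already have this info, you can skip this step and go straight to the next.
Next, we use the tabibitosan method: for each 'Y' row, we subtract its row number among the rows with the same phoneno and rep_row from its row number among all rows for that phoneno. Consecutive 'Y' rows share the same difference, so that value (rep_grp) identifies each chain of repeats.
Then we partition the rows for each phoneno by rep_grp and apply the row_number() analytic function within each partition to get repno. Rows flagged 'N' get a NULL repno rather than the 0 shown in the question; add ELSE 0 to the final CASE expression if you need 0.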
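To make the three steps easier to follow, here is a minimal procedural sketch in Python rather than SQL. The function name number_repeats and the tuple layout are illustrative assumptions, not part of the answer above; it tracks the same two row numbers and restarts the counter whenever their difference changes.

```python
from datetime import datetime, timedelta

def number_repeats(calls, window=timedelta(days=10)):
    """calls: iterable of (phoneno, calltime) pairs.
    Returns (phoneno, calltime, rep, repno) tuples ordered by phoneno, calltime.
    Hypothetical helper mirroring the SQL steps; repno is None for 'N' rows."""
    by_phone = {}
    for phone, calltime in sorted(calls):
        by_phone.setdefault(phone, []).append(calltime)

    out = []
    for phone, times in by_phone.items():
        rn_all = 0        # row_number() over (partition by phoneno)
        rn_rep = 0        # row_number() over the 'Y' rows of this phoneno
        prev_time = None
        prev_grp = None   # rep_grp of the most recent 'Y' row
        repno = 0
        for calltime in times:
            rn_all += 1
            is_rep = prev_time is not None and calltime - prev_time <= window
            if is_rep:
                rn_rep += 1
                grp = rn_all - rn_rep  # constant along a run of consecutive 'Y' rows
                repno = repno + 1 if grp == prev_grp else 1
                prev_grp = grp
                out.append((phone, calltime, "Y", repno))
            else:
                out.append((phone, calltime, "N", None))
            prev_time = calltime
    return out
```

Every 'N' row increases rn_all without increasing rn_rep, so the next chain of 'Y' rows necessarily gets a different group value, which is what makes the difference usable as a group key.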

Related

Is it possible to use the SQL DATEADD function but exclude dates from a table in the calculation?

Is it possible to use the DATEADD function but exclude dates from a table?
We already have a table with all the dates we need to exclude. Basically, I need to add a number of days to a date but skip the dates in that table.
Example: Add 5 days to 01/08/2021. Dates 03/08/2021 and 04/08/2021 exist in the exclusion table. So, resultant date should be: 08/08/2021.
Thank you
A bit of a "wonky" solution, but it works. First we use a tally to create a Calendar table of dates that excludes the dates in your table, then we get the nth row, where n is the number of days to add plus one (the start date itself is row 1):
DECLARE @DaysToAdd int = 5,
@StartDate date = '20210801';
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT 0 AS I
UNION ALL
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3), --Up to 1,000
Calendar AS(
SELECT DATEADD(DAY,T.I, @StartDate) AS D,
ROW_NUMBER() OVER (ORDER BY T.I) AS I
FROM Tally T
WHERE NOT EXISTS (SELECT 1
FROM dbo.DatesTable DT
WHERE DT.YourDate = DATEADD(DAY,T.I, @StartDate)))
SELECT D
FROM Calendar
WHERE I = @DaysToAdd+1;
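The same idea (enumerate candidate dates, drop the excluded ones, take the row at position n + 1) can be sketched outside the database. This is only an illustration in Python; add_days_excluding is a hypothetical name, and unlike the tally above it has no 1,000-row cap.

```python
from datetime import date, timedelta
from itertools import count, islice

def add_days_excluding(start, days_to_add, excluded):
    """Return the date reached by stepping days_to_add non-excluded days
    forward from start. Hypothetical helper; assumes excluded is finite."""
    excluded = set(excluded)
    # Candidate dates: start, start + 1 day, start + 2 days, ...
    candidates = (start + timedelta(days=i) for i in count(0))
    # Keep only the dates that are not in the exclusion set
    allowed = (d for d in candidates if d not in excluded)
    # Index 0 is the first allowed date, so index days_to_add is the answer
    return next(islice(allowed, days_to_add, None))

result = add_days_excluding(date(2021, 8, 1), 5, [date(2021, 8, 3), date(2021, 8, 4)])
print(result)
```

With the example from the question (start 01/08/2021, 5 days, 03/08 and 04/08 excluded) this lands on 08/08/2021.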
The best solution is probably a calendar table.
But if you're willing to traverse every date, then a recursive CTE can work. It requires tracking the total number of iterations plus another column that is not incremented when a traversed date is in the table. The exit condition uses that second column.
An example dataset would be:
CREATE TABLE mytable(mydate date); INSERT INTO mytable VALUES ('20210803'), ('20210804');
And an example function, run in its own batch:
ALTER FUNCTION dbo.fn_getDays (@mydate date, @daysadd int)
RETURNS date
AS
BEGIN
DECLARE @newdate date;
WITH CTE(num, diff, mydate) AS (
SELECT 0 AS [num]
,0 AS [diff]
,DATEADD(DAY, 0, @mydate) [mydate]
UNION ALL
SELECT num + 1 AS [num]
,CTE.diff +
CASE WHEN DATEADD(DAY, num+1, @mydate) IN (SELECT mydate FROM mytable)
THEN 0 ELSE 1 END
AS [diff]
,DATEADD(DAY, num+1, @mydate) [mydate]
FROM CTE
WHERE (CTE.diff +
CASE WHEN DATEADD(DAY, num+1, @mydate) IN (SELECT mydate FROM mytable)
THEN 0 ELSE 1 END) <= @daysadd
)
SELECT @newdate = (SELECT MAX(mydate) AS [mydate] FROM CTE);
RETURN @newdate;
END
Running the function:
SELECT dbo.fn_getDays('20210801', 5)
Produces output, which is the MAX(mydate) from the function:
----------
2021-08-08
For reference the MAX(mydate) is taken from this dataset:
n diff mydate
----------- ----------- ----------
0 0 2021-08-01
1 1 2021-08-02
2 1 2021-08-03
3 1 2021-08-04
4 2 2021-08-05
5 3 2021-08-06
6 4 2021-08-07
7 5 2021-08-08
You can use the IN clause.
To perform the test, I used a W3Schools Test DB
SELECT DATE_ADD(BirthDate, INTERVAL 10 DAY) FROM Employees WHERE FirstName NOT IN (Select FirstName FROM Employees WHERE FirstName LIKE 'N%')
This query shows all the birth dates + 10 days except for the only employee with name starting with N (Nancy)

SQL Server: update table with value from previous record

I have tried several ways using LAG(), ROW_NUMBER() and so on, but I cannot get it working... Please help.
Assume we have this table:
Date Time Amount Balance
---------------------------------------------
20171001 12:44:00 102.00 102.00
20171002 09:32:12 10.00 null
20171002 20:00:00 123.00 null
20171003 07:43:12 5.29 null
My goal is to update the Balance but these records are not ordered in this table.
I have tried to use this code:
with t1 as
(
select
Date, Time, Amount, Balance,
lag(Balance) over (order by Date, Time) Balance_old
from
table1
)
update table1
set Balance = Amount + Balance_old
where Balance_old is not null
However, this seems to only update 1 record instead of 3 in the above example. Even when I try to do something similar with ROW_NUMBER() then I do not get the results I require.
The results I would like to have are as follows:
Date Time Amount Balance
---------------------------------------------
20171001 12:44:00 102.00 102.00
20171002 09:32:12 10.00 112.00
20171002 20:00:00 123.00 235.00
20171003 07:43:12 5.29 240.29
Please note: in my situation there is always a record which has a value in Balance. This is the starting point, which can be 0 or <>0 (but not null).
One approach is simply to use the sum() over() window function.
-- set up
select *
into t1
from (
select cast('20171001' as date) Date1, cast('12:44:00' as time) Time1, 102.00 Amount, 102.00 Balance union all
select cast('20171002' as date), cast('09:32:12' as time), 10.00, null union all
select cast('20171002' as date), cast('20:00:00' as time), 123.00, null union all
select cast('20171003' as date), cast('07:43:12' as time), 5.29, null
) q
-- UPDATE statement
;with t2 as(
select date1
, time1
, amount
, balance
, sum(isnull(balance, amount)) over(order by date1, time1) as balance1
from t1
)
update t2
set balance = balance1
The result:
Date1 Time1 Amount Balance
---------- ---------------- ---------- -------------
2017-10-01 12:44:00.0000000 102.00 102.00
2017-10-02 09:32:12.0000000 10.00 112.00
2017-10-02 20:00:00.0000000 123.00 235.00
2017-10-03 07:43:12.0000000 5.29 240.29
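For readers who want to see what the windowed sum computes, here is a small Python sketch of the same running total. The function name fill_balances and the dict keys are assumptions made for illustration. Like the SQL, it relies on the only non-null Balance being on the first row in date/time order.

```python
from decimal import Decimal
from itertools import accumulate

def fill_balances(rows):
    """rows: list of dicts with keys date, time, amount, balance (balance may be None).
    Returns new dicts, ordered by date and time, with balance filled in.
    Hypothetical helper mirroring sum(isnull(balance, amount)) over (order by ...)."""
    ordered = sorted(rows, key=lambda r: (r["date"], r["time"]))
    # Same as ISNULL(balance, amount): use the known balance, else the amount
    contributions = [r["amount"] if r["balance"] is None else r["balance"] for r in ordered]
    # accumulate() yields the running total, one value per row
    return [dict(r, balance=total) for r, total in zip(ordered, accumulate(contributions))]

rows = [
    {"date": "20171002", "time": "20:00:00", "amount": Decimal("123.00"), "balance": None},
    {"date": "20171001", "time": "12:44:00", "amount": Decimal("102.00"), "balance": Decimal("102.00")},
    {"date": "20171003", "time": "07:43:12", "amount": Decimal("5.29"), "balance": None},
    {"date": "20171002", "time": "09:32:12", "amount": Decimal("10.00"), "balance": None},
]
for row in fill_balances(rows):
    print(row["date"], row["time"], row["amount"], row["balance"])
```

Decimal is used instead of float so that the totals match the two-decimal amounts exactly.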

reset window function when the time gap is over one hour

I have a dataset already numbered by a window function in SQL:
ROW_NUMBER() OVER (PARTITION BY LOAN_NUMBER, CAST(CREATED_DATE AS DATE) ORDER BY LOAN_NUMBER, CREATED_DATE) AS ROW_IDX
as shown above. Is there a way to reset ROW_IDX once CREATED_DATE is more than one hour later than the minimum datetime within a given day?
For example, the row index for row 3 should be 1 because the gap between 2016-11-03 15:39:16.000 and 2016-11-03 12:44:11.000 is over one hour, and the row index of row 4 should be 2.
I've tried several ways to manipulate the datetime column, but since what matters is the gap rather than the time of day, no rounding method worked perfectly.
Do you mean that the index should restart at 1 whenever the gap is more than 60 minutes?
Which version are you using? If it is SQL Server 2012 or later, you can try the query below.
It is not a complete solution, but I hope it helps. It works in three steps:
1. Calculate the difference in minutes between each pair of consecutive rows.
2. Check whether that difference is greater than one hour.
3. Number the rows within each run of consecutive rows that share the same flag.
Sorry if my description is unclear; English is not my first language.
;WITH tb(RptDate,ISSUE_ID,ACCOUNT,CREATED_DATE )AS(
select '2017-01-17','35775','76505156','2016-11-03 12:44:11.000' UNION
select '2017-01-17','35793','76505156','2016-11-03 12:51:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 13:47:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 14:45:43.000' UNION
select '2017-01-17','36097','76505156','2016-11-03 15:39:16.000' UNION
select '2017-01-17','36132','76505156','2016-11-03 15:52:51.000' UNION
select '2017-01-17','41391','76505156','2016-11-10 10:49:30.000'
)
SELECT *,ROW_NUMBER()OVER(PARTITION BY tt.ACCOUNT,a ORDER BY tt.ACCOUNT, rn) AS ROW_IDX FROM (
SELECT * ,rn-ROW_NUMBER () OVER (PARTITION BY ACCOUNT, CAST(CREATED_DATE AS DATE),n ORDER BY rn) AS a
FROM (
SELECT *, ROW_NUMBER()OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE) AS rn
,CASE WHEN DATEDIFF(MINUTE, LAG(CREATED_DATE)OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE),tb.CREATED_DATE)>60 THEN 1 ELSE 0 END AS n
,ISNULL(DATEDIFF(MINUTE, LAG(CREATED_DATE)OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE),tb.CREATED_DATE),0) AS DiffMin
FROM tb
) AS t
) AS tt
ORDER BY rn
RptDate ISSUE_ID ACCOUNT CREATED_DATE rn n DiffMin a ROW_IDX
---------- -------- -------- ----------------------- -------------------- ----------- ----------- -------------------- --------------------
2017-01-17 35775 76505156 2016-11-03 12:44:11.000 1 0 0 0 1
2017-01-17 35793 76505156 2016-11-03 12:51:43.000 2 0 7 0 2
2017-01-17 36097 76505156 2016-11-03 15:39:16.000 3 1 168 2 1
2017-01-17 36132 76505156 2016-11-03 15:52:51.000 4 0 13 1 1
2017-01-17 41391 76505156 2016-11-10 10:49:30.000 5 1 9777 4 1
Here is another script that does not use the LAG function; each step has its own statement:
;WITH tb(RptDate,ISSUE_ID,ACCOUNT,CREATED_DATE )AS(
select '2017-01-17','35775','76505156','2016-11-03 12:44:11.000' UNION
select '2017-01-17','35793','76505156','2016-11-03 12:51:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 13:47:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 14:45:43.000' UNION
select '2017-01-17','36097','76505156','2016-11-03 15:39:16.000' UNION
select '2017-01-17','36132','76505156','2016-11-03 15:52:51.000' UNION
select '2017-01-17','41391','76505156','2016-11-10 10:49:30.000'
),t1 AS(
SELECT *, ROW_NUMBER()OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE) AS rn FROM tb
),t2 AS (
SELECT t1.*,CASE WHEN DATEDIFF(MINUTE,tt.CREATED_DATE,t1.CREATED_DATE)>60 THEN 1 ELSE 0 END AS m
,t1.rn-ROW_NUMBER()OVER(PARTITION BY t1.ACCOUNT,CASE WHEN DATEDIFF(MINUTE,tt.CREATED_DATE,t1.CREATED_DATE)>60 THEN 1 ELSE 0 END ORDER BY t1.CREATED_DATE) AS a
FROM t1 LEFT JOIN t1 AS tt ON tt.ACCOUNT=t1.ACCOUNT AND tt.rn=t1.rn-1
),t3 AS(
SELECT *,ROW_NUMBER()OVER(PARTITION BY ACCOUNT,t2.a ORDER BY CREATED_DATE) AS ROW_IDX
FROM t2
)
SELECT * FROM t3
ORDER BY t3.ACCOUNT,t3.CREATED_DATE
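The restart rule itself is easy to state procedurally, which may help when checking a SQL version against expected results. The sketch below is Python, not T-SQL, and makes two assumptions that the question leaves open: the gap is measured against the previous row (not the first row of the day), and all timestamps belong to one account. The name row_index_with_reset is hypothetical.

```python
from datetime import datetime, timedelta

def row_index_with_reset(timestamps, max_gap=timedelta(hours=1)):
    """Number timestamps in order, restarting at 1 on a new calendar day
    or when the gap to the previous timestamp exceeds max_gap.
    Hypothetical helper for a single account."""
    result = []
    prev = None
    idx = 0
    for ts in sorted(timestamps):
        new_day = prev is not None and ts.date() != prev.date()
        if prev is None or new_day or ts - prev > max_gap:
            idx = 1   # start a new run
        else:
            idx += 1  # continue the current run
        result.append((ts, idx))
        prev = ts
    return result

created = [
    datetime(2016, 11, 3, 12, 44, 11),
    datetime(2016, 11, 3, 12, 51, 43),
    datetime(2016, 11, 3, 15, 39, 16),
    datetime(2016, 11, 3, 15, 52, 51),
    datetime(2016, 11, 10, 10, 49, 30),
]
for ts, idx in row_index_with_reset(created):
    print(ts, idx)
```

On the sample rows this numbers the third row 1 and the fourth row 2, which is the result the question asks for.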

SQL Server : find Cust with Continuous Enrollment

I need to solve a well-known industry problem: identify the CustIDs that have continuous activity over a given period, where short breaks between contracts are allowed.
I did the first part by populating a matrix table, as in the snippet below, covering the whole period and setting a flag if the customer is active on each date. I think this is the only reliable way to do it, as contracts can overlap, etc.
Now I need to set a 1/0 flag per CustID for continuous activity, and I'm stuck on how to track this. In my example there is a 3-day break, which is OK, but I need to make sure those days are consecutive.
Do you have any good ideas for doing this cleanly? I'd appreciate your help and any leads. I saw some examples, but they were done in SAS, so they're hard to understand.
declare @maxBreak int = 3 -- 3 days max allowed for continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
;with matrix_dd as
(
select *
from
(select 111 CustID, '2015-1-11' dd, 1 Active union
select 111 CustID, '2015-1-12' dd, 0 Active union
select 111 CustID, '2015-1-13' dd, 0 Active union
select 111 CustID, '2015-1-14' dd, 0 Active union
select 111 CustID, '2015-1-15' dd, 1 Active union
select 111 CustID, '2015-1-16' dd, 1 Active union
select 111 CustID, '2015-1-17' dd, 1 Active union
select 111 CustID, '2015-1-18' dd, 1 Active union
select 111 CustID, '2015-1-19' dd, 0 Active union
select 111 CustID, '2015-1-20' dd, 0 Active) a
)
select *
from matrix_dd
Best
M
This solution calculates the active ranges and how long of a break it's been since the last interval ended:
declare @maxBreak int = 3 -- 3 days max allowed for continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
with matrix_dd as
(
select * from ( values
(111, '2015-1-11', 1 ),
(111, '2015-1-12', 0 ),
(111, '2015-1-13', 0 ),
(111, '2015-1-14', 0 ),
(111, '2015-1-15', 1 ),
(111, '2015-1-16', 1 ),
(111, '2015-1-17', 1 ),
(111, '2015-1-18', 1 ),
(111, '2015-1-19', 0 ),
(111, '2015-1-20', 0 )
) as x(CustID, dd, Active)
), active_with_groups as (
select *,
row_number() over (partition by CustID order by dd) -
datediff(day, '2000-01-01', dd) as gid
from matrix_dd
where active = 1
and dd between @PeriodStart and @PeriodEnd
), islands as (
select CustId, min(dd) as islandStart, max(dd) as islandEnd
from active_with_groups
group by CustID, gid
), islands_with_gaps as (
select *,
datediff(
day,
lag(islandEnd, 1, islandStart)
over (partition by CustID order by islandStart),
islandStart
) - 1 as [break]
from islands
)
select *
from islands_with_gaps
where [break] >= @maxBreak
order by islandStart
Let's break it down. In the "active_with_groups" common table expression (CTE), all I'm doing is converting the dates into integers that have the same relationship by using datediff(). Why? Integers are easier to work with for this problem. Note that I'm also using row_number() to get a contiguous sequence and then taking the difference between that and the datediff() value. The key observation is that if the dates don't go up contiguously, that difference will change. Likewise, if the dates do go up contiguously, the difference stays the same. Therefore, we can use this value as a group identifier for values that are in a contiguous range.
Next, we use that group identifier to group by (bet you didn't see that coming!). This gives us the start and end of each interval. Nothing very clever is going on here.
The next step is to calculate the amount of time that's passed between when the last interval ended and the current one began. For this, we use a simple call to the lag() function. The only thing to note here is that I've chosen to have the lag() function emit a default value of islandStart in the case of the first interval. It could have just as easily been no default (which would have then caused it to emit a NULL value).
Lastly, we look for intervals with a gap over the specified threshold.
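The breakdown above can also be traced in a few lines of Python, which makes the group identifier visible. This is a sketch only: find_islands and breaks_before are hypothetical names, the input is assumed to be the active dates of a single customer, and the first island gets a break of -1 to mirror the lag() default used in the query.

```python
from datetime import date

EPOCH = date(2000, 1, 1)  # same arbitrary anchor date as in the query

def find_islands(active_dates):
    """Group dates into contiguous runs using row_number minus day number."""
    groups = {}
    for row_number, d in enumerate(sorted(set(active_dates)), start=1):
        gid = row_number - (d - EPOCH).days  # constant while dates are contiguous
        groups.setdefault(gid, []).append(d)
    # Each group is already in date order, so first/last are min/max
    return sorted((ds[0], ds[-1]) for ds in groups.values())

def breaks_before(islands):
    """For each island, days between the previous island's end and its start, minus 1."""
    result = []
    prev_end = None
    for start, end in islands:
        reference = start if prev_end is None else prev_end  # lag(..., 1, islandStart)
        result.append((start, end, (start - reference).days - 1))
        prev_end = end
    return result

active = [date(2015, 1, day) for day in (11, 15, 16, 17, 18)]
for island in breaks_before(find_islands(active)):
    print(island)
```

For the sample data this yields two islands (the 11th alone, then the 15th to the 18th) with a 3-day break before the second one.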
Similar to Ben's answer. I'm assuming that all your dates are represented in the data. So really we just need to make sure there isn't a run of zeroes longer than 3.
with inactive_runs as (
select
CustID,
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where 3 >= all (
select count(*) from inactive_runs ir
where ir.CustID = m.CustID
group by grp
);
http://rextester.com/AHI22250
Using all isn't particularly common. Here's an alternative:
...
with inactive_runs as (
select
CustID, dd, /* <-- had to add dd */
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from #matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where not exists (
select 1 from inactive_runs ir
where ir.CustID = m.CustID
group by grp
having datediff(day, min(dd), max(dd)) > 2
);
I glanced at your comment above. I think it confirms my suspicion that you've got a single row for every date. If you've got a new version of SQL Server you can just sum over the previous three rows. Unfortunately you wouldn't be able to use a variable for the window size if the length is variable:
with cust as (
select
CustID,
case when
sum(case when Active = 0 then 1 end) over (
partition by CustID
order by dd
rows between 3 preceding and current row
) = 4 then 1
end as isBrk
from matrix_dd
)
select CustID
from cust
group by CustID
having count(isBrk) = 0;
Edit:
Based on your comment with the data in a "pre-matrix" format, yes, that's a simpler query. At that point you're just looking at the previous end date and the current row's start date.
with data as (
select * from (
values (111, 1230, '2014-12-11', '2015-01-11'),
(111, 1231, '2015-01-15', '2015-01-18'),
(111, 1232, '2015-03-22', '2015-04-01')
) as t (CustID, ContractID, StartDD, EndDD)
), gaps as (
select
CustID,
datediff(day,
lag(EndDD, 1, StartDD) over (partition by CustID order by StartDD),
StartDD
) as days
from data
)
select CustID
from gaps
group by CustID
having max(days) <= 3;
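As a cross-check of the "previous end date versus current start date" logic, here is a short Python sketch. The names max_gap_days and is_continuous are illustrative assumptions. It mirrors the query literally, so the gap is the plain day difference between the previous end and the current start, and overlapping contracts are not handled specially.

```python
from datetime import date

def max_gap_days(contracts):
    """contracts: list of (start, end) date pairs for one customer.
    Returns the largest day difference between one contract's end and the
    next contract's start, in start-date order. Hypothetical helper."""
    ordered = sorted(contracts)
    gaps = [0]  # first contract: lag() defaults to its own start date, so the gap is 0
    for (_, prev_end), (start, _) in zip(ordered, ordered[1:]):
        gaps.append((start - prev_end).days)
    return max(gaps)

def is_continuous(contracts, max_break=3):
    """True when no gap exceeds max_break, like HAVING max(days) <= 3."""
    return max_gap_days(contracts) <= max_break

contracts = [
    (date(2014, 12, 11), date(2015, 1, 11)),
    (date(2015, 1, 15), date(2015, 1, 18)),
    (date(2015, 3, 22), date(2015, 4, 1)),
]
print(max_gap_days(contracts), is_continuous(contracts))
```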

SQL Server to Oracle to find number of months

I just wrote a query to find the number of months between two dates (excluding the current month) in SQL Server:
SELECT
(DATEDIFF(MONTH, DATEADD(MONTH, 1, '2016-02-20 00:00:00.000') -
DAY(DATEADD(MONTH, 1, '2016-02-20 00:00:00.000')), '2016-12-31 00:00:00.000'))
The above returns 10. How do we achieve this in Oracle?
I guess the correct function, at least in 10g and above, is
SELECT MONTHS_BETWEEN([date1],[date2]) FROM dual
Furthermore, it is possible to calculate the difference between two dates directly:
SELECT [date2] - [date1] FROM dual
This will give you the difference in days.
There are several different ways to count a difference in months;
a good starting point could be months_between:
with testDates (date1, date2) as
(
select date '2016-12-31', date '2016-02-20' from dual union all
select date '2016-02-01', date '2016-01-31' from dual union all
select date '2016-12-30', date '2016-11-30' from dual union all
select date '2016-12-29', date '2016-11-30' from dual
)
select date1, date2, months_between(date1, date2) diff, floor(months_between(date1, date2) ) floor_diff
from testDates;
DATE1 DATE2 DIFF FLOOR_DIFF
--------- --------- ---------- ----------
31-DIC-16 20-FEB-16 10,3548387 10
01-FEB-16 31-GEN-16 ,032258065 0
30-DIC-16 30-NOV-16 1 1
29-DIC-16 30-NOV-16 ,967741935 0
Here is a query that counts the months between two dates in Oracle 11g, including both the start month and the end month. Please try it:
SELECT COUNT(0) TOTAL_MONTHS
FROM (SELECT DATE '2015-01-05' START_DATE,
DATE '2015-12-15' END_DATE
FROM DUAL)
CONNECT BY LEVEL <= MONTHS_BETWEEN(
TRUNC(END_DATE,'MM'),
TRUNC(START_DATE,'MM') )
+ 1
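The two whole-month counts that appear in this thread differ by one, which is easy to miss. The Python sketch below spells out both; the function names are hypothetical and start is assumed to be no later than end. The first matches how SQL Server's DATEDIFF(MONTH, ...) counts month boundaries crossed; the second matches the row count produced by the CONNECT BY query above.

```python
from datetime import date

def month_boundaries_between(start, end):
    """Number of month boundaries crossed going from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def calendar_months_spanned(start, end):
    """Number of calendar months touched, counting both the start and end months."""
    return month_boundaries_between(start, end) + 1

print(month_boundaries_between(date(2016, 2, 29), date(2016, 12, 31)))
print(calendar_months_spanned(date(2015, 1, 5), date(2015, 12, 15)))
```

Neither function reproduces the fractional result of MONTHS_BETWEEN; they only cover the whole-month counts.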
