This is for SQL Server 2008/2012.
I have the following dataset with claim start and end dates. I want to count the days covered by back-to-back claims, where the next claim's start date is one day after the previous claim's end date, making it one continuous service period.
If there is a break in service, as for member ID 1002 where one claim ends on 05/15 and the next starts on 05/18, the count should restart.
MemberID Claim Start Claim End Claim_ID
1001 2016-04-01 2016-04-15 ABC11111
1001 2016-04-16 2016-04-30 ABC65465
1001 2016-05-01 2016-05-15 ABC51651
1001 2016-05-16 2016-06-15 ABC76320
1002 2016-04-01 2016-04-15 ABC74563
1002 2016-04-16 2016-04-30 ABC02123
1002 2016-05-01 2016-05-15 ABC02223
1002 2016-05-18 2016-06-15 ABC66632
1002 2016-06-16 2016-06-30 ABC77447
1002 2016-07-10 2016-07-31 ABC33221
1002 2016-08-01 2016-08-10 ABC88877
So effectively, I want the following output: for each run of claims with no gap in coverage, the earliest claim start date, the latest claim end date, and the number of days from one to the other. If there is a gap in coverage, the count starts over with the start date of the next claim.
MemberID Claim_Start Claim_End Continuous_Service_Days
1001 2016-04-01 2016-06-15 76
1002 2016-04-01 2016-05-15 45
1002 2016-05-18 2016-06-30 44
1002 2016-07-10 2016-08-10 32
I have tried WHILE loops and CTEs, and I have also tried master.dbo.spt_values to first generate all the dates between the claims. But I am having trouble counting the days across consecutive dates and resetting the count when there is a break in coverage.
Any help is appreciated. Thanks!
You need to find the gaps first.
This solution uses a Tally Table to generate every date from ClaimStart to ClaimEnd. Then, using the generated dates, it finds the gaps: within a run of consecutive dates, the date minus its row number is constant, so that value identifies each run.
Now that you have the gaps, you can use GROUP BY to get the MIN(ClaimStart) and MAX(ClaimEnd):
WITH E1(N) AS( -- 10 ^ 1 = 10 rows
SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
CteTally(N) AS(
SELECT TOP(SELECT MAX(DATEDIFF(DAY, ClaimStart, ClaimEnd) + 1) FROM tbl)
ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E4
),
CteDates AS( -- Generate the dates from ClaimStart to ClaimEnd
SELECT
t.MemberID,
dt = DATEADD(DAY, ct.N - 1, t.ClaimStart)
FROM tbl t
INNER JOIN CteTally ct
ON DATEADD(DAY, ct.N - 1, t.ClaimStart) <= t.ClaimEnd
),
CteGrp AS( -- Find gaps and continuous dates
SELECT *,
rn = DATEADD(DAY, - ROW_NUMBER() OVER(PARTITION BY MemberID ORDER BY dt), dt)
FROM CteDates
)
SELECT
MemberID,
ClaimStart = MIN(dt),
ClaimEnd = MAX(dt),
Diff = DATEDIFF(DAY, MIN(dt), MAX(dt)) + 1
FROM CteGrp
GROUP BY MemberID, rn
ORDER BY MemberID, ClaimStart;
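The grouping trick in CteGrp can be checked outside the database. The following is a minimal Python sketch, not T-SQL, that mirrors the three steps (expand to days, subtract the row number, group) on the question's sample data. The function name `continuous_service` is made up for illustration, and the sketch assumes claims for one member never overlap, as in the sample.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample claims from the question: (MemberID, ClaimStart, ClaimEnd)
claims = [
    (1001, date(2016, 4, 1), date(2016, 4, 15)),
    (1001, date(2016, 4, 16), date(2016, 4, 30)),
    (1001, date(2016, 5, 1), date(2016, 5, 15)),
    (1001, date(2016, 5, 16), date(2016, 6, 15)),
    (1002, date(2016, 4, 1), date(2016, 4, 15)),
    (1002, date(2016, 4, 16), date(2016, 4, 30)),
    (1002, date(2016, 5, 1), date(2016, 5, 15)),
    (1002, date(2016, 5, 18), date(2016, 6, 15)),
    (1002, date(2016, 6, 16), date(2016, 6, 30)),
    (1002, date(2016, 7, 10), date(2016, 7, 31)),
    (1002, date(2016, 8, 1), date(2016, 8, 10)),
]


def continuous_service(claims):
    """Hypothetical helper: one row per run of consecutive covered days."""
    # Step 1: expand each claim into one (member, day) row, like CteDates.
    days = sorted(
        (member, start + timedelta(days=i))
        for member, start, end in claims
        for i in range((end - start).days + 1)
    )
    # Step 2: number the days per member and subtract that number from the
    # date, like CteGrp. Consecutive days share the same anchor date.
    keyed = []
    rn, prev_member = 0, None
    for member, day in days:
        rn = rn + 1 if member == prev_member else 1
        prev_member = member
        keyed.append((member, day - timedelta(days=rn), day))
    # Step 3: group by (member, anchor) and take the first and last day.
    result = []
    for (member, _), rows in groupby(keyed, key=lambda r: (r[0], r[1])):
        run = [r[2] for r in rows]
        result.append((member, run[0], run[-1], (run[-1] - run[0]).days + 1))
    return result


for row in continuous_service(claims):
    print(row)
```

On the sample data this yields the four rows the question asks for, with 76, 45, 44 and 32 days.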
Declare @YourTable table (MemberID int,[Claim Start] date,[Claim End] date,[Claim_ID] varchar(25))
Insert Into @YourTable values
(1001,'2016-04-01','2016-04-15','ABC11111'),
(1001,'2016-04-16','2016-04-30','ABC65465'),
(1001,'2016-05-01','2016-05-15','ABC51651'),
(1001,'2016-05-16','2016-06-15','ABC76320'),
(1002,'2016-04-01','2016-04-15','ABC74563'),
(1002,'2016-04-16','2016-04-30','ABC02123'),
(1002,'2016-05-01','2016-05-15','ABC02223'),
(1002,'2016-05-18','2016-06-15','ABC66632'),
(1002,'2016-06-16','2016-06-30','ABC77447'),
(1002,'2016-07-10','2016-07-31','ABC33221'),
(1002,'2016-08-01','2016-08-10','ABC88877')
-- cte0/cte1 build a 10,000-row calendar starting at the earliest claim start
;with cte0(N) as (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N))
,cte1(R,D) as (Select Row_Number() over (Order By (Select Null))
,DateAdd(DD,-1+Row_Number() over (Order By (Select Null)),(Select MinDate=min([Claim Start]) From @YourTable))
From cte0 N1, cte0 N2, cte0 N3, cte0 N4)
Select MemberID
,[Claim Start] = Min([Claim Start])
,[Claim End] = Max([Claim End])
,Continuous_Service_Days = count(*)
From (
-- Order by D (the calendar date) so the row numbering is deterministic within a claim
Select *,Island = R - Row_Number() over (Partition By MemberID Order by D)
From @YourTable A
Join cte1 B on D Between [Claim Start] and [Claim End]
) A
Group By MemberID,Island
Order By 1,2
Returns
MemberID Claim Start Claim End Continuous_Service_Days
1001 2016-04-01 2016-06-15 76
1002 2016-04-01 2016-05-15 45
1002 2016-05-18 2016-06-30 44
1002 2016-07-10 2016-08-10 32
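Both answers above expand every claim into individual days. As a cross-check of the expected output, here is a small Python sketch (again not T-SQL) of a different technique: walk the claims in start order and extend the current run whenever a claim starts exactly one day after the previous one ends. The name `merge_back_to_back` is hypothetical, and only member 1002's rows are included to keep it short.

```python
from datetime import date, timedelta

# Member 1002's claims from the question: (MemberID, ClaimStart, ClaimEnd)
claims = [
    (1002, date(2016, 4, 1), date(2016, 4, 15)),
    (1002, date(2016, 4, 16), date(2016, 4, 30)),
    (1002, date(2016, 5, 1), date(2016, 5, 15)),
    (1002, date(2016, 5, 18), date(2016, 6, 15)),
    (1002, date(2016, 6, 16), date(2016, 6, 30)),
    (1002, date(2016, 7, 10), date(2016, 7, 31)),
    (1002, date(2016, 8, 1), date(2016, 8, 10)),
]


def merge_back_to_back(claims):
    """Hypothetical helper: merge claims that start the day after the previous one ends."""
    merged = []
    for member, start, end in sorted(claims):
        last = merged[-1] if merged else None
        if last and last[0] == member and start == last[2] + timedelta(days=1):
            last[2] = end  # no gap: extend the current run
        else:
            merged.append([member, start, end])  # gap or new member: new run
    return [(m, s, e, (e - s).days + 1) for m, s, e in merged]


for row in merge_back_to_back(claims):
    print(row)
```

This avoids generating one row per day, which may matter when claims span long periods.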
Related
I have a table that contains employee bank data
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |Null
Smith |Vacation |2023-01-03 |Null
Smith |Vacation |2023-01-04 |7.5
I would like to write a statement that updates 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, each row with a NULL Delta should take the value from the most recent earlier row that has one.
Once complete, I want the table to look like this:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |15.0
Smith |Vacation |2023-01-03 |15.0
Smith |Vacation |2023-01-04 |7.5
The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.
I currently update the table with the following, but I am wondering if there is a more efficient way to do so?
WITH cte_date
AS (SELECT dd.date_key,
db.balance_key,
feb.employee_key
FROM shared.dim_date dd
CROSS JOIN
(
SELECT DISTINCT
employee_key
FROM wfms.fact_employee_balance
) feb
CROSS JOIN wfms.dim_balance db
WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
SELECT dd.*,
t.delta
INTO wfms.test2
FROM cte_date dd
LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
AND dd.employee_key = t.employee_key
AND t.date_key = (SELECT TOP 1 tt1.date_key
FROM wfms.test1 tt1
WHERE tt1.balance_key = t.balance_key
AND tt1.employee_key = t.employee_key
AND tt1.date_key < dd.date_key);
Just for fun, I wanted to test an idea.
For the moment, let's assume the gaps are not too wide; in this example, at most 7 days.
Relative to the batch, the lag() over() approach cost 22% while the Cross Apply cost 78%.
Again, just for fun:
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta
,lag(Delta,1) over (partition by Employee,Bank order by date)
,lag(Delta,2) over (partition by Employee,Bank order by date)
,lag(Delta,3) over (partition by Employee,Bank order by date)
,lag(Delta,4) over (partition by Employee,Bank order by date)
,lag(Delta,5) over (partition by Employee,Bank order by date)
,lag(Delta,6) over (partition by Employee,Bank order by date)
,lag(Delta,7) over (partition by Employee,Bank order by date)
)
From YourTable A
Versus
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta,B.Delta)
From YourTable A
Cross Apply ( Select top 1 Delta
From YourTable
Where Employee=A.Employee
and A.Bank = Bank
and Delta is not null
and A.Date>=Date
Order By Date desc
) B
Update
Same results with 20 days
Here is another way. A running sum() window function counts the non-null Delta values seen so far, which puts each non-null row and the null rows that follow it into the same group, Grp. Then max(Delta) over each Grp returns that group's non-null value.
select Employee, Bank, [Date], Delta = max (max(Delta))
over (partition by Employee, Bank, Grp)
from
(
select *, Grp = sum (case when Delta is not null then 1 else 0 end)
over (partition by Employee,Bank
order by [Date])
from YourTable
) t
group by Employee, Bank, [Date], Grp
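To see why the running count works as a group key, here is a minimal Python sketch of the same idea on the question's four rows. It is only an illustration of the logic, not a replacement for the query; `fill_down` and its dictionaries are invented names.

```python
# Sample rows from the question: (Employee, Bank, Date, Delta)
rows = [
    ("Smith", "Vacation", "2023-01-01", 15.0),
    ("Smith", "Vacation", "2023-01-02", None),
    ("Smith", "Vacation", "2023-01-03", None),
    ("Smith", "Vacation", "2023-01-04", 7.5),
]


def fill_down(rows):
    """Hypothetical helper: carry the last non-null Delta forward per (Employee, Bank)."""
    grp = {}        # running count of non-null Deltas per (employee, bank), i.e. Grp
    grp_value = {}  # (employee, bank, Grp) -> the one non-null Delta in that group
    out = []
    # ISO date strings sort chronologically, so a plain sort is enough here.
    for emp, bank, day, delta in sorted(rows, key=lambda r: r[:3]):
        key = (emp, bank)
        if delta is not None:
            grp[key] = grp.get(key, 0) + 1
            grp_value[(key, grp[key])] = delta
        # Rows before the first non-null value fall in group 0 and stay None.
        out.append((emp, bank, day, grp_value.get((key, grp.get(key, 0)))))
    return out


for row in fill_down(rows):
    print(row)
```

Each group contains exactly one non-null value, which is why taking the maximum per group in the SQL version returns it.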
I need to fill the range from 2017-04-01 to 2017-04-30 with the data from this table, knowing that the highest priority records should prevail over those with lower priorities
id startValidity endValidity priority
-------------------------------------------
1004 2017-04-03 2017-04-30 1
1005 2017-04-10 2017-04-22 2
1010 2017-04-19 2017-04-23 3
1006 2017-04-24 2017-04-28 2
1008 2017-04-26 2017-04-28 3
In practice I would need to get a result like this:
id startValidity endValidity priority
--------------------------------------------
1004 2017-04-03 2017-04-09 1
1005 2017-04-10 2017-04-18 2
1010 2017-04-19 2017-04-23 3
1006 2017-04-24 2017-04-25 2
1008 2017-04-26 2017-04-28 3
1004 2017-04-29 2017-04-30 1
I can't think of a more elegant or efficient solution right now . . .
-- Sample Table
declare @tbl table
(
id int,
startValidity date,
endValidty date,
priority int
)
-- Sample Data
insert into @tbl select 1004, '2017-04-03', '2017-04-30', 1
insert into @tbl select 1005, '2017-04-10', '2017-04-22', 2
insert into @tbl select 1010, '2017-04-19', '2017-04-23', 3
insert into @tbl select 1006, '2017-04-24', '2017-04-28', 2
insert into @tbl select 1008, '2017-04-26', '2017-04-28', 3
-- Query
; with
date_range as -- find the min and max date for generating list of dates
(
select start_date = min(startValidity), end_date = max(endValidty)
from @tbl
),
dates as -- gen the list of dates using recursive CTE
(
select rn = 1, date = start_date
from date_range
union all
select rn = rn + 1, date = dateadd(day, 1, d.date)
from dates d
where d.date < (select end_date from date_range)
),
cte as -- for each date, get the ID based on priority
(
-- order by id, date so rows of the same id are numbered in date order
select *, grp = row_number() over(order by id, date) - rn
from dates d
outer apply
(
select top 1 x.id, x.priority
from @tbl x
where x.startValidity <= d.date
and x.endValidty >= d.date
order by x.priority desc
) t
)
-- final result
select id, startValidity = min(date), endValidty = max(date), priority
from cte
group by grp, id, priority
order by startValidity
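The query above picks, for each day, the covering record with the highest priority and then collapses consecutive days with the same id into ranges. A rough Python sketch of that logic, using the question's sample rows, can serve as a check on the expected result. `resolve_by_priority` is a made-up name, and like the query it starts at the earliest startValidity rather than at 2017-04-01.

```python
from datetime import date, timedelta

# Sample rows from the question: (id, startValidity, endValidity, priority)
periods = [
    (1004, date(2017, 4, 3), date(2017, 4, 30), 1),
    (1005, date(2017, 4, 10), date(2017, 4, 22), 2),
    (1010, date(2017, 4, 19), date(2017, 4, 23), 3),
    (1006, date(2017, 4, 24), date(2017, 4, 28), 2),
    (1008, date(2017, 4, 26), date(2017, 4, 28), 3),
]


def resolve_by_priority(periods):
    """Hypothetical helper: per day keep the highest-priority id, then merge days into ranges."""
    first = min(p[1] for p in periods)
    last = max(p[2] for p in periods)
    result = []
    day = first
    while day <= last:
        covering = [p for p in periods if p[1] <= day <= p[2]]
        if covering:
            pid, _, _, prio = max(covering, key=lambda p: p[3])
            prev = result[-1] if result else None
            if prev and prev[0] == pid and prev[2] == day - timedelta(days=1):
                prev[2] = day  # same id as yesterday: extend the range
            else:
                result.append([pid, day, day, prio])  # winner changed: new range
        day += timedelta(days=1)
    return [tuple(r) for r in result]


for row in resolve_by_priority(periods):
    print(row)
```

On the sample data this produces the six ranges listed in the question, including the trailing 1004 range for 2017-04-29 to 2017-04-30.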
I do not understand the purpose of the calendar CTE or table, so I am not using any recursive CTE or calendar.
Maybe I haven't understood the requirement completely.
Try this with different sample data:
declare @tbl table
(
id int,
startValidity date,
endValidty date,
priority int
)
-- Sample Data
insert into @tbl select 1004, '2017-04-03', '2017-04-30', 1
insert into @tbl select 1005, '2017-04-10', '2017-04-22', 2
insert into @tbl select 1010, '2017-04-19', '2017-04-23', 3
insert into @tbl select 1006, '2017-04-24', '2017-04-28', 2
insert into @tbl select 1008, '2017-04-26', '2017-04-28', 3
;With CTE as
(
select * ,ROW_NUMBER()over(order by startValidity)rn
from @tbl
)
,CTE1 as
(
select c.id,c.startvalidity,isnull(dateadd(day,-1, c1.startvalidity)
,c.endValidty) Endvalidity
,c.[priority],c.rn
from cte c
left join cte c1
on c.rn+1=c1.rn
)
select id,startvalidity,Endvalidity,priority from cte1
union ALL
select id,startvalidity,Endvalidity,priority from
(
select top 1 id,ca.startvalidity,ca.Endvalidity,priority from cte1
cross apply(
select top 1
dateadd(day,1,endvalidity) startvalidity
,dateadd(day,-1,dateadd(month, datediff(month,0,endvalidity)+1,0)) Endvalidity
from cte1
order by rn desc)CA
order by priority
)t4
--order by startvalidity --if req
I need to calculate break time taken by employee, sample shown here:
Userid Date_time
------ ---------
1001 9/1/15 10:31 AM
1001 9/1/15 11:51 AM
1001 9/1/15 11:58 AM
1001 9/1/15 2:02 PM
1001 9/1/15 2:38 PM
1001 9/1/15 4:37 PM
1001 9/1/15 5:12 PM
1001 9/1/15 6:32 PM
1001 9/1/15 6:34 PM
1001 9/1/15 7:39 PM
1001 9/1/15 7:42 PM
1001 9/1/15 7:53 PM
I don't want to include the first and last records, because they mark the start and end of the total working hours.
Expected result:
Userid break_time_MIN
------ --------------
1001 83
Please suggest how I can calculate the break time for each employee.
First, you want to remove the first and last row. After that, you want to group each pair of consecutive rows and get their difference. Finally, compute the SUM of all the differences:
WITH Cte AS(
SELECT *,
grp = rn - (rn % 2 + 1)
FROM (
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY Userid ORDER BY Date_time),
rnd = ROW_NUMBER() OVER(PARTITION BY Userid ORDER BY Date_time DESC)
FROM #tbl
) t
WHERE rn <> 1 AND rnd <> 1
),
CteFinal AS(
SELECT
Userid,
BreakDuration = DATEDIFF(MINUTE, MIN(Date_time), MAX(Date_time))
FROM Cte
GROUP BY
Userid, grp
)
SELECT
Userid,
break_time_MIN = SUM(BreakDuration)
FROM CteFinal
GROUP BY UserId;
---------------------
Result:
---------------------
Userid break_time_MIN
------ --------------
1001 83
;WITH cte AS (
SELECT Userid,
Date_time,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY UserId, Date_time) as RN
FROM YourTableName)
SELECT r1.Userid, SUM(DATEDIFF(MINUTE,r1.Date_time,r2.Date_time)) as break_time
FROM cte r1
INNER JOIN cte r2
ON r1.Userid = r2.Userid AND r1.RN + 1 = r2.RN
WHERE r1.RN % 2 = 0
GROUP BY r1.Userid
Output:
Userid break_time
1001 83
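The expected 83 minutes can be verified by hand with a short Python sketch of the same pairing logic: drop the first and last punch, then treat the remaining timestamps as (break start, break end) pairs. `break_minutes` is a hypothetical name, and the sketch assumes an even number of punches per user, as in the sample.

```python
from datetime import datetime

# User 1001's punches from the question
stamps = [
    "9/1/15 10:31 AM", "9/1/15 11:51 AM", "9/1/15 11:58 AM", "9/1/15 2:02 PM",
    "9/1/15 2:38 PM", "9/1/15 4:37 PM", "9/1/15 5:12 PM", "9/1/15 6:32 PM",
    "9/1/15 6:34 PM", "9/1/15 7:39 PM", "9/1/15 7:42 PM", "9/1/15 7:53 PM",
]


def break_minutes(stamps):
    """Hypothetical helper: total minutes between paired punches, ignoring first and last."""
    times = sorted(datetime.strptime(s, "%m/%d/%y %I:%M %p") for s in stamps)
    inner = times[1:-1]  # the first and last rows are shift start and shift end
    total = 0
    # Pair rows 2-3, 4-5, ... exactly like grouping on grp in the query above.
    for out_t, in_t in zip(inner[0::2], inner[1::2]):
        total += int((in_t - out_t).total_seconds() // 60)
    return total


print(break_minutes(stamps))  # 7 + 36 + 35 + 2 + 3 = 83
```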
I'm trying to write this query in SQL Server, but something is wrong. I need some help.
I have a table of item movements and another table of purchase movements (buy), which holds the cost of each item on each date I bought it. For each row in the first table, I need the most recent cost from the second table as of the movement date.
In other words, for each item, search only the second-table records dated before the first-table date and return the cost from the most recent one.
Examples:
First Table
REF DATE
1 2015-10-15
1 2015-08-30
2 2015-09-11
3 2015-05-22
2 2015-03-08
2 2015-07-15
3 2015-11-14
1 2015-11-20
Second Table (Buy)
REF DATE COST
1 2015-08-20 150
1 2015-10-12 120
2 2015-04-04 270
2 2015-06-15 280
3 2015-03-01 75
3 2015-10-17 80
I need this result:
REF DATE Cost
1 2015-10-15 120
1 2015-08-30 150
2 2015-09-11 280
3 2015-05-22 75
2 2015-03-08 -
2 2015-07-15 280
3 2015-11-14 80
1 2015-11-20 120
Any help appreciated.
You can do it using OUTER APPLY:
SELECT [REF], [DATE], [COST]
FROM Table1 AS t1
OUTER APPLY (
SELECT TOP 1 COST
FROM Table2 AS t2
WHERE t1.REF = t2.REF AND t1.DATE >= t2.DATE
ORDER BY t2.DATE DESC) AS t3
;WITH cte AS (
SELECT ft.*,
st.[Cost],
ROW_NUMBER() OVER (PARTITION BY ft.[Ref],ft.[Date] ORDER BY st.[Date] DESC) RN
FROM FirstTable ft
LEFT JOIN SecondTable st ON ft.[Ref] = st.[Ref]
AND ft.[Date] >= st.[Date]
)
SELECT Ref,
[Date],
[Cost]
FROM cte
WHERE RN = 1
Or, if you don't want to use a CTE:
SELECT
Ref,
[Date],
[Cost]
FROM
(SELECT
ft.*,
st.[Cost],
ROW_NUMBER() OVER (PARTITION BY ft.[Ref],ft.[Date] ORDER BY st.[Date] DESC) RN
FROM
FirstTable ft
LEFT JOIN SecondTable st ON ft.[Ref] = st.[Ref]
AND ft.[Date] >= st.[Date]
) t
WHERE
t.RN = 1
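All three queries implement the same lookup: for each movement, the cost of the latest purchase of that item dated on or before the movement. A minimal Python sketch of that lookup, using a binary search per item, reproduces the expected result from the question. The function name `latest_cost` is invented for illustration; None stands for the "-" row with no earlier purchase.

```python
from bisect import bisect_right
from collections import defaultdict

# First table: (REF, DATE); second table (buy): (REF, DATE, COST)
movements = [
    (1, "2015-10-15"), (1, "2015-08-30"), (2, "2015-09-11"), (3, "2015-05-22"),
    (2, "2015-03-08"), (2, "2015-07-15"), (3, "2015-11-14"), (1, "2015-11-20"),
]
buys = [
    (1, "2015-08-20", 150), (1, "2015-10-12", 120),
    (2, "2015-04-04", 270), (2, "2015-06-15", 280),
    (3, "2015-03-01", 75), (3, "2015-10-17", 80),
]


def latest_cost(movements, buys):
    """Hypothetical helper: attach the most recent purchase cost on or before each movement."""
    history = defaultdict(list)
    for ref, day, cost in sorted(buys):  # ISO dates sort chronologically
        history[ref].append((day, cost))
    out = []
    for ref, day in movements:
        purchases = history.get(ref, [])
        days = [d for d, _ in purchases]
        i = bisect_right(days, day)  # number of purchases dated <= movement date
        out.append((ref, day, purchases[i - 1][1] if i else None))
    return out


for row in latest_cost(movements, buys):
    print(row)
```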
Suppose I have a table with data as below:
SELECT *
FROM TestTable
ORDER BY deliver_date
deliver_date quantity
2015-10-01 5.00
2015-10-02 3.00
2015-10-05 10.00
2015-10-07 8.00
2015-10-08 6.00
I know how to do the cumulative as below:
SELECT t1.deliver_date, SUM(t2.quantity) AS cumQTY
FROM TestTable t1
INNER JOIN TestTable t2 ON t2.deliver_date <= t1.deliver_date
GROUP BY t1.deliver_date
ORDER BY t1.deliver_date
result:
deliver_date cumQTY
2015-10-01 5.00
2015-10-02 8.00
2015-10-05 18.00
2015-10-07 26.00
2015-10-08 32.00
But, is it possible for me to get the result as below?
deliver_date cumQTY
2015-10-01 5.00
2015-10-02 8.00
2015-10-03 8.00
2015-10-04 8.00
2015-10-05 18.00
2015-10-06 18.00
2015-10-07 26.00
2015-10-08 32.00
That is, the dates must be continuous.
For example, TestTable has no row for 2015-10-03, but the cumulative result must still show 2015-10-03.
Appreciate if someone can help on this.
Thank you.
You can do this using a Tally Table:
DECLARE @startDate DATE,
@endDate DATE
SELECT
@startDate = MIN(deliver_date),
@endDate = MAX(deliver_date)
FROM TestTable
;WITH E1(N) AS(
SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b),
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b),
Tally(N) AS(
SELECT TOP(DATEDIFF(DAY, @startDate, @endDate) + 1)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM E4
),
CteAllDates AS(
SELECT
deliver_date = DATEADD(DAY, t.N-1, @startDate),
quantity = ISNULL(tt.quantity, 0)
FROM Tally t
LEFT JOIN TestTable tt
ON DATEADD(DAY, N-1, @startDate) = tt.deliver_date
)
SELECT
deliver_date,
cumQty = SUM(quantity) OVER(ORDER BY deliver_date)
FROM CteAllDates
First, you want to generate all dates starting from the MIN(deliver_date) up to MAX(deliver_date). This is done using a tally table, the CTEs from E1(N) up to Tally(N).
Now that you have all the dates, do a LEFT JOIN on the original table, TestTable, to get the corresponding quantity, assigning 0 if there is no matching date.
Lastly, to get the cumulative sum, you can use SUM(quantity) OVER(ORDER BY deliver_date), which requires SQL Server 2012 or later.
For more explanation on tally table, see my answer here.
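The three steps (generate every date, treat missing dates as 0, accumulate) can be sketched in a few lines of Python to confirm the expected output. This is only an illustration of the logic, with `cumulative_by_day` as a made-up name.

```python
from datetime import date, timedelta

# TestTable from the question: deliver_date -> quantity
deliveries = {
    date(2015, 10, 1): 5.0,
    date(2015, 10, 2): 3.0,
    date(2015, 10, 5): 10.0,
    date(2015, 10, 7): 8.0,
    date(2015, 10, 8): 6.0,
}


def cumulative_by_day(deliveries):
    """Hypothetical helper: running total for every day from the first to the last delivery."""
    start, end = min(deliveries), max(deliveries)
    running = 0.0
    out = []
    day = start
    while day <= end:
        running += deliveries.get(day, 0.0)  # a missing day contributes 0
        out.append((day, running))
        day += timedelta(days=1)
    return out


for day, qty in cumulative_by_day(deliveries):
    print(day, qty)
```

The missing dates 2015-10-03, 2015-10-04 and 2015-10-06 appear with the running total carried over from the previous day.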