Join two tables by MRN and date1 >= MAX(date2) - sql-server

I need to combine the data from two hospital activity reports. What happens is this: patients get admitted to a spinal department. Some of them are then referred to be put on ventilation. After a while the patient is discharged. Later, the same patient may or may not be re-referred to the spinal department, and may or may not be re-referred for ventilation. I am sent activity data in two reports:
Monthly Activity Report:
[MRN] [NHS Number] [Admission Date] [DoB] [Blah] [Blah]
Ventilation Report
[MRN] [Admission Date] [Ventilation Days] [Ventilation Type] [blah] [blah]
N.B. The Admission Date on the Ventilation Report is the date they are referred for ventilation. This may be the same day, or some date after they are referred into spinal dept.
What I need to achieve is this: join each row of the Ventilation Report to the most recent Monthly Activity Report entry on or before the date the patient was referred for ventilation. I need to avoid duplicating rows, but I cannot simply join to the most recent row in the Monthly Activity Report, as this could easily be a subsequent referral and the other information would not be applicable.
By following the answer to a similar question on Stack Overflow, I came up with this code:
SELECT [Year], [Month], MRN, [NHS Number], [Admission Date] AS [VD Admission Date],
[Admit date] AS [MAR Admit Date], Days,
[Ventilation Type], [Ventilation Route], [Ventilation Time], [Package of care class],
[Para/Tetra/No deficit], [Social charge date commenced ] AS [Social charge date], [Discharge date]
FROM Spinal_Costing.Vented_Days VD
LEFT JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY [Patient MRN] ORDER BY [Admit Date] DESC) AS row
FROM Spinal_Costing.MAR
) MAR ON VD.MRN = MAR.[Patient MRN]
WHERE MAR.row = 1;
But this returns the most recent entry in MAR for each patient.

This can also be achieved with an apply that references the values in Vented_Days and returns the top 1 matching row for each Vented_Days row. cross apply drops rows that have no match, whereas outer apply keeps them and returns null:
declare @vd table(MRN int,AdmissionDate date);
declare @mar table(MRN int,AdmissionDate date);
insert into @vd values
(1,'20190102')
,(1,'20190106')
,(2,'20190104')
,(3,'20190101');
insert into @mar values
(1,'20190101')
,(1,'20190105')
,(2,'20190102');
-- for each @vd row, pick the latest @mar row on or before its AdmissionDate
select v.MRN
,v.AdmissionDate
,m.AdmissionDate
from @vd as v
outer apply (select top 1 m.AdmissionDate
from @mar as m
where v.MRN = m.MRN
and v.AdmissionDate >= m.AdmissionDate
order by m.AdmissionDate desc
) as m
order by v.MRN
,v.AdmissionDate;
Output
+-----+---------------+---------------+
| MRN | AdmissionDate | AdmissionDate |
+-----+---------------+---------------+
| 1 | 2019-01-02 | 2019-01-01 |
| 1 | 2019-01-06 | 2019-01-05 |
| 2 | 2019-01-04 | 2019-01-02 |
| 3 | 2019-01-01 | NULL |
+-----+---------------+---------------+
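To make the difference between the two forms of apply concrete, here is a minimal sketch that runs the same lookup with cross apply on the same sample data. The table variables are redeclared so the batch runs on its own, and the alias MarAdmissionDate is only illustrative.

```sql
-- Same sample data as above, redeclared so this batch is self-contained.
declare @vd table (MRN int, AdmissionDate date);
declare @mar table (MRN int, AdmissionDate date);

insert into @vd values (1,'20190102'),(1,'20190106'),(2,'20190104'),(3,'20190101');
insert into @mar values (1,'20190101'),(1,'20190105'),(2,'20190102');

-- cross apply keeps only the @vd rows for which the inner query returns a row,
-- so MRN 3 (which has no @mar entry at all) is absent from the result.
select v.MRN
      ,v.AdmissionDate
      ,m.AdmissionDate as MarAdmissionDate  -- illustrative alias
from @vd as v
cross apply (select top 1 mar.AdmissionDate
             from @mar as mar
             where mar.MRN = v.MRN
               and mar.AdmissionDate <= v.AdmissionDate
             order by mar.AdmissionDate desc
            ) as m
order by v.MRN
        ,v.AdmissionDate;
```

This returns three rows (two for MRN 1, one for MRN 2); switching back to outer apply restores the MRN 3 row with a null match.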

You were on the right track; you just need to add a JOIN inside that derived table to limit the rows from Spinal_Costing.MAR to those with an admit date on or before the discharge date.
SELECT
[Year],
[Month],
MRN,
[NHS Number],
[Admission Date] AS [VD Admission Date],
[Admit date] AS [MAR Admit Date],
Days,
[Ventilation Type],
[Ventilation Route],
[Ventilation Time],
[Package of care class],
[Para/Tetra/No deficit],
[Social charge date commenced ] AS [Social charge date],
[Discharge date]
FROM
Spinal_Costing.Vented_Days VD
LEFT JOIN
(SELECT
*,
ROW_NUMBER() OVER(PARTITION BY [Patient MRN] ORDER BY [Admit Date] DESC) AS row
FROM Spinal_Costing.MAR
--added the JOIN and WHERE clause here
INNER JOIN Spinal_Costing.Vented_Days
ON Spinal_Costing.Vented_Days.MRN = Spinal_Costing.MAR.[Patient MRN]
WHERE Spinal_Costing.MAR.[Admit Date] <= Spinal_Costing.Vented_Days.[Discharge date]
) MAR ON VD.MRN = MAR.[Patient MRN]
WHERE MAR.row = 1;
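One thing to watch with the ROW_NUMBER() approach: a partition on [Patient MRN] alone still numbers rows per patient, so every ventilation row of a patient ends up with the same MAR row. A minimal sketch of numbering per ventilation row instead is shown below. It uses made-up table variables and simplified column names (AdmitDate, VDAdmissionDate and MARAdmitDate are assumptions, not the real Spinal_Costing schema), and it assumes MRN plus admission date identifies a ventilation row.

```sql
-- Hypothetical stand-ins for Vented_Days and MAR, reduced to the join columns.
declare @vd table (MRN int, AdmissionDate date);
declare @mar table (MRN int, AdmitDate date);

insert into @vd values (1,'20190102'),(1,'20190106'),(2,'20190104'),(3,'20190101');
insert into @mar values (1,'20190101'),(1,'20190105'),(2,'20190102');

select x.MRN, x.VDAdmissionDate, x.MARAdmitDate
from (
    select v.MRN
          ,v.AdmissionDate as VDAdmissionDate
          ,m.AdmitDate     as MARAdmitDate
           -- restart the numbering for every ventilation row, latest MAR row first
          ,row_number() over (partition by v.MRN, v.AdmissionDate
                              order by m.AdmitDate desc) as rn
    from @vd as v
    left join @mar as m
           on m.MRN = v.MRN
          and m.AdmitDate <= v.AdmissionDate  -- only MAR rows on or before the referral
) as x
where x.rn = 1
order by x.MRN, x.VDAdmissionDate;
```

Because the date filter sits in the ON clause of the LEFT JOIN, ventilation rows without any earlier MAR entry are kept with a null MARAdmitDate rather than being dropped.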

Related

Update gaps in sequential table

I have a table that contains employee bank data
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |Null
Smith |Vacation |2023-01-03 |Null
Smith |Vacation |2023-01-04 |7.5
I would like to write a statement so that I can update 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, I want to use the value from the most recent row whose date is not later than the date on the row being updated.
Once complete, I want the table to look like this:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |15.0
Smith |Vacation |2023-01-03 |15.0
Smith |Vacation |2023-01-04 |7.5
The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.
I currently update the table with the following, but I am wondering if there is a more efficient way to do so?
WITH cte_date
AS (SELECT dd.date_key,
db.balance_key,
feb.employee_key
FROM shared.dim_date dd
CROSS JOIN
(
SELECT DISTINCT
employee_key
FROM wfms.fact_employee_balance
) feb
CROSS JOIN wfms.dim_balance db
WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
SELECT dd.*,
t.delta
INTO wfms.test2
FROM cte_date dd
LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
AND dd.employee_key = t.employee_key
AND t.date_key = (SELECT TOP 1 tt1.date_key
FROM wfms.test1 tt1
WHERE tt1.balance_key = t.balance_key
AND tt1.employee_key = t.employee_key
AND tt1.date_key < dd.date_key);
Just for fun, I wanted to test an idea.
For the moment, let's assume the gaps are not too wide; in this example, at most 7 days.
In terms of query cost relative to the batch, the lag() over() approach was 22% while the Cross Apply was 78%.
Again, just for fun:
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta
,lag(Delta,1) over (partition by Employee,Bank order by date)
,lag(Delta,2) over (partition by Employee,Bank order by date)
,lag(Delta,3) over (partition by Employee,Bank order by date)
,lag(Delta,4) over (partition by Employee,Bank order by date)
,lag(Delta,5) over (partition by Employee,Bank order by date)
,lag(Delta,6) over (partition by Employee,Bank order by date)
,lag(Delta,7) over (partition by Employee,Bank order by date)
)
From YourTable A
Versus
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta,B.Delta)
From YourTable A
Cross Apply ( Select top 1 Delta
From YourTable
Where Employee=A.Employee
and A.Bank = Bank
and Delta is not null
and A.Date>=Date
Order By Date desc
) B
Update
Same results with 20 days
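Since the question asks for an update rather than a select, here is a hedged sketch of how the Cross Apply lookup could drive an UPDATE. It runs against a small table variable holding the question's sample rows (the name @t and the column types are assumptions); on a table with billions of rows the update would normally be batched, which is not shown here.

```sql
-- Sample rows from the question in a hypothetical table variable.
declare @t table (Employee varchar(20), Bank varchar(20), [Date] date, Delta decimal(9,1));

insert into @t values
 ('Smith','Vacation','20230101',15.0)
,('Smith','Vacation','20230102',null)
,('Smith','Vacation','20230103',null)
,('Smith','Vacation','20230104',7.5);

-- Fill each null Delta from the latest earlier row that has a value.
-- cross apply skips rows with no earlier non-null value, leaving them null.
update a
set Delta = b.Delta
from @t as a
cross apply (select top 1 p.Delta
             from @t as p
             where p.Employee = a.Employee
               and p.Bank = a.Bank
               and p.Delta is not null
               and p.[Date] < a.[Date]
             order by p.[Date] desc
            ) as b
where a.Delta is null;

select Employee, Bank, [Date], Delta
from @t
order by Employee, Bank, [Date];
```

Restricting the update to rows where Delta is null keeps the number of modified rows down, which matters when most rows already have a value.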
Here is another way. Use sum() as a window function to assign a group number "Grp" to each run of rows (one non-null row followed by its subsequent null rows). Then take max(Delta) within each Grp to return the non-null value.
select Employee, Bank, [Date], max (max(Delta))
over (partition by Employee, Bank, Grp)
from
(
select *, Grp = sum (case when Delta is not null then 1 else 0 end)
over (partition by Employee,Bank
order by [Date])
from YourTable
) t
group by Employee, Bank, [Date], Grp
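If the server is SQL Server 2022 or later (an assumption; earlier versions do not accept IGNORE NULLS), the same fill-forward can be written directly with last_value(). This is only a sketch on the question's sample rows, using a hypothetical table variable @t:

```sql
-- Requires SQL Server 2022+ (or Azure SQL Database) for IGNORE NULLS.
declare @t table (Employee varchar(20), Bank varchar(20), [Date] date, Delta decimal(9,1));

insert into @t values
 ('Smith','Vacation','20230101',15.0)
,('Smith','Vacation','20230102',null)
,('Smith','Vacation','20230103',null)
,('Smith','Vacation','20230104',7.5);

select Employee
      ,Bank
      ,[Date]
       -- last non-null Delta seen so far within the Employee/Bank partition
      ,last_value(Delta) ignore nulls
           over (partition by Employee, Bank
                 order by [Date]
                 rows between unbounded preceding and current row) as Delta
from @t
order by Employee, Bank, [Date];
```

The explicit rows frame makes the window end at the current row, so a later value is never pulled backwards into an earlier gap.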

SQL Server : update columns with sum() and group by on a column

I am trying to update a SQL Server table so that rows where estimate name, region, market name, bcast date, len, creative and file_id are the same are collapsed into a single row, with the spend column summed.
For example:
mdl_drtv_part_b_master_id | ESTIMATE NAME | REGION | MARKET NAME | BCAST DATE | LEN | CREATIVE | SPEND | file_id | create_date
---------------------------------------------------------------------------------------------------------------------------------
451 | 4Q18 EAST CENTRAL | EC | PIT PA | 2018-11-15 | 60 | GET MORE - HYBRID | 410.00 | 5862 | 2019-04-05 16:17:14.453
452 | 4Q18 EAST CENTRAL | EC | PIT PA | 2018-11-15 | 60 | Get More - Hybrid | 350.00 | 5862 | 2019-04-05 16:17:14.453
1929 | 4Q18 EAST CENTRAL | EC | PIT PA | 2018-11-15 | 60 | GET MORE - HYBRID | 646.00 | 5863 | 2019-04-05 16:18:51.490
I would like to get this as my output:
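ESTIMATE NAME | REGION | MARKET NAME | BCAST DATE | LEN | CREATIVE | SPEND | file_ID | create_date
------------------------------------------------------------------------------------------------------
4Q18 EAST CENTRAL | EC | PIT PA | 2018-11-15 | 60 | GET MORE - HYBRID | 760.00 | 5862 | 2019-04-05 16:17:14.453
4Q18 EAST CENTRAL | EC | PIT PA | 2018-11-15 | 60 | GET MORE - HYBRID | 646.00 | 5863 | 2019-04-05 16:18:51.490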
Here is my SQL select to get my output results:
SELECT
[ESTIMATE NAME], [REGION], [MARKET NAME], [BCAST DATE], [LEN],
[CREATIVE], SUM(SPEND), file_ID, [create_date]
FROM
dbo.mdl_drtv_part_b_sl
WHERE
[bcast date] = '2018-11-15'
-- AND region LIKE 'ec%'
AND creative = 'GET MORE - HYBRID'
GROUP BY
[ESTIMATE NAME], [REGION], [MARKET NAME], [BCAST DATE], [LEN],
[CREATIVE], file_ID, [create_date]
Thank you in advance.
Here is one possible solution:
update dbo.mdl_drtv_part_b_sl
set SPEND =
(
-- total spend of all rows that share the same grouping columns
select sum(mdp.SPEND)
from dbo.mdl_drtv_part_b_sl mdp
where
mdp.[ESTIMATE NAME] = dbo.mdl_drtv_part_b_sl.[ESTIMATE NAME]
and
mdp.[REGION] = dbo.mdl_drtv_part_b_sl.[REGION]
and
mdp.[MARKET NAME] = dbo.mdl_drtv_part_b_sl.[MARKET NAME]
and
mdp.[BCAST DATE] = dbo.mdl_drtv_part_b_sl.[BCAST DATE]
and
mdp.[LEN] = dbo.mdl_drtv_part_b_sl.[LEN]
and
mdp.[CREATIVE] = dbo.mdl_drtv_part_b_sl.[CREATIVE]
and
mdp.[file_ID] = dbo.mdl_drtv_part_b_sl.[file_ID]
and
mdp.[create_date] = dbo.mdl_drtv_part_b_sl.[create_date]
)
Then after updating the SPEND column, you can remove duplicates by using a window function:
;with cte as (
select row_number() over (partition by [ESTIMATE NAME] ,[REGION], [MARKET NAME] , [BCAST DATE],[LEN],[CREATIVE] ,[file_ID],[create_date] order by [ESTIMATE NAME] desc) rn
FROM dbo.mdl_drtv_part_b_sl)
delete from cte where rn> 1
You could try using an UPDATE with a join on a subquery based on your select:
UPDATE m
SET m.SPEND = t.sum_spend
FROM dbo.mdl_drtv_part_b_sl m
INNER JOIN (
SELECT [ESTIMATE NAME] ,[REGION], [MARKET NAME]
, [BCAST DATE],[LEN],[CREATIVE]
, SUM(SPEND) sum_spend , file_ID,[create_date]
FROM dbo.mdl_drtv_part_b_sl
where [bcast date] = '2018-11-15'
--and region like 'ec%'
and creative = 'GET MORE - HYBRID'
GROUP BY [ESTIMATE NAME] ,[REGION], [MARKET NAME] , [BCAST DATE],[LEN],[CREATIVE], file_ID,[create_date]
) t ON t.[ESTIMATE NAME] = m.[ESTIMATE NAME]
AND t.[REGION] = m.[REGION]
AND t.[MARKET NAME] = m.[MARKET NAME]
AND t.[BCAST DATE] = m.[BCAST DATE]
AND t.[LEN] = m.[LEN]
AND t.[CREATIVE] = m.[CREATIVE]
AND t.file_ID = m.file_ID
AND t.[create_date] = m.[create_date]
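As a further variation, the sum and the duplicate removal can both be driven by window functions over a CTE. The sketch below is an illustration on a table variable holding the three sample rows, not the real table; the name @sl, the id column and the column types are assumptions. It also assumes a case-insensitive collation, so that 'Get More - Hybrid' and 'GET MORE - HYBRID' fall into the same group.

```sql
-- Hypothetical copy of the three sample rows; types are guesses.
declare @sl table (
    id int, [ESTIMATE NAME] varchar(50), [REGION] varchar(10), [MARKET NAME] varchar(50),
    [BCAST DATE] date, [LEN] int, [CREATIVE] varchar(50), SPEND decimal(10,2),
    file_ID int, [create_date] datetime);

insert into @sl values
 (451, '4Q18 EAST CENTRAL','EC','PIT PA','20181115',60,'GET MORE - HYBRID',410.00,5862,'2019-04-05T16:17:14.453')
,(452, '4Q18 EAST CENTRAL','EC','PIT PA','20181115',60,'Get More - Hybrid',350.00,5862,'2019-04-05T16:17:14.453')
,(1929,'4Q18 EAST CENTRAL','EC','PIT PA','20181115',60,'GET MORE - HYBRID',646.00,5863,'2019-04-05T16:18:51.490');

-- Step 1: write the group total into every row of the group.
;with totals as (
    select SPEND
          ,sum(SPEND) over (partition by [ESTIMATE NAME],[REGION],[MARKET NAME],[BCAST DATE],
                                         [LEN],[CREATIVE],file_ID,[create_date]) as total_spend
    from @sl)
update totals set SPEND = total_spend;

-- Step 2: keep the row with the lowest id in each group, delete the rest.
;with numbered as (
    select row_number() over (partition by [ESTIMATE NAME],[REGION],[MARKET NAME],[BCAST DATE],
                                           [LEN],[CREATIVE],file_ID,[create_date]
                              order by id) as rn
    from @sl)
delete from numbered where rn > 1;

select * from @sl order by id;
```

Under the case-insensitive assumption this leaves two rows, with SPEND 760.00 for file_ID 5862 and 646.00 for file_ID 5863. Running both steps inside one transaction would be advisable on a real table.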

SQL to split multiple rows by date

Let's say I have these values in a table
| Start Date | End date |Other Value
---------------------------------------------------------------------
| 2015-01-07 01:00:00.000 | 2015-01-08 04:00:00.000 | Yes
| 2015-01-08 10:00:00.000 | 2015-01-10 20:00:00.000 | No
I want to write a select statement that should give me results like:
|Date | Start Date | End date |Other Value
-----------------------------------------------------------
|2015-01-07 | 01:00:00.000 | | Yes
|2015-01-08 | | 04:00:00.000 | Yes
|2015-01-08 | 10:00:00.000 | | No
|2015-01-10 | | 20:00:00.000 | No
Is there a way to do it in T-SQL?
I am using SQL Server 2008 R2.
You can do something like this..
WITH cte AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn,startdate,enddate, othervalue FROM yourtable
)
,cte1 AS
(
SELECT rn,cast(startdate AS DATE) AS [date]
,CAST(startdate AS TIME) AS [Start Date]
,null AS [End date]
, othervalue
FROM cte
UNION all
SELECT rn,cast(Enddate AS DATE) AS [date]
,null AS [Start Date]
,CAST(Enddate AS TIME) AS [End date],
othervalue
FROM cte
)
SELECT * FROM cte1 ORDER BY rn, [Start Date] DESC
You can use UNION ALL and ROW_NUMBER() to order your result, like this:
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM Your_Table
)
-- one output row for the start of each interval
SELECT
rownum,
CAST([Start date] AS DATE) AS [Date],
CAST([Start date] AS TIME) AS [Start date],
NULL AS [End date],
[Other Value]
FROM CTE
UNION ALL
-- and one output row for its end
SELECT
rownum,
CAST([End date] AS DATE) AS [Date],
NULL AS [Start date],
CAST([End date] AS TIME) AS [End date],
[Other Value]
FROM CTE
ORDER BY rownum
Using CROSS APPLY and VALUES:
SELECT
x.*, t.OtherValue
FROM tbl t
CROSS APPLY(VALUES
(CAST(StartDate AS DATE), CAST(StartDate AS TIME), NULL),
(CAST(EndDate AS DATE), NULL, CAST(EndDate AS TIME))
)x(Date, StartDate, EndDate)
This method scans the table only once.
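For anyone who wants to try the CROSS APPLY and VALUES approach without a real table, here is a self-contained sketch using the two sample rows from the question. The table variable @tbl and the output names StartTime and EndTime are assumptions for illustration, and the nulls are cast explicitly so each column has an unambiguous time type.

```sql
-- Sample rows from the question in a hypothetical table variable.
declare @tbl table (StartDate datetime, EndDate datetime, OtherValue varchar(3));

insert into @tbl values
 ('2015-01-07T01:00:00','2015-01-08T04:00:00','Yes')
,('2015-01-08T10:00:00','2015-01-10T20:00:00','No');

-- Each source row is expanded into two rows: one for its start, one for its end.
select x.[Date], x.StartTime, x.EndTime, t.OtherValue
from @tbl as t
cross apply (values
    (cast(t.StartDate as date), cast(t.StartDate as time), cast(null as time)),
    (cast(t.EndDate   as date), cast(null as time),        cast(t.EndDate as time))
) as x([Date], StartTime, EndTime)
order by t.StartDate, x.[Date];
```

The ORDER BY keeps the start row and end row of each source interval together, in the same sequence as the desired output in the question.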

SQL multiple start dates to end date

I have a table with the following format (which I cannot change)
ClientID | RefAd1 | Cluster Start Date | Cluster End Date
100001 | R1234 | 2014-11-01 |
100001 | R1234 | 2014-11-10 |
100001 | R1234 | 2014-11-20 |
What I would like to come out with is:
ClientID | RefAd1 | Cluster Start Date | Cluster End Date
100001 | R1234 | 2014-11-01 | 2014-11-10
100001 | R1234 | 2014-11-10 | 2014-11-20
100001 | R1234 | 2014-11-20 | NULL
I've searched on here, and had many attempts myself, but just can't get it working.
I can't update the source table (or add another table into the database) so I'm going to do this in a view (which I can save)
Any help would be gratefully appreciated, been going round in circles with this for a day and a bit now!
Use a self join to get the next record:
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY [Cluster Start Date])RNO,*
FROM YOURTABLE
)
SELECT C1.ClientID,C1.RefAd1,C1.[Cluster Start Date],C2.[Cluster Start Date] [Cluster End Date]
FROM CTE C1
LEFT JOIN CTE C2 ON C1.RNO=C2.RNO-1
EDIT :
To update the table, you can use the below query
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY [Cluster Start Date])RNO,*
FROM #TEMP
)
UPDATE #TEMP SET [Cluster End Date] = TAB.[Cluster End Date]
FROM
(
SELECT C1.ClientID,C1.RefAd1,C1.[Cluster Start Date],C2.[Cluster Start Date] [Cluster End Date]
FROM CTE C1
LEFT JOIN CTE C2 ON C1.RNO=C2.RNO-1
)TAB
WHERE TAB.[Cluster Start Date]=#TEMP.[Cluster Start Date]
EDIT 2 :
If you want this to be done separately for each ClientId and RefAd1:
;WITH CTE AS
(
-- Get current date and next date for each type of ClientId and RefAd1
SELECT ROW_NUMBER() OVER(PARTITION BY ClientID,RefAd1 ORDER BY [Cluster Start Date])RNO,*
FROM #TEMP
)
UPDATE #TEMP SET [Cluster End Date] = TAB.[Cluster End Date]
FROM
(
SELECT C1.ClientID,C1.RefAd1,C1.[Cluster Start Date],C2.[Cluster Start Date] [Cluster End Date]
FROM CTE C1
LEFT JOIN CTE C2 ON C1.RNO=C2.RNO-1 AND C1.ClientID=C2.ClientID AND C1.RefAd1=C2.RefAd1
)TAB
WHERE TAB.[Cluster Start Date]=#TEMP.[Cluster Start Date] AND TAB.ClientID=#TEMP.ClientID AND TAB.RefAd1=#TEMP.RefAd1
If you want to do it only for ClientId, remove the conditions for RefAd1
Here is the script if you just want the view you described:
CREATE VIEW v_name as
SELECT
ClientId,
RefAd1,
[Cluster Start Date],
( SELECT
min([Cluster Start Date])
FROM yourTable
WHERE
t.[Cluster Start Date] < [Cluster Start Date]
) as [Cluster End Date]
FROM yourtable t
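On SQL Server 2012 or later (an assumption, since the question does not state a version), the next start date can also be read with LEAD(), which avoids both the self join and the correlated subquery. A minimal sketch on the sample rows, using a hypothetical table variable @c:

```sql
-- Sample rows from the question; @c and the column types are assumptions.
declare @c table (ClientID int, RefAd1 varchar(10), [Cluster Start Date] date);

insert into @c values
 (100001,'R1234','20141101')
,(100001,'R1234','20141110')
,(100001,'R1234','20141120');

select ClientID
      ,RefAd1
      ,[Cluster Start Date]
       -- next start date within the same ClientID/RefAd1; null for the last row
      ,lead([Cluster Start Date]) over (partition by ClientID, RefAd1
                                        order by [Cluster Start Date]) as [Cluster End Date]
from @c
order by ClientID, RefAd1, [Cluster Start Date];
```

The same SELECT, pointed at the real table instead of @c and without the ORDER BY, could serve as the body of the view.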

Get all funds which has at least minimum data points

I have two tables
1) Fund details
ID Symbol
-------------------
1 ABC
2 XYZ
2) Fund Price data
Fund_id date Price
-------------------------------------------
1 2014-07-01 00:00:00.000 25.25
1 2014-07-02 00:00:00.000 25.45
......
2 2014-07-01 00:00:00.000 75.25
2 2014-07-02 00:00:00.000 75.42
.......
Here I am fetching the monthly data of a particular fund:
SELECT YEAR(date) [Year], MONTH(date) [Month],
DATENAME(MONTH,date) [Month Name], COUNT(1) [Sales Count], F.Symbol
FROM FundData FD inner join FundDetails F on F.ID = FD.Fund_ID
where F.Symbol = 'ABC'
GROUP BY YEAR(date), MONTH(date), DATENAME(MONTH, date), F.Symbol
Output:
Year Month Month Name Sales Count Symbol
-------------------------------------------
2014 4 April 21 ABC
2014 5 May 21 ABC
2014 6 June 21 ABC
2014 7 July 3 ABC
.......
Total Rows: 301
So this is for only one particular fund, and it returned 301 rows.
Now I want to get all the funds from the Fund details table which have fewer rows than a given count (e.g. 216), which I will pass as a parameter.
Use the following query:
DECLARE @YourParameter int = 10
SELECT YEAR(date) [Year],
MONTH(date) [Month],
DATENAME(MONTH,date) [Month Name],
COUNT(1) [Sales Count],
F.Symbol
FROM FundData FD
INNER JOIN FundDetails F on F.ID = FD.Fund_ID
-- keep only funds that have at least one month with a row count within the limit
WHERE FD.Fund_ID IN (SELECT z.Fund_ID
FROM FundData z
WHERE z.Fund_ID = FD.Fund_ID
GROUP BY z.Fund_ID, YEAR(z.date), MONTH(z.date)
HAVING COUNT(*) <= @YourParameter
)
GROUP BY YEAR(date), MONTH(date), DATENAME(MONTH, date), F.Symbol
I have fixed it:
DECLARE @YourParameter int = 110;
WITH CTE AS
(
-- one row per fund per month
SELECT YEAR(date) [Year], MONTH(date) [Month],
DATENAME(MONTH,date) [Month Name], COUNT(1) [Sales Count], F.Symbol
FROM FundData FD inner join FundDetails F on F.ID = FD.Fund_ID
where F.ID
IN (SELECT z.ID FROM FundDetails z)
GROUP BY F.Symbol, YEAR(date), MONTH(date), DATENAME(MONTH, date)
)
-- funds with at least @YourParameter monthly rows
SELECT Symbol, COUNT(*) as cnt FROM CTE
GROUP BY Symbol
HAVING COUNT(*) >= @YourParameter
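The same count of monthly data points per fund can be sketched without a CTE by counting distinct year-month values. The example below is self-contained: the table variables mirror the two tables in the question, the 2014-06-30 row is made up so that one fund spans two months, and @MinMonths is a hypothetical parameter name.

```sql
declare @MinMonths int = 2;  -- hypothetical threshold parameter

-- Hypothetical copies of the two tables from the question.
declare @FundDetails table (ID int, Symbol varchar(10));
declare @FundData table (Fund_id int, [date] datetime, Price decimal(10,2));

insert into @FundDetails values (1,'ABC'),(2,'XYZ');
insert into @FundData values
 (1,'20140630',25.10)   -- made-up row so ABC has data in two months
,(1,'20140701',25.25)
,(1,'20140702',25.45)
,(2,'20140701',75.25)
,(2,'20140702',75.42);

-- year * 100 + month gives one integer per calendar month, e.g. 201407.
select F.Symbol
      ,count(distinct year(FD.[date]) * 100 + month(FD.[date])) as MonthCount
from @FundDetails as F
inner join @FundData as FD
        on FD.Fund_id = F.ID
group by F.Symbol
having count(distinct year(FD.[date]) * 100 + month(FD.[date])) >= @MinMonths;
```

With this sample data only ABC qualifies, with a MonthCount of 2. Changing the comparison in the HAVING clause to < would instead list the funds that fall short of the threshold.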
