I would appreciate if you could give me any hints regarding the fastest solution of the following SQL Server challenge:
Let's say I have a table with DATE, CLIENT and his several characteristics in other columns. I need to calculate COLUMN_1 and COLUMN_2 but:
COLUMN_1 uses the client's characteristics as of current DATE and as of previous DATE and COLUMN_1 value from the previous DATE (recursive referencing)
COLUMN_2 additionally uses COLUMN_1 value as of current date (therefore I would like to refer to its final value, not the particular 'case when' that implements the column logic)
How do I replicate this logic most efficiently in SQL Server?
I was thinking about a loop that goes over each DATE and, for each DATE, joins the previous DATE and calculates first COLUMN_1, then COLUMN_2 (but how do I make sure that the values in COLUMN_1 are accessible for COLUMN_2?)
Regards,
Bart
Without a specific example we will not be able to tell you which solution would be the most efficient, especially when you are looking for a solution you describe as recursive. You might not need a full recursive solution if you could use window functions instead.
In SQL Server 2012+ you have access to lead() and lag(), which you can use to get the next and previous values for a column based on a partition and order.
select
client
, date
, nextdate = lead(date) over (partition by client order by date)
, prevdate = lag(date) over (partition by client order by date)
, column1 = 'do stuff with lead/lag'
, column2 = 'do stuff with lead/lag'
from t
rextester example: http://rextester.com/FFHU71709
returns:
+--------+------------+------------+------------+------------------------+------------------------+
| client | date | nextdate | prevdate | column1 | column2 |
+--------+------------+------------+------------+------------------------+------------------------+
| 1 | 2017-01-01 | 2017-01-02 | NULL | do stuff with lead/lag | do stuff with lead/lag |
| 1 | 2017-01-02 | 2017-01-03 | 2017-01-01 | do stuff with lead/lag | do stuff with lead/lag |
| 1 | 2017-01-03 | NULL | 2017-01-02 | do stuff with lead/lag | do stuff with lead/lag |
| 2 | 2017-01-02 | 2017-01-04 | NULL | do stuff with lead/lag | do stuff with lead/lag |
| 2 | 2017-01-04 | 2017-01-06 | 2017-01-02 | do stuff with lead/lag | do stuff with lead/lag |
| 2 | 2017-01-06 | NULL | 2017-01-04 | do stuff with lead/lag | do stuff with lead/lag |
+--------+------------+------------+------------+------------------------+------------------------+
One way to simulate lead/lag prior to sql server 2012 is with outer apply()
select
client
, date
, nextdate
, prevdate
, column1 = 'do stuff with nextdate/prevdate'
, column2 = 'do stuff with nextdate/prevdate'
from t
outer apply (
select top 1 nextdate = i.date
from t i
where i.client = t.client
and i.date > t.date
order by i.date asc
) n
outer apply (
select top 1 prevdate = i.date
from t i
where i.client = t.client
and i.date < t.date
order by i.date desc
) p
rextester demo: http://rextester.com/GGS1299
returns:
+--------+------------+------------+------------+---------------------------------+---------------------------------+
| client | date | nextdate | prevdate | column1 | column2 |
+--------+------------+------------+------------+---------------------------------+---------------------------------+
| 1 | 2017-01-01 | 2017-01-02 | NULL | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 1 | 2017-01-02 | 2017-01-03 | 2017-01-01 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 1 | 2017-01-03 | NULL | 2017-01-02 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 2 | 2017-01-02 | 2017-01-04 | NULL | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 2 | 2017-01-04 | 2017-01-06 | 2017-01-02 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 2 | 2017-01-06 | NULL | 2017-01-04 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
+--------+------------+------------+------------+---------------------------------+---------------------------------+
For solutions that absolutely require recursion, you probably need to use a recursive cte.
;with cte as (
-- non recursive cte to add `nextdate` for recursive join
select
t.client
, t.date
, nextdate = x.date
from t
outer apply (
select top 1 i.date
from t i
where i.client = t.client
and i.date > t.date
order by i.date asc
) x
)
, r_cte as (
--anchor rows / starting rows
select
client
, date
, nextdate
, prevDate = convert(date, null)
, column1 = convert(varchar(64),null)
, column2 = convert(varchar(64),null)
from cte t
where not exists (
select 1
from cte as i
where i.client = t.client
and i.date < t.date
)
union all
--recursion starts here
select
c.client
, c.date
, c.nextdate
, prevDate = p.date
, column1 = convert(varchar(64),'do recursive stuff with p.column1')
, column2 = convert(varchar(64),'do recursive stuff with p.column2')
from cte c
inner join r_cte p
on c.client = p.client
and c.date = p.nextdate
)
select *
from r_cte
rextester demo: http://rextester.com/LKH38243
returns:
+--------+------------+------------+------------+-----------------------------------+-----------------------------------+
| client | date | nextdate | prevdate | column1 | column2 |
+--------+------------+------------+------------+-----------------------------------+-----------------------------------+
| 1 | 2017-01-01 | 2017-01-02 | NULL | NULL | NULL |
| 2 | 2017-01-02 | 2017-01-04 | NULL | NULL | NULL |
| 2 | 2017-01-04 | 2017-01-06 | 2017-01-02 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
| 2 | 2017-01-06 | NULL | 2017-01-04 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
| 1 | 2017-01-02 | 2017-01-03 | 2017-01-01 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
| 1 | 2017-01-03 | NULL | 2017-01-02 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
+--------+------------+------------+------------+-----------------------------------+-----------------------------------+
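To make the recursive pattern above concrete, here is a minimal sketch. The table variable @t, the amount column and both formulas (column1 = half of the previous column1 plus the current amount, column2 = twice column1) are made up for illustration; substitute your own logic. column1 is computed inside the recursive cte, and column2 is computed in the outer select so that it can refer to the finished column1 value by name.

```sql
-- Hypothetical table and formulas, for illustration only.
declare @t table (client int, [date] date, amount decimal(18,2));
insert into @t (client, [date], amount) values
  (1, '2017-01-01', 10), (1, '2017-01-02', 20), (1, '2017-01-03', 30)
, (2, '2017-01-02', 5),  (2, '2017-01-04', 7);

with numbered as (
    -- number each client's rows so that "previous date" is simply rn - 1
    select client, [date], amount
         , rn = row_number() over (partition by client order by [date])
    from @t
)
, r_cte as (
    -- anchor rows: first date per client, column1 starts from the row's own amount
    select client, [date], rn
         , column1 = convert(decimal(18,2), amount)
    from numbered
    where rn = 1
    union all
    -- recursive step: uses the previous row's column1 and the current row's amount
    select c.client, c.[date], c.rn
         , column1 = convert(decimal(18,2), p.column1 * 0.5 + c.amount)
    from numbered c
    inner join r_cte p
        on c.client = p.client
       and c.rn = p.rn + 1
)
select client, [date], column1
     , column2 = column1 * 2   -- refers to the final column1 value, not its formula
from r_cte
order by client, [date]
option (maxrecursion 0);       -- lift the default limit of 100 recursion levels
```

The explicit convert() calls keep the column types identical between the anchor and the recursive member, which a recursive cte requires. For large tables it may be faster to materialise the numbered rows into an indexed temp table first, but that is something to measure on your own data.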
Reference
Recursive Queries Using Common Table Expressions (cte)
If using SQL2012 or later, look at the features LAG & LEAD
For example, if you want to use the previous row's value in conjunction with this row's value - LAG like this:
DECLARE @T TABLE (DateCol DATETIME, StringCol VARCHAR(10))
INSERT INTO @T (DateCol, StringCol) VALUES ('2017-01-01','A'), ('2017-01-02','B'), ('2017-01-03','C'), ('2017-01-04','D'), ('2017-01-05','E')
SELECT DateCol, StringCol, PreviousRowStringcol = LAG(StringCol,1,NULL) OVER (ORDER BY DateCol) FROM @T
Related
I am reviewing reports that contain date ranges by a member ID and upload date. This looks like the following:
+------------+----------+------------+------------+
| UploadDate | MemberID | StartDate  | EndDate    |
+------------+----------+------------+------------+
| 08/01/2020 | 12345    | 04/01/2020 | 10/31/2020 |
| 08/01/2020 | 12345    | 01/01/2020 | 03/31/2020 |
| 06/01/2020 | 12345    | 01/01/2020 | 03/31/2020 |
| 06/01/2020 | 98765    | 02/01/2020 | 03/31/2020 |
| 06/01/2020 | 98765    | 05/01/2020 | 08/31/2020 |
| 07/01/2020 | 34568    | 01/01/2020 | 12/31/2020 |
| 07/01/2020 | 34568    | 03/31/2020 | 06/01/2020 |
+------------+----------+------------+------------+
I need to merge rows with the same UploadDate and the same MemberID where there are no gaps in the date range StartDate - EndDate. If there are gaps, the rows must not be merged.
The expected output would be:
+------------+----------+------------+------------+
| UploadDate | MemberID | StartDate  | EndDate    |
+------------+----------+------------+------------+
| 08/01/2020 | 12345    | 01/01/2020 | 10/31/2020 |
| 06/01/2020 | 12345    | 01/01/2020 | 03/31/2020 |
| 06/01/2020 | 98765    | 02/01/2020 | 03/31/2020 |
| 06/01/2020 | 98765    | 05/01/2020 | 08/31/2020 |
| 07/01/2020 | 34568    | 01/01/2020 | 12/31/2020 |
+------------+----------+------------+------------+
I had been trying the following without success:
SELECT
ROW_NUMBER() OVER(ORDER BY [MemberID],[StartDate],[EndDate]) AS RN,
[MemberID],
[StartDate],
[EndDate],
LAG([EndDate],1) OVER (ORDER BY [MemberID],[StartDate], [EndDate]) AS PreviousEndDate
FROM
[dbo].[RCNI]
SELECT
*,
CASE WHEN Groups.PreviousEndDate >= [StartDate] THEN 0 ELSE 1 END AS IslandStartInd,
SUM(CASE WHEN Groups.PreviousEndDate >= [StartDate] THEN 0 ELSE 1 END) OVER (ORDER BY Groups.RN) AS IslandId
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY [UploadDate], [MemberID],[StartDate], [EndDate]) AS RN,
[UploadDate],
[MemberID],
[StartDate],
[EndDate],
LAG([EndDate],1) OVER (ORDER BY [UploadDate],[MemberID],[StartDate], [EndDate]) AS PreviousEndDate
FROM
[dbo].[RCNI]
) Groups
The solution below appears to work for the given sample data. In words, it would be something like:
Filter out rows that have a period that falls completely within another row for the same UploadDate and MemberId (the not exists clause in the common table expression cte).
Look at the remaining rows for each UploadDate and MemberId combination (over(partition by r.UploadDate, r.MemberId ...) and sort them by StartDate (... order by r.StartDate)).
If the start date of a row comes before, is equal to or comes one day after the end date of the previous row for the combination (lag(r.EndDate) over(partition by r.UploadDate, r.MemberId order by r.StartDate)), then they must be merged.
If rows must be merged, then the start date becomes the smallest start date of the combination (min(r.StartDate) over(partition by r.UploadDate, r.MemberId)). All rows that must be merged now have the same start date (StartDateNew).
Determine the new end date by grouping on UploadDate, MemberId and StartDateNew and taking the maximum value for EndDate.
Sample data
create table rcni
(
UploadDate date,
MemberId int,
StartDate date,
EndDate date
);
insert into rcni (UploadDate, MemberId, StartDate, EndDate) values
('08/01/2020', 12345, '04/01/2020', '10/31/2020'),
('08/01/2020', 12345, '01/01/2020', '03/31/2020'),
('06/01/2020', 12345, '01/01/2020', '03/31/2020'),
('06/01/2020', 98765, '02/01/2020', '03/31/2020'),
('06/01/2020', 98765, '05/01/2020', '08/31/2020'),
('07/01/2020', 34568, '01/01/2020', '12/31/2020'),
('07/01/2020', 34568, '03/31/2020', '06/01/2020');
Solution
with cte as
(
select r.UploadDate,
r.MemberId,
case
when r.StartDate <= dateadd(dd, 1, lag(r.EndDate) over(partition by r.UploadDate, r.MemberId order by r.StartDate))
then min(r.StartDate) over(partition by r.UploadDate, r.MemberId)
else r.StartDate
end as StartDateNew,
r.EndDate
from rcni r
where not exists ( select 'x'
from rcni r2
where r2.UploadDate = r.UploadDate
and r2.MemberId = r.MemberId
and r2.StartDate < r.StartDate
and r2.EndDate > r.EndDate )
)
select c.UploadDate,
c.MemberId,
c.StartDateNew,
max(c.EndDate) as EndDateNew
from cte c
group by c.UploadDate,
c.MemberId,
c.StartDateNew;
Fiddle
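Because the solution above falls back to the smallest start date of the whole UploadDate/MemberId combination, it relies on each combination having at most one group of rows to merge. If a combination can contain several separate groups, a gaps-and-islands variant may be safer. The following is only a sketch against the rcni sample table above (the CTE and column names ordered, flagged, numbered, PrevMaxEnd, IslandStart and IslandId are my own): a row starts a new island when its StartDate is more than one day after the latest EndDate seen so far, and a running sum of those flags numbers the islands.

```sql
-- Sketch only: assumes the rcni table and sample data created above (SQL Server 2012+).
with ordered as
(
    select r.UploadDate, r.MemberId, r.StartDate, r.EndDate,
           -- latest EndDate among all earlier rows of the same UploadDate/MemberId
           max(r.EndDate) over(partition by r.UploadDate, r.MemberId
                               order by r.StartDate, r.EndDate
                               rows between unbounded preceding and 1 preceding) as PrevMaxEnd
    from rcni r
),
flagged as
(
    select o.UploadDate, o.MemberId, o.StartDate, o.EndDate,
           -- 1 = this row opens a new island (first row, or a gap of more than one day)
           case when o.StartDate <= dateadd(dd, 1, o.PrevMaxEnd) then 0 else 1 end as IslandStart
    from ordered o
),
numbered as
(
    select f.UploadDate, f.MemberId, f.StartDate, f.EndDate,
           -- running count of island starts = island number within the combination
           sum(f.IslandStart) over(partition by f.UploadDate, f.MemberId
                                   order by f.StartDate, f.EndDate
                                   rows unbounded preceding) as IslandId
    from flagged f
)
select n.UploadDate,
       n.MemberId,
       min(n.StartDate) as StartDateNew,
       max(n.EndDate)   as EndDateNew
from numbered n
group by n.UploadDate,
         n.MemberId,
         n.IslandId;
```

Tracking the running maximum of EndDate, rather than only the previous row's EndDate, also covers rows that lie completely inside an earlier row, so the separate not exists filter is not needed in this variant.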
I have a table dbo.X with a DateTime column lastUpdated and a product code column CodeProd. It may have hundreds of records, with CodeProd duplicated, because the table is used as "stock history".
My stored procedure has a parameter @Date. For each CodeProd I want to get the row nearest to that date, so for example if I have:
+----------+--------------+--------+
| CODEPROD | lastUpdated | STATUS |
+----------+--------------+--------+
| 10 | 2-1-2019 | C1 |
| 10 | 1-1-2019 | C2 |
| 10 | 31-12-2019 | C1 |
| 11 | 31-12-2018 | C1 |
| 11 | 30-12-2018 | C1 |
| 12 | 30-8-2018 | C3 |
+----------+--------------+--------+
and @Date = '1-1-2019'
I want to get:
+----+--------------+------+
| 10 | 1-1-2019 | C2 |
| 11 | 31-12-2018 | C1 |
| 12 | 30-8-2018 | C3 |
+----+--------------+------+
How to find it?
You can use TOP(1) WITH TIES to get, for each CODEPROD, the one row with the nearest date that is less than or equal to the provided date.
Try the following code.
SELECT TOP(1) WITH TIES *
FROM [YourTableName]
WHERE lastupdated <= @date
ORDER BY Row_number()
OVER (
partition BY [CODEPROD]
ORDER BY lastupdated DESC);
You can use apply :
select distinct t.CODEPROD, t1.lastUpdated, t1.STATUS
from dbo.X t cross apply
( select top (1) t1.*
from dbo.X t1
where t1.CODEPROD = t.CODEPROD and t1.lastUpdated <= @date
order by t1.lastUpdated desc
) t1;
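For comparison, the same "latest row per CODEPROD on or before @Date" idea can be written with ROW_NUMBER() in a derived table and a filter on rn = 1, which some people find easier to read than TOP(1) WITH TIES. This is only a sketch: the table variable @X stands in for dbo.X, the dates from the question are rewritten in ISO format, and the names ranked and rn are my own.

```sql
-- Sketch only: @X is a stand-in for dbo.X, loaded with the question's sample rows.
declare @Date date = '2019-01-01';
declare @X table (CodeProd int, lastUpdated date, [Status] varchar(2));
insert into @X (CodeProd, lastUpdated, [Status]) values
  (10, '2019-01-02', 'C1'), (10, '2019-01-01', 'C2'), (10, '2019-12-31', 'C1')
, (11, '2018-12-31', 'C1'), (11, '2018-12-30', 'C1')
, (12, '2018-08-30', 'C3');

select ranked.CodeProd, ranked.lastUpdated, ranked.[Status]
from (
    select x.CodeProd, x.lastUpdated, x.[Status]
         -- rn = 1 marks the most recent row per CodeProd among the rows kept by the where clause
         , rn = row_number() over (partition by x.CodeProd order by x.lastUpdated desc)
    from @X x
    where x.lastUpdated <= @Date
) ranked
where ranked.rn = 1
order by ranked.CodeProd;
```

If two rows of the same CodeProd share the same lastUpdated value, row_number() picks one of them arbitrarily; add a tie-breaking column to the order by if that matters.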
Sorry for the title if you find it incorrect, I really wasn't sure how to name this question. There is probably a term for this type of query/pattern.
I have a sequence of records that need to be ordered by date, the records have a condition I would like to "group" by (SomeCondition) to get the earliest start date and latest end date (taking NULL's into account) but I'm unsure how to accomplish the query (if it's even possible). The original records in the table look something like;
-----------------------------------------------------------
| AbcID | XyzID | StartDate | EndDate | SomeCondition |
-----------------------------------------------------------
| 1 | 1 | 2018-01-01 | 2018-03-05 | 1 |
| 2 | 1 | 2018-04-20 | 2018-05-01 | 1 |
| 3 | 1 | 2018-05-02 | 2018-05-15 | 0 |
| 4 | 1 | 2018-06-01 | 2018-07-01 | 1 |
| 5 | 1 | 2018-08-01 | NULL | 1 |
| 6 | 2 | 2018-01-01 | 2018-06-30 | 1 |
| 7 | 2 | 2018-07-01 | 2018-08-31 | 0 |
-----------------------------------------------------------
The result I'm going for would be;
-----------------------------------
| XyzID | StartDate | EndDate |
-----------------------------------
| 1 | 2018-01-01 | 2018-05-01 |
| 1 | 2018-06-01 | NULL |
| 2 | 2018-01-01 | 2018-06-30 |
-----------------------------------
Thanks for any help/insight, even if it's "not possible".
Solving this problem requires you to solve it piece by piece. Here are the steps that I used to do that:
Determine when the island begins (when SomeCondition is false)
Create an "ID" number for each island (within each XyzID) by summing the number of IslandBegins while considering the records in AbcID order
Determine the first and last AbcID within each XyzID/IslandNumber combination where SomeCondition is true
Use the previous step as a guide as to what StartDate / EndDate you should get for each record in the result set
Sample Data:
declare @sample_data table
(
AbcID int
, XyzID int
, StartDate date
, EndDate date
, SomeCondition bit
)
insert into @sample_data
values (1, 1, '2018-01-01', '2018-03-05', 1)
, (2, 1, '2018-04-20', '2018-05-01', 1)
, (3, 1, '2018-05-02', '2018-05-15', 0)
, (4, 1, '2018-06-01', '2018-07-01', 1)
, (5, 1, '2018-08-01', NULL, 1)
, (6, 2, '2018-01-01', '2018-06-30', 1)
, (7, 2, '2018-07-01', '2018-08-31', 0)
Answer:
The comments in the code show which step each part of the CTE is accomplishing.
;with island_bgn as
(
--Step 1
select d.AbcID
, d.XyzID
, d.StartDate
, d.EndDate
, d.SomeCondition
, case when d.SomeCondition = 0 then 1 else 0 end as IslandBegin
from @sample_data as d
)
, island_nbr as
(
--Step 2
select b.AbcID
, b.XyzID
, b.StartDate
, b.EndDate
, b.SomeCondition
, b.IslandBegin
, sum(b.IslandBegin) over (partition by b.XyzID order by b.AbcID asc) as IslandNumber
from island_bgn as b
)
, prelim as
(
--Step 3
select n.XyzID
, n.IslandNumber
, min(n.AbcID) as AbcIDMin
, max(n.AbcID) as AbcIDMax
from island_nbr as n
where 1=1
and n.SomeCondition = 1
group by n.XyzID
, n.IslandNumber
)
--Step 4
select p.XyzID
, a.StartDate
, b.EndDate
from prelim as p
inner join @sample_data as a on p.AbcIDMin = a.AbcID
inner join @sample_data as b on p.AbcIDMax = b.AbcID
order by p.XyzID
, a.StartDate
, b.EndDate
Results:
+-------+------------+------------+
| XyzID | StartDate | EndDate |
+-------+------------+------------+
| 1 | 2018-01-01 | 2018-05-01 |
+-------+------------+------------+
| 1 | 2018-06-01 | NULL |
+-------+------------+------------+
| 2 | 2018-01-01 | 2018-06-30 |
+-------+------------+------------+
Let's say we have the table below and want to see all tasks that haven't been done yet, plus an additional column showing how many open tasks are left for that customer.
I have a table like this in my database:
+------------+--------------------------+-------+
| CustomerID | Task | Done |
+------------+--------------------------+-------+
| 1 | CleanRoom | False |
| 1 | Cleandishes | True |
| 1 | WashClothes | False |
| 2 | TakeDogsOut | True |
| 2 | PlayWithKids | True |
| 3 | HaveFunWithMrSamplesWife | True |
| 3 | CleanMrSamplesCar | False |
+------------+--------------------------+-------+
I need this as returned table:
+------------+-------------------+-------------+
| CustomerID | Task | DoneOverAll |
+------------+-------------------+-------------+
| 1 | CleanRoom | 2 |
| 1 | WashClothes | 2 |
| 3 | CleanMrSamplesCar | 1 |
+------------+-------------------+-------------+
Perfect return table would be like this, but I can do that myself when I have the one above:
A question about this: producing it will probably be a string concatenation task. Should I do this in the SELECT statement, or would it be more advisable to do it in the final application on the client computer?
+------------+-------------------+-------------+
| CustomerID | Task | DoneOverAll |
+------------+-------------------+-------------+
| 1 | CleanRoom | 1/3 |
| 1 | WashClothes | 1/3 |
| 3 | CleanMrSamplesCar | 1/2 |
+------------+-------------------+-------------+
I know I could go like
SELECT
a.CustomerID,
a.Task,
(
Select count(*) from myTable where
customerID = a.CustomerID and
done = False
) as DoneOverAll
FROM myTable as a
WHERE Done = False
But I think that this is very inefficient, since it would execute a SELECT COUNT for each row in my table. Is there a way to achieve this with a JOIN using GROUP BY or something? I'm not familiar with GROUP BY yet.
Okay I should have tried first. Came up with the following;
Select count(*), CustomerID from myTable group by CustomerID
All I need to do now is to get this into a join.
Okay, got it. Sorry again for not trying first!
SELECT
a.CustomerID,
a.Task,
b.cnt
FROM myTable as a
LEFT JOIN (select count(*) AS cnt, CustomerID FROM myTable GROUP BY CustomerID) as b on a.CustomerID = B.CustomerID
WHERE Done = False
Question left;
Perfect return table would be like this, but I can do that myself when I have the one above:
A question about this: producing it will probably be a string concatenation task. Should I do this in the SELECT statement, or would it be more advisable to do it in the final application on the client computer?
+------------+-------------------+-------------+
| CustomerID | Task | DoneOverAll |
+------------+-------------------+-------------+
| 1 | CleanRoom | 1/3 |
| 1 | WashClothes | 1/3 |
| 3 | CleanMrSamplesCar | 1/2 |
+------------+-------------------+-------------+
I'm not sure why Done = False, but this is your logic. :-)
Here's what I would do, without the LEFT JOIN.
SELECT
a.CustomerID,
a.Task,
SUM(CASE WHEN a.Done = 'False' THEN 1 ELSE 0 END) DoneOverAll,
SUM(Case WHEN a.Done = 'True' THEN 1 ELSE 0 END) DoneCount
FROM myTable as a
Group By a.CustomerID, a.Task
Calculate them separately.
;with tempfalse as (
SELECT
a.CustomerID,
a.Task,
count(*) as DoneOverAll
FROM myTable as a
WHERE Done = False
group by a.CustomerID, a.Task
)
, temptrue as (
SELECT
a.CustomerID,
a.Task,
count(*) as total
FROM myTable as a
group by a.CustomerID, a.Task
)
SELECT
a.CustomerID,
a.Task,
cast(NULLIF(b.DoneOverAll,0) as varchar (10) ) + '/' + cast(NULLIF(a.total,0) as varchar (10) )
from temptrue as a left join tempfalse b
on a.CustomerID = b.CustomerID and
a.Task = b.Task
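Another option for this question is to use windowed aggregates, which give per-customer counts without a join or a correlated subquery. The sketch below assumes SQL Server with Done stored as a bit column (0 = False, 1 = True); the table variable @myTable and the column name OpenTasks are mine. It returns both the number of open tasks and the "done/total" string for every task that is not done yet.

```sql
-- Sketch only: @myTable stands in for myTable, with Done stored as bit.
declare @myTable table (CustomerID int, Task varchar(50), Done bit);
insert into @myTable (CustomerID, Task, Done) values
  (1, 'CleanRoom', 0), (1, 'Cleandishes', 1), (1, 'WashClothes', 0)
, (2, 'TakeDogsOut', 1), (2, 'PlayWithKids', 1)
, (3, 'HaveFunWithMrSamplesWife', 1), (3, 'CleanMrSamplesCar', 0);

select t.CustomerID, t.Task, t.OpenTasks, t.DoneOverAll
from (
    select CustomerID, Task, Done
         -- open (not done) tasks per customer
         , OpenTasks = sum(case when Done = 0 then 1 else 0 end) over (partition by CustomerID)
         -- 'done/total' per customer, built as a string
         , DoneOverAll = cast(sum(case when Done = 1 then 1 else 0 end) over (partition by CustomerID) as varchar(10))
                       + '/'
                       + cast(count(*) over (partition by CustomerID) as varchar(10))
    from @myTable
) t
where t.Done = 0          -- filter after the counts, so done tasks are still counted
order by t.CustomerID, t.Task;
```

The filter has to sit in the outer query: a where clause in the inner query would remove the done rows before the window functions run, and the totals would then be wrong. Whether to build the "1/3" string in SQL or in the client application is mostly a matter of taste; returning the two numbers separately keeps them usable for sorting and arithmetic.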
Hello I have a temp table (#tempResult) that contains results like the following...
-----------------------------------------
| DrugAliasID | Dosage1 | Unit1 | rowID |
-----------------------------------------
| 322 | 10 | MG | 1 |
| 322 | 50 | ML | 2 |
| 441 | 20 | ML | 3 |
| 443 | 15 | ML | 4 |
-----------------------------------------
I'm looking to get the results to be like the following, pivoting the rows that have the same DrugAliasID.
--------------------------------------------------
| DrugAliasID | Dosage1 | Unit1 | Dosage2 | Unit2 |
--------------------------------------------------
| 322 | 10 | MG | 50 | ML |
| 441 | 20 | ML | NULL | NULL |
| 443 | 15 | ML | NULL | NULL |
--------------------------------------------------
So far I have a solution that isn't using pivot. I'm not too good with pivot and was wondering if anyone knew how to use it in this scenario. Or solve it some other way. Thanks
SELECT
tr.drugAliasID,
MIN(trmin.dosage1) AS dosage1,
MIN(trmin.unit1) AS unit1,
MIN(trmax.dosage1) AS dosage2,
MIN(trmax.unit1) AS unit2
FROM
#tempResult tr
JOIN
#tempResult trmin ON trmin.RowID = tr.rowid AND trmin.drugAliasID = tr.drugAliasID
JOIN
#tempResult trmax ON trmax.RowID = tr.rowid AND trmax.drugAliasID = tr.drugAliasID
JOIN
(SELECT
MIN(RowID) AS rowid,
drugAliasID
FROM
#tempResult
GROUP BY
drugAliasID) tr1 ON tr1.rowid = trmin.RowID
JOIN
(SELECT
MAX(RowID) AS rowid,
drugAliasID
FROM
#tempResult
GROUP BY
drugAliasID) tr2 ON tr2.rowid = tr.RowID
GROUP BY
tr.drugAliasID
HAVING
count(tr.drugAliasID) > 1
Assuming your version of SQL Server supports the use of CTEs, you can simplify your query thus:
;with cte as
(select *, row_number() over (partition by drugaliasid order by rowid) rn
from #tempResult
)
select c.drugaliasid, c.dosage1, c.unit1, c2.dosage1 as dosage2, c2.unit1 as unit2
from cte c
left join cte c2 on c.drugaliasid = c2.drugaliasid and c.rn = 1 and c2.rn = 2
where c.rn = 1
Demo
This will give you the desired result, without having to use the pivot keyword.
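If more than two rows per DrugAliasID can occur, a conditional aggregation over the same row_number() may scale more easily than adding one self-join per extra row. This is only a sketch: the definition of #tempResult is guessed from the sample output in the question, and the final drop is there just to make the script rerunnable.

```sql
-- Sketch only: column types of #tempResult are assumed from the sample in the question.
create table #tempResult (DrugAliasID int, Dosage1 int, Unit1 varchar(10), rowID int);
insert into #tempResult (DrugAliasID, Dosage1, Unit1, rowID) values
  (322, 10, 'MG', 1), (322, 50, 'ML', 2), (441, 20, 'ML', 3), (443, 15, 'ML', 4);

with cte as
(
    -- rn = position of the row within its DrugAliasID
    select DrugAliasID, Dosage1, Unit1
         , rn = row_number() over (partition by DrugAliasID order by rowID)
    from #tempResult
)
select DrugAliasID
     , Dosage1 = max(case when rn = 1 then Dosage1 end)
     , Unit1   = max(case when rn = 1 then Unit1 end)
     , Dosage2 = max(case when rn = 2 then Dosage1 end)   -- NULL when there is no second row
     , Unit2   = max(case when rn = 2 then Unit1 end)
from cte
group by DrugAliasID
order by DrugAliasID;

drop table #tempResult;
```

A third pair of columns would only need two more max(case when rn = 3 ...) lines.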