Row Over Partition with Case SQL Server - sql-server

The new_commsstream column below checks whether more than 90 days separate the previous row's date from the current row's date (rows are partitioned by personid and ordered in a subquery by a few columns, including the date). It returns 1 if so and 0 otherwise:
create view Motability_Dataset_Staging_cmp as
select
mdsc.PersonID,
mdsc.AddressID,
mdsc.Email,
mdsc.Reportdate_month,
mdsc.Channel,
mdsc.CommsMedium,
mdsc.Campaign_Name,
mdsc.Category,
mdsc.MRM_Campaign_code,
mdsc.Action_id,
mdsc.NumSents,
mdsc.ReportDate,
isnull(cmp.ppersonid,mdsc.PersonID) as Prev_PersonID,
isnull(cmp.paddressid,mdsc.AddressID) as Prev_AddressID,
isnull(cmp.pmrmcampaigncode,mdsc.MRM_Campaign_code) as Prev_MRMCampaignCode,
isnull(cmp.pactionid,mdsc.Action_id) as Prev_ActionID,
isnull(cmp.preportdate,mdsc.ReportDate) as Prev_ReportDate,
isnull(cmp.commsdaysinterval,0) as Prev_CommsDays,
isnull(cmp.newcommsstream,0) as New_CommsStream
from Motability_Dataset_Staging as mdsc
left join
(select
cmp.row +1 as row,pcmp.row as prow,
cmp.personid as personid,pcmp.personid as ppersonid,
cmp.addressid as addressid,pcmp.addressid as paddressid,
cmp.MRM_Campaign_code as mrmcampaigncode,pcmp.MRM_Campaign_code as pmrmcampaigncode,
cmp.Action_id as actionid,pcmp.Action_id as pactionid,
cmp.reportdate as reportdate,pcmp.reportdate as preportdate,
datediff(day,cmp.ReportDate,pcmp.ReportDate) as commsdaysinterval,
case when datediff(day,cmp.ReportDate,pcmp.ReportDate) <-90 then 1 else 0 end as newcommsstream
from
(select row_number() over(partition by personid order by personid,addressid,reportdate,mrm_campaign_code,action_id)-1 as row,personid,addressid,MRM_Campaign_code,action_id,reportdate from Motability_Dataset_Staging) cmp
inner join (select row_number() over(partition by personid order by personid,addressid,reportdate,mrm_campaign_code,action_id) as row,personid,addressid,MRM_Campaign_code,action_id,reportdate from Motability_Dataset_Staging) pcmp on cmp.row = pcmp.row and cmp.personid=pcmp.personid
) cmp
on mdsc.PersonID = cmp.personid and mdsc.AddressID = cmp.addressid and mdsc.MRM_Campaign_code=cmp.mrmcampaigncode
I'm struggling with the next step. Within each personid, I want a row number that starts at 1 and goes up by one every time new_commsstream is 1. Otherwise it should keep the current value:
personid new_commsstream row
1 0 1
1 0 1
1 0 1
1 1 2
1 0 2
2 0 1
3 0 1
4 0 1
5 0 1
5 1 2
5 1 3
Any ideas how to achieve this?
Thanks.

I'm not sure if it helps, but you do not need to SELECT the data for ROW_NUMBER() twice.
You can put it into a SQL Server CTE (common table expression) once,
then refer to it twice, joining each row to the previous row of the same person:
;with cmp as (
select
row_number() over(partition by personid order by addressid,reportdate,mrm_campaign_code,action_id) as row,
personid,
addressid,
MRM_Campaign_code,
action_id,
reportdate
from Motability_Dataset_Staging
), cmp2 as (
select
cmp.personid,
cmp.addressid,
cmp.MRM_Campaign_code as mrmcampaigncode,
prev.personid as ppersonid, -- previous values
prev.addressid as paddressid,
prev.MRM_Campaign_code as pmrmcampaigncode,
prev.action_id as pactionid,
prev.reportdate as preportdate,
datediff(day,cmp.reportdate,prev.reportdate) as commsdaysinterval,
case when datediff(day,cmp.reportdate,prev.reportdate) < -90 then 1 else 0 end as newcommsstream
from cmp
left join cmp as prev -- previous row of the same person
on prev.personid = cmp.personid
and cmp.row = prev.row + 1
)
select
mdsc.PersonID,
mdsc.AddressID,
mdsc.Email,
mdsc.Reportdate_month,
mdsc.Channel,
mdsc.CommsMedium,
mdsc.Campaign_Name,
mdsc.Category,
mdsc.MRM_Campaign_code,
mdsc.Action_id,
mdsc.NumSents,
mdsc.ReportDate,
isnull(cmp.ppersonid,mdsc.PersonID) as Prev_PersonID,
isnull(cmp.paddressid,mdsc.AddressID) as Prev_AddressID,
isnull(cmp.pmrmcampaigncode,mdsc.MRM_Campaign_code) as Prev_MRMCampaignCode,
isnull(cmp.pactionid,mdsc.Action_id) as Prev_ActionID,
isnull(cmp.preportdate,mdsc.ReportDate) as Prev_ReportDate,
isnull(cmp.commsdaysinterval,0) as Prev_CommsDays,
isnull(cmp.newcommsstream,0) as New_CommsStream
from Motability_Dataset_Staging as mdsc
inner join cmp2 as cmp on ......
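For the numbering the question actually asks about, one option is a "gaps and islands" running total. This is a sketch that assumes SQL Server 2012 or later for LAG and windowed SUM with ORDER BY. Every 1 in the flag starts a new stream, so 1 plus the running count of 1s within a person gives the required number. The #Comms and #CommsResult tables, the StreamNo column, and the sample dates are hypothetical stand-ins for Motability_Dataset_Staging. Ordering only by ReportDate is an assumption: in the real query the ORDER BY would list the same columns as the original ROW_NUMBER.

```sql
-- Hypothetical sample data standing in for Motability_Dataset_Staging
CREATE TABLE #Comms (PersonID INT NOT NULL, ReportDate DATE NOT NULL);
INSERT INTO #Comms (PersonID, ReportDate) VALUES
(1, '2015-01-01'), (1, '2015-01-15'), (1, '2015-02-01'),
(1, '2015-06-01'), (1, '2015-06-20'),
(2, '2015-03-01'),
(5, '2015-01-01'), (5, '2015-05-01'), (5, '2015-09-01');

WITH Flagged AS (
    SELECT PersonID, ReportDate,
           -- 1 when more than 90 days passed since this person's previous row;
           -- the first row has no previous date, so DATEDIFF is NULL and the flag is 0
           CASE WHEN DATEDIFF(day,
                              LAG(ReportDate) OVER (PARTITION BY PersonID ORDER BY ReportDate),
                              ReportDate) > 90
                THEN 1 ELSE 0 END AS New_CommsStream
    FROM #Comms
)
SELECT PersonID, ReportDate, New_CommsStream,
       -- each 1 opens a new stream: stream number = 1 + running count of 1s so far
       1 + SUM(New_CommsStream) OVER (PARTITION BY PersonID ORDER BY ReportDate
                                      ROWS UNBOUNDED PRECEDING) AS StreamNo
INTO #CommsResult
FROM Flagged;

SELECT * FROM #CommsResult ORDER BY PersonID, ReportDate;
```

With this data, person 1 gets 1, 1, 1, 2, 2 and person 5 gets 1, 2, 3, which matches the expected output in the question.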

Related

How can I use Row Num partition by different default value

I have something like
ID Mobile isOptOut
1 123 1
2 123 0
3 123 0
4 123 1
5 234 1
6 234 0
I want to number the rows partitioned by Mobile and isOptOut:
if isOptOut is 1, the numbering starts from 0;
otherwise it starts from 6.
ID Mobile isOptOut RowNum
1 123 1 0
4 123 1 1
2 123 0 6
3 123 0 7
5 234 1 0
6 234 0 6
select *,
case when isOptOut = 0 then ROW_Number() OVER(
PARTITION BY Mobile ,isOptOut
order by Mobile ,isOptOut
) as [Row Number]
from r
where isOptOut = 1
Thanks so much
You're on the right path with the windowed ROW_NUMBER function. However:
You want to number all the rows, so you need it for every row (not just isOptOut = 0).
In the window function, I've ordered by ID so it always returns the same values. If you order by the same fields as the partition, rows within a partition can come out in any order.
Once you have the row numbers, add the modifier: if isOptOut = 0, add 5 to the row number; if it's 1, subtract 1.
; WITH src AS
(select *,
ROW_Number() OVER(
PARTITION BY Mobile, isOptOut
ORDER BY ID -- Note I changed this to 'ID' for ordering
) as [rn]
FROM r
)
SELECT ID, Mobile, isOptOut,
CASE isOptOut
WHEN 0 THEN rn + 5
WHEN 1 THEN rn - 1
ELSE NULL
END AS RowNum
FROM src
SELECT ID,Mobile,isOptOut, IIF(isOptOut=1,ROWNUMBER-1,ROWNUMBER+5),ROWNUMBER FROM (
SELECT *, (ROW_Number() OVER(
PARTITION BY Mobile,isOptOut ORDER BY ID, Mobile,isOptOut )) ROWNUMBER
FROM #table) as T ORDER BY ID
I think a subquery plus IIF gets you there: the window function goes in the subquery and the condition goes in the main query.
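The ROW_NUMBER-plus-offset approach can be checked against the sample data in the question. This is a sketch: the temporary table names #r and #rResult are hypothetical (the question's table is r). The offsets are written here as "row number minus 1, plus a start value of 0 or 6", which is the same arithmetic as the answers above.

```sql
-- Sample data from the question, in a hypothetical temp table
CREATE TABLE #r (ID INT PRIMARY KEY, Mobile INT NOT NULL, isOptOut BIT NOT NULL);
INSERT INTO #r (ID, Mobile, isOptOut) VALUES
(1, 123, 1), (2, 123, 0), (3, 123, 0), (4, 123, 1), (5, 234, 1), (6, 234, 0);

WITH src AS (
    SELECT ID, Mobile, isOptOut,
           -- ORDER BY ID makes the numbering deterministic within each partition
           ROW_NUMBER() OVER (PARTITION BY Mobile, isOptOut ORDER BY ID) AS rn
    FROM #r
)
SELECT ID, Mobile, isOptOut,
       -- start value: 0 for opted-out rows, 6 for the rest
       rn - 1 + CASE WHEN isOptOut = 1 THEN 0 ELSE 6 END AS RowNum
INTO #rResult
FROM src;

SELECT * FROM #rResult ORDER BY Mobile, isOptOut DESC, ID;
```

Keeping the start values in a single CASE makes it easy to change them later without touching the window function.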

Performance issue with CTE SQL Server query

We have a table with a parent child relationship, that represents a deep tree structure.
We are using a view with a CTE to query the data but the performance is poor (see code and execution plan below).
Is there any way we can improve the performance?
WITH cte (ParentJobTypeId, Id) AS
(
SELECT
Id, Id
FROM
dbo.JobTypes
UNION ALL
SELECT
e.Id, cte.Id
FROM
cte
INNER JOIN
dbo.JobTypes AS e ON e.ParentJobTypeId = cte.ParentJobTypeId
)
SELECT
ISNULL(Id, 0) AS ParentJobTypeId,
ISNULL(ParentJobTypeId, 0) AS Id
FROM
cte
A quick example of using range keys. As I mentioned before, my hierarchies were 127K points and some sections were 15 levels deep.
The CTE builds the hierarchy; let's assume the hierarchy results will be stored in a table (indexed as well).
Create Table #Table (ID int,ParentID int,[Status] varchar(50))
Insert #Table values
(1,101,'Pending'),
(2,101,'Complete'),
(3,101,'Complete'),
(4,102,'Complete'),
(101,null,null),
(102,null,null)
;With cteOH (ID,ParentID,Lvl,Seq)
as (
Select ID,ParentID,Lvl=1,cast(Format(ID,'000000') + '/' as varchar(500)) from #Table where ParentID is null
Union All
Select h.ID,h.ParentID,cteOH.Lvl+1,Seq=cast(cteOH.Seq + Format(h.ID,'000000') + '/' as varchar(500)) From #Table h INNER JOIN cteOH ON h.ParentID = cteOH.ID
),
cteR1 as (Select ID,Seq,R1=Row_Number() over (Order by Seq) From cteOH),
cteR2 as (Select A.ID,R2 = max(B.R1) From cteOH A Join cteR1 B on (B.Seq Like A.Seq+'%') Group By A.ID)
Select B.R1
,C.R2
,A.Lvl
,A.ID
,A.ParentID
Into #TempHier
From cteOH A
Join cteR1 B on (A.ID=B.ID)
Join cteR2 C on (A.ID=C.ID)
Select * from #TempHier
Select H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
,Total = count(*)
,Complete = sum(case when D.Status = 'Complete' then 1 else 0 end)
,Pending = sum(case when D.Status = 'Pending' then 1 else 0 end)
,PctCmpl = format(sum(case when D.Status = 'Complete' then 1.0 else 0.0 end)/count(*),'##0.00%')
From #TempHier H
Join (Select _R1=B.R1,A.* From #Table A Join #TempHier B on A.ID=B.ID) D on D._R1 between H.R1 and H.R2
Group By H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
Order By 1
This returns the hierarchy in a #Temp table for now. Notice R1 and R2: I call these the range keys. Data can be selected and aggregated via these keys without recursion.
R1 R2 Lvl ID ParentID
1 4 1 101 NULL
2 2 2 1 101
3 3 2 2 101
4 4 2 3 101
5 6 1 102 NULL
6 6 2 4 102
VERY SIMPLE EXAMPLE: illustrates rolling the data up the hierarchy.
R1 R2 Lvl ID ParentID Total Complete Pending PctCmpl
1 4 1 101 NULL 4 2 1 50.00%
2 2 2 1 101 1 0 1 0.00%
3 3 2 2 101 1 1 0 100.00%
4 4 2 3 101 1 1 0 100.00%
5 6 1 102 NULL 2 1 0 50.00%
6 6 2 4 102 1 1 0 100.00%
The real beauty of the range keys is that if you know an ID, you know where it sits in the hierarchy (all of its descendants and ancestors).
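Here is a sketch of those two lookups, using the R1/R2 values from the #TempHier output above. The #Hier, #Desc and #Anc table names and the @Node/@Leaf values are hypothetical. R1 is assigned in materialized-path (Seq) order, so a node's whole subtree occupies the contiguous R1 range from the node's R1 to its R2. That gives two rules:

- Descendants of a node are the rows whose R1 falls inside that node's range.
- Ancestors of a node are the rows whose range contains that node's R1.

```sql
-- Range keys copied from the #TempHier output above (hypothetical table name)
CREATE TABLE #Hier (R1 INT PRIMARY KEY, R2 INT NOT NULL, Lvl INT NOT NULL, ID INT NOT NULL, ParentID INT NULL);
INSERT INTO #Hier (R1, R2, Lvl, ID, ParentID) VALUES
(1, 4, 1, 101, NULL), (2, 2, 2, 1, 101), (3, 3, 2, 2, 101),
(4, 4, 2, 3, 101), (5, 6, 1, 102, NULL), (6, 6, 2, 4, 102);

DECLARE @Node INT = 101, @Leaf INT = 2;

-- Descendants of @Node (node included): R1 lies inside the node's [R1, R2] range
SELECT D.ID, D.Lvl
INTO #Desc
FROM #Hier AS N
JOIN #Hier AS D ON D.R1 BETWEEN N.R1 AND N.R2
WHERE N.ID = @Node;

-- Ancestors of @Leaf (leaf included): the ancestor's range contains the leaf's R1
SELECT A.ID, A.Lvl
INTO #Anc
FROM #Hier AS N
JOIN #Hier AS A ON N.R1 BETWEEN A.R1 AND A.R2
WHERE N.ID = @Leaf;

SELECT * FROM #Desc ORDER BY Lvl, ID;
SELECT * FROM #Anc ORDER BY Lvl, ID;
```

Both lookups are plain range joins, so an index on R1 (and R2) can serve them without any recursion.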

Moving Median, Mode in T-SQL

I am using SQL Server 2012 and I know it is quite simple to calculate moving averages.
But what I need is to get the mode and the median for a defined window frame like so (with a window of 2 preceding to current row; month unique):
MONTH CODE MEDIAN MODE
1 0 0 0
2 3 1.5 0
3 2 2 0
4 2 2 2
5 2 2 2
6 5 2 2
7 3 3 2
If several values qualify as the mode, then pick the first.
I commented my code thoroughly. Read my comments on the Mode calculation and let me know if it needs tweaking. Overall, it's a relatively simple query; it just has a lot of ugly subqueries and a lot of comments. Check it out:
CREATE TABLE #Table ([Month] INT,[Code] INT);
INSERT INTO #Table
VALUES (1,0),
(2,3),
(3,2),
(4,2), --Try commenting this out to test my special mode thingymajig
(5,2),
(6,5),
(7,3);
WITH CTE
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [Month]) row_num,
[Month],
CAST(Code AS FLOAT) Code
FROM #Table
)
SELECT [Month],
Code,
ISNULL((
SELECT CASE
--When there is only one previous value at row_num = 2, find Mean of first two codes
WHEN A.row_num = 2 THEN (LAG(B.code,1) OVER (ORDER BY [Code]) + B.Code)/2.0
--Else find middle code value of current and previous two rows
ELSE B.Code
END
FROM CTE B
--How subquery relates to outer query
WHERE B.row_num BETWEEN A.row_num - 2 AND A.row_num
ORDER BY B.[Code]
--Order by code and offset by 1 so don't select the lowest value, but fetch the one above the lowest value
OFFSET 1 ROW FETCH NEXT 1 ROW ONLY),
0) AS Median,
--I did mode a little different
--Instead of Avg(D.Code) you could list the values because with mode,
--If there's a tie with more than one of each number, you have multiple modes
--Instead of doing that, I simply return the mean of the tied modes
--When there's one, it doesn't change anything.
--If you were to delete the month 4, then your number of Codes 2 and number of Codes 3 would be the same in the last row.
--Proper mode would be 2,3. I instead average them out to be 2.5.
ISNULL((
SELECT AVG(D.Code)
FROM (
SELECT C.Code,
COUNT(*) cnt,
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) dnse_rank
FROM CTE C
WHERE C.row_num <= A.row_num
GROUP BY C.Code
HAVING COUNT(*) > 1) D
WHERE D.dnse_rank = 1),
0) AS Mode
FROM CTE A
Results:
Month Code Median Mode
----------- ---------------------- ---------------------- ----------------------
1 0 0 0
2 3 1.5 0
3 2 2 0
4 2 2 2
5 2 2 2
6 5 2 2
7 3 3 2
If I understood your requirements correctly, your source table contains MONTH and CODE columns, and you want to calculate MEDIAN and MODE.
The query below calculates MEDIAN and MODE over a moving window of at most 3 months ("2 preceding to current row") and returns results matching your example.
-----------------------------------------------------
--Demo data
-----------------------------------------------------
CREATE TABLE #Data(
[Month] INT NOT NULL,
[Code] INT NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED
(
[Month] ASC
));
INSERT #Data
([Month],[Code])
VALUES
(1,0),
(2,3),
(3,2),
(4,2),
(5,2),
(6,5),
(7,3);
-----------------------------------------------------
--Query
-----------------------------------------------------
DECLARE @PrecedingRowsLimit INT = 2;
WITH [MPos] AS
(
SELECT [R].[Month]
, [RB].[Month] AS [SubId]
, [RB].[Code]
, ROW_NUMBER() OVER(PARTITION BY [R].[Month] ORDER BY [RB].[Code]) AS [RowNumberInPartition]
, CASE
WHEN [R].[Count] % 2 = 1 THEN ([R].[Count] + 1) / 2
ELSE NULL
END AS [MedianPosition]
, CASE
WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2
ELSE NULL
END AS [MedianPosition1]
, CASE
WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2 + 1
ELSE NULL
END AS [MedianPosition2]
FROM
(
SELECT [RC].[Month]
, [RC].[RowNumber]
, CASE WHEN [RC].[Count] > @PrecedingRowsLimit + 1 THEN @PrecedingRowsLimit + 1 ELSE [RC].[Count] END AS [Count]
FROM
(
SELECT [Month]
, ROW_NUMBER() OVER(ORDER BY [Month]) AS [RowNumber]
, ROW_NUMBER() OVER(ORDER BY [Month]) AS [Count]
FROM #Data
) [RC]
) [R]
INNER JOIN #Data [RB]
ON [R].[Month] >= [RB].[Month]
AND [RB].[Month] >= [R].[RowNumber] - @PrecedingRowsLimit
)
SELECT DISTINCT [M].[Month]
, [ORIG].[Code]
, COALESCE([ME].[Code],([M1].[Code] + [M2].[Code]) / 2.0) AS [Median]
, [MOD].[Mode]
FROM [MPos] [M]
LEFT JOIN [MPOS] [ME]
ON [M].[Month] = [ME].[Month]
AND [M].[MedianPosition] = [ME].[RowNumberInPartition]
LEFT JOIN [MPOS] [M1]
ON [M].[Month] = [M1].[Month]
AND [M].[MedianPosition1] = [M1].[RowNumberInPartition]
LEFT JOIN [MPOS] [M2]
ON [M].[Month] = [M2].[Month]
AND [M].[MedianPosition2] = [M2].[RowNumberInPartition]
INNER JOIN
(
SELECT [MG].[Month]
, FIRST_VALUE([MG].[Code]) OVER (PARTITION BY [MG].[Month] ORDER BY [MG].[Count] DESC , [MG].[SubId] ASC) AS [Mode]
FROM
(
SELECT [Month] , MIN([SubId]) AS [SubId], [Code] , COUNT(1) AS [Count]
FROM [MPOS]
GROUP BY [Month] , [Code]
) [MG]
) [MOD]
ON [M].[Month] = [MOD].[Month]
INNER JOIN #Data [ORIG]
ON [ORIG].[Month] = [M].[Month]
ORDER BY [M].[Month];
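A more compact variant is possible with CROSS APPLY. This is a sketch assuming SQL Server 2012 or later. Rows are numbered with ROW_NUMBER so that "2 preceding" means two rows back even if [Month] has gaps, and "pick the first" is interpreted as the code seen earliest in the window. The #MovingData and #MovingResult names are hypothetical.

The two calculations work as follows:

- The median averages the middle value (odd count) or the two middle values (even count) of the window.
- The mode sorts the window's codes by frequency and then by first appearance.

```sql
-- Sample data from the question (hypothetical table name)
CREATE TABLE #MovingData ([Month] INT PRIMARY KEY, Code INT NOT NULL);
INSERT INTO #MovingData ([Month], Code) VALUES
(1, 0), (2, 3), (3, 2), (4, 2), (5, 2), (6, 5), (7, 3);

WITH Numbered AS (
    -- number the rows so the window is "current row and the 2 rows before it"
    SELECT [Month], Code, ROW_NUMBER() OVER (ORDER BY [Month]) AS rn
    FROM #MovingData
)
SELECT cur.[Month], cur.Code, med.Median, md.Mode
INTO #MovingResult
FROM Numbered AS cur
CROSS APPLY (
    -- positions (cnt+1)/2 and (cnt+2)/2 coincide for an odd count
    -- and are the two middle positions for an even count
    SELECT AVG(CAST(w.Code AS FLOAT)) AS Median
    FROM (
        SELECT Code,
               ROW_NUMBER() OVER (ORDER BY Code) AS pos,
               COUNT(*) OVER () AS cnt
        FROM Numbered
        WHERE rn BETWEEN cur.rn - 2 AND cur.rn
    ) AS w
    WHERE w.pos IN ((w.cnt + 1) / 2, (w.cnt + 2) / 2)
) AS med
CROSS APPLY (
    -- most frequent code in the window; ties go to the code that appears first
    SELECT TOP (1) Code AS Mode
    FROM Numbered
    WHERE rn BETWEEN cur.rn - 2 AND cur.rn
    GROUP BY Code
    ORDER BY COUNT(*) DESC, MIN(rn)
) AS md;

SELECT * FROM #MovingResult ORDER BY [Month];
```

Each output row costs two small lookups over at most three rows, so an index on [Month] (here the primary key) keeps this cheap.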

Trying to avoid using a cursor

I have been given a query and am trying to figure out a way to remove the cursor while keeping the functionality, because the starting table can grow into the millions of rows.
Example of data in table:
ID DollarValue Month RowNumber
1 $10 1/1/2014 1
1 $15 2/1/2014 2
1 -$40 3/1/2014 3
1 $50 4/1/2014 4
2 -$11 1/1/2014 1
2 $11 2/1/2014 2
2 $5 3/1/2014 3
Expected results:
ID DollarValue Month RowNumber TestVal
1 $10 1/1/2014 1 1
1 $15 2/1/2014 2 0
1 -$40 3/1/2014 3 -1
1 $50 4/1/2014 4 1
2 -$11 1/1/2014 1 -1
2 $11 2/1/2014 2 0
2 $5 3/1/2014 3 1
Here is the logic (pseudocode) that runs inside the cursor:
If @ID <> @LastID AND @Month <> @LastMonth
    Set @RunningTotal = @DollarValue
    Set @LastMonth = '12/31/2099'
    Set @LastID = @ID
    Set @TestVal = Sign(@DollarValue)
Else
    If Sign(@RunningTotal) = Sign(@RunningTotal + @DollarValue)
        Set @TestVal = 0
    Else
        Set @TestVal = Sign(@DollarValue)
    Set @RunningTotal = @RunningTotal + @DollarValue
Any idea how I can change this to set based?
You can use the windowed version of SUM to calculate running totals:
;WITH CTE AS (
SELECT ID, DollarValue, Month, RowNumber,
SUM ( DollarValue ) OVER (PARTITION BY ID ORDER BY RowNumber) as RunningTotal
FROM #mytable
)
SELECT C1.ID, C1.DollarValue, C1.Month, C1.RowNumber,
CASE WHEN C1.RowNumber = 1 THEN SIGN(C1.DollarValue)
WHEN SIGN(C1.RunningTotal) = SIGN(C2.RunningTotal) THEN 0
ELSE SIGN(C1.RunningTotal)
END AS TestVal
FROM CTE AS C1
LEFT JOIN CTE AS C2 ON C1.ID = C2.ID AND C1.RowNumber = C2.RowNumber + 1
Using LEFT JOIN on RowNumber you can get the previous record and compare the current running total with the previous one. Then use a simple CASE to apply rules pertinent to changes in SIGN of running total.
P.S. The above solution won't work in versions prior to SQL Server 2012, because SUM() OVER with ORDER BY (a windowed running total) was introduced in 2012. In that case, the running total calculation inside the CTE has to be replaced by the "conventional" version.
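On SQL Server 2012 and later, the self-join on RowNumber can also be replaced by LAG. This is a sketch using the question's sample data. The #mytable name follows the answer above, and #TestValResult is hypothetical. Window functions cannot be nested, so the previous running total is taken in a second CTE. The explicit ROWS UNBOUNDED PRECEDING frame gives the same result as the default RANGE frame here (RowNumber is unique per ID), but SQL Server typically evaluates it with a cheaper in-memory spool.

```sql
-- Sample data from the question
CREATE TABLE #mytable (ID INT NOT NULL, DollarValue MONEY NOT NULL, [Month] DATE NOT NULL, RowNumber INT NOT NULL);
INSERT INTO #mytable (ID, DollarValue, [Month], RowNumber) VALUES
(1, 10, '2014-01-01', 1), (1, 15, '2014-02-01', 2), (1, -40, '2014-03-01', 3), (1, 50, '2014-04-01', 4),
(2, -11, '2014-01-01', 1), (2, 11, '2014-02-01', 2), (2, 5, '2014-03-01', 3);

WITH Totals AS (
    SELECT ID, DollarValue, [Month], RowNumber,
           SUM(DollarValue) OVER (PARTITION BY ID ORDER BY RowNumber
                                  ROWS UNBOUNDED PRECEDING) AS RunningTotal
    FROM #mytable
), WithPrev AS (
    -- previous row's running total for the same ID (NULL on the first row)
    SELECT ID, DollarValue, [Month], RowNumber, RunningTotal,
           LAG(RunningTotal) OVER (PARTITION BY ID ORDER BY RowNumber) AS PrevRunningTotal
    FROM Totals
)
SELECT ID, DollarValue, [Month], RowNumber,
       CAST(CASE WHEN PrevRunningTotal IS NULL THEN SIGN(DollarValue)     -- first row of the ID
                 WHEN SIGN(RunningTotal) = SIGN(PrevRunningTotal) THEN 0  -- sign unchanged
                 ELSE SIGN(RunningTotal)                                  -- sign changed
            END AS INT) AS TestVal
INTO #TestValResult
FROM WithPrev;

SELECT * FROM #TestValResult ORDER BY ID, RowNumber;
```

The whole calculation is a single pass per ID ordered by RowNumber, so an index on (ID, RowNumber) can feed it without a separate sort.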
This is a SQL Server 2008 solution:
WITH CTE AS (
SELECT
AA.[ID]
,AA.[Month]
,AA.[RowNumber]
,AA.[DollarValue]
,SIGN(SUM(BB.[DollarValue])) AS RunTotalSign
FROM YourTable AS AA
LEFT JOIN YourTable AS BB
ON (AA.[ID] = BB.[ID] AND BB.[RowNumber] <= AA.[RowNumber])
GROUP BY AA.[ID],AA.[Month],AA.[DollarValue],AA.[RowNumber]
)
SELECT
AA.[ID]
,AA.[Month]
,AA.[RowNumber]
,AA.[DollarValue]
,CASE WHEN AA.RunTotalSign = CC.RunTotalSign Then 0
ELSE AA.RunTotalSign
END
AS TestVal
FROM CTE AS AA
LEFT JOIN CTE AS CC
ON (AA.[ID] = CC.[ID] AND AA.[RowNumber] = CC.[RowNumber]+1)

Filter Duplicate Rows on Conditions

I would like to filter duplicate rows so that, for each unique combination of rid and did, the row with the minimum modified and maximum active is picked. Should I use a self join, or is there an approach with better performance?
Example:
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:40:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Output expected is
1 1 2010-09-07 11:37:44.850 1 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Comment on the first answer: the suggestion does not work for the dataset below (when active = 0 and modified is the minimum for that rid and did):
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:36:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.
;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id
select id, rid, min(modified), max(active), did from foo group by rid, did order by id;
You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:
SELECT
X.*
FROM
ParentTable P
CROSS APPLY (
SELECT TOP 1 *
FROM YourTable T
WHERE P.rid = T.rid AND P.did = T.did
ORDER BY active DESC, modified
) X
Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.
Also, here is my crazy, single scan magic query which can often outperform other methods:
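SELECT
id = Convert(int, Substring(Packed, 10, 4)), -- bytes 10-13: id
rid,
modified = Convert(datetime, Substring(Packed, 2, 8)), -- bytes 2-9: the full 8-byte datetime
Active = Convert(bit, 1 - Convert(int, Substring(Packed, 1, 1))), -- byte 1: 1 - active
did
FROM
(
SELECT
rid,
did,
-- binary(8) keeps both the date and time parts; binary(4) would truncate the date away
Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(8), modified) + Convert(binary(4), id))
FROM
YourTable
GROUP BY
rid,
did
) X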
This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.
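The ROW_NUMBER answer can also be written without the extra CTE by ordering on the window function itself and using TOP (1) WITH TIES. Every row numbered 1 within its (rid, did) group ties for first place, so all of them are kept. This is a sketch using the second sample dataset from the question. The #YourTable and #Picked names are hypothetical, and the datetime literals use the language-neutral ISO 8601 "T" format. It is compact, but it usually still sorts the whole table, so the CTE or CROSS APPLY versions may perform better on large data.

```sql
-- Second sample dataset from the question (hypothetical temp table)
CREATE TABLE #YourTable (id INT PRIMARY KEY, rid INT NOT NULL, modified DATETIME NOT NULL, active BIT NOT NULL, did INT NOT NULL);
INSERT INTO #YourTable (id, rid, modified, active, did) VALUES
(1, 1, '2010-09-07T11:37:44.850', 1, 1),
(2, 1, '2010-09-07T11:38:44.000', 1, 1),
(3, 1, '2010-09-07T11:39:44.000', 1, 1),
(4, 1, '2010-09-07T11:36:44.000', 0, 1),
(5, 2, '2010-09-07T11:41:44.000', 1, 1),
(6, 1, '2010-09-07T11:42:44.000', 1, 2);

-- keep every row that is first in its (rid, did) group: active rows first, then earliest modified
SELECT TOP (1) WITH TIES id, rid, modified, active, did
INTO #Picked
FROM #YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY rid, did ORDER BY active DESC, modified ASC);

SELECT * FROM #Picked ORDER BY id;
```

Swapping ROW_NUMBER() for RANK() would also return ties within a group, the same trade-off mentioned in the answer above.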
