Performance issue with CTE SQL Server query - sql-server

We have a table with a parent-child relationship that represents a deep tree structure.
We are using a view with a CTE to query the data but the performance is poor (see code and execution plan below).
Is there any way we can improve the performance?
WITH cte (ParentJobTypeId, Id) AS
(
SELECT
Id, Id
FROM
dbo.JobTypes
UNION ALL
SELECT
e.Id, cte.Id
FROM
cte
INNER JOIN
dbo.JobTypes AS e ON e.ParentJobTypeId = cte.ParentJobTypeId
)
SELECT
ISNULL(Id, 0) AS ParentJobTypeId,
ISNULL(ParentJobTypeId, 0) AS Id
FROM
cte

A quick example of using the range keys. As I mentioned before, the hierarchies had 127K points and some sections were 15 levels deep.
The CTE builds the hierarchy; let's assume the hier results will be stored in a table (indexed as well).
-- Sample data: IDs 101 and 102 are roots, 1 to 4 are their children
Declare @Table table(ID int,ParentID int,[Status] varchar(50))
Insert @Table values
(1,101,'Pending'),
(2,101,'Complete'),
(3,101,'Complete'),
(4,102,'Complete'),
(101,null,null),
(102,null,null)
-- cteOH walks the tree and builds a sortable path (Seq) for each node
;With cteOH (ID,ParentID,Lvl,Seq)
as (
Select ID,ParentID,Lvl=1,cast(Format(ID,'000000') + '/' as varchar(500)) from @Table where ParentID is null
Union All
Select h.ID,h.ParentID,cteOH.Lvl+1,Seq=cast(cteOH.Seq + Format(h.ID,'000000') + '/' as varchar(500)) From @Table h INNER JOIN cteOH ON h.ParentID = cteOH.ID
),
-- R1 = position of the node in path order, R2 = position of its last descendant
cteR1 as (Select ID,Seq,R1=Row_Number() over (Order by Seq) From cteOH),
cteR2 as (Select A.ID,R2 = max(B.R1) From cteOH A Join cteR1 B on (B.Seq Like A.Seq+'%') Group By A.ID)
Select B.R1
,C.R2
,A.Lvl
,A.ID
,A.ParentID
Into #TempHier
From cteOH A
Join cteR1 B on (A.ID=B.ID)
Join cteR2 C on (A.ID=C.ID)
Select * from #TempHier
-- Roll the detail rows up the hierarchy using only the range keys (no recursion)
Select H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
,Total = count(*)
,Complete = sum(case when D.Status = 'Complete' then 1 else 0 end)
,Pending = sum(case when D.Status = 'Pending' then 1 else 0 end)
,PctCmpl = format(sum(case when D.Status = 'Complete' then 1.0 else 0.0 end)/count(*),'##0.00%')
From #TempHier H
Join (Select _R1=B.R1,A.* From @Table A Join #TempHier B on A.ID=B.ID) D on D._R1 between H.R1 and H.R2
Group By H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
Order By 1
This returns the hier in a #Temp table for now. Notice R1 and R2; I call these the range keys. Data can be selected and aggregated via these keys without recursion.
R1 R2 Lvl ID ParentID
1 4 1 101 NULL
2 2 2 1 101
3 3 2 2 101
4 4 2 3 101
5 6 1 102 NULL
6 6 2 4 102
VERY SIMPLE EXAMPLE: illustrates rolling the data up the hier.
R1 R2 Lvl ID ParentID Total Complete Pending PctCmpl
1 4 1 101 NULL 4 2 1 50.00%
2 2 2 1 101 1 0 1 0.00%
3 3 2 2 101 1 1 0 100.00%
4 4 2 3 101 1 1 0 100.00%
5 6 1 102 NULL 2 1 0 50.00%
6 6 2 4 102 1 1 0 100.00%
The real beauty of the range keys is that if you know an ID, you know where it exists (all descendants and ancestors).
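As a rough illustration of that last point, here is a minimal Python sketch that reuses the six hierarchy rows shown above. The helper names (find, descendants, ancestors) are made up for this example and are not part of the original answer; the only idea taken from it is that a node's descendants have an R1 inside the node's R1..R2 range, and its ancestors have a range that encloses the node's R1.

```python
# Rows copied from the small hierarchy above: (R1, R2, Lvl, ID, ParentID).
HIER = [
    (1, 4, 1, 101, None),
    (2, 2, 2, 1, 101),
    (3, 3, 2, 2, 101),
    (4, 4, 2, 3, 101),
    (5, 6, 1, 102, None),
    (6, 6, 2, 4, 102),
]


def find(node_id):
    """Return the hierarchy row for one ID (hypothetical helper)."""
    return next(row for row in HIER if row[3] == node_id)


def descendants(node_id):
    """IDs whose R1 lies inside the node's range, excluding the node itself."""
    r1, r2 = find(node_id)[:2]
    return [row[3] for row in HIER if r1 < row[0] <= r2]


def ancestors(node_id):
    """IDs whose R1..R2 range encloses the node's R1, excluding the node itself."""
    r1 = find(node_id)[0]
    return [row[3] for row in HIER if row[0] < r1 <= row[1]]


print(descendants(101))  # [1, 2, 3]
print(ancestors(4))      # [102]
```

In SQL the same lookups would be plain range predicates on an indexed R1/R2 pair, which is why no recursion is needed at query time.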

Related

T-SQL select rows where [col] = MIN([col])

I have a data set produced from a UNION query that aggregates data from 2 sources.
I want to select that data based on whether data was found in only one of those sources, or in both.
The relevant parts of the data set look like this (there are a number of other columns):
row preference group position
1 1 111 1
2 1 111 2
3 1 111 3
4 1 135 1
5 1 135 2
6 1 135 3
7 2 111 1
8 2 135 1
The [preference] column combined with the [group] column is what I'm trying to filter on: I want to return all the rows that have the same [preference] as the MIN([preference]) for each [group].
The desired output given the data above would be rows 1 -> 6.
The [preference] column indicates the original source of the data in the UNION query so a legitimate data set could look like:
row preference group position
1 1 111 1
2 1 111 2
3 1 111 3
4 2 111 1
5 2 135 1
In which case the desired output would be rows 1,2,3, & 5
What I can't work out is how to do (not real code):
SELECT * WHERE [preference] = MIN([preference]) PARTITION BY [group]
One way to do this is using RANK:
SELECT row
, preference
, [group]
, position
FROM (
SELECT row
, preference
, [group]
, position
, RANK() OVER (PARTITION BY [group] ORDER BY preference) AS seq
FROM t) t2
WHERE seq = 1
Should be doable via a simple inner join:
SELECT t1.*
FROM t AS t1
INNER JOIN (SELECT [group], MIN(preference) AS preference
FROM t
GROUP BY [group]
) t2 ON t1.[group] = t2.[group]
AND t1.preference = t2.preference
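To check that both answers agree, here is a small sketch that runs them against the second sample data set from the question. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption made only so the example is runnable; the table name t comes from the answers, and the column names are quoted because "row" and "group" are keywords in SQLite.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("row" INT, preference INT, "group" INT, position INT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 111, 1), (2, 1, 111, 2), (3, 1, 111, 3), (4, 2, 111, 1), (5, 2, 135, 1)],
)

# Approach 1: rank rows inside each group by preference and keep rank 1.
ranked = conn.execute(
    '''SELECT "row" FROM (
           SELECT "row",
                  RANK() OVER (PARTITION BY "group" ORDER BY preference) AS seq
           FROM t) AS t2
       WHERE seq = 1
       ORDER BY "row"'''
).fetchall()

# Approach 2: join back to the per-group minimum preference.
joined = conn.execute(
    '''SELECT t1."row" FROM t AS t1
       JOIN (SELECT "group", MIN(preference) AS preference
             FROM t GROUP BY "group") AS t2
         ON t1."group" = t2."group" AND t1.preference = t2.preference
       ORDER BY t1."row"'''
).fetchall()

rank_rows = [r[0] for r in ranked]
join_rows = [r[0] for r in joined]
print(rank_rows, join_rows)  # [1, 2, 3, 5] [1, 2, 3, 5]
```

RANK (rather than ROW_NUMBER) matters here because every row tied on the minimum preference has to be returned, not just one per group.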

How to insert "empty" row extracting a month list?

I have this stored procedure, which returns a list of data for each month (i.e. each row is a month). Something like this:
SELECT
*,
(CAST(t1.NumActivities AS DECIMAL) / t1.NumVisits) * 100 AS PercAccepted,
(CAST(t1.Accepted AS DECIMAL) / t1.Estimated) * 100 AS PercValue
FROM
(SELECT
MONTH(DateVisit) AS Month,
COUNT(*) AS NumVisits,
SUM(CASE WHEN DateActivity is not null THEN 1 ELSE 0 END) AS NumActivities,
SUM(Estimate) AS Estimated,
SUM(CASE WHEN DateActivity is not null THEN Estimate ELSE 0 END) AS Accepted
FROM [dbo].[Activities]
WHERE
DateVisit IS NOT NULL
AND (@year IS NULL OR YEAR(DateVisit) = @year)
AND (@clinicID IS NULL OR ClinicID = @clinicID)
GROUP BY MONTH(DateVisit)) t1
This is a result:
Month NumVisits NumActivities Estimated Accepted PercAccepted PercValue
1 5 1 13770.00 2520.00 20.00000000000 18.30065359477124
2 2 2 7900.00 7900.00 100.00000000000 100.00000000000000
3 1 0 2730.00 0.00 0.00000000000 0.00000000000000
8 1 1 3000.00 3000.00 100.00000000000 100.00000000000000
But as you can see, some months can be missing (for example, April, month 4, is missing here).
Is it possible to insert, for the missing month/row, an empty (0) record? Such as:
Month NumVisits NumActivities Estimated Accepted PercAccepted PercValue
1 5 1 13770.00 2520.00 20.00000000000 18.30065359477124
2 2 2 7900.00 7900.00 100.00000000000 100.00000000000000
3 1 0 2730.00 0.00 0.00000000000 0.00000000000000
4 0 0 0 0 0 0
...
Here is an example with sample data:
CREATE TABLE #Report
(
Id INT,
Name nvarchar(max),
Percentage float
)
INSERT INTO #Report VALUES (1,'ONE',2.01)
INSERT INTO #Report VALUES (2,'TWO',3.01)
INSERT INTO #Report VALUES (5,'Five',5.01)
;WITH months(Month) AS
(
SELECT 1
UNION ALL
SELECT Month+1
FROM months
WHERE Month < 12
)
SELECT *
INTO #AllMonthsNumber
from months;
Your select query:
The left join will give you NULL for the other months, so just use ISNULL(ColumnName, 'String_to_replace'):
SELECT Month, ISNULL(Name,0), ISNULL(Percentage,0)
FROM #AllMonthsNumber A
LEFT JOIN #Report B
ON A.Month = B.Id
EDIT:
Yes, you can do it without creating the #AllMonthsNumber table:
You can use the master..spt_values system table, which contains numbers, with a suitable WHERE condition:
SELECT Number as Month, ISNULL(B.Name,0), ISNULL(Percentage,0)
FROM master..spt_values A
LEFT JOIN #Report B ON A.Number = B.Id
WHERE Type = 'P' AND number BETWEEN 1 AND 12
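The month-filling idea can be tried end to end with the sketch below. It uses Python's sqlite3 module in place of SQL Server (an assumption for runnability), so the syntax differs slightly: WITH RECURSIVE instead of WITH, and COALESCE instead of ISNULL. The table name report mirrors the #Report sample table above.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INT, name TEXT, percentage REAL)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?)",
    [(1, "ONE", 2.01), (2, "TWO", 3.01), (5, "Five", 5.01)],
)

rows = conn.execute(
    """WITH RECURSIVE months(month) AS (
           SELECT 1
           UNION ALL
           SELECT month + 1 FROM months WHERE month < 12
       )
       -- Every month 1..12 survives the LEFT JOIN; missing ones get defaults.
       SELECT m.month, COALESCE(r.name, '0'), COALESCE(r.percentage, 0)
       FROM months AS m
       LEFT JOIN report AS r ON r.id = m.month
       ORDER BY m.month"""
).fetchall()

for row in rows:
    print(row)
```

The key point is that the generated month list is the left side of the join, so the row count is always 12 regardless of which months have data.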

How to sum a column in SQL Server recursive cte for optimization?

I have the following table with hierarchical data:
FolderId ParentFolderId NumberOfAffectedItems
---------------------------------------------
1 NULL 2
2 1 3
3 2 5
4 2 3
5 1 0
I want to find the number of affected items under each folder and all of its children. I can write a recursive CTE that produces the following result; after that, a GROUP BY gives me what I want.
Normal recursive CTE:
WITH FolderTree AS
(
SELECT
fsa.FolderId AS ParentFolderId,
fsa.FolderId AS ChildFolderId,
fsa.NumberOfReportsAffected
FROM
FoldersWithNumberOfReportsAffected fsa
UNION ALL
SELECT
ft.ParentFolderId,
fsa.FolderId AS ChildFolderId,
fsa.NumberOfReportsAffected
FROM
FoldersWithNumberOfReportsAffected fsa
INNER JOIN
FolderTree ft ON fsa.ParentFolderId = ft.ChildFolderId
)
Result:
ParentFolderId ChildFolderId NumberOfAffectedItems
--------------------------------------------------
1 1 2
1 2 3
1 3 5
1 4 3
1 5 0
2 2 3
2 3 5
2 4 3
3 3 5
4 4 3
5 5 0
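For reference, the GROUP BY step mentioned above can be sketched as follows. This runs on Python's sqlite3 module rather than SQL Server (an assumption for runnability, hence WITH RECURSIVE), and the table name folders is a placeholder for the real table. It expands every folder to all of its descendants and then sums per starting folder.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folders (FolderId INT, ParentFolderId INT, NumberOfAffectedItems INT)"
)
conn.executemany(
    "INSERT INTO folders VALUES (?, ?, ?)",
    [(1, None, 2), (2, 1, 3), (3, 2, 5), (4, 2, 3), (5, 1, 0)],
)

totals = conn.execute(
    """WITH RECURSIVE FolderTree(ParentFolderId, ChildFolderId, NumberOfAffectedItems) AS (
           -- Anchor: every folder is the root of its own subtree.
           SELECT FolderId, FolderId, NumberOfAffectedItems FROM folders
           UNION ALL
           -- Recursive step: attach each child to the subtree root it belongs to.
           SELECT ft.ParentFolderId, f.FolderId, f.NumberOfAffectedItems
           FROM folders AS f
           JOIN FolderTree AS ft ON f.ParentFolderId = ft.ChildFolderId
       )
       SELECT ParentFolderId, SUM(NumberOfAffectedItems)
       FROM FolderTree
       GROUP BY ParentFolderId
       ORDER BY ParentFolderId"""
).fetchall()

print(totals)  # [(1, 13), (2, 11), (3, 5), (4, 3), (5, 0)]
```

Each (root, descendant) pair appears exactly once in this expansion, which is why the sums are not double counted.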
But I want to optimize it: I want to start from the leaf children and compute NumberOfAffectedItems while moving through the CTE itself.
Expected CTE
WITH FolderTree AS
(
SELECT
fsa.FolderId AS LeafChildId,
fsa.FolderId AS ParentFolderId,
fsa.NumberOfReportsAffected
FROM
FoldersWithNumberOfReportsAffected fsa
LEFT JOIN
FoldersWithNumberOfReportsAffected f ON fsa.folderid = f.ParentfolderId
WHERE
f.ParentfolderId is null -- this is finding leaf child
UNION ALL
SELECT
ft.LeafChildId,
fsa.FolderId AS ParentFolderId,
fsa.NumberOfReportsAffected + ft.NumberOfReportsAffected AS [ComputedResult]
FROM
FoldersWithNumberOfReportsAffected fsa
INNER JOIN
FolderTree ft ON fsa.FolderId = ft.ParentFolderId
)
Result:
LeafChildId ParentFolderId ComputedNumberOfAffectedItems
---------------------------------------------------------
3 3 5
3 2 8
3 1 10
4 4 3
4 2 5
4 1 7
5 5 0
5 1 2
If I group by ParentFolderId, I get a wrong result. The reason is that, while computing in the CTE, the same parent folder is visited from multiple children, so its value is counted more than once. I want to find out whether there is any way to compute the result while going through the CTE itself.
Please check the following solution. I used your cte as basis and added the calculation (as column x) to it:
DECLARE @t TABLE(
FolderID INT
,ParentFolderID INT
,NumberOfAffectedItems INT
);
INSERT INTO @t VALUES (1 ,NULL ,2)
,(2 ,1 ,3)
,(3 ,2 ,5)
,(4 ,2 ,3)
,(5 ,1 ,0);
WITH FolderTree AS
(
SELECT 1 AS lvl,
fsa.FolderId AS LeafChildId,
fsa.ParentFolderId AS ParentFolderId,
fsa.NumberOfAffectedItems
FROM
@t fsa
LEFT JOIN
@t f ON fsa.folderid = f.ParentfolderId
WHERE
f.ParentfolderId is null -- this is finding leaf child
UNION ALL
SELECT lvl + 1,
ft.LeafChildId,
fsa.ParentFolderId,
fsa.NumberOfAffectedItems
FROM
FolderTree ft
INNER JOIN @t fsa
ON fsa.FolderId = ft.ParentFolderId
)
SELECT LeafChildId,
ISNULL(ParentFolderId, LeafChildId) ParentFolderId,
NumberOfAffectedItems,
SUM(NumberOfAffectedItems) OVER (PARTITION BY LeafChildId ORDER BY ISNULL(ParentFolderId, LeafChildId) DESC) AS x
FROM FolderTree
ORDER BY 1, 2 DESC
OPTION (MAXRECURSION 0)
Result:
LeafChildId ParentFolderId NumberOfAffectedItems x
3 3 2 2
3 2 5 7
3 1 3 10
4 4 2 2
4 2 3 5
4 1 3 8
5 5 2 2
5 1 0 2

Trying avoid using cursor

I have been given a query and am trying to figure out a way to remove the cursor while maintaining functionality, because the starting table can grow to millions of rows.
Example of data in table:
ID DollarValue Month RowNumber
1 $10 1/1/2014 1
1 $15 2/1/2014 2
1 -$40 3/1/2014 3
1 $50 4/1/2014 4
2 -$11 1/1/2014 1
2 $11 2/1/2014 2
2 $5 3/1/2014 3
Expected results:
ID DollarValue Month RowNumber TestVal
1 $10 1/1/2014 1 1
1 $15 2/1/2014 2 0
1 -$40 3/1/2014 3 -1
1 $50 4/1/2014 4 1
2 -$11 1/1/2014 1 -1
2 $11 2/1/2014 2 0
2 $5 3/1/2014 3 1
Here is the logic (pseudocode) that happens inside the cursor:
If @ID <> @LastId AND @Month <> @LastMonth
    Set @RunningTotal = @DollarValue
    Set @LastMonth = '12/31/2099'
    Set @LastID = @ID
    Set @TestVal = Sign(@DollarValue)
Else
    If Sign(@RunningTotal) = Sign(@RunningTotal + @DollarValue)
        Set @TestVal = 0
    Else
        Set @TestVal = Sign(@DollarValue)
    Set @RunningTotal = @RunningTotal + @DollarValue
Any idea how I can change this to set based?
You can use the windowed version of SUM to calculate running totals:
;WITH CTE AS (
SELECT ID, DollarValue, Month, RowNumber,
SUM ( DollarValue ) OVER (PARTITION BY ID ORDER BY RowNumber) as RunningTotal
FROM #mytable
)
SELECT C1.ID, C1.DollarValue, C1.Month, C1.RowNumber,
CASE WHEN C1.RowNumber = 1 THEN SIGN(C1.DollarValue)
WHEN SIGN(C1.RunningTotal) = SIGN(C2.RunningTotal) THEN 0
ELSE SIGN(C1.RunningTotal)
END AS TestVal
FROM CTE AS C1
LEFT JOIN CTE AS C2 ON C1.ID = C2.ID AND C1.RowNumber = C2.RowNumber + 1
Using LEFT JOIN on RowNumber you can get the previous record and compare the current running total with the previous one. Then use a simple CASE to apply rules pertinent to changes in SIGN of running total.
P.S. It seems the above solution won't work in versions prior to SQL Server 2012. In that case the running total calculation inside the CTE has to be replaced by the "conventional" version.
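A quick way to sanity-check the windowed running-total approach against the expected TestVal column from the question is the sketch below. It runs on Python's sqlite3 module instead of SQL Server (an assumption for runnability). Because SIGN is not available in every SQLite build, the sign is spelled out as (x > 0) - (x < 0); the table name mytable and the integer dollar amounts are simplifications for this example.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INT, DollarValue INT, RowNumber INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 15, 2), (1, -40, 3), (1, 50, 4),
     (2, -11, 1), (2, 11, 2), (2, 5, 3)],
)

rows = conn.execute(
    """WITH cte AS (
           SELECT ID, DollarValue, RowNumber,
                  SUM(DollarValue) OVER (PARTITION BY ID ORDER BY RowNumber) AS RunningTotal
           FROM mytable
       )
       SELECT c1.ID, c1.RowNumber,
              CASE WHEN c1.RowNumber = 1
                        THEN (c1.DollarValue > 0) - (c1.DollarValue < 0)
                   -- Same sign as the previous running total: no change.
                   WHEN (c1.RunningTotal > 0) - (c1.RunningTotal < 0)
                      = (c2.RunningTotal > 0) - (c2.RunningTotal < 0)
                        THEN 0
                   ELSE (c1.RunningTotal > 0) - (c1.RunningTotal < 0)
              END AS TestVal
       FROM cte AS c1
       LEFT JOIN cte AS c2 ON c1.ID = c2.ID AND c1.RowNumber = c2.RowNumber + 1
       ORDER BY c1.ID, c1.RowNumber"""
).fetchall()

test_vals = [r[2] for r in rows]
print(test_vals)  # [1, 0, -1, 1, -1, 0, 1]
```

The output matches the expected results table in the question, including the row where the running total for ID 2 lands exactly on zero.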
This is a 2008 solution:
WITH CTE AS (
SELECT
AA.[ID]
,AA.[Month]
,AA.[RowNumber]
,AA.[DollarValue]
,SIGN(SUM(BB.[DollarValue])) AS RunTotalSign
FROM YourTable AS AA
LEFT JOIN YourTable AS BB
ON (AA.[ID] = BB.[ID] AND BB.[RowNumber] <= AA.[RowNumber])
GROUP BY AA.[ID],AA.[Month],AA.[DollarValue],AA.[RowNumber]
)
SELECT
AA.[ID]
,AA.[Month]
,AA.[RowNumber]
,AA.[DollarValue]
,CASE WHEN AA.RunTotalSign = CC.RunTotalSign Then 0
ELSE AA.RunTotalSign
END
AS TestVal
FROM CTE AS AA
LEFT JOIN CTE AS CC
ON (AA.[ID] = CC.[ID] AND AA.[RowNumber] = CC.[RowNumber]+1)

A group by challenge

Let's say I have this table MyTbl
Record Id_try Id Type IsOk DateOk
1 1 MYDB00125 A 0 NULL
2 1 MYDB00125 B 1 2012-07-19 20:10:05.000
3 1 MYDB00125 A 0 2012-07-25 14:10:05.000
4 2 MYDB00125 A 0 2012-07-19 22:10:05.000
5 1 MYDB00254 B 0 2012-07-19 22:10:05.000
6 1 MYDB00254 A 0 NULL
7 3 MYDB00125 A 1 2012-07-19 22:15:05.000
8 3 MYDB00125 B 1 2012-07-19 22:42:53.000
9 1 MYDB00323 A 1 2012-07-22 00:15:05.000
10 1 MYDB00323 C 0 NULL
And I want a group by that brings me for each Id and Type my last "Id_Try Record".
SELECT Id, MAX(Id_Try), MyTbl.Type, IsOK, MAX(DateOk) from MyTbl
GROUP BY Id, MyTbl.Type, IsOK
That won't do, because it'll bring me the last Id_Try AND the last date (the date of record 3 in the example). I don't care whether it's the last date or not; I need the date of the last Id_Try.
Can this only be solved with a subselect, or could a HAVING clause do it?
This is the result expected:
Record Id_try Id Type IsOk DateOk
5 1 MYDB00254 B 0 2012-07-19 22:10:05.000
6 1 MYDB00254 A 0 NULL
7 3 MYDB00125 A 1 2012-07-19 22:15:05.000
8 3 MYDB00125 B 1 2012-07-19 22:42:53.000
9 1 MYDB00323 A 1 2012-07-22 00:15:05.000
10 1 MYDB00323 B 0 NULL
I think you will need to break this into two pieces:
with maxIDTry as
(
SELECT MAX(Id_try) as maxId, ID
FROM MyTable
GROUP BY ID
)
SELECT * FROM MyTable as mt
INNER JOIN maxIDTry as max
ON mt.id_try = max.maxId AND mt.id = max.id
I think you want this:
select * FROM
(
select *, row_number() over (partition by id,type order by Id_try desc) as position from mytbl
) foo
where position = 1
order by record
http://www.sqlfiddle.com/#!3/95742/5
Your sample result set lists
9 1 MYDB00323 A 1 2012-07-22 00:15:05.000
10 1 MYDB00323 A 0 NULL
But that doesn't make sense since you're saying the ID and the Id_try have the same value. I assume you meant for Id_try to be 2 maybe? Otherwise I think my results match up.
Hope this helps.
SELECT A.Record, A.Id_try, A.Id, A.Type, A.IsOk, A.DateOk
FROM MyTbl A INNER JOIN (
SELECT MAX(Id_Try) Id_Try, Id, B1.Type
from MyTbl B1
GROUP BY Id, B1.Type) AS B
ON A.Id_Try = B.Id_Try AND A.Id = B.Id AND A.Type = B.Type
ORDER BY A.RECORD
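To see the "latest Id_try per Id and Type" idea in action, the sketch below loads the sample rows from the question (without the IsOk and DateOk columns, which do not affect the selection) and applies the ROW_NUMBER approach. It runs on Python's sqlite3 module as a stand-in for SQL Server, which is an assumption made only so the example is runnable.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTbl (Record INT, Id_try INT, Id TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO MyTbl VALUES (?, ?, ?, ?)",
    [
        (1, 1, "MYDB00125", "A"), (2, 1, "MYDB00125", "B"),
        (3, 1, "MYDB00125", "A"), (4, 2, "MYDB00125", "A"),
        (5, 1, "MYDB00254", "B"), (6, 1, "MYDB00254", "A"),
        (7, 3, "MYDB00125", "A"), (8, 3, "MYDB00125", "B"),
        (9, 1, "MYDB00323", "A"), (10, 1, "MYDB00323", "C"),
    ],
)

# Number the rows of each (Id, Type) pair from the highest Id_try downwards
# and keep only the first one.
latest = conn.execute(
    """SELECT Record FROM (
           SELECT Record,
                  ROW_NUMBER() OVER (PARTITION BY Id, Type ORDER BY Id_try DESC) AS position
           FROM MyTbl) AS foo
       WHERE position = 1
       ORDER BY Record"""
).fetchall()

latest_records = [r[0] for r in latest]
print(latest_records)  # [5, 6, 7, 8, 9, 10]
```

Because the whole row is kept, the DateOk value in a real table would come from the record with the last Id_try rather than from an independent MAX(DateOk).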

Resources