How does a recursive CTE eliminate duplicates? - sql-server

I'm learning recursive CTEs in the AdventureWorks2012 database using SQL Server 2014 Express. I think I'm mostly getting the below example (taken from Beginning T-SQL 3rd Edition), but I don't quite understand why the recursive CTE doesn't produce duplicates.
Below is the recursive CTE that I'm trying to understand. It's a standard employee - manager hierarchy.
;with orgchart (employeeid, managerid, title, level, node) as (
--Anchor
select employeeid
, managerid
, title
, 0
, convert(varchar(30),'/') 'node'
from employee
where managerid is null
union all
--Recursive
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join orgchart oc on oc.employeeid = emp.managerid
)
select employeeid
, managerid
, space(level * 3) + title 'title'
, level
, node
from orgchart
order by node;
It works fine, but the question comes when I try to understand what's going on by recreating it via temp tables. I create a series of temp tables to plug one output into the next query's input and recreate what the recursive CTE does.
--Anchor (Level 0)
select employeeid
, managerid
, title
, 0
, convert(varchar(30),'/') 'node'
into #orgchart
from employee
where managerid is null
Then I use that temp table to recreate the first level of recursion, at this point it's just the recursive CTE but with temp tables.
--Anchor + 1 level
select *
into #orgchart2
from #orgchart
union all
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join #orgchart oc on oc.employeeid = emp.managerid
So far so good, the results make sense. Then I do it one more time, but here's where it starts to break down:
--Anchor + 2 levels
select *
into #orgchart3
from #orgchart2
union all
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join #orgchart2 oc on oc.employeeid = emp.managerid
The output from this begins to return duplicate rows (all fields duplicate) of the level 1 employees. This makes sense - the second query after the UNION ALL will return the previous levels as well as the new level of recursion, and UNION ALL doesn't deduplicate. If I do another round of recursion, the level 2 employees are also duplicated, and so on.
I understand that I can change UNION ALL to UNION in order to remove duplicates, but I'm trying to understand why the recursive CTE doesn't produce duplicates as well? It uses UNION ALL so I don't understand where the deduplication comes in. Is removal of duplicates an intrinsic part of a recursive CTE?
I'm trying not to post all the result sets, but if they're needed to understand the problem then let me know and I will post them. Thanks in advance.

The difference is that when you populate #orgchart2, you include all the rows from #orgchart. So when you create #orgchart3 (anchor + 2 levels), you are joining on the rows copied from #orgchart as well as the new rows in #orgchart2.
#orgchart3 should only pick up rows that are one level beyond the new rows in #orgchart2. Instead it also picks up rows that are one level beyond the anchor, and those rows are already in #orgchart2, so they are duplicated.
A recursive CTE doesn't do that. Each level of recursion only looks at the rows produced by the previous level and ignores all the ones that came before it, so no duplicates are created.
You would simulate what the recursive CTE does if you left out the top half of the UNION ALL when you populated #orgchart2 and #orgchart3, and then finally produced a single UNION ALL of all three temp tables.
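That suggestion can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, and the five-row employee table, the table names orgchart0 to orgchart2, and the titles are made up for illustration. Each per-level table joins only against the previous level, and the levels are combined with a single UNION ALL at the end:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, not the AdventureWorks table.
    CREATE TABLE employee (employeeid INTEGER PRIMARY KEY, managerid INTEGER, title TEXT);
    INSERT INTO employee VALUES
        (1, NULL, 'CEO'),
        (2, 1, 'VP Sales'),
        (3, 1, 'VP Engineering'),
        (4, 2, 'Sales Rep'),
        (5, 3, 'Engineer');

    -- Level 0: the anchor rows only.
    CREATE TEMP TABLE orgchart0 AS
        SELECT employeeid, managerid, title, 0 AS level
        FROM employee WHERE managerid IS NULL;

    -- Level 1: join ONLY against level 0, without copying level 0 in.
    CREATE TEMP TABLE orgchart1 AS
        SELECT emp.employeeid, emp.managerid, emp.title, oc.level + 1 AS level
        FROM employee emp JOIN orgchart0 oc ON oc.employeeid = emp.managerid;

    -- Level 2: join ONLY against level 1.
    CREATE TEMP TABLE orgchart2 AS
        SELECT emp.employeeid, emp.managerid, emp.title, oc.level + 1 AS level
        FROM employee emp JOIN orgchart1 oc ON oc.employeeid = emp.managerid;
""")

# One final UNION ALL over the per-level tables.
manual = conn.execute("""
    SELECT * FROM orgchart0
    UNION ALL SELECT * FROM orgchart1
    UNION ALL SELECT * FROM orgchart2
    ORDER BY 1
""").fetchall()

# The same hierarchy as a recursive CTE (SQLite needs the RECURSIVE keyword).
cte = conn.execute("""
    WITH RECURSIVE orgchart (employeeid, managerid, title, level) AS (
        SELECT employeeid, managerid, title, 0 FROM employee WHERE managerid IS NULL
        UNION ALL
        SELECT emp.employeeid, emp.managerid, emp.title, oc.level + 1
        FROM employee emp JOIN orgchart oc ON oc.employeeid = emp.managerid
    )
    SELECT * FROM orgchart ORDER BY 1
""").fetchall()

print(manual == cte)  # the two approaches return the same five rows
```

With this sample data both queries return each employee exactly once, which is the behaviour the temp tables in the question were meant to reproduce.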

Related

Improve a query with Pivot and Recursive code in SQL Server

I need to produce the following result from these two tables.
An area receives services from different departments. Each department belongs to a hierarchy on three (or fewer) levels. The idea is to represent in one column the relationship between the area and all the hierarchies where it can be present. The Level Nro should be 1 for the record that does not have any father.
So far, I have this code https://rextester.com/KYHKR17801 . I've got the result that I need. However, the performance is not the best because the table is too large, and I had to do many transformations:
Pivot
Recursion
Adding back records, because I lost the NULLs when creating the pivot table
Updating the Level Nro
I do not know if anyone can give any advice to improve the runtime of this query.
This appears to do everything you need in one statement:
WITH R AS
(
SELECT
SA.AreaID,
S.[service],
S.[description],
L.[Level],
L.child_service,
Recursion = 1
FROM dbo.service_area AS SA
JOIN dbo.[service] AS S
ON S.[service] = SA.[Service]
OUTER APPLY
(
-- Unpivot
VALUES
(1, S.level1),
(2, S.level2),
(3, S.level3)
) AS L ([Level], child_service)
WHERE
L.child_service IS NOT NULL
UNION ALL
SELECT
R.AreaID,
S.[service],
S.[description],
R.[Level],
child_service = CHOOSE(R.[Level], S.level1, S.level2, S.level3),
Recursion = R.Recursion + 1
FROM R
JOIN dbo.[service] AS S
ON S.[service] = R.child_service
)
SELECT
R.AreaID,
R.[service],
R.[description],
[Level] = 'Level' + CONVERT(char(1), R.[Level]),
[Level Nro] = ROW_NUMBER() OVER (
PARTITION BY R.AreaID, R.[Level]
ORDER BY R.Recursion DESC)
FROM R
ORDER BY
R.AreaID ASC,
R.[Level] ASC,
[Level Nro]
OPTION (MAXRECURSION 3);
The following index will help the recursive section locate rows quickly:
CREATE UNIQUE CLUSTERED INDEX cuq ON dbo.[service] ([service]);
If your version of SQL Server doesn't have CHOOSE, write the CASE expression out by hand:
CASE R.[Level] WHEN 1 THEN S.level1 WHEN 2 THEN S.level2 ELSE S.level3 END

Getting non-deterministic results from WITH RECURSIVE cte

I'm trying to create a recursive CTE that traverses all the records for a given ID, and does some operations between ordered records. Let's say I have customers at a bank who get charged a uniquely identifiable fee, and a customer can pay that fee in any number of installments:
WITH recursive payments (
id
, index
, fees_paid
, fees_owed
)
AS (
SELECT id
, index
, fees_paid
, fee_charged
FROM table
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, t.fees_paid
, p.fees_owed - p.fees_paid
FROM table t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
The join logic seems sound, but when I join the output of this query to the source table, I'm getting non-deterministic and incorrect results.
This is my first foray into Snowflake's recursive CTEs. What am I missing in the intermediate result logic that is leading to the non-determinism here?
I assume this is edited code, because in the anchor of your CTE you select a fourth column, fee_charged, which does not exist, and then in the recursion you don't sum the fees paid. Basically, your logic seems rather strange.
So creating some random data, that has two different id streams to recurse over:
create or replace table data (id number, index number, val text);
insert into data
select * from values (1,1,'a'),(2,1,'b')
,(1,2,'c'), (2,2,'d')
,(1,3,'e'), (2,3,'f')
v(id, index, val);
Now altering your CTE just a little bit to concatenate the strings together:
WITH RECURSIVE payments AS
(
SELECT id
, index
, val
FROM data
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, p.val || t.val as val
FROM data t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
we get:
ID INDEX VAL
1 1 a
1 2 ac
1 3 ace
2 1 b
2 2 bd
2 3 bdf
Which is exactly as I would expect. So as for how this relates to your "it gets strange when I join to other stuff": either the output of your CTE is not what you expect it to be, or your join to the other table is not working as you expect, or there is a bug in Snowflake.
Which all comes down to this: if the CTE results are exactly what you expect, write them to a table and join that to your other table. That rules out some form of CTE vs JOIN bug and lets you debug why your join is not working.
But if your CTE output is not what you expect, then let's help debug that.
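As a rough way to try that debugging approach outside Snowflake, here is a sketch using Python's built-in sqlite3 module. It is an assumption that SQLite behaves like Snowflake here; the column is renamed to idx because index is a keyword in SQLite, and payments_snapshot is a made-up table name. The CTE output is written to a plain table first and the join is run against that table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER, idx INTEGER, val TEXT);
    INSERT INTO data VALUES
        (1, 1, 'a'), (2, 1, 'b'),
        (1, 2, 'c'), (2, 2, 'd'),
        (1, 3, 'e'), (2, 3, 'f');
    -- Hypothetical table that holds a materialized copy of the CTE output.
    CREATE TABLE payments_snapshot (id INTEGER, idx INTEGER, val TEXT);
""")

cte_sql = """
    WITH RECURSIVE payments AS (
        SELECT id, idx, val FROM data WHERE idx = 1
        UNION ALL
        SELECT t.id, t.idx, p.val || t.val
        FROM data t
        JOIN payments p ON t.id = p.id AND t.idx = p.idx + 1
    )
    SELECT * FROM payments ORDER BY 1, 2
"""

# Step 1: check the CTE output on its own.
cte_rows = conn.execute(cte_sql).fetchall()

# Step 2: materialize it, so the join no longer involves the CTE at all.
conn.executemany("INSERT INTO payments_snapshot VALUES (?, ?, ?)", cte_rows)

# Step 3: join the snapshot back to the source table.
joined = conn.execute("""
    SELECT s.id, s.idx, s.val, d.val
    FROM payments_snapshot s
    JOIN data d ON d.id = s.id AND d.idx = s.idx
    ORDER BY 1, 2
""").fetchall()

for row in joined:
    print(row)
```

If the snapshot join is stable but the direct join against the CTE is not, the problem is in how the CTE and the join interact. If both are wrong, the join condition is the first thing to look at.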

How does this recursion repeat itself?

I have a question about some code.
I have a relation called comedians. It has the attributes comedian and preceding comedian. So the first comedian, say Bob, has null in his field for preceding comedian. My question is, how does this code repeat until all child instances are found? I just cannot wrap my head around it.
I know that the first part, the one before UNION ALL, selects all parent elements, that is, all comedians that have no comedian who performed before them (preceding comedian). But how are all the other comedians under the parent chosen? What makes it recursive?
with recursive tree as (
select company, comedian, preceding_comedian, 1 as level
from the_table
where preceding_comedian is null
union all
select ch.company, ch.comedian, ch.preceding_comedian, p.level + 1
from the_table ch
join tree p on ch.preceding_comedian = p.comedian
)
First, the non-recursive part of the query is performed:
select company, comedian, preceding_comedian, 1 as level
from the_table
where preceding_comedian is null
and the result is put in a “work table”.
Then the recursive part of the query is performed, where the work table is substituted for the recursive CTE:
select ch.company, ch.comedian, ch.preceding_comedian, p.level + 1
from the_table ch
join <work-table> p on ch.preceding_comedian = p.comedian
The rows it produces are added to the result and become the new work table for the next step (if UNION is used instead of UNION ALL, rows that duplicate earlier ones are removed first).
The second step is repeated until it produces no new rows.
The accumulated rows are the result of the CTE.
So it is actually not so much a recursive, but an “iterative” CTE.
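The iteration can be sketched in plain Python. This is only a model of the steps described above, not how any database implements it; the function name evaluate_recursive_cte, the sample rows, and the company name are invented for illustration. The key assumption is that each pass of the recursive part sees only the rows produced by the previous pass:

```python
def evaluate_recursive_cte(anchor_rows, recursive_step, distinct=False):
    """Model of iterative CTE evaluation.

    anchor_rows: rows returned by the non-recursive part.
    recursive_step: function mapping the current work table to new rows.
    distinct: True models UNION, False models UNION ALL.
    """
    if distinct:
        anchor_rows = list(dict.fromkeys(anchor_rows))  # drop duplicate anchor rows
    result = list(anchor_rows)
    work_table = list(anchor_rows)
    while work_table:
        new_rows = recursive_step(work_table)
        if distinct:
            seen = set(result)
            unique_rows = []
            for row in new_rows:
                if row not in seen:
                    seen.add(row)
                    unique_rows.append(row)
            new_rows = unique_rows
        result.extend(new_rows)
        # Only the rows from this pass feed the next pass.
        work_table = new_rows
    return result


# Hypothetical rows: (company, comedian, preceding_comedian)
the_table = [
    ("Acme", "Bob", None),
    ("Acme", "Carol", "Bob"),
    ("Acme", "Dave", "Carol"),
    ("Acme", "Eve", "Bob"),
]

# Non-recursive part: comedians with no preceding comedian, at level 1.
anchor = [row + (1,) for row in the_table if row[2] is None]


def step(work_table):
    # Recursive part: join the_table to the work table on preceding_comedian.
    return [
        child + (parent[3] + 1,)
        for parent in work_table
        for child in the_table
        if child[2] == parent[1]
    ]


tree = evaluate_recursive_cte(anchor, step)
for row in tree:
    print(row)
```

The loop stops on the pass after Dave is added, because nobody in the sample data has Dave as a preceding comedian, so that pass returns no rows.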

SQL Server Equivalent of Oracle 'CONNECT BY PRIOR', and 'ORDER SIBLINGS BY'

I've got this Oracle code structure I'm trying to convert to SQL Server 2008 (Note: I have used generic names, enclosed column names and table names within square brackets '[]', and done some formatting to make the code more readable):
SELECT [col#1], [col#2], [col#3], ..., [col#n], [LEVEL]
FROM (SELECT [col#1], [col#2], [col#3], ..., [col#n]
FROM [TABLE_1]
WHERE ... )
CONNECT BY PRIOR [col#1] = [col#2]
START WITH [col#2] IS NULL
ORDER SIBLINGS BY [col#3]
What is the SQL Server equivalent template of the above code?
Specifically, I'm struggling with the LEVEL, and 'ORDER SIBLINGS BY' Oracle constructs.
Note:
The above "code" is the final output from a set of Oracle procedures. Basically, the 'WHERE' clause is built up dynamically and changes depending on various parameters passed. The code block starting with 'CONNECT BY PRIOR' is hard-coded.
For Reference:
The Simulation of CONNECT BY PRIOR of ORACLE in SQL SERVER article comes close, but it does not explain how to handle the 'LEVEL' and the 'ORDER SIBLINGS' constructs. ... And my mind is getting in a twist!
SELECT name
FROM emp
START WITH name = 'Joan'
CONNECT BY PRIOR empid = mgrid
equates to:
WITH n(empid, name) AS
(SELECT empid, name
FROM emp
WHERE name = 'Joan'
UNION ALL
SELECT nplus1.empid, nplus1.name
FROM emp as nplus1, n
WHERE n.empid = nplus1.mgrid)
SELECT name FROM n
If I have an initial template to work from, it will go a long way to helping me construct SQL Server stored procs to build up a correct T-SQL statement.
Assistance will be much appreciated.
Simulating the LEVEL column
The level column can easily be simulated by incrementing a counter in the recursive part:
WITH tree (empid, name, level) AS (
SELECT empid, name, 1 as level
FROM emp
WHERE name = 'Joan'
UNION ALL
SELECT child.empid, child.name, parent.level + 1
FROM emp as child
JOIN tree parent on parent.empid = child.mgrid
)
SELECT name
FROM tree;
Simulating order siblings by
Simulating the order siblings by is a bit more complicated. Assuming we have a column sort_order that defines the order of elements per parent (not the overall sort order - because then order siblings wouldn't be necessary) then we can create a column which gives us an overall sort order:
WITH tree (empid, name, level, sort_path) AS (
SELECT empid, name, 1 as level,
cast('/' + right('000000' + CONVERT(varchar, sort_order), 6) as varchar(max))
FROM emp
WHERE name = 'Joan'
UNION ALL
SELECT child.empid, child.name, parent.level + 1,
parent.sort_path + '/' + right('000000' + CONVERT(varchar, child.sort_order), 6)
FROM emp as child
JOIN tree parent on parent.empid = child.mgrid
)
SELECT *
FROM tree
order by sort_path;
The expression for the sort_path looks so complicated because SQL Server (at least the version you are using) does not have a simple function to format a number with leading zeros. In Postgres I would use an integer array so that the conversion to varchar isn't necessary - but that doesn't work in SQL Server either.
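To see why the padding matters, here is a small Python sketch of the same idea. The helper name sort_path and the sample numbers are made up for illustration. Text comparison goes character by character, so an unpadded 10 sorts before 2, while fixed-width segments sort in numeric order:

```python
def sort_path(parent_path, sort_order, width=6):
    """Append a zero-padded, fixed-width segment to the parent's path."""
    return f"{parent_path}/{sort_order:0{width}d}"


# Without padding, string order disagrees with numeric order.
unpadded = sorted(["/1/2", "/1/10", "/1/3"])
print(unpadded)  # ['/1/10', '/1/2', '/1/3']

# With padding, siblings 2, 3 and 10 under parent 1 come out in numeric order.
root = sort_path("", 1)
padded = sorted(sort_path(root, n) for n in (2, 10, 3))
print(padded)  # ['/000001/000002', '/000001/000003', '/000001/000010']
```

A parent's path is a prefix of its children's paths, so the parent always sorts directly before its own subtree. The width has to be large enough for the largest sort_order value, otherwise longer numbers break the ordering again.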
The option given by the user "a_horse_with_no_name" worked for me. I changed the code and applied it to a menu generator query and it worked the first time. Here is the code:
WITH tree(option_id,
option_description,
option_url,
option_icon,
option_level,
sort_path)
AS (
SELECT ppo.option_id,
ppo.option_description,
ppo.option_url,
ppo.option_icon,
1 AS option_level,
CAST('/' + RIGHT('000000' + CONVERT(VARCHAR, ppo.option_index), 6) AS VARCHAR(MAX))
FROM security.options_table_name ppo
WHERE ppo.option_parent_id IS NULL
UNION ALL
SELECT co.option_id,
co.option_description,
co.option_url,
co.option_icon,
po.option_level + 1,
po.sort_path + '/' + RIGHT('000000' + CONVERT(VARCHAR, co.option_index), 6)
FROM security.options_table_name co,
tree AS po
WHERE po.option_id = co.option_parent_id)
SELECT *
FROM tree
ORDER BY sort_path;
To get the dates for the last 10 days:
SELECT DISTINCT RecordDate = DATEADD(DAY,-number,CAST(GETDATE() AS DATE))
FROM master..[spt_values]
WHERE number BETWEEN 1 AND 10

Hierarchical SQL query not returning level

I have a typical SQL Server hierarchical query:
WITH bhp AS (
SELECT name, 0 AS level
FROM dbo.BhpNode
WHERE parent_id IS NULL
UNION ALL
SELECT a.name, level + 1
FROM dbo.BhpNode a
INNER JOIN dbo.BhpNode b
ON b.bhp_node_id = a.parent_id )
SELECT * FROM bhp
This seems to match the various examples of hierarchical queries I've found around the web, but for some reason it's producing this error:
Msg 207, Level 16, State 1, Line 12
Invalid column name 'level'.
I'm sure I'm missing something obvious, but I've stared at it too long to see it. Any idea where I'm going wrong?
Your query isn't recursive - you have to select from bhp inside the second part of the recursive CTE. Try this instead:
WITH bhp AS (
SELECT *, 0 AS [level]
FROM dbo.BhpNode
WHERE parent_id IS NULL
UNION ALL
SELECT b.*, [level] + 1
FROM bhp a
INNER JOIN dbo.BhpNode b
ON a.bhp_node_id = b.parent_id)
SELECT * FROM bhp
In the recursive section of the CTE, one of the tables you reference should be the CTE itself, shouldn't it? At the moment you are just self-joining BhpNode, and it doesn't have a level column itself.
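Both points can be reproduced with a short sketch using Python's built-in sqlite3 module. This is SQLite rather than SQL Server, so the error text differs and the RECURSIVE keyword is needed, and the three-row BhpNode table is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data.
    CREATE TABLE BhpNode (bhp_node_id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO BhpNode VALUES (1, NULL, 'root'), (2, 1, 'child'), (3, 2, 'grandchild');
""")

# The second SELECT only self-joins BhpNode, which has no level column.
broken = """
    WITH bhp AS (
        SELECT name, 0 AS level FROM BhpNode WHERE parent_id IS NULL
        UNION ALL
        SELECT a.name, level + 1
        FROM BhpNode a JOIN BhpNode b ON b.bhp_node_id = a.parent_id
    )
    SELECT * FROM bhp
"""
try:
    conn.execute(broken)
    broken_error = None
except sqlite3.OperationalError as exc:
    broken_error = str(exc)
print("broken query:", broken_error)

# The second SELECT now reads from the CTE itself, so level is available.
fixed = """
    WITH RECURSIVE bhp AS (
        SELECT bhp_node_id, name, 0 AS level FROM BhpNode WHERE parent_id IS NULL
        UNION ALL
        SELECT b.bhp_node_id, b.name, a.level + 1
        FROM bhp a JOIN BhpNode b ON a.bhp_node_id = b.parent_id
    )
    SELECT name, level FROM bhp ORDER BY level
"""
rows = conn.execute(fixed).fetchall()
print(rows)
```

The fixed version also keeps bhp_node_id in the CTE's column list, since the join needs it to find each node's children.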

Resources