SQL Server: How to limit CTE recursion to rows just recursivly added? - sql-server

Simpler Example
Let's try a simpler example, so people can wrap their heads around the concepts, and have a practical example that you can copy&paste into SQL Query Analizer:
Imagine a Nodes table, with a heirarchy:
A
- B
- C
We can start testing in Query Analizer:
CREATE TABLE ##Nodes
(
NodeID varchar(50) PRIMARY KEY NOT NULL,
ParentNodeID varchar(50) NULL
)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('A', null)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('B', 'A')
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('C', 'B')
Desired output:
ParentNodeID NodeID GenerationsRemoved
============ ====== ==================
NULL A 1
NULL B 2
NULL C 3
A B 1
A C 2
B C 1
Now the suggested CTE expression, with it's incorrect output:
WITH NodeChildren AS
(
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM ##Nodes
WHERE ParentNodeID IS NULL
UNION ALL
--recursive execution
SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
FROM NodeChildren AS P
INNER JOIN ##Nodes AS N
ON P.NodeID = N.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren
Actual output:
ParentNodeID NodeID GenerationsRemoved
============ ====== ==================
NULL A 1
NULL B 2
NULL C 3
Note: If SQL Server 2005† CTE cannot do what i was doing before in 2000‡, that's fine, and that's the answer. And whoever gives "it's not possible" as the answer will win the bounty. But i will wait a few days to make sure everyone concur's that it's not possible before i irrecovably give 250 reputation for a non-solution to my problem.
Nitpickers Corner
†not 2008
‡without resorting to a UDF*, which is the solution already have
*unless you can see a way to improve the performance of the UDF in the original question
Original Question
i have a table of Nodes, each with a parent that points to another Node (or to null).
For illustration:
1 My Computer
2 Drive C
4 Users
5 Program Files
7 Windows
8 System32
3 Drive D
6 mp3
i want a table that returns all the parent-child relationships, and the number of generations between them
For for all direct parent relationships:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 1 1
1 2 1
2 4 1
2 5 1
2 7 1
1 3 1
3 6 1
7 8 1
But then there's the grandparent relationships:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 2 2
(null) 3 2
1 4 2
1 5 2
1 7 2
1 6 2
2 8 2
And the there's the great-grand-grandparent relationships:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 4 3
(null) 5 3
(null) 7 3
(null) 6 3
1 8 3
So i can figure out the basic CTE initialization:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, NodeID AS ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
}
The problem now is the recursive part. The obvious answer, of course, doesn't work:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT parents.ParentNodeID, children.NodeID, parents.Generations+1
FROM NodeChildren parents
INNER JOIN NodeParents children
ON parents.NodeID = children.ParentNodeID
}
Msg 253, Level 16, State 1, Line 1
Recursive member of a common table expression 'NodeChildren' has multiple recursive references.
All the information needed to generate the entire recursive list is present in the inital CTE table. But if that's not allowed i'll try:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT parents.ParentNodeID, Nodes.NodeID, parents.Generations+1
FROM NodeChildren parents
INNER JOIN Nodes
ON parents.NodeID = nodes.ParentNodeID
}
But that fails because it's not only joining on the recursive elements, but keeps recursivly adding the same rows over and over:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
In SQL Server 2000 i simulated a CTE by using a User Defined Function (UDF):
CREATE FUNCTION [dbo].[fn_NodeChildren] ()
RETURNS #Result TABLE (
ParentNodeID int NULL,
ChildNodeID int NULL,
Generations int NOT NULL)
AS
/*This UDF returns all "ParentNode" - "Child Node" combinations
...even multiple levels separated
BEGIN
DECLARE #Generations int
SET #Generations = 1
--Insert into the Return table all "Self" entries
INSERT INTO #Result
SELECT ParentNodeID, NodeID, #Generations
FROM Nodes
WHILE ##rowcount > 0
BEGIN
SET #Generations = #Generations + 1
--Add to the Children table:
-- children of all nodes just added
-- (i.e. Where #Result.Generation = CurrentGeneration-1)
INSERT #Result
SELECT CurrentParents.ParentNodeID, Nodes.NodeID, #Generations
FROM Nodes
INNER JOIN #Result CurrentParents
ON Nodes.ParentNodeID = CurrentParents.ChildNodeID
WHERE CurrentParents.Generations = #Generations - 1
END
RETURN
END
And the magic to keep it from blowing up was the limiting where clause:
WHERE CurrentParents.Generations - #Generations-1
How do you prevent a recursive CTE from recursing forever?

Try this:
WITH Nodes AS
(
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM ##Nodes
UNION ALL
----recursive execution
SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
FROM Nodes AS P
INNER JOIN ##Nodes AS N
ON P.NodeID = N.ParentNodeID
WHERE P.GenerationsRemoved <= 10
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM Nodes
ORDER BY ParentNodeID, NodeID, GenerationsRemoved
Basically removing the "only show me absolute parents" from the initialization query; That way it generates the results starting from each of them and decending from there. I also added in the "WHERE P.GenerationsRemoved <= 10" as an infinite recursion catch(replace 10 with any number up to 100 to fit your needs). Then add the sort so it looks like the results you wanted.

Aside: do you have SQL Server 2008? This might be suited to the hierarchyid data type.

If I understand your intentions you can get you result by doing something like this:
DECLARE #StartID INT;
SET #StartID = 1;
WITH CTE (ChildNodeID, ParentNodeID, [Level]) AS
(
SELECT t1.ChildNodeID,
t1.ParentNodeID,
0
FROM tblNodes AS t1
WHERE ChildNodeID = #StartID
UNION ALL
SELECT t1.ChildNodeID,
t1.ParentNodeID,
t2.[Level]+1
FROM tblNodes AS t1
INNER JOIN CTE AS t2 ON t1.ParentNodeID = t2.ChildNodeID
)
SELECT t1.ChildNodeID, t2.ChildNodeID, t1.[Level]- t2.[Level] AS GenerationsDiff
FROM CTE AS t1
CROSS APPLY CTE t2
This will return the generation difference between all nodes, you can modify it for you exact needs.

Well, your answer is not quite that obvious :-)
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
This part is called the "anchor" part of the recursive CTE - but it should really only select one or a select few rows from your table - this selects everything!
I guess what you're missing here is simply a suitable WHERE clause:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
**WHERE ParentNodeID IS NULL**
However, I am afraid your requirement to have not just the "straight" hierarchy, but also the grandparent-child rows, might not be that easy to satisfy.... normally recursive CTE will only ever show one level and its direct subordinates (and that down the hierarchy, of course) - it doesn't usually skip one, two or even more levels.
Hope this helps a bit.
Marc

The issue is with the Sql Server default recursion limit (100). If you try your example at the top with the anchor restriction removed (also added Order By):
WITH NodeChildren AS
(
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
FROM NodeChildren AS P
inner JOIN Nodes AS N
ON P.NodeID = N.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren
ORDER BY ParentNodeID ASC
This produces the desired results. The problem you aare facing is with a larger number of rows you will recirse over 100 times which is a default limit. This can be changed by adding option (max recursion x) after your query, where x is a number between 1 and 32767. x can also be set to 0 which sets no limit however could very quickly have a very detrimental impact on your server performance. Clearly as the number of rows in Nodes increases, the number of recursions will can rise very quickly and I would avoid this approach unless there was a known upper limit on the rows in the table. For completeness, the final query should look like:
WITH NodeChildren AS
(
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
FROM NodeChildren AS P
inner JOIN Nodes AS N
ON P.NodeID = N.ParentNodeID
)
SELECT *
FROM NodeChildren
ORDER BY ParentNodeID
OPTION (MAXRECURSION 32767)
Where 32767 could be adjusted downwards to fit your scenario

Have you tried constructing a path in the CTE and using it to identify ancestors?
You can then subtract the descendant node depth from the ancestor node depth to calculate the GenerationsRemoved column, like so...
DECLARE #Nodes TABLE
(
NodeId varchar(50) PRIMARY KEY NOT NULL,
ParentNodeId varchar(50) NULL
)
INSERT INTO #Nodes (NodeId, ParentNodeId) VALUES ('A', NULL)
INSERT INTO #Nodes (NodeId, ParentNodeId) VALUES ('B', 'A')
INSERT INTO #Nodes (NodeId, ParentNodeId) VALUES ('C', 'B')
DECLARE #Hierarchy TABLE
(
NodeId varchar(50) PRIMARY KEY NOT NULL,
ParentNodeId varchar(50) NULL,
Depth int NOT NULL,
[Path] varchar(2000) NOT NULL
)
WITH Hierarchy AS
(
--initialization
SELECT NodeId, ParentNodeId, 0 AS Depth, CONVERT(varchar(2000), NodeId) AS [Path]
FROM #Nodes
WHERE ParentNodeId IS NULL
UNION ALL
--recursive execution
SELECT n.NodeId, n.ParentNodeId, p.Depth + 1, CONVERT(varchar(2000), p.[Path] + '/' + n.NodeId)
FROM Hierarchy AS p
INNER JOIN #Nodes AS n
ON p.NodeId = n.ParentNodeId
)
INSERT INTO #Hierarchy
SELECT *
FROM Hierarchy
SELECT parent.NodeId AS AncestorNodeId, child.NodeId AS DescendantNodeId, child.Depth - parent.Depth AS GenerationsRemoved
FROM #Hierarchy AS parent
INNER JOIN #Hierarchy AS child
ON child.[Path] LIKE parent.[Path] + '/%'

This breaks the recursion limit imposed on Chris Shaffer's answer.
I create a table with a cycle:
CREATE TABLE ##Nodes
(
NodeID varchar(50) PRIMARY KEY NOT NULL,
ParentNodeID varchar(50) NULL
)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('A', 'C');
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('B', 'A');
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('C', 'B');
In cases where there is a potential cycle (i.e. ParentNodeId IS NOT NULL), the generation removed is started at 2. We can then identity cycles by checking (P.ParentNodeID == N.NodeID), which we simply don't add it. Afterwards, we append the omitted generation remove = 1.
WITH ParentNodes AS
(
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM ##Nodes
WHERE ParentNodeID IS NULL
UNION ALL
SELECT P.ParentNodeID, N.NodeID, 2 AS GenerationsRemoved
FROM ##Nodes N
JOIN ##Nodes P ON N.ParentNodeID=P.NodeID
WHERE P.ParentNodeID IS NOT NULL
UNION ALL
----recursive execution
SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
FROM ParentNodes AS P
INNER JOIN ##Nodes AS N
ON P.NodeID = N.ParentNodeID
WHERE P.ParentNodeID IS NULL OR P.ParentNodeID <> N.NodeID
),
Nodes AS (
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM ##Nodes
WHERE ParentNodeID IS NOT NULL
UNION ALL
SELECT ParentNodeID, NodeID, GenerationsRemoved FROM ParentNodes
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM Nodes
ORDER BY ParentNodeID, NodeID, GenerationsRemoved

with cte as
(
select a=65, L=1
union all
select a+1, L=L+1
from cte
where L<=100
)
select
IsRecursion=Case When L>1 then 'Recursion' else 'Not Recursion' end,
AsciiValue=a,
AsciiCharacter=char(a)
from cte
Create a column containing the current level.
Check if the level is >1
My example here shows a recursive CTE that stops recursion after 100 levels (the max). As a bonus, it displays a bunch of ASCII characters and the corresponding numeric value.

Related

List all the ancestors or parent nodes of a particular node present at each level in pivoted table containing levels as attributes in SQL Server

I have a table 'temp' which has id and its immediate parent's id as columns. The table is as follows:
1
/ \
2 3
/|\ \
4 5 6 7
/
8
Hierarchy of nodes can be represented in tree structure as above.
Now, I want to list all the ancestor or parent nodes of each node present at all levels in pivoted table using recursive cte which has levels (like Level1, Level2 and so on) as its attributes. To get this output, I have calculated all the parent nodes in non pivoted table with level of each node with respect to its parent. The sql query for which is below:
WITH ctetable as
(
SELECT S.id, S.parent, 1 as level
FROM temp as S where S.parent is not null
UNION ALL
SELECT S2.id, p.parent, p.level + 1
FROM ctetable AS p JOIN temp as S2 on S2.parent = p.id
)
SELECT * FROM ctetable ORDER BY id;
The output of above query is shown below:
But, I want to pivot the recursive cte which contains the parent id under at each level of a particular node. Say, for id=4, it should display parent id 4, 2 and 1 under Level3, Level2 and Level1 respectively. For this I wrote the following query:
WITH ctetable as
(
SELECT S.id, S.parent, 1 as level
FROM temp as S where S.parent is not null
UNION ALL
SELECT S2.id, p.parent, p.level + 1
FROM ctetable AS p JOIN temp as S2 on S2.parent = p.id
)
SELECT
myid,
[pn].[1] AS [Level1],
[pn].[2] AS [Level2],
[pn].[3] AS [Level3]
FROM
(
SELECT [a].id,
[a].id as myid,
[a].level
FROM ctetable AS [a]
) AS [hn] PIVOT(max([hn].id) FOR [hn].level IN([1],[2],[3])) AS [pn]
But, the output table is not the desired one as it contains the same id repeated as parent id under each level for a particular node instead it should contain all the parents of that node under various levels. The output i got after executing the above query is shown below:
Can anybody help me out with this....
If you have a known or maximum number of levels, and assuming I did not reverse your desired results.
Also Best if you post sample data and desired results as text, not as an image
Example
Declare #YourTable Table ([id] int,[parent] int)
Insert Into #YourTable Values
(1,null)
,(2,1)
,(3,1)
,(7,3)
,(4,2)
,(5,2)
,(6,2)
,(8,4)
;with cteP as (
Select id
,Parent
,PathID = cast(10000+id as varchar(500))
From #YourTable
Where Parent is Null
Union All
Select id = r.id
,Parent = r.Parent
,PathID = cast(concat(p.PathID,',',10000+r.id) as varchar(500))
From #YourTable r
Join cteP p on r.Parent = p.id)
Select ID
,B.*
From cteP A
Cross Apply (
Select Level1 = xDim.value('/x[1]','int')-10000
,Level2 = xDim.value('/x[2]','int')-10000
,Level3 = xDim.value('/x[3]','int')-10000
,Level4 = xDim.value('/x[4]','int')-10000
,Level5 = xDim.value('/x[5]','int')-10000
From (Select Cast('<x>' + replace(PathID,',','</x><x>')+'</x>' as xml) as xDim) as X
) B
Order By PathID
Returns
EDIT - Added +10000
I added the +10000 so that the sequence will be maintained

How to present hierarchical data in SQL Server 2014

I have two tables Company and CompanyRelationShip.
DECLARE #Company TABLE (
CompanyId INT
,RootCompanyId INT
,CompanyName VARCHAR(100)
)
INSERT INTO #Company
VALUES (2,2,'ROOT')
,(106,2,'ABC')
,(105,2,'CDF')
,(3,3,'ROOT2')
,(150,3,'YXZ')
,(151,3,'XZX')
DECLARE #CompanyRelationShip TABLE (
PrimaryCompanyId INT
,CompanyId INT
)
INSERT INTO #CompanyRelationShip
VALUES (2,2)
,(2,106)
,(2,105)
,(106,105)
,(3,3)
,(3,151)
,(3,150)
,(151,150)
I want the result in the below format
CompanyId PrimayCompanyId PrimaryCompanyName RootCompanyId RootCompanyName
2 2 ROOT 2 ROOT
106 2 ROOT 2 ROOT
105 106 ABC 2 ROOT
3 3 ROOT2 3 ROOT2
151 3 ROOT2 3 ROOT2
150 151 XZX 3 ROOT2
I have tried the below query to get the result
WITH PrimayCompany
AS (
SELECT CR.PrimaryCompanyId
,C.CompanyName
FROM #CompanyRelationShip CR
JOIN #Company C ON CR.CompanyId = CR.PrimaryCompanyId
)
,RootCompany
AS (
SELECT RootCompanyId
,CompanyName
FROM #Company
WHERE CompanyId = RootCompanyId
)
SELECT C.CompanyId
,C.RootCompanyId
,RC.CompanyName
,CR.PrimaryCompanyId
,PC.CompanyName
FROM #Company C
LEFT JOIN #CompanyRelationShip CR ON C.CompanyId = CR.PrimaryCompanyId
LEFT JOIN PrimayCompany PC ON PC.PrimaryCompanyId = CR.PrimaryCompanyId
LEFT JOIN RootCompany RC ON RC.RootCompanyId = CR.PrimaryCompanyId
I would really appreciate a bit of help.
In my comment I asked you, why you would need the table #CompanyRelationShip at all... This is just adding a hell of a lot of complexity and potentials for errors.
My suggestion relies on the first table alone. Look, how I've changed the parent IDs of 105 and 151 to place them below in the hierarchy. Just to show the principles I've added a second child below 150:
DECLARE #Company TABLE (
CompanyId INT
,RootCompanyId INT
,CompanyName VARCHAR(100)
);
INSERT INTO #Company
VALUES (2,2,'ROOT')
,(106,2,'ABC')
,(105,106,'CDF')
,(3,3,'ROOT2')
,(150,3,'YXZ')
,(151,150,'XZX')
,(152,150,'Second below 150');
--the query
WITH recCTE AS
(
SELECT CompanyId AS [RootId],CompanyName AS [RootName],*,1 AS HierarchyLevel FROM #Company WHERE CompanyId=RootCompanyId
UNION ALL
SELECT rc.RootId,rc.RootName,c.*,rc.HierarchyLevel+1
FROM #Company c
INNER JOIN recCTE rc ON c.RootCompanyId=rc.CompanyId AND c.CompanyId<>rc.CompanyId
)
SELECT RootId
,RootName
,RootCompanyId AS [PrevId]
,CompanyId
,CompanyName
,HierarchyLevel
FROM recCTE rc
ORDER BY RootId,HierarchyLevel;
The result
RootId RootName PrevId CompanyId CompanyName HierarchyLevel
2 ROOT 2 2 ROOT 1
2 ROOT 2 106 ABC 2
2 ROOT 106 105 CDF 3
3 ROOT2 3 3 ROOT2 1
3 ROOT2 3 150 YXZ 2
3 ROOT2 150 151 XZX 3
3 ROOT2 150 152 Second below 150 3
The idea in short:
We use a a recursive CTE (which is a rather iterative concept actually).
The first SELECT (the anchor) starts with the companies, where the two IDs match.
The second SELECT after UNION ALL picks the next level by joining to the intermediate result line
The two columns RootId and RootName are just passed through to show up in your final set.
The HierarchyLevel is the position within the line, thus placing 105 within ROOT, but below 106.
Hope this helps...
A solution for the given structure
As told, the given structure is not the best choice and should be altered. But if you have to stick to this, you might try something along this:
WITH recCTE AS
(
SELECT CompanyId AS [RootId],CompanyName AS [RootName],*,1 AS HierarchyLevel FROM #Company WHERE CompanyId=RootCompanyId
UNION ALL
SELECT rc.RootId,rc.RootName,c.*,rc.HierarchyLevel+1
FROM #Company c
INNER JOIN recCTE rc ON c.RootCompanyId=rc.CompanyId AND c.CompanyId<>rc.CompanyId
)
SELECT rc.CompanyId
,rc.CompanyName
,COALESCE(crs.PrimaryCompanyId,rc.RootCompanyId) AS ComputedPrevId
,COALESCE(c1.CompanyName,rc.RootName) AS ComputedPrevName
,rc.RootId
,rc.RootName
FROM recCTE rc
LEFT JOIN #CompanyRelationShip crs ON rc.CompanyId=crs.CompanyId AND rc.RootCompanyId<>crs.PrimaryCompanyId
LEFT JOIN #Company c1 ON crs.PrimaryCompanyId=c1.CompanyId
ORDER BY rc.RootId,rc.HierarchyLevel;
This will first use a recursive CTE to find the children below their root companies and the will try to find the corresponding line in your relationship table.
If you use just SELECT * instead of the column list you can see the full set.
Using LEFT JOIN will return NULL, when the ON claus is not met.
COALESCE will return the first non-NULL value, so - hopefully - the one you are after.

SQL Server skip duplicate relations (parent child) in recursion

I have a tree, where specific node in tree can appear in another node in tree. (2 in my example):
1
/ \
2 3
/ \ \
4 5 6
\
2
/ \
4 5
Notice 2 is duplicated. First under 1, and second under 6.
My recursion is:
with cte (ParentId, ChildId, Field1, Field2) AS (
select BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2
from BillOfMaterials BOM
WHERE ParentId=x
UNION ALL
SELECT BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2 FROM BillOfMaterials BOM
JOIN cte on BOM.ParentId = cte.ChildId
)
select * from cte;
But the problem is that in result relation 2-4 and 2-5 is duplicated (first from relation 1-2 and second from relation 6-2):
ParentId ChildId OtherFields
1 2
1 3
2 4 /*from 1-2*/
2 5 /*from 1-2*/
3 6
6 2
2 4 /*from 6-2*/
2 5 /*from 6-2*/
Is there any way, to skip visiting duplicated relationships? I do no see any logic why should recursion run over rows that are already in result. It would be faster. Something like that:
with cte (ParentId, ChildId, Field1, Field2) AS (
select BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2
from BillOfMaterials BOM
WHERE ParentId=x
UNION ALL
SELECT BOM.ParentId, BOM.ChildId, BOM.Field1, BOM.Field2 FROM BillOfMaterials BOM
JOIN cte on BOM.ParentId = cte.ChildId
------> WHERE (select count(*) FROM SoFarCollectedResult WHERE ParentId=BOM.ParentId AND ChildId=BOM.ChildId ) = 0
)
select * from cte;
I found this thread, but it is 8 years old.
I am using SQL server 2016.
If this is not possible, then my question is how can I remove duplicates from final result, but check distinct only on ParentId and ChildId columns?
Edited:
Expected result is:
ParentId ChildId OtherFields
1 2
1 3
2 4
2 5
3 6
6 2
You can, with adding to 2 little tricks to the SQL.
But you need an extra Id column with a sequential number.
For example via an identity, or a datetime field that shows when the record was inserted.
For the simple reason that as far the database is concerned, there is no order in the records as they were inserted, unless you got a column that indicates that order.
Trick 1) Join the CTE record only to Id's that are higher. Because if they were lower then those are the duplicates you don't want to join.
Trick 2) Use the window function Row_number to get only those that are nearest to the Id the recursion started from
Example:
declare #BillOfMaterials table (Id int identity(1,1) primary key, ParentId int, ChildId int, Field1 varchar(8), Field2 varchar(8));
insert into #BillOfMaterials (ParentId, ChildId, Field1, Field2) values
(1,2,'A','1-2'),
(1,3,'B','1-3'),
(2,4,'C','2-4'), -- from 1-2
(2,5,'D','2-5'), -- from 1-2
(3,6,'E','3-6'),
(6,2,'F','6-2'),
(2,4,'G','2-4'), -- from 6-2
(2,5,'H','2-5'); -- from 6-2
;with cte AS
(
select Id as BaseId, 0 as Level, BOM.*
from #BillOfMaterials BOM
WHERE ParentId in (1)
UNION ALL
SELECT CTE.BaseId, CTE.Level + 1, BOM.*
FROM cte
JOIN #BillOfMaterials BOM on (BOM.ParentId = cte.ChildId and BOM.Id > CTE.Id)
)
select ParentId, ChildId, Field1, Field2
from (
select *
--, row_number() over (partition by BaseId, ParentId, ChildId order by Id) as RNbase
, row_number() over (partition by ParentId, ChildId order by Id) as RN
from cte
) q
where RN = 1
order by ParentId, ChildId;
Result:
ParentId ChildId Field1 Field2
-------- ------- ------ ------
1 2 A 1-2
1 3 B 1-3
2 4 C 2-4
2 5 D 2-5
3 6 E 3-6
6 2 F 6-2
Anyway, as a sidenote, normally a Parent-Child relation table is used differently.
More often it's just a table with unique Parent-Child combinations that are foreign keys to another table where that Id is a primary key. So that the other fields are kept in that other table.
Change your last query from:
select * from cte;
To:
select * from cte group by ParentId, ChildId;
This will essentially take what you have right now, but go one step further and remove rows that have already appeared, which would take care of your duplicate problem. Just be sure that all * returns here is ParentId and ChildId, should it be returning other columns you will need to either add them to the GROUP BY or apply some sort of aggregator to it so that it can still group (max, min, count...).
Should you have more rows that you can't aggregate or group on, you could write the query as such:
select * from cte where ID in (select MAX(ID) from cte group by ParentId, ChildId);
Where ID would be your primary table id for cte. This would take the maximum id when rows matched, which would normally be your latest entry, if you want the earliest entry just change MAX() to MIN().

SQL Server Hierarchical Sum of column

I have my database design as per the diagram.
Category table is self referencing parent child relationship
Budget will have all the categories and amount define for each category
Expense table will have entries for categories for which the amount has been spend (consider Total column from this table).
I want to write select statement that will retrieve dataset with columns given below :
ID
CategoryID
CategoryName
TotalAmount (Sum of Amount Column of all children hierarchy From BudgetTable )
SumOfExpense (Sum of Total Column of Expense all children hierarchy from expense table)
I tried to use a CTE but was unable to produce anything useful. Thanks for your help in advance. :)
Update
I just to combine and simplify data I have created one view with the query below.
SELECT
dbo.Budget.Id, dbo.Budget.ProjectId, dbo.Budget.CategoryId,
dbo.Budget.Amount,
dbo.Category.ParentID, dbo.Category.Name,
ISNULL(dbo.Expense.Total, 0) AS CostToDate
FROM
dbo.Budget
INNER JOIN
dbo.Category ON dbo.Budget.CategoryId = dbo.Category.Id
LEFT OUTER JOIN
dbo.Expense ON dbo.Category.Id = dbo.Expense.CategoryId
Basically that should produce results like this.
This is an interesting problem. And I'm going to solve it with a hierarchyid. First, the setup:
USE tempdb;
IF OBJECT_ID('dbo.Hierarchy') IS NOT NULL
DROP TABLE dbo.[Hierarchy];
CREATE TABLE dbo.Hierarchy
(
ID INT NOT NULL PRIMARY KEY,
ParentID INT NULL,
CONSTRAINT [FK_parent] FOREIGN KEY ([ParentID]) REFERENCES dbo.Hierarchy([ID]),
hid HIERARCHYID,
Amount INT NOT null
);
INSERT INTO [dbo].[Hierarchy]
( [ID], [ParentID], [Amount] )
VALUES
(1, NULL, 100 ),
(2, 1, 50),
(3, 1, 50),
(4, 2, 58),
(5, 2, 7),
(6, 3, 10),
(7, 3, 20)
SELECT * FROM dbo.[Hierarchy] AS [h];
Next, to update the hid column with a proper value for the hiearchyid. I'll use a bog standard recursive cte for that
WITH cte AS (
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST('/' + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
WHERE [h].[ParentID] IS NULL
UNION ALL
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST([c].[h] + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
JOIN [cte] AS [c]
ON [h].[ParentID] = [c].[ID]
)
UPDATE [h]
SET hid = [cte].[h]
FROM cte
JOIN dbo.[Hierarchy] AS [h]
ON [h].[ID] = [cte].[ID];
Now that the heavy lifting is done, the results you want are almost trivially obtained:
SELECT p.id, SUM([c].[Amount])
FROM dbo.[Hierarchy] AS [p]
JOIN [dbo].[Hierarchy] AS [c]
ON c.[hid].IsDescendantOf(p.[hid]) = 1
GROUP BY [p].[ID];
After much research and using test data, I was able to get the running totals starting from bottom of hierarchy.
The solution is made up of two steps.
Create a scalar-valued function that will decide whether a categoryId is a direct or indirect child of another categoryId. This is given in first code-snippet. Note that a recursive query is used for this since that is the best approach when dealing with hierarchy in SQL Server.
Write the running total query that will give totals according to your requirements for all categories. You can filter by category if you wanted to on this query. The second code snippet provides this query.
Scalar-valued function that tells if a child category is a direct or indirect child of another category
CREATE FUNCTION dbo.IsADirectOrIndirectChild(
#childId int, #parentId int)
RETURNS int
AS
BEGIN
DECLARE #isAChild int;
WITH h(ParentId, ChildId)
-- CTE name and columns
AS (
SELECT TOP 1 #parentId, #parentId
FROM dbo.Category AS b
UNION ALL
SELECT b.ParentId, b.Id AS ChildId
FROM h AS cte
INNER JOIN
Category AS b
ON b.ParentId = cte.ChildId AND
cte.ChildId IS NOT NULL)
SELECT #isAChild = ISNULL(ChildId, 0)
FROM h
WHERE ChildId = #childId AND
ParentId <> ChildId
OPTION(MAXRECURSION 32000);
IF #isAChild > 0
BEGIN
SET #isAChild = 1;
END;
ELSE
BEGIN
SET #isAChild = 0;
END;
RETURN #isAChild;
END;
GO
Query for running total starting from bottom of hierarchy
SELECT c.Id AS CategoryId, c.Name AS CategoryName,
(
SELECT SUM(ISNULL(b.amount, 0))
FROM dbo.Budget AS b
WHERE dbo.IsADirectOrIndirectChild( b.CategoryId, c.Id ) = 1 OR
b.CategoryId = c.Id
) AS totalAmount,
(
SELECT SUM(ISNULL(e.total, 0))
FROM dbo.Expense AS e
WHERE dbo.IsADirectOrIndirectChild( e.CategoryId, c.Id ) = 1 OR
e.CategoryId = c.Id
) AS totalCost
FROM dbo.Category AS c;

Displaying sorted hierarchy rows in SQL server?

Assuming I have this table : ( c is a child of parent p)
c p
------
40 0
2 3
2 40
3 1
7 2
1 0
Where (0 means root) — I want the order of select to be displayed as :
c b
------
1 0
3 1
2 3
40 0
2 40
7 2
That's becuase we have 2 roots (1,40) and 1 < 40.
So we start at 1 and then display below it - all it's descendants.
Then we get to 40. same logic again.
Question:
How can I do it ?
I've succeeded to display it recursively + finding level of hierarchy*(not sure if it helps though)*
WITH cte(c, p) AS (
SELECT 40, 0 UNION ALL
SELECT 2,3 UNION ALL
SELECT 2,40 UNION ALL
SELECT 3,1 UNION ALL
SELECT 7,2 UNION ALL
SELECT 1,0
) , cte2 AS(
SELECT c,
p,
PLevel = 1
FROM cte
WHERE p = 0
UNION ALL
SELECT cte.c,
cte.p,
PLevel = cte2.PLevel + 1
FROM cte
INNER JOIN cte2
ON cte2.c = cte.p
)
SELECT *
FROM cte2
Full SQL fiddle
You have almost done it. Just add a rank to identify each group and then sort the data on it.
Also, as you are working with more complex hierarchy we need to change the [level] value. In is now not a number, put the full path of the current element to its parent. Where \ means parent. For example the following string:
\1\5\4\1
represents the hierarchy below:
1
--> 5
--> 4
--> 1
I get the idea from hierarchyid type. You may want to consider storing hierarchies using it, as it has handy build-in functions for working with such structures.
Here is full working example with the new data:
DECLARE #DataSource TABLE
(
[c] TINYINT
,[p] TINYINT
);
INSERT INTO #DataSource ([c], [p])
VALUES (1,0)
,(3, 1)
,(2, 3)
,(5,1)
,(7, 2)
,(40, 0)
,(2, 40);
WITH DataSource ([c], [p], [level], [rank])AS
(
SELECT [c]
,[p]
,CAST('/' AS VARCHAR(24))
,ROW_NUMBER() OVER (ORDER BY [c] ASC)
FROM #DataSource
WHERE [p] = 0
UNION ALL
SELECT DS.[c]
,DS.[p]
,CAST(DS1.[level] + CAST(DS.[c] AS VARCHAR(3)) + '/' AS VARCHAR(24))
,DS1.[rank]
FROM #DataSource DS
INNER JOIN DataSource DS1
ON DS1.[c] = DS.[p]
)
SELECT [c]
,[p]
FROM DataSource
ORDER BY [Rank]
,CAST([level] AS hierarchyid);
Again, pay attention to the node (7,2) which is participating in the two groups (even in your example). I guess this is just a sample data and you have a way to defined where the node should be included.

Resources