Recursive hierarchical traversal in MSSQL - sql-server

Using MSSQL, I am trying to traverse a table with a parent-child relationship. I need a result set that lists all elements in properly nested order, down to the last leaf, as shown below.
Parent item 36 has two children, 17 and 18. Each of those children has one child of its own: 26 and 42, respectively.
36 - 17
17 - 26
36 - 18
18 - 42
My recursion works in terms of data traversal, but the ordering is wrong. My recursive query gives me the following output:
36 - 17
36 - 18
17 - 26
18 - 42
It returns all rows of one level first and only then moves on to the children of those rows.
Oracle's "connect by prior" seems to be working fine, but MSSQL is not. Here is a sample of what I am using:
WITH SRC (Level, PARITEMID, CHIITEMID) AS
(
SELECT
0 as Level,
PI.pitem_id as PARITEMID,
CI.pitem_id as CHIITEMID
FROM PI, CI JOIN <Condition> where PI.PITEM_ID =
UNION ALL
SELECT
Level + 1,
PI1.pitem_id as PARITEMID,
CI1.pitem_id AS CHIITEMID
FROM PI1, CI1 JOIN <Condition>
)
Select * from SRC
Is there something I need to do on the SRC I obtain by ordering it, or is there fundamentally something wrong with the recursion itself?

Wasn't clear on your field names so I assumed the following:
cItem_ID - Child ID
pItem_ID - Parent ID
item_Title - Item name/Description
Also, not clear on the sequence, so I assumed item_Title (alphabetical). However, you can use any available field (see the "10000+Row_Number()" lines).
I should note that cteR1 and cteR2 are not necessary. I do like the range keys; they serve many purposes. If you remove them, just set the final Order By to Order By A.Seq.
Declare @MyTable table (pItem_ID int,cItem_ID int,item_Title varchar(50))
Insert into @MyTable values
(null,36,'Item 36')
,(36,17,'Item 17')
,(17,26,'Item 26')
,(36,18,'Item 18')
,(18,42,'Item 42')
Declare @Top int = null --<< Sets top of Hier, e.g. try 17
Declare @Nest varchar(25) = '|-----' --<< Optional: Added for readability
;with cteP as (
Select Seq = cast(10000+Row_Number() over (Order by item_Title) as varchar(500))
,cItem_ID
,pItem_ID
,Lvl=1
,item_Title
From @MyTable
Where IsNull(@Top,-1) = case when @Top is null then isnull(pItem_ID,-1) else cItem_ID end
Union All
Select Seq = cast(concat(p.Seq,'.',10000+Row_Number() over (Order by r.item_Title)) as varchar(500))
,r.cItem_ID
,r.pItem_ID
,p.Lvl+1
,r.item_Title
From @MyTable r
Join cteP p on r.pItem_ID = p.cItem_ID)
,cteR1 as (Select *,R1=Row_Number() over (Order By Seq) From cteP)
,cteR2 as (Select A.Seq,A.cItem_ID,R2=Max(B.R1) From cteR1 A Join cteR1 B on (B.Seq like A.Seq+'%') Group By A.Seq,A.cItem_ID )
Select A.R1
,B.R2
,A.cItem_ID
,A.pItem_ID
,A.Lvl
,item_Title = Replicate(@Nest,A.Lvl-1) + A.item_Title
From cteR1 A
Join cteR2 B on A.cItem_ID=B.cItem_ID
Order By A.R1
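If only the depth-first ordering is needed, a minimal sketch without the range keys (cteR1/cteR2) might look like the following. It reuses the sample rows from above and assumes, as the answer does, that roots have a NULL pItem_ID and that siblings are sequenced by item_Title. The 10000 offset is what keeps the string comparison on Seq meaningful, since every sibling number then has the same number of digits.

```sql
-- Sample rows copied from the answer above.
DECLARE @MyTable TABLE (pItem_ID INT, cItem_ID INT, item_Title VARCHAR(50));
INSERT INTO @MyTable VALUES
 (NULL,36,'Item 36'),(36,17,'Item 17'),(17,26,'Item 26'),(36,18,'Item 18'),(18,42,'Item 42');

;WITH cteP AS (
    -- Anchor: rows without a parent are the roots.
    SELECT Seq = CAST(10000 + ROW_NUMBER() OVER (ORDER BY item_Title) AS VARCHAR(500))
          ,cItem_ID, pItem_ID, Lvl = 1
    FROM @MyTable
    WHERE pItem_ID IS NULL
    UNION ALL
    -- Recursive step: append the child's sibling number to the parent's path.
    SELECT Seq = CAST(CONCAT(p.Seq, '.', 10000 + ROW_NUMBER() OVER (ORDER BY r.item_Title)) AS VARCHAR(500))
          ,r.cItem_ID, r.pItem_ID, p.Lvl + 1
    FROM @MyTable r
    JOIN cteP p ON r.pItem_ID = p.cItem_ID
)
SELECT pItem_ID, cItem_ID, Lvl, Seq
FROM cteP
WHERE pItem_ID IS NOT NULL   -- hide the root row to get parent/child pairs only
ORDER BY Seq;                -- sorting by the path yields parent, then its subtree
```

For the sample data this should list the pairs in the order the question asks for: 36-17, 17-26, 36-18, 18-42.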

Related

How to present hierarchical data in SQL Server 2014

I have two tables Company and CompanyRelationShip.
DECLARE @Company TABLE (
CompanyId INT
,RootCompanyId INT
,CompanyName VARCHAR(100)
)
INSERT INTO @Company
VALUES (2,2,'ROOT')
,(106,2,'ABC')
,(105,2,'CDF')
,(3,3,'ROOT2')
,(150,3,'YXZ')
,(151,3,'XZX')
DECLARE @CompanyRelationShip TABLE (
PrimaryCompanyId INT
,CompanyId INT
)
INSERT INTO @CompanyRelationShip
VALUES (2,2)
,(2,106)
,(2,105)
,(106,105)
,(3,3)
,(3,151)
,(3,150)
,(151,150)
I want the result in the format below:
CompanyId  PrimaryCompanyId  PrimaryCompanyName  RootCompanyId  RootCompanyName
2          2                 ROOT                2              ROOT
106        2                 ROOT                2              ROOT
105        106               ABC                 2              ROOT
3          3                 ROOT2               3              ROOT2
151        3                 ROOT2               3              ROOT2
150        151               XZX                 3              ROOT2
I have tried the query below to get the result:
;WITH PrimayCompany
AS (
SELECT CR.PrimaryCompanyId
,C.CompanyName
FROM @CompanyRelationShip CR
JOIN @Company C ON CR.CompanyId = CR.PrimaryCompanyId
)
,RootCompany
AS (
SELECT RootCompanyId
,CompanyName
FROM @Company
WHERE CompanyId = RootCompanyId
)
SELECT C.CompanyId
,C.RootCompanyId
,RC.CompanyName
,CR.PrimaryCompanyId
,PC.CompanyName
FROM @Company C
LEFT JOIN @CompanyRelationShip CR ON C.CompanyId = CR.PrimaryCompanyId
LEFT JOIN PrimayCompany PC ON PC.PrimaryCompanyId = CR.PrimaryCompanyId
LEFT JOIN RootCompany RC ON RC.RootCompanyId = CR.PrimaryCompanyId
I would really appreciate a bit of help.
In my comment I asked why you would need the table @CompanyRelationShip at all... It just adds a lot of complexity and potential for errors.
My suggestion relies on the first table alone. Note how I've changed the parent IDs of 105 and 151 to place them lower in the hierarchy. Just to show the principle, I've added a second child below 150:
DECLARE @Company TABLE (
CompanyId INT
,RootCompanyId INT
,CompanyName VARCHAR(100)
);
INSERT INTO @Company
VALUES (2,2,'ROOT')
,(106,2,'ABC')
,(105,106,'CDF')
,(3,3,'ROOT2')
,(150,3,'YXZ')
,(151,150,'XZX')
,(152,150,'Second below 150');
--the query
WITH recCTE AS
(
SELECT CompanyId AS [RootId],CompanyName AS [RootName],*,1 AS HierarchyLevel FROM @Company WHERE CompanyId=RootCompanyId
UNION ALL
SELECT rc.RootId,rc.RootName,c.*,rc.HierarchyLevel+1
FROM @Company c
INNER JOIN recCTE rc ON c.RootCompanyId=rc.CompanyId AND c.CompanyId<>rc.CompanyId
)
SELECT RootId
,RootName
,RootCompanyId AS [PrevId]
,CompanyId
,CompanyName
,HierarchyLevel
FROM recCTE rc
ORDER BY RootId,HierarchyLevel;
The result
RootId  RootName  PrevId  CompanyId  CompanyName       HierarchyLevel
2       ROOT      2       2          ROOT              1
2       ROOT      2       106        ABC               2
2       ROOT      106     105        CDF               3
3       ROOT2     3       3          ROOT2             1
3       ROOT2     3       150        YXZ               2
3       ROOT2     150     151        XZX               3
3       ROOT2     150     152        Second below 150  3
The idea in short:
We use a recursive CTE (which is actually a rather iterative concept).
The first SELECT (the anchor) starts with the companies where the two IDs match.
The second SELECT after UNION ALL picks the next level by joining to the intermediate result.
The two columns RootId and RootName are just passed through to show up in your final set.
The HierarchyLevel is the position within the line, thus placing 105 within ROOT, but below 106.
Hope this helps...
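To make the "rather iterative" remark concrete, here is a rough sketch of the same traversal written as an explicit loop. This is not how SQL Server evaluates the CTE internally; it only mirrors the anchor/next-level idea described above. The names @Result, @Level and @Rows are made up for this illustration.

```sql
-- Sample data copied from the answer above.
DECLARE @Company TABLE (CompanyId INT, RootCompanyId INT, CompanyName VARCHAR(100));
INSERT INTO @Company VALUES
 (2,2,'ROOT'),(106,2,'ABC'),(105,106,'CDF')
,(3,3,'ROOT2'),(150,3,'YXZ'),(151,150,'XZX'),(152,150,'Second below 150');

-- Hypothetical work table that plays the role of the CTE's intermediate result.
DECLARE @Result TABLE (RootId INT, RootName VARCHAR(100), CompanyId INT,
                       RootCompanyId INT, CompanyName VARCHAR(100), HierarchyLevel INT);
DECLARE @Level INT = 1;
DECLARE @Rows INT;

-- Anchor step: companies that are their own root.
INSERT INTO @Result
SELECT CompanyId, CompanyName, CompanyId, RootCompanyId, CompanyName, @Level
FROM @Company
WHERE CompanyId = RootCompanyId;
SET @Rows = @@ROWCOUNT;

-- Iteration: join only the rows added in the previous step; stop when nothing new is found.
WHILE @Rows > 0
BEGIN
    INSERT INTO @Result
    SELECT r.RootId, r.RootName, c.CompanyId, c.RootCompanyId, c.CompanyName, @Level + 1
    FROM @Company c
    INNER JOIN @Result r ON c.RootCompanyId = r.CompanyId AND c.CompanyId <> r.CompanyId
    WHERE r.HierarchyLevel = @Level;
    SET @Rows = @@ROWCOUNT;   -- capture the count before any other statement resets it
    SET @Level = @Level + 1;
END;

SELECT RootId, RootName, RootCompanyId AS PrevId, CompanyId, CompanyName, HierarchyLevel
FROM @Result
ORDER BY RootId, HierarchyLevel;
```

Like the CTE, this loop assumes the data contains no cycles; a cycle would keep producing new rows indefinitely.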
A solution for the given structure
As mentioned, the given structure is not the best choice and should be altered. But if you have to stick with it, you might try something along these lines:
WITH recCTE AS
(
SELECT CompanyId AS [RootId],CompanyName AS [RootName],*,1 AS HierarchyLevel FROM @Company WHERE CompanyId=RootCompanyId
UNION ALL
SELECT rc.RootId,rc.RootName,c.*,rc.HierarchyLevel+1
FROM @Company c
INNER JOIN recCTE rc ON c.RootCompanyId=rc.CompanyId AND c.CompanyId<>rc.CompanyId
)
SELECT rc.CompanyId
,rc.CompanyName
,COALESCE(crs.PrimaryCompanyId,rc.RootCompanyId) AS ComputedPrevId
,COALESCE(c1.CompanyName,rc.RootName) AS ComputedPrevName
,rc.RootId
,rc.RootName
FROM recCTE rc
LEFT JOIN @CompanyRelationShip crs ON rc.CompanyId=crs.CompanyId AND rc.RootCompanyId<>crs.PrimaryCompanyId
LEFT JOIN @Company c1 ON crs.PrimaryCompanyId=c1.CompanyId
ORDER BY rc.RootId,rc.HierarchyLevel;
This will first use a recursive CTE to find the children below their root companies and will then try to find the corresponding line in your relationship table.
If you use just SELECT * instead of the column list, you can see the full set.
Using LEFT JOIN will return NULL when the ON clause is not met.
COALESCE will return the first non-NULL value, so - hopefully - the one you are after.
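The LEFT JOIN plus COALESCE fallback can be seen in isolation in a small sketch. The tables @Child and @Override and their columns are invented for this illustration and are not part of the question's schema.

```sql
-- Hypothetical tables: a default parent per row, plus optional overrides.
DECLARE @Child TABLE (Id INT, DefaultParent INT);
DECLARE @Override TABLE (Id INT, Parent INT);
INSERT INTO @Child VALUES (105, 2), (106, 2);
INSERT INTO @Override VALUES (105, 106);   -- only 105 has an override

SELECT c.Id
      ,o.Parent AS OverrideParent                              -- NULL when no row satisfies the ON clause
      ,COALESCE(o.Parent, c.DefaultParent) AS EffectiveParent  -- first non-NULL value wins
FROM @Child c
LEFT JOIN @Override o ON o.Id = c.Id
ORDER BY c.Id;
```

Row 105 picks up its override (106), while row 106 has no match, so OverrideParent is NULL and EffectiveParent falls back to 2.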

Order By A Value In Another Field

I have a job definition table with example data, shown below, that needs to be sorted in such a way that records that have a NextJobDefinitionID > 0 are kept together. The sort order for records where the NextJobDefinitionID = 0 does not matter. In the example the record with the JobName of "M1 P1" must follow "M1 Pre-Roll" and "M1 Pre-Roll" must follow "M1 Recurring Benefits". I am using SQL Server 2014.
Data: (posted as an image in the original question; not reproduced here)
My desired output would be:
M1 Recurring Benefits
M1 Pre-Roll
M1 P1
I believe this constructs the required ordering:
declare @t table (ID int,NextID int)
insert into @t(ID,NextID) values
(1,0),
(2,5),
(3,6),
(4,2),
(5,0),
(6,4)
;With Parents as (
select ID,ID as ParentID, 0 as Depth, NextID
from @t
where ID not in (select NextID from @t)
union all
select p.NextID,p.ParentID,Depth+1,t.NextID
from Parents p
inner join
@t t
on
p.NextID = t.ID
where p.NextID != 0
)
select * from Parents
order by ParentID,Depth
It works by building a CTE by using rows which may be freely ordered as the base case and then following the NextID values along the chain, keeping the original ParentID and increasing a Depth value, to then be able to have a simple ORDER BY at the end.
(Translating back to your original column/table/sample data left as an exercise for the reader, since as I say, I don't need the typing practice to transcribe it from an image)
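As a rough sketch of that translation, the same chain-following CTE could be applied to the column names that appear elsewhere in this thread (JobDefinitionID, JobName, NextJobDefinitionID). The sample rows and ID values below are invented, since the real data was only posted as an image; only the three job names come from the question. HeadID and Depth are hypothetical helper columns.

```sql
-- Hypothetical sample rows; IDs are made up for illustration.
DECLARE @JobDefinitions TABLE (JobDefinitionID INT, JobName VARCHAR(50), NextJobDefinitionID INT);
INSERT INTO @JobDefinitions VALUES
 (1,'Standalone job',0)
,(2,'M1 Pre-Roll',3)
,(3,'M1 P1',0)
,(4,'M1 Recurring Benefits',2);

;WITH Chains AS (
    -- Chain heads: jobs that no other job points to.
    SELECT JobDefinitionID, JobName, NextJobDefinitionID,
           JobDefinitionID AS HeadID, 0 AS Depth
    FROM @JobDefinitions
    WHERE JobDefinitionID NOT IN (SELECT NextJobDefinitionID FROM @JobDefinitions)
    UNION ALL
    -- Follow NextJobDefinitionID, carrying the head's ID along the chain.
    SELECT j.JobDefinitionID, j.JobName, j.NextJobDefinitionID,
           c.HeadID, c.Depth + 1
    FROM Chains c
    INNER JOIN @JobDefinitions j ON j.JobDefinitionID = c.NextJobDefinitionID
    WHERE c.NextJobDefinitionID <> 0
)
SELECT JobDefinitionID, JobName, NextJobDefinitionID
FROM Chains
ORDER BY HeadID, Depth;
```

With these made-up rows, "M1 Recurring Benefits", "M1 Pre-Roll" and "M1 P1" come out adjacent and in that order. The NOT IN test assumes NextJobDefinitionID is never NULL.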
If I understand correctly, you need something like this:
(select JobDefinitionID, FloatingJobID, JobName, NextJobDefinitionID from JobDefinitions
where NextJobDefinitionID <> 0)
UNION ALL
(select JobDefinitionID, FloatingJobID, JobName, 9223372036854775807 AS NextJobDefinitionID from JobDefinitions WHERE JobDefinitionID = (SELECT MAX(NextJobDefinitionID) FROM JobDefinitions))
order by NextJobDefinitionID

How do I exclude rows when an incremental value starts over?

I am a newbie poster but have spent a lot of time researching answers here. I can't quite figure out how to create a SQL result set using SQL Server 2008 R2 that should probably be using lead/lag from more modern versions. I am trying to aggregate data based on sequencing of one column, but there can be varying numbers of instances in each sequence. The only way I know a sequence has ended is when the next row has a lower sequence number. So it may go 1-2, 1-2-3-4, 1-2-3, and I have to figure out how to make 3 aggregates out of that.
Source data is joined tables that look like this (please help me format):
recordID  instanceDate    moduleID  iResult  interactionNum
1356      10/6/15 16:14   1         68       1
1357      10/7/15 16:22   1         100      2
1434      10/9/15 16:58   1         52       1
1435      10/11/15 17:00  1         60       2
1436      10/15/15 16:57  1         100      3
1437      10/15/15 16:59  1         100      4
I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.
What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:
instanceDate moduleID iResult
10/6/15 1 84
10/9/15 1 78
I can aggregate to get this result using MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here) then interactionNum and the group separation should happen when the query finds a row where the interactionNum is <= than the previous row (will usually start over with '1' but not always, so prefer just to separate on a lower or equal integer value).
Here is the query I have so far (includes the joins that give the above data set):
SELECT
X.*
FROM
(SELECT TOP 100 PERCENT
instanceDate, b.ModuleID, iResult, b.interactionNum
FROM
(firstTable a
INNER JOIN
secondTable b ON b.someID = a.someID)
WHERE
a.someID = 2
AND b.otherID LIKE 'xyz'
AND a.ModuleID = 1
ORDER BY
instanceDate) AS X
OUTER APPLY
(SELECT TOP 1
*
FROM
(SELECT
instanceDate, d.ModuleID, iResult, d.interactionNum
FROM
(firstTable c
INNER JOIN
secondTable d ON d.someID = c.someID)
WHERE
c.someID = 2
AND d.otherID LIKE 'xyz'
AND c.ModuleID = 1
AND d.interactionNum = X.interactionNum
AND c.instanceDate < X.instanceDate) X2
ORDER BY
instanceDate DESC) Y
WHERE
NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)
But this is returning an interim result set like this:
instanceDate    ModuleID  iResult  interactionNum
10/6/15 16:10   1         68       1
10/6/15 16:14   1         100      2
10/15/15 16:57  1         100      3
10/15/15 16:59  1         100      4
and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.
Not sure what ModuleID was supposed to be used, but I guess you're looking for something like this:
select min (instanceDate), [moduleID], avg([iResult])
from (
select *,row_number() over (partition by [moduleID] order by instanceDate) as RN
from Table1
) X
group by [moduleID], RN - [interactionNum]
The idea here is to create a running number with row_number for each moduleid, and then use the difference between that and InteractionNum as grouping criteria.
Example in SQL Fiddle
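To see why the difference works as a grouping key, it may help to look at the intermediate values. The sketch below loads the question's sample rows into a table variable (@Table1 is a stand-in name) and shows RN next to RN - interactionNum.

```sql
-- Sample rows from the question; dates written in ISO format to avoid regional ambiguity.
DECLARE @Table1 TABLE (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT);
INSERT INTO @Table1 VALUES
 (1356,'2015-10-06T16:14:00',1,68,1)
,(1357,'2015-10-07T16:22:00',1,100,2)
,(1434,'2015-10-09T16:58:00',1,52,1)
,(1435,'2015-10-11T17:00:00',1,60,2)
,(1436,'2015-10-15T16:57:00',1,100,3)
,(1437,'2015-10-15T16:59:00',1,100,4);

SELECT recordID
      ,interactionNum
      ,ROW_NUMBER() OVER (PARTITION BY moduleID ORDER BY instanceDate) AS RN
      ,ROW_NUMBER() OVER (PARTITION BY moduleID ORDER BY instanceDate) - interactionNum AS Grp
FROM @Table1
ORDER BY instanceDate;
```

Grp is 0 for the first two rows and 2 for the last four, so each run shares one value. Note that this relies on interactionNum increasing by exactly 1 within a run; a gap such as 1, 3 would split the run.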
Here is my solution, although it should be said that I think @JamesZ's answer is cleaner.
I created a new field called newinstance, which is 1 wherever your interactionNum is 1. I then created a rolling SUM(newinstance) called rollinginstance to group on.
Change the last select to SELECT * FROM cte2 to show all the fields I added.
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)
INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4
;WITH cte1 AS
(
SELECT *,
CASE WHEN interactionNum=1 THEN 1 ELSE 0 END AS newinstance,
ROW_NUMBER() OVER(ORDER BY recordID) as rowid
FROM #tmpData
), cte2 AS
(
SELECT *,
(select SUM(newinstance) from cte1 b where b.rowid<=a.rowid) as rollinginstance
FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance
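The question mentions that a new sequence does not always restart at 1, only at a value lower than or equal to the previous row's. A possible variation for that case, still without LAG (so it should run on SQL Server 2008 R2), compares each row with its predecessor through a self-join on ROW_NUMBER. This is a sketch under that assumption; @Data, isStart and grp are made-up names.

```sql
DECLARE @Data TABLE (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT);
INSERT INTO @Data VALUES
 (1356,'2015-10-06T16:14:00',1,68,1)
,(1357,'2015-10-07T16:22:00',1,100,2)
,(1434,'2015-10-09T16:58:00',1,52,1)
,(1435,'2015-10-11T17:00:00',1,60,2)
,(1436,'2015-10-15T16:57:00',1,100,3)
,(1437,'2015-10-15T16:59:00',1,100,4);

;WITH Numbered AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY moduleID ORDER BY instanceDate, interactionNum) AS rn
    FROM @Data
), Flagged AS (
    -- A row starts a new sequence if it has no predecessor or does not exceed it.
    SELECT cur.*,
           CASE WHEN prev.rn IS NULL OR cur.interactionNum <= prev.interactionNum
                THEN 1 ELSE 0 END AS isStart
    FROM Numbered cur
    LEFT JOIN Numbered prev ON prev.moduleID = cur.moduleID AND prev.rn = cur.rn - 1
), Grouped AS (
    -- Running count of sequence starts gives each sequence its own number.
    SELECT f.*,
           (SELECT SUM(s.isStart) FROM Flagged s
            WHERE s.moduleID = f.moduleID AND s.rn <= f.rn) AS grp
    FROM Flagged f
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM Grouped
GROUP BY moduleID, grp
ORDER BY MIN(instanceDate);
```

For the sample rows this should return two groups with averages 84 and 78, matching the desired output in the question. The correlated running sum is quadratic in the number of rows, so it is only a reasonable choice for modest data volumes.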

Displaying sorted hierarchy rows in SQL server?

Assuming I have this table : ( c is a child of parent p)
c p
------
40 0
2 3
2 40
3 1
7 2
1 0
Here 0 means root. I want the rows to be displayed in this order:
c p
------
1 0
3 1
2 3
40 0
2 40
7 2
That's because we have 2 roots (1, 40) and 1 < 40.
So we start at 1 and then display all its descendants below it.
Then we get to 40, and the same logic applies again.
Question:
How can I do it ?
I've succeeded in displaying it recursively and finding the level of hierarchy (not sure if it helps, though):
WITH cte(c, p) AS (
SELECT 40, 0 UNION ALL
SELECT 2,3 UNION ALL
SELECT 2,40 UNION ALL
SELECT 3,1 UNION ALL
SELECT 7,2 UNION ALL
SELECT 1,0
) , cte2 AS(
SELECT c,
p,
PLevel = 1
FROM cte
WHERE p = 0
UNION ALL
SELECT cte.c,
cte.p,
PLevel = cte2.PLevel + 1
FROM cte
INNER JOIN cte2
ON cte2.c = cte.p
)
SELECT *
FROM cte2
Full SQL fiddle
You have almost done it. Just add a rank to identify each group and then sort the data on it.
Also, as you are working with a more complex hierarchy, we need to change the [level] value. It is no longer a number, but the full path leading to the current element, where / separates each level. For example, the following string:
/1/5/4/1/
represents the hierarchy below:
1
--> 5
--> 4
--> 1
I got the idea from the hierarchyid type. You may want to consider storing hierarchies using it, as it has handy built-in functions for working with such structures.
Here is full working example with the new data:
DECLARE @DataSource TABLE
(
[c] TINYINT
,[p] TINYINT
);
INSERT INTO @DataSource ([c], [p])
VALUES (1,0)
,(3, 1)
,(2, 3)
,(5,1)
,(7, 2)
,(40, 0)
,(2, 40);
WITH DataSource ([c], [p], [level], [rank])AS
(
SELECT [c]
,[p]
,CAST('/' AS VARCHAR(24))
,ROW_NUMBER() OVER (ORDER BY [c] ASC)
FROM @DataSource
WHERE [p] = 0
UNION ALL
SELECT DS.[c]
,DS.[p]
,CAST(DS1.[level] + CAST(DS.[c] AS VARCHAR(3)) + '/' AS VARCHAR(24))
,DS1.[rank]
FROM @DataSource DS
INNER JOIN DataSource DS1
ON DS1.[c] = DS.[p]
)
SELECT [c]
,[p]
FROM DataSource
ORDER BY [Rank]
,CAST([level] AS hierarchyid);
Again, pay attention to the node (7,2), which participates in both groups (even in your example). I guess this is just sample data and you have a way to define where the node should be included.
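The cast to hierarchyid in the ORDER BY matters because a plain string sort compares the path character by character. A small sketch with three hand-picked paths (NodePath is a made-up column name) shows the difference:

```sql
-- Sorted as hierarchyid: parents before their descendants, siblings in numeric order.
SELECT v.NodePath
FROM (VALUES ('/2/'), ('/10/'), ('/2/7/')) AS v(NodePath)
ORDER BY CAST(v.NodePath AS hierarchyid);

-- Sorted as plain text: '/10/' sorts before '/2/' because '1' < '2'.
SELECT v.NodePath
FROM (VALUES ('/2/'), ('/10/'), ('/2/7/')) AS v(NodePath)
ORDER BY v.NodePath;
```

The first query should return /2/, /2/7/, /10/, while the second puts /10/ first, which would misplace any node whose number has more digits than its siblings.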

where not in / where not like subquery

Can somebody help me out with a MS-SQL query please.
I have the following:
select Name from Keyword.dbo.NGrams
where Name not in (select Name from Keyword.dbo.Brands)
What I really want is something like this, but I can't get the syntax right
select Name from Keyword.dbo.NGrams
where Name not like (select Name from Keyword.dbo.Brands)
"not in" works great for NGrams & Brands that match exactly. But my NGrams are multiple words long and some contain a Brand within them.
Thanks so much
Edit: Maybe I can clarify what I am looking for with this pseudo-SQL:
select Name from Keyword.dbo.NGrams
where Description not containing (select Word from Keyword.dbo.Brands)
Brand is a list of single words. Description in NGrams would be a 2 or 3 word phrase. I want to select all the NGrams that do not contain any of the Brands
SELECT
n.Name
FROM Keyword.dbo.NGrams n
LEFT JOIN Keyword.dbo.Brands b
ON n.Name LIKE '%'+b.Name+'%'
WHERE b.Name IS NULL
SQL Fiddle Demo
If you want to avoid the Scunthorpe Problem and only match whole words, change the join condition to:
ON ' '+n.Name+' ' LIKE '% '+b.Name+' %'
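The difference between the two join conditions can be tried out on a few made-up rows. The table variables and their contents below are invented for this illustration and simply stand in for Keyword.dbo.NGrams and Keyword.dbo.Brands.

```sql
-- Hypothetical sample data.
DECLARE @NGrams TABLE (Name VARCHAR(100));
DECLARE @Brands TABLE (Name VARCHAR(100));
INSERT INTO @NGrams VALUES ('apple pie'), ('pineapple juice'), ('fresh bread');
INSERT INTO @Brands VALUES ('apple');

-- Substring match: 'pineapple juice' is excluded too, because it contains 'apple'.
SELECT n.Name
FROM @NGrams n
LEFT JOIN @Brands b ON n.Name LIKE '%' + b.Name + '%'
WHERE b.Name IS NULL;

-- Whole-word match: padding both sides with spaces makes every word space-delimited.
SELECT n.Name
FROM @NGrams n
LEFT JOIN @Brands b ON ' ' + n.Name + ' ' LIKE '% ' + b.Name + ' %'
WHERE b.Name IS NULL;
```

The first query should return only 'fresh bread'; the second keeps 'pineapple juice' as well. Both variants assume words are separated by single spaces and that brand names contain no LIKE wildcard characters such as % or _.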
Use a where not exists to express the like:
select Name
from Keyword.dbo.NGrams ng
where not exists (
select *
from Keyword.dbo.Brands b
where ng.Name like '%' + b.name + '%'
)
I ran a test using the ENABLE2K standard English word list. I generated 10 million random ngrams and 50000 random brands. The query takes about 1 minute to run on my workstation.
CREATE TABLE #enable2k (word varchar(max) NOT NULL)
BULK INSERT #enable2k FROM 'C:\enable2k.txt'
CREATE TABLE #ngrams (ngram_id int NOT NULL, word_num int NOT NULL, word varchar(max) NOT NULL, PRIMARY KEY(ngram_id, word_num));
INSERT #ngrams SELECT TOP 10000000 ROW_NUMBER() OVER(ORDER BY NEWID()), 1, word FROM #enable2k,(SELECT TOP 58 0 FROM master..spt_values) t(i)
INSERT #ngrams SELECT TOP 10000000 ROW_NUMBER() OVER(ORDER BY NEWID()), 2, word FROM #enable2k,(SELECT TOP 58 0 FROM master..spt_values) t(i)
INSERT #ngrams SELECT TOP 10000000 ROW_NUMBER() OVER(ORDER BY NEWID()), 3, word FROM #enable2k,(SELECT TOP 58 0 FROM master..spt_values) t(i)
CREATE TABLE #brands (brand varchar(32) NOT NULL PRIMARY KEY)
INSERT #brands SELECT TOP 50000 word FROM #enable2k WHERE LEN(word) <= 32 ORDER BY NEWID()
SELECT *
FROM #ngrams n
PIVOT (MIN(word) FOR word_num IN ([1],[2],[3])) n1
WHERE NOT EXISTS (
SELECT 1
FROM #ngrams n2
INNER JOIN #brands b
ON (n2.word = b.brand)
WHERE n1.ngram_id = n2.ngram_id
)
