How to find Hierarchical order in SQL server - sql-server

Assume an organization assigns employees to do annual reviews of other employees. Each ReviewID (who is also an employee) can be reviewed by multiple EmployeeIDs. An employee can start a review only if the person being reviewed (the ReviewID) has completed all of his own reviews or has none pending.
Sample Data code:
CREATE TABLE FindOrder
(
EmployeeID int
,ReviewID int
)
insert findorder
values (1,3), (1,10), (1,12), (2,3), (2,5), (2,7), (3,0), (4,6), (5, 3), (6,0), (7,0), (10,0), (12,5)
EmployeeIDs that have nothing to review (ReviewID = 0) should be my first set (3, 6, 7, 10). EmployeeIDs who can start their reviews now are 4 and 5 (my second set), because they review 6 and 3, who have no pending reviews. Employees 1 and 2 are not in this set: 1 reviews 12, who has not completed all his reviews, and 2 reviews 5, who has not either. And so on.
Please let me know if I am still not clear.
I want to find the order levels such that level 0 is (6,10,7,3), level 1 is (5, 4), level 2 is (2, 12), level 3 is (1).
I tried this cte to find order:
;WITH CTE AS
(
SELECT EmployeeID, ReviewID, 0 AS [Level] FROM FindOrder WHERE ReviewID = 0
UNION ALL
SELECT NN.EmployeeID, NN.ReviewID, [Level]+1 FROM FindOrder nn
JOIN CTE ON NN.ReviewID=CTE.EmployeeID
)
SELECT * FROM CTE
But I get EmployeeID 1 in both level 1 and level 3. EmployeeID 1 should not appear in level 1, because not everyone Employee 1 has to review has completed their reviews (Employee 12 has not).
In general, each new subset of rows in the recursive query above should have filtered out EmployeeIDs 1 and 2.
It's a little tricky to explain, but I hope I am clear now :(

It looks like your level should actually be the longest path of reviews needed for a given employee. For example, employee one has the following paths...
1->3
1->10
1->12->5->3
The level for this employee is the longest path, and if I'm understanding your question, the only one you care about. Try this...
;WITH CTE AS
(
SELECT EmployeeID, ReviewID, 0 AS [Level] FROM FindOrder WHERE ReviewId = '0'
UNION ALL
SELECT NN.EmployeeID, NN.ReviewID, [Level]+1 FROM FindOrder nn
JOIN CTE ON NN.ReviewID=CTE.EmployeeID
)
SELECT EmployeeId, MAX(Level) AS Level FROM CTE
GROUP BY EmployeeID
ORDER BY MAX(Level)
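To sanity-check the longest-path idea against the sample data from the question, here is a minimal sketch that runs the same recursive CTE through Python's built-in sqlite3 module. The SQLite dialect (WITH RECURSIVE) and the helper name review_levels are my assumptions for illustration only; on SQL Server you would run the T-SQL query above directly.

```python
import sqlite3

# Sample rows from the question: (EmployeeID, ReviewID)
ROWS = [(1, 3), (1, 10), (1, 12), (2, 3), (2, 5), (2, 7), (3, 0),
        (4, 6), (5, 3), (6, 0), (7, 0), (10, 0), (12, 5)]

# Same shape as the T-SQL answer, written in SQLite dialect (assumption).
QUERY = """
WITH RECURSIVE cte(EmployeeID, ReviewID, Level) AS (
    SELECT EmployeeID, ReviewID, 0 FROM FindOrder WHERE ReviewID = 0
    UNION ALL
    SELECT nn.EmployeeID, nn.ReviewID, cte.Level + 1
    FROM FindOrder AS nn
    JOIN cte ON nn.ReviewID = cte.EmployeeID
)
SELECT EmployeeID, MAX(Level) AS MaxLevel
FROM cte
GROUP BY EmployeeID
ORDER BY MaxLevel, EmployeeID
"""

def review_levels(rows):
    """Return {level: [employee ids]} where level is the longest review path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FindOrder (EmployeeID INT, ReviewID INT)")
    conn.executemany("INSERT INTO FindOrder VALUES (?, ?)", rows)
    levels = {}
    for employee_id, level in conn.execute(QUERY):
        levels.setdefault(level, []).append(employee_id)
    conn.close()
    return levels

print(review_levels(ROWS))
```

Taking MAX(Level) per employee is what removes the duplicate appearance of employee 1 at level 1. Note that a cycle in the data (A reviews B, B reviews A) would keep the recursion going; SQL Server stops a recursive CTE after 100 levels by default.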

Related

Left outer join returning extra records

I have 2 tables namely "Item" and "Messages".
Item table has the columns like Id, Amount, etc.
Messages table has the columns like ItemId, Count, Comment, etc.
Here the common link between these 2 tables is the "Id" from Item and "ItemId" from Messages.
The "Count" column in the Messages table is just the count of comments per ItemId. i.e. When user updates the comment for any record, an entry gets created in the Messages table and Count for that particular ItemId shows as 1. If user updates one more comment to same record, the Count shows 2 and so on. If user does not update comment for a certain record, the entry does not get created in Messages table at all (NULL).
I want to capture all the records from the Item table irrespective of whether user has updated comment or not. If there are 0 comments, the query should return NULL in the Comments column for that record. But, If the user has updated the comment, it should pick up the comment having the highest "Count". E.g. if one record has 8 comments, the query should return only the record where Messages.Count=8 and not all 8 records. If only one comment, then that comment should be seen.
I have written a LEFT OUTER JOIN but cannot get it to work, as it returns all 8 records. In the results I find 7 records with NULL as the count and the 8th record showing the count as 8, but I need only this 8th record and not the other 7.
Any help would be highly appreciated. Below is my query:
Select
Id,
Amount,
Messages.Comment As Comments
From Item
Left Outer Join Messages ON Messages.ItemId=Item.Id
Left Outer Join (Select ItemId, MAX(Id) as max_id from Messages Group by ItemId) T ON Messages.ItemId=T.ItemId and Messages.Id=T.max_id
Where amount > 100
I've hooked up an example using temp tables which I think covers what you're looking for. Just remove the temp table stuff and replace with your actual tables and it should work.
CREATE TABLE #Item ( ID int PRIMARY KEY,
Amount numeric(9,2))
CREATE TABLE #Messages ( ItemId int REFERENCES #Item(ID),
[Count] smallint,
Comment nvarchar(max))
INSERT INTO #Item (ID, Amount)
SELECT 1, 100
UNION
SELECT 2, 120
UNION
SELECT 3, 140
UNION
SELECT 4, 50
INSERT INTO #Messages ( ItemID,
[Count],
Comment)
SELECT 1, 1, 'Comment 1 - 1'
UNION
SELECT 1, 2, 'Comment 1 - 2'
UNION
SELECT 2, 1, 'Comment 2 - 1'
UNION
SELECT 2, 1, 'Comment 3 - 1'
UNION
SELECT 2, 2, 'Comment 3 - 2'
SELECT I.Id,
I.Amount,
M.Comment
FROM #Item AS I
OUTER APPLY ( SELECT TOP 1 M.Comment
FROM #Messages AS M
WHERE M.ItemId = I.ID
ORDER BY M.[Count] DESC) AS M
WHERE i.amount > 100
DROP TABLE #Messages
DROP TABLE #Item
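The OUTER APPLY with TOP 1 amounts to "for each item, take the single highest-Count message, or NULL if there is none". As a rough cross-check, the sketch below expresses the same idea with a correlated scalar subquery and runs it through Python's sqlite3. SQLite has no OUTER APPLY or TOP, so LIMIT 1 stands in for them; the plain table names and the function name latest_comments are assumptions made for this example.

```python
import sqlite3

def latest_comments(min_amount):
    """Return (ID, Amount, Comment) per item above min_amount, Comment may be None."""
    conn = sqlite3.connect(":memory:")
    # Same sample data as the temp tables above.
    conn.executescript("""
        CREATE TABLE Item (ID INTEGER PRIMARY KEY, Amount NUMERIC);
        CREATE TABLE Messages (ItemId INT, "Count" INT, Comment TEXT);
        INSERT INTO Item VALUES (1, 100), (2, 120), (3, 140), (4, 50);
        INSERT INTO Messages VALUES
            (1, 1, 'Comment 1 - 1'), (1, 2, 'Comment 1 - 2'),
            (2, 1, 'Comment 2 - 1'), (2, 1, 'Comment 3 - 1'),
            (2, 2, 'Comment 3 - 2');
    """)
    rows = conn.execute("""
        SELECT i.ID, i.Amount,
               (SELECT m.Comment
                FROM Messages AS m
                WHERE m.ItemId = i.ID
                ORDER BY m."Count" DESC
                LIMIT 1) AS Comment       -- NULL when the item has no messages
        FROM Item AS i
        WHERE i.Amount > ?
        ORDER BY i.ID
    """, (min_amount,)).fetchall()
    conn.close()
    return rows

print(latest_comments(100))
```

Because the subquery yields at most one value per item, the outer query can never multiply rows the way a plain join against Messages does.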
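Try joining the per-item maximum first, and then the one message row that matches it (this assumes Messages has an Id column, as in your query):
Select
Item.Id,
Item.Amount,
M.Comment As Comments
From Item
Left Outer Join (Select ItemId, MAX(Id) as max_id from Messages Group by ItemId) T ON T.ItemId=Item.Id
Left Outer Join Messages M ON M.ItemId=T.ItemId and M.Id=T.max_id
Where Item.Amount > 100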

Comparing the length of two similar strings and picking the longest

I am trying to compare two strings and pick the longest if they are similar. I have managed to pick the longest using the following code:
SELECT D.RID, ProductID, Product, [Length] FROM (
SELECT RID, MAX([Length]) AS theLength FROM SortData GROUP BY RID)
AS X INNER JOIN SortData AS D ON D.RID = X.RID AND D.[Length] = X.theLength
But I am now trying to make sure that the code only picks the longest string if it is like the word it is being compared to. I have attempted the following code in a few ways, but I would be grateful if somebody could help me:
SELECT D.RID, D.ProductID, Product, [Length] FROM (
SELECT RID, Product, MAX([Length]) AS theLength FROM SortData GROUP BY RID)
AS X INNER JOIN SortData AS D ON D.RID = X.RID AND D.[Length] = X.theLength WHERE
D.Product LIKE Product
Using this code I get the Following Error:
Msg 8120, Level 16, State 1, Line 3 Column 'SortData.Product' is
invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause. Msg 209, Level 16, State 1,
Line 5 Ambiguous column name 'Product'. Msg 209, Level 16, State 1,
Line 2 Ambiguous column name 'Product'.
Example of the Data I would Like to pick:
1 Sam
1 Samantha
2 Oliver
3 Ollie
4 Benjamin
4 Ben
...
I would expect the output list to be like:
1 Samantha
2 Oliver
3 Ollie
4 Benjamin
...
To clarify what I am trying to do in the context of this example: I am trying to compare the two names and, if they are LIKE each other (e.g. x.Name LIKE Name), pick the longest...
As Requested here is further test data:
1 Hydrogen
1 Hydrogen Oxide
1 Carbon Monoxide
2 Carbon
2 Carbon
2 Carbon Dioxide
3 Carbon Monoxide
3 Carbon Dioxide
3 Oxygen
4 Hydrogen Dioxide
Desired Results are as so:
1 Hydrogen Oxide
1 Carbon Monoxide
2 Carbon Dioxide
3 Carbon Monoxide
3 Oxygen
4 Hydrogen Dioxide
Perhaps another option: The WITH TIES clause in concert with Row_Number()
Example
Select Top 1 with ties *
From YourTable
Order By Row_Number() over (Partition by ID Order By Len(Name) desc)
Your query doesn't come close to your sample data and output. So I built this around the sample data provided to demonstrate one way of solving this.
declare @Something table
(
Col1 int
, Col2 varchar(20)
)
insert @Something values
(1, 'Sam')
, (1, 'Samantha')
, (2, 'Oliver')
, (3, 'Ollie')
select x.Col1
, x.Col2
from
(
select *
, RowNum = ROW_NUMBER() over(partition by Col1 order by LEN(Col2) desc)
from @Something
) x
where x.RowNum = 1
---EDIT---
To demonstrate that this code still returns the desired output from your new sample data...
declare @Something table
(
Col1 int
, Col2 varchar(20)
)
insert @Something values
(1, 'Sam')
, (1, 'Samantha')
, (2, 'Oliver')
, (3, 'Ollie')
, (4, 'Benjamin')
, (4, 'Ben')
select x.Col1
, x.Col2
from
(
select *
, RowNum = ROW_NUMBER() over(partition by Col1 order by LEN(Col2) desc)
from @Something
) x
where x.RowNum = 1
This returns:
1 Samantha
2 Oliver
3 Ollie
4 Benjamin
Since you claim this still doesn't work you need to provide an example of how or why this doesn't work. You keep mentioning LIKE but have not explained or demonstrated how that comes into play here. Help me understand the problem and I can help you find a solution.
I Ended up figuring it out and using the following code:
SELECT D.RID, ProductID, D.Product, [Length] FROM
(
SELECT RID, MAX([Length]) AS theLength
FROM SortData GROUP BY RID
) AS X
INNER JOIN SortData AS D ON D.RID = X.RID AND D.[Length] = X.theLength
WHERE D.Product LIKE Product
GO

SQL server logic for fetching data from the table

"Need to display all items linked to the parent category id=1. As per the table, it should fetch: Big Machine, Computer, CPU Cabinet, Hard Disk and Magnetic Disk. But the logic that is written does not fetch all the records. Please help."
create table ItemSpares
(
ItemName varchar(20),
ItemID int,
ParentCategoryID int
)
insert into ItemSpares (ItemName,ItemID,ParentCategoryID)
select 'Big Machine', 1 , NULL UNION ALL
select 'Computer', 2, 1 UNION ALL
select 'CPU Cabinet', 3, 2 UNION ALL
select 'Hard Disk', 4, 3 UNION ALL
select 'Magnetic Disk',5,4 UNION ALL
select 'Another Big Machine',6, NULL
You need to use a hierarchical SQL query, took a while to figure out but try this:
with BigComputerList (ItemName, ItemID, ParentCategoryID, Level)
AS
(
-- Anchor member definition
SELECT e.ItemName, e.ItemID, e.ParentCategoryID,
0 AS Level
FROM ItemSpares AS e
WHERE ItemID = 1
UNION ALL
-- Recursive member definition
SELECT e.ItemName, e.ItemID, e.ParentCategoryID,
Level + 1
FROM ItemSpares AS e
INNER JOIN BigComputerList AS d
ON e.ParentCategoryId = d.ItemID
)
Select * From BigComputerList
I would highly recommend reading this article if you want to understand what the query is doing.

CTE Recursion to get tree hierarchy

I need to get an ordered hierarchy of a tree, in a specific way. The table in question looks a bit like this (all ID fields are uniqueidentifiers, I've simplified the data for sake of example):
EstimateItemID  EstimateID  ParentEstimateItemID  ItemType
--------------  ----------  --------------------  --------
1               A           NULL                  product
2               A           1                     product
3               A           2                     service
4               A           NULL                  product
5               A           4                     product
6               A           5                     service
7               A           1                     service
8               A           4                     product
Graphical view of the tree structure (* denotes 'service'):
          A
      ___/ \___
     /         \
    1           4
   / \         / \
  2   7*      5   8
 /           /
3*          6*
Using this query, I can get the hierarchy (just pretend 'A' is a uniqueidentifier, I know it isn't in real life):
DECLARE @EstimateID uniqueidentifier
SELECT @EstimateID = 'A'
;WITH temp as(
SELECT * FROM EstimateItem
WHERE EstimateID = @EstimateID
UNION ALL
SELECT ei.* FROM EstimateItem ei
INNER JOIN temp x ON ei.ParentEstimateItemID = x.EstimateItemID
)
SELECT * FROM temp
This gives me the children of EstimateID 'A', but in the order that it appears in the table. ie:
EstimateItemID
--------------
1
2
3
4
5
6
7
8
Unfortunately, what I need is an ordered hierarchy with a result set that follows the following constraints:
1. each branch must be grouped
2. records with ItemType 'product' and NULL parent are the top node
3. records with ItemType 'product' and non-NULL parent grouped after top node
4. records with ItemType 'service' are bottom node of a branch
So, the order that I need the results, in this example, is:
EstimateItemID
--------------
1
2
3
7
4
5
8
6
What do I need to add to my query to accomplish this?
Try this:
;WITH items AS (
SELECT EstimateItemID, ItemType
, 0 AS Level
, CAST(EstimateItemID AS VARCHAR(255)) AS Path
FROM EstimateItem
WHERE ParentEstimateItemID IS NULL AND EstimateID = @EstimateID
UNION ALL
SELECT i.EstimateItemID, i.ItemType
, Level + 1
, CAST(Path + '.' + CAST(i.EstimateItemID AS VARCHAR(255)) AS VARCHAR(255))
FROM EstimateItem i
INNER JOIN items itms ON itms.EstimateItemID = i.ParentEstimateItemID
)
SELECT * FROM items ORDER BY Path
With Path, rows are sorted by their parent nodes.
If you want to sort child nodes by ItemType within each level, then you can play with Level and a SUBSTRING of the Path column.
Here is a SQLFiddle with sample data.
This is an add-on to Fabio's great idea above. As I said in my reply to his original post, I have re-posted his idea using more common data, table names, and fields to make it easier for others to follow.
Thank you Fabio! Great name by the way.
First some data to work with:
CREATE TABLE tblLocations (ID INT IDENTITY(1,1), Code VARCHAR(1), ParentID INT, Name VARCHAR(20));
INSERT INTO tblLocations (Code, ParentID, Name) VALUES
('A', NULL, 'West'),
('A', 1, 'WA'),
('A', 2, 'Seattle'),
('A', NULL, 'East'),
('A', 4, 'NY'),
('A', 5, 'New York'),
('A', 1, 'NV'),
('A', 7, 'Las Vegas'),
('A', 2, 'Vancouver'),
('A', 4, 'FL'),
('A', 5, 'Buffalo'),
('A', 1, 'CA'),
('A', 10, 'Miami'),
('A', 12, 'Los Angeles'),
('A', 7, 'Reno'),
('A', 12, 'San Francisco'),
('A', 10, 'Orlando'),
('A', 12, 'Sacramento');
Now the recursive query:
-- Note: The 'Code' field isn't used, but you could add it to display more info.
;WITH MyCTE AS (
SELECT ID, Name, 0 AS TreeLevel, CAST(ID AS VARCHAR(255)) AS TreePath
FROM tblLocations T1
WHERE ParentID IS NULL
UNION ALL
SELECT T2.ID, T2.Name, TreeLevel + 1, CAST(TreePath + '.' + CAST(T2.ID AS VARCHAR(255)) AS VARCHAR(255)) AS TreePath
FROM tblLocations T2
INNER JOIN MyCTE itms ON itms.ID = T2.ParentID
)
-- Note: The 'replicate' function is not needed. Added it to give a visual of the results.
SELECT ID, Replicate('.', TreeLevel * 4)+Name 'Name', TreeLevel, TreePath
FROM MyCTE
ORDER BY TreePath;
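One caveat with a string path built from integer IDs: it sorts character by character, so '1.10' comes before '1.2'. Each branch still stays grouped, but siblings are not in numeric order. A small sketch of the effect and of one possible fix (zero-padding every path segment), run through Python's sqlite3; the Node table, the 5-digit width and the function names are assumptions for illustration. In T-SQL the padding could be written as RIGHT('00000' + CAST(ID AS VARCHAR(5)), 5).

```python
import sqlite3

def segment(col, pad):
    """SQL expression for one path segment, zero-padded to 5 digits if pad is set."""
    return f"printf('%05d', {col})" if pad else f"CAST({col} AS TEXT)"

def ordered_ids(pad):
    """Return node IDs ordered by their materialized path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Node (ID INT, ParentID INT)")
    # Tiny hypothetical tree: 1 -> (2 -> 3), 1 -> 10
    conn.executemany("INSERT INTO Node VALUES (?, ?)",
                     [(1, None), (2, 1), (10, 1), (3, 2)])
    query = f"""
        WITH RECURSIVE tree(ID, Path) AS (
            SELECT ID, {segment('ID', pad)} FROM Node WHERE ParentID IS NULL
            UNION ALL
            SELECT n.ID, t.Path || '.' || {segment('n.ID', pad)}
            FROM Node AS n
            JOIN tree AS t ON n.ParentID = t.ID
        )
        SELECT ID FROM tree ORDER BY Path
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

print(ordered_ids(pad=False))  # unpadded paths sort 10 ahead of 2
print(ordered_ids(pad=True))   # padded paths restore numeric sibling order
```

With padded segments the path column needs to be wide enough for the deepest branch, so VARCHAR(255) may have to grow for deep trees.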
I believe that you need to add the following to the results of your CTE...
BranchID = some kind of identifier that uniquely identifies the branch. Forgive me for not being more specific, but I'm not sure what identifies a branch for your needs. Your example shows a binary tree in which all branches flow back to the root.
ItemTypeID where (for example) 0 = Product and 1 = service.
Parent = identifies the parent.
If those exist in the output, I think you should be able to use the output from your query as either another CTE or as the FROM clause in a query. Order by BranchID, ItemTypeID, Parent.

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id  Number  Average
1   1       NULL
2   3       NULL
3   2       NULL
4   4       2     <----- Average of (1, 3, 2), Numbers from previous 3 rows
5   6       3     <----- Average of (3, 2, 4), Numbers from previous 3 rows
.   .       .
.   .       .
The first 3 rows of the Average column are NULL because they have fewer than 3 previous rows. Row 4 in the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.
This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
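On SQL Server 2012 and later, a window frame can express "the previous three rows" directly, without a subquery or self join. The sketch below runs that frame through Python's sqlite3 (window functions need SQLite 3.25 or newer) on the same test data; the function name moving_average is an assumption for this example. One difference to keep in mind: SQLite's AVG returns a float, while SQL Server's AVG over an int column returns an int.

```python
import sqlite3

# Same test data as above: (ID, Number)
ROWS = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10)]

# The frame covers the 3 rows before the current one and excludes the current row.
# COUNT(*) over the same frame is used to return NULL until 3 prior rows exist.
QUERY = """
SELECT ID, Number,
       CASE WHEN COUNT(*) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) = 3
            THEN AVG(Number) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS MovingAverage
FROM RowsToAverage
ORDER BY ID
"""

def moving_average(rows):
    """Return (ID, Number, average of the previous 3 Numbers or None)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE RowsToAverage (ID INT NOT NULL, Number INT NOT NULL)")
    conn.executemany("INSERT INTO RowsToAverage VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in moving_average(ROWS):
    print(row)
```

The frame is defined by row position in ID order, so gaps in the ID sequence do not matter here.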
Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.
Check out some solutions here. I'm sure that you could adapt one of them easily enough.
If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.
Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Performance of selects is as fast as it goes. Of course, modifications are slower.
