SQL child parent hierarchy - Using a where on child to hide parent - sql-server

Let's say I have the table below:
ID | Name | Active | ParentID
1 | Foo1 | 1 | 0
2 | Foo2 | 1 | 1
3 | Foo3 | 1 | 2
4 | Foo4 | 1 | 3
5 | Foo5 | 1 | 3
6 | Foo6 | 0 | 5
7 | Foo7 | 1 | 2
7 | Foo7 | 1 | 6
8 | Foo8 | 1 | 7
9 | Foo9 | 1 | 5
(I do indeed have duplicate IDs; I raised my concerns about this, but to no avail.)
As you can see, one child can have multiple parents. IDs with ParentID 0 have no parent. I need to select all IDs that are active and do not have an inactive parent above them, however high in the tree that might be.
So with the data set above, my result would be:
ID | Name |
1 | Foo1 |
2 | Foo2 |
3 | Foo3 |
4 | Foo4 |
5 | Foo5 |
9 | Foo9 |
ID 6 got removed because it was Inactive
ID 7 got removed because one of its parents (6) is inactive
ID 8 got removed because a parent (6) of its parent (7) is inactive
ID 9 is fine because its parent (5) is active, and so are all of 5's parents, etc.
I attempted this with a subquery in the where
SELECT *
FROM table
WHERE ID not in (SELECT ID FROM table where Active = 0)
But that only solves it for the current record.
I've also tried a typical self-join as used for employee/manager, but that only goes one layer deep, while here I also need to check for the parent of the parent etc
Any suggestions/ideas?

One method would be to use a recursive CTE (rCTE) to work up through the hierarchy, with a column that retains the initial ID. Then you can use a NOT EXISTS to ensure that no row in an ID's chain has a value of 0 for Active:
WITH rCTE AS(
SELECT ID,
Name,
Active,
ParentID,
ID AS InitialID
FROM dbo.YourTable YT
UNION ALL
SELECT YT.ID,
YT.Name,
YT.Active,
YT.ParentID,
r.InitialID
FROM rCTE r
JOIN dbo.YourTable YT ON r.ParentID = YT.ID)
SELECT *
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM rCTE r
WHERE r.InitialID = YT.ID
AND r.Active = 0);
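To see the walk-up technique run end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only so the example is self-contained; the table name t, the function name visible_rows and the WITH RECURSIVE keyword are stand-ins for the T-SQL above, and DISTINCT is added so that a duplicated ID would show up only once.

```python
import sqlite3

# Sample data from the question: (ID, Name, Active, ParentID).
# ID 7 appears twice because it has two parents.
ROWS = [
    (1, "Foo1", 1, 0), (2, "Foo2", 1, 1), (3, "Foo3", 1, 2),
    (4, "Foo4", 1, 3), (5, "Foo5", 1, 3), (6, "Foo6", 0, 5),
    (7, "Foo7", 1, 2), (7, "Foo7", 1, 6), (8, "Foo8", 1, 7),
    (9, "Foo9", 1, 5),
]

QUERY = """
WITH RECURSIVE rcte(ID, Active, ParentID, InitialID) AS (
    -- anchor: every row starts its own chain
    SELECT ID, Active, ParentID, ID FROM t
    UNION ALL
    -- step: climb to the parent, carrying the starting ID along
    SELECT p.ID, p.Active, p.ParentID, r.InitialID
    FROM rcte AS r
    JOIN t AS p ON r.ParentID = p.ID
)
SELECT DISTINCT t.ID, t.Name
FROM t
WHERE NOT EXISTS (SELECT 1
                  FROM rcte AS r
                  WHERE r.InitialID = t.ID
                    AND r.Active = 0)
ORDER BY t.ID
"""


def visible_rows(rows):
    """Return (ID, Name) pairs that are active and have no inactive ancestor."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (ID INTEGER, Name TEXT, Active INTEGER, ParentID INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in visible_rows(ROWS):
        print(row)
```

Because the inactive row itself is part of its own chain (the anchor row has InitialID = ID), the same NOT EXISTS test removes both inactive rows and their descendants.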

I would use a recursive CTE to identify IDs where the chain is continuous, using both conditional and unconditional increment by 1 as follows:
With A As
(Select ID, [Name], Active, ParentID, 0 As NUM_1, 0 As NUM_2
From Tbl Where ParentID=0
Union All
Select Tbl.ID, Tbl.[Name], Tbl.Active, Tbl.ParentID,
NUM_1 + 1 As NUM_1,
NUM_2 + IIF(Tbl.Active=1,1,0) As NUM_2
From Tbl Inner Join A On (Tbl.ParentID=A.ID)
)
Select ID, [Name]
From A
Where ID Not In (Select ID From A Where NUM_1<>NUM_2)
Order by ID
Result:
ID | Name
1 | Foo1
2 | Foo2
3 | Foo3
4 | Foo4
5 | Foo5
9 | Foo9

Related

Update hierarchy after deletion of row

I have a table that contains tree-like data (hierarchic design). Here is a small sample:
+----+----------+-----------+-------+----------+---------+
| ID | ParentID | Hierarchy | Order | FullPath | Project |
+----+----------+-----------+-------+----------+---------+
| 1 | null | 1 | 1 | 1 | 1 |
| 2 | null | 2 | 2 | 2 | 1 |
| 3 | 1 | 1.1 | 1 | 1-3 | 1 |
| 4 | 1 | 1.2 | 2 | 1-4 | 1 |
| 5 | 4 | 1.2.1 | 1 | 1-4-5 | 1 |
| 6 | 2 | 2.1 | 1 | 2-6 | 1 |
| 7 | null | 3 | 1 | 1 | 2 |
+----+----------+-----------+-------+----------+---------+
Project indicates which project owns the hierarchic dataset
ParentID is the ID of the parent node, it has a foreign key on ID.
Order is the rank of the element in one branch. For example, IDs 1, 2 and 7 are on the same node while 3 and 4 are in another.
FullPath shows the order using the ID (it's for system use and performance reasons).
Hierarchy is the column shown to the user in the UI. It is recalculated after every insert, update and delete, and it's the one I'm having issues with.
I created a procedure for deleting elements from the table. It receives as input the ID of the element to delete and deletes it, along with its children if any. Then it recalculates the FullPath and the Order columns. That works.
The problem is when I try to update the Hierarchy column. I use this procedure:
SELECT T.ID,
T.ParentID,
CASE WHEN T.ParentID IS NOT NULL THEN
CONCAT(T1.Hierarchy, '.', CAST(T.[Order] AS NVARCHAR(255)))
ELSE
CAST(T.[Order] AS NVARCHAR(255))
END AS Hierarchy
INTO #tmp
FROM t_HierarchyTable T
LEFT JOIN t_HierarchyTable T1
ON T1.ID = T.ParentID
WHERE T.Project = @Project --Variable to only update the current project for performance
ORDER BY T.FullPath
--Update the table with ID as key on tmp table
This fails when I delete items that have lower order than others and they have children.
For example, if I delete item 3, item 4's Hierarchy will be corrected (1.1), BUT its child's won't (it will stay at 1.2.1, while it should be 1.1.1). I added the ORDER BY to make sure parents were updated first, but it made no difference.
What is my error? I really don't know how to fix this.
I managed to update the hierarchy with a CTE. Since I have the order, I can append it to the Hierarchy of the previous branch (the parent), which has already been recalculated.
;WITH CODES(ID, Hierarchy, iLevel) AS
(
SELECT
T.[ID] AS [ID],
CONVERT(VARCHAR(8000), T.[Order]) AS [Hierarchy],
1 AS [iLevel]
FROM
[dbo].[data] AS T
WHERE
T.[ParentID] IS NULL
UNION ALL
SELECT
T.[ID] AS [ID],
P.[Hierarchy] + IIF(RIGHT(P.[Hierarchy], 1) <> '.', '.', '') + CONVERT(VARCHAR(8000), T.[Order]) AS [Hierarchy],
P.[iLevel] + 1 AS [iLevel]
FROM
[dbo].[data] AS T
INNER JOIN CODES AS P ON
P.[ID] = T.[ParentID]
WHERE
P.[iLevel] < 100
)
SELECT
[ID], [Hierarchy], [iLevel]
INTO
#CODES
FROM
CODES
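As a rough illustration of why a top-down rebuild fixes the stale child, the sketch below uses Python's sqlite3 module. This is an assumption made only to keep the example self-contained: the table name nodes, the column sort_order, the function name rebuild_hierarchy and the || concatenation are hypothetical SQLite stand-ins for the T-SQL above. It loads the project 1 sample rows as they would look after item 3 is deleted and item 4's Order is recalculated to 1, then derives every Hierarchy value from the roots down.

```python
import sqlite3

# Project 1 after deleting ID 3: (id, parent_id, sort_order).
# Item 4 has already been renumbered to order 1 within its branch.
NODES = [
    (1, None, 1),
    (2, None, 2),
    (4, 1, 1),
    (5, 4, 1),
    (6, 2, 1),
]

QUERY = """
WITH RECURSIVE codes(id, hierarchy, lvl) AS (
    -- anchor: root nodes use their own order as the code
    SELECT id, CAST(sort_order AS TEXT), 1
    FROM nodes
    WHERE parent_id IS NULL
    UNION ALL
    -- step: a child appends its order to the parent's freshly built code
    SELECT n.id, c.hierarchy || '.' || CAST(n.sort_order AS TEXT), c.lvl + 1
    FROM nodes AS n
    JOIN codes AS c ON c.id = n.parent_id
)
SELECT id, hierarchy FROM codes ORDER BY id
"""


def rebuild_hierarchy(nodes):
    """Return (id, hierarchy) pairs computed from the roots downward."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, sort_order INTEGER)"
    )
    con.executemany("INSERT INTO nodes VALUES (?, ?, ?)", nodes)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in rebuild_hierarchy(NODES):
        print(row)
```

The single-join version in the question reads the parent's stored Hierarchy, which is still the old value at the moment the child is computed; the recursive version never reads the stored column, so item 5 picks up 1.1 from item 4 and becomes 1.1.1.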

T-SQL: UPDATE table according to a column

TL;DR: how do I update a table depending on a column?
Problem situation: the current column SortingNumber is full of bad data.
Solution: reassign new values to SortingNumber based on their Parent. The SortingNumber shall be 1 for the lowest current SortingNumber (by Parent) and be incremented by 1 for every subsequent dataset.
Current data: Desired result:
ID | Parent | SortingNumber >> ID | Parent | SortingNumber
1 | 1 | 3 >> 1 | 1 | 1
2 | 1 | 4 >> 2 | 1 | 2
3 | 1 | 5 >> 3 | 1 | 3
4 | 2 | 8 >> 4 | 2 | 1
5 | 2 | 10 >> 5 | 2 | 2
6 | 2 | 13 >> 6 | 2 | 3
Actual problem: I'm having trouble figuring out how to update the datasets corresponding to their parents.
My script currently updates all the values incrementally and doesn't group it by Parent.
My current solution:
DECLARE @lastSN INTEGER = 0;
WITH toUpdate AS
(
SELECT
T1.*,
-- "calculate" the sorting number from the row above
LAG(T1.SortingNumber + 1, 1, 1) OVER (ORDER BY T1.SortingNumber) AS [newSortNumber]
FROM
T AS T1
INNER JOIN
T AS T2 ON T1.Parent = T2.ID
)
UPDATE toUpdate
SET
@lastSN = CASE WHEN [newSortNumber] = 1 AND @lastSN = 0 THEN 1 ELSE @lastSN + 1 END,
toUpdate.SortingNumber = @lastSN
;
Result is:
ID | Parent | SortingNumber
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 4
5 | 2 | 5
6 | 2 | 6
I guess my question could be phrased as: how do I update datasets depending on the Parent column?
PS: here is the CREATE statement if you wish to try it out yourself
CREATE TABLE T
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Parent INT FOREIGN KEY REFERENCES T(ID),
SortingNumber INT
);
GO
INSERT INTO T (Parent, SortingNumber)
VALUES (1, 3), (1, 4), (1, 5), (2, 8), (2, 10), (2, 13);
You can employ row_number to achieve this using partitioning by Parent and ordering by SortingNumber.
WITH cte AS (
SELECT
* ,
ROW_NUMBER() OVER (PARTITION BY Parent ORDER BY SortingNumber) AS NewSortingNumber
FROM T
)
UPDATE cte
SET SortingNumber = NewSortingNumber
The window function's PARTITION BY Parent splits the rows into partitions, so we have two subsets, one for Parent = 1 and the other for Parent = 2. ORDER BY SortingNumber then determines the order in which the rows of each partition are counted, starting from 1. The first row for Parent = 1 is ID = 1, so it gets 1, the next row gets 2, and so on; the count restarts at 1 for Parent = 2.
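The partition-then-count behaviour can be mimicked in a few lines of plain Python. This is only a sketch of the idea, not of how SQL Server evaluates the window function; the function name renumber and the tuple layout are assumptions made for the example.

```python
from itertools import groupby

# Sample data from the question: (ID, Parent, SortingNumber).
ROWS = [(1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 2, 8), (5, 2, 10), (6, 2, 13)]


def renumber(rows):
    """Return {ID: new SortingNumber}, restarting at 1 for each Parent."""
    # Sorting by (Parent, SortingNumber) plays the role of
    # PARTITION BY Parent ORDER BY SortingNumber.
    ordered = sorted(rows, key=lambda row: (row[1], row[2]))
    new_numbers = {}
    for _parent, partition in groupby(ordered, key=lambda row: row[1]):
        # enumerate(..., start=1) is the per-partition ROW_NUMBER().
        for number, (row_id, _p, _old) in enumerate(partition, start=1):
            new_numbers[row_id] = number
    return new_numbers


if __name__ == "__main__":
    print(renumber(ROWS))
```

Leaving out the grouping step (counting over the whole sorted list) reproduces the 1 to 6 numbering that the question's script produced.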
As an alternative, you can rank the rows, partitioning by Parent and ordering by ID:
UPDATE tt
SET sortingnumber = a.drank
FROM (SELECT *, DENSE_RANK() OVER (PARTITION BY Parent ORDER BY ID) AS drank FROM tt) a
WHERE tt.ID = a.ID AND tt.Parent = a.Parent
SELECT * FROM tt

How to select rows based on a certain criteria from subsets in a table?

I have a test table with an ActionId column. The column contains an increasing and random number of rows with values of 1 to 5 and then it starts again with another subset of values from 1 to 5. The data can have one or more subsets like that.
I am interested in rows which contain ActionId of values 4 or 5 but only the last one in each subset. So in this sample, I want to return rows 7 and 11. Row id 7 because 5 is the last value before the value goes down and row id 11 because 4 is the last value before the value goes down again. For the last subset, the value doesn't need to go down again. The value 4 or 5 could be in the last row.
I can program this in a procedural language, but I can't think of a set-based SQL solution.
CREATE TABLE test (
id [int] IDENTITY(1,1)
,ActionId INT)
INSERT INTO [test] (ActionId ) VALUES
(1), (2), (3), (3), (4), (4), (5), (3), (3), (3), (4), (1),(2)
select * from test
http://sqlfiddle.com/#!18/4ffe71/3
The solution I came up with involves a simple correlated subquery and a common table expression:
;with cte as
(
select id,
ActionId,
isnull((
select top 1 ActionId
from test as t1
where t0.id < t1.id
order by t1.id
), 0) as nextActionId
from test As t0
)
select id, ActionId
from cte
where actionId IN(4,5)
and actionId > nextActionId
The subquery gets the next actionId for each row, based on the order of the id column. The isnull is there for the last row - to return 0 instead of null.
Then, all you have to do is query the cte where the actionId is either 4 or 5 and it is larger than the next action id.
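The same "compare with the next row" rule can be sketched in plain Python to check the logic against the sample data. The function name last_high_actions is an assumption for this example, and ids are taken to be consecutive from 1, as the IDENTITY column produces in the test table.

```python
# ActionId values from the question, in id order (ids start at 1).
ACTIONS = [1, 2, 3, 3, 4, 4, 5, 3, 3, 3, 4, 1, 2]


def last_high_actions(actions, wanted=(4, 5)):
    """Return the ids whose ActionId is in `wanted` and larger than the next row's."""
    ids = []
    for index, action in enumerate(actions):
        # The last row has no successor; treat it as 0, like ISNULL(..., 0).
        next_action = actions[index + 1] if index + 1 < len(actions) else 0
        if action in wanted and action > next_action:
            ids.append(index + 1)
    return ids


if __name__ == "__main__":
    print(last_high_actions(ACTIONS))
```

Defaulting the missing successor to 0 is what lets a 4 or 5 in the very last row qualify.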
If I have guessed correctly, you need all rows where the next row's value is less than the current value. If so, you can use a self join for this purpose. The following script will give you the desired output:
DECLARE @test TABLE
(
id [int] IDENTITY(1,1),
Actionid INT
)
INSERT INTO @test (Actionid )
VALUES
(1), (2), (3), (3), (4), (4), (5), (3), (3), (3), (4), (1),(2)
SELECT A.*
FROM @test A LEFT JOIN @test B ON A.id = B.id-1
WHERE B.Actionid < A.Actionid
The output is:
id Actionid
7 5
11 4
If you also need the last row regardless of its value, change the script as shown below. This will include the last value, 2, in the output.
SELECT A.*
FROM @test A LEFT JOIN @test B ON A.id = B.id-1
WHERE B.Actionid < A.Actionid
OR B.Actionid IS NULL
A recursive CTE can help you here:
--Your mockup table
DECLARE @test TABLE
(
id [int] IDENTITY(1,1),
Actionid INT
)
INSERT INTO @test (Actionid )
VALUES (1), (2), (3), (3), (4), (4), (5), (3), (3), (3), (4), (1),(2);
--the query
WITH recCTE AS
(
SELECT id
,Actionid
,1 AS GroupKey
,1 AS GroupStep
FROM @test t WHERE id=1 --the IDENTITY is the sorting key obviously and will start with a 1 in this test case.
UNION ALL
SELECT t.id
,t.Actionid
,CASE WHEN t.Actionid<=r.Actionid THEN r.GroupKey+1 ELSE r.GroupKey END
,CASE WHEN t.Actionid<=r.Actionid THEN 1 ELSE r.GroupStep+1 END
FROM @test t
INNER JOIN recCTE r ON t.id=r.id+1
)
SELECT *
FROM recCTE;
The idea in short:
We start with the first row and iterate through the set row by row. For each row we test whether the ActionId has stopped increasing and set the GroupKey and the GroupStep accordingly.
The result
+----+----------+----------+-----------+
| id | Actionid | GroupKey | GroupStep |
+----+----------+----------+-----------+
| 1 | 1 | 1 | 1 |
+----+----------+----------+-----------+
| 2 | 2 | 1 | 2 |
+----+----------+----------+-----------+
| 3 | 3 | 1 | 3 |
+----+----------+----------+-----------+
| 4 | 3 | 2 | 1 |
+----+----------+----------+-----------+
| 5 | 4 | 2 | 2 |
+----+----------+----------+-----------+
| 6 | 4 | 3 | 1 |
+----+----------+----------+-----------+
| 7 | 5 | 3 | 2 |
+----+----------+----------+-----------+
| 8 | 3 | 4 | 1 |
+----+----------+----------+-----------+
| 9 | 3 | 5 | 1 |
+----+----------+----------+-----------+
| 10 | 3 | 6 | 1 |
+----+----------+----------+-----------+
| 11 | 4 | 6 | 2 |
+----+----------+----------+-----------+
| 12 | 1 | 7 | 1 |
+----+----------+----------+-----------+
| 13 | 2 | 7 | 2 |
+----+----------+----------+-----------+
Solving your issue
We can proceed from there by changing the final SELECT to this
SELECT TOP 1 WITH TIES *
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY GroupKey ORDER BY GroupStep DESC);
The result shows the last entry per sub-set
+----+----------+----------+-----------+
| id | Actionid | GroupKey | GroupStep |
+----+----------+----------+-----------+
| 3 | 3 | 1 | 3 |
+----+----------+----------+-----------+
| 5 | 4 | 2 | 2 |
+----+----------+----------+-----------+
| 8 | 3 | 4 | 1 |
+----+----------+----------+-----------+
| 9 | 3 | 5 | 1 |
+----+----------+----------+-----------+
| 11 | 4 | 6 | 2 |
+----+----------+----------+-----------+
| 7 | 5 | 3 | 2 |
+----+----------+----------+-----------+
| 13 | 2 | 7 | 2 |
+----+----------+----------+-----------+
You can filter to the sub-sets where the last entry is a 4 or a 5. In this case I see rows 7 and 11, but also row 5. Perhaps I did not understand the logic correctly...
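The row-by-row grouping can be replayed in plain Python to confirm which rows end each sub-set. This is a sketch of the logic only; the function names label_groups and last_per_group are assumptions for the example.

```python
# ActionId values from the question, in id order (ids start at 1).
ACTIONS = [1, 2, 3, 3, 4, 4, 5, 3, 3, 3, 4, 1, 2]


def label_groups(actions):
    """Return (id, ActionId, GroupKey, GroupStep) tuples, one per row."""
    labelled = []
    group_key, group_step = 1, 1
    for index, action in enumerate(actions):
        if index > 0:
            if action <= actions[index - 1]:
                # Not increasing: a new sub-set starts here.
                group_key += 1
                group_step = 1
            else:
                group_step += 1
        labelled.append((index + 1, action, group_key, group_step))
    return labelled


def last_per_group(labelled):
    """Keep only the final row of every GroupKey."""
    last = {}
    for row in labelled:
        last[row[2]] = row  # later rows overwrite earlier ones in the same group
    return [last[key] for key in sorted(last)]


if __name__ == "__main__":
    for row in last_per_group(label_groups(ACTIONS)):
        print(row)
```

Filtering the last rows to ActionId 4 or 5 yields ids 5, 7 and 11. Row 5 shows up because the repeated 4 at id 6 starts a new group under the `<=` test, which is the discrepancy with the expected answer noted above.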
This is the query I came up with:
WITH cte
AS
(SELECT id, Actionid, ROW_NUMBER() OVER (ORDER BY id) rn FROM test)
SELECT
prev.id
,prev.Actionid prevActionId
,cur.Actionid curActionId
FROM cte cur
JOIN cte prev
ON prev.rn = cur.rn - 1
WHERE
prev.Actionid > cur.Actionid
AND prev.Actionid IN (4, 5)

Getting a lineage of linked rows with details

I'm trying to get a "lineage" or similar, and also information about the first and last links (at least; all would be good), out of a table that has self-referential links between rows that have been "replaced" and rows that have replaced them. The table has a structure along these lines:
CREATE TABLE Thing (
Id INT PRIMARY KEY,
TStamp DATETIME,
Replaces INT NULL,
ReplacedBy INT NULL
);
I'm stuck with this structure. :-) It's sort of doubly-linked (yes, it's a bit silly): Each row has a unique Id, and then a row that has been "replaced" by another will have a non-NULL ReplacedBy giving the Id of the replacement row, and the replacement row will also have a link back to what it replaces in Replaces. So we can use either Replaces or ReplacedBy (or both) if we like.
Here's some sample data:
INSERT INTO Thing
(Id, TStamp, Replaces, ReplacedBy)
VALUES
(1, '2017-01-01', NULL, 11),
(2, '2017-01-02', NULL, 12),
(3, '2017-01-03', NULL, NULL),
(4, '2017-01-04', NULL, NULL),
(11, '2017-01-11', 1, NULL),
(12, '2017-01-12', 2, 22),
(22, '2017-01-22', 12, NULL);
So 1 was replaced by 11, 2 was replaced by 12, and 12 was replaced by 22.
I'd like to get the following information for each chain of links from this table in a reasonable way:
Details of the row that started the chain
Details of the final row in the chain
Details of the links in-between or at least how many links (total) there are in the chain
...filtered by a date range applied to the last row in the chain.
In an ideal universe, I'd get back something like this:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 2017−01−01 |
| 1 | 11 | 11 | 2 | 2017−01−11 |
| 2 | 22 | 2 | 3 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2017−01−22 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−−−−−−+
So far I have this query, which I could post-process to get the above:
WITH Data AS (
SELECT Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.TStamp, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
)
SELECT *
FROM Data
WHERE ReplacedBy IS NOT NULL OR Depth > 0
ORDER BY
Id, Depth;
That gives me:
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| Id | TStamp | Replaces | ReplacedBy | Depth |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
| 1 | 2017−01−01 | NULL | 11 | 0 |
| 2 | 2017−01−02 | NULL | 12 | 0 |
| 11 | 2017−01−11 | 1 | NULL | 1 |
| 12 | 2017−01−12 | 2 | 22 | 0 |
| 12 | 2017−01−12 | 2 | 22 | 1 |
| 22 | 2017−01−22 | 12 | NULL | 1 |
| 22 | 2017−01−22 | 12 | NULL | 2 |
+−−−−+−−−−−−−−−−−−+−−−−−−−−−−+−−−−−−−−−−−−+−−−−−−−+
And I could use something like this to figure out (for instance) the final row of each chain:
WITH Data AS (
SELECT Id, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
UNION ALL
SELECT Thing.Id, Thing.Replaces, Thing.ReplacedBy, Depth + 1
FROM Data
JOIN Thing
ON Thing.Replaces = Data.Id
),
MaxData AS (
SELECT Data.Id, Data.Depth
FROM Data
JOIN (
SELECT Id, MAX(Depth) AS MaxDepth
FROM Data
GROUP BY Id
) j ON data.Id = j.Id AND Data.Depth = j.MaxDepth
WHERE Depth > 0
)
SELECT *
FROM MaxData
ORDER BY
Id;
...which gives me:
+−−−−+−−−−−−−+
| Id | Depth |
+−−−−+−−−−−−−+
| 11 | 1 |
| 12 | 1 |
| 22 | 2 |
+−−−−+−−−−−−−+
...but I've lost the starting point and the points along the way.
I have the strong feeling I'm missing something really straight-forward — but clever — that would let me get this largely with the query rather than post-processing, some kind of join with a "min" and "max" query (but not like my one above). What would it be?
The table doesn't have any indexes on Replaces or ReplacedBy, but we could add any needed. The table is only lightly used (roughly 300k rows and probably only a couple of hundred updates/inserts a day).
I'm limited to SQL Server 2008 features.
Inspired by Gordon Linoff's answer and HABO's comment which highlighted something Gordon was doing that was critical, I:
Removed the SQL Server 2012+ FIRST_VALUE function, replacing it with a CROSS APPLY on an "overview" query of the data
Included the Links count in the overview query
Removed the reliance on t in Gordon's WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id), which (at least on SQL Server 2008) wasn't bound to anything
Filtered out rows that weren't replaced
Below, I also add the date filtering mentioned in the question
...filtered by a date range applied to the last row in the chain.
...which Gordon didn't cover at all, and changes our approach, but only in terms of the arrow of time.
So, first, without the date criteria, sticking fairly close to Gordon's answer:
WITH Data AS (
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
),
Overview AS (
SELECT FirstId, MAX(Id) AS LastId, COUNT(*) AS Links
FROM Data
GROUP BY
FirstId
)
SELECT d.FirstId, o.LastId, d.Id, o.Links, d.Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT LastId, Links
FROM Overview
WHERE FirstId = d.FirstId
) o
ORDER BY
d.FirstId, d.Depth
;
The critical parts of that are grabbing the seed Id as FirstId here:
SELECT Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
and then propagating it through the results of the recursive join:
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d
JOIN Thing t ON t.Replaces = d.Id
Just adding that to my original query gives us most of what I wanted. Then we add a second query to get the LastId for each FirstId (Gordon did it as a FIRST_VALUE over a partition, but I can't do that in SQL Server 2008) and using an overview query also lets me grab the number of links. We cross-apply that on the basis of the FirstId value to get the overall results I wanted.
The query above returns the following for the sample data:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-22 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...i.e., exactly what I wanted, plus Depth if I want (so I know what order the intermediary links were in).
If we wanted to include rows that were never replaced, we'd just change
WHERE Replaces IS NULL AND ReplacedBy IS NOT NULL
to
WHERE Replaces IS NULL
Giving us:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 1 | 11 | 1 | 2 | 0 | 2017-01-01 |
| 1 | 11 | 11 | 2 | 1 | 2017-01-11 |
| 2 | 22 | 2 | 3 | 0 | 2017-01-02 |
| 2 | 22 | 12 | 3 | 1 | 2017-01-12 |
| 2 | 22 | 22 | 3 | 2 | 2017-01-22 |
| 3 | 3 | 3 | 1 | 0 | 2017-01-03 |
| 4 | 4 | 4 | 1 | 0 | 2017-01-04 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
But we've ignored the date criteria required by the question:
...filtered by a date range applied to the last row in the chain.
To do that without building a massive temporary result set, we have to work backward: Instead of selecting the starting point (the first entry in a chain, Replaces IS NULL), we need to select the ending point (the last entry in a chain, ReplacedBy IS NULL), and then invert our logic working back through the chain. It's largely a matter of:
Swapping FirstId with LastId
Swapping Replaces with ReplacedBy (convenient the table had both!)
Using MIN to get the first ID in the chain rather than MAX to get the last
Using d.Depth - 1 rather than d.Depth + 1
Then fixing-up Depth based on Links once we know it in our final select, to get those nice values where 0 = first link rather than some varying negative number: o.Links + d.Depth - 1 AS Depth
All of which gives us:
WITH Data AS (
SELECT Id AS LastId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing
WHERE ReplacedBy IS NULL AND Replaces IS NOT NULL
-- Filtering by date of last entry would go here
UNION ALL
SELECT d.LastId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth - 1
FROM Data d
JOIN Thing t ON t.ReplacedBy = d.Id
),
Overview AS (
SELECT LastId, MIN(Id) AS FirstId, COUNT(*) AS Links
FROM Data
GROUP BY
LastId
)
SELECT o.FirstId, d.LastId, d.Id, o.Links, o.Links + d.Depth - 1 AS Depth, d.TStamp
FROM Data d
CROSS APPLY (
SELECT FirstId, Links
FROM Overview
WHERE LastId = d.LastId
) o
ORDER BY
o.FirstId, d.Depth
;
So for instance, if we used
AND TStamp BETWEEN '2017-01-12' AND '2017-02-01'
where I have
-- Filtering by date of last entry would go here
above, with our sample data we'd get this result:
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| FirstId | LastId | Id | Links | Depth | TStamp |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
| 2 | 22 | 2 | 3 | 0 | 2017−01−02 |
| 2 | 22 | 12 | 3 | 1 | 2017−01−12 |
| 2 | 22 | 22 | 3 | 2 | 2017−01−22 |
+−−−−−−−−−+−−−−−−−−+−−−−+−−−−−−−+−−−−−−−+−−−−−−−−−−−−+
...because the last link in the Id = 1 chain is outside the date range, so we don't include it.
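The backward walk, including the date filter on the last link and the Depth fix-up, can be tried out with the sketch below. It uses Python's sqlite3 module purely as an assumption to keep the example runnable: the lower-case names thing, replaced_by and chains_ending_between are hypothetical, an ordinary JOIN stands in for CROSS APPLY, and the corrected depth is returned under the name pos.

```python
import sqlite3

# Sample data from the question: (Id, TStamp, Replaces, ReplacedBy).
THINGS = [
    (1, "2017-01-01", None, 11),
    (2, "2017-01-02", None, 12),
    (3, "2017-01-03", None, None),
    (4, "2017-01-04", None, None),
    (11, "2017-01-11", 1, None),
    (12, "2017-01-12", 2, 22),
    (22, "2017-01-22", 12, None),
]

QUERY = """
WITH RECURSIVE data(last_id, id, tstamp, depth) AS (
    -- anchor: the final row of each chain, filtered by date
    SELECT id, id, tstamp, 0
    FROM thing
    WHERE replaced_by IS NULL AND replaces IS NOT NULL
      AND tstamp BETWEEN :start AND :end
    UNION ALL
    -- step: walk backward to the row that was replaced
    SELECT d.last_id, t.id, t.tstamp, d.depth - 1
    FROM data AS d
    JOIN thing AS t ON t.replaced_by = d.id
),
overview AS (
    SELECT last_id, MIN(id) AS first_id, COUNT(*) AS links
    FROM data
    GROUP BY last_id
)
SELECT o.first_id, d.last_id, d.id, o.links,
       o.links + d.depth - 1 AS pos,  -- 0 = first link in the chain
       d.tstamp
FROM data AS d
JOIN overview AS o ON o.last_id = d.last_id
ORDER BY o.first_id, d.depth
"""


def chains_ending_between(things, start, end):
    """Return every link of each chain whose last row falls in [start, end]."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE thing (id INTEGER PRIMARY KEY, tstamp TEXT,"
        " replaces INTEGER, replaced_by INTEGER)"
    )
    con.executemany("INSERT INTO thing VALUES (?, ?, ?, ?)", things)
    result = con.execute(QUERY, {"start": start, "end": end}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in chains_ending_between(THINGS, "2017-01-12", "2017-02-01"):
        print(row)
```

Filtering in the anchor means the recursion only ever visits chains whose last row is in range, which is the point of walking backward. Like the query above, the MIN(id) shortcut assumes Ids increase along a chain.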
This is a little tricky. Arrange the CTE to start at the beginning of each list. That makes the subsequent processing easier:
WITH Data AS (
SELECT Id as FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM Thing t
WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id)
UNION ALL
SELECT d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM Data d JOIN
Thing t
ON t.Replaces = d.Id
)
SELECT d.*,
FIRST_VALUE(id) OVER (PARTITION BY FirstId ORDER BY Depth DESC) as LastId
FROM Data d;
Then, you can use FIRST_VALUE() with a reverse sort to get the last value in the chain.
This returns chains that have no links. You can add a filter to remove these.

SQL Server 2005 T-SQL Problem: Need help in omitting records

Good day!
I need help writing a query. I have the records in the table below. The condition is that records should not be displayed if a succeeding record's new_state repeats the new_state of a previous record and the change happened on the same date.
Here record_id 1 has gone through the following states: 0->1->2->1->3->4->3 on the same day. State 1 was changed to state 2 and then back to state 1 again (ids 2 and 3 would not be displayed); the same goes for state 3 (ids 5 and 6 would not be displayed).
id | record_id| date_changed | old_state | new_state |
1 | 1 | 2009-01-01 | 0 | 1 |
2 | 1 | 2009-01-01 | 1 | 2 | not displayed
3 | 1 | 2009-01-01 | 2 | 1 | not displayed
4 | 1 | 2009-01-01 | 1 | 3 |
5 | 1 | 2009-01-01 | 3 | 4 | not displayed
6 | 1 | 2009-01-01 | 4 | 3 | not displayed
so the result would display only 2 records for record_id=1..
id | record_id| date_changed | old_state | new_state |
1 | 1 | 2009-01-01 | 0 | 1 |
4 | 1 | 2009-01-01 | 1 | 3 |
Here's the code for table creation and data:
IF OBJECT_ID('TempDB..#table','U') IS NOT NULL
DROP TABLE #table
CREATE TABLE #table
(
id INT identity primary key,
record_id INT,
date_changed DATETIME,
old_state INT,
new_state INT
)
INSERT INTO #table(record_id,date_changed,old_state,new_state)
SELECT 1,'2009-01-01',0,1 UNION ALL --displayed
SELECT 1,'2009-01-01',1,2 UNION ALL --not displayed
SELECT 1,'2009-01-01',2,1 UNION ALL --not displayed
SELECT 1,'2009-01-01',1,3 UNION ALL --displayed
SELECT 1,'2009-01-01',3,4 UNION ALL --not displayed
SELECT 1,'2009-01-01',4,3 --not displayed
INSERT INTO #table(record_id,date_changed,old_state,new_state)
SELECT 3,'2009-01-01',0,1 UNION ALL --displayed
SELECT 3,'2009-01-01',1,2 UNION ALL --not displayed
SELECT 3,'2009-01-01',2,3 UNION ALL --not displayed
SELECT 3,'2009-01-01',3,4 UNION ALL --not displayed
SELECT 3,'2009-01-01',4,1 --not displayed
SELECT * FROM #table
I would appreciate any help..
Thanks
For clarity regarding record_id=3.. Given this table:
id | record_id| date_changed | old_state | new_state |
7 | 3 | 2009-01-01 | 0 | 1 |
8 | 3 | 2009-01-01 | 1 | 2 | not displayed
9 | 3 | 2009-01-01 | 2 | 3 | not displayed
10 | 3 | 2009-01-01 | 3 | 4 | not displayed
11 | 3 | 2009-01-01 | 4 | 1 | not displayed
when running the query for record_id=3, the table result will be:
id | record_id| date_changed | old_state | new_state |
7 | 3 | 2009-01-01 | 0 | 1 |
Thanks!
UPDATE (12/2/2009):
Special scenario
id | record_id| date_changed | old_state | new_state |
1 | 4 | 2009-01-01 | 0 | 1 | displayed
2 | 4 | 2009-01-01 | 1 | 2 | displayed
3 | 4 | 2009-01-01 | 2 | 3 | not displayed
4 | 4 | 2009-01-01 | 3 | 2 | not displayed
5 | 4 | 2009-01-01 | 2 | 3 | displayed
6 | 4 | 2009-01-01 | 3 | 4 | not displayed
7 | 4 | 2009-01-01 | 4 | 3 | not displayed
where new_state 3 appears on ids 3, 5 and 7. Id 3 would not be displayed since it is between id 2 and id 4, which have the same new_state (2). Then id 5 should be displayed since there is no existing new_state 3 yet.
code snippet:
IF OBJECT_ID('TempDB..#tablex','U') IS NOT NULL
DROP TABLE #tablex
CREATE TABLE #tablex
(
id INT identity primary key,
record_id INT,
date_changed DATETIME,
old_state INT,
new_state INT
)
INSERT INTO #tablex(record_id,date_changed,old_state,new_state)
SELECT 4,'2009-01-01',0,1 UNION ALL --displayed
SELECT 4,'2009-01-01',1,2 UNION ALL --displayed
SELECT 4,'2009-01-01',2,3 UNION ALL --not displayed
SELECT 4,'2009-01-01',3,2 UNION ALL --not displayed
SELECT 4,'2009-01-01',2,3 UNION ALL --displayed
SELECT 4,'2009-01-01',3,4 UNION ALL --not displayed
SELECT 4,'2009-01-01',4,3 --not displayed
I think the sequence in building the result is important..
Thanks!
SELECT A.*
/*
A.ID, A.old_state, a.new_state,
B.ID as [Next], b.old_state, b.new_state,
C.ID as [Prev], c.old_state, c.new_state
*/
FROM #table A LEFT JOIN
#table B ON A.ID = (B.ID - 1)
LEFT JOIN #table C ON (A.ID - 1) = C.ID
-- WHERE A.old_State <> B.new_State AND A.new_State <> C.old_State
WHERE A.record_id = 1
AND A.old_State <> COALESCE(B.new_State, -1)
AND A.new_State <> COALESCE(C.old_State, -1)
EDIT: I guess what the OP needs is to select every record except those where the current record's old state is the same as the next record's new state (a kind of undo operation in the records) or the current record's new state is the same as the previous record's old state.
The following steps get to the result:
1. Select all items that should not appear in the result.
2. Left join these with the original table and select only those records that don't match a "should not appear" record.
;WITH cte_table (master_id, master_state, id, record_id, old_state, new_state, level) AS
(
SELECT id, old_state, id, record_id, old_state, new_state, 1
FROM #table
UNION ALL
SELECT master_id, master_state, #table.id, #table.record_id, #table.old_state, #table.new_state, level + 1
FROM cte_table
INNER JOIN #table ON cte_table.new_state = #table.old_state
AND cte_table.record_id = #table.record_id
AND cte_table.id < #table.id
AND cte_table.master_state < #table.old_state
)
SELECT master_id, t1.*, level
INTO #result
FROM #table t1
INNER JOIN (
SELECT master_id, min_child_id = MIN(id), level
FROM cte_table
GROUP BY master_id, level
) t2 ON t2.min_child_id = t1.id
SELECT t1.*
FROM #table t1
LEFT OUTER JOIN (
SELECT r1.id
FROM #result r1
INNER JOIN (
SELECT r1.master_id
FROM #result r1
INNER JOIN #result r2 ON r2.new_state = r1.old_state
AND r2.master_id = r1.master_id
WHERE r1.level = 1
) r2 ON r2.master_id = r1.master_id
) r1 ON r1.id = t1.id
WHERE r1.id IS NULL
AND t1.old_state < t1.new_state
ORDER BY 1, 2, 3
