How to update the parent field in SQL Server

My data looks like this
ID Text IsParent ParentID
-------------------------
1 A 1 NULL
2 B 0 NULL
3 C 0 NULL
4 D 0 NULL
5 E 1 NULL
6 F 0 NULL
7 G 1 NULL
8 H 0 NULL
I want to fill ParentID with the ID of the most recent parent row. The data is ordered, so:
ID 2, 3, 4 should get ParentID 1
ID 6 should get ParentID 5
ID 8 should get ParentID 7
How can I do this with SQL?
I have tried a cursor, but it is way too slow.
Here is my code:
-- Variable declarations (data types assumed from the sample data)
DECLARE @ID INT, @Text NVARCHAR(100), @IsParent BIT, @ParentID INT, @LastParentID INT;

DECLARE cur1 CURSOR FOR
    SELECT ID, Text, IsParent, ParentID
    FROM x2
    ORDER BY ID;

OPEN cur1;
FETCH NEXT FROM cur1 INTO @ID, @Text, @IsParent, @ParentID;

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @IsParent = 1
    BEGIN
        SET @LastParentID = @ID;
    END
    ELSE
    BEGIN
        UPDATE x2
        SET ParentID = @LastParentID
        WHERE ID = @ID;
    END

    FETCH NEXT FROM cur1 INTO @ID, @Text, @IsParent, @ParentID;
END;

CLOSE cur1;
DEALLOCATE cur1;

You can do this with APPLY. The premise is to find the parent record with the highest ID, where the ID is lower than the child record.
Example
DECLARE @x2 TABLE (ID INT NOT NULL, Text CHAR(1), IsParent BIT, ParentID INT);

INSERT @x2 (ID, Text, IsParent)
VALUES
    (1, 'A', 1), (2, 'B', 0), (3, 'C', 0), (4, 'D', 0),
    (5, 'E', 1), (6, 'F', 0), (7, 'G', 1), (8, 'H', 0);

UPDATE c
SET ParentID = p.ID
FROM @x2 AS c
CROSS APPLY
(   SELECT TOP 1 ID
    FROM @x2 AS p
    WHERE p.IsParent = 1   -- Is a parent record
    AND p.ID < c.ID        -- ID is lower than the child record's
    ORDER BY p.ID DESC     -- Order descending to get the highest ID
) AS p
WHERE c.IsParent = 0
AND c.ParentID IS NULL;

SELECT *
FROM @x2;
OUTPUT
ID Text IsParent ParentID
---------------------------------
1 A 1 NULL
2 B 0 1
3 C 0 1
4 D 0 1
5 E 1 NULL
6 F 0 5
7 G 1 NULL
8 H 0 7

You can use a CTE and window functions to achieve that.
First we create a continuous group id (cid) using a running SUM, then we pick the minimum ID within each cid, and finally we update the table where IsParent is 0.
Try the following:
;WITH cte AS
(
SELECT *, sum(t.IsParent) OVER (ORDER BY id) cid
FROM #t t
),
cte2 AS
(
SELECT *, min(id) OVER (PARTITION BY cid ORDER BY id) pid
FROM cte c
)
UPDATE t
SET
t.ParentID = pid
FROM #t t
JOIN cte2 c ON c.id = t.ID
WHERE c.IsParent = 0
db<>fiddle demo.

The quickest way would be to insert the child records with the parent ID in the first place, instead of populating the parent IDs after the fact. From code, you would first insert the parent record and get back the newly generated parent ID, then insert the child records with that ID (a minimal sketch is below).
Trying to maintain a query like the ones suggested would just get messy over time as the data grows. Just because you can, doesn't mean you should.
Also, as a side note: if you plan on having child-of-child records with unknown depth, I would recommend looking into the hierarchyid data type to avoid recursion.
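Returning to the main suggestion, a minimal sketch of the insert-parents-first approach, assuming ID is an IDENTITY column on the question's x2 table (the literal values are just placeholders):

DECLARE @NewParentID INT;

-- Insert the parent row first...
INSERT INTO x2 (Text, IsParent, ParentID)
VALUES ('E', 1, NULL);

-- ...capture the ID it was given...
SET @NewParentID = SCOPE_IDENTITY();

-- ...and insert the children already pointing at it.
INSERT INTO x2 (Text, IsParent, ParentID)
VALUES ('F', 0, @NewParentID);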

Related

Identify duplicates based on multiple columns and parent row

This is an example of table data that I am working on (the table contained a lot of columns, I am showing here only the relevant ones):
Id   job_number   status   parent_id
1    42FWD-42     0        0
2    42FWD-42     1        1
3    42FWD-42     5        1
Id is auto generated. parent_id links the job using the id.
When a new job is created via the app, a new row is created (with status "0"). The auto-generated Id is then used for subsequent rows of the same job, and is set as their parent_id.
Another record with status "1" (which is the code for "started") is also created just after the parent record.
Explanation of the problem: due to a bug in the app, there are duplicate sets of rows for the same job.
Example of the problem:
Id   job_number   status   parent_id
1    42FWD-42     0        0
2    42FWD-42     0        0
3    42FWD-42     1        1
4    42FWD-42     1        2
5    42FWD-42     5        1
As you can see from this example, due to the bug there are 2 rows with "0" status for the same job, and 2 rows with "1" status.
This creates a lot of problems in the app, where the job is updated using the job number.
The status number should not repeat for a specific job.
What I want to do is find all duplicates like those in the example: a query that finds all duplicates which have the same job number but a different parent_id and no "5" status.
Using the example table above, I need the query to return:
Id   job_number   status   parent_id
2    42FWD-42     0        0
4    42FWD-42     1        2
Explanation of this result:
Row with Id=1 is considered the correct record because it has an associated record with status "5".
Row with Id=2 is considered a duplicate, and its associated records are also considered duplicates.
Another possible case: there are duplicate rows, but none of them has status=5. These rows can be discarded, i.e. they need not be shown in the results.
A brief explanation of how the query works would be appreciated.
EDIT:
I forgot to add an important piece of information:
job_number is case sensitive.
i.e. 42FWD-42 and 42fwd-42 are different, valid job numbers. They should not be considered duplicates; they are 2 separate jobs.
The reason for this is that the actual job number is not short text as in my example; it is a long string, like a GUID.
First I must mention that you should block identical rows by means of a unique constraint. I suggest that once you have eliminated all duplicates, you put up such a constraint to keep this from happening again; a sketch follows.
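Something along these lines would work once the data is clean (a sketch only; the jobs table name is hypothetical, and the column list should be whatever truly defines uniqueness for you):

ALTER TABLE jobs  -- hypothetical table name
ADD CONSTRAINT UQ_jobs_job_number_status UNIQUE (job_number, status);

Since your job_number values are case sensitive, make sure that column uses a case-sensitive collation, otherwise 42FWD-42 and 42fwd-42 would collide under this constraint.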
Now for your question: you can do this by grouping on the duplicated columns and keeping only the groups that count more than one row.
Here is an example
declare @t table (id int, job_number varchar(10), status int, parent_id int)

insert into @t
values (1, '42FWD-42', 0, 0), (2, '42FWD-42', 0, 0), (3, '42FWD-42', 1, 1), (4, '42FWD-42', 1, 2), (5, '42FWD-42', 5, 1)

select max(t.id) as id, t.job_number, t.status
from @t t
group by t.job_number, t.status
having count(*) > 1
the result is
id job_number status
2 42FWD-42 0
4 42FWD-42 1
and to also get the parent_id you can add a correlated lookup on the same table:

select max(t.id) as id,
       t.job_number,
       t.status,
       (select t2.parent_id from @t t2 where t2.id = max(t.id)) as parent_id
from @t t
group by t.job_number, t.status
having count(*) > 1
this returns
id job_number status parent_id
2 42FWD-42 0 0
4 42FWD-42 1 2
EDIT
To solve the additional problem in the edit of your question, about case sensitivity, you can fix that by using a COLLATE clause in your column retrieval and your comparison.
This should do it:
declare @t table (id int, job_number varchar(10), status int, parent_id int)

insert into @t
values (1, '42FWD-42', 0, 0),
       (2, '42FWD-42', 0, 0),
       (3, '42FWD-42', 1, 1),
       (4, '42fwd-42', 1, 2), -- LOWERCASE !!!
       (5, '42FWD-42', 5, 1)

select max(t.id) as id,
       t.job_number COLLATE Latin1_General_CS_AS,
       t.status,
       (select t2.parent_id from @t t2 where t2.id = max(t.id)) as parent_id
from @t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
and now the result will be
id job_number status parent_id
2 42FWD-42 0 0
Yet another edit
Now, suppose you need to use these duplicate IDs in another query; you could do something like this:
select t.*
from @t t
where t.id in ( select max(t.id) as id
                from @t t
                group by t.job_number COLLATE Latin1_General_CS_AS, t.status
                having count(*) > 1
              )
What I am doing here is getting only the duplicate id's in a form that can be used to feed a where clause in another query.
This way you can use the result set in any way you wish.
Also note that for this we don't need the self join to retrieve the parent_id anymore.
One possible use of this could be to delete the duplicate rows:

delete from yourtable
where id in ( select max(t.id) as id
              from @t t
              group by t.job_number COLLATE Latin1_General_CS_AS, t.status
              having count(*) > 1
            )
You can try using the ROW_NUMBER window function to get the duplicate row and its id per job_number, then use a recursive CTE to find all error records linked by that id.
Query 1:
;WITH CTE AS (
SELECT *,ROW_NUMBER() OVER (PARTITION BY job_number ORDER BY Id) rn
FROM T
WHERE status = 0
), CTE1 AS (
SELECT id,job_number,status,parent_id
FROM CTE
WHERE rn > 1
UNION ALL
SELECT t.id,t.job_number,t.status,t.parent_id
FROM CTE1 c INNER JOIN T t
ON c.id = t.parent_id
)
SELECT *
FROM CTE1
Results:
| id | job_number | status | parent_id |
|----|------------|--------|-----------|
| 2 | 42FWD-42 | 0 | 0 |
| 4 | 42FWD-42 | 1 | 2 |

T-SQL Get Values from Lookup table and use in view / stored procedure

I couldn't find this via search, so I guess I am not asking it the right way; any help is welcome.
We have a lookup table:

Id   Name
------------------
1    "Test"
2    "New"
3    "InProgress"

Table2:

StatusId   SomethingElse
------------------------
1
2

Table1:

ID   Other   Other   StatusId (FK to Table2)   ...
Then we have a view that selects from several tables, and one of its columns is a CASE statement:

SELECT t1.*,   -- Table1 has millions of records
       CASE WHEN t1.StatusId = 1 THEN (SELECT Name FROM LookUp WHERE Id = 1)     --'Test'
            WHEN t1.StatusId = 2 THEN (SELECT Name FROM LookUp WHERE Id = 2)     --'New'
            WHEN t3.Date IS NOT NULL THEN (SELECT Name FROM LookUp WHERE Id = 3) --'InProgress'
            -- the CASE also looks at another 5-6 tables, with conditions from them
       END
FROM Table1 t1
INNER JOIN Table2 t2 ON ...
INNER JOIN Table3 t3 ON ...
As you can see, these are really static values. I want to load them once into variables, e.g.

@LookUp1 = SELECT [NAME] FROM LookUp WHERE Id = 1
@LookUp2 = SELECT [NAME] FROM LookUp WHERE Id = 2

and replace the selects in the CASE statement with this:

WHEN StatusId = 1 THEN @LookUp1
WHEN StatusId = 2 THEN @LookUp2

The view goes through millions of records, and doing the select from the lookup table for every row gets really slow.
Why not simply use a join?
SELECT <columns list from main table>, Lt.Name
FROM <main table> As Mt -- Do not use such aliases in real code!
JOIN <SecondaryTable> As St -- this represents your Table3
ON <condition>
[LEFT] JOIN <Lookup table> As Lt
ON Mt.StatusId = Lt.Id
OR (Lt.Id = 3 AND St.Date is not null)
Of course, replace <columns list from main table> with the actual columns list, <main table> with the name of the main table and so on.
The join might be an inner or left join, depending on the nullability of the StatusId column in the main table and if it's nullable, on what you want to get in such cases (either a row with null name or no row at all).
I've put together a little demonstration to show you exactly what I mean.
Create and populate sample tables (Please save us this step in your future questions):
CREATE TABLE LookUp (Id int, Name varchar(10));
INSERT INTO LookUp (Id, Name) VALUES
(1, 'Test'), (2, 'New'), (3, 'InProgress');
CREATE TABLE Table1 (Id int not null, StatusId int null);
INSERT INTO Table1(Id, StatusId)
SELECT n, CASE WHEN n % 3 = 0 THEN NULL ELSE (n % 3) END
FROM
(
SELECT TOP 30 ROW_NUMBER() OVER(ORDER BY @@SPID) As n
FROM sys.objects
) tally
CREATE TABLE Table3
(
Id int not null,
Date date null
)
INSERT INTO Table3 (Id, Date)
SELECT Id, CASE WHEN StatusId IS NULL AND Id % 4 = 0 THEN GetDate() END
FROM Table1
The query:
SELECT Table1.Id,
Table1.StatusId,
Table3.Date,
LookUp.Name
FROM Table1
JOIN Table3
ON Table1.Id = Table3.Id
LEFT JOIN LookUp
ON Table1.StatusId = LookUp.Id
OR (LookUp.Id = 3 AND Table3.Date IS NOT NULL)
Results:
Id StatusId Date Name
1 1 NULL Test
2 2 NULL New
3 NULL NULL NULL
4 1 NULL Test
5 2 NULL New
6 NULL NULL NULL
7 1 NULL Test
8 2 NULL New
9 NULL NULL NULL
10 1 NULL Test
11 2 NULL New
12 NULL 27.06.2019 InProgress
13 1 NULL Test
14 2 NULL New
15 NULL NULL NULL
16 1 NULL Test
17 2 NULL New
18 NULL NULL NULL
19 1 NULL Test
20 2 NULL New
21 NULL NULL NULL
22 1 NULL Test
23 2 NULL New
24 NULL 27.06.2019 InProgress
25 1 NULL Test
26 2 NULL New
27 NULL NULL NULL
28 1 NULL Test
29 2 NULL New
30 NULL NULL NULL
You can also see a live demo on rextester.
Create a SQL function which returns the Name for a given Id:
Create FUNCTION [dbo].[GetLookUpValue]
(
    @Id int
)
RETURNS varchar(500)
AS
BEGIN
    return (Select Name from LOOKUP_table with(nolock) where Id = @Id)
END
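A hypothetical call site (my addition; it assumes the view still exposes Table1.StatusId) would then look like this:

SELECT t1.*,
       dbo.GetLookUpValue(t1.StatusId) AS StatusName  -- name resolved via the scalar function
FROM Table1 AS t1;

Keep in mind that a scalar function like this is still evaluated once per row, so it mainly tidies the view rather than removing the per-row lookup cost.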

Batches over groups

I need to process rows in a table in batches of not less than N rows. Each batch needs to contain an entire group of rows (the group is just another column), i.e. when I select the top N rows from the table for processing, I need to extend that N so the batch covers the last group in full rather than splitting it between batches.
Sample data:
CREATE TABLE test01 (id INT PRIMARY KEY CLUSTERED IDENTITY(1, 1) NOT NULL
, person_name NVARCHAR(100)
, person_surname NVARCHAR(100)
, person_group_code CHAR(2) NOT NULL);
INSERT INTO
dbo.test01 (person_name
, person_surname
, person_group_code)
VALUES
('n1', 's1', 'g1')
, ('n2', 's2', 'g1')
, ('n3', 's3', 'g1')
, ('n4', 's4', 'g1')
, ('n5', 's5', 'g2')
, ('n6', 's6', 'g2')
, ('n7', 's7', 'g2')
, ('n8', 's8', 'g2')
, ('n9', 's9', 'g2')
, ('n10', 's10', 'g2')
, ('n11', 's11', 'g3')
, ('n12', 's12', 'g3')
, ('n13', 's13', 'g3')
, ('n14', 's14', 'g3');
My current attempt:
DECLARE @batch_start INT = 1
      , @batch_size INT = 5;
DECLARE @max_id INT = (SELECT MAX(id) FROM dbo.test01);

WHILE @batch_start <= @max_id
BEGIN
    SELECT *
    FROM dbo.test01
    WHERE id BETWEEN @batch_start AND @batch_start + @batch_size - 1;

    SELECT @batch_start += @batch_size;
END;

DROP TABLE dbo.test01;
In the example above, I am splitting the 14 rows into 3 batches: 5 rows in batch #1, another 5 rows in batch #2 and then 4 rows in the final batch.
The first batch (id 1 to 5) covers only a fraction of the 'g2' group, so I need to extend this batch to cover rows 1-10 (I need to process the entire g2 group in a single batch).
(by the way, I don't mind batch upsizing - I need to make sure I cover at least one full group per batch).
The result would be that batch #1 would cover groups g1 and g2 (10 rows) then batch #2 would cover group g3 (4 rows) and there would be no batch #3 at all.
Now, the table is billions of rows and batch sizes are around 50K-100K each so I need a solution that performs well.
Any hints on how to approach this with minimal performance hit?
The first thing I've noticed is that your current code assumes there are no gaps in the identity column - however, that is a mistake. An identity column may (and often does) have gaps in the numbers, so the first thing you want to do is use row_number() over(order by id) to get a continuous running number for all your records.
The second thing I've added is a column that gives a numeric id to each group, ordered the same way as the identity column, using a well-known technique for solving gaps-and-islands problems.
I've used a table variable to store this data for each id of the source table for the purposes of this demonstration, but you might want to use a temporary table and add indexes on the relevant columns to improve performance (a sketch of that is at the end of this answer).
I've also renamed your @batch_size variable to @batch_min_size and added a few other variables.
So here is the table variable I've used:
DECLARE @Helper As Table (Id int, Rn int, GroupId int)

INSERT INTO @Helper (Id, Rn, GroupId)
SELECT Id,
       ROW_NUMBER() OVER(ORDER BY ID) As Rn,
       ROW_NUMBER() OVER(ORDER BY ID) -
       ROW_NUMBER() OVER(PARTITION BY person_group_code ORDER BY ID) As GroupId
FROM dbo.test01
This is the content of this table:
Id Rn GroupId
1 1 0
2 2 0
3 3 0
4 4 0
5 5 4
6 6 4
7 7 4
8 8 4
9 9 4
10 10 4
11 11 10
12 12 10
13 13 10
14 14 10
I've used a while loop to do the batches.
In the loop, I've used this table to calculate the first and last id of each batch, as well as the last row number of the batch.
Then all I had to do was to use the first and last id in the where clause of the original table:
DECLARE @batch_min_size int = 10
      , @batch_end int = 0
      , @batch_start int
      , @first_id_of_batch int
      , @last_id_of_batch int
      , @total_row_count int;

SELECT @total_row_count = COUNT(*) FROM dbo.test01;  -- total rows in the source table

WHILE @batch_end < @total_row_count
BEGIN
    SELECT @batch_start = @batch_end + 1;

    SELECT @batch_end = MAX(Rn)
         , @first_id_of_batch = MIN(Id)
         , @last_id_of_batch = MAX(Id)
    FROM @Helper
    WHERE Rn >= @batch_start
    AND GroupId <=
    (
        SELECT MAX(GroupId)
        FROM @Helper
        WHERE Rn <= @batch_start + @batch_min_size - 1
    )

    SELECT id, person_name, person_surname, person_group_code
    FROM dbo.test01
    WHERE Id >= @first_id_of_batch
    AND Id <= @last_id_of_batch
END
See a live demo on rextester.
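If you take the temporary-table suggestion from the beginning of this answer, a rough sketch (my addition; the index is an assumption meant to support the Rn/GroupId lookups in the loop) could be:

CREATE TABLE #Helper
(
    Id      int NOT NULL PRIMARY KEY,
    Rn      int NOT NULL,
    GroupId int NOT NULL
);

INSERT INTO #Helper (Id, Rn, GroupId)
SELECT Id,
       ROW_NUMBER() OVER(ORDER BY ID) AS Rn,
       ROW_NUMBER() OVER(ORDER BY ID) -
       ROW_NUMBER() OVER(PARTITION BY person_group_code ORDER BY ID) AS GroupId
FROM dbo.test01;

-- Supports the WHERE Rn >= ... AND GroupId <= ... lookups in the batch loop.
CREATE NONCLUSTERED INDEX IX_Helper_Rn_GroupId ON #Helper (Rn, GroupId) INCLUDE (Id);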
See if below helps:
CREATE TABLE #Temp (g_record_count int, groupname varchar(50));

INSERT INTO #Temp (g_record_count, groupname)
SELECT MAX(id), person_group_code
FROM dbo.test01
GROUP BY person_group_code;

After this, loop through the temporary table (reusing @batch_start, @batch_size and @max_id from your original code):

DECLARE @rec_per_batch INT = 1

WHILE @batch_start <= @max_id
BEGIN
    SELECT @rec_per_batch = MIN(g_record_count)
    FROM #Temp
    WHERE g_record_count >= @batch_size * @batch_start;

    SELECT *
    FROM dbo.test01
    WHERE id BETWEEN @batch_start AND @rec_per_batch;

    SELECT @batch_start += @batch_size;
END;

Transact-SQL - number rows until condition met

I'm trying to generate the numbers in the "x" column based on the values in the "eq" column: it should assign an increasing number to every record until it reaches the value "1", and on the next row the counter should reset and start again. I've tried row_number, but the problem is that I only have ones and zeros in the column I need to evaluate, and the row_number examples I've seen relied on a column with growing values. I also tried rank, but I haven't managed to make it work.
nInd Fecha Tipo #Inicio #contador_I #Final #contador_F eq x
1 18/03/2002 I 18/03/2002 1 null null 0 1
2 20/07/2002 F 18/03/2002 1 20/07/2002 1 1 2
3 19/08/2002 I 19/08/2002 2 20/07/2002 1 0 1
4 21/12/2002 F 19/08/2002 2 21/12/2002 2 1 2
5 17/03/2003 I 17/03/2003 3 21/12/2002 2 0 1
6 01/04/2003 I 17/03/2003 4 21/12/2002 2 0 2
7 07/04/2003 I 17/03/2003 5 21/12/2002 2 0 3
8 02/06/2003 F 17/03/2003 5 02/06/2003 3 0 4
9 31/07/2003 F 17/03/2003 5 31/07/2003 4 0 5
10 31/08/2003 F 17/03/2003 5 31/08/2003 5 1 6
11 01/09/2005 I 01/09/2005 6 31/08/2003 5 0 1
12 05/09/2005 I 01/09/2005 7 31/08/2003 5 0 2
13 31/12/2005 F 01/09/2005 7 31/12/2005 6 0 3
14 14/01/2006 F 01/09/2005 7 14/01/2006 7 1 4
There is another solution available:
select nInd, eq,
       row_number() over (partition by s order by nInd) as x  -- order by nInd so the counter follows the row order
from (
    select nInd, eq,
           coalesce((select sum(eq) + 1
                     from mytable pre
                     where pre.nInd < mytable.nInd), 1) as s   -- this is the running sum of eq!
    from mytable
) g
The inner subquery creates a sequential group number for each occurrence of 1 in eq. Then we can use row_number() over that partition to get our counter.
Here is an example using Sql Server
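As a further alternative (my addition, not part of the answer above): on SQL Server 2012 and later, the same running-sum grouping can be written with a window frame instead of a correlated subquery, assuming the same mytable columns:

select nInd, eq,
       row_number() over (partition by grp order by nInd) as x
from (
    select nInd, eq,
           -- Running sum of eq over the *preceding* rows defines the group;
           -- excluding the current row lets the row with eq = 1 close its own group.
           coalesce(sum(eq) over (order by nInd
                                  rows between unbounded preceding and 1 preceding), 0) as grp
    from mytable
) g
order by nInd;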
I have two answers here. One is based off of the ROW_NUMBER() and the other is based off of what appears to be your index (nInd). I wasn't sure if there would be a gap in your index so I made the ROW_NUMBER() as well.
My table format was as follows -
myIndex int identity(1,1) NOT NULL
number int NOT NULL
First one is ROW_NUMBER()...
WITH rn AS (SELECT *, ROW_NUMBER() OVER (ORDER BY myIndex) AS rn, COUNT(*) AS max
FROM counting c GROUP BY c.myIndex, c.number)
,cte (myIndex, number, level, row) AS (
SELECT r.myIndex, r.number, 1, r.rn + 1 FROM rn r WHERE r.rn = 1
UNION ALL
SELECT r1.myIndex, r1.number,
CASE WHEN r1.number = 0 AND r2.number = 1 THEN 1
ELSE c.level + 1
END,
row + 1
FROM cte c
JOIN rn r1
ON c.row = r1.rn
JOIN rn r2
ON c.row - 1 = r2.rn
)
SELECT c.myIndex, c.number, c.level FROM cte c OPTION (MAXRECURSION 0);
Now the index...
WITH cte (myIndex, number, level) AS (
SELECT c.myIndex + 1, c.number, 1 FROM counting c WHERE c.myIndex = 1
UNION ALL
SELECT c1.myIndex + 1, c1.number,
CASE WHEN c1.number = 0 AND c2.number = 1 THEN 1
ELSE c.level + 1
END
FROM cte c
JOIN counting c1
ON c.myIndex = c1.myIndex
JOIN counting c2
ON c.myIndex - 1 = c2.myIndex
)
SELECT c.myIndex - 1 AS myIndex, c.number, c.level FROM cte c OPTION (MAXRECURSION 0);
The answer that I have now uses a cursor.
I know that a solution without a cursor would be better from a performance point of view.
Here is a quick demo of my solution:
-- Create DBTest
use master
Go
Create Database DBTest
Go
use DBTest
GO
-- Create table
Create table Tabletest
(nInd int , eq int)
Go
-- insert dummy data
insert into Tabletest (nInd,eq)
values (1,0),
(2,1),
(3,0),
(4,1),
(5,0),
(6,0),
(7,0),
(8,0),
(9,1),
(8,0),
(9,1)
Create table #Tabletest (nInd int ,eq int ,x int )
go
DECLARE @nInd int, @eq int, @x int
set @x = 1

DECLARE db_cursor CURSOR FOR
SELECT nInd, eq
FROM Tabletest
order by nInd

OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @nInd, @eq

WHILE @@FETCH_STATUS = 0
BEGIN
    if (@eq = 0)
    begin
        insert into #Tabletest (nInd, eq, x) values (@nInd, @eq, @x)
        set @x = @x + 1
    end
    else if (@eq = 1)
    begin
        insert into #Tabletest (nInd, eq, x) values (@nInd, @eq, @x)
        set @x = 1
    end

    FETCH NEXT FROM db_cursor INTO @nInd, @eq
END

CLOSE db_cursor
DEALLOCATE db_cursor
select * from #Tabletest
The end result set will be as follows:
Hope it helps.
Looking at this a slightly different way (which might not be true, but eliminates the need for cursors or recursive CTEs), it looks like you are building ordered groups within your dataset. So, start by finding those groups, then determine the ordering within each of them.
The real key is to determine the rules for finding the correct grouping. Based on your description and comments, I'm guessing the groups run in nInd order, with each group ending at a row whose eq value is 1, so you can do something like:
;with ends(nInd, ord) as (
--Find the ending row for each set
SELECT nInd, row_number() over(order by nInd)
FROM mytable
WHERE eq=1
), ranges(sInd, eInd) as (
--Find the previous ending row for each ending row, forming a range for the group
SELECT coalesce(s.nInd,0), e.nInd
FROM ends s
right join ends e on s.ord=e.ord-1
)
Then, using these group ranges, you can find the final ordering of each:
select t.nInd, t.Fecha, t.eq
,[x] = row_number() over(partition by sInd order by nInd)
from ranges r
join mytable t on r.sInd < t.nInd
and t.nInd <= r.eInd
order by t.nInd

Tsql group by clause with exceptions

I have a problem with a query.
This is the data (ordered by Timestamp):
Data
ID Value Timestamp
1 0 2001-1-1
2 0 2002-1-1
3 1 2003-1-1
4 1 2004-1-1
5 0 2005-1-1
6 2 2006-1-1
7 2 2007-1-1
8 2 2008-1-1
I need to extract the distinct values and the first occurrence date of each. The exception here is that I need to group them only if they are not interrupted by a different value in between.
So the data I need is:
ID Value Timestamp
1 0 2001-1-1
3 1 2003-1-1
5 0 2005-1-1
6 2 2006-1-1
I've made this work with a complicated query, but I am sure there is an easier way to do it; I just can't think of it. Could anyone help?
This is what I started with, which could probably be worked into a solution. It is a query that should locate when a value changes:
> SELECT * FROM Data d1 join Data d2 ON d1.Timestamp < d2.Timestamp and
> d1.Value <> d2.Value
It could probably be done with a good use of the row_number clause, but I can't manage it.
Sample data:
declare @T table (ID int, Value int, Timestamp date)

insert into @T (ID, Value, Timestamp) values
(1, 0, '20010101'),
(2, 0, '20020101'),
(3, 1, '20030101'),
(4, 1, '20040101'),
(5, 0, '20050101'),
(6, 2, '20060101'),
(7, 2, '20070101'),
(8, 2, '20080101')
Query:
;With OrderedValues as (
select *,ROW_NUMBER() OVER (ORDER By TimeStamp) as rn --TODO - specific columns better than *
from @T
), Firsts as (
select
ov1.* --TODO - specific columns better than *
from
OrderedValues ov1
left join
OrderedValues ov2
on
ov1.Value = ov2.Value and
ov1.rn = ov2.rn + 1
where
ov2.ID is null
)
select * --TODO - specific columns better than *
from Firsts
I didn't rely on the ID values being sequential and without gaps. If that's the situation, you can omit OrderedValues (using the table and ID in place of OrderedValues and rn). The second query simply finds rows where there isn't an immediate preceding row with the same Value.
Result:
ID Value Timestamp rn
----------- ----------- ---------- --------------------
1 0 2001-01-01 1
3 1 2003-01-01 3
5 0 2005-01-01 5
6 2 2006-01-01 6
You can order by rn if you need the results in this specific order.
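As an aside (my addition, not from the answers above): on SQL Server 2012 and later, the "no immediately preceding row with the same Value" check can also be sketched with LAG, using the same @T sample data:

select ID, Value, Timestamp
from (
    select *,
           LAG(Value) OVER (ORDER BY Timestamp) as prev_value  -- Value of the preceding row
    from @T
) t
where prev_value is null      -- first row overall
   or prev_value <> Value     -- or the value has just changed
order by Timestamp;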
