SQL Count of rows in a group in a sequence - sql-server

I apologise for the title of the question, it isn't very clear, but I can't think of a better way of describing it in words, the data should speak for itself.
I have a table of data where I need to know the count of rows that are the same depending on a value, but also taking into account the sequence they are currently in. The table is much larger than this, but these are the columns which are relevant.
Id | MinCode | MaxCode | ExpectedResult
----------------------------------------------------
1 | 00001.000001 | 00001.000001 | 2
2 | 00001.000001 | 00001.000002 | 2
3 | 00002.00001a | 00002.00001a | 3
4 | 00002.00001a | 00002.00001b | 3
5 | 00002.00001a | 00002.00001c | 3
6 | 00002.000002 | 00002.000002 | 1
7 | 00002.00003a | 00002.00003a | 2
8 | 00002.00003a | 00002.00003b | 2
9 | 00002.000002 | 00002.000004 | 1
10 | 00003.000001 | 00003.000001 | 1
Note: Id is also the order in this example, I just didn't see the point of an extra column with the same data.
I have tried several versions using COUNT, ROW_NUMBER/RANK, PARTITION and GROUP BY without getting the ExpectedResult values. The issue is with Ids 6 and 9, as my result always combines them to produce a 2, which I partially understand since these functions don't take into account the ordering of the data. I believe I'm close, but my T-SQL is pretty rusty these days!
I know I could get this value by processing this data set through a CURSOR, but I'd like to avoid that.

The key here is to create a sequence id you can use in a window function to get the count. You won't be able to do it in a single SELECT because window functions can't be nested inside one another, but you can pull it off with a subquery or CTE.
To determine the sequence number for a row, you need to count the number of times the group key has changed in the preceding rows. So to determine the changes, create an inner query that checks if the current group key is different from the previous one by using the LAG window function. Use a CASE expression that results in 1 or 0 depending on whether the lagged value is different from the current one. The outer query then just has to sum up the values for all preceding rows up to the current one.
Once you have the sequence number, you can use a count window function to count all the rows with matching numbers.
WITH src AS ( -- cte to mimic table.
SELECT *
FROM (VALUES
(1, N'00001.000001', N'00001.000001', 2),
/* ... test data ... */
(10, N'00003.000001', N'00003.000001', 1)
) [src] ( [Id],[MinCode],[MaxCode],[ExpectedResult] )
)
SELECT src.Id, MinCode, MaxCode, ExpectedResult
, COUNT(1) OVER (PARTITION BY seq.SequenceId) [Result]
FROM src
INNER JOIN (
SELECT x.Id, SUM(x.IsNew) OVER (ORDER BY Id ROWS UNBOUNDED PRECEDING) [SequenceId]
FROM (
SELECT Id, CASE WHEN LAG(MinCode) OVER (ORDER BY Id) <> MinCode THEN 1 ELSE 0 END [IsNew]
FROM src
) x
) seq ON seq.Id = src.Id
ORDER BY Id
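
To make the intermediate values visible, here is a minimal sketch of the same three steps (change flag, running sum, count per sequence). It is an illustration only: it assumes Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later) instead of SQL Server, and the names flagged, numbered and run_counts are made up for this example.

```python
import sqlite3

# Id and MinCode from the question's sample data.
ROWS = [
    (1, "00001.000001"), (2, "00001.000001"),
    (3, "00002.00001a"), (4, "00002.00001a"), (5, "00002.00001a"),
    (6, "00002.000002"),
    (7, "00002.00003a"), (8, "00002.00003a"),
    (9, "00002.000002"),
    (10, "00003.000001"),
]

QUERY = """
WITH flagged AS (
    -- 1 when MinCode differs from the previous row, else 0.
    -- On the first row LAG returns NULL, so the CASE falls through to 0.
    SELECT Id, MinCode,
           CASE WHEN LAG(MinCode) OVER (ORDER BY Id) <> MinCode
                THEN 1 ELSE 0 END AS IsNew
    FROM src
),
numbered AS (
    -- The running total of the change flags identifies each run.
    SELECT Id, MinCode, IsNew,
           SUM(IsNew) OVER (ORDER BY Id ROWS UNBOUNDED PRECEDING) AS SequenceId
    FROM flagged
)
SELECT Id, MinCode, IsNew, SequenceId,
       COUNT(*) OVER (PARTITION BY SequenceId) AS Result
FROM numbered
ORDER BY Id
"""


def run_counts(rows):
    """Load the rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (Id INTEGER PRIMARY KEY, MinCode TEXT)")
    conn.executemany("INSERT INTO src VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_counts(ROWS):
    print(row)
```

Ids 6 and 9 share a MinCode but end up with different SequenceId values, which is why they are counted separately.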

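Here's another possibility. Note that it sums the matches over every row with the same MinCode, so it is only correct when a MinCode that forms a run of two or more rows does not occur anywhere else in the table, as is the case in the sample data: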
/* Testing Data */
DECLARE @Data table (
Id int, MinCode varchar(20), MaxCode varchar(20), ExpectedResult int
);
INSERT INTO @Data VALUES
( 1 , '00001.000001', '00001.000001', 2 ),
( 2 , '00001.000001', '00001.000002', 2 ),
( 3 , '00002.00001a', '00002.00001a', 3 ),
( 4 , '00002.00001a', '00002.00001b', 3 ),
( 5 , '00002.00001a', '00002.00001c', 3 ),
( 6 , '00002.000002', '00002.000002', 1 ),
( 7 , '00002.00003a', '00002.00003a', 2 ),
( 8 , '00002.00003a', '00002.00003b', 2 ),
( 9 , '00002.000002', '00002.000004', 1 ),
( 10, '00003.000001', '00003.000001', 1 );
/* Get count of MinCode rows that are the same, taking into account their sequence */
WITH cte AS (
SELECT
Id,
MinCode,
CASE WHEN
LAG ( MinCode, 1 ) OVER ( ORDER BY Id ) = MinCode
OR
LEAD ( MinCode, 1 ) OVER ( ORDER BY Id ) = MinCode
THEN 1
ELSE 0
END AS SeqMatch
FROM @Data
)
SELECT
Id, MinCode, MaxCode, ExpectedResult,
CASE WHEN MatchCount = 0 THEN 1 ELSE MatchCount END AS DerivedResult
FROM @Data d
OUTER APPLY (
SELECT SUM( SeqMatch ) AS MatchCount FROM cte WHERE cte.MinCode = d.MinCode
) AS x;
Returns
+----+--------------+--------------+----------------+---------------+
| Id | MinCode | MaxCode | ExpectedResult | DerivedResult |
+----+--------------+--------------+----------------+---------------+
| 1 | 00001.000001 | 00001.000001 | 2 | 2 |
| 2 | 00001.000001 | 00001.000002 | 2 | 2 |
| 3 | 00002.00001a | 00002.00001a | 3 | 3 |
| 4 | 00002.00001a | 00002.00001b | 3 | 3 |
| 5 | 00002.00001a | 00002.00001c | 3 | 3 |
| 6 | 00002.000002 | 00002.000002 | 1 | 1 |
| 7 | 00002.00003a | 00002.00003a | 2 | 2 |
| 8 | 00002.00003a | 00002.00003b | 2 | 2 |
| 9 | 00002.000002 | 00002.000004 | 1 | 1 |
| 10 | 00003.000001 | 00003.000001 | 1 | 1 |
+----+--------------+--------------+----------------+---------------+

I understand you want this
SELECT count(*) over (partition by MinCode ) as result
FROM test
order BY id
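With the test data below it gives the following (note that Ids 6 and 9 come back as 2 rather than the expected 1):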
create table test (Id int, MinCode VARCHAR(30), MaxCode VARCHAR(30), ExpectedResult INT);
INSERT INTO test VALUES (1 , '00001.000001', '00001.000001', 2);
INSERT INTO test VALUES (2 , '00001.000001', '00001.000002', 2);
INSERT INTO test VALUES (3 , '00002.00001a', '00002.00001a', 3);
INSERT INTO test VALUES (4 , '00002.00001a', '00002.00001b', 3);
INSERT INTO test VALUES (5 , '00002.00001a', '00002.00001c', 3);
INSERT INTO test VALUES (6 , '00002.000002', '00002.000002', 1);
INSERT INTO test VALUES (7 , '00002.00003a', '00002.00003a', 2);
INSERT INTO test VALUES (8 , '00002.00003a', '00002.00003b', 2);
INSERT INTO test VALUES (9 , '00002.000002', '00002.000004', 1);
INSERT INTO test VALUES (10 , '00003.000001', '00003.000001', 1);
SELECT Id,MinCode,MaxCode, ExpectedResult AS You,
(count(*) over (partition by MinCode )) as Me
FROM test
order BY id
Id MinCode MaxCode You Me
1 00001.000001 00001.000001 2 2
2 00001.000001 00001.000002 2 2
3 00002.00001a 00002.00001a 3 3
4 00002.00001a 00002.00001b 3 3
5 00002.00001a 00002.00001c 3 3
6 00002.000002 00002.000002 1 2
7 00002.00003a 00002.00003a 2 2
8 00002.00003a 00002.00003b 2 2
9 00002.000002 00002.000004 1 2
10 00003.000001 00003.000001 1 1

Related

Consolidating 2 Recursive CTE Statements

The following statement uses 2 separate recursive CTE statements to build a list of steps from a starting location to an ending location based on trips. The desired output is correct, however I am wondering if it is possible to consolidate the 2 CTE statements into one.
The difficulty I am having is relating the endLocation to the startLocation in the first recursive iteration cte1.
The database is SQL Server 2017. I have added the SQL fiddle below:
SQL Fiddle
SQL Server 2017 Schema Setup:
Create Table TripLocation
(TripID int,
LocationID int,
StopOrder int
)
Create table FromTo
(tripID int,
fromLocationID int,
fromStopOrder int,
toLocationID int,
toStopOrder int
)
Create table cte1Temp
(startTripID int,
startLocationID int,
tripID int,
fromLocationID int,
toLocationID int,
step int)
Create table cte2Temp
(startTripID int,
startLocationID int,
endLocationID int,
tripID int,
fromLocationID int,
toLocationID int,
step int)
--LIST OF LOCATIONS FOR EACH TRIP
Insert into TripLocation
Values
(1,1,0),
(1,2,1),
(1,1,2),
(2,2,0),
(2,3,1),
(2,2,2),
(3,3,0),
(3,4,1),
(3,3,2)
--LIST OF POSSIBLE TO/FROM COMBINATIONS FOR EACH TRIP BASED ON STOPORDER
insert into FromTo
select
FromLocation.tripID,
FromLocation.LocationID [fromLocationID],
FromLocation.StopOrder [fromStopOrder],
ToLocation.LocationID [toLocationID],
ToLocation.StopOrder [toStopOrder]
from
TripLocation FromLocation
join TripLocation ToLocation
on FromLocation.tripID = ToLocation.tripID
and ToLocation.StopOrder >= FromLocation.StopOrder
and FromLocation.LocationID <> ToLocation.LocationID
;
--FIND ALL POSSIBLE END LOCATIONS FOR EACH START LOCATION IF TRIPS SHARE A COMMON LOCATION
with cte1 as
(
select
tripID [startTripID],
fromLocationID [startLocationID],
tripID,
fromLocationID,
toLocationID,
1 [step]
from
FromTo
union all
select
anchor.startTripID,
anchor.startLocationID,
member.tripID,
member.fromLocationID,
member.toLocationID,
anchor.step + 1 [step]
from
FromTo member
join cte1 anchor
on anchor.toLocationID = member.fromLocationID
and member.toLocationID <> anchor.fromLocationID
and member.tripID <> anchor.tripID
)
insert into cte1Temp
select
*
from
cte1
;
--GENERATE PLAN FOR EACH START LOCATION TO AN END LOCATION
with cte2 as
(
select
startTripID,
StartLocationID,
ToLocationID [EndLocationID],
tripID,
FromLocationID,
ToLocationID,
step
from
cte1Temp
union all
select
b.startTripID,
b.StartLocationID,
b.ToLocationID,
a.tripID,
a.FromLocationID,
a.ToLocationID,
a.step
from
cte1Temp b
join CTE2 a
on a.endLocationID = b.FromLocationID
and a.startLocationID = b.startLocationID
)
insert into cte2Temp
select
*
from
cte2
Query 1:
select
*
from
cte2Temp
order by
startlocationID, endLocationID, step
Results:
| startTripID | startLocationID | endLocationID | tripID | fromLocationID | toLocationID | step |
|-------------|-----------------|---------------|--------|----------------|--------------|------|
| 1 | 1 | 2 | 1 | 1 | 2 | 1 |
| 1 | 1 | 3 | 1 | 1 | 2 | 1 |
| 1 | 1 | 3 | 2 | 2 | 3 | 2 |
| 1 | 1 | 4 | 1 | 1 | 2 | 1 |
| 1 | 1 | 4 | 2 | 2 | 3 | 2 |
| 1 | 1 | 4 | 3 | 3 | 4 | 3 |
| 1 | 2 | 1 | 1 | 2 | 1 | 1 |
| 2 | 2 | 3 | 2 | 2 | 3 | 1 |
| 2 | 2 | 4 | 2 | 2 | 3 | 1 |
| 2 | 2 | 4 | 3 | 3 | 4 | 2 |
| 2 | 3 | 1 | 2 | 3 | 2 | 1 |
| 2 | 3 | 1 | 1 | 2 | 1 | 2 |
| 2 | 3 | 2 | 2 | 3 | 2 | 1 |
| 3 | 3 | 4 | 3 | 3 | 4 | 1 |
| 3 | 4 | 1 | 3 | 4 | 3 | 1 |
| 3 | 4 | 1 | 2 | 3 | 2 | 2 |
| 3 | 4 | 1 | 1 | 2 | 1 | 3 |
| 3 | 4 | 2 | 3 | 4 | 3 | 1 |
| 3 | 4 | 2 | 2 | 3 | 2 | 2 |
| 3 | 4 | 3 | 3 | 4 | 3 | 1 |
You can try this
Create Table TripLocation
(TripID int,
LocationID int,
StopOrder int
)
Create table FromTo
(tripID int,
fromLocationID int,
fromStopOrder int,
toLocationID int,
toStopOrder int
)
Create table cte1Temp
(startTripID int,
startLocationID int,
tripID int,
fromLocationID int,
toLocationID int,
step int,
path varchar(max)
)
--LIST OF LOCATIONS FOR EACH TRIP
Insert into TripLocation
Values
(1,1,0),
(1,2,1),
(1,1,2),
(2,2,0),
(2,3,1),
(2,2,2),
(3,3,0),
(3,4,1),
(3,3,2)
--LIST OF POSSIBLE TO/FROM COMBINATIONS FOR EACH TRIP BASED ON STOPORDER
insert into FromTo
select
FromLocation.tripID,
FromLocation.LocationID [fromLocationID],
FromLocation.StopOrder [fromStopOrder],
ToLocation.LocationID [toLocationID],
ToLocation.StopOrder [toStopOrder]
from
TripLocation FromLocation
join TripLocation ToLocation
on FromLocation.tripID = ToLocation.tripID
and ToLocation.StopOrder >= FromLocation.StopOrder
and FromLocation.LocationID <> ToLocation.LocationID
;
--FIND ALL POSSIBLE END LOCATIONS FOR EACH START LOCATION IF TRIPS SHARE A COMMON LOCATION
with cte1 as
(
select
tripID [startTripID],
fromLocationID [startLocationID],
tripID,
fromLocationID,
toLocationID,
1 [step],
cast('1_' + ltrim(str(tripID)) + '-' + ltrim(str(tolocationID)) as varchar(max)) [path]
from
FromTo
union all
select
anchor.startTripID,
anchor.startLocationID,
member.tripID,
member.fromLocationID,
member.toLocationID,
anchor.step + 1 [step],
anchor.path + ',' + ltrim(str(anchor.step + 1)) + '_' + ltrim(str(member.tripID)) + '-' + ltrim(str(member.toLocationID))
from
FromTo member
join cte1 anchor
on anchor.toLocationID = member.fromLocationID
and member.toLocationID <> anchor.fromLocationID
and member.tripID <> anchor.tripID
)
insert into cte1Temp
select
*
from
cte1
;
select
StartLocationID,
TolocationId,
Substring(Value, 1,Charindex('-', Value)-1) as Trip,
Substring(Value, Charindex('-', Value)+1, LEN(Value)) as CrossingLocationID
from
cte1Temp
CROSS APPLY STRING_SPLIT(path, ',')
order by StartLocationId, ToLocationId, Trip
Results:
StartLocationID TolocationId Trip CrossingLocationID
1 2 1_1 2
1 3 1_1 2
1 3 2_2 3
1 4 1_1 2
1 4 2_2 3
1 4 3_3 4
2 1 1_1 1
2 3 1_2 3
2 4 1_2 3
2 4 2_3 4
3 1 1_2 2
3 1 2_1 1
3 2 1_2 2
3 4 1_3 4
4 1 1_3 3
4 1 2_2 2
4 1 3_1 1
4 2 1_3 3
4 2 2_2 2
4 3 1_3 3

How to select rows based on a certain criteria from subsets in a table?

I have a test table with an ActionId column. The column contains an increasing and random number of rows with values of 1 to 5 and then it starts again with another subset of values from 1 to 5. The data can have one or more subsets like that.
I am interested in rows which contain ActionId of values 4 or 5 but only the last one in each subset. So in this sample, I want to return rows 7 and 11. Row id 7 because 5 is the last value before the value goes down and row id 11 because 4 is the last value before the value goes down again. For the last subset, the value doesn't need to go down again. The value 4 or 5 could be in the last row.
I can program this in a procedural language, but I can't think of a set-based SQL solution.
CREATE TABLE test (
id [int] IDENTITY(1,1)
,ActionId INT)
INSERT INTO [test] (ActionId ) VALUES
(1), (2), (3), (3), (4), (4), (5), (3), (3), (3), (4), (1),(2)
select * from test
http://sqlfiddle.com/#!18/4ffe71/3
The solution I came up with involves a simple correlated subquery and a common table expression:
;with cte as
(
select id,
ActionId,
isnull((
select top 1 ActionId
from test as t1
where t0.id < t1.id
order by t1.id
), 0) as nextActionId
from test As t0
)
select id, ActionId
from cte
where actionId IN(4,5)
and actionId > nextActionId
The subquery gets the next actionId for each row, based on the order of the id column. The isnull is there for the last row - to return 0 instead of null.
Then, all you have to do is query the cte where the actionId is either 4 or 5 and it is larger than the next action id.
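
As a variation on the same idea, the correlated subquery can be swapped for the LEAD window function, whose third argument supplies the default that isnull provides above. This is only a sketch under assumptions: it runs through Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server, where LEAD requires SQL Server 2012 or newer, and last_high_actions is a made-up helper name.

```python
import sqlite3

# ActionId values from the question, in insertion order.
ACTION_IDS = [1, 2, 3, 3, 4, 4, 5, 3, 3, 3, 4, 1, 2]

QUERY = """
WITH cte AS (
    SELECT id, ActionId,
           -- Next row's ActionId by id; 0 for the last row (no next row).
           LEAD(ActionId, 1, 0) OVER (ORDER BY id) AS nextActionId
    FROM test
)
SELECT id, ActionId
FROM cte
WHERE ActionId IN (4, 5)
  AND ActionId > nextActionId
ORDER BY id
"""


def last_high_actions(action_ids):
    """Return (id, ActionId) for rows holding 4 or 5 right before the value drops."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, ActionId INTEGER)"
    )
    conn.executemany(
        "INSERT INTO test (ActionId) VALUES (?)", [(a,) for a in action_ids]
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(last_high_actions(ACTION_IDS))
```

Because the default for the missing next value is 0, a 4 or 5 in the very last row is also returned, which matches the requirement stated in the question.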
If I guess correctly, you need all values where the next row's value is less than the current value. If so, you can use a self join for your purpose. The following script will give you the desired output:
DECLARE @test TABLE
(
id [int] IDENTITY(1,1),
Actionid INT
)
INSERT INTO @test (Actionid )
VALUES
(1), (2), (3), (3), (4), (4), (5), (3), (3), (3), (4), (1),(2)
SELECT A.*
FROM @test A LEFT JOIN @test B ON A.id = B.id-1
WHERE B.Actionid < A.Actionid
The output is-
id Actionid
7 5
11 4
If you also need the last row's value regardless of any condition, change the script as below. This will include the last value 2 in the output.
SELECT A.*
FROM @test A LEFT JOIN @test B ON A.id = B.id-1
WHERE B.Actionid < A.Actionid
OR B.Actionid IS NULL
A recursive CTE can help you here:
--Your mockup table
DECLARE @test TABLE
(
id [int] IDENTITY(1,1),
Actionid INT
)
INSERT INTO @test (Actionid )
VALUES (1), (2), (3), (3), (4), (4), (5), (3), (3), (3), (4), (1),(2);
--the query
WITH recCTE AS
(
SELECT id
,Actionid
,1 AS GroupKey
,1 AS GroupStep
FROM @test t WHERE id=1 --the IDENTITY is the sorting key obviously and will start with a 1 in this test case.
UNION ALL
SELECT t.id
,t.Actionid
,CASE WHEN t.Actionid<=r.Actionid THEN r.GroupKey+1 ELSE r.GroupKey END
,CASE WHEN t.Actionid<=r.Actionid THEN 1 ELSE r.GroupStep+1 END
FROM @test t
INNER JOIN recCTE r ON t.id=r.id+1
)
SELECT *
FROM recCTE;
The idea in short:
We start with the first row and iterate through the set row by row. For each row we test whether the ActionId has stopped increasing, and set the GroupKey and the GroupStep accordingly: a value that is not larger than the previous one starts a new group.
The result
+----+----------+----------+-----------+
| id | Actionid | GroupKey | GroupStep |
+----+----------+----------+-----------+
| 1 | 1 | 1 | 1 |
+----+----------+----------+-----------+
| 2 | 2 | 1 | 2 |
+----+----------+----------+-----------+
| 3 | 3 | 1 | 3 |
+----+----------+----------+-----------+
| 4 | 3 | 2 | 1 |
+----+----------+----------+-----------+
| 5 | 4 | 2 | 2 |
+----+----------+----------+-----------+
| 6 | 4 | 3 | 1 |
+----+----------+----------+-----------+
| 7 | 5 | 3 | 2 |
+----+----------+----------+-----------+
| 8 | 3 | 4 | 1 |
+----+----------+----------+-----------+
| 9 | 3 | 5 | 1 |
+----+----------+----------+-----------+
| 10 | 3 | 6 | 1 |
+----+----------+----------+-----------+
| 11 | 4 | 6 | 2 |
+----+----------+----------+-----------+
| 12 | 1 | 7 | 1 |
+----+----------+----------+-----------+
| 13 | 2 | 7 | 2 |
+----+----------+----------+-----------+
Solving your issue
We can proceed from there by changing the final SELECT to this
SELECT TOP 1 WITH TIES *
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY GroupKey ORDER BY GroupStep DESC);
The result shows the last entry per sub-set
+----+----------+----------+-----------+
| id | Actionid | GroupKey | GroupStep |
+----+----------+----------+-----------+
| 3 | 3 | 1 | 3 |
+----+----------+----------+-----------+
| 5 | 4 | 2 | 2 |
+----+----------+----------+-----------+
| 8 | 3 | 4 | 1 |
+----+----------+----------+-----------+
| 9 | 3 | 5 | 1 |
+----+----------+----------+-----------+
| 11 | 4 | 6 | 2 |
+----+----------+----------+-----------+
| 7 | 5 | 3 | 2 |
+----+----------+----------+-----------+
| 13 | 2 | 7 | 2 |
+----+----------+----------+-----------+
You can filter to the sub-sets where the last entry is a 4 or a 5. In this case I see the rows 7 and 11 but also the row 5. Might be I did not get the logic correctly...
This is the query I came up with:
WITH cte
AS
(SELECT id, Actionid, ROW_NUMBER() OVER (ORDER BY id) rn FROM test)
SELECT
prev.id
,prev.Actionid prevActionId
,cur.Actionid curActionId
FROM cte cur
JOIN cte prev
ON prev.rn = cur.rn - 1
WHERE
prev.Actionid > cur.Actionid
AND prev.Actionid IN (4, 5)

How to combine multiple rows into one row and multiple column in SQL Server?

I built a temp table from several different tables, and here is its result set:
car_id | car_type | status | count
--------+----------+---------+------
100421 | 1 | 1 | 9
100421 | 1 | 2 | 8
100421 | 1 | 3 | 3
100421 | 2 | 1 | 6
100421 | 2 | 2 | 8
100421 | 2 | 3 | 3
100422 | 1 | 1 | 5
100422 | 1 | 2 | 8
100422 | 1 | 3 | 7
Here is the meaning of status column:
1 as sale
2 as purchase
3 as return
Now I want to show this result set as below
car_id | car_type | sale | purchase | return
--------+----------+------+----------+----------
100421 | 1 | 9 | 8 | 3
100421 | 2 | 6 | 8 | 3
100422 | 1 | 5 | 8 | 7
I tried but was unable to generate this result set. Can anyone help?
You can also use a CASE expression.
Query
select [car_id], [car_type],
max(case [status] when 1 then [count] end) as [sale],
max(case [status] when 2 then [count] end) as [purchase],
max(case [status] when 3 then [count] end) as [return]
from [your_table_name]
group by [car_id], [car_type]
order by [car_id];
Try this
select car_id ,car_type, [1] as Sale,[2] as Purchase,[3] as [return]
from (select car_id , car_type , [status] ,[count] from tempTable)d
pivot(sum([count]) for [status] in([1],[2],[3]) ) as pvt
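You can also remove the subquery if the table contains only these four columns and you don't need to filter it, because PIVOT groups by every column that is not pivoted or aggregated. For example: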
select car_id ,car_type, [1] as Sale,[2] as Purchase,[3] as [return]
from tempTable d
pivot(sum([count]) for [status] in([1],[2],[3]) ) as pvt

Change the Value on Duplicate Rows

I need assistance on how to handle duplicate Line IDs for the same Purchase Order by assigning the additional line IDs a new number. For the additional duplicate rows I would like to use the Line ID multiplied by 100 plus a counter. For example, if Purchase Order #11 has three Line ID #5s, then the first would stay as 5, the second would be 501 and the third would be 502. However, I can only get 1, 2 or 3, or just 1 if there is no duplicate. I am not sure what to use to increment. I am hoping someone can assist or guide. Thank you
PurchaseOrderID LineID PackingList NewLineID
11 1 12323 1
11 1 78786 2
11 2 67523 1
11 3 44559 1
11 4 44559 1
11 5 96545 1
11 5 12323 2
11 5 34569 3
The Packing Slip causes the duplicates for the same line ID.
Below is what I am trying to use which is giving me the above NewLineID:
SELECT
PurchaseOrderID,
LineID,
PackingList,
ROW_NUMBER() over
(
partition by PurchaseOrderID, LineID
order by PurchaseOrderID, LineID
) as NewLineID
FROM PurchaseOrderTransactions
Using ROW_NUMBER and CASE:
WITH Cte AS(
SELECT
PurchaseOrderID,
LineID,
PackingList,
RN = ROW_NUMBER() OVER (PARTITION BY PurchaseOrderID, LineID ORDER BY LineID)
FROM PurchaseOrderTransactions
)
SELECT
PurchaseOrderID,
LineID,
PackingList,
NewLineID = CASE
WHEN RN = 1 THEN LineID
ELSE (LineID * 100) + (RN - 1)
END
FROM Cte
Without using a CTE:
SELECT
PurchaseOrderID,
LineID,
PackingList,
NewLineID =
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY PurchaseOrderID, LineID ORDER BY LineID) = 1 THEN LineID
ELSE (LineID * 100) + (ROW_NUMBER() OVER (PARTITION BY PurchaseOrderID, LineID ORDER BY LineID) - 1)
END
FROM PurchaseOrderTransactions
SQL Fiddle
| PurchaseOrderID | LineID | PackingList | NewLineID |
|-----------------|--------|-------------|-----------|
| 11 | 1 | 12323 | 1 |
| 11 | 1 | 78786 | 101 |
| 11 | 2 | 67523 | 2 |
| 11 | 3 | 44559 | 3 |
| 11 | 4 | 44559 | 4 |
| 11 | 5 | 96545 | 5 |
| 11 | 5 | 12323 | 501 |
| 11 | 5 | 34569 | 502 |
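
One detail worth checking is which duplicate keeps the original LineID: ordering the window by LineID inside a partition on LineID leaves the order of the duplicates undefined. The sketch below is an illustration under assumptions, not the answer's exact query: it uses Python's built-in sqlite3 module (SQLite 3.25 or later) instead of SQL Server and orders by SQLite's implicit rowid as a stand-in for whatever column defines the real order. The names numbered, InsertOrder and renumber are hypothetical.

```python
import sqlite3

# (PurchaseOrderID, LineID, PackingList) from the question, in insertion order.
ROWS = [
    (11, 1, 12323), (11, 1, 78786),
    (11, 2, 67523), (11, 3, 44559), (11, 4, 44559),
    (11, 5, 96545), (11, 5, 12323), (11, 5, 34569),
]

QUERY = """
WITH numbered AS (
    SELECT PurchaseOrderID, LineID, PackingList,
           rowid AS InsertOrder,
           -- Assumption: rowid (insertion order) decides which duplicate is first.
           ROW_NUMBER() OVER (
               PARTITION BY PurchaseOrderID, LineID
               ORDER BY rowid
           ) AS RN
    FROM PurchaseOrderTransactions
)
SELECT PurchaseOrderID, LineID, PackingList,
       -- First occurrence keeps LineID; later ones become LineID * 100 + 1, + 2, ...
       CASE WHEN RN = 1 THEN LineID
            ELSE LineID * 100 + (RN - 1) END AS NewLineID
FROM numbered
ORDER BY InsertOrder
"""


def renumber(rows):
    """Return the rows with the derived NewLineID column appended."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PurchaseOrderTransactions "
        "(PurchaseOrderID INTEGER, LineID INTEGER, PackingList INTEGER)"
    )
    conn.executemany("INSERT INTO PurchaseOrderTransactions VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in renumber(ROWS):
    print(row)
```

In SQL Server the same effect would need a real ordering column (for example an identity or a date column) in the ORDER BY of the window.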

SQL Server 2005 T-SQL Problem: Need help in omitting records

Good day!
I need help writing a query. I have the records in the table below. The condition is that records should not be displayed if a succeeding record's new_state repeats the new_state of a previous record and the change happened on the same date.
Here record_id 1 has gone through the following states: 0->1->2->1->3->4->3 on the same day. State 1 was changed to state 2 and then back to state 1 again (ids 2 & 3 would not be displayed). The same goes for state 3 (ids 5 & 6 would not be displayed).
id | record_id| date_changed | old_state | new_state |
1 | 1 | 2009-01-01 | 0 | 1 |
2 | 1 | 2009-01-01 | 1 | 2 | not displayed
3 | 1 | 2009-01-01 | 2 | 1 | not displayed
4 | 1 | 2009-01-01 | 1 | 3 |
5 | 1 | 2009-01-01 | 3 | 4 | not displayed
6 | 1 | 2009-01-01 | 4 | 3 | not displayed
so the result would display only 2 records for record_id=1..
id | record_id| date_changed | old_state | new_state |
1 | 1 | 2009-01-01 | 0 | 1 |
4 | 1 | 2009-01-01 | 1 | 3 |
Here's the code for table creation and data:
IF OBJECT_ID('TempDB..#table','U') IS NOT NULL
DROP TABLE #table
CREATE TABLE #table
(
id INT identity primary key,
record_id INT,
date_changed DATETIME,
old_state INT,
new_state INT
)
INSERT INTO #table(record_id,date_changed,old_state,new_state)
SELECT 1,'2009-01-01',0,1 UNION ALL --displayed
SELECT 1,'2009-01-01',1,2 UNION ALL --not displayed
SELECT 1,'2009-01-01',2,1 UNION ALL --not displayed
SELECT 1,'2009-01-01',1,3 UNION ALL --displayed
SELECT 1,'2009-01-01',3,4 UNION ALL --not displayed
SELECT 1,'2009-01-01',4,3 --not displayed
INSERT INTO #table(record_id,date_changed,old_state,new_state)
SELECT 3,'2009-01-01',0,1 UNION ALL --displayed
SELECT 3,'2009-01-01',1,2 UNION ALL --not displayed
SELECT 3,'2009-01-01',2,3 UNION ALL --not displayed
SELECT 3,'2009-01-01',3,4 UNION ALL --not displayed
SELECT 3,'2009-01-01',4,1 --not displayed
SELECT * FROM #table
I would appreciate any help..
Thanks
For clarity regarding record_id=3.. Given this table:
id | record_id| date_changed | old_state | new_state |
7 | 3 | 2009-01-01 | 0 | 1 |
8 | 3 | 2009-01-01 | 1 | 2 | not displayed
9 | 3 | 2009-01-01 | 2 | 3 | not displayed
10 | 3 | 2009-01-01 | 3 | 4 | not displayed
11 | 3 | 2009-01-01 | 4 | 1 | not displayed
when running the query for record_id=3, the table result will be:
id | record_id| date_changed | old_state | new_state |
7 | 3 | 2009-01-01 | 0 | 1 |
Thanks!
UPDATE (12/2/2009):
Special scenario
id | record_id| date_changed | old_state | new_state |
1 | 4 | 2009-01-01 | 0 | 1 | displayed
2 | 4 | 2009-01-01 | 1 | 2 | displayed
3 | 4 | 2009-01-01 | 2 | 3 | not displayed
4 | 4 | 2009-01-01 | 3 | 2 | not displayed
5 | 4 | 2009-01-01 | 2 | 3 | displayed
6 | 4 | 2009-01-01 | 3 | 4 | not displayed
7 | 4 | 2009-01-01 | 4 | 3 | not displayed
Here new_state 3 appears on ids 3, 5 and 7. Id 3 would not be displayed since it is between id 2 and id 4 which have the same new_state(3). Then id 5 should be displayed since there is no existing new_state 3 yet.
code snippet:
IF OBJECT_ID('TempDB..#tablex','U') IS NOT NULL
DROP TABLE #tablex
CREATE TABLE #tablex
(
id INT identity primary key,
record_id INT,
date_changed DATETIME,
old_state INT,
new_state INT
)
INSERT INTO #tablex(record_id,date_changed,old_state,new_state)
SELECT 4,'2009-01-01',0,1 UNION ALL --displayed
SELECT 4,'2009-01-01',1,2 UNION ALL --displayed
SELECT 4,'2009-01-01',2,3 UNION ALL --not displayed
SELECT 4,'2009-01-01',3,2 UNION ALL --not displayed
SELECT 4,'2009-01-01',2,3 UNION ALL --displayed
SELECT 4,'2009-01-01',3,4 UNION ALL --not displayed
SELECT 4,'2009-01-01',4,3 --not displayed
I think the sequence in building the result is important..
Thanks!
SELECT A.*
/*
A.ID, A.old_state, a.new_state,
B.ID as [Next], b.old_state, b.new_state,
C.ID as [Prev], c.old_state, c.new_state
*/
FROM #table A LEFT JOIN
#table B ON A.ID = (B.ID - 1)
LEFT JOIN #table C ON (A.ID - 1) = C.ID
-- WHERE A.old_State <> B.new_State AND A.new_State <> C.old_State
WHERE A.record_id = 1
AND A.old_State <> COALESCE(B.new_State, -1)
AND A.new_State <> COALESCE(C.old_State, -1)
EDIT: I guess what the OP needs is to select every record except those where the current record's old state is the same as the next record's new state (a kind of undo operation in the records), or where the current record's new state is the same as the previous record's old state.
The following steps get to the result:
1. Select all items that should not appear in the result.
2. Left join these with the original table and select only those records that don't match a "should not appear" record.
;WITH cte_table (master_id, master_state, id, record_id, old_state, new_state, level) AS
(
SELECT id, old_state, id, record_id, old_state, new_state, 1
FROM #table
UNION ALL
SELECT master_id, master_state, #table.id, #table.record_id, #table.old_state, #table.new_state, level + 1
FROM cte_table
INNER JOIN #table ON cte_table.new_state = #table.old_state
AND cte_table.record_id = #table.record_id
AND cte_table.id < #table.id
AND cte_table.master_state < #table.old_state
)
SELECT master_id, t1.*, level
INTO #result
FROM #table t1
INNER JOIN (
SELECT master_id, min_child_id = MIN(id), level
FROM cte_table
GROUP BY master_id, level
) t2 ON t2.min_child_id = t1.id
SELECT t1.*
FROM #table t1
LEFT OUTER JOIN (
SELECT r1.id
FROM #result r1
INNER JOIN (
SELECT r1.master_id
FROM #result r1
INNER JOIN #result r2 ON r2.new_state = r1.old_state
AND r2.master_id = r1.master_id
WHERE r1.level = 1
) r2 ON r2.master_id = r1.master_id
) r1 ON r1.id = t1.id
WHERE r1.id IS NULL
AND t1.old_state < t1.new_state
ORDER BY 1, 2, 3
