data sequence handling - sql-server

I am stuck with a weird problem of handling data sequences.
My source data looks like -
Roll no, Marker
1,1
2,0
3,0
5,1
8,1
9,0
10,1
The Marker column can only have two values, 1 and 0.
If the roll numbers form a consecutive sequence, a marker value of 1 indicates the start of the sequence, and all the remaining roll numbers within that sequence have marker value 0. So for the roll no sequence 1-3, the marker value is 1 for roll no 1 and 0 for the rest. However, if a roll no doesn't fall into a sequence (as in roll no 8), the marker value is 1.
From this data I need to create an output as follows -
Roll range
1
2
3
1-3
5
5-5
8
9
10
8-10
Meaning -
display the roll no in sequence as in the input
after each sequence ends, display a new record containing the start and end roll no of the preceding sequence
How is this possible?
Thanks in advance for help.

It seems like a gaps-and-islands problem.
If I understand correctly, we can use the SUM window function with a condition to solve it.
Generate a group number from a running total of Marker, then take the MIN and MAX of Roll for each group:
SELECT CONCAT(MIN(Roll),'-',MAX(Roll))
FROM (
SELECT *,
SUM(CASE WHEN Marker = 1 THEN 1 ELSE 0 END) OVER(ORDER BY Roll) grp
FROM T
) t1
GROUP BY grp
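To see why the running total works as a group number, it may help to look at the intermediate grp column on its own. The sketch below is only an illustration: it inlines the question's sample rows as a CTE named T (an assumption made so the snippet is self-contained) and needs SQL Server 2012 or later for SUM ... OVER (ORDER BY ...).

```sql
-- Sample rows from the question, inlined so the snippet runs on its own.
-- The CTE name T and the column names follow the answer's query.
WITH T (Roll, Marker) AS (
    SELECT v.Roll, v.Marker
    FROM (VALUES (1, 1), (2, 0), (3, 0), (5, 1), (8, 1), (9, 0), (10, 1))
         AS v (Roll, Marker)
)
SELECT Roll,
       Marker,
       -- Running count of Marker = 1 rows: every start-of-sequence row
       -- bumps the counter, so all rows of one sequence share a value.
       SUM(CASE WHEN Marker = 1 THEN 1 ELSE 0 END) OVER (ORDER BY Roll) AS grp
FROM T
ORDER BY Roll;
-- grp comes out as 1, 1, 1, 2, 3, 3, 4 for Roll 1, 2, 3, 5, 8, 9, 10.
```

With this grouping, roll no 10 lands in a group of its own, so the plain query reports 8-9 and 10-10 rather than the 8-10 shown in the expected output.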
As I commented, I am not sure about the logic of 8-10 (why isn't it 8-9 and 10-10?) given your expected result and column description. If 8-10 is intended, we can check for the maximum Roll and adjust the group number with some arithmetic:
SELECT CONCAT(MIN(Roll),'-',MAX(Roll))
FROM (
SELECT *,
SUM(CASE WHEN Marker = 1 THEN 1 ELSE 0 END) OVER(ORDER BY Roll) + IIF(MAX(Roll) OVER() = Roll, - Marker,0) grp
FROM T
) t1
GROUP BY grp
The final query can then combine both result sets with UNION ALL:
;WITH CTE AS (
SELECT *,
SUM(CASE WHEN Marker = 1 THEN 1 ELSE 0 END) OVER(ORDER BY Roll) + IIF(MAX(Roll) OVER() = Roll, - Marker,0) grp
FROM T
)
SELECT [Roll range]
FROM (
SELECT CONCAT(MIN(Roll),'-',MAX(Roll)) 'Roll range',MAX(Roll) seq
FROM CTE t1
GROUP BY grp
UNION ALL
SELECT CAST(Roll AS VARCHAR(5)),Roll
FROM CTE t1
) t1
ORDER BY seq
sqlfiddle

SELECT
CASE WHEN a=2 AND CHARINDEX('-',R)=0 THEN CONCAT(R,'-',R) ELSE R END as R,
R2,
a
FROM (
SELECT
1 as a,
CONVERT(VARCHAR(3), Roll) R,
Roll as R2
FROM table1
UNION ALL
SELECT
2,
STRING_AGG(Roll,'-') R,
MAX(Roll) as R2
FROM (
SELECT
Roll,
SUM(Marker) OVER (ORDER BY Roll) S
FROM
table1
) x
GROUP BY S
) x
ORDER BY R2,a
output:
R R2 a
1 1 1
2 2 1
3 3 1
1-2-3 3 2
5 5 1
5-5 5 2
8 8 1
9 9 1
8-9 9 2
10 10 1
10-10 10 2
Columns R2 and a are added for correct sorting.
I grouped 8-9 and 10-10 separately; whether that is the intended result is still an open question (see the comment on the question).

Related

Transact-SQL - number rows until condition met

I'm trying to generate the numbers in the "x" column considering the values in field "eq", in a way that it should assign a number for every record until it meets the value "1", and the next row should reset and start counting again. I've tried with row_number, but the problem is that I only have ones and zeros in the column I need to evaluate, and the cases I've seen using row_number were using growing values in a column. Also tried with rank, but I haven't managed to make it work.
nInd Fecha Tipo #Inicio #contador_I #Final #contador_F eq x
1 18/03/2002 I 18/03/2002 1 null null 0 1
2 20/07/2002 F 18/03/2002 1 20/07/2002 1 1 2
3 19/08/2002 I 19/08/2002 2 20/07/2002 1 0 1
4 21/12/2002 F 19/08/2002 2 21/12/2002 2 1 2
5 17/03/2003 I 17/03/2003 3 21/12/2002 2 0 1
6 01/04/2003 I 17/03/2003 4 21/12/2002 2 0 2
7 07/04/2003 I 17/03/2003 5 21/12/2002 2 0 3
8 02/06/2003 F 17/03/2003 5 02/06/2003 3 0 4
9 31/07/2003 F 17/03/2003 5 31/07/2003 4 0 5
10 31/08/2003 F 17/03/2003 5 31/08/2003 5 1 6
11 01/09/2005 I 01/09/2005 6 31/08/2003 5 0 1
12 05/09/2005 I 01/09/2005 7 31/08/2003 5 0 2
13 31/12/2005 F 01/09/2005 7 31/12/2005 6 0 3
14 14/01/2006 F 01/09/2005 7 14/01/2006 7 1 4
There is another solution available:
select
nind, eq, row_number() over (partition by s order by nind)
from (
select
nind, eq, coalesce((
select sum(eq) +1 from mytable pre where pre.nInd < mytable.nInd)
,1) s --this is the sum of eq!
from mytable) g
The inner subquery creates groups sequentially for each occurrence of 1 in eq. Then we can use row_number() over partition to get our counter.
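The same grouping idea can also be written with a windowed running total instead of a correlated subquery. This is a sketch under assumptions, not the answerer's query: it inlines the nInd and eq values from the question as a CTE named mytable, and the ROWS frame needs SQL Server 2012 or later.

```sql
-- nInd and eq values copied from the question's sample table.
WITH mytable (nInd, eq) AS (
    SELECT v.nInd, v.eq
    FROM (VALUES (1, 0), (2, 1), (3, 0), (4, 1), (5, 0), (6, 0), (7, 0),
                 (8, 0), (9, 0), (10, 1), (11, 0), (12, 0), (13, 0), (14, 1))
         AS v (nInd, eq)
),
g AS (
    SELECT nInd, eq,
           -- Sum of eq over all earlier rows only; the first row has no
           -- earlier rows, so COALESCE turns its NULL into 0.
           COALESCE(SUM(eq) OVER (ORDER BY nInd
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND 1 PRECEDING), 0) AS s
    FROM mytable
)
SELECT nInd, eq,
       -- Restart the counter whenever the group number s changes.
       ROW_NUMBER() OVER (PARTITION BY s ORDER BY nInd) AS x
FROM g
ORDER BY nInd;
-- x comes out as 1,2, 1,2, 1,2,3,4,5,6, 1,2,3,4, matching the question.
```

Because the frame stops one row before the current row, a row with eq = 1 still belongs to the group it closes, and the counter restarts on the row after it.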
Here is an example using Sql Server
I have two answers here. One is based off of the ROW_NUMBER() and the other is based off of what appears to be your index (nInd). I wasn't sure if there would be a gap in your index so I made the ROW_NUMBER() as well.
My table format was as follows -
myIndex int identity(1,1) NOT NULL
number int NOT NULL
First one is ROW_NUMBER()...
WITH rn AS (SELECT *, ROW_NUMBER() OVER (ORDER BY myIndex) AS rn, COUNT(*) AS max
FROM counting c GROUP BY c.myIndex, c.number)
,cte (myIndex, number, level, row) AS (
SELECT r.myIndex, r.number, 1, r.rn + 1 FROM rn r WHERE r.rn = 1
UNION ALL
SELECT r1.myIndex, r1.number,
CASE WHEN r1.number = 0 AND r2.number = 1 THEN 1
ELSE c.level + 1
END,
row + 1
FROM cte c
JOIN rn r1
ON c.row = r1.rn
JOIN rn r2
ON c.row - 1 = r2.rn
)
SELECT c.myIndex, c.number, c.level FROM cte c OPTION (MAXRECURSION 0);
Now the index...
WITH cte (myIndex, number, level) AS (
SELECT c.myIndex + 1, c.number, 1 FROM counting c WHERE c.myIndex = 1
UNION ALL
SELECT c1.myIndex + 1, c1.number,
CASE WHEN c1.number = 0 AND c2.number = 1 THEN 1
ELSE c.level + 1
END
FROM cte c
JOIN counting c1
ON c.myIndex = c1.myIndex
JOIN counting c2
ON c.myIndex - 1 = c2.myIndex
)
SELECT c.myIndex - 1 AS myIndex, c.number, c.level FROM cte c OPTION (MAXRECURSION 0);
The answer that I have now uses a cursor.
I know that a solution without a cursor would be better for performance.
Here is a quick demo of my solution:
-- Create DBTest
use master
Go
Create Database DBTest
Go
use DBTest
GO
-- Create table
Create table Tabletest
(nInd int , eq int)
Go
-- insert dummy data
insert into Tabletest (nInd,eq)
values (1,0),
(2,1),
(3,0),
(4,1),
(5,0),
(6,0),
(7,0),
(8,0),
(9,1),
(8,0),
(9,1)
Create table #Tabletest (nInd int ,eq int ,x int )
go
DECLARE @nInd int , @eq int , @x int
set @x = 1
DECLARE db_cursor CURSOR FOR
SELECT nInd , eq
FROM Tabletest
order by nInd
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @nInd , @eq
WHILE @@FETCH_STATUS = 0
BEGIN
if (@eq = 0)
begin
insert into #Tabletest (nInd ,eq ,x) values (@nInd , @eq , @x)
set @x = @x +1
end
else if (@eq = 1)
begin
insert into #Tabletest (nInd ,eq ,x) values (@nInd , @eq , @x)
set @x = 1
end
FETCH NEXT FROM db_cursor INTO @nInd , @eq
END
END
CLOSE db_cursor
DEALLOCATE db_cursor
select * from #Tabletest
The end result set will be as following:
Hope it helps.
Looking at this a slightly different way (which might not be true, but eliminates the need for cursors or recursive CTEs), it looks like you are building ordered groups within your dataset. So, start by finding those groups, then determine the ordering within each of them.
The real key is to determine the rules for finding the correct grouping. Based on your description and comments, I'm guessing each group runs from the start (ordered by the nInd column) and ends at each row with an eq value of 1, so you can do something like:
;with ends(nInd, ord) as (
--Find the ending row for each set
SELECT nInd, row_number() over(order by nInd)
FROM mytable
WHERE eq=1
), ranges(sInd, eInd) as (
--Find the previous ending row for each ending row, forming a range for the group
SELECT coalesce(s.nInd,0), e.nInd
FROM ends s
right join ends e on s.ord=e.ord-1
)
Then, using these group ranges, you can find the final ordering of each:
select t.nInd, t.Fecha, t.eq
,[x] = row_number() over(partition by sInd order by nInd)
from ranges r
join mytable t on r.sInd < t.nInd
and t.nInd <= r.eInd
order by t.nInd

TSQL - Difficult Grouping

Please see fiddle: http://sqlfiddle.com/#!6/e6768/2
I have data, like below:
DRIVER DROP
1 1
1 2
1 ReturnToBase
1 4
1 5
1 ReturnToBase
1 6
1 7
2 1
2 2
2 ReturnToBase
2 4
I am trying to group my data so that, for each driver, every run of drops ending in a ReturnToBase gets its own group number.
My output should look like this:
DRIVER DROP GROUP
1 1 1
1 2 1
1 ReturnToBase 1
1 4 2
1 5 2
1 ReturnToBase 2
1 6 3
1 7 3
1 ReturnToBase 3
2 1 1
2 2 1
2 ReturnToBase 1
2 4 2
I've tried getting this result with a combination of windowed functions, but I've been miles off so far.
Below is what I had so far. It isn't supposed to be functional; I was trying to figure out how it could be done, if it's even possible.
SELECT
ROW_NUMBER() OVER (Partition BY Driver order by Driver Desc) rownum,
Count(1) OVER (Partition By Driver Order By Driver Desc) counter,
Count
DropNo,
Driver,
CASE DropNo
WHEN 'ReturnToBase' THEN 1 ELSE 0 END AS EnumerateRound
FROM
Rounds
You can use the following query:
SELECT id, DRIVER, DROPno,
1 + SUM(flag) OVER (PARTITION BY DRIVER ORDER BY id) -
CASE
WHEN DROPno = 'ReturnToBase' THEN 1
ELSE 0
END AS grp
FROM (
SELECT id, DRIVER, DROPno,
CASE
WHEN DROPno = 'ReturnToBase' THEN 1
ELSE 0
END AS flag
FROM rounds ) AS t
Demo here
This query uses windowed version of SUM with ORDER BY in the OVER clause to calculate a running total. This version of SUM is available from SQL Server 2012 onwards AFAIK.
Fiddling a bit with this running total value is all we need in order to get the correct GROUP value.
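The adjustment is easier to follow with the intermediate columns laid out side by side. The sketch below is only illustrative: it inlines a shortened version of the question's data as a CTE named rounds, and the id column is an assumed sequential key, as in the query above.

```sql
-- Shortened sample data; id is an assumed sequential ordering column.
WITH rounds (id, DRIVER, DROPno) AS (
    SELECT v.id, v.DRIVER, v.DROPno
    FROM (VALUES (1, 1, '1'), (2, 1, '2'), (3, 1, 'ReturnToBase'),
                 (4, 1, '4'), (5, 1, '5'), (6, 1, 'ReturnToBase'),
                 (7, 1, '6'), (8, 2, '1'), (9, 2, 'ReturnToBase'),
                 (10, 2, '4'))
         AS v (id, DRIVER, DROPno)
),
flagged AS (
    SELECT id, DRIVER, DROPno,
           CASE WHEN DROPno = 'ReturnToBase' THEN 1 ELSE 0 END AS flag
    FROM rounds
)
SELECT id, DRIVER, DROPno, flag,
       -- Running count of ReturnToBase rows per driver, current row included.
       SUM(flag) OVER (PARTITION BY DRIVER ORDER BY id) AS running_total,
       -- Subtracting the row's own flag keeps a ReturnToBase row in the
       -- group it closes; adding 1 makes the numbering start at 1.
       1 + SUM(flag) OVER (PARTITION BY DRIVER ORDER BY id) - flag AS grp
FROM flagged
ORDER BY id;
-- grp comes out as 1,1,1,2,2,2,3 for driver 1 and 1,1,2 for driver 2.
```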
EDIT: (credit goes to @Conrad Frix)
Using CROSS APPLY instead of an in-line view can considerably simplify things:
SELECT id, DRIVER, DROPno,
1 + SUM(x.flag) OVER (PARTITION BY DRIVER ORDER BY id) - x.flag
FROM rounds
CROSS APPLY (SELECT CASE WHEN DROPno = 'ReturnToBase' THEN 1 ELSE 0 END) AS x(flag)
Demo here
Added a sequential ID column to your example for use in a recursive CTE:
with cte as (
select ID,DRIVER,DROPno,1 as GRP
FROM rounds
where ID = 1
union all
select a.ID
,a.DRIVER
,a.DROPno
,case when b.DROPno = 'ReturnToBase'
or b.DRIVER <> a.DRIVER then b.GRP + 1
else b.GRP end
from rounds a
inner join cte b
on a.ID = b.ID + 1
)
select * from cte
SQL Fiddle

How to write this query without cursor in SQL Server 2008 R2?

I have this table ScoreDetails with 2 columns (there are more, but only 2 are needed for this query): ScoreDate and Score.
The structure is like
2012:03:27: 5:06:37:134 27
2012:03:27: 5:06:37:276 37
2012:03:28: 4:12:97:019 19
2012:03:29: 7:06:37:134 7
2012:03:29: 8:06:37:134 0
2012:04:03: 12:06:37:739 16
2012:04:04: 23:21:15:834 33
2012:04:04: 15:08:24:697 12
2012:04:06: 5:06:37:134 0
2012:04:09: 5:06:37:134 2
2012:04:13: 5:06:37:134 92
What I want is to write a select query, without using a temp table or cursor, such that I have a column that starts from 1 and keeps on increasing (2, 3 and so on) as long as the score is non-zero. But as soon as a zero is encountered in the score column, it resets to 1 and then starts again. Like this...
2012:03:27: 5:06:37:134 27 1
2012:03:27: 5:06:37:276 37 2
2012:03:28: 4:12:97:019 19 3
2012:03:29: 7:06:37:134 7 4
2012:03:29: 8:06:37:134 0 0
2012:04:03: 12:06:37:739 16 1
2012:04:04: 23:21:15:834 33 2
2012:04:04: 15:08:24:697 12 3
2012:04:06: 5:06:37:134 0 0
2012:04:09: 5:06:37:134 2 1
2012:04:13: 5:06:37:134 92 2
I am using SQL Server 2008 R2.
You can use common table expressions for that. I defined 2 anchor queries: one for records with 0 score and the other for the first record. Then you build up the result based on previous results until you find 0 score.
with cte
as
(
select ScoreDate, Score, ScoreRank, 0 as Value
from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
from ScoreDetails) X
where Score = 0
union all
select ScoreDate, Score, ScoreRank, 1 as Value
from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
from ScoreDetails) X
where Score <> 0 and ScoreRank = 1
union all
select X.ScoreDate, X.Score, X.ScoreRank, cte.Value + 1 as Value
from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
from ScoreDetails) X
inner join cte
on X.ScoreRank = cte.ScoreRank + 1
and X.Score <> 0
)
select ScoreDate, Score, Value, ScoreRank
from cte
order by ScoreDate
SQL Fiddle Demo
I won't spoil the fun of finding the solution yourself, but I will give you some hints on how to split the problem into smaller pieces:
Find all the records where the score is reset. Let's call this subquery the resetRecords.
Join the records of the original table to the resetRecords, such that every record has "its" reset record (i.e., the reset record that provides the base for its count).
Use ROW_NUMBER() OVER (PARTITION BY ... ) to assign the numbers.
Try to do this one step at a time. Beware: It won't be a simple query, so a solution with temp tables or cursors might be easier to understand and maintain.
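One possible reading of these hints is sketched below; it is not the answerer's solution. It avoids SUM ... OVER (ORDER BY ...), which as far as I know is only available from SQL Server 2012, so it should run on 2008 R2. Instead of an explicit join it counts the reset rows with a correlated subquery. The sample values are hypothetical stand-ins for the question's data, and ScoreDate is assumed to be unique.

```sql
-- Hypothetical sample rows; ScoreDate is assumed to be unique.
WITH ScoreDetails (ScoreDate, Score) AS (
    SELECT CAST(v.ScoreDate AS datetime), v.Score
    FROM (VALUES ('2012-03-27T05:06:37', 27),
                 ('2012-03-28T04:12:57', 19),
                 ('2012-03-29T08:06:37', 0),
                 ('2012-04-03T12:06:37', 16),
                 ('2012-04-04T15:08:24', 33),
                 ('2012-04-06T05:06:37', 0),
                 ('2012-04-09T05:06:37', 2)) AS v (ScoreDate, Score)
),
WithReset AS (
    -- Steps 1 and 2: number each row by how many reset rows (Score = 0)
    -- occur at or before it, so a reset row opens a new group.
    SELECT s.ScoreDate, s.Score,
           (SELECT COUNT(*)
            FROM ScoreDetails AS z
            WHERE z.Score = 0 AND z.ScoreDate <= s.ScoreDate) AS grp
    FROM ScoreDetails AS s
)
-- Step 3: count within each group. Groups that begin with a reset row are
-- shifted down by one so the reset row itself shows 0.
SELECT ScoreDate, Score,
       ROW_NUMBER() OVER (PARTITION BY grp ORDER BY ScoreDate)
         - CASE WHEN grp > 0 THEN 1 ELSE 0 END AS Counter
FROM WithReset
ORDER BY ScoreDate;
-- Counter comes out as 1, 2, 0, 1, 2, 0, 1 for the rows above.
```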
Try something like this:
with x as (
select *, sum(case when Score=0 then 1 else 0 end) over(order by ScoreDate) as grp
from ScoreDetails
)
select ScoreDate, Score, row_number() over (partition by grp order by ScoreDate)
from x
order by ScoreDate
(as soon as a zero is encountered in score column, it resets to 1 and then start again, you said)

Rolling a number from rows with a flag into the next row without the flag

I'm a bit stumped about how to solve this particular piece of a problem I'm working on. I started with a much bigger problem, but I managed to simplify it into this while keeping good performance intact.
Say I have the following result set. AggregateMe is something I'm deriving from SQL conditionals.
MinutesElapsed AggregateMe ID Type RowNumber
1480 1 1 A 1
1200 0 1 A 2
1300 0 1 B 3
1550 0 1 C 4
725 1 1 A 5
700 0 1 A 6
1900 1 2 A 7
3300 1 2 A 8
4900 0 2 A 9
If AggregateMe is 1 (true) or, if you prefer, if the underlying conditions are true, I want the counts to be aggregated into the next row where AggregateMe (or the conditions) does not evaluate to true.
Aggregate functions or Subqueries are fair game as is PARTITION BY.
For example, the above result set would become:
MinutesElapsed ID Type
2680 1 A
1300 1 B
1550 1 C
1425 1 A
10100 2 A
Is there a clean way to do this? If you want, I can share more about the original problem, but it is a bit more complicated.
Edited to add: SUM and GROUP BY alone won't work, because some sums would be rolled into the wrong row. My sample data did not reflect this case, so I added rows where this case can occur. In the updated sample data, using an aggregate function in the simplest way would cause the 2680 count and the 1425 count to be rolled together, which I do not want.
EDIT: And if you're wondering how I got here in the first place, here you go. I'm going to aggregate statistics about how long our program left something in a certain ActionType, and my first step was by creating this subquery. Please feel free to criticize:
select
ROW_NUMBER() over(order by claimid, insertdate asc) as RowNbr,
DateDiff(mi, ahCurrent.InsertDate, CASE WHEN ahNext.NextInsertDate is null THEN GetDate() ELSE ahNext.NextInsertDate END) as MinutesInActionType,
ahCurrent.InsertDate, ahNext.NextInsertDate,
ahCurrent.ClaimID, ahCurrent.ActionTypeID,
case when ahCurrent.ActionTypeID = ahNext.NextActionTypeID and ahCurrent.ClaimID = ahNext.NextClaimID then 1 else 0 end as aggregateme
FROM
(
select ROW_NUMBER () over(order by claimid, insertdate asc) as RowNum, ClaimID, InsertDate, ActionTypeID
From autostatushistory
--Where AHCurrent is not AHPast
) ahCurrent
LEFT JOIN
(
select ROW_NUMBER() over(order by claimid, insertdate asc) as RowNum, ClaimID as NextClaimID, InsertDate as NextInsertDate, ActionTypeID as NextActionTypeID
FROM autostatushistory
) ahNext
ON (ahCurrent.ClaimID = ahNext.NextClaimID AND ahCurrent.RowNum = ahNext.RowNum - 1 and ahCurrent.ActionTypeID = ahNext.NextActionTypeID)
Here is the query that you need to execute.
It's not clean, so you may want to optimize it:
WITH cte AS( /* Create a table containing row number */
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ROW,
MinutesElapsed,
AggregateMe,
ID,
TYPE
FROM rolling
)
SELECT MinutesElapsed + (CASE /* adding minutes from next valid records*/
WHEN cte.AggregateMe <> 1 /*if current record is 0 then */
THEN 0 /*skip it*/
ELSE
(SELECT SUM(MinutesElapsed) /* calculating sum of all -> */
FROM cte localTbl
WHERE
cte.ROW < localTbl.ROW /* next records -> */
AND
localTbl.ROW <= ( /* until we find aggregate = 0 */
SELECT MIN(ROW)
FROM cte sTbl
WHERE sTbl.AggregateMe = 0
AND
sTbl.ROW > cte.ROW
)
AND
(localTbl.AggregateMe = 0 OR /* just to be sure :) */
localTbl.AggregateMe = 1))
END) as MinutesElapsed,
AggregateMe,
ID,
TYPE
FROM cte
WHERE cte.ROW = 1 OR NOT( /* not showing records used that are used in sum, skipping 1 record*/
( /* records with agregate 0 after record with aggregate 1 */
cte.AggregateMe = 0
AND
(
SELECT AggregateMe
FROM cte tblLocal
WHERE cte.ROW = (tblLocal.ROW + 1)
)>0
)
OR
( /* record with aggregate 1 after record with aggregate 1 */
cte.AggregateMe = 1
AND
(
SELECT AggregateMe
FROM cte tblLocal
WHERE cte.ROW = (tblLocal.ROW + 1)
)= 1
)
);
test here
Hope it helps with your problem.
Feel free to ask questions.
Judging by your result set, it seems like the following would work:
SELECT ID,Type,SUM(MinutesElapsed)
FROM mytable
GROUP BY ID,Type
But I cannot tell for sure without looking at the original dataset.
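As the question's edit notes, grouping by ID and Type alone would merge the 2680 and 1425 rows. A possible alternative is to derive a group number from a running count of the closing rows (AggregateMe = 0). This is a sketch under assumptions: the table name rolling is borrowed from the other answer, the sample rows are inlined, the question's RowNumber column is used for ordering, and SQL Server 2012 or later is required for the ordered window SUM.

```sql
-- Sample rows from the question, inlined under the name rolling.
WITH rolling (MinutesElapsed, AggregateMe, ID, [Type], RowNumber) AS (
    SELECT v.MinutesElapsed, v.AggregateMe, v.ID, v.[Type], v.RowNumber
    FROM (VALUES (1480, 1, 1, 'A', 1), (1200, 0, 1, 'A', 2),
                 (1300, 0, 1, 'B', 3), (1550, 0, 1, 'C', 4),
                 (725, 1, 1, 'A', 5), (700, 0, 1, 'A', 6),
                 (1900, 1, 2, 'A', 7), (3300, 1, 2, 'A', 8),
                 (4900, 0, 2, 'A', 9))
         AS v (MinutesElapsed, AggregateMe, ID, [Type], RowNumber)
),
grouped AS (
    SELECT MinutesElapsed, AggregateMe, ID, [Type], RowNumber,
           -- Closing rows (AggregateMe = 0) strictly before this row: the
           -- running count includes the current row, so its own
           -- contribution is subtracted again.
           SUM(1 - AggregateMe) OVER (ORDER BY RowNumber)
             - (1 - AggregateMe) AS grp
    FROM rolling
)
SELECT SUM(MinutesElapsed) AS MinutesElapsed,
       MAX(ID) AS ID,
       MAX([Type]) AS [Type]   -- assumes ID and Type are constant per group
FROM grouped
GROUP BY grp
ORDER BY grp;
-- Returns 2680/1/A, 1300/1/B, 1550/1/C, 1425/1/A, 10100/2/A.
```

Each flagged row shares a group with the next unflagged row, so consecutive runs are summed separately even when ID and Type repeat later in the data.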

Filter Duplicate Rows on Conditions

I would like to filter duplicate rows on conditions so that, for each unique combination of rid and did, the row with the minimum modified and maximum active is picked. Should I use a self join, or is there an approach that performs better?
Example:
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:40:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Output expected is
1 1 2010-09-07 11:37:44.850 1 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Commenting on the first answer: the suggestion does not work for the dataset below (where active = 0 and modified is the minimum for that row):
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:36:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.
;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id
select id, rid, min(modified), max(active), did from foo group by rid, did order by id;
You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:
SELECT
X.*
FROM
ParentTable P
CROSS APPLY (
SELECT TOP 1 *
FROM YourTable T
WHERE P.rid = T.rid AND P.did = T.did
ORDER BY active DESC, modified
) X
Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.
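For completeness, the substitution mentioned above might look like the sketch below. It inlines the question's second dataset as a CTE named YourTable so that it runs on its own; as noted, the DISTINCT derived table is expected to cost more than a real parent table.

```sql
-- Second dataset from the question, inlined under the name YourTable.
WITH YourTable (id, rid, modified, active, did) AS (
    SELECT v.id, v.rid, CAST(v.modified AS datetime), v.active, v.did
    FROM (VALUES (1, 1, '2010-09-07T11:37:44.850', 1, 1),
                 (2, 1, '2010-09-07T11:38:44.000', 1, 1),
                 (3, 1, '2010-09-07T11:39:44.000', 1, 1),
                 (4, 1, '2010-09-07T11:36:44.000', 0, 1),
                 (5, 2, '2010-09-07T11:41:44.000', 1, 1),
                 (6, 1, '2010-09-07T11:42:44.000', 1, 2))
         AS v (id, rid, modified, active, did)
)
SELECT X.*
FROM (SELECT DISTINCT rid, did FROM YourTable) AS P  -- stands in for ParentTable
CROSS APPLY (
    -- Best row per (rid, did): highest active first, then earliest modified.
    SELECT TOP (1) T.id, T.rid, T.modified, T.active, T.did
    FROM YourTable AS T
    WHERE T.rid = P.rid AND T.did = P.did
    ORDER BY T.active DESC, T.modified
) AS X
ORDER BY X.id;
-- Returns the rows with id 1, 5 and 6.
```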
Also, here is my crazy, single scan magic query which can often outperform other methods:
SELECT
id = Substring(Packed, 6, 4),
rid,
modified = Convert(datetime, Substring(Packed, 2, 4)),
Active = Convert(bit, 1 - Substring(Packed, 1, 1)),
did
FROM
(
SELECT
rid,
did,
Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(4), modified) + Convert(binary(4), id))
FROM
YourTable
GROUP BY
rid,
did
) X
This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.
