TSQL - Difficult Grouping

Please see fiddle: http://sqlfiddle.com/#!6/e6768/2
I have data, like below:
DRIVER DROP
1 1
1 2
1 ReturnToBase
1 4
1 5
1 ReturnToBase
1 6
1 7
2 1
2 2
2 ReturnToBase
2 4
I am trying to group my data so that, for each driver, every run of drops up to and including a ReturnToBase gets its own group number.
My output should look like this:
DRIVER DROP GROUP
1 1 1
1 2 1
1 ReturnToBase 1
1 4 2
1 5 2
1 ReturnToBase 2
1 6 3
1 7 3
1 ReturnToBase 3
2 1 1
2 2 1
2 ReturnToBase 1
2 4 2
I've tried getting this result with a combination of windowed functions, but I've been miles off so far.
Below is what I have so far. It isn't supposed to be functional; I was just trying to figure out how it could be done, if it's even possible.
SELECT
    ROW_NUMBER() OVER (PARTITION BY Driver ORDER BY Driver DESC) AS rownum,
    COUNT(1) OVER (PARTITION BY Driver ORDER BY Driver DESC) AS counter,
    DropNo,
    Driver,
    CASE DropNo
        WHEN 'ReturnToBase' THEN 1 ELSE 0 END AS EnumerateRound
FROM
    Rounds

You can use the following query:
SELECT id, DRIVER, DROPno,
       1 + SUM(flag) OVER (PARTITION BY DRIVER ORDER BY id) -
       CASE
          WHEN DROPno = 'ReturnToBase' THEN 1
          ELSE 0
       END AS grp
FROM (
   SELECT id, DRIVER, DROPno,
          CASE
             WHEN DROPno = 'ReturnToBase' THEN 1
             ELSE 0
          END AS flag
   FROM rounds ) AS t
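This query uses the windowed version of SUM with ORDER BY in the OVER clause to calculate a running total of ReturnToBase rows per driver. This form of SUM is available from SQL Server 2012 onwards.
The running total already counts the current row, so a ReturnToBase row would otherwise land in the next group. Subtracting the row's own flag and adding 1 keeps each ReturnToBase in the group it closes and makes the numbering start at 1, which gives the correct GROUP value.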
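To see what the running total and the adjustment do on the sample data, the same logic can be tried outside SQL Server. This is only a minimal sketch using Python's built-in sqlite3 module (an assumption: the bundled SQLite is 3.25 or later, which is needed for window functions). The table mirrors the question's data, and the id values are made up to give the rows an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rounds (id INTEGER PRIMARY KEY, driver INTEGER, dropno TEXT)")
# Sample rows from the question; id is an assumed ordering column.
rows = [
    (1, 1, "1"), (2, 1, "2"), (3, 1, "ReturnToBase"),
    (4, 1, "4"), (5, 1, "5"), (6, 1, "ReturnToBase"),
    (7, 1, "6"), (8, 1, "7"),
    (9, 2, "1"), (10, 2, "2"), (11, 2, "ReturnToBase"), (12, 2, "4"),
]
conn.executemany("INSERT INTO rounds VALUES (?, ?, ?)", rows)

query = """
SELECT id, driver, dropno,
       SUM(flag) OVER (PARTITION BY driver ORDER BY id) AS running_total,
       1 + SUM(flag) OVER (PARTITION BY driver ORDER BY id) - flag AS grp
FROM (
    SELECT id, driver, dropno,
           CASE WHEN dropno = 'ReturnToBase' THEN 1 ELSE 0 END AS flag
    FROM rounds
) AS t
ORDER BY id
"""
result = conn.execute(query).fetchall()
running_totals = [row[3] for row in result]
groups = [row[4] for row in result]

for row in result:
    print(row)
```

Comparing the running_total and grp columns shows that the two differ only on the ReturnToBase rows, apart from the constant offset of 1.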
EDIT: (credit goes to @Conrad Frix)
Using CROSS APPLY instead of an in-line view can considerably simplify things:
SELECT id, DRIVER, DROPno,
       1 + SUM(x.flag) OVER (PARTITION BY DRIVER ORDER BY id) - x.flag AS grp
FROM rounds
CROSS APPLY (SELECT CASE WHEN DROPno = 'ReturnToBase' THEN 1 ELSE 0 END) AS x(flag)

I added a sequential ID column to your example for use in a recursive CTE:
with cte as (
    -- anchor: the first row starts group 1
    select ID, DRIVER, DROPno, 1 as GRP
    from rounds
    where ID = 1
    union all
    -- recursive step: walk the rows in ID order
    select a.ID
          ,a.DRIVER
          ,a.DROPno
          ,case when b.DRIVER <> a.DRIVER then 1                 -- new driver: restart at 1
                when b.DROPno = 'ReturnToBase' then b.GRP + 1    -- previous row closed a group
                else b.GRP end
    from rounds a
    inner join cte b
        on a.ID = b.ID + 1
)
select * from cte

Related

Classifying rows into a grouping column that shows the data is related to prior rows

I have a set of data that I want to classify into groups based on a prior record id existing on the newer rows. The initial record of the group has a prior sequence id = 0.
The data is as follows:
customer_id sequence_id prior_sequence_id
1 1 0
1 2 1
1 3 2
2 4 0
2 5 4
2 6 0
2 7 6
Ideally, I would like to create the following grouping column and yield the following results:
customer_id sequence_id prior_sequence_id grouping
1 1 0 1
1 2 1 1
1 3 2 1
2 4 0 2
2 5 4 2
2 6 0 3
2 7 6 3
I've attempted gaps-and-islands logic using the ROW_NUMBER() function, but without success. I suspect this needs something more along the lines of a recursive CTE, which I am attempting at the moment.

I agree that a recursive CTE will do the job. Something like:
WITH reccte AS
(
    /* Query that determines the starting point for the recursion.
     *
     * In this case we want all records that start a chain,
     * i.e. those with prior_sequence_id = 0.
     */
    SELECT
        customer_id,
        sequence_id,
        prior_sequence_id,
        /* establish grouping: one number per chain start */
        ROW_NUMBER() OVER (ORDER BY sequence_id) as grouping
    FROM yourtable
    WHERE prior_sequence_id = 0
    UNION ALL
    /* join the recursive CTE back to the table and iterate */
    SELECT
        yourtable.customer_id,
        yourtable.sequence_id,
        yourtable.prior_sequence_id,
        reccte.grouping
    FROM reccte
    INNER JOIN yourtable ON reccte.sequence_id = yourtable.prior_sequence_id
)
SELECT * FROM reccte;
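For what it's worth, the recursion can be checked against the sample data from the question. The following is a rough sketch using Python's built-in sqlite3 module rather than the asker's database (assumptions: SQLite 3.25 or later, the grouping column is renamed grp, and the anchor rows are numbered in a separate CTE called starts to keep the recursive part simple).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable "
    "(customer_id INTEGER, sequence_id INTEGER, prior_sequence_id INTEGER)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 2, 1), (1, 3, 2), (2, 4, 0), (2, 5, 4), (2, 6, 0), (2, 7, 6)],
)

query = """
WITH RECURSIVE
starts AS (
    -- chain starts: one group number per row with prior_sequence_id = 0
    SELECT customer_id, sequence_id, prior_sequence_id,
           ROW_NUMBER() OVER (ORDER BY sequence_id) AS grp
    FROM yourtable
    WHERE prior_sequence_id = 0
),
reccte AS (
    SELECT customer_id, sequence_id, prior_sequence_id, grp FROM starts
    UNION ALL
    -- follow each chain: a child row inherits the group of its prior row
    SELECT y.customer_id, y.sequence_id, y.prior_sequence_id, r.grp
    FROM reccte AS r
    JOIN yourtable AS y ON r.sequence_id = y.prior_sequence_id
)
SELECT customer_id, sequence_id, prior_sequence_id, grp
FROM reccte
ORDER BY sequence_id
"""
chain_groups = conn.execute(query).fetchall()

for row in chain_groups:
    print(row)
```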

It looks like you could use a simple correlated subquery, at least given your sample data:
select *, (
    select Sum(Iif(prior_sequence_id = 0, 1, 0))
    from t t2
    where t2.sequence_id <= t.sequence_id
) as Grouping
from t;

How to query records based on row_num and one of the column value?

Rownum Status
1 2
2 1
3 3
4 2
5 3
6 1
I need to return the records that appear before the first record with Status = 3. In the example above, the expected output is the rows with Rownum 1 and 2.
If there is no record with Status = 3, everything should be returned.
I'm not sure where to start, so I have no attempt to show yet.

If you are using SQL Server 2012+, then you can use window version of SUM with an ORDER BY clause:
SELECT Rownum, Status
FROM (
   SELECT Rownum, Status,
          SUM(CASE WHEN Status = 3 THEN 1 ELSE 0 END)
          OVER (ORDER BY Rownum) AS s
   FROM mytable) t
WHERE t.s = 0
Calculated field s is a running total of Status = 3 occurrences. The query returns all records before the first occurrence of a 3 value.
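Both cases from the question (a 3 is present, or no 3 at all) can be tried quickly. This is just a sketch in Python with the built-in sqlite3 module, assuming SQLite 3.25+ for window functions; the helper name rows_before_first_3 is made up for the example.

```python
import sqlite3


def rows_before_first_3(data):
    """Return the (rownum, status) rows that precede the first status = 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (rownum INTEGER, status INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", data)
    query = """
    SELECT rownum, status
    FROM (
        SELECT rownum, status,
               SUM(CASE WHEN status = 3 THEN 1 ELSE 0 END)
               OVER (ORDER BY rownum) AS s
        FROM mytable
    ) AS t
    WHERE t.s = 0
    ORDER BY rownum
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 3), (6, 1)]
print(rows_before_first_3(sample))

no_threes = [(1, 2), (2, 1), (3, 2)]
print(rows_before_first_3(no_threes))
```

When no row has status 3, the running total stays at 0 for every row, so the filter keeps everything without needing a special case.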

How to write this query without cursor in SQL Server 2008 R2?

I have a table, ScoreDetails, with two relevant columns (there are more, but only two are needed for this query): ScoreDate and Score.
The data looks like this:
2012:03:27: 5:06:37:134 27
2012:03:27: 5:06:37:276 37
2012:03:28: 4:12:97:019 19
2012:03:29: 7:06:37:134 7
2012:03:29: 8:06:37:134 0
2012:04:03: 12:06:37:739 16
2012:04:04: 23:21:15:834 33
2012:04:04: 15:08:24:697 12
2012:04:06: 5:06:37:134 0
2012:04:09: 5:06:37:134 2
2012:04:13: 5:06:37:134 92
What I want is a select query, without using a temp table or cursor, that adds a column that starts at 1 and keeps increasing (2, 3 and so on) for as long as the score is non-zero. But as soon as a zero is encountered in the score column, it resets to 1 and then starts again. Like this:
2012:03:27: 5:06:37:134 27 1
2012:03:27: 5:06:37:276 37 2
2012:03:28: 4:12:97:019 19 3
2012:03:29: 7:06:37:134 7 4
2012:03:29: 8:06:37:134 0 0
2012:04:03: 12:06:37:739 16 1
2012:04:04: 23:21:15:834 33 2
2012:04:04: 15:08:24:697 12 3
2012:04:06: 5:06:37:134 0 0
2012:04:09: 5:06:37:134 2 1
2012:04:13: 5:06:37:134 92 2
I am using SQL Server 2008 R2.

You can use common table expressions for that. I defined 2 anchor queries: one for records with 0 score and the other for the first record. Then you build up the result based on previous results until you find 0 score.
with cte
as
(
    -- anchor 1: every zero-score row gets Value 0
    select ScoreDate, Score, ScoreRank, 0 as Value
    from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
          from ScoreDetails) X
    where Score = 0
    union all
    -- anchor 2: the very first row, if its score is non-zero, gets Value 1
    select ScoreDate, Score, ScoreRank, 1 as Value
    from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
          from ScoreDetails) X
    where Score <> 0 and ScoreRank = 1
    union all
    -- recursive step: the next non-zero row continues the count
    select X.ScoreDate, X.Score, X.ScoreRank, cte.Value + 1 as Value
    from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
          from ScoreDetails) X
    inner join cte
        on X.ScoreRank = cte.ScoreRank + 1
        and X.Score <> 0
)
select ScoreDate, Score, Value, ScoreRank
from cte
order by ScoreDate

I won't spoil the fun of finding the solution yourself, but I will give you some hints on how to split the problem into smaller pieces:
1. Find all the records where the score is reset. Let's call this subquery the resetRecords.
2. Join the records of the original table to the resetRecords, such that every record has "its" reset record (i.e., the reset record that provides the base for its count).
3. Use ROW_NUMBER() OVER (PARTITION BY ... ) to assign the numbers.
Try to do this one step at a time. Beware: It won't be a simple query, so a solution with temp tables or cursors might be easier to understand and maintain.

Try something like this:
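with x as (
    select *, sum(case when Score = 0 then 1 else 0 end) over (order by ScoreDate) as grp
    from ScoreDetails
)
select ScoreDate, Score, row_number() over (partition by grp order by ScoreDate) as Value
from x
order by ScoreDate
(This numbers the zero row itself as 1, because you said that as soon as a zero is encountered in the score column, it resets to 1 and then starts again.)
Note that SUM with ORDER BY in the OVER clause requires SQL Server 2012 or later, so this will not run on 2008 R2.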
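A small variation of this idea reproduces the exact column shown in the question, where the zero rows show 0 and the count restarts at 1 on the next row: subtract 1 from the row number in every group that begins with a zero row. Below is a sketch only, in Python with the built-in sqlite3 module (assuming SQLite 3.25+); the integer seq column is an assumption standing in for ScoreDate as the ordering column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ScoreDetails (seq INTEGER, Score INTEGER)")
# Scores from the question, in order; seq replaces ScoreDate here.
scores = [27, 37, 19, 7, 0, 16, 33, 12, 0, 2, 92]
conn.executemany(
    "INSERT INTO ScoreDetails VALUES (?, ?)", list(enumerate(scores, start=1))
)

query = """
WITH x AS (
    -- grp counts the zero rows seen so far, including the current row
    SELECT seq, Score,
           SUM(CASE WHEN Score = 0 THEN 1 ELSE 0 END) OVER (ORDER BY seq) AS grp
    FROM ScoreDetails
)
SELECT seq, Score,
       -- groups with grp > 0 start on a zero row, so shift them down by one
       ROW_NUMBER() OVER (PARTITION BY grp ORDER BY seq)
         - CASE WHEN grp > 0 THEN 1 ELSE 0 END AS counter
FROM x
ORDER BY seq
"""
counters = [row[2] for row in conn.execute(query)]
print(counters)
```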

How to count numbers for each group?

Let's say there is a result set...I need to print it out like so:
ID Count
1 5
1 5
1 5
1 5
1 5
2 2
2 2
3 1
Thanks in advance.

Do you mean that your query:
SELECT ID, COUNT(*) AS "Count"
FROM tableX
GROUP BY ID ;
produces this:
ID Count
1 5
2 2
3 1
but you want this?:
ID Count
1 5
1 5
1 5
1 5
1 5
2 2
2 2
3 1
Then, this query will do:
SELECT grp.ID, grp."Count"
FROM
tableX AS t
JOIN
( SELECT ID, COUNT(*) AS "Count"
FROM tableX
GROUP BY ID
) AS grp
ON grp.ID = t.ID ;
It will work in almost all DBMSs and in all versions of SQL Server. For SQL Server 2005 and newer (and also in Oracle and Postgres), the answer with the OVER clause looks more elegant and may be preferred. Test in your version which one is more efficient. I think that in the 2012 version, queries with the OVER clause are quite efficient.

You can use count() with the OVER clause:
select ID, count(*) over (partition by ID) as [Count]
from tableX ;
This is called a window function. I recommend you study these.
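To confirm that the join against a grouped subquery and the windowed count return the same rows, here is a quick sketch in Python with the built-in sqlite3 module (assuming SQLite 3.25+ for the window function). The column alias cnt is used instead of "Count" purely to avoid quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (ID INTEGER)")
# Five rows with ID 1, two with ID 2, one with ID 3, as in the question.
conn.executemany("INSERT INTO tableX VALUES (?)", [(1,)] * 5 + [(2,)] * 2 + [(3,)])

# Variant 1: join every row to its group's count.
join_rows = conn.execute("""
    SELECT grp.ID, grp.cnt
    FROM tableX AS t
    JOIN (SELECT ID, COUNT(*) AS cnt FROM tableX GROUP BY ID) AS grp
      ON grp.ID = t.ID
    ORDER BY grp.ID
""").fetchall()

# Variant 2: windowed count, no join needed.
window_rows = conn.execute("""
    SELECT ID, COUNT(*) OVER (PARTITION BY ID) AS cnt
    FROM tableX
    ORDER BY ID
""").fetchall()

print(join_rows == window_rows)
print(window_rows)
```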

Filter Duplicate Rows on Conditions

I would like to filter duplicate rows so that, for each unique combination of rid and did, the row with the maximum active and, among those, the minimum modified is picked. Should I use a self join, or is there an approach that would perform better?
Example:
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:40:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Output expected is
1 1 2010-09-07 11:37:44.850 1 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Commenting on the first answer: the suggestion does not work for the dataset below (where the row with active = 0 also has the minimum modified):
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:36:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2

Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.
;WITH YourTable as
(
    SELECT 1 id, 1 rid, cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active, 1 did union all
    SELECT 2, 1, '2010-09-07 11:38:44.000', 1, 1 union all
    SELECT 3, 1, '2010-09-07 11:39:44.000', 1, 1 union all
    SELECT 4, 1, '2010-09-07 11:36:44.000', 0, 1 union all
    SELECT 5, 2, '2010-09-07 11:41:44.000', 1, 1 union all
    SELECT 6, 1, '2010-09-07 11:42:44.000', 1, 2
), cte as
(
    SELECT id, rid, modified, active, did,
           ROW_NUMBER() OVER (PARTITION BY rid, did ORDER BY active DESC, modified ASC) RN
    FROM YourTable
)
SELECT id, rid, modified, active, did
FROM cte
WHERE RN = 1
order by id
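The difference between ROW_NUMBER() and RANK() mentioned at the top of this answer only shows up when two rows in the same rid/did group tie on both active and modified. A small sketch in Python with the built-in sqlite3 module illustrates it (assumptions: SQLite 3.25+, and made-up data in which ids 1 and 2 are an exact tie).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(id INTEGER, rid INTEGER, modified TEXT, active INTEGER, did INTEGER)"
)
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "2010-09-07 11:37:44.850", 1, 1),
    (2, 1, "2010-09-07 11:37:44.850", 1, 1),  # ties with id 1
    (3, 2, "2010-09-07 11:41:44.000", 1, 1),
])

template = """
WITH cte AS (
    SELECT id,
           {func}() OVER (PARTITION BY rid, did
                          ORDER BY active DESC, modified ASC) AS rn
    FROM YourTable
)
SELECT id FROM cte WHERE rn = 1 ORDER BY id
"""
# ROW_NUMBER keeps exactly one row per group; which tied row wins is arbitrary.
row_number_ids = [r[0] for r in conn.execute(template.format(func="ROW_NUMBER"))]
# RANK gives tied rows the same rank, so both of them are returned.
rank_ids = [r[0] for r in conn.execute(template.format(func="RANK"))]

print(row_number_ids)
print(rank_ids)
```

If a deterministic winner is needed with ROW_NUMBER(), adding id as a final ORDER BY column is one way to break the tie.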

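-- id is not in the GROUP BY, so it has to be aggregated as well.
-- Each aggregate is computed independently, so the values may come from different rows.
select min(id) as id, rid, min(modified) as modified, max(active) as active, did from foo group by rid, did order by min(id);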

You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:
SELECT
    X.*
FROM
    ParentTable P
    CROSS APPLY (
        SELECT TOP 1 *
        FROM YourTable T
        WHERE P.rid = T.rid AND P.did = T.did
        ORDER BY active DESC, modified
    ) X
Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.
Also, here is my crazy, single scan magic query which can often outperform other methods:
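SELECT
    id = Convert(int, Substring(Packed, 10, 4)),
    rid,
    modified = Convert(datetime, Substring(Packed, 2, 8)),
    Active = Convert(bit, 1 - Substring(Packed, 1, 1)),
    did
FROM
(
    SELECT
        rid,
        did,
        -- Pack (1 - active), modified and id into one sortable binary value.
        -- Assumes modified is a datetime (8 bytes) and id is a non-negative int (4 bytes).
        Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(8), modified) + Convert(binary(4), id))
    FROM
        YourTable
    GROUP BY
        rid,
        did
) X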
This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.
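The idea behind the packing is simply a MIN over a composite key (1 - active, modified, id) that is compared from left to right. As a rough illustration of that ordering only, here it is in Python with tuples standing in for the concatenated binary value; this does not reproduce SQL Server's binary conversions, and the data is the second sample set from the question.

```python
from datetime import datetime

# (id, rid, modified, active, did)
rows = [
    (1, 1, "2010-09-07 11:37:44.850", 1, 1),
    (2, 1, "2010-09-07 11:38:44.000", 1, 1),
    (3, 1, "2010-09-07 11:39:44.000", 1, 1),
    (4, 1, "2010-09-07 11:36:44.000", 0, 1),
    (5, 2, "2010-09-07 11:41:44.000", 1, 1),
    (6, 1, "2010-09-07 11:42:44.000", 1, 2),
]

best = {}  # (rid, did) -> smallest composite key seen so far
for id_, rid, modified, active, did in rows:
    # 1 - active sorts active rows first; then earliest modified; id comes along last.
    key = (1 - active, datetime.strptime(modified, "%Y-%m-%d %H:%M:%S.%f"), id_)
    group = (rid, did)
    if group not in best or key < best[group]:
        best[group] = key

# Unpack the id from the winning key of each group.
picked_ids = sorted(key[2] for key in best.values())
print(picked_ids)
```

Because the whole key travels through one MIN, the id that comes out always belongs to the same row as the winning active and modified values, which is what separate MIN and MAX aggregates cannot guarantee.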