Filter Duplicate Rows on Conditions - sql-server

I would like to filter duplicate rows on conditions so that, for each unique combination of rid and did, the row with the maximum active and then the minimum modified is picked. Would a self join work, or is there a better-performing approach?
Example:
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:40:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Output expected is
1 1 2010-09-07 11:37:44.850 1 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Commenting on the first answer: the suggestion does not work for the dataset below (where the row with active = 0 also has the minimum modified value):
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:36:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2

Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.
;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id
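To illustrate the RANK() remark, here is a minimal sketch on made-up data. The tie between ids 1 and 2 is an assumption added for illustration and is not part of the question's dataset. With RANK() both tied rows are returned, whereas ROW_NUMBER() would keep only one of them, chosen arbitrarily.

```sql
-- Hypothetical sample: ids 1 and 2 tie on (active, modified) within rid = 1, did = 1.
WITH YourTable AS
(
    SELECT 1 AS id, 1 AS rid, CAST('2010-09-07 11:37:44.850' AS datetime) AS modified, 1 AS active, 1 AS did UNION ALL
    SELECT 2, 1, '2010-09-07 11:37:44.850', 1, 1 UNION ALL
    SELECT 3, 1, '2010-09-07 11:39:44.000', 1, 1 UNION ALL
    SELECT 4, 2, '2010-09-07 11:41:44.000', 1, 1
), cte AS
(
    SELECT id, rid, modified, active, did,
           -- RANK() gives tied rows the same rank instead of numbering them 1, 2, ...
           RANK() OVER (PARTITION BY rid, did ORDER BY active DESC, modified ASC) AS rnk
    FROM YourTable
)
SELECT id, rid, modified, active, did
FROM cte
WHERE rnk = 1   -- ids 1 and 2 share rank 1, so both come back, along with id 4
ORDER BY id;
```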

select id, rid, min(modified), max(active), did from foo group by rid, did order by id;

You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:
SELECT
X.*
FROM
ParentTable P
CROSS APPLY (
SELECT TOP 1 *
FROM YourTable T
WHERE P.rid = T.rid AND P.did = T.did
ORDER BY active DESC, modified
) X
Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.
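As a rough sketch of that substitution, the query below derives the (rid, did) pairs from YourTable itself. The inline sample rows are copied from the question's second dataset so the statement runs on its own; against a real table you would drop the CTE.

```sql
-- Sample rows from the question's second dataset, inlined so the sketch is self-contained.
WITH YourTable AS
(
    SELECT 1 AS id, 1 AS rid, CAST('2010-09-07 11:37:44.850' AS datetime) AS modified, 1 AS active, 1 AS did UNION ALL
    SELECT 2, 1, '2010-09-07 11:38:44.000', 1, 1 UNION ALL
    SELECT 3, 1, '2010-09-07 11:39:44.000', 1, 1 UNION ALL
    SELECT 4, 1, '2010-09-07 11:36:44.000', 0, 1 UNION ALL
    SELECT 5, 2, '2010-09-07 11:41:44.000', 1, 1 UNION ALL
    SELECT 6, 1, '2010-09-07 11:42:44.000', 1, 2
)
SELECT X.*
FROM (SELECT DISTINCT rid, did FROM YourTable) AS P   -- stands in for ParentTable
CROSS APPLY (
    -- best row per (rid, did): highest active first, then earliest modified
    SELECT TOP 1 *
    FROM YourTable AS T
    WHERE P.rid = T.rid AND P.did = T.did
    ORDER BY active DESC, modified
) AS X
ORDER BY X.id;   -- returns ids 1, 5 and 6 for this sample
```

The DISTINCT adds an extra pass over YourTable, which is the performance cost mentioned above.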
Also, here is my crazy, single scan magic query which can often outperform other methods:
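SELECT
id = Convert(int, Substring(Packed, 10, 4)),
rid,
modified = Convert(datetime, Substring(Packed, 2, 8)),
active = Convert(bit, 1 - Convert(int, Substring(Packed, 1, 1))),
did
FROM
(
SELECT
rid,
did,
-- Pack 1 byte (inverted active) + 8 bytes (datetime) + 4 bytes (int id).
-- Min() over the packed value then picks max active first, then min modified.
Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(8), modified) + Convert(binary(4), id))
FROM
YourTable
GROUP BY
rid,
did
) X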
This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.

Related

How to Sum (MAX values) from different value groups in same column SQL Server

I have a table like this:
Date Consec_Days
2015-01-01 1
2015-01-03 1
2015-01-06 1
2015-01-07 2
2015-01-09 1
2015-01-12 1
2015-01-13 2
2015-01-14 3
2015-01-17 1
I need to Sum the max value (days) for each of the consecutive groupings where Consec_Days are > 1. So the correct result would be 5 days.
This is a type of gaps-and-islands problem.
There are many solutions; here is a simple one:
Get the start points of each group using LAG.
Calculate a grouping ID using a windowed conditional count.
Group by that ID and take the highest sum.
WITH StartPoints AS (
SELECT *,
IsStart = CASE WHEN LAG(Consec_Days) OVER (ORDER BY Date) = 1 THEN 1 END
FROM YourTable t
),
Groupings AS (
SELECT *,
GroupId = COUNT(IsStart) OVER (ORDER BY Date)
FROM StartPoints
WHERE Consec_Days > 1
)
SELECT TOP (1)
SUM(Consec_Days)
FROM Groupings
GROUP BY
GroupId
ORDER BY
SUM(Consec_Days) DESC;
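The question asks for the sum of each island's maximum rather than the largest per-island sum. A hedged variant that follows that wording could look like the sketch below. The IslandMax CTE and the TotalDays alias are names made up here, and the sample rows are inlined from the question.

```sql
-- Sample rows from the question, inlined so the sketch runs on its own (SQL Server 2012+).
WITH YourTable AS (
    SELECT * FROM (VALUES
        (CAST('2015-01-01' AS date), 1),
        ('2015-01-03', 1),
        ('2015-01-06', 1),
        ('2015-01-07', 2),
        ('2015-01-09', 1),
        ('2015-01-12', 1),
        ('2015-01-13', 2),
        ('2015-01-14', 3),
        ('2015-01-17', 1)
    ) AS v([Date], Consec_Days)
),
StartPoints AS (
    -- a row starts an island when the previous row's counter was 1
    SELECT *,
           IsStart = CASE WHEN LAG(Consec_Days) OVER (ORDER BY [Date]) = 1 THEN 1 END
    FROM YourTable
),
Groupings AS (
    -- running count of island starts gives each island its own id
    SELECT *,
           GroupId = COUNT(IsStart) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING)
    FROM StartPoints
    WHERE Consec_Days > 1
),
IslandMax AS (
    -- hypothetical helper CTE: one row per island holding its maximum
    SELECT MAX(Consec_Days) AS MaxDays
    FROM Groupings
    GROUP BY GroupId
)
SELECT SUM(MaxDays) AS TotalDays   -- 2 + 3 = 5 for the sample data
FROM IslandMax;
```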
with cte as (
select Consec_Days,
coalesce(lead(Consec_Days) over (order by Date), 1) as next
from YourTable
)
select sum(Consec_Days)
from cte
where Consec_Days <> 1 and next = 1

reset window function when the time gap is over one hour

I have a dataset already sorted by a window function in sql:
ROW_NUMBER() OVER (PARTITION BY LOAN_NUMBER, CAST(CREATED_DATE AS DATE) ORDER BY LOAN_NUMBER, CREATED_DATE) AS ROW_IDX
I wonder if there is a way to reset ROW_IDX once CREATED_DATE is more than one hour later than the minimum datetime of that day.
For example, the row index for row 3 should be 1 because the time gap between 2016-11-03 15:39:16.000 and 2016-11-03 12:44:11.000 is over one hour. The row index of row 4 would then be 2.
I've tried several ways to manipulate the datetime column, but since the requirement is about the gap rather than the time of day, no rounding method worked perfectly.
Do you mean that the numbering should restart at 1 when the gap is more than 60 minutes?
Which version are you using? If it is SQL Server 2012+, you can try this.
The following query is not fully satisfactory, but I hope it helps:
Calculate the difference in minutes between two consecutive rows.
Check whether that difference is greater than one hour.
Compute the row number within each run of consecutive rows that share the same gap status.
Sorry if my description is unclear; my English is not very good.
;WITH tb(RptDate,ISSUE_ID,ACCOUNT,CREATED_DATE )AS(
select '2017-01-17','35775','76505156','2016-11-03 12:44:11.000' UNION
select '2017-01-17','35793','76505156','2016-11-03 12:51:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 13:47:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 14:45:43.000' UNION
select '2017-01-17','36097','76505156','2016-11-03 15:39:16.000' UNION
select '2017-01-17','36132','76505156','2016-11-03 15:52:51.000' UNION
select '2017-01-17','41391','76505156','2016-11-10 10:49:30.000'
)
SELECT *,ROW_NUMBER()OVER(PARTITION BY tt.ACCOUNT,a ORDER BY tt.ACCOUNT, rn) AS ROW_IDX FROM (
SELECT * ,rn-ROW_NUMBER () OVER (PARTITION BY ACCOUNT, CAST(CREATED_DATE AS DATE),n ORDER BY rn) AS a
FROM (
SELECT *, ROW_NUMBER()OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE) AS rn
,CASE WHEN DATEDIFF(MINUTE, LAG(CREATED_DATE)OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE),tb.CREATED_DATE)>60 THEN 1 ELSE 0 END AS n
,ISNULL(DATEDIFF(MINUTE, LAG(CREATED_DATE)OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE),tb.CREATED_DATE),0) AS DiffMin
FROM tb
) AS t
) AS tt
ORDER BY rn
RptDate ISSUE_ID ACCOUNT CREATED_DATE rn n DiffMin a ROW_IDX
---------- -------- -------- ----------------------- -------------------- ----------- ----------- -------------------- --------------------
2017-01-17 35775 76505156 2016-11-03 12:44:11.000 1 0 0 0 1
2017-01-17 35793 76505156 2016-11-03 12:51:43.000 2 0 7 0 2
2017-01-17 36097 76505156 2016-11-03 15:39:16.000 3 1 168 2 1
2017-01-17 36132 76505156 2016-11-03 15:52:51.000 4 0 13 1 1
2017-01-17 41391 76505156 2016-11-10 10:49:30.000 5 1 9777 4 1
Here is another script that does not use the LAG function; each step is a separate CTE:
;WITH tb(RptDate,ISSUE_ID,ACCOUNT,CREATED_DATE )AS(
select '2017-01-17','35775','76505156','2016-11-03 12:44:11.000' UNION
select '2017-01-17','35793','76505156','2016-11-03 12:51:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 13:47:43.000' UNION
-- select '2017-01-17','35793','76505156','2016-11-03 14:45:43.000' UNION
select '2017-01-17','36097','76505156','2016-11-03 15:39:16.000' UNION
select '2017-01-17','36132','76505156','2016-11-03 15:52:51.000' UNION
select '2017-01-17','41391','76505156','2016-11-10 10:49:30.000'
),t1 AS(
SELECT *, ROW_NUMBER()OVER(PARTITION BY ACCOUNT ORDER BY CREATED_DATE) AS rn FROM tb
),t2 AS (
SELECT t1.*,CASE WHEN DATEDIFF(MINUTE,tt.CREATED_DATE,t1.CREATED_DATE)>60 THEN 1 ELSE 0 END AS m
,t1.rn-ROW_NUMBER()OVER(PARTITION BY t1.ACCOUNT,CASE WHEN DATEDIFF(MINUTE,tt.CREATED_DATE,t1.CREATED_DATE)>60 THEN 1 ELSE 0 END ORDER BY t1.CREATED_DATE) AS a
FROM t1 LEFT JOIN t1 AS tt ON tt.ACCOUNT=t1.ACCOUNT AND tt.rn=t1.rn-1
),t3 AS(
SELECT *,ROW_NUMBER()OVER(PARTITION BY ACCOUNT,t2.a ORDER BY CREATED_DATE) AS ROW_IDX
FROM t2
)
SELECT * FROM t3
ORDER BY t3.ACCOUNT,t3.CREATED_DATE
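Another way to sketch the restart is a running SUM over the gap flag, so that every gap of more than 60 minutes opens a new island and ROW_NUMBER() restarts inside it. This assumes the gap is measured against the previous row for the same account, as in the scripts above, not against the first row of the day. The flagged and islands CTE names and the is_new_island and island_id columns are invented for this example.

```sql
-- Sample rows from the scripts above, reduced to the two columns that matter (SQL Server 2012+).
WITH tb (ACCOUNT, CREATED_DATE) AS (
    SELECT '76505156', CAST('2016-11-03 12:44:11.000' AS datetime) UNION ALL
    SELECT '76505156', '2016-11-03 12:51:43.000' UNION ALL
    SELECT '76505156', '2016-11-03 15:39:16.000' UNION ALL
    SELECT '76505156', '2016-11-03 15:52:51.000' UNION ALL
    SELECT '76505156', '2016-11-10 10:49:30.000'
),
flagged AS (
    -- 1 when more than 60 minutes have passed since the previous row, else 0
    SELECT *,
           CASE WHEN DATEDIFF(MINUTE,
                              LAG(CREATED_DATE) OVER (PARTITION BY ACCOUNT ORDER BY CREATED_DATE),
                              CREATED_DATE) > 60
                THEN 1 ELSE 0 END AS is_new_island
    FROM tb
),
islands AS (
    -- the running total of the flags is constant within an island
    SELECT *,
           SUM(is_new_island) OVER (PARTITION BY ACCOUNT ORDER BY CREATED_DATE
                                    ROWS UNBOUNDED PRECEDING) AS island_id
    FROM flagged
)
SELECT ACCOUNT, CREATED_DATE,
       ROW_NUMBER() OVER (PARTITION BY ACCOUNT, island_id ORDER BY CREATED_DATE) AS ROW_IDX
FROM islands
ORDER BY ACCOUNT, CREATED_DATE;   -- ROW_IDX comes out as 1, 2, 1, 2, 1
```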

SQL Server 2012, Rank() & SUM() Over() Until condition

I'm really confused on how to segment these groups into subgroups. This is an example of 2 orders (out of ~5M)
An order may have 1 or more "grouped items".
The group number = SUM(ItemQuantity).
Groups are ordered by OrderLine
E.g., in the table below we see one group of "3" and two groups of "2":
OrderNo OrderLine GroupNo ItemQty
10496 1 3 1 =3
10496 2 3 1 =3
10496 3 3 1 =3
10496 4 2 1 =2(1)
10496 5 2 1 =2(1)
10496 6 2 1 =2(2)
10496 7 2 1 =2(2)
RANK() and DENSE_RANK() don't solve the issue, as the same group number occurs several times and the OrderLines are different.
I'll be joining this to another table in the end but what I'd like is a way to differentiate the same groups. Perhaps by adding a "subgroup" field.
OrderNo OrderLine GroupNo ItemQty Subgroup
10496 1 3 1 300
10496 2 3 1 300
10496 3 3 1 300
10496 4 2 1 201
10496 5 2 1 201
10496 6 2 1 202
10496 7 2 1 202
Testing below
CREATE TABLE #temptable(
OrderNo varchar(5),
OrderLine int,
GroupNo int,
ItemQty int);
INSERT INTO #temptable (OrderNo,OrderLine,GroupNo,ItemQty)
VALUES
('10496','1','3','1'),
('10496','2','3','1'),
('10496','3','3','1'),
('10495','1','4','1'),
('10495','2','4','2'),
('10495','3','4','1'),
('10495','4','2','1'),
('10495','5','2','1'),
('10495','6','3','1'),
('10495','7','3','2'),
('10495','8','2','1'),
('10495','9','2','1'),
('10495','10','2','1'),
('10495','11','2','1'),
('10495','12','2','1'),
('10495','13','2','1');
A DO WHILE
SUM(ItemQty)Over(Partition by OrderNo,GroupNo Order by OrderLine) >= GroupNo
may work but it'll need to run for every group in every order.
I then started using XML path to query each line but it's really not going to be efficient.
SELECT distinct t1.OrderNo,t1.GroupNo,
STUFF(( SELECT ',' + QUOTENAME(t2.OrderLine)
FROM #temptable t2
WHERE
t2.OrderNo = t1.OrderNo AND t2.GroupNo = t1.GroupNo
Order by t2.OrderLine Asc
FOR XML PATH(''),TYPE
).value('.', 'NVARCHAR(MAX)') ,1,1,'' )
AS [Rows]
FROM #temptable t1
Order by t1.OrderNo,t1.GroupNo
Taking @Nick.McDermaid's advice about the modulo operator (%), here's a solution. Admittedly it could be improved, but for now it works.
With a as (
select OrderNo,OrderLine,GroupNo,ItemQty
,CASE
WHEN SUM(ItemQty)Over
(Partition by OrderNo,GroupNo Order by OrderNo,OrderLine) % GroupNo=1
THEN GroupNo*100
ELSE NULL END as SG
from #temptable )
Select a.OrderNo,a.OrderLine,a.ItemQty,a.GroupNo
,MAX(a.SG2)Over(Partition by a.OrderNo,a.GroupNo Order by a.OrderNo,a.OrderLine ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as Subgroup
from
(Select OrderNo,OrderLine,GroupNo,ItemQty
,CASE WHEN SG IS NULL THEN NULL ELSE SG+RANK()Over(Partition by OrderNo,SG Order by OrderNo,OrderLine) END as SG2
from a )a
Order by a.OrderNo,a.OrderLine;
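One possible simplification, offered only as a sketch: integer-divide the quantity accumulated before each line by GroupNo to get the subgroup index directly. It assumes every subgroup's quantities add up to exactly GroupNo, as in the sample data. The RunQty column and the inlined subset of order 10495 are illustrative choices, not part of the original post.

```sql
-- Subset of order 10495 from the test data, inlined so the sketch runs on its own (SQL Server 2012+).
WITH t AS (
    SELECT * FROM (VALUES
        ('10495', 1, 4, 1),
        ('10495', 2, 4, 2),
        ('10495', 3, 4, 1),
        ('10495', 4, 2, 1),
        ('10495', 5, 2, 1),
        ('10495', 6, 3, 1),
        ('10495', 7, 3, 2),
        ('10495', 8, 2, 1),
        ('10495', 9, 2, 1)
    ) AS v(OrderNo, OrderLine, GroupNo, ItemQty)
),
running AS (
    -- running quantity per order and group number, in OrderLine order
    SELECT *,
           SUM(ItemQty) OVER (PARTITION BY OrderNo, GroupNo ORDER BY OrderLine
                              ROWS UNBOUNDED PRECEDING) AS RunQty
    FROM t
)
SELECT OrderNo, OrderLine, GroupNo, ItemQty,
       -- (RunQty - ItemQty) is the quantity consumed before this line;
       -- integer division by GroupNo yields a zero-based subgroup index
       GroupNo * 100 + 1 + (RunQty - ItemQty) / GroupNo AS Subgroup
FROM running
ORDER BY OrderNo, OrderLine;   -- 401, 401, 401, 201, 201, 301, 301, 202, 202
```

Unlike the modulo test, this does not depend on the first line of a subgroup having a quantity of 1.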

TSQL - Difficult Grouping

Please see fiddle: http://sqlfiddle.com/#!6/e6768/2
I have data, like below:
DRIVER DROP
1 1
1 2
1 ReturnToBase
1 4
1 5
1 ReturnToBase
1 6
1 7
2 1
2 2
2 ReturnToBase
2 4
I am trying to group my data so that, for each driver, each run of drops ending in a ReturnToBase gets its own group number.
My output should look like this:
DRIVER DROP GROUP
1 1 1
1 2 1
1 ReturnToBase 1
1 4 2
1 5 2
1 ReturnToBase 2
1 6 3
1 7 3
1 ReturnToBase 3
2 1 1
2 2 1
2 ReturnToBase 1
2 4 2
I've tried getting this result with a combination of windowed functions, but I've been miles off so far.
Below is what I have so far. It isn't supposed to be functional; I was just trying to figure out how it could be done, if it's even possible.
SELECT
ROW_NUMBER() OVER (Partition BY Driver order by Driver Desc) rownum,
Count(1) OVER (Partition By Driver Order By Driver Desc) counter,
Count
DropNo,
Driver,
CASE DropNo
WHEN 'ReturnToBase' THEN 1 ELSE 0 END AS EnumerateRound
FROM
Rounds
You can use the following query:
SELECT id, DRIVER, DROPno,
1 + SUM(flag) OVER (PARTITION BY DRIVER ORDER BY id) -
CASE
WHEN DROPno = 'ReturnToBase' THEN 1
ELSE 0
END AS grp
FROM (
SELECT id, DRIVER, DROPno,
CASE
WHEN DROPno = 'ReturnToBase' THEN 1
ELSE 0
END AS flag
FROM rounds ) AS t
This query uses windowed version of SUM with ORDER BY in the OVER clause to calculate a running total. This version of SUM is available from SQL Server 2012 onwards AFAIK.
Fiddling a bit with this running total value is all we need in order to get the correct GROUP value.
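To make that adjustment visible, here is a small sketch that prints the flag, the running total and the resulting group side by side. The inline rows are a shortened, made-up version of the question's data, and the running_total column exists only for illustration.

```sql
-- Shortened sample with a sequential id, inlined so the sketch runs on its own (SQL Server 2012+).
WITH rounds AS (
    SELECT * FROM (VALUES
        (1, 1, '1'),
        (2, 1, '2'),
        (3, 1, 'ReturnToBase'),
        (4, 1, '4'),
        (5, 1, 'ReturnToBase'),
        (6, 2, '1'),
        (7, 2, 'ReturnToBase'),
        (8, 2, '4')
    ) AS v(id, DRIVER, DROPno)
),
flagged AS (
    SELECT id, DRIVER, DROPno,
           CASE WHEN DROPno = 'ReturnToBase' THEN 1 ELSE 0 END AS flag
    FROM rounds
)
SELECT id, DRIVER, DROPno, flag,
       -- number of ReturnToBase rows seen so far for this driver, including the current row
       SUM(flag) OVER (PARTITION BY DRIVER ORDER BY id) AS running_total,
       -- subtracting the row's own flag keeps a ReturnToBase row in the group it closes
       1 + SUM(flag) OVER (PARTITION BY DRIVER ORDER BY id) - flag AS grp
FROM flagged
ORDER BY id;   -- grp comes out as 1, 1, 1, 2, 2, 1, 1, 2
```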
EDIT: (credit goes to @Conrad Frix)
Using CROSS APPLY instead of an in-line view can considerably simplify things:
SELECT id, DRIVER, DROPno,
1 + SUM(x.flag) OVER (PARTITION BY DRIVER ORDER BY id) - x.flag
FROM rounds
CROSS APPLY (SELECT CASE WHEN DROPno = 'ReturnToBase' THEN 1 ELSE 0 END) AS x(flag)
Added a sequential ID column to your example for use in a recursive CTE:
with cte as (
select ID,DRIVER,DROPno,1 as GRP
FROM rounds
where ID = 1
union all
select a.ID
,a.DRIVER
,a.DROPno
,case when b.DRIVER <> a.DRIVER then 1 -- restart numbering for a new driver
when b.DROPno = 'ReturnToBase' then b.GRP + 1
else b.GRP end
from rounds a
inner join cte b
on a.ID = b.ID + 1
)
select * from cte

Get Sum of Count

The View obtains the first three columns. I need to add one more column (totalCount) to the view that obtains the total count:
CId CCId CCount totalCount
1 a 3 6
1 a 3 6
1 b 3 6
1 c 3 6
2 b 2 6
2 b 2 6
2 a 2 6
2 a 2 6
3 v 1 6
How to get the totalCount as 6?
(Business rule: for CId = 1, CCount = 3; for CId = 2, CCount = 2; for CId = 3, CCount = 1. So totalCount = 3 + 2 + 1 = 6.)
SELECT a.CID, a.CCID, a.CCOUNT,
b.TotalCount
FROM Table1 a, (SELECT SUM(DISTINCT cCOunt) TotalCount
FROM Table1) b
UPDATE
As Andomar pointed out in the comments, the query has been updated:
SELECT a.CID, a.CCID, a.CCOUNT,
b.TotalCount
FROM Table1 a,
(
SELECT SUM(TotalCount) TotalCount
FROM
(
SELECT MAX(cCOunt) TotalCount
FROM Table1
GROUP BY CId
) c
) b
With this code I came to the desired result:
select CId
,CCId
,CCount
,(select SUM(a.tcount)
from (select distinct CId ,CCount as tcount
from dbo.Test) as a ) totalcount
from dbo.Test
From your example data, I'm assuming a CId can only have one, possibly repeated, value of CCount. In that case you can pick an arbitrary one (say, the max) using a GROUP BY, and sum those:
select sum(OneCCCount) as TotalCount
from (
select max(CCount) as OneCCCount
from YourTable
group by
CId
) as SubQueryAlias
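If the total has to appear as a column on every row, one option is a windowed SUM that counts only one row per CId. This is a sketch under the same assumption (a single CCount value per CId). The numbered CTE and the rn column are names made up here, and the rows are inlined from the question.

```sql
-- Sample rows from the question, inlined so the sketch runs on its own.
WITH Table1 AS (
    SELECT * FROM (VALUES
        (1, 'a', 3), (1, 'a', 3), (1, 'b', 3), (1, 'c', 3),
        (2, 'b', 2), (2, 'b', 2), (2, 'a', 2), (2, 'a', 2),
        (3, 'v', 1)
    ) AS v(CId, CCId, CCount)
),
numbered AS (
    -- rn = 1 marks exactly one row per CId
    SELECT CId, CCId, CCount,
           ROW_NUMBER() OVER (PARTITION BY CId ORDER BY CCId) AS rn
    FROM Table1
)
SELECT CId, CCId, CCount,
       -- the empty OVER () sums across the whole result set: 3 + 2 + 1 = 6 on every row
       SUM(CASE WHEN rn = 1 THEN CCount ELSE 0 END) OVER () AS totalCount
FROM numbered
ORDER BY CId, CCId;
```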
