I'm really confused on how to segment these groups into subgroups. This is an example of 2 orders (out of ~5M)
An order may have 1 or more "grouped items".
The group number = SUM(ItemQuantity).
Groups are ordered by OrderLine
eg. In the below table we see one group of "3" & two groups of "2"
OrderNo OrderLine GroupNo ItemQty
10496 1 3 1 =3
10496 2 3 1 =3
10496 3 3 1 =3
10496 4 2 1 =2(1)
10496 5 2 1 =2(1)
10496 6 2 1 =2(2)
10496 7 2 1 =2(2)
Rank() & Dense_Rank dont solve the issue as there are multiples of the same group, OrderLines are different.
I'll be joining this to another table in the end but what I'd like is a way to differentiate the same groups. Perhaps by adding a "subgroup" field.
OrderNo OrderLine GroupNo ItemQty Subgroup
10496 1 3 1 300
10496 2 3 1 300
10496 3 3 1 300
10496 4 2 1 201
10496 5 2 1 201
10496 6 2 1 202
10496 7 2 1 202
Testing below
CREATE TABLE #temptable(
OrderNo varchar(5),
OrderLine int,
GroupNo int,
ItemQty int);
INSERT INTO #temptable (OrderNo,OrderLine,GroupNo,ItemQty)
VALUES
('10496','1','3','1'),
('10496','2','3','1'),
('10496','3','3','1'),
('10495','1','4','1'),
('10495','2','4','2'),
('10495','3','4','1'),
('10495','4','2','1'),
('10495','5','2','1'),
('10495','6','3','1'),
('10495','7','3','2'),
('10495','8','2','1'),
('10495','9','2','1'),
('10495','10','2','1'),
('10495','11','2','1'),
('10495','12','2','1'),
('10495','13','2','1');
A DO WHILE
SUM(ItemQty)Over(Partition by OrderNo,GroupNo Order by OrderLine) >= GroupNo
may work but it'll need to run for every group in every order.
I then started using XML path to query each line but it's really not going to be efficient.
SELECT distinct t1.OrderNo,t1.GroupNo,
STUFF(( SELECT ',' + QUOTENAME(t2.OrderLine)
FROM #temptable t2
WHERE
t2.OrderNo = t1.OrderNo AND t2.GroupNo = t1.GroupNo
Order by t2.OrderLine Asc
FOR XML PATH(''),TYPE
).value('.', 'NVARCHAR(MAX)') ,1,1,'' )
AS [Rows]
FROM #temptable t1
Order by t1.OrderNo,t1.GroupNo
Taking #Nick.McDermaid s advice about the mod % here's a solution, admittedly it could be improved but for now it'll work out.
With a as (
select OrderNo,OrderLine,GroupNo,ItemQty
,CASE
WHEN SUM(ItemQty)Over
(Partition by OrderNo,GroupNo Order by OrderNo,OrderLine) % GroupNo=1
THEN GroupNo*100
ELSE NULL END as SG
from #temptable )
Select a.OrderNo,a.OrderLine,a.ItemQty,a.GroupNo
,MAX(a.SG2)Over(Partition by a.OrderNo,a.GroupNo Order by a.OrderNo,a.OrderLine ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as Subgroup
from
(Select OrderNo,OrderLine,GroupNo,ItemQty
,CASE WHEN SG IS NULL THEN NULL ELSE SG+RANK()Over(Partition by OrderNo,SG Order by OrderNo,OrderLine) END as SG2
from a )a
Order by a.OrderNo,a.OrderLine;
Related
I have below sample input table. In real it has lots of records.
Input:
ID
Classification
123
1
123
2
123
3
123
4
657
1
657
3
657
4
For a 'ID', I want it's records should have 'Classification' column contains all the values 1, 2, 3 and 4. If any of these values are not present then that ID's records should be considered as an exception. The output should be as below.
ID
Classification
Flag
123
1
0
123
2
0
123
3
0
123
4
0
657
1
1
657
3
1
657
4
1
Can someone please help me with how can this can be achieved in sql server.
Thanks.
There are a couple of options here, which is more performant is up to you to test, not me (especially when I don't know what indexes you have). One uses conditional aggregation, to check that all the values are there, and the other uses a subquery and counts the DISTINCT values (as I don't know if there could be duplicate classifications):
SELECT *
INTO dbo.YourTable
FROM (VALUES(123,1),
(123,2),
(123,3),
(123,4),
(657,1),
(657,3),
(657,4))V(ID,Classification);
GO
CREATE CLUSTERED INDEX CI_YourIndex ON dbo.YourTable (ID,Classification);
GO
SELECT ID,
Classification,
CASE WHEN COUNT(CASE YT.Classification WHEN 1 THEN 1 END) OVER (PARTITION BY ID) > 0
AND COUNT(CASE YT.Classification WHEN 2 THEN 1 END) OVER (PARTITION BY ID) > 0
AND COUNT(CASE YT.Classification WHEN 3 THEN 1 END) OVER (PARTITION BY ID) > 0
AND COUNT(CASE YT.Classification WHEN 4 THEN 1 END) OVER (PARTITION BY ID) > 0 THEN 1 ELSE 0
END AS Flag
FROM dbo.YourTable YT;
GO
SELECT ID,
Classification,
CASE (SELECT COUNT(DISTINCT sq.Classification)
FROM dbo.YourTable sq
WHERE sq.ID = YT.ID
AND sq.Classification IN (1,2,3,4)) WHEN 4 THEN 1 ELSE 0
END AS Flag
FROM dbo.YourTable YT;
GO
DROP TABLE dbo.YourTable;
I have a data set produced from a UNION query that aggregates data from 2 sources.
I want to select that data based on whether or not data was found in only of those sources,or both.
The data relevant parts of the set looks like this, there are a number of other columns:
row
preference
group
position
1
1
111
1
2
1
111
2
3
1
111
3
4
1
135
1
5
1
135
2
6
1
135
3
7
2
111
1
8
2
135
1
The [preference] column combined with the [group] column is what I'm trying to filter on, I want to return all the rows that have the same [preference] as the MIN([preference]) for each [group]
The desired output given the data above would be rows 1 -> 6
The [preference] column indicates the original source of the data in the UNION query so a legitimate data set could look like:
row
preference
group
position
1
1
111
1
2
1
111
2
3
1
111
3
4
2
111
1
5
2
135
1
In which case the desired output would be rows 1,2,3, & 5
What I can't work out is how to do (not real code):
SELECT * WHERE [preference] = MIN([preference]) PARTITION BY [group]
One way to do this is using RANK:
SELECT row
, preference
, [group]
, position
FROM (
SELECT row
, preference
, [group]
, position
, RANK() OVER (PARTITION BY [group] ORDER BY preference) AS seq
FROM t) t2
WHERE seq = 1
Demo here
Should by doable via simple inner join:
SELECT t1.*
FROM t AS t1
INNER JOIN (SELECT [group], MIN(preference) AS preference
FROM t
GROUP BY [group]
) t2 ON t1.[group] = t2.[group]
AND t1.preference = t2.preference
We have a table with a parent child relationship, that represents a deep tree structure.
We are using a view with a CTE to query the data but the performance is poor (see code and execution plan below).
Is there any way we can improve the performance?
WITH cte (ParentJobTypeId, Id) AS
(
SELECT
Id, Id
FROM
dbo.JobTypes
UNION ALL
SELECT
e.Id, cte.Id
FROM
cte
INNER JOIN
dbo.JobTypes AS e ON e.ParentJobTypeId = cte.ParentJobTypeId
)
SELECT
ISNULL(Id, 0) AS ParentJobTypeId,
ISNULL(ParentJobTypeId, 0) AS Id
FROM
cte
A quick example of using the range keys. As I mentioned before, hierarchies were 127K points and some sections where 15 levels deep
The cte Builds, let's assume the hier results will be will be stored in a table (indexed as well)
Declare #Table table(ID int,ParentID int,[Status] varchar(50))
Insert #Table values
(1,101,'Pending'),
(2,101,'Complete'),
(3,101,'Complete'),
(4,102,'Complete'),
(101,null,null),
(102,null,null)
;With cteOH (ID,ParentID,Lvl,Seq)
as (
Select ID,ParentID,Lvl=1,cast(Format(ID,'000000') + '/' as varchar(500)) from #Table where ParentID is null
Union All
Select h.ID,h.ParentID,cteOH.Lvl+1,Seq=cast(cteOH.Seq + Format(h.ID,'000000') + '/' as varchar(500)) From #Table h INNER JOIN cteOH ON h.ParentID = cteOH.ID
),
cteR1 as (Select ID,Seq,R1=Row_Number() over (Order by Seq) From cteOH),
cteR2 as (Select A.ID,R2 = max(B.R1) From cteOH A Join cteR1 B on (B.Seq Like A.Seq+'%') Group By A.ID)
Select B.R1
,C.R2
,A.Lvl
,A.ID
,A.ParentID
Into #TempHier
From cteOH A
Join cteR1 B on (A.ID=B.ID)
Join cteR2 C on (A.ID=C.ID)
Select * from #TempHier
Select H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
,Total = count(*)
,Complete = sum(case when D.Status = 'Complete' then 1 else 0 end)
,Pending = sum(case when D.Status = 'Pending' then 1 else 0 end)
,PctCmpl = format(sum(case when D.Status = 'Complete' then 1.0 else 0.0 end)/count(*),'##0.00%')
From #TempHier H
Join (Select _R1=B.R1,A.* From #Table A Join #TempHier B on A.ID=B.ID) D on D._R1 between H.R1 and H.R2
Group By H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
Order By 1
Returns the hier in a #Temp table for now. Notice the R1 and R2, I call these the range keys. Data (without recursion) can be selected and aggregated via these keys
R1 R2 Lvl ID ParentID
1 4 1 101 NULL
2 2 2 1 101
3 3 2 2 101
4 4 2 3 101
5 6 1 102 NULL
6 6 2 4 102
VERY SIMPLE EXAMPLE: Illustrates the rolling the data up the hier.
R1 R2 Lvl ID ParentID Total Complete Pending PctCmpl
1 4 1 101 NULL 4 2 1 50.00%
2 2 2 1 101 1 0 1 0.00%
3 3 2 2 101 1 1 0 100.00%
4 4 2 3 101 1 1 0 100.00%
5 6 1 102 NULL 2 1 0 50.00%
6 6 2 4 102 1 1 0 100.00%
The real beauty of the the range keys, is if you know an ID, you know where it exists (all descendants and ancestors).
The View obtains the first three columns. I need to add one more column (totalCount) to the view that obtains the total count:
CId CCId CCount totalCount
1 a 3 6
1 a 3 6
1 b 3 6
1 c 3 6
2 b 2 6
2 b 2 6
2 a 2 6
2 a 2 6
3 v 1 6
How to get the totalCount as 6?
(Business rule for Cid=1 Ccount=3 Cid=2 Ccount=2 Cid=3 Ccount=1 So the totalCount =3+2+1 =6)
SELECT a.CID, a.CCID, a.CCOUNT,
b.TotalCount
FROM Table1 a, (SELECT SUM(DISTINCT cCOunt) TotalCount
FROM Table1) b
SQLFiddle Demo
UPDATE
As Andomar pointed out on the comment, An update has been made on the query,
SELECT a.CID, a.CCID, a.CCOUNT,
b.TotalCount
FROM Table1 a,
(
SELECT SUM(TotalCount) TotalCount
FROM
(
SELECT MAX(cCOunt) TotalCount
FROM Table1
GROUP BY CId
) c
) b
SQLFiddle Demo
With this code I came to the desired result:
select CId
,CCId
,CCount
,(select SUM(a.tcount)
from (select distinct CId ,CCount as tcount
from dbo.Test) as a ) totalcount
from dbo.Test
From your example data, I'm assuming a Cid can only have one, possibly repeated, value of CCount. In that case you can pick a random one (say max) using a group by, and sum those:
select sum(OneCCCount) as TotalCount
from (
select max(CCount) as OneCCCount
from YourTable
group by
CId
) as SubQueryAlias
I would like to filter duplicate rows on conditions so that the rows with minimum modified and maximum active and unique rid and did are picked. self join? or any better approach that would be performance wise better?
Example:
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:40:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Output expected is
1 1 2010-09-07 11:37:44.850 1 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Commenting on the first answer, the suggestion does not work for the below dataset(when active=0 and modified is the minimum for that row)
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:36:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.
;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id
select id, rid, min(modified), max(active), did from foo group by rid, did order by id;
You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:
SELECT
X.*
FROM
ParentTable P
CROSS APPLY (
SELECT TOP 1 *
FROM YourTable T
WHERE P.rid = T.rid AND P.did = T.did
ORDER BY active DESC, modified
) X
Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.
Also, here is my crazy, single scan magic query which can often outperform other methods:
SELECT
id = Substring(Packed, 6, 4),
rid,
modified = Convert(datetime, Substring(Packed, 2, 4)),
Active = Convert(bit, 1 - Substring(Packed, 1, 1)),
did,
FROM
(
SELECT
rid,
did,
Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(4), modified) + Convert(binary(4), id)
FROM
YourTable
GROUP BY
rid,
did
) X
This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.