How to select only first ROW_NUMBER combined with SUM - sql-server

I like to group my table by [ID] while using SUM and also bring back
[Product_Name] of the top ROW_NUMBER - not sure if I should use ROW_NUMBER, GROUPING SETS or loop through everything with FETCH... this is what I tried:
DECLARE #SampleTable TABLE
(
[ID] INT,
[Price] MONEY,
[Product_Name] VARCHAR(50)
)
INSERT INTO #SampleTable
VALUES (1, 100, 'Product_1'), (1, 200, 'Product_2'),
(1, 300, 'Product_3'), (2, 500, 'Product_4'),
(2, 200, 'Product_5'), (2, 300, 'Product_6');
SELECT
[ID],
[Product_Name],
[Price],
SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total],
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM
#SampleTable T1
My desired results - only two records:
1 Product_1 100.00 600.00 1
2 Product_4 500.00 1000.00 1
Any help or guidance is highly appreciated.
UPDATE:
I end up using what Prateek Sharma suggested in his comment, to simply wrap the query with another SELECT WHERE [Row_Number] = 1
SELECT * FROM
(
SELECT
[ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
,ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM #SampleTable
) MultipleRows
WHERE [Row_Number] = 1

You should have a column on which you will perform ORDER BY for ROW_NUMBER(). In this case if you want to only rely on the table self index then it's OK to use ID column for ORDER BY.
Hence your query is correct and you can go with it.
Other option is to use WITH TIES clause. BUT again, If you will use WITH TIES clause with the ORDER BY on ID column then performance will be very poor. WITH TIES only performs well if you have well defined index. And, then can use that indexed column with WITH TIES clause.
SELECT TOP 1 WITH TIES *
FROM (
SELECT [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM #SampleTable
) TAB
ORDER BY ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY <IndexedColumn> DESC)
This query may help you bit. But remember, it is also not going to provide better performance than the query written by you. It is only reducing the line of code.

One option is using the WITH TIES clause. No extra field RN.
Hopefully, you have a proper sequence number or date which can be used in either the sum() over or in the final row_number() over
Example
SELECT Top 1 with ties *
From (
Select [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM #SampleTable T1
) A
Order By ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [Price_Total] Desc)
Returns
ID Product_Name Price Price_Total
1 Product_1 100.00 600.00
2 Product_4 500.00 1000.00

There is no "top ROW_NUMBER" unless you have a column that defines ordering.
If you just want an arbitary row per id you can use the below. To deterministically pick one you would need to order by deterministic unique criteria.
DECLARE #SampleTable TABLE
(
ID INT,
Price MONEY,
Product_Name VARCHAR(50),
INDEX cix CLUSTERED (ID)
);
INSERT INTO #SampleTable
VALUES (1,100,'Product_1'),
(1,200,'Product_2'),
(1,300,'Product_3'),
(2,500,'Product_4'),
(2,200,'Product_5'),
(2,300,'Product_6');
WITH T AS
(
SELECT *,
OrderingColumn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM #SampleTable
)
SELECT ID,
SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Product_Name)), 11, 50) AS Product_Name,
CAST(SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Price)), 11, 50) AS MONEY) AS Price,
SUM(Price) AS Price_Total
FROM T
GROUP BY ID
The plan for this is pretty efficient as it is able to use the index ordered by id and has no additional sorts, spools, or passes through the table.

Related

SQL Server Window Paging Based on # of Groups

Given the following table structure
Column
Id
Name
DateCreated
with the following data
id
Name
DateCreated
1
Joe
1/13/2021
2
Fred
1/13/2021
3
Bob
1/12/2021
4
Sue
1/12/2021
5
Sally
1/10/2021
6
Alex
1/9/2021
I need SQL that will page over the data based on datecreated. The query should return the top 3 records, and any record which also shares the datecreated of the top 3.
So give the data above, we should get back Joe, Fred and Bob (as the top 3 records) plus Sue since sue has the same date as Bob.
Is there something like ROW_NUMBER that increments for each row where it encounters a different value.
For some context this query is being used to generate an agenda type view, and once we select any date we want to keep all data for that date together.
EDIT
I do have a solution but it smells:
;WITH CTE AS ( SELECT ROW_NUMBER() OVER(ORDER BY DateCreated DESC) RowNum,CAST(DateCreated AS DATE) DateCreated,Name
FROM MyTable),
PAGE AS (SELECT *
FROM CTE
WHERE RowNum<=5)
SELECT *
FROM Page
UNION
SELECT *
FROM CTE
WHERE DateCreated=(SELECT MIN(DateCreated) FROM Page)
I've used a TOP 3 WITH TIES example and a ROW_NUMBER example and a CTE to return four records:
DROP TABLE IF EXISTS #tmp
GO
CREATE TABLE #tmp (
Id INT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
dateCreated DATE
)
GO
INSERT INTO #tmp VALUES
( 1, 'Joe', '13 Jan 2021' ),
( 2, 'Fred', '13 Jan 2021' ),
( 3, 'Bob', '12 Jan 2021' ),
( 4, 'Sue', '12 Jan 2021' ),
( 5, 'Sally', '10 Jan 2021' ),
( 6, 'Alex', '9 Jan 2021' )
GO
-- Gets same result
SELECT TOP 3 WITH TIES *
FROM #tmp t
ORDER BY dateCreated DESC
;WITH cte AS (
SELECT ROW_NUMBER() OVER( ORDER BY dateCreated DESC ) rn, *
FROM #tmp
)
SELECT *
FROM #tmp t
WHERE EXISTS
(
SELECT *
FROM cte c
WHERE rn <=3
AND t.dateCreated = c.dateCreated
)
My results:
As #Charlieface, we only need to replace ROW_NUMBER with DENSE_RANK. So that the ROW_NUMBER will be tied according to the same value.
When we run the query:
SELECT DENSE_RANK () OVER(ORDER BY DateCreated DESC) RowNum,CAST(DateCreated AS DATE) DateCreated,Name
FROM MyTable
The result will show as follows:
So as a result, we can set RowNum<=3 in the query to get the top 3:
;WITH CTE AS ( SELECT DENSE_RANK() OVER(ORDER BY DateCreated DESC) RowNum,CAST(DateCreated AS DATE) DateCreated,Name
FROM MyTable),
PAGE AS (SELECT *
FROM CTE
WHERE RowNum<=3)
SELECT *
FROM Page
UNION
SELECT *
FROM CTE
WHERE DateCreated=(SELECT MIN(DateCreated) FROM Page)
The First one is as yours the second one is as above. The results of the two queries are the same.
Kindly let us know if you need more infomation.

SQL Select Only the Most Common Results

I have a table with IDs and Items where sometimes the associated Item has a variation from the other Items associated with the same ID. I need a query that selects the most common Item and assigns it to that ID.
The below query works, but I'm hoping to optimize it to avoid having to join two separate CTEs at the end, and rather have one slick SELECT statement:
IF OBJECT_ID('tempdb..#Test') IS NOT NULL
DROP TABLE #Test
CREATE TABLE #Test
(
[ID] INT
,[Item] VARCHAR(20)
)
INSERT #Test
VALUES
(100, 'Apple'),
(100, 'Apple'),
(100, 'Apples'),
(200, 'Orange'),
(200, 'Orange'),
(200, 'Orange'),
(200, 'Oranges'),
(300, 'Grape');
WITH cteOne AS (SELECT
[ID]
,[Item]
,COUNT(*) [Count]
FROM #Test
GROUP BY [ID]
,[Item]
),
cteTwo AS (SELECT
[ID]
,MAX([Count]) [Max]
FROM cteOne
GROUP BY [ID])
SELECT
C1.[ID]
,C1.[Item]
FROM cteOne C1
INNER JOIN cteTwo C2 ON C2.[ID] = C1.[ID]
AND C2.[Max] = C1.[Count]
ORDER BY [ID]
Any help is appreciated!
You can try top 1 with ties with row_number
select
top 1 with ties [ID], [Item]
from (
SELECT
[ID], [Item], COUNT(*) [Count]
FROM #Test
GROUP BY [ID], [Item]
) t
order by row_number() over (partition by [ID] order by [Count] desc)
This is even better:
;WITH
cteOne AS (
SELECT [ID],[Item] ,COUNT(*) [Count]
FROM #Test
GROUP BY [ID],[Item]
),
cteTwoo as (
select *, ROW_NUMBER() over (partition by id order by count) idx
from cteOne
)
select ID, Item
from cteTwoo
where idx = 1

How to use Row_Number to group a resultset

i'm stuck with a query and i don't want to use a while loop or another nasty method to do this.
Here's the situation:
I've got a query that gets some data, and i need to calculate a column based on 2 other columns.
My results are as follow:
Type | Customer | Cycle | Amount | Expiration | Row_Number (Partition By Customer, Cycle)
So, my row_number column needs to "group" customers and cycles, here's a Fiddle to better understand it
Here's an example:
As you can see, iteration column is correctly applied as far as i know what row_number does.
But i need to do this:
Is there a way to do this with Row_Number ?
or should i need store the data in a temp table, loop through it and update this ITERATION column?
Maybe a CTE?
Any help on this will be highly appreciated. Thanks!
just run this as new query, replace what you need in your query...
WITH T(StyleID, ID)
AS (SELECT 1,1 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,2)
SELECT *,
RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS 'RANK',
ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) AS 'ROW_NUMBER',
DENSE_RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS 'DENSE_RANK'
FROM T
regards,
Valentin
You could use DENSE_RANK function instead of ROW_NUMBER.
DECLARE #MyTable TABLE
(
Customer NVARCHAR(100) NOT NULL,
[Cycle] SMALLINT NOT NULL
);
INSERT #MyTable VALUES ('C1', 2010);
INSERT #MyTable VALUES ('C1', 2010);
INSERT #MyTable VALUES ('C1', 2011);
INSERT #MyTable VALUES ('C1', 2012);
INSERT #MyTable VALUES ('C1', 2012);
INSERT #MyTable VALUES ('C1', 2012);
INSERT #MyTable VALUES ('C2', 2010);
INSERT #MyTable VALUES ('C2', 2010);
SELECT t.Customer, t.[Cycle],
DENSE_RANK() OVER(PARTITION BY t.Customer ORDER BY t.[Cycle]) AS Rnk
FROM #MyTable t
ORDER BY Customer, [Cycle];
Results:
Customer Cycle Rnk
-------- ------ ---
C1 2010 1
C1 2010 1
C1 2011 2
C1 2012 3
C1 2012 3
C1 2012 3
C2 2010 1
C2 2010 1
SQL Fiddle

How to use RANK() in SQL Server

I have a problem using RANK() in SQL Server.
Here’s my code:
SELECT contendernum,
totals,
RANK() OVER (PARTITION BY ContenderNum ORDER BY totals ASC) AS xRank
FROM (
SELECT ContenderNum,
SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a
The results for that query are:
contendernum totals xRank
1 196 1
2 181 1
3 192 1
4 181 1
5 179 1
What my desired result is:
contendernum totals xRank
1 196 1
2 181 3
3 192 2
4 181 3
5 179 4
I want to rank the result based on totals. If there are same value like 181, then two numbers will have the same xRank.
Change:
RANK() OVER (PARTITION BY ContenderNum ORDER BY totals ASC) AS xRank
to:
RANK() OVER (ORDER BY totals DESC) AS xRank
Have a look at this example:
SQL Fiddle DEMO
You might also want to have a look at the difference between RANK (Transact-SQL) and DENSE_RANK (Transact-SQL):
RANK (Transact-SQL)
If two or more rows tie for a rank, each tied rows receives the same
rank. For example, if the two top salespeople have the same SalesYTD
value, they are both ranked one. The salesperson with the next highest
SalesYTD is ranked number three, because there are two rows that are
ranked higher. Therefore, the RANK function does not always return
consecutive integers.
DENSE_RANK (Transact-SQL)
Returns the rank of rows within the partition of a result set, without
any gaps in the ranking. The rank of a row is one plus the number of
distinct ranks that come before the row in question.
To answer your question title, "How to use Rank() in SQL Server," this is how it works:
I will use this set of data as an example:
create table #tmp
(
column1 varchar(3),
column2 varchar(5),
column3 datetime,
column4 int
)
insert into #tmp values ('AAA', 'SKA', '2013-02-01 00:00:00', 10)
insert into #tmp values ('AAA', 'SKA', '2013-01-31 00:00:00', 15)
insert into #tmp values ('AAA', 'SKB', '2013-01-31 00:00:00', 20)
insert into #tmp values ('AAA', 'SKB', '2013-01-15 00:00:00', 5)
insert into #tmp values ('AAA', 'SKC', '2013-02-01 00:00:00', 25)
You have a partition which basically specifies grouping.
In this example, if you partition by column2, the rank function will create ranks for groups of column2 values. There will be different ranks for rows where column2 = 'SKA' than rows where column2 = 'SKB' and so on.
The ranks are decided like this:
The rank for every record is one plus the number of ranks that come before it in its partition. The rank will only increment when one of the fields you selected (other than the partitioned field(s)) is different than the ones that come before it. If all of the selected fields are the same, then the ranks will tie and both will be assigned the value, one.
Knowing this, if we only wanted to select one value from each group in column two, we could use this query:
with cte as
(
select *,
rank() over (partition by column2
order by column3) rnk
from t
) select * from cte where rnk = 1 order by column3;
Result:
COLUMN1 | COLUMN2 | COLUMN3 |COLUMN4 | RNK
------------------------------------------------------------------------------
AAA | SKB | January, 15 2013 00:00:00+0000 |5 | 1
AAA | SKA | January, 31 2013 00:00:00+0000 |15 | 1
AAA | SKC | February, 01 2013 00:00:00+0000 |25 | 1
SQL DEMO
You have to use DENSE_RANK rather than RANK. The only difference is that it doesn't leave gaps. You also shouldn't partition by contender_num, otherwise you're ranking each contender in a separate group, so each is 1st-ranked in their segregated groups!
SELECT contendernum,totals, DENSE_RANK() OVER (ORDER BY totals desc) AS xRank FROM
(
SELECT ContenderNum ,SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM dbo.Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a
order by contendernum
A hint for using StackOverflow, please post DDL and sample data so people can help you using less of their own time!
create table Cat1GroupImpersonation (
contendernum int,
criteria1 int,
criteria2 int,
criteria3 int,
criteria4 int);
insert Cat1GroupImpersonation select
1,196,0,0,0 union all select
2,181,0,0,0 union all select
3,192,0,0,0 union all select
4,181,0,0,0 union all select
5,179,0,0,0;
DENSE_RANK() is a rank with no gaps, i.e. it is “dense”.
select Name,EmailId,salary,DENSE_RANK() over(order by salary asc) from [dbo].[Employees]
RANK()-It contain gap between the rank.
select Name,EmailId,salary,RANK() over(order by salary asc) from [dbo].[Employees]
You have already grouped by ContenderNum, no need to partition again by it.
Use Dense_rank()and order by totals desc.
In short,
SELECT contendernum,totals, **DENSE_RANK()**
OVER (ORDER BY totals **DESC**)
AS xRank
FROM
(
SELECT ContenderNum ,SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM dbo.Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a
SELECT contendernum,totals, RANK() OVER (ORDER BY totals ASC) AS xRank FROM
(
SELECT ContenderNum ,SUM(Criteria1+Criteria2+Criteria3+Criteria4) AS totals
FROM dbo.Cat1GroupImpersonation
GROUP BY ContenderNum
) AS a
RANK() is good, but it assigns the same rank for equal or similar values. And if you need unique rank, then ROW_NUMBER() solves this problem
ROW_NUMBER() OVER (ORDER BY totals DESC) AS xRank
Select T.Tamil, T.English, T.Maths, T.Total, Dense_Rank()Over(Order by T.Total Desc) as Std_Rank From (select Tamil,English,Maths,(Tamil+English+Maths) as Total From Student) as T
enter image description here

Order by and top on grouped rows

How do I order by date desc my grouped items and get top 20?
For example: The table Orderproduct have OrderID, ProductId, Date, Price, I want to group by ProductId and sort each grouped by Date desc then get top 20 and avg(price).
On LINQ, this is how (but the sql generated is very dirty and the performance is very bad).
OrderProduct
.GroupBy(g => g.ProductId)
.Select(s => new{ s.Key, Values = s.OrderByDescending(o => o.Date).Take(20) })
.Select(s => new{ Avg = s.Values.Average(a => a.Price) } )
If I understand your question this might work for you.
select ProductId,
avg(price) as AvgPrice
from ( select ProductId,
Price,
row_number() over(partition by ProductId order by [Date] desc) as rn
from Orderproduct
) as O
where rn <= 20
group by ProductId
Based on your comments, this may work:
SELECT [Date],
ProductID,
MIN(Price) as Min,
MAX(Price) as MAX,
AVG(Price) as Avg
FROM OrderProduct o1
WHERE [Date] IN (SELECT TOP 20 [Date]
FROM OrderProduct o2
WHERE o1.productid = o2.productid)
GROUP BY [Date], ProductID
ORDER BY [Date] DESC
Does this get you want you want for ALL rows? I understand you ONLY want the top 20 for each product. I just don't want to spend the time on top 20 if this is not correct. And does top 20 mean 20 highest prices, or 20 most recent dates, or 20 most recent (can there be more than one per day per productID?)?
SELECT [ProductID], [Date], [Price]
FROM [OrderProduct]
ORDER BY [ProductID] asc, [Date] desc, [Price] desc
COMPUTE AVG(Price) BY [ProductID];

Resources