Order by and top on grouped rows - sql-server

How do I order by date desc my grouped items and get top 20?
For example: The table Orderproduct have OrderID, ProductId, Date, Price, I want to group by ProductId and sort each grouped by Date desc then get top 20 and avg(price).
On LINQ, this is how (but the sql generated is very dirty and the performance is very bad).
OrderProduct
.GroupBy(g => g.ProductId)
.Select(s => new{ s.Key, Values = s.OrderByDescending(o => o.Date).Take(20) })
.Select(s => new{ Avg = s.Values.Average(a => a.Price) } )

If I understand your question this might work for you.
select ProductId,
avg(price) as AvgPrice
from ( select ProductId,
Price,
row_number() over(partition by ProductId order by [Date] desc) as rn
from Orderproduct
) as O
where rn <= 20
group by ProductId

Based on your comments, this may work:
SELECT [Date],
ProductID,
MIN(Price) as Min,
MAX(Price) as MAX,
AVG(Price) as Avg
FROM OrderProduct o1
WHERE [Date] IN (SELECT TOP 20 [Date]
FROM OrderProduct o2
WHERE o1.productid = o2.productid)
GROUP BY [Date], ProductID
ORDER BY [Date] DESC

Does this get you want you want for ALL rows? I understand you ONLY want the top 20 for each product. I just don't want to spend the time on top 20 if this is not correct. And does top 20 mean 20 highest prices, or 20 most recent dates, or 20 most recent (can there be more than one per day per productID?)?
SELECT [ProductID], [Date], [Price]
FROM [OrderProduct]
ORDER BY [ProductID] asc, [Date] desc, [Price] desc
COMPUTE AVG(Price) BY [ProductID];

Related

Average day gap in between a repeat order for each product

Can someone please help me to find the average time between first and second purchase on a product level.
This is what I have written -
Select A.CustomerId,A.ProductId , A.OrderSequence, (Case WHEN OrderSequence = 1 THEN OrderDate END) AS First_Order_Date,
MAX(Case WHEN OrderSequence = 2 THEN OrderDate END) AS Second_Order_Date
From
(
Select t.CustomerId, t.ProductId, t.OrderDate,
Dense_RANK() OVER (PARTITION BY t.CustomerId, t.ProductId ORDER BY OrderDate Asc) as OrderSequence
From Transactions t (NOLOCK)
Where t.SiteKey = 01
Group by t.CustomerId, t.ProductId, t.OrderDate)
A
Where A.OrderSequence IN (1,2)
Group By A.Customer_Id, A.ProductId, A.OrderSequence, A.OrderDate
Sample Data:
It looks like row-numbering and LEAD should do the trick for you here.
Don't use NOLOCK unless you really know what you're doing
It's unclear if you want the results to be partitioned by CustomerId also. If not, you can remove it everywhere in the query
SELECT
A.CustomerId,
A.ProductId,
AVG(DATEDIFF(day, OrderDate, NextOrderDate))
FROM
(
SELECT
t.CustomerId,
t.ProductId,
t.OrderDate,
ROW_NUMBER() OVER (PARTITION BY t.CustomerId, t.ProductId ORDER BY OrderDate) AS rn,
LEAD(OrderDate) OVER (PARTITION BY t.CustomerId, t.ProductId ORDER BY OrderDate) AS NextOrderDate
FROM Transactions t
WHERE t.SiteKey = '01'
) t
WHERE t.rn = 1
GROUP BY
t.Customer_Id,
t.ProductId;

Reporting Previous Records on SQL Server

I’m struggling a bit here. The data is fabricated, but the query concept is very real.
I need to select the Customer, Current Amount, Previous Amount, Sequence and Date
WHERE DATE < 1190105
AND the DATE/SEQ is the maximum date/seq prior to that date point grouping by customer.
I’ve spent quite a few days trying all sorts of things using HAVING, nested select to try and obtain the max-date/amount and min-date/amount by customer and can’t quite get my head around it. I’m sure it should be quite easy, but any help you can offer would be really appreciated.
Thanks
**SEQ DATE CUSTOMER AMOUNT**
1 1181225 Bob 400
2 1181226 Fred 300
3 1190101 Bob 100
4 1190104 Fred 500
5 1190104 George 200
6 1190105 Bob 150
7 1190106 Bob 200
8 1190110 Fred 160
9 1190110 Bob 300
10 1190112 Fred 400
Opt 1 use row number and lag functions
SELECT
ROW_NUMBER() OVER (Partition By CustomerID Order By [Date]) as Sec,
[Date],
Customer,
Amount as CurrentAmount,
Lead(Amount) OVER (Partition By CustomerID, Order By [Date]) as PreviousAmount
FROM
YourTable
WHERE
[DATE] < 1190105
Opt use outer apply
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Sec,
[Date],
Customer,
Amount as CurrentAmount,
Prev.Amount as PreviousAmount
FROM
YourTable T
OUTER APPLY (
SELECT TOP 1 Amount FROM YourTable
WHERE Customer = T.Customer AND [Date] < T.[Date]
ORDER BY [DATE] DESC
) Prev
WHERE
DATE < 1190105
Opt 3 use a correlated subquery
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Sec,
[Date],
Customer,
Amount as CurrentAmount,
(
SELECT TOP 1 Amount FROM YourTable
WHERE Customer = T.Customer AND [Date] < T.[Date]
ORDER BY [DATE] DESC
) as PreviousAmount
FROM YourTable
WHERE
DATE < 1190105
First restrict the rows with the date filter, then search for the max by customer.
Using GROUP BY:
DECLARE #FilterDate INT = 1190105
;WITH MaxDateByCustomer AS
(
SELECT
T.CUSTOMER,
MaxSEQ = MAX(T.SEQ)
FROM
YourTable AS T
WHERE
T.Date < #FilterDate
GROUP BY
T.CUSTOMER
)
SELECT
T.*
FROM
YourTable AS T
INNER JOIN MaxDateByCustomer AS M ON
T.CUSTOMER = M.CUSTOMER AND
T.SEQ = M.MaxSEQ
Using ROW_NUMBER window function:
DECLARE #FilterDate INT = 1190105
;WITH DateRankingByCustomer AS
(
SELECT
T.*,
DateRanking = ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER ORDER BY T.SEQ DESC)
FROM
YourTable AS T
WHERE
T.Date < #FilterDate
)
SELECT
D.*
FROM
DateRankingByCustomer AS D
WHERE
D.DateRanking = 1

How to select only first ROW_NUMBER combined with SUM

I like to group my table by [ID] while using SUM and also bring back
[Product_Name] of the top ROW_NUMBER - not sure if I should use ROW_NUMBER, GROUPING SETS or loop through everything with FETCH... this is what I tried:
DECLARE #SampleTable TABLE
(
[ID] INT,
[Price] MONEY,
[Product_Name] VARCHAR(50)
)
INSERT INTO #SampleTable
VALUES (1, 100, 'Product_1'), (1, 200, 'Product_2'),
(1, 300, 'Product_3'), (2, 500, 'Product_4'),
(2, 200, 'Product_5'), (2, 300, 'Product_6');
SELECT
[ID],
[Product_Name],
[Price],
SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total],
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM
#SampleTable T1
My desired results - only two records:
1 Product_1 100.00 600.00 1
2 Product_4 500.00 1000.00 1
Any help or guidance is highly appreciated.
UPDATE:
I end up using what Prateek Sharma suggested in his comment, to simply wrap the query with another SELECT WHERE [Row_Number] = 1
SELECT * FROM
(
SELECT
[ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
,ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM #SampleTable
) MultipleRows
WHERE [Row_Number] = 1
You should have a column on which you will perform ORDER BY for ROW_NUMBER(). In this case if you want to only rely on the table self index then it's OK to use ID column for ORDER BY.
Hence your query is correct and you can go with it.
Other option is to use WITH TIES clause. BUT again, If you will use WITH TIES clause with the ORDER BY on ID column then performance will be very poor. WITH TIES only performs well if you have well defined index. And, then can use that indexed column with WITH TIES clause.
SELECT TOP 1 WITH TIES *
FROM (
SELECT [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM #SampleTable
) TAB
ORDER BY ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY <IndexedColumn> DESC)
This query may help you bit. But remember, it is also not going to provide better performance than the query written by you. It is only reducing the line of code.
One option is using the WITH TIES clause. No extra field RN.
Hopefully, you have a proper sequence number or date which can be used in either the sum() over or in the final row_number() over
Example
SELECT Top 1 with ties *
From (
Select [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM #SampleTable T1
) A
Order By ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [Price_Total] Desc)
Returns
ID Product_Name Price Price_Total
1 Product_1 100.00 600.00
2 Product_4 500.00 1000.00
There is no "top ROW_NUMBER" unless you have a column that defines ordering.
If you just want an arbitary row per id you can use the below. To deterministically pick one you would need to order by deterministic unique criteria.
DECLARE #SampleTable TABLE
(
ID INT,
Price MONEY,
Product_Name VARCHAR(50),
INDEX cix CLUSTERED (ID)
);
INSERT INTO #SampleTable
VALUES (1,100,'Product_1'),
(1,200,'Product_2'),
(1,300,'Product_3'),
(2,500,'Product_4'),
(2,200,'Product_5'),
(2,300,'Product_6');
WITH T AS
(
SELECT *,
OrderingColumn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM #SampleTable
)
SELECT ID,
SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Product_Name)), 11, 50) AS Product_Name,
CAST(SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Price)), 11, 50) AS MONEY) AS Price,
SUM(Price) AS Price_Total
FROM T
GROUP BY ID
The plan for this is pretty efficient as it is able to use the index ordered by id and has no additional sorts, spools, or passes through the table.

SQL LAG Days since last order

Hi I am trying to create a windowed query in SQL that shows me the days since last order for each customer.
It now shows me the days in between each order.
What do I need to change in my query to have it only show the days since the last and the previous order per customer? Now it shows it for every order the customer made.
Query:
SELECT klantnr,besteldatum,
DATEDIFF(DAY,LAG(besteldatum) OVER(PARTITION BY klantnr ORDER BY besteldatum),besteldatum) AS DaysSinceLastOrder
FROM bestelling
GROUP BY klantnr,besteldatum;
You can use row_number() to order the rows by besteldatum for each klantnr, and return the latest two using a derived table (subquery) or common table expression.
derived table version:
select klantnr, besteldatum, DaysSinceLastOrder
from (
select klantnr, besteldatum
, DaysSinceLastOrder = datediff(day,lag(besteldatum) over (partition by klantnr order by besteldatum),besteldatum)
, rn = row_number() over (partition by klantnr order by besteldatum desc)
from bestelling
group by klantnr, besteldatum
) t
where rn = 1
common table expression version:
;with cte as (
select klantnr, besteldatum
, DaysSinceLastOrder = datediff(day,lag(besteldatum) over (partition by klantnr order by besteldatum),besteldatum)
, rn = row_number() over (partition by klantnr order by besteldatum desc)
from bestelling
group by klantnr, besteldatum
)
select klantnr, besteldatum, DaysSinceLastOrder
from cte
where rn = 1
If you want one row per customer, rn = 1 is the proper filter. If you want n number of latest rows, use rn < n+1.

SQL Server 2008 R2 GROUP BY or OVER

I have this table:
ID COLOR TYPE DATE
-------------------------------
1 blue A 2012.02.05
2 white V 2010.10.23
3 white V 2014.03.05
4 black S 2013.02.14
I'd like to select only the ID, but in case of 2nd and 3rd rows I want to select the 3rd row because of its latest DATE value.
I have tried this query but it gives back all the two rows:
SELECT
ID, MAX(DATE) OVER(PARTITION BY COLOR, TYPE)
FROM
TABLE
WHERE
...
How can I select just one column value while I group the rows by other columns, please?
;WITH CTE AS
(
SELECT * , ROW_NUMBER() OVER (PARTITION BY COLOR,[TYPE] ORDER BY [DATE] DESC) rn
FROM TableName
)
SELECT ID
,COLOR
,[TYPE]
,[DATE]
FROM CTE
WHERE rn = 1
OR
SELECT ID
,COLOR
,[TYPE]
,[DATE]
FROM
(
SELECT * , ROW_NUMBER() OVER (PARTITION BY COLOR,[TYPE] ORDER BY [DATE] DESC) rn
FROM TableName
) A
WHERE rn = 1

Resources