T-SQL: Get top n records by column - sql-server

Struggling with what's probably a very simple problem. I have a query like this:
;WITH rankedData
AS ( -- a big, complex subquery)
SELECT UserId,
AttributeId,
ItemId
FROM rankedData
WHERE rk = 1
ORDER BY datEventDate DESC
The sub-query is designed to grab a big chunk of interlinked data and rank it by ItemId and date, so that the rk = 1 filter in the above query ensures we only get unique ItemIds, ordered by date. The ranking expression is:
Rank() OVER (partition BY ItemId ORDER BY datEventDate DESC) AS rk
The problem is that what I want is the top 75 records for each UserID, ordered by date. Seeing as I've already got a rank inside my sub-query to sort out item duplicates by date, I can't see a straightforward way of doing this.
Cheers,
Matt

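I think your query should look like this. Keep your CTE, filter on rk = 1 inside the derived table, and number the remaining rows per user, newest first:
SELECT t.UserId, t.AttributeId, t.ItemId
FROM (
    SELECT UserId, AttributeId, ItemId, rowid = ROW_NUMBER() OVER (
        PARTITION BY UserId ORDER BY datEventDate DESC
    )
    FROM rankedData
    WHERE rk = 1
) t
WHERE t.rowid <= 75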
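To see the two ranking steps side by side, here is a minimal sketch against made-up data. The @Events table variable, its sample rows, and the @TopN variable are assumptions for illustration only; the real sub-query isn't shown in the question, so this only approximates it.

```sql
-- Hypothetical sample data standing in for the asker's "big, complex subquery".
DECLARE @Events TABLE (
    UserId       int,
    AttributeId  int,
    ItemId       int,
    datEventDate datetime
);

INSERT INTO @Events (UserId, AttributeId, ItemId, datEventDate)
VALUES (1, 10, 100, '20200101'),
       (1, 10, 100, '20200105'),  -- newer duplicate of item 100
       (1, 11, 101, '20200103'),
       (2, 12, 102, '20200102');

DECLARE @TopN int = 75;  -- rows to keep per user (assumed parameter name)

;WITH rankedData AS (
    -- Step 1: rank each item's rows so that rk = 1 is its most recent event.
    SELECT UserId, AttributeId, ItemId, datEventDate,
           RANK() OVER (PARTITION BY ItemId ORDER BY datEventDate DESC) AS rk
    FROM @Events
),
perUser AS (
    -- Step 2: number the de-duplicated rows per user, newest first.
    SELECT UserId, AttributeId, ItemId, datEventDate,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY datEventDate DESC) AS rowid
    FROM rankedData
    WHERE rk = 1
)
SELECT UserId, AttributeId, ItemId
FROM perUser
WHERE rowid <= @TopN
ORDER BY UserId, datEventDate DESC;
```

Note that RANK() assigns the same rank to ties, so two rows for one ItemId with an identical datEventDate would both get rk = 1. If that can happen in your data, ROW_NUMBER() in step 1 would keep exactly one of them.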

Related

Partition with select distinct

I have a dataset that looks like this:
StudentName  Course  Studentmailid       Score
Student1     A       student1@gmail.com  80
Student1     A       student1@gmail.com  75
Student2     A       student2@gmail.com  70
Student1     B       student2@gmail.com  70
Now I want records 1, 3 and 4: basically, the first occurrence of each student in each course.
My query is:
select distinct StudentName, Course, Studentmailid, Score from StudentTable group by Course
and it throws an error. How would I have to tweak the query to get the desired output?
Hope this helps.
;with cte as (
    select StudentName, Course, Studentmailid, Score,
           Row_Number() over (partition by StudentName, Course order by StudentName) as Row_Num
    from StudentTable
)
select * from cte where Row_Num = 1
You need a column that defines which of the multiple rows comes first. Otherwise there is no guarantee that you will get the same output every time. The following sample orders by (SELECT NULL), which numbers the rows in whatever order the engine happens to read them, so it will not always return the same result.
If you have a column whose values order all the records, simply replace the ordering part of the window function with that column.
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY StudentName, Course ORDER BY (SELECT NULL)) RN
-- ORDER BY (SELECT NULL) imposes no particular order; rows are numbered as the engine happens to read them
-- There is no guarantee that this order will be the same on every run
FROM your_table
)A
WHERE RN=1
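As a rough illustration of the "column with values to order all records" advice, the sketch below adds a hypothetical RowId column that records the intended row order. The @StudentTable variable and RowId are assumptions; the original table has no such column, so you would substitute whatever column really defines "first" in your data (a timestamp, an identity, and so on).

```sql
-- Hypothetical copy of the question's data, plus an explicit ordering column.
DECLARE @StudentTable TABLE (
    RowId         int,           -- assumed column that defines "first"
    StudentName   varchar(50),
    Course        varchar(10),
    Studentmailid varchar(100),
    Score         int
);

INSERT INTO @StudentTable (RowId, StudentName, Course, Studentmailid, Score)
VALUES (1, 'Student1', 'A', 'student1@gmail.com', 80),
       (2, 'Student1', 'A', 'student1@gmail.com', 75),
       (3, 'Student2', 'A', 'student2@gmail.com', 70),
       (4, 'Student1', 'B', 'student2@gmail.com', 70);

SELECT RowId, StudentName, Course, Studentmailid, Score
FROM (
    SELECT *,
           -- Ordering by RowId makes the choice of "first" row repeatable.
           ROW_NUMBER() OVER (PARTITION BY StudentName, Course ORDER BY RowId) AS RN
    FROM @StudentTable
) AS A
WHERE RN = 1
ORDER BY RowId;
```

With this sample data the query returns the rows with RowId 1, 3 and 4, which matches the records the question asks for.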

Query table and Select latest 2 rows (in SQL Server)

I have a table that logs all updates made to an application. I want to query the table and return the last update by [Timestamp] and the update before that for a different value [ITEM]. I'm struggling to figure out how to get what I need. I'm returning more than one record for each ID and don't want that.
;WITH cte AS
(
SELECT
ID,
LAG(ITEM) OVER (PARTITION BY ID ORDER BY timestamp DESC) AS ITEM,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY timestamp DESC) RN
FROM
MyLoggingTable
WHERE
accountid = 1234
)
SELECT
cte.ID,
dl.ITEM,
DL.timestamp
FROM
cte
JOIN
MyLoggingTable DL ON cte.ID = DL.ID
WHERE
rn = 1
AND cte.ID IN ('id here | Sub select :( ..')
Is ID unique? Because if it is, your code shouldn't return duplicates. If it isn't, you will get duplicates because you are joining back to the MyLoggingTable which isn't needed. You should just move those columns (dl.Item & dl.timestamp) into the cte and return them from the cte like you did cte.ID.
I removed the LAG since you didn't return that column in your final query.
;WITH cte AS
(
SELECT
ID,
ITEM,
[timestamp],
--LAG(ITEM) OVER (PARTITION BY ID ORDER BY timestamp DESC) AS ITEM,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY timestamp DESC) RN
FROM
MyLoggingTable
WHERE
accountid = 1234
)
SELECT
cte.ID,
cte.ITEM,
cte.timestamp
FROM
cte
WHERE
rn = 1
AND cte.ID IN ('id here | Sub select :( ..')
Note: if you want the second-to-last item, as you stated in your comments, use rn = 2 instead.
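If the goal is to see the latest update and the item from the update just before it on the same row, one possible sketch uses LEAD rather than LAG. This assumes "the update before that" simply means the immediately preceding row by timestamp for the same ID. The @MyLoggingTable variable, its sample rows, and the PreviousItem alias are made up for illustration.

```sql
-- Hypothetical sample of the logging table.
DECLARE @MyLoggingTable TABLE (
    ID          int,
    ITEM        varchar(50),
    [timestamp] datetime,
    accountid   int
);

INSERT INTO @MyLoggingTable (ID, ITEM, [timestamp], accountid)
VALUES (1, 'A', '20200101', 1234),
       (1, 'B', '20200102', 1234),
       (1, 'C', '20200103', 1234),
       (2, 'X', '20200101', 1234);

;WITH cte AS (
    SELECT ID, ITEM, [timestamp],
           -- With a descending sort, the "next" row is the older one, so LEAD
           -- (not LAG) returns the item from the update before this one.
           LEAD(ITEM) OVER (PARTITION BY ID ORDER BY [timestamp] DESC) AS PreviousItem,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [timestamp] DESC) AS RN
    FROM @MyLoggingTable
    WHERE accountid = 1234
)
SELECT ID, ITEM, [timestamp], PreviousItem
FROM cte
WHERE RN = 1;
```

This returns one row per ID. PreviousItem is NULL when an ID has only a single logged update.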

how to grab the max of common columns with partition in sql

I have this table, Chips:
I'm looking to find the max for each Id, but the code I'm using is just not correct. I need the new table to be:
select *
from
(
max (numchips) over (partition by Id)
from #chips
)
You can use ROW_NUMBER:
SELECT Id, numchips
FROM (
SELECT Id, numchips,
ROW_NUMBER() OVER (PARTITION BY Id
ORDER BY numchips DESC) as rn
FROM #chips
) t
WHERE rn = 1
rn is equal to 1 for the record having the highest numchips value within each Id partition.
Using ROW_NUMBER() makes sense only if you have some additional columns in Chips table that you also want to retrieve.
Why can't you just do this?
SELECT
MAX(c.numchips),
c.Id
FROM
#chips as c
GROUP BY
c.Id
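For comparison, the windowed MAX that the question was attempting can be written as a valid query too. The sketch below is only an illustration: the contents of #chips are not shown in the question, so the rows and the maxchips alias here are invented.

```sql
-- Hypothetical contents for the temp table.
CREATE TABLE #chips (Id int, numchips int);

INSERT INTO #chips (Id, numchips)
VALUES (1, 5), (1, 9), (2, 3);

-- MAX(...) OVER keeps every row and repeats the per-Id maximum on each one,
-- whereas GROUP BY collapses each Id to a single row.
SELECT Id, numchips,
       MAX(numchips) OVER (PARTITION BY Id) AS maxchips
FROM #chips;

DROP TABLE #chips;
```

Which form fits depends on whether you need one row per Id (GROUP BY, or ROW_NUMBER with rn = 1) or the maximum alongside every original row (the windowed MAX).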

Ordering by an expression in a partition by

I have some almost duplicate data in my database (duplicates based on these 5 columns: Date, Code, Expiry, TheType, Strike, there are many more columns but they won't be counted towards labeling a record a duplicate). I want to keep only one record in each case and the one I want to keep is the one whose mtm column is closest to its checkprice column (i.e. minimize abs(mtm-checkprice)). So I think the CTE below gets pretty close if I can just order the partition by that expression. The way I tried gives me the error Invalid column name 'diff'.
WITH CTE AS(
SELECT *, ABS(Mtm - checkprice) as diff,
RN = ROW_NUMBER()OVER(PARTITION BY Date, Strike, Mtm, /* ALL THE OTHER COLUMN NAMES */
ORDER BY diff DESC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
Any ideas on how to rectify this?
Use the ABS(mtm-checkprice) in the ORDER BY of the ROW_NUMBER:
WITH CTE AS(
SELECT *, Diff = ABS(mtm-checkprice),
RN = ROW_NUMBER()OVER(PARTITION BY Date, Code, Expiry, TheType, Strike
ORDER BY ABS(mtm-checkprice) ASC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
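You cannot reference the alias Diff inside ROW_NUMBER, because a column alias is not visible to other expressions in the same SELECT list. It is only available outside of the CTE.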
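If repeating the ABS expression bothers you, one alternative (not part of the answer above) is to compute it once with CROSS APPLY and then reference that column in both the SELECT list and the ORDER BY. This is a sketch under assumptions: the @FullStats table variable, its column types, and the sample rows are invented, and only the five key columns plus Mtm and checkprice are modelled.

```sql
-- Hypothetical cut-down version of FullStats.
DECLARE @FullStats TABLE (
    [Date]     date,
    Code       varchar(10),
    Expiry     date,
    TheType    char(1),
    Strike     decimal(10, 2),
    Mtm        decimal(10, 2),
    checkprice decimal(10, 2)
);

INSERT INTO @FullStats ([Date], Code, Expiry, TheType, Strike, Mtm, checkprice)
VALUES ('20200101', 'ABC', '20200320', 'C', 100, 5.10, 5.00),  -- diff 0.10
       ('20200101', 'ABC', '20200320', 'C', 100, 5.02, 5.00),  -- diff 0.02, closest
       ('20200101', 'XYZ', '20200320', 'P', 50, 1.00, 1.50);   -- no duplicate

;WITH CTE AS (
    SELECT fs.*, d.Diff,
           ROW_NUMBER() OVER (PARTITION BY fs.[Date], fs.Code, fs.Expiry, fs.TheType, fs.Strike
                              ORDER BY d.Diff ASC) AS RN
    FROM @FullStats AS fs
    -- CROSS APPLY exposes Diff as a real column of d, so it can be reused.
    CROSS APPLY (SELECT ABS(fs.Mtm - fs.checkprice) AS Diff) AS d
)
SELECT * FROM CTE WHERE RN > 1;  -- the rows a DELETE would remove
```

With this sample data only the row with Mtm = 5.10 comes back, since it is the duplicate that is further from its checkprice.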

Filter first then select page

How can I first filter the result based on parameters and then apply a WHERE ... BETWEEN?
Something like:
With Results as
(
Select colName,Title, Row_Number(Over...) as row from a table where colName=5
)
Select * from Results
where
row between @first and @last
But it does not work. If I move my where colName=5 out of the WITH clause, I get the wrong data, because it first gets the rows between @first and @last and only then searches for colName=5.
I also want the count of Results.
Any ideas?
You can use COUNT(*) OVER() to get the count of the results before the paging filter on RN is applied:
WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN,
count(*) over() AS [Count]
from master..spt_values
)
SELECT name, number,[Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Returns
name                                number      Count
----------------------------------- ----------- -----------
VIEW                                8278        2506
VIEW                                8278        2506
view                                2           2506
varchar                             3           2506
varbinary                           1           2506
This has performance implications though. You might want to just calculate the COUNT up front and cache it somewhere rather than recalculating it for every page request.
Your ROW_NUMBER syntax is incorrect. It should be this:
With Results as
(
SELECT colName, Title, ROW_NUMBER() OVER (ORDER BY ...) AS RN
FROM your_table
WHERE colName = 5
)
SELECT * FROM Results
WHERE rn BETWEEN @first AND @last
ORDER BY rn
See the documentation for more information.
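To make the "filter first, then page" order concrete, here is a small sketch with the paging bounds declared as variables. The @Items table variable, its rows, the ORDER BY Title choice, and the TotalRows alias are all assumptions for illustration; the question does not say which column the rows should be ordered by.

```sql
-- Hypothetical table and data.
DECLARE @Items TABLE (colName int, Title varchar(50));

INSERT INTO @Items (colName, Title)
VALUES (5, 'alpha'), (5, 'bravo'), (5, 'charlie'), (5, 'delta'), (7, 'echo');

DECLARE @first int = 2, @last int = 3;  -- page bounds

;WITH Results AS (
    SELECT colName, Title,
           -- The WHERE below runs before the window functions, so rows are
           -- numbered and counted only among those with colName = 5.
           ROW_NUMBER() OVER (ORDER BY Title) AS RN,
           COUNT(*) OVER () AS TotalRows
    FROM @Items
    WHERE colName = 5
)
SELECT colName, Title, RN, TotalRows
FROM Results
WHERE RN BETWEEN @first AND @last
ORDER BY RN;
```

With this data the page contains 'bravo' and 'charlie', and TotalRows is 4 on both rows, because the count is taken after the colName filter but before the RN filter.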
I use an approach very similar to Martin Smith's (currently the selected answer), and at least in the tests I've run it performs better.
; WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN
from master..spt_values
)
SELECT name, number, (SELECT COUNT(*) FROM cte) AS [Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Run this and his queries side by side and compare execution plans.
