Partition with select distinct - sql-server

I have a dataset that looks like this
StudentName Course Studentmailid Score
Student1 A student1#gmail.com. 80
Student1 A student1#gmail.com. 75
Student2 A student2#gmail.com. 70
Student1 B student2#gmail.com. 70
Now I want records 1, 3 and 4: basically the first occurrence of each student in each course.
My query is
select distinct StudentName,Course, Studentmailid,Score from StudentTable group by Course
and it throws an error. How should I change the query to get the desired output?

Hope this helps.
;with cte as (
select StudentName,Course, Studentmailid,Score, Row_Number() over (partition by StudentName, Course order by StudentName) as Row_Num from StudentTable
)
select * from cte where Row_Num = 1

You need a column that defines which of several rows comes first; otherwise there is no guarantee that you will get the same output every time. The following sample orders by NULL, which tends to return rows in their stored order, but it will not always return the same result.
To make the result deterministic, you need a column whose values order all records. You can then simply replace the ORDER BY part of the window function with that column.
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY StudentName, Course ORDER BY (SELECT NULL)) RN
-- ORDER BY (SELECT NULL) imposes no ordering, so rows usually come back as stored in the table
-- But there is no guarantee that the rows will come back in the same order every time
FROM your_table
)A
WHERE RN=1
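As a rough sketch of the deterministic variant described above: the query below assumes StudentTable has a column that records insertion order. The column name Id is hypothetical (for example an IDENTITY column); nothing in the question says such a column exists.

```sql
-- Assumption: StudentTable has a hypothetical column Id that reflects
-- the order in which rows were inserted. It is not part of the question.
SELECT StudentName, Course, Studentmailid, Score
FROM (
    SELECT StudentName, Course, Studentmailid, Score,
           -- Number the rows of each student/course pair, earliest Id first
           ROW_NUMBER() OVER (PARTITION BY StudentName, Course
                              ORDER BY Id) AS RN
    FROM StudentTable
) AS A
WHERE RN = 1;   -- keep only the first row of each pair
```

With a real ordering column in place of (SELECT NULL), the same row is picked for each student and course on every run.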

Related

How to Remove Duplicate Statement

How do I delete duplicate rows in SQL Server when no column value distinguishes them? I want to keep only one row in my sales table (dbo.Sales).
ID DESCRIPTIONS QTY RATE AMOUNT
--------------------------------
1 APPLE 50 100 1000
1 APPLE 50 100 1000
1 APPLE 50 100 1000
1 APPLE 50 100 1000
We can try using a CTE here to arbitrarily delete all but one of the duplicates:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID, DESCRIPTIONS, QTY, RATE, AMOUNT
ORDER BY (SELECT NULL)) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;
You can delete like this:
DELETE A
FROM (SELECT Row_number()
OVER (
partition BY id, descriptions, qty, rate, amount
ORDER BY (SELECT 1)) AS rn
FROM table1) A
WHERE a.rn > 1
If you want to use a CTE, you can try the following:
;WITH cte
AS (SELECT Row_number()
OVER(
partition BY id, descriptions, qty, rate, amount
ORDER BY (SELECT 1)) RN
FROM table1)
DELETE FROM cte
WHERE rn > 1
You can use this:
select distinct * into temp from tableName
delete from tableName
insert into tableName
select * from temp
drop table temp
I suggest adding a column such as rn and populating it with row_number() over (Partition by ID, DESCRIPTIONS ,QTY, RATE, AMOUNT order by Id).
Then delete the rows where rn is not equal to 1.
When that is done, drop the column. This is a one-time fix; if duplicates appear frequently, add a unique key to your table.
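The "add a unique key" suggestion could look roughly like the statement below. This is only a sketch: the constraint name UQ_Sales_Row is made up, and it assumes all five columns have types that can be indexed and that together fit within SQL Server's index key size limit.

```sql
-- Run this only after the duplicates have been removed;
-- the ALTER fails while duplicate rows still exist.
-- UQ_Sales_Row is a hypothetical constraint name.
ALTER TABLE dbo.Sales
ADD CONSTRAINT UQ_Sales_Row
    UNIQUE (ID, DESCRIPTIONS, QTY, RATE, AMOUNT);
```

Once the constraint exists, an INSERT that would create an exact duplicate row is rejected instead of silently stored.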

Getting Top 3 values for each id and status

I have data something like this,
ID Time Status
--- ---- ------
1 10 B
1 20 B
1 30 C
1 70 C
1 100 B
1 490 D
The desired result should be,
ID Time Status
1 490 D
1 100 B
1 70 C
That is, I want the top 3 Time values per ID, with distinct Status values.
For this I tried:
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY TIME DESC) AS rn
FROM MyTable
)
SELECT id,TIME,Status
FROM cte
where rn<=3
But it doesn't meet my requirement: I am getting duplicate Status values in the top 3. How can I solve this?
Partition by status as well:
WITH cte AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id, status
ORDER BY TIME DESC
) AS rn
FROM MyTable t
)
SELECT id, TIME, Status
FROM cte
WHERE rn <= 3;
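Partitioning by id and status numbers the rows within each status, so a filter of rn <= 3 can still return several rows for one status. A possible two-step variant, sketched here against the question's sample data, first keeps the latest row per status and then takes the top 3 of those per id. The table variable and the names latest, ranked, rn_status and rn_id are illustrative only.

```sql
-- Sample data from the question, loaded into a table variable
DECLARE @MyTable TABLE (ID int, [Time] int, [Status] char(1));
INSERT INTO @MyTable (ID, [Time], [Status])
VALUES (1, 10, 'B'), (1, 20, 'B'), (1, 30, 'C'),
       (1, 70, 'C'), (1, 100, 'B'), (1, 490, 'D');

WITH latest AS (
    -- Step 1: number rows within each (ID, Status), newest first
    SELECT ID, [Time], [Status],
           ROW_NUMBER() OVER (PARTITION BY ID, [Status]
                              ORDER BY [Time] DESC) AS rn_status
    FROM @MyTable
),
ranked AS (
    -- Step 2: among the newest row of each status, number rows per ID
    SELECT ID, [Time], [Status],
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY [Time] DESC) AS rn_id
    FROM latest
    WHERE rn_status = 1
)
SELECT ID, [Time], [Status]
FROM ranked
WHERE rn_id <= 3
ORDER BY ID, [Time] DESC;
```

For the sample rows this returns (1, 490, D), (1, 100, B) and (1, 70, C), which matches the desired result in the question.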
The WITH TIES argument of the TOP clause will return all of the rows which match the top values:
select top (3) with ties id, Time, Status from table1 order by Time desc
Alternatively, if you wanted to return 3 values only, but make sure they are always the same 3 values, then you will need to use something else as a tie-breaker. In this case, it looks like your id column could be unique.
select top (3) id, Time, Status from table1 order by Time desc, id
Try this:
select distinct id,max(time) over (partition by id,status) as time ,status
from mytable t order by time desc
Output -
id time status
1 490 D
1 100 B
1 70 C
EDIT:
select distinct TOP 3 id,max(time) over (partition by id,status) as time,status
from mytable t order by time desc
Try this:
SELECT TOP 3 * FROM [MyTable] WHERE [Id] = 1 ORDER BY [Time] DESC
This will give you the top three records for ID = 1. For any other ID, just change the number in the WHERE clause.
Additionally, you could write a stored procedure that UNIONs the top three records for each ID by looping through all distinct IDs in your table :)
Try using RANK.
You may use the below query to get your desired result.
select * from
(select *, RANK() over(partition by status order by time desc) as rn from myTable)T
where rn = 1

Select Middle Rows in SQL Server

I have a table where I want to select the last 10% of rows, offset by 10% (so I want to select the last 80-90% of the data).
I wrote the following query
SELECT TOP 10 PERCENT
[col1], [col2]
FROM [table]
ORDER BY [col1] DESC
OFFSET 10 ROWS
But I receive the following error:
Line 5: Incorrect syntax near 'OFFSET'.
What am I doing wrong? I am using Microsoft SQL Server 2012 which should be compatible with OFFSET
Try something like this....
SELECT TOP (50) PERCENT *
FROM (
SELECT TOP (20) PERCENT
[col1]
,[col2]
FROM [table]
ORDER BY [col1] DESC
)T
ORDER BY [col1] ASC
You can use a simple good old not in:
SELECT TOP 10 PERCENT [col1], [col2]
FROM [table]
WHERE [col1] NOT IN (
SELECT TOP 10 PERCENT [col1]
FROM [table]
ORDER BY [col1] DESC
)
ORDER BY [col1] DESC
For your error message, is your database set to backwards compatibility mode?
The offset expression only allows you to specify row numbers, not percentages. You can select the 80-90 percentile like:
select *
from (
select 100.0 * row_number() over (order by FirstName desc) /
count(*) over () as perc_pos
from YourTable
) as SubQueryAlias
where 80 <= perc_pos and perc_pos < 90
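Since OFFSET takes a row count rather than a percentage, another option is to turn the percentages into row counts first. The sketch below assumes SQL Server 2012 or later at a compatibility level that supports OFFSET/FETCH; the variable names @total, @skip and @take are made up for the example.

```sql
-- Convert "skip 10%, take the next 10%" into row counts
DECLARE @total int = (SELECT COUNT(*) FROM [table]);
DECLARE @skip  int = @total / 10;   -- rows in the first 10% (integer division)
DECLARE @take  int = @total / 10;   -- rows in the following 10%

SELECT [col1], [col2]
FROM [table]
ORDER BY [col1] DESC
OFFSET @skip ROWS               -- skip the first 10% of the ordering
FETCH NEXT @take ROWS ONLY;     -- return the next 10%
```

Note that there is no TOP in this query; the FETCH clause does the limiting. Integer division rounds down, so for tables with fewer than 10 rows both counts become 0 and that case needs separate handling.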
If you're looking for a way to present blocks of data, on a web page for example,
try:
WITH Ordered AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY ServerName) AS 'RowNumber'
FROM systems
)
SELECT *
FROM Ordered
WHERE RowNumber BETWEEN 11 AND 20
With this code, I was able to offer the user the first 10 rows, then the second block of 10 (11 - 20), and so on.
Now, a word of caution. If your data changes frequently, this may suffer, as it will give you the first 10 rows (or rows 50 to 60) as of the time the query is run.
So be warned: if new data is being added, it throws off the list. If you're looking at a list of computers, for example, and someone adds a new server named "AAA" while you're looking at the middle of the list, what was item 49 in one query may be item 50 in the next. (I hope I didn't confuse that even more.)
This code uses ROWNUM (Oracle syntax) to number the records in the table and then picks the middle record:
SELECT * FROM
(SELECT E.*, ROWNUM RM FROM MYCODE E)
WHERE RM=(SELECT COUNT(*)/2 FROM MYCODE);
Query to return the middle record of a table:
select top 1 *
from Employee
where empid in (
select top 50 percent empid
from employee
order by empid
)
order by empid desc
declare @middle1 as int
set @middle1 = ((select COUNT(*) from [table] )+1)/2
declare @middle2 as int
set @middle2 = ((select COUNT(*) from [table] ))/2
select * from
(select ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as R, * from [table] where (select COUNT(*) from [table] ) % 2 = 0) T2
where (T2.R - @middle2 = 0) or (T2.R - @middle1 = 0)
union
select * from
(select ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as R, * from [table] where (select COUNT(*) from [table] ) % 2 != 0) T2
where T2.R - @middle1 = 0

T-SQL: Get top n records by column

Struggling with what's probably a very simple problem. I have a query like this:
;WITH rankedData
AS ( -- a big, complex subquery)
SELECT UserId,
AttributeId,
ItemId
FROM rankedData
WHERE rank = 1
ORDER BY datEventDate DESC
The sub-query is designed to grab a big chunk of interlined data and rank it by itemId and date, so that the rank=1 in the above query ensures we only get unique ItemIds, ordered by date. The partition is:
Rank() OVER (partition BY ItemId ORDER BY datEventDate DESC) AS rk
The problem is that what I want is the top 75 records for each UserID, ordered by date. Seeing as I've already got a rank inside my sub-query to sort out item duplicates by date, I can't see a straightforward way of doing this.
Cheers,
Matt
I think your query should look like
SELECT t.UserId, t.AttributeId, t.ItemId
FROM (
SELECT UserId, AttributeId, ItemId, rowid = ROW_NUMBER() OVER (
PARTITION BY UserId ORDER BY datEventDate
)
FROM rankedData
) t
WHERE t.rowid <= 75
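To keep the existing per-item rank and add the per-user limit on top, the two steps can be chained as CTEs. This is only a sketch: dbo.Events is a hypothetical stand-in for the big subquery, perUser and rowid are illustrative names, and the date order is descending on the assumption that "top 75" means the most recent 75.

```sql
WITH rankedData AS (
    -- Stand-in for the original complex subquery (dbo.Events is hypothetical)
    SELECT UserId, AttributeId, ItemId, datEventDate,
           RANK() OVER (PARTITION BY ItemId
                        ORDER BY datEventDate DESC) AS rk
    FROM dbo.Events
),
perUser AS (
    -- Keep only the top-ranked row per item, then number rows per user
    SELECT UserId, AttributeId, ItemId, datEventDate,
           ROW_NUMBER() OVER (PARTITION BY UserId
                              ORDER BY datEventDate DESC) AS rowid
    FROM rankedData
    WHERE rk = 1
)
SELECT UserId, AttributeId, ItemId
FROM perUser
WHERE rowid <= 75
ORDER BY UserId, datEventDate DESC;
```

The rk = 1 filter runs before ROW_NUMBER is computed, so the 75 rows per user are counted only among the de-duplicated items.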

Filter first then select page

How to first filter the result based on params then to apply where-between?
Something like
With Results as
(
Select colName,Title, Row_Number(Over...) as row from a table where colName=5
)
Select * from Results
where
row between @first and @last
But it does not work. If I move where colName=5 out of the WITH clause, I get wrong data, because it first takes the rows between @first and @last and only then searches for colName=5.
I also want the count of Results.
Any ideas?
You can use COUNT(*) OVER() to get the count of the unfiltered results
WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN,
count(*) over() AS [Count]
from master..spt_values
)
SELECT name, number,[Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Returns
name number Count
----------------------------------- ----------- -----------
VIEW 8278 2506
VIEW 8278 2506
view 2 2506
varchar 3 2506
varbinary 1 2506
This has performance implications though. You might want to just calculate the COUNT up front and cache it somewhere rather than recalculating it for every page request.
Your ROW_NUMBER syntax is incorrect. It should be this:
With Results as
(
SELECT colName, Title, ROW_NUMBER() OVER (ORDER BY ...) AS RN
FROM your_table
WHERE colName = 5
)
SELECT * FROM Results
WHERE rn BETWEEN @first AND @last
ORDER BY rn
See the documentation for more information.
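Putting the pieces together, a filtered page plus the total count of the filtered rows might look like the sketch below. The ORDER BY Title, the TotalCount alias and the sample values for @first and @last are assumptions made for illustration; your_table, colName and Title come from the question.

```sql
-- Hypothetical page bounds: rows 1 to 10 of the filtered result
DECLARE @first int = 1, @last int = 10;

WITH Results AS (
    SELECT colName, Title,
           -- Row numbers are assigned after the WHERE filter is applied
           ROW_NUMBER() OVER (ORDER BY Title) AS RN,
           -- Total number of filtered rows, repeated on every row
           COUNT(*) OVER () AS TotalCount
    FROM your_table
    WHERE colName = 5
)
SELECT colName, Title, TotalCount
FROM Results
WHERE RN BETWEEN @first AND @last
ORDER BY RN;
```

Because the colName = 5 filter sits inside the CTE, both the row numbers and the count refer to the filtered rows only, and the page is cut from that numbered set.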
I use an approach very similar to Martin Smith's (currently the selected answer), and at least in the tests I've run it gives better performance.
; WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN
from master..spt_values
)
SELECT name, number, (SELECT COUNT(*) FROM cte) AS [Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Run this and his queries side by side and compare execution plans.

Resources