Get the top n rows from each group - sql-server

I have the following table:
[ID] [Name] [Score] [Class]
1 John 90 A
2 Mary 63 A
3 Tom 87 A
4 David 98 A
5 Mary 87 B
6 David 77 B
7 David 73 C
8 Mary 92 C
9 Tom 73 C
10 John 79 C
11 Mary 70 D
12 Jane 85 D
13 David 83 D
I need to get the top 2 people in each class, based on their scores.
My expected output is:
[ID] [Name] [Score] [Class]
1 John 90 A
4 David 98 A
5 Mary 87 B
6 David 77 B
8 Mary 92 C
10 John 79 C
12 Jane 85 D
13 David 83 D
Here is what I have tried so far, but it does not produce the correct results:
SELECT *
FROM Student s
WHERE
(
SELECT COUNT(*)
FROM Student f
WHERE f.name = s.name AND
f.score >= s.score
) <= 2

Use ROW_NUMBER:
SELECT
ID, Name, Score, Class
FROM(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY Class ORDER BY Score DESC)
FROM Student
) t
WHERE rn <= 2
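As a quick check of this pattern, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the same ROW_NUMBER query. SQLite is only a stand-in for SQL Server here (assumption: SQLite 3.25 or later, which added window functions), and because the `rn = ...` alias form is T-SQL specific, the sketch uses `AS rn` instead.

```python
import sqlite3

# Sample rows copied from the question: (ID, Name, Score, Class)
rows = [
    (1, "John", 90, "A"), (2, "Mary", 63, "A"), (3, "Tom", 87, "A"),
    (4, "David", 98, "A"), (5, "Mary", 87, "B"), (6, "David", 77, "B"),
    (7, "David", 73, "C"), (8, "Mary", 92, "C"), (9, "Tom", 73, "C"),
    (10, "John", 79, "C"), (11, "Mary", 70, "D"), (12, "Jane", 85, "D"),
    (13, "David", 83, "D"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER, Name TEXT, Score INTEGER, Class TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", rows)

# Number the rows of each class from highest to lowest score, keep the first two.
top2 = conn.execute("""
    SELECT ID, Name, Score, Class
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Class ORDER BY Score DESC) AS rn
        FROM Student
    ) t
    WHERE rn <= 2
    ORDER BY ID
""").fetchall()

for row in top2:
    print(row)
```

With the sample data this returns IDs 1, 4, 5, 6, 8, 10, 12 and 13, which matches the expected output in the question.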

The specification says "for each class".
The smallest change to the proposed query that returns the specified result, "students with the two highest scores for each class", is to replace one of the predicates in the correlated subquery so that it matches on [class] rather than on [name]:
SELECT s.*
FROM Student s
WHERE
(
SELECT COUNT(*)
FROM Student f
WHERE f.class = s.class
AND f.score >= s.score
) <= 2
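Note that ties change the result. COUNT(*) counts every row in the class with an equal or higher score, so two students tied for the highest score each get a count of 2 and are both returned, while students tied for the second-highest score each get a count of 3 and are all excluded, leaving fewer than two rows for that class.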
If we specifically want to return "at most two students from each class, the students who have the highest score in each class", we'd need to write the query a little differently.
Given the example data, with no duplicate scores in a class, the results would be the same.
The difference can be demonstrated by adding a row to the example data, for example, adding Saul, having the same "second highest" score as David.
11 Mary 70 D
12 Jane 85 D
13 David 83 D
14 Saul 83 D
The question we need to ask about the specification is whether only two students should be returned for this class. Jane has the highest score, so Jane is obviously returned. But David and Saul have the same score. Do we return both, or, if we return only one of them, does it matter which one?
Should we return three rows:
[ID] [Name] [Score] [Class]
12 Jane 85 D
13 David 83 D
14 Saul 83 D
because those are all of the students with the two highest scores, or should we return just two of the students with the highest scores:
[ID] [Name] [Score] [Class]
12 Jane 85 D
13 David 83 D
or
[ID] [Name] [Score] [Class]
12 Jane 85 D
14 Saul 83 D
Once that question is answered, we can write a query that returns the specified result.
And (obviously) this isn't the only query. There are other query patterns that will return an equivalent result... using either ANSI-standard syntax, or vendor specific extensions.
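To make the tie question concrete, the sketch below runs three candidate patterns against the class D rows, including Saul. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (assumption: SQLite 3.25 or later). The `ID` tie-breaker in the ROW_NUMBER ordering is my own addition to make the result deterministic, not part of the original specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER, Name TEXT, Score INTEGER, Class TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(11, "Mary", 70, "D"), (12, "Jane", 85, "D"),
     (13, "David", 83, "D"), (14, "Saul", 83, "D")],
)

def names(sql):
    """Run a query and return its single Name column as a list."""
    return [name for (name,) in conn.execute(sql)]

# Correlated COUNT(*): tied second-place rows each see three rows >= their score.
by_count = names("""
    SELECT s.Name FROM Student s
    WHERE (SELECT COUNT(*) FROM Student f
           WHERE f.Class = s.Class AND f.Score >= s.Score) <= 2
    ORDER BY s.ID
""")

# ROW_NUMBER: exactly two rows per class; ID breaks the tie (assumed rule).
by_row_number = names("""
    SELECT Name FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Class ORDER BY Score DESC, ID) AS rn
        FROM Student
    ) t
    WHERE rn <= 2
    ORDER BY ID
""")

# DENSE_RANK: every student holding one of the two highest scores.
by_dense_rank = names("""
    SELECT Name FROM (
        SELECT *, DENSE_RANK() OVER (PARTITION BY Class ORDER BY Score DESC) AS rk
        FROM Student
    ) t
    WHERE rk <= 2
    ORDER BY ID
""")

print(by_count, by_row_number, by_dense_rank)
```

On this data the correlated COUNT(*) query returns only Jane, ROW_NUMBER returns Jane and David, and DENSE_RANK returns Jane, David and Saul, so the choice of pattern follows directly from the answer to the tie question.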

Related

SQL: insert total count row after joining table

The first step is to join staff and customer together. The second step is to count the distinct product_id values for each staff member. My goal is to add a total (sum) row below the result.
Thanks.
staff
staff_ID Name cust_id
1 Tom 101
1 Tom 101
1 Tom 105
2 Peter 102
2 Peter 104
3 Billy 103
customer
cust_id product_id
101 A1
102 A2
103 A3
104 A4
105 A5
My work:
SELECT a.staff_name, COUNT(DISTINCT b.product_id)
FROM (SELECT DISTINCT staff_id, staff_name, cust_id
FROM staff) a
LEFT JOIN customer b ON a.cust_id = b.cust_id
GROUP BY a.staff_name
What I want is to add the total column below the count.
Name count
Tom 2
Peter 2
Billy 1
Total 5
Update:
Regarding the "Total", as @MatBailie correctly pointed out in the comments:
The aggregate of multiple COUNT(DISTINCT) rows CAN NOT be guaranteed to be summable. If two staff members share the same product_id the summary value will be LESS THAN the sum of its members.
So for this sample data set:
cust_id product_id
101 A1
102 A2
103 A3
104 A4 <== Same product
105 A5
105 A4 <== Same product
Using GROUP BY ROLLUP yields a "Total" value of 5:
SELECT COALESCE(a.staff_name, 'Total') AS Staff_Name
, COUNT(DISTINCT b.product_id) AS [Count]
FROM staff a LEFT JOIN customer b ON a.cust_id=b.cust_id
GROUP BY ROLLUP (a.staff_name);
Results:
Staff_Name Count
Billy 1
Peter 2
Tom 3
Total 5 **
Whereas calculating a simple sum of the per-staff counts yields a "Total" value of 6. So just be aware of the difference.
Staff_Name Count
Billy 1
Peter 2
Tom 3
Total 6 **
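The difference between the two totals can be reproduced with a short sketch. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. SQLite has no GROUP BY ROLLUP, so the grand distinct count is computed with a separate ungrouped query, which is my substitution rather than the query used in this answer. The column name staff_name follows the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_id INTEGER, staff_name TEXT, cust_id INTEGER)")
conn.execute("CREATE TABLE customer (cust_id INTEGER, product_id TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    (1, "Tom", 101), (1, "Tom", 101), (1, "Tom", 105),
    (2, "Peter", 102), (2, "Peter", 104), (3, "Billy", 103),
])
# Modified sample data: product A4 is shared by customers 104 and 105.
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    (101, "A1"), (102, "A2"), (103, "A3"),
    (104, "A4"), (105, "A5"), (105, "A4"),
])

join = "FROM staff a LEFT JOIN customer b ON a.cust_id = b.cust_id"

# Distinct products per staff member.
per_staff = conn.execute(
    f"SELECT a.staff_name, COUNT(DISTINCT b.product_id) {join} "
    "GROUP BY a.staff_name ORDER BY a.staff_name"
).fetchall()

# Distinct products over all staff (what the ROLLUP total row reports).
distinct_total = conn.execute(
    f"SELECT COUNT(DISTINCT b.product_id) {join}"
).fetchone()[0]

# Plain sum of the per-staff counts (counts shared product A4 twice).
sum_total = sum(count for _, count in per_staff)

print(per_staff, distinct_total, sum_total)
```

Here the distinct total is 5 and the summed total is 6, because A4 is counted once for Peter and once for Tom.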
Original (Wrong Answer):
I can't remember where I saw this answer, but assuming Staff_Name is never null, you could use GROUP BY ROLLUP to obtain the total. That said, calculating grand totals is usually more of a front-end job.
SELECT COALESCE(a.staff_name, 'Total') AS Staff_Name
, COUNT(DISTINCT b.product_id) AS [Count]
FROM staff a LEFT JOIN customer b ON a.cust_id=b.cust_id
GROUP BY ROLLUP (a.staff_name);
Try this one:
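-- Total repeats the sum of the per-staff counts on every row
SELECT s.staff_name, COUNT(DISTINCT b.product_id) AS [Count], SUM(COUNT(DISTINCT b.product_id)) OVER () AS Total
FROM staff s
INNER JOIN customer b ON b.cust_id = s.cust_id
GROUP BY s.staff_name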

SQL selecting row specific data by type

After numerous joins while building a query, I am stuck with a table of products that has three identifying columns (ID, Color, Size) and a data column, Barcode, like this:
Id Color Size Barcode
34 40 4 5205barcode1
34 40 4 extradata1
34 40 5 5205barcode2
34 40 5 extradata2
34 41 4 5205barcode3
34 41 4 extradata3
35 40 5 5205barcode4
35 40 5 extradata4
34 40 3 data4
35 39 5 data5
35 40 3 data6
I need to keep the unique combinations of ID-Color-Size with the barcode that starts with '5205%', and remove the other rows with the same ID-Color-Size (the extradata rows are considered duplicates).
The final table would have the unique combinations of ID-Color-Size with barcode1-4 and data4-5-6.
If I understand correctly, you need a window function that numbers the duplicates of each id/color/size, putting a barcode that starts with 5205 first, and then keeps only the first row of each group:
with p as (
select *,
Row_Number() over(partition by id, color, size order by case when barcode like '5205%' then 1 end desc) rn
from t
)
select id, color, size, barcode
from p
where rn=1
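As a sanity check, the sketch below runs the same query against the question's rows in an in-memory SQLite database via Python's sqlite3 module (a stand-in for the actual database; assumes SQLite 3.25 or later). The table name t is taken from the query above. Both SQL Server and SQLite sort NULL last under DESC, so the '5205' row gets rn = 1 whenever one exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, color INTEGER, size INTEGER, barcode TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (34, 40, 4, "5205barcode1"), (34, 40, 4, "extradata1"),
    (34, 40, 5, "5205barcode2"), (34, 40, 5, "extradata2"),
    (34, 41, 4, "5205barcode3"), (34, 41, 4, "extradata3"),
    (35, 40, 5, "5205barcode4"), (35, 40, 5, "extradata4"),
    (34, 40, 3, "data4"), (35, 39, 5, "data5"), (35, 40, 3, "data6"),
])

# Within each id/color/size group, the CASE yields 1 for '5205%' barcodes and
# NULL otherwise; DESC ordering places the 1 ahead of the NULLs.
kept = conn.execute("""
    WITH p AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY id, color, size
                   ORDER BY CASE WHEN barcode LIKE '5205%' THEN 1 END DESC
               ) AS rn
        FROM t
    )
    SELECT id, color, size, barcode
    FROM p
    WHERE rn = 1
    ORDER BY id, color, size
""").fetchall()

for row in kept:
    print(row)
```

Seven rows remain: the four 5205 barcodes plus data4, data5 and data6, whose groups have no duplicates.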

Finding class average and number of student per subject

How can I get a column for the class average per subject and a column for the number of students taking each subject? I have created the following tables and queries. The query below gives me one student's results, without the class average or the number of students per subject.
Table Student has three columns and contains four students:
Studentid Firstname Lastname
--------------------------------
1 Oreofeoluwa Ogunkoya
2 Prevailer Adebayo
3 Arike Adeladan
4 Khalilat Yakubu
Table Course contains four courses; Irk and Crk are optional.
Courseid Course
------------------
1 Maths
2 English
3 Irk
4 Crk
I also have StudentCourse as a junction table between course and student. It contains the scores for all students.
SELECT Course
,Score
,Grade
,Comment
,Pos
,Minimum
,Maximum
FROM (
SELECT S.firstname
,S.lastname
,C.course
,Sc.score
,CASE
WHEN Score BETWEEN 80
AND 100
THEN 'A'
WHEN Score BETWEEN 70
AND 79
THEN 'B'
WHEN Score BETWEEN 60
AND 69
THEN 'C1'
WHEN Score BETWEEN 50
AND 59
THEN 'C2'
WHEN Score BETWEEN 40
AND 49
THEN 'D'
ELSE 'F'
END AS Grade
,CASE
WHEN Score BETWEEN 80
AND 100
THEN 'Excellent'
WHEN Score BETWEEN 70
AND 79
THEN 'Very Good'
WHEN Score BETWEEN 60
AND 69
THEN 'Good'
WHEN Score BETWEEN 50
AND 59
THEN 'Average'
WHEN Score BETWEEN 40
AND 49
THEN 'Pass'
ELSE 'Fail'
END AS Comment
,Rank() OVER (
PARTITION BY course ORDER BY Score DESC
) AS Pos
,Min(Score) OVER (
ORDER BY course
) AS Minimum
,Max(Score) OVER (
ORDER BY course
) AS Maximum
FROM Student S
JOIN Studentcourse Sc ON S.Studentid = Sc.Studentid
JOIN Courses C ON C.courseid = Sc.Courseid
) sub
WHERE firstname = 'Oreofeoluwa'
This query gives me the following table, but I need the class average and the number of students taking each subject:
Course Score Grade Comment Pos Minimum Maximum
---------------------------------------------------
Crk 62.00 C1 Good 1 44.00 62.00
English 80.00 A Excellent 1 43.00 80.00
Maths 96.00 A Excellent 1 36.00 96.00
You have to group at the course level to get the average score per course. Here is a sample query:
Select course,avg(score) as avg_score,count(distinct Sc.Studentid) as students
from Student S
JOIN Studentcourse Sc ON S.Studentid = Sc.Studentid
JOIN Courses C ON C.courseid = Sc.Courseid
Group by course
If you want this added to the table you have created at a student level, then you will have to perform a left join. Hope this helps.
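The join-back idea can be sketched as follows, using Python's sqlite3 module with an in-memory SQLite database as a stand-in. The question does not show the StudentCourse rows, so the scores below are hypothetical (only Oreofeoluwa's three scores are taken from the output shown in the question), and the alias agg is an illustrative name of my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (Studentid INTEGER, Firstname TEXT, Lastname TEXT);
    CREATE TABLE Courses (Courseid INTEGER, Course TEXT);
    CREATE TABLE Studentcourse (Studentid INTEGER, Courseid INTEGER, Score INTEGER);
    INSERT INTO Student VALUES (1, 'Oreofeoluwa', 'Ogunkoya'), (2, 'Prevailer', 'Adebayo'),
                               (3, 'Arike', 'Adeladan'), (4, 'Khalilat', 'Yakubu');
    INSERT INTO Courses VALUES (1, 'Maths'), (2, 'English'), (3, 'Irk'), (4, 'Crk');
    -- Hypothetical scores: the real StudentCourse data is not in the question.
    INSERT INTO Studentcourse VALUES
        (1, 1, 96), (1, 2, 80), (1, 4, 62),
        (2, 1, 36), (2, 2, 43), (2, 4, 44),
        (3, 1, 70), (3, 2, 60), (3, 3, 55),
        (4, 1, 50), (4, 2, 65), (4, 3, 75);
""")

# Per-course aggregates are computed once in a derived table, then left-joined
# to the student-level rows so each row carries its course's average and headcount.
report = conn.execute("""
    SELECT C.Course, Sc.Score, agg.avg_score, agg.students
    FROM Student S
    JOIN Studentcourse Sc ON S.Studentid = Sc.Studentid
    JOIN Courses C ON C.Courseid = Sc.Courseid
    LEFT JOIN (
        SELECT Courseid,
               AVG(Score) AS avg_score,
               COUNT(DISTINCT Studentid) AS students
        FROM Studentcourse
        GROUP BY Courseid
    ) agg ON agg.Courseid = Sc.Courseid
    WHERE S.Firstname = 'Oreofeoluwa'
    ORDER BY C.Course
""").fetchall()

for row in report:
    print(row)
```

The same two columns could also be produced with AVG(...) OVER (PARTITION BY course) and COUNT(...) OVER (PARTITION BY course) inside the original subquery, which avoids the extra join.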

list all students who have not taken midterm for 1 or more subjects

I have 3 tables: Student, Subject, and Midterm.
Student table contains
studid Firstname lastname Class
1 A R 12A
2 B S 12A
3 C T 12A
4 D U 12A
5 E V 12B
SUBJECT table contains
subid subname
1 maths
2 science
3 english
MIDTERM table contains
studid subid marks examdate
1 1 100 2014-09-24
1 2 92 2014-09-25
1 2 92 2014-09-26
2 1 74 2014-09-24
2 2 78 2014-09-26
2 3 73 2014-09-26
3 1 90 2014-09-24
3 2 84 2014-09-25
3 2 92 2014-09-25
5 1 87 2014-09-24
4 2 79 2014-09-24
4 3 90 2014-09-26
The result must be:
Firstname LastName Subname
Based on the comment below, and assuming all students must take all midterms:
select Firstname , lastname , subname
from (
select studid , FirstName, lastname , subID , subname
from student , Subject) d
left outer join midterm m on d.studid = m.studID and d.subid = m.subid
where m.examdate is null
You could probably write it without the Cartesian join, but this should suffice.
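A minimal sketch of this cross-join-then-anti-join approach, run on the question's sample data through Python's sqlite3 module with an in-memory SQLite database as a stand-in. The explicit CROSS JOIN spelling and the ORDER BY are my additions for readability and deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (studid INTEGER, Firstname TEXT, lastname TEXT, Class TEXT);
    CREATE TABLE Subject (subid INTEGER, subname TEXT);
    CREATE TABLE Midterm (studid INTEGER, subid INTEGER, marks INTEGER, examdate TEXT);
    INSERT INTO Student VALUES (1,'A','R','12A'), (2,'B','S','12A'), (3,'C','T','12A'),
                               (4,'D','U','12A'), (5,'E','V','12B');
    INSERT INTO Subject VALUES (1,'maths'), (2,'science'), (3,'english');
    INSERT INTO Midterm VALUES
        (1,1,100,'2014-09-24'), (1,2,92,'2014-09-25'), (1,2,92,'2014-09-26'),
        (2,1,74,'2014-09-24'),  (2,2,78,'2014-09-26'), (2,3,73,'2014-09-26'),
        (3,1,90,'2014-09-24'),  (3,2,84,'2014-09-25'), (3,2,92,'2014-09-25'),
        (5,1,87,'2014-09-24'),  (4,2,79,'2014-09-24'), (4,3,90,'2014-09-26');
""")

# Build every student/subject pair, then keep the pairs with no matching midterm row.
missing = conn.execute("""
    SELECT d.Firstname, d.lastname, d.subname
    FROM (
        SELECT studid, Firstname, lastname, subid, subname
        FROM Student CROSS JOIN Subject
    ) d
    LEFT OUTER JOIN Midterm m ON d.studid = m.studid AND d.subid = m.subid
    WHERE m.examdate IS NULL
    ORDER BY d.studid, d.subid
""").fetchall()

for row in missing:
    print(row)
```

On the sample data this lists english for students A and C, maths for D, and science and english for E.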
I assume that "the one who didn't pass a midterm" is the one who has fewer than 50 points.
If not, you can change the condition in the last line.
You can use the INNER JOIN keyword to query several logically connected tables by defining a matching condition.
Simply said, first you need to understand how you will extract data logically.
For example, in your case you need to select from MIDTERM table as it stores both student id and subject id.
In words, it sounds like:
1. Take all results of failed midterms (midterms with marks < 50). Each row contains studid (who failed an exam) and subid (which exam he failed);
2. From table Student take FirstName and LastName of student who failed it;
3. From table Subject take Subname of subject which has been failed;
4. Return these three values.
In code, it looks like:
SELECT s.Firstname, s.Lastname, subj.subname
FROM `MIDTERM` as m
INNER JOIN `Student` as s ON s.studid = m.studid
INNER JOIN `SUBJECT` as subj ON subj.subid = m.subid
WHERE m.marks < 50

SQL- Getting maximum value along with all other columns?

I have a table that can be seen as an evaluation of two courses in several classroom tests, like this:
student_ID Evaluation Course1 Course2
------------------------------------------------------
1 5 88 93
2 4 70 87
1 5 93 90
2 5 99 91
3 3 65 60
3 4 88 70
I need to get the row with Evaluation=5 for each student, if any. If a student has more than one row with Evaluation=5, the query should show any one of them. So for the above example table, the query result will be
student_ID Evaluation Course1 Course2
------------------------------------------------------
1 5 88 93
2 5 99 91
Of course in my real table, the "Courses" number is more than 2.
Thanks for the help.
Since you want only one record for every student_id, you can use ROW_NUMBER(), which generates a sequential number. The numbering always starts at 1 within each partition, in this case Student_ID, so you can filter on it to keep a single row per partition.
SELECT Student_ID, Evaluation, Course1, Course2
FROM
(
SELECT Student_ID, Evaluation, Course1, Course2,
ROW_NUMBER() OVER (PARTITION BY Student_ID
ORDER BY Student_ID) rn
FROM TableName
WHERE Evaluation = 5
) a
WHERE a.rn = 1
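For illustration, here is the same query run on the question's sample rows in an in-memory SQLite database via Python's sqlite3 module (a stand-in for the real database; assumes SQLite 3.25 or later). Because the ORDER BY inside the window does not distinguish between a student's rows, which of student 1's two Evaluation=5 rows is kept is arbitrary, which is what the question allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName (Student_ID INTEGER, Evaluation INTEGER, "
    "Course1 INTEGER, Course2 INTEGER)"
)
conn.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)", [
    (1, 5, 88, 93), (2, 4, 70, 87), (1, 5, 93, 90),
    (2, 5, 99, 91), (3, 3, 65, 60), (3, 4, 88, 70),
])

# Filter to Evaluation = 5 first, then number the rows per student and keep row 1.
picked = conn.execute("""
    SELECT Student_ID, Evaluation, Course1, Course2
    FROM (
        SELECT Student_ID, Evaluation, Course1, Course2,
               ROW_NUMBER() OVER (PARTITION BY Student_ID
                                  ORDER BY Student_ID) AS rn
        FROM TableName
        WHERE Evaluation = 5
    ) a
    WHERE a.rn = 1
    ORDER BY Student_ID
""").fetchall()

for row in picked:
    print(row)
```

Student 3 has no Evaluation=5 row and therefore does not appear. If a specific row is wanted for student 1, the window's ORDER BY would need a column that actually differs between the rows, such as Course1 DESC.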
