Take average of only most recent group - sql-server

There's one table named StudentScore with the fields Score, CourseID, StudentID and Semester. The latter three form the composite primary key.
I want to write a stored procedure to get the average score of each student, but the rule is fairly complex and I don't know how to express it in one query. Nested queries should be avoided if possible.
Here is the rule:
If a student takes a course more than once, only the most recent score should count.
For example, given the following data:
StudentID | CourseID | Semester | Score
1 | 1 | 1 | 80
1 | 2 | 1 | 40
1 | 3 | 1 | 60
1 | 2 | 2 | 50
1 | 3 | 2 | 20
2 | 1 | 1 | 90
The stored procedure should return:
StudentID | AvgScore
1 | 50 (the average of 80, 50 and 20)
2 | 90
Please suggest a stored procedure that is as efficient as possible. Thanks!

;WITH x AS
(
SELECT StudentID, Score, rn = ROW_NUMBER() OVER
(PARTITION BY StudentID, CourseID
ORDER BY Semester DESC)
FROM dbo.StudentScore
)
SELECT StudentID, AvgScore = AVG(Score)
FROM x
WHERE rn = 1
GROUP BY StudentID;
If you want something rounded to certain decimal places, maybe:
;WITH x AS
(
SELECT StudentID, Score = 1.0*Score, rn = ROW_NUMBER() OVER
(PARTITION BY StudentID, CourseID
ORDER BY Semester DESC)
FROM dbo.StudentScore
)
SELECT StudentID, AvgScore = CONVERT(DECIMAL(10,2), AVG(Score))
FROM x
WHERE rn = 1
GROUP BY StudentID;
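To check the first query against the sample data from the question, here is a minimal self-contained sketch. The table variable @StudentScore is only a stand-in (an assumption) for dbo.StudentScore. Note that AVG over an int column returns an int in SQL Server, which is why the second query multiplies Score by 1.0 before averaging.

```sql
-- Hypothetical repro: @StudentScore stands in for dbo.StudentScore.
DECLARE @StudentScore TABLE (
    StudentID int NOT NULL,
    CourseID  int NOT NULL,
    Semester  int NOT NULL,
    Score     int NOT NULL,
    PRIMARY KEY (StudentID, CourseID, Semester)
);

INSERT INTO @StudentScore (StudentID, CourseID, Semester, Score) VALUES
    (1, 1, 1, 80),
    (1, 2, 1, 40),
    (1, 3, 1, 60),
    (1, 2, 2, 50),
    (1, 3, 2, 20),
    (2, 1, 1, 90);

WITH x AS
(
    -- rn = 1 marks the latest semester for each (student, course) pair
    SELECT StudentID, Score,
           rn = ROW_NUMBER() OVER (PARTITION BY StudentID, CourseID
                                   ORDER BY Semester DESC)
    FROM @StudentScore
)
SELECT StudentID, AvgScore = AVG(Score)
FROM x
WHERE rn = 1
GROUP BY StudentID
ORDER BY StudentID;
-- With the sample rows this returns (1, 50) and (2, 90).
```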

Related

How to Sum (MAX values) from different value groups in same column SQL Server

I have a table like this:
Date | Consec_Days
2015-01-01 | 1
2015-01-03 | 1
2015-01-06 | 1
2015-01-07 | 2
2015-01-09 | 1
2015-01-12 | 1
2015-01-13 | 2
2015-01-14 | 3
2015-01-17 | 1
I need to sum the maximum value (days) of each consecutive grouping where Consec_Days is > 1. So the correct result would be 5 days (2 + 3).
This is a type of gaps-and-islands problem.
There are many solutions; here is a simple one:
Get the start points of each group using LAG
Calculate a grouping ID using a windowed conditional count
Group by that ID, take the maximum of each group, and sum those maxima
WITH StartPoints AS (
SELECT *,
IsStart = CASE WHEN LAG(Consec_Days) OVER (ORDER BY Date) = 1 THEN 1 END
FROM YourTable t
),
Groupings AS (
SELECT *,
GroupId = COUNT(IsStart) OVER (ORDER BY Date)
FROM StartPoints
WHERE Consec_Days > 1
),
GroupMax AS (
SELECT
MaxDays = MAX(Consec_Days)
FROM Groupings
GROUP BY
GroupId
)
SELECT
SUM(MaxDays)
FROM GroupMax;
with cte as (
select Consec_Days,
coalesce(lead(Consec_Days) over (order by Date), 1) as next
from YourTable
)
select sum(Consec_Days)
from cte
where Consec_Days <> 1 and next = 1
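To sanity-check a result, it can help to list the streaks explicitly. The sketch below makes several assumptions: the sample rows from the question, a hypothetical table variable @YourTable, and that Consec_Days restarts at 1 at the beginning of every streak. It needs SQL Server 2012 or later for the window frame.

```sql
-- Hypothetical repro of the question's data.
DECLARE @YourTable TABLE ([Date] date PRIMARY KEY, Consec_Days int NOT NULL);

INSERT INTO @YourTable ([Date], Consec_Days) VALUES
    ('2015-01-01', 1), ('2015-01-03', 1), ('2015-01-06', 1),
    ('2015-01-07', 2), ('2015-01-09', 1), ('2015-01-12', 1),
    ('2015-01-13', 2), ('2015-01-14', 3), ('2015-01-17', 1);

WITH Streaks AS (
    -- Every row with Consec_Days = 1 opens a new streak, so a running
    -- count of those rows serves as a streak identifier.
    SELECT [Date], Consec_Days,
           StreakId = COUNT(CASE WHEN Consec_Days = 1 THEN 1 END)
                      OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING)
    FROM @YourTable
)
SELECT StreakStart = MIN([Date]),
       StreakEnd   = MAX([Date]),
       MaxDays     = MAX(Consec_Days)
FROM Streaks
GROUP BY StreakId
HAVING MAX(Consec_Days) > 1
ORDER BY StreakStart;
-- Two streaks qualify: 2015-01-06 to 2015-01-07 (MaxDays 2)
-- and 2015-01-12 to 2015-01-14 (MaxDays 3), which add up to 5.
```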

Not able to filter the required data

Hi, I have a table Student where data is inserted row-wise. I want to select only those students whose marks are more than 50% in all subjects. If a student scores less than 50% in any subject, all records for that student should be excluded from the output. There is no primary key.
I tried the code below:
Select * into #temp1 from Student where percent >=0.5 and group by Roll_Number
I am getting this error:
is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I try it like this:
Select * into #temp1 from Student where percent >=0.5
then I also get students who scored more than 50% in only one subject, which is not what I want.
The table structure is as follows:
Student_Name Roll_Number Subject Marks Percent
Ashutosh 1234 English 40 40%
Ishan 1231 Maths 60 60%
Atul 1232 Maths 30 30%
Ashutosh 1234 MAths 70 70%
The output should contain only:
Ishan 1231 Maths 60 60%
You can use count() over() to flag, for every student, whether any row has < 50%, then use this as a filter criterion:
with s as (
select *,
Count(case when [Percent] < 0.5 then 1 end) over(partition by Student_Name) pc
from Student
)
select *
from s
where [Percent] >= 0.5 and pc = 0
You can get the desired result by using a subquery / cte and a window function in order to check if the student has at least 50% in all subjects:
DECLARE @Student TABLE(
Student_Name VARCHAR(20)
,Roll_Number int
,Subject VARCHAR(20)
,Marks int
,Perc DECIMAL(5,2)
)
INSERT INTO @Student VALUES
('Ashutosh',1234,'English',40,0.4)
,('Ishan',1231,'Maths',60,0.6)
,('Atul',1232,'Maths',30,0.3)
,('Ashutosh',1234,'Maths',70,0.7);
WITH cteFilter AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Student_Name, Roll_Number ORDER BY Perc ASC, Marks ASC) rn
FROM @Student
)
)
SELECT *
FROM cteFilter
WHERE rn = 1
AND Perc >= 0.5
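The ROW_NUMBER filter above keeps one row (the lowest-scoring subject) per qualifying student. If every row of a qualifying student is wanted, a NOT EXISTS check is one possible alternative. This is a sketch under the same assumptions as the answer above: a table variable @Student and the percentage stored as a fraction in a column named Perc.

```sql
DECLARE @Student TABLE (
    Student_Name varchar(20),
    Roll_Number  int,
    Subject      varchar(20),
    Marks        int,
    Perc         decimal(5,2)   -- assumed to hold fractions such as 0.40
);

INSERT INTO @Student (Student_Name, Roll_Number, Subject, Marks, Perc) VALUES
    ('Ashutosh', 1234, 'English', 40, 0.4),
    ('Ishan',    1231, 'Maths',   60, 0.6),
    ('Atul',     1232, 'Maths',   30, 0.3),
    ('Ashutosh', 1234, 'Maths',   70, 0.7);

-- Keep a row only if the same student has no subject below 50%.
SELECT s.Student_Name, s.Roll_Number, s.Subject, s.Marks, s.Perc
FROM @Student AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM @Student AS f
    WHERE f.Roll_Number = s.Roll_Number
      AND f.Perc < 0.5
);
-- With the sample rows only Ishan (1231, Maths, 60, 0.60) is returned.
```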

How do I group these values

I have each person's race, gender and age range, and I have to take:
Race 1 - Gender 1 - Age Range
Race 1 - Gender 2 - Age Range
Race 2 - Gender 1 - Age Range
Race 2 - Gender 2 - Age Range
and turn it into:
Group # | Average Age
Group 1 | 20-30
Group 2 | 40-50
Group 3 | 30-40
Group 4 | 40-50
The age is inputted as 20-30, 30-40, 40-50 so I have to find the most repeated string but I don't know how to tie it all together in 2 columns and 4 rows. I'm still new and would like to learn. Can anyone explain how I can do this?
I'm not quite clear on your table structure but perhaps something like this would work.
select GroupType, age
from (select race + gender as GroupType , age, count(*) as frequency,
ROW_NUMBER() OVER (PARTITION BY race + gender ORDER BY COUNT(*) DESC) as seqnum
from tbl
group by race + gender, age) g
where seqnum = 1
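Here is a runnable variant of the same idea. The sample rows and the table variable @tbl are invented for illustration, since the question does not show its data. The sketch partitions by race and gender as separate columns, because concatenating them can merge distinct groups ('A' + 'BC' equals 'AB' + 'C'), and it adds age as a tie-breaker so the result is deterministic when two ranges are equally frequent.

```sql
-- Hypothetical sample data; column names follow the answer above.
DECLARE @tbl TABLE (race varchar(20), gender varchar(20), age varchar(10));

INSERT INTO @tbl (race, gender, age) VALUES
    ('Race 1', 'Gender 1', '20-30'), ('Race 1', 'Gender 1', '20-30'),
    ('Race 1', 'Gender 1', '30-40'), ('Race 1', 'Gender 2', '40-50'),
    ('Race 2', 'Gender 1', '30-40'), ('Race 2', 'Gender 1', '30-40'),
    ('Race 2', 'Gender 1', '40-50'), ('Race 2', 'Gender 2', '40-50'),
    ('Race 2', 'Gender 2', '40-50');

WITH counted AS (
    -- One row per (race, gender, age), ranked by how often it occurs.
    SELECT race, gender, age,
           seqnum = ROW_NUMBER() OVER (PARTITION BY race, gender
                                       ORDER BY COUNT(*) DESC, age)
    FROM @tbl
    GROUP BY race, gender, age
)
SELECT GroupName     = race + ' - ' + gender,
       MostCommonAge = age
FROM counted
WHERE seqnum = 1
ORDER BY race, gender;
-- With these rows: 20-30, 40-50, 30-40, 40-50 for the four groups.
```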

Displaying all columns in SQL and also sum of columns with same ID in the last Repeating row

I have 2 tables
OrderDetails:
Id Name type Quantity
------------------------------------------
2009 john a 10
2009 john a 20
2010 sam b 25
2011 sam c 50
2012 sam d 30
ValueDetails:
Id Value
-------------------
2009 300
2010 500
2011 200
2012 100
I need to get an output which displays the data as such :
Id Name type Quantity Price
-------------------------------------------------
2009 john a 10
2009 john a 20 9000
2010 sam b 25
2011 sam c 50
2012 sam d 30 25500
The price is calculated by Value x Quantity and the sum of the values is displayed in the last repeating row of the given Name.
I tried to use SUM and GROUP BY, but I get only two rows. I need to display all 5 rows. How can I write this query?
You can use Row_Number with max of Row_Number to get this formatted sum
;with cte as (
select od.*, sm= sum( od.Quantity*vd.value ) over (partition by Name),
RowN = row_number() over(partition by Name order by od.id)
from #yourOrderDetails od
inner join #yourValueDetails vd
on od.Id = vd.Id
)
select Id, Name, Type, Quantity,
case when max(RowN) over(partition by Name) = row_number() over(partition by Name order by Id)
then sm else null end as ActualSum
from cte
Your input tables:
create table #yourOrderDetails (Id int, Name varchar(20), type varchar(2), Quantity int)
insert into #yourOrderDetails (Id, Name, type, Quantity) values
(2009 ,'john','a', 10 )
,(2009 ,'john','a', 20 ) ,(2010 ,'sam ','b', 25 )
,(2011 ,'sam ','c', 50 ) ,(2012 ,'sam ','d', 30 )
create table #yourValueDetails(Id int, Value Int)
insert into #yourValueDetails(Id, value) values
( 2009 , 300 ) ,( 2010 , 500 )
,( 2011 , 200 ) ,( 2012 , 100 )
SELECT a.ID,
a.Name,
a.Type,
a.quantity,
price = (a.quantity * b.Value)
FROM OrderDetails a LEFT JOIN
ValueDetails b on a.id = b.id
This will put the price on every row. If you want to do a SUM by Id,Name and Type it's not going to show the individual records like you show them above. If you want to put a SUM on one of the lines that share the same Id, Name and Type then you'd need a rule to figure out which one and then you could probably use a CASE statement to decide on which line you want to show the SUM total.
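As a sketch of such a rule, the query below shows the per-Name total only on the row that sorts last by Id and then by Quantity. That ordering is an assumption, chosen because the two john rows share the same Id and need a tie-breaker. The table variables are hypothetical stand-ins for OrderDetails and ValueDetails.

```sql
DECLARE @OrderDetails TABLE (Id int, Name varchar(20), type varchar(2), Quantity int);
DECLARE @ValueDetails TABLE (Id int PRIMARY KEY, Value int);

INSERT INTO @OrderDetails (Id, Name, type, Quantity) VALUES
    (2009, 'john', 'a', 10), (2009, 'john', 'a', 20),
    (2010, 'sam',  'b', 25), (2011, 'sam',  'c', 50),
    (2012, 'sam',  'd', 30);

INSERT INTO @ValueDetails (Id, Value) VALUES
    (2009, 300), (2010, 500), (2011, 200), (2012, 100);

SELECT od.Id, od.Name, od.type, od.Quantity,
       -- The total appears only on the last row of each Name; other rows get NULL.
       Price = CASE
                   WHEN ROW_NUMBER() OVER (PARTITION BY od.Name
                                           ORDER BY od.Id DESC, od.Quantity DESC) = 1
                   THEN SUM(od.Quantity * vd.Value) OVER (PARTITION BY od.Name)
               END
FROM @OrderDetails AS od
JOIN @ValueDetails AS vd
  ON vd.Id = od.Id
ORDER BY od.Name, od.Id, od.Quantity;
-- john's total (9000) lands on the Quantity 20 row, sam's (25500) on Id 2012.
```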

SQL How to create output with sub totals

I'm new to T-SQL and need help converting an Excel report to run on SQL Server. I have a SQL table that records all the daily inventory transactions (in/out) from each stockroom. I need to create a report that lists the current inventory level of each product in each location, as shown below.
I also need help turning the preferred output report (below) into a SQL Server view so I can run it every month.
Thanks in advance!
Inventory Log table:
PubID QTY LocationID Transaction
1 10 1 Add
1 20 2 Add
1 30 3 Add
1 5 1 Sold
1 10 2 Sold
1 5 3 Sold
2 10 1 Add
2 10 2 Add
2 5 2 Sold
2 8 2 Sold
1 20 1 Add
1 20 2 Add
2 2 2 Sold
Preferred Output Table:
PubID Local_1 Local_2 Local_3 Total
1 25 30 25 80
2 5 0 0 5
Total 30 30 25 85
I see a lot of close examples here but most just add the value while I need to subtract the Sold inventory from the Added stock to get my totals in each column.
The row totals and column totals on the right and bottom are pluses but not needed if it's easier without.
THANKS!
If this was about aggregation without pivoting, you could use a CASE expression, like this:
SELECT
...
Local_1 = SUM(CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END),
...
FROM ...
GROUP BY ...
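For reference, one way the elided fragment above might look when written out in full, without PIVOT. This is only a sketch: the table variable @InventoryLog is a hypothetical stand-in for the real log table, and the three locations are hard-coded.

```sql
DECLARE @InventoryLog TABLE (PubID int, QTY int, LocationID int, [Transaction] varchar(4));

INSERT INTO @InventoryLog (PubID, QTY, LocationID, [Transaction]) VALUES
    (1, 10, 1, 'Add'),  (1, 20, 2, 'Add'),  (1, 30, 3, 'Add'),
    (1,  5, 1, 'Sold'), (1, 10, 2, 'Sold'), (1,  5, 3, 'Sold'),
    (2, 10, 1, 'Add'),  (2, 10, 2, 'Add'),  (2,  5, 2, 'Sold'),
    (2,  8, 2, 'Sold'), (1, 20, 1, 'Add'),  (1, 20, 2, 'Add'),
    (2,  2, 2, 'Sold');

-- Conditional aggregation: each column sums the signed quantity of one location.
SELECT
    PubID,
    Local_1 = SUM(CASE WHEN LocationID = 1
                       THEN CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END
                       ELSE 0 END),
    Local_2 = SUM(CASE WHEN LocationID = 2
                       THEN CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END
                       ELSE 0 END),
    Local_3 = SUM(CASE WHEN LocationID = 3
                       THEN CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END
                       ELSE 0 END),
    Total   = SUM(CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END)
FROM @InventoryLog
GROUP BY PubID
ORDER BY PubID;
-- With the sample rows: (1, 25, 30, 25, 80) and (2, 10, -5, 0, 5).
```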
However, in the PIVOT clause, the argument of the aggregate function must be just a column reference, not an expression. You can work around that by transforming the original dataset so that QTY is either positive or negative, depending on Transaction:
SELECT
PubID,
QTY = CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END,
LocationID
FROM dbo.InventoryLog
The above query will give you a result set like this:
PubID QTY LocationID
----- --- ----------
1 10 1
1 20 2
1 30 3
1 -5 1
1 -10 2
1 -5 3
2 10 1
2 10 2
2 -5 2
2 -8 2
1 20 1
1 20 2
2 -2 2
which is now easy to pivot:
WITH prepared AS (
SELECT
PubID,
QTY = CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END,
LocationID
FROM dbo.InventoryLog
)
SELECT
PubID,
Local_1 = [1],
Local_2 = [2],
Local_3 = [3]
FROM prepared
PIVOT
(
SUM(QTY)
FOR LocationID IN ([1], [2], [3])
) AS p
;
Note that you could actually prepare the names Local_1, Local_2, Local_3 beforehand and avoid renaming them in the main SELECT. Assuming they are formed by appending the LocationID value to the string Local_, here's an example of what I mean:
WITH prepared AS (
SELECT
PubID,
QTY = CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END,
Name = 'Local_' + CAST(LocationID AS varchar(10))
FROM dbo.InventoryLog
)
SELECT
PubID,
Local_1,
Local_2,
Local_3
FROM prepared
PIVOT
(
SUM(QTY)
FOR Name IN (Local_1, Local_2, Local_3)
) AS p
;
You will see, however, that in this solution renaming will be needed at some point anyway, so I'll use the previous version in my further explanation.
Now, adding the totals to the pivot results as in your desired output may seem a little tricky. Obviously, the column could be calculated simply as the sum of all the Local_* columns, which might actually not be too bad with a small number of locations:
WITH prepared AS (
SELECT
PubID,
QTY = CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END,
LocationID
FROM dbo.InventoryLog
)
SELECT
PubID,
Local_1 = [1],
Local_2 = [2],
Local_3 = [3],
Total = COALESCE([1], 0)
+ COALESCE([2], 0)
+ COALESCE([3], 0)
FROM prepared
PIVOT
(
SUM(QTY)
FOR LocationID IN ([1], [2], [3])
) AS p
;
(COALESCE is needed because some results may be NULL.)
But there's an alternative to that, where you don't have to list all the locations explicitly one extra time. You could return the totals per PubID alongside the details in the prepared dataset using SUM() OVER (...), like this:
WITH prepared AS (
SELECT
PubID,
QTY = CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END,
LocationID,
Total = SUM(CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END)
OVER (PARTITION BY PubID)
FROM dbo.InventoryLog
)
…
or like this, if you wish to avoid repetition of the CASE expression:
WITH prepared AS (
SELECT
t.PubID,
QTY = x.AdjustedQTY,
t.LocationID,
Total = SUM(x.AdjustedQTY) OVER (PARTITION BY t.PubID)
FROM dbo.InventoryLog AS t
CROSS APPLY (
SELECT CASE t.[Transaction] WHEN 'Add' THEN t.QTY ELSE -t.QTY END
) AS x (AdjustedQTY)
)
…
Then you would just include the Total column into the main SELECT clause along with the pivoted results and PubID:
…
SELECT
PubID,
Local_1 = [1],
Local_2 = [2],
Local_3 = [3],
Total
FROM prepared
PIVOT
(
SUM(QTY)
FOR LocationID IN ([1], [2], [3])
) AS p
;
That would be the total column for you. As for the row, it is actually easy to add it when you are acquainted with the ROLLUP() grouping function:
…
SELECT
PubID,
Local_1 = SUM([1]),
Local_2 = SUM([2]),
Local_3 = SUM([3]),
Total = SUM(Total)
FROM prepared
PIVOT
(
SUM(QTY)
FOR LocationID IN ([1], [2], [3])
) AS p
GROUP BY ROLLUP(PubID)
;
The total row will have NULL in the PubID column, so you'll again need COALESCE to put the word Total instead (only if you want to return it in SQL; alternatively you could substitute it in the calling application):
…
PubID = COALESCE(CAST(PubID AS varchar(10)), 'Total'),
…
And that would be all. To sum it up, here is a complete query:
WITH prepared AS (
SELECT
PubID,
QTY = x.AdjustedQTY,
t.LocationID,
Total = SUM(x.AdjustedQTY) OVER (PARTITION BY t.PubID)
FROM dbo.InventoryLog AS t
CROSS APPLY (
SELECT CASE t.[Transaction] WHEN 'Add' THEN t.QTY ELSE -t.QTY END
) AS x (AdjustedQTY)
)
SELECT
PubID = COALESCE(CAST(PubID AS varchar(10)), 'Total'),
Local_1 = SUM([1]),
Local_2 = SUM([2]),
Local_3 = SUM([3]),
Total = SUM(Total)
FROM prepared
PIVOT
(
SUM(QTY)
FOR LocationID IN ([1], [2], [3])
) AS p
GROUP BY ROLLUP(PubID)
;
As a final touch to it, you may want to apply COALESCE to the SUMs as well, to avoid returning NULLs in your data (if that is necessary).
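The NULLs arise because SUM returns NULL when it has no non-NULL input, for example when a PubID has no rows for one of the locations. A tiny standalone illustration follows; the empty table variable @t is hypothetical.

```sql
DECLARE @t TABLE (QTY int);   -- deliberately left empty

SELECT RawSum  = SUM(QTY),               -- NULL: nothing to add up
       SafeSum = COALESCE(SUM(QTY), 0)   -- 0: NULL replaced by a default
FROM @t;
```

In the complete query the same wrapping would apply to each aggregate, e.g. Local_1 = COALESCE(SUM([1]), 0).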
The query below does what you need. I might have had one extra group by that could be combined into 1 but you get the idea.
DECLARE @InventoryLog TABLE
(
PubId INT,
Qty INT,
LocationId INT,
[Transaction] Varchar(4)
)
DECLARE @LocationTable TABLE
(
Id INT,
Name VarChar(10)
)
INSERT INTO @LocationTable
VALUES
(1, 'LOC_1'),
(2, 'LOC_2'),
(3, 'LOC_3')
INSERT INTO @InventoryLog
VALUES
(1 , 10, 1 , 'Add'),
(1 , 20, 2 , 'Add'),
(1 , 30, 3 , 'Add'),
(1 , 5 , 1 , 'Sold'),
(1 , 10, 2 , 'Sold'),
(1 , 5 , 3 , 'Sold'),
(2 , 10, 1 , 'Add'),
(2 , 10, 2 , 'Add'),
(2 , 5 , 2 , 'Sold'),
(2 , 8 , 2 , 'Sold'),
(1 , 20, 1 , 'Add'),
(1 , 20, 2 , 'Add'),
(2 , 2 , 2 , 'Sold')
SELECT PubId,
lT.Name LocationName,
CASE
WHEN [Transaction] ='Add' Then Qty
WHEN [Transaction] ='Sold' Then -Qty
END as Quantity
INTO #TempInventoryTable
FROM @InventoryLog iL
INNER JOIN @LocationTable lT on iL.LocationId = lT.Id
SELECT * INTO #AlmostThere
FROM
(
SELECT PubId,
ISNULL(LOC_1,0) LOC_1,
ISNULL(LOC_2,0) LOC_2,
ISNULL(LOC_3,0) LOC_3,
SUM(ISNULL(LOC_1,0) + ISNULL(LOC_2,0) + ISNULL(LOC_3,0)) AS TOTAL
FROM #TempInventoryTable s
PIVOT
(
SUM(Quantity)
FOR LocationName in (LOC_1,LOC_2,LOC_3)
) as b
GROUP BY PubId, LOC_1, LOC_2, LOC_3
) b
SELECT CAST(PubId as VARCHAR(10))PubId,
LOC_1,
LOC_2,
LOC_3,
TOTAL
FROM #AlmostThere
UNION
SELECT ISNULL(CAST(PubId AS VARCHAR(10)),'TOTAL') PubId,
[LOC_1]= SUM(LOC_1),
[LOC_2]= SUM(LOC_2),
[LOC_3]= SUM(LOC_3),
[TOTAL]= SUM(TOTAL)
FROM #AlmostThere
GROUP BY ROLLUP(PubId)
DROP TABLE #TempInventoryTable
DROP TABLE #AlmostThere
PubId LOC_1 LOC_2 LOC_3 TOTAL
1 25 30 25 80
2 10 -5 0 5
TOTAL 35 25 25 85
Here is another approach: aggregate the data before pivoting, then pivot the aggregated results.
Compared to my other suggestion, this method is much simpler syntactically, which may also make it easier to understand and maintain.
All the aggregation is done with the help of the CUBE() grouping function. The basic query would be this:
SELECT
PubID,
LocationID,
QTY = SUM(CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END)
FROM dbo.InventoryLog
GROUP BY CUBE(PubID, LocationID)
You can see the same CASE expression as in my other answer, only this time it can be directly used as the argument of SUM.
Using aggregation by CUBE gives us not only the totals by (PubID, LocationID), but also by PubID and LocationID separately, as well as the grand total. This is the result of the query for the example in your question:
PubID LocationID QTY
----- ---------- ---
1 1 25
2 1 10
NULL 1 35
1 2 30
2 2 -5
NULL 2 25
1 3 25
NULL 3 25
NULL NULL 85
1 NULL 80
2 NULL 5
Rows with NULLs in LocationID are row totals in the final result set, and those with NULLs in PubID are column totals. The row with NULLs in both columns is the grand total.
Before we can proceed with the pivoting, we need to prepare column names for the pivoted results. If the names are supposed to be derived from the values of LocationID, the following declaration will replace LocationID in the original query's SELECT clause:
Location = COALESCE('Local_' + CAST(LocationID AS varchar(10)), 'Total')
We can also substitute 'Total' for the NULLs in PubID at this same stage, so this will replace PubID in the SELECT clause:
PubID = COALESCE(CAST(PubID AS varchar(10)), 'Total')
Now the results will look like this:
PubID Location QTY
----- -------- ---
1 Local_1 25
2 Local_1 10
Total Local_1 35
1 Local_2 30
2 Local_2 -5
Total Local_2 25
1 Local_3 25
Total Local_3 25
Total Total 85
1 Total 80
2 Total 5
and at this point everything is ready to apply PIVOT. This query transforms the above result set according to the desired format:
WITH aggregated AS (
SELECT
PubID = COALESCE(CAST(PubID AS varchar(10)), 'Total'),
Location = COALESCE('Local_' + CAST(LocationID AS varchar(10)), 'Total'),
QTY = SUM(CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END)
FROM dbo.InventoryLog
GROUP BY CUBE(PubID, LocationID)
)
SELECT
PubID,
Local_1,
Local_2,
Local_3,
Total
FROM aggregated
PIVOT (
MAX(QTY)
FOR Location IN (Local_1, Local_2, Local_3, Total)
) AS p
;
This query will return NULLs for missing combinations of (PubID, LocationID). If you want to return 0 instead, apply COALESCE to the result of SUM in the definition of aggregated.
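One caveat about labelling the total rows with COALESCE: if PubID or LocationID could themselves contain NULL, COALESCE cannot tell a genuine NULL from a CUBE subtotal. The GROUPING() function can. The sketch below shows the aggregation step with that change; the table variable @InventoryLog is a hypothetical stand-in for dbo.InventoryLog.

```sql
DECLARE @InventoryLog TABLE (PubID int, QTY int, LocationID int, [Transaction] varchar(4));

INSERT INTO @InventoryLog (PubID, QTY, LocationID, [Transaction]) VALUES
    (1, 10, 1, 'Add'),  (1, 20, 2, 'Add'),  (1, 30, 3, 'Add'),
    (1,  5, 1, 'Sold'), (1, 10, 2, 'Sold'), (1,  5, 3, 'Sold'),
    (2, 10, 1, 'Add'),  (2, 10, 2, 'Add'),  (2,  5, 2, 'Sold'),
    (2,  8, 2, 'Sold'), (1, 20, 1, 'Add'),  (1, 20, 2, 'Add'),
    (2,  2, 2, 'Sold');

SELECT
    -- GROUPING(col) = 1 only on rows where col was rolled up by CUBE.
    PubID    = CASE WHEN GROUPING(PubID) = 1
                    THEN 'Total'
                    ELSE CAST(PubID AS varchar(10)) END,
    Location = CASE WHEN GROUPING(LocationID) = 1
                    THEN 'Total'
                    ELSE 'Local_' + CAST(LocationID AS varchar(10)) END,
    QTY      = SUM(CASE [Transaction] WHEN 'Add' THEN QTY ELSE -QTY END)
FROM @InventoryLog
GROUP BY CUBE(PubID, LocationID);
```

The result has the same shape as the aggregated CTE above, so it could be pivoted in the same way.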

Resources