I have several query results that use one or more aggregate functions and a date GROUP-BY so they look something like this:
Date VisitCount(COUNT) TotalBilling(SUM)
1/1/10 234 15765.21
1/2/10 321 23146.27
1/3/10 289 19436.51
The simplified SQL for the above is:
SELECT
VisitDate,
COUNT(*) AS VisitCount,
SUM(BilledAmount) AS TotalBilling
FROM Visits
GROUP BY VisitDate
What I would like is a way to apply an aggregate function such as AVG to one of the columns in the result set. For example, I would like to add "AvgVisits" and "AvgBilling" columns to the result set like this:
Date VisitCount(COUNT) TotalBilling(SUM) AvgVisits AvgBilling
1/1/10 234 15765.21 281.3 19449.33
1/2/10 321 23146.27 281.3 19449.33
1/3/10 289 19436.51 281.3 19449.33
SQL does not permit the application of an aggregate function to another aggregate function or a subquery, so the only ways I can think to do this are by using a temporary table or by iterating through the result set and manually calculating the values. Are there any ways I can do this in MSSQL2008 without a temp table or manual calculation?
with cteGrouped as (
SELECT
VisitDate,
COUNT(*) AS VisitCount,
SUM(BilledAmount) AS TotalBilling
FROM Visits
GROUP BY VisitDate),
cteTotal as (
SELECT COUNT(*) * 1.0 / COUNT(DISTINCT VisitDate) as AvgVisits, -- * 1.0 avoids integer division
SUM(BilledAmount)/COUNT(DISTINCT VisitDate) as AvgBilling
FROM Visits)
SELECT *
FROM cteGrouped
CROSS JOIN cteTotal;
You can achieve the same with subqueries; I just find CTEs more expressive.
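To make the CTE plus CROSS JOIN pattern concrete, here is a minimal sketch that runs it end to end. It assumes an in-memory SQLite database as a stand-in for SQL Server, and the sample rows are hypothetical (they are not the data from the question). It also shows why the count is multiplied by 1.0: dividing two integers would otherwise truncate the average.

```python
import sqlite3

# Hypothetical sample data; in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits (VisitDate TEXT, BilledAmount REAL)")
conn.executemany(
    "INSERT INTO Visits VALUES (?, ?)",
    [
        ("2010-01-01", 100.0), ("2010-01-01", 50.0),
        ("2010-01-02", 80.0), ("2010-01-02", 20.0), ("2010-01-02", 30.0),
        ("2010-01-03", 60.0), ("2010-01-03", 80.0),
    ],
)

query = """
WITH cteGrouped AS (
    SELECT VisitDate,
           COUNT(*) AS VisitCount,
           SUM(BilledAmount) AS TotalBilling
    FROM Visits
    GROUP BY VisitDate
),
cteTotal AS (
    -- One row holding the overall averages per distinct date.
    -- "* 1.0" forces a non-integer division for the visit average.
    SELECT COUNT(*) * 1.0 / COUNT(DISTINCT VisitDate) AS AvgVisits,
           SUM(BilledAmount) / COUNT(DISTINCT VisitDate) AS AvgBilling
    FROM Visits
)
SELECT * FROM cteGrouped CROSS JOIN cteTotal
ORDER BY VisitDate
"""

# Each grouped row is paired with the single totals row.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because cteTotal always yields exactly one row, the CROSS JOIN simply repeats the averages on every grouped row without changing the row count.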
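Something similar to this, using window functions:
select *, avg(visitcount * 1.0) over() as AvgVisits, -- * 1.0 so the integer count is not averaged as an integer
avg(totalbilling) over() as AvgBilling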
from(
SELECT
VisitDate,
COUNT(*) AS VisitCount,
SUM(BilledAmount) AS TotalBilling
FROM Visits
GROUP BY VisitDate) as a
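The window-function variant can be sketched the same way. This assumes SQLite 3.25 or later (which supports OVER) as a stand-in for SQL Server, and the rows are hypothetical. An empty OVER () makes the whole grouped result one window, so AVG is computed across all grouped rows and repeated on each.

```python
import sqlite3

# Hypothetical sample data; requires an SQLite build with window functions (3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits (VisitDate TEXT, BilledAmount REAL)")
conn.executemany(
    "INSERT INTO Visits VALUES (?, ?)",
    [("2010-01-01", 10.0), ("2010-01-02", 20.0), ("2010-01-02", 30.0)],
)

query = """
SELECT a.*,
       AVG(VisitCount * 1.0) OVER () AS AvgVisits,
       AVG(TotalBilling) OVER () AS AvgBilling
FROM (
    SELECT VisitDate,
           COUNT(*) AS VisitCount,
           SUM(BilledAmount) AS TotalBilling
    FROM Visits
    GROUP BY VisitDate
) AS a
ORDER BY a.VisitDate
"""

# The aggregate runs first in the derived table; the window AVG runs over its output.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```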
Well, if you are specifically trying to avoid temporary tables, I believe it could be done using a Common Table Expression.
Related
I have some sample data as follows
Name Value Timestamp
a 23 2016/12/23 11:23
a 43 2016/12/23 12:55
b 12 2016/12/23 12:55
I want to select the latest value for a and b. When I used Last_Value, I used the following query
Select Name, Last_Value(Value) over (partition by Name order by timestamp) from table
This returned 2 rows for a, but I wanted it grouped so that I get only the last entered value for each name. So I had to use sub queries.
select x.Name, x.Value from (Select Name, Last_Value(Value) over (partition by Name order by timestamp) as Value from table) as x group by x.Name, x.Value
This again returns 2 records for a. I just wanted to do a GROUP BY and ORDER BY and, instead of selecting the max(), select the top record.
Can anybody tell me how to solve this problem?
One method doesn't use window functions:
select t.*
from table t
where t.timestamp = (select max(t2.timestamp) from table t2 where t2.name = t.name);
Otherwise, the subquery method is fine, although I would often use row_number() and conditional aggregation rather than last_value() (or first_value() with a descending order by).
Unfortunately, SQL Server does not support first_value() or last_value() as an aggregation function, only as a window function.
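The row_number() approach mentioned above can be sketched as follows. This is only an illustration: it uses in-memory SQLite (3.25 or later) in place of SQL Server, and the table name readings and column name ts are hypothetical, chosen because table and timestamp are awkward identifiers. The three rows are the sample data from the question.

```python
import sqlite3

# "readings" and "ts" are hypothetical names; the rows come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (Name TEXT, Value INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("a", 23, "2016-12-23 11:23"),
        ("a", 43, "2016-12-23 12:55"),
        ("b", 12, "2016-12-23 12:55"),
    ],
)

query = """
SELECT Name, Value
FROM (
    -- Number the rows of each Name from newest to oldest.
    SELECT Name, Value,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ts DESC) AS rn
    FROM readings
) AS x
WHERE rn = 1          -- keep only the newest row per Name
ORDER BY Name
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Unlike the max(timestamp) comparison, row_number() returns exactly one row per name even when two rows share the latest timestamp, although which of the tied rows wins is then arbitrary.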
I need some help in writing a SQL Server stored procedure. All data should be grouped by Train_B_N.
my table data
Expected result :
expecting output
with CTE as
(
select Train_B_N, Duration,Date,Trainer,Train_code,Training_Program
from Train_M
group by Train_B_N
)
select
*
from Train_M as m
join CTE as c on c.Train_B_N = m.Train_B_N
What's wrong with my query?
The GROUP BY collapses the table into one row per group, so selecting columns that are neither grouped nor aggregated leaves it undefined which row's value should be shown.
select Train_B_N, Duration,Date,Trainer,Train_code,Training_Program
from Train_M
group by Train_B_N
In SQL Server (following the SQL-92 rule), every column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY. No exceptions.
WITH CTE AS (SELECT TRAIN_B_N, MAX(DATE) AS Last_Date
FROM TRAIN_M
GROUP BY TRAIN_B_N)
SELECT A.Train_B_N, Duration, Date,Trainer,Train_code,Training_Program
FROM TRAIN_M AS A
INNER JOIN CTE ON CTE.Train_B_N = A.Train_B_N
AND CTE.Last_Date = A.Date
This example would return the last training program, trainer, train_code used by that ID.
This is accomplished by the MAX(DATE) aggregate function, which keeps the greatest (latest) DATE for each Train_B_N. And since the GROUP BY collapses the rows to their distinct groupings, the JOIN only returns a subset of the table's rows.
Keep in mind that the JOIN returns one row for every matching pair of rows, so if more than one row in the table shares the latest DATE for a given Train_B_N, you will get extra rows.
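The extra-rows caveat is easy to see with a small experiment. The sketch below assumes in-memory SQLite instead of SQL Server and uses hypothetical data with a reduced column set (the Date column is renamed TrainDate here): B1 has two rows on its latest date, so the join returns both.

```python
import sqlite3

# Hypothetical data; B1 has two rows sharing its latest date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Train_M (Train_B_N TEXT, TrainDate TEXT, Trainer TEXT)")
conn.executemany(
    "INSERT INTO Train_M VALUES (?, ?, ?)",
    [
        ("B1", "2020-01-01", "Ann"),
        ("B1", "2020-02-01", "Bob"),
        ("B1", "2020-02-01", "Cara"),
        ("B2", "2020-03-01", "Dan"),
    ],
)

query = """
WITH CTE AS (
    SELECT Train_B_N, MAX(TrainDate) AS Last_Date
    FROM Train_M
    GROUP BY Train_B_N
)
SELECT A.Train_B_N, A.TrainDate, A.Trainer
FROM Train_M AS A
INNER JOIN CTE ON CTE.Train_B_N = A.Train_B_N
              AND CTE.Last_Date = A.TrainDate
ORDER BY A.Train_B_N, A.Trainer
"""

# Two groups, but three rows come back because of the tie in B1.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If exactly one row per group is required, a tie-breaker is needed, for example ROW_NUMBER() with an additional ordering column.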
Look up GROUP BY on MSDN. I suggest you initially read everything outside the syntax examples and absorb what the purpose of the clause is.
Also, next time, try googling your problem like this: 'GROUP BY, SQL' or insert the error code given by your IDE (SSMS or otherwise). You need to understand why things work...and SO is here to help, not be your google search engine. ;)
Hope you find this begins your interest in learning all about SQL. :D
I was wondering how I can retrieve multiple result sets based on one CTE? Something like what I have below - but obviously this doesn't work.
Does anyone know how I can get these 2 (or more) sets of data, based on that one CTE? (more, as in that it would be nice to get the total record count from this same CTE as well.)
;WITH CTE AS
(
SELECT
Column1, Column2, Column3
FROM
Product
WHERE
Name LIKE '%Hat%' AND Description Like '%MyBrand%'
)
SELECT DISTINCT CategoryId FROM CTE
SELECT DISTINCT BrandId FROM CTE
A CTE only exists for the query immediately following it, so it's not possible to use it for two separate select statements. You'll either need to persist the data in something like a temp table, or construct/invoke the CTE twice.
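As a rough sketch of the temp-table route: the filter runs once, and every later statement reads from the persisted result. This uses in-memory SQLite, where CREATE TEMP TABLE ... AS SELECT plays the role that SELECT ... INTO #Filtered would play in SQL Server. The Product columns and rows, and the name Filtered, are hypothetical.

```python
import sqlite3

# Hypothetical schema and rows for the Product table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product "
    "(Id INTEGER, CategoryId INTEGER, BrandId INTEGER, Name TEXT, Description TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, "Red Hat", "MyBrand classic"),
        (2, 10, 101, "Sun Hat", "MyBrand summer"),
        (3, 11, 100, "Hat Box", "MyBrand storage"),
        (4, 12, 102, "Scarf", "MyBrand wool"),
        (5, 10, 103, "Top Hat", "OtherBrand"),
    ],
)

# Run the filter once and persist it for the rest of the session.
conn.execute("""
CREATE TEMP TABLE Filtered AS
SELECT CategoryId, BrandId
FROM Product
WHERE Name LIKE '%Hat%' AND Description LIKE '%MyBrand%'
""")

# Several independent result sets, all based on the same filtered rows.
categories = [r[0] for r in conn.execute(
    "SELECT DISTINCT CategoryId FROM Filtered ORDER BY CategoryId")]
brands = [r[0] for r in conn.execute(
    "SELECT DISTINCT BrandId FROM Filtered ORDER BY BrandId")]
total = conn.execute("SELECT COUNT(*) FROM Filtered").fetchone()[0]

print(categories, brands, total)
```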
My query below runs fine without the MAX(colName) lines. The original query selects about 100 columns, but now the MAX(colName) columns need to be added. Obviously when I add them, MS SQL complains with the error:
"Column 'applicationId' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
Is there any way to add these computed value columns without having to change the other 100 columns in the select?? The example below is simplified but the original query is a lot bigger and more complex.
SELECT
g.applicationId,
-- (another 100 or so columns just like above)
-- max(g.AScore) as AScore,
-- max(g.APercentile) as APercentile
FROM application a
LEFT JOIN GREScores g ON a.applicationId = g.applicationId
WHERE g.applicationID = 1
Thanks
UPDATE
Looks like the subquery approach mentioned by @OVais did the trick. If you believe this is not a good approach, please tell me why:
SELECT
g.applicationId,
-- (another 100 or so columns just like above)
(SELECT MAX(AScore) FROM GREScores WHERE GREScores.applicationId = a.applicationId) AS tAScore
-- max(g.APercentile) as APercentile
FROM application a
LEFT JOIN GREScores g ON a.applicationId = g.applicationId
WHERE g.applicationID = 1
The GROUP BY is there to ensure the query is semantically correct. What if you have 14 different values for column56? Should SQL Server guess that you want 14 rows in the output, or collapse them to the MAX?
The SQL standard requires the GROUP BY to be populated (of the mainstream RDBMSs, MySQL is the notable exception: with ONLY_FULL_GROUP_BY disabled, it makes a guess to resolve the ambiguity).
Now, there are ways around this:
copy/paste from SELECT list to GROUP BY
drag from the table column node which generates a CSV list in the query editor
Edit:
My answer above is for "MAX per grouping of the 100 columns".
If you want "All rows, with a single MAX for all rows" then you can use windowing
SELECT
g.applicationId,
-- (another 100 or so columns just like above)
max(g.AScore) OVER () as AScore,
max(g.APercentile) OVER () as APercentile
FROM GREScores g
WHERE g.applicationID = 1
You can use a subquery instead of applying the aggregate function directly in the select list:
select col1,col2,(select Sum(col3) from table_name) as Sum from table_name
MAX is an aggregate function, so you need to add a GROUP BY listing all the other columns at the end.
;WITH cte AS
(
SELECT *
FROM GREScores g
WHERE g.applicationID = 1
)
SELECT
g.applicationId,
-- (another 100 or so columns just like above)
AScore =(select max(g2.AScore) FROM cte g2) ,
APercentile =(select max(g2.APercentile) FROM cte g2)
FROM cte g
A correlated subquery is one of the methods to do this.
I have a simple query that runs in SQL 2008 and uses a custom CLR aggregate function, dbo.string_concat which aggregates a collection of strings.
I require the comments ordered sequentially hence the ORDER BY requirement.
The query I have has an awful TOP statement in it to allow ORDER BY to work for the aggregate function otherwise the comments will be in no particular order when they are concatenated by the function.
Here's the current query:
SELECT ID, dbo.string_concat(Comment)
FROM (
SELECT TOP 10000000000000 ID, Comment, CommentDate
FROM Comments
ORDER BY ID, CommentDate DESC
) x
GROUP BY ID
Is there a more elegant way to rewrite this statement?
So... what you want is comments concatenated in order of ID then CommentDate of the most recent comment?
Couldn't you just do
SELECT ID, dbo.string_concat(Comment)
FROM Comments
GROUP BY ID
ORDER BY ID, MAX(CommentDate) DESC
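Edit: I misunderstood your objective. The best I can come up with is cleaning up your query a fair bit by making it SELECT TOP 100 PERCENT; it still uses TOP, but at least it gets around having an arbitrary number as the limit. Be aware, though, that SQL Server 2005 and later may optimize away TOP 100 PERCENT ... ORDER BY in a derived table, so the ordering is not guaranteed.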
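Since you're using SQL Server 2008, you can move the ordered subquery into a Common Table Expression. SQL Server rejects ORDER BY inside a CTE unless TOP (or FOR XML) is also specified, so the TOP has to stay, and the order in which the aggregate receives the rows is still not formally guaranteed:
WITH cte_ordered (ID, Comment, CommentDate)
AS
(
SELECT TOP (2147483647) ID, Comment, CommentDate -- ORDER BY in a CTE requires TOP
FROM Comments
ORDER BY ID, CommentDate DESC
)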
SELECT ID, dbo.string_concat(Comment)
FROM cte_ordered
GROUP BY ID