TSQL - Aggregates in HAVING Clause - sql-server

I know this problem has been asked a lot but when I address the error message and use a HAVING clause, I am still receiving the dreaded:
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list,
and the column being aggregated is an outer reference.
What am I doing wrong, here?
SELECT
mr.ClubKeyNumber,
COUNT(mr.MonthlyReportID),
SUM(CONVERT(int,mr.Submitted))
FROM MonthlyReport mr
WHERE mr.ReportYear = 2014
AND COUNT(mr.MonthlyReportID) = 12
GROUP BY mr.ClubKeyNumber
HAVING (SUM(CONVERT(int,mr.Submitted))) > 11

The problem isn't with your HAVING clause it's in your WHERE clause.
You have an aggregate count in your where clause, try this:
SELECT
mr.ClubKeyNumber,
COUNT(mr.MonthlyReportID),
SUM(CONVERT(int,mr.Submitted))
FROM MonthlyReport mr
WHERE mr.ReportYear = 2014
GROUP BY mr.ClubKeyNumber
HAVING (SUM(CONVERT(int,mr.Submitted))) > 11 and COUNT(mr.MonthlyReportID) = 12
The where clause checks each row being aggregated before the group by clause. It cannot count your MonthlyReportID until after the group by so move it to the having clause.
Here is a simple example you can play with to demonstrate where vs have.

Related

How to limit the results of a query and group them together,

I have the two tables pictured from a "city jail' DB, one is the sentences given to criminals and the other criminal information. I am trying to write a query the lists only the criminal_id, first and last names with more that one sentence (i.e. the criminal_id's that have more than one sentence_id associated with it).
I have tried this query but get an error.
select
criminals.last, sentences.criminal_id,
count(sentences.sentence_id) as 'Number of Sentences'
from
criminals
join
sentences on criminals.criminal_id = sentences.criminal_id
where
count(sentences.sentence_id) > 1
group by
criminals.last
order by
'Number of Sentences' desc;
I get this error:
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
I would appreciate any suggestions on how to go about this one.
Filtering on aggregates such as the count happen in the HAVING clause, so you may use this version:
SELECT c.last, s.criminal_id, C0UNT(s.sentence_id) AS [Number of Sentences]
FROM criminals c
INNER JOIN sentences s
ON c.criminal_id = s.criminal_id
GROUP BY c.last, s.criminal_id
HAVING C0UNT(s.sentence_id) > 1
ORDER BY C0UNT(s.sentence_id) DESC;

Group by an evaluated field (sql server) [duplicate]

Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's rule of normalize to only represent data once in the database ought to extend to code as well. I'd want to only right that calculation expression once (in the SELECT column list), and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.
One of the reasons is because ORDER BY is the last thing that runs in a SQL Query, here is the order of operations
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
so once you have the columns from the SELECT clause you can use ordinal positioning
EDIT, added this based on the comment
Take this for example
create table test (a int, b int)
insert test values(1,2)
go
The query below will parse without a problem, it won't run
select a as b, b as a
from test
order by 6
here is the error
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine
select a as b, b as a
from test
group by 1
But it blows up with this error
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.
There is a lot of elementary inconsistencies in SQL, and use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be a similar queries (the difference between the two can be made infinitesimally small, after all), which are not.
I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.
use aliasses :
SELECT DATEPART(YEAR,LastSeenOn) as 'seen_year', COUNT(*) as 'count'
FROM Employee AS e
GROUP BY 'seen_year'
** EDIT **
if GROUP BY alias is not allowed for you, here's a solution / workaround:
SELECT seen_year
, COUNT(*) AS Total
FROM (
SELECT DATEPART(YEAR,LastSeenOn) as seen_year, *
FROM Employee AS e
) AS inline_view
GROUP
BY seen_year
databases that don't support this basically are choosing not to. understand the order of the processing of the various steps, but it is very easy (as many databases have shown) to parse the sql, understand it, and apply the translation for you. Where its really a pain is when a column is a long case statement. having to repeat that in the group by clause is super annoying. yes, you can do the nested query work around as someone demonstrated above, but at this point it is just lack of care about your users to not support group by column numbers.

SQL Selecting all but newest result per id

I need to set a "waived" flag in my table for all but the newest result per id. I thought I had a query that will work here, but when I run a select on the query, I'm getting incorrect results - I saw one case where it selected both of the only two results for a particular id. I'm also getting multiple results with the same exact data.
What am I doing wrong here?
Here's my select statement:
select t.test_row_id, t.test_result_id, t.waived, t.pass, t.comment
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
order by t.test_row_id
Here's the actual query I want to run:
update EV.Test_Result
set waived = 1
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
If I understand this correctly, you are having problems because the Cardinality of the ON predicate returns all matching rows.
EV.Test_Result.test_row_id = t.test_row_id
and EV.Test_Result.start_time < t.start_time
This ON will compare all of the start_time values that have the same id and return every combination of result sets where start_time is lesser than the t.start_time. Clearly, this is not what you want.
and t.device_id = 1219
and t.waived = 0
This is actually a predicate (ON technically is one), but I would prefer to use this in a subquery/CTE for several reasons: You limit the number of rows SQL has to retrieve and compare.
Something like the following might be what you needed:
SELECT A.test_row_id
, A.test_result_id
, A.waived
, A.pass
, A.comment
FROM EV.Test_Result A
INNER JOIN (SELECT MAX(start_time) AS start_time
, test_row_id
FROM EV.Test_Result
WHERE device_id = 1219
AND waived = 0
GROUP BY test_row_id
) AS T ON A.test_row_id = T.test_row_id
AND A.start_time < T.start_time
ORDER BY A.test_row_id
This query then returns a 1:M relationship between the values in the ON predicate, unlike the M:M query you had run.
UPDATE:
Since I sheepishly screwed up trying to alter my Query on SO, I'll redeem myself by explaining the physical and logical orders of basic SQL Query operators:
As you know, you write a simple SELECT statement like the following:
SELECT <aggregate column>, SUM(<non-aggregate column>) AS Cost
FROM <table_name>
WHERE <column> = 'some_value'
GROUP BY <aggregate column>
HAVING SUM(<non-aggregate column>) > some_value
ORDER BY <column>
Note that if you use a aggregate function, all other columns MUST appear in the GROUP BY or another function.
Now, SQL Server requires them to be written in that order although it actually processes this logically by the following order that is worth memorizing:
FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
There are more details found on SELECT - MSDN, but this is why any columns in the SELECT operator must be in the group by or in a aggregate function (SUM, MIN, MAX, etc)...and also why my lazy code failed on your first attempt. :/
Note also that the ORDER BY is last (technically TOP operator occurs after this), and that without it the result is not deterministic unless a function such as DENSE_RANK enforces it (thought this occurs in the SELECT statement).
Hope this helps solve the problem and better yet how SQL works. Cheers
Can you try ROW_NUMBER () function order by timestamp descending and filtering out values having ROW_NUMBER 1 ;
Below query should fetch all records per id except the latest one
I tried below query in Oracle with a table having fields : id,user_id, record_order adn timestamp and it worked :
select
<table_name_alias>.*
from
(
select
id,
user_id,
row_number() over (partition by id order by record_order desc) as record_number
from
<your_table_name>
) <table_name_alias>
where
record_number <>1;
If you are using Teradata DB, you can also try QUALIFY statement. I'm not sure if all DBs support this.
Select
table_name.*
from table_name
QUALIFY row_number() over (partition by id order by record_order desc) <>1;

Grouping by single column but returning all the columns without including other columns in aggregate function

I am working on an SQL query which should group by a column bidBroker and return all the columns in the table.
I tried it using the following query
select Product,
Term,
BidBroker,
BidVolume,
BidCP,
Bid,
Offer,
OfferCP,
OfferVolume,
OfferBroker,
ProductID,
TermID
from canadiancrudes
group by BidBroker
The above query threw me an error as follows
Column 'canadiancrudes.Product' is invalid in the select list because it is not contained in either an aggregate function or the
GROUP BY clause.
Is there any other way which returns all the data grouping by bidBroker without changing the order of data coming from CanadadianCrudes?
First if you are going to agregate, you should learn about agregate functions.
Then grouping becomes much more obvious.
I think you should explain what you are trying to accomplish here, because I suspect that you are trying to SORT bu Bidbroker, rather than grouping.
If you mean you want to sort by BidBroker, you can use:
SELECT Product,Term,BidBroker,BidVolume,BidCP,Bid,Offer,OfferCP,OfferVolume,OfferBroker,ProductID,TermID
FROM canadiancrudes
ORDER BY BidBroker
If you want to GROUP BY, and give example-data you can use:
SELECT c1.Product,c1.Term,c1.BidBroker,c1.BidVolume,c1.BidCP,c1.Bid,c1.Offer,c1.OfferCP,c1.OfferVolume,c1.OfferBroker,c1.ProductID,c1.TermID
FROM canadiancrudes c1
WHERE c1.YOURPRIMARYKEY IN (
select MIN(c2.YOURPRIMARYKEY) from canadiancrudes c2 group by c2.BidBroker
)
Replace YOURPRIMARYKEY with your column with your row-unique id.
As others have said, don't use "group by" if you don't want to aggregate something. If you do want to aggregate by one column but include others as well, consider researching "partition."

datediff between multiple records in SQL

I have one scenario where i want to get a count if date difference between the two dates is
<=14 days.That is in first table i have to filter records where any one value of DATE1 values are <=14 days of DATE2.
For Ex:
q1="SELECT DATE1 FROM DATE1_TABLE";
q2="SELECT DATE2 FROM DATE2_TABLE";
My simple query :
SELECT
COUNT(*)
FROM
DATE1_TABLE WHERE DATEDIFF(DD,DATE1,(SELECT DATE2 FROM DATE2_TABLE))<=14
But i have multiple records in both the tables,but i want to choose any record having
this difference then it will get a count >0.So,it is throwing error subquery returned more
than one record.I want some solutions for this.I am using SQL SERVER 2008
NOTE:I can't use join here.because i wanted results from two different queries.
Thanks in advance.
You can use TOP 1 clause in your query..
SELECT TOP 1 *
FROM DATE1_TABLE
WHERE DATEDIFF(DD,DATE1,(SELECT DATE2 FROM DATE2_TABLE))<=14
You cannot use SELECT which will return multiple values where function expects scalar value... however you can join those tables:
SELECT
COUNT(DISTINCT dt1.*)
FROM DATE1_TABLE dt1 INNER JOIN DATE2_TABLE dt2 ON DATEDIFF(DD,DATE1,DATE2)<=14
This query will join tables on values only when they are within 14 days and count on unique values from DATE1_TABLE. I have no idea if is it performance wise.

Resources