GROUP BY Not Recognizing Aggregate Function


I have a table with records like this:
Id  ForeignId  Date Created  Status
1   ZZ01       2021-01-20    failed
2   ZZ02       2021-03-24    passed
3   ZZ01       2021-08-09    passed
4   ZZ03       2022-01-12    failed
5   ZZ01       2022-04-23    passed
I am trying to write a query that returns the latest DateCreated and Status for every distinct ForeignId in the table above. The query uses MAX() to find the latest date of a record and FIRST_VALUE() on Status to get only the latest value, and then a GROUP BY on the remaining columns in the SELECT. The problem is that I keep getting this error:
Column 'dbo.T.DateSubmitted' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I don't understand why this error keeps appearing when every selected column is either in the GROUP BY or, in the case of the failing DateSubmitted, inside an aggregate function.
SELECT [T].[Id],
[T].[ForeignId],
MAX([T].[DateSubmitted]) AS [Date Submitted],
FIRST_VALUE([S].[Status]) OVER (ORDER BY [T].[DateSubmitted] DESC) AS [Status]
FROM Table as [T]
GROUP BY [T].[Id], [T].[ForeignId], [T].[Status]
ORDER BY [T].[ForeignId];
As you can see from my code above, I am using the MAX() function for my DateSubmitted select, and all my leftover selects are in the GROUP BY. What am I missing? Why do I need to include DateSubmitted in the GROUP BY when I am only selecting the MAX() value?
What I want is to return only the latest date and status for each distinct ForeignId. Since every ForeignId can have multiple statuses and dates, I want only the latest values. I almost get there with the query above, but without a working GROUP BY and MAX() I get the latest column information with repeated ForeignId instances. For example, I would receive ForeignId ZZ01 three times instead of once, which is why I need the GROUP BY to work. An example of the bad output when I am unable to use GROUP BY:
Id  ForeignId  Date Created  Status
1   ZZ01       2021-04-23    passed
2   ZZ01       2021-04-23    passed
3   ZZ01       2021-04-23    passed
4   ZZ02       2022-03-24    passed
5   ZZ03       2022-01-12    failed
Expected Result:
Id  ForeignId  Date Created  Status
1   ZZ01       2021-04-23    passed
2   ZZ02       2022-03-24    passed
3   ZZ03       2022-01-12    failed

Unfortunately, FIRST_VALUE is not available as an aggregate function, only as a window function. The compiler therefore treats it as operating on the result set after aggregation, so any column it references must either appear in the GROUP BY or be wrapped in an aggregate function. It cannot reference a column that is neither grouped nor aggregated, which is why [T].[DateSubmitted] inside the OVER clause triggers the error.
You can use it over an aggregate function:
SELECT [SLM].[Id],
[SLM].[CompanySiteId],
MAX([QF].[DateSubmitted]) AS [Date Submitted],
FIRST_VALUE([QF].[Status]) OVER (ORDER BY MAX([QF].[DateSubmitted]) DESC) AS [QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
ON [SAS].[SiteListMemberId] = [SLM].[Id]
JOIN [dbo].[QCForm] AS [QF]
ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[Status]
ORDER BY [SLM].[CompanySiteId];
This may not give the results you want; it's hard to say without sample data.
Or you need to push it down into a derived table (subquery). You can do this over the whole joined set:
SELECT t.[Id],
       t.[CompanySiteId],
       MAX(t.[DateSubmitted]) AS [Date Submitted],
       t.[QC Status]
FROM (
    -- The window function runs here, before any grouping
    SELECT [SLM].[Id],
           [SLM].[CompanySiteId],
           [QF].[DateSubmitted],
           FIRST_VALUE([QF].[Status]) OVER (ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
    FROM [dbo].[SiteListMember] AS [SLM]
    JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
        ON [SAS].[SiteListMemberId] = [SLM].[Id]
    JOIN [dbo].[QCForm] AS [QF]
        ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
    WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
) t
-- Group by the derived column name exposed by the subquery
GROUP BY t.[Id], t.[CompanySiteId], t.[QC Status]
ORDER BY t.[CompanySiteId];
Or you can do it just over the one table and join it after, by adding a PARTITION BY clause.
SELECT [SLM].[Id],
       [SLM].[CompanySiteId],
       MAX([QF].[DateSubmitted]) AS [Date Submitted],
       [QF].[QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
    ON [SAS].[SiteListMemberId] = [SLM].[Id]
JOIN (
    -- Latest status per staging row, computed before the join
    SELECT *,
           FIRST_VALUE([QF].[Status]) OVER (PARTITION BY [QF].[SiteAssessmentStagingId]
                                            ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
    FROM [dbo].[QCForm] AS [QF]
) AS [QF] ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[QC Status]
ORDER BY [SLM].[CompanySiteId];
I would advise you to only quote column and table names which need it, and to avoid such names if at all possible. Lots of [] is annoying to read.
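For the simplified table in the question, a ROW_NUMBER() filter is another way to get one row per ForeignId. The following is a minimal sketch, not the asker's real schema: the table variable @T is hypothetical, and the date column is assumed to be called DateSubmitted as in the query rather than Date Created as in the sample.

```sql
-- Hypothetical table variable standing in for the question's table.
DECLARE @T TABLE (
    Id            int,
    ForeignId     varchar(10),
    DateSubmitted date,
    Status        varchar(10)
);

INSERT INTO @T (Id, ForeignId, DateSubmitted, Status)
VALUES (1, 'ZZ01', '2021-01-20', 'failed'),
       (2, 'ZZ02', '2021-03-24', 'passed'),
       (3, 'ZZ01', '2021-08-09', 'passed'),
       (4, 'ZZ03', '2022-01-12', 'failed'),
       (5, 'ZZ01', '2022-04-23', 'passed');

-- Number each ForeignId's rows from newest to oldest, then keep the newest.
SELECT n.ForeignId,
       n.DateSubmitted AS [Date Submitted],
       n.Status
FROM (
    SELECT T.ForeignId,
           T.DateSubmitted,
           T.Status,
           ROW_NUMBER() OVER (PARTITION BY T.ForeignId
                              ORDER BY T.DateSubmitted DESC) AS rn
    FROM @T AS T
) AS n
WHERE n.rn = 1
ORDER BY n.ForeignId;
```

With the five sample rows this returns one row per ForeignId: ZZ01 (2022-04-23, passed), ZZ02 (2021-03-24, passed) and ZZ03 (2022-01-12, failed). No GROUP BY is needed, so the "not contained in either an aggregate function or the GROUP BY clause" error cannot arise.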

Related

Why is Rank() OVER PARTITION BY returning too many results

I want the results of my query to be the top 3 newest, distinct Campaign Names for each Campaign Type.
My query at the moment is:
DECLARE @currentRecord varchar(160);
SET @currentRecord = '316827D2-B522-E811-816A-0050569FE3BD';
SELECT DISTINCT
    rs.CampaignName,
    rs.CampaignType,
    rs.receivedon,
    rs.Rank
FROM
    (SELECT
        fs_retentioncontact,
        receivedon,
        regardingobjectidname AS CampaignName,
        fs_campaignresponsetypename AS CampaignType,
        RANK() OVER (PARTITION BY fs_campaignresponsetypename, regardingobjectidname
                     ORDER BY receivedon DESC) AS Rank
    FROM
        dbo.FilteredCampaignResponse) rs
INNER JOIN
    dbo.FilteredContact ON rs.fs_retentioncontact = dbo.FilteredContact.contactid
WHERE
    (dbo.FilteredContact.parentcustomerid IN (@currentRecord))
    AND Rank <= 3
ORDER BY
    CampaignType, receivedon DESC;
There may be multiple results for each campaign name as well as campaign response because they are linked to individual contacts but I only want to see the 3 latest unique campaigns for each campaign type.
My query is not partitioning by each individual campaign response type (there are 6 different ones) as I was expecting. If I remove the regardingobjectidname from the PARTITION BY I only get a single row in the results when I should be getting 18 rows. This particular company has over 700 campaign responses across the 6 campaign types.
My query is returning 102 rows so it seems to be removing duplicates on campaign name which is part of what I need but not the whole story.
I have read quite a few posts regarding rank() on here e.g.
how-to-use-rank-in-sql-server
using-sql-rank-for-overall-rank-and-rank-within-a-group
but I am not able to work out what I am doing wrong from their examples. Could it be the positioning of the 'receivedon' in the ORDER BY? or something else?

After reading a post on another site, I have finally worked out how to get the top 3 of each group. I am posting my answer in case it helps anyone else.
I had to use ROW_NUMBER() OVER (PARTITION BY ...) instead of RANK() OVER (PARTITION BY ...), and I also moved the INNER JOIN and WHERE clause (which filter for the correct company) from the outer query to the inner query, so that rows are numbered only within that company's responses.
DECLARE @currentRecord varchar(160)
SET @currentRecord = '316827D2-B522-E811-816A-0050569FE3BD'
SELECT DISTINCT rs.CampaignName
    , rs.CampaignType
    , rs.receivedon
    , RowNum
FROM (
    SELECT fs_retentioncontact
        , receivedon
        , regardingobjectidname AS CampaignName
        , fs_campaignresponsetypename AS CampaignType
        , ROW_NUMBER() OVER (PARTITION BY fs_campaignresponsetypename ORDER BY fs_campaignresponsetypename, receivedon DESC) AS RowNum
    FROM FilteredCampaignResponse
    INNER JOIN dbo.FilteredContact ON fs_retentioncontact = dbo.FilteredContact.contactid
    WHERE (dbo.FilteredContact.parentcustomerid IN (@currentRecord))) rs
WHERE RowNum <= 3
ORDER BY CampaignType, receivedon DESC;
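The difference between the two functions shows up when rows tie on the ORDER BY value: RANK() gives tied rows the same number, so a filter such as Rank <= 3 can keep more than three rows per partition, while ROW_NUMBER() always numbers rows 1, 2, 3 and so on. Here is a small sketch with made-up data; the table variable @r and its values are hypothetical and only illustrate the behaviour.

```sql
-- Hypothetical responses: four campaigns received on the same day, one earlier.
DECLARE @r TABLE (
    CampaignType varchar(20),
    CampaignName varchar(20),
    receivedon   date
);

INSERT INTO @r (CampaignType, CampaignName, receivedon)
VALUES ('Email', 'A', '2018-03-01'),
       ('Email', 'B', '2018-03-01'),
       ('Email', 'C', '2018-03-01'),
       ('Email', 'D', '2018-03-01'),
       ('Email', 'E', '2018-02-01');

SELECT CampaignName,
       receivedon,
       -- Ties share a rank: A to D all get 1, E gets 5.
       RANK() OVER (PARTITION BY CampaignType
                    ORDER BY receivedon DESC) AS rnk,
       -- Every row gets its own number; CampaignName breaks the tie.
       ROW_NUMBER() OVER (PARTITION BY CampaignType
                          ORDER BY receivedon DESC, CampaignName) AS rn
FROM @r
ORDER BY rn;
```

With this data a filter on rnk <= 3 would keep four rows (A to D), whereas rn <= 3 keeps exactly three.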

Column 'ACCOUNT.ACCOUNT_ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

I am trying to get the available balance as of the last (max) activity date. I wrote the query below, but it produces an error.
select ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE,MAX(LAST_ACTIVITY_DATE)
from ACCOUNT
group by CUST_ID;
Column 'ACCOUNT.ACCOUNT_ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I am new to SQL. Can anyone let me know where I am wrong in this query?

Any column in the SELECT list that is not wrapped in an aggregate function must appear in the GROUP BY clause.
select ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE,MAX(LAST_ACTIVITY_DATE)
from ACCOUNT
group by ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE;

If you're wanting the most recent row for each customer, think ROW_NUMBER(), not GROUP BY:
;With Numbered as (
select *,ROW_NUMBER() OVER (
PARTITION BY CUST_ID
ORDER BY LAST_ACTIVITY_DATE desc) rn
from Account
)
select ACCOUNT_ID,AVAIL_BALANCE,OPEN_DATE,LAST_ACTIVITY_DATE
from Numbered
where rn=1

I think you want to select the one record having max(LAST_ACTIVITY_DATE) for each CUST_ID.
For this you can use TOP 1 WITH TIES, like the following:
SELECT TOP 1 WITH TIES account_id,
avail_balance,
open_date,
last_activity_date
FROM account
ORDER BY Row_number()
OVER (
partition BY cust_id
ORDER BY last_activity_date DESC)
The issue with your query is that you can't select a non-aggregated column unless you also specify that column in the GROUP BY.
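The TOP 1 WITH TIES form works because the ORDER BY expression is the row number itself: the newest row of every customer has row number 1, so all of those rows tie for first place and WITH TIES returns every one of them. A minimal sketch with made-up rows follows; the table variable @account and its values are hypothetical.

```sql
-- Hypothetical accounts: customer 10 has two accounts, customer 20 has one.
DECLARE @account TABLE (
    account_id         int,
    cust_id            int,
    avail_balance      decimal(10, 2),
    last_activity_date date
);

INSERT INTO @account (account_id, cust_id, avail_balance, last_activity_date)
VALUES (1, 10, 100.00, '2020-01-05'),
       (2, 10, 250.00, '2020-03-01'),
       (3, 20,  75.00, '2020-02-10');

-- Every row whose per-customer row number is 1 ties for the top spot.
SELECT TOP 1 WITH TIES
       account_id,
       cust_id,
       avail_balance,
       last_activity_date
FROM @account
ORDER BY ROW_NUMBER() OVER (PARTITION BY cust_id
                            ORDER BY last_activity_date DESC);
```

This returns accounts 2 and 3, the most recently active account of each customer. The order of the returned rows is not defined, because they all share the same sort value.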

If you want to get the max activity date for each customer, then your query should be as below:
select CUST_ID, MAX(LAST_ACTIVITY_DATE)
from ACCOUNT
group by CUST_ID;
You can't select any other column that is not in the GROUP BY clause. The error message is telling you the same thing.

with query(CUST_ID, LAST_ACTIVITY_DATE) as
(
select
CUST_ID,
MAX(LAST_ACTIVITY_DATE) as LAST_ACTIVITY_DATE
from ACCOUNT
group by CUST_ID
)
select
a.ACCOUNT_ID,
a.AVAIL_BALANCE,
a.OPEN_DATE,
a.LAST_ACTIVITY_DATE
from ACCOUNT as a
inner join query as q
on a.CUST_ID = q.CUST_ID
and a.LAST_ACTIVITY_DATE = q.LAST_ACTIVITY_DATE

T-SQL: GROUP BY, but while keeping a non-grouped column (or re-joining it)?

I'm on SQL Server 2008, and having trouble querying an audit table the way I want to.
The table shows every time a new ID comes in, as well as every time an ID's Type changes:
Record # ID Type Date
1 ae08k M 2017-01-02:12:03
2 liei0 A 2017-01-02:12:04
3 ae08k C 2017-01-02:13:05
4 we808 A 2017-01-03:20:05
I'd kinda like to produce a snapshot of the status for each ID, at a certain date. My thought was something like this:
SELECT
ID
,max(date) AS Max
FROM
Table
WHERE
Date < 'whatever-my-cutoff-date-is-here'
GROUP BY
ID
But that loses the Type column. If I add the Type column to my GROUP BY, then I'd naturally get duplicate rows per ID, one for each Type it had before the date.
So I was thinking of running a second version of the table (via a common table expression), and left joining that in to get the Type.
On my query above, all I have to join to are the ID & Date. Somehow if the dates are too close together, I end up with duplicate results (like say above, ae08k would show up once for each Type). That or I'm just super confused.
Basically all I ever do in SQL are left joins, group bys, and common table expressions (to then left join). What am I missing that I'd need in this situation...?

Use row_number()
select *
from ( select *
, row_number() over (partition by id order by date desc) as rn
from table
WHERE Date < 'whatever-my-cutoff-date-is-here'
) tt
where tt.rn = 1
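The duplicate rows described in the question can come from joining back on ID and Date when two audit rows for the same ID share the same Date value. ROW_NUMBER() avoids that because exactly one row per partition gets the number 1, and a tie-breaker in the ORDER BY makes the choice deterministic. The sketch below uses the question's sample rows; the table variable @audit, the column name RecordNo (standing in for "Record #") and the cutoff value are assumptions made for illustration.

```sql
-- Hypothetical table variable holding the question's sample audit rows.
DECLARE @audit TABLE (
    RecordNo int,
    ID       varchar(10),
    [Type]   char(1),
    [Date]   datetime
);

INSERT INTO @audit (RecordNo, ID, [Type], [Date])
VALUES (1, 'ae08k', 'M', '2017-01-02T12:03:00'),
       (2, 'liei0', 'A', '2017-01-02T12:04:00'),
       (3, 'ae08k', 'C', '2017-01-02T13:05:00'),
       (4, 'we808', 'A', '2017-01-03T20:05:00');

-- Assumed cutoff: snapshot as of the start of 2017-01-03.
DECLARE @cutoff datetime = '2017-01-03T00:00:00';

SELECT tt.ID, tt.[Type], tt.[Date]
FROM (
    SELECT a.ID,
           a.[Type],
           a.[Date],
           -- RecordNo breaks ties between rows with identical dates.
           ROW_NUMBER() OVER (PARTITION BY a.ID
                              ORDER BY a.[Date] DESC, a.RecordNo DESC) AS rn
    FROM @audit AS a
    WHERE a.[Date] < @cutoff
) AS tt
WHERE tt.rn = 1
ORDER BY tt.ID;
```

With this cutoff the result is ae08k with Type C and liei0 with Type A; we808 is excluded because its only row is after the cutoff.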

I'd kinda like know how many IDs are of each type, at a certain date.
Well, for that you use COUNT and GROUP BY on Type:
SELECT Type, COUNT(ID)
FROM Table
WHERE Date < 'whatever-your-cutoff-date-is-here'
GROUP BY Type

Based on your comment under Zohar Peled's answer, you are probably looking for something like this:
; with cte as (select distinct ID from Table where Date < '$param')
select [data].*, [data2].[count]
from cte
cross apply
( select top 1 *
from Table
where Table.ID = cte.ID
and Table.Date < '$param'
order by Table.Date desc
) as [data]
cross apply
( select count(1) as [count]
from Table
where Table.ID = cte.ID
and Table.Date < '$param'
) as [data2]

In T-SQL how to select only the top(not max) value in a group of record

I have some sample data as follows
Name Value Timestamp
a 23 2016/12/23 11:23
a 43 2016/12/23 12:55
b 12 2016/12/23 12:55
I want to select the latest value for a and b. When I used Last_Value, I used the following query
Select Name, Last_Value(Value) over (partition by Name order by timestamp) from table
This returned 2 rows for a, but I wanted it grouped so that I get only the last entered value for each name. So I had to use sub queries.
select x.Name,x.Value from (Select Name, Last_Value(Value) over (partition by Name order by timestamp) as Value from table) as x group by x.Name,x.Value
This again returns 2 records for a. I just wanted to do a GROUP BY and ORDER BY and, instead of selecting the max(), select the top record.
Can anybody tell me how to solve this problem?

One method doesn't use window functions:
select t.*
from table t
where t.timestamp = (select max(t2.timestamp) from table t2 where t2.name = t.name);
Otherwise, the subquery method is fine, although I would often use row_number() and conditional aggregation rather than last_value() (or first_value() with a descending order by).
Unfortunately, SQL Server does not support first_value() or last_value() as an aggregation function, only as a window function.
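The reason Last_Value() in the question returns a different value on each row is the default window frame. When an OVER clause has an ORDER BY but no explicit frame, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the "last" value is taken from the current row (or its peers) rather than from the end of the partition. Spelling out a frame that covers the whole partition, and then de-duplicating, gives one row per name. This is a sketch on the question's sample data; the table variable @t and the alias LatestValue are hypothetical, and FIRST_VALUE/LAST_VALUE require SQL Server 2012 or later.

```sql
-- Hypothetical table variable holding the question's sample rows.
DECLARE @t TABLE (
    Name        varchar(10),
    Value       int,
    [Timestamp] datetime
);

INSERT INTO @t (Name, Value, [Timestamp])
VALUES ('a', 23, '2016-12-23T11:23:00'),
       ('a', 43, '2016-12-23T12:55:00'),
       ('b', 12, '2016-12-23T12:55:00');

-- The explicit frame makes LAST_VALUE look at the whole partition,
-- so every row of a name carries the same value and DISTINCT collapses them.
SELECT DISTINCT
       Name,
       LAST_VALUE(Value) OVER (PARTITION BY Name
                               ORDER BY [Timestamp]
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS LatestValue
FROM @t;
```

This returns two rows: a with 43 and b with 12. Using FIRST_VALUE() with ORDER BY [Timestamp] DESC gives the same result without an explicit frame, since the default frame always includes the first row of the partition.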

Replace Group By clause with any other clause

In the query below, I am using a GROUP BY clause to get the list of most recently updated records, based on the updated date. For internal reasons, I would like to write the query without a GROUP BY clause. Can anyone please help me solve this?
SELECT Proj_UpdatedDate,
Proj_UpdatedBy
FROM ProjectProgress PP
WHERE Proj_UpdatedDate IN (SELECT MAX(Proj_UpdatedDate)
FROM ProjectProgress
GROUP BY
Proj_ProjectID)
ORDER BY
Proj_ProjectID

Using TOP 1 should give you the same result assuming you meant the MAX(Proj_UpdatedDate):
SELECT Proj_UpdatedDate,
Proj_UpdatedBy
FROM ProjectProgress PP
WHERE Proj_UpdatedDate IN (SELECT TOP 1 Proj_UpdatedDate
FROM ProjectProgress
ORDER BY Proj_UpdatedDate DESC)
ORDER BY
Proj_ProjectID
However your query actually returns multiple dates since it's GROUPED BY Proj_ProjectId (the max date for each project). Is that your desired outcome - to show a list of dates that the projects were updated and by whom?
If so, try using ROW_NUMBER():
SELECT Proj_UpdatedDate, Proj_UpdatedBy
FROM (
    -- rn = 1 marks the most recently updated row of each project
    SELECT ROW_NUMBER() OVER (PARTITION BY Proj_ProjectID ORDER BY Proj_UpdatedDate DESC) rn,
           Proj_UpdatedDate,
           Proj_UpdatedBy
    FROM ProjectProgress
) t
WHERE rn = 1
This assumes you are running SQL Server 2005 or later.
Good luck.
