How do I select columns together with aggregate functions?

Let's say I have a table of companies:
Company
coID | coName | coCSR
The coCSR field is a numeric ID which relates to the account handler table:
AccountHandler
ahID | ahFirstName | ahLastName
I also have a table of orders:
Order
orID | orCompanyID | orDate | orValue
Now what I need to produce is output structured as follows:
Company | Account handler | No. of orders | Total of orders
Here is the query I have tried, which produces an error:
SELECT coID, coName, ahFirstName+' '+ahLastName AS CSRName, COUNT(orID) AS numOrders, SUM(orValue) AS totalRevenue
FROM Company
LEFT JOIN AccountHandler ON coCSR = ahID
LEFT JOIN Order ON coID = orCompanyID
WHERE coCSR = 8
AND orDate > getdate() - 365
ORDER BY coName ASC
The error is: Column name 'AccountHandler.ahLastName' is invalid in the ORDER BY clause because it is not contained in an aggregate function and there is no GROUP BY clause.
If I use GROUP BY coID, I get Incorrect syntax near the keyword 'WHERE'. If I change the WHERE to HAVING because of the aggregate functions, I get errors telling me to remove each of the other column names that aren't contained in either an aggregate function or the GROUP BY clause.
I have to admit, I don't yet understand the logic and syntax of anything but the most basic SQL commands. I'm just trying to apply what I've seen used before, and it's not working. Please help me to get this working. Better still, can you help me understand why it doesn't work at the moment? :)

For one thing, your query is probably missing FROM Company, but that might somehow have been lost when you were writing your post.
You seem to be aggregating data by company, so you need to group by company. Your attempt at grouping most likely failed because GROUP BY was in the wrong place: it looks like you put it before WHERE, but it belongs after WHERE (and before ORDER BY):
SELECT
c.coID,
c.coName,
a.ahFirstName + ' ' + a.ahLastName AS CSRName,
COUNT(o.orID) AS numOrders,
SUM(o.orValue) AS totalRevenue
FROM Company c
LEFT JOIN AccountHandler a ON c.coCSR = a.ahID
LEFT JOIN [Order] o ON c.coID = o.orCompanyID
WHERE c.coCSR = 8
AND o.orDate > getdate() - 365
GROUP BY ...
ORDER BY c.coName ASC
Another question is, what to group by. SQL Server requires that all non-aggregated columns be specified in GROUP BY. Therefore your GROUP BY clause should look like this:
GROUP BY
c.coID,
c.coName,
a.ahFirstName,
a.ahLastName
Note that GROUP BY can't reference columns by the aliases assigned to them in the SELECT clause (e.g. CSRName). You could, however, group by the expression ahFirstName+' '+ahLastName instead of the two underlying columns; it wouldn't make any difference in this particular situation.
If you ever need to add more non-aggregated columns to this query, you'll have to add them both to SELECT and to GROUP BY. At some point this may become a bit tedious. I would suggest you try the following instead:
SELECT
c.coID,
c.coName,
a.ahFirstName + ' ' + a.ahLastName AS CSRName,
ISNULL(o.numOrders, 0) AS numOrders,
ISNULL(o.totalRevenue, 0) AS totalRevenue
FROM Company c
LEFT JOIN AccountHandler a ON c.coCSR = a.ahID
LEFT JOIN (
SELECT
orCompanyID,
COUNT(orID) AS numOrders,
SUM(orValue) AS totalRevenue
FROM [Order]
WHERE orDate > getdate() - 365
GROUP BY
orCompanyID
) o ON c.coID = o.orCompanyID
WHERE c.coCSR = 8
ORDER BY c.coName ASC
That is, aggregating is done on the Order table only. The aggregated row set is then joined to the other tables. You can now pull more attributes to the output from either Company or AccountHandler without worrying about adding them to GROUP BY because grouping is not needed at that level any more.
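To see the pre-aggregation approach run end to end, here is a minimal sketch that uses table variables with made-up rows. The table variable names (@Company, @AccountHandler, @Order) and all sample data are assumptions for illustration; only the column names come from the question. The multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data; column names follow the question.
DECLARE @Company TABLE (coID int, coName varchar(50), coCSR int);
DECLARE @AccountHandler TABLE (ahID int, ahFirstName varchar(50), ahLastName varchar(50));
DECLARE @Order TABLE (orID int, orCompanyID int, orDate datetime, orValue money);

INSERT INTO @Company VALUES (1, 'Acme', 8), (2, 'Bolt', 8), (3, 'Cog', 9);
INSERT INTO @AccountHandler VALUES (8, 'Ann', 'Lee'), (9, 'Bob', 'Ray');
INSERT INTO @Order VALUES
    (10, 1, DATEADD(day, -10, GETDATE()), 100),
    (11, 1, DATEADD(day, -20, GETDATE()), 50),
    (12, 2, DATEADD(day, -500, GETDATE()), 75);  -- older than a year

SELECT
    c.coID,
    c.coName,
    a.ahFirstName + ' ' + a.ahLastName AS CSRName,
    ISNULL(o.numOrders, 0) AS numOrders,
    ISNULL(o.totalRevenue, 0) AS totalRevenue
FROM @Company c
LEFT JOIN @AccountHandler a ON c.coCSR = a.ahID
LEFT JOIN (
    -- Aggregate the orders first: at most one row per company.
    SELECT orCompanyID, COUNT(orID) AS numOrders, SUM(orValue) AS totalRevenue
    FROM @Order
    WHERE orDate > GETDATE() - 365
    GROUP BY orCompanyID
) o ON c.coID = o.orCompanyID
WHERE c.coCSR = 8
ORDER BY c.coName ASC;
```

With these rows, Acme should come back with 2 orders totalling 150 and Bolt with zeros: Bolt's only order is older than a year, and because the date filter sits inside the derived table rather than in the outer WHERE, the company itself is not filtered out.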

Try changing the query as shown below: wrap the extra columns in MAX and add a GROUP BY clause.
SELECT
MAX(C.coID),
C.coName,
MAX(AH.ahFirstName+' '+ AH.ahLastName ) AS CSRName,
COUNT(O.orID) AS numOrders,
SUM(O.orValue) AS totalRevenue
From Company C
LEFT JOIN AccountHandler AH ON C.coCSR = AH.ahID
LEFT JOIN [Order] O ON C.coID = O.orCompanyID
WHERE C.coCSR = 8 AND
O.orDate > getdate() - 365
Group by C.coName
ORDER BY C.coName ASC
My suggestion: use aliases for the selected column names.

SELECT
--<<<non aggregate section of SELECT clause
coID
, coName
, [CSRName] = CONVERT(VARCHAR(100),ahFirstName + ' ' + ahLastName)
--<<<aggregate section of SELECT clause
, [numOrders] = COUNT(orID)
, [totalRevenue] = SUM(orValue)
FROM --<<<<<<sql is not too happy without FROM
Company c
LEFT JOIN AccountHandler a
ON coCSR = ahID
LEFT JOIN [Order] o
ON coID = orCompanyID
WHERE coCSR = 8
AND orDate > getdate() - 365
GROUP BY
coID
, coName
, CONVERT(VARCHAR(100),ahFirstName + ' ' + ahLastName) --<<<<looks like aggregate but is just text manipulation
ORDER BY coName ASC
You have two aggregate functions, a COUNT and a SUM, which means you are required to do some grouping. The general rule of thumb is to GROUP BY the non-aggregate section of the SELECT clause.
The really big problem in your original post is that when you JOIN two tables, whatever the flavour (LEFT, RIGHT, OUTER, INNER, CROSS), the join has to be in the FROM clause and needs a table specified on either side of it.
If you are joining several tables, you might also like to use an alias for each table; I've just used single lower-case letters: c, o and a. Looking at your column names, though, these might not be needed, as all columns are uniquely named.

Related

Why do I have duplicate records in my JOIN

I am retrieving data from table ProductionReportMetrics where I have column NetRate_QuoteID. Then to that result set I need to get Description column.
And in order to get a Description column, I need to join 3 tables:
NetRate_Quote_Insur_Quote
NetRate_Quote_Insur_Quote_Locat
NetRate_Quote_Insur_Quote_Locat_Liabi
But after that my premium is completely off.
What am I doing wrong here?
SELECT QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID,
ISNULL(SUM(premium),0) AS NetWrittenPremium,
MONTH(prm.EffectiveDate) AS EffMonth
FROM ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q
ON prm.NetRate_QuoteID = Q.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat QL
ON Q.QuoteID = QL.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat_Liabi QLL
ON QL.LocationID = QLL.LocationID
WHERE YEAR(prm.EffectiveDate) = 2016 AND
CompanyLine = 'Ironshore Insurance Company'
GROUP BY MONTH(prm.EffectiveDate),
QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID
I think the problem is in this table:
What am I missing in this query?
select
ClassCode,
QLL.Description,
sum(Premium)
from ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID
LEFT JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID
LEFT JOIN
(SELECT * FROM NetRate_Quote_Insur_Quote_Locat_Liabi nqI
JOIN ( SELECT LocationID, MAX(ClassCode)
FROM NetRate_Quote_Insur_Quote_Locat_Liabi GROUP BY LocationID ) nqA
ON nqA.LocationID = nqI.LocationID ) QLL ON QLL.LocationID = QL.LocationID
where Year(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company'
GROUP BY Q.QuoteID,QL.QuoteID,QL.LocationID
Now it says
Msg 8156, Level 16, State 1, Line 14
The column 'LocationID' was specified multiple times for 'QLL'.

It looks like DVT basically hit on the answer. The only reason you would get different amounts (i.e. duplicated rows) as a result of a join is that one of the joined tables does not have a 1:1 relationship with the primary table.
I would suggest you do a quick check against those tables, comparing row counts.
--this should be your baseline count
SELECT COUNT(*)
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID
--this will be a check against the first joined table.
SELECT COUNT(*)
FROM NetRate_Quote_Insur_Quote Q
WHERE QuoteID IN
(SELECT prm.NetRate_QuoteID
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID)
Basically you will want to do a similar check against each of your joined tables. If any of the joined tables are part of the grouping statement, make sure they are also in the grouping of the count check statement. Also make sure to alter the WHERE clause of the check count statement to use the join clause columns you were using.
Once you find a table that returns the incorrect number of rows, you will know which table is causing the problem. Then you just have to decide how to limit that table down to distinct rows (some type of aggregation).
This advice is really just to show you how to QA this particular query. Break it up into the smallest possible parts. In this case, we know that it is a join that is causing the problem, so take it one join at a time until you find the offender.
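One way to run this kind of check is to look for join keys that occur more than once in a joined table. The sketch below uses a hypothetical table variable (@Locat, with invented rows) standing in for one of the real tables:

```sql
-- Hypothetical stand-in for a joined table such as NetRate_Quote_Insur_Quote_Locat.
DECLARE @Locat TABLE (LocationID int, QuoteID int);

INSERT INTO @Locat VALUES (100, 1), (101, 1), (200, 2);  -- quote 1 has two locations

-- Any key returned here matches more than one row, so joining on it
-- repeats the row from the other side and inflates any SUM over it.
SELECT QuoteID, COUNT(*) AS rowsPerKey
FROM @Locat
GROUP BY QuoteID
HAVING COUNT(*) > 1;
```

Here the query returns QuoteID 1 with a count of 2. An empty result would mean the join on that key cannot multiply rows.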

SQL: Select a column independent of where clause

SELECT TOP 1000 p.Title,p.Distributor, SUM(r.SalesVolume) AS VolumeOfSales,
CAST(SUM(r.CustomerPrice*r.SalesVolume) as decimal (18,0)) AS ValueOfSales,
CAST (AVG(r.CustomerPrice) as decimal (18,1)) AS AvgPrice,
p.MS_ContentType AS category ,Min(c.WeekId) AS ReleaseWeek
from Product p
INNER JOIN RawData r
ON p.ProductId = r.ProductId
INNER JOIN Calendar c
ON r.DayId = c.DayId
WHERE c.WeekId BETWEEN ('20145231') AND ('20145252')
AND p.Distributor IN ('WARNER', 'TF1', 'GAUMONT')
AND p.VODEST IN ('VOD', 'EST')
AND p.ContentFlavor IN ('SD', 'HD', 'NC')
AND p.MS_ExternalID1 IN ('ADVENTURE/ACTION', 'ANIMATION/FAMILY', 'COMEDY')
AND p.MS_ContentType IN ('FILM', 'TV', 'OTHERS')
AND r.CountryId = 1
GROUP BY p.Title,p.Distributor,p.MS_ContentType
ORDER BY VolumeOfSales DESC, ValueOfSales DESC
I want to modify the above query so that only the ReleaseWeek column is independent of the condition WHERE c.WeekId BETWEEN ('20145231') AND ('20145252').
The result I currently get looks like this:
Title | Distributor | VolumeOfSales | ValueOfSales | AvgPrice | category | ReleaseWeek
Divergente | M6SND | 94038 | 450095 | 4.0 | Film | 20145233
However, what I really want is for ReleaseWeek to be the first value of c.WeekId for that Title in the whole database, not the first one between ('20145231') AND ('20145252'). What is the best way to modify the query? Any leads would be greatly appreciated.

Combine two subqueries to main query in SQL

I am new to SQL and need assistance combining (JOINing) two subqueries with a main query. Here is what I have. Each query works independently of the others. The end result should retrieve the number of accommodations for each resort and the lowest cost of all accommodations for each resort, and join those results to the list of resort types and resorts.
DB Schema
Table 1 - Resort -
resort_id (PK)
resort_type_id (FK)
name
Table 2 - Resort_type -
resort_type_id (PK)
resort_type
Table 3 - Accommodations -
accommodations_id (PK)
resort_id (FK)
description
cost_per_night
Query
SELECT Resort.name, Resort_type, Acc.Accommodations, Low.min_price
FROM
(SELECT resort.name AS resort_name, Resort_type.resort_type
FROM Resort
INNER JOIN Resort_type
ON Resort.resort_type_id = Resort_type.resort_type_id
(SELECT resort_id, Count(resort_id) AS Accommodations
FROM Accommodations
GROUP BY resort_id) AS Acc
(SELECT resort_id, Min(cost_per_night) AS min_price
FROM Accommodations
GROUP BY resort_id) AS Low
Any guidance would be greatly appreciated. I am having a difficult time visualizing how this should come together.

The query below lists each resort and its type along with the number of accommodations and the lowest cost per night.
select
r.name,
t.resort_type as type,
count(a.accommodations_id) as accommodations,
min(cost_per_night) as lowestcost
from resort r
inner join resort_type t
on t.resort_type_id = r.resort_type_id
left join accommodations a
on a.resort_id = r.resort_id
group by r.name, t.resort_type
Example: http://sqlfiddle.com/#!9/fc089/6

Is this what you're looking for:
SELECT
r.name,
rt.resort_type,
t.no_of_accommodations,
t.min_price
FROM Resort r
INNER JOIN Resort_Type rt
ON rt.resort_type_id = r.resort_type_id
LEFT JOIN(
SELECT
resort_id,
COUNT(*) AS no_of_accommodations,
MIN(cost_per_night) AS min_price
FROM Accommodations
GROUP BY resort_id
)t
ON t.resort_id = r.resort_id

SELECT Specific Date in T-SQL Query?

I have three tables that are joined together in one query.
One of them gives me people's names, another gives me their points, and the last one gives me the date and time.
I select each person's total score.
There is also a column in the third table that holds the date and time of each score transaction. My problem is that I want to write a T-SQL query with this condition:
Select the transaction date where a person's score is 12,000 or more.
My idea is to use a WHILE loop, but I do not know the syntax.

This is how I would do it-
SELECT cp.FirstName
, cp.LastName
, SUM(Points) as Score
FROM ClubProfile cp
RIGHT JOIN CardTransaction ct
ON cp.ClubProfileId = ct.ClubProfileId
INNER JOIN Your3rdTable as t3
ON cp.ClubProfileId = t3.ClubProfileId
WHERE CONVERT(VARCHAR, ct.[Date Column], 101) = @your_date_param
GROUP BY
cp.FirstName
, cp.LastName
HAVING SUM(Points) >=12000

Based on your post this should be close to what you need. You need to add that 3rd table and alter this statement accordingly.
SELECT cp.FirstName
, cp.LastName
, SUM(Points) as Score
FROM [fidilio].[dbo].[ClubProfile] cp
RIGHT JOIN (
CardTransaction ct
INNER JOIN CardTransactionLog ctl
-- assumes CardTransactionLogId lives on CardTransaction; cp is not visible inside the nested join
ON ct.CardTransactionLogId = ctl.CardTransactionLogId
)
ON cp.ClubProfileId = ct.ClubProfileId
-- a non-aggregated column cannot be tested in HAVING, so the date filter goes in WHERE
WHERE ctl.TransactionTimeStamp = @SomeDateTimeVariable
GROUP BY
cp.FirstName
, cp.LastName
HAVING SUM(Points) >=12000
The variable @SomeDateTimeVariable has to come from somewhere; what is your exact time-frame criterion?

Join subquery with min

I'm pulling my hair out over a subquery that I'm using to avoid about 100 duplicates (out of about 40k records). The records that are duplicated are showing up because they have 2 dates in h2.datecreated for a valid reason, so I can't just scrub the data.
I'm trying to get only the earliest date to return. The first subquery (the one that starts with "select distinct address_id", with the MIN) works fine on its own: no duplicates are returned. So it would seem that the left join (or just a plain join; I've tried that too) couldn't possibly see the second h2.datecreated, since it doesn't even show up in the subquery. But when I run the whole query, it returns 2 values for some ipc.mfgid's, one with the h2.datecreated that I want and one that I don't want.
I know it's got to be something really simple, or something that just isn't possible. It really seems like it should work! This is MSSQL. Thanks!
select distinct ipc.mfgid as IPC, h2.datecreated,
case when ad.Address is null
then ad.buildingname end as Address, cast(trace.name as varchar)
+ '-' + cast(trace.Number as varchar) as ONT,
c.ACCOUNT_Id,
case when h.datecreated is not null then h.datecreated
else h2.datecreated end as Install
from equipmentjoin as ipc
left join historyjoin as h on ipc.id = h.EQUIPMENT_Id
and h.type like 'add'
left join circuitjoin as c on ipc.ADDRESS_Id = c.ADDRESS_Id
and c.GRADE_Code like '%hpna%'
join (select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment)
as h2 on c.address_id = h2.address_id
left join (select car.id, infport.name, carport.number, car.PCIRCUITGROUP_Id
from circuit as car (NOLOCK)
join port as carport (NOLOCK) on car.id = carport.CIRCUIT_Id
and carport.name like 'lead%'
and car.GRADE_Id = 29
join circuit as inf (NOLOCK) on car.CCIRCUITGROUP_Id = inf.PCIRCUITGROUP_Id
join port as infport (NOLOCK) on inf.id = infport.CIRCUIT_Id
and infport.name like '%olt%' )
as trace on c.ccircuitgroup_id = trace.pcircuitgroup_id
join addressjoin as ad (NOLOCK) on ipc.address_id = ad.id

The typical approach to only getting the lowest row is one of the following. You didn't bother to specify what version of SQL Server you're using, what you want to do with ties, and I have little interest to try to work this into your complex query, so I'll show you an abstract simplification for different versions.
SQL Server 2000
SELECT x.grouping_column, x.min_column, x.other_columns ...
FROM dbo.foo AS x
INNER JOIN
(
SELECT grouping_column, min_column = MIN(min_column)
FROM dbo.foo GROUP BY grouping_column
) AS y
ON x.grouping_column = y.grouping_column
AND x.min_column = y.min_column;
SQL Server 2005+
;WITH x AS
(
SELECT grouping_column, min_column, other_columns,
rn = ROW_NUMBER() OVER (PARTITION BY grouping_column ORDER BY min_column)
FROM dbo.foo
)
SELECT grouping_column, min_column, other_columns
FROM x
WHERE rn = 1;
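A runnable sketch of the ROW_NUMBER approach, numbering rows within each group via PARTITION BY so that the earliest row of every group gets rn = 1. The table variable @History and its rows are invented for illustration and only loosely mirror the question's history table:

```sql
-- Hypothetical data: two history rows for equipment 1, one for equipment 2.
DECLARE @History TABLE (equipment_id int, datecreated datetime, comment varchar(50));

INSERT INTO @History VALUES
    (1, '20120105', 'MAC: 5A'),
    (1, '20120309', 'MAC: 5B'),
    (2, '20120201', 'MAC: 5C');

;WITH x AS
(
    SELECT equipment_id, datecreated, comment,
        -- Restart the numbering for each equipment_id, earliest date first.
        rn = ROW_NUMBER() OVER (PARTITION BY equipment_id ORDER BY datecreated)
    FROM @History
)
SELECT equipment_id, datecreated, comment
FROM x
WHERE rn = 1;
```

This returns one row per equipment_id (the 5 January row for equipment 1 and the single row for equipment 2). Because comment is not part of the partition, differing comments no longer produce extra rows.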

This subquery:
select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment
will probably return multiple rows, because the comment is not guaranteed to be the same and it is part of the GROUP BY.
Try this instead:
CROSS APPLY (
SELECT TOP 1 H2.DateCreated, H2.Comment -- H2.Equipment_id wasn't used
FROM History H2
WHERE
H2.Comment LIKE 'MAC: 5%'
AND C.Address_ID = H2.Address_ID
ORDER BY DateCreated
) H2
Switch that to OUTER APPLY in case you want rows that don't have a matching desired history entry.
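To illustrate the difference between the two APPLY forms, here is a small sketch with hypothetical table variables (@Circuit and @History, with invented rows) in place of the real tables:

```sql
DECLARE @Circuit TABLE (address_id int);
DECLARE @History TABLE (address_id int, datecreated datetime, comment varchar(50));

INSERT INTO @Circuit VALUES (1), (2);
INSERT INTO @History VALUES
    (1, '20120105', 'MAC: 5A'),
    (1, '20120309', 'MAC: 5B');   -- address 2 has no history rows

SELECT c.address_id, h2.datecreated, h2.comment
FROM @Circuit c
OUTER APPLY (
    -- Evaluated once per outer row: the earliest matching history entry.
    SELECT TOP 1 h.datecreated, h.comment
    FROM @History h
    WHERE h.comment LIKE 'MAC: 5%'
      AND h.address_id = c.address_id
    ORDER BY h.datecreated
) h2;
```

With OUTER APPLY, address 2 is kept with NULLs in the history columns; replacing it with CROSS APPLY would return only address 1, with its earliest entry.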
