Invalid left join on SQL Server [duplicate] - sql-server

This question already has answers here:
Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause [duplicate]
(4 answers)
Closed 1 year ago.
I'm trying to execute a simple left join on SQL Server, but it keeps giving me the same error message.
Select * from customers left join orders on customers.id = orders.customer_id group by customers.id order by amount;
Msg 8120, Level 16, State 1, Line 39 Column 'customers.first_name' is
invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause.
I'm not sure what else to do.
If it helps, here are my tables:
CREATE TABLE customers(id INT IDENTITY(1,1) PRIMARY KEY, first_name VARCHAR(100), last_name VARCHAR(100), email VARCHAR(100));
CREATE TABLE orders(id INT IDENTITY(1,1) PRIMARY KEY, order_date DATE, amount DECIMAL(8,2), customer_id INT, FOREIGN KEY(customer_id) REFERENCES customers(id));

Once you create an aggregate using GROUP BY, you can only SELECT what you have grouped by and the aggregate functions (e.g., MAX, MIN, SUM, COUNT, etc.). For your query:
SELECT *
FROM customers
LEFT JOIN orders on customers.id = orders.customer_id
GROUP BY customers.id
ORDER BY amount;
Since you GROUP BY customers.id, the only things that can appear in your SELECT list are customers.id and aggregate functions. You are getting the error because the * means all columns, but you are only allowed to use customers.id and aggregate functions. For example, this would work:
SELECT customers.id
FROM customers
LEFT JOIN orders on customers.id = orders.customer_id
GROUP BY customers.id
ORDER BY customers.id; -- ORDER BY is bound by the same rule, so a bare amount is not allowed here
As far as what you want to see, I cannot tell from this query. If you wanted to see the smallest order for each customer id, you could do that with:
SELECT customers.id, MIN(orders.amount) AS [SmallestOrder]
FROM customers
LEFT JOIN orders on customers.id = orders.customer_id
GROUP BY customers.id
ORDER BY SmallestOrder; -- sort by the aggregate, not by the ungrouped amount column
So the cause of the error is hopefully clear now, but what data you want to find isn't.
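If the goal is to keep the customer's name columns in the output, one common option is to list every non-aggregated column in the GROUP BY. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample rows are invented. SQLite is more permissive than SQL Server about ungrouped columns, so it will not reproduce Msg 8120; it only shows the shape of a query that satisfies the rule.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    amount REAL,
    customer_id INTEGER REFERENCES customers(id)
);
-- Invented sample rows: customer 3 has no orders.
INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Diaz'), (3, 'Cy', 'Roe');
INSERT INTO orders (amount, customer_id) VALUES (50.0, 1), (20.0, 1), (35.0, 2);
""")

# Every selected column is either in the GROUP BY or wrapped in an aggregate.
rows = conn.execute("""
SELECT c.id, c.first_name, c.last_name,
       COUNT(o.id)   AS order_count,
       MIN(o.amount) AS smallest_order
FROM customers AS c
LEFT JOIN orders AS o ON c.id = o.customer_id
GROUP BY c.id, c.first_name, c.last_name
ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

The customer without orders still appears because of the LEFT JOIN, with a count of 0 and a NULL smallest order.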

Related

User Defined Function with a sub query

Create a user defined function named XXRepeatCustomer (where the XX are your initials). The function is to have one input parameter. Use the INT datatype for the input parameter. When the function is executed it is to return a three column table (CustFirstName, CustLastName, and Phone) for customers that placed a number of orders greater than or equal to the number passed in via the input parameter.
In order to receive a total number of orders placed I have joined together the Customer and CustOrder tables. The problem only wants me to show the first name, last name, and phone of each customer, not the total number of orders. I'm struggling with assigning the @orders parameter and counting the total number of orders in the subquery.
CREATE FUNCTION dbo.JERepeatCustomer
(@orders INT)
RETURNS TABLE AS
RETURN (SELECT CustFirstName, CustLastName, Phone
FROM Customer C JOIN CustOrder CO
ON C.CustomerID = CO.CustomerID
WHERE @orders <= OrderID AND OrderID = (SELECT COUNT (DISTINCT OrderID) FROM CustOrder)
GROUP BY CustFirstName, CustLastName, Phone)
I expect the user to enter a 7, or any number, and the results show only the customers who have ordered 7, or more.
The keyword you need is HAVING. HAVING is similar to WHERE, but WHERE filters individual rows before they are grouped, while HAVING filters the groups afterwards, based on an aggregated value such as COUNT or SUM.
For example, you have a customer table, and in your orders table, you have all the orders for each customer.
DECLARE @input INT = 7;
SELECT ct.customer, ct.phone, COUNT(ot.orderID)
FROM customertable ct
INNER JOIN ordertable ot
ON ct.customerID = ot.customerID
GROUP BY ct.customer, ct.phone
HAVING COUNT(ot.OrderID) >= @input
SELECT CustFirstName, CustLastName, Phone
FROM Customer C
CROSS APPLY (
SELECT COUNT(*) AS Orders
FROM CustOrder CO
WHERE C.CustomerID = CO.CustomerID) CustomerOrders
WHERE CustomerOrders.Orders >= @orders
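The HAVING approach can be tried end to end with a small, self-contained sketch. It is only an illustration: Python's sqlite3 module stands in for SQL Server, the helper name repeat_customers is hypothetical (it plays the role of the table-valued function), and the sample rows are invented.

```python
import sqlite3

def repeat_customers(conn, min_orders):
    """Return customers who placed at least min_orders orders (hypothetical helper)."""
    return conn.execute("""
        SELECT c.CustFirstName, c.CustLastName, c.Phone
        FROM Customer AS c
        JOIN CustOrder AS co ON c.CustomerID = co.CustomerID
        GROUP BY c.CustomerID, c.CustFirstName, c.CustLastName, c.Phone
        HAVING COUNT(co.OrderID) >= ?   -- filter on the per-customer order count
        ORDER BY c.CustomerID
    """, (min_orders,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    CustFirstName TEXT, CustLastName TEXT, Phone TEXT
);
CREATE TABLE CustOrder (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customer(CustomerID)
);
-- Invented sample rows: Ana has 3 orders, Ben has 1, Cal has none.
INSERT INTO Customer VALUES
    (1, 'Ana', 'Ruiz', '555-0101'),
    (2, 'Ben', 'Moss', '555-0102'),
    (3, 'Cal', 'Voss', '555-0103');
INSERT INTO CustOrder (CustomerID) VALUES (1), (1), (1), (2);
""")

print(repeat_customers(conn, 2))
print(repeat_customers(conn, 1))
```

The order count is used only inside HAVING, so it never has to appear in the three-column result.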

SQL query with two tables, count and more info

I'm just learning this stuff and I'm having trouble with this one. I have two tables, STUDENTS and ADVISORS. The students are assigned advisors within the students table using a foreign key attached to the primary key of the advisors table.
The task here is this: Provide a list of all advisors and the number of active students assigned to each. Filter out any advisors with more than 1 student.
The current script is listed below:
select
Students.AdvisorID, count(Students.AdvisorID) as 'TotalStudents'
from
Students
left outer join
Advisors on Students.AdvisorID = Advisors.AdvisorID
where
Students.IsActive = 1
Group by
Students.AdvisorID
Having
count(Students.AdvisorID) < 2
This will output a proper list showing only the advisorID and total students.
I need to also display the
Advisors.FirstName + ' ' + Advisors.LastName as 'AdvisorName'
Any help would be greatly appreciated.
EDIT
students table
advisors table
I think your original attempt is on the right track, but you need to join again to the Advisors table to pull in the first and last name for each adviser. The reason for this is that after doing the aggregation all that remains is an ID for each adviser and a student count.
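SELECT t1.AdvisorID,
t2.TotalStudents,
t1.FirstName + ' ' + t1.LastName AS AdvisorName
FROM Advisors t1
INNER JOIN
(
-- COUNT(s.AdvisorID) skips the NULL row a LEFT JOIN produces for an adviser with no students
SELECT a.AdvisorID, COUNT(s.AdvisorID) AS TotalStudents
FROM Advisors a
LEFT JOIN Students s
ON a.AdvisorID = s.AdvisorID
AND s.IsActive = 1 -- only active students, as in the question
GROUP BY a.AdvisorID
HAVING COUNT(s.AdvisorID) < 2
) t2
ON t1.AdvisorID = t2.AdvisorID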
Other notes:
I chose to LEFT JOIN advisers to students, not the other way around, since you want a statistic for each adviser. Doing the join as you first had it could filter out advisers who do not match to any student. This is not the behavior you want, since an adviser who does not match to any student should have a student count of zero.
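One detail is worth checking when counting over a LEFT JOIN: COUNT(*) counts the NULL-extended row that the join produces for an adviser with no students, while COUNT of a column from the Students side does not. The sketch below shows the difference. It assumes Python's sqlite3 module as a stand-in for SQL Server, a StudentID column, and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Advisors (AdvisorID INTEGER PRIMARY KEY, AdvisorName TEXT);
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    AdvisorID INTEGER REFERENCES Advisors(AdvisorID)
);
-- Invented sample rows: adviser 2 has no students at all.
INSERT INTO Advisors VALUES (1, 'Mr. White'), (2, 'Mr. Pinkman'), (3, 'Mr. Fring');
INSERT INTO Students (AdvisorID) VALUES (1), (3), (3);
""")

counts = conn.execute("""
SELECT a.AdvisorID,
       COUNT(*)           AS star_count,     -- counts the NULL-extended row too
       COUNT(s.StudentID) AS student_count   -- counts only real student rows
FROM Advisors AS a
LEFT JOIN Students AS s ON a.AdvisorID = s.AdvisorID
GROUP BY a.AdvisorID
ORDER BY a.AdvisorID
""").fetchall()

for advisor_id, star_count, student_count in counts:
    print(advisor_id, star_count, student_count)
```

For adviser 2 the two counts disagree (1 versus 0), which matters as soon as a HAVING clause compares the count against a threshold.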
Here's a little sample data to work with
USE tempdb
GO
IF OBJECT_ID('tempdb.dbo.Advisors') IS NOT NULL DROP TABLE dbo.Advisors;
IF OBJECT_ID('tempdb.dbo.Students') IS NOT NULL DROP TABLE dbo.Students;
CREATE TABLE dbo.Advisors (AdvisorID int primary key, AdvisorName varchar(100));
CREATE TABLE dbo.Students
(
studentID int identity primary key,
AdvisorID int foreign key references dbo.Advisors(AdvisorID)
);
INSERT dbo.Advisors VALUES (1, 'Mr. White'),(2,'Walter Jr.'),(3,'Mr. Pinkman');
INSERT dbo.Students (AdvisorID)
SELECT TOP (20) abs(checksum(newid())%3)+1 FROM sys.all_columns;
No Left Join needed, I think this will give you what you are looking for.
SELECT a.AdvisorID, total_students = COUNT(*)
FROM dbo.Advisors a
INNER JOIN dbo.Students s ON a.AdvisorID = s.AdvisorID
GROUP BY a.AdvisorID
HAVING COUNT(*) < 2;

SELECT from multiple queries

I have these tables:
tblDiving(
diving_number int primary key,
diving_club int,
date_of_diving date)
tblDivingClub(
number int primary key not null check (number>0),
name char(30),
country char(30))
tblWorks_for(
diver_number int,
club_number int,
end_working_date date)
tblCountry(
name char(30) not null primary key)
I need to write a query that returns the name of each country and the number of "Super clubs" in it.
A Super club is a club that has more than 25 working divers (tblWorks_for.end_working_date is null) or has had more than 100 dives (tblDiving) in the last year.
After I get the country and the number of Super clubs, I need to show only the countries that contain more than 2 Super clubs.
I wrote these 2 queries:
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as number_of_guids
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
select tblDivingClub.name, count(distinct tblDiving.diving_number) as number_of_divings
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
But I don't know how to continue.
Each query works on its own, but how do I combine them and select from the result?
It's a university assignment, and I'm not allowed to use views or temporary tables.
It's my first program, so I'm not really sure what I'm doing :)
WITH CTE AS (
-- clubs with their number of currently working divers
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as diving_number
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
UNION ALL
-- clubs with their number of dives during the last year
select tblDivingClub.name, count(distinct tblDiving.diving_number) as diving_number
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving >= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
)
SELECT * FROM CTE
You can combine the queries using a UNION ALL as long as each query returns the same number of columns with compatible data types. You can then roll them into a Common Table Expression (CTE) and select from that.
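To get from the combined result to a per-country count, one possible continuation is to move each threshold into a HAVING clause, combine the two sets of qualifying clubs with UNION (so a club that qualifies both ways is counted once), and then group by country. The sketch below is a simplified illustration, not the assignment's solution: it uses Python's sqlite3 module instead of SQL Server, shortened table names, a fixed cutoff date in place of DATEADD/GETDATE, invented rows, and small thresholds passed as parameters so that the tiny sample produces output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE club (number INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE works_for (diver_number INTEGER, club_number INTEGER, end_working_date TEXT);
CREATE TABLE diving (diving_number INTEGER PRIMARY KEY, diving_club INTEGER, date_of_diving TEXT);
-- Invented sample rows.
INSERT INTO club VALUES (1, 'Reef', 'Israel'), (2, 'Coral', 'Israel'), (3, 'Kelp', 'Greece');
INSERT INTO works_for VALUES (10, 1, NULL), (11, 1, NULL), (12, 2, '2020-01-01'), (13, 3, NULL);
INSERT INTO diving (diving_club, date_of_diving) VALUES
    (1, '2024-02-01'), (1, '2024-02-02'), (1, '2024-02-03'),
    (2, '2024-03-01'), (2, '2024-04-01'), (2, '2024-05-01'),
    (3, '2020-01-01'), (3, '2020-01-02'), (3, '2020-01-03');
""")

# Thresholds are parameters; the assignment's values would be 25, 100 and 2.
params = {"min_divers": 1, "min_dives": 2, "min_super": 1, "cutoff": "2023-06-01"}

result = conn.execute("""
WITH super_clubs AS (
    SELECT club_number AS number
    FROM works_for
    WHERE end_working_date IS NULL
    GROUP BY club_number
    HAVING COUNT(DISTINCT diver_number) > :min_divers
    UNION                       -- UNION (not UNION ALL) removes clubs that qualify twice
    SELECT diving_club
    FROM diving
    WHERE date_of_diving >= :cutoff
    GROUP BY diving_club
    HAVING COUNT(*) > :min_dives
)
SELECT c.country, COUNT(*) AS super_club_count
FROM super_clubs AS sc
JOIN club AS c ON c.number = sc.number
GROUP BY c.country
HAVING COUNT(*) > :min_super
ORDER BY c.country
""", params).fetchall()

print(result)
```

Club 1 qualifies on both criteria but is counted once, and the Greek club drops out because its dives are older than the cutoff.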

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also with ssn, but including name and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table for a certain building, get the most recent name from the second table, and allow the result set to be sorted by any of the returned columns.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date to enter these records: the higher the number, the more recent the entry. It is common/expected that each employee has several records, but several may not have the most recent one (i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record for each ssn gets a value of 1. Then just select that row as the most recent record:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
@BuildingNumber char(7)
)
returns table as
return (
select t1.ssn, t1.buildingNumber, t2.fName, t2.lName
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.*, t2.fName, t2.lName,
row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn);
In all these cases, an index on table2(ssn, sequence) should help performance.
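The row_number and max(sequence) variants can be compared side by side on a tiny data set. This is only a sketch under stated assumptions: Python's sqlite3 module (SQLite 3.25 or later, for window functions) stands in for SQL Server, the index name ix_table2_ssn_sequence is made up, and the rows are invented. It says nothing about how the two forms perform on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ssn TEXT, buildingNumber TEXT);
CREATE TABLE table2 (ssn TEXT, fName TEXT, lName TEXT, sequence INTEGER);
-- Index on (ssn, sequence), as suggested; the name is hypothetical.
CREATE UNIQUE INDEX ix_table2_ssn_sequence ON table2 (ssn, sequence);
-- Invented sample rows: employee 111 changed last name between sequence 1 and 3.
INSERT INTO table1 VALUES ('111', 'B1'), ('222', 'B1'), ('333', 'B2');
INSERT INTO table2 VALUES
    ('111', 'Al', 'Old', 1), ('111', 'Al', 'New', 3),
    ('222', 'Bea', 'Only', 2),
    ('333', 'Cy', 'Other', 8);
""")

# Variant 1: number each employee's rows, newest first, and keep row 1.
WINDOW_SQL = """
SELECT t1.ssn, t2.fName, t2.lName
FROM table1 AS t1
JOIN (SELECT ssn, fName, lName,
             ROW_NUMBER() OVER (PARTITION BY ssn ORDER BY sequence DESC) AS seqnum
      FROM table2) AS t2
  ON t1.ssn = t2.ssn AND t2.seqnum = 1
WHERE t1.buildingNumber = ?
ORDER BY t1.ssn
"""

# Variant 2: keep the row whose sequence equals that employee's maximum.
MAX_SQL = """
SELECT t1.ssn, t2.fName, t2.lName
FROM table1 AS t1
JOIN table2 AS t2 ON t1.ssn = t2.ssn
WHERE t1.buildingNumber = ?
  AND t2.sequence = (SELECT MAX(sequence) FROM table2 AS t2a WHERE t2a.ssn = t1.ssn)
ORDER BY t1.ssn
"""

by_window = conn.execute(WINDOW_SQL, ("B1",)).fetchall()
by_max = conn.execute(MAX_SQL, ("B1",)).fetchall()
print(by_window)
print(by_max)
```

Both variants return one row per employee in the building, carrying the name from the highest sequence value.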
Try using temp tables instead of table variables. Not sure what kind of system you are working on, but I have had pretty good luck with them. Both kinds are stored in tempdb, but temp tables get column statistics, which table variables do not, so the optimizer can usually estimate row counts better. Depending on other system usage this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery in a temp table before everything else fires off and join to it.
Just make sure to drop the table at the end. It will be dropped automatically when the stored procedure finishes, but it is a good idea to drop it explicitly to be on the safe side.

How can I get this query to return 0 instead of null?

I have this query:
SELECT (SUM(tblTransaction.AmountPaid) - SUM(tblTransaction.AmountCharged)) AS TenantBalance, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
But there's a problem with it: there are other TenantIDs that don't have transactions, and I want to get those too.
For example, the transaction table has 3 rows for bob, 2 rows for john and none for jane. I want it to return the sum for bob and john AND return 0 for jane (or possibly null if there's no other way).
How can I do this?
Tables are like this:
Tenants
ID
Other Data
Transactions
ID
TenantID (fk to Tenants)
Other Data
(You didn't state your sql engine, so I'm going to link to the MySQL documentation).
This is pretty much exactly what the COALESCE() function is meant for. You can feed it a list, and it'll return the first non-null value in the list. You would use this in your query as follows:
SELECT COALESCE((SUM(tr.AmountPaid) - SUM(tr.AmountCharged)), 0) AS TenantBalance, te.ID
FROM tblTenant AS te
LEFT JOIN tblTransaction AS tr ON (tr.TenantID = te.ID)
GROUP BY te.ID;
That way, if the SUM() result would be NULL, it's replaced with zero.
Edited: I rewrote the query using a LEFT JOIN as well as the COALESCE(), I think this is the key of what you were missing originally. If you only select from the Transactions table, there is no way to get information about things not in the table. However, by using a left join from the Tenants table, you should get a row for every existing tenant.
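The combination of LEFT JOIN and COALESCE() can be checked with the bob, john and jane example from the question. The sketch below is an illustration under stated assumptions: it runs on Python's sqlite3 module rather than the asker's engine, and the Name column and the amounts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTenant (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblTransaction (
    ID INTEGER PRIMARY KEY,
    TenantID INTEGER REFERENCES tblTenant(ID),
    AmountPaid REAL,
    AmountCharged REAL
);
-- Invented sample rows: 3 transactions for bob, 2 for john, none for jane.
INSERT INTO tblTenant VALUES (1, 'bob'), (2, 'john'), (3, 'jane');
INSERT INTO tblTransaction (TenantID, AmountPaid, AmountCharged) VALUES
    (1, 5.0, 10.0), (1, 10.0, 10.0), (1, 10.0, 10.0),
    (2, 20.0, 15.0), (2, 15.0, 15.0);
""")

balances = conn.execute("""
SELECT te.ID, te.Name,
       COALESCE(SUM(tr.AmountPaid) - SUM(tr.AmountCharged), 0) AS TenantBalance
FROM tblTenant AS te
LEFT JOIN tblTransaction AS tr ON tr.TenantID = te.ID   -- keeps tenants with no rows
GROUP BY te.ID, te.Name
ORDER BY te.ID
""").fetchall()

for tenant_id, name, balance in balances:
    print(tenant_id, name, balance)
```

Without the LEFT JOIN jane would not appear at all, and without COALESCE() her balance would come back as NULL.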
Below is a full walkthrough of the problem. The function isnull has also been included to ensure that a balance of zero (rather than null) is returned for Tenants with no transactions.
create table tblTenant
(
ID int identity(1,1) primary key not null,
Name varchar(100)
);
create table tblTransaction
(
ID int identity(1,1) primary key not null,
tblTenantID int,
AmountPaid money,
AmountCharged money
);
insert into tblTenant(Name)
select 'bob' union all select 'Jane' union all select 'john';
insert into tblTransaction(tblTenantID,AmountPaid, AmountCharged)
select 1,5.00,10.00
union all
select 1,10.00,10.00
union all
select 1,10.00,10.00
union all
select 2,10.00,15.00
union all
select 2,15.00,15.00
select * from tblTenant
select * from tblTransaction
SELECT
tenant.ID,
tenant.Name,
isnull(SUM(Trans.AmountPaid) - SUM(Trans.AmountCharged),0) AS Balance
FROM tblTenant tenant
LEFT JOIN tblTransaction Trans ON
tenant.ID = Trans.tblTenantID
GROUP BY tenant.ID, tenant.Name;
drop table tblTenant;
drop table tblTransaction;
Select Tenants.ID, ISNULL((SUM(Transactions.AmountPaid) - SUM(Transactions.AmountCharged)), 0) AS TenantBalance
From Tenants
Left Outer Join Transactions On Tenants.ID = Transactions.TenantID
Group By Tenants.ID
I didn't syntax check it but it is close enough.
SELECT (SUM(ISNULL(tblTransaction.AmountPaid, 0))
- SUM(ISNULL(tblTransaction.AmountCharged, 0))) AS TenantBalance
, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
I only added this because if your intention is to account for either of the two amounts being null, you'll need to apply ISNULL to each one separately.
Actually, I found an answer:
SELECT tenant.ID, ISNULL(SUM(trans.AmountPaid) - SUM(trans.AmountCharged),0) AS Balance FROM tblTenant tenant
LEFT JOIN tblTransaction trans
ON tenant.ID = trans.TenantID
GROUP BY tenant.ID
