SQL - Selecting counts from multiple tables - sql-server

Here is my problem (I'm using SQL Server)
I have a table of Students (StudentId, Firstname, Lastname, etc).
I have a table that records StudentAttendance (StudentId, ClassDate, etc.)
I record other student activity (I'm generalizing here for simplicity) such as a Papers table (StudentId, PaperId, etc.). There may be anywhere from zero to 20 papers turned in. Similarly, there is a table called Projects (StudentId, ProjectId, etc.). Same deal as with Papers.
What I'm trying to do is create a list of counts for students who have attendance over a certain level (say 10 attendances). Something like this:
ID Name Att Paper Proj
123 Baker 23 0 2
234 Charlie 26 5 3
345 Delta 13 3 0
Here is what I have:
select
s.StudentId,
s.Lastname,
COUNT(sa.StudentId) as CountofAttendance,
COUNT(p.StudentId) as CountofPapers
from Student s
inner join StudentAttendance sa on (s.StudentId = sa.StudentId)
left outer join Paper p on (s.StudentId = p.StudentId)
group by s.StudentId, s.Lastname
Having COUNT(sa.StudentId) > 10
order by CountofAttendance
If the CountofPaper and join (either inner or left outer) to the Papers table is commented out, the query works fine. I get a nice count of students who have attended at least 10 classes.
However, if I put in the CountofPapers and the join, things get crazy. With a left outer join, any students with papers just show their attendance count in the paper column. With an inner join, both attendance and paper counts seem to multiply off each other.
Guidance needed and appreciated.
Dave

Look at using Common Table Expressions and then divide and conquer your problem. BTW, you are off by 1 in your original query: with > 10 the minimum attendance is 11, which is why the HAVING below uses >= 10.
;
WITH GOOD_STUDENTS AS
(
-- this query defines all students with 10+ attendance
SELECT
S.StudentID
, count(1) AS attendance_count
FROM
Student S
inner join
StudentAttendance sa
on (s.StudentId = sa.StudentId)
GROUP BY
S.StudentId
HAVING
COUNT(1) >= 10
)
, STUDIOUS_STUDENTS AS
(
-- lather, rinse, repeat for other metrics
SELECT
S.StudentID
, count(1) AS paper_count
FROM
Student S
inner join
Papers P
on (s.StudentId = P.StudentId)
GROUP BY
S.StudentId
)
, GREGARIOUS_STUDENTS AS
(
SELECT
S.StudentID
, count(1) AS project_count
FROM
Student S
inner join
Projects P
on (s.StudentId = P.StudentId)
GROUP BY
S.StudentId
)
-- And now we roll it all together
SELECT
S.*
, G.attendance_count
, COALESCE(SS.paper_count, 0) AS paper_count
, COALESCE(GS.project_count, 0) AS project_count
-- ad nauseam
FROM
Student S
-- inner join: keep only students who met the attendance threshold
INNER JOIN
GOOD_STUDENTS G
ON G.studentId = S.studentId
-- left joins: a student may have turned in no papers or projects
LEFT OUTER JOIN
STUDIOUS_STUDENTS SS
ON SS.studentId = S.studentId
LEFT OUTER JOIN
GREGARIOUS_STUDENTS GS
ON GS.studentId = S.studentId
I see plenty of other answers rolling in but I typed for far too long to quit ;)

The problem is that there are multiple papers per student, so the join produces one row for every combination of a StudentAttendance row and a Paper row, and each COUNT ends up counting those combined rows. Try this:
select
s.StudentId,
s.Lastname,
(select COUNT(*) from StudentAttendance sa where sa.StudentId = s.StudentId) as CountofAttendance,
(select COUNT(*) from Paper p where p.StudentId = s.StudentId) as CountofPapers
from Student s
where (select COUNT(*) from StudentAttendance sa where sa.StudentId = s.StudentId) > 10
order by CountofAttendance
EDITED to incorporate issue with reference to CountofAttendance
btw, this isn't the fastest solution, but it is the easiest to understand, which was my intention. You can avoid the re-calculation by using a join to an aliased select, but as I said, this is the simplest.
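To see the row multiplication for yourself, here is a small reproduction. It is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Lastname TEXT);
    CREATE TABLE StudentAttendance (StudentId INTEGER, ClassDate TEXT);
    CREATE TABLE Paper (StudentId INTEGER, PaperId INTEGER);
    INSERT INTO Student VALUES (1, 'Baker'), (2, 'Charlie');
    -- Baker: 3 attendances, 2 papers. Charlie: 2 attendances, no papers.
    INSERT INTO StudentAttendance VALUES
        (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-03'),
        (2, '2024-01-01'), (2, '2024-01-02');
    INSERT INTO Paper VALUES (1, 10), (1, 11);
""")

# Joining both child tables at once pairs every attendance row
# with every paper row of the same student.
joined = conn.execute("""
    SELECT s.Lastname, COUNT(sa.StudentId), COUNT(p.StudentId)
    FROM Student s
    JOIN StudentAttendance sa ON sa.StudentId = s.StudentId
    LEFT JOIN Paper p ON p.StudentId = s.StudentId
    GROUP BY s.StudentId, s.Lastname
    ORDER BY s.StudentId
""").fetchall()

# Counting each child table in its own correlated subquery avoids the fan-out.
separate = conn.execute("""
    SELECT s.Lastname,
           (SELECT COUNT(*) FROM StudentAttendance sa
             WHERE sa.StudentId = s.StudentId),
           (SELECT COUNT(*) FROM Paper p
             WHERE p.StudentId = s.StudentId)
    FROM Student s
    ORDER BY s.StudentId
""").fetchall()

print(joined)    # [('Baker', 6, 6), ('Charlie', 2, 0)]
print(separate)  # [('Baker', 3, 2), ('Charlie', 2, 0)]
```

Baker's 3 attendances and 2 papers become 6 joined rows, so both counts report 6. Counted separately, they come back as 3 and 2.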

Try this:
select std.StudentId, std.Lastname, att.AttCount, pap.PaperCount, prj.ProjCount
from Students std
left join
(
select StudentId, count(*) AttCount
from StudentAttendance
group by StudentId
) att on
std.StudentId = att.StudentId
left join
(
select StudentId, count(*) PaperCount
from Papers
group by StudentId
) pap on
std.StudentId = pap.StudentId
left join
(
select StudentId, count(*) ProjCount
from Projects
group by StudentId
) prj on
std.StudentId = prj.StudentId
where att.AttCount > 10

Related

Select columns from several tables with count

I have 3 tables in SQL Server:
Sales (customerId)
Customer (customerId, personId)
Person (personId, firstName, lastName)
and I need to return the top 10 customers.
I used this query:
SELECT TOP 10
CustomerID, COUNT(CustomerID)
FROM
Sales
GROUP BY
(CustomerID)
ORDER BY
COUNT(CustomerID) DESC
The query currently returns only the customerId and count, but I also need to return the firstName and lastName of these customers from the Person table.
I know I need to reach the firstName and lastName by correlating between Sales.customerId and Customer.customerId, and from Customer.personId to get the Person.personId.
My question is whether I need to use an inner join or union, and how to use either of them to get the firstName and lastName of these customers
UNION stacks the rows of two queries that return the same columns; it does not relate rows across tables. To pull the names in alongside the counts, use an inner join.
Here is a query using joins that does what you need:
SELECT TOP 10 S.CustomerID, P.FirstName,P.LastName, count(*)
FROM Sales S
INNER JOIN Customer C on S.CustomerId=C.CustomerId
INNER JOIN Person P on C.PersonId = P.PersonId
GROUP BY S.CustomerID, P.FirstName, P.LastName
ORDER BY count(*) DESC
You need to use an inner join, like this:
SELECT TOP 10 S.CustomerID
, P.FirstName
, P.LastName
, COUNT(1) AS CountOfCustomer -- equivalent to COUNT(*)
FROM Sales S
INNER JOIN Customer C ON S.CustomerId = C.CustomerId
INNER JOIN Person P ON C.PersonId = P.PersonId
GROUP BY S.CustomerID, P.FirstName, P.LastName
ORDER BY 4 DESC; -- sorts by the 4th column, i.e. the count

SQL query with Northwind database

I have a small problem with a query.
This is the task:
Create a query that displays the employees with no sale in the last 3 months to customers who are from "USA".
This is what I wrote:
Select emp.EmployeeID, (emp.FirstName + ' ' + emp.LastName) AS Name
From Employees AS emp
Join Orders AS o ON emp.EmployeeID = o.EmployeeID
Join Customers AS c ON o.CustomerID = c.CustomerID
Where c.Country LIKE 'USA';
One problem is that I don't know where to put this select query (it's meant to calculate the last 3 months, but I'm not sure it's correct):
Select DATEDIFF(MM, '1998-02-01', '1998-05-31') From Orders
The second problem is that I have no idea how to handle the "employees with no sale" part. How can I find them?
Should I use another kind of join, or something else?
Sorry for the question, but I'm new to SQL and I'd appreciate any kind of help.
If you have any questions, please ask. :)
Create a query that displays the employees with no sale in the last 3 months to customers who are from "USA"
Select * From Employees e
Where not exists
(Select * from orders o join customers c
on c.CustomerID = o.CustomerID
Where o.EmployeeID = e.EmployeeID
and c.Country = 'USA'
and o.OrderDate >= DateAdd(month, -3, getdate()))
How you actually treat "last 3 months" depends on you, but this is one way to approach the task:
select e.*
from employees e
where not exists (
select 1
from orders o
join customers c on c.customerid = o.customerid
where e.employeeid = o.employeeid
and c.country = 'USA'
and o.orderdate >= dateadd(month, datediff(month, 0, getdate()) - 3, 0) -- first day of the month, three months back
);
"Last three months" could mean whole calendar months, or simply the number of days that make up the last three months (e.g. 91).
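Here is a small runnable check of the NOT EXISTS approach. Treat it as a sketch under assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, the tables are cut down to the columns that matter, the rows and names are invented, and a fixed "today" replaces getdate() so the result is reproducible. SQLite's date(?, '-3 months') plays the role of DateAdd(month, -3, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Country TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER,
                         CustomerID TEXT, OrderDate TEXT);
    INSERT INTO Employees VALUES (1, 'Adams'), (2, 'Brown'), (3, 'Clark');
    INSERT INTO Customers VALUES ('US1', 'USA'), ('DE1', 'Germany');
    INSERT INTO Orders VALUES
        (100, 1, 'US1', '1998-04-20'),  -- recent USA sale
        (101, 2, 'US1', '1997-11-03'),  -- USA sale, but older than 3 months
        (102, 3, 'DE1', '1998-04-28');  -- recent sale, but not to the USA
""")

# Hypothetical reference date; SQL Server code would call GETDATE() instead.
today = "1998-05-06"

rows = conn.execute("""
    SELECT e.EmployeeID, e.LastName
    FROM Employees e
    WHERE NOT EXISTS (
        SELECT 1
        FROM Orders o
        JOIN Customers c ON c.CustomerID = o.CustomerID
        WHERE o.EmployeeID = e.EmployeeID      -- ties the subquery to the outer row
          AND c.Country = 'USA'
          AND o.OrderDate >= date(?, '-3 months')
    )
    ORDER BY e.EmployeeID
""", (today,)).fetchall()

print(rows)  # [(2, 'Brown'), (3, 'Clark')]
```

Only employee 1 has a USA sale inside the window, so employees 2 and 3 are returned. Without the correlation on EmployeeID, the subquery would find that one recent USA order for every outer row and the query would return nobody.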
If you're just learning SQL, please don't take one of these answers and use it to do your homework. You'll need to figure out what's going on for yourself.
Try to take this task in pieces. First, let's look at 'customers from USA.' You can probably write this one yourself:
Select * from customers where country = 'usa'
Next, consider how to find the orders for those customers (all of them, for now).
Since you've posted an example of an inner join, I'm going to assume you could write that one yourself, too:
Select o.* from customers c inner join orders o on
c.customerID = o.customerID
where country = 'usa'
Now you've asked where/how to apply the criteria for 'sales in the last 3 months.' Note each order has an orderDate, representing the date the order was placed. You need to use the order table's orderdate field with today's date and compare the number of months between them. The getdate() function returns the date and time from the server. Try executing:
select getdate()
then you can experiment with the datediff() function on the orderDates in the orders table:
Select orderid, getdate(), orderdate, datediff(mm,orderdate,getdate())
from orders
I think you'll have better luck, though, adding 3 months to orderdate and comparing with getdate():
Select orderid, orderdate, dateadd(month, 3, orderdate),
getdate(), dateadd(month, -3, getdate())
from orders
Now you can see all the employee IDs you don't want, the ones that have orders within the last 3 months, and you already know how to limit orders by customers in USA. Those are the IDs you want to exclude from the employees table. You can do that in a couple of different ways, depending on what you've learned so far, but typically you're exposed to LEFT OUTER JOINs for this sort of thing. You'll want to left outer join your employee table with the set of IDs you don't want (the ones with orders in the past 3 months) on the employeeID field, and return rows where the employeeID from the order subquery is null.
Try:
select count(*) from employees -- note rowcount
then:
select min(employeeID) from orders -- pick one employee
then:
select count(e.employeeID)
from employees e left outer join
(select * from orders
where employeeID in (select min(employeeID) from orders)
) o on e.employeeID = o.employeeID
where o.employeeID is null
this count should be one less than the total number of rows in your employees table, it should exclude the lowest employeeID with an order.
Then see if you can figure out how to do your homework.
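If you want to try the LEFT OUTER JOIN ... IS NULL pattern in isolation first, here is a tiny sketch. It runs on SQLite through Python's sqlite3 module rather than SQL Server, and the tables and rows are invented and stripped down to the ID columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    INSERT INTO Employees VALUES (1), (2), (3);
    -- Employees 1 and 3 have orders; employee 2 has none.
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Left join to the set of IDs to exclude, then keep only the rows
# where the join found nothing (the right side is NULL).
without_orders = conn.execute("""
    SELECT e.EmployeeID
    FROM Employees e
    LEFT JOIN (SELECT DISTINCT EmployeeID FROM Orders) o
           ON o.EmployeeID = e.EmployeeID
    WHERE o.EmployeeID IS NULL
""").fetchall()

print(without_orders)  # [(2,)]
```

The subquery is where the "USA" and "last 3 months" conditions would go: anything that narrows the set of IDs to exclude belongs inside it, not in the outer WHERE.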
Left/Right joins exist as well, and you can loosely think of them as Venn diagrams (Join or Inner Join is the intersection, Left Join is the intersection plus the rest of the left circle, etc.). It's one way to picture what will be returned. With a Left Join, columns from the right-hand table are NULL for rows that fall outside the intersection.
The below is another way to do it:
Select emp.EmployeeID, (emp.FirstName + ' ' + emp.LastName) AS Name
From Employees AS emp
Left Join (Orders AS o
Join Customers AS c ON o.CustomerID = c.CustomerID And c.Country = 'USA')
ON emp.EmployeeID = o.EmployeeID
And o.OrderDate > DateAdd(month,-3,GetDate())
Where o.EmployeeID is Null;

Compare top record against Top record -1

I was asked to create a report comparing each client's most recent order with their previous order, and to return only those clients whose most recent order was for a higher amount than the previous one. (I really hope this makes sense)
The order history table is laid out in such a way as each customer has an Order number that is sequential to that Customer (E.G. If a customer places 5 orders, then their top order number is 5, which should make this easier.) So for a customer with 5 orders, I would want to compare order #'s 4 and 5, and then only return this customer if Order #5 was for a higher Dollar amount.
The Order amount is stored in a different table, but they are linked by a guid reference (ID).
SELECT TOP 1 CO.OrderNumber
,COD.Amount
FROM cust_OrderDetail COD
INNER JOIN dbo.cust_Order CO ON Cod.cust_OrderID = CO.ID
INNER JOIN Customer c ON CO.Customer = c.ID
WHERE COD.Amount > (SELECT COD1.Amount
FROM cust_OrderDetail COD1
INNER JOIN dbo.cust_Order CO1 ON Cod1.cust_OrderID = CO1.ID
WHERE CO1.Ordernumber = (This is where I fall apart)
I hope this makes sense. I fall apart right there at the end. I know how to link in all the other details and everything else that is needed here. It is just this one comparison that kicks my teeth in.
Assuming your query returns Customer and CustomerOrder info correctly
Approach: Get your top 2 records per customer in CTE and then compare the Topmost record with the previous one.
WITH Top2 AS (
SELECT *
FROM
(
SELECT c.ID, CO.OrderNumber ,COD.Amount,
ROW_NUMBER() OVER(PARTITION BY c.ID ORDER BY c.ID, CO.OrderNumber DESC) Rnk
FROM cust_OrderDetail COD
INNER JOIN dbo.cust_Order CO ON Cod.cust_OrderID = CO.ID
INNER JOIN Customer c ON CO.Customer = c.ID
) T WHERE Rnk <= 2)
SELECT * FROM
(SELECT * FROM Top2 Where Rnk = 1) T1
INNER JOIN (SELECT * FROM Top2 Where Rnk = 2) T2
ON T1.ID = T2.ID
AND T1.Amount > T2.amount
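The ranking approach can be tried on a toy data set. This is a sketch under assumptions: it runs on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server, and it collapses the order and order-detail tables into one hypothetical Orders table with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (CustomerId INTEGER, OrderNumber INTEGER, Amount REAL);
    INSERT INTO Orders VALUES
        (1, 1, 50.0), (1, 2, 40.0), (1, 3, 75.0),  -- latest order is higher
        (2, 1, 90.0), (2, 2, 60.0),                -- latest order is lower
        (3, 1, 20.0);                              -- only one order
""")

rows = conn.execute("""
    WITH Ranked AS (
        SELECT CustomerId, OrderNumber, Amount,
               ROW_NUMBER() OVER (PARTITION BY CustomerId
                                  ORDER BY OrderNumber DESC) AS Rnk
        FROM Orders
    )
    SELECT newest.CustomerId, newest.Amount, prior.Amount
    FROM Ranked newest
    JOIN Ranked prior
      ON prior.CustomerId = newest.CustomerId
     AND prior.Rnk = 2                  -- the order just before the newest
    WHERE newest.Rnk = 1
      AND newest.Amount > prior.Amount
""").fetchall()

print(rows)  # [(1, 75.0, 40.0)]
```

The inner join matters: customer 3 has no second order and customer 2's latest order is smaller, so both drop out and only customer 1 is returned.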

How can I exclude LEFT JOINed tables from TOP in SQL Server?

Let's say I have two tables of books and two tables of their corresponding editions.
I have a query as follows:
SELECT TOP 10 * FROM
(SELECT hbID, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
UNION
SELECT pbID, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
Left JOIN paperbackEdition on pbID = pbedID
) books
WHERE hbPublisherID = 7
ORDER BY hbPublishDate DESC
If there are 5 editions of the first two hardback and/or paperback books, this query only returns two books. However, I want the TOP 10 to apply only to the number of actual book records returned. Is there a way I can select 10 actual books, and still get all of their associated edition records?
In case it's relevant, I do not have database permissions to CREATE and DROP temporary tables.
Thanks for reading!
Update
To clarify: The paperback table has an associated table of paperback editions. The hardback table has an associated table of hardback editions. The hardback and paperback tables are not related to each other except to the user who will (hopefully!) see them displayed together.
If I understand you correctly, you could get the 10 books with all associated editions by:
1. Using a WITH statement to return the initial, complete result set
2. Selecting 10 distinct books by using a GROUP BY
3. JOINing the results of this group to retain all information from the given 10 books.
SQL Statement
;WITH books AS (
SELECT hbID, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
WHERE hbPublisherID = 7
UNION ALL
SELECT pbID, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
LEFT JOIN paperbackEdition on pbID = pbedID
WHERE pbPublisher = 7
)
SELECT *
FROM books b
INNER JOIN (
SELECT TOP 10 hbID
FROM books
GROUP BY
hbID
ORDER BY
MAX(hbPublishDate) DESC -- without an ORDER BY, TOP 10 picks arbitrary books
) bt ON bt.hbID = b.hbID
or if you prefer to write the where clause only once
;WITH books AS (
SELECT hbID, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
UNION ALL
SELECT pbID, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
LEFT JOIN paperbackEdition on pbID = pbedID
)
, q AS (
SELECT *
FROM books
WHERE hbPublisherID = 7
)
SELECT *
FROM q b
INNER JOIN (
SELECT TOP 10 hbID
FROM q
GROUP BY
hbID
ORDER BY
MAX(hbPublishDate) DESC -- without an ORDER BY, TOP 10 picks arbitrary books
) bt ON bt.hbID = b.hbID
Not so easy. You need to apply Top 10 to only the hardback and paperback tables, without the join. Then join the result to the data.
The following query only works when the hbID and pbID are always unique. If not, it gets more complicated. You need to separate them or add another column to the query to distinguish them.
SELECT *
FROM
(SELECT hbID as id, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
UNION
SELECT pbID as id, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
Left JOIN paperbackEdition on pbID = pbedID
) books
INNER JOIN
(SELECT TOP 10 *
FROM
(SELECT hbID as id, hbPublisherID as publisherID, hbPublishDate as publishDate
FROM hardback
UNION
SELECT pbID as id, pbPublisher as publisherID, pbPublishDate as publishDate
FROM paperback
) allBooks -- SQL Server requires an alias on a derived table
WHERE publisherID = 7
ORDER BY publishDate DESC
) topTen
on books.id = topTen.id
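The difference between limiting after the join and limiting before it shows up even on a tiny data set. The following is a sketch only: it uses SQLite through Python's sqlite3 module, where LIMIT stands in for SQL Server's TOP, and a single hypothetical Book/Edition pair of tables with invented rows instead of the hardback/paperback schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (BookId INTEGER PRIMARY KEY, Title TEXT, PublishDate TEXT);
    CREATE TABLE Edition (BookId INTEGER, EditionDate TEXT);
    INSERT INTO Book VALUES
        (1, 'Oldest', '2001-01-01'),
        (2, 'Middle', '2005-01-01'),
        (3, 'Newest', '2009-01-01');
    INSERT INTO Edition VALUES
        (3, '2009-06-01'), (3, '2010-06-01'), (3, '2011-06-01'),
        (2, '2006-06-01');
""")

# Limiting after the join counts edition rows, so one book uses up the quota.
limit_after_join = conn.execute("""
    SELECT b.Title, e.EditionDate
    FROM Book b
    LEFT JOIN Edition e ON e.BookId = b.BookId
    ORDER BY b.PublishDate DESC, e.EditionDate
    LIMIT 2
""").fetchall()

# Limiting the book table first, then joining, keeps every edition of 2 books.
limit_before_join = conn.execute("""
    SELECT b.Title, e.EditionDate
    FROM (SELECT * FROM Book ORDER BY PublishDate DESC LIMIT 2) b
    LEFT JOIN Edition e ON e.BookId = b.BookId
    ORDER BY b.PublishDate DESC, e.EditionDate
""").fetchall()

print(limit_after_join)   # two rows, both for 'Newest'
print(limit_before_join)  # three 'Newest' rows plus one 'Middle' row
```

The first query returns two edition rows of a single book; the second returns all four edition rows belonging to the two most recent books.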
This should grab the ten most recently published titles with a hardback from publisher 7:
select *
from (
select top 10 hbTitle as title
from hardback
where hbPublisherID = 7
group by
hbTitle
order by
max(hbPublishDate) desc
) top_titles
left join
hardback
on hardback.hbTitle = top_titles.title
left join
paperback
on paperback.pbTitle = top_titles.title

SQL Join on 3 tables

Here is my query:
select custnmbr,custname,slprsnid,cdatetime,cdur,cnumber,cext,
finalcalledpartynumber,sono,invno,ordamt,invamt,adduser
from table1 calls left join table2 cust
on (calls.number = cust.phone1 or calls.cext = cust.phone1)
left outer join table3 sales on (cust.custnmbr = sales.custno
and sales.adddate = #date)
where (cnumber = #phone or cext = #phone) and cdatetime >= #date
Here is what I am trying to do:
Get all the calls from table 1, and get the customer from table 2. Then get all the sales from table 3 and the customer from table 2.
What I am getting is all the calls, the customer, and then if there is an order for that customer I get that as well. What I want is all the orders as well.
Just looking for some pointers on joining 3 tables.
You have to think about how you're going to handle the calls and sales results since there does not appear to be a correlation between calls and sales. So, a customer with 3 calls and 4 sales will produce 12 rows in the result set. This is where we could help you better if you provide more specifics.
The concept is that if calls is a necessary requirement, then you could do something like...
SELECT ca.CallId, ca.TimeStarted, ...,
c.CustomerId, c.FirstName, ...,
s.SaleDate, s.SaleAmount, ...
FROM calls AS ca
INNER JOIN customers AS c ON ca.CustomerId = c.CustomerId
OUTER APPLY ( --or CROSS APPLY, depending on your needs
SELECT s.SaleDate, s.SaleAmount, ...
FROM sales AS s
WHERE s.CustomerId = c.CustomerId
AND s.SaleDate = ca.CallDate --It would help if this relationship existed
) AS s --APPLY requires an alias for its derived table
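The 3 calls x 4 sales = 12 rows effect is easy to confirm. A minimal sketch, using SQLite via Python's sqlite3 module in place of SQL Server, with hypothetical stripped-down tables and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY);
    CREATE TABLE Calls (CallId INTEGER PRIMARY KEY, CustomerId INTEGER);
    CREATE TABLE Sales (SaleId INTEGER PRIMARY KEY, CustomerId INTEGER);
    INSERT INTO Customers VALUES (1);
    INSERT INTO Calls VALUES (1, 1), (2, 1), (3, 1);          -- 3 calls
    INSERT INTO Sales VALUES (1, 1), (2, 1), (3, 1), (4, 1);  -- 4 sales
""")

# Calls and sales are related only through the customer, so every call
# row is paired with every sale row of that customer.
row_count = conn.execute("""
    SELECT COUNT(*)
    FROM Customers c
    LEFT JOIN Calls ca ON ca.CustomerId = c.CustomerId
    LEFT JOIN Sales s ON s.CustomerId = c.CustomerId
""").fetchone()[0]

print(row_count)  # 12
```

Unless there is a real relationship between a call and a sale (such as a matching date), the result repeats each sale once per call, which is usually not what a report wants.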
Generally I would start with the table you want ALL the data from, the one that will form the basis for navigating to the other entities:
SELECT
C.*,
Ca.*,
S.*
FROM Sales S
LEFT JOIN Customer C
ON (S.CustomerId = C.CustomerId) -- your condition
LEFT JOIN Calls Ca
ON (C.CustomerId = Ca.CustomerId) -- your condition
