How can I exclude LEFT JOINed tables from TOP in SQL Server? - sql-server

Let's say I have two tables of books and two tables of their corresponding editions.
I have a query as follows:
SELECT TOP 10 * FROM
(SELECT hbID, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
UNION
SELECT pbID, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
Left JOIN paperbackEdition on pbID = pbedID
) books
WHERE hbPublisherID = 7
ORDER BY hbPublishDate DESC
If there are 5 editions of the first two hardback and/or paperback books, this query only returns two books. However, I want the TOP 10 to apply only to the number of actual book records returned. Is there a way I can select 10 actual books, and still get all of their associated edition records?
In case it's relevant, I do not have database permissions to CREATE and DROP temporary tables.
Thanks for reading!
Update
To clarify: The paperback table has an associated table of paperback editions. The hardback table has an associated table of hardback editions. The hardback and paperback tables are not related to each other except to the user who will (hopefully!) see them displayed together.

If I understand you correctly, you could get the 10 books with all associated editions by:
using a WITH statement to return the initial, complete result set,
selecting 10 distinct books with a GROUP BY, and
joining the results of that group back to the full set to retain all information for those 10 books.
SQL Statement
;WITH books AS (
SELECT hbID, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
WHERE hbPublisherID = 7
UNION ALL
SELECT pbID, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
LEFT JOIN paperbackEdition on pbID = pbedID
WHERE pbPublisher = 7
)
SELECT *
FROM books b
INNER JOIN (
SELECT TOP 10 hbID
FROM books
GROUP BY
hbID
ORDER BY
MAX(hbPublishDate) DESC -- without an ORDER BY, TOP 10 picks arbitrary books
) bt ON bt.hbID = b.hbID
or if you prefer to write the where clause only once
;WITH books AS (
SELECT hbID, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
UNION ALL
SELECT pbID, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
LEFT JOIN paperbackEdition on pbID = pbedID
)
, q AS (
SELECT *
FROM books
WHERE hbPublisherID = 7
)
SELECT *
FROM q b
INNER JOIN (
SELECT TOP 10 hbID
FROM q
GROUP BY
hbID
ORDER BY
MAX(hbPublishDate) DESC -- without an ORDER BY, TOP 10 picks arbitrary books
) bt ON bt.hbID = b.hbID

Not so easy. You need to apply Top 10 to only the hardback and paperback tables, without the join. Then join the result to the data.
The following query only works when the hbID and pbID are always unique. If not, it gets more complicated. You need to separate them or add another column to the query to distinguish them.
SELECT *
FROM
(SELECT hbID as id, hbTitle, hbPublisherID, hbPublishDate, hbedID, hbedDate
FROM hardback
LEFT JOIN hardbackEdition on hbID = hbedID
UNION
SELECT pbID as id, pbTitle, pbPublisher, pbPublishDate, pbedID, pbedDate
FROM paperback
Left JOIN paperbackEdition on pbID = pbedID
) books
INNER JOIN
(SELECT TOP 10 *
FROM
(SELECT hbID as id, hbPublisherID as publisherID, hbPublishDate as publishDate
FROM hardback
UNION
SELECT pbID as id, pbPublisherID as publisherID, pbPublishDate as publishDate
FROM paperback
) allBooks -- SQL Server requires an alias on a derived table
WHERE publisherID = 7
ORDER BY publishDate DESC
) topTen
on books.id = topTen.id
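To see the "limit the parent rows first, then join" idea in isolation, here is a minimal sketch using Python's built-in sqlite3 module. The `book` and `edition` tables and their columns are made up for the illustration, and SQLite has no TOP, so LIMIT stands in for it; treat it as a demonstration of the pattern, not as the exact SQL Server query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, publish_date TEXT);
CREATE TABLE edition (id INTEGER PRIMARY KEY, book_id INTEGER, edition_date TEXT);
INSERT INTO book VALUES (1, 'A', '2020-01-01'), (2, 'B', '2021-01-01'), (3, 'C', '2022-01-01');
INSERT INTO edition VALUES (10, 3, '2022-02-01'), (11, 3, '2022-03-01'),
                           (12, 3, '2022-04-01'), (13, 2, '2021-02-01');
""")

# Limit the parent table first (2 newest books), then LEFT JOIN the editions
# onto that already-limited set, so the limit counts books, not joined rows.
rows = conn.execute("""
SELECT b.id, b.title, e.id
FROM (SELECT id, title, publish_date
      FROM book
      ORDER BY publish_date DESC
      LIMIT 2) AS b
LEFT JOIN edition AS e ON e.book_id = b.id
ORDER BY b.publish_date DESC, e.id
""").fetchall()

book_ids = {r[0] for r in rows}
print(rows)      # 4 rows: three editions of book 3, one edition of book 2
print(book_ids)  # still only 2 distinct books
```

Applying the limit to the joined result instead would have returned only two rows, both for book 3, which is the behaviour the question complains about.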

This should grab the ten most recently published titles with a hardback from publisher 7:
select *
from (
select top 10 hbTitle as title
from hardback
where hbPublisherID = 7
group by
hbTitle
order by
max(hbPublishDate) desc
) top_titles
left join
hardback
on hardback.hbTitle = top_titles.title
left join
paperback
on paperback.pbTitle = top_titles.title

Related

UNION & ORDER two tables inside Common Table Expression

I have a CTE inside a SQL Stored Procedure that is UNIONing values from two databases - the values are customer numbers and that customer's last order date.
Here is the original SQL -
;WITH CTE_last_order_date AS
(
SELECT c1.customer ,MAX(s2.dt_created) AS last_order_date
FROM customers c1 WITH (NOLOCK)
LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer
GROUP BY c1.customer
UNION ALL
SELECT c1.customer ,MAX(s1.dt_created) AS last_order_date
FROM customers c1 WITH (NOLOCK)
LEFT JOIN orders s1 WITH (NOLOCK)
ON c1.customer = s1.customer
GROUP BY c1.customer
)
Example Results:
customer, last_order_date
CF122595, 2011-11-15 15:30:22.000
CF122595, 2016-08-15 10:01:51.230
(2 row(s) affected)
This obviously doesn't apply the UNION distinct records rule because the date values are not matched, meaning SQL returned the max value from both tables (i.e. the final record set was not distinct)
To try and get around this, I tried another method borrowed from this question and implemented grouping:
;WITH CTE_last_order_date AS
(
SELECT max(last_order_date) as 'last_order_date', customer
FROM (
SELECT distinct c1.customer, max(s2.dt_created) AS last_order_date, '2' AS 'group'
FROM customers c1 WITH (NOLOCK)
LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer
GROUP BY c1.customer
UNION
SELECT distinct c1.customer, max(s1.dt_created) AS last_order_date, '1' AS 'group'
FROM customers c1 WITH (NOLOCK)
LEFT JOIN orders s1 WITH (NOLOCK)
ON c1.customer = s1.customer
GROUP BY
c1.customer
) AS t
GROUP BY customer
ORDER BY MIN('group'), customer
)
Example Results:
customer, last_order_date
CF122595, 2016-08-15 10:01:51.230
(1 row(s) affected)
This had the distinction (hah) of working fine, up until clattering into the rule that prevents ORDER BY inside Common Table Expressions, which is needed in order to pick the lowest group (which would imply Live orders (group 1), whose date needs to take precedence over the Archive (group 2)).
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
All help or ideas appreciated.
Rather than grouping, then unioning, then grouping again, why not union the orders tables and work from there:
SELECT c1.customer ,MAX(s2.dt_created) AS last_order_date
FROM customers c1
INNER JOIN (select customer, dt_created from archive_orders
union all select customer, dt_created from orders) s2
ON c1.customer = s2.customer
GROUP BY c1.customer
Remember, in SQL your job is to tell the system what you want, not what steps/procedure to follow to get those results. The above, logically, describes what we're wanting - we want the last order date from each customer's orders, and we don't care whether that was an archived order or a non-archived one.
Since we're going to reduce the order information down to a single row (per customer) during the GROUP BY behaviour anyway, we don't also need the UNION to remove duplicates so I've switched to UNION ALL.
(I confess, I couldn't really see what the ORDER BY was supposed to be adding to the mix at this point so I've not tried to include it here. If this is going into a CTE, then reflect on the fact that CTEs, just like tables and views, have no inherent order. The only ORDER BY clause that affects the ordering of result rows is the one applied to the outermost/final SELECT)
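As a quick check of the "union the order tables first, then aggregate once" approach, here is a small sketch using Python's built-in sqlite3 module. The sample rows are invented (one customer id is taken from the question's example output), and SQLite stands in for SQL Server, so the NOLOCK hints are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer TEXT PRIMARY KEY);
CREATE TABLE orders (customer TEXT, dt_created TEXT);
CREATE TABLE archive_orders (customer TEXT, dt_created TEXT);
INSERT INTO customers VALUES ('CF122595'), ('CF000001');
INSERT INTO archive_orders VALUES ('CF122595', '2011-11-15 15:30:22'),
                                  ('CF000001', '2010-01-01 00:00:00');
INSERT INTO orders VALUES ('CF122595', '2016-08-15 10:01:51');
""")

# UNION ALL the two order tables, then take one MAX per customer.
last_orders = conn.execute("""
SELECT c.customer, MAX(o.dt_created) AS last_order_date
FROM customers AS c
JOIN (SELECT customer, dt_created FROM archive_orders
      UNION ALL
      SELECT customer, dt_created FROM orders) AS o
  ON o.customer = c.customer
GROUP BY c.customer
ORDER BY c.customer
""").fetchall()

print(last_orders)  # one row per customer, with the latest date from either table
```

Each customer appears exactly once, regardless of which table held the most recent order.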
Giving orders precedence over archive_orders:
;With CTE1 as (
SELECT c1.customer,s2.[group],MAX(s2.dt_created) as MaxInGroup
FROM customers c1
INNER JOIN (select customer, dt_created,2 as [group] from archive_orders
union all select customer, dt_created,1 from orders) s2
ON c1.customer = s2.customer
GROUP BY c1.customer,s2.[group]
), CTE2 as (
-- group is a reserved word, so it is bracketed; rn = 1 is the lowest group per customer
SELECT *,ROW_NUMBER() OVER (PARTITION BY customer ORDER BY [group]) as rn
from CTE1
)
select * from CTE2 where rn = 1
An alternative approach could be to only get the customer from the archive table where we do not have a current one. Something like:
WITH CurrentLastOrders(customer, last_order_date) AS -- Get current last orders
(
SELECT o.customer, max(o.dt_created) AS last_order_date
FROM orders o WITH (NOLOCK)
GROUP BY o.customer
),
ArchiveLastOrders(customer, last_order_date) AS -- Get archived last orders where customer does not have a current order
(
SELECT o.customer, max(o.dt_created) AS last_order_date
FROM archive_orders o WITH (NOLOCK)
WHERE NOT EXISTS ( SELECT *
FROM CurrentLastOrders lo
WHERE o.customer = lo.customer)
GROUP BY o.customer
),
AllLastOrders(customer, last_order_date) AS -- All customers with orders
(
SELECT customer, last_order_date
FROM CurrentLastOrders
UNION ALL
SELECT customer, last_order_date
FROM ArchiveLastOrders
),
AllLastOrdersPlusCustomersWithNoOrders(customer, last_order_date) AS -- All customers, with latest order if they have one
(
SELECT customer, last_order_date
FROM AllLastOrders
UNION ALL
SELECT customer, null
FROM customers c WITH (NOLOCK)
WHERE NOT EXISTS ( SELECT *
FROM AllLastOrders lo
WHERE c.customer = lo.customer)
)
SELECT customer, last_order_date
FROM AllLastOrdersPlusCustomersWithNoOrders
I wouldn't try to nest SQL to achieve a distinct result set; it's the same logic of grouping by customer in both unioned queries.
If you want a distinct, ordered set, you can do that outside of the CTE.
How about:
;WITH CTE_last_order_date AS
(
SELECT c1.customer ,s2.dt_created AS last_order_date, '2' AS [group]
FROM customers c1 WITH (NOLOCK)
LEFT JOIN archive_orders s2 WITH (NOLOCK) ON c1.customer = s2.customer
UNION ALL
SELECT c1.customer ,s1.dt_created AS last_order_date, '1' AS [group]
FROM customers c1 WITH (NOLOCK)
LEFT JOIN orders s1 WITH (NOLOCK) ON c1.customer = s1.customer
)
SELECT customer, MAX(last_order_date) AS last_order_date
FROM CTE_last_order_date
GROUP BY customer
-- MIN('group') would sort by the string literal, not by the column
ORDER BY MIN([group]), customer
If you union all possible rows together, then calculate a row number partitioned by customer and ordered by 'group' and then by last_order_date descending, you can select the rows where the row number is 1 to get the 'top 1' per customer:
;WITH CTE_last_order_date AS
(
-- keep every order row together with its source group (no aggregation yet);
-- INNER JOIN so that a customer without live orders does not get an empty group-1 row
SELECT c1.customer, s2.dt_created AS last_order_date, 2 AS [group]
FROM customers c1 WITH (NOLOCK)
INNER JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer
UNION ALL
SELECT c1.customer, s1.dt_created AS last_order_date, 1 AS [group]
FROM customers c1 WITH (NOLOCK)
INNER JOIN orders s1 WITH (NOLOCK)
ON c1.customer = s1.customer
)
, --row_number below is 'per customer' and can be used to make rn=1 the top 1 for each customer
ROWN AS (SELECT customer, last_order_date, [group], row_number() OVER(partition by customer order by [group] ASC, last_order_date DESC) AS RN
FROM CTE_last_order_date)
SELECT * FROM ROWN WHERE ROWN.RN = 1
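The row-number idea can be tried out with a small sketch using Python's built-in sqlite3 module. It assumes a SQLite build of 3.25 or later (needed for window functions), uses invented sample rows, and names the precedence column `grp` to avoid the reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, dt_created TEXT);
CREATE TABLE archive_orders (customer TEXT, dt_created TEXT);
INSERT INTO orders VALUES ('A', '2016-08-15'), ('A', '2016-01-01');
INSERT INTO archive_orders VALUES ('A', '2011-11-15'), ('B', '2010-05-05');
""")

# grp 1 = live orders, grp 2 = archive; rn = 1 is the newest row of the
# lowest-numbered group that exists for each customer.
top_per_customer = conn.execute("""
WITH all_orders AS (
    SELECT customer, dt_created, 1 AS grp FROM orders
    UNION ALL
    SELECT customer, dt_created, 2 AS grp FROM archive_orders
),
ranked AS (
    SELECT customer, dt_created, grp,
           ROW_NUMBER() OVER (PARTITION BY customer
                              ORDER BY grp, dt_created DESC) AS rn
    FROM all_orders
)
SELECT customer, dt_created, grp
FROM ranked
WHERE rn = 1
ORDER BY customer
""").fetchall()

print(top_per_customer)
```

Customer A gets its newest live order, while customer B, which has no live orders, falls back to its newest archived one.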

How to select multiple columns from a detail table in SQL Server

I have the following tables:
MASTER table (ID(PK), NAME, etc)
DETAIL table (ID(PK), IDMASTER(FK), VALUE1, DATE1, etc)
I work with SQL Server.
What I need is a SQL query or a way to do a select like
SELECT
M.ID, M.NAME,
(SELECT TOP 1 D.ID, D.VALUE1
FROM DETAIL D
WHERE D.IDMASTER = M.ID
ORDER BY D.DATE1 DESC)
-- more than one column with a where clause and an order clause
FROM
MASTER M
Use OUTER APPLY:
SELECT M.ID,
M.NAME,
D.ID,
D.VALUE1
FROM dbo.[MASTER] M
OUTER APPLY(SELECT TOP 1 ID, VALUE1
FROM dbo.DETAIL
WHERE IDMASTER = M.ID
ORDER BY DATE1 DESC) D;
You can do this with the APPLY operator:
select * from master m
outer apply(select top 1 * from detail d where d.idmaster = m.id order by d.date1 desc) oa

MAX() SQL Server multiple rows. How fix to return only 1 row per month year?

I need help using the MAX() function properly: I seem to be getting more than one row even though I have clearly stated that I want MAX(monthid), which should return the last monthyear row for the customer.
What I need is the last monthyear row for either customer_segment or agreement. When I finally join the customer_segment and agreement columns to the original, I get up to 6 different monthyear rows with different customer_segment names when I only want 1 row.
How do I fix this?
--Finding customer segment
SELECT
a.[cust_no]
,Customer_Segment
,max(monthid) AS monthyear
INTO #Segment
FROM Original_table a
INNER JOIN Customer_Segment ku
on ku.Cust_no=a.cust_no
GROUP BY a.cust_no,Customer_Segment
--------------------------------------------------------------------------
--Finding agreement(yes/no)
SELECT DISTINCT
a.cust_no,
Agreement,
max(monthid) as Monthyear
into #Agreement
FROM Original_table a
INNER JOIN Cust_Details zx
ON zx.cust_no=a.cust_no
GROUP BY a.cust_no,
zx.Agreement
------------------------------------------------
-- Attaching columns to original file on cust_no
select DISTINCT
A.cust_no,
B.Customer_Segment,
d.Agreement
from Original_table A
LEFT JOIN ( SELECT DISTINCT * FROM #Segment ) b
on b.cust_no=A.cust_no
LEFT JOIN( SELECT distinct * FROM #Agreement ) d
ON d.cust_no=a.cust_no
Aren't you missing some info on the joins?
(...)
LEFT JOIN ( SELECT DISTINCT * FROM #Segment ) b
on b.cust_no=A.cust_no and
b.Customer_Segment = A.Customer_Segment
LEFT JOIN( SELECT distinct * FROM #Agreement ) d
ON d.cust_no=a.cust_no and
d.Agreement = A.Agreement
try this:
select
A.cust_no, b.monthyear,
e.Customer_Segment,
d.Agreement
from Original_table A
JOIN (SELECT a.[cust_no] cust_no ,max(monthid) AS monthyear
FROM Original_table a
GROUP BY a.[cust_no]) b on b.cust_no=A.cust_no
OUTER APPLY
( SELECT TOP 1 Agreement FROM Cust_Details d
WHERE d.cust_no=a.cust_no
ORDER BY Agreement
) d
OUTER APPLY
( SELECT TOP 1 Customer_Segment FROM Customer_Segment e
WHERE e.cust_no=a.cust_no
ORDER BY Customer_Segment
) e

SQL Server Full Text Search - Weighting Certain Columns Over Others

If I have the following full text search query:
SELECT *
FROM dbo.Product
INNER JOIN CONTAINSTABLE(Product, (Name, Description, ProductType), 'model') ct
ON ct.[Key] = Product.ProductID
Is it possible to weigh the columns that are being searched?
For example, I care more about the word model appearing in the Name column than I do the
Description or ProductType columns.
Of course if the word is in all 3 columns then I would expect it to rank higher than if it was just in the name column. Is there any way to have a row rank higher if it just appears in Name vs just in Description/ProductType?
You can do something like the following query. Here, WeightedRank is computed by multiplying the rank of the individual matches. NOTE: unfortunately I don't have Northwind installed so I couldn't test this, so look at it more like pseudocode and let me know if it doesn't work.
declare @searchTerm varchar(50) = 'model';
SELECT 100 * coalesce(ct1.RANK, 0) +
10 * coalesce(ct2.RANK, 0) +
1 * coalesce(ct3.RANK, 0) as WeightedRank,
*
FROM dbo.Product
LEFT JOIN
CONTAINSTABLE(Product, Name, @searchTerm) ct1 ON ct1.[Key] = Product.ProductID
LEFT JOIN
CONTAINSTABLE(Product, Description, @searchTerm) ct2 ON ct2.[Key] = Product.ProductID
LEFT JOIN
CONTAINSTABLE(Product, ProductType, @searchTerm) ct3 ON ct3.[Key] = Product.ProductID
order by WeightedRank desc
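The weighting arithmetic itself can be checked without a full-text index. The sketch below uses Python's built-in sqlite3 module with two invented tables, `name_hits` and `description_hits`, standing in for the per-column CONTAINSTABLE results (they are not real SQL Server objects), and the multipliers 100 and 10 from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the (key, rank) rows a per-column search returns.
CREATE TABLE name_hits (product_id INTEGER, hit_rank INTEGER);
CREATE TABLE description_hits (product_id INTEGER, hit_rank INTEGER);
INSERT INTO name_hits VALUES (1, 5);
INSERT INTO description_hits VALUES (1, 5), (2, 50);
""")

# Multiply each column's rank by its weight, then sum per product.
weighted = conn.execute("""
SELECT product_id, SUM(weighted_rank) AS weighted_rank
FROM (SELECT product_id, hit_rank * 100 AS weighted_rank FROM name_hits
      UNION ALL
      SELECT product_id, hit_rank * 10 AS weighted_rank FROM description_hits) AS hits
GROUP BY product_id
ORDER BY weighted_rank DESC
""").fetchall()

print(weighted)
```

Product 1 scores 5 * 100 + 5 * 10 = 550 and outranks product 2, whose description-only hit scores 50 * 10 = 500 despite its much higher raw rank.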
Listing 3-25, "Sample Column Rank-Multiplier Search", from Pro Full-Text Search in SQL Server 2008:
SELECT *
FROM (
SELECT Commentary_ID
,SUM([Rank]) AS Rank
FROM (
SELECT bc.Commentary_ID
,c.[RANK] * 10 AS [Rank]
FROM FREETEXTTABLE(dbo.Contributor_Birth_Place, *, N'England') c
INNER JOIN dbo.Contributor_Book cb ON c.[KEY] = cb.Contributor_ID
INNER JOIN dbo.Book_Commentary bc ON cb.Book_ID = bc.Book_ID
UNION ALL
SELECT c.[KEY]
,c.[RANK] * 5
FROM FREETEXTTABLE(dbo.Commentary, Commentary, N'England') c
UNION ALL
SELECT ac.[KEY]
,ac.[RANK]
FROM FREETEXTTABLE(dbo.Commentary, Article_Content, N'England') ac
) s
GROUP BY Commentary_ID
) s1
INNER JOIN dbo.Commentary c1 ON c1.Commentary_ID = s1.Commentary_ID
ORDER BY [Rank] DESC;
Similar to Henry's solution, but simplified, tested, and using the details the question provided.
NB: I ran performance tests on both the union and left join styles and found that the union style below required far fewer logical reads with my datasets. YMMV.
declare @searchTerm varchar(50) = 'model';
declare @nameWeight int = 100;
declare @descriptionWeight int = 10;
declare @productTypeWeight int = 1;
SELECT ranksGroupedByProductID.*, outerProduct.*
FROM (SELECT [key],
Sum([rank]) AS WeightedRank
FROM (
-- Each column that needs to be weighted separately
-- should be added here and unioned with the other queries
SELECT [key],
[rank] * @nameWeight as [rank]
FROM Containstable(dbo.Product, [Name], @searchTerm)
UNION ALL
SELECT [key],
[rank] * @descriptionWeight as [rank]
FROM Containstable(dbo.Product, [Description], @searchTerm)
UNION ALL
SELECT [key],
[rank] * @productTypeWeight as [rank]
FROM Containstable(dbo.Product, [ProductType], @searchTerm)
) innerSearch
-- Grouping by key allows us to sum each ProductID's ranks for all the columns
GROUP BY [key]) ranksGroupedByProductID
-- This join is just to get the full Product table columns
-- and is optional if you only need the ordered ProductIDs
INNER JOIN dbo.Product outerProduct
ON outerProduct.ProductID = ranksGroupedByProductID.[key]
ORDER BY WeightedRank DESC;

SQL - Selecting counts from multiple tables

Here is my problem (I'm using SQL Server)
I have a table of Students (StudentId, Firstname, Lastname, etc).
I have a table that records StudentAttendance (StudentId, ClassDate, etc.)
I record other student activity (I'm generalizing here for simplicity) such as a Papers table (StudentId, PaperId, etc.). There may be anywhere from zero to 20 papers turned in. Similarly, there is a table called Projects (StudentId, ProjectId, etc.). Same deal as with Papers.
What I'm trying to do is create a list of counts for students who have attendance over a certain level (say 10 attendances). Something like this:
ID Name Att Paper Proj
123 Baker 23 0 2
234 Charlie 26 5 3
345 Delta 13 3 0
Here is what I have:
select
s.StudentId,
s.Lastname,
COUNT(sa.StudentId) as CountofAttendance,
COUNT(p.StudentId) as CountofPapers
from Student s
inner join StudentAttendance sa on (s.StudentId = sa.StudentId)
left outer join Paper p on (s.StudentId = p.StudentId)
group by s.StudentId, s.Lastname
Having COUNT(sa.StudentId) > 10
order by CountofAttendance
If the CountofPaper and join (either inner or left outer) to the Papers table is commented out, the query works fine. I get a nice count of students who have attended at least 10 classes.
However, if I put in the CountofPapers and the join, things get crazy. With a left outer join, any students with papers just show their attendance count in the paper column. With an inner join, both attendance and paper counts seem to multiply off each other.
Guidance needed and appreciated.
Dave
Look at using Common Table Expressions and then divide and conquer your problem. BTW, you are off by 1 in your original query: with > 10, you'll have a minimum attendance of 11.
;
WITH GOOD_STUDENTS AS
(
-- this query defines all students with 10+ attendance
SELECT
S.StudentID
, count(1) AS attendance_count
FROM
Student S
inner join
StudentAttendance sa
on (s.StudentId = sa.StudentId)
GROUP BY
S.StudentId
HAVING
COUNT(1) >= 10
)
, STUDIOUS_STUDENTS AS
(
-- lather, rinse, repeat for other metrics
SELECT
S.StudentID
, count(1) AS paper_count
FROM
Student S
inner join
Papers P
on (s.StudentId = P.StudentId)
GROUP BY
S.StudentId
)
, GREGARIOUS_STUDENTS AS
(
SELECT
S.StudentID
, count(1) AS project_count
FROM
Student S
inner join
Projects P
on (s.StudentId = P.StudentId)
GROUP BY
S.StudentId
)
-- And now we roll it all together
SELECT
S.*
, G.attendance_count
, SS.paper_count
, GS.project_count
-- ad nauseum
FROM
-- back to the well on this one as there may be
-- students who did nothing
Student S
LEFT OUTER JOIN
GOOD_STUDENTS G
ON G.studentId = S.studentId
LEFT OUTER JOIN
STUDIOUS_STUDENTS SS
ON SS.studentId = S.studentId
LEFT OUTER JOIN
GREGARIOUS_STUDENTS GS
ON GS.studentId = S.studentId
I see plenty of other answers rolling in, but I typed for far too long to quit ;)
The problem is that there are multiple papers per student, so every StudentAttendance row is repeated once for each Paper row it joins to, and the counts get re-added every time. Try this:
select
s.StudentId,
s.Lastname,
(select COUNT(*) from StudentAttendance sa where sa.StudentId = s.StudentId) as CountofAttendance,
(select COUNT(*) from Paper p where p.StudentId = s.StudentId) as CountofPapers
from Student s
where (select COUNT(*) from StudentAttendance sa where sa.StudentId = s.StudentId) > 10
order by CountofAttendance
EDITED to incorporate issue with reference to CountofAttendance
btw, this isn't the fastest solution, but it is the easiest to understand, which was my intention. You can avoid the re-calculation by using a join to an aliased select, but as I said, this is the simplest.
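To make the row multiplication concrete, here is a small sketch using Python's built-in sqlite3 module. The single student with 3 attendance rows and 2 papers is invented sample data; the table names follow the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Lastname TEXT);
CREATE TABLE StudentAttendance (StudentId INTEGER, ClassDate TEXT);
CREATE TABLE Paper (StudentId INTEGER, PaperId INTEGER);
INSERT INTO Student VALUES (1, 'Baker');
INSERT INTO StudentAttendance VALUES (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-03');
INSERT INTO Paper VALUES (1, 100), (1, 101);
""")

# Joining both child tables at once yields 3 x 2 = 6 rows for the student,
# so both COUNTs see 6 rows.
joined = conn.execute("""
SELECT COUNT(sa.StudentId), COUNT(p.StudentId)
FROM Student s
JOIN StudentAttendance sa ON sa.StudentId = s.StudentId
LEFT JOIN Paper p ON p.StudentId = s.StudentId
GROUP BY s.StudentId
""").fetchone()

# Counting each child table in its own correlated subquery avoids the fan-out.
separate = conn.execute("""
SELECT (SELECT COUNT(*) FROM StudentAttendance sa WHERE sa.StudentId = s.StudentId),
       (SELECT COUNT(*) FROM Paper p WHERE p.StudentId = s.StudentId)
FROM Student s
""").fetchone()

print(joined)    # inflated counts
print(separate)  # true counts
```

The joined version reports 6 and 6, while the per-table subqueries report the real 3 attendances and 2 papers.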
Try this:
select std.StudentId, std.Lastname, att.AttCount, pap.PaperCount, prj.ProjCount
from Students std
left join
(
select StudentId, count(*) AttCount
from StudentAttendance
group by StudentId
) att on
std.StudentId = att.StudentId
left join
(
select StudentId, count(*) PaperCount
from Papers
group by StudentId
) pap on
std.StudentId = pap.StudentId
left join
(
select StudentId, count(*) ProjCount
from Projects
group by StudentId
) prj on
std.StudentId = prj.StudentId
where att.AttCount > 10
