I have a table that can link out to several other tables depending on certain conditions. Please consider the following SQL:
DECLARE #someFlag nvarchar(1) = 'B'
select COUNT(SL_A.ID) + COUNT(SL_B.ID) + COUNT(SL_C.ID) as TotalLinks
from Customers C
left outer join SomeLookupA SL_A on
#someFlag = 'A'
AND C.SomeLookupAId = SL_A.ID
left outer join SomeLookupB SL_B on
#someFlag = 'B'
AND C.SomeLookupBId = SL_B.ID
left outer join SomeLookupC SL_C on
#someFlag = 'C'
AND C.SomeLookupCId = SL_C.ID
As you can see, as #someFlag is set to 'B' only the second left outer join will return results because of the #someFlag = '<?>' condition on each join.
Now obviously I could equally split this SQL into 3 separate statements which would run in a conditional block:
if (#someFlag = 'A')
begin
select COUNT(SL_A.ID) as TotalLinks
from Customers C
left outer join SomeLookupA SL_A on
#someFlag = 'A'
AND C.SomeLookupAId = SL_A.ID
end
else if (#someFlag = 'B')
... etc.... etc...
I personally prefer the first approach as it is more succinct.
My question is, will the SQL Server engine will optimise out the redundant left joins and will the first approach generally perform just as well as the second approach?
OP here. Thought I may as well answer my own question in case anyone else ever stumbles across this in the future.
It would appear that using conditional outer joins as I have shown above, do not perform as well as the second 'non-conditional' approach. The SQL Server engine still expends some processing time on these joins even though they can't return additional results. You can see this by constructing a query and turning on the "Include Actual Execution Plan" option as suggested above in the comments.
TL;DR; The second approach is faster.
Related
Using Microsoft T-SQL, I know there is a difference in the result set between putting a filter in the join verses putting the filter in the where clause, because I get a different row count, and its hard to pin point, because it is a complex query with thousands of rows.
So does anybody know how this might lead to different results?
I feel safer with the where clause, but I am curious why the other one didn't get the same results.
Select
.....
left outer join
datamart_agent ag on a.CUSTOMER_TKN = ag.customer_tkn
and a.CUSTOMER_ACCT_TKN = ag.customer_acct_tkn
and a.ACCOUNT_PKG_TKN = ag.account_pkg_tkn
--where (commented out)
and ag.TYPE_DESC = 'Agent'
versus
Select
.....
left outer join
datamart_agent ag on a.CUSTOMER_TKN = ag.customer_tkn
and a.CUSTOMER_ACCT_TKN = ag.customer_acct_tkn
and a.ACCOUNT_PKG_TKN = ag.account_pkg_tkn
where
--and (commented out)
ag.TYPE_DESC = 'Agent'
If condition is in your where clause then it will drop all records for which no match was found. It is a like an eraser. I deletes all records for which no match was found (Type_Desc is null).
The safer thing, if you are actually seeking left join, is to put the condition in the on clause. If there was no match, it will still come in the result. When Type_Desc is null, it means you don't know who the agent is, but if you don't know, you don't want to delete the record....
I am trying to migrate an old SQL Server 2000 DataBase, and I have to convert some queries, I found something really strange for me
Having this old query:
Select opo.opoId, fac.rate From
t_Oportunities opo, t_factors fac
Where
opo.facId *= fac.facId
And opo.clientGroup = #Group
I migrated this way:
Select opo.opoId, fac.rate From
t_Oportunities opo
Left Outer Join t_factors fac ON opo.facId = fac.facId
WHERE opo.clientGroup = #Group
But the results where different than expected, less rows, so I tried this way:
Select opo.opoId, fac.rate From
t_Oportunities opo
Left Outer Join t_factors fac ON (opo.facId = fac.facId AND opo.clientGroup
= #Group)
This worked as expected but it is a surprise for me to have to set a WHERE condition in the ON
*= used in SQL-Server... wow that is old...
as for the reason for your attempt excluding records, is that once you put a WHERE clause on the "right" table in a LEFT JOIN, you're effectively making it an INNER JOIN, as NULL records are excluded.
You can leave it as you did it at the end, though the parenthesis around the two where clauses on the join aren't needed.
I want to join a table to one of two possible tables, depending on data. Here's an attempt that did not work, but gets the idea across, I hope. Also, this is a mocked up example that may not be very realistic, so don't get too hung up on the idea this is representing real students and classes.
SELECT *
FROM
student
INNER JOIN class ON class.student_id = student.student_id
CASE
WHEN class.complete=0
THEN RIGHT OUTER JOIN report ON report.label_id = inprogress.class_id
WHEN class.complete=1
THEN RIGHT OUTER JOIN report ON report.label_id = completed.class_id
END
Any ideas?
You have two join conditions and if either are true you want to commit a join - That's a boolean OR operation.
You simply need to:
RIGHT OUTER JOIN report ON (CONDITION1) OR (CONDITION2)
Let's unravel that a moment though, what is condition 1 and what is condition 2?
WHEN class.complete=0
THEN RIGHT OUTER JOIN report ON report.label_id = inprogress.class_id
WHEN class.complete=1
THEN RIGHT OUTER JOIN report ON report.label_id = completed.class_id
Here you're putting together two conditions on each of your condition 1 and 2, so your condition 1 is:
class.complete = 0 AND report.label_id = inprogress.class_id
and your condition 2 is
class.complete = 1 AND report.label_id = completed.class_id
So the completed SQL should be something like (and this is untested off the top of my head)
RIGHT OUTER JOIN report ON (
class.complete = 0 AND report.label_id = inprogress.class_id
) OR (
class.complete = 1 AND report.label_id = completed.class_id
)
Worth mentioning..
I haven't run the above join but I know from experience the execution plan on that will be absolutely abominable, won't matter if performance isn't important and or your data set is small, but if the performance matters I strongly suggest you post a broader scope of what you want here and we can talk about a better approach to getting your particular data set that won't perform so terribly. I would personally write a join like above only as a last resort or if I was hacking something truly irrelevant.
try this (untested) -
SELECT *
FROM student S
INNER JOIN class c ON c.student_id = S.student_id
left outer join report ir on ir.label_id = inprogress.class_id AND c.complete=0
left outer join report cr on cr.label_id = completed.class_id AND c.complete=1
If you want to join to either of 2 tables with reasonable performance, write a stored procedure for each path (one that joins table 1 to table A, one that joins table 1 to table B). Make a third stored procedure that calls either the 1-A stored procedure or the 1-B stored procedure. This way an efficient query plan will be performed in each case without having to make it recompile on each call or generate a strange query plan.
In your case, you actually want some records from both of the tables you might join to. In that case, you want to union the results together rather than pick one or the other (and you can combine them in one sproc if you want to, it shouldn't hurt the query plan). If you are sure the records won't duplicate between the two queries (it seems like they wouldn't), then as usual use UNION ALL for performance.
I have been trying to test this, but I have doubts about my tests as the timings vary so much.
-- Scenario 1
SELECT * FROM Foo f
INNER JOIN Bar b ON f.id = b.id
WHERE b.flag = true;
-- Scenario 2
SELECT * FROM Foo f
INNER JOIN Bar b ON b.flag = true AND f.id = b.id;
Logically it seems like scenario 2 would be more efficient, but I wasn't sure if SQL server is smart enough to optimize this or not.
Not sure why you think scenario 2 would "logically" be more efficient. On an INNER JOIN everything is basically a filter so SQL Server can collapse the logic to the exact same underlying plan shape. Here's an example from AdventureWorks2012 (click to enlarge):
I prefer separating the join criteria from the filter criteria, so will always write the query in the format on the left. However #HLGEM makes a good point, these clauses are interchangeable in this case only because it's an INNER JOIN. For an OUTER JOIN, it is very important to place the filters on the outer table in the join criteria, else you unwittingly end up with an INNER JOIN and drastically change the semantics of the query. So my advice about how the plan can be collapsed only holds true for inner joins.
If you're worried about performance, I'd start by getting rid of SELECT * and only pulling the columns you actually need (and make sure there's a covering index).
Four months later, another answer has emerged claiming that there usually will be a difference in performance, and that putting filter criteria in the ON clause will be better. While I won't dispute that it is certainly plausible that this could happen, I contend that it certainly isn't the norm and shouldn't be something you use as an excuse to always put all filter criteria in the ON clause.
The accepted answer is correct only for your test case.
An answer to the headline question as stated is yes, moving the constraint to the join condition can greatly improve the query and ensures. I have seen forms similar to this (but perhaps not exactly)...
select *
from A
inner join B
on B.id = a.id
inner join C
on C.id = A.id
where B.z = 1 and C.z = 2;
...not optimize to the same plan as the "on join" equivalents so I tend to use the "on join" constraints as a best practice even for the simpler cases that might have resolved optimally either way.
I read this article:
Logical Processing Order of the SELECT statement
in end of article has been write ON and JOIN clause consider before WHERE.
Consider we have a master table that has 10 million records and a detail table (that has reference to master table(FK)) with 50 million record. We have a query that want just 100 record of detail table according a PK in master table.
In this situation ON and JOIN execute before WHERE?I mean that do we have 500 million record after JOIN and then WHERE apply to it?or first WHERE apply and then JOIN and ON Consider? If second answer is true do it has incoherence with top article?
thanks
In the case of an INNER JOIN or a table on the left in a LEFT JOIN, in many cases, the optimizer will find that it is better to perform any filtering first (highest selectivity) before actually performing whatever type of physical join - so there are obviously physical order of operations which are better.
To some extent you can sometimes control this (or interfere with this) with your SQL, for instance, with aggregates in subqueries.
The logical order of processing the constraints in the query can only be transformed according to known invariant transformations.
So:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is still logically equivalent to:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and they will generally have the same execution plan.
On the other hand:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is NOT equivalent to:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and so the optimizer isn't going to transform them into the same execution plan.
The optimizer is very smart and is able to move things around pretty successfully, including collapsing views and inline table-valued functions as well as even pushing things down through certain kinds of aggregates fairly successfully.
Typically, when you write SQL, it needs to be understandable, maintainable and correct. As far as efficiency in execution, if the optimizer is having difficulty turning the declarative SQL into an execution plan with acceptable performance, the code can sometimes be simplified or appropriate indexes or hints added or broken down into steps which should perform more quickly - all in successive orders of invasiveness.
It doesn't matter
Logical processing order is always honoured: regardless of actual processing order
INNER JOINs and WHERE conditions are effectively associative and commutative (hence the ANSI-89 "join in the where" syntax) so actual order doesn't matter
Logical order becomes important with outer joins and more complex queries: applying WHERE on an OUTER table changes the logic completely.
Again, it doesn't matter how the optimiser does it internally so long as the query semantics are maintained by following logical processing order.
And the key word here is "optimiser": it does exactly what it says
Just re-reading Paul White's excellent series on the Query Optimiser and remembered this question.
It is possible to use an undocumented command to disable specific transformation rules and get some insight into the transformations applied.
For (hopefully!) obvious reasons only try this on a development instance and remember to re-enable them and remove any suboptimal plans from the cache.
USE AdventureWorks2008;
/*Disable the rules*/
DBCC RULEOFF ('SELonJN');
DBCC RULEOFF ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
You can see with those two rules disabled it does a cartesian join and filter after.
/*Re-enable them*/
DBCC RULEON ('SELonJN');
DBCC RULEON ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
With them enabled the predicate is pushed right down into the index seek and so reduces the number of rows processed by the join operation.
There is no defined order. The SQL engine determines what order to perform the operations based on the execution strategy chosen by its optimizer.
I think you have misread ON as IN in the article.
However, the order it is showing in the article is correct (obviously it is msdn anyway). The ON and JOIN are executed before WHERE naturally because WHERE has to be applied as a filter on the temporary resultset obtained due to JOINS
The article just says it is logical order of execution and also at end of the paragraph it adds this line too ;)
"Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list."