SQL INNER JOIN vs LEFT JOIN with a WHERE - sql-server

I am trying to grasp SQL joins more intuitively. For example, learning how a RIGHT JOIN can just be re-written as a LEFT JOIN (by flipping the order of the tables) helped me understand much better the way that the two joins work.
However, now I'm wondering if an INNER JOIN could be re-written as a LEFT JOIN with a WHERE condition- meaning that their logic could be equivalent (by "logic" I do not mean the execution plan, but the way that the intended result set would be described).
Like:
SELECT * FROM HeaderTable
INNER JOIN DetailTable
ON HeaderTable.ID = DetailTable.ParentID
Which I would read as "Show me all the records from tables HeaderTable and DetailTable that have a matching value in the HeaderTable.ID and DetailTable.ParentID fields." Being the same as:
SELECT * FROM HeaderTable
LEFT JOIN DetailTable
ON HeaderTable.ID = DetailTable.ParentID
WHERE HeaderTable.ID = DetailTable.ParentID
Which I would read as "Show me all the records from tables HeaderTable and DetailTable where the value of HeaderTable.ID is the same as the value of DetailTable.ParentID."
Will these return the same result set? I am more asking about the logic being the same as opposed to one being more efficient than the other.
If I may ask, please don't answer with any Venn diagrams as these don't seem to describe the logic of a join exactly to me.

Yes, they will return the same result. The left join without the where clause would read as show me all the records from the header table and the related items from the details table or null for the details where there are no matches.
Adding a where clause relating the ids effectively transforms the left join to an inner join by eliminating the non-matching rows that would have shown up as having null for the detail part.
In some databases, like MS SQL Server, the left join would show up as an inner join in the query execution plan.
Although you stated that you don't want Venn diagrams I can't help referring you to this question and its answers even though they are filled with (in my opinion very helpful) Venn diagrams.

Yes they would return the same result.
But then you could simply write
SELECT *
FROM HeaderTable, DetailTable
WHERE HeaderTable.ID = DetailTable.ParentID
this returns the same result as well. This is an old syntax used before the join-clauses were introduced.

On a left join if you reference the left in the where then you negate the left and turn it into regular join

Related

MS SQL Server trouble with JOIN

I'm new to SQL and need a push in the right direction.
I currently have a working SQL query accessing 3 tables in a database, and I need to add a JOIN using a 4th table, but the syntax escapes me. To simplify, what I have now is:
SELECT
t1_col1, t1_col2, t2_col1, t2_col2, t3_col1
FROM
table1, table2, table3
WHERE
{some conditions}
ORDER BY
t1_col1 ASC;
What I need to do is to add a LEFT OUTER JOIN selecting 2 columns from table4 and have ON t1_field1 = t4_field1, but whatever I try, I'm getting syntax errors all over the place. I don't seem to understand the correct syntax.
I tried
SELECT *
FROM table1
LEFT OUTER JOIN table2;
which has no errors, but as soon as I start SELECTing columns and adding conditions, I get stuck.
I would greatly appreciate any assistance with this.
You do not specify the join criteria. Those would be in your WHERE clause under "some conditions". So, I will make up so that I can show syntax. The syntax you show is often termed "old". It has been discouraged for 15 years or more in the SQL Server documentation. Microsoft consistently threatens to stop recognizing the syntax. But, apparently they have not followed through on that threat.
The syntax errors you are getting occur because you are mixing the old style joins (comma separated with WHERE clause) with the new style (LEFT OUTER JOIN) with ON clauses.
Your existing query should be changed to something like this. The tables are aliased because it makes it easier to read and is customary. I just made up the JOIN criteria.
SELECT t1_col1, t1_col2, t2_col1, t2_col2, t3_col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.one_ID = t1.one_ID
INNER JOIN table3 t3 ON t3.two_ID = t2.two_ID
LEFT OUTER JOIN table4 t4 ON t4.three_ID = t3.three_ID
I hope that helps with "a push in the right direction."
You may also want to read this post that explains the different ways to join tables in a query. What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
Also for the record the "OLD STYLE" of joining tables (NOT RECOMMENDED) would look like this (but do NOT do this - it is a horrible way to write SQL). And it does not work for left outer joins. Get familiar with using the ...JOIN...ON... syntax:
SELECT t1_col1, t1_col2, t2_col1, t2_col2, t3_col1
FROM table1 t1, table2 t2, table3 t3
LEFT OUTER JOIN table4 t4 ON t4.three_ID = t3.three_ID
WHERE
t2.one_ID = t1.one_ID
AND t3.two_ID = t2.two_ID

SQLite join multiple values from two tables [duplicate]

Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'
-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'
Which do you prefer (and perhaps why)?
The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those records that have an order dated later than May 15, 2009 thus converting the left join to an inner join.
The second will give those records plus any customers with no orders. The results set is very different depending on where you put the condition. (Select * is for example purposes only, of course you should not use this in production code.)
The exception to this is when you want to see only the records in one table but not the other. Then you use the where clause for the condition not the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.
WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.
I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.
It feels syntactically cleaner.
I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.
Agree with 2nd most vote answer that it will make big difference when using LEFT JOIN or RIGHT JOIN. Actually, the two statements below are equivalent. So you can see that AND clause is doing a filter before JOIN while the WHERE clause is doing a filter after JOIN.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD
ON CUS.CustomerID = ORD.CustomerID
Joins are quicker in my opinion when you have a larger table. It really isn't that much of a difference though especially if you are dealing with a rather smaller table. When I first learned about joins, i was told that conditions in joins are just like where clause conditions and that i could use them interchangeably if the where clause was specific about which table to do the condition on.
Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.
It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.

Mixed JOIN and COMMA table listing in FROM; meaning of listed items

Ok, I currently tasked with maintaining a legacy SQL Server 2005 database recently migrated to 2008 [R2] and there are some sprocs with something like the following:
SELECT
#val = c.TableVal
FROM
dbo.TaqbleA a
LEFT JOIN dbo.TableB b
ON a.TableId = b.TableId
,dbo.TableC c
WHERE
a.TableId = c.TableId
Fist off I know that the join conditions as seen with table A and B are the normal and how I'd rewrite this but I'm wondering if Table C is treated as a LEFT JOIN as it is listed after the LEFT JOIN or if it is treated as an INNER JOIN? My gut says it is treated as an INNER JOIN, but I'm not certain. Which is it?
Furthermore, I know I'm not the only one who has ever seen something like this and it would be good to have this answer immortalized in the StackExchange...
Thanks.
It is effectively an INNER JOIN or CROSS JOIN; the comma has lower precedence than the explicit JOIN, so FROM ... LEFT JOIN ... ON ..., ... means FROM (... LEFT JOIN ... ON ...), ..., with the LEFT JOIN pertaining only to the first two tables.

Is moving a constraint into a join more efficient than join and a where clause?

I have been trying to test this, but I have doubts about my tests as the timings vary so much.
-- Scenario 1
SELECT * FROM Foo f
INNER JOIN Bar b ON f.id = b.id
WHERE b.flag = true;
-- Scenario 2
SELECT * FROM Foo f
INNER JOIN Bar b ON b.flag = true AND f.id = b.id;
Logically it seems like scenario 2 would be more efficient, but I wasn't sure if SQL server is smart enough to optimize this or not.
Not sure why you think scenario 2 would "logically" be more efficient. On an INNER JOIN everything is basically a filter so SQL Server can collapse the logic to the exact same underlying plan shape. Here's an example from AdventureWorks2012 (click to enlarge):
I prefer separating the join criteria from the filter criteria, so will always write the query in the format on the left. However #HLGEM makes a good point, these clauses are interchangeable in this case only because it's an INNER JOIN. For an OUTER JOIN, it is very important to place the filters on the outer table in the join criteria, else you unwittingly end up with an INNER JOIN and drastically change the semantics of the query. So my advice about how the plan can be collapsed only holds true for inner joins.
If you're worried about performance, I'd start by getting rid of SELECT * and only pulling the columns you actually need (and make sure there's a covering index).
Four months later, another answer has emerged claiming that there usually will be a difference in performance, and that putting filter criteria in the ON clause will be better. While I won't dispute that it is certainly plausible that this could happen, I contend that it certainly isn't the norm and shouldn't be something you use as an excuse to always put all filter criteria in the ON clause.
The accepted answer is correct only for your test case.
An answer to the headline question as stated is yes, moving the constraint to the join condition can greatly improve the query and ensures. I have seen forms similar to this (but perhaps not exactly)...
select *
from A
inner join B
on B.id = a.id
inner join C
on C.id = A.id
where B.z = 1 and C.z = 2;
...not optimize to the same plan as the "on join" equivalents so I tend to use the "on join" constraints as a best practice even for the simpler cases that might have resolved optimally either way.

SQL Server - Join Question - 3 tables

Consider the example from MSDN documentation:
SELECT p.Name, pr.ProductReviewID
FROM Production.Product p
LEFT OUTER JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductID
In this example, it is clear that the table on the left is "Production" and that is where all rows will be returned from, and then only those that match in ProductReview.
But now consider the following hypothetical query with 3 tables A,B,C
select * from A
inner Join B on A.field1 = B.field1
left outer join C on C.field2 = b.Field2
Which is the left table in this query (from which all records will be returned, regardless of a match to C)? Is it A or B? Or is it the result of the join from A & B?
My confusion arises from the following MSDN documentation, which states that "Outer joins can be specified in the FROM clause only" which would mean that the left table in my hypothetical query is A, but then I dont have an ON clause that specifies the join condition - in which case is my hypothetical query a bad one?
Since there is an INNER JOIN between A and B, only rows from B that match A will qualify for the LEFT JOIN to C.
I'm not 100% sure I understand you question, but assuming I am understanding it correctly:
Your "left" table in your hypothetical query is B, since your ON condition specifies the B.Field2.
The terms 'left" and "right" are not sufficiently specific in this context. Instead, you should use the terms "preserved" and "unpreserved". In that light, tables A and B are preserved and table C is unpreserved.
The reference in the MSDN documentation is meant to imply you cannot use joins (outer or otherwise) in the Select, Where, Group By, Having or Order By clauses outside of a subquery (where they are still in a From clause).
From your joins
A inner Join B on A.field1 = B
left outer join C on C.field2 = b.Field2
You need to have records from table A and B.
The left join only has data from table C field field2 matching the B table, but note that table A field2 does not have to match.
To see your data for table C run the following:
select c.*
from A inner Join B on A.field1 = B.field1 left outer join C on C.field2 = b.Field2
They use the term FROM clause in a general (broad) sense meaning the whole section of the query that starts from the keyword FROM and includes all the joins there are.
Here's a fuller context (note the previous sentence):
Inner joins can be specified in either the FROM or WHERE clauses. Outer joins can be specified in the FROM clause only.
See? They mean you cannot specify an outer join in the WHERE clause as is the case with inner joins. You can only do that in the FROM clause (that is, after however many other joins too). The result will be applied to the result of the previous joins.

Resources