LEFT JOIN vs. LEFT OUTER JOIN in SQL Server - sql-server

What is the difference between LEFT JOIN and LEFT OUTER JOIN?

As per the documentation: FROM (Transact-SQL):
<join_type> ::=
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN
The keyword OUTER is marked as optional (enclosed in square brackets). In this specific case, whether you specify OUTER or not makes no difference. Note that while the other elements of the join clause is also marked as optional, leaving them out will make a difference.
For instance, the entire type-part of the JOIN clause is optional, in which case the default is INNER if you just specify JOIN. In other words, this is legal:
SELECT *
FROM A JOIN B ON A.X = B.Y
Here's a list of equivalent syntaxes:
A LEFT JOIN B A LEFT OUTER JOIN B
A RIGHT JOIN B A RIGHT OUTER JOIN B
A FULL JOIN B A FULL OUTER JOIN B
A INNER JOIN B A JOIN B
Also take a look at the answer I left on this other SO question: SQL left join vs multiple tables on FROM line?.

To answer your question there is no difference between LEFT JOIN
and LEFT OUTER JOIN, they are exactly same that said...
At the top level there are mainly 3 types of joins:
INNER
OUTER
CROSS
INNER JOIN - fetches data if present in both the tables.
OUTER JOIN are of 3 types:
LEFT OUTER JOIN - fetches data if present in the left table.
RIGHT OUTER JOIN - fetches data if present in the right table.
FULL OUTER JOIN - fetches data if present in either of the two tables.
CROSS JOIN, as the name suggests, does [n X m] that joins everything to everything.
Similar to scenario where we simply lists the tables for joining (in the FROM clause of the SELECT statement), using commas to separate them.
Points to be noted:
If you just mention JOIN then by default it is a INNER JOIN.
An OUTER join has to be LEFT | RIGHT | FULL you can not simply say OUTER JOIN.
You can drop OUTER keyword and just say LEFT JOIN or RIGHT JOIN or FULL JOIN.
For those who want to visualise these in a better way, please go to this link:
A Visual Explanation of SQL Joins

What is the difference between left join and left outer join?
Nothing. LEFT JOIN and LEFT OUTER JOIN are equivalent.

Left Join and Left Outer Join are one and the same. The former is the shorthand for the latter. The same can be said about the Right Join and Right Outer Join relationship. The demonstration will illustrate the equality. Working examples of each query have been provided via SQL Fiddle. This tool will allow for hands on manipulation of the query.
Given
Left Join and Left Outer Join
Results
Right Join and Right Outer Join
Results

I'm a PostgreSQL DBA, as far as I could understand the difference between outer or not outer joins difference is a topic that has considerable discussion all around the internet. Until today I never saw a difference between those two; So I went further and I try to find the difference between those.
At the end I read the whole documentation about it and I found the answer for this,
So if you look on documentation (at least in PostgreSQL) you can find this phrase:
"The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL imply an outer join."
In another words,
LEFT JOIN and LEFT OUTER JOIN ARE THE SAME
RIGHT JOIN and RIGHT OUTER JOIN ARE THE SAME
I hope it can be a contribute for those who are still trying to find the answer.

I find it easier to think of Joins in the following order:
CROSS JOIN - a Cartesian product of both tables. ALL joins begin here
INNER JOIN - a CROSS JOIN with a filter added.
OUTER JOIN - an INNER JOIN with missing elements (from either LEFT or RIGHT table)
added afterward.
Until I figured out this (relatively) simple model, JOINS were always a bit more of a black art. Now they make perfect sense.
Hope this helps more than it confuses.

To answer your question
In Sql Server joins syntax OUTER is optional
It is mentioned in msdn article : https://msdn.microsoft.com/en-us/library/ms177634(v=sql.130).aspx
So following list shows join equivalent syntaxes with and without OUTER
LEFT OUTER JOIN => LEFT JOIN
RIGHT OUTER JOIN => RIGHT JOIN
FULL OUTER JOIN => FULL JOIN
Other equivalent syntaxes
INNER JOIN => JOIN
CROSS JOIN => ,
Strongly Recommend Dotnet Mob Artice : Joins in Sql Server

Why are LEFT/RIGHT and LEFT OUTER/RIGHT OUTER the same? Let's explain why this vocabulary.
Understand that LEFT and RIGHT joins are specific cases of the OUTER join, and therefore couldn't be anything else than OUTER LEFT/OUTER RIGHT. The OUTER join is also called FULL OUTER as opposed to LEFT and RIGHT joins that are PARTIAL results of the OUTER join. Indeed:
Table A | Table B Table A | Table B Table A | Table B Table A | Table B
1 | 5 1 | 1 1 | 1 1 | 1
2 | 1 2 | 2 2 | 2 2 | 2
3 | 6 3 | null 3 | null - | -
4 | 2 4 | null 4 | null - | -
null | 5 - | - null | 5
null | 6 - | - null | 6
OUTER JOIN (FULL) LEFT OUTER (partial) RIGHT OUTER (partial)
It is now clear why those operations have aliases, as well as it is clear only 3 cases exist: INNER, OUTER, CROSS. With two sub-cases for the OUTER.
The vocabulary, the way teachers explain this, as well as some answers above, often make it looks like there are lots of different types of join. But it's actually very simple.

There are only 3 joins:
A) Cross Join = Cartesian (E.g: Table A, Table B)
B) Inner Join = JOIN (E.g: Table A Join/Inner Join Table
B)
C) Outer join:
There are three type of outer join
Left Outer Join = Left Join
Right Outer Join = Right Join
Full Outer Join = Full Join

There are mainly three types of JOIN
Inner: fetches data, that are present in both tables
Only JOIN means INNER JOIN
Outer: are of three types
LEFT OUTER - - fetches data present only in left table & matching condition
RIGHT OUTER - - fetches data present only in right table & matching condition
FULL OUTER - - fetches data present any or both table
(LEFT or RIGHT or FULL) OUTER JOIN can be written w/o writing "OUTER"
Cross Join: joins everything to everything

Syntactic sugar, makes it more obvious to the casual reader that the join isn't an inner one.

Just in the context of this question, I want to post the 2 'APPLY' operators as well:
JOINS:
INNER JOIN = JOIN
OUTER JOIN
LEFT OUTER JOIN = LEFT JOIN
RIGHT OUTER JOIN = RIGHT JOIN
FULL OUTER JOIN = FULL JOIN
CROSS JOIN
SELF-JOIN: This is not exactly a separate type of join. This is basically joining a table to itself using one of the above joins. But I felt it is worth mentioning in the context JOIN discussions as you will hear this term from many in the SQL Developer community.
APPLY:
CROSS APPLY -- Similar to INNER JOIN (But has added advantage of being able to compute something in the Right table for each row of the Left table and would return only the matching rows)
OUTER APPLY -- Similar to LEFT OUTER JOIN (But has added advantage of being able to compute something in the Right table for each row of the Left table and would return all the rows from the Left table irrespective of a match on the Right table)
https://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/
https://sqlhints.com/2016/10/23/outer-apply-in-sql-server/
Real life example, when to use OUTER / CROSS APPLY in SQL
I find APPLY operator very beneficial as they give better performance than having to do the same computation in a subquery. They are also replacement of many Analytical functions in older versions of SQL Server. That is why I believe that after being comfortable with JOINS, one SQL developer should try to learn the APPLY operators next.

Related

Custom order for multiple joins in SQL Server

I am trying to write a query in SQL Server that replicates following figure:
I want the result of first left join (order_defect & ncdef) to be left join with third table (filter) and again the result of these three left join with last one (nsdic).
Each of these tables are huge, so I'm trying to find most efficient way to do it because i have limited space and I get "out of memory" error... any suggestion for an efficient query?
If I do:
Select *
from A
left join B on a.id = B.id
left join C on a.id = c.id
it's joining A and B first and then A and C...but I want the result of "A & B" to be join with "C".
Basically my question is how to use a result of one join, to join with another table.
Thank you
select
c.id
,c.colum1
,c.colum2
,c.colum3
,c.colum4
,t3.colum1
from
(
select
t1.id as id
,t1.colum1 as colum1
,t1.colum2 as colum2
,t2.column1 as colum3
,t2.colum2 as colum4
from table1 t1
left join table2 t2
on t1.id = t2.id
) as c
left join table3 t3
on c.id = t3.id
It's dificult to help you without the tables design and fields|columns, keys, ...
But I'll considerate:
1 - Primary keys fields, and how to relation the tables
2 - How to add left joins with "filters", or how to reduce the number of results
3 - Evaluate if it'll be better to use Sub-querys
Plus: Try the query with TOP 100 <--- to make test.
And remember: sometimes it's imposible to optimizate querys because of the hardware limits, like the RAM, in those case you have to show the data in sections.

SQL Server replaces LEFT JOIN for LEFT OUTER JOIN in view query

I need to use a LEFT JOIN in a View however SQL Server replaces LEFT JOIN for LEFT OUTER JOIN every time I save my view.
When trying to use LEFT INNER JOIN explicitly I get the error "Incorrect syntax near word 'INNER'". What is more when I want to create an index for the view I get the error "Cannot add clustered index to views using OUTER JOINS".
It's maddening, and ideas why this could be happening?
So when I try to create an index for the view I get the message I used an outer join though I didn't.
You are getting the joins confused and keep in mind there are different ways of writing joins. What you're looking for is a LEFT OUTER JOIN(OUTER is an optional). There is no LEFT INNER JOIN.
There are three major types of joins.
Type 1: INNER JOIN - only where both tables match
1.) INNER JOIN aka JOIN
Type 2: OUTER JOINS where either one or both tables match
1.) LEFT OUTER JOIN aka LEFT JOIN
2.) RIGHT OUTER JOIN aka RIGHT JOIN
3.) FULL OUTER JOIN aka FULL JOIN
Type 3: CROSS JOIN - Cartesian product(all possible combos of each table)
1.) Cross Join
Here's a graphic showing how each works:

SQL Server speed: left outer join vs inner join

In theory, why would inner join work remarkably faster then left outer join given the fact that both queries return same result set. I had a query which would take long time to describe, but this is what I saw changing single join: left outer join - 6 sec, inner join - 0 sec (the rest of the query is the same). Result set: the same
Actually depending on the data, left outer join and inner join would not return the same results..most likely left outer join will have more result and again depends on the data..
I'd be worried if I changed a left join to an inner join and the results were not different. I would suspect that you have a condition on the left side of the table in the where clause effectively (and probably incorrectly) turning it into an inner join.
Something like:
select *
from table1 t1
left join table2 t2 on t1.myid = t2.myid
where t2.somefield = 'something'
Which is not the same thing as
select *
from table1 t1
left join table2 t2
on t1.myid = t2.myid and t2.somefield = 'something'
So first I would be worried that my query was incorrect to begin with, then I would worry about performance. An inner join is NOT a performance enhancement for a Left Join, they mean two different things and should return different results unless you have a table where there will always be a match for every record. In this case you change to an inner join because the other is incorrect not to improve performance.
My best guess as to the reason the left join takes longer is that it is joining to many more rows that then get filtered out by the where clause. But that is just a wild guess. To know you need to look at the Execution plans.

SQL Server - Join Question - 3 tables

Consider the example from MSDN documentation:
SELECT p.Name, pr.ProductReviewID
FROM Production.Product p
LEFT OUTER JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductID
In this example, it is clear that the table on the left is "Production" and that is where all rows will be returned from, and then only those that match in ProductReview.
But now consider the following hypothetical query with 3 tables A,B,C
select * from A
inner Join B on A.field1 = B.field1
left outer join C on C.field2 = b.Field2
Which is the left table in this query (from which all records will be returned, regardless of a match to C)? Is it A or B? Or is it the result of the join from A & B?
My confusion arises from the following MSDN documentation, which states that "Outer joins can be specified in the FROM clause only" which would mean that the left table in my hypothetical query is A, but then I dont have an ON clause that specifies the join condition - in which case is my hypothetical query a bad one?
Since there is an INNER JOIN between A and B, only rows from B that match A will qualify for the LEFT JOIN to C.
I'm not 100% sure I understand you question, but assuming I am understanding it correctly:
Your "left" table in your hypothetical query is B, since your ON condition specifies the B.Field2.
The terms 'left" and "right" are not sufficiently specific in this context. Instead, you should use the terms "preserved" and "unpreserved". In that light, tables A and B are preserved and table C is unpreserved.
The reference in the MSDN documentation is meant to imply you cannot use joins (outer or otherwise) in the Select, Where, Group By, Having or Order By clauses outside of a subquery (where they are still in a From clause).
From your joins
A inner Join B on A.field1 = B
left outer join C on C.field2 = b.Field2
You need to have records from table A and B.
The left join only has data from table C field field2 matching the B table, but note that table A field2 does not have to match.
To see your data for table C run the following:
select c.*
from A inner Join B on A.field1 = B.field1 left outer join C on C.field2 = b.Field2
They use the term FROM clause in a general (broad) sense meaning the whole section of the query that starts from the keyword FROM and includes all the joins there are.
Here's a fuller context (note the previous sentence):
Inner joins can be specified in either the FROM or WHERE clauses. Outer joins can be specified in the FROM clause only.
See? They mean you cannot specify an outer join in the WHERE clause as is the case with inner joins. You can only do that in the FROM clause (that is, after however many other joins too). The result will be applied to the result of the previous joins.

Why does the order of join clauses affect the query plan in SQL Server?

I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution speed drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
|--- SessionRangeClass
|--- SessionRangeStream --- MO_Stream
|--- SessionRangeEnrolmentPeriod
|--- SessionRangeStudent
+----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID // + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
SQLServer2000 cost: 24.9
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:
SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;
SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;
SELECT a, b, c
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;
SELECT a, b, c
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;
a b c
----------- ----------- -----------
1 1 1
2 2 NULL
3 NULL NULL
4 NULL NULL
(4 row(s) affected)
a b c
----------- ----------- -----------
1 1 1
2 NULL NULL
3 NULL 3
4 NULL NULL
The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:
<joined_table>
Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.
You can alter the join order using parenthesis around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).
This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans" which may not be the most optimal for the query being executed.
Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.
However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before every other join can be done. It becomes sequential, and not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc). But, the point stands: The table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, you're getting more and more abstraction to what, exactly is on the left (especially if you use joined tables as the left table!).
With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.
A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will reduce number of records that must be considered the most quickly. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.
Along with the above optimization make sure that any fields used in JOINs are indexed
You query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.
Lets' take this subset:
WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.
The join should be this:
WTSA_SessionRange sr
left outer join
(SELECT WTSASessionRangeID, columns I need
FROM
WTSA_SessionSubrange ssr
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID
This is why the join order is affecting results because it's a different query, declaratively speaking.
You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.
it depends on which of the join fields are indexed - if it has to table scan the first field, but use an index on the second, it's slow. If your first join field is an index, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those first
At DevConnections a few years ago a session on SQL Server performance stated that (a) order of outer joins DOES matter, and (b) when a query has a lot of joins, it will not look at all of them before making a determination on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).

Resources