Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'
-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'
Which do you prefer (and perhaps why)?
The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those records that have an order dated later than May 15, 2009 thus converting the left join to an inner join.
The second will give those records plus any customers with no orders. The results set is very different depending on where you put the condition. (Select * is for example purposes only, of course you should not use this in production code.)
The exception to this is when you want to see only the records in one table but not the other. Then you use the where clause for the condition not the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.
WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.
I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.
It feels syntactically cleaner.
I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.
Agree with 2nd most vote answer that it will make big difference when using LEFT JOIN or RIGHT JOIN. Actually, the two statements below are equivalent. So you can see that AND clause is doing a filter before JOIN while the WHERE clause is doing a filter after JOIN.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD
ON CUS.CustomerID = ORD.CustomerID
Joins are quicker in my opinion when you have a larger table. It really isn't that much of a difference though especially if you are dealing with a rather smaller table. When I first learned about joins, i was told that conditions in joins are just like where clause conditions and that i could use them interchangeably if the where clause was specific about which table to do the condition on.
Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.
It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.
I have Table1 having a1,a2,a3,a4 fields and Table2 having b1,b2,b3,b4 fileds.
Join:
inner join Table1.a1=Table2.b2 AND outer join Table1.a3=Table2.b3
How could I create Entity based VO having these both inner and outer joins?
You can try modifying the generated query by hand (custom query), after you create the VO based on both EO's in the traditional way.
I need to use a LEFT JOIN in a View however SQL Server replaces LEFT JOIN for LEFT OUTER JOIN every time I save my view.
When trying to use LEFT INNER JOIN explicitly I get the error "Incorrect syntax near word 'INNER'". What is more when I want to create an index for the view I get the error "Cannot add clustered index to views using OUTER JOINS".
It's maddening, and ideas why this could be happening?
So when I try to create an index for the view I get the message I used an outer join though I didn't.
You are getting the joins confused and keep in mind there are different ways of writing joins. What you're looking for is a LEFT OUTER JOIN(OUTER is an optional). There is no LEFT INNER JOIN.
There are three major types of joins.
Type 1: INNER JOIN - only where both tables match
1.) INNER JOIN aka JOIN
Type 2: OUTER JOINS where either one or both tables match
1.) LEFT OUTER JOIN aka LEFT JOIN
2.) RIGHT OUTER JOIN aka RIGHT JOIN
3.) FULL OUTER JOIN aka FULL JOIN
Type 3: CROSS JOIN - Cartesian product(all possible combos of each table)
1.) Cross Join
Here's a graphic showing how each works:
In theory, why would inner join work remarkably faster then left outer join given the fact that both queries return same result set. I had a query which would take long time to describe, but this is what I saw changing single join: left outer join - 6 sec, inner join - 0 sec (the rest of the query is the same). Result set: the same
Actually depending on the data, left outer join and inner join would not return the same results..most likely left outer join will have more result and again depends on the data..
I'd be worried if I changed a left join to an inner join and the results were not different. I would suspect that you have a condition on the left side of the table in the where clause effectively (and probably incorrectly) turning it into an inner join.
Something like:
select *
from table1 t1
left join table2 t2 on t1.myid = t2.myid
where t2.somefield = 'something'
Which is not the same thing as
select *
from table1 t1
left join table2 t2
on t1.myid = t2.myid and t2.somefield = 'something'
So first I would be worried that my query was incorrect to begin with, then I would worry about performance. An inner join is NOT a performance enhancement for a Left Join, they mean two different things and should return different results unless you have a table where there will always be a match for every record. In this case you change to an inner join because the other is incorrect not to improve performance.
My best guess as to the reason the left join takes longer is that it is joining to many more rows that then get filtered out by the where clause. But that is just a wild guess. To know you need to look at the Execution plans.
If an Inner Join can be thought of as a cross join and then getting the records that satisfy the condition, then a LEFT OUTER JOIN can be thought of as that, plus ONE record on the left table that doesn't satisfy the condition.
In other words, it is not a cross join that "goes easy" on the left records (even when the condition is not satisfied), because then the left record can appear many times (as many times as there are records in the right table).
So the LEFT OUTER JOIN is the Cross JOIN with the records satisfying the condition, plus ONE record from the LEFT TABLE that doesn't satisfy the condition?
I don't think it is correct to say a left outer join is: "the cross join with the records satsifying the condition and one record for the left table that doesn't satisy the condition".
An inner join without a condition is the same as a cross join. An inner join on x is the same as a cross join where x. But prefer the first as it is more explicit and harder to get wrong.
However with an outer join you don't always get the row "that doesn't satisfy the condition". The difference between a left outer join and an inner join is:
Inner join: If the join condition for a row in the left table fails for every row in the right table, you don't get that row.
Outer join: If the join condition for a row in the left table fails for every row in the right table, you get the row from the left table with NULLs for the columns in the right table.
You don't get both the rows that match and one row that doesn't - you either get the first situation or the second. Your statement seems to suggest that you can get both.