I have three tables:
SmallTable
(id int, flag1 bit, flag2 bit)
JoinTable
(SmallTableID int, BigTableID int)
BigTable
(id int, text1 nvarchar(100), otherstuff...)
SmallTable has, at most, a few dozen records. BigTable has a few million, and is actually a view that UNIONS a table in this database with a table in another database on the same server.
Here's the join logic:
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
Average joined size is a few thousand results. Everything shown is indexed.
For most SmallTable records, flag1 and flag2 are set to 1, so there's really no need to even access the index on BigTable.text1, but SQL Server does anyway, leading to a costly Indexed Scan and Nested Loop.
Is there a better way to hint to SQL Server that, if flag1 and flag2 are both set to 1, it shouldn't even bother looking at text1?
Actually, if I can avoid the join to BigTable completely in these cases (JoinTable is managed, so this wouldn't create an issue), that would make this key query even faster.
SQL Boolean evaluation does NOT guarantee operator short-circuit. See On SQL Server boolean operator short-circuit for a clear example showing how assuming operator short circuit can lead to correctness issues and run-time errors.
On the other hand the very example in my link shows what does work for SQL Server: providing an access path that SQL can use. So, as with all SQL performance problems and questions, the real problem is not in the way the SQL text is expressed, but in the design of your storage. Ie. what indexes has the query optimizer at its disposal to satisfy your query?
I don't believe SQL Server will short-circuit conditions like that unfortunately.
SO I'd suggest doing 2 queries and UNION them together. First query with s.flag1=1 and s.flag2=1 WHERE conditions, and the second query doing the join on to BigTable with the s.flag1<>1 a s.flag2<>1 conditions.
This article on the matter is worth a read, and includes the bottom line:
...SQL Server does not do
short-circuiting like it is done in
other programming languages and
there's nothing you can do to force it
to.
Update:
This article is also an interesting read and contains some good links on this topic, including a technet chat with the development manager for the SQL Server Query Processor team which briefly mentions that the optimizer does allow short-circuit evaluation. The overall impression I get from various articles is "yes, the optimizer can spot the opportunity to short circuit but you shouldn't rely on it and you can't force it". Hence, I think the UNION approach may be your best bet. If it's not coming up with a plan that takes advantage of an opportunity to short cut, that would be down to the cost-based optimizer thinking it's found a reasonable plan that does not do it (this would be down to indexes, statistics etc).
It's not elegant, but it should work...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1 = 1 and s.flag2 = 1) OR
(
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
)
SQL Server usually grabs the subquery hint (though it's free to discard it):
SELECT *
FROM (
SELECT * FROM SmallTable where flag1 <> 1 or flag2 <> 1
) s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
...
No idea if this will be faster without test data... but it sounds like it might
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=0 AND b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=0 AND b.text1 <> 'value1')
Please let me know what happens
Also, you might be able to speed this up by just returning just a unique id for this query and then using the result of that to get all the rest of the data.
edit
something like this?
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE EXISTS
(SELECT 1 from BigTable b
WHERE
(s.flag1=0 AND b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=0 AND b.text1 <> 'value1')
)
Hope this works - careful of shortcut logic in case statements around aggregates but...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE 1=case when (s.flag1 = 1 and s.flag2 = 1) then 1
when (
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
) then 1
else 0 end
Related
I have come across this situation multiple times wherein I need to grab data from one or another table based on some parameter to the stored procedure. Let me clarify with an example. Suppose we need to grab some data from either an archived table or an online table and a bunch of other tables. I can think of 3 ways to accomplish this:
Use an if condition and store result in a temp table and then join temp table to other tables
Use an if condition and grab data either from archive table or online table and join other tables. The entire query will be duplicated except for the part of archive table or online table.
Use a union subquery
Query for Approach 1
create table #archivedOrOnline (Id int);
declare #archivedData as bit = 1;
if (#archivedData = 1)
begin
insert into #archivedOrOnline
select
at.Id
from
dbo.ArchivedTable at
end
else
begin
insert into #archivedOrOnline
select
ot.Id
from
dbo.OnlineTable ot
end
select
*
from
#archivedOrOnline ao
inner join dbo.AnotherTable at on ao.Id = at.Id;
-- Lots more joins and subqueries irrespective of #archivedData
Query for Approach 2
declare #archivedData as bit = 1;
if (#archivedData = 1)
begin
select
*
from
dbo.ArchivedTable at
inner join dbo.AnotherTable another on at.Id = another.Id
-- Lots more joins and subqueries irrespective of #archivedData
end
else
begin
select
*
from
dbo.OnlineTable ot
inner join dbo.AnotherTable at on ot.Id = at.Id
-- Lots more joins and subqueries irrespective of #archivedData
end
Query for Approach 3
declare #archivedData as bit = 1;
select
*
from
(
select
m.Id
from
dbo.OnlineTable ot
where
#archivedData = 0
union
select
m.Id
from
dbo.ArchivedTable at
where
#archivedData = 1
) archiveOrOnline
inner join dbo.AnotherTable at on at.Id = archiveOrOnline.Id;
-- Lots more joins and subqueries irrespective of #archivedData
Basically I am asking which approach to choose or if there is a better approach. Approach 2 will have a lot of duplicate code. The other 2 approaches remove code duplication. I even have the query plans but my knowledge of making decisions based on the query plan is limited. I always go with the approach which removes code duplication. If there is a performance issue, I may choose another approach.
Your approach 3 can work fine. You should definitely use UNION ALL not UNION though so SQL Server does not add operations to remove duplicates from the tables.
For best chances of success with approach 3 you would need to add an OPTION (RECOMPILE) hint so that SQL Server simplifies out the unneeded table reference at compile time at the expense of recompiling it on each execution.
If the query is executed too frequently to make that approach attractive then you may get an OK plan without it and filters with startup predicates to only access the relevant table at run time - but you may have problems with cardinality estimates with this more generic approach and it might limit the optimisations available and give you a worse plan than option 2.
If you don't mind extra unused columns in your results, you can represent such "IF"s with additional join conditions.
SELECT stuff
FROM MainTable AS m
LEFT JOIN ArchiveTable AS a ON #archivedData = 1 AND m.id = a.id
LEFT JOIN OnlineTable AS o ON #archivedData <> 1 AND m.id = o.id
;
If the Archive and Online tables have the same fields, you can even avoid extra result fields with select expressions like COALESCE(a.field1, b.field1) AS field1
If there are following joins that are dependent on values from ArchiveTable OnlineTable, this can be simplified by performing these core joins in a subquery (at least some coalesces will be necessary though)
SELECT stuff
FROM (
SELECT m.stuff, a.stuff, o.stuff
, COALESCE(a.field1, b.field1) AS xValue
, COALESCE(a.field2, b.field2) AS yValue
, COALESCE(a.field3, b.field3) AS zValue
FROM MainTable AS m
LEFT JOIN ArchiveTable AS a ON #archivedData = 1 AND m.id = a.id
LEFT JOIN OnlineTable AS o ON #archivedData <> 1 AND m.id = o.id
) AS coreQuery
INNER JOIN xTable AS x ON x.something = coreQuery.xValue
INNER JOIN yTable AS y ON y.something = coreQuery.yValue
INNER JOIN zTable AS z ON z.something = coreQuery.zValue
;
If there is criteria narrowing down the MainTable rows to be used, the WHERE for them should be included in the subquery to minimize the amount of Archive/Online carried out of the subquery.
If the Archive/Online table is actually the "main" table, the question's option 3 should work, but I would suggest putting any filtering criteria relevant to those tables in the their UNIONed subqueries.
If there is no filtering criteria on whatever table is "main", I would consider just maintaining two queries (or building one dynamically) so that the subqueries these approaches necessitate are not needed and will not interfere with index use.
Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'
-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'
Which do you prefer (and perhaps why)?
The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those records that have an order dated later than May 15, 2009 thus converting the left join to an inner join.
The second will give those records plus any customers with no orders. The results set is very different depending on where you put the condition. (Select * is for example purposes only, of course you should not use this in production code.)
The exception to this is when you want to see only the records in one table but not the other. Then you use the where clause for the condition not the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.
WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.
I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.
It feels syntactically cleaner.
I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.
Agree with 2nd most vote answer that it will make big difference when using LEFT JOIN or RIGHT JOIN. Actually, the two statements below are equivalent. So you can see that AND clause is doing a filter before JOIN while the WHERE clause is doing a filter after JOIN.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD
ON CUS.CustomerID = ORD.CustomerID
Joins are quicker in my opinion when you have a larger table. It really isn't that much of a difference though especially if you are dealing with a rather smaller table. When I first learned about joins, i was told that conditions in joins are just like where clause conditions and that i could use them interchangeably if the where clause was specific about which table to do the condition on.
Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.
It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.
How can I optimize Performance of the below mentioned query when the table structure is as shown in the pic below
Pic Showing The Table Structure
select CounterID, OutletTitle, CounterTitle
from(
select OutletID, Text as OutletTitle
from Outlets as q1
inner join
TranslationTexts as tt
on q1.TitleID=tt.TranslationID
where tt.Locale='ar-SA' and q1.CompanyID=311 and q1.OutletID=8 --Locale & CompanyID & OutletID
) as O
inner join
(
select CounterID, Text as CounterTitle, OutletID
from Counters as q1
inner join
TranslationTexts as tt
on q1.TitleID=tt.TranslationID
where tt.Locale='ar-SA' and q1.OutletID=8 --Locale & OutletID
) as C
on O.OutletID=C.OutletID
You should try this request :
SELECT CounterID, tou.Text as OutletTitle, tco.Text as CounterTitle
FROM Counters as co
INNER JOIN Outlets as ou ON co.OutletID = ou.OutletID
INNER JOIN TranslationTexts as tco on co.TitleID=tco.TranslationID
INNER JOIN TranslationTexts as tou on ou.TitleID=tou.TranslationID
WHERE co.CompanyID=311 and co.OutletID=8 AND tco.Locale='ar-SA' and tou.Locale='ar-SA'
To have much better performance, you could add some indexes on the 3 tables.
This is a different approach. I cannot say about improvement in performance because that depends on a lot of other things, but I believe it is an equivalent version and an easier one to read.
SELECT
C.CounterID
, tt.Text AS OutletTitle
, tt.Text AS CounterTitle
FROM
Outlets AS q1
INNER JOIN TranslationTexts AS tt ON q1.TitleID=tt.TranslationID
INNER JOIN Counters C ON c.OutletID=q1.OutletID
INNER JOIN TranslationTexts AS tt2 ON tt2.TranslationID=tt.TranslationID AND tt2.Locale=tt.Locale
WHERE
tt.Locale='ar-SA' and q1.CompanyID=311 and q1.OutletID=8;
The question is what you want to optimize.. readability (and maintainability) and/or performance ?
Most people have their own 'style' when writing queries. I prefer the one below, but to the server it will probably look the same and most likely the system will have the exact same amount of 'work' to get the data even though it 'looks' different to us humans. I'd suggest to google around a bit and learn how to interpret a Query Plan.
SELECT q2.CounterID,
tt1.Text as OutletTitle,
tt2.Text as CounterTitle
FROM Outlets as q1
INNER JOIN Counters as q2
ON q2.OutletID = q1.OutletID
INNER JOIN TranslationTexts as tt1
ON tt1.TranslationID = q1.TitleID
AND tt1.Locale = 'ar-SA'
INNER JOIN TranslationTexts as tt2
ON tt2.TranslationID = q2.TitleID
AND tt2.Locale = 'ar-SA'
WHERE q1.CompanyID = 311
AND q1.OutletID = 8
On of the things I notice is that you pass both CompanyID and OutletID as filters for the Outlets table. Since OutletID is the primary key of that table I wonder if you really need the filter on CompanyID. At best it will eliminate the record because it's the wrong company, but somehow I'm under the impression that you already know the right CompanyID.
As for performance, I'd advice these indexes
CREATE INDEX idx_Locale ON TranslationTexts (Locale, Translation_id)
CREATE INDEX idx_CompanyID ON Outlets (CompanyID) INCLUDE (TitleID, OutletID)
Most likely you even can make that index on Local a UNIQUE index making it work even better.
I need to optimize this query. The professor recommends using indices, but i'm very confused about how. If I could get just one example of what a good index is and why, and the actual code needed, I could definitely do the rest by myself. Any help would be awesome. (PSQL btw)
SELECT
x.enteredBy
, x.id
, count(DISTINCT xr.id)
, count(DISTINCT c.id)
, 'l'
FROM
((locationsV x left outer join locationReviews xr on x.id = xr.lid)
left outer join reviews r on r.id = xr.id)
left outer join comments c on xr.id = c.reviewId
WHERE
x.vNo = 0
AND (r.enteredBy IS NULL OR
(r.enteredBy <> x.enteredBy
AND c.enteredBy <> x.enteredBy
AND r.enteredBY NOT IN
(SELECT requested FROM friends WHERE requester = x.enteredBY)
AND r.enteredBY NOT IN
(SELECT requester FROM friends WHERE requested = x.enteredBY)))
AND (c.enteredBy IS NULL OR
(c.enteredBY NOT IN
(SELECT requested FROM friends WHERE requester = x.enteredBY)
AND c.enteredBY NOT IN
(SELECT requester FROM friends WHERE requested = x.enteredBY)))
GROUP BY
x.enteredBy
, x.id
I tried adding something like this to the beginning, but the overall time it took didn't change.
CREATE INDEX friends1_idx ON friends(requested);
CREATE INDEX friends2_idx ON friends(requester);
I think the SQL itself could be optimized to improve performance in addition to looking at indexes. Having those IN clauses in the WHERE clause may cause the optimizer do full table scans. So if you could move those to be tables in the FROM section you would have better performance. Also, having the COUNT(DISTINCT ...) clauses in in the SELECT statement seems problematic. You would likely be better off if you could make changes so the DISTINCT clauses were necessary there and simply use the COUNT aggregate function.
Consider using a SQL statement in the FROM clause before you do the left join--a structure something like this:
SELECT ...
FROM Table1 LEFT JOIN
(SELECT ... FROM Table2 INNER JOIN Table3 ON ...) AS Table4 ON
Table1.somecolumn = Table4.somecolumn
...
I know this isn't giving you the solution, but hopefully it will help you to think about other aspects of the problem and to explore other ways to address performance.
I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution speed drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
|--- SessionRangeClass
|--- SessionRangeStream --- MO_Stream
|--- SessionRangeEnrolmentPeriod
|--- SessionRangeStudent
+----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID // + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
SQLServer2000 cost: 24.9
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:
SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;
SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;
SELECT a, b, c
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;
SELECT a, b, c
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;
a b c
----------- ----------- -----------
1 1 1
2 2 NULL
3 NULL NULL
4 NULL NULL
(4 row(s) affected)
a b c
----------- ----------- -----------
1 1 1
2 NULL NULL
3 NULL 3
4 NULL NULL
The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:
<joined_table>
Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.
You can alter the join order using parenthesis around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).
This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans" which may not be the most optimal for the query being executed.
Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.
However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before every other join can be done. It becomes sequential, and not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc). But, the point stands: The table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, you're getting more and more abstraction to what, exactly is on the left (especially if you use joined tables as the left table!).
With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.
A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will reduce number of records that must be considered the most quickly. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.
Along with the above optimization make sure that any fields used in JOINs are indexed
You query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.
Lets' take this subset:
WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.
The join should be this:
WTSA_SessionRange sr
left outer join
(SELECT WTSASessionRangeID, columns I need
FROM
WTSA_SessionSubrange ssr
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID
This is why the join order is affecting results because it's a different query, declaratively speaking.
You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.
it depends on which of the join fields are indexed - if it has to table scan the first field, but use an index on the second, it's slow. If your first join field is an index, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those first
At DevConnections a few years ago a session on SQL Server performance stated that (a) order of outer joins DOES matter, and (b) when a query has a lot of joins, it will not look at all of them before making a determination on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).