Loop Join in SQL Server 2008 - sql-server

I'm not clear about working difference between queries mentioned below.
Specifically I'm unclear about the concept of OPTION(LOOP JOIN).
1st approach: it's a traditional join used, which is most expensive than all of below.
SELECT *
FROM [Item Detail] a
LEFT JOIN [Order Detail] b ON a.[ItemId] = b.[fkItemId] OPTION (FORCE ORDER);
2nd approach: It includes OPTION in a statement with sorted data, merely optimized.
SELECT *
FROM [Item Detail] a
LEFT LOOP JOIN [Order Detail] b ON a.[ItemId] = b.[fkItemId] OPTION (FORCE ORDER);
3rd approach: Here, I am not clear, how the query works and includes OPTION with loop join!!?
SELECT *
FROM [Item Detail] a
LEFT LOOP JOIN [Order Detail] b ON a.[ItemId] = b.[fkItemId] OPTION (LOOP JOIN);
Can anybody explain difference and way of working and advantages of each one over other?
Note: These are not Nested OR Hash loops!

From Query Hints (Transact-SQL)
FORCE ORDER Specifies that the join order indicated by the query
syntax is preserved during query optimization. Using FORCE ORDER does
not affect possible role reversal behavior of the query optimizer.
also
{ LOOP | MERGE | HASH } JOIN Specifies that all join operations are
performed by LOOP JOIN, MERGE JOIN, or HASH JOIN in the whole query.
If more than one join hint is specified, the optimizer selects the
least expensive join strategy from the allowed ones.
Advanced Query Tuning Concepts
If one join input is small (fewer than 10 rows) and the other join
input is fairly large and indexed on its join columns, an index nested
loops join is the fastest join operation because they require the
least I/O and the fewest comparisons.
If the two join inputs are not small but are sorted on their join
column (for example, if they were obtained by scanning sorted
indexes), a merge join is the fastest join operation.
Hash joins can efficiently process large, unsorted, nonindexed inputs.
And Join Hints (Transact-SQL)
Join hints specify that the query optimizer enforce a join strategy
between two tables
Your option 1 tells the optimizer to keep the join order as is. So the JOIN type can be decided by the optimizer, so might be MERGE JOIN.
You option 2 is telling the optimizer to use LOOP JOIN for this specific JOIN. If there were any other joins in the FROM section, the optimizer would be able to decide for them. Also, you are specifying the order of JOINS to take for the optimizer.
Your last option OPTION (LOOP JOIN) would enforce LOOP JOIN across all joins in the query.
This all said, it is very seldom that the optimizer would choose an incorrect plan, and this should probably indicate bigger underlying issues, such as outdated statistics or fragmented indexes.

Related

Does the order of the tables in the INNER JOINs and the FROM clause matter?

So assume you perform an inner join between table1 and table2 on column1
Does it make a difference if the table with lower entries is in the FROM-clause?
So you start with less results in the first place and then the Join will reduce it less than the other way around.
Same thing for multiple INNER JOINs, should you put the JOINs first that will reduce the results the most, so the following joins have less work? Or does it not follow an procesing order?

Is Cross join with on equivalent to inner join [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;
There is no reason to ever use an implicit join (the one with the commas). Yes for inner joins it will return the same results. However, it is subject to inadvertent cross joins especially in complex queries and it is harder for maintenance because the left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explict joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.
If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.
The nice part of the second method is that it helps separates the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... thats describing the task at hand, not the database structure.
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and whats in the WHERE clause. But, that typically wont happen if the on clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.
there is no difference as far as I know is the second one with the inner join the new way to write such statements and the first one the old method.
The first one does a Cartesian product on all record within those two tables then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempt on a Cartesian product and will result in the same query more or less.
A bit same. Can help you out.
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)
In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual exection plan to see what the server's doing internally.

why Indexed Nested-Loop Join only applicable for equi-join or natural join?

Indexed Nested-Loop Join :
For each tuple tr in the outer relation R, use the index to look up tuples in S that satisfy the join condition with tuple tr
some materials mentioned that "Indexed Nested-Loop Join" only applicable for equi-join or natural join and an index is available on the inner relation’s join attribute
SELECT *
FROM tableA as a
JOIN tableB as b
ON a.col1 > b.col1;
Suppose we have an index on b.col1.
why Indexed Nested-Loop Join is not applicable for this case?
You are quoting slides for Database Systems Concepts (c) Silberschatz, Korth and Sudarshan.
We want the DBMS to calculate a join. There are lots of special cases where it can do it various ways. These might involve whether there are indexes, selection conditions, etc.
The particular technique that that book calls by that name works in certain cases:
Indexed Nested-Loop Join
If an index is available on the inner loop's join attribute and join
is an equi-join or natural join
The answer is, because your query does not meet the conditions. It is not an equi-join (ie ON or WHERE a.col1 = b.col1) or natural join (USING (col1) or NATURAL JOIN).
As to why not meeting those conditions means not using that technique, it would be because it doesn't work and/or some other technique is better. You gave the technique :
For each tuple tr in the outer relation r, use the index to look up
tuples in s that satisfy the join condition with tuple tr
If it's an inequality, you can't "look up in" the index; you have search through the index. Not this method.
Well I read the second answer, so I checked the book called Database System Concepts 7th Edition, by Silberschatz, Korth and Sudarshan
First, like you can see, Indexed Nested Loop Join, can be used if an index is available on the inner loop, and we don't have any other restriction related to equality joins
I think the confusion is with Merge Join. In page 708, Chapter 15, Query Processing subject, we can see that this algorithm can be used just to compute natural joins and equi-joins.
Profiting the topic, just a mention about Hash Join. In this case, same as Merge Join, can be used just to compute natural joins and equi-joins.

Algorithm for merge join with inequality condition

I read that Oracle supports merge join with inequality join predicates.
Is there online reference to algorithm used in implementation of such join ?
If anyone knows how to do that, Can you put it in answer?
This is what you're looking for.
7.4 Sort Merge Joins
Sort merge joins can join rows from two independent sources. In
general, hash joins perform better than sort merge joins. However,
sort merge joins can perform better than hash joins if both of the
following conditions exist:
The row sources are sorted. A sort operation is not required. However,
if a sort merge join involves choosing a slower access method (an
index scan as opposed to a full table scan), then the benefit of using
a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables
is an inequality condition such as <, <=, >, or >=. Sort merge joins
perform better than nested loops joins for large data sets. Hash joins
require an equality condition.
In a merge join, there is no concept of a driving table. The join
consists of two steps:
Sort join operation
Both the inputs are sorted on the join key.
Merge join operation
The sorted lists are merged.
If the input is sorted by the join column, then a sort join operation
is not performed for that row source. However, a sort merge join
always creates a positionable sort buffer for the right side of the
join so that it can seek back to the last match in the case where
duplicate join key values come out of the left side of the join.
There's an example here: http://www.serkey.com/oracle-skyline-query-challenge-bdh859.html
Is this what you're looking to do? (key word is "soft-merge")

Composite indexes for columns in joins and where

If I have a query:
Select *
From tableA
Inner Join tableB ON tableA.bId = tableB.id
Inner Join tableC ON tableA.cId = tableC.id
where
tableA.someColumn = ?
Do I get any performance benefit from creating a composite index(bId,cId,someColumn)?
I'm using DB2 Database for this activity.
Indexing joins depends on the join algorithm used by the database. You'll see that in the execution plan.
You will probably need an index on tableA that starts with someColumn for the where clause. Everything else depends on the join algorithm and join order.
You will probably get a more specific answer if you post the execution plan. You can also read the chapter "The Join Operation" on my site about sql indexing and try yourself.
If there are no indexes now, I'd guess that the composite index might be used in one or both inner joins. I doubt that it would be used in the WHERE clause.
But I've been doing this stuff for a long time. Guessing, like hoping, doesn't scale well.
Instead of guessing, you're better off learning how to use DB2's explain and design advisor utilities. Expect to test things like indexing first on a development computer. Building a three-column index on a 500 million row table that's in production will not make you popular.

Resources