Right Join in SQL Server is taking too long

SELECT
b.1, b.2, b.3, b.4, a.4, a.3, a.5
FROM
a
RIGHT JOIN
b ON a.id = b.id
This query is taking more than 7 minutes.
Both tables have around 100 000 records, and a plain SELECT from either table takes about 12 seconds on average. The execution plan shows around 8708 logical reads on table a and a 100% operator cost. Both tables have a clustered index (CI) on ID.

Verify that an INDEX on the ID column exists in table A. For each row selected in B there will be a lookup of rows in A on the ID column. If an index does not exist on that column this will result in a table scan i.e. a lookup through 100k rows to find rows with that specific ID. Not efficient.
PS - General advice: write queries that don't use RIGHT JOIN; stick to INNER, LEFT and FULL OUTER joins unless there is no other way (there almost always is).
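As a rough illustration of the advice above, a RIGHT JOIN can always be written as a LEFT JOIN with the two tables swapped. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the tiny tables a and b with their val columns are hypothetical. The rewrite is logically equivalent, so it is about readability and is not expected to change performance by itself.

```python
import sqlite3

# Hypothetical miniature tables standing in for a and b from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# "a RIGHT JOIN b" keeps every row of b; the same result is obtained by
# putting b on the left and using LEFT JOIN.
rows = conn.execute("""
    SELECT b.id, b.val, a.val
    FROM b
    LEFT JOIN a ON a.id = b.id
    ORDER BY b.id
""").fetchall()

print(rows)  # [(2, 'b2', 'a2'), (3, 'b3', None)]
```

Every row of b appears once, and the row with no partner in a carries NULL (None in Python) for the column taken from a.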

Use this sql below to help you identify any missing indexes. My guess is you are missing at least one.
SELECT
statement AS [database.scheme.table],
column_id , column_name, column_usage,
migs.user_seeks, migs.user_scans,
migs.last_user_seek, migs.avg_total_user_cost,
migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
ON mig.index_group_handle=migs.group_handle
ORDER BY mig.index_group_handle, mig.index_handle, column_id

Related

DB2 Concatenation of varchar columns in join condition

I have a table structured as below
partial_id1 | partial_id2 | partial_id3|partial_id4| Name | Address
____________|_____________|____________|___________|______|____________
and select query as
select
A.bla1,
A.bla2,
A.bla3,
B.Name,
C.Name,
D.Name
from TABLE1 as A
left join ABOVE_TABLE as B
on
B.partial_id1||B.partial_id2||B.partial_id3||B.partial_id4=RPAD(A.ID1,11,'0')
left join ABOVE_TABLE as C
on
C.partial_id1||C.partial_id2||C.partial_id3||C.partial_id4=RPAD(A.ID2,11,'0')
left join ABOVE_TABLE as D
on
D.partial_id1||D.partial_id2||D.partial_id3||D.partial_id4=RPAD(A.ID3,11,'0')
where A.PK in ('1','2','22')
This query is taking too much time. If I remove the left joins it takes <50ms and if I leave them as is, it takes around 4 seconds.
How can I optimize this query? How can I avoid concatenation in join condition?
The answer to this is to fix your database design. Creating a table where you have to concatenate multiple columns to form a key is not a great design for performance.
Note that when you use column functions (concatenate on the left, RPAD on the right) this eliminates the possibility of using indexes (unless you are on DB2 10.5, which added expression-based indexes).

Why does LEFT JOIN increase query time so much?

I'm using SQL Server 2012 and encountered a strange problem.
This is the original query I've been using:
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, NULL
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
WHERE
H.[NUM] = (SELECT TOP 1 [NUM] FROM [TABLE_Accounts_History]
WHERE [RSIN] = H.[RSIN]
AND [AccountSys] = H.[AccountSys]
AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
AND [DATE_DEAL] < @dte
ORDER BY [DATE_DEAL] DESC)
AND H.[TYPE_DEAL] <> 'D'
Table TABLE_Accounts_History consists of 3 200 000 records.
Table TABLE_For_Filtering is around 1 500 records.
Insert took me 2m 40s and inserted 1 600 000 records for further work.
But then I decided to attach a column from pretty small table TABLE_Additional (only around 100 recs):
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, P.[prof_type]
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
LEFT JOIN [TABLE_Additional] P ON H.[ACCOUNTSYS] = P.[AccountSys]
WHERE H.[NUM] = ( SELECT TOP 1 [NUM]
FROM [TABLE_Accounts_History]
WHERE [RSIN] = H.[RSIN]
AND [AccountSys] = H.[AccountSys]
AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
AND [DATE_DEAL] < @dte
ORDER BY [DATE_DEAL] DESC)
AND H.[TYPE_DEAL] <> 'D'
And now this query takes ages to complete. Why is that? How can such a small left join hurt performance so badly? How can I improve it?
An update: no luck so far with LEFT JOIN, whether with indexes, without indexes, or with hinted indexes. For now I've found a workaround by using my first query and an UPDATE after it:
UPDATE [TABLE_TEMP]
SET [PROF_TYPE] = P1.[prof_type]
FROM [TABLE_TEMP] A1
LEFT JOIN
[TABLE_Additional] P1
ON A1.[ACCOUNTSYS] = P1.[AccountSys]
It takes only 5s and does pretty much what I've been trying to achieve. Still, SQL Server performance is a mystery to me.
The 'small' left join is actually doing a lot of extra work. SQL Server has to go back to TABLE_Additional for each row produced by your inner join between TABLE_Accounts_History and TABLE_For_Filtering. You can help SQL Server speed this up by trying some indexing. You could:
1) Ensure TABLE_Accounts_History has an index on the Foreign Key H.[ACCOUNTSYS]
2) If you think that TABLE_Additional will always be accessed by AccountSys, i.e. you will be requesting AccountSys in ordered groups, you could create a Clustered Index on TABLE_Additional.AccountSys (in other words, physically order the table on disk in order of AccountSys).
3) You could also ensure there is a foreign key index on TABLE_Accounts_History.
A left outer join returns all rows from the left table. In your case the left table has 3 200 000 rows, and each of them is compared against the right table. One solution is to use indexes, which will reduce retrieval time.

SQL Server Rewriting Left Join

I was having a problem with a larger query in SQL Server which I traced back to this section of code which isn't performing as expected.
SELECT item_name,item_num,rst_units
FROM tbl_item left join tbl_sales_regional on item_num=rst_item
WHERE rst_month=7 and rst_customer='AB123'
The first table (tbl_item) has 10,000 records. The second table (tbl_sales_regional) has 83 for the shown criteria.
Instead of returning 10,000 records with 9,917 nulls, the execution plan shows SQL Server has rewritten as an inner join and consequently returns 83 results.
In order to achieve the intended result, I have to join off a subquery. Can someone provide an explanation why this doesn't work?
Not sure which fields belong where, but you seem to have some fields from tbl_sales_regional in your WHERE condition.
Move them into the ON clause:
SELECT item_name, item_num, rst_units
FROM tbl_item
LEFT JOIN
tbl_sales_regional
ON rst_item = item_num
AND rst_month = 7
AND rst_customer = 'AB123'
The WHERE clause operates on the results of the join so these conditions cannot possibly hold true for any NULL records from tbl_sales_regional returned by the join, as NULL cannot be equal to anything.
That's why the optimizer transforms your query into the inner join.
Any conditions you have in your WHERE clause are applied regardless of the left join, effectively making it an inner join.
You need to change it to:
SELECT item_name,item_num,rst_units
FROM tbl_item left join tbl_sales_regional on item_num=rst_item
AND rst_month=7 and rst_customer='AB123'
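The difference between filtering in WHERE and filtering in ON can be reproduced on a toy data set. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the table and column names follow the question, while the three items and two sales rows are invented for illustration.

```python
import sqlite3

# Hypothetical miniature versions of the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_item (item_num INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE tbl_sales_regional (
        rst_item INTEGER, rst_month INTEGER,
        rst_customer TEXT, rst_units INTEGER);
    INSERT INTO tbl_item VALUES (1, 'bolt'), (2, 'nut'), (3, 'washer');
    INSERT INTO tbl_sales_regional VALUES
        (1, 7, 'AB123', 10),
        (2, 6, 'AB123', 5);
""")

# Filter in WHERE: items without a matching sale have NULL in rst_month,
# so the filter rejects them and the result looks like an inner join.
where_rows = conn.execute("""
    SELECT item_name, item_num, rst_units
    FROM tbl_item
    LEFT JOIN tbl_sales_regional ON rst_item = item_num
    WHERE rst_month = 7 AND rst_customer = 'AB123'
    ORDER BY item_num
""").fetchall()

# Filter in ON: every item is kept; items without a qualifying sale
# get NULL (None in Python) for rst_units.
on_rows = conn.execute("""
    SELECT item_name, item_num, rst_units
    FROM tbl_item
    LEFT JOIN tbl_sales_regional
        ON rst_item = item_num
        AND rst_month = 7
        AND rst_customer = 'AB123'
    ORDER BY item_num
""").fetchall()

print(where_rows)  # [('bolt', 1, 10)]
print(on_rows)     # [('bolt', 1, 10), ('nut', 2, None), ('washer', 3, None)]
```

Note that 'nut' has a sale in month 6: the WHERE version drops it entirely, while the ON version keeps the item and simply treats that sale as a non-match.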

With LEFT OUTER JOIN there is always an index scan in SQL Server 2005

I have a query joining several tables; the last table is joined with LEFT JOIN. The last table has more than a million rows, and the execution plan shows a table scan on it. I have indexed the columns on which the join is made. It always uses an index scan, but if I replace LEFT JOIN with INNER JOIN, an index seek is used and execution takes a few seconds. With LEFT JOIN there is a table scan, so the execution takes several minutes. Does using outer joins turn off indexes? Did I miss something? What is the reason for such behavior?
Here is the Query
Select *
FROM
Subjects s
INNER join Question q ON q.SubjectID = s.SubjectID
INNER JOIN Answer a ON a.QuestionID = q.QuestionID
Left outer JOIN Cell c ON c.QuestionID = q.QuestionID
Where S.SubjectID =15
There is a clustered index on SubjectID in the "Subject" table, and there are non-clustered indexes on QuestionID in the other tables.
Solution:
I tried it another way and now I get an index seek on the Cell table. Here is the modified query:
Select *
FROM
Subjects s
INNER join Question q ON q.SubjectID = s.SubjectID
INNER JOIN Answer a ON a.QuestionID = q.QuestionID
Left outer JOIN Cell c ON c.QuestionID = q.QuestionID
AND C.QuestionID > 0
AND C.CellKey > 0
Where S.SubjectID =15
This way I got high selectivity on the Cell table. :)
I just tried to reproduce the same issue, but I got no table scan; the plan used the clustered index of Cell instead. You could also try to force the index with a table hint, keeping in mind the issues you may face when forcing an index. Hope this helps.

Why does the order of join clauses affect the query plan in SQL Server?

I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution time drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
|--- SessionRangeClass
|--- SessionRangeStream --- MO_Stream
|--- SessionRangeEnrolmentPeriod
|--- SessionRangeStudent
+----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID -- + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
SQL Server 2000 cost: 24.9
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:
SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;
SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;
SELECT a, b, c
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;
SELECT a, b, c
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;
a b c
----------- ----------- -----------
1 1 1
2 2 NULL
3 NULL NULL
4 NULL NULL
(4 row(s) affected)
a b c
----------- ----------- -----------
1 1 1
2 NULL NULL
3 NULL 3
4 NULL NULL
The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:
<joined_table>
Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.
You can alter the join order using parentheses around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).
This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans" which may not be the most optimal for the query being executed.
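To make the effect of grouping joins concrete, here is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The three tables are hypothetical, heavily simplified versions of SessionRange, SessionRangeStream and MO_Stream, and the grouped form is written as a derived table (a subquery with an alias) in place of the bare parenthesised join, since that spelling is portable. The mapping row pointing at stream 99 is deliberately orphaned to expose the difference.

```python
import sqlite3

# Hypothetical, simplified tables: ranges, a mapping table, and streams.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_range (range_id INTEGER PRIMARY KEY);
    CREATE TABLE range_stream (range_id INTEGER, stream_id INTEGER);
    CREATE TABLE mo_stream (stream_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO session_range VALUES (1), (2), (3);
    INSERT INTO range_stream VALUES (1, 10), (2, 99);  -- 99 has no stream row
    INSERT INTO mo_stream VALUES (10, 'A');
""")

# Chained outer joins: the mapping row for range 2 survives on its own,
# with NULL for the missing stream name.
chained = conn.execute("""
    SELECT sr.range_id, rs.stream_id, ms.name
    FROM session_range sr
    LEFT JOIN range_stream rs ON rs.range_id = sr.range_id
    LEFT JOIN mo_stream ms ON ms.stream_id = rs.stream_id
    ORDER BY sr.range_id
""").fetchall()

# Grouped join: mapping and stream are inner-joined first, and only the
# combined result is outer-joined to the ranges.
grouped = conn.execute("""
    SELECT sr.range_id, g.stream_id, g.name
    FROM session_range sr
    LEFT JOIN (
        SELECT rs.range_id, rs.stream_id, ms.name
        FROM range_stream rs
        INNER JOIN mo_stream ms ON ms.stream_id = rs.stream_id
    ) AS g ON g.range_id = sr.range_id
    ORDER BY sr.range_id
""").fetchall()

print(chained)  # [(1, 10, 'A'), (2, 99, None), (3, None, None)]
print(grouped)  # [(1, 10, 'A'), (2, None, None), (3, None, None)]
```

The two forms return the same number of rows here, but range 2 differs: the chained version exposes the orphaned mapping row, while the grouped version treats the mapping and the stream as one unit that either matches or does not.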
Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.
However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before every other join can be done. It becomes sequential, and not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc). But, the point stands: The table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, you're getting more and more abstraction to what, exactly is on the left (especially if you use joined tables as the left table!).
With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.
A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will reduce number of records that must be considered the most quickly. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.
Along with the above optimization, make sure that any fields used in JOINs are indexed.
Your query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.
Let's take this subset:
WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.
The join should be this:
WTSA_SessionRange sr
left outer join
(SELECT ssr.WTSASessionRangeID -- plus the other columns I need
FROM
WTSA_SessionSubrange ssr
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID
This is why the join order affects the results: declaratively speaking, it's a different query.
You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.
It depends on which of the join fields are indexed: if it has to table scan the first field but can use an index on the second, it's slow. If your first join field is indexed, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those joins first.
At DevConnections a few years ago a session on SQL Server performance stated that (a) order of outer joins DOES matter, and (b) when a query has a lot of joins, it will not look at all of them before making a determination on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).

Resources