Query optimization for INNER JOIN OR condition - sql-server

I have the following query that runs really slowly and I've used the estimated execution plan to narrow down the problem to the final INNER JOIN's OR condition.
SELECT
TableE.id
FROM
TableA WITH (NOLOCK)
INNER JOIN TableB WITH (NOLOCK)
ON TableA.[bid] = TableB.[id]
LEFT JOIN TableC WITH (NOLOCK)
ON TableB.[cid] = TableC.[id]
LEFT JOIN TableD WITH (NOLOCK)
ON TableA.[did] = TableD.[id]
INNER JOIN TableE WITH (NOLOCK)
ON TableD.[eid] = TableE.[id]
OR TableE.[numericCol] = TableB.[numericCol] -- commenting out this OR statement leads to large performance increase
WHERE
TableA.[id] = #Id
I have the following index on TableB:
CREATE UNIQUE NONCLUSTERED INDEX [IX_TableB_numericCol_id] ON [dbo].[TableB]
(
[numericCol] ASC,
[id] ASC
)
and on TableE:
CREATE NONCLUSTERED INDEX [IX_TableE_numericCol] ON [dbo].[TableE]
(
[numericCol] ASC
)
Any ideas?

See my answer here:
What goes wrong when I add the Where clause?
You say that you tried adding a covering index on TableE, covering both numericCol and id. However, I suspect that the query didn't use it. That is why you see a Table Spool.
You want to force the query to use the covering index, either by making it the only index on the table, or by including a query hint. That should eliminate the Table Spool and speed up the Nested Loop.
Same for TableB. If the index being used is only on Id, then it is not helping the Nested Loop which is on NumericCol. Force the issue either by getting rid of the index on Id, or with a query hint.
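For illustration, here is a sketch of the query-hint route using the index names shown in the question. This is an assumption about which indexes to hint, not a guaranteed fix; if a covering (numericCol, id) index exists on TableE, hint that one instead, and verify against the actual plan that the Table Spool disappears:
SELECT
    TableE.id
FROM
    TableA WITH (NOLOCK)
INNER JOIN TableB WITH (NOLOCK, INDEX([IX_TableB_numericCol_id]))
    ON TableA.[bid] = TableB.[id]
LEFT JOIN TableC WITH (NOLOCK)
    ON TableB.[cid] = TableC.[id]
LEFT JOIN TableD WITH (NOLOCK)
    ON TableA.[did] = TableD.[id]
INNER JOIN TableE WITH (NOLOCK, INDEX([IX_TableE_numericCol]))
    ON TableD.[eid] = TableE.[id]
    OR TableE.[numericCol] = TableB.[numericCol]
WHERE
    TableA.[id] = #Id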


View slows down when I add additional conditions in where clause when querying the view

I am using a similar query. I cannot post the actual query or the execution plans here. I tried adding the non-clustered index suggested by the execution plan, but it slowed the query down even further.
I know this is incomplete information, but can you please suggest what I can try? I am out of options!
I am adding the condition below to the WHERE clause when I query the view. The date range on its own is fine, but as soon as I add either of the other two conditions, the query takes hours.
where Date_Time between '2021-11-01 00:00:00.000' and '2022-11-02 00:00:00.000'
and Visit_code not in ('12', '13')
and mode_code <>'99'
Execution plan XML
CREATE VIEW [dbo].[vw_Test] AS
select fields
from table1 ed
left join table2 e on ed.field1_id = e.field1_id
left join table3 et on et.field1_id = ed.field1_id
left join table4 etf on etf.field1_id = e.field1_id
and etf.field2_cd= 85429041
and etf.dt_tm_field >= '2025-01-01 00:00:00.0000000'
left join table5 etf_dt on etf_dt.field1 = e.field1
and etf_dt.field3= 85429039
and etf_dt.dt_tm_field >= '2025-01-01 00:00:00.0000000'
left join table6 ei on ei.field1 = ed.field1
and ei.field4_cd = 123485.00
left join table7 cvo_ModeOfArrival on cvo_ModeOfArrival.field = ed.field6
and cvo_ModeOfArrival.field5 = 12345
left join table7 cvo_ModeOfSep on cvo_ModeOfSep.field = ei.field7
and cvo_ModeOfSep.field5 = 23456
left join table7 cvo_FinancialClass on cvo_FinancialClass.field = e.field8
and cvo_FinancialClass.field5 = 34567
left join table7 cvo_Specialty on cvo_Specialty.field = e.field9
and cvo_Specialty.field5 = 45678
left join table8 ea on ea.field1_id = e.field1_id
left join table7 cvo_ea on cvo_ea.field = ea.field10
and cvo_ea.field11 = 345666
GO
Looking at your code, I can't see anything that can be improved in the T-SQL statement itself.
I would advise the following:
check each table and which columns you actually need in the fields part - it is possible the engine is reading whole rows instead of only the needed columns because an index is missing; you can create non-clustered indexes to reduce the IO
check whether any of these new indexes can be a filtered index, since you have a lot of hard-coded criteria (ei.field4_cd = 123485.00) - see the sketch after this list
If the above is not enough, you may consider creating a separate table to store this information and populating it in advance.
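For illustration, a minimal sketch of a filtered index on table6, using the hard-coded predicate from the view. The index name and included column are my assumptions, not from the original post:
CREATE NONCLUSTERED INDEX IX_table6_field1_filtered
ON table6 (field1)
INCLUDE (field7)                 -- field7 feeds the cvo_ModeOfSep join
WHERE field4_cd = 123485.00;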
In order to debug, you can add the following line before the query:
SET STATISTICS IO ON;
and then paste the results from the Messages tab here - it will show for which tables the most IO is consumed. You can start with those.
I would investigate breaking this query into multiple parts using derived tables. There are plenty of examples of this online. I always try to use SELECT TOP (2147483647) ....
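As a rough illustration only (reusing names from the view above; this is not the author's actual rewrite, and the join back to table1 is just for the sketch), one of the filtered LEFT JOINs could be pushed into a derived table like this:
SELECT ed.field1_id, etf.dt_tm_field
FROM table1 ed
LEFT JOIN (
    SELECT TOP (2147483647) field1_id, dt_tm_field
    FROM table4
    WHERE field2_cd = 85429041
      AND dt_tm_field >= '2025-01-01 00:00:00.0000000'
) etf ON etf.field1_id = ed.field1_id;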

Efficiently group query by one column, taking the maximum of another column and a third column that comes from the same row as the maximum column

I have a table of 100,000,000+ values, so efficiency is very important to me. I need to take information from table A, join it to an index table B, then join to table C using the index retrieved from table B. The problem is, there are multiple indexes for each value in table A, and I want to retrieve the one with the most recent date.
The query below creates duplicates:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM Table_1 t1
INNER JOIN Table_2 t2 ON t1.ID_1=t2.ID_1
INNER JOIN Table_3 t3 ON t2.ID_2=t3.ID_2
This one does not, but when going from roughly 35,000 to 40,000 elements, the execution time goes from under 5 seconds to over a minute:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM
(SELECT * FROM Table_1 t1 CROSS APPLY Table_2 t2 WHERE t1.ID_1=t2.ID_1) t_temp
LEFT JOIN Table_3 t3 ON t_temp.ID_2=t3.ID_2
How can I decrease my execution time as much as possible?
Here is an example table:
For this table, I would be trying to get the most recent location for each person.
None of the columns are indexed and I cannot create indexes on this table.
First of all, when you are working with 100 million+ records, and joining them to other tables at that, the first thing I would ask is: what is the rationale behind not creating indexes that can cover your query? If you are not the admin of that system, I would suggest you bring this up with the admin group and try to understand the exact reason (if any) they do not want an index on that huge table - especially because you mentioned "efficiency is very important to me".
Remember that 'SQL Tuning' is only one of the steps of 'Database Performance Tuning', and you can only tune so much by writing a good SQL query. When the data volume gets huge, a good SQL query alone is never sufficient without other performance tuning measures.
Apart from what Roger has already provided, here are a few solutions that you can try out:
Solution 1
SELECT T1.ID_1, OA.ID_2, OA.Location
FROM Table1 T1
OUTER APPLY (
SELECT TOP 1 T3.ID_2, T3.Location
FROM Table2 T2
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2
WHERE T2.ID_1 = T1.ID_1
ORDER BY T3.Date DESC
) OA;
Solution 2:
SELECT DISTINCT
T1.ID_1
,T2.ID_2
,Location = FIRST_VALUE(T3.Location) OVER (PARTITION BY T1.ID_1 ORDER BY T3.Date DESC)
FROM Table1 T1
INNER JOIN Table2 T2
ON T1.ID_1 = T2.ID_1
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2;
Data Preparation:
DROP TABLE IF EXISTS Table1
DROP TABLE IF EXISTS Table2
DROP TABLE IF EXISTS Table3
SELECT TOP 10000 ID_1 = object_id, name
INTO Table1
FROM sys.all_objects
ORDER BY object_id
SELECT ID_1 = T1.ID_1, ID_2 = IDENTITY(INT, 1, 1)
INTO Table2
FROM Table1 T1
CROSS JOIN Table1 T2
SELECT ID_2, Location = 'City_'+ CAST(ID_2 AS VARCHAR(100)), Date = CAST(DATEADD(DAY, ID_2/10000, GETDATE()) AS DATE)
INTO Table3
FROM Table2
Indexes to cover the Solution 1:
CREATE NONCLUSTERED INDEX IX_TABLE1_ID_1 ON Table1 (ID_1)
CREATE NONCLUSTERED INDEX IX_TABLE2_ID_2 ON Table2 (ID_1, ID_2)
CREATE NONCLUSTERED INDEX IX_TABLE3_ID_2 ON Table3 (ID_2, Date DESC) INCLUDE (Location)
Execution Plan:
You can see that all operators are 'Index Seek' except for Table1, which is a legitimate 'Index Scan' because every ID_1 value of Table1 is being read. If you put a WHERE clause on the outer query to search for a few specific ID_1 values, that 'Index Scan' will turn into an 'Index Seek' as well.
I will leave the index strategy for the 2nd solution to you (as homework :) ). Tip: you have to make Location part of the key as well, or you can go with a COLUMNSTORE index approach.
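A minimal sketch of what that could look like, based only on the tip above (my assumption, not the author's answer key):
-- rowstore option: Date and Location become index keys so the window function can read them in order
CREATE NONCLUSTERED INDEX IX_TABLE3_ID_2_Date_Location
ON Table3 (ID_2, Date DESC, Location);
-- columnstore option
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_TABLE3
ON Table3 (ID_2, Date, Location);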
You can use something like this:
select top (1) with ties
a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = b.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
order by row_number() over(partition by a.A_Id order by b.Date desc);
Alternatively, you can try an olde fashioneth approache:
select a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = b.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
where not exists (
select 0 from dbo.TableB pb where pb.A_Id = b.A_Id and pb.Date > b.Date
);
However, as with all such situations, its performance will heavily depend on indexes. SSMS can suggest some if you look at the execution plan; off the top of my head, you will need all the Id columns indexed, and you will need either a single index on (Date) or a composite (A_Id, Date, B_Id) index on TableB.
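For instance, the composite index mentioned above might look like this (a sketch only; the index name is mine, and the table and column names follow the examples in this answer):
CREATE NONCLUSTERED INDEX IX_TableB_AId_Date_BId
ON dbo.TableB (A_Id, Date DESC, B_Id);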
UPD: If you can't create or modify any indexes and performance is paramount, I would suggest copying the data in question into a separate schema or database where you do have the appropriate permissions. Apart from that... it's impossible to get something out of nothing.

SQL Server Using sub query in JOIN statements

Is joining tables easier to read or faster to execute if I create a sub query, i.e. a select statement with only the columns needed for the entire query?
-- Example
SELECT s.Id,
s.TransactionDate,
s.TransactionNo,
s.CustomerId,
s.SiteLocationId,
s.SubTotal,
sd.ItemId,
sd.UnitPrice,
sd.GrossAmount
FROM tblTransactions s
LEFT OUTER JOIN tblTransactionDetails sd ON sd.TransactionId = s.Id
Compare to this:
SELECT s.Id,
s.TransactionDate,
s.TransactionNo,
s.CustomerId,
s.SubTotal,
sd.ItemId,
sd.UnitPrice,
sd.GrossAmount
FROM tblTransactions s
LEFT OUTER JOIN (
SELECT TransactionId,
ItemId,
UnitPrice,
GrossAmount
FROM tblTransactionDetails
) sd ON sd.TransactionId = s.Id
What are the advantages and disadvantages of each example? I am also trying to reduce the percentage of reads on the Details table in the execution plan.
I think SQL Server creates the same execution plan for both queries. If you want to improve performance through column selection, you should create a non-clustered covering index; then the query optimizer can use the more compact index instead of the table.
CREATE NONCLUSTERED INDEX ix_tblTransactionDetails_test
ON tblTransactionDetails (TransactionId) INCLUDE (ItemId, UnitPrice, GrossAmount)

How to avoid table scan and index scan for huge tables

I am using MSSQL 2008 R2. I have a table (test) with a huge number of rows.
I have the following SQL code, please suggest where I can use index hints, force seek or any other means to improve performance.
Indexes
1. non-clustered - idx_id (id)
2. non-clustered - idx_name (name)
SELECT DISTINCT
p.id,
p.name
FROM
test p
LEFT OUTER JOIN
(
SELECT
e.id
FROM
test e
INNER JOIN
(
SELECT
c.id
FROM
test c
GROUP BY
c.id
HAVING
COUNT(1) > 1
) f
ON e.id = f.id
WHERE
e.name = 'test_name'
) m
ON p.id = m.id
WHERE
m.id is null
Prerequisite: have a primary key
select distinct
p.id
, p.name
from test p
where not exists (
SELECT TOP(1)
1
FROM test e
WHERE e.PrimaryKey <> p.PrimaryKey
AND e.id = p.id
AND 'test_name' IN (e.name, p.name)
)
How many columns does your table contain? If there are only these two columns, it makes no sense to add a non-clustered index. You should create a CLUSTERED index on the ID column, and that's it - you'll see a performance increase.
If you have many columns, consider two options (both sketched after this list):
Create a clustered index on the NAME column and a non-clustered index on the ID column.
Create a non-clustered index on the ID column and INCLUDE the NAME column (that way you create a covering index).
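A minimal sketch of both options, using the table and column names from the question (the index names are mine):
-- Option 1: clustered index on name, non-clustered index on id
CREATE CLUSTERED INDEX cx_test_name ON test (name);
CREATE NONCLUSTERED INDEX idx_test_id ON test (id);
-- Option 2: covering non-clustered index on id that includes name
CREATE NONCLUSTERED INDEX idx_test_id_covering ON test (id) INCLUDE (name);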
Generally speaking, relational databases (being relational) are written in such a way to optimize join statements. When using a "JOIN" clause with "ON" criteria, the database engine can create an optimized execution plan that takes the table structure, indexes, etc. into account. When joining on a sub-select, sometimes the same optimizing factors are not available, or are not taken into account the same way. It depends on your schema, but it is a good rule of thumb to assume that a standard join with an "on" clause is going to be more efficient than a join on a sub-select.
Your schema is pretty vague, so I am not even sure that you need the joins, but if you do, you should try performing the joins directly with "on" criteria.

Index with Leftouter join there is always Index scan in sql server 2005

I have a query joining several tables; the last table is joined with a LEFT JOIN. That table has more than a million rows, and the execution plan shows a table scan on it. I have indexed the columns on which the join is made. It always uses an index scan, but if I replace the LEFT JOIN with an INNER JOIN, an index seek is used and execution takes a few seconds; with the LEFT JOIN there is a table scan, so execution takes several minutes. Does using outer joins turn off indexes? Did I miss something? What is the reason for such behavior?
Here is the Query
Select *
FROM
Subjects s
INNER join Question q ON q.SubjectID = s.SubjectID
INNER JOIN Answer a ON a.QuestionID = q.QuestionID
Left outer JOIN Cell c ON c.QuestionID = q.QuestionID
Where S.SubjectID =15
There is a clustered index on SubjectID in the Subjects table, and there are non-clustered indexes on QuestionID in the other tables.
Solution:
I tried it another way and now I get an index seek on the Cell table. Here is the modified query:
Select *
FROM
Subjects s
INNER join Question q ON q.SubjectID = s.SubjectID
INNER JOIN Answer a ON a.QuestionID = q.QuestionID
Left outer JOIN Cell c ON c.QuestionID = q.QuestionID
AND c.QuestionID > 0
AND c.CellKey > 0
Where S.SubjectID =15
This way I get higher selectivity on the Cell table. :)
I just tried to simulate the same issue, but there was no table scan; it used the clustered index on Cell. At the same time, you could try to force the index - you can check the syntax here and the issues you may face when forcing an index here. Hope this helps.
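For example, a sketch of forcing an index on the Cell join. The index name IX_Cell_QuestionID is an assumption; use whatever non-clustered index actually exists on Cell.QuestionID:
Select *
FROM Subjects s
INNER JOIN Question q ON q.SubjectID = s.SubjectID
INNER JOIN Answer a ON a.QuestionID = q.QuestionID
LEFT OUTER JOIN Cell c WITH (INDEX(IX_Cell_QuestionID))
    ON c.QuestionID = q.QuestionID
WHERE s.SubjectID = 15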
