I'm trying to debug a query that is performing slowly. It has several with expressions which are left joined. When I remove the joins, it speeds up considerably.
Original query:
;with CTE as
(
Select *
from table1
)
SELECT *
FROM table2
LEFT JOIN CTE ON table2.CTEID
Better performing query:
;with CTE as
(
Select *
from table1
)
SELECT *
FROM table2
In the above, does it not execute CTE since it is not joined, or does it execute it regardless?
My guess is probably not: the query optimizer is pretty smart about not executing unnecessary work. Every query is different, and the query optimizer uses statistics about your actual data to decide how to evaluate it, so the only way to know for sure is to get SQL Server to tell you how it evaluated your query.
To do this, execute your query in SQL Server Management Studio with 'Include Actual Execution Plan' enabled, and you will see clearly how it evaluated the query.
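If you prefer to get the plan from a script rather than the toolbar button, a minimal sketch is below. It uses the placeholder tables from the question; SET STATISTICS XML returns the actual plan as an extra XML result set after the query results.

```sql
-- Return the actual execution plan as XML alongside the results.
SET STATISTICS XML ON;

-- The CTE is defined but never referenced; check the plan to see
-- whether table1 shows up in it at all.
WITH CTE AS
(
    SELECT *
    FROM table1
)
SELECT *
FROM table2;

SET STATISTICS XML OFF;
```

If table1 does not appear in the returned plan, the unreferenced CTE was not evaluated.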
SELECT DISTINCT Table3.ID
FROM Table1
INNER JOIN Table2 ON Table1.thisID = Table2.thisID
INNER JOIN Table3 ON Table2.ID = Table3.ID
WHERE ( Table1.ID IN
(
<cfqueryparam cfsqltype="cf_sql_integer"
value="#idlist#" list="yes">
)
)
AND Table2.ID IN
(
<cfqueryparam cfsqltype="cf_sql_integer"
value="#idlist2#" list="yes">
)
AND Table3.active=1
ORDER BY Table3.ID
When I run the above code it takes 11 to 15 seconds. If I remove the cfqueryparam, and just use the idlist2 variable, the query only takes 32 milliseconds.
Is this an issue with cfqueryparam, or am I doing something incorrect?
SQL performance can drop precipitously with long lists in an IN clause. If you can reduce the length of the lists, your query performance will likely improve.
When you use cfqueryparam, the values are passed to SQL Server as a list of parameters. When you do NOT use cfqueryparam, the list of values is hardcoded into the query string. This allows SQL Server's query execution plan to be optimized for that specific list of values. It also allows the plan to be cached from one execution to the next. This can make subsequent identical queries execute very fast, for example during debugging and testing.
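To make the difference concrete, here is a rough sketch of the two forms as SQL Server might see them. The exact statement the ColdFusion driver sends is an assumption on my part, and the parameter names (@P1 and so on) and the values are made up for illustration.

```sql
-- Without cfqueryparam: the literals are part of the statement text,
-- so each distinct list produces a distinct statement and plan.
SELECT Table2.ID
FROM Table2
WHERE Table2.ID IN (10, 20, 30);

-- With cfqueryparam list="yes": roughly one parameter per list element
-- (assumed shape; parameter names are hypothetical).
EXEC sp_executesql
    N'SELECT Table2.ID FROM Table2 WHERE Table2.ID IN (@P1, @P2, @P3)',
    N'@P1 int, @P2 int, @P3 int',
    @P1 = 10, @P2 = 20, @P3 = 30;
```

With a long list, the parameterized form carries one parameter per value, which is worth keeping in mind when the lists grow large.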
If this is a dynamic query, that is, if the list of values changes each time the query is run, then you want to make sure to use cfqueryparam so that SQL Server isn't caching an execution plan for each one-time hardcoded query.
Furthermore, cfqueryparam gives you a LOT of protection against SQL Injection attacks. From a security aspect, I recommend that all values being passed into a query should use cfqueryparam.
Finally, try running the query in SQL Server Management Studio with the Include Actual Execution Plan button enabled. It can help you determine whether adding one or more indexes on your tables would improve the execution time.
'Missing Index' feature of SQL Server Management Studio
I have this query...
SELECT Distinct([TargetAttributeID]) FROM
(SELECT distinct att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = @intProfileID
union all
SELECT distinct ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= @cutoffdate) x
Execution Plan for the above query
The two inner distincts are looking at 32 and 10,000 rows respectively. This query returns 5 rows and executes in under 1 second.
If I then use the result of this query as the subject of an IN like so...
SELECT attx.intAttributeID,attx.txtAttributeName,attx.txtAttributeLabel,attx.txtType,attx.txtEntity FROM
AST_tblAttributes attx WHERE attx.intAttributeID
IN
(SELECT Distinct([TargetAttributeID]) FROM
(SELECT Distinct att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = @intProfileID
union all
SELECT Distinct ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= @cutoffdate) x)
Execution Plan for the above query
Then it takes over 3 minutes! If I just take the result of the query and perform the IN "manually" then again it comes back extremely quickly.
However if I remove the two inner DISTINCTS....
SELECT attx.intAttributeID,attx.txtAttributeName,attx.txtAttributeLabel,attx.txtType,attx.txtEntity FROM
AST_tblAttributes attx WHERE attx.intAttributeID
IN
(SELECT Distinct([TargetAttributeID]) FROM
(SELECT att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = @intProfileID
union all
SELECT ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= @cutoffdate) x)
Execution Plan for the above query
..then it comes back in under a second.
What is SQL Server thinking? Can it not figure out that it can perform the two sub-queries and use the result as the subject of the IN? It seems as slow as a correlated sub-query, but it isn't correlated!!!
In the estimated execution plan there are three Clustered Index Scans, each with a cost of 100%! (Execution Plan is here)
Can anyone tell me why the inner DISTINCTS make this query so much slower (but only when used as the subject of an IN...) ?
UPDATE
Sorry it's taken me a while to get these execution plans up...
Query 1
Query 2 (The slow one)
Query 3 - No Inner Distincts
Honestly I think it comes down to the fact that, in terms of relational operators, you have a gratuitously baroque query there, and SQL Server stops searching for alternate execution plans within the time it allows itself to find one.
After the parse and bind phase of plan compilation, SQL Server will apply logical transforms to the resulting tree, estimate the cost of each, and choose the one with the lowest cost. It doesn't exhaust all possible transformations, just as many as it can compute within a given window. So presumably, it has burned through that window before it arrives at a good plan, and it's the addition of the outer semi-self-join on AST_tblAttributes that pushed it over the edge.
How is it gratuitously baroque? Well, first off, there's this (simplified for noise reduction):
select distinct intAttributeID from (
select distinct intAttributeID from AST_tblAttributes ....
union all
select distinct intAttributeID from AST_tblAttributes ....
)
Concatenating two sets, and projecting the unique elements? Turns out there's an operator for that: it's called UNION. So given enough time during plan compilation and enough logical transformations, SQL Server will realize what you really mean is:
select intAttributeID from AST_tblAttributes ....
union
select intAttributeID from AST_tblAttributes ....
But wait, you put this in an IN subquery. Well, an IN subquery is evaluated as a semi-join, and the right relation does not require logical deduplication in a semi-join. So SQL Server may logically rewrite the query as this:
select * from AST_tblAttributes
where intAttributeID in (
select intAttributeID from AST_tblAttributes ....
union all
select intAttributeID from AST_tblAttributes ....
)
And then go about physical plan selection. But to get there, it has to see through the cruft first, and that may fall outside the optimization window.
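Applied to the full query from the question, the simplified form sketched above would look something like this. The variable types and values are assumptions, since the question does not show the declarations; the table and column names are taken from the question as posted.

```sql
-- Assumed types and sample values; not shown in the original question.
DECLARE @intProfileID int;
DECLARE @cutoffdate datetime;
SET @intProfileID = 1;
SET @cutoffdate = '20100101';

SELECT attx.intAttributeID,
       attx.txtAttributeName,
       attx.txtAttributeLabel,
       attx.txtType,
       attx.txtEntity
FROM AST_tblAttributes AS attx
WHERE attx.intAttributeID IN
(
    -- No DISTINCT anywhere: IN only asks whether a match exists,
    -- so duplicates on this side do not change the result.
    SELECT att1.intAttributeID
    FROM AST_tblAttributes AS att1
    INNER JOIN AST_lnkProfileDemandAttributes AS pda
        ON pda.intAttributeID = att1.intAttributeID
       AND pda.intProfileID = @intProfileID
    UNION ALL
    SELECT ca2.intAttributeID
    FROM AST_lnkCapturePolicyAttributes AS ca2
    INNER JOIN AST_lnkEmployeeCapture AS ec2
        ON ec2.intAdminCaptureID = ca2.intAdminCaptureID
       AND ec2.intTeamID = 57
    WHERE ec2.dteCreatedDate >= @cutoffdate
);
```

Whether this actually produces the better plan on your data is something only the execution plan can confirm.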
EDIT:
Really, the way to explore this for yourself, and corroborate the speculation above, is to put both versions of the query in the same window and compare estimated execution plans side-by-side (Ctrl-L in SSMS). Leave one as is, edit the other, and see what changes.
You will see that some alternate forms are recognized as logically equivalent and generate the same good plan, while others generate less optimal plans as you bork the optimizer.**
Then, you can use SET STATISTICS IO ON and SET STATISTICS TIME ON to observe the actual amount of work SQL Server performs to execute the queries:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT ....
SELECT ....
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
The output will appear in the messages pane.
** Or not--if they all generate the same plan, but actual execution time still varies like you say, something else may be going on--it's not unheard of. Try comparing actual execution plans and go from there.
First of all a possible explanation:
You say: "This query returns 5 rows and executes in under 1 second." But how many rows does the optimizer ESTIMATE it returns? If the estimate is far off, using the query inside the IN could cause SQL Server to scan the entire AST_tblAttributes table in the outer part instead of seeking into its index, which could explain the big difference.
If you shared the query plans for the different variants (as a file, please), I think I should be able to get you an idea of what is going on under the hood here. It would also allow us to validate the explanation.
Edit: each DISTINCT keyword adds a new Sort node to your query plan. Basically, by having those other DISTINCTs in there, you're forcing SQL Server to re-sort the entire table again and again to make sure that it isn't returning duplicates. Each such operation can quadruple the cost of the query. Here's a good review of the effects that the DISTINCT operator can have, intended and unintended. I've been bitten by this myself.
Are you using SQL 2008? If so, you can try this, putting the DISTINCT work into a CTE and then joining to your main table. I've found CTEs to be pretty fast:
WITH DistinctAttribID
AS
(
SELECT Distinct([TargetAttributeID])
FROM (
SELECT distinct att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = @intProfileID
UNION ALL
SELECT distinct ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= @cutoffdate
) x
)
SELECT attx.intAttributeID,
attx.txtAttributeName,
attx.txtAttributeLabel,
attx.txtType,
attx.txtEntity
FROM AST_tblAttributes attx
JOIN DistinctAttribID attrib
ON attx.intAttributeID = attrib.TargetAttributeID
I have a SQL query that uses both standard WHERE clauses and full text index CONTAINS clauses. The query is built dynamically from code and includes a variable number of WHERE and CONTAINS clauses.
In order for the query to be fast, it is very important that the full text index be searched before the rest of the criteria are applied.
However, SQL Server chooses to process the WHERE clauses before the CONTAINS clauses, which causes table scans and makes the query very slow.
I'm able to rewrite this using two queries and a temporary table. When I do so, the query executes 10 times faster. But I don't want to do that in the code that creates the query because it is too complex.
Is there a way to force SQL Server to process the CONTAINS before anything else? I can't force a plan (USE PLAN) because the query is built dynamically and varies a lot.
Note: I have the same problem on SQL Server 2005 and SQL Server 2008.
You can signal your intent to the optimiser like this
SELECT
*
FROM
(
SELECT *
FROM
WHERE
CONTAINS
) T1
WHERE
(normal conditions)
However, SQL is declarative: you say what you want, not how to do it. So the optimiser may decide to ignore the nesting above.
You can force the derived table with CONTAINS to be materialised before the classic WHERE clause is applied. I won't guarantee performance.
SELECT
*
FROM
(
SELECT TOP 2000000000
*
FROM
....
WHERE
CONTAINS
ORDER BY
SomeID
) T1
WHERE
(normal conditions)
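Filled in with concrete names, the TOP ... ORDER BY pattern above might look like the sketch below. Everything named here (dbo.Documents, Body, DocumentID, CreatedDate, the search term) is hypothetical, and it assumes a full-text index already exists on Body.

```sql
SELECT T1.*
FROM
(
    -- TOP with a very large row count plus ORDER BY nudges SQL Server
    -- to evaluate this derived table on its own.
    SELECT TOP (2000000000) d.*
    FROM dbo.Documents AS d                   -- hypothetical table
    WHERE CONTAINS(d.Body, N'"performance"')  -- hypothetical full-text column
    ORDER BY d.DocumentID                     -- hypothetical key column
) AS T1
WHERE T1.CreatedDate >= '20100101';           -- the "normal conditions"
```

As said above, there is no performance guarantee; compare the plans with and without the TOP to see whether the full-text search really runs first.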
Try doing it with 2 queries without temp tables:
SELECT *
FROM table
WHERE id IN (
SELECT id
FROM table
WHERE contains_criteria
)
AND further_where_clauses
As I noted above, this is NOT as clean a way to "materialize" the derived table as the TOP clause that @gbn proposed, but a loop join hint forces an order of evaluation, and it has worked for me in the past (admittedly usually with two different tables involved). There are a couple of problems though:
The query is ugly
you still don't get any guarantees that the other WHERE parameters don't get evaluated until after the join (I'll be interested to see what you get)
Here it is though, given that you asked:
SELECT OriginalTable.XXX
FROM (
SELECT XXX
FROM OriginalTable
WHERE
CONTAINS XXX
) AS ContainsCheck
INNER LOOP JOIN OriginalTable
ON ContainsCheck.PrimaryKeyColumns = OriginalTable.PrimaryKeyColumns
AND OriginalTable.OtherWhereConditions = OtherValues
I have CTEs which all use NOLOCK inside. However, the parent CTEs that select from those child CTEs don't use NOLOCK, on the presumption that the data is already NOLOCK'd. The final select doesn't use NOLOCK either.
Something like that:
with cte1 as
(select * from tab1 (nolock)),
cte2 as
(select * from cte1)
select * from cte2
or shall I write
with cte1 as
(select * from tab1 (nolock)),
cte2 as
(select * from cte1 (nolock))
select * from cte2 (nolock)
thanks
You don't need the outer nolock to avoid taking shared locks on tab1. You can easily verify this by setting up a SQL Profiler trace capturing the various events in the locks category, filtering on the spid of an SSMS connection and trying both versions.
nolock is quite a dangerous setting, though. Are you aware of all of the possible downsides to using it (dirty reads, reading data twice or not at all)?
NOLOCK on an "outer" query applies to any inner queries too. A CTE is just a macro like a view or inline table udf: nothing more, nothing less. So you actually have (ignoring NOLOCK hints)
select * from (
select * from (
select * from tab1
) t1
) t2
From Table Hints on MSDN, under "Remarks"
All lock hints are propagated to all the tables and views that are accessed by the query plan, including tables and views referenced in a view.
In this case, you need only one. Doesn't matter where.
Where it does matter is where you have JOINs. If cte1 was a join of 2 tables, you'd need it for each table, or you could specify it once at a higher/outer level.
Oh, and I'll join in with everyone else: NOLOCK is a bad idea
The innermost nolock is sufficient, no need to repeat it for the outer selects.
You can test this by starting a transaction without ending it:
begin transaction
; with YourCte ( ...
Then you can view the locks using Management Studio. They'll be there until the transaction times out.
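If you would rather check from a query than from the Management Studio UI, something like the following sketch could be used. It raises the isolation level to REPEATABLE READ purely for the test, so that any shared locks that are taken stay visible until the rollback (under the default READ COMMITTED level they are released as soon as the read finishes). The table and CTE names are the ones from the question.

```sql
-- Test-only setting: keep any shared locks until the transaction ends.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;

WITH cte1 AS
    (SELECT * FROM tab1 WITH (NOLOCK)),
cte2 AS
    (SELECT * FROM cte1)
SELECT * FROM cte2;

-- List the locks this session currently holds.
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;

ROLLBACK TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Run it once as written and once with the NOLOCK hint removed, and compare the rows returned from sys.dm_tran_locks.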
NOLOCK for CTEs works the same way as with everything else: it causes inconsistent results. Use SNAPSHOT instead; see SQL Server 2005 Row Versioning-Based Transaction Isolation.
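A minimal sketch of the SNAPSHOT alternative, using the CTEs from the question with the hints removed. YourDatabase is a placeholder name, and enabling the option is a one-time, database-wide change that you should evaluate before turning it on.

```sql
-- One-time setup; YourDatabase is a placeholder.
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Per session: read a consistent snapshot without NOLOCK hints.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

WITH cte1 AS
    (SELECT * FROM tab1),
cte2 AS
    (SELECT * FROM cte1)
SELECT * FROM cte2;

COMMIT TRANSACTION;
```

Readers under SNAPSHOT do not block writers and do not see uncommitted data, at the cost of row versions being kept in tempdb.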
I have a simple, uncorrelated subquery that performs very poorly on SQL Server. I'm not very experienced at reading execution plans, but it looks like the inner query is being executed once for every row in the outer query, even though the results are the same each time. What can I do to tell SQL Server to execute the inner query only once?
The query looks like this:
select *
from Record record0_
where record0_.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
and (
record0_.EntityFK in (
select record1_.EntityFK
from Record record1_
join RecordTextValue textvalues2_ on record1_.PK=textvalues2_.RecordFK
and textvalues2_.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and (textvalues2_.Value like 'O%' escape '~')
)
)
Analyze your SQL Statement and put it in the Database Engine Tuning Advisor (2005+) and see what indexes it suggests.
I don't think you're giving SQL Server enough credit on determining the best way to run a query. Make sure you have indexes on the fields in your joins and where clauses.
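As a starting point only, indexes along these lines might be what the tuning advisor ends up suggesting. The index names are hypothetical, the column choices are guesses from the query text, and the last one assumes Value is a type short enough to be an index key column (it will not work for a max or text type).

```sql
-- Supports the outer filter on RecordTypeFK and the EntityFK lookup.
CREATE NONCLUSTERED INDEX IX_Record_RecordTypeFK_EntityFK
    ON Record (RecordTypeFK, EntityFK);

-- Supports matching inner Record rows back by EntityFK.
CREATE NONCLUSTERED INDEX IX_Record_EntityFK
    ON Record (EntityFK);

-- Supports the FieldFK equality and the LIKE 'O%' prefix search.
CREATE NONCLUSTERED INDEX IX_RecordTextValue_FieldFK_Value
    ON RecordTextValue (FieldFK, Value)
    INCLUDE (RecordFK);
```

Check the existing indexes first; some of these may already be covered.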
This may be an alternative to your query, but will probably run the same:
select record0_.*
from Record record0_
inner join Record record1_
on record0_.EntityFK = record1_.EntityFK
inner join RecordTextValue textvalues2_
on record1_.PK=textvalues2_.RecordFK
and textvalues2_.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and (textvalues2_.Value like 'O%' escape '~')
where record0_.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
You should be able to change this to a straightforward join. Does that help?
select r.*
from Record r
join ( select record1_.EntityFK
from Record record1_
join RecordTextValue textvalues2_ on record1_.PK=textvalues2_.RecordFK
and textvalues2_.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and (textvalues2_.Value like 'O%' escape '~')
) s on s.EntityFK = r.EntityFK
where r.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
This looks a lot more sensible (but it is pretty much the same query):
select r.*
from Record r
join ( select ri.EntityFK
from Record ri
join RecordTextValue t on ri.PK=t.RecordFK
where
t.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and t.Value like 'O%'
) s on s.EntityFK = r.EntityFK
where r.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
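One caveat with the join rewrites above: if more than one inner row matches the same EntityFK, a plain join returns the outer row once per match, whereas the original IN returns it once. A sketch that keeps the original semantics uses EXISTS instead (same tables and literal values as in the question; the aliases are mine):

```sql
SELECT r.*
FROM Record AS r
WHERE r.RecordTypeFK = 'c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
  AND EXISTS
  (
      -- Semi-join: stops caring after the first matching row,
      -- so the outer row is never duplicated.
      SELECT 1
      FROM Record AS ri
      JOIN RecordTextValue AS t
          ON ri.PK = t.RecordFK
      WHERE ri.EntityFK = r.EntityFK
        AND t.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
        AND t.Value LIKE 'O%'
  );
```

SQL Server will often produce the same plan for this as for the IN form, so treat it as something to compare rather than a guaranteed speed-up.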