Optimizing Execution Plans for Parameterized T-SQL Queries Containing Window Functions - sql-server

EDIT: I've updated the example code and provided complete table and view implementations for reference, but the essential question remains unchanged.
I have a fairly complex view in a database that I am attempting to query. When I attempt to retrieve a set of rows from the view by hard-coding the WHERE clause to specific foreign key values, the view executes very quickly with an optimal execution plan (indexes are used properly, etc.)
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20
However, when I attempt to add parameters to the query, all of a sudden my execution plan falls apart. When I run the query below, I'm getting index scans instead of seeks all over the place and the query performance is very poor.
DECLARE #ForeignKeyCol int = 20
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = #ForeignKeyCol
I'm using SQL Server 2008 R2. What gives here? What is it about using parameters that is causing a sub-optimal plan? Any help would be greatly appreciated.
For reference, here are the object definitions for which I'm getting the error.
CREATE TABLE [dbo].[BaseTable]
(
[PrimaryKeyCol] [uniqueidentifier] PRIMARY KEY,
[ForeignKeyCol] [int] NULL,
[DataCol] [binary](1000) NOT NULL
)
CREATE NONCLUSTERED INDEX [IX_BaseTable_ForeignKeyCol] ON [dbo].[BaseTable]
(
[ForeignKeyCol] ASC
)
CREATE VIEW [dbo].[ViewOnBaseTable]
AS
SELECT
PrimaryKeyCol,
ForeignKeyCol,
DENSE_RANK() OVER (PARTITION BY ForeignKeyCol ORDER BY PrimaryKeyCol) AS ForeignKeyRank,
DataCol
FROM
dbo.BaseTable
I am certain that the window function is the problem, but I am filtering my query by a single value that the window function is partitioning by, so I would expect the optimizer to filter first and then run the window function. It does this in the hard-coded example but not the parameterized example. Below are the two query plans. The top plan is good and the bottom plan is bad.

When using OPTION (RECOMPILE) be sure to look at the post-execution ('actual') plan rather than the pre-execution ('estimated') one. Some optimizations are only applied when execution occurs:
DECLARE #ForeignKeyCol int = 20;
SELECT ForeignKeyCol, ForeignKeyRank
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = #ForeignKeyCol
OPTION (RECOMPILE);
Pre-execution plan:
Post-execution plan:
Tested on SQL Server 2012 build 11.0.3339 and SQL Server 2008 R2 build 10.50.4270
Background & limitations
When windowing functions were added in SQL Server 2005, the optimizer had no way to push selections past these new sequence projections. To address some common scenarios where this caused performance problems, SQL Server 2008 added a new simplification rule, SelOnSeqPrj, which allows suitable selections to be pushed where the value is a constant. This constant may be a literal in the query text, or the sniffed value of a parameter obtained via OPTION (RECOMPILE). There is no particular problem with NULLs though the query may need to have ANSI_NULLS OFF to see this. As far as I know, applying the simplification to constant values only is an implementation limitation; there is no particular reason it could not be extended to work with variables. My recollection is that the SelOnSeqPrj rule addresssed the most commonly seen performance problems.
Parameterization
The SelOnSeqPrj rule is not applied when a query is successfully auto-parameterized. There is no reliable way to determine if a query was auto-parameterized in SSMS, it only indicates that auto-param was attempted. To be clear, the presence of place-holders like [#0] only shows that auto-parameterization was attempted. A reliable way to tell if a prepared plan was cached for reuse is to inspect the plan cache, where the 'parameterized plan handle' provides the link between ad-hoc and prepared plans.
For example, the following query appears to be auto-parameterized in SSMS:
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20;
But the plan cache shows otherwise:
WITH XMLNAMESPACES
(
DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
SELECT
parameterized_plan_handle =
deqp.query_plan.value('(//StmtSimple)[1]/#ParameterizedPlanHandle', 'nvarchar(64)'),
parameterized_text =
deqp.query_plan.value('(//StmtSimple)[1]/#ParameterizedText', 'nvarchar(max)'),
decp.cacheobjtype,
decp.objtype,
decp.plan_handle
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
WHERE
dest.[text] LIKE N'%ViewOnBaseTable%'
AND dest.[text] NOT LIKE N'%dm_exec_cached_plans%';
If the database option for forced parameterization is enabled, we get a parameterized result, where the optimization is not applied:
ALTER DATABASE Sandpit SET PARAMETERIZATION FORCED;
DBCC FREEPROCCACHE;
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20;
The plan cache query now shows a parameterized cached plan, linked by the parameterized plan handle:
Workaround
Where possible, my preference is to rewrite the view as an in-line table-valued function, where the intended position of the selection can be made more explicit (if necessary):
CREATE FUNCTION dbo.ParameterizedViewOnBaseTable
(#ForeignKeyCol integer)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT
bt.PrimaryKeyCol,
bt.ForeignKeyCol,
ForeignKeyRank = DENSE_RANK() OVER (
PARTITION BY bt.ForeignKeyCol
ORDER BY bt.PrimaryKeyCol),
bt.DataCol
FROM dbo.BaseTable AS bt
WHERE
bt.ForeignKeyCol = #ForeignKeyCol;
The query becomes:
DECLARE #ForeignKeyCol integer = 20;
SELECT pvobt.*
FROM dbo.ParameterizedViewOnBaseTable(#ForeignKeyCol) AS pvobt;
With the execution plan:

You could always go the CROSS APPLY way.
ALTER VIEW [dbo].[ViewOnBaseTable]
AS
SELECT
PrimaryKeyCol,
ForeignKeyCol,
ForeignKeyRank,
DataCol
FROM (
SELECT DISTINCT
ForeignKeyCol
FROM dbo.BaseTable
) AS Src
CROSS APPLY (
SELECT
PrimaryKeyCol,
DENSE_RANK() OVER (ORDER BY PrimaryKeyCol) AS ForeignKeyRank,
DataCol
FROM dbo.BaseTable AS B
WHERE B.ForeignKeyCol = Src.ForeignKeyCol
) AS X

I think in this particular case it may be because the data types between your parameters and your table do not match exactly so SQL Server has to do an implicit conversion which is not a sargable operation.
Check your table data types and make your parameters the same type. Or do the cast yourself outside the query.

Related

Why does SSMS insert a ' TOP (100) PERCENT' into the view that I am trying to write and warn me that ORDER BY may not work?

I am trying to create a SQL View in SSMS. I am using Views because they are easier to invoke from Power BI than Stored Procedures (especially when no parameters are needed).
I start by writing and testing a SQL SELECT query with an ORDER BY clause.
When I copy and paste my query in the New View:
SSMS adds a TOP (100) PERCENT to my SELECT statement.
Tells me that my ORDER BY clause (which works perfectly well in the SQL SELECT) may not work.
If you click the Help button on the dialog, you are taken to a Microsoft "Oops! No F1 help match was found" page.
My questions are:
Is TOP (100) PERCENT not implied when it is left out of a SQL Select?
Why would a View based on a SQL Select statement not like ORDER BY clauses?
SQL views to not support ORDER BY. For more detail on this, see these other posts:
Create a view with ORDER BY clause
Possible to have an OrderBy in a view?
Order BY is not supported in view in sql server
Why use Select Top 100 Percent?
As #Martin Smith said, your options are one of the following:
Put the ORDER BY in a query that references the view.
SELECT * FROM ViewName ORDER BY [...]
Do the ordering in the Power Query Editor. If you don't have any steps before this sort that break query folding, this should be translated into a native SQL query that gets evaluated on the SQL Server.
I recommend the latter since further steps can also potentially be folded in as well. Specifying your own query does not support query folding.

Why is a T-SQL variable comparison slower than GETDATE() function-based comparison?

I have a T-SQL statement that I am running against a table with many rows. I am seeing some strange behavior. Comparing a DateTime column against a precalculated value is slower than comparing each row against a calculation based on the GETDATE() function.
The following SQL takes 8 secs:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
DECLARE #TimeZoneOffset int = -(DATEPART("HH", GETUTCDATE() - GETDATE()))
DECLARE #LowerTime DATETIME = DATEADD("HH", ABS(#TimeZoneOffset), CONVERT(VARCHAR, GETDATE(), 101) + ' 17:00:00')
SELECT TOP 200 Id, EventDate, Message
FROM Events WITH (NOLOCK)
WHERE EventDate > #LowerTime
GO
This alternate strangely returns instantly:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
SELECT TOP 200 Id, EventDate, Message
FROM Events WITH (NOLOCK)
WHERE EventDate > GETDATE()-1
GO
Why is the second query so much faster?
EDITED: I updated the SQL to accurately reflect other settings I am using
After doing a lot of reading and researching, I've discovered the issue here is parameter sniffing. Sql Server attempts to determine how best to use indexes based on the where clause, but in this case it isnt doing a very good job.
See the examples below :
Slow version:
declare #dNow DateTime
Select #dNow=GetDate()
Select *
From response_master_Incident rmi
Where rmi.response_date between DateAdd(hh,-2,#dNow) AND #dNow
Fast version:
Select *
From response_master_Incident rmi
Where rmi.response_date between DateAdd(hh,-2,GetDate()) AND GetDate()
The "Fast" version runs around 10x faster than the slow version. The Response_Date field is indexed and is a DateTime type.
The solution is to tell Sql Server how best to optimise the query. Modifying the example as follows to include the OPTIMIZE option resulted in it using the same execution plan as the "Fast Version". The OPTMIZE option here explicitly tells sql server to treat the local #dNow variable as a date (as if declaring it as DateTime wasnt enough :s )
Care should be taken when doing this however because in more complicated WHERE clauses you could end up making the query perform worse than Sql Server's own optimisations.
declare #dNow DateTime
SET #dNow=GetDate()
Select ID, response_date, call_back_phone
from response_master_Incident rmi
where rmi.response_date between DateAdd(hh,-2,#dNow) AND #dNow
-- The optimizer does not know too much about the variable so assumes to should perform a clusterd index scann (on the clustered index ID) - this is slow
-- This hint tells the optimzer that the variable is indeed a datetime in this format (why it does not know that already who knows)
OPTION(OPTIMIZE FOR (#dNow = '99991231'));
The execution plans must be different, because SQL Server does not evaluate the value of the variable when creating the execution plan in execution time. So, it uses average statistics from all the different dates that can be stored in the table.
On the other hand, the function getdate is evaluated in execution time, so the execution plan is created using statistics for that specific date, which of course, are more realistic that the previous ones.
If you create a stored procedure with #LowerTime as a parameter, you will get better results.

SQL Server Query: Fast with Literal but Slow with Variable

I have a view that returns 2 ints from a table using a CTE. If I query the view like this it runs in less than a second
SELECT * FROM view1 WHERE ID = 1
However if I query the view like this it takes 4 seconds.
DECLARE #id INT = 1
SELECT * FROM View1 WHERE ID = #id
I've checked the 2 query plans and the first query is performing a Clustered index seek on the main table returning 1 record then applying the rest of the view query to that result set, where as the second query is performing an index scan which is returning about 3000 records records rather than just the one I'm interested in and then later filtering the result set.
Is there anything obvious that I'm missing to try to get the second query to use the Index Seek rather than an index scan. I'm using SQL 2008 but anything I do needs to also run on SQL 2005. At first I thought it was some sort of parameter sniffing problem but I get the same results even if I clear the cache.
Probably it is because in the parameter case, the optimizer cannot know that the value is not null, so it needs to create a plan that returns correct results even when it is. If you have SQL Server 2008 SP1 you can try adding OPTION(RECOMPILE) to the query.
You could add an OPTIMIZE FOR hint to your query, e.g.
DECLARE #id INT = 1
SELECT * FROM View1 WHERE ID = #id OPTION (OPTIMIZE FOR (#ID = 1))
In my case in DB table column type was defined as VarChar and in parameterized query parameter type was defined as NVarChar, this introduced CONVERT_IMPLICIT in the actual execution plan to match data type before comparing and that was culprit for sow performance, 2 sec vs 11 sec. Just correcting parameter type made parameterized query as fast as non parameterized version.
One possible way to do that is to CAST the parameters, as such:
SELECT ...
FROM ...
WHERE name = CAST(:name AS varchar)
Hope this may help someone with similar issue.
I ran into this problem myself with a view that ran < 10ms with a direct assignment (WHERE UtilAcctId=12345), but took over 100 times as long with a variable assignment (WHERE UtilAcctId = #UtilAcctId).
The execution-plan for the latter was no different than if I had run the view on the entire table.
My solution didn't require tons of indexes, optimizer-hints, or a long-statistics-update.
Instead I converted the view into a User-Table-Function where the parameter was the value needed on the WHERE clause. In fact this WHERE clause was nested 3 queries deep and it still worked and it was back to the < 10ms speed.
Eventually I changed the parameter to be a TYPE that is a table of UtilAcctIds (int). Then I can limit the WHERE clause to a list from the table.
WHERE UtilAcctId = [parameter-List].UtilAcctId.
This works even better. I think the user-table-functions are pre-compiled.
When SQL starts to optimize the query plan for the query with the variable it will match the available index against the column. In this case there was an index so SQL figured it would just scan the index looking for the value. When SQL made the plan for the query with the column and a literal value it could look at the statistics and the value to decide if it should scan the index or if a seek would be correct.
Using the optimize hint and a value tells SQL that “this is the value which will be used most of the time so optimize for this value” and a plan is stored as if this literal value was used. Using the optimize hint and the sub-hint of UNKNOWN tells SQL you do not know what the value will be, so SQL looks at the statistics for the column and decides what, seek or scan, will be best and makes the plan accordingly.
I know this is long since answered, but I came across this same issue and have a fairly simple solution that doesn't require hints, statistics-updates, additional indexes, forcing plans etc.
Based on the comment above that "the optimizer cannot know that the value is not null", I decided to move the values from a variable into a table:
Original Code:
declare #StartTime datetime2(0) = '10/23/2020 00:00:00'
declare #EndTime datetime2(0) = '10/23/2020 01:00:00'
SELECT * FROM ...
WHERE
C.CreateDtTm >= #StartTime
AND C.CreateDtTm < #EndTime
New Code:
declare #StartTime datetime2(0) = '10/23/2020 00:00:00'
declare #EndTime datetime2(0) = '10/23/2020 01:00:00'
CREATE TABLE #Times (StartTime datetime2(0) NOT NULL, EndTime datetime2(0) NOT NULL)
INSERT INTO #Times(StartTime, EndTime) VALUES(#StartTime, #EndTime)
SELECT * FROM ...
WHERE
C.CreateDtTm >= (SELECT MAX(StartTime) FROM #Times)
AND C.CreateDtTm < (SELECT MAX(EndTime) FROM #Times)
This performed instantly as opposed to several minutes for the original code (obviously your results may vary) .
I assume if I changed my data type in my main table to be NOT NULL, it would work as well, but I was not able to test this at this time due to system constraints.
Came across this same issue myself and it turned out to be a missing index involving a (left) join on the result of a subquery.
select *
from foo A
left outer join (
select x, count(*)
from bar
group by x
) B on A.x = B.x
Added an index named bar_x for bar.x
DECLARE #id INT = 1
SELECT * FROM View1 WHERE ID = #id
Do this
DECLARE #sql varchar(max)
SET #sql = 'SELECT * FROM View1 WHERE ID =' + CAST(#id as varchar)
EXEC (#sql)
Solves your problem

if else within CTE?

I want to execute select statement within CTE based on a codition. something like below
;with CTE_AorB
(
if(condition)
select * from table_A
else
select * from table_B
),
CTE_C as
(
select * from CTE_AorB // processing is removed
)
But i get error on this. Is it possible to have if else within CTEs? If not is there a work around Or a better approach.
Thanks.
try:
;with CTE_AorB
(
select * from table_A WHERE (condition true)
union all
select * from table_B WHERE NOT (condition true)
),
CTE_C as
(
select * from CTE_AorB // processing is removed
)
the key with a dynamic search condition is to make sure an index is used, Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions. This main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will preform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = #x OR #x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = #x AND #x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
if you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this, OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(#search1 IS NULL or Column1=#Search1)
AND (#search2 IS NULL or Column2=#Search2)
AND (#search3 IS NULL or Column3=#Search3)
and optimize it at run time to be (provided that only #Search2 was passed in with a value):
WHERE
Column2=#Search2
and an index can be used (if you have one defined on Column2)
Never ever try to put conditions like IF inside a single query statements. Even if you do manage to pull it off, this is the one sure-shot way to kill performance. Remember, a single statement means a single plan, and the plan will have to be generated in a way to satisfy both cases, when condition is true and when condition is false, at once. This usually result in the worse possible plan, since the 'condition' usually creates mutually exclusive access path for the plan and the union of the two results in always end-to-end table scan.
Your best approach, for this and many many other reasons, is to pull the IF outside of the statement:
if(condition true)
select * from table_A
else
select * from table_B
I think the IF ELSE stuff might have poor caching if your branch condition flips. Maybe someone more knowledgeable can comment.
Another way would be to UNION ALL with the WHERE clauses as suggested by others. The UNION ALL would replace the IF ELSE
If you are using a parameter, then you only need one statement.
#ID (Some parameter)
;with CTE
(
select * from table_A WHERE id = #ID
union all
select * from table_B WHERE (id = #ID and condition)
)

T-SQL 1=1 Performance Hit

For my SQL queries, I usually do the following for SELECT statements:
SELECT ...
FROM table t
WHERE 1=1
AND t.[column1] = #param1
AND t.[column2] = #param2
This will make it easy if I need to add / remove / comment any WHERE clauses, since I don't have to care about the first line.
Is there any performance hit when using this pattern?
Additional Info:
Example for sheepsimulator and all other who didn't get the usage.
Suppose the above query, I need to change #param1 to be not included into the query:
With 1=1:
...
WHERE 1=1 <-- no change
--AND t.[column1] = #param1 <-- changed
AND t.[column2] = #param2 <-- no change
...
Without 1=1:
...
WHERE <-- no change
--t.[column1] = #param1 <-- changed
{AND removed} t.[column2] = #param2 <-- changed
...
No, SQL Server is smart enough to omit this condition from the execution plan since it's always TRUE.
Same is true for Oracle, MySQL and PostgreSQL.
It is likely that if you use the profiler and look, you will end up seeing that the optimizer will end up ignoring that more often than not, so in the grand scheme of things, there probably won't be much in the way of performance gain or losses.
This has no performance impact, but there the SQL text looks like it has been mangled by a SQL injection attack. The '1=1' trick appears in many sql injection based attacks. You just run the risk that some customer of yours someday deploys a 'black box' that monitors SQL traffic and you'll find your app flagged as 'hacked'. Also source code analyzers may flag this. Its a long long shot, of course, but something worth putting into the balance.
One potentially mildly negative impact of this is that the AND 1=1 will stop SQL Server's simple parameterisation facility from kicking in.
Demo script
DBCC FREEPROCCACHE; /*<-- Don't run on production box!*/
CREATE TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
(
X INT,
Y INT,
PRIMARY KEY (X,Y)
);
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 1
AND Y = 2;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 2
AND Y = 3;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 1
AND Y = 2
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 2
AND Y = 3
SELECT usecounts,
execution_count,
size_in_bytes,
cacheobjtype,
objtype,
text,
creation_time,
last_execution_time,
execution_count
FROM sys.dm_exec_cached_plans a
INNER JOIN sys.dm_exec_query_stats b
ON a.plan_handle = b.plan_handle
CROSS apply sys.dm_exec_sql_text(b.sql_handle) AS sql_text
WHERE text LIKE '%\[E7ED0174-9820-4B29-BCDF-C999CA319131\]%' ESCAPE '\'
AND text NOT LIKE '%this_query%'
ORDER BY last_execution_time DESC
GO
DROP TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
Shows that both the queries without the 1=1 were satisfied by a single parameterised version of the cached plan whereas the queries with the 1=1 compiled and stored a separate plan for the different constant values.
Ideally you shouldn't be relying on this anyway though and should be explicitly parameterising queries to ensure that the desired elements are parameterised and the parameters have the correct datatypes.
There is no difference, as they evaluated constants and are optimized out. I use both 1=1 and 0=1 in both hand- and code-generated AND and OR lists and it has no effect.
Since the condition is always true, SQL Server will ignore it. You can check by running two queries, one with the condition and one without, and comparing the two actual execution plans.
An alternative to achieve your ease of commenting requirement is to restructure your query:
SELECT ...
FROM table t
WHERE
t.[column1] = #param1 AND
t.[column2] = #param2 AND
t.[column3] = #param3
You can then add/remove/comment out lines in the where conditions and it will still be valid SQL.
No performance hit. Even if your WHERE clause is loaded with a large number of comparisons, this is tiny.
Best case scenario is that it's a bit-for-bit comparison. Worse case is that the digits are evaluated as integers.
For queries of any reasonable complexity there will be no difference. You can look at some execution plans and also compare real execution costs, and see for yourself.

Resources