I'm maintaining stored procedures for SQL Server 2005 and I wish I could use a new feature in 2008 that allows the query hint: "OPTIMIZE FOR UNKNOWN"
It seems as though the following query (written for SQL Server 2005) estimates the same number of rows (i.e. selectivity) as if OPTION (OPTIMIZE FOR UNKNOWN) were specified:
CREATE PROCEDURE SwartTest(@productid INT)
AS
DECLARE @newproductid INT
SET @newproductid = @productid
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @newproductid
This query avoids parameter sniffing by declaring and setting a new variable. Is this really a SQL Server 2005 work-around for the OPTIMIZE-FOR-UNKNOWN feature? Or am I missing something? (Authoritative links, answers or test results are appreciated).
More Info:
A quick test on SQL Server 2008 tells me that the number of estimated rows in this query is in fact the same as if OPTIMIZE FOR UNKNOWN was specified. Is this the same behavior on SQL Server 2005? I think I remember hearing once that without more info, the SQL Server Optimizing Engine has to guess at the selectivity of the parameter (usually at 10% for inequality predicates). I'm still looking for definitive info on SQL 2005 behavior though. I'm not quite sure that info exists though...
More Info 2:
To be clear, this question is asking for a comparison of the UNKNOWN query hint and the parameter-masking technique I describe.
It's a technical question, not a problem solving question. I considered a lot of other options and settled on this. So the only goal of this question was to help me gain some confidence that the two methods are equivalent.
I've used that solution several times recently to avoid parameter sniffing on SQL 2005 and it seems to me to do the same thing as OPTIMIZE FOR UNKNOWN on SQL 2008. It's fixed a lot of problems we had with some of our bigger stored procedures, which sometimes just hung when passed certain parameters.
Okay, so I've done some experimenting. I'll write up the results here, but first I want to say that based on what I've seen and know, I'm confident that masking parameters with local variables in 2005 and 2008 is exactly equivalent to using 2008's OPTIMIZE FOR UNKNOWN. At least in the context of stored procedures.
So this is what I've found.
In the procedure above, I'm using the AdventureWorks database (but I use similar methods and get similar results for any other database). I ran:
dbcc show_statistics ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
And I see statistics with 200 steps in its histogram. Looking at its histogram I see that there are 66 distinct range rows (i.e. 66 distinct values that weren't included in stats as equality values). Add the 200 equality rows (from each step), and I get an estimate of 266 distinct values for ProductId in Sales.SalesOrderDetail.
With 121317 rows in the table, I can estimate that each ProductId has 456 rows on average. And when I look at the query plan for my test procedure (in xml format) I see something like:
...
<QueryPlan DegreeOfParallelism="1" >
<RelOp NodeId="0"
PhysicalOp="Index Seek"
LogicalOp="Index Seek"
EstimateRows="456.079"
TableCardinality="121317" />
...
<ParameterList>
<ColumnReference
Column="#newproductid"
ParameterRuntimeValue="(999)" />
</ParameterList>
</QueryPlan>
...
So I know where the EstimateRows value is coming from (accurate to three decimals). Notice also that the ParameterCompiledValue attribute is missing from the query plan. This is exactly what a plan looks like when using 2008's OPTIMIZE FOR UNKNOWN.
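As a cross-check, here's a minimal sketch (against AdventureWorks) that derives the same figure from the density vector; the "unknown" estimate is essentially the "All density" value for ProductID multiplied by the table cardinality:

-- Density vector for the ProductID index statistics
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID')
    WITH DENSITY_VECTOR;

-- All density for ProductID is roughly 1/266; multiplied by the table
-- cardinality it reproduces the EstimateRows figure in the plan above:
SELECT 121317 * (1.0 / 266.0) AS EstimatedRowsForUnknownProductId;   -- ~456.08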
Interesting question.
There's a good article on the SQL Programming and API Development Team blog here, which lists the pre-SQL 2008 workaround solutions as:
use RECOMPILE hint so the query is recompiled every time
unparameterise the query
give specific values in OPTIMIZE FOR hint
force use of a specific index
use a plan guide
Which leads me on to this article, which mentions your workaround of using local variables and how it generates an execution plan based on statistics. How similar this process is to the new OPTIMIZE FOR UNKNOWN hint, I don't know. My hunch is that it is a reasonable workaround.
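For reference, a minimal sketch of the first workaround in that list (statement-level recompile), using the same AdventureWorks table as above; with OPTION (RECOMPILE) each execution compiles a plan for the actual parameter value, so no sniffed plan is reused (the procedure name here is hypothetical):

CREATE PROCEDURE dbo.SwartTestRecompile (@productid INT)
AS
BEGIN
    SELECT ProductID
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @productid
    OPTION (RECOMPILE);   -- recompile this statement on every call
END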
I've been using this parameter masking technique for at least the past year because of odd performance problems, and it has worked well, but it is VERY annoying to have to do all the time.
I have ALSO been using WITH RECOMPILE.
I do not have controlled tests because I can't selectively turn the usage of each on and off automatically in the system, but I suspect that the parameter masking will only help IF the parameter is used. I have some complex SPs where the parameter is not used in every statement, and I expect that WITH RECOMPILE was still necessary because some of the "temporary" work tables are not populated the same way on every run (or even indexed identically, if I'm trying to tune), and some later statements don't rely on the parameter once the work tables are already appropriately populated. I have broken some processes up into multiple SPs precisely so that work done to populate a work table in one SP can be properly analyzed and executed against WITH RECOMPILE in the next SP.
I remember, many years ago (yes, back in 2006), stumbling upon a hidden or undocumented built-in function that computes the variability / variance of a column. This function was used when trying to determine whether a column is a good candidate for an index. I remember reading about it at SQL Server Central, and it outputs something like:
Column 1: 0.0291
For example, if a column is boolean, where there are only 2 possible values, it would output 0.5.
Or something like that
Now I've searched for that function for weeks, trying many Google searches, but I can't find it anymore.
Does anyone know what that function is?
I think what you are referring to is 'Cardinality', not variability. In any case it's an estimate of the number of distinct values in a column. I'm not aware of any function, hidden or otherwise, in SQL Server that will generate any sort of value for this, but the query plan generator certainly makes use of this when generating query plans. And you can see the results of that in the estimated query plan it generates in SSMS. The cardinality estimator has been greatly improved for SQL 2014 and you can read more about it here.
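There's no documented function for it that I know of, but a rough density figure like the 0.5-for-a-boolean example in the question can be computed by hand (the table and column names here are hypothetical):

-- Density = 1 / number of distinct values in the column.
-- A two-valued (boolean-like) column yields 0.5.
SELECT 1.0 / COUNT(DISTINCT SomeColumn) AS Density
FROM dbo.SomeTable;

-- DBCC SHOW_STATISTICS reports the same figure as "All density" for
-- statistics whose leading column is SomeColumn.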
In SQL Server, what is the best way to allow for multiple execution plans to exist for a query in a SP without having to recompile every time?
For example, I have a case where the query plan varies significantly depending on how many rows are in a temp table that the query uses. Since there was no "one size fits all" plan that was satisfactory, and since it was unacceptable to recompile every time, I ended up copy/pasting (ick) the main query in the SP multiple times within several IF statements, forcing the SQL engine to give each case its own optimal plan. It actually seemed to work beautifully performance-wise, but it feels a bit clunky. (I know I could similarly break this part out into multiple SPs to do the same thing.) Is there a better way to do this?
IF @RowCount < 1
[paste query here]
ELSE IF @RowCount < 50
[paste query here]
ELSE IF @RowCount < 200
[paste query here]
ELSE
[paste query here]
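A sketch of the multiple-SP variation mentioned above (the procedure names are hypothetical): branch to separate inner procedures so each branch keeps its own cached plan, and the duplicated query text lives in the inner procedures rather than inline. Temp tables created in the outer procedure remain visible inside the inner procedures, so the query text itself doesn't need to change.

IF @RowCount < 1
    EXEC dbo.MainQuery_Empty;      -- each inner proc contains the main query
ELSE IF @RowCount < 50
    EXEC dbo.MainQuery_Small;      -- and gets compiled for the temp table
ELSE IF @RowCount < 200
    EXEC dbo.MainQuery_Medium;     -- size it typically sees
ELSE
    EXEC dbo.MainQuery_Large;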
You can use OPTIMIZE FOR in certain situations to create a plan targeted at a certain value of a parameter (but not multiple plans per se). This allows you to specify which parameter value you want SQL Server to use when creating the execution plan. This is a SQL Server 2005 onwards hint.
Optimize Parameter Driven Queries with the OPTIMIZE FOR Hint in SQL Server
There is also OPTIMIZE FOR UNKNOWN – a SQL Server 2008 onwards feature (use judiciously):
This hint directs the query optimizer to use the standard algorithms it has always used if no parameter values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the query plan should be, instead of looking at the specific parameter values that were passed to the query by the application.
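A minimal sketch of both hints, assuming a hypothetical parameterized query inside a procedure with a @CustomerID parameter (the UNKNOWN variant requires SQL Server 2008 or later):

-- SQL Server 2005+: compile the plan as if @CustomerID were always 42.
SELECT OrderID, Total
FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR (@CustomerID = 42));

-- SQL Server 2008+: compile the plan from average density, ignoring the sniffed value.
SELECT OrderID, Total
FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR UNKNOWN);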
Perhaps also look into the optimize for ad hoc workloads option.
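That one is a server-level setting; a quick sketch of enabling it (note it affects single-use ad hoc plans, not the parameter-sniffing behaviour of stored procedures):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Cache only a small plan stub the first time an ad hoc batch is seen;
-- the full plan is cached if the same batch is executed again.
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;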
SQL Server 2005+ has statement level recompilation and is better at dealing with this kind of branching. You have one plan still but the plan can be partially recompiled at the statement level.
But it is ugly.
I'd go with @Mitch Wheat's option personally because you have recompilations anyway with the stored procedure using a temp table. See Temp table and stored proc compilation
I would like to know if exists some command to force NO RECOMPILE of a stored procedure.
I have a procedure that takes about 5 minutes to execute, but when I run it directly in the query window it takes just a few seconds. This SP uses a temporary table.
My question is: is there any way to force a stored procedure to avoid recompilation?
Note: I am using SQL Server 2005.
As was pointed out in the comments, this is almost certainly nothing to do with plan recompilation. If I had to hazard a guess, this is due to a bad query plan caused by parameter sniffing.
Assume you have an ecommerce website where sales come from all over. You're going to have a lot more addresses in California than in Alaska, right? The physical operations that SQL Server performs to read a lot of data (summarized sales in California) are going to be very different from those needed to read a little bit of data (summarized sales in Alaska). Sometimes the cached plan is great for only one set of parameters and horrible for all others. This is often referred to as parameter sniffing.
There's a fantastic article about parameter sniffing available on Simple Talk's website. In short, you don't have too many options apart from specifying OPTION (RECOMPILE) at the statement level, specifying WITH RECOMPILE at the procedure level, or copying the procedure's parameters into local variables and using those to run your parameterized query.
Note that SQL Server's plans are cached by SET options as well as by query text. That is, if you have different SET options active in Management Studio, you can see different behaviour from what the application is seeing.
To check the SET options for each connection, look at the quoted_identifier, arithabort, ansi_null_dflt_on, ansi_defaults, ansi_warnings, ansi_padding, ansi_nulls and concat_null_yields_null columns of the sys.dm_exec_sessions dynamic management view. For my recent problem, ADO.NET had set ARITHABORT OFF, while Management Studio had it set ON.
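A sketch of the kind of query I use to compare the application's session settings with my own Management Studio session:

-- Compare SET options across connections; plans are cached per combination of these.
SELECT session_id, program_name,
       quoted_identifier, arithabort, ansi_null_dflt_on, ansi_defaults,
       ansi_warnings, ansi_padding, ansi_nulls, concat_null_yields_null
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;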
To change the options for a query window in Management Studio, right-click in the query editor and select Query Options from the context menu, then go to the Advanced page for ARITHABORT and CONCAT_NULL_YIELDS_NULL, and the ANSI page for QUOTED_IDENTIFIER and the ANSI options. Alternatively just execute the necessary SET options in that query window.
Once you have the same environment set up, check for differences between the estimated execution plan and the actual execution plan. The estimated plan will be computed using the parameters and statistics available at that instant, whereas the actual plan will be whatever is in the cache. Chances are that the plans are different, and you either need to update statistics, guide the plan according to the more typical parameters, force a recompile each time, or rewrite the query to be more stable. For example, if you have optional parameters, consider using IF/ELSE statements rather than trying to be clever and saying 'WHERE @param = -1 OR Column = @param', which will behave very differently if @param is not supplied. Or, use dynamic SQL to construct the text.
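To illustrate the optional-parameter point using the AdventureWorks table from earlier (the -1 "all rows" convention and the @ProductID parameter of the surrounding procedure are assumptions):

-- Catch-all form: a single plan has to cover both the "all rows" and "one value" cases.
SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE @ProductID = -1 OR ProductID = @ProductID;

-- IF/ELSE rewrite: each branch gets its own, more stable plan.
IF @ProductID = -1
    SELECT SalesOrderID FROM Sales.SalesOrderDetail;
ELSE
    SELECT SalesOrderID FROM Sales.SalesOrderDetail WHERE ProductID = @ProductID;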
You should be aware that the statistics are best when the first column of the statistics, i.e. the first column of the index for index statistics, is the most selective and the most frequently updated. SQL Server produces a detailed histogram for the first column only - up to 200 values from that column with the number of rows in each range. For the other combinations of columns it just computes an average selectivity value, the number of unique combinations divided by the number of rows sampled. It also automatically updates the statistics only when a sufficient number of changes have occurred to the lead column. See http://blogs.technet.com/b/rob/archive/2008/05/16/sql-server-statistics.aspx for more information about when statistics are updated.
We just upgraded our SQL Server 2005 to SQL Server 2008 R2 and noticed some performance problems.
The query below was already slow, but now on 2008 it just times out. We rebuilt the full-text catalog to make sure it was freshly built on 2008:
DECLARE @FREETEXT varchar(255) = 'TEN-T'
select Distinct ...
from
DOSSIER_VERSION
inner join
DOSSIER_VERSION_LOCALISED ...
where
CONTAINS(DOSSIER_VERSION.*,@FREETEXT)
or
CONTAINS(DOSSIER_VERSION_LOCALISED.*,@FREETEXT)
The query takes minutes if you have both conditions enabled.
If you just put the following in the WHERE clause:
CONTAINS(DOSSIER_VERSION.*,@FREETEXT)
it's super fast. The same goes if it's just:
CONTAINS(DOSSIER_VERSION_LOCALISED.*,@FREETEXT)
Since we are OR'ing the results, I would expect the running time of this query to be less than the sum of the two, but as stated above it takes minutes or times out.
Can anyone tell me what is going on here? If I use a union (which is conceptually the same as the or) the performance problem is gone but I would like to know what issue I am running into here since I want to avoid rewriting queries.
Regards, Tom
See my answers to these very similar questions:
Adding more OR searches with CONTAINS Brings Query to Crawl
SQL Server full text query across multiple tables - why so slow?
The basic idea is that using LEFT JOINs to CONTAINSTABLE (or FREETEXTTABLE) performs significantly better than having multiple CONTAINS (or FREETEXT) ORed together in the WHERE clause.
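Applied to the query above, the rewrite looks roughly like this; the join keys and key columns are assumptions about your schema, so adjust them to the real primary keys:

SELECT DISTINCT dv.DossierVersionId        -- select whichever columns you need
FROM DOSSIER_VERSION dv
INNER JOIN DOSSIER_VERSION_LOCALISED dvl
    ON dvl.DossierVersionId = dv.DossierVersionId              -- assumed join key
LEFT JOIN CONTAINSTABLE(DOSSIER_VERSION, *, @FREETEXT) ft1
    ON ft1.[KEY] = dv.DossierVersionId                         -- assumed full-text key
LEFT JOIN CONTAINSTABLE(DOSSIER_VERSION_LOCALISED, *, @FREETEXT) ft2
    ON ft2.[KEY] = dvl.DossierVersionLocalisedId               -- assumed full-text key
WHERE ft1.[KEY] IS NOT NULL
   OR ft2.[KEY] IS NOT NULL;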
A while ago I had a query that I ran quite a lot for one of my users. It was still being evolved and tweaked but eventually it stabilised and ran quite quickly, so we created a stored procedure from it.
So far, so normal.
The stored procedure, though, was dog slow. No material difference between the query and the proc, but the speed change was massive.
[Background, we're running SQL Server 2005.]
A friendly local DBA (who no longer works here) took one look at the stored procedure and said "parameter spoofing!" (Edit: although it seems that it is possibly also known as 'parameter sniffing', which might explain the paucity of Google hits when I tried to search it out.)
We abstracted some of the stored procedure to a second one, wrapped the call to this new inner proc into the pre-existing outer one, called the outer one and, hey presto, it was as quick as the original query.
So, what gives? Can someone explain parameter spoofing?
Bonus credit for
highlighting how to avoid it
suggesting how to recognise possible causes
discussing alternative strategies, e.g. stats, indices, keys, for mitigating the situation
FYI - you need to be aware of something else when you're working with SQL 2005 and stored procs with parameters.
SQL Server will compile the stored proc's execution plan with the first parameter that's used. So if you run this:
usp_QueryMyDataByState 'Rhode Island'
The execution plan will work best with a small state's data. But if someone turns around and runs:
usp_QueryMyDataByState 'Texas'
The execution plan designed for Rhode-Island-sized data may not be as efficient with Texas-sized data. This can produce surprising results when the server is restarted, because the newly generated execution plan will be targeted at whatever parameter is used first - not necessarily the best one. The plan won't be recompiled until there's a big reason to do it, like if statistics are rebuilt.
This is where query plans come in, and SQL Server 2008 offers a lot of new features that help DBAs pin a particular query plan in place long-term no matter what parameters get called first.
My concern is that when you rebuilt your stored proc, you forced the execution plan to recompile. You called it with your favorite parameter, and then of course it was fast - but the problem may not have been the stored proc. It might have been that the stored proc was recompiled at some point with an unusual set of parameters and thus, an inefficient query plan. You might not have fixed anything, and you might face the same problem the next time the server restarts or the query plan gets recompiled.
Yes, I think you mean parameter sniffing, which is a technique the SQL Server optimizer uses to try to figure out parameter values/ranges so it can choose the best execution plan for your query. In some instances SQL Server does a poor job at parameter sniffing & doesn't pick the best execution plan for the query.
I believe this blog article http://blogs.msdn.com/queryoptteam/archive/2006/03/31/565991.aspx has a good explanation.
It seems that the DBA in your example chose option #4: moving the query to another sproc, in a separate procedural context.
You could also have used WITH RECOMPILE on the original sproc, or the OPTIMIZE FOR hint on the parameter.
A simple way to speed that up is to reassign the input parameters to local variables at the very beginning of the sproc, e.g.
CREATE PROCEDURE uspParameterSniffingAvoidance
    @SniffedFormalParameter int
AS
BEGIN
    DECLARE @SniffAvoidingLocalParameter int
    SET @SniffAvoidingLocalParameter = @SniffedFormalParameter
    -- Work with @SniffAvoidingLocalParameter in the sproc body
    -- ...
END
In my experience, the best solution for parameter sniffing is dynamic SQL. Two important things to note: 1. you should use parameters in your dynamic SQL query, and 2. you should use sp_executesql (and not sp_execute), which caches the execution plan and reuses it across parameter values.
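A minimal sketch of that pattern, with hypothetical table and column names:

DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT OrderID, Total FROM dbo.Orders WHERE CustomerID = @CustomerID;';

-- sp_executesql keeps the query parameterized, so one cached plan is reused
-- across different @CustomerID values instead of compiling per literal value.
EXEC sp_executesql
    @sql,
    N'@CustomerID INT',
    @CustomerID = 42;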
Parameter sniffing is a technique SQL Server uses to optimize the query execution plan for a stored procedure. When you first call the stored procedure, SQL Server looks at the given parameter values of your call and decides which indices to use based on the parameter values.
So when the first call contains not very typical parameters, SQL Server might select and store a sub-optimal execution plan in regard to the following calls of the stored procedure.
You can work around this by either
using WITH RECOMPILE
copying the parameter values to local variables inside the stored procedure and using the locals in your queries.
I even heard that it's better to not use stored procedures at all but to send your queries directly to the server.
I recently came across the same problem where I have no real solution yet.
For some queries, copying to local vars helps get back to the right execution plan; for other queries, performance degrades with local vars.
I still have to do more research on how SQL Server caches and reuses (sub-optimal) execution plans.
I had a similar problem. My stored procedure took 30-40 seconds to execute. I tried running the SP's statements in a query window and they took only a few milliseconds.
Then I tried declaring local variables within the stored procedure and transferring the parameter values to those local variables. This made the SP very fast, and now the same SP executes within a few milliseconds instead of 30-40 seconds.
Very simply, and in short: the query optimizer reuses the old query plan for frequently running queries, but the size of the data keeps increasing, so a newly optimized plan is required while the optimizer is still using the old plan for the query. This is called parameter sniffing.
I have also written a detailed post on this. Please visit this URL:
http://www.dbrnd.com/2015/05/sql-server-parameter-sniffing/
Changing your stored procedure to execute as a batch should increase the speed.
Batch select, i.e.:
exec ('select * from [order] where orderid = ''' + @ordersID + '''')
Instead of the normal stored procedure select:
select * from [order] where orderid = @ordersID
Just pass in the parameter as nvarchar and you should get quicker results.