Getting Around Parameter Sniffing in SQL Server 2005 - sql-server

I have seen people suggest copying the parameters to a local variable to avoid parameter sniffing in a stored proc. Say you have
CREATE PROCEDURE List_orders_3 #fromdate datetime AS
DECLARE #fromdate_copy datetime
SELECT #fromdate_copy = #fromdate
SELECT * FROM Orders WHERE OrderDate > #fromdate_copy
(I got this from http://www.sommarskog.se/query-plan-mysteries.html but I need more details to understand it fully).
But what does this actually do to the query plan cache and query plan optimizer? If it is true that the optimizer makes no assumptions about #fromdate_copy, then why is it that it won't cache a plan that is most likely going to be a full table scan (since it makes no assumptions, how could it generate anything else)?
Is this technique basically like a "no inputs will run well, but no input will run terribly either" ?

Actually, you need to assign a default variable to the #fromdate_copy field that you declare, so that when the query engine looks at the query itself, it bases a plan on the value that is 'hard-coded' - but instead, when the query actually gets executed, it gets executed with the value being passed in and switched..
Ken Henderson (the Guru himself) explained this in great detail: http://blogs.msdn.com/b/khen1234/archive/2005/06/02/424228.aspx
If you can, read his books - they offer a plethora of information about sql server internals: http://www.amazon.com/Gurus-Guide-Server-Architecture-Internals/dp/0201700476/ref=pd_bxgy_b_text_c
I'm not sure if he has anything written for the newer versions, but some of the fundamentals haven't changed that much...

Related

The order of the results is changed for apparently no reason in a multi statement query

I'm running Python pyodbc against SQL Server. I have a very complex query that I minimize here as
DECLARE #mydate DATETIME = '2021-02-08'
SELECT *
FROM atable
WHERE adate = #mydate AND afield = ?
On the Python side I'm executing the usual
crsr.execute(sql, field)
What is baffling me is that it returns all the results and it ignores the condition afield = field with no other errors but with a strange order so that when I plot the graph it is very confused! Why does it happen?
(edit Of course I should have added an ORDER BY)
I have already fixed the code with an initial
DECLARE #myfield VARCHAR(32) = ?
followed by the where condition ending with afield=#myfield and now it works as expected: the order is the normal one even if I have not introduced an explicit ORDER BY.
In other words, aside from the fact that the final correct fix is adding an ORDER BY, e.g.
SELECT *
FROM atable
WHERE adate = #mydate AND afield = ?
ORDER BY id
I'm wondering why introducing the above said change was enough to change the order.
Because your SQL connection driver does not seem to support proper parameters, it is embedding the parameter as text. I have no idea how good it's sanitizing and escaping method is, but it's not usually a good idea because of the risk of SQL injection.
I note that pymssql driver does support named parameters
What can happen is that when the compiler can see the exact value you want to use, it will calculate the statistics of how many rows are likely to match (the "cardinality"), and therefore it may choose a different access pattern based on what indexes are available.
When the value comes through as a proper parameter (when using a good SQL driver), the parameter is "sniffed", and the compiler will use the statistics based on the value in the first run of the query. This can be of benefit if there is a commonly-used value.
If the value is in a local variable, then parameter-sniffing is not used, and the compiler uses the average cardinality for that predicate. This may be better or worse than "sniffing".
It would sound like embedding the value is better. But this is not the case.
When the value is embedded, each query stand on it's own, and must be compiled again every time. This can have a big impact on execution time and CPU usage, as well as memory usage to save the query plans.

T-SQL Select from view much slower with variable

I have a view that runs fast (< 1s) when specifying a value in the where clause:
SELECT *
FROM vwPayments
WHERE AccountId = 8155
...but runs slow (~3s) when that value is a variable:
DECLARE #AccountId BIGINT = 8155
SELECT *
FROM vwPayments
WHERE AccountId = #AccountId
Why is the execution plan different for the second query? Why is it running so much slower?
In the first case the parameter value was known while compiling the statement. The optimizer used the statistics histogram to generate the best plan for that particular parameter value.
When you defined the local variable, SQL server was not able to use the parameter value to find 'the optimal value'. Since the parameter value is unknown at compile time, the optimizer calculates an estimated number of rows based on 'uniform distribution'. The optimizer came up with a plan that would be 'good enough' for any possible input parameter value.
Another interesting article that almost exactly describes your case can be found here.
In short the statistical analysis the query optimizer uses to pick the best plan picks a seek when the value is a known value and it can leverage statistics and a scan when the value is not known. It picks a scan in the second choice because the plan is compiled before the value of the where clause is known.
While I rarely recommend bossing the query analyzer around in this specific case you can use a forceseek hint or other query hints to override the engine. Be aware however, that finding a way to get an optimal plan with the engine's help is a MUCH better solution.
I did a quick Google and found a decent article that goes into the concept of local variables affecting query plans more deeply.
DECLARE #Local_AccountId BIGINT = #AccountId
SELECT *
FROM vwPayments
WHERE AccountId = #Local_AccountId
OPTION(RECOMPILE)
It works for me
It could be parameter sniffing. Try and do the following - I assume it is in a stored procedure?
DECLARE #Local_AccountId BIGINT = #AccountId
SELECT *
FROM vwPayments
WHERE AccountId = #Local_AccountId
For details about parameter sniffing, you can view this link : http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx
See if the results are different. I have encountered this problem several times, especially if the query is being called a lot during peaks and the execution plan cached is one which was created when off-peak.
Another option, but you should not need in your case is adding "WITH RECOMPILE" to a procedure definition. This would cause the procedure to be recompiled every time it is called. View http://www.techrepublic.com/article/understanding-sql-servers-with-recompile-option/5662581
I think #souplex made a very good point
Basically at the first case it's just a number and easy for system to understand, while the 2nd one is variable which means every time the system need to find the very value of it and do the check for each statement, which is a different method

Inline SQL versus stored procedure

I have a simple SELECT statement with a couple columns referenced in the WHERE clause. Normally I do these simple ones in the VB code (setup a Command object, set Command Type to text, set Command Text to the Select statement). However I'm seeing timeout problems. We've optimized just about everything we can with our tables, etc.
I'm wondering if there'd be a big performance hit just because I'm doing the query this way, versus creating a simple stored procedure with a couple params. I'm thinking maybe the inline code forces SQL to do extra work compiling, creating query plan, etc. which wouldn't occur if I used a stored procedure.
An example of the actual SQL being run:
SELECT TOP 1 * FROM MyTable WHERE Field1 = #Field1 ORDER BY ID DESC
A well formed "inline" or "ad-hoc" SQL query - if properly used with parameters - is just as good as a stored procedure.
But this is absolutely crucial: you must use properly parametrized queries! If you don't - if you concatenate together your SQL for each request - then you don't benefit from these points...
Just like with a stored procedure, upon first executing, a query execution plan must be found - and then that execution plan is cached in the plan cache - just like with a stored procedure.
That query plan is reused over and over again, if you call your inline parametrized SQL statement multiple times - and the "inline" SQL query plan is subject to the same cache eviction policies as the execution plan of a stored procedure.
Just from that point of view - if you really use properly parametrized queries - there's no performance benefit for a stored procedure.
Stored procedures have other benefits (like being a "security boundary" etc.), but just raw performance isn't one of their major plus points.
It is true that the db has to do the extra work you mention, but that should not result in a big performance hit (unless you are running the query very, very frequently..)
Use sql profiler to see what is actually getting sent to the server. Use activity monitor to see if there are other queries blocking yours.
Your query couldn't be simpler. Is Field1 indexed? As others have said, there is no performance hit associated with "ad-hoc" queries.
For where to put your queries, this is one of the oldest debates in tech. I would argue that your requests "belong" to your application. They will be versionned with your app, tested with your app and should disappear when your app disappears. Putting them anywhere other than in your app is walking into a world of pain. But for goodness sake, use .sql files, compiled as embedded resources.
Select statement which is part of form clause of any
another statement is called as inline query.
Cannot take parameters.
Not a database object
Procedure:
Can take paramters
Database object
can be used globally if same action needs to be performed.

How does DateTime.Now affect query plan caching in SQL Server?

Question:
Does passing DateTime.Now as a parameter to a proc prevent SQL Server from caching the query plan? If so, then is the web app missing out on huge performance gains?
Possible Solution:
I thought DateTime.Today.AddDays(1) would be a possible solution. It would pass the same end-date to the sql proc (per day). And the user would still get the latest data. Please speak to this as well.
Given Example:
Let's say we have a stored procedure. It reports data back to a user on a webpage. The user can set a date range. If the user sets today's date as the "end date," which includes today's data, the web app passes DateTime.Now to the sql proc.
Let's say that one user runs a report--5/1/2010 to now--over and over several times. On the webpage, the user sees 5/1/2010 to 5/4/2010. But the web app passes DateTime.Now to the sql proc as the end date. So, the end date in the proc will always be different, although the user is querying a similar date range.
Assume the number of records in the table and number of users are large. So any performance gains matter. Hence the importance of the question.
Example proc and execution (if that helps to understand):
CREATE PROCEDURE GetFooData
#StartDate datetime
#EndDate datetime
AS
SELECT *
FROM Foo
WHERE LogDate >= #StartDate
AND LogDate < #EndDate
Here's a sample execution using DateTime.Now:
EXEC GetFooData '2010-05-01', '2010-05-04 15:41:27' -- passed in DateTime.Now
Here's a sample execution using DateTime.Today.AddDays(1)
EXEC GetFooData '2010-05-01', '2010-05-05' -- passed in DateTime.Today.AddDays(1)
The same data is returned for both procs, since the current time is: 2010-05-04 15:41:27.
The query plan will be cached regardless of parameter values. Parameters basically guarantee that a consistent, reusable query exists, since they are type-safe as far as SQL server is concerned.
What you want is not query plan, but result caching. And this will be affected by the behavior you describe.
Since you seem to handle whole days only, you can try passing in dates, not datetimes, to minimize different parameter values. Also try caching query results in the application instead of doing a database roundtrip every time.
Because you invoke a stored procedure, not directly a query, then your only query that changes is the actual batch you send to SQL, the EXEC GetFooData '2010-05-01', '2010-05-05' vs. GetFooData '2010-05-01', '2010-05-04 15:41:27'. This is a trivial batch, that will generate a trivial plan. While is true that, from a strict technical point of view, you are loosing some performance, it will be all but unmeasurable. The details why this happes are explained in this response: Dynamically created SQL vs Parameters in SQL Server
The good news is that by a minor change in your SqlClient invocation code, you'll benefit from even that minor performance improvement mentioned there. Change your SqlCommand code to be an explicit stored procedure invocation:
SqlCommand cmd = new SqlCommand("GetFooData", connection);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.AddWithValue("#StartDate", dateFrom);
cmd.Parameters.AddWithValue("#EndDate", DateTime.Now);
As a side note, storing localized times in the database is not a very good idea, due to the clients being on different time zones than the server and due to the complications of daylight savings change night. A much better solution is to always store UTC time and simply format it to user's local time in the application.
In your case, you are probably fine if the second parameter is just drifting upward in real time.
However, it is possible to become a victim of parameter sniffing where the first execution (which produces the cached execution plan) is called with parameters which produce a plan which is not typically good for the other parameters normally used (or the data profile changes drastically). The later invocations might use a plan which is sometimes so poor that it won't even complete properly.
If your data profile changes drastically by different choices of parameters, and the execution plan becomes poor for certain choices of parameters, you can mask the parameters into local variables - this will effectively prevent parameter sniffing in SQL Server 2005. There is also the WITH RECOMPILE (either in the SP or in the EXEC - but for heavily called SPs, this is not a viable option) In SQL Server 2008, I would almost always use the OPTIMIZE FOR UNKNOWN which will avoid producing a plan based on parameter sniffing.

Parameter Sniffing (or Spoofing) in SQL Server

A while ago I had a query that I ran quite a lot for one of my users. It was still being evolved and tweaked but eventually it stablised and ran quite quickly, so we created a stored procedure from it.
So far, so normal.
The stored procedure, though, was dog slow. No material difference between the query and the proc, but the speed change was massive.
[Background, we're running SQL Server 2005.]
A friendly local DBA (who no longer works here) took one look at the stored procedure and said "parameter spoofing!" (Edit: although it seems that it is possibly also known as 'parameter sniffing', which might explain the paucity of Google hits when I tried to search it out.)
We abstracted some of the stored procedure to a second one, wrapped the call to this new inner proc into the pre-existing outer one, called the outer one and, hey presto, it was as quick as the original query.
So, what gives? Can someone explain parameter spoofing?
Bonus credit for
highlighting how to avoid it
suggesting how to recognise possible cause
discuss alternative strategies, e.g. stats, indices, keys, for mitigating the situation
FYI - you need to be aware of something else when you're working with SQL 2005 and stored procs with parameters.
SQL Server will compile the stored proc's execution plan with the first parameter that's used. So if you run this:
usp_QueryMyDataByState 'Rhode Island'
The execution plan will work best with a small state's data. But if someone turns around and runs:
usp_QueryMyDataByState 'Texas'
The execution plan designed for Rhode-Island-sized data may not be as efficient with Texas-sized data. This can produce surprising results when the server is restarted, because the newly generated execution plan will be targeted at whatever parameter is used first - not necessarily the best one. The plan won't be recompiled until there's a big reason to do it, like if statistics are rebuilt.
This is where query plans come in, and SQL Server 2008 offers a lot of new features that help DBAs pin a particular query plan in place long-term no matter what parameters get called first.
My concern is that when you rebuilt your stored proc, you forced the execution plan to recompile. You called it with your favorite parameter, and then of course it was fast - but the problem may not have been the stored proc. It might have been that the stored proc was recompiled at some point with an unusual set of parameters and thus, an inefficient query plan. You might not have fixed anything, and you might face the same problem the next time the server restarts or the query plan gets recompiled.
Yes, I think you mean parameter sniffing, which is a technique the SQL Server optimizer uses to try to figure out parameter values/ranges so it can choose the best execution plan for your query. In some instances SQL Server does a poor job at parameter sniffing & doesn't pick the best execution plan for the query.
I believe this blog article http://blogs.msdn.com/queryoptteam/archive/2006/03/31/565991.aspx has a good explanation.
It seems that the DBA in your example chose option #4 to move the query to another sproc to a separate procedural context.
You could have also used the with recompile on the original sproc or used the optimize for option on the parameter.
A simple way to speed that up is to reassign the input parameters to local parameters in the very beginning of the sproc, e.g.
CREATE PROCEDURE uspParameterSniffingAvoidance
#SniffedFormalParameter int
AS
BEGIN
DECLARE #SniffAvoidingLocalParameter int
SET #SniffAvoidingLocalParameter = #SniffedFormalParameter
--Work w/ #SniffAvoidingLocalParameter in sproc body
-- ...
In my experience, the best solution for parameter sniffing is 'Dynamic SQL'. Two important things to note is that 1. you should use parameters in your dynamic sql query 2. you should use sp_executesql (and not sp_execute), which saves the execution plan for each parameter values
Parameter sniffing is a technique SQL Server uses to optimize the query execution plan for a stored procedure. When you first call the stored procedure, SQL Server looks at the given parameter values of your call and decides which indices to use based on the parameter values.
So when the first call contains not very typical parameters, SQL Server might select and store a sub-optimal execution plan in regard to the following calls of the stored procedure.
You can work around this by either
using WITH RECOMPILE
copying the parameter values to local variables inside the stored procedure and using the locals in your queries.
I even heard that it's better to not use stored procedures at all but to send your queries directly to the server.
I recently came across the same problem where I have no real solution yet.
For some queries the copy to local vars helps getting back to the right execution plan, for some queries performance degrades with local vars.
I still have to do more research on how SQL Server caches and reuses (sub-optimal) execution plans.
I had similar problem. My stored procedure's execution plan took 30-40 seconds. I tried using the SP Statements in query window and it took few ms to execute the same.
Then I worked out declaring local variables within stored procedure and transferring the values of parameters to local variables. This made the SP execution very fast and now the same SP executes within few milliseconds instead of 30-40 seconds.
Very simple and sort, Query optimizer use old query plan for frequently running queries. but actually the size of data is also increasing so at that time new optimized plan is require and still query optimizer using old plan of query. This is called Parameter Sniffing.
I have also created detailed post on this. Please visit this url:
http://www.dbrnd.com/2015/05/sql-server-parameter-sniffing/
Changing your store procedure to execute as a batch should increase the speed.
Batch file select i.e.:
exec ('select * from order where order id ='''+ #ordersID')
Instead of the normal stored procedure select:
select * from order where order id = #ordersID
Just pass in the parameter as nvarchar and you should get quicker results.

Resources