I've got a procedure exhibiting performance issues due to parameter sniffing. The procedure is costly but generally executes in an acceptable amount of time under typical load. However, it periodically performs poorly. When this happens, we see sessions executing the procedure go parallel, and the number of sessions running the routine starts to pile up. We resolve the issue by evicting the procedure's plan from the cache, killing the active sessions, and watching for an immediate recurrence (i.e., a bad plan ending up back in the cache).
The procedure is in a typical OLTP database (lots of small DML queries). When the subject query performs poorly, it causes a CPU spike and degrades the performance of the associated service. During business hours, the procedure is executed more than 5 times a minute (I don't have an exact count). The plan for the procedure is here:
https://www.brentozar.com/pastetheplan/?id=S1aEVQJBN
All columns in the query (join and projected columns) are covered by indexes.
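For context, the "evict the plan" step we use today looks roughly like this (a sketch only; it assumes SQL Server 2008+, where DBCC FREEPROCCACHE accepts a plan handle, and dbo.ProblemProc is a placeholder name):

-- Find the cached plan for the procedure and evict just that plan.
DECLARE @plan_handle varbinary(64);

SELECT @plan_handle = ps.plan_handle
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.ProblemProc')
  AND ps.database_id = DB_ID();

IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE (@plan_handle);

-- Alternatively, mark the procedure for recompilation on its next execution:
-- EXEC sp_recompile N'dbo.ProblemProc';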
Here are the options I'm weighing:
We can apply OPTION (RECOMPILE) to the problematic SELECT statement inside the routine (see the hint sketches after this list).
We could do a little research and find a parameter value that generates a plan that is generally good for "everything". We'd then add an OPTIMIZE FOR hint with that value to the procedure. This incurs a little technical debt, as we may need to adjust the value over time.
We could introduce branching logic in the procedure: if the parameter value is X or falls in range X, we'd execute one sub-procedure, and if not, we'd run another. Again, this incurs a little technical debt.
We could change the problematic statement to dynamic SQL. This causes a plan to be generated for each unique SQL statement. If we get tons of unique calls to this procedure, it could bloat the plan cache.
Slap a MAXDOP of "1" on the problematic select statement.
Options I've factored out:
We could use OPTIMIZE FOR UNKNOWN. This bases the row estimates on the density vector and optimizes the plan for that.
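For reference, here is roughly how those statement-level hints would look. This is only a sketch; dbo.Orders, OrderDate, and @StartDate are placeholders, not the real objects:

-- OPTION (RECOMPILE): compile a fresh plan for this statement on every execution.
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE OrderDate >= @StartDate
OPTION (RECOMPILE);

-- OPTIMIZE FOR: always compile for a hand-picked "representative" value.
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE OrderDate >= @StartDate
OPTION (OPTIMIZE FOR (@StartDate = '2019-01-01'));

-- MAXDOP 1: keep the statement serial.
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE OrderDate >= @StartDate
OPTION (MAXDOP 1);

-- The ruled-out variant, for completeness: OPTION (OPTIMIZE FOR UNKNOWN);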
Have I missed any sensible options? What options make the most sense?
I have a stored procedure which has been variably quick and slow. I read about parameter sniffing and query plans and have been analyzing what differences in the query plan could cause this differential.
What I'm realizing is that it doesn't appear to be an optimization or parameter sniffing issue. I've used the estimated execution plan, the actual execution plan, and Live Query Statistics to compare the plans in a variety of scenarios. While the return time in SSMS varies from sub-second to almost a minute, one thing is constant - all of the execution plans and live query statistics say the query is/was/should be fast. Live Query Statistics shows 0.074s on the latest run, yet the results took 52 seconds to return.
The flow goes like this:
I execute the stored procedure from SSMS with Live Query Statistics enabled
It whirs for a second generating the execution plan
It displays the plan, then the status bar says "Executing Query... 0%"
It spins for the next X seconds where X is between 5 and 52
Then blip, the results pop up and the live query stats plan fills in instantaneously with a stated execution time at the green select node of between 0.074s and ~2s
Sometimes repeated runs of the same query go fast (which made me think it was loading the plan from the cache and so able to execute quicker, but then I put OPTION(RECOMPILE) at the end of the stored procedure, which I read forces it not to use the plan cache), but other times it takes just as long. Sometimes changing the input values makes it go quicker and sometimes it doesn't. Again, the one constant - the plan/stats think it should be/is fast.
The stored procedure itself isn't too complicated, one select statement with some aggregating using windowing on a fairly small dataset and a couple of joins to other small tables - the max number of rows returned is about 90 on my test dataset. All of the critical columns have indices.
I really don't think it is actually a performance problem with the query itself. Regardless of the slight variations in the plan I've seen depending on the parameters passed in, the execution time estimates and live query stats are unanimously telling me it's fast - sometimes so fast the live query stats don't even show any times underneath the nodes.
At some point of running it over and over again with different parameters it seems to settle on returning quickly, but only for a while. Then, usually just at the moment when I'm feeling like, "well whatever that was, it seems to be good now, moving on..." BAM right back to Slowsville.
Without more information this is tough to diagnose. But, it sounds like you might have parameter sniffing happening.
https://www.brentozar.com/archive/2013/06/the-elephant-and-the-mouse-or-parameter-sniffing-in-sql-server/
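One quick check is to look at which parameter values the cached plan was actually compiled for and compare them against the values from a slow run. A sketch (dbo.YourSlowProc is a placeholder for your procedure):

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT OBJECT_NAME(st.objectid, st.dbid)                        AS proc_name,
       pc.col.value('@Column', 'nvarchar(128)')                 AS parameter_name,
       pc.col.value('@ParameterCompiledValue', 'nvarchar(400)') AS sniffed_value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS pc(col)
WHERE st.objectid = OBJECT_ID(N'dbo.YourSlowProc');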
I have just analyzed a call to a particular SQL Server Stored Procedure. It is very slow so I decided to analyze it.
The result is confusing:
When executing the procedure it takes 1 min 24s.
When executing two identical calls simultaneously it takes 2 min 50 s. Obviously some bad blocking and context swapping is happening.
When switching on Actual Execution Plan and running one call it takes 4 min 48s. The client statistics now tells me the total execution time is 3000ms.
What is happening?
Does the Actual Execution Plan actually interfere that much?
That means there should be a warning somewhere that the execution time is MUCH longer and the statistics are WAY off.
The procedure is not huge in size but with a nasty complexity: cursors, nested selects in five levels, temporary tables, sub procedures and function calls.
There are tons of articles on the web discussing why using cursors is a bad practice that should be avoided when possible. Depending on database settings, a cursor is registered in the global namespace, making concurrency close to impossible - it is absolutely normal that two instances of your proc take exactly twice as long. Performance of cursors and locking are further points of consideration. So, having said that, even a WHILE loop over, say, a temp table with an identity column to loop on might be a better solution than cursors.
My experience is that turning on the actual execution plan always adds some time - I have noticed that the more complex the plan, the bigger the effect (so I guess it is related to how SSMS handles it)... but in your case the effect looks larger than I would expect, so there might be something else going on.
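If you want runtime numbers without paying the cost of rendering the actual plan in SSMS, SET STATISTICS TIME/IO is a cheaper way to measure; a sketch, where the procedure name and parameter are placeholders:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

EXEC dbo.YourProc @SomeParam = 1;   -- placeholder call

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;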
I have a stored procedure which usually runs pretty quickly (hardly a few seconds), but then there are odd days where the same proc with the same parameters takes minutes to execute. But if I defrag the indexes at this point, it starts running within seconds again.
Could this be because of bad execution plan or fragmented indexes?
If so, is there a way I can make this procedure NOT dependent on execution plans or fragmented indexes?
Thanks in advance,
Joseph
Well, depending on your SP, the solution might be found through these options:
1/ WITH RECOMPILE could save your day. It increases the total execution time by recompiling the SP, but it ensures you'll get the best execution plan.
2/ KEEPFIXED PLAN could also be an option.
3/ It's worth giving OPTIMIZE FOR a try in case you have a set of parameter values that are "representative" from a statistics point of view.
4/ Monitor the level of fragmentation on the involved tables and indexes. Check whether there are statements that heavily update the tables used by your SP. If yes, update statistics (UPDATE STATISTICS <tablename>;) - see the sketch after this list.
5/ Parameter sniffing could be also a root cause.
You can go further into details and see a list of causes of recompilations.
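For point 4, a sketch of checking fragmentation and refreshing statistics; dbo.YourTable is a placeholder:

-- Check fragmentation for one table's indexes.
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.YourTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id;

-- Refresh statistics for that table (or EXEC sp_updatestats; for the whole database).
UPDATE STATISTICS dbo.YourTable;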
Short answer: no. SQL Server relies on execution plans and indexes to perform well.
Longer answer: Maybe. If your performance improves immediately after defragging your indexes, then my first question would be: which indexes, and why are they fragmenting? Are you clustering on a uniqueidentifier? Are your statistics up-to-date? What's the execution plan look like?
Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
I have just got hit with a parameter sniffing problem. (It wasn't too serious as it caused a report to take about 2 minutes to run instead of a few seconds if properly cached and maybe 30 seconds if recompiled. And since the report is usually only run a few times per month, it is not really a problem).
However, since I wrote the report and I knew what it did, I was curious and went investigating and using SQL Profiler, I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
So it struck me that if SQL Server has these figures (or at least can get them), perhaps there is some way of getting it to track and report which plans were significantly out.
You've got a couple of questions in there:
Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
To catch this, you need to monitor the procedure cache to find out when a query's execution plan changes from good to bad. SQL Server 2008 made this a lot easier by adding query_hash and query_plan_hash fields to sys.dm_exec_query_stats. You can compare the current query plan to past ones for the same query_hash, and when it changes, compare the number of logical reads or amount of worker time from the old query to the new one. If it skyrockets, you might have a parameter sniffing problem.
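As a rough starting point, a query like this flags query_hash values that have produced more than one plan so you can compare the cheap and expensive variants (the grouping and ordering here are just an illustration):

SELECT qs.query_hash,
       COUNT(DISTINCT qs.query_plan_hash)               AS distinct_plans,
       MIN(qs.total_logical_reads / qs.execution_count) AS best_avg_reads,
       MAX(qs.total_logical_reads / qs.execution_count) AS worst_avg_reads
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(DISTINCT qs.query_plan_hash) > 1
ORDER BY worst_avg_reads DESC;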
Then again, someone might have just eliminated an index or changed the code in a UDF that's being called or a change in MAXDOP or any one of a million settings that influence query plan behavior.
What you want is a single dashboard that shows the most resource-consuming queries in aggregate (because you might have this problem on a query that's called extremely frequently, but consumes tiny amounts of resources each time) and then shows you changes in its execution plan over time, plus lays over system and database level changes. Quest Foglight Performance Analysis does this. (I used to work for Quest, so I know the product, but I'm not shilling here.) Note that Quest sells a separate product, Foglight, that has nothing to do with Performance Analysis. I'm not aware of any other product that goes into this level of detail.
I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
That's not necessarily parameter sniffing - that could be bad stats or table variable usage, for example. To catch this kind of issue, I like the free SQL Sentry Plan Explorer tool. In the Top Operations tab, it highlights variances between estimated and actual rows.
Now, that's only for one plan at a time, and you have to know the plan first. You want to do this 24/7, right? Sure you do - but it's computationally intensive. The procedure cache can be huge (I've got clients with >100GB of procedure cache), and it's all unindexed XML. To compare estimated vs actual rows, you have to shred all that XML - and keep in mind that the procedure cache can be constantly changing under load.
What you really want is a product that could very rapidly dump the entire procedure cache into a database, throw XML indexes on it, and then compare estimates versus actual rows. I can imagine a script doing that, but I haven't seen one yet.
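The dump-and-index part of that imagined script might start out like this sketch; note that cached plans only carry estimated row counts, so the actual-row side still has to come from somewhere else (captured actual plans, for instance):

-- Snapshot the plan cache into a table and index the XML for shredding.
IF OBJECT_ID('dbo.PlanCacheDump') IS NOT NULL
    DROP TABLE dbo.PlanCacheDump;

CREATE TABLE dbo.PlanCacheDump (
    plan_handle varbinary(64) NOT NULL PRIMARY KEY,
    query_plan  xml NULL
);

INSERT dbo.PlanCacheDump (plan_handle, query_plan)
SELECT cp.plan_handle, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp;

CREATE PRIMARY XML INDEX IX_PlanCacheDump_Plan ON dbo.PlanCacheDump (query_plan);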
You said
"estimated rows was 1, but the actual number of rows was several hundred thousand."
This can be caused by table variables which don't have statistics.
Detecting parameter sniffing is difficult, but you can verify whether it is happening by running sp_updatestats. If the problem disappears, it's most likely parameter sniffing. If it doesn't, then you have other problems, such as overly large table variables.
We use parameter masking consistently now (the system was developed on SQL Server 2000). We don't need it 99.9+% of the time, but the < 0.1% justifies it because of the loss of user confidence and the support overhead those cases entail.
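For anyone unfamiliar with the pattern: parameter masking just copies the parameters into local variables so the optimizer can't sniff the caller's values. A sketch with made-up names:

CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId int
AS
BEGIN
    -- Mask the parameter: estimates now come from the density vector,
    -- not from the specific value passed in by the caller.
    DECLARE @LocalCustomerId int;
    SET @LocalCustomerId = @CustomerId;

    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @LocalCustomerId;
END;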
You can set up a trace to record the query text of all batches / stored procedures run that have a duration > N seconds.
You obviously need to tailor N for your system (and probably add rules to exclude batch jobs that take a long time even during normal execution), but this should identify which queries offer the poorest performance and will also record any queries (along with their parameters) which have abnormally long execution times - potentially the result of a parameter sniffing problem.
See How to create a SQL trace using T-SQL for instructions on creating a trace using T-SQL. This will give better performance than using SQL Profiler, as a server-side trace only captures the events you set trace events for (SQL Profiler reportedly captures all events and then filters them in the application).
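On SQL Server 2008 and later, an Extended Events session is a lighter-weight alternative to a server-side trace for the same idea. A sketch - the 5-second threshold is arbitrary, and duration here is measured in microseconds:

CREATE EVENT SESSION LongRunningQueries ON SERVER
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.sql_text)
    WHERE (duration > 5000000)
),
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.sql_text)
    WHERE (duration > 5000000)
)
ADD TARGET package0.event_file (SET filename = N'LongRunningQueries.xel');

ALTER EVENT SESSION LongRunningQueries ON SERVER STATE = START;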
I'm using one SP to perform all CRUD operations.
So, basically, I'm executing the same SP with whatever action is necessary:
e.g.
-- for select
exec USP_ORDER @MODE='S', @ORDER_DATE='2009/01/01'
-- for update
exec USP_ORDER @MODE='U', @other_params
-- for insert
exec USP_ORDER @MODE='I', @other_params
-- for delete
exec USP_ORDER @MODE='D', @ID=100
Thanks to that, I have only one SP per business object, which keeps my DB organized.
But recently I experienced performance issues.
In light of that, my question:
Is this approach correct? Can it have an influence on performance / choosing the proper execution plan?
It can have performance implications due to possibly caching the 'wrong' query plan. Check out the topics 'parameter sniffing' and 'query plan caching'.
EDIT: In response to John's comment, you could also have your top level SP decide which CRUD SP to call, then each would get its own cached query plan.
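A minimal sketch of that dispatcher idea, using made-up sub-procedure names: the outer procedure only branches, so each inner procedure keeps its own cached plan.

CREATE PROCEDURE dbo.USP_ORDER
    @MODE char(1),
    @ORDER_DATE datetime = NULL,
    @ID int = NULL
AS
BEGIN
    IF @MODE = 'S'
        EXEC dbo.USP_ORDER_SELECT @ORDER_DATE = @ORDER_DATE;
    ELSE IF @MODE = 'D'
        EXEC dbo.USP_ORDER_DELETE @ID = @ID;
    -- 'U' and 'I' branch to their own update/insert procedures the same way.
END;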
I think this is more a coding/design preference question.
Personally, I am big fan of keeping things simple and for this reason I would suggest you break out your operations into separate stored procedures.
This will be more transparent and will also aid any performance tuning you may have to do in future, i.e. if your update procedure/logic is performing slowly, you can immediately isolate it as the cause, whereas if the logic is part of a much larger procedure with varying CRUD operations, the root cause of the issue will not be quite so obvious.
I'm also a fan of simplification (if possible).
But the reason I've decided on that approach: currently I have ~80 SPs. If I divide them all by the function they serve (e.g. USP_Sample_Insert, USP_Sample_Select1, USP_Sample_Select2, USP_Sample_Delete), I will have ~400 SPs!
Managing, navigating, updating, and syncing parameters across such a huge number of SPs would be a nightmare for me.
For me, the only reasonable case for splitting them up is performance...