Detecting/Monitoring for parameter sniffing problems - sql-server

Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
I have just got hit with a parameter sniffing problem. (It wasn't too serious as it caused a report to take about 2 minutes to run instead of a few seconds if properly cached and maybe 30 seconds if recompiled. And since the report is usually only run a few times per month, it is not really a problem).
However, since I wrote the report and I knew what it did, I was curious and went investigating and using SQL Profiler, I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
So, it struck me, that if SQL has these figures, (or at least can get these figures), that perhaps there is some way of getting sql to track and report which plans were significantly out.

You've got a couple of questions in there:
Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
To catch this, you need to monitor the procedure cache to find out when a query's execution plan changes from good to bad. SQL Server 2008 made this a lot easier by adding query_hash and query_plan_hash fields to sys.dm_exec_query_stats. You can compare the current query plan to past ones for the same query_hash, and when it changes, compare the number of logical reads or amount of worker time from the old query to the new one. If it skyrockets, you might have a parameter sniffing problem.
Then again, someone might have just eliminated an index or changed the code in a UDF that's being called or a change in MAXDOP or any one of a million settings that influence query plan behavior.
What you want is a single dashboard that shows the most resource-consuming queries in aggregate (because you might have this problem on a query that's called extremely frequently, but consumes tiny amounts of resources each time) and then shows you changes in its execution plan over time, plus lays over system and database level changes. Quest Foglight Performance Analysis does this. (I used to work for Quest, so I know the product, but I'm not shilling here.) Note that Quest sells a separate product, Foglight, that has nothing to do with Performance Analysis. I'm not aware of any other product that goes into this level of detail.
I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
That's not necessarily parameter sniffing - that could be bad stats or table variable usage, for example. To catch this kind of issue, I like the free SQL Sentry Plan Advisor tool. In the Top Operations tab, it highlights variances between estimated and actual rows.
Now, that's only for one plan at a time, and you have to know the plan first. You want to do this 24/7, right? Sure you do - but it's computationally intensive. The procedure cache can be huge (I've got clients with >100GB of procedure cache), and it's all unindexed XML. To compare estimated vs actual rows, you have to shred all that XML - and keep in mind that the procedure cache can be constantly changing under load.
What you really want is a product that could very rapidly dump the entire procedure cache into a database, throw XML indexes on it, and then compare estimates versus actual rows. I can imagine a script doing that, but I haven't seen one yet.

You said
"estimated rows was 1, but the actual number of rows was several hundred thousand."
This can be caused by table variables which don't have statistics.
To detect parameter sniffing is difficult but you can verify it is happening by running sp_updatestats. If the problems disappears it's most likely parameter sniffing. If it doesn't then you have other problems, such as too large table variables
We use parameter masking consistently now (system was developed on SQL Server 2000). We don't need it 99.9+ % of the time but the < 0.1% justifies it because of user confidence + support overhead it entails.

You can set up a trace that to record the query text of all batches / stored procedures run that have duration > Ns.
You obviously need to tailor N for your system (and probably add rules to exclude batch jobs that take a long time even during normal execution), but this should identify which queries offer the poorest performance and will also record any queries (along with their parameters) which have abnormally long execution times - potentially the result of a parameter sniffing problem.
See How to create a SQL trace using T-SQL on how to create a trace using T-SQL. This will give better performance than using SQL Profiler as this only captures the events that you set trace events for (SQL Profiler reportedly captures all events and then filters them in the application).

Related

Query compilation and provisioning times

What does it mean there is a longer time for COMPILATION_TIME, QUEUED_PROVISIONING_TIME or both more than usual?
I have a query runs every couple of minutes and it usually takes less than 200 milliseconds for compilation and 0 for provisioning. There are 2 instances in the last couple of days the values are more than 4000 for compilation and more than 100000 for provisioning.
Is that mean warehouse was being resumed and there was a hiccup?
COMPILATION_TIME:
The SQL is parsed and simplified, and the tables meta data is loaded. Thus a compile for select a,b,c from table_name will be fractally faster than select * from table_name because the meta data is not needed from every partition to know the final shape.
Super fragmented tables, can give poor compile performance as there is more meta data to load. Fragmentation comes from many small writes/deletes/updates.
Doing very large INSERT statements can give horrible compile performance. We did a lift-and-shift and did all data loading via INSERT, just avoid..
PRIOVISIONING_TIME is the amount of time to setup the hardware, this occurs for two main reasons ,you are turning on 3X, 4X, 5X, 6X servers and it can take minutes just to allocate those volume of servers.
Or there is failure, sometime around releases there can be a little instability, where a query fails on the "new" release, and query is rolled back to older instances, which you would see in the profile as 1, 1001. But sometimes there has been problems in the provisioning infrastructure (I not seen it for a few years, but am not monitoring for it presently).
But I would think you will mostly see this on a on going basis for the first reason.
The compilation process involves query parsing, semantic checks, query rewrite components, reading object metadata, table pruning, evaluating certain heuristics such as filter push-downs, plan generations based upon the cost-based optimization, etc., which totally accounts for the COMPILATION_TIME.
QUEUED_PROVISIONING_TIME refers to Time (in milliseconds) spent in the warehouse queue, waiting for the warehouse compute resources to provision, due to warehouse creation, resume, or resize.
https://docs.snowflake.com/en/sql-reference/functions/query_history.html
To understand the reason behind the query taking long time recently in detail, the query ID needs to be analysed. You can raise a support case to Snowflake support with the problematic query ID to have the details checked.

SQL Query taking much longer to select than inserting into a temp table

I have a query that takes about 1.5 minutes to run if it runs as an insert into an empty temp table, but selecting the data doesn't complete even after 20 minutes. Why would a select take so much longer than an insert? Will SQL Server draw different plans based on that?
The estimated execution plans are the same for both, the actual execution plan is different than the estimated.
Update: It seems that the INSERT version of the query is using
parallelism, while the select version is not. My question still
remains as to why that is?
Once a single query plan reaches a certain level of complexity, I often have a better experience sequencing a collection of smaller queries together than one really big nasty query. This enables the query optimizer to evaluate a smaller request.
Locking duration in a highly transnational database can also be a significant contributing factor.
It sounds like you're on to something with the parallelism. There are several operators within SQL that will prevent parallelism.

SQL Server: live query statistics show a fast stored procedure time, but it takes almost a minute to return results

I have a stored procedure which has been variably quick and slow. I read about parameter sniffing and query plans and have been analyzing what differences in the query plan could cause this differential.
What I'm realizing is that it doesn't appear to be an optimization issue, parameter sniffing issue. I've used the estimated execution plan, actual execution plan and live query statistics to compare the plans in a variety of scenarios. While the return time in SMSS varies from subsecond to almost a minute, one thing is constant - all of the execution plans and live query statistics say the query is/was/should be fast. Live query statistics shows 0.074s on the latest run, yet the results took 52 seconds to return.
The flow goes like this:
I execute the stored procedure from SMSS with Live Query Statistics enabled
It whirs for a second generating the execution plan
It displays the plan, then the status bar says "Executing Query... 0%"
It spins for the next X seconds where X is between 5 and 52
Then blip, the results pop up and the live query stats plan fills in instantaneously with a stated execution time at the green select node of between 0.074s and ~2s
Sometimes repeated runs of the same query go fast (which made me think, oh, it's loading the plan from the cache and able to execute quicker, but then I put OPTION(RECOMPILE) at the end of the stored procedure, which I read forces it to not use the plan cache), but other times it takes just as long. Sometimes when I change the input values it makes it go quicker and sometimes it doesn't. Again, the one constant - the plan/stats think it should be/is fast.
The stored procedure itself isn't too complicated, one select statement with some aggregating using windowing on a fairly small dataset and a couple of joins to other small tables - the max number of rows returned is about 90 on my test dataset. All of the critical columns have indices.
I really don't think it is actually a performance problem with the query itself, regardless of the differences in the plan, which I've seen a couple slight variations depending on the parameters passed in, the execution time estimates and live query stats are unanimously telling me - it's fast, sometimes so fast the live query stats don't even show any times underneath the nodes.
At some point of running it over and over again with different parameters it seems to settle on returning quickly, but only for a while. Then, usually just at the moment when I'm feeling like, "well whatever that was, it seems to be good now, moving on..." BAM right back to Slowsville.
Without more information this is tough to diagnose. But, it sounds like you might have parameter sniffing happening.
https://www.brentozar.com/archive/2013/06/the-elephant-and-the-mouse-or-parameter-sniffing-in-sql-server/

Terrible SQL reads performance (culprit update stats?)

I'm running on SQL Server 2008 R2 and am trying to fine-tune performance. I did everything I could from:
Code review of SQL code
Create or remove indexes as I think appropriate
Auto create stats ON
Auto update stats ON
Auto update stats async ON
I have a 24/7 system that constantly stores data. Sometimes we do reads and that's where the issue is. Sometimes the reads take a couple of seconds or less (which would be expected and acceptable to us). Other times, the reads take several seconds that could amount to a minute before the stored procedure completes and we render data on the UI.
If we do the read again, it would be faster. The SQL profiler would trace the particular stored procedure or query that took several seconds. We would zoom into that stored procedure, and do everything we can do to optimize it if we can.
I also traced the auto stats event and the recompile event. It's hard to tell if a stat is being updated causing the read to take a long time, or if a recompile caused it. Sometimes, I see that the profiler traced a recompile of the read query that took several unacceptable minutes, other times it doesn't trace a recompile.
I tried to prevent the query optimizer from blocking the read until it recompiles or updates stats by using option use plan XML, etc. But I ran into compile errors complaining that the query plan XML isn't valid; that could be true because the query is quiet involved: select + joins that involve a local table var. I sort of hacked the XML and maybe that's why it deemed it invalid. So I gave up on using plan hint.
We tried periodic (every 15 minutes) manual running update stats in order to keep stats up-to-date as much as we can, but that hurt performance. updatestats blocks writes, and I'm sure even reads; updatestats seemed to maintain a bunch of statistics and on average it was taking around 80-90 seconds. A read that waits that long is unacceptable.
So the idea is to let the reads happen and prevent a situation when a recompile/update stat blocks it, correct? Does it make sense to disable auto statistics altogether? Or perhaps disable auto create statistics after deleting all the auto created stats?
This goes against Microsoft recommendations perhaps, since they enable auto create statistics and auto update statistics by default, and performance may suffer, but any ideas/hints you can give would be appreciated.
From what you are explaining, it looks like the below (all or some) might be happening.
You are doing physical reads. The quick way you avoid this is by increasing the amount of RAM you throw at the box. You haven't mentioned the hardware specs of your server. Please add details.
If you trace the SQL calls then you can easily figure out why the RECOMPILE happened. Look at the EventSubClass to figure out the reason and work towards resolving that.
ref: http://msdn.microsoft.com/en-us/library/ms187105.aspx
You mentioned table variables. These are notorious for causing performance issues when NOT using at the right place. If you use table variables in a JOIN, parallel plan is out of the question and no stats also. I am NOT sure how and where you are using but try replacing them with temp tables. And starting from SQL Server 2005, you will get only STMT recompilation at best and NOT the complete SP recompile as it happened in 2000.
You mentioned Update Stats ASYNC option and this won't block the query.
What are the TOP WAIT STATS on this server? Have you identified the expensive procedures based on CPU, Logical reads & execution count?
Have you looked the Page Life Expectancy, amount of IO using virtual file stats DMV?
Updating Stats every 15 minutes is NOT a good plan. How often is data inserted into the system? What is the sample rate you are using? What is your index maintenance strategy?
Have you looked at the missing indexes DMV?
There are a bunch of good queries to identify problems in more granular fashion using the below queries.
ref: http://dl.dropbox.com/u/13748067/SQL%20Server%202008%20Diagnostic%20Information%20Queries%20%28April%202011%29.sql
There are so many other things to look at but the above is a good starting point.
OK, here is my IMHO catch on this:
DBCC INDEXDEFRAG is worth trying and is an ONLINE function hence can be used on a live system
You could be reaching the maximum capacity of your architectural design. You can scale up which can always help but more likely you have to change the architecture to achieve better scalability sacrificing simplicity
A common trick is partitioning. You are writing to a table whose index distribution looks nothing like it was a few hours ago - hence degrading performance. This is a massive write, such a table could be divided to daily write and the rest of the data with nightly batches of moving stuff across.
More and more, people are being converting to CQRS. You might be the next. This solves the problem by separating reads from writes (a very simplistic explanation).

What is the best way to compare 2 variants of a SQL query for performance?

I've got a SQL 2005 DB running under a virtual environment.
To simplify things, let's say I have two SQL SELECT Queries. They both do the exact same thing. But I'm trying to analyze them for performance purposes.
Generally, I'd fire up a local DB, load up some data and using timing to compare one variant to other variants.
But in this case, since the DB is large and it's a testbox, the client has placed it on a host that's serving other VM's as well.
The DB is too large to pull down locally, so that's out (at least for now).
But my main issue is that when I run queries against the server, the timing is all over the place. I can run the +exact+ same query 4 times and get timings of 7secs, 8 minutes, 3:45min and 15min.
My first thought was use SET STATISTICS IO ON.
But, that yields basically read and write stats on the tables being queries, which, depending on the variations in the queries (temp tables, vs views, vs joins, etc) can't really be accurately compared, except in aggregate.
I then though of SET STATISTICS TIME ON, and just using the CPU time, but that seems to discount all the IO, which also doesn't make for a good baseline.
My question is is there any other statistic or performance analysis technique that could be useful in a situation like this?
The STATISTICS IO information will still be useful. You may see significantly different numbers of reads, writes and scans that will make it obvious which query is better.
You can also view Execution Plan information for each query. You can select Query -> Display Estimated Execution Plan to see a graphical presentation of the SQL Server estimate to run the query. You can also use the Query -> Include Actual Execution Plan to show the actual plan used.
And, you can also use SET SHOWPLAN_TEXT, SET SHOWPLAN_ALL or SET SHOWPLAN_XML to include the execution plan to view a textual display of the plan.
When viewing the results of the execution plan, you can look at the estimated cost value and compare the values for each query. The estimated cost is a relative value that can be used to compare the cost of each option.

Resources