What does the sys.dm_exec_query_optimizer_info "timeout" record indicate?

During an investigation of some client machines losing their connection with SQL Server 2005, I ran into the following line of code on the web:
Select * FROM sys.dm_exec_query_optimizer_info WHERE counter = 'timeout'
When I run this query on our server, we get the following results:
counter - occurrence - value
timeout - 9100 - 1
As far as I can determine, this means that the query optimizer has timed out while trying to optimize queries run against our server – 9100 times. We are, however, not seeing any timeout errors in the SQL Server error log, and our end-users have not reported any timeout-specific errors.
Can anyone tell me what this number of “occurrences” means? Is this an issue we should be concerned about?

This counter has nothing to do with your connection issues.
SQL Server won't spend forever trying to compile the best possible plan (at least without using trace flags).
It calculates two values at the beginning of the optimisation process.
Cost of a good enough plan
Maximum time to spend on query optimisation (this is measured in number of transformation tasks carried out rather than clock time).
If a plan with a cost lower than the threshold is found, it needn't continue optimising. If it exceeds the number of tasks budgeted, optimisation also ends and the best plan found so far is returned.
The reason that optimisation finished early shows up in the execution plan in the StatementOptmEarlyAbortReason attribute. There are actually three possible values.
Good enough plan found
Timeout
Memory Limit Exceeded.
A timeout will increment the counter you ask about in sys.dm_exec_query_optimizer_info.
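If you want to see which cached plans ended optimisation early and why, something along these lines works (a rough sketch only; assumes SQL Server 2005 or later and VIEW SERVER STATE permission):

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       cp.usecounts,
       qp.query_plan.value('(//StmtSimple/@StatementOptmEarlyAbortReason)[1]', 'varchar(50)') AS early_abort_reason,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.exist('//StmtSimple[@StatementOptmEarlyAbortReason]') = 1;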
Further Reading
Reason for Early Termination of Statement
Microsoft SQL Server 2014 Query Tuning & Optimization

The occurrence column tells you the number of times that counter has been incremented, and the value column is an internal column for this counter.
See here

Sorry, the documentation says this is internal only.
Based on the other link, I suspect this is for internal engine timeouts (e.g. SET QUERY_GOVERNOR_COST_LIMIT).
A client timeout will also not be logged in SQL Server because the client aborts the batch, thus stopping SQL processing.
Do you have any more details?

Related

MaxDOP = 8, does that mean 8 threads on all parallel operators?

I've been reading a bit on MaxDOP and have run into a question that I can't seem to find an answer for. If MaxDOP is set to a value, let's say 8, does that mean that SQL Server will always spin up 8 threads on the parallel activities in the query, or could it decide to use fewer threads for a particular operator?
It boils down to: are too many threads a performance concern if the workload is small (OLTP) and MaxDOP has been set too high?
A hint to the correct DMV would be nice. I got lost in DMV land, again.
The short answer is: SQL Server will dynamically decide whether to use a parallel execution of the query, but will not exceed the maximum degree of parallelism (MAXDOP) that you have indicated.
The following article has some more detailed information: How It Works: Maximizing Max Degree Of Parallelism (MAXDOP). I'll just cite a part of it here:
There are several stages to determining the degree of parallelism (MAXDOP) a query can utilize.
Stage 1 – Compile
During compilation SQL Server considers the hints, sp_configure and resource governor workload group settings to see if a parallel plan should even be considered. Only if the query operations allow parallel execution:
If hint is present and > 1 then build a parallel plan
else if no hint or hint (MAXDOP = 0)
    if sp_configure setting is 1 but workload group > 1 then build a parallel plan
    else if sp_configure setting is 0 or > 1 then build a parallel plan
Stage 2 – Query Execution
When the query begins execution, the runtime degree of parallelism is determined. This involves many factors, already outlined in SQL Server Books Online: http://technet.microsoft.com/en-US/library/ms178065(v=SQL.105).aspx
Before SQL Server looks at the idle workers and other factors it determines the target for the degree of parallelism.
[... see details in article ...]
If still 0 after the detailed calculations it is set to 64 (default max for SQL Server as documented in Books Online.) [...] SQL Server hard codes the 64 CPU target when the runtime target of MAXDOP is still 0 (default.)
The MAXDOP target is now adjusted for:
Actual CPU count (affinity settings from sp_configure and the resource pool).
Certain query types (index build for example) look at the partitions
Other query type limitations that may exist
Now SQL Server takes a look at the available workers (free workers for query execution.) You can loosely calculate the free worker count on a scheduler using (Free workers = Current_workers_count – current_tasks_count) from sys.dm_os_schedulers.
Once the target is calculated the actual is determined by looking at the available resources to support a parallel execution. This involves determining the node(s) and CPUs with available workers.
[...]
The worker location information is then used to target an appropriate set of CPUs to assign the parallel task to.
Using XEvents you can monitor the MAXDOP decision logic. For example:
XeSqlPkg::calculate_dop_begin
XeSqlPkg::calculate_dop
You can monitor the number of parallel workers by querying: sys.dm_os_tasks
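For example, two quick checks (just sketches using standard DMVs): the loose free-worker count per scheduler mentioned above, and the number of workers currently assigned to each request (a parallel request shows more than one task):

-- loose free-worker count per scheduler (Free workers = current_workers_count - current_tasks_count)
SELECT scheduler_id,
       current_workers_count,
       current_tasks_count,
       current_workers_count - current_tasks_count AS free_workers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

-- tasks per request; a request running in parallel will show several
SELECT session_id, request_id, COUNT(*) AS task_count
FROM sys.dm_os_tasks
GROUP BY session_id, request_id
HAVING COUNT(*) > 1
ORDER BY task_count DESC;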
It is only used to limit the max number of threads allowed per request:
https://msdn.microsoft.com/en-us/library/ms189094.aspx
So if SQL thinks using one thread is fastest it will just use one.
Generally on an OLTP system you will keep this on the low side. On large warehouse DBs you may want to keep a higher number.
Unless you are seeing specific problems I wouldn't change it, and even then only if you are confident of the outcome.
SQL Server can also decide to use fewer threads; you can see this in the actual plan from the number of rows handled by each thread. The maximum number of threads applies to each parallel section of the plan, and one query can have more than one section.
In addition to MAXDOP there is the "cost threshold for parallelism" setting, which determines whether a parallel plan is even considered for a query.
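Both are instance-wide sp_configure options. A minimal sketch (the numbers here are only placeholders, not recommendations):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;        -- example value only
EXEC sp_configure 'cost threshold for parallelism', 50;  -- example value only
RECONFIGURE;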

Terrible SQL reads performance (culprit update stats?)

I'm running on SQL Server 2008 R2 and am trying to fine-tune performance. I did everything I could from:
Code review of SQL code
Create or remove indexes as I think appropriate
Auto create stats ON
Auto update stats ON
Auto update stats async ON
I have a 24/7 system that constantly stores data. Sometimes we do reads, and that's where the issue is. Sometimes the reads take a couple of seconds or less (which would be expected and acceptable to us). Other times, the reads take many seconds, sometimes approaching a minute, before the stored procedure completes and we render data on the UI.
If we do the read again, it is faster. SQL Profiler traces the particular stored procedure or query that took several seconds. We then zoom into that stored procedure and do everything we can to optimize it.
I also traced the auto stats event and the recompile event. It's hard to tell if a stat is being updated causing the read to take a long time, or if a recompile caused it. Sometimes, I see that the profiler traced a recompile of the read query that took several unacceptable minutes, other times it doesn't trace a recompile.
I tried to prevent the query optimizer from blocking the read until it recompiles or updates stats by using OPTION (USE PLAN ...) with the plan XML, etc. But I ran into compile errors complaining that the query plan XML isn't valid; that could be true because the query is quite involved: a select plus joins that involve a local table variable. I sort of hacked the XML and maybe that's why it was deemed invalid. So I gave up on the plan hint.
We tried periodically (every 15 minutes) running UPDATE STATISTICS manually in order to keep stats as up-to-date as we can, but that hurt performance. UPDATE STATISTICS blocks writes, and I'm sure reads as well; it seemed to maintain a bunch of statistics and on average it was taking around 80-90 seconds. A read that waits that long is unacceptable.
So the idea is to let the reads happen and prevent a situation when a recompile/update stat blocks it, correct? Does it make sense to disable auto statistics altogether? Or perhaps disable auto create statistics after deleting all the auto created stats?
This goes against Microsoft recommendations perhaps, since they enable auto create statistics and auto update statistics by default, and performance may suffer, but any ideas/hints you can give would be appreciated.
From what you are explaining, it looks like some or all of the below might be happening.
You are doing physical reads. The quick way to avoid this is to increase the amount of RAM you throw at the box. You haven't mentioned the hardware specs of your server. Please add details.
If you trace the SQL calls then you can easily figure out why the RECOMPILE happened. Look at the EventSubClass to figure out the reason and work towards resolving that.
ref: http://msdn.microsoft.com/en-us/library/ms187105.aspx
You mentioned table variables. These are notorious for causing performance issues when NOT used in the right place. If you use table variables in a JOIN, a parallel plan is out of the question and there are no stats either. I am NOT sure how and where you are using them, but try replacing them with temp tables, as sketched below. And starting from SQL Server 2005, you will get only statement-level recompilation at best and NOT the complete SP recompile that happened in 2000.
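A simplified illustration of that swap (the table and column names here are made up, not from the question):

-- table variable: no statistics, so the optimizer typically assumes very few rows
DECLARE @ids TABLE (id int PRIMARY KEY);

-- temp table: gets statistics, so a join against it can be costed realistically
CREATE TABLE #ids (id int PRIMARY KEY);
INSERT INTO #ids (id)
SELECT id FROM dbo.SourceTable WHERE is_active = 1;   -- hypothetical source

SELECT big.*
FROM dbo.BigTable AS big
JOIN #ids AS i ON i.id = big.id;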
You mentioned Update Stats ASYNC option and this won't block the query.
What are the TOP WAIT STATS on this server? Have you identified the expensive procedures based on CPU, Logical reads & execution count?
Have you looked at the Page Life Expectancy and the amount of IO, using the virtual file stats DMV?
Updating Stats every 15 minutes is NOT a good plan. How often is data inserted into the system? What is the sample rate you are using? What is your index maintenance strategy?
Have you looked at the missing indexes DMV?
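As a starting point for the waits and file-level IO questions above, a quick sketch using the standard DMVs (the excluded wait list is only an example; trim it to taste):

SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('LAZYWRITER_SLEEP', 'SLEEP_TASK', 'SQLTRACE_BUFFER_FLUSH',
                        'WAITFOR', 'BROKER_TASK_STOP', 'CLR_AUTO_EVENT')
ORDER BY wait_time_ms DESC;

-- IO stalls per database file
SELECT DB_NAME(vfs.database_id) AS database_name, vfs.file_id,
       vfs.num_of_reads, vfs.num_of_writes,
       vfs.io_stall_read_ms, vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY vfs.io_stall_read_ms DESC;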
There are a bunch of good queries, at the link below, to identify problems in a more granular fashion.
ref: http://dl.dropbox.com/u/13748067/SQL%20Server%202008%20Diagnostic%20Information%20Queries%20%28April%202011%29.sql
There are so many other things to look at but the above is a good starting point.
OK, here is my take on this, IMHO:
DBCC INDEXDEFRAG is worth trying and is an ONLINE operation, hence it can be used on a live system.
You could be reaching the maximum capacity of your architectural design. You can scale up, which can always help, but more likely you will have to change the architecture to achieve better scalability, sacrificing simplicity.
A common trick is partitioning. You are writing to a table whose index distribution looks nothing like it did a few hours ago, hence degrading performance. With such a massive write load, the table could be divided into the current day's writes and the rest of the data, with nightly batches moving data across.
More and more, people are converting to CQRS. You might be the next. This solves the problem by separating reads from writes (a very simplistic explanation).

Detecting/Monitoring for parameter sniffing problems

Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
I have just got hit with a parameter sniffing problem. (It wasn't too serious as it caused a report to take about 2 minutes to run instead of a few seconds if properly cached and maybe 30 seconds if recompiled. And since the report is usually only run a few times per month, it is not really a problem).
However, since I wrote the report and I knew what it did, I was curious and went investigating; using SQL Profiler, I could see a section in the query plan where the estimated number of rows was 1, but the actual number of rows was several hundred thousand.
So it struck me that, if SQL Server has these figures (or at least can get these figures), perhaps there is some way of getting it to track and report which plans were significantly out.
You've got a couple of questions in there:
Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
To catch this, you need to monitor the procedure cache to find out when a query's execution plan changes from good to bad. SQL Server 2008 made this a lot easier by adding query_hash and query_plan_hash fields to sys.dm_exec_query_stats. You can compare the current query plan to past ones for the same query_hash, and when it changes, compare the number of logical reads or amount of worker time from the old query to the new one. If it skyrockets, you might have a parameter sniffing problem.
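For example, something along these lines (a rough sketch against sys.dm_exec_query_stats, SQL Server 2008 and later) finds queries that have produced more than one plan and whose average logical reads differ wildly between plans:

SELECT qs.query_hash,
       COUNT(DISTINCT qs.query_plan_hash)               AS plan_count,
       MIN(qs.total_logical_reads / qs.execution_count) AS min_avg_logical_reads,
       MAX(qs.total_logical_reads / qs.execution_count) AS max_avg_logical_reads
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(DISTINCT qs.query_plan_hash) > 1
ORDER BY max_avg_logical_reads DESC;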
Then again, someone might have just eliminated an index or changed the code in a UDF that's being called or a change in MAXDOP or any one of a million settings that influence query plan behavior.
What you want is a single dashboard that shows the most resource-consuming queries in aggregate (because you might have this problem on a query that's called extremely frequently, but consumes tiny amounts of resources each time) and then shows you changes in its execution plan over time, plus lays over system and database level changes. Quest Foglight Performance Analysis does this. (I used to work for Quest, so I know the product, but I'm not shilling here.) Note that Quest sells a separate product, Foglight, that has nothing to do with Performance Analysis. I'm not aware of any other product that goes into this level of detail.
I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
That's not necessarily parameter sniffing - that could be bad stats or table variable usage, for example. To catch this kind of issue, I like the free SQL Sentry Plan Explorer tool. In the Top Operations tab, it highlights variances between estimated and actual rows.
Now, that's only for one plan at a time, and you have to know the plan first. You want to do this 24/7, right? Sure you do - but it's computationally intensive. The procedure cache can be huge (I've got clients with >100GB of procedure cache), and it's all unindexed XML. To compare estimated vs actual rows, you have to shred all that XML - and keep in mind that the procedure cache can be constantly changing under load.
What you really want is a product that could very rapidly dump the entire procedure cache into a database, throw XML indexes on it, and then compare estimates versus actual rows. I can imagine a script doing that, but I haven't seen one yet.
You said
"estimated rows was 1, but the actual number of rows was several hundred thousand."
This can be caused by table variables which don't have statistics.
Detecting parameter sniffing is difficult, but you can verify it is happening by running sp_updatestats. If the problem disappears, it's most likely parameter sniffing. If it doesn't, then you have other problems, such as table variables that are too large.
We use parameter masking consistently now (the system was developed on SQL Server 2000). We don't need it 99.9+% of the time, but the < 0.1% justifies it because of the loss of user confidence and the support overhead those cases entail.
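For anyone unfamiliar with the term, parameter masking just means copying each parameter into a local variable inside the procedure so the optimizer cannot sniff the caller's value. A simplified illustration (the procedure and object names are invented):

CREATE PROCEDURE dbo.GetOrdersForCustomer
    @CustomerId int
AS
BEGIN
    -- copy the parameter into a local variable; the optimizer then uses an
    -- "average" estimate instead of the specific value sniffed at compile time
    DECLARE @CustomerIdLocal int;
    SET @CustomerIdLocal = @CustomerId;

    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerIdLocal;
END;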
You can set up a trace to record the query text of all batches / stored procedures run that have duration > N seconds.
You obviously need to tailor N for your system (and probably add rules to exclude batch jobs that take a long time even during normal execution), but this should identify which queries offer the poorest performance and will also record any queries (along with their parameters) which have abnormally long execution times - potentially the result of a parameter sniffing problem.
See How to create a SQL trace using T-SQL on how to create a trace using T-SQL. This will give better performance than using SQL Profiler as this only captures the events that you set trace events for (SQL Profiler reportedly captures all events and then filters them in the application).
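For reference, the bare bones of such a server-side trace look roughly like this (the file path, duration threshold, and the single event/columns are only examples; a real trace would add more events and columns):

DECLARE @TraceId int, @MaxFileSize bigint, @MinDuration bigint, @On bit;
SET @MaxFileSize = 50;           -- MB
SET @MinDuration = 5000000;      -- 5 seconds, in microseconds (SQL Server 2005 and later)
SET @On = 1;

EXEC sp_trace_create @TraceId OUTPUT, 0, N'C:\Traces\slow_batches', @MaxFileSize, NULL;

-- event 12 = SQL:BatchCompleted; column 1 = TextData, column 13 = Duration
EXEC sp_trace_setevent @TraceId, 12, 1, @On;
EXEC sp_trace_setevent @TraceId, 12, 13, @On;

-- keep only batches with Duration >= @MinDuration
EXEC sp_trace_setfilter @TraceId, 13, 0, 4, @MinDuration;

EXEC sp_trace_setstatus @TraceId, 1;   -- start the trace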

SQL Server - Management Studio - Client Statistics - Wait time on server replies vs Client processing time

I have a slow running query that I've been working on optimising.
When looking at the Client Statistics in Management Studio it was taking about 8 seconds wait time on server replies and about 1 second on Client processing time.
I have always thought that the Wait time on server replies was the number to work on and Client processing time was generally bandwidth or large data size related.
I have made a number of changes to the query and now my Wait time on server replies is around 250ms, however, the Client processing time has increased to about 9 seconds making the Total execution time slightly slower.
The result set being returned is exactly the same.
Can someone shed any light on what exactly the difference between these two numbers is and what would cause such a result?
'Wait time on server replies' is the time between the last request packet leaving the client and the very first response packet returning from the server. 'Client processing time' is the time between the first response packet and the last response packet. Btw, I couldn't find the documentation to back these claims, but I'd say, based on my observations, that they are a valid educated guess.
If you run a query with a large 'wait time on server replies' it means the server took a long time to produce the very first row. This is usual with queries that have operators that need the entire sub-query to evaluate before they proceed (a typical example is sort operators).
On the other hand a query with a very small 'wait time on server replies' means that the query was able to return the first row fast. However, a long 'client processing time' does not necessarily imply that the client spent a lot of time processing and the server was blocked waiting on the client. It can simply mean that the server continued to return rows from the result, and this is how long it took until the very last row was returned.
What you see is the result of changes in the query plan that probably removed an operator that was blocking execution (most probably a sort); the new plan uses a different strategy that produces the first result faster (probably an index that guarantees the requested order, so no sort is needed) but overall takes longer.
If you are worried about the client holding back the server (it can happen with large result sets) then you should investigate the wait_type in sys.dm_exec_requests (info from sys.dm_os_tasks and sys.dm_os_workers is also useful) for the session executing the query under investigation. If I'm not mistaken, the server-waiting-on-client wait type is ASYNC_NETWORK_IO. You can also check the aggregate sys.dm_os_wait_stats: reset it using DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR), then run the query and see how much the ASYNC_NETWORK_IO wait type adds up to. Of course, make sure no other activity occurs on the server during the test.
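For example, while the slow query is running in another session, a quick check like this (the session id is just a placeholder) shows whether it is sitting in ASYNC_NETWORK_IO:

SELECT r.session_id, r.status, r.wait_type, r.wait_time, r.last_wait_type
FROM sys.dm_exec_requests AS r
WHERE r.session_id = 53;   -- replace with the session under investigation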

SQL Server & update (or insert) parallelism

I've got a large conversion job: turning 299 GB of JPEG images, already in the database, into thumbnail equivalents for reporting and bandwidth purposes.
I've written a thread safe SQLCLR function to do the business of re-sampling the images, lovely job.
Problem is, when I execute it in an UPDATE statement (from the PhotoData field to the ThumbData field), this executes linearly to prevent race conditions, using only one processor to resample the images.
So, how would I best utilise the 12 cores and phat raid setup this database machine has? Is it to use a subquery in the FROM clause of the update statement? Is this all that is required to enable parallelism on this kind of operation?
Anyway the operation is split into batches, around 4000 images per batch (in a windowed query of about 391k images), this machine has plenty of resources to burn.
Please check the configuration setting for Maximum Degree of Parallelism (MAXDOP) on your SQL Server. You can also set the value of MAXDOP.
This link might be useful to you http://www.mssqltips.com/tip.asp?tip=1047
cheers
Could you not split the query into batches, and execute each batch separately on a separate connection? SQL Server only uses parallelism in a query when it feels like it, and although you can stop it, or even encourage it (a little) by changing the cost threshold for parallelism option to 0, I think it's pretty hit and miss.
One thing that's worth noting is that it will only decide whether or not to use parallelism at the time that the query is compiled. Also, if the query is compiled at a time when the CPU load is higher, SQL Server is less likely to consider parallelism.
I too recommend the "round-robin" methodology advocated by kragen2uk and onupdatecascade (I'm voting them up). I know I've read something irritating about CLR routines and SQL parallelism, but I forget what it was just now... but I think they don't play well together.
The bit I've done in the past on similar tasks is to set up a table listing each batch of work to be done. Each connection you fire up goes to this table, gets the next batch, marks it as being processed, processes it, updates it as done, and repeats. This allows you to gauge performance, manage scaling, allow stops and restarts without having to start over, and gives you something to show how complete the task is (let alone show that it's actually doing anything).
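A minimal sketch of that work-queue pattern (all table, column and function names here are invented; the resampling function stands in for the SQLCLR function from the question):

CREATE TABLE dbo.ThumbnailBatches (
    BatchId     int IDENTITY PRIMARY KEY,
    PhotoIdFrom int NOT NULL,
    PhotoIdTo   int NOT NULL,
    Status      varchar(10) NOT NULL DEFAULT 'Pending'   -- Pending / Working / Done
);

-- each worker connection loops over something like this:
DECLARE @BatchId int, @From int, @To int;

UPDATE TOP (1) dbo.ThumbnailBatches WITH (READPAST, UPDLOCK, ROWLOCK)
SET    Status   = 'Working',
       @BatchId = BatchId,
       @From    = PhotoIdFrom,
       @To      = PhotoIdTo
WHERE  Status = 'Pending';

IF @BatchId IS NOT NULL
BEGIN
    UPDATE dbo.Photos
    SET    ThumbData = dbo.ResampleImage(PhotoData)   -- hypothetical SQLCLR function
    WHERE  PhotoId BETWEEN @From AND @To;

    UPDATE dbo.ThumbnailBatches SET Status = 'Done' WHERE BatchId = @BatchId;
END;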
Find some criteria to break the set into distinct sub-sets of rows (1-100, 101-200, whatever) and then call your update statement from multiple connections at the same time, where each connection handles one subset of rows in the table. All the connections should run in parallel.
