How do I run the query below (from this MSDN article) to determine the top worst queries (by CPU time) but only for a set date?
-- Find top 5 queries
SELECT TOP 5 query_stats.query_hash AS "Query Hash",
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
    MIN(query_stats.statement_text) AS "Statement Text"
FROM
    (SELECT QS.*,
        SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
            ((CASE QS.statement_end_offset
                WHEN -1 THEN DATALENGTH(ST.text)
                ELSE QS.statement_end_offset END
                - QS.statement_start_offset)/2) + 1) AS statement_text
     FROM sys.dm_exec_query_stats AS QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY 2 DESC;
GO
Our database has come under serious strain in the last day and we cannot figure out the source of the problem.
We are using Azure SQL Database.
It's not possible to get statistics per day from the DMVs. dm_exec_query_stats has the columns creation_time and last_execution_time, which of course can give you some idea of what has happened -- but those are only the first and last times that plan was used. The statistics will also be lost if the plan gets dropped out of the plan cache, so you might not have that plan and its statistics anymore if the situation is now better (and the "bad" plans have been replaced by better ones).
That query shows the average CPU used by the queries, so it's not the perfect query for solving performance problems: because it is an average, something with a small execution count can rank high in the list even if it's not really a problem. I usually use total CPU and total logical reads for solving performance issues -- but those are totals since the plan's creation time, which might be a long time ago. In that case you might also consider dividing the numbers by the hours since creation time, so you get average CPU / I/O per hour. Looking at the max* columns can also give some hints about the bad queries / plans.
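A minimal sketch of that per-hour idea (the aliases and the TOP 20 cutoff are mine):
-- CPU and reads per hour since the plan was compiled
SELECT TOP 20
    qs.query_hash,
    qs.creation_time,
    qs.execution_count,
    qs.total_worker_time  / NULLIF(DATEDIFF(HOUR, qs.creation_time, GETDATE()), 0) AS cpu_per_hour,
    qs.total_logical_reads / NULLIF(DATEDIFF(HOUR, qs.creation_time, GETDATE()), 0) AS reads_per_hour,
    qs.max_worker_time
FROM sys.dm_exec_query_stats AS qs
ORDER BY cpu_per_hour DESC;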
If you have this kind of problem, it might be a good idea to schedule that SQL as a task and gather the results somewhere. Then you can also use it as a baseline for comparing what has changed when the situation is bad. In that case (and probably otherwise too) you should most likely look at more than just the top 5.
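A hedged sketch of what such a scheduled collection could look like (the QueryStatsHistory table and its columns are my own invention):
-- Create the history table once, then run the INSERT on a schedule
IF OBJECT_ID('dbo.QueryStatsHistory') IS NULL
    CREATE TABLE dbo.QueryStatsHistory (
        capture_time        DATETIME2 NOT NULL,
        query_hash          BINARY(8) NOT NULL,
        total_worker_time   BIGINT    NOT NULL,
        total_logical_reads BIGINT    NOT NULL,
        execution_count     BIGINT    NOT NULL
    );

INSERT INTO dbo.QueryStatsHistory (capture_time, query_hash, total_worker_time, total_logical_reads, execution_count)
SELECT SYSUTCDATETIME(), qs.query_hash, qs.total_worker_time, qs.total_logical_reads, qs.execution_count
FROM sys.dm_exec_query_stats AS qs;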
Related
Usual blather... query takes too long to run... blah blah. Long question. blah.
Obviously, I am looking at different ways of rewriting the query; but that is not what this post is about.
To resolve a "spill to tempdb" warning in a query, I have already
rebuilt all of the indexes in the database
updated all of the statistics on the tables and indexes
This fixed the "spill to tempdb" warning and improved the query performance.
Since rebuilding indexes and statistics resulted in a huge performance gain for one query (without having to rewrite it), this got me thinking about how to improve the performance of other queries without rewriting them.
I have a nice big query that joins about 20 tables, does lots of fancy stuff I am not posting here, but takes about 6900ms to run.
Looking at the actual execution plan, I see 4 steps that have a total cost of 79%; so "a-hah" that is where the performance problem is. 3 steps are "clustered index seek" on PK_Job and the 4th step is an "Index lazy spool".
[Image: execution plan of the slow query]
So, I break out those elements into a standalone query to investigate further... I get the "same" 4 steps in the execution plan, with a cost of 97%, only the query time is blazing fast: 34ms. WTF? Where did the performance problem disappear to?
[Image: execution plan of the fast query]
I expected the additional tables to increase the query time, but I was not expecting the execution time of querying this one Job table to go from 30ms to 4500ms.
-- this takes 34ms
select *
from equip e
left join job jf on (jf.jobid = e.jobidf)
left join job jd on (jd.jobid = e.jobidd)
left join job jr on (jr.jobid = e.jobidd)
-- this takes 6900ms
select *
from equip e
left join job jf on (jf.jobid = e.jobidf)
left join job jd on (jd.jobid = e.jobidd)
left join job jr on (jr.jobid = e.jobidd)
-- add another 20 tables in here..
Question 1: what should I look at in the two execution plans to identify why the execution time (of the clustered index seek) on this table goes from 30ms to 4500ms?
So, thinking this might have something to do with the statistics, I reviewed the index statistics on PK_Job = JobID (an int column). The histogram ranges look useless: all the "current" records are lumped together in one range (row 21 in the image). This is the standard problem with an incrementing PK; new data is always in the last range, so 99.999% of the JobID values that are referenced fall into that one histogram range. I tried adding a filtered statistic, but that had no impact on the actual execution plan.
[Image: output from DBCC SHOW_STATISTICS for PK_Job]
Question 2: are the above PK_Job statistics a contributing factor to the complicated query being slow? That is, would "fixing" the statistics help with the complicated query? if so, what could that fix look like?
Again: I know, rewrite the query. Post more of the code (all 1500 lines of it that no one will find of any use). blah, blah.
What I would like are tips on what to look at in order to answer Q1 and Q2.
Thanks in advance!
Question 3: why would a simple IIF add 100ms to a query? the "compute scalar" nodes all show a cost of 0%, but the IIF doubles the execution time of the query.
Adding this to the SELECT doubles execution time from 90ms to 180ms; CASE expressions are just as bad:
IIF(X.Okay = 1, '', 'N') AS OkayDesc
Next observation: the actual execution plan shows a query cost relative to the batch of 98%, but STATISTICS TIME shows a CPU time of 141 ms while the batch CPU time is 3640 ms.
Question 4: why doesn't the query cost % (relative to batch) match up with the statement CPU time?
The SQL engine is pretty smart at optimizing badly written queries in most cases. But when a query is too complex, sometimes it cannot use these optimizations and even performs badly.
So, you are asking:
I break out those elements into a standalone query to investigate
further... I get the "same" 4 steps in the execution plan, with a cost
of 97%, only the query time is blazing fast 34ms? where did
the performance problem disappear to?
The answer is pretty simple. Breaking the query up and materializing the data in a #temp table helps the engine better understand how much data it is working with, and so build a better plan.
Brent Ozar wrote about this yesterday, giving an example of how bad a big query can be.
If you want more details about how to optimize your query by rewriting it, you need to provide more details, but in my practice, in most cases simplifying the query and materializing the data in #temp tables (which also allows parallel operations on them) gives good results.
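A hedged illustration of that pattern; every table and column name here is hypothetical, not from the question:
-- Materialize the expensive, filtered part of the big query first
SELECT o.OrderID, o.CustomerID, o.OrderTotal
INTO #RecentOrders
FROM dbo.Orders AS o
WHERE o.OrderDate >= DATEADD(DAY, -30, GETDATE());

-- The rest of the big query then joins the much smaller materialized set
SELECT c.CustomerName, SUM(r.OrderTotal) AS Total
FROM #RecentOrders AS r
JOIN dbo.Customers AS c ON c.CustomerID = r.CustomerID
GROUP BY c.CustomerName;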
I have tried the SQL query below.
SELECT
sql_id,
child_number,
sql_fulltext,
elapsed_time,
executions,
round(elapsed_time_avg) elapsed_time_avg
FROM
(
SELECT
command_type,
sql_id,
child_number,
sql_fulltext,
elapsed_time,
cpu_time,
disk_reads,
executions,
( elapsed_time / executions ) elapsed_time_avg
FROM
v$sql
WHERE
executions > 0
order by elapsed_time_avg desc
)
where rownum <=10;
I expect to get the top 10 most expensive queries in the database at any time. My query returns results, but after some time the SQL_IDs change (the results change) for the same SQL statement.
Your approach is correct. (However, I suggest sorting by ELAPSED_TIME instead of an average, since it's the total run time that matters most. A million fast queries can be worse than one slow query.) But you do have to keep in mind that queries will disappear from V$SQL as they age out of the shared pool, and it's hard to predict exactly how long something will stay in the shared pool.
You might want to look at the active session history in V$ACTIVE_SESSION_HISTORY, which usually stores many hours' worth of data, and then at DBA_HIST_ACTIVE_SESS_HISTORY, which stores 8 days of data by default. You'll have to adjust your queries, since those two views don't store sums; they store a row for each sampled wait. You'll need to count the number of rows per SQL_ID to estimate the time spent. (V$ACTIVE_SESSION_HISTORY samples once per second, DBA_HIST_ACTIVE_SESS_HISTORY once every 10 seconds.)
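For example, a hedged sketch of estimating DB time per SQL_ID from the one-second ASH samples, wrapped in the same rownum style as the query above (the time window is just an example):
SELECT *
FROM
(
    SELECT sql_id,
           COUNT(*) AS estimated_db_seconds  -- one row per ~1-second sample
    FROM v$active_session_history
    WHERE sample_time > sysdate - 1/24       -- roughly the last hour
    GROUP BY sql_id
    ORDER BY estimated_db_seconds DESC
)
WHERE rownum <= 10;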
One of the most important things to realize about tuning SQL is that you're not looking for perfection. You don't want to trace every single statement, or you'll go crazy. If you sample the system every X seconds and a statement doesn't show up, then you almost certainly don't care about that statement. It's fine if slow statements disappear from the top N list.
I have a huge SQL query. Probably 15-20 tables are involved.
There are 6 to 7 subqueries which are joined back in.
This query usually takes about a minute to run and returns 5 million records.
So even if this query is badly written, it has a query plan that makes it finish in a minute. I have ensured that the query actually ran and didn't use cached results.
Sometimes the query plan gets jacked up and then it never finishes. I run VACUUM ANALYZE every night on the tables involved in the query. work_mem is currently set at 200 MB; I have tried increasing this to 2 GB as well. I haven't seen the query get messed up while work_mem was 2 GB, but when I reduced it and ran the query, it got messed up, and now that I have increased it back to 2 GB the query is still messed up. Does it have something to do with the query plan not getting refreshed with the new setting? I tried DISCARD PLANS in my session.
I can only think of work_mem and VACUUM ANALYZE at this point. What other factors could cause a query that smoothly returns results in a minute to suddenly not return anything at all?
Let me know if you need more details on any settings, or on the query itself. I could paste the plan too, but the query and the plan are too big to paste here.
If there are more than geqo_threshold (typically 12) entries in the range table, the genetic query optimizer (GEQO) will kick in, often resulting in the random behaviour described in the question. You can solve this by:
increasing geqo_threshold, or
moving some of your table references into a CTE. If you already have some subqueries, promote one (or more) of them to a CTE. It is something of a black art to identify clusters of tables in your query that will fit into a compact CTE (with relatively few result tuples, and not too many key references to the outer query).
Setting geqo_threshold too high (20 is probably too high ...) will cause the planner to spend a lot of time evaluating all the possible plans (their number grows roughly exponentially with the number of RTEs). If you expect your query to take a few minutes to run, a few seconds of planning time will probably do no harm.
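If raising the threshold is the route you take, a minimal sketch of doing it for the current session only (20 is just the example value discussed above):
-- Use the exhaustive planner for up to 20 range-table entries, this session only
SET geqo_threshold = 20;
-- ... run the problematic query here ...
RESET geqo_threshold;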
I have a relation between two tables with 600K rows, and my first question is: is that a lot of data? It doesn't seem like a lot (in terms of rows, not bytes).
I can write a query like this:
SELECT EntityID, COUNT(*)
FROM QueryMembership
GROUP BY EntityID
And it completes in no time at all, but when I do this:
SELECT EntityID, COUNT(*)
FROM QueryMembership
WHERE PersonID IN (SELECT PersonID FROM GetAccess(1))
GROUP BY EntityID
This takes 3-4 seconds to complete, despite returning only about 183 rows. SELECT * FROM QueryMembership takes about 12-13 seconds.
What I don't understand is how a filter like this can take so long as soon as I introduce this table-valued function. The function itself doesn't take any time at all to return its result, and no matter whether I write it as a CTE or some bizarre subquery, the result is the same.
However, if I defer the filter by inserting the result of the first select into a temporary table #temp and then applying the GetAccess UDF, the entire thing goes about three times as fast.
I would really like some in-depth technical help on this matter: where I should start looking, and how I can analyze the execution plan to figure out what's going on.
There's an excellent series of posts on execution plans and how to read and interpret them - and a totally free e-book on the topic as well! - on the excellent Simple-Talk site.
Check them out - well worth the time!
Execution Plan Basics
SQL Server Execution Plans
Understanding More Complex Query Plans
Graphical Execution Plans for Simple SQL Queries
SQL Server Execution Plans - free e-book download
600k rows is not a particularly large amount. However, you are getting to the point where server configuration (disks, non-SQL load, etc) matters, so if your server wasn't carefully put together you should look at that now rather than later.
Analyzing execution plans is one of those things that you tend to pick up over time. The book "Inside SQL Server" is (was?) pretty nice for learning how things work internally, which helps guide you a bit as you're optimizing.
I would personally try rewriting the above query as a join; IN often doesn't perform as well as you might hope. Something like:
SELECT
EntityID,
COUNT(*)
FROM
QueryMembership q
join GetAccess(1) a on a.PersonID = q.PersonID
GROUP BY
EntityID
SELECT EntityID, COUNT(*)
FROM QueryMembership
WHERE PersonID IN (SELECT PersonID FROM GetAccess(1))
GROUP BY EntityID
The embedded subquery is expensive. As you said, using a temporary table is a good alternative solution.
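A hedged sketch of that temp-table approach, using the objects from the question (the temp-table name and count alias are mine):
-- Materialize the function result once, then join against it
SELECT PersonID
INTO #AccessiblePeople
FROM GetAccess(1);

SELECT qm.EntityID, COUNT(*) AS Cnt
FROM QueryMembership AS qm
JOIN #AccessiblePeople AS ap ON ap.PersonID = qm.PersonID
GROUP BY qm.EntityID;

DROP TABLE #AccessiblePeople;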
I suspect that the reasons for your slowdown may be similar to those in this question:
how to structure an index for group by in Sql Server
An execution plan will answer the question of why the second query is slower; however, I suspect it will be because SQL Server can compute aggregate functions (such as COUNT and MAX) using relatively inexpensive operations on a suitable index.
If you combine a filter and a group, however, SQL Server can no longer use this trick and is forced to evaluate COUNT or MAX against the filtered result set, leading to expensive lookups.
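As a hedged sketch only (the index name is mine, and whether it helps depends on the actual plan), a covering index that serves both the PersonID filter and the EntityID grouping could look like:
-- Seek on PersonID, stream EntityID for the GROUP BY without extra lookups
CREATE INDEX IX_QueryMembership_PersonID_EntityID
    ON QueryMembership (PersonID, EntityID);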
600k rows is a fairly reasonable / small table size; however, it's big enough that things like table scans or RID lookups against large portions of the table will start becoming expensive.
I'd be interested to see the execution plan to understand what's going on.
My SQL Server CPU has been at around 90% for the most part of today.
I am not in a position to be able to restart it due to it being in constant use.
Is it possible to find out what within SQL is causing such a CPU overload?
I have run SQL Profiler but so much is going on it's difficult to tell if anything in particular is causing it.
I have run sp_who2 but am not sure what everything means exactly and if it is possible to identify possible problems in here.
To pre-empt any "it's probably just being used a lot" responses, this has only kicked in today, up from perfectly normal activity levels.
I'm after any way of finding what is causing CPU grief within SQL.
This query uses DMVs to identify the most costly queries by CPU:
SELECT TOP 20
qs.sql_handle,
qs.execution_count,
qs.total_worker_time AS Total_CPU,
total_CPU_inSeconds = --Converted from microseconds
qs.total_worker_time/1000000,
average_CPU_inSeconds = --Converted from microseconds
(qs.total_worker_time/1000000) / qs.execution_count,
qs.total_elapsed_time,
total_elapsed_time_inSeconds = --Converted from microseconds
qs.total_elapsed_time/1000000,
st.text,
qp.query_plan
FROM
sys.dm_exec_query_stats AS qs
CROSS APPLY
sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY
sys.dm_exec_query_plan (qs.plan_handle) AS qp
ORDER BY
qs.total_worker_time DESC
For a complete explanation see: How to identify the most costly SQL Server queries by CPU
I assume due diligence here: that you have confirmed the CPU is actually being consumed by the SQL Server process (the perfmon Process category counters would confirm this). Normally for such cases you take a sample of the relevant performance counters and compare them with a baseline that you established under normal operating load. Once you resolve this problem, I recommend you establish such a baseline for future comparisons.
You can find exactly where SQL is spending every single CPU cycle, but knowing where to look takes a lot of know-how and experience. Is it SQL 2005/2008 or 2000?
Fortunately, for 2005 and newer there are a couple of off-the-shelf solutions. You already got a couple of good pointers here with John Samson's answer. I'd like to add a recommendation to download and install the SQL Server Performance Dashboard Reports. Some of those reports include top queries by time or by I/O, most used data files and so on, and you can quickly get a feel for where the problem is. The output is both numerical and graphical, so it is more useful for a beginner.
I would also recommend using Adam's Who is Active script, although that is a bit more advanced.
And last but not least I recommend you download and read the MS SQL Customer Advisory Team white paper on performance analysis: SQL 2005 Waits and Queues.
My recommendation is also to look at I/O. If you added a load to the server that trashes the buffer pool (i.e. it needs so much data that it evicts the cached data pages from memory), the result will be a significant increase in CPU (it sounds surprising, but it's true). The culprit is usually a new query that scans a big table end-to-end.
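A hedged sketch for spotting such a scan, ordering the standard query-stats DMV by reads instead of CPU:
-- Top statements by total logical reads
SELECT TOP 10
    qs.total_logical_reads,
    qs.execution_count,
    st.text AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;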
You can find some useful queries here:
Investigating the Cause of SQL Server High CPU
For me this helped a lot:
SELECT s.session_id,
r.status,
r.blocking_session_id 'Blk by',
r.wait_type,
wait_resource,
r.wait_time / (1000 * 60) 'Wait M',
r.cpu_time,
r.logical_reads,
r.reads,
r.writes,
r.total_elapsed_time / (1000 * 60) 'Elaps M',
Substring(st.TEXT,(r.statement_start_offset / 2) + 1,
((CASE r.statement_end_offset
WHEN -1
THEN Datalength(st.TEXT)
ELSE r.statement_end_offset
END - r.statement_start_offset) / 2) + 1) AS statement_text,
Coalesce(Quotename(Db_name(st.dbid)) + N'.' + Quotename(Object_schema_name(st.objectid, st.dbid)) + N'.' +
Quotename(Object_name(st.objectid, st.dbid)), '') AS command_text,
r.command,
s.login_name,
s.host_name,
s.program_name,
s.last_request_end_time,
s.login_time,
r.open_transaction_count
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_requests AS r
ON r.session_id = s.session_id
CROSS APPLY sys.Dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id != @@SPID
ORDER BY r.cpu_time desc
In the status, wait_type and cpu_time columns you can find the most CPU-consuming tasks that are running right now.
Run either of these a few seconds apart and you'll detect the high-CPU connection.
Or: store the CPU value in a local variable, WAITFOR DELAY, then compare the stored and current CPU values.
select * from master..sysprocesses
where status = 'runnable' --comment this out
order by CPU desc

select * from master..sysprocesses
order by CPU desc
It may not be the most elegant, but it's effective and quick.
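A hedged sketch of that snapshot-and-compare idea, using temp tables instead of a local variable so every session is covered at once (the 10-second delay is arbitrary):
-- Snapshot CPU per session, wait, snapshot again, then compare
SELECT spid, SUM(cpu) AS cpu
INTO #cpu_before
FROM master..sysprocesses
GROUP BY spid;

WAITFOR DELAY '00:00:10';

SELECT spid, SUM(cpu) AS cpu
INTO #cpu_after
FROM master..sysprocesses
GROUP BY spid;

SELECT a.spid, a.cpu - b.cpu AS cpu_delta
FROM #cpu_after AS a
JOIN #cpu_before AS b ON b.spid = a.spid
ORDER BY cpu_delta DESC;

DROP TABLE #cpu_before, #cpu_after;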
You can run SQL Profiler and filter by CPU or Duration so that you're excluding all the "small stuff". Then it should be a lot easier to determine whether you have a problem like a specific stored proc that is running much longer than it should (could be a missing index or something).
Two caveats:
If the problem is massive amounts of tiny transactions, then the filter I describe above would exclude them, and you'd miss this.
Also, if the problem is a single, massive job (like an 8-hour analysis job or a poorly designed select that has to cross-join a billion rows) then you might not see this in the profiler until it is completely done, depending on what events you're profiling (sp:completed vs sp:statementcompleted).
But normally I start with the Activity Monitor or sp_who2.
For a GUI approach I would take a look at Activity Monitor under Management and sort by CPU.