We are running SQL Server 2008 with currently around 50 databases of varying size and workload. Occasionally SQL Server spikes the CPU completely for about a minute, after which it drops to normal baseline load.
My problem is that I can't determine which database or connection is causing it (I'm fairly sure it is one specific query that is missing an index - or something like that).
I have found T-SQL queries that give you a snapshot of current processes. There are also the "recent expensive queries" view and of course the profiler, but it is hard to map these to a "this is the database that is causing it" answer.
What makes it even harder is that the problem disappears before I have even fired up the profiler or activity monitor, and it only happens about once or twice a day.
Ideally I would like to use a performance counter so I could simply run it for a day or two and then take a look at what caused the spikes. However, I cannot find any relevant counter.
Any suggestions?
This will help, courtesy of Glenn Berry adapted from Robert Pearl:
WITH DB_CPU_Stats AS
(
    SELECT DatabaseID,
           DB_NAME(DatabaseID) AS [DatabaseName],
           SUM(total_worker_time) / 1000 AS [CPU_Time_Ms] -- total_worker_time is in microseconds
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
                 FROM sys.dm_exec_plan_attributes(qs.plan_handle)
                 WHERE attribute = N'dbid') AS F_DB
    GROUP BY DatabaseID
)
SELECT ROW_NUMBER() OVER (ORDER BY [CPU_Time_Ms] DESC) AS [row_num],
       DatabaseName,
       [CPU_Time_Ms],
       CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER () * 100.0 AS DECIMAL(5, 2)) AS [CPUPercent]
FROM DB_CPU_Stats
WHERE DatabaseID > 4        -- exclude system databases
  AND DatabaseID <> 32767   -- exclude ResourceDB
ORDER BY row_num
OPTION (RECOMPILE);
Run a profiler trace logging the database name and CPU during a spike, load the data into a table, then count and group by database:
SELECT DatabaseName, SUM(CPU) AS TotalCPU
FROM Trace
GROUP BY DatabaseName
ORDER BY TotalCPU DESC;
Have a look at sys.dm_exec_query_stats. The total_worker_time column is a measure of CPU. You may be able to accomplish what you're trying to do with one look at the view. You may, however, need to come up with a process to take "snapshots" of the view and compare successive snapshots: look at the data in the view, look again five minutes later, and compare the differences. The differences will be the amount of resources consumed between the two snapshots. Good luck!
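A minimal sketch of that snapshot approach might look like this (the table name and the five-minute schedule are illustrative assumptions, not part of the original suggestion):
-- Illustrative snapshot table; run the INSERT on a schedule (e.g. a SQL Agent job every 5 minutes)
CREATE TABLE dbo.QueryStatsSnapshot
(
    snapshot_time          DATETIME      NOT NULL DEFAULT (GETDATE()),
    sql_handle             VARBINARY(64) NOT NULL,
    statement_start_offset INT           NOT NULL,
    statement_end_offset   INT           NOT NULL,
    execution_count        BIGINT        NOT NULL,
    total_worker_time      BIGINT        NOT NULL
);

INSERT dbo.QueryStatsSnapshot
    (sql_handle, statement_start_offset, statement_end_offset, execution_count, total_worker_time)
SELECT sql_handle, statement_start_offset, statement_end_offset, execution_count, total_worker_time
FROM sys.dm_exec_query_stats;

-- Compare the two most recent snapshots to see the CPU consumed in that interval
DECLARE @latest DATETIME, @previous DATETIME;
SELECT TOP (1) @latest = snapshot_time
FROM dbo.QueryStatsSnapshot ORDER BY snapshot_time DESC;
SELECT TOP (1) @previous = snapshot_time
FROM dbo.QueryStatsSnapshot WHERE snapshot_time < @latest ORDER BY snapshot_time DESC;

SELECT cur.sql_handle,
       cur.total_worker_time - prev.total_worker_time AS worker_time_delta -- microseconds of CPU in the interval
FROM dbo.QueryStatsSnapshot AS cur
JOIN dbo.QueryStatsSnapshot AS prev
  ON  prev.sql_handle = cur.sql_handle
  AND prev.statement_start_offset = cur.statement_start_offset
  AND prev.statement_end_offset = cur.statement_end_offset
WHERE cur.snapshot_time = @latest
  AND prev.snapshot_time = @previous
ORDER BY worker_time_delta DESC;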
Have you tried relating SQL Server Profiler to Performance Monitor?
When you correlate the data, you can see spikes in performance related to DB activity.
http://www.sqlservernation.com/home/relating-sql-server-profiler-with-performance-monitor.html
I'm trying to assess the load that insert queries place on SQL Server 2008/2016.
There are articles I found that discuss this, like:
http://use-the-index-luke.com/sql/dml/insert
which talks about execution time.
I'm not very proficient in SQL Server; for example, I don't know how to evaluate execution plans.
I know there are handy performance reports, like "Performance - Top Queries by Total CPU Time".
I've searched and have not found definitions of those reports.
So my question is: which server tasks does this report include in the CPU time calculations for queries? For example:
index recalculation?
maybe even the execution of triggers?
something else?
Thank you!
These are MDW or Management Data Warehouse reports, in particular the Query Statistics History report introduced in SQL Server 2008. If you are interested in collecting this data, then enable and configure the Management Data Warehouse.
What are these reports anyway?
By default, only the top 10 queries will be included in the Top 10 Queries by CPU report; however, you can emulate the query behind the report and tweak the desired output using a query similar to the one below, as discussed in this article.
SELECT TOP 10 -- adjust the number of rows as needed
    qs.total_worker_time * 1.0 / (qs.execution_count * 60000000) AS [Minutes Avg CPU Time], -- total_worker_time is in microseconds
    qs.execution_count AS [Times Run],
    qs.min_worker_time / 60000000.0 AS [Min CPU Time in Mins],
    SUBSTRING(qt.text, qs.statement_start_offset / 2 + 1,
        (CASE WHEN qs.statement_end_offset = -1
              THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2
              ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2) AS [Query Text],
    DB_NAME(qt.dbid) AS [Database],
    OBJECT_NAME(qt.objectid) AS [Object Name]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY [Minutes Avg CPU Time] DESC;
Index recalculations and trigger executions are not performed as part of a query. Index updates are part of maintenance activity and trigger executions are part of Insert/Update/Delete activity.
Generally speaking, there are no "server tasks" included in the calculations for the Top Queries report. A query execution plan is based on that query and the data statistics available at the start of the query compilation. The plan generated is independent of maintenance or IUD activity taking place on the server.
It is possible that other activity may cause the actual duration to increase, but that additional time is not directly attributable to the query. The query is just forced to wait while the other activity completes.
Does that help?
Here is a modified query which shows the top CPU time consumers.
It shows total (not average) CPU time, and it is grouped by query_plan_hash, so the same query with different parameters ends up in one group.
Note 1: if a query runs frequently (around once every second), its statistics will be flushed about every hour.
Note 2: the user name will only be present if the query is running at the moment.
Note 3: if you need to keep stats for a long time, you will need to store them somewhere separately; adding a grouping by date will also help with reporting (see the sketch after the query below).
SELECT TOP 10
SUM(qs.total_worker_time)/(1000000) AS [CPU Time Seconds],
SUM(qs.execution_count) AS [Times Run],
qs.query_plan_hash AS [Hash],
MIN(creation_time) AS [Creation time],
MIN(qt.text) AS [Query],
MIN(USER_NAME(r.user_id)) AS [UserName]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
LEFT JOIN sys.dm_exec_requests AS r
    ON qs.query_plan_hash = r.query_plan_hash
GROUP BY qs.query_plan_hash
ORDER BY [CPU Time Seconds] DESC
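As a minimal sketch of Note 3 (the table name is just an example), the output can be persisted with a snapshot date and then grouped over time:
CREATE TABLE dbo.TopCpuHistory
(
    snapshot_date    DATE          NOT NULL DEFAULT (CONVERT(date, GETDATE())),
    query_plan_hash  BINARY(8)     NOT NULL,
    cpu_time_seconds BIGINT        NOT NULL,
    times_run        BIGINT        NOT NULL,
    query_text       NVARCHAR(MAX) NULL
);

-- Run this on a schedule, then report with GROUP BY snapshot_date (and query_plan_hash)
INSERT dbo.TopCpuHistory (query_plan_hash, cpu_time_seconds, times_run, query_text)
SELECT qs.query_plan_hash,
       SUM(qs.total_worker_time) / 1000000,
       SUM(qs.execution_count),
       MIN(qt.text)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
GROUP BY qs.query_plan_hash;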
I have an Azure SQL production database that runs at around 10-20% DTU usage on average; however, I get DTU spikes that take it up to 100% at times. Here is a sample from the past hour:
I realize this could be a rogue query, so I switched over to the Query Performance Insight tab, and I find the following from the past 24 hours:
This chart makes sense with regard to the CPU usage line. Query 3780 takes the majority of the CPU, as expected with my application. The overall DTU (red) line seems to follow this correctly (minus the spikes).
However, in the DTU Components charts I can see large Data IO spikes occurring that coincide with the Overall DTU spikes. Switching over to the TOP 5 queries by Data IO, I see the following:
This seems to indicate that there are no queries that are using high amounts of Data IO.
How do I find out where this Data IO usage is coming from?
Finally, I see that there is one "odd ball" query (7966) listed under the TOP 5 queries by Data IO, with only 5 executions. Selecting it shows the following:
SELECT StatMan([SC0], [SC1], [SC2], [SB0000])
FROM (SELECT TOP 100 PERCENT [SC0], [SC1], [SC2], step_direction([SC0]) over (order by NULL) AS [SB0000]
FROM (SELECT [UserId] AS [SC0], [Type] AS [SC1], [Id] AS [SC2] FROM [dbo].[Cipher] TABLESAMPLE SYSTEM (1.828756e+000 PERCENT)
WITH (READUNCOMMITTED) ) AS _MS_UPDSTATS_TBL_HELPER
ORDER BY [SC0], [SC1], [SC2], [SB0000] ) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 16)
What is this query?
This does not look like any query that my application has created/uses. The timestamps on the details chart seem to line up with the approximate times of the overall Data IO spikes (just prior to 6am) which leads me to think this query has something to do with all of this.
Are there any other tools can I use to help isolate this issue?
The query is updating statistics. This occurs when the AUTO UPDATE STATISTICS setting is on. This setting should be kept on, and you shouldn't turn it off; that is a best practice.
You should update stats manually only when you see a query not performing well and the stats for that query are out of date.
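For example, a manual update of the table from the question might look like this (FULLSCAN is optional and only shown for illustration):
-- Rebuild all statistics on dbo.Cipher using a full scan of the table
UPDATE STATISTICS [dbo].[Cipher] WITH FULLSCAN;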
Also, below are some rules for when SQL Server will update stats automatically for you:
When a table with no rows gets a row
When 500 rows are changed in a table that has fewer than 500 rows
When 20% of rows + 500 rows are changed in a table with more than 500 rows
By 'change' we mean a row being inserted, updated, or deleted. So, yes, even the automatically-created statistics get updated and maintained as the data changes. There have been some changes to these rules in recent versions, and SQL Server can update stats more often.
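To see how close a table's statistics are to those thresholds, a sketch like the one below can help (sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 / 2012 SP1 or later; the table name is taken from the question):
-- Last update time and number of modifications since then, per statistic
SELECT OBJECT_NAME(s.[object_id]) AS [Table],
       s.name                     AS [Statistic],
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE s.[object_id] = OBJECT_ID(N'dbo.Cipher');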
References:
https://www.sqlskills.com/blogs/erin/understanding-when-statistics-will-automatically-update/
It seems that query is part of the automatic statistics update process. To mitigate the impact of this process on production, you can regularly update statistics and indexes using runbooks, as explained here. Run sp_updatestats to immediately try to mitigate the impact of this process.
Sometimes my application runs slowly. The main problem is that some expensive reports are running. How can I find these reports, and how can I kill them instantly?
You can use the following command to get the long running queries.
SELECT r.session_id,
st.TEXT AS batch_text,
qp.query_plan AS 'XML Plan',
r.start_time,
r.status,
r.total_elapsed_time
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE DB_NAME(r.database_id) = '{db_name}'
ORDER BY cpu_time DESC;
Then you can use
KILL 60
to kill session_id 60 for example.
I always use sp_WhoIsActive from Adam Machanic for finding long running queries.
sp_WhoIsActive is described in detail on dba.stackexchange.com.
Although you can also write your own script or use sp_who2 for example.
Update
You are interested in the first 2 columns of the output of sp_WhoIsActive.
The first column defines how long the query is running. The second column is the session_id (or SPID) of the query.
You can use KILL 60 to kill session_id 60 for example.
Have a look over here for a detailed explanation of the stored procedure.
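As a minimal usage sketch (assuming the procedure is installed in the master database; the parameter shown is optional):
-- Show currently running requests; @get_plans = 1 also returns their execution plans
EXEC dbo.sp_WhoIsActive @get_plans = 1;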
I have a few pieces of advice for you, though not all of them may fit your situation.
1 - Reporting and CRUD operations should be separated. At the very least you can use NOLOCK hints (see the sketch after this list) or run the reports at night, so they can work offline.
2 - Check your queries, because if the data volume is less than about 2,000,000 rows, the main problem is usually the queries themselves.
3 - Analyse the report types and, if they are suitable for offline work, use an offline system for reporting.
4 - You can use mirroring or other techniques for reporting.
5 - Best practice is to always separate the databases for reporting and CRUD operations.
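A minimal sketch of point 1, reading report data with a NOLOCK hint (the table and columns are hypothetical, and NOLOCK means the report may read uncommitted data):
-- Hypothetical report query; WITH (NOLOCK) avoids blocking writers
-- at the cost of possibly reading dirty (uncommitted) rows
SELECT OrderDate, SUM(TotalAmount) AS DailyTotal
FROM dbo.Orders WITH (NOLOCK)
GROUP BY OrderDate;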
We have a database with about 50-60% recompiles. That value comes from [SQL Compilations/sec] coupled with [Batch Requests/sec].
We think that value is a bit high.
If we look at this query:
SELECT TOP 150
qs.plan_generation_num,
qs.execution_count,
qs.statement_start_offset,
qs.statement_end_offset,
st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE qs.plan_generation_num > 1
ORDER BY plan_generation_num DESC
We don't see high plan_generation_num values when compared to the execution counts.
What we do have is a lot of single-use plans, and I am trying to figure out why.
Our application is built in ASP.NET and we always use parameterized queries. We use both stored procedures and plain SQL statements in the application, but always parameterized.
The website that runs against this database is pretty big, with about 500,000 page views each day and about 10,000 requests per minute, if this information helps.
We have no long-running queries, and indexes and statistics are in order. This is one of the last things to optimize.
CPU averages about 15%.
RAM is about 100 GB and is, of course, used up by SQL Server.
We use SQL Server 2014 Enterprise.
One thing I started wondering about: if I have a SQL statement like this
SELECT doors, windows, seats FROM cars WHERE Wheels = #Wheels AND Active = 1
will the plan not be reused because we don't set a parameter on this part: AND Active = 1?
Any idea on how to figure out why we have so many single-use plans?
The count of cached plans is about 20,000. In comparison, we have about 700 stored procedures and a lot more queries in the app.
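To get an idea of how much of the plan cache is single-use (and which statements those plans belong to), a sketch like this can help; whether to then enable the 'optimize for ad hoc workloads' setting is a separate decision:
-- Largest cached plans that have only ever been used once
SELECT TOP 50
    cp.usecounts,
    cp.objtype,
    cp.size_in_bytes / 1024 AS size_kb,
    st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC;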
OK,
I don't know if I am going crazy or not, but didn't the estimated execution plan use to show you which indexes you would need to improve your performance? I used to do this at my old job, but now it seems I have to use the Database Engine Tuning Advisor. I don't mind the Tuning Advisor, but the way I did it before was so simple!
Thanks
In both SSMS 2008 and SSMS 2012 I see this working fine for both estimated and actual plans.
Here is a quick example to show that estimated and actual execution plans will both show missing indexes:
USE tempdb;
GO
CREATE TABLE dbo.smorg(blamp INT);
GO
INSERT dbo.smorg(blamp) SELECT n FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY c1.object_id)
FROM sys.all_objects AS c1, sys.all_objects AS c2
) AS x(n);
GO
Now highlight this and choose estimated execution plan, or turn on actual execution plan and hit Execute:
SELECT blamp FROM dbo.smorg WHERE blamp BETWEEN 100 AND 105;
You should see a missing index recommendation. And you will see it represented here:
SELECT *
FROM sys.dm_db_missing_index_details
WHERE [object_id] = OBJECT_ID('dbo.smorg');
You can read more about the DMV here:
http://msdn.microsoft.com/en-us/library/ms345434.aspx
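Beyond the single-table check above, the missing-index DMVs can be joined to rank suggestions across the server; a rough sketch (the improvement-measure formula is just a common heuristic, not an official metric):
SELECT TOP 25
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact,
    migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact AS improvement_measure
FROM sys.dm_db_missing_index_groups AS mig
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY improvement_measure DESC;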
Also you should investigate SQL Sentry Plan Explorer (disclaimer: I work for SQL Sentry). This is a free tool that shows the missing indexes for estimated plans (if they are in the XML provided by SQL Server), doesn't have bugs like SSMS (which repeats the same recommendation across multiple batches, even batches that don't mention the same tables), and generates actual execution plans without pulling all of the results across the network to the client - it just discards them, so the network and data overhead don't factor into plan analysis.