How do I find out what is hammering my SQL Server?

My SQL Server CPU has been at around 90% for the most part of today.
I am not in a position to be able to restart it due to it being in constant use.
Is it possible to find out what within SQL is causing such a CPU overload?
I have run SQL Profiler but so much is going on it's difficult to tell if anything in particular is causing it.
I have run sp_who2 but am not sure what everything means exactly and if it is possible to identify possible problems in here.
To pre-empt any "it's probably just being used a lot" responses: this has only kicked in today, from perfectly normal activity levels.
I'm after any way of finding what is causing CPU grief within SQL.

This query uses DMVs to identify the most costly queries by CPU:
SELECT TOP 20
    qs.sql_handle,
    qs.execution_count,
    qs.total_worker_time AS Total_CPU,
    total_CPU_inSeconds = qs.total_worker_time / 1000000,           -- converted from microseconds
    average_CPU_inSeconds = (qs.total_worker_time / 1000000) / qs.execution_count,
    qs.total_elapsed_time,
    total_elapsed_time_inSeconds = qs.total_elapsed_time / 1000000, -- converted from microseconds
    st.text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;
For a complete explanation see: How to identify the most costly SQL Server queries by CPU

I'll assume you've done the due diligence of confirming that the CPU is actually being consumed by the SQL Server process (the perfmon Process category counters would confirm this). Normally in such cases you take a sample of the relevant performance counters and compare them with a baseline established under normal operating load. Once you resolve this problem, I recommend you establish such a baseline for future comparisons.
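If you still need that confirmation, the scheduler-monitor ring buffer records SQL Server's own CPU use versus the rest of the machine, one entry per minute. A hedged sketch (SQL 2008 and later; the XML paths are the standard ones for this ring buffer):
SELECT TOP (10)
    x.record.value('(/Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
    x.record.value('(/Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_idle_pct,
    x.[timestamp]
FROM (SELECT [timestamp], CONVERT(xml, record) AS record
      FROM sys.dm_os_ring_buffers
      WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
        AND record LIKE N'%<SystemHealth>%') AS x
ORDER BY x.[timestamp] DESC;
-- whatever is not accounted for by sql_cpu_pct or system_idle_pct belongs to other processes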
You can find exactly where SQL Server is spending every single CPU cycle, but knowing where to look takes a lot of know-how and experience. Is it SQL 2005/2008 or 2000?
Fortunately for 2005 and newer there are a couple of off-the-shelf solutions. You already got a couple of good pointers here in John Samson's answer. I'd like to add a recommendation to download and install the SQL Server Performance Dashboard Reports. Some of those reports include top queries by time or by I/O, most used data files and so on, so you can quickly get a feel for where the problem is. The output is both numerical and graphical, so it is more useful for a beginner.
I would also recommend using Adam Machanic's Who is Active script (sp_WhoIsActive), although that is a bit more advanced.
And last but not least I recommend you download and read the MS SQL Customer Advisory Team white paper on performance analysis: SQL 2005 Waits and Queues.
My recommendation is also to look at I/O. If you added a load to the server that trashes the buffer pool (i.e. it needs so much data that it evicts the cached data pages from memory), the result would be a significant increase in CPU (sounds surprising, but it's true). The culprit is usually a new query that scans a big table end-to-end.
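A quick way to spot that situation is to watch the Page life expectancy counter collapse while CPU climbs. A minimal sketch (the counter name is the standard Buffer Manager one exposed by the DMV):
SELECT [object_name], counter_name, cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy';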

You can find some useful queries here:
Investigating the Cause of SQL Server High CPU
For me this helped a lot:
SELECT s.session_id,
       r.status,
       r.blocking_session_id AS 'Blk by',
       r.wait_type,
       wait_resource,
       r.wait_time / (1000 * 60) AS 'Wait M',
       r.cpu_time,
       r.logical_reads,
       r.reads,
       r.writes,
       r.total_elapsed_time / (1000 * 60) AS 'Elaps M',
       SUBSTRING(st.text, (r.statement_start_offset / 2) + 1,
                 ((CASE r.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE r.statement_end_offset
                   END - r.statement_start_offset) / 2) + 1) AS statement_text,
       COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) + N'.' +
                QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text,
       r.command,
       s.login_name,
       s.host_name,
       s.program_name,
       s.last_request_end_time,
       s.login_time,
       r.open_transaction_count
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_requests AS r
    ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id != @@SPID -- exclude the current session
ORDER BY r.cpu_time DESC;
The status, wait_type and cpu_time columns will show you the most CPU-consuming tasks that are running right now.

Run either of these a few seconds apart and you'll spot the high-CPU connection.
Or: store the CPU value in a local variable, WAITFOR DELAY, then compare the stored and current CPU values.
SELECT * FROM master..sysprocesses
WHERE status = 'runnable' -- comment this line out to see all sessions
ORDER BY cpu DESC

SELECT * FROM master..sysprocesses
ORDER BY cpu DESC
It may not be the most elegant approach, but it's effective and quick.
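A minimal sketch of the store-and-compare variant mentioned above (the table variable and the ten-second delay are just illustrative choices):
DECLARE @before TABLE (spid int, cpu int);
INSERT INTO @before (spid, cpu)
SELECT spid, cpu FROM master..sysprocesses;

WAITFOR DELAY '00:00:10'; -- let the workload accumulate CPU

SELECT p.spid, p.cpu - b.cpu AS cpu_delta, p.loginame, p.program_name
FROM master..sysprocesses AS p
JOIN @before AS b ON b.spid = p.spid
ORDER BY cpu_delta DESC;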

You can run the SQL Profiler, and filter by CPU or Duration so that you're excluding all the "small stuff". Then it should be a lot easier to determine if you have a problem like a specific stored proc that is running much longer than it should (could be a missing index or something).
Two caveats:
If the problem is massive amounts of tiny transactions, then the filter I describe above would exclude them, and you'd miss this.
Also, if the problem is a single, massive job (like an 8-hour analysis job or a poorly designed select that has to cross-join a billion rows) then you might not see this in the profiler until it is completely done, depending on what events you're profiling (sp:completed vs sp:statementcompleted).
But normally I start with the Activity Monitor or sp_who2.
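If you go the sp_who2 route, note that (although sp_who2 is undocumented) it accepts the same optional 'active' filter as sp_who, which cuts out idle sessions:
EXEC sp_who2 'active';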

For a GUI approach I would take a look at Activity Monitor under Management and sort by CPU.

Related

SQL Server 2019 instance seems to randomly dump its query plan cache

The instance in question has maximum server memory set to 6GB, but only seems to be using half a GB. I checked the query plan cache by using the query on this page:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-cached-plans-transact-sql?view=sql-server-ver16
SELECT usecounts, cacheobjtype, objtype, text
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE usecounts > 1
ORDER BY usecounts DESC;
GO
After running that, I only see about 3 plans. When I run the application that uses this database, sometimes there will be 300-400 plans, but about 30 seconds later the same query will only show about 3 plans in the cache.
I've run SQL Profiler and can't find anything running DBCC FREEPROCCACHE.
There are 3 other instances on this server that are consuming their allocated memory just fine. One in particular is allowed to eat 2GB and has consumed the entire amount with over 500 plans consistently in its cache.
Other than a scheduled task running DBCC FREEPROCCACHE every 30-60 seconds, is there anything that would cause SQL Server 2019 to behave in this way?
Multiple facets of SQL Server will 'compete' for buffer cache, including:
Data
Plans
Clerks (i.e., other caches)
Memory Grants
etc
The amount of space that Plans can consume is dictated by thresholds defined here:
https://learn.microsoft.com/en-us/previous-versions/tn-archive/cc293624(v=technet.10)
https://www.sqlskills.com/blogs/erin/sql-server-plan-cache-limits/
And, once plans start to exceed those thresholds, the SQLOS will begin to 'eagerly cleanup/clip/evict' less frequently used plans.
Likewise, if OTHER clerks (caches for things like schemas, objects, and permissions caches against those objects, i.e. TOKENPERMS) exceed certain internal cache thresholds, they TOO can cause the SQLOS to start scavenging ALL caches, including cached plans.
For example:
https://learn.microsoft.com/en-us/archive/blogs/psssql/query-performance-issues-associated-with-a-large-sized-security-cache
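To see how large the various clerks and caches actually are on an instance right now, something like the following helps (a hedged sketch; the pages_kb column requires SQL Server 2012 or later):
SELECT TOP (10) [type], name, SUM(pages_kb) / 1024 AS size_mb
FROM sys.dm_os_memory_clerks
GROUP BY [type], name
ORDER BY size_mb DESC;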
Likewise, memory grants can/will use buffer cache during query processing. For example, if you're querying a huge table and the engine expects to get back (or hang on to for further processing) roughly 1KB of data for each of 10 million rows, you're going to need potentially ~9GB of buffer space for said query to process. (Or, there are mechanics LIKE this in play with memory grants; the example I've cited is WAY too simplistic, to the point of not being even close to accurate.)
The point being, however, that these grants can/will be given RAM directly from the overall buffer cache and can/will cause INTERNAL memory pressure against the plan-cache (and all other caches for that matter).
In short, memory grants can be a huge problem with SOME workloads.
Otherwise, external factors (other apps, especially memory-hungry apps) can/will cause the OS to tell SQL Server to 'cough up' memory it has been using. (You can prevent this by granting the Lock Pages in Memory user right to the SQL Server service account; just be sure you know what you're doing here.)
In your case, with 4x distinct instances running, I'd assume you're likely running into 'external' memory pressure against the instance in question.
That said, you can query sys.dm_os_ring_buffers to get insight into whether or not memory pressure is happening - as per posts like the following:
https://learn.microsoft.com/en-us/archive/blogs/psssql/how-it-works-what-are-the-ring_buffer_resource_monitor-telling-me
https://learn.microsoft.com/en-us/archive/blogs/mvpawardprogram/using-sys-dm_os_ring_buffers-to-diagnose-memory-issues-in-sql-server
https://www.sqlskills.com/blogs/jonathan/identifying-external-memory-pressure-with-dm_os_ring_buffers-and-ring_buffer_resource_monitor/
Along those lines, I use the following query/diagnostic to check for memory pressure:
WITH core AS (
    SELECT
        EventTime,
        record.value('(/Record/ResourceMonitor/Notification)[1]', 'varchar(max)') AS [Type],
        record.value('(/Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [IndicatorsProcess],
        record.value('(/Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [IndicatorsSystem],
        record.value('(/Record/ResourceMonitor/IndicatorsPool)[1]', 'int') AS [IndicatorsPool],
        record.value('(/Record/MemoryNode/@id)[1]', 'int') AS [MemoryNode],
        record.value('(/Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [Avail Phys Mem, Kb],
        record.value('(/Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [Avail VAS, Kb],
        record
    FROM (
        SELECT
            DATEADD(ss, (-1 * ((cpu_ticks / CONVERT(float, (cpu_ticks / ms_ticks))) - [timestamp]) / 1000), GETDATE()) AS EventTime,
            CONVERT(xml, record) AS record
        FROM sys.dm_os_ring_buffers
        CROSS JOIN sys.dm_os_sys_info
        WHERE ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
    ) AS tab
)
SELECT
    EventTime,
    [Type],
    IndicatorsProcess,
    IndicatorsSystem,
    IndicatorsPool,
    MemoryNode,
    CAST([Avail Phys Mem, Kb] / (1024.0 * 1024.0) AS decimal(20,2)) AS [Avail Phys Mem (GB)],
    CAST([Avail VAS, Kb] / (1024.0 * 1024.0) AS decimal(20,2)) AS [Avail VAS (GB)],
    record
FROM core
WHERE [Type] = N'RESOURCE_MEMPHYSICAL_LOW'
ORDER BY EventTime DESC;
As in, if you run that against effectively ANY SQL Server instance, you REALLY don't want to see ANY results from this query. Or, if you do, they should be at times when you're running REALLY heavy workloads (ugly data-loading/population jobs or other huge processing operations) that you're already aware are issues/problems from a performance perspective.
Otherwise, the occasional entry/hiccup (i.e., set of results) isn't necessarily a reason to worry about major problems, but if you're routinely seeing entries/rows/results from the above with regular workloads, you'll want to investigate things like all of the details listed above (cache and clerk sizes/thresholds, trap for any large memory grants, check plan-cache sizing based on overall RAM, etc.) AND/OR start looking into cache clock hands to see exactly where memory is being scavenged:
https://learn.microsoft.com/en-us/archive/blogs/slavao/q-and-a-clock-hands-what-are-they-for
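For the clock-hands angle, sys.dm_os_memory_cache_clock_hands shows which caches are actively being swept. A hedged sketch (a climbing removed_all_rounds_count on a cache is the signal worth digging into):
SELECT name, [type], clock_hand, clock_status,
       rounds_count, removed_all_rounds_count
FROM sys.dm_os_memory_cache_clock_hands
WHERE rounds_count > 0
ORDER BY removed_all_rounds_count DESC;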

What statistical metrics are related to overall SQL Server database performance?

In SQL Server, I would like to know which statistical metrics, analogous to Oracle's 'SQL Service Response Time' or 'Response Time Per Txn', can evaluate overall database performance.
Please tell me the names of the statistical metrics and how to collect them using SQL.
SQL Server does not accumulate statistics about transactions, but execution stats are available for free in all editions for queries, procedures, triggers and UDFs, in DMVs like:
SELECT * FROM sys.dm_exec_query_stats;
SELECT * FROM sys.dm_exec_procedure_stats;
SELECT * FROM sys.dm_exec_trigger_stats;
SELECT * FROM sys.dm_exec_function_stats;
The metrics to consider are the following:
execution_count,
total_worker_time
total_elapsed_time
...
As an example, to get a mean execution time, divide the total time by the execution_count.
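For example (a hedged sketch; the *_time columns in these DMVs are in microseconds):
SELECT TOP (10)
    execution_count,
    total_worker_time / execution_count AS avg_cpu_microseconds,
    total_elapsed_time / execution_count AS avg_elapsed_microseconds,
    sql_handle
FROM sys.dm_exec_query_stats
ORDER BY avg_cpu_microseconds DESC;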
You're looking for Windows performance counters; there is a range of them. See this example:
https://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
These can be read by code.
This is a big topic, but if this is what you need, please describe what problem you want to address, as that dictates which part of Windows is interesting to that end.
Generally I look for:
batch requests per second
lock wait time
deadlocks
cache hit ratio
target / actual memory relation
available memory
context switches per second
CPU utilization
What you need to act on is values that drift away from the normal picture.
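Several of the counters in that list can be read straight from T-SQL (a hedged sketch; CPU utilization and context switches per second are OS-level counters, so those still need perfmon or WMI):
SELECT [object_name], counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Batch Requests/sec',
                       N'Lock Wait Time (ms)',
                       N'Number of Deadlocks/sec',
                       N'Buffer cache hit ratio',
                       N'Target Server Memory (KB)',
                       N'Total Server Memory (KB)');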

sqlserver.exe takes high CPU usage, more than 90%. In which scenarios does SQL Server take so much CPU?

I have SQL Server 2008 R2 on a server, and my issue is that its CPU usage reaches more than 90%. I just need to know:
In which cases or scenarios does SQL Server's CPU usage go so high?
In which cases or scenarios does SQL Server's CPU usage go so high?
This is a very broad question, but here are the things, in order, that I would check:
Are there any applications hosted on the same box other than SQL Server? If yes, try to move them off, as that doesn't fall under best practices.
Basic troubleshooting to start with, when SQL Server is using huge amounts of CPU, is to find the top CPU-consuming queries.
These can be found using the query below, or you can use Glenn Berry's DMV queries specific to the version you are using:
SELECT TOP 20
    qs.sql_handle,
    qs.execution_count,
    qs.total_worker_time AS Total_CPU,
    total_CPU_inSeconds = qs.total_worker_time / 1000000,           -- converted from microseconds
    average_CPU_inSeconds = (qs.total_worker_time / 1000000) / qs.execution_count,
    qs.total_elapsed_time,
    total_elapsed_time_inSeconds = qs.total_elapsed_time / 1000000, -- converted from microseconds
    st.text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;
The next step is to try to optimise those queries. Remember, a query can use high CPU when:
1. There are no appropriate indexes, so it has to read the entire table every time (a rough missing-index check is sketched below).
2. Indexes are there, but you are also facing memory pressure, which can cause the buffer pool to be flushed out.
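For the first point, the missing-index DMVs give a rough (and only rough) indicator of tables being scanned for lack of an index. A hedged sketch:
SELECT TOP (10)
    mid.statement AS table_name,
    mid.equality_columns, mid.inequality_columns, mid.included_columns,
    migs.user_seeks, migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;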
Also consider baselining the performance of SQL Server, which will help you in future and show whether you need more processors.

"Find top 5 queries" by date

How do I run the query below (from this MSDN article) to determine the top worst queries (by CPU time) but only for a set date?
-- Find top 5 queries
SELECT TOP 5
    query_stats.query_hash AS "Query Hash",
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
    MIN(query_stats.statement_text) AS "Statement Text"
FROM (
    SELECT QS.*,
           SUBSTRING(ST.text, (QS.statement_start_offset / 2) + 1,
                     ((CASE statement_end_offset
                           WHEN -1 THEN DATALENGTH(ST.text)
                           ELSE QS.statement_end_offset
                       END - QS.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS QS
    CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST
) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY 2 DESC;
GO
Our database has just gone under serious strain in the last day and we cannot figure out the source of the problem.
We are using Azure SQL Database.
It's not possible to get statistics per day from the DMVs. dm_exec_query_stats has the columns creation_time and last_execution_time, which can of course give you some idea of what has happened -- but those are only the first and last times that plan was used. The statistics will also be lost if the plan gets dropped out of the plan cache, so you might not have that plan and its statistics anymore if the situation is now better (and the "bad" plans have been replaced by better ones).
That query shows the average CPU used by the queries, so it's not the perfect query for solving performance problems: because it really is an average, something with a small execution count can rank high on the list even if it's really not a problem. I usually use total CPU and total logical reads for solving performance issues -- but those are total amounts since creation time, which might be a long time ago. In that case you might also consider dividing the numbers by the hours since creation time, so you get average CPU / I/O per hour. Also, looking at the max* columns might give some hints about the bad queries / plans.
If you have this kind of problem, it might be a good idea to schedule that SQL as a task and gather the results somewhere (a sketch follows). Then you can also use it as a baseline for comparing what has changed when the situation is bad. Of course, in that case (and probably also otherwise) you should most likely look at more than just the top 5.
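A hedged sketch of such a scheduled snapshot (the table name is illustrative; run the SELECT ... INTO once, then switch to INSERT ... SELECT on the schedule):
SELECT GETUTCDATE() AS capture_time,
       query_hash, plan_handle, sql_handle,
       statement_start_offset, statement_end_offset,
       execution_count, total_worker_time, total_logical_reads
INTO dbo.QueryStatsSnapshot -- first run only; INSERT INTO ... SELECT afterwards
FROM sys.dm_exec_query_stats;
Diffing two consecutive snapshots then gives per-interval numbers instead of totals since plan creation.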

Monitoring Cursors, what are some good queries/scripts to do this?

I need to provide management with proof that a group of existing stored procedures that use cursors are the cause of much of our performance issues. Can someone point me in the right direction to find scripts and queries to accomplish this, please? Such as, how to monitor and measure cursors, etc. Using SQL Server 2005.
Thanks.
========UPDATE============
Management needs ammunition to take back to 3rd party vendor to tell them to change their procs at little or no cost to us. Since these are 3rd party procs hitting our accounting system, I don't have any way of rewriting them first.
Besides traces (already doing that), are there any other things I can do? I've found that using sys.dm_exec_cursors(0) lets me get a quick list of existing cursors. Are there any other things like this?
So you did hard measurements and collected execution times and statistics showing that the problem procedures are the ones using the cursors, right? Then the collected information is an excellent argument to prove your case. If you did not... then how do you know it's the cursors?
Start by looking at sys.dm_exec_query_stats and collect the most expensive queries by worker time (CPU), elapsed time (duration) and by I/O. These should be enough to point to the culprit and find out if indeed, the problem is because of the cursors or not.
If the cursors turn out to be indeed an issue, there are dedicated DMVs for them too, sys.dm_exec_cursors
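For instance, a hedged sketch that lists the currently open cursors with their accumulated cost (column names per the sys.dm_exec_cursors documentation; passing 0 covers all sessions):
SELECT c.session_id, c.name, c.properties,
       c.creation_time, c.worker_time, c.reads, c.writes
FROM sys.dm_exec_cursors(0) AS c
ORDER BY c.worker_time DESC;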
Back to sys.dm_exec_query_stats: for example, the most expensive (by CPU) frequently executed statements:
SELECT TOP (10)
    SUBSTRING(text, statement_start_offset / 2,
              (statement_end_offset - statement_start_offset) / 2) AS Statement,
    *
FROM sys.dm_exec_query_stats AS q
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
WHERE execution_count > 100
ORDER BY total_worker_time / execution_count DESC;
The best thing (time permitting) would be to rewrite some of the procs as set-based statements and then compare the two with waits analysis (http://technet.microsoft.com/en-us/library/cc966413.aspx has a good paper about how to do this type of thing). Without a before-and-after, your adversaries might just say "set-based won't be any better :-)"
You can run SQL Profiler and capture a trace with the offending sprocs (this will give you important measures like Reads, CPU, Duration).
A good idea would be to take one of them as an example that's quite easy to rewrite as a set-based approach, run it, and capture the Profiler trace for that. This way, you can show real-world differences in performance.
If possible, (i.e. not on production), you should clear down the execution plan and data cache before running each version of the sproc to allow a fair comparison.
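On a non-production box, that clear-down is just the following (both flushes are server-wide, so never run them in production):
CHECKPOINT;             -- flush dirty pages so the buffer drop is clean
DBCC DROPCLEANBUFFERS;  -- empty the data cache
DBCC FREEPROCCACHE;     -- empty the plan cache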
Also, you could get the execution plans for the cursor version, and the set-based version.
At the end of the day, bottom-line stats speak for themselves so having a comparison "before" and "after" will be beneficial.
Performance monitor (perfmon.exe) is an excellent tool for real time analysis of SQL Server performance.
