I need to determine the workload of our database instance each week. The AWR report provides many details, but it's very hard to break the data down.
I need a query that produces a data set representing each SNAP_ID with the following values:
CPU utilization
Memory Utilization
Read/Write operations
Using this set I will be able to create a histogram that shows CPU, memory, and read/write utilization for each hour of the week.
You can try querying the DBA_HIST_SYSMETRIC_SUMMARY view to get the CPU utilization, memory utilization, and read/write operations at the SNAP_ID level.
A sample query is provided below:
select *
from DBA_HIST_SYSMETRIC_SUMMARY
where snap_id=<snap_id>
and metric_name in ('Host CPU Utilization (%)','I/O Megabytes per Second','I/O Requests per Second','Total PGA Allocated');
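Since you ultimately want an hourly breakdown across the week rather than one snapshot at a time, something along these lines may be closer to the goal. This is only a sketch: it relies on the BEGIN_TIME and AVERAGE columns of DBA_HIST_SYSMETRIC_SUMMARY and averages each metric per hour of day over the last 7 days, so add DBID/INSTANCE_NUMBER filters as appropriate for your environment:
select to_char(begin_time, 'HH24') as hour_of_day,
       metric_name,
       round(avg(average), 2) as avg_value
from DBA_HIST_SYSMETRIC_SUMMARY
where begin_time >= sysdate - 7   -- last week of snapshots
and metric_name in ('Host CPU Utilization (%)','I/O Megabytes per Second','I/O Requests per Second','Total PGA Allocated')
group by to_char(begin_time, 'HH24'), metric_name
order by hour_of_day, metric_name;
Each (hour, metric) pair then maps directly onto one bar of the histogram you described.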
The instance in question has maximum server memory set to 6GB, but only seems to be using half a GB. I checked the query plan cache by using the query on this page:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-cached-plans-transact-sql?view=sql-server-ver16
SELECT usecounts, cacheobjtype, objtype, text
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE usecounts > 1
ORDER BY usecounts DESC;
GO
After running that, I only see about 3 plans. When I run the application that uses this database, sometimes there will be 300-400 plans, but about 30 seconds later the same query will only show about 3 plans in the cache.
I've run SQL Profiler and can't find anything running DBCC FREEPROCCACHE.
There are 3 other instances on this server that are consuming their allocated memory just fine. One in particular is allowed to eat 2GB and has consumed the entire amount with over 500 plans consistently in its cache.
Other than a scheduled task running DBCC FREEPROCCACHE every 30-60 seconds, is there anything that would cause SQL Server 2019 to behave in this way?
Multiple facets of SQL Server will 'compete' for buffer cache, including:
Data
Plans
Clerks (i.e., other caches)
Memory Grants
etc.
The amount of space that Plans can consume is dictated by thresholds defined here:
https://learn.microsoft.com/en-us/previous-versions/tn-archive/cc293624(v=technet.10)
https://www.sqlskills.com/blogs/erin/sql-server-plan-cache-limits/
And, once plans start to exceed those thresholds, the SQLOS will begin to 'eagerly clean up/clip/evict' less frequently used plans.
Likewise, if OTHER clerks (caches for things like schemas, objects, and permissions against those objects - i.e., TOKENPERMS) exceed certain internal cache thresholds, they TOO can cause the SQLOS to start scavenging ALL caches - including the plan cache.
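If you want to eyeball how big each clerk/cache currently is, a quick (hedged) sketch against sys.dm_os_memory_clerks follows - the TOP value is just illustrative:
SELECT TOP (10)
    [type],
    name,
    SUM(pages_kb) / 1024 AS size_mb   -- total pages allocated to this clerk
FROM sys.dm_os_memory_clerks
GROUP BY [type], name
ORDER BY SUM(pages_kb) DESC;
CACHESTORE_SQLCP and CACHESTORE_OBJCP cover the plan cache, and USERSTORE_TOKENPERM is the security cache mentioned above.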
For example:
https://learn.microsoft.com/en-us/archive/blogs/psssql/query-performance-issues-associated-with-a-large-sized-security-cache
Likewise, Memory Grants can/will use buffer cache during query processing. For example, if you're querying a huge table and the engine expects to get back (or hang on to for further processing) roughly 1KB for each of 10 million rows, you're going to need potentially 9GB of buffer space for said query to process. (Or, there are mechanics LIKE this in play with memory grants - the example I've cited is WAY too simplistic, to the point of not being even close to accurate.)
The point being, however, that these grants can/will be given RAM directly from the overall buffer cache and can/will cause INTERNAL memory pressure against the plan-cache (and all other caches for that matter).
In short, memory grants can be a huge problem with SOME workloads.
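To see whether large grants are in play on your instance, a sketch along these lines against sys.dm_exec_query_memory_grants can help (run it while the workload is active):
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       grant_time          -- NULL means the request is still waiting on its grant
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;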
Otherwise, external factors (other apps - especially memory-hungry apps) can/will cause the OS to tell SQL Server to 'cough up' memory it has been using. (You can prevent this by granting the Lock Pages in Memory user right to the SQL Server service account - just be sure you know what you're doing here.)
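As an aside, on SQL Server 2016 SP1 and later you can verify whether Lock Pages in Memory is actually in effect:
SELECT sql_memory_model_desc   -- CONVENTIONAL = LPIM not in use; LOCK_PAGES = LPIM active
FROM sys.dm_os_sys_info;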
In your case, with 4x distinct instances running, I'd assume you're likely running into 'external' memory pressure against the instance in question.
That said, you can query sys.dm_os_ring_buffers to get insight into whether or not memory pressure is happening - as per posts like the following:
https://learn.microsoft.com/en-us/archive/blogs/psssql/how-it-works-what-are-the-ring_buffer_resource_monitor-telling-me
https://learn.microsoft.com/en-us/archive/blogs/mvpawardprogram/using-sys-dm_os_ring_buffers-to-diagnose-memory-issues-in-sql-server
https://www.sqlskills.com/blogs/jonathan/identifying-external-memory-pressure-with-dm_os_ring_buffers-and-ring_buffer_resource_monitor/
Along those lines, I use the following query/diagnostic to check for memory pressure:
WITH core AS (
SELECT
EventTime,
record.value('(/Record/ResourceMonitor/Notification)[1]', 'varchar(max)') as [Type],
record.value('(/Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') as [IndicatorsProcess],
record.value('(/Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') as [IndicatorsSystem],
record.value('(/Record/ResourceMonitor/IndicatorsPool)[1]', 'int') as [IndicatorsPool],
record.value('(/Record/MemoryNode/@id)[1]', 'int') as [MemoryNode],
record.value('(/Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [Avail Phys Mem, Kb],
record.value('(/Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [Avail VAS, Kb],
record
FROM (
SELECT
DATEADD (ss, (-1 * ((cpu_ticks / CONVERT (float, ( cpu_ticks / ms_ticks ))) - [timestamp])/1000), GETDATE()) AS EventTime,
CONVERT (xml, record) AS record
FROM sys.dm_os_ring_buffers
CROSS JOIN sys.dm_os_sys_info
WHERE ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR') AS tab
)
SELECT
EventTime,
[Type],
IndicatorsProcess,
IndicatorsSystem,
IndicatorsPool,
MemoryNode,
CAST([Avail Phys Mem, Kb] / (1024.0 * 1024.0) AS decimal(20,2)) [Avail Phys Mem (GB)],
CAST([Avail VAS, Kb] / (1024.0 * 1024.0) AS decimal(20,2)) [Avail VAS (GB)]
,record
FROM
core
WHERE
[Type] = N'RESOURCE_MEMPHYSICAL_LOW'
ORDER BY
EventTime DESC;
As in, if you run that against effectively ANY SQL Server instance, you REALLY don't want to see ANY results from this query. Or, if you do, they should be at times when you're running REALLY heavy workloads (ugly data-loading/population jobs or other huge processing operations) that you're already aware are issues/problems from a performance perspective.
Otherwise, the occasional entry/hiccup (i.e., set of results) isn't necessarily a reason to worry about major problems. But if you're routinely seeing rows from the above with regular workloads, you'll want to investigate all of the details listed above (cache and clerk sizes/thresholds, trapping for any large memory grants, checking plan-cache sizing based on overall RAM, etc.) AND/OR start looking into cache clock hands to see exactly where memory is being scavenged:
https://learn.microsoft.com/en-us/archive/blogs/slavao/q-and-a-clock-hands-what-are-they-for
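For the clock-hands angle, here is a starting-point sketch - caches whose clock hands show large removed_all_rounds_count values are the ones being actively scavenged:
SELECT name,
       [type],
       clock_hand,                 -- HAND_EXTERNAL = global memory pressure; HAND_INTERNAL = this cache alone
       rounds_count,
       removed_all_rounds_count    -- entries evicted across all sweeps
FROM sys.dm_os_memory_cache_clock_hands
WHERE rounds_count > 0
ORDER BY removed_all_rounds_count DESC;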
I'm benchmarking comparable (2 vCPU, 2GB RAM) servers (Ubuntu 18.04) from DigitalOcean (DO) and AWS EC2 (t3a.small).
The disk benchmark (fio) is in line with the results of https://dzone.com/articles/iops-benchmarking-disk-io-aws-vs-digitalocean
In summary:
DO --
READ: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=3070MiB (3219MB), run=14060-14060msec
WRITE: bw=72.0MiB/s (76.5MB/s), 72.0MiB/s-72.0MiB/s (76.5MB/s-76.5MB/s), io=1026MiB (1076MB), run=14060-14060msec
EC2 --
READ: bw=9015KiB/s (9232kB/s), 9015KiB/s-9015KiB/s (9232kB/s-9232kB/s), io=3070MiB (3219MB), run=348703-348703msec
WRITE: bw=3013KiB/s (3085kB/s), 3013KiB/s-3013KiB/s (3085kB/s-3085kB/s), io=1026MiB (1076MB), run=348703-348703msec
which shows the DO disk to be more than 10 times faster than EC2's EBS volume.
However, sysbench (following https://severalnines.com/database-blog/how-benchmark-postgresql-performance-using-sysbench) shows DO slower than EC2 (using the PostgreSQL 11 default configuration, read-write test on oltp_legacy/oltp.lua):
DO --
transactions: 14704 (243.87 per sec.)
Latency (ms):
min: 9.06
avg: 261.77
max: 2114.04
95th percentile: 383.33
EC2 --
transactions: 20298 (336.91 per sec.)
Latency (ms):
min: 5.85
avg: 189.47
max: 961.27
95th percentile: 215.44
What could be the explanation?
Sequential read/write throughput matters for large sequential scans, stuff like data warehousing, loading a large backup, etc.
Your benchmark is OLTP, which does lots of small, quick queries. For this, sequential throughput is irrelevant.
For reads (SELECTs) the most important factor is having enough RAM to keep your working set in cache and not do any actual IO. Failing that, it is random-access read latency.
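If you want to check whether the working set actually fits in cache during the run, one rough indicator on the PostgreSQL side is the buffer hit ratio per database - a hedged sketch:
SELECT datname,
       round(blks_hit * 100.0 / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY cache_hit_pct;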
For writes (UPDATE, INSERT), the most important factor is fsync latency, the time required to commit data to stable storage, since the database will only finish a COMMIT once the data has been written.
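You can see the fsync effect directly from SQL: with synchronous_commit on (the default), each COMMIT waits for the WAL flush, and turning it off removes that wait at the cost of possibly losing the last few transactions after a crash. A minimal illustration (table name made up; time each statement, e.g. with \timing in psql):
CREATE TABLE fsync_demo (id int);
INSERT INTO fsync_demo VALUES (1);   -- COMMIT waits for the WAL fsync
SET synchronous_commit = off;        -- session-level durability trade-off
INSERT INTO fsync_demo VALUES (2);   -- returns before the WAL is flushed
Running the same pair on both providers would show how much of the per-transaction latency is fsync.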
Most likely the EC2 has better random access and fsync performance. Maybe it uses SSDs or battery-backed cache.
Sequential bandwidth and latency/IOPS are independent parameters.
Some workloads (like databases) depend on latency for lots of small IOs, or on throughput for lots of small IO operations, i.e. IOPS (IOs per second).
In addition to IOPS vs. throughput, which others mentioned, I also wanted to point out that the two results are pretty similar numbers: 240 tps vs. 330 tps. You could add or subtract almost that much just by doing things like vacuum or analyze, or by letting the system sit there for a while.
There could be other factors too: CPU speed could be different, there could be one level of performance for short bursts vs. throttling for a heavy user, there could be the presence or absence of huge_pages, different cache timings, memory speeds, or different NVMe drivers. The point is that 240 is not as much less than 330 as you might think.
Update: something else to point out is that OLTP read/write transactions aren't necessarily bottlenecked by disk performance. If you have sync off, then it really isn't.
I don't know exactly what the sysbench legacy OLTP read-write test is doing, but I suspect it's more like a bank transaction touching multiple records, using indexes, etc. It's probably not some sort of raw max insertion rate or max CRUD operation rate benchmark.
I get 1000 tps on my desktop in the write-heavy benchmark against pg13, but I can insert something like 50k records per second (each being ~100 bytes) from just a single-process Python client during bulk loads, and nearly 100k with sync off.
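For a feel of the difference between per-transaction commits and a bulk load, a minimal sketch (table and sizes made up) - a single multi-row INSERT commits once, so the fsync cost is paid once rather than 50,000 times:
CREATE TABLE bulk_demo (id int, payload text);
-- 50k rows of ~100 bytes each, loaded in one statement/transaction:
INSERT INTO bulk_demo
SELECT g, repeat('x', 100)
FROM generate_series(1, 50000) AS g;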
I want to calculate the page life expectancy of my SQL Server.
If I query the PLE with the following query I get the value 46.000:
SELECT [object_name],
       [counter_name],
       [cntr_value]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Manager%'
AND [counter_name] = 'Page life expectancy'
I think this value isn't the final value because of the high amount. Do I have to calculate this value with a specific formula?
Although some counters reported by sys.dm_os_performance_counters are cumulative, PLE reflects the current value so no calculation is necessary.
As to whether the value of 46 seconds is a cause for concern depends much on the workload and storage system. This value would be a concern on a high-volume OLTP system with local spinning disk media, due to the multi-millisecond latency incurred for each physical IO and IOPS of roughly 200 per spindle. Conversely, the same workload with high-performance local SSD may be fine because the storage is capable of well over 100K IOPS.
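As a side note, on NUMA hardware the Buffer Manager value is derived from per-node values, so it can be worth listing both; a small sketch (object names carry an instance prefix on named instances):
SELECT [object_name],     -- '...Buffer Manager' overall, '...Buffer Node' per NUMA node
       instance_name,
       cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE [counter_name] = 'Page life expectancy';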
There are two kinds of queries that I ran:
1. A purposely introduced query that performs sorting (ORDER BY) on about 10 columns. This uses CPU, since sorting is a CPU-intensive operation.
The scenario involved a query which took 30 seconds; about 100 of those ran in parallel over simultaneous connections against 100 different tables. CPU usage on a 32-core machine was about 85% on all 32 cores.
2. Inserting a million rows into a table.
I don't understand why this would consume CPU, since it is purely disk I/O. But I inserted 1 million rows into a single table using 100 simultaneous connections/threads, with no indexes on those tables. Now, INSERT is not the fastest way to load data, but the point here is that it consumed about 32% CPU on about 10 cores. This is way less than the above, but I am still curious.
I could be wrong because WAL archiving and query logging were on - do these contribute to CPU? I am assuming not, since those are also disk I/O.
There was no other process/application running/installed on this machine other than postgres.
Many different things:
CPU time for query planning and the logic in the executor for query execution
Transforming text representations of tuples into their on-disk format. Parsing dates, and so on.
Log output
Processing the transaction logs
Writing to shared_buffers when inserting pages, and scanning shared_buffers for pages to write out
Interprocess communication for lock management
Scanning through in-memory cached copies of indexes when checking uniqueness, inserting new keys in an index, etc
....
If you really want to know the juicy details, fire up perf with stack traces enabled to see where CPU time is spent.
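Short of perf, a coarser SQL-level view is available from EXPLAIN: ANALYZE separates planning time from execution time, and BUFFERS shows how much of the work touched shared_buffers. A hypothetical sketch:
CREATE TABLE insert_cpu_demo (id int, created text);
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO insert_cpu_demo
SELECT g, now()::text
FROM generate_series(1, 1000000) AS g;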
If your table had a primary key, then it has an implicit index.
It may also be true that if the table had a primary key, then it would be stored as a b-tree and not a simple flat table; I'm not clear on this point since my postgres-fu has weakened over the years, but many DBMSes use the primary key as a default clustering key for a b-tree and just store everything in the b-tree. Managing that b-tree requires plenty of CPU.
Additionally, if you're inserting from 100 threads and connections, then Postgres has to perform locking in order to keep internal data structures consistent. Fighting for locks can consume a ton of CPU, and is especially difficult to do efficiently on machines with many CPUs - acquiring a single mutex requires the cooperation of every CPU in the system via the cache coherency protocol.
You may want to experiment with different numbers of threads, while measuring overall runtime and cpu usage - you may find that with, say, 8 threads, the total CPU utilized is 1/10th of your current usage, but still gets the job done within 110-150% of the original time. This would be a sure sign that lock contention is killing your CPU usage.
From a performance tuning perspective which one is more important?
Say a query reports 30 scans and 148 logical reads on a table with about 2 million records.
A modified version of the same query reports 1 scan with 1400 logical reads. The second query takes about 40ms less CPU time to execute. Is the second query better?
I think so and this is my thesis:
In the first case, we have a high number of scans on a very large table. This is costly in CPU and server memory, since all the rows in the table have to be loaded into memory. Executing such a query thousands of times will be taxing on server resources.
In the second case, we have fewer scans even though we are accumulating a higher number of logical reads. Since logical reads effectively correspond to the number of pages being read from cache, the bottleneck here will be network bandwidth in getting the results back to the client. The actual work SQL Server has to do in this case is less.
What are your thoughts?
The logical read metrics are mostly irrelevant. You care about time elapsed, CPU time spent and disk resources used. Why would you care about logical reads? They are accounted for by looking at CPU time.
If you want your query to go faster measure wall clock time. If you want to use less resources measure CPU and physical IO.
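A simple sketch of how to compare the two variants on those terms directly in SSMS:
SET STATISTICS TIME, IO ON;
-- Run each query variant here, then compare the CPU time, elapsed time,
-- and physical reads reported in the Messages tab.
SET STATISTICS TIME, IO OFF;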