What statistical metrics are related to overall SQL Server database performance? - sql-server

In SQL Server, I would like to know what related statistical metrics such as Oracle's 'SQL Service Response Time' or 'Response Time Per Txn' can evaluate the overall database performance.
Please tell me the name of the statistical metrics and how to collect it using sql .

SQL Server does not accumulate statistics about transactions, but stats of execution are available for free in all editions for queries, procedures, triggers and UDF in DMV like :
SELECT * FROM sys.dm_exec_query_stats;
SELECT * FROM sys.dm_exec_procedure_stats;
SELECT * FROM sys.dm_exec_trigger_stats;
SELECT * FROM sys.dm_exec_function_stats;
The metrics to consider are the followings :
execution_count,
total_worker_time
total_elapsed_time
...
As an example, to have a mean exec time, you must divide the total time by the execution_count

You're looking for Windows Performance counters, there are a range of them, see example:
https://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
These can be read by code.
this is a big topic, but if this is what you need, please describe what problem you want to address as it dictates which part of windows is interesting to that end.
Generally i look for:
batch requests per second
lock wait time
deadlocks
cache hit ratio
target/ actual memory relation
available memory
context switches per second
CPU utilization
what we need to act on is the values changing away from normal picture.

Related

Detecting/Monitoring for parameter sniffing problems

Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
I have just got hit with a parameter sniffing problem. (It wasn't too serious as it caused a report to take about 2 minutes to run instead of a few seconds if properly cached and maybe 30 seconds if recompiled. And since the report is usually only run a few times per month, it is not really a problem).
However, since I wrote the report and I knew what it did, I was curious and went investigating and using SQL Profiler, I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
So, it struck me, that if SQL has these figures, (or at least can get these figures), that perhaps there is some way of getting sql to track and report which plans were significantly out.
You've got a couple of questions in there:
Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
To catch this, you need to monitor the procedure cache to find out when a query's execution plan changes from good to bad. SQL Server 2008 made this a lot easier by adding query_hash and query_plan_hash fields to sys.dm_exec_query_stats. You can compare the current query plan to past ones for the same query_hash, and when it changes, compare the number of logical reads or amount of worker time from the old query to the new one. If it skyrockets, you might have a parameter sniffing problem.
Then again, someone might have just eliminated an index or changed the code in a UDF that's being called or a change in MAXDOP or any one of a million settings that influence query plan behavior.
What you want is a single dashboard that shows the most resource-consuming queries in aggregate (because you might have this problem on a query that's called extremely frequently, but consumes tiny amounts of resources each time) and then shows you changes in its execution plan over time, plus lays over system and database level changes. Quest Foglight Performance Analysis does this. (I used to work for Quest, so I know the product, but I'm not shilling here.) Note that Quest sells a separate product, Foglight, that has nothing to do with Performance Analysis. I'm not aware of any other product that goes into this level of detail.
I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
That's not necessarily parameter sniffing - that could be bad stats or table variable usage, for example. To catch this kind of issue, I like the free SQL Sentry Plan Advisor tool. In the Top Operations tab, it highlights variances between estimated and actual rows.
Now, that's only for one plan at a time, and you have to know the plan first. You want to do this 24/7, right? Sure you do - but it's computationally intensive. The procedure cache can be huge (I've got clients with >100GB of procedure cache), and it's all unindexed XML. To compare estimated vs actual rows, you have to shred all that XML - and keep in mind that the procedure cache can be constantly changing under load.
What you really want is a product that could very rapidly dump the entire procedure cache into a database, throw XML indexes on it, and then compare estimates versus actual rows. I can imagine a script doing that, but I haven't seen one yet.
You said
"estimated rows was 1, but the actual number of rows was several hundred thousand."
This can be caused by table variables which don't have statistics.
To detect parameter sniffing is difficult but you can verify it is happening by running sp_updatestats. If the problems disappears it's most likely parameter sniffing. If it doesn't then you have other problems, such as too large table variables
We use parameter masking consistently now (system was developed on SQL Server 2000). We don't need it 99.9+ % of the time but the < 0.1% justifies it because of user confidence + support overhead it entails.
You can set up a trace that to record the query text of all batches / stored procedures run that have duration > Ns.
You obviously need to tailor N for your system (and probably add rules to exclude batch jobs that take a long time even during normal execution), but this should identify which queries offer the poorest performance and will also record any queries (along with their parameters) which have abnormally long execution times - potentially the result of a parameter sniffing problem.
See How to create a SQL trace using T-SQL on how to create a trace using T-SQL. This will give better performance than using SQL Profiler as this only captures the events that you set trace events for (SQL Profiler reportedly captures all events and then filters them in the application).

What is the best way to compare 2 variants of a SQL query for performance?

I've got a SQL 2005 DB running under a virtual environment.
To simplify things, let's say I have two SQL SELECT Queries. They both do the exact same thing. But I'm trying to analyze them for performance purposes.
Generally, I'd fire up a local DB, load up some data and using timing to compare one variant to other variants.
But in this case, since the DB is large and it's a testbox, the client has placed it on a host that's serving other VM's as well.
The DB is too large to pull down locally, so that's out (at least for now).
But my main issue is that when I run queries against the server, the timing is all over the place. I can run the +exact+ same query 4 times and get timings of 7secs, 8 minutes, 3:45min and 15min.
My first thought was use SET STATISTICS IO ON.
But, that yields basically read and write stats on the tables being queries, which, depending on the variations in the queries (temp tables, vs views, vs joins, etc) can't really be accurately compared, except in aggregate.
I then though of SET STATISTICS TIME ON, and just using the CPU time, but that seems to discount all the IO, which also doesn't make for a good baseline.
My question is is there any other statistic or performance analysis technique that could be useful in a situation like this?
The STATISTICS IO information will still be useful. You may see significantly different numbers of reads, writes and scans that will make it obvious which query is better.
You can also view Execution Plan information for each query. You can select Query -> Display Estimated Execution Plan to see a graphical presentation of the SQL Server estimate to run the query. You can also use the Query -> Include Actual Execution Plan to show the actual plan used.
And, you can also use SET SHOWPLAN_TEXT, SET SHOWPLAN_ALL or SET SHOWPLAN_XML to include the execution plan to view a textual display of the plan.
When viewing the results of the execution plan, you can look at the estimated cost value and compare the values for each query. The estimated cost is a relative value that can be used to compare the cost of each option.

SQL Queries, execution plans and "Parallelism"

So I'm (still) going through some slow legacy sql views used to do calculate some averages and standarddeviations on a (sometimes) large set of data. What I end up with are views joining views joining views etc.
So I though I would review the execution plan for my query. And it immediately suggested a missing index, which I then implemented. But it's still unbearably slow (so slow it times out the VB6 app querying it for data ;) )
So upon studying the execution plan further, I see that what costs the most (about 8% each in my case) are "Paralellism" cases. Mostly "Distribute Streams" and "Repartition Streams". What are these?
Distribute Streams and Repartion Streams are operations that occur when the SQL optimizer chooses to use Parallel Query Processing. If you suspect that this is causing an issue with your query, you can force SQL Server to only use one CPU with the MAXDOP query hint, as illustrated below.
select *
from sys.tables
option (maxdop 1)

How can I tell *IF* AND *WHAT* partitions are being accessed - sql server 2008 db engine

Setup
Cost of Threshold for Parallelism : 5
Max Degree of Parallelism : 4
Number of Processors : 8
SQL Server 2008 10.0.2.2757
I have a query with many joins, many records.
The design is a star. ( Central table with fks to the reference tables )
The central table is partitioned on the relevant date column.
The partition schema is split by days
The data is very well split across the partition schema - as judged by comparing the sizes of the files in the filegroups assigned to the partition schema
Queries involved have the predicate set over the partitioned column. such as ( cs.dte >= #min_date and cs.dte < #max_date )
The values of the date parameters are a day apart # midnight so, 2010-02-01, 2010-02-02
The estimated query plan shows no parallelism
a) This question is in regards to Sql Server 2008 Database Engine. When a query in the OLTP engine is running, I would like to see / have the sort of insight one gets when profiling an SSAS Query using Progress End event - where one sees something like "Done reading PartititionXYZ".
b) if the estimated query plan or the actual query plan shows no parallel processing does that mean that all partitions will be / were checked / read? * What I was trying to say here was - just b/c I don't see parallelism in a query plan, that doesn't guarantee the query isn't hitting multiple partitions - right? Or - is there a solid relationship between parallelism and # partitions accessed?
c) suggestions? Is there more information that I need to provide?
d) how can I tell if a query is processing in parallel w/o looking # the actual query plan? * I'm really only interested in this if it is helpful in pinning down what partitions are being used.
Added Nov 10
Try this:
Create querys that should hit 1, 3, and all your partitions
Open an SSMS query window, and run SET SHOWPLAN_XML ON
Run each query one by one in that window
Each run will kick out a chunk of XML
Compare these XML results (I use a text diff tool, “CompareIt”, but any similar tool would do)
You should see that the execution plans are significantly different. In my “3” and “All” querys, there’s a chunk of text tagged as “ConstantScan” that has an entry for (respectively) 3 and All partitions in the table, and that section is not present for the “1 partition” query. I use this to infer that yes indeed, SQL doing what it says it will do, to wit: only read as much of the table as it believes it needs to in order to resovle the query.
Got a pretty good answer here: http://www.sqlservercentral.com/Forums/Topic1064946-391-1.aspx#bm1065048
a) I am not aware of any way to determine how a query has progressed while the query is still running. Maybe something finicky with the latching and locking system views, but I doubt it. (I am, alas, not familiar enough with SSAS to draw parallels between the two.)
b) SQL will probably use parallelism when working with multiple partitions within a single table, in which case you will see parallel processing "tokens" in your query plan. However, if for whatever reason parallelism is not invoked yet multiple partitions must be read, they will be read without the use of parallelism.
d) Another thing that perhaps cannot be done. Under very controlled cirsumstances, you could use System Monitor (Perfmon) to track CPU usage or perhaps disk reads during the execution of they query. This won't help if the server is performing other work, or the data is resident in memory (the buffer cache), and so may be of limited use.
c) What is it you are actually trying to figure out? Which partitions (if any) are being accessed by users over a period of time? Is SQL generating a "smart" query plan? Without details of the data, structure, and query, it's hard to come up with advice.

How often should Oracle database statistics be run?

In your experience, how often should Oracle database statistics be run? Our team of developers recently discovered that statistics hadn't been run our production box in over 2 1/2 months. That sounds like a long time to me, but I'm not a DBA.
Since Oracle 11g statistics are gathered automatically by default.
Two Scheduler windows are predefined upon installation of Oracle Database:
WEEKNIGHT_WINDOW starts at 10 p.m. and ends at 6 a.m. every Monday
through Friday.
WEEKEND_WINDOW covers whole days Saturday and Sunday.
When statistics were last gathered?
SELECT owner, table_name, last_analyzed FROM all_tables ORDER BY last_analyzed DESC NULLS LAST; --Tables.
SELECT owner, index_name, last_analyzed FROM all_indexes ORDER BY last_analyzed DESC NULLS LAST; -- Indexes.
Status of automated statistics gathering?
SELECT * FROM dba_autotask_client WHERE client_name = 'auto optimizer stats collection';
Windows Groups?
SELECT window_group_name, window_name FROM dba_scheduler_wingroup_members;
Window Schedules?
SELECT window_name, start_time, duration FROM dba_autotask_schedule;
Manually gather Database Statistics in this Schema:
EXEC dbms_stats.gather_schema_stats(ownname=>NULL, cascade=>TRUE); -- cascade=>TRUE means include Table Indexes too.
Manually gather Database Statistics in all Schemas!
-- Probably need to CONNECT / AS SYSDBA
EXEC dbms_stats.gather_database_stats;
Whenever the data changes "significantly".
If a table goes from 1 row to 200 rows, that's a significant change. When a table goes from 100,000 rows to 150,000 rows, that's not a terribly significant change. When a table goes from 1000 rows all with identical values in commonly-queried column X to 1000 rows with nearly unique values in column X, that's a significant change.
Statistics store information about item counts and relative frequencies -- things that will let it "guess" at how many rows will match a given criteria. When it guesses wrong, the optimizer can pick a very suboptimal query plan.
At my last job we ran statistics once a week. If I remember correctly, we scheduled them on a Thursday night, and on Friday the DBAs were very careful to monitor the longest running queries for anything unexpected. (Friday was picked because it was often just after a code release, and tended to be a fairly low traffic day.) When they saw a bad query they would find a better query plan and save that one so it wouldn't change again unexpectedly. (Oracle has tools to do this for you automatically, you tell it the query to optimize and it does.)
Many organizations avoid running statistics out of fear of bad query plans popping up unexpectedly. But this usually means that their query plans get worse and worse over time. And when they do run statistics then they encounter a number of problems. The resulting scramble to fix those issues confirms their fears about the dangers of running statistics. But if they ran statistics regularly, used the monitoring tools as they are supposed to, and fixed issues as they came up then they would have fewer headaches, and they wouldn't encounter them all at once.
What Oracle version are you using? Check this page which refers to Oracle 10:
http://www.acs.ilstu.edu/docs/Oracle/server.101/b10752/stats.htm
It says:
The recommended approach to gathering statistics is to allow Oracle to automatically gather the statistics. Oracle gathers statistics on all database objects automatically and maintains those statistics in a regularly-scheduled maintenance job.
When I was managing a large multi-user planning system backed by Oracle, our DBA had a weekly job that gathered statistics. Also, when we rolled out a significant change that could affect or be affected by statistics, we would force the job to run out of cycle to get things caught up.
With 10g and higher version of oracle, up to date statistics on tables and indexes are needed by the optimizer to make "good" execution plan decision. How often you collect statistics is a tricky call. It depends on your application, schema, data rate and business practice. Some third party apps which are written to be backward compatible with older version of oracle do not perform well with the new optimizer. Those application require that tables have no stats so that the db resorts back to rule base execution plan. But on the average oracle recommends that stats be collected on tables with stale statistics. You can set tables to be monitor and check their state and have them analyze if/when stale. Often that is enough, sometime it is not. It really depend on your database. For my database we have a set of OLTP tables that need nightly stats collection to maintain performance. Other tables are analyze once a week. On our large dw database, we analyze as needed as the tables are too large for regular analysis without affecting overall db load and performance. So the correct answer is, it depends on the application, data change and business needs.
Make sure to balance the risk that fresh statistics cause undesirable changes to query plans against the risk that stale statistics can themselves cause query plans to change.
Imagine you have a bug database with a table ISSUE and a column CREATE_DATE where the values in the column increase more or less monotonically. Now, assume that there is a histogram on this column that tells Oracle that the values for this column are uniformly distributed between January 1, 2008 and September 17, 2008. This makes it possible for the optimizer to reasonably estimate the number of rows that would be returned if you were looking for all issues created last week (i.e. September 7 - 13). If the application continues to be used and the statistics are never updated, though, this histogram will be less and less accurate. So the optimizer will expect queries for "issues created last week" to be less and less accurate over time and may eventually cause Oracle to change the query plan negatively.
In the case of a data warehouse-type system you can consider collecting no statistics at all, and relying on dynamic sampling (setting optimizer_dynamic_sampling to level 2 or above).
Generally it's not recommended to gather statistics so frequent on the whole database unless you have a strong justification for that, such as a bulk insert or big data change happen frequently on the database.
gathering statistics on the database in this frequency MAY change the queries execution plan to a new poor execution plans, the thing may cost you much time trying to tune every query affected by the new poor plans, this is why you should test the impact of gathering new statistics on a test database, or in case you don't have the time or the man power for that, at least you should keep a fallback plan by backing up the original statics before you gather new ones, so in case you gather a new statistics and then the queries didn't perform as expected, you can easily restore back the original statistics.
There is a very useful script can help you backup original statistics and gather new ones and provide you with SQL command you can use to restore back the original statics in case the thing didn't go as expected after gathering new statistics. You can find the script in this link:
http://dba-tips.blogspot.com/2014/09/script-to-ease-gathering-statistics-on.html

Resources