I manage 25 SQL Server databases. All 25 databases are configured to "Auto Update Statistics". A few of these databases are 250+ GB and contain tables with 2+ billion records. The "Auto Update Statistics" setting is not sufficient to effectively keep the larger database statistics updated. I created a nightly job to update stats for all databases and tables with fullscan. This fixed our performance issues initially, but now the job is taking too long (7 hours).
How can I determine which tables need a full scan statistics update? Can I use a value from sys.dm_db_index_usage_stats or some other DMV?
Using SQL Server 2019 (version 15.0.2080.9); the compatibility level of the databases is SQL Server 2016 (130).
As of SQL Server 2016+ (database compatibility level 130+), the main formula used to decide whether stats need updating is: MIN(500 + (0.20 * n), SQRT(1,000 * n)), where n is the number of rows in the table/index in question. You then compare the result of the formula to how many rows have been modified since the statistic was last updated. That's found at either sys.sysindexes.rowmodctr or sys.dm_db_stats_properties(...).modification_counter (they report essentially the same value).
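Here's a sketch of how you might check that threshold yourself, per statistic, in the database in question (illustrative only, not a canned solution; verify it against your build before relying on it):

SELECT  OBJECT_SCHEMA_NAME(s.object_id) + '.' + OBJECT_NAME(s.object_id) AS table_name,
        s.name                  AS stats_name,
        sp.rows,
        sp.modification_counter,
        sp.last_updated,
        -- MIN(500 + 0.20 * n, SQRT(1000 * n))
        threshold = (SELECT MIN(v)
                     FROM (VALUES (500 + 0.20 * sp.rows),
                                  (SQRT(1000.0 * sp.rows))) AS t(v))
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   OBJECTPROPERTY(s.object_id, 'IsMSShipped') = 0
  AND   sp.rows > 0
  AND   sp.modification_counter >= (SELECT MIN(v)
                                    FROM (VALUES (500 + 0.20 * sp.rows),
                                                 (SQRT(1000.0 * sp.rows))) AS t(v))
ORDER BY sp.modification_counter DESC;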
Ola's scripts also use this formula internally, but you can use the StatisticsModificationLevel parameter to be more aggressive if you want (as Erin Stellato suggests). The main reason people like Erin give for being more aggressive is if you know your tables have a lot of skew or churn.
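For example, a statistics-only run might look something like this (parameter names are from Ola Hallengren's documented MaintenanceSolution; double-check them against the version you have installed):

EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @FragmentationLow = NULL,              -- skip index maintenance entirely,
    @FragmentationMedium = NULL,           -- update statistics only
    @FragmentationHigh = NULL,
    @UpdateStatistics = 'ALL',
    @StatisticsModificationLevel = 5;      -- update once at least 5% of rows have changed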
If you find your problem is that a filtered index isn't getting updated automatically, be aware of a long-standing issue that could be the cause.
However, ultimately, I believe the reason you have a performance problem with your nightly statistics job is that you're blindly updating all statistics for every table. It's better to update only the statistics that need it, especially since updating everything can cause an IO storm.
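A sketch of that targeted approach, building on the threshold query above: generate UPDATE STATISTICS ... WITH FULLSCAN commands only for the statistics that have crossed the threshold, then execute them (again, illustrative only; test it before scheduling it):

DECLARE @sql nvarchar(max) = N'';

SELECT @sql = @sql
    + N'UPDATE STATISTICS '
    + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + N'.'
    + QUOTENAME(OBJECT_NAME(s.object_id)) + N' '
    + QUOTENAME(s.name) + N' WITH FULLSCAN;' + NCHAR(10)
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   OBJECTPROPERTY(s.object_id, 'IsMSShipped') = 0
  AND   sp.rows > 0
  AND   sp.modification_counter >= (SELECT MIN(v)
                                    FROM (VALUES (500 + 0.20 * sp.rows),
                                                 (SQRT(1000.0 * sp.rows))) AS t(v));

EXEC sys.sp_executesql @sql;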
I use SSMS 2016. I have a view that has a few million records. The view is not indexed, and it shouldn't be, because it is updated (insert, delete, update) every 5 minutes by a job on the server so that it can serve up-to-date data sets to the client application's GUI.
The view does a very heavy volume of conversions of INT values to VARCHAR, appending some string values to them.
The view also does some CAST operations on NULLs, assigning them column aliases. And the worst performance hit is that the view uses FOR XML PATH('') on 20 columns.
The view also uses two CTEs as the source, as well as subqueries to define single column values.
I made sure I created the right indexes (clustered, nonclustered, composite, and covering) on the columns used in the view's SELECT, JOIN, and WHERE clauses.
The Database Engine Tuning Advisor also has not suggested anything that could substantially improve performance.
As a workaround, I decided to create two identical physical tables with clustered indexes on each, and to keep them updated using a MERGE statement (further converted into a stored procedure and then into a SQL Server Agent job). To make sure there is no long locking of the view, I will swap (rename) the table names immediately after each merge finishes. That way all the heavy workload falls onto a SQL Server Agent job that keeps the tables updated.
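(For illustration, a minimal sketch of the swap step, with hypothetical table names dbo.ReportData for the table the view reads and dbo.ReportData_Staging for the copy the MERGE job just refreshed:)

BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.ReportData',         'ReportData_Old';
    EXEC sp_rename 'dbo.ReportData_Staging', 'ReportData';
    EXEC sp_rename 'dbo.ReportData_Old',     'ReportData_Staging';
COMMIT TRANSACTION;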
The problem is that the merge takes roughly 15 minutes with the current size of the data, which may grow in the future. So I need a real-time design to ensure that the view serves the most up-to-date information.
Any ideas?
We have a SQL Server 2012 instance, with auto update statistics set to ON for the DB:
But then I ran a query to check some statistics, and some haven't been updated in a while:
Why is this the case? Is there a rule governing when SQL Server updates statistics that simply hasn't been triggered for these indexes?
Do I need to care? How do I know if I need to update them, or whether they are causing performance issues for me?
Thanks!
Even though you set Auto Update Statistics to true, statistics will only update when a threshold has been reached. This threshold differs between versions.
Thresholds for SQL Server 2012 or older:
The table size has gone from 0 to > 0 rows
The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then
The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered
For SQL Server 2016 there are a few major changes, and SQL Server updates statistics with a new algorithm (read: more frequently than in older versions).
Do I need to care? How do I know if I need to update them, or whether they are causing performance issues for me?
Normally people schedule maintenance jobs during weekends, and this includes index rebuilds and stats updates.
This should normally take care of most databases. In your case, if you are seeing performance issues due to stale stats, you can update them manually. We do it once a week, but sites like Stack Overflow do it more often:
UPDATE STATISTICS TableName;
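You can also force a full scan or control the sample size, and target a single statistic if only one is the problem (object and statistic names below are just placeholders):

UPDATE STATISTICS dbo.YourBigTable WITH FULLSCAN;                                       -- all stats on one table, full scan
UPDATE STATISTICS dbo.YourBigTable IX_YourBigTable_SomeColumn WITH SAMPLE 25 PERCENT;   -- one statistic, sampled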
Further reading/references:
Statistics Used by the Query Optimizer in Microsoft SQL Server 2008
Understanding When Statistics Will Automatically Update
We have a DB of size 1257 GB containing 100+ objects, with hourly transactions (inserts and updates).
We have set Auto Update Statistics: True and Auto Update Statistics Asynchronously: False.
When we run queries to fetch the data, they take a long time. But when we manually execute SP_UpdateStats, the same query takes much less time to fetch the same amount of data.
Please let me know whether we need to update the stats on a regular basis. And what are the advantages and disadvantages of using EXEC SP_UpdateStats?
Windows Server 2012 R2 and SSMS 2014.
But when we manually exec SP_UpdateStats, the same query takes much less time to fetch the same amount of data
Even though you have auto update statistics set to true, your statistics won't necessarily be updated frequently.
SQL Server triggers automatic statistics updates based on certain thresholds, and the thresholds below hold for all versions earlier than SQL Server 2016:
The table size has gone from 0 to >0 rows (test 1).
The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
So based on how big your table is, you can determine the threshold using the formula above.
Starting with SQL Server 2016, this threshold has been changed and statistics updates are triggered more frequently.
The same behaviour can be obtained in older versions with the help of trace flag 2371.
For example, if the trace flag is activated (this dynamic threshold is the default behaviour in SQL Server 2016), a statistics update will be triggered on a table with 1 billion rows when 1 million changes occur.
If the trace flag is not activated, the same table with 1 billion records would need 200 million changes before a statistics update is triggered.
Could you please let me know whether we have to update the stats on a regular basis? And what are the advantages and disadvantages of using EXEC SP_UpdateStats?
If you see suboptimal plans due to inaccurate statistics, go ahead and enable this trace flag.
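If you decide to try it, the flag can be enabled globally like this (a sketch; it takes effect immediately but does not survive a restart unless you also add it as a startup parameter):

DBCC TRACEON (2371, -1);   -- -1 enables the trace flag globally
-- To make it permanent, add -T2371 to the SQL Server startup parameters.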
Talking about disadvantages: if you update statistics frequently, you will see query plans getting recompiled, which will in turn cause CPU pressure when the plans are compiled again.
Sp_UpdateStats will update the statistics on all the tables. Brent Ozar believes that this should be done much more regularly than doing reorgs or rebuilds on indexes. By updating your statistics, SQL Server is more likely to create a 'better' query plan. The downside is that all the statistics on all tables (whether they need it or not) will be updated, and it can take substantial resources to do so. Many DBAs run sp_updatestats on a nightly or weekly basis when the machine is not being used heavily. There are scripts that check which tables need to be updated and update only those.
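For reference, the call itself is trivial; the optional @resample argument keeps whatever sample rate each statistic already had:

EXEC sp_updatestats;
EXEC sp_updatestats @resample = 'resample';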
To see different approaches to updating statistics this is a good place to start:
https://www.brentozar.com/archive/2014/01/update-statistics-the-secret-io-explosion/
If the query is running slowly it is much more likely that there are other issues with the query. You should post the query and query plan and the community may be able to offer useful hints on improving the query or adding indexes to the underlying tables.
I have a stored procedure running 10 times slower in production than in staging. I took a look at the execution plan, and the first thing I noticed was that the cost of the Table Insert (into a table variable #temp) was 100% in production and 2% in staging.
The estimated number of rows in production showed almost 200 million rows! In staging it was only about 33.
The production DB is running on SQL Server 2008 R2 while staging is on SQL Server 2012, but I don't think this difference could cause such a problem.
What could be the cause of such a huge difference?
UPDATED
Added the execution plan. As you can see, the large number of estimated rows shows up in Nested Loops (Inner Join) but all it does is a clustered index seek to another table.
UPDATED2
Link for the plan XML included
plan.xml
And SQL Sentry Plan Explorer view (with estimated counts shown)
This looks like a bug to me.
There are an estimated 90,991.1 rows going into the nested loops.
The table cardinality of the table being seeked on is 24,826.
If there are no statistics for a column and the equality operator is used, SQL Server can't know the density of the column, so it uses a fixed 10 percent estimate.
90,991.1 * 24,826 * 10% = 225,894,504.86 which is pretty close to your estimated rows of 225,894,000
But the execution plan shows that only 1 row is estimated per seek. Not the 24,826 from above.
So these figures don't add up. I would assume that it starts off from an original 10% ball park estimate and then later adjusts it to 1 because of the presence of a unique constraint without making a compensating adjustment to the other branches.
I see that the seek is calling a scalar UDF [dbo].[TryConvertGuid]. I was able to reproduce similar behaviour on SQL Server 2005, where seeking on a unique index on the inside of a nested loops join with a UDF predicate produced a result where the number of rows estimated out of the join was much larger than you would expect from multiplying the estimated seeked rows by the estimated number of executions.
But in your case, the operators to the left of the problematic part of the plan are pretty simple and not sensitive to the number of rows (neither the row-count TOP operator nor the insert operator will change), so I don't think this quirk is responsible for the performance issues you noticed.
Regarding the point in the comments to another answer that switching to a temp table helped the performance of the insert: this may be because it allows the read part of the plan to operate in parallel (inserting into a table variable would block this).
Run EXEC sp_updatestats; on the production database. This updates statistics on all tables. It might produce more sane execution plans if your statistics are screwed up.
Please don't run EXEC sp_updatestats; on a large system it could take hours, or days, to complete. What you may want to do instead is look at the query plan that is being used in production. Try to see if there is an index that could be used but is not being used. Try rebuilding that index (as a side effect, rebuilding an index rebuilds the statistics on it). After rebuilding, look at the query plan and note whether it is using the index. Perhaps you may need to add an index to the table. Does the table have a clustered index?
As a general rule, since 2005, SQL Server manages statistics on its own rather well. The only time you need to explicitly update statistics is if you know that the query would execute a lot faster if SQL Server used a particular index, but it isn't using it. You may want to run (on a nightly or weekly basis) scripts that automatically test every table and every index to see if the index needs to be reorganized or rebuilt (depending on how fragmented it is). These kinds of scripts may take a long time to run on a large, active OLTP system, and you should consider carefully when you have a window to run them. There are quite a few versions of this script floating around, but I have used this one often:
https://msdn.microsoft.com/en-us/library/ms189858.aspx
Sorry this is probably too late to help you.
Table Variables are impossible for SQL Server to predict. They always estimate one row and exactly one row coming back.
To get accurate estimates so that a better plan can be created, you need to switch your table variable to a temp table or a CTE.
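A sketch of the switch (all names are placeholders): the temp table carries real statistics, so the optimizer can estimate row counts from what is actually in it.

-- Before: the optimizer assumes one row, no matter how many are inserted
DECLARE @Work TABLE (Id int PRIMARY KEY, Amount money);

-- After: a temp table gets statistics and realistic row estimates
CREATE TABLE #Work (Id int PRIMARY KEY, Amount money);

INSERT INTO #Work (Id, Amount)
SELECT Id, Amount
FROM   dbo.SourceTable;    -- hypothetical source table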
Have you ever seen any of these error messages?
-- SQL Server 2000
Could not allocate ancillary table for view or function resolution.
The maximum number of tables in a query (256) was exceeded.
-- SQL Server 2005
Too many table names in the query. The maximum allowable is 256.
If yes, what have you done?
Given up? Convinced the customer to simplify their demands? Denormalized the database?
@(everyone wanting me to post the query):
I'm not sure if I can paste 70 kilobytes of code in the answer editing window.
Even if I could, it wouldn't help, since those 70 kilobytes of code reference 20 or 30 views that I would also have to post; otherwise the code would be meaningless.
I don't want to sound like I am boasting here but the problem is not in the queries. The queries are optimal (or at least almost optimal). I have spent countless hours optimizing them, looking for every single column and every single table that can be removed. Imagine a report that has 200 or 300 columns that has to be filled with a single SELECT statement (because that's how it was designed a few years ago when it was still a small report).
For SQL Server 2005, I'd recommend using table variables and partially building the data as you go.
To do this, create a table variable that represents your final result set you want to send to the user.
Then find your primary table (say the orders table in your example above) and pull that data, plus a bit of supplementary data that is only one join away (customer name, product name). You can use an INSERT ... SELECT to put this straight into your table variable.
From there, iterate through the table and, for each row, run a bunch of small SELECT queries that retrieve all the supplemental data you need for your result set. Insert these into each column as you go.
Once complete, you can then do a simple SELECT * from your table variable and return this result set to the user.
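A rough sketch of that pattern (all table and column names are invented for illustration, and the per-row lookups are shown here as one small UPDATE per supplementary column rather than a strict row-by-row loop):

DECLARE @Report TABLE
(
    OrderId      int PRIMARY KEY,
    CustomerName nvarchar(200),
    ProductName  nvarchar(200),
    ShipCity     nvarchar(100)   -- ...plus the rest of the report columns
);

-- 1. Pull the primary table plus data that is only one join away
INSERT INTO @Report (OrderId, CustomerName, ProductName)
SELECT o.OrderId, c.Name, p.Name
FROM   dbo.Orders    AS o
JOIN   dbo.Customers AS c ON c.CustomerId = o.CustomerId
JOIN   dbo.Products  AS p ON p.ProductId  = o.ProductId;

-- 2. Fill in the remaining columns with small follow-up queries
UPDATE r
SET    r.ShipCity = s.City
FROM   @Report AS r
JOIN   dbo.Shipments AS s ON s.OrderId = r.OrderId;

-- 3. Return the finished result set
SELECT * FROM @Report;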
I don't have any hard numbers for this, but there have been three distinct instances that I have worked on to date where doing these smaller queries has actually worked faster than doing one massive select query with a bunch of joins.
@chopeen You could change the way you're calculating these statistics and instead keep a separate table of all per-product stats. When an order is placed, loop through the products and update the appropriate records in the stats table, as sketched below. This would shift a lot of the calculation load to the checkout page rather than running everything in one huge query when a report is run. Of course there are some stats that aren't going to work as well this way, e.g. tracking customers' next purchases after purchasing a particular product.
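For illustration, the order-time bookkeeping could look something like this (dbo.ProductStats and its columns are invented for the example; the two variables stand in for values the checkout code already has):

DECLARE @ProductId int = 42, @Quantity int = 3;   -- supplied by the checkout code, one pair per product in the order

UPDATE dbo.ProductStats
SET    UnitsSold   = UnitsSold + @Quantity,
       LastOrdered = GETDATE()
WHERE  ProductId = @ProductId;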
This would happen all the time when writing Reporting Services reports for Dynamics CRM installations running on SQL Server 2000. CRM has a nicely normalised data schema, which results in a lot of joins. There's actually a hotfix that will up the limit from 256 to a whopping 260: http://support.microsoft.com/kb/818406 (we always thought this was a great joke on the part of the SQL Server team).
The solution, as Dillie-O alludes to, is to identify appropriate "sub-joins" (preferably ones that are used multiple times) and factor them out into temp tables or table variables that you then use in your main joins. It's a major PIA and often kills performance. I'm sorry for you.
@Kevin, love that tee -- says it all :-).
I have never come across this kind of situation, and to be honest the idea of referencing > 256 tables in a query fills me with a mortal dread.
Your first question should probably be "Why so many?", closely followed by "What bits of information do I NOT need?" I'd be worried that the amount of data being returned from such a query would begin to impact the performance of the application quite severely, too.
I'd like to see that query, but I imagine it's some problem with some sort of iterator, and while I can't think of any situation where it's possible, I bet it's from a bad while/case/cursor or a ton of poorly implemented views.
Post the query :D
Also, I feel like one of the possible problems could be having a ton (read: 200+) of name/value tables which could be condensed into a single lookup table.
I had this same problem... my development box runs SQL Server 2008 (where the view worked fine), but on production (SQL Server 2005) the view didn't work. I ended up creating additional views to avoid this limitation, using the new views as part of the query in the view that threw the error.
Kind of silly considering the logical execution is the same...
Had the same issue in SQL Server 2005 (worked in 2008) when I wanted to create a view. I resolved the issue by creating a stored procedure instead of a view.