Why auto update statistics is not efficient at improving performance - sql-server

We have a DB of size 1257 GB with 100+ objects and hourly transactions (inserts and updates).
We have set Auto Update Statistics: True and Auto Update Statistics Asynchronously: False.
When we run queries to fetch data, they take a long time. But when we manually execute sp_updatestats, the same query takes much less time to fetch the same amount of data.
Please let me know whether we need to update the stats on a regular basis, and what the advantages and disadvantages of using EXEC sp_updatestats are.
Windows Server 2012 R2 and SSMS 2014.

But when we manually execute sp_updatestats, the same query takes much less time to fetch the same amount of data
Even though you have Auto Update Statistics set to True, your statistics won't necessarily be updated frequently.
SQL Server triggers automatic statistics updates based on certain thresholds, and the thresholds below hold for all versions prior to SQL Server 2016:
The table size has gone from 0 to >0 rows (test 1).
The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
So, based on how big your table is, you can determine the threshold using the rules above.
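If you want to check how far a given table is from that threshold, something like the sketch below can help (dbo.YourTable is a placeholder; sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 / 2012 SP1 or later):
-- show each statistic's modification counter next to the pre-2016 threshold
SELECT  s.name                          AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter,
        500 + (0.20 * sp.rows)          AS pre2016_auto_update_threshold
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('dbo.YourTable');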
Starting with SQL Server 2016, this threshold has been changed and statistics updates are triggered more frequently.
The same behaviour can be obtained in older versions with the help of trace flag 2371.
For example, if the trace flag is activated (this is the default behaviour in SQL Server 2016), a statistics update will be triggered on a table with 1 billion rows once 1 million changes occur.
If the trace flag is not activated, the same table with 1 billion records would need 200 million changes before a statistics update is triggered.
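On those older versions the flag can be turned on globally in the standard way (this requires sysadmin; adding -T2371 as a startup parameter makes it persist across restarts):
DBCC TRACEON (2371, -1);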
Could you please let me know whether we have to update the stats on a regular basis? And what are the advantages and disadvantages of using EXEC sp_updatestats?
If you see suboptimal plans due to inaccurate statistics, go ahead and enable this trace flag or schedule regular statistics updates.
Talking about disadvantages: if you update statistics frequently, query plans will be recompiled, which in turn causes CPU pressure when the plans are compiled again.

sp_updatestats will update the statistics on all the tables. Brent Ozar believes this should be done much more regularly than doing reorgs or rebuilds on indexes. By updating your statistics, SQL Server is more likely to create a 'better' query plan. The downside is that the statistics on all tables (whether they need it or not) will be updated, and it can take substantial resources to do so. Many DBAs run sp_updatestats on a nightly or weekly basis when the machine is not being used heavily. There are also scripts that check which statistics need updating and update only those.
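For reference, a typical call is shown below; the optional @resample argument reuses each statistic's previous sampling rate instead of the default sample:
-- update statistics across all user tables in the current database
EXEC sp_updatestats;
-- or keep each statistic's existing sample rate
EXEC sp_updatestats @resample = 'resample';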
To see different approaches to updating statistics this is a good place to start:
https://www.brentozar.com/archive/2014/01/update-statistics-the-secret-io-explosion/
If the query is running slowly it is much more likely that there are other issues with the query. You should post the query and query plan and the community may be able to offer useful hints on improving the query or adding indexes to the underlying tables.

Related

SQL Server Statistics Update

I manage 25 SQL Server databases. All 25 databases are configured to "Auto Update Statistics". A few of these databases are 250+ GB and contain tables with 2+ billion records. The "Auto Update Statistics" setting is not sufficient to effectively keep the larger database statistics updated. I created a nightly job to update stats for all databases and tables with fullscan. This fixed our performance issues initially, but now the job is taking too long (7 hours).
How can I determine which tables need a full scan statistics update? Can I use a value from sys.dm_db_index_usage_stats or some other DMV?
Using SQL Server 2019 (version 15.0.2080.9), and the compatibility level of the databases is SQL Server 2016 (130).
As of SQL Server 2016+ (db compatibility level 130+), the main formula used to decide if stats need updating is: MIN ( 500 + (0.20 * n), SQRT(1,000 * n) ). In the formula, n is the count of rows in the table/index in question. You then compare the result of the formula to how many rows have been modified since the statistic was last updated. That count is found in either sys.sysindexes.rowmodctr or sys.dm_db_stats_properties(...).modification_counter (they report the same thing).
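Here is a minimal sketch of that comparison, assuming sys.dm_db_stats_properties is available (it is on SQL Server 2019) and applying the formula above; anything this returns is a candidate for a targeted UPDATE STATISTICS rather than a blanket full scan of everything:
SELECT  OBJECT_SCHEMA_NAME(s.object_id)    AS schema_name,
        OBJECT_NAME(s.object_id)           AS table_name,
        s.name                             AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
  AND   sp.rows IS NOT NULL
  -- dynamic threshold: MIN(500 + 20% of rows, SQRT(1000 * rows))
  AND   sp.modification_counter >
        CASE WHEN 500 + (0.20 * sp.rows) < SQRT(1000.0 * sp.rows)
             THEN 500 + (0.20 * sp.rows)
             ELSE SQRT(1000.0 * sp.rows)
        END
ORDER BY sp.modification_counter DESC;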
Ola Hallengren's scripts also use this formula internally, but you can use the StatisticsModificationLevel parameter to be more aggressive if you want (e.g., like Erin Stellato). The main reason people (like Erin) give for being more aggressive is knowing that your tables have a lot of skew or churn.
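As an illustration only (the parameter values are made up for the example; check Ola's documentation for current defaults), a statistics-only run of IndexOptimize that updates any statistic once 5% of its rows have changed might look like this:
EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,
        @FragmentationMedium = NULL,
        @FragmentationHigh = NULL,          -- skip index maintenance, statistics only
        @UpdateStatistics = 'ALL',
        @StatisticsModificationLevel = 5;   -- update once 5% of rows have changed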
If you find your problem is that a filtered index isn't getting updated automatically, be aware of a long-standing issue that could be the cause.
However, ultimately, I believe the reason you have a performance problem with your nightly statistics job is that you're blindly updating all statistics for every table. It's better to update only the statistics that need it, especially since updating everything can cause an IO storm.

How do I figure out what is causing Data IO spikes on my Azure SQL database?

I have an Azure SQL production database that runs at around 10-20% DTU usage on average; however, I get DTU spikes that take it upwards of 100% at times. Here is a sample from the past hour:
I realize this could be a rogue query, so I switched over to the Query Performance Insight tab, and I found the following for the past 24 hours:
This chart makes sense with regard to the CPU usage line. Query 3780 takes the majority of the CPU, as expected with my application. The Overall DTU (red) line seems to follow this correctly (minus the spikes).
However, in the DTU Components charts I can see large Data IO spikes occurring that coincide with the Overall DTU spikes. Switching over to the TOP 5 queries by Data IO, I see the following:
This seems to indicate that there are no queries that are using high amounts of Data IO.
How do I find out where this Data IO usage is coming from?
Finally, I see there is one "odd ball" query (7966) listed under the TOP 5 queries by Data IO, with only 5 executions. Selecting it shows the following:
SELECT StatMan([SC0], [SC1], [SC2], [SB0000])
FROM (SELECT TOP 100 PERCENT [SC0], [SC1], [SC2], step_direction([SC0]) over (order by NULL) AS [SB0000]
FROM (SELECT [UserId] AS [SC0], [Type] AS [SC1], [Id] AS [SC2] FROM [dbo].[Cipher] TABLESAMPLE SYSTEM (1.828756e+000 PERCENT)
WITH (READUNCOMMITTED) ) AS _MS_UPDSTATS_TBL_HELPER
ORDER BY [SC0], [SC1], [SC2], [SB0000] ) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 16)
What is this query?
This does not look like any query that my application has created/uses. The timestamps on the details chart seem to line up with the approximate times of the overall Data IO spikes (just prior to 6am) which leads me to think this query has something to do with all of this.
Are there any other tools I can use to help isolate this issue?
The query is updating statistics. This occurs when the AUTO UPDATE STATISTICS setting is on. That setting should be kept on; don't turn it off, as keeping it on is a best practice.
You should update stats manually only when you see a query not performing well and the stats are off for that query.
Also, below are some rules for when SQL Server will update stats automatically for you:
When a table with no rows gets a row
When 500 rows are changed in a table that has 500 or fewer rows
When 500 rows + 20% of the row count are changed in a table with more than 500 rows
By 'change' we mean a row being inserted, updated or deleted. So, yes, even the automatically created statistics get updated and maintained as the data changes. There were some changes to these rules in recent versions, and SQL Server can update stats more often.
References:
https://www.sqlskills.com/blogs/erin/understanding-when-statistics-will-automatically-update/
It seems that query is part of the automatic statistics update process. To mitigate the impact of this process on production, you can regularly update statistics and indexes using runbooks, as explained here. Run sp_updatestats to immediately try to mitigate the impact of this process.

Why are these SQL Server statistics outdated, when it's set to auto-update?

We have a SQL Server 2012 instance, with auto-update statistics set to ON for the DB:
But then I ran a query to check some statistics, and some haven't been updated in a while:
Why is this the case? Is there a rule for when SQL Server updates statistics that simply hasn't been triggered for these indexes?
Do I need to care? How do I know if I need to update them, or if they are causing performance issues for me?
Thanks!
Even though you set Auto Update Statistics to true, statistics will only update when a threshold has been reached. That threshold is different for different versions.
Thresholds for SQL Server 2012 or older:
The table size has gone from 0 to > 0 rows
The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then
The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered
For SQL Server 2016, there are a few major changes and SQL Server updates statistics with a new algorithm (read: more frequently than older versions).
Do I need to care? How do I know if I need to update them, or if they are causing performance issues for me?
Normally people schedule maintenance jobs over the weekend, and these include index rebuilds / stats updates.
This should normally take care of most databases. In your case, if you are seeing performance issues due to stale stats, you can update them manually. We do it once a week, but sites like Stack Overflow do it more often:
UPDATE STATISTICS YourTableName;
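To answer the "how do I know" part, a quick way to see when each statistic on a table was last updated (the table name is a placeholder):
SELECT  s.name                              AS stats_name,
        STATS_DATE(s.object_id, s.stats_id) AS last_updated,
        s.auto_created,
        s.user_created
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('dbo.YourTable');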
Further reading/references:
Statistics Used by the Query Optimizer in Microsoft SQL Server 2008
Understanding When Statistics Will Automatically Update

Index gets out of sync on bulk insert

I have a strange problem with my SQL Server database.
I am writing bulk data (about 90,000 rows) using SqlBulkCopy.WriteToServer and I am also writing about 30,000 rows in batches of 1,000 using EF's AddRange.
This causes the indexes on these tables to go out of sync, and queries take many times longer than usual (a timeout after 10 minutes instead of a result after a few seconds).
After I manually rebuild the indexes, the queries are fast again until another of these imports is happening.
My understanding of bulk loading is that it should also update the index.
My question is: is there a well-known reason for this behavior? If not, how can I go about troubleshooting this?
We had exactly the same issue some years ago, and as dfundako suggested, the answer is outdated statistics.
SQL Server by default updates statistics once a certain percentage of records has changed. This is a problem if your table has a huge number of records: 90,000 added rows would not reach the required percentage of changed rows.
So if you want to be sure, after inserting you have to either rebuild the indexes on your table (as you did) or update the statistics on your table:
UPDATE STATISTICS <your table>
Based on the comments and answers here, I tried to figure out if I can change that 20% threshold somehow.
And indeed, there is a way to do this, using trace flag 2371
You can enable it like this:
DBCC TRACEON(2371, -1)
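If you want to confirm the flag actually took effect, you can check its global status:
DBCC TRACESTATUS(2371, -1);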
I will now wait a few weeks to be sure that this fixed the problem, but I have good hopes about it.

Extremely High Estimated Number of Rows in Execution Plan

I have a stored procedure running 10 times slower in production than in staging. I took a look at the execution plan, and the first thing I noticed was that the cost of the Table Insert (into a table variable #temp) was 100% in production and 2% in staging.
The estimated number of rows in production showed almost 200 million rows! But in staging it was only about 33.
The production DB is running on SQL Server 2008 R2 while staging is SQL Server 2012, but I don't think this difference could cause such a problem.
What could be the cause of such a huge difference?
UPDATED
Added the execution plan. As you can see, the large number of estimated rows shows up in Nested Loops (Inner Join), but all it does is a clustered index seek into another table.
UPDATED2
Link for the plan XML included
plan.xml
And SQL Sentry Plan Explorer view (with estimated counts shown)
This looks like a bug to me.
There are an estimated 90,991.1 rows going into the nested loops.
The table cardinality of the table being seeked on is 24,826.
If there are no statistics for a column and the equality operator is used, SQL Server can't know the density of the column, so it uses a fixed 10 percent value.
90,991.1 * 24,826 * 10% = 225,894,504.86 which is pretty close to your estimated rows of 225,894,000
But the execution plan shows that only 1 row is estimated per seek. Not the 24,826 from above.
So these figures don't add up. I would assume that it starts off from an original 10% ballpark estimate and then later adjusts it to 1 because of the presence of a unique constraint, without making a compensating adjustment to the other branches.
I see that the seek is calling a scalar UDF, [dbo].[TryConvertGuid]. I was able to reproduce similar behavior on SQL Server 2005, where seeking on a unique index on the inside of a nested loops join with a UDF predicate produced an estimated number of rows out of the join much larger than you would expect from multiplying estimated seeked rows * estimated number of executions.
But in your case, the operators to the left of the problematic part of the plan are pretty simple and not sensitive to the number of rows (neither the row count Top operator nor the Insert operator will change), so I don't think this quirk is responsible for the performance issues you noticed.
Regarding the point in the comments to another answer that switching to a temp table helped the performance of the insert: this may be because it allows the read part of the plan to operate in parallel (inserting into a table variable would block this).
Run EXEC sp_updatestats; on the production database. This updates statistics on all tables. It might produce more sane execution plans if your statistics are screwed up.
Please don't run EXEC sp_updatestats; on a large system it could take hours, or days, to complete. What you may want to do instead is look at the query plan that is being used in production. Try to see if there is an index that could be used but is not being used. Try rebuilding that index (as a side effect, it rebuilds the statistics on the index). After rebuilding, look at the query plan and note whether it is using the index. Perhaps you may need to add an index to the table. Does the table have a clustered index?
As a general rule, since 2005, SQL Server manages statistics on its own rather well. The only time you need to explicitly update statistics is if you know that a query would execute a lot faster if SQL Server used an index, but it's not using it. You may want to run (on a nightly or weekly basis) scripts that automatically test every table and every index to see if the index needs to be reorganized or rebuilt (depending on how fragmented it is). These kinds of scripts (on a large, active OLTP system) may take a long time to run, and you should consider carefully when you have a window to run them. There are quite a few versions of this script floating around, but I have used this one often:
https://msdn.microsoft.com/en-us/library/ms189858.aspx
Sorry this is probably too late to help you.
Table variables are impossible for SQL Server to predict. It always estimates one row, and exactly one row, coming back.
To get accurate estimates so that a better plan can be created, you need to switch your table variable to a temp table or a CTE.
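A minimal sketch of that switch, with placeholder table and column names: the temp table gets real statistics, so downstream joins see realistic row counts.
-- before: table variable, always estimated at 1 row
-- DECLARE @rows TABLE (Id int PRIMARY KEY, Value nvarchar(100));

-- after: temp table, gets statistics and realistic estimates
CREATE TABLE #rows (Id int PRIMARY KEY, Value nvarchar(100));

INSERT INTO #rows (Id, Value)
SELECT Id, Value
FROM dbo.SourceTable;                    -- placeholder source

SELECT r.Id, r.Value
FROM #rows AS r
JOIN dbo.OtherTable AS o ON o.Id = r.Id; -- placeholder join

DROP TABLE #rows;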
