How often should the indexes be rebuilt in our SQL Server database?

Currently our database is about 10 GB in size and is growing by around 3 GB per month. I often hear that one should rebuild the indexes from time to time to improve query execution time. So how often should I rebuild the indexes in this scenario?

There's a general consensus that you should reorganize ("defragment") your indices once index fragmentation exceeds 5% (some say 10%), and rebuild them completely when it goes beyond 30% (at least those are the numbers I've heard advocated in a lot of places).
Michelle Ufford (a.k.a. "SQL Fool") has an automated index defrag script, which uses those exact limits for deciding when to reorganize or rebuild an index.
Also see Brad McGehee's tips on rebuilding indexes, with some good thoughts on how to deal with index rebuilding.
I use this script (can't remember where I got it from - whoever it was: many thanks! Really helpful stuff) to display the index fragmentation on all the indices in a given database:
SELECT
    t.name AS 'Table name',
    i.name AS 'Index name',
    ips.index_type_desc,
    ips.alloc_unit_type_desc,
    ips.index_depth,
    ips.index_level,
    ips.avg_fragmentation_in_percent,
    ips.fragment_count,
    ips.avg_fragment_size_in_pages,
    ips.page_count,
    ips.avg_page_space_used_in_percent,
    ips.record_count,
    ips.ghost_record_count,
    ips.version_ghost_record_count,
    ips.min_record_size_in_bytes,
    ips.max_record_size_in_bytes,
    ips.avg_record_size_in_bytes,
    ips.forwarded_record_count
FROM
    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') ips
INNER JOIN
    sys.tables t ON ips.[object_id] = t.[object_id]
INNER JOIN
    sys.indexes i ON ips.index_id = i.index_id AND ips.[object_id] = i.[object_id]
WHERE
    ips.avg_fragmentation_in_percent > 0.0
ORDER BY
    ips.avg_fragmentation_in_percent, ips.fragment_count
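Once you know the fragmentation, acting on those 5%/30% thresholds usually comes down to two ALTER INDEX variants; a minimal sketch (the table and index names here are just placeholders):
-- Moderate fragmentation (roughly 5-30%): reorganize, which is always online
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;

-- Heavy fragmentation (above ~30%): full rebuild; ONLINE = ON requires Enterprise edition
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD WITH (ONLINE = ON);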

"When you need to" and "When you can"!
For example...
Test for fragmentation first and decide whether to do nothing, reorg or rebuild.
SQL Fool's script does this, for example; it has @minFragmentation and @rebuildThreshold parameters
Update statistics daily, say, but handle indexes at weekends. What is your maintenance window?

You should rebuild indexes often enough that production is not detrimentally affected by index degradation. I understand that this seems vague, but all databases are different and are used in different ways. You only need to regularly rebuild/defrag indexes that incur write operations (inserts/updates); your static or mostly read-only tables will not need much reindexing.
You will need to use dbcc showcontig([Table]) to check the fragmentation level of your indexes, and to determine how often they become fragmented and to what level.
Use dbcc dbreindex([Table]) to completely rebuild the indexes when they become too fragmented (above 20%-30% or so), but if you cannot find a large enough downtime window and the fragmentation level is relatively low (1%-25%), you should use dbcc indexdefrag([Database], [Table], [Index]) to defrag the index in an "online" fashion. Also keep in mind that you can stop the index defrag operation and start it again later without losing any work.
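For illustration, those commands look roughly like this (the table, index and database names are placeholders):
-- Check fragmentation (SQL 2000 era)
DBCC SHOWCONTIG ('dbo.Orders');

-- Complete offline rebuild of all indexes on the table; needs a downtime window
DBCC DBREINDEX ('dbo.Orders');

-- "Online" defrag of a single index; can be stopped and resumed without losing work
DBCC INDEXDEFRAG (MyDatabase, 'dbo.Orders', 'IX_Orders_CustomerID');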
Keeping a database and its indexes "in tune" takes a bit of monitoring to really get a feel for when and what to reindex.

Given the size of your database, you can easily rebuild the indexes once per month. But as the size increases, say to around 500 GB, you could do it twice per month.

As mentioned in Bacon Bits' comment, Ola Hallengren's SQL Server Maintenance Solution includes IndexOptimize, which is supported on SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Server 2016, SQL Server 2017, SQL Server 2019, Azure SQL Database, and Azure SQL Database Managed Instance.
It has 2K stars on https://github.com/olahallengren/sql-server-maintenance-solution.
Michelle Ufford (a.k.a. "SQL Fool")'s automated index defrag script, suggested in the accepted answer, conceptually does much the same as Ola Hallengren's solution, but its latest version dates from 2011.
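For reference, a typical IndexOptimize call using the same 5%/30% thresholds looks like this (adapted from Ola's documentation; verify the parameters against the version you install):
EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @FragmentationLow = NULL,
    @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1 = 5,
    @FragmentationLevel2 = 30;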

Related

SQL Server Statistics Update

I manage 25 SQL Server databases. All 25 databases are configured to "Auto Update Statistics". A few of these databases are 250+ GB and contain tables with 2+ billion records. The "Auto Update Statistics" setting is not sufficient to effectively keep the larger database statistics updated. I created a nightly job to update stats for all databases and tables with fullscan. This fixed our performance issues initially, but now the job is taking too long (7 hours).
How can I determine which tables need a full scan statistics update? Can I use a value from sys.dm_db_index_usage_stats or some other DMV?
Using SQL Server 2019 (version 15.0.2080.9); the compatibility level of the databases is SQL Server 2016 (130).
As of SQL 2016+ (DB compatibility level 130+), the main formula used to decide whether stats need updating is: MIN(500 + (0.20 * n), SQRT(1000 * n)). In the formula, n is the count of rows in the table/index in question. You then compare the result of the formula to how many rows have been modified since the statistic was last updated. That's found at either sys.sysindexes.rowmodctr or sys.dm_db_stats_properties(...).modification_counter (they have the same value).
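As a sketch, you can apply that formula yourself and list only the statistics whose modification counters are over the threshold (run in the target database; no object names are assumed):
-- Statistics whose modifications exceed MIN(500 + 0.20 * n, SQRT(1000 * n))
SELECT
    OBJECT_NAME(s.[object_id]) AS table_name,
    s.[name] AS stats_name,
    sp.last_updated,
    sp.[rows] AS n,
    sp.modification_counter
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) sp
WHERE sp.[rows] IS NOT NULL
  AND sp.modification_counter >
      CASE WHEN 500 + 0.20 * sp.[rows] < SQRT(1000.0 * sp.[rows])
           THEN 500 + 0.20 * sp.[rows]
           ELSE SQRT(1000.0 * sp.[rows]) END
ORDER BY sp.modification_counter DESC;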
Ola's scripts also use this formula, internally, but you can use the StatisticsModificationLevel param, to be more aggressive, if you want (e.g. like Erin Stellato). The main reason people (like Erin) give to be more aggressive is if you know your tables have a lot of skew or churn.
If you find your problem is that a filtered index isn't getting updated automatically, be aware of a long-standing issue that could be the cause.
However, ultimately, I believe the reason you have a performance problem with your nightly statistics job is that you're blindly updating all statistics for every table. It's better to update only the statistics that need it, especially since a blanket fullscan update can cause an IO storm.

Large difference in performance of complex SQL query based on initial "Use dB" statement

Why does a complex SQL query run worse with a USE statement and implicit DB references than with a USE master statement and full references to the user DB?
I'm using SQL Server Std 64-bit Version 13.0.4466.4 running on Windows Server 2012 R2. This is an "academic" question raised by one of my users, not an impediment to production.
By "complex" I mean several WITH clauses and a CROSS APPLY, simplified query structure below. By "worse" I mean 3 min. vs. 1 sec for 239 Rows, repeatably. The "plain" Exec Plan for fast query will not show, however, the Exec Plan w/ Live Query Stats runs for both, analysis further below. Tanx in advance for any light shed on this!
USE Master versus USE <userdb>;
DECLARE @chartID INTEGER = 65;
WITH
with1 AS
( SELECT stuff FROM <userdb>.schema1.userauxtable ),
with2 AS
( SELECT lotsastuff FROM <userdb>.dbo.<views w/ JOINS> ),
with3 AS
( SELECT allstuff FROM with2 WHERE TheDate IN (SELECT MAX(TheDate) FROM with2 GROUP BY <field>, CAST(TheDate AS DATE)) ),
with4 AS
( SELECT morestuff FROM with1 WHERE with1.ChartID = @chartID )
SELECT finalstuff FROM with3
CROSS APPLY ( SELECT littlestuff FROM with4 WHERE
with3.TheDate BETWEEN with4.PreDate AND with4.AfterDate
AND with4.MainID = with3.MainID ) as AvgCross
The Exec Plan w/ Live Query Stats for the slow query has ~41% Cost each (83% total) in two ops:
a) Deep under the 5th Step (of 15), Hash Match (Inner Join) Hash Keys Build ... 41% Cost to an Index Scan (non-clustered) of ...
b) Very deep under the 4th Step (of 15), Nested Loops (Left Semi Join) -- 42% Cost to a near-identical Index Scan per (a), except for the addition of (... AND datediff(day, Date1, getdate())) to the Predicate.
Meanwhile, the Exec Plan w/ Live Query Stats for the fast query shows an 83% Cost in a Columnstore Idx Scan (non-clustered), quite deep under the 9th Step (of 12) Hash Match (Inner Join) Hash Keys Build.
It would seem that the difference is in the Columnstore Idx, but why does the Use master stmt send the Execution down that road?
There may be several possible reasons for this kind of behaviour; however, in order to identify them all, you will need people like Paul Randal or Kalen Delaney to answer this.
With my limited knowledge and understanding of MS SQL Server, I can think of at least 2 possible causes.
1. (Most plausible one) The queries are actually different
If, as you are saying, the query text is sufficiently lengthy and complex, it is completely possible to miss a single object (table, view, user-defined function, etc.) when adding database qualifiers and leave it with no DB prefix.
Now, if an object by that name somehow ended up in both the master and your UserDB databases, then different objects will be picked up depending on the current database context; the data might be different, as might the indices and their fragmentation, even the data types... well, you get the idea.
This way, queries become different depending on the database context, and there is no point comparing their performance.
2. Compatibility level of user database
Back in the heyday of the 2005 version, I had a database with its compatibility level set to 80, so that ANSI SQL-89 outer joins generated by some antiquated ORM in legacy client apps would keep working. Most of the tasty new stuff worked too, with one notable exception however: the pivot keyword.
A query with PIVOT, when executed in the context of that database, threw an error saying the keyword is not recognised. However, when I switched the context to master and prefixed everything with user database's name, it ran perfectly fine.
Of course, this is not exactly your case, but it's a good demonstration of what I'm talking about. There are lots of internal SQL Server components, invisible to the naked eye, that affect the execution plan, performance and sometimes even results (or your ability to retrieve them, as in the example above), and that depend on settings such as the database's compatibility level, trace flags and other similar things.
As a possible cause, I can think of the new cardinality estimator which was introduced in SQL Server 2014. The version of the SQL Server instance you mentioned corresponds to 2016 SP1 CU7, however it is still possible that:
your user database may be in compatibility with 2012 version (for example, if it was restored from 2012 backup and nobody bothered to check its settings after that), or
trace flag 9481 is set either for the session or for the entire SQL Server instance, or
database scoped configuration option LEGACY_CARDINALITY_ESTIMATION is set for the database, etc.
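All three of these can be checked quickly (UserDB stands in for your database's name):
-- 1. Compatibility level of the user database
SELECT name, compatibility_level FROM sys.databases WHERE name = N'UserDB';

-- 2. Legacy cardinality estimator trace flag, session or global
DBCC TRACESTATUS (9481);

-- 3. Database scoped configuration (run this in the user database's context)
SELECT name, value FROM sys.database_scoped_configurations
WHERE name = N'LEGACY_CARDINALITY_ESTIMATION';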
(Thankfully, SQL Server doesn't allow you to change the compatibility level of the master database, so it's always at the latest supported level. Which is probably good, as no one can screw up the database engine itself - not this way, at least.)
I'm pretty sure that I have only scratched the surface of the subject, so while checking the aforementioned places definitely wouldn't hurt, what you need to do is to identify the actual cause of the difference (if it's not #1 above, that is). This can be done by looking at actual execution plans of the queries (forget the estimated ones, they are worthless) with a tool other than vanilla SSMS. As an example, SentryOne Plan Explorer might be a good thing to begin with. Even without that, saving plans in .sqlplan files and opening them with any XML-capable viewer/editor will show you much more, including possible leads that might explain the difference you observe.

Azure SQL Database size growing out of control DBCC SHRINKDATABASE doesn't work

Azure SQL database size in portal is 164GB. There are a lot of binary large objects passing through the database, those records are being deleted but the space is not getting reclaimed. DBCC SHRINKDATABASE doesn't help, it reports many more used pages than the sum of used_page_count from sys.dm_db_partition_stats.
DBCC SHRINKDATABASE results
DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
5     1       19877520     2048         19877208   19877208
5     2       17024        128          17024      128
sum of used_page_count from sys.dm_db_partition_stats results: 8292675
This represents a difference of 11584533 pages or about 90GB that is not actually being used and cannot be reclaimed with DBCC SHRINKDATABASE. This difference between the database reported size and actual used page count size has been growing rapidly over the past few weeks and the database will soon hit the size limit of 250GB. What can I do to resolve this issue? Any help is much appreciated - thank you.
Update: per Microsoft support, a deployment to their SQL database servers in April broke the automated ghost record cleanup. A couple weeks ago, somebody was able to manually turn it back on for our server and the database size leveled out at 174GB but did not reclaim the other space consumed by ghost records. Microsoft support recommended scaling up to a Premium tier to minimize the effects of the following I/O intensive process:
declare @db_id int = db_id()
exec ('dbcc forceghostcleanup (' + cast(@db_id as varchar(10)) + ', ''visit_all_pages'')')
I scaled up to P15, assuming a quicker turnaround and less downtime. Running the command results in:
Msg 40518, Level 16, State 1, Line 1
DBCC command 'forceghostcleanup' is not supported in this version of SQL Server.
Unable to run the command, I attempted to scale back down to S3. The scale operation ran for 24 hours, reported that it had succeeded in the activity log, but the database was still P15. The next recommendation was to scale down in stages. I attempted to scale down to P6. The scale operation ran for 24 hours, reported that it had succeeded in the activity log, but the database is still P15. At this point, MS support is going back to product support and I'm waiting to hear back. I hope there's a refund in this somewhere.
Defragmenting some indexes will very likely help.
You can use the following query to get the indexes which have the largest differences between the number of used and reserved pages:
select
    table_name = object_schema_name(i.[object_id]) + '.' + object_name(i.[object_id]),
    index_name = i.[name],
    ps.partition_number,
    ps.reserved_page_count,
    ps.used_page_count
from
    sys.dm_db_partition_stats ps
    inner join sys.indexes i
        on ps.[object_id] = i.[object_id] and ps.index_id = i.index_id
order by
    ps.reserved_page_count - ps.used_page_count desc
Rebuild the indexes from the top of the list one by one until enough space has been reclaimed.
Note that if you're running out of space on the entire database or the indexes are particularly large, rebuild may fail or take a very long time. In that case you should fall back to reorganizing.
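As an illustrative sketch (the table and index names are placeholders), SORT_IN_TEMPDB moves the rebuild's sort work out of the user database, which helps when free space there is tight; reorganizing is the incremental, always-online fallback:
-- Rebuild using tempdb for the sort, easing space pressure in the user database
ALTER INDEX IX_Blobs_DocumentId ON dbo.Blobs REBUILD WITH (SORT_IN_TEMPDB = ON);

-- Fallback if the rebuild fails or takes too long; can be interrupted without losing progress
ALTER INDEX IX_Blobs_DocumentId ON dbo.Blobs REORGANIZE;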
More information about index defragmentation:
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes
The update I added explains that this is a ghost record issue, which will hopefully be resolved by Microsoft.

Need to reorganize or recreate index after insert?

I have inserted a million rows into a table (on SQL 2000) which already has several million rows in it.
There are indexes on the target table. It was not dropped before the new insertion.
So, do I have to rebuild or reorganize the target table's indexes after every new insertion?
Or does SQL Server 2000 already perform an automatic rearrangement of the indexes after the load?
DBAs / SQL specialists, please reply... these indexes are confusing me a lot.
Yes, you should reorganize them, since after such a huge insertion your indexes will be fragmented.
To check the percentage of fragmentation you can do this:
DBCC SHOWCONTIG
or
select i.name, s.avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(), null, null, null, null) s
inner join sys.indexes i
    on s.[object_id] = i.[object_id] and s.index_id = i.index_id
Check this site
An index should be rebuilt when its fragmentation is greater than 40%. An index should be reorganized when its fragmentation is between 10% and 40%. The index rebuilding process uses more CPU and it locks the database resources. SQL Server Developer and Enterprise editions have the ONLINE option, which can be turned on when an index is rebuilt. The ONLINE option will keep the index available during the rebuild.
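Put together, the quoted thresholds translate into something like this sketch (the table and index names are placeholders):
DECLARE @frag FLOAT = (
    SELECT MAX(avg_fragmentation_in_percent)
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), 1, NULL, 'LIMITED'));

IF @frag > 40
    ALTER INDEX PK_Orders ON dbo.Orders REBUILD;  -- add WITH (ONLINE = ON) on Enterprise/Developer
ELSE IF @frag >= 10
    ALTER INDEX PK_Orders ON dbo.Orders REORGANIZE;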

What are hypothetical indexes?

Does anybody know what hypothetical indexes are used for in SQL Server 2000? I have a table with 15+ such indexes, but no idea what they were created for. Can they slow down deletes/inserts?
Hypothetical indexes are usually created when you run the Index Tuning Wizard, and are suggestions; under normal circumstances they will be removed if the wizard runs OK.
If some are left around, they can cause some issues; see this link for ways to remove them.
Not sure about 2000, but in 2005, hypothetical indexes and database objects in general are objects created by the DTA (Database Engine Tuning Advisor).
You can check if an index is hypothetical by running this query:
SELECT *
FROM sys.indexes
WHERE is_hypothetical = 1
If you have given the tuning advisor good information on which to base its indexing strategy, then I would say to generally trust its results, but you should of course examine how it has allocated these before you trust it blindly. Every situation will be different.
A Google search for "sql server hypothetical indexes" returned the following article as the first result. Quote:
Hypothetical indexes and database objects in general are simply objects created by DTA (Database Tuning Advisor)
Hypothetical indexes are those generated by the Database Tuning Advisor. Generally speaking, having too many indexes is not a great idea and you should examine your query plans to prune those which are not being used.
From sys.indexes:
is_hypothetical bit
1 = Index is hypothetical and cannot be used directly as a data access path.
Hypothetical indexes hold column-level statistics.
0 = Index is not hypothetical.
They can also be created manually with the undocumented WITH STATISTICS_ONLY option:
CREATE TABLE tab(id INT PRIMARY KEY, i INT);
CREATE INDEX MyHypIndex ON tab(i) WITH STATISTICS_ONLY = 0;
/* 0 = without statistics, -1 = generate statistics */
SELECT name, is_hypothetical
FROM sys.indexes
WHERE object_id = OBJECT_ID('tab');
db<>fiddle demo
