People often ask how to reduce the time it takes to rebuild the indexes on a huge SQL Server table.
But I have the reverse issue. It's a table of 5 million+ rows, and when we rebuilt one of the primary non-clustered indexes, which was fragmented up to 97%, the rebuild went really quickly and was done in less than a minute.
Does SSMS report 'Rebuilding Index Completed' prematurely while the actual re-indexing continues in the background for hours? We use SQL Server 2012.
This is the first time I have witnessed a non-clustered index rebuild on a table of any real size finish in literally less than a minute, which frankly boggles my mind. Especially since people always seem to ask the exact opposite question.
Any explanation on this would be highly appreciated!
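For what it's worth, here is one way to double-check it yourself (the table name dbo.BigTable is just a placeholder; both DMVs exist in SQL Server 2012):

    -- 1. Is any index operation still executing in the background?
    SELECT r.session_id, r.command, r.status, r.start_time
    FROM   sys.dm_exec_requests AS r
    WHERE  r.command LIKE '%INDEX%';

    -- 2. Fragmentation after the rebuild (should be near 0% for the rebuilt index).
    SELECT i.name, ps.avg_fragmentation_in_percent, ps.page_count
    FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigTable'),
                                          NULL, NULL, 'LIMITED') AS ps
    JOIN   sys.indexes AS i
           ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

If nothing index-related is still running and the fragmentation is back near zero, the rebuild genuinely finished; rebuilding a single non-clustered index over 5 million rows can plausibly complete in well under a minute on reasonable hardware.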
In my test environment, on a copy of my 4GB production database, I archived about 20% of my data, then ran a shrink on it from SSMS, specifying 20% max free space.
The result was a 2.7GB database with horrid performance. A particular query is about .5s in production, and about 11s now in test. If I remove the full-text portion of the query in test, execution time is about 2 seconds.
Actual execution plan is identical between production and test.
I rebuilt all the indexes and fulltext indexes. Performance is still about the same. No actual content in the test database has changed since duplication.
Any thoughts on where I should look for the culprit (besides just behind the keyboard)? :)
EDIT: OK, I repeated the process three times, with the same results each time... HOWEVER, the performance degrades BEFORE I run the shrink - as soon as I archive inactive records. 0 seconds before the archive, 18 after. I get 7 seconds back after rebuilding some indexes. The archive process:
Creates a new "Archive" DB
Identifies 3 types of keys to delete, storing them in table variables
Performs a select into the "Archive" DB for those three keys from 20 tables
Deletes rows from 20 "Live" tables for those three keys.
That's it. Post-archive, when I look at the execution plan, 40% of the time is spent in the very first operation, a clustered index scan.
I'm going to delete this and repost with the question rephrased, over at the SQL site.
relocated question: https://dba.stackexchange.com/questions/22337/option-force-order-improves-performance-until-rows-are-deleted
I'm going to delete this in a few days since the question is misleading, but just in case anyone is curious as to the outcome, it was solved here:
https://dba.stackexchange.com/questions/22337/option-force-order-improves-performance-until-rows-are-deleted
The shrink wasn't the cause; I only assumed it was because of the likelihood of a shrink fragmenting data. The real issue was that deleting rows caused a bad statistical sample of the data shape to be taken. That in turn caused the query optimizer to return a bad plan. It thought its plan would scan about 900 rows, but instead it scanned over 52,000,000.
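For anyone hitting the same thing, a minimal sketch of the fix (the table name dbo.LiveTable is an assumption): after large deletes, refresh the statistics with a full scan so the optimizer sees the new data shape rather than a bad sample.

    -- Refresh all statistics on the affected table with a full scan.
    UPDATE STATISTICS dbo.LiveTable WITH FULLSCAN;

    -- Optional: check how old each statistics object on the table is.
    SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM   sys.stats AS s
    WHERE  s.object_id = OBJECT_ID('dbo.LiveTable');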
Thanks for all the help!
I am running an archive script which deletes rows from a large table (~50M records) based on the date they were entered. The date field is the clustered index on the table, and thus what I'm applying my conditional statement to.
I am running this delete in a while loop, trying anything from 1,000 to 100,000 records in a batch. Regardless of batch size, it is surprisingly slow; something like 10,000 records getting deleted per minute. Looking at the execution plan, there is a lot of time spent on "Index Delete"s. There are about 15 fields in the table, and roughly 10 of them have some sort of index on them. Is there any way to get around this issue? I'm not even sure why it takes so long to do each index delete - can someone shed some light on exactly what's happening here? This is a sample of my execution plan:
(Execution plan screenshot: http://img94.imageshack.us/img94/1006/indexdelete.png)
(The Sequence points to the Delete command)
This database is live and is getting inserted into often, which is why I'm hesitant to use the copy-and-truncate method of trimming the size. Are there any other options I'm missing here?
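For reference, a sketch of the kind of batched delete loop described above (the table name, date column and batch size are assumptions, not the real ones):

    -- Delete in fixed-size batches until no full batch is left.
    DECLARE @batch int;
    SET @batch = 10000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@batch)
        FROM   dbo.BigTable
        WHERE  EntryDate < '20090101';   -- EntryDate is the clustered index key

        IF @@ROWCOUNT < @batch BREAK;    -- last (partial) batch is done

        WAITFOR DELAY '00:00:01';        -- let the log and live inserts breathe
    END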
Deleting 10k records from a clustered index plus 5 non-clustered ones should definitely not take 1 minute. It sounds like you have a really, really slow IO subsystem. What are the values for:
Avg. Disk sec/Write
Avg. Disk sec/Read
Avg. Disk Write Queue Length
Avg. Disk Read Queue Length
On each drive involved in the operation (including the log ones!). If you placed indexes in separate filegroups and allocated each filegroup to its own LUN or its own disk, then you can identify which indexes are the most problematic. Also, the log flush may be a major bottleneck. SQL Server doesn't have much control here; it is all in your own hands how to speed things up. That time is not spent in CPU cycles; it is spent waiting for IO to complete, and you need an IO subsystem calibrated for the load you demand.
To reduce the IO load you should look into making indexes narrower. Primarily, make sure the clustered index is the narrowest possible that works. Then, make sure the nonclustered indexes don't include spurious, unused large columns (I've seen that...). A major gain may be had by enabling page compression. And ultimately, inspect index usage stats in sys.dm_db_index_usage_stats and see if any index is good for the axe.
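A sketch of that last check (the names are resolved from the catalog views, so nothing here is invented beyond running it in the affected database): nonclustered indexes that are written to but rarely or never read since the last restart are the candidates.

    SELECT o.name AS table_name,
           i.name AS index_name,
           us.user_seeks + us.user_scans + us.user_lookups AS reads,
           us.user_updates                                 AS writes
    FROM   sys.dm_db_index_usage_stats AS us
    JOIN   sys.indexes AS i ON i.object_id = us.object_id AND i.index_id = us.index_id
    JOIN   sys.objects AS o ON o.object_id = us.object_id
    WHERE  us.database_id = DB_ID()
      AND  i.type_desc = 'NONCLUSTERED'
    ORDER BY reads ASC, writes DESC;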
If you can't reduce the IO load much, you should try to split it. Add filegroups to the database, move large indexes to separate filegroups, and place the filegroups on separate IO paths (distinct spindles).
For future regular delete operations, the best alternative is to use partition switching: have all indexes aligned with the clustered index partitioning, and when the time is due, just drop the oldest partition for a lightning-fast deletion.
Assume for each record in the table there are 5 index records.
Now each delete is in essence 5 operations.
Add to that, you have a clustered index. Notice that the clustered index delete time is huge - roughly 10x longer than the other indexes? This is because your data is being reorganized with every record deleted.
I would suggest dropping at least that index, doing a mass delete, then reapplying it. Index operations on delete and insert are inherently costly. A single rebuild is likely a lot faster.
I second the suggestion that @NickLarsen made in a comment. Find out if you have unused indexes and drop them. This could reduce the overhead of those index deletes, which might be enough of an improvement to make the operation more timely.
Another, more radical strategy is to drop all the indexes, perform your deletes, and then quickly recreate the indexes for the now-smaller data set. This doesn't necessarily interrupt service, but it would probably make queries a lot slower in the meantime. I am not a Microsoft SQL Server expert, though, so take my advice on this strategy with a grain of salt.
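A sketch of that more radical approach, with made-up index, table and column names:

    -- Drop the nonclustered indexes that are slowing the delete down.
    DROP INDEX IX_BigTable_Status   ON dbo.BigTable;
    DROP INDEX IX_BigTable_Customer ON dbo.BigTable;

    -- Do the mass delete once, without per-row index maintenance.
    DELETE FROM dbo.BigTable
    WHERE  EntryDate < '20090101';

    -- Recreate the indexes against the now-smaller table.
    CREATE NONCLUSTERED INDEX IX_BigTable_Status   ON dbo.BigTable (Status);
    CREATE NONCLUSTERED INDEX IX_BigTable_Customer ON dbo.BigTable (CustomerID);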
More of a workaround, but can you add an IsDeleted flag to the table and update that to 1 rather than deleting the rows? You will need to modify your SELECTs and UPDATEs to use this flag.
Then you can schedule deletion or archiving of these records for off-hours.
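A sketch of the soft-delete idea (table and column names are assumptions; the filtered index needs SQL Server 2008 or later):

    -- Add the flag with a default of 0 (not deleted).
    ALTER TABLE dbo.BigTable
        ADD IsDeleted bit NOT NULL CONSTRAINT DF_BigTable_IsDeleted DEFAULT (0);

    -- Keep "live" queries cheap by indexing only the non-deleted rows.
    CREATE NONCLUSTERED INDEX IX_BigTable_Active
        ON dbo.BigTable (EntryDate)
        WHERE IsDeleted = 0;

    -- During business hours, flag instead of deleting.
    UPDATE dbo.BigTable SET IsDeleted = 1 WHERE EntryDate < '20090101';

    -- The off-hours job then removes (or archives) flagged rows in batches.
    DELETE TOP (10000) FROM dbo.BigTable WHERE IsDeleted = 1;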
It would take some work to implement given that this is in production, but if you are on SQL Server 2005 / 2008 you should investigate converting the table to a partitioned table; then the removal of old data can be achieved extremely quickly. Partitioning is designed for a 'rolling window' type effect and prevents large-scale deletes from tying up a table / process.
Unfortunately, with the table in production, migrating it across to this technique will take some T-SQL coding, knowledge and a weekend to upgrade / migrate it. Once in place, though, any existing selects and inserts will work against it seamlessly; the partition maintenance and addition / removal is where you need the T-SQL to control the process.
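A sketch of what the rolling-window setup looks like once migrated (all names here are made up, and table partitioning requires Enterprise Edition on 2005/2008):

    -- Partition the data by month on the date column.
    CREATE PARTITION FUNCTION pf_ByMonth (datetime)
        AS RANGE RIGHT FOR VALUES ('20090101', '20090201', '20090301');

    CREATE PARTITION SCHEME ps_ByMonth
        AS PARTITION pf_ByMonth ALL TO ([PRIMARY]);

    -- The table and its aligned indexes are rebuilt on ps_ByMonth(EntryDate), and
    -- dbo.BigTable_Switch is an empty table with identical structure on the same
    -- filegroup as partition 1. Removing the oldest month is then a metadata operation:
    ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_Switch;
    TRUNCATE TABLE dbo.BigTable_Switch;
    ALTER PARTITION FUNCTION pf_ByMonth() MERGE RANGE ('20090101');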
We have a SQL table that is populated with events from our website (mostly error logging and the like.) The table has several text fields that contain all of the information about the type of event, and a date/time field that shows when the event was logged. The table is fairly large and grows by around 10-100 records per day.
Obviously, when going through this log, we are often looking for the most recent items, so I figured an obvious way to improve our search times would be to add an index on the date field. I figured that while either ASC or DESC would work, DESC would be better since that's the way we're searching most of the time. Our DB guy said "no way" - it would be really bad, because the index would rapidly become fragmented.
I could see why you wouldn't want to have a clustered index on date DESC, because you'd constantly be trying to insert at the beginning... but I thought with a non-clustered index it would be okay, since the records wouldn't need to be moved around. But what he's saying also makes sense - the index entries would still have to be moved around.
But how much? And how big of a hit would it be? And even if it isn't much of a hit, maybe it's still not worth it because the performance on occasional selects just couldn't improve that much? Thoughts?
I don't think it's a bad idea - quite the contrary!
Not knowing your database system, I can't really be sure why your DB guy would think this would be a bad idea. And even so - even an ascending index on the date will be quite beneficial already (at least in the case of SQL Server).
In this case, if you do frequently query by date and usually will retrieve the most recent ones, this seems like a perfect index to me! Maybe you could make it even better by adding the second most likely selection criteria (log application? log type?) to it, so that if you specify both the date and that second criteria, the search scope would be even more limited within the index.
If I were you, I would try a few sample queries against the table without this index, then add the non-clustered index on your LogDate - first with ASC, and test how your queries perform (check out their execution plans!), then try the index with DESC, and possibly also try the index with LogDate plus an additional criteria field. See what performance looks like.
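For example, a sketch along those lines (the table EventLog and columns LogDate/EventType are assumptions):

    -- The descending index, optionally covering a second common search column.
    CREATE NONCLUSTERED INDEX IX_EventLog_LogDate
        ON dbo.EventLog (LogDate DESC)
        INCLUDE (EventType);

    -- A typical "most recent first" query that can seek on this index.
    SELECT TOP (100) LogDate, EventType
    FROM   dbo.EventLog
    WHERE  LogDate >= DATEADD(DAY, -7, GETDATE())
    ORDER BY LogDate DESC;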
Marc
Indexes speed up some queries but slow down all loads. Whether or not an index gives an overall performance improvement depends on how much it speeds up your actual query workload and how much it slows down your actual loading workload (as well as deletes and updates that modify the indexed column).
In many (probably most) applications that involve storing event data, there is a huge amount of loading going on and relatively little querying, which is primarily summary-type queries that don't benefit from indexes. In these sorts of applications, indexes often do more harm than good.
In many such applications, it is possible to do loads during off hours, so even if the index gives an overall slowdown, it might be worth it to increase query speed, because someone is waiting for the query output but no one waits for the load to complete. However, the index can get so large that it overruns the file cache and each insert has to read and write a different leaf page from disk. At that point, loads start to require roughly one random-access disk read and write per row, which can cause a load to take all day.
I had a revelation a couple of years ago when optimizing a database table that was running slow. The original query would take around 20 minutes on a table with large data fields. After finally realizing that no one had bothered to index the table in the first place, I built an index on a couple of the keys. Lo and behold, the query ran in 0.15 seconds!
What are some of your biggest improvements when adding an index for a table?
This relates to a very old DBMS product, similar to Oracle but not Oracle itself. This product made it very easy to create a table with no indexes. Oracle is different in that, if you declare a primary key, Oracle will automatically create an index on the primary key. This product didn't do that.
I was called in to speed up a database that was crawling. There was a table called "CostCenters" with 900 rows in it and no indexes. A couple of years earlier, that table had had 20 rows in it. Referential integrity lookups on this table were requiring a table scan. The system was on its knees.
Creating the index took five minutes. It sped things up by a factor of 100. We did some other things, like defragmenting the disks and rebuilding some indexes that had become overpopulated. A task that had been taking 10 minutes before the speedup took two seconds after the speedup.
Having said this, don't let concerns about speed blind you to simple and sound design. You need simple and sound tables, indexes, database objects, application code, and queries. It's easy to speed up things that are simple and sound. It's much harder to take things designed only for speed and make them simple and sound.
I once modified an analytic function to include a couple of logically redundant columns in the windowing clause, allowing partition and subpartition pruning and index-based access.
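Roughly, the trick looks like this (a sketch with invented names: sales is partitioned on region_id, and each customer belongs to exactly one region, so region_id is logically redundant in the PARTITION BY, but naming it lets the predicate be pushed into the inline view and the partitions pruned):

    SELECT *
    FROM (
        SELECT s.*,
               SUM(s.amount) OVER (PARTITION BY s.region_id, s.customer_id) AS cust_total
        FROM   sales s
    ) v
    WHERE  v.region_id = 42;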
A one hour query was reduced to 0.02 seconds, that being 180,000 times faster.
http://oraclesponge.wordpress.com/2006/03/15/predicate-pushing-and-analytic-functions/
Do I win? :D
I had a quite similar case with a table which had no primary key set - so joining to this table (containing 5 rows or so) took about 10 minutes (yes, the other table was quite big).
All happening on MSSQL 2000.
After setting a PK it took less than a tenth of a second...
So the query optimizer really f***s up when no PK is present :)
For example, for heavily used tables with volumes in the order of 10 million rows that grow by a million rows a month, if the stats are 6-8 months old, how detrimental to the performance of the database is this going to be? How often should you be refreshing the stats?
Statistics are kept and used by the query planner, and they have a noticeable impact. I can't give you exact guidelines on how often you should refresh them. That will depend on how much work it takes to refresh them and how much impact fresh stats have on your queries. The real answer for this is to take good measurements and judge options by the results. Tinkering without measurement is a throw of the dice.
We refresh stats every night. There's no sense waiting for the weekend if the stats can be refreshed nightly - by Friday they will be worse than they were on Monday...
Problem is what if it takes too long?
For databases which have that problem, we refresh stats on certain tables each night - so some tables are done every night, some less often. (We have a database table of which tables to do when, plus a history of how long the stats took to regenerate, and we tune the schedule accordingly.)
"if the stats are 6-8 months old how detrimental to the performance of the database is this going to be"
I would be very surprised if it didn't make a huge difference on a table growing by 1 million rows per month.
If that is your actual state, I would expect that the tables need defragging too.
Implications are dire. You should be refreshing them as often as you can to give the optimizer the best information for making decisions. You will be able to find out how bad the statistics are by running the optdiag utility. Analysing the output, and running it again to compare over a few days or a week, will let you know exactly how bad the situation is. I would recommend that at the earliest convenience you drop and recreate the indexes and run 'update index statistics' on the table in question. This should be enough information to get you through. I am assuming that you are able to analyse the output of optdiag, though.
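A sketch of the statistics refresh, with a placeholder table name (Sybase ASE syntax); the sp_recompile call is an extra step, not mentioned above, so that cached query plans pick up the new statistics:

    -- Refresh statistics on all index columns of the table.
    update index statistics big_table
    go

    -- Force cached query plans to be recompiled against the new statistics.
    exec sp_recompile big_table
    go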