In my test environment, on a copy of my 4GB production database, I archived about 20% of the data, then ran a shrink on it from SSMS, specifying 20% maximum free space.
The result was a 2.7GB database with horrid performance. A particular query takes about 0.5s in production and about 11s now in test. If I remove the full-text portion of the query in test, execution time drops to about 2 seconds.
The actual execution plan is identical between production and test.
I rebuilt all the indexes and full-text indexes. Performance is still about the same. No actual content in the test database has changed since duplication.
Any thoughts on where I'd look for the culprit (besides just behind the keyboard)? :)
EDIT: OK, I repeated the process three times, with the same results each time... HOWEVER, the performance degrades BEFORE I run the shrink - as soon as I archive inactive records. 0 seconds before the archive, 18 after. I get 7 seconds back after rebuilding some indexes. The archive process:
Creates a new "Archive" DB
Identifies 3 types of keys to delete, storing them in table variables
Performs a SELECT INTO the "Archive" DB from 20 tables for those three key types
Deletes rows from 20 "Live" tables for those three key types
That's it. Post-archive, when I look at the execution plan, 40% of the time is spent in the very first operation, a clustered index scan.
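For reference, the steps above can be sketched in T-SQL roughly like this - shown for a single table and a single key type, with all database, table, and column names as hypothetical stand-ins for the real 20-table process:

```sql
-- Hypothetical sketch of the archive process described above.
DECLARE @Keys TABLE (OrderKey INT PRIMARY KEY);

-- 1. Identify the keys to archive.
INSERT INTO @Keys (OrderKey)
SELECT OrderKey
FROM dbo.Orders
WHERE IsActive = 0;

-- 2. Copy the matching rows into the "Archive" database.
SELECT o.*
INTO ArchiveDB.dbo.Orders
FROM dbo.Orders AS o
JOIN @Keys AS k ON k.OrderKey = o.OrderKey;

-- 3. Delete the archived rows from the live table.
DELETE o
FROM dbo.Orders AS o
JOIN @Keys AS k ON k.OrderKey = o.OrderKey;
```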
I'm going to delete this and repost with the question rephrased, over at the SQL site.
relocated question: https://dba.stackexchange.com/questions/22337/option-force-order-improves-performance-until-rows-are-deleted
I'm going to delete this in a few days since the question is misleading, but just in case anyone is curious as to the outcome, it was solved here:
https://dba.stackexchange.com/questions/22337/option-force-order-improves-performance-until-rows-are-deleted
The shrink wasn't the cause; I only assumed it was because of the likelihood of a shrink fragmenting data. The real issue was that deleting rows caused a bad statistical sample of the data shape to be taken. That in turn caused the query optimizer to produce a bad plan. It estimated its plan would scan about 900 rows, but it actually scanned over 52,000,000.
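A practical takeaway if anyone else hits this: after a mass delete, force a full-scan statistics update rather than relying on the auto-sampled one. A minimal sketch, with the table name as a placeholder:

```sql
-- Refresh statistics with a full scan after the mass delete,
-- so the optimizer's row estimates reflect the new data shape.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- To see when each statistics object was last updated:
SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.Orders');
```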
Thanks for all the help!
Related
We have a large database with several tables of 10-50 million rows each, but in reality we only need the data from the past 3 years. So we created new tables containing only the latest data. The new tables are exactly like the originals, e.g. they have the same indexes on the same partition ... same everything.
And everything went perfectly. The table record counts are now ~10-15 times smaller than the originals. Initial performance measurements showed a significant gain, but then we found that some other stored procedures perform worse than before - they now take 100% more time, e.g. from ~2 minutes to ~4.
The table swapping was done via sp_rename.
We rebuilt all the indexes and even rebuilt the statistics, but the effect was actually very small.
Update: we cleared the buffer pool and all cached execution plans via:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
Fortunately for me, when I switch back to the original tables, the problematic stored procedures run as fast as before. Right now I am comparing the execution plans, but it is a pain because those SPs are huge.
Any help will be appreciated.
People often ask how to reduce the time in rebuilding the indexes on a huge SQL Server table.
But I have the reverse issue. It's a table of 5 million+ rows, and when we rebuilt one of the primary non-clustered indexes, which was fragmented up to 97%, it went really quickly and was done in less than a minute.
Does SSMS report 'Rebuilding Index Completed' prematurely, while the actual re-indexing continues in the background for hours? We use SQL Server 2012.
This is the first place where I have witnessed a rebuild of a non-clustered index on a table of any real size finish in literally less than a minute, which frankly boggles my mind - especially since people always seem to ask the exact opposite question.
Any explanation on this would be highly appreciated!
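For what it's worth, ALTER INDEX ... REBUILD is a transactional operation: when the statement returns, the new index structure is in place, so SSMS is not reporting completion prematurely. One way to reassure yourself is to check fragmentation and size immediately afterwards (the table name here is hypothetical):

```sql
-- Check fragmentation and page count right after the rebuild.
SELECT i.name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.BigTable'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```

If fragmentation has dropped to near zero, the rebuild really did finish. A narrow non-clustered index over 5 million rows may only be a few hundred MB, which fast storage can rewrite in well under a minute.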
Before indexing the tables, I backed up the production database and restored the backup as my test database. Then I created non-clustered indexes on the necessary tables.
Before the indexes, the query execution time was around 20 minutes.
After the indexes, the query execution time was around 10 seconds.
Then I created these same indexes on the prod tables manually. But after creating the indexes, the execution time there was still around 10 minutes. When I researched this problem on the internet, I realized that index column order is important for performance, so I changed the column order. But performance is still bad: around 9 minutes.
What is wrong?
(Sorry for the bad English)
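On the column-order point: in a composite index, only the leading (leftmost) columns can be used for seeks, so order does matter. A hedged sketch with hypothetical table and column names:

```sql
-- Seekable for: WHERE CustomerId = @id AND OrderDate >= @from
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Date
ON dbo.Orders (CustomerId, OrderDate);

-- With the columns reversed (OrderDate, CustomerId), that same query
-- can no longer seek directly on CustomerId; the reversed order only
-- helps queries that filter on OrderDate first.
```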
From the question, I understood that indexing increased performance in both the test and prod environments, but in test the query takes 10 seconds whereas in prod it takes approximately 10 minutes.
In the prod environment there are several factors to look at:
Any locks/blocking happening on the object
Index fragmentation levels
When statistics were last updated
If you post the query and the indexing strategy, that would get you more help.
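Hedged sketches of how each of those factors can be checked (the table name is a placeholder):

```sql
-- 1. Current blocking on the server.
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;

-- 2 & 3. Fragmentation and last statistics update for one table's indexes.
SELECT i.name,
       ps.avg_fragmentation_in_percent,
       STATS_DATE(i.object_id, i.index_id) AS stats_last_updated
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```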
I am running an archive script which deletes rows from a large table (~50 million records) based on the date they were entered. The date field is the clustered index on the table, and is thus what I'm applying my conditional statement to.
I am running this delete in a while loop, trying batch sizes anywhere from 1,000 to 100,000 records. Regardless of batch size, it is surprisingly slow; something like 10,000 records get deleted per minute. Looking at the execution plan, a lot of time is spent on "Index Delete"s. There are about 15 fields in the table, and roughly 10 of them have some sort of index on them. Is there any way to get around this issue? I'm not even sure why each index delete takes so long; can someone shed some light on exactly what's happening here? This is a sample of my execution plan:
(image: http://img94.imageshack.us/img94/1006/indexdelete.png)
(The Sequence points to the Delete command)
This database is live and gets inserted into often, which is why I'm hesitant to use the copy-and-truncate method of trimming its size. Are there any other options I'm missing here?
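For context, a batched delete loop of the kind described usually looks something like this (table name, column name, and cutoff date are all hypothetical):

```sql
DECLARE @BatchSize INT = 10000;

WHILE 1 = 1
BEGIN
    -- Delete one batch, ranged on the clustered index column
    -- so each batch is a contiguous scan rather than random lookups.
    DELETE TOP (@BatchSize)
    FROM dbo.BigTable
    WHERE EntryDate < '20090101';

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to delete
END
```

Keeping batches small also limits how long locks are held and keeps individual transactions out of the way of concurrent inserts.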
Deleting 10k records from a clustered index plus 5 non-clustered ones should definitely not take 1 minute. It sounds like you have a really, really slow IO subsystem. Check the values for:
Avg. Disk sec/Write
Avg. Disk sec/Read
Avg. Disk Write Queue Length
Avg. Disk Read Queue Length
On each drive involved in the operation (including the log ones!). If you placed indexes in separate filegroups and allocated each filegroup to its own LUN or its own disk, then you can identify which indexes are most problematic. Also, the log flush may be a major bottleneck. SQL Server doesn't have much control here; it's all in your own hands how to speed things up. That time is not spent in CPU cycles; it is spent waiting for IO to complete, and you need an IO subsystem calibrated for the load you demand.
To reduce the IO load you should look into making indexes narrower. Primarily, make sure the clustered index is the narrowest possible one that works. Then, make sure the non-clustered indexes don't include spurious unused large columns (I've seen that...). A major gain may be had by enabling page compression. And ultimately, inspect index usage stats in sys.dm_db_index_usage_stats and see if any index is a candidate for the axe.
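A sketch of that usage-stats check - indexes that are written to but never read are pure delete overhead (run this in the target database; note the counters reset on instance restart):

```sql
-- Indexes maintained on every write but never used for reads.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_updates           AS writes
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND s.user_seeks + s.user_scans + s.user_lookups = 0
ORDER BY s.user_updates DESC;
```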
If you can't reduce the IO load much, you should try to split it. Add filegroups to the database, move large indexes on separate filegroups, place the filegroups on separate IO paths (distinct spindles).
For future regular delete operations, the best alternative is to use partition switching: have all indexes aligned with the clustered index partitioning, and when the time comes, just switch out the oldest partition for a lightning-fast deletion.
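A minimal sketch of that switch-out, assuming a date-partitioned table and an empty staging table with an identical schema on the same filegroup (all names are hypothetical):

```sql
-- Switch the oldest partition out: a metadata-only operation,
-- near-instant regardless of row count.
ALTER TABLE dbo.BigTable
SWITCH PARTITION 1 TO dbo.BigTable_Staging;

-- Discard the switched-out rows.
TRUNCATE TABLE dbo.BigTable_Staging;

-- Remove the now-empty boundary from the partition function.
ALTER PARTITION FUNCTION pfByDate()
MERGE RANGE ('2009-01-01');
```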
Assume for each record in the table there are 5 index records.
Now each delete is in essence 5 operations.
Add to that, you have a clustered index. Notice how the clustered index delete time is huge - (10x) longer than the other indexes? That is because your data is being reorganized with every record deleted.
I would suggest dropping at least that index, doing a mass delete, then reapplying it. Index operations on delete and insert are inherently costly. A single rebuild is likely a lot faster.
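The suggested sequence, sketched for one non-clustered index (names are hypothetical; note that dropping the clustered index itself would rewrite the table as a heap, so that step deserves extra care):

```sql
-- 1. Drop the index that makes deletes expensive.
DROP INDEX IX_BigTable_Status ON dbo.BigTable;

-- 2. Perform the mass delete without that index's maintenance cost.
DELETE FROM dbo.BigTable
WHERE EntryDate < '20090101';

-- 3. Rebuild the index once, against the smaller table.
CREATE NONCLUSTERED INDEX IX_BigTable_Status
ON dbo.BigTable (Status);
```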
I second the suggestion that @NickLarsen made in a comment. Find out if you have unused indexes and drop them. This could reduce the overhead of those index deletes, which might be enough of an improvement to make the operation more timely.
Another more radical strategy is to drop all the indexes, perform your deletes, and then quickly recreate the indexes for the now smaller data set. This doesn't necessarily interrupt service, but it could make queries a lot slower in the meantime. I am not a Microsoft SQL Server expert, though, so take my advice on this strategy with a grain of salt.
More of a workaround, but can you add an IsDeleted flag to the table and update that to 1 rather than deleting the rows? You will need to modify your SELECTs and UPDATEs to use this flag.
Then you can schedule deletion or archiving of these records for off-hours.
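A sketch of that soft-delete approach, with hypothetical table and column names:

```sql
-- Add the flag with a default so existing inserts keep working.
ALTER TABLE dbo.BigTable
ADD IsDeleted BIT NOT NULL
    CONSTRAINT DF_BigTable_IsDeleted DEFAULT (0);

-- "Deleting" during business hours becomes an update that touches
-- one column instead of maintaining every index on the row...
UPDATE dbo.BigTable
SET IsDeleted = 1
WHERE EntryDate < '20090101';

-- ...and existing reads must filter the flag, e.g.:
-- SELECT ... FROM dbo.BigTable WHERE IsDeleted = 0;
```

The physical delete of flagged rows can then run as a scheduled off-hours job.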
It would take some work to implement given this is in production, but if you are on SQL Server 2005/2008 you should investigate converting the table to a partitioned one; the removal of old data can then be achieved extremely quickly. It is designed for exactly this 'rolling window' type effect and prevents large-scale deletes from tying up a table/process.
Unfortunately, with the table in production, migrating it across to this technique will take some T-SQL coding, knowledge, and a weekend to upgrade/migrate it. Once in place, though, any existing selects and inserts will work against it seamlessly; partition maintenance and addition/removal is where you need the T-SQL to control the process.
In a comment I read
Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation.
Is this true? Under which circumstances?
As with Joel, I will echo the statement that yes, it can be true. I've found that the key to identifying the scenario he mentioned is all in the distribution of the data and the size of the index(es) you have on the specific table.
An application I used to support did a regular bulk import of 1.8 million rows into a table with 90 columns and 4 indexes, one of which covered 11 columns. The import with the indexes in place took over 20 hours to complete. Dropping the indexes, inserting, and re-creating the indexes took only 1 hour and 25 minutes.
So it can be a big help, but a lot of it comes down to your data, the indexes, and the distribution of data values.
Yes, it is true. When there are indexes on the table during an insert, the server needs to constantly re-order/page the table to keep the indexes up to date. If you drop the indexes, it can just add the rows without worrying about that, and then build the indexes all at once when you re-create them.
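A hedged sketch of that pattern around a bulk load (the file path and object names are hypothetical; note only non-clustered indexes should be disabled, since disabling the clustered index makes the table inaccessible):

```sql
-- 1. Disable the non-clustered index before the load.
ALTER INDEX IX_Staging_Lookup ON dbo.Staging DISABLE;

-- 2. Bulk load the data.
BULK INSERT dbo.Staging
FROM 'C:\import\nightly.dat'
WITH (TABLOCK);

-- 3. Rebuild all indexes once; rebuilding re-enables disabled indexes.
ALTER INDEX ALL ON dbo.Staging REBUILD;
```

Disabling rather than dropping has the advantage that the index definitions stay in the catalog, so nothing needs to be scripted out and re-created by hand.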
The exception, of course, is when the import data is already in index order. In fact, I should note that I'm working on a project right now where this opposite effect was observed. We wanted to reduce the run-time of a large import (nightly dump from a mainframe system). We tried removing the indexes, importing the data, and re-creating them. It actually significantly increased the time for the import to complete. But, this is not typical. It just goes to show that you should always test first for your particular system.
One thing you should consider when dropping and recreating indexes is that it should only be done by automated processes that run during low-volume periods of database use. While an index is dropped, it can't be used by queries that other users might be running at the same time. If you do this during production hours, your users will probably start complaining of timeouts.