I have an MSSQL 2005 database with records dating back to 2004; there are currently just under 1,000,000 records in one particular table.
Thing is, if I run a report comparing 2009 data against 2010 data, 2008 against 2009, 2009 against 2009, or any combination of years before this year, then results are returned in 1-5 seconds.
If however I run a report that includes 2011 data then the report takes ~6 minutes.
I've checked the data and it looks similar to previous years and is cross-referenced against the same data used in all of the reports.
It's as if the database has exceeded some limit and the data for this year has become fragmented and therefore harder to access. I'm not saying this is the case, but it may be for all I know.
Anyone have any suggestions?
Shaun.
Update: Since posting the question I found DBCC DBREINDEX table_name, which seems to have done the trick.
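For reference, what I ran looked roughly like the following (the table name is a placeholder); note that on SQL Server 2005 and later, ALTER INDEX ... REBUILD is the documented replacement for the deprecated DBCC DBREINDEX:
-- Rebuild all indexes on the table (placeholder name); a rebuild also
-- refreshes that table's index statistics with a full scan as a side effect.
DBCC DBREINDEX ('dbo.YourReportTable');
-- Equivalent on SQL Server 2005 and later:
ALTER INDEX ALL ON dbo.YourReportTable REBUILD;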
What do the execution plans look like? If they differ, you might need to manually update statistics on the table, as the newly inserted rows are likely to be under-represented in the statistics, and the optimizer may therefore choose a suboptimal plan.
See this blog post for an explanation of this issue: Statistics, row estimations and the ascending date column.
Additionally, check whether your 2011 query is encountering blocking from concurrent inserts or updates that would not affect queries against historic data.
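If the estimates for the new rows do turn out to be the problem, a minimal sketch of a manual statistics refresh (the table name is a placeholder) would be:
-- Refresh all statistics on the table with a full scan so the newest
-- (2011) rows are represented in the histograms; the table name is made up.
UPDATE STATISTICS dbo.YourReportTable WITH FULLSCAN;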
Related
I manage 25 SQL Server databases. All 25 databases are configured to "Auto Update Statistics". A few of these databases are 250+ GB and contain tables with 2+ billion records. The "Auto Update Statistics" setting is not sufficient to effectively keep the larger database statistics updated. I created a nightly job to update stats for all databases and tables with fullscan. This fixed our performance issues initially, but now the job is taking too long (7 hours).
How can I determine which tables need a full scan statistics update? Can I use a value from sys.dm_db_index_usage_stats or some other DMV?
We're using SQL Server 2019 (version 15.0.2080.9), and the compatibility level of the databases is SQL Server 2016 (130).
As of SQL Server 2016+ (db compatibility level 130+), the main formula used to decide if stats need updating is: MIN(500 + (0.20 * n), SQRT(1000 * n)). In the formula, n is the count of rows in the table/index in question. You then compare the result of the formula to how many rows have been modified since the statistic was last updated. That's found in either sys.sysindexes.rowmodctr or sys.dm_db_stats_properties(...).modification_counter (they report the same value).
Ola's scripts also use this formula internally, but you can use the StatisticsModificationLevel parameter to be more aggressive if you want (as Erin Stellato suggests). The main reason people like Erin give for being more aggressive is if you know your tables have a lot of skew or churn.
If you find your problem is that a filtered index isn't getting updated automatically, be aware of a long-standing issue that could be the cause.
Ultimately, though, I believe your nightly statistics job has a performance problem because it blindly updates all statistics for every table. It's better to update only the statistics that need it, especially since updating everything can cause an IO storm.
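As a sketch of how you might target only the stale statistics rather than everything, the query below compares each statistic's modification counter to the SQL Server 2016+ threshold formula given above; treat it as an illustration, not a drop-in script:
-- List statistics whose modification count exceeds the auto-update
-- threshold MIN(500 + 0.20 * n, SQRT(1000 * n)); these are candidates
-- for a targeted (possibly FULLSCAN) update.
SELECT  OBJECT_SCHEMA_NAME(s.object_id) AS schema_name,
        OBJECT_NAME(s.object_id)        AS table_name,
        s.name                          AS stats_name,
        sp.rows,
        sp.modification_counter,
        sp.last_updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
  AND sp.rows IS NOT NULL
  AND sp.modification_counter >
      CASE WHEN 500 + 0.20 * sp.rows < SQRT(1000.0 * sp.rows)
           THEN 500 + 0.20 * sp.rows
           ELSE SQRT(1000.0 * sp.rows) END
ORDER BY sp.modification_counter DESC;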
We have a DMV query that executes every 10 minutes and inserts usage statistics such as SESSION_CURRENT_DATABASE, SESSION_LAST_COMMAND_START_TIME, etc., and it had supposedly been running fine for the last 2 years.
Today we were notified by the data hyperingestion team that the last records shown were from 6/10, so we found out the job had been stuck for 14 days, capturing no new statistics since then. We immediately restarted the job and it has been executing successfully since this morning, but we've basically lost the data for this 14-day period. Is there a way for us to run this DMV query against $SYSTEM.DISCOVER_SESSIONS for the 6/10-6/24 window to recover these past 14 days of data?
Or all hope's lost?
DMV query:
SELECT [SESSION_ID]
,[SESSION_SPID]
,[SESSION_CONNECTION_ID]
,[SESSION_USER_NAME]
,[SESSION_CURRENT_DATABASE]
,[SESSION_USED_MEMORY]
,[SESSION_PROPERTIES]
,[SESSION_START_TIME]
,[SESSION_ELAPSED_TIME_MS]
,[SESSION_LAST_COMMAND_START_TIME]
,[SESSION_LAST_COMMAND_END_TIME]
,[SESSION_LAST_COMMAND_ELAPSED_TIME_MS]
,[SESSION_IDLE_TIME_MS]
,[SESSION_CPU_TIME_MS]
,[SESSION_LAST_COMMAND_CPU_TIME_MS]
,[SESSION_READS]
,[SESSION_WRITES]
,[SESSION_READ_KB]
,[SESSION_WRITE_KB]
,[SESSION_COMMAND_COUNT]
FROM $SYSTEM.DISCOVER_SESSIONS
I wouldn't say it's "gone" unless the instance has been restarted or the db has been detached. For example, the DMV for procedure usage should still have data in it, but you won't be able to specifically recreate what it looked like 14 days ago.
You can get a rough idea by looking back through the 2 years of data you already have and getting a sense of whether there are spikes or consistent usage. Then grab a snapshot of the DMV today and extrapolate it back 14 days to get a rough idea of what usage was like.
We have a SQL Server 2012 instance with auto-update statistics set to ON for the DB:
But then I ran a query to check some statistics, and some haven't been updated in a while:
Why is this the case? Is there a rule to whether SQL Server updates statistics that haven't been triggered by these indexes?
Do I need to care? How do I know if I need to update them, or if they are causing performance issues for me?
Thanks!
Even though you set Auto Update Statistics to true, statistics are updated only when a threshold has been reached, and this threshold is different for different versions.
Thresholds for SQL Server 2012 or older:
The table size has gone from 0 to > 0 rows
The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then
The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered
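For example, under these older thresholds, a table that had 1,000,000 rows when its statistics were last gathered needs more than 500 + 20% of 1,000,000 = 200,500 modifications to the leading column before an automatic update fires.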
For SQL Server 2016, there are a few major changes, and SQL Server updates statistics with a new algorithm (read: more frequently than in older versions).
Do I need to care? How do I know if I need to update them, or if they are causing performance issues for me?
Normally people schedule maintenance jobs during weekends, and this includes index rebuilds/stats updates.
This should normally take care of most databases. In your case, if you are seeing performance issues due to invalid stats, you can update them manually. We do it once a week, but sites like StackOverflow do it more often:
UPDATE STATISTICS TableName; -- optionally add WITH FULLSCAN
Further reading/references:
Statistics Used by the Query Optimizer in Microsoft SQL Server 2008
Understanding When Statistics Will Automatically Update
I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the where.
Now, this query worked fine (in under 2 seconds) for the last year and has remained practically unchanged (I tried old versions and the problem persists).
So today, all of a sudden, the same query takes around 1-1.5 minutes.
After checking the execution plan in SQL Server 2005, rebuilding the FULLTEXT index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server service, and restarting the whole server, I don't know what else to try.
I temporarily switched the query to use LIKE instead until I figure this out (which takes about 6 seconds now).
When I look at the query in the query performance analyser and compare the FREETEXT query with the LIKE query, the former has 350 times as many reads (4921261 vs. 13943) and 20 times the CPU usage (38937 vs. 1938) of the latter.
So it really is the FREETEXT predicate that causes it to be so slow.
Has anyone got any ideas on what the reason might be? Or further tests I could do?
[Edit]
Well, I just ran the query again to get the execution plan and now it takes 2-5 seconds again, without any changes made to it, though the problem still existed yesterday. And it wasn't due to any external factors, as I'd stopped all applications accessing the database when I first tested the issue last Thursday, so it wasn't due to any other load.
Well, I'll still include the execution plan, though it might not help a lot now that everything is working again... And beware, it's a huge query against a legacy database that I can't change (i.e. normalize data or get rid of some unnecessary intermediate tables).
Query plan
OK, here's the full query.
I might have to explain what exactly it does. Basically, it gets search results for job ads, where there are two types of ads, premium ones and normal ones. The results are paginated to 25 results per page, 10 premium ones up top and 15 normal ones after that, if there are enough.
So there are two inner queries that select as many premium/normal ones as needed (e.g. on page 10 it fetches the top 100 premium ones and top 150 normal ones), then those two queries are interleaved with a ROW_NUMBER() and some math. Then the combination is ordered by row number and the query is returned. It's used in another place to just get the 25 ads needed for the current page.
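To make that concrete, here is a rough, hypothetical sketch of the interleaving logic described above; the table, column, and variable names (dbo.JobAds, IsPremium, PostedDate, @Page) are all made up and not from the real query:
-- Hypothetical sketch: 10 premium + 15 normal ads per 25-row page.
DECLARE @Page INT;
SET @Page = 10;

WITH Premium AS (
    SELECT TOP (10 * @Page) *,
           ROW_NUMBER() OVER (ORDER BY PostedDate DESC) AS rn
    FROM dbo.JobAds
    WHERE IsPremium = 1
    ORDER BY PostedDate DESC
),
Normal AS (
    SELECT TOP (15 * @Page) *,
           ROW_NUMBER() OVER (ORDER BY PostedDate DESC) AS rn
    FROM dbo.JobAds
    WHERE IsPremium = 0
    ORDER BY PostedDate DESC
)
SELECT *
FROM (
    -- Premium ads occupy positions 1-10, 26-35, ... of the combined list.
    SELECT *, ((rn - 1) / 10) * 25 + ((rn - 1) % 10) + 1  AS page_rn FROM Premium
    UNION ALL
    -- Normal ads occupy positions 11-25, 36-50, ...
    SELECT *, ((rn - 1) / 15) * 25 + ((rn - 1) % 15) + 11 AS page_rn FROM Normal
) AS combined
ORDER BY page_rn;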
Oh, and this whole query is constructed in a HUGE legacy ColdFusion file, and as it's been working fine, I haven't dared touching/changing large portions so far... never touch a running system and so on ;) Just small stuff like changing bits of the central where clause.
The file also generates other queries which do basically the same thing, but without the premium/non-premium distinction, plus a lot of other variations of this query, so I'm never quite sure how a change to one of them might affect the others...
OK, as the problem hasn't surfaced again, I gave Martin the bounty, as he's been the most helpful so far and I didn't want the bounty to expire needlessly. Thanks to everyone else for their efforts; I'll try your suggestions if it happens again :)
This issue might arise from a poor cardinality estimate of the number of results that will be returned by the full text query, leading to a poor strategy for the JOIN operations.
How do you find performance if you break it into 2 steps?
One new step that populates a temporary table or table variable with the results of the full text query, and a second one that changes your existing query to refer to the temp table instead.
(NB: You might want to try this JOIN with and without OPTION(RECOMPILE) whilst looking at query plans for (A) a free text search term that returns many results and (B) one that returns only a handful of results.)
Edit: It's difficult to be precise in the absence of the offending query, but what I mean is, instead of doing
SELECT <col-list>
FROM --Some 6 table Join
WHERE FREETEXT(...);
How does this perform?
DECLARE @Table TABLE
(
<pk-col-list>
)
INSERT INTO @Table
SELECT PK
FROM YourTable
WHERE FREETEXT(...)
SELECT <col-list>
FROM --Some 6 table Join including onto @Table
OPTION(RECOMPILE)
Usually when we have this issue, it is because of table fragmentation and stale statistics on the indexes in question.
Next time, try running EXEC sp_updatestats after a rebuild/reindex.
See Using Statistics to Improve Query Performance for more info.
Have you ever seen any of these error messages?
-- SQL Server 2000
Could not allocate ancillary table for view or function resolution.
The maximum number of tables in a query (256) was exceeded.
-- SQL Server 2005
Too many table names in the query. The maximum allowable is 256.
If yes, what have you done?
Given up? Convinced the customer to simplify their demands? Denormalized the database?
#(everyone wanting me to post the query):
I'm not sure if I can paste 70 kilobytes of code in the answer editing window.
Even if I could, it wouldn't help, since those 70 kilobytes of code reference 20 or 30 views that I would also have to post, because otherwise the code would be meaningless.
I don't want to sound like I'm boasting here, but the problem is not in the queries. The queries are optimal (or at least almost optimal). I have spent countless hours optimizing them, looking for every single column and every single table that can be removed. Imagine a report with 200 or 300 columns that has to be filled by a single SELECT statement (because that's how it was designed a few years ago, when it was still a small report).
For SQL Server 2005, I'd recommend using table variables and partially building the data as you go.
To do this, create a table variable that represents your final result set you want to send to the user.
Then find your primary table (say the orders table in your example above) and pull that data, plus a bit of supplementary data that is only one join away (customer name, product name). You can do an INSERT INTO ... SELECT to put this straight into your table variable.
From there, iterate through the table and, for each row, run a bunch of small SELECT queries that retrieve all the supplemental data you need for your result set. Insert the values into each column as you go.
Once complete, you can then do a simple SELECT * from your table variable and return this result set to the user.
I don't have any hard numbers for this, but there have been three distinct instances that I have worked on to date where doing these smaller queries has actually worked faster than doing one massive select query with a bunch of joins.
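A minimal sketch of this staged approach, with entirely made-up table and column names (Orders, Customers, Products, ShipmentTotals), and using a set-based UPDATE per supplemental column rather than a literal row-by-row loop:
-- Stage 1: declare the shape of the final result set.
DECLARE @Result TABLE
(
    OrderId      INT PRIMARY KEY,
    CustomerName NVARCHAR(100),
    ProductName  NVARCHAR(100),
    ShippedTotal MONEY NULL
);

-- Stage 2: pull the primary data plus anything only one join away.
INSERT INTO @Result (OrderId, CustomerName, ProductName)
SELECT o.OrderId, c.Name, p.Name
FROM dbo.Orders o
JOIN dbo.Customers c ON c.CustomerId = o.CustomerId
JOIN dbo.Products  p ON p.ProductId  = o.ProductId;

-- Stage 3: fill each supplemental column with its own small query.
UPDATE r
SET ShippedTotal = s.Total
FROM @Result r
JOIN dbo.ShipmentTotals s ON s.OrderId = r.OrderId;

-- Stage 4: return the assembled result set.
SELECT * FROM @Result;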
#chopeen You could change the way you're calculating these statistics and instead keep a separate table of all per-product stats. When an order is placed, loop through the products and update the appropriate records in the stats table. This would shift a lot of the calculation load to the checkout page rather than running everything in one huge query when a report is run. Of course, there are some stats that aren't going to work as well this way, e.g. tracking customers' next purchases after purchasing a particular product.
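A rough illustration of the idea, with hypothetical names (ProductStats, OrderLines, RecordOrderStats); the real logic would live in whatever code handles checkout:
-- Called when an order is placed: fold the order's lines into a
-- pre-aggregated per-product stats table so reports can read it cheaply.
CREATE PROCEDURE dbo.RecordOrderStats
    @OrderId INT
AS
BEGIN
    -- Bump counters for products that already have a stats row.
    UPDATE ps
    SET ps.UnitsSold = ps.UnitsSold + o.Qty
    FROM dbo.ProductStats ps
    JOIN (SELECT ProductId, SUM(Quantity) AS Qty
          FROM dbo.OrderLines
          WHERE OrderId = @OrderId
          GROUP BY ProductId) o ON o.ProductId = ps.ProductId;

    -- Add rows for products seen for the first time.
    INSERT INTO dbo.ProductStats (ProductId, UnitsSold)
    SELECT ol.ProductId, SUM(ol.Quantity)
    FROM dbo.OrderLines ol
    WHERE ol.OrderId = @OrderId
      AND NOT EXISTS (SELECT 1 FROM dbo.ProductStats ps
                      WHERE ps.ProductId = ol.ProductId)
    GROUP BY ol.ProductId;
END;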
This would happen all the time when writing Reporting Services Reports for Dynamics CRM installations running on SQL Server 2000. CRM has a nicely normalised data schema which results in a lot of joins. There's actually a hotfix around that will up the limit from 256 to a whopping 260: http://support.microsoft.com/kb/818406 (we always thought this a great joke on the part of the SQL Server team).
The solution, as Dillie-O alludes to, is to identify appropriate "sub-joins" (preferably ones that are used multiple times) and factor them out into temp tables or table variables that you then use in your main joins. It's a major PIA and often kills performance. I'm sorry for you.
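For example (purely illustrative names), a repeated Customer/Address sub-join could be materialised once and then reused, which also reduces the table count seen by the outer query:
-- Materialise the repeated sub-join once...
SELECT c.CustomerId, c.Name, a.City
INTO #CustomerAddress
FROM dbo.Customer c
JOIN dbo.Address a ON a.AddressId = c.AddressId;

-- ...then reference it from the main report query instead of repeating
-- the Customer/Address join everywhere it is needed.
SELECT o.OrderId, ca.Name, ca.City
FROM dbo.Orders o
JOIN #CustomerAddress ca ON ca.CustomerId = o.CustomerId;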
#Kevin, love that tee -- says it all :-).
I have never come across this kind of situation, and to be honest the idea of referencing more than 256 tables in a query fills me with mortal dread.
Your first question should probably be "Why so many?", closely followed by "What bits of information do I NOT need?" I'd be worried that the amount of data being returned from such a query would begin to impact the performance of the application quite severely, too.
I'd like to see that query, but I imagine it's some problem with some sort of iterator, and while I can't think of any situation where it's possible, I bet it's from a bad while/case/cursor or a ton of poorly implemented views.
Post the query :D
Also, I feel like one of the possible problems could be having a ton (read: 200+) of name/value tables which could be condensed into a single lookup table.
I had this same problem: my development box runs SQL Server 2008 (where the view worked fine), but on production (with SQL Server 2005) the view didn't. I ended up creating views to avoid this limitation, using the new views as part of the query in the view that threw the error.
Kind of silly considering the logical execution is the same...
Had the same issue in SQL Server 2005 (worked in 2008) when I wanted to create a view. I resolved the issue by creating a stored procedure instead of a view.