I have two tables - Things and History. Let's say Things has a column Id as the primary key and another column LastUpdated that is some kind of timestamp value. There is an index on the LastUpdated column. There are other columns, but I believe they are not of much relevance. I have two business workflows in my ASP.NET MVC Web API application:
1. Gets things that have changed since last time. It runs a query of the form SELECT [Columns] FROM Things WHERE LastUpdated > #LastUpdated. This query is fired on a connection without a transaction or transaction scope.
2. Updates a particular thing. This happens in two database transactions: the first transaction does a SELECT on Things by Id followed by an UPDATE of Things by Id; the second transaction inserts a row into the History table. Both transactions are controlled using TransactionScope with the Read Committed isolation level (sketched below).
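Roughly, the statements issued for #2 look like this sketch (@Id and @Now stand for the bound parameters, and the History column names are illustrative, not the real schema):

-- transaction 1: read the thing, then update it
BEGIN TRANSACTION;
SELECT [Columns] FROM Things WHERE Id = @Id;
UPDATE Things SET LastUpdated = @Now WHERE Id = @Id;
COMMIT TRANSACTION;

-- transaction 2: record the change
BEGIN TRANSACTION;
INSERT INTO History (ThingId, ChangedOn) VALUES (@Id, @Now);
COMMIT TRANSACTION;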
I am trying to benchmark the performance with a scenario such as:
5000 rows in Things table
25 queries (#1 above) per second
200 updates (#2 above) per second
Environment: My dev machine (Core i7, 8GB) with the SQL Server 2016 LocalDB engine.
What I am observing is that as the number of rows in the History table increases, performance degrades for both #1 and #2. #2 is understandable, as it actually writes to the History table. However, I am puzzled as to why #1 degrades so dramatically. For example, the #2 time degrades from around 10 ms with an empty database to around 65 ms with 250K rows in the History table, while the #1 time degrades from around 10 ms to 200 ms. Note that the Things table has a constant number of rows (5000). The only explanation I can think of is contention at the disk I/O level.
Am I right here? Is there any way to verify this (say, using SQL Profiler)?
I couldn't spend as much time on this as I would have liked, but as suspected, the apparent reason seems to be disk I/O contention. I moved the History table to another database on a different machine. Now the number of rows in the History table no longer affects the #1 timing; it remains more or less constant (6-7% variance) irrespective of the number of rows in the History table.
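For anyone wanting to check the same thing without Profiler: the file-level I/O stall counters give a quick signal. A minimal sketch (capture it before and after a test run and compare the deltas):

SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;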
I manage 25 SQL Server databases. All 25 databases are configured to "Auto Update Statistics". A few of these databases are 250+ GB and contain tables with 2+ billion records. The "Auto Update Statistics" setting is not sufficient to effectively keep the larger database statistics updated. I created a nightly job to update stats for all databases and tables with fullscan. This fixed our performance issues initially, but now the job is taking too long (7 hours).
How can I determine which tables need a full scan statistics update? Can I use a value from sys.dm_db_index_usage_stats or some other DMV?
Using SQL Server 2019 (version 15.0.2080.9); the compatibility level of the databases is SQL Server 2016 (130).
As of SQL Server 2016+ (database compatibility level 130+), the main formula used to decide whether stats need updating is: MIN(500 + (0.20 * n), SQRT(1000 * n)). In the formula, n is the number of rows in the table/index in question. You then compare the result of the formula to how many rows have been modified since the statistic was last updated. That count is found in either sys.sysindexes.rowmodctr or sys.dm_db_stats_properties(...).modification_counter (they have the same value).
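As a rough sketch (not production-hardened), you can compare each statistic's modification counter to that threshold yourself:

SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.rows,
       sp.modification_counter,
       sp.last_updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
  AND sp.rows IS NOT NULL
  AND sp.modification_counter >
      CASE WHEN 500 + (0.20 * sp.rows) < SQRT(1000.0 * sp.rows)
           THEN 500 + (0.20 * sp.rows)
           ELSE SQRT(1000.0 * sp.rows)
      END
ORDER BY sp.modification_counter DESC;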
Ola's scripts also use this formula internally, but you can use the StatisticsModificationLevel parameter to be more aggressive if you want (as Erin Stellato does, for example). The main reason people (like Erin) give for being more aggressive is when you know your tables have a lot of skew or churn.
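For example, a statistics-only run with Ola's IndexOptimize could look like the sketch below; the parameter names come from Ola's documentation, but check them against the version of the scripts you have installed, and the level/sample values are only an illustration:

EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @FragmentationLow = NULL,              -- skip index maintenance entirely
    @FragmentationMedium = NULL,
    @FragmentationHigh = NULL,
    @UpdateStatistics = 'ALL',
    @StatisticsModificationLevel = 5,      -- update once >= 5% of rows have changed
    @StatisticsSample = 100;               -- 100% sample = fullscan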
If you find your problem is that a filtered index isn't getting updated automatically, be aware of a long-standing issue that could be the cause.
However, ultimately, I believe the reason you have a performance problem with your nightly statistics job is that you're blindly updating all statistics for every table. It's better to update only the statistics that need it - especially since a blanket fullscan update can cause an I/O storm.
I need information on what the impact on the production DB might be of creating triggers on ~30 production tables that capture any UPDATE, DELETE, and INSERT statement and put the following information into a separate table: "PK", "Table Name", "Time of modification".
I have limited ability to test it, as I have read-only permissions on both the Prod and Test environments (and I can get one work day for 10 end users to test it).
I have estimated that the number of records inserted by those triggers will be around 150-200k daily.
Background:
I have a project to deploy a Data Warehouse for a database that is heavily customized, and there are jobs running every day that manipulate the data. The "Updated on Date" column is not being maintained (customization), and there are hard deletes occurring on tables. We decided to ask the DEV team to add triggers like the following:
CREATE TRIGGER [dbo].[triggerName] ON [dbo].[ProductionTable]
FOR INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON; -- don't send extra "rows affected" messages back to callers

    -- rows that were inserted or are the new version of updated rows
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM inserted;

    -- rows that were deleted or are the old version of updated rows
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM deleted;
END
on ~30 core production tables.
Based on this table we will pull the delta from the last 24 hours and push it to the Data Warehouse staging tables (roughly as sketched below).
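The extract for one table would look something like this sketch (the 24-hour window and 'ProductionTable' are just the example from above):

SELECT DISTINCT Regular_PK
FROM   For_ETL_Warehouse
WHERE  Table_Name = 'ProductionTable'
  AND  Insert_Date >= DATEADD(HOUR, -24, GETDATE());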
If anyone has had a similar issue and can help me estimate how it can impact performance on the production database, I would really appreciate it. (If it works, I am saved; if not, I need to propose another solution. Currently mirroring or replication might be hard to get, as the local DEVs have no idea how to set it up...)
Other ideas on how to handle this situation or perform tests are welcome (my deadline is Friday 26-01).
First of all, I would suggest you encode the table name as a smaller data type and not a character one (30 tables => tinyint).
Second, you need to understand how big the payload you are going to write is, and how it will be written:
if you choose a suitable clustered index (the date column), then the server just needs to write the data out row by row in sequence. That is an easy job even if you put in all 200k rows at once.
if you encode the table name as a tinyint, then basically it has to write:
1 byte (table name) + PK size (hopefully numeric, so <= 8 bytes) + 8 bytes datetime - so approximately 17 bytes per row on the data page, plus indexes if any, plus the log file. This is very lightweight and again will put no "real" pressure on SQL Server. A sketch of such a table is below.
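Here For_ETL_Warehouse and its columns come from your question; the tinyint lookup table and the exact data types are my assumptions:

-- lookup for the ~30 table names, so each log row stores only 1 byte for the table
CREATE TABLE dbo.ETL_Table_Lookup
(
    Table_Id   tinyint NOT NULL PRIMARY KEY,
    Table_Name sysname NOT NULL UNIQUE
);

-- narrow log table: tinyint + bigint + datetime per row
CREATE TABLE dbo.For_ETL_Warehouse
(
    Table_Id    tinyint  NOT NULL REFERENCES dbo.ETL_Table_Lookup (Table_Id),
    Regular_PK  bigint   NOT NULL,
    Insert_Date datetime NOT NULL DEFAULT (GETDATE())
);

-- clustered on the date column so new rows are simply appended in sequence
CREATE CLUSTERED INDEX CIX_For_ETL_Warehouse ON dbo.For_ETL_Warehouse (Insert_Date);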
The trigger itself will add a small overhead, but with the amount of rows you are talking about, it is negligible.
I have seen systems that do similar things on a much larger scale with close to zero effect on the workload, so I would say it's a safe bet. The only problem with this approach is that it will not work in some cases (for example, DML statements that use an OUTPUT clause without INTO cannot target a table that has triggers). But if you do not have that kind of blocker, then go for it.
I hope it helps.
I use SSMS 2016. I have a view that returns a few million records. The view is not indexed, and should not be, as the underlying data is being updated (insert, delete, update) every 5 minutes by a job on the server so that up-to-date data sets can be displayed to the calling client application in the GUI.
The view does a very heavy volume of conversions of INT values to VARCHAR, appending some string values to them.
The view also does some CAST operations on NULLs, assigning them column aliases. And the worst performance hit is that the view uses the FOR XML PATH('') function on 20 columns.
The view also uses two CTEs as the source, as well as subqueries to define a single column value.
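To give an idea, the expressions are of this general shape (table and column names are entirely illustrative, not the real view):

SELECT CAST(NULL AS varchar(20)) AS SomeAlias,
       'Prefix-' + CONVERT(varchar(12), t.SomeIntColumn) AS SomeCode,
       STUFF((SELECT ',' + CONVERT(varchar(12), d.SomeValue)
              FROM dbo.Detail AS d
              WHERE d.ParentId = t.Id
              FOR XML PATH('')), 1, 1, '') AS ValueList
FROM dbo.MainTable AS t;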
I made sure I created the right indexes (clustered, nonclustered, composite, and covering) for the columns used in the view's SELECT, JOIN, and WHERE clauses.
The Database Tuning Advisor also has not suggested anything that could substantially improve performance.
As a workaround, I decided to create two identical physical tables with a clustered index on each and keep them updated using a MERGE statement (later converted into a stored procedure and then into a SQL Server Agent job). To ensure there is no long locking of the view, I will swap (rename) the table names immediately after each merge finishes. So in this case all the heavy workload falls onto a SQL Server Agent job that keeps the tables updated.
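The swap itself would be something like this sketch (MyData and MyData_Staging are placeholder names):

BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.MyData', 'MyData_Old';
    EXEC sp_rename 'dbo.MyData_Staging', 'MyData';
    EXEC sp_rename 'dbo.MyData_Old', 'MyData_Staging';
COMMIT TRANSACTION;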
The problem is that the merge takes roughly 15 minutes at the current size of the data, which may increase in the future. So I need a near real-time design to ensure that the view has the most up-to-date information.
Any ideas?
We have a DB of size 1257 GB with 100+ objects and hourly transactions (inserts and updates).
We have set Auto Update Statistics: True and Auto Update Statistics Asynchronously: False.
When we run queries to fetch the data, they take a long time. But when we manually execute SP_UpdateStats, the same query takes much less time to fetch the same amount of data.
Please let me know whether we need to update the stats on a regular basis. And what are the advantages and disadvantages of using EXEC SP_UpdateStats?
Windows Server 2012 R2 and SSMS 2014.
But when we manually execute SP_UpdateStats, the same query takes much less time to fetch the same amount of data
Even though you have auto update statistics set to true, your statistics won't necessarily be updated frequently.
SQL Server triggers an automatic statistics update based on certain thresholds, and the thresholds below hold good for all versions earlier than SQL Server 2016:
The table size has gone from 0 to >0 rows (test 1).
The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
So based on how big your table is, you can determine the threshold using the above formula. For example, a table with 1,000,000 rows needs 500 + 20% * 1,000,000 = 200,500 modifications before an automatic update is triggered.
Starting with SQL Server 2016, this threshold has been changed and statistics updates will be triggered more frequently.
The same behaviour can be obtained in older versions with the help of trace flag 2371.
For example, if the trace flag is activated (this behaviour is the default in SQL Server 2016), a statistics update will be triggered on a table with 1 billion rows when 1 million changes occur.
If the trace flag is not activated, then the same table with 1 billion records would need 200 million changes before a statistics update is triggered.
Could you please let me know whether we have to update the stats on a regular basis? And what are the advantages and disadvantages of using EXEC SP_UpdateStats?
If you see suboptimal plans due to inaccurate statistics, go ahead and enable this trace flag.
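If you go that route on a pre-2016 instance, enabling the flag globally looks like this (on SQL Server 2016 and later this behaviour is already the default, so the flag is not needed there):

-- enable trace flag 2371 until the next restart
DBCC TRACEON (2371, -1);
-- to make it permanent, add -T2371 as a SQL Server startup parameter instead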
Talking about disadvantages: if you update statistics frequently, you will see query plans getting recompiled, which in turn can cause CPU pressure when the plans are compiled again.
Sp_UpdateStats will update the statistics on all the tables. Brent Ozar believes that this should be done much more regularly than doing reorgs or rebuilds on indexes. By updating your statistics, SQL Server is more likely to create a 'better' query plan. The downside of this is that all the statistics on all tables (whether they need it or not) will be updated, and it can take substantial resources to do so. Many DBAs run sp_updatestats on a nightly or weekly basis when the machine is not being used heavily. There are scripts that will check which tables need to be updated and update only those.
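For reference, the blanket call versus a targeted full-scan update of a single table (the table name is just a placeholder):

-- update statistics across the current database, using the default sampling
EXEC sp_updatestats;

-- update statistics for one specific table with a full scan
UPDATE STATISTICS dbo.YourBigTable WITH FULLSCAN;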
To see different approaches to updating statistics this is a good place to start:
https://www.brentozar.com/archive/2014/01/update-statistics-the-secret-io-explosion/
If the query is running slowly it is much more likely that there are other issues with the query. You should post the query and query plan and the community may be able to offer useful hints on improving the query or adding indexes to the underlying tables.
I am looking for a much better way to update tables using SSIS. Specifically, I want to optimize the updates on tables (around 10 tables use the same logic).
The logic is (a rough T-SQL sketch follows the list):
1. Select the source data from staging, then insert it into a physical temp table in the DW (i.e. TMP_Tbl).
2. Update all rows in MyTbl that match TMP_Tbl on the customerId column.
3. Insert all rows from TMP_Tbl whose customerId does not yet exist in MyTbl.
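In plain T-SQL, the update/insert pair is roughly this sketch (customerId, TMP_Tbl, and MyTbl come from the logic above; SomeColumn is a placeholder for the real columns):

-- step 2: update rows that already exist in MyTbl
UPDATE m
SET    m.SomeColumn = t.SomeColumn
FROM   dbo.MyTbl AS m
JOIN   dbo.TMP_Tbl AS t
    ON t.customerId = m.customerId;

-- step 3: insert rows whose customerId is not in MyTbl yet
INSERT INTO dbo.MyTbl (customerId, SomeColumn)
SELECT t.customerId, t.SomeColumn
FROM   dbo.TMP_Tbl AS t
WHERE  NOT EXISTS (SELECT 1 FROM dbo.MyTbl AS m WHERE m.customerId = t.customerId);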
Using the above steps, this takes some time (populating TMP_Tbl included). Hence, I planned to change the logic to delete-insert, but according to this:
In SQL, is UPDATE always faster than DELETE+INSERT? this would be a recipe for pain.
Given:
no index/keys used on the tables
some tables contain 5M rows, some contain 2k rows
each table update takes up to 2-3 minutes, which adds up to about 15 to 20 minutes all in all
these updates are in separate sequence containers that run simultaneously
Does anyone know the best approach to use? It seems like the physical temp table needs to be removed - is this normal?
With SSIS you usually BULK INSERT, not INSERT. So if you do not mind the DELETE, reinserting the rows should in general outperform an UPDATE.
Considering this, the faster approach will be:
[Execute SQL Task] Delete all records which you need to update (a sketch follows below). Depending on your DB design and queries, an index may help here.
[Data Flow Task] Fast load (using an OLE DB Destination with Data access mode: "Table or view - fast load") both updated and new records from the source into MyTbl. No need for temp tables here.
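The delete in the first step can be as simple as this sketch (StagingTbl stands in for wherever the incoming customerIds live; adjust it to your source):

DELETE m
FROM   dbo.MyTbl AS m
WHERE  EXISTS (SELECT 1
               FROM   dbo.StagingTbl AS s
               WHERE  s.customerId = m.customerId);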
If you cannot/don't want to DELETE records - your current approach is OK too.
You just need to fix the performance of that UPDATE query (adding an index should help). 2-3 minutes per record updated would be way too long.
If it is 2-3 minutes for updating millions of records, though, then it's acceptable.
Adding the correct non-clustered index to a table should not result in "much more time on the updates".
There will be a slight overhead, but if it helps your UPDATE to seek instead of scanning a big table - it is usually well worth it.