I have a MERGE query that updates the table when a condition is satisfied and otherwise inserts the records into another table. The issue is that this insertion takes a lot of time: around 25 minutes to insert 15,000 records. What I found is that, while inserting the records, we also insert a sequence id, which is generated by a trigger associated with the table. The trigger selects the max id from 2 tables, adds 1 to it, and returns the result, which is then used by the INSERT query.
Is this the exact reason why inserts are slower in my stored procedure? This SP runs on DB2.
Triggers are not a good solution for performance. Use an auto-increment column on your table instead.
The bottleneck was indeed the trigger. I dropped the trigger and ran the SP; it took at most 2 minutes to insert around 15k records. I am using a sequence instead of the trigger now.
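For illustration, a minimal DB2 sketch of what replacing the max-id trigger with a sequence might look like. The names target_seq, target_table, id and payload are hypothetical, and the START WITH value is an assumption: in practice it would have to be higher than the largest id already present in both tables.

```sql
-- Hypothetical sequence; START WITH must exceed the current max id in both tables.
CREATE SEQUENCE target_seq AS BIGINT
    START WITH 100000
    INCREMENT BY 1
    NO CYCLE
    CACHE 50;

-- The insert asks the sequence for the next id instead of firing a trigger
-- that runs SELECT MAX(id) against two tables for every row.
INSERT INTO target_table (id, payload)
VALUES (NEXT VALUE FOR target_seq, 'example row');
```

The sequence hands out ids without reading the tables, which is the likely reason the per-row cost drops so sharply.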
Related
I have a long-running stored procedure with a lot of statements. After analyzing it, I identified a few statements that take most of the time. Those statements are all UPDATE statements.
Looking at the execution plan, the query scans the source table in parallel in a few seconds and then passes it to a Gather Streams operation, which then passes to
This is somewhat similar to the article below, and we see the same behavior with index creation statements too, causing slowness.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
The table has 60 million records and is a heap, as we do a lot of data loads, updates and deletes.
Reading the source is not a problem, as it completes in a few seconds, but the actual update, which happens serially, takes most of the time.
A few suggestions to try:
If you have indexes on the target table, dropping them before the load and recreating them afterwards should improve insert performance.
Add a WITH (TABLOCK) hint to the table you are inserting into (insert into [Table] with (tablock)). This lets SQL Server lock the table exclusively and also allows the insert to run in parallel.
Alternatively, if that doesn't yield an improvement, try adding a MAXDOP 1 hint to the query.
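As a rough T-SQL sketch of the two hints mentioned above (dbo.TargetTable, dbo.SourceTable and their columns are hypothetical names, not taken from the question):

```sql
-- TABLOCK on the insert target: takes a table-level lock and makes the
-- insert eligible for a parallel plan.
INSERT INTO dbo.TargetTable WITH (TABLOCK) (Id, Payload)
SELECT s.Id, s.Payload
FROM   dbo.SourceTable AS s;

-- MAXDOP 1 as a query-level hint: forces a serial plan for this statement only.
UPDATE t
SET    t.Payload = s.Payload
FROM   dbo.TargetTable AS t
JOIN   dbo.SourceTable AS s
       ON s.Id = t.Id
OPTION (MAXDOP 1);
```

Whether either hint helps depends on the actual plan, so it is worth comparing timings before and after.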
How often do you UPDATE the rows in this heap?
Because, unlike clustered indexes, heaps use a RID to find specific rows. The thing is that when you update a row and it no longer fits on its page, the row is moved and a forwarding pointer is left at the original location, pointing to the new one (unless you specifically rebuild the heap). That increases the number of lookups needed each time you access or update that row.
I don't really think that is what is affecting you here, but could you possibly see what happens if you add a clustered index on the table and check how the update times are affected?
Also, I assume you don't have some heavy trigger on the table doing a bunch of stuff as well, right?
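One way to check this, sketched in T-SQL under the assumption that the heap is called dbo.MyHeap and has an Id column (both hypothetical): count the forwarded records, then either rebuild the heap or try the clustered index.

```sql
-- forwarded_record_count is only populated in SAMPLED or DETAILED mode;
-- index_id 0 refers to the heap itself.
SELECT ps.forwarded_record_count, ps.record_count, ps.page_count
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID(N'dbo.MyHeap'), 0, NULL, 'DETAILED') AS ps;

-- Option 1: rebuild the heap, which removes the forwarding pointers.
ALTER TABLE dbo.MyHeap REBUILD;

-- Option 2: test with a clustered index on a suitable key (Id is an assumption).
CREATE CLUSTERED INDEX CIX_MyHeap_Id ON dbo.MyHeap (Id);
```

On a 60-million-row table the DETAILED scan and both rebuild options are heavy operations, so they are better tried on a copy first.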
Additionally, since you are referring to an article by Brent Ozar: he advocates breaking updates into batches of no more than 4,000 rows at a time, as that has been shown to be fast and stays below the roughly 5,000-lock threshold at which SQL Server tries to escalate to a table-level lock.
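A minimal sketch of such a batched update in T-SQL. The table dbo.BigTable, the Status column and its values are hypothetical; the pattern only works if each updated row stops matching the WHERE clause, otherwise the loop never ends.

```sql
DECLARE @BatchSize int = 4000;  -- below the lock escalation threshold
DECLARE @Rows      int = 1;

WHILE @Rows > 0
BEGIN
    -- Update at most @BatchSize rows that still need processing.
    UPDATE TOP (@BatchSize) t
    SET    t.Status = 'processed'
    FROM   dbo.BigTable AS t
    WHERE  t.Status = 'pending';

    -- Stop once a pass finds nothing left to update.
    SET @Rows = @@ROWCOUNT;
END;
```

Each iteration is its own short transaction, so locks are released between batches.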
I am looking for a better way to update tables using SSIS. Specifically, I want to optimize the updates on tables (around 10 tables use the same logic).
The logic is:
Select the source data from staging, then insert it into a physical temp table in the DW (i.e. TMP_Tbl).
Update all data in MyTbl from TMP_Tbl, matching on the customerId column.
Insert all rows from TMP_Tbl whose customerId does not yet exist in MyTbl.
With the above steps, populating TMP_Tbl takes some time. Hence, I planned to change the logic to delete-insert, but according to "In SQL, is UPDATE always faster than DELETE+INSERT?" this would be a recipe for pain.
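In T-SQL, the update and insert steps above could look roughly like this, assuming both read from TMP_Tbl. The CustomerName column is a hypothetical stand-in for whatever columns are actually being synchronized.

```sql
-- Update rows that already exist in MyTbl.
UPDATE m
SET    m.CustomerName = t.CustomerName
FROM   dbo.MyTbl   AS m
JOIN   dbo.TMP_Tbl AS t
       ON t.customerId = m.customerId;

-- Insert rows whose customerId is not in MyTbl yet.
INSERT INTO dbo.MyTbl (customerId, CustomerName)
SELECT t.customerId, t.CustomerName
FROM   dbo.TMP_Tbl AS t
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.MyTbl AS m
                   WHERE  m.customerId = t.customerId);
```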
Given:
no indexes/keys are used on the tables
some tables contain 5M rows, some contain 2k rows
each table update takes up to 2-3 minutes, which adds up to about 15 to 20 minutes all in all
these updates are in separate sequence containers that run simultaneously
Does anyone know the best approach to use? It seems the physical temp table needs to be removed; is this normal?
With SSIS you usually BULK INSERT, not INSERT. So if you do not mind the DELETE, re-inserting the rows should in general outperform UPDATE.
Considering this, the faster approach will be:
[Execute SQL Task] Delete all records that you need to update. (Depending on your DB design and queries, an index may help here.)
[Data Flow Task] Fast load (using OLE DB Destination, Data access mode: Table or view - fast load) both updated and new records from the source into MyTbl. No need for temp tables here.
If you cannot or don't want to DELETE records, your current approach is OK too.
You just need to fix the performance of that UPDATE query (adding an index should help). 2-3 minutes per record updated is way too long.
If it is 2-3 minutes for updating millions of records, though, then it's acceptable.
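For the Execute SQL Task, the delete could be sketched like this in T-SQL. The staging table stg.Customer is a hypothetical name; the question only says the source data comes "from staging", and this assumes staging sits on the same server as MyTbl.

```sql
-- Remove the rows that are about to be reloaded by the fast-load data flow.
DELETE m
FROM   dbo.MyTbl AS m
WHERE  EXISTS (SELECT 1
               FROM   stg.Customer AS s
               WHERE  s.customerId = m.customerId);
```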
Adding the correct non-clustered index to a table should not result in "much more time on the updates".
There will be a slight overhead, but if it helps your UPDATE to seek instead of scanning a big table - it is usually well worth it.
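As an example of such an index, assuming the join column is customerId as in the question (the index name is made up):

```sql
-- Lets the UPDATE (or DELETE) seek on customerId instead of scanning MyTbl.
CREATE NONCLUSTERED INDEX IX_MyTbl_customerId
    ON dbo.MyTbl (customerId);
```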
Stored procedure 1 has a select query returning a record set, dates to be specific. I am using a cursor to go through that record set and for each row another stored procedure is called.
Stored procedure 2 inserts about 20K rows into a table for each value from the cursor.
Since there are about 100 records in the cursor, the total number of rows inserted amounts to 200K, which makes the query run for days until it's stopped in production.
The same query takes about 8 minutes in dev.
I tried using a Foreach Loop container in SSIS (dev), and this now takes 5 minutes (dev).
Is there a faster way of inserting these records?
I considered using a table-valued function, but the join between the two is difficult considering the first record set contains only dates.
Depending on what stored procedure 2 is doing, it's probably worthwhile to look at bulk insert.
See: https://www.simple-talk.com/sql/learn-sql-server/bulk-inserts-via-tsql-in-sql-server/
You may also want to review the indexes and the configuration of the prod environment to ensure optimal performance of the load.
The link above has some suggestions on how to improve insert performance, so it is definitely worth a read.
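For reference, a basic T-SQL BULK INSERT looks something like the sketch below. The target table, file path and delimiters are all assumptions, and it only applies if the rows can be staged in a file first.

```sql
-- Hypothetical target table and file; adjust terminators to the file format.
BULK INSERT dbo.TargetTable
FROM 'C:\data\rows.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,      -- skip a header line
    BATCHSIZE       = 50000,  -- commit in chunks
    TABLOCK                   -- table lock for the duration of the load
);
```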
Suppose I have a SQL Server table that has millions of rows and receives over 2000 inserts per minute. A separate process needs to do a bulk update on this table, let's say with a where clause that will update 1000 rows. But it doesn't care about performance and could optionally run 1000 single-row updates using the primary key.
If the bulk update runs too long, it will block the incoming insertions, right? Whereas updating rows individually will allow insertions to squeak through the cracks and not block? So from the standpoint of optimizing performance for the insertions, am I better off running the updates one row at a time?
Updates will not block the inserts, but you might get unexpected behavior if the WHERE condition of the update is not applied to the newly inserted rows. So it's better to review the logic of the application to make sure that the newly inserted rows are not needed in the update.
But in general, a bulk update is much better than single-row updates.
I have a very large database, a little over 60 gigs, with many tables with millions of rows. I am getting some timeout errors, so I am rethinking some of my code design.
Currently, my pseudo code is like this:
delete from table where person=123 (deletes about 200 rows)
Then I re-insert the updated data (again, 200 rows). The data is always different, as it's time sensitive.
If I were to do an update instead of an insert, I'd have to select the row first (I'm using an ORM in C#).
tl;dr
I am just wondering, simple question: which is more cost effective,
Select/Update or Delete/Insert?
If you update any column that is part of the clustered index key, then your update is handled internally as a delete/insert anyway.
How would you handle the difference in cardinality with an UPDATE? I.e. person=123 has 200 rows to delete, but only 199 to insert. An UPDATE would not be able to handle this.
Your best approach should be to use a MERGE statement and a table-valued parameter with the new values. Of course, no ORM can handle this, but you mention 'performance', and the terms 'performance' and 'ORM' cannot be used in the same sentence...
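A rough T-SQL sketch of that MERGE plus table-valued parameter idea. Everything except the person column is hypothetical (the table dbo.PersonData, the slot and amount columns, the type and procedure names), and it assumes each row for a person can be identified by (person, slot).

```sql
-- Table type used to pass the ~200 new rows in a single call.
CREATE TYPE dbo.PersonRowType AS TABLE (
    person int           NOT NULL,
    slot   int           NOT NULL,
    amount decimal(18,2) NOT NULL,
    PRIMARY KEY (person, slot)
);
GO

CREATE PROCEDURE dbo.ReplacePersonRows
    @person int,
    @rows   dbo.PersonRowType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.PersonData AS target
    USING @rows AS source
       ON target.person = source.person
      AND target.slot   = source.slot
    WHEN MATCHED THEN
        UPDATE SET target.amount = source.amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (person, slot, amount)
        VALUES (source.person, source.slot, source.amount)
    -- Handles the 200-vs-199 case: old rows of this person with no new value go away.
    WHEN NOT MATCHED BY SOURCE AND target.person = @person THEN
        DELETE;
END;
```

From C#, the rows would be passed through ADO.NET as a structured parameter rather than through the ORM.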
With Delete/Insert, you will be writing to the database twice. One time to delete and one time to insert. You will also be logging both of those transactions separately, unless you are properly wrapping the entire process in a single transaction.
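If you stay with delete/insert, wrapping both steps in one transaction might look like this in T-SQL. The table dbo.PersonData, its columns and the sample row are hypothetical.

```sql
DECLARE @person int = 123;

BEGIN TRY
    BEGIN TRANSACTION;

    -- Remove the old rows for this person.
    DELETE FROM dbo.PersonData
    WHERE  person = @person;

    -- Re-insert the fresh rows (one shown here as a placeholder).
    INSERT INTO dbo.PersonData (person, slot, amount)
    VALUES (@person, 1, 10.00);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the delete if the insert fails, then re-raise the error.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```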
You could test both methods and watch the results in SQL Profiler, but 9 times out of 10 the UPDATE will be quicker.
A couple of caveats: I'd make sure the person key is indexed so that you are not doing a complete table scan to find the affected records.
Finally, as @Mundu says, you may want to do this using a parameterized query via ADO.NET instead of the ORM.