Improve UPDATE/DELETE performance in SQL Server 2012 - sql-server

We have a process in our project where records in a table with a specific flag are deleted and the remaining records' flag is updated.
The table has approximately 45 million records; half of them have flag = 'C' and the other half flag = 'P'.
The process runs once a day: it deletes all records with flag 'P' and then changes the remaining records' flag from 'C' to 'P'.
Below are the two statements, which are run through an SSIS package.
DELETE FROM dbo.RTL_Valuation WITH (TABLOCK)
WHERE Valuation_Age_Flag = 'P';
UPDATE dbo.RTL_Valuation WITH (TABLOCK)
SET Valuation_Age_Flag = 'P'
WHERE Valuation_Age_Flag = 'C';
Currently the process takes 60 minutes to complete. Is there any way the process time could be improved?
Thanks

You need to do 10,000 rows at a time. You are creating one enormous transaction that takes up a lot of room in the transaction log (so that it can be rolled back).
SET NOCOUNT ON;
DELETE TOP (10000) FROM dbo.RTL_Valuation WHERE Valuation_Age_Flag = 'P';
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (10000) FROM dbo.RTL_Valuation WHERE Valuation_Age_Flag = 'P';
END
You can try 1,000, 5,000 or some other number to determine the best 'magic' batch size for quickly deleting rows from a large table on your install of SQL Server. But it will be a lot faster than doing one big delete. The same logic applies to the update; a sketch follows below.
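For example, a minimal sketch of the batched update, using the same table and column names as the question:
SET NOCOUNT ON;
UPDATE TOP (10000) dbo.RTL_Valuation
SET Valuation_Age_Flag = 'P'
WHERE Valuation_Age_Flag = 'C';
WHILE @@ROWCOUNT > 0
BEGIN
    UPDATE TOP (10000) dbo.RTL_Valuation
    SET Valuation_Age_Flag = 'P'
    WHERE Valuation_Age_Flag = 'C';
END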

OK. I assume that when you perform your delete and update statements, it results in two scans of the entire table (one to identify the rows to delete and one to identify the rows to update), and then you have to perform fully logged delete and update operations over it.
There is a nice trick for situations like this if your database is in the simple recovery model. However, whether this is suitable for you depends on other circumstances (e.g. how many indexes your table has, whether there are references to it, data types, ...) that I am not able to assess from your description. It requires more coding, but it usually results in much better performance. You would have to test whether it works better for you than your original approach.
Anyway, the trick works like this:
Instead of delete and update operations, just select the rows you want to keep (including the change of the flag) into a new table using the SELECT INTO construct. This results in a minimally logged insert operation and a single table scan. You can also use the INSERT INTO ... SELECT construct, but then you must fulfill some additional conditions to get a minimally logged insert.
Once the data is in the new table, you have to build all required indexes on it.
Once all indexes are built, you just truncate the original table, switch the data back into it using the SWITCH command, and drop the new table. This works on the Standard Edition of SQL Server as well.
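A rough sketch of the whole trick, using the table from the question (dbo.RTL_Valuation_New is a hypothetical staging table, and col1, col2 stand for the remaining columns; both tables must have identical structure, indexes and constraints before the SWITCH):
-- one table scan, minimally logged insert: keep only the 'C' rows, already re-flagged
SELECT col1, col2, CAST('P' AS char(1)) AS Valuation_Age_Flag
INTO dbo.RTL_Valuation_New
FROM dbo.RTL_Valuation
WHERE Valuation_Age_Flag = 'C';
-- build all required indexes and constraints on dbo.RTL_Valuation_New here
TRUNCATE TABLE dbo.RTL_Valuation;
ALTER TABLE dbo.RTL_Valuation_New SWITCH TO dbo.RTL_Valuation;
DROP TABLE dbo.RTL_Valuation_New;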

Related

How to improve the update?

Description
I use Postgres together with Python 3.
There are 17 million rows in the table, and the max ID is 3000 million+.
My task is to run select id, link from table where data is null;, compute new values in my code, and then run update table set data = %s where id = %s for each row.
I tested it: a single-row update needs 0.1 s.
My thoughts
Here is my idea:
Try a new database; I have heard Redis is fast, but I don't know how to use it.
In addition, what is the best number of connections?
I used to make 5-6 connections.
Now I use only two connections, but it works better: one hour updates 2 million rows.
If there is any way you can push the calculation of the new value into the database, i.e. issue a single large UPDATE statement like
UPDATE "table"
SET data = [calculation here]
WHERE data IS NULL;
you would be much faster.
But for the rest of this discussion I'll assume that you have to calculate the new values in your code, i.e. run one SELECT to get all the rows where data IS NULL and then issue a lot of UPDATE statements, each targeting a single row.
In that case, there are two ways you can speed up processing considerably:
Avoid index updates
Updating an index is more expensive than adding a tuple to the table itself (the appropriately so-called heap, onto which it is quick and easy to pile up entries). So by avoiding index updates, you will be much faster.
There are two ways to avoid index updates:
Drop all indexes after selecting the rows to change and before the UPDATEs, and recreate them after processing is completed (see the sketch after this list).
This will be a net win if you update enough rows.
Make sure that there is no index on data and that the table has been created with a fillfactor of less than 50. Then there is enough room in the data pages to write the updated row into the same page as the original row version, which obviates the need to update the index (this is known as a HOT update).
This is probably not an option for you, since you probably didn't create the table with a fillfactor like that, but I wanted to add it for completeness' sake.
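A minimal sketch of both options; the index name, column names and fillfactor value here are made up:
-- Option 1: drop the indexes, run the updates, then recreate them
DROP INDEX idx_table_link;                       -- repeat for every index on the table
-- ... run all the UPDATE statements here ...
CREATE INDEX idx_table_link ON "table" (link);
-- Option 2: the fillfactor must be chosen when the table is created
CREATE TABLE "table" (
    id   bigint,
    link text,
    data text
) WITH (fillfactor = 45);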
Bundle many updates in a single transaction
By default, each UPDATE will run in its own transaction, which is committed at the end of the statement. However, each COMMIT forces the transaction log (WAL) to be written out to disk, which slows down processing considerably.
You do that by explicitly issuing a BEGIN before the first UPDATE and a COMMIT after the last one. That will also make the whole operation atomic, so that all changes are undone automatically if processing is interrupted.
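In plain SQL the batching looks like this (the individual UPDATEs stand for whatever your code issues):
BEGIN;
UPDATE "table" SET data = 'value1' WHERE id = 1;
UPDATE "table" SET data = 'value2' WHERE id = 2;
-- ... several thousand more single-row updates ...
COMMIT;  -- only now is the WAL forced out to disk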

Is it possible to do INSERT + UPDATE + DELETE vs DELETE + INSERT?

I am using SQL Server 2008, and this type of question (INSERT + UPDATE vs DELETE + INSERT) has been asked a number of times, but my situation is different based on my understanding. Please see the attached image below.
I have a BOM table; later I need to update the quantity, insert a new material, and delete a material that is no longer needed.
My questions are: for a very large table (10M+ rows)
is it possible to do INSERT + UPDATE + DELETE?
If yes, is it better than using DELETE + INSERT?
I searched, and the questions/answers I found were for case 1, but I need a solution for case 2.
It depends on the table/index structure, triggers, etc.
In certain cases you may execute an UPDATE statement, but in fact it can be a DELETE/INSERT operation in the background.
In certain cases it's faster to use 'soft deletion': just mark the row as deleted and then, during quiet times (over the weekend), schedule a job to delete the rows physically, defragment indexes, etc. You can use the MERGE command to do INSERT/UPDATE/DELETE in one go (see the sketch below), but you need to measure its performance against individual INSERT/DELETE/UPDATE statements. In many cases I prefer to do the deletes first, then the updates, and finally the inserts.
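For illustration, a minimal MERGE sketch, assuming a staging table #NewBOM that holds the desired final state (all names here are made up):
MERGE dbo.BOM AS target
USING #NewBOM AS source
    ON target.MaterialId = source.MaterialId
WHEN MATCHED THEN
    UPDATE SET target.Quantity = source.Quantity   -- quantity changed
WHEN NOT MATCHED BY TARGET THEN
    INSERT (MaterialId, Quantity)
    VALUES (source.MaterialId, source.Quantity)    -- new material
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                                        -- material no longer needed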
It also depends on the number of changed rows... Sometimes it's faster to move the data to a new table (a heap), then create the indexes there and finally drop/rename the tables.

Bulk update in sql server

If I have millions of rows to update in SQL Server, how do I proceed? Is there a logical method for a bulk update, as it will otherwise lock the table for a long time?
You can run the code as a bulkadmin user, which means triggers won't fire and the insert will take less time if you have triggers. The downside is that the triggers won't fire, so you may need them to fire, or have to write code to do what the triggers would have done.
You can run the update in batches (of, say, 10,000 rows) so that you are not updating millions of rows in one transaction. Make sure each batch is in its own transaction, as in the sketch below. It is often best to do this during non-peak hours as well.
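A sketch of that batching pattern, with each batch committed separately (table and column names are placeholders):
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;
    UPDATE TOP (10000) dbo.SomeTable
    SET somefield = 0
    WHERE somefield <> 0;
    SET @rows = @@ROWCOUNT;
    COMMIT TRANSACTION;
END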
You can make sure your code is written so that it will not update rows that don't need updating. For instance, you might see an update statement like this:
UPDATE table1
SET somefield = 0
when you might need:
UPDATE table1
SET somefield = 0
WHERE somefield <> 0
There is a huge performance difference between updating a million rows and updating only the 35 that actually need it.
If your update is from a file, use SSIS, which is optimized for high performance if you write the package correctly.
The interviewer(s) probably wanted to gauge your knowledge of lock granularity. If the "bulk" update would otherwise lock the table, you could update smaller subsets of the table data, each in a separate transaction, until all necessary rows have been updated. Smaller subsets of data allow the updates to hold lesser locks, such as extent, page, or RID locks.

How do I speed up deletes from a large database table?

Here's the problem I am trying to solve: I have recently completed a data layer re-design that allows me to load-balance my database across multiple shards. In order to keep shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B, and then deleting the records from shard A. But I have several tables that are very big, and have many foreign keys pointed to them, so deleting a single record from the table can take more than one second.
In some cases I need to delete millions of records from the tables, and it just takes too long to be practical.
Disabling foreign keys is not an option. Deleting large batches of rows is also not an option because this is a production application and large deletes lock too many resources, causing failures. I'm using SQL Server, and I know about partitioned tables, but the restrictions on partitioning (and the license fees for Enterprise Edition) are so unrealistic that they are not possible.
When I began working on this problem I thought the hard part would be writing the algorithm that figures out how to delete rows from the leaf level up to the top of the data model, so that no foreign key constraints get violated along the way. But solving that problem did me no good since it takes weeks to delete records that need to disappear overnight.
I already built in a way to mark data as virtually deleted, so as far as the application is concerned, the data is gone, but I'm still dealing with large data files, large backups, and slower queries because of the sheer size of the tables.
Any ideas? I have already read older related posts here and found nothing that would help.
Please see: Optimizing Delete on SQL Server
This MS support article might be of interest: How to resolve blocking problems that are caused by lock escalation in SQL Server:
Break up large batch operations into several smaller operations. For example, suppose you ran the following query to remove several hundred thousand old records from an audit table, and then you found that it caused a lock escalation that blocked other users:
DELETE FROM LogMessages WHERE LogDate < '2/1/2002'
By removing these records a few hundred at a time, you can dramatically reduce the number of locks that accumulate per transaction and prevent lock escalation. For example:
SET ROWCOUNT 500
delete_more:
DELETE FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more
SET ROWCOUNT 0
Reduce the query's lock footprint by making the query as efficient as possible. Large scans or large numbers of bookmark lookups may increase the chance of lock escalation; additionally, they increase the chance of deadlocks and generally adversely affect concurrency and performance.
delete_more:
DELETE TOP (500) FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more
You could achieve the same result using SET ROWCOUNT, as suggested by Mitch, but according to MSDN it won't be supported for DELETE and some other operations in future versions of SQL Server:
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL).
You could create new files, copy all but the "deleted" rows, then swap the names on the tables. Finally, drop the old tables. If you're deleting a large percentage of the records, then this may actually be faster.
Another suggestion is to rename the table and add a status column. When status = 1 (deleted), you won't want the row to show, so you create a view with the same name as the original table which selects from the table where status is null or = 0 (depending on how you implement it). The deletion appears immediate to the user, and a background job can run every fifteen minutes deleting records without anyone other than the DBAs being aware of it.
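A sketch of that rename-plus-view approach (the table and column names here are invented):
EXEC sp_rename 'dbo.Report', 'Report_Data';
ALTER TABLE dbo.Report_Data ADD status tinyint NULL;   -- 1 = deleted
GO
CREATE VIEW dbo.Report
AS
SELECT id, amount          -- list all of the original columns here
FROM dbo.Report_Data
WHERE status IS NULL OR status = 0;
GO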
If you're using SQL 2005 or 2008, perhaps using "snapshot isolation" would help you. It allows the data to remain visible to users while an underlying data update operation is processing, and then reveals the new data as soon as it's committed. Even if your delete takes 30 minutes to run, your applications will stay online during this time.
Here's a quick primer of snapshot locking:
http://www.mssqltips.com/tip.asp?tip=1081
Though you should still try to speed up your delete so it's as quick as possible, this may alleviate some of the burden.
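Enabling it is a one-time database setting; for example (MyDatabase is a placeholder name):
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
-- each reading session then opts in:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;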
You can delete in small batches using a while loop, something like this:
DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
WHILE @@ROWCOUNT > 0
BEGIN
DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
END
If a sizeable percentage of the table is going to match the deletion criteria (near or over 50%), then it is "cheaper" to create a temporary table with the records that are not going to be deleted (reverse the WHERE criteria), truncate the original table and then repopulate it with the records that were intended to be kept.
-- instead of:
DELETE FROM MyTable WHERE Row_To_Delete = 'OK';
GO
-- do this (keep the surviving rows, then swap them back in):
SELECT * INTO #MyTable FROM MyTable WHERE Row_To_Delete <> 'OK';
TRUNCATE TABLE MyTable;
INSERT INTO MyTable SELECT * FROM #MyTable;
DROP TABLE #MyTable;
GO
Here is a solution to the problem:
DECLARE @RC AS INT
SET @RC = -1
WHILE @RC <> 0
BEGIN
    DELETE TOP (1000000) FROM [Archive_CBO_ODS].[CBO].[AckItem] WHERE [AckItemId] >= 300
    SET @RC = @@ROWCOUNT
END

Is deleting all records in a table a bad practice in SQL Server?

I am moving a system from a VB/Access app to SQL Server. One common thing in the Access database is the use of tables to hold data that is being calculated, then using that data for a report.
e.g.
delete from treporttable
insert into treporttable (.... this thing and that thing)
update treporttable set x = x * price where (...etc)
and then the report runs from treporttable.
I have heard that SQL Server does not like it when all records from a table are deleted, as it creates huge logs, etc. I tried temp SQL tables, but they don't persist long enough for the report, which runs in a different process.
There are a number of places where this is done to different report tables in the application. The reports can be run many times a day, and each run creates a large number of records in the report tables.
Can anyone tell me if there is a best practice for this, or whether my information about the logs is incorrect and this code will be fine in SQL Server?
If you do not need to log the deletion activity, you can use the TRUNCATE TABLE command.
From Books Online:
TRUNCATE TABLE is functionally identical to a DELETE statement with no WHERE clause: both remove all rows in the table. But TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE.
http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx
delete from sometable
is going to allow you to roll back the change. So if your table is very large, this can cause a lot of memory usage and take a long time.
However, if you have no fear of failure then:
truncate table sometable
Will perform nearly instantly, and with minimal memory requirements. There is no rollback though.
To Nathan Feger:
You can rollback from TRUNCATE. See for yourself:
CREATE TABLE dbo.Test(i INT);
GO
INSERT dbo.Test(i) SELECT 1;
GO
BEGIN TRAN
TRUNCATE TABLE dbo.Test;
SELECT i FROM dbo.Test;
ROLLBACK
GO
SELECT i FROM dbo.Test;
GO
i
(0 row(s) affected)
i
1
(1 row(s) affected)
You could also DROP the table, and recreate it...if there are no relationships.
The [DROP table] statement is transactionally safe whereas [TRUNCATE] is not.
So it depends on your schema which direction you want to go!!
Also, use SQL Profiler to analyze your execution times. Test it out and see which is best!!
The answer depends on the recovery model of your database. If you are in full recovery mode, then you have transaction logs that could become very large when you delete a lot of data. However, if you're backing up transaction logs on a regular basis to free the space, this might not be a concern for you.
Generally speaking, if the transaction logging doesn't matter to you at all, you should TRUNCATE the table instead. Be mindful, though, of any identity seeds, because TRUNCATE will reseed the table.
EDIT: Note that even if the recovery model is set to Simple, your transaction log will still grow during a mass delete. The log will just be cleared afterward (without releasing the space). The point is that DELETE logs every row, even if only temporarily.
Consider using temporary tables. Their names start with # and they are dropped when nobody refers to them anymore. Example:
create table #myreport (
    id int identity,
    col1 int,
    ...
)
Temporary tables are made to be thrown away, and that happens very efficiently.
Another option is using TRUNCATE TABLE instead of DELETE. The truncate will not grow the log file.
I think your example has a possible concurrency issue. What if multiple processes are using the table at the same time? Adding a JOB_ID column or something like that would allow you to clear the relevant entries in this table without clobbering the data being used by another process.
Actually, tables such as treporttable do not need to be recovered to a point in time. As such, they can live in a separate database with the simple recovery model. That eases the burden of logging.
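For example, assuming the report tables are moved into a hypothetical ReportingDB:
ALTER DATABASE ReportingDB SET RECOVERY SIMPLE;
-- the log is now truncated at each checkpoint, so the constant
-- delete/insert churn in the report tables stays cheap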
There are a number of ways to handle this. First, you can move the creation of the data into the running of the report itself. I feel this is the best way to handle it; you can then use temp tables to temporarily stage your data, and no one will have concurrency issues if multiple people try to run the report at the same time. Depending on how many reports we are talking about, it could take some time to do this, so you may need another short-term solution as well.
Second, you could move all your reporting tables to a different database that is set to simple mode and truncate them before running your queries to populate them. This is closest to your current process, but it could be an issue if multiple users try to run the same report.
Third, you could set up a job to populate the tables (still in a separate database set to simple recovery) once a day (truncating at that time). Then anyone running a report that day will see the same data and there will be no concurrency issues. However, the data will not be up to the minute. You could also set up a reporting data warehouse, but that is probably overkill in your case.
