I have a scenario where I need to delete each row from a table but commit multiple rows at a time.
For example : I have a table #temp1 where 100 records should be deleted. But after each 10 records deleted, I need to commit those 10 delete and so on. So after I have to commit 10 time to commit all 100 delete but loop 100 time to delete all 100 records.
I tried with loop and transaction. Sample below :
BEGIN TRANSACTION
While (#rowsdeleted <= #rowstodelete)
BEGIN
Delete records where condition is match
SET #deletedrows = ##rowcount
END
IF (deletedrows = 10)
BEGIN
COMMIT TRANSACTION
SET deletedrows =0
END
This logic is committing only one row per commit. This will delete 10 records and when commit condition matched it commits only the last deleted record and other 9 records are not committed.
Do any one have any solution or logic to implement this issue ?
declare
#check_tran int,
#deletedrows int,
#total_deletedrows int,
#max_delete int
begin
set #max_delete = 100;
set #check_tran = 0
set #total_deletedrows =0
BEGIN TRANSACTION
While (#total_deletedrows <= #max_delete)
BEGIN
Delete records where condition is match
SET #deletedrows = ##rowcount
SET #check_tran = #check_tran + #deletedrows
SET #total_deletedrows = #total_deletedrows + #deletedrows
print #deletedrows
if (#check_tran >= 10)
begin
COMMIT TRANSACTION
SET #check_tran = 0
BEGIN TRANSACTION
end
END
-- when record count of table will be 105 records then 5 records are not be commited
-- so if transaction count > 0 then commit transaction
if (##TRANCOUNT>0)
begin
COMMIT TRANSACTION
end
end
I've an UPDATE statement which can update more than million records. I want to update them in batches of 1000 or 10000. I tried with ##ROWCOUNT but I am unable to get desired result.
Just for testing purpose what I did is, I selected table with 14 records and set a row count of 5. This query is supposed to update records in 5, 5 and 4 but it just updates first 5 records.
Query - 1:
SET ROWCOUNT 5
UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123
WHILE ##ROWCOUNT > 0
BEGIN
SET rowcount 5
UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123
PRINT (##ROWCOUNT)
END
SET rowcount 0
Query - 2:
SET ROWCOUNT 5
WHILE (##ROWCOUNT > 0)
BEGIN
BEGIN TRANSACTION
UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123
PRINT (##ROWCOUNT)
IF ##ROWCOUNT = 0
BEGIN
COMMIT TRANSACTION
BREAK
END
COMMIT TRANSACTION
END
SET ROWCOUNT 0
What am I missing here?
You should not be updating 10k rows in a set unless you are certain that the operation is getting Page Locks (due to multiple rows per page being part of the UPDATE operation). The issue is that Lock Escalation (from either Row or Page to Table locks) occurs at 5000 locks. So it is safest to keep it just below 5000, just in case the operation is using Row Locks.
You should not be using SET ROWCOUNT to limit the number of rows that will be modified. There are two issues here:
It has that been deprecated since SQL Server 2005 was released (11 years ago):
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax
It can affect more than just the statement you are dealing with:
Setting the SET ROWCOUNT option causes most Transact-SQL statements to stop processing when they have been affected by the specified number of rows. This includes triggers. The ROWCOUNT option does not affect dynamic cursors, but it does limit the rowset of keyset and insensitive cursors. This option should be used with caution.
Instead, use the TOP () clause.
There is no purpose in having an explicit transaction here. It complicates the code and you have no handling for a ROLLBACK, which isn't even needed since each statement is its own transaction (i.e. auto-commit).
Assuming you find a reason to keep the explicit transaction, then you do not have a TRY / CATCH structure. Please see my answer on DBA.StackExchange for a TRY / CATCH template that handles transactions:
Are we required to handle Transaction in C# Code as well as in Store procedure
I suspect that the real WHERE clause is not being shown in the example code in the Question, so simply relying upon what has been shown, a better model (please see note below regarding performance) would be:
DECLARE #Rows INT,
#BatchSize INT; -- keep below 5000 to be safe
SET #BatchSize = 2000;
SET #Rows = #BatchSize; -- initialize just to enter the loop
BEGIN TRY
WHILE (#Rows = #BatchSize)
BEGIN
UPDATE TOP (#BatchSize) tab
SET tab.Value = 'abc1'
FROM TableName tab
WHERE tab.Parameter1 = 'abc'
AND tab.Parameter2 = 123
AND tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;
-- Use a binary Collation (ending in _BIN2, not _BIN) to make sure
-- that you don't skip differences that compare the same due to
-- insensitivity of case, accent, etc, or linguistic equivalence.
SET #Rows = ##ROWCOUNT;
END;
END TRY
BEGIN CATCH
RAISERROR(stuff);
RETURN;
END CATCH;
By testing #Rows against #BatchSize, you can avoid that final UPDATE query (in most cases) because the final set is typically some number of rows less than #BatchSize, in which case we know that there are no more to process (which is what you see in the output shown in your answer). Only in those cases where the final set of rows is equal to #BatchSize will this code run a final UPDATE affecting 0 rows.
I also added a condition to the WHERE clause to prevent rows that have already been updated from being updated again.
NOTE REGARDING PERFORMANCE
I emphasized "better" above (as in, "this is a better model") because this has several improvements over the O.P.'s original code, and works fine in many cases, but is not perfect for all cases. For tables of at least a certain size (which varies due to several factors so I can't be more specific), performance will degrade as there are fewer rows to fix if either:
there is no index to support the query, or
there is an index, but at least one column in the WHERE clause is a string data type that does not use a binary collation, hence a COLLATE clause is added to the query here to force the binary collation, and doing so invalidates the index (for this particular query).
This is the situation that #mikesigs encountered, thus requiring a different approach. The updated method copies the IDs for all rows to be updated into a temporary table, then uses that temp table to INNER JOIN to the table being updated on the clustered index key column(s). (It's important to capture and join on the clustered index columns, whether or not those are the primary key columns!).
Please see #mikesigs answer below for details. The approach shown in that answer is a very effective pattern that I have used myself on many occasions. The only changes I would make are:
Explicitly create the #targetIds table rather than using SELECT INTO...
For the #targetIds table, declare a clustered primary key on the column(s).
For the #batchIds table, declare a clustered primary key on the column(s).
For inserting into #targetIds, use INSERT INTO #targetIds (column_name(s)) SELECT and remove the ORDER BY as it's unnecessary.
So, if you don't have an index that can be used for this operation, and can't temporarily create one that will actually work (a filtered index might work, depending on your WHERE clause for the UPDATE query), then try the approach shown in #mikesigs answer (and if you use that solution, please up-vote it).
WHILE EXISTS (SELECT * FROM TableName WHERE Value <> 'abc1' AND Parameter1 = 'abc' AND Parameter2 = 123)
BEGIN
UPDATE TOP (1000) TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1'
END
I encountered this thread yesterday and wrote a script based on the accepted answer. It turned out to perform very slowly, taking 12 hours to process 25M of 33M rows. I wound up cancelling it this morning and working with a DBA to improve it.
The DBA pointed out that the is null check in my UPDATE query was using a Clustered Index Scan on the PK, and it was the scan that was slowing the query down. Basically, the longer the query runs, the further it needs to look through the index for the right rows.
The approach he came up with was obvious in hind sight. Essentially, you load the IDs of the rows you want to update into a temp table, then join that onto the target table in the update statement. This uses an Index Seek instead of a Scan. And ho boy does it speed things up! It took 2 minutes to update the last 8M records.
Batching Using a Temp Table
SET NOCOUNT ON
DECLARE #Rows INT,
#BatchSize INT,
#Completed INT,
#Total INT,
#Message nvarchar(max)
SET #BatchSize = 4000
SET #Rows = #BatchSize
SET #Completed = 0
-- #targetIds table holds the IDs of ALL the rows you want to update
SELECT Id into #targetIds
FROM TheTable
WHERE Foo IS NULL
ORDER BY Id
-- Used for printing out the progress
SELECT #Total = ##ROWCOUNT
-- #batchIds table holds just the records updated in the current batch
CREATE TABLE #batchIds (Id UNIQUEIDENTIFIER);
-- Loop until #targetIds is empty
WHILE EXISTS (SELECT 1 FROM #targetIds)
BEGIN
-- Remove a batch of rows from the top of #targetIds and put them into #batchIds
DELETE TOP (#BatchSize)
FROM #targetIds
OUTPUT deleted.Id INTO #batchIds
-- Update TheTable data
UPDATE t
SET Foo = 'bar'
FROM TheTable t
JOIN #batchIds tmp ON t.Id = tmp.Id
WHERE t.Foo IS NULL
-- Get the # of rows updated
SET #Rows = ##ROWCOUNT
-- Increment our #Completed counter, for progress display purposes
SET #Completed = #Completed + #Rows
-- Print progress using RAISERROR to avoid SQL buffering issue
SELECT #Message = 'Completed ' + cast(#Completed as varchar(10)) + '/' + cast(#Total as varchar(10))
RAISERROR(#Message, 0, 1) WITH NOWAIT
-- Quick operation to delete all the rows from our batch table
TRUNCATE TABLE #batchIds;
END
-- Clean up
DROP TABLE IF EXISTS #batchIds;
DROP TABLE IF EXISTS #targetIds;
Batching the slow way, do not use!
For reference, here is the original slower performing query:
SET NOCOUNT ON
DECLARE #Rows INT,
#BatchSize INT,
#Completed INT,
#Total INT
SET #BatchSize = 4000
SET #Rows = #BatchSize
SET #Completed = 0
SELECT #Total = COUNT(*) FROM TheTable WHERE Foo IS NULL
WHILE (#Rows = #BatchSize)
BEGIN
UPDATE t
SET Foo = 'bar'
FROM TheTable t
JOIN #batchIds tmp ON t.Id = tmp.Id
WHERE t.Foo IS NULL
SET #Rows = ##ROWCOUNT
SET #Completed = #Completed + #Rows
PRINT 'Completed ' + cast(#Completed as varchar(10)) + '/' + cast(#Total as varchar(10))
END
This is a more efficient version of the solution from #Kramb. The existence check is redundant as the update where clause already handles this. Instead you just grab the rowcount and compare to batchsize.
Also note #Kramb solution didn't filter out already updated rows from the next iteration hence it would be an infinite loop.
Also uses the modern batch size syntax instead of using rowcount.
DECLARE #batchSize INT, #rowsUpdated INT
SET #batchSize = 1000;
SET #rowsUpdated = #batchSize; -- Initialise for the while loop entry
WHILE (#batchSize = #rowsUpdated)
BEGIN
UPDATE TOP (#batchSize) TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123 and Value <> 'abc1';
SET #rowsUpdated = ##ROWCOUNT;
END
I want share my experience. A few days ago I have to update 21 million records in table with 76 million records. My colleague suggested the next variant.
For example, we have the next table 'Persons':
Id | FirstName | LastName | Email | JobTitle
1 | John | Doe | abc1#abc.com | Software Developer
2 | John1 | Doe1 | abc2#abc.com | Software Developer
3 | John2 | Doe2 | abc3#abc.com | Web Designer
Task: Update persons to the new Job Title: 'Software Developer' -> 'Web Developer'.
1. Create Temporary Table 'Persons_SoftwareDeveloper_To_WebDeveloper (Id INT Primary Key)'
2. Select into temporary table persons which you want to update with the new Job Title:
INSERT INTO Persons_SoftwareDeveloper_To_WebDeveloper SELECT Id FROM
Persons WITH(NOLOCK) --avoid lock
WHERE JobTitle = 'Software Developer'
OPTION(MAXDOP 1) -- use only one core
Depends on rows count, this statement will take some time to fill your temporary table, but it would avoid locks. In my situation it took about 5 minutes (21 million rows).
3. The main idea is to generate micro sql statements to update database. So, let's print them:
DECLARE #i INT, #pagesize INT, #totalPersons INT
SET #i=0
SET #pagesize=2000
SELECT #totalPersons = MAX(Id) FROM Persons
while #i<= #totalPersons
begin
Print '
UPDATE persons
SET persons.JobTitle = ''ASP.NET Developer''
FROM Persons_SoftwareDeveloper_To_WebDeveloper tmp
JOIN Persons persons ON tmp.Id = persons.Id
where persons.Id between '+cast(#i as varchar(20)) +' and '+cast(#i+#pagesize as varchar(20)) +'
PRINT ''Page ' + cast((#i / #pageSize) as varchar(20)) + ' of ' + cast(#totalPersons/#pageSize as varchar(20))+'
GO
'
set #i=#i+#pagesize
end
After executing this script you will receive hundreds of batches which you can execute in one tab of MS SQL Management Studio.
4. Run printed sql statements and check for locks on table. You always can stop process and play with #pageSize to speed up or speed down updating(don't forget to change #i after you pause script).
5. Drop Persons_SoftwareDeveloper_To_AspNetDeveloper. Remove temporary table.
Minor Note: This migration could take a time and new rows with invalid data could be inserted during migration. So, firstly fix places where your rows adds. In my situation I fixed UI, 'Software Developer' -> 'Web Developer'.
More about this method on my blog https://yarkul.com/how-smoothly-insert-millions-of-rows-in-sql-server/
Your print is messing things up, because it resets ##ROWCOUNT. Whenever you use ##ROWCOUNT, my advice is to always set it immediately to a variable. So:
DECLARE #RC int;
WHILE #RC > 0 or #RC IS NULL
BEGIN
SET rowcount 5;
UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';
SET #RC = ##ROWCOUNT;
PRINT(##ROWCOUNT)
END;
SET rowcount = 0;
And, another nice feature is that you don't need to repeat the update code.
First of all, thank you all for your inputs. I tweak my Query - 1 and got my desired result. Gordon Linoff is right, PRINT was messing up my query so I modified it as following:
Modified Query - 1:
SET ROWCOUNT 5
WHILE (1 = 1)
BEGIN
BEGIN TRANSACTION
UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123
IF ##ROWCOUNT = 0
BEGIN
COMMIT TRANSACTION
BREAK
END
COMMIT TRANSACTION
END
SET ROWCOUNT 0
Output:
(5 row(s) affected)
(5 row(s) affected)
(4 row(s) affected)
(0 row(s) affected)
I need to do large update in sybase.
Table1 has column a, b, c, e, f, around 9 million records.
Table2 has column a, e, f, around 500000 records.
The bellow update sql failed due to syslog full
update table1
set e = table2.e, f = table2.f
from table1, table2
where table1.a = table2.a
After doing research in the internet, the bellow two procedures still failed due to the same error however they did update 5000 records successfully.
Any ideas. Thank you!
SET ROWCOUNT 5000
WHILE (1=1)
BEGIN
update table1
set e = table2.e, f = table2.f
from table1, table2
where table1.a = table2.a
IF ##ROWCOUNT != 5000
BREAK
END
SET ROWCOUNT 0
declare #errorStatus int, #rowsProcessed int
set rowcount 5000
select #rowsProcessed = 1
while (#rowsProcessed != 0)
begin
begin tran
update table1
set e = table2.e, f = table2.f
from table1, table2
where table1.a = table2.a
select #rowsProcessed = ##rowcount, #errorStatus = ##error
if (#errorStatus != 0)
begin
--raiserror ....
rollback tran
--return -1
end
commit tran
end
set rowcount 0
It appears that your syslog device is not large enough for what you are trying to accomplish, and the log may not be getting truncated.
You should check the setting of the database (sp_helpdb <DBNAME>) and find out if it says trunc log on chkpt If you don't find that setting, then big transactions will always fill up your syslog.
To fix this you must either add log space to the database, manually dump the log, or set the database option to truncate the log on checkpoint.
If you are worried about up to the minute recovery in case of a system or disk failure, then you should add log space, and schedule more frequent log dumps.
If you are not worried about up to the minute recovery, then you can turn on the database option to truncate the log on checkpoint.
sp_dboption <DBNAME>, 'trunc log on chkpt', true
Once again.. i have the trigger below which has the function to keep/set the value in column esb for maximum 1 row to value 0 (in each row the value cycles from Q->0->R->1)
When i insert more than 1 row the trigger fails with an "Subquery returned more than 1 value. This is not permitted when the subquery follows" error on row 38, the "IF ((SELECT esb FROM INSERTED) in ('1','Q'))" statment.
I understand that 'SELECT esb FROM INSERTED' will return all rows of the insert, but do not know how to process one row at a time. I also tried it by creating a temporary table and iterating through the resultset, but subsequently found out that temporary tables based on the INSERTED table are not allowed.
any suggestions are welcome (again)
set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go
ALTER TRIGGER [TR_PHOTO_AIU]
ON [SOA].[dbo].[photos_TEST]
AFTER INSERT,UPDATE
AS
DECLARE #MAXCONC INT -- Maximum concurrent processes
DECLARE #CONC INT -- Actual concurrent processes
SET #MAXCONC = 1 -- 1 concurrent process
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
-- If column esb is involved in the update, does not necessarily mean
-- that the column itself is updated
If ( Update(ESB) )
BEGIN
-- If column esb has been changed to 1 or Q
IF ((SELECT esb FROM INSERTED) in ('1','Q'))
BEGIN
-- count the number of (imminent) active processes
SET #CONC = (SELECT COUNT(*)
FROM SOA.dbo.photos_TEST pc
WHERE pc.esb in ('0','R'))
-- if maximum has not been reached
IF NOT ( #CONC >= #MAXCONC )
BEGIN
-- set additional rows esb to '0' to match #MAXCONC
UPDATE TOP(#MAXCONC-#CONC) p2
SET p2.esb = '0'
FROM ( SELECT TOP(#MAXCONC-#CONC) p1.esb
FROM SOA.dbo.photos_TEST p1
INNER JOIN INSERTED i ON i.Value = p1.Value
AND i.ArrivalDateTime > p1.ArrivalDateTime
WHERE p1.esb = 'Q'
ORDER BY p1.arrivaldatetime ASC
) p2
END
END
END
Try to rewrite your IF as:
IF EXISTS(SELECT 1 FROM INSERTED WHERE esb IN ('1','Q'))
...
Does table constraints execute in the same transaction?
I have a transaction with Read Committed isolation level which inserts some rows in a table. The table has a constraint on it that calls a function which in turn selects some rows from the same table.
It looks like the function runs without knowing anything about the transaction and the select in the function returns rows in the table which were there prior to the transaction.
Is there a workaround or am I missing anything? Thanks.
Here are the codes for the transaction and the constraint:
insert into Treasury.DariaftPardakhtDarkhastFaktor
(DarkhastFaktor, DariaftPardakht, Mablagh, CodeVazeiat,
ZamaneTakhsiseFaktor, MarkazPakhsh, ShomarehFaktor, User)
values
(#DarkhastFaktor, #DariaftPardakht, #Mablagh, #CodeVazeiat,
#ZamaneTakhsiseFaktor, #MarkazPakhsh, #ShomarehFaktor, #User);
constraint expression (enforce for inserts and updates):
([Treasury].[ufnCheckDarkhastFaktorMablaghConstraint]([DarkhastFaktor])=(1))
ufnCheckDarkhastFaktorMablaghConstraint:
returns bit
as
begin
declare #SumMablagh float
declare #Mablagh float
select #SumMablagh = isnull(sum(Mablagh), 0)
from Treasury.DariaftPardakhtDarkhastFaktor
where DarkhastFaktor= #DarkhastFaktor
select #Mablagh = isnull(MablaghKhalesFaktor, 0)
from Sales.DarkhastFaktor
where DarkhastFaktor= #DarkhastFaktor
if #Mablagh - #SumMablagh < -1
return 0
return 1
end
Check constraints are not enforced for delete operations, see http://msdn.microsoft.com/en-us/library/ms188258.aspx
CHECK constraints are not validated
during DELETE statements. Therefore,
executing DELETE statements on tables
with certain types of check
constraints may produce unexpected
results.
Edit - to answer your question on workaround, you can use a delete trigger to roll back if your function call shows an invariant is broken.
Edit #2 - #reticent, if you are adding rows then the function called by the check constraint should in fact see the rows. If it didn't, check constraints would be useless. Here is a simple example, you will find that the first 2 inserts succeed and the third fails as expected:
create table t1 (id int)
go
create function t1_validateSingleton ()
returns bit
as
begin
declare #ret bit
set #ret = 1
if exists (
select count(*)
from t1
group by id
having count(*) > 1
)
begin
set #ret = 0
end
return (#ret)
end
go
alter table t1
add constraint t1_singleton
check (dbo.t1_validateSingleton()=1)
go
insert t1 values (1)
insert t1 values (2)
insert t1 values (1)