Truncate some rows in table SQL Server Management Studio [duplicate] - sql-server

Let's say we have a table Sales with 30 columns and 500,000 rows. I would like to delete 400,000 rows in the table (those where toDelete = '1').
But I have a few constraints:
the table is read/written often, and I would not like a long-running DELETE to lock the table for too long
I need to skip the transaction log (like with a TRUNCATE), but while doing a DELETE ... WHERE ... (I need to put a condition), and I haven't found any way to do this...
Any advice would be welcome to transform a
DELETE FROM Sales WHERE toDelete='1'
into something more partitioned and, if possible, transaction-log free.

Calling DELETE FROM TableName will do the entire delete in one large transaction. This is expensive.
Here is another option which will delete rows in batches:
deleteMore:
DELETE TOP (10000) FROM Sales WHERE toDelete = '1'
IF @@ROWCOUNT != 0
    goto deleteMore

I'll leave my answer here, since I was able to test different approaches for mass delete and update (I had to update and then delete 125+ mio rows; the server had 16GB of RAM and a Xeon E5-2680 @2.7GHz, running SQL Server 2012).
TL;DR: always update/delete by primary key, never by any other condition. If you can't use the PK directly, create a temp table, fill it with PK values, and update/delete your table using that table. Use indexes for this.
I started with the solution from above (by @Kevin Aenmey), but this approach turned out to be inappropriate, since my database was live, handling a couple of hundred transactions per second, and there was some blocking involved (there was an index for all three fields from the condition; using WITH(ROWLOCK) didn't change anything).
So, I added a WAITFOR statement, which allowed database to process other transactions.
deleteMore:
WAITFOR DELAY '00:00:01'
DELETE TOP (1000) FROM MyTable WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3
IF @@ROWCOUNT != 0
    goto deleteMore
This approach was able to process ~1.6 mio rows/hour for updating and ~0.2 mio rows/hour for deleting.
Turning to temp tables changed things quite a lot.
deleteMore:
SELECT TOP 10000 Id /* Id is the PK */
INTO #Temp
FROM MyTable WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3

DELETE MT
FROM MyTable MT
JOIN #Temp T ON T.Id = MT.Id
/* you can use the IN operator instead, it doesn't change anything
DELETE FROM MyTable WHERE Id IN (SELECT Id FROM #Temp)
*/
IF @@ROWCOUNT > 0 BEGIN
    DROP TABLE #Temp
    WAITFOR DELAY '00:00:01'
    goto deleteMore
END ELSE BEGIN
    DROP TABLE #Temp
    PRINT 'This is the end, my friend'
END
This solution processed ~25 mio rows/hour for updating (15x faster) and ~2.2 mio rows/hour for deleting (11x faster).
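Regarding "use indexes for this": a minimal sketch, assuming Id is the PK as in the loop above, would be to index the temp table right after it is filled and before the join:
CREATE CLUSTERED INDEX IX_Temp_Id ON #Temp (Id);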

What you want is batch processing.
WHILE (SELECT COUNT(*) FROM Sales WHERE toDelete = '1') > 0
BEGIN
    DELETE FROM Sales WHERE SalesID IN
        (SELECT TOP 1000 SalesID FROM Sales WHERE toDelete = '1')
END
Of course you can experiment to find the best batch size; I've used anything from 500 to 50,000 depending on the table. If you use cascade delete, you will probably need a smaller number, as you have those child records to delete as well.

One way I have had to do this in the past is to have a stored procedure or script that deletes n records at a time. Repeat until done.
DELETE TOP (1000) FROM Sales WHERE toDelete = '1'

You should try to give it a ROWLOCK hint so it will not lock the entire table. However, if you delete a lot of rows, lock escalation will occur anyway.
Also, make sure you have a non-clustered filtered index (covering only the '1' values) on the toDelete column. If possible, make it a bit column, not a varchar (or whatever it is now).
DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
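For the filtered index mentioned above, a minimal sketch (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_Sales_toDelete
ON Sales (toDelete)
WHERE toDelete = '1';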
Ultimately, you can try to iterate over the table and delete in chunks.
Updated
Since while loops and chunk deletes are the new pink here, I'll throw in my version too (combined with my previous answer):
SET ROWCOUNT 100
DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
WHILE @@ROWCOUNT > 0
BEGIN
SET ROWCOUNT 100
DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
END

My own take on this functionality would be as follows.
This way there is no repeated code and you can manage your chunk size.
DECLARE @DeleteChunk INT = 10000
DECLARE @rowcount INT = 1
WHILE @rowcount > 0
BEGIN
    DELETE TOP (@DeleteChunk) FROM Sales WITH(ROWLOCK) WHERE toDelete = '1'
    SELECT @rowcount = @@ROWCOUNT
END

I have used the below to delete around 50 million records:
DECLARE @BatchSize INT = 4999;
BEGIN TRANSACTION
DeleteOperation:
DELETE TOP (@BatchSize)
FROM [database_name].[database_schema].[database_table]
IF @@ROWCOUNT > 0
GOTO DeleteOperation
COMMIT TRANSACTION
Please note that keeping @BatchSize below 5000 is less expensive on resources, as it avoids lock escalation.

In my view, the best way to delete a huge number of records is to delete them by primary key.
So you have to generate a T-SQL script that contains the whole list of rows to delete, and then execute that script.
For example, the code below will generate that file:
GO
SET NOCOUNT ON
SELECT 'DELETE FROM DATA_ACTION WHERE ID = ' + CAST(ID AS VARCHAR(50)) + ';' + CHAR(13) + CHAR(10) + 'GO'
FROM DATA_ACTION
WHERE YEAR(AtTime) = 2014
The output file will contain records like:
DELETE FROM DATA_ACTION WHERE ID = 123;
GO
DELETE FROM DATA_ACTION WHERE ID = 124;
GO
DELETE FROM DATA_ACTION WHERE ID = 125;
GO
And now you have to use the SQLCMD utility to execute this script:
sqlcmd -S [Instance Name] -E -d [Database] -i [Script]
You can find this approach explained here: https://www.mssqltips.com/sqlservertip/3566/deleting-historical-data-from-a-large-highly-concurrent-sql-server-database-table/

Here's how I do it when I know approximately how many iterations are needed:
delete from Activities with(rowlock) where Id in (select top 999 Id from Activities
(nolock) where description like 'financial data update date%' and len(description) = 87
and User_Id = 2);
waitfor delay '00:00:02'
GO 20
Edit: This worked better and faster for me than selecting top:
declare @counter int = 1
declare @msg varchar(max)
declare @batch int = 499
while (@counter <= 37600)
begin
    set @msg = ('Iteration count = ' + convert(varchar, @counter))
    raiserror(@msg, 0, 1) with nowait
    delete Activities with (rowlock) where Id in (select Id from Activities (nolock) where description like 'financial data update date%' and len(description) = 87 and User_Id = 2 order by Id asc offset 1 ROWS fetch next @batch rows only)
    set @counter = @counter + 1
    waitfor delay '00:00:02'
end

Declare @counter INT
Set @counter = 10 -- (you can always obtain the number of rows to be deleted and set the counter to that value)
While @counter > 0
Begin
    Delete TOP (4000) from <Tablename> where ID in (Select ID from <sametablename> with (NOLOCK) where DateField < '2021-01-04') -- or opt for GETDATE() - 1
    Set @counter = @counter - 1 -- or set @counter = @counter - 4000 if you know the number of rows to be deleted
End

Related

Pulling Values From Temp Table After An Iteration

I'm querying for the total sizes of recent data in certain databases.
I create a table containing the DBs to be queried, then iterate over it to get the DB names and the total number of times to run the iteration.
I then create a temp table into which the needed data will be inserted.
I run the iteration to grab the information and push it into the temp table for each database.
After the iteration finishes, I'm not able to pull the values from this newly created table.
I wrote a little comment next to each portion of code explaining what I'm trying to do and what I expect to happen.
/*check if the #databases table is already present and then drop it*/
IF OBJECT_ID('tempdb..#databases', 'U') IS NOT NULL
begin
drop table #databases;
end
select ArtifactID into #databases from edds.eddsdbo.[Case]
where name like '%Review%'
/*Once this first statement has been run there will now be a
number column that is associated with the artifactID. Each database has an area that is
titled [EDDS'artifactID']. So if the artifactID = 1111111 then the DB would
be accessed at [EDDS1111111]*/
declare @runs int = 1; /*this will track the number of times iterated
over the result set*/
declare @max int = 0; /*this will be the limit*/
declare @databasename sysname = '' /*this will allow the population of each
database name*/
/*check if your temp table exists and drop if necessary*/
IF OBJECT_ID('tempdb..#temptable', 'U') IS NOT NULL
begin
drop table #temptable;
end
/*create the temp table outside the loop*/
create table #temptable(
fileSize dec,
extractedTextSize dec
)
while @runs <= @max
begin
select @max = count(*) from #databases;
/*the @max is now the number of databases inserted in to this table*/
/*This select statement pulls the information that will be placed
into the temptable. This second statement should be inside the loop, one time
for each DB that appeared in the first query's results.*/
/*begin the loop by assigning your database name, I don't know what the
column is called so I have just called it databasename for now*/
select top 1 @databasename = ArtifactID from #databases;
/*generate your sql using the @databasename variable, if you want to make
the database and table names dynamic too then you can use the same formula*/
insert into #temptable
select SUM(fileSize)/1024/1024/1024, SUM(extractedTextSize)/1024/1024
FROM [EDDS'+cast(@databasename as nvarchar(128))+'].[EDDSDBO].[Document] ed
where ed.CreatedDate >= (select CONVERT(varchar,dateadd(d,-
(day(getdate())),getdate()),106))'
/*remove that row from the databases table so the row won't be redone.
This will take the @max and lessen it by one*/
delete from #databases where ArtifactID = @databasename;
/* Once the @max is less than 1 then end the loop*/
end
/* Query the final values in the temp table after the iteration is complete*/
select filesize+extractedTextSize as Gigs from #temptable
When that final select statement runs to pull values from #temptable, the response is a single Gigs column (as expected), but the table itself is blank.
Something is happening to clear the data out of the table and I'm stuck.
I'm not sure if my error is in syntax or a general error of logic, but any help would be greatly appreciated.
Made a few tweaks to formatting, but the main issue is that your loop would never run.
You have @runs <= @max, but at the start @runs = 1 and @max = 0, so it will never loop.
To fix this you can do a couple of different things, but I set @max before the loop, and inside the loop just added 1 to @runs on each pass; since you know before the loop how many runs you need, you can simply compare against that.
But one NOTE: there are much better ways to do this than the way you have. Put an identity on your #databases table, and in your loop just do WHERE databaseID = loopCount (then you don't have to delete from the table) -- see the second answer below.
--check if the #databases table is already present and then drop it
IF OBJECT_ID('tempdb..#databases', 'U') IS NOT NULL
drop table #databases;
--Once this first statement has been run there will now be a number column that is associated with the artifactID. Each database has an area that is
-- titled [EDDS'artifactID']. So if the artifactID = 1111111 then the DB would be accessed at [EDDS1111111]
select ArtifactID
INTO #databases
FROM edds.eddsdbo.[Case]
where name like '%Review%'
-- set to 0 to start
DECLARE @runs int = 0;
--this will be the limit
DECLARE @max int = 0;
--this will allow the population of each database name
DECLARE @databasename sysname = ''
--check if your temp table exists and drop if necessary
IF OBJECT_ID('tempdb..#temptable', 'U') IS NOT NULL
drop table #temptable;
--create the temp table outside the loop
create table #temptable(
fileSize dec,
extractedTextSize dec
)
-- ***********************************************
-- Need to set the value you're looping on before you get to your loop; that way, if there are no rows, you won't enter the loop at all
-- ***********************************************
--the @max is now the number of databases inserted in to this table
select @max = COUNT(*)
FROM #databases;
while @runs < @max
BEGIN
/*This select statement pulls the information that will be placed
into the temptable. This second statment should be inside the loop. One time
for each DB that appeared in the first query's results.*/
/*begin the loop by assigning your database name, I don't know what the
column is called so I have just called it databasename for now*/
select top 1 @databasename = ArtifactID from #databases;
/*generate your sql using the @databasename variable, if you want to make
the database and table names dynamic too then you can use the same formula*/
insert into #temptable
select SUM(fileSize)/1024/1024/1024, SUM(extractedTextSize)/1024/1024
FROM [EDDS'+cast(@databasename as nvarchar(128))+'].[EDDSDBO].[Document] ed
where ed.CreatedDate >= (select CONVERT(varchar,dateadd(d,- (day(getdate())),getdate()),106))
--remove that row from the databases table so the row won't be redone
delete from #databases where ArtifactID = @databasename;
--Once @runs reaches @max the loop ends
-- ***********************************************
-- no need to select from the table and change your max value, just change your runs by adding one for each run
-- ***********************************************
select @runs = @runs + 1 --just add one for each run; no need to recount #databases
end
-- Query the final values in the temp table after the iteration is complete
select filesize+extractedTextSize as Gigs from #temptable
This is a second answer; it is the alternative I mentioned above, posted separately to keep the two approaches distinct.
This is a better way to do your looping (not fully tested out yet, so you will have to verify).
Instead of deleting from your table, just add an ID to it and loop through it using that ID. Far fewer steps and much cleaner.
--check if the #databases table is already present and then drop it
IF OBJECT_ID('tempdb..#databases', 'U') IS NOT NULL
drop table #databases;
--create the temp table outside the loop
create table #databases(
ID INT IDENTITY,
ArtifactID VARCHAR(20) -- not sure of this ID's data type
)
--check if your temp table exists and drop if necessary
IF OBJECT_ID('tempdb..#temptable', 'U') IS NOT NULL
drop table #temptable;
--create the temp table outside the loop
create table #temptable(
fileSize dec,
extractedTextSize dec
)
--this will allow the population of each database name
DECLARE @databasename sysname = ''
-- initialize to 1 so it matches the first record in the temp table
DECLARE @LoopOn int = 1;
--this will be the max count from the table
DECLARE @MaxCount int = 0;
--Once this first statement has been run there will now be a number column that is associated with the artifactID. Each database has an area that is
-- titled [EDDS'artifactID']. So if the artifactID = 1111111 then the DB would be accessed at [EDDS1111111]
-- do insert here so it adds the ID column
INSERT INTO #databases(
ArtifactID
)
SELECT ArtifactID
FROM edds.eddsdbo.[Case]
where name like '%Review%'
-- sets the max number of loops we are going to do
select @MaxCount = COUNT(*)
FROM #databases;
while @LoopOn <= @MaxCount
BEGIN
-- your table has an IDENTITY, so select the row for the loop you're on (initialized to 1)
select @databasename = ArtifactID
FROM #databases
WHERE ID = @LoopOn;
--generate your sql using the @databasename variable, if you want to make
--the database and table names dynamic too then you can use the same formula
insert into #temptable
select SUM(fileSize)/1024/1024/1024, SUM(extractedTextSize)/1024/1024
-- don't know/think this will work like this? If not you have to use dynamic SQL
FROM [EDDS'+cast(@databasename as nvarchar(128))+'].[EDDSDBO].[Document] ed
where ed.CreatedDate >= (select CONVERT(varchar,dateadd(d,- (day(getdate())),getdate()),106))
-- remove all deletes/etc and just add one to @LoopOn; the row is selected above based off the ID
select @LoopOn += 1
end
-- Query the final values in the temp table after the iteration is complete
select filesize+extractedTextSize as Gigs
FROM #temptable
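For what it's worth, here is a minimal sketch of the dynamic SQL that the caveat above alludes to (untested; the names are taken from the question). A temp table created in the outer scope remains visible inside the dynamic batch:
DECLARE @sql nvarchar(max);
SET @sql = N'insert into #temptable
    select SUM(fileSize)/1024/1024/1024, SUM(extractedTextSize)/1024/1024
    FROM [EDDS' + cast(@databasename as nvarchar(128)) + N'].[EDDSDBO].[Document] ed
    where ed.CreatedDate >= CONVERT(varchar, dateadd(d, -day(getdate()), getdate()), 106)';
EXEC sp_executesql @sql;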

Inserting data from one table to another using a loop

I am trying to insert data from one table into another (from table [PPRS] to table [Verify]) where the Caption in PPRS is the same as in table [Master]. Someone suggested I use a loop to insert the data instead of hard coding it, but I am confused about how to go about it.
Here's my code so far:
Declare @counter int
declare @total int
set @counter = 0
SELECT @total = Count(*) FROM PPRS
while @counter <= @total
begin
    set @counter += 1
    insert into [Verify]
    select [Task_ID],
           [Project_StartDate],
           [PPR_Caption],
           [Date]
    FROM PPRS
    where [PPR_Caption] in (SELECT [Caption] from Master)
end
No data is being inserted (0 rows affected)
The sample data I'm trying to insert:
17286 01/03/2018 MP - Youth Environmental Services (12/15) 15/10/2018
I suggest that this is along the lines of what you want to do:
INSERT INTO [Verify] (some_col) -- you never told us the name of this column
SELECT TOP 10 [PPR_Caption]
FROM PPRS
WHERE [PPR_Caption] IN (SELECT [Caption] FROM MasterRecords)
ORDER BY some_column;
That is, you want to insert ten records into the Verify table, determined by some order in the source PPRS table.
Try to declare the specific columns into which you want to insert. Please provide the definitions of tables PPRS and Verify for more precise help from the community.
Anyway, the idea is below:
Declare @counter int
set @counter = 0
while @counter <= 10
begin
    set @counter += 1
    insert into [Verify] (NameofColumn1_Table_Verify, NameofColumn2_Table_Verify, ...)
    select PPRS.NameofColumn1_Table_PPRS,
           PPRS.NameofColumn2_Table_PPRS,
           ...
    FROM PPRS
    where PPRS.[PPR_Caption] in (SELECT DISTINCT [Caption] from MasterRecords)
end
Anyway, I think the loop is unnecessary; one set-based statement should do it (see the sketch below). Please write what you want to achieve at the end.
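A sketch of that single statement, assuming the column names shown in the question:
INSERT INTO [Verify] ([Task_ID], [Project_StartDate], [PPR_Caption], [Date])
SELECT [Task_ID],
       [Project_StartDate],
       [PPR_Caption],
       [Date]
FROM PPRS
WHERE [PPR_Caption] IN (SELECT DISTINCT [Caption] FROM MasterRecords);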

Is there a better way to DELETE 80 million+ rows from a table?

Is there a better way to DELETE 80 million+ rows from a table?
WHILE EXISTS (SELECT TOP 1 * FROM large_table)
BEGIN
WITH LT AS
(
SELECT TOP 60000 *
FROM large_table
)
DELETE FROM LT
END
This does the job of keeping my transaction logs from becoming too large, but I need to know if there is a way to make this process go faster. I've had my computer running this script for 5+ days now and I haven't gotten very far, very fast.
If you want to remove every row, you can truncate the table simply by:
TRUNCATE TABLE large_table
GO
You can also delete using a WHERE condition. The time taken by a DELETE depends on various aspects. You can reduce the cost by eliminating the SELECT query in the condition of the WHILE loop:
DECLARE @rows INT = 1
WHILE (@rows > 0)
BEGIN
    DELETE TOP (1000)
    FROM large_table
    SET @rows = @@ROWCOUNT
END
Bulk deletion will create a lot of log records, and a rollback will happen if the log file fills up.
You can do the delete in batches and ensure every transaction is committed.
DECLARE @IDCollection TABLE (ID INT)
DECLARE @Batch INT = 1000;
DECLARE @ROWCOUNT INT;
WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION;
    INSERT INTO @IDCollection
    SELECT TOP (@Batch) ID
    FROM table
    ORDER BY id
    DELETE
    FROM table
    WHERE id IN (
        SELECT *
        FROM @IDCollection
        )
    SET @ROWCOUNT = @@ROWCOUNT
    COMMIT TRANSACTION;
    IF (@ROWCOUNT = 0)
        BREAK
    DELETE FROM @IDCollection -- clear the collected IDs before the next batch
END

How to update large table with millions of rows in SQL Server?

I have an UPDATE statement which can update more than a million records. I want to update them in batches of 1000 or 10000. I tried with @@ROWCOUNT but I am unable to get the desired result.
Just for testing purposes, I took a table with 14 records and set a row count of 5. This query is supposed to update the records in batches of 5, 5 and 4, but it only updates the first 5 records.
Query - 1:
SET ROWCOUNT 5
UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123
WHILE @@ROWCOUNT > 0
BEGIN
    SET ROWCOUNT 5
    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123
    PRINT (@@ROWCOUNT)
END
SET ROWCOUNT 0
Query - 2:
SET ROWCOUNT 5
WHILE (@@ROWCOUNT > 0)
BEGIN
    BEGIN TRANSACTION
    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123
    PRINT (@@ROWCOUNT)
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END
    COMMIT TRANSACTION
END
SET ROWCOUNT 0
What am I missing here?
You should not be updating 10k rows in a set unless you are certain that the operation is getting page locks (due to multiple rows per page being part of the UPDATE operation). The issue is that lock escalation (from either row or page locks to table locks) occurs at 5000 locks. So it is safest to keep it just below 5000, just in case the operation is using row locks.
You should not be using SET ROWCOUNT to limit the number of rows that will be modified. There are two issues here:
It has been deprecated since SQL Server 2005 was released (11 years ago):
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax
It can affect more than just the statement you are dealing with:
Setting the SET ROWCOUNT option causes most Transact-SQL statements to stop processing when they have been affected by the specified number of rows. This includes triggers. The ROWCOUNT option does not affect dynamic cursors, but it does limit the rowset of keyset and insensitive cursors. This option should be used with caution.
Instead, use the TOP () clause.
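For example, the same five-row cap expressed both ways (an illustrative sketch using the question's table):
-- Deprecated pattern:
SET ROWCOUNT 5;
UPDATE TableName SET Value = 'abc1' WHERE Parameter1 = 'abc' AND Parameter2 = 123;
SET ROWCOUNT 0;
-- Preferred pattern:
UPDATE TOP (5) TableName SET Value = 'abc1' WHERE Parameter1 = 'abc' AND Parameter2 = 123;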
There is no purpose in having an explicit transaction here. It complicates the code and you have no handling for a ROLLBACK, which isn't even needed since each statement is its own transaction (i.e. auto-commit).
Assuming you find a reason to keep the explicit transaction, then you do not have a TRY / CATCH structure. Please see my answer on DBA.StackExchange for a TRY / CATCH template that handles transactions:
Are we required to handle Transaction in C# Code as well as in Store procedure
I suspect that the real WHERE clause is not being shown in the example code in the Question, so simply relying upon what has been shown, a better model (please see note below regarding performance) would be:
DECLARE @Rows INT,
        @BatchSize INT; -- keep below 5000 to be safe
SET @BatchSize = 2000;
SET @Rows = @BatchSize; -- initialize just to enter the loop
BEGIN TRY
    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) tab
        SET tab.Value = 'abc1'
        FROM TableName tab
        WHERE tab.Parameter1 = 'abc'
        AND tab.Parameter2 = 123
        AND tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;
        -- Use a binary Collation (ending in _BIN2, not _BIN) to make sure
        -- that you don't skip differences that compare the same due to
        -- insensitivity of case, accent, etc, or linguistic equivalence.
        SET @Rows = @@ROWCOUNT;
    END;
END TRY
BEGIN CATCH
    RAISERROR(stuff);
    RETURN;
END CATCH;
By testing @Rows against @BatchSize, you can avoid that final UPDATE query (in most cases) because the final set is typically some number of rows less than @BatchSize, in which case we know that there are no more to process (which is what you see in the output shown in your answer). Only in those cases where the final set of rows is equal to @BatchSize will this code run a final UPDATE affecting 0 rows.
I also added a condition to the WHERE clause to prevent rows that have already been updated from being updated again.
NOTE REGARDING PERFORMANCE
I emphasized "better" above (as in, "this is a better model") because this has several improvements over the O.P.'s original code, and works fine in many cases, but is not perfect for all cases. For tables of at least a certain size (which varies due to several factors so I can't be more specific), performance will degrade as there are fewer rows to fix if either:
there is no index to support the query, or
there is an index, but at least one column in the WHERE clause is a string data type that does not use a binary collation, hence a COLLATE clause is added to the query here to force the binary collation, and doing so invalidates the index (for this particular query).
This is the situation that @mikesigs encountered, thus requiring a different approach. The updated method copies the IDs for all rows to be updated into a temporary table, then uses that temp table to INNER JOIN to the table being updated on the clustered index key column(s). (It's important to capture and join on the clustered index columns, whether or not those are the primary key columns!)
Please see @mikesigs' answer below for details. The approach shown in that answer is a very effective pattern that I have used myself on many occasions. The only changes I would make are the following (a sketch of these tweaks follows this list):
Explicitly create the #targetIds table rather than using SELECT INTO...
For the #targetIds table, declare a clustered primary key on the column(s).
For the #batchIds table, declare a clustered primary key on the column(s).
For inserting into #targetIds, use INSERT INTO #targetIds (column_name(s)) SELECT, and remove the ORDER BY as it's unnecessary.
So, if you don't have an index that can be used for this operation, and can't temporarily create one that will actually work (a filtered index might work, depending on your WHERE clause for the UPDATE query), then try the approach shown in @mikesigs' answer (and if you use that solution, please up-vote it).
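A sketch of those tweaks, assuming the Id column and TheTable name used in that answer (adjust the key type to match your clustered index):
CREATE TABLE #targetIds (Id INT NOT NULL PRIMARY KEY CLUSTERED);
INSERT INTO #targetIds (Id)
SELECT Id FROM TheTable WHERE Foo IS NULL;
CREATE TABLE #batchIds (Id INT NOT NULL PRIMARY KEY CLUSTERED);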
WHILE EXISTS (SELECT * FROM TableName WHERE Value <> 'abc1' AND Parameter1 = 'abc' AND Parameter2 = 123)
BEGIN
UPDATE TOP (1000) TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1'
END
I encountered this thread yesterday and wrote a script based on the accepted answer. It turned out to perform very slowly, taking 12 hours to process 25M of 33M rows. I wound up cancelling it this morning and working with a DBA to improve it.
The DBA pointed out that the IS NULL check in my UPDATE query was using a clustered index scan on the PK, and it was the scan that was slowing the query down. Basically, the longer the query runs, the further it needs to look through the index for the right rows.
The approach he came up with was obvious in hindsight. Essentially, you load the IDs of the rows you want to update into a temp table, then join that onto the target table in the update statement. This uses an index seek instead of a scan. And ho boy does it speed things up! It took 2 minutes to update the last 8M records.
Batching Using a Temp Table
SET NOCOUNT ON
DECLARE @Rows INT,
        @BatchSize INT,
        @Completed INT,
        @Total INT,
        @Message nvarchar(max)
SET @BatchSize = 4000
SET @Rows = @BatchSize
SET @Completed = 0
-- #targetIds table holds the IDs of ALL the rows you want to update
SELECT Id into #targetIds
FROM TheTable
WHERE Foo IS NULL
ORDER BY Id
-- Used for printing out the progress
SELECT @Total = @@ROWCOUNT
-- #batchIds table holds just the records updated in the current batch
CREATE TABLE #batchIds (Id UNIQUEIDENTIFIER);
-- Loop until #targetIds is empty
WHILE EXISTS (SELECT 1 FROM #targetIds)
BEGIN
    -- Remove a batch of rows from the top of #targetIds and put them into #batchIds
    DELETE TOP (@BatchSize)
    FROM #targetIds
    OUTPUT deleted.Id INTO #batchIds
    -- Update TheTable data
    UPDATE t
    SET Foo = 'bar'
    FROM TheTable t
    JOIN #batchIds tmp ON t.Id = tmp.Id
    WHERE t.Foo IS NULL
    -- Get the # of rows updated
    SET @Rows = @@ROWCOUNT
    -- Increment our @Completed counter, for progress display purposes
    SET @Completed = @Completed + @Rows
    -- Print progress using RAISERROR to avoid SQL buffering issue
    SELECT @Message = 'Completed ' + cast(@Completed as varchar(10)) + '/' + cast(@Total as varchar(10))
    RAISERROR(@Message, 0, 1) WITH NOWAIT
    -- Quick operation to delete all the rows from our batch table
    TRUNCATE TABLE #batchIds;
END
-- Clean up
DROP TABLE IF EXISTS #batchIds;
DROP TABLE IF EXISTS #targetIds;
Batching the slow way, do not use!
For reference, here is the original, slower-performing query:
SET NOCOUNT ON
DECLARE @Rows INT,
        @BatchSize INT,
        @Completed INT,
        @Total INT
SET @BatchSize = 4000
SET @Rows = @BatchSize
SET @Completed = 0
SELECT @Total = COUNT(*) FROM TheTable WHERE Foo IS NULL
WHILE (@Rows = @BatchSize)
BEGIN
    UPDATE TOP (@BatchSize) t
    SET Foo = 'bar'
    FROM TheTable t
    WHERE t.Foo IS NULL
    SET @Rows = @@ROWCOUNT
    SET @Completed = @Completed + @Rows
    PRINT 'Completed ' + cast(@Completed as varchar(10)) + '/' + cast(@Total as varchar(10))
END
This is a more efficient version of the solution from @Kramb. The existence check is redundant, as the UPDATE's WHERE clause already handles it; instead you just grab the row count and compare it to the batch size.
Also note that @Kramb's solution didn't filter out already-updated rows from the next iteration, hence it would be an infinite loop.
It also uses the modern batch-size syntax (TOP with a variable) instead of SET ROWCOUNT.
DECLARE @batchSize INT, @rowsUpdated INT
SET @batchSize = 1000;
SET @rowsUpdated = @batchSize; -- Initialise for the while loop entry
WHILE (@batchSize = @rowsUpdated)
BEGIN
    UPDATE TOP (@batchSize) TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 and Value <> 'abc1';
    SET @rowsUpdated = @@ROWCOUNT;
END
I want to share my experience. A few days ago I had to update 21 million records in a table with 76 million records. My colleague suggested the following variant.
For example, we have this table 'Persons':
Id | FirstName | LastName | Email        | JobTitle
1  | John      | Doe      | abc1@abc.com | Software Developer
2  | John1     | Doe1     | abc2@abc.com | Software Developer
3  | John2     | Doe2     | abc3@abc.com | Web Designer
Task: Update the persons to the new job title: 'Software Developer' -> 'Web Developer'.
1. Create a temporary table 'Persons_SoftwareDeveloper_To_WebDeveloper (Id INT Primary Key)'.
2. Select into the temporary table the persons you want to update with the new job title:
INSERT INTO Persons_SoftwareDeveloper_To_WebDeveloper SELECT Id FROM
Persons WITH(NOLOCK) --avoid lock
WHERE JobTitle = 'Software Developer'
OPTION(MAXDOP 1) -- use only one core
Depending on the row count, this statement will take some time to fill your temporary table, but it avoids locks. In my situation it took about 5 minutes (21 million rows).
3. The main idea is to generate micro SQL statements to update the database. So, let's print them:
DECLARE @i INT, @pagesize INT, @totalPersons INT
SET @i = 0
SET @pagesize = 2000
SELECT @totalPersons = MAX(Id) FROM Persons
while @i <= @totalPersons
begin
Print '
UPDATE persons
SET persons.JobTitle = ''Web Developer''
FROM Persons_SoftwareDeveloper_To_WebDeveloper tmp
JOIN Persons persons ON tmp.Id = persons.Id
where persons.Id between '+cast(@i as varchar(20)) +' and '+cast(@i+@pagesize as varchar(20)) +'
PRINT ''Page ' + cast((@i / @pagesize) as varchar(20)) + ' of ' + cast(@totalPersons/@pagesize as varchar(20))+'
GO
'
set @i = @i + @pagesize
end
After executing this script you will receive hundreds of batches which you can execute in one tab of SQL Server Management Studio.
4. Run the printed SQL statements and check for locks on the table. You can always stop the process and play with @pagesize to speed up or slow down the updating (don't forget to change @i if you pause the script).
5. Drop the temporary table Persons_SoftwareDeveloper_To_WebDeveloper.
Minor note: this migration could take a while, and new rows with invalid data could be inserted during the migration. So, first fix the places where your rows are added. In my situation I fixed the UI: 'Software Developer' -> 'Web Developer'.
More about this method on my blog: https://yarkul.com/how-smoothly-insert-millions-of-rows-in-sql-server/
Your PRINT is messing things up, because it resets @@ROWCOUNT. Whenever you use @@ROWCOUNT, my advice is to always set it immediately to a variable. So:
DECLARE @RC int;
WHILE @RC > 0 or @RC IS NULL
BEGIN
    SET ROWCOUNT 5;
    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';
    SET @RC = @@ROWCOUNT;
    PRINT(@RC);
END;
SET ROWCOUNT 0;
And, another nice feature is that you don't need to repeat the update code.
First of all, thank you all for your input. I tweaked my Query 1 and got my desired result. Gordon Linoff is right: PRINT was messing up my query, so I modified it as follows:
Modified Query - 1:
SET ROWCOUNT 5
WHILE (1 = 1)
BEGIN
BEGIN TRANSACTION
UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123
IF ##ROWCOUNT = 0
BEGIN
COMMIT TRANSACTION
BREAK
END
COMMIT TRANSACTION
END
SET ROWCOUNT 0
Output:
(5 row(s) affected)
(5 row(s) affected)
(4 row(s) affected)
(0 row(s) affected)

"Subquery returned more than 1 value" error in an AFTER INSERT, UPDATE trigger

Once again: I have the trigger below, whose function is to keep/set the value in column esb to 0 for at most 1 row (in each row the value cycles through Q -> 0 -> R -> 1).
When I insert more than 1 row, the trigger fails on row 38, the "IF ((SELECT esb FROM INSERTED) in ('1','Q'))" statement, with a "Subquery returned more than 1 value. This is not permitted when the subquery follows..." error.
I understand that 'SELECT esb FROM INSERTED' will return all rows of the insert, but I do not know how to process one row at a time. I also tried creating a temporary table and iterating through the result set, but subsequently found out that temporary tables based on the INSERTED table are not allowed.
Any suggestions are welcome (again).
set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go
ALTER TRIGGER [TR_PHOTO_AIU]
ON [SOA].[dbo].[photos_TEST]
AFTER INSERT,UPDATE
AS
DECLARE @MAXCONC INT -- Maximum concurrent processes
DECLARE @CONC INT -- Actual concurrent processes
SET @MAXCONC = 1 -- 1 concurrent process
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
-- If column esb is involved in the update, does not necessarily mean
-- that the column itself is updated
If ( Update(ESB) )
BEGIN
-- If column esb has been changed to 1 or Q
IF ((SELECT esb FROM INSERTED) in ('1','Q'))
BEGIN
-- count the number of (imminent) active processes
SET @CONC = (SELECT COUNT(*)
FROM SOA.dbo.photos_TEST pc
WHERE pc.esb in ('0','R'))
-- if maximum has not been reached
IF NOT ( @CONC >= @MAXCONC )
BEGIN
-- set additional rows esb to '0' to match @MAXCONC
UPDATE TOP(@MAXCONC-@CONC) p2
SET p2.esb = '0'
FROM ( SELECT TOP(@MAXCONC-@CONC) p1.esb
FROM SOA.dbo.photos_TEST p1
INNER JOIN INSERTED i ON i.Value = p1.Value
AND i.ArrivalDateTime > p1.ArrivalDateTime
WHERE p1.esb = 'Q'
ORDER BY p1.arrivaldatetime ASC
) p2
END
END
END
Try to rewrite your IF as:
IF EXISTS(SELECT 1 FROM INSERTED WHERE esb IN ('1','Q'))
...
