Requirement
I want a way to generate a new unique number (Invoice Number) in a continuous sequence (no number should be left out when generating a new number)
Valid Example: 1, 2, 3, 4
Invalid Example: 1, 2, 4, 3 (not in a continuous sequence)
Current Schema
Here is the existing schema of the table Test:
Solution I came up with
After doing some research, I came up with the code below, which seems to be working as of now.
DECLARE @i AS INT = 0

WHILE (@i <= 10000 * 10000)
BEGIN
    BEGIN TRANSACTION

    INSERT INTO Test (UniqueNo, [Text])
    VALUES ((SELECT ISNULL(MAX(UniqueNo), 0) + 1 FROM Test WITH (TABLOCK)), 'a')

    COMMIT

    SET @i = @i + 1
END
Testing
I tried running the code from 12 different SQL query windows (12 threads, you could say), and so far it generates a new, unique value for each record, even after inserting 162,921 rows.
Main Question
Can the above code result in duplicate values?
I tried it by trial and error and it works perfectly, BUT when I dig into transaction locking, the SELECT statement takes a shared lock on the whole table. That means it will allow concurrent transactions to read the same data, right?
That means multiple transactions could generate duplicate values, right?
Then how come I have not seen any duplicate values yet?
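For reference, a commonly suggested way to close that window (this is only a sketch, not part of my original attempt) is to take an update lock when reading the current maximum, so that concurrent inserts serialize on the read:

BEGIN TRANSACTION

-- UPDLOCK + HOLDLOCK make the MAX() read take and hold an update lock until
-- commit, so two concurrent callers cannot both read the same maximum.
INSERT INTO Test (UniqueNo, [Text])
VALUES ((SELECT ISNULL(MAX(UniqueNo), 0) + 1
         FROM Test WITH (UPDLOCK, HOLDLOCK)), 'a')

COMMIT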
EDIT
As per david's comment:
I cannot use an identity field because if I ever delete a record, it would be difficult to fill that number back in.
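For illustration, a sketch of how a freed number could be found again (my assumption about filling the gap, using the Test table from above):

-- Finds the lowest UniqueNo whose successor is missing, i.e. the first gap
-- (or MAX + 1 when there is no gap at all).
SELECT MIN(t.UniqueNo) + 1 AS FirstFreeNumber
FROM Test t
WHERE NOT EXISTS (SELECT 1 FROM Test t2 WHERE t2.UniqueNo = t.UniqueNo + 1)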
This procedure can create your continuous check numbers:
CREATE PROCEDURE [dbo].[aa] @var INT
AS
DECLARE @isexists INT = 0
DECLARE @lastnum INT = 0

SET @isexists = (SELECT ISNULL((SELECT [text] FROM t000test WHERE [text] = @var), 0))

IF (ISNULL(@isexists, 0) <= 0)
BEGIN
    SET @lastnum = (SELECT ISNULL(MAX([text]), 0) FROM t000test)

    IF (@var > @lastnum)
    BEGIN
        INSERT INTO t000test ([text]) VALUES (@var)
    END
    ELSE
        PRINT 'your number is not in range'
END
ELSE
    PRINT 'your number exists in the data'
GO
You can test this procedure like this:
EXEC dbo.aa 6 -- or any number you like
and this loop creates your numbers automatically over whatever range you like:
DECLARE @yourrange INT = 50
DECLARE @id INT = 0

WHILE @yourrange > 0
BEGIN
    EXEC dbo.aa @id
    SET @id = @id + 1
    SET @yourrange = @yourrange - 1
END
I am using 50, but you can use a larger or smaller range.
Related
There are 2 tables:
Wallets;
Transactions.
There is a stored procedure that (I think, as an ACID operation) handles:
updating the Wallets table
inserting one row into the Transactions table
every time it is called.
The issue occurs when there are many calls to the SP at the same time: the value of PreviousBalance is not correct (sequentially wrong), because the SP reads the old value while another call is still running.
To understand better, look at the following screenshot.
There are 3 transactions with the same DT (IDs 1289, 1288, 1287); in all of them PreviousBalance is equal, but that is not correct, because the value for:
Trx ID 1288 should be 180,78, the Balance of the previous row;
Trx ID 1289 should be 168,70 = 180,78 - 12,08
I think the issue is in the SET of the @OLDBalance variable; those 3 threads read the same value at the same time, so when the SP reaches the INSERT it loads the same value for PreviousBalance.
How can I make sure @OLDBalance is read correctly, after the commit of each operation?
I tried setting several isolation levels in the SP; the result was the same, and sometimes it failed with a deadlock.
I have the following stored procedure:
Stored Procedure
ALTER PROCEDURE [dbo].[upsMovimenta_internal]
    @AccountID INT,
    @Amount MONEY,
    @TypeTransactionID INT,
    @ProductID INT,
    @notes NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @OLDBalance MONEY;
    DECLARE @PreviousBalance MONEY;
    DECLARE @CurrentBalance MONEY;
    DECLARE @Molt FLOAT;

    BEGIN TRANSACTION;

    IF NOT EXISTS (SELECT * FROM Accounts WHERE AccountID = @AccountID)
    BEGIN
        RAISERROR(N'Account not found ', 10, 1, N'number', 3);
        RETURN (1)
    END

    SELECT @Molt = Moltiplicatore
    FROM TypeTransactions
    WHERE TypeTransactionID = @TypeTransactionID;

    IF (@Molt IS NULL)
    BEGIN
        RAISERROR(N'Error transaction', 10, 1, N'number', 3);
        RETURN (1)
    END

    SET @Amount = @Amount * @Molt;

    --SELECT * FROM Wallets
    SELECT TOP 1 @OLDBalance = TotalAmount
    FROM Wallets
    WHERE AccountID = @AccountID;

    SET @CurrentBalance = @OLDBalance + @Amount;

    IF (@ProductID = 1)
    BEGIN
        UPDATE Wallets
        SET TotalAmount += @Amount,
            Cash += @Amount
        WHERE AccountID = @AccountID;
    END

    IF (@ProductID = 2)
    BEGIN
        UPDATE Wallets
        SET TotalAmount += @Amount,
            Fun += @Amount
        WHERE AccountID = @AccountID;
    END

    INSERT INTO Transactions
        (AccountID, ProductID, DT, TypeTransactionID, Amout, Balance, PreviousBalance, Notes)
    VALUES
        (@AccountID, @ProductID, GETDATE(), @TypeTransactionID, @Amount, @CurrentBalance, @OLDBalance, @notes);

    COMMIT TRANSACTION;
    RETURN (0)
END
Thank you so much guys
Generally, one way of managing locks on records is to apply a dummy update on the rows you want to work on, right after starting the transaction.
In this case SQL Server guarantees that those rows will be locked and no other transaction can access them. So you can change your design to something like this:
begin tran
update myTable
set Field1 = Field1
where someKeyField = 212
-- do the same for other tables that you want to protect against change
-- at this moment all working rows will be locked, other procedure calls will be on hold
-- do your main operations here
commit tran
The issue with this is that the other proc calls will wait, which may degrade performance or even time out if traffic is high and the operation in this proc is lengthy.
If you are working in a high-transaction environment, you need to change your design.
Update: Design Suggestion
I don't get why you have both PreviousBalance and Balance in your transaction records (it is against the design rules, though you can override a rule in a special case).
Probably you have that to speed up your calculations or make your queries simpler, but it is not good practice in an OLTP database.
The rules say you keep the Amount column and calculate PreviousBalance and Balance somewhere else.
You should drop PreviousBalance but keep the Balance column, and every time you insert a transaction, update (increase/decrease) the Balance column. Plus, you need to initialize the Balance column at the first transaction.
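A minimal sketch of that idea (mine, not part of this answer; it reuses the Wallets columns from the question), where a single UPDATE with an OUTPUT clause bumps the balance and captures the old and new values atomically, so there is no separate SELECT to race against:

DECLARE @AccountID INT = 1, @Amount MONEY = -12.08  -- sample values
DECLARE @bal TABLE (OldBalance MONEY, NewBalance MONEY)

-- One atomic statement: the X lock taken by the UPDATE serializes concurrent
-- callers, and OUTPUT returns the pre- and post-update balances together.
UPDATE Wallets
SET TotalAmount += @Amount
OUTPUT deleted.TotalAmount, inserted.TotalAmount INTO @bal (OldBalance, NewBalance)
WHERE AccountID = @AccountID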
This is what I can think of. If I knew your whole system, I might be able to offer better ideas.
I'm having problems with a stored procedure that iterates over a table. It works fine with a few hundred rows, but when the table is in the thousands it saturates memory and crashes.
The procedure should iterate row by row and fill a column with a value calculated from another column in the row. I suspect the cursor is what crashes the procedure; in other questions I've read to use a WHILE loop instead, but I'm no expert in SQL and the examples I tried from those answers didn't work.
CREATE PROCEDURE [dbo].[GenerateNewHashes]
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @module BIGINT = 382449983

    IF EXISTS (SELECT 1 FROM dbo.telephoneSource WHERE Hash IS NULL)
    BEGIN
        DECLARE hash_cursor CURSOR FOR
            SELECT a.telephone, a.Hash
            FROM dbo.telephoneSource AS a

        OPEN hash_cursor
        FETCH FROM hash_cursor

        WHILE @@FETCH_STATUS = 0
        BEGIN
            UPDATE dbo.telephoneSource
            SET Hash = CAST(telephone AS BIGINT) % @module
            WHERE CURRENT OF hash_cursor

            FETCH NEXT FROM hash_cursor
        END

        CLOSE hash_cursor
        DEALLOCATE hash_cursor
    END
END
Basically the stored procedure is intended to fill a new column called Hash that was added to the existing table. When the script that updates the table finishes, the new column contains NULL values, and this stored procedure is then supposed to fill each NULL with telephone (a BIGINT) % the module variable (also a BIGINT).
Is there anything, besides changing to a WHILE loop, that I can do to make it use less memory or just not crash? Thanks in advance.
You could do the following:
DECLARE @module BIGINT = 382449983; -- same module value as in the procedure

WHILE 1=1
BEGIN
    UPDATE TOP (10000) dbo.telephoneSource
    SET Hash = CAST(telephone AS BIGINT) % @module
    WHERE Hash IS NULL;

    IF @@ROWCOUNT = 0
    BEGIN
        BREAK;
    END;
END;
This will update Hash as long as there are NULL values and will exit once there have been no records updated.
Adding a filtered index could be useful as well:
CREATE NONCLUSTERED INDEX IX_telephoneSource_Hash_telephone
ON dbo.telephoneSource (Hash)
INCLUDE (telephone)
WHERE Hash IS NULL;
It will speed up the lookups needed for each batch, but it might not be necessary.
Here is an example of code to do it in loops, from my comment above, without using a cursor. If you add a WHERE clause on the field you are updating (IS NOT NULL) to the inner update, it won't update rows that were already done (in case you need to restart the process or something).
I didn't include your specific tables in there, but if you need me to, I can add them.
DECLARE @PerBatchCount AS INT
DECLARE @MAXID AS BIGINT
DECLARE @WorkingOnID AS BIGINT

SET @PerBatchCount = 1000

-- Find the range of IDs to process, using your table name
SELECT @WorkingOnID = MIN(ID), @MAXID = MAX(ID)
FROM YouTableHere WITH (NOLOCK)

WHILE @WorkingOnID <= @MAXID
BEGIN
    -- do an update on all the ones that exist in the offer table NOW
    --DO YOUR UPDATE HERE
    -- include this where clause, where ID is the PK you are looping through
    WHERE ID BETWEEN @WorkingOnID AND (@WorkingOnID + @PerBatchCount - 1)

    SET @WorkingOnID = @WorkingOnID + @PerBatchCount
END

SET NOCOUNT OFF;
I would simply add a computed column:

ALTER TABLE dbo.telephoneSource
ADD Hash AS (CAST(telephone AS BIGINT) % 382449983) PERSISTED;
Background:
This is a multi-tenant application, so a normal identity column will not work. All tables have a unique client identifier, Clients.id, so each client can have many customers. This column is not included below for simplicity.
We want to generate a unique customer number starting at 1000.
We store the current (last) generated number in a table called Master, say Master.CustomerNumber. So the numbers will go 1001, 1002, etc., and the last one is stored there.
So each time we add a customer, we have a query that looks up the current value, increments it, and inserts it into Customer.Number.
NOTE: we are using SQL Server 2008. We have multiple servers in a cluster.
What is the best approach to ensure that if two customers are added at the same time, each gets a unique customer number? Stored procedure, locking, CFLOCKING?
How do I ensure that this process is 'single-threaded' and that the same number is not issued twice?
I do have a unique index on Customer.Number + Clients.id. I am interested in how to guarantee uniqueness at generation time.
I have not reviewed the existing solutions because they are quite long and elaborate. Wouldn't the following be all you need?
CREATE TABLE MasterDB.dbo.Sequences
(
    ClientId INT NOT NULL PRIMARY KEY,
    LastGeneratedNumber INT NOT NULL
)

DECLARE @nextId INT; -- Holds the newly allocated ID

UPDATE MasterDB.dbo.Sequences
SET LastGeneratedNumber = LastGeneratedNumber + 1,
    @nextId = LastGeneratedNumber + 1
WHERE ClientId = 1234
This is correct under any isolation level and any index structure. The row holding the ID information will be U or X locked by the engine.
In case no ID has ever been generated for a client, this UPDATE statement will not do anything. You can solve that by using MERGE or by using control flow. I recommend MERGE.
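A sketch of the MERGE variant (my wording, reusing the Sequences table above; the HOLDLOCK hint is the usual guard against two sessions racing through the NOT MATCHED branch):

DECLARE @ids TABLE (NextId INT)

MERGE MasterDB.dbo.Sequences WITH (HOLDLOCK) AS t
USING (SELECT 1234 AS ClientId) AS s
    ON t.ClientId = s.ClientId
WHEN MATCHED THEN
    UPDATE SET LastGeneratedNumber = t.LastGeneratedNumber + 1
WHEN NOT MATCHED THEN
    -- First number handed out for a brand-new client
    INSERT (ClientId, LastGeneratedNumber) VALUES (s.ClientId, 1000)
OUTPUT inserted.LastGeneratedNumber INTO @ids (NextId);

SELECT NextId FROM @ids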
Or, you insert a row whenever you create a new client. Set LastGeneratedNumber = 1000 - 1.
There is no need to use stored procedures, but you certainly can. There is almost no performance difference compared to executing this T-SQL as a batch from the application. Do what's more convenient for you.
If you make this T-SQL part of your main transaction, the ID assignment will be transactional: it will potentially roll back, and it will serialize customer creation. If you don't like that, use a separate transaction. That way IDs might be lost, though. This is unavoidable in any solution.
A variation of the UPDATE given above is:
UPDATE MasterDB.dbo.Sequences
SET LastGeneratedNumber = LastGeneratedNumber + 1
OUTPUT INSERTED.LastGeneratedNumber
WHERE ClientId = 1234
You could use one sequence per client, but this requires your application to execute DDL, which can be awkward. Also, you cannot make ID generation transactional, so there is less control. I would not recommend it, but it's possible.
You could use the following solution:
CREATE TABLE dbo.[Master] (
-- Foreign key to dbo.Tenant table ?
-- Only one row for every tenant is allowed => PK on tenant identifier
TenantNum INT NOT NULL
CONSTRAINT PK_Master PRIMARY KEY CLUSTERED (TenantNum),
-- LastCustomerNum = last generated value for CustomerNum
-- NULL means no value was generated
LastCustomerNum INT NULL,
-- It will create one clustered unique index on these two columns
InitialValue INT NOT NULL
CONSTRAINT DF_Master_InitialValue DEFAULT (1),
Step INT NOT NULL
CONSTRAINT DF_Master_Step DEFAULT (1)
);
GO
CREATE PROCEDURE dbo.p_GetNewCustomerNum
    @TenantNum INT,
    @NewCustomerNum INT OUTPUT,
    @HowManyCustomerNum INT = 1 -- Usually, we want to generate only one CustomerNum
AS
BEGIN
    BEGIN TRY
        IF @TenantNum IS NULL
            RAISERROR('Invalid value for @TenantNum: %d', 16, 1, @TenantNum);
        IF @HowManyCustomerNum IS NULL OR @HowManyCustomerNum < 1
            RAISERROR('Invalid value for @HowManyCustomerNum: %d', 16, 1, @HowManyCustomerNum)

        -- It updates the LastCustomerNum column and assigns the new value to the @NewCustomerNum output parameter
        UPDATE m
        SET @NewCustomerNum
            = LastCustomerNum
            = CASE WHEN LastCustomerNum IS NULL THEN InitialValue - Step ELSE LastCustomerNum END
              + Step * @HowManyCustomerNum
        FROM dbo.[Master] AS m
        WHERE m.TenantNum = @TenantNum

        IF @@ROWCOUNT = 0
            RAISERROR('@TenantNum: %d doesn''t exist', 16, 1, @TenantNum);
    END TRY
    BEGIN CATCH
        -- Rethrow the intercepted exception/error
        DECLARE @ExMessage NVARCHAR(2048) = ERROR_MESSAGE()
        RAISERROR(@ExMessage, 16, 1)
        -- Use THROW for SQL2012+
    END CATCH
END
GO
Usage (no gaps):
BEGIN TRAN
    ...
    DECLARE @cn INT
    EXEC dbo.p_GetNewCustomerNum
        @TenantNum = ...,
        @NewCustomerNum = @cn OUTPUT,
        [@HowManyCustomerNum = ...]
    ...
    INSERT INTO dbo.Customer (..., CustomerNum, ...)
    VALUES (..., @cn, ...)
COMMIT
Note: if you don't use a transaction to generate the new customer number and then insert this value into the Customer table, you could end up with gaps.
How it works?
{Primary key | Unique index} defined on TenantNum and CustomerNum will prevent any duplicates.
Under the default isolation level (READ COMMITTED), but also under READ UNCOMMITTED, REPEATABLE READ and SERIALIZABLE, the UPDATE statement requires an X lock. If two concurrent SQL Server sessions (and transactions) try to generate a new CustomerNum, the first session will successfully get the X lock on the tenant row, and the second session will have to wait until the first session (and transaction) ends (COMMIT or ROLLBACK). Note: I assume that every session has one active transaction.
Regarding the X lock behavior: this works because two concurrent X lock requests are incompatible; see the lock compatibility matrix ("Requested mode" vs. "Granted mode") in the documentation.
For the above reasons, only one connection/TX at a time can update a tenant row in dbo.[Master] with a new customer number.
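For reference, a sketch of the unique index from the first bullet (the Customer table shape is my assumption, not shown in the answer):

-- Prevents two customers of the same tenant from ever sharing a CustomerNum,
-- even if the generator logic were somehow bypassed.
CREATE UNIQUE INDEX UQ_Customer_TenantNum_CustomerNum
    ON dbo.Customer (TenantNum, CustomerNum)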
-- Tests #1
-- Insert a few new and "old" tenants
INSERT dbo.[Master] (TenantNum) VALUES (101)
INSERT dbo.[Master] (TenantNum, LastCustomerNum) VALUES (102, 1111)

SELECT * FROM dbo.[Master]
/*
TenantNum   LastCustomerNum InitialValue Step
----------- --------------- ------------ -----------
101         NULL            1            1
102         1111            1            1
*/
GO

-- Generate one CustomerNum for tenant 101
DECLARE @cn INT
EXEC p_GetNewCustomerNum 101, @cn OUTPUT
SELECT @cn AS [cn]
/*
cn
-----------
1
*/
GO

-- Generate a second CustomerNum for tenant 101
DECLARE @cn INT
EXEC p_GetNewCustomerNum 101, @cn OUTPUT
SELECT @cn AS [cn]
/*
cn
-----------
2
*/
GO

-- Generate three CustomerNums for tenant 101
DECLARE @cn INT
EXEC p_GetNewCustomerNum 101, @cn OUTPUT, 3
SELECT @cn AS [cn]
/*
cn
-----------
5 <-- This means the following range was reserved: [(5-3)+1, 5] = [3, 5] = {3, 4, 5}; Note: Step = 1
*/
GO

-- Generate one CustomerNum for tenant 102
DECLARE @cn INT
EXEC p_GetNewCustomerNum 102, @cn OUTPUT
SELECT @cn AS [cn]
/*
cn
-----------
1112
*/
GO

-- End status of Master table
SELECT * FROM dbo.[Master]
/*
TenantNum   LastCustomerNum InitialValue Step
----------- --------------- ------------ -----------
101         5               1            1
102         1112            1            1
*/
GO
-- Tests #2: To test concurrent sessions / TX you can use the script below

-- Step 1: Session 1
BEGIN TRAN
-- Generate one CustomerNum for tenant 101
DECLARE @cn INT
EXEC p_GetNewCustomerNum 101, @cn OUTPUT
SELECT @cn AS [cn] -- > It generates @cn 6

-- Step 2: Session 2
BEGIN TRAN
-- Generate one CustomerNum for tenant 101
DECLARE @cn INT
EXEC p_GetNewCustomerNum 101, @cn OUTPUT -- Update waits for Session 1 to finish
SELECT @cn AS [cn]
COMMIT

-- Step 3: Session 1
COMMIT -- End of first TX. Check Session 2: it'll output 7.
First end note: to manage transactions and exceptions I would use SET XACT_ABORT ON and/or BEGIN TRY ... END CATCH. Discussion of this topic is beyond the purpose of this answer.
Second end note: see the updated section "How it works?" (bullets 3 and 4).
I know this is a little bit late, but I still hope it helps you :)
We had the same situation over here...
We solved it by having a common table in a separate database, which contains only three columns (i.e. TableName, ColumnName and LastIndex).
Now, we use a separate SP that always gets a new number from this table in a separate transaction (it should always commit, regardless of whether your main insert succeeds or fails).
So this will always return a new ID for any request, and that new index is used for inserting the record.
Let me know if you need any sample on this.
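A minimal sketch of that idea (all names here are my assumptions, not taken from the answer):

-- Common key table: one row per (table, column) pair whose IDs we hand out.
CREATE TABLE dbo.KeyStore
(
    TableName  SYSNAME NOT NULL,
    ColumnName SYSNAME NOT NULL,
    LastIndex  INT     NOT NULL,
    CONSTRAINT PK_KeyStore PRIMARY KEY (TableName, ColumnName)
)
GO

CREATE PROCEDURE dbo.p_GetNextIndex
    @TableName  SYSNAME,
    @ColumnName SYSNAME,
    @NextIndex  INT OUTPUT
AS
BEGIN
    -- Called in its own transaction so the increment always commits;
    -- the single UPDATE bumps and reads the counter atomically.
    UPDATE dbo.KeyStore
    SET @NextIndex = LastIndex = LastIndex + 1
    WHERE TableName = @TableName AND ColumnName = @ColumnName
END
GO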
You want to use a Sequence, for example:
CREATE SEQUENCE Customer_Number_Seq
    AS INTEGER
    START WITH 1000
    INCREMENT BY 1
    MINVALUE 1000;
and then possibly something like:

CREATE TABLE Customers
(
    customer_nbr INTEGER DEFAULT NEXT VALUE FOR Customer_Number_Seq,
    .... other columns ....
);
The documentation has more details.
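For completeness, a sequence value can also be fetched explicitly rather than through the column default (a sketch; note that sequences require SQL Server 2012 or later, while the question mentions SQL Server 2008):

DECLARE @nbr INT

-- Allocate the next customer number directly from the sequence.
SET @nbr = NEXT VALUE FOR Customer_Number_Seq

INSERT INTO Customers (customer_nbr /*, other columns */)
VALUES (@nbr /*, other values */)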
I have an UPDATE statement which can update more than a million records. I want to update them in batches of 1000 or 10000. I tried with @@ROWCOUNT but I am unable to get the desired result.
Just for testing purposes, I selected a table with 14 records and set a row count of 5. The query is supposed to update the records in batches of 5, 5, and 4, but it just updates the first 5 records.
Query - 1:
SET ROWCOUNT 5

UPDATE TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123

WHILE @@ROWCOUNT > 0
BEGIN
    SET ROWCOUNT 5

    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123

    PRINT (@@ROWCOUNT)
END

SET ROWCOUNT 0
Query - 2:
SET ROWCOUNT 5

WHILE (@@ROWCOUNT > 0)
BEGIN
    BEGIN TRANSACTION

    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123

    PRINT (@@ROWCOUNT)

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END

    COMMIT TRANSACTION
END

SET ROWCOUNT 0
What am I missing here?
You should not be updating 10k rows in a set unless you are certain that the operation is getting Page Locks (due to multiple rows per page being part of the UPDATE operation). The issue is that Lock Escalation (from either Row or Page to Table locks) occurs at 5000 locks. So it is safest to keep it just below 5000, just in case the operation is using Row Locks.
You should not be using SET ROWCOUNT to limit the number of rows that will be modified. There are two issues here:
It has been deprecated since SQL Server 2005 was released (11 years ago):
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax
It can affect more than just the statement you are dealing with:
Setting the SET ROWCOUNT option causes most Transact-SQL statements to stop processing when they have been affected by the specified number of rows. This includes triggers. The ROWCOUNT option does not affect dynamic cursors, but it does limit the rowset of keyset and insensitive cursors. This option should be used with caution.
Instead, use the TOP () clause.
There is no purpose in having an explicit transaction here. It complicates the code and you have no handling for a ROLLBACK, which isn't even needed since each statement is its own transaction (i.e. auto-commit).
Assuming you find a reason to keep the explicit transaction, then you do not have a TRY / CATCH structure. Please see my answer on DBA.StackExchange for a TRY / CATCH template that handles transactions:
Are we required to handle Transaction in C# Code as well as in Store procedure
I suspect that the real WHERE clause is not being shown in the example code in the Question, so simply relying upon what has been shown, a better model (please see note below regarding performance) would be:
DECLARE @Rows INT,
        @BatchSize INT; -- keep below 5000 to be safe

SET @BatchSize = 2000;

SET @Rows = @BatchSize; -- initialize just to enter the loop

BEGIN TRY
    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) tab
        SET tab.Value = 'abc1'
        FROM TableName tab
        WHERE tab.Parameter1 = 'abc'
          AND tab.Parameter2 = 123
          AND tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;
          -- Use a binary Collation (ending in _BIN2, not _BIN) to make sure
          -- that you don't skip differences that compare the same due to
          -- insensitivity of case, accent, etc, or linguistic equivalence.

        SET @Rows = @@ROWCOUNT;
    END;
END TRY
BEGIN CATCH
    RAISERROR(stuff);
    RETURN;
END CATCH;
By testing @Rows against @BatchSize, you can avoid that final UPDATE query (in most cases) because the final set is typically some number of rows less than @BatchSize, in which case we know that there are no more to process (which is what you see in the output shown in your answer). Only in those cases where the final set of rows is equal to @BatchSize will this code run a final UPDATE affecting 0 rows.
I also added a condition to the WHERE clause to prevent rows that have already been updated from being updated again.
NOTE REGARDING PERFORMANCE
I emphasized "better" above (as in, "this is a better model") because this has several improvements over the O.P.'s original code, and works fine in many cases, but is not perfect for all cases. For tables of at least a certain size (which varies due to several factors so I can't be more specific), performance will degrade as there are fewer rows to fix if either:
there is no index to support the query, or
there is an index, but at least one column in the WHERE clause is a string data type that does not use a binary collation, hence a COLLATE clause is added to the query here to force the binary collation, and doing so invalidates the index (for this particular query).
This is the situation that @mikesigs encountered, thus requiring a different approach. The updated method copies the IDs of all rows to be updated into a temporary table, then uses that temp table to INNER JOIN to the table being updated on the clustered index key column(s). (It's important to capture and join on the clustered index columns, whether or not those are the primary key columns!)
Please see @mikesigs' answer below for details. The approach shown in that answer is a very effective pattern that I have used myself on many occasions. The only changes I would make are:
Explicitly create the #targetIds table rather than using SELECT INTO...
For the #targetIds table, declare a clustered primary key on the column(s).
For the #batchIds table, declare a clustered primary key on the column(s).
For inserting into #targetIds, use INSERT INTO #targetIds (column_name(s)) SELECT and remove the ORDER BY as it's unnecessary.
So, if you don't have an index that can be used for this operation, and can't temporarily create one that will actually work (a filtered index might work, depending on your WHERE clause for the UPDATE query), then try the approach shown in @mikesigs' answer (and if you use that solution, please up-vote it).
WHILE EXISTS (SELECT * FROM TableName WHERE Value <> 'abc1' AND Parameter1 = 'abc' AND Parameter2 = 123)
BEGIN
UPDATE TOP (1000) TableName
SET Value = 'abc1'
WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1'
END
I encountered this thread yesterday and wrote a script based on the accepted answer. It turned out to perform very slowly, taking 12 hours to process 25M of 33M rows. I wound up cancelling it this morning and working with a DBA to improve it.
The DBA pointed out that the IS NULL check in my UPDATE query was using a Clustered Index Scan on the PK, and it was the scan that was slowing the query down. Basically, the longer the query runs, the further it needs to look through the index for the right rows.
The approach he came up with was obvious in hindsight. Essentially, you load the IDs of the rows you want to update into a temp table, then join that onto the target table in the update statement. This uses an Index Seek instead of a Scan. And ho boy does it speed things up! It took 2 minutes to update the last 8M records.
Batching Using a Temp Table
SET NOCOUNT ON

DECLARE @Rows INT,
        @BatchSize INT,
        @Completed INT,
        @Total INT,
        @Message NVARCHAR(MAX)

SET @BatchSize = 4000
SET @Rows = @BatchSize
SET @Completed = 0

-- #targetIds table holds the IDs of ALL the rows you want to update
SELECT Id INTO #targetIds
FROM TheTable
WHERE Foo IS NULL
ORDER BY Id

-- Used for printing out the progress
SELECT @Total = @@ROWCOUNT

-- #batchIds table holds just the records updated in the current batch
CREATE TABLE #batchIds (Id UNIQUEIDENTIFIER);

-- Loop until #targetIds is empty
WHILE EXISTS (SELECT 1 FROM #targetIds)
BEGIN
    -- Remove a batch of rows from the top of #targetIds and put them into #batchIds
    DELETE TOP (@BatchSize)
    FROM #targetIds
    OUTPUT deleted.Id INTO #batchIds

    -- Update TheTable data
    UPDATE t
    SET Foo = 'bar'
    FROM TheTable t
    JOIN #batchIds tmp ON t.Id = tmp.Id
    WHERE t.Foo IS NULL

    -- Get the # of rows updated
    SET @Rows = @@ROWCOUNT

    -- Increment our @Completed counter, for progress display purposes
    SET @Completed = @Completed + @Rows

    -- Print progress using RAISERROR to avoid SQL buffering issue
    SELECT @Message = 'Completed ' + CAST(@Completed AS VARCHAR(10)) + '/' + CAST(@Total AS VARCHAR(10))
    RAISERROR(@Message, 0, 1) WITH NOWAIT

    -- Quick operation to delete all the rows from our batch table
    TRUNCATE TABLE #batchIds;
END

-- Clean up
DROP TABLE IF EXISTS #batchIds;
DROP TABLE IF EXISTS #targetIds;
Batching the slow way, do not use!
For reference, here is the original, slower-performing query:
SET NOCOUNT ON

DECLARE @Rows INT,
        @BatchSize INT,
        @Completed INT,
        @Total INT

SET @BatchSize = 4000
SET @Rows = @BatchSize
SET @Completed = 0

SELECT @Total = COUNT(*) FROM TheTable WHERE Foo IS NULL

WHILE (@Rows = @BatchSize)
BEGIN
    UPDATE TOP (@BatchSize) t
    SET Foo = 'bar'
    FROM TheTable t
    WHERE t.Foo IS NULL

    SET @Rows = @@ROWCOUNT
    SET @Completed = @Completed + @Rows

    PRINT 'Completed ' + CAST(@Completed AS VARCHAR(10)) + '/' + CAST(@Total AS VARCHAR(10))
END
This is a more efficient version of the solution from @Kramb. The existence check is redundant, as the UPDATE's WHERE clause already handles it; instead you just grab the row count and compare it to the batch size.
Also note that @Kramb's solution didn't filter out already-updated rows from the next iteration, hence it would be an infinite loop.
It also uses the modern batch-size syntax (UPDATE TOP) instead of SET ROWCOUNT.
DECLARE @batchSize INT, @rowsUpdated INT
SET @batchSize = 1000;
SET @rowsUpdated = @batchSize; -- Initialise for the while loop entry

WHILE (@batchSize = @rowsUpdated)
BEGIN
    UPDATE TOP (@batchSize) TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';

    SET @rowsUpdated = @@ROWCOUNT;
END
I want to share my experience. A few days ago I had to update 21 million records in a table with 76 million records. My colleague suggested the following approach.
For example, we have the following table 'Persons':

Id | FirstName | LastName | Email        | JobTitle
1  | John      | Doe      | abc1@abc.com | Software Developer
2  | John1     | Doe1     | abc2@abc.com | Software Developer
3  | John2     | Doe2     | abc3@abc.com | Web Designer
Task: Update persons to the new Job Title: 'Software Developer' -> 'Web Developer'.
1. Create a temporary table 'Persons_SoftwareDeveloper_To_WebDeveloper (Id INT Primary Key)'.
2. Select into the temporary table the persons you want to update with the new job title:
INSERT INTO Persons_SoftwareDeveloper_To_WebDeveloper
SELECT Id FROM Persons WITH(NOLOCK) -- avoid locks
WHERE JobTitle = 'Software Developer'
OPTION(MAXDOP 1) -- use only one core
Depending on the row count, this statement will take some time to fill your temporary table, but it avoids locks. In my situation it took about 5 minutes (21 million rows).
3. The main idea is to generate micro SQL statements to update the database. So, let's print them:
DECLARE @i INT, @pagesize INT, @totalPersons INT
SET @i = 0
SET @pagesize = 2000

SELECT @totalPersons = MAX(Id) FROM Persons

WHILE @i <= @totalPersons
BEGIN
    PRINT '
UPDATE persons
SET persons.JobTitle = ''Web Developer''
FROM Persons_SoftwareDeveloper_To_WebDeveloper tmp
JOIN Persons persons ON tmp.Id = persons.Id
WHERE persons.Id BETWEEN ' + CAST(@i AS VARCHAR(20)) + ' AND ' + CAST(@i + @pagesize AS VARCHAR(20)) + '
PRINT ''Page ' + CAST((@i / @pagesize) AS VARCHAR(20)) + ' of ' + CAST(@totalPersons / @pagesize AS VARCHAR(20)) + '''
GO
'
    SET @i = @i + @pagesize
END
After executing this script you will receive hundreds of batches which you can execute in one tab of SQL Server Management Studio.
4. Run the printed SQL statements and check for locks on the table. You can always stop the process and play with @pagesize to speed up or slow down the updating (don't forget to change @i if you pause the script).
5. Drop Persons_SoftwareDeveloper_To_WebDeveloper, i.e. remove the temporary table.
Minor note: this migration can take some time, and new rows with invalid data could be inserted during the migration. So first fix the places where your rows get added. In my situation I fixed the UI: 'Software Developer' -> 'Web Developer'.
More about this method on my blog https://yarkul.com/how-smoothly-insert-millions-of-rows-in-sql-server/
Your PRINT is messing things up, because it resets @@ROWCOUNT. Whenever you use @@ROWCOUNT, my advice is to always assign it immediately to a variable. So:
DECLARE @RC INT;

WHILE @RC > 0 OR @RC IS NULL
BEGIN
    SET ROWCOUNT 5;

    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';

    SET @RC = @@ROWCOUNT;
    PRINT (@RC);
END;

SET ROWCOUNT 0;
And, another nice feature is that you don't need to repeat the update code.
First of all, thank you all for your input. I tweaked my Query - 1 and got my desired result. Gordon Linoff is right: PRINT was messing up my query, so I modified it as follows:
Modified Query - 1:
SET ROWCOUNT 5

WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION

    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END

    COMMIT TRANSACTION
END

SET ROWCOUNT 0
Output:
(5 row(s) affected)
(5 row(s) affected)
(4 row(s) affected)
(0 row(s) affected)
I have written this stored procedure and it executes, but it doesn't update the customer. The question is: create a procedure named prc_cus_balance_update that will take the invoice number as a parameter and update the customer balance.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE PRC_CUS_BALANCE_UPDATE3
    @INV_NUMBER INT
AS
BEGIN
    DECLARE @CUS_CODE INT

    SELECT @CUS_CODE = CUS_CODE
    FROM INVOICE
    WHERE @INV_NUMBER = INV_NUMBER

    UPDATE CUSTOMER
    SET CUS_BALANCE = CUS_BALANCE +
        (SELECT INV_TOTAL FROM INVOICE WHERE @INV_NUMBER = INV_NUMBER)
    WHERE @CUS_CODE = CUS_CODE
END
GO
While developing, I'd put in some "extras" to figure out what is going on.
Pseudo-code below.
You want to make sure you found a matching row.
And you want to make sure at least one row was actually updated.
I'm NOT saying the code below is "production ready", but it shows the concepts.
CREATE PROCEDURE PRC_CUS_BALANCE_UPDATE3
    @INV_NUMBER INT
AS
BEGIN
    DECLARE @CUS_CODE INT
    DECLARE @MYROWCOUNT INT

    SELECT @CUS_CODE = CUS_CODE
    FROM INVOICE
    WHERE @INV_NUMBER = INV_NUMBER

    IF (NOT (@CUS_CODE IS NULL))
    BEGIN
        SET NOCOUNT OFF

        UPDATE CUSTOMER
        SET CUS_BALANCE = CUS_BALANCE +
            (SELECT INV_TOTAL FROM INVOICE WHERE @INV_NUMBER = INV_NUMBER)
        WHERE @CUS_CODE = CUS_CODE

        SELECT @MYROWCOUNT = @@ROWCOUNT

        IF (@MYROWCOUNT <= 0)
        BEGIN
            PRINT 'No row updated. :<'
        END

        SET NOCOUNT OFF
    END
    ELSE
    BEGIN
        PRINT '@CUS_CODE match not found.'
    END
END
GO
Try putting some better bulletproofing in. Multiple rows, null values, and the like can all cause problems with what you currently have. Here's a stab at it, without knowing the specifics of your data model (it may not be appropriate to sum the totals from multiple invoices; I'm just saying you could get multiple rows back and need to deal with that).
CREATE PROCEDURE PRC_CUS_BALANCE_UPDATE3
    @INV_NUMBER INT
AS
BEGIN
    DECLARE @CUS_CODE INT

    SELECT TOP 1 @CUS_CODE = CUS_CODE
    FROM INVOICE
    WHERE INV_NUMBER = @INV_NUMBER

    IF @CUS_CODE IS NOT NULL
    BEGIN
        UPDATE CUSTOMER
        SET CUS_BALANCE = ISNULL(CUS_BALANCE, 0.0) +
            ISNULL(
                (SELECT SUM(INV_TOTAL)
                 FROM INVOICE
                 WHERE @INV_NUMBER = INV_NUMBER),
                0.0)
        WHERE CUS_CODE = @CUS_CODE
    END
END
GO