T-SQL select and update with lock - transaction or table hint - sql-server

How to achieve following transaction lock?
Greatly simplified: I have a table of "Tasks" with statuses (Created, Started, Completed). I want to create a stored procedure GetNext that gets the top 1 task that hasn't been started yet (has Created status).
In this procedure I want to mark the task as Started. Obviously I want to avoid the situation where two processes call this procedure and get the same task.
The procedure will not be called frequently, so performance is not an issue; keeping the data uncorrupted is.
So I want to do something like this:
UPDATE tblTasks
SET Status = 'Started'
WHERE TaskId = (SELECT TOP 1 TaskId
FROM tblTasks
WHERE Status = 'Created')
I also want to receive the task that I just updated, so rather than the above I need something like:
DECLARE @TaskId AS INT = (SELECT TOP 1 TaskId FROM tblTasks WHERE Status = 'Created')
UPDATE tblTasks
SET Status = 'Started'
WHERE TaskId = @TaskId
[... - Do something with @TaskId - not relevant]
OR
DECLARE @TaskIds AS TABLE(TaskId INT)
UPDATE tblTasks
SET Status = 'Started'
OUTPUT INSERTED.TaskId INTO @TaskIds
WHERE TaskId = @TaskId
[... - Do something with @TaskIds - not relevant]
So, assuming that I need SELECT + UPDATE to achieve this, how can I ensure that no other process will execute even the first operation (the SELECT) until the current process is done?
As far as I understand, even the Serializable transaction isolation level is not enough here, because another process can read the data, then wait until I finish (because its update is blocked by my lock), and then update the data that I just updated.
I feel that the table hints XLOCK or HOLDLOCK might help, but I'm no expert and the MS docs scared me with:
Caution
Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints be used only as a last resort by experienced developers and database administrators.
(from https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table)
So how do I make sure that two processes will not update the same item, and how do I make sure that if one process is running, the other will wait and do its job after the first finishes instead of failing?

Typically SQL Server takes locks automatically for each statement; inside an explicit transaction, the exclusive locks taken by data modifications are held until the transaction commits or rolls back.
From what I understand, what you want/need is a way to SELECT/UPDATE "in one go". You should be able to achieve that with a combination of a TRANSACTION, TRY ... CATCH and a CTE.
DECLARE @TaskIds AS TABLE (TaskId INT);
BEGIN TRANSACTION;
BEGIN TRY
WITH myTasks (TaskId) AS (
SELECT TOP 1 t.TaskId
FROM tblTasks AS t
WHERE t.Status = 'Created'
)
UPDATE t
SET t.Status = 'Started'
OUTPUT INSERTED.TaskId INTO @TaskIds
FROM tblTasks AS t
INNER JOIN myTasks AS mt
ON mt.TaskId = t.TaskId;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
THROW;
END CATCH;
IF @@TRANCOUNT > 0
BEGIN
COMMIT TRANSACTION;
SELECT TaskId FROM @TaskIds;
[.. do other stuff ..]
END
GO
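A variant of the same single-statement approach adds the UPDLOCK, READPAST and ROWLOCK table hints. This is the common "table as a queue" pattern, swapped in here rather than taken from the answer, and sketched under the assumption that tblTasks has TaskId as its primary key. UPDLOCK makes the read of the candidate row take an update lock, so two callers cannot both claim it; READPAST lets a second caller skip a row that is already locked instead of waiting on it.

```sql
-- Sketch: assumes tblTasks(TaskId INT PRIMARY KEY, Status VARCHAR(20)) as in the question.
DECLARE @TaskIds TABLE (TaskId INT);

-- UPDLOCK:  the row picked by TOP (1) is update-locked, so a concurrent caller
--           cannot claim the same row before this statement commits.
-- READPAST: rows already locked by another caller are skipped, not waited on.
-- ROWLOCK:  requests row-level locks so that other rows stay available.
WITH nextTask AS (
    SELECT TOP (1) TaskId, Status
    FROM tblTasks WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE Status = 'Created'
    ORDER BY TaskId
)
UPDATE nextTask
SET Status = 'Started'
OUTPUT INSERTED.TaskId INTO @TaskIds;

-- Empty result if no 'Created' task was available (or all of them were locked).
SELECT TaskId FROM @TaskIds;
```

If you would rather have a second caller wait, as the question asks, drop READPAST: the second caller then blocks on the locked row instead of skipping it.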

Related

SQL Server: using table lock hint in select for ensuring correctness?

I've got a project that is trying to apply DDD (Domain Driven Design). Currently, we've got something like this:
begin tran
try
_manager.CreateNewEmployee(newEmployeeCmd);
tran.Commit();
catch
rollback tran
Internally, the CreateNewEmployee method uses a domain service for checking if there's already an employee with the memberId. Here's some pseudo code:
void CreateNewEmployee(NewEmployeeCmd cmd)
if(_duplicateMember.AlreadyRegistered(cmd.MemberId) )
throw duplicate
// extra stuff
saveNewEmployee()
end
Now, in the end, it's as if we have the following SQL instructions executed (pseudo code again):
begin sql tran
select count(*) from table where memberId=@memberId and status=1 -- active
--some time goes by
insert into table ...
end
Now, when I started looking at the code, I noticed that it was using SQL Server's default isolation level (READ COMMITTED). In practice, that means that something like this could happen:
--thread 1
(1) select ... --assume it returns 0
--thread 2
(2) select ... --nothing found
(3) insert recordA
--thread 1
(4) insert record --same as before
--thread 2
(5) commit tran
--thread 1
(6) commit tran
So, we could end up with duplicate records. I've tried playing with the isolation levels, but the only way I managed to make it work as intended was by changing the SELECT that checks whether there's already a record in the table. I ended up using a table lock hint, which instructs SQL Server to hold a lock until the end of the transaction. That was the only way I managed to get a lock when the SELECT starts (changing to the other isolation levels still wouldn't do what I needed, since they all allowed the SELECT to run).
So, I ended up using a table lock which is held from the beginning until the end of the transaction. In practice, that means that step (2) will block until thread 1 ends its job.
Is there a better option for this kind of scenario (one that doesn't depend on, say, indexes)?
Thanks.
Luis
You need to get the proper locks on the initial SELECT, which you can do with the locking hints WITH (UPDLOCK, SERIALIZABLE). Once you do that, thread 2 will wait for thread 1 to finish if thread 2 is using the same key range in its WHERE clause.
You could use the Sam Saffron upsert approach.
For example:
create procedure dbo.Employee_getset_byName (@Name nvarchar(50), @MemberId int output) as
begin
set nocount, xact_abort on;
begin tran;
select @MemberId = Id
from dbo.Employee with (updlock, serializable) /* hold key range for @Name */
where Name = @Name;
if @@rowcount = 0 /* if we still do not have an Id for @Name */
begin
/* Option A: Id comes from a sequence */
set @MemberId = next value for dbo.IdSequence; /* get next sequence value */
insert into dbo.Employee (Name, Id)
values (@Name, @MemberId);
/* Option B: Id is an identity column; use this instead of Option A
insert into dbo.Employee (Name)
values (@Name);
set @MemberId = scope_identity();
*/
end;
commit tran;
end;
go
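Applied to the check-then-insert from the question, the same hints might look like the sketch below. The table, column and procedure names (dbo.EmployeeRegistration, MemberId, Status, dbo.Employee_CreateIfNotRegistered) are hypothetical, since the question only shows a generic table with memberId and status columns. The UPDLOCK, SERIALIZABLE hints on the existence check make a second caller for the same MemberId block at that check until the first transaction commits, so it cannot also see "not found".

```sql
-- Hypothetical schema standing in for the question's "table".
CREATE TABLE dbo.EmployeeRegistration (
    EmployeeId INT IDENTITY PRIMARY KEY,
    MemberId   INT NOT NULL,
    Status     INT NOT NULL   -- 1 = active
);
GO

CREATE PROCEDURE dbo.Employee_CreateIfNotRegistered @MemberId INT
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;  -- any run-time error rolls back the whole transaction
    BEGIN TRAN;

    -- UPDLOCK + SERIALIZABLE: the lock on the checked key range is held until COMMIT,
    -- so a concurrent caller with the same MemberId waits here.
    IF EXISTS (SELECT 1
               FROM dbo.EmployeeRegistration WITH (UPDLOCK, SERIALIZABLE)
               WHERE MemberId = @MemberId AND Status = 1)
    BEGIN
        ROLLBACK TRAN;
        THROW 50001, 'Member is already registered.', 1;
    END;

    INSERT INTO dbo.EmployeeRegistration (MemberId, Status)
    VALUES (@MemberId, 1);

    COMMIT TRAN;
END;
GO
```

Key-range locks are most precise when there is an index on MemberId; without one, SQL Server has to lock the range it scans, which can amount to much of the table. Given the question's wish not to depend on indexes, that trade-off is worth keeping in mind.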

Commit Transaction takes too long?

I have a stored procedure that has the following code:
BEGIN TRY
--BEGIN TRANSACTION @TranName
DECLARE @ID int
INSERT INTO [dbo].[a] ([Comment],[Type_Id],[CreatedBy])
VALUES ('test',1,2)
SET @ID = SCOPE_IDENTITY()
INSERT INTO [dbo].[b] ([Can_ID],[Com_ID],[Cal_ID],[CreatedBy])
VALUES (1,@ID,null,2)
UPDATE c SET LastUpdated = GETDATE(), LastUpdatedBy = 2 WHERE b.id = @ID
--COMMIT TRANSACTION @TranName
SELECT * from [View] where a.id=@ID
END TRY
BEGIN CATCH
--ROLLBACK TRANSACTION @TranName
END CATCH
Each of the statements in there runs fast when executed individually (as it is now). But when we uncomment the transaction code, the script's run time increases from 1 s to more than 2 minutes.
The system has been running for quite a while now, and this wasn't a problem before. I've been searching the documentation on how SQL Server handles transactions, in case something there affects performance, and the only thing I have in mind is the transaction log... but each of these statements already runs in its own (autocommit) transaction anyway. Any ideas?
As Jens suggested, the problem was caused by blocking on some tables; after restarting the SQL Server service these locks disappeared and the DB started working properly again.
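Restarting the service clears the blocking but also hides its cause. A sketch of a query that, run while the slowdown is happening, lists blocked requests and the session blocking each of them, using the sys.dm_exec_requests and sys.dm_exec_sql_text dynamic management objects (the selected columns are just one reasonable choice):

```sql
-- One row per request that is currently waiting on another session.
SELECT r.session_id,
       r.blocking_session_id,   -- session holding the lock this request waits for
       r.wait_type,
       r.wait_time,             -- milliseconds waited so far
       r.wait_resource,
       t.text AS sql_text       -- batch the blocked request is running
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;

-- Oldest active transaction in the current database; often the head blocker,
-- for example a BEGIN TRANSACTION that was never committed.
DBCC OPENTRAN;
```

If blocking_session_id points at a session that is idle with an open transaction, that session, not the procedure, is what needs fixing.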

How can I set a lock inside a stored procedure?

I've got a long-running stored procedure on a SQL server database. I don't want it to run more often than once every ten minutes.
Once the stored procedure has run, I want to store the latest result in a LatestResult table, against a time, and have all calls to the procedure return that result for the next ten minutes.
That much is relatively simple, but we've found that, because the procedure checks the LatestResult table and then updates it, large user bases are getting a number of deadlocks when two users call the procedure at the same time.
In a client-side/threading situation, I would solve this with a lock: the first user locks the function; the second user encounters the lock and waits for the result; the first user finishes their procedure call, updates the LatestResult table, and unlocks the second user, who then picks up the result from the LatestResult table.
Is there any way to accomplish this kind of locking in SQL Server?
EDIT:
This is basically how the code looks without its error checking calls:
DECLARE @LastChecked AS DATETIME
DECLARE @LastResult AS NUMERIC(18,2)
SELECT TOP 1 @LastChecked = LastRunTime, @LastResult = LastResult FROM LastResult
DECLARE @ReturnValue AS NUMERIC(18,2)
IF DATEDIFF(n, @LastChecked, GetDate()) >= 10 OR NOT @LastResult = 0
BEGIN
SELECT @ReturnValue = ABS(ISNULL(SUM(ISNULL(Amount,0)),0)) FROM Transactions WHERE ISNULL(DeletedFlag,0) = 0 GROUP BY GroupID ORDER BY ABS(ISNULL(SUM(ISNULL(Amount,0)),0))
UPDATE LastResult SET LastRunTime = GETDATE(), LastResult = @ReturnValue
SELECT @ReturnValue
END
ELSE
BEGIN
SELECT @LastResult
END
I'm not really sure what's going on with the grouping, but I've found a test system where execution time is coming in around 4 seconds.
I think there's some work scheduled to archive some of these records and boil them down to running totals, which will probably help, given that there are several million rows in that four-second table...
This is a valid opportunity to use an Application Lock (see sp_getapplock and sp_releaseapplock), as it is a lock taken out on a concept that you define, not on any particular rows in any given table. The idea is that you start a transaction, then take out this arbitrary lock, which has an identifier, and other processes will wait to enter that piece of code until the lock is released. This works just like lock() at the app layer. The @Resource parameter is the label of the arbitrary "concept". In more complex situations, you can even concatenate a CustomerID or something similar into it for more granular locking control.
DECLARE @LastChecked DATETIME,
@LastResult NUMERIC(18,2);
DECLARE @ReturnValue NUMERIC(18,2);
BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'check_timing', @LockMode = 'Exclusive';
SELECT TOP 1 -- not sure if this helps the optimizer on a 1 row table, but seems ok
@LastChecked = LastRunTime,
@LastResult = LastResult
FROM LastResult;
IF (DATEDIFF(MINUTE, @LastChecked, GETDATE()) >= 10 OR @LastResult <> 0)
BEGIN
SELECT @ReturnValue = ABS(ISNULL(SUM(ISNULL(Amount, 0)), 0))
FROM Transactions
WHERE DeletedFlag = 0
OR DeletedFlag IS NULL;
UPDATE LastResult
SET LastRunTime = GETDATE(),
LastResult = @ReturnValue;
END
ELSE
BEGIN
SET @ReturnValue = @LastResult; -- This is always 0 here
END;
SELECT @ReturnValue AS [ReturnValue];
EXEC sp_releaseapplock @Resource = 'check_timing';
COMMIT TRANSACTION;
You need to manage errors / ROLLBACK yourself (as stated in the linked MSDN documentation) so put in the usual TRY / CATCH. But, this does allow you to manage the situation.
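A sketch of that error handling, assuming the lock should be owned by the transaction (the default @LockOwner), so that COMMIT or ROLLBACK releases it even if sp_releaseapplock is never reached. The 30-second @LockTimeout and the error number 50000 are arbitrary choices for illustration. sp_getapplock reports failure through its return code rather than by raising an error, which is why the result is captured and checked:

```sql
DECLARE @LockResult INT;

BEGIN TRY
    BEGIN TRANSACTION;

    -- Owned by the transaction: COMMIT or ROLLBACK releases the lock.
    -- Waits at most 30 seconds for another holder to finish.
    EXEC @LockResult = sp_getapplock
         @Resource    = 'check_timing',
         @LockMode    = 'Exclusive',
         @LockOwner   = 'Transaction',
         @LockTimeout = 30000;

    -- 0 / 1 = granted (immediately / after waiting);
    -- -1 timeout, -2 canceled, -3 deadlock victim, -999 parameter or call error.
    IF @LockResult < 0
        THROW 50000, 'Could not acquire the check_timing application lock.', 1;

    -- ... the SELECT / IF / UPDATE logic from the answer goes here ...

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;  -- re-raise the original error to the caller
END CATCH;
```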
If there are any concerns regarding contention on this process, there shouldn't be much as the lookup done right after locking the resource is a SELECT from a single-row table and then an IF statement that (ideally) just returns the last known value if the 10-minute timer hasn't elapsed. Hence, most calls should process rather quickly.
Please note: sp_getapplock / sp_releaseapplock should be used sparingly; Application Locks can definitely be very handy (such as in cases like this one) but they should only be used when absolutely necessary.

Is a single SQL Server statement atomic and consistent?

Is a statement in SQL Server ACID?
What I mean by that
Given a single T-SQL statement, not wrapped in a BEGIN TRANSACTION / COMMIT TRANSACTION, are the actions of that statement:
Atomic: either all of its data modifications are performed, or none of them is performed.
Consistent: When completed, a transaction must leave all data in a consistent state.
Isolated: Modifications made by concurrent transactions must be isolated from the modifications made by any other concurrent transactions.
Durable: After a transaction has completed, its effects are permanently in place in the system.
The reason I ask
I have a single statement in a live system that appears to be violating the rules of the query.
In effect my T-SQL statement is:
--If there are any slots available,
--then find the earliest unbooked transaction and mark it booked
UPDATE Transactions
SET Booked = 1
WHERE TransactionID = (
SELECT TOP 1 TransactionID
FROM Slots
INNER JOIN Transactions t2
ON Slots.SlotDate = t2.TransactionDate
WHERE t2.Booked = 0 --only book it if it's currently unbooked
AND Slots.Available > 0 --only book it if there's empty slots
ORDER BY t2.CreatedDate)
Note: But a simpler conceptual variant might be:
--Give away one gift, as long as we haven't given away five
UPDATE Gifts
SET GivenAway = 1
WHERE GiftID = (
SELECT TOP 1 g1.GiftID
FROM Gifts g1
WHERE g1.GivenAway = 0
AND (SELECT COUNT(*) FROM Gifts g2 WHERE g2.GivenAway = 1) < 5
ORDER BY g1.GiftValue DESC
)
In both of these statements, notice that they are single statements (UPDATE...SET...WHERE).
There are cases where the wrong transaction is being "booked"; it's actually picking a later transaction. After staring at this for 16 hours, I'm stumped. It's as though SQL Server is simply violating the rules.
I wondered: what if the results of the Slots view change before the update happens? What if SQL Server is not holding SHARED locks on the transactions on that date? Is it possible for a single statement to be inconsistent?
So I decided to test it
I decided to check if the results of sub-queries, or inner operations, are inconsistent. I created a simple table with a single int column:
CREATE TABLE CountingNumbers (
Value int PRIMARY KEY NOT NULL
)
From multiple connections, in a tight loop, I call the single T-SQL statement:
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
In other words the pseudo-code is:
while (true)
{
ADOConnection.Execute(sql);
}
And within a few seconds I get:
Violation of PRIMARY KEY constraint 'PK__Counting__07D9BBC343D61337'.
Cannot insert duplicate key in object 'dbo.CountingNumbers'.
The duplicate value is (1332)
Are statements atomic?
This result makes me wonder whether single statements are atomic at all.
Or is there a more subtle definition of statement that differs from (for example) what SQL Server considers a statement:
Does this fundamentally mean that, within the confines of a single T-SQL statement, SQL Server statements are not atomic?
And if a single statement is atomic, what accounts for the key violation?
From within a stored procedure
Rather than a remote client opening n connections, I tried it with a stored procedure:
CREATE procedure [dbo].[DoCountNumbers] AS
SET NOCOUNT ON;
DECLARE @bumpedCount int
SET @bumpedCount = 0
WHILE (@bumpedCount < 500) --safety valve
BEGIN
SET @bumpedCount = @bumpedCount+1;
PRINT 'Running bump '+CAST(@bumpedCount AS varchar(50))
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
IF (@bumpedCount >= 500)
BEGIN
PRINT 'WARNING: Bumping safety limit of 500 bumps reached'
END
END
PRINT 'Done bumping process'
and opened 5 tabs in SSMS, pressed F5 in each, and watched as they too violated ACID:
Running bump 414
Msg 2627, Level 14, State 1, Procedure DoCountNumbers, Line 14
Violation of PRIMARY KEY constraint 'PK_CountingNumbers'.
Cannot insert duplicate key in object 'dbo.CountingNumbers'.
The duplicate key value is (4414).
The statement has been terminated.
So the failure is independent of ADO, ADO.net, or none of the above.
For 15 years I've been operating under the assumption that a single statement in SQL Server is consistent; and the only
What about TRANSACTION ISOLATION LEVEL xxx?
For different variants of the SQL batch to execute:
default (read committed): key violation
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
default (read committed), explicit transaction: key violation
BEGIN TRANSACTION
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
COMMIT TRANSACTION
serializable: deadlock
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
COMMIT TRANSACTION
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
snapshot (after altering database to enable snapshot isolation): key violation
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRANSACTION
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
COMMIT TRANSACTION
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
Bonus
Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)
Default transaction isolation level (READ COMMITTED)
Turns out every query I've ever written is broken
This certainly changes things. Every update statement I've ever written is fundamentally broken. E.g.:
--Update the user with their last invoice date
UPDATE Users
SET LastInvoiceDate = (SELECT MAX(InvoiceDate) FROM Invoices WHERE Invoices.uid = Users.uid)
Wrong value, because another invoice could be inserted after the MAX and before the UPDATE. Or an example from BOL:
UPDATE Sales.SalesPerson
SET SalesYTD = SalesYTD +
(SELECT SUM(so.SubTotal)
FROM Sales.SalesOrderHeader AS so
WHERE so.OrderDate = (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader AS so2
WHERE so2.SalesPersonID = so.SalesPersonID)
AND Sales.SalesPerson.BusinessEntityID = so.SalesPersonID
GROUP BY so.SalesPersonID);
Without exclusive holdlocks, the SalesYTD is wrong.
How have I been able to do anything all these years?
I've been operating under the assumption that a single statement in SQL Server is consistent
That assumption is wrong. The following two transactions have identical locking semantics:
STATEMENT
BEGIN TRAN; STATEMENT; COMMIT
No difference at all. Single statements and auto-commits do not change anything.
So merging all logic into one statement does not help (if it does, it was by accident because the plan changed).
Let's fix the problem at hand. SERIALIZABLE will fix the inconsistency you are seeing because it guarantees that your transactions behave as if they executed single-threadedly. Equivalently, they behave as if they executed instantly.
You will be getting deadlocks. If you are ok with a retry loop, you're done at this point.
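A retry loop around the SERIALIZABLE version of the CountingNumbers insert could look like the following sketch. It assumes deadlocks surface as error 1205 (chosen as deadlock victim); the limit of five attempts and the 50 ms pause are arbitrary:

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DECLARE @attempt INT = 1;

WHILE 1 = 1
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        INSERT INTO CountingNumbers (Value)
        SELECT ISNULL(MAX(Value), 0) + 1 FROM CountingNumbers;
        COMMIT TRANSACTION;
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        -- Re-raise anything that is not a deadlock, or a deadlock on the last attempt.
        IF ERROR_NUMBER() <> 1205 OR @attempt >= 5
            THROW;

        SET @attempt += 1;
        WAITFOR DELAY '00:00:00.050';  -- brief pause before retrying
    END CATCH;
END;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```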
If you want to invest more time, apply locking hints to force exclusive access to the relevant data:
UPDATE Gifts -- U-locked anyway
SET GivenAway = 1
WHERE GiftID = (
SELECT TOP 1 g1.GiftID
FROM Gifts g1 WITH (UPDLOCK, HOLDLOCK) --this normally just S-locks.
WHERE g1.GivenAway = 0
AND (SELECT COUNT(*) FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) WHERE g2.GivenAway = 1) < 5
ORDER BY g1.GiftValue DESC
)
You will now see reduced concurrency. That might be totally fine depending on your load.
The very nature of your problem makes achieving concurrency hard. If you require a solution for that we'd need to apply more invasive techniques.
You can simplify the UPDATE a bit:
WITH g AS (
SELECT TOP 1 g1.*
FROM Gifts g1
WHERE g1.GivenAway = 0
AND (SELECT COUNT(*) FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) WHERE g2.GivenAway = 1) < 5
ORDER BY g1.GiftValue DESC
)
UPDATE g -- U-locked anyway
SET GivenAway = 1
This gets rid of one unnecessary join.
Below is an example of an UPDATE statement that increments a counter value atomically:
-- Do this once for test setup
CREATE TABLE CountingNumbers (Value int PRIMARY KEY NOT NULL)
INSERT INTO CountingNumbers VALUES(1)
-- Run this in parallel: start it in two tabs on SQL Server Management Studio
-- You will see each connection generating new numbers without duplicates and without timeouts
while (1=1)
BEGIN
declare @nextNumber int
-- Taking the Update lock is only relevant in case this statement is part of a larger transaction
-- to prevent deadlock
-- When executing without a transaction, the statement will itself be atomic
UPDATE CountingNumbers WITH (UPDLOCK, ROWLOCK) SET @nextNumber=Value=Value+1
print @nextNumber
END
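A SELECT does not lock exclusively. Under the default READ COMMITTED level its shared locks are released as soon as the rows are read, and even under SERIALIZABLE, where they are held until the end of the transaction, they are still shared locks, so another session can run the same SELECT. Update locks come into play only when the UPDATE runs, because only then does SQL Server know what it is going to modify. Meanwhile, anyone else can SELECT again!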
The only sure way to safely read and lock a row is:
begin transaction
--lock what I need to read
update mytable set col1=col1 where mykey=@key
--now read what I need
select @d1=col1,@d2=col2 from mytable where mykey=@key
--now do here calculations, checks, whatever I need from the row I read to decide my update
if @d1<@d2 set @d1=@d2 else set @d1=@d2 * 2 --just an example calc
--now do the actual update on what I read and the logic
update mytable set col1=@d1,col2=@d2 where mykey=@key
commit transaction
This way, any other connection running the same statements for the same data will wait at the first (fake) update statement until the previous one is done. When the lock is released, only one connection is granted the lock it requested for the update, and that connection will read committed, finalized data to make its calculations and decide whether, and what, to actually update in the second, 'real' update.
In other words, when you need to select information to decide if/how to update, you need a BEGIN/COMMIT TRANSACTION block, and you need to start with a fake update of what you need to select, before you select it (an UPDATE with an OUTPUT clause will also do).
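The "UPDATE with OUTPUT" variant could be sketched as below. mytable, mykey, col1 and col2 are the answer's placeholder names; the INT types and the sample row are assumptions. The no-op UPDATE is the answer's "fake" update: it locks the row, and its OUTPUT clause hands back the row's values in the same statement, which replaces the separate SELECT.

```sql
-- Hypothetical table using the answer's placeholder names; types and data are assumptions.
CREATE TABLE mytable (mykey INT PRIMARY KEY, col1 INT, col2 INT);
INSERT INTO mytable (mykey, col1, col2) VALUES (1, 10, 20);
GO

DECLARE @key INT = 1, @d1 INT, @d2 INT;
DECLARE @row TABLE (col1 INT, col2 INT);

BEGIN TRANSACTION;

-- Fake update: locks the row and returns its current values via OUTPUT.
UPDATE mytable
SET col1 = col1
OUTPUT INSERTED.col1, INSERTED.col2 INTO @row (col1, col2)
WHERE mykey = @key;

SELECT @d1 = col1, @d2 = col2 FROM @row;

-- Example calculation carried over from the answer.
IF @d1 < @d2 SET @d1 = @d2 ELSE SET @d1 = @d2 * 2;

-- Real update, still inside the same transaction.
UPDATE mytable SET col1 = @d1, col2 = @d2 WHERE mykey = @key;

COMMIT TRANSACTION;
```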

How do I use locking hints so that two parallel queries return non-intersecting results?

I have an SQL table Tasks with columns Id and State. I need to do the following: find any one task with state ReadyForProcessing, retrieve all its columns and set its state to Processing. Something like (pseudocode):
BEGIN TRANSACTION;
SELECT TOP 1 * FROM Tasks WHERE State = ReadyForProcessing
// here check if the result set is not empty and get the id, then
UPDATE Tasks SET State = Processing WHERE TaskId = RetrievedTaskId
END TRANSACTION
This query will be run in parallel from several database clients and the idea is that if two clients run the query in parallel they acquire different tasks and never the same task.
Looks like I need locking hints. I've read this MSDN article but don't understand anything there. How do I use locking hints for solving the above problem?
This should do the trick.
BEGIN TRANSACTION
DECLARE @taskId int
SELECT TOP (1) @taskId = TaskId FROM Tasks WITH (UPDLOCK, READPAST) WHERE State = 'ReadyForProcessing'
UPDATE Tasks SET State = 'Processing' WHERE TaskId = @taskId
COMMIT TRAN
What about something like this:
UPDATE TOP (1) Tasks
SET State = 'Processing'
OUTPUT INSERTED.RetrievedTaskId
WHERE State = 'ReadyForProcessing'
Test it out:
DECLARE @Tasks table (RetrievedTaskId int, State char(1))
INSERT @Tasks VALUES (1,'P')
INSERT @Tasks VALUES (2,'P')
INSERT @Tasks VALUES (3,'R')
INSERT @Tasks VALUES (4,'R')
UPDATE TOP (1) @Tasks
SET State = 'P'
OUTPUT INSERTED.RetrievedTaskId
WHERE State = 'R'
SELECT * FROM @Tasks
--OUTPUT:
RetrievedTaskId
---------------
3
(1 row(s) affected)
RetrievedTaskId State
--------------- -----
1 P
2 P
3 P
4 R
(4 row(s) affected)
I really, really don't like explicit locking in databases, it's a source of all sorts of crazy bugs - and the performance of the database can drop through the floor.
I'd suggest re-writing the SQL along the following lines:
begin transaction;
update tasks
set state = 'processing'
where state = 'readyForProcessing'
and ID = (select min(ID) from tasks where state = 'readyForProcessing');
commit;
This way, you don't need to lock anything - and because the update is atomic, there's no risk of two processes updating the same record.
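To see which task, if any, a caller actually claimed, an OUTPUT clause can capture the updated ID. This sketch assumes tasks has an integer ID primary key and a state column holding the strings 'readyForProcessing' and 'processing'. Because the state is re-checked in the WHERE clause, a caller that loses the race for a given ID should update zero rows and can simply run the statement again.

```sql
-- Assumes tasks(ID INT PRIMARY KEY, state VARCHAR(30)).
declare @claimed table (ID int);

begin transaction;
update tasks
set state = 'processing'
output inserted.ID into @claimed
where state = 'readyForProcessing'
and ID = (select min(ID) from tasks where state = 'readyForProcessing');
commit;

-- No row: either nothing was ready, or another caller claimed this task first.
select ID from @claimed;
```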
