Record locking and concurrency issues - sql-server

My logical schema is as follows:
A header record can have multiple child records.
Multiple PCs can be inserting Child records, via a stored procedure that accepts details about the child record, and a value.
When a child record is inserted, a header record may need to be inserted if one doesn't exist with the specified value.
You only ever want one header record inserted for any given "value". So if two child records are inserted with the same "Value" supplied, the header should only be created once. This requires concurrency management during inserts.
Multiple PCs can be querying unprocessed header records via a stored procedure.
A header record needs to be queried if it has a specific set of child records, and the header record is unprocessed.
You only ever want one PC to query and process each header record. There should never be an instance where a header record and its children are processed by more than one PC. This requires concurrency management during selects.
So basically my header query looks like this:
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT TOP 1
h.*
INTO
#unprocessed
FROM
Header h WITH (READPAST, UPDLOCK)
JOIN
Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
JOIN
Child part2 ON part1.HeaderID = part2.HeaderID AND
WHERE
h.Processed = 0x0;
UPDATE
Header
SET
Processed = 0x1
WHERE
HeaderID IN (SELECT [HeaderID] FROM #unprocessed);
SELECT * FROM #unprocessed
COMMIT TRAN
So the above query ensures that concurrent queries never return the same record.
I think my problem is on the insert query. Here's what I have:
DECLARE @HeaderID INT
BEGIN TRAN
--Create header record if it doesn't exist, otherwise get its HeaderID
MERGE INTO
Header WITH (HOLDLOCK) as target
USING
(
SELECT
[Value] = @Value --stored procedure parameter
) as source ([Value]) ON target.[Value] = source.[Value] AND
target.[Processed] = 0
WHEN MATCHED THEN
UPDATE SET
--Get the ID of the existing header
@HeaderID = target.[HeaderID],
[LastInsert] = sysdatetimeoffset()
WHEN NOT MATCHED THEN
INSERT
(
[Value]
)
VALUES
(
source.[Value]
);
--Get new or existing ID
SELECT @HeaderID = COALESCE(@HeaderID, SCOPE_IDENTITY());
--Insert child with the new or existing HeaderID
INSERT INTO
[Correlation].[CorrelationSetPart]
(
[HeaderID],
[Name]
)
VALUES
(
@HeaderID,
@Name --stored procedure parameter
);
COMMIT TRAN
My problem is that the insertion query is often blocked by the above selection query, and I'm receiving timeouts. The selection query is called by a broker, so it can be called very frequently. Is there a better way to do this? Note, I have control over the database schema.

To answer the second part of the question
You only ever want one machine PC to query and process each header
record. There should never be an instance where a header record and
its children should be processed by more than one PC
Have a look at sp_getapplock.
I use app locks in a similar scenario. I have a table of objects that must be processed, similar to your table of headers. The client application runs several threads simultaneously. Each thread executes a stored procedure that returns the next object for processing from the table of objects. So the main task of the stored procedure is not to do the processing itself, but to return the first object in the queue that needs processing.
The code may look something like this:
CREATE PROCEDURE [dbo].[GetNextHeaderToProcess]
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
BEGIN TRANSACTION;
BEGIN TRY
DECLARE @VarHeaderID int = NULL;
DECLARE @VarLockResult int;
EXEC @VarLockResult = sp_getapplock
@Resource = 'GetNextHeaderToProcess_app_lock',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 60000,
@DbPrincipal = 'public';
IF @VarLockResult >= 0
BEGIN
-- Acquired the lock
-- Find the most suitable header for processing
SELECT TOP 1
@VarHeaderID = h.HeaderID
FROM
Header h
JOIN Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
JOIN Child part2 ON part1.HeaderID = part2.HeaderID
WHERE
h.Processed = 0x0
ORDER BY ....;
-- sorting is optional, but often useful
-- for example, order by some timestamp to process oldest/newest headers first
-- Mark the found Header to prevent multiple processing.
UPDATE Header
SET Processed = 2 -- in progress. Another procedure that performs the actual processing should set it to 1 when processing is complete.
WHERE HeaderID = @VarHeaderID;
-- There is no need to explicitly verify if we found anything.
-- If @VarHeaderID is null, no rows will be updated
END;
-- Return found Header, or no rows if nothing was found, or failed to acquire the lock
SELECT
@VarHeaderID AS HeaderID
WHERE
@VarHeaderID IS NOT NULL
;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
END CATCH;
END
This procedure should be called from the procedure that does the actual processing. In my case the client application does the actual processing; in your case it may be another stored procedure. The idea is that we hold the app lock only for a short time here. Of course, if the actual processing is fast, you can put it inside the lock, so only one header can be processed at a time.
Once the lock is acquired we look for the most suitable header to process and then set its Processed flag. Depending on the nature of your processing you can set the flag to 1 (processed) right away, or set it to some intermediary value, like 2 (in progress) and then set it to 1 (processed) later. In any case, once the flag is not zero the header will not be chosen for processing again.
These app locks are separate from the normal locks the database takes when reading and updating rows, so they should not interfere with inserts. In any case, this should be better than locking the whole table, as you do with the WITH (UPDLOCK) hint.
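To make the ordering concrete, here is a minimal sketch in Python rather than T-SQL, under loud assumptions: SQLite (the standard `sqlite3` module) stands in for SQL Server, a `threading.Lock` stands in for the `sp_getapplock` resource, the Header table is reduced to two columns, and `ORDER BY HeaderID` is a hypothetical choice for the elided ordering. It only shows the sequence that matters: take the named lock, pick one unprocessed header, mark it 2 (in progress), release. SQL Server's lock owners and timeouts are not reproduced.

```python
import sqlite3
import threading

# Hypothetical minimal schema.
# Processed: 0 = unprocessed, 2 = in progress, 1 = processed (as in the answer).
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "CREATE TABLE Header (HeaderID INTEGER PRIMARY KEY, Processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO Header (Processed) VALUES (?)", [(0,)] * 20)
conn.commit()

# Stand-in for sp_getapplock @Resource = 'GetNextHeaderToProcess_app_lock',
# @LockMode = 'Exclusive': one named lock shared by every caller.
app_lock = threading.Lock()


def get_next_header_to_process(lock_timeout=60.0):
    """Claim the next unprocessed header and return its ID, or None if
    nothing is left or the lock could not be acquired in time."""
    if not app_lock.acquire(timeout=lock_timeout):
        return None  # like @VarLockResult < 0: the caller retries later
    try:
        row = conn.execute(
            "SELECT HeaderID FROM Header WHERE Processed = 0 ORDER BY HeaderID LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Mark it "in progress" before releasing the lock, so no other
        # caller can pick the same header.
        conn.execute("UPDATE Header SET Processed = 2 WHERE HeaderID = ?", (row[0],))
        conn.commit()
        return row[0]
    finally:
        app_lock.release()


claimed = []
claimed_guard = threading.Lock()


def worker():
    # Keep claiming until the procedure reports that nothing is left.
    while (header_id := get_next_header_to_process()) is not None:
        with claimed_guard:
            claimed.append(header_id)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(claimed), len(set(claimed)))  # 20 20: every header claimed exactly once
```

Eight workers race for twenty headers, yet each header is handed out once, because the select and the flag update happen inside the same lock.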
Returning to the first part of the question
You only ever want one header record inserted for any given "value".
So if two child records are inserted with the same "Value" supplied,
the header should only be created once.
You can use the same approach: acquire an app lock at the beginning of the inserting procedure (with a different name from the app lock used in the querying procedure). This guarantees that inserts happen sequentially, not simultaneously. BTW, in practice inserts most likely can't happen fully simultaneously anyway; the DB would largely serialize them internally. They will often wait for each other because of the locks each insert takes, and each insert is written to the transaction log, where all writes are sequential. So, just add sp_getapplock to the beginning of your inserting procedure and remove the WITH (HOLDLOCK) hint in the MERGE.
The caller of the GetNextHeaderToProcess procedure should correctly handle the situation when the procedure returns no rows. This can happen if the lock acquisition timed out, or if there are simply no more headers to process. Usually the processing part simply retries after a while.
The inserting procedure should check whether the lock acquisition failed and either retry the insert or report the problem to the caller somehow. I usually return the generated identity ID of the inserted row (the ChildID in your case) to the caller. If the procedure returns 0, it means that the insert failed. The caller may decide what to do.
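The insert side can be sketched the same way. Again this is Python with SQLite standing in for SQL Server, and a second `threading.Lock` standing in for an app lock with a different resource name; the `Header`/`Child` columns are hypothetical (the question's procedure actually inserts into `[Correlation].[CorrelationSetPart]`). Under the lock, the function reuses an existing unprocessed header for the value or creates one, inserts the child, and returns the new ChildID, or 0 when the lock could not be acquired, as described above.

```python
import sqlite3
import threading

# Hypothetical minimal schema for the insert side.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.executescript("""
    CREATE TABLE Header (
        HeaderID  INTEGER PRIMARY KEY,
        Value     TEXT NOT NULL,
        Processed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE Child (
        ChildID  INTEGER PRIMARY KEY,
        HeaderID INTEGER NOT NULL REFERENCES Header (HeaderID),
        Name     TEXT NOT NULL
    );
""")

# A second named lock, distinct from the one used by the querying side
# (stand-in for sp_getapplock with a different @Resource name).
insert_lock = threading.Lock()


def insert_child(value, name, lock_timeout=10.0):
    """Insert a child, creating its unprocessed header only if none exists.
    Return the new ChildID, or 0 if the lock could not be acquired."""
    if not insert_lock.acquire(timeout=lock_timeout):
        return 0
    try:
        row = conn.execute(
            "SELECT HeaderID FROM Header WHERE Value = ? AND Processed = 0", (value,)
        ).fetchone()
        if row is not None:
            header_id = row[0]  # existing header, like WHEN MATCHED
        else:
            header_id = conn.execute(  # new header, like WHEN NOT MATCHED
                "INSERT INTO Header (Value) VALUES (?)", (value,)
            ).lastrowid
        child_id = conn.execute(
            "INSERT INTO Child (HeaderID, Name) VALUES (?, ?)", (header_id, name)
        ).lastrowid
        conn.commit()
        return child_id
    except Exception:
        conn.rollback()
        raise
    finally:
        insert_lock.release()


# 16 concurrent inserts spread over 4 distinct values.
results = []


def run(i):
    results.append(insert_child(f"value-{i % 4}", f"child-{i}"))


threads = [threading.Thread(target=run, args=(i,)) for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(conn.execute("SELECT COUNT(*) FROM Header").fetchone()[0])  # 4
```

Because the existence check and the header insert run inside the same lock, two children with the same value can never both see "no header yet".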

Related

Stored procedure to constantly check table for records and process them

We have a process that is causing dirty read errors. So, I was thinking of redesigning it as a queue with a single process to go through the queue.
My idea was to create a table that various processes could insert into. Then, one process actually processes the records in the table. But, we need real-time results from the processing method. So, while I was thinking of scheduling it to run every couple of seconds, that may not be fast enough. I don't want a user waiting several seconds for a result.
Essentially, I was thinking of using an infinite loop so that one stored procedure is constantly running, and that stored procedure creates transactions to perform updates.
It could be something like:
WHILE 1=1
BEGIN
--Check for new records
IF NewRecordsExist
BEGIN
--Mark new records as "in process"
BEGIN TRANSACTION
--Process records
--If errors, Rollback
--Otherwise Commit transaction
END
END
But I don't want SQL Server to get overburdened by this one method. Basically, I want this running in the background all the time without eating up processor power, unless there is a lot for it to do; then I want it doing its work. Is there a better design pattern for this? Are stored procedures with infinite loops thread-safe? I use this pattern all the time in Windows processes, but this particular task is not appropriate for a Windows process.
Update: We have transaction locking set. That is not the issue. We are trying to set aside items that are reserved for orders. So, we have a stored procedure start a transaction, check what is available, and then update the table to mark what is reserved.
The problem is that when two different users attempt to reserve the same product at the same time, the first process checks availability, finds the product available, then starts to reserve it. But the second process cannot see what the first process is doing (we have transaction locking set), so it has no idea that another process is trying to reserve the items. It sees the items as still available and also goes to reserve them.
We considered application locking, but we are worried about waits, delays, etc. So another solution we came up with is one process that handles reservations in a queue. It is first come, first served. Only one process will ever be reading the queue at a time. Different processes can add to the queue, but we no longer need to worry about two processes trying to reserve the same product at the same time. Only one process will be doing the reservations. I was hoping to do this all in SQL, but that may not be possible.
Disclaimer: This may be an option, but the recommendations for using a Service Broker to serialize requests are likely the better solutions.
If you can't use a transaction, but need your stored procedure to return an immediate result, there are ways to safely update a record in a single statement.
DECLARE @ProductId INT = 123
DECLARE @Quantity INT = 5
UPDATE Inventory
SET Available = Available - @Quantity
WHERE ProductId = @ProductId
AND Available >= @Quantity
IF @@ROWCOUNT > 0
BEGIN
-- Success
END
Under the covers, there is still a transaction occurring accompanied by a lock, but it just covers this one statement.
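The key point is that the availability check lives in the WHERE clause of the same UPDATE that decrements, so there is no gap between "check" and "reserve". A small Python sketch of the same statement, with SQLite standing in for SQL Server, a hypothetical one-row `Inventory` table, and `cursor.rowcount` playing the role of `@@ROWCOUNT`:

```python
import sqlite3

# Hypothetical Inventory table: product 123 has 8 units available.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory (ProductId INTEGER PRIMARY KEY, Available INTEGER NOT NULL)")
conn.execute("INSERT INTO Inventory VALUES (123, 8)")
conn.commit()


def reserve(product_id, quantity):
    """Check and decrement in ONE statement; True if the row was updated."""
    cur = conn.execute(
        "UPDATE Inventory SET Available = Available - ? "
        "WHERE ProductId = ? AND Available >= ?",
        (quantity, product_id, quantity),
    )
    conn.commit()
    return cur.rowcount > 0  # same role as @@ROWCOUNT > 0


first = reserve(123, 5)   # 8 >= 5, so Available becomes 3
second = reserve(123, 5)  # 3 >= 5 is false: no row matches, nothing changes
print(first, second)      # True False
```

Two callers asking for 5 of the same 8 units cannot both succeed, because the second UPDATE simply finds no qualifying row.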
If you need to update multiple records (reserve multiple product IDs) in one statement, you can use the OUTPUT clause to capture which records were successfully updated, which you can then compare with the original request.
DECLARE @Request TABLE (ProductId INT, Quantity INT)
DECLARE @Result TABLE (ProductId INT, Quantity INT)
INSERT @Request VALUES (123, 5), (456, 1)
UPDATE I
SET Available = Available - R.Quantity
OUTPUT R.ProductId, R.Quantity INTO @Result
FROM @Request R
JOIN Inventory I
ON I.ProductId = R.ProductId
AND I.Available >= R.Quantity
IF (SELECT COUNT(*) FROM @Request) = (SELECT COUNT(*) FROM @Result)
BEGIN
-- Success
END
ELSE BEGIN
-- Partial or fail. Need to undo those that may have been updated.
END
In any case, you need to thoroughly think through your error handling and undo scenarios.
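One possible shape of that undo step is sketched below in Python with SQLite as a stand-in. It is a line-by-line variant rather than the single joined UPDATE with OUTPUT: each request line is reserved with its own conditional statement, the successful lines are collected (the role of the result table variable), and if not every line succeeded, the collected lines are given back. The table contents are hypothetical. Note the caveats this makes visible: other sessions can briefly see the reduced stock before the undo, and a failure between the two phases leaves it unreversed.

```python
import sqlite3

# Hypothetical Inventory rows: product 123 has stock, product 456 has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory (ProductId INTEGER PRIMARY KEY, Available INTEGER NOT NULL)")
conn.executemany("INSERT INTO Inventory VALUES (?, ?)", [(123, 10), (456, 0)])
conn.commit()


def available(product_id):
    return conn.execute(
        "SELECT Available FROM Inventory WHERE ProductId = ?", (product_id,)
    ).fetchone()[0]


def reserve_all(request):
    """request: list of (product_id, quantity). Return True if every line was
    reserved; otherwise undo the lines that did succeed and return False."""
    succeeded = []  # plays the role of the result table variable
    for product_id, quantity in request:
        cur = conn.execute(
            "UPDATE Inventory SET Available = Available - ? "
            "WHERE ProductId = ? AND Available >= ?",
            (quantity, product_id, quantity),
        )
        conn.commit()  # each line is a separate, atomic statement
        if cur.rowcount > 0:
            succeeded.append((product_id, quantity))
    if len(succeeded) == len(request):
        return True
    # Partial or total failure: compensate by giving back what was taken.
    for product_id, quantity in succeeded:
        conn.execute(
            "UPDATE Inventory SET Available = Available + ? WHERE ProductId = ?",
            (quantity, product_id),
        )
    conn.commit()
    return False


print(reserve_all([(123, 5), (456, 1)]), available(123), available(456))  # False 10 0
print(reserve_all([(123, 5)]), available(123))  # True 5
```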
If you store reservations separate from inventory and define "Available" as "Inventory.OnHand - SUM(Reserved.Quantity)", this approach is not an option.
(Comments as to why this a bad approach will likely follow.)

Atomic DROP and SELECT ... INTO table

I would have thought that code like the following would be atomic: if DeleteMe exists before running this transaction, it should be dropped and recreated. Otherwise it should simply be created:
BEGIN TRANSACTION
IF OBJECT_ID('DeleteMe') IS NOT NULL
DROP TABLE DeleteMe
SELECT query.*
INTO DeleteMe
FROM (SELECT 1 AS Value) AS query
COMMIT TRANSACTION
However, it appears that executing this code multiple times concurrently can cause various combinations of the errors:
Cannot drop the table 'DeleteMe', because it does not exist or you do not have permission.
There is already an object named 'DeleteMe' in the database.
Here's a LINQPad Script to show what I mean.
var sql = @"
BEGIN TRANSACTION
IF OBJECT_ID('DeleteMe') IS NOT NULL
DROP TABLE DeleteMe
SELECT query.*
INTO DeleteMe
FROM (SELECT 1 AS Value) AS query
COMMIT TRANSACTION
";
await Task.WhenAll(Enumerable.Range(1, 50)
.Select(async i =>
{
using var connection = new SqlConnection(this.Connection.ConnectionString);
await connection.OpenAsync();
await connection.ExecuteAsync(sql);
}).Dump());
And an example of its output:
If I use SQL Server 2016's DROP TABLE IF EXISTS feature, that part at least appears to be atomic, but then another concurrent command can apparently still create the DeleteMe table between the time this one gets dropped and the time it gets created again.
Question: Is there any way to atomically drop, create, and populate a table, such that there's no time during which that table won't exist from the perspective of another concurrent connection?
Is there any way to atomically drop, create, and populate a table, such that there's no time during which that table won't exist from the perspective of another concurrent connection?
Sure. It's just like any transaction: you have to take an exclusive lock on the very first statement. In your transaction, two sessions can run IF OBJECT_ID('DeleteMe') IS NOT NULL at the same time. Then they both try to drop the object, and only one succeeds.
DROP TABLE IF EXISTS also performs the existence check before taking the exclusive schema lock on the object that would be necessary to drop it.
A simple and reliable way to get an exclusive lock is to use sp_getapplock.
eg
BEGIN TRANSACTION
exec sp_getapplock 'dropandcreate_DeleteMe', 'exclusive'
DROP TABLE IF EXISTS DeleteMe
SELECT query.*
INTO DeleteMe
FROM (SELECT 1 AS Value) AS query
COMMIT TRANSACTION
The biggest problem I see you encountering is that by dropping the object you want to lock (you can lock an object, but not the 'name' of an object), you have nothing left to lock.
Proposals that involve finding something else to lock only resolve half the issue; the process stops racing itself, but then any other process that references the DeleteMe table can still race with this process.
10x the process referenced in the question, using sp_getapplock, for example
Those 10 concurrent instances of the process no longer race each other
Then 1x another process that only uses SELECT * FROM DeleteMe but not sp_getapplock
That process CAN fail due to racing with the currently active DROP/SELECT INTO process
That leads me to conclude that NOT dropping objects is better, so that the table in use remains in existence and CAN be locked...
BEGIN TRANSACTION
TRUNCATE TABLE DeleteMe
INSERT INTO DeleteMe SELECT 1 AS Value
COMMIT TRANSACTION
The TRUNCATE implicitly takes a table lock, and a secondary process that reads from this table never sees it as empty.

Query from multiple threads on a database table

I have a database table with thousands of entries. I have multiple worker threads which pick up one row at a time and do some work (roughly one second per row). While picking up a row, each thread updates a flag on the database row (like a timestamp) so that the other threads do not pick it up. But the problem is that I end up in a scenario where multiple threads pick up the same row.
My general question is: what design approach should I follow to ensure that each thread picks up unique rows and does its task independently?
Note: Multiple threads run in parallel to speed up the processing of the database rows, so I would like the critical section or exclusive lock to be as small as possible.
Just to give some context, below is the stored proc which picks up the rows from the table after it has updated the flag on the row. Please note that the stored proc is not compilable as I have removed unnecessary portions from it. But generally that's the structure of it.
The problem happens when multiple threads execute the stored proc in parallel. The change made by the update statement (note that the update is done after taking a lock) in one thread is not visible to the other thread until the transaction is committed. And as there is a SELECT statement (which takes around 50 ms) between the UPDATE and the COMMIT TRANSACTION, in 20% of cases the UPDATE statement in a thread picks up a row which has already been processed.
I hope I am clear enough here.
USE [mydatabase]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GetRequest]
AS
BEGIN
-- some variable declaration here
BEGIN TRANSACTION
-- check if there are blocking rows in the request table
-- FM: Remove records that don't qualify for operation.
-- delete operation on the table to remove rows we don't want to process
delete FROM request where somecondition = 1
-- Identify the requests to process
DECLARE @TmpTableVar table(TmpRequestId int NULL);
UPDATE TOP(1) request
WITH (ROWLOCK)
SET Lock = DateAdd(mi, 5, GETDATE())
OUTPUT INSERTED.ID INTO @TmpTableVar
FROM request tur
WHERE (Lock IS NULL OR GETDATE() > Lock) -- not locked or lock expired
AND GETDATE() > NextRetry -- next in the queue
IF (@@ROWCOUNT = 0)
BEGIN
ROLLBACK TRANSACTION
RETURN
END
select @RequestID = TmpRequestId from @TmpTableVar
-- Get details about the request that has been just updated
SELECT somerows
FROM request
WHERE somecondition = 1
COMMIT TRANSACTION
END
The analog of a critical section in SQL Server is sp_getapplock, which is simple to use. Alternatively you can SELECT the row to update with (UPDLOCK,READPAST,ROWLOCK) table hints. Both of these require a multi-statement transaction to control the duration of the exclusive locking.
You need to set a transaction isolation level in SQL Server to isolate your row, but this can impact your performance.
Look at this sample:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
GO
SELECT ID, NAME, FLAG FROM SAMPLE_TABLE WHERE FLAG=0
GO
UPDATE SAMPLE_TABLE SET FLAG=1 WHERE ID=1
GO
COMMIT TRANSACTION
In short, there is no single best isolation level. You need to analyze the pros and cons of each isolation level and test your system's performance.
More information:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
http://www.besttechtools.com/articles/article/sql-server-isolation-levels-by-example
https://en.wikipedia.org/wiki/Isolation_(database_systems)

Read Committed vs Repeatable Read Example

I'm trying to execute the following two queries in SQL Server Management Studio (in separate query windows). I run them in the same order I typed them here.
When the isolation level is set to READ COMMITTED they execute fine, but when it's set to REPEATABLE READ the transactions are deadlocked.
Can you please help me understand what is deadlocked here?
First:
begin tran
declare @a int, @b int
set @a = (select col1 from Test where id = 1)
set @b = (select col1 from Test where id = 2)
waitfor delay '00:00:10'
update Test set col1 = @a + @b where id = 1
update Test set col1 = @a - @b where id = 2
commit
Second:
begin tran
update Test set col1 = -1 where id = 1
commit
UPD: The answer has already been given, but following the advice I'm inserting the deadlock graph.
In both cases the selects use a shared lock and the updates an exclusive lock.
In READ COMMITTED mode, the shared lock is released immediately after the select finishes.
In REPEATABLE READ mode, the shared locks for the selects are held until the end of the transaction, to ensure that no other session can change the data that was read. A new read within the same transaction is guaranteed to yield the same results, unless the data was changed in the current session/transaction.
Originally I thought that you had executed "First" in both sessions. Then the explanation would be trivial: both sessions acquire a shared lock, which then blocks the exclusive lock required for the updates.
The situation with a second session doing only an update is a little more complex. An update statement will first acquire an update lock (UPDLOCK) for selecting the rows that must be updated, which is probably similar to a shared lock, but at least not blocked by a shared lock. Next, when the data is actually updated, it tries to convert the update lock to an exclusive lock, which fails, because the first session is still holding the shared lock. Meanwhile, when the first session's WAITFOR ends, its own UPDATE needs an exclusive lock on the row with id = 1, and that request is blocked by the second session's update lock. Now both sessions block each other.
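The lock sequence above can be checked mechanically. The following Python sketch is a simplified model, not SQL Server itself: it tracks only row-level S, U and X locks with SQL Server's compatibility rules for those three modes (intent locks, key-range locks and escalation are ignored), replays the two scripts from the question, and looks for a cycle in the resulting wait-for graph, which is the condition the deadlock monitor detects. The session names "First" and "Second" follow the question.

```python
# Simplified row-level lock compatibility for the three modes discussed above.
# Key: (mode being requested, mode already granted to ANOTHER session).
COMPATIBLE = {
    ("S", "S"): True,  ("S", "U"): True,  ("S", "X"): False,
    ("U", "S"): True,  ("U", "U"): False, ("U", "X"): False,
    ("X", "S"): False, ("X", "U"): False, ("X", "X"): False,
}


def blockers(granted, session, row, mode):
    """Other sessions whose granted lock on `row` is incompatible with `mode`."""
    return {
        other
        for (other, r), held in granted.items()
        if r == row and other != session and not COMPATIBLE[(mode, held)]
    }


def has_cycle(waits_for):
    """Depth-first search for a cycle in a wait-for graph {session: {sessions}}."""
    visiting, done = set(), set()

    def visit(node):
        if node in visiting:
            return True  # back edge: the node ends up waiting on itself
        if node in done:
            return False
        visiting.add(node)
        if any(visit(n) for n in waits_for.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in waits_for)


def simulate(isolation):
    """Replay 'First' (two SELECTs, WAITFOR, UPDATE) against 'Second' (UPDATE id = 1)."""
    granted = {}  # (session, row id) -> mode
    if isolation == "REPEATABLE READ":
        # Shared locks from First's SELECTs are kept until COMMIT.
        granted[("First", 1)] = "S"
        granted[("First", 2)] = "S"
    # Under READ COMMITTED those S locks were already released after each SELECT.

    # During the WAITFOR, Second's UPDATE takes U on row 1 (compatible with S) ...
    granted[("Second", 1)] = "U"
    # ... and then needs X on row 1 to write it.
    second_waits = blockers(granted, "Second", 1, "X")
    if not second_waits:
        # Nothing blocks Second: it writes, commits and releases its lock.
        del granted[("Second", 1)]

    # When the WAITFOR ends, First's UPDATE needs X on row 1.
    first_waits = blockers(granted, "First", 1, "X")
    waits_for = {"Second": second_waits, "First": first_waits}
    return waits_for, has_cycle(waits_for)


print(simulate("REPEATABLE READ"))  # ({'Second': {'First'}, 'First': {'Second'}}, True)
print(simulate("READ COMMITTED"))   # ({'Second': set(), 'First': set()}, False)
```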

Let each run of a sproc process its share of rows

I am maintaining a sproc where the developer has implemented his own locking mechanism but to me it seemed flawed:
CREATE PROCEDURE Sproc 1
AS
Update X
set flag = lockedforprocessing
where flag = unprocessed
-- Some processing occurs here with enough time to
-- 1. table X gets inserted new rows with a flag of unprocessed
-- 2. start another instance of this Sproc 1 that executes the above update
Select from X
where flag = lockedforprocessing
-- Now the above statement reads rows that it hadn't put a lock on to start with.
I know that I can just wrap the sproc inside a transaction with isolation level SERIALIZABLE, but I want to avoid this.
The goal is
that multiple instances of this sproc can run at the same time and process their own "share" of the records to achieve maximum concurrency.
An execution of the sproc should not wait on a previous run that is still executing
I don't think REPEATABLE READ can help here since it won't prevent the new records with a value of "unprocessed" being read (correct me if I'm wrong please).
I just discovered the sp_getapplock sproc; it would resolve the bug, but it would serialize execution, which is not my goal.
A solution that I see is to have each run of the proc generate its own unique GUID and assign that to the flag but somehow I am thinking I am simulating something that SQL Server already can solve out of the box.
Is the only way to let each run of a sproc process its "share" of the rows to run it under SERIALIZABLE?
Regards, Tom
Assuming there is an ID field in X, a temporary table of updated Xs can help:
CREATE PROCEDURE Sproc 1
AS
-- Temporary table listing all accessed Xs
declare @flagged table (ID int primary key)
-- Lock and retrieve locked records
Update X
set flag = lockedforprocessing
output Inserted.ID into @flagged
where flag = unprocessed
-- Processing
Select from X inner join @flagged f on x.ID = f.ID
-- Clean-up
update X
set flag = processed
from x inner join @flagged f on x.ID = f.ID
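The same isolation can also be had with the per-run GUID idea from the question, which is handy where OUTPUT ... INTO is not available. A minimal Python sketch, with SQLite standing in for SQL Server; the `Flag` values and the `ClaimedBy` column are hypothetical. Each run stamps the rows it claims with its own token in one UPDATE, and later reads only by that token, so rows claimed by a later run (inserted in the meantime) are not picked up, unlike the original `where flag = lockedforprocessing` read.

```python
import sqlite3
import uuid

# Hypothetical table X: Flag is 'unprocessed', 'locked' or 'processed';
# ClaimedBy records which run claimed the row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (ID INTEGER PRIMARY KEY, Flag TEXT NOT NULL, ClaimedBy TEXT)")
conn.executemany("INSERT INTO X (Flag) VALUES (?)", [("unprocessed",)] * 5)
conn.commit()


def claim_share():
    """Claim all currently unprocessed rows for a new run; return its token."""
    token = str(uuid.uuid4())
    # One atomic statement claims the rows and stamps them with this run's token.
    conn.execute(
        "UPDATE X SET Flag = 'locked', ClaimedBy = ? WHERE Flag = 'unprocessed'",
        (token,),
    )
    conn.commit()
    return token


def rows_for(token):
    return [r[0] for r in conn.execute(
        "SELECT ID FROM X WHERE ClaimedBy = ? ORDER BY ID", (token,))]


def finish(token):
    conn.execute("UPDATE X SET Flag = 'processed' WHERE ClaimedBy = ?", (token,))
    conn.commit()


run1 = claim_share()
# Meanwhile new rows arrive and a second run claims them.
conn.executemany("INSERT INTO X (Flag) VALUES (?)", [("unprocessed",)] * 3)
conn.commit()
run2 = claim_share()

# The flawed read from the question sees both runs' rows.
flawed = [r[0] for r in conn.execute("SELECT ID FROM X WHERE Flag = 'locked' ORDER BY ID")]
print(flawed)          # [1, 2, 3, 4, 5, 6, 7, 8]
print(rows_for(run1))  # [1, 2, 3, 4, 5]
print(rows_for(run2))  # [6, 7, 8]
finish(run1)
finish(run2)
```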
