Select query skips records during concurrent updates - sql-server

i have table that processed concurrently by N threads.
CREATE TABLE [dbo].[Jobs]
(
[Id] BIGINT NOT NULL CONSTRAINT [PK_Jobs] PRIMARY KEY IDENTITY,
[Data] VARBINARY(MAX) NOT NULL,
[CreationTimestamp] DATETIME2(7) NOT NULL,
[Type] INT NOT NULL,
[ModificationTimestamp] DATETIME2(7) NOT NULL,
[State] INT NOT NULL,
[RowVersion] ROWVERSION NOT NULL,
[Activity] INT NULL,
[Parent_Id] BIGINT NULL
)
GO
CREATE NONCLUSTERED INDEX [IX_Jobs_Type_State_RowVersion] ON [dbo].[Jobs]([Type], [State], [RowVersion] ASC) WHERE ([State] <> 100)
GO
CREATE NONCLUSTERED INDEX [IX_Jobs_Parent_Id_State] ON [dbo].[Jobs]([Parent_Id], [State] ASC)
GO
Job is adding to table with State=0 (New) โ€” it can be consumed by any worker in this state. When worker gets this queue item, State changed to 50 (Processing) and job becomes unavailable for other consumers (workers call [dbo].[Jobs_GetFirstByType] with arguments: Type=any, #CurrentState=0, #NewState=50).
CREATE PROCEDURE [dbo].[Jobs_GetFirstByType]
#Type INT,
#CurrentState INT,
#NewState INT
AS
BEGIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
DECLARE #JobId BIGINT;
BEGIN TRAN
SELECT TOP(1)
#JobId = Id
FROM [dbo].[Jobs] WITH (UPDLOCK, READPAST)
WHERE [Type] = #Type AND [State] = #CurrentState
ORDER BY [RowVersion];
UPDATE [dbo].[Jobs]
SET [State] = #NewState,
[ModificationTimestamp] = SYSUTCDATETIME()
OUTPUT INSERTED.[Id]
,INSERTED.[RowVersion]
,INSERTED.[Data]
,INSERTED.[Type]
,INSERTED.[State]
,INSERTED.[Activity]
WHERE [Id] = #JobId;
COMMIT TRAN
END
After processing, job State can be changed to 0 (New) again or it can be once set to 100 (Completed).
CREATE PROCEDURE [dbo].[Jobs_UpdateStatus]
#Id BIGINT,
#State INT,
#Activity INT
AS
BEGIN
UPDATE j
SET j.[State] = #State,
j.[Activity] = #Activity,
j.[ModificationTimestamp] = SYSUTCDATETIME()
OUTPUT INSERTED.[Id], INSERTED.[RowVersion]
FROM [dbo].[Jobs] j
WHERE j.[Id] = #Id;
END
Jobs has hierarchical structure, parent job gets State=100 (Completed) only when all childs are completed.
Some workers call stored procedures ([dbo].[Jobs_GetCountWithExcludedState] with #ExcludedState=100) that returns number of incompleted jobs, when it returns 0, parent job State can be set to 100 (Completed).
CREATE PROCEDURE [dbo].[Jobs_GetCountWithExcludedState]
#ParentId INT,
#ExcludedState INT
AS
BEGIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT COUNT(1)
FROM [dbo].[Jobs]
WHERE [Parent_Id] = #ParentId
AND [State] <> #ExcludedState
END
The main problem is strange behaviour of this stored procedure. Sometimes it returns 0 for parent job, but it exactly has incompleted jobs. I tryied turn on change data tracking and some debug information (including profiling) โ€” child jobs 100% doesn't have State=100 when SP return 0.
It seems that the SP skips records, that are not in 100 (Completed) state, but why it happen and how we can prevent this?
UPD:
Calling [dbo].[Jobs_GetCountWithExcludedState] starts when parent job has childs. There ัan be no situation when worker starts checking child jobs without their existence, because creating childs and setting to parent job checking activity wrapped in transaction:
using (var ts = new TransactionScope())
{
_jobManager.AddChilds(parentJob);
parentJob.State = 0;
parentJob.Activity = 30; // in this activity worker starts checking child jobs
ts.Complete();
}

It would be very disturbing if in fact your procedure Jobs_GetCountWithExcludedState was returning a count of 0 records when there were in fact committed records matching your criteria. It's a pretty simple procedure. So there are two possibilities:
The query is failing due to an issue with SQL Server or data
corruption.
There actually are no committed records matching the criteria at the time the
procedure is run.
Corruption is an unlikely, but possible cause. You can check for corruption with DBCC CHECKDB.
Most likely there really are no committed job records that have a Parent_ID equal to the #ParentId parameter and are not in a state of 100 at the time it is run.
I emphasize committed because that's what the transaction will see.
You never really explain in your question how the Parent_ID gets set on the jobs. My first thought is that maybe you are checking for unprocessed child jobs and it finds none, but then another process adds it as the Parent_ID of another incomplete job. Is this a possibility?
I see you added an update to show that when you add a child job record that the update of the parent and child records are wrapped in a transaction. This is good, but not the question I was asking. This is the scenario that I am considering as a possibility:
A Job Record is inserted and committed for the parent.
Jobs_GetFirstByType grabs the parent job.
A worker thread processes it and calls Jobs_UpdateStatus and updates it's status to 100.
Something calls Jobs_GetCountWithExcludedState with the job and returns 0.
A child job is created and attached to the completed parent job record... which makes it now incomplete again.
I'm not saying that this is what is happening... I'm just asking if it's possible and what steps are you taking to prevent it? For example, in your code above in the update to your question you are selecting a ParentJob to attach the child to outside of the transaction. Could it be that you are selecting a parent job and then it gets completed before you run the transaction that adds the child to the parent? Or maybe the last child job of a parent job completes so the worker thread checks and marks the parent complete, but some other worker thread has already selected the job to be the parent for a new child job?
There are many different scenarios that could cause the symptom you are describing. I believe that the problem is to be found in some code that you have not shared with us yet particularly about how jobs are created and the code surrounding calls to Jobs_GetCountWithExcludedState. If you can give more information I think you will be more likely to find a usable answer, otherwise the best we can do is guess all the things that could happen in the code we can't see.

Your problem is almost certainly caused by your choice of "READ COMMITTED" isolation level. The behaviour of this depends on your configuration setting for READ_COMMITTED_SNAPSHOT, but either way it allows another transaction thread to modify records that would have been seen by your SELECT, between your SELECT and your UPDATE - so you have a race condition.
Try it again with isolation level "SERIALIZABLE" and see if that fixes your problem. For more information on isolation levels, the documentation is very helpful:
https://msdn.microsoft.com/en-AU/library/ms173763.aspx

Your sql code looks fine. Therefore, the problem lies in how it is used.
Hypothesis #0
Procedure "Jobs_GetCountWithExcludedState" is called with a totally wrong ID. Because yes sometime problem are really just a little mistake. I doubt this is your case however.
Hypothesis #1
The code checking the field "Activity = 30" is doing it in "READ UNCOMMITED" isolation level. It would then call "Jobs_GetCountWithExcludedState" with an parentID that may not be really ready for it because the insertion transaction may not have ended yet or have been rollbacked.
Hypothesis #2
Procedure "Jobs_GetCountWithExcludedState" is called with an id that has no longer child. There could be many reason why this happen.
For example,
Transaction that inserted child job failed for whatever reason but this procedure was called anyway.
A single child job was deleted and was about to be replace.
etc
Hypothesis #3
Procedure "Jobs_GetCountWithExcludedState" is called before the childJob get their parentId assigned.
Conclusion
As you can see, we need more information on two things :
1. How "Jobs_GetCountWithExcludedState" is called.
2. How job are inserted. Is the parentId assigned at the insertion time or it is updated a bit later? Are they inserted in batch? Is there annexed code to it that do other stuff?
This is also where I recommend you to have a look to verify above hypothesis because the problem is most likely in the program.
Possible refactoring to invalidate all those hypothesis
Have the database tell the application which parents tasks are completed directly instead.
Just like "Jobs_GetFirstByType", there could be Jobs_GetFirstParentJobToComplete" which could return the next uncompleted parent job with completed childs if any. It could also be a view that return all of them. Either ways, usage of "Jobs_GetCountWithExcludedState" would then no longer be used thus invaliding all my hypothesis. The new procedure or view should be READ COMMITTED or above.

I have suggestion review client side and how you handle transaction and connection lifetime for each thread. Because all commands are running on client transaction.

Related

Stored procedure to constantly check table for records and process them

We have a process that is causing dirty read errors. So, I was thinking of redesigning it as a queue with a single process to go through the queue.
My idea was to create a table that various processes could insert into. Then, one process actually processes the records in the table. But, we need real-time results from the processing method. So, while I was thinking of scheduling it to run every couple of seconds, that may not be fast enough. I don't want a user waiting several seconds for a result.
Essentially, I was thinking of using an infinite loop so that one stored procedure is constantly running, and that stored procedure creates transactions to perform updates.
It could be something like:
WHILE 1=1
BEGIN
--Check for new records
IF NewRecordsExist
BEGIN
--Mark new records as "in process"
BEGIN TRANSACTION
--Process records
--If errors, Rollback
--Otherwise Commit transaction
END
END
But, I don't want SQL Server to get overburdened by this one method. Basically, I want this running in the background all the time, not eating up processor power. Unless there is a lot for it to do. Then, I want it doing its work. Is there a better design pattern for this? Are stored procedures with infinite loops thread-safe? I use this pattern all the time in Windows Processes, but this particular task is not appropriate for a Windows Process.
Update: We have transaction locking set. That is not the issue. We are trying to set aside items that are reserved for orders. So, we have a stored procedure start a transaction, check what is available, and then update the table to mark what is reserved.
The problem is that when two different users attempt to reserve the same product at the same time, the first process checks availability, finds product available, then start to reserve it. But, the second process cannot see what the first process is doing (we have transaction locking set), so it has no idea that another process is trying to reserve the items. It sees the items as still available and also goes to reserve them.
We considered application locking, but we are worried about waits, delays, etc. So, another solution we came up with is one process that handles reservations in a queue. It is first come first serve. Only one process will ever be reading the queue at a time. Different processes can add to the queue, but we no longer need to worry about two processes trying to reserve the same product at the same time. Only one process will be doing the reservations. I was hoping to do this all in SQL, but that may not be possible.
Disclaimer: This may be an option, but the recommendations for using a Service Broker to serialize requests are likely the better solutions.
If you can't use a transaction, but need your your stored procedure to return an immediate result, there are ways to safely update a record in a single statement.
DECLARE #ProductId INT = 123
DECLARE #Quantity INT = 5
UPDATE Inventory
SET Available = Available - #Quantity
WHERE ProductId = #ProductId
AND Available >= #Quantity
IF ##ROW_COUNT > 0
BEGIN
-- Success
END
Under the covers, there is still a transaction occurring accompanied by a lock, but it just covers this one statement.
If you need to update multiple records (reserve multiple product IDs) in one statement, you can use the OUTPUT clause to capture which records were successfully updated, which you can then compare with the original request.
DECLARE #Request TABLE (ProductId INT, Quantity INT)
DECLARE #Result TABLE (ProductId INT, Quantity INT)
INSERT #Request VALUES (123, 5), (456, 1)
UPDATE I
SET Available = Available - R.Quantity
OUTPUT R.ProductId, R.Quantity INTO #Result
FROM #Request R
JOIN Inventory I
ON I.ProductId = R.ProductId
AND I.Available >= R.#Quantity
IF (SELECT COUNT(*) FROM #Request) = (SELECT COUNT(*) FROM #Result)
BEGIN
-- Success
END
ELSE BEGIN
-- Partial or fail. Need to undo those that may have been updated.
END
In any case, you need to thoroughly think through your error handling and undo scenarios.
If you store reservations separate from inventory and define "Available" as "Inventory.OnHand - SUM(Reserved.Quantity)", this approach is not an option.
(Comments as to why this a bad approach will likely follow.)

Atomic DROP and SELECT ... INTO table

I would have thought that code like the following would be atomic: if DeleteMe exists before running this transaction, it should be dropped and recreated. Otherwise it should simply be created:
BEGIN TRANSACTION
IF OBJECT_ID('DeleteMe') IS NOT NULL
DROP TABLE DeleteMe
SELECT query.*
INTO DeleteMe
FROM (SELECT 1 AS Value) AS query
COMMIT TRANSACTION
However, it appears that executing this code multiple times concurrently can cause various combinations of the errors:
Cannot drop the table 'DeleteMe', because it does not exist or you do not have permission.
There is already an object named 'DeleteMe' in the database.
Here's a LINQPad Script to show what I mean.
var sql = #"
BEGIN TRANSACTION
IF OBJECT_ID('DeleteMe') IS NOT NULL
DROP TABLE DeleteMe
SELECT query.*
INTO DeleteMe
FROM (SELECT 1 AS Value) AS query
COMMIT TRANSACTION
";
await Task.WhenAll(Enumerable.Range(1, 50)
.Select(async i =>
{
using var connection = new SqlConnection(this.Connection.ConnectionString);
await connection.OpenAsync();
await connection.ExecuteAsync(sql);
}).Dump());
And an example of its output:
If I use SQL Server 2016's DROP TABLE IF EXISTS feature, that part at least appears to be atomic, but then another concurrent command can apparently still create the DeleteMe table between the time this one gets dropped and the time it gets created again.
Question: Is there any way to atomically drop, create, and populate a table, such that there's no time during which that table won't exist from the perspective of another concurrent connection?
Is there any way to atomically drop, create, and populate a table, such that there's no time during which that table won't exist from the perspective of another concurrent connection?
Sure. It's just like any transaction: you have to take an inconsistent lock on the very first statement. In your transaction two sessions can run IF OBJECT_ID('DeleteMe') IS NOT NULL at the same time. Then they both try to drop the object, and only one succeeds.
DROP TABLE IF EXISTS also performs the existence check before taking the exclusive schema lock on the object that would be necessary to drop it.
A simple and reliable way to get an exclusive lock is to use sp_getapplock.
eg
BEGIN TRANSACTION
exec sp_getapplock 'dropandcreate_DeleteMe', 'exclusive'
DROP TABLE IF EXISTS DeleteMe
SELECT query.*
INTO DeleteMe
FROM (SELECT 1 AS Value) AS query
COMMIT TRANSACTION
The biggest problem I see you encountering, is that by dropping the object you want to lock (you can lock an object, but not a 'name' of an object) you have nothing to lock.
Proposals that involve finding something else to lock only resolve half the issue; the process stops racing itself, but then any other process that references the DeleteMe table can still race with this process.
10x the process referenced in the question, using sp_getapplock, for example
Those 10 concurrent instances of the process no longer race each other
Then 1x another process that only uses SELECT * FROM DeleteMe but not sp_getapplock
That process CAN fail due to racing with the currently Active DROP/SELECT INTO process
That leads me to conclude that NOT dropping objects is better, so that the table in use remains in existence and CAN be locked...
BEGIN TRANSACTION
TRUNCATE TABLE DeleteMe
INSERT INTO DeleteMe SELECT 1 AS Value
COMMIT TRANSACTION
The TRUNCATE implicitly takes a table lock, and a secondary process that reads from this table never sees it as empty.

Query from multiple threads on a database table

I have a database table with thousands of entries. I have multiple worker threads which pick up one row at a time, does some work (takes roughly one second each). While picking up the row, each thread updates a flag on the database row (like a timestamp) so that the other threads do not pick it up. But the problem is that I end up in a scenario where multiple threads are picking up the same row.
My general question is that what general design approach should I follow here to ensure that each thread picks up unique rows and does their task independently.
Note : Multiple threads are running in parallel to hasten the processing of the database rows. So I would like to have a as small as possible critical segment or exclusive lock.
Just to give some context, below is the stored proc which picks up the rows from the table after it has updated the flag on the row. Please note that the stored proc is not compilable as I have removed unnecessary portions from it. But generally that's the structure of it.
The problem happens when multiple threads execute the stored proc in parallel. The change made by the update statement (note that the update is done after taking up a lock) in one thread is not visible to the other thread unless the transaction is committed. And as there is a SELECT statement (which takes around 50ms) between the UPDATE and the TRANSACTION COMMIT, on 20% cases the UPDATE statement in a thread picks up a row which has already been processed.
I hope I am clear enough here.
USE ['mydatabase']
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GetRequest]
AS
BEGIN
-- some variable declaration here
BEGIN TRANSACTION
-- check if there are blocking rows in the request table
-- FM: Remove records that don't qualify for operation.
-- delete operation on the table to remove rows we don't want to process
delete FROM request where somecondition = 1
-- Identify the requests to process
DECLARE #TmpTableVar table(TmpRequestId int NULL);
UPDATE TOP(1) request
WITH (ROWLOCK)
SET Lock = DateAdd(mi, 5, GETDATE())
OUTPUT INSERTED.ID INTO #TmpTableVar
FROM request tur
WHERE (Lock IS NULL OR GETDATE() > Lock) -- not locked or lock expired
AND GETDATE() > NextRetry -- next in the queue
IF(##RowCount = 0)
BEGIN
ROLLBACK TRANSACTION
RETURN
END
select #RequestID = TmpRequestId from #TmpTableVar
-- Get details about the request that has been just updated
SELECT somerows
FROM request
WHERE somecondition = 1
COMMIT TRANSACTION
END
The analog of a critical section in SQL Server is sp_getapplock, which is simple to use. Alternatively you can SELECT the row to update with (UPDLOCK,READPAST,ROWLOCK) table hints. Both of these require a multi-statement transaction to control the duration of the exclusive locking.
You need start a transaction isolation level on sql for isolation your line, but this can impact on your performance.
Look the sample:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
GO
SELECT ID, NAME, FLAG FROM SAMPLE_TABLE WHERE FLAG=0
GO
UPDATE SAMPLE_TABLE SET FLAG=1 WHERE ID=1
GO
COMMIT TRANSACTION
Finishing, not exist a better way for use isolation level. You need analyze the positive and negative point for each level isolation and test your system performance.
More information:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
http://www.besttechtools.com/articles/article/sql-server-isolation-levels-by-example
https://en.wikipedia.org/wiki/Isolation_(database_systems)

Detect if Stored Proc is being called within a transaction

Ok - so the answer is probably 'everything is within a transaction in SQLServer'. But here's my problem.
I have a stored procedure which returns data based on one input parameter. This is being called from an external service which I believe is Java-based.
I have a log-table to see how long each hit to this proc is taking. Kind of like this (simplified)...
TABLE log (ID INT PRIMARY KEY IDENTITY(1,1), Param VARCHAR(10), In DATETIME, Out DATETIME)
PROC troublesome
(#param VARCHAR(10))
BEGIN
--log the 'in' time and keep the ID
DECLARE #logid INT
INSERT log (In,Param) VALUES (GET_DATE(),#param)
SET #logid = SCOPE_IDENTITY()
SELECT <some stuff from another table based on #param>
--set the 'out' time
UPDATE log SET Out = GET_DATE() WHERE ID = #logid
END
So far so easily-criticised by SQL fascists.
Anyway - When I call this eg troublesome 'SOMEPARAM' - I get the result of the SELECT back and in the log table is a nice new row with the SOMEPARAM and the in and out timestamps.
Now - I watch this table, and even though no rows are going in - if I am to generate a row, I will see that the ID has skipped many places. I guess this is being caused by the external client code hitting it. They are getting the result of the SELECT - but I am not getting their log data.
This suggests to me they are wrapping their client call in a TRAN and rolling it back. Which is one thing that would cause this behaviour. I want to know...
If there is a way I can FORCE the write of the log even if it is contained within a transaction over which I have no control and which is subsequently rolled back (seems unlikely)
If there is a way I can determine from within this proc if it is executing within a transaction (and perhaps raise an error)
If there are other possible explanations for the ID skipping (and it's skipping alot like 1000 places in an hour. So I'm sure it's caused by the client code - as they are reporting successfully retrieving the results of the SELECT also.)
Thanks!
For the why rows are skipped part of the question, your assumption sounds plausible to me.
For how to force the write even within a transaction, I don't know.
As for how to discover whether the SP is running inside a transaction how about the following:
SELECT ##trancount
I tested this out in an sql batch and seems to be ok. Maybe it can take you a bit further.

Record locking and concurrency issues

My logical schema is as follows:
A header record can have multiple child records.
Multiple PCs can be inserting Child records, via a stored procedure that accepts details about the child record, and a value.
When a child record is inserted, a header record may need to be inserted if one doesn't exist with the specified value.
You only ever want one header record inserted for any given "value". So if two child records are inserted with the same "Value" supplied, the header should only be created once. This requires concurrency management during inserts.
Multiple PCs can be querying unprocessed header records, via a stored procedure
A header record needs to be queried if it has a specific set of child records, and the header record is unprocessed.
You only ever want one machine PC to query and process each header record. There should never be an instance where a header record and it's children should be processed by more than one PC. This requires concurrency management during selects.
So basically my header query looks like this:
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT TOP 1
*
INTO
#unprocessed
FROM
Header h WITH (READPAST, UPDLOCK)
JOIN
Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
JOIN
Child part2 ON part1.HeaderID = part2.HeaderID AND
WHERE
h.Processed = 0x0;
UPDATE
Header
SET
Processed = 0x1
WHERE
HeaderID IN (SELECT [HeaderID] FROM #unprocessed);
SELECT * FROM #unprocessed
COMMIT TRAN
So the above query ensures that concurrent queries never return the same record.
I think my problem is on the insert query. Here's what I have:
DECLARE #HeaderID INT
BEGIN TRAN
--Create header record if it doesn't exist, otherwise get it's HeaderID
MERGE INTO
Header WITH (HOLDLOCK) as target
USING
(
SELECT
[Value] = #Value, --stored procedure parameter
[HeaderID]
) as source ([Value], [HeaderID]) ON target.[Value] = source.[Value] AND
target.[Processed] = 0
WHEN MATCHED THEN
UPDATE SET
--Get the ID of the existing header
#HeaderID = target.[HeaderID],
[LastInsert] = sysdatetimeoffset()
WHEN NOT MATCHED THEN
INSERT
(
[Value]
)
VALUES
(
source.[Value]
)
--Get new or existing ID
SELECT #HeaderID = COALESCE(#HeaderID , SCOPE_IDENTITY());
--Insert child with the new or existing HeaderID
INSERT INTO
[Correlation].[CorrelationSetPart]
(
[HeaderID],
[Name]
)
VALUES
(
#HeaderID,
#Name --stored procedure parameter
);
My problem is that insertion query is often blocked by the above selection query, and I'm receiving timeouts. The selection query is called by a broker, so it can be called fairly quickly. Is there a better way to do this? Note, I have control over the database schema.
To answer the second part of the question
You only ever want one machine PC to query and process each header
record. There should never be an instance where a header record and
it's children should be processed by more than one PC
Have a look at sp_getapplock.
I use app locks within the similar scenario. I have a table of objects that must be processed, similar to your table of headers. The client application runs several threads simultaneously. Each thread executes a stored procedure that returns the next object for processing from the table of objects. So, the main task of the stored procedure is not to do the processing itself, but to return the first object in the queue that needs processing.
The code may look something like this:
CREATE PROCEDURE [dbo].[GetNextHeaderToProcess]
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
BEGIN TRANSACTION;
BEGIN TRY
DECLARE #VarHeaderID int = NULL;
DECLARE #VarLockResult int;
EXEC #VarLockResult = sp_getapplock
#Resource = 'GetNextHeaderToProcess_app_lock',
#LockMode = 'Exclusive',
#LockOwner = 'Transaction',
#LockTimeout = 60000,
#DbPrincipal = 'public';
IF #VarLockResult >= 0
BEGIN
-- Acquired the lock
-- Find the most suitable header for processing
SELECT TOP 1
#VarHeaderID = h.HeaderID
FROM
Header h
JOIN Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
JOIN Child part2 ON part1.HeaderID = part2.HeaderID
WHERE
h.Processed = 0x0
ORDER BY ....;
-- sorting is optional, but often useful
-- for example, order by some timestamp to process oldest/newest headers first
-- Mark the found Header to prevent multiple processing.
UPDATE Header
SET Processed = 2 -- in progress. Another procedure that performs the actual processing should set it to 1 when processing is complete.
WHERE HeaderID = #VarHeaderID;
-- There is no need to explicitly verify if we found anything.
-- If #VarHeaderID is null, no rows will be updated
END;
-- Return found Header, or no rows if nothing was found, or failed to acquire the lock
SELECT
#VarHeaderID AS HeaderID
WHERE
#VarHeaderID IS NOT NULL
;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
END CATCH;
END
This procedure should be called from the procedure that does actual processing. In my case the client application does the actual processing, in your case it may be another stored procedure. The idea is that we acquire the app lock for the short time here. Of course, if the actual processing is fast you can put it inside the lock, so only one header can be processed at a time.
Once the lock is acquired we look for the most suitable header to process and then set its Processed flag. Depending on the nature of your processing you can set the flag to 1 (processed) right away, or set it to some intermediary value, like 2 (in progress) and then set it to 1 (processed) later. In any case, once the flag is not zero the header will not be chosen for processing again.
These app locks are separate from normal locks that DB puts when reading and updating rows and they should not interfere with inserts. In any case, it should be better than locking the whole table as you do WITH (UPDLOCK).
Returning to the first part of the question
You only ever want one header record inserted for any given "value".
So if two child records are inserted with the same "Value" supplied,
the header should only be created once.
You can use the same approach: acquire app lock in the beginning of the inserting procedure (with some different name than the app lock used in querying procedure). Thus you would guarantee that inserts happen sequentially, not simultaneously. BTW, in practice most likely inserts can't happen simultaneously anyway. The DB would perform them sequentially internally. They will wait for each other, because each insert locks a table for update. Also, each insert is written to transaction log and all writes to transaction log are also sequential. So, just add sp_getapplock to the beginning of your inserting procedure and remove that WITH (HOLDLOCK) hint in the MERGE.
The caller of the GetNextHeaderToProcess procedure should handle correctly the situation when procedure returns no rows. This can happen if the lock acquisition timed out, or there are simply no more headers to process. Usually the processing part simply retries after a while.
Inserting procedure should check if the lock acquisition failed and retry the insert or report the problem to the caller somehow. I usually return the generated identity ID of the inserted row (the ChildID in your case) to the caller. If procedure returns 0 it means that insert failed. The caller may decide what to do.

Resources