I am maintaining a sproc in which the developer implemented his own locking mechanism, but to me it seems flawed:
CREATE PROCEDURE Sproc1
AS
Update X
set flag = lockedforprocessing
where flag = unprocessed
-- Some processing occurs here with enough time to
-- 1. table X gets inserted new rows with a flag of unprocessed
-- 2. start another instance of this Sproc 1 that executes the above update
Select from X
where flag = lockedforprocessing
-- Now the above statement reads rows that it hadn't put a lock on to start with.
I know that I can just wrap the sproc inside a transaction with an isolation level of SERIALIZABLE, but I want to avoid this.
The goal is
that multiple instances of this sproc can run at the same time and process their own "share" of the records to achieve maximum concurrency.
An execution of the sproc should not wait on a previous run that is still executing
I don't think REPEATABLE READ can help here, since it won't prevent new records with a value of "unprocessed" from being read (correct me if I'm wrong, please).
I just discovered the sp_getapplock sproc and it would resolve the bug, but it serializes execution, which is not my goal.
A solution that I see is to have each run of the proc generate its own unique GUID and assign that to the flag, but somehow I think I am simulating something that SQL Server can already solve out of the box.
Is SERIALIZABLE the only way to let each run of the sproc process its own "share" of the rows?
Regards, Tom
Assuming there is an ID field in X, a temporary table of updated Xs can help:
CREATE PROCEDURE Sproc1
AS
-- Temporary table listing all accessed Xs
declare @flagged table (ID int primary key)
-- Lock and retrieve locked records
Update X
set flag = lockedforprocessing
output Inserted.ID into @flagged
where flag = unprocessed
-- Processing
Select * from X inner join @flagged f on X.ID = f.ID
-- Clean-up
update X
set flag = processed
from X inner join @flagged f on X.ID = f.ID
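If several instances of the procedure should each claim their own share without waiting on one another, adding READPAST to the claiming UPDATE is a common refinement. A sketch only; the flag values here are placeholders taken from the question's pseudocode:

```sql
-- READPAST lets a second concurrent run skip rows that another run has
-- already locked, so each instance claims its own share of unprocessed rows.
DECLARE @flagged TABLE (ID int PRIMARY KEY);

UPDATE X WITH (ROWLOCK, READPAST)
SET flag = 'lockedforprocessing'
OUTPUT inserted.ID INTO @flagged
WHERE flag = 'unprocessed';
```

Each run then joins against its own @flagged table, exactly as in the answer above, and never touches rows claimed by a concurrent run.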
Related
I have an SP that runs at night and sometimes does not finish. The tool I automate the runs with has an option that can kill the job after some time if it has not finished, e.g. after one hour.
Anyway, I think the reason it sometimes does not finish in the maximum allotted time is that it is being blocked by another session ID. How can I query the DMVs for the text of the query and find out exactly what the blocking session is running?
I have this query and I know the blocking session ID and my session ID.
SELECT TOP 100 w.session_id, w.wait_duration_ms, w.blocking_session_id, w.wait_type, e.database_id, D.name
FROM sys.dm_os_waiting_tasks w
LEFT JOIN sys.dm_exec_sessions e ON w.session_id = e.session_id
LEFT JOIN sys.databases d ON e.database_id = d.database_id
where w.session_id = x and w.blocking_session_id = y
order by w.wait_duration_ms desc
How can I get the content (e.g. name of the SP) of the blocking session ID?
You can download and create the sp_WhoIsActive routine. It will give you clear details of what's going on right now.
For example, create a table:
DROP TABLE IF EXISTS dbo.TEST;
CREATE TABLE dbo.TEST
(
[Column] INT
);
In one session execute the code below:
BEGIN TRAN;
INSERT INTO dbo.TEST
SELECT 1
-- commit tran
Then in second:
SELECT *
FROM dbo.TEST;
In third one, execute the routine:
EXEC sp_WhoIsActive
It will give you something like the below:
You can clearly see the SELECT is blocked by the session with open transaction.
As the routine returns the activity for a particular moment, you may want to record the details in a table and analyze them later.
If you suspect that the process is blocked or is a deadlock victim, it is more appropriate to create an extended event session that collects only these events. There are many examples of how to do this and it's easy. It's good because you can analyze the deadlock graph and fix the issue more easily.
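If you'd rather stay with plain DMVs instead of sp_WhoIsActive, a sketch like the following can pull the batch text of the blocking session (the @blocker value is hypothetical; substitute the blocking_session_id you already found). sys.dm_exec_requests covers a blocker that is still running; for a sleeping session holding an open transaction, the connection's most recent batch is usually what you want:

```sql
DECLARE @blocker int = 53;  -- hypothetical: put your blocking_session_id here

-- Text of the batch the blocker is currently executing (if any)
SELECT r.session_id, t.text AS running_batch
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id = @blocker;

-- If the blocker is idle (e.g. sleeping with an open transaction),
-- fall back to the last batch it submitted on its connection
SELECT c.session_id, t.text AS most_recent_batch
FROM sys.dm_exec_connections c
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) t
WHERE c.session_id = @blocker;
```

If the blocker is executing a stored procedure, the returned text is the procedure's body, which gives you its name.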
I have a database table with thousands of entries. I have multiple worker threads which pick up one row at a time, does some work (takes roughly one second each). While picking up the row, each thread updates a flag on the database row (like a timestamp) so that the other threads do not pick it up. But the problem is that I end up in a scenario where multiple threads are picking up the same row.
My general question is: what design approach should I follow here to ensure that each thread picks up unique rows and does its task independently?
Note: multiple threads run in parallel to hasten the processing of the database rows, so I would like as small a critical section or exclusive lock as possible.
Just to give some context, below is the stored proc which picks up the rows from the table after it has updated the flag on the row. Please note that the stored proc is not compilable as I have removed unnecessary portions from it. But generally that's the structure of it.
The problem happens when multiple threads execute the stored proc in parallel. The change made by the UPDATE statement (note that the update is done after taking a lock) in one thread is not visible to the other threads until the transaction is committed. And as there is a SELECT statement (which takes around 50 ms) between the UPDATE and the COMMIT TRANSACTION, in about 20% of cases the UPDATE statement in a thread picks up a row which has already been processed.
I hope I am clear enough here.
USE [mydatabase]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GetRequest]
AS
BEGIN
-- some variable declaration here
BEGIN TRANSACTION
-- check if there are blocking rows in the request table
-- FM: Remove records that don't qualify for operation.
-- delete operation on the table to remove rows we don't want to process
delete FROM request where somecondition = 1
-- Identify the requests to process
DECLARE @TmpTableVar table(TmpRequestId int NULL);
UPDATE TOP(1) request
WITH (ROWLOCK)
SET Lock = DateAdd(mi, 5, GETDATE())
OUTPUT INSERTED.ID INTO @TmpTableVar
FROM request tur
WHERE (Lock IS NULL OR GETDATE() > Lock) -- not locked or lock expired
AND GETDATE() > NextRetry -- next in the queue
IF (@@ROWCOUNT = 0)
BEGIN
ROLLBACK TRANSACTION
RETURN
END
select @RequestID = TmpRequestId from @TmpTableVar
-- Get details about the request that has been just updated
SELECT somerows
FROM request
WHERE somecondition = 1
COMMIT TRANSACTION
END
The analog of a critical section in SQL Server is sp_getapplock, which is simple to use. Alternatively you can SELECT the row to update with (UPDLOCK,READPAST,ROWLOCK) table hints. Both of these require a multi-statement transaction to control the duration of the exclusive locking.
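As a sketch of the second option, applied to the request table from the question (the column names Lock, NextRetry, and ID are taken from the posted procedure): UPDLOCK claims the row, READPAST skips rows claimed by other threads, and the transaction scopes the lock until the work is committed.

```sql
BEGIN TRANSACTION;

DECLARE @RequestID int;

-- Claim the next available row; rows locked by other threads are skipped
SELECT TOP (1) @RequestID = ID
FROM request WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE (Lock IS NULL OR GETDATE() > Lock)  -- not locked or lock expired
  AND GETDATE() > NextRetry;              -- next in the queue

IF @RequestID IS NOT NULL
BEGIN
    UPDATE request
    SET Lock = DATEADD(mi, 5, GETDATE())
    WHERE ID = @RequestID;
    -- ... run the SELECT for the request details here ...
END;

COMMIT TRANSACTION;
```

Because READPAST skips rows another thread has already claimed with UPDLOCK, two threads can never pick the same row, and neither blocks waiting for the other.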
You need to set a transaction isolation level in SQL to isolate your rows, but this can impact your performance.
Look at this sample:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
GO
SELECT ID, NAME, FLAG FROM SAMPLE_TABLE WHERE FLAG=0
GO
UPDATE SAMPLE_TABLE SET FLAG=1 WHERE ID=1
GO
COMMIT TRANSACTION
In closing, there is no single best isolation level to use. You need to analyze the positive and negative points of each isolation level and test your system's performance.
More information:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
http://www.besttechtools.com/articles/article/sql-server-isolation-levels-by-example
https://en.wikipedia.org/wiki/Isolation_(database_systems)
I'm trying to execute the following two queries in SQL Server Management Studio (in separate query windows). I run them in the same order I typed them here.
When the isolation level is set to READ COMMITTED they execute OK, but when it's set to REPEATABLE READ the transactions deadlock.
Can you please help me understand what is deadlocked here?
First:
begin tran
declare @a int, @b int
set @a = (select col1 from Test where id = 1)
set @b = (select col1 from Test where id = 2)
waitfor delay '00:00:10'
update Test set col1 = @a + @b where id = 1
update Test set col1 = @a - @b where id = 2
commit
Second:
begin tran
update Test set col1 = -1 where id = 1
commit
UPD: The answer is already given, but following the advice I'm inserting the deadlock graph.
In both cases the selects use a shared lock and the updates an exclusive lock.
In READ COMMITTED mode, the shared lock is released immediately after the select finishes.
In REPEATABLE READ mode, the shared locks for the selects are held until the end of the transaction, to ensure that no other session can change the data that was read. A new read within the same transaction is guaranteed to yield the same results, unless the data was changed in the current session/transaction.
Originally I thought, that you executed "First" in both sessions. Then the explanation would be trivial: both sessions acquire and get a shared lock, which then blocks the exclusive lock required for the updates.
The situation with a second session doing only an update is a little more complex. An update statement will first acquire an update lock (U) while selecting the rows that must be updated; an update lock is compatible with a shared lock, so it is not blocked by it. Next, when the data is actually updated, it tries to convert the update lock to an exclusive lock, which fails because the first session is still holding the shared lock. The first session, in turn, is blocked by the second session's update lock when it reaches its own updates. Now both sessions block each other.
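One common way to break this particular deadlock (a sketch, not the only option) is to take update locks already at read time in the first transaction, so the shared-to-exclusive conversion never happens; the second session's update then simply waits instead of deadlocking:

```sql
begin tran
declare @a int, @b int
-- UPDLOCK at read time: we already hold U locks on both rows,
-- so no later lock conversion from shared to exclusive is needed
set @a = (select col1 from Test with (updlock) where id = 1)
set @b = (select col1 from Test with (updlock) where id = 2)
waitfor delay '00:00:10'
update Test set col1 = @a + @b where id = 1
update Test set col1 = @a - @b where id = 2
commit
```

U locks are mutually incompatible, so the second session's update blocks until the commit and then proceeds, rather than both sessions waiting on each other.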
EDITED: I have a table with composite key which is being used by multiple windows services deployed on multiple servers.
Columns:
UserId (int) [CompositeKey],
CheckinTimestamp (bigint) [CompositeKey],
Status (tinyint)
There will be continuous insertion in this table. I want my windows service to select top 10000 rows and do some processing while locking those 10000 rows only. I am using ROWLOCK for this using below stored procedure:
ALTER PROCEDURE LockMonitoringSession
AS
BEGIN
BEGIN TRANSACTION
SELECT TOP 10000 * INTO #TempMonitoringSession FROM dbo.MonitoringSession WITH (ROWLOCK) WHERE [Status] = 0 ORDER BY UserId
DECLARE @UserId INT
DECLARE @CheckinTimestamp BIGINT
DECLARE SessionCursor CURSOR FOR SELECT UserId, CheckinTimestamp FROM #TempMonitoringSession
OPEN SessionCursor
FETCH NEXT FROM SessionCursor INTO @UserId, @CheckinTimestamp
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE dbo.MonitoringSession SET [Status] = 1 WHERE UserId = @UserId AND CheckinTimestamp = @CheckinTimestamp
FETCH NEXT FROM SessionCursor INTO @UserId, @CheckinTimestamp
END
CLOSE SessionCursor
DEALLOCATE SessionCursor
SELECT * FROM #TempMonitoringSession
DROP TABLE #TempMonitoringSession
COMMIT TRANSACTION
END
But by doing so, dbo.MonitoringSession is locked until the stored procedure ends. I am not sure what I am doing wrong here.
The only purpose of this stored procedure is to select and update 10000 recent rows without any primary key and ensuring that whole table is not locked because multiple windows services are accessing this table.
Thanks in advance for any help.
(not an answer but too long for comment)
The purpose description should explain why/what for you are updating the whole table. Your SP updates all rows with Status = 0 to Status = 1. So when one of your services decides to run this SP, all rows become non-relevant. I mean, the logical event which causes the status change has already occurred; you just need some time to physically change it in the database. So why do you want other services to read non-relevant rows? OK, probably you need to read the rows still available for reading (not yet changed) - but again it's not clear, because you are updating the whole table.
You may use the READPAST hint to skip locked rows, and you need row locks for that.
OK, but even with processing of the top N rows, updating those N rows with one statement would be much faster than looping through that number of rows. You are doing the same job, just manually.
Check out example of combining UPDLOCK + READPAST to process same queue with parallel processes: https://www.mssqltips.com/sqlservertip/1257/processing-data-queues-in-sql-server-with-readpast-and-updlock/
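Sketched as a single set-based statement (replacing the cursor; the table and columns are taken from the question). An ordered CTE works around UPDATE TOP not supporting ORDER BY, and UPDLOCK + READPAST lets concurrent service instances skip each other's rows instead of blocking:

```sql
;WITH next10k AS
(
    SELECT TOP (10000) UserId, CheckinTimestamp, [Status]
    FROM dbo.MonitoringSession WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE [Status] = 0
    ORDER BY UserId
)
UPDATE next10k
SET [Status] = 1
OUTPUT inserted.UserId, inserted.CheckinTimestamp;
```

The OUTPUT clause returns the claimed rows to the caller in one pass, replacing both the temp table and the final SELECT in the original procedure.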
Small hint - a CURSOR declared STATIC, READONLY, FORWARD_ONLY would do the same thing as storing to a temp table. Review the STATIC option:
https://msdn.microsoft.com/en-us/library/ms180169.aspx
Another suggestion is to think of RCSI. This will avoid other services being blocked for sure, but it is a db-level option, so you'll have to test all your functionality. Most of it will work the same as before, but some scenarios need testing (concurrent transactions won't be blocked in situations where they were blocked before).
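For reference, RCSI is switched on per database; a sketch (the database name MyDb is a placeholder, and WITH ROLLBACK IMMEDIATE kicks out sessions that would otherwise block the change):

```sql
-- Database-level setting: readers see the last committed version of a row
-- instead of blocking on writers' locks. MyDb is a placeholder name.
ALTER DATABASE MyDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```

After this, plain READ COMMITTED reads no longer block on the UPDATE's row locks, which is exactly the scenario described above.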
Not clear to me:
what is the percentage of 10000 out of the total number of rows?
is there a clustered index or this is a heap?
what is actual execution plan for select and update?
what are concurrent transactions: inserts or selects?
by the way discovered similar question:
why the entire table is locked while "with (rowlock)" is used in an update statement
My logical schema is as follows:
A header record can have multiple child records.
Multiple PCs can be inserting Child records, via a stored procedure that accepts details about the child record, and a value.
When a child record is inserted, a header record may need to be inserted if one doesn't exist with the specified value.
You only ever want one header record inserted for any given "value". So if two child records are inserted with the same "Value" supplied, the header should only be created once. This requires concurrency management during inserts.
Multiple PCs can be querying unprocessed header records, via a stored procedure
A header record needs to be queried if it has a specific set of child records, and the header record is unprocessed.
You only ever want one PC to query and process each header record. There should never be an instance where a header record and its children are processed by more than one PC. This requires concurrency management during selects.
So basically my header query looks like this:
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT TOP 1
*
INTO
#unprocessed
FROM
Header h WITH (READPAST, UPDLOCK)
JOIN
Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
JOIN
Child part2 ON part1.HeaderID = part2.HeaderID
WHERE
h.Processed = 0x0;
UPDATE
Header
SET
Processed = 0x1
WHERE
HeaderID IN (SELECT [HeaderID] FROM #unprocessed);
SELECT * FROM #unprocessed
COMMIT TRAN
So the above query ensures that concurrent queries never return the same record.
I think my problem is on the insert query. Here's what I have:
DECLARE @HeaderID INT
BEGIN TRAN
--Create header record if it doesn't exist, otherwise get it's HeaderID
MERGE INTO
Header WITH (HOLDLOCK) as target
USING
(
SELECT
[Value] = @Value, --stored procedure parameter
[HeaderID]
) as source ([Value], [HeaderID]) ON target.[Value] = source.[Value] AND
target.[Processed] = 0
WHEN MATCHED THEN
UPDATE SET
--Get the ID of the existing header
@HeaderID = target.[HeaderID],
[LastInsert] = sysdatetimeoffset()
WHEN NOT MATCHED THEN
INSERT
(
[Value]
)
VALUES
(
source.[Value]
);
--Get new or existing ID
SELECT @HeaderID = COALESCE(@HeaderID, SCOPE_IDENTITY());
--Insert child with the new or existing HeaderID
INSERT INTO
[Correlation].[CorrelationSetPart]
(
[HeaderID],
[Name]
)
VALUES
(
@HeaderID,
@Name --stored procedure parameter
);
My problem is that insertion query is often blocked by the above selection query, and I'm receiving timeouts. The selection query is called by a broker, so it can be called fairly quickly. Is there a better way to do this? Note, I have control over the database schema.
To answer the second part of the question
You only ever want one machine PC to query and process each header
record. There should never be an instance where a header record and
its children are processed by more than one PC
Have a look at sp_getapplock.
I use app locks in a similar scenario. I have a table of objects that must be processed, similar to your table of headers. The client application runs several threads simultaneously. Each thread executes a stored procedure that returns the next object for processing from the table of objects. So, the main task of the stored procedure is not to do the processing itself, but to return the first object in the queue that needs processing.
The code may look something like this:
CREATE PROCEDURE [dbo].[GetNextHeaderToProcess]
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
BEGIN TRANSACTION;
BEGIN TRY
DECLARE @VarHeaderID int = NULL;
DECLARE @VarLockResult int;
EXEC @VarLockResult = sp_getapplock
@Resource = 'GetNextHeaderToProcess_app_lock',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 60000,
@DbPrincipal = 'public';
IF @VarLockResult >= 0
BEGIN
-- Acquired the lock
-- Find the most suitable header for processing
SELECT TOP 1
@VarHeaderID = h.HeaderID
FROM
Header h
JOIN Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
JOIN Child part2 ON part1.HeaderID = part2.HeaderID
WHERE
h.Processed = 0x0
ORDER BY ....;
-- sorting is optional, but often useful
-- for example, order by some timestamp to process oldest/newest headers first
-- Mark the found Header to prevent multiple processing.
UPDATE Header
SET Processed = 2 -- in progress. Another procedure that performs the actual processing should set it to 1 when processing is complete.
WHERE HeaderID = @VarHeaderID;
-- There is no need to explicitly verify if we found anything.
-- If @VarHeaderID is null, no rows will be updated
END;
-- Return found Header, or no rows if nothing was found, or failed to acquire the lock
SELECT
@VarHeaderID AS HeaderID
WHERE
@VarHeaderID IS NOT NULL
;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
END CATCH;
END
This procedure should be called from the procedure that does actual processing. In my case the client application does the actual processing, in your case it may be another stored procedure. The idea is that we acquire the app lock for the short time here. Of course, if the actual processing is fast you can put it inside the lock, so only one header can be processed at a time.
Once the lock is acquired we look for the most suitable header to process and then set its Processed flag. Depending on the nature of your processing you can set the flag to 1 (processed) right away, or set it to some intermediary value, like 2 (in progress) and then set it to 1 (processed) later. In any case, once the flag is not zero the header will not be chosen for processing again.
These app locks are separate from the normal locks that the DB takes when reading and updating rows, and they should not interfere with inserts. In any case, it should be better than locking the whole table as you do with WITH (UPDLOCK).
Returning to the first part of the question
You only ever want one header record inserted for any given "value".
So if two child records are inserted with the same "Value" supplied,
the header should only be created once.
You can use the same approach: acquire an app lock at the beginning of the inserting procedure (with a different name than the app lock used in the querying procedure). Thus you would guarantee that inserts happen sequentially, not simultaneously. BTW, in practice inserts most likely can't happen simultaneously anyway; the DB would perform them sequentially internally. They will wait for each other, because each insert locks the table for update. Also, each insert is written to the transaction log, and all writes to the transaction log are sequential. So, just add sp_getapplock to the beginning of your inserting procedure and remove the WITH (HOLDLOCK) hint from the MERGE.
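The inserting procedure could then be guarded the same way; a sketch (the resource name 'InsertHeader_app_lock' is arbitrary, just keep it different from the one used by the querying procedure):

```sql
BEGIN TRANSACTION;

DECLARE @VarLockResult int;
EXEC @VarLockResult = sp_getapplock
    @Resource = 'InsertHeader_app_lock',  -- hypothetical name
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 60000;

IF @VarLockResult >= 0
BEGIN
    -- the MERGE from the question goes here, without WITH (HOLDLOCK),
    -- followed by the child INSERT
END;

COMMIT TRANSACTION;  -- releases the app lock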
The caller of the GetNextHeaderToProcess procedure should correctly handle the situation when the procedure returns no rows. This can happen if the lock acquisition timed out, or there are simply no more headers to process. Usually the processing part simply retries after a while.
The inserting procedure should check whether the lock acquisition failed and retry the insert or report the problem to the caller somehow. I usually return the generated identity ID of the inserted row (the ChildID in your case) to the caller. If the procedure returns 0, it means the insert failed. The caller may decide what to do.