tsql: stored procedure and rowlock - sql-server

I have a concurrency problem in a multi-user system and a stored procedure as shown below:
CREATE PROCEDURE dbo.GetWorkitemID
AS
DECLARE @workitem int;
UPDATE workqueue
SET status = 'InProcess', @workitem = workitemid
WHERE workitemid = (SELECT TOP 1 workitemid
FROM workqueue WITH (ROWLOCK,UPDLOCK,READPAST)
WHERE status = 'New' ORDER BY workitemid)
SELECT @workitem
GO
It updates a single record's status from 'New' to 'InProcess' and returns the record's ID.
The questions are as follows. First: should I run this stored procedure inside a transaction scope for ROWLOCK, UPDLOCK, etc. to take effect? Is that required? Second: is it really thread safe, and does it guarantee uniqueness?

This is the correct way to run "table as a queue"
See this please: SQL Server Process Queue Race Condition
You don't need a transaction
This is both thread and concurrency safe
Edit:
As a counterexample to Filip De Vos's answer:
Note the use of a covering index and UPDLOCK (not XLOCK) with the same query
DROP table locktest
create table locktest (id int, workitem int, status varchar(50))
insert into locktest (id, workitem) values (1, 1), (2,2), (3,3)
create index ix_test2 on locktest(workitem) INCLUDE (id, status)
--When I run this on one connection
begin tran
select top (1) id, status
from locktest with (rowlock, updlock, readpast)
ORDER BY workitem
... I get expected results in another connection with the same query

Should I use this stored procedure in a transaction scope...
Every DML statement in SQL runs in the context of a transaction, whether you explicitly open one or not. By default, when executing each statement, SQL server will open a transaction if one is not open, execute the statement, and then commit the transaction (if no error occurred) or roll it back.
Subject to the caveat mentioned by @Filip (that there's still no guarantee on the order in which items will be selected), it will be safe and each invocation will return a different row, if one is available and not locked.

It is not reliable. Because the locking hints you gave are just that, locking hints. Additionally, depending on the way the table is indexed the results might be very different.
For example:
create table test (id int, workitem int, status varchar(50))
insert into test (id, workitem) values (1, 1), (2,2), (3,3)
create index ix_test on test(workitem)
When I run this on one connection
begin tran
select * from test with (rowlock, xlock, holdlock) where workitem = 1
And I run this on a second connection:
select top (1) * from test with (rowlock, readpast) order by workitem
This returns:
workitem
--------
3
The same happens if I do:
update top (1) test with (rowlock, readpast)
set status = 'Proc'
output inserted.workitem
So you can use this to pick up work items concurrently, but it is not a reliable way to get in-order concurrent processing.

Related

SQL Server Custom Identity Column

I want to generate a custom identity column related to type of product.
Can this query guarantee the order of the identity values and handle concurrency?
This is a sample query:
BEGIN TRAN
INSERT INTO TBLKEY
VALUES((SELECT 'A-' + CAST(MAX(CAST(ID AS INT)) + 1 AS NVARCHAR) FROM TBLKEY),'EHSAN')
COMMIT
Try this:
BEGIN TRAN
INSERT INTO TBLKEY
VALUES((SELECT CAST(MAX(ID) + 1 AS NVARCHAR) FROM TBLKEY WITH (UPDLOCK)),'EHSAN')
COMMIT
When selecting the max ID you acquire a U lock on the row. That U lock is incompatible with the U lock that another session running the same query at the same time would try to acquire, so only one such query executes at a given time. The ids will be in order and continuous without any gaps between them.
A better solution would be to create an extra table dedicated only for storing the current or next id and use it instead of the maximum.
You can understand the difference by doing the following:
Prepare a table
CREATE TABLE T(id int not null PRIMARY KEY CLUSTERED)
INSERT INTO T VALUES(1)
And then run the following query in two different sessions, one after another, less than 10 seconds apart
BEGIN TRAN
DECLARE @idv int
SELECT @idv = max (id) FROM T
WAITFOR DELAY '0:0:10'
INSERT INTO T VALUES(@idv+1)
COMMIT
Wait for a while until both queries complete. Observe that one of them succeeded and the other failed with a primary key violation, because both sessions read the same maximum id.
Now do the same with the following query
BEGIN TRAN
DECLARE @idv int
SELECT @idv = max (id) FROM T WITH (UPDLOCK)
WAITFOR DELAY '0:0:5'
INSERT INTO T VALUES(@idv+1)
COMMIT
View the contents of T
Cleanup the T Table with DROP TABLE T
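The dedicated counter table suggested earlier in this answer could be sketched roughly as follows. This is only an illustration under assumptions: the table name dbo.KeyCounter and its columns Prefix and NextId are hypothetical, and TBLKEY is assumed to take the generated key and a name, as in the question.

```sql
-- Hypothetical counter table: one row per key prefix.
CREATE TABLE dbo.KeyCounter
(
    Prefix varchar(10) NOT NULL PRIMARY KEY,
    NextId int         NOT NULL
);
INSERT INTO dbo.KeyCounter (Prefix, NextId) VALUES ('A', 1);
GO

DECLARE @id int;

BEGIN TRAN;
    -- Read and increment in a single statement.
    -- "@id = NextId" captures the pre-update value of the column.
    -- The X lock on the counter row is held until COMMIT, so a second
    -- session waits here instead of reading the same value.
    UPDATE dbo.KeyCounter
    SET @id = NextId,
        NextId = NextId + 1
    WHERE Prefix = 'A';

    INSERT INTO TBLKEY
    VALUES ('A-' + CAST(@id AS nvarchar(10)), 'EHSAN');
COMMIT;
```

Because the increment and the insert share one transaction, a rollback also undoes the increment. The cost is that inserts for the same prefix are serialized.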
This would be a bad thing to do as there is no way to guarantee that two queries running at the same time wouldn't get MAX(ID) as being the same value.
If you used a standard identity column you could also have a computed column which uses that or just return the key when you return the data.
Ed
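A minimal sketch of the identity-plus-computed-column idea from this answer, assuming a hypothetical table dbo.ProductKey with made-up column names (Seq, Name, Code):

```sql
-- Hypothetical table: the engine allocates Seq, and Code is derived from it.
CREATE TABLE dbo.ProductKey
(
    Seq  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name nvarchar(50)      NOT NULL,
    -- Computed column: formats the identity value as 'A-<n>'
    Code AS ('A-' + CAST(Seq AS varchar(10)))
);

-- No key expression in the INSERT, so no MAX(ID) race between sessions.
INSERT INTO dbo.ProductKey (Name) VALUES (N'EHSAN');

SELECT Seq, Code, Name
FROM dbo.ProductKey;
```

Note that identity values can have gaps (for example after a rolled-back insert), so this fits when the key must be unique and increasing but not necessarily gap-free.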

SQL Server race condition issue with range lock

I'm implementing a queue in SQL Server (please no discussions about this) and am running into a race condition issue. The T-SQL of interest is the following:
set transaction isolation level serializable
begin tran
declare @RecordId int
declare @CurrentTS datetime2
set @CurrentTS=CURRENT_TIMESTAMP
select top 1 @RecordId=Id
from QueuedImportJobs with (updlock)
where Status=@Status and (LeaseTimeout is null or @CurrentTS>LeaseTimeout)
order by Id asc
if @@ROWCOUNT> 0
begin
update QueuedImportJobs
set LeaseTimeout = DATEADD(mi,5,@CurrentTS), LeaseTicket=newid()
where Id=@RecordId
select * from QueuedImportJobs where Id = @RecordId
end
commit tran
RecordId is the PK and there is also an index on Status,LeaseTimeout.
What I'm basically doing is select a record of which the lease happens to be expired, while simultaneously updating the lease time with 5 minutes and setting a new lease ticket.
So the problem is that I'm getting deadlocks when I run this code in parallel using a couple of threads. I've debugged it to the point where I found that the update statement sometimes gets executed twice for the same record. Now, I was under the impression that the with (updlock) should prevent this (it also happens with xlock, by the way, but not with tablockx). So it actually looks like there is a RangeS-U and a RangeX-X lock on the same range of records, which ought to be impossible.
So what am I missing? I'm thinking it might have something to do with the top 1 clause, or that SQL Server does not know that where Id=@RecordId is actually in the locked range?
Deadlock graph:
Table schema (simplified):
It looks like the locks are on different HoBTs (heaps or B-trees). Are there multiple indexes on the table?
If so, the select with (updlock) might only take an update lock on one index.
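To check this, the hobt IDs that appear in the deadlock graph can be matched against the table's indexes. A small sketch using the catalog views, assuming the table lives in the dbo schema:

```sql
-- List every index (and heap/B-tree ID) on the queue table so the
-- hobt IDs in the deadlock graph can be matched to a specific index.
SELECT i.name       AS index_name,
       i.index_id,
       i.type_desc,
       p.partition_number,
       p.hobt_id
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON p.object_id = i.object_id
   AND p.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.QueuedImportJobs')
ORDER BY i.index_id;
```

If the two locks in the graph map to different index_id values, the select and the update are locking different indexes.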
Why not just:
DECLARE @t TABLE(Id INT);
UPDATE TOP (1) dbo.QueuedImportJobs
SET LeaseTimeout = DATEADD(MINUTE, 5, CURRENT_TIMESTAMP)
OUTPUT inserted.Id INTO @t
WHERE Status = @Status
AND COALESCE(LeaseTimeout, '19000101') < CURRENT_TIMESTAMP;
SELECT <cols> FROM dbo.QueuedImportJobs
WHERE Id IN (SELECT Id FROM @t);
As an aside you might want to have ORDER BY to ensure the selected row is the first one on the queue according to the desired index order. If the index on Id is clustered, this is probably how it will work anyway, but there is no guarantee unless you say so. This will require a slight re-structuring of the query, since you can't apply ORDER BY (or an index hint) directly on an UPDATE, e.g.:
WITH x AS
(
SELECT TOP (1) Id, LeaseTimeout
FROM dbo.QueuedImportJobs
WHERE Status = @Status
AND COALESCE(LeaseTimeout, '19000101') < CURRENT_TIMESTAMP
ORDER BY Id
)
UPDATE x
SET LeaseTimeout = DATEADD(MINUTE, 5, CURRENT_TIMESTAMP)
OUTPUT inserted.id INTO @t;

T-SQL Is a sub query for an Update restriction Atomic with the update?

I've got a simple queue implementation in MS SQL Server 2008 R2. Here's the essence of the queue:
CREATE TABLE ToBeProcessed
(
Id BIGINT IDENTITY(1,1) PRIMARY KEY NOT NULL,
[Priority] INT DEFAULT(100) NOT NULL,
IsBeingProcessed BIT default (0) NOT NULL,
SomeData nvarchar(MAX) NOT null
)
I want to atomically select the top n rows ordered by the priority and the id where IsBeingProcessed is false and update those rows to say they are being processed. I thought I'd use a combination of Update, Top, Output and Order By but unfortunately you can't use top and order by in an Update statement.
So I've made an in clause to restrict the update and that sub query does the order by (see below). My question is, is this whole statement atomic, or do I need to wrap it in a transaction?
DECLARE @numberToProcess INT = 2
CREATE TABLE #IdsToProcess
(
Id BIGINT NOT null
)
UPDATE
ToBeProcessed
SET
ToBeProcessed.IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
WHERE
ToBeProcessed.Id IN
(
SELECT TOP(@numberToProcess)
ToBeProcessed.Id
FROM
ToBeProcessed
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC)
SELECT
*
FROM
#IdsToProcess
DROP TABLE #IdsToProcess
Here's some sql to insert some dummy rows:
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
If I understand the motivation for the question you want to avoid the possibility that two concurrent transactions could both execute the sub query to get the top N rows to process then proceed to update the same rows?
In that case I'd use this approach.
;WITH cte As
(
SELECT TOP(@numberToProcess)
*
FROM
ToBeProcessed WITH(UPDLOCK,ROWLOCK,READPAST)
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC
)
UPDATE
cte
SET
IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
I was a bit uncertain earlier whether SQL Server would take U locks when processing your version with the sub query thus blocking two concurrent transactions from reading the same TOP N rows. This does not appear to be the case.
Test Table
CREATE TABLE JobsToProcess
(
priority INT IDENTITY(1,1),
isprocessed BIT ,
number INT
)
INSERT INTO JobsToProcess
SELECT TOP (1000000) 0,0
FROM master..spt_values v1, master..spt_values v2
Test Script (Run in 2 concurrent SSMS sessions)
BEGIN TRY
DECLARE @FinishedMessage VARBINARY (128) = CAST('TestFinished' AS VARBINARY (128))
DECLARE @SynchMessage VARBINARY (128) = CAST('TestSynchronising' AS VARBINARY (128))
SET CONTEXT_INFO @SynchMessage
DECLARE @OtherSpid int
WHILE(@OtherSpid IS NULL)
SELECT @OtherSpid=spid
FROM sys.sysprocesses
WHERE context_info=@SynchMessage and spid<>@@SPID
SELECT @OtherSpid
DECLARE @increment INT = @@spid
DECLARE @number INT = @increment
WHILE (@number = @increment AND NOT EXISTS(SELECT * FROM sys.sysprocesses WHERE context_info=@FinishedMessage))
UPDATE JobsToProcess
SET @number=number +=@increment,isprocessed=1
WHERE priority = (SELECT TOP 1 priority
FROM JobsToProcess
WHERE isprocessed=0
ORDER BY priority DESC)
SELECT *
FROM JobsToProcess
WHERE number not in (0,@OtherSpid,@@spid)
SET CONTEXT_INFO @FinishedMessage
END TRY
BEGIN CATCH
SET CONTEXT_INFO @FinishedMessage
SELECT ERROR_MESSAGE(), ERROR_NUMBER()
END CATCH
Execution stops almost immediately because both concurrent transactions update the same row. So the S locks taken while identifying the TOP 1 priority must be released before a U lock is acquired; the two transactions then take the row's U and X locks in sequence.
If a clustered index is added (ALTER TABLE JobsToProcess ADD PRIMARY KEY CLUSTERED (priority)), a deadlock occurs almost immediately instead. In this case the row S lock is not released: one transaction acquires a U lock on the row and waits to convert it to an X lock, while the other is still waiting to convert its S lock to a U lock.
If the query above is changed to use MIN rather than TOP
WHERE priority = (SELECT MIN(priority)
FROM JobsToProcess
WHERE isprocessed=0
)
Then SQL Server manages to completely eliminate the sub query from the plan and takes U locks all the way.
Every individual T-SQL statement is, according to all my experience and all the documentation I've ever read, supposed to be atomic. What you have there is a single T-SQL statement, ergo it should be atomic and will not require explicit transaction statements. I've used this precise kind of logic many times, and never had a problem with it. I look forward to seeing if anyone has a supportable alternate opinion.
Incidentally, look into the ranking functions, specifically row_number(), for retrieving a set number of items. The syntax is perhaps a tad awkward, but overall they are flexible and powerful tools. (There are about a bazillion Stack Overflow questions and answers that discuss them.)

Can I Select and Update at the same time?

This is an over-simplified explanation of what I'm working on.
I have a table with status column. Multiple instances of the application will pull the contents of the first row with a status of NEW, update the status to WORKING and then go to work on the contents.
It's easy enough to do this with two database calls; first the SELECT then the UPDATE. But I want to do it all in one call so that another instance of the application doesn't pull the same row. Sort of like a SELECT_AND_UPDATE thing.
Is a stored procedure the best way to go?
You could use the OUTPUT clause.
DECLARE @Table TABLE (ID INTEGER, Status VARCHAR(32))
INSERT INTO @Table VALUES (1, 'New')
INSERT INTO @Table VALUES (2, 'New')
INSERT INTO @Table VALUES (3, 'Working')
UPDATE @Table
SET Status = 'Working'
OUTPUT Inserted.*
FROM @Table t1
INNER JOIN (
SELECT TOP 1 ID
FROM @Table
WHERE Status = 'New'
) t2 ON t2.ID = t1.ID
Sounds like a queue processing scenario, whereby you want one process only to pick up a given record.
If that is the case, have a look at the answer I provided earlier today which describes how to implement this logic using a transaction in conjunction with UPDLOCK and READPAST table hints:
Row locks - manually using them
Best wrapped up in sproc.
I'm not sure this is what you are wanting to do, hence I haven't voted to close as duplicate.
Not quite, but you can SELECT ... WITH (UPDLOCK), then UPDATE subsequently. This is as good as an atomic operation, as it tells the database that you are about to update what you previously selected, so it can lock those rows, preventing collisions with other clients. Under Oracle and some other databases (MySQL, I think) the syntax is SELECT ... FOR UPDATE.
Note: I think you'll need to ensure the two statements happen within a transaction for it to work.
You should do three things here:
Lock the row you're working on
Make sure that this and only this row is locked
Do not wait for the locked records: skip to the next ones instead.
To do this, you just issue this:
SELECT TOP 1 *
FROM mytable WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE status = 'NEW'
ORDER BY
date
UPDATE …
within a transaction.
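Spelled out, the select-then-update pattern might look like the sketch below. The id column, the 'WORKING' status value, and the variable name are assumptions for illustration; the original answer leaves the UPDATE elided.

```sql
DECLARE @id int;

BEGIN TRAN;
    -- UPDLOCK: take an update lock on the chosen row and keep it until COMMIT
    -- ROWLOCK: ask for row-level locking
    -- READPAST: skip rows other sessions have locked instead of waiting
    SELECT TOP (1) @id = id
    FROM mytable WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE status = 'NEW'
    ORDER BY [date];

    -- @id stays NULL when no unlocked 'NEW' row was found
    IF @id IS NOT NULL
        UPDATE mytable
        SET status = 'WORKING'
        WHERE id = @id;
COMMIT;

SELECT @id AS claimed_id;
```

The transaction matters here: without it the U lock from the SELECT would be released as soon as that statement finished, and another session could pick the same row before the UPDATE ran.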
A stored procedure is the way to go. You need to look at transactions. Sql server was born for this kind of thing.
Yes, and maybe use the rowlock hint to keep it isolated from the other threads, eg.
UPDATE
Jobs WITH (ROWLOCK, UPDLOCK, READPAST)
SET Status = 'WORKING'
WHERE JobID =
(SELECT Top 1 JobId FROM Jobs WHERE Status = 'NEW')
EDIT: Rowlock would be better as suggested by Quassnoi, but the same idea applies to do the update in one query.

Efficient transaction, record locking

I've got a stored procedure which selects one record back. The stored procedure could be called from several different applications on different PCs. The idea is that the stored procedure brings back the next record that needs to be processed, and if two applications call the stored proc at the same time, the same record should not be brought back. My query is below; I'm trying to write it as efficiently as possible (SQL 2008). Can it be done more efficiently than this?
CREATE PROCEDURE GetNextUnprocessedRecord
AS
BEGIN
SET NOCOUNT ON;
--ID of record we want to select back
DECLARE @iID BIGINT
-- Find the next processable record, and mark it as dispatched
-- Must be done in a transaction to ensure no other query can get
-- this record between the read and update
BEGIN TRAN
SELECT TOP 1
@iID = [ID]
FROM
--Don't read locked records, only lock the specific record
[MyRecords] WITH (READPAST, ROWLOCK)
WHERE
[Dispatched] is null
ORDER BY
[Received]
--Mark record as picked up for processing
UPDATE
[MyRecords]
SET
[Dispatched] = GETDATE()
WHERE
[ID] = @iID
COMMIT TRAN
--Select back the specific record
SELECT
[ID],
[Data]
FROM
[MyRecords] WITH (NOLOCK, READPAST)
WHERE
[ID] = @iID
END
Using the READPAST locking hint is correct and your SQL looks OK.
I'd add XLOCK though, so that the exclusive lock is taken at read time and held until the transaction completes
...
[MyRecords] WITH (READPAST, ROWLOCK, XLOCK)
...
This means you get the ID, and exclusively lock that row while you carry on and update it.
Edit: add an index on Dispatched and Received columns to make it quicker. If [ID] (I assume it's the PK) is not clustered, INCLUDE [ID]. And filter the index too because it's SQL 2008
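A sketch of the index described in this edit, assuming the table is in the dbo schema; the index name is made up for illustration:

```sql
-- Filtered index (SQL Server 2008+): only undispatched rows are indexed,
-- so it stays small as processed rows accumulate.
CREATE NONCLUSTERED INDEX IX_MyRecords_Undispatched
ON dbo.MyRecords (Dispatched, Received)
INCLUDE (ID)   -- only needed if ID is not the clustered key
WHERE Dispatched IS NULL;
```

With this in place, the "next unprocessed record" lookup ordered by Received can be answered from the index alone.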
You could also use this construct which does it all in one go without XLOCK or HOLDLOCK
UPDATE
MyRecords
SET
--record the row ID
@id = [ID],
--flag doing stuff
[Dispatched] = GETDATE()
WHERE
[ID] = (SELECT TOP 1 [ID] FROM MyRecords WITH (ROWLOCK, READPAST) WHERE Dispatched IS NULL ORDER BY Received)
UPDATE, assign, set in one
You can assign each picker process a unique id, and add columns pickerproc and pickstate to your records. Then
UPDATE MyRecords
SET pickerproc = myproc,
pickstate = 'I' -- for 'I'n process
WHERE Id = (SELECT MAX(Id) FROM MyRecords WHERE pickstate = 'A') -- 'A'vailable
That gets you your record in one atomic step, and you can do the rest of your processing at your leisure. Then you can set pickstate to 'C'omplete, 'E'rror, or whatever when it's resolved.
I think Mitch is referring to another good technique where you create a message-queue table and insert the Ids there. There are several SO threads - search for 'message queue table'.
You can keep MyRecords on a "MEMORY" table for faster processing.
