I've got a long-running stored procedure on a SQL Server database. I don't want it to run more often than once every ten minutes.
Once the stored procedure has run, I want to store the latest result in a LatestResult table, against a time, and have all calls to the procedure return that result for the next ten minutes.
That much is relatively simple, but we've found that, because the procedure both checks and updates the LatestResult table, large userbases are hitting a number of deadlocks when two users call the procedure at the same time.
In a client-side/threading situation, I would solve this with a lock: the first user locks the function, the second user hits the lock and waits for the result, the first user finishes their procedure call, updates the LatestResult table, and releases the lock, and the second user then picks up the result from the LatestResult table.
Is there any way to accomplish this kind of locking in SQL Server?
EDIT:
This is basically how the code looks without its error checking calls:
DECLARE @LastChecked AS DATETIME
DECLARE @LastResult AS NUMERIC(18,2)
SELECT TOP 1 @LastChecked = LastRunTime, @LastResult = LastResult FROM LastResult

DECLARE @ReturnValue AS NUMERIC(18,2)
IF DATEDIFF(n, @LastChecked, GETDATE()) >= 10 OR NOT @LastResult = 0
BEGIN
    SELECT @ReturnValue = ABS(ISNULL(SUM(ISNULL(Amount,0)),0)) FROM Transactions WHERE ISNULL(DeletedFlag,0) = 0 GROUP BY GroupID ORDER BY ABS(ISNULL(SUM(ISNULL(Amount,0)),0))
    UPDATE LastResult SET LastRunTime = GETDATE(), LastResult = @ReturnValue
    SELECT @ReturnValue
END
ELSE
BEGIN
    SELECT @LastResult
END
I'm not really sure what's going on with the grouping, but I've found a test system where execution time is coming in around 4 seconds.
I think there's some work scheduled to archive some of these records and boil them down to running totals, which will probably help, given that there are several million rows behind that four-second query...
This is a valid opportunity to use an Application Lock (see sp_getapplock and sp_releaseapplock), as it is a lock taken out on a concept that you define, not on any particular rows in any given table. The idea is that you create a transaction, then create this arbitrary lock that has an identifier, and other processes will wait to enter that piece of code until the lock is released. This works just like lock() at the app layer. The @Resource parameter is the label of the arbitrary "concept". In more complex situations, you can even concatenate a CustomerID or something in there for more granular locking control.
DECLARE @LastChecked DATETIME,
        @LastResult NUMERIC(18,2);
DECLARE @ReturnValue NUMERIC(18,2);

BEGIN TRANSACTION;

EXEC sp_getapplock @Resource = 'check_timing', @LockMode = 'Exclusive';

SELECT TOP 1 -- not sure if this helps the optimizer on a 1 row table, but seems ok
       @LastChecked = LastRunTime,
       @LastResult = LastResult
FROM   LastResult;

IF (DATEDIFF(MINUTE, @LastChecked, GETDATE()) >= 10 OR @LastResult <> 0)
BEGIN
    SELECT @ReturnValue = ABS(ISNULL(SUM(ISNULL(Amount, 0)), 0))
    FROM   Transactions
    WHERE  DeletedFlag = 0
       OR  DeletedFlag IS NULL;

    UPDATE LastResult
    SET    LastRunTime = GETDATE(),
           LastResult = @ReturnValue;
END
ELSE
BEGIN
    SET @ReturnValue = @LastResult; -- This is always 0 here
END;

SELECT @ReturnValue AS [ReturnValue];

EXEC sp_releaseapplock @Resource = 'check_timing';

COMMIT TRANSACTION;
You need to manage errors / ROLLBACK yourself (as stated in the linked MSDN documentation), so put in the usual TRY / CATCH. But this does allow you to manage the situation.
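A minimal sketch of that wrapper (assuming SQL Server 2012+ for THROW; the app lock is transaction-owned by default, so the ROLLBACK in the CATCH also releases it):

BEGIN TRY
    BEGIN TRANSACTION;

    EXEC sp_getapplock @Resource = 'check_timing', @LockMode = 'Exclusive';

    -- ... the SELECT / IF / UPDATE logic shown above ...

    EXEC sp_releaseapplock @Resource = 'check_timing';
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION; -- also releases the transaction-owned app lock
    THROW; -- re-raise so the caller sees the failure (use RAISERROR on pre-2012)
END CATCH;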
If there are any concerns regarding contention on this process, there shouldn't be much: the lookup done right after locking the resource is a SELECT from a single-row table, followed by an IF statement that (ideally) just returns the last known value if the 10-minute timer hasn't elapsed. Hence, most calls should process rather quickly.
Please note: sp_getapplock / sp_releaseapplock should be used sparingly; Application Locks can definitely be very handy (such as in cases like this one) but they should only be used when absolutely necessary.
I'm trying to work out how I can prevent blocking while running my code. There are a few things I could swap out, and I'm not sure which would be best. I'm including some fake code below to illustrate my current layout; please look past any syntax errors.
SELECT ID
INTO #ID
FROM Table

DECLARE @TargetID int = 0

BEGIN TRAN

DECLARE ID_Cursor CURSOR FOR
    SELECT ID
    FROM Table

OPEN ID_Cursor
FETCH NEXT FROM ID_Cursor INTO @TargetID

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC usp_ChangeID
        @ID = @TargetID,
        @NewValue = 100
    FETCH NEXT FROM ID_Cursor INTO @TargetID
END

CLOSE ID_Cursor
DEALLOCATE ID_Cursor

IF ((SELECT COUNT(*) FROM Table WHERE Value = 100) = 10000)
    COMMIT TRAN
ELSE
    ROLLBACK TRAN
The problem I'm encountering is that usp_ChangeID updates about 15 tables on each run, and other spids that want to work with any of those tables have to wait until the entire process is done running. The stored proc itself runs in about a second, but I need to run it repeatedly. I'm thinking these locks are because of the transaction rather than the cursor itself, though I'm not 100% sure. Ideally, my code would finish one run of the stored proc, let other users through to the tables, then run again once that other operation is complete. The rows I'm working with each time shouldn't be frequently used, so a row lock would be perfect, but the blocking I'm seeing implies that isn't happening.
This is running against production data, so I want it to have as little impact as possible while it runs. Performance hits are fine if they mean less blocking. I don't want to break this into chunks, because I generally want this to be saved only if every record is updated as expected, and rolled back in any other case. I can't modify the proc, go around the proc, or do this without a cursor involved, either. I'm leaning towards breaking my initial large select into smaller chunks, but I'd rather not have to change parts manually.
Thanks in advance!
I use stored procedures to manage a warehouse. PDA scanners scan the added stock and send it in bulk (when plugged back in) to a SQL database (SQL Server 2016).
The SQL database is fairly remote (other country), so there's sometimes a delay in some queries, but this particular one is problematic: even though the stock table is fine, I had some problems with updating the occupancy of the warehouse spots. The PDA tracks the added items in every spot as a SMALLINT, then sends this value back to the stored procedure below.
PDA "send_spots" query:
SELECT spot, added_items FROM spots WHERE change=1
Stored procedure:
CREATE PROCEDURE [dbo].[update_spots]
    @spot VARCHAR(10),
    @added_items SMALLINT
AS
BEGIN
    BEGIN TRAN
    UPDATE storage_spots
    SET remaining_capacity = remaining_capacity - @added_items
    WHERE storage_spot = @spot
    IF @@ROWCOUNT <> 1
    BEGIN
        ROLLBACK TRAN
        RETURN -1
    END
    ELSE
    BEGIN
        COMMIT TRAN
        RETURN 0
    END
END
GO
If the remaining_capacity value is 0, the PDAs can't add more items to it on the next round. But with this process, I got negative values, because the query apparently ran two times (and so subtracted @added_items two times).
Is there a way for that to be possible? How could I fix it? From what I understood, the transaction should be cancelled (ROLLBACK) if the number of affected rows is != 1, but maybe that's something else.
EDIT: current solution, with the help of @Zero:
CREATE PROCEDURE [dbo].[update_spots]
    @spot VARCHAR(10),
    @added_racks SMALLINT
AS
BEGIN
    -- Recover current capacity of the spot
    DECLARE @orig_capacity SMALLINT
    SELECT TOP 1 @orig_capacity = remaining_capacity
    FROM storage_spots
    WHERE storage_spot = @spot

    -- Test if a double is present in the logs by comparing dates (last 10 seconds)
    DECLARE @is_double BIT = 0
    SELECT @is_double = CASE WHEN EXISTS (SELECT *
                                          FROM spot_logs
                                          WHERE log_timestamp >= DATEADD(second, -10, GETDATE())
                                            AND storage_spot = @spot
                                            AND delta = @added_racks)
                             THEN 1 ELSE 0 END

    BEGIN TRAN
    UPDATE storage_spots
    SET remaining_capacity = @orig_capacity - @added_racks
    WHERE storage_spot = @spot
    IF @@ROWCOUNT <> 1 OR @is_double <> 0
        -- If double, rollback UPDATE
        ROLLBACK TRAN
    ELSE
        -- If no double, commit UPDATE
        COMMIT TRAN

    -- Write log
    INSERT INTO spot_logs
        (storage_spot, former_capacity, new_capacity, delta, log_timestamp, double_detected)
    VALUES
        (@spot, @orig_capacity, @orig_capacity - @added_racks, @added_racks, GETDATE(), @is_double)
END
GO
I was thinking about possible causes (and a way to trace them) and then it hit me - you have no value validation!
Here's a simple example to illustrate the problem:
Spot | capacity
---------------
x1 | 1
Update spots set capacity = capacity - 2 where spot = 'X1'
Your scanner most likely gave you a higher quantity than you had capacity to take in.
I am not sure how your business logic goes, but you need to perform something along the lines of:
UPDATE spots
SET capacity = capacity - @added_items
WHERE spot = 'X1'
  AND capacity >= @added_items

IF @@ROWCOUNT <> 1
    ROLLBACK;
EDIT: a few methods to trace your problem without implementing validation:
Create a logging table (with a timestamp, user id (the user connected to the DB), session id, old value, new value, and delta value (added items)).
Option 1:
Log all updates that change the value from positive to negative (at least until you figure out the problem).
The drawback of this option is that it will not register double calls that do not result in a negative capacity.
Option 2 (logging identical updates):
Create a script that creates a global temporary table and, once every minute or so, deletes records from that table whose timestamps are older than... let's say 10 minutes (play around with the numbers).
This temporary table should hold the data passed to your update statement, so 'spot' and 'added_items', plus a 'timestamp' (for tracking).
Now the crucial part: when you call your update statement, check whether a similar record exists in the temporary table (same spot and added_items, and where the current timestamp is between timestamp and [timestamp + 1 second] - again, play around with the numbers). If such a record exists, log that update; if not, add it to the temporary table.
This will register identical updates that come within 1 second of each other (or whatever time-frame you choose).
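To make Option 2 concrete, here's a rough sketch under stated assumptions: ##recent_updates is the global temp table created by the setup script, and double_call_log is a hypothetical logging table (all names illustrative; adjust the window to taste):

-- Created once by the setup script:
-- CREATE TABLE ##recent_updates (spot VARCHAR(10), added_items SMALLINT, log_timestamp DATETIME)

IF EXISTS (SELECT 1 FROM ##recent_updates
           WHERE spot = @spot
             AND added_items = @added_items
             AND log_timestamp >= DATEADD(second, -1, GETDATE()))
BEGIN
    -- Similar record within the last second: treat as a double call and log it
    INSERT INTO double_call_log (storage_spot, delta, log_timestamp)
    VALUES (@spot, @added_items, GETDATE())
END
ELSE
BEGIN
    -- First sighting: remember it, then apply the update as usual
    INSERT INTO ##recent_updates (spot, added_items, log_timestamp)
    VALUES (@spot, @added_items, GETDATE())

    UPDATE storage_spots
    SET remaining_capacity = remaining_capacity - @added_items
    WHERE storage_spot = @spot
END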
I found here an alternative SQL query that does the update the way I need, but with a temporary value using DECLARE. Would it work better in my case, or is my initial query correct?
Initial query:
UPDATE storage_spots
SET remaining_capacity = remaining_capacity - @added_items
WHERE storage_spot = @spot

Alternative query:

DECLARE @orig_capacity SMALLINT

SELECT TOP 1 @orig_capacity = remaining_capacity
FROM storage_spots
WHERE storage_spot = @spot

UPDATE storage_spots
SET remaining_capacity = @orig_capacity - @added_items
WHERE storage_spot = @spot
Also, should I get rid of the ROLLBACK/COMMIT instructions?
I am working on a system that uses multiple threads to read, process and then update database records. The threads run in parallel and try to pick records by calling a SQL Server stored procedure.
They call this stored procedure looking for unprocessed records multiple times per second, and sometimes pick the same record up.
I try to prevent this from happening this way:
UPDATE dbo.GameData
SET Exported = @Now,
    ExportExpires = @Expire,
    ExportSession = @ExportSession
OUTPUT Inserted.ID INTO #ExportedIDs
WHERE ID IN ( SELECT TOP(@ArraySize) GD.ID
              FROM dbo.GameData GD
              WHERE GD.Exported IS NULL
              ORDER BY GD.ID ASC)
The idea here is to set a record as exported first, using an UPDATE with OUTPUT (remembering the record id), so that no other thread can pick it up again. Once a record is marked as exported, I can do some extra calculations and pass the data to the external system, hoping that no other thread will pick the same record again in the meantime, since the UPDATE is meant to secure the record first.
Unfortunately it doesn't seem to be working, and the application sometimes picks the same record up twice anyway.
How to prevent it?
Kind regards
Mariusz
I think you should be able to do this atomically using a common table expression. (I'm not 100% certain about this, and I haven't tested, so you'll need to verify that it works for you in your situation.)
;WITH cte AS
(
    SELECT TOP(@ArraySize)
        ID, Exported, ExportExpires, ExportSession
    FROM dbo.GameData WITH (READPAST)
    WHERE Exported IS NULL
    ORDER BY ID
)
UPDATE cte
SET Exported = @Now,
    ExportExpires = @Expire,
    ExportSession = @ExportSession
OUTPUT INSERTED.ID INTO #ExportedIDs
I have a similar setup and I use sp_getapplock. My application runs many threads, and they call a stored procedure to get the ID of the element that has to be processed. sp_getapplock guarantees that the same ID will not be chosen by two different threads.
I have a MyTable with a list of IDs that my application checks in an infinite loop using many threads. For each ID there are two datetime columns: LastCheckStarted and LastCheckCompleted. They are used to determine which ID to pick. The stored procedure picks the ID that hasn't been checked for the longest period. There is also a hard-coded period of 20 minutes - the same ID can't be checked more often than once every 20 minutes.
CREATE PROCEDURE [dbo].[GetNextIDToCheck]
-- Add the parameters for the stored procedure here
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY
        DECLARE @VarID int = NULL;
        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'SomeUniqueName_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';

        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock
            -- Find the ID that wasn't checked for the longest period
            SELECT TOP 1 @VarID = ID
            FROM dbo.MyTable
            WHERE LastCheckStarted <= LastCheckCompleted
                -- this ID is not being checked right now
                AND LastCheckCompleted < DATEADD(minute, -20, GETDATE())
                -- last check was done more than 20 minutes ago
            ORDER BY LastCheckCompleted;

            -- Start checking
            UPDATE dbo.MyTable
            SET LastCheckStarted = GETDATE()
            WHERE ID = @VarID;
            -- There is no need to explicitly verify if we found anything.
            -- If @VarID is null, no rows will be updated
        END;

        -- Return the found ID, or no rows if nothing was found
        -- or we failed to acquire the lock
        SELECT @VarID AS ID
        WHERE @VarID IS NOT NULL;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;
END
The second procedure is called by an application when it finishes checking the found ID.
CREATE PROCEDURE [dbo].[SetCheckComplete]
    -- Add the parameters for the stored procedure here
    @ParamID int
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY
        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'SomeUniqueName_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';

        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock
            -- Completed checking the given ID
            UPDATE dbo.MyTable
            SET LastCheckCompleted = GETDATE()
            WHERE ID = @ParamID;
        END;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;
END
It does not work because multiple transactions might first execute the IN clause and find the same set of rows, then update multiple times and overwrite each other.
LukeH's answer is best, accept it.
You can also fix it by adding AND Exported IS NULL to the UPDATE's WHERE clause, to cancel the double updates.
Or, make the transaction SERIALIZABLE. This will lead to some blocking and deadlocking, which can safely be handled by timeouts and retrying in case of deadlock. SERIALIZABLE is always safe for all workloads, but it may block/deadlock more often.
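As a sketch, here's the Exported IS NULL guard folded into the original UPDATE; under the default READ COMMITTED level the predicate is re-checked when the update takes its lock, so a row another session already exported drops out:

UPDATE dbo.GameData
SET Exported = @Now,
    ExportExpires = @Expire,
    ExportSession = @ExportSession
OUTPUT Inserted.ID INTO #ExportedIDs
WHERE Exported IS NULL -- re-checked at update time, cancels the double update
  AND ID IN ( SELECT TOP(@ArraySize) GD.ID
              FROM dbo.GameData GD
              WHERE GD.Exported IS NULL
              ORDER BY GD.ID ASC)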
I have created a "queue" of sorts in SQL, and I want to be able to set an item as invisible to semi-simulate an Azure-like queue (instead of deleting it immediately, in the event the worker fails to process it, it will automatically appear in the queue again for another worker to fetch).
As per recommendation from this SO: Is T-SQL Stored Procedure Execution 'atomic'?
I wrapped BEGIN TRAN and COMMIT around my spDeQueue procedure, but I'm still running into duplicate pulls from my test agents. (They are all trying to empty a queue of 10 simultaneously, and I'm getting duplicate reads, which I shouldn't.)
This is my sproc
ALTER PROCEDURE [dbo].[spDeQueue]
    @invisibleDuration int = 30,
    @priority int = null
AS
BEGIN
    begin tran

    declare @now datetime = GETDATE()

    -- get the queue item
    declare @id int =
    (
        select top 1 [Id]
        from [Queue]
        where ([InvisibleUntil] is NULL or [InvisibleUntil] <= @now)
          and (@priority is NULL or [Priority] = @priority)
        order by [Priority], [Created]
    )

    -- set the invisible and viewed count
    update [Queue]
    set [InvisibleUntil] = DATEADD(second, @invisibleDuration, @now),
        [Viewed] = [Viewed] + 1
    where [Id] = @id

    -- fetch the entire item
    select *
    from [Queue]
    where [Id] = @id

    commit
    return 0
END
What should I do to ensure this acts atomically, to prevent duplicate dequeues?
Thanks
Your transaction (i.e. the statements between 'begin tran' and 'commit') is atomic in the sense that either all of the statements will be committed to the database, or none of them will.
It appears you have transactions mixed up with synchronization / mutually exclusive execution.
Have a read into transaction isolation levels, which should help enforce sequential execution - repeatable read might do the trick: http://en.wikipedia.org/wiki/Isolation_(database_systems)
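As a rough sketch of that suggestion: raising the isolation level at the top of the procedure makes two callers that read the same row deadlock rather than duplicate, so the caller needs retry logic. A common alternative (a different technique from the isolation-level change) is an UPDLOCK/READPAST hint on the initial select, which locks the candidate row up front and skips rows other sessions already hold:

-- Inside spDeQueue, before BEGIN TRAN:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Or, staying at READ COMMITTED, take an update lock at select time and
-- skip rows that other sessions have already locked:
declare @id int =
(
    select top 1 [Id]
    from [Queue] with (UPDLOCK, READPAST)
    where ([InvisibleUntil] is NULL or [InvisibleUntil] <= @now)
      and (@priority is NULL or [Priority] = @priority)
    order by [Priority], [Created]
)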
I have a system which requires I have IDs on my data before it goes to the database. I was using GUIDs, but found them to be too big to justify the convenience.
I'm now experimenting with implementing a sequence generator which basically reserves a range of unique ID values for a given context. The code is as follows;
ALTER PROCEDURE [dbo].[Sequence.ReserveSequence]
    @Name varchar(100),
    @Count int,
    @FirstValue bigint OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Ensure the parameters are valid
    IF (@Name IS NULL OR @Count IS NULL OR @Count < 0)
        RETURN -1;

    -- Reserve the sequence
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION

    -- Get the sequence ID, and the last reserved value of the sequence
    DECLARE @SequenceID int;
    DECLARE @LastValue bigint;
    SELECT TOP 1 @SequenceID = [ID], @LastValue = [LastValue]
    FROM [dbo].[Sequences]
    WHERE [Name] = @Name;

    -- Ensure the sequence exists
    IF (@SequenceID IS NULL)
    BEGIN
        -- Create the new sequence
        INSERT INTO [dbo].[Sequences] ([Name], [LastValue])
        VALUES (@Name, @Count);

        -- The first reserved value of a sequence is 1
        SET @FirstValue = 1;
    END
    ELSE
    BEGIN
        -- Update the sequence
        UPDATE [dbo].[Sequences]
        SET [LastValue] = @LastValue + @Count
        WHERE [ID] = @SequenceID;

        -- The sequence start value will be the last previously reserved value + 1
        SET @FirstValue = @LastValue + 1;
    END

    COMMIT TRANSACTION
END
The 'Sequences' table is just an ID, Name (unique), and the last allocated value of the sequence. Using this procedure I can request N values in a named sequence and use these as my identifiers.
This works great so far - it's extremely quick since I don't have to constantly ask for individual values, I can just use up a range of values and then request more.
The problem is that at extremely high frequency, calling the procedure concurrently can sometimes result in a deadlock. I have only found this to occur when stress testing, but I'm worried it'll crop up in production. Are there any notable flaws in this procedure, and can anyone recommend any way to improve on it? It would be nice to do this without transactions, for example, but I do need it to be 'thread safe'.
MS themselves offer a solution and even they say it locks/deadlocks.
If you were to add lock hints, you'd reduce concurrency under your high loads.
Options:
You could develop against the "Denali" CTP, which is the next release
Use IDENTITY and the OUTPUT clause like everyone else (see the sketch below)
Adopt/modify the solutions above
On DBA.SE there is "Emulate a TSQL sequence via a stored procedure": see dportas' answer which I think extends the MS solution.
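For the IDENTITY + OUTPUT option, a minimal sketch; the dispenser table and column names here are illustrative, not from the question:

-- One-time setup: a table whose only job is to hand out IDs
CREATE TABLE dbo.IdDispenser
(
    ID bigint IDENTITY(1,1) PRIMARY KEY,
    RequestedAt datetime NOT NULL DEFAULT GETDATE()
);

-- Each caller inserts a row and reads back the generated value
DECLARE @NewIDs TABLE (ID bigint);

INSERT INTO dbo.IdDispenser (RequestedAt)
OUTPUT INSERTED.ID INTO @NewIDs
VALUES (GETDATE());

SELECT ID FROM @NewIDs;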
I'd recommend sticking with the GUIDs if, as you say, this is mostly about composing data ready for a bulk insert (it's simpler than what I present below).
As an alternative, could you work with a restricted count? Say, 100 ID values at a time? In that case, you could have a table with an IDENTITY column, insert into that table, return the generated ID (say, 39), and then your code could assign all values between 3900 and 3999 (e.g. multiply up by your assumed granularity) without consulting the database server again.
Of course, this could be extended to allocating multiple IDs in a single call - provided that you're okay with some IDs potentially going unused. E.g. you need 638 IDs, so you ask the database to assign you 7 new ID values (which implies that you've allocated 700 values), use the 638 you want, and the remaining 62 never get assigned.
Can you get some kind of deadlock trace? For example, enable trace flag 1222 as shown here. Duplicate the deadlock. Then look in the SQL Server log for the deadlock trace.
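For reference, enabling that flag globally is a one-liner (the deadlock graph then appears in the SQL Server error log):

DBCC TRACEON (1222, -1); -- -1 applies the flag to all sessions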
Also, you might inspect what locks are taken out in your code by inserting a call to exec sp_lock or select * from sys.dm_tran_locks immediately before the COMMIT TRANSACTION.
Most likely you are observing a conversion deadlock. To avoid them, you want to make sure that your table is clustered and has a PK, but this advice is specific to 2005 and 2008 R2, and they can change the implementation, rendering this advice useless. Google up "Some heap tables may be more prone to deadlocks than identical tables with clustered indexes".
Anyway, if you observe an error during stress testing, it is likely that sooner or later it will occur in production as well.
You may want to use sp_getapplock to serialize your requests. Google up "Application Locks (or Mutexes) in SQL Server 2005". Also I described a few useful ideas here: "Developing Modifications that Survive Concurrency".
I thought I'd share my solution. It doesn't deadlock, nor does it produce duplicate values. An important difference between this and my original procedure is that it doesn't create the sequence if it doesn't already exist:
ALTER PROCEDURE [dbo].[ReserveSequence]
(
    @Name nvarchar(100),
    @Count int,
    @FirstValue bigint OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;

    IF (@Count <= 0)
    BEGIN
        SET @FirstValue = NULL;
        RETURN -1;
    END

    DECLARE @Result TABLE ([LastValue] bigint)

    -- Update the sequence last value, and get the previous one
    UPDATE [Sequences]
    SET [LastValue] = [LastValue] + @Count
    OUTPUT DELETED.LastValue INTO @Result
    WHERE [Name] = @Name;

    -- Select the first value
    SELECT TOP 1 @FirstValue = [LastValue] + 1 FROM @Result;
END
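For reference, a call would look something like this (the sequence name is illustrative). Note that if the named sequence doesn't exist, the UPDATE touches no rows and @FirstValue comes back NULL:

DECLARE @First bigint;

EXEC [dbo].[ReserveSequence]
    @Name = N'Orders',
    @Count = 100,
    @FirstValue = @First OUTPUT;

-- @First through @First + 99 are now reserved for this caller
SELECT @First AS FirstReservedValue;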