I was running a proc inside a cursor. After a lot of successful iterations, I got this:
Transaction (Process ID 104) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
I am not posting the full details, so I don't expect a fine-grained debugging answer. The facts:
I am sure no one else (including myself in another session) was using the proc, as I was developing it, and
This transaction was "stuck" when doing a select (I saw the running query in sys.dm_exec_requests).
If I am not mistaken on my 2 points, is it ever possible to have a deadlock? Wouldn't the deadlock require all of the involved users of a resource to be doing write operations on them, which would create a cycle in the resource request graph? I understand a timeout error in a select, but cannot understand a deadlock. What am I missing?
An update:
I abandoned further debugging because I noticed that an index I thought existed didn't. When it was created, the performance was OK.
However, in the hope of keeping this useful and eventually getting an answer, here are some more things I investigated, some facts, and thoughts on the comments:
First, the SQL Server version is 2008. I understand this is not supported. I am in no position to make recommendations, much less upgrade the server, though.
I found Jeroen Mostert's comment interesting. How far back is "the past"? I noticed in sys.dm_os_waiting_tasks that the session was blocked by itself multiple times with wait type CXPACKET. I did some searching around, but option(maxdop 1) did not solve the problem. However, remember the missing index, which would cause abysmal performance. Could it be that the parallelism was working correctly, but there were simply too many operations? Still, I also saw a huge dm_exec_requests.wait_time. So, even though the query was bad, I am led to believe that there were strange (dead)locks around.
If an answer/comment comes up with specific queries/steps to do to trace the problem, I'll be happy to recreate it.
It is possible for a SELECT to cause a deadlock if someone else is using the table.
This example is ripped almost 100% from Brent Ozar's video on deadlocks, but changed one command to a SELECT.
To start with, create two tables
CREATE TABLE Lefty (ID int PRIMARY KEY)
CREATE TABLE Righty (ID int PRIMARY KEY)
INSERT INTO Lefty (ID) VALUES (1)
INSERT INTO Righty (ID) VALUES (2)
Then open two windows in SSMS. In the first put this code (don't run it yet)
BEGIN TRAN
UPDATE Lefty SET ID = 5
SELECT * FROM Righty
COMMIT TRAN
In the second window put in this code (also don't run it yet).
BEGIN TRAN
UPDATE Righty SET ID = 5
UPDATE Lefty SET ID = 5
COMMIT TRAN
Now, in the first window, run the first two commands (BEGIN TRAN AND UPDATE LEFTY).
That starts.
In the second window, run the whole transaction. It sits there waiting for your first window, and will wait forever.
In the first window, go back and run the SELECT * FROM Righty and COMMIT TRAN. 5, 4, 3, 2, 1 Boom deadlock - because the second window already had a lock on the table and therefore the SELECT in the first window couldn't run (and the second window couldn't run because the first had a lock on a table it needed).
(I'd like to reiterate - this is Brent Ozar's demo not mine! I'm just passing it on. Indeed, I recommend them).
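To see which sessions and resources actually took part in a deadlock like the one in the question, one option is to have SQL Server write the deadlock details to its error log and to look at the waiting tasks while the query is stuck. This is only a minimal sketch, assuming you have sysadmin rights on the instance; the session id 104 is simply the one from the error message in the question.

```sql
-- Write detailed deadlock information to the SQL Server error log.
-- The -1 turns the trace flag on globally (requires sysadmin).
DBCC TRACEON (1222, -1);

-- While the query appears stuck, list what each task of the session waits on.
-- Replace 104 with the id returned by SELECT @@SPID in the stuck window.
SELECT session_id,
       exec_context_id,
       wait_type,
       wait_duration_ms,
       blocking_session_id,
       blocking_exec_context_id,
       resource_description
FROM sys.dm_os_waiting_tasks
WHERE session_id = 104
ORDER BY exec_context_id;

-- Turn the trace flag off again once the deadlock has been captured.
DBCC TRACEOFF (1222, -1);
```

If the rows show the session blocked by itself with CXPACKET waits, the waits are between the parallel threads of a single query rather than between two users, which would fit a deadlock reported on "communication buffer resources".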
We had to use a trigger to sync an old system to the new system until we can fully deprecate the old system. The new system doesn't need this trigger at all and, in fact, exits out immediately on the condition that it's the new app.
The impact on the old system is acceptable.
However, the impact on the new system is not because the new system processes many, many more records on a single update. Merely executing the trigger changes an update from 10 seconds (already "UGH") to over a minute and a half.
The new system performs acceptably by disabling the trigger in code (VS Core with EntityFramework btw), running the update and then re-enabling the trigger, all within a transaction. There is disagreement among my colleagues about whether or not the trigger is disabled for the other application while the transaction is being processed.
I have already seen this post:
https://dba.stackexchange.com/questions/204339/sql-server-how-to-disable-trigger-for-an-update-only-for-your-current-session
And the first answer is the solution I am using. My colleagues tell me that won't work. I believe it will. But the answers 2-whatever seem to contradict the first answer.
My testing proved out the first answer as well but I need to be 100% sure on this.
TIA
However, the impact on the new system is not because the new system processes many, many more records on a single update.
You should find a way to batch the updates into fewer statements. The trigger fires per statement, not per row. For example, EF Core does batching automatically, or you can use a TVP or SqlBulkCopy into a temp table, etc.
DISABLE TRIGGER within a transaction eliminates the possibility of other users updating while the trigger is disabled
Yes. You can easily verify that disabling the trigger takes a Sch-M lock on the table for the duration of the transaction, which is incompatible with all other table access.
For example:
use tempdb
drop table if exists t
create table t(id int primary key)
go
create trigger t_t on t after insert
as
begin
select 'trigger running' msg
end
go
begin transaction
go
disable trigger t_t on t
go
select object_name(resource_associated_entity_id) table_name, resource_lock_partition, request_mode, request_status
from sys.dm_tran_locks
where request_session_id = @@spid
and resource_type = 'OBJECT'
order by 1,2
rollback
I'm trying to delete one single record from the database.
Code is very simple:
SELECT * FROM database.tablename
WHERE SerialNbr = x
This gives me the one record I'm looking for. It has that SerialNbr plus a number of ids that are foreign keys to other tables. I took care of all the foreign key constraints, so the next statement is able to start running.
After that the code is followed by:
DELETE FROM tablename
WHERE SerialNbr = x
This should be a relatively simple and quick query I would think. However it has now run for 30 minutes with no results. It isn't yelling about any problems with foreign keys or anything like that, it just is taking a very very long time to process. Is there anything I can do to speed up this process? Or am I just stuck waiting? It seems something is wrong that deleting one single record would take this long.
I am using Microsoft SQL Server 2008.
It's not taking a long time to delete the row, it's waiting in line for its turn to access the table. This is called blocking, and is a fundamental part of how databases work. Essentially, you can't delete that row if someone else has a lock on it - they may be reading it and want to be sure it doesn't change (or disappear) before they're done, or they may be trying to update it (to an unsatisfactory end, of course, if you wait it out, since once they commit your delete will remove it anyway).
Check the SPID for the window where you're running the query. If you have to, stop the current instance of the query, then run this:
SELECT @@SPID;
Make note of that number, then try to run the DELETE again. While it's sitting there taking forever, check for a blocker in a different query window:
SELECT blocking_session_id FROM sys.dm_exec_requests WHERE session_id = <that spid>;
Take the number there, and issue something like:
DBCC INPUTBUFFER(<the blocking session id>);
This should give you some idea about what the blocker is doing (you can get other information from sys.dm_exec_sessions etc). From there you can decide what you want to do about it - issue KILL <the spid>;, wait it out, go ask the person what they're doing, etc.
You may need to repeat this process multiple times, e.g. sometimes a blocking chain can be several sessions deep.
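To avoid walking the chain one session at a time, a single query can list every request that is currently blocked together with its blocker. A minimal sketch using only the DMVs already mentioned above plus sys.dm_exec_sql_text:

```sql
-- One row per blocked request: who is blocked, by whom, and on what.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,              -- milliseconds spent in the current wait
       r.status,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
```

A blocking_session_id that never appears in the session_id column is the head of the chain. It may be idle inside an open transaction, in which case it has no row of its own here, but DBCC INPUTBUFFER still shows the last thing it ran.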
I think there is some kind of performance hit in the database or on that particular table. You can see it by simply running sp_who2 and then killing the offending SPIDs. Be careful when running KILL, because the session you kill might not be your query.
Delete From Database.Tablename
where SerialNbr=x
I have a stored procedure, and I want to ensure it cannot be executed concurrently.
My (multi-threaded) application does all necessary work on the underlying table via this stored procedure.
IMO, locking the table itself is an unnecessarily drastic action to take, and so when I found out about sp_GetAppLock, which essentially enforces a critical section, this sounded ideal.
My plan was to encase the stored procedure in a transaction and to set up sp_GetAppLock with transaction scope. The code was written and tested successfully.
The code has now been put forward for review and I have been told that I should not call this function. However when asking the obvious question "why not?", the only reasons I am getting are highly subjective, to do with any form of locking being complicated.
I don't necessarily buy this, but I was wondering whether anyone had any objective reasons why I should avoid this construct. Like I say, given my circumstances a critical section sounds an ideal approach to me.
Further info: An application sits on top of this with 2 threads T1 and T2. Each thread is waiting for a different message M1 and M2. The business logic involved says that processing can only happen once both M1 and M2 have arrived. The stored procedure logs that Mx has arrived (insert) and then checks whether My is present (select). The built-in locking is fine to make sure the inserts happen serially. But the selects need to happen serially too and I think I need to do something over and above the built-in functionality here.
Just for clarity, I want the "processing" to happen exactly once. So I can't afford for the stored procedure to return either false positives or false negatives. I'm worried that if the stored proc runs twice in very quick succession, then both "selects" might return data which indicates that it is appropriate to perform processing.
What is the procedure doing that you cannot rely on SQL Server's built-in concurrency control mechanisms? Often queries can be rewritten to allow real concurrency.
But if this procedure indeed has to be executed "alone", locking the table itself on first access is most likely going to be a lot faster than using the call to sp_GetAppLock. It sounds like this procedure is going to be called often. If that is the case you should look for a way to achieve the goal with minimal impact.
If the table contains no other rows besides M1 and M2, a table lock is still your best bet.
If you have multiple threads sending multiple messages you can get more fine-grained by using "serializable" as transaction level and check if the other message is there before you do the insert but within the same transaction. To prevent deadlocks in this case make sure you check for both messages for example like this:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
SELECT
@hasM1 = MAX(CASE WHEN msg_type='M1' THEN 1 ELSE 0 END),
@hasM2 = MAX(CASE WHEN msg_type='M2' THEN 1 ELSE 0 END)
FROM messages WITH(UPDLOCK)
WHERE msg_type IN ('M1','M2');
INSERT ...
IF(??) EXEC do_other_stuff_and_delete_messages;
COMMIT
In the IF statement before(!) the COMMIT you can use the information collected before the insert together with the information that you inserted to decide if additional processing is necessary.
In that processing step make sure to either mark those messages as processed or to delete them all still within the same transaction. That will make sure that you will not process those messages twice.
SERIALIZABLE is the only transaction isolation level that lets you lock rows that do not exist yet (it does so with key-range locks), so the first select statement with the WITH(UPDLOCK) hint effectively prevents the other row from being inserted while the first execution is still running.
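Filled in, the pattern above might look roughly like the following. This is only a sketch under stated assumptions: the table dbo.messages with msg_type as its primary key, the procedure name usp_log_message_sketch and the should_process result column are all hypothetical, and the actual processing is reduced to deleting both rows.

```sql
-- Hypothetical table: one row per message type that has arrived.
CREATE TABLE dbo.messages (msg_type char(2) NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.usp_log_message_sketch
    @msg_type char(2)            -- 'M1' or 'M2'
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;           -- roll back the whole transaction on any error
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    DECLARE @hasM1 int, @hasM2 int;

    BEGIN TRAN;

    -- Range-lock both keys so a concurrent caller waits here until we commit.
    SELECT @hasM1 = MAX(CASE WHEN msg_type = 'M1' THEN 1 ELSE 0 END),
           @hasM2 = MAX(CASE WHEN msg_type = 'M2' THEN 1 ELSE 0 END)
    FROM dbo.messages WITH (UPDLOCK)
    WHERE msg_type IN ('M1', 'M2');

    INSERT INTO dbo.messages (msg_type) VALUES (@msg_type);

    -- Process only if the other message was already there before this insert.
    IF (@msg_type = 'M1' AND @hasM2 = 1) OR (@msg_type = 'M2' AND @hasM1 = 1)
    BEGIN
        DELETE FROM dbo.messages WHERE msg_type IN ('M1', 'M2');
        SELECT 1 AS should_process;
    END
    ELSE
        SELECT 0 AS should_process;

    COMMIT;
END
GO
```

Whichever call arrives second sees the first message and gets should_process = 1, and the first call always gets 0, so the processing is triggered exactly once per pair.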
Finally, these are a lot of things to be aware of that could go wrong. You might want to have a look at Service Broker instead. You could use three queues with that: one for type M1, one for type M2, and a third for tokens. Every time a message arrives in one of the first two queues, a procedure can automatically be called to insert a token into the third queue. The third queue could then activate a process that checks whether both messages exist and does the work. That would make the entire process asynchronous, but it would also make it easy to restrict the response on queue 3 to only ever do one check at a time.
Service broker on msdn, also look at "activation" for the automatic message processing.
sp_GetAppLock is just like many other tools and as such it can be misused, overused, or correctly used. It is an exact match for the type of problem described by the original poster.
This is a good MSSQL Tips post on the usage
Prevent multiple users from running the same SQL Server stored procedure at the same time
http://www.mssqltips.com/sqlservertip/3202/prevent-multiple-users-from-running-the-same-sql-server-stored-procedure-at-the-same-time/
We use sp_getapplock all the time, due to the fact that we support some legacy applications that have been re-worked to use a SQL back-end, and the SQL Server locking model is not an exact match for our application logic.
We tend to go for a 'pessimistic' locking model, where we lock an entity before allowing a user to edit it, and use the (NOLOCK) hint extensively when reading data to bypass any blocking from the native locks on the actual tables. sp_getapplock is a good match for this. We also use it to enforce critical paths in large multi-user systems. You have to be systematic about what you call the locks you place.
We've found no performance problems with large numbers of user/locks via this route, so I see no reason why it wouldn't work well for you. Just be aware that you can get blocking and deadlocks if you have processes that place the same named locks, but not necessarily in the same order.
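For reference, a transaction-scoped critical section with sp_getapplock could be sketched like this. The lock name MessageArrival and the five second timeout are assumptions chosen for illustration; the return codes are the documented ones (0 or 1 means the lock was granted, negative values mean it was not).

```sql
BEGIN TRANSACTION;

DECLARE @rc int;
EXEC @rc = sp_getapplock
     @Resource    = N'MessageArrival',   -- hypothetical lock name
     @LockMode    = 'Exclusive',
     @LockOwner   = 'Transaction',       -- released automatically at commit/rollback
     @LockTimeout = 5000;                -- milliseconds to wait before giving up

IF @rc < 0
BEGIN
    -- -1 = timeout, -2 = cancelled, -3 = deadlock victim, -999 = invalid call
    ROLLBACK TRANSACTION;
    RAISERROR('Could not acquire the application lock (return code %d).', 16, 1, @rc);
    RETURN;
END;

-- Critical section: only one session at a time gets here for this lock name.
-- (insert the message, check for the other one, do the processing)

COMMIT TRANSACTION;   -- also releases the application lock
```

Because the lock is owned by the transaction, there is no separate release call to forget, and a crashed or killed session gives the lock up when its transaction rolls back.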
You can create a table with a flag for each set of messages, so if one of the threads is first to start processing it will mark the flag as processing.
To make sure that the record is properly blocked once one of the threads reaches it, use:
SELECT ... FROM ... WITH(XLOCK,ROWLOCK,READCOMMITTED) WHERE ...
This piece of code will put an exclusive lock on the record, meaning that whoever gets to it first owns the row.
Then you make your changes and update the flag. The other thread will get the updated value, because it will be blocked by the exclusive lock until the first thread commits or rolls back its transaction.
For this to work as expected, you always need to select records from the table with XLOCK.
Hope this helps.
Proof of the exclusive lock:
USE master
GO
IF OBJECT_ID('dbo.tblTest') IS NOT NULL
DROP TABLE dbo.tblTest
CREATE TABLE tblTest ( id int PRIMARY KEY )
;WITH cteNumbers AS (
SELECT 1 N
UNION ALL
SELECT N + 1 FROM cteNumbers WHERE N<1000
)
INSERT INTO
tblTest
SELECT
N
FROM
cteNumbers
OPTION (MAXRECURSION 0)
BEGIN TRANSACTION
SELECT * FROM dbo.tblTest WITH(XLOCK,ROWLOCK,READCOMMITTED) WHERE id = 1
SELECT * FROM sys.dm_tran_locks WHERE resource_database_id = DB_ID('master')
ROLLBACK TRANSACTION
I need to develop a server application (in C#) that will read rows from a simple table (in SQL Server 2005 or 2008), do some work, such as calling a web service, and then update the rows with the resulting status (success, error).
Looks quite simple, but things get tougher when I add the following application requisites:
Multiple application instances must be running at the same time, for Load Balancing and Fault Tolerance purposes. Typically, the application will be deployed on two or more servers, and will concurrently access the same database table. Each table row must be processed only once, so a common synchronization/locking mechanism must be used between multiple application instances.
When an application instance is processing a set of rows, other application instances shouldn't have to wait for it to end in order to read a different set of rows waiting to be processed.
If an application instance crashes, no manual intervention should need to take place on the table rows that were being processed (such as removing temporary status used for application locking on rows that the crashing instance was processing).
The rows should be processed in a queue-like fashion, i.e., the oldest rows should be processed first.
Although these requisites don't look too complex, I'm having some trouble in coming up with a solution.
I've seen locking hint suggestions, such as XLOCK, UPDLOCK, ROWLOCK, READPAST, etc., but I see no combination of locking hints that will allow me to implement these requisites.
Thanks for any help.
Regards,
Nuno Guerreiro
This is a typical table as queue pattern, as described in Using tables as Queues. You would use a Pending Queue, and the dequeue transaction should also schedule a retry after a reasonable timeout. It is not realistically possible to hold on to locks for the duration of the web calls. On success, you would remove the pending item.
You also need to be able to dequeue in batches; dequeuing one by one is too slow once you get into serious load (hundreds and thousands of operations per second). So taking the Pending Queue example from the article linked:
create table PendingQueue (
  -- identity added so the enqueue insert below does not have to supply an id
  id int not null identity(1,1),
  DueTime datetime not null,
  Payload varbinary(max),
  constraint pk_pending_id primary key nonclustered (id));
create clustered index cdxPendingQueue on PendingQueue (DueTime);
go
create procedure usp_enqueuePending
  @dueTime datetime,
  @payload varbinary(max)
as
  set nocount on;
  insert into PendingQueue (DueTime, Payload)
  values (@dueTime, @payload);
go
create procedure usp_dequeuePending
  @batchsize int = 100,
  @retryseconds int = 600
as
  set nocount on;
  declare @now datetime;
  set @now = getutcdate();
  -- pick the oldest due rows, skipping rows locked by other dequeuers
  with cte as (
    select top(@batchsize)
      id,
      DueTime,
      Payload
    from PendingQueue with (rowlock, readpast)
    where DueTime < @now
    order by DueTime)
  -- push the due time forward so the rows are retried if never completed
  update cte
    set DueTime = dateadd(second, @retryseconds, DueTime)
  output deleted.Payload, deleted.id;
go
On successful processing you would remove the item from the queue using the ID. On failure, or on a crash, it would be retried automatically in 10 minutes. One thing you must internalize is that as long as HTTP does not offer transactional semantics, you will never be able to do this with 100% consistent semantics (e.g. guarantee that no item is processed twice). You can get the margin of error very small, but there will always be a moment when the system can crash after the HTTP call succeeded but before the database is updated, which will cause the same item to be retried, since you cannot distinguish this case from one where the system crashed before the HTTP call.
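The removal step is not shown above. A minimal sketch of it, with usp_completePending as a hypothetical name following the same naming as the other two procedures:

```sql
create procedure usp_completePending
  @id int                -- the id returned by usp_dequeuePending
as
  set nocount on;
  -- Remove the item once the web call has succeeded, so it is not retried.
  delete from PendingQueue
  where id = @id;
go
```

The application would call it once per item after a successful web service call, for example exec usp_completePending @id = 42; items that are never completed simply become due again after the retry interval.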
I initially suggested SQL Server Service Broker for this. However, after some research it turns out this is probably not the best way of handling the problem.
What you're left with is the table architecture you've asked for. However, as you've been finding, it is unlikely that you will be able to come up with a solution that meets all the given criteria, due to the great complexity of locking, transactions, and the pressures placed on such a scheme by high concurrency and high transactions per second.
Note: I am currently researching this issue and will get back to you with more later. The following script was my attempt to meet the given requirements. However, it suffers from frequent deadlocks and processes items out of order. Please stay tuned, and in the meantime consider a destructive reads method (DELETE with OUTPUT or OUTPUT INTO).
SET XACT_ABORT ON; -- blow up the whole tran on any errors
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN
UPDATE X
SET X.StatusID = 2 -- in process
OUTPUT Inserted.*
FROM (
SELECT TOP 1 * FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
WHERE StatusID = 1 -- ready
ORDER BY QueuedDate, QueueID -- in case of items with the same date
) X;
-- Do work in application, holding open the tran.
DELETE dbo.QueueTable WHERE QueueID = @QueueID; -- value taken from recordset that was output earlier
COMMIT TRAN;
In the case of several/many rows being locked at once by a single client, there is a possibility of the rowlock escalating to an extent, page, or table lock, so be aware of that. Also, normally holding long-running transactions that maintain locks is a big no-no. It may work in this special usage case, but I fear that high tps by multiple clients will make the system break down. Note that normally, the only processes querying your queue table should be those that are doing queue work. Any processes doing reporting should use READ UNCOMMITTED or WITH NOLOCK to avoid interfering with the queue in any way.
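The destructive read mentioned in the note above could be sketched as follows. It assumes the same dbo.QueueTable with QueuedDate and QueueID columns; the CTE name NextItem is made up for the example.

```sql
SET XACT_ABORT ON;
BEGIN TRAN;

-- Remove the oldest unlocked row and return it to the caller in one statement.
WITH NextItem AS (
    SELECT TOP (1) *
    FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
    ORDER BY QueuedDate, QueueID
)
DELETE FROM NextItem
OUTPUT Deleted.*;

-- Do the work in the application while the transaction is open.
-- If the application crashes, the rollback puts the row back in the queue.

COMMIT TRAN;
```

There is no status column to maintain with this variant, but it shares the drawback of the script above: the transaction stays open for as long as the work takes.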
What is the implication of rows being processed out of order? If an application instance crashes while another instance is successfully completing rows, this delay will likely cause at least one row to be delayed in its completion, causing the processing order to be incorrect.
If the transaction/locking method above is not to your satisfaction, another way to handle your application crashing would be to give your instances names, then set up a monitor process that has the capacity to check periodically if those named instances are running. When a named instance starts up it would always reset any unprocessed rows that possess its instance identifier (something as simple as "instance A" and "instance B" would work). Additionally, the monitor process would check if the instances are running and if one of them is not, reset the rows for that missing instance, enabling any other instances to run. There would be a small lag between crash and recovery, but with proper architecture it could be quite reasonable.
Note: The following links should be edifying:
info about XLOCK
Tables as Queues
You can't do this with SQL transactions (or relying on transactions as your main component here). Actually, you can do this, but you shouldn't. Transactions are not meant to be used this way, for long locks, and you shouldn't abuse them like this.
Keeping a transaction open for that long (retrieve rows, call the web service, get back to make some updates) is simply not good. And there's no optimistic locking isolation level that will allow you to do what you want.
Using ROWLOCK is also not a good idea, because it's just that. A hint. It's subject to lock escalation, and it can be converted to a table lock.
May I suggest a single entry point to your database? I think it fits in the pub/sub design.
So there would be only one component that reads/updates these records:
Reads batches of messages (enough for all your other instances to consume) - 1000, 10000, whatever you see fit. It makes these batches available to the other (concurrent) components through some queued way. I'm not going to say MSMQ :) (it would be the second time today I recommend it, but it's really suitable in your case too).
It marks the messages as in progress or something similar.
Your consumers are all bound, transactionally, to the inbound queue and do their stuff.
When ready, after the web service call, they put the messages in an outbound queue.
The central component picks them up and, inside a distributed transaction, does an update on the database (if it fails the messages will stay in the queue). Since it is the only one that could do that operation you won't have any concurrency issues. At least not on the database.
In the meantime it can read the next pending batch and so on.
An internal application needs to dynamically create SQL tables based on some provided criteria. There are multiple consumers of this application.
IF (NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'SomeTableName'))
BEGIN
-- Create table in here.
END
To do this I have the above basic construct within the sproc. I am aware of possible race conditions, so my first solution was to add some locking hints to the SELECT statement to ensure that all other transactions checking for the existence of a table would be blocked until the other transactions had finished. However, no matter which hints I used, this would not work.
My next solution was to wrap the table creation in a TRY..CATCH so that even if it did fail, I could just ignore the error. However, the failure of the CREATE TABLE statement dooms the transaction so I cannot carry on even if I do ignore the error.
My last solution, which works, was to use the TRY..CATCH construct and if an error is raised then GOTO the top of the sproc where a fresh transaction is created and everything goes through fine as the table exists second time round.
I am not happy with the solution as it seems like a hack. Do any SQL gurus out there know a clean solution to this issue?
Just to clarify, the solution I discussed above does not have a large impact on performance, so I am really looking for a clean solution which doesn't have large performance implications.
Use semaphores (aka manual locking) with sp_getapplock (top of code) and sp_releaseapplock (bottom of code) to ensure one process only.
A second process will fail, wait, or time out, depending on your sp_getapplock parameters.
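A rough sketch of that bracket, using a session-owned lock so that it does not depend on the surrounding transaction. The lock name, the ten second timeout and the single id column are assumptions for illustration; the real table definition goes where the CREATE TABLE is.

```sql
DECLARE @rc int;

-- Top of code: take the semaphore, waiting up to 10 seconds for it.
EXEC @rc = sp_getapplock
     @Resource    = N'create_dbo.SomeTableName',  -- hypothetical lock name
     @LockMode    = 'Exclusive',
     @LockOwner   = 'Session',
     @LockTimeout = 10000;

IF @rc < 0
BEGIN
    RAISERROR('Could not acquire the application lock (return code %d).', 16, 1, @rc);
    RETURN;
END;

BEGIN TRY
    -- Only one session at a time can be in here, so the check cannot race.
    IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
                   WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'SomeTableName')
        CREATE TABLE dbo.SomeTableName (id int NOT NULL PRIMARY KEY);
END TRY
BEGIN CATCH
    -- Release the semaphore before reporting the error.
    EXEC sp_releaseapplock @Resource = N'create_dbo.SomeTableName', @LockOwner = 'Session';
    DECLARE @msg nvarchar(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);
    RETURN;
END CATCH;

-- Bottom of code: release the semaphore.
EXEC sp_releaseapplock @Resource = N'create_dbo.SomeTableName', @LockOwner = 'Session';
```

With @LockTimeout = 0 a second caller fails immediately instead of waiting, and with -1 it waits indefinitely.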