How to prevent deadlock in concurrent T-SQL transactions? - sql-server

I have a query which inserts hundreds of records. The idea behind the query is:
DELETE the old record with a given id
INSERT a new record with the same id
If no record with the id exists, a value for eternal_id is generated
If a record with the id exists, the existing eternal_id value is kept
The query executes in a transaction at the READ COMMITTED isolation level
The query looks like:
DECLARE @id1 int = 100
DECLARE @id2 int = 200
CREATE TABLE #t(
[eternal_id] [uniqueidentifier] NULL,
[id] [int] NOT NULL
)
DELETE FROM [dbo].[SomeTable] WITH (HOLDLOCK)
OUTPUT
DELETED.eternal_id
,DELETED.id
INTO #t
WHERE [id] IN (@id1, @id2)
INSERT INTO [dbo].[SomeTable]
([id]
,[title]
,[eternal_id])
SELECT main.*, ISNULL([eternal_id], NEWID())
FROM
(
SELECT
@id1 Id
,'Some title 1' Title
UNION
SELECT
@id2 Id
,'Some title 2' Title
) AS main
LEFT JOIN #t t ON main.[id] = t.[id]
DROP TABLE #t
I have hundreds of threads executing this query with different @id values. Everything works perfectly when the record already exists in [dbo].[SomeTable], but when a record with the @id doesn't exist I catch:
Transaction (Process ID 73) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
So the problem appears when two or more concurrent threads pass the same @id and the record does not exist in [dbo].[SomeTable].
I tried removing WITH (HOLDLOCK) here:
DELETE FROM [dbo].[SomeTable] WITH (HOLDLOCK)
OUTPUT
DELETED.eternal_id
,DELETED.id
INTO #t
WHERE [id] IN (@id1, @id2)
This did not help, and I started to catch:
Violation of PRIMARY KEY constraint 'PK__SomeTable__3213E83F5D97F3D0'. Cannot insert duplicate key in object 'dbo.SomeTable'. The duplicate key value is (49).
The statement has been terminated.
So without WITH (HOLDLOCK) it misbehaves even when the record already exists.
How do I prevent deadlocks when the record with the id doesn't exist in the table?

Conditional update of eternal_id can be done like this:
update t set
...
eternal_id = ISNULL(t.eternal_id, NEWID())
from [dbo].[SomeTable] t
where t.id = @id
Thus you will keep the old value if it exists. No need to delete/insert. Unless you have some magic in triggers.
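If you do want the full create-or-keep behavior in one shot, here is a minimal sketch of the update-then-insert pattern this answer implies, assuming the [id]/[title]/[eternal_id] columns from the question (the race on a brand-new id still needs the locking hints or retry discussed elsewhere in this thread):
UPDATE t SET
[title] = 'Some title 1',
eternal_id = ISNULL(t.eternal_id, NEWID())
FROM [dbo].[SomeTable] t
WHERE t.id = @id1
-- No row updated means the id is new, so insert it.
-- Two sessions can still both reach this INSERT for the same new id.
IF @@ROWCOUNT = 0
INSERT INTO [dbo].[SomeTable] ([id], [title], [eternal_id])
VALUES (@id1, 'Some title 1', NEWID())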

I think the comment above from @DaleK helped me the most. I will quote it:
While it's a great ambition to try and avoid all deadlocks... it's not
always possible... and you can't prevent all future deadlocks from
happening, because as more rows are added to tables query plans change.
Any application code should have some form of retry mechanism to
handle this. – Dale K
So I decided to implement some form of retry mechanism to handle this.
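For reference, a minimal sketch of such a retry loop in T-SQL, assuming the delete/insert pair above is wrapped in one transaction (1205 is the deadlock-victim error number; THROW needs SQL Server 2012 or later):
DECLARE @retry int = 3;
WHILE @retry > 0
BEGIN
BEGIN TRY
BEGIN TRANSACTION;
-- the DELETE ... OUTPUT ... INSERT pair from the question goes here
COMMIT TRANSACTION;
SET @retry = 0; -- success, leave the loop
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
IF ERROR_NUMBER() = 1205 AND @retry > 1
SET @retry = @retry - 1; -- deadlock victim: try again
ELSE
THROW; -- out of retries, or some other error: rethrow
END CATCH
END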

Related

How can a constraint be violated when the SQL query excludes it?

I have a result table that holds the output of a large, complicated, slow running query.
It's defined something like:
create table ResultsStore (
Foo int not null,
Bar int not null,
... other fields
constraint [PK_ResultsStore_foo_bar] primary key clustered
(
Foo asc,
Bar asc
)
)
I then insert to this table with a query like:
insert ResultsStore (Foo, Bar)
output inserted.*
select subQuery.ID, @bar
from (
-- large complex slow query
) subQuery
where subQuery.ID not in (
select Foo
from ResultsStore
where Bar = @bar
)
In testing this is fine, but in production, with lots of users hitting it regularly, we often get an exception:
Violation of PRIMARY KEY constraint 'PK_ResultsStore_foo_bar'. Cannot insert duplicate key in object 'ResultsStore'.
How is this possible? Surely the where should exclude any combination of the multiple primary key fields where they are already in the table?
How to best avoid this?
As written, two sessions can run the query concurrently, both check for the existence of the row, both fail to find it, and both then attempt the insert. The first one succeeds under READ COMMITTED, and the second one fails.
You need WITH (UPDLOCK, HOLDLOCK, ROWLOCK) on the subquery to avoid this race condition. At the default READ COMMITTED isolation level, either shared (S) locks are taken by the subquery and released as soon as the read completes, or, with READ_COMMITTED_SNAPSHOT on, row versioning is used and no locks are taken at all.
HOLDLOCK gives serializable semantics and protects the range. UPDLOCK forces the read to take a U lock, which blocks other sessions from reading with UPDLOCK.
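Applied to the query from the question, the hinted version would look roughly like this (the hints go on the ResultsStore read inside the subquery):
insert ResultsStore (Foo, Bar)
output inserted.*
select subQuery.ID, @bar
from (
-- large complex slow query
) subQuery
where subQuery.ID not in (
select Foo
from ResultsStore with (updlock, holdlock, rowlock)
where Bar = @bar
)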
You can also use a temp table to hold interim results and perform the final insert at the end.
The following also includes a DISTINCT (which might or might not be needed), changes the dup test to use EXISTS, and applies WITH (UPDLOCK, HOLDLOCK, ROWLOCK) options to the final insert as suggested by others.
declare @TempResults table (
Foo int not null,
Bar int not null
)
insert @TempResults
select distinct subQuery.ID, @bar
from (
-- large complex slow query
) subQuery
insert ResultsStore (Foo, Bar)
output inserted.*
select T.Foo, T.Bar
from @TempResults T
where not exists (
select *
from ResultsStore RS with (updlock, holdlock, rowlock)
where RS.Foo = T.Foo
and RS.Bar = T.Bar
)
This lets your long running query run fast and dirty (as you intend), but should maintain integrity and minimize actual lock duration for the final insert.

Will an UPDLOCK in Microsoft SQL Server prevent inserts on keys in an IF EXISTS query?

I have a table similar to this one...
CREATE TABLE [Customers]
(
[Id] BIGINT IDENTITY NOT NULL,
[AccountName] CHARACTER VARYING(255),
CONSTRAINT [PK_Customer_Id] PRIMARY KEY ([Id]),
CONSTRAINT [UQ_Customer_AccountName] UNIQUE ([AccountName])
)
I want to execute this query concurrently from many applications...
IF NOT EXISTS(SELECT [AccountName] FROM [Customers] WITH (UPDLOCK, HOLDLOCK) WHERE [AccountName] = 'SuperCustomer')
BEGIN
INSERT INTO [Customers] ([AccountName]) VALUES ('SuperCustomer');
END
Would WITH (UPDLOCK, HOLDLOCK) prevent concurrent executions of this query from attempting the insert with the same AccountName value, even when the row does not exist yet, by holding an update lock on the index range for the non-existent data? I want to avoid termination due to a unique constraint violation on AccountName in all cases, whether a user submits the same customer twice at the same time or in high volume maliciously. We're operating with SET XACT_ABORT ON, and this runs inside a transaction at the READ COMMITTED isolation level.
Testing this using two connections, with a WAITFOR DELAY before the INSERT, indicates that it's an effective technique. The HOLDLOCK hint keeps the locks until the end of the transaction (the UPDLOCK acquires a U lock on a KEY resource, which is incompatible with another similar lock).
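A sketch of that two-connection test, assuming both sessions run this batch at roughly the same time (the WAITFOR simply widens the race window; the second session blocks on the key-range lock until the first commits):
BEGIN TRANSACTION
IF NOT EXISTS (SELECT [AccountName]
FROM [Customers] WITH (UPDLOCK, HOLDLOCK)
WHERE [AccountName] = 'SuperCustomer')
BEGIN
WAITFOR DELAY '00:00:05' -- hold the range lock while the other session waits
INSERT INTO [Customers] ([AccountName]) VALUES ('SuperCustomer')
END
COMMIT TRANSACTION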
If the select is not in a transaction, I think the lock will be released immediately.
IF NOT EXISTS( SELECT [AccountName]
FROM [Customers] WITH (UPDLOCK, HOLDLOCK)
WHERE [AccountName] = 'SuperCustomer' )
BEGIN
INSERT INTO [Customers] ([AccountName]) VALUES ('SuperCustomer');
END
This is a single statement, so it is its own transaction:
INSERT INTO [Customers] ([AccountName])
select 'SuperCustomer'
where not exists ( select 1
from [Customers] with (UPDLOCK)
where [AccountName] = 'SuperCustomer' )

How to refactor this deadlock issue?

I ran into a deadlock issue synchronizing a table multiple times in a short period of time. By synchronize I mean doing the following:
Insert data to be synchronized into a temp table
Update existing records in destination table
Insert new records into the destination table
Delete records that are not in the synch table under certain
circumstances
Drop temp table
For the INSERT and DELETE statements, I'm using a LEFT JOIN similar to:
INSERT INTO destination_table (fk1, fk2, val1)
SELECT t.fk1, t.fk2, t.val1
FROM #tmp t
LEFT JOIN destination_table dt ON dt.fk1 = t.fk1
AND dt.fk2 = t.fk2
WHERE dt.pk IS NULL;
The deadlock graph is reporting the destination_table's primary key is under an exclusive lock. I assume the above query is causing a table or page lock instead of a row lock. How would I confirm that?
I could rewrite the above query with an IN, EXISTS, or EXCEPT construct. Are there any additional ways of refactoring the code? Will refactoring with any of these avoid the deadlock issue? Which one would be best? I'm assuming EXCEPT.
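For what it's worth, the EXCEPT rewrite would look roughly like this (a sketch only; it compares the key columns, since EXCEPT over all columns would change the semantics, and by itself it does not close the race window):
INSERT INTO destination_table (fk1, fk2, val1)
SELECT t.fk1, t.fk2, t.val1
FROM #tmp t
JOIN (
SELECT fk1, fk2 FROM #tmp
EXCEPT
SELECT fk1, fk2 FROM destination_table
) AS missing ON missing.fk1 = t.fk1 AND missing.fk2 = t.fk2;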
Well, under normal circumstances I could execute this scenario without problems. Below is the test script I created. Are you trying something else?
drop table #destination_table
drop table #tmp
Declare @x int=0
create table #tmp(fk1 int, fk2 int, val int)
set @x=2
while (@x<1000)
begin
insert into #tmp
select @x,@x,100
set @x=@x+3
end
create table #destination_table(fk1 int, fk2 int, val int)
set @x=0 -- reset the counter, otherwise the second loop never runs
while (@x<1000)
begin
insert into #destination_table
select @x,@x,100
set @x=@x+1
end
INSERT INTO #destination_table (fk1, fk2, val)
select t.*
FROM #tmp t
LEFT JOIN #destination_table dt ON dt.fk1 = t.fk1
AND dt.fk2 = t.fk2
WHERE dt.fk1 IS NULL

Handling max(ID) in a concurrent environment

I am new to web application programming and handling concurrency using an RDBMS like SQL Server. I am using SQL Server 2005 Express Edition.
I am generating employee code in which the last four digits come from this query:
SELECT max(ID) FROM employees WHERE district = 'XYZ';
I am not following how to handle issues that might arise due to concurrent connections. Many users can pick the same max(ID), and while one user clicks "Save Record", that ID might already have been taken by another user.
How to handle this issue?
Here are two ways of doing what you want. The fact that you might still end up with a unique constraint violation on EmpCode I will leave for you to worry about :).
1. Use scope_identity() to get the last inserted ID and use that to calculate EmpCode.
Table definition:
create table Employees
(
ID int identity primary key,
Created datetime not null default getdate(),
DistrictCode char(2) not null,
EmpCode char(10) not null default left(newid(), 10) unique
)
Add one row to Employees. This should be done in a transaction, to be sure you are not left with the random default value from left(newid(), 10) in EmpCode:
declare @ID int
insert into Employees (DistrictCode) values ('AB')
set @ID = scope_identity()
update Employees
set EmpCode = cast(year(Created) as char(4))+DistrictCode+right(10000+@ID, 4)
where ID = @ID
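A sketch of that transaction, with a rollback so a failure doesn't leave the random default EmpCode behind (SQL Server 2005 has no THROW, hence the RAISERROR re-raise):
begin try
begin transaction
declare @ID int
insert into Employees (DistrictCode) values ('AB')
set @ID = scope_identity()
update Employees
set EmpCode = cast(year(Created) as char(4))+DistrictCode+right(10000+@ID, 4)
where ID = @ID
commit transaction
end try
begin catch
if xact_state() <> 0 rollback transaction
declare @msg nvarchar(2048)
set @msg = error_message()
raiserror(@msg, 16, 1)
end catch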
2. Make EmpCode a computed column.
Table definition:
create table Employees
(
ID int identity primary key,
Created datetime not null default getdate(),
DistrictCode char(2) not null,
EmpCode as cast(year(Created) as char(4))+DistrictCode+right(10000+ID, 4) unique
)
Add one row to Employees:
insert into Employees (DistrictCode) values ('AB')
It is a bad idea to use MAX, because with a proper locking mechanism, you will not be able to insert rows in multiple threads for the same district.
If it is OK for you to create only one user at a time, and if your tests show that MAX scales even with a lot of users per district, it may be OK to use it.
Long story short: when dealing with identities, you should rely on IDENTITY as much as possible. Really.
But if it is not possible, one solution is to handle IDs in a separate table.
Create Table DistrictID (
DistrictCode char(2),
LastID Int,
Constraint PK_DistrictCode Primary Key Clustered (DistrictCode)
);
Then you increment the LastID counter. It is important that incrementing the ID is a transaction separate from the user-creation transaction if you want to create many users in parallel threads; you can then limit what runs in sequence to the ID generation alone.
The code can look like this:
Create Procedure usp_GetNewId(@DistrictCode char(2), @NewId Int Output)
As
Set NoCount On;
Set Transaction Isolation Level Repeatable Read;
Begin Tran;
Select @NewId = LastID From DistrictID With (XLock) Where DistrictCode = @DistrictCode;
Update DistrictID Set LastID = LastID + 1 Where DistrictCode = @DistrictCode;
Commit Tran;
The Repeatable Read isolation level and the XLOCK hint are the minimum you need to avoid two threads getting the same ID.
If the table does not hold all districts, you will need to change Repeatable Read to Serializable, and fork the Update with an Insert, as sketched below.
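A sketch of that Serializable variant, for the case where a district row may not exist yet (the procedure name here is made up):
Create Procedure usp_GetNewIdOrAdd(@DistrictCode char(2), @NewId Int Output)
As
Set NoCount On;
Set Transaction Isolation Level Serializable;
Begin Tran;
Select @NewId = LastID From DistrictID With (XLock) Where DistrictCode = @DistrictCode;
If @NewId Is Null
Begin
-- No row for this district yet: hand out 1 and seed the counter
Set @NewId = 1;
Insert Into DistrictID (DistrictCode, LastID) Values (@DistrictCode, @NewId + 1);
End
Else
Update DistrictID Set LastID = LastID + 1 Where DistrictCode = @DistrictCode;
Commit Tran;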
This can be done through Transaction Isolation Levels. For instance, if you specify SERIALIZABLE as the level then other transactions will be blocked so that you aren't running into this problem.
If I did not understand your question correctly, please let me know.

SQL Server 2005 How can I set up an audit table that records the column name updated?

given this table definition
create table herb.app (appId int identity primary key
, application varchar(15) unique
, customerName varchar(35),LoanProtectionInsurance bit
, State varchar(3),Address varchar(50),LoanAmt money
,addedBy varchar(7) not null,AddedDt smalldatetime default getdate())
I believe changes will be minimal, usually only a single field, and very sparse.
So I created this table:
create table herb.appAudit(appAuditId int primary key
, field varchar(20), oldValue varchar(50),ChangedBy varchar(7) not null,AddedDt smalldatetime default getdate())
How, in a trigger, can I get the name of the column whose value changed, so I can store it? I already know how to get the value by joining against the deleted table.
Use the inserted and deleted tables. Nigel Rivett wrote a great generic audit trail trigger using these tables. It is fairly complex SQL code, but it highlights some pretty cool ways of pulling together the information and once you understand them you can create a custom solution using his ideas as inspiration, or you could just use his script.
Here are the important ideas about the tables:
On an insert, inserted holds the inserted values and deleted is empty.
On an update, inserted holds the new values and deleted holds the old values.
On a delete, deleted holds the deleted values and inserted is empty.
The structure of the inserted and deleted tables (if not empty) is identical to that of the target table.
You can determine the column names from system tables and iterate on them as illustrated in Nigel's code.
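For example, a minimal sketch of pulling the column list for the question's table out of the catalog (sys.columns exists in SQL Server 2005):
select c.name
from sys.columns c
where c.object_id = object_id('herb.app')
order by c.column_id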
if exists (select * from inserted)
if exists (select * from deleted)
-- this is an update
...
else
-- this is an insert
...
else
-- this is a delete
...
-- For updates to a specific field
SELECT d.[MyField] AS OldValue, i.[MyField] AS NewValue, system_user AS [User]
FROM inserted i
INNER JOIN deleted d ON i.[MyPrimaryKeyField] = d.[MyPrimaryKeyField]
-- For your table
SELECT d.customerName AS OldValue, i.customerName AS NewValue, system_user AS [User]
FROM inserted i
INNER JOIN deleted d ON i.appId = d.appId
If you really need this kind of auditing in a way that's critical to your business look at SQL Server 2008's Change Data Capture feature. That feature alone could justify the cost of an upgrade.
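For what it's worth, enabling CDC once you are on 2008 (Enterprise-class editions) is a couple of system procedure calls; the schema and table names below are the question's:
-- run once per database
EXEC sys.sp_cdc_enable_db;
-- then once per table to track
EXEC sys.sp_cdc_enable_table
@source_schema = N'herb',
@source_name = N'app',
@role_name = NULL; -- NULL means no gating role is needed to read the changes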
Something like this, for each field you want to track:
if UPDATE(Track_ID)
begin
insert into [log].DataChanges
(
dcColumnName,
dcID,
dcDataBefore,
dcDataAfter,
dcDateChanged,
dcUser,
dcTableName
)
select
'Track_ID',
d.Data_ID,
coalesce(d.Track_ID,-666),
coalesce(i.Track_ID,-666),
getdate(),
@user,
@table
from inserted i
join deleted d on i.Data_ID=d.Data_ID
and coalesce(d.Track_ID,-666)<>coalesce(i.Track_ID,-666)
end
'Track_ID' is the name of the field, and d.Data_ID is the primary key of the table you're tracking. @user is the user making the changes, and @table is the table you're tracking changes in, in case you're logging more than one table to the same log table.
Here's my quick and dirty audit table solution. (from http://freachable.net/2010/09/29/QuickAndDirtySQLAuditTable.aspx)
CREATE TABLE audit(
[on] datetime not null default getutcdate(),
[by] varchar(255) not null default system_user+','+AppName(),
was xml null,
[is] xml null
)
CREATE TRIGGER mytable_audit ON mytable for insert, update, delete as
INSERT audit(was,[is]) values(
(select * from deleted as [mytable] for xml auto,type),
(select * from inserted as [mytable] for xml auto,type)
)
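And a quick way to read it back, shredding one tracked column out of the XML (the customerName attribute is just an example; for xml auto emits each column as an attribute):
select [on], [by],
was.value('(/mytable/@customerName)[1]', 'varchar(35)') as oldValue,
[is].value('(/mytable/@customerName)[1]', 'varchar(35)') as newValue
from audit
order by [on] desc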
