Can adding a primary key identity column solve deadlock issues? - sql-server

I have a table in SQL server that is CRUD-ed concurrently by a stored procedure running simultaneously in different sessions:
|----------------|---------|
| <some columns> | JobGUID |
|----------------|---------|
The procedure works as follows:
1. Generate a GUID.
2. Insert some records into the shared table described above, marking them with the GUID from step 1.
3. Perform a few updates on all records from step 2.
4. Select the records from step 3 as SP output.
Every select / insert / update / delete statement in the stored procedure has a WHERE JobGUID = @jobGUID clause, so the procedure works only with the records it inserted in step 2. However, sometimes when the same stored procedure runs in parallel on different connections, deadlocks occur on the shared table. Here is the deadlock graph from SQL Server Profiler:
Lock escalations do not occur. I tried adding (UPDLOCK, ROWLOCK) locking hints to all DML statements and/or wrapping the body of the procedure in a transaction and using different isolation levels, but it did not help. Still the same RID lock on the shared table.
After that I discovered that the shared table did not have a primary key or identity column. Once I added one, the deadlocks seem to have disappeared:
alter table <SharedTable> add ID int not null identity(1, 1) primary key clustered
When I remove the primary key column, the deadlocks are back. When I add it back, I cannot reproduce the deadlock anymore.
So, the question is, is a primary key identity column really able to resolve deadlocks or is it just a coincidence?
Update: as @Catcall suggested, I tried creating a natural clustered primary key on the existing columns (without adding an identity column), but I still caught the same deadlock (of course, this time it was a key lock instead of a RID lock).

The best resource (still) for deadlock resolution is here: http://blogs.msdn.com/b/bartd/archive/2006/09/09/deadlock-troubleshooting_2c00_-part-1.aspx.
Point #4 says:
Run the queries involved in the deadlock through Database Tuning Advisor. Plop the query in a Management Studio query window, change db context to the correct database, right-click the query text and select “Analyze Query in DTA”. Don’t skip this step; more than half of the deadlock issues we see are resolved simply by adding an appropriate index so that one of the queries runs more quickly and with a smaller lock footprint. If DTA recommends indexes (it'll say “Estimated Improvement: %”), create them and monitor to see if the deadlock persists. You can select “Apply Recommendations” from the Action drop-down menu to create the index immediately, or save the CREATE INDEX commands as a script to create them during a maintenance window. Be sure to tune each of the queries separately.
I know this doesn't necessarily "answer" the why of the question, but it does show that adding indexes can change execution in ways that make the lock footprint smaller or the execution time shorter, which can significantly reduce the chances of a deadlock.
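As a rough sketch of what such an index could look like for the shared table in the question (the table name placeholder and the INCLUDE column names are hypothetical, since the real columns aren't shown):

-- Hypothetical sketch: a narrow nonclustered index on the filter column so each
-- session seeks only the rows tagged with its own GUID instead of scanning the heap.
CREATE NONCLUSTERED INDEX IX_SharedTable_JobGUID
    ON dbo.SharedTable (JobGUID)
    INCLUDE (SomeColumn1, SomeColumn2);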

I recently came across the post below; based on the information above, I hope it helps:
http://databaseusergroup.blogspot.com/2013/10/deadlocked-on-sql-server.html

Related

SET NULL works in SQLite but does not work in SQL Server when a table references itself

The problem is that "DeleteBehavior.SetNull" works only in SQLite and doesn't work at all in SQL Server. Is this some limitation of SQL Server with SET NULL?
I have the "User" model:
User.Id
User.Name
And I also have the "Partner" model:
Partner.Id
Partner.Title
Partner.ParentId
Partner.Parent (virtual)
Scenario:
I create Partner 1
I create Partner 2 and define that the ParentId is Partner 1 (1 is the father of 2)
I try to delete Partner 1 (I try to delete the parent)
At that moment, SQLite sets the ParentId of Partner 2 to NULL. That's correct, that's the behavior I want, but in SQL Server I can't do that at all; I tried innumerable ways and I run into the errors below:
Errors:
Delete Error:
Microsoft.Data.SqlClient.SqlException (0x80131904): The DELETE statement conflicted with the SAME TABLE REFERENCE constraint "FK_Partners_Partners_ParentId". The conflict occurred in database "master", table "dbo.Partners", column 'ParentId'.
Migrations Error:
Introducing FOREIGN KEY constraint 'FK_Partners_Partners_ParentId' on table 'Partners' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
Could not create constraint or index. See previous errors.
I even found some old texts saying that this is a SQL Server limitation, but it's already 2023 and this limitation still exists? Is it possible to get around it in some way that is easy and affects every table in the database?
I already tried all the DeleteBehavior options and none works like SQLite. I was programming 100% in SQLite and managed to develop a system where everything works, but when generating the migration and trying to use SQL Server I ran into this problem.
The same thing is asked at dba.stackexchange.com. The answers explain in detail why this isn't so easy to implement. Relational databases operate on sets of rows at a time, not individual rows; deleting or updating rows one by one is the slowest way possible.
While SQLite is built to handle a few thousand rows for a single application running inside a watch, SQL Server has to handle thousands of concurrent operations against the same table, which may contain several million rows spread across multiple partitions. A self-referencing ON DELETE SET NULL has to work reliably and predictably both when deleting 1 row in a 1,000-row table and when deleting 10K rows in a 50M-row table.
As Mikael Eriksson explains in the first answer, ON DELETE SET NULL converts a DELETE operation on a table to an UPDATE operation on the same table.
This DBA question on cascading DELETEs shows what's involved in the easy case. In that picture the server:
1. Finds the rows that need to be deleted in the first table.
2. Removes them from the parent table. That means marking rows and pages for deletion and writing records to the transaction log.
3. Spools the deleted keys so they can be used on the related table.
4. Repeats 1-2 on the child table.
When all that finishes, it commits the transaction by committing all changes in the data pages and the transaction log.
And that's a single operation. ON DELETE SET NULL, on the other hand, converts the DELETE operation into a DELETE and an UPDATE on the same table. The database would have to both DELETE and UPDATE index rows on the ParentId index to make this happen, and different kinds of locks would have to be taken.
There's a similar statement that does multiple operations at once: MERGE. Aaron Bertrand's Use Caution with SQL Server's MERGE Statement shows a list of 30 bugs for that statement alone. MERGE isn't even atomic; the UPDATE/DELETE/INSERT operations are executed separately, which is the cause of some of the bugs.
I'd rather not have ON DELETE SET NULL at all than have a slow or unreliable one.
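To illustrate that DELETE-becomes-UPDATE point, here is a minimal sketch of performing the SET NULL step manually against the dbo.Partners table named in the error message (the @ParentToDelete variable is hypothetical):

-- Detach the children first (this is exactly the extra UPDATE the engine would
-- have to perform), then delete the parent; the self-referencing FK no longer blocks it.
DECLARE @ParentToDelete int = 1;
BEGIN TRANSACTION;
    UPDATE dbo.Partners
    SET ParentId = NULL
    WHERE ParentId = @ParentToDelete;

    DELETE FROM dbo.Partners
    WHERE Id = @ParentToDelete;
COMMIT TRANSACTION;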
While trying to reproduce this I found an SQLite limitation: foreign keys aren't enforced by default, for compatibility with the way the library worked over a decade ago. The docs warn this can change in the future:
Foreign key constraints are disabled by default (for backwards compatibility), so must be enabled separately for each database connection. (Note, however, that future releases of SQLite might change so that foreign key constraints enabled by default. Careful developers will not make any assumptions about whether or not foreign keys are enabled by default but will instead enable or disable them as necessary.) The application can also use a PRAGMA foreign_keys statement to determine if foreign keys are currently enabled.
This can seem like an illogical restriction in 2023 until one remembers that SQLite was built to run on the weakest possible devices (microcontrollers, not even processors) where the very fact of checking constraints can cause significant problems. Those devices can easily be inside a car or other hardware device with a lifetime of decades.
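For completeness, the per-connection check the quoted docs mention looks like this on the SQLite side:

-- SQLite: enable foreign key enforcement for this connection, then inspect the setting.
PRAGMA foreign_keys = ON;
PRAGMA foreign_keys;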

Gather Streams operator before table update causing serial update leading to long running query in SQL Server 2017

I have a long-running stored procedure with a lot of statements. After analyzing it, I identified a few statements that are taking most of the time. Those statements are all UPDATE statements.
Looking at the execution plan, the query scans the source table in parallel in a few seconds and then passes the rows to a Gather Streams operator, which feeds the update itself, and the update runs serially.
This is somewhat similar to the post below, and we see the same behavior with index creation statements too, causing slowness.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
The table has 60 million records and is a heap, as we do a lot of data loads, updates and deletes.
Reading the source is not a problem as it completes in a few seconds, but the actual update, which happens serially, is taking most of the time.
A few suggestions to try:
- If you have indexes on the target table, dropping them before the load and recreating them after should improve insert performance.
- Add a WITH (TABLOCK) hint to the table you are inserting into; this lets SQL Server lock the table exclusively and allows the insert itself to run in parallel (see the sketch after this list).
- Alternatively, if that doesn't yield an improvement, try adding a MAXDOP 1 hint to the query.
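A minimal sketch of the TABLOCK suggestion, with placeholder table and column names (none of these are from the original post):

-- With TABLOCK on the target heap, SQL Server takes a single exclusive table lock,
-- which allows the INSERT ... SELECT itself to run in parallel (SQL Server 2016+)
-- and, depending on the recovery model, to be minimally logged.
INSERT INTO dbo.TargetHeap WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;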
How often do you UPDATE the rows in this heap?
Because, unlike clustered indexes, heaps use a RID to find specific rows. The thing is that (unless you specifically rebuild the heap) when you update a row so that it no longer fits in place, the old row location remains where it was and now points to the new location instead (a forwarding pointer), increasing the number of lookups needed each time you touch that row.
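If you want to check whether forwarding pointers are actually piling up, a sketch like this (the table name is a placeholder) shows the count and how to remove them:

-- Count forwarded records in the heap (index_id 0 = heap).
SELECT forwarded_record_count, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigHeap'), 0, NULL, 'DETAILED');

-- Rebuilding the heap removes the forwarding pointers (SQL Server 2008+).
ALTER TABLE dbo.BigHeap REBUILD;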
I don't really think that is what is being hit here, but could you possibly see what happens to the update times if you add a clustered index on the table?
Also, I assume you don't have some heavy trigger on the table doing a bunch of extra work as well, right?
Additionally, since you are referring to an article by Brent Ozar: he advocates breaking updates into batches of no more than 4,000 rows at a time, as that has proven to be the fastest approach and keeps each statement below the roughly 5,000-lock threshold at which an update can escalate to an exclusive table lock.
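A rough sketch of that batching pattern, with hypothetical table and column names:

-- Update at most 4000 rows per statement until nothing is left, keeping each
-- statement's lock count well under the escalation threshold.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (4000) dbo.BigHeap
    SET SomeFlag = 1
    WHERE SomeFlag = 0;      -- only rows not yet touched

    SET @rows = @@ROWCOUNT;  -- stops the loop once a batch updates nothing
END;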

Update deadlock issue

I'm using SQL Server 2005 and running into deadlock issues. I've begun reading up on NOLOCK, but I'm not sure that is the correct way to solve my problem. Any assistance would be greatly appreciated.
I have a batch process that runs every 15 seconds. It generates dynamic UPDATE SQL statements based off a list of foreign keys. To oversimplify, imagine the simple SQL statement below:
UPDATE dual
SET val1 = @val1
WHERE fk = @fk
;
Remember this example is oversimplified: for each foreign key the SQL statement is actually different, but the table it updates and the values are always the same. I cannot just write a single update statement to deal with all the foreign keys at once.
If I run each statement one at a time everything works fine, but I risk going over my 15-second interval. As a silver bullet, I decided to multi-thread the batch application so it would run 25 update statements at once instead of just 1 at a time. After doing this, I began receiving deadlock errors.
How do I solve this deadlock issue? Three things to remember:
1. The batch is the only application that will ever INSERT, UPDATE, or DELETE records from the table in question.
2. Every UPDATE statement uses the foreign key in the WHERE clause, so the batch would never access the same record at once.
3. If a record gets bad data, the batch would self-correct it in the next run.
Instead of your current setup, within your dynamic SQL create a table variable and insert your values into it. These are all inserts, so you should not have to worry about deadlocks. Then update your table(s) with a single UPDATE by joining the table variable to your real table. This way you hit the actual table with only one update statement.
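A minimal sketch of that idea, reusing the simplified dual table from the question; the staged values are hard-coded here, whereas the real batch would build them dynamically:

-- Stage all (fk, new value) pairs, then apply them in one set-based UPDATE
-- instead of 25 concurrent single-row updates.
DECLARE @changes TABLE (fk int PRIMARY KEY, val1 int);

INSERT INTO @changes (fk, val1)          -- SQL Server 2005-compatible inserts
SELECT 1, 10 UNION ALL
SELECT 2, 20 UNION ALL
SELECT 3, 30;

UPDATE d
SET d.val1 = c.val1
FROM dual AS d
JOIN @changes AS c ON c.fk = d.fk;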

Updating Identity with DELETE - OUTPUT - INSERT

I need to update an identity column in a very specific scenario (most of the time the identity will be left alone). When I do need to update it, I simply need to give it a new value and so I'm trying to use a DELETE + INSERT combo.
At present I have a working query that looks something like this:
DELETE Test_Id
OUTPUT DELETED.Data,
DELETED.Moredata
INTO Test_id
WHERE Id = 13
(This is only an example, the real query is slightly more complex.)
A colleague brought up an important point. She asked if this won't cause a deadlock since we are writing to and reading from the same table. Although in the example it works fine (half a dozen rows), in a real-world scenario with tens of thousands of rows it might not.
Is this a real issue? If so, is there a way to prevent it?
I set up an SQL Fiddle example.
Thanks!
My first thought was: yes, it can. And maybe it is still possible, however in this simplified version of the statement it would be very hard to hit a deadlock. You're selecting a single row, for which probably only row-level locks are acquired, plus the locks required for the delete and the insert are acquired very quickly after each other.
I did some testing against a table holding a million rows, executing the statement 5 million times on 6 different connections in parallel, and did not hit a single deadlock.
But add the real-life query, a table with indexes and foreign keys, and you just might have a winner. I've had a similar statement which did cause deadlocks.
I have encountered deadlock errors with a similar statement.
UPDATE A
SET x=0
OUTPUT INSERTED.ID, 'a' INTO B
So for this statement to complete, MSSQL needs to take locks for the updates on table A, locks for the inserts into table B, and shared (read) locks on table A to validate the foreign key that table B has to table A.
And last but not least, MSSQL decided it would be wise to use parallelism on this particular query, causing the statement to deadlock on itself. To resolve this I simply set a "MAXDOP 1" query hint on the statement to prevent parallelism (see the sketch below).
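For reference, a sketch of how that hint attaches to the statement above (same hypothetical tables A and B):

-- Same UPDATE ... OUTPUT shape, with parallelism disabled for this one statement.
UPDATE A
SET x = 0
OUTPUT INSERTED.ID, 'a' INTO B
OPTION (MAXDOP 1);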
There is, however, no definite answer for preventing deadlocks. As they say with MSSQL ever so often: it depends. You could take an exclusive lock using the TABLOCKX table hint. This will prevent the deadlock, however it's probably not desirable for other reasons.
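Applied to the delete from the question, that hint would look roughly like this; the same caveat applies, as it blocks every other session touching the table for the duration of the statement:

-- Exclusive table lock for the duration of the statement: no deadlock, no concurrency.
DELETE Test_Id WITH (TABLOCKX)
OUTPUT DELETED.Data,
       DELETED.Moredata
INTO Test_id
WHERE Id = 13;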

How to efficiently use LOCK_ESCALATION in SQL Server 2008

I'm currently having trouble with frequent deadlocks on a specific user table in SQL Server 2008. Here are some facts about this particular table:
- It has a large number of rows (1 to 2 million).
- All the indexes on this table have only the "use row lock" option ticked.
- Edit: there is only one index on the table, which is its primary key.
- Rows are frequently updated by multiple transactions but are unique (e.g. probably a thousand or more update statements are executed against different unique rows every hour).
- The table does not use partitions.
Upon checking the table in sys.tables, I found that lock_escalation is set to TABLE.
I'm very tempted to set lock_escalation for this table to DISABLE, but I'm not really sure what side effects this would incur. From what I understand, using DISABLE will prevent locks from escalating to the TABLE level, which, combined with the row-lock settings of the indexes, should theoretically minimize the deadlocks I am encountering.
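For reference, the table setting being discussed is changed like this (the table name is a placeholder for the real user table):

-- Prevent escalation to a table lock in most cases (a few documented edge cases remain).
ALTER TABLE dbo.UserTable SET (LOCK_ESCALATION = DISABLE);

-- Check the current setting.
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'UserTable';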
From what I have read in Determining threshold for lock escalation, it seems that locking automatically escalates when a single transaction fetches 5,000 rows.
What does a single transaction mean in this sense? A single session/connection acquiring 5,000 rows through individual update/select statements? Or is it a single SQL update/select statement that fetches 5,000 or more rows?
Any insight is appreciated, btw, n00b DBA here
Thanks
Lock escalation triggers when a statement holds more than 5,000 locks on a SINGLE object. A statement holding 3,000 locks each on two different indexes of the same table will not trigger escalation.
When a lock escalation is attempted and a conflicting lock exists on the object, the attempt is aborted and retried after another 1,250 locks (held, not acquired).
So if your updates are performed on individual rows and you have a supporting index on the column, then lock escalation is not your issue.
You will be able to verify this using the Locks -> Lock:Escalation event in Profiler.
I suggest you capture the deadlock trace to identify the actual cause of the deadlock.
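One way to capture that trace (a sketch, not part of the original answer) is trace flag 1222, which writes XML deadlock reports to the SQL Server error log:

-- Log every deadlock graph server-wide until the flag is turned off again.
DBCC TRACEON (1222, -1);

-- Disable it once the deadlock has been captured.
DBCC TRACEOFF (1222, -1);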
I found this article after a quick Google search on disabling table lock escalation. Although not a real answer for the OP, I think it is still relevant for one-off scripts and noteworthy here. There's a nice little trick you can use to temporarily disable table lock escalation.
Open another connection and issue something like:
BEGIN TRAN
SELECT * FROM mytable (UPDLOCK, HOLDLOCK) WHERE 1=0
WAITFOR DELAY '1:00:00'
COMMIT TRAN
because "Lock escalation cannot occur if a different SPID is currently holding an incompatible table lock" (from the Microsoft KB).
