Update deadlock issue - sql-server

I'm using SQL Server 2005 and running into deadlock issues. I've begun reading up on NOLOCK, but I'm not sure that is the correct way to solve my problem. Any assistance would be greatly appreciated.
I have a batch process that runs every 15 seconds. It generates dynamic UPDATE SQL statements based on a list of foreign keys. To oversimplify, imagine the SQL statement below:
UPDATE dual
SET val1 = #val1
WHERE fk = #fk
;
Remember, this example is oversimplified: the SQL statement is actually different for each foreign key, but the table it updates and the values it sets are always the same. I cannot just write a single UPDATE statement to deal with all the foreign keys at once.
If I run each statement one at a time everything works fine, but I risk going over my 15-second interval. As a silver bullet, I decided to multi-thread the batch application so it would run 25 UPDATE statements at once instead of just one at a time. After doing this, I began receiving deadlock errors.
How do I solve this deadlock issue? Three things to remember:
The batch is the only application that will ever INSERT, UPDATE, or DELETE records from the table in question.
Every UPDATE statement uses the foreign key in the WHERE clause, so the batch would never access the same record at once.
If a record gets bad data, the batch would self-correct it in the next run.

Instead of your current setup, within your dynamic SQL create a table variable and insert your values into it. These are all inserts into a table variable, so you should not have to worry about deadlocks. Then update your table with a single UPDATE by joining the table variable to your real table. This way you only hit your actual table with one single UPDATE statement.
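A minimal sketch of that idea, using the simplified table from the question (the @Pending table variable and its contents are assumptions for illustration):

DECLARE @Pending TABLE (fk int PRIMARY KEY, val1 int);

-- All of these are inserts into a session-private table variable,
-- so they cannot deadlock with other sessions.
INSERT INTO @Pending (fk, val1) VALUES (1, 10);
INSERT INTO @Pending (fk, val1) VALUES (2, 20);
-- ... one row per foreign key, produced by the dynamic SQL ...

-- Then hit the real table exactly once.
UPDATE d
SET d.val1 = p.val1
FROM dual AS d
JOIN @Pending AS p ON p.fk = d.fk;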

Related

Gather Streams operator before table update causing serial update leading to long running query in SQL Server 2017

I have a long-running stored procedure with a lot of statements. After analyzing, I identified a few statements that are taking most of the time. Those statements are all UPDATE statements.
Looking at the execution plan, the query scans the source table in parallel in a few seconds and then passes the rows to a Gather Streams operator, which feeds the update that then runs serially.
This is somewhat similar to the article below, and we see the same behavior with index creation statements too, causing slowness.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
The table has 60 million records and is a heap, as we do a lot of data loads, updates and deletes.
Reading the source is not a problem, as it completes in a few seconds, but the actual update, which happens serially, is taking most of the time.
A few suggestions to try:
If you have indexes on the target table, dropping them before and recreating them after should improve insert performance.
Add a WITH (TABLOCK) hint to the table you are inserting into (INSERT INTO [Table] WITH (TABLOCK)); this lets SQL Server lock the table exclusively and allows the insert itself to run in parallel.
Alternatively, if that doesn't yield an improvement, try adding a MAXDOP 1 hint to the query, as sketched below.
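A rough sketch of both suggestions (the table and column names are assumptions, just to show where the hints go):

-- Exclusive table lock on the target allows the insert itself to run in parallel.
INSERT INTO dbo.TargetTable WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;

-- Alternatively, force a serial plan for the statement that misbehaves.
UPDATE dbo.TargetTable
SET Col2 = Col2 + 1
OPTION (MAXDOP 1);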
How often do you UPDATE the rows in this heap?
Because, unlike clustered indexes, heaps use a RID to find specific rows. The thing is that (unless you specifically rebuild the heap) when an update moves a row, the old row location remains where it was and now points to the new location instead (a forwarding pointer), increasing the number of lookups needed each time you touch that row.
I don't really think that is the main factor here, but could you possibly see what happens if you add a clustered index on the table and check how the update times are affected?
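For example, something along these lines (the table and key column names are assumptions):

-- Rebuild the heap in place to remove forwarded records.
ALTER TABLE dbo.BigHeap REBUILD;

-- Or give the table a clustered index so rows are located by key rather than by RID.
CREATE CLUSTERED INDEX CX_BigHeap_Id ON dbo.BigHeap (Id);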
Also, I assume you don't have some heavy trigger on the table doing a bunch of extra work as well, right?
Additionally, since you are referring to an article by Brent Ozar: he does advocate breaking updates into batches of no more than 4,000 rows at a time, as that has proven to be the fastest and stays below the roughly 5,000-lock threshold at which an exclusive table lock (lock escalation) can occur during updates.
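A minimal sketch of that batching pattern (the table, column and predicate are assumptions):

-- Update in chunks of 4,000 rows until nothing is left to update.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (4000) dbo.BigHeap
    SET Flag = 1
    WHERE Flag = 0;   -- the predicate must exclude rows already updated

    SET @rows = @@ROWCOUNT;
END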

Updating Identity with DELETE - OUTPUT - INSERT

I need to update an identity column in a very specific scenario (most of the time the identity will be left alone). When I do need to update it, I simply need to give it a new value and so I'm trying to use a DELETE + INSERT combo.
At present I have a working query that looks something like this:
DELETE Test_Id
OUTPUT DELETED.Data,
DELETED.Moredata
INTO Test_id
WHERE Id = 13
(This is only an example; the real query is slightly more complex.)
A colleague brought up an important point. She asked if this won't cause a deadlock, since we are writing to and reading from the same table. Although in the example it works fine (half a dozen rows), in a real-world scenario with tens of thousands of rows it might not.
Is this a real issue? If so, is there a way to prevent it?
I set up an SQL Fiddle example.
Thanks!
My first thought was: yes, it can. And maybe it is still possible; however, in this simplified version of the statement it would be very hard to hit a deadlock. You're touching a single row, for which row-level locks are probably acquired, and the locks required for the delete and the insert are acquired very quickly after each other.
I did some testing against a table holding a million rows, executing the statement 5 million times on 6 different connections in parallel, and did not hit a single deadlock.
But add the real-life query, a table with indexes and foreign keys, and you just might have a winner. I have encountered deadlock errors with a similar statement:
UPDATE A
SET x=0
OUTPUT INSERTED.ID, 'a' INTO B
So for this statement to complete, SQL Server needs to take locks for the updates on table A, locks for the inserts on table B, and shared (read) locks on table A to validate the foreign key that table B has to table A.
And last but not least, SQL Server decided it would be wise to use parallelism on this particular query, causing the statement to deadlock on itself. To resolve this I simply added a MAXDOP 1 query hint on the statement to prevent parallelism.
There is, however, no definitive answer to preventing deadlocks. As they say with SQL Server ever so often: it depends. You could take an exclusive lock using the TABLOCKX table hint. That will prevent the deadlock, but it's probably not desirable for other reasons.
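For illustration, this is roughly how those two options would look on the simplified statement above (A and B are the tables from that example):

-- Force a serial plan so the statement cannot deadlock with itself.
UPDATE A
SET x = 0
OUTPUT INSERTED.ID, 'a' INTO B
OPTION (MAXDOP 1);

-- Or take an exclusive table lock up front (blocks all other writers).
UPDATE A WITH (TABLOCKX)
SET x = 0
OUTPUT INSERTED.ID, 'a' INTO B;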

Is an update statement in T-SQL really just a Delete and an Insert collectively?

Just out of curiosity:
Does the UPDATE statement delete and re-insert a row in MS SQL Server with the amended values?
The reason I'm asking is that the OUTPUT clause suggests to me that a row is deleted and re-inserted when you use it:
UPDATE dbo.table
SET field1= 'value'
OUTPUT DELETED.field1, INSERTED.field1
WHERE ID = 12345;
GO
If so, would a separate DELETE FROM followed by an INSERT INTO the same table work at more or less the same speed?
Even if it were physically a delete/insert pair, you could never tell. SQL Server gives you the specified behavior; how it physically performs the update is irrelevant to the semantics (it's only relevant to performance).
That said, the DELETED and INSERTED tables are specified just this way so that you can use both old and new values.
Physically, an update is performed in place if possible. If an index must be updated and the key of that index row changes, you get a delete/insert pair. Again, this is undetectable to you.

Can adding a primary key identity column solve deadlock issues?

I have a table in SQL Server that is CRUD-ed concurrently by a stored procedure running simultaneously in different sessions:
|----------------|---------|
| <some columns> | JobGUID |
|----------------|---------|
The procedure works as follows:
1. Generate a GUID.
2. Insert some records into the shared table described above, marking them with the GUID from step 1.
3. Perform a few updates on all records from step 2.
4. Select the records from step 3 as the SP output.
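Purely for illustration, the procedure has roughly this shape (the table and column names other than JobGUID are assumptions):

CREATE PROCEDURE dbo.ProcessJob
AS
BEGIN
    -- Step 1: generate a GUID for this run.
    DECLARE @jobGUID uniqueidentifier = NEWID();

    -- Step 2: insert this run's records, marked with the GUID.
    INSERT INTO dbo.SharedTable (Amount, JobGUID)
    SELECT Amount, @jobGUID
    FROM dbo.StagingSource;

    -- Step 3: update only the rows belonging to this run.
    UPDATE dbo.SharedTable
    SET Amount = Amount + 1
    WHERE JobGUID = @jobGUID;

    -- Step 4: return this run's rows as the procedure output.
    SELECT Amount, JobGUID
    FROM dbo.SharedTable
    WHERE JobGUID = @jobGUID;
END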
Every select / insert / update / delete statement in the stored procedure has a WHERE JobGUID = @jobGUID clause, so the procedure works only with the records it has inserted in step 2. However, sometimes when the same stored procedure runs in parallel in different connections, deadlocks occur on the shared table. Here is the deadlock graph from SQL Server Profiler:
Lock escalations do not occur. I tried adding (UPDLOCK, ROWLOCK) locking hints to all DML statements and/or wrapping the body of the procedure in a transaction and using different isolation levels, but it did not help. Still the same RID lock on the shared table.
After that I discovered that the shared table did not have a primary key / identity column. Once I added one, the deadlocks seem to have disappeared:
alter table <SharedTable> add ID int not null identity(1, 1) primary key clustered
When I remove the primary key column, the deadlocks are back. When I add it back, I cannot reproduce the deadlock anymore.
So, the question is, is a primary key identity column really able to resolve deadlocks or is it just a coincidence?
Update: as @Catcall suggests, I tried creating a natural clustered primary key on the existing columns (without adding an identity column), but I still caught the same deadlock (of course, this time it was a KEY lock instead of a RID lock).
The best resource (still) for deadlock resolution is here: http://blogs.msdn.com/b/bartd/archive/2006/09/09/deadlock-troubleshooting_2c00_-part-1.aspx.
Pt #4 says:
Run the queries involved in the deadlock through Database Tuning Advisor. Plop the query in a Management Studio query window, change db context to the correct database, right-click the query text and select “Analyze Query in DTA”. Don't skip this step; more than half of the deadlock issues we see are resolved simply by adding an appropriate index so that one of the queries runs more quickly and with a smaller lock footprint. If DTA recommends indexes (it'll say “Estimated Improvement: %”), create them and monitor to see if the deadlock persists. You can select “Apply Recommendations” from the Action drop-down menu to create the index immediately, or save the CREATE INDEX commands as a script to create them during a maintenance window. Be sure to tune each of the queries separately.
I know this doesn't "answer" the question of why, necessarily, but it does show that adding indexes can change the execution in ways that make either the lock footprint smaller or the execution time shorter, which can significantly reduce the chances of a deadlock.
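As an illustrative sketch only (the table name comes from the question, the index itself is an assumption), an index that lets each statement seek just its own job's rows keeps the lock footprint small:

-- Each run can then seek straight to its own JobGUID rows,
-- taking far fewer locks than a scan of the whole heap.
CREATE NONCLUSTERED INDEX IX_SharedTable_JobGUID
ON dbo.SharedTable (JobGUID);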
I recently came across the post below; based on the information above, I hope it will help you:
http://databaseusergroup.blogspot.com/2013/10/deadlocked-on-sql-server.html

Rows locking when running update statement in production

I'm wondering what the correct solution to the below is.
I have an UPDATE statement in T-SQL that needs to be run as a daily task. The procedure will update one bit column in one table. Rows affected is around 30,000.
A pseudo version of the T-SQL
UPDATE TABLE_NAME
SET BIT_FIELD = [dbo].[FUNCTION](TABLE_NAME.ID)
WHERE -- THIS ISN'T RELEVANT
The function that determines true or false basically runs a few checks and hits around 3 other tables. Currently the procedure takes about 30 minutes to run and update 30,000 rows in our development environment. I was expecting this to double on production.
The problem I'm having is that the TABLE_NAME table intermittently locks up. If I run the update in batches of 1,000 it seems OK, but if I increase this it appears to run fine at first and then eventually the table locks up. The only resolution is to cancel the query, which results in no rows being updated.
Please note that the procedure is not wrapped in a TRANSACTION.
If I run each update in a separate UPDATE statement would this fix it? What would be a good solution when updating quite a large number of records in a live environment?
Any help would be much appreciated.
Thanks!
In your case, the SQL Server optimizer has probably determined that a table lock is needed to perform the update of your table. You should rework your query so that this table lock either does not occur or has a smaller impact on your users. In practical terms this means: (a) speed up your query and (b) make sure the table does not lock.
Personally I would consider the following:
1. Create clustered and non-clustered indexes on your tables in order to improve the performance of your query.
2. See if it is possible not to use a function but to use joins instead; they are typically a lot faster (see the sketch after this list).
3. Break up the update into multiple parts and perform these parts separately. You might have an OR condition in your WHERE clause; that is a good splitting point, but you can also consider creating a cursor to loop through the table and perform the update one record at a time.
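A rough sketch of point 2 (the joined table and conditions are assumptions, since the real function isn't shown):

-- Set-based version: derive the flag from a join instead of calling
-- [dbo].[FUNCTION] once per row.
UPDATE t
SET t.BIT_FIELD = CASE WHEN c.ID IS NOT NULL THEN 1 ELSE 0 END
FROM TABLE_NAME AS t
LEFT JOIN dbo.CheckTable AS c
    ON c.ID = t.ID
   AND c.SomeCondition = 1
WHERE 1 = 1;   -- replace with the real filter from the procedure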
