Updating Identity with DELETE - OUTPUT - INSERT - sql-server

I need to update an identity column in a very specific scenario (most of the time the identity will be left alone). When I do need to update it, I simply need to give it a new value and so I'm trying to use a DELETE + INSERT combo.
At present I have a working query that looks something like this:
DELETE Test_Id
OUTPUT DELETED.Data,
DELETED.Moredata
INTO Test_id
WHERE Id = 13
(This is only an example, the real query is slightly more complex.)
A colleague brought up an important point. She asked whether this won't cause a deadlock, since we are writing to and reading from the same table. Although it works fine in the example (half a dozen rows), in a real-world scenario with tens of thousands of rows it might not.
Is this a real issue? If so, is there a way to prevent it?
I set up an SQL Fiddle example.
Thanks!

My first thought was: yes, it can. And maybe it is still possible, but with this simplified version of the statement it would be very hard to hit a deadlock. You're deleting a single row, for which row-level locks are probably acquired, and the locks required for the delete and the insert are taken in very quick succession.
I did some testing against a table holding a million rows, executing the statement 5 million times on 6 different connections in parallel. I did not hit a single deadlock.
But add the real-life query, a table with indexes and foreign keys, and you just might have a winner. I have encountered deadlock errors with a similar statement:
UPDATE A
SET x=0
OUTPUT INSERTED.ID, 'a' INTO B
So for this statement to complete, SQL Server needs to take locks for the updates on table A, locks for the inserts on table B, and shared (read) locks on table A to validate the foreign key that table B has to table A.
And last but not least, SQL Server decided it would be wise to use parallelism on this particular query, causing the statement to deadlock on itself. To resolve this I simply added a MAXDOP 1 query hint to the statement to prevent parallelism.
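For illustration, a sketch of that hint applied to the example statement above (A and B are the example tables, not the real query):
UPDATE A
SET x = 0
OUTPUT INSERTED.ID, 'a' INTO B
OPTION (MAXDOP 1); -- disable parallelism so the statement cannot deadlock on itself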
There is, however, no definite answer for preventing deadlocks. As they say with SQL Server ever so often: it depends. You could take an exclusive table lock using the TABLOCKX table hint. This will prevent a deadlock, but it's probably not desirable for other reasons.
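Applied to the original delete, that would look something like the sketch below; note it blocks all other access to the table for the duration of the statement:
DELETE Test_Id WITH (TABLOCKX)
OUTPUT DELETED.Data,
       DELETED.Moredata
INTO Test_id
WHERE Id = 13;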

Related

Gather Streams operator before table update causing serial update leading to long running query in SQL Server 2017

I have a long-running stored procedure with a lot of statements. After analyzing it, I identified a few statements that take most of the time. Those statements are all UPDATE statements.
Looking at the execution plan, the query scans the source table in parallel in a few seconds, then passes the rows to a Gather Streams operator, after which the update itself runs serially.
This is somewhat similar to the article below, and we see the same behavior with index creation statements, which also causes slowness.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
The table has 60 million records and is a heap, as we do a lot of data loads, updates and deletes.
Reading the source is not a problem, as it completes in a few seconds, but the actual update, which happens serially, takes most of the time.
A few suggestions to try:
If you have indexes on the target table, dropping them before the load and recreating them afterwards should improve insert performance.
Add a WITH (TABLOCK) hint to the table you are inserting into; this lets SQL Server lock the table exclusively and allows the insert itself to run in parallel (see the sketch after this list).
Alternatively, if that doesn't yield an improvement, try adding an OPTION (MAXDOP 1) hint to the query.
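A minimal sketch of the TABLOCK suggestion, assuming hypothetical dbo.TargetTable and dbo.SourceTable names:
INSERT INTO dbo.TargetTable WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;
-- If TABLOCK does not help, the alternative is to append OPTION (MAXDOP 1) to the statement instead.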
How often do you UPDATE the rows in this heap?
Because, unlike clustered indexes, heaps use a RID to find specific rows. The catch is that (unless you specifically rebuild the heap) when you update a row so that it no longer fits in place, the old location stays behind as a forwarding pointer to the new location, increasing the number of lookups needed every time you touch that row.
I don't really think that is the main factor here, but could you possibly add a clustered index on the table and see how the update times are affected?
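A quick way to check for and remove those forwarding pointers, plus the clustered-index experiment, might look like this (dbo.BigHeap and its ID column are placeholders for the real table):
-- Count forwarded records in the heap
SELECT forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigHeap'), 0, NULL, 'DETAILED');
-- Rebuilding the heap removes the forwarding pointers
ALTER TABLE dbo.BigHeap REBUILD;
-- Or try the clustered-index experiment and compare update times
CREATE CLUSTERED INDEX CX_BigHeap_ID ON dbo.BigHeap (ID);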
Also, I assume you don't have some heavy trigger on the table doing a bunch of work as well, right?
Additionally, since you are referring to an article by Brent Ozar: he does advocate breaking updates into batches of no more than 4,000 rows at a time, both because that has proven to be the fastest and because it stays below the roughly 5,000-lock threshold at which SQL Server escalates to a table-level X lock during updates.
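A minimal batching sketch, assuming a hypothetical dbo.BigHeap table with a Flag column to update:
WHILE 1 = 1
BEGIN
    -- Update at most 4,000 rows per iteration to stay under the lock escalation threshold
    UPDATE TOP (4000) dbo.BigHeap
    SET Flag = 1
    WHERE Flag = 0;

    IF @@ROWCOUNT = 0 BREAK; -- stop once no qualifying rows remain
END;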

Update deadlock issue

I'm using SQL Server 2005 and running into deadlock issues. I've begun reading up on NOLOCK, but I'm not sure that is the correct way to solve my problem. Any assistance would be greatly appreciated.
I have a batch process that is running every 15 seconds. It generates dynamic UPDATE SQL statements based off a list of foreign keys. To over simplify, imagine the below simple SQL statement:
UPDATE dual
SET val1 = #val1
WHERE fk = #fk
;
Remember this example is over simplified, for each foreign key the SQL statement is actually different, but the table it updates and the values are always the same. I cannot just write a single update statement to deal with all the foreign keys at once.
If I run each statement one at a time everything works fine, but I risk going over my 15-second interval. As a silver bullet, I decided to multi-thread the batch application so it would run 25 update statements at once instead of just 1 at a time. After doing this, I began receiving deadlock errors.
How do I solve this deadlock issue? Three things to remember:
1. The batch is the only application that will ever INSERT, UPDATE, or DELETE records from the table in question
2. Every UPDATE statement uses the foreign key in the WHERE clause, so the batch would never access the same record at once
3. If a record gets bad data, the batch would self-correct it on the next run
Instead of your current setup, within your dynamic SQL create a table variable and insert your values into it. Those are all inserts, so you should not have to worry about deadlocks. Then update your table(s) with a single UPDATE by joining the table variable to your real table. This way you hit your actual table with one single update statement.
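A rough sketch of that approach, assuming a hypothetical dbo.TargetTable keyed on fk (the inserted values are examples only):
DECLARE @Pending TABLE (fk INT PRIMARY KEY, val1 INT);
-- Collect the per-foreign-key values first (inserts only, no contention on the real table)
INSERT INTO @Pending (fk, val1) VALUES (1, 10);
INSERT INTO @Pending (fk, val1) VALUES (2, 20);
INSERT INTO @Pending (fk, val1) VALUES (3, 30);
-- Then touch the real table once, joined on the key
UPDATE t
SET t.val1 = p.val1
FROM dbo.TargetTable AS t
JOIN @Pending AS p ON p.fk = t.fk;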

Avoiding Locking Contention on DB2 zOS

I want to place DB2 Triggers for Insert, Update and Delete on DB2 Tables heavily used in parallel online Transactions. The tables are shared by several members on a Sysplex, DB2 Version 10.
In each of the DB2 Triggers I want to insert a row into a central table and have one background process calling a Stored Procedure to read this table every second to process the newly inserted rows, ordered by sequence of the insert (sequence number or timestamp).
I'm very concerned about DB2 Index locking contention and want to make sure that I do not introduce Deadlocks/Timeouts to the applications with these Triggers.
Obviously I would take advantage of DB2 features to reduce locking, like row-level locking, but I still see no really good approach to avoid index contention.
I see three different options to select the newly inserted rows.
1. Put a sequence number in the table and store the last processed sequence number in the background process. I would run the following SELECT statement:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE SEQ_NO > 'last-seq-number'
ORDER BY SEQ_NO;
The locking level must be CS to avoid selecting uncommitted rows that might later be rolled back.
I think I need one Index on the table with SEQ_NO ASC
Pro: The background process only reads rows and makes no updates/deletes (only shared locks)
Con: Index contention because of the ascending key
I can clean up processed records later (e.g. by rolling partitions).
2. Put a status field in the table (processed / unprocessed) and change the SELECT as follows:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE STATUS = 'unprocessed'
ORDER BY TIMESTAMP;
Later I would update the STATUS on the selected rows to "processed"
I think I need an Index on STATUS
Pro: No ascending sequence number in the index and no direct deletes
Con: Concurrent updates by online transactions and the background process
Clean-up would happen in off-hours
3. DELETE the processed records instead of updating the status field.
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
ORDER BY TIMESTAMP;
Since the table contains very few records, no index is required that could create a hot spot.
Also, I think I could SELECT with isolation level UR, because I would detect potentially uncommitted data on the later DELETE of the row.
For a primary key index I could use GENERATE_UNIQUE, which is random and not ascending.
Pro: No Index hot spot and the Inserts can be spread across the tablespace by random UNIQUE_ID
Con: Tablespace scan and sort on every call of the Stored Procedure and deleting records in parallel to the online inserts.
Looking forward to hearing what the community thinks about this problem. This must be a pretty common problem; e.g. SAP should have a similar issue with their batch input tables.
I tend to favour Option 3, because it avoids index contention.
Maybe there is still another solution out there.
I think you are going to have numerous performance problems with your various solutions.
(I know premature optimization is a sin, but experience tells us that some things are just not going to work in a busy system.)
You should be able to use DB2's autoincrement feature to get your sequence number, with little or no performance implication.
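For instance, an identity column on the central table could supply the sequence number (the column names here are placeholders, not the actual schema):
CREATE TABLE CENTRAL_TABLE (
    SEQ_NO     BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
    PAYLOAD    VARCHAR(200),
    CREATED_TS TIMESTAMP NOT NULL WITH DEFAULT
);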
For the rest, perhaps you should look at a queue-based solution.
Have your trigger drop the operation (INSERT/UPDATE/DELETE) and the keys of the row onto an MQ queue.
Then have a long-running background task (in CICS?) do your post-processing. Since it processes one update at a time, you should not trip over yourself. Having a single loaded and active task with the ability to batch up units of work should give you a throughput on the order of three to five hundred updates a second.

Rows locking when running update statement in production

I'm wondering what the correct solution to the problem below is.
I have an UPDATE statement in T-SQL that needs to be run as a daily task. The procedure will update one bit column in one table. Rows affected is around 30,000.
A pseudo version of the T-SQL
UPDATE TABLE_NAME
SET BIT_FIELD = [dbo].[FUNCTION](TABLE_NAME.ID)
WHERE -- THIS ISN'T RELEVANT
The function that determines true or false basically runs a few checks and hits around 3 other tables. Currently the procedure takes about 30 minutes to run and update 30,000 rows in our development environment. I was expecting this to double on production.
The problem I'm having is that the TABLE_NAME table intermittently locks up. If I run the update in batches of 1,000 it seems OK, but if I increase the batch size it appears to run fine at first and then eventually the table locks up. The only resolution is to cancel the query, which results in no rows being updated.
Please note that the procedure is not wrapped in a TRANSACTION.
If I run each update in a separate UPDATE statement would this fix it? What would be a good solution when updating quite a large number of records in a live environment?
Any help would be much appreciated.
Thanks!
In your case, the SQL Server optimizer has probably determined that a table lock is needed to perform the update of your table. You should rework your query so that this table lock does not occur, or has a smaller impact on your users. In practical terms this means: (a) speed up your query and (b) make sure the table will not lock.
Personally I would consider the following:
1. Create clustered and non-clustered indexes on your tables in order to improve the performance of your query.
2. See if it is possible to avoid the function and use joins instead; they are typically a lot faster (a sketch follows this list).
3. Break up the update into multiple parts and perform these parts separately. You might have an 'or' statement in your 'where' clause; that is a good splitting point, but you can also consider creating a cursor to loop through the table and perform the update one record at a time.
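A hedged sketch of suggestion 2; the real checks inside [dbo].[FUNCTION] are unknown, so dbo.LookupTable and its columns are purely hypothetical:
UPDATE t
SET t.BIT_FIELD = CASE WHEN l.SomeCondition = 1 THEN 1 ELSE 0 END
FROM TABLE_NAME AS t
JOIN dbo.LookupTable AS l ON l.ID = t.ID;
-- The idea is to express the function's checks as set-based joins
-- so the optimizer can process all rows in one pass.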

Concurrent update on the same record

I'm facing a strange issue with some TSQL code on SQL2005.
The piece we suspect is generating the issue is:
INSERT INTO SGVdProcessInfo
([StartTs])
VALUES
(GETDATE())
SELECT @IdProcessInfo = SCOPE_IDENTITY()
UPDATE TOP(@quantity)
[SGVdTLogDetail] WITH (ROWLOCK)
SET
[IdSGVdProcessInfo] = @IdProcessInfo
WHERE
[IdSGVdProcessInfo] IS NULL
AND IdTLogDetailStatus != 9
@quantity is usually 500.
There is a non-clustered index over IdSGVdProcessInfo and IdTLogDetailStatus on SGVdTLogDetail
What's happening is that some records of SGVdTLogDetail are first updated with one ID from the processinfo table and later updated again by another process with a new processinfo ID.
I'm wondering if the rowlock hint is raising this issue or maybe there's something else...
My guess is that while the update is being applied to the first 500 selected rows, another process is selecting the next group and picking up some records of the first group that are not yet updated (because of the rowlock). Is this possible?
Any help will be much appreciated!
Yes, that sounds right. You can fix it (at the cost of lost concurrency) by putting the entire operation inside a serializable transaction. That will guarantee that all the rows are locked for the life of the transaction, instead of only during the atomic row-level reads and updates.
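A sketch of that fix, reusing the statements from the question (the DECLAREs are added here only to make the snippet self-contained):
DECLARE @IdProcessInfo INT;
DECLARE @quantity INT;
SET @quantity = 500;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

INSERT INTO SGVdProcessInfo ([StartTs]) VALUES (GETDATE());
SELECT @IdProcessInfo = SCOPE_IDENTITY();

-- The range covered by this WHERE clause stays locked until COMMIT,
-- so a second process cannot claim the same rows in between.
UPDATE TOP(@quantity) [SGVdTLogDetail] WITH (ROWLOCK)
SET [IdSGVdProcessInfo] = @IdProcessInfo
WHERE [IdSGVdProcessInfo] IS NULL
  AND IdTLogDetailStatus != 9;

COMMIT TRANSACTION;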
I believe this is happening because SQL Server is escalating the row-level locks to page locks. You'd think that an UPDATE in which you specify the primary key would always cause a row lock, but when SQL Server gets a batch with a bunch of these and some of them happen to fall on the same page (which, depending on the situation, can be quite likely, e.g. updating all files in a folder where the files were created at pretty much the same time), you'll see page locks, and bad things will happen. And if you don't specify a primary key for an UPDATE or DELETE, the database has no reason to assume only a few rows will be affected, so it probably goes straight to page locks, and bad things happen.
By specifically requesting row-level locks, as you are doing, these problems are avoided; however, in your case lots of rows are affected, and the database takes the initiative and escalates to page locks.
