SQL Server deadlock graph: please explain - sql-server

I am incapable of understanding a deadlock_xml from Azure SQL Server V12. Here is the graph (which is consistent with the underlying XML):
So the rhs process has issued an Update lock and the lhs process, which also wants an Update lock on the same resource, has to wait.
Then the rhs process requests an exclusive lock on the same resource which apparently is blocked due to an Update lock of the lhs process (why? because it has requested one??!).
My question:
Why can't the rhs process convert its U lock to an X lock?
I am trying to understand this at a high level but, nevertheless, here are the specifics:
Both processes were running the same sp
The sp performs an upsert op: Insert where not exists (Select...); if @@ROWCOUNT = 0 Update...
the transactions were serializable.

Both actions are affecting the same table. You see a Key Lock on the primary key index "PK_Product".
I try to put it in an easy example:
A man comes into a room and says: "I'm going to tear down this wall!" and walks out to get his tools. Another one comes in and says: "I'm going to paint this wall!" and walks out to get the color. Now both come back and want to start the work. The one tearing down the wall starts a little earlier. Now for the second man there is no sense in waiting. These processes cannot be serialized. He cannot wait, until the first one is finished. The first man's work changed the base of his work and made it impossible.
For you this means: Both processes say: "We are going to update this table, but we check for a certain condition first". Since an INSERT affects the primary key the second process cannot wait and continue a little later. This process can only be killed and re-started.
You might have a look at the MERGE command, which allows you to perform the upsert in one single go.

Related

Prioritizing Transactions in Google AppEngine

Let's say I need to perform two different kinds of write operations on a datastore entity that might happen simultaneously, for example:
The client that holds a write-lock on the entry updates the entry's content
The client requests a refresh of the write-lock (updates the lock's expiration time-stamp)
As the content-update operation is only allowed if the client holds the current write-lock, I need to perform the lock-check and the content-write in a transaction (unless there is another way that I am missing?). Also, a lock-refresh must happen in a transaction because the client needs to first be confirmed as the current lock-holder.
The lock-refresh is a very quick operation.
The content-update operation can be quite complex. Think of it as the client sending the server a complicated update-script that the server executes on the content.
Given this, if there is a conflict between those two transactions (should they be executed simultaneously), I would much rather have the lock-refresh operation fail than the complex content-update.
Is there a way that I can "prioritize" the content-update transaction? I don't see anything in the docs and I would imagine that this is not a specific feature, but maybe there is some trick I can use?
For example, what happens if my content-update reads the entry, writes it back with a small modification (without committing the transaction), then performs the lengthy operation and finally writes the result and commits the transaction? Would the first write be applied immediately and cause a simultaneous lock-refresh transaction to fail? Or are all writes kept until the transaction is committed at the end?
Is there such a thing as keeping two transactions open? Or doing an intermediate commit in a transaction?
Clearly, I can just split my content-update into two transactions: The first one sets a "don't mess with this, please!"-flag and the second one (later) writes the changes and clears that flag.
But maybe there is some other trick to achieve this with fewer reads/writes/transactions?
Another thought I had was that there are 3 different "blocks" of data: The current lock-holder (LH), the lock expiration (EX), and the content that is being modified (CO). The lock-refresh operation needs to perform a read of LH and a write to EX in a transaction, while the content-update operation needs to perform a read of LH, a read of CO, and a write of CO in a transaction. Is there a way to break the data apart into three entities and somehow have the transactions span only the needed entities? Since LH is never modified by these two operations, this might help avoid the conflict in the first place?
The datastore uses optimistic concurrency control, which means that a (datastore primitive) transaction waits until it is committed, then succeeds only if someone else hasn't committed first. Typically, the app retries the failed transaction with fresh data. There is no way to modify this first-wins behavior.
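The first-wins behavior described above can be sketched in a few lines. This is only an in-memory illustration, not the App Engine API: the `Store` class, its `version` counter, and `run_with_retries` are hypothetical names invented for this example.

```python
import threading


class Store:
    """Hypothetical stand-in for one datastore entity with a version number."""

    def __init__(self, value):
        self._guard = threading.Lock()  # protects only the read/commit steps
        self.value = value
        self.version = 0

    def read(self):
        with self._guard:
            return self.value, self.version

    def commit(self, new_value, read_version):
        # The commit succeeds only if nobody committed since our read.
        with self._guard:
            if self.version != read_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def run_with_retries(store, update, attempts=5):
    """Re-read fresh data and try again whenever the commit loses the race."""
    for _ in range(attempts):
        value, version = store.read()
        if store.commit(update(value), version):
            return True
    return False
```

Note that nothing in this scheme lets one writer take priority over another: whichever transaction reaches `commit` first wins, and the loser has to retry.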
It might help to know that datastore transactions are strongly consistent, so a client can first commit a lock refresh with a synchronous datastore call, and when that call returns, the client knows for sure whether it obtained or refreshed the lock. The client can then proceed with its update and lock clear. The case you describe where a lock refresh and an update might occur concurrently from the same client sounds avoidable.
I'm assuming you need the lock mechanism to prevent writes from other clients while the lock owner performs multiple datastore primitive transactions. If a client is actually only doing one update before it releases the lock and it can do so within seconds (well before the datastore RPC timeout), you might get by with just a primitive datastore transaction with optimistic concurrency control and retries. But a lock might be a good idea for simple serialization of, say, edits to a record in a user interface, where a user hits an "edit" button in a UI and you want that to guarantee that the user has some time to prepare and submit changes without the record being changed by someone else. (Whether that's the user experience you want is your decision. :) )

Dealing with race condition in transactional database table

Let me lay the scenario out first. Say you have a database for a business app and one of the things it tracks is inventory. The system says you have 5 screws in stock. Say you needed all 5. The system creates an inventory transaction record for -5. After you commit that transaction, since you know you had 5 before and you pulled out 5, if you sum up all the inventory transaction records for that screw the total should be 0. The problem occurs when two people are trying to do this at the same time. Say one person wants 4 and the other wants 2. Both client apps check the quantity beforehand and they are both told 5. At the exact same time one creates a transaction for -4 and the other for -2. This results in a total inventory quantity of -1, which should never be possible because the system should not allow negative inventory.
How would you solve this if you didn't have a server application to help you? I mention that because a server coordinating the inventory transactions is how I would solve it but right now our product has no server application. We just have client apps which talk to a Firebird database directly. I'm trying to figure out how to do this with just the client apps and database. One thing that might help is that Firebird has something called a Generator which is basically a unique number generator that is atomic so you are guaranteed that if you asked Firebird to increment the generator and give you the next number that it will not give anyone else that same number.
My mind was going down the route of trying to create a makeshift record lock using a generator. I thought I could have them both check a "lock" field on the Item table. If it is null, then no one has a lock. If it is non-null it is locked so you need to keep checking back until it is not locked. If there is no lock you ask the generator for a unique number and store that in the locking field for the Item you want to lock. You commit that transaction then go back and check to see if it is indeed the case that the Item table's lock field contains the number you put there. If it does then you have successfully locked and if it doesn't then that means someone was locking it at the same time and you lost the race. Once you are done you null out the lock and the client that is waiting will then see the null, lock it themselves and repeat.
This itself has a race condition I believe though. Trxn1 (transaction 1) checks lock and finds null. Trxn2 checks lock and finds null. Trxn1 gets new lock number from generator. Trxn2 gets new lock from generator. Trxn1 says update Item record with my lock if lock is still null which it is. Trxn1 commits trxn then starts a new Trxn1 and proves the lock contains his lock id and it does so it knows it has permission to make inventory transactions and it starts doing so. Right after Trxn1 checks to see if it got the lock Trxn2 commits its update statement that stored its lock if the lock was null. If Trxn2 executed his update statement before Trxn1 committed the lock then Trxn2 would still see the value as null and the update would occur. If Trxn2's lock commit happens after Trxn1 committed lock and already verified it we have a problem. Trxn1 is making changes to Item transaction table. Trxn2 got his lock committed because the lock was null in its transaction world when it did it and when it commits Trxn2's update statement will overwrite Trxn1's lock because the null check in the update statement happened before both committed, not at the time of commit. So now both think they have a lock and we will end up with negative inventory.
Can anyone think of a way to solve this short of having a server application with some kind of queueing system (FIFO)? I would prefer if it could all be done via clients "talking to the database" to coordinate this but that may not be possible technically speaking. Sorry If this got a bit wordy :D
Solution Edit:
jtahlborn seems to have the right idea. I somehow didn't realize that Firebird does in fact have row level locking. Simple select statements (no joins, group by, etc) can have "with lock" appended to the end of the statement and any row returned by the statement will be locked until the transaction is committed or rolled back. No one else can obtain a lock on that row nor make changes to it. Because I don't want to lock the entire ITEM table while I'm inserting rows into the Item transaction table, I am going to create a table just for locking that has one column (the ItemID field). Because the second transaction will get an error when it tries to take its own lock, it doesn't matter that I am never actually modifying anything on the locking table itself. Failing to get a lock gives me all the information I need. I will put triggers on the insert / delete of the ITEM table so that for every Item record there is also a record in the ITEMLOCK table. Here is the process I'm going to use.
Start database transaction
Attempt to obtain a lock on the ITEMLOCK row with the ItemID of the Item you want to change
If you can't get a lock, keep trying until the record is unlocked
Once locked, verify that the quantity on hand of that Item is enough to cover what you want to take out. Because the client could have old data, this might not be the case, and it will drop out here and message the user
If sufficient quantities exist insert your inventory transaction record in the inventory transaction table
Commit transaction which in turn releases the lock
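The process above can be sketched in memory as follows. This is not Firebird code: a `threading.Lock` per item stands in for the ITEMLOCK row locked with WITH LOCK, a list of signed quantities stands in for the inventory transaction table, and the `Inventory` class and its method names are made up for the illustration.

```python
import threading


class Inventory:
    def __init__(self):
        self.locks = {}         # item_id -> lock, stands in for ITEMLOCK rows
        self.transactions = {}  # item_id -> list of signed quantities

    def add_item(self, item_id, opening_quantity):
        # Mirrors the trigger idea: every item gets a matching lock entry.
        self.locks[item_id] = threading.Lock()
        self.transactions[item_id] = [opening_quantity]

    def on_hand(self, item_id):
        return sum(self.transactions[item_id])

    def withdraw(self, item_id, quantity, timeout=5.0):
        """Return True if the withdrawal was recorded, False otherwise."""
        lock = self.locks[item_id]
        # Try to obtain the lock, giving up after a timeout.
        if not lock.acquire(timeout=timeout):
            return False
        try:
            # Re-check the quantity now that nobody else can change it.
            if self.on_hand(item_id) < quantity:
                return False
            # Record the inventory transaction.
            self.transactions[item_id].append(-quantity)
            return True
        finally:
            # "Commit", which releases the lock.
            lock.release()
```

With 5 screws in stock and two concurrent requests for 4 and 2, whichever request takes the lock second sees the reduced quantity and is refused, so the total never goes negative.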
Note: Matthieu M mentioned the FOR UPDATE clause. It is mentioned in the documentation along with the WITH LOCK clause. As I understand it you can use that when you are locking multiple rows with one statement. I am not one hundred percent sure, but it seems like doing this with WITH LOCK will try an all or nothing approach and FOR UPDATE will lock each one separately one at a time. I am not sure what happens if it locked the first 100 records you asked for but on the 101st record it couldn't get a lock. Does it then release the 100 locks you did get? I will need to lock more than one Item at a time, but I do not feel comfortable with FOR UPDATE since I feel like I don't truly understand the difference. I also probably want to know which Item was already locked for user messaging purposes (going to put a timeout so trxns won't wait forever for a lock) so I will be locking one at a time using WITH LOCK.
Note 2: I want to point out to anyone using this in their own code to be careful. I am going to have a very simple loop when waiting for a lock to be released (is it released yet? how about now? now?). If I had a ton of users possibly trying to lock the same row at the same time there may be a deadlock scenario. Say you have a slow client. That client may always end up with the short end of the stick because every time the lock was released some other client then grabbed it faster than the slow client could. If this happened over and over this would be essentially a deadlock scenario. If I was worried about that I would need a way to figure out who is first in line. In my case, database transactions should be short lived, we never have more than 50 users (not a cloud system), and it is highly unlikely that they all are using this part of the system at the same time trying to modify the exact same Item's inventory quantity.
The simplest solution is to lock some primary row (like the main "item") and use this as your distributed locking mechanism. (assuming your database supports row-level locks, as most modern dbs do).
I recommend reading up about the CAP theorem and how it may be an explanation for the scenario you are describing. EDIT: Having read in more detail, my comment may be of limited use because it seems you already know this and are trying to solve the problem within Firebird.

When does "select for update" lock and unlock?

Here is my pseudo-code:
re = [select **result** from table where **condition**=key for update]
if[re satisfies]
{
delete from table where **condition** = key;
}
commit
I want to ask: if the row whose condition equals "key" has already been deleted, can the lock taken by the "select for update" be released automatically, so that another process that enters at this point and selects the same "key" is not blocked by this one?
Locks are taken during (usually at or near the beginning of) a command's execution. Locks (except advisory locks) are released only when a transaction commits or rolls back. There is no FOR UNLOCK, nor is there an UNLOCK command to reverse the effects of the table-level LOCK command. This is all explained in the concurrency control section of the PostgreSQL documentation.
You must commit or rollback your transaction to release locks.
Additionally, it doesn't really make sense to ask "has this row already been deleted by another concurrent transaction". It isn't really deleted until the transaction that deleted the row commits... and even then, it might've deleted and re-inserted the row or another concurrent transaction might've inserted the row again.
Are you building a task queue or message queue system by any chance, because if so, that problem is solved and you shouldn't be trying to reinvent that unusually complicated wheel. See PGQ, ActiveMQ, RabbitMQ, ZeroMQ, etc. (Future PostgreSQL versions may include FOR UPDATE SKIP LOCKED as this is being tested, but hasn't been released at time of writing).
I suggest that you post a new question with a more detailed description of the underlying problem you are trying to solve. You're assuming that the solution to your problem is "find out if the row has already been deleted" or "unlock the row". That probably isn't actually the solution. It's a bit like someone saying "where do I buy petrol" when their push-bike doesn't go so they assume it's out of fuel. Fuel isn't the problem, the problem is that push bikes don't take fuel and you have to pedal them.
Explain the background. Explain what you're trying to achieve. Above all else, don't post pseudocode, post the actual code you are having problems with, preferably in a self-contained and runnable form.

How to get rid of the deadlock

I've described the problem here:
Deadlock under ReadCommited IL
and got an answer:
So a deadlock can occur when running SELECTs because the transaction running
those selects had acquired write locks before running the SELECT.
Okay, so what can I do to get rid of this? There are some common types of deadlocks which can be solved by adding covering indexes, changing the isolation level, changing the SQL command text, using table hints, etc., but I can't think of a proper solution for my case.
Seems like this is the most common and easiest reason of deadlock:
Process A acquires a lock on resource R1, Process B acquires a lock on resource R2
Process A waits for resource R2 to be released, and process B waits for R1
So this is largely a parallelism problem, and, probably, business logic also.
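The R1/R2 scenario above amounts to a cycle in a "who waits for whom" graph. Here is a minimal sketch of spotting such a cycle, assuming each blocked process waits on exactly one other process; the function name and the dictionaries are illustrative, not how SQL Server's deadlock monitor is implemented.

```python
def find_deadlock(waits_for):
    """Return the list of processes forming a wait cycle, or None.

    waits_for maps each blocked process to the process that holds the
    resource it needs.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a process.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return None


# Process A holds R1 and waits for R2; process B holds R2 and waits for R1.
holders = {"R1": "A", "R2": "B"}
requests = {"A": "R2", "B": "R1"}
waits_for = {proc: holders[res] for proc, res in requests.items()}
cycle = find_deadlock(waits_for)
```

Here `cycle` contains both A and B, so one of them has to be chosen as the victim and rolled back before the other can proceed.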
Maybe I would be able to avoid the deadlock if the locks were applied to rows, but it seems that there are several row locks within a page, then lock escalation occurs, and then I have the whole page locked.
What would you advise? Disable lock escalation? Can I do it locally for one transaction?
Or maybe apply some table hint (WITH ROWLOCK) or something... I don't know.
Changing isolation level to snapshot (or other type) is not an option now.
Thanks.
Fixing deadlocks is mostly a task that is specific to the particular transactions under consideration. There is little general advice to be given (except enable snapshot isolation which you cannot do).
One pattern emerges as a standard fix, though: Acquire all necessary locks in the right order with the right lock-modes. This might mean selecting WITH (UPDLOCK, ROWLOCK, HOLDLOCK) in order to proactively U-lock rows.
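The "acquire locks in the right order" idea can be illustrated outside the database with ordinary thread locks. The sketch below is an analogy only: the resource names, the sorted-name ordering rule, and the helper functions are assumptions made for the example, not T-SQL behavior.

```python
import threading

# Hypothetical resources; sorting by name is the agreed global order.
locks = {"R1": threading.Lock(), "R2": threading.Lock()}
counter = {"value": 0}


def with_resources(names, work):
    """Acquire the named locks in one global (sorted) order, then run work()."""
    ordered = sorted(names)
    for name in ordered:
        locks[name].acquire()
    try:
        return work()
    finally:
        for name in reversed(ordered):
            locks[name].release()


def bump():
    counter["value"] += 1


def worker(names, rounds=1000):
    for _ in range(rounds):
        with_resources(names, bump)


# One worker asks for R1 then R2, the other for R2 then R1. Sorting makes both
# take R1 first, so neither can hold one lock while waiting for the other.
threads = [
    threading.Thread(target=worker, args=(["R1", "R2"],)),
    threading.Thread(target=worker, args=(["R2", "R1"],)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the `sorted` call, the two workers could each grab their first lock and then wait forever for the other's, which is exactly the R1/R2 pattern from the question.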
I haven't seen lock escalation to be the problem because it requires very high amounts of locks to kick in. Of course it might be the reason but most often, row locks are enough to trigger deadlock.

how to solve deadlock problem?

I have read about this deadlock problem: when database tables start accumulating thousands of rows and many users start working on the same table concurrently, SELECT queries on the tables start producing lock contention and transaction deadlocks.
Is this deadlock problem related to TransactNo updlock?
If you know this problem, please let me know.
Thanks in advance.
Deadlocks can occur for many reasons and sometimes troubleshooting deadlocks can be more of an art than a science.
What I use to find and get rid of deadlocks, outside of plain SQL Profiler, is a lightweight tool that gives a graphical depiction of deadlocks as they occur. When you see a deadlock, you can drill down and get valuable information. Deadlock Detector -- http://www.sqlsolutions.com/products/sql-deadlock-detector
It's a simple tool, but for me, it does exactly what it is supposed to do. One thing: the first time I used it, I had to wait 15 minutes for the tool to gather enough metrics to start showing deadlocks.
A common issue with high isolation is lock conversion deadlocks due to the following scenario (where X is any resource, such as a row):
SPID a reads X - gets a read lock
SPID b reads X - gets a read lock
SPID a attempts to update X - blocked by b's read lock, so has to wait
SPID b attempts to update X - blocked by a's read lock, so has to wait
Deadlock! This scenario can be avoided by taking more locks:
SPID a reads X with (UPDLOCK) specified - gets an update lock
SPID b attempts to read X with (UPDLOCK) specified - blocked by a's update lock, so has to wait
SPID a attempts to update X - fine
... (SPID a commits/rolls-back, and releases the lock at some point)
... (SPID b does whatever it wanted to do)
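The two sequences above can be replayed against a small lock compatibility table. This is a toy model, not SQL Server internals: it assumes the documented compatibility of shared (S), update (U) and exclusive (X) locks, that UPDLOCK takes a U lock, and that both sessions read with the same mode. The function names are made up for the example.

```python
# (requested, held) -> True when the request can be granted alongside the
# lock that another session already holds.
COMPATIBLE = {
    ("S", "S"): True,  ("S", "U"): True,  ("S", "X"): False,
    ("U", "S"): True,  ("U", "U"): False, ("U", "X"): False,
    ("X", "S"): False, ("X", "U"): False, ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """A request is granted only if it is compatible with every other holder."""
    return all(COMPATIBLE[(requested, held)] for held in held_by_others)


def simulate(read_mode):
    """Sessions a and b read X in read_mode, then each tries to convert to X."""
    held = {"a": read_mode}  # session a reads first
    b_read_granted = can_grant(read_mode, [held["a"]])
    if b_read_granted:
        held["b"] = read_mode

    def others(session):
        return [mode for name, mode in held.items() if name != session]

    a_can_update = can_grant("X", others("a"))
    b_can_update = b_read_granted and can_grant("X", others("b"))
    return {
        "b_read_granted": b_read_granted,
        "a_can_update": a_can_update,
        "deadlock": b_read_granted and not a_can_update and not b_can_update,
    }


shared_reads = simulate("S")  # both reads granted, both conversions blocked
update_reads = simulate("U")  # b waits at the read, a converts freely
```

In the model, reading with S lets both sessions in and then blocks both conversions, while reading with U makes the second session wait at the read, before it holds anything the first session needs.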
A deadlock can happen for many, many reasons, so you would have to do a little bit of homework first if you want to be helped and tell us what is causing the deadlock, i.e. what the batches involved in the deadlock are executing, what resources are involved, and so on and so forth. The Profiler deadlock event graph is always a great place to start the investigation.
If I'd venture a shot in the dark what happens is that your queries and indexes are not tuned properly so most of your read operations (and perhaps some of the writes) are full table scans and thus are guaranteed to collide with updates. This can cause deadlocks by order of index access, deadlock by order of operations, deadlock by escalation and so on and so forth.
Once you identify the cause of the deadlock then the proper action to remove it can be taken. The cases when the proper action is to resort to dirty reads are extremely rare.
BTW I'm not sure what you mean by 'TransactNo updlock'. Are you specifically asking about the S-U/U-S asymmetry of the U locks?
You have not supplied enough information to answer your question directly.
But most locking and blocking can be reduced (or even eliminated) by having the 'correct' indexes to cover your query workload.
Do you have a regular index maintenance job scheduled?
If you have SELECTs that do not need to be 100% accurate (i.e. allow dirty reads etc.) then you can run some SELECTs with WITH (NOLOCK), which is the same as an isolation level of READ UNCOMMITTED. Please note: I'm not suggesting you place WITH (NOLOCK) everywhere; just on those SELECTs that do not need 100% intact data.
I'll throw my own articles and posts into the mix about deadlocks:
https://www.sqlskills.com/blogs/jonathan/category/deadlock/
I also have a series of videos on troubleshooting deadlocking on JumpstartTv.com as well:
http://jumpstarttv.com/profiles/1379/Jonathan-Kehayias.aspx
Deadlocks can be difficult to resolve, but unless you post your deadlock graph information, there isn't any way we can do more than offer up links to posts and information on solving deadlocks.
"Deadlock Troubleshooting, Part 1"
http://blogs.msdn.com/bartd/archive/2006/09/09/Deadlock-Troubleshooting_2C00_-Part-1.aspx
"When Index Covering Prevents Deadlocks"
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/05/03/when-index-covering-prevents-deadlocks.aspx

Resources