Cause of a process being a deadlock victim - sql-server

I have a process with a SELECT that takes a long time to finish, on the order of 5 to 10 minutes. I am currently not using NOLOCK as a hint to the MS SQL database engine. At the same time, we have another process doing updates and inserts into the same database and the same tables. The first process has recently started to end prematurely with a message:
SQLEXCEPTION: Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim.
This first process is running at other sites under identical conditions, but with smaller databases, so the SELECT statement in question takes a much shorter time (on the order of 30 seconds or so), and I don't get the deadlock message at those sites. I also did not get this message at the problem site initially; I assume that as the database has grown, I must have crossed some threshold. Here are my questions:
Could the time it takes for a transaction to execute make the associated process more likely to be flagged as a deadlock victim?
If I execute the select with a NOLOCK hint, will this remove the problem?
I suspect that a datetime field that is checked as part of the WHERE clause in the select statement is causing the slow lookup time. Can I create an index based on this field? Is it advisable?

Q1. Could the time it takes for a transaction to execute make the associated process more likely to be flagged as a deadlock victim?
No. The SELECT is the victim because it has only read data; the transaction therefore has a lower rollback cost associated with it, so it is chosen as the victim:
By default, the Database Engine chooses as the deadlock victim the session running the transaction that is least expensive to roll back. Alternatively, a user can specify the priority of sessions in a deadlock situation using the SET DEADLOCK_PRIORITY statement. DEADLOCK_PRIORITY can be set to LOW, NORMAL, or HIGH, or alternatively can be set to any integer value in the range (-10 to 10).
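If the long-running SELECT should survive a deadlock at the writer's expense, its session can say so explicitly. A minimal sketch; the table and columns are hypothetical stand-ins for the real report query:

-- Make this session less likely to be chosen as the deadlock victim.
-- LOW, NORMAL, HIGH, or any integer from -10 to 10 are accepted.
SET DEADLOCK_PRIORITY HIGH;

-- The long-running report query (hypothetical names).
SELECT OrderId, OrderDate, Total
FROM dbo.Orders
WHERE OrderDate >= '2013-01-01';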
Q2. If I execute the select with a NOLOCK hint, will this remove the problem?
No. For several reasons:
you should first try to eliminate the deadlock properly, by investigating its root cause
dirty reads are inconsistent reads
the proper way to request dirty reads is to set the transaction isolation level
there is a much better solution: read committed snapshot (see the sketch below)
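To make the last two points concrete, here is a minimal sketch of both options; MyDatabase is a placeholder name:

-- If you truly want dirty reads, say so at the session level instead
-- of scattering NOLOCK hints across individual queries.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Usually better: let readers see the last committed version of each
-- row instead of blocking behind writers (run once per database).
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;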
Q3. I suspect that a datetime field that is checked as part of the WHERE clause in the select statement is causing the slow lookup time. Can I create an index based on this field? Is it advisable?
Probably. The cause of the deadlock is very likely a poorly indexed database. Queries that take 10 minutes are acceptable in such narrow conditions that I'm certain it is not acceptable in your case.
With 99% confidence I declare that your deadlock is caused by a large table scan conflicting with updates. Start by capturing the deadlock graph to analyze the cause. You will very likely have to optimize the schema of your database. Before you make any modification, read the topic Designing Indexes and its sub-articles.
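If the deadlock graph confirms a scan driven by the datetime filter, indexing that column is the usual first step. A hedged sketch, since the real table and column names are unknown:

-- Hypothetical names: dbo.Orders and OrderDate stand in for the real
-- table and the datetime column used in the WHERE clause.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);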

Here is how this particular deadlock problem actually occurred and how it was resolved. This is a fairly active database with 130K transactions occurring daily. The indexes on the tables in this database were originally clustered. The client requested that we make the indexes nonclustered. As soon as we did, the deadlocking began. When we reestablished the indexes as clustered, the deadlocking stopped.

The answers here are worth a try, but you should also review your code. Specifically have a read of Polyfun's answer here:
How to get rid of deadlock in a SQL Server 2005 and C# application?
It explains the concurrency issue, and how the usage of "with (updlock)" in your queries might correct your deadlock situation - depending really on exactly what your code is doing. If your code does follow this pattern, this is likely a better fix to make, before resorting to dirty reads, etc.
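The pattern that answer describes is a read-then-update inside one transaction: taking the update lock on the initial read stops two sessions from acquiring shared locks and then deadlocking when both try to convert them. A sketch under assumed names (dbo.Stock, ProductId, Quantity are hypothetical):

DECLARE @productId int = 42;
DECLARE @qty int;

BEGIN TRANSACTION;

-- Take the update lock up front: a second session running the same code
-- blocks here instead of deadlocking later on lock conversion.
SELECT @qty = Quantity
FROM dbo.Stock WITH (UPDLOCK)
WHERE ProductId = @productId;

UPDATE dbo.Stock
SET Quantity = @qty - 1
WHERE ProductId = @productId;

COMMIT TRANSACTION;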

Although @Remus Rusanu's is already an excellent answer, in case one is looking for deeper insight into SQL Server's deadlock causes and tracing strategies, I would suggest reading Brad McGehee's How to Track Down Deadlocks Using SQL Server 2005 Profiler.

Related

SQL Server Deadlocks Process

I have two questions.
First, I want to confirm whether a deadlock may happen if one session is querying a table that is locked by another session.
Second, how do I resolve the above-mentioned SQL error when more than one computer is accessing the MSSQL Server for actions such as update and delete?
A deadlock can occur when a transaction that has started to change data conflicts with another transaction over acquiring an exclusive lock. A block, even a very long one, does not necessarily lead to a deadlock.
It is possible to eradicate all deadlocks using the banker's algorithm, but this cripples any concurrent access to the database, leaving you with effectively only one user at a time!
However, we can reduce the occurrence of deadlocks by:
modeling the database correctly, i.e. respecting normal forms
indexing tables for reading as well as for updating
reducing the isolation level of transactions
switching from pessimistic locking (the default in SQL Server) to optimistic locking
avoiding unnecessary transactions
reducing the number of statements in an explicit transaction
changing the sequence of processing logic within the transaction
See the link below on using WITH hints: for SELECT queries use WITH (NOLOCK); for UPDATE, DELETE, and INSERT use WITH (XLOCK).
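As an illustration of the hint syntax this answer refers to (the table name is hypothetical; note that NOLOCK gives dirty reads, with the consistency trade-offs discussed earlier):

-- Reader: takes no shared locks, so it never blocks writers,
-- but it may see uncommitted rows.
SELECT OrderId, Total
FROM dbo.Orders WITH (NOLOCK);

-- Writer: take an exclusive lock up front rather than escalating later.
UPDATE dbo.Orders WITH (XLOCK)
SET Total = Total * 1.1
WHERE OrderId = 42;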

What is the proper way to run a long query against an active database?

We are using SQL Server 2012 EE but currently do not have the option to run queries on a R/O mirror, though that is my long-term goal. I am also concerned I may run into the issue below in that scenario, since the mirror would also be updating the data I am querying.
I have a view that joins across several tables from two databases and is used for invoicing from existing data. Three of these tables are also actively updated by ongoing transactions. Running a report that uses this view did not use to be a problem, but now our database is getting much larger and I have run into some timeout problems. First the query was timing out, so I set the command timeout to 0 and reran the query, which pegged all 4 CPUs at 100% for 90 minutes before I killed it. There were no problems with active transactions during that time. I reviewed the query and found a field I was joining on that was not indexed, so I created an index on that field and reran the report, which then finished in three minutes with all the CPUs busy but not pegged. The same amount of data was queried both times. I figured the problem was solved. Of course, later my boss ran a similar query, perhaps with some more data but probably not a lot more, and our live transactions started timing out 100% of the time while his query was running. I did not get a chance to see the CPU usage during that time.
So my questions are two:
Given that I have to use the live and active database, what is the proper way to run a long R/O query so that active transactions can still continue? I am considering NOLOCK but am hoping there is a better standard practice.
And what might cause SQL Server to peg 4 CPUs at 100% without causing live-transaction timeouts, yet when my boss ran his query (after I added the index and my own query ran much better), the live update transactions started timing out 100% of the time?
I know this is not a lot of info to go on. I'm not very familiar with SQL profiling and performance monitoring, yet this behavior seems rather odd, and I am hoping there is a best practice that would be the correct workaround.
The default behavior of SELECT queries in the READ_COMMITTED transaction isolation level is to acquire shared locks during query execution to provide the requested data consistency (read committed data only). These locks are typically row-level and are released quickly during query execution, immediately after each row is read. There are also less granular intent locks at the page and table level that prevent concurrent updates to data as it is being read. Depending on the particulars of the execution plan, there may even be shared locks held at the table level for the duration of the query, which will prevent updates to the table during query execution and result in readers blocking writers.
Setting the READ_COMMITTED_SNAPSHOT database option causes SQL Server to use row versioning instead of locking to provide the same read consistency. A row version store is maintained in tempdb so that when a row requested by the query has changed since the query began, the most recent committed row version is returned instead. This row-versioning behavior avoids locking and effectively provides a statement-level snapshot of the database at the time the query began. Readers do not block writers and writers do not block readers. Do not confuse the READ_COMMITTED_SNAPSHOT database option with the SNAPSHOT isolation level (a common mistake).
The downside of setting READ_COMMITTED_SNAPSHOT is additional resource usage. An additional 14 bytes of storage overhead for each row is incurred once the database option is enabled. Updates and deletes will generate row versions in tempdb. These versions require tempdb space for the duration of the longest-running query, and there is overhead in maintaining the version store. Also consider whether you have existing applications that depend on readers-block-writers locking behavior. Despite this overhead, the concurrency benefits may yield better overall performance depending on your workload, while providing read integrity. See http://technet.microsoft.com/en-us/library/ms188277.aspx for more information.
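To make the distinction the previous paragraph warns about concrete, here is a minimal sketch; MyDatabase is a placeholder:

-- READ_COMMITTED_SNAPSHOT: statement-level row versioning; it changes
-- the behavior of the default READ COMMITTED level for every session.
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- SNAPSHOT isolation: transaction-level versioning; it must be allowed
-- on the database AND requested explicitly by each session.
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;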
Actually, I decided to create a snapshot at the beginning of each month for reporting to run against, and to delete it when it is no longer needed. This seems to work fine. I could do something similar with a database restore, but that is slightly more work. This avoids needing a second SQL EE license and lets me run reports without locking tables used by live transactions.
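A sketch of that approach, with hypothetical database and file names (a database snapshot is a read-only, point-in-time view of its source, so reporting queries against it never block live transactions):

-- Create the monthly reporting snapshot.
CREATE DATABASE Sales_Reporting_201306
ON (NAME = Sales_Data,                          -- logical name of the source data file
    FILENAME = 'D:\Snapshots\Sales_201306.ss')  -- sparse file backing the snapshot
AS SNAPSHOT OF Sales;

-- Drop it once the reporting cycle is over.
DROP DATABASE Sales_Reporting_201306;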

Table partitioning in sql server - how do you ensure it wins a deadlock over queries?

The bane of my existence is when the MERGE on a table partition gets deadlocked and eventually loses.
I've tried setting the deadlock_priority to high.
I have code that kills each query hitting the tables beforehand, but sometimes I still get a query in there during those milliseconds.
I'm at a loss. Is there a good way to do this?
You must use hints (XLOCK, HOLDLOCK) in order to lock the objects that are used by the MERGE statement.
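A sketch of such a hinted MERGE; the table and column names are hypothetical:

-- HOLDLOCK keeps the affected range locked for the whole statement;
-- XLOCK takes it exclusively, so concurrent queries wait instead of
-- interleaving lock acquisition and deadlocking.
MERGE dbo.FactSales WITH (XLOCK, HOLDLOCK) AS target
USING dbo.FactSales_Staging AS source
    ON target.SaleId = source.SaleId
WHEN MATCHED THEN
    UPDATE SET target.Amount = source.Amount
WHEN NOT MATCHED THEN
    INSERT (SaleId, Amount) VALUES (source.SaleId, source.Amount);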
My advice is to start SQL Profiler with the Deadlock Graph event, try to reproduce the deadlock with a tool (I use this tool), and then figure out why it is happening. Read this as initial information.

Can I be a deadlock victim if I'm not executing a query within a transaction?

Let's say I open a transaction and run update queries.
BEGIN TRANSACTION
UPDATE x SET y = z WHERE w = v
The query returns successfully and the transaction stays open deliberately for a period of time before I decide to commit.
While I'm sitting on the transaction, is it ever possible that the MSSQL deadlock machinery would preempt my open transaction, which is not actually executing anything, to either clear a deadlock or free resources as system memory/resource limits are reached?
I know about SET DEADLOCK_PRIORITY and have read the MSDN articles on the topic of deadlocks. Logically, since I'm not actively seeking to stake a claim on any additional resources, I can't imagine a scenario that would trigger a sane deadlock-avoidance algorithm.
Does anyone know for sure whether simply holding locks can make me a valid target? Similarly, could any low-resource condition trigger the killing of my SPID?
NO
For a deadlock to occur all the participants in the deadlock chain must be waiting for a resource (a lock). If your connection is idle it means it doesn't execute a request, which implies it cannot be waiting.
As for other conditions that can kill your session I can think of at least three:
administrative operations that use WITH ROLLBACK_IMMEDIATE
a mirroring failover
intentional KILL <yourspid>, perhaps as a joke by your friendly DBA
To answer your question: you can be a deadlock victim if you're not executing a query in a transaction.
It's counter-intuitive, but you can be a deadlock victim by running a SELECT statement.
It can happen if you're running a query that uses an index:
you scan an index looking for matching rows
another process starts updating data pages
you now want to fetch the data pages for the matching rows
the other process is holding locks on those data pages
you wait for the data-page locks to be released
the other process finishes updating the data pages and now wants to update the index
you are holding read locks on the index
the other process waits for the index locks to be released
DEADLOCK
So, strictly speaking, you can be a deadlock victim, when you're not executing a query in a transaction. The other guy wasn't executing his UPDATE statement in a transaction either.
Nobody's explicitly using a transaction, yet there's a deadlock.
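One common way to break this particular cycle is to make the nonclustered index covering, so the SELECT is satisfied from the index alone and never touches the data pages the writer has locked. A hedged sketch with hypothetical names:

-- INCLUDE places the selected columns in the index leaf, eliminating
-- the lookup into the data pages (table and columns are hypothetical).
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
    ON dbo.Orders (OrderDate)
    INCLUDE (CustomerId, Total);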
Possible problems:
SQL Server only has a finite number of locks. It is possible to run out of locks.
Other resources are finite (e.g. memory, tempdb). Holding on to these resources could cause those resources to run out.
Transaction logs - the logical transaction log cannot be freed for re-use while a transaction is open. The result could be a log that fills up, and that would not just stop your process: it would halt all writes to the database. (The query below shows what is currently holding up log reuse.)
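A small diagnostic sketch for that last point; sys.databases is a standard catalog view:

-- Shows, per database, what is currently preventing transaction-log
-- space from being reused; ACTIVE_TRANSACTION means an open
-- transaction is pinning the log.
SELECT name, log_reuse_wait_desc
FROM sys.databases;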
To consider:
CASCADE: a DELETE may name only one table in the command, but a CASCADE relationship may touch other tables.
Triggers: triggers on the modified table may affect other tables.
DELETE and UPDATE commands may use the FROM clause, which touches other tables. I've never seen this, but I would not rule it out.
Transactions may time out; is that what is happening?
As you have at least one (or more) update locks taken out, and maybe some read and table-scan locks, you may be killed to help free up deadlocks created by other transactions. The deadlock recovery code in SQL Server is unlikely to be totally bug-free, and it is not normal to keep a transaction open for a long time on SQL Server. However, I would not expect that to happen often.
Some systems, when they detect deadlock-type problems, just start killing "long-lived" transactions that have not done much work, so as to free up locks. Just because you are not part of the deadlock loop does not stop the system from picking on you.
To understand what is going on in your case, you will have to use the SQL Server Profiler to collect all the locking and deadlock related events, as well as events about aborted connections and transactions, etc. Good luck; this will take some time and a good level of understanding of the profiler events you are looking at...
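On newer versions, a lighter-weight alternative to Profiler is to read past deadlock graphs from the built-in system_health Extended Events session. A sketch, assuming SQL Server 2008 or later with the default ring_buffer target:

-- Pull recent deadlock graphs out of the system_health session.
SELECT xed.event_data.query('(data/value/deadlock)[1]') AS DeadlockGraph
FROM (
    SELECT CAST(st.target_data AS xml) AS target_data
    FROM sys.dm_xe_session_targets AS st
    JOIN sys.dm_xe_sessions AS s
        ON s.address = st.event_session_address
    WHERE s.name = 'system_health'
      AND st.target_name = 'ring_buffer'
) AS src
CROSS APPLY src.target_data.nodes(
    'RingBufferTarget/event[@name="xml_deadlock_report"]') AS xed(event_data);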
The details of this sort of thing differ between database vendors and versions of their databases. However, as most database vendors consider it bad design to keep a transaction open for a long time, doing so tends to lead to problems and to hit code paths that have not had the most testing effort.
Just because you're not in a transaction doesn't mean you're not holding locks.

Concurrency issues

Here's my situation (SQL Server):
I have a web application that utilizes nHibernate for data access, and another 3 desktop applications. All access the same database, and are likely to utilize the same tables at any one time.
Now, with the help of NH I'm batching selects in order to load an aggregate with all of its hierarchy - so I would see 4 to maybe 7 selects being issued at once (not sure if it matters).
Every few days one of the applications will get a "Transaction has been chosen as the deadlock victim" error (this usually appears on a select).
I tried changing to snapshot isolation on the database, but that didn't help - I was ending up with:
Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table '...' directly or indirectly in database '...' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.
What suggestions do you have for this situation? What should I try, or what should I read, in order to find a solution?
EDIT:
Actually there's no RAID in there :). The number of users per day is small (I'd say 100 per day, with hundreds of small orders on a busy day); the database is a bit bigger, at about 2GB, and growing faster every day.
It's a business app, that handles orders, emails, reports, invoices and stuff like that.
Lazy loading would not be an option in this case.
I guess taking a very close looks at those indexes is my best bet.
Deadlocks are complicated. A deadlock means that at least two sessions have locks and are waiting for one another to release a different lock; since both are waiting, the locks never get released, neither session can continue, and a deadlock occurs.
In other words, A has lock X, B has lock Y, now A wants Y and B wants X. Neither will give up the lock they have until they are finished with their transaction. Both will wait indefinitely until they get the other lock. SQL Server sees that this is happening and kills one of the transactions in order to prevent the deadlock. Snapshot isolation won't help you - the DB still needs to preserve atomicity of transactions.
There is no simple answer anyone can give as to why a deadlock would be occurring. You'll need to profile your application to find out.
Start here: How to debug SQL deadlocks. That's a good intro.
Next, look at Detecting and Ending Deadlocks on MSDN. That will give you a lot of good background information on why deadlocks occur, and help you understand what you're looking at/for.
There are also some previous SO questions that you might want to look at:
Diagnosing Deadlocks in SQL Server 2005
Zero SQL deadlock by design
Or, if the deadlocks are very infrequent, just write some exception-handling code into your application to retry the transaction if a deadlock occurs. Sometimes it can be extremely hard (if not nearly impossible) to prevent certain deadlocks. As long as you write transactionally-safe code, it's not the end of the world; it's completely safe to just try the transaction again.
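The thread's context is C#/NHibernate, but the same retry pattern can be sketched directly in T-SQL (error 1205 is the deadlock-victim error; the UPDATE and the retry count are placeholders, and THROW assumes SQL Server 2012+):

DECLARE @retries int = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- Placeholder for the real transactional work.
        UPDATE dbo.Orders SET Total = Total WHERE OrderId = 42;

        COMMIT TRANSACTION;
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @retries > 1
            SET @retries -= 1;  -- deadlock victim: try again
        ELSE
            THROW;              -- other error, or out of retries: re-raise
    END CATCH;
END;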
Is your hardware properly configured (specifically RAID configuration)? Is it capable of matching your workload?
If hardware is all good and humming, you should ensure you have the 'right' indexes to match your query workload.
Many locking/deadlock problems can be eliminated with the correct indexes (covering indexes can take pressure off the clustered index during inserts).
BTW: turning on snapshot isolation will put increased pressure on your tempDB. How is tempDB configured? RAID 0 is preferred (and even better use an SSD if tempDB is a bottleneck).
While it's not uncommon to find this error in NHibernate sessions with large numbers of users, it seems to be happening too often in your case.
Perhaps your objects are very large, resulting in long-running selects? And if your selects are taking too long, that might indicate problems with your indexes (as Mitch Wheat explains).
If everything is in order, you could also try Lazy Loading to postpone your selects until when you really need your data. This might not be appropriate for your exact situation so you do have to see if it works.
