Deadlock caused by SELECT JOIN statement with SQL Server

When executing a SELECT statement that joins two tables, SQL Server seems to
lock each table of the statement individually, for example with a query like
this:
SELECT ...
FROM
table1
LEFT JOIN table2
ON table1.id = table2.id
WHERE ...
I found out that the order of the locks depends on the WHERE condition. The
query optimizer tries to produce an execution plan that reads only as many
rows as necessary. So if the WHERE condition contains a column of table1,
it will first get the result rows from table1 and then get the corresponding
rows from table2. If the column is from table2, it will do it the other way
round. More complex conditions or the use of indexes may also affect the
decision of the query optimizer.
When the data read by a statement is updated later in the same transaction
with UPDATE statements, there is no guarantee that the order of the UPDATE
statements matches the order that was used to read the data from the two tables.
If another transaction tries to read data while a transaction is updating the
tables, a deadlock can occur when its SELECT statement runs in between the
UPDATE statements: the SELECT cannot get the lock on the table the UPDATE
already holds, and the UPDATE cannot get the lock on the table the SELECT
already holds. For example:
T1: SELECT ... FROM ... JOIN ...
T1: UPDATE table1 SET ... WHERE id = ?
T2: SELECT ... FROM ... JOIN ... (locks table2, then blocked by lock on table1)
T1: UPDATE table2 SET ... WHERE id = ?
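To make the cycle in this interleaving concrete, here is a minimal Python sketch that uses two `threading.Lock` objects as stand-ins for the locks on table1 and table2. It is a toy model under stated assumptions, not SQL Server behavior: T1 plays the transaction that has already updated table1, T2 plays the SELECT whose plan reads table2 first, and instead of SQL Server's deadlock monitor choosing a victim, each side simply gives up after a timeout. The function name `run_opposite_order` and the timeout value are illustrative.

```python
import threading


def run_opposite_order(timeout=0.2):
    """Toy model: two 'transactions' lock two 'tables' in opposite order.

    Returns {name: whether it managed to get its second lock}.
    """
    table1, table2 = threading.Lock(), threading.Lock()
    holding_first = threading.Barrier(2)  # both sides hold their first lock
    attempted = threading.Barrier(2)      # nobody releases before both have tried
    result = {}

    def txn(name, first, second):
        with first:                                # held until the "transaction" ends
            holding_first.wait()
            got = second.acquire(timeout=timeout)  # SQL Server would wait; we time out
            if got:
                second.release()
            result[name] = got
            attempted.wait()

    threads = [
        threading.Thread(target=txn, args=("T1", table1, table2)),  # UPDATE table1, then table2
        threading.Thread(target=txn, args=("T2", table2, table1)),  # SELECT reads table2 first
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


print(sorted(run_opposite_order().items()))  # [('T1', False), ('T2', False)]
```

Both sides fail: each was waiting for the lock the other one held, which is the cycle SQL Server breaks by choosing one session as the deadlock victim.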
Both tables represent a type hierarchy and are always loaded together, so it
makes sense to load an object using a SELECT with a JOIN. Loading both tables
individually would not give the query optimizer a chance to find the best
execution plan. But since an UPDATE statement can only update one table at a
time, this can cause deadlocks when an object is loaded while the same object
is being updated by another transaction. Updating an object often causes UPDATEs
on both tables, when properties that belong to different types of the
type hierarchy are changed.
I have tried adding locking hints to the SELECT statement, but that does not
solve the problem. It just moves the deadlock into the SELECT statements, when
two of them try to lock the tables and one SELECT statement gets the locks
in the opposite order of the other. Maybe it would be possible to always
load data for updates with the same statement, forcing the locks to be taken
in the same order. That would prevent a deadlock between two transactions that
want to update the data, but it would not prevent a deadlock with a transaction
that only reads data, since such reads need different WHERE conditions.
The only workaround so far seems to be that reads must not take locks
at all. With SQL Server 2005 this can be done using SNAPSHOT ISOLATION. The
only way for SQL Server 2000 would be to use the READ UNCOMMITTED isolation
level.
Is there another way to prevent SQL Server from causing these
deadlocks?

This will never happen under snapshot isolation, where readers do not block writers. Other than that, there is no way to prevent such things. I wrote a lot of repro scripts here: Reproducing deadlocks involving only one table
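Snapshot isolation avoids this deadlock because a reader never waits for a writer's lock: it reads the last version that was committed before its transaction started. The sketch below is a hypothetical, heavily simplified Python model of that idea (one row holding a list of committed versions); it is not how SQL Server's version store is implemented. In SQL Server itself the feature is enabled with `ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON` and used with `SET TRANSACTION ISOLATION LEVEL SNAPSHOT`.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRow:
    """Toy multi-version row: committed writes are appended as (commit_ts, value).

    Uncommitted changes are assumed to stay private to the writer, so only
    committed versions ever appear in `versions` (kept in commit order).
    """
    versions: list = field(default_factory=list)

    def commit_write(self, commit_ts, value):
        self.versions.append((commit_ts, value))

    def read_as_of(self, snapshot_ts):
        """Newest value committed at or before snapshot_ts; no lock is taken."""
        visible = [value for ts, value in self.versions if ts <= snapshot_ts]
        return visible[-1] if visible else None


row = VersionedRow()
row.commit_write(1, "v1")

reader_snapshot = 1                     # reader's transaction starts after commit 1
row.commit_write(2, "v2")               # a writer commits while the reader is running
print(row.read_as_of(reader_snapshot))  # v1: the reader keeps its consistent view
print(row.read_as_of(2))                # v2: a transaction started later sees the update
```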
Edit:
I don't have access to SQL 2000, but I would try to serialize access to the object by using sp_getapplock, so that reading and modifications never run concurrently. If you cannot use sp_getapplock, roll your own mutex.
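A minimal sketch of the "own mutex" idea in Python: one named lock per object, taken by both the loading code and the updating code, so that reads and modifications of the same object never overlap. `app_lock` and `max_concurrency` are hypothetical helpers. In SQL Server the analogous call would be `sp_getapplock` with `@Resource` set to a per-object name, `@LockMode = 'Exclusive'` and `@LockOwner = 'Transaction'`, executed at the start of every transaction that touches the object.

```python
import threading
import time
from contextlib import contextmanager

_registry_guard = threading.Lock()
_named_locks = {}


@contextmanager
def app_lock(resource, timeout=5.0):
    """Hold an exclusive lock named `resource` (hypothetical helper)."""
    with _registry_guard:                 # create each named lock exactly once
        lock = _named_locks.setdefault(resource, threading.Lock())
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"could not acquire {resource!r}")
    try:
        yield
    finally:
        lock.release()


def max_concurrency(n_workers=5):
    """Run several readers/writers on one object and report the peak overlap."""
    active = peak = 0
    counter = threading.Lock()

    def worker():
        nonlocal active, peak
        with app_lock("object:42"):       # same resource name for readers and writers
            with counter:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)              # pretend to run the SELECT or the UPDATEs
            with counter:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(max_concurrency())  # 1
```

The trade-off is concurrency: loads of an object now wait for updates of the same object instead of deadlocking with them.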

Another way to fix this is to split the SELECT ... FROM ... JOIN into multiple SELECT statements. Set the isolation level to READ COMMITTED. Use a table variable to pass the data from one SELECT on to the next, where it is joined with the other table. Use DISTINCT to filter the rows inserted into these table variables.
So say I have two tables, A and B. I'm inserting/updating into A and then B, whereas SQL Server's query optimizer prefers to read B first and then A. I'll split the single SELECT into two SELECTs: first I read B, then I pass this data on to the next SELECT statement, which reads A.
Here the deadlock won't happen, because under READ COMMITTED the read locks on table B are released as soon as the first statement is done.
PS: I've faced this issue and this worked very well. Much better than my FORCE ORDER answer.
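The two-step read can be sketched with Python's built-in `sqlite3` module. SQLite locks the whole database rather than individual rows, so this cannot reproduce the SQL Server deadlock; it only checks that reading B first and then A for the collected IDs returns the same rows as the single LEFT JOIN with a WHERE condition on B (which, because the filter drops rows with NULL B columns, behaves like an inner join). The table and column names are hypothetical, and the Python dict plays the role of the table variable.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE b (id INTEGER PRIMARY KEY, status TEXT);
        INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
        INSERT INTO b VALUES (1, 'open'), (2, 'closed'), (3, 'open');
    """)
    return conn


def joined(conn, status):
    """The original single statement: A LEFT JOIN B, filtered on a B column."""
    return conn.execute(
        "SELECT a.id, a.name, b.status FROM a LEFT JOIN b ON a.id = b.id "
        "WHERE b.status = ? ORDER BY a.id", (status,)).fetchall()


def split(conn, status):
    """Two statements: read B first, then read A only for the collected ids."""
    b_rows = conn.execute(
        "SELECT DISTINCT id, status FROM b WHERE status = ?", (status,)).fetchall()
    if not b_rows:
        return []
    status_by_id = dict(b_rows)                 # stand-in for the table variable
    placeholders = ",".join("?" * len(status_by_id))
    a_rows = conn.execute(
        f"SELECT id, name FROM a WHERE id IN ({placeholders}) ORDER BY id",
        list(status_by_id)).fetchall()
    return [(id_, name, status_by_id[id_]) for id_, name in a_rows]


conn = setup()
print(split(conn, "open") == joined(conn, "open"))  # True
```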

I was facing the same issue. Using the query hint FORCE ORDER will fix it. The downside is that you won't be able to leverage the best plan the query optimizer would have chosen for your query, but this will prevent the deadlock.
So (this is from user "Bill the Lizard"), if you have a query FROM table1 LEFT JOIN table2 and your WHERE clause only contains columns from table2, the execution plan will normally first select the rows from table2 and then look up the rows from table1. With a small result set from table2, only a few rows from table1 have to be fetched. With FORCE ORDER, all rows from table1 have to be fetched first, because there is no WHERE condition on table1; then the rows from table2 are joined and the result is filtered using the WHERE clause, which degrades performance.
But if you know this won't be the case, use this hint. You might want to optimize the query manually.
The syntax is
SELECT ...
FROM
table1
LEFT JOIN table2
ON table1.id = table2.id
WHERE ...
OPTION (FORCE ORDER)
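Why a fixed order helps can be shown with the same kind of toy model used for lock cycles: if every "transaction" takes the lock on table1 before the lock on table2, no cycle of waits can form, so all of them finish. This is a Python sketch with `threading.Lock` standing in for table locks, and `run_same_order` is a hypothetical name. Note that FORCE ORDER only fixes the join order of this one query, so the UPDATE statements would also have to touch table1 before table2 for the argument to hold.

```python
import threading


def run_same_order(n_threads=4, rounds=200):
    """Every 'transaction' locks table1 before table2 (toy model of a fixed order)."""
    table1, table2 = threading.Lock(), threading.Lock()
    counter_lock = threading.Lock()
    completed = 0

    def txn():
        nonlocal completed
        for _ in range(rounds):
            with table1:          # always first
                with table2:      # always second: no wait cycle is possible
                    pass
            with counter_lock:
                completed += 1

    threads = [threading.Thread(target=txn) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


print(run_same_order())  # 800
```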

Related

How to use results from first query in later queries within a DB Transaction

A common use case for DB transactions is performing operations on multiple tables, since you can then easily roll back all operations if one fails. However, a common scenario I run into is wanting to insert records into multiple tables where the later inserts need the serial ID from the previous inserts.
Since the ID is not generated/available until the transaction is actually committed, how can one accomplish this? If I have to commit after the first insert in order to get the ID and then execute the second insert, that seems to defeat the purpose of the transaction, because after committing (or if I don't use a transaction at all) I cannot roll back the first insert if the second insert fails.
This seems like such a common use case for DB transactions that I can't imagine it would not be supported in some way. How can this be accomplished?
A CTE (common table expression) with a data-modifying statement should cover your need; see the manual.
Typical example:
WITH cte AS (INSERT INTO table_A (id) VALUES ... RETURNING id)
INSERT INTO table_B (id) SELECT id FROM cte
see the demo in dbfiddle
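The premise that the generated ID only exists after COMMIT does not hold in common engines: the key is assigned when the row is inserted and is visible to the inserting transaction immediately (PostgreSQL's `RETURNING`, SQL Server's `SCOPE_IDENTITY()` or `OUTPUT INSERTED.id`). Here is a self-contained illustration with Python's `sqlite3` module, where `Cursor.lastrowid` returns the new key before `commit()`. The table names mirror the example above; the `label`/`note` columns and the `insert_pair` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY AUTOINCREMENT,
                          a_id INTEGER REFERENCES table_a(id), note TEXT);
""")


def insert_pair(conn, label, note, fail=False):
    """Insert a parent and a child in one transaction; roll both back on error."""
    try:
        cur = conn.execute("INSERT INTO table_a (label) VALUES (?)", (label,))
        a_id = cur.lastrowid          # generated id, already visible before COMMIT
        if fail:
            raise RuntimeError("simulated failure before the second insert")
        conn.execute("INSERT INTO table_b (a_id, note) VALUES (?, ?)", (a_id, note))
        conn.commit()                 # both rows become durable together
        return a_id
    except Exception:
        conn.rollback()               # the first insert is undone as well
        raise


print(insert_pair(conn, "parent", "child"))  # 1
```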

Deadlock in dependent multiple update statements

In a stored procedure (SP), three tables are updated in a single transaction. These updates depend on each other. Intermittently, a deadlock happens during this update; it does not happen consistently.
A WCF service is called, and it calls the SP. The input of the SP is an XML document. The XML is parsed using the OPENXML method, and the values are used to update the tables.
#Table is a temporary table, populated by OPENXML from the input XML of the SP. The input XML contains only one ID.
<A>
<Value>XYZ</Value>
<ID>1</ID>
</A>
BEGIN TRAN
--update Table1
Update Table1
Set ColumnA = A.Value
FROM Table1
JOIN #Table A
ON Table1.ID = A.ID
--update Table2
Update Table2
Set ColumnA = Table1.ColumnA
FROM Table2
JOIN Table1
ON Table1.ID = Table2.ID
--update Table3
Update Table3
Set ColumnA = Table1.ColumnA
FROM Table3
JOIN Table1
ON Table1.ID = Table3.ID
COMMIT TRAN
In Table1, the ID column is the primary key.
In Table2, there is no index on the ID column.
The deadlock sometimes happens while updating Table2, with the error "Transaction (Process ID 100) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."
Advice is needed on resolving this intermittent deadlock issue.
Deadlocks are often the result of more data being touched than needed by queries. Query and index tuning can help ensure only data needed by queries are accessed and locked, reducing both blocking and the likelihood of deadlocks by concurrent sessions.
Because your queries join on ID with no other criteria, an index on that column may help keep the UPDATE and DELETE statements from touching other rows. I see from your comments that there was no index on the Table2 ID column, so a clustered index scan was performed. Not only did the scan result in suboptimal performance, it can also lead to blocking and deadlocking when concurrent sessions contend for the same rows.
Adding a non-clustered index on ID changed the plan from a full clustered index scan to a non-clustered index seek. This should reduce, if not eliminate, the deadlocks going forward and improve performance considerably too. I like to say that performance and concurrency go hand-in-hand, an especially important detail with data modification statements.
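The effect of the missing index on the UPDATE plan can be checked with SQLite's `EXPLAIN QUERY PLAN`, used here only as a stand-in for SQL Server's execution plans: without an index the UPDATE has to scan every row of table2 (the analogue of the clustered index scan), and after `CREATE INDEX` it searches the index for the requested ID (the analogue of the non-clustered index seek). The column names and index name are hypothetical, and the exact plan text varies between SQLite versions, so the sketch only looks for the words SCAN and SEARCH.

```python
import sqlite3


def update_plan(conn):
    """Return the human-readable plan lines for a single-ID UPDATE on table2."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN UPDATE table2 SET columna = 'x' WHERE id = 1").fetchall()
    return [r[-1] for r in rows]  # the last column holds the plan detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (id INTEGER, columna TEXT)")  # no index on id
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(i, "v") for i in range(1000)])

before = update_plan(conn)        # expected to mention SCAN: every row is visited
conn.execute("CREATE INDEX ix_table2_id ON table2 (id)")
after = update_plan(conn)         # expected to mention SEARCH ... ix_table2_id

print(before)
print(after)
```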

SQL Server Shared Lock on Select

I am querying a table in SQL Server DB that gets continuous inserts from other sources. The SELECT statement used to read the data from this table is used in my ETL job and it queries only a selected partition in the table.
SELECT *
FROM REALTIMESRC
WHERE PARTITION = '2018-11';
I understand that a SELECT statement by default takes a shared lock on the rows that it selects.
When this table gets inserts from other sources in the same partition I am querying, are those inserts impacted by my SELECT operation?
I am presuming that the shared lock taken by the SELECT statement applies at the row level and doesn't apply to new rows inserted in parallel. Can someone please clarify this?
I understand that a SELECT statement by default takes a shared lock on the rows that it selects.
That is correct, yes.
When this table gets inserts from other sources in the same partition I am querying, are those inserts impacted by my SELECT operation?
No, since the insert only introduces new rows that you haven't selected, there shouldn't be any problem.
I am presuming that the shared lock taken by the SELECT statement applies at the row level and doesn't apply to new rows inserted in parallel.
Yes, that is correct - the INSERT and SELECT should work just fine in parallel.
There might be some edge cases where you could run into trouble:
if a single INSERT statement acquires 5000 or more locks on the table, SQL Server might opt to escalate those individual locks into a table-level exclusive lock, at which point no more SELECT operations would be possible until the INSERT transaction completes.
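The reasoning in this answer rests on lock compatibility: row locks on different rows never conflict, intent locks on the table let row-level readers and writers coexist, and an escalated table-level X lock conflicts with everything. Below is a small Python sketch using the standard IS/IX/S/X compatibility matrix; other SQL Server modes (U, SIX, schema and key-range locks) are deliberately omitted, and `can_grant` is a hypothetical helper, not a SQL Server API.

```python
# Compatibility of the basic lock modes (True = both can be held at once).
# U, SIX, schema and key-range locks are deliberately left out of this toy table.
COMPATIBLE = {
    "IS": {"IS": True,  "IX": True,  "S": True,  "X": False},
    "IX": {"IS": True,  "IX": True,  "S": False, "X": False},
    "S":  {"IS": True,  "IX": False, "S": True,  "X": False},
    "X":  {"IS": False, "IX": False, "S": False, "X": False},
}


def can_grant(requested, held_by_others):
    """A lock is granted only if it is compatible with every lock others hold."""
    return all(COMPATIBLE[requested][held] for held in held_by_others)


# The SELECT holds an intent-shared lock on the table and shared locks on its rows.
select_locks = {"table": ["IS"], "row:1": ["S"], "row:2": ["S"], "row:3": ["S"]}

# A small INSERT needs IX on the table and X on a brand-new row: no conflict.
insert_ok = (can_grant("IX", select_locks["table"])
             and can_grant("X", select_locks.get("row:4", [])))

# After escalation the INSERT holds X on the whole table; a new SELECT must wait.
escalated = {"table": ["X"]}
new_select_ok = can_grant("IS", escalated["table"])

print(insert_ok, new_select_ok)  # True False
```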

Which type of locking mode for INSERT, UPDATE or DELETE operations in Sql Server?

I know that NOLOCK is the default for SELECT operations. So even if I don't write the WITH (NOLOCK) hint for a SELECT query, the row won't be locked.
I couldn't find what happens if WITH (ROWLOCK) is not specified for UPDATE and DELETE queries. Is there a difference between the queries below?
UPDATE MYTABLE set COLUMNA = 'valueA';
and
UPDATE MYTABLE WITH (ROWLOCK) set COLUMNA = 'valueA';
If there is no hint, the database engine chooses the lock mode as a function of the operation (select/modify), the isolation level and granularity, and the possibility of escalating the granularity level. Specifying ROWLOCK with XLOCK does not give a 100% guarantee that you will get X locks on rows. In general, this is a very large topic for such a broad question.
Read about lock modes first: https://technet.microsoft.com/en-us/library/ms175519(v=sql.105).aspx
In statement 1 (without ROWLOCK), the DBMS may decide to lock the entire table, or the page that contains the record being updated. This means that while the row is being updated, all or some of the other rows in the table are locked and cannot be updated or deleted.
Statement 2 (WITH (ROWLOCK)) suggests that the DBMS lock only the record that is being updated. But beware that this is just a hint, and there is no guarantee that it will be honored by the DBMS.
So even if I don't write the WITH (NOLOCK) hint for a SELECT query, the row won't be locked.
SELECT queries do take a lock, called a shared lock, and how long that lock is held depends on your isolation level.
Is there a difference between below queries?
UPDATE MYTABLE set COLUMNA = 'valueA';
and
UPDATE MYTABLE WITH (ROWLOCK) set COLUMNA = 'valueA';
Suppose your first statement acquires more than 5000 locks: the locks may then be escalated to a table lock. With ROWLOCK, SQL Server starts with row locks instead of locking the whole table, although the hint is not a guarantee.

dirty reads: Different results within single query?

In SQL Server 2014, when I issue the following SQL:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM TableA
UNION ALL
SELECT * FROM TableB
WHERE NOT EXISTS (
SELECT 1 FROM TableA WHERE TableA.ID = TableB.ID
)
Is it possible to read different versions of one table even within a single statement because of dirty reads?
Example: reading 2 rows from TableA in the first part of the union, but reading just 1 row from TableA in the inner select of the second part of the union, because one row got deleted by another transaction in the meantime.
Short answer: yes, depending on the execution plan generated. It doesn't matter that you're doing it in a single statement; no special privileges are associated with a statement boundary. READ UNCOMMITTED means no locking on data for any reason, and that's exactly what you'll get. This is also why using that generally is very much not recommended; it's terribly easy to get inconsistent/"impossible" results. Heck, even a single SELECT is not safe: you're not even guaranteed that rows will not be skipped or duplicated!
It seems to me that it's entirely possible.
The query execution plan contains two different reads from TableA, so the result really depends on the time between them and on the amount of CRUD operations made against those tables in the meantime.
READ UNCOMMITTED is really not a great choice for such a query.
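A deterministic Python toy of a plan in which TableA is read twice: the second read happens after another session has deleted a row, which READ UNCOMMITTED does nothing to prevent. Tables are modelled as dicts keyed by ID, and `union_query` and its `between_reads` callback (which injects the concurrent change) are hypothetical; a real plan may or may not read the tables in this order, and real timing is not deterministic.

```python
def union_query(table_a, table_b, between_reads=lambda: None):
    """Toy evaluation of:
        SELECT * FROM TableA
        UNION ALL
        SELECT * FROM TableB
        WHERE NOT EXISTS (SELECT 1 FROM TableA WHERE TableA.ID = TableB.ID)
    with TableA read twice and no locks held in between."""
    first_part = list(table_a.values())    # first read of TableA
    between_reads()                        # another session modifies TableA here
    ids_in_a = set(table_a)                # second read of TableA (NOT EXISTS)
    second_part = [row for id_, row in table_b.items() if id_ not in ids_in_a]
    return first_part + second_part


table_b = {2: ("B", 2), 3: ("B", 3)}

quiet = union_query({1: ("A", 1), 2: ("A", 2)}, table_b)
print(quiet)  # [('A', 1), ('A', 2), ('B', 3)]

table_a = {1: ("A", 1), 2: ("A", 2)}
racy = union_query(table_a, table_b, between_reads=lambda: table_a.pop(2))
print(racy)   # [('A', 1), ('A', 2), ('B', 2), ('B', 3)]: ID 2 comes back twice
```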
