Locking transactions in databases - database

I was wondering what happens to a transaction that gets blocked by another transaction.
It would be best to work through an example. Say I have two transactions, T1 and T2, and the following scenario:
T1                      T2
Lock DB object
Read Q                  Lock Q (T2 is blocked)
Write Q
Unlock Q
So is T2 unblocked after T1 is done, or is it lost forever? I used to think that T2 was sent into a wait queue and waits there for its turn.
Thanks to anyone who can clarify this concept for me :)

There are two common things that happen in this case:
T2 waits until T1 releases the lock. How this is implemented depends on the database software (and potentially the locking primitives provided by the OS).
T2 gets aborted when it tries to lock Q and finds that it is already locked. (For example, in Oracle, a SELECT ... FOR UPDATE or LOCK TABLE statement issued with the NOWAIT option.)
Having T2 "lost forever" would be a bug in the database engine.
An interesting read about Oracle's locking strategies: How Oracle Locks Data. (That whole chapter, Data Concurrency and Consistency, is interesting if you're studying these database aspects. Note that these details are highly database dependent. What you read there will not apply directly to SQL Server, DB2 or MySQL for instance.)
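The two outcomes above (T2 waits, or T2 is aborted immediately) can be sketched with an ordinary mutex. This is only a toy model, assuming a Python `threading.Lock` stands in for the lock on Q; the function names are made up for illustration and say nothing about how any particular database engine implements its lock manager.

```python
import threading

q_lock = threading.Lock()   # stands in for the lock on object Q
events = []                 # records the order in which things happen

def t2_waiting():
    # Blocking acquire: T2 sleeps until T1 releases the lock on Q.
    with q_lock:
        events.append("T2 acquired Q after waiting")

def t2_nowait():
    # Non-blocking acquire: T2 gives up at once if Q is already locked.
    if q_lock.acquire(blocking=False):
        q_lock.release()
        return "T2 acquired Q"
    return "T2 aborted: Q is locked"

q_lock.acquire()                        # T1 locks Q
nowait_result = t2_nowait()             # NOWAIT-style T2 fails immediately
waiter = threading.Thread(target=t2_waiting)
waiter.start()                          # waiting-style T2 blocks on Q
events.append("T1 wrote Q")             # T1 still holds the lock here
q_lock.release()                        # T1 unlocks Q, T2 can proceed
waiter.join()
```

In neither case is T2 lost: it either resumes once the lock is free or gets an error it can react to.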

Related

Shared lock in READ_COMMITTED_SNAPSHOT and SNAPSHOT isolation

I've read on Microsoft's site
http://msdn.microsoft.com/en-us/library/ms173763.aspx
that SQL Server doesn't request locks when reading data, except when a database is being recovered.
Does it mean that SQL Server using READ_COMMITTED_SNAPSHOT/SNAPSHOT ISOLATION doesn't use shared locks at all?
How is that possible?
For example, suppose there are two transactions.
The first transaction, T1, wants to update some row.
The second transaction, T2, starts reading the same row (this transaction is copying it to some output buffer, response buffer or whatever it's called in SQL Server).
At the same time, transaction T1 starts updating that row (it created a versioned row first).
Isn't there a possibility that transaction T2 will read uncommitted data?
Remember, transaction T2 started copying that row before T1 made the update, so there is no exclusive lock on that row.
Is this situation even possible, and how can it be avoided without setting a shared lock on that row while its data is being copied?
Besides logical locks there are also physical latches to protect the database structures (particularly, in this example, pages). Latches protect against any changes (modification of bits), regardless of isolation level. In what follows, T1 is the reader and T2 the writer. Even if T1 does not acquire locks, it still needs to acquire a shared latch on the pages it reads, otherwise it would fall victim to low-level concurrent modifications of the very structures it reads. T2 can modify the page containing the rows it changes only if it obtains an exclusive latch on that page. Thus T1 can only see the image of the row either before T2 modified it (and therefore the row is the one T1 wants) or after T2 has finished modifying the row (and now T1 has to look up the previous row image in the version store).
The latching protocol must be honored by all isolation levels, including read uncommitted and versioned reads (i.e. snapshot and friends).
Does it mean that Sql Server using READ_COMMITTED_SNAPSHOT/SNAPSHOT ISOLATION doesn't use shared locks at all? How is that possible?
It is possible because SQL Server reads from a snapshot, which does not change. It is frozen at the state of the database at the start of the current transaction (or, under READ_COMMITTED_SNAPSHOT, at the start of the current statement), disregarding uncommitted changes from other transactions. SQL Server does this by keeping row-versioned copies of the record in tempdb for transactions to refer to, while the current in-progress data/index page(s) are free to change.
Isn't there a possibility that transaction T2 will read uncommitted data? Remember, transaction T2 started copying that row before T1 made the update, so there is no exclusive lock on that row.
The above narrative explains this already. But to illustrate (simplified):
Scenario 1:
T1: begin tran (implicit/explicit)
T1: read value (4)
T2: read value (4) -- *
T1: update value to (8)
* - This is the committed value at the time the T2 transaction started
Scenario 2:
T1: begin tran (implicit/explicit)
T1: read value (4)
T1: update value to (8)
version of the row with the value (4) is made
T2: read value (4) -- * from the versioned row
T1: commit
* - (4) is [still] the *committed* value at the time the T2 transaction started
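Scenario 2 can be mimicked with a toy versioned row. This is a simplified sketch, not SQL Server's actual version store: the class `VersionedRow` and its methods are hypothetical, and it keeps every committed version tagged with a commit counter so that a reader can pick the newest version visible to its snapshot.

```python
class VersionedRow:
    """Toy row that keeps every committed version with its commit timestamp."""

    def __init__(self, value):
        self.versions = [(0, value)]   # (commit_ts, value), oldest first
        self.uncommitted = None        # pending write of the single writer
        self.clock = 0                 # last commit timestamp handed out

    def begin(self):
        # A reader's snapshot is the commit timestamp current at its start.
        return self.clock

    def write(self, value):
        # The writer's change stays private until commit.
        self.uncommitted = value

    def commit(self):
        self.clock += 1
        self.versions.append((self.clock, self.uncommitted))
        self.uncommitted = None

    def read(self, snapshot_ts):
        # Newest committed version that existed when the snapshot was taken.
        visible = [v for v in self.versions if v[0] <= snapshot_ts]
        return max(visible, key=lambda v: v[0])[1]


row = VersionedRow(4)
t2_snapshot = row.begin()                    # T2 starts while the value is 4
row.write(8)                                 # T1 updates to 8, not committed
seen_before_commit = row.read(t2_snapshot)   # T2 reads the versioned 4
row.commit()                                 # T1 commits
seen_after_commit = row.read(t2_snapshot)    # T2's snapshot still shows 4
seen_by_new_reader = row.read(row.begin())   # a later reader sees 8
```

The reader never takes a lock on the row, yet it never sees the uncommitted 8.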

How Read Committed Isolation Level prevents dirty reads

I'll start with a simple question.
According to the definition of a dirty read on
Wikipedia
and MSDN:
We have two concurrent transactions, T1 and T2.
A dirty read occurs when T1 is updating a row and T2 reads that row before T1 has committed it.
But at the Read Committed level, shared locks are released as soon as the data is read (not at the end of the transaction or even the end of the statement).
So how does Read Committed prevent dirty reads?
Because as soon as the shared lock on the updated row is released, T2 can read the updated row, and T1 can then roll back the whole operation, and then we have a dirty read caused by T1.
It prevents the dirty read because T1 has a lock on the row, so T2 can't read the "not yet committed" row that could be rolled back later.
The problem Read Committed tries to resolve is:
T1 creates a transaction and writes something
T2 reads that something
T1 rolls back the transaction
Now T2 has data that never really existed.
Depending on how the DB is structured, there are two "good" possibilities:
T1 creates a transaction and writes something
T2 waits for T1 to end the transaction
or
T2 reads a "snapshot" of how the DB was BEFORE T1 began the transaction (it's called Read committed using row versioning)
(the default on MSSQL is the first option)
Here for example there is a comparison of the various isolation levels: http://msdn.microsoft.com/en-us/library/ms345124(SQL.90).aspx (read under Isolation Levels Offered in SQL Server 2005)
When SQL Server executes a statement at the read committed isolation level, it acquires short lived share locks on a row by row basis. The duration of these share locks is just long enough to read and process each row; the server generally releases each lock before proceeding to the next row. Thus, if you run a simple select statement under read committed and check for locks (e.g., with sys.dm_tran_locks), you will typically see at most a single row lock at a time. The sole purpose of these locks is to ensure that the statement only reads and returns committed data. The locks work because updates always acquire an exclusive lock which blocks any readers trying to acquire a share lock.
Ripped from here

Read uncommitted mvcc database

Say I want to run the following transactions in read committed mode (in Postgres).
T1: r(A) -> w(A)
T2: r(A) -> w(A)
If the operations were called in this order:
r1(A)->r2(A)->w1(A)->c1->w2(A)->c2
I would expect that T2 has to wait at r(A), because T1 would set an exclusive lock on A at the first read, since it wants to write it later. But with MVCC there are no read locks?
Now I've got two questions:
If I use JDBC to read some data and then execute a separate command to insert the data I read, how does the database know that it has to take an exclusive lock when it is only reading? Upgrading a read lock to a write lock is not allowed in 2PL, as far as I know.
I think my assumptions are wrong... Where does this scenario wait, or is one transaction killed? Read uncommitted shouldn't allow lost updates, but I can't see how this can work.
I would be happy if someone could help me. Thanks
I would expect that T2 has to wait at r(A), because T1 would set an exclusive lock on A at the first read, since it wants to write it later. But with MVCC there are no read locks?
There are write locks if you specify FOR UPDATE in your SELECT statements. In that case, r2(A) would wait to read if it's trying to lock the same rows as r1(A).
http://www.postgresql.org/docs/9.0/interactive/explicit-locking.html
A deadlock occurs if two transactions start and each ends up requesting rows the other has already locked:
r11(A) -> r22(A) -> r12(A) (same as r22) vs r21(A) (same as r11) -> deadlock
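A deadlock like the one above shows up as a cycle in the "wait-for" graph, where an edge T1 -> T2 means T1 is waiting for a lock that T2 holds. The following is a minimal sketch of cycle detection over such a graph; the function `find_cycle` and the dictionary layout are assumptions made for illustration, not PostgreSQL's actual deadlock detector.

```python
def find_cycle(wait_for):
    """Return the transactions forming a cycle in the wait-for graph, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:
            # We came back to a node on the current path: that is a cycle.
            return path[path.index(node):]
        visiting.add(node)
        path.append(node)
        for nxt in wait_for.get(node, ()):
            cycle = visit(nxt, path)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in wait_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# T1 waits for a row locked by T2, and T2 waits for a row locked by T1.
deadlocked = find_cycle({"T1": ["T2"], "T2": ["T1"]})
# T1 waits for T2, but T2 is not waiting on anyone: plain blocking.
just_waiting = find_cycle({"T1": ["T2"], "T2": []})
```

When a cycle is found, a database typically aborts one of the transactions in it so the others can continue.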
"But with MVCC there are are no read locks?"
MVCC is a different beast. There are no "locks" in MVCC because in that scenario, the system maintains as many versions of a single row as might be needed by the transactions that are running concurrently. "Former contents" of a row are not "lost by an update" (i.e. physically overwritten and destroyed), and thus making sure that a reader does not get to see "new updates", is addressed by "redirecting" that reader's inquiries to the "former content", which is not locked (hence the term "snapshot isolation"). Note that MVCC, in principle, cannot be applied to updating transactions.
"If I use JDBC to read some data and then execute a separate command for inserting the read data. How does the database know that it has to make an exclusive lock when it is only reading? Increasing an read lock to a write lock is not allowed in 2PL, as far as I know."
You are wrong about 2PL. 2PL means that acquired locks are never released until commit time (strictly speaking, that is the strict variant; basic 2PL only requires that a transaction acquires no new lock after it has released one). It does not mean that an existing lock cannot be strengthened. Incidentally, that is why isolation levels such as "cursor stability" are not 2PL: they release read locks prior to commit time.
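The point about lock upgrades can be checked mechanically. Below is a small sketch, assuming a transaction's lock activity is given as a list of (action, item) pairs; the function name and the action labels are hypothetical. It tests only the basic two-phase rule (no lock is acquired or strengthened after the first release), which is weaker than holding everything until commit.

```python
def is_two_phase(ops):
    """Check the basic two-phase rule for one transaction's lock schedule.

    ops is a list of (action, item) pairs, where action is "lock_s",
    "lock_x" (also used for upgrading a shared lock) or "unlock".
    """
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif shrinking:
            return False              # acquiring or upgrading after a release
    return True


# Read A under a shared lock, upgrade to exclusive, release at the end.
upgrade_schedule = [("lock_s", "A"), ("lock_x", "A"), ("unlock", "A")]
# Cursor-stability style: the read lock on A is dropped before B is locked.
cursor_stability_schedule = [
    ("lock_s", "A"), ("unlock", "A"),
    ("lock_s", "B"), ("unlock", "B"),
]

upgrade_ok = is_two_phase(upgrade_schedule)
cursor_ok = is_two_phase(cursor_stability_schedule)
```

The upgrade happens during the growing phase, so it does not violate the rule, while the second schedule does.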
The default transaction mode in PostgreSQL is READ COMMITTED, however READ COMMITTED does not provide the level of serialization that you are looking for.
You are looking for the SERIALIZABLE transaction level. Look at the SET TRANSACTION command after reading PostgreSQL's documentation on Transaction Serialization Levels, specifically the SERIALIZABLE mode. PostgreSQL's MVCC docs are also worth reading.
Cheers.

TABLOCK vs TABLOCKX

What is the difference between TABLOCK and TABLOCKX?
http://msdn.microsoft.com/en-us/library/ms187373.aspx states that TABLOCK is a shared lock while TABLOCKX is an exclusive lock. Is the first maybe only an index lock of sorts? And what is the concept of sharing a lock?
Big difference, TABLOCK will try to grab "shared" locks, and TABLOCKX exclusive locks.
If you are in a transaction and you grab an exclusive lock on a table, EG:
SELECT 1 FROM TABLE WITH (TABLOCKX)
No other processes will be able to grab any locks on the table, meaning all queries attempting to talk to the table will be blocked until the transaction commits.
TABLOCK only grabs a shared lock. Shared locks are released after a statement is executed if your transaction isolation level is READ COMMITTED (the default). If your isolation level is higher, for example SERIALIZABLE, shared locks are held until the end of the transaction.
Shared locks are, hmmm, shared. Meaning 2 transactions can both read data from the table at the same time if they both hold a S or IS lock on the table (via TABLOCK). However, if transaction A holds a shared lock on a table, transaction B will not be able to grab an exclusive lock until all shared locks are released. Read about which locks are compatible with which at msdn.
Both hints cause the db to bypass taking more granular locks (like row or page level locks). In principle, more granular locks allow you better concurrency. So for example, one transaction could be updating row 100 in your table while another updates row 1000 at the same time (it gets tricky with page locks, but let's skip that).
In general granular locks are what you want, but sometimes you may want to reduce db concurrency to increase performance of a particular operation and eliminate the chance of deadlocks.
In general you would not use TABLOCK or TABLOCKX unless you absolutely needed it for some edge case.
Quite an old article on mssqlcity attempts to explain the types of locks:
Shared locks are used for operations that do not change or update data, such as a SELECT statement.
Update locks are used when SQL Server intends to modify a page, and later promotes the update page lock to an exclusive page lock before actually making the changes.
Exclusive locks are used for the data modification operations, such as UPDATE, INSERT, or DELETE.
What it doesn't discuss are Intent (which basically is a modifier for these lock types). Intent (Shared/Exclusive) locks are locks held at a higher level than the real lock. So, for instance, if your transaction has an X lock on a row, it will also have an IX lock at the table level (which stops other transactions from attempting to obtain an incompatible lock at a higher level on the table (e.g. a schema modification lock) until your transaction completes or rolls back).
The concept of "sharing" a lock is quite straightforward - multiple transactions can have a Shared lock for the same resource, whereas only a single transaction may have an Exclusive lock, and an Exclusive lock precludes any transaction from obtaining or holding a Shared lock.
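The sharing rules described above amount to a compatibility table. Here is a minimal sketch of such a table for five lock modes, following my understanding of the SQL Server lock compatibility matrix; other modes (such as SIX or schema locks) are left out, and the helper `can_grant` is a made-up name for illustration.

```python
# For each requested mode, the set of modes another transaction may already
# hold on the same resource without blocking the request.
COMPATIBLE = {
    "IS": {"IS", "S", "U", "IX"},
    "S":  {"IS", "S", "U"},
    "U":  {"IS", "S"},
    "IX": {"IS", "IX"},
    "X":  set(),
}

def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock held by others."""
    return all(held in COMPATIBLE[requested] for held in held_by_others)


two_readers = can_grant("S", ["S"])              # shared locks coexist
writer_vs_reader = can_grant("X", ["S"])         # exclusive must wait
reader_vs_writer = can_grant("S", ["X"])         # reader must wait as well
table_s_vs_row_writer = can_grant("S", ["IX"])   # table S vs. row-level writer
```

The last case is why a table-level shared lock (as taken by TABLOCK) has to wait for a transaction that holds an X lock on a single row: that transaction also holds an IX lock on the table.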
This is more of an example where TABLOCK did not work for me and TABLOCKX did.
I have 2 sessions, that both use the default (READ COMMITTED) isolation level:
Session 1 is an explicit transaction that will copy data from a linked server to a set of tables in a database, and takes a few seconds to run. [Example, it deletes Questions]
Session 2 is an insert statement, that simply inserts rows into a table that Session 1 doesn't make changes to. [Example, it inserts Answers].
(In practice there are multiple sessions inserting multiple records into the table, simultaneously, while Session 1 is running its transaction).
Session 1 has to query the table Session 2 inserts into because it can't delete records that depend on entries that were added by Session 2. [Example: Delete questions that have not been answered].
So, while Session 1 is executing and Session 2 tries to insert, Session 2 loses in a deadlock every time.
So, a delete statement in Session 1 might look something like this:
DELETE tblA FROM tblQ LEFT JOIN tblX on ...
LEFT JOIN tblA a ON tblQ.Qid = a.Qid
WHERE ... a.QId IS NULL and ...
The deadlock seems to be caused from contention between querying tblA while Session 2, [3, 4, 5, ..., n] try to insert into tblA.
In my case I could change the isolation level of Session 1's transaction to SERIALIZABLE. When I did this, I got the error: The transaction manager has disabled its support for remote/network transactions.
So, I could follow instructions in the accepted answer here to get around it: The transaction manager has disabled its support for remote/network transactions
But a) I wasn't comfortable with changing the isolation level to SERIALIZABLE in the first place- supposedly it degrades performance and may have other consequences I haven't considered, b) didn't understand why doing this suddenly caused the transaction to have a problem working across linked servers, and c) don't know what possible holes I might be opening up by enabling network access.
There seemed to be just 6 queries within a very large transaction that are causing the trouble.
So, I read about TABLOCK and TABLOCKX.
I wasn't crystal clear on the differences, and didn't know if either would work. But it seemed like it would. First I tried TABLOCK and it didn't seem to make any difference. The competing sessions generated the same deadlocks. Then I tried TABLOCKX, and no more deadlocks.
So, in six places, all I needed to do was add a WITH (TABLOCKX).
So, a delete statement in Session 1 might look something like this:
DELETE tblA FROM tblQ q LEFT JOIN tblX x on ...
LEFT JOIN tblA a WITH (TABLOCKX) ON q.Qid = a.Qid
WHERE ... a.QId IS NULL and ...

SQL Server SELECT statements causing blocking

We're using a SQL Server 2005 database (no row versioning) with a huge select statement, and we're seeing it block other statements from running (seen using sp_who2). I didn't realise SELECT statements could cause blocking - is there anything I can do to mitigate this?
SELECT can block updates. A properly designed data model and query will only cause minimal blocking and not be an issue. The 'usual' WITH NOLOCK hint is almost always the wrong answer. The proper answer is to tune your query so it does not scan huge tables.
If the query is untunable then you should first consider SNAPSHOT ISOLATION level, second you should consider using DATABASE SNAPSHOTS and last option should be DIRTY READS (and is better to change the isolation level rather than using the NOLOCK HINT). Note that dirty reads, as the name clearly states, will return inconsistent data (eg. your total sheet may be unbalanced).
From documentation:
Shared (S) locks allow concurrent transactions to read (SELECT) a resource under pessimistic concurrency control. For more information, see Types of Concurrency Control. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the read operation completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
A shared lock is compatible with another shared lock or an update lock, but not with an exclusive lock.
That means that your SELECT queries will block UPDATE and INSERT queries and vice versa.
A SELECT query will place a temporary shared lock when it reads a block of values from the table, and remove it when it is done reading.
For the time the lock exists, you will not be able to do anything with the data in the locked area.
Two SELECT queries will never block each other (unless they are SELECT FOR UPDATE)
You can enable SNAPSHOT isolation level on your database and use it, but note that it will not prevent UPDATE queries from being locked by SELECT queries (which seems to be your case).
It, though, will prevent SELECT queries from being locked by UPDATE.
Also note that SQL Server, unlike Oracle, uses a lock manager and keeps its locks in an in-memory linked list.
That means that under heavy load, the mere fact of placing and removing a lock may be slow, since the linked list should itself be locked by the transaction thread.
To perform dirty reads you can either:
using (new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions {
        IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
{
    // Your code here
}
or
SelectCommand = "SELECT * FROM Table1 WITH (NOLOCK) INNER JOIN Table2 WITH (NOLOCK) ..."
Remember that you have to write WITH (NOLOCK) after every table you want to dirty read.
You could set the transaction level to Read Uncommitted
You might also get deadlocks:
"deadlocks involving only one table"
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/01/reproducing-deadlocks-involving-only-one-table.aspx
and/or incorrect results:
"Selects under READ COMMITTED and REPEATABLE READ may return incorrect results."
http://www2.sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx
You can use the WITH (READPAST) table hint. It's different from WITH (NOLOCK): instead of reading uncommitted data, it skips rows that are locked by other transactions, so it does not wait on them and does not block anyone. Note that the skipped rows are simply missing from the result.
SELECT * FROM table1 WITH (READPAST)

Resources