SQL Server 2012 - How does "Repeatable Read" isolation level work? - sql-server

I feel like I should know this, but I can't find anything that specifically outlines this, so here goes.
The documentation for SQL Server describes REPEATABLE READ as:
Specifies that statements cannot read data that has been modified but
not yet committed by other transactions and that no other transactions
can modify data that has been read by the current transaction until
the current transaction completes.
This makes sense, but what actually happens when one of these situations arises? If, for example, Transaction A reads row 1, and then Transaction B attempts to update row 1, what happens? Does Transaction B wait until Transaction A has finished and then try again? Or is an exception thrown?

REPEATABLE READ takes S-locks on all rows that have been read by query plan operators for the duration of the transaction. The answer to your question follows from that:
If the read comes first, it takes an S-lock on the row and the write must wait.
If the write comes first, the read's S-lock request waits for the write to commit.
Under Hekaton (In-Memory OLTP) it works differently because there are no locks.
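To see the blocking concretely, here is a two-session sketch (assuming a test table with id and val columns, like the one used in the examples further down):
-- Session 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT val FROM test WHERE id = 1; -- takes an S-lock held until the transaction ends

-- Session 2 (blocks until session 1 commits or rolls back):
UPDATE test SET val = 'Updated' WHERE id = 1;

-- Session 1:
COMMIT; -- session 2's UPDATE now proceeds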

Related

Row locking behaviour while updating

In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How can I get this behaviour in SQL Server? Currently, the row is locked until the transaction ends. WITH (NOLOCK) inside the SELECT statement returns the new value from the uncommitted transaction, which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;
With Read Committed Snapshot Isolation (RCSI), row versions are kept in a version store, so while a transaction is open, readers can read the version of a row that existed when their statement started, before any of the in-flight changes, without taking shared locks on rows or pages and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data.
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that the version store is not free: it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.
When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting (on-premises, but not in Azure SQL Database), read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.
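For completeness, RCSI is enabled at the database level (YourDb is a placeholder name; WITH ROLLBACK IMMEDIATE terminates open transactions so the option can take effect):
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;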

Statement-Level Read Consistency in various SQL/NoSQL DBs

Recently I was thinking about query consistency in various SQL and NoSQL databases. What happens, when I have a (long running) query and rows are inserted or updated while the query is running? A simple theoretic example:
Let’s assume the following query takes a long time:
SELECT SUM(salary) FROM emp;
And while this query is running, another transaction does:
UPDATE emp SET salary = salary * 1.05 WHERE salary > 10000;
COMMIT;
If the SUM query reads half of the updated employees before the update and the other half after it, I would get an inconsistent, nonsensical result. Does this phenomenon have a name? By definition, it is not really a phantom read, because just one query is involved.
How do various DBs handle this situation? I am especially interested in SQL Server, MongoDB, RavenDB and Azure Table Storage.
Oracle for example guarantees statement-level read consistency, which says that the data returned by a single query is committed and consistent for a single point in time.
UPDATE: SQL Server seems to only prevent this kind of problem when READ_COMMITTED_SNAPSHOT is set to ON.
I believe the term you're looking for is "Dirty Read"
I can answer this one for SQL Server.
You get 5 options for transaction isolation level, where the default is READ COMMITTED.
Only READ UNCOMMITTED allows dirty reads. You'll have to specifically enable that using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED.
READ UNCOMMITTED is equivalent to NOLOCK, but syntactically nicer (opinion) as it doesn't need to be repeated for each table in your query.
Possible isolation levels are as below. I've linked the docs for more detail, if future readers find the link stale please edit.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SNAPSHOT
SERIALIZABLE
By default (READ COMMITTED), your query runs and the concurrent update is blocked by the shared lock taken by your SELECT until the SELECT completes.
If you enable Read Committed Snapshot Isolation (RCSI) as a database option, you continue to see the previous version of the data and the update isn't blocked.
Similarly, if the update was running first and you have RCSI enabled, it doesn't block you, but you see the data as it was before the update started.
RCSI is generally (but not 100% always) a good thing. I always design with it on. In Azure SQL DB, it's on by default.
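Applied to the example in the question, with RCSI on it plays out like this (a sketch; emp is the table from the question):
-- Session 1:
BEGIN TRAN;
UPDATE emp SET salary = salary * 1.05 WHERE salary > 10000;

-- Session 2 (RCSI): not blocked; sums the salaries as they were before the update:
SELECT SUM(salary) FROM emp;

-- Session 1:
COMMIT;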

Concurrent read-and-update transactions with Repeatable Read isolation levels in SQL Server

The specification for the Repeatable-Read isolation level defines that a transaction with this IL will prevent other transactions from updating any rows that this transaction has read until this transaction has completed. Thus, repeatable reads are guaranteed.
Consider the following order of operations for two concurrent transactions T1 and T2, both using repeatable read IL:
T1: Read row
T2: Read row
T1: Update row
T2: Update row
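For concreteness, the same interleaving in T-SQL might look like this (the table t and column x are hypothetical; both sessions run under REPEATABLE READ):
-- In both sessions first:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;

-- Step 1 (T1): SELECT x FROM t WHERE id = 1;        -- T1 takes an S-lock
-- Step 2 (T2): SELECT x FROM t WHERE id = 1;        -- T2 takes an S-lock as well
-- Step 3 (T1): UPDATE t SET x = x + 1 WHERE id = 1; -- needs an X-lock on the row
-- Step 4 (T2): UPDATE t SET x = x + 1 WHERE id = 1; -- needs an X-lock on the row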
I think that the update in step 3 would violate the specification for the isolation level, since T2 would read a different value if it read the row again.
The converse can be said for the update in step 4.
So, what options do RDBMSs in general have to resolve this conflict?
More specifically, how is this handled in SQL Server 2017+?
Will this result in a deadlock since neither transaction can complete its operations?
Or would one transaction be rolled back?
I've seen that Lost Updates are prevented in SQL Server. What does this mean for the resolution of this specific case?
I have perused the answers to these questions:
Repeatable read and lock compatibility table
Repeatable Read - am I understanding this right?
repeatable read and second lost updates issue
MySQL Repeatable Read isolation level and Lost Update phenomena
The last one asks a similar question, but it doesn't include any specific information about how RDBMSs that prevent lost updates under this isolation level handle this case.

DBMS best practice

I have some questions about programming with a DBMS (no specific language needed, but I'm using Java; no specific DBMS in mind).
I open a transaction, select a row, read a field, add 1 to it, and update, then commit. What happens if another user runs a transaction on that field at the same time? Does one of the transactions fail, or what?
Example: I'm a in a shop that has 1 kg of bread. Waiter1 has a client that needs 1 kg of bread. Waiter2 the same. If the program is:
select row "bread"
if quantity >= 1 kg then quantity = quantity - 1
update row
What happens if the two waiters run the transaction in the same time?
What is the best way to implement multi-user access and avoid these collisions? Select and lock, transactions only, or what?
When to use optimistic lock, or pessimistic?
When to use lock, and when is it not needed?
Why are you handling this on the application side? Relational databases are built to handle situations like this. Just use an update statement:
UPDATE some_table
SET quantity = quantity - 1
WHERE item_name = 'bread' AND quantity >= 1
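In T-SQL, for example, the application can then check how many rows the guarded statement actually changed (a sketch using the same placeholder table):
UPDATE some_table
SET quantity = quantity - 1
WHERE item_name = 'bread' AND quantity >= 1;

IF @@ROWCOUNT = 0
    PRINT 'Out of bread'; -- the guarded UPDATE matched no row, so the second waiter ends up here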
What you are looking for is Transaction Isolation. The official SQL standard would handle it like this:
If you don't lock explicitly, the database will generally lock either the row or even the whole table for you. Depending on your isolation level, it will either wait or raise an error.
Serializable
The second transaction will wait for the first to complete before it can do anything.
Repeatable reads
As soon as the first transaction reads, the second one's write will wait until the first has committed; or the other way around, if the second transaction somehow starts reading before the first.
Read committed
If the first transaction writes before the second writes, the second will have to wait until the first has committed. Otherwise the first will have to wait until the second has committed.
Read uncommitted
Both can read without an issue, but the first to write will make the other's write stall until the writing transaction has committed.
If one transaction commits after the other has read, you can lose an update and end up with the quantity decremented only once.
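If you do want the read-then-decide pattern from the question, a pessimistic approach in T-SQL is to take a restrictive lock at read time (a sketch; same placeholder table as above):
BEGIN TRAN;

SELECT quantity
FROM some_table WITH (UPDLOCK, HOLDLOCK) -- blocks another session's read-for-update on this row
WHERE item_name = 'bread';

-- the application checks quantity >= 1, then:
UPDATE some_table SET quantity = quantity - 1 WHERE item_name = 'bread';

COMMIT;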

Is this .NET/SQL Server transaction scenario possible?

I just realized that I fundamentally don't understand how .NET/SQL Server transactions work. I feel like I might be pushing the envelope on "there's no such thing as a dumb question", but all of the documentation I've read is not easy to follow. I'm going to try to phrase this question in such a way that the answer will be pretty much yes/no.
If I have a .NET process running on one machine that is effectively doing this (not real code):
For i As Integer = 0 To 100
    Using scope As New TransactionScope()
        Using conn As New SqlClient.SqlConnection()
            ' Executed using SqlClient.SqlCommand:
            "DELETE FROM TABLE_A"
            Thread.Sleep(5000)
            "INSERT INTO TABLE_A (Col1) VALUES ('A')"
            scope.Complete()
        End Using
    End Using
Next i
Is there any transaction/isolation-level configuration that will make 'SELECT COUNT(*) FROM TABLE_A' always return '1' when run from other processes (i.e. even though there are 5-second windows when there are no rows in the table in the context of the transaction)?
Yes, you can make other processes not see the changes you do in the transaction shown. To do that you need to alter the other processes, not the one making the modification.
Turn on snapshot isolation and use IsolationLevel.Snapshot on the other reading processes. They will see the table in the state right before you made any modifications. They won't block (wait).
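A sketch of the T-SQL side of that (YourDb is a placeholder; each reading session opts in explicitly):
-- Once, at the database level:
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In each reading session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT COUNT(*) FROM TABLE_A; -- sees the last committed state (1 row) and never blocks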
SNAPSHOT isolation is what you're looking for. Assuming that the table has a row when you start your loop, a concurrent SELECT running under the SNAPSHOT isolation level will always see 1 row, no matter when it runs, without ever waiting.
All other isolation levels, except READ UNCOMMITTED, will also always see exactly 1 row, but will often block for up to 5 seconds. Note that I consider READ_COMMITTED_SNAPSHOT as SNAPSHOT for this argument.
Dirty reads, i.e. SELECTs running under the READ UNCOMMITTED isolation level, will see 0, 1, or even 2 rows. That is no mistake: dirty reads may see 2 rows even though you never inserted 2 at a time, because of race conditions between the scan point of the SELECT and the insert point of your transaction; see Previously committed rows might be missed if NOLOCK hint is used for a discussion of a similar issue.
I believe the default transaction timeout is 1 minute (see: http://msdn.microsoft.com/en-us/library/ms172070.aspx ), so within the context of your transaction I think you're correct to expect the table to have no records before your insert (regardless of the pause), as each command completes in sequence within the transaction and that is the result of the delete.
Hope that helps.
