With SQL Server's transaction isolation levels, you can avoid certain unwanted concurrency issues, like dirty reads and so forth.
The one I'm interested in right now is lost updates - the fact two transactions can overwrite one another's updates without anyone noticing it. I see and hear conflicting statements as to which isolation level at a minimum I have to choose to avoid this.
Kalen Delaney in her "SQL Server Internals" book says (Chapter 10 - Transactions and Concurrency - Page 592):
In Read Uncommitted isolation, all the behaviors described previously, except lost updates, are possible.
On the other hand, an independent SQL Server trainer giving us a class told us that we need at least "Repeatable Read" to avoid lost updates.
So who's right?? And why??
I dont know if it is too late to answer but I am just learning about transaction isolation levels in college and as part of my research I came across this link:
Microsoft Technet
Specifically the paragraph in question is:
Lost Update
A lost update can be interpreted in one of two ways. In the first scenario, a lost update is considered to have taken place when data that has been updated by one transaction is overwritten by another transaction, before the first transaction is either committed or rolled back. This type of lost update cannot occur in SQL Server 2005 because it is not allowed under any transaction isolation level.
The other interpretation of a lost update is when one transaction (Transaction #1) reads data into its local memory, and then another transaction (Transaction #2) changes this data and commits its change. After this, Transaction #1 updates the same data based on what it read into memory before Transaction #2 was executed. In this case, the update performed by Transaction #2 can be considered a lost update.
So in essence both people are right.
Personally (and I am open to being wrong, so please correct me as I am just learning this) I take from this the following two points:
The whole point of a transaction enviorment is to prevent lost updates as described in the top paragraph. So if even the most basic transaction level cant do that then why bother using it.
When people talk about lost updates, they know the first paragraph applies, and so generally speaking mean the second type of lost update.
Again, please correct me if anything here is wrong as I would like to understand this too.
The example in the book is of Clerk A and Clerk B receiving shipments of Widgets.
They both check the current inventory, see 25 is in stock. Clerk A has 50 widgets and updates to 75, Clerk B has 20 widgets and so updates to 45 overwriting the previous update.
I assume she meant this phenomena can be avoided at all isolation levels by Clerk A doing
UPDATE Widgets
SET StockLevel = StockLevel + 50
WHERE ...
and Clerk B doing
UPDATE Widgets
SET StockLevel = StockLevel + 20
WHERE ...
Certainly if the SELECT and UPDATE are done as separate operations you would need repeatable read to avoid this so the S lock on the row is held for the duration of the transaction (which would lead to deadlock in this scenario)
Lost updates may occur even if reads and writes are in separate transactions, like when users read data into Web pages, then update. In such cases no isolation level can protect you, especially when connections are reused from a connection pool. We should use other approaches, such as rowversion. Here is my canned answer.
My experience is that with Read Uncommitted you no longer get 'lost updates', you can however still get 'lost rollbacks'. The SQL trainer was probably referring to that concurrency issue, so the answer you're likely looking for is Repeatable Read.
That said, I would be very interested if anyone has experience that goes against this.
As marked by Francis Rodgers, what you can rely on SQL Server implementation is that once a transaction updated some data, every isolation level always issue "update locks" over the data, and denying updates and writes from another transaction, whatever it's isolation level it is. You can be sure this kind of lost updates are covered.
However, if the situation is that a transaction reads some data (with an isolation level different than Repeatable Read), then another transaction is able to change this data and commits it's change, and if the first transaction then updates the same data but this time, based on the internal copy that he made, the management system cannot do anything for saving it.
Your answer in that scenario is either use Repeatable Read in the first transaction, or maybe use some read lock from the first transaction over the data (I don't really know about that in a confident way. I just know of the existence of this locks and that you can use them. Maybe this will help anyone who's interested in this approach Microsoft Designing Transactions and Optimizing Locking).
The following is quote from 70-762 Developing SQL Databases (p. 212):
Another potential problem can occur when two processes read the same
row and then update that data with different values. This might happen
if a transaction first reads a value into a variable and then uses the
variable in an update statement in a later step. When this update
executes, another transaction updates the same data. Whichever of
these transactions is committed first becomes a lost update because it
was replaced by the update in the other transaction. You cannot use
isolation levels to change this behavior, but you can write an
application that specifically allows lost updates.
So, it seems that none of the isolation levels can help you in such cases and you need to solve the issue in the code itself. For example:
DROP TABLE IF EXISTS [dbo].[Balance];
CREATE TABLE [dbo].[Balance]
(
[BalanceID] TINYINT IDENTITY(1,1)
,[Balance] MONEY
,CONSTRAINT [PK_Balance] PRIMARY KEY
(
[BalanceID]
)
);
INSERT INTO [dbo].[Balance] ([Balance])
VALUES (100);
-- query window 1
BEGIN TRANSACTION;
DECLARE #CurrentBalance MONEY;
SELECT #CurrentBalance = [Balance]
FROM [dbo].[Balance]
WHERE [BalanceID] = 1;
WAITFOR DELAY '00:00:05'
UPDATE [dbo].[Balance]
SET [Balance] = #CurrentBalance + 20
WHERE [BalanceID] = 1;
COMMIT TRANSACTION;
-- query window 2
BEGIN TRANSACTION;
DECLARE #CurrentBalance MONEY;
SELECT #CurrentBalance = [Balance]
FROM [dbo].[Balance]
WHERE [BalanceID] = 1;
UPDATE [dbo].[Balance]
SET [Balance] = #CurrentBalance + 50
WHERE [BalanceID] = 1;
COMMIT TRANSACTION;
Create the table, the execute each part of the code in separate query windows. Changing the isolation level does nothing. For example, the only difference between read committed and repeatable read is that the last, blocks the second transaction while the first is finished and then overwrites the value.
Related
I have some doubts with respect to transactions and isolation levels:
1) In case the DB transaction level is set to Serializable / Repeatable Read and there are two concurrent transactions trying to modify the same data then one of the transaction will fail.
In such cases, why DB doesn't re-tries the failed operation? Is it a good practice to retry the transaction on application level (hoping the other transaction will be over in mean time)?
2) In case the DB transaction level is set to READ_COMMITTED / DIRTY READ and there are two concurrent transactions trying to modify the same data then why the transactions don't fail?
Ideally we are controlling the read behaviour and concurrent writes should not be allowed.
3) My application has 2 parts and uses the spring managed datasource in one part and application created datasource in other part (this part doesn't use spring and data source is explicit created by passing the properties).
My assumption is that isolation level has no impact - from which datasource the connections is coming from...two concurrent transactions even if coming from different datasource will behave the same based on isolation level as if they are coming from same datasource.
Do you see any issue with this setup? Should we strive for single datasource across application?
I also wait until others to give their feed backs. But now i would like to give my 2 cents to this post.
As you explained isolation's are work differently each.
I'll try to keep a sample data set as follows
IF OBJECT_ID('Employees') IS NOT NULL DROP TABLE Employees
GO
CREATE TABLE Employees (
Emp_id INT IDENTITY,
Emp_name VARCHAR(20),
Emp_Telephone VARCHAR(15),
)
ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (emp_id)
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Satsara', '07436743439'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Udhara', '045672903'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Sithara', '58745874859'
REPEATABLE READ and SERIALIZABLE are both very close to each, but SERIALIZABLE is the heights in the isolation. Both options are provided for avoid the dirty readings and both need to manage very carefully because most of the time this will cause for deadlocks due to the way that it handing the data. If there's a deadlock, definitely server will wipe out one transaction from the picture. So it will never run it by the server again due to it doesn't have any clue about that removed transaction, unless a log.
REPEATABLE READ - Not allow to modify (lock records) any records which is already read by another process (another query). But it allows for new records to insert (without a lock) which can be impact to your system while querying.
SERIALIZABLE - Different in Serializable is, its not allow to insert records with
"SET TRANSACTION ISOLATION LEVEL Serializable". So INSERT processors are wait until the previous transaction commit.
Usually REPEATABLE READ and SERIALIZABLE isolation's are keep data locks than other two options.
example [REPEATABLE and SERIALIZABLE]:
In Employee table you have 3 records.
Open a query window and run (QUERY 1)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM Employees;
Now try to run a insert query in a different window (QUERY 2)
INSERT INTO Employees(Emp_name, Emp_Telephone)
SELECT 'JANAKA', '3333333'
System allow to insert the new record in QUERY 2 and now run the same query2 again and you can see 4 records.
Now replace the Query 1 with following code and try the same process to test the Serializable
SET TRANSACTION ISOLATION LEVEL Serializable
BEGIN TRAN
SELECT * FROM Employees;
This time you can see the that 2nd Query insert command not allow to execute and wait until the Query 1 to commit.
Once Query 1 committed only, Query 2 allows to execute the INSERT command.
When compare the Read Committed and the Read Uncommitted,
READ COMMITTED - Changes to the data is not visible to other processors until it commit the records. With Read Committed. it puts shared locks for all the records it reads. If another process found a exclusive lock by, it wait until its lock release.
READ UNCOMMITTED - Not recommended and garbage data can read by the system due to this. (in SQL Server nolock). So this will return the uncommitted data.
"Select * from Employee (nolock)
**DEADLOCKS - ** Whether its Repeatable read, Serializable, READ COMMITTED or READ UNCOMMITTED, it can creates dead locks. Only things
is as we discussed Repeatable read and Serializable are more prone to
deadlocks than other two options.
Note: If you need sample for Read Committed and Read Uncommitted, please let know in the comment section and we can discuss.
Actually this topic is very large topic and need to discuss with lots of samples. I do not know this explanation is enough or not. But i gave a small try. NO idea ware to start and when to stop.
At the same time, you asked about " Is it a good practice to retry the
transaction on application level "
In my opinion that's fine. Personally i also do retrying process in some sort of a situations.
Different techniques used.
Keeping a Flag field to identify it updated or not and retry
Using a Event driven solution such RabitMQ, KAFKA.
In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How to get this behaviour in SQL Server? Currently, the row is locked until the transaction is ended. WITH (NOLOCK) inside the select statement gives the new value from the uncommitted transaction which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;
With Read Committed Snapshot Isolation (RCSI), versions of rows are stored in a version store, so readers can read a version of a row that existed at the time the statement started and before any changes have been made; while a transaction is open; without taking shared locks on rows or pages; and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data,
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that version store is not free... it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting * on-prem but not in Azure SQL Database *, read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.
I have a question but I can never get a clear answer. Any stored
procedure that used a transaction that I have looked at up until my recent job always had a commit transaction + a roll back in case of error. However I have seen a lot of code
at my new job that just has a begin transaction and then a commit at the end with no roll back. I understand why you would use a transaction with a rollback but why would you want to begin a transaction with no roll back? Is it so when you run that code you want to lock the table up so no values can be changed why your code is updating? If so why would you not want the added security of a roll back in case something goes wrong? Is this proper use of the transaction statement? Any thoughts or ideas would be great!
For Example:
BEGIN TRANSACTION [Tran1]
INSERT INTO [Test].[dbo].[T1]
([Title], [AVG])
VALUES ('Tidd130', 130), ('Tidd230', 230)
UPDATE [Test].[dbo].[T1]
SET [Title] = N'az2' ,[AVG] = 1
WHERE [dbo].[T1].[Title] = N'az'
COMMIT TRANSACTION [Tran1]
GO
shouldn't this code be using a roll back syntax for proper use of the begin transaction statement?
The idea is that if that set of transactions needs to be "all or nothing", wrapping the lot in a transaction is the way to ensure that is what will happen. You're not seeing an explicit rollback because that's not what they're guarding against. Imagine the ff scenario with your contrived example:
The insert happens
The server crashes (or the log fills up or some other external reason why things can't continue) before the update can happen
If they're both wrapped in the same transaction, the insert won't be reflected in the table data. Which is the desired behavior.
When transactions are not explicitly declared, SQL Server will automatically BEGIN and COMMIT a TRANSACTION for each command. This frees up each command's lock as soon as the command executes.
When executing multiple commands inside a single transaction (as in the example you posted), locks from all commands are held until the transaction is committed.
Depending on the desired behavior, the script you posted may be correct. However, I would be cautious to ensure that the developer did not mistakenly believe that the transaction would be automatically rolled back on error. If that behavior is desired, you do indeed need to explicitly ROLLBACK or SET XACT_ABORT ON
You use transaction when you need the outcome to be atomic, you would see this alot in financial related procedures where you are gravely worried about data acid consistency . Otherwise it is not necessary and introduces a great deal of locking overhead. There is a good question here and here that goes into great depth.
Edit
The takeaway point is if the procedure is a all or none and must either succeed or fail the correct decision is to use a transaction. If the procedure is not a all or none transaction such as simple insert update etc using a transaction is a) unnecessary and b) can introduce an undue performance overhead due to additional locking.
I just realized that I fundamentally don't understand how .NET/SQL Server transactions work. I feel like I might pushing the envelop on "there's no such thing as a dumb question", but all of the documentation I've read is not easy to follow. I'm going to try to phrase this question in such a way that the answer will be pretty much yes/no.
If I have a .NET process running on one machine that is effectively doing this (not real code):
For i as Integer = 0 to 100
Using TransactionScope
Using SqlClient.SqlConnection
'Executed using SqlClient.SqlCommand'
"DELETE from TABLE_A"
Thread.Sleep(5000)
"INSERT INTO TABLE_A (Col1) VALUES ('A')"
TransactionScope.Complete()
End Using
End Using
Next i
Is there any Transaction / Isolation-Level configuration that will make 'SELECT count(*) FROM TABLE_A' always return '1' when run from other processes (i.e. even though there are 5 second chunks of time when there are no rows in the table in the context of the transaction)?
Yes, you can make other processes not see the changes you do in the transaction shown. To do that you need to alter the other processes, not the one making the modification.
Turn on snapshot isolation and use IsolationLevel.Snapshot on the other reading processes. They will see the table in the state right before you made any modifications. They won't block (wait).
SNAPSHOT isolation is what you're looking for. Assuming that the table has a row when you start your loop, a concurrent SELECT running under SNAPSHOT isolation level will always see 1 row, no matter when is run, without ever waiting.
All other isolation levels, except READ UNCOMMITTED, will also always see exactly 1 row, but will often block for up to 5 seconds. Note that I consider READ_COMMITTED_SNAPSHOT as SNAPSHOT for this argument.
Dirty reads, ie. SELECTs running under REAd UNCOMMITTED isolation level, will 0, 1 or even 2 rows. That is no mistake, dirty reads may see 2 rows even though you never inserted 2 at a time, it is because race conditions between the scan point of the SELECT and the insert point of your transaction, see Previously committed rows might be missed if NOLOCK hint is used for a similar issue discussion.
I believe the default transaction timeout is 1 minute (see: http://msdn.microsoft.com/en-us/library/ms172070.aspx ) so within the context of your transaction I think you're correct to expect the table to have no records before your insert (regardless of the pause), as each command will complete in sequence within the transaction and that would have been the result of the delete.
Hope that helps.
I'm having problems getting my head round why this is happening. Pretty sure I understand the theory, but something else must be going on that I don't see.
Table A has the following schema:
ID [Primary Key]
Name
Type [Foreign Key]
SprocA sets Isolation Level to Repeatable Read, and Selects rows from Table A that have Type=1. It also updates these rows.
SprocB selects rows from Table A that have Type=2.
Now given that these are completely different rowsets, if I execute both at the same time (and put WAITFOR calls to slow it down), SprocB doesn't complete until SprocA.
I know it's to do with the query on Type, as if I select based on the Primary ID then it allows concurrent access to the table.
Anyone shed any light?
Cheers
With Repeatable Read set for the isolation level, you will hold a shared lock on all data you read until the transaction completes. That is until you COMMIT or ROLLBACK.
This will lower the concurrency of your application's access to this data. So if your first procedure SELECTS from table then calls a WAITFOR then SELECTS again etc within a transaction you will hold the shared lock the entire time until you commit the transaction or the process completes.
If this is a test procedure you are working with try added a COMMIT after each select and see if that helps the second procedure to run concurrently.
Good luck!
Kevin
SQL Server uses indexes to do range locks (which is what repeatable reads often use) so if you don't have index on Type perhaps it locks entire table...
The thing to remember is that the locked rows are black boxes to the other process.
You know that SprocA is just reading for type = 1 and that SprocbB is just reading for type = 2.
However, SprocB does not know what SprocA is going to do to those records. Before the transaction is completed, SprocA may update all of the records to type = 2. In that case, SprocB would be working incorrectly if it did not wait for SprocA to complete.
Maintaining concurrency when performing range locks / bulk changes is tough.