in-flight collision of DB transaction with different Isolation Level - database

I have some doubts with respect to transactions and isolation levels:
1) In case the DB transaction level is set to Serializable / Repeatable Read and there are two concurrent transactions trying to modify the same data, then one of the transactions will fail.
In such cases, why doesn't the DB retry the failed operation? Is it good practice to retry the transaction at the application level (hoping the other transaction will be over in the meantime)?
2) In case the DB transaction level is set to READ_COMMITTED / DIRTY READ and there are two concurrent transactions trying to modify the same data, then why don't the transactions fail?
Ideally we are controlling the read behaviour and concurrent writes should not be allowed.
3) My application has 2 parts: one part uses the Spring-managed datasource and the other uses an application-created datasource (this part doesn't use Spring, and the datasource is created explicitly by passing the properties).
My assumption is that it makes no difference to the isolation level which datasource a connection comes from: two concurrent transactions coming from different datasources will behave the same, based on isolation level, as if they came from the same datasource.
Do you see any issue with this setup? Should we strive for a single datasource across the application?

I was also waiting for others to give their feedback, but I would like to add my 2 cents to this post.
As you explained, each isolation level works differently.
I'll use the following sample data set:
IF OBJECT_ID('Employees') IS NOT NULL DROP TABLE Employees
GO
CREATE TABLE Employees (
Emp_id INT IDENTITY,
Emp_name VARCHAR(20),
Emp_Telephone VARCHAR(15),
)
ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (emp_id)
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Satsara', '07436743439'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Udhara', '045672903'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Sithara', '58745874859'
REPEATABLE READ and SERIALIZABLE are very close to each other, but SERIALIZABLE is the highest isolation level. Both are provided to avoid dirty reads, and both need to be managed very carefully, because the way they hold on to data makes them prone to deadlocks. If there is a deadlock, the server will pick one transaction as the victim and roll it back. The server will not run that transaction again by itself, because it has no knowledge of the removed transaction apart from the log.
REPEATABLE READ - Does not allow other processes to modify records that a transaction has already read (the records stay locked). It does allow new records to be inserted (without a lock), which can affect your system while querying.
SERIALIZABLE - The difference is that with "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE" new records cannot be inserted into the range that was read either, so INSERT statements wait until the earlier transaction commits.
Usually REPEATABLE READ and SERIALIZABLE hold data locks longer than the other two options.
Example [REPEATABLE READ and SERIALIZABLE]:
The Employees table has 3 records.
Open a query window and run (QUERY 1):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM Employees;
Now try to run an insert query in a different window (QUERY 2):
INSERT INTO Employees(Emp_name, Emp_Telephone)
SELECT 'JANAKA', '3333333'
The system allows QUERY 2 to insert the new record; if you now run the SELECT again, you will see 4 records.
Now replace QUERY 1 with the following code and repeat the same process to test SERIALIZABLE:
SET TRANSACTION ISOLATION LEVEL Serializable
BEGIN TRAN
SELECT * FROM Employees;
This time the INSERT in QUERY 2 is not allowed to execute and waits until QUERY 1 commits.
Only once QUERY 1 has committed is QUERY 2 allowed to execute the INSERT.
Comparing READ COMMITTED and READ UNCOMMITTED:
READ COMMITTED - Changes to the data are not visible to other processes until the records are committed. A READ COMMITTED reader puts shared locks on the records it reads, and if it finds an exclusive lock held by another process, it waits until that lock is released.
READ UNCOMMITTED - Not recommended, because it returns uncommitted data and the system can read garbage as a result. (In SQL Server this is what the NOLOCK hint does.)
SELECT * FROM Employees WITH (NOLOCK)
DEADLOCKS - Whether it is REPEATABLE READ, SERIALIZABLE, READ COMMITTED or READ UNCOMMITTED, deadlocks can occur. The only difference, as discussed, is that REPEATABLE READ and SERIALIZABLE are more prone to deadlocks than the other two options.
Note: If you need samples for READ COMMITTED and READ UNCOMMITTED, please let me know in the comment section and we can discuss.
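A minimal two-session sketch of the READ COMMITTED versus READ UNCOMMITTED difference, reusing the Employees table from above. It assumes the database has READ_COMMITTED_SNAPSHOT switched OFF, and the phone value is made up for illustration:

```sql
-- Session 1: change a row but leave the transaction open
BEGIN TRAN;
UPDATE Employees SET Emp_Telephone = '0000000' WHERE Emp_id = 1;

-- Session 2: READ UNCOMMITTED returns the uncommitted value ('0000000')
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Emp_name, Emp_Telephone FROM Employees WHERE Emp_id = 1;

-- Session 2: READ COMMITTED waits here until session 1 commits or rolls back
-- (assumes READ_COMMITTED_SNAPSHOT is OFF for the database)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT Emp_name, Emp_Telephone FROM Employees WHERE Emp_id = 1;

-- Session 1: undo the change; the blocked SELECT can now finish
ROLLBACK;
```

The value read under READ UNCOMMITTED never existed as committed data, which is why that level is described as returning garbage.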
This is actually a very large topic that needs lots of samples to discuss properly. I do not know whether this explanation is enough, but I gave it a small try; it is hard to know where to start and when to stop.
You also asked: "Is it a good practice to retry the transaction on application level?"
In my opinion that's fine. Personally I also retry in some situations.
Different techniques can be used:
Keeping a flag field to identify whether the row was updated or not, and retrying
Using an event-driven solution such as RabbitMQ or Kafka
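The retry idea could also be sketched directly in T-SQL. This is only an illustration, not what the answer above describes: it assumes the only errors worth retrying are 1205 (chosen as deadlock victim) and 3960 (snapshot update conflict), and the attempt limit, delay and phone value are arbitrary.

```sql
DECLARE @attempt INT = 1, @max_attempts INT = 3, @done BIT = 0;

WHILE @done = 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        -- the work to retry; this UPDATE is only a placeholder
        UPDATE Employees SET Emp_Telephone = '0712345678' WHERE Emp_id = 1;
        COMMIT;
        SET @done = 1;
    END TRY
    BEGIN CATCH
        -- clean up whatever is left of the failed transaction
        IF XACT_STATE() <> 0 ROLLBACK;

        -- not a retryable error, or out of attempts: re-raise to the caller
        IF ERROR_NUMBER() NOT IN (1205, 3960) OR @attempt >= @max_attempts
            THROW;

        SET @attempt += 1;
        WAITFOR DELAY '00:00:01';   -- give the other transaction time to finish
    END CATCH
END
```

The same loop shape works in application code; the important parts are retrying the whole transaction rather than one statement, and capping the number of attempts.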

Related

Statement-Level Read Consistency in various SQL/NoSQL DBs

Recently I was thinking about query consistency in various SQL and NoSQL databases. What happens, when I have a (long running) query and rows are inserted or updated while the query is running? A simple theoretic example:
Let’s assume the following query takes a long time:
SELECT SUM(salary) FROM emp;
And while this query is running, another transaction does:
UPDATE emp SET salary = salary * 1.05 WHERE salary > 10000;
COMMIT;
When the SUM query has read half of the updated employees before the update, and the other half after the update, I would get an inconsistent nonsense result. Does this phenomenon have a name? By definition, it is not really a phantom read, because just one query is involved.
How do various DBs handle this situation? I am especially interested in SQL Server, MongoDB, RavenDB and Azure Table Storage.
Oracle for example guarantees statement-level read consistency, which says that the data returned by a single query is committed and consistent for a single point in time.
UPDATE: SQL Server seems to only prevent this kind of problem when READ_COMMITTED_SNAPSHOT is set to ON.
I believe the term you're looking for is "Dirty Read"
I can answer this one for SQL server.
You get 5 options for transaction isolation level, where the default is READ COMMITTED.
Only READ UNCOMMITTED allows dirty reads. You'll have to specifically enable that using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED.
READ UNCOMMITTED is equivalent to NOLOCK, but syntactically nicer (opinion) as it doesn't need to be repeated for each table in your query.
Possible isolation levels are as below. I've linked the docs for more detail, if future readers find the link stale please edit.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SNAPSHOT
SERIALIZABLE
By default (read committed), you get your query and the update is blocked by the shared lock taken by your SELECT, until it completes.
If you enable Read Committed Snapshot Isolation Level (RCSI) as a database option, you continue to see the previous version of the data but the update isn't blocked.
Similarly, if the update was running first, when you have RCSI enabled, it doesn't block you, but you see the data as it was before the update started.
RCSI is generally (but not 100% always) a good thing. I always design with it on. In Azure SQL DB, it's on by default.

MSSQL how to properly Lock rows and insert?

I want to insert two rows in 2 different tables but want to roll back the transaction if some pre conditions on the second table are met.
Does it work in .NET if I simply start a transaction scope and execute a SQL query to check data on the second table before executing the insert statements? If so, what isolation level should I use?
I don't want it to lock the whole tables as there are going to be many inserts. A UNIQUE constraint is not an option because what I want to do is guarantee that no more than 2 rows in the 2nd table have the same value (FK to a PK column of table 1)
Thanks
Yes, you can execute a SQL query to check data on the second table before executing the insert statements.
FYI, the default isolation level for System.Transactions is Serializable. From MSDN:
The lowest isolation level, ReadUncommitted, allows many transactions to operate on a data store simultaneously and provides no protection against data corruption due to interruptive transactions. The highest isolation level, Serializable, provides a high degree of protection against interruptive transactions, but requires that each transaction complete before any other transactions are allowed to operate on the data.
The isolation level of a transaction is determined when the transaction is created. By default, the System.Transactions infrastructure creates Serializable transactions. You can determine the isolation level of an existing transaction using the IsolationLevel property of a transaction.
Given your requirement, I do not think you want to use Serializable, since it is the least friendly level for high-volume multi-user systems: it causes the most blocking.
You need to decide on the amount of protection that is required. At a minimum, you should look into READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ. The following answer goes over Isolation Levels in detail. From that, you can decide what level of protection is sufficient for your requirement.
Transaction isolation levels relation with locks on table
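One way the check-then-insert could look in plain T-SQL is sketched below. It swaps in table hints (UPDLOCK, HOLDLOCK) on the check query instead of a transaction-wide isolation level, which is not something the answer above proposes. Parent, Child and their columns are hypothetical names, the sketch assumes Child has an index on parent_id so that only the counted key range is locked, and it assumes the Child key column is generated automatically.

```sql
-- Hypothetical tables: Parent (parent_id PK), Child (child_id IDENTITY PK, parent_id FK)
DECLARE @parent_id INT = 1;   -- hypothetical existing Parent row

BEGIN TRAN;

-- UPDLOCK + HOLDLOCK keeps the counted key range locked until the transaction
-- ends, so a second session running the same check for this parent waits here
IF (SELECT COUNT(*)
    FROM Child WITH (UPDLOCK, HOLDLOCK)
    WHERE parent_id = @parent_id) >= 2
BEGIN
    ROLLBACK;   -- pre-condition met: this parent already has two rows
END
ELSE
BEGIN
    INSERT INTO Child (parent_id) VALUES (@parent_id);
    COMMIT;
END
```

Because the hints apply only to the one query, inserts for other parent_id values are not serialized against each other.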

when/what locks are hold/released in READ COMMITTED isolation level

I am trying to understand isolation/locks in SQL Server.
I have following scenario in READ COMMITTED isolation level(Default)
We have a table.
create table Transactions(Tid int,amt int)
with some records
insert into Transactions values(1, 100)
insert into Transactions values(2, -50)
insert into Transactions values(3, 100)
insert into Transactions values(4, -100)
insert into Transactions values(5, 200)
Now from MSDN I understood:
When a select is fired, a shared lock is taken so no other transaction can modify the data (avoiding dirty reads). The documentation also talks about row-level, page-level and table-level locks. I thought of the following scenario:
Begin Transaction
select * from Transactions
/*
some business logic which takes 5 minutes
*/
Commit
What I want to understand is for how long the shared lock would be held, and on what (row, page, table).
Will the lock be held only while the statement select * from Transactions runs, or for the whole 5+ minutes until we reach COMMIT?
You are asking the wrong question, you are concerned about the implementation details. What you should think of and be concerned with are the semantics of the isolation level. Kendra Little has a nice poster explaining them: Free Poster! Guide to SQL Server Isolation Levels.
Your question should be rephrased like:
select * from Items
Q: What Items will I see?
A: All committed Items
Q: What happens if there are uncommitted transactions that have inserted/deleted/updated Items?
A: Your SELECT will block until all uncommitted Items are committed (or rolled back).
Q: What happens if Items are inserted/deleted/updated while I run the query above?
A: The results are undetermined. You may see some of the modifications, not see others, and possibly block until some of them commit.
READ COMMITTED makes no promise once your statement has finished, irrespective of the length of the transaction. If you run the statement again you will have exactly the same semantics as stated before, and the Items you've seen before may change or disappear, and new ones can appear. Obviously this implies that changes can be made to Items after your select.
Higher isolation levels give stronger guarantees: REPEATABLE READ guarantees that no item you've selected the first time can be modified or deleted until you commit. SERIALIZABLE adds the guarantee that no new Item can appear in your second select before you commit.
This is what you need to understand, not how the implementation mechanism works. After you master these concepts, you may ask about the implementation details. They're all described in Transaction Processing: Concepts and Techniques.
Your question is a good one. Understanding what kind of locks are acquired allows a deep understanding of DBMS's. In SQL Server, under all isolation levels (Read Uncommitted, Read Committed (default), Repeatable Reads, Serializable) Exclusive Locks are acquired for Write operations.
Exclusive locks are released when transaction ends, regardless of the isolation level.
The difference between the isolation levels refers to the way in which Shared (Read) Locks are acquired/released.
Under Read Uncommitted isolation level, no Shared locks are acquired. Under this isolation level the concurrency issue known as "Dirty Reads" (a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed, so it could be rolled back) can occur.
Under Read Committed isolation level, Shared Locks are acquired for the concerned records. The Shared Locks are released when the current instruction ends. This isolation level prevents "Dirty Reads" but, since the record can be updated by other concurrent transactions, "Non-Repeatable Reads" (transaction A retrieves a row, transaction B subsequently updates the row, and transaction A later retrieves the same row again. Transaction A retrieves the same row twice but sees different data) or "Phantom Reads" (in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first) can occur.
Under Repeatable Reads isolation level, Shared Locks are acquired for the transaction duration. "Dirty Reads" and "Non-Repeatable Reads" are prevented but "Phantom Reads" can still occur.
Under Serializable isolation level, ranged Shared Locks are acquired for the transaction duration. None of the above mentioned concurrency issues occur but performance is drastically reduced and there is the risk of Deadlocks occurrence.
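The lock durations described above can be observed rather than taken on trust. The sketch below assumes the Transactions table from the question and a login that is allowed to query sys.dm_tran_locks (it needs permission to view server state):

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- switch to READ COMMITTED to compare
BEGIN TRAN;

SELECT * FROM Transactions;

-- Locks this session still holds now that the SELECT has finished
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;

ROLLBACK;
```

Under REPEATABLE READ the row-level shared locks should still be listed after the SELECT; under READ COMMITTED they should already be gone, which matches the description of shared locks being released when the statement ends.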
The lock is only acquired while select * from Transactions runs.
You can check it with the code below.
Open a SQL session and run this query:
Begin Transaction
select * from Transactions
WAITFOR DELAY '00:05'
/*
some business logic which takes 5 minutes
*/
Commit
Open another sql session and run below query
Begin Transaction
Update Transactions
Set = ...
where ....
commit
First, a lock is only acquired when a statement runs.
Your transaction separates into two pieces; to simplify, suppose they are:
select * from Transactions
update Transactions set amt = xxx where Tid = xxx
When/what locks are held/released in READ COMMITTED isolation level?
When select * from Transactions runs, its shared locks are released as soon as the rows have been read, so no lock remains held afterwards.
The following update Transactions set amt = xxx where Tid = xxx adds X locks on the keys being updated and IX locks on the page and table.
Those locks are released only at commit or rollback. That means none of them is released while the transaction is running.

Read committed Snapshot VS Snapshot Isolation Level

Could some one please help me understand when to use SNAPSHOT isolation level over READ COMMITTED SNAPSHOT in SQL Server?
I understand that in most cases READ COMMITTED SNAPSHOT works, but I am not sure when to go for SNAPSHOT isolation.
Thanks
READ COMMITTED SNAPSHOT does optimistic reads and pessimistic writes. In contrast, SNAPSHOT does optimistic reads and optimistic writes.
Microsoft recommends READ COMMITTED SNAPSHOT for most apps that need row versioning.
Read this excellent Microsoft article: Choosing Row Versioning-based Isolation Levels. It explains the benefits and costs of both isolation levels.
And here's a more thorough one:
http://msdn.microsoft.com/en-us/library/ms345124(SQL.90).aspx
[Image: Isolation levels table]
See the example below:
Read Committed Snapshot
Change the database property as below
ALTER DATABASE SQLAuthority
SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE
GO
Session 1
USE SQLAuthority
GO
BEGIN TRAN
UPDATE DemoTable
SET i = 4
WHERE i = 1
Session 2
USE SQLAuthority
GO
BEGIN TRAN
SELECT *
FROM DemoTable
WHERE i = 1
Result – Query in Session 2 shows old value (1, ONE) because current transaction is NOT committed. This is the way to avoid blocking and read committed data also.
Session 1
COMMIT
Session 2
USE SQLAuthority
GO
SELECT *
FROM DemoTable
WHERE i = 1
Result – Query in Session 2 shows no rows because row is updated in session 1. So again, we are seeing committed data.
Snapshot Isolation Level
This is the new isolation level, which was available from SQL Server 2005 onwards. For this feature, there is a change needed in the application as it has to use a new isolation level.
Change database setting using below. We need to make sure that there is no transaction in the database.
ALTER DATABASE SQLAuthority SET ALLOW_SNAPSHOT_ISOLATION ON
Now, we also need to change the isolation level of connection by using below
Session 1
USE SQLAuthority
GO
BEGIN TRAN
UPDATE DemoTable
SET i = 10
WHERE i = 2
Session 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO
USE SQLAuthority
GO
BEGIN TRAN
SELECT *
FROM DemoTable
WHERE i = 2
Result- Even if we have changed the value to 10, we will still see old record in session 2 (2, TWO).
Now, let’s commit transaction in session 1
Session 1
COMMIT
Let’s come back to session 2 and run select again.
Session 2
SELECT *
FROM DemoTable
WHERE i = 2
We will still see the record because session 2 started the transaction with snapshot isolation. Until we complete the transaction, we will not see the latest record.
Session 2
COMMIT
SELECT *
FROM DemoTable
WHERE i = 2
Now, we should not see the row as it's already updated.
See: SQL Authority, Safari Books Online
No comparison of Snapshot and Snapshot Read Committed is complete without a discussion of the dreaded "snapshot update conflict" exception that can happen in Snapshot, but not Snapshot Read Committed.
In a nutshell, Snapshot isolation retrieves a snapshot of committed data at the start of a transaction, and then uses optimistic locking for both reads and writes. If the transaction then tries to modify data that something else has changed since that snapshot was taken, the database will roll back the entire transaction and raise an error, causing a snapshot update conflict exception in the calling code. This is because the version of the data affected by the transaction is no longer the same as it was at the start of the transaction.
Snapshot Read Committed does not suffer from this problem because it uses locking on writes (pessimistic writes) and it obtains snapshot version information of all committed data at the start of each statement.
The possibility of snapshot update conflicts happening in Snapshot and NOT Snapshot Read Committed is an extremely significant difference between the two.
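A minimal sketch of how such a conflict can be provoked. The Accounts table is hypothetical, and the database is assumed to have ALLOW_SNAPSHOT_ISOLATION set to ON:

```sql
-- Setup (hypothetical table)
CREATE TABLE Accounts (id INT PRIMARY KEY, balance INT);
INSERT INTO Accounts VALUES (1, 100);

-- Session 1: start a snapshot transaction and read the row
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT balance FROM Accounts WHERE id = 1;   -- the snapshot sees 100

-- Session 2: change the same row and commit (autocommit)
UPDATE Accounts SET balance = 150 WHERE id = 1;

-- Session 1: try to change the row it read earlier; this fails with
-- error 3960 (snapshot update conflict) and the transaction is rolled back
UPDATE Accounts SET balance = balance + 10 WHERE id = 1;
```

If session 1 ran under READ COMMITTED with READ_COMMITTED_SNAPSHOT enabled instead, its UPDATE would work on the current committed value and succeed, so the calling code only needs conflict handling for SNAPSHOT.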
Still relevant, starting with Bill's comments I read more and made notes that might be useful to someone else.
By default single statements (including SELECT) work on "committed" data (READ COMMITTED), the question is: do they wait for data to be "idle" and stop others from working when reading?
Setting via right click DB "Properties -> Options -> Miscellaneous":
Concurrency/Blocking: Is Read Committed Snapshot On [defaults off, should be on]:
Use SNAPSHOT for select (read), do not wait for others, nor block them.
Affects operation without code change
ALTER DATABASE <dbName> SET READ_COMMITTED_SNAPSHOT [ON|OFF]
SELECT name, is_read_committed_snapshot_on FROM sys.databases
Consistency: Allow Snapshot Isolation [defaults off, debatable – OK off]:
Allow client to request SNAPSHOT across SQL statements (transactions).
Code must request "transaction" snapshots (like SET TRANSACTION ...)
ALTER DATABASE <dbName> SET ALLOW_SNAPSHOT_ISOLATION [ON|OFF]
SELECT name, snapshot_isolation_state FROM sys.databases
To the question: it is not one or the other between Read Committed Snapshot and Allow Snapshot Isolation. They are two cases of Snapshot, and either could be on or off independently, with Allow Snapshot Isolation a bit more of an advanced topic. Allow Snapshot Isolation allows code to go a step further controlling Snapshot land.
The issue seems clear if you think about one row: by default the system has no copy, so a reader has to wait if anyone else is writing, and a writer also has to wait if anyone else is reading – the row must lock all the time. Enabling "Is Read Committed Snapshot On" activates the DB to support "snapshot copies" to avoid these locks.
Rambling on...
In my opinion "Is Read Committed Snapshot On" should be TRUE for any normal MS SQLServer databases, and that it is a premature optimization that it ships FALSE by default.
However, I'm told the one row lock gets worse not only because you may be addressing multiple rows across tables, but because in SQL Server row locks are implemented using "block" level locks (locking random rows associated by storage proximity) and that there is a threshold where multiple locks trigger table locking - presumably more "optimistic" performance optimizations at the risk of blocking issues in busy databases.
Let me describe 2 points that have not been mentioned.
Firstly let's make it clear how to use both because it's not intuitive.
SNAPSHOT and READ_COMMITTED_SNAPSHOT are two different isolation levels.
SNAPSHOT is isolation level you can use in your transaction explicitly as usual:
set transaction isolation level snapshot; -- must be set before the transaction starts
begin transaction
-- ...
commit
READ_COMMITTED_SNAPSHOT can't be used like this. READ_COMMITTED_SNAPSHOT is both a database level option and an implicit/automatic isolation level. To use it, you need to enable it for the whole database:
alter database ... set read_committed_snapshot on;
What the above database setting does is affect every transaction you run like this:
begin transaction
set transaction isolation level read committed;
-- ...
commit
With this option ON, all READ_COMMITTED transactions will run under READ_COMMITTED_SNAPSHOT isolation level instead. This happens automatically, affecting all READ_COMMITTED transactions issued against database with this setting set to ON. It's not possible to run transaction under READ_COMMITTED isolation level because all transactions with this level will be automatically converted to READ_COMMITTED_SNAPSHOT.
Secondly you shouldn't blindly use READ_COMMITTED_SNAPSHOT option.
To illustrate the kind of problems it can create, imagine you have a simple events table like this:
create table Events (
id int not null identity(1, 1) primary key,
name nvarchar(450) not null
-- ...
)
And you poll it periodically with query like this:
begin transaction
set transaction isolation level read committed; -- automatically set to read committed snapshot when this setting is ON on database level
select top 100 * from Events where id > ${lastId} order by id asc;
commit
Above query doesn't need to be enclosed with transaction and explicit isolation level. READ_COMMITTED is default isolation level and if you invoke query without wrapping it in transaction block - it'll be implicitly run in READ_COMMITTED transaction.
You'll find that under READ_COMMITTED_SNAPSHOT isolation level auto-increment identity values may have gaps that later appear.
You can easily simulate it with insert like this:
begin transaction
insert into Events (name) values ('test 1');
waitfor delay '00:00:10'
commit
...followed by normal insert:
insert into Events (name) values ('test 2');
Your polling function, invoked within those 10s, will return a single row with id 2.
The following poll, after updating lastId, will return nothing. The row with id 1 will only appear after 10s.
The event with id 1 will effectively be skipped.
This will not happen if you use READ_COMMITTED without READ_COMMITTED_SNAPSHOT auto promotion option.
It's worth understanding this scenario. It's not related to the fact that IDENTITY column doesn't guarantee uniqueness. It's not related to the fact that IDENTITY column doesn't guarantee strict monotonicity. Even when both uniqueness and strict monotonicity are not violated, you still end up with gaps - possibility of seeing commits with higher ids before seeing commits with lower ids.
Under READ_COMMITTED this problem doesn't exist.
Under READ_COMMITTED you can also see gaps - ie. by transactions that rolled back. But those gaps will be permanent - ie. you are not skipping events because they will never reappear. Ie. you won't see lower ids reappearing later after you've seen higher ids.
Please understand above issue and its implications before turning READ_COMMITTED_SNAPSHOT on.
Control of this option lies in the gray area between developer and DB admin responsibility. If you're the admin, you should not blindly turn it on, as developers may have relied on READ_COMMITTED isolation semantics when developing the application, and turning READ_COMMITTED_SNAPSHOT on may violate those assumptions in a very implicit way that produces hard-to-find bugs.

SQL Server Isolation Levels - Repeatable Read

I'm having problems getting my head round why this is happening. Pretty sure I understand the theory, but something else must be going on that I don't see.
Table A has the following schema:
ID [Primary Key]
Name
Type [Foreign Key]
SprocA sets Isolation Level to Repeatable Read, and Selects rows from Table A that have Type=1. It also updates these rows.
SprocB selects rows from Table A that have Type=2.
Now given that these are completely different rowsets, if I execute both at the same time (and put WAITFOR calls to slow it down), SprocB doesn't complete until SprocA.
I know it's to do with the query on Type, as if I select based on the Primary ID then it allows concurrent access to the table.
Anyone shed any light?
Cheers
With Repeatable Read set for the isolation level, you will hold a shared lock on all data you read until the transaction completes. That is until you COMMIT or ROLLBACK.
This will lower the concurrency of your application's access to this data. So if your first procedure SELECTS from table then calls a WAITFOR then SELECTS again etc within a transaction you will hold the shared lock the entire time until you commit the transaction or the process completes.
If this is a test procedure you are working with, try adding a COMMIT after each select and see if that helps the second procedure to run concurrently.
Good luck!
Kevin
SQL Server uses indexes to do range locks (which is what repeatable reads often use) so if you don't have index on Type perhaps it locks entire table...
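If the missing index is indeed the cause, adding one on Type is a cheap experiment. The table and index names below are hypothetical, since the question only calls the table "Table A":

```sql
-- Hypothetical names; lets each procedure seek to its own Type value
-- instead of scanning rows that the other procedure has locked
CREATE NONCLUSTERED INDEX IX_TableA_Type ON TableA ([Type]);
```

With the index in place, a query on Type=2 should no longer need to touch the Type=1 rows that SprocA holds locked, though whether the optimizer actually uses the index depends on the data.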
The thing to remember is that the locked rows are black boxes to the other process.
You know that SprocA is just reading for type = 1 and that SprocB is just reading for type = 2.
However, SprocB does not know what SprocA is going to do to those records. Before the transaction is completed, SprocA may update all of the records to type = 2. In that case, SprocB would be working incorrectly if it did not wait for SprocA to complete.
Maintaining concurrency when performing range locks / bulk changes is tough.
