I am using transactionscope to ensure that data is being read to the database correctly. However, I may have a need to select some data (from another page) while the transaction is running. Would it be possible to do this? I'm very noob when it comes to databases.
I am using LinqToSQL and SQL Server 2005(dev)/2008(prod).
Yes, it is possible to still select data from a database while a transaction is running.
Data not affected by your transaction (for instance, rows in a table which are being not updated) can usually be read from other transactions. (In certain situations SQL Server will introduce a table lock that stops reads on all rows in the table but they are unusual and most often a symptom of something else going on in your query or on the server).
You need to look into Transaction Isolation Levels since these control exactly how this behaviour will work.
Here is the C# code to set the isolation level of a transaction scope.
TransactionOptions option = new TransactionOptions();
options.IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted;
using (TransactionScope sc = new TransactionScope(TransactionScopeOption.Required, options)
{
// Code within transaction
}
In general, depending on the transaction isolation level specified on a transaction (or any table hints like NOLOCK) you get different levels of data locking that protect the rest of your application from activity tied up in your transaction. With a transaction isolation level of READUNCOMMITTED for example, you can see the writes within that transaction as they occur. This allows for dirty reads but also prevents (most) locks on data.
The other end of the scale is an isolation level like SERIALIZABLE which ensures that your transaction activity is entirely isolated until it has comitted.
In adition to the already provided advice, I would strongly recommend you look into snapshot isolation models. There is a good discussion at Using Snapshot Isolation. Enabling Read Committed Snapshot ON on the database can aleviate a lot of contention problems because readers are no longer blocked by writers. Since default reads are performed under read commited isolation mode, this simple database option switch has immediate benefits and requires no changes in the app.
There is no free lunch, so this comes at a price, in this case the price being aditional load on tempdb, see Row Versioning Resource Usage.
If howeever you are using explict isolation levels and specially if you use the default TransactionScope Serializable mode, then you'll have to review your code to enforce the more bening ReadCommited isolation level. If you don't know what isolation level you use, it means you use ReadCommited.
Yes, by default a TransactionScope will lock the tables involved in the transaction. If you need to read while a transaction is taking place, enter another TransactionScope with TransactionOptions IsolationLevel.ReadUncommitted:
TransactionScopeOptions = new TransactionScopeOptions();
options.IsolationLevel = IsolationLevel.ReadUncommitted;
using(var scope = new TransactionScope(
TransactionScopeOption.RequiresNew,
options
) {
// read the database
}
With a LINQ-to-SQL DataContext:
// db is DataContext
db.Transaction =
db.Connection.BeginTransaction(System.Data.IsolationLevel.ReadUncommitted);
Note that there is a difference between System.Transactions.IsolationLevel and System.Data.IsolationLevel. Yes, you read that correctly.
Related
I have some doubts with respect to transactions and isolation levels:
1) In case the DB transaction level is set to Serializable / Repeatable Read and there are two concurrent transactions trying to modify the same data then one of the transaction will fail.
In such cases, why DB doesn't re-tries the failed operation? Is it a good practice to retry the transaction on application level (hoping the other transaction will be over in mean time)?
2) In case the DB transaction level is set to READ_COMMITTED / DIRTY READ and there are two concurrent transactions trying to modify the same data then why the transactions don't fail?
Ideally we are controlling the read behaviour and concurrent writes should not be allowed.
3) My application has 2 parts and uses the spring managed datasource in one part and application created datasource in other part (this part doesn't use spring and data source is explicit created by passing the properties).
My assumption is that isolation level has no impact - from which datasource the connections is coming from...two concurrent transactions even if coming from different datasource will behave the same based on isolation level as if they are coming from same datasource.
Do you see any issue with this setup? Should we strive for single datasource across application?
I also wait until others to give their feed backs. But now i would like to give my 2 cents to this post.
As you explained isolation's are work differently each.
I'll try to keep a sample data set as follows
IF OBJECT_ID('Employees') IS NOT NULL DROP TABLE Employees
GO
CREATE TABLE Employees (
Emp_id INT IDENTITY,
Emp_name VARCHAR(20),
Emp_Telephone VARCHAR(15),
)
ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (emp_id)
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Satsara', '07436743439'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Udhara', '045672903'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Sithara', '58745874859'
REPEATABLE READ and SERIALIZABLE are both very close to each, but SERIALIZABLE is the heights in the isolation. Both options are provided for avoid the dirty readings and both need to manage very carefully because most of the time this will cause for deadlocks due to the way that it handing the data. If there's a deadlock, definitely server will wipe out one transaction from the picture. So it will never run it by the server again due to it doesn't have any clue about that removed transaction, unless a log.
REPEATABLE READ - Not allow to modify (lock records) any records which is already read by another process (another query). But it allows for new records to insert (without a lock) which can be impact to your system while querying.
SERIALIZABLE - Different in Serializable is, its not allow to insert records with
"SET TRANSACTION ISOLATION LEVEL Serializable". So INSERT processors are wait until the previous transaction commit.
Usually REPEATABLE READ and SERIALIZABLE isolation's are keep data locks than other two options.
example [REPEATABLE and SERIALIZABLE]:
In Employee table you have 3 records.
Open a query window and run (QUERY 1)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM Employees;
Now try to run a insert query in a different window (QUERY 2)
INSERT INTO Employees(Emp_name, Emp_Telephone)
SELECT 'JANAKA', '3333333'
System allow to insert the new record in QUERY 2 and now run the same query2 again and you can see 4 records.
Now replace the Query 1 with following code and try the same process to test the Serializable
SET TRANSACTION ISOLATION LEVEL Serializable
BEGIN TRAN
SELECT * FROM Employees;
This time you can see the that 2nd Query insert command not allow to execute and wait until the Query 1 to commit.
Once Query 1 committed only, Query 2 allows to execute the INSERT command.
When compare the Read Committed and the Read Uncommitted,
READ COMMITTED - Changes to the data is not visible to other processors until it commit the records. With Read Committed. it puts shared locks for all the records it reads. If another process found a exclusive lock by, it wait until its lock release.
READ UNCOMMITTED - Not recommended and garbage data can read by the system due to this. (in SQL Server nolock). So this will return the uncommitted data.
"Select * from Employee (nolock)
**DEADLOCKS - ** Whether its Repeatable read, Serializable, READ COMMITTED or READ UNCOMMITTED, it can creates dead locks. Only things
is as we discussed Repeatable read and Serializable are more prone to
deadlocks than other two options.
Note: If you need sample for Read Committed and Read Uncommitted, please let know in the comment section and we can discuss.
Actually this topic is very large topic and need to discuss with lots of samples. I do not know this explanation is enough or not. But i gave a small try. NO idea ware to start and when to stop.
At the same time, you asked about " Is it a good practice to retry the
transaction on application level "
In my opinion that's fine. Personally i also do retrying process in some sort of a situations.
Different techniques used.
Keeping a Flag field to identify it updated or not and retry
Using a Event driven solution such RabitMQ, KAFKA.
In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How to get this behaviour in SQL Server? Currently, the row is locked until the transaction is ended. WITH (NOLOCK) inside the select statement gives the new value from the uncommitted transaction which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;
With Read Committed Snapshot Isolation (RCSI), versions of rows are stored in a version store, so readers can read a version of a row that existed at the time the statement started and before any changes have been made; while a transaction is open; without taking shared locks on rows or pages; and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data,
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that version store is not free... it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting * on-prem but not in Azure SQL Database *, read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.
Recently I was thinking about query consistency in various SQL and NoSQL databases. What happens, when I have a (long running) query and rows are inserted or updated while the query is running? A simple theoretic example:
Let’s assume the following query takes a long time:
SELECT SUM(salary) FROM emp;
And while this query is running, another transaction does:
UPDATE emp SET salary = salary * 1.05 WHERE salary > 10000;
COMMIT;
When the SUM query has read half of the updated employees before the update, and the other half after the update, I would get an inconsistent nonsense result. Does this phenomenon have a name? By definition, it is not really a phantom read, because just one query is involved.
How do various DBs handle this situation? I am especially interested in SQL Server, MongoDB, RavenDB and Azure Table Storage.
Oracle for example guarantees statement-level read consistency, which says that the data returned by a single query is committed and consistent for a single point in time.
UPDATE: SQL Server seems to only prevent this kind of problem when READ_COMMITTED_SNAPSHOT is set to ON.
I believe the term you're looking for is "Dirty Read"
I can answer this one for SQL server.
You get 5 options for transaction isolation level, where the default is READ COMMITTED.
Only READ UNCOMMITTED allows dirty reads. You'll have to specifically enable that using SET TRANSACTION LEVEL READ UNCOMMITTED.
READ UNCOMMITTED is equivalent to NOLOCK, but syntactically nicer (opinion) as it doesn't need to be repeated for each table in your query.
Possible isolation levels are as below. I've linked the docs for more detail, if future readers find the link stale please edit.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SNAPSHOT
SERIALIZABLE
By default (read committed), you get your query and the update is blocked by the shared lock taken by your SELECT, until it completes.
If you enable Read Committed Snapshot Isolation Level (RCSI) as a database option, you continue to see the previous version of the data but the update isn't blocked.
Similarly, if the update was running first, when you have RSCI enabled, it doesn't block you, but you see the data before the update started.
RCSI is generally (but not 100% always) a good thing. I always design with it on. In Azure SQL DB, it's on by default.
Quick (I think) question about how Entity Framework participates in a DTC Transaction.
If I create two DbContexts within the same distributed transaction, will data I save in the first context be available to subsequent queries via the second context?
In other words, within a DTC transaction: I fetch, change and save data via context1, then create context2 and query the same entities. Will the uncommitted data from context1 be available to context2?
That depends on the isolation level you're using.
If you were to enclose your DbContexts in a TransactionScope and specifiy IsolationLevel.ReadUncommitted, e.g.
var options = new TransactionOptions{ IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted };
using(var scope= new TransactionScope(TransactionScopeOption.Required, options))
{
... DbContexts here
scope.Complete();
}
Then your second query will be able to see the changes made by the first.
For other isolation levels they won't.
The default isolation level for SQL Server is Read Committed, but for TransactionScope is Serializable.
See http://www.gavindraper.co.uk/2012/02/18/sql-server-isolation-levels-by-example/ for some examples.
It appears that the second context can indeed see the changes committed by the first. We are using read committed isolation. I am essentially doing (pseudo code):
Start Distributed Transaction
Context1: Fetch data into entities
Context1: Change entities
Context1: Save entities
Context2: Fetch data into same entities as used context 1
Context2: Change entities
Context2: Save entities
Commit Distributed Transaction
When I fetch into context 2, I do indeed see the changes that were saved in context 1, even though they are not committed (via DTC).
Say that a method only reads data from a database and does not write to it. Is it always the case that such methods don't need to run within a transaction?
In many databases a request for reading from the database which is not in an explicit transaction implicitly creates a transaction to run the request.
In a SQL database you may want to use a transaction if you are running multiple SELECT statements and you don't want changes from other transactions to show up in one SELECT but not an earlier one. A transaction running at the SERIALIZABLE transaction isolation level will present a consistent view of the data across multiple statements.
No. If you don't read at a specific isolation level you might not get enough guarantees. For example rows might disappear or new rows might appear.
This is true even for a single statement:
select * from Tab
except select * from Tab
This query can actually return rows in case of concurrent modifications because it scans the table twice.
SQL Server: There is an easy way to get fast, nonblocking, nonlocking, consistent reads: Enable snapshot isolation and read in a snapshot transaction. AFAIK Oracle has this capability as well. Postgres too.
the purpose of transaction is to rollback or commit the operations done to a database, if u are just selecting values and making no change in the data there is no need of transaction.