I'm having problems getting my head round why this is happening. Pretty sure I understand the theory, but something else must be going on that I don't see.
Table A has the following schema:
ID [Primary Key]
Name
Type [Foreign Key]
SprocA sets Isolation Level to Repeatable Read, and Selects rows from Table A that have Type=1. It also updates these rows.
SprocB selects rows from Table A that have Type=2.
Now given that these are completely different rowsets, if I execute both at the same time (and put WAITFOR calls to slow it down), SprocB doesn't complete until SprocA.
I know it's to do with the query on Type, as if I select based on the Primary ID then it allows concurrent access to the table.
Anyone shed any light?
Cheers
With Repeatable Read set for the isolation level, you will hold a shared lock on all data you read until the transaction completes. That is until you COMMIT or ROLLBACK.
This will lower the concurrency of your application's access to this data. So if your first procedure SELECTS from table then calls a WAITFOR then SELECTS again etc within a transaction you will hold the shared lock the entire time until you commit the transaction or the process completes.
If this is a test procedure you are working with try added a COMMIT after each select and see if that helps the second procedure to run concurrently.
Good luck!
Kevin
SQL Server uses indexes to do range locks (which is what repeatable reads often use) so if you don't have index on Type perhaps it locks entire table...
The thing to remember is that the locked rows are black boxes to the other process.
You know that SprocA is just reading for type = 1 and that SprocbB is just reading for type = 2.
However, SprocB does not know what SprocA is going to do to those records. Before the transaction is completed, SprocA may update all of the records to type = 2. In that case, SprocB would be working incorrectly if it did not wait for SprocA to complete.
Maintaining concurrency when performing range locks / bulk changes is tough.
Related
I have some doubts with respect to transactions and isolation levels:
1) In case the DB transaction level is set to Serializable / Repeatable Read and there are two concurrent transactions trying to modify the same data then one of the transaction will fail.
In such cases, why DB doesn't re-tries the failed operation? Is it a good practice to retry the transaction on application level (hoping the other transaction will be over in mean time)?
2) In case the DB transaction level is set to READ_COMMITTED / DIRTY READ and there are two concurrent transactions trying to modify the same data then why the transactions don't fail?
Ideally we are controlling the read behaviour and concurrent writes should not be allowed.
3) My application has 2 parts and uses the spring managed datasource in one part and application created datasource in other part (this part doesn't use spring and data source is explicit created by passing the properties).
My assumption is that isolation level has no impact - from which datasource the connections is coming from...two concurrent transactions even if coming from different datasource will behave the same based on isolation level as if they are coming from same datasource.
Do you see any issue with this setup? Should we strive for single datasource across application?
I also wait until others to give their feed backs. But now i would like to give my 2 cents to this post.
As you explained isolation's are work differently each.
I'll try to keep a sample data set as follows
IF OBJECT_ID('Employees') IS NOT NULL DROP TABLE Employees
GO
CREATE TABLE Employees (
Emp_id INT IDENTITY,
Emp_name VARCHAR(20),
Emp_Telephone VARCHAR(15),
)
ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (emp_id)
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Satsara', '07436743439'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Udhara', '045672903'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Sithara', '58745874859'
REPEATABLE READ and SERIALIZABLE are both very close to each, but SERIALIZABLE is the heights in the isolation. Both options are provided for avoid the dirty readings and both need to manage very carefully because most of the time this will cause for deadlocks due to the way that it handing the data. If there's a deadlock, definitely server will wipe out one transaction from the picture. So it will never run it by the server again due to it doesn't have any clue about that removed transaction, unless a log.
REPEATABLE READ - Not allow to modify (lock records) any records which is already read by another process (another query). But it allows for new records to insert (without a lock) which can be impact to your system while querying.
SERIALIZABLE - Different in Serializable is, its not allow to insert records with
"SET TRANSACTION ISOLATION LEVEL Serializable". So INSERT processors are wait until the previous transaction commit.
Usually REPEATABLE READ and SERIALIZABLE isolation's are keep data locks than other two options.
example [REPEATABLE and SERIALIZABLE]:
In Employee table you have 3 records.
Open a query window and run (QUERY 1)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM Employees;
Now try to run a insert query in a different window (QUERY 2)
INSERT INTO Employees(Emp_name, Emp_Telephone)
SELECT 'JANAKA', '3333333'
System allow to insert the new record in QUERY 2 and now run the same query2 again and you can see 4 records.
Now replace the Query 1 with following code and try the same process to test the Serializable
SET TRANSACTION ISOLATION LEVEL Serializable
BEGIN TRAN
SELECT * FROM Employees;
This time you can see the that 2nd Query insert command not allow to execute and wait until the Query 1 to commit.
Once Query 1 committed only, Query 2 allows to execute the INSERT command.
When compare the Read Committed and the Read Uncommitted,
READ COMMITTED - Changes to the data is not visible to other processors until it commit the records. With Read Committed. it puts shared locks for all the records it reads. If another process found a exclusive lock by, it wait until its lock release.
READ UNCOMMITTED - Not recommended and garbage data can read by the system due to this. (in SQL Server nolock). So this will return the uncommitted data.
"Select * from Employee (nolock)
**DEADLOCKS - ** Whether its Repeatable read, Serializable, READ COMMITTED or READ UNCOMMITTED, it can creates dead locks. Only things
is as we discussed Repeatable read and Serializable are more prone to
deadlocks than other two options.
Note: If you need sample for Read Committed and Read Uncommitted, please let know in the comment section and we can discuss.
Actually this topic is very large topic and need to discuss with lots of samples. I do not know this explanation is enough or not. But i gave a small try. NO idea ware to start and when to stop.
At the same time, you asked about " Is it a good practice to retry the
transaction on application level "
In my opinion that's fine. Personally i also do retrying process in some sort of a situations.
Different techniques used.
Keeping a Flag field to identify it updated or not and retry
Using a Event driven solution such RabitMQ, KAFKA.
I just realized that I fundamentally don't understand how .NET/SQL Server transactions work. I feel like I might pushing the envelop on "there's no such thing as a dumb question", but all of the documentation I've read is not easy to follow. I'm going to try to phrase this question in such a way that the answer will be pretty much yes/no.
If I have a .NET process running on one machine that is effectively doing this (not real code):
For i as Integer = 0 to 100
Using TransactionScope
Using SqlClient.SqlConnection
'Executed using SqlClient.SqlCommand'
"DELETE from TABLE_A"
Thread.Sleep(5000)
"INSERT INTO TABLE_A (Col1) VALUES ('A')"
TransactionScope.Complete()
End Using
End Using
Next i
Is there any Transaction / Isolation-Level configuration that will make 'SELECT count(*) FROM TABLE_A' always return '1' when run from other processes (i.e. even though there are 5 second chunks of time when there are no rows in the table in the context of the transaction)?
Yes, you can make other processes not see the changes you do in the transaction shown. To do that you need to alter the other processes, not the one making the modification.
Turn on snapshot isolation and use IsolationLevel.Snapshot on the other reading processes. They will see the table in the state right before you made any modifications. They won't block (wait).
SNAPSHOT isolation is what you're looking for. Assuming that the table has a row when you start your loop, a concurrent SELECT running under SNAPSHOT isolation level will always see 1 row, no matter when is run, without ever waiting.
All other isolation levels, except READ UNCOMMITTED, will also always see exactly 1 row, but will often block for up to 5 seconds. Note that I consider READ_COMMITTED_SNAPSHOT as SNAPSHOT for this argument.
Dirty reads, ie. SELECTs running under REAd UNCOMMITTED isolation level, will 0, 1 or even 2 rows. That is no mistake, dirty reads may see 2 rows even though you never inserted 2 at a time, it is because race conditions between the scan point of the SELECT and the insert point of your transaction, see Previously committed rows might be missed if NOLOCK hint is used for a similar issue discussion.
I believe the default transaction timeout is 1 minute (see: http://msdn.microsoft.com/en-us/library/ms172070.aspx ) so within the context of your transaction I think you're correct to expect the table to have no records before your insert (regardless of the pause), as each command will complete in sequence within the transaction and that would have been the result of the delete.
Hope that helps.
I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance :
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
Problem of the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all, in the end (if the INSERT transaction is rolled back) - so your e.g. report might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
On performance you keep focusing on select.
Shared does not block reads.
Shared lock blocks update.
If you have hundreds of shared locks it is going to take an update a while to get an exclusive lock as it must wait for shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock as no effect on other selects (1 or a 1000).
The difference is how the nolock versus shared lock effects update or insert operation.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have huge impacts on performance of updates. It also impact inserts.
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that take 20 hours to run if users are taking shared locks and 4 hours to run is users are taking no lock. Shared locks in this case cause data to be 16 hours stale.
Don't use nolocks wrong. But they do have a place. If you are going to cut a check when a byte is set to 1 and then set it to 2 when the check is cut - not a time for a nolock.
I have to add one important comment. Everyone is mentioning that NOLOCKreads only dirty data. This is not precise. It is also possible that you'll get the same row twice or the whole row is skipped during your read. The reason is that you could ask for some data at the same time when SQL Server is re-balancing b-tree.
Check another threads
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx)
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might me running on many user's PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), then a LOCK can occur and the two transactions block each other.
We had to add a "NO LOCK" to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time and we had LOCKS all the time.
I don't know if my example is clear enough? This is a real life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to having the READ UNCOMMITTED isolation level set on your database. The NOLOCK keyword allows finer grained control than setting the isolation level on the entire database.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
select with no lock - will select records which may / may not going to be inserted. you will read a dirty data.
for example - lets say a transaction insert 1000 rows and then fails.
when you select - you will get the 1000 rows.
With SQL Server's transaction isolation levels, you can avoid certain unwanted concurrency issues, like dirty reads and so forth.
The one I'm interested in right now is lost updates - the fact two transactions can overwrite one another's updates without anyone noticing it. I see and hear conflicting statements as to which isolation level at a minimum I have to choose to avoid this.
Kalen Delaney in her "SQL Server Internals" book says (Chapter 10 - Transactions and Concurrency - Page 592):
In Read Uncommitted isolation, all the behaviors described previously, except lost updates, are possible.
On the other hand, an independent SQL Server trainer giving us a class told us that we need at least "Repeatable Read" to avoid lost updates.
So who's right?? And why??
I dont know if it is too late to answer but I am just learning about transaction isolation levels in college and as part of my research I came across this link:
Microsoft Technet
Specifically the paragraph in question is:
Lost Update
A lost update can be interpreted in one of two ways. In the first scenario, a lost update is considered to have taken place when data that has been updated by one transaction is overwritten by another transaction, before the first transaction is either committed or rolled back. This type of lost update cannot occur in SQL Server 2005 because it is not allowed under any transaction isolation level.
The other interpretation of a lost update is when one transaction (Transaction #1) reads data into its local memory, and then another transaction (Transaction #2) changes this data and commits its change. After this, Transaction #1 updates the same data based on what it read into memory before Transaction #2 was executed. In this case, the update performed by Transaction #2 can be considered a lost update.
So in essence both people are right.
Personally (and I am open to being wrong, so please correct me as I am just learning this) I take from this the following two points:
The whole point of a transaction enviorment is to prevent lost updates as described in the top paragraph. So if even the most basic transaction level cant do that then why bother using it.
When people talk about lost updates, they know the first paragraph applies, and so generally speaking mean the second type of lost update.
Again, please correct me if anything here is wrong as I would like to understand this too.
The example in the book is of Clerk A and Clerk B receiving shipments of Widgets.
They both check the current inventory, see 25 is in stock. Clerk A has 50 widgets and updates to 75, Clerk B has 20 widgets and so updates to 45 overwriting the previous update.
I assume she meant this phenomena can be avoided at all isolation levels by Clerk A doing
UPDATE Widgets
SET StockLevel = StockLevel + 50
WHERE ...
and Clerk B doing
UPDATE Widgets
SET StockLevel = StockLevel + 20
WHERE ...
Certainly if the SELECT and UPDATE are done as separate operations you would need repeatable read to avoid this so the S lock on the row is held for the duration of the transaction (which would lead to deadlock in this scenario)
Lost updates may occur even if reads and writes are in separate transactions, like when users read data into Web pages, then update. In such cases no isolation level can protect you, especially when connections are reused from a connection pool. We should use other approaches, such as rowversion. Here is my canned answer.
My experience is that with Read Uncommitted you no longer get 'lost updates', you can however still get 'lost rollbacks'. The SQL trainer was probably referring to that concurrency issue, so the answer you're likely looking for is Repeatable Read.
That said, I would be very interested if anyone has experience that goes against this.
As marked by Francis Rodgers, what you can rely on SQL Server implementation is that once a transaction updated some data, every isolation level always issue "update locks" over the data, and denying updates and writes from another transaction, whatever it's isolation level it is. You can be sure this kind of lost updates are covered.
However, if the situation is that a transaction reads some data (with an isolation level different than Repeatable Read), then another transaction is able to change this data and commits it's change, and if the first transaction then updates the same data but this time, based on the internal copy that he made, the management system cannot do anything for saving it.
Your answer in that scenario is either use Repeatable Read in the first transaction, or maybe use some read lock from the first transaction over the data (I don't really know about that in a confident way. I just know of the existence of this locks and that you can use them. Maybe this will help anyone who's interested in this approach Microsoft Designing Transactions and Optimizing Locking).
The following is quote from 70-762 Developing SQL Databases (p. 212):
Another potential problem can occur when two processes read the same
row and then update that data with different values. This might happen
if a transaction first reads a value into a variable and then uses the
variable in an update statement in a later step. When this update
executes, another transaction updates the same data. Whichever of
these transactions is committed first becomes a lost update because it
was replaced by the update in the other transaction. You cannot use
isolation levels to change this behavior, but you can write an
application that specifically allows lost updates.
So, it seems that none of the isolation levels can help you in such cases and you need to solve the issue in the code itself. For example:
DROP TABLE IF EXISTS [dbo].[Balance];
CREATE TABLE [dbo].[Balance]
(
[BalanceID] TINYINT IDENTITY(1,1)
,[Balance] MONEY
,CONSTRAINT [PK_Balance] PRIMARY KEY
(
[BalanceID]
)
);
INSERT INTO [dbo].[Balance] ([Balance])
VALUES (100);
-- query window 1
BEGIN TRANSACTION;
DECLARE #CurrentBalance MONEY;
SELECT #CurrentBalance = [Balance]
FROM [dbo].[Balance]
WHERE [BalanceID] = 1;
WAITFOR DELAY '00:00:05'
UPDATE [dbo].[Balance]
SET [Balance] = #CurrentBalance + 20
WHERE [BalanceID] = 1;
COMMIT TRANSACTION;
-- query window 2
BEGIN TRANSACTION;
DECLARE #CurrentBalance MONEY;
SELECT #CurrentBalance = [Balance]
FROM [dbo].[Balance]
WHERE [BalanceID] = 1;
UPDATE [dbo].[Balance]
SET [Balance] = #CurrentBalance + 50
WHERE [BalanceID] = 1;
COMMIT TRANSACTION;
Create the table, the execute each part of the code in separate query windows. Changing the isolation level does nothing. For example, the only difference between read committed and repeatable read is that the last, blocks the second transaction while the first is finished and then overwrites the value.
I've got in an ASP.NET application this process :
Start a connection
Start a transaction
Insert into a table "LoadData" a lot of values with the SqlBulkCopy class with a column that contains a specific LoadId.
Call a stored procedure that :
read the table "LoadData" for the specific LoadId.
For each line does a lot of calculations which implies reading dozens of tables and write the results into a temporary (#temp) table (process that last several minutes).
Deletes the lines in "LoadDate" for the specific LoadId.
Once everything is done, write the result in the result table.
Commit transaction or rollback if something fails.
My problem is that if I have 2 users that start the process, the second one will have to wait that the previous has finished (because the insert seems to put an exclusive lock on the table) and my application sometimes falls in timeout (and the users are not happy to wait :) ).
I'm looking for a way to be able to have the users that does everything in parallel as there is no interaction, except the last one: writing the result. I think that what is blocking me is the inserts / deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to remove the exclusive lock on the "LoadData" table (is it possible to force SqlServer to only lock rows and not table ?) when the Insert is finished, but without ending the transaction.
Any suggestion?
Look up SET TRANSACTION ISOLATION LEVEL READ COMMITTED SNAPSHOT in Books OnLine.
Transactions should cover small and fast-executing pieces of SQL / code. They have a tendancy to be implemented differently on different platforms. They will lock tables and then expand the lock as the modifications grow thus locking out the other users from querying or updating the same row / page / table.
Why not forget the transaction, and handle processing errors in another way? Is your data integrity truely being secured by the transaction, or can you do without it?
if you're sure that there is no issue with cioncurrent operations except the last part, why not start the transaction just before those last statements, Whichever they are that DO require isolation), and commit immediately after they succeed.. Then all the upfront read operations will not block each other...