Documentation says, serializable transactions execute one after one.
But in practic it seems not to be truth. Here's two almost equal transactions, the difference is delay for 15 seconds only.
#1:
set transaction isolation level serializable
go
begin transaction
if not exists (select * from articles where title like 'qwe')
begin
waitfor delay '00:00:15'
insert into articles (title) values ('qwe')
end
commit transaction go
#2:
set transaction isolation level serializable
go
begin transaction
if not exists (select * from articles where title like 'qwe')
begin
insert into articles (title) values ('asd')
end
commit transaction go
The second transaction has been run after couple of seconds since the start of first one.
The result is deadlock. The first transaction dies with
Transaction (Process ID 58) was deadlocked on
lock resources with another process and has been chosen as the deadlock victim.
Rerun the transaction.
reason.
The conclusion, serializable transactions are not serial?
serializable transactions don't necessarily execute serially.
The promise is just that transactions can only commit if the result would be as if they had executed serially (in any order).
The locking requirements to meet this guarantee can frequently lead to deadlock where one of the transactions needs to be rolled back. You would need to code your own retry logic to resubmit the failed query.
See The Serializable Isolation Level for more about the differences between the logical description and implementation.
What happens here:
Because transactions 1 runs in serializable isolation level, it keeps a share lock it obtains on table articles while it wait. This way, it is guaranteed that the non exists condition remains true until the transaction terminates.
Transaction 2 gets a share lock as well that allows it to do the exist check condition. Then, with the insert statement, Transaction 2 requires to convert the share lock to a exclusive lock but has to wait as Transaction 1 holds a shared lock.
When Transaction 1 finishes to wait, it also requests a conversion to exclusive mode => deadlock situation, 1 of the transaction has to be terminated.
I got into a similar problem and i found that:
From MSDN:
SERIALIZABLE
Specifies the following:
Statements cannot read data that has been modified but not yet
committed by other transactions.
No other transactions can modify data that has been read by the
current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would
fall in the range of keys read by any statements in the current
transaction until the current transaction completes.
The second point does not state that both sessions can't take the shared lock that will result in deadlock. We solved it with a hint on SELECT.
select * from articles WITH (UPDLOCK, ROWLOCK) where title like 'qwe'
Have not tried if it would work in this case but i think you would have to lock on the table part since the row is not yet created.
Related
I have some doubts with respect to transactions and isolation levels:
1) In case the DB transaction level is set to Serializable / Repeatable Read and there are two concurrent transactions trying to modify the same data then one of the transaction will fail.
In such cases, why DB doesn't re-tries the failed operation? Is it a good practice to retry the transaction on application level (hoping the other transaction will be over in mean time)?
2) In case the DB transaction level is set to READ_COMMITTED / DIRTY READ and there are two concurrent transactions trying to modify the same data then why the transactions don't fail?
Ideally we are controlling the read behaviour and concurrent writes should not be allowed.
3) My application has 2 parts and uses the spring managed datasource in one part and application created datasource in other part (this part doesn't use spring and data source is explicit created by passing the properties).
My assumption is that isolation level has no impact - from which datasource the connections is coming from...two concurrent transactions even if coming from different datasource will behave the same based on isolation level as if they are coming from same datasource.
Do you see any issue with this setup? Should we strive for single datasource across application?
I also wait until others to give their feed backs. But now i would like to give my 2 cents to this post.
As you explained isolation's are work differently each.
I'll try to keep a sample data set as follows
IF OBJECT_ID('Employees') IS NOT NULL DROP TABLE Employees
GO
CREATE TABLE Employees (
Emp_id INT IDENTITY,
Emp_name VARCHAR(20),
Emp_Telephone VARCHAR(15),
)
ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (emp_id)
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Satsara', '07436743439'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Udhara', '045672903'
INSERT INTO Employees (Emp_name, Emp_Telephone)
SELECT 'Sithara', '58745874859'
REPEATABLE READ and SERIALIZABLE are both very close to each, but SERIALIZABLE is the heights in the isolation. Both options are provided for avoid the dirty readings and both need to manage very carefully because most of the time this will cause for deadlocks due to the way that it handing the data. If there's a deadlock, definitely server will wipe out one transaction from the picture. So it will never run it by the server again due to it doesn't have any clue about that removed transaction, unless a log.
REPEATABLE READ - Not allow to modify (lock records) any records which is already read by another process (another query). But it allows for new records to insert (without a lock) which can be impact to your system while querying.
SERIALIZABLE - Different in Serializable is, its not allow to insert records with
"SET TRANSACTION ISOLATION LEVEL Serializable". So INSERT processors are wait until the previous transaction commit.
Usually REPEATABLE READ and SERIALIZABLE isolation's are keep data locks than other two options.
example [REPEATABLE and SERIALIZABLE]:
In Employee table you have 3 records.
Open a query window and run (QUERY 1)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM Employees;
Now try to run a insert query in a different window (QUERY 2)
INSERT INTO Employees(Emp_name, Emp_Telephone)
SELECT 'JANAKA', '3333333'
System allow to insert the new record in QUERY 2 and now run the same query2 again and you can see 4 records.
Now replace the Query 1 with following code and try the same process to test the Serializable
SET TRANSACTION ISOLATION LEVEL Serializable
BEGIN TRAN
SELECT * FROM Employees;
This time you can see the that 2nd Query insert command not allow to execute and wait until the Query 1 to commit.
Once Query 1 committed only, Query 2 allows to execute the INSERT command.
When compare the Read Committed and the Read Uncommitted,
READ COMMITTED - Changes to the data is not visible to other processors until it commit the records. With Read Committed. it puts shared locks for all the records it reads. If another process found a exclusive lock by, it wait until its lock release.
READ UNCOMMITTED - Not recommended and garbage data can read by the system due to this. (in SQL Server nolock). So this will return the uncommitted data.
"Select * from Employee (nolock)
**DEADLOCKS - ** Whether its Repeatable read, Serializable, READ COMMITTED or READ UNCOMMITTED, it can creates dead locks. Only things
is as we discussed Repeatable read and Serializable are more prone to
deadlocks than other two options.
Note: If you need sample for Read Committed and Read Uncommitted, please let know in the comment section and we can discuss.
Actually this topic is very large topic and need to discuss with lots of samples. I do not know this explanation is enough or not. But i gave a small try. NO idea ware to start and when to stop.
At the same time, you asked about " Is it a good practice to retry the
transaction on application level "
In my opinion that's fine. Personally i also do retrying process in some sort of a situations.
Different techniques used.
Keeping a Flag field to identify it updated or not and retry
Using a Event driven solution such RabitMQ, KAFKA.
This stored procedure often fails if transactions happen simultaneusly, because it violates a pk constraint on duplicate key (field, name). I have been trying, holdlock, rowlock and begin/commit transaction, which result in the same error.
I am trying to perform an insert statement and if the record exist with the same key (field, name) update it instead.
Performance is important so i am trying to avoid a temp table solution.
MERGE Fielddata AS TARGET
USING (VALUES (#Field, #Name, #Value, #File, #Type))
AS SOURCE (Field, Name, Value, File, Type)
ON TARGET.Field = #Field AND TARGET.Name = #Name
WHEN MATCHED THEN
UPDATE
SET Value = SOURCE.Value,
File = SOURCE.File,
Type = SOURCE.Type
WHEN NOT MATCHED THEN
INSERT (Field, Name, Value, File, Type)
VALUES (SOURCE.Field, SOURCE.Name, SOURCE.Value, SOURCE.File, SOURCE.Type);
EDIT: Testing with serializable/holdlock for 24 hours. After 30 mins: no errors.
EDIT 2: WITH (SERIALIZABLE) / SET TRANSACTION ISOLATION LEVEL SERIALIZABLE solves the duplicate key problem effectively, costing a little bit of performance in our case/scenario.
You must increase the level of transaction isolation.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
SERIALIZABLE Specifies the following: + Statements cannot read data
that has been modified but not yet committed by other transactions. No
other transactions can modify data that has been read by the current
transaction until the current transaction completes. Other
transactions cannot insert new rows with key values that would fall in
the range of keys read by any statements in the current transaction
until the current transaction completes. Range locks are placed in the
range of key values that match the search conditions of each statement
executed in a transaction. This blocks other transactions from
updating or inserting any rows that would qualify for any of the
statements executed by the current transaction. This means that if any
of the statements in a transaction are executed a second time, they
will read the same set of rows. The range locks are held until the
transaction completes. This is the most restrictive of the isolation
levels because it locks entire ranges of keys and holds the locks
until the transaction completes. Because concurrency is lower, use
this option only when necessary. This option has the same effect as
setting HOLDLOCK on all tables in all SELECT statements in a
transaction.
Serializable Implementations
SQL Server happens to use a locking implementation of the serializable
isolation level, where physical locks are acquired and held to the end
of the transaction (hence the deprecated table hint HOLDLOCK as a
synonym for SERIALIZABLE).
This strategy is not quite enough to provide a technical guarantee of
full serializability, because new or changed data could appear in a
range of rows previously processed by the transaction. This
concurrency phenomenon is known as a phantom, and can result in
effects which could not have occurred in any serial schedule.
To ensure protection against the phantom concurrency phenomenon, locks
taken by SQL Server at the serializable isolation level may also
incorporate key-range locking to prevent new or changed rows from
appearing between previously-examined index key values. Range locks
are not always acquired under the serializable isolation level; all we
can say in general is that SQL Server always acquires sufficient locks
to meet the logical requirements of the serializable isolation level.
In fact, locking implementations quite often acquire more, and
stricter, locks than are really needed to guarantee serializability,
but I digress.
https://sqlperformance.com/2014/04/t-sql-queries/the-serializable-isolation-level
if it is simple: it blocks not only the source but also the range for insertion
If I have a database transaction which goes along the lines of:
DELETE FROM table WHERE id = ANY(ARRAY[id1, id2, id3,...]) RETURNING foo, bar;
if num_rows_returned != num_rows_in_array then
rollback and return
Do stuff with deleted data...
Commit
My understanding is that the DELETE query will lock those rows, until the transaction is committed or rolled back. As according to the postgres 9.1 docs:
An exclusive row-level lock on a specific row is automatically
acquired when the row is updated or deleted. The lock is held until
the transaction commits or rolls back, just like table-level locks.
Row-level locks do not affect data querying; they block only writers
to the same row.
I am using the default read committed isolation level in postgres 9.1.13
I would take from this that I should be OK, but I want to ensure that this means the following things are true:
Only one transaction may delete and return a row from this table, unless a previous transaction was rolled back.
This means "Do stuff with deleted data" can only be done once per row.
If two transactions try to do the above at once with conflicting rows, one will always succeed (ignoring system failure), and one will always fail.
Concurrent transactions may succeed when there is no crossover of rows.
If a transaction is unable to delete and return all rows, it will rollback and thus not delete any rows. A transaction may try to delete two rows for example. One row is already deleted by another transaction, but the other is free to be returned. However since one row is already deleted, the other must not be deleted and processed. Only if all specified ids can be deleted and returned may anything take place.
Using the normal idea of concurrency, processes/transactions do not fail when they are locked out of data, they wait.
The DBMS implements execution in such a way that transactions advance but only seeing effects from other transactions according to the isolation level. (Only in the case of detected deadlock is a transaction aborted, and even then its implemented execution will begin again, and the killing is not evident to its next execution or to other transactions except per isolation level.) Under SERIALIZABLE isolation level this means that the database will change as if all transactions happened without overlap in some order. Other levels allow a transaction to see certain effects of overlapped implementation execution of other transactions.
However in the case of PostgresSQL under SERIALIZABLE when a transaction tries to commit and the DBMS sees that it would give non-serialized behaviour the tranasaction is aborted with notification but not automatically restarted. (Note that this is not failure from implementation execution attempted access to a locked resource.)
(Prior to 9.1, PostgrSQL SERIALIZABLE did not give SQL standard (serialized) behaviour: "To retain the legacy Serializable behavior, Repeatable Read should now be requested.")
The locking protocols are how actual implementation execution gets interleaved to maximize throughput while keeping that true. All locking does is prevent actual overlapped implementation execution accesses to effect the apparent serialized execution.
Explicit locking by transaction code also just causes waiting.
Your question does not reflect this. You seem to think that attempted access to a locked resource by the implementation aborts a transaction. That is not so.
I have about 20 stored procedures that consume each other, forming a tree-like dependency chain.
The stored procedures however use in-memory tables for caching and can be called concurrently from many different clients.
To protect against concurrent update / delete attempts against the in-memory tables, I am using sp_getapplock and SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT ON;.
I am using a hash of the stored procedure parameters that is unique to each stored procedure, but multiple concurrent calls to the same stored procedure with the same parameters should generate the same hash. It's this equality of the hash for concurrent calls to the same stored proc with the same parameters that gives me a useful resource name to obtain our applock against.
Below is an example:
BEGIN TRANSACTION
EXEC #LOCK_STATUS = sp_getapplock #Resource= [SOME_HASH_OF_PARAMETERS_TO_THE_SP], #LockMode = 'Exclusive';
...some stored proc code...
IF FAILURE
BEGIN
ROLLBACK;
THROW [SOME_ERROR_NUMBER]
END
...some stored proc code...
COMMIT TRANSACTION
Despite wrapping everything in an applock which should block any concurrent updates or deletes, I still get error 41302:
The current transaction attempted to update a record that has been
updated since this transaction started. The transaction was aborted.
Uncommittable transaction is detected at the end of the batch. The
transaction is rolled back.
Am I using sp_getapplock incorrectly? It seems like the approach I am suggesting should work.
The second you begin your transaction with a memory optimized table you get your "snapshot", which is based on the time the transaction started, for optimistic concurrency resolution. Unfortunately your lock is in place after the snapshot is taken and so it's still entirely possible to have optimistic concurrency resolution failures.
Consider the situation where 2 transactions that need the same lock begin at once. They both begin their transactions "simultaneously" before either obtains the lock or modifies any rows. Their snapshots look exactly the same because no data has been modified yet. Next, one transaction obtains the lock and proceeds to make its changes while the other is blocked. This transaction commits fine because it was first and so the snapshot it refers to still matches the data in memory. The other transaction obtains the lock now, but it's snapshot is invalid (however it doesn't know this yet). It proceeds to execute and at the end realizes its snapshot is invalid so it throws you an error. Truthfully the transactions don't even need to start near simultaneously, the second transaction just needs to start before the first commits.
You need to either enforce the lock at the application level, or through use of sp_getapplock at the session level.
Hope this helped.
I am trying to understand isolation/locks in SQL Server.
I have following scenario in READ COMMITTED isolation level(Default)
We have a table.
create table Transactions(Tid int,amt int)
with some records
insert into Transactions values(1, 100)
insert into Transactions values(2, -50)
insert into Transactions values(3, 100)
insert into Transactions values(4, -100)
insert into Transactions values(5, 200)
Now from msdn i understood
When a select is fired shared lock is taken so no other transaction can modify data(avoiding dirty read).. Documentation also talks about row level, page level, table level lock. I thought of following scenarion
Begin Transaction
select * from Transactions
/*
some buisness logic which takes 5 minutes
*/
Commit
What I want to understand is for what duration of time shared lock would be acquired and which (row, page, table).
Will lock will be acquire only when statement select * from Transactions is run or would it be acquire for whole 5+ minutes till we reach COMMIT.
You are asking the wrong question, you are concerned about the implementation details. What you should think of and be concerned with are the semantics of the isolation level. Kendra Little has a nice poster explaining them: Free Poster! Guide to SQL Server Isolation Levels.
Your question should be rephrased like:
select * from Items
Q: What Items will I see?
A: All committed Items
Q: What happens if there are uncommitted transactions that have inserted/deleted/update Items?
A: your SELECT will block until all uncommitted Items are committed (or rolled back).
Q: What happens if new Items are inserted/deleted/update while I run the query above?
A: The results are undetermined. You may see some of the modifications, won't see some other, and possible block until some of them commit.
READ COMMITTED makes no promise once your statement finished, irrelevant of the length of the transaction. If you run the statement again you will have again exactly the same semantics as state before, and the Items you've seen before may change, disappear and new one can appear. Obviously this implies that changes can be made to Items after your select.
Higher isolation levels give stronger guarantees: REPEATABLE READ guarantees that no item you've selected the first time can be modified or deleted until you commit. SERIALIZABLE adds the guarantee that no new Item can appear in your second select before you commit.
This is what you need to understand, no how the implementation mechanism works. After you master these concepts, you may ask the implementation details. They're all described in Transaction Processing: Concepts and Techniques.
Your question is a good one. Understanding what kind of locks are acquired allows a deep understanding of DBMS's. In SQL Server, under all isolation levels (Read Uncommitted, Read Committed (default), Repeatable Reads, Serializable) Exclusive Locks are acquired for Write operations.
Exclusive locks are released when transaction ends, regardless of the isolation level.
The difference between the isolation levels refers to the way in which Shared (Read) Locks are acquired/released.
Under Read Uncommitted isolation level, no Shared locks are acquired. Under this isolation level the concurrency issue known as "Dirty Reads" (a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed, so it could be rolled back) can occur.
Under Read Committed isolation level, Shared Locks are acquired for the concerned records. The Shared Locks are released when the current instruction ends. This isolation level prevents "Dirty Reads" but, since the record can be updated by other concurrent transactions, "Non-Repeatable Reads" (transaction A retrieves a row, transaction B subsequently updates the row, and transaction A later retrieves the same row again. Transaction A retrieves the same row twice but sees different data) or "Phantom Reads" (in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first) can occur.
Under Repeatable Reads isolation level, Shared Locks are acquired for the transaction duration. "Dirty Reads" and "Non-Repeatable Reads" are prevented but "Phantom Reads" can still occur.
Under Serializable isolation level, ranged Shared Locks are acquired for the transaction duration. None of the above mentioned concurrency issues occur but performance is drastically reduced and there is the risk of Deadlocks occurrence.
lock will only acquire when select * from Transaction is run
You can check it with below code
open a sql session and run this query
Begin Transaction
select * from Transactions
WAITFOR DELAY '00:05'
/*
some buisness logic which takes 5 minutes
*/
Commit
Open another sql session and run below query
Begin Transaction
Update Transactions
Set = ...
where ....
commit
First, lock only acquire when statement run.
Your statement seprate in two pieces, suppose to be simplfy:
select * from Transactions
update Transactions set amt = xxx where Tid = xxx
When/what locks are hold/released in READ COMMITTED isolation level?
when select * from Transactions run, no lock acquired.
Following update Transactions set amt = xxx where Tid = xxx will add X lock for updating/updated keys, IX lock for page/tab
All lock will release only after committed/rollbacked. That means no lock will release in trans running.