Serializable Isolation Level Confusion - Write Skew (Postgres)

I'm running Postgres 12 and am confused about the behavior of the serializable isolation level.
Tables:
Events (id, difficulty)
Managers (id, level)
Intended behavior (within a serializable transaction):
1. Check if there are 7 or more events of difficulty=2.
2. If so, insert a manager with level=2.
I'm running the following transactions in serializable but not seeing the behavior I expected (I expected the serializable transaction to detect write skew between the 2 sessions):
-- session 1:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM events WHERE difficulty=2;
-- RETURNS 7
-- now start session 2
-- session 2:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
SELECT id FROM events WHERE difficulty=2 LIMIT 1;
/*
id
----
4
*/
UPDATE events SET difficulty=1 WHERE id=4;
COMMIT;
-- now there are only 6 events of difficulty=2
-- back in session 1
-- since we have counted 7 events of difficulty=2 in this session, create a manager
INSERT INTO manager (level) VALUES (2);
COMMIT;
-- Expected write skew to be detected here because the event rows read earlier have since been updated (only 6 remain)
Unfortunately, our final state is now 6 events of difficulty=2 and a manager of level 2.
Why didn't serializable isolation prevent this write skew?
What am I misunderstanding about the use case for serializable isolation? Why are the events with difficulty=2 not locked or watched by predicate locking or some other isolation mechanism?

SERIALIZABLE means that there is a way to execute the transactions serially (one after the other) so that the effect is the same. In your case, this equivalent serial execution would run session 1 first, then session 2, with the same effect.
You could say that session 1 executes "logically" before session 2.

Answering my own question after some thinking!
The serialization check is not preventing the two sessions from committing because it is possible to serialize the two transactions and still end up with a level 2 manager and 6 events of difficulty=2.
E.g.
Run session 1 (check if 7 events with difficulty=2, create manager level=2) COMMIT;
Run session 2 (remove one event, now 6 events with difficulty=2) COMMIT;
^Output = 6 events, 1 manager
This is the same result as running concurrently so this is deemed an "acceptable" state for these two serializable transactions.
If you want to prevent this behaviour, session 2 can be changed to the following:
begin transaction isolation level serializable;
select count(*) from manager where level=2;
--if no managers
update events set difficulty=1 where id=4;
Now no serial ordering can end in the state of 6 events and 1 manager. The two possible outcomes of a serial ordering are:
1. session 1 runs, then session 2 runs
^output = 7 events, 1 manager
2. session 2 runs, then session 1 runs
^output = 6 events, 0 managers
So in this case (with the updated session 2), one of your transactions would be rolled back with a serialization failure, because no serial ordering of the two produces the result of the concurrent run. PostgreSQL does not block here; it aborts one transaction, which can then be retried.
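The reasoning above can be checked mechanically. The following is a minimal Python sketch, not PostgreSQL's actual algorithm: it models each session as a function over a tiny in-memory state (the function names and the `state` dictionary are made up for illustration), runs the sessions in every serial order, and asks whether the concurrent outcome (6 events, 1 manager) is among the serial outcomes.

```python
from itertools import permutations

def session1(state):
    # Count difficulty=2 events; insert a level-2 manager if there are 7 or more.
    if state["events"] >= 7:
        state["managers"] += 1

def session2_original(state):
    # Original session 2: downgrade one event unconditionally.
    state["events"] -= 1

def session2_guarded(state):
    # Updated session 2: downgrade one event only if no level-2 manager exists.
    if state["managers"] == 0:
        state["events"] -= 1

def serial_outcomes(sessions):
    """Run the sessions one after another in every order; collect final states."""
    outcomes = set()
    for order in permutations(sessions):
        state = {"events": 7, "managers": 0}
        for session in order:
            session(state)
        outcomes.add((state["events"], state["managers"]))
    return outcomes

concurrent_result = (6, 1)  # what the interleaved run in the question produced

original = serial_outcomes([session1, session2_original])
guarded = serial_outcomes([session1, session2_guarded])

print(concurrent_result in original)  # True: a serial order explains it
print(concurrent_result in guarded)   # False: no serial order explains it
```

With the original session 2 the concurrent result equals the "session 1, then session 2" order, so there is nothing for the database to reject. With the guarded session 2 the same interleaving matches neither order, which is the situation in which SERIALIZABLE has to abort one transaction.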

Write skew doesn't occur in SERIALIZABLE in PostgreSQL:
| Isolation Level | Write Skew |
| --- | --- |
| READ UNCOMMITTED | Yes |
| READ COMMITTED | Yes |
| REPEATABLE READ | Yes |
| SERIALIZABLE | No |
I tested whether write skew occurs in SERIALIZABLE in PostgreSQL. There is an event table with name and user columns, as shown below.
event table:
| name | user |
| --- | --- |
| Make Sushi | John |
| Make Sushi | Tom |
Then, I took the steps below to test for write skew. Only 3 users can join the event "Make Sushi":
| Flow | Transaction (T1) | Transaction (T2) | Explanation |
| --- | --- | --- | --- |
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT count(*) FROM event WHERE name = 'Make Sushi'; (returns 2) | | T1 reads 2, so only one more user can join. |
| Step 4 | | SELECT count(*) FROM event WHERE name = 'Make Sushi'; (returns 2) | T2 reads 2, so only one more user can join. |
| Step 5 | INSERT INTO event values ('Make Sushi', 'Lisa'); | | T1 inserts Lisa into the event table. |
| Step 6 | COMMIT; | | T1 commits. |
| Step 7 | | INSERT INTO event values ('Make Sushi', 'Kai'); ERROR: could not serialize access due to read/write dependencies among transactions | T2 cannot insert Kai into the event table and gets an error. |
| Step 8 | | COMMIT; | T2 is rolled back by the COMMIT. Write skew doesn't occur. |
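The error in Step 7 is a serialization failure (SQLSTATE 40001 in PostgreSQL), and the usual response is to run the whole transaction again from the start. Below is a minimal Python sketch of such a retry loop. It does not use a real database driver: `SerializationFailure`, `run_with_retry` and `join_as_kai` are hypothetical names, and the "database" is a plain list, so treat it as an illustration of the control flow only.

```python
import time

class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

def run_with_retry(transaction, max_attempts=3, backoff_seconds=0.0):
    """Run `transaction` again from the start whenever it fails to serialize."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            time.sleep(backoff_seconds * attempt)

# State after T1 has committed Lisa (a stand-in for the event table).
participants = ["John", "Tom", "Lisa"]
stats = {"attempts": 0}

def join_as_kai():
    stats["attempts"] += 1
    if stats["attempts"] == 1:
        # First attempt: its snapshot counted 2 participants, so the
        # database rejects the insert, as in Step 7.
        raise SerializationFailure("could not serialize access")
    # Retry: a fresh transaction re-reads the count and sees Lisa.
    if len(participants) >= 3:
        return "event full"
    participants.append("Kai")
    return "joined"

result = run_with_retry(join_as_kai)
print(result, stats["attempts"])  # event full 2
```

The point of retrying the whole transaction, rather than only the failed INSERT, is that the decision to insert was based on a count that is no longer valid; the retry re-reads it and reaches a different decision.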

Related

PostgreSQL's Repeatable Read Allows Phantom Reads But its document says that it does not allow

I have a problem with PostgreSQL's repeatable read isolation level.
I ran an experiment to see how the repeatable read isolation level behaves with phantom reads.
PostgreSQL's manual says "The table also shows that PostgreSQL's Repeatable Read implementation does not allow phantom reads."
But a phantom read occurred:
CREATE TABLE public.testmodel
(
id bigint NOT NULL
);
--Session 1 --
BEGIN TRANSACTION ISOLATION LEVEL Repeatable Read;
INSERT INTO TestModel(ID)
VALUES (10);
Select sum(ID)
From TestModel
where ID between 1 and 100;
--COMMIT;
--Session 2--
BEGIN TRANSACTION ISOLATION LEVEL Repeatable Read;
INSERT INTO TestModel(ID)
VALUES (10);
Select sum(ID)
From TestModel
where ID between 1 and 100;
COMMIT;
Steps I followed:
1. Create the table.
2. Run session 1 (with the commit statement commented out).
3. Run session 2.
4. Run the commit statement in session 1.
To my surprise, both sessions ran without any exceptions.
As far as I understand from the documentation, that shouldn't have happened.
I was expecting session 1 to throw an exception when committing after session 2.
What is the reason for this? I am confused.

The docs you referenced define a "phantom read" as a situation where:
"A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction."
In other words, a phantom read has occurred if you run the same query twice (or two queries seeking the same data), and you get different results. The REPEATABLE READ isolation level prevents this from happening, i.e. if you repeat the same read, you will get the same answer. It does not guarantee that either of those results reflects the current state of the database.
Since you are only reading data once in each transaction, this cannot be an example of a phantom read. It falls under the more general category of a "serialization anomaly", i.e. behaviour which could not occur if the transactions were executed serially. This type of anomaly is only avoided at the SERIALIZABLE isolation level.
There is an excellent set of examples on the Postgres wiki, describing anomalies which are allowed under REPEATABLE READ, but prevented under SERIALIZABLE isolation:
https://wiki.postgresql.org/wiki/SSI

Phantom reads don't occur in REPEATABLE READ in PostgreSQL, as the documentation says. I explain more about phantom reads in my answer on Stack Overflow.
I tested whether phantom reads occur in REPEATABLE READ in PostgreSQL using 2 command prompts.
First, to set REPEATABLE READ, I ran the query below, then logged out and logged in again:
ALTER DATABASE postgres SET DEFAULT_TRANSACTION_ISOLATION TO 'repeatable read';
Then I created a person table with id and name columns, as shown below.
person table:
| id | name |
| --- | --- |
| 1 | John |
| 2 | David |
Then, I ran the steps below with PostgreSQL queries:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
| --- | --- | --- | --- |
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT * FROM person; (returns: 1 John, 2 David) | | T1 reads 2 rows. |
| Step 4 | | INSERT INTO person VALUES (3, 'Tom'); | T2 inserts the row (3, Tom) into the person table. |
| Step 5 | | COMMIT; | T2 commits. |
| Step 6 | SELECT * FROM person; (returns: 1 John, 2 David) | | T1 still reads 2 rows instead of 3 after T2 commits. Phantom read doesn't occur. |
| Step 7 | COMMIT; | | T1 commits. |

You are misunderstanding what "does not allow phantom reads" means.
This simply means that phantom reads can not happen, not that there will be an error.
Session 2 will not see changes committed by other transactions until its own transaction has ended.
REPEATABLE READ guarantees a consistent state of the database for the duration of the transaction: only changes made by that transaction itself are visible, and no other changes. There is no need to throw an error.
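A toy model may make the "consistent state, no error" point concrete. The Python sketch below is only an illustration under simplifying assumptions: the `Database` and `RepeatableReadTransaction` classes are invented for this example, the snapshot is a plain copy of the committed rows taken at the transaction's first query, and write conflicts are not modelled at all.

```python
class Database:
    def __init__(self, rows):
        self.committed = list(rows)

class RepeatableReadTransaction:
    def __init__(self, db):
        self.db = db
        self.snapshot = None   # taken lazily, at the first query
        self.own_rows = []     # this transaction's uncommitted inserts

    def select_all(self):
        if self.snapshot is None:
            self.snapshot = list(self.db.committed)
        # Visible rows: the frozen snapshot plus the transaction's own changes.
        return self.snapshot + self.own_rows

    def insert(self, row):
        self.own_rows.append(row)

    def commit(self):
        self.db.committed.extend(self.own_rows)

db = Database([(1, "John"), (2, "David")])
t1 = RepeatableReadTransaction(db)
t2 = RepeatableReadTransaction(db)

first_read = t1.select_all()    # T1 sees 2 rows
t2.insert((3, "Tom"))
t2.commit()                     # T2 commits a third row
second_read = t1.select_all()   # T1 still sees 2 rows, and no error is raised
t1.commit()

after_commit = RepeatableReadTransaction(db).select_all()
print(first_read == second_read, len(after_commit))  # True 3
```

Nothing fails in this run: the repeated read simply returns the same rows, and the new row becomes visible only to transactions that start afterwards.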

Lost update in snapshot vs all the rest isolation levels

Let's suppose we create a new table and enable snapshot isolation for our database:
alter database database_name set allow_snapshot_isolation on
create table marbles (id int primary key, color char(6))
insert marbles values(1, 'Black')
insert marbles values(2, 'White')
Next, in session 1, begin a snapshot transaction:
set transaction isolation level snapshot
begin tran
update marbles set color = 'Blue' where id = 2
Now, before committing the changes, run the following in session 2:
set transaction isolation level snapshot
begin tran
update marbles set color = 'Yellow' where id = 2
Then, when we commit session 1, session 2 fails with an error saying the transaction was aborted. I understand that this prevents a lost update.
If we follow the same steps one by one but with any other isolation level (serializable, repeatable read, read committed or read uncommitted), session 2 executes and applies its update to the table.
Could someone please explain to me why this is happening?
For me this is some kind of lost update, but it seems that only snapshot isolation prevents it.

"Could someone please explain to me why this is happening?"
Because under all the other isolation levels the point-in-time at which the second session first sees the row is after the first transaction commits. Locking is a kind of time travel. A session enters a lock wait and is transported forward in time to when the resource is eventually available.
"For me this is some kind of lost update"
No. It's not. Both updates were properly completed, and the final state of the row would have been the same if the transactions had been 10 minutes apart.
In a lost update scenario, each session reads the row before attempting to update it, and the results of the first transaction are needed to properly complete the second transaction, e.g. if each is incrementing a column by 1.
And under locking READ COMMITTED, REPEATABLE READ, and SERIALIZABLE the SELECT would be blocked, and no lost update would occur. And under READ_COMMITTED_SNAPSHOT the SELECT should have an UPDLOCK hint, and it would block too.
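The difference between a blind overwrite and a real lost update can be shown with a few lines of arithmetic. This Python sketch is a toy model, not SQL Server's lock manager: the function names are invented for illustration, and "waiting for a lock" is represented simply by the order of the statements.

```python
def blind_writes(first_color, second_color):
    """The marbles case: neither session reads the row before writing."""
    row = {"color": "White"}
    row["color"] = first_color    # session 1: UPDATE ... SET color = 'Blue'
    row["color"] = second_color   # session 2: UPDATE ... SET color = 'Yellow'
    return row["color"]

def interleaved_increments(start):
    """A lost update: both sessions read before either one writes."""
    row = {"value": start}
    read_a = row["value"]         # session A reads
    read_b = row["value"]         # session B reads the same old value
    row["value"] = read_a + 1     # session A writes
    row["value"] = read_b + 1     # session B overwrites A's increment
    return row["value"]

def blocked_increments(start):
    """Session B's read waits until session A has written and committed."""
    row = {"value": start}
    read_a = row["value"]
    row["value"] = read_a + 1     # session A reads, writes, commits
    read_b = row["value"]         # session B's read proceeds only now
    row["value"] = read_b + 1
    return row["value"]

print(blind_writes("Blue", "Yellow"))  # Yellow
print(interleaved_increments(10))      # 11
print(blocked_increments(10))          # 12
```

In the first function the final value is the same as if the two updates had run minutes apart, which is why the answer does not count it as a lost update. In the second, one increment disappears because the write depended on a stale read.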

Why does REPEATABLE READ not see new rows?

I thought that REPEATABLE READ should not pick up on changed data but should pick up on new data.
However I have the following script:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
create table testLocking(a int);
BEGIN TRANSACTION
insert into testLocking values (1);
select * from testLocking;
WAITFOR DELAY '000:00:30';
select * from testLocking;
COMMIT TRANSACTION;
BEGIN TRANSACTION
insert into testLocking values (2);
UPDATE testLocking SET A=4 WHERE a=1;
select * from testLocking;
WAITFOR DELAY '000:00:40';
COMMIT TRANSACTION;
drop table testLocking;
I get the results:
1
1
4
2
I was expecting:
1
1
**2**
2
4
Can anyone see what I have done wrong?
UPDATED
I want to be able to see the effects of transaction isolation levels by running queries concurrently.

Repeatable Read means that changes made by other transactions are not reflected in your queries. Your code makes its change in the same transaction.
UPDATED
A single script is not able to show the effects of two transactions interacting with a database. The best way to do this in SSMS is to separate the code into two windows. Open them side by side using Window -> New Vertical Tab Group.
By highlighting lines one at a time you can run the transactions concurrently and you will see the effects of the queries in each window.

SQL Server - Is there any such thing called 'dirty write'?

Does SQL Server allow a transaction to modify the data that is currently being modified by another transaction but hasn't yet been committed? Is this possible under any of the isolation levels, let's say READ UNCOMMITTED since that is the least restrictive? Or does it completely prevent that from happening? Would you call that a 'dirty write' if it is possible?

Any RDBMS providing transactions and atomicity of transactions cannot allow dirty writes.
SQL Server must ensure that all writes can be rolled back. This goes even for a single statement because even a single statement can cause many writes and run for hours.
Imagine a row was written but needed to be rolled back. But meanwhile another write happened to that row that is already committed. Now we cannot roll back because that would violate the durability guarantee provided to the other transaction: the write would be lost. (It would possibly also violate the atomicity guarantee provided to that other transaction, if the row to be rolled back was one of several of its written rows).
The only solution is to always stabilize written but uncommitted data using X-locks.
SQL Server never allows dirty writes or lost writes.

No, you can't unless you update in the same transaction. Setting the Isolation Level to Read Uncommitted will only work to read the data from the table even if it has not been committed but you can't update it.
The Read Uncommitted Isolation Level and the nolock table hint will be ignored for update or delete statements and it will wait until the transaction is committed.

Dirty write didn't occur in SQL Server in my experiment with READ UNCOMMITTED, which is the loosest isolation level. Basically, dirty writes are not allowed at any isolation level in many databases.
A dirty write is when a transaction updates or deletes (overwrites) uncommitted data that another transaction has inserted, updated or deleted.
I tested dirty writes with MSSQL (SQL Server) using 2 command prompts.
First, I set READ UNCOMMITTED isolation level:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Then, I created a person table with id and name columns, as shown below.
person table:
| id | name |
| --- | --- |
| 1 | John |
| 2 | David |
Then I ran the steps below with MSSQL queries, using 2 command prompts:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
| --- | --- | --- | --- |
| Step 1 | BEGIN TRAN; GO; | | T1 starts. |
| Step 2 | | BEGIN TRAN; GO; | T2 starts. |
| Step 3 | UPDATE person SET name = 'Tom' WHERE id = 2; GO; | | T1 updates David to Tom, so this row is locked by T1 until T1 commits. |
| Step 4 | | UPDATE person SET name = 'Lisa' WHERE id = 2; GO; | T2 cannot update Tom to Lisa before T1 commits because the row is locked by T1; T2 has to wait for T1 to release the lock by committing. Dirty write is not allowed. |
| Step 5 | COMMIT; GO; | Waiting... | T1 commits. |
| Step 6 | | UPDATE person SET name = 'Lisa' WHERE id = 2; GO; | Now T2 can update Tom to Lisa because T1 has committed (and unlocked the row). |
| Step 7 | | COMMIT; GO; | T2 commits. |

Transaction isolation level REPEATABLE READ causes deadlocks

A part of my application updates a table as per business logic after opening a connection at transaction isolation level REPEATABLE READ. In a rare scenario, this operation coincides with another part of the application that opens a different connection and tries to reset the same record to its default value, and I get the following error:
Msg 1205, Level 13, State 45, Line 7
Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
I think I am able to reproduce the issue using the following example.
1.
create table Accounts
(
id int identity(1,1),
Name varchar(50),
Amount decimal
)
2.
insert into Accounts (Name,Amount) values ('ABC',5000)
insert into Accounts (Name,Amount) values ('WXY',4000)
insert into Accounts (Name,Amount) values ('XYZ',4500)
3.
Start a long transaction with isolation level REPEATABLE READ:
Set transaction isolation level REPEATABLE READ
begin tran
declare @var int
select @var=amount
from Accounts
where id=1
waitfor delay '0:0:10'
if @var > 4000
update accounts
set amount = amount -100;
Commit
4.
While step 3 above is still executing, start another transaction on a different connection:
Begin tran
update accounts
set Amount = 5000
where id = 1
commit tran
The transaction started in step 3 eventually completes, but the one started in step 4 fails with the following error message:
Msg 1205, Level 13, State 45, Line 7
Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
What are my options for eventually running the transaction in step 4? The idea is to be able to reset the record to a default value, and anything being performed in other transactions should be overridden in this case. I don't see any issue if the two transactions are not concurrent.

The idea is to be able to reset the record to a default value
In what order do you want the updates applied? Do you want the "reset" to always come through? Then you need to perform the reset strictly after the update in step 3 has completed. Also, the reset update should use a higher lock mode to avoid the deadlock:
update accounts WITH (XLOCK)
set Amount = 5000
where id = 1
That way the reset will wait for the other transaction to finish first because the other tran has an S-lock.
Alternatively, have step 3 acquire a U-lock or X-lock.
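To see why the lock mode matters, here is a toy wait-for-graph model in Python. It is a sketch under stated assumptions, not SQL Server's lock manager: the compatibility table covers only the S, U and X modes, lock queues are ignored, and the assumption that a plain UPDATE first takes a U-lock and then converts it to X is my reading of the scenario rather than something stated in the answer. The names `compatible`, `blockers` and `has_cycle` are invented for the example.

```python
# Pairs (held, requested) that can coexist on the same row; anything with X conflicts.
COMPATIBLE = {("S", "S"), ("S", "U"), ("U", "S")}

def compatible(held, requested):
    return (held, requested) in COMPATIBLE

def blockers(held_locks, txn, mode):
    """Other transactions whose held lock conflicts with txn's request."""
    return {other for other, held in held_locks.items()
            if other != txn and not compatible(held, mode)}

def has_cycle(waits_for):
    """True if some transaction transitively waits for itself (a deadlock)."""
    def reaches(current, target, seen):
        for nxt in waits_for.get(current, ()):
            if nxt == target:
                return True
            if nxt not in seen and reaches(nxt, target, seen | {nxt}):
                return True
        return False
    return any(reaches(txn, txn, {txn}) for txn in waits_for)

# Plain UPDATE in step 4: it holds U while step 3 still holds S from its SELECT.
held = {"step3": "S", "step4": "U"}
plain = {
    "step4": blockers(held, "step4", "X"),  # wants to convert U to X
    "step3": blockers(held, "step3", "U"),  # its own UPDATE now needs U
}

# UPDATE ... WITH (XLOCK) in step 4: it asks for X directly and holds nothing yet.
held = {"step3": "S"}
xlock = {
    "step4": blockers(held, "step4", "X"),  # waits for step 3's S-lock
    "step3": blockers(held, "step3", "U"),  # nothing held by others blocks it
}

print(has_cycle(plain))  # True: each transaction waits for the other
print(has_cycle(xlock))  # False: step 4 just waits until step 3 commits
```

In the first case both transactions hold a lock the other one needs, so neither can proceed and one is chosen as the victim. In the second case only step 4 waits, so the reset is applied once step 3 has finished.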

You can set the deadlock priority of the transaction in step 4 to be higher.
For more details see http://technet.microsoft.com/en-us/library/ms186736.aspx
