"Read skew" vs "Non-repeatable read" (Transaction) - database

I read "A beginner's guide to Read and Write Skew phenomena" and "A beginner's guide to Non-Repeatable Read anomaly" to learn what read skew and non-repeatable read are.
Read skew: [image not shown]
Non-repeatable read: [image not shown]
However, I cannot tell read skew and non-repeatable read apart, and it seems that both can be prevented with the REPEATABLE READ or SERIALIZABLE isolation level.
My questions:
1. What is the difference between read skew and non-repeatable read?
2. Can read skew and non-repeatable read both be prevented by REPEATABLE READ or SERIALIZABLE?

1. What is the difference between read skew and non-repeatable read?
Suppose we have two data items, x and y, and there is a relation between them (e.g. parent/child).
Transaction T1 reads x, and then a second transaction T2 updates x and y to new values and commits. If T1 now reads y, it sees the old x together with the new y, which is an inconsistent state, and it may therefore produce an inconsistent result.
Acceptable consistent states:
x and y
*x and *y
Note: * denotes the updated value of the variable
When x and y are the same data item, meaning that reading them requires executing the same query twice, the same interleaving leads to the non-repeatable read problem.
IMHO, read skew can therefore be seen as a generalization of the non-repeatable read problem.
2. Can read skew and non-repeatable read both be prevented by REPEATABLE READ or SERIALIZABLE?
The SERIALIZABLE isolation level permits transactions to run concurrently but creates the effect that they ran in some serial order, so both read skew and non-repeatable read are prevented.
The REPEATABLE READ isolation level guarantees that a row read once in a transaction returns the same data no matter how many times it is read again.
From that definition alone, it seems read skew may not be prevented.
Without knowing how the level is implemented, it is hard to claim anything.
Today, DBMS engines use different concurrency control strategies to implement the REPEATABLE READ isolation level.
For example, PostgreSQL uses a database snapshot (a consistent view) to implement REPEATABLE READ, which prevents read skew.
Other engines may use lock-based concurrency control mechanisms to implement it, which may not prevent read skew.

Non-repeatable read (fuzzy read) occurs when a transaction reads the same row at least twice but the row's data differs between the 1st and 2nd reads, because another transaction updates that row and commits in between (concurrently). *I explain more about non-repeatable read in "What is the difference between Non-Repeatable Read and Phantom Read?".
Read skew occurs when a transaction reads inconsistent data with two different queries because, between the 1st and 2nd queries, other transactions insert, update or delete data and commit. The inconsistent data then produces an inconsistent result.
Below are examples of read skew that you can reproduce with MySQL and 2 command prompts.
For these examples, you need to set the READ COMMITTED isolation level so that read skew can occur:
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
There is a bank_account table with id, name and balance as shown below.
bank_account table:
| id | name | balance |
|----|------|---------|
| 1 | John | 600 |
| 2 | Tom | 400 |
The steps below show read skew. *300 is transferred from Tom's balance to John's balance. Then T1 reads 100 (inconsistent data) from Tom's balance instead of 400. Finally, T1 gets 600 + 100 = 700 (inconsistent result) as the total of John's and Tom's balances instead of 1000:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
|------|--------------------|--------------------|-------------|
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT balance FROM bank_account WHERE id = 1; (returns 600) | | T1 reads 600. |
| Step 4 | | UPDATE bank_account set balance = 900 WHERE id = 1; | T2 updates 600 to 900 because 300 is transferred from Tom's balance. |
| Step 5 | | UPDATE bank_account set balance = 100 WHERE id = 2; | T2 updates 400 to 100 because 300 is transferred to John's balance. |
| Step 6 | | COMMIT; | T2 commits. |
| Step 7 | SELECT balance FROM bank_account WHERE id = 2; (returns 100) | | T1 reads 100 (inconsistent data) instead of 400 after T2 commits. |
| Step 8 | 600 + 100 = 700 | | Finally, T1 gets 700 (inconsistent result) instead of 1000. *Read skew occurs. |
| Step 9 | COMMIT; | | T1 commits. |
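The transfer schedule above can be replayed in a few lines of Python. This is only a toy model, not how MySQL stores data: the `versions` list, `run_schedule` and the `snapshot_reads` flag are names invented for this sketch, and the snapshot is assumed to be taken when T1 starts. It shows why reads that always use the latest committed data (READ COMMITTED) total 700, while reads pinned to one snapshot total 1000.

```python
# Toy model: each entry of `versions` is one committed state of bank_account.
# Hypothetical names throughout; this only mimics what each read "sees".

def run_schedule(snapshot_reads: bool) -> int:
    versions = [{1: 600, 2: 400}]            # state before T2 commits

    t1_start_version = len(versions) - 1     # Step 1: T1 starts

    def t1_read(account_id: int) -> int:
        # Snapshot reads stay on the version that was current when T1 started;
        # READ COMMITTED reads always use the latest committed version.
        version = t1_start_version if snapshot_reads else len(versions) - 1
        return versions[version][account_id]

    john = t1_read(1)                        # Step 3: T1 reads John's balance

    # Steps 4-6: T2 moves 300 from Tom to John and commits.
    after_t2 = dict(versions[-1])
    after_t2[1] += 300
    after_t2[2] -= 300
    versions.append(after_t2)

    tom = t1_read(2)                         # Step 7: T1 reads Tom's balance
    return john + tom                        # Step 8: T1's total


read_committed_total = run_schedule(snapshot_reads=False)
snapshot_total = run_schedule(snapshot_reads=True)
print(read_committed_total)  # 700: John from before T2, Tom from after T2
print(snapshot_total)        # 1000: both reads come from the same snapshot
```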
The steps below also show read skew. *300 is withdrawn from Tom's balance. Then T1 reads 100 (inconsistent data) from Tom's balance instead of the 400 it held when T1 read John's balance. Finally, T1 gets 600 + 100 = 700 (inconsistent result) as the total of John's and Tom's balances instead of 1000:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
|------|--------------------|--------------------|-------------|
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT balance FROM bank_account WHERE id = 1; (returns 600) | | T1 reads 600. |
| Step 4 | | UPDATE bank_account set balance = 100 WHERE id = 2; | T2 updates 400 to 100 because 300 is withdrawn from Tom's balance. |
| Step 5 | | COMMIT; | T2 commits. |
| Step 6 | SELECT balance FROM bank_account WHERE id = 2; (returns 100) | | T1 reads 100 (inconsistent data) instead of 400 after T2 commits. |
| Step 7 | 600 + 100 = 700 | | Finally, T1 gets 700 (inconsistent result) instead of 1000. *Read skew occurs. |
| Step 8 | COMMIT; | | T1 commits. |
In addition, here is another example of read skew. There are teacher and student tables with id and name as shown below.
teacher table:
| id | name |
|----|------|
| 1 | John |
| 2 | Tom |

student table:
| id | name |
|----|------|
| 1 | Anna |
| 2 | Sarah |
| 3 | David |
| 4 | Mark |
| 5 | Kai |
The steps below show read skew. *Lisa, Peter and Roy are inserted into the "student" table. Then T1 reads 8 (inconsistent data) instead of 5. Finally, T1 gets 2 + 8 = 10 (inconsistent result) as the total of teachers and students instead of 7:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
|------|--------------------|--------------------|-------------|
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT count(*) FROM teacher; (returns 2) | | T1 reads 2. |
| Step 4 | | INSERT INTO student values (6, 'Lisa'), (7, 'Peter'), (8, 'Roy'); | T2 inserts Lisa, Peter and Roy. |
| Step 5 | | COMMIT; | T2 commits. |
| Step 6 | SELECT count(*) FROM student; (returns 8) | | T1 reads 8 (inconsistent data) instead of 5 after T2 commits. |
| Step 7 | 2 + 8 = 10 | | Finally, T1 gets 10 (inconsistent result) instead of 7. *Read skew occurs. |
| Step 8 | COMMIT; | | T1 commits. |
These tables below are my experiment results of read skew with each isolation level in MySQL and PostgreSQL. *Yes means Occurs, No means Doesn't occur.
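MySQL:
| Isolation level | Read skew |
|-----------------|-----------|
| READ UNCOMMITTED | Yes |
| READ COMMITTED | Yes |
| REPEATABLE READ | No |
| SERIALIZABLE | No |

PostgreSQL:
| Isolation level | Read skew |
|-----------------|-----------|
| READ UNCOMMITTED | Yes |
| READ COMMITTED | Yes |
| REPEATABLE READ | No |
| SERIALIZABLE | No |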

Related

Lost-Update and Two-Phase Locking (How dead-lock occurs?)

Can you please check whether my answer is acceptable for these questions? Question 1 is about lost-update and Question 2 is about 2PL ( How deadlock occurs? )
QUESTION TITLE
QUESTION 1
My Answer for (Q1)
Lost-Update Problem
Reason:
The expected final value for A is 280 and for B is 240. Because of the lost-update problem, these two values could not be achieved.
In transaction T1, the value of A is read at time t1, which is 160, and the value of B is read at time t2, which is 60. At time t2, transaction T2 begins and also reads the value of B, which is known as a dirty read. At time t4, the value of B is calculated in T1, and at time t6 that value, 120, is partially committed. Meanwhile, at time t5 in T2, the value of B is calculated as well; it now holds 120 and is also partially committed. Lastly, B = 120 is committed by T1 at time t7, followed by B = 120 committed by T2 at time t11. This is where the lost-update problem occurs: the value of B written by T1 is overwritten by the value of B written by T2.
Versioning Approach -->
I am still unsure about this and will update my answer here after I finish studying. Apologies for this!
QUESTION 2
My Answer for (Q2) How deadlock occurs in 2PL
My explanation
In transaction T1, two locks are requested. At time t2, a read lock is taken on A because A only needs to be read, and a write lock is taken on B because B needs to be modified. Meanwhile, transaction T2 begins and requests a write lock on B. T1 has not yet unlocked B, so T2's write lock is not granted because B is still write-locked by T1. This is the mutual exclusion situation: the operation in T2 waits for T1's operation to finish, after which B is unlocked. Since only T2 is waiting on T1, and T1 is not waiting for T2, no deadlock occurs; T2 simply waits until T1 completes, no matter how long T1 lasts. As we can see at time t12, the write lock on B is granted to T2 as soon as A and B are committed, which means the exclusive lock on B is available again because T1 has released it.
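The "who waits for whom" argument above is usually pictured as a wait-for graph, where a deadlock corresponds to a cycle. The following is a minimal sketch of that check; `has_deadlock` and the dictionary layout are names made up for this illustration, and the second call is a hypothetical variation that is not part of the schedule in the question.

```python
def has_deadlock(waits_for: dict[str, set[str]]) -> bool:
    """Return True if the wait-for graph contains a cycle.

    waits_for maps a transaction to the transactions whose locks it waits on.
    """
    visiting, done = set(), set()

    def visit(txn: str) -> bool:
        if txn in done:
            return False
        if txn in visiting:          # reached a transaction already on the path
            return True
        visiting.add(txn)
        for holder in waits_for.get(txn, ()):
            if visit(holder):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(waits_for))


# Schedule from the explanation: only T2 waits (for T1's write lock on B).
no_deadlock = has_deadlock({"T2": {"T1"}})
# Hypothetical variation: T1 also waits for a lock that T2 holds.
deadlock = has_deadlock({"T2": {"T1"}, "T1": {"T2"}})
print(no_deadlock, deadlock)  # False True
```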

Transaction uses one table multiple times with different locks

Let’s consider the transaction:
Begin transaction
1 select from table(xlock)
2 insert into table
3 select from table
Rollback
The question is: will the transaction hold the xlock until the transaction ends, or only for steps 1 and 2?

update with rowlock in MSSQL server

I was trying to understand ROWLOCK in SQL Server in order to update a record after locking it. Here is my observation; I would like confirmation of whether ROWLOCK behaves like a table or page lock, or whether I have not tried it correctly. ROWLOCK should lock only the row, not the table or page.
Here is what I tried:
I created a simple table, row_lock_temp_test, with two columns, ID and Name, and no PK or index. I then opened two different SQL Server clients with the same credentials and executed the following sets of queries:
Client 1:
1: BEGIN TRANSACTION;
2: update row_lock_temp_test set name = 'CC' where id = 2
3: COMMIT
Client 2:
1: BEGIN TRANSACTION;
2: update row_lock_temp_test set name= 'CC' where id = 2
3: COMMIT
I executed queries 1 and 2 on C-1, then went to C-2 and executed the same queries. Both clients executed the queries, and then I committed the transactions. All good.
Then I added ROWLOCK to the update query:
C-1
1: BEGIN TRANSACTION;
2: update row_lock_temp_test WITH(rowlock) set name = 'CC' where id = 2
3: COMMIT
C-2
1: BEGIN TRANSACTION;
2: update row_lock_temp_test WITH(rowlock) set name = 'CC' where id = 2
3: COMMIT
Now I executed queries 1 and 2 on C-1, then went to C-2 and tried to execute the same 2 queries. The query got stuck, as expected, because the row is locked by C-1, so it has to wait in the queue until the transaction on C-1 is committed. As soon as I committed the transaction on C-1, the query on C-2 executed, and then I committed the transaction on C-2 as well. All good.
Next I tried another scenario, executing the same set of queries with row id = 3:
C-2
1: BEGIN TRANSACTION;
2: update row_lock_temp_test WITH(rowlock) set name = 'CC' where id = 3
3: COMMIT
I executed the first two queries on C-1 and then the first two queries on C-2. The row id is different in the two clients, but the query on C-2 still got stuck. This suggests that the update with id = 2 locked the page or the table; I was expecting a row lock, but it looks like a page or table lock.
I also tried XLOCK, HOLDLOCK and UPDLOCK in different combinations, but it always locks the table. Is there any way to lock only a row?
Select and insert work as expected.
Thanks in advance.
Lock hints are only hints. You can't "force" SQL to take a particular kind of lock.
You can see the locks being taken with the following query:
select tl.request_session_id,
tl.resource_type,
tl.request_mode,
tl.resource_description,
tl.request_status
from sys.dm_tran_locks tl
join sys.partitions pt on pt.hobt_id = tl.resource_associated_entity_id
join sys.objects ob on ob.object_id = pt.object_id
where tl.resource_database_id = db_id()
order by tl.request_session_id
OK, let's run some code in an SSMS query window:
create table t(i int, j int);
insert t values (1, 1), (2, 2);
begin tran;
update t with(rowlock) set j = 2 where i = 1;
Open a second SSMS window, and run this:
begin tran;
update t with(rowlock) set j = 2 where i = 2;
The second execution will be blocked. Why?
Run the locking query in a third window, and note that there are two rows with a resource_type of RID, one with a status of "grant", the other with a status of "wait". We'll get to the RID bit in a second. Also, look at the resource_description column for those rows. It's the same value.
OK, so what's a resource_description? It depends on the resource_type. But for our RID it represents: the file id, then the page id, then the row id (also known as the slot). But why are both executions taking a lock on row slot 0? Shouldn't they be trying to lock different rows? After all, we are updating different rows.
David Browne has given the answer: In order to find the correct row to update, SQL has to scan the entire table, because there is no index telling it how many rows there are where i = 1. It will take an update lock on each row as it scans through. Why does it take an update lock on each row? Well, it's not to "do" the update, so to speak. It will take an exclusive lock for that. Update locks are pretty much always taken to prevent deadlocks.
So, the first query has scanned through the rows, taking a U lock on each row. Of course, it found the row it wanted to update right away, in slot 0, and took an X lock. And it still has that X lock, because we haven't committed.
Then we started the second query, which also has to scan all of the rows to find the one it wants. It started off by trying to take the U lock on the first row, and was blocked. The X lock of our first query is blocking it.
So, you see, even with row locking, your second query is still blocked.
OK, let's roll back the queries, and see what happens if we have the first query update the second row, and the second query update the first row. Does that work? Nope! Because SQL still has no way of knowing how many rows match the predicate. So the first query takes its update lock on slot 0, sees that it doesn't have to update it, takes its update lock on slot 1, sees the correct value for i, takes its exclusive lock, and waits for us to commit.
Then query 2 comes along, takes the update lock on slot 0, sees the value it wants, takes its exclusive lock, updates the value, and then tries to take an update lock on slot 1, because that might also have the value it wants. There it is blocked by the first query's exclusive lock.
You'll also see "intent locks" on the next "level" up, i.e., the page. The operation is letting the rest of the engine know that it holds, or intends to take, locks further down the hierarchy, so that nobody else can take a conflicting lock on the whole page. But that's not a factor here. Page locking is not causing the issue.
Solution in this case? Add an index on column i. In this case, that's probably the primary key. You can then do the updates in either order. Asking for row locking in this case makes no difference, because SQL doesn't know how many rows match the predicate. But even if you try to force a row lock in some situation, and even with a primary key or appropriate index, SQL can still choose to escalate the lock type, because it can be way more efficient to lock a whole page, or a whole table, than to lock and unlock individual rows.
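To make the scan-versus-index argument concrete, here is a small Python model of the two scenarios described above. It is a simplification under stated assumptions: `first_blocked_slot` is a made-up helper, slots are assumed to be visited in insertion order, and an "index" is modelled as simply knowing which slots match.

```python
def first_blocked_slot(held_x_locks, rows, wanted, indexed):
    """Return the slot where a second UPDATE would block, or None if it runs through.

    rows[slot] is the value of column i in that slot; held_x_locks is the set
    of slots X-locked by another session that has not committed yet.
    """
    if indexed:
        # With an index, the engine can go straight to the matching rows.
        slots_to_lock = [slot for slot, value in enumerate(rows) if value == wanted]
    else:
        # Without one, it scans every row and takes a U lock on each in turn.
        slots_to_lock = range(len(rows))
    for slot in slots_to_lock:
        if slot in held_x_locks:   # a U lock cannot be granted over another session's X
            return slot
    return None


rows = [1, 2]  # values of column i in slots 0 and 1, as in table t

# Session 1 updated i = 1 (X on slot 0); session 2 wants i = 2.
blocked_on_heap = first_blocked_slot({0}, rows, wanted=2, indexed=False)
blocked_with_index = first_blocked_slot({0}, rows, wanted=2, indexed=True)
# Reversed order: session 1 updated i = 2 (X on slot 1); session 2 wants i = 1.
blocked_reversed = first_blocked_slot({1}, rows, wanted=1, indexed=False)
print(blocked_on_heap, blocked_with_index, blocked_reversed)  # 0 None 1
```

In the reversed case the second session gets past slot 0 but still blocks at slot 1, which matches the behaviour described in the answer.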

SQL Server Read Committed Snapshot

I am just wondering something about snapshot behavior on read committed isolation level. Let's assume that I have a table with name "A". Here is the first transaction:
Select blabla
From A
Insert Into A blabla
and second transaction does the same
Select blabla
From A
Insert Into A blabla
and assume that below timeline occurred:
Tran1: select
Tran1: insert (not yet committed)
Tran2: select (I don't know it is possible or not)
Tran2: insert
As far as I know, under the standard READ COMMITTED isolation level, the tran2 select query would be blocked because the tran1 insert has not yet been committed or rolled back. But while "is_read_committed_snapshot" is enabled, I expect that no locks will be acquired during insert or update commands.
So what will happen to tran2?
I expect that the tran2 select query won't see the data inserted by tran1, because that would be a "dirty read". But it wouldn't get blocked either.
If the tran1 insert query does not acquire any lock, wouldn't that be a problem for the concurrent execution of these two transactions?
"I expect that any of lock won't acquired during insert or update command."
That is wrong. Even if you have enabled RCSI, writers still block writers, and X locks are still acquired.
What differs between RC and RCSI is the reading behaviour.
Under pessimistic RC, the SELECT from Tran2 will be blocked by the X lock held on A, while under RCSI Tran2's SELECT will not be blocked; it will be given the last committed version of A, i.e. the state of A before Tran1 modified it.
What happens next depends on your table organisation and on what you INSERT.
Some examples.
1) Table A is a heap, and you are doing a single insert in both transactions.
In this case your INSERT in Tran2 will succeed in any case, whether or not you insert the same value in both transactions, because what the server acquires is IX on the table (which is compatible with the IX held by Tran1), IX on a page (which is also compatible with the IX held by Tran1, even if it is the same page), and X on a RID (while Tran1 has X on another RID), so there is no conflict.
2) Table A is a clustered table, and you are trying to insert the same new key into it.
In this case Tran2's INSERT will be blocked because of the conflict between two X locks on the same key: the first is held by Tran1, the second is requested by Tran2 and is blocked.
3) Table A is a clustered table, and you are trying to insert different keys into it.
Insert2 will succeed because the X lock on the key requested by Tran2 will be granted, as Tran1 holds IX on the table, IX on the page, and X on another key.
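The three cases can be checked mechanically with a small lock-compatibility sketch. Only the IX and X modes named above are modelled, and the resource names ("table A", "page 1", "RID slot 0", "KEY 5") are placeholders invented for the illustration, not real resource descriptions.

```python
# Compatibility of the two lock modes mentioned above: (held, requested) -> compatible?
COMPATIBLE = {
    ("IX", "IX"): True,
    ("IX", "X"): False,
    ("X", "IX"): False,
    ("X", "X"): False,
}


def insert_locks(row_resource: str) -> list[tuple[str, str]]:
    """Locks a single-row INSERT takes: IX on the table, IX on the page, X on the row."""
    # "table A" and "page 1" are placeholder resource names for this sketch.
    return [("table A", "IX"), ("page 1", "IX"), (row_resource, "X")]


def is_blocked(held, requested) -> bool:
    """True if any requested lock conflicts with a held lock on the same resource."""
    return any(
        res == held_res and not COMPATIBLE[(held_mode, mode)]
        for res, mode in requested
        for held_res, held_mode in held
    )


heap = is_blocked(insert_locks("RID slot 0"), insert_locks("RID slot 1"))
same_key = is_blocked(insert_locks("KEY 5"), insert_locks("KEY 5"))
different_keys = is_blocked(insert_locks("KEY 5"), insert_locks("KEY 6"))
print(heap, same_key, different_keys)  # False True False
```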
Let's say you're doing it this way:
SELECT id FROM customers
BEGIN TRAN new_tran
UPDATE customers
SET ID = '1'
WHERE ID = '01'
If your query is something like this:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO
BEGIN TRAN
SELECT *
FROM customers
WHERE id = '01'
Result: even though session 1 has changed the value, session 2 still sees the old record (2, TWO).
Now, let's commit the transaction in session 1.
Then commit the transaction in session 2 as well; after that, session 2 gets the new, updated value:
COMMIT
SELECT *
FROM DemoTable
WHERE i = 2
You can read more about it on Pinal Dave's blog: blog.sqlauthority.com/2015/07/03/sql-server-difference-between-read-committed-snapshot-and-snapshot-isolation-level/

one to one parent child relationship sql server

I have a table with the fields TransactionID, Amount and ParentTransactionID.
Transactions can be cancelled, in which case a new entry is posted with the amount and with ParentTransactionID set to the cancelled transaction's TransactionID.
Let's say there is a transaction:
1 100 NULL
If I cancel the above entry, it will look like:
2 -100 1
If I cancel that transaction again, it should look like:
3 100 2
When I fetch, I should get record 3, as IDs 1 and 2 were cancelled.
The result should be:
3 100 2
If I cancel the 3rd entry, no records should be returned.
SELECT * FROM Transaction t
WHERE NOT EXISTS (SELECT TOP 1 NULL FROM Transaction pt
WHERE (pt.ParentTransactionID = t.TransactionID OR t.ParentTransactionID = pt.TransactionID)
AND ABS(t.Amount) = ABS(pt.Amount))
This works only if one level of cancellation is made.
If every cancellation is made by adding a new transaction whose ParentTransactionId points to the transaction it cancels, this can be done using a simple LEFT JOIN:
SELECT t1.* FROM Transactions t1
LEFT JOIN Transactions t2
ON t1.TransactionId = t2.ParentTransactionId
WHERE t2.TransactionId IS NULL;
t1 being the transaction we're currently looking at and t2 being the possibly cancelling transaction. If there is no cancelling transaction (ie the TransactionId for t2 does not exist), return the row.
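As a quick check of the LEFT JOIN approach, the query can be run against the three sample rows from the question. This sketch uses Python's built-in sqlite3 module rather than SQL Server, purely so it is self-contained; the table definition is an assumption based on the columns named in the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions ("
    "TransactionId INTEGER PRIMARY KEY, Amount INTEGER, ParentTransactionId INTEGER)"
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [(1, 100, None), (2, -100, 1), (3, 100, 2)],
)

# Keep only the rows that no other row points to as its parent.
rows = conn.execute(
    """
    SELECT t1.* FROM Transactions t1
    LEFT JOIN Transactions t2
      ON t1.TransactionId = t2.ParentTransactionId
    WHERE t2.TransactionId IS NULL
    """
).fetchall()
conn.close()
print(rows)  # [(3, 100, 2)]
```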
I'm not sure about your last statement though, "If I cancelled the 3rd entry no records should return." How would you cancel #3 without adding a new transaction to the table? You may have some other condition for a cancel that you're not telling us about...?
Simple SQLfiddle demo.
EDIT: Since you don't want cancelled transactions (or rather transactions with an odd number of cancellations), you need quite a bit more complicated recursive query to figure out whether to show the last transaction or not:
WITH ChangeLog(TransactionID, Amount, ParentTransactionID,
IsCancel, OriginalTransactionID) AS
(
SELECT TransactionID, Amount, ParentTransactionID, 0, TransactionID
FROM Transactions WHERE ParentTransactionID IS NULL
UNION ALL
SELECT t.TransactionID, t.Amount, t.ParentTransactionID,
1-c.IsCancel, c.OriginalTransactionID
FROM Transactions t
JOIN ChangeLog c ON c.TransactionID = t.ParentTransactionID
)
SELECT c1.TransactionID, c1.Amount, c1.ParentTransactionID
FROM ChangeLog c1
LEFT JOIN ChangeLog c2
ON c1.TransactionID < c2.TransactionID
AND c1.OriginalTransactionID = c2.OriginalTransactionID
WHERE c2.TransactionID IS NULL AND c1.IsCancel=0
This will, in your example with 3 transactions, show the last row, but if the last row is cancelled, it won't return anything.
Since SQLfiddle is up again, here is a fiddle to test with.
A short explanation of the query may be in order, even if it is a bit hard to give in a simple way. It defines a recursive "view", ChangeLog, that tracks cancels and the original transaction id from the original to the last transaction in a series (a series is all transactions with the same OriginalTransactionId). After that, it joins ChangeLog with itself to find the last entry of each series (i.e. the transactions that don't have a cancelling transaction). If the last entry found in a series is not a cancel (IsCancel=0), it will show up.
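The recursive query can be exercised the same way on the question's sample data. Again this is a sketch using Python's sqlite3 module instead of SQL Server: SQLite spells the clause WITH RECURSIVE, while T-SQL uses plain WITH, and the helper name `visible_transactions` and the table definition are assumptions made for the example.

```python
import sqlite3

# Same logic as the T-SQL query above, written with SQLite's WITH RECURSIVE.
QUERY = """
WITH RECURSIVE ChangeLog(TransactionID, Amount, ParentTransactionID,
                         IsCancel, OriginalTransactionID) AS
(
  SELECT TransactionID, Amount, ParentTransactionID, 0, TransactionID
  FROM Transactions WHERE ParentTransactionID IS NULL
  UNION ALL
  SELECT t.TransactionID, t.Amount, t.ParentTransactionID,
         1 - c.IsCancel, c.OriginalTransactionID
  FROM Transactions t
  JOIN ChangeLog c ON c.TransactionID = t.ParentTransactionID
)
SELECT c1.TransactionID, c1.Amount, c1.ParentTransactionID
FROM ChangeLog c1
LEFT JOIN ChangeLog c2
  ON c1.TransactionID < c2.TransactionID
 AND c1.OriginalTransactionID = c2.OriginalTransactionID
WHERE c2.TransactionID IS NULL AND c1.IsCancel = 0
"""


def visible_transactions(rows):
    """Load rows into a fresh in-memory table and run the recursive query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Transactions ("
        "TransactionID INTEGER PRIMARY KEY, Amount INTEGER, ParentTransactionID INTEGER)"
    )
    conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


three = [(1, 100, None), (2, -100, 1), (3, 100, 2)]
uncancelled = visible_transactions(three)
cancelled = visible_transactions(three + [(4, -100, 3)])  # cancel entry 3 as well
print(uncancelled)  # [(3, 100, 2)]
print(cancelled)    # []
```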
