Two-phase locking (2PL) is claimed to be a solution for ensuring serializable execution. However, I'm not sure how it adequately solves the lost update problem during a read-modify-write cycle. I may be overlooking or misunderstanding the locking mechanism here!
For example, assume we have a database running under 2PL.
Given a SQL table accounts with an integer column email_count, let's assume we have the following record in our database:
| ID | email_count |
| ----- | ----- |
| 1 | 0 |
Now let's assume we have two concurrently executing transactions, T1 and T2. Both transactions will read email_count from accounts where ID = 1, increment the value by 1, and write back the result.
Here's one scenario in which 2PL does not seem to address the lost update problem (T1 represents transaction 1):
T1 -> Obtains a non-exclusive, shared read lock. Reads email_count for ID = 1 and gets the result 0. The application computes a new value (0 + 1 = 1) for a later write.
T2 -> Also obtains a non-exclusive, shared read lock. Reads `email_count` for ID = 1 and gets the result 0. The application also computes a new value (using a now-stale precondition), which is 1 (0 + 1 = 1).
T1 -> Obtains an exclusive write lock and writes the new value (1) to our record. This will block T2 from writing.
T2 -> Attempts to obtain write lock so it can write the value 1, but is forced to wait for T1 to complete its transaction and release all of T1's own locks.
Now here's my question:
Once T1 completes and releases its locks (during the "shrinking" phase of our 2PL), T2 still has a stale email_count value of 1! So when T2 proceeds with its write (email_count = 1), we "lose" the original update from T1.
If T2 holds a read lock, T1 cannot acquire an exclusive lock until T2 releases it. Thus, the execution sequence you describe cannot happen: T1 would be denied the write lock while T2 continues its transaction.
> T1 -> Obtains an exclusive write lock and writes the new value (1) to our record. This will block T2 from writing.
The step above cannot happen: T2 already holds a shared read lock, so T1 must wait until T2 releases it.
But T2 can't release its shared read lock. According to 2PL, T2 may not release any lock until it has acquired every lock it needs, so it would have to get the write lock first.
And T2 can't get the write lock, because T1 still holds its own shared read lock.
So, yes, it's a deadlock. That's why 2PL prevents the lost update, even though it may produce a deadlock: the database detects the cycle and aborts one of the transactions.
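To make this concrete, here is a minimal sketch of that deadlock in MySQL/InnoDB, using the accounts table from the question. (Assumptions: InnoDB's plain SELECTs don't take shared locks, so `FOR SHARE` is used here to emulate the 2PL read lock; on versions before MySQL 8.0 the syntax is `LOCK IN SHARE MODE`.)

```sql
-- Session 1 (T1)
START TRANSACTION;
SELECT email_count FROM accounts WHERE ID = 1 FOR SHARE;  -- shared lock on the row

-- Session 2 (T2)
START TRANSACTION;
SELECT email_count FROM accounts WHERE ID = 1 FOR SHARE;  -- second shared lock, also granted

-- Session 1 (T1): blocks, waiting for T2 to release its shared lock
UPDATE accounts SET email_count = 1 WHERE ID = 1;

-- Session 2 (T2): closes the cycle; InnoDB detects the deadlock and aborts
-- one transaction with ERROR 1213 "Deadlock found when trying to get lock"
UPDATE accounts SET email_count = 1 WHERE ID = 1;
```

In practice, taking the exclusive lock up front with `SELECT ... FOR UPDATE` avoids the deadlock entirely, because the two read-modify-write cycles are serialized from the first read.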
Related
I read A beginner’s guide to Read and Write Skew phenomena and A beginner’s guide to Non-Repeatable Read anomaly to learn what read skew and non-repeatable read are.
But I cannot differentiate between read skew and non-repeatable read, and it seems like both can be prevented with the REPEATABLE READ or SERIALIZABLE isolation level.
My questions:
1. What is the difference between read skew and non-repeatable read?
2. Can read skew and non-repeatable read both be prevented by REPEATABLE READ or SERIALIZABLE?
1. What is the difference between read skew and non-repeatable read?
Suppose we have two data items, x and y, with a relation between them (e.g., parent/child).
Transaction T1 reads x, and then a second transaction T2 updates x and y to new values and commits. If T1 now reads y, it may see an inconsistent state, and therefore produce an inconsistent state as output.
The acceptable consistent states are:
- x and y
- *x and *y
Note: * denotes the updated value of the variable.
When x and y are the same data item, reading both of them means executing the same query twice, and the anomaly reduces to the non-repeatable read problem. In that sense, read skew can be seen as a generalized form of the non-repeatable read anomaly.
2. Can read skew and non-repeatable read be both prevented by REPEATABLE READ or SERIALIZABLE?
The SERIALIZABLE isolation level permits transactions to run concurrently, but it creates the effect that transactions are running in serial order: both read skew and non-repeatable reads are prevented.
The REPEATABLE READ isolation level guarantees that a transaction sees the same data every time it reads the same row, regardless of how many times it reads it. From that definition alone, it seems read skew may not be prevented.
Without knowing how it is implemented, it is hard to claim anything. Today, DBMS engines use different concurrency-control strategies to implement the REPEATABLE READ isolation level. For example, PostgreSQL uses a database snapshot (a consistent view) to implement REPEATABLE READ, which does prevent read skew. Other engines may use lock-based concurrency control mechanisms to implement it, which may not prevent read skew.
A non-repeatable read (fuzzy read) is when a transaction reads the same row at least twice, but the row's data differs between the 1st and 2nd reads, because other transactions updated that row and committed concurrently. *I explain more about non-repeatable reads in What is the difference between Non-Repeatable Read and Phantom Read?
Read skew is when a transaction reads inconsistent data across two different queries, because between the 1st and 2nd queries other transactions inserted, updated or deleted data and committed. An inconsistent result is then produced from the inconsistent data.
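For contrast, here is a minimal two-session sketch of a non-repeatable read (assuming the bank_account table described further below and the READ COMMITTED isolation level). The defining feature is the same query on the same row returning different data:

```sql
-- Session 1 (T1)
BEGIN;
SELECT balance FROM bank_account WHERE id = 1;  -- returns 600

-- Session 2 (T2)
BEGIN;
UPDATE bank_account SET balance = 900 WHERE id = 1;
COMMIT;

-- Session 1 (T1): the very same query now returns different data
SELECT balance FROM bank_account WHERE id = 1;  -- returns 900: non-repeatable read
COMMIT;
```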
Below are examples of read skew which you can reproduce with MySQL and two command prompts.
For these examples, you need to set the READ COMMITTED isolation level so that read skew can occur:
```sql
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```
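To double-check that the level took effect in each session, you can query the server variable (named transaction_isolation in MySQL 8.0; older versions call it tx_isolation):

```sql
SELECT @@transaction_isolation;  -- should return 'READ-COMMITTED'
```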
And there is a bank_account table with id, name and balance columns, as shown below.
bank_account table:
| id | name | balance |
| -- | ---- | ------- |
| 1 | John | 600 |
| 2 | Tom | 400 |
The steps below show read skew. *300 is transferred from Tom's balance to John's balance. T1 then reads 100 (inconsistent data) from Tom's balance instead of 400. Finally, the total of John's and Tom's balances comes out as 600 + 100 = 700 (an inconsistent result):
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
| ---- | ------------------ | ------------------ | ----------- |
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT balance FROM bank_account WHERE id = 1; (returns 600) | | T1 reads 600. |
| Step 4 | | UPDATE bank_account SET balance = 900 WHERE id = 1; | T2 updates 600 to 900 because 300 is transferred from Tom's balance. |
| Step 5 | | UPDATE bank_account SET balance = 100 WHERE id = 2; | T2 updates 400 to 100 because 300 is transferred to John's balance. |
| Step 6 | | COMMIT; | T2 commits. |
| Step 7 | SELECT balance FROM bank_account WHERE id = 2; (returns 100) | | T1 reads 100 (inconsistent data) instead of 400 after T2 commits. |
| Step 8 | 600 + 100 = 700 | | Finally, T1 gets 700 (inconsistent result) instead of 1000. *Read skew occurs. |
| Step 9 | COMMIT; | | T1 commits. |
The steps below also show read skew. *300 is withdrawn from Tom's balance. T1 then reads 100 (inconsistent data) instead of 400. Finally, the total of John's and Tom's balances comes out as 600 + 100 = 700 (an inconsistent result):
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
| ---- | ------------------ | ------------------ | ----------- |
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT balance FROM bank_account WHERE id = 1; (returns 600) | | T1 reads 600. |
| Step 4 | | UPDATE bank_account SET balance = 100 WHERE id = 2; | T2 updates 400 to 100 because 300 is withdrawn from Tom's balance. |
| Step 5 | | COMMIT; | T2 commits. |
| Step 6 | SELECT balance FROM bank_account WHERE id = 2; (returns 100) | | T1 reads 100 (inconsistent data) instead of 400 after T2 commits. |
| Step 7 | 600 + 100 = 700 | | Finally, T1 gets 700 (inconsistent result) instead of 1000. *Read skew occurs. |
| Step 8 | COMMIT; | | T1 commits. |
In addition, here is another example of read skew. There are teacher and student tables with id and name columns, as shown below.
teacher table:

| id | name |
| -- | ---- |
| 1 | John |
| 2 | Tom |

student table:

| id | name |
| -- | ----- |
| 1 | Anna |
| 2 | Sarah |
| 3 | David |
| 4 | Mark |
| 5 | Kai |
The steps below show read skew. *Lisa, Peter and Roy are inserted into the student table. T1 then reads 8 (inconsistent data) instead of 5. Finally, the total of teachers and students comes out as 2 + 8 = 10 (an inconsistent result) instead of 7:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
| ---- | ------------------ | ------------------ | ----------- |
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT count(*) FROM teacher; (returns 2) | | T1 reads 2. |
| Step 4 | | INSERT INTO student values (6, 'Lisa'), (7, 'Peter'), (8, 'Roy'); | T2 inserts Lisa, Peter and Roy. |
| Step 5 | | COMMIT; | T2 commits. |
| Step 6 | SELECT count(*) FROM student; (returns 8) | | T1 reads 8 (inconsistent data) instead of 5 after T2 commits. |
| Step 7 | 2 + 8 = 10 | | Finally, T1 gets 10 (inconsistent result) instead of 7. *Read skew occurs! |
| Step 8 | COMMIT; | | T1 commits. |
The tables below are my experimental results for read skew at each isolation level in MySQL and PostgreSQL. *Yes means it occurs; No means it doesn't occur.
MySQL:
| Isolation level | Read skew |
| ---------------- | --------- |
| READ UNCOMMITTED | Yes |
| READ COMMITTED | Yes |
| REPEATABLE READ | No |
| SERIALIZABLE | No |
PostgreSQL:
| Isolation level | Read skew |
| ---------------- | --------- |
| READ UNCOMMITTED | Yes |
| READ COMMITTED | Yes |
| REPEATABLE READ | No |
| SERIALIZABLE | No |
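For comparison, here is a sketch of the first bank_account scenario re-run under REPEATABLE READ (in both MySQL and PostgreSQL this prevents the skew, because T1 reads from a consistent snapshot):

```sql
-- Both sessions first run:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Session 1 (T1)
BEGIN;
SELECT balance FROM bank_account WHERE id = 1;  -- returns 600

-- Session 2 (T2)
BEGIN;
UPDATE bank_account SET balance = 900 WHERE id = 1;
UPDATE bank_account SET balance = 100 WHERE id = 2;
COMMIT;

-- Session 1 (T1): still reads from the snapshot taken at its first read
SELECT balance FROM bank_account WHERE id = 2;  -- returns 400, not 100
COMMIT;
-- 600 + 400 = 1000: consistent result, no read skew
```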
Can you please check whether my answers to these questions are acceptable? Question 1 is about the lost update problem and Question 2 is about 2PL (how does a deadlock occur?).
QUESTION 1
My Answer for (Q1)
Lost-Update Problem
Reason:
The expected final value for A is 280 and for B is 240, but due to the lost update problem these two values cannot be achieved.
As for transaction T1, value A is read at time t1 (160) and value B is read at time t2 (60). During t2, transaction T2 begins and reads value B as well, which is known as a dirty read. At time t4, T1 computes the new value of B, and at time t6 it is partially committed as 120; meanwhile, in T2, at time t5, B is also computed as 120 and partially committed. Finally, B = 120 is committed by T1 at time t7, and later B = 120 is committed by T2 at time t11. This is when the lost update problem occurs: the value of B written by transaction T1 is overwritten by transaction T2.
Versioning approach --> Still unsure about this; I will update my answer here after I finish my studies! Apologies for this!
QUESTION 2
My Answer for (Q2): How deadlock occurs in 2PL
My explanation
In transaction T1, two locks are requested. At time t2, a read lock is placed on value A because A only needs to be read, and a write lock is placed on value B because B needs to be modified. Meanwhile, transaction T2 begins and requests a write lock, but since T1 has not yet unlocked value B, the write lock in T2 is not granted; B is still write-locked by T1. This is a mutual exclusion situation in which T2 waits for T1's operation to finish and unlock B. Since only T2 is waiting on T1, and T1 is not waiting on T2, no deadlock occurs: T2 simply waits until T1 completes its transaction, however long that takes. As we can see at time t12, the write lock on value B is granted as soon as values A and B are committed, i.e. the exclusive lock on B becomes usable once T1 has unlocked it.
I have a problem where I insert a User and an Address in a transaction with a 10-second delay. If I run my SELECT statement during the execution of the transaction, it waits for the transaction to finish, but I get a NULL on the join. Why doesn't my SELECT wait for both the User and Address rows to be committed?
If I run the SELECT statement after the transaction has finished, I get the correct result. Why do I get this behaviour, and what is the generic solution to make this work?
```sql
BEGIN TRANSACTION
insert into [user](dummy) values('text')
WAITFOR DELAY '00:00:10';
insert into address(ID_FK) values((SELECT SCOPE_IDENTITY()))
COMMIT TRANSACTION
```
Running it during the transaction results in a NULL in the join:

```sql
select * from [user] u left join address a on u.id = a.ID_FK order by id desc
```
| ID | dummy | ID_FK |
| --- | ------ | ----- |
| 101 | 'text' | null |
Running it after the transaction has finished gives the correct result:

```sql
select * from [user] u left join address a on u.id = a.ID_FK order by id desc
```
| ID | dummy | ID_FK |
| --- | ------ | ----- |
| 101 | 'text' | 101 |
This sort of thing is entirely possible at the default READ COMMITTED level on on-premises SQL Server, as that uses read committed locking. What happens then depends on the execution plan.
An example is below
```sql
CREATE TABLE [user]
  (
     id    INT IDENTITY PRIMARY KEY,
     dummy VARCHAR(10)
  );

CREATE TABLE [address]
  (
     ID_FK INT REFERENCES [user](id),
     addr  VARCHAR(30)
  );
```
Connection One
```sql
BEGIN TRANSACTION

INSERT INTO [user]
            (dummy)
VALUES      ('text')

WAITFOR DELAY '00:00:20';

INSERT INTO address
            (ID_FK,
             addr)
VALUES      (SCOPE_IDENTITY(),
             'Address Line 1')

COMMIT TRANSACTION
```
Connection Two (run this whilst connection one is waiting the 20 seconds)
```sql
SELECT *
FROM   [user] u
       LEFT JOIN [address] a
              ON u.id = a.ID_FK
ORDER  BY id DESC
OPTION (MERGE JOIN)
```
Returns
| id | dummy | ID_FK | addr |
| -- | ----- | ----- | ---- |
| 1 | text | NULL | NULL |
The execution plan (shown as an image in the original answer) is a merge join with a Sort on the address input. It behaves as follows.
The scan on User is blocked by the open transaction in Connection 1 that has inserted the row there. This has to wait until that transaction commits and then eventually gets to read the newly inserted row.
Meanwhile the Sort operator has already requested the rows from address by this point as it consumes all its rows in its Open method (i.e. during operator initialisation). This is not blocked as no row has been inserted to address yet. It reads 0 rows from address which explains the final result.
If you switch to read committed snapshot rather than read committed locking, you won't get this issue: each statement then reads only the state committed as of the start of that statement, so this kind of anomaly isn't possible.
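For reference, read committed snapshot is a database-level option in SQL Server. A minimal sketch, assuming a database named MyDb (the SINGLE_USER steps just avoid waiting on other open connections while switching):

```sql
-- Enable read committed snapshot isolation (RCSI) for the database
ALTER DATABASE MyDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE MyDb SET MULTI_USER;
```

After this, statements at the default READ COMMITTED level read row versions as of the start of the statement instead of taking shared locks.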
I have a table with the fields TransactionID, Amount and ParentTransactionID.
A transaction can be cancelled by posting a new entry with the negating amount and with ParentTransactionID set to the cancelled TransactionID.
Let's say we have a transaction (TransactionID, Amount, ParentTransactionID):
`1 100 NULL`
If I cancel the above entry, the new row will look like:
`2 -100 1`
If I cancel that cancellation in turn, it should look like:
`3 100 2`
When I fetch, I should get only record 3, as IDs 1 and 2 cancelled each other out. The result should be:
`3 100 2`
If I cancel the 3rd entry, no records should be returned.
```sql
SELECT * FROM Transaction t
WHERE NOT EXISTS (SELECT TOP 1 NULL FROM Transaction pt
                  WHERE (pt.ParentTransactionID = t.TransactionID OR t.ParentTransactionID = pt.TransactionID)
                  AND ABS(t.Amount) = ABS(pt.Amount))
```
This works only if a single level of cancellation has been made.
If every cancelled transaction is cancelled by a new transaction whose ParentTransactionId is set to the transaction it cancels, this can be done using a simple LEFT JOIN:
```sql
SELECT t1.* FROM Transactions t1
LEFT JOIN Transactions t2
  ON t1.TransactionId = t2.ParentTransactionId
WHERE t2.TransactionId IS NULL;
```
t1 is the transaction we're currently looking at and t2 is the possibly cancelling transaction. If there is no cancelling transaction (i.e. no matching t2 row exists), the row is returned.
I'm not sure about your last statement though: If I cancelled the 3rd entry no records should return. How would you cancel #3 without adding a new transaction to the table? You may have some other condition for a cancel that you're not telling us about?
Simple SQLfiddle demo.
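If you want to experiment locally instead, here is a hypothetical setup matching the question's data (the table is named Transactions, as in the queries above):

```sql
CREATE TABLE Transactions (
    TransactionID       INT PRIMARY KEY,
    Amount              INT NOT NULL,
    ParentTransactionID INT NULL REFERENCES Transactions (TransactionID)
);

-- The original transaction, its cancellation, and the cancellation of the cancellation
INSERT INTO Transactions VALUES (1,  100, NULL);
INSERT INTO Transactions VALUES (2, -100, 1);
INSERT INTO Transactions VALUES (3,  100, 2);
```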
EDIT: Since you don't want cancelled transactions (or rather, transactions with an odd number of cancellations), you need a rather more complicated recursive query to figure out whether to show the last transaction or not:
```sql
WITH ChangeLog(TransactionID, Amount, ParentTransactionID,
               IsCancel, OriginalTransactionID) AS
(
  SELECT TransactionID, Amount, ParentTransactionID, 0, TransactionID
  FROM Transactions WHERE ParentTransactionID IS NULL
  UNION ALL
  SELECT t.TransactionID, t.Amount, t.ParentTransactionID,
         1 - c.IsCancel, c.OriginalTransactionID
  FROM Transactions t
  JOIN ChangeLog c ON c.TransactionID = t.ParentTransactionID
)
SELECT c1.TransactionID, c1.Amount, c1.ParentTransactionID
FROM ChangeLog c1
LEFT JOIN ChangeLog c2
  ON c1.TransactionID < c2.TransactionID
 AND c1.OriginalTransactionID = c2.OriginalTransactionID
WHERE c2.TransactionID IS NULL AND c1.IsCancel = 0
```
In your example with 3 transactions, this will show the last row; but if the last row is itself a cancellation, it won't return anything.
Since SQLfiddle is up again, here is a fiddle to test with.
A short explanation of the query may be in order, even if it's a bit hard to do simply: it defines a recursive "view", ChangeLog, that tracks cancellations and the original transaction ID from the first to the last transaction in a series (a series being all transactions with the same OriginalTransactionId). After that, it joins ChangeLog with itself to find the last entry in each series (i.e. the transactions that have no cancelling transaction). If the last entry found in a series is not a cancellation (IsCancel = 0), it will show up.
I have a SQL Server table that I'm using as a queue, and it's being processed by a multi-threaded (and soon to be multi-server) application. I'd like a way for a process to claim the next row from the queue, flagging it as "in-process", without the possibility that multiple threads (or multiple servers) will claim the same row at the same time.
Is there a way to update a flag in a row and retrieve that row at the same time? I want something like this pseudocode, but ideally without blocking the whole table:
1. Block the table to prevent others from reading.
2. Grab the next ID in the queue.
3. Update that row with a "claimed" flag (or whatever).
4. Release the lock and let other threads repeat the process.
What's the best way to accomplish this in T-SQL? I remember once seeing a statement that would DELETE rows and, at the same time, deposit the deleted rows into a temp table so you could do something else with them, but I can't for the life of me find it now.
You can use the OUTPUT clause (note that in T-SQL it goes before the WHERE clause):

```sql
UPDATE myTable
SET flag = 1
OUTPUT DELETED.id
WHERE id = 1
  AND flag <> 1
```
The main thing is to use a combination of table hints, as shown below, within a transaction.
```sql
DECLARE @NextId INTEGER

BEGIN TRANSACTION

SELECT TOP 1 @NextId = ID
FROM QueueTable WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE BeingProcessed = 0
ORDER BY ID ASC

IF (@NextId IS NOT NULL)
BEGIN
    UPDATE QueueTable
    SET BeingProcessed = 1
    WHERE ID = @NextId
END

COMMIT TRANSACTION

IF (@NextId IS NOT NULL)
    SELECT * FROM QueueTable WHERE ID = @NextId
```
UPDLOCK locks the first available row it finds, preventing other processes from grabbing it.
ROWLOCK ensures only the individual row is locked. (I've never found it to be a problem to omit this, as I think only a row lock would be taken anyway, but it's safest to use it.)
READPAST prevents a process from being blocked waiting for another to finish: locked rows are skipped rather than waited on.
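As a possible refinement (a sketch combining the two answers, not something either answer claims): the same hints can be paired with the OUTPUT clause from the first answer to claim and return the next row in one atomic statement, with no separate SELECT. This assumes the QueueTable/BeingProcessed names used above:

```sql
-- Claim the next unprocessed row and return it in a single atomic statement
WITH NextItem AS
(
    SELECT TOP (1) *
    FROM QueueTable WITH (UPDLOCK, ROWLOCK, READPAST)
    WHERE BeingProcessed = 0
    ORDER BY ID ASC
)
UPDATE NextItem
SET BeingProcessed = 1
OUTPUT INSERTED.*;
```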