Postgres behaviour of concurrent DELETE RETURNING queries

Postgres behaviour of concurrent DELETE RETURNING queries - database

If I have a database transaction which goes along the lines of:
DELETE FROM table WHERE id = ANY(ARRAY[id1, id2, id3,...]) RETURNING foo, bar;
if num_rows_returned != num_rows_in_array then
rollback and return
Do stuff with deleted data...
Commit
My understanding is that the DELETE query will lock those rows, until the transaction is committed or rolled back. As according to the postgres 9.1 docs:
An exclusive row-level lock on a specific row is automatically
acquired when the row is updated or deleted. The lock is held until
the transaction commits or rolls back, just like table-level locks.
Row-level locks do not affect data querying; they block only writers
to the same row.
I am using the default read committed isolation level in postgres 9.1.13
I would take from this that I should be OK, but I want to ensure that this means the following things are true:
Only one transaction may delete and return a row from this table, unless a previous transaction was rolled back.
This means "Do stuff with deleted data" can only be done once per row.
If two transactions try to do the above at once with conflicting rows, one will always succeed (ignoring system failure), and one will always fail.
Concurrent transactions may succeed when there is no crossover of rows.
If a transaction is unable to delete and return all rows, it will rollback and thus not delete any rows. A transaction may try to delete two rows for example. One row is already deleted by another transaction, but the other is free to be returned. However since one row is already deleted, the other must not be deleted and processed. Only if all specified ids can be deleted and returned may anything take place.

Using the normal idea of concurrency, processes/transactions do not fail when they are locked out of data, they wait.
The DBMS implements execution in such a way that transactions advance but only seeing effects from other transactions according to the isolation level. (Only in the case of detected deadlock is a transaction aborted, and even then its implemented execution will begin again, and the killing is not evident to its next execution or to other transactions except per isolation level.) Under SERIALIZABLE isolation level this means that the database will change as if all transactions happened without overlap in some order. Other levels allow a transaction to see certain effects of overlapped implementation execution of other transactions.
However in the case of PostgresSQL under SERIALIZABLE when a transaction tries to commit and the DBMS sees that it would give non-serialized behaviour the tranasaction is aborted with notification but not automatically restarted. (Note that this is not failure from implementation execution attempted access to a locked resource.)
(Prior to 9.1, PostgrSQL SERIALIZABLE did not give SQL standard (serialized) behaviour: "To retain the legacy Serializable behavior, Repeatable Read should now be requested.")
The locking protocols are how actual implementation execution gets interleaved to maximize throughput while keeping that true. All locking does is prevent actual overlapped implementation execution accesses to effect the apparent serialized execution.
Explicit locking by transaction code also just causes waiting.
Your question does not reflect this. You seem to think that attempted access to a locked resource by the implementation aborts a transaction. That is not so.

Related

Row locking behaviour while updating

In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How to get this behaviour in SQL Server? Currently, the row is locked until the transaction is ended. WITH (NOLOCK) inside the select statement gives the new value from the uncommitted transaction which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;

With Read Committed Snapshot Isolation (RCSI), versions of rows are stored in a version store, so readers can read a version of a row that existed at the time the statement started and before any changes have been made; while a transaction is open; without taking shared locks on rows or pages; and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data,
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that version store is not free... it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting * on-prem but not in Azure SQL Database *, read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.

Dbms best practice

I have some questions about programming with a DBMS (no specific language needed, but I'm using Java; no specific DBMS in mind).
I open a transaction, select a row, then read a field, add 1 to the field, and update, then commit. What happens if another user runs in the same time a transaction on that field? Does it crash the transaction, or what?
Example: I'm a in a shop that has 1 kg of bread. Waiter1 has a client that needs 1 kg of bread. Waiter2 the same. If the program is:
select row "bread"
if quantity>=1 kg then quantity=quantity-1
update row
What happens if the two waiters run the transaction in the same time?
What are the best ways to implement multiuser, avoiding "collision"? Select and lock, transaction only, or what?
When to use optimistic lock, or pessimistic?
When to use lock, and when is it not needed?

Why are you handling this on the application side? Relational databases are built to handle situations like this. Just use an update statement:
UPDATE some_table
SET quantity = quantity - 1
WHERE item_name = 'bread' AND quantity >= 1

What you are looking for is Transaction Isolation. The official SQL standard would handle it like this:
If you don't lock specifically your database will generally lock either the row or even the table for you. Depending on your isolation level it will either wait or raise an error.
Serializable
The second transaction will wait for the first to complete before it can do anything.
Repeatable reads
As soon as the first transaction reads, the second will wait until the first one committed. Or the other way around, if somehow the second transaction starts reading before the first.
Read committed
If the first transaction writes before the second writes, the first will have to wait until the second has committed. Otherwise the second will have to wait until the first has committed.
Read uncommitted
Both can read without an issue, but the first to write will make the other write stall till the transaction has been committed.
If one of the transactions commits after the other reads, you could lose the data and end up with only 1 update.

Read Committed Snapshot Isolation: Does Update Conflict Rollback appear as Deadlock?

I have read committed snapshot isolation and allow isolation ON for my database. I'm still receiving a deadlock error. I'm pretty sure I know what is happening...
First transaction gets a sequence number at the beginning of its transaction.
Second one gets a later sequence number at the beginning of its transaction, but after the first transaction has already gotten its (second sequence number is more recent than first).
Second transaction makes it to the update statement first. When it checks the row versioning it sees the record that precedes both transactions since the first one hasn't reached the update yet. It finds that the row's sequence number is in a committed state and moves on it's merry way.
The first transaction takes it's turn and like the second transaction finds the same committed sequence number because it won't see the second one because it is newer than itself. When it tries to commit it finds that another transaction has already updated records that are trying to be committed and has to roll itself back.
Here is my question: Will this rollback appear as a deadlock in a trace?

In a comment attached to the original question you said: "I'm just wondering if an update conflict will appear as a deadlock or if it will appear as something different." I actually had exactly these types of concerns when I started looking into using snapshot isolation. Eventually I realized that there is significant difference between READ_COMMITTED_SNAPSHOT and isolation level SNAPSHOT.
The former uses row versioning for reads, but continues to use exclusive locking for writes. So, READ_COMMITTED_SNAPHOT is actually something in between pure pessimistic and pure optimistic concurrency control. Because it uses locks for writing, update conflicts are not possible, but deadlocks are. At least in SQL Server those deadlocks will be reported as deadlocks just as they are with 'normal' pessimistic locking.
The latter (isolation level SNAPSHOT) is pure optimistic concurrency control. Row versioning is used for both reads and writes. Deadlocks are not possible, but update conflicts are. The latter are reported as update conflicts and not as deadlocks.

The snapshot transaction is rolled back, and it receives the following error message:
Msg 3960, Level 16, State 4, Line 1
Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot
isolation to access table 'Test.TestTran' directly or indirectly in database 'TestDatabase' to
update, delete, or insert the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the update/delete statement.

To prevent deadlock enable both
ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT
ALTER DATABASE [BD] SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE [BD] SET ALLOW_SNAPSHOT_ISOLATION ON;
here explain the differences
http://technet.microsoft.com/en-us/sqlserver/gg545007.aspx

when/what locks are hold/released in READ COMMITTED isolation level

I am trying to understand isolation/locks in SQL Server.
I have following scenario in READ COMMITTED isolation level(Default)
We have a table.
create table Transactions(Tid int,amt int)
with some records
insert into Transactions values(1, 100)
insert into Transactions values(2, -50)
insert into Transactions values(3, 100)
insert into Transactions values(4, -100)
insert into Transactions values(5, 200)
Now from msdn i understood
When a select is fired shared lock is taken so no other transaction can modify data(avoiding dirty read).. Documentation also talks about row level, page level, table level lock. I thought of following scenarion
Begin Transaction
select * from Transactions
/*
some buisness logic which takes 5 minutes
*/
Commit
What I want to understand is for what duration of time shared lock would be acquired and which (row, page, table).
Will lock will be acquire only when statement select * from Transactions is run or would it be acquire for whole 5+ minutes till we reach COMMIT.

You are asking the wrong question, you are concerned about the implementation details. What you should think of and be concerned with are the semantics of the isolation level. Kendra Little has a nice poster explaining them: Free Poster! Guide to SQL Server Isolation Levels.
Your question should be rephrased like:
select * from Items
Q: What Items will I see?
A: All committed Items
Q: What happens if there are uncommitted transactions that have inserted/deleted/update Items?
A: your SELECT will block until all uncommitted Items are committed (or rolled back).
Q: What happens if new Items are inserted/deleted/update while I run the query above?
A: The results are undetermined. You may see some of the modifications, won't see some other, and possible block until some of them commit.
READ COMMITTED makes no promise once your statement finished, irrelevant of the length of the transaction. If you run the statement again you will have again exactly the same semantics as state before, and the Items you've seen before may change, disappear and new one can appear. Obviously this implies that changes can be made to Items after your select.
Higher isolation levels give stronger guarantees: REPEATABLE READ guarantees that no item you've selected the first time can be modified or deleted until you commit. SERIALIZABLE adds the guarantee that no new Item can appear in your second select before you commit.
This is what you need to understand, no how the implementation mechanism works. After you master these concepts, you may ask the implementation details. They're all described in Transaction Processing: Concepts and Techniques.

Your question is a good one. Understanding what kind of locks are acquired allows a deep understanding of DBMS's. In SQL Server, under all isolation levels (Read Uncommitted, Read Committed (default), Repeatable Reads, Serializable) Exclusive Locks are acquired for Write operations.
Exclusive locks are released when transaction ends, regardless of the isolation level.
The difference between the isolation levels refers to the way in which Shared (Read) Locks are acquired/released.
Under Read Uncommitted isolation level, no Shared locks are acquired. Under this isolation level the concurrency issue known as "Dirty Reads" (a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed, so it could be rolled back) can occur.
Under Read Committed isolation level, Shared Locks are acquired for the concerned records. The Shared Locks are released when the current instruction ends. This isolation level prevents "Dirty Reads" but, since the record can be updated by other concurrent transactions, "Non-Repeatable Reads" (transaction A retrieves a row, transaction B subsequently updates the row, and transaction A later retrieves the same row again. Transaction A retrieves the same row twice but sees different data) or "Phantom Reads" (in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first) can occur.
Under Repeatable Reads isolation level, Shared Locks are acquired for the transaction duration. "Dirty Reads" and "Non-Repeatable Reads" are prevented but "Phantom Reads" can still occur.
Under Serializable isolation level, ranged Shared Locks are acquired for the transaction duration. None of the above mentioned concurrency issues occur but performance is drastically reduced and there is the risk of Deadlocks occurrence.

lock will only acquire when select * from Transaction is run
You can check it with below code
open a sql session and run this query
Begin Transaction
select * from Transactions
WAITFOR DELAY '00:05'
/*
some buisness logic which takes 5 minutes
*/
Commit
Open another sql session and run below query
Begin Transaction
Update Transactions
Set = ...
where ....
commit

First, lock only acquire when statement run.
Your statement seprate in two pieces, suppose to be simplfy:
select * from Transactions
update Transactions set amt = xxx where Tid = xxx
When/what locks are hold/released in READ COMMITTED isolation level?
when select * from Transactions run, no lock acquired.
Following update Transactions set amt = xxx where Tid = xxx will add X lock for updating/updated keys, IX lock for page/tab
All lock will release only after committed/rollbacked. That means no lock will release in trans running.

TABLOCK vs TABLOCKX

What is the difference between TABLOCK and TABLOCKX?
http://msdn.microsoft.com/en-us/library/ms187373.aspx states that TABLOCK is a shared lock while TABLOCKX is an exclusive lock. Is the first maybe only an index lock of sorts? And what is the concept of sharing a lock?

Big difference, TABLOCK will try to grab "shared" locks, and TABLOCKX exclusive locks.
If you are in a transaction and you grab an exclusive lock on a table, EG:
SELECT 1 FROM TABLE WITH (TABLOCKX)
No other processes will be able to grab any locks on the table, meaning all queries attempting to talk to the table will be blocked until the transaction commits.
TABLOCK only grabs a shared lock, shared locks are released after a statement is executed if your transaction isolation is READ COMMITTED (default). If your isolation level is higher, for example: SERIALIZABLE, shared locks are held until the end of a transaction.
Shared locks are, hmmm, shared. Meaning 2 transactions can both read data from the table at the same time if they both hold a S or IS lock on the table (via TABLOCK). However, if transaction A holds a shared lock on a table, transaction B will not be able to grab an exclusive lock until all shared locks are released. Read about which locks are compatible with which at msdn.
Both hints cause the db to bypass taking more granular locks (like row or page level locks). In principle, more granular locks allow you better concurrency. So for example, one transaction could be updating row 100 in your table and another row 1000, at the same time from two transactions (it gets tricky with page locks, but lets skip that).
In general granular locks is what you want, but sometimes you may want to reduce db concurrency to increase performance of a particular operation and eliminate the chance of deadlocks.
In general you would not use TABLOCK or TABLOCKX unless you absolutely needed it for some edge case.

Quite an old article on mssqlcity attempts to explain the types of locks:
Shared locks are used for operations that do not change or update data, such as a SELECT statement.
Update locks are used when SQL Server intends to modify a page, and later promotes the update page lock to an exclusive page lock before actually making the changes.
Exclusive locks are used for the data modification operations, such as UPDATE, INSERT, or DELETE.
What it doesn't discuss are Intent (which basically is a modifier for these lock types). Intent (Shared/Exclusive) locks are locks held at a higher level than the real lock. So, for instance, if your transaction has an X lock on a row, it will also have an IX lock at the table level (which stops other transactions from attempting to obtain an incompatible lock at a higher level on the table (e.g. a schema modification lock) until your transaction completes or rolls back).
The concept of "sharing" a lock is quite straightforward - multiple transactions can have a Shared lock for the same resource, whereas only a single transaction may have an Exclusive lock, and an Exclusive lock precludes any transaction from obtaining or holding a Shared lock.

This is more of an example where TABLOCK did not work for me and TABLOCKX did.
I have 2 sessions, that both use the default (READ COMMITTED) isolation level:
Session 1 is an explicit transaction that will copy data from a linked server to a set of tables in a database, and takes a few seconds to run. [Example, it deletes Questions]
Session 2 is an insert statement, that simply inserts rows into a table that Session 1 doesn't make changes to. [Example, it inserts Answers].
(In practice there are multiple sessions inserting multiple records into the table, simultaneously, while Session 1 is running its transaction).
Session 1 has to query the table Session 2 inserts into because it can't delete records that depend on entries that were added by Session 2. [Example: Delete questions that have not been answered].
So, while Session 1 is executing and Session 2 tries to insert, Session 2 loses in a deadlock every time.
So, a delete statement in Session 1 might look something like this:
DELETE tblA FROM tblQ LEFT JOIN tblX on ...
LEFT JOIN tblA a ON tblQ.Qid = tblA.Qid
WHERE ... a.QId IS NULL and ...
The deadlock seems to be caused from contention between querying tblA while Session 2, [3, 4, 5, ..., n] try to insert into tblA.
In my case I could change the isolation level of Session 1's transaction to be SERIALIZABLE. When I did this: The transaction manager has disabled its support for remote/network transactions.
So, I could follow instructions in the accepted answer here to get around it: The transaction manager has disabled its support for remote/network transactions
But a) I wasn't comfortable with changing the isolation level to SERIALIZABLE in the first place- supposedly it degrades performance and may have other consequences I haven't considered, b) didn't understand why doing this suddenly caused the transaction to have a problem working across linked servers, and c) don't know what possible holes I might be opening up by enabling network access.
There seemed to be just 6 queries within a very large transaction that are causing the trouble.
So, I read about TABLOCK and TabLOCKX.
I wasn't crystal clear on the differences, and didn't know if either would work. But it seemed like it would. First I tried TABLOCK and it didn't seem to make any difference. The competing sessions generated the same deadlocks. Then I tried TABLOCKX, and no more deadlocks.
So, in six places, all I needed to do was add a WITH (TABLOCKX).
So, a delete statement in Session 1 might look something like this:
DELETE tblA FROM tblQ q LEFT JOIN tblX x on ...
LEFT JOIN tblA a WITH (TABLOCKX) ON tblQ.Qid = tblA.Qid
WHERE ... a.QId IS NULL and ...