Read uncommitted MVCC database

Say I want to run the following transactions in READ COMMITTED mode (in Postgres).
T1: r(A) -> w(A)
T2: r(A) -> w(A)
If the operations were executed in this order:
r1(A)->r2(A)->w1(A)->c1->w2(A)->c2
I would expect that T2 has to wait at r(A), because T1 would set an exclusive lock on A at the first read, since it wants to write it later. But with MVCC there are no read locks?
Now I've got two questions:
If I use JDBC to read some data and then execute a separate command for inserting the read data, how does the database know that it has to take an exclusive lock when it is only reading? Upgrading a read lock to a write lock is not allowed in 2PL, as far as I know.
I think my assumptions are wrong... Where does this scenario wait, or is one transaction killed? Read uncommitted shouldn't allow lost updates, but I can't see how this can work.
I would be happy if someone could help me. Thanks

I would expect that T2 has to wait at r(A), because T1 would set an exclusive lock on A at the first read, since it wants to write it later. But with MVCC there are no read locks?
There are write locks if you specify FOR UPDATE in your SELECT statements. In that case, r2(A) would wait to read if it's trying to lock the same rows as r1(A).
http://www.postgresql.org/docs/9.0/interactive/explicit-locking.html
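For example, with FOR UPDATE the schedule from the question plays out like this (a minimal sketch against a hypothetical table t):

-- Session 1 (T1)
BEGIN;
SELECT a FROM t WHERE id = 1 FOR UPDATE;  -- r1(A): takes a row-level lock

-- Session 2 (T2)
BEGIN;
SELECT a FROM t WHERE id = 1 FOR UPDATE;  -- r2(A): blocks here, as you expected

-- Session 1
UPDATE t SET a = 150 WHERE id = 1;  -- w1(A)
COMMIT;                             -- c1: session 2's SELECT now returns the committed value

-- Session 2
UPDATE t SET a = 170 WHERE id = 1;  -- w2(A): computed from the current value, so no lost update
COMMIT;                             -- c2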
A deadlock occurs if two transactions end up requesting each other's already-locked rows:
r11(A) -> r22(A) -> r12(A) (same row as r22) vs. r21(A) (same row as r11) -> deadlock
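A minimal sketch of such a deadlock in two psql sessions (hypothetical table t, rows 1 and 2):

-- Session 1
BEGIN;
SELECT * FROM t WHERE id = 1 FOR UPDATE;  -- r11: locks row 1

-- Session 2
BEGIN;
SELECT * FROM t WHERE id = 2 FOR UPDATE;  -- r22: locks row 2

-- Session 1
SELECT * FROM t WHERE id = 2 FOR UPDATE;  -- r12: blocks, row 2 is held by session 2

-- Session 2
SELECT * FROM t WHERE id = 1 FOR UPDATE;  -- r21: blocks on row 1, closing the cycle
-- PostgreSQL detects the cycle and aborts one of the two with "ERROR: deadlock detected"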

"But with MVCC there are are no read locks?"
MVCC is a different beast. There are no "locks" in MVCC because, in that scheme, the system maintains as many versions of a single row as the concurrently running transactions might need. The "former contents" of a row are not "lost by an update" (i.e. physically overwritten and destroyed); making sure that a reader does not get to see "new updates" is addressed by "redirecting" that reader's inquiries to the "former content", which is not locked (hence the term "snapshot isolation"). Note that MVCC, in principle, cannot be applied to updating transactions.
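A quick illustration of that redirection in Postgres (hypothetical table t):

-- Session 1
BEGIN;
UPDATE t SET a = 200 WHERE id = 1;  -- creates a new row version; the old one is kept

-- Session 2 (READ COMMITTED)
SELECT a FROM t WHERE id = 1;  -- returns the former content immediately: no wait, no read lock

-- Session 1
COMMIT;  -- subsequent statements in session 2 now see a = 200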
"If I use JDBC to read some data and then execute a separate command for inserting the read data. How does the database know that it has to make an exclusive lock when it is only reading? Increasing an read lock to a write lock is not allowed in 2PL, as far as I know."
You are wrong about 2PL. 2PL means that acquired locks are never released until commit time. It does not mean that an existing lock cannot be strengthened. Incidentally: that is why isolation levels such as "cursor stability" are not 2PL: they do release read locks prior to commit time.
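In practice, the JDBC read-then-write pattern sidesteps the upgrade question by declaring the write intent at read time. A sketch, assuming a hypothetical accounts table:

BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- the lock taken at read time is already a write lock
-- the application computes the new value from what it just read
UPDATE accounts SET balance = 150 WHERE id = 1;        -- no lock upgrade needed
COMMIT;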

The default transaction mode in PostgreSQL is READ COMMITTED; however, READ COMMITTED does not provide the level of serialization that you are looking for.
You are looking for the SERIALIZABLE transaction level. Look at the SET TRANSACTION command after reading PostgreSQL's documentation on Transaction Serialization Levels, specifically the SERIALIZABLE mode. PostgreSQL's MVCC docs are also worth reading.
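For example (the statements inside the transaction are placeholders):

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT a FROM t WHERE id = 1;
UPDATE t SET a = 150 WHERE id = 1;
COMMIT;  -- may abort with a serialization failure instead; the application must be prepared to retry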
Cheers.

Related

Row locking behaviour while updating

In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How can I get this behaviour in SQL Server? Currently, the row is locked until the transaction ends. WITH (NOLOCK) inside the SELECT statement returns the new value from the uncommitted transaction, which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;
With Read Committed Snapshot Isolation (RCSI), row versions are stored in a version store, so while a transaction is open, readers can read the version of a row that existed at the time their statement started, before any changes were made, without taking shared locks on rows or pages and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data.
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that the version store is not free... it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.

When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting (on-prem, but not in Azure SQL Database), read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.
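For reference, RCSI is a per-database setting (the database name below is a placeholder):

ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
-- ROLLBACK IMMEDIATE kicks out open transactions; without it the ALTER waits
-- until this is the only open connection to the database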

SQL Server 2012 - How does "Repeatable Read" isolation level work?

I feel like I should know this, but I can't find anything that specifically outlines this, so here goes.
The documentation for SQL Server describes REPEATABLE READ as:
Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes.
This makes sense, but what actually happens when one of these situation arises? If, for example, Transaction A reads row 1, and then Transaction B attempts to update row 1, what happens? Does Transaction B wait until Transaction A has finished and then try again? Or is an exception thrown?
REPEATABLE READ takes S-locks on all rows that have been read by query plan operators for the duration of the transaction. The answer to your question follows from that:
If the read comes first, it S-locks the row and the write must wait.
If the write comes first, the read's S-lock request waits for the write to commit.
Under Hekaton (In-Memory OLTP) it works differently, because there are no locks.
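A two-session sketch, reusing the test table from the previous question:

-- Session A
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT val FROM test WHERE id = 1;  -- takes an S lock and holds it until commit

-- Session B
UPDATE test SET val = 'Updated' WHERE id = 1;  -- blocks: the X lock it needs is incompatible with the S lock

-- Session A
COMMIT;  -- releases the S lock; session B's update now proceeds, no exception is thrown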

Postgres behaviour of concurrent DELETE RETURNING queries

If I have a database transaction which goes along the lines of:
DELETE FROM table WHERE id = ANY(ARRAY[id1, id2, id3,...]) RETURNING foo, bar;
if num_rows_returned != num_rows_in_array then
rollback and return
Do stuff with deleted data...
Commit
My understanding is that the DELETE query will lock those rows until the transaction is committed or rolled back. According to the Postgres 9.1 docs:
An exclusive row-level lock on a specific row is automatically acquired when the row is updated or deleted. The lock is held until the transaction commits or rolls back, just like table-level locks. Row-level locks do not affect data querying; they block only writers to the same row.
I am using the default read committed isolation level in postgres 9.1.13
I would take from this that I should be OK, but I want to ensure that this means the following things are true:
Only one transaction may delete and return a row from this table, unless a previous transaction was rolled back.
This means "Do stuff with deleted data" can only be done once per row.
If two transactions try to do the above at once with conflicting rows, one will always succeed (ignoring system failure), and one will always fail.
Concurrent transactions may succeed when there is no crossover of rows.
If a transaction is unable to delete and return all rows, it will roll back and thus not delete any rows. For example, a transaction may try to delete two rows, one of which has already been deleted by another transaction while the other is free to be deleted and returned. Since one row is already deleted, the other must not be deleted and processed; only if all specified ids can be deleted and returned may anything take place.
Using the normal idea of concurrency, processes/transactions do not fail when they are locked out of data; they wait.
The DBMS implements execution in such a way that transactions advance while seeing effects from other transactions only as the isolation level permits. (Only in the case of a detected deadlock is a transaction aborted, and even then its execution will begin again; the killing is not evident to its next execution or to other transactions, except per the isolation level.) Under the SERIALIZABLE isolation level this means that the database will change as if all transactions happened without overlap, in some order. Other levels allow a transaction to see certain effects of the overlapped execution of other transactions.
However, in the case of PostgreSQL under SERIALIZABLE, when a transaction tries to commit and the DBMS sees that committing it would give non-serialized behaviour, the transaction is aborted with a notification but not automatically restarted. (Note that this is not a failure caused by attempted access to a locked resource.)
(Prior to 9.1, PostgreSQL's SERIALIZABLE did not give SQL-standard (serialized) behaviour: "To retain the legacy Serializable behavior, Repeatable Read should now be requested.")
The locking protocols determine how actual execution gets interleaved to maximize throughput while keeping that true. All locking does is prevent overlapped accesses during actual execution from breaking the apparent serialized execution.
Explicit locking by transaction code also just causes waiting.
Your question does not reflect this. You seem to think that attempted access to a locked resource by the implementation aborts a transaction. That is not so.
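To make that concrete for the DELETE ... RETURNING case (ids are hypothetical; table as in the question):

-- Session 1
BEGIN;
DELETE FROM table WHERE id = ANY(ARRAY[1, 2]) RETURNING foo, bar;  -- locks rows 1 and 2

-- Session 2
BEGIN;
DELETE FROM table WHERE id = ANY(ARRAY[2, 3]) RETURNING foo, bar;  -- blocks on row 2; it waits, it does not fail

-- Session 1
COMMIT;  -- session 2 resumes; row 2 is now gone, so its DELETE returns only row 3
-- Session 2 sees 1 returned row for 2 requested ids, and its application logic rolls back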

Implement pessimistic locking

I'm interested in how I can implement pessimistic locking, with very specific behavior.
(The reason I tagged the question with Sybase+Oracle+MSSQL, is because I'd be happy with a solution or "that's impossible!" for any one of them)
What I want is this:
1 - be able to lock a row (so that process can later do update, but no other process can lock the row)
2 - when another process tries to lock the same row, it should get a notification that the record is locked - I don't want this process to hang (I believe a simple timeout can be used here)
3 - when another process tries to read the record, it should be able to read it as it currently is in the database (but I don't want to use dirty reads).
The above 3 requirements are currently solved by the application using shared memory - performing record locking outside the database. I'd like to move the locking into the database.
So far, I'm having conflicts between #1 and #3 - if I lock a record by doing an 'update ...' that sets a field to its same value, then a 'select' from another process hangs.
Edit:
I'm having some luck now with snapshot isolation level on MSSQL. I can do both the locking, and reads without using dirty reads.
The reason I don't want to use dirty reads is that if a report is running, it might read multiple tables and issue multiple queries. Snapshot gives me a consistent snapshot of the database. With a dirty read, I'd have mismatching data if there were any updates in the middle.
I think Oracle has snapshot as well, so now I'm most interested in Sybase.
In Oracle you can use select for update nowait to lock a record.
select * from tab where id=1234 for update nowait;
If another process tries to execute the same statement, it gets an exception:
ORA-00054: resource busy and acquire with NOWAIT specified
until the first process (session) performs a commit or rollback.
Normally, Oracle doesn't permit dirty reads.
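A rough SQL Server counterpart, for comparison (the hints are real; tab is the same hypothetical table):

BEGIN TRAN;
SELECT * FROM tab WITH (UPDLOCK, ROWLOCK, NOWAIT) WHERE id = 1234;
-- UPDLOCK takes and holds an update lock until the transaction ends (requirement #1);
-- NOWAIT raises error 1222 instead of blocking if another session holds the lock (requirement #2);
-- plain readers take S locks, which are compatible with the U lock, so they are not blocked (requirement #3)
COMMIT;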
The conflict you describe between #1 and #3 is a logical one: you can either let the database do dirty reads OR you block the reads. If you could read the locked row, it would be a dirty read by definition. That has nothing to do with the specific database system you use!
So if you want it that way: yes, what you want is impossible with all 3 systems, because it violates the definition of a "dirty read".

SQL Server SELECT statements causing blocking

We're using a SQL Server 2005 database (no row versioning) with a huge select statement, and we're seeing it block other statements from running (seen using sp_who2). I didn't realise SELECT statements could cause blocking - is there anything I can do to mitigate this?
SELECT can block updates. A properly designed data model and query will only cause minimal blocking and not be an issue. The 'usual' WITH NOLOCK hint is almost always the wrong answer. The proper answer is to tune your query so it does not scan huge tables.
If the query is untunable, then you should first consider the SNAPSHOT ISOLATION level, second consider using DATABASE SNAPSHOTS, and only as a last option DIRTY READS (and it is better to change the isolation level than to use the NOLOCK hint). Note that dirty reads, as the name clearly states, will return inconsistent data (e.g. your total sheet may be unbalanced).
From the documentation:
Shared (S) locks allow concurrent transactions to read (SELECT) a resource under pessimistic concurrency control. For more information, see Types of Concurrency Control. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the read operation completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
A shared lock is compatible with another shared lock or an update lock, but not with an exclusive lock.
That means that your SELECT queries will block UPDATE and INSERT queries and vice versa.
A SELECT query will place a temporary shared lock when it reads a block of values from the table, and remove it when it is done reading.
For the time the lock exists, you will not be able to do anything with the data in the locked area.
Two SELECT queries will never block each other (unless they are SELECT FOR UPDATE)
You can enable SNAPSHOT isolation level on your database and use it, but note that it will not prevent UPDATE queries from being locked by SELECT queries (which seems to be your case).
It, though, will prevent SELECT queries from being locked by UPDATE.
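Enabling and using it might look like this (database name is a placeholder):

ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- then, in the session running the big read:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT * FROM Table1;  -- reads row versions: not blocked by writers and does not block them
COMMIT;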
Also note that SQL Server, unlike Oracle, uses a lock manager and keeps its locks in an in-memory linked list.
That means that under heavy load, the mere fact of placing and removing a lock may be slow, since the linked list itself must be locked by the transaction thread.
To perform dirty reads you can either:
using (var scope = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions {
        IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted
    }))
{
    // Your code here
    scope.Complete();  // mark the scope as complete; otherwise it rolls back on Dispose
}
or
SelectCommand = "SELECT * FROM Table1 WITH (NOLOCK) INNER JOIN Table2 WITH (NOLOCK) ..."
Remember that you have to write WITH (NOLOCK) after every table you want to dirty-read.
You could set the transaction isolation level to Read Uncommitted.
You might also get deadlocks:
"deadlocks involving only one table"
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/01/reproducing-deadlocks-involving-only-one-table.aspx
and or incorrect results:
"Selects under READ COMMITTED and REPEATABLE READ may return incorrect results."
http://www2.sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx
You can use the WITH (READPAST) table hint. It's different from WITH (NOLOCK): instead of returning uncommitted data, it simply skips any rows that are currently locked by other transactions, so the statement never blocks, but it may not return every committed row.
SELECT * FROM table1 WITH (READPAST)
