Difference between 2PC (two-phase commit) and 2PL (two-phase locking) - database

What is the difference between the two? The protocols look different on the surface, but I would like to understand what really distinguishes them and why they are not equivalent.

Two-phase locking is a mechanism implemented within a single database instance to achieve the serializable isolation level. Serializable is the strongest isolation level: even with transactions executing in parallel, the end result is the same as if the transactions were executed serially. It works as follows:
Whenever a transaction wants to update an object/row, it must acquire a write/exclusive lock. Whenever a transaction wants to read an object/row, it must acquire a read/shared lock. Instead of releasing each lock immediately after the query that needed it, the locks must be held until the end of the transaction (commit or abort). So while the transaction is executing, the number of locks it holds expands/grows. (Read/write lock behavior is similar to any other reader/writer locking mechanism, so it is not discussed here.)
At the end of the transaction, the locks are released and the number of locks held by the transaction shrinks.
Since the locks are acquired in one phase and released in another, i.e. there are no lock releases in the acquire phase and no new lock acquisitions in the release phase, this is called two-phase locking.
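A minimal sketch of the idea in Java, assuming a hypothetical per-row ReadWriteLock handed out by some lock table (a real database adds lock upgrades, deadlock detection, lock granularities and much more):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReadWriteLock;

    // Growing phase: every read/write first acquires a lock.
    // Shrinking phase: all locks are released only at commit/abort.
    class TwoPhaseLockingTxn {
        private final Deque<Lock> held = new ArrayDeque<>();

        void read(ReadWriteLock rowLock) {
            rowLock.readLock().lock();      // shared lock before reading
            held.push(rowLock.readLock());
            // ... read the row ...
        }

        void write(ReadWriteLock rowLock) {
            rowLock.writeLock().lock();     // exclusive lock before writing
            held.push(rowLock.writeLock());
            // ... update the row ...
        }

        void commitOrAbort() {
            while (!held.isEmpty()) {       // nothing was released earlier
                held.pop().unlock();
            }
        }
    }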
Two-phase commit is an algorithm for implementing distributed transactions across multiple database instances, ensuring that all nodes either commit or abort the transaction.
It works by having a coordinator (which could be a separate service, or a library within the application initiating the transaction) issue two requests: PREPARE to all nodes in phase 1, and COMMIT (if all nodes returned OK in the PREPARE phase) or ABORT (if any node returned NOT OK in the PREPARE phase) to all nodes in phase 2.
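A hedged sketch of the coordinator side, where Participant is a made-up interface standing in for the network calls (a real coordinator must also durably log its decision so it can recover after a crash):

    import java.util.List;

    // Hypothetical participant interface; a real implementation sends
    // PREPARE/COMMIT/ABORT over the network and logs state durably.
    interface Participant {
        boolean prepare();   // true = OK, false = NOT OK
        void commit();
        void abort();
    }

    class TwoPhaseCommitCoordinator {
        boolean run(List<Participant> nodes) {
            // Phase 1: ask every node to PREPARE.
            boolean allOk = true;
            for (Participant p : nodes) {
                if (!p.prepare()) {
                    allOk = false;
                    break;
                }
            }
            // Phase 2: COMMIT only if every node voted OK, else ABORT.
            for (Participant p : nodes) {
                if (allOk) p.commit(); else p.abort();
            }
            return allOk;
        }
    }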
TLDR:
2 phase locking - for serializable isolation within a single database instance
2 phase commit - atomic commit across multiple nodes of a distributed database/datastores

Related

Example of rigorous two-phase locking

The image below shows an example of an S2PL transaction. Can anyone convert this example to R2PL?
The difference between S2PL and R2PL shows up only in the second phase, i.e. in how they release locks.
Under S2PL, a transaction must hold all its exclusive (write) locks until it commits or aborts, but it may release shared (read) locks earlier; under R2PL, a transaction releases all of its locks, shared and exclusive, only after the commit or abort.
So, to convert it to R2PL, you just have to move the unlock(A) to after the commit point, and before unlock(B).
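Since the referenced image is not reproduced here, a hypothetical S2PL schedule of the shape described:

    lock-X(B); read(B); write(B)
    lock-S(A); read(A)
    unlock(A)        <- shared lock released before the commit point
    commit
    unlock(B)

The same transaction under R2PL, with unlock(A) moved past the commit point and before unlock(B):

    lock-X(B); read(B); write(B)
    lock-S(A); read(A)
    commit
    unlock(A)
    unlock(B)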

Kinds of multi-partitioned stored procedures and will they still lock the entire cluster in VoltDB 9?

I am trying to understand the impact of multi-partitioned transactions in VoltDB 9.x. I know it is designed for single-partitioned transactions, but I want to know what it will cost me if I can't avoid them.
In summary, my question is whether it is still the case that multi-partitioned transactions in VoltDB always lock the entire cluster, and how the different kinds of multi-partitioned transactions relate to each other in terms of execution behaviour.
From H-Store-FAQ:
[...] this allows H-Store to support additional optimizations, such as speculative execution and arbitrary multi-partition transactions. For example, in VoltDB every transaction is either single-partition or all-partition. That is, any transaction that needs to touch multiple partitions will cause the VoltDB’s transaction coordinator to lock all partitions in the cluster, even if the transaction only needs to touch data at two partitions. [...] It is likely VoltDB will support these features in the future [...]
The papers The VoltDB Main Memory DBMS and How VoltDB does Transactions claim that at least one split of multi-partitioned transactions exists in VoltDB: One-Shot-Reads and General-2PC-Transactions.
In the class MpTransactionTaskQueue there is a distinction as to whether a transaction will be routed to the multi-partitioned site (count 1) or to a pool of read-only sites (default count up to 20) of the MPI, and the two cannot be executed interleaved.
So these are my sub-questions:
Are One-Shot-Reads always executed on RO-Sites?
Do RO-Sites additionally execute read-only multi-partitioned transactions that are not one-shot?
If a multi-partitioned transaction contains at least one write fragment, will it be executed on the RW-Site and committed atomically with 2PC?
In both cases it is possible that I don't have to touch all partitions in the cluster. Are uninvolved partitions locked, or can they execute single-partitioned transactions in the meantime (while several One-Shot-Reads or one 2PC-Transaction are running on other partitions)? If they are locked, how? Do they get a FragmentTaskMessage with an empty or dummy plan fragment, for example?
The class SystemProcedureCatalog defines an "Every-Flag", and it is checked in code in addition to the read-only and single-partitioned flags. How is this flag related to One-Shot-Reads or the Run-Everywhere pattern?
To make things easier for developers, procedures are called the same way regardless of what type they are. Internally there are different types of multi-partition procedures as they provide some optimizations, although there is more to be done and some H-Store projects have done research in these areas.
MP transactions still ultimately involve sending tasks to be done on all the partitions. The one exception you noticed is a special two-partition transaction that is only used in rebalancing data during elastic add or shrink.
Partitions consist of one or more sites (on separate servers) depending on kfactor. These sites stay in sync without a 2PC by requiring deterministic procedures. The partitions work through the backlog in a queue as fast as the process time (or local execution time) allows. All sites handle both reads and writes.
MP tasks sent to those partition queues have to wait on all the pending items to finish. That is why there is a pool of 20 (by default) threads for MP reads. This allows 20 tasks to be sent out at once, so that the next MP read usually doesn't have to wait for 2 network hops + the max queue wait time + processing time before it can even get queued.
MP reads that are not "single-shot" would be Java procedures with multiple voltExecuteSQL() calls, such as a procedure where subsequent SQL queries depend on the results of prior queries. When these transactions send tasks to the partitions, the partitions have to wait for the max queue wait time + processing time + 2 network hops before they can do the next part of the transaction.
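For illustration, a hedged sketch of such a multi-batch MP read procedure (the schema, orders and customer_id, is made up; VoltDB procedures extend VoltProcedure and batch SQL with voltQueueSQL/voltExecuteSQL):

    import org.voltdb.SQLStmt;
    import org.voltdb.VoltProcedure;
    import org.voltdb.VoltTable;

    // Hypothetical MP read procedure with two dependent SQL batches.
    public class TopCustomerOrders extends VoltProcedure {
        public final SQLStmt topCustomer = new SQLStmt(
            "SELECT customer_id, COUNT(*) AS cnt FROM orders " +
            "GROUP BY customer_id ORDER BY cnt DESC, customer_id LIMIT 1;");
        public final SQLStmt ordersFor = new SQLStmt(
            "SELECT * FROM orders WHERE customer_id = ?;");

        public VoltTable[] run() {
            // Batch 1: fragments fan out to the partitions and the
            // coordinator waits for all of them before continuing.
            voltQueueSQL(topCustomer);
            VoltTable[] first = voltExecuteSQL();
            long customerId = first[0].fetchRow(0).getLong("CUSTOMER_ID");

            // Batch 2 depends on batch 1's result, so the partitions pay
            // another queue wait + processing time + network round trip.
            voltQueueSQL(ordersFor, customerId);
            return voltExecuteSQL(true); // final batch of this procedure
        }
    }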
MP writes can also have multiple voltExecuteSQL() calls, plus they have to wait for a final commit signal, so this all delays the progress on the partitions.
There are certainly examples of MP transactions that shouldn't need to involve all of the partitions and could benefit from future optimizations, but it's not as easy as it may seem on a database that has to support durability to disk, k-safety, elastic add and shrink, multi-cluster active-active replication, and many of the other features that have been added to VoltDB over the years since it grew out of the H-Store project.
Disclosure: I work at VoltDB

Database locks and isolation level in JPA

I am not sure I understand database locks. I am using the repeatable read isolation level. According to Wikipedia, it keeps read and write locks (acquired on selected data) until the end of the transaction.
Let's consider the following scenario: two threads A and B. Thread A begins a transaction. Say thread A retrieves a list of all users from table User. (I am expecting that thread A has now acquired read & write locks on all users?) Thread B begins another transaction, retrieves one concrete User u from table User, updates u, and then commits its transaction. (Since A acquired the locks, does thread B have to wait until A commits its transaction?)
Is the described behavior to be expected when using JPA?
Is the lock acquired if thread A reads the users outside a transaction (say, if I am using the Extended Persistence Context)?
You are confusing the logical isolation level with its physical implementation. The SQL standard defines the four isolation levels Serializable, Repeatable Read, Read Committed and Read Uncommitted, and the three phenomena by which serializability may be violated: dirty reads, non-repeatable reads and phantom reads.
How a particular DBMS achieves each level of isolation is an implementation detail which differs between DBMSs. Some use a locking strategy, in which read locks block writers until the reading transaction completes. Others use strategies such as multi-version concurrency control, in which readers and writers do not block each other. To maximize the performance and scalability of your application, you will need to code to the particular implementation of the DBMS you are using.
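For instance, a minimal JDBC sketch (using an in-memory H2 database as a stand-in, with a made-up users table) showing that the isolation level is only requested from the engine, while the blocking behavior depends on how the engine implements it:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IsolationDemo {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo")) {
                con.setAutoCommit(false);
                con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
                try (Statement st = con.createStatement()) {
                    st.execute("CREATE TABLE users(id INT PRIMARY KEY)");
                    try (ResultSet rs = st.executeQuery("SELECT * FROM users")) {
                        // On a locking engine this read may block concurrent
                        // writers; on an MVCC engine it reads a snapshot and
                        // blocks no one. The application code is identical.
                        while (rs.next()) { /* ... */ }
                    }
                }
                con.commit();
            }
        }
    }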

Oracle deadlock without explicit locking and read committed isolation level, why?

I get the error message ORA-00060: deadlock detected while waiting for resource, even though I am not using any explicit table locking and my isolation level is set to READ COMMITTED.
I use multiple threads over the Spring TransactionTemplate with default propagation. In my business logic the data is separated so that two transactions will never work on the same set of data; therefore I don't need SERIALIZABLE.
Why can Oracle detect a deadlock? Deadlocks should be impossible in this constellation, or am I missing something? If I'm not missing anything, then my separation algorithm must be wrong, right? Or could there be some other explanation?
Oracle by default does row level locking. You mention using multiple threads. I suspect one thread is locking one row then attempting to lock another which has been locked by another thread. That other thread is then attempting to lock the row the first thread locked. At this point, Oracle will automatically detect a deadlock and break it. The two rows mentioned above could be in the same table or in different tables.
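A hedged JDBC sketch of that lock-ordering pattern (the connection URL and the accounts table are placeholders); against any engine with row-level locking one of the two sessions will eventually fail, on Oracle with ORA-00060:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DeadlockDemo {
        // Each thread updates the same two rows, but in opposite orders.
        static void updateBoth(String url, int first, int second) throws Exception {
            try (Connection con = DriverManager.getConnection(url)) {
                con.setAutoCommit(false);
                try (Statement st = con.createStatement()) {
                    st.executeUpdate("UPDATE accounts SET balance = balance + 1 WHERE id = " + first);
                    Thread.sleep(200); // widen the window so the sessions interleave
                    st.executeUpdate("UPDATE accounts SET balance = balance + 1 WHERE id = " + second);
                }
                con.commit();
            }
        }

        public static void main(String[] args) {
            String url = "jdbc:oracle:thin:@//localhost:1521/XEPDB1"; // placeholder
            new Thread(() -> { try { updateBoth(url, 1, 2); } catch (Exception e) { e.printStackTrace(); } }).start();
            new Thread(() -> { try { updateBoth(url, 2, 1); } catch (Exception e) { e.printStackTrace(); } }).start();
        }
    }

Acquiring the locks in a consistent order across all threads (or taking them up front with SELECT ... FOR UPDATE) removes the cycle.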
A careful review of what each thread is doing is the starting point. It may be necessary to decide not to run things in parallel, or to use an explicit locking mechanism (SELECT ... FOR UPDATE, for example).
Let me know what you find and whether you have any additional questions.
Encountering deadlocks has nothing to do per se with the isolation level. When a row is inserted/updated/deleted, Oracle locks the row. If you have two transactions running concurrently and trying to change the same rows, you can encounter a deadlock. The emphasis is on "can". This generally happens when different types of transactions take locks in a different order, which is a sign of bad transaction design.
As was previously mentioned, a trace file is generated when a deadlock is encountered. If you look at the trace file, you can determine which two sessions are involved in the deadlock, and it also shows the respective SQL statements.

Integrity and Confidentiality in Distributed Transactions

I have a question regarding distributed transactions. Let's assume I have 3 transaction programs:
Transaction A

    begin
    a=read(A)
    b=read(B)
    c=a+b
    write(C,c)
    commit

Transaction B

    begin
    a=read(A)
    a=a+1
    write(A,a)
    commit

Transaction C

    begin
    c=read(C)
    c=c*2
    write(A,c)
    commit
So there are 5 pairs of critical (conflicting) operations, where Xn denotes the n-th line of transaction X, counting from begin: C2-A5, A2-B4, B4-C4, B2-C4, A2-C4.
I should ensure integrity and confidentiality; do you have any idea how to achieve this?
Thank you in advance!
What you have described in your post is a common situation in multi-user systems. Different sessions simultaneously start transactions using the same tables and indeed the same rows. There are two issues here:
What happens if Session C reads a record after Session A has updated it but before Session A has committed its transaction?
What happens if Session C updates the same record which Session A has updated but not committed?
(Your scenario only illustrates the first of these issues).
The answer to the first question is isolation level. This is the definition of the visibility of uncommitted transactions across sessions. The ANSI standard specifies four levels:
SERIALIZABLE: no changes from another session are ever visible.
REPEATABLE READ: phantom reads are allowed, that is, the same query executed twice may return different rows.
READ COMMITTED: only changes which have been committed by another session are visible.
READ UNCOMMITTED: dirty reads are allowed, that is, uncommitted changes from one session are visible in another.
Different flavours of database implement these in different fashions, and not all databases support all of them. For instance, Oracle only supports READ COMMITTED and SERIALIZABLE, and it implements SERIALIZABLE as snapshot isolation (the transaction sees the data as of the moment it started). It also uses multi-version concurrency control to give READ COMMITTED transactions statement-level read consistency.
So, coming back to your question, the answer is: set the appropriate isolation level. What the appropriate level is depends on what levels your database supports and what behaviour you wish to happen. Probably you want READ COMMITTED or SERIALIZABLE, that is, you want your transactions to proceed on the basis of data values being consistent with the start of the transaction.
As to the other matter, the answer is simpler: transactions must take locks on the tables, or preferably just on the required rows, before they start to update them. This ensures that the transaction can proceed to change those values without causing a deadlock. This is called pessimistic locking. It is not possible in applications which use connection pooling (i.e. most web-based applications), and the situation there is much gnarlier.
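A brief JDBC sketch of pessimistic row locking with SELECT ... FOR UPDATE (the connection URL, table t and column c are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PessimisticLockDemo {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@//localhost:1521/XEPDB1"; // placeholder
            try (Connection con = DriverManager.getConnection(url)) {
                con.setAutoCommit(false);
                // Lock just the rows we intend to change, up front.
                try (PreparedStatement lock = con.prepareStatement(
                         "SELECT c FROM t WHERE id IN (?, ?) FOR UPDATE")) {
                    lock.setInt(1, 1);
                    lock.setInt(2, 2);
                    try (ResultSet rs = lock.executeQuery()) {
                        while (rs.next()) { /* rows are now locked by this session */ }
                    }
                }
                try (PreparedStatement upd = con.prepareStatement(
                         "UPDATE t SET c = c + 1 WHERE id IN (1, 2)")) {
                    upd.executeUpdate();
                }
                con.commit(); // the row locks are released here
            }
        }
    }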
